Who Read What (Else)?

This notebook, like the others I'm releasing this October, is a bit of a mess. It serves more to document explorations than to make some case or another . . .

Load the "Lost Cause" corpus manifest

The "Lost Cause" corpus manifest is the spreadsheet which connects text files back to the Muncie database. Here, I'm using it as a rough-and-ready source of coarse transaction counts by accession number; I could have as easily gone back to the database for this data.

In [1]:
# Note that I mean to load only rows for which I have a file (column 10);
# the test below actually checks for a nonzero transaction count (column 5)

import unicodecsv as csv

spreadsheet_data = []
accession_numbers_in_corpus = []

for l in list(csv.reader(open('selected_muncie_titles_102218.csv', 'r'), encoding='utf-8'))[1:]:
    if int(l[5]) > 0:
        spreadsheet_data.append({'author': l[0],
                                    'title': l[1],
                                    'accession_number': int(l[3]),
                                    'n_transactions': int(l[5]),
                                    'file_name': l[10]})
        accession_numbers_in_corpus.append(int(l[3]))
        
accession_numbers_in_corpus = set(accession_numbers_in_corpus)
    
print
print 'len(spreadsheet_data)', len(spreadsheet_data)
print 'len(accession_numbers_in_corpus)', len(accession_numbers_in_corpus)
len(spreadsheet_data) 159
len(accession_numbers_in_corpus) 159
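
The filtering step can be tried without the real spreadsheet. The sketch below is a Python 3 rendering (the notebook itself is Python 2), and the two-row CSV, including its authors, titles and numbers, is made up for illustration; only the column positions (0, 1, 3, 5, 10) come from the cell above. The function name load_manifest is likewise hypothetical.

```python
import csv
import io

# Hypothetical stand-in for the manifest: 11 columns, header row first.
SAMPLE_CSV = (
    "author,title,c2,accession,c4,n_transactions,c6,c7,c8,c9,file_name\n"
    "\"Doe, Jane\",A sample title,,101,,12,,,,,doe_sample.txt\n"
    "\"Roe, Richard\",Never borrowed,,102,,0,,,,,roe_sample.txt\n"
)

def load_manifest(handle):
    """Return (rows, accession_numbers) for rows with at least one transaction."""
    rows = []
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    for line in reader:
        n_transactions = int(line[5])
        if n_transactions > 0:
            rows.append({'author': line[0],
                         'title': line[1],
                         'accession_number': int(line[3]),
                         'n_transactions': n_transactions,
                         'file_name': line[10]})
    # A set makes the later "is this book in the corpus?" test cheap.
    return rows, {row['accession_number'] for row in rows}

sample_rows, sample_accessions = load_manifest(io.StringIO(SAMPLE_CSV))
print(len(sample_rows), sorted(sample_accessions))
```

Only the first sample row survives the filter, because the second has a transaction count of zero.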

Connect to the database

In [2]:
import MySQLdb
from datetime import date

db = MySQLdb.connect(host="localhost",
                     user="root",
                     passwd="p5a1m99",
                     db="middletown1",
                     use_unicode=True,
                     charset="utf8")

Trying out some date math

I wasn't quite clear on how to subtract dates. I worked that out here.

The records run from November 5, 1891 through December 3, 1902, with one gap from May 28, 1892 to November 5, 1894.

In [3]:
from datetime import date

start_date_1 = date(1891, 11, 5)
end_date_1 = date(1892, 5, 27)

start_date_2 = date(1894, 11, 6)
end_date_2 = date(1902, 12, 3)

n_days_span_1 = abs((end_date_1 - start_date_1).days)
n_days_span_2 = abs((end_date_2 - start_date_2).days)

gap_days = abs((date(1894, 11, 5) - date(1892, 5, 28)).days)

print 'n_days_span_1', n_days_span_1
print 'n_days_span_2', n_days_span_2
print 'gap_days', gap_days
n_days_span_1 204
n_days_span_2 2948
gap_days 891
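
As a cross-check on the arithmetic, here is a small Python 3 sketch (the notebook is Python 2) that wraps the subtraction in a helper and compares the three pieces against the whole period. The helper name days_between is my own; the dates are the ones used in the cell above.

```python
from datetime import date

def days_between(start, end):
    """Days from start to end; subtracting two dates yields a timedelta."""
    return abs((end - start).days)

span_1 = days_between(date(1891, 11, 5), date(1892, 5, 27))
span_2 = days_between(date(1894, 11, 6), date(1902, 12, 3))
gap = days_between(date(1892, 5, 28), date(1894, 11, 5))
whole_period = days_between(date(1891, 11, 5), date(1902, 12, 3))

# The pieces fall 2 days short of the whole: the one-day steps from
# May 27 to May 28, 1892 and from November 5 to November 6, 1894
# belong to none of the three spans.
shortfall = whole_period - (span_1 + span_2 + gap)
print(span_1, span_2, gap, shortfall)
```

The two-day shortfall is harmless for rough averages, but it is worth knowing that the spans are measured between endpoints rather than counted inclusively.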

Add information to the "Lost Cause" manifest

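For each book, I'm getting the accession and discard dates, and computing the number of days the text was in the library and the average days between checkouts (days in the library divided by the number of transactions). Books accessioned before the records resume have the gap days subtracted first. This code is a good example of the "average days per checkout" math.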

In [4]:
import re

for mn, m in enumerate(spreadsheet_data):
    
    cA = db.cursor()
    cA.execute('SELECT ACC_DATE_STAND, DISC_DATE_STAND, BOOKS_OCLC_ID FROM books ' + \
                'WHERE ACCESSION_NUMBER = %s', (m['accession_number'],))
    resultsA = cA.fetchall()
    
    accession_date = date(1891, 11, 5)
    discard_date = date(1902, 12, 4)
    oclc_id = -1
    
    if len(resultsA) != 1:
        print 'ERROR', len(resultsA)
    else:
        
        if resultsA[0][0] is not None:
            accession_date = resultsA[0][0]
        
        if resultsA[0][1] is not None and resultsA[0][1] < discard_date:
            discard_date = resultsA[0][1]
        
        # Read the OCLC id here so that a missing row can't raise an IndexError
        oclc_id = resultsA[0][2]
    
    days_in_library = abs((discard_date - accession_date).days)
    if accession_date < start_date_2:
        days_in_library = days_in_library - gap_days
        
    avg_days_between_checkouts = (days_in_library / float(m['n_transactions']))
        
    spreadsheet_data[mn]['accession_date'] = accession_date
    spreadsheet_data[mn]['discard_date'] = discard_date
    spreadsheet_data[mn]['days_in_library'] = days_in_library
    spreadsheet_data[mn]['avg_days_between_checkouts'] = avg_days_between_checkouts
    spreadsheet_data[mn]['oclc_id'] = oclc_id
    
    # --------------------------------------------------------------------
    
print spreadsheet_data[0]
{'oclc_id': 8742L, 'title': u'The planters northern bride', 'file_name': u'Hentz_Caroline_Lee_The_planters_northern_bride_IA_plantersnorthern00hent_djvu.txt', 'author': u'Hentz, Caroline Lee', 'n_transactions': 284, 'accession_number': 8513, 'avg_days_between_checkouts': 9.880281690140846, 'discard_date': datetime.date(1902, 12, 4), 'accession_date': datetime.date(1892, 10, 19), 'days_in_library': 2806}
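
To make the "average days per checkout" math easier to check by hand, here is a Python 3 sketch (the notebook is Python 2) that reproduces the figures printed above for "The planters northern bride". The function and constant names are my own; the dates and the transaction count come from the output.

```python
from datetime import date

GAP_START = date(1892, 5, 28)
GAP_END = date(1894, 11, 5)
RECORDS_RESUME = date(1894, 11, 6)   # start_date_2 in the notebook

def days_in_library(accession_date, discard_date):
    """Days on the shelf, minus the whole gap if accessioned before records resume."""
    days = abs((discard_date - accession_date).days)
    if accession_date < RECORDS_RESUME:
        days -= abs((GAP_END - GAP_START).days)
    return days

def avg_days_between_checkouts(accession_date, discard_date, n_transactions):
    """Days in the library divided by the number of transactions."""
    return days_in_library(accession_date, discard_date) / float(n_transactions)

example_days = days_in_library(date(1892, 10, 19), date(1902, 12, 4))
example_avg = avg_days_between_checkouts(date(1892, 10, 19), date(1902, 12, 4), 284)
print(example_days, example_avg)
```

Note that the whole gap is subtracted even though this book was accessioned partway through it (October 19, 1892); counting only the overlap with the gap would give a larger figure.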

Who read these books?

In [5]:
all_patrons = []

for m in spreadsheet_data:
    
    cA = db.cursor()
    cA.execute('SELECT patron_number, FIRST_NAME, LAST_NAME FROM flattenedData WHERE accession_number = %s',
               (m['accession_number'],))
    resultsA = cA.fetchall()

    for r in resultsA:
        if r[1] > '' and r[2] > '':
            all_patrons.append(r)
                
print 'len(all_patrons)', len(all_patrons)
            
all_patrons = sorted(list(set(all_patrons)))

print 'len(all_patrons)', len(all_patrons)
len(all_patrons) 11896
len(all_patrons) 2536
In [6]:
print all_patrons[0]
(6L, u'Albert', u'Carpenter')
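
The collect-then-deduplicate pattern can be tried without the MySQL database. This Python 3 sketch uses the standard library's sqlite3 as a stand-in, which is an assumption on my part: the table and column names follow the query above, but every row is invented, and sqlite3 takes ? placeholders where MySQLdb takes %s.

```python
import sqlite3

# In-memory stand-in for the real database; all rows are hypothetical.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE flattenedData '
             '(patron_number INTEGER, FIRST_NAME TEXT, LAST_NAME TEXT, '
             'accession_number INTEGER)')
conn.executemany('INSERT INTO flattenedData VALUES (?, ?, ?, ?)', [
    (1, 'Ann', 'Example', 101),
    (1, 'Ann', 'Example', 101),   # same patron, same book, second checkout
    (2, 'Ben', 'Sample', 101),
    (3, '', '', 102),             # nameless record, dropped below
    (2, 'Ben', 'Sample', 102),
])

def patrons_for(connection, accession_numbers):
    """Distinct, sorted (patron_number, first, last) tuples for the given books."""
    found = set()
    for number in accession_numbers:
        cursor = connection.execute(
            'SELECT patron_number, FIRST_NAME, LAST_NAME FROM flattenedData '
            'WHERE accession_number = ?', (number,))
        for row in cursor:
            if row[1] and row[2]:   # keep only patrons with both names present
                found.add(row)
    return sorted(found)

sample_patrons = patrons_for(conn, [101, 102])
print(sample_patrons)
```

Five transaction rows collapse to two patrons here, the same kind of reduction as the drop from 11896 to 2536 above.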

What exactly did those people read?

In [7]:
# Dead code?

keyed_spreadsheet_data = {}

for row in spreadsheet_data:
    keyed_spreadsheet_data[int(row['accession_number'])] = row
In [8]:
for pn, p in enumerate(all_patrons):
    
    cA = db.cursor()
    cA.execute('SELECT standardizedTitle, standardizedAuthor, accession_number ' + \
                   'FROM flattenedData WHERE patron_number = %s',
                   (p[0],))
    resultsA = cA.fetchall()
        
    books = []
    for r in resultsA:
        if r[0] == '' and r[1] == '':
            continue
        books.append((r[0], r[1], r[2], (int(r[2]) in accession_numbers_in_corpus)))
               
    all_patrons[pn] = list(all_patrons[pn])
    all_patrons[pn].append(books)
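
The flagging step in the loop above boils down to a set-membership test. Here it is in isolation as a Python 3 sketch (the notebook is Python 2); the function name flag_books is hypothetical, the first two rows reuse titles and accession numbers that appear in this notebook's output, and the empty third row is invented to exercise the skip.

```python
def flag_books(rows, corpus_numbers):
    """Append an in-corpus flag to each (title, author, accession_number) row."""
    books = []
    for title, author, accession_number in rows:
        if title == '' and author == '':
            continue   # skip transactions with no bibliographic data
        in_corpus = int(accession_number) in corpus_numbers
        books.append((title, author, accession_number, in_corpus))
    return books

# A two-element stand-in for accession_numbers_in_corpus.
sample_corpus = {6597, 8493}

sample_rows = [('Castle Hohenwald', 'Streckfuss, Adolf', 8350),
               ('St Elmo', 'Evans, Augusta J', 6597),
               ('', '', 9999)]   # hypothetical empty record

flagged = flag_books(sample_rows, sample_corpus)
print(flagged)
```

Because the corpus numbers are held in a set, each membership test is a constant-time lookup however many books a patron borrowed.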

Display the data for one patron

Another cell which serves to reassure me that I'm gathering what I expect.

In [9]:
print all_patrons[0]
print 'len(all_patrons)', len(all_patrons)
[6L, u'Albert', u'Carpenter', [(u'Castle Hohenwald', u'Streckfuss, Adolf', 8350L, False), (u'Castle Hohenwald', u'Streckfuss, Adolf', 8350L, False), (u'At the councillors', u'John, Eugenie', 8565L, False), (u'Harpers young people', u'', 8846L, False), (u'Lippincotts monthly magazine', u'', 7175L, False), (u'Sweet', u'Bouvet, Marguerite', 9164L, False), (u'Harpers young people', u'', 9274L, False), (u'Gold Elsie', u'John, Eugenie', 6617L, False), (u'Gold Elsie', u'John, Eugenie', 6617L, False), (u'Sweet', u'Bouvet, Marguerite', 9164L, False), (u'The circuit rider', u'Eggleston, Edward', 2022L, False), (u'The circuit rider', u'Eggleston, Edward', 2022L, False), (u'St Nicholas', u'', 8855L, False), (u'Harpers young people', u'', 8851L, False), (u'The Popular science monthly', u'', 9264L, False), (u'St Elmo', u'Evans, Augusta J', 6597L, True), (u'St Elmo', u'Evans, Augusta J', 6597L, True), (u'Half-hours with the best authors', u'Knight, Charles', 9022L, False), (u'Her life, letters and journals', u'Alcott, Louisa May', 8221L, False), (u'Annual report of the Chief Signal Officer made to the Secretary of War for the year', u'United States Army Signal Corps', 8553L, False), (u'Ragged Dick', u'Alger, Horatio', 7727L, False), (u'Fame and fortune, or, The progress of Richard Hunter', u'Alger, Horatio', 7728L, False), (u'Little Lord Fauntleroy', u'Burnett, Frances Hodgson', 6571L, False), (u'My hearts darling', u'Heimburg, W', 8649L, False), (u'The story of Patsy', u'Wiggin, Kate Douglas Smith', 8347L, False), (u'Five little Peppers and how they grew', u'Sidney, Margaret', 8472L, False), (u'Pecks boss book', u'Peck, George W', 7779L, False), (u'Helens babies', u'Habberton, John', 2670L, False), (u'The Holly-tree Inn', u'Dickens, Charles', 2008L, False), (u'Five little Peppers midway', u'Sidney, Margaret', 8473L, False), (u'His sombre rivals', u'Roe, Edward Payson', 8493L, True), (u'Castle Hohenwald', u'Streckfuss, Adolf', 8350L, False), (u'The lady with the rubies', u'John, 
Eugenie', 9046L, False), (u'Sweet as a rose', u'Durward, Mostyn', 7435L, False), (u'Harpers young people', u'', 8978L, False), (u'St Nicholas', u'', 9270L, False), (u'The Swiss family Robinson', u'Wyss, Johann David', 10451L, False), (u'The starry flag', u'Adams, William Tq', 8334L, False), (u'Infelice', u'Evans, Augusta J', 8505L, True), (u'The practical metal worker', u'', 2113L, False), (u'Two little pilgrims progress', u'Burnett, Frances Hodgson', 9586L, False), (u'Make or break', u'Adams, William Tq', 9135L, False), (u'On time, or, The young captain of the Ucayga steamer', u'Adams, William Tq', 8332L, False), (u'Seek and find', u'Adams, William Tq', 9134L, False), (u'Freaks of fortune, or, Half round the world', u'Adams, William Tq', 9132L, False), (u'The telegraph boy', u'Alger, Horatio', 8444L, False), (u'Brake up', u'Adams, William Tq', 9145L, False), (u'Lightning express, or, The Rival academies', u'Adams, William Tq', 535L, False), (u'Switch off, or, The war of the students', u'Adams, William Tq', 8333L, False), (u'The land of pluck', u'Dodge, Mary Mapes', 9649L, False), (u'Down the river', u'Adams, William Tq', 9136L, True), (u'Through by daylight, or, The young engineer of the Lake Shore Railroad', u'Adams, William Tq', 534L, False), (u'Luck and pluck', u'Alger, Horatio', 8447L, False), (u'Rough and ready, or, Life among the New York newsboys', u'Alger, Horatio', 7730L, False), (u'A victorious union', u'Adams, William Tq', 9152L, True), (u'A victorious union', u'Adams, William Tq', 9152L, True), (u'A popular account of the ancient Egyptians', u'Wilkinson, J Gardner', 325L, False), (u'Taken by the enemy', u'Adams, William Tq', 9147L, True), (u'Bens nugget, or, A boys search for fortune', u'Alger, Horatio', 7725L, False), (u'Stand by the Union', u'Adams, William Tq', 9150L, True), (u'The War of the Rebellion', u'United States War Dept', 9078L, False), (u'Jacks ward, or, The boy guardian', u'Alger, Horatio', 8456L, False), (u'Within the enemys lines', 
u'Adams, William Tq', 9148L, True), (u'On the blockade', u'Adams, William Tq', 9149L, True), (u'Wait and hope, or, Ben Bradfords motto', u'Alger, Horatio', 8458L, False), (u'Erlach court', u'Schubin, Ossip', 8563L, False), (u'A brief history of the United States', u'Steele, Joel Dorman', 7338L, False), (u'Don Gordons shooting-box', u'Fosdick, Charles Austin', 8435L, False), (u'Jack Hazard and his fortunes', u'Trowbridge, J T', 8377L, False), (u'The fast mail', u'Drysdale, William', 10962L, False), (u'Prince Tip-Top', u'Bouvet, Marguerite', 9174L, False), (u'The story of Babette', u'Stuart, Ruth McEnery', 9650L, False), (u'The young circus rider, or, The mystery of Robert Rudd', u'Alger, Horatio', 8475L, False), (u'Helping himself, or, Grant Thorntons ambition', u'Alger, Horatio', 8479L, False), (u'Bound to rise', u'Alger, Horatio', 8452L, False), (u'The childrens wonder book', u'', 9657L, False), (u'The young circus rider, or, The mystery of Robert Rudd', u'Alger, Horatio', 8475L, False), (u'The jo-boat boys', u'Cowan, John F', 10558L, False), (u'All adrift, or, The Goldwing Club', u'Adams, William Tq', 10226L, False), (u'Snug Harbor, or, The Champlain mechanics', u'Adams, William Tq', 10227L, False), (u'Square and compasses', u'Adams, William Tq', 10228L, False), (u'Papers relating to the foreign relations of the United States', u'United States Dept of State', 8256L, False), (u'Little Saint Elizabeth', u'Burnett, Frances Hodgson', 7703L, False), (u'Elsie Dinsmore', u'Finley, Martha', 9196L, False), (u'Go-ahead', u'Fosdick, Charles Austin', 11865L, False), (u'Frank in the woods', u'Fosdick, Charles Austin', 11849L, False), (u'No moss, or, The career of a rolling stone', u'Fosdick, Charles Austin', 11864L, False), (u'Through by daylight', u'Adams, William Tq', 11500L, False), (u'The story of a bad boy', u'Aldrich, Thomas Bailey', 9161L, True), (u'At war with Pontiac', u'', 9599L, False), (u'The domestic blunders of women, by a mere man', u'Moore, Augustus', 12143L, 
False), (u'Harpers round table', u'', 11145L, False), (u'Tom Thatchers fortune', u'Alger, Horatio', 11890L, False), (u'The store boy', u'Alger, Horatio', 11882L, False), (u'Dan, the newsboy', u'Alger, Horatio', 10430L, False), (u'Joe Wayring at home, or, The adventures of a fly-rod', u'Fosdick, Charles Austin', 8431L, False), (u'Dorsey the young inventor', u'Ellis, Edward Sylvester', 11480L, False), (u'The last of the Mohicans', u'Cooper, James Fenimore', 9667L, False), (u'A woman tenderfoot', u'Seton-Thompson, Grace Gallatin', 12629L, False), (u'The prisoner of Zenda', u'Hawkins, Anthony Hope', 11429L, False), (u'Billy Baxters letters', u'Kountz, William J', 12617L, False), (u'The other fellow', u'Smith, Francis Hopkinson', 11779L, True), (u'A popular history of the United States of America', u'Ridpath, John Clark', 8535L, False), (u'George in camp', u'Fosdick, Charles Austin', 11866L, False), (u'Tony, the hero', u'Alger, Horatio', 11879L, False), (u'Hoosier schoolboy', u'Eggleston, Edward', 9623L, False), (u'Making fate', u'Alden, Isabella Macdonald', 10932L, False), (u'The reign of law', u'Allen, James Lane', 11914L, True), (u'Lorraine', u'Chambers, Robert W', 12422L, False), (u'Alice of old Vincennes', u'Thompson, Maurice', 12194L, False), (u'Little men', u'Alcott, Louisa May', 11835L, False), (u'Donald and Dorothy', u'Dodge, Mary Mapes', 11672L, False), (u'Donald and Dorothy', u'Dodge, Mary Mapes', 11672L, False)]]
len(all_patrons) 2536

Count what they read

In [10]:
for pn, p in enumerate(all_patrons):
    
    counts = {True: 0, False: 0}
    
    for a in p[3]:
        counts[a[3]] += 1
        
    all_patrons[pn].append(counts)
    
print all_patrons[0]
[6L, u'Albert', u'Carpenter', [(u'Castle Hohenwald', u'Streckfuss, Adolf', 8350L, False), ... (same list as in the output of cell 9) ..., (u'Donald and Dorothy', u'Dodge, Mary Mapes', 11672L, False)], {False: 99, True: 14}]
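
The tally and the percentage used in the next cell can be sketched on their own. This is Python 3 (the notebook is Python 2); the function names and the four-item reading list are hypothetical, while the 14 and 99 at the end are Albert Carpenter's counts from the output above.

```python
def count_reading(books):
    """Tally checkouts by the in-corpus flag in the fourth slot of each book tuple."""
    counts = {True: 0, False: 0}
    for book in books:
        counts[book[3]] += 1
    return counts

def corpus_share(counts):
    """Percentage of checkouts that are corpus titles (0.0 when there are none)."""
    total = counts[True] + counts[False]
    return 100.0 * counts[True] / total if total else 0.0

# Hypothetical reading list: (title, author, accession_number, in_corpus).
sample_books = [('Title A', 'Author A', 1, True),
                ('Title B', 'Author B', 2, False),
                ('Title B', 'Author B', 2, False),
                ('Title C', 'Author C', 3, False)]

sample_counts = count_reading(sample_books)
sample_share = corpus_share(sample_counts)
print(sample_counts, sample_share)

# Albert Carpenter: 14 corpus checkouts out of 113.
carpenter_share = corpus_share({True: 14, False: 99})
print(carpenter_share)
```

Repeat checkouts of the same book count separately, so the share measures transactions rather than distinct titles.
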
In [11]:
counts = {True: 0, False: 0}

for pn, p in enumerate(all_patrons):
    for k, v in p[4].iteritems():
        counts[k] += v

#for k, v in counts.iteritems():
#    print k, v
    
print 'number of checkouts by these readers', (counts[True] + counts[False])
print '% of their reading which is Lost Cause', float(counts[True]) / (counts[True] + counts[False]) * 100

print 

n_heavy_readers = 0
max_readers = []

x_for_plot_1 = []
y_for_plot_1 = []

x_for_plot_2 = []
y_for_plot_2 = []

for pn, p in enumerate(all_patrons):
    
    max_readers.append([[p[4][True],] + p[:3] + [p[4],]])
    
    y_for_plot_1.append(p[4][True])
    x_for_plot_1.append((p[4][True] + p[4][False]))
    
    if (p[4][True] + p[4][False]) <= 100:
        y_for_plot_2.append(p[4][True])
        x_for_plot_2.append((p[4][True] + p[4][False]))
        
        
    if (p[4][True] + p[4][False]) > 5:
        pct_ours = float(p[4][True]) / (p[4][True] + p[4][False]) * 100
        if pct_ours > 15.0:
            print p[:3], p[4], (p[4][True] + p[4][False]), pct_ours
            n_heavy_readers += 1
            
print
print 'n_heavy_readers', n_heavy_readers

max_readers.sort(reverse=True)

print
for r in max_readers[:50]:
    print r
number of checkouts by these readers 157452
% of their reading which is Lost Cause 7.55531844626

[315L, u'A.', u'Sheek'] {False: 7, True: 2} 9 22.2222222222
[318L, u'William', u'Dragu'] {False: 15, True: 5} 20 25.0
[325L, u'Victor', u'Silverburg'] {False: 21, True: 4} 25 16.0
[630L, u'Mary', u'Carpenter'] {False: 31, True: 12} 43 27.9069767442
[933L, u'Millie', u'Dare'] {False: 4, True: 2} 6 33.3333333333
[1332L, u'Arthur', u'Shideler'] {False: 14, True: 3} 17 17.6470588235
[1529L, u'Linda', u'Merriman'] {False: 5, True: 1} 6 16.6666666667
[1536L, u'Frank', u'Brown'] {False: 6, True: 2} 8 25.0
[1594L, u'Nella', u'Cochran'] {False: 4, True: 2} 6 33.3333333333
[2218L, u'Nellie', u'Wilson'] {False: 27, True: 6} 33 18.1818181818
[2305L, u'Mary', u'Boyer'] {False: 8, True: 2} 10 20.0
[2446L, u'Wm', u'Archer'] {False: 13, True: 4} 17 23.5294117647
[2471L, u'Louisa', u'Koerner'] {False: 82, True: 16} 98 16.3265306122
[2481L, u'Laura', u'Greely'] {False: 9, True: 2} 11 18.1818181818
[2506L, u'J.', u'Richards'] {False: 12, True: 3} 15 20.0
[2685L, u'Estella', u'McClillan'] {False: 9, True: 2} 11 18.1818181818
[2703L, u'E.', u'Ellsworth'] {False: 7, True: 2} 9 22.2222222222
[2787L, u'Walter', u'Jones'] {False: 10, True: 2} 12 16.6666666667
[2858L, u'Maude', u'Johnson'] {False: 11, True: 3} 14 21.4285714286
[2965L, u'Emma', u'Lacey'] {False: 7, True: 2} 9 22.2222222222
[2991L, u'Joe', u'Boehm'] {False: 74, True: 14} 88 15.9090909091
[2999L, u'Gulia', u'Cates'] {False: 14, True: 3} 17 17.6470588235
[3031L, u'Luella', u'Wineburg'] {False: 19, True: 4} 23 17.3913043478
[3034L, u'Donna', u'Cunningham'] {False: 16, True: 3} 19 15.7894736842
[3045L, u'Frank', u'Garner'] {False: 21, True: 6} 27 22.2222222222
[3056L, u'Albert', u'Lewellen'] {False: 12, True: 4} 16 25.0
[3108L, u'Clifford', u'Brown'] {False: 66, True: 16} 82 19.512195122
[3119L, u'Rollie', u'Bunch'] {False: 35, True: 11} 46 23.9130434783
[3133L, u'Wilbur', u'Personett'] {False: 49, True: 16} 65 24.6153846154
[3139L, u'Mary', u'Geiger'] {False: 11, True: 2} 13 15.3846153846
[3219L, u'John', u'Russey'] {False: 57, True: 11} 68 16.1764705882
[3244L, u'Ralph', u'Hyer'] {False: 29, True: 6} 35 17.1428571429
[3254L, u'M', u'Thornton'] {False: 26, True: 5} 31 16.1290322581
[3300L, u'Otto', u'Gundlach'] {False: 20, True: 8} 28 28.5714285714
[3309L, u'Jannie', u'Ross'] {False: 11, True: 2} 13 15.3846153846
[3334L, u'Charlie', u'Ebrite'] {False: 16, True: 10} 26 38.4615384615
[3346L, u'W.', u'Dicks'] {False: 13, True: 3} 16 18.75
[3364L, u'Leona', u'Heonze'] {False: 4, True: 2} 6 33.3333333333
[3381L, u'Addison', u'Templer'] {False: 38, True: 7} 45 15.5555555556
[3387L, u'Louis', u'Barber'] {False: 7, True: 2} 9 22.2222222222
[3393L, u'Frank', u'Marts'] {False: 9, True: 2} 11 18.1818181818
[3414L, u'Frank', u'Macdonald'] {False: 14, True: 4} 18 22.2222222222
[3419L, u'Ira', u'Koin'] {False: 48, True: 10} 58 17.2413793103
[3438L, u'May', u'Daugherty'] {False: 17, True: 4} 21 19.0476190476
[3454L, u'Dennie', u'Gray'] {False: 10, True: 3} 13 23.0769230769
[3491L, u'Marian', u'Ferguson'] {False: 11, True: 3} 14 21.4285714286
[3495L, u'Lizzie', u'Hardie'] {False: 35, True: 7} 42 16.6666666667
[3508L, u'D.', u'McAfee'] {False: 19, True: 6} 25 24.0
[3541L, u'Bird', u'Richardson'] {False: 30, True: 10} 40 25.0
[3545L, u'Frank', u'Stephens'] {False: 13, True: 3} 16 18.75
[3562L, u'Roy', u'Coffeen'] {False: 33, True: 10} 43 23.2558139535
[3573L, u'Edgar', u'Williams'] {False: 12, True: 4} 16 25.0
[3603L, u'Jno', u'Meeks'] {False: 15, True: 5} 20 25.0
[3610L, u'Carl', u'Coffin'] {False: 43, True: 9} 52 17.3076923077
[3633L, u'Harry', u'Dunnington'] {False: 27, True: 7} 34 20.5882352941
[3642L, u'James', u'Mathew'] {False: 10, True: 3} 13 23.0769230769
[3650L, u'Lizzie', u'Howard'] {False: 11, True: 2} 13 15.3846153846
[3652L, u'Walter', u'Chambers'] {False: 25, True: 5} 30 16.6666666667
[3655L, u'Besse', u'Baughman'] {False: 22, True: 6} 28 21.4285714286
[3657L, u'E.', u'Scanland'] {False: 8, True: 2} 10 20.0
[3671L, u'J.', u'Brown'] {False: 21, True: 4} 25 16.0
[3688L, u'Alice', u'Burt'] {False: 22, True: 7} 29 24.1379310345
[3693L, u'Austin', u'Claypool'] {False: 11, True: 2} 13 15.3846153846
[3694L, u'Pearl', u'Cocran'] {False: 26, True: 6} 32 18.75
[3699L, u'Chester', u'Ryan'] {False: 10, True: 3} 13 23.0769230769
[3715L, u'Harry', u'Robbins'] {False: 34, True: 9} 43 20.9302325581
[3718L, u'Ralph', u'Ault'] {False: 10, True: 2} 12 16.6666666667
[3752L, u'Mary', u'Petty'] {False: 47, True: 11} 58 18.9655172414
[3766L, u'Rosalind', u'Ringolsky'] {False: 17, True: 8} 25 32.0
[3772L, u'Edith', u'Jones'] {False: 49, True: 9} 58 15.5172413793
[3783L, u'Harry', u'Millerns'] {False: 18, True: 8} 26 30.7692307692
[3790L, u'Geo.', u'Arnold'] {False: 30, True: 8} 38 21.0526315789
[3805L, u'Clifford', u'Hilty'] {False: 5, True: 1} 6 16.6666666667
[3812L, u'A.', u'Bingham'] {False: 8, True: 2} 10 20.0
[3831L, u'Frank', u'Glass'] {False: 37, True: 7} 44 15.9090909091
[3862L, u'W', u'Morrow'] {False: 21, True: 4} 25 16.0
[3865L, u'Nellie', u'Campbell'] {False: 13, True: 6} 19 31.5789473684
[3873L, u'E', u'Merlin'] {False: 5, True: 1} 6 16.6666666667
[3881L, u'R', u'Clark'] {False: 27, True: 5} 32 15.625
[3887L, u'Jef', u'Brooker'] {False: 6, True: 2} 8 25.0
[3899L, u'May', u'Hinshaw'] {False: 26, True: 5} 31 16.1290322581
[3905L, u'Maggie', u'Femyer'] {False: 27, True: 8} 35 22.8571428571
[3932L, u'Agnes', u'Zeller'] {False: 7, True: 3} 10 30.0
[3939L, u'Lola', u'Woolfington'] {False: 46, True: 10} 56 17.8571428571
[3940L, u'Louis', u'McCann'] {False: 6, True: 2} 8 25.0
[3956L, u'Mary', u'Glenn'] {False: 37, True: 7} 44 15.9090909091
[3957L, u'C.', u'Sherritt'] {False: 70, True: 13} 83 15.6626506024
[3978L, u'Leo', u'Lyons'] {False: 21, True: 10} 31 32.2580645161
[3989L, u'Otto', u'Ream'] {False: 43, True: 8} 51 15.6862745098
[4027L, u'Nellie', u'Weisse'] {False: 11, True: 2} 13 15.3846153846
[4032L, u'Lida', u'Neville'] {False: 12, True: 3} 15 20.0
[4042L, u'Mamie', u'Beck'] {False: 13, True: 5} 18 27.7777777778
[4078L, u'Frank', u'Haines'] {False: 46, True: 9} 55 16.3636363636
[4084L, u'Dan', u'McAber'] {False: 6, True: 2} 8 25.0
[4109L, u'F.', u'Hays'] {False: 9, True: 2} 11 18.1818181818
[4126L, u'Lloyd', u'Carver'] {False: 23, True: 5} 28 17.8571428571
[4147L, u'Edward', u'Turner'] {False: 9, True: 6} 15 40.0
[4158L, u'Frank', u'Abrams'] {False: 8, True: 6} 14 42.8571428571
[4199L, u'Howard', u'Dudley'] {False: 34, True: 8} 42 19.0476190476
[4207L, u'Anna', u'Johnson'] {False: 28, True: 5} 33 15.1515151515
[4215L, u'Orville', u'Franklin'] {False: 23, True: 5} 28 17.8571428571
[4241L, u'Frank', u'Jewett'] {False: 67, True: 14} 81 17.2839506173
[4253L, u'Earl', u'Tuhey'] {False: 43, True: 8} 51 15.6862745098
[4259L, u'Frank', u'Doherty'] {False: 8, True: 2} 10 20.0
[4268L, u'Leonard', u'Boomer'] {False: 9, True: 2} 11 18.1818181818
[4285L, u'Katie', u'Sullivan'] {False: 63, True: 22} 85 25.8823529412
[4289L, u'Anna', u'Hostetter'] {False: 8, True: 3} 11 27.2727272727
[4304L, u'Garfield', u'Olin'] {False: 32, True: 7} 39 17.9487179487
[4319L, u'Ulric', u'Hurrle'] {False: 21, True: 5} 26 19.2307692308
[4372L, u'Samuel', u'Higgitt'] {False: 35, True: 9} 44 20.4545454545
[4403L, u'Ralph', u'Huff'] {False: 16, True: 3} 19 15.7894736842
[4419L, u'Flora', u'King'] {False: 46, True: 9} 55 16.3636363636
[4436L, u'Harry', u'Mock'] {False: 36, True: 10} 46 21.7391304348
[4454L, u'Frank', u'Morgan'] {False: 64, True: 12} 76 15.7894736842
[4462L, u'Mary', u'Snider'] {False: 141, True: 27} 168 16.0714285714
[4479L, u'Roscoe', u'Jones'] {False: 46, True: 14} 60 23.3333333333
[4482L, u'D.', u'Ault'] {False: 11, True: 5} 16 31.25
[4498L, u'Clara', u'Horlacher'] {False: 24, True: 5} 29 17.2413793103
[4511L, u'Mayme', u'Mahoney'] {False: 3, True: 4} 7 57.1428571429
[4519L, u'S.', u'Jump'] {False: 16, True: 3} 19 15.7894736842
[4520L, u'Mamie', u'Ryan'] {False: 16, True: 4} 20 20.0
[4532L, u'S.', u'Dye'] {False: 21, True: 5} 26 19.2307692308
[4556L, u'Emma', u'Yockey'] {False: 16, True: 4} 20 20.0
[4559L, u'Arthur', u'Bell'] {False: 6, True: 2} 8 25.0
[4585L, u'Charles', u'Higgins'] {False: 19, True: 5} 24 20.8333333333
[4594L, u'William', u'Doherty'] {False: 9, True: 3} 12 25.0
[4622L, u'James', u'Guthrie'] {False: 16, True: 6} 22 27.2727272727
[4635L, u'May', u'Harvey'] {False: 27, True: 5} 32 15.625
[4648L, u'Minnie', u'Higgins'] {False: 15, True: 3} 18 16.6666666667
[4672L, u'H.', u'Hayden'] {False: 22, True: 6} 28 21.4285714286
[4674L, u'Maude', u'Dotson'] {False: 12, True: 3} 15 20.0
[4676L, u'Jas', u'Bingham'] {False: 15, True: 3} 18 16.6666666667
[4704L, u'Katie', u'Doherty'] {False: 10, True: 3} 13 23.0769230769
[4721L, u'Nellie', u'Lutz'] {False: 5, True: 1} 6 16.6666666667
[4736L, u'Austin', u'Kerin'] {False: 106, True: 21} 127 16.5354330709
[4739L, u'Mary', u'Sample'] {False: 11, True: 2} 13 15.3846153846
[4747L, u'Cecil', u'Jones'] {False: 72, True: 13} 85 15.2941176471
[4760L, u'Edith', u'Manor'] {False: 13, True: 3} 16 18.75
[4766L, u'Jesse', u'Nixon'] {False: 8, True: 4} 12 33.3333333333
[4797L, u'Gertie', u'Nicholson'] {False: 87, True: 18} 105 17.1428571429
[4807L, u'Cleone', u'Hene'] {False: 25, True: 5} 30 16.6666666667
[4855L, u'Ralph', u'Martin'] {False: 15, True: 7} 22 31.8181818182
[4858L, u'Josephine', u'Maddux'] {False: 28, True: 8} 36 22.2222222222
[4861L, u'Tillie', u'Cline'] {False: 4, True: 3} 7 42.8571428571
[4885L, u'Ada', u'Williams'] {False: 30, True: 9} 39 23.0769230769
[4887L, u'J.', u'Fennimore'] {False: 11, True: 5} 16 31.25
[4897L, u'Nettie', u'Buster'] {False: 11, True: 2} 13 15.3846153846
[4903L, u'G.', u'Kidnocker'] {False: 10, True: 3} 13 23.0769230769
[4909L, u'Orel', u'Williams'] {False: 30, True: 7} 37 18.9189189189
[4934L, u'Ora', u'Snowberger'] {False: 10, True: 3} 13 23.0769230769
[4935L, u'Cliffie', u'Newman'] {False: 31, True: 6} 37 16.2162162162
[4953L, u'Mamie', u'Grundy'] {False: 24, True: 10} 34 29.4117647059
[4956L, u'J', u'Dickson'] {False: 21, True: 4} 25 16.0
[4980L, u'Anna', u'Lambert'] {False: 16, True: 4} 20 20.0
[4981L, u'Bessie', u'Resoner'] {False: 11, True: 2} 13 15.3846153846
[5009L, u'Mar', u'McCoy'] {False: 16, True: 9} 25 36.0
[5014L, u'Lawrence', u'Norton'] {False: 50, True: 9} 59 15.2542372881
[5017L, u'G', u'McVicker'] {False: 5, True: 1} 6 16.6666666667
[5030L, u'Elizabeth', u'Malone'] {False: 28, True: 6} 34 17.6470588235
[5037L, u'J.', u'Canfield'] {False: 39, True: 8} 47 17.0212765957
[5058L, u'Hoover', u'Dragoo'] {False: 28, True: 5} 33 15.1515151515
[5059L, u'Cecil', u'Allspaw'] {False: 28, True: 7} 35 20.0
[5060L, u'Albert', u'Williams'] {False: 17, True: 6} 23 26.0869565217
[5069L, u'Loretta', u'Hene'] {False: 17, True: 4} 21 19.0476190476
[5147L, u'Emily', u'Olcott'] {False: 28, True: 8} 36 22.2222222222
[5157L, u'Ernest', u'Collins'] {False: 8, True: 3} 11 27.2727272727
[5160L, u'Walter', u'Tuhey'] {False: 10, True: 3} 13 23.0769230769
[5164L, u'J.', u'Williams'] {False: 16, True: 3} 19 15.7894736842
[5168L, u'Bessie', u'Rinard'] {False: 12, True: 3} 15 20.0
[5169L, u'J.', u'White'] {False: 7, True: 2} 9 22.2222222222
[5173L, u'John', u'Sullivan'] {False: 5, True: 5} 10 50.0
[5189L, u'Harry', u'Cowan'] {False: 5, True: 1} 6 16.6666666667
[5199L, u'Karl', u'Cecil'] {False: 15, True: 3} 18 16.6666666667
[5202L, u'Thos.', u'Richey'] {False: 8, True: 2} 10 20.0
[5205L, u'Louis', u'Buettner'] {False: 10, True: 2} 12 16.6666666667
[5217L, u'Joseph', u'Rosbosbottom'] {False: 20, True: 4} 24 16.6666666667
[5221L, u'Fidelia', u'Royse'] {False: 14, True: 3} 17 17.6470588235
[5247L, u'Newton', u'Miller'] {False: 23, True: 5} 28 17.8571428571
[5250L, u'G.', u'Glume [?]'] {False: 10, True: 4} 14 28.5714285714
[5258L, u'L.', u'Cash'] {False: 22, True: 4} 26 15.3846153846
[5265L, u'Harold', u'Milligan'] {False: 31, True: 6} 37 16.2162162162
[5272L, u'L.', u'Young'] {False: 5, True: 4} 9 44.4444444444
[5298L, u'J.', u'Scott'] {False: 26, True: 5} 31 16.1290322581
[5315L, u'Chester', u'Wardlow'] {False: 45, True: 14} 59 23.7288135593
[5316L, u'J.', u'Mac Neill'] {False: 21, True: 5} 26 19.2307692308
[5326L, u'Margaret', u'Wagner'] {False: 33, True: 7} 40 17.5
[5342L, u'Vida', u'Stacy'] {False: 22, True: 4} 26 15.3846153846
[5355L, u'James', u'McDowell'] {False: 16, True: 3} 19 15.7894736842
[5358L, u'Nora', u'Bradbury'] {False: 36, True: 7} 43 16.2790697674
[5361L, u'James', u'Harkins'] {False: 9, True: 2} 11 18.1818181818
[5366L, u'Lillian', u'Gibson'] {False: 16, True: 7} 23 30.4347826087
[5368L, u'W.', u'Malone'] {False: 13, True: 3} 16 18.75
[5391L, u'Pearl', u'Humfeld'] {False: 15, True: 3} 18 16.6666666667
[5404L, u'Gilbert', u'Humfeld'] {False: 11, True: 2} 13 15.3846153846
[5414L, u'James', u'Rice'] {False: 4, True: 2} 6 33.3333333333
[5418L, u'E.', u'Schmitts'] {False: 7, True: 3} 10 30.0
[5420L, u'Belle', u'Johnson'] {False: 61, True: 13} 74 17.5675675676
[5440L, u'Gilbert', u'Woodyard'] {False: 50, True: 9} 59 15.2542372881
[5471L, u'Ada', u'Gentry'] {False: 17, True: 4} 21 19.0476190476
[5478L, u'Bessie', u'Cohen'] {False: 26, True: 5} 31 16.1290322581
[5480L, u'J.', u'McGormley'] {False: 5, True: 3} 8 37.5
[5485L, u'M.', u'Pearson'] {False: 13, True: 4} 17 23.5294117647
[5489L, u'Alice', u'Wall'] {False: 29, True: 7} 36 19.4444444444
[5492L, u'Fred', u'Bowman'] {False: 48, True: 10} 58 17.2413793103
[5502L, u'Chas', u'Hanley'] {False: 49, True: 11} 60 18.3333333333
[5510L, u'James', u'Whinrey'] {False: 21, True: 4} 25 16.0
[5523L, u'Claud', u'Mathews'] {False: 4, True: 4} 8 50.0
[5545L, u'Olive', u'Gunder'] {False: 13, True: 3} 16 18.75
[5548L, u'Della', u'Wright'] {False: 44, True: 10} 54 18.5185185185
[5564L, u'Melvin', u'Cramer'] {False: 44, True: 8} 52 15.3846153846
[5572L, u'Minnie', u'Westfall'] {False: 31, True: 6} 37 16.2162162162
[5578L, u'Clara', u'Elrod'] {False: 16, True: 3} 19 15.7894736842
[5587L, u'A.', u'Stahl'] {False: 13, True: 3} 16 18.75
[5591L, u'Harold', u'Hamilton'] {False: 38, True: 7} 45 15.5555555556
[5602L, u'Cora', u'Bruns'] {False: 23, True: 6} 29 20.6896551724
[5612L, u'Earl', u'Dennis'] {False: 13, True: 3} 16 18.75
[5619L, u'Cora', u'Nobel'] {False: 49, True: 11} 60 18.3333333333
[5620L, u'Anna', u'Cory'] {False: 17, True: 5} 22 22.7272727273
[5633L, u'Mary', u'Young'] {False: 24, True: 7} 31 22.5806451613
[5637L, u'E.', u'Ice'] {False: 12, True: 5} 17 29.4117647059
[5659L, u'L.', u'Preston'] {False: 11, True: 2} 13 15.3846153846
[5663L, u'Willie', u'Doermann'] {False: 11, True: 2} 13 15.3846153846
[5672L, u'Con', u'Hanley'] {False: 36, True: 7} 43 16.2790697674
[5681L, u'May', u'Shafer'] {False: 34, True: 9} 43 20.9302325581
[5683L, u'Fleeta', u'McProud'] {False: 5, True: 3} 8 37.5
[5689L, u'Charlotte', u'Bishop'] {False: 5, True: 1} 6 16.6666666667
[5691L, u'Grace', u'Williams'] {False: 1, True: 5} 6 83.3333333333
[5712L, u'Alen', u'Todd'] {False: 9, True: 2} 11 18.1818181818
[5716L, u'May', u'Dunlap'] {False: 15, True: 3} 18 16.6666666667
[5720L, u'Laura', u'Corbly'] {False: 16, True: 4} 20 20.0
[5725L, u'Helen', u'Jackson'] {False: 7, True: 2} 9 22.2222222222
[5731L, u'Marie', u'Long'] {False: 9, True: 2} 11 18.1818181818
[5735L, u'Alvin', u'Hancock'] {False: 30, True: 9} 39 23.0769230769
[5755L, u'Mary', u'Bartle'] {False: 7, True: 2} 9 22.2222222222
[5778L, u'Nellie', u'Russey'] {False: 8, True: 2} 10 20.0
[5783L, u'Grace', u'Parker'] {False: 6, True: 2} 8 25.0
[5787L, u'Howard', u'Rinewalt'] {False: 6, True: 2} 8 25.0
[5795L, u'Clair', u'Stephens'] {False: 12, True: 5} 17 29.4117647059
[5812L, u'Clarence', u'Reamer'] {False: 8, True: 3} 11 27.2727272727
[5814L, u'Kenneth', u'Reid'] {False: 19, True: 5} 24 20.8333333333
[5823L, u'Velma', u'Witamyer'] {False: 22, True: 4} 26 15.3846153846
[5825L, u'Gertie', u'Kennedy'] {False: 28, True: 5} 33 15.1515151515
[5826L, u'Charlie', u'Thompson'] {False: 6, True: 2} 8 25.0
[5832L, u'Malcolm', u'Harriott'] {False: 22, True: 5} 27 18.5185185185
[5835L, u'Chas', u'Oxley'] {False: 10, True: 5} 15 33.3333333333
[5844L, u'Crawford', u'Murton'] {False: 12, True: 4} 16 25.0
[5850L, u'Cora', u'Calvert'] {False: 5, True: 2} 7 28.5714285714
[5854L, u'Bertie', u'Reisor'] {False: 38, True: 8} 46 17.3913043478
[5861L, u'Carl', u'May'] {False: 9, True: 2} 11 18.1818181818
[5863L, u'Grace', u'Flinn'] {False: 37, True: 10} 47 21.2765957447
[5866L, u'Daisy', u'Hamilton'] {False: 22, True: 5} 27 18.5185185185
[5867L, u'Walter', u'Mercer'] {False: 29, True: 6} 35 17.1428571429
[5873L, u'Lillie', u'Lynn'] {False: 12, True: 3} 15 20.0
[5876L, u'Haley', u'McVay'] {False: 11, True: 2} 13 15.3846153846
[5881L, u'C.', u'Helvie'] {False: 4, True: 3} 7 42.8571428571
[5882L, u'Earl', u'Dillman'] {False: 34, True: 7} 41 17.0731707317
[5888L, u'Orva', u'Emerson'] {False: 8, True: 2} 10 20.0
[5889L, u'Turner', u'McKinney'] {False: 25, True: 7} 32 21.875
[5893L, u'Jennie', u'Brooker'] {False: 15, True: 3} 18 16.6666666667
[5902L, u'Grace', u'Chew'] {False: 8, True: 2} 10 20.0
[5903L, u'Medora', u'Hopkins'] {False: 12, True: 3} 15 20.0
[5904L, u'E.', u'Roberts'] {False: 5, True: 1} 6 16.6666666667
[5913L, u'Winfred', u'Weaver'] {False: 16, True: 3} 19 15.7894736842
[5926L, u'Clara', u'Smith'] {False: 39, True: 7} 46 15.2173913043
[5929L, u'Willie', u'Mudge'] {False: 10, True: 4} 14 28.5714285714
[5930L, u'Harry', u'McDaniel'] {False: 16, True: 4} 20 20.0
[5937L, u'Frank', u'Lashley'] {False: 5, True: 1} 6 16.6666666667
[5938L, u'V.', u'Prather'] {False: 13, True: 3} 16 18.75
[5940L, u'Willard', u'Lego'] {False: 22, True: 5} 27 18.5185185185
[5944L, u'Charles', u'Martin'] {False: 14, True: 4} 18 22.2222222222
[5950L, u'Louis', u'Shaffer'] {False: 10, True: 4} 14 28.5714285714
[5957L, u'Frank', u'Reese'] {False: 36, True: 8} 44 18.1818181818
[5959L, u'Edgar', u'Driscoll'] {False: 36, True: 7} 43 16.2790697674
[5973L, u'John', u'Jensma'] {False: 21, True: 6} 27 22.2222222222
[5977L, u'Cora', u'Retherford'] {False: 7, True: 4} 11 36.3636363636
[5995L, u'M.', u'Thompson'] {False: 18, True: 5} 23 21.7391304348
[6003L, u'Norman', u'Winters'] {False: 26, True: 5} 31 16.1290322581
[6007L, u'Lawrence', u'Hermann'] {False: 5, True: 2} 7 28.5714285714
[6009L, u'Carrie', u'Little'] {False: 9, True: 3} 12 25.0
[6011L, u'Blanche', u'Hughes'] {False: 10, True: 4} 14 28.5714285714
[6013L, u'Emory', u'Ullom'] {False: 31, True: 6} 37 16.2162162162
[6027L, u'Orville', u'Zook'] {False: 5, True: 2} 7 28.5714285714
[6030L, u'H', u'Spickermon'] {False: 9, True: 4} 13 30.7692307692
[6048L, u'Homer', u'Murray'] {False: 5, True: 1} 6 16.6666666667
[6050L, u'R.', u'Murphy'] {False: 28, True: 5} 33 15.1515151515
[6069L, u'Mary', u'Finley'] {False: 5, True: 3} 8 37.5
[6075L, u'Minnie', u'Parkin'] {False: 16, True: 5} 21 23.8095238095
[6092L, u'Della', u'Ault'] {False: 16, True: 4} 20 20.0
[6100L, u'Bessie', u'McAuley'] {False: 5, True: 1} 6 16.6666666667
[6101L, u'Gale', u'Bunton'] {False: 4, True: 2} 6 33.3333333333
[6104L, u'M.', u'Ludwig'] {False: 6, True: 3} 9 33.3333333333
[6105L, u'O', u'Owens'] {False: 5, True: 1} 6 16.6666666667
[6114L, u'F.', u'Heisenheimer'] {False: 26, True: 5} 31 16.1290322581
[6134L, u'Laura', u'Aery'] {False: 13, True: 3} 16 18.75
[6136L, u'Bertha', u'Bryan'] {False: 6, True: 2} 8 25.0
[6137L, u'Ray', u'Clark'] {False: 10, True: 2} 12 16.6666666667
[6138L, u'Max', u'Hutzel'] {False: 10, True: 2} 12 16.6666666667
[6156L, u'Edna', u'Calvin'] {False: 7, True: 2} 9 22.2222222222
[6162L, u'Lulu', u'Badders'] {False: 10, True: 2} 12 16.6666666667
[6163L, u'Ella', u'Phillips'] {False: 15, True: 3} 18 16.6666666667
[6164L, u'Robert', u'Best'] {False: 21, True: 4} 25 16.0
[6169L, u'Geo.', u'Rivers'] {False: 13, True: 3} 16 18.75
[6178L, u'Rosa', u'Needles'] {False: 6, True: 4} 10 40.0
[6194L, u'Bertha', u'DeMount'] {False: 7, True: 2} 9 22.2222222222
[6210L, u'Harry', u'Walton'] {False: 6, True: 3} 9 33.3333333333
[6212L, u'Arthur', u'McWhorter'] {False: 9, True: 2} 11 18.1818181818
[6217L, u'Jessie', u'Whitcomb'] {False: 5, True: 1} 6 16.6666666667
[6218L, u'Carl', u'O Harra'] {False: 4, True: 2} 6 33.3333333333
[6222L, u'E', u'Albright'] {False: 23, True: 7} 30 23.3333333333
[6228L, u'Clarence', u'Hutchinson'] {False: 15, True: 3} 18 16.6666666667
[6235L, u'Warren', u'Freeman'] {False: 5, True: 2} 7 28.5714285714
[6242L, u'Harry', u'Lowe.'] {False: 5, True: 1} 6 16.6666666667
[6244L, u'Charlotte', u'Stump'] {False: 5, True: 1} 6 16.6666666667
[6267L, u'Florence', u'Bavis'] {False: 5, True: 1} 6 16.6666666667

n_heavy_readers 314

[[49, 2814L, u'Bobbie', u'Knowlton', {False: 548, True: 49}]]
[[47, 715L, u'Addie', u'Knowlton', {False: 435, True: 47}]]
[[43, 2087L, u'W', u'Snyder', {False: 394, True: 43}]]
[[38, 5028L, u'Harry', u'Ritter', {False: 356, True: 38}]]
[[35, 4014L, u'E.', u'Templer', {False: 395, True: 35}]]
[[34, 3200L, u'Earl', u'Williams', {False: 278, True: 34}]]
[[34, 1217L, u'R.', u'Monroe', {False: 300, True: 34}]]
[[31, 4524L, u'Wysor', u'Marsh', {False: 429, True: 31}]]
[[31, 4314L, u'Edna', u'Smith', {False: 263, True: 31}]]
[[30, 3788L, u'Ralph', u'Jackson', {False: 284, True: 30}]]
[[29, 3271L, u'Claud', u'Smith', {False: 227, True: 29}]]
[[28, 4501L, u'Edna', u'Hoover', {False: 306, True: 28}]]
[[28, 3494L, u'Herbert', u'Houze', {False: 290, True: 28}]]
[[28, 3325L, u'I.', u'Saxon', {False: 607, True: 28}]]
[[28, 3110L, u'Wayman', u'Adams', {False: 257, True: 28}]]
[[28, 2850L, u'Theo.', u'Johnson', {False: 305, True: 28}]]
[[27, 4462L, u'Mary', u'Snider', {False: 141, True: 27}]]
[[26, 3487L, u'Harry', u'Ault', {False: 200, True: 26}]]
[[25, 1802L, u'Vollie', u'Bower', {False: 234, True: 25}]]
[[24, 5375L, u'Frank', u'Leon', {False: 312, True: 24}]]
[[24, 3116L, u'Jas.', u'Ross', {False: 202, True: 24}]]
[[24, 2885L, u'Orville', u'Spurgeon', {False: 171, True: 24}]]
[[23, 3331L, u'Karl', u'Nutting', {False: 193, True: 23}]]
[[23, 3308L, u'Guy', u'Tweedy', {False: 218, True: 23}]]
[[23, 2783L, u'Earl', u'Nutting', {False: 143, True: 23}]]
[[23, 2452L, u'Omer', u'Mitchell', {False: 483, True: 23}]]
[[22, 4513L, u'Ralph', u'Winters', {False: 218, True: 22}]]
[[22, 4285L, u'Katie', u'Sullivan', {False: 63, True: 22}]]
[[22, 3170L, u'Ella', u'Carey', {False: 332, True: 22}]]
[[22, 2649L, u'Helen', u'Hickman', {False: 350, True: 22}]]
[[22, 2534L, u'Allie', u'McMillan', {False: 263, True: 22}]]
[[21, 4736L, u'Austin', u'Kerin', {False: 106, True: 21}]]
[[21, 4380L, u'Merril', u'Skinner', {False: 242, True: 21}]]
[[21, 4168L, u'Rudolph', u'Bloom', {False: 317, True: 21}]]
[[21, 4103L, u'Clark', u'Munn', {False: 233, True: 21}]]
[[21, 3358L, u'Nellie', u'Spooner', {False: 145, True: 21}]]
[[21, 2970L, u'Louis', u'Bloom', {False: 266, True: 21}]]
[[21, 2942L, u'Homer', u'Dowell', {False: 145, True: 21}]]
[[20, 4991L, u'Warren', u'Hutsell', {False: 157, True: 20}]]
[[20, 4979L, u'Leonard', u'Leslie', {False: 157, True: 20}]]
[[20, 4242L, u'Dora', u'Mitchell', {False: 368, True: 20}]]
[[20, 4023L, u'Roy', u'Harrington', {False: 139, True: 20}]]
[[20, 3927L, u'Roscoe', u'Lorentz', {False: 192, True: 20}]]
[[20, 3418L, u'Leslie', u'Greely', {False: 155, True: 20}]]
[[20, 3390L, u'Warren', u'Sample', {False: 195, True: 20}]]
[[19, 4637L, u'Harry', u'White', {False: 161, True: 19}]]
[[19, 4358L, u'J.', u'Leatherman', {False: 317, True: 19}]]
[[19, 3742L, u'Carrie', u'Cohn', {False: 344, True: 19}]]
[[19, 3689L, u'Charles', u'Reece', {False: 298, True: 19}]]
[[19, 3647L, u'E.', u'Younce', {False: 115, True: 19}]]
In [12]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

plt.figure(figsize=(12,12))

plt.title('LOST CAUSE CORPUS READING -- ALL PATRONS')

plt.xlabel('total number of checkouts')
plt.ylabel('checkouts from our corpus')

plt.scatter(x_for_plot_1, y_for_plot_1, s=50, alpha=.15)

# Fit an ordinary least-squares line through the points and overlay it in red
slope, intercept, r_value, p_value, std_err = stats.linregress(x_for_plot_1, y_for_plot_1)

line = slope*np.array(x_for_plot_1)+intercept

plt.plot(x_for_plot_1, line, 'r')
Out[12]:
[<matplotlib.lines.Line2D at 0x7f788a372c50>]
In [13]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

plt.figure(figsize=(12,12))

plt.title('LOST CAUSE CORPUS READING -- PATRONS <= 100 CHECKOUTS')

plt.xlabel('total number of checkouts')
plt.ylabel('checkouts from our corpus')

plt.scatter(x_for_plot_2, y_for_plot_2, s=50, alpha=.15)

# Same fit as above, restricted to patrons with 100 or fewer checkouts
slope, intercept, r_value, p_value, std_err = stats.linregress(x_for_plot_2, y_for_plot_2)

line = slope*np.array(x_for_plot_2)+intercept

plt.plot(x_for_plot_2, line, 'r')
Out[13]:
[<matplotlib.lines.Line2D at 0x7f7889fe1f10>]
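
For anyone wondering what `stats.linregress` is fitting in the two cells above, here is a minimal sketch of the same ordinary least-squares line in plain Python. The function name `fit_line` and the toy numbers are my own illustrative assumptions, not values from the Muncie database. It also returns r, which the cells above capture as `r_value` but never display.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept, r)."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = ss_xy / ss_xx
    intercept = mean_y - slope * mean_x
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    return slope, intercept, r


# Hypothetical patrons: total checkouts (x) and checkouts from the corpus (y)
toy_x = [10, 20, 40, 80, 100]
toy_y = [1, 3, 4, 9, 10]

slope, intercept, r = fit_line(toy_x, toy_y)
```

Squaring `r` gives the share of variance in corpus checkouts that the line accounts for, which is probably the number worth reporting next to each plot.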

What sort of demographics do we have?

In [14]:
from collections import defaultdict

patron_trust = defaultdict(int)
transaction_trust = defaultdict(int)

for pn, p in enumerate(all_patrons):
    
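    # Look up this patron's census-derived demographics (one query per patron).
    # The patron number is passed as a query parameter rather than concatenated.
    cA = db.cursor()
    cA.execute('SELECT DISTINCT TRUST_THIS_CENSUS, PATRON_AGE, GENDER '
               'FROM flattenedData WHERE patron_number = %s',
               (p[0],))
    resultsA = cA.fetchall()
               
    # Make sure the record is a list, then append (trust, age, gender);
    # only the first distinct row is kept
    all_patrons[pn] = list(all_patrons[pn])
    all_patrons[pn].append(resultsA[0])
    
    # Tally patrons and their transactions (p[3]) by the trust flag
    patron_trust[resultsA[0][0]] += 1
    transaction_trust[resultsA[0][0]] += len(p[3])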
    
print
print 'nbr patrons with "trust this census" (value 1) vs not (value 0)'
print
for k, v in patron_trust.iteritems():
    print k, v
    
print
print 'nbr transactions with "trust this census" (value 1) vs not (value 0)'
print
for k, v in transaction_trust.iteritems():
    print k, v
nbr patrons with "trust this census" (value 1) vs not (value 0)

0 1041
1 1495

nbr transactions with "trust this census" (value 1) vs not (value 0)

0 52787
1 104665

. . . so I can see what I have

In [15]:
print all_patrons[0]
[6L, u'Albert', u'Carpenter', [(u'Castle Hohenwald', u'Streckfuss, Adolf', 8350L, False), (u'Castle Hohenwald', u'Streckfuss, Adolf', 8350L, False), (u'At the councillors', u'John, Eugenie', 8565L, False), (u'Harpers young people', u'', 8846L, False), (u'Lippincotts monthly magazine', u'', 7175L, False), (u'Sweet', u'Bouvet, Marguerite', 9164L, False), (u'Harpers young people', u'', 9274L, False), (u'Gold Elsie', u'John, Eugenie', 6617L, False), (u'Gold Elsie', u'John, Eugenie', 6617L, False), (u'Sweet', u'Bouvet, Marguerite', 9164L, False), (u'The circuit rider', u'Eggleston, Edward', 2022L, False), (u'The circuit rider', u'Eggleston, Edward', 2022L, False), (u'St Nicholas', u'', 8855L, False), (u'Harpers young people', u'', 8851L, False), (u'The Popular science monthly', u'', 9264L, False), (u'St Elmo', u'Evans, Augusta J', 6597L, True), (u'St Elmo', u'Evans, Augusta J', 6597L, True), (u'Half-hours with the best authors', u'Knight, Charles', 9022L, False), (u'Her life, letters and journals', u'Alcott, Louisa May', 8221L, False), (u'Annual report of the Chief Signal Officer made to the Secretary of War for the year', u'United States Army Signal Corps', 8553L, False), (u'Ragged Dick', u'Alger, Horatio', 7727L, False), (u'Fame and fortune, or, The progress of Richard Hunter', u'Alger, Horatio', 7728L, False), (u'Little Lord Fauntleroy', u'Burnett, Frances Hodgson', 6571L, False), (u'My hearts darling', u'Heimburg, W', 8649L, False), (u'The story of Patsy', u'Wiggin, Kate Douglas Smith', 8347L, False), (u'Five little Peppers and how they grew', u'Sidney, Margaret', 8472L, False), (u'Pecks boss book', u'Peck, George W', 7779L, False), (u'Helens babies', u'Habberton, John', 2670L, False), (u'The Holly-tree Inn', u'Dickens, Charles', 2008L, False), (u'Five little Peppers midway', u'Sidney, Margaret', 8473L, False), (u'His sombre rivals', u'Roe, Edward Payson', 8493L, True), (u'Castle Hohenwald', u'Streckfuss, Adolf', 8350L, False), (u'The lady with the rubies', u'John, 
Eugenie', 9046L, False), (u'Sweet as a rose', u'Durward, Mostyn', 7435L, False), (u'Harpers young people', u'', 8978L, False), (u'St Nicholas', u'', 9270L, False), (u'The Swiss family Robinson', u'Wyss, Johann David', 10451L, False), (u'The starry flag', u'Adams, William Tq', 8334L, False), (u'Infelice', u'Evans, Augusta J', 8505L, True), (u'The practical metal worker', u'', 2113L, False), (u'Two little pilgrims progress', u'Burnett, Frances Hodgson', 9586L, False), (u'Make or break', u'Adams, William Tq', 9135L, False), (u'On time, or, The young captain of the Ucayga steamer', u'Adams, William Tq', 8332L, False), (u'Seek and find', u'Adams, William Tq', 9134L, False), (u'Freaks of fortune, or, Half round the world', u'Adams, William Tq', 9132L, False), (u'The telegraph boy', u'Alger, Horatio', 8444L, False), (u'Brake up', u'Adams, William Tq', 9145L, False), (u'Lightning express, or, The Rival academies', u'Adams, William Tq', 535L, False), (u'Switch off, or, The war of the students', u'Adams, William Tq', 8333L, False), (u'The land of pluck', u'Dodge, Mary Mapes', 9649L, False), (u'Down the river', u'Adams, William Tq', 9136L, True), (u'Through by daylight, or, The young engineer of the Lake Shore Railroad', u'Adams, William Tq', 534L, False), (u'Luck and pluck', u'Alger, Horatio', 8447L, False), (u'Rough and ready, or, Life among the New York newsboys', u'Alger, Horatio', 7730L, False), (u'A victorious union', u'Adams, William Tq', 9152L, True), (u'A victorious union', u'Adams, William Tq', 9152L, True), (u'A popular account of the ancient Egyptians', u'Wilkinson, J Gardner', 325L, False), (u'Taken by the enemy', u'Adams, William Tq', 9147L, True), (u'Bens nugget, or, A boys search for fortune', u'Alger, Horatio', 7725L, False), (u'Stand by the Union', u'Adams, William Tq', 9150L, True), (u'The War of the Rebellion', u'United States War Dept', 9078L, False), (u'Jacks ward, or, The boy guardian', u'Alger, Horatio', 8456L, False), (u'Within the enemys lines', 
u'Adams, William Tq', 9148L, True), (u'On the blockade', u'Adams, William Tq', 9149L, True), (u'Wait and hope, or, Ben Bradfords motto', u'Alger, Horatio', 8458L, False), (u'Erlach court', u'Schubin, Ossip', 8563L, False), (u'A brief history of the United States', u'Steele, Joel Dorman', 7338L, False), (u'Don Gordons shooting-box', u'Fosdick, Charles Austin', 8435L, False), (u'Jack Hazard and his fortunes', u'Trowbridge, J T', 8377L, False), (u'The fast mail', u'Drysdale, William', 10962L, False), (u'Prince Tip-Top', u'Bouvet, Marguerite', 9174L, False), (u'The story of Babette', u'Stuart, Ruth McEnery', 9650L, False), (u'The young circus rider, or, The mystery of Robert Rudd', u'Alger, Horatio', 8475L, False), (u'Helping himself, or, Grant Thorntons ambition', u'Alger, Horatio', 8479L, False), (u'Bound to rise', u'Alger, Horatio', 8452L, False), (u'The childrens wonder book', u'', 9657L, False), (u'The young circus rider, or, The mystery of Robert Rudd', u'Alger, Horatio', 8475L, False), (u'The jo-boat boys', u'Cowan, John F', 10558L, False), (u'All adrift, or, The Goldwing Club', u'Adams, William Tq', 10226L, False), (u'Snug Harbor, or, The Champlain mechanics', u'Adams, William Tq', 10227L, False), (u'Square and compasses', u'Adams, William Tq', 10228L, False), (u'Papers relating to the foreign relations of the United States', u'United States Dept of State', 8256L, False), (u'Little Saint Elizabeth', u'Burnett, Frances Hodgson', 7703L, False), (u'Elsie Dinsmore', u'Finley, Martha', 9196L, False), (u'Go-ahead', u'Fosdick, Charles Austin', 11865L, False), (u'Frank in the woods', u'Fosdick, Charles Austin', 11849L, False), (u'No moss, or, The career of a rolling stone', u'Fosdick, Charles Austin', 11864L, False), (u'Through by daylight', u'Adams, William Tq', 11500L, False), (u'The story of a bad boy', u'Aldrich, Thomas Bailey', 9161L, True), (u'At war with Pontiac', u'', 9599L, False), (u'The domestic blunders of women, by a mere man', u'Moore, Augustus', 12143L, 
False), (u'Harpers round table', u'', 11145L, False), (u'Tom Thatchers fortune', u'Alger, Horatio', 11890L, False), (u'The store boy', u'Alger, Horatio', 11882L, False), (u'Dan, the newsboy', u'Alger, Horatio', 10430L, False), (u'Joe Wayring at home, or, The adventures of a fly-rod', u'Fosdick, Charles Austin', 8431L, False), (u'Dorsey the young inventor', u'Ellis, Edward Sylvester', 11480L, False), (u'The last of the Mohicans', u'Cooper, James Fenimore', 9667L, False), (u'A woman tenderfoot', u'Seton-Thompson, Grace Gallatin', 12629L, False), (u'The prisoner of Zenda', u'Hawkins, Anthony Hope', 11429L, False), (u'Billy Baxters letters', u'Kountz, William J', 12617L, False), (u'The other fellow', u'Smith, Francis Hopkinson', 11779L, True), (u'A popular history of the United States of America', u'Ridpath, John Clark', 8535L, False), (u'George in camp', u'Fosdick, Charles Austin', 11866L, False), (u'Tony, the hero', u'Alger, Horatio', 11879L, False), (u'Hoosier schoolboy', u'Eggleston, Edward', 9623L, False), (u'Making fate', u'Alden, Isabella Macdonald', 10932L, False), (u'The reign of law', u'Allen, James Lane', 11914L, True), (u'Lorraine', u'Chambers, Robert W', 12422L, False), (u'Alice of old Vincennes', u'Thompson, Maurice', 12194L, False), (u'Little men', u'Alcott, Louisa May', 11835L, False), (u'Donald and Dorothy', u'Dodge, Mary Mapes', 11672L, False), (u'Donald and Dorothy', u'Dodge, Mary Mapes', 11672L, False)], {False: 99, True: 14}, (1L, 36L, u'Male')]
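
Since everything downstream indexes into this record by position, it may help to spell out the layout: patron number, first name, last name, the list of checkouts (title, author, accession number, in-corpus flag), the False/True tally of corpus checkouts, and finally the (trust, age, gender) tuple. The sketch below unpacks a shortened copy of the record. The namedtuple field names are hypothetical labels of mine; the notebook itself only uses positional indexes.

```python
from collections import namedtuple

# Hypothetical field names for the positional tuples used in the notebook
Checkout = namedtuple('Checkout', ['title', 'author', 'accession_number', 'in_corpus'])
Demographics = namedtuple('Demographics', ['trust_this_census', 'age', 'gender'])

# A shortened copy of the record printed above (three checkouts only,
# so the tally dict is adjusted to match)
record = [6, u'Albert', u'Carpenter',
          [(u'Castle Hohenwald', u'Streckfuss, Adolf', 8350, False),
           (u'St Elmo', u'Evans, Augusta J', 6597, True),
           (u'Infelice', u'Evans, Augusta J', 8505, True)],
          {False: 1, True: 2},
          (1, 36, u'Male')]

patron_number, first_name, last_name = record[:3]
checkouts = [Checkout(*t) for t in record[3]]
corpus_counts = record[4]
demographics = Demographics(*record[-1])

# Titles this patron borrowed that belong to the corpus
corpus_titles = [c.title for c in checkouts if c.in_corpus]
```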

Create the data for the bubble graphs
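
Judging from the commented-out print statements in the cell below, each bubble is one row of the form `[pct_male, mean_age, n_transactions, "label"]`, and items borrowed by only one gender are skipped. A minimal sketch of that formatting step, assuming the same dictionary keys the cell builds; the function name `bubble_row` and the example counts are hypothetical.

```python
def bubble_row(label, counts):
    """Format one row as [pct_male, mean_age, n_transactions, "label"]."""
    n_transactions = counts['Male'] + counts['Female']
    # Mirror the commented-out code: skip items borrowed by only one gender
    if counts['Male'] == 0 or counts['Female'] == 0:
        return None
    pct_male = float(counts['Male']) / n_transactions * 100
    return '[%.2f,%.2f,%d,"%s"],' % (pct_male, counts['mean_age'], n_transactions, label)


# Hypothetical counts, not taken from the database
example = {'Male': 1, 'Female': 3, 'mean_age': 20.5}
row = bubble_row(u'Author, Some: Some title', example)
```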

In [31]:
from collections import defaultdict
import numpy as np

book_demographics = defaultdict(list)

for pn, p in enumerate(all_patrons):
    
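    # The last element is the (trust, age, gender) tuple appended in cell 14
    trust_this_one = p[-1][0]
    age = p[-1][1]
    gender = p[-1][2]
    
    # Only count patrons whose census match is trusted
    if trust_this_one == 1:
        
        for b in p[3]:
            # b[-1] flags checkouts of titles in the corpus; key on (title, author)
            if b[-1]:
                book_demographics[b[:2]].append([age, gender])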
                
print 'len(book_demographics)', len(book_demographics)

book_demographics_counts = defaultdict(dict)
author_demographics_counts = {}

for k, v in book_demographics.iteritems():
    
    all_genders = {'Male': 0, 'Female': 0}
    all_ages = []
        
    for a in v:
        all_ages.append(a[0])
        all_genders[a[1]] += 1
        
    book_demographics_counts[k] = {'Male': all_genders['Male'], 
                                    'Female': all_genders['Female'],
                                    'mean_age': np.mean(all_ages)}
    # Start a running tally the first time this author appears
    if k[1] not in author_demographics_counts:
        author_demographics_counts[k[1]] = {'Male': 0, 'Female': 0, 'mean_age': 0.0, 'all_ages': []}
    
    author_demographics_counts[k[1]]['Male'] += all_genders['Male']
    author_demographics_counts[k[1]]['Female'] += all_genders['Female']
    author_demographics_counts[k[1]]['all_ages'] += all_ages

for k, v in author_demographics_counts.iteritems():
    author_demographics_counts[k]['mean_age'] = np.mean(author_demographics_counts[k]['all_ages'])
    
#for k, v in author_demographics_counts.iteritems():
    #print k, v
    
#    n_transactions = v['Male'] + v['Female']
#    pct_male = float(v['Male']) / n_transactions * 100
    
#    if v['Male'] == 0 or v['Female'] == 0:
#        continue
    
#    print '[' + \
#        ('%.2f' % pct_male) + \
#        ',' + \
#        ('%.2f' % v['mean_age']) + \
#        ',' + \
#        str(n_transactions) + \
#        ',' + \
#        '"' + k + '"],'
    
print
for k, v in book_demographics_counts.iteritems():
    print k, v


#print
#for k, v in book_demographics_counts.iteritems():
    
#    n_transactions = v['Male'] + v['Female']
#    pct_male = float(v['Male']) / n_transactions * 100
    
#    if v['Male'] == 0 or v['Female'] == 0:
#        continue
    
#    print '[' + \
#        ('%.2f' % pct_male) + \
#        ',' + \
#        ('%.2f' % v['mean_age']) + \
#        ',' + \
#        str(n_transactions) + \
#        ',' + \
#        '"' + k[1] + ': ' + k[0] + '"],'
len(book_demographics) 123

(u'St Elmo', u'Evans, Augusta J') {'Male': 65, 'Female': 134, 'mean_age': 24.34673366834171}
(u'The earth trembled', u'Roe, Edward Payson') {'Male': 30, 'Female': 92, 'mean_age': 25.10655737704918}
(u'Northern Georgia sketches', u'Harben, Will N') {'Male': 7, 'Female': 5, 'mean_age': 29.166666666666668}
(u'Dorothy South', u'Eggleston, George Cary') {'Male': 6, 'Female': 8, 'mean_age': 26.428571428571427}
(u'Eastover Court House', u'Boone, Henry Burnham') {'Male': 15, 'Female': 29, 'mean_age': 25.318181818181817}
(u'Miss Lou', u'Roe, Edward Payson') {'Male': 21, 'Female': 77, 'mean_age': 21.489795918367346}
(u'To have and to hold', u'Johnston, Mary') {'Male': 28, 'Female': 69, 'mean_age': 25.195876288659793}
(u'Pocahontas', u'Musick, John R') {'Male': 26, 'Female': 6, 'mean_age': 22.03125}
(u'Nemesis', u'Harland, Marion') {'Male': 34, 'Female': 76, 'mean_age': 26.654545454545456}
(u'The puritan and his daughter', u'Paulding, James Kirke') {'Male': 0, 'Female': 6, 'mean_age': 42.666666666666664}
(u'Infelice', u'Evans, Augusta J') {'Male': 30, 'Female': 81, 'mean_age': 27.792792792792792}
(u'The grapes of wrath', u'Norris, Mary Harriott') {'Male': 9, 'Female': 24, 'mean_age': 27.545454545454547}
(u'Throckmorton', u'Seawell, Molly Elliot') {'Male': 0, 'Female': 4, 'mean_age': 33.25}
(u'Aftermath', u'Allen, James Lane') {'Male': 13, 'Female': 36, 'mean_age': 30.020408163265305}
(u'Prisoners of hope', u'Johnston, Mary') {'Male': 30, 'Female': 87, 'mean_age': 26.384615384615383}
(u'With Lee in Virginia', u'Henty, G A') {'Male': 47, 'Female': 12, 'mean_age': 19.084745762711865}
(u'Pocahontas', u'Eggleston, Edward') {'Male': 3, 'Female': 0, 'mean_age': 12.333333333333334}
(u'In circling camps', u'Altsheler, Joseph A') {'Male': 12, 'Female': 18, 'mean_age': 23.7}
(u'A victorious union', u'Adams, William Tq') {'Male': 88, 'Female': 15, 'mean_age': 18.37864077669903}
(u'The choir invisible', u'Allen, James Lane') {'Male': 34, 'Female': 85, 'mean_age': 26.689075630252102}
(u'Down the river', u'Adams, William Tq') {'Male': 90, 'Female': 28, 'mean_age': 18.78813559322034}
(u'The planters northern bride', u'Hentz, Caroline Lee') {'Male': 52, 'Female': 119, 'mean_age': 22.976608187134502}
(u'Adventures of Huckleberry Finn', u'Clemens, Samuel Langhorne') {'Male': 70, 'Female': 16, 'mean_age': 18.546511627906977}
(u'In connection with the De Willoughby claim', u'Burnett, Frances Hodgson') {'Male': 16, 'Female': 34, 'mean_age': 27.56}
(u'Frank on a gun-boat', u'Fosdick, Charles Austin') {'Male': 197, 'Female': 42, 'mean_age': 16.778242677824267}
(u'Bear and forbear', u'Adams, William Tq') {'Male': 96, 'Female': 28, 'mean_age': 17.870967741935484}
(u'Hearts courageous', u'Rives, Hallie Erminie') {'Male': 4, 'Female': 8, 'mean_age': 20.166666666666668}
(u'The blue-grass region of Kentucky', u'Allen, James Lane') {'Male': 10, 'Female': 21, 'mean_age': 26.774193548387096}
(u'Ramona', u'Jackson, Helen Hunt') {'Male': 7, 'Female': 32, 'mean_age': 26.94871794871795}
(u'Louisisana', u'Burnett, Frances Hodgson') {'Male': 32, 'Female': 79, 'mean_age': 23.936936936936938}
(u'A fools errand', u'Tourge\u0301e, Albion Winegar') {'Male': 2, 'Female': 2, 'mean_age': 29.5}
(u'Guert Ten Eyck', u'Stoddard, William Osborn') {'Male': 49, 'Female': 16, 'mean_age': 17.96923076923077}
(u'Two little Confederates', u'Page, Thomas Nelson') {'Male': 48, 'Female': 34, 'mean_age': 19.9390243902439}
(u'The boys of 61', u'Coffin, Charles Carleton') {'Male': 33, 'Female': 5, 'mean_age': 19.973684210526315}
(u'Lena Rivers', u'Holmes, Mary Jane') {'Male': 34, 'Female': 84, 'mean_age': 22.059322033898304}
(u'The cavalier', u'Cable, George Washington') {'Male': 5, 'Female': 15, 'mean_age': 23.95}
(u'Moriahs mourning', u'Stuart, Ruth McEnery') {'Male': 12, 'Female': 14, 'mean_age': 24.46153846153846}
(u'Linda', u'Hentz, Caroline Lee') {'Male': 12, 'Female': 45, 'mean_age': 25.31578947368421}
(u'The young lieutenant, or The adventures of an army officer', u'Adams, William Tq') {'Male': 81, 'Female': 14, 'mean_age': 17.263157894736842}
(u'The head of a hundred', u'Goodwin, Maud Wilder') {'Male': 15, 'Female': 23, 'mean_age': 31.105263157894736}
(u'Africa and the American flag', u'Foote, Andrew H') {'Male': 3, 'Female': 0, 'mean_age': 12.666666666666666}
(u'The Kentuckians', u'Fox, John') {'Male': 24, 'Female': 47, 'mean_age': 26.91549295774648}
(u'The generals double', u'King, Charles') {'Male': 21, 'Female': 39, 'mean_age': 26.133333333333333}
(u'The strength of Gideon, and other stories', u'Dunbar, Paul Laurence') {'Male': 7, 'Female': 13, 'mean_age': 24.3}
(u'Maryland manor', u'Emory, Frederic') {'Male': 5, 'Female': 31, 'mean_age': 24.77777777777778}
(u'John March, Southerner', u'Cable, George Washington') {'Male': 0, 'Female': 2, 'mean_age': 36.0}
(u'The soldier boy', u'Adams, William Tq') {'Male': 1, 'Female': 0, 'mean_age': 14.0}
(u'Macaria', u'Evans, Augusta J') {'Male': 22, 'Female': 66, 'mean_age': 25.045454545454547}
(u'Elsies womanhood', u'Finley, Martha') {'Male': 42, 'Female': 185, 'mean_age': 19.94713656387665}
(u'The voice of the people', u'Glasgow, Ellen Anderson Gholson') {'Male': 15, 'Female': 23, 'mean_age': 26.736842105263158}
(u'Stand by the Union', u'Adams, William Tq') {'Male': 98, 'Female': 27, 'mean_age': 18.888}
(u'The red badge of courage', u'Crane, Stephen') {'Male': 10, 'Female': 21, 'mean_age': 26.419354838709676}
(u'The three beauties', u'Southworth, Emma Dorothy Eliza Nevitte') {'Male': 23, 'Female': 43, 'mean_age': 27.060606060606062}
(u'The Orpheus C Kerr papers', u'Newell, R H') {'Male': 1, 'Female': 0, 'mean_age': 15.0}
(u'The boys of 76', u'Coffin, Charles Carleton') {'Male': 53, 'Female': 15, 'mean_age': 20.220588235294116}
(u'The battle of New York', u'Stoddard, William Osborn') {'Male': 24, 'Female': 3, 'mean_age': 17.333333333333332}
(u'Red Rock', u'Page, Thomas Nelson') {'Male': 35, 'Female': 82, 'mean_age': 27.521367521367523}
(u'Franks campaign, or, The farm and the camp', u'Alger, Horatio') {'Male': 43, 'Female': 18, 'mean_age': 19.721311475409838}
(u'On the winning side', u'Walworth, Jeannette H') {'Male': 11, 'Female': 46, 'mean_age': 26.19298245614035}
(u'Beulah', u'Evans, Augusta J') {'Male': 30, 'Female': 92, 'mean_age': 25.434426229508198}
(u'The crisis', u'Churchill, Winston') {'Male': 16, 'Female': 33, 'mean_age': 30.714285714285715}
(u'Nature and human nature', u'Haliburton, Thomas Chandler') {'Male': 6, 'Female': 2, 'mean_age': 35.0}
(u'Swallow barn', u'Kennedy, John Pendleton') {'Male': 0, 'Female': 3, 'mean_age': 32.0}
(u'True to his colors', u'Fosdick, Charles Austin') {'Male': 101, 'Female': 16, 'mean_age': 16.333333333333332}
(u'Science in story', u'Foote, Edward B') {'Male': 3, 'Female': 2, 'mean_age': 15.0}
(u'Tom, the bootblack', u'Alger, Horatio') {'Male': 48, 'Female': 25, 'mean_age': 15.986301369863014}
(u'The iron game', u'Keenan, Henry F') {'Male': 2, 'Female': 4, 'mean_age': 24.333333333333332}
(u'A Kentucky cardinal', u'Allen, James Lane') {'Male': 26, 'Female': 72, 'mean_age': 25.816326530612244}
(u'On the plantation', u'Harris, Joel Chandler') {'Male': 8, 'Female': 23, 'mean_age': 25.870967741935484}
(u'The Berkeleys and their neighbors', u'Seawell, Molly Elliot') {'Male': 0, 'Female': 1, 'mean_age': 24.0}
(u'Men of iron', u'Pyle, Howard') {'Male': 31, 'Female': 7, 'mean_age': 20.31578947368421}
(u'Elsie Dinsmore', u'Finley, Martha') {'Male': 28, 'Female': 128, 'mean_age': 18.94871794871795}
(u'From school to battlefield', u'King, Charles') {'Male': 47, 'Female': 13, 'mean_age': 17.916666666666668}
(u'Rodney the partisan', u'Fosdick, Charles Austin') {'Male': 54, 'Female': 10, 'mean_age': 18.53125}
(u'In connection with the DeWilloughby claim', u'Burnett, Frances Hodgson') {'Male': 4, 'Female': 12, 'mean_age': 30.75}
(u'Marcy, the blockade-runner', u'Fosdick, Charles Austin') {'Male': 77, 'Female': 12, 'mean_age': 16.876404494382022}
(u'Knights in fustian', u'Brown, Caroline') {'Male': 16, 'Female': 48, 'mean_age': 28.09375}
(u'His sombre rivals', u'Roe, Edward Payson') {'Male': 35, 'Female': 109, 'mean_age': 25.145833333333332}
(u'Taken by the enemy', u'Adams, William Tq') {'Male': 94, 'Female': 21, 'mean_age': 17.860869565217392}
(u'My Kalulu, prince, king, and slave', u'Stanley, Henry M') {'Male': 28, 'Female': 8, 'mean_age': 22.944444444444443}
(u'In old Virginia', u'Page, Thomas Nelson') {'Male': 18, 'Female': 33, 'mean_age': 25.392156862745097}
(u'My Apingi kingdom', u'Du Chaillu, Paul B') {'Male': 31, 'Female': 9, 'mean_age': 17.9}
(u'Within the enemys lines', u'Adams, William Tq') {'Male': 120, 'Female': 21, 'mean_age': 17.602836879432623}
(u'Stringtown on the pike', u'Lloyd, John Uri') {'Male': 9, 'Female': 24, 'mean_age': 24.848484848484848}
(u'Bear and forbear, or, The young skipper of Lake Ucayga', u'Adams, William Tq') {'Male': 0, 'Female': 1, 'mean_age': 24.0}
(u'The bondwoman', u'Ryan, Marah Ellis') {'Male': 13, 'Female': 35, 'mean_age': 25.958333333333332}
(u'Between the lines', u'King, Charles') {'Male': 33, 'Female': 51, 'mean_age': 24.05952380952381}
(u'The legionaries', u'Clark, Henry Scott') {'Male': 15, 'Female': 27, 'mean_age': 27.69047619047619}
(u'Union', u'Musick, John R') {'Male': 17, 'Female': 3, 'mean_age': 18.15}
(u'The other fellow', u'Smith, Francis Hopkinson') {'Male': 16, 'Female': 22, 'mean_age': 30.0}
(u'Horse Shoe Robinson', u'Kennedy, John Pendleton') {'Male': 7, 'Female': 0, 'mean_age': 25.571428571428573}
(u'The young marooners, on the Florida Coast', u'Goulding, F R') {'Male': 3, 'Female': 1, 'mean_age': 15.75}
(u'Flute and violin and other Kentucky tales and romances', u'Allen, James Lane') {'Male': 13, 'Female': 25, 'mean_age': 24.394736842105264}
(u'Fighting for the right', u'Adams, William Tq') {'Male': 87, 'Female': 22, 'mean_age': 18.55045871559633}
(u'Winning his way', u'Coffin, Charles Carleton') {'Male': 79, 'Female': 26, 'mean_age': 16.895238095238096}
(u'What answer?', u'Dickinson, Anna E') {'Male': 2, 'Female': 4, 'mean_age': 18.166666666666668}
(u'Robert Graham', u'Hentz, Caroline Lee') {'Male': 1, 'Female': 0, 'mean_age': 40.0}
(u'Marian Grey', u'Holmes, Mary Jane') {'Male': 16, 'Female': 45, 'mean_age': 24.0327868852459}
(u'On the wing of occasions', u'Harris, Joel Chandler') {'Male': 5, 'Female': 9, 'mean_age': 25.0}
(u'Elsies motherhood', u'Finley, Martha') {'Male': 27, 'Female': 90, 'mean_age': 20.358974358974358}
(u'A Kentucky colonel', u'Read, Opie Percival') {'Male': 31, 'Female': 27, 'mean_age': 25.344827586206897}
(u'The old gentleman of the black stock', u'Page, Thomas Nelson') {'Male': 10, 'Female': 20, 'mean_age': 24.733333333333334}
(u'Belle Scott', u'Jolliffe, John') {'Male': 6, 'Female': 8, 'mean_age': 23.142857142857142}
(u'Marcy, the refugee', u'Fosdick, Charles Austin') {'Male': 99, 'Female': 20, 'mean_age': 15.974789915966387}
(u'On the blockade', u'Adams, William Tq') {'Male': 108, 'Female': 16, 'mean_age': 17.629032258064516}
(u'A war-time wooing', u'King, Charles') {'Male': 31, 'Female': 74, 'mean_age': 23.695238095238096}
(u'Frank on the lower Mississippi', u'Fosdick, Charles Austin') {'Male': 172, 'Female': 35, 'mean_age': 16.95169082125604}
(u'True as steel', u'Harland, Marion') {'Male': 26, 'Female': 81, 'mean_age': 23.813084112149532}
(u'The reign of law', u'Allen, James Lane') {'Male': 17, 'Female': 34, 'mean_age': 26.568627450980394}
(u'The hearts highway', u'Freeman, Mary Eleanor Wilkins') {'Male': 11, 'Female': 52, 'mean_age': 22.444444444444443}
(u'A royal gentleman', u'Tourge\u0301e, Albion Winegar') {'Male': 8, 'Female': 25, 'mean_age': 31.666666666666668}
(u'The death-shot', u'Reid, Mayne') {'Male': 26, 'Female': 6, 'mean_age': 17.78125}
(u'Daughter of the elm', u'Hall, Granville Davisson') {'Male': 5, 'Female': 5, 'mean_age': 37.0}
(u'The house behind the cedars', u'Chesnutt, Charles Waddell') {'Male': 15, 'Female': 39, 'mean_age': 24.685185185185187}
(u'Inez', u'Evans, Augusta J') {'Male': 17, 'Female': 54, 'mean_age': 27.47887323943662}
(u'Daisy', u'Warner, Susan') {'Male': 13, 'Female': 30, 'mean_age': 25.790697674418606}
(u'The story of a bad boy', u'Aldrich, Thomas Bailey') {'Male': 88, 'Female': 26, 'mean_age': 17.710526315789473}
(u'Warwick of the Knobs', u'Lloyd, John Uri') {'Male': 16, 'Female': 31, 'mean_age': 25.72340425531915}
(u'From Atlanta to the sea', u'Dunn, Byron A') {'Male': 12, 'Female': 4, 'mean_age': 21.4375}
(u'Herman, or, Young knighthood', u'Foxton, E') {'Male': 16, 'Female': 31, 'mean_age': 26.27659574468085}
(u'Tempest and sunshine', u'Holmes, Mary Jane') {'Male': 29, 'Female': 63, 'mean_age': 21.619565217391305}
(u'Cudjos cave', u'Trowbridge, J T') {'Male': 31, 'Female': 11, 'mean_age': 22.785714285714285}
(u'On General Thomass staff', u'Dunn, Byron A') {'Male': 14, 'Female': 8, 'mean_age': 14.818181818181818}
In [25]:
print all_patrons[0][:3], all_patrons[0][-1], all_patrons[0][-2]
#print all_patrons[0][3]
[6L, u'Albert', u'Carpenter'] (1L, 36L, u'Male') {False: 99, True: 14}

Try some basic clustering and correlation . . .

In [26]:
# Build one "document" per patron: the sorted, de-duplicated list of authors
# that patron borrowed.

checkouts = []

for p in all_patrons:

    this_readers_checkouts = []

    # p[3] is the patron's list of transactions; b[1] is the author and b[3]
    # is a boolean flag, so only flagged transactions are kept
    for b in p[3]:
        #this_readers_checkouts.append(b[0] + '. ' + b[1] + '.')
        if b[3]:
            this_readers_checkouts.append(b[1] + '.')

    this_readers_checkouts = sorted(set(this_readers_checkouts))

    # Keep only patrons who borrowed at least two distinct authors
    if len(this_readers_checkouts) > 1:
        checkouts.append(this_readers_checkouts)
    
print 'len(checkouts)', len(checkouts)
print
print checkouts[0]
len(checkouts) 1742

[u'Adams, William Tq.', u'Aldrich, Thomas Bailey.', u'Allen, James Lane.', u'Evans, Augusta J.', u'Roe, Edward Payson.', u'Smith, Francis Hopkinson.']
In [27]:
from gensim import corpora, models, similarities

dictionary = corpora.Dictionary(checkouts)
corpus = [dictionary.doc2bow(text) for text in checkouts]
In [28]:
from gensim.matutils import corpus2dense

matrix = corpus2dense(corpus, len(dictionary))

print 'matrix.shape', matrix.shape

matrix = matrix.T

print 'matrix.shape', matrix.shape
matrix.shape (71, 1742)
matrix.shape (1742, 71)
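
The two cells above turn each patron's author list into one row of a patron-by-author matrix. Here is a minimal sketch of the same idea in plain Python, with no gensim. The author lists and the names `toy_checkouts`, `vocabulary`, `column_of`, and `toy_matrix` are invented for illustration and are not Muncie data. Because each patron's list was de-duplicated first, every cell ends up as 0 or 1.

```python
# Toy stand-in for `checkouts`: invented author lists, not Muncie data.
toy_checkouts = [
    ['Author A.', 'Author B.'],
    ['Author B.', 'Author C.'],
    ['Author A.', 'Author B.', 'Author C.'],
]

# Give each author a column index. gensim's Dictionary plays this role in
# the notebook, but it assigns its own ids, so its column order may differ.
vocabulary = sorted(set(a for patron in toy_checkouts for a in patron))
column_of = dict((author, i) for i, author in enumerate(vocabulary))

# One row per patron, one column per author: 1 if the patron borrowed
# anything by that author, 0 otherwise.
toy_matrix = []
for patron in toy_checkouts:
    row = [0] * len(vocabulary)
    for author in patron:
        row[column_of[author]] = 1
    toy_matrix.append(row)

for row in toy_matrix:
    print(row)
```

With the real data this gives 1742 rows (patrons) and 71 columns (authors), matching the transposed shape printed above.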
In [29]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
results = pca.fit_transform(matrix)

print
print 'X VARIANCE'

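# pca.components_[0] holds each author's loading on the first principal
# component, which is the x axis of the scatter plot
x_values = []
for k in sorted(dictionary.keys()):
    x_values.append([pca.components_[0][k], dictionary[k]])
x_values.sort()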

print
for x in x_values[:10]:
    print x[0], x[1]

print
for x in x_values[-10:]:
    print x[0], x[1]

print
print 'Y VARIANCE'

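# pca.components_[1] holds each author's loading on the second principal
# component, which is the y axis of the scatter plot
y_values = []
for k in sorted(dictionary.keys()):
    y_values.append([pca.components_[1][k], dictionary[k]])
y_values.sort()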

print
for y in y_values[:10]:
    print y[0], y[1]

print
for y in y_values[-10:]:
    print y[0], y[1]
    
    
print
print 'explained_variance_ratio_', pca.explained_variance_ratio_

# Each row of `results` is one patron's position on the two components
x = results[:, 0]
y = results[:, 1]

plt.figure(figsize=(12,12))

plt.title('READERS')

plt.xlabel('principal component 1')
plt.ylabel('principal component 2')

plt.ylim(-0.35, 0.45)
plt.xlim(-0.55, 0.65)

plt.scatter(x, y, s=50, alpha=.5)
X VARIANCE

-0.53477705 Fosdick, Charles Austin.
-0.4652385 Adams, William Tq.
-0.2276858 Coffin, Charles Carleton.
-0.14182778 Aldrich, Thomas Bailey.
-0.12076008 Stoddard, William Osborn.
-0.11672985 Alger, Horatio.
-0.09100121 Clemens, Samuel Langhorne.
-0.064173765 Henty, G A.
-0.05292368 Trowbridge, J T.
-0.047757063 Reid, Mayne.

0.050156187 Freeman, Mary Eleanor Wilkins.
0.065768264 Finley, Martha.
0.14188142 Holmes, Mary Jane.
0.14216243 Burnett, Frances Hodgson.
0.14618126 Hentz, Caroline Lee.
0.16367756 Johnston, Mary.
0.18952315 Harland, Marion.
0.19805679 Allen, James Lane.
0.24551004 Roe, Edward Payson.
0.3383056 Evans, Augusta J.

Y VARIANCE

-0.34402698 Holmes, Mary Jane.
-0.29012182 Roe, Edward Payson.
-0.27996445 Evans, Augusta J.
-0.2636945 Hentz, Caroline Lee.
-0.18894713 Finley, Martha.
-0.15801626 Harland, Marion.
-0.09317814 Fosdick, Charles Austin.
-0.073913135 Southworth, Emma Dorothy Eliza Nevitte.
-0.057344705 Foxton, E.
-0.039655745 Adams, William Tq.

0.07686005 Ryan, Marah Ellis.
0.07868098 Boone, Henry Burnham.
0.084773056 Brown, Caroline.
0.100491494 Chesnutt, Charles Waddell.
0.10537118 Freeman, Mary Eleanor Wilkins.
0.12693001 Lloyd, John Uri.
0.20341319 Burnett, Frances Hodgson.
0.30801192 Johnston, Mary.
0.3530637 Page, Thomas Nelson.
0.45173246 Allen, James Lane.

explained_variance_ratio_ [0.11600465 0.06910527]
Out[29]:
<matplotlib.collections.PathCollection at 0x7f788853fed0>
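
A quick bit of arithmetic helps when reading the scatter plot: the two `explained_variance_ratio_` values printed above can be summed to see how much of the total variance the two plotted axes cover. The sketch below just copies those two numbers from the output; the variable names are mine.

```python
# The two ratios printed by the PCA cell above
explained = [0.11600465, 0.06910527]

# Share of the total variance captured by the two plotted components
total = sum(explained)
print('share of variance shown in the plot: %.3f' % total)
```

So the two-dimensional picture accounts for a bit under a fifth of the variance in the patron-by-author matrix, and the remainder lies along components that are not plotted.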
In [30]:
from scipy.stats import pearsonr

correlation_results = []

# Pearson correlation between every pair of author columns; starting b at
# a + 1 visits each pair once and skips self-correlations
for a in range(0, len(dictionary) - 1):

    a_array = matrix[:,a]

    for b in range(a + 1, len(dictionary)):

        b_array = matrix[:,b]

        correlation_results.append([pearsonr(a_array, b_array)[0],  dictionary[a],  dictionary[b]])
        
correlation_results.sort(reverse=True)

print
for c in correlation_results[:10]:
    print c

print
for c in correlation_results[-10:]:
    print c
[0.46001825, u'Adams, William Tq.', u'Fosdick, Charles Austin.']
[0.3528419, u'Baker, Samuel White.', u'Dickinson, Anna E.']
[0.3006442, u'Seawell, Molly Elliot.', u'Baker, Samuel White.']
[0.2810167, u'Adams, William Tq.', u'Stoddard, William Osborn.']
[0.27759272, u'Fosdick, Charles Austin.', u'Coffin, Charles Carleton.']
[0.2496033, u'Adams, William Tq.', u'Coffin, Charles Carleton.']
[0.24178106, u'Adams, William Tq.', u'Aldrich, Thomas Bailey.']
[0.22713684, u'Boone, Henry Burnham.', u'Glasgow, Ellen Anderson Gholson.']
[0.2269212, u'Aldrich, Thomas Bailey.', u'Coffin, Charles Carleton.']
[0.22039758, u'Allen, James Lane.', u'Johnston, Mary.']

[-0.12713929, u'Burnett, Frances Hodgson.', u'Fosdick, Charles Austin.']
[-0.13480136, u'Adams, William Tq.', u'Harland, Marion.']
[-0.13529739, u'Adams, William Tq.', u'Allen, James Lane.']
[-0.14282578, u'Adams, William Tq.', u'Evans, Augusta J.']
[-0.14334886, u'Fosdick, Charles Austin.', u'Harland, Marion.']
[-0.14879398, u'Roe, Edward Payson.', u'Fosdick, Charles Austin.']
[-0.15869786, u'Johnston, Mary.', u'Fosdick, Charles Austin.']
[-0.16911077, u'Adams, William Tq.', u'Roe, Edward Payson.']
[-0.17570198, u'Allen, James Lane.', u'Fosdick, Charles Austin.']
[-0.20156749, u'Evans, Augusta J.', u'Fosdick, Charles Austin.']
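
One note on reading these numbers: every column of the matrix is 0/1 (borrowed this author or not), and Pearson's r between two 0/1 variables is the same thing as the phi coefficient computed from the 2x2 table of co-occurrence counts. The sketch below checks that on two invented columns. The helper names `pearson` and `phi` and the toy vectors are assumptions for illustration, not part of the notebook.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric lists."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def phi(xs, ys):
    """Phi coefficient from the 2x2 table of two 0/1 lists."""
    pairs = list(zip(xs, ys))
    n11 = sum(1 for x, y in pairs if x == 1 and y == 1)  # borrowed both
    n10 = sum(1 for x, y in pairs if x == 1 and y == 0)  # only the first
    n01 = sum(1 for x, y in pairs if x == 0 and y == 1)  # only the second
    n00 = sum(1 for x, y in pairs if x == 0 and y == 0)  # neither
    denominator = sqrt(float((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)))
    return (n11 * n00 - n10 * n01) / denominator

# Invented columns for eight hypothetical patrons (1 = borrowed that author)
author_a = [1, 1, 1, 0, 0, 0, 1, 0]
author_b = [1, 1, 0, 0, 0, 1, 1, 0]

r_value = pearson(author_a, author_b)
phi_value = phi(author_a, author_b)
print(r_value)
print(phi_value)
```

Both calls give the same value, so the figures in the lists above can be read as measures of how far two authors' readerships overlap beyond what their separate popularity would produce by chance.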