Jane-Eyre-Effekt

In what follows, I'm working from two critical/theoretical sources: 1) Professor Tatlock's research proposal, "The Purchase of Romans in a Time of Inequality, 1847-1920" (circulated during the summer of 2016), in which she wrote that she wanted to "trace the presence of what I am provisionally labeling the 'Jane-Eyre-Effekt'", and 2) Andrew Piper's paper, "The Werther Effect I: Goethe, Objecthood, and the Handling of Knowledge," to which Professor Tatlock pointed in her research proposal.

Andrew Piper's essay describes a straightforward method. He defines a text's "Wertherness" as a function of "the relative presence or absence of a set of words within them drawn from Werther." He begins by identifying the 91 "most frequent significant words" ("significant" meaning not in a list of stopwords). He does not lemmatize these words. Using his list of 91 words, he constructs a document-term matrix, then uses the matrix to compute the Euclidean distance between the documents. Texts which are close share a similar distribution of the 91 words.

Piper goes on to visualize his results, although the form of his visualization isn't essential: the distances alone are sufficient to communicate the relative "Wertherness" of texts.
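As a rough illustration of the method just described, the sketch below builds a tiny document-term matrix over a fixed word list and measures Euclidean distance between rows. The word list and the three toy documents are hypothetical stand-ins (they are not Piper's 91 words or any real text), and the use of relative frequencies rather than raw counts is an assumption, since the normalization is not specified here. It uses only the standard library and avoids print so that it runs under Python 2 or 3.

```python
import math
from collections import Counter

def relative_frequencies(tokens, word_list):
    # One row of the document-term matrix: for each word in word_list,
    # the share of the document's tokens that the word accounts for.
    counts = Counter(tokens)
    total = float(len(tokens)) or 1.0   # avoid dividing by zero for an empty text
    return [counts[w] / total for w in word_list]

def euclidean_distance(row_a, row_b):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(row_a, row_b)))

# Hypothetical word list standing in for the 91 "significant" words.
word_list = ['heart', 'nature', 'eye']

# Hypothetical, already-tokenized toy documents.
docs = {
    'a': ['heart', 'heart', 'nature', 'eye'],
    'b': ['heart', 'nature', 'nature', 'eye'],
    'c': ['eye', 'eye', 'eye', 'door'],
}

dtm = dict((name, relative_frequencies(tokens, word_list))
           for name, tokens in docs.items())

dist_a_b = euclidean_distance(dtm['a'], dtm['b'])
dist_a_c = euclidean_distance(dtm['a'], dtm['c'])
```

Here dist_a_b is smaller than dist_a_c, because documents a and b distribute the listed words in much the same way while c does not.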

This notebook follows that method, more or less, although there are some important differences:

  • This notebook includes 1,096 texts in its corpus (many more than Piper looks at); the corpus is the fiction which circulated in the Muncie Public Library in the late 19th century.
  • I focus on various combinations of nouns, verbs, adjectives and adverbs. This version of the notebook uses all nouns (but not proper nouns or pronouns) whether they occur in JE or not, although it could easily be run on other combinations of parts of speech and word frequencies.
  • Instead of pushing multi-dimensional distance matrix data down to two dimensions, I prefer to look at distances from novels of interest.

Please note that, despite the title of this notebook, I am not actually testing for "effect". Instead, I'm testing for textual similarities.

Declarations

Note that I'm not using the current version of spacy, which seems to be buggy on my platform.

In [1]:
import glob, re, codecs, json
from collections import defaultdict, Counter

INPUT_FOLDER = '/home/spenteco/0/muncie_public_library_corpus/PG_no_backmatter_fiction/'

import spacy
nlp = spacy.load('en')
print spacy.__version__
1.9.0

The standard "get text" function

Note that I can pass in several lists: one to select specific parts of speech, one to drop some lemmas, and one to include only certain words in the results. If this last list is empty, then all words for the selected part(s) of speech are included.

In [2]:
def get_document(path_to_file, selected_part_of_speech, lemma_to_drop, words_to_select):
    
    # Read the whole file as unicode, closing the handle when done.
    with codecs.open(path_to_file, 'r', encoding='utf-8') as f:
        text = f.read()
    
    doc = nlp(unicode(text))
    
    document = []
    
    for t in doc:
        if t.pos_ in selected_part_of_speech and t.lemma_.lower() not in lemma_to_drop:
            if len(words_to_select) == 0 or t.lemma_.lower() in words_to_select:
                document.append(t.lemma_.lower())
            
    return document
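To make the three filters easier to see without loading spaCy, here is a minimal sketch of the same selection logic applied to hand-made (lemma, part-of-speech) pairs. The function name select_lemmas and the toy tagged tokens are hypothetical; in the notebook itself the lemmas and tags come from spaCy.

```python
def select_lemmas(tagged_tokens, selected_part_of_speech, lemma_to_drop, words_to_select):
    # tagged_tokens: list of (lemma, part_of_speech) pairs.
    document = []
    for lemma, pos in tagged_tokens:
        lemma = lemma.lower()
        # Filters 1 and 2: keep only the wanted parts of speech,
        # and drop any lemma on the drop list.
        if pos in selected_part_of_speech and lemma not in lemma_to_drop:
            # Filter 3: an empty words_to_select means "keep everything".
            if len(words_to_select) == 0 or lemma in words_to_select:
                document.append(lemma)
    return document

# Hypothetical tagged tokens for illustration only.
tagged = [('eye', 'NOUN'), ('see', 'VERB'), ('who', 'NOUN'),
          ('Hand', 'NOUN'), ('dark', 'ADJ')]

all_nouns = select_lemmas(tagged, ['NOUN'], ['-pron-', 'what', 'who'], [])
only_eye = select_lemmas(tagged, ['NOUN'], ['-pron-', 'what', 'who'], ['eye'])
```

With an empty word list every noun not on the drop list is kept; with ['eye'] as the word list only that lemma survives.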

What are the most frequent words in Jane Eyre?

This code, in addition to producing a nice report, also pulls lists for use in later steps, should we want to restrict our analysis to a set of most frequent words, much as Piper does.

Many of the words which have interested us in the past (eye, face, hand, voice) are among the most common words in JE. I was more than a little surprised to see that "eye" is the most common noun in the novel!

However, I should have been less surprised: in another notebook, I learned that the top 25 words in JE each appear in almost every novel in the corpus. I.e., these are common words not just in JE, but in the corpus.

Note that I do use je_common_nouns in the last cell of the notebook, although I use it only to draw attention to words, and not for any calculation.

What did I learn in this step?

It's rather astounding how often words we've been interested in occur in JE. I would not have guessed, for example, that "eye" is the most common (non-proper) noun in the novel, nor that words relating to time appear so high on the list.

In [3]:
def pad_width(s, width):

    # Pad s on the right with spaces to at least width characters.
    return s.ljust(width)

def count_most_frequent_words_in_je(part_of_speech, n_to_list, words_to_select):

    je_doc = get_document(INPUT_FOLDER + 'Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt', 
                               [part_of_speech], ['-pron-', 'what', 'who'], words_to_select)

    je_counts = defaultdict(int)
    for a in je_doc:
        je_counts[a] += 1

    print
    print part_of_speech
    print
    
    most_frequent_words = []
    n_printed = 0
    for w in Counter(je_counts).most_common(n_to_list):
        
        most_frequent_words.append(w[0])
        
        print pad_width(w[0] + ' ' + str(w[1]), 20),
        
        n_printed += 1
        if n_printed > 0 and n_printed % 5 == 0:
            print
    
    print
           
        
    return most_frequent_words
        
# -------------------------------------------------------------

je_common_nouns = count_most_frequent_words_in_je('NOUN', 250, [])
je_common_verbs = count_most_frequent_words_in_je('VERB', 50, [])
je_common_adj = count_most_frequent_words_in_je('ADJ', 25, [])
je_common_adv = count_most_frequent_words_in_je('ADV', 25, [])
NOUN

eye 304              day 303              sir 291              room 283             time 274            
hand 257             night 227            face 205             door 204             word 189            
heart 185            man 178              house 174            hour 165             nothing 165         
life 163             lady 161             way 157              thing 144            head 141            
something 136        child 134            voice 131            fire 129             one 127             
girl 125             morning 124          moment 124           woman 117            mind 112            
year 108             place 102            bed 100              feeling 97           arm 96              
minute 95            light 93             thought 92           side 91              window 91           
evening 91           school 89            master 88            name 86              love 85             
table 85             book 83              chair 82             wife 82              world 81            
sister 80            nature 79            pleasure 78          gentleman 78         lip 75              
foot 74              sort 72              servant 72           point 71             answer 70           
hair 68              step 66              people 64            half 64              spirit 64           
question 64          home 64              air 63               friend 62            whom 62             
idea 61              part 61              water 60             candle 59            week 59             
wall 59              power 58             anything 58          sense 58             death 57            
look 55              order 55             tone 55              hope 55              hall 55             
character 54         person 54            mother 54            silence 54           tree 53             
father 53            course 52            chamber 52           work 52              end 52              
ear 52               brother 52           wind 51              matter 50            mine 50             
road 50              fear 50              feature 49           seat 49              soul 48             
curtain 48           month 48             ground 46            glance 46            manner 46           
chapter 46           hill 46              effort 45            sound 45             rest 45             
stranger 45          sky 45               flower 44            glass 43             smile 43            
moon 43              tear 42              field 42             bread 42             strength 42         
presence 42          afternoon 41         form 41              none 41              letter 41           
reader 40            dress 40             governess 40         business 40          scene 40            
pupil 40             picture 40           teacher 40           family 40            blood 39            
self 39              doubt 39             conversation 39      care 39              horse 39            
cloud 39             church 39            mile 38              existence 38         circumstance 38     
gate 38              case 38              fortune 38           dream 38             truth 37            
daughter 37          dinner 37            breakfast 37         wood 37              tea 37              
cheek 37             distance 36          brow 36              stone 36             drawing 36          
change 36            husband 36           interest 35          rain 35              shoulder 35         
kind 35              bell 35              state 35             beauty 34            tale 34             
knee 34              bird 34              age 34               trouble 34           wish 34             
carriage 34          movement 33          garden 33            sight 33             finger 33           
kitchen 33           schoolroom 33        sun 33               task 32              morrow 32           
ma'am 32             reason 32            countenance 31       money 31             forehead 31         
deal 31              figure 31            mistress 30          pain 30              affection 30        
mama 30              account 30           passion 30           flesh 30             nurse 30            
fault 30             prayer 30            gallery 30           other 30             subject 29          
delight 29           class 29             nursery 29           hearth 29            o'clock 29          
reply 29             duty 29              taste 29             will 29              marriage 29         
dog 28               line 28              earth 28             attention 28         walk 28             
party 28             language 27          company 27           town 27              pound 27            
habit 27             aid 27               uncle 27             bonnet 27            position 26         
content 26           object 26            expression 26        creature 26          charm 26            
society 26           path 26              return 26            interval 26          parlour 26          
service 25           being 25             opinion 25           pity 25              liberty 25          


VERB

be 7083              have 2931            do 1202              say 834              would 681           
will 671             see 566              go 536               could 506            come 456            
think 447            look 437             know 396             can 380              take 373            
make 345             must 309             may 293              should 289           give 286            
hear 281             feel 273             seem 269             leave 254            tell 247            
shall 240            ask 228              get 221              find 213             sit 205             
speak 193            stand 175            wish 169             rise 167             turn 160            
put 158              pass 155             like 150             want 144             let 136             
keep 136             live 134             call 129             return 117           love 114            
draw 108             answer 107           bring 99             talk 98              enter 96            


ADJ

which 596            good 365             little 330           such 255             that 215            
own 200              more 157             other 145            all 140              last 139            
great 128            old 119              new 108              long 107             strange 105         
first 104            much 102             young 95             sure 92              few 88              
low 84               full 82              black 81             dark 81              large 75            


ADV

not 1876             now 667              so 598               when 573             there 514           
then 493             very 361             never 295            how 278              only 266            
too 251              again 245            still 243            where 233            well 229            
more 206             as 204               here 195             yet 191              once 177            
just 162             ever 159             soon 153             long 140             away 135            

Extract "documents" for comparison

A "document" is a list of lemmas selected from a text.

This version selects nouns whether they appear in JE or not (WORDS_TO_SELECT is empty), and drops pronouns.

Why nouns? I believe that nouns carry most of the semantic content of text; they seem to have more lexical variety than other parts of speech.

Why all nouns? If, as I suspect, the nouns carry most of a text's semantic content, then including all of the nouns of a text is a more accurate reflection of its content than limiting the word list to just those words appearing in JE.

In [4]:
labels = []
documents = []

WORDS_TO_SELECT = []
#WORDS_TO_SELECT = je_common_nouns

for a, path_to_file in enumerate(glob.glob(INPUT_FOLDER + '*.txt')):
    
    if a % 100 == 0:
        print 'processing', a
        
    labels.append(path_to_file.split('/')[-1])
    documents.append(get_document(path_to_file, 
                        ['NOUN'], 
                        ['-pron-', 'what', 'who'],
                        WORDS_TO_SELECT))

f = codecs.open('labels.js', 'w', encoding='utf-8')
f.write(json.dumps(labels))
f.close()
                     
f = codecs.open('documents.js', 'w', encoding='utf-8')
f.write(json.dumps(documents))
f.close()
processing 0
processing 100
processing 200
processing 300
processing 400
processing 500
processing 600
processing 700
processing 800
processing 900
processing 1000
processing 1100

Reload the labels and documents from the previous step

I'm doing this just so I can restart the notebook, since the previous step takes quite a bit of time to complete.

In [3]:
import codecs, json

f = codecs.open('labels.js', 'r', encoding='utf-8')
labels = json.loads(f.read())
f.close()

f = codecs.open('documents.js', 'r', encoding='utf-8')
documents = json.loads(f.read())
f.close()

Generate gensim (and gensim-like) corpora

I use as much of the gensim machinery as I can; the creation of a dictionary (a mapping between words and word ids), corpus (a set of word counts for each text), and MatrixSimilarity index are all boilerplate right from the gensim tutorials.

The wrinkle is corpus_tf. The gensim tutorials describe creating several kinds of corpora, of which the tf-idf corpus is closest to what we might use here. However, the gensim tf-idf process computes zero tf-idf scores for some words (eye, day, time, hand, etc.) which are very common both in JE and in the corpus as a whole (eye, for example, appears at least once in every text in the corpus, so its inverse document frequency is zero). Because these words receive zero scores, they do not figure in the distance calculations that follow. Therefore, I create a simple term-frequency corpus (each word's count divided by the number of selected words in its text), so that every word included in documents gets a nonzero score.

In [6]:
from gensim import corpora, models, similarities

dictionary = corpora.Dictionary(documents)
corpus = [dictionary.doc2bow(text) for text in documents]

corpus_tf = []
for a in range(0, len(corpus)):
    new_row = []
    for b in corpus[a]:
        new_row.append([b[0], float(b[1]) / float(len(documents[a]))])
    corpus_tf.append(new_row)

index_tf = similarities.MatrixSimilarity(corpus_tf)
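The reason a ubiquitous word drops out under tf-idf can be shown with a few lines of arithmetic. The sketch below is not gensim code: it assumes the weighting idf = log2(N / df), which is understood to be gensim's default, and uses three hypothetical toy documents in which "eye" appears in every one.

```python
import math
from collections import Counter

# Hypothetical toy corpus; 'eye' occurs in every document.
docs = [
    ['eye', 'eye', 'moor', 'governess'],
    ['eye', 'prairie', 'rifle', 'rifle'],
    ['eye', 'whale', 'boat', 'boat'],
]

n_docs = len(docs)
# Document frequency: in how many documents does each word appear?
doc_freq = Counter(w for doc in docs for w in set(doc))

def idf(word):
    # Assumed weighting: log base 2 of (number of docs / document frequency).
    return math.log(float(n_docs) / doc_freq[word], 2)

def term_frequency(word, doc):
    # Same normalization as corpus_tf: count divided by document length.
    return doc.count(word) / float(len(doc))

tf_eye = term_frequency('eye', docs[0])
tfidf_eye = tf_eye * idf('eye')
tfidf_moor = term_frequency('moor', docs[0]) * idf('moor')
```

In this toy corpus "eye" makes up half of the first document, yet its tf-idf score is zero because log2(3/3) is zero, while the rarer "moor" keeps a positive score. A plain term-frequency score keeps "eye" in play.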

Computing similarity between one novel and the rest of the corpus

The process here is to 1) select a novel, then 2) lookup the similarity between it and every other novel in the corpus. In some cases, I list the 10 novels most similar to the selected novel, and the 10 novels most distant from the selected novel. And I always graph the distribution of similarity scores for the selected novel.

In this scheme, the higher the score, the more similar the novels (gensim's MatrixSimilarity computes cosine similarity). A similarity score of 1.0 means two novels have the same noun distribution; a score of 0.0 means the novels have no nouns in common.
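For readers who want to see what those endpoints mean, here is a minimal sketch of cosine similarity on hypothetical four-word frequency vectors. It is a plain-Python illustration of the measure, not the gensim implementation, and the vectors are invented.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0   # treat an empty vector as sharing nothing
    return dot / (norm_u * norm_v)

# Hypothetical term-frequency vectors over the same four nouns.
novel_a = [0.50, 0.25, 0.25, 0.00]
novel_b = [0.50, 0.25, 0.25, 0.00]   # same distribution as novel_a
novel_c = [0.00, 0.00, 0.00, 1.00]   # uses only a noun novel_a never uses
novel_d = [0.25, 0.25, 0.25, 0.25]   # overlaps with novel_a in part

sim_same = cosine_similarity(novel_a, novel_b)
sim_none = cosine_similarity(novel_a, novel_c)
sim_some = cosine_similarity(novel_a, novel_d)
```

sim_same comes out at 1.0 (up to floating-point rounding), sim_none at 0.0, and sim_some somewhere in between. Because cosine similarity depends only on proportions, a score of 1.0 means the same distribution of nouns, not necessarily the same text.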

Note that this information hearkens back to the "Manhattan" visualizations we constructed four or five years ago.

Also, please note that this sort of similarity information is usually presented in one of two ways: 1) as a heat map; I chose not to do that because the resulting visualization would be enormous; 2) as a network of one kind or another (e.g., as a hierarchical clustering diagram, or as a Voronoi diagram, as Piper does in "The Werther Effect"); I did not do that because such visualizations flatten information about the similarities between novels.

I did experiment with hierarchical clustering; see below, where I discuss my dissatisfactions with it.

What did I learn in this step?

  • The method seems reasonable. I would, for example, expect that any novel by Charlotte Bronte would be similar to other novels by her, and the results bear that out. Similarly, I would have expected that JE would be very different from the Frank in/on a Thing novels, and that's what I see.

  • The similarity scores from one novel to every other novel are skewed (see the graphs); it feels like there's a lot of JE-like material in the corpus.

  • The JE and Gold Elsie relationship is interesting: there are 68 novels more similar to JE than Gold Elsie is (Gold Elsie ranks 69th in JE's list); however, there are only 10 novels more similar to Gold Elsie than JE is (JE ranks 11th in Gold Elsie's list). This suggests it's possible for Gold Elsie and Hypothetical Novel A to be equally similar to JE, but for Gold Elsie and Hypothetical Novel A to be not particularly close to each other.
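The asymmetry in the last point is a general property of nearest-neighbor rankings, and a toy example may help. In the sketch below, novels are reduced to hypothetical positions on a line (a drastic simplification of the real high-dimensional noun vectors), with JE sitting in a crowded neighborhood and GE in a sparse one. All names and positions are invented for illustration.

```python
# Hypothetical one-dimensional "positions"; n1..n3 crowd around JE.
positions = {'JE': 0.0, 'GE': 1.0, 'n1': -0.1, 'n2': -0.2, 'n3': -0.3}

def rank_of(target, source):
    # Rank 1 = the novel closest to source (source itself is excluded).
    others = [name for name in positions if name != source]
    others.sort(key=lambda name: abs(positions[name] - positions[source]))
    return others.index(target) + 1

rank_ge_from_je = rank_of('GE', 'JE')   # GE is far down JE's list
rank_je_from_ge = rank_of('JE', 'GE')   # JE is at the top of GE's list

# A hypothetical novel A on the other side of JE: as close to JE as GE is,
# but twice as far from GE.
hypothetical_a = -1.0
d_je_ge = abs(positions['GE'] - positions['JE'])
d_je_a = abs(hypothetical_a - positions['JE'])
d_ge_a = abs(hypothetical_a - positions['GE'])
```

The distance between JE and GE is the same in both directions, but the ranks differ because JE has many close neighbors and GE has few. Likewise, GE and the hypothetical novel A are equally distant from JE while being far from each other.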

In [7]:
%matplotlib inline
import numpy as np
import pandas as pd
from scipy import stats, integrate
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(color_codes=True)
plt.rcParams['figure.figsize']=(15,5)

def distances_from_one_novel_to_all_others(novel_label, print_details):

    for a in range(0, len(corpus_tf)):

        if labels[a].find(novel_label) == -1:
            continue

        sims = index_tf[corpus_tf[a]]

        sims = sorted(enumerate(sims), key=lambda item: -item[1])

        print
        print labels[a].upper()
        print

        all_sims_scores = []

        for b in range(1, len(sims)):

            all_sims_scores.append(sims[b][1])

            if b < 11 and print_details == True:

                print '\t', sims[b][1], labels[sims[b][0]]

        if print_details == True:
                
            print
            for b in range(1, len(sims)):
                if labels[sims[b][0]].find('Marlitt') > -1 or labels[sims[b][0]].find('Jane_Eyre') > -1:
                    print '\t', sims[b][1], ('(' + str(b) + ')'), labels[sims[b][0]]

            reversed_sims = []
            for s in sims:
                reversed_sims.append((s[1], s[0]))
            reversed_sims.sort()

            print
            for b in range(0, 10):
                print '\t', reversed_sims[b][0], labels[reversed_sims[b][1]]
        
            print
            print '\tmean', np.mean(all_sims_scores),  \
                    'median', np.median(all_sims_scores),  \
                    'std', np.std(all_sims_scores), \
                    'plus 1 std', (np.mean(all_sims_scores) + (1 * np.std(all_sims_scores)))
            print

        ax = sns.distplot(all_sims_scores, bins=100)
        ax.set(xlabel='sim score', ylabel='n sims')
        plt.show()
        
# ------------------------------------------------

distances_from_one_novel_to_all_others('Bront_Charlotte_Jane_Eyre', True)
distances_from_one_novel_to_all_others('Marlitt_E_Eugenie_Gold_Elsie', True)
distances_from_one_novel_to_all_others('Marlitt_E_Eugenie_At_the_Councillor', True)
distances_from_one_novel_to_all_others('Marlitt_OMS_Wister', True)
distances_from_one_novel_to_all_others('Malory_Thomas_Sir_King_Arthur', False)
distances_from_one_novel_to_all_others('Castlemon_Harry_Frank_in_the_Woods', False)
BRONT_CHARLOTTE_JANE_EYRE_AN_AUTOBIOGRAPHY_PG_1260.TXT

	0.91628 Bront_Charlotte_Shirley_PG_30486.txt
	0.906306 Bront_Charlotte_Villette_PG_9182.txt
	0.889028 Wood_Henry_Mrs_Verner_s_Pride_PG_15627.txt
	0.885551 Harland_Marion_Alone_PG_46505.txt
	0.881325 Gaskell_Elizabeth_Cleghorn_Ruth_PG_4275.txt
	0.880198 Bront_Charlotte_The_Professor_PG_1028.txt
	0.87613 Dickens_Charles_Little_Dorrit_PG_963.txt
	0.872083 Wood_Henry_Mrs_East_Lynne_PG_3322.txt
	0.871131 Dickens_Charles_Bleak_House_PG_1023.txt
	0.870276 Dickens_Charles_Dombey_and_Son_PG_821.txt

	0.821537 (69) Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt
	0.805818 (111) Marlitt_OMS_Wister translation_cleaned_110617.txt
	0.801226 (125) Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.79305 (153) Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	0.764612 (276) Marlitt_Wister_Gisela.txt
	0.751629 (337) Marlitt_Wister_Owls.txt
	0.733419 (435) Marlitt_Wister_Schillingscourt.txt
	0.725105 (482) Marlitt_Wister_Baliff.txt
	0.671565 (679) Marlitt_Wister_Rubies.txt

	0.295975 Malory_Thomas_Sir_King_Arthur_and_the_Knights_of_the_Round_Table_PG_36462_8.txt
	0.312278 Alcott_Louisa_May_Pratt_Anna_Bronson_Alcott_Comic_Tragedies_Written_by_PG_33986.txt
	0.31812 Billings_Josh_Josh_Billings_on_Ice_and_Other_Things_PG_41025.txt
	0.323416 Castlemon_Harry_Frank_on_the_Prairie_PG_42101_0.txt
	0.350273 Seton_Ernest_Thompson_The_Trail_of_the_Sandhill_Stag_PG_32319.txt
	0.360298 Castlemon_Harry_Frank_in_the_Woods_PG_42307_8.txt
	0.373243 Jackson_Helen_Hunt_Mammy_Tittleback_and_Her_Family_A_True_Story_of_Seventeen_Cats_PG_33240.txt
	0.38492 Dumas_Alexandre_Man_in_the_Iron_Mask_an_Essay_PG_2751.txt
	0.387923 Maclaren_Ian_The_Days_of_Auld_Lang_Syne_PG_43726.txt
	0.388081 Ballantyne_R_M_Robert_Michael_Fighting_the_Whales_PG_21202.txt

	mean 0.685713 median 0.707959 std 0.107867 plus 1 std 0.793580196798

MARLITT_E_EUGENIE_GOLD_ELSIE_PG_42426.TXT

	0.863071 Marlitt_OMS_Wister translation_cleaned_110617.txt
	0.856397 Schubin_Ossip_O_Thou_My_Austria_PG_35454.txt
	0.848516 Bethusy_Huc_Valeska_Grfin_von_The_Eichhofs_A_Romance_PG_35311_8.txt
	0.845684 Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	0.845043 Glmer_Claire_von_A_Noble_Name_or_Dnninghausen_PG_36550.txt
	0.842751 Marlitt_Wister_Gisela.txt
	0.841481 Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.840599 The_Second_Wife_Wister_corrected.txt
	0.833686 Hillern_Wilhelmine_von_Only_a_Girl_or_A_Physician_for_the_Soul_PG_36709_8.txt
	0.827982 Dickens_Charles_Dombey_and_Son_PG_821.txt

	0.863071 (1) Marlitt_OMS_Wister translation_cleaned_110617.txt
	0.845684 (4) Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	0.842751 (6) Marlitt_Wister_Gisela.txt
	0.841481 (7) Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.821537 (11) Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
	0.810601 (28) Marlitt_Wister_Owls.txt
	0.787089 (59) Marlitt_Wister_Schillingscourt.txt
	0.786405 (61) Marlitt_Wister_Baliff.txt
	0.736449 (224) Marlitt_Wister_Rubies.txt

	0.285013 Billings_Josh_Josh_Billings_on_Ice_and_Other_Things_PG_41025.txt
	0.294574 Castlemon_Harry_Frank_on_the_Prairie_PG_42101_0.txt
	0.294854 Seton_Ernest_Thompson_The_Trail_of_the_Sandhill_Stag_PG_32319.txt
	0.31674 Alcott_Louisa_May_Pratt_Anna_Bronson_Alcott_Comic_Tragedies_Written_by_PG_33986.txt
	0.325014 Malory_Thomas_Sir_King_Arthur_and_the_Knights_of_the_Round_Table_PG_36462_8.txt
	0.331154 Optic_Oliver_All_Adrift_Or_The_Goldwing_Club_PG_25577.txt
	0.331686 Castlemon_Harry_Frank_in_the_Woods_PG_42307_8.txt
	0.34028 Jackson_Helen_Hunt_Mammy_Tittleback_and_Her_Family_A_True_Story_of_Seventeen_Cats_PG_33240.txt
	0.348959 Optic_Oliver_A_Victorious_Union_PG_18678.txt
	0.351405 Ballantyne_R_M_Robert_Michael_Fighting_the_Whales_PG_21202.txt

	mean 0.642233 median 0.664944 std 0.109971 plus 1 std 0.752204023302

MARLITT_E_EUGENIE_AT_THE_COUNCILLOR_S_OR_A_NAMELESS_HISTORY_PG_43393_0.TXT

	0.850945 Marlitt_OMS_Wister translation_cleaned_110617.txt
	0.846272 Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.845684 Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt
	0.839881 Schubin_Ossip_O_Thou_My_Austria_PG_35454.txt
	0.823893 Bethusy_Huc_Valeska_Grfin_von_The_Eichhofs_A_Romance_PG_35311_8.txt
	0.822712 The_Second_Wife_Wister_corrected.txt
	0.81857 Heimburg_W_Gertrude_s_Marriage_PG_32442.txt
	0.812194 Glmer_Claire_von_A_Noble_Name_or_Dnninghausen_PG_36550.txt
	0.808955 Marlitt_Wister_Gisela.txt
	0.805223 Stephens_Ann_S_Ann_Sophia_A_Noble_Woman_PG_30111_8.txt

	0.850945 (1) Marlitt_OMS_Wister translation_cleaned_110617.txt
	0.846272 (2) Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.845684 (3) Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt
	0.808955 (9) Marlitt_Wister_Gisela.txt
	0.79702 (16) Marlitt_Wister_Owls.txt
	0.79305 (19) Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
	0.792964 (20) Marlitt_Wister_Baliff.txt
	0.779517 (28) Marlitt_Wister_Schillingscourt.txt
	0.734924 (116) Marlitt_Wister_Rubies.txt

	0.254751 Castlemon_Harry_Frank_on_the_Prairie_PG_42101_0.txt
	0.256882 Seton_Ernest_Thompson_The_Trail_of_the_Sandhill_Stag_PG_32319.txt
	0.265951 Alcott_Louisa_May_Pratt_Anna_Bronson_Alcott_Comic_Tragedies_Written_by_PG_33986.txt
	0.267132 Billings_Josh_Josh_Billings_on_Ice_and_Other_Things_PG_41025.txt
	0.268783 Malory_Thomas_Sir_King_Arthur_and_the_Knights_of_the_Round_Table_PG_36462_8.txt
	0.29478 Castlemon_Harry_Frank_in_the_Woods_PG_42307_8.txt
	0.313083 Optic_Oliver_All_Adrift_Or_The_Goldwing_Club_PG_25577.txt
	0.32248 Jackson_Helen_Hunt_Mammy_Tittleback_and_Her_Family_A_True_Story_of_Seventeen_Cats_PG_33240.txt
	0.332614 Ballantyne_R_M_Robert_Michael_Fighting_the_Whales_PG_21202.txt
	0.334667 Reid_Mayne_The_Young_Yagers_A_Narrative_of_Hunting_Adventures_in_Southern_Africa_PG_34668.txt

	mean 0.609486 median 0.626667 std 0.110675 plus 1 std 0.720161668956

MARLITT_OMS_WISTER TRANSLATION_CLEANED_110617.TXT

	0.883744 Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.865038 Hillern_Wilhelmine_von_Only_a_Girl_or_A_Physician_for_the_Soul_PG_36709_8.txt
	0.863071 Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt
	0.853944 Heimburg_W_Gertrude_s_Marriage_PG_32442.txt
	0.850945 Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	0.846971 Evans_Augusta_J_Augusta_Jane_St_Elmo_PG_4553.txt
	0.846767 Stephens_Ann_S_Ann_Sophia_The_Old_Homestead_PG_8078.txt
	0.846378 Austin_Jane_G_Jane_Goodwin_Outpost_PG_4676.txt
	0.840332 Evans_Augusta_J_Augusta_Jane_Infelice_PG_17718_8.txt
	0.836877 Holmes_Mary_Jane_Gretchen_A_Novel_PG_40702_0.txt

	0.883744 (1) Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.863071 (3) Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt
	0.850945 (5) Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	0.820285 (19) Marlitt_Wister_Gisela.txt
	0.816708 (22) Marlitt_Wister_Schillingscourt.txt
	0.810176 (28) Marlitt_Wister_Owls.txt
	0.805818 (31) Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
	0.764991 (108) Marlitt_Wister_Baliff.txt
	0.758347 (122) Marlitt_Wister_Rubies.txt

	0.248045 Seton_Ernest_Thompson_The_Trail_of_the_Sandhill_Stag_PG_32319.txt
	0.250922 Castlemon_Harry_Frank_on_the_Prairie_PG_42101_0.txt
	0.262297 Malory_Thomas_Sir_King_Arthur_and_the_Knights_of_the_Round_Table_PG_36462_8.txt
	0.273827 Billings_Josh_Josh_Billings_on_Ice_and_Other_Things_PG_41025.txt
	0.285659 Castlemon_Harry_Frank_in_the_Woods_PG_42307_8.txt
	0.291992 Alcott_Louisa_May_Pratt_Anna_Bronson_Alcott_Comic_Tragedies_Written_by_PG_33986.txt
	0.317775 Optic_Oliver_All_Adrift_Or_The_Goldwing_Club_PG_25577.txt
	0.320741 Ballantyne_R_M_Robert_Michael_Fighting_the_Whales_PG_21202.txt
	0.323162 Eggleston_George_Cary_Captain_Sam_The_Boy_Scouts_of_PG_18622_8.txt
	0.324563 Optic_Oliver_A_Victorious_Union_PG_18678.txt

	mean 0.623734 median 0.645243 std 0.120374 plus 1 std 0.744108475745

MALORY_THOMAS_SIR_KING_ARTHUR_AND_THE_KNIGHTS_OF_THE_ROUND_TABLE_PG_36462_8.TXT

CASTLEMON_HARRY_FRANK_IN_THE_WOODS_PG_42307_8.TXT

Transform the gensim-like corpus into a matrix

I need a document-term matrix for what follows . . .

In [8]:
matrix =  []

for a in range(0, len(corpus_tf)):

    row = []
    for b in range(0, len(dictionary)):
        row.append(0.0)
    
    for b in corpus_tf[a]:
        row[b[0]] = b[1]
        
    matrix.append(row)

f = codecs.open('matrix.js', 'w', encoding='utf-8')
f.write(json.dumps(matrix))
f.close()
    
print 'len(matrix)', len(matrix)
print 'len(matrix[0])', len(matrix[0])
len(matrix) 1102
len(matrix[0]) 110808

Load saved matrix . . .

. . . so the notebook is restartable at this point.

In [1]:
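import codecs, json

# Reload the matrix saved by the previous cell.
f = codecs.open('matrix.js', 'r', encoding='utf-8')
matrix = json.loads(f.read())
f.close()

# The distance code below also needs the labels saved earlier.
f = codecs.open('labels.js', 'r', encoding='utf-8')
labels = json.loads(f.read())
f.close()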

Calculate distances between novels

Much as I did with similarity scores (see above), I compute the distances between a selected novel and the rest of the novels. If the distance between two novels is 0.0, they are identical; the greater the distance, the more unlike they are.

The point here is to be sure that the results are not an artifact of any particular distance- or similarity-calculating scheme.

What did I learn in this step?

Not much, really. We're getting roughly the same results we saw earlier; same skewed graphs except flipped; same close-and-distant novels, more or less; same Marlitt-JE relationships.

The cityblock distance metric does seem to draw JE and Marlitt closer than the other methods. I use cityblock in what follows, not because it gives the results we like, but because I think it will be easier to explain word-by-word contributions to the results.
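To show why cityblock lends itself to word-by-word explanation, the sketch below computes the three distances by hand on hypothetical three-noun frequency vectors. The formulas are the standard definitions (cosine distance as one minus cosine similarity, Euclidean as the root of summed squared differences, cityblock as the sum of absolute differences); the vocabulary and numbers are invented.

```python
import math

# Hypothetical vocabulary and term-frequency vectors.
vocabulary = ['eye', 'hand', 'rifle']
novel_a = [0.5, 0.3, 0.2]
novel_b = [0.4, 0.4, 0.2]

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cityblock_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

# Each word's share of the cityblock distance is just its absolute
# difference in frequency, so the shares add up to the total.
contributions = dict((w, abs(a - b))
                     for w, a, b in zip(vocabulary, novel_a, novel_b))

total_cityblock = cityblock_distance(novel_a, novel_b)
total_euclidean = euclidean_distance(novel_a, novel_b)
total_cosine = cosine_distance(novel_a, novel_b)
```

Only cityblock decomposes this simply: here "eye" and "hand" each contribute about 0.1 and "rifle" contributes nothing, for a total of about 0.2. Euclidean distance squares the differences before summing, and cosine distance divides by the vector lengths, so neither splits into additive per-word pieces in the same way.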

In [76]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.spatial.distance import cosine, euclidean, cityblock
import numpy as np

def examine_scipy_distances(novel_label, print_details):
    """Report and plot distances from one novel to every text in the corpus.

    Relies on the notebook globals `matrix` (one row per text) and `labels`.
    """

    sns.set(color_codes=True)
    plt.rcParams['figure.figsize'] = (15, 5)

    # Row index of the first label that contains novel_label.
    je_a = -1
    for a in range(0, len(labels)):
        if novel_label in labels[a]:
            je_a = a
            break

    # Without this check, a label that matches nothing would silently
    # select the last row of the matrix (index -1).
    if je_a == -1:
        raise ValueError('no label contains ' + novel_label)

    print
    print labels[je_a].upper()

    # The report is identical for each metric; only the function changes.
    metrics = [('Cosine', cosine), ('Euclidean', euclidean), ('Cityblock', cityblock)]

    for metric_name, metric in metrics:

        print
        print metric_name

        # graph_distances keeps corpus order; distances is sorted nearest first.
        graph_distances = []
        distances = []
        for a in range(0, len(matrix)):
            d = metric(matrix[a], matrix[je_a])
            graph_distances.append(d)
            distances.append([d, labels[a]])

        distances.sort()

        if print_details:

            # Ten nearest texts; skips distances[0], the novel itself at distance 0.
            print
            for a in distances[1:11]:
                print '\t', a[0], a[1]

            # Marlitt and Jane Eyre texts with their zero-based rank:
            # (0) is the nearest neighbour.
            print
            for position, a in enumerate(distances[1:]):
                if 'Marlitt' in a[1] or 'Jane_Eyre' in a[1]:
                    print '\t', a[0], ('(' + str(position) + ')'), a[1]

            # Ten most distant texts.
            print
            for a in distances[len(distances) - 10:]:
                print '\t', a[0], a[1]

        print
        print '\tmean', np.mean(graph_distances),  \
                'median', np.median(graph_distances),  \
                'std', np.std(graph_distances), \
                'less 1 std', (np.mean(graph_distances) - (1 * np.std(graph_distances)))

        ax = sns.distplot(graph_distances, bins=100)
        ax.set(xlabel=metric_name.upper(), ylabel='n texts')
        plt.show()

    
# --------------------------------------------------------
        
# ------------------------------------------------

examine_scipy_distances('Bront_Charlotte_Jane_Eyre', True)
examine_scipy_distances('Marlitt_E_Eugenie_Gold_Elsie', True)
#examine_scipy_distances('Marlitt_E_Eugenie_At_the_Councillor', True)
#examine_scipy_distances('Marlitt_OMS_Wister', True)
#examine_scipy_distances('Malory_Thomas_Sir_King_Arthur', False)
#examine_scipy_distances('Castlemon_Harry_Frank_in_the_Woods', False)
BRONT_CHARLOTTE_JANE_EYRE_AN_AUTOBIOGRAPHY_PG_1260.TXT

Cosine

	0.0837202100008 Bront_Charlotte_Shirley_PG_30486.txt
	0.0936935459424 Bront_Charlotte_Villette_PG_9182.txt
	0.110972255078 Wood_Henry_Mrs_Verner_s_Pride_PG_15627.txt
	0.11444849003 Harland_Marion_Alone_PG_46505.txt
	0.118674875195 Gaskell_Elizabeth_Cleghorn_Ruth_PG_4275.txt
	0.119801409019 Bront_Charlotte_The_Professor_PG_1028.txt
	0.123870163873 Dickens_Charles_Little_Dorrit_PG_963.txt
	0.127916552866 Wood_Henry_Mrs_East_Lynne_PG_3322.txt
	0.128869280129 Dickens_Charles_Bleak_House_PG_1023.txt
	0.129723512724 Dickens_Charles_Dombey_and_Son_PG_821.txt

	0.178462632033 (68) Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt
	0.19418217999 (110) Marlitt_OMS_Wister translation_cleaned_110617.txt
	0.198773889816 (124) Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.206949917079 (152) Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	0.235388414709 (275) Marlitt_Wister_Gisela.txt
	0.248370759526 (336) Marlitt_Wister_Owls.txt
	0.266581299629 (434) Marlitt_Wister_Schillingscourt.txt
	0.274895404209 (481) Marlitt_Wister_Baliff.txt
	0.328435377984 (678) Marlitt_Wister_Rubies.txt

	0.611919054551 Ballantyne_R_M_Robert_Michael_Fighting_the_Whales_PG_21202.txt
	0.612077371515 Maclaren_Ian_The_Days_of_Auld_Lang_Syne_PG_43726.txt
	0.615080348677 Dumas_Alexandre_Man_in_the_Iron_Mask_an_Essay_PG_2751.txt
	0.626757440501 Jackson_Helen_Hunt_Mammy_Tittleback_and_Her_Family_A_True_Story_of_Seventeen_Cats_PG_33240.txt
	0.639702450825 Castlemon_Harry_Frank_in_the_Woods_PG_42307_8.txt
	0.649727408714 Seton_Ernest_Thompson_The_Trail_of_the_Sandhill_Stag_PG_32319.txt
	0.676584089006 Castlemon_Harry_Frank_on_the_Prairie_PG_42101_0.txt
	0.681879649361 Billings_Josh_Josh_Billings_on_Ice_and_Other_Things_PG_41025.txt
	0.687721833304 Alcott_Louisa_May_Pratt_Anna_Bronson_Alcott_Comic_Tragedies_Written_by_PG_33986.txt
	0.704024672937 Malory_Thomas_Sir_King_Arthur_and_the_Knights_of_the_Round_Table_PG_36462_8.txt

	mean 0.314001375338 median 0.291941653735 std 0.108232335229 less 1 std 0.205769040109
Euclidean

	0.0162839545918 Bront_Charlotte_Shirley_PG_30486.txt
	0.0171640872706 Bront_Charlotte_Villette_PG_9182.txt
	0.0189611026527 Harland_Marion_Alone_PG_46505.txt
	0.0193797502978 Bront_Charlotte_The_Professor_PG_1028.txt
	0.0211520400308 Bront_Emily_Wuthering_Heights_PG_768.txt
	0.0215403532344 Bront_Anne_The_Tenant_of_Wildfell_Hall_PG_969_0.txt
	0.0222175826219 MacDonald_George_David_Elginbrod_PG_2291.txt
	0.0231383894773 Harte_Bret_The_Luck_of_Roaring_Camp_and_Other_Tales_With_Condensed_Nov_PG_6373.txt
	0.023203289564 Dickens_Charles_Little_Dorrit_PG_963.txt
	0.023208363031 Gaskell_Elizabeth_Cleghorn_Ruth_PG_4275.txt

	0.0268570691017 (66) Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt
	0.0293253961657 (148) Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	0.0295611425867 (157) Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.0313098083464 (252) Marlitt_OMS_Wister translation_cleaned_110617.txt
	0.0323171821896 (302) Marlitt_Wister_Gisela.txt
	0.0336529529339 (386) Marlitt_Wister_Schillingscourt.txt
	0.0347678611761 (433) Marlitt_Wister_Baliff.txt
	0.0349865019709 (444) Marlitt_Wister_Owls.txt
	0.0396913044545 (662) Marlitt_Wister_Rubies.txt

	0.0732243439247 Optic_Oliver_All_Adrift_Or_The_Goldwing_Club_PG_25577.txt
	0.0741810550967 Castlemon_Harry_Frank_on_the_Prairie_PG_42101_0.txt
	0.0751449936705 Ballantyne_R_M_Robert_Michael_Fighting_the_Whales_PG_21202.txt
	0.0753385842906 Howells_William_Dean_The_Flight_of_Pony_Baker_A_Boy_s_Town_Story_PG_22219.txt
	0.077638987205 Page_Thomas_Nelson_Two_Little_Confederates_PG_26725.txt
	0.0812422158281 Aikin_Lucy_Defoe_Daniel_Robinson_Crusoe_in_Words_of_One_Syllable_PG_6936.txt
	0.0891208545318 Seton_Ernest_Thompson_The_Trail_of_the_Sandhill_Stag_PG_32319.txt
	0.0912232928823 Malory_Thomas_Sir_King_Arthur_and_the_Knights_of_the_Round_Table_PG_36462_8.txt
	0.0959656310778 Jackson_Helen_Hunt_Mammy_Tittleback_and_Her_Family_A_True_Story_of_Seventeen_Cats_PG_33240.txt
	0.0969619683226 Alcott_Louisa_May_Pratt_Anna_Bronson_Alcott_Comic_Tragedies_Written_by_PG_33986.txt

	mean 0.0391048616374 median 0.0372237469866 std 0.0103635454567 less 1 std 0.0287413161808
Cityblock

	0.574124057822 Bront_Charlotte_Shirley_PG_30486.txt
	0.6128181201 Bront_Charlotte_Villette_PG_9182.txt
	0.676367163728 Bront_Charlotte_The_Professor_PG_1028.txt
	0.695358789743 Dickens_Charles_Dombey_and_Son_PG_821.txt
	0.70037711091 Dickens_Charles_David_Copperfield_PG_766.txt
	0.702059809761 Dickens_Charles_The_Personal_History_of_David_Copperfield_PG_43111.txt
	0.704994238847 Harland_Marion_Alone_PG_46505.txt
	0.706455526935 Bront_Emily_Wuthering_Heights_PG_768.txt
	0.715822754499 Gaskell_Elizabeth_Cleghorn_Ruth_PG_4275.txt
	0.717453640861 Dickens_Charles_Little_Dorrit_PG_963.txt

	0.786523244186 (38) Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt
	0.83218783419 (97) Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	0.836019025155 (105) Marlitt_OMS_Wister translation_cleaned_110617.txt
	0.862575081625 (168) Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.953904288782 (454) Marlitt_Wister_Baliff.txt
	0.971104879732 (526) Marlitt_Wister_Gisela.txt
	1.02651618842 (690) Marlitt_Wister_Schillingscourt.txt
	1.06440630141 (775) Marlitt_Wister_Owls.txt
	1.21455417824 (1005) Marlitt_Wister_Rubies.txt

	1.36304220185 Alcott_Louisa_May_Pratt_Anna_Bronson_Alcott_Comic_Tragedies_Written_by_PG_33986.txt
	1.3651526006 Castlemon_Harry_Frank_in_the_Woods_PG_42307_8.txt
	1.36537511079 Castlemon_Harry_Frank_on_the_Prairie_PG_42101_0.txt
	1.36908005461 Aikin_Lucy_Defoe_Daniel_Robinson_Crusoe_in_Words_of_One_Syllable_PG_6936.txt
	1.38662305492 Eliot_George_How_Lisa_Loved_the_King_PG_20813.txt
	1.39258555283 Billings_Josh_Josh_Billings_on_Ice_and_Other_Things_PG_41025.txt
	1.42316055668 Jackson_Helen_Hunt_Mammy_Tittleback_and_Her_Family_A_True_Story_of_Seventeen_Cats_PG_33240.txt
	1.42338054224 Dunne_Finley_Peter_Mr_Dooley_in_Peace_and_in_War_PG_22537.txt
	1.43071748584 Seton_Ernest_Thompson_The_Biography_of_a_Grizzly_PG_9330.txt
	1.46825280318 Seton_Ernest_Thompson_The_Trail_of_the_Sandhill_Stag_PG_32319.txt

	mean 0.998631273745 median 0.980326931605 std 0.141098116217 less 1 std 0.857533157527
MARLITT_E_EUGENIE_GOLD_ELSIE_PG_42426.TXT

Cosine

	0.13692903909 Marlitt_OMS_Wister translation_cleaned_110617.txt
	0.143602655891 Schubin_Ossip_O_Thou_My_Austria_PG_35454.txt
	0.151484419089 Bethusy_Huc_Valeska_Grfin_von_The_Eichhofs_A_Romance_PG_35311_8.txt
	0.154315789662 Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	0.154956574989 Glmer_Claire_von_A_Noble_Name_or_Dnninghausen_PG_36550.txt
	0.157249031113 Marlitt_Wister_Gisela.txt
	0.158518767532 Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.159400671679 The_Second_Wife_Wister_corrected.txt
	0.166313811959 Hillern_Wilhelmine_von_Only_a_Girl_or_A_Physician_for_the_Soul_PG_36709_8.txt
	0.172018401441 Dickens_Charles_Dombey_and_Son_PG_821.txt

	0.13692903909 (0) Marlitt_OMS_Wister translation_cleaned_110617.txt
	0.154315789662 (3) Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	0.157249031113 (5) Marlitt_Wister_Gisela.txt
	0.158518767532 (6) Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.178462632033 (10) Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
	0.189398789326 (27) Marlitt_Wister_Owls.txt
	0.212911394065 (58) Marlitt_Wister_Schillingscourt.txt
	0.213594916094 (60) Marlitt_Wister_Baliff.txt
	0.263551417569 (223) Marlitt_Wister_Rubies.txt

	0.648594528319 Ballantyne_R_M_Robert_Michael_Fighting_the_Whales_PG_21202.txt
	0.651041418966 Optic_Oliver_A_Victorious_Union_PG_18678.txt
	0.659719999924 Jackson_Helen_Hunt_Mammy_Tittleback_and_Her_Family_A_True_Story_of_Seventeen_Cats_PG_33240.txt
	0.668314361255 Castlemon_Harry_Frank_in_the_Woods_PG_42307_8.txt
	0.668845749359 Optic_Oliver_All_Adrift_Or_The_Goldwing_Club_PG_25577.txt
	0.674985717897 Malory_Thomas_Sir_King_Arthur_and_the_Knights_of_the_Round_Table_PG_36462_8.txt
	0.683260343243 Alcott_Louisa_May_Pratt_Anna_Bronson_Alcott_Comic_Tragedies_Written_by_PG_33986.txt
	0.705145935733 Seton_Ernest_Thompson_The_Trail_of_the_Sandhill_Stag_PG_32319.txt
	0.705425825756 Castlemon_Harry_Frank_on_the_Prairie_PG_42101_0.txt
	0.714986736297 Billings_Josh_Josh_Billings_on_Ice_and_Other_Things_PG_41025.txt

	mean 0.357442290273 median 0.334874147338 std 0.110447627388 less 1 std 0.246994662885
Euclidean

	0.0245155291484 Schubin_Ossip_O_Thou_My_Austria_PG_35454.txt
	0.026384593435 Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	0.0266934595416 Marlitt_OMS_Wister translation_cleaned_110617.txt
	0.0268068122452 Bront_Charlotte_Shirley_PG_30486.txt
	0.0268570691017 Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
	0.0269228740509 Harland_Marion_Alone_PG_46505.txt
	0.0272111841395 Bront_Charlotte_Villette_PG_9182.txt
	0.0272212224496 Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.0272264733487 Schubin_Ossip_Erlach_Court_PG_35541.txt
	0.0273488748788 Marlitt_Wister_Gisela.txt

	0.026384593435 (1) Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	0.0266934595416 (2) Marlitt_OMS_Wister translation_cleaned_110617.txt
	0.0268570691017 (4) Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
	0.0272212224496 (7) Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.0273488748788 (9) Marlitt_Wister_Gisela.txt
	0.0313093517033 (56) Marlitt_Wister_Owls.txt
	0.031383511276 (58) Marlitt_Wister_Schillingscourt.txt
	0.0318617637897 (69) Marlitt_Wister_Baliff.txt
	0.0368013214966 (322) Marlitt_Wister_Rubies.txt

	0.0769901646032 Howells_William_Dean_The_Flight_of_Pony_Baker_A_Boy_s_Town_Story_PG_22219.txt
	0.0771511139766 Castlemon_Harry_Frank_on_the_Prairie_PG_42101_0.txt
	0.07761287507 Optic_Oliver_All_Adrift_Or_The_Goldwing_Club_PG_25577.txt
	0.0779944747005 Ballantyne_R_M_Robert_Michael_Fighting_the_Whales_PG_21202.txt
	0.079001733115 Page_Thomas_Nelson_Two_Little_Confederates_PG_26725.txt
	0.0834456933946 Aikin_Lucy_Defoe_Daniel_Robinson_Crusoe_in_Words_of_One_Syllable_PG_6936.txt
	0.090909814497 Malory_Thomas_Sir_King_Arthur_and_the_Knights_of_the_Round_Table_PG_36462_8.txt
	0.0925782269625 Seton_Ernest_Thompson_The_Trail_of_the_Sandhill_Stag_PG_32319.txt
	0.0975311345646 Alcott_Louisa_May_Pratt_Anna_Bronson_Alcott_Comic_Tragedies_Written_by_PG_33986.txt
	0.0979575138769 Jackson_Helen_Hunt_Mammy_Tittleback_and_Her_Family_A_True_Story_of_Seventeen_Cats_PG_33240.txt

	mean 0.0430195248037 median 0.0409811606206 std 0.0100044614926 less 1 std 0.0330150633111
Cityblock

	0.673693179437 Marlitt_OMS_Wister translation_cleaned_110617.txt
	0.679106783682 Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	0.702937477576 Hillern_Wilhelmine_von_Only_a_Girl_or_A_Physician_for_the_Soul_PG_36709_8.txt
	0.703982949553 The_Second_Wife_Wister_corrected.txt
	0.732820535005 Schubin_Ossip_O_Thou_My_Austria_PG_35454.txt
	0.73327910521 Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.736400794534 Glmer_Claire_von_A_Noble_Name_or_Dnninghausen_PG_36550.txt
	0.772638884187 Spielhagen_Friedrich_Hammer_and_Anvil_A_Novel_PG_34868_8.txt
	0.773066254612 Bethusy_Huc_Valeska_Grfin_von_The_Eichhofs_A_Romance_PG_35311_8.txt
	0.786523244186 Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	0.673693179437 (0) Marlitt_OMS_Wister translation_cleaned_110617.txt
	0.679106783682 (1) Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	0.73327910521 (5) Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	0.786523244186 (9) Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
	0.805716186705 (13) Marlitt_Wister_Gisela.txt
	0.845025213532 (37) Marlitt_Wister_Baliff.txt
	0.88931875075 (100) Marlitt_Wister_Schillingscourt.txt
	0.921636978148 (171) Marlitt_Wister_Owls.txt
	1.09485169287 (741) Marlitt_Wister_Rubies.txt

	1.36787901512 Mitchell_S_Weir_Silas_Weir_Mr_Kris_Kringle_A_Christmas_Tale_PG_20180.txt
	1.39226108681 Castlemon_Harry_Frank_on_the_Prairie_PG_42101_0.txt
	1.39254261467 Otis_James_Off_Santiago_with_Sampson_PG_43420.txt
	1.40027107393 Aikin_Lucy_Defoe_Daniel_Robinson_Crusoe_in_Words_of_One_Syllable_PG_6936.txt
	1.42946702668 Jackson_Helen_Hunt_Mammy_Tittleback_and_Her_Family_A_True_Story_of_Seventeen_Cats_PG_33240.txt
	1.43143720117 Carruth_Hayden_The_Voyage_of_the_Rattletrap_PG_16586.txt
	1.4364088821 Seton_Ernest_Thompson_The_Biography_of_a_Grizzly_PG_9330.txt
	1.4528747819 Billings_Josh_Josh_Billings_on_Ice_and_Other_Things_PG_41025.txt
	1.47386290606 Dunne_Finley_Peter_Mr_Dooley_in_Peace_and_in_War_PG_22537.txt
	1.50102554267 Seton_Ernest_Thompson_The_Trail_of_the_Sandhill_Stag_PG_32319.txt

	mean 1.04759707033 median 1.02818479985 std 0.135801093177 less 1 std 0.911795977149

Cophenetic correlation

Please note that I am no longer running this cell; I keep it here only in case I need to go back to it at some point.

The cophenetic correlation (see https://en.wikipedia.org/wiki/Cophenetic_correlation) "is a measure of how faithfully a dendrogram preserves the pairwise distances between the original unmodeled data points."

In other words, how much of the information does the dendrogram below flatten? I want to pick the best (i.e., highest-scoring) linkage type when making the dendrogram.

In [52]:
from scipy.cluster.hierarchy import cophenet
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import dendrogram, linkage

def find_best_linkage_type(matrix, pdist_matrix, linkage_type):

    Z = linkage(matrix, linkage_type)
    c, coph_dists = cophenet(Z, pdist_matrix)

    print linkage_type, 'cophenet c', c
    
# ----------------------------------------------------
        
pdist_matrix = pdist(matrix)

#find_best_linkage_type(matrix, pdist_matrix, 'ward')
#find_best_linkage_type(matrix, pdist_matrix, 'single')
#find_best_linkage_type(matrix, pdist_matrix, 'complete')
#find_best_linkage_type(matrix, pdist_matrix, 'average')
#find_best_linkage_type(matrix, pdist_matrix, 'weighted')
#find_best_linkage_type(matrix, pdist_matrix, 'centroid')
#find_best_linkage_type(matrix, pdist_matrix, 'median')
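For readers who want to see the comparison run end to end, here is a small self-contained sketch of the same idea on random stand-in data. The names `toy_matrix`, `toy_pdist`, and `scores` are assumptions for illustration only, the numbers it prints say nothing about the real corpus, and the syntax is Python 3.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# Random stand-in for the document-term matrix: 20 "texts", 5 "words".
rng = np.random.RandomState(0)
toy_matrix = rng.rand(20, 5)

# Condensed pairwise Euclidean distances (pdist's default metric).
toy_pdist = pdist(toy_matrix)

scores = {}
for linkage_type in ['single', 'complete', 'average', 'weighted',
                     'centroid', 'median', 'ward']:
    Z = linkage(toy_matrix, linkage_type)
    # cophenet returns the correlation and the cophenetic distances.
    c, _ = cophenet(Z, toy_pdist)
    scores[linkage_type] = c

# Highest cophenetic correlation first.
for linkage_type, c in sorted(scores.items(), key=lambda item: -item[1]):
    print(linkage_type, round(c, 3))
```

Both `linkage` and `pdist` default to Euclidean distance here, as they do in the cell above, so the scores describe a Euclidean clustering rather than the cityblock distances used elsewhere in the notebook.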

Cluster the matrix

Please note that I am no longer running this cell; I keep it here only in case I need to go back to it at some point.

I include the code for this only because it is generally expected; I don't much care for the results, even though I tried the linkage types with the higher cophenetic correlations (average, centroid, single, median). I therefore don't include the visualizations, which are also large enough to make the notebook cumbersome.

Why don't I like the results?

  • The clusters end up being author-centric; i.e., all of Dickens clusters together, etc.
  • Some relationships (for example, between Gold Elsie and JE) aren't as clear in the clustering as they are in the information presented above.
  • I think this is in part because the corpus is skewed toward books like JE (see the shapes of the graphs in the information presented above); as a result, small differences matter, and clustering tends to erase them.
In [13]:
%matplotlib inline
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist
import numpy as np

def draw_cluster_diagram(matrix, row_labels, linkage_type):
    
    print
    print '****', linkage_type
    print

    Z = linkage(matrix, linkage_type)

    plt.figure(figsize=(10, 500))
    plt.title('ALL WORDS')
    plt.xlabel('')
    plt.ylabel('distance')
    dendrogram(
        Z,
        orientation='left',
        labels=row_labels,
        leaf_font_size=20,
        color_threshold=0.02
    )
    plt.show()
    
# ----------------------------------------------------
        
new_labels = []
for a in labels:
    if 'Jane_Eyre' in a or 'Marlitt' in a:
        new_labels.append('****** ' + a)
    else:
        new_labels.append(a)

#draw_cluster_diagram(matrix, new_labels, 'average')
#draw_cluster_diagram(matrix, new_labels, 'centroid')
#draw_cluster_diagram(matrix, new_labels, 'single')
#draw_cluster_diagram(matrix, new_labels, 'median')

Compute distances

I compute everything-to-everything distances. I use the city block method because I think that it will make it easier for me to later determine which words are contributing to distance.

In [14]:
import codecs, json
from scipy.spatial.distance import cityblock

distances = []
    
for a in range(0, len(matrix) - 1):
    
    if a % 100 == 0:
        print 'processing', a
    
    for b in range(a + 1, len(matrix)):
        distances.append([cityblock(matrix[a], matrix[b]), a, labels[a], b, labels[b]])

f = codecs.open('distances.js', 'w', encoding='utf-8')
f.write(json.dumps(distances))
f.close()
processing 0
processing 100
processing 200
processing 300
processing 400
processing 500
processing 600
processing 700
processing 800
processing 900
processing 1000
processing 1100
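The same everything-to-everything distances can probably be computed in a single call with `scipy.spatial.distance.pdist`, which returns the pairs in the same order as the nested loop (row 0 against rows 1, 2, ..., then row 1 against rows 2, 3, ...). This is a sketch on a small made-up matrix, not a drop-in replacement: `toy_matrix`, `toy_labels`, and `toy_distances` are illustrative stand-ins for `matrix`, `labels`, and `distances`, and the syntax is Python 3.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import cityblock, pdist

toy_matrix = np.array([
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.1, 0.1, 0.8],
    [0.3, 0.3, 0.4],
])
toy_labels = ['text_a', 'text_b', 'text_c', 'text_d']

# One vectorised call; the result is a flat array with one entry per pair.
condensed = pdist(toy_matrix, metric='cityblock')

# combinations() yields (0, 1), (0, 2), ..., matching pdist's ordering.
pairs = list(combinations(range(len(toy_matrix)), 2))
toy_distances = [[float(d), a, toy_labels[a], b, toy_labels[b]]
                 for d, (a, b) in zip(condensed, pairs)]

# Cross-check against the pairwise function used in the loop.
for d, a, _, b, _ in toy_distances:
    assert np.isclose(d, cityblock(toy_matrix[a], toy_matrix[b]))

print(len(toy_distances))
```

Converting each distance with `float()` keeps the records serialisable with `json.dumps`, which does not accept NumPy scalar types.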

Reload distances . . .

. . . so I can restart the notebook here, if necessary.

In [15]:
import codecs, json

f = codecs.open('distances.js', 'r', encoding='utf-8')
distances = json.loads(f.read())
f.close()

What do the distances look like?

Useful things accomplished here:

  • The distances appear to follow a roughly normal distribution, so mean and standard deviation should be useful tools for understanding the data.
  • I separate out 9,610 "very close" novel-to-novel relationships; "very close" is defined here as "any distance at or below the mean minus two times the standard deviation".
  • I list out the 25 novels which have the most "very close" relationships (e.g., Ward_Humphry_Mrs_Marcella_PG_13728_8.txt has 231 such relationships, as does Dickens_Charles_Dombey_and_Son_PG_821.txt).
  • I list out the novels which are very close to Jane Eyre and to the Marlitt novels.
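The "very close" cut-off described in the list above can be sketched on a handful of made-up records in the notebook's `[distance, index_a, label_a, index_b, label_b]` format. The values and labels below are invented for illustration, the `toy_` names are assumptions, and the syntax is Python 3.

```python
from collections import Counter
from itertools import combinations

import numpy as np

# Ten made-up pairwise distances among five hypothetical texts.
toy_values = [0.60, 1.08, 1.10, 1.12, 1.09, 1.11, 1.10, 1.13, 1.07, 1.10]
toy_labels = ['text_a', 'text_b', 'text_c', 'text_d', 'text_e']
toy_distances = [[v, a, toy_labels[a], b, toy_labels[b]]
                 for v, (a, b) in zip(toy_values, combinations(range(5), 2))]

# Threshold: mean minus two (population) standard deviations.
values = np.array([d[0] for d in toy_distances])
threshold = values.mean() - 2 * values.std()

# Keep only the pairs at or below the threshold.
very_close = [d for d in toy_distances if d[0] <= threshold]

# Count how many "very close" relationships each text takes part in.
counts = Counter()
for _, _, label_a, _, label_b in very_close:
    counts[label_a] += 1
    counts[label_b] += 1

print(round(threshold, 3))
print(very_close)
print(counts.most_common())
```

Because the threshold depends on the mean and standard deviation of all pairs, adding or removing texts from the corpus shifts which relationships count as "very close".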

Notice that nothing is very close to Marlitt's Owl's Nest, Rubies, or Schillingscourt. I do not believe that this is really so; instead, I think we're observing the result of problems parsing these texts.

What did I learn in this step?

  • The presence of Dickens in the "very close" relationships needs thinking through (and generally, the consequence of there being more or less of any author in the corpus). I like, for example, that a couple of Gaskell novels figure importantly in the "very close" relationships, since I can explain her as a transmitter of a Brontean aesthetic; but Dickens, because he's roughly a contemporary of Bronte, complicates the question of "effect". In fact, his presence in the "very close" relationships makes me wonder if, instead of a "Jane Eyre Effect", we're seeing a "rise of the English periodical effect."
  • Everyone, it seems, has a novel that is "very close" to JE. It's important to note, however, that if I pulled the same list for Dombey, Bleak_House, or John Halifax, we could make the same observation.
  • The lists of novels close to the Marlitt novels are much more interesting, since they point toward the mixing of English and German in translation.
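One possible way to start thinking through author representation is to set each author's total of "very close" relationships against the number of that author's texts. The sketch below is rough and rests on a loud assumption: that the first underscore-separated token of a label identifies the author. That is crude (it would merge the three Brontes, for instance). The labels and counts are invented, `author_key` is a hypothetical helper, and the syntax is Python 3.

```python
from collections import Counter, defaultdict

def author_key(label):
    """Crude author key: the first underscore-separated token (an assumption)."""
    return label.split('_')[0]

# Hypothetical per-text counts of "very close" relationships.
toy_very_close_counts = {
    'AuthorA_First_Novel.txt': 12,
    'AuthorA_Second_Novel.txt': 10,
    'AuthorA_Third_Novel.txt': 8,
    'AuthorB_Only_Novel.txt': 9,
    'AuthorC_First_Novel.txt': 2,
    'AuthorC_Second_Novel.txt': 0,
}

# Number of texts per author.
texts_per_author = Counter(author_key(label) for label in toy_very_close_counts)

# Total "very close" relationships per author.
total_per_author = defaultdict(int)
for label, n in toy_very_close_counts.items():
    total_per_author[author_key(label)] += n

# Mean number of "very close" relationships per text, by author.
mean_per_text = {author: total_per_author[author] / float(texts_per_author[author])
                 for author in texts_per_author}

for author in sorted(mean_per_text, key=mean_per_text.get, reverse=True):
    print(author, texts_per_author[author], total_per_author[author],
          mean_per_text[author])
```

In the notebook itself, the number of texts per author would need to come from `labels`, since `very_close_counts` only contains texts with at least one such relationship.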

At this point, I might question the composition of the corpus. To review, the corpus consists of fiction which circulated in the Muncie Public Library in the 1890s, and for which we were able to find good (i.e., from PG) electronic copies.

We could, of course, compose a corpus which draws JE-Marlitt comparisons more sharply. A ridiculous example: a corpus of JE, Marlitt, and Frank novels, and nothing else. But to do so, even non-ridiculously, would, I think, misrepresent this kind of similarity between JE and the fiction it was read with.

In [16]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import defaultdict, Counter

def print_very_close(novel_label, lower_distance):
        
    print
    print 'VERY CLOSE TO', novel_label.upper()
    print
    for d in distances:
        if d[0] <= lower_distance:
            if novel_label in d[2]:
                if 'Jane_Eyre' in d[4] or 'Marlitt' in d[4]:
                    print '\t', '****', d[4]
                else:
                    print '\t', d[4]
            if novel_label in d[4]:
                if 'Jane_Eyre' in d[2] or 'Marlitt' in d[2]:
                    print '\t', '****', d[2]
                else:
                    print '\t', d[2]

graph_distances = []
for d in distances:
    graph_distances.append(d[0])

sns.set(color_codes=True)
plt.rcParams['figure.figsize']=(15,5)

ax = sns.distplot(graph_distances, bins=100)
ax.set(xlabel='CITYBLOCK', ylabel='n texts')
plt.show()

print
print 'mean', np.mean(graph_distances),  \
        'median', np.median(graph_distances),  \
        'std', np.std(graph_distances), \
        'less 1 std', (np.mean(graph_distances) - (1 * np.std(graph_distances))), \
        'less 2 std', (np.mean(graph_distances) - (2 * np.std(graph_distances)))
        
n_very_close = 0
very_close_counts = defaultdict(int)

lower_distance = (np.mean(graph_distances) - (2 * np.std(graph_distances)))

very_close_distances = []

for d in distances:
    if d[0] <= lower_distance:
        n_very_close += 1
        very_close_counts[d[2]] += 1
        very_close_counts[d[4]] += 1
        very_close_distances.append(d)

print
print 'n_very_close', n_very_close

print
for w in Counter(very_close_counts).most_common(25):
    if 'Jane_Eyre' in w[0] or 'Marlitt' in w[0]:
        print '****', w[0], w[1]
    else:
        print w[0], w[1]
        
print_very_close('Bront_Charlotte_Jane_Eyre', lower_distance)

print_very_close('Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt', lower_distance)
print_very_close('Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt', lower_distance)
print_very_close('Marlitt_OMS_Wister translation_cleaned_110617.txt', lower_distance)
print_very_close('Marlitt_Wister_Baliff.txt', lower_distance)
print_very_close('Marlitt_Wister_Gisela.txt', lower_distance)
print_very_close('Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt', lower_distance)
print_very_close('Marlitt_Wister_Owls.txt', lower_distance)
print_very_close('Marlitt_Wister_Rubies.txt', lower_distance)
print_very_close('Marlitt_Wister_Schillingscourt.txt', lower_distance)
mean 1.12718750518 median 1.12302060164 std 0.135773081012 less 1 std 0.991414424169 less 2 std 0.855641343157

n_very_close 9610

Ward_Humphry_Mrs_Marcella_PG_13728_8.txt 231
Dickens_Charles_Dombey_and_Son_PG_821.txt 231
Dickens_Charles_Little_Dorrit_PG_963.txt 222
Dickens_Charles_The_Personal_History_of_David_Copperfield_PG_43111.txt 215
Dickens_Charles_David_Copperfield_PG_766.txt 210
Grand_Sarah_The_Heavenly_Twins_PG_8676_8.txt 207
Dickens_Charles_Bleak_House_PG_1023.txt 205
Spielhagen_Friedrich_Hammer_and_Anvil_A_Novel_PG_34868_8.txt 200
Ward_Humphry_Mrs_Robert_Elsmere_PG_8737.txt 199
Gaskell_Elizabeth_Cleghorn_Ruth_PG_4275.txt 195
Gaskell_Elizabeth_Cleghorn_North_and_South_PG_4276.txt 185
Robins_Elizabeth_The_Open_Question_A_Tale_of_Two_Temperaments_PG_37827.txt 182
Woolson_Constance_Fenimore_Anne_A_Novel_PG_32707_0.txt 172
Dickens_Charles_Martin_Chuzzlewit_PG_968.txt 170
Dickens_Charles_Our_Mutual_Friend_PG_883.txt 160
Yonge_Charlotte_M_Charlotte_Mary_The_Heir_of_Redclyffe_PG_2505_8.txt 154
Auerbach_Berthold_Villa_Eden_The_Country_House_on_the_Rhine_PG_32902_8.txt 151
Hillern_Wilhelmine_von_Only_a_Girl_or_A_Physician_for_the_Soul_PG_36709_8.txt 151
**** Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt 150
Craik_Dinah_Maria_Mulock_John_Halifax_Gentleman_PG_2351.txt 149
Crawford_F_Marion_Francis_Marion_Saracinesca_PG_13757_8.txt 148
Alexander_Mrs_A_Crooked_Path_A_Novel_PG_18418.txt 144
Bront_Charlotte_Shirley_PG_30486.txt 143
Crawford_F_Marion_Francis_Marion_Sant_Ilario_PG_5227.txt 143
Wood_Henry_Mrs_East_Lynne_PG_3322.txt 139

VERY CLOSE TO BRONT_CHARLOTTE_JANE_EYRE

	Evans_Augusta_J_Augusta_Jane_Macaria_PG_27811_8.txt
	Crawford_F_Marion_Francis_Marion_Taquisara_PG_11050_8.txt
	Craik_Dinah_Maria_Mulock_John_Halifax_Gentleman_PG_2351.txt
	Ward_Humphry_Mrs_Robert_Elsmere_PG_8737.txt
	Roe_Edward_Payson_From_Jest_to_Earnest_PG_6102.txt
	Collins_Wilkie_Armadale_PG_1895.txt
	Whyte_Melville_G_J_George_John_The_Interpreter_A_Tale_of_the_War_PG_40660_0.txt
	Fleming_May_Agnes_A_Changed_Heart_A_Novel_PG_41672.txt
	Crawford_F_Marion_Francis_Marion_Sant_Ilario_PG_5227.txt
	Gaskell_Elizabeth_Cleghorn_Ruth_PG_4275.txt
	Cholmondeley_Mary_Red_Pottage_PG_14885.txt
	**** Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	Evans_Augusta_J_Augusta_Jane_Vashti_Or_Until_Death_Us_Do_Part_PG_31620.txt
	Black_William_In_Silk_Attire_A_Novel_PG_40111_0.txt
	Stephens_Ann_S_Ann_Sophia_The_Old_Homestead_PG_8078.txt
	Carey_Rosa_Nouchette_Uncle_Max_PG_16080.txt
	Ingelow_Jean_Fated_to_Be_Free_A_Novel_PG_12303.txt
	Collins_Wilkie_The_Woman_in_White_PG_583.txt
	Eliot_George_Felix_Holt_the_Radical_PG_40882.txt
	Hardy_Thomas_Far_from_the_Madding_Crowd_PG_27.txt
	Collins_Wilkie_After_Dark_PG_1626.txt
	Dickens_Charles_Martin_Chuzzlewit_PG_968.txt
	Irving_Washington_Bracebridge_Hall_or_The_Humorists_PG_13515.txt
	Weyman_Stanley_John_Sophia_A_Romance_PG_39168.txt
	Fothergill_Jessie_The_First_Violin_A_Novel_PG_29219_8.txt
	Yonge_Charlotte_M_Charlotte_Mary_The_Chaplet_of_Pearls_PG_5274.txt
	Reade_Charles_A_Woman_Hater_PG_3669.txt
	Maartens_Maarten_My_Lady_Nobody_A_Novel_PG_49903_0.txt
	Ebers_Georg_The_Bride_of_the_Nile_Complete_PG_5529.txt
	Yonge_Charlotte_M_Charlotte_Mary_Dynevor_Terrace_Or_The_Clue_of_Life_Volume_PG_4235.txt
	Lytton_Edward_Bulwer_Lytton_Baron_The_Caxtons_A_Family_Picture_Complete_PG_7605.txt
	Dickens_Charles_Hard_Times_PG_786_0.txt
	Sue_Eugne_The_Wandering_Jew_Complete_PG_3350.txt
	Dickens_Charles_Sketches_by_Boz_Illustrative_of_Every_Day_Life_and_Every_Day_People_PG_882.txt
	Dickens_Charles_Barnaby_Rudge_A_Tale_of_the_Riots_of_Eighty_PG_917.txt
	Sheppard_Elizabeth_Sara_Charles_Auchester_Volume_of_PG_38949_8.txt
	Crawford_F_Marion_Francis_Marion_Saracinesca_PG_13757_8.txt
	Yonge_Charlotte_M_Charlotte_Mary_The_Young_Step_Mother_Or_A_Chronicle_of_Mistakes_PG_5843.txt
	Gaskell_Elizabeth_Cleghorn_North_and_South_PG_4276.txt
	Haggard_H_Rider_Henry_Rider_Beatrice_PG_3096.txt
	Crawford_F_Marion_Francis_Marion_Greifenstein_PG_6446.txt
	Alexander_Mrs_Ralph_Wilton_s_weird_PG_41740.txt
	Gaskell_Elizabeth_Cleghorn_Mary_Barton_PG_2153.txt
	Caine_Hall_Sir_The_Christian_A_Story_PG_8407_8.txt
	Hornung_E_W_Ernest_William_Peccavi_PG_36115_8.txt
	Dickens_Charles_Bleak_House_PG_1023.txt
	Gaskell_Elizabeth_Cleghorn_Sylvia_s_Lovers_Complete_PG_4537.txt
	Lyall_Edna_We_Two_A_Novel_PG_2007.txt
	Kingsley_Henry_Ravenshoe_PG_41636_8.txt
	James_G_P_R_George_Payne_Rainsford_Arabella_Stuart_A_Romance_from_English_History_PG_49468_8.txt
	Von_Arnim_Elizabeth_The_Benefactress_PG_30302_8.txt
	Corelli_Marie_The_Sorrows_of_Satan_or_The_Strange_Experience_of_One_Ge_PG_42332_8.txt
	Yonge_Charlotte_M_Charlotte_Mary_Chantry_House_PG_7378_0.txt
	Whitney_A_D_T_Adeline_Dutton_Train_Faith_Gartney_s_Girlhood_PG_18896.txt
	Robins_Elizabeth_The_Open_Question_A_Tale_of_Two_Temperaments_PG_37827.txt
	Hentz_Caroline_Lee_Helen_and_Arthur_or_Miss_Thusa_s_Spinning_Wheel_PG_23106_8.txt
	Dickens_Charles_The_Personal_History_of_David_Copperfield_PG_43111.txt
	Lytton_Edward_Bulwer_Lytton_Baron_A_Strange_Story_Complete_PG_7701.txt
	Eliot_George_Middlemarch_PG_145.txt
	Ward_Humphry_Mrs_Helbeck_of_Bannisdale_Volume_II_PG_9442_8.txt
	Lytton_Edward_Bulwer_Lytton_Baron_Pelham_Complete_PG_7623.txt
	Ward_Humphry_Mrs_Eleanor_PG_9087_8.txt
	Auerbach_Berthold_Villa_Eden_The_Country_House_on_the_Rhine_PG_32902_8.txt
	Dickens_Charles_The_Posthumous_Papers_of_the_Pickwick_Club_v_of_PG_47534_8.txt
	Stephens_Ann_S_Ann_Sophia_The_Gold_Brick_PG_34500.txt
	Yonge_Charlotte_M_Charlotte_Mary_The_Heir_of_Redclyffe_PG_2505_8.txt
	Bront_Emily_Wuthering_Heights_PG_768.txt
	Lever_Charles_James_Confessions_Of_Con_Cregan_the_Irish_Gil_Blas_PG_32060.txt
	Alcott_Louisa_May_Moods_PG_28203.txt
	Reade_Charles_Love_Me_Little_Love_Me_Long_PG_4607.txt
	Lever_Charles_James_Charles_O_Malley_The_Irish_Dragoon_Volume_PG_8577_8.txt
	Bront_Charlotte_Shirley_PG_30486.txt
	Hay_John_The_Bread_winners_A_Social_Study_PG_16321.txt
	Dickens_Charles_Little_Dorrit_PG_963.txt
	Duchess_Airy_Fairy_Lilian_PG_35228.txt
	Evans_Augusta_J_Augusta_Jane_St_Elmo_PG_4553.txt
	Lawrence_George_A_George_Alfred_Guy_Livingstone_or_Thorough_PG_17084.txt
	Stephens_Ann_S_Ann_Sophia_Fashion_and_Famine_PG_40114_8.txt
	Dickens_Charles_Our_Mutual_Friend_PG_883.txt
	Meredith_George_Diana_of_the_Crossways_Complete_PG_4470.txt
	Bront_Anne_The_Tenant_of_Wildfell_Hall_PG_969_0.txt
	MacDonald_George_David_Elginbrod_PG_2291.txt
	Eliot_George_Adam_Bede_PG_507.txt
	Alexander_Mrs_A_Crooked_Path_A_Novel_PG_18418.txt
	Harland_Marion_Alone_PG_46505.txt
	Dickens_Charles_A_Tale_of_Two_Cities_PG_98_8.txt
	Spielhagen_Friedrich_What_the_Swallow_Sang_A_Novel_PG_34599.txt
	Porter_Jane_Thaddeus_of_Warsaw_PG_6566.txt
	Sand_George_Mauprat_PG_2194.txt
	Eliot_George_The_Mill_on_the_Floss_PG_6688.txt
	Eliot_George_Daniel_Deronda_PG_7469.txt
	Hawthorne_Nathaniel_Twice_Told_Tales_PG_13707.txt
	Dickens_Charles_Dombey_and_Son_PG_821.txt
	Crawford_F_Marion_Francis_Marion_Casa_Braccio_Volumes_and_PG_26327_8.txt
	Castle_Egerton_The_Light_of_Scarthey_A_Romance_PG_26045_8.txt
	Harte_Bret_The_Luck_of_Roaring_Camp_and_Other_Tales_With_Condensed_Nov_PG_6373.txt
	Bront_Charlotte_Villette_PG_9182.txt
	Evans_Augusta_J_Augusta_Jane_Infelice_PG_17718_8.txt
	Dickens_Charles_The_Mystery_of_Edwin_Drood_PG_564.txt
	Braddon_M_E_Mary_Elizabeth_The_Infidel_A_Story_of_the_Great_Revival_PG_50676_8.txt
	Schubin_Ossip_O_Thou_My_Austria_PG_35454.txt
	Stowe_Harriet_Beecher_Uncle_Tom_s_Cabin_PG_203.txt
	Spielhagen_Friedrich_Hammer_and_Anvil_A_Novel_PG_34868_8.txt
	Warner_Susan_The_Wide_Wide_World_PG_18689_8.txt
	Hillern_Wilhelmine_von_Only_a_Girl_or_A_Physician_for_the_Soul_PG_36709_8.txt
	Holmes_Mary_Jane_Lena_Rivers_PG_12835.txt
	Alcott_Louisa_May_Little_Women_PG_514_8.txt
	Dickens_Charles_Great_Expectations_PG_1400_8.txt
	Craik_Dinah_Maria_Mulock_Mistress_and_Maid_A_Household_Story_PG_13461.txt
	Craik_Dinah_Maria_Mulock_Olive_A_Novel_PG_22121_8.txt
	Hornung_E_W_Ernest_William_At_Large_PG_35684_8.txt
	Warner_Susan_Queechy_PG_8874_8.txt
	Bront_Charlotte_The_Professor_PG_1028.txt
	Hardy_Thomas_A_Pair_of_Blue_Eyes_PG_224.txt
	**** Marlitt_OMS_Wister translation_cleaned_110617.txt
	Collins_Wilkie_The_Dead_Secret_A_Novel_PG_43092.txt
	Stephens_Ann_S_Ann_Sophia_A_Noble_Woman_PG_30111_8.txt
	Glmer_Claire_von_A_Noble_Name_or_Dnninghausen_PG_36550.txt
	Wood_Henry_Mrs_East_Lynne_PG_3322.txt
	Woolson_Constance_Fenimore_Horace_Chase_PG_39067.txt
	Grand_Sarah_The_Heavenly_Twins_PG_8676_8.txt
	Eliot_George_Romola_PG_24020.txt
	Auerbach_Berthold_On_the_Heights_A_Novel_PG_33294.txt
	Alcott_Louisa_May_Work_A_Story_of_Experience_PG_4770.txt
	Collins_Wilkie_No_Name_PG_1438.txt
	Thackeray_William_Makepeace_Vanity_Fair_PG_599.txt
	Ward_Humphry_Mrs_Marcella_PG_13728_8.txt
	**** Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt
	Stoddard_Elizabeth_The_Morgesons_A_Novel_PG_12347.txt
	Carey_Rosa_Nouchette_Not_Like_Other_Girls_PG_28463.txt
	Corelli_Marie_Thelma_PG_3823_8.txt
	Reade_Charles_A_Terrible_Temptation_A_Story_of_To_Day_PG_7895.txt
	Wood_Henry_Mrs_Verner_s_Pride_PG_15627.txt
	Corelli_Marie_Temporal_Power_A_Study_in_Supremacy_PG_6921_8.txt
	Frederic_Harold_The_Damnation_of_Theron_Ware_PG_133.txt
	Holmes_Mary_Jane_Gretchen_A_Novel_PG_40702_0.txt
	Warner_Susan_Hills_of_the_Shatemuc_PG_16918.txt
	Reade_Charles_Hard_Cash_PG_3067.txt
	Holmes_Mary_Jane_Ethelyn_s_Mistake_PG_12104.txt
	Holmes_Oliver_Wendell_Elsie_Venner_PG_2696.txt
	MacDonald_George_The_Portent_and_Other_Stories_PG_8913.txt
	Hentz_Caroline_Lee_Ernest_Linwood_or_The_Inner_Life_of_the_Author_PG_20462.txt
	Schubin_Ossip_Erlach_Court_PG_35541.txt
	Roe_Edward_Payson_Barriers_Burned_Away_PG_6627.txt
	Dickens_Charles_The_Pickwick_Papers_PG_580.txt
	Corelli_Marie_The_Master_Christian_PG_4285.txt
	MacDonald_George_Wilfrid_Cumbermede_PG_9183.txt
	Warren_Samuel_Ten_Thousand_a_Year_Volume_PG_31004_8.txt
	Dickens_Charles_David_Copperfield_PG_766.txt
	Woolson_Constance_Fenimore_Anne_A_Novel_PG_32707_0.txt

VERY CLOSE TO MARLITT_E_EUGENIE_AT_THE_COUNCILLOR_S_OR_A_NAMELESS_HISTORY_PG_43393_0.TXT

	Evans_Augusta_J_Augusta_Jane_Vashti_Or_Until_Death_Us_Do_Part_PG_31620.txt
	The_Second_Wife_Wister_corrected.txt
	Werner_E_The_Alpine_Fay_A_Romance_PG_35229_8.txt
	Ward_Humphry_Mrs_Eleanor_PG_9087_8.txt
	Bethusy_Huc_Valeska_Grfin_von_The_Eichhofs_A_Romance_PG_35311_8.txt
	**** Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
	Dickens_Charles_Dombey_and_Son_PG_821.txt
	Evans_Augusta_J_Augusta_Jane_Infelice_PG_17718_8.txt
	Schubin_Ossip_O_Thou_My_Austria_PG_35454.txt
	**** Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	Spielhagen_Friedrich_Hammer_and_Anvil_A_Novel_PG_34868_8.txt
	**** Marlitt_Wister_Baliff.txt
	Hillern_Wilhelmine_von_Only_a_Girl_or_A_Physician_for_the_Soul_PG_36709_8.txt
	**** Marlitt_OMS_Wister translation_cleaned_110617.txt
	Stephens_Ann_S_Ann_Sophia_A_Noble_Woman_PG_30111_8.txt
	Glmer_Claire_von_A_Noble_Name_or_Dnninghausen_PG_36550.txt
	**** Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt
	Schubin_Ossip_Erlach_Court_PG_35541.txt
	Woolson_Constance_Fenimore_Anne_A_Novel_PG_32707_0.txt

VERY CLOSE TO MARLITT_E_EUGENIE_GOLD_ELSIE_PG_42426.TXT

	Evans_Augusta_J_Augusta_Jane_Macaria_PG_27811_8.txt
	Ward_Humphry_Mrs_Robert_Elsmere_PG_8737.txt
	Crawford_F_Marion_Francis_Marion_Sant_Ilario_PG_5227.txt
	Gaskell_Elizabeth_Cleghorn_Ruth_PG_4275.txt
	**** Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	Evans_Augusta_J_Augusta_Jane_Vashti_Or_Until_Death_Us_Do_Part_PG_31620.txt
	Fothergill_Jessie_The_First_Violin_A_Novel_PG_29219_8.txt
	Maartens_Maarten_My_Lady_Nobody_A_Novel_PG_49903_0.txt
	Ebers_Georg_The_Bride_of_the_Nile_Complete_PG_5529.txt
	The_Second_Wife_Wister_corrected.txt
	Sue_Eugne_The_Wandering_Jew_Complete_PG_3350.txt
	Streckfuss_Adolf_Castle_Hohenwald_A_Romance_PG_34892.txt
	Yonge_Charlotte_M_Charlotte_Mary_The_Young_Step_Mother_Or_A_Chronicle_of_Mistakes_PG_5843.txt
	Crawford_F_Marion_Francis_Marion_Greifenstein_PG_6446.txt
	Dickens_Charles_Bleak_House_PG_1023.txt
	Werner_E_The_Alpine_Fay_A_Romance_PG_35229_8.txt
	Von_Arnim_Elizabeth_The_Benefactress_PG_30302_8.txt
	Robins_Elizabeth_The_Open_Question_A_Tale_of_Two_Temperaments_PG_37827.txt
	Hentz_Caroline_Lee_Helen_and_Arthur_or_Miss_Thusa_s_Spinning_Wheel_PG_23106_8.txt
	Ward_Humphry_Mrs_Eleanor_PG_9087_8.txt
	Auerbach_Berthold_Villa_Eden_The_Country_House_on_the_Rhine_PG_32902_8.txt
	Yonge_Charlotte_M_Charlotte_Mary_The_Heir_of_Redclyffe_PG_2505_8.txt
	Bront_Charlotte_Shirley_PG_30486.txt
	Dickens_Charles_Little_Dorrit_PG_963.txt
	Bethusy_Huc_Valeska_Grfin_von_The_Eichhofs_A_Romance_PG_35311_8.txt
	Duchess_Airy_Fairy_Lilian_PG_35228.txt
	Evans_Augusta_J_Augusta_Jane_St_Elmo_PG_4553.txt
	Stephens_Ann_S_Ann_Sophia_Fashion_and_Famine_PG_40114_8.txt
	Bront_Anne_The_Tenant_of_Wildfell_Hall_PG_969_0.txt
	Harland_Marion_Alone_PG_46505.txt
	Spielhagen_Friedrich_What_the_Swallow_Sang_A_Novel_PG_34599.txt
	**** Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
	Dickens_Charles_Dombey_and_Son_PG_821.txt
	Castle_Egerton_The_Light_of_Scarthey_A_Romance_PG_26045_8.txt
	Bront_Charlotte_Villette_PG_9182.txt
	Evans_Augusta_J_Augusta_Jane_Infelice_PG_17718_8.txt
	Schubin_Ossip_O_Thou_My_Austria_PG_35454.txt
	**** Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	Spielhagen_Friedrich_Hammer_and_Anvil_A_Novel_PG_34868_8.txt
	**** Marlitt_Wister_Gisela.txt
	**** Marlitt_Wister_Baliff.txt
	Hillern_Wilhelmine_von_Only_a_Girl_or_A_Physician_for_the_Soul_PG_36709_8.txt
	**** Marlitt_OMS_Wister translation_cleaned_110617.txt
	Stephens_Ann_S_Ann_Sophia_A_Noble_Woman_PG_30111_8.txt
	Glmer_Claire_von_A_Noble_Name_or_Dnninghausen_PG_36550.txt
	Grand_Sarah_The_Heavenly_Twins_PG_8676_8.txt
	Auerbach_Berthold_On_the_Heights_A_Novel_PG_33294.txt
	Corelli_Marie_Thelma_PG_3823_8.txt
	MacDonald_George_The_Portent_and_Other_Stories_PG_8913.txt
	Schubin_Ossip_Erlach_Court_PG_35541.txt
	Dickens_Charles_David_Copperfield_PG_766.txt
	Woolson_Constance_Fenimore_Anne_A_Novel_PG_32707_0.txt

VERY CLOSE TO MARLITT_OMS_WISTER TRANSLATION_CLEANED_110617.TXT

	Evans_Augusta_J_Augusta_Jane_Macaria_PG_27811_8.txt
	Gaskell_Elizabeth_Cleghorn_Ruth_PG_4275.txt
	**** Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	Evans_Augusta_J_Augusta_Jane_Vashti_Or_Until_Death_Us_Do_Part_PG_31620.txt
	Stephens_Ann_S_Ann_Sophia_The_Old_Homestead_PG_8078.txt
	The_Second_Wife_Wister_corrected.txt
	Robins_Elizabeth_The_Open_Question_A_Tale_of_Two_Temperaments_PG_37827.txt
	Ward_Humphry_Mrs_Eleanor_PG_9087_8.txt
	Bethusy_Huc_Valeska_Grfin_von_The_Eichhofs_A_Romance_PG_35311_8.txt
	Evans_Augusta_J_Augusta_Jane_St_Elmo_PG_4553.txt
	Stephens_Ann_S_Ann_Sophia_Fashion_and_Famine_PG_40114_8.txt
	**** Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
	Dickens_Charles_Dombey_and_Son_PG_821.txt
	Crawford_F_Marion_Francis_Marion_Casa_Braccio_Volumes_and_PG_26327_8.txt
	Evans_Augusta_J_Augusta_Jane_Infelice_PG_17718_8.txt
	Schubin_Ossip_O_Thou_My_Austria_PG_35454.txt
	**** Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	**** Marlitt_Wister_Gisela.txt
	Hillern_Wilhelmine_von_Only_a_Girl_or_A_Physician_for_the_Soul_PG_36709_8.txt
	Glmer_Claire_von_A_Noble_Name_or_Dnninghausen_PG_36550.txt
	**** Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt
	Woolson_Constance_Fenimore_Anne_A_Novel_PG_32707_0.txt

VERY CLOSE TO MARLITT_WISTER_BALIFF.TXT

	**** Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	**** Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt

VERY CLOSE TO MARLITT_WISTER_GISELA.TXT

	The_Second_Wife_Wister_corrected.txt
	**** Marlitt_Wister_Little_Moorland_Princess_cleaned_121817.txt
	**** Marlitt_OMS_Wister translation_cleaned_110617.txt
	**** Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt

VERY CLOSE TO MARLITT_WISTER_LITTLE_MOORLAND_PRINCESS_CLEANED_121817.TXT

	**** Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
	The_Second_Wife_Wister_corrected.txt
	Dickens_Charles_Dombey_and_Son_PG_821.txt
	Schubin_Ossip_O_Thou_My_Austria_PG_35454.txt
	Spielhagen_Friedrich_Hammer_and_Anvil_A_Novel_PG_34868_8.txt
	**** Marlitt_Wister_Gisela.txt
	Hillern_Wilhelmine_von_Only_a_Girl_or_A_Physician_for_the_Soul_PG_36709_8.txt
	**** Marlitt_OMS_Wister translation_cleaned_110617.txt
	**** Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt

VERY CLOSE TO MARLITT_WISTER_OWLS.TXT


VERY CLOSE TO MARLITT_WISTER_RUBIES.TXT


VERY CLOSE TO MARLITT_WISTER_SCHILLINGSCOURT.TXT

What determines "closeness"

Can I determine which words contribute--or do not contribute--to the distance between two novels?

Here, I simply scatter plot the relative frequencies of words in two novels; the X-axis is the frequency in one novel, and the Y-axis is the frequency in the other. I draw a plot for each of 10 novels, each compared to JE (JE is always the X-axis). Four of those novels are very close to JE, two are Marlitt novels, and four are far from JE.

What did I learn in this step?

The scatter plots for JE and novels close to JE look much like we would expect them to: most of the points are along a 45 degree slope upward and to the right from the origin; i.e., the plots suggest that a large number of words occur at about the same frequency in the two novels.

The Marlitt novels look like other novels which are very close to JE, although, since Marlitt novels are a little more distant from JE, their plots show a "fuzzier" 45 degree slope upward and to the right.

It's harder to see what's going on in the plots for novels which are distant from JE. The root cause seems to be that these novels have very high-frequency words which do not appear in JE, but which alter the scale of the graph.

Bottom line? These plots suggest that it would be possible to identify word-by-word the causes of the (dis)similarity between two novels.
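
A minimal sketch of what such a word-by-word breakdown could look like, under stated assumptions: the cityblock distance is a sum of per-word absolute differences, so each word's share of the total can be listed directly. The toy frequency vectors and the helper name word_contributions are hypothetical; in this notebook the inputs would be two rows of matrix and the matching vocabulary.

```python
from __future__ import division, print_function


def word_contributions(freqs_a, freqs_b, words):
    """Return ([(word, |a - b|, share of total)], total), largest difference first."""
    diffs = [abs(a - b) for a, b in zip(freqs_a, freqs_b)]
    total = sum(diffs)  # equals the cityblock (Manhattan) distance
    rows = []
    for word, diff in zip(words, diffs):
        share = diff / total if total > 0 else 0.0
        rows.append((word, diff, share))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows, total


# Toy data (hypothetical, not taken from the corpus)
words = ['eye', 'door', 'prairie', 'rifle']
novel_a = [0.30, 0.20, 0.00, 0.00]
novel_b = [0.25, 0.15, 0.10, 0.20]

rows, total = word_contributions(novel_a, novel_b, words)
for word, diff, share in rows:
    print('%-8s %.3f %5.1f%%' % (word, diff, share * 100))
```

In the toy data, the two words absent from the first novel account for most of the distance, which is the pattern the plots for the distant novels hint at.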

In [13]:
%matplotlib inline
from scipy.spatial.distance import *
import math
import matplotlib.pyplot as plt
import seaborn as sns

def what_makes_two_novels_close(novel_a, novel_b):
    
    n_a = -1
    n_b = -1
    
    for n in range(0, len(labels)):
        if novel_a == labels[n]:
            n_a = n
        if novel_b == labels[n]:
            n_b = n
    
    distance = cityblock(matrix[n_a], matrix[n_b])
    
    print
    print
    print novel_a
    print novel_b
    print 'distance', distance
    
    sns.set(color_codes=True)
    plt.rcParams['figure.figsize']=(15,15)
    
    x = []
    y = []
    close_frequent_words = []
    
    for w in range(0, len(matrix[n_a])):
        if matrix[n_a][w] == 0 and matrix[n_b][w] == 0:
            pass
        else:
            x.append(matrix[n_a][w])
            y.append(matrix[n_b][w])
            
    high_dimension = -1
    for a in x:
        if a > high_dimension:
            high_dimension = a
    for a in y:
        if a > high_dimension:
            high_dimension = a
            
    plt.scatter(x, y)
    plt.xlim(0, high_dimension + 0.001)
    plt.ylim(0, high_dimension + 0.001)
    plt.title(novel_a + ' -- ' + novel_b + '\nWord Frequencies (distance ' + str(distance) + ')')
    plt.xlabel(novel_a)
    plt.ylabel(novel_b)
    plt.show()
    
# ------------------------------------------------------------------------------------
    
what_makes_two_novels_close('Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt', \
                            'Bront_Charlotte_Shirley_PG_30486.txt')
what_makes_two_novels_close('Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt', \
                            'Dickens_Charles_Dombey_and_Son_PG_821.txt')
what_makes_two_novels_close('Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt', \
                            'Harland_Marion_Alone_PG_46505.txt')
what_makes_two_novels_close('Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt', \
                            'Gaskell_Elizabeth_Cleghorn_Ruth_PG_4275.txt')

what_makes_two_novels_close('Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt', \
                            'Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt')
what_makes_two_novels_close('Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt', \
                            'Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt')

what_makes_two_novels_close('Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt', \
                            'Alcott_Louisa_May_Pratt_Anna_Bronson_Alcott_Comic_Tragedies_Written_by_PG_33986.txt')
what_makes_two_novels_close('Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt', \
                            'Castlemon_Harry_Frank_in_the_Woods_PG_42307_8.txt')
what_makes_two_novels_close('Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt', \
                            'Castlemon_Harry_Frank_on_the_Prairie_PG_42101_0.txt')
what_makes_two_novels_close('Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt', \
                            'Billings_Josh_Josh_Billings_on_Ice_and_Other_Things_PG_41025.txt')
    

Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Bront_Charlotte_Shirley_PG_30486.txt
distance 0.574124057822

Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Dickens_Charles_Dombey_and_Son_PG_821.txt
distance 0.695358789743

Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Harland_Marion_Alone_PG_46505.txt
distance 0.704994238847

Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Gaskell_Elizabeth_Cleghorn_Ruth_PG_4275.txt
distance 0.715822754499

Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt
distance 0.786523244186

Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
distance 0.83218783419

Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Alcott_Louisa_May_Pratt_Anna_Bronson_Alcott_Comic_Tragedies_Written_by_PG_33986.txt
distance 1.36304220185

Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Castlemon_Harry_Frank_in_the_Woods_PG_42307_8.txt
distance 1.3651526006

Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Castlemon_Harry_Frank_on_the_Prairie_PG_42101_0.txt
distance 1.36537511079

Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Billings_Josh_Josh_Billings_on_Ice_and_Other_Things_PG_41025.txt
distance 1.39258555283

Network graph of very_close_distances

I create a network from the very close distances; if two novels are very close, then I connect them with an edge. I color Marlitt's nodes in blue, Charlotte Bronte's in red, and Dickens' in green.

The network graph is available at http://talus.artsci.wustl.edu/je_effect_web/distance_network.html.

Use your mousewheel to zoom and unzoom the graph. Hover over a node to see what novel it represents. Click and drag over white space to move the graph in the window.

I also list the 25 novels with the highest network centrality, as well as the 25 novels with the most triangles in the network.
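
For reference, here is a minimal pure-Python sketch of what the two measures count, run on a hypothetical four-node graph. Degree centrality is a node's number of neighbours divided by (n - 1), where n is the number of nodes; a node's triangle count is the number of pairs of its neighbours which are themselves connected. The edge list and the function names are illustrative assumptions; the notebook itself relies on networkx for these calculations.

```python
from __future__ import division, print_function
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Map each node to the set of its neighbours (undirected graph)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degree_centrality_sketch(adjacency):
    """Degree of each node divided by the largest possible degree, n - 1."""
    n = len(adjacency)
    if n <= 1:
        return dict((node, 0.0) for node in adjacency)
    return dict((node, len(nbrs) / (n - 1)) for node, nbrs in adjacency.items())


def triangles_sketch(adjacency):
    """For each node, count the pairs of its neighbours that share an edge."""
    counts = {}
    for node, nbrs in adjacency.items():
        counts[node] = sum(1 for u, v in combinations(sorted(nbrs), 2)
                           if v in adjacency[u])
    return counts


# Hypothetical toy network: A, B and C are mutually close; D is close only to C
edges = [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
adjacency = build_adjacency(edges)
centrality = degree_centrality_sketch(adjacency)
tri = triangles_sketch(adjacency)

for node in sorted(adjacency):
    print(node, round(centrality[node], 3), tri[node])
```

A node can have many neighbours without sitting in many triangles, which is why it is worth listing both measures side by side.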

What did I learn in this step?

  • The corpus contains a set of novels--a rather large set--with lots of similarities between those novels. In other words, a substantial portion (1/2? 2/3?) of the fiction circulating in Muncie was, at the level of diction, all alike; and, if we take nouns as a proxy for semantic content, then we might say that they were all (sort of) "about" the same thing.
  • JE is embedded in this set of very similar fiction (as is Dickens). JE looks central in the graph, and it may well be central, but it's not entirely possible to be sure from the graph alone.
  • Marlitt's novels are a part of the set of very similar fiction, although her novels are on the edge of the group, and not central to it (the graph seems reliable on this point).

I'm speculating that a "Jane Eyre Effect" might work something like this: 1) some sort of general cultural effect (periodicals?) in England causes Bronte and Dickens to produce fiction which, at the level of the noun, is similar; 2) JE (and Dombey?) goes to Germany, where Marlitt and others write fiction influenced by 1; 3) those German works come back to Anglophone countries in translation where, in combination with "native" English writers influenced by 1, they spark off another round of the "Jane Eyre Effect".

Note to self: There's a novel labeled The_Second_Wife_Wister_corrected.txt, which should be labeled with its original author (Marlitt?).

In [55]:
import json, codecs
import networkx as nx
from networkx.readwrite import json_graph
from networkx.algorithms import *
from collections import defaultdict, Counter

sns.set(color_codes=True)
plt.rcParams['figure.figsize']=(15,15)

nodes_added = []

G=nx.Graph()

for d in very_close_distances:
    
    if d[2] not in nodes_added:
        nodes_added.append(d[2])
        G.add_node(d[2])
    
    if d[4] not in nodes_added:
        nodes_added.append(d[4])
        G.add_node(d[4])
        
    G.add_edge(d[2], d[4])
    
f = codecs.open('je_effect_web/node_link_data.js', 'w', encoding='utf-8')
f.write(json.dumps(json_graph.node_link_data(G)))
f.close()

dc = degree_centrality(G)

print
for w in Counter(dc).most_common(25):
    if 'Jane_Eyre' in w[0]:
        print '****', w[0], w[1]
    else:
        print w[0], w[1]
    
tris = triangles(G)

print
for w in Counter(tris).most_common(25):
    if 'Jane_Eyre' in w[0]:
        print '****', w[0], w[1]
    else:
        print w[0], w[1]
Ward_Humphry_Mrs_Marcella_PG_13728_8.txt 0.297297297297
Dickens_Charles_Dombey_and_Son_PG_821.txt 0.297297297297
Dickens_Charles_Little_Dorrit_PG_963.txt 0.285714285714
Dickens_Charles_The_Personal_History_of_David_Copperfield_PG_43111.txt 0.276705276705
Dickens_Charles_David_Copperfield_PG_766.txt 0.27027027027
Grand_Sarah_The_Heavenly_Twins_PG_8676_8.txt 0.266409266409
Dickens_Charles_Bleak_House_PG_1023.txt 0.263835263835
Spielhagen_Friedrich_Hammer_and_Anvil_A_Novel_PG_34868_8.txt 0.2574002574
Ward_Humphry_Mrs_Robert_Elsmere_PG_8737.txt 0.256113256113
Gaskell_Elizabeth_Cleghorn_Ruth_PG_4275.txt 0.250965250965
Gaskell_Elizabeth_Cleghorn_North_and_South_PG_4276.txt 0.238095238095
Robins_Elizabeth_The_Open_Question_A_Tale_of_Two_Temperaments_PG_37827.txt 0.234234234234
Woolson_Constance_Fenimore_Anne_A_Novel_PG_32707_0.txt 0.221364221364
Dickens_Charles_Martin_Chuzzlewit_PG_968.txt 0.21879021879
Dickens_Charles_Our_Mutual_Friend_PG_883.txt 0.20592020592
Yonge_Charlotte_M_Charlotte_Mary_The_Heir_of_Redclyffe_PG_2505_8.txt 0.198198198198
Hillern_Wilhelmine_von_Only_a_Girl_or_A_Physician_for_the_Soul_PG_36709_8.txt 0.194337194337
Auerbach_Berthold_Villa_Eden_The_Country_House_on_the_Rhine_PG_32902_8.txt 0.194337194337
**** Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt 0.19305019305
Craik_Dinah_Maria_Mulock_John_Halifax_Gentleman_PG_2351.txt 0.191763191763
Crawford_F_Marion_Francis_Marion_Saracinesca_PG_13757_8.txt 0.190476190476
Alexander_Mrs_A_Crooked_Path_A_Novel_PG_18418.txt 0.185328185328
Bront_Charlotte_Shirley_PG_30486.txt 0.184041184041
Crawford_F_Marion_Francis_Marion_Sant_Ilario_PG_5227.txt 0.184041184041
Wood_Henry_Mrs_East_Lynne_PG_3322.txt 0.178893178893

Dickens_Charles_Dombey_and_Son_PG_821.txt 6521
Ward_Humphry_Mrs_Marcella_PG_13728_8.txt 6311
Dickens_Charles_Little_Dorrit_PG_963.txt 6109
Dickens_Charles_The_Personal_History_of_David_Copperfield_PG_43111.txt 6020
Dickens_Charles_David_Copperfield_PG_766.txt 5984
Grand_Sarah_The_Heavenly_Twins_PG_8676_8.txt 5846
Dickens_Charles_Bleak_House_PG_1023.txt 5669
Ward_Humphry_Mrs_Robert_Elsmere_PG_8737.txt 5550
Gaskell_Elizabeth_Cleghorn_Ruth_PG_4275.txt 5428
Gaskell_Elizabeth_Cleghorn_North_and_South_PG_4276.txt 5393
Robins_Elizabeth_The_Open_Question_A_Tale_of_Two_Temperaments_PG_37827.txt 5165
Spielhagen_Friedrich_Hammer_and_Anvil_A_Novel_PG_34868_8.txt 5155
Woolson_Constance_Fenimore_Anne_A_Novel_PG_32707_0.txt 4714
Dickens_Charles_Martin_Chuzzlewit_PG_968.txt 4580
Dickens_Charles_Our_Mutual_Friend_PG_883.txt 4495
Yonge_Charlotte_M_Charlotte_Mary_The_Heir_of_Redclyffe_PG_2505_8.txt 4432
**** Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt 4186
Wood_Henry_Mrs_East_Lynne_PG_3322.txt 4178
Wood_Henry_Mrs_Verner_s_Pride_PG_15627.txt 4156
Auerbach_Berthold_Villa_Eden_The_Country_House_on_the_Rhine_PG_32902_8.txt 4144
Crawford_F_Marion_Francis_Marion_Saracinesca_PG_13757_8.txt 3988
Hillern_Wilhelmine_von_Only_a_Girl_or_A_Physician_for_the_Soul_PG_36709_8.txt 3987
Alexander_Mrs_A_Crooked_Path_A_Novel_PG_18418.txt 3982
Craik_Dinah_Maria_Mulock_John_Halifax_Gentleman_PG_2351.txt 3980
Crawford_F_Marion_Francis_Marion_Greifenstein_PG_6446.txt 3939

What makes two novels "close"?

Here, I dig into pairs of novels, looking for words which occur at a relatively high frequency but don't especially contribute to the distance between the novels. I.e., I'm looking for words which occur at roughly the same frequency in both novels, and which occur relatively frequently in both.

The output is:

Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt

    matrix distance 0.786523244186

    DAY DOOR EYE FACE ROOM TIME WORD

    absence action addition address advice affection afresh afternoon age agony
    arrangement article back basin beauty being bell black blood blow board body
    bond bone boot boudoir brain bride bridle brow . . . 

    important_common_words_distance 0.0218369461188 (2%)
    important_common_words_frequency 0.191273434414 (19%)

I list the two novels I examine, then the distance between the two novels. Next, I list, in upper case, those of JE's 25 most frequent nouns which occur in both novels at more or less the same frequency (a frequency ratio between 0.8 and 1.2). Then, I detail the top 250 words which occur at more or less the same frequency in both novels. Lastly, I list the distance contributed by those words to the total distance, and the frequency of those words in the first of the two novels.

At the bottom of the listing, I list the top 100 nouns listed in the preceding details.

Note that I'm only printing details for the first 20 JE-to-other-novel comparisons (plus any later comparison which involves a Marlitt novel), although I'm counting all the nouns which otherwise would have been detailed.
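
The selection rule can be sketched in a few lines. This is a simplified illustration on hypothetical toy frequencies, not the cell below: keep the words present in both novels whose frequency ratio lies between 0.8 and 1.2, take the most frequent of them, and report how much of the total cityblock distance, and how much of the first novel's frequency, they account for. The function name common_frequency_words and the data are assumptions.

```python
from __future__ import division, print_function


def common_frequency_words(freqs_a, freqs_b, words, top_n=250, low=0.8, high=1.2):
    """Summarise the words which occur at roughly the same frequency in both novels."""
    total_distance = sum(abs(a - b) for a, b in zip(freqs_a, freqs_b))
    kept = []
    for a, b, word in zip(freqs_a, freqs_b, words):
        # Only words present in both novels, with a ratio inside [low, high]
        if a > 0 and b > 0 and low <= a / b <= high:
            kept.append((a, b, abs(a - b), word))
    kept.sort(reverse=True)  # most frequent in the first novel come first
    kept = kept[:top_n]
    return {
        'words': [k[3] for k in kept],
        'shared_distance': sum(k[2] for k in kept),
        'shared_frequency': sum(k[0] for k in kept),
        'total_distance': total_distance,
    }


# Toy data (hypothetical, not taken from the corpus)
words = ['door', 'eye', 'heart', 'prairie', 'sir']
novel_a = [0.20, 0.30, 0.10, 0.00, 0.40]
novel_b = [0.22, 0.30, 0.10, 0.28, 0.10]

result = common_frequency_words(novel_a, novel_b, words)
print(' '.join(result['words']))
print('share of distance: %.1f%%'
      % (100 * result['shared_distance'] / result['total_distance']))
print('share of frequency: %.1f%%' % (100 * result['shared_frequency']))
```

The two percentages correspond to the important_common_words_distance and important_common_words_frequency lines in the output.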

What did I learn in this step?

  • I find it interesting that, when two novels are very similar, 20% of their noun frequency contributes so little (2%) to the observed distance. I'm only looking at the top 250 words in this cell; however, if I increase that, the numbers don't change all that much.
  • The common vocabulary contains, as we might expect, plenty of words that just aren't all that interesting ("potato", "egg", "coffee", "tea"). On the other hand, it's surprising how many of those words have moral, social, or emotional valences; these nouns, shared at a common frequency across such a large portion of the corpus, suggest that these novels, like JE, are concerned with more than just stage-setting and scene description.
  • An example: JE and John Halifax share "agony", "anguish", "argument", "barrier", "circumstance", "confusion", "conscience", "consideration", "contempt", "control", "conversation", "curiosity", "decision", "desperation", "destruction", and so on. They share such words at about the same frequency, and such words are among the most frequent words with similar frequencies. How could these two novels not be about roughly the same thing?
In [93]:
from scipy.spatial.distance import *
import math, textwrap
from collections import defaultdict, Counter

def what_makes_two_novels_close_V2(novel_a, novel_b, common_word_count, distance_number):
    
    n_a = -1
    n_b = -1
    
    for n in range(0, len(labels)):
        if novel_a == labels[n]:
            n_a = n
        if novel_b == labels[n]:
            n_b = n
    
    print_details = True
    if distance_number > 20:
        if 'Marlitt' in labels[n_a] or 'Marlitt' in labels[n_b]:
            pass
        else:
            print_details = False
    
    if print_details == True:
    
        print
        print
        print labels[n_a]
        print labels[n_b]

        print
        print '\t', 'matrix distance', cityblock(matrix[n_a], matrix[n_b])

    individual_distances = []
    
    for w in range(0, len(matrix[n_a])):
        if matrix[n_a][w] == 0 and matrix[n_b][w] == 0:
            pass
        else:
            
            word_distance = math.fabs(matrix[n_a][w] - matrix[n_b][w])
            word_frequency_a = matrix[n_a][w]
            word_frequency_b = matrix[n_b][w]
            word = dictionary[w]
            
            if word_frequency_b > 0 and word_frequency_a > 0:
                word_ratio = (word_frequency_a / word_frequency_b)
                if word_ratio >= 0.8 and word_ratio <= 1.2:
                    individual_distances.append([word_frequency_a, word_frequency_b, word_distance, word])
            
    individual_distances.sort(reverse=True)
        
    important_common_words = []
    important_common_je_words = []
    important_common_words_distance = 0.0
    important_common_words_frequency = 0.0
    
    for i in individual_distances[:250]:
        if i[3] in je_common_nouns[:25]:
            important_common_je_words.append(i[3])
            #common_word_count[i[3].upper()] += 1
            
        important_common_words.append(i[3])
        important_common_words_distance += i[2]
        important_common_words_frequency += i[0]
        
        common_word_count[i[3]] += 1
        
    important_common_je_words.sort()
    important_common_words.sort()
    
    if print_details == True:
    
        print
        print '\t', '\n\t'.join(textwrap.wrap(' '.join(important_common_je_words).upper(), 80))
        print
        print '\t', '\n\t'.join(textwrap.wrap(' '.join(important_common_words), 80))
        print
        print '\t', 'important_common_words_distance', important_common_words_distance, \
                    ('(' + str(int(important_common_words_distance / cityblock(matrix[n_a], matrix[n_b]) * 100)) + '%)')
        print '\t', 'important_common_words_frequency', important_common_words_frequency, \
                    ('(' + str(int(important_common_words_frequency * 100)) + '%)')
    
    word_by_word_distance = 0
    for a in individual_distances:
        word_by_word_distance += a[2]  # a[2] is the per-word distance; a[0] is the frequency in novel_a
    
    #print
    #print '\t', 'word_by_word_distance', word_by_word_distance
    
# --------------------------------------------------------------------------

print 'je_common_nouns[:25]', je_common_nouns[:25]

print
print '**********************************************************************************'
print 'SELECTED NOVEL PAIRS'
print '**********************************************************************************'

common_word_count = defaultdict(int)

for d in very_close_distances:
    
    if 'Jane_Eyre' in d[2] and 'Gold_Elsie' in d[4]:
        what_makes_two_novels_close_V2(d[2], d[4], common_word_count, -1)
    if 'Gold_Elsie' in d[2] and 'Jane_Eyre' in d[4]:
        what_makes_two_novels_close_V2(d[2], d[4], common_word_count, -1)


for d in very_close_distances:
    
    if 'Dombey' in d[2] and 'Gold_Elsie' in d[4]:
        what_makes_two_novels_close_V2(d[2], d[4], common_word_count, -1)
    if 'Gold_Elsie' in d[2] and 'Dombey' in d[4]:
        what_makes_two_novels_close_V2(d[2], d[4], common_word_count, -1)


for d in very_close_distances:
    
    if 'Jane_Eyre' in d[2] and 'Dombey' in d[4]:
        what_makes_two_novels_close_V2(d[2], d[4], common_word_count, -1)
    if 'Dombey' in d[2] and 'Jane_Eyre' in d[4]:
        what_makes_two_novels_close_V2(d[2], d[4], common_word_count, -1)

print
print '**********************************************************************************'
print 'JANE EYRE AND EVERYTHING CLOSE TO IT'
print '**********************************************************************************'

n_je_distances = 0
common_word_count = defaultdict(int)

for d in very_close_distances:
    
    if 'Jane_Eyre' in d[2] or 'Jane_Eyre' in d[4]:
    
        n_je_distances += 1
        
        what_makes_two_novels_close_V2(d[2], d[4], common_word_count, n_je_distances)
        
print
print
print 'n_je_distances', n_je_distances
print
for w in Counter(common_word_count).most_common(100):
    print w[0], w[1]

    
je_common_nouns[:25] [u'eye', u'day', u'sir', u'room', u'time', u'hand', u'night', u'face', u'door', u'word', u'heart', u'man', u'house', u'hour', u'nothing', u'life', u'lady', u'way', u'thing', u'head', u'something', u'child', u'voice', u'fire', u'one']

**********************************************************************************
SELECTED NOVEL PAIRS
**********************************************************************************


Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt

	matrix distance 0.786523244186

	DAY DOOR EYE FACE HEART LIFE ONE ROOM TIME VOICE WORD

	absence action addition address advice affection afresh afternoon age agony
	arrangement article back basin beauty being bell black blood blow board body
	bond bone boot boudoir brain bride bridle brow call case cat certainty change
	chaos charge charm chestnut childhood china choice city coal coin cold companion
	companionship comprehension comrade consolation contrary conversation costume
	couch courage covering crescent cruelty danger day death delusion despair
	diamond direction distress door drive echo embrace end establishment excitement
	exclamation experience eye face fairy fall feature fence fold fool form friend
	fruit furniture gaiety gate gift glimpse gloom governess grace grandeur grief
	handful haste heart help her hold holiday horn horse host housekeeper hurry idea
	ideal impatience inclination influence instrument iron jaw judgment key kind
	kindness king labourer lace land lap laurel lawn lid life line loneliness love
	lover maniac mark marriage mass match memory merchant mirror moss name need
	nerve nest news object offer one opening opportunity organ papa pardon patience
	permission person perusal pity place poetry position praise presentiment price
	pride proceeding proof proposal reality relief reply reproach resolution resolve
	respect rest result revelation risk room rule scarf sentence servant service
	shade shadow shop shoulder silence sin sister situation skirt slave snow society
	sofa song sorrow sound spark sphere star state struggle stuff submission
	suffering sum superintendent syllable sympathy table tete thank throat time toe
	toilet token tongue torture town train traveller treatment trunk twilight
	uniform use utterance veneration violence voice vow waist wanderer watch welcome
	whisper whit widow wish witch woman word

	important_common_words_distance 0.0218369461188 (2%)
	important_common_words_frequency 0.191273434414 (19%)


Dickens_Charles_Dombey_and_Son_PG_821.txt
Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt

	matrix distance 0.793824997972

	CHILD DAY DOOR EYE HAND HOUR LADY ONE ROOM VOICE WORD

	advice alarm angel approach arrangement ash astonishment atom beam beauty
	beginning bird blessing blow body bone bracket brother burden call capacity
	carriage case cause chief child childhood choice city clerk cloak cloth
	companion companionship condition confusion consequence consideration contempt
	contrast corner couch cry curtain dancing darling day deed dignity dislike
	displeasure door doorway dream dust effect effort emotion end entertainment
	enthusiasm esteem evening event everything evil example exclamation expectation
	expedition explanation eye family fate father favourite fear fellow figure
	fireside firm fist flesh flight floor flow forefinger fragment fro furniture
	fury genius gentleness gift glass gloom grain gravity ground hair hall hand
	hatred heap hearing height help hesitation hold hole host hour household hurry
	husband idea imagination importance inclination indifference influence interview
	iron kitchen lady lamp level look loss love material mean measure memory mention
	middle mirror mist mixture morsel mother mouth murmur nature neck neglect
	neighbour news nod nonsense noon nostril note o'clock observer one opening order
	organ page pair papa paper part passion peace pen penny people perfection
	perusal pipe plank pleasure potato practice praise presence prey proof proposal
	protection protector rag rapidity reality reliance relief remark reply
	reputation residence risk rock room sake salutation saying scene scream second
	sens sentiment shock shoulder shriek sight silver sister size slave solemnity
	spite stair start store stranger string struggle substitute suit sun sweetness
	sympathy table talk tear tenderness thrust tin town tray trouble understanding
	unhappiness vice visit voice wave week weight welcome while whom wilderness word
	work world yonder youth

	important_common_words_distance 0.020395569676 (2%)
	important_common_words_frequency 0.217383926147 (21%)


Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Dickens_Charles_Dombey_and_Son_PG_821.txt

	matrix distance 0.695358789743

	DAY DOOR EYE HEART LADY LIFE NIGHT NOTHING ONE ROOM VOICE WORD

	account advice affection anger animal appearance apple arrangement arrival
	attachment aunt awe bank beauty bed bedside bird blind bloom blow bone bonnet
	breakfast burden bye call care case chair chance character chest childhood chin
	choice church circumstance city clock clothe coffee colour command companion
	complexion conclusion conduct consciousness conviction cook couch crown curse
	day deal degree description desire determination difference difficulty distance
	dog door doubt earth egg end event exercise expectation eye eyebrow fatigue foot
	furniture game gaze glance gloom glove grasp gratitude gravity grief ground
	guest handkerchief health heart help hint hold hope horse hotel humour hurry
	husband idea impression inch individual information inquiry intention iron
	journey joy lady landscape leaf letter life light lip living love matter
	meantime meditation meeting merit message midnight midst might mind misery
	mistake moment morning morrow musing mystery nail need neighbourhood news night
	noise none nothing notice obedience one opening organ other palm passage patient
	pause pavement people person philosopher pillow plant plate play point praise
	present promise proof proposal quarter rat reach reality reception recollection
	relation reply resource response responsibility rest retirement return road room
	row ruin scheme secret self sentiment shame shoulder side sign sister size slave
	sleep slumber smoke soldier spirit spot staircase station storm straw success
	sun sunset surprise sympathy table talk tea temper tendency term terror thank
	theme thought token top town trifle truth turn uniform use utterance vacation
	vain velvet victim view voice want warmth warning wedding wife wind window word
	work wreck wrist wrong yard youth

	important_common_words_distance 0.0240952408834 (3%)
	important_common_words_frequency 0.230865060314 (23%)
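
A note on reading the figures in these blocks. The parenthesized percentages appear to be derived from the raw numbers printed beside them. The one after important_common_words_distance matches that value divided by the pair's matrix distance, and the one after important_common_words_frequency matches the frequency itself, both truncated (not rounded) to a whole percent. This is an inference from the printed output, not the notebook's own code; the function names below are hypothetical. A minimal sketch under that assumption:

```python
def as_percent(value):
    """Truncate a fraction to a whole-number percentage string (0.2308 -> '23%')."""
    return f"{int(value * 100)}%"


def summarize(matrix_distance, common_distance, common_frequency):
    """Return the two parenthesized percentages for one pair of texts.

    Assumption: the distance percentage is the share of the full matrix
    distance contributed by the listed common words.
    """
    share_of_distance = common_distance / matrix_distance
    return as_percent(share_of_distance), as_percent(common_frequency)


# Values copied from the Jane Eyre / Dombey and Son block above.
print(summarize(0.695358789743, 0.0240952408834, 0.230865060314))
# -> ('3%', '23%')
```

Read this way, the listed words account for only a small share of the distance between two texts even though they make up roughly a fifth of the counted word occurrences.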

**********************************************************************************
JANE EYRE AND EVERYTHING CLOSE TO IT
**********************************************************************************


Evans_Augusta_J_Augusta_Jane_Macaria_PG_27811_8.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.801365369454

	DOOR HOUSE MAN NIGHT ROOM TIME

	accident admiration age aid angel anguish anything apron arrangement attendance
	attention awe balm band bank beam bell benefit billiard black board bonnet book
	breadth breath breeze brother card case chair character chicken choice
	circumstance claim clock closet cloud cold comer committee communion complexion
	comrade conscience consciousness consent contempt conversation couch county
	course curl curse darkness daylight degree detail difference difficulty dining
	disaster distress doom door eagerness effort egg embrace emotion evening faith
	fellow firm flower fog foliage foot foreground forest frame fruit gate genius
	germ glass gold good grace habit hesitation history hope hospitality host house
	household hum ignorance impulse inch inclination jaw jealousy jewel journey
	judge kind kindness kingdom knowledge leaf lid light lily living lock man
	manifestation mansion mantelpiece marble marriage material messenger midst mine
	mission mistake moment morning motif neck need neighbour night nobody none noon
	nurse objection occasion ocean part path people period petition place plenty
	plume portion possession prayer preparation price proceeding proof pupil
	quivering race ray readiness rebellion reproach resolve responsibility reward
	room sacrifice sake scheme school scorn scrutiny sea seed service set sewing
	share side silence size skill slate smoke sob soul sound spasm spectre spell
	sphere spot spring square state stranger strength suffering sum sunset sympathy
	system table talk tea temper temptation think thither thought thrill throat tie
	time toil tone tooth trait triumph trouble trunk tutor undertaking use vacation
	variance veil vine voyage waist warmth wayside wealth while whole whom window
	winter wish witch woman world wretchedness wrist writing yard yawn

	important_common_words_distance 0.0198037270938 (2%)
	important_common_words_frequency 0.184830314644 (18%)


Crawford_F_Marion_Francis_Marion_Taquisara_PG_11050_8.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.821239162977

	DOOR HEART HOUR HOUSE

	account acquaintance act admiration advance advantage age agent agony air
	annoyance answer appearance aspiration back bank barrier base beam bedside blow
	book brass breast breeze building butler butter car cause chair chapter
	character cheek circumstance coachman coffin communication complexion conduct
	confidence consent consequence corridor counsel cross cry curiosity darkness
	dawn debt defect degree delicacy depth description direction disappointment dish
	distance distress doll door doubt drawing dream drive drop earnestness earth
	echo effect egg elder embarrassment enemy errand event everybody exchange
	exercise existence experience faith farm fashion fate fault favour fear feature
	figure finger footman force fragment fruit furniture garden glass gold gown hair
	half handkerchief heart heat history hour house humiliation humour ideal injury
	inspection instance interest interference irregularity journey judge key
	kindness lap lash leaf length lie light linen loaf lodging longing luxury manner
	mantle marble mark martyr material meaning mercy middle midnight mind mirror
	misery mistress morning morrow name nature nerve news none nose o'clock opposite
	oppression orange pale parish part pass peak peril person personage persuasion
	pinch plan point potato precaution prey prison purpose recollection regret
	religion repent repetition reproach resemblance respect result retreat ring risk
	roof rule sacrifice sadness sea seal self sensation sentence servant shame shawl
	sheet shock shoulder shutter sigh sight silk singer soldier sorrowful soul spasm
	spirit standard standing storm strain streak study stuff sum sun sunset supply
	tear thither thread thrill timber tooth tribe trust vehemence velvet view vision
	visitor wall waste weariness week while whisper window wine winter work wrath
	youth

	important_common_words_distance 0.0147191933342 (1%)
	important_common_words_frequency 0.158443081364 (15%)


Craik_Dinah_Maria_Mulock_John_Halifax_Gentleman_PG_2351.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.789309010047

	DOOR EYE HEAD HEART NIGHT SOMETHING VOICE WORD

	actor advance advantage adventure advice afresh agent agony air angel anguish
	answer argument barrier beau bed beggar bench blind breakfast breast breathing
	burst bye centre chapter cheek childhood church circumstance clothe coal coin
	college conclusion confusion conscience consideration contempt control
	conversation copy counsel cousin cover cradle curiosity curl custom daisy
	daylight decay decision deed delicacy depth desperation destruction devil
	diamond dish division doctrine door dread dust echo egg entertainment eternity
	evening eye eyelash failure falsehood fear fence figure fir fit flock foot
	fortune frankness freshness friendless fro front future gaze ghost grace grasp
	gravel group gulf habit head heart heat heiress hint horizon horror hospitality
	impression inch infant innocence instinct interest journey knee labourer lamp
	land lawyer leaf lesson loaf maker mark master mate matter meal member mercy
	middle midst might mind mine mischief misery money month motion mourning muslin
	necessity neck neighbourhood night noise nook novel nut obedience objection
	opinion opportunity oppression other path patient piece pleasure pool prattle
	precaution prejudice quality question quivering reading regret relic relief
	remark resemblance rest ride road romance root sadness satisfaction scent scheme
	season self set shadow shame shelf shock shore shower silk silver sleep snuff
	sob society sofa softness something sorrow soul sound sphere spot spray square
	star start stay stead stop strait strength struggle sum sunbeam surprise
	syllable sylph tea tear terror test thinking throat thrust thunder tongue topic
	touch tower trim triumph variety verse visit voice waistcoat want water wave web
	wedding well widow will winding window woodland word wreck

	important_common_words_distance 0.0133928908863 (1%)
	important_common_words_frequency 0.15199108319 (15%)


Ward_Humphry_Mrs_Robert_Elsmere_PG_8737.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.741104217613

	CHILD DAY EYE FACE HEAD HEART HOUSE NOTHING ROOM SOMETHING TIME VOICE WORD

	accent account acquaintance admiration agony air ambition anger animal annoyance
	apron arm article association back bar bench bent blaze brain breast brow bush
	business butler butter capital card case cat cheek child childhood church city
	claim class clergyman cloud comfort command commonplace confidence conflict
	consolation creed curate cut darling daughter dawn day deal death debt decay
	delusion demand desire despair determination diamond diffidence dinner dozen
	drawing eagle earth egg endurance entertainment evening evidence expectation eye
	face fall family fever finger flock flood fool form frame front fruit girl good
	gravity green ground guardian hatred head health heart hill holiday house human
	idea impatience incident indignation institution instrument intelligence keeping
	knee knot lace lamp lantern leader leant level light limit lip lunch luxury
	madness mantelpiece match material meaning meditation mercy midnight mistake
	morrow motion move movement necessity nobody nose nothing notice oak objection
	offer opposite other pang papa paper passage path person phrase piano piece pity
	place pledge plenty point politeness position poverty power praise prejudice
	preparation presence proceeding pulse purity rain reason record reference regard
	remembrance remorse residence resource response rest result return right road
	room round rush scale scent scholar secret sens service shadow sheep shop site
	situation slave smoke something sound spectacle spot spring stair strength
	string stuff succession summon sun tapestry tea terror thank thinking thirst
	thither thought time torture trace tract tree trembling trouble truth turn
	umbrella victim violence voice walk west whisper widow will window wing wit
	woman wonder word workhouse wreath wreck wrist yesterday

	important_common_words_distance 0.0223647368937 (3%)
	important_common_words_frequency 0.209988564144 (20%)


Roe_Edward_Payson_From_Jest_to_Earnest_PG_6102.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.854727712583

	EYE HAND LADY NIGHT SOMETHING

	absence act advantage age aid aim aisle altar ambition angle anguish
	announcement appetite assistance attempt attitude attribute aught awe back
	background bag ball bass beau bell benefit bloom breakfast breath brook burst
	bustle care charge cheek childhood choice church circumstance coal coldness
	combination comfort comforter coming complaint conflict congratulation
	consciousness content conviction convulsion cord costume cradle crimson crisis
	crumb cure decision delicacy demon devil diamond dining direction disappointment
	dislike dissipation dust ease echo edge effort eloquence emotion endurance enemy
	enjoyment erect event example exertion explanation eye falsehood family fashion
	fate feeling fiction flame floor folk foot fragment gesture glare glory glow
	good grave half hand hardship harmony helper her hero history holiday honey
	hotel idea idol ignorance imagination importance instance intellect interruption
	judge justice keeping kiss lace lady lake lamp land lap lash law leaf line
	longing look manager marble material matron measure money moonlight morning
	morrow mourning mystery neck night nook note novel nut opera parent peace
	peculiarity period picture pity place plain plank portal possession price pride
	process prudence pulse question radiance rainbow rate relation relic relish
	remainder resolution resource rest restraint rush sadness safety salary salt
	scruple scrutiny secret sens sense service shame ship shower shrine simplicity
	situation skill smile solace soldier something son song specimen spirit spring
	stand standing state stride stroke subject success suffering sum sun
	superstition taste temple tete tie toil top torture tower tune use vacation vain
	warmth watch water while wild wind wine wing wish witch yard year yesterday

	important_common_words_distance 0.0142443691566 (1%)
	important_common_words_frequency 0.139157655094 (13%)


Collins_Wilkie_Armadale_PG_1895.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.827526915799

	DAY DOOR FACE HEAD HOUR LADY LIFE NIGHT ROOM SOMETHING THING VOICE

	absence accord account acquaintance advance advantage angle animal apron
	arrangement ash attachment attraction bachelor beat bedclothe bedroom birth body
	book breast breath breathing building call cap cat chain choice claim clothe
	coffee companion company compartment compassion complexion connection
	conservatory control convenience costume courage cover crime day death debt
	decision departure discussion dish dismay district dog door dream dress dressing
	driver duty eagerness errand evil examination exertion explanation expression
	face fact fall fatigue finger footman friendship gentleman ghost glass
	grandfather gratitude gravity guard handkerchief hardship head health hint hour
	household idea identity ignorance image impatience inch indifference indulgence
	innocence insanity inscription inside interference isolation jealousy key kind
	ladder lady land lap leaf legacy life light lip literature look madness match
	mattress meat men mess method minute mistress misunderstanding morality morsel
	mourning muscle neglect night nonsense nose novelty number opera oppression
	paint parish party pastry penny period pillow pipe plank pollard porter portrait
	prejudice present problem property proprietor protection provision purchase rake
	recommendation reflection relation reply resemblance resistance restlessness
	restraint revolt right road room sanction sarcasm scenery scream secrecy self
	sentence set share sheet shell shilling sigh sincerity singer size slip smell
	soldier solution something sound south specimen speed square stare state step
	stillness stroke struggle success support sympathy table tact teapot ten tender
	tenderness tenor thing thirst throat thrust title tone tongue tool tooth top
	torment trifle trouble tune umbrella underhand vain variance vehicle vexation
	voice volume waistcoat walk want water weather week wheel whole wild will wit
	world year youth

	important_common_words_distance 0.0159338459865 (1%)
	important_common_words_frequency 0.165020917348 (16%)


Whyte_Melville_G_J_George_John_The_Interpreter_A_Tale_of_the_War_PG_40660_0.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.78611424667

	CHILD DAY EYE FACE HAND TIME VOICE WAY

	abstraction accent account addition affection age agony aim anger angle
	annoyance anything appetite aspiration attraction autumn awe back band bar bead
	bird bloom blow board brain brow business bye call calm career centre chapter
	character charm cheek chestnut child circle clothing company complexion
	conclusion confidence consent constitution conversation couch course crown cup
	daughter day debt degree depth devil disappointment disposition dissipation
	distance dream ear ease education end enjoyment estate everybody exchange
	experience eye face faith fancy fate feature field finger fir flood flower fool
	force forehead forest fragrance frame fruit gaiety gentleman gesture glance
	glass gleam glimpse gratification grave ground habit half hand health hold home
	hope household idol indifference indulgence influence information inquiry
	instinct intercourse interest interview journey justice kindness king lamp
	leisure letter lie light lightning lip listener lock look madness match meadow
	mean measure mercy mess midnight morrow mother mountain music name news nobody
	noon nose nostril note notice novel number obedience occasion order outline pain
	paper paradise part path peace pearl permission perusal picture plan play pocket
	possibility praise proprietor purpose reception reflection refreshment regard
	regret reply reserve resource rest return rise rose sacrifice scene sea season
	sens service set shame shawl shop shoulder shower side sight sin skill soil
	sorrow specie spirit spring star step store storm strength string style subject
	substance sum summer sunrise suspicion tear tendency tenderness throat time
	toilet toilette tone tongue torture town track treatment tree truth variety view
	vigour visitor voice volume want wave way whisper whole wreck year yonder

	important_common_words_distance 0.0179059225585 (2%)
	important_common_words_frequency 0.197169301446 (19%)


Fleming_May_Agnes_A_Changed_Heart_A_Novel_PG_41672.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.790790133981

	HAND HEART HOUR THING WAY

	accident admiration afresh air allusion ambition amusement anguish anxiety
	anybody apparition aspirant atom attention baby back bag bead bedstead blush box
	bracelet branch breach bride bridegroom brother building bush bustle carpet case
	change chapter chicken circlet cloud coffin cold commotion companion compassion
	concealment condition conduct conservatory constitution convent conviction cough
	country crash creature crisis curl dark darkness daughter deed degradation
	departure description desk devotion direction discovery disease dismay dread
	dressmaker drop ear ebon ejaculation eloquence end endurance establishment
	excursion fairy fascination fate father fiction fold folly fool footing form
	front furniture garland glass gloaming glow good goodness governess grace grain
	ground hag half hand happiness hatred hay heart height help host hour
	housekeeper husband i. idiot inspection iron knife knot land landscape lap leaf
	lecture library light lighting lip loneliness longing magistrate mahogany maid
	meeting menace midnight midst moment moon mouth move musing muslin nation neck
	necklace nightmare nobody noise nuisance o'clock occasion occupant offer opera
	other palm paroxysm particular patient peace person pilot pity plate portal
	prejudice pressure privilege proceeding promise proof quarter question quiver
	radiance readiness reason remorse rent reply request residence rest return
	reward ring ringlet rite rush satisfaction scene service set sewing shaft shame
	shoulder sign sinner sister sky slip slipper smoke snake sofa sorrow sort soul
	spot stand statue stay steed stillness stool stop struggle subject suffering sum
	sun swell table teaching temper tender terror thing think threshold toe tone
	truth uncertainty wage waist water way weakness weight whisper whole wickedness
	wind wood wreck your

	important_common_words_distance 0.0152451633699 (1%)
	important_common_words_frequency 0.141930490547 (14%)


Crawford_F_Marion_Francis_Marion_Sant_Ilario_PG_5227.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.81652716877

	CHILD DAY EYE HEAD HOUR HOUSE ROOM SOMETHING VOICE WAY

	accomplishment account acknowledgment admiration afresh afternoon age air altar
	angel attic authority back ball bar being bent blood breath burden bye cap
	caprice chair cheek child chimney choice church circle circumstance claim clerk
	coachman coin coldness command comment communication compassion concern
	conscience consciousness contempt contrary control couch cover crisis custom cut
	darkness dawn day death despair destruction difference disappointment disguise
	doubt drawing drive drop dust edge education effort element entrance
	establishment estate experience extremity eye fall falsehood fatigue fault
	favourite fear fidelity figure fit flight forward furniture gaiety game gate
	gentleman gold gravity habit head help her hold hotel hour house human
	humiliation ignorance impression improvement inclination individual inquiry
	instrument intellect interest journey joy kind kingdom lash lecture length lid
	light line linen lip living look maid mark mask material meditation memory mercy
	metal mind minute model morning morrow movement nail name nature necessity neck
	need o'clock obedience occasion occupation occurrence office opinion outlet
	outline pain pardon pause penny performance persuasion pillar play point
	politeness portfolio poverty power presence prey priest prospect quality
	quantity quiver ray recognition regard rent revelation room row ruin rumour
	savage scene scruple servant service shadow shutter side sight sign silence
	silver simplicity skin slipper sombre somebody something soul sound speaker spot
	square stain stair standing step stock stone store stream strength substitute
	suffering suggestion suitor sunset superiority sympathy talent talk tear
	tenderness term thought torture touch tower trifle trouble vehemence vehicle
	virtue vision voice wage wax way wedding week wickedness will work world wrist
	year

	important_common_words_distance 0.0175740221264 (2%)
	important_common_words_frequency 0.198458715596 (19%)


Gaskell_Elizabeth_Cleghorn_Ruth_PG_4275.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.715822754499

	DAY DOOR EYE HAND HEAD MAN NIGHT ONE ROOM

	account acknowledgment addition advance advantage age air animal answer
	approbation arm arrangement attempt attendance aught ball bar bed beginning bell
	bond bough box chapter check cheek chimney circle clothe cloud comment
	communication companion complexion condition contrast costume cottage
	countenance crimson crisis cruelty cry curiosity day deal death degree delight
	demon description despair detail devil direction disappointment door dozen dream
	dress earth ember emotion enjoyment establishment evening everybody evidence
	exchange explanation eye fashion fault fear feather feeling garment gentleman
	glimpse good goodness grace grave ground guide habit half hand happiness head
	height hill holiday honour hope horizon hue importance inferiority information
	instance instinct interference introduction isolation jealousy judge keeping
	king labourer lamp land leg lesson light lion listening living loss lunch luxury
	man mate material matron mean meat memory menace mercy merit mess messenger
	middle might mischief mistake month morning morrow music nest news night noise
	notice notion number o'clock objection observer occasion one oppression order
	ostler outline owner parent parting party peculiarity permission perseverance
	pie pillow pity plot plumage position possibility post poverty preference
	prejudice presence principle progress promise proof proportion quality question
	rain reference remainder residence resolution rest restraint romance room rumour
	saying scent scruple sense servant service sewing shade shutter silence skill
	smile sob soul south spot spring stage stand station sternness stirring stock
	stone stream strength string style submission success suffering sun teaching
	third throat tiding tooth transaction trouble view vigour village visitor
	warning waste week welcome while wickedness window winter wish woe woman yard
	yearning youth

	important_common_words_distance 0.0203335878346 (2%)
	important_common_words_frequency 0.197037440224 (19%)


Cholmondeley_Mary_Red_Pottage_PG_14885.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.816960710864

	HEAD HEART NOTHING ROOM WORD

	address advance advantage anguish anybody approach arm attachment attempt attic
	aunt authority bandage banister beast bedroom bench blind brain breadth bush
	business bye capital case cast charge cheek civility class climate coachman
	column confidence conquest conscience contrary contrast cook core corner cottage
	courtship cradle crape crisis crumb cry cup curate cure deal deed detail
	diffidence dinner direction disease dish district dog earnestness earth echo end
	endurance everybody excitement exclamation expression faith family fancy finger
	folk folly forehead forty ghost glimpse good gravel grove handkerchief happiness
	head heart holiday horse impatience impression incident indifference interval
	interview iron joy judgment justice kingdom knee lamb land laurel lawn length
	liar light living lodging look mahogany maker mat memory message midst
	misfortune morning muff name north nothing notice o'clock oak occupant opening
	opportunity orange outline pain palm pane pardon pas passage passenger passion
	pastor pause pendent personage pinch place plenty poetry poison pony possession
	post practice praise prayer presence present pressure pretext probability
	process proof prudence purpose quivering rail recognition relation remorse
	repetition resemblance resentment reserve respect restlessness result retreat
	right ring roof room sarcasm satin satisfaction scream scripture sense service
	shade shaft share sheep shilling shoe shop shoulder shrubbery side sight sign
	sincerity sketch skirt south speck sponge star station stocking stop storm
	strain straw struggle supper supply swell temper temptation tender tete thinking
	thread throat tiding tooth top tree tribute vice victim virtue visit wage waist
	weakness wedding week weight whither whole wilderness window wing wonder wood
	word wrath yearning youth zeal

	important_common_words_distance 0.0131745262136 (1%)
	important_common_words_frequency 0.135489843707 (13%)


Marlitt_E_Eugenie_At_the_Councillor_s_or_A_Nameless_History_PG_43393_0.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.83218783419

	DAY DOOR EYE FACE LADY ONE TIME VOICE

	accident admiration advance affection afternoon arch attitude aught autumn awe
	beauty bed bird birth blast blood bloom boot boudoir box breeze ceiling centre
	chance change chicken coat consolation constitution contrast cook core cottage
	course crisis cry curtain danger darling daughter day death decision delight
	delirium determination devotion direction disgust door draught drawing dread
	elegance element end enemy enjoyment evening evidence examination example eye
	face fairy faith family farm fashion feather fellow flame flight fold footman
	footstool forehead fortune freak freshness garment generation gleam glory glove
	gown ground happiness haste heat height humanity humiliation i. ignorance image
	imagination improvement individual inmate interruption judgment keeper key kind
	kitchen lady land landscape lantern laurel lawn lecture light madness marble
	mark mass mean meaning means member memory merchant mirror misery mist mistake
	month mouth murmur neck nest objection occurrence ocean one opportunity outline
	pair parcel passion peace philanthropist picture place plank plate pledge
	politeness pool portion presence pretension pride problem proceeding proof
	property protection pursuit quivering rank ray reason refreshment regret relic
	repetition request resemblance respect rest result retirement revelation revolt
	reward robber romance rook root row rule sacrifice satin scene scream secret
	security sensation series servant shake share sheet shone shower shrub side
	sight silence situation slumber smell snow sound source speech star state
	stocking stone style submission success succession successor surgeon taste tea
	tear think thorn thought throb tie time token topic touch tree truth turf
	twilight use vapour vehemence vein visitor vocation voice watch while wife
	wintry work wound wreath youth

	important_common_words_distance 0.0179393920673 (2%)
	important_common_words_frequency 0.166486813139 (16%)


Evans_Augusta_J_Augusta_Jane_Vashti_Or_Until_Death_Us_Do_Part_PG_31620.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.769490633732

	DAY EYE HOUR HOUSE MAN THING VOICE

	acquaintance address advantage afternoon apartment appearance article
	association attention authority autumn aversion balm band bank barrier base
	beast bee blow board bolt bone brain breadth building calm carpet case ceremony
	change character chest china cigar claim communion companion complexion concern
	confusion conjecture consciousness contrary cook course creed crime crimson
	cross cup curiosity dawn day degradation description desk destiny determination
	dissipation dog drapery dress drift driver eagerness ear effect egg emotion
	everything exercise eye fall family father fatigue fault fee fiend flood flower
	fog foliage forehead form fragrance front gesture ghost gift girl glass gleam
	glove gold governess group guidance guide half handkerchief haste health hearing
	hill history hold horizon hour house ignorance importance improvement
	inclination intention interest iron judgment key kindness knee law library lid
	light likeness lock lunatic man mark marriage means meeting merit midst mirror
	mist model moss mystery nail name neck need nerve nonsense noon nose notice
	object observation opera opinion page pain painting pang parcel path period
	perusal pipe place poison poverty preference presence proof proposal proprietor
	punishment quarter ray reason reception recess reflection reply repose
	responsibility return revelation ring romance roof rule season sermon service
	shawl ship silence skin sky sob somebody song spasm spectre spell spring
	stillness storm stranger stream string style success suffering summer sun
	surgeon sweep syllable sympathy table talent tempest tendency theme thing tone
	tongue topic town trace treasure tree tress trouble vas veil vein vessel vision
	voice watch wealth week west wheel while wife will window wing wish work wound

	important_common_words_distance 0.0157010439245 (2%)
	important_common_words_frequency 0.169959572248 (16%)


Black_William_In_Silk_Attire_A_Novel_PG_40111_0.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.840462286006

	DAY EYE HAND HEAD LADY ONE SOMETHING

	acknowledgment acquaintance action affection arm attention attire bead bedside
	beggar behalf birth blast bond bosom bow brain bye case centre ceremony cheek
	circumstance clerk clothe coachman comfort commonplace communication conclusion
	confusion consent constitution couch course covering creed curiosity custom
	danger day despair devil difference dining direction disclosure dish dismay
	disposition district drawing dream drop education element emotion end
	entertainment estate evening event expectation experience explanation eye faith
	family finger fit flight fold fortune gaiety gentleman gesture glass glory glove
	governess gravity green ground guardian guide habit hair half hand happiness
	harvest haste head heaven hero hesitation history home honour horse host idol
	impression improvement impulse infant influence instant intelligence intention
	interest journey kind lady law leant length light loneliness loss mark mean
	meaning merchant message mood moonlight news nightmare north nose notice nut oak
	obedience one organ origin pain parcel pardon patient peculiarity person pet
	philosopher phrase picture place plainness pleasure pledge plot pointer
	politeness pound pressure prey pride problem promise prospect prudence purity
	purpose rate reaction reception reference regard relation resentment residence
	resource respect rest result return revelation round rush satin saying scream
	secrecy service severity shaft shame sickness side sight silk silver singer
	sinner situation smell snow something sort spot square star stay stick string
	struggle summer summit sun sunshine surgeon table temper term thrill throat tone
	top torture town toy travel trick tune turn uncertainty utmost utterance vale
	velvet view voyage waistcoat wave weather wedding week wintry wisdom witch
	witness wood world wrath wrong yard year youth

	important_common_words_distance 0.0190085223791 (2%)
	important_common_words_frequency 0.167728024871 (16%)


Stephens_Ann_S_Ann_Sophia_The_Old_Homestead_PG_8078.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.820044406412

	DOOR HOUSE LADY LIFE NIGHT NOTHING ONE ROOM SOMETHING TIME WORD

	abyss advice age amazement anger anguish annoyance appearance argument attic
	back benefit black box brain brand briar bush canvas capital caress case cast
	cattle change cheese chimney closet clothe coffee cold coldness commotion
	companion comprehension condition conscience consolation contentment control
	conviction counsel country course crime cultivation cup cut darkness deal
	decision delight depth desk destiny dew district door driver effort
	embarrassment energy entrance excitement expectation family farm farmer fatigue
	feature feeling firm fit flint flood fondness form fortitude frame frown fur
	garden gaze glass glimpse glory good goodness gown hair handkerchief heaven herd
	hesitation hill homage honey house hundred hurricane idea imagination impulse
	inch influence innocence institution intelligence interruption interview judge
	keeper kiss kitchen knitting lady lamp letter life light limb loaf loneliness
	lot love luxury ma'am mahogany manner mansion mantle mast mean meaning metal
	midnight midst misfortune moment murmur neck need night nonsense note nothing
	nourishment number oak observation one opening orchard pair pang papa partner
	passion pavement pendent persuasion picture pity place plank pocket pointer
	position power prayer presentiment prey pride proof protection pulse quilt rain
	recommendation regeneration regret repentance resolution rest reward ring roof
	room rush scene scruple seat seed serenity shoulder shrubbery side sight silver
	sister size skin smoke snow something song sort spirit standing station step
	sternness stone stray stride string subject succession sun supply surprise sweep
	sympathy table talent talk tender terror thank thought throat throb tiding time
	tint toil tooth top trifle umbrella uniform victim village weather wood word
	year

	important_common_words_distance 0.0177174046849 (2%)
	important_common_words_frequency 0.184028294862 (18%)


Carey_Rosa_Nouchette_Uncle_Max_PG_16080.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.785582501578

	DOOR EYE HEART HOUSE NOTHING SOMETHING

	account action advantage age agitation air angel annoyance antiquity appetite
	attachment attention attitude aunt back ball bar battle bead beauty bed bedroom
	bedstead bird black blaze blight blossom bolt bonnet book bottom cake canvas
	carriage cattle chair charge cheek chord civility class cold coldness colourless
	commencement commonplace communication complaint condition confidence conscience
	consent contentment contrary copy cord country crash cup curiosity cut dark
	death debt deed delay description destruction difference discussion doll door
	doubt dream dressmaker dust ear effect egg embarrassment embroidery
	entertainment eternity evil experiment eye faith fatigue fear feeling fever
	finger firmness fit fog folly food footing footstool frenzy fund future
	gentleness gesture girl gleam glimpse gold goodness grate gravity guest hair
	half hay hearing heart heiress her history hope horror hotel house housekeeper
	human hundred husband ignorance inspection inspiration instant instinct interest
	introduction ivory joy justice knot lace latch laughter leisure limit line lip
	lot love ma'am magistrate meal means meat meeting mercy mind minute misery
	mission moonlight mystery name naught nothing notion o'clock observer office
	opposite painting path peace picture pie pillow place practice presence priest
	problem purpose quarter rate reception regret religion response rest return
	rider right roof row schoolroom self sens sentence set sewing shadow sheet
	sickness sign silver skin slip sob something son sorrow square standing state
	statue street strength success suffering suitor table terror thorn topic tray
	tribute twilight view wage walk warmth wave week wing wisdom world writing year
	yearning yoke youth

	important_common_words_distance 0.0159420562591 (2%)
	important_common_words_frequency 0.166551246537 (16%)


Ingelow_Jean_Fated_to_Be_Free_A_Novel_PG_12303.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.804125510711

	DAY FACE HAND HEAD HEART LADY WAY WORD

	act affection afternoon anguish annoyance appearance apron arrangement
	aspiration attendance attention aught band bargain bench bird bloom bow breast
	breathing building bush care career ceremony chapter churchyard cloth coachman
	colour coming command commonplace companion concern conclusion confusion
	consciousness constitution conversation cook cordiality corner cottage counsel
	county crape crash crow curiosity cut danger darling day delight dependence
	desert dining disease distress dog doubt drive dusk duty earnestness emblem
	enemy excitement expectation expression face failure fall fault feeling fiction
	field fit flight forest friendship gallery gate girl glove glow grace grammar
	gratitude grey ground half hamlet hand head heart heat heaven help her
	hesitation horse hotel hurry i. impatience improvement independence indifference
	indignation information inhabitant inspection intelligence journey king knife
	lady lantern laugh laughter lawn lecture leisure light line listener living
	majesty mark meadow means member middle milk miniature misery mist month moss
	mouth nail noise none north nourishment o'clock oil opening order parent
	peculiarity person pet pie plank plate pool portion praise present probability
	proportion proposal purpose quantity queer question rapture rat recollection
	reference regret relation remainder rent repetition request reserve rest root
	round ruth satisfaction scarf scheme secrecy secret sense shaft share shoe shore
	side silence sincerity sinner situation sleep slice smile sofa soil solicitude
	song sort south spirit spread stage stand state storm stuff substance summer
	sunshine throat toast token toy triumph trouble understanding vain valley
	variety velvet vigour virtue visitor want warning way weather weight welcome
	while wilderness window wine wing winter wood woodland word yard yesterday yew

	important_common_words_distance 0.0165976516309 (2%)
	important_common_words_frequency 0.154188318437 (15%)


Collins_Wilkie_The_Woman_in_White_PG_583.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.804369854268

	DOOR HEAD LADY LIFE ROOM

	afresh afternoon age alternation animal animation annoyance answer arm attendant
	attraction bachelor back background benefit bit bitterness block boarding book
	boot box breakfast breath breathing briefly building cabinet capacity caprice
	care century certainty chain chair character characteristic charge cheek cheer
	circle civility claim clergyman clothe coffee complexion connection conscience
	contemplation conversation costume crisis cross darkness darling decision
	deformity description desolation destruction deuce dinner disease disposition
	distance door dozen dress dressmaker driver eagerness elbow emphasis employment
	establishment eve evening extremity fancy favour fear foot forty frenzy
	friendless front function game garden gentleman gown grass gratitude gravel grey
	guest guide happiness head heap hold hope horror hue idea idiot image impatience
	impulse infection inside insolence inspection instinct intelligence interruption
	irritation island keepsake lady lane language lapse leaf leave lie life light
	lip list lock look malady march match material meantime method middle mine
	minute misery neglect nerve north number obstacle offer opera opposite other
	ottoman path people plan pocket point pool portfolio pound poverty pretence
	price promise protection rage rapidity readiness reading reception recollection
	record regard regret relation reliance relish remain repose riding road roof
	room round sacrifice safety sea sens sense sentence shawl ship shoulder
	shrubbery silence sill skeleton smoke sofa sorrow soul south spark stair step
	stillness stop substance suggestion sum superstition tear terror thinking top
	topic torpor tour trace tranquillity tribute trouble uncle utmost victim wage
	walk waste watch weight wheel whirl whisper whispering whole wilderness window
	wine wretch year

	important_common_words_distance 0.0154440308888 (1%)
	important_common_words_frequency 0.148348119782 (14%)


Eliot_George_Felix_Holt_the_Radical_PG_40882.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.84602086742

	HAND HEAD NOTHING ONE TIME WORD

	absence account addition advance advice ambition annoyance association attention
	attraction aught bar bench benefit black bloom boot bottom breathing building
	bust cabinet care carriage cat centre chair change chapter character check chin
	choice clothe complexion confidence confusion conjecture constitution corner
	creature creed cure cut debt desk devil difference disclosure discourse
	discovery disgust document dozen elder eloquence emotion entrance error
	expedient fall fancy feeling fender finger food force fortune fragment freshness
	friend fro fur future garment gentry glimpse glory glove gold grace grandfather
	grasp grass gratitude ground guest hand handkerchief harvest haste head heiress
	her holiday hotel hurry husband illness impatience indulgence inn insolence
	instinct intention introduction irritation isolation joy keeper knee knot
	landscape lantern lap leader length library license lie light lodge lodging love
	lunch madness manner manor marriage mean mirror mist mistake moment moth mouth
	move movement narrative neck news noise north nothing number nut oak obligation
	offer one order pain paradise party passion penny period pity point porridge
	port poverty presentiment pretension proposal proprietor punishment rat reach
	reference reserve resident resolve respect responsibility return reward ring
	rise rule salt sarcasm scent seed service set sheep shoulder shrub sinner smile
	snare snuff sorrow soul source south space spectacle spirit stable stair
	standard standing start state station stay stillness stone submission succession
	sum summit sunshine supply tapestry teaching tender terror thinking thought time
	toast tongue tooth train treatment trembling trifle type uncle utterance vessel
	vice victim vocation walk warmth waste weariness will wish word working wrong
	year yesterday yoke youth

	important_common_words_distance 0.0135577728994 (1%)
	important_common_words_frequency 0.141584434064 (14%)


Hardy_Thomas_Far_from_the_Madding_Crowd_PG_27.txt
Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt

	matrix distance 0.815820082332

	DOOR EYE FACE HAND HEAD HOUSE NIGHT NOTHING VOICE WORD

	address advice afternoon air anger appearance arrangement arrival article aspect
	attachment aunt base beating bee blow bond brass breeze burden business bustle
	call capital case cat certainty chance chimney circumstance city climax clock
	coal companion conduct consistency contour contrast cottage county course
	creature death debt defect depth determination devil diamond disappointment
	disease dismay distance door dress dusk dust ear element exchange experience
	expression eye face feeling finger fir flame flight foot force forty frame
	friendship frock game garb gaze globe good grasp gravel green grey grief hair
	hand head hesitation history home house household humour imagination indignation
	indulgence information inquiry instant interview journey labour lamp lane lapse
	line loss maker material mean measure mercy messenger milk mind minute moment
	morning morrow moss musing name nature neck neighbourhood night nothing notice
	number oak object occasion occupant offer pair palm pane pardon part penny
	period philosopher pillar pipe pity pocket point portion pound power precaution
	process progress proposal proprietor protection pursuit quality rain rainbow
	reach relic relish reply reserve responsibility rest revelation robber rook root
	rush scene scripture sea self sensation sense serenity sermon shake shilling
	ship shone shoulder sight silver situation skin slip smell snake softness song
	sort soul spectre state stillness stone substitute succession summer sunset
	superiority swell talk tenderness thinking thought thread tooth torture touch
	track transaction travel traveller triviality twilight uncertainty uniform
	utterance valley vision voice waist wall want washing wedding week west while
	whither widow wife wild wind wonder wood word work world wound writing wrong
	yesterday

	important_common_words_distance 0.0198336615901 (2%)
	important_common_words_frequency 0.189847280661 (18%)


Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Marlitt_OMS_Wister translation_cleaned_110617.txt

	matrix distance 0.836019025155

	HEART LADY LIFE ONE TIME WORD

	accord account action advance afternoon age allusion alteration annoyance
	apartment appearance apple approach arrival artist attendance autumn balcony
	band bank bar bed bird blood blow bolt bone call capital card chapter character
	childhood church circumstance claim cloth companion comrade condition
	consciousness consequence content countenance country crape crown cup curiosity
	cut dame decay delicacy design despair disaster disclosure distinction distress
	dozen education emotion endurance entertainment excursion expectation experience
	fact fancy fashion fate favour feather fever foot freedom friend gallery glow
	good ground gulf hair hay heart height hem hint home host humanity husband
	impatience impression indifference insolence inspiration intelligence interest
	introduction intruder joy kingdom knife knot lady lantern lash law letter level
	life light lily line listener loss love lover luxury mantle mask mass mat
	material matter meaning means meeting memory metal midnight mind mischief misery
	morrow mourning mouth murmur name necessity neck needle noise obedience
	objection observer office one opera opinion oppression ornament parish passion
	path pavement people performance permission pipe place plain plate plot pocket
	poverty presence present presentiment pride prison purity purpose pursuit
	quantity race realm reason reception recognition refuge regard relic rent
	repentance request risk roll row satin scale secrecy sentence series sermon
	service severity sewing shadow share shawl shone shoulder shrub side silk slate
	sob sound speaker speck spectator spring stage step sternness stream stuff style
	succession suffering sunshine thorn thought thread time toilette tongue tooth
	top tower tree twig twilight uncertainty understanding valley vein volume vow
	want wardrobe warmth wife wing wintry woe word yoke

	important_common_words_distance 0.0158930674085 (1%)
	important_common_words_frequency 0.154811461214 (15%)


Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt
Marlitt_E_Eugenie_Gold_Elsie_PG_42426.txt

	matrix distance 0.786523244186

	DAY DOOR EYE FACE HEART LIFE ONE ROOM TIME VOICE WORD

	absence action addition address advice affection afresh afternoon age agony
	arrangement article back basin beauty being bell black blood blow board body
	bond bone boot boudoir brain bride bridle brow call case cat certainty change
	chaos charge charm chestnut childhood china choice city coal coin cold companion
	companionship comprehension comrade consolation contrary conversation costume
	couch courage covering crescent cruelty danger day death delusion despair
	diamond direction distress door drive echo embrace end establishment excitement
	exclamation experience eye face fairy fall feature fence fold fool form friend
	fruit furniture gaiety gate gift glimpse gloom governess grace grandeur grief
	handful haste heart help her hold holiday horn horse host housekeeper hurry idea
	ideal impatience inclination influence instrument iron jaw judgment key kind
	kindness king labourer lace land lap laurel lawn lid life line loneliness love
	lover maniac mark marriage mass match memory merchant mirror moss name need
	nerve nest news object offer one opening opportunity organ papa pardon patience
	permission person perusal pity place poetry position praise presentiment price
	pride proceeding proof proposal reality relief reply reproach resolution resolve
	respect rest result revelation risk room rule scarf sentence servant service
	shade shadow shop shoulder silence sin sister situation skirt slave snow society
	sofa song sorrow sound spark sphere star state struggle stuff submission
	suffering sum superintendent syllable sympathy table tete thank throat time toe
	toilet token tongue torture town train traveller treatment trunk twilight
	uniform use utterance veneration violence voice vow waist wanderer watch welcome
	whisper whit widow wish witch woman word

	important_common_words_distance 0.0218369461188 (2%)
	important_common_words_frequency 0.191273434414 (19%)
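
The percentages in parentheses above are not labelled in the output. Judging from the printed figures, they are consistent with two simple calculations: the distance percentage looks like the important-common-words distance divided by the pair's matrix distance, and the frequency percentage looks like the frequency itself, in both cases truncated (not rounded) to a whole percent. The sketch below only illustrates that reading; the function names are hypothetical and are not taken from the notebook's code.

```python
def distance_share_pct(important_distance, matrix_distance):
    """Share of the pair's matrix distance attributable to the listed
    nouns, truncated to a whole percent (assumed reading of the output)."""
    return int(100 * important_distance / matrix_distance)


def frequency_pct(frequency):
    """Relative frequency truncated to a whole percent (assumed reading)."""
    return int(100 * frequency)


# Figures copied from the "In Silk Attire" / "Jane Eyre" pair above.
print(distance_share_pct(0.0190085223791, 0.840462286006))  # 2
print(frequency_pct(0.167728024871))                        # 16
```

Truncation matters for reading the lists: The Woman in White is printed as 1% although its ratio is about 1.9%, and Gold Elsie is printed as 2% although its ratio is about 2.8%.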


n_je_distances 150

day 88
eye 86
rest 76
nothing 74
side 73
head 72
name 70
ground 70
word 69
place 69
voice 67
air 66
half 66
life 63
something 62
light 62
hand 61
time 61
window 61
morning 60
year 59
arm 59
heart 59
part 57
step 57
finger 56
look 55
return 55
question 55
account 55
face 55
thought 55
service 54
door 54
mind 53
chapter 53
sound 53
foot 53
end 52
mistake 52
shoulder 52
neck 52
lip 52
moment 52
course 51
good 51
house 51
cheek 51
conversation 51
opening 51
care 50
table 50
line 50
hair 50
age 50
room 49
none 49
change 49
one 49
week 49
doubt 49
mouth 48
interest 48
pain 48
direction 48
evening 48
world 48
attention 48
turn 47
view 47
home 47
curiosity 46
round 46
companion 46
sight 46
truth 46
attempt 45
shame 45
trouble 45
difference 45
sun 45
mercy 45
spot 44
lady 44
necessity 44
morrow 44
back 44
whisper 44
fit 43
earth 43
presence 43
sign 43
case 43
beauty 42
character 42
journey 42
love 42
silence 42
tooth 42
death 42
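
The output does not say how the tally above is computed. One plausible reading, which is an assumption here, is that each number counts how many of the 150 comparisons with Jane Eyre list that noun among their important common words. Under that assumption, a minimal sketch with `collections.Counter` might look like the following; the function name and the miniature input are hypothetical.

```python
from collections import Counter


def tally_shared_nouns(comparisons):
    """Count, for each noun, how many comparisons list it.

    `comparisons` is an iterable of noun lists, one per text pair
    (a hypothetical structure, not the notebook's own).
    """
    counts = Counter()
    for nouns in comparisons:
        # set() ensures a noun counts at most once per comparison
        counts.update(set(nouns))
    return counts.most_common()


# Hypothetical miniature input: three comparisons only.
example = [
    ["day", "eye", "hand"],
    ["day", "door", "eye"],
    ["day", "heart"],
]
for noun, n in tally_shared_nouns(example):
    print(noun, n)
```

Nouns with equal counts may print in any order, so only the counts themselves are meaningful.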