This notebook covers chapters 12-20 and 22-24 of Jane Eyre.
In it I look at differences in language between two kinds of passages: those where Rochester looks at Jane and those where Jane looks at Rochester.
At the beginning I separate the passages into two groups, saved as text files for use later in the notebook.
The first file contains all passages where Rochester looks at Jane, talks about her appearance, or where Jane imagines Rochester looking at her, according to the XML tags created by me (Tomek) and Sandra.
The second file contains all passages where Jane looks at Rochester, talks about his appearance, or imagines looking at him.
The number of passages in each file is printed below the corresponding code cells.
This notebook does not demonstrate to what extent the characters speak differently -- many instances of Rochester's gaze appear in the novel not as direct speech but as Jane's narrative. However, it can indicate some ways in which Rochester's and Jane's gazes play different roles in the novel, and how each character looks at the other. Similar notebooks can later be created for the German translations, so that their results can be compared with the results here.
import unicodecsv
import codecs, re
import nltk
import spacy
import en_core_web_sm
#nlp = spacy.load('en')
nlp = en_core_web_sm.load()
f = open('passages.rochester.txt', 'w')
passages_rochester = []
with open('extracted_passages_JE_only_ch12_20_22_24.csv', 'r') as csv_passages:
    spamreader_passages = unicodecsv.reader(csv_passages, delimiter='|', quotechar='"')
    for i, row in enumerate(spamreader_passages):
        if i > 0:  # skip the header row
            if re.search('Rochester_looking_at_Jane|Rochester_on_Janes_appearance|Jane_imagining_Rochester_looking_at_her', row[3]):
                f.write(row[4] + '\n')
                passages_rochester.append(row[4])
f.close()
print 'rochester_gaze_passages: ' + str(len(passages_rochester))
f = open('passages.jane.txt', 'w')
passages_jane = []
with open('extracted_passages_JE_only_ch12_20_22_24.csv', 'r') as csv_passages:
    spamreader_passages = unicodecsv.reader(csv_passages, delimiter='|', quotechar='"')
    for i, row in enumerate(spamreader_passages):
        if i > 0:  # skip the header row
            if re.search('Jane_looking_at_Rochester|Jane_reflecting_on_Rochesters_appearance|Jane_imagining_looking_at_Rochester', row[3]):
                f.write(row[4] + '\n')
                passages_jane.append(row[4])
f.close()
print 'jane_gaze_passages: ' + str(len(passages_jane))
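The cells above use unicodecsv, which targets Python 2. Under Python 3 the standard csv module reads Unicode text files natively. The sketch below is a hypothetical helper (`extract_passages` is my name, not the notebook's), written in Python 3 syntax and assuming the same column layout as the notebook's CSV: the tag string in column 3 and the passage text in column 4.

```python
import csv
import re

def extract_passages(csv_path, tag_pattern):
    """Return the passages (column 4) whose tag column (column 3)
    matches tag_pattern, skipping the header row."""
    passages = []
    with open(csv_path, newline='', encoding='utf-8') as fh:
        reader = csv.reader(fh, delimiter='|', quotechar='"')
        next(reader)  # skip the header row
        for row in reader:
            if re.search(tag_pattern, row[3]):
                passages.append(row[4])
    return passages
```

With this helper, the two extraction cells reduce to two calls with different tag patterns.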
texts = [
    {'file_name': 'passages.rochester.txt',
     'raw_text': '', 'tokens': [], 'text_obj': None, 'spacy_doc': None},
    {'file_name': 'passages.jane.txt',
     'raw_text': '', 'tokens': [], 'text_obj': None, 'spacy_doc': None},
]
for t in texts:
    t['raw_text'] = codecs.open(t['file_name'], 'r', encoding='utf-8').read()
    t['tokens'] = nltk.word_tokenize(t['raw_text'])
    t['text_obj'] = nltk.Text(t['tokens'])
    cleaned_text = re.sub(r'\s+', ' ', t['raw_text'])
    t['spacy_doc'] = nlp(cleaned_text)
print 'Done!'
What are the common nouns in the Rochester and Jane gaze passages?
In the two cells below I count how many times each noun appears in the Rochester and Jane gaze passages (absolute count and relative frequency per 1000 nouns). Under each cell is a list of the nouns appearing at least three times, together with their absolute counts and relative frequencies per 1000 nouns.
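The per-1000 figure is simply count / total nouns * 1000, which makes counts from passages of different lengths comparable. A toy illustration in Python 3 syntax (the lemmas here are invented, not taken from the notebook's output):

```python
from collections import Counter

# Toy data standing in for the noun lemmas of one passage group.
noun_lemmas = ['eye', 'eye', 'face', 'glance']
counts = Counter(noun_lemmas)
total = len(noun_lemmas)
for lemma, count in counts.most_common():
    # relative frequency per 1000 nouns
    print('%s: %d occurrences, %.1f per 1000 nouns' % (lemma, count, count / float(total) * 1000))
```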
from collections import defaultdict, Counter
nouns_rochester = []
noun_counts_rochester = []
for t in texts:
    if t['file_name'] == 'passages.rochester.txt':
        nouns = []
        words = []
        print
        print t['file_name']
        print
        lemma_counts = defaultdict(int)
        for tok in t['spacy_doc']:
            words.append(tok)
            if tok.pos_ == 'NOUN' and tok.lemma_ not in ['what', 'who']:
                lemma_counts[tok.lemma_] += 1
                nouns.append(tok)
        print 'words: ' + str(len(words))
        print 'nouns: ' + str(len(nouns))
        for w in Counter(lemma_counts).most_common(1000):
            word = str(w[0])
            count = str(w[1])
            relative_count = str(w[1] / float(len(nouns)) * 1000)
            nouns_rochester.append(word)
            noun_counts_rochester.append([word, count, relative_count])
            if w[1] > 2:
                print '\t', w[0], w[1], relative_count
from collections import defaultdict, Counter
nouns_jane = []
noun_counts_jane = []
for t in texts:
    if t['file_name'] == 'passages.jane.txt':
        nouns = []
        words = []
        print
        print t['file_name']
        print
        lemma_counts = defaultdict(int)
        for tok in t['spacy_doc']:
            words.append(tok)
            if tok.pos_ == 'NOUN' and tok.lemma_ not in ['what', 'who']:
                lemma_counts[tok.lemma_] += 1
                nouns.append(tok)
        print 'words: ' + str(len(words))
        print 'nouns: ' + str(len(nouns))
        for w in Counter(lemma_counts).most_common(1000):
            word = str(w[0])
            count = str(w[1])
            relative_count = str(w[1] / float(len(nouns)) * 1000)
            nouns_jane.append(word)
            noun_counts_jane.append([word, count, relative_count])
            if w[1] > 2:
                print '\t', w[0], w[1], relative_count
Which nouns are significantly more common for one character than the other?
Below I check which nouns occur more frequently in the Rochester gaze passages than in the Jane gaze passages, and vice versa. Some nouns are particularly striking:
"Pleasure" is much more common in expressions of Jane's gaze, as is "something." "Thing" is much more common in expressions of Rochester's gaze, as are "question" and "glance."
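The comparison here ranks nouns by raw relative frequency. If a formal significance test is wanted, a standard choice in corpus linguistics is Dunning's log-likelihood ratio (G2), where values above roughly 3.84 correspond to p < 0.05 for one degree of freedom. A minimal sketch in Python 3 syntax, using the simplified two-corpus form and invented counts rather than the notebook's actual numbers:

```python
import math

def g2(a, n1, b, n2):
    """Log-likelihood ratio for a word with count a in a corpus of
    n1 tokens versus count b in a corpus of n2 tokens."""
    e1 = n1 * (a + b) / float(n1 + n2)  # expected count in corpus 1
    e2 = n2 * (a + b) / float(n1 + n2)  # expected count in corpus 2
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

# Hypothetical example: 10 vs 1 occurrences in two 1000-noun samples.
print(g2(10, 1000, 1, 1000))
```

Applied to the counts collected above, this would separate differences worth discussing from those explainable by chance.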
print 'MORE FOR ROCHESTER'
print
print 'noun--count(Rochester)--count(Jane)--relative(per 1000 nouns, Rochester)--relative(per 1000 nouns, Jane)'
print
for ncr in noun_counts_rochester:
    for ncj in noun_counts_jane:
        if ncr[0] == ncj[0] and float(ncr[2]) > float(ncj[2]):
            if float(ncr[1]) > 3:
                print ncr[0], ncr[1], ncj[1], ncr[2], ncj[2]
    if ncr[0] not in nouns_jane:
        if float(ncr[1]) > 3:
            print ncr[0], ncr[1], '0', ncr[2], '0'
print
print 'MORE FOR JANE'
print
print 'noun--count(Jane)--count(Rochester)--relative(per 1000 nouns, Jane)--relative(per 1000 nouns, Rochester)'
print
for ncj in noun_counts_jane:
    for ncr in noun_counts_rochester:
        if ncj[0] == ncr[0] and float(ncj[2]) > float(ncr[2]):
            if float(ncj[1]) > 3:
                print ncj[0], ncj[1], ncr[1], ncj[2], ncr[2]
    if ncj[0] not in nouns_rochester:
        if float(ncj[1]) > 3:
            print ncj[0], ncj[1], '0', ncj[2], '0'
Now I am curious about the contexts in which some of these nouns appear.
While some differences in the common nouns of the Rochester and Jane gaze passages are unsurprising (such as 'master' being more common in instances of Jane's gaze), others are more striking.
Below I extract the sentences containing some of the more surprising nouns.
"Thing," almost always used by Rochester, refers to Jane.
"Questions" are to be asked by Rochester -- twice it is he who asks, and the one time Jane asks, she is afraid that doing so is inappropriate.
"Glance" is almost always Rochester's glance, except for one instance where he is reading Jane's glance.
"Time" seems to have various meanings, from the time necessary to establish Jane's and Rochester's relationship to the time they spend gazing at each other.
"Minute" (or "two minutes") usually stands for an extended while of Rochester looking at Jane.
"Something" may represent Jane feeling uncertain (particularly when she reads Rochester's expression), intuitive, 'feminine'?
Jane feels "pleasure" in looking at Rochester.
"Features," used by Jane when looking at Rochester, usually refers to the changing features of Rochester's face as she reads his expression.
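Whole sentences are extracted below; for a quick keyword-in-context (KWIC) view, the nltk.Text objects stored as 'text_obj' above also offer a concordance() method. A self-contained sketch of the same idea in plain Python 3 (`kwic` is a hypothetical helper of mine, not part of the notebook or of NLTK):

```python
def kwic(tokens, keyword, window=4):
    """Return keyword-in-context lines: `window` tokens of context on
    each side of every (case-insensitive) match of `keyword`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            left = ' '.join(tokens[max(0, i - window):i])
            right = ' '.join(tokens[i + 1:i + 1 + window])
            lines.append('%s [%s] %s' % (left, tok, right))
    return lines

# Toy usage; in the notebook one would pass t['tokens'] instead.
for line in kwic('He gave me a glance .'.split(), 'glance'):
    print(line)
```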
def print_sentences_with_lemma(lemma, pos, label=''):
    print
    print '--------------------------- ' + lemma.upper() + ' ---------------------------'
    for t in texts:
        print
        print t['file_name']
        print
        for sent in t['spacy_doc'].sents:
            # print each matching sentence once, even if the lemma occurs in it twice
            if any(tok.lemma_ == lemma and tok.pos_ == pos for tok in sent):
                print label + sent.text + '\n'

for noun in ['thing', 'question', 'glance', 'time', 'minute', 'pleasure', 'something', 'feature']:
    print_sentences_with_lemma(noun, 'NOUN')
Now I look at the common verbs in passages where Jane and Rochester look at each other.
As with the nouns, under each code cell there is a list of the most common words: first, the most common verbs in the Rochester gaze passages; second, the most common verbs in the Jane gaze passages.
for t in texts:
    if t['file_name'] == 'passages.rochester.txt':
        verbs = []
        words = []
        verb_counts_rochester = []
        verbs_rochester = []
        print
        print t['file_name']
        print
        lemma_counts = defaultdict(int)
        for tok in t['spacy_doc']:
            words.append(tok)
            if tok.pos_ == 'VERB' and tok.lemma_ not in ['what', 'who']:
                lemma_counts[tok.lemma_] += 1
                verbs.append(tok)
        print 'words: ' + str(len(words))
        print 'verbs: ' + str(len(verbs))
        for w in Counter(lemma_counts).most_common(1000):
            word = str(w[0])
            count = str(w[1])
            # relative frequency per 1000 verbs
            relative_count = str(w[1] / float(len(verbs)) * 1000)
            if w[1] > 3:
                print '\t', w[0], w[1], relative_count
            verb_counts_rochester.append([word, count, relative_count])
            verbs_rochester.append(word)
for t in texts:
    if t['file_name'] == 'passages.jane.txt':
        verbs = []
        words = []
        verb_counts_jane = []
        verbs_jane = []
        print
        print t['file_name']
        print
        lemma_counts = defaultdict(int)
        for tok in t['spacy_doc']:
            words.append(tok)
            if tok.pos_ == 'VERB' and tok.lemma_ not in ['what', 'who']:
                lemma_counts[tok.lemma_] += 1
                verbs.append(tok)
        print 'words: ' + str(len(words))
        print 'verbs: ' + str(len(verbs))
        for w in Counter(lemma_counts).most_common(1000):
            word = str(w[0])
            count = str(w[1])
            # relative frequency per 1000 verbs
            relative_count = str(w[1] / float(len(verbs)) * 1000)
            if w[1] > 3:
                print '\t', w[0], w[1], relative_count
            verb_counts_jane.append([word, count, relative_count])
            verbs_jane.append(word)
Which verbs are significantly more common for one character than the other?
Below I check which verbs occur more frequently in the Rochester gaze passages than in the Jane gaze passages, and vice versa.
print 'MORE FOR ROCHESTER'
print
print 'verb--count(Rochester)--count(Jane)--relative(per 1000 verbs, Rochester)--relative(per 1000 verbs, Jane)'
print
for vcr in verb_counts_rochester:
    for vcj in verb_counts_jane:
        if vcr[0] == vcj[0] and float(vcr[2]) > float(vcj[2]):
            print vcr[0], vcr[1], vcj[1], vcr[2], vcj[2]
    if vcr[0] not in verbs_jane:
        if float(vcr[1]) > 3:
            print vcr[0], vcr[1], '0', vcr[2], '0'
print
print 'MORE FOR JANE'
print
print 'verb--count(Jane)--count(Rochester)--relative(per 1000 verbs, Jane)--relative(per 1000 verbs, Rochester)'
print
for vcj in verb_counts_jane:
    for vcr in verb_counts_rochester:
        if vcj[0] == vcr[0] and float(vcj[2]) > float(vcr[2]):
            print vcj[0], vcj[1], vcr[1], vcj[2], vcr[2]
    if vcj[0] not in verbs_rochester:
        if float(vcj[1]) > 3:
            print vcj[0], vcj[1], '0', vcj[2], '0'
Now I check the contexts of some of the more interesting verbs -- those more common in instances of one character's gaze.
Rochester: will, smile, can, must. Jane: could, rise, feel.
Particularly striking are the differences in the presence of some modal verbs. While "would" is similarly common for both characters, "can" occurs only in instances of Rochester's gaze and "could" only in instances of Jane's gaze. This may, however, be an effect of the first-person narrator: "can" usually occurs in Rochester's direct speech, while "could" occurs in Jane's narration.
The fact that "feel" is more common in the passages where Jane gazes may point to the same phenomenon as the greater presence of "something" in Jane's passages -- her uncertainty in reading Rochester's expression, and her reliance on feeling and intuition as a way of understanding and interpreting reality.
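The speech-versus-narration explanation could be checked roughly by noting whether each modal falls inside quoted dialogue. The sketch below (Python 3 syntax; `modal_contexts` is a hypothetical helper of mine) tracks straight double quotes token by token -- an assumption, since the transcription of the passages may use typographic quotes instead:

```python
def modal_contexts(text, modals=('can', 'could')):
    """Label each modal occurrence as inside quoted speech or in
    narration, by toggling on straight double quotes."""
    in_quotes = False
    found = []
    for raw in text.split():
        if raw.startswith('"'):
            in_quotes = not in_quotes
        word = raw.strip('".,;:!?').lower()
        if word in modals:
            found.append((word, 'speech' if in_quotes else 'narration'))
        if raw.endswith('"') and len(raw) > 1:
            in_quotes = not in_quotes
    return found

# Invented example sentence, not a quotation from the novel.
print(modal_contexts('I could not speak. "You can ask," he said.'))
```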
def print_sentences_with_lemma(lemma, pos, label=''):
    print
    print '--------------------------- ' + lemma.upper() + ' ---------------------------'
    for t in texts:
        print
        print t['file_name']
        print
        for sent in t['spacy_doc'].sents:
            # print each matching sentence once, even if the lemma occurs in it twice
            if any(tok.lemma_ == lemma and tok.pos_ == pos for tok in sent):
                print label + sent.text + '\n'

for verb in ['will', 'smile', 'can', 'must', 'could', 'rise', 'feel']:
    print_sentences_with_lemma(verb, 'VERB', label='[' + verb.upper() + ']')