In this notebook I study the uses of "something" and "thing" in Jane Eyre. They can both express uncertain perception, but "thing" is also used as a term of belittlement.
import spacy
print spacy.__version__
nlp = spacy.load('en')
import codecs
text = codecs.open('Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt', 'r', encoding = 'utf-8').read()
doc = nlp(text)
# Collect every token whose lemma is "something"
somethings = []
for t in doc:
    if t.lemma_ == 'something':
        somethings.append(t)
print len(somethings)
# Collect every token whose lemma is "thing"
things = []
for t in doc:
    if t.lemma_ == 'thing':
        things.append(t)
print len(things)
from collections import Counter
# Dependency labels of "thing": label, count, share of all "thing" tokens in percent
my_deps = []
for thing in things:
    my_deps.append(thing.dep_)
for w in Counter(my_deps).most_common():
    rel_thing_dep = (float(w[1]) / float(len(things))) * 100
    print w[0], w[1], rel_thing_dep
# The same table for "something"
my_deps = []
for something in somethings:
    my_deps.append(something.dep_)
for w in Counter(my_deps).most_common():
    rel_something_dep = (float(w[1]) / float(len(somethings))) * 100
    print w[0], w[1], rel_something_dep
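The two cells above repeat the same calculation, so it could be factored into a small helper. The following is only a minimal sketch: `dep_distribution` is a hypothetical name, and the label list is invented toy data standing in for `[t.dep_ for t in things]`.

```python
from collections import Counter

def dep_distribution(labels):
    """Return (label, count, percent) tuples, most frequent label first."""
    if not labels:
        return []
    total = float(len(labels))
    return [(label, count, 100.0 * count / total)
            for label, count in Counter(labels).most_common()]

# Toy labels, invented for illustration; in the notebook they would come from
# [t.dep_ for t in things] or [t.dep_ for t in somethings].
toy_labels = ['dobj', 'dobj', 'dobj', 'nsubj', 'nsubj', 'pobj', 'appos', 'dobj']
distribution = dep_distribution(toy_labels)
for label, count, percent in distribution:
    print('%s %d %.1f' % (label, count, percent))
```

Calling the helper once per word keeps the two tables directly comparable, because both are guaranteed to use the same formula.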
In the next two cells I find the paragraphs that contain the "thing" lemma most often: I collect the numbers of all paragraphs with more than one "thing" lemma in them. Paragraphs are numbered in the order in which they appear in the novel.
import re
# Paragraphs are separated by one or more blank lines
paragraphs = re.split(r'\n\n+', text)
p_numbers = []
for i, p in enumerate(paragraphs):
    spacy_p = nlp(p)
    for t in spacy_p:
        if t.lemma_ == 'thing':
            #print i, t.dep_
            p_numbers.append(i)
# Keep the paragraphs in which "thing" occurs more than once
thing_paragraphs = []
for n in Counter(p_numbers).most_common():
    if n[1] > 1:
        print n
        thing_paragraphs.append(n[0])
Below I print out the paragraphs identified above.
for i, p in enumerate(paragraphs):
    if i in thing_paragraphs:
        print
        print '[paragraph '+str(i)+']', p
        print
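The paragraph-counting step above can be tried out without loading spaCy. The sketch below rests on a loud assumption: it matches the surface forms "thing" and "things" with a regular expression instead of using spaCy's lemma, so it only approximates the notebook's method. `paragraphs_with_repeats` is a hypothetical helper and the text is a made-up toy example.

```python
import re
from collections import Counter

def paragraphs_with_repeats(text, word, min_count=2):
    """Map paragraph index -> number of hits, for paragraphs with >= min_count hits."""
    paragraphs = re.split(r'\n\n+', text)
    # Assumption: whole-word surface match with optional plural "s", not a lemma.
    pattern = re.compile(r'\b%ss?\b' % re.escape(word), re.IGNORECASE)
    counts = Counter()
    for i, p in enumerate(paragraphs):
        hits = len(pattern.findall(p))
        if hits >= min_count:
            counts[i] = hits
    return counts

toy_text = ("A strange thing happened.\n\n"
            "Poor thing! Such things are not to be borne, you little thing.\n\n"
            "Nothing and something do not count here.\n\n\n"
            "One thing, then another thing.")
repeats = paragraphs_with_repeats(toy_text, 'thing')
print(sorted(repeats.items()))
```

The word boundaries keep "nothing" and "something" from being counted as "thing", which a plain substring search would do.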
Among the passages Sandra and I have marked up, there are three where Rochester calls Jane "thing". I want to see which dependency tag spaCy assigns to "thing" in these passages, so that we can find more passages like them in Jane Eyre. Below I find the sentence numbers for these passages, so that I can later run the dependency parse on just these sentences.
(4047) "You -- you strange, you almost unearthly thing!"
(4199) "But what had you to ask, thing, -- out with it?"
(4379) "Yes, bonny wee thing, I'll wear you in my bosom, lest my jewel I should tyne."
"Thing" in these sentences is usually translated into German as "Ding," although sometimes it is omitted and sometimes translated as "Wesen."
# Locate the three passages by a distinctive word or phrase
for i, s in enumerate(doc.sents):
    for t in s:
        if t.text == "wee" or t.text == "unearthly":
            print i, s
    if re.search('ask, thing', s.text):
        print i, s
# Print the three sentences and the tags spaCy gives to "thing" in each
je_thing_sents = []
for i, s in enumerate(doc.sents):
    if i in [4047, 4199, 4379]:
        je_thing_sents.append(s)
        print '----------'+str(i)+'-----------'
        print s
        print
        for t in s:
            if t.lemma_ == "thing":
                print t.text, t.pos_, t.dep_
        print
spaCy tags the dependency of "thing" in these passages as npadvmod (noun-phrase adverbial modifier). In the meantime I found that spaCy sometimes tags "thing" in similar passages as appos (appositional modifier). Below I extract all the sentences from Jane Eyre in which "thing" is tagged as appos or npadvmod.
sents = []
for i, s in enumerate(doc.sents):
    for t in s:
        if t.lemma_ == 'thing' and (t.dep_ == 'appos' or t.dep_ == 'npadvmod'):
            print '-------'+str(i)+'----'+t.dep_+'--------'
            print s
            sents.append(s)
            print
print '*********************************************************************************'
print
print 'sentences selected', str(len(sents))
Sentences above in which "thing" refers to a person: 520, 632, 834, 2670, 3715, 4047, 4199, 4379, 5824.
In 9 out of the 19 sentences where spaCy classifies "thing" as an appositional modifier or a noun-phrase adverbial modifier, "thing" refers to a person. In all of these sentences, the use of "thing" seems to be linked to intentional looking (gazing).
Now, besides Jane Eyre, I will also look at David Copperfield and at two works by Marlitt: Countess Gisela and Old Mademoiselle's Secret (OMS). This will allow me to compare how "thing" is used in reference to a person in these novels.
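The selection rule used above (lemma "thing" plus one of two dependency tags) can be written as a reusable filter. This is a sketch under stated assumptions: `Tok` is a hypothetical stand-in that carries only the `lemma_` and `dep_` attributes of a spaCy token, `select_sentences` is a hypothetical helper, and the tags in the toy sentences are invented rather than actual parser output.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy token: only the two attributes used here.
Tok = namedtuple('Tok', ['lemma_', 'dep_'])

PERSON_LIKE_DEPS = frozenset(['appos', 'npadvmod'])

def select_sentences(sentences, lemma='thing', deps=PERSON_LIKE_DEPS):
    """Return one (sentence index, dep) pair per matching token."""
    selected = []
    for i, sent in enumerate(sentences):
        for tok in sent:
            if tok.lemma_ == lemma and tok.dep_ in deps:
                selected.append((i, tok.dep_))
    return selected

# Invented toy parses, for illustration only.
toy_sentences = [
    [Tok('you', 'nsubj'), Tok('strange', 'amod'), Tok('thing', 'npadvmod')],
    [Tok('do', 'ROOT'), Tok('the', 'det'), Tok('thing', 'dobj')],
    [Tok('Adele', 'nsubj'), Tok('poor', 'amod'), Tok('thing', 'appos')],
]
matches = select_sentences(toy_sentences)
print(matches)
```

Because the function only reads `lemma_` and `dep_`, it should accept real spaCy sentences (`doc.sents`) as well, although that has not been checked here against the spaCy version used in this notebook.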
import nltk
texts = [
    {'file_name': 'Bront_Charlotte_Jane_Eyre_An_Autobiography_PG_1260.txt',
     'raw_text': '', 'tokens': [], 'text_obj': None, 'spacy_doc': None},
    {'file_name': 'Dickens_Charles_The_Personal_History_of_David_Copperfield_PG_43111.txt',
     'raw_text': '', 'tokens': [], 'text_obj': None, 'spacy_doc': None},
    {'file_name': 'Marlitt_Wister_Countess_Gisela_corrected_4_10_2018.txt',
     'raw_text': '', 'tokens': [], 'text_obj': None, 'spacy_doc': None},
    {'file_name': 'Marlitt_Wister_OMS_translation_cleaned_110617.txt',
     'raw_text': '', 'tokens': [], 'text_obj': None, 'spacy_doc': None},
]
for t in texts:
    t['raw_text'] = codecs.open(t['file_name'], 'r', encoding='utf-8').read()
    t['tokens'] = nltk.word_tokenize(t['raw_text'])
    t['text_obj'] = nltk.Text(t['tokens'])
    # Collapse every run of whitespace (line breaks included) into one space
    cleaned_text = re.sub(r'\s+', ' ', t['raw_text'])
    t['spacy_doc'] = nlp(cleaned_text)
print 'Done!'
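One consequence of the whitespace cleaning above is worth keeping in mind: once every whitespace run is collapsed into a single space, blank lines disappear, so the paragraph split used earlier for Jane Eyre can no longer be applied to the cleaned text. The toy string below is invented to show the effect.

```python
import re

# Invented two-paragraph sample with a hard-wrapped line and extra spaces.
toy_raw = "First paragraph,\nwrapped over two lines.\n\nSecond   paragraph."
cleaned = re.sub(r'\s+', ' ', toy_raw)

paragraphs_before = re.split(r'\n\n+', toy_raw)
paragraphs_after = re.split(r'\n\n+', cleaned)

print(cleaned)
print('%d paragraphs before cleaning, %d after'
      % (len(paragraphs_before), len(paragraphs_after)))
```

If paragraph-level counts are needed for the other novels, the split would have to be done on `raw_text` before the cleaning step.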
First I look at how many times the "thing" lemma appears in each novel, and how many of those instances are tagged as an appositional modifier or a noun-phrase adverbial modifier (and are therefore likely to refer to a person).
print
for t in texts:
    print t['file_name']
    thing = []         # "thing" with any other dependency tag
    select_thing = []  # "thing" tagged as appos or npadvmod
    for token in t['spacy_doc']:
        if token.lemma_ == 'thing':
            if token.dep_ == 'appos' or token.dep_ == 'npadvmod':
                select_thing.append(token)
            else:
                thing.append(token)
    total = len(thing) + len(select_thing)
    print 'thing (total):', str(total), 'thing as appos or npadvmod:', str(len(select_thing))
    print
Below I get the adjectives and adverbs from sentences where "thing" is tagged as an appositional modifier or a noun-phrase adverbial modifier, and separately from sentences where "thing" has any other tag.
for t in texts:
    print t['file_name']
    advs = []
    advs_select = []
    for s in t['spacy_doc'].sents:
        for token in s:
            if token.lemma_ == 'thing':
                # Separate loop variable (w) so the outer names are not overwritten
                if token.dep_ == 'appos' or token.dep_ == 'npadvmod':
                    for w in s:
                        if w.pos_ == 'ADJ' or w.pos_ == 'ADV':
                            advs_select.append(w.lemma_)
                else:
                    for w in s:
                        if w.pos_ == 'ADJ' or w.pos_ == 'ADV':
                            advs.append(w.lemma_)
    print 'THING AS APPOS OR NPADVMOD'
    for a in Counter(advs_select).most_common():
        if a[1] > 2:
            print a
    print 'OTHER THING'
    for a in Counter(advs).most_common():
        if a[1] > 3:
            print a
In both Jane Eyre and Dickens, words like "poor" and "little" are more likely to occur when "thing" is tagged as npadvmod or appos. We get hardly anything for Marlitt, though, perhaps simply because there is not enough data.
Next I check the passages with appos and npadvmod in Marlitt to see whether "thing" refers to a person there and whether it has the same connotations as in Jane Eyre. It will also be useful to look at the passages in JE and Dickens where "thing" (as appos or npadvmod) appears together with adjectives like "little" and "poor," to check whether both authors use the word in the same way.
As we see below, in OMS "thing" as npadvmod refers to a person in three out of four cases, and in those cases it is always negative or pitiful: stupid, poor, ungrateful. Why would OMS (or the English translation of OMS) be more influenced by JE than Gisela, which was written a year later? What are the German equivalents of these sentences?
for t in texts:
    if re.search('Marlitt', t['file_name']):
        print '****' + t['file_name'] + '****'
        print
        sents_marlitt = []
        for i, s in enumerate(t['spacy_doc'].sents):
            for token in s:
                if token.lemma_ == 'thing' and (token.dep_ == 'appos' or token.dep_ == 'npadvmod'):
                    print '-------'+str(i)+'----'+token.dep_+'--------'
                    print s
                    sents_marlitt.append(s)
                    print '--------------------'
                    print
        print 'sentences selected', str(len(sents_marlitt))
        print
Additionally, below I get all the sentences from Jane Eyre and David Copperfield where "thing" is tagged as appos or npadvmod and an additional word of belittlement or pity, "poor" or "little," appears.
for t in texts:
    if re.search('Dickens', t['file_name']) or re.search('Bront', t['file_name']):
        print '****' + t['file_name'] + '****'
        print
        for i, s in enumerate(t['spacy_doc'].sents):
            # Collect all lemmas first, so that "poor" or "little" is found
            # anywhere in the sentence and not only before "thing"
            tokens = [token.lemma_ for token in s]
            for token in s:
                if token.lemma_ == 'thing' and (token.dep_ == 'appos' or token.dep_ == 'npadvmod'):
                    if 'poor' in tokens or 'little' in tokens:
                        print '-------'+str(i)+'----'+token.dep_+'--------'
                        print s
                        print
                        print '--------------------'
                        print
There may be a difference in the emotions and values conveyed by the use of "thing" for a person in Jane Eyre, David Copperfield, and Wister's translations of Marlitt's works. "Thing" seems to be used as a term of diminishment by all authors, but while in Jane Eyre it appears to be pitying (with "poor little thing" dominating), in the other works it seems to be plainly negative ("ungrateful thing," for instance). Here I look at the adjectives and adverbs that appear in the sentences where "thing" is tagged as appos or npadvmod.
from collections import Counter
for t in texts:
    adjv = []
    print
    print t['file_name']
    for s in t['spacy_doc'].sents:
        for token in s:
            if token.lemma_ == 'thing':
                if token.dep_ == 'appos' or token.dep_ == 'npadvmod':
                    # Separate loop variable (w) so that token is not overwritten
                    for w in s:
                        if w.pos_ == 'ADJ' or w.pos_ == 'ADV':
                            adjv.append(w.lemma_)
    # The 20 most frequent adjectives and adverbs for this novel
    for w in Counter(adjv).most_common(20):
        print w[0], w[1]
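The cell above adds a sentence's adjectives and adverbs once for every matching "thing" token, so a sentence with two such tokens is counted twice. The sketch below shows one possible variant that counts each sentence only once. It is an alternative under stated assumptions, not the notebook's method: `Tok` is a hypothetical stand-in for a spaCy token, `modifier_counts` is a hypothetical helper, and the toy parses are invented.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a spaCy token: only the attributes used here.
Tok = namedtuple('Tok', ['lemma_', 'pos_', 'dep_'])

def modifier_counts(sentences, deps=('appos', 'npadvmod')):
    """Count ADJ/ADV lemmas in sentences containing a matching "thing" token."""
    counts = Counter()
    for sent in sentences:
        # any() makes each qualifying sentence contribute exactly once
        if any(t.lemma_ == 'thing' and t.dep_ in deps for t in sent):
            counts.update(t.lemma_ for t in sent if t.pos_ in ('ADJ', 'ADV'))
    return counts

# Invented toy parses, for illustration only.
toy_sentences = [
    [Tok('poor', 'ADJ', 'amod'), Tok('little', 'ADJ', 'amod'),
     Tok('thing', 'NOUN', 'appos'), Tok('thing', 'NOUN', 'npadvmod')],
    [Tok('poor', 'ADJ', 'amod'), Tok('thing', 'NOUN', 'appos'),
     Tok('very', 'ADV', 'advmod')],
    [Tok('good', 'ADJ', 'amod'), Tok('thing', 'NOUN', 'dobj')],
]
counts = modifier_counts(toy_sentences)
print(counts.most_common(1))
```

Whether per-token or per-sentence counting is preferable depends on the question being asked; for short sentences with a single "thing" the two give the same result.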
Now I turn to Professor Tatlock's question about Rochester using "thing" as a word of uncertainty. I search for sentences in which "thing" appears together with the lemmas "see" or "look."
The results suggest that "thing" does not signal Rochester's uncertainty any more than it signals Jane's. This would support the view that the Rochester-specific use of "thing" is as a term to belittle Jane.
for t in texts:
    if re.search('Bront', t['file_name']):
        print '****' + t['file_name'] + '****'
        print
        for i, s in enumerate(t['spacy_doc'].sents):
            tokens = []
            for token in s:
                tokens.append(token.lemma_)
            # Test once per sentence, after all of its lemmas are collected
            if 'thing' in tokens:
                if 'see' in tokens or 'look' in tokens:
                    print '-------'+str(i)+'--------'
                    print s
                    print
                    print '--------------------'
                    print
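The co-occurrence test above boils down to a set-membership check on the lemmas of one sentence. A minimal sketch, assuming each sentence has already been reduced to a list of lemma strings; `cooccurs` is a hypothetical helper and the lemma lists are invented toy data.

```python
def cooccurs(lemmas, target='thing', companions=('see', 'look')):
    """True if target and at least one companion lemma are in the sentence."""
    lemma_set = set(lemmas)
    return target in lemma_set and not lemma_set.isdisjoint(companions)

# Invented lemma lists, one per sentence.
toy_lemmas = [
    ['I', 'see', 'a', 'strange', 'thing'],
    ['she', 'look', 'at', 'I'],
    ['the', 'thing', 'be', 'do'],
    ['look', ',', 'you', 'poor', 'thing'],
]
hits = [i for i, lemmas in enumerate(toy_lemmas) if cooccurs(lemmas)]
print(hits)
```

Passing a different `companions` tuple would allow other verbs of perception to be tested the same way.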
In the two cells below I find the paragraphs with more than one "something" in them.
import re
# Same paragraph split as for "thing" above
something_paragraphs = re.split(r'\n\n+', text)
something_p_numbers = []
for j, sp in enumerate(something_paragraphs):
    spacy_sp = nlp(sp)
    for st in spacy_sp:
        if st.lemma_ == 'something':
            #print j, st.dep_
            something_p_numbers.append(j)
print '(paragraph number, number of "somethings")'
# From here on something_paragraphs holds paragraph numbers, not paragraph text
something_paragraphs = []
for sn in Counter(something_p_numbers).most_common():
    if sn[1] > 1:
        print sn
        something_paragraphs.append(sn[0])
Now I print out these paragraphs.
for j, sp in enumerate(paragraphs):
    if j in something_paragraphs:
        print
        print '[paragraph '+str(j)+']', sp
        print
Below I get all the sentences in Jane Eyre where "something" appears as the subject.
This is almost always Jane struggling to name what she is observing.
for i, s in enumerate(doc.sents):
    for token in s:
        if token.lemma_ == 'something' and (token.dep_ == 'nsubj'):
            print '-------'+str(i)+'----'+token.dep_+'--------'
            print s
            print
            print '--------------------'
            print
At the beginning of this notebook, we saw that "something" in Jane Eyre appears especially often as a conjunct.
Here I check in which contexts spaCy tags "something" as a conjunct (conj), and find that it almost always appears in phrases such as "something like it" or "something of that sort."
for i, s in enumerate(doc.sents):
    for token in s:
        if token.lemma_ == 'something' and (token.dep_ == 'conj'):
            print '-------'+str(i)+'----'+token.dep_+'--------'
            print s
            print
            print '--------------------'
            print
# Relative frequency of "something" in each novel, per 1000 tokens
print
for t in texts:
    print t['file_name']
    words = []
    somethings = []
    for token in t['spacy_doc']:
        words.append(token)
        if token.lemma_ == 'something':
            somethings.append(token)
    rel_freq = (float(len(somethings))/float(len(words)))*1000
    print 'words:', str(len(words)), 'something:', str(len(somethings)), 'something per 1000 words:'+str(rel_freq)
    print
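The normalisation in the last cell can be isolated as a one-line helper, which makes it easy to reuse for "thing" or any other lemma. This is a sketch only: `per_thousand` is a hypothetical name, and the counts are invented placeholder numbers, not results from the four novels.

```python
def per_thousand(hits, total):
    """Occurrences per 1000 tokens; 0.0 for an empty text."""
    if total == 0:
        return 0.0
    return 1000.0 * hits / total

# Invented (hits, total tokens) pairs, for illustration only.
toy_counts = {'novel_a': (12, 48000), 'novel_b': (30, 60000)}
rates = dict((name, per_thousand(hits, total))
             for name, (hits, total) in toy_counts.items())
for name in sorted(rates):
    print('%s: %.2f per 1000 tokens' % (name, rates[name]))
```

Note that, as in the cell above, the denominator would be the number of spaCy tokens, which includes punctuation, so the rates are per 1000 tokens rather than per 1000 words in the strict sense.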