The results from topic modeling with the two additional OMS translations are here.
Results are here from modeling the endings of the novels, and from modeling 500 topics (the same number used for the endings) across the full set of novels and across the top 30.
50 most common words for each of the 20 topics.
A graphic connecting similar novels together.
50 most common words for each of the 50 topics.
A graphic connecting similar novels together.
50 most common words for each of the 20 topics.
A graphic connecting similar novels together.
50 most common words for each of the 50 topics.
A graphic connecting similar novels together.
Results from modeling the endings of the novels, and from modeling 500 topics (the same number used for the endings) across the full set of novels and across the top 30.
Let's try getting the spreadsheet and the most common topic words from the regular MALLET interface.
I prepared network diagrams in two ways. The first three images were drawn in the same way as the images for the first set of results, above: if the distance between two novels was less than the average distance minus two standard deviations, I connected them. For the purpose of comparing the results of the top 30 novels with the first set of results, these first three images are comparable. Note that there aren't many connections between novels.
The next three images were drawn using a relaxed definition of "close": if the distance between two novels was less than the average distance minus one standard deviation, I connected them. The resulting images show more connections between novels; however, they may not be useful for comparison with the first set of results, since the definition of "close" is different.
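The thresholding scheme described above can be sketched roughly as follows. The novel names, topic proportions, and the Euclidean distance measure are all stand-ins for illustration; in the real runs the distances were computed from MALLET's doc-topics output.

```python
import itertools
import statistics

# Hypothetical topic-proportion vectors per novel (invented for illustration).
novels = {
    "novel_a": [0.6, 0.3, 0.1],
    "novel_b": [0.5, 0.4, 0.1],
    "novel_c": [0.1, 0.2, 0.7],
}

def distance(p, q):
    """Euclidean distance between two topic-proportion vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def close_pairs(novels, k):
    """Connect two novels if their distance falls below the mean pairwise
    distance minus k standard deviations (k=2 for the strict diagrams,
    k=1 for the relaxed ones)."""
    pairs = list(itertools.combinations(novels, 2))
    dists = {pair: distance(novels[pair[0]], novels[pair[1]]) for pair in pairs}
    mean = statistics.mean(dists.values())
    sd = statistics.stdev(dists.values())
    threshold = mean - k * sd
    return [pair for pair, d in dists.items() if d < threshold]
```

With this toy data, the relaxed threshold (k=1) connects the two similar novels, while the strict one (k=2) connects nothing, which mirrors the sparse strict diagrams described above.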
And here are links to the pages for the novel-by-novel viewers:
This link leads to a discussion of things Steve did in late October and early November to see if we couldn't extract additional meaning from Mallet's outputs. The examples are rough, quickly made prototypes intended to solicit ideas for refining visualizations, etc.
The presentation notes for Nov 4, 2011.
The "german-ish-ness" notes from Nov 9, 2011.
I tried listing words that occur in only one topic. No luck: the lists seem to pick up every last OCR error in our texts. There may well be good, meaningful words in these lists, but they are hard to spot because they're surrounded by gibberish.