The results from topic modeling with the two additional OMS translations are here.
Results are here from modeling the endings of the novels, and from modeling 500 topics (the same number used for the endings) across the full set of novels and across the top 30.
50 most common words for each of the 20 topics.
A graphic connecting similar novels together.
50 most common words for each of the 50 topics.
A graphic connecting similar novels together.
50 most common words for each of the 20 topics.
A graphic connecting similar novels together.
50 most common words for each of the 50 topics.
A graphic connecting similar novels together.
Results from modeling the endings of the novels, and from modeling 500 topics (the same number used for the endings) across the full set of novels and across the top 30.
Let's try getting the spreadsheet and the most common topic words from the regular MALLET interface.
I prepared network diagrams in two ways. The first three images were drawn in the same way as the images for the first set of results, above: if the distance between two novels was less than the average distance minus two standard deviations, I connected them. For the purpose of comparing the results of the top 30 novels with the first set of results, these first three images are comparable. Note that there aren't many connections between novels.
The next three images were drawn using a relaxed definition of "close": if the distance between two novels was less than the average distance minus one standard deviation, I connected them. The resulting images show more connections between novels; however, they may not be useful for comparison with the first set of results, since the definition of "close" is different.
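The thresholding scheme described above can be sketched roughly as follows. The novel names, topic proportions, and the Euclidean distance measure are all stand-ins for illustration; in the real runs the distances were computed from MALLET's doc-topics output.

```python
import itertools
import statistics

# Hypothetical topic-proportion vectors per novel (invented for illustration).
novels = {
    "novel_a": [0.6, 0.3, 0.1],
    "novel_b": [0.5, 0.4, 0.1],
    "novel_c": [0.1, 0.2, 0.7],
}

def distance(p, q):
    """Euclidean distance between two topic-proportion vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def close_pairs(novels, k):
    """Connect two novels if their distance falls below the mean pairwise
    distance minus k standard deviations (k=2 for the strict diagrams,
    k=1 for the relaxed ones)."""
    pairs = list(itertools.combinations(novels, 2))
    dists = {pair: distance(novels[pair[0]], novels[pair[1]]) for pair in pairs}
    mean = statistics.mean(dists.values())
    sd = statistics.stdev(dists.values())
    threshold = mean - k * sd
    return [pair for pair, d in dists.items() if d < threshold]
```

With this toy data, the relaxed threshold (k=1) connects the two similar novels, while the strict one (k=2) connects nothing, which mirrors the sparse strict diagrams described above.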
And here are links to the pages for the novel-by-novel viewers:
This link leads to a discussion of things Steve did in late October and early November to see if we couldn't extract additional meaning from Mallet's outputs. The examples are rough, quickly made prototypes intended to solicit ideas for refining visualizations, etc.
The presentation notes for Nov 4, 2011.
The "german-ish-ness" notes from Nov 9, 2011.
I tried listing words that occur in only one topic. No luck: the lists seem to pick up every last OCR error in our texts. There may well be good, meaningful words in these lists, but they are hard to spot because they're surrounded by gibberish.