Reading Level

Our market basket analysis suggested that what people read depended in part on their age. For example, young boys were much more likely than older men to read Alger. Since we have so many texts from Project Gutenberg, it seemed a simple matter to check whether the texts themselves reflect what we see in reader behavior. We used an open-source package to calculate reading grade levels (see below), and the results generally bear out what we see from reader behavior.
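To give a sense of what a reading grade level calculation involves, here is a minimal sketch in Python of the widely used Flesch-Kincaid grade level formula. This is an illustration only: we do not claim it is the formula or the package behind our scores, and the function names (count_syllables, flesch_kincaid_grade) and the vowel-group syllable heuristic are assumptions made for this example.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count runs of vowels and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Treat runs of ., ! or ? as sentence boundaries (a simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Longer sentences and longer words both push the score up, which is why a score for a whole book is an average that can hide difficult passages. The syllable heuristic here is crude, so scores from this sketch will differ somewhat from those of an established package.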

Note, however, that these scores cover only the most popular books. The borrowing history of individual readers often reveals a more complex story. It is not unusual, for example, to find a reader who read a lot of popular fiction and who also read government-issued materials on current affairs.

To see this, go to the page for querying the database by author, select “United States Philippine commission” or “United States Congress House Select Committee to Investigate Hazing at Military Academy” in the author drop-down, submit, and then follow the borrower links. You should have no trouble finding readers who read both government publications and lots of popular fiction.

And even the simplest of books isn't uniformly simple. Take, for example, how Horatio Alger introduces the title character in Sam's Chance, supposedly the simplest book in our sample corpus:

He was not a model boy, as those who have read his early history, in "The Young Outlaw," are aware; but, on the other hand, he was not extremely bad. He liked fun, even if it involved mischief; and he could not be called strictly truthful nor honest. But he would not wantonly injure or tyrannize over a smaller boy, and there was nothing mean or malicious about him.

The reading grade levels for our sample of the most popular books in the Muncie Public Library are available here.