Input data
TCP data:
- 25,xxx texts now
- 4,000 more "soon"
- 5,000 more somewhere further up the pipeline
- If EEBO has 125,000 texts, the ~34,000 texts above would be more than a quarter of them, and we're on a path to having more than 30%.
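A quick check of the coverage arithmetic, as a sketch. It takes "25,xxx" at its lower bound of 25,000 and uses the hypothetical EEBO total of 125,000 from the bullet above. On these figures alone, the texts in hand or in the pipeline come to roughly 27-28% of EEBO, so passing 30% would need further additions beyond the three batches listed.

```python
# Projected TCP coverage of EEBO, using the rough figures from the slide.
# Assumption: "25,xxx" is taken at its lower bound, 25,000.
tcp_now = 25_000
tcp_soon = 4_000
tcp_later = 5_000      # "somewhere further up the pipeline"
eebo_total = 125_000   # the slide's "if EEBO has 125,000 texts"


def coverage(tcp_texts: int, eebo_texts: int) -> float:
    """Return the share of EEBO texts covered, as a percentage."""
    return 100 * tcp_texts / eebo_texts


projected = tcp_now + tcp_soon + tcp_later
print(f"{projected:,} texts = {coverage(projected, eebo_total):.1f}% of EEBO")
```

With the upper bound of 25,999 texts now, the same calculation gives about 28.0%.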
Example markup
Encomia-related counts:
- 665 (of 25,xxx) texts have encomia.
- 2,335 encomia
- ~2,400 names
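Two ratios follow directly from these counts. This sketch computes them, again taking "25,xxx" as 25,000. Because that is a lower bound on the corpus size, the resulting share of texts with encomia is an upper bound.

```python
# Derived ratios from the encomia counts on the slide.
texts_with_encomia = 665
encomia = 2_335
tcp_texts = 25_000  # assumption: lower bound for "25,xxx"

per_book = encomia / texts_with_encomia
share = 100 * texts_with_encomia / tcp_texts

print(f"{per_book:.1f} encomia per book that has any")
print(f"{share:.1f}% of TCP texts have encomia (at most)")
```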
Problems with the TCP data:
- Encomia signatures are not always marked (e.g., in Latin encomia).
- Encomia names are not standardized.
- I don't understand the TCP's attribution of authorship (the "Thomas Cecil" problem).
Conversion process
Workflow:
- Automated SGML-to-XML conversion (not to full TEI).
- Automated extraction of encomia and loading of our encomia database.
- Automated merging of TCP metadata (dates, authors and titles) into our encomia database.
- Limited automatic correction/spelling standardization of encomia signatures.
- Lots and lots of manual review and correction/spelling standardization.
- Extract data from the database, then build the visualization.
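The extraction and loading steps might look roughly like the sketch below. It is not the project's actual code: the XML fragment, the element and attribute names (`div type="encomium"`, `signed`), the text ID, the signature, and the one-table schema are all hypothetical, chosen only to show the shape of the step. Unsigned encomia are stored with a `NULL` signature so they can be found later during manual review.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical converted-XML fragment; element names, ID and signature are invented.
SAMPLE = """<text id="example-001">
  <div type="encomium"><l>In praise of the author ...</l><signed>Io. Smith.</signed></div>
  <div type="encomium"><l>Another commendatory poem ...</l></div>
  <div type="chapter"><p>Body of the book ...</p></div>
</text>"""


def extract_encomia(xml_text):
    """Yield (text_id, signature) per encomium; signature is None if unsigned."""
    root = ET.fromstring(xml_text)
    text_id = root.get("id")
    for div in root.iter("div"):
        if div.get("type") == "encomium":
            signed = div.find("signed")
            sig = signed.text.strip() if signed is not None and signed.text else None
            yield text_id, sig


def load(conn, rows):
    """Insert extracted rows into a minimal (hypothetical) encomia table."""
    conn.execute("CREATE TABLE IF NOT EXISTS encomia (text_id TEXT, signature TEXT)")
    conn.executemany("INSERT INTO encomia VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, extract_encomia(SAMPLE))
for row in conn.execute("SELECT text_id, signature FROM encomia ORDER BY rowid"):
    print(row)
```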
The underlying data
Problems:
- Some character-related issues (long s's) in the SGML-to-XML conversion.
- Some slop in the correction/spelling standardization of encomia signatures.
- Problems with encomia signed only by initials, unsigned, or signed in a non-typical (Latin) way.
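The limited automatic correction mentioned in the workflow, and the reason some slop remains, can be illustrated with a small sketch. It assumes the long s survives conversion as the Unicode character U+017F and that standardization is a lookup in a hand-built variant table. The table entries and the name "John West" are invented examples. Anything the table does not match, such as bare initials, returns `None` and is left for manual review.

```python
# Hypothetical variant table mapping cleaned spellings to one standard form.
# Real entries would come from manual review; these are invented examples.
VARIANTS = {
    "io. west": "John West",
    "iohn west": "John West",
    "john weste": "John West",
}


def clean_signature(raw):
    """Normalize a raw encomium signature; return None if it cannot be matched."""
    text = raw.replace("\u017f", "s")          # long s -> s
    text = " ".join(text.split())              # collapse runs of whitespace
    text = text.rstrip(".").lower()            # drop trailing period, ignore case
    return VARIANTS.get(text)


print(clean_signature("Io. We\u017ft."))
print(clean_signature("R. B."))
```

A lookup table keeps the automatic step conservative: it only changes signatures someone has already confirmed, which is why most of the correction still has to happen by hand.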
Visualization
- The relationships are complicated and tangled, so the visualization is too.
- The tool is the best way we have found to visualize these relationships.
- We're at the edge of what is possible in terms of interactivity in the browser.
- The circle sizes are suggestive (bigger means more), but not mathematically precise.
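If exact sizing were wanted later, one common convention is to make each circle's area, rather than its radius, proportional to the count it represents, so the radius grows with the square root of the count. This is a sketch of that convention only, not how the current tool sizes circles. The function name and the `area_per_unit` scale factor are hypothetical.

```python
import math


def radius_for(count: int, area_per_unit: float = 100.0) -> float:
    """Radius whose circle area equals area_per_unit * count (area-proportional sizing)."""
    return math.sqrt(area_per_unit * count / math.pi)


for n in (1, 2, 4):
    print(n, round(radius_for(n), 2))
```

Under this rule, a count four times larger gets a radius twice as large, which keeps big nodes from visually overwhelming the graph.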
The visualization tool.
Needed enhancements:
- Google Maps-style zoom and pan.
- Better/sharper colors.
- Don't display people with only one connection (i.e., simplify the graph).
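Hiding people with only one connection amounts to a degree filter on the graph. This is a minimal sketch, assuming the relationships are available as a list of (person, person) pairs; the function name and the sample people are hypothetical. It counts distinct neighbors, so repeated links between the same two people count once, and it makes a single pass.

```python
from collections import defaultdict


def prune_single_connection(edges):
    """Drop every edge touching a person with exactly one distinct connection."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    keep = {person for person, ns in neighbors.items() if len(ns) > 1}
    return [(a, b) for a, b in edges if a in keep and b in keep]


edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
print(prune_single_connection(edges))
```

One pass can leave behind new people with a single connection. Repeating the filter until nothing changes would instead give the graph's 2-core, a stricter simplification.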
Next?
- Clean up some issues in the programming (visualization enhancements, conversion issues, database design).
- Another brief round of correction/spelling standardization.
- Explore the data for other kinds of possibilities (types of signatures, book subject matter, etc.).
- Staff a summer 2011 project. Focus on fixing encomia signatures, on evaluating the data against paper bibliographies, and on using the data to make preliminary arguments.