This presentation was provided by Ashley Clark, Northeastern University, during a NISO Virtual Conference on the topic of data curation, held on Wednesday, August 31, 2016.
1. The Metadata is the Message
Assessing, Curating and Publishing Data
for the Humanities
Ashley M. Clark
Digital Scholarship Group
Northeastern University Libraries
2. Cultures of Reception
● Initiative to transcribe and publish texts responding to early modern works by
women
● Collected documents include:
○ Reviews (theatrical and literary)
○ Extracts
○ Essays
○ Biographies
● As part of the transcription process, encoders:
○ Note the original source of the document;
○ Identify the work(s) by women mentioned or reviewed;
○ Identify the women creators mentioned or reviewed;
○ Tag the document with relevant themes, formats, and genres; and
○ Classify the reception of the main work on a positive-negative scale.
3. Background
● Cultures of Reception was funded by the National Endowment for the
Humanities
● The Women Writers Project (WWP) began work on this initiative in 2010 at
Brown University
● In 2013, the WWP moved to Northeastern University Libraries' Digital
Scholarship Group (DSG)
● Transcription for Cultures of Reception continued at Northeastern, with an
entirely new group of encoders
4. The State of the Data
● Transcription records were created by encoders in a web interface
● Records stored in CouchDB as JSON objects
● Pain points for WWP:
○ Transcription accomplished with the help of buttons to insert basic XML tags, but:
■ no pretty printing,
■ no well-formedness checks
○ CouchDB requires JavaScript knowledge for querying
○ Inconsistent names and titles, for example:
■ "Horace Juvenal" is the same person as
■ "Mary Darby Robinson" is the same person as
■ "Mary Robinson"
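The missing well-formedness check noted above could be approximated with a few lines of Python. This is only a sketch, not the WWP's tooling: the function name `is_well_formed` and the `<wrapper>` element are assumptions made for illustration, and the check relies solely on the standard library's XML parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the transcription fragment parses as XML.

    The fragment is wrapped in a hypothetical <wrapper> element so that
    text with several top-level tags (or none) can still be parsed.
    """
    try:
        ET.fromstring(f"<wrapper>{fragment}</wrapper>")
        return True
    except ET.ParseError:
        # Raised for unclosed tags, mismatched tags, bare ampersands, etc.
        return False
```

A check like this only confirms that tags are balanced and properly nested. It says nothing about whether the encoding is valid TEI.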
7. An Example from the Corpus
1. The Women Writers Project's transcription of
2. "Art. VII. The Wild Irish Girl; a National Tale. By Miss Owenson..." in
volume ns 57 of The Monthly Review; or Literary Journal, which
reviews
3. The London edition of Lady Morgan's The Wild Irish Girl, published
in 1806 by Sir Richard Phillips.
8. Publication Challenges
● 690 transcriptions of "reviews"
● Users are unlikely to want to find any one particular review
● Users are very likely to want to explore reviews by
○ Reviewed or mentioned author,
○ Reviewed or mentioned work,
○ Source book or periodical,
○ Publication location,
○ Tags (e.g. theme)
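Exploring by facet amounts to building an index from each facet value to the reviews that carry it. The sketch below assumes a hypothetical record shape (a dict with an `id` and list-valued fields such as `authors` or `tags`). It is not the project's actual data model or API.

```python
from collections import defaultdict


def build_facet_index(records, facet):
    """Map each value of `facet` to the ids of the records that carry it.

    `records` is assumed to be a list of dicts with an "id" key and
    list-valued facet fields; records lacking the facet are skipped.
    """
    index = defaultdict(list)
    for record in records:
        for value in record.get(facet, []):
            index[value].append(record["id"])
    return dict(index)


# Illustrative records only; these are not taken from the corpus.
sample_records = [
    {"id": "r1", "authors": ["Lady Morgan"], "tags": ["national tale"]},
    {"id": "r2", "authors": ["Lady Morgan", "Mary Robinson"], "tags": []},
]
authors_index = build_facet_index(sample_records, "authors")
```

The same function would serve any of the facets listed above, provided each record stores that facet as a list of values.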
9. Data cleanup
● For longevity within the WWP:
○ Exported records into TEI-encoded XML;
○ Placed records under version control;
○ Created descriptive filenames for reference and display
● For findability:
○ Created canonical metadata entries for authors, works, and sources;
○ Ensured transcription records included identifier references to the canonical entries of subjects of
interest
● For readability:
○ Created descriptive transcription record names
○ Created shorthand titles of works and sources
○ Minimally tidied TEI encoding
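The findability step (canonical entries plus identifier references) can be pictured as a lookup from every known name variant to a single identifier. The sketch below is a guess at the general shape, not the WWP's implementation: the identifier `robinson.mary`, the dictionary layout, and the function names are all hypothetical. The name variants are the ones cited earlier as an example of inconsistency.

```python
# Hypothetical canonical entry; the identifier format is an assumption.
CANONICAL_AUTHORS = {
    "robinson.mary": {
        "name": "Mary Robinson",
        "variants": ["Mary Darby Robinson", "Horace Juvenal"],
    },
}


def build_lookup(canonical):
    """Map every normalized name or variant to its canonical identifier."""
    lookup = {}
    for ident, entry in canonical.items():
        for name in [entry["name"], *entry["variants"]]:
            lookup[name.strip().casefold()] = ident
    return lookup


def resolve(name, lookup):
    """Return the canonical identifier for `name`, or None if unknown."""
    return lookup.get(name.strip().casefold())


AUTHOR_LOOKUP = build_lookup(CANONICAL_AUTHORS)
```

With a table like this, a transcription record can store the identifier rather than whichever spelling the encoder typed, so all three forms of the name lead to the same entry.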
12. Women Writers in Review
● The web publication for the Cultures of Reception corpus
● Emphasis on discovery and exploration through links and faceting
● Powered by the same API available to researchers for data access
● Future plans:
○ Incorporate visualizations
○ Highlight temporal and geographic shifts
○ Clean up XML encoding for share-ability