2. Discussion Points
1. Background
2. Overview of prototype biographical resource and
access system
3. Future directions and open discussion
3. Context
• Research and demonstration project
• Sponsored by NEH
• Grant term: March 2010 – March 2012
• Three partner organizations:
– Institute for Advanced Technology in the Humanities, University of
Virginia
– School of Information, UC Berkeley
– California Digital Library
4. Goals
• Develop tools for extracting EAC-CPF records,
drawing on existing data (EAD finding aids/
collection guides)
• Build a large test corpus of EAC-CPF records
• Create a prototype biographical resource and
access system, using those records
5. What is EAC-CPF?
• Encoded Archival Context – Corporate Bodies,
Persons, and Families
• Standard for encoding archival authority records:
– Authorized name headings for the entity
– Biographical/historical context for the entity
– Links to resources created by the entity, and about the entity
• Collections (represented by EAD finding aids)
• Bibliographic resources, etc.
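To make the three parts of an authority record concrete, here is a minimal sketch that builds a skeletal EAC-CPF record with Python's standard library. The element names come from the EAC-CPF schema, but the record is deliberately incomplete (the control section is left empty, so it would not validate), the link is placed in relationEntry as a simplification, and the name, biography, and URL are hypothetical.

```python
import xml.etree.ElementTree as ET

EAC_NS = "urn:isbn:1-931666-33-4"  # EAC-CPF namespace
ET.register_namespace("", EAC_NS)


def q(tag):
    """Qualify a tag name with the EAC-CPF namespace."""
    return f"{{{EAC_NS}}}{tag}"


def build_record(entity_type, name, bio_text, finding_aid_url):
    root = ET.Element(q("eac-cpf"))
    ET.SubElement(root, q("control"))  # record metadata omitted in this sketch
    desc = ET.SubElement(root, q("cpfDescription"))

    # 1. Authorized name heading for the entity
    identity = ET.SubElement(desc, q("identity"))
    ET.SubElement(identity, q("entityType")).text = entity_type
    entry = ET.SubElement(identity, q("nameEntry"))
    ET.SubElement(entry, q("part")).text = name

    # 2. Biographical/historical context
    description = ET.SubElement(desc, q("description"))
    bio = ET.SubElement(description, q("biogHist"))
    ET.SubElement(bio, q("p")).text = bio_text

    # 3. Link to a resource created by the entity (e.g. an EAD finding aid)
    relations = ET.SubElement(desc, q("relations"))
    rel = ET.SubElement(relations, q("resourceRelation"))
    rel.set("resourceRelationType", "creatorOf")
    ET.SubElement(rel, q("relationEntry")).text = finding_aid_url
    return root


# Hypothetical values, for illustration only
record = build_record(
    "person",
    "Doe, Jane, 1900-1980",
    "Hypothetical diplomat.",
    "http://example.org/findaid/doe",
)
xml_text = ET.tostring(record, encoding="unicode")
```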
12. Data Inputs
• EAD Finding Aids
– Online Archive of California [~14,000]
– Northwest Digital Archives (NWDA) [~5,200]
– Library of Congress [~900]
– Virginia Heritage [~8,300]
• Authority Records
– Library of Congress: NACO/LCNAF [~4+ million]
– Getty Vocabulary Program: Union List of Artist Names (ULAN) [~290,000]
– OCLC Research: Virtual International Authority File (VIAF) [intersection with
NACO/LCNAF]
14. Data Flow
• Extract names from EAD finding aids
– Creator names (<...name>) with biographical/organizational histories (<bioghist>)
– Names as subjects (<controlaccess>)
– Names in correspondence series
• Normalize and convert into EAC-CPF; retain link back to EAD(s)
• Match EAC-CPF records against one another and against existing
authority records (ULAN, VIAF, LCNAF)
– Enhance EAC-CPF by normalizing entries, adding alternative entries, titles,
languages used, and sex (VIAF), and historical data (ULAN)
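The extract, normalize, and match steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual tooling: the sample EAD is un-namespaced and heavily simplified, the correspondence-series step is left out, and the exact-key lookup against a one-entry hypothetical authority table stands in for whatever matching the project really used.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD fragment
SAMPLE_EAD = """
<ead>
  <archdesc>
    <did>
      <origination><persname>Doe, Jane, 1900-1980</persname></origination>
    </did>
    <bioghist><p>Jane Doe was a hypothetical diplomat.</p></bioghist>
    <controlaccess>
      <persname>Roe, Richard</persname>
      <corpname>Example Trading Company</corpname>
    </controlaccess>
  </archdesc>
</ead>
"""

NAME_TAGS = ("persname", "corpname", "famname")


def extract_names(ead_xml):
    """Collect creator names (origination) and subject names (controlaccess)."""
    root = ET.fromstring(ead_xml)
    found = []
    for role, path in (("creator", ".//origination"), ("subject", ".//controlaccess")):
        for parent in root.iterfind(path):
            for tag in NAME_TAGS:
                for el in parent.iterfind(tag):
                    name = "".join(el.itertext()).strip()
                    found.append({"role": role, "type": tag, "name": name})
    return found


def match_key(name):
    """Normalize a name for comparison: strip accents, case, and punctuation."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


# Hypothetical authority table keyed by normalized name
AUTHORITY = {match_key("Doe, Jane, 1900-1980"): "hypothetical-authority-id-1"}

names = extract_names(SAMPLE_EAD)
matches = {n["name"]: AUTHORITY.get(match_key(n["name"])) for n in names}
```

Names with no entry in the table map to None, which is where a person doing QA on record matching would step in.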
16. Meet the target users
Randy: Graduate student working on a PhD that involves biographies and the study of diplomatic families and
networks. Sometimes he comes to the site looking for information on specific people; other times he is looking for
information on a specific subject or event. He also TAs an undergraduate history class and sometimes has to help
students find topics for papers.
Connie: Works at an institution that contributed records to the project. Will be asking how this site would be
useful to the institution's users. Wants to understand how its records were used and what value was added.
Quincy: Library school student doing QA on record matching.
Adele: Person doing authority work during collection processing.
Lenny: Likes linked data and wants to be able to programmatically mine the links that have been established.
24. EAC’s Implicit Information Architecture
Expose Schema’s terminology in user interface
Metadata Fields / used mostly for facets
XTF Section Types / based on hierarchy of EAC
26. XTF XSLT Framework
pre filter - applies special tokenization to create custom EAC facets
query parser - converts CGI params to XTF query XML
result formatter - converts XTF results to HTML
doc formatter - converts EAC-CPF to HTML
http://code.google.com/p/xtf-cpf/
28. Social graph visualization
Code at https://code.google.com/p/eac-graph-load/
Live prototype: simple JSON access to a TinkerPop graph on the back end, with
JavaScript on the front end [graph demo link in prototype]
GraphML file with an open license; should be viewable in other tools
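As an illustration of why a GraphML export is portable across tools, here is a minimal sketch that writes a small social graph as GraphML using only the standard library. The node IDs, names, and edges are hypothetical, and this is not the eac-graph-load code; it only shows the general shape of the format.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
ET.register_namespace("", GRAPHML_NS)


def g(tag):
    """Qualify a tag name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{tag}"


def to_graphml(nodes, edges):
    """nodes: {id: display name}; edges: iterable of (source id, target id)."""
    root = ET.Element(g("graphml"))
    # Declare a string attribute called "name" that nodes can carry
    ET.SubElement(root, g("key"), {
        "id": "name", "for": "node", "attr.name": "name", "attr.type": "string",
    })
    graph = ET.SubElement(root, g("graph"), {"id": "G", "edgedefault": "undirected"})
    for node_id, name in nodes.items():
        node = ET.SubElement(graph, g("node"), {"id": node_id})
        ET.SubElement(node, g("data"), {"key": "name"}).text = name
    for source, target in edges:
        ET.SubElement(graph, g("edge"), {"source": source, "target": target})
    return ET.tostring(root, encoding="unicode")


# Hypothetical entities and "associated with" relations
nodes = {"n1": "Doe, Jane", "n2": "Roe, Richard", "n3": "Example Trading Company"}
edges = [("n1", "n2"), ("n1", "n3")]
graphml_text = to_graphml(nodes, edges)
```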
29. Linked Data / Open Data
RDFa owl:sameAs links to VIAF
httpRange-14 (XTF URL + “#entity” names the entity itself rather than the web page about it: “the car” in the usual httpRange-14 example)
HTML5 microdata chronology
Future: RDF dump with an open data license
based on Ed Summers's GraphML-to-RDF Python script
links to Wikipedia and other sources
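The “#entity” convention and the owl:sameAs link can be sketched as below. The prototype embeds these links as RDFa in its HTML pages; this sketch swaps in N-Triples output only because it is the simplest serialization to show. The record URL and the VIAF identifier are hypothetical placeholders.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def entity_uri(record_url):
    """URI for the real-world entity: the page URL plus a '#entity' fragment."""
    return record_url.split("#", 1)[0] + "#entity"


def same_as_triple(record_url, viaf_uri):
    """One N-Triples line asserting the entity is the same as the VIAF entity."""
    return f"<{entity_uri(record_url)}> <{OWL_SAME_AS}> <{viaf_uri}> ."


# Hypothetical record URL and placeholder VIAF identifier
triple = same_as_triple(
    "http://example.org/xtf/view?docId=doe-jane.xml",
    "http://viaf.org/viaf/0000000",
)
```

Because the fragment is never sent to the server, the page URL and the entity URI resolve to the same document while remaining distinct identifiers.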
31. Future Directions?
• From research and demonstration to longer-term resource?
• Integration of merged data back into EAD access systems?
• Distributed cooperative archival authority control that is crowd-sourced by researchers and
curated by archivists?
• Scale up EAD data sources?
• More links to external resources (Wikipedia, WorldCat Identities, OpenURLs)?
• Social network visualizations/interactive navigation?
• Unique identifiers for EAC-CPF records (ORCID, ISNI, ARK)?
• Standardized name entries for source repositories contributing EAC-CPF records?