This presentation discusses the background of the Encoded Archival Context standard (EAC-CPF) and its potential to enhance collaboration amongst archival institutions. The speakers focus on an early implementation of EAC-CPF at East Carolina University, but they also discuss other local efforts such as the groundbreaking NC-BHIO project.
Approaching Authority: A Preliminary Implementation of Encoded Archival Context (EAC-CPF) at East Carolina University
1. Approaching Authority: A Preliminary Implementation of Encoded Archival Context (EAC-CPF) at East Carolina University, by Mark Custer and Jennifer Joyner. 2011-03-31. Society of North Carolina Archivists / South Carolina Archival Association 2011 Joint Meeting.
39. Questions?
40. Bibliography
Brown, J. F. (2006). More than just a name: Archival authority control, creator description, and the development of Encoded Archival Context (EAC) (Master's thesis).
Burrows, T. (2007). Identity parade: Building web portals about people. OCLC Systems & Services: International Digital Library Perspectives, 23(4), 329-331. doi:10.1108/10650750710831448
Evans, M. (1986). Authority control: An alternative to the record group concept. The American Archivist, 49(3), 249-261. doi:10.2307/j50000557
Hymas, P. (2007). NCBHIO [PowerPoint slides]. Society of American Archivists Description Section Meeting. Retrieved from http://www.archivists.org/saagroups/descr/EACNCBHIO.ppt
McKim, J. (2002). North Carolina Encoded Archival Description Project. North Carolina Archivist, Winter 2002, 8-11. Retrieved from http://rtpnet.org/snca/newsletter/volume_66.pdf
41. Bibliography (cont.)
Pitti, D. (2004). Creator description: Encoded Archival Context. Cataloging & Classification Quarterly, 38(3), 201-226. doi:10.1300/J104v38n03_16
Sweeney, S. (2008). The ambiguous origins of the archival principle of “Provenance.” Libraries & the Cultural Record, 43(2), 193-213. doi:10.1353/lac.0.0017
Thurman, A. C. (2005). Metadata standards for archival control: An introduction to EAD and EAC. Cataloging & Classification Quarterly, 40(3-4), 183-212. doi:10.1300/J104v40n03_09
Veve, M. (2009). Supporting name authority control in XML metadata: A practical approach at the University of Tennessee. Library Resources & Technical Services, 53(1), 41-52.
Weimer, L. (2007). Pathways to provenance: "DACS" and creator descriptions. Journal of Archival Organization, 5(12), 33-48.
Whittaker, B. M. (2007). DACS and RDA: Insights and questions from the new archival descriptive standard. Library Resources & Technical Services, 51(2), 98.
Editor's Notes
An EAC-CPF record differs from a library authority record in that it links collections concerning the same creator/entity and provides context for the archival materials. Separate from finding aids: independent resources for researchers and repositories.
- McKim, in his article, uses the Terry Sanford papers as his justification for the creation of the NCEAD project.
- Terry Sanford, as a state senator, governor of North Carolina, president of Duke, and a US Senator, has active collections at many institutions. This makes it difficult for researchers. They have to search multiple places, as well as visit multiple repositories.
- The argument for NCEAD was that it would be a virtual repository for finding aids: a centralized resource for receiving information about dispersed papers.
- The same argument can be made for EAC. If implemented on a statewide (or larger) level, the location of collections would be more apparent. It would be the ultimate resource for researchers and other repositories.
New process:
- Review finding aid.
- Decide authorized form for creator name. Check the LCNAF and our library catalog. If the name does not exist, we create an authorized form according to AACR2. Cataloging staff have had NACO training, but we do not currently submit all of these names for approval.
- Assign new LC subject headings.
- Enter creator information and subject headings in database using web form.
- Regenerate HTML and index (updates EAD).
- Upgrade the MARC record and submit to OCLC.
- Overlay upgraded record in Symphony.
- Update EAD file on server.
This is a time-consuming process that leaves many collections uncataloged:
- We typically do not update uncataloged collections with the authorized form of name until the finding aid is cataloged.
- The cataloger must research to confirm it is indeed the same creator.
- Research requires communication with Special Collections staff members and can be time-consuming.
For EAC to be fully implemented at ECU, we would need to examine the process and make adjustments so that all collections could receive some cataloging treatment (at least authorized names) before an EAC record is created.
The Social Networks and Archival Context (SNAC) Project is processing EAD records from LoC, OAC, and the NWDA. Despite the nearly 124k de-duplicated names present in the current database, if you search for “Frances Renfrow Doak” here, you won’t find any records. This was one of the primary impetuses behind our desire to work on this preliminary project.
We imported a lot of data into Google Refine, but for the purposes of this presentation, we will focus on our “normal name” column. Immediately you can see that we have some names listed here, like “Alex Albright”, which are in need of being grouped into a single record. In this particular case, that is very easy to do since those strings are exact matches. One of the ways that Google Refine really shines, though, is in its ability to discover a variety of inexact matches with its “cluster and edit” feature.
Five well-established and advanced string comparison techniques (fingerprint, metaphone, and PPM proved most useful for our data).
The fingerprint algorithm is almost always going to produce meaningful results, especially if you don’t have many names with special character encoding issues. Most importantly, it will highlight things that you won’t discover if you’re just attempting to match exact strings, or even exact strings after the values have been converted to lower case.
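To make the idea concrete, here is a minimal Python sketch of a fingerprint-style key. It only approximates what the clustering feature does (the exact steps and their order inside Google Refine are an assumption here), and the function names `fingerprint` and `cluster` are ours, not Refine's.

```python
import string
import unicodedata


def fingerprint(value: str) -> str:
    """Build an approximate fingerprint key for a name string."""
    # Trim surrounding whitespace and lower-case the string.
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII equivalents.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation so inverted and direct forms can meet.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split into tokens, de-duplicate, sort, and rejoin.
    return " ".join(sorted(set(text.split())))


def cluster(names):
    """Group names that share a key; keep groups with differing spellings."""
    groups = {}
    for name in names:
        groups.setdefault(fingerprint(name), []).append(name)
    return {key: vals for key, vals in groups.items() if len(set(vals)) > 1}


if __name__ == "__main__":
    # Illustrative input only; not taken from our actual name column.
    sample = ["Alex Albright", "Albright, Alex", "Frances Renfrow Doak"]
    print(cluster(sample))
```

Because the tokens are sorted, an inverted form such as “Albright, Alex” and a direct form such as “Alex Albright” end up with the same key, which an exact or lower-cased comparison would miss.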
The default settings in Refine turned out to be pretty good. I did attempt to use the PPM method with its most precise options and this took an extremely long time (and didn’t yield any useful results that we hadn’t already uncovered). Since we had 1842 strings to analyze, for instance, reducing “block chars” down to 1 meant that our computer had to execute nearly 1.7 million different computations (and it just didn’t seem to matter for strings this short). However, in the case of our data, increasing the radius to 1.8 uncovered every permutation of inaccuracies in our names that we needed to ferret out (so, I would suggest experimenting with that variable if using the PPM method on a similar dataset). The Levenshtein function, however, just doesn’t make much sense when doing name comparisons (especially when some of those authorized names will have years associated with them [i.e., a lot of extra characters], and others won’t).
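Two of the points above can be checked with a short Python sketch. The 1.7 million figure is consistent with comparing every pair among 1842 strings, though that is our assumption about where the number comes from. The Levenshtein function below is a textbook implementation, not Refine's own code, and the name strings are illustrative.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n strings: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # All-pairs comparisons for 1842 strings.
    print(pair_count(1842))
    # Same person, but one form carries life dates (illustrative strings).
    print(levenshtein("Sanford, Terry", "Sanford, Terry, 1917-1998"))
    # Different given names, yet only one character apart.
    print(levenshtein("Sanford, Terry", "Sanford, Jerry"))
```

The appended dates alone cost eleven edits, while two different given names can sit a single edit apart, so a small edit-distance radius separates the records that belong together and groups ones that may not.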
Another way that Google Refine helps in a project such as this is in its ability to add to your data without having to do any extra scripting outside of the software package.
Querying the VIAF database with “names”, since we haven’t yet recorded any LCCNs.
Adding those unique IDs into our EAC records. This is our record for Frances Renfrow Doak.
This is the HTML view of the Frances Renfrow Doak EAC record. If we implement EAC into our current EAD database at ECU, we would write our own stylesheet for display (especially to take advantage of some of the extra data that we’ve added to our initial records). For the purposes of this trial, however, I have used a slightly modified XSLT stylesheet that Brian Tingle has created for the SNAC project (and graciously shared online at https://bitbucket.org/btingle/cpf2html).
The “related identities” listed here, for instance, could be added as extra “cpfRelations” to this particular EAC record.
Right now, however, the only cpfRelations that have been added to this record are those names that you previously saw listed under “Autograph entries” in the online view of this EAD collection. And so, that’s where those 45 people listed here have come from.
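As a rough illustration of how a list of names could be turned into cpfRelation elements, here is a Python sketch using the standard library. It is not the process we actually used. The element and attribute names follow our reading of the 2010 EAC-CPF schema, the choice of “associative” as the relation type is an assumption, and the function name, the sample name, and the example.org link are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces as we understand them from the 2010 EAC-CPF schema.
EAC_NS = "urn:isbn:1-931666-33-4"
XLINK_NS = "http://www.w3.org/1999/xlink"

ET.register_namespace("", EAC_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_relations(names, links=None):
    """Return a <relations> element with one <cpfRelation> per name.

    `links` optionally maps a name to the URL of an existing record.
    """
    links = links or {}
    relations = ET.Element(f"{{{EAC_NS}}}relations")
    for name in names:
        relation = ET.SubElement(
            relations,
            f"{{{EAC_NS}}}cpfRelation",
            {"cpfRelationType": "associative"},  # assumed relation type
        )
        if name in links:
            relation.set(f"{{{XLINK_NS}}}type", "simple")
            relation.set(f"{{{XLINK_NS}}}href", links[name])
        entry = ET.SubElement(relation, f"{{{EAC_NS}}}relationEntry")
        entry.text = name
    return relations


if __name__ == "__main__":
    # Hypothetical input: one autograph entry and a placeholder link.
    autograph_names = ["Sanford, Terry"]
    record_links = {"Sanford, Terry": "http://example.org/eac/sanford"}
    element = build_relations(autograph_names, record_links)
    print(ET.tostring(element, encoding="unicode"))
```

The resulting relations element would still need to be placed inside the cpfDescription of the full record and validated against the schema.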
And if you scroll down, you’ll see here that “Terry Sanford” also has an external link…
… which takes you to a very bare-bones record already created in the process of the SNAC project (due to an EAD record for a collection housed at Stanford). However, there is another EAC record for Terry Sanford – a much more robust record – which was created over five years ago in 2006.