Presentation given by Julia Barrett, UCD Library Research Services Manager, at Academic & Special Libraries Annual Seminar 1st March 2013, Dublin, Ireland
UCD Digital Library: Creating online access to historical and contemporary collections - opportunities and challenges
1. Academic & Special Libraries Section
Annual Seminar
UCD Digital Library: Creating online access to
historical & contemporary collections -
opportunities and challenges
Julia Barrett, Research Services Manager, UCD Library
Julia.barrett@ucd.ie
2. Outline
• UCD Digital Library
– IVRLA background
– where we are now
– where next?
• Opportunities
• Challenges
3. IVRLA's main goals were access and preservation.
• 5 year pilot project, funded by the Irish State and the EU, 2005-2010
✓ Proof of concept
✓ Body of digitised content
✓ Functioning repository prototype
• Humanities-based source material located in 7 physical locations on campus:
– UCD Ó Cléirigh Institute
– Irish Dialect Archive
– Special Collections
– Art History
– UCD Archives
– Geological Sciences
– National Folklore Collection
4. Sample IVRLA Collections
UCD Archives
• Papers of William Frazer
• Papers of Eugene O'Curry
• Papers of James Meenan
• Boehm/Casement Papers
Special Collections
• Beranger Watercolours
• Historic Maps Collection
• Ó Lochlainn Collection: Ballads
• 19th Century Pamphlet Collection
• UCD Letters
• The Beckett Country Collection
• Curran Collection: Photographs
National Folklore Collection
• Folklore Photograph Collection
• Questionnaire: Emigration to America
• Questionnaire: Tinkers [Travellers]
• Urban Folklore Project (Dublin)
• Schools' Manuscript Collection – Carna and Ballinasloe, Co. Galway
Irish Dialect Archive
• Manuscript Collection
• Card Collection
UCD Mícheál Ó Cléirigh Institute
• UCD MOCI Monograph Collection
• O'Donovan/Reeves correspondence
5. Content Diversity
• Variety of content types
– Text: letters, books, pamphlets, ephemera, manuscripts, diaries, ballads, essays
– Audio: sound recordings
– Video: video interviews
– Images: photographs, slides, paintings
– Cartographic: maps
– Datasets: database
• 30 Core Collections online
• 17 Research Collections
– which show how research can be done using existing digital resources and how this can generate even further research resources in digital humanities
6. Static Repository v. Digital Library
• Why are we digitising?
• What are we digitising?
• For whom?
• How will this be of benefit?
• How will we know?
• How will we provide support?
• How will we provide access? Different levels of access?
• What infrastructure?
• What metadata standards / policies will we use?
• What workflows will we use?
• What about long-term preservation?
• What models will we use?
7. OAIS Model: Open Archival Information System
• OAIS principles (ISO 14721:2003)
– Organisation (people) and systems
– Provides services to identified communities
– Sustainability
• Preservation orientation
– Durability and usability
• Adherence to best practices/standards
• Pay attention to data modelling & workflow
8. OAIS Functional Model (ISO 14721)
SIP = Submission Information Package
AIP = Archival Information Package
DIP = Dissemination Information Package
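As a rough illustration of how the three package types relate, here is a minimal Python sketch. It is not the UCD Digital Library's implementation: the class fields, the `ingest` and `disseminate` functions and the use of a SHA-256 checksum as the fixity value are all assumptions made for the example.

```python
from dataclasses import dataclass
import hashlib


@dataclass
class SIP:
    """Submission Information Package: what the producer hands over."""
    identifier: str
    content: bytes   # e.g. the bytes of one scanned image
    metadata: dict   # descriptive metadata supplied with the item


@dataclass
class AIP:
    """Archival Information Package: what the archive stores and manages."""
    identifier: str
    content: bytes
    metadata: dict
    sha256: str      # fixity value recorded at ingest (an assumption of this sketch)


@dataclass
class DIP:
    """Dissemination Information Package: what a user receives."""
    identifier: str
    content: bytes
    metadata: dict


def ingest(sip: SIP) -> AIP:
    """Turn a SIP into an AIP, recording a checksum of the content."""
    digest = hashlib.sha256(sip.content).hexdigest()
    return AIP(sip.identifier, sip.content, dict(sip.metadata), digest)


def disseminate(aip: AIP) -> DIP:
    """Build a DIP from an AIP after re-checking the stored checksum."""
    if hashlib.sha256(aip.content).hexdigest() != aip.sha256:
        raise ValueError("fixity check failed for " + aip.identifier)
    return DIP(aip.identifier, aip.content, dict(aip.metadata))
```

A real repository would add preservation metadata, access policies and storage management around these steps; the sketch only shows the direction of flow from SIP to AIP to DIP.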
10. Infrastructure
• Fedora 2.2 to 3.5
• Utilisation of Solr
– Open source platform to enable flexible and configurable indexing and searching
• http://lucene.apache.org/solr/
• JPEG2000
• Geospatial capabilities
11. Collections
• 54 online, incl. 3 new
– 19th Century Social History Pamphlets
– UCDscholarcast
– Thomas Hardy's The Return of the Native
• 1 in final stage
– Desmond FitzGerald Photographs
• 6 waiting to begin
• 4 planning stage
• 4 proposals
12. Collections
• External
– 15 collections in Europeana
– 2 collections in ARTstor
Approximately 190,000 digital files in total…and counting
15. In addition to more collections…
• Workflow and metadata policies
• Linked data
• Extension of geospatial capabilities
• Full-text searching
• Implementation of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
• Access policies and user accounts for restricted content, etc.
• Extension of storage and backup capacity
• Investigation of Hydra http://projecthydra.org/
16. Workflow and Metadata Policies
• Different workflows for different collection types
• Different metadata policies for different collection types
– MODS (Metadata Object Description Schema); DC (Dublin Core); ESE (Europeana); EAD (Encoded Archival Description) – electronic finding aid
– VRA (Visual Resources Association)
– Geospatial (ISO 19115 Geographic Information – Metadata, etc.)
17. Linked Data
• Exposure of metadata using semantic web technologies
– Makes metadata actionable as data
– Uses RDF (resource description framework) data model
• Expressions about resources in the form of SUBJECT-RELATIONSHIP-OBJECT ("triple")
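To make the subject-relationship-object idea concrete, the short Python sketch below writes one triple as a line of N-Triples, a plain-text RDF serialisation. The item and person URIs are hypothetical `example.org` placeholders, not identifiers from the UCD Digital Library; the relationship uses the Dublin Core Terms `creator` property.

```python
# Dublin Core Terms "creator" property, used here as the RELATIONSHIP.
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"


def to_ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialise one triple whose three parts are all URIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


triple = (
    "http://example.org/item/letter-1",    # SUBJECT: hypothetical item URI
    DCTERMS_CREATOR,                       # RELATIONSHIP
    "http://example.org/person/author-1",  # OBJECT: hypothetical authority URI
)
line = to_ntriple(*triple)
print(line)
```

In practice the object would be a URI from an authority file, so that every record naming the same person points at the same identifier.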
18. Geospatial: New Web Mapping Framework
• Geographic dimension to many resources
• New mapping framework implemented to better expose this geospatial information to its users
• Framework provides new tools for finding resources by geospatial criteria
• makes use of the geospatial indexing capabilities of its search engine Solr
• can visualise georeferenced information on a map
• These enhancements improve user experience but also lay the foundation for additional geospatial data and information services planned for the UCD Digital Library, such as the display of georectified historic maps.
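Finding resources by geospatial criteria with Solr can be sketched as follows. This is an illustration only: it builds the request parameters for Solr's `geofilt` spatial filter (all documents within a distance, in kilometres, of a point). The field name `location` is a hypothetical name for a spatially indexed field, and the Dublin coordinates are just sample values; the actual schema of the UCD Digital Library is not given in the slides.

```python
from urllib.parse import urlencode


def solr_geo_params(lat: float, lon: float, km: float, field: str = "location") -> dict:
    """Build Solr request parameters that keep only documents within km of a point."""
    return {
        "q": "*:*",  # match everything, then filter spatially
        "fq": f"{{!geofilt sfield={field} pt={lat},{lon} d={km}}}",
        "wt": "json",
    }


# Sample point in central Dublin, 5 km radius.
params = solr_geo_params(53.3498, -6.2603, 5)
query_string = urlencode(params)  # ready to append to a Solr /select URL
print(params["fq"])
```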
19. Full-text Searching
• "I was trying to search for a person (my aunt, who worked in Leeson St and was one of the people who moved the hospital to Nutley Lane). However, if I understand it correctly, the files are not searchable. Is this planned and what system would be used?"
20. Implementation of OAI-PMH
• The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a mechanism for repository interoperability
• Implementation would allow for automatic harvesting of records to library catalogue
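A harvester talks to an OAI-PMH repository through plain HTTP requests carrying a `verb` parameter. The Python sketch below builds a `ListRecords` request for Dublin Core records (`oai_dc` is the metadata format every OAI-PMH repository must support). The base URL is a hypothetical placeholder, not the UCD Digital Library's endpoint, and the function name is invented for the example.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec    # optional: restrict harvesting to one set
    if from_date:
        params["from"] = from_date  # optional: only records changed since this date
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint; harvest everything changed since 1 January 2013.
url = oai_list_records_url("https://repository.example.org/oai", from_date="2013-01-01")
print(url)
```

The response is an XML document of records, which a library catalogue can parse and load; large result sets are paged with a resumption token.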
21. Hydra http://projecthydra.org/
Hydra provides a versatile and feature rich environment for end-users and repository administrators alike
• Manipulation of images (e.g. crop, rotation)
• Manage workflows
22. UCD Digital Library
Software
• oXygen
• FileMaker Pro
• Adobe Photoshop
• Adobe Bridge
• Komodo
• HeidiSQL
• Cygwin
• NX Client
• Fedora-Commons
• Apache Tomcat
Hardware
• i2s SupraScan Quartz A1 HD book scanner
• Kodak iQsmart2 flatbed scanners x 2
• Scanning workstations
• External hard drives
• VM servers
23. UCD Digital Library
Languages
• XML
• XSLT
• PHP
• HTML
• PERL
• Java
• JavaScript
• Ruby-on-Rails
• SPARQL
• JSON
• etc
File Formats
• TIFF
• JPEG2000
• JPEG
• PDF
• Word
• Excel
• CSV
• Shapefiles
• WAV
• MP3
• MPEG4
24. UCD Digital Library
Metadata Schemas
• MODS
• DC
• EAD
• ISO 19115
• VRA
• NUDS
• ESE
• EDM
• METS
• MIX
• PREMIS
Authorities/Ontologies
• LCSH
• Art & Architecture Thesaurus
• MARC country & geographic area codes
• LCTGM
• Logainm - Placenames Database of Ireland
• VIAF
• DBPEDIA
• GeoNames
• OpenStreetMap
25. Challenges
• Complexity
• Range of skills needed
• Mainstreaming
• Managing expectations
27. Simplified Workflow
[Workflow diagram. Labels recoverable from the slide: Analog; Assessment / Selection; Rights Clearance; Digital; Digitisation; Post Processing; Metadata Creation; Ingest Workflows; Backup / Storage / Preservation; Publish; Digital Library]
© University College Dublin
32. Opportunities
• Working / collaborating with Schools and Repositories
• Working / collaborating with external organisations
• Identifying and using existing scanned collections
• Facilitating availability, accessibility and usability
• Using technologies to enhance the user experience
• Staff development
• Library collaboration (?)
34.
• Builds relationships
– Schools, Institutes, Repositories, Buildings & Services, etc.
• Collections become more visible and discoverable
• Library role in relation to auditing existing digital collections on Campus
• What collections? Links to anniversaries and other defined priorities
35. Collaborating with external organisations
• OSi 19th century 5' and 10' town plans
• TCD's 1:25,000 military series and UCD's 25" maps to fill their gaps
36. Existing scanned collections
• May need to be "rescued"
• May need more work
• May help fill mutual gaps
• Collaborate – fewer staff, sharing of skills (e.g. metadata, geo-rectification)
• Be clear about who does what and what the outcomes will be (MOU)
– OSi: 1,000 scans of 5' and 10' 19th century towns / cities (2006-2008); UCD: metadata and geo-rectification
– TCD & UCD map swap
37. Availability and Accessibility
• Availability of primary source material
• Easy accessibility to such materials
"The main advantage to having the journal digitised, for me, was that it greatly increased how accessible it was.
Prior to the journal being made available in digital format the only copies (most especially of the older editions) were those in the National Library or those stored in the INMO's own (excellent) library.
As you know, researchers consulting library copies are somewhat constrained by library opening hours and by the fact that another researcher may be using the material on the same day - rendering it unavailable to you".
38.
• Enables the promotion of a collection to multiple related disciplines
43. Staff Development and Library Collaboration
• Build on staff experience
• Develop new skills in a growth area
• Possibilities for libraries collaborating with each other?
44. Most Popular Collections (past 5 months)
1. Schools' Manuscript Collection - My Home District
2. The Irish Nursing Journals Collection
3. Folklore Photograph Collection
4. Schools' Manuscript Collection - Carna & Ballinasloe, Co. Galway
5. Folklore Schools 1937-38
6. Tierney/MacNeill Photographs
7. Questionnaire: Irish Famine (1845-1852)
8. Historic Maps Collection
9. Folk music
10. UCDscholarcast
45. The Broader Vision …
The UCD Digital Library is …
• A repository of UCD's digital cultural heritage materials…diversity
• A repository of data of various kinds
• A resource discovery framework
• A platform for scholarly interaction with digital content
• A platform for new forms of digital publications
• A platform for the dissemination of the outcomes of UCD research and creativity
• A key component of UCD research infrastructure
• A platform for innovation in library services, teaching & learning, and research
• A catalyst for new partnerships between UCD Library and its extended community
• A node in an emergent global environment of linked data
A number of people assisted me with this presentation and so I'd like to acknowledge Peter Clarke, Orna Roche, John Howard and particularly Audrey Drohan who is here in the audience…
IVRLA = Irish Virtual Research Library & Archive
This is the background. Funded by the HEA. The IVRLA project developed a proof of concept in relation to creating a body of digitised content. What was important was testing a variety of different types of materials within a range of collections, and creating a repository that would be scalable into the future. Partnerships and relationships with repositories were very important, to get buy-in from both academics and repositories; this is something that has been actively continued and built upon. The key repositories and partnerships were …
UCD Mícheál Ó Cléirigh Institute. These are examples of collections.
Qualitative datasets (survey data; media-based documentation of testimonies from field informants; textual/tabular data).
Research Collections, e.g.: a project surveys the holdings of particular thematic areas from a variety of repositories, creates lists and then digitises a selection of the material. E.g. the Towards 2016 project surveyed UCD's holdings of material relating to the 1916 Rising in order to digitise a selection of this primary source material, arrange it, comment on it and make it available on the IVRLA.
So the IVRLA is what preceded the DL. Before I move specifically to the DL I'm going to look at some of the key questions around DL development and look at the overall framework we are using.
A digital library is something that is dynamic and something that puts the user in the centre. We need to consider:
Why are we digitising? – to provide easier access; to enable new research; to add value by linking collections? To facilitate the preservation of fragile materials?
What are we digitising? – do we have a collection development policy? Can we digitise everything we want to digitise? Copyright issues? Sensitive materials and ethical issues?
For whom? – who are our end users? Identified researcher cohorts? What about the opportunities to build relationships with our repositories and Schools?
How will this be of benefit?
How will we know?
How will we provide access and support?
Then the questions around HOW…metadata, workflows, infrastructures, long-term preservation etc.
And what overall models will we use?...[OAIS…next slide]
An Open Archival Information System (or OAIS) consists of an organisation of people and systems that has accepted the responsibility to preserve information and make information / collections available for a Designated Community. The model advocates adherence to best practices and standards, and attention to data modelling and workflows. Visually it looks like this…[next slide]
This model is one on which the DL is largely based.
SIP – in a zipped folder we receive the datastreams (i.e. each content item is a datastream and will consist of the image and its metadata). This is then ingested (i.e. uploaded plus associated processes) into the DL where it is archived and managed. From the store the access policies are developed (e.g. levels of access), thereby allowing users access following their querying.
The AIP is something that we are looking at now – not quite there yet, and it will be dependent on using the PREMIS metadata schema for preservation purposes.
So, where are we now with the DL?
All the IVRLA collections are now in the DL.
So…last year we implemented a new infrastructure, moving from Fedora 2.2 to 3.5. This offers improved efficiencies and stability.
The Solr search engine offers hit highlighting, faceted search, dynamic clustering, etc. It's a powerful indexing and retrieval tool.
Move to JPEG2000 – we've moved away from the delivery of data using the sometimes problematic DjVu format to JPEG2000. Because of the way the compression engine works, JPEG 2000 supplies a higher quality final image even when zoomed in on.
This version of Fedora also allows for geospatial searching – we started to look at that last year – and this year we are further developing that (will talk about later on).
We have also increased the number and range of collections…
See AD's listing.
19th Century Social History Pamphlets – include a variety of themes: education, health, famine, business etc.
UCDscholarcast – downloadable lectures from School of English and elsewhere.
Thomas Hardy's The Return of the Native – manuscript copy of Hardy's work.
2 Georgian buildings collections in ARTstor.
Digital file = one photo, one page of a book (i.e. one scan), one audio file etc.
Top right – Schools' Manuscript Collection – My Home District (Sligo). This consists of a series of selected essays by schoolchildren, topographical descriptions of their own locality which they were encouraged to write. Early 20th century.
Beranger watercolours – beautiful.
Joyce and unidentified man.
National Folklore Collection photo.
Text of a speech by Terence MacSwiney.
Many of the DL's users are not researchers but members of the public who are exploring the collections for personal perspectives.
Other materials are rare and pre-date the founding of the Irish state.
Other e.g.: UCD has a partnership with the Franciscans and as a result the DL has some important collections, e.g. the Luke Wadding papers, a monograph collection, mendicant order material culture collections – really beautiful chalices and other religious objects. Images from multiple perspectives in great detail. This one is the William Ferris Chalice.
Other e.g.s here: School games collection (hopscotch); sculpture trail; Austin Clarke collection…
At the moment and over the next few months we are looking at…[next slide]
Over the next year…in addition to more collections…
Increase in diversity of collections: currently using the old IVRLA workflow.
Different scanning procedures depending on the type of collection, e.g. images, coins.
A book will have one descriptive record with multiple scans.
A photo will have one record and one image.
The sculptures in the UCD Sculpture Trail collection will each have a record but will have many different views.
So we need to draw up different workflows (some more complex than others – maps are complicated) to which different standards will be applied.
There are approx. 30 main metadata standards and many more smaller ones.
Linked data is a method of publishing structured data so that it can be interlinked and become more useful.
In order to link data successfully, names of people, places etc. must be standardised through the use of an authority list, e.g. VIAF - The Virtual International Authority File. So what happens is that the names authority and the value are automatically added to the metadata.
Resources are expressed in the form of a "triple" consisting of a subject – relationship – object, e.g. "Dan Brown wrote The Da Vinci Code" – using a standardised form of Dan Brown's name – VIAF.
We are still working on this on the backend…not yet available through the DL front page.
Evaluating how to implement full-text searching this year.
Solr (search engine we are using on Fedora) has an in-built OCR (optical character recognition) engine – but we need to develop this.
Not everything is OCR-able – e.g. handwriting.
We are also looking at enhancing the user experience and specifically at evaluating the Hydra framework - Hydra is a repository solution that is being used by institutions on both sides of the North Atlantic to provide access to their digital content. Hydra provides a versatile and feature rich environment for end-users and repository administrators alike. Example: Northwestern University Library's Digital Image Library.
We are looking at security issues around accessing restricted resources – we need to implement more sophisticated access control mechanisms.
We need to extend our storage and backup capacity.
To underpin all of this, what software, hardware etc. are we using…next few slides
i2s SupraScan Quartz A1 HD book scanner – book scanner up to A1 – high definition
ISO 19115 - geospatial.
NUDS: Numismatic Description Standard.
Authorities / Ontologies (or "Vocabularies").
LCTGM: LC Thesaurus for Graphic Materials.
Use of authorities is important in realising linked data.
Whatever standards we are using we are trying to ensure adherence to best practice.
I think the number of languages, formats, schemas etc. does highlight one of the challenges…that of complexity…[next slide]
I've been working in this area for just over a year – it has taken me some time to get used to new language, concepts, acronyms – this is a wordle created from 3 recent emails…
Understand the complexity. It's not just about scanning. Communicate the complexity.
This is just the workflow to get the item ready for ingest (i.e. from analog to when ingested = upload plus associated processes).
Assessment / selection will link in to our collection development policy.
Copyright clearance.
Digitisation / image processing etc.
This is a diagram showing how we implement content access. So following ingest to the Fedora server (green)… we then make the collection available (via our service components) to our end users in a variety of different ways.
So creating digital libraries will need a range of skills – lots of stages.
Range of skills needed, including IT skills.
Need therefore to pull together teams of people from areas not only in the Library.
Specialised areas – with usually not many people in each area – so vulnerable to loss of expertise. Real challenges where one staff member leaving can throw the whole system.
Knowledge management therefore is important.
So how does this work in UCD Library…[next slide]
The IVRLA was an externally funded project. The Digital Library is not, and so the different areas of activity are centred in the current library structure as part of the ongoing work of the library.
Within that we are looking at "mainstreaming" from Research & Innovation to other library units.
Programmer – about to recruit.
CS – just another library collection.
IR: managing – same sorts of infrastructure, workflow management etc.
This is a slide showing the Republic of Letters, which describes scholarly communities and networks of knowledge in the 18th century.
The Iberian Short Title Catalogue (ISTC) is a growing catalogue/dataset of books printed on the Iberian Peninsula up to 1650. A tool for: location of early print materials; research in history of printing. School of History.
When we showed our academic this he was exceedingly interested – he too could do something similar utilising this type of data visualisation.
Among the many advantages of the Fedora platform are the potentials for the augmentation and reuse of the data. Geospatial coordinates and other data can be embedded into the underlying data for both publishers/printers and physical location of the items. This could be repurposed into a data visualisation tool to show timelines of publications, patterns of use, the most influential publishers, etc.
But we must manage how to whet people's appetite with what is realistic – timewise and with our existing resources. Concentrating on the path rather than a possible end helps to manage expectations.
The last section of my presentation looks at the opportunities…[next slide]
Libs collaborating with other libraries?
Numismatics project: School of Classics.
IB: School of History.
1916 photo: UCD Archives.
Thomas Hardy manuscript of The Return of the Native – Special Colls.
Irish Nurses Journal – School of Nursing and the INMO.
Photos from the Irish Folklore Commission.
School of Art History Georgian photos.
Library role in relation to auditing existing digital collections on Campus – could be developed.
What collections? Links to anniversaries and other defined priorities – adhere to an overall coll dev policy.
OSi scanned the maps in 2006-2008 – never made available.
Because of retirements and re-deployment only one member of OSi staff remains who worked on that project – danger of loss of knowledge and loss of collections.
Georectification will mean being able to overlay the old map with a current map and to make comparisons. With the implementation of our mapping framework we'll be able to include images of buildings etc. on these also.
Paul Ferguson gave us 600 maps from the Military Series - 1:25,000 scale (1940s).
We scanned approx. 340 25" OSi mid 20th century maps for TCD.
Availability of primary source material (students don't usually see these until 3rd / 4th level) – therefore pedagogic advantage
Nursing. Also: social history; labour history.
Opportunities through the implementation of newer technologies…
If you are teaching about the history of the development of Dublin City and interested in Georgian architecture – 2 collections…very detailed views – interior and exterior. Building materials, architecture, plasterwork, ornamentation, whole range of things.
This is the Custom House.
Lord Howth's house – one of the Dublin County John Rocque maps of the 18th century, from UCD's Special Collections.
More about maps…
Use of geospatial technologies: as already mentioned, many resources in the UCD Digital Library have a geographic dimension. We are experimenting with geocoding these references, and providing links to external sources where additional information is available, such as geonames.org and OpenStreetMap.
Final 2 opportunities…
Develop new skills in a growth area – an exciting area.
Possibilities for libraries collaborating with each other? E.g. metadata policies (but all using the same standards?)
A repository of UCD's digital cultural heritage materials…diversity.
A repository of data of various kinds that provides a framework for resource discovery.
A platform for new forms of digital publications.
A platform for the dissemination of the outcomes of UCD research and creativity.
A platform for innovation in library services, teaching & learning, and research.
A proactive way of partnering with our extended community.