6. Timeline
• Hire staff
• Convene advisory &
working groups
• Develop infrastructure
• Establish workflows
• Gather data from NY 3Rs
hosted collections
• Create and distribute
guidelines for contribution
• Outreach to LAMS
statewide
• Grow New York content in
DPLA
Phase 1
October 2013 - April 2015
Phase 2
May 2015 - April 2016
10. Regional Partners
Hosted Regional Collections
Non-hosted Regional Collections
Member institutions with:
• Locally hosted collections
– Large private institutions
– Small specialized institutions
with local systems
• Collections looking for a
home
12. What’s next
Phase 1
• First contribution to DPLA – happening now! 89,626 records
• Schedule monthly ongoing harvests
• Coordinate outreach and communication through 3R’s regional
liaisons in preparation for Phase 2
Phase 2
• Broaden contribution
• Streamline contribution processes
• Continue adding regionally hosted content
• Work with regional liaisons to include non-hosted partners
statewide
• Grow New York content in DPLA
15. Contribution
ESDN gets different metadata feeds from lots of partners:
CONTENTdm, Islandora, CollectiveAccess, ArchivalWare, etc.
DC, MODS right now.
ESDN maps all that data to MODS.
ESDN sends one stream of MODS data to DPLA.
23. Step by Step
• Get permission.
• Map collection metadata to our MODS for
sending to DPLA.
• Review data feed for inconsistency and mapping
issues (OAI must be configured).
• Check for presence of required and
recommended fields.
25. Get Permission
“I understand that the descriptive metadata – as distinguished from the
digital objects themselves – will be made freely available by the DPLA for
harvesting and re-use under a Creative Commons CC0 license.”
28. Required/Recommended Fields
• Required (per record):
• Title
• Rights
• Link to object in local context
• Link to thumbnail (where applicable)
• Other fields recommended :
• Date
• Place
• Type
• Subject
29. What should I do to get ready?
Administrative/Technical
• Start thinking about the permission letter.
• For local systems, verify that OAI or other
sharing protocols are configured.
30. What should I do to get ready?
Metadata
• Follow standards/local practices consistently.
• Read a bit about shareable metadata.
• Review your rights statements.
DPLA brings together digital resources from America’s libraries, archives and museums and it makes them freely available to the world.
In addition to its searchable portal, which provides access to millions of items from libraries, archives, and museums around the US, DPLA is also a platform with a free and open Application Programming Interface (API) for developers and programmers. The API encourages creative reuse of this shared content.
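For the developers in the room, here is a minimal sketch of what a request to that API might look like. The base URL and the `q`, `page_size`, and `api_key` parameter names are assumptions on my part and should be checked against DPLA's own API documentation; `build_item_search_url` is a hypothetical helper, and the sketch only builds the URL without sending anything.

```python
from urllib.parse import urlencode

# Assumed base URL for DPLA item search; verify against DPLA's API docs.
DPLA_ITEMS_URL = "https://api.dp.la/v2/items"


def build_item_search_url(query, api_key, page_size=10):
    """Return a keyword-search URL for DPLA items (hypothetical helper)."""
    params = {"q": query, "page_size": page_size, "api_key": api_key}
    return DPLA_ITEMS_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "YOUR_KEY" is a placeholder; a real key has to be requested from DPLA.
    print(build_item_search_url("general electric", "YOUR_KEY", page_size=5))
```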
Where does all that content come from?
DPLA relies on a national network of partners called “Hubs”. Many of DPLA’s hubs are state or regionally based partners (these are the Service hubs shown in this diagram) that gather data from collecting institutions and share that data with DPLA.
ESDN is the New York State service hub for DPLA.
In May 2013, the NY 3Rs Association put out a report that named six initiatives as part of its strategic planning process. One of the six initiatives was the creation of a New York service hub for DPLA. It was agreed then that METRO would host the hub, but that it would be administered in collaboration with the other eight regional councils that make up the NY 3Rs Association.
When the hub was announced a 3-year project plan was put in place. The plan was divided into two phases.
As you can see, we are now in the home stretch of Phase 1 which has focused primarily on laying the necessary groundwork to get the hub up and running. Basically, we’ve spent the past year putting the pieces in place to get our hub functional. And now we’re preparing to move into phase 2 this spring. Phase 2 is when the hub will really open for business. That’s when we will begin to open up participation more broadly throughout the state.
Phase 1 – we mostly focused on hosted 3R’s collections and low hanging fruit. We used select collections from regional projects and a few additional partners to get our infrastructure and workflows ironed out.
We roped in the State Library and the State Archives and, because we wanted to work with at least one large private institution in the state, Columbia University. These organizations all agreed to provide us with data from their systems so that we could work through our technical processes, but even more importantly they’ve helped us establish our workflows for contribution.
So when we open up contribution more broadly in Phase 2 and begin bringing in content from all sorts of organizations statewide, we have some idea now of what our process needs to look like.
First of all, we’re clear now that we cannot sustain 1 to 1 relationships with every institution in the state that wants to contribute data to DPLA. Instead we will be relying on the 3R’s councils to facilitate the contribution process with new partners. We’re working now to implement a regionally coordinated approach.
We think this will function kind of like a statewide network of mini-hubs. Luckily, the state is already divided into nine existing service regions.
This is the NY 3Rs councils map. In phase 2 of the ESDN project plan, all communication and contact with potential and existing contributors will be coordinated through the NY3R’s regional councils. Each regional council will host an ESDN liaison (a council staff member) who will be the primary contact for ESDN within a given part of the state. So there will be 9 liaisons statewide that will facilitate contribution to DPLA.
That facilitation will continue to include growing content from these hosted projects we’ve already been working with in Phase 1. We believe these projects will also provide a potential on-ramp to DPLA for small institutions in each region that can’t support their own systems locally.
Of course not every institution in the state has their digital content in one of these hosted projects. So, in addition to growing content from the existing hosted projects, regional liaisons will also facilitate partnerships with institutions that host collections locally.
The ESDN Regional Liaisons are already meeting as a group and working with ESDN staff to coordinate consistent and clear communication statewide.
For the Capital District, your regional liaison for ESDN will be Susan.
Susan has been working closely with us from the beginning. She will take the lead in contacting and communicating with institutions that would like to contribute to DPLA through ESDN.
If over the course of the past year you have reached out to ESDN directly to express interest in contributing – either via the online interest form, or speaking to me directly at conferences – that information has all been tracked and passed on to Susan so she is aware of your interest. Later this spring, once we enter phase 2, Susan will take the lead in reaching out to interested parties.
Some of you may have already heard from Susan, and we’re very excited about the interest CDLC members have shown in participating in DPLA. We just ask for your patience as we move forward and begin rolling partners into the project.
11,324 records in this first batch come from CDLC member institutions (miSci contributed 5,286 records; Sage contributed 176 records). We have partners in other regions contributing as few as 1 record. Quantity doesn’t matter. To us it’s about representing the state’s cultural heritage as completely as we can.
Mission/primary goal of a service hub (from a metadata perspective), taken from DPLA Hubs info page.
Want to use this as the basis for a relatively high-level discussion about how we provide metadata records to the DPLA.
Up to this point, the majority of our content has come from CONTENTdm-hosted collections. CONTENTdm uses Dublin Core. Other systems store data in other metadata formats like MARC or MODS.
So, as we’re pulling records from all kinds of different systems we need to have one common standard that we’re mapping stuff to.
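To make the idea of mapping to one common standard concrete, here is a rough sketch of a Dublin Core to MODS crosswalk. This is not ESDN's actual mapping: the `CROSSWALK` table and the `dc_to_mods` function are hypothetical and cover only a handful of fields, though the MODS element names themselves come from the MODS schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical crosswalk: Dublin Core element -> nested MODS element path.
CROSSWALK = {
    "title": ["titleInfo", "title"],
    "creator": ["name", "namePart"],
    "date": ["originInfo", "dateCreated"],
    "subject": ["subject", "topic"],
    "rights": ["accessCondition"],
}


def dc_to_mods(dc_record):
    """Build a MODS element from a parsed Dublin Core record element."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    for child in dc_record:
        # Tags look like "{namespace}local"; keep non-empty Dublin Core ones.
        if not child.tag.startswith(f"{{{DC_NS}}}"):
            continue
        if not (child.text or "").strip():
            continue
        path = CROSSWALK.get(child.tag.split("}", 1)[1])
        if path is None:
            continue  # unmapped fields are simply skipped in this sketch
        parent = mods
        for name in path:
            parent = ET.SubElement(parent, f"{{{MODS_NS}}}{name}")
        parent.text = child.text.strip()
    return mods
```

Each Dublin Core value becomes its own MODS wrapper here, which keeps the sketch short; a production mapping would likely group related values and carry over many more fields.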
Aggregation process – the business of ESDN in a nutshell.
Partner metadata coming from CONTENTdm, Islandora, CollectiveAccess, ArchivalWare, etc. – currently aggregating from 71 institutions
Pull in data from multiple providers in all kinds of formats. We apply transformations to address weird system quirks and inconsistencies in date fields, etc. We output one single, normalized stream of data to DPLA.
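As one small example of the kind of transformation described above, here is a sketch of date normalization. The input patterns it handles (bracketed years, US-style slashes, "ca." prefixes) are assumptions about what partner data might contain, and `normalize_date` is a hypothetical helper, not part of our actual tooling.

```python
import re


def normalize_date(raw):
    """Return an ISO-style date string, or None when no year can be found."""
    text = raw.strip().strip("[]").strip()
    # Already ISO-like: YYYY, YYYY-MM, or YYYY-MM-DD.
    if re.fullmatch(r"\d{4}(-\d{2}){0,2}", text):
        return text
    # US-style M/D/YYYY, as some systems might export it.
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", text)
    if m:
        month, day, year = m.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    # Fall back to the first plausible four-digit year, e.g. "ca. 1925".
    m = re.search(r"\b(1[5-9]\d{2}|20\d{2})\b", text)
    return m.group(1) if m else None
```

Values the function cannot interpret come back as `None`, so they can be set aside for a person to look at rather than silently guessed.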
This is a screen grab of one of miSci’s records in New York Heritage (General Electric Photographs Collection). It’s one of 4,689 records in this collection in CONTENTdm. You can see a little snippet of the Dublin Core metadata that displays to the public.
Behind the scenes, when we pull data from New York Heritage, we’re using the OAI-PMH protocol to pull the metadata from this collection into our aggregation tool, REPOX.
Here’s what the OAI-PMH feed (the raw Dublin Core that we pull from CONTENTdm) looks like for that same record. When you enter data into a system like CONTENTdm, it might ask you to map each field to a Dublin Core field. The system then outputs the record in an XML feed of Dublin Core data.
Systems like CONTENTdm often provide OAI capability or plugins so that a system administrator can choose to expose that feed to harvesters like ESDN.
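For anyone curious what a harvester actually asks for, here is a minimal sketch of an OAI-PMH `ListRecords` request and of reading the Dublin Core out of the response. REPOX does this work for us in practice; the function names and the example.org URL in the comment are hypothetical, and the sketch does not follow the resumption tokens that OAI-PMH uses to page through large sets.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for one collection (set).

    base_url is the repository's OAI endpoint, e.g. a hypothetical
    "https://example.org/oai".
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Yield (identifier, {dc_field: [values]}) for each record in a response."""
    root = ET.fromstring(xml_text)
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        for el in record.iter():
            # Collect every non-empty Dublin Core element, keeping repeats.
            if el.tag.startswith(DC_PREFIX) and el.text and el.text.strip():
                name = el.tag[len(DC_PREFIX):]
                fields.setdefault(name, []).append(el.text.strip())
        yield identifier, fields
```

Values are kept as lists because Dublin Core fields such as subject are commonly repeated within a single record.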
We do this for every collection in NYH that we have permission to contribute to DPLA.
Guess what record is included in the miSci records we pulled from CONTENTdm? Here’s Mr. McCune, output from REPOX as MODS, which we will deliver to DPLA.
And then, once our records are actually live in DPLA, this is how that data will be displayed in DPLA’s portal.
One of the key pieces of our contribution workflow which will be coordinated through the regional liaisons will be the acquisition of permissions letters from each contributing institution.
Our agreement with DPLA specifies that we need to obtain written permission from libraries and archives before we start harvesting and sharing their metadata. That seems fair. So, we have decided to do this a little less formally than they do in some other states. We are not asking institutions to sign a legal agreement with ESDN or anything like that. We are simply asking institutions to provide a signed letter on their letterhead specifying that they understand what we’re doing and that they agree to share their metadata with DPLA through ESDN.
I said “simply” but whether or not getting a letter like this signed is a simple process at any given organization seems to be a coin toss. The thing that makes some institutions nervous is this sentence:
A Creative Commons CC0 license effectively places your metadata in the public domain. All metadata contributed to DPLA is available for download and reuse by anyone for any purpose.
One of DPLA’s goals is to promote innovation and development within the cultural heritage community. Their commitment to open access goes beyond just making metadata searchable through their interface, it also includes making it available to developers and programmers to create new and transformative uses of our collective cultural heritage.
This can only work if all of the metadata contributed to DPLA is unencumbered by copyright restrictions.
More information about CC0, about this requirement, and about what it means will be available on our website. I’d like to encourage you to read more about this and get familiar with the language being used by us and the DPLA about opening up access to metadata, and with the many benefits that go along with this openness.
But please understand, this is a non-negotiable requirement for contribution to the DPLA. So, we do need a signed permission letter on file before we can share your data with the DPLA. For CDLC members, this is one of those things you’ll work with Susan on.
We do not require many fields; more of them are optional or recommended. The barrier to entry is low.
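The required and recommended fields from the slide (Title, Rights, link to the object, link to a thumbnail where applicable, plus Date, Place, Type, and Subject) lend themselves to a simple automated check. Here is a sketch, assuming each record has already been flattened into a dictionary; the key names such as `object_url` and `thumbnail_url` are hypothetical.

```python
# Key names are hypothetical stand-ins for the fields named on the slide.
REQUIRED = ("title", "rights", "object_url")
RECOMMENDED = ("date", "place", "type", "subject")


def check_record(record, needs_thumbnail=True):
    """Return (missing_required, missing_recommended) for one record dict."""

    def present(key):
        value = record.get(key)
        return bool(value and str(value).strip())

    # A thumbnail link is only required where one is applicable.
    required = REQUIRED + (("thumbnail_url",) if needs_thumbnail else ())
    missing_required = [key for key in required if not present(key)]
    missing_recommended = [key for key in RECOMMENDED if not present(key)]
    return missing_required, missing_recommended
```

Blank or whitespace-only values count as missing, since an empty rights field is no more useful than an absent one.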
One of the most time-consuming pieces of the contribution workflow is getting the institutional letters signed by administrators. Some places need to go through legal counsel, others don’t. Each local environment is different, so if this might take more time in your environment, you might want to lay a little groundwork on this.
Another unexpected hiccup can be configuring your digital collections system to expose your data via OAI-PMH or some other sharing protocol. To be clear, if your collections are in one of the council-hosted projects, like New York Heritage, you won’t have to worry about this at all.
Or, if your institution uses a proprietary digital collections tool like CONTENTdm, it might be as easy as clicking on a check box in the administrative module. However, if your institution hosts digital projects in an open source system like Islandora or CollectiveAccess, you might need to do some work with your tech folks to get that set up so that we can harvest your data.
Again, these are things you can think about starting now, but once ESDN is fully functional and we begin to open contribution more broadly, your council liaison will have information and resources to help get you step by step through the contribution process.
This is a question I get almost everywhere I go: How do we get ready to contribute to DPLA? What do we have to do?
As you can imagine, each institution and in fact each collection at each institution is unique. We really do look at each collection as it comes to us and we deal with data issues collection by collection. That said, here are some basic principles. If you can address some of these things now, it will make the whole process much easier and may even increase the click-backs you receive from the DPLA website after you contribute your data.
This is to say – if you’ve been following standards in metadata creation, that is fantastic. It doesn’t so much matter to us if you’ve followed those standards correctly, as long as you’ve followed them consistently. So, if your cataloger or metadata specialist has entered all of your creator names in a title field and all of your titles in a creator field – that’s fine by us! Because we can programmatically translate that very consistent mistake into the necessary fields on our end. Where things get tricky for us is when half of your title fields contain creator data and the other half contain title data. Then we have no way to distinguish which is which.
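Here is a sketch of why a consistent mistake is easy to fix. It assumes records are plain dictionaries, and `swap_fields` is a hypothetical helper rather than our actual transformation.

```python
def swap_fields(records, field_a="title", field_b="creator"):
    """Return copies of the records with the two named fields exchanged."""
    fixed = []
    for record in records:
        copy = dict(record)
        # The same rule is applied to every record in the collection.
        copy[field_a], copy[field_b] = record.get(field_b), record.get(field_a)
        fixed.append(copy)
    return fixed
```

Because the rule is applied blindly to every record, it only helps when the mistake is made everywhere. In a collection where half the records are already correct, the same swap would break those.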
Begin thinking and reading about what your records might look like outside of their local context. If someone finds an image titled “boy on dock” within the clearly marked Hudson River Valley Swimmers Memorial photography collection, that title is going to make a lot of sense. However, if someone finds this record in DPLA or a Google search, the title “boy on dock” loses a bit of context and meaning. As we begin to see our cultural heritage in a larger shared context, we’re realizing that what makes sense locally doesn’t always translate when a record is standing on its own. That’s what shareable metadata is all about in a nutshell. Google “shareable metadata” for more.
This is one where I might get a little bit on my high horse, so please bear with me. There is a long-standing practice in digital collections of adding a single rights statement for every item in every collection that an institution owns. Existing statements often say something along the lines of “All rights reserved” or “Written permission is required”. Often, VERY often, those statements are inaccurate. For example, if an “All rights reserved” statement is attached to a photograph from the 1890s, we believe pretty strongly that that statement is inaccurate, because a photo from the 1890s is in the public domain. So, we are definitely encouraging folks, and DPLA is encouraging folks, to take a look at the rights statements in your records. Reconsider usage rights and apply statements that are as accurate as humanly possible.
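If you want a starting point for that review, here is a rough sketch that flags records pairing a blanket rights statement with an early date. The phrase list and the 1900 cutoff are assumptions chosen only to match the 1890s example; the function builds a list for a person to review and does not decide copyright status.

```python
import re

# Blanket statements quoted in the talk; extend this list for local wording.
BLANKET_PHRASES = ("all rights reserved", "written permission is required")


def flag_for_rights_review(record, cutoff_year=1900):
    """Return True when a blanket rights statement sits on an early item."""
    rights = (record.get("rights") or "").lower()
    match = re.search(r"\b(\d{4})\b", record.get("date") or "")
    if not match:
        return False  # no usable year, so nothing to compare against
    is_early = int(match.group(1)) < cutoff_year
    return is_early and any(phrase in rights for phrase in BLANKET_PHRASES)
```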
To be clear, for the time being we will take your records even if you don’t fix any of these issues – and we know that going back and doing clean-up projects like this is no small deal. So, please don’t have metadata shame! Don’t hold back contribution because your data currently has one or all of these issues. The beautiful thing about incremental harvesting is that any changes you make to your records over time will be reflected in DPLA going forward. Clean-up is often a long term process and making your valuable materials accessible is much more important than having perfect records.
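One way incremental harvesting can work is through OAI-PMH selective harvesting, where the harvester asks only for records changed since a given date. This is a sketch under that assumption, not a description of our exact setup; `incremental_harvest_url` is a hypothetical helper.

```python
from datetime import date
from urllib.parse import urlencode


def incremental_harvest_url(base_url, last_harvest, set_spec=None):
    """Build a ListRecords URL for records changed since the last harvest.

    last_harvest is a datetime.date; OAI-PMH accepts day granularity
    (YYYY-MM-DD) in its "from" argument.
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": "oai_dc",
        "from": last_harvest.isoformat(),
    }
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)
```

Under this approach, a record you correct next month carries a newer datestamp in your system, so it is picked up by the next scheduled harvest without anyone having to resubmit the whole collection.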