1. From Theory to Action
A pragmatic approach to digital preservation
strategies and tools
SHARE MODULE 1 : THEORY vs. ACTION
Lynne M. Thomas
Northern Illinois University
2. In this module, you will...
Understand what digital preservation will actually look like in practice
(Theory vs. Action)
Understand that digital preservation is an incremental process
Be able to make informed decisions about digital preservation tools
and services based on your organization’s resources
3. Sponsored By:
Why I’m speaking today.
• Defining Moments Found Some Friends
• Applied for Implementation Grant Received a “Figure It Out” Grant Received NEH grant
4. About me
• Head, Distinctive Collections and
Curator, Rare Books and Special
Collections
• Co-PI on initial IMLS grant
• Collecting contemporary literary
papers for science fiction & fantasy
• Content-focused
6. Clarification: Preservation vs. Access
Long term access (Preservation)
• Purpose: ensure long-term access
• Focus: current & future users
• Relies on proven (reliable) technologies
to preserve digital objects across
generations of technology
• Accumulates metadata over the life cycle
to trace preserved content
• Preservation systems create new
versions of digital objects for access to
deliver as needs change over time
Short term access
• Purpose: provide content to users now
• Focus: current
• Relies on cutting edge technologies to
provide best and fastest access at a point
in time
• Selects metadata needed to use and
understand content
• Access systems deliver objects with user-
oriented services
8. How do we get from here to there?
Solution in Theory
vs.
Solution in Practice
Scary OAIS Spaghetti Monster
Illustrations by Jørgen Stamp digitalbevaring.dk
CC BY 2.5 Denmark
9. Solution in Theory
• OAIS (Open Archival Information
System) and other schematic models
• TRAC Certification (Trustworthy
Repositories Audit & Certification)
• TDR ISO 16363 (Trustworthy Digital
Repository ISO Standard)
• Curation Lifecycles that don’t look a
thing like our current workflows
SIPs, AIPs, DIPs and checksums, Oh my!
10. From Theory to Action: Solution in Practice is Iterative
• Starting small is OK! A simple tool may still move you closer to your
goals.
• Not all tools and services are created equal.
• Choices of tools are not forever. They serve what you need now,
selected with an eye to later.
• Today’s hot new tools are tomorrow’s orphans. Focus on workflows!
• Knowing what you have is crucial. Documentation more so.
• You already have many of the necessary skills!
11. Our take on what you need to consider when thinking about your digital stuff…..
Getting it
Understanding it
&
Documenting it
Taking care of it
Letting people use it
…or not!
And a few other
odds & ends…
Solution in Practice
AKA Good Enough DP for real people!!
12. Our take on some things that need to happen or
be considered along the way to this
“Digital Preservation” thing….
We used this to understand the myriad of tools and services
that are out there by mapping them across this lifecycle.
http://digitalpowrr.niu.edu/tool-grid/
13. Let’s Talk About Tools….
Technical skill available + amount of annual funding devoted to DP = range of tools you will be considering
There are front-end/processing
tools like….. Archivematica
Curator’s Workbench
Data Accessioner
BitCurator
And there are back-end
storage/preservation services like…..
MetaArchive
DuraCloud
Amazon Glacier
Fixity
Internet Archive
There are even some services that will pretty much do it all like….
Preservica
DSpaceDirect (uses DuraCloud)
ArchivesDIRECT
Note: Yes, there are also CMSs, IR software,
….ugh. However, these are outside the scope of
this workshop!
*Tools/Services in RED were tested in-depth by POWRR
15. A note about the word “free”…
NOT
Open source software requires resources to install, maintain, and improve it.
16. Things to consider:
How to Decide? Results May Vary…
• How many staff members will be actively engaged in the digital curation lifecycle? Are they tech-savvy?
• How robust and supportive is your technical/systems group? Do you even have one? How about some
developers/programmers…have any of those on staff?
• Does your organization already use archival management software or an Institutional Repository (like
ARCHON/ArchivesSpace, BePress, Fedora etc.)? Consider selecting tools/services that work well with
what you have.
• Do you have digital collections unique to your institution that are irreplaceable? Consider organizing
collections along the lines of those that warrant more robust preservation than others. For example:
1 TB (High Value): MetaArchive (gold standard)
3 TB (Medium Value): Amazon Glacier (cheapest storage with fixity checking)
Rest (Replaceable): Tape Drive Backups
In other words: One tool/service may not be your only solution.
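The tiered example above can be sketched in a few lines of Python. This is only an illustration: the tier names, the collection names, and the `plan_storage` helper are hypothetical, and the service assigned to each tier simply mirrors the example on this slide.

```python
# Hypothetical mapping of value tiers to services, mirroring the slide example.
TIER_SERVICES = {
    "high": "MetaArchive",
    "medium": "Amazon Glacier",
    "replaceable": "Tape drive backups",
}


def plan_storage(collections):
    """Return the total size in TB that each service would need to hold.

    `collections` maps a collection name to a (size_in_tb, tier) pair.
    """
    totals = {}
    for size_tb, tier in collections.values():
        service = TIER_SERVICES[tier]
        totals[service] = totals.get(service, 0) + size_tb
    return totals


# Made-up collections, for illustration only.
example_collections = {
    "Literary papers": (1, "high"),
    "Digitized photographs": (2, "medium"),
    "Oral history audio": (1, "medium"),
    "Vendor-supplied scans": (5, "replaceable"),
}
storage_plan = plan_storage(example_collections)
```

Even a rough inventory like this makes it easier to see how much content actually needs the most robust (and most expensive) service.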
17. How to Decide? Results May Vary…
Remember: Smaller institutions with fewer resources may also have
unique advantages like….
• Less red tape for getting things done
• Fewer levels to push requests for additional resources through
• Self-administered workstations (aka no IT administrative lock downs)
• Personnel-heavy operating model (usually has smaller cash flow)
• Higher cash flows and less data (like a small, private institution)
It doesn’t take years to set
up an account with
something like DuraCloud.
You only need to convince
the person one level above
you to get what you need.
Want to install a
simple open source
tool? Go for it!
This is ideal for running a *free* robust
tool that requires a developer and
server administrator like
Archivematica.
You can purchase a
reasonably-priced, hosted
soup-to-nuts solution.
JAIME
DANIELLE & STACEY - go around and do a quick test to make sure Data Accessioner and Fixity will launch on the attendees’ machines
Hand out pretests as people walk in
**Keep the background of POWRR BRIEF**
At the end of this slide, have attendees introduce themselves: Name and institution ONLY!!!!!
What is your role in digital preservation at your institution?
It is my problem administratively (selection/curation/content)
It is my problem technically (IT/systems/programming)
One person shop: it’s just me all the way down
I supervise the person/people whose problem it is
I’m just curious
I make funding decisions
What types of digital materials do you have in your collection…including the backlog?
Unknown material on a variety of media (CD’s, floppy drives, etc.)
Stuff that has been digitized
Video and/or audio files
Institutional electronic records
Data sets
I’m afraid to look
Approximately how old is this material?
< 1 year
1-3 years
3-5 years
6-10 years
>10 years
I don’t actually know
Poll: Your biggest fear/barrier
It’s too hard technically
I don’t even know where to start
I don’t even know what we have
There’s no money
There’s no time
I’m too late to the game (everyone else has this figured out)
No one believes this is a problem
REALLY HONE IN ON ACCESS VS. PRESERVATION. Introduce before the NDSA activity
From Library of Congress: We are focused on preparing content for preservation….the dark part of the lifecycle.
The focus of this workshop is on Preservation, though some of the tools and services we cover have access components.
Of course, some finagling of the materials is necessary before they even get to a preservation system, and we are going to get some hands-on practice with that later today
At what level is your organization currently operating?
Level 4
Level 3
Level 2
Level 1
Level –1
Is there a panic level?
NOTE: After the exercise as we regroup, rather than going through each column for a show of hands, try…
“In any of these areas, are folks at a level 4? Level 3? Level 0.5?”
Make clear that we are interested in the “bulk” of capabilities – it will be a shorter discussion if we ask people to state Level 1, 2, 3, or 4 rather than everyone sharing details on which specific cells they can do
Lynne
Solution in Theory: Scary OAIS Spaghetti Monster
Solution in Practice: Figure out what you have, start talking to people, build a team and eventually a policy!
ACTUALLY GO OVER WHAT AIPS, SIPS AND DIPS ARE. USE THE SIMPLIFIED LANGUAGE WE PUT IN THE WHITE PAPER
SIP – Ingest (Accessioning)
AIP – Data Management (Storage)
DIP – Access
Checksums – string of numbers & letters generated by algorithm, attached as metadata, to show that a file has not been changed either when moving from one system to another or over time through bit rot.
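A minimal sketch of that idea in Python, for anyone who wants to see it in action. SHA-256 is an assumption here (these notes do not name an algorithm), and the helper names `sha256_of` and `is_unchanged` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 checksum of a file as a hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_checksum: str) -> bool:
    """Compare a file's current checksum with the one stored as metadata."""
    return sha256_of(path) == recorded_checksum
```

Record the checksum when a file is first accessioned, then run the comparison again after every transfer and on a regular schedule. A mismatch means the file has changed, whether from a bad copy or from bit rot.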
In just over 10 years, there are 6 repositories that have been able to become certified!!! (Chronopolis Report; HathiTrust Report; Portico Report; Scholars Portal; CLOCKSS, which received the highest score of any org, and now Canadiana.org) OAIS was conceived in 1996 and accepted as an ISO standard in 2002.
A Note: These are all valuable things that benefit the field of digital preservation greatly…. We just don’t want you to become overwhelmed by them and grind to a halt before you take your first steps…like we did!
In the intro to the 90-page document called the “TRAC Criteria & Checklist,” the process came out of an identified need…
“The [OAIS] reference model (ISO 14721) provides a common conceptual framework describing the environment, functional components, and information objects within a system responsible for the long-term preservation of digital materials. Long before it became an approved standard in 2002, many in the cultural heritage community had adopted OAIS as a model to better understand what would be needed from digital preservation systems.
“Institutions began to declare themselves ‘OAIS-compliant’ to underscore the trustworthiness of their digital repositories, but there was no established understanding of ‘OAIS-compliance’ beyond meeting the high-level responsibilities defined by the standard. There were certainly no criteria for measuring compliance.”
Monologue to talk them off the edge. You don’t have to pick the tools/services that will do ALL of the steps from the get go. Start with a step or two. And it doesn’t even have to be with a fancy schmancy tool!
CAVEAT: Time Sensitive. Software changes quickly. Like, REALLY QUICKLY. Reference COPTR
**Show the handout! There is a handout in the packets that shows our definitions for all of these.**
We have mapped out the tools we will be discussing
Have a look at the tool grid, and then follow the link to COPTR
Having a basic knowledge of DP…they should already know the difference between storage and preservation. But it may come up as a question anyway.
Explain what we mean by processing and backend. Define difference between DP and IR….interoperability will be important as you make longer-term decisions
Details on these specific tools later today. WE PROMISE. Why did POWRR choose the tools in red? Recommendations from Advisory Board based on perceived best fits with “smaller” orgs, state of the field at that time, and availability to test the tool within our grant budget and timeframe (some services would require contracts that would take too long to get through procurement, and some were simply out of the price range of the grant, and likely of our target audience as well!)
Talking point: Front-end means the initial steps in the Dig Curation lifecycle, and not the “Access” part of things. Perhaps we can make it clearer by clarifying what is archivist/curator-facing and what would be end-user-facing (access/DIP/etc.)
We can also talk about which phases spit out SIPs, DIPs and AIPs, and where those are used subsequently and by whom.
Most of these are Open Source or FREE tools.
Orphan Tools like HOPPLA and Curator’s Workbench – a tool that was praised highly a few years ago, and is now no longer supported. Shows how quickly tools and services can change. Be mindful of this when choosing a tool/service.
**REFER TO THE READING – WALK THIS WAY** It references various microservices that help with pre-ingest and inventory. Rates tools by level of difficulty and provides recommendations.
Technical skill available + amount of annual funding devoted to DP = range of tools you will consider. This can also be zero. No technical skills but a bit of money? This will help with decision making for range of tools you’ll consider.
Jaime
Poll: Pick 3 or 4 tools/services that you are interested in hearing about more in-depth.
Jaime
Some tools are free open source tools. But it’s important to note that open source tools are free like kitties, and not like beer. There’s A LOT of work and maintenance that goes along with that free kitty, whereas a free beer allows you to sit back and enjoy.
Getting a free kitten doesn’t cost money up front, but there’s a lot of maintenance. Sadly, Open Source is not like a free beer you can sit back and enjoy.
Open source software requires resources to install, maintain, and improve it.
ALSO – if timing allows – ASK AFTER EACH TOOL: DOES ANYONE USE THIS TOOL/SERVICE CURRENTLY? THOUGHTS? QUESTIONS? This has led to some great discussions!
How many staff members will be actively engaged in the digital curation lifecycle? Are they tech-savvy?
How robust and supportive is your technical/systems group? Do you even have one? How about some developers/programmers…have any of those on staff?
Is your institution already using archival management software or an Institutional Repository (like ARCHON/ArchivesSpace, BePress, Fedora etc.)? You’ll want to select tools/services that work well with what you have.
Do you have digital collections unique to your institution that are irreplaceable? Consider organizing collections along the lines of those that warrant more robust preservation services than others. For example:
1 TB (High Value): MetaArchive (gold standard)
3 TB (Medium Value): Amazon Glacier (cheapest storage with fixity checking)
Rest (Replaceable): Tape Drive Backups
You can have a lot of archivists and no one tech-savvy. That could be a problem
NOTE: maybe mention that Glacier is available as standalone storage and also a back-end option with X products that we tested?