Taming the Monster: Digital Preservation Planning and Implementation Tools
1. Taming
the Monster
Digital Preservation Planning
and Implementation Tools
Dorothea Salo
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
One System, One Library
2 June 2011
2. Why is this
so scary?
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
3. Isn’t this just
as scary?
Photo: “News Paper Origami Dragon Monster”
http://www.flickr.com/photos/epsos/3777343342/
epSos.de / CC-BY 2.0
4. Yet we
persevere.
Photo: “News Paper Origami Dragon Monster”
http://www.flickr.com/photos/epsos/3777343342/
epSos.de / CC-BY 2.0
5. DIGITAL IS NO
DIFFERENT.
Photo: “559 - The Matrix - Seamless Texture”
http://www.flickr.com/photos/zooboing/4335531915/
Patrick Hoesly / CC-BY 2.0
6. Many of the same ideas apply...
• Planning and policy
• Risk assessment
• Risk management
• (knowing that we can’t save everything)
• Materials quality matters!
• Problem discovery and remediation
• Crisis management
• Chief problems: staff, $$$, organizational
commitment
Photo: “Where I Teach”
http://www.flickr.com/photos/eklektikos/2541408630/
Todd Ehlers / CC-BY 2.0
7. Planning and
assessment
tools
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
8. Scene-setting
• Rosenthal, David. “Requirements for Digital
Preservation: a Bottom-Up Approach.”
• http://www.dlib.org/dlib/november05/rosenthal/11rosenthal.html
• If you’re new to this, or trying to find your
feet, this is the best short introduction I
know.
• The list of threats is outstanding.
Photo: “Bottoms Up! - Duck; San Anton Gardens, Malta”
http://www.flickr.com/photos/foxypar4/3123113762/
John Haslam / CC-BY 2.0
9. TRAC
• “Trustworthy Repositories Audit & Certification:
Criteria and Checklist”
• Despite the name, covers a LOT more than
the technology!
• Budget
• Staffing
• “designated communities”
• CRL will audit you, if you like
• (don’t, unless you’re really serious!)
• http://catalog.crl.edu/record=b2212602~S1
10. DRAMBORA
• Digital Repository Audit Method Based on
Risk Assessment
• A “self-test,” if you will.
• DRAMBORA is equally good as a pre- or post-test.
• Personally, I prefer DRAMBORA to TRAC,
especially for those just starting out.
• http://www.repositoryaudit.eu/
• (registration required for toolkit access)
11. Coping with
file formats
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
12. The one acronym you
need to know: FITS
• “File Information Tool Set”
• (you need to know this; otherwise it’s hard to Google)
• Wrapper for several file-format detector
software packages
• Intended to be baked into other software
• It’s early days yet!
• (This means you can’t always trust what the tools tell
you, especially when they’re telling you about errors.)
13. What’s this file?
• wotsit.org “The Programmer’s File and
Data Resource”
• Directory of file extensions
• When in doubt: open in a browser or text
editor and see what you get.
• N.b.: Microsoft Word is NOT a text editor!
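The "open it and see what you get" advice can be partly automated. What follows is a minimal sketch, not a replacement for a real format-identification tool: it looks at the first bytes of a file and compares them with a handful of well-known signatures. The signature table is a small illustrative sample, and the function names (sniff_bytes, sniff_file) are made up for this example.

```python
import codecs

# A small, illustrative sample of file signatures ("magic numbers").
# Real identification tools carry far larger registries.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container (also .docx, .xlsx, .epub, .odt)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (legacy .doc, .xls, .ppt)"),
]


def sniff_bytes(head: bytes) -> str:
    """Guess a file type from the first bytes of the file."""
    if not head:
        return "empty file"
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    # No signature matched: see whether the bytes at least decode as UTF-8.
    # The incremental decoder tolerates a multi-byte character that was
    # cut in half at the end of the sample.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return "unknown binary"
    return "probably text (UTF-8)"


def sniff_file(path, nbytes: int = 512) -> str:
    """Read the first nbytes of a file on disk and guess its type."""
    with open(path, "rb") as handle:
        return sniff_bytes(handle.read(nbytes))
```

A result of "probably text" is a hint to open the file in a text editor; "unknown binary" is a hint to go look the extension up.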
14. Solving the
geographic
distribution
problem
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
15. What problem, now?
• The “all your eggs in one basket” problem.
• If all your bits are on one server, and the server room
is flooded, or your town is nuked—oops.
• Not the same as backups!
• Don’t get me wrong, backups are important!
• Backups are SHORT-TERM, and usually LOCAL.
Geographic distribution (plus associated auditing) is
intended for the long term.
• Don’t forget auditing!
Photo: “Nido”
http://www.flickr.com/photos/italintheheart/3679974298/
Jorge Elías / CC-BY 2.0
16. LOCKSS
• Lots of Copies Keeps Stuff Safe!
• (There is also Portico, but Portico only works with
e‑journal content.)
• Open-source software that handles replication and
(some) auditing.
• “Private LOCKSS network”
• A group of institutions agrees to build a LOCKSS
network just for the stuff they’re interested in.
• ASERL does this for ETDs. Many institutions
(including UW-Madison) participate in a PLN for
govdocs.
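To make "replication plus auditing" concrete, here is a toy sketch of the general idea: hash each copy, let the copies vote, and flag the ones that disagree with the majority. This is NOT the LOCKSS protocol or its code, only an illustration under simplifying assumptions (all copies fit in memory, and the sites are honest). The site names and the audit_replicas function are hypothetical.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: dict[str, bytes]) -> dict:
    """Compare copies of one object held at several sites.

    The checksum held by a strict majority of sites is treated as
    correct; every site that disagrees is reported as suspect.
    Without a strict majority, every copy is suspect.
    """
    if not replicas:
        raise ValueError("need at least one replica to audit")
    digests = {site: sha256_hex(data) for site, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) / 2:
        return {"consensus": None, "agree": [], "suspect": sorted(digests)}
    return {
        "consensus": winner,
        "agree": sorted(s for s, d in digests.items() if d == winner),
        "suspect": sorted(s for s, d in digests.items() if d != winner),
    }


# Usage: three hypothetical sites, one of which holds a damaged copy.
report = audit_replicas({
    "site-a": b"electronic thesis",
    "site-b": b"electronic thesis",
    "site-c": b"electronic thesiz",
})
print(report["suspect"])
```

A backup alone cannot do this: with one copy and one backup that disagree, there is no way to tell which is the damaged one.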
17. “The cloud”
• Typical cloud-based storage services make
NO promises they won’t lose your stuff.
• And for large quantities of data, bandwidth can become
an issue.
• And can they look at your stuff? Should they be able to?
• Some early movers in this market are fading
• Iron Mountain had to kill their service.
• DuraCloud
• trying to finesse this issue by negotiating tougher SLAs
with cloud-storage providers
Photo: “Sky View From Humboldt Park”
http://www.flickr.com/photos/purpleslog/2589612577/
Purple Slog / CC-BY 2.0
18. Repository
and digital-library
platforms
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
19. Friendly word
of advice:
PICK SOFTWARE LAST.
Photo: “Briana Calderon; future educator of america.”
http://www.flickr.com/photos/46132085@N03/4703617843/
Arielle Calderon / CC-BY 2.0
20. Another friendly word of
advice:
DON’T CHASE
THE SHINY.
Photo: “Sparkle Texture”
http://www.flickr.com/photos/abbylanes/3214921616/
Abby Lane / CC-BY 2.0
21. Digital-library software
• Is almost always VERY BAD at digital
preservation!
• (most packages don’t even try!)
• So if a file gets corrupted on the server, or whatever...
no warnings, no restore, nothing. Also, provenance?
Who needs provenance? Event tracking? What’s that?
• I’m not saying don’t use it. I’m saying that
it doesn’t solve this problem.
• In fact, if you’re using this software, you need to solve
this problem FOR IT.
Photo: “National DIGITAL Library”
http://www.flickr.com/photos/schex/193912573/
Jesse Schexnayder / CC-BY 2.0
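"Solving this problem for it" can start very small. The sketch below assumes the software keeps its files in an ordinary directory tree: record a checksum for every file once, re-check on a schedule, and keep the results as a simple event log. The function names and the event fields are invented for this example; they do not come from any particular package or metadata standard.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large files are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> list[dict]:
    """Re-hash every file in the manifest; return one event per file."""
    now = datetime.now(timezone.utc).isoformat()
    events = []
    for name, expected in sorted(manifest.items()):
        path = root / name
        if not path.is_file():
            outcome = "missing"
        elif file_digest(path) != expected:
            outcome = "changed"
        else:
            outcome = "ok"
        events.append(
            {"when": now, "event": "fixity check", "file": name, "outcome": outcome}
        )
    return events
```

Store the manifest and the event log somewhere other than the content directory, so that whatever damages the files does not also damage the evidence. Any outcome other than "ok" is the warning the software never gave you.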
23. Institutional-repository
software
• Is SHOCKINGLY bad at digital preservation!
• (Though sometimes better than most DL software.)
• Examples
• Hosted/commercial: Digital Commons (bepress),
CONTENTdm, DigiTool
• If you go hosted, you’d better ask about their
digital-preservation practices!
• Open-source: EPrints, DSpace, Fedora
Photo: “IMG_0668”
http://www.flickr.com/photos/12967790@N00/66531124
Robert / CC-BY 2.0
24. A new approach:
curation
microservices
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
25. Do we really need
THE BLOB?
Photo: “giant crystal blob”
http://www.flickr.com/photos/a_of_doom/527905701/
A of DooM / CC-BY 2.0
26. How about a jigsaw
puzzle instead?
• Break the digital-preservation problem
down into parts.
• Code up each part, making sure that it
plays nicely with other parts.
• lots of nice APIs!
• which means other software can adopt/adapt
microservices as well!
• Put parts together as you need them.
Photo: “Lapsana Apogonoides Puzzle”
http://www.flickr.com/photos/gdesigneralex/2313092112/
gdesigneralex / CC-BY 2.0
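As a rough illustration of the jigsaw idea, the sketch below treats each "part" as a small function with the same shape: it takes the file's bytes plus the record built so far, and returns an extended record. Because every part shares that shape, parts can be combined, reordered, or left out. All names here are hypothetical, and this is not the code of any actual microservices project.

```python
import hashlib
from typing import Callable

Record = dict
# Every service has the same signature, which is what lets them snap together.
Service = Callable[[bytes, Record], Record]


def identify(data: bytes, record: Record) -> Record:
    """Very crude format identification (PDF or unknown)."""
    kind = "pdf" if data.startswith(b"%PDF-") else "unknown"
    return {**record, "format": kind}


def fixity(data: bytes, record: Record) -> Record:
    """Record a checksum for later integrity checks."""
    return {**record, "sha256": hashlib.sha256(data).hexdigest()}


def measure(data: bytes, record: Record) -> Record:
    """Record the size of the object in bytes."""
    return {**record, "size_bytes": len(data)}


def run_pipeline(data: bytes, services: list[Service]) -> Record:
    """Apply the chosen services in order, accumulating one record."""
    record: Record = {}
    for service in services:
        record = service(data, record)
    return record


# Put parts together as you need them.
full = run_pipeline(b"%PDF-1.7 sample", [identify, fixity, measure])
light = run_pipeline(b"%PDF-1.7 sample", [measure])
print(sorted(full), sorted(light))
```

The design choice that matters is the shared interface, not the individual functions: a new part (say, virus scanning) only has to accept and return the same kind of record.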
27. California Digital Library
• Pioneering this approach
• Has open-sourced code for microservices
• Has added microservices together to build
its “Merritt” storage/repository service
28. Escaping the silos:
Fedora Commons
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
29. What is Fedora Commons?
• Blueprints and foundation, not the whole
house (analogy credit to Peter Gorman)
• You build the house you want!
• Or you build condominiums on the same
foundation.
• Need different user interfaces for different materials?
• Need different structures and behaviors?
• No problem! Fedora can handle that.
• (have I run this analogy into the ground yet?)
32. E-records
management
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
33. Axioms
• Records management is
about policy and
procedures.
• If your policy doesn’t fit with
their procedures, guess what
wins? Choose battles wisely.
• There is never enough
storage space.
• Nobody cares until
there’s a crisis.
• Software will not save
you... but it might help!
Photo: “The Never Ending Math Problem”
http://www.flickr.com/photos/acidwashphotography/2967752733/
d3 Dan / CC-BY 2.0
34. Duke Data Accessioner
• Accessioning tool for digital data
• use case: J. Important Scholar dumps her hard drive
on your desk, expects you to cope
• File migrator, metadata manager, GUI,
plugins (e.g. for file-format detection)
• Bit rough, but in production use.
• http://library.duke.edu/uarchives/about/tools/data-accessioner.html
35. Archivematica
• Soup-to-nuts records management and
digital preservation tool.
• Evaluation and accessioning all the way through
preservation actions. (Oddly, they seem to be
missing disposal... but they’re in alpha, so...)
• Open source
• Runs on a Linux server; RMs and archivists log in to
GUI application remotely.
• Normally I hate and fear silos, but this one
is smartly built on microservices.
36. Practical E-Records
• Weblog by Chris Prom and protégés
• Tool evaluations, conference-session
writeups, essays on praxis
• Best reading out there for the
do-it-yourselfer
• If you’re not reading it, why not?
• http://e-records.chrisprom.com/
37. Last thoughts
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
38. If you can’t do everything...
that’s okay. Who can?
Image: “Confused”
http://www.flickr.com/photos/kristiand/3223044657/
Kristian D. / CC-BY 2.0
39. DO SOMETHING.
Photo: “Came hame háááá!”
http://www.flickr.com/photos/kristiand/3223044657/
Guirí R. Reyes / CC-BY 2.0
40. The worst threat?
INACTION.
Photo: “Fatty’s role model”
http://www.flickr.com/photos/cloudzilla/4910616774/
cloudzilla / CC-BY 2.0
41. Thank you!
This presentation is available
under a Creative Commons 3.0
United States license.
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0