2. Session overview
• Some definitions
• Digital preservation challenges
• Preservation strategies
• Non-technical issues
– Collaboration, institutional culture, legal issues, costs
• Digital Forensics
3. Definitions
• Digital stewardship, preservation, curation
– Often used interchangeably to mean active management of digital
information over time to ensure its accessibility; “stewardship” is
the broadest term
• Born digital
– Information created in digital form (not digitized!)
• Digitize (scan, reformat, “reborn” digital)
– Create a digital copy of an analog original
• Life Cycle
– Stages that digital content moves through from creation to
preservation to access
• Archive/Archival Store/Repository
– System used to accept, store and access specified information with
long-term value; provides for secure, redundant and managed
administration; protects content and ensures ongoing access
4. Pictures Rather than Words
Atlas of Digital Damages on Flickr, http://ow.ly/hrJ7i
5. The Digital Preservation Challenge
• Libraries, archives, museums and other cultural
heritage institutions have unparalleled
experience managing analog items
• Digital information is an existential test:
institutions have to figure out a new way of
doing business
• Hard, because most institutions and staff have
limited experience dealing with digital
• Hard, too, because digital presents challenges
6. Problem: Lots and Lots of Data
• Huge volume of digital information—and it is
rapidly growing
• Organizations, governments and individuals
are all information creators
• Some large chunks of this information have
value (actual or potential) from the perspective
of archives/libraries
• Which chunks to focus on?
7. Problem: Information Complexity
• Dynamic databases, websites
• Sophisticated specialty uses: CGI, CAD/CAM,
geospatial…
• Highly specialized applications dependent on
deep knowledge: scientific databases
8. Problem: Technological Dependency/Obsolescence
• Every piece of digital information depends on a stack
of technologies working perfectly together, e.g.:
– File format (pdf, html, doc)
– Storage media (cloud, hard drive, USB drive)
– Application software (reader, browser, app)
– Operating system (Windows XP, Vista, 7)
– Computing device (PC, laptop, smart phone)
• Each layer of the stack is changing
• Ensuring ongoing access requires work, careful
planning
10. What is “Preservation”?
• What does a system need to do with information to provide for
adequate preservation and access, now and in the future?
• Is saving the original files enough? Do they need to be
converted/normalized?
• What metadata needs to be available?
• How important is original “look and feel” compared with
information content?
• Answers to such questions drive strategies, approaches
11. Progress is Evident
• A number of initiatives are tackling the issue
around the world
• Using some common principles, but different
approaches
• Reasons for optimism:
– Important elements of the issue are defined
– Solid conceptual framework exists
– Biggest institutions are deeply engaged
– Extensive cooperation, sharing, open development
12. Mix of Institutional Strategies
• Build Institutional foundations
– Provide mandate and policies for a preservation program
– Trusted Digital Repositories/TRAC
• Develop Internal systems
– Build an infrastructure (Proprietary? Open Source?)
• Use External Services
– Pay for an existing infrastructure
• Learn by doing
– Identify/capture content, rely on iterative improvement
• Collaborate
– Work with others on shared approaches
• Observe and wait
13. Preservation Approaches
• Differences of opinion now exist
• New approaches may emerge in the future
• Three commonly accepted approaches today:
– Bit preservation
– Migration
– Emulation
• Can rely on one approach or a hybrid
14. Approach: Bit Preservation
Capture information in its original form and
focus on maintaining data integrity: files are
kept unchanged
Advantages
• Lower cost
• Scalable, practicable
• Works well (so far)
Disadvantages
• Useful life of data unclear
• Future functionality (look and feel) at risk
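One way to picture the data-integrity focus of bit preservation is a checksum manifest recorded at ingest. The sketch below is only an illustration: it assumes SHA-256 as the fixity algorithm, and the helper names `sha256_of` and `build_manifest` are hypothetical, not part of any particular repository system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its digest; files are only read."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Storing the manifest alongside the unchanged files gives later integrity checks a baseline to compare against.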
15. Approach: Migration
Transform/normalize data into formats and
structures that are optimal for preservation
Advantages
• Homogeneous data easier to manage, access
• Files are preserved with rich contextual metadata
• Potential to solve preservation issues once and for all
Disadvantages
• Complex ingest processes
• Loss of data, functionality
• Based on assumptions about future
• IP issues major barrier
• Scalability, practicality not proven
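A very small instance of normalization is re-encoding a legacy text file as UTF-8 while leaving the original in place. This is a sketch under stated assumptions: the source encoding (Latin-1 here) is assumed to be known in advance, and `migrate_text` is a hypothetical helper. Real migrations of richer formats involve far more than this.

```python
from pathlib import Path


def migrate_text(source: Path, target: Path, source_encoding: str = "latin-1") -> Path:
    """Write a UTF-8 copy of source at target; the source file is not modified."""
    # Work on bytes so that line endings are carried over unchanged.
    text = source.read_bytes().decode(source_encoding)
    target.write_bytes(text.encode("utf-8"))
    return target
```

Writing to a separate target keeps the original available in case the assumed encoding turns out to be wrong.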
16. Approach: Emulation
Use software to mimic behavior of obsolete
systems to access and use original data
Advantages
• Look and feel preserved
• Potential to solve access issues once and for all
• No need to process original files
Disadvantages
• Complex development: may need to emulate HW, OS, applications …
• Technology a moving target: need many emulators to reflect changes
• IP issues major barrier
• Scalability, practicality not proven
• Is the emulation right?
17. Preferred Approaches Share Basic Ideas
• No optimal system; iterative improvements will continue
• Keep the original files
• Active management essential
– Move data to new storage media about every 5 years
– Monitor data integrity with fixity checks
– Ensure data remains accessible and interpretable
• Make multiple copies and store separately
• Modular approach to tools and services
• Watch for changes in technology and user expectations
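The fixity checks and separately stored copies mentioned above can be combined into a simple audit. The following is a minimal sketch: it assumes a SHA-256 digest was recorded when the content was ingested, and `audit_copies` is a hypothetical name used for illustration only.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file; reads it whole, so large files would need chunked reads."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_copies(copies: list[Path], expected: str) -> dict[str, str]:
    """Report each copy as 'ok', 'corrupt' or 'missing' against the recorded digest."""
    report = {}
    for copy in copies:
        if not copy.is_file():
            report[str(copy)] = "missing"
        elif file_digest(copy) == expected:
            report[str(copy)] = "ok"
        else:
            report[str(copy)] = "corrupt"
    return report
```

A copy reported as corrupt or missing can then be replaced from one that still matches the recorded digest.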
18. Preferred Approaches Are Open
• Open architectures:
– Allow adding, upgrading and swapping system
components from different vendors and sources
– Essential not to be locked into one approach: must be able
to easily move data to new platform
– Systems should support interoperability
• Open Standards:
– Published, widely used, consensus based
– Can include open source or commercial products
– Key is transparent understanding of technical basis to
enable data access, manipulation
19. Important Non-Technical Issues
• Collaboration: new models needed for institutions,
communities to work together
• Institutional culture: new policies are needed; leaders need to
integrate analog and digital management; staff need
new skills
• Cost: many variables; economic sustainability is an
issue
20. Copyrighted, Private, Confidential
• Exceptions in U.S. copyright law for libraries
& archives are outdated
– 3-copy limit
• Societal norms and expectations for privacy
are shifting
– especially on the Internet
• Data mining and other techniques allow for
new kinds of access and new policies
– Social media, personal information
21. Digital Forensics
• Tools and approaches for protecting and
extracting digital information
• Special relevance for all types of digital media,
personal digital archiving
• Basic principles:
– Acquire evidence without alteration
– Do work in accountable, repeatable way
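The two principles can be sketched in a few lines: read the source without writing to it, hash it before and after, and log what was done. This is an illustration only. The function names and log format are hypothetical, and opening a file read-only in software is not a substitute for a hardware write blocker.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def acquire(source: Path, image: Path, log: Path) -> str:
    """Copy source to image, verify by hash, and append a JSON line to the log."""
    before = sha256_file(source)
    # "rb" never writes to the source; "xb" refuses to overwrite an existing image.
    with source.open("rb") as src, image.open("xb") as dst:
        for chunk in iter(lambda: src.read(1 << 20), b""):
            dst.write(chunk)
    after = sha256_file(source)
    copied = sha256_file(image)
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "source": str(source),
        "image": str(image),
        "sha256": copied,
        "verified": before == after == copied,
    }
    with log.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    if not entry["verified"]:
        raise ValueError("hash mismatch: source changed or copy is incomplete")
    return copied
```

The append-only log line records when the copy was made and which digest it matched, which is what makes the step accountable and repeatable.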
22. Work with Current & Archaic Data
• Must handle current digital information from
mobile devices, networks, live data on remote
computers, flash media, virtual machines,
cloud services and encrypted sources
• Also deal with older information on all
imaginable media: 8-inch floppy disks, punch
cards, ancient hard drives
• Everything to do with computing is either
obsolete or rapidly headed that way
23. Personal Archiving
• “Personal papers” increasingly digital
• Social media, web largely driven by personal
creation
• Personal content characterized by highly
inconsistent structures, formats, provenance
• High risk of incompleteness, questionable
authenticity
24. Forensic Life Cycle (Partial)
• Securing and Evaluating the Scene: ensure safety, confirm computer
equipment present, secure equipment, identify and protect evidence,
conduct interviews
• Documenting the Scene: create a permanent record of the scene by
means of photography and note taking, document condition and location
of computers
• Evidence Collection: collect computer hardware and media while
preserving evidential value, obtain analogue evidence such as passwords,
handwritten notes, computer manuals, printouts
• Forensic Imaging and Copying: e.g. for hard drive – removal of physical
disk from computer, digital preview and capture using physical or logical
disk acquisition, with writeblockers, followed by return of original media
to evidence custodian
Source: Digital Forensics and Preservation, DPC Technology Watch Report 12-03
25. Summary
• Digital information presents tough issues in terms of
preservation and access
• Libraries and archives must address these issues even though
there are no ideal solutions and some open questions
• Initiatives are underway around the world testing different
approaches to preservation
• There are a number of significant non-technical issues
• Digital preservation is also relevant on the personal level;
digital forensics is an emerging sub-specialty
26. For More Information: A Partial List
• Digital preservation: an introduction, UKOLN, http://ow.ly/hpoWr
• An Introduction to Digital Preservation, JISC Digital Media,
http://ow.ly/hpp7A
• Curation Reference Manual, Digital Curation Centre, http://ow.ly/hppeR
• Digital Preservation Handbook, Digital Preservation Coalition,
http://ow.ly/hppk2
• Digital Preservation Management Tutorial, Inter-university Consortium for
Political and Social Research, University of Michigan, http://ow.ly/hpprU
• Harnessing the Power of Digital Data for Science and Society, Report of the
Interagency Working Group on Digital Data to the Committee on Science of
the National Science and Technology Council, http://ow.ly/hppxC
• International Study on the Impact of Copyright Law on Digital Preservation,
Library of Congress, JISC, OAK Law, SURFfoundation, http://ow.ly/hppBs
• National Digital Information Infrastructure and Preservation Program,
Library of Congress, http://ow.ly/hppHP
27. For More Information: A Partial List-2
• LIFE3: A Predictive Costing Tool For Digital Collections, Life Cycle Information
for E-Literature, University College London Library Services and the British
Library, http://ow.ly/hpoI7
• Open Planets Foundation, http://ow.ly/htqEw
• Preserving Moving Pictures and Sound, DPC Technology Watch Report 12-01,
March 2012, http://ow.ly/hoYQx
• Digital Forensics and Preservation, DPC Technology Watch Report 12-03,
November 2012, http://ow.ly/hoZiW
• Digital Forensics and Born Digital Content in Cultural Heritage Collections,
http://ow.ly/hpnn3
• Library of Congress digital preservation blog, The Signal, http://ow.ly/hpq0F
• National Digital Stewardship Alliance, Digital Preservation Glossary,
http://ow.ly/hua7X