Digital preservation from a records management perspective
1. Digital Preservation from a Records
Management Perspective
Michael Day
Research and Development Team Leader
UKOLN, University of Bath
Digital Preservation Roadshow, Manchester, 10 December 2009
UKOLN is supported by:
www.ukoln.ac.uk
A centre of expertise in digital information management
2. Presentation outline
• Records management
• Digital preservation basics
– Digital preservation challenges
– The OAIS Reference Model
– Digital preservation principles and strategies
– Digital preservation tools:
• Preservation planning (Plato)
• Repository audit (TRAC, DRAMBORA)
• Case studies:
– E-mail
– Websites
3. Records management (1)
• ISO 15489:2001
– Defines records management as “The field of
management responsible for the efficient and systematic
control of the creation, receipt, maintenance, use and
disposition of records, including the processes for
capturing and maintaining evidence of and information
about business activities and transactions in the form of
records”
4. Records management (2)
• ISO 15489:2001 states that records management
includes:
– setting policies and standards;
– assigning responsibilities and authorities;
– establishing and promulgating procedures and
guidelines;
– providing a range of services relating to the management
and use of records;
– designing, implementing and administering specialized
systems for managing records; and
– integrating records management into business systems
and processes.
5. Digital preservation challenges (1)
• Technical challenges
– Digital media
• Currently magnetic or optical tape and disks, and some
solid-state devices (e.g., memory sticks)
• Uncertain lifetimes
– Hardware and software dependence
• Most digital objects are dependent on particular
configurations of hardware and software
• Relatively short obsolescence cycles
6. Digital preservation challenges (2)
• Conceptual challenges:
– Three levels of information required:
• Physical layer – usually a bitstream
• Logical layer – defines how to interpret the bitstream
(through software) to generate meaningful information
(e.g. ASCII, XML, file formats)
• Conceptual layer – real world objects
– Some are analogues of traditional objects, e.g.
meeting minutes, research papers
– Others are not, e.g. Web pages, GIS, 3D models
of chemical structures
» Complex and dynamic
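The physical/logical distinction can be illustrated with a minimal Python sketch (not part of the original talk): one fixed sequence of bytes is read under two different character encodings. The bytes (physical layer) never change; only the decoding rules (logical layer) differ, and only one of them reproduces the intended text (conceptual layer). The sample text and the two encodings are arbitrary choices for illustration.

```python
# Physical layer: a fixed sequence of bytes (the bitstream).
bitstream = "Café minutes".encode("utf-8")


def interpret(data: bytes, encoding: str) -> str:
    """Logical layer: apply one set of decoding rules to the raw bytes."""
    return data.decode(encoding)


# The same bits read under two different logical interpretations.
as_utf8 = interpret(bitstream, "utf-8")      # the rules the creator used
as_latin1 = interpret(bitstream, "latin-1")  # same bits, wrong rules

print(as_utf8)    # Café minutes
print(as_latin1)  # CafÃ© minutes
```

Keeping the bits alone is therefore not enough: the rules for interpreting them (here just an encoding name) have to be recorded as well, which is the role OAIS gives to Representation Information.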
7. Digital preservation challenges (3)
– On which of the three layers should preservation
activities focus?
• We need to preserve the ability to reproduce the
objects, not just the bits
• In fact, we can change the bits and logical
representation and still reproduce an ‘authentic’
conceptual object (e.g. by converting a text file into
PDF or TIFF)
• Authenticity and integrity
– How can we trust that an object is what it claims to be?
– Digital information can easily be changed by accident or
design
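Integrity is usually checked with fixity values. A minimal sketch, assuming SHA-256 as the checksum algorithm (any strong hash would serve) and using Python's standard `hashlib` module: a digest is recorded when the object is captured and recomputed at a later audit. A matching digest shows that the bits have not changed since capture; on its own it cannot show that the object was authentic in the first place. The sample record is invented.

```python
import hashlib


def fixity(data: bytes, algorithm: str = "sha256") -> str:
    """Return a hex digest used as a fixity value for a digital object."""
    return hashlib.new(algorithm, data).hexdigest()


def verify(data: bytes, recorded: str, algorithm: str = "sha256") -> bool:
    """Check that an object still matches the fixity value recorded earlier."""
    return fixity(data, algorithm) == recorded


original = b"Minutes of the board meeting, 10 December 2009"
recorded_at_ingest = fixity(original)

# Later audit: an unchanged copy passes, a one-character edit fails.
tampered = original.replace(b"10", b"11")
print(verify(original, recorded_at_ingest))  # True
print(verify(tampered, recorded_at_ingest))  # False
```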
8. Digital preservation basics
• An ongoing approach to managing digital content
based on:
– The identification and adoption of appropriate
preservation strategies
• Creation or Ingest stages are normally the best time
to ensure that data are fit-for-purpose and
“preservable”
– The collection and management of appropriate metadata
• Capture of explicit and implicit knowledge, contexts
– The ongoing monitoring of technical contexts and the
application of preservation planning techniques
– Continual monitoring of the organisation (audit)
9. OAIS Reference Model (1)
• Reference Model for an Open Archival Information
System (OAIS)
– ISO 14721:2003 Space data and information transfer
systems -- Open archival information system --
Reference model
– Defines:
• Common vocabulary (definitions of key concepts)
• Information model (information packages, metadata,
etc.)
• Functional model (six functional entities)
• Mandatory responsibilities
10. OAIS Reference Model (2)
• OAIS Mandatory Responsibilities:
– Negotiating and accepting information
– Obtaining sufficient control of the information to ensure
long-term preservation
– Determining the "designated community"
– Ensuring that information is independently
understandable, i.e. can be (re)used without the
assistance of those who produced it
– Following documented policies and procedures
– Making the preserved information available
11. OAIS Reference Model (3)
[Figure: OAIS Functional Entities (Figure 4-1). Producer and Consumer on either side, Management below; inside the OAIS, Ingest, Archival Storage, Data Management, Access, Preservation Planning and Administration exchange SIPs, AIPs, DIPs, descriptive information, queries, result sets and orders.]
12. OAIS Reference Model (4)
• OAIS Information Model:
– Defines the “Information Packages” required
• Ingest (Submission Information Package)
• Storage (Archival Information Package)
• Access (Dissemination Information Package)
– General principle of Information Packages:
• All objects are wrapped in multiple layers of metadata
(Representation Information, Descriptive Information,
Packaging, etc.)
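The wrapping principle can be sketched as a data structure. The Python below is a simplified, hypothetical reading of the information model: the field names and the `ingest`/`disseminate` helpers are illustrative assumptions, not a one-to-one mapping of OAIS terms or the interface of any real system. The `preservation_info` field stands in for what OAIS calls Preservation Description Information. Ingest turns a SIP into an AIP by adding preservation metadata (here only a fixity value and a provenance note); Access derives a DIP from the AIP.

```python
import hashlib
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class InformationPackage:
    """Content wrapped in metadata layers (a simplified, illustrative model)."""
    kind: str                  # "SIP", "AIP" or "DIP"
    content: bytes             # the digital object itself
    representation_info: dict  # how to interpret the bits, e.g. format
    descriptive_info: dict     # supports discovery, e.g. title
    preservation_info: dict = field(default_factory=dict)  # e.g. fixity, provenance


def ingest(sip: InformationPackage) -> InformationPackage:
    """Hypothetical Ingest step: turn a SIP into an AIP with preservation metadata."""
    if sip.kind != "SIP":
        raise ValueError("Ingest expects a Submission Information Package")
    pdi = dict(sip.preservation_info)
    pdi["fixity_sha256"] = hashlib.sha256(sip.content).hexdigest()
    pdi["provenance"] = list(pdi.get("provenance", [])) + ["ingested"]
    return replace(sip, kind="AIP", preservation_info=pdi)


def disseminate(aip: InformationPackage) -> InformationPackage:
    """Hypothetical Access step: derive a DIP from a stored AIP."""
    if aip.kind != "AIP":
        raise ValueError("Access expects an Archival Information Package")
    return replace(aip, kind="DIP")


sip = InformationPackage(
    kind="SIP",
    content=b"%PDF-1.4 example",
    representation_info={"format": "PDF/A-1b"},
    descriptive_info={"title": "Committee minutes"},
)
aip = ingest(sip)
dip = disseminate(aip)
print(aip.kind, dip.kind, aip.preservation_info["provenance"])  # AIP DIP ['ingested']
```

In a real repository a DIP would often differ from the AIP (a derived access copy, or a subset of the metadata); passing it through unchanged here just keeps the sketch short.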
13. OAIS Reference Model (5)
• Implementation fundamentals:
– OAIS is a reference model (a conceptual framework),
NOT a blueprint for system design
– It informs the design of system architectures, the
development of systems and components
– It provides common definitions of terms … a common
language, a means of making comparison
– But it does NOT ensure consistency or interoperability
between implementations
– Conformance only relates to mandatory responsibilities
and following the information model
14. OAIS and records management
OAIS | Records management
Main focus on system functions and information flows | Main focus on wider organisational needs
Ingest function implies a “custodial” model | Records management fully integrated with the business function
Fixity (bit level) | Authenticity
Negotiating and accepting information | Appraisal
15. The DCC Lifecycle Model
• Digital Curation:
– “…The activity of managing and promoting the use of
data from its point of creation, to ensure it is fit for
contemporary purpose, and available for discovery and
re-use” (Lord & MacDonald, 2003)
• DCC Digital Curation Lifecycle Model:
– Focused on the entire lifecycle of objects (influenced by
records management and archives thinking) from
creation, through appraisal, ingest, storage, to access
and reuse
– Preservation activities at the core of the model …
17. Digital preservation principles (1)
• Most of the technical problems associated with
long-term digital preservation can be solved if a
life-cycle management approach is adopted
– i.e. a continual programme of active management
– Ideally, combines both managerial and technical
processes, e.g., as in the OAIS Reference Model
– Many current preservation systems are attempting to
support this approach
– Digital preservation strategies need to be seen in this
wider context
• Wherever possible, also retain the original byte-stream
18. Digital preservation principles (2)
• Preservation needs to be considered at a very
early stage in an object's life-cycle
• There is a need to identify 'significant properties'
– Recognises that preservation is context dependent, even
user specific (concept of 'designated community')
– “Performance” model (National Archives of Australia)
– Helps with choosing an acceptable preservation strategy
• Encapsulation
– Surrounding the digital object (at least in theory) with all
of the information needed to decode and understand it
(including software)
19. Digital preservation principles (3)
• Metadata and documentation are vitally important
– Relates to OAIS Information Model concepts like
Representation Information and Preservation Description
Information
– Functions
• Records meaning
• Records the context
• Enables the development of finding aids
– Specific standards are being developed that support
digital preservation activities (e.g., the PREMIS Data
Dictionary)
20. Digital preservation strategies
• Technology preservation
– Maintaining technology
• Computer museums, digital archaeology
• Emulation
– Running original bit-streams and application software on
emulator programs that mimic the behaviour of obsolete
hardware and operating systems
• Migration
– Periodic transfer of digital information from one hardware
and software configuration to another, or from one
generation of computer technology to a subsequent one
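Migration can be illustrated on a small scale with character encodings. In this hypothetical Python sketch, a legacy text file in the Windows-1252 (`cp1252`) encoding is migrated to UTF-8: the bits change, but the characters, treated here as the significant property, do not. The original byte-stream is kept alongside the migrated copy, following the principle of retaining the original where possible. The function name and the shape of its return value are assumptions for illustration.

```python
def migrate_text(original: bytes, source_encoding: str = "cp1252",
                 target_encoding: str = "utf-8") -> dict:
    """Migrate text between encodings, keeping the original byte-stream."""
    text = original.decode(source_encoding)   # interpret with the old rules
    migrated = text.encode(target_encoding)   # re-encode with the new rules
    # Verify the significant property chosen here: the character content.
    if migrated.decode(target_encoding) != text:
        raise ValueError("migration altered the character content")
    return {
        "original": original,  # retained rather than discarded
        "migrated": migrated,
        "event": f"migrated {source_encoding} -> {target_encoding}",
    }


legacy = "Résumé: €100".encode("cp1252")
result = migrate_text(legacy)
print(result["original"] != result["migrated"])  # True: the bits changed
print(result["migrated"].decode("utf-8"))        # Résumé: €100
```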
21. Choosing a strategy (1)
• Preservation strategies are not in competition
– Different strategies can work together, and there may be
value in diversification
– Migration strategies mean difficult choices need to be
made about target formats
• But the strategy chosen has implications for:
– The technical infrastructure required (and metadata)
– Collection management priorities
– Rights management
• Owning the rights to re-engineer software
– Costs
22. Choosing a strategy (2)
• Plato preservation planning tool (EU Planets
project)
– A decision support tool that helps users evaluate
potential preservation solutions against specific
requirements and build a plan for preserving a given set
of objects
– Integrates file format identification (using DROID); some
migration services; XML-based generic format
characterisation using XCL (eXtensible Characterisation
Languages)
– http://www.ifs.tuwien.ac.at/dp/plato/intro.html
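The general idea of evaluating alternatives against weighted requirements can be sketched as a simple weighted-sum utility calculation. This is not Plato's code or its exact method; the requirements, weights and scores below are hypothetical, standing in for values that would come from an organisation's own requirements and from experiments on sample objects.

```python
# Hypothetical requirements with relative weights (summing to 1.0).
weights = {
    "preserves_layout": 0.4,
    "open_format": 0.3,
    "tool_support": 0.2,
    "cost": 0.1,
}

# Hypothetical scores (0-5) from testing each alternative on sample objects.
alternatives = {
    "migrate to PDF/A": {"preserves_layout": 5, "open_format": 4, "tool_support": 4, "cost": 3},
    "migrate to TIFF": {"preserves_layout": 4, "open_format": 4, "tool_support": 5, "cost": 2},
    "keep original": {"preserves_layout": 5, "open_format": 1, "tool_support": 2, "cost": 5},
}


def utility(scores: dict, weights: dict) -> float:
    """Weighted sum of an alternative's scores against the requirements."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


ranked = sorted(alternatives, key=lambda name: utility(alternatives[name], weights),
                reverse=True)
for name in ranked:
    print(f"{name}: {utility(alternatives[name], weights):.2f}")
```

A plain weighted sum lets a strong score on one criterion compensate for a weak one on another; a real plan would also need to rule out alternatives that fail a mandatory requirement outright.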
23. Preservation support on ingest
• Formats can be identified and validated on ingest
or deposit into a repository
– JHOVE (JSTOR/Harvard Object Validation Environment)
– PRONOM, DROID (The National Archives)
• Metadata
– Some tools exist for the automatic capture of metadata
• Standardisation on ingest
– Received wisdom suggests the adoption of open or non-
proprietary standards, e.g. databases structured in XML,
uncompressed images, 'preservation friendly' standards
like PDF/A
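Format identification tools such as DROID work by matching byte sequences (signatures) recorded in the PRONOM registry. The Python sketch below shows the basic idea with a small hand-written table of well-known leading "magic numbers". The table, the format names and the `identify` helpers are assumptions for illustration, not PRONOM data or DROID's API, and real PRONOM signatures can also match at other positions in a file.

```python
# A tiny, hand-written table of signatures found at the start of a file.
# Real tools such as DROID use the much larger PRONOM signature registry.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP container (e.g. DOCX, ODT)"),
]


def identify(data: bytes) -> str:
    """Return a format name for the leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read just enough of a file to match the signatures above."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


print(identify(b"%PDF-1.4\n"))           # PDF
print(identify(b"\x89PNG\r\n\x1a\n..."))  # PNG image
print(identify(b"plain text"))           # unknown
```

Identification is not validation: a file can start with `%PDF-` and still be malformed, or fail to conform to PDF/A. Checking conformance is the job of a validator such as JHOVE.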
24. Repository audit frameworks
• Repository audit frameworks first developed out of
the OAIS Reference Model
– OAIS Mandatory Responsibilities (only six of them):
• The main focus was on technical and organisational
aspects, e.g.:
– That repositories ensure that preserved
information (content) can be understood
(independently understandable)
– That documented policies and procedures are
being followed
• No clear concept of OAIS compliance (although this is
often claimed by system developers)
25. TRAC Criteria and Checklist (1)
• Trusted Repositories Audit and Certification
(TRAC): Criteria and Checklist
– Background:
• Checklist developed by the RLG-NARA Digital
Repository Certification Task Force
• Revised (following pilot audits) by the Center for
Research Libraries and OCLC
• Based upon OAIS concepts
26. TRAC Criteria and Checklist (2)
• TRAC criteria cover three main aspects:
– Organisational Infrastructure
• Governance and viability, structure and staffing, financial
sustainability, contracts, etc.
– Digital Object Management
• Ingest, preservation planning, archival storage, etc.
– Technologies, Technical Infrastructure, & Security
• Systems and infrastructure, etc.
27. TRAC Checklist example page
28. DRAMBORA
• DRAMBORA (Digital Repository Audit Method
Based on Risk Assessment)
– Digital Curation Centre / Digital Preservation Europe
– “Presents a methodology for self-assessment,
encouraging organisations to establish a comprehensive
self-awareness of their objectives, activities and assets
before identifying, assessing and managing the risks
implicit within their organisation”
– Identifying risks and scoring each one on likelihood and
impact
– Covers: organisational context, policies, assets, risks,
etc.
– Online tool (http://www.repositoryaudit.eu/about/)
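The likelihood-and-impact scoring can be sketched as a small risk register. In this Python sketch, the 1 to 6 scales and the rule that severity equals likelihood multiplied by impact are assumptions made for illustration (the DRAMBORA documentation defines its actual scoring scheme), and the risks listed are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    identifier: str
    description: str
    likelihood: int  # assumed scale: 1 (very unlikely) to 6 (near certain)
    impact: int      # assumed scale: 1 (negligible) to 6 (catastrophic)

    def __post_init__(self) -> None:
        for name in ("likelihood", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 6:
                raise ValueError(f"{name} must be between 1 and 6, got {value}")

    @property
    def severity(self) -> int:
        # Assumed rule: severity grows with both likelihood and impact.
        return self.likelihood * self.impact


register = [
    Risk("R01", "Loss of key technical staff", likelihood=3, impact=5),
    Risk("R02", "Storage media failure", likelihood=4, impact=3),
    Risk("R03", "Format obsolescence for legacy documents", likelihood=2, impact=4),
]

# Address the most severe risks first.
for risk in sorted(register, key=lambda r: r.severity, reverse=True):
    print(f"{risk.identifier} severity={risk.severity}: {risk.description}")
```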
29. Repository audit frameworks
• A means of "asking the right questions" about your
repository and documenting appropriate
procedures and risks
• Both TRAC and DRAMBORA are under
consideration by (different) ISO technical
committees
– External badge of quality (a "certified preservation
repository") vs. management tool for self-assessment
30. Case study 1: E-mail preservation
• Electronic Mail
– Now ubiquitous in many business contexts
– A mixture of records and other stuff
– High-risk if not managed properly:
• Loss of accountability, efficiency, public credibility,
organisational memory, etc.
• There may also be legal and financial consequences
– An obvious candidate for the records management
approach
31. Some specific challenges of E-mail
• Inappropriate content
– For example: spam, personal messages, illegal content
• Wide range of attachment types – some will
provide preservation challenges of their own
• Unclear responsibilities:
– Users can be reluctant to ‘manage’ incoming mail
– E-mail seen as personal domain, not as organisational
property ... this can have consequences …
33. "All staff will be reminded of the appropriate use of Number 10
resources" – Downing Street spokesperson
35. “The unfortunate incident that has taken
place through the illegal hacking of the
private communications of individual
scientists …” (Rajendra Pachauri,
Chairman of the UN Intergovernmental
Panel on Climate Change, statement, 4
Dec 2009, http://www.ipcc.ch/)
“Since emails are normally intended to be private,
people writing them are, shall we say, somewhat
freer in expressing themselves than they would in a
public statement” (RealClimate Web pages,
http://www.realclimate.org/)
36. Approaches to managing e-mail
• Developing specific policies for managing email
within an organisation
– Produce guidance for creators (and others)
– Identify the chain of custody through lifecycle
– Need to involve everyone concerned, e.g. creators,
managers, records managers, IT staff, etc.
• Developing a preservation approach
– Appraisal - the identification of key e-mail content or
records
– Preservation strategies – the adoption of suitable
strategies for the content that needs to be retained
37. E-mail policies (1)
• Policies need to cover:
– Creation practices
– Using business e-mail accounts for private use & vice
versa
– Levels of organisational monitoring
– Legal issues
– Integrated records retention and preservation
– Disposal
38. E-mail policies (2)
From: http://www.hm-treasury.gov.uk/about_record_mngmnt_pol.htm
39. E-mail preservation
– Appraisal
• Determining what content needs to be preserved
• Destruction of transient/unnecessary e-mails
– Saving e-mail records independently of the e-mail client
– Check that content is complete: message body, headers
& attachments
– Consider authenticity requirements
– Ingest into an organisational EDRMS or repository
– Make decisions on appropriate preservation strategies for
content and attachments
• Selecting a standard format?
• Significant properties?
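Saving e-mail independently of the client usually means working from messages exported in the standard Internet message format (e.g. `.eml` files). As a minimal sketch, Python's standard `email` package can parse such a message and report on the three components listed above: headers, body and attachments. The list of required headers is a hypothetical local policy choice, and the sample message and addresses are invented.

```python
import email
from email import policy
from email.message import EmailMessage

# Hypothetical policy: headers every preserved message should carry.
REQUIRED_HEADERS = ("From", "To", "Date", "Subject", "Message-ID")


def completeness_report(raw: bytes) -> dict:
    """Check a saved message for headers, a readable body and attachments."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain", "html"))
    return {
        "missing_headers": [h for h in REQUIRED_HEADERS if msg[h] is None],
        "has_body": body is not None and bool(body.get_content().strip()),
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


# Build a sample message to stand in for an exported .eml file.
sample = EmailMessage()
sample["From"] = "clerk@example.org"
sample["To"] = "records@example.org"
sample["Subject"] = "Board minutes"
sample["Date"] = "Thu, 10 Dec 2009 10:00:00 +0000"
sample.set_content("Minutes attached.")
sample.add_attachment(b"%PDF-1.4 example", maintype="application",
                      subtype="pdf", filename="minutes.pdf")

report = completeness_report(sample.as_bytes())
print(report)
```

Here the report flags the missing `Message-ID` header, which an appraisal workflow could treat as a reason to investigate before ingest.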
40. Lost e-mails from the past
• The world’s very first network email
– Sent by Ray Tomlinson (BBN Technologies), late 1971
– A test message, probably something like
“QWERTYUIOP” (documented, but not preserved – the
contents were “entirely forgettable, and I have, therefore,
forgotten them”)
– First ‘real’ message explained to colleagues how to send
messages over the network (exact text now unknown)
– Probably no significant records management
implications, but a key step in the historical development
of the Internet was not recorded
41. Case study 2: Preserving Websites
• Websites are ubiquitous:
– “The Web has become the platform and interface of
choice for virtually every kind of information system”
(JISC-PoWR Handbook)
– Typically run by IT staff (e.g., Web managers), whose main
responsibilities are keeping systems online, stable, secure
and up-to-date, while the content itself is constantly
evolving
– Potential role for records managers to identify which
parts of institutional Websites need to be incorporated
within RM guidelines
42. Preserving Websites (2)
• Things to consider:
– The identification / appraisal of Web records
– Change frequency
– Ownership and rights
– Databases and the “deep Web”
– The use of Content Management Systems (CMS)
– Streamed content
– The use of third-party sites
– Personalisation / Web 2.0 / social networking
43. Preserving Websites (3)
• Collection approaches:
– Various harvesting tools exist (e.g. Heritrix)
– Domain harvesting, selective capture, periodic capture
– Working with third parties – e.g.:
• European Archive (http://www.europarchive.org/)
• Internet Archive (http://www.archive.org/)
• Some examples of existing initiatives:
– UK Government Web Archive (TNA):
http://www.nationalarchives.gov.uk/webarchive/
– UK Web Archive (BL, JISC, Wellcome Library, NLW)
http://www.webarchive.org.uk/ukwa/
44. Preserving Websites (4)
• Aspects of Websites that could be preserved:
– Information Content
– Information Appearance
– Information Behaviour
– Information Relationships (e.g. links, embedded or linked
metadata)
– Change history
– Use history
– From: Kevin Ashley (ULCC), “The JISC-PoWR Handbook -
Explaining Web Preservation,” via SlideShare:
http://bit.ly/7GyJbd
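Information relationships are what a harvester such as Heritrix follows when it decides what to capture. The Python sketch below illustrates only that first step and is not how Heritrix is implemented: it parses one page with the standard `html.parser` module and records its outgoing links and embedded resources as absolute URLs. The page content and URLs are hypothetical, and nothing is fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect outgoing links and embedded resources from one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []     # navigational links (<a href>)
        self.embedded: list[str] = []  # images, scripts and stylesheets

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(urljoin(self.base_url, attributes["href"]))
        elif tag in ("img", "script") and attributes.get("src"):
            self.embedded.append(urljoin(self.base_url, attributes["src"]))
        elif tag == "link" and attributes.get("href"):
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.embedded.append(urljoin(self.base_url, attributes["href"]))


page = """<html><head><link rel="stylesheet" href="/style.css"></head>
<body><a href="policies.html">Policies</a>
<a href="http://www.example.org/">Partner</a>
<img src="logo.png"></body></html>"""

collector = LinkCollector("http://www.example.ac.uk/records/")
collector.feed(page)
print(collector.links)
print(collector.embedded)
```

Recording both kinds of relationship matters: missing embedded resources affect a page's appearance, while missing links break its place in the wider site.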
45. Conclusions
• Records management approaches fit well with
digital preservation requirements
• Both focused on:
– The identification of the specific content that needs to be
managed over a certain period of time (e.g. appraisal,
data audit, selection)
– The creation and capture of appropriate contextual
information and metadata
– The development of appropriate organisational policies
and procedures
– The consideration of organisational and technical
challenges
46. Further reading (1)
• General
– ISO 15489:2001 Information and documentation --
Records management – Part 1: General / Part 2:
Guidelines
– Paradigm Project Workbook:
http://www.paradigm.ac.uk/workbook/
– Tufts-Yale Fedora and the Preservation of University
Records: http://dca.lib.tufts.edu/features/nhprc/reports/
– Plato Preservation Planning tool:
http://www.ifs.tuwien.ac.at/dp/plato/intro.html
– DRAMBORA: http://www.repositoryaudit.eu/about/
47. Further reading (2)
• Preserving Emails:
– Maureen Pennock, “Curating E-mails,” In: DCC Curation
Manual (2006):
http://www.dcc.ac.uk/resource/curation-manual/chapters/curating-e-mails/
– The National Archives, Developing a policy for managing
e-mail (2004):
http://www.nationalarchives.gov.uk/documents/managing_emails.pdf
– Collaborative Electronic Records Project, Email records
guidance (Smithsonian Institution Archives & Rockefeller
Archives Center, 2007):
http://siarchives.si.edu/pdf/CERP_Email_guidance_supp_0307.pdf
48. Further reading (3)
• Preserving Websites:
– JISC-PoWR Handbook (Nov 2008):
http://jiscpowr.jiscinvolve.org/handbook/
– JISC-PoWR blog: http://jiscpowr.jiscinvolve.org/
– The National Archives - Web Continuity project:
http://www.nationalarchives.gov.uk/webcontinuity/
– Adrian Brown, Archiving Websites: a practical guide for
information management professionals (London: Facet
Publishing, 2006)
– Julien Masanès (ed.), Web Archiving (Berlin: Springer-
Verlag, 2006)
49. Questions?
“Pigabyte”
King Bladud’s Pigs in Bath
(public art project), Summer
2008
http://www.kingbladudspigs.org/
50. Acknowledgments
• UKOLN is funded by the Joint Information
Systems Committee (JISC) of the UK higher and
further education funding councils, the Museums,
Libraries and Archives Council (MLA), as well as
by project funding from the JISC, the European
Union, and other sources. UKOLN also receives
support from the University of Bath, where it is
based.
• More information: http://www.ukoln.ac.uk/
Reference: Thibodeau, K. (2002). "Overview of technological approaches to digital preservation and challenges in coming years." In: The state of digital preservation: an international perspective. Washington, D.C.: Council on Library and Information Resources. Available: http://www.clir.org/pubs/abstract/pub107abst.html
References: CCSDS 650.0-B-1. (2002). Reference model for an Open Archival Information System (OAIS). Available: http://www.ccsds.org/documents/650x0b1.pdf
ISO 14721:2003. Space data and information transfer systems -- Open archival information system -- Reference model. Geneva: International Organization for Standardization.
References: Nelson, M.L. (2001). "Buckets: a new digital library technology for preserving NASA research." Journal of Government Information, 28(4), 369-394. Available: http://www.cs.odu.edu/~mln/pubs/jgi/jgi-eprint.pdf
Universal Preservation Format. Available: http://info.wgbh.org/upf/