2. PRESERVATION
The purpose of preservation is to ensure the continued accessibility
of an object over time. Successful preservation requires that
- The object be accessible to users, and
- It retain its unique value to users.
Physical materials suffer damage and decay: the acid present in
paper damages its fibers, causing it to become brittle and discolored
over time. Such concerns also apply to digital objects. The physical
storage media will degrade or may become corrupted over time.
3. Digital Information
Digital information is saved in the form of bits (ones and
zeros), which represent values in binary notation. Such
information cannot be directly interpreted by the user, but rather
requires the mediation of software capable of translating that
information into human-readable form.
(See Fig. 6.1 on page 83.)
4. Digital Preservation
Digital preservation requires the management of objects
over time, using techniques that may result in frequent and
profound changes to the technical representation of those objects.
There is no significant difference between the preservation of
web resources and any other digital object, and the same
techniques can be applied in each case.
5. Emulation
Emulation is the process of creating a ‘virtual’ version of the
original environment that was used to access a given file. The
virtualized environment is accessed via an emulation application
on modern hardware and software. This allows access to the
original content to be maintained (without changing this content),
through the emulated computer.
Emulation attempts to retain the experience, the original form
of the data, and to a degree the performance, but does not
necessarily retain the original form or performance of the
hardware.
6. How Emulation Works:
1. A contemporary access environment for a digital object (one from
the object's own era) is encapsulated into an emulated (copy) environment;
2. The emulated environment is accessed using a current
hardware and software platform; and
3. By using the current hardware and software platform to access
the emulated environment, the emulated environment is used
to access the target file.
7. Migration
Migration is the process of converting a piece of digital content
from its original file format into a new format that can more
easily be accessed without having to maintain contemporary
software and hardware.
The basic premise is that the file format
needs to be changed. It might be preferable to store the properties
that have been identified as significant across multiple files, or
using multiple storage mechanisms (e.g., a file and a database).
8. How Migration Works
1. The file is acquired in its original format; and
2. The file is converted to another format.
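The two steps above can be illustrated with a minimal sketch. The choice of CSV as the original format and JSON as the target format is only an assumption for the example, and the function name migrate_csv_to_json is hypothetical; a real migration would pick formats according to the preservation plan.

```python
import csv
import io
import json


def migrate_csv_to_json(csv_text: str) -> str:
    """Convert content from its original format (CSV) to a new format (JSON)."""
    # Step 1: acquire the content in its original format.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Step 2: write the same content out in the target format.
    return json.dumps(rows, indent=2)


# Illustrative content only; not taken from any real collection.
original = "title,year\nAnnual report,2004\nNewsletter,2005\n"
migrated = migrate_csv_to_json(original)
print(migrated)
```

Note that the content (titles and years) survives the change, while the technical representation does not: this is the sense in which migration alters the file format but aims to keep the significant properties.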
9. Renders
An application that runs with current hardware and software is used
to access the digital object.
• The software itself could either be written internally, or procured
from another party.
• It could either be a first-party application, if it is written by the
same organization responsible for creating the file format, or a
third-party application in all other cases.
10. PRONOM
The National Archives (TNA) has been actively collecting,
preserving, and making available electronic records for nearly 10
years. TNA’s approach to digital preservation is founded on two
fundamental activities:
- Passive preservation, which provides secure storage, and
- Active preservation, which ensures the continued accessibility
of the stored records over time and across changing technologies.
11. Active preservation
Active preservation generates new technical manifestations of
objects through processes such as format migration or
emulation, to ensure their continued accessibility within
changing technological environments.
13. Technical Registries
A registry is an information source that provides a common
reference point for a particular community of users. By registering the
key information concepts, the community can benefit from a shared
understanding of what those concepts mean; in effect it provides a
common vocabulary.
In the case of a technical registry in digital preservation, these concepts
relate to the technical dependencies of digital objects.
For example, if one object is described as being in JPEG format and
another as being in JFIF 1.02 format, how can we tell whether the two
formats are the same?
A file format registry containing standard definitions of each
format provides a solution: if everyone describes formats with reference
to the registry definition, then all ambiguity is removed. A
standard referencing mechanism can be provided if each registry record
is also assigned a persistent unique identifier.
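The idea of resolving different descriptions to one registry record can be sketched as follows. This is a toy registry, not PRONOM: the identifiers ("example-fmt/1" and so on), the record contents, and the alias lists are all invented for illustration, and the function names resolve and same_format are hypothetical.

```python
from typing import Optional

# Hypothetical registry: each record has a made-up persistent unique
# identifier and the descriptions (aliases) that are taken to refer to it.
REGISTRY = {
    "example-fmt/1": {
        "name": "JPEG File Interchange Format",
        "version": "1.02",
        "aliases": {"jpeg", "jfif1.02", "jfif 1.02"},
    },
    "example-fmt/2": {
        "name": "Portable Network Graphics",
        "version": "1.0",
        "aliases": {"png"},
    },
}


def resolve(description: str) -> Optional[str]:
    """Return the registry identifier for a free-text format description."""
    key = " ".join(description.lower().split())
    for identifier, record in REGISTRY.items():
        if key in record["aliases"]:
            return identifier
    return None


def same_format(a: str, b: str) -> bool:
    """Two descriptions match only if both resolve to the same record."""
    id_a, id_b = resolve(a), resolve(b)
    return id_a is not None and id_a == id_b


print(same_format("JPEG", "JFIF1.02"))
print(same_format("JPEG", "PNG"))
```

Whether two real-world labels actually denote the same format is decided by the registry's definitions, not by the code; the sketch only shows how a shared identifier removes the ambiguity once those definitions exist.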
14. Technical Registries
Not only file formats benefit from registries: their use can
potentially be extended to every element of the representation
network, including
- Character encoding schemes
- Compression algorithms
- Software
- Operating systems
- Hardware and storage media
PRONOM, the first such operational registry, was developed by
The National Archives (TNA) of the UK in 2003 and is
available as a free online resource.
15. Characterization
Before any object can be preserved, it must be understood with
sufficient technical precision. Specifically, it is necessary to
understand the significant properties of the object, which must be
preserved over time if it is to be regarded as authentic, and its
technical characteristics, which will influence the specific
preservation strategies that may be employed. For example, the
resolution and color depth of an image are likely to be considered
fundamental properties to preserve.
Characterization comprises three discrete stages:
- Identification
- Validation
- Property Extraction
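As an illustration of the image example above, the following sketch reads the pixel dimensions and bit depth from the header of a PNG file. PNG is chosen here only as an assumption for the example; the function name png_properties is hypothetical, and the sample bytes are a hand-built header (signature plus the start of the IHDR chunk), not a complete image.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_properties(data: bytes) -> dict:
    """Extract width, height, bit depth and color type from a PNG header."""
    # A PNG starts with an 8-byte signature, then the IHDR chunk:
    # 4-byte length, 4-byte type "IHDR", then the header fields.
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG stream with a leading IHDR chunk")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,  # bits per sample, not per pixel
        "color_type": color_type,
    }


# Hand-built header for a 640 x 480 image, 8 bits per sample, color type 2.
sample_header = (
    PNG_SIGNATURE
    + struct.pack(">I", 13)
    + b"IHDR"
    + struct.pack(">IIBBBBB", 640, 480, 8, 2, 0, 0, 0)
)
print(png_properties(sample_header))
```

Recording such values before a preservation action makes it possible to check afterwards that they have been retained.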
16. Identification: Identification is typically performed using some
form of signature, a digital ‘fingerprint’ which
is unique to a specific format. The simplest
signature is provided by a file extension.
DROID: The DROID (Digital Record Object Identification) software
developed by TNA is an example of an identification tool
that uses both internal and external signatures to perform
automated batch identification of file formats.
Validation: This determines whether the object is well formed
and valid against its formal specification.
Property Extraction: This extracts the properties of the object which are
significant to its long-term preservation.
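A minimal sketch of signature-based identification follows. It is not DROID's algorithm or signature file: the function name identify and the two small tables are assumptions for the example, holding only a few well-known leading byte sequences (internal signatures) and file extensions (external signatures).

```python
import os

# Internal signatures: byte sequences found at the start of the file.
INTERNAL_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]

# External signatures: the file extension, the simplest kind of signature.
EXTERNAL_SIGNATURES = {
    ".png": "PNG",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".pdf": "PDF",
    ".gif": "GIF",
}


def identify(filename: str, data: bytes) -> str:
    """Identify a format, preferring internal over external signatures."""
    for magic, fmt in INTERNAL_SIGNATURES:
        if data.startswith(magic):
            return fmt
    # Fall back to the extension when no internal signature matches.
    extension = os.path.splitext(filename)[1].lower()
    return EXTERNAL_SIGNATURES.get(extension, "unknown")


print(identify("scan.dat", b"%PDF-1.4\n"))
print(identify("wrong.gif", b"\x89PNG\r\n\x1a\n"))
print(identify("notes.xyz", b"hello"))
```

The internal signature is checked first because an extension is easy to change or lose, which is why a tool that relies on extensions alone can be misled.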
17. Preservation Planning
Preservation planning is the decision-making function of active
preservation. Its role is to identify and monitor technological
changes and their potential impacts on stored digital objects, and
to develop the necessary detailed preservation plans to mitigate
those impacts.
18. Preservation Action
Preservation action represents the enactment of the preservation
plan in accordance with the chosen preservation strategy. This
will entail either the migration of objects to new formats or the
development of emulated environments. Whatever preservation
plan is adopted, preservation action requires the availability of
specialized software tools.
19. Passive Preservation
Passive preservation is concerned with the secure storage of
digital objects, and the prevention of accidental or unauthorized
damage or loss. As such, passive preservation needs to
encompass the following functions: {Brown, A. 2006}
a. Security and access control
b. Integrity
c. Storage management
d. Content management
e. Disaster recovery
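The integrity function (b) is commonly supported by recording a checksum when an object is stored and recomputing it later. The sketch below assumes SHA-256 as the checksum; the source does not name a particular method, and the function names checksum and verify are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the object's bytes as hex text."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded: str) -> bool:
    """True if the object still matches the checksum recorded at ingest."""
    return checksum(data) == recorded


# Illustrative object contents only.
stored = b"digital object contents"
recorded = checksum(stored)           # kept alongside the object at ingest

print(verify(stored, recorded))                       # unchanged object
print(verify(b"digital object c0ntents", recorded))   # altered object
```

A failed check does not say what went wrong, only that the stored bytes are no longer the ones that were deposited; recovery then depends on the other functions in the list, such as storage management and disaster recovery.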
20. Tools for Passive Preservation
With journal prices, especially in the science, technical and
medical (STM) sector, still out of control, more and more authors
and universities want to take an active part in the publishing and
preservation process themselves.
In picking a tool, a library has to consider a number of questions:
• What material should be stored in the repository?
• Is long-term preservation an issue?
• Which software should be chosen?
• What is the cost of setting the system up? and
• How much know-how is required?
21. What is the LOCKSS Program?
LOCKSS (Lots of Copies Keep Stuff Safe), based at Stanford
University Libraries, is an international community initiative that
provides libraries with digital preservation tools and support so
that they can easily and inexpensively collect and preserve their
own copies of authorized e-content. LOCKSS, in its eleventh
year, provides libraries with the open-source software and
support to preserve today’s web-published materials for
tomorrow’s readers while building their own collections and
acquiring a copy of the assets they pay for, instead of simply
leasing them. LOCKSS provides 100% post-cancellation access.
http://lockss.stanford.edu/
22. EPrints
EPrints is a tool used to manage the archiving of research in the
form of books, posters, or conference papers. Its purpose is not to provide a
long-term archiving solution that ensures that material will remain readable and
accessible through technology changes, but instead to give institutions a means
to collect, store and provide Web access to material.
Currently, there are over 140 repositories worldwide that run the EPrints
software. For example, at the University of Queensland in Australia, EPrints is
used as 'a deposit collection of papers that showcases the research output of UQ
academic staff and postgraduate students across a range of subjects and
disciplines, both before and after peer-reviewed publication.'
EPrints is a free open source package that was developed at the
University of Southampton in the UK.
http://www.eprints.org/
23. DSpace
The DSpace open source software has been developed by the
Massachusetts Institute of Technology Libraries and Hewlett-
Packard. The current version of DSpace is 1.2.1.
According to the DSpace Web site, the software allows
institutions to
- capture and describe digital works using a custom
workflow process,
- distribute an institution's digital works over the
Web, so users can search and retrieve items in the collection, and
- preserve digital works over the long term.
http://www.dspace.org/
24. Future Trends
International Standards
With the rapid development of the information and
communication environment, numerous intellectual works are
available in digital format on the Internet, and those digital
resources tend to disappear soon after they
appear. Digital archiving is the long-term procedure to
process, manage and preserve those digital objects that are
considered to have timeless value. Since the 1990s, many countries
such as Australia, the United States, and European nations have
pursued online preservation of digital resources as long-term
national projects, led by their national libraries in cooperation
with other institutions and organizations.
25. OASIS
The National Library of Korea (NLK), responding to the changing
status of libraries in the digital information era, has planned an efficient
national information service that collects quality online digital
information, provides it as a public service, and preserves
those intellectual records for generations to come.
In preparation for the opening of the National Digital Library of Korea in
2008, NLK is working on a project to collect and preserve various web
content and online digital resources: OASIS (Online
Archiving & Searching Internet Sources, www.OASIS.go.kr). The
OASIS system was developed in December 2005 to preserve online
digital resources for future generations, to collect and preserve the
national digital cultural heritage, and to establish standard management
policies for digital resources.
27. OASIS Approach for Web Resource Collection
Selective Collection of Web Resources
NLK's approach to web archiving is basically selective
collection. Currently we have two types of objects to collect:
web sites and individual web digital resources. They are
selectively collected according to an established collection development
policy. We will gradually expand the target objects to include video,
image, and audio.
OASIS Collection Target and Collection Policy
The selection of target resources was based on their utility for
current or future information needs, the author's popularity, the
uniqueness of the information, academic content, the currency
of the information, the frequency of updates, and accessibility.
28. OASIS Annual Resource Collection Statistics
The collection started in 2004 and currently OASIS has
156,798 resources in total. The collection size is about
2.4 terabytes.
Table 1. OASIS Resources Collection Statistics (Number of Titles)
Type of Resources               2004     2005     2006    Total
Individual Digital Resource   43,861   45,280   42,958  132,099
Web Site                       1,218    2,716   20,765   24,699
Total                         45,079   47,996   63,723  156,798
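The totals in Table 1 can be checked with simple arithmetic. The sketch below uses only the per-type yearly counts from the table; the variable names are chosen for the example. The 2006 column sums to 42,958 + 20,765 = 63,723, which is also the value needed for the yearly totals to add up to the stated grand total of 156,798.

```python
# Yearly counts per resource type, taken from Table 1.
counts = {
    "Individual Digital Resource": {2004: 43861, 2005: 45280, 2006: 42958},
    "Web Site": {2004: 1218, 2005: 2716, 2006: 20765},
}

# Total per resource type (row totals).
row_totals = {name: sum(years.values()) for name, years in counts.items()}

# Total per year (column totals).
year_totals = {
    year: sum(row[year] for row in counts.values())
    for year in (2004, 2005, 2006)
}

grand_total = sum(row_totals.values())

print(row_totals)
print(year_totals)
print(grand_total)
```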
29. OASIS Workflow and Process
OASIS workflows and processes are described separately for web
sites and for individual digital resources.
The process for web sites does not end after a single mirroring
cycle, because web sites change their contents
continuously. Their resources must be collected repeatedly
over time to preserve them. However, it is
impossible for a manager to monitor changes to numerous web
sites manually, and it is considered a waste of
resources to collect every resource unconditionally at a
fixed interval, for example every one, two,
or six months.
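One way to avoid both manual monitoring and unconditional re-collection is to re-collect a page only when its content has changed since the last visit. The sketch below is an assumption about how such a check could work, not a description of the OASIS implementation; the function names, the use of SHA-256, and the example URL are all hypothetical.

```python
import hashlib
from typing import Dict


def fingerprint(content: bytes) -> str:
    """Summarize page content as a SHA-256 digest."""
    return hashlib.sha256(content).hexdigest()


def needs_recollection(url: str, content: bytes, last_seen: Dict[str, str]) -> bool:
    """Return True if the page is new or changed since the last visit.

    last_seen maps each URL to the digest recorded on the previous visit
    and is updated in place.
    """
    digest = fingerprint(content)
    changed = last_seen.get(url) != digest
    last_seen[url] = digest
    return changed


last_seen: Dict[str, str] = {}
url = "http://www.example.org/news"   # placeholder address

first = needs_recollection(url, b"<html>version 1</html>", last_seen)
second = needs_recollection(url, b"<html>version 1</html>", last_seen)
third = needs_recollection(url, b"<html>version 2</html>", last_seen)
print(first, second, third)
```

In this sketch the robot still has to fetch each page in order to compare it, so the saving is in storage and processing of unchanged copies rather than in network traffic.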
30. Fig. 1. Workflow for Website Archiving
The selected individual digital resources are collected by a robot.
The robot collects the target resources, checks for duplicates,
automatically classifies them according to the classification
system, and extracts abstract information. For the processed
individual resources, the manager enters various metadata, then
reviews and corrects it to produce the final catalog record for preservation.
31. Future Development Direction
• As knowledge and information resources migrate from paper to digital formats,
there is a growing need to collect and preserve digital
knowledge resources at the national level. Recognizing that digital
resources are short-lived, NLK runs the OASIS system at the national
level to collect and preserve valuable digital resources so that the
current generation can pass them on to the next as digital cultural
heritage.
• To accomplish this mission, the OASIS system provides national standard
models for the submission of online digital resources to the authority in the
future digital environment and for the standardization of collection and
preservation systems for online digital resources.
• Major technologies are being developed and applied to OASIS at the levels of
collection, preservation, management, public service, etc. For the collection
process, they include web robot agents and techniques for using them, automatic
classification, automatic abstracting, and others.