Davis Digital Preservation and the Web: Challenges for Libraries

Digital preservation and the
web: challenges for libraries
Corey Davis, Council of Prairie and Pacific University Libraries (COPPUL)
Digital Preservation Coordinator

The big challenge for all of us

“Much of our global
cultural heritage, and
our own individual and
social imprint, is at
serious risk of
disappearing.”
Richard S. Whitt, Corporate Director for
Strategic Initiatives at Google

Keepers “…represents
only about 20% of the
‘continuing resources’
and ‘integrated
resources’ having an
ISSN.”
http://library.ifla.org/121/1/098-burnhill-en.pdf

Traditional library collections…

The web now…
1. With AJAX and HTML5, the web is transitioning from a document-centric
information space to an application-based information space
2. Content is tailored to people, locations, and devices. There is often
no “canonical version” of a webpage anymore

Amnesiac civilization
• “HTML5, in effect, changes the language
of the Web from HTML to Javascript,
from a static document description
language to a programming language.”
• “I've been warning for some time that
one of the fundamental problems facing
digital preservation is the evolution of
content from static to dynamic.”
• http://blog.dshr.org/2011/08/moonalice-plays-palo-alto.html

Current preservation services…
• Tend to focus on discrete objects or packages (PDFs, images, XML)
• And the creation of Archival Information Packages (AIPs)
• “I have always thought of the ‘autonomous AIP’ zipped up and held on a
storage device as a residue of paper-thinking.” Jon Tilbury, Preservica
(Pasig-discuss listserv)

Some examples of the challenges
of preserving dynamic web
content

The short tail and long tail
1. CNN http://cnn.com
2. Colonial Despatches https://bcgenesis.uvic.ca/

The short tail: CNN
• “CNN.com has been unarchivable since
2016-11-01T15:01:31”
• http://ws-dl.blogspot.ca/2017/01/2017-01-20-cnncom-has-been-unarchivable.html

January 20th, 2017, Inauguration Day

• “In short, the archival failure is caused
by changes CNN made to their CDN
(content delivery network); these
changes are reflected in the JavaScript
used to render the homepage.”
• John Berlin http://ws-dl.blogspot.ca/2017/01/2017-01-20-cnncom-has-been-unarchivable.html

The long tail:
Colonial Despatches
• “This digital archive contains the
original correspondence
between the British Colonial
Office and the colonies of
Vancouver Island and British
Columbia.”
• https://bcgenesis.uvic.ca/

How can we address these
challenges together?

Working with the long tail
• Major project at University of Victoria to explore the archiving of
dynamic, interactive websites in the digital humanities
• Working with information producers and developers to create
preservation-friendly applications

Selecting technologies for long-term survival
• “We have settled on building web applications which have
virtually no server-side requirements beyond response to
HTTP requests, but instead are based on client-side HTML5,
JavaScript and Cascading Style Sheets.”
• “Using these core standards, we are building completely
‘static’ websites which can actually function locally in any
current web browser, with no server at all, but which still
preserve virtually all of the appearance and functionality of
the original web applications they replace.”
• Martin Holmes, Programmer/Consultant, University of Victoria
Humanities Computing and Media Centre
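
One way to make the "no server at all" goal concrete is to check whether a page depends on anything outside its own folder. The Python sketch below is an illustration only, not the University of Victoria team's tooling: the function name `find_external_references`, the list of attributes it inspects, and the `example.org` hosts in the sample are all assumptions made for this example. It flags scripts, stylesheets, images, and similar resources that point at another host, since those would break when the site is opened locally or from an archive.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Attributes that can pull in another resource when a page is rendered.
# (An assumed, non-exhaustive list chosen for this sketch.)
RESOURCE_ATTRS = {"src", "href", "action", "poster", "data"}


class ExternalReferenceFinder(HTMLParser):
    """Collect references that need a network host to resolve."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        # Plain hyperlinks to other sites are citations, not dependencies.
        if tag == "a":
            return
        for name, value in attrs:
            if name not in RESOURCE_ATTRS or not value:
                continue
            parsed = urlparse(value)
            # An http(s) scheme, or a protocol-relative "//host/..." URL,
            # means the resource lives outside the local copy.
            if parsed.scheme in ("http", "https") or parsed.netloc:
                self.external.append((tag, value))


def find_external_references(html_text):
    """Return (tag, url) pairs for resources hosted somewhere else."""
    finder = ExternalReferenceFinder()
    finder.feed(html_text)
    return finder.external


# Hypothetical page: two local resources, two remote ones, one outbound link.
sample = """
<html><head>
<link rel="stylesheet" href="css/site.css">
<script src="js/app.js"></script>
<script src="https://cdn.example.org/lib.js"></script>
</head><body>
<img src="//images.example.org/map.png">
<a href="https://example.org/">elsewhere</a>
</body></html>
"""

for tag, url in find_external_references(sample):
    print(f"<{tag}> depends on {url}")
```

An empty result suggests the page can travel as a plain folder of files. The check only reads the HTML markup, so resources loaded later by JavaScript or referenced from CSS would need separate inspection.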

Best practices for content creators: Distill.pub
• “A Distill article (at least
in its ideal, aspirational
form) isn’t just a paper.
It’s an interactive
medium that lets users
– ‘readers’ is no longer
sufficient – work
directly with machine
learning models.”
• http://distill.pub/about/

Interactivity and preservation
• “Distill does an excellent job of publishing articles that use
interactivity to provide high-quality explanations … without sacrificing
preservability.”
• David Rosenthal http://blog.dshr.org/2017/05/distill-is-this-what-journals-should.html

Capturing the dynamic
web: Webrecorder.io
• Developed by Rhizome for
preservation of interactive
online art
• Focus on dynamic web content

Academic publications and CLOCKSS
• Digital preservation
collaboration
between research
libraries and
publishers
• Working to develop
functionality to
harvest dynamic
content from
publishers’ websites

Significant issues
• Costs
• Dynamic content
  • Presents significant technical and policy issues for preservation
• Scale
  • A technical and financial issue
• Incentives
  • Public policy could address some of this
• Proprietary information and DRM
  • Copyright legislation for preservation not likely forthcoming

Collaboration is key
• Libraries need to work together
• Libraries and publishers and other content creators need to work
together
• Publishers can practice “preservation in place”

Thanks
• corey@coppul.ca
• @coreyleedavis
