This presentation was provided by Corey Davis of the University of Victoria during the NISO Virtual Conference, Convergence: The Web and Publishing Onto The Web, held on May 17, 2017
Davis Digital Preservation and the Web: Challenges for Libraries
1. Digital preservation and the web: challenges for libraries
Corey Davis, Council of Prairie and Pacific University Libraries (COPPUL)
Digital Preservation Coordinator
3. “Much of our global cultural heritage, and our own individual and social imprint, is at serious risk of disappearing.”
Richard S. Whitt, Corporate Director for
Strategic Initiatives at Google
4. Keepers “…represents only about 20% of the ‘continuing resources’ and ‘integrated resources’ having an ISSN.”
http://library.ifla.org/121/1/098-burnhill-en.pdf
7. The web now…
1. With AJAX and HTML5, the web is transitioning from a document-centric information space to an applications-based information space
2. Content is tailored to people, locations, and devices. There is often no “canonical version” of a webpage anymore
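One way to see the “no canonical version” problem concretely is to fingerprint what a single URL returns under different request contexts. The sketch below is illustrative only: the context labels and response bodies are made-up stand-ins for real fetches, and `group_by_content` and `has_canonical_version` are hypothetical helper names, not part of any archiving tool.

```python
import hashlib
from typing import Dict, List


def fingerprint(body: bytes) -> str:
    """Return a SHA-256 hex digest of a response body."""
    return hashlib.sha256(body).hexdigest()


def group_by_content(responses: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group request contexts by the digest of the body each one received."""
    groups: Dict[str, List[str]] = {}
    for context, body in responses.items():
        groups.setdefault(fingerprint(body), []).append(context)
    return groups


def has_canonical_version(responses: Dict[str, bytes]) -> bool:
    """True only if every context received byte-identical content."""
    return len(group_by_content(responses)) == 1


# Hypothetical bodies served for one URL under three request contexts.
sample_responses = {
    "desktop, Canada": b"<html><body>Top stories for Canada</body></html>",
    "mobile, Canada": b"<html><body>Top stories (mobile)</body></html>",
    "desktop, Canada, repeat visit": b"<html><body>Top stories for Canada</body></html>",
}

variants = group_by_content(sample_responses)
print(f"{len(variants)} distinct versions; "
      f"canonical: {has_canonical_version(sample_responses)}")
```

An archive that stores only one of these variants has preserved one reader's view of the page, not the page itself, which is why capture context matters for tailored content.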
10. Amnesiac civilization
• “HTML5, in effect, changes the language of the Web from HTML to Javascript, from a static document description language to a programming language.”
• “I've been warning for some time that one of the fundamental problems facing digital preservation is the evolution of content from static to dynamic.”
• http://blog.dshr.org/2011/08/moonalice-plays-palo-alto.html
11. Current preservation services…
• Tend to focus on discrete objects or packages (PDFs, images, XML)
• And the creation of Archival Information Packages (AIPs)
• “I have always thought of the ‘autonomous AIP’ zipped up and held on a storage device as a residue of paper-thinking.” Jon Tilbury, Preservica (Pasig-discuss listserv)
12. Some examples of the challenges of preserving dynamic web content
13. The short tail and long tail
1. CNN http://cnn.com
2. Colonial Despatches https://bcgenesis.uvic.ca/
14. The short tail: CNN
• “CNN.com has been unarchivable since 2016-11-01T15:01:31”
• http://ws-dl.blogspot.ca/2017/01/2017-01-20-cnncom-has-been-unarchivable.html
17. • “In short, the archival failure is caused by changes CNN made to their CDN (content delivery network); these changes are reflected in the JavaScript used to render the homepage.”
• John Berlin http://ws-dl.blogspot.ca/2017/01/2017-01-20-cnncom-has-been-unarchivable.html
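A failure of this kind often shows up as a capture whose HTML is present but whose referenced resources are not. The following is a rough sketch of such a completeness check using only Python's standard library; the HTML snippet, the `example.org` URLs, and the `missing_resources` helper are hypothetical. It only sees resources named in the markup. URLs that JavaScript constructs at run time, as in the CNN case, would require executing the page in a browser.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect URLs of scripts, images, and stylesheets named in the markup."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.resources: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag in ("script", "img") and attr_map.get("src"):
            self.resources.append(urljoin(self.base_url, attr_map["src"]))
        elif (tag == "link" and attr_map.get("rel") == "stylesheet"
              and attr_map.get("href")):
            self.resources.append(urljoin(self.base_url, attr_map["href"]))


def missing_resources(html: str, base_url: str, captured: Set[str]) -> List[str]:
    """Return referenced resource URLs that are absent from the capture."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return [url for url in collector.resources if url not in captured]


# Hypothetical page and hypothetical list of URLs a crawler managed to capture.
page = """
<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="https://cdn.example.org/app.js"></script>
</head><body><img src="images/logo.png"></body></html>
"""
captured_urls = {
    "https://www.example.org/",
    "https://www.example.org/css/site.css",
    "https://www.example.org/images/logo.png",
}

gaps = missing_resources(page, "https://www.example.org/", captured_urls)
print(gaps)
```

Here the only gap is the script served from the separate CDN host, the sort of dependency that can silently break replay of an otherwise complete capture.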
18. The long tail: Colonial Despatches
• “This digital archive contains the original correspondence between the British Colonial Office and the colonies of Vancouver Island and British Columbia.”
• https://bcgenesis.uvic.ca/
23. Working with the long tail
• Major project at University of Victoria to explore the archiving of dynamic, interactive websites in the digital humanities
• Working with information producers and developers to create preservation-friendly applications
24. Selecting technologies for long-term survival
• “We have settled on building web applications which have virtually no server-side requirements beyond response to HTTP requests, but instead are based on client-side HTML5, JavaScript and Cascading Style Sheets.”
• “Using these core standards, we are building completely ‘static’ websites which can actually function locally in any current web browser, with no server at all, but which still preserve virtually all of the appearance and functionality of the original web applications they replace.”
• Martin Holmes, Programmer/Consultant, University of Victoria Humanities Computing and Media Centre
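The “no server at all” property described above can be approximated with a simple check: every reference in a page should be relative, carry no query string, and not point at a server-side script. The sketch below is one possible heuristic, not the method used at the University of Victoria; the extension list, the sample references, and the `server_dependency` helper are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlsplit

# Assumed, non-exhaustive list of extensions that usually imply server-side code.
SERVER_SIDE_EXTENSIONS = {".php", ".asp", ".aspx", ".jsp", ".cgi"}


def server_dependency(reference: str) -> Optional[str]:
    """Return why a reference needs a live server, or None if it looks static."""
    parts = urlsplit(reference)
    if parts.scheme in ("http", "https") or parts.netloc:
        return "absolute URL to a remote host"
    if parts.query:
        return "query string"
    if PurePosixPath(parts.path).suffix.lower() in SERVER_SIDE_EXTENSIONS:
        return "server-side script"
    return None


# Hypothetical references harvested from a site's HTML.
references = [
    "css/site.css",
    "js/search.js",
    "search.php?q=douglas",
    "https://fonts.example.com/font.woff2",
    "data/index.json",
]

# Keep only the references that would break when the site is opened from disk.
report = {
    ref: reason
    for ref in references
    if (reason := server_dependency(ref)) is not None
}
for ref, reason in report.items():
    print(f"{ref}: {reason}")
```

A site whose report comes back empty is a reasonable candidate for the kind of static packaging the quotation describes, though a browser-based test remains the real proof.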
25. Best practices for content creators: Distill.pub
• “A Distill article (at least in its ideal, aspirational form) isn’t just a paper. It’s an interactive medium that lets users – ‘readers’ is no longer sufficient – work directly with machine learning models.”
• http://distill.pub/about/
27. Interactivity and preservation
• “Distill does an excellent job of publishing articles that use interactivity to provide high-quality explanations … without sacrificing preservability.”
• David Rosenthal http://blog.dshr.org/2017/05/distill-is-this-what-journals-should.html
28. Capturing the dynamic web: Webrecorder.io
• Developed by Rhizome for preservation of interactive online art
• Focus on dynamic web content
29. Academic publications and CLOCKSS
• Digital preservation collaboration between research libraries and publishers
• Working to develop functionality to harvest dynamic content from publishers’ websites
31. Significant issues
• Costs
• Dynamic content
  • Presents significant technical and policy issues for preservation
• Scale
  • A technical and financial issue
• Incentives
  • Public policy could address some of this
• Proprietary information and DRM
  • Copyright legislation for preservation not likely forthcoming
32. Collaboration is key
• Libraries need to work together
• Libraries and publishers and other content creators need to work together
• Publishers can practice “preservation in place”