SlideShare ist ein Scribd-Unternehmen logo
1 von 67
Downloaden Sie, um offline zu lesen
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Herbert Van de Sompel
Los Alamos National Laboratory @hvdsomp
http://orcid.org/0000-0002-0715-6126
Michael L. Nelson
Old Dominion University @phonedude_mln
http://orcid.org/0000-0003-3749-8116
Martin Klein
Los Alamos National Laboratory @mart1nkle1n
http://orcid.org/0000-0003-0130-2097
To the Rescue of the
Orphans of Scholarly Communication
The project is funded by the Andrew W. Mellon Foundation
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
• Problem statement
Scholarly objects are everywhere on the web, and are not
systematically archived
• Project perspective
Capturing objects using an institutional & web archiving paradigm
• Object capture flow:
• Step 1: Discovering a researcher’s web identities
• Step 2: Discovering artifacts per web identity
• Step 3: Determining the web boundary per artifact
• Step 4: Capturing resources in the artifacts’ web boundary
Outline
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Scholarship is Evolving
• The research process, not just its outcome, is becoming visible …
on the web
• Massive extension of the scholarly record with an enormous variety
of novel objects
• The objects are heterogeneous, dynamic, compound, inter-related
and distributed across the web
• The objects are often hosted on common web platforms that are not
dedicated to scholarship
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
101 Innovations in Scholarly Communication
Bianca Kramer & Jeroen Bosman. 101 Innovations in Scholarly Communication
https://innoscholcomm.silk.co/
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
The Evolving Scholarly Record
Brian Lavoie et al. (2014) The Evolving Scholarly Record
http://www.oclc.org/content/dam/research/publications/library/2014/oclcresearch-evolving-scholarly-
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Web Platforms Record Scholarship
• Increasingly, common web platforms are used for scholarship
• GitHub, Wikis, Wordpress, etc.
• Many of these platforms have desirable characteristics
• Versioning
• Time stamping
• Social embedding
• But, these platforms record rather than archive
Herbert Van de Sompel & Andrew Treloar (2014) A Perspective on Archiving the Scholarly Web
http://public.lanl.gov/herbertv/papers/Papers/2014/iPres2014_Sompel_Treloar.pdf
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Recording is not Archiving
“GitHub reserves the right at any time and from time to time to
modify or discontinue, temporarily or permanently, the Service (or
any part thereof) with or without notice.”
GitHub Terms of Service
http://help.github.com/articles/github-terms-of-service
https://help.github.com/articles/github-terms-of-service/
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Recording is not Archiving
GitHub Terms of Service
http://help.github.com/articles/github-terms-of-service
https://opensource.googleblog.com/2015/03/farewell-to-google-code.html
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Recording versus Archiving
Recording Archiving
Short-term Longer-term
No guarantees provided Attempt to provide guarantees
Write many/read many Write once/Read many
Scholarly process Scholarly record
Herbert Van de Sompel & Andrew Treloar (2014) A Perspective on Archiving the Scholarly Web
http://public.lanl.gov/herbertv/papers/Papers/2014/iPres2014_Sompel_Treloar.pdf
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Meet Some New School Researchers
Ian Milligan Mark Matienzo
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Meet Some New School Researchers
Ian Milligan
https://ianmilligan.ca/
https://www.slideshare.net/IanMilligan1
https://github.com/ianmilligan1
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Meet Some New School Researchers
Mark Matienzo
http://matienzo.org/
https://www.slideshare.net/anarchivist/presentations
https://github.com/anarchivist
https://osf.io/tgr4k/
https://www.drupal.org/user/380762
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
SlideShare Artifact: 0 Mementos
https://www.slideshare.net/IanMilligan1/resaw-geo-cities
http://timetravel.mementoweb.org/list/20140513211653/https://www.slideshare.net/IanMilligan1/resa
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
GitHub Artifact: 1 Memento
https://github.com/ianmilligan1/Historian-WARC-1
http://web.archive.org/web/*/https://github.com/ianmilligan1/Historian-WARC-1
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
The Scholarly Orphans Project
• Funded by the Andrew W, Mellon Foundation
• Los Alamos National Laboratory & New Mexico Consortium
• Old Dominion University
• 04/2016 - 03/2019
• How to capture scholarly orphans for long-term archiving?
• Project explores a paradigm inspired by web archiving
• Scale of the problem
• Bilateral agreements with most web portals unlikely
• Project explores an institution driven paradigm
• Institution should be interested in capturing the artifacts its
scholars deposit on the web
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
An Institutional & Web Archiving Perspective
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Related Work
• LOCKSS
• Web crawling approach
• Focused on journal literature
• Archive-It
• On-demand, subscription-based web archiving
• Not focused on scholarly orphans
• Institutional repository
• Capture an institution’s output
• Focused on manual upload (of journal literature)
• The Locker Project
• Capture an individual’s web presence
• Not focused on scholarly orphans
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Capture Flow
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Capture Flow – Step 1
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Algorithmic Discovery of Web Identities
James Powell et al. (2014) EgoSystem: Where are our alumni?
In: code4lib http://journal.code4lib.org/articles/9519
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Discovery of Web Identities via a Registry: ORCID
Martin Klein and Herbert Van de Sompel (2017) Discovering scholarly orphans using ORCID
In: JCDL2017 https://arxiv.org/abs/1703.09343
Ian Milligan
http://orcid.org/0000-0002-1470-7723
Mark Matienzo
http://orcid.org/0000-0003-3270-1306
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Ian Milligan’s ORCID
• Web Identities: 0
http://orcid.org/0000-0002-1470-7723
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Mark Matienzo’s ORCID
• Web Identities: 3
(homepage, ScopusID,
ResearcherID)
http://orcid.org/0000-0003-3270-1306
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Mark Matienzo’s Home Page
• URI to GitHub
repository, Twitter
• Could be included in
ORCID profile
http://matienzo.org/
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
• Evaluation of ORCID for automatic discovery of Web Identities
• How well does ORCID represent the global community of active
researchers?
• Adoption rate
• Subject coverage
• Geo-location coverage
• How well does ORCID score when it comes to listing Web Identities?
Discovery of Web Identities via a Registry: ORCID
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
ORCID - Adoption Rate
2013 2014 2015 2016
05000001000000150000020000002500000
ORCIDs total
ORCIDs with given names
ORCIDs with first names
ORCIDs with works
ORCIDs with affiliations
ORCIDs with web identities
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
ORCID - Subject Coverage
0
10
20
30
40
50
60
Other
Life Sciences
Physical
Sciences
Mathematics and
Computer Sciences
Education
Psychology and
Social Sciences
Engineering
Humanities and Arts
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
ORCID Subjects
Ph.D. Researchers
Publications
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
ORCID - Geo-Location Coverage
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
ORCID - Geo-Location Coverage
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
ORCID - Geo-Location Coverage
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
ORCID - Web Identities
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
ORCID - Web Identities
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Discovery of Web Identities via a Registry: ORCID
• Adoption rate is increasing
• Subject coverage is focused, does not cover all disciplines equally
• Geo-Location coverage is good but not quite representative
• Web Identity coverage is poor; not usable for our purpose in its
current state
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Capture Flow – Step 2
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Discovery of Artifacts per Web Identity
• Algorithmic approach
• Scrape artifacts from
pages
http://matienzo.org/publications/
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Discovery of Artifacts per Web Identity
• Notifications
• Subscribe to portal
notifications about a
researcher’s new
artifacts
https://www.slideshare.net/anarchivist/presentations
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Discovery of Artifacts per Web Identity
• Artifact Registry
• 5 artifacts of interest
(standards document,
reports, book reviews)
http://orcid.org/0000-0003-3270-1306
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Capture Flow – Step 3
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Determination of Web Boundary per Artifact
http://signposting.org
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
HTTP Links
Mark Nottingham (2010) RFC5988: Web Linking.
http://tools.iets.org/rfc/rfc5988.txt
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
HTTP Links
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
HTTP Links
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Signposting - Publication Boundary Pattern
http://signposting.org/publication_boundary/oxford/
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Signposting - Bibliographic Metadata Pattern
http://signposting.org/bibliographic_metadata/springer/
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Capture Flow – Step 4
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
• Legal
• robots.txt
• Licenses
• Technical
• Capture tools
• Capture quality
• Capture authenticity
Challenges Regarding Capturing Web Artifacts
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Legal Challenges re Capturing Artifacts – A Wake-Up Call
SlideShare
• robots.txt unclear, some pages disallowed
• License seems to prohibit archiving
GitHub
• robots.txt unclear, some pages disallowed
• License seems to allow archiving
Drupal
• robots.txt allows relevant URIs
• License seems to prohibit archiving
Open Science Framework
• robots.txt does not disallow crawlers
• License does not mention archiving, individual content may have
specific license
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Capture Tools Challenges: Mark’s SlideShare
Live
Internet Archive
Webrecorder.io
https://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and-the-power-to-name
http://web.archive.org/web/20161229053246/http://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and-the-power-to-name
https://webrecorder.io/martinklein/cni_test/20170330014029/https://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and-
the-power-to-name
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Capture Tools Challenges: Mark’s GitHub
Live
Internet Archive
Webrecorder.io
https://github.com/rightsstatements/rightsstatements.github.io
https://web.archive.org/web/20170328040646/https://github.com/rightsstatements/rightsstatements.github.io
https://webrecorder.io/martinklein/cni_test/20170330014135/https://github.com/rightsstatements/rightsstatements.github.io
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Capture Tools Challenges: Mark’s OSF
Live
Internet Archive
Webrecorder.io
https://osf.io/h4ru8/wiki/home/
http://web.archive.org/web/20170328042647/https://osf.io/h4ru8/wiki/home/
https://webrecorder.io/martinklein/cni_test/20170330014219/https://osf.io/h4ru8/wiki/home/
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Capture Quality - How well was this page archived?
• Continuing research on memento
damage, first published at JCDL 2014
• Premise: simply reporting “9/10
embedded images were archived” is
insufficient to describe how well the
archive / replay system performed
• Use heuristics from Mechanical Turk
testing to approximate human
conception of damage, e.g.:
o increase weight of missing images
that are large, or centered in the
viewport
o stylesheets can be important!
check for “ugly” results
J.F. Brunelle, M. Kelly, H. SalahEldeen M. C. Weigle, and M. L. Nelson (2014) Not all mementos are created equal:
Measuring the impact of missing resources. In: JCDL 2014
http://dx.doi.org/10.1109/JCDL.2014.6970187 http://dx.doi.org/10.1007/s00799-015-0150-6
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Triptych CSS
“regular” web
pages have nearly
equal distribution
of content over
each third of a
page
if a CSS is missing
AND > 75% of the
non-background
color is in the left
2/3s of the page,
then users
consider this
damaged
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
A Memento Damage Service, Python Library, and Docker Image
Erika Siregar
http://memento-damage.cs.odu.edu/
https://github.com/erikaris/web-memento-damage
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Just a Little Bit of Damage…
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Moderate Damage…
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Significant Damage…
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Ian’s GitHub Memento…
https://github.com/ianmilligan1/Historian-WARC-1
http://web.archive.org/web/20130922192416/https://github.com/ianmilligan1/Historian-WARC-1
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
… Has Slight Damage
does not
appear to
violate the
“75% / left-
2/3s” rule
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Capture Authenticity - Has this page been tampered with?
• The days of implicitly trusting Brewster
& IA are over
o the people who brought you
fake news will eventually bring you
fake archives
o “mo’ archives mo’ problems”
• Premise: use multiple, independent
archives to record fixity information
from dated observations of mementos
• Plans:
o blockchain
o provenance (i.e., a memento of
memento != 2 independent
mementos)
https://climate.nasa.gov/vital-signs/carbon-dioxide/
http://web.archive.org/web/20170312201332/https://climate.nasa.gov/vital-signs/carbon-dioxide/
http://localhost:8282/michael/wayback/20170313023607/https://climate.nasa.gov/vital-signs/carbon-dioxide/
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Push a Web Page into Multiple Archives
Mohamed Aturban (2017) Archive Now (archivenow): A Python Library to Integrate On-Demand
Archives http://ws-dl.blogspot.com/2017/02/2017-02-22-archive-now-archivenow.html
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Record Fixity in a Manifest File
Shawn Jones (2016) Mementos In the Raw, Take Two
http://ws-dl.blogspot.com/2016/08/2016-08-15-mementos-in-raw-take-two.html
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Publish Manifest to the Web
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Archive the Manifest
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
“You can’t tell the players without a scorecard” – Harry M. Stevens
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Verifying the Authenticity of a Memento
• Given a Memento, URI-M, that we wish to verify
• Lookup the URI-M at a manifest server
o e.g, captureproject.org/{URI-M}
• Discover all the mementos of the manifest, and verify their
integrity with “trusty URIs”
• For each URI-M listed in the manifest, repeat the fixity calculation
as described in the manifest
• Vote if fixity matches (not tampered with) or if fixity doesn’t match
(tampered with)
o Majority vote wins (assuming independent archives)
Mohamed Aturban (2017) Summary of "Trusty URIs: Verifiable, Immutable, and Permanent Digital
Artifacts for Linked Data” http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html
Video at https://www.youtube.com/watch?v=EY15lj-7_lc
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Discussion
@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Herbert Van de Sompel
Los Alamos National Laboratory @hvdsomp
http://orcid.org/0000-0002-0715-6126
Michael L. Nelson
Old Dominion University @phonedude_mln
http://orcid.org/0000-0003-3749-8116
Martin Klein
Los Alamos National Laboratory @mart1nkle1n
http://orcid.org/0000-0003-0130-2097
To the Rescue of the
Orphans of Scholarly Communication
The project is funded by the Andrew W. Mellon Foundation

Weitere ähnliche Inhalte

Was ist angesagt?

Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for RepositoriesMartin Klein
 
Persistent Identification: Easier Said than Done
Persistent Identification: Easier Said than DonePersistent Identification: Easier Said than Done
Persistent Identification: Easier Said than DoneHerbert Van de Sompel
 
The web is rotting and what to do about it
The web is rotting and what to do about itThe web is rotting and what to do about it
The web is rotting and what to do about itHerbert Van de Sompel
 
Signposting Overview (Version November 2017)
Signposting Overview (Version November 2017)Signposting Overview (Version November 2017)
Signposting Overview (Version November 2017)Herbert Van de Sompel
 
Achieving Link Integrity for Managed Collections
Achieving Link Integrity for Managed CollectionsAchieving Link Integrity for Managed Collections
Achieving Link Integrity for Managed CollectionsHerbert Van de Sompel
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live WebMartin Klein
 
Quantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.isQuantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.ismaturban
 
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred RepresentationsScripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred RepresentationsJustin Brunelle
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web ResourcesMartin Klein
 
Linked Data and Discovery with Steve Meyer
Linked Data and Discovery with Steve MeyerLinked Data and Discovery with Steve Meyer
Linked Data and Discovery with Steve MeyerWiLS
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingMichael Nelson
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesMichael Nelson
 
Linked data - A radical change?
Linked data - A radical change?Linked data - A radical change?
Linked data - A radical change?Richard Wallis
 
Avoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorkerAvoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorkerSawood Alam
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Michael Nelson
 
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...Justin Brunelle
 
Recommending Archived Webpages Using Only The URI
Recommending Archived Webpages Using Only The URIRecommending Archived Webpages Using Only The URI
Recommending Archived Webpages Using Only The URILulwahMA
 
MementoMap Framework for Flexible and Adaptive Web Archive Profiling
MementoMap Framework for Flexible and Adaptive Web Archive ProfilingMementoMap Framework for Flexible and Adaptive Web Archive Profiling
MementoMap Framework for Flexible and Adaptive Web Archive ProfilingSawood Alam
 

Was ist angesagt? (20)

Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for Repositories
 
Persistent Identification: Easier Said than Done
Persistent Identification: Easier Said than DonePersistent Identification: Easier Said than Done
Persistent Identification: Easier Said than Done
 
The web is rotting and what to do about it
The web is rotting and what to do about itThe web is rotting and what to do about it
The web is rotting and what to do about it
 
Paul Evan Peters Lecture
Paul Evan Peters LecturePaul Evan Peters Lecture
Paul Evan Peters Lecture
 
Signposting Overview (Version November 2017)
Signposting Overview (Version November 2017)Signposting Overview (Version November 2017)
Signposting Overview (Version November 2017)
 
Achieving Link Integrity for Managed Collections
Achieving Link Integrity for Managed CollectionsAchieving Link Integrity for Managed Collections
Achieving Link Integrity for Managed Collections
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live Web
 
Quantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.isQuantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.is
 
Reminiscing about interoperability
Reminiscing about interoperabilityReminiscing about interoperability
Reminiscing about interoperability
 
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred RepresentationsScripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web Resources
 
Linked Data and Discovery with Steve Meyer
Linked Data and Discovery with Steve MeyerLinked Data and Discovery with Steve Meyer
Linked Data and Discovery with Steve Meyer
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web Archives
 
Linked data - A radical change?
Linked data - A radical change?Linked data - A radical change?
Linked data - A radical change?
 
Avoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorkerAvoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorker
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
 
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
 
Recommending Archived Webpages Using Only The URI
Recommending Archived Webpages Using Only The URIRecommending Archived Webpages Using Only The URI
Recommending Archived Webpages Using Only The URI
 
MementoMap Framework for Flexible and Adaptive Web Archive Profiling
MementoMap Framework for Flexible and Adaptive Web Archive ProfilingMementoMap Framework for Flexible and Adaptive Web Archive Profiling
MementoMap Framework for Flexible and Adaptive Web Archive Profiling
 

Ähnlich wie To the Rescue of the Orphans of Scholarly Communication

An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansMartin Klein
 
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy
HIBERLINK: Reference Rot and Linked Data: Threat and RemedyHIBERLINK: Reference Rot and Linked Data: Threat and Remedy
HIBERLINK: Reference Rot and Linked Data: Threat and RemedyPRELIDA Project
 
Using the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationUsing the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationMartin Klein
 
OCLC Research Update at ALA Chicago. June 26, 2017.
OCLC Research Update at ALA Chicago. June 26, 2017.OCLC Research Update at ALA Chicago. June 26, 2017.
OCLC Research Update at ALA Chicago. June 26, 2017.OCLC
 
Foundations to Actions: Extending Innovations to Digital Libraries in Partner...
Foundations to Actions: Extending Innovations to Digital Libraries in Partner...Foundations to Actions: Extending Innovations to Digital Libraries in Partner...
Foundations to Actions: Extending Innovations to Digital Libraries in Partner...Trish Rose-Sandler
 
The Journal of Open Archaeology Data and PRIME: Incentivising Open Data Archi...
The Journal of Open Archaeology Data and PRIME: Incentivising Open Data Archi...The Journal of Open Archaeology Data and PRIME: Incentivising Open Data Archi...
The Journal of Open Archaeology Data and PRIME: Incentivising Open Data Archi...Brian Hole
 
Data Citation: A Critical Role for Publishers
Data Citation: A Critical Role for PublishersData Citation: A Critical Role for Publishers
Data Citation: A Critical Role for PublishersBrian Hole
 
Teaching Data Science to Undergraduate Students
Teaching Data Science to Undergraduate StudentsTeaching Data Science to Undergraduate Students
Teaching Data Science to Undergraduate StudentsNicole Vasilevsky
 
@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015Michael Nelson
 
PRIME: Publisher, Repository & Institutional Metadata Exchange
PRIME: Publisher, Repository & Institutional Metadata ExchangePRIME: Publisher, Repository & Institutional Metadata Exchange
PRIME: Publisher, Repository & Institutional Metadata ExchangeBrian Hole
 
Deconstructed and decentralized scholarly communication
Deconstructed and decentralized scholarly communicationDeconstructed and decentralized scholarly communication
Deconstructed and decentralized scholarly communicationJisc
 
Publishing Open Data: Incentivising Rigour
Publishing Open Data: Incentivising RigourPublishing Open Data: Incentivising Rigour
Publishing Open Data: Incentivising RigourBrian Hole
 
Developing rich interactive eBooks to teach linked open data to professionals...
Developing rich interactive eBooks to teach linked open data to professionals...Developing rich interactive eBooks to teach linked open data to professionals...
Developing rich interactive eBooks to teach linked open data to professionals...John Domingue
 
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...Trevor Owens
 
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...Martin Klein
 
Discovering and Navigating Memes in Social Media
Discovering and Navigating Memes in Social MediaDiscovering and Navigating Memes in Social Media
Discovering and Navigating Memes in Social MediaMatthew Lease
 
Introducing PRIME:Publisher, Repository and Institutional Metadata Exchange
Introducing PRIME:Publisher, Repository and Institutional Metadata ExchangeIntroducing PRIME:Publisher, Repository and Institutional Metadata Exchange
Introducing PRIME:Publisher, Repository and Institutional Metadata ExchangeBrian Hole
 

Ähnlich wie To the Rescue of the Orphans of Scholarly Communication (20)

An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly Orphans
 
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy
HIBERLINK: Reference Rot and Linked Data: Threat and RemedyHIBERLINK: Reference Rot and Linked Data: Threat and Remedy
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy
 
Using the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationUsing the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly Communication
 
OCLC Research Update at ALA Chicago. June 26, 2017.
OCLC Research Update at ALA Chicago. June 26, 2017.OCLC Research Update at ALA Chicago. June 26, 2017.
OCLC Research Update at ALA Chicago. June 26, 2017.
 
Foundations to Actions: Extending Innovations to Digital Libraries in Partner...
Foundations to Actions: Extending Innovations to Digital Libraries in Partner...Foundations to Actions: Extending Innovations to Digital Libraries in Partner...
Foundations to Actions: Extending Innovations to Digital Libraries in Partner...
 
The Journal of Open Archaeology Data and PRIME: Incentivising Open Data Archi...
The Journal of Open Archaeology Data and PRIME: Incentivising Open Data Archi...The Journal of Open Archaeology Data and PRIME: Incentivising Open Data Archi...
The Journal of Open Archaeology Data and PRIME: Incentivising Open Data Archi...
 
Data Citation: A Critical Role for Publishers
Data Citation: A Critical Role for PublishersData Citation: A Critical Role for Publishers
Data Citation: A Critical Role for Publishers
 
Teaching Data Science to Undergraduate Students
Teaching Data Science to Undergraduate StudentsTeaching Data Science to Undergraduate Students
Teaching Data Science to Undergraduate Students
 
2014_WWW_BTOR
2014_WWW_BTOR2014_WWW_BTOR
2014_WWW_BTOR
 
@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015
 
PRIME: Publisher, Repository & Institutional Metadata Exchange
PRIME: Publisher, Repository & Institutional Metadata ExchangePRIME: Publisher, Repository & Institutional Metadata Exchange
PRIME: Publisher, Repository & Institutional Metadata Exchange
 
Reference Rot and Linked Data: Threat and Remedy
Reference Rot and Linked Data: Threat and RemedyReference Rot and Linked Data: Threat and Remedy
Reference Rot and Linked Data: Threat and Remedy
 
Deconstructed and decentralized scholarly communication
Deconstructed and decentralized scholarly communicationDeconstructed and decentralized scholarly communication
Deconstructed and decentralized scholarly communication
 
Publishing Open Data: Incentivising Rigour
Publishing Open Data: Incentivising RigourPublishing Open Data: Incentivising Rigour
Publishing Open Data: Incentivising Rigour
 
Developing rich interactive eBooks to teach linked open data to professionals...
Developing rich interactive eBooks to teach linked open data to professionals...Developing rich interactive eBooks to teach linked open data to professionals...
Developing rich interactive eBooks to teach linked open data to professionals...
 
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
 
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
 
"In the Early Days of a Better Nation": Enhancing the power of metadata today...
"In the Early Days of a Better Nation": Enhancing the power of metadata today..."In the Early Days of a Better Nation": Enhancing the power of metadata today...
"In the Early Days of a Better Nation": Enhancing the power of metadata today...
 
Discovering and Navigating Memes in Social Media
Discovering and Navigating Memes in Social MediaDiscovering and Navigating Memes in Social Media
Discovering and Navigating Memes in Social Media
 
Introducing PRIME:Publisher, Repository and Institutional Metadata Exchange
Introducing PRIME:Publisher, Repository and Institutional Metadata ExchangeIntroducing PRIME:Publisher, Repository and Institutional Metadata Exchange
Introducing PRIME:Publisher, Repository and Institutional Metadata Exchange
 

Mehr von Martin Klein

On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebOn the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebMartin Klein
 
On the Persistence of Persistent Identifiers of the Scholarly Web
 On the Persistence of Persistent Identifiers of the Scholarly Web On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebMartin Klein
 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansMartin Klein
 
Who is Asking - Humans and Machines Experience a Different Scholarly Web
Who is Asking - Humans and Machines  Experience a Different Scholarly WebWho is Asking - Humans and Machines  Experience a Different Scholarly Web
Who is Asking - Humans and Machines Experience a Different Scholarly WebMartin Klein
 
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...Martin Klein
 
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...Martin Klein
 
Comparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncComparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncMartin Klein
 
Evaluating Memento Service Optimizations
Evaluating Memento Service OptimizationsEvaluating Memento Service Optimizations
Evaluating Memento Service OptimizationsMartin Klein
 
A Vision of the Library’s Role in Archiving Scholarly Artifacts
A Vision of the Library’s Role  in Archiving Scholarly ArtifactsA Vision of the Library’s Role  in Archiving Scholarly Artifacts
A Vision of the Library’s Role in Archiving Scholarly ArtifactsMartin Klein
 
First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...Martin Klein
 
Smart Routing of Memento Requests
Smart Routing of Memento RequestsSmart Routing of Memento Requests
Smart Routing of Memento RequestsMartin Klein
 
Building Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesBuilding Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesMartin Klein
 
A Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsA Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsMartin Klein
 
Focused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsFocused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsMartin Klein
 
Uniform Access to Raw Mementos
Uniform Access to Raw MementosUniform Access to Raw Mementos
Uniform Access to Raw MementosMartin Klein
 
Robust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communicationRobust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communicationMartin Klein
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...Martin Klein
 
web_archive_interoperability_memento
web_archive_interoperability_mementoweb_archive_interoperability_memento
web_archive_interoperability_mementoMartin Klein
 
Comparing Published Scientific Journal Articles to Their Pre-print Versions
Comparing Published Scientific Journal Articles  to Their Pre-print VersionsComparing Published Scientific Journal Articles  to Their Pre-print Versions
Comparing Published Scientific Journal Articles to Their Pre-print VersionsMartin Klein
 
Preserving Born-Digital News Panel JCDL 2016
Preserving Born-Digital News Panel JCDL 2016Preserving Born-Digital News Panel JCDL 2016
Preserving Born-Digital News Panel JCDL 2016Martin Klein
 

Mehr von Martin Klein (20)

On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebOn the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly Web
 
On the Persistence of Persistent Identifiers of the Scholarly Web
 On the Persistence of Persistent Identifiers of the Scholarly Web On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly Web
 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly Orphans
 
Who is Asking - Humans and Machines Experience a Different Scholarly Web
Who is Asking - Humans and Machines  Experience a Different Scholarly WebWho is Asking - Humans and Machines  Experience a Different Scholarly Web
Who is Asking - Humans and Machines Experience a Different Scholarly Web
 
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...
 
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...
 
Comparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncComparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSync
 
Evaluating Memento Service Optimizations
Evaluating Memento Service OptimizationsEvaluating Memento Service Optimizations
Evaluating Memento Service Optimizations
 
A Vision of the Library’s Role in Archiving Scholarly Artifacts
A Vision of the Library’s Role  in Archiving Scholarly ArtifactsA Vision of the Library’s Role  in Archiving Scholarly Artifacts
A Vision of the Library’s Role in Archiving Scholarly Artifacts
 
First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...
 
Smart Routing of Memento Requests
Smart Routing of Memento RequestsSmart Routing of Memento Requests
Smart Routing of Memento Requests
 
Building Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesBuilding Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web Archives
 
A Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsA Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly Artifacts
 
Focused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsFocused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event Collections
 
Uniform Access to Raw Mementos
Uniform Access to Raw MementosUniform Access to Raw Mementos
Uniform Access to Raw Mementos
 
Robust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communicationRobust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communication
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
 
web_archive_interoperability_memento
web_archive_interoperability_mementoweb_archive_interoperability_memento
web_archive_interoperability_memento
 
Comparing Published Scientific Journal Articles to Their Pre-print Versions
Comparing Published Scientific Journal Articles  to Their Pre-print VersionsComparing Published Scientific Journal Articles  to Their Pre-print Versions
Comparing Published Scientific Journal Articles to Their Pre-print Versions
 
Preserving Born-Digital News Panel JCDL 2016
Preserving Born-Digital News Panel JCDL 2016Preserving Born-Digital News Panel JCDL 2016
Preserving Born-Digital News Panel JCDL 2016
 

Kürzlich hochgeladen

Computer 10 Lesson 8: Building a Website
Computer 10 Lesson 8: Building a WebsiteComputer 10 Lesson 8: Building a Website
Computer 10 Lesson 8: Building a WebsiteMavein
 
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDS
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDSTYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDS
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDSedrianrheine
 
A_Z-1_0_4T_00A-EN_U-Po_w_erPoint_06.pptx
A_Z-1_0_4T_00A-EN_U-Po_w_erPoint_06.pptxA_Z-1_0_4T_00A-EN_U-Po_w_erPoint_06.pptx
A_Z-1_0_4T_00A-EN_U-Po_w_erPoint_06.pptxjayshuklatrainer
 
Bio Medical Waste Management Guideliness 2023 ppt.pptx
Bio Medical Waste Management Guideliness 2023 ppt.pptxBio Medical Waste Management Guideliness 2023 ppt.pptx
Bio Medical Waste Management Guideliness 2023 ppt.pptxnaveenithkrishnan
 
LESSON 10/ GROUP 10/ ST. THOMAS AQUINASS
LESSON 10/ GROUP 10/ ST. THOMAS AQUINASSLESSON 10/ GROUP 10/ ST. THOMAS AQUINASS
LESSON 10/ GROUP 10/ ST. THOMAS AQUINASSlesteraporado16
 
Niche Domination Prodigy Review Plus Bonus
Niche Domination Prodigy Review Plus BonusNiche Domination Prodigy Review Plus Bonus
Niche Domination Prodigy Review Plus BonusSkylark Nobin
 
Zero-day Vulnerabilities
Zero-day VulnerabilitiesZero-day Vulnerabilities
Zero-day Vulnerabilitiesalihassaah1994
 
Benefits of doing Internet peering and running an Internet Exchange (IX) pres...
Benefits of doing Internet peering and running an Internet Exchange (IX) pres...Benefits of doing Internet peering and running an Internet Exchange (IX) pres...
Benefits of doing Internet peering and running an Internet Exchange (IX) pres...APNIC
 
world Tuberculosis day ppt 25-3-2024.pptx
world Tuberculosis day ppt 25-3-2024.pptxworld Tuberculosis day ppt 25-3-2024.pptx
world Tuberculosis day ppt 25-3-2024.pptxnaveenithkrishnan
 
Presentation2.pptx - JoyPress Wordpress
Presentation2.pptx -  JoyPress WordpressPresentation2.pptx -  JoyPress Wordpress
Presentation2.pptx - JoyPress Wordpressssuser166378
 
Introduction to ICANN and Fellowship program by Shreedeep Rayamajhi.pdf
Introduction to ICANN and Fellowship program  by Shreedeep Rayamajhi.pdfIntroduction to ICANN and Fellowship program  by Shreedeep Rayamajhi.pdf
Introduction to ICANN and Fellowship program by Shreedeep Rayamajhi.pdfShreedeep Rayamajhi
 
Check out the Free Landing Page Hosting in 2024
Check out the Free Landing Page Hosting in 2024Check out the Free Landing Page Hosting in 2024
Check out the Free Landing Page Hosting in 2024Shubham Pant
 
WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024
WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024
WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024Jan Löffler
 
Vision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced Horizons
Vision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced HorizonsVision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced Horizons
Vision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced HorizonsRoxana Stingu
 
LESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdf
LESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdfLESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdf
LESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdfmchristianalwyn
 

Kürzlich hochgeladen (15)

Computer 10 Lesson 8: Building a Website
Computer 10 Lesson 8: Building a WebsiteComputer 10 Lesson 8: Building a Website
Computer 10 Lesson 8: Building a Website
 
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDS
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDSTYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDS
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDS
 
A_Z-1_0_4T_00A-EN_U-Po_w_erPoint_06.pptx
A_Z-1_0_4T_00A-EN_U-Po_w_erPoint_06.pptxA_Z-1_0_4T_00A-EN_U-Po_w_erPoint_06.pptx
A_Z-1_0_4T_00A-EN_U-Po_w_erPoint_06.pptx
 
Bio Medical Waste Management Guideliness 2023 ppt.pptx
Bio Medical Waste Management Guideliness 2023 ppt.pptxBio Medical Waste Management Guideliness 2023 ppt.pptx
Bio Medical Waste Management Guideliness 2023 ppt.pptx
 
LESSON 10/ GROUP 10/ ST. THOMAS AQUINASS
LESSON 10/ GROUP 10/ ST. THOMAS AQUINASSLESSON 10/ GROUP 10/ ST. THOMAS AQUINASS
LESSON 10/ GROUP 10/ ST. THOMAS AQUINASS
 
Niche Domination Prodigy Review Plus Bonus
Niche Domination Prodigy Review Plus BonusNiche Domination Prodigy Review Plus Bonus
Niche Domination Prodigy Review Plus Bonus
 
Zero-day Vulnerabilities
Zero-day VulnerabilitiesZero-day Vulnerabilities
Zero-day Vulnerabilities
 
Benefits of doing Internet peering and running an Internet Exchange (IX) pres...
Benefits of doing Internet peering and running an Internet Exchange (IX) pres...Benefits of doing Internet peering and running an Internet Exchange (IX) pres...
Benefits of doing Internet peering and running an Internet Exchange (IX) pres...
 
world Tuberculosis day ppt 25-3-2024.pptx
world Tuberculosis day ppt 25-3-2024.pptxworld Tuberculosis day ppt 25-3-2024.pptx
world Tuberculosis day ppt 25-3-2024.pptx
 
Presentation2.pptx - JoyPress Wordpress
Presentation2.pptx -  JoyPress WordpressPresentation2.pptx -  JoyPress Wordpress
Presentation2.pptx - JoyPress Wordpress
 
Introduction to ICANN and Fellowship program by Shreedeep Rayamajhi.pdf
Introduction to ICANN and Fellowship program  by Shreedeep Rayamajhi.pdfIntroduction to ICANN and Fellowship program  by Shreedeep Rayamajhi.pdf
Introduction to ICANN and Fellowship program by Shreedeep Rayamajhi.pdf
 
Check out the Free Landing Page Hosting in 2024
Check out the Free Landing Page Hosting in 2024Check out the Free Landing Page Hosting in 2024
Check out the Free Landing Page Hosting in 2024
 
WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024
WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024
WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024
 
Vision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced Horizons
Vision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced HorizonsVision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced Horizons
Vision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced Horizons
 
LESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdf
LESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdfLESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdf
LESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdf
 

To the Rescue of the Orphans of Scholarly Communication

  • 1. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Herbert Van de Sompel Los Alamos National Laboratory @hvdsomp http://orcid.org/0000-0002-0715-6126 Michael L. Nelson Old Dominion University @phonedude_mln http://orcid.org/0000-0003-3749-8116 Martin Klein Los Alamos National Laboratory @mart1nkle1n http://orcid.org/0000-0003-0130-2097 To the Rescue of the Orphans of Scholarly Communication The project is funded by the Andrew W. Mellon Foundation
  • 2. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 • Problem statement Scholarly objects are everywhere on the web, and are not systematically archived • Project perspective Capturing objects using an institutional & web archiving paradigm • Object capture flow: • Step 1: Discovering a researcher’s web identities • Step 2: Discovering artifacts per web identity • Step 3: Determining the web boundary per artifact • Step 4: Capturing resources in the artifacts’ web boundary Outline
  • 3. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Scholarship is Evolving • The research process, not just its outcome, is becoming visible … on the web • Massive extension of the scholarly record with an enormous variety of novel objects • The objects are heterogeneous, dynamic, compound, inter-related and distributed across the web • The objects are often hosted on common web platforms that are not dedicated to scholarship
  • 4. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 101 Innovations in Scholarly Communication Bianca Kramer & Jeroen Bosman. 101 Innovations in Scholarly Communication https://innoscholcomm.silk.co/
  • 5. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 The Evolving Scholarly Record Brian Lavoie et al. (2014) The Evolving Scholarly Record http://www.oclc.org/content/dam/research/publications/library/2014/oclcresearch-evolving-scholarly-
  • 6. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Web Platforms Record Scholarship • Increasingly, common web platforms are used for scholarship • GitHub, Wikis, Wordpress, etc. • Many of these platforms have desirable characteristics • Versioning • Time stamping • Social embedding • But, these platforms record rather than archive Herbert Van de Sompel & Andrew Treloar (2014) A Perspective on Archiving the Scholarly Web http://public.lanl.gov/herbertv/papers/Papers/2014/iPres2014_Sompel_Treloar.pdf
  • 7. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Recording is not Archiving “GitHub reserves the right at any time and from time to time to modify or discontinue, temporarily or permanently, the Service (or any part thereof) with or without notice.” GitHub Terms of Service http://help.github.com/articles/github-terms-of-service https://help.github.com/articles/github-terms-of-service/
  • 8. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Recording is not Archiving GitHub Terms of Service http://help.github.com/articles/github-terms-of-service https://opensource.googleblog.com/2015/03/farewell-to-google-code.html
  • 9. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Recording versus Archiving Recording Archiving Short-term Longer-term No guarantees provided Attempt to provide guarantees Write many/read many Write once/Read many Scholarly process Scholarly record Herbert Van de Sompel & Andrew Treloar (2014) A Perspective on Archiving the Scholarly Web http://public.lanl.gov/herbertv/papers/Papers/2014/iPres2014_Sompel_Treloar.pdf
  • 10. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Meet Some New School Researchers Ian Milligan Mark Matienzo
  • 11. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Meet Some New School Researchers Ian Milligan https://ianmilligan.ca/ https://www.slideshare.net/IanMilligan1 https://github.com/ianmilligan1
  • 12. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Meet Some New School Researchers Mark Matienzo http://matienzo.org/ https://www.slideshare.net/anarchivist/presentations https://github.com/anarchivist https://osf.io/tgr4k/ https://www.drupal.org/user/380762
  • 13. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 SlideShare Artifact: 0 Mementos https://www.slideshare.net/IanMilligan1/resaw-geo-cities http://timetravel.mementoweb.org/list/20140513211653/https://www.slideshare.net/IanMilligan1/resa
  • 14. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 GitHub Artifact: 1 Memento https://github.com/ianmilligan1/Historian-WARC-1 http://web.archive.org/web/*/https://github.com/ianmilligan1/Historian-WARC-1
  • 15. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 The Scholarly Orphans Project • Funded by the Andrew W, Mellon Foundation • Los Alamos National Laboratory & New Mexico Consortium • Old Dominion University • 04/2016 - 03/2019 • How to capture scholarly orphans for long-term archiving? • Project explores a paradigm inspired by web archiving • Scale of the problem • Bilateral agreements with most web portals unlikely • Project explores an institution driven paradigm • Institution should be interested in capturing the artifacts its scholars deposit on the web
  • 16. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 An Institutional & Web Archiving Perspective
  • 17. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Related Work • LOCKSS • Web crawling approach • Focused on journal literature • Archive-It • On-demand, subscription-based web archiving • Not focused on scholarly orphans • Institutional repository • Capture an institution’s output • Focused on manual upload (of journal literature) • The Locker Project • Capture an individual’s web presence • Not focused on scholarly orphans
  • 18. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Capture Flow
  • 19. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Capture Flow – Step 1
  • 20. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Algorithmic Discovery of Web Identities James Powell et al. (2014) EgoSystem: Where are our alumni? In: code4lib http://journal.code4lib.org/articles/9519
  • 21. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Discovery of Web Identities via a Registry: ORCID Martin Klein and Herbert Van de Sompel (2017) Discovering scholarly orphans using ORCID In: JCDL2017 https://arxiv.org/abs/1703.09343 Ian Milligan http://orcid.org/0000-0002-1470-7723 Mark Matienzo http://orcid.org/0000-0003-3270-1306
  • 22. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Ian Milligan’s ORCID • Web Identities: 0 http://orcid.org/0000-0002-1470-7723
  • 23. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Mark Matienzo’s ORCID • Web Identities: 3 (homepage, ScopusID, ResearcherID) http://orcid.org/0000-0003-3270-1306
  • 24. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Mark Matienzo’s Home Page • URI to GitHub repository, Twitter • Could be included in ORCID profile http://matienzo.org/
  • 25. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 • Evaluation of ORCID for automatic discovery of Web Identities • How well does ORCID represent the global community of active researchers? • Adoption rate • Subject coverage • Geo-location coverage • How well does ORCID score when it comes to listing Web Identities? Discovery of Web Identities via a Registry: ORCID
  • 26. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 ORCID - Adoption Rate 2013 2014 2015 2016 05000001000000150000020000002500000 ORCIDs total ORCIDs with given names ORCIDs with first names ORCIDs with works ORCIDs with affiliations ORCIDs with web identities
  • 27. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 ORCID - Subject Coverage 0 10 20 30 40 50 60 Other Life Sciences Physical Sciences Mathematics and Computer Sciences Education Psychology and Social Sciences Engineering Humanities and Arts ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ORCID Subjects Ph.D. Researchers Publications
  • 28. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 ORCID - Geo-Location Coverage
  • 29. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 ORCID - Geo-Location Coverage
  • 30. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 ORCID - Geo-Location Coverage
  • 31. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 ORCID - Web Identities
  • 32. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 ORCID - Web Identities
  • 33. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Discovery of Web Identities via a Registry: ORCID • Adoption rate is increasing • Subject coverage is focused, does not cover all disciplines equally • Geo-Location coverage is good but not quite representative • Web Identity coverage is poor; not usable for our purpose in its current state
  • 34. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Capture Flow – Step 2
  • 35. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Discovery of Artifacts per Web Identity • Algorithmic approach • Scrape artifacts from pages http://matienzo.org/publications/
  • 36. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Discovery of Artifacts per Web Identity • Notifications • Subscribe to portal notifications about a researcher’s new artifacts https://www.slideshare.net/anarchivist/presentations
  • 37. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Discovery of Artifacts per Web Identity • Artifact Registry • 5 artifacts of interest (standards document, reports, book reviews) http://orcid.org/0000-0003-3270-1306
  • 38. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Capture Flow – Step 3
  • 39. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Determination of Web Boundary per Artifact http://signposting.org
  • 40. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 HTTP Links Mark Nottingham (2010) RFC5988: Web Linking. http://tools.iets.org/rfc/rfc5988.txt
  • 41. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 HTTP Links
  • 42. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 HTTP Links
  • 43. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Signposting - Publication Boundary Pattern http://signposting.org/publication_boundary/oxford/
  • 44. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Signposting - Bibliographic Metadata Pattern http://signposting.org/bibliographic_metadata/springer/
  • 45. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Capture Flow – Step 4
  • 46. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 • Legal • robots.txt • Licenses • Technical • Capture tools • Capture quality • Capture authenticity Challenges Regarding Capturing Web Artifacts
  • 47. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Legal Challenges re Capturing Artifacts – A Wake-Up Call SlideShare • robots.txt unclear, some pages disallowed • License seems to prohibit archiving GitHub • robots.txt unclear, some pages disallowed • License seems to allow archiving Drupal • robots.txt allows relevant URIs • License seems to prohibit archiving Open Science Framework • robots.txt does not disallow crawlers • License does not mention archiving, individual content may have specific license
  • 48. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Capture Tools Challenges: Mark’s SlideShare Live Internet Archive Webrecorder.io https://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and-the-power-to-name http://web.archive.org/web/20161229053246/http://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and-the-power-to-name https://webrecorder.io/martinklein/cni_test/20170330014029/https://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and- the-power-to-name
  • 49. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Capture Tools Challenges: Mark’s GitHub Live Internet Archive Webrecorder.io https://github.com/rightsstatements/rightsstatements.github.io https://web.archive.org/web/20170328040646/https://github.com/rightsstatements/rightsstatements.github.io https://webrecorder.io/martinklein/cni_test/20170330014135/https://github.com/rightsstatements/rightsstatements.github.io
  • 50. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Capture Tools Challenges: Mark’s OSF Live Internet Archive Webrecorder.io https://osf.io/h4ru8/wiki/home/ http://web.archive.org/web/20170328042647/https://osf.io/h4ru8/wiki/home/ https://webrecorder.io/martinklein/cni_test/20170330014219/https://osf.io/h4ru8/wiki/home/
  • 51. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Capture Quality - How well was this page archived? • Continuing research on memento damage, first published at JCDL 2014 • Premise: simply reporting “9/10 embedded images were archived” is insufficient to describe how well the archive / replay system performed • Use heuristics from Mechanical Turk testing to approximate human conception of damage, e.g.: o increase weight of missing images that are large, or centered in the viewport o stylesheets can be important! check for “ugly” results J.F. Brunelle, M. Kelly, H. SalahEldeen M. C. Weigle, and M. L. Nelson (2014) Not all mementos are created equal: Measuring the impact of missing resources. In: JCDL 2014 http://dx.doi.org/10.1109/JCDL.2014.6970187 http://dx.doi.org/10.1007/s00799-015-0150-6
  • 52. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Triptych CSS “regular” web pages have nearly equal distribution of content over each third of a page if a CSS is missing AND > 75% of the non-background color is in the left 2/3s of the page, then users consider this damaged
  • 53. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 A Memento Damage Service, Python Library, and Docker Image Erika Siregar http://memento-damage.cs.odu.edu/ https://github.com/erikaris/web-memento-damage
  • 54. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Just a Little Bit of Damage…
  • 55. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Moderate Damage…
  • 56. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Significant Damage…
  • 57. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Ian’s GitHub Memento… https://github.com/ianmilligan1/Historian-WARC-1 http://web.archive.org/web/20130922192416/https://github.com/ianmilligan1/Historian-WARC-1
  • 58. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 … Has Slight Damage does not appear to violate the “75% / left- 2/3s” rule
  • 59. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Capture Authenticity - Has this page been tampered with? • The days of implicitly trusting Brewster & IA are over o the people who brought you fake news will eventually bring you fake archives o “mo’ archives mo’ problems” • Premise: use multiple, independent archives to record fixity information from dated observations of mementos • Plans: o blockchain o provenance (i.e., a memento of memento != 2 independent mementos) https://climate.nasa.gov/vital-signs/carbon-dioxide/ http://web.archive.org/web/20170312201332/https://climate.nasa.gov/vital-signs/carbon-dioxide/ http://localhost:8282/michael/wayback/20170313023607/https://climate.nasa.gov/vital-signs/carbon-dioxide/
  • 60. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Push a Web Page into Multiple Archives Mohamed Aturban (2017) Archive Now (archivenow): A Python Library to Integrate On-Demand Archives http://ws-dl.blogspot.com/2017/02/2017-02-22-archive-now-archivenow.html
  • 61. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Record Fixity in a Manifest File Shawn Jones (2016) Mementos In the Raw, Take Two http://ws-dl.blogspot.com/2016/08/2016-08-15-mementos-in-raw-take-two.html
  • 62. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Publish Manifest to the Web
  • 63. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Archive the Manifest
  • 64. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 “You can’t tell the players without a scorecard” – Harry M. Stevens
  • 65. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Verifying the Authenticity of a Memento • Given a Memento, URI-M, that we wish to verify • Lookup the URI-M at a manifest server o e.g, captureproject.org/{URI-M} • Discover all the mementos of the manifest, and verify their integrity with “trusty URIs” • For each URI-M listed in the manifest, repeat the fixity calculation as described in the manifest • Vote if fixity matches (not tampered with) or if fixity doesn’t match (tampered with) o Majority vote wins (assuming independent archives) Mohamed Aturban (2017) Summary of "Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data” http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html Video at https://www.youtube.com/watch?v=EY15lj-7_lc
  • 66. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Discussion
  • 67. @hvdsomp, @phonedude_mln, @mart1nkle1n CNI Spring 2017, Albuquerque, NM, 3 Apr 2017 Herbert Van de Sompel Los Alamos National Laboratory @hvdsomp http://orcid.org/0000-0002-0715-6126 Michael L. Nelson Old Dominion University @phonedude_mln http://orcid.org/0000-0003-3749-8116 Martin Klein Los Alamos National Laboratory @mart1nkle1n http://orcid.org/0000-0003-0130-2097 To the Rescue of the Orphans of Scholarly Communication The project is funded by the Andrew W. Mellon Foundation