Achieving Link Integrity for Managed Collections

Looks at hyperlinks from the perspective of a managed collection of resources for which link persistence/integrity is considered a quality of service concern. Distinguishes between links into other managed collections and to the web at large. Considers link rot and content drift.


  1. Achieving Link Integrity for Managed Collections
      Herbert Van de Sompel (@hvdsomp), Los Alamos National Laboratory
      Thor Conference, Rome, Italy, November 15 2017
      Photo by Eric Sieverts
  2. Hyperlinks in Theory
  3. Hyperlinks in Reality
  4. Hyperlinks in Reality
  5. Link Rot
  6. Link Rot
  7. Hyperlinks in Reality
  8. Content Drift
  9. Content Drift
  10. Content Drift: http://dl00.org in 2000, 2004, 2005, 2008
  11. Content Drift: http://icecube.wisc.edu/ on May 8 2009 (left) and August 27 2009 (right)
  12. No Content Drift: http://www.ifa.hawaii.edu/~cowie/k_table.html on June 9 1997 (left) and March 2016 (right)
  13. The Web, All Hyperlinks Subject to Link Rot, Content Drift
  14. The Web, All Hyperlinks Subject to Reference Rot
      • Reference rot hinders our ability to follow links as they were intended when they were put in place:
        • Link rot: a link stops working altogether
        • Content drift: the linked content changes over time and may eventually no longer be representative of the content that was originally linked
  15. Creating Pockets of Persistence
      • How to maintain the integrity of links?
      • This challenge exists for the entire web. Some communities with well-managed collections care about addressing it because they consider it a Quality of Service issue:
        • Scholarly communication
        • Cultural heritage
        • Legal publications
        • Government communication
        • Journalism
        • Wikipedia
        • …
      • What can these communities do to create Pockets of Persistence?
  16. A Managed Collection Desires Reliable Outlinks
  17. Links to another Managed Collection
  18. Links to Web at Large Resources
  19. Exploring Link Rot & Content Drift
  20. <Intermezzo - Hiberlink Study re Reference Rot in STM Articles>
  21. PubMed Central Corpus

      PMC articles published 1997-2012        PMC
      Total                                   479,194
      With links to articles                  240,857
      With links to web-at-large resources    156,160

      Links                                   PMC
      To articles                             744,678
      To web-at-large resources               480,853

  22. Links to Articles & to Web At Large Resources - PMC
      Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE https://doi.org/10.1371/journal.pone.0115253
  23. </Intermezzo - Hiberlink Study re Reference Rot in STM Articles>
  24. Exploring Link Rot & Content Drift
  25. Link Rot Occurs when B Moves to C
  26. Introduce PID(B)
  27. Link to PID(B); HTTP Redirect from PID(B) to B
  28. When B Moves to C: HTTP Redirect from PID(B) to C
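
Slides 26 to 28 describe the PID pattern: links point at PID(B), and the resolver redirects to wherever the object currently lives. The following is a minimal sketch of that idea, not an actual resolver; the class `PidResolver`, the identifier `pid:B`, and the `example.org` URLs are all hypothetical names chosen for illustration.

```python
class PidResolver:
    """Toy in-memory resolver: maps a persistent identifier to its current URL."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, url):
        # Mint PID(B): record where the object lives right now.
        self._locations[pid] = url

    def move(self, pid, new_url):
        # The object moved (B -> C): only the resolver entry changes.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid):
        # Mimic an HTTP redirect: a status code plus a Location value.
        return 302, self._locations[pid]


resolver = PidResolver()
resolver.register("pid:B", "http://old.example.org/B")
before = resolver.resolve("pid:B")

resolver.move("pid:B", "http://new.example.org/C")
after = resolver.resolve("pid:B")
```

A link that was written against `pid:B` keeps working after the move, which is the whole point of the pattern; a link written against `http://old.example.org/B` does not benefit.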
  29. Core Assumption: PID(B) Will Be Used for Linking
  30. Herbert Van de Sompel, Martin Klein, and Shawn Jones (2016) Persistent URIs Must Be Used to Be Persistent. In: WWW2016. http://arxiv.org/abs/1602.09102
  31. A Disconcerting Observation
      • When classifying links extracted from PMC as linking to articles, we assumed that filtering on http://dx.doi.org/* would do the trick
      • But we found many links of the form http://link.springer.com/article/*
      • For example:
        • http://link.springer.com/article/10.1007%2Fs00799-014-0108-0
      • Instead of:
        • http://dx.doi.org/10.1007/s00799-014-0108-0
      • We used CrossRef's Reverse Domain Lookup to classify these extracted links as linking to articles
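
The classification problem on slide 31 can be sketched as a host-based check. This is an illustration only: the set `PUBLISHER_HOSTS` is a hypothetical, hand-written stand-in for the CrossRef lookup the slide mentions, and the category labels are made up for this example.

```python
from urllib.parse import urlsplit

# Hosts of DOI resolvers: links to these use the persistent identifier.
DOI_RESOLVER_HOSTS = {"dx.doi.org", "doi.org"}

# Hypothetical stand-in for a publisher-domain lookup service.
PUBLISHER_HOSTS = {"link.springer.com"}


def classify_link(url):
    """Classify a URI reference by the host it points at."""
    host = (urlsplit(url).hostname or "").lower()
    if host in DOI_RESOLVER_HOSTS:
        return "article (persistent URI)"
    if host in PUBLISHER_HOSTS:
        return "article (publisher URI)"
    return "web at large"


doi_kind = classify_link("http://dx.doi.org/10.1007/s00799-014-0108-0")
landing_kind = classify_link(
    "http://link.springer.com/article/10.1007%2Fs00799-014-0108-0"
)
other_kind = classify_link("http://icecube.wisc.edu/")
```

Filtering on the DOI resolver host alone would have put the second URL in the "web at large" bucket, which is the misclassification the slide warns about.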
  32. URI References - PMC
      Herbert Van de Sompel, Martin Klein, and Shawn Jones (2016) Persistent URIs Must Be Used to Be Persistent. In: WWW2016. http://arxiv.org/abs/1602.09102
  33. <Intermezzo - Signposting the Scholarly Web>
      Cartoon by Patrick Hochstenbach
      http://signposting.org
  34. Signposting the Scholarly Web
      • Proposal: use typed links to address some long-standing problems regarding scholarly resources on the web, by interlinking them using appropriate relation types
      • Focus on a limited set of patterns to support uniformly:
        • Conveying a Persistent Identifier
        • Expressing the web boundary of a scholarly resource
        • Making bibliographic metadata discoverable
        • Conveying an Author Identifier
        • Conveying a license that applies to a resource
        • Conveying a resource type
  35. HTTP Links
      Mark Nottingham (2017) RFC8288: Web Linking http://tools.ietf.org/rfc/rfc8288.txt
  36. HTTP Links
  37. HTTP Links
  38. HTTP Links Are Used
      curl -I http://dbpedia.org/data/Reykjavik
      HTTP/1.1 200 OK
      Date: Thu, 27 Oct 2016 04:43:28 GMT
      Content-Type: application/rdf+xml; charset=UTF-8
      Content-Length: 1210
      Link: <http://creativecommons.org/licenses/by-sa/3.0>; rel="license",
       <http://dbpedia.org/data/Reykjavik>; rel="alternate"; type="text/n3",
       <http://dbpedia.org/resource/Reykjavik>; rel="describes",
       <http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/data/Reykjavik>; rel="timegate"
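
To make the `Link` header on slide 38 concrete, here is a small sketch that splits such a header value into targets and parameters. It is a simplified parser written for this example, not a complete RFC 8288 implementation: it assumes parameters of the form `name="value"` or `name=value` and ignores rarer syntax. The function name `parse_link_header` is an assumption of this sketch.

```python
import re

# One link: <target> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)'
)
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value):
    """Return a list of (target URI, {parameter: value}) pairs."""
    links = []
    for target, params in LINK_RE.findall(value):
        attrs = {}
        for name, quoted, bare in PARAM_RE.findall(params):
            attrs[name.lower()] = quoted or bare
        links.append((target, attrs))
    return links


# Header value taken from the dbpedia response shown on the slide.
header = (
    '<http://creativecommons.org/licenses/by-sa/3.0> ; rel="license", '
    '<http://dbpedia.org/data/Reykjavik> ; rel="alternate"; type="text/n3", '
    '<http://dbpedia.org/resource/Reykjavik>; rel="describes", '
    '<http://mementoarchive.lanl.gov/dbpedia/timegate/'
    'http://dbpedia.org/data/Reykjavik> ; rel="timegate"'
)
links = parse_link_header(header)
rels = [attrs.get("rel") for _, attrs in links]
```

Matching on the angle brackets first keeps commas and slashes inside a target URI from being mistaken for separators between links.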
  39. For PIDs: Use cite-as Relation Type
      Van de Sompel, H., Nelson, M., Bilder, G., Kunze, J., and Warner, S. (2017) "cite-as": A Link Relation to Convey a Preferred URI for Referencing https://datatracker.ietf.org/doc/draft-vandesompel-citeas/
  40. For PIDs: Use cite-as Relation Type
      Van de Sompel, H., Nelson, M., Bilder, G., Kunze, J., and Warner, S. (2017) "cite-as": A Link Relation to Convey a Preferred URI for Referencing https://datatracker.ietf.org/doc/draft-vandesompel-citeas/
  41. For PIDs: Use cite-as Relation Type
      • The target URI (PID) of the cite-as link can be picked up by applications, e.g.:
        • reference managers can pick up the PID of an object when the user saves it while on the landing page or on one of the constituent resources
        • publication pipelines can pick up the PID by looking up (HTTP HEAD) URIs referenced in a paper to determine whether a PID exists for them
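
The publication-pipeline case on slide 41 might look roughly like the sketch below: issue an HTTP HEAD request for a referenced URI and look for a `cite-as` link in the response. The function names `find_cite_as` and `lookup_pid` are assumptions of this example, the regular expression is a simplification of the `Link` header syntax, and the sample header uses made-up `example.org` targets and a DOI under the 10.5555 test prefix.

```python
import re
import urllib.request


def find_cite_as(link_header):
    """Return the target of the first link whose rel includes cite-as."""
    pattern = r'<([^>]*)>[^<]*?rel\s*=\s*"?([^";,]+)"?'
    for target, rel in re.findall(pattern, link_header):
        # A rel value may hold several space-separated relation types.
        if "cite-as" in rel.lower().split():
            return target
    return None


def lookup_pid(url, timeout=10):
    """HEAD the URL and return its cite-as target, or None if absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        headers = response.headers.get_all("Link") or []
    return find_cite_as(", ".join(headers))


# Hypothetical header value, used here instead of a live request.
sample = (
    '<https://example.org/license>; rel="license", '
    '<https://doi.org/10.5555/12345678>; rel="cite-as"'
)
pid = find_cite_as(sample)
```

A pipeline could call `lookup_pid` for each URI referenced in a submitted paper and, when a value comes back, suggest replacing the reference with the PID.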
  42. </Intermezzo - Signposting the Scholarly Web>
      Cartoon by Patrick Hochstenbach
      http://signposting.org
  43. PID Alternative - When B Moves to C: HTTP Redirect from B to C
  44. PID Alternative - When B Moves to C: HTTP Redirect from B to C
      • Custodian of C needs to hold on to the domain of B
      • Custodian of C needs to establish redirection patterns; often those are rather simple rules
      • The problem of getting people to link to PID(B) does not arise: the URI in the browser address bar (initially B, later C) is just fine
  45. Exploring Link Rot & Content Drift
  46. Content Drift Occurs when B Changes over Time
  47. Content Drift Occurs when B Changes over Time
      • Is not really considered an issue because:
        • the objects that receive PIDs are typically static, e.g. scientific papers
        • when a (substantially) new version of an object is published, typically a new PID is assigned
      • But:
        • how to verify that the retrieved version of an object is indeed the referenced version of the object?
      • Requires:
        • archiving objects in trusted archive(s)
        • ability to retrieve objects from the archive(s)
  48. Archived Articles
      David Rosenthal (2013) Patio Perspectives at ANADP II: Preserving the Other Half http://blog.dshr.org/2013/11/patio-perspectives-at-anadp-ii.html
      Slide annotations: "Too few", "Too low risk"
  49. How to Audit Whether a PID-identified Object is Archived
      • http://thekeepers.org
      • Journal, volume, issue centric
      • Global audit by DOI?
  50. Contrast: All Web-Archived Versions of David's Blog Post
      • Global audit by HTTP URI
      • Uses Memento infrastructure
      • http://timetravel.mementoweb.org
  51. Exploring Link Rot & Content Drift
  52. Scholarly Context Adrift
      Shawn Jones, Herbert Van de Sompel, et al. (2016) Scholarly context adrift. In: PLOS ONE https://doi.org/10.1371/journal.pone.0167475
  53. How to Assess Content Drift?
  54. Step 1: Find Pre/Post Mementos
  55. Step 2: Select Representative Mementos
  56. Text Similarity Measures
      • Compute aggregate text similarity scores (values between 0 and 100) for:
        • Simhash
        • Jaccard
        • Sørensen-Dice
        • Cosine
      • If the aggregate score is 100, we decide that the Pre/Post Mementos are representative
      • We find 137K URI references out of 480K that have representative Mementos
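
Three of the measures named on slide 56 are easy to illustrate on word tokens. The sketch below is only meant to show what the scores express; the tokenization, the scaling to 0 to 100, and the function names are assumptions of this example and not the procedure used in the study. Simhash is left out because it needs a fingerprinting scheme that the slide does not specify.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def jaccard(a, b):
    """Shared distinct tokens over all distinct tokens, scaled to 0..100."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 100.0
    return 100.0 * len(sa & sb) / len(sa | sb)


def sorensen_dice(a, b):
    """Twice the shared distinct tokens over the sum of both set sizes."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 100.0
    return 100.0 * 2 * len(sa & sb) / (len(sa) + len(sb))


def cosine(a, b):
    """Cosine of the angle between the two term-frequency vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    if not ca and not cb:
        return 100.0
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return 100.0 * dot / norm if norm else 0.0


pre = "IceCube neutrino observatory news"
post = "IceCube neutrino observatory events"
scores = (jaccard(pre, post), sorensen_dice(pre, post), cosine(pre, post))
```

Two snapshots with identical text score 100 on all three measures, which matches the slide's criterion for calling a pair of Mementos representative.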
  57. Step 3: Dereference Live Web Version of URI
  58. Step 4: Representative Memento vs. Live Version
  59. Content Drift - PMC
      Shawn Jones, Herbert Van de Sompel, et al. (2016) Scholarly context adrift. In: PLOS ONE https://doi.org/10.1371/journal.pone.0167475
  60. Reference Rot for Links to Web at Large is Severe
      • Link rot and content drift are severe
      • Cannot retrieve originally linked content from the live web
      • Can potentially retrieve originally linked content from web archives
      • But the archival coverage is too poor, a result of incidental archiving
  61. URI References without Representative Mementos - PMC
      Shawn Jones, Herbert Van de Sompel, et al. (2016) Scholarly context adrift. In: PLOS ONE https://doi.org/10.1371/journal.pone.0167475
  62. Impact of Archival Gap on Links from Managed Collections
      Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE https://doi.org/10.1371/journal.pone.0115253
      Figure: links from managed collections to domains; grey: linked content not archived
  63. Uncertainty Regarding the Future of B when A Links to It
  64. Custodian of A Takes a Snapshot of B when Linking to It
  65. Taking a Snapshot of B: Automation is Key
      • Web archive APIs for on-demand archiving
        • perma.cc, Internet Archive, archive.is, webcitation
      • Amber for Wordpress & Drupal archives resources linked in a page
        • http://amberlink.org/
      • Hiberlink's experimental Zotero extension archives bookmarked URLs
        • http://hiberlink.org/zotero.html
      • Hiberlink's experimental HiberActive archives all URLs referenced in a newly submitted paper
        • https://www.slideshare.net/martinklein0815/hiberactive
  66. site2cite
      http://site2cite
  67. Custodian of A Links to Snapshot of B
      • Typical practice for linking to snapshots: <a href="URL of snapshot of B">
      • Problems with this practice:
        o Impossible to visit the original URI, if desired
        o Requires the permanent existence/uptime of the archive that holds the snapshot
          - One link rot problem replaced by another
      http://robustlinks.mementoweb.org/about/
  68. Permanent Existence/Uptime of Archives?
      Capture of http://webcitation.org dated July 17 2013 https://archive.today/eAETp
  69. Permanent Existence/Uptime of Archives?
      Remnant of discontinued web archive http://mummify.it captured on February 14 2014 https://web.archive.org/web/20140214233752/https://www.mummify.it/
  70. Permanent Existence/Uptime of Archives?
      http://www.themoscowtimes.com/news/article/russia-bans-wayback-machine-internet-archive-over-islamic-state-video/510074.html
  71. Permanent Existence/Uptime of Archives?
      http://web.archive.org/web/20121101043952/http://vogin.nl on March 6 2017 at 15:59 CET
  72. Custodian of A Links to Snapshot of B, Decorates the Link
      • Desired practice for linking to captures is to decorate the link so it provides a variety of options: <a href="URL of snapshot of B" data-originalurl="B" data-versiondate="datetime of snapshot of B">
      • Supports:
        o Revisiting the original URL
        o Finding snapshots in any web archive (via original URL)
        o Finding a temporally appropriate snapshot in any web archive (via original URL & snapshot datetime)
        o Automatically accessing a temporally appropriate snapshot in any web archive (Memento protocol using original URL & snapshot datetime)
      http://robustlinks.mementoweb.org/spec/
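
The link decoration on slide 72 can be generated mechanically once the snapshot URL, the original URL, and the snapshot datetime are known. The helper below is a sketch under assumptions: the function name `robust_link` is made up, the date format `YYYY-MM-DD` is assumed for illustration (consult the Robust Links specification for the accepted formats), and the example values reuse the vogin.nl snapshot URL from slide 71.

```python
from html import escape


def robust_link(snapshot_url, original_url, version_date, text):
    """Build an anchor that links to a snapshot and records its origin."""
    return (
        '<a href="{}" data-originalurl="{}" data-versiondate="{}">{}</a>'
    ).format(
        escape(snapshot_url, quote=True),   # where the link points today
        escape(original_url, quote=True),   # the URI that was originally meant
        escape(version_date, quote=True),   # when the snapshot was taken
        escape(text),
    )


anchor = robust_link(
    "http://web.archive.org/web/20121101043952/http://vogin.nl",
    "http://vogin.nl",
    "2012-11-01",  # assumed format, derived from the snapshot timestamp
    "VOGIN",
)
```

Because the original URL and datetime travel with the link, a reader or script can still look for a suitable snapshot in another archive if the one in `href` disappears.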
  73. Robust Links: Link Decoration in Action
      • See Robust Links at work in: Van de Sompel, H. & Nelson, M.L. (2015) Reminiscing about 15 years of interoperability efforts. D-Lib Magazine. https://doi.org/10.1045/november2015-vandesompel
      • JavaScript makes the link decorations actionable
      • Robust Links JavaScript: https://github.com/mementoweb/robustlinks
  74. Recap - A Managed Collection Desires Reliable Outlinks
  75. Takeaways
      • When it comes to links to managed collections, the custodian of the linking collection relies on the custodians of the linked collections to preserve link integrity.
      • PIDs and HTTP redirects are managed by the custodians of the linked collections.
  76. Takeaways
      • When it comes to links to web at large resources, the custodian of a linking collection cannot rely on the custodians of those linked resources to maintain link integrity.
      • Creation of Mementos and Robust Links is managed by the custodian of the collection that links to web at large resources.
  77. Achieving Link Integrity for Managed Collections
      Herbert Van de Sompel (@hvdsomp), Los Alamos National Laboratory
      Photo by Eric Sieverts
