Archive Assisted Archival Fixity Verification Framework
JCDL 2019
Urbana-Champaign, Illinois
June 2-6, 2019
Mohamed Aturban, Sawood Alam, Michael L. Nelson, and Michele C. Weigle
Old Dominion University
Department of Computer Science
Norfolk, Virginia 23529 USA
This is what https://climate.nasa.gov/vital-signs/carbon-dioxide/ looks like right now
Archive Assisted Archival Fixity Verification Framework · JCDL 2019 · June 4, 2019 · Urbana-Champaign, Illinois · @WebSciDL
The Internet Archive allows us to view
previous versions (mementos) of that page
http://web.archive.org/web/*/https://climate.nasa.gov/vital-signs/carbon-dioxide/
https://web.archive.org/web/20180726025537/https://climate.nasa.gov/vital-signs/carbon-dioxide/
An archived page (memento) from July 2018
The page is in other web archives
For a full list of public web archives, see: http://labs.mementoweb.org/aggregator_config/archivelist.xml
Typical archive URI construction:
wayback.example.org/SomeString/climate.nasa.gov/vital-signs/carbon-dioxide
Archives holding mementos of the page:
webarchive.loc.gov/all/*/climate.nasa.gov/vital-signs/carbon-dioxide/
arquivo.pt/wayback/*/climate.nasa.gov/vital-signs/carbon-dioxide/
perma-archives.org/warc/*/climate.nasa.gov/vital-signs/carbon-dioxide/
archive.is/climate.nasa.gov/vital-signs/carbon-dioxide/
wayback.archive-it.org/all/*/climate.nasa.gov/vital-signs/carbon-dioxide/
web.archive.org/web/*/climate.nasa.gov/vital-signs/carbon-dioxide/
Mementos available across these archives: 4,172, 62, 3, 13, 3, and 39
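The URI construction pattern above can be sketched as a small Bash function. This is an illustration only: `memento_uri` is a hypothetical helper, and it covers the `prefix/SomeString/URI-R` form used by most of the archives listed (archive.is, for example, omits the middle segment). Here `SomeString` is either a 14-digit timestamp or `*`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join an archive prefix, a datetime segment, and a URI-R
# following the "wayback.example.org/SomeString/<URI-R>" pattern.
memento_uri() {
  local prefix="${1%/}"   # archive prefix, e.g. web.archive.org/web (trailing "/" dropped)
  local datetime="$2"     # 14-digit YYYYMMDDhhmmss timestamp, or "*"
  local urir="$3"         # original URI (URI-R)
  printf '%s/%s/%s\n' "$prefix" "$datetime" "$urir"
}

urir="climate.nasa.gov/vital-signs/carbon-dioxide/"
for prefix in webarchive.loc.gov/all arquivo.pt/wayback wayback.archive-it.org/all web.archive.org/web; do
  memento_uri "$prefix" '*' "$urir"
done

# The July 2018 memento shown earlier:
memento_uri https://web.archive.org/web 20180726025537 https://climate.nasa.gov/vital-signs/carbon-dioxide/
```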
The web page is archived by
Michael’s Evil Wayback in July 2017
Michaelsevilwayback/web/20170717185130/https://climate.nasa.gov/vital-signs/carbon-dioxide/
Replaying the same memento in October 2017, we got a different CO2 value
Michaelsevilwayback/web/20170717185130/https://climate.nasa.gov/vital-signs/carbon-dioxide/
Which one is the real memento?
July 2017: Michael_evil_wayback/web/20170717185130/https://climate.nasa.gov/vital-signs/carbon-dioxide/
October 2017: Michael_evil_wayback/web/20170717185130/https://climate.nasa.gov/vital-signs/carbon-dioxide/
• How can we ensure that a memento has remained unaltered since the time it was captured by the archive?
Cryptographic hashes to create fixity information
• Common hash algorithms (e.g., MD5, SHA256): a small change in the input → a large change in the output
SHA256(HTML) = 9801151087e16d6bddb9e6b009fdb723abe51feab5480914a13063255ae46caa
SHA256(slightly modified HTML) = 5d4db590605c9023000d66226004534fe84a5549d535f91ecdf449525c1a37cf
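The avalanche property is easy to observe from the command line. A minimal sketch, assuming GNU coreutils `sha256sum` (`shasum -a 256` is the macOS equivalent); the two HTML strings are made-up stand-ins that differ in a single digit, not the NASA page, and `sha256_hex` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Print only the hex digest of stdin (sha256sum appends "  -").
sha256_hex() {
  sha256sum | cut -d ' ' -f 1
}

# Made-up inputs that differ in one character.
before='<p>CO2: 407 ppm</p>'
after='<p>CO2: 408 ppm</p>'

h1=$(printf '%s' "$before" | sha256_hex)
h2=$(printf '%s' "$after" | sha256_hex)
echo "$h1"
echo "$h2"

# Count positions where the digests agree; unrelated digests agree on
# about 4 of 64 hex digits on average (1 in 16).
same=0
for ((i = 0; i < 64; i++)); do
  if [[ "${h1:i:1}" == "${h2:i:1}" ]]; then
    same=$((same + 1))
  fi
done
echo "matching hex digits: $same of 64"
```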
Compute a hash value on the downloaded HTML (download the page, then compute its SHA256 hash):
$ curl -s https://climate.nasa.gov/vital-signs/carbon-dioxide/ | shasum -a 256
17710fd38d908a3cd124510f26adaec67e57e3f1d3aec1209c4ad4efbe2c035d
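Recording the result is what turns a one-off hash into fixity information that can be checked later. A minimal sketch, assuming the HTML has already been saved to a local file (in the deck it is piped straight from `curl -s`); `fixity_record` and the one-line log format `<URI-M> <UTC time> sha256:<hex>` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: hash a saved copy of a memento and append a record
# "<URI-M> <UTC time> sha256:<hex>" to a log file.
fixity_record() {
  local urim="$1" html_file="$2" log="$3" digest
  # The deck pipes `curl -s <URI>` into `shasum -a 256`; a saved file keeps this offline.
  digest=$(sha256sum < "$html_file" | cut -d ' ' -f 1)
  printf '%s %s sha256:%s\n' "$urim" "$(date -u +%Y%m%d%H%M%S)" "$digest" >> "$log"
  printf 'sha256:%s\n' "$digest"
}

tmp=$(mktemp -d)
printf '<html>CO2: 407 ppm</html>' > "$tmp/page.html"   # made-up stand-in for the page
fixity_record "Michael_evil_wayback/web/20170717185130/https://climate.nasa.gov/vital-signs/carbon-dioxide/" \
  "$tmp/page.html" "$tmp/fixity.log"
cat "$tmp/fixity.log"
```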
Timeline:
August 2017: HTML content is downloaded; SHA256 hash: e834c71aefda284fe03a4eed4e8cb78ea581537ba8884aecec29bd2d66cbf521
September 2017: the archived page has been tampered with by changing the value of CO2
October 2017: HTML content is downloaded; SHA256 hash: fc9088b3a614a58840bd5387d93c16be824cd2bbb3fab173f93fa57d241a3790
To verify fixity, compare the current hash with the previously calculated hash:
Hashes are NOT identical → the page has changed!
http://ws-dl.blogspot.com/2017/12/2017-12-11-difficulties-in-timestamping.html
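The comparison step can be sketched as follows, assuming the earlier digest was kept (for example, the August 2017 hash in the timeline). `verify_fixity` is hypothetical, and the two local files are made-up stand-ins for the original capture and the tampered replay.

```bash
#!/usr/bin/env bash
# Hypothetical helper: recompute the SHA-256 of a fresh copy and compare it
# with the digest recorded earlier. Exit status 0 = unchanged, 1 = changed.
verify_fixity() {
  local expected="$1" html_file="$2" current
  current=$(sha256sum < "$html_file" | cut -d ' ' -f 1)
  if [[ "$current" == "$expected" ]]; then
    echo "unchanged"
  else
    echo "changed: expected $expected, got $current"
    return 1
  fi
}

tmp=$(mktemp -d)
printf '<html>CO2: 407 ppm</html>' > "$tmp/aug.html"          # copy downloaded at capture time
expected=$(sha256sum < "$tmp/aug.html" | cut -d ' ' -f 1)     # digest recorded then
printf '<html>CO2: 409 ppm</html>' > "$tmp/oct.html"          # later, tampered replay
verify_fixity "$expected" "$tmp/aug.html"
verify_fixity "$expected" "$tmp/oct.html" || echo "fixity check failed"
```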
Two approaches for verifying the
fixity of archived web pages
• The Atomic approach
• Generate a manifest file (a JSON file containing the fixity
information) for each memento
• Publish the manifest at a well-known web location
• Disseminate the published manifest to several archives
• The Block approach
• Batch together fixity information of multiple mementos
in a single binary-searchable file (or block)
• Publish the block at a well-known web location
• Disseminate the published block to several archives
(Use web archives to monitor web archives)
Atomic approach (step 1):
Push a web page into multiple archives
Web page: https://2019.jcdl.org/
https://archive.is/20181224085310/https://2019.jcdl.org/
https://web.archive.org/web/20181224085329/https://2019.jcdl.org/
https://perma-archives.org/warc/20181224085330/https://2019.jcdl.org/
http://www.webcitation.org/74tsy6pU0
This results in creating multiple mementos of the web page
Atomic approach (steps 2 & 3):
For each memento, compute fixity “manifest”
and publish it on the web at a well-known
Archival Fixity server
manifest.ws-dl.cs.odu.edu/manifest/https://archive.is/20181224085310/https://2019.jcdl.org/
manifest.ws-dl.cs.odu.edu/manifest/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/
manifest.ws-dl.cs.odu.edu/manifest/https://perma-archives.org/warc/20181224085330/https://2019.jcdl.org/
manifest.ws-dl.cs.odu.edu/manifest/http://www.webcitation.org/74tsy6pU0
• In this example https://manifest.ws-dl.cs.odu.edu is the Archival Fixity server
• Actual URIs to manifests can be a bit more complex using “Trusty URIs”:
http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html
The manifest’s generic URI always redirects to the most recent time-specific manifest version (trusty URI):
$ curl -sIL https://manifest.ws-dl.cs.odu.edu/manifest/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/ | egrep -i "(HTTP/|^location:)"
HTTP/2 302
location: https://manifest.ws-dl.cs.odu.edu/manifest/20181224093024/8c31ccfbb3a664c9160f98be466b7c9fb9afa80580ab5052001174be59c6a73a/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/
HTTP/2 200
The requested URI is the manifest’s generic URI; the location header gives the manifest’s trusty URI.
The structure of generic URIs is easy to remember:
<Archival-Fixity-Server>/<URI to memento>
so they can be used to look up manifests both in the Archival Fixity server and in archives.
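The two URI forms can be built and taken apart with plain Bash. A sketch only: `generic_manifest_uri` and `parse_trusty_uri` are hypothetical helpers, and the layout they assume (`<server>/manifest/<14-digit datetime>/<64-hex hash>/<URI-M>`) is read off the example redirect, not taken from a specification.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the layout is read off the example redirect:
#   generic: <server>/manifest/<URI-M>
#   trusty:  <server>/manifest/<14-digit datetime>/<64-hex hash>/<URI-M>
server="https://manifest.ws-dl.cs.odu.edu"

generic_manifest_uri() {
  printf '%s/manifest/%s\n' "$server" "$1"
}

# Print "<datetime> <hash> <URI-M>" for a trusty URI; fail on anything else.
parse_trusty_uri() {
  local rest="${1#"$server"/manifest/}"   # strip the literal server prefix
  local re='^([0-9]{14})/([0-9a-f]{64})/(.+)$'
  if [[ "$rest" != "$1" && "$rest" =~ $re ]]; then
    printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    return 1
  fi
}

generic_manifest_uri "https://web.archive.org/web/20181224085329/https://2019.jcdl.org/"
parse_trusty_uri "https://manifest.ws-dl.cs.odu.edu/manifest/20181224093024/8c31ccfbb3a664c9160f98be466b7c9fb9afa80580ab5052001174be59c6a73a/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/"
```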
{
  "@context": "http://manifest.ws-dl.cs.odu.edu/",
  "created": "Sun, 23 Dec 2018 11:43:55 GMT",
  "@id": "http://manifest.ws-dl.cs.odu.edu/manifest/20181223114355/c6ad485819abbe20e37c0632843081710c95f94829f59bbe3b6ad3251d93f7d2/https://web.archive.org/web/20181219102034/https://2019.jcdl.org/",
  "uri-r": "https://2019.jcdl.org/",
  "uri-m": "https://web.archive.org/web/20181219102034/https://2019.jcdl.org/",
  "memento-datetime": "Wed, 19 Dec 2018 10:20:34 GMT",
  "http-headers": {
    "Content-Type": "text/html; charset=UTF-8",
    "X-Archive-Orig-date": "Wed, 19 Dec 2018 10:20:36 GMT",
    "X-Archive-Orig-link": "<https://2019.jcdl.org/wp-json/>; rel=\"https://api.w.org/\"",
    "Preference-Applied": "original-links, original-content"
  },
  "hash-constructor": "(curl -s '$uri-m' && echo -n '$Content-Type $X-Archive-Orig-date $X-Archive-Orig-link') | tee >(sha256sum) >(md5sum) >/dev/null | cut -d ' ' -f 1 | paste -d':' <(echo -e 'md5\\nsha256') - | paste -d' ' - -",
  "hash": "md5:969d7aba4c16444a6544bdc39eefe394 sha256:c68a215eb1c3edbf51f565b9a87f49646456369e51791a86106a6667630737a6"
}
A manifest file example
• It records how the hashes are computed ("hash-constructor")
• Hashes are computed only on the base HTML file
• Fixity is also computed on things that should not change, such as certain original HTTP response headers
• "@id" holds the manifest’s trusty URI
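The `hash-constructor` can be re-run on local data to see exactly which bytes are hashed: the base HTML followed by three header values joined by single spaces, with no trailing newline. A minimal sketch, assuming the HTML is already in a file and the header values come from the manifest's `http-headers`; `manifest_hash` is hypothetical. One deliberate difference: the manifest's pipeline runs `sha256sum` and `md5sum` in parallel through `tee >(...) >(...)`, so the order in which their lines reach `paste` depends on which process finishes first, whereas this sketch runs them one after the other so the `md5:`/`sha256:` labels always line up.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the manifest's hash-constructor on local input.
manifest_hash() {
  local html_file="$1" content_type="$2" orig_date="$3" orig_link="$4" tmp
  tmp=$(mktemp)
  # Same byte stream as: (curl -s URI-M && echo -n '<headers>')
  { cat "$html_file"; printf '%s %s %s' "$content_type" "$orig_date" "$orig_link"; } > "$tmp"
  # Sequential hashing keeps the output order fixed: md5 first, then sha256.
  printf 'md5:%s sha256:%s\n' \
    "$(md5sum < "$tmp" | cut -d ' ' -f 1)" \
    "$(sha256sum < "$tmp" | cut -d ' ' -f 1)"
  rm -f "$tmp"
}

page=$(mktemp)
printf '<html></html>' > "$page"   # made-up stand-in for the base HTML
manifest_hash "$page" "text/html; charset=UTF-8" "Wed, 19 Dec 2018 10:20:36 GMT" \
  '<https://2019.jcdl.org/wp-json/>; rel="https://api.w.org/"'
```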
Atomic approach (step 4):
Push manifests into multiple archives
• In this example, the memento is in the Internet Archive (IA) and
its fixity information is disseminated to four archives including IA
• An attacker would have to compromise a majority of the 4 domains (archives), i.e., at least 3 of them
Archived copies of the manifest manifest.ws-dl.cs.odu.edu/manifest/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/:
https://archive.is/20181224093334/http://manifest.ws-dl.cs.odu.edu/manifest/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/
https://web.archive.org/web/20181224093355/http://manifest.ws-dl.cs.odu.edu/manifest/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/
https://perma-archives.org/warc/20181224093354/http://manifest.ws-dl.cs.odu.edu/manifest/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/
http://www.webcitation.org/74tvdsyxe
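The "majority of 4 domains" argument translates into a simple vote over the archived manifest copies: with 4 archives, 3 matching copies are needed. A sketch under that assumption; `majority_hash` is hypothetical, and the altered value in the usage line is made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given the "hash" value read from each archived copy of a
# manifest, print the value held by a strict majority of copies, or fail.
majority_hash() {
  local total=$# count winner
  (( total > 0 )) || return 1
  # Count identical values; the most frequent one comes first.
  read -r count winner < <(printf '%s\n' "$@" | sort | uniq -c | sort -rn | head -n 1)
  if (( count * 2 > total )); then
    printf '%s\n' "$winner"
  else
    echo "no majority" >&2
    return 1
  fi
}

good="md5:969d7aba4c16444a6544bdc39eefe394 sha256:c68a215eb1c3edbf51f565b9a87f49646456369e51791a86106a6667630737a6"
# Four archives, one of them altered: the remaining three still form a majority.
majority_hash "$good" "$good" "$good" "md5:0 sha256:0"
```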
Block approach (step 1):
Batch together fixity information of
multiple mementos in a single file (block)
• Adding additional metadata (e.g., created_at, fields, …)
• The hash of the previous block must be added
!context ["http://oduwsdl.github.io/contexts/fixity"]
!fields {keys: ["surt"]}
!id {uri: "https://manifest.ws-dl.cs.odu.edu/"}
!meta {created_at: "20190111181327"}
!meta {prev_block:"sha256:d4eb1190f9aaae9542...845b632eb2b3f4f098a34144d"}
!meta {type: "FixityBlock"}
org,archive,web)/web/19961022175434/http://search.com
org,archive,web)/web/19961219082428/http://sho.com
org,archive,web)/web/19961223174001/http://reference.com
…
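A block like the one above can be assembled with standard tools. A sketch under several assumptions: `build_block` is hypothetical, each record is assumed to be `<SURT key> <fixity info>` (the slide shows only the keys), `prev_block` is assumed to be the SHA-256 of the previous block file's full contents, and the first block gets an empty `prev_block`. Sorting in the C locale keeps the records in byte order, which is what makes the file binary-searchable.

```bash
#!/usr/bin/env bash
# Hypothetical helper that writes a fixity block to stdout. The header lines
# copy the example; record lines are assumed to be "<SURT key> <fixity>".
export LC_ALL=C   # byte-wise ordering so the sorted records can be binary-searched

build_block() {
  local records_file="$1" prev_block_file="$2" prev=""
  # Chain to the previous block by hashing its full contents (empty for the first block).
  if [[ -n "$prev_block_file" ]]; then
    prev="sha256:$(sha256sum < "$prev_block_file" | cut -d ' ' -f 1)"
  fi
  printf '%s\n' \
    '!context ["http://oduwsdl.github.io/contexts/fixity"]' \
    '!fields {keys: ["surt"]}' \
    '!id {uri: "https://manifest.ws-dl.cs.odu.edu/"}'
  printf '!meta {created_at: "%s"}\n' "$(date -u +%Y%m%d%H%M%S)"
  printf '!meta {prev_block:"%s"}\n' "$prev"
  printf '%s\n' '!meta {type: "FixityBlock"}'
  sort "$records_file"
}

tmp=$(mktemp -d)
# Dummy fixity values; real records would carry the mementos' hashes.
printf '%s\n' \
  'org,archive,web)/web/19961223174001/http://reference.com sha256:dummy3' \
  'org,archive,web)/web/19961022175434/http://search.com sha256:dummy1' \
  'org,archive,web)/web/19961219082428/http://sho.com sha256:dummy2' > "$tmp/records.txt"
build_block "$tmp/records.txt" "" > "$tmp/block1.txt"
build_block "$tmp/records.txt" "$tmp/block1.txt" > "$tmp/block2.txt"
cat "$tmp/block2.txt"
```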
Block approach (step 2):
Publish the block file at the Archival Fixity server
The blocks entrypoint, manifest.ws-dl.cs.odu.edu/blocks, always redirects to the latest published block
Block approach (step 3):
Push the blocks entrypoint into
multiple archives
Blocks entrypoint: https://manifest.ws-dl.cs.odu.edu/blocks
Archived copies:
https://web.archive.org/web/20190121054059/https://manifest.ws-dl.cs.odu.edu/blocks/7bbf757046ac0a0a60015a1cb847c3189160d18c809b210073822df157609e01
https://perma.cc/8YG3-X7KN
• This also results in archiving the latest published block
Three steps to verify the fixity
of a memento
1. Discover a manifest/block
• In Atomic approach, this includes discovering the archived
manifests
2. Compute current fixity information of the memento
3. Compare current fixity information with the discovered
manifests/block.
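Steps 2 and 3 can be sketched as one function, assuming step 1 (discovery) has already produced the `hash` values of the manifests and that the file holds exactly the bytes the `hash-constructor` hashes (HTML plus the selected headers). `compare_fixity` is hypothetical and checks only the SHA-256 part of each value.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare the current SHA-256 of a file with the "hash"
# value of each discovered manifest ("md5:<hex> sha256:<hex>").
compare_fixity() {
  local content_file="$1"; shift
  local current expected manifest_hash matches=0
  current=$(sha256sum < "$content_file" | cut -d ' ' -f 1)
  for manifest_hash in "$@"; do
    expected=${manifest_hash##*sha256:}   # drop everything up to "sha256:"
    expected=${expected%% *}              # drop anything after the digest
    if [[ "$expected" == "$current" ]]; then
      matches=$((matches + 1))
    fi
  done
  echo "$matches/$# manifests match"
  (( $# > 0 && matches == $# ))           # success only if every manifest agrees
}

tmp=$(mktemp)
printf 'abc' > "$tmp"   # made-up content
h="md5:900150983cd24fb0d6963f7d28e17f72 sha256:ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
compare_fixity "$tmp" "$h" "$h" "$h"   # prints: 3/3 manifests match
```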
$ curl -sI https://manifest.ws-dl.cs.odu.edu/manifest/https://web.archive.org/web/20171115140705/http://rln.fm/ | egrep -i "(HTTP/|^location:)"
HTTP/2 302
location: https://manifest.ws-dl.cs.odu.edu/manifest/20181212074423/bd669de8835e38d54651fe9d04709515beec0c727db82a5366f4bc2506e103d8/https://web.archive.org/web/20171115140705/http://rln.fm/
An example of discovering the latest manifest in the Archival Fixity server for the memento web.archive.org/web/20171115140705/http://rln.fm/
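The discovery request can be scripted by reading the `location` header from `curl -sI` output. A minimal sketch; `latest_manifest_uri` is hypothetical, and the sample response is pasted from the example so the code runs offline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read HTTP response headers on stdin and print the
# first Location value (header names are case-insensitive; lines may end in CRLF).
latest_manifest_uri() {
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Live use: curl -sI "https://manifest.ws-dl.cs.odu.edu/manifest/$urim" | latest_manifest_uri
printf 'HTTP/2 302\r\nlocation: %s\r\n\r\n' \
  'https://manifest.ws-dl.cs.odu.edu/manifest/20181212074423/bd669de8835e38d54651fe9d04709515beec0c727db82a5366f4bc2506e103d8/https://web.archive.org/web/20171115140705/http://rln.fm/' |
  latest_manifest_uri
```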
Evaluation
• A data set of 1,000 mementos from the Internet Archive
• For each memento, we generated and disseminated 3 manifests
to 4 archives
• The average size
of a manifest file
is 1,157 bytes
• The manifest size represents 2.79% of the size of the downloaded HTML
Increasing the number of records per block
reduces the block generation time
The Block approach creates fewer resources
in archives than the Atomic approach
• Given a collection of N = 1,000 mementos
• K = 4 web archives
• The selected block size B = 100 records per block
• The total number of resources created in the archives:
• Atomic: N ∗ K = 4,000
• Block: K ∗ (N/B) = 40
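The counts can be checked with shell arithmetic. A sketch; the function names are hypothetical, and rounding N/B up (a partly filled last block still has to be archived) is our assumption for when B does not divide N. With N = 1,000 and B = 100 it changes nothing.

```bash
#!/usr/bin/env bash
# Atomic: one manifest per memento, pushed to every archive -> N * K
resources_atomic() { local n=$1 k=$2; echo $(( n * k )); }
# Block: one block per B records, pushed to every archive -> K * ceil(N / B)
resources_block() { local n=$1 k=$2 b=$3; echo $(( k * ((n + b - 1) / b) )); }

resources_atomic 1000 4      # 4000
resources_block 1000 4 100   # 40
```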
Dissemination/download time
varies from one archive to another
It takes 1.25x, 4x and 36x longer to disseminate a
manifest to perma.cc, archive.org, and
webcitation.org, respectively, than archive.is
It takes 3.5x longer to disseminate a
manifest to archive.org than perma.cc
Average time for discovering published
fixity information
The Block approach performs 4.46x faster than the
Atomic approach in verifying the fixity of mementos
• The fixity verification time includes:
• Discovering manifests
• Computing current fixity information
• Downloading the archived manifests
• Comparing results
• On average, the verification time of a memento is 6.65 seconds with the Atomic approach and 1.49 seconds with the Block approach
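The 4.46x figure is the ratio of the two averages (6.65 s / 1.49 s). A one-line check, with a hypothetical function name:

```bash
#!/usr/bin/env bash
# Ratio of average verification times (seconds), rounded to two decimals.
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

speedup 6.65 1.49   # Atomic vs. Block -> 4.46
```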
Conclusions
• The proposed approaches do not require any changes in the
infrastructure of web archives
• The Block approach creates fewer resources in archives and
reduces fixity verification time (4.46x faster than the Atomic
approach)
• The Atomic approach has the ability to verify the fixity of
archived pages even without using the Archival Fixity server
• Varying/increasing the block size could be one potential solution
to improve the Block approach performance and reduce the
number of resources created in archives
• Caching archived manifests/blocks in the Archival Fixity server
should improve the performance of both approaches

Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
 
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
 
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
 
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
 
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
 

Archive Assisted Archival Fixity Verification Framework

  • 1. Archive Assisted Archival Fixity Verification Framework JCDL 2019 Urbana-Champaign, Illinois June 2-6, 2019 Mohamed Aturban, Sawood Alam, Michael L. Nelson, and Michele C. Weigle Old Dominion University Department of Computer Science Norfolk, Virginia 23529 USA
  • 2. This is what https://climate.nasa.gov/vital-signs/carbon-dioxide/ looks like right now. Archive Assisted Archival Fixity Verification Framework · JCDL 2019 · June 4, 2019 · Urbana-Champaign, Illinois · @WebSciDL
  • 3. The Internet Archive allows us to view previous versions (mementos) of that page.
  • 4. http://web.archive.org/web/*/https://climate.nasa.gov/vital-signs/carbon-dioxide/
  • 5. An archived page (memento) from July 2018: https://web.archive.org/web/20180726025537/https://climate.nasa.gov/vital-signs/carbon-dioxide/
  • 6. The page is in other web archives. For a full list of public web archives, see: http://labs.mementoweb.org/aggregator_config/archivelist.xml
      Typical archive URI construction: wayback.example.org/SomeString/climate.nasa.gov/vital-signs/carbon-dioxide
      webarchive.loc.gov/all/*/climate.nasa.gov/vital-signs/carbon-dioxide/
      arquivo.pt/wayback/*/climate.nasa.gov/vital-signs/carbon-dioxide/
      perma-archives.org/warc/*/climate.nasa.gov/vital-signs/carbon-dioxide/
      archive.is/climate.nasa.gov/vital-signs/carbon-dioxide/
      wayback.archive-it.org/all/*/climate.nasa.gov/vital-signs/carbon-dioxide/
      web.archive.org/web/*/climate.nasa.gov/vital-signs/carbon-dioxide/
      Mementos available across these archives: 4,172; 62; 3; 13; 3; 39
  • 7. The web page is archived by Michael’s Evil Wayback in July 2017: Michaelsevilwayback/web/20170717185130/https://climate.nasa.gov/vital-signs/carbon-dioxide/
  • 8. Replaying the same memento in October 2017, we got a different CO2 value: Michaelsevilwayback/web/20170717185130/https://climate.nasa.gov/vital-signs/carbon-dioxide/
  • 9. Which one is the real memento, the July 2017 replay or the October 2017 replay of Michael_evil_wayback/web/20170717185130/https://climate.nasa.gov/vital-signs/carbon-dioxide/? How can we ensure that a memento has remained unaltered since the time it was captured by the archive?
  • 10. Cryptographic hashes to create fixity information. With common hash algorithms (e.g., MD5, SHA256), a small change in the input → a large change in the output:
      SHA256(HTML) = 9801 1510 87e1 6d6b ddb9 e6b0 09fd b723 abe5 1fea b548 0914 a130 6325 5ae4 6caa
      SHA256(HTML with a small change) = 5d4d b590 605c 9023 000d 6622 6004 534f e84a 5549 d535 f91e cdf4 4952 5c1a 37cf
  • 11. Compute a hash value on the downloaded HTML: download the page with curl, then compute its SHA256 hash with shasum.
      $ curl -s https://climate.nasa.gov/vital-signs/carbon-dioxide/ | shasum -a 256
      17710fd38d908a3cd124510f26adaec67e57e3f1d3aec1209c4ad4efbe2c035d
  • 12. To verify the fixity, compare the current hash with the previously calculated hash:
      • August 2017: HTML content is downloaded; SHA256 hash: e834 c71a efda 284f e03a 4eed 4e8c b78e a581 537b a888 4aec ec29 bd2d 66cb f521
      • September 2017: the archived page has been tampered with by changing the value of CO2
      • October 2017: HTML content is downloaded; SHA256 hash: fc90 88b3 a614 a588 40bd 5387 d93c 16be 824c d2bb b3fa b173 f93f a57d 241a 3790
      • Hashes are NOT identical → the page has changed!
      See: http://ws-dl.blogspot.com/2017/12/2017-12-11-difficulties-in-timestamping.html
  • 13. Two approaches for verifying the fixity of archived web pages (use web archives to monitor web archives):
      • The Atomic approach: generate a manifest file (a JSON file containing the fixity information) for each memento, publish the manifest at a well-known web location, and disseminate the published manifest to several archives.
      • The Block approach: batch together the fixity information of multiple mementos in a single binary-searchable file (or block), publish the block at a well-known web location, and disseminate the published block to several archives.
  • 14. Atomic approach (step 1): Push a web page (https://2019.jcdl.org/) into multiple archives. This results in creating multiple mementos of the web page:
      https://archive.is/20181224085310/https://2019.jcdl.org/
      https://web.archive.org/web/20181224085329/https://2019.jcdl.org/
      https://perma-archives.org/warc/20181224085330/https://2019.jcdl.org/
      http://www.webcitation.org/74tsy6pU0
  • 15. Atomic approach (steps 2 & 3): For each memento, compute a fixity “manifest” and publish it on the web at a well-known Archival Fixity server:
      manifest.ws-dl.cs.odu.edu/manifest/https://archive.is/20181224085310/https://2019.jcdl.org/
      manifest.ws-dl.cs.odu.edu/manifest/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/
      manifest.ws-dl.cs.odu.edu/manifest/https://perma-archives.org/warc/20181224085330/https://2019.jcdl.org/
      manifest.ws-dl.cs.odu.edu/manifest/http://www.webcitation.org/74tsy6pU0
      • In this example, https://manifest.ws-dl.cs.odu.edu is the Archival Fixity server.
      • Actual URIs to manifests can be a bit more complex, using “Trusty URIs”: http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html
  • 16. The manifest’s generic URI always redirects to the most recent time-specific manifest version (trusty URI). For example, for the Internet Archive memento:
      $ curl -sIL https://manifest.ws-dl.cs.odu.edu/manifest/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/ | egrep -i "(HTTP/|^location:)"
      HTTP/2 302
      location: https://manifest.ws-dl.cs.odu.edu/manifest/20181224093024/8c31ccfbb3a664c9160f98be466b7c9fb9afa80580ab5052001174be59c6a73a/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/
      HTTP/2 200
      The requested URI is the manifest’s generic URI; the location header gives the manifest’s trusty URI. The structure of generic URIs is easy to remember, <Archival-Fixity-Server>/<URI to memento>, so they can be used to look up manifests in both the Archival Fixity server and archives.
  • 17. A manifest file example (the "@id" field holds the manifest’s trusty URI):
      {
        "@context": "http://manifest.ws-dl.cs.odu.edu/",
        "created": "Sun, 23 Dec 2018 11:43:55 GMT",
        "@id": "http://manifest.ws-dl.cs.odu.edu/manifest/20181223114355/c6ad485819abbe20e37c0632843081710c95f94829f59bbe3b6ad3251d93f7d2/https://web.archive.org/web/20181219102034/https://2019.jcdl.org/",
        "uri-r": "https://2019.jcdl.org/",
        "uri-m": "https://web.archive.org/web/20181219102034/https://2019.jcdl.org/",
        "memento-datetime": "Wed, 19 Dec 2018 10:20:34 GMT",
        "http-headers": {
          "Content-Type": "text/html; charset=UTF-8",
          "X-Archive-Orig-date": "Wed, 19 Dec 2018 10:20:36 GMT",
          "X-Archive-Orig-link": "<https://2019.jcdl.org/wp-json/>; rel=\"https://api.w.org/\"",
          "Preference-Applied": "original-links, original-content"
        },
        "hash-constructor": "(curl -s '$uri-m' && echo -n '$Content-Type $X-Archive-Orig-date $X-Archive-Orig-link') | tee >(sha256sum) >(md5sum) >/dev/null | cut -d ' ' -f 1 | paste -d':' <(echo -e 'md5\nsha256') - | paste -d' ' - -",
        "hash": "md5:969d7aba4c16444a6544bdc39eefe394 sha256:c68a215eb1c3edbf51f565b9a87f49646456369e51791a86106a6667630737a6"
      }
      • The manifest includes how the hashes are computed ("hash-constructor").
      • Hashes are computed only on the base HTML file.
      • Fixity is also computed on things that should not change, such as certain original HTTP response headers.
  • 18. Atomic approach (step 4): Push manifests into multiple archives.
      • In this example, the memento is in the Internet Archive (IA), and its fixity information is disseminated to four archives, including IA.
      • An attacker would have to compromise a majority of the 4 domains (archives).
      Archived copies of the manifest for https://web.archive.org/web/20181224085329/https://2019.jcdl.org/:
      https://archive.is/20181224093334/http://manifest.ws-dl.cs.odu.edu/manifest/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/
      https://web.archive.org/web/20181224093355/http://manifest.ws-dl.cs.odu.edu/manifest/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/
      https://perma-archives.org/warc/20181224093354/http://manifest.ws-dl.cs.odu.edu/manifest/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/
      http://www.webcitation.org/74tvdsyxe (archiving manifest.ws-dl.cs.odu.edu/manifest/https://web.archive.org/web/20181224085329/https://2019.jcdl.org/)
  • 19. Block approach (step 1): Batch together the fixity information of multiple mementos in a single file (block).
      • Additional metadata is added (e.g., created_at, fields, …).
      • The hash of the previous block must be added.
      Example block:
        !context ["http://oduwsdl.github.io/contexts/fixity"]
        !fields {keys: ["surt"]}
        !id {uri: "https://manifest.ws-dl.cs.odu.edu/"}
        !meta {created_at: "20190111181327"}
        !meta {prev_block:"sha256:d4eb1190f9aaae9542...845b632eb2b3f4f098a34144d"}
        !meta {type: "FixityBlock"}
        org,archive,web)/web/19961022175434/http://search.com
        org,archive,web)/web/19961219082428/http://sho.com
        org,archive,web)/web/19961223174001/http://reference.com
        …
  • 20. Block approach (step 2): Publish the block file at the Archival Fixity server. The blocks entrypoint, manifest.ws-dl.cs.odu.edu/blocks, always redirects to the latest published block.
  • 21. Block approach (step 3): Push the blocks entrypoint (https://manifest.ws-dl.cs.odu.edu/blocks) into multiple archives. This also results in archiving the latest published block, for example:
      https://web.archive.org/web/20190121054059/https://manifest.ws-dl.cs.odu.edu/blocks/7bbf757046ac0a0a60015a1cb847c3189160d18c809b210073822df157609e01
      https://perma.cc/8YG3-X7KN
  • 22. Three steps to verify the fixity of a memento:
      1. Discover a manifest/block (in the Atomic approach, this includes discovering the archived manifests).
      2. Compute the current fixity information of the memento.
      3. Compare the current fixity information with the discovered manifests/block.
      An example of discovering the latest manifest in the Archival Fixity server for the memento web.archive.org/web/20171115140705/http://rln.fm/:
      $ curl -sI https://manifest.ws-dl.cs.odu.edu/manifest/https://web.archive.org/web/20171115140705/http://rln.fm/ | egrep -i "(HTTP/|^location:)"
      HTTP/2 302
      location: https://manifest.ws-dl.cs.odu.edu/manifest/20181212074423/bd669de8835e38d54651fe9d04709515beec0c727db82a5366f4bc2506e103d8/https://web.archive.org/web/20171115140705/http://rln.fm/
  • 23. Evaluation:
      • A data set of 1,000 mementos from the Internet Archive.
      • For each memento, we generated and disseminated 3 manifests to 4 archives.
      • The average size of a manifest file is 1,157 bytes.
      • The manifest size represents 2.79% of the size of the actual downloaded HTML.
  • 24. Increasing the number of records per block reduces the block generation time.
  • 25. The Block approach creates fewer resources in archives than the Atomic approach:
      • A collection of N = 1,000 mementos
      • K = 4 web archives
      • A selected block size of B = 100 records per block
      • Total number of resources created in the archives:
          • Atomic: N × K = 1,000 × 4 = 4,000
          • Block: K × (N / B) = 4 × (1,000 / 100) = 40
  • 26. Dissemination/download time varies from one archive to another.
  • 27. It takes 1.25x, 4x, and 36x longer to disseminate a manifest to perma.cc, archive.org, and webcitation.org, respectively, than to archive.is.
  • 28. It takes 3.5x longer to disseminate a manifest to archive.org than to perma.cc.
  • 29. Average time for discovering published fixity information.
  • 30. The Block approach is 4.46x faster than the Atomic approach in verifying the fixity of mementos.
      • The fixity verification time includes:
          • Discovering manifests
          • Computing current fixity information
          • Downloading the archived manifests
          • Comparing results
      • On average, the verification time of a memento is 6.65 seconds with the Atomic approach and 1.49 seconds with the Block approach.
  • 31. Conclusions:
      • The proposed approaches do not require any changes in the infrastructure of web archives.
      • The Block approach creates fewer resources in archives and reduces fixity verification time (4.46x faster than the Atomic approach).
      • The Atomic approach has the ability to verify the fixity of archived pages even without using the Archival Fixity server.
      • Varying/increasing the block size could be one potential solution to improve the Block approach’s performance and reduce the number of resources created in archives.
      • Caching archived manifests/blocks in the Archival Fixity server should improve the performance of both approaches.
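Slide 10's claim that a small change in the input produces a large change in the output can be seen with a minimal Python sketch using the standard `hashlib` module. The two HTML strings are made up for illustration (they are not taken from the real page) and differ only in one digit of a hypothetical CO2 reading; the two printed digests look unrelated.

```python
import hashlib

# Hypothetical HTML snippets (not from the real page) that differ
# only in one digit of the CO2 reading.
original = b"<html><body><p>CO2: 406 ppm</p></body></html>"
tampered = b"<html><body><p>CO2: 407 ppm</p></body></html>"

digest_original = hashlib.sha256(original).hexdigest()
digest_tampered = hashlib.sha256(tampered).hexdigest()

# Count how many of the 64 hex positions differ between the two digests.
differing = sum(a != b for a, b in zip(digest_original, digest_tampered))

print(digest_original)
print(digest_tampered)
print(f"{differing} of {len(digest_original)} hex digits differ")
```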
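A rough Python counterpart of slide 11's `curl -s ... | shasum -a 256` pipeline, assuming only the standard library: `sha256_of_url` (a hypothetical helper, not part of the framework) downloads the page with `urllib.request` and hashes the raw body with `hashlib`.

```python
import hashlib
import urllib.request


def sha256_hex(data: bytes) -> str:
    """Return the SHA256 digest of `data` as 64 lowercase hex characters
    (the value `shasum -a 256` prints before the file name)."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_url(url: str, timeout: float = 30.0) -> str:
    """Download `url` and hash the raw response body.

    Note: urllib follows HTTP redirects automatically, whereas `curl -s`
    without `-L` does not, so the two can differ for redirecting URIs.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return sha256_hex(resp.read())
```

Calling `sha256_of_url("https://climate.nasa.gov/vital-signs/carbon-dioxide/")` today will almost certainly not reproduce the hash on the slide: the live page changes over time, and the server may respond differently to different clients.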
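The comparison on slide 12 reduces to an equality test between the stored and the newly computed digest. This sketch uses the two hashes from that slide; `normalize_hex` and `fixity_holds` are hypothetical helpers. `normalize_hex` strips the grouping spaces used on the slide, and `hmac.compare_digest` is used simply as a safe string comparison.

```python
import hmac


def normalize_hex(digest: str) -> str:
    """Drop whitespace and lowercase, so grouped digests such as 'e834 c71a ...'
    compare equal to their ungrouped form."""
    return "".join(digest.split()).lower()


def fixity_holds(previous: str, current: str) -> bool:
    """True if the current hash equals the previously recorded one."""
    return hmac.compare_digest(normalize_hex(previous), normalize_hex(current))


# Hashes from slide 12: August 2017 vs. October 2017 downloads of the memento.
aug_2017 = "e834 c71a efda 284f e03a 4eed 4e8c b78e a581 537b a888 4aec ec29 bd2d 66cb f521"
oct_2017 = "fc90 88b3 a614 a588 40bd 5387 d93c 16be 824c d2bb b3fa b173 f93f a57d 241a 3790"

print(fixity_holds(aug_2017, oct_2017))  # False: the page has changed
```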
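The generic and trusty manifest URIs on slides 15, 16, and 22 follow a visible pattern. The helpers below are a sketch based only on those examples: `generic_manifest_uri` builds `<server>/manifest/<URI-M>`, and `parse_trusty_uri` splits a trusty URI into the 14-digit datetime, the 64-character hex hash, and the URI-M. The regular expression is an assumption inferred from the examples, not a published URI grammar.

```python
import re

SERVER = "https://manifest.ws-dl.cs.odu.edu"  # Archival Fixity server from the slides

# Pattern inferred from the example trusty URIs:
# <server>/manifest/<14-digit datetime>/<64 hex chars>/<URI-M>
TRUSTY_RE = re.compile(
    r"^(?P<server>https?://[^/]+)/manifest/"
    r"(?P<datetime>\d{14})/(?P<hash>[0-9a-f]{64})/(?P<urim>.+)$"
)


def generic_manifest_uri(urim: str, server: str = SERVER) -> str:
    """Build the generic manifest URI for a memento."""
    return f"{server}/manifest/{urim}"


def parse_trusty_uri(uri: str) -> dict:
    """Split a trusty manifest URI into server, datetime, hash, and URI-M."""
    match = TRUSTY_RE.match(uri)
    if match is None:
        raise ValueError(f"not a trusty manifest URI: {uri}")
    return match.groupdict()


print(generic_manifest_uri("https://web.archive.org/web/20181224085329/https://2019.jcdl.org/"))
```

A generic URI does not match the pattern (there is no datetime segment after `/manifest/`), so `parse_trusty_uri` raises `ValueError` for it; that makes the two forms easy to tell apart.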
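The `hash-constructor` on slide 17 hashes the memento's HTML followed by three header values. Below is a hedged Python reading of that shell pipeline, assuming the `$...` placeholders are replaced by the corresponding `http-headers` values before the command runs: the body bytes are concatenated with the space-separated header values (no trailing newline, as with `echo -n`), and both MD5 and SHA256 are computed. `manifest_hash` is a hypothetical name. Unlike the shell version, which labels the digests by position, this sketch labels each digest explicitly, so it does not depend on the order in which the two process substitutions write their output.

```python
import hashlib

# Header names taken from the manifest's hash-constructor, in the same order.
HASHED_HEADERS = ("Content-Type", "X-Archive-Orig-date", "X-Archive-Orig-link")


def manifest_hash(body: bytes, headers: dict, names=HASHED_HEADERS) -> str:
    """Hash the raw HTML body followed by the selected header values joined
    with single spaces, and return "md5:<hex> sha256:<hex>"."""
    data = body + " ".join(headers[name] for name in names).encode("utf-8")
    return f"md5:{hashlib.md5(data).hexdigest()} sha256:{hashlib.sha256(data).hexdigest()}"


# Header values copied from the manifest example on slide 17.
example_headers = {
    "Content-Type": "text/html; charset=UTF-8",
    "X-Archive-Orig-date": "Wed, 19 Dec 2018 10:20:36 GMT",
    "X-Archive-Orig-link": '<https://2019.jcdl.org/wp-json/>; rel="https://api.w.org/"',
}
# The body would be the memento's base HTML; a placeholder is used here.
print(manifest_hash(b"<html>...</html>", example_headers))
```

With the real base HTML of the memento as `body`, the result has the same `md5:... sha256:...` shape as the manifest's `hash` field; whether it reproduces that exact value depends on fetching byte-identical content.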
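Slide 18 notes that an attacker would have to compromise a majority of the four archives. One way to turn that into a check, assuming a strict-majority rule (the slides do not spell out the voting rule), is to accept the current hash only if more than half of the archived manifest copies report it. `majority_hash` and `verify_against_archives` are hypothetical helpers.

```python
from collections import Counter
from typing import List, Optional


def majority_hash(archived_hashes: List[str]) -> Optional[str]:
    """Return the hash reported by a strict majority of archived manifests,
    or None when no value has more than half of the votes."""
    if not archived_hashes:
        return None
    value, count = Counter(archived_hashes).most_common(1)[0]
    return value if count * 2 > len(archived_hashes) else None


def verify_against_archives(current: str, archived_hashes: List[str]) -> bool:
    """Fixity holds if the current hash equals the majority archived hash."""
    winner = majority_hash(archived_hashes)
    return winner is not None and winner == current
```

With four archives, a strict majority means three agreeing copies; a 2-to-2 split yields no verdict rather than picking a side.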
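A sketch of how a block like the one on slide 19 might be assembled and searched. Assumptions: `surt_key` is a simplified SURT conversion that happens to reproduce the keys shown (for example `org,archive,web)/web/19961022175434/http://search.com`); the JSON payload after each key is hypothetical, since the slide shows only keys; and `prev_block` is taken to be the SHA256 of the previous block's bytes. `build_block` and `lookup` are illustrative names. Sorting the records by key is what makes the block binary-searchable.

```python
import bisect
import hashlib
import json
from typing import Dict, Optional
from urllib.parse import urlsplit


def surt_key(uri: str) -> str:
    """Simplified SURT key: reversed host labels joined by ',', then ')' and
    the path (plus query). No www-stripping or other canonicalization."""
    parts = urlsplit(uri)
    host = ",".join(reversed((parts.hostname or "").split(".")))
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"{host}){path}"


def build_block(records: Dict[str, dict], created_at: str,
                prev_block: Optional[bytes] = None) -> bytes:
    """Serialize a block: '!' header lines, then records sorted by SURT key.
    Each record line is '<surt> <JSON payload>' (payload format hypothetical)."""
    header = [
        '!context ["http://oduwsdl.github.io/contexts/fixity"]',
        '!fields {keys: ["surt"]}',
        '!id {uri: "https://manifest.ws-dl.cs.odu.edu/"}',
        f'!meta {{created_at: "{created_at}"}}',
    ]
    if prev_block is not None:
        # Assumption: prev_block is the SHA256 of the previous block's bytes.
        digest = hashlib.sha256(prev_block).hexdigest()
        header.append(f'!meta {{prev_block:"sha256:{digest}"}}')
    header.append('!meta {type: "FixityBlock"}')
    keyed = sorted(((surt_key(u), p) for u, p in records.items()), key=lambda kp: kp[0])
    body = [f"{key} {json.dumps(payload)}" for key, payload in keyed]
    return ("\n".join(header + body) + "\n").encode("utf-8")


def lookup(block: bytes, key: str) -> Optional[dict]:
    """Binary-search the sorted record lines of a block for `key`."""
    lines = [ln for ln in block.decode("utf-8").splitlines() if not ln.startswith("!")]
    keys = [ln.split(" ", 1)[0] for ln in lines]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return json.loads(lines[i].split(" ", 1)[1])
    return None
```

`lookup` builds an in-memory key list for clarity; a large block file would instead be binary-searched by seeking within the file.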
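Steps 1 and 3 of slide 22 can be sketched in Python with the standard library. `discover_manifest` assumes that the generic manifest URI, after the redirect, returns the JSON manifest shown on slide 17; `parse_hash_field` and `fixity_matches` are hypothetical helpers that compare a recomputed `md5:... sha256:...` value against the manifest's `hash` field. Step 2, recomputing the hash, means re-running the manifest's `hash-constructor` on the memento and is not repeated here.

```python
import json
import urllib.request
from typing import Dict, Tuple

SERVER = "https://manifest.ws-dl.cs.odu.edu"  # Archival Fixity server from the slides


def discover_manifest(urim: str, server: str = SERVER) -> Tuple[str, dict]:
    """Step 1: GET the generic manifest URI. urllib follows the 302 redirect,
    so geturl() returns the trusty URI of the latest manifest.
    Assumes the response body is the JSON manifest."""
    with urllib.request.urlopen(f"{server}/manifest/{urim}", timeout=30) as resp:
        return resp.geturl(), json.loads(resp.read())


def parse_hash_field(value: str) -> Dict[str, str]:
    """'md5:<hex> sha256:<hex>' -> {'md5': '<hex>', 'sha256': '<hex>'}"""
    return dict(item.split(":", 1) for item in value.split())


def fixity_matches(recorded: str, current: str) -> bool:
    """Step 3: at least one algorithm must be shared, and every shared
    algorithm must give the same digest."""
    rec, cur = parse_hash_field(recorded), parse_hash_field(current)
    shared = rec.keys() & cur.keys()
    return bool(shared) and all(rec[alg] == cur[alg] for alg in shared)
```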
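Reading slide 23's 2.79% as the ratio of the average manifest size to the average HTML size (the slide does not say whether it is a ratio of averages or an average of ratios), the implied average size of the downloaded HTML is roughly:

```latex
\text{average HTML size} \approx \frac{1{,}157\ \text{bytes}}{0.0279} \approx 41{,}470\ \text{bytes} \approx 41\ \text{KB}
```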
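The resource counts on slide 25 can be reproduced with two small functions (hypothetical names). Rounding up for a partially filled last block is an assumption; the slide's N/B is exact for its example of N = 1,000 and B = 100.

```python
def atomic_resources(n_mementos: int, k_archives: int) -> int:
    """Atomic approach: one manifest per memento, pushed to each of the K archives."""
    return n_mementos * k_archives


def block_resources(n_mementos: int, k_archives: int, block_size: int) -> int:
    """Block approach: ceil(N / B) blocks, each pushed to the K archives."""
    n_blocks = (n_mementos + block_size - 1) // block_size  # integer ceiling
    return k_archives * n_blocks


print(atomic_resources(1000, 4))      # 4000
print(block_resources(1000, 4, 100))  # 40
```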
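The 4.46x figure on slide 30 follows from the two average verification times reported there:

```latex
\frac{6.65\ \text{s}}{1.49\ \text{s}} \approx 4.46
```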