SlideShare ist ein Scribd-Unternehmen logo
1 von 57
Downloaden Sie, um offline zu lesen
A Collaborative, Secure, and Private
InterPlanetary Wayback Web Archiving System Using IPFS
Mat Kelly
Old Dominion University
Norfolk, Virginia, USA
@machawk1
David Dias
Protocol Labs
Planet Earth
@daviddias
https://github.com/oduwsdl/ipwb https://ipfs.io
IIPC Web Archiving Conference ‱ June 15, 2017 ‱ London, UK
w/ Sawood Alam, Michael L. Nelson, and Michele C. Weigle
Outline
● InterPlanetary File System Motivation & Design
● InterPlanetary Wayback Motivation & Design
● How IPFS/IPWB relate, relevancy to Web archiving
● Advances in IPFS/IPWB
● Demo(s)
IPFS
Motivation & Design
wip
wip
wip
wip
wip
wip
wip
wip
wip
wip
Add talks here
Motivation & Design
Motivation
● Persistence of archived Web data dependent on resilience
of organization and availability of data
● Remove massive redundancy in Web archive files of
exact duplicate content
● Determine feasibility of pushing WARCs into IPFS
Design
● Extending the CDXJ Format
● Indexing and IPFS Dissemination Procedure
● Replay and IPFS Pull Procedure
index replay
Design - CDXJ Format
com,example)/index.html 20170301192639 {"mime_type": "text/html",
"status_code": "200"}
com,example)/images/frog.png 20170301192639 {"mime_type": "image/png",
"status_code": "200"}
https://github.com/oduwsdl/ORS/wiki/CDXJ
Alam et al. “Web Archive Profiling Through CDX Summarization”, TPDL 2015
See:
Design - CDXJ Format
com,example)/index.html 20170301192639 {"locator":
"urn:ipfs/QmPdyY6Pm66iWtGpTc7PqK11hvsnYSKMVL57G69RiNjGcm/QmNZ6m
KSSAXAmXEocQj5gT4y4kdcr5D2C173ubWJ6PSKEZ", "mime_type": "text/html",
"status_code": "200"}
com,example)/images/frog.png 20170301192639 {"locator":
"urn:ipfs/QmUeko8zM7Xanwz6F9GtRH4rLAi4Poj3EMECGsci3BRQfs/QmPhMnX
74cwqx2xgj9d3N3gTra8CzafXwSbUwU8xagMfqR", "mime_type": "image/png",
"status_code": "200"}
Design
ipwb Design - “Indexing” Process
1. Extract HTTP Response from WARC
○ HTTP header and entity body (payload) separately
2. Push header and payload to IPFS, retain hashes
3. Construct CDXJ record containing:
○ URI of original resource (URI-R)
○ Datetime
○ Locator: urn:ipfs/headerHash/payloadHash
4. Repeat for each WARC-Response record
5. Save locally as CDXJ file
HTTP header
block
HTTP
payload
block
Design - Replay
1. Identify CDXJ line w/ URI-R + datetime
2. Fetch content for header and payload from IPFS using locator
3. Reassemble content into HTTP response, serve to browser
4. Repeat for each embedded resource requested
Advancements
Privacy, Collaboration, and Security
● Encryption on indexing/dissemination, decryption on replay
com,mywebsite)/photos/vacation 20170605083914 {
"locator": "urn:ipfs/QmdmV...P9Hf/QmRDB...1Bz2P",
"encryption_method": "xor", “encryption_key”:
“my#Gre4t#Encrypti0n#K3y!”, "mime_type": "text/html", "status_code": "200"}
● IPWB CDXJs may be transferred for our users’ replay
● CDXJ-by-hash recursive fetch/replay
○ Share hash of CDXJ then $ ipwb replay hash to replicate experience
Privacy, Collaboration, and Security
index replay
... QmVvshF...dXJ
3
Push CDXJ to IPFS
Other ipwb Advancements
● Rerouting (instead of Rewriting) for Archival Replay*
○ IPWB replay registers ServiceWorker
■ Intercepts requests from archival replay to live Web
○ Prevents live Web from “leaking into” the archive on replay
● Memento Support
○ Replay system serves TimeMap, Timegate, and Datetime (memento) resolution endpoints
○ http://localhost/timemap/http://mywebsite.com/photos/vacation
○ http://localhost/memento/20170605092450/http://mywebsite.com/photos/vacation
* To be presented at JCDL 2017 in Toronto, Canada, June 19-23, 2017
A Collaborative, Secure, and Private
InterPlanetary Wayback Web Archiving System Using IPFS
Mat Kelly
Old Dominion University
Norfolk, Virginia, USA
@machawk1
David Dias
Protocol Labs
Planet Earth
@daviddias
https://github.com/oduwsdl/ipwb https://ipfs.io
IIPC Web Archiving Conference ‱ June 15, 2017 ‱ London, UK
w/ Sawood Alam, Michael L. Nelson, and Michele C. Weigle
Demo(s)

Weitere Àhnliche Inhalte

Was ist angesagt?

Integration with hdfs using WebDFS and NFS
Integration with hdfs using WebDFS and NFSIntegration with hdfs using WebDFS and NFS
Integration with hdfs using WebDFS and NFS
Christophe Marchal
 

Was ist angesagt? (20)

InterPlanetary File System (IPFS)
InterPlanetary File System (IPFS)InterPlanetary File System (IPFS)
InterPlanetary File System (IPFS)
 
Indexing Decentralized Data with Ethereum, IPFS & The Graph
Indexing Decentralized Data with Ethereum, IPFS & The GraphIndexing Decentralized Data with Ethereum, IPFS & The Graph
Indexing Decentralized Data with Ethereum, IPFS & The Graph
 
Redecentralizing the Web: IPFS and Filecoin
Redecentralizing the Web: IPFS and FilecoinRedecentralizing the Web: IPFS and Filecoin
Redecentralizing the Web: IPFS and Filecoin
 
RDM#2- The Distributed Web
RDM#2- The Distributed WebRDM#2- The Distributed Web
RDM#2- The Distributed Web
 
Node.js Interactive
Node.js InteractiveNode.js Interactive
Node.js Interactive
 
Archive What I See Now: Personal Web Archiving with WARCs
Archive What I See Now: Personal Web Archiving with WARCsArchive What I See Now: Personal Web Archiving with WARCs
Archive What I See Now: Personal Web Archiving with WARCs
 
InterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives
InterPlanetary Wayback: Peer-To-Peer Permanence of Web ArchivesInterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives
InterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives
 
Kkeithley ufonfs-gluster summit
Kkeithley ufonfs-gluster summitKkeithley ufonfs-gluster summit
Kkeithley ufonfs-gluster summit
 
Join the super_colony_-_feb2013
Join the super_colony_-_feb2013Join the super_colony_-_feb2013
Join the super_colony_-_feb2013
 
Developing apps and_integrating_with_gluster_fs_-_libgfapi
Developing apps and_integrating_with_gluster_fs_-_libgfapiDeveloping apps and_integrating_with_gluster_fs_-_libgfapi
Developing apps and_integrating_with_gluster_fs_-_libgfapi
 
Integration with hdfs using WebDFS and NFS
Integration with hdfs using WebDFS and NFSIntegration with hdfs using WebDFS and NFS
Integration with hdfs using WebDFS and NFS
 
WebHDFS at King - May 2014 Hadoop MeetUp
WebHDFS at King - May 2014 Hadoop MeetUpWebHDFS at King - May 2014 Hadoop MeetUp
WebHDFS at King - May 2014 Hadoop MeetUp
 
Web ARChive (WARC) File Format
Web ARChive (WARC) File FormatWeb ARChive (WARC) File Format
Web ARChive (WARC) File Format
 
RGW S3: Features vs deep compatibility - Robin Johnson
RGW S3: Features vs deep compatibility  - Robin JohnsonRGW S3: Features vs deep compatibility  - Robin Johnson
RGW S3: Features vs deep compatibility - Robin Johnson
 
wget, curl and scp
wget, curl and scpwget, curl and scp
wget, curl and scp
 
Scale out backups-with_bareos_and_gluster
Scale out backups-with_bareos_and_glusterScale out backups-with_bareos_and_gluster
Scale out backups-with_bareos_and_gluster
 
Leases and-caching final
Leases and-caching finalLeases and-caching final
Leases and-caching final
 
20160401 Gluster-roadmap
20160401 Gluster-roadmap20160401 Gluster-roadmap
20160401 Gluster-roadmap
 
State of the_gluster_-_lceu
State of the_gluster_-_lceuState of the_gluster_-_lceu
State of the_gluster_-_lceu
 
gRPC on .NET Core - NDC Sydney 2019
gRPC on .NET Core - NDC Sydney 2019gRPC on .NET Core - NDC Sydney 2019
gRPC on .NET Core - NDC Sydney 2019
 

Ähnlich wie IPWB and IPFS at WAC2017

Decentralized possibilities with filecoin & ipfs_encode filecoin club
Decentralized possibilities with filecoin & ipfs_encode filecoin clubDecentralized possibilities with filecoin & ipfs_encode filecoin club
Decentralized possibilities with filecoin & ipfs_encode filecoin club
KlaraOrban
 
Spark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene PangSpark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Summit
 
Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development
Ceph Community
 

Ähnlich wie IPWB and IPFS at WAC2017 (20)

Decentralized possibilities with filecoin & ipfs_encode filecoin club
Decentralized possibilities with filecoin & ipfs_encode filecoin clubDecentralized possibilities with filecoin & ipfs_encode filecoin club
Decentralized possibilities with filecoin & ipfs_encode filecoin club
 
Introduction to Filecoin
Introduction to Filecoin   Introduction to Filecoin
Introduction to Filecoin
 
InterPlanetary Wayback: The Next Step Towards Decentralized Web Archiving
InterPlanetary Wayback: The Next Step Towards Decentralized Web ArchivingInterPlanetary Wayback: The Next Step Towards Decentralized Web Archiving
InterPlanetary Wayback: The Next Step Towards Decentralized Web Archiving
 
DCEU 18: Provisioning and Managing Storage for Docker Containers
DCEU 18: Provisioning and Managing Storage for Docker ContainersDCEU 18: Provisioning and Managing Storage for Docker Containers
DCEU 18: Provisioning and Managing Storage for Docker Containers
 
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang HuiStor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
 
Core os dna_automacon
Core os dna_automaconCore os dna_automacon
Core os dna_automacon
 
Kubernetes meetup bangalore december 2017 - v02
Kubernetes meetup bangalore   december 2017 - v02Kubernetes meetup bangalore   december 2017 - v02
Kubernetes meetup bangalore december 2017 - v02
 
Accelerating Data Computation on Ceph Objects
Accelerating Data Computation on Ceph ObjectsAccelerating Data Computation on Ceph Objects
Accelerating Data Computation on Ceph Objects
 
Spark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene PangSpark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene Pang
 
Containers - Portable, repeatable user-oriented application delivery. Build, ...
Containers - Portable, repeatable user-oriented application delivery. Build, ...Containers - Portable, repeatable user-oriented application delivery. Build, ...
Containers - Portable, repeatable user-oriented application delivery. Build, ...
 
HDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the CloudHDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the Cloud
 
Spark Pipelines in the Cloud with Alluxio
Spark Pipelines in the Cloud with AlluxioSpark Pipelines in the Cloud with Alluxio
Spark Pipelines in the Cloud with Alluxio
 
Ceph, Open Source, and the Path to Ubiquity in Storage - AACS Meetup 2014
Ceph, Open Source, and the Path to Ubiquity in Storage - AACS Meetup 2014Ceph, Open Source, and the Path to Ubiquity in Storage - AACS Meetup 2014
Ceph, Open Source, and the Path to Ubiquity in Storage - AACS Meetup 2014
 
EWD 3 Training Course Part 1: How Node.js Integrates With Global Storage Data...
EWD 3 Training Course Part 1: How Node.js Integrates With Global Storage Data...EWD 3 Training Course Part 1: How Node.js Integrates With Global Storage Data...
EWD 3 Training Course Part 1: How Node.js Integrates With Global Storage Data...
 
Core os dna_oscon
Core os dna_osconCore os dna_oscon
Core os dna_oscon
 
Orchestrating stateful applications with PKS and Portworx
Orchestrating stateful applications with PKS and PortworxOrchestrating stateful applications with PKS and Portworx
Orchestrating stateful applications with PKS and Portworx
 
Orchestrating Stateful Applications with PKS and Portworx
Orchestrating Stateful Applications with PKS and PortworxOrchestrating Stateful Applications with PKS and Portworx
Orchestrating Stateful Applications with PKS and Portworx
 
Concourse - CI for the cloud
Concourse - CI for the cloudConcourse - CI for the cloud
Concourse - CI for the cloud
 
Concourse CI for the Cloud
Concourse CI for the CloudConcourse CI for the Cloud
Concourse CI for the Cloud
 
Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development
 

Mehr von David Dias (8)

Enter Gossipsub, A scalable, extensible & hardened P2P PubSub Router protocol
Enter Gossipsub, A scalable, extensible & hardened P2P PubSub Router protocolEnter Gossipsub, A scalable, extensible & hardened P2P PubSub Router protocol
Enter Gossipsub, A scalable, extensible & hardened P2P PubSub Router protocol
 
browserCloud.js - David Dias M.Sc Thesis Defense Deck
browserCloud.js - David Dias M.Sc Thesis Defense Deck browserCloud.js - David Dias M.Sc Thesis Defense Deck
browserCloud.js - David Dias M.Sc Thesis Defense Deck
 
Understanding The Community Lifecycle
Understanding The Community LifecycleUnderstanding The Community Lifecycle
Understanding The Community Lifecycle
 
P2P Resource Discovery for the Browser
P2P Resource Discovery for the BrowserP2P Resource Discovery for the Browser
P2P Resource Discovery for the Browser
 
Lisboa WebRTC - May 21, 2015 - Intro to WebRTC
Lisboa WebRTC - May 21, 2015 - Intro to WebRTCLisboa WebRTC - May 21, 2015 - Intro to WebRTC
Lisboa WebRTC - May 21, 2015 - Intro to WebRTC
 
Resource Discovery for the Web Platform using a P2P Overlay Network with WebR...
Resource Discovery for the Web Platform using a P2P Overlay Network with WebR...Resource Discovery for the Web Platform using a P2P Overlay Network with WebR...
Resource Discovery for the Web Platform using a P2P Overlay Network with WebR...
 
TriConf 2014 - LXJS, the Lisbon Javascript Conference
TriConf 2014 - LXJS, the Lisbon Javascript ConferenceTriConf 2014 - LXJS, the Lisbon Javascript Conference
TriConf 2014 - LXJS, the Lisbon Javascript Conference
 
JSConfBR - Securing Node.js App, by the community and for the community
JSConfBR - Securing Node.js App, by the community and for the community JSConfBR - Securing Node.js App, by the community and for the community
JSConfBR - Securing Node.js App, by the community and for the community
 

KĂŒrzlich hochgeladen

Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar â‰ŒđŸ” Delhi door step de...
Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar  â‰ŒđŸ” Delhi door step de...Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar  â‰ŒđŸ” Delhi door step de...
Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar â‰ŒđŸ” Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
KreezheaRecto
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
rknatarajan
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Christo Ananth
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
Tonystark477637
 

KĂŒrzlich hochgeladen (20)

Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar â‰ŒđŸ” Delhi door step de...
Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar  â‰ŒđŸ” Delhi door step de...Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar  â‰ŒđŸ” Delhi door step de...
Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar â‰ŒđŸ” Delhi door step de...
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 

IPWB and IPFS at WAC2017

  • 1. A Collaborative, Secure, and Private InterPlanetary Wayback Web Archiving System Using IPFS Mat Kelly Old Dominion University Norfolk, Virginia, USA @machawk1 David Dias Protocol Labs Planet Earth @daviddias https://github.com/oduwsdl/ipwb https://ipfs.io IIPC Web Archiving Conference ‱ June 15, 2017 ‱ London, UK w/ Sawood Alam, Michael L. Nelson, and Michele C. Weigle
  • 2. Outline ● InterPlanetary File System Motivation & Design ● InterPlanetary Wayback Motivation & Design ● How IPFS/IPWB relate, relevancy to Web archiving ● Advances in IPFS/IPWB ● Demo(s)
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9. wip
  • 10. wip
  • 11. wip
  • 12. wip
  • 13. wip
  • 14.
  • 15. wip
  • 16. wip
  • 17. wip
  • 18. wip
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 40. Motivation ● Persistence of archived Web data dependent on resilience of organization and availability of data ● Remove massive redundancy in Web archive files of exact duplicate content ● Determine feasibility of pushing WARCs into IPFS
  • 41. Design ● Extending the CDXJ Format ● Indexing and IPFS Dissemination Procedure ● Replay and IPFS Pull Procedure index replay
  • 42. Design - CDXJ Format com,example)/index.html 20170301192639 {"mime_type": "text/html", "status_code": "200"} com,example)/images/frog.png 20170301192639 {"mime_type": "image/png", "status_code": "200"} https://github.com/oduwsdl/ORS/wiki/CDXJ Alam et al. “Web Archive Profiling Through CDX Summarization”, TPDL 2015 See:
  • 43. Design - CDXJ Format com,example)/index.html 20170301192639 {"locator": "urn:ipfs/QmPdyY6Pm66iWtGpTc7PqK11hvsnYSKMVL57G69RiNjGcm/QmNZ6m KSSAXAmXEocQj5gT4y4kdcr5D2C173ubWJ6PSKEZ", "mime_type": "text/html", "status_code": "200"} com,example)/images/frog.png 20170301192639 {"locator": "urn:ipfs/QmUeko8zM7Xanwz6F9GtRH4rLAi4Poj3EMECGsci3BRQfs/QmPhMnX 74cwqx2xgj9d3N3gTra8CzafXwSbUwU8xagMfqR", "mime_type": "image/png", "status_code": "200"}
  • 45. ipwb Design - “Indexing” Process 1. Extract HTTP Response from WARC ○ HTTP header and entity body (payload) separately 2. Push header and payload to IPFS, retain hashes 3. Construct CDXJ record containing: ○ URI of original resource (URI-R) ○ Datetime ○ Locator: urn:ipfs/headerHash/payloadHash 4. Repeat for each WARC-Response record 5. Save locally as CDXJ file HTTP header block HTTP payload block
  • 46. Design - Replay 1. Identify CDXJ line w/ URI-R + datetime 2. Fetch content for header and payload from IPFS using locator 3. Reassemble content into HTTP response, serve to browser 4. Repeat for each embedded resource requested
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53. Privacy, Collaboration, and Security ● Encryption on indexing/dissemination, decryption on replay com,mywebsite)/photos/vacation 20170605083914 { "locator": "urn:ipfs/QmdmV...P9Hf/QmRDB...1Bz2P", "encryption_method": "xor", “encryption_key”: “my#Gre4t#Encrypti0n#K3y!”, "mime_type": "text/html", "status_code": "200"}
  • 54. ● IPWB CDXJs may be transferred for our users’ replay ● CDXJ-by-hash recursive fetch/replay ○ Share hash of CDXJ then $ ipwb replay hash to replicate experience Privacy, Collaboration, and Security index replay ... QmVvshF...dXJ 3 Push CDXJ to IPFS
  • 55. Other ipwb Advancements ● Rerouting (instead of Rewriting) for Archival Replay* ○ IPWB replay registers ServiceWorker ■ Intercepts requests from archival replay to live Web ○ Prevents live Web from “leaking into” the archive on replay ● Memento Support ○ Replay system serves TimeMap, Timegate, and Datetime (memento) resolution endpoints ○ http://localhost/timemap/http://mywebsite.com/photos/vacation ○ http://localhost/memento/20170605092450/http://mywebsite.com/photos/vacation * To be presented at JCDL 2017 in Toronto, Canada, June 19-23, 2017
  • 56. A Collaborative, Secure, and Private InterPlanetary Wayback Web Archiving System Using IPFS Mat Kelly Old Dominion University Norfolk, Virginia, USA @machawk1 David Dias Protocol Labs Planet Earth @daviddias https://github.com/oduwsdl/ipwb https://ipfs.io IIPC Web Archiving Conference ‱ June 15, 2017 ‱ London, UK w/ Sawood Alam, Michael L. Nelson, and Michele C. Weigle