SlideShare ist ein Scribd-Unternehmen logo
1 von 36
LOTS OF COPIES KEEP STUFF SAFE
Lots More LOCKSS for Web
Archiving: Boons from the
LOCKSS Software Re-Architecture
Nicholas Taylor (@nullhandle)
Program Manager, LOCKSS and Web Archiving
Stanford University Libraries
Web Archiving Week
15 June 2017
overview
• LOCKSS background
• software re-
architecture
• software components
• roadmap
“LAX on take off” by Doug under CC BY-NC-ND 2.0
LOCKSS Background
“Padlocks I” by Domiriel under CC BY-NC 2.0
beginnings
• a serials librarian + a
computer scientist
• print journals → Web
• conserve library’s role as
preserver
• collect from publishers’
websites
• preserve w/ cheap,
distributed, library-
managed hardware
• disseminate when
unavailable from
publisher
Internet Archive: “LOCKSS | Lots of Copies Keep Stuff Safe”
present day
• financially self-
sustaining
• tens of networks
• hundreds of institutions
• all types of content
“LOCKSS | Lots of Copies Keep Stuff Safe”
lots of copies
“The lost cannot be
recovered; but let us save
what remains: not by
vaults and locks which
fence them from the
public eye and use, in
consigning them to the
waste of time, but by such
a multiplication of copies,
as shall place them
beyond the reach of
accident.”
National Archives: “from Thomas Jefferson to Ebenezer Hazard, 18 February 1791”
“Thomas Jefferson” by Rembrandt Peale under Public Domain
decentralized copies
• no monopoly on copy-
making
• independent, de-
correlated copies
• no central point of
failure or vulnerability
• local custody, self-
determination
“Domino's” by david pacey under CC BY 2.0
articulated threat model
• long-term bit integrity
is a hard problem
• more (correlated)
copies doesn’t
necessarily keep stuff
safe
• don’t underestimate:
• people making mistakes
• attacks on information
• organizational failure
“Fragile” by Garrett Coakley under CC BY-NC 2.0
community-validated
• built upon peer-
reviewed research
• successfully operating
for almost 20 years
• CRL TRAC assessment
of CLOCKSS
• overall score matching
previous best
• only perfect technology
score awarded to date
Software Re-Architecture
“The two bridges” by Frank Schulenburg under BY-SA 2.0
why re-architect LOCKSS?
• reduce support +
operations costs
• de-silo components +
enable external
integration
• prepare to evolve w/
the Web
“New York Reflection” by Reto Fetz under CC BY-NC-SA 2.0
aligning with web archiving
Web ARChive (WARC) format compatible technologies
• Heritrix
• OpenWayback
• WarcBase
• Web Archiving Proxy
12
web archiving system APIs (WASAPI)
leveraging community components
Software Components
“Gears” by William Warby under CC BY 2.0
bibliographic metadata extraction
functionality
• for web harvest + file
transfer content
• map values in DOM
tree to metadata fields
• retrieve downloadable
metadata from
expected URL patterns
• parse RIS + XML by
schema
fields
• creator
• title
• published year
• volume
• issue
• article name
publisher/platform heuristics
• plugins making
bibliographic objects +
their metadata on
many publishing
platforms machine-
intelligible
• a framework for
creating such plugins
• useful in absence of
standard conventions,
e.g. Signposting
use cases for metadata extraction
• apply to consistent
subsets of content in
larger corpora
• curate OA materials
within broader crawls
• retrieve faculty
publications posted
online, license allowing
• describe sub-sites
collected while self-
archiving from a single
institutional CMS
“Stanford University”
“ARCADE”
discovery via bibliographic metadata
• submit DOI or
OpenURL query
• get OpenWayback
access URLs
• integrate w/
OpenURL resolver
Stanford Libraries: “Find eJournal”
on-access format migration
• as described in 2005 D-
Lib paper by DSHR et al
• low obsolescence risk
suggested by research
from Holden, Jackson
• implement upstream in
OpenWayback
• example: X-BitMap →
GIF migration
Stanford Libraries: “WorldWideWeb SLAC Home Page”
on-access format migration
• as described in 2005 D-
Lib paper by DSHR et al
• low obsolescence risk
suggested by research
from Holden, Jackson
• implement upstream in
OpenWayback
• example: X-BitMap →
GIF migration
Stanford Libraries: “WorldWideWeb SLAC Home Page”
audit + repair protocol
• core preservation capability
• network nodes conduct
polls to validate integrity of
distributed copies of data
chunks
• more nodes = more security
• more nodes can be down
• more copies can be
corrupted
• …and polls will still conclude
• nonces force re-hashing
• peers are untrusted
• polls are slow, to allow
damage detection “DSC_4346” by Dennis Jarvis under BY-SA 2.0
harvest from content source
content
source
LOCKSS
network
initiate poll w/ nonce
What is the hash of
9815 + chunk 173?
hashing
nodes respond to poll
Hash of 9815 +
chunk 173 is 3716
votes for 3716: 1
nodes respond to poll
Hash of 9815 +
chunk 173 is 3716
votes for 3716: 2
nodes respond to poll
Hash of 9815 +
chunk 173 is 8413
votes for 3716: 2
votes for 8413: 1
initiate poll w/ nonce
Hash of 9815 +
chunk 173 is 3716
votes for 3716: 3
votes for 8413: 1
repair from content source
content
source
LOCKSS
network
use cases for audit + repair
• other distributed
digital preservation
networks
• repository storage
replication layers
• would like to support
varied back-ends:
tape, cloud, etc.
Roadmap
“Milestone?” by Matteo De Toffoli under CC BY-NC-SA 2.0
development progress
• access WARC-stored
content via:
• DOI
• OpenURL
• Memento (URL)
• Solr full-text search
• web services:
• metadata extraction
• metadata query
“Milestones” by Dheeraj Nagwani under CC BY-NC-ND 2.0
looking ahead
• by end of 2017
• Docker-ize components
• web harvest framework
• polling + repair web
service
• by end of 2018
• IP address + Shibboleth
access via OpenWayback
• OpenWayback on-access
format migration
• full-text search web
service “Printemps, work in progress” by Eric Gjerde under CC BY-NC 2.0
follow + plug in
• development
(periodically) being
pushed to Github
• moving toward more
community-oriented
software development
• announcement of work
cycles
• sprint closeout reports
+ demos
• community engagement
Github: “LOCKSS (Lots Of Copies Keep Stuff Safe)”
questions for you
• what potential do you see for LOCKSS technologies
for web archiving, other use cases?
• what standards or technologies could we use that
we maybe haven’t considered?
• how could we help you to use LOCKSS
technologies?
• how would you like to see LOCKSS plug in more to
the web archiving community?

Weitere ähnliche Inhalte

Was ist angesagt?

BlueHat v17 || 28 Registrations Later: Measuring the Exploitation of Residual...
BlueHat v17 || 28 Registrations Later: Measuring the Exploitation of Residual...BlueHat v17 || 28 Registrations Later: Measuring the Exploitation of Residual...
BlueHat v17 || 28 Registrations Later: Measuring the Exploitation of Residual...BlueHat Security Conference
 
The KeeeX Messenger teaser 2016-05-19-lh-xofad-bosop
The KeeeX Messenger teaser 2016-05-19-lh-xofad-bosopThe KeeeX Messenger teaser 2016-05-19-lh-xofad-bosop
The KeeeX Messenger teaser 2016-05-19-lh-xofad-bosopLaurent Henocque
 
How One to One Sharing Enforces Secure Collaboration - xonom
How One to One Sharing Enforces Secure Collaboration - xonomHow One to One Sharing Enforces Secure Collaboration - xonom
How One to One Sharing Enforces Secure Collaboration - xonomLaurent Henocque
 
MRA AMA Part 7: The Circuit Breaker Pattern
MRA AMA Part 7: The Circuit Breaker PatternMRA AMA Part 7: The Circuit Breaker Pattern
MRA AMA Part 7: The Circuit Breaker PatternNGINX, Inc.
 
Monitoring for DNS Security
Monitoring for DNS SecurityMonitoring for DNS Security
Monitoring for DNS SecurityThousandEyes
 
The KeeeX Teaser xukih-masav-dafik
The KeeeX Teaser xukih-masav-dafikThe KeeeX Teaser xukih-masav-dafik
The KeeeX Teaser xukih-masav-dafikLaurent Henocque
 
Introduction to Filecoin
Introduction to Filecoin   Introduction to Filecoin
Introduction to Filecoin Vanessa Lošić
 
Module: Welcome to Web 3.0
Module: Welcome to Web 3.0Module: Welcome to Web 3.0
Module: Welcome to Web 3.0Ioannis Psaras
 
What's New in Globus - Internet2 TechEXtra
What's New in Globus - Internet2 TechEXtraWhat's New in Globus - Internet2 TechEXtra
What's New in Globus - Internet2 TechEXtraGlobus
 
Tokyo azure meetup #9 azure update, october
Tokyo azure meetup #9   azure update, octoberTokyo azure meetup #9   azure update, october
Tokyo azure meetup #9 azure update, octoberTokyo Azure Meetup
 
Globus Integrations (GlobusWorld Tour - UCSD)
Globus Integrations (GlobusWorld Tour - UCSD)Globus Integrations (GlobusWorld Tour - UCSD)
Globus Integrations (GlobusWorld Tour - UCSD)Globus
 
"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018Globus
 
Automating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with GlobusAutomating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with GlobusGlobus
 
Connecting Your System to Globus (APS Workshop)
Connecting Your System to Globus (APS Workshop)Connecting Your System to Globus (APS Workshop)
Connecting Your System to Globus (APS Workshop)Globus
 
Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus
Gateways 2020 Tutorial - Automated Data Ingest and Search with GlobusGateways 2020 Tutorial - Automated Data Ingest and Search with Globus
Gateways 2020 Tutorial - Automated Data Ingest and Search with GlobusGlobus
 

Was ist angesagt? (19)

Encode
EncodeEncode
Encode
 
BlueHat v17 || 28 Registrations Later: Measuring the Exploitation of Residual...
BlueHat v17 || 28 Registrations Later: Measuring the Exploitation of Residual...BlueHat v17 || 28 Registrations Later: Measuring the Exploitation of Residual...
BlueHat v17 || 28 Registrations Later: Measuring the Exploitation of Residual...
 
The KeeeX Messenger teaser 2016-05-19-lh-xofad-bosop
The KeeeX Messenger teaser 2016-05-19-lh-xofad-bosopThe KeeeX Messenger teaser 2016-05-19-lh-xofad-bosop
The KeeeX Messenger teaser 2016-05-19-lh-xofad-bosop
 
How One to One Sharing Enforces Secure Collaboration - xonom
How One to One Sharing Enforces Secure Collaboration - xonomHow One to One Sharing Enforces Secure Collaboration - xonom
How One to One Sharing Enforces Secure Collaboration - xonom
 
MRA AMA Part 7: The Circuit Breaker Pattern
MRA AMA Part 7: The Circuit Breaker PatternMRA AMA Part 7: The Circuit Breaker Pattern
MRA AMA Part 7: The Circuit Breaker Pattern
 
ION Islamabad - What's Happening at the IETF?
ION Islamabad - What's Happening at the IETF?ION Islamabad - What's Happening at the IETF?
ION Islamabad - What's Happening at the IETF?
 
Monitoring for DNS Security
Monitoring for DNS SecurityMonitoring for DNS Security
Monitoring for DNS Security
 
The KeeeX Teaser xukih-masav-dafik
The KeeeX Teaser xukih-masav-dafikThe KeeeX Teaser xukih-masav-dafik
The KeeeX Teaser xukih-masav-dafik
 
Introduction to Filecoin
Introduction to Filecoin   Introduction to Filecoin
Introduction to Filecoin
 
Module: Welcome to Web 3.0
Module: Welcome to Web 3.0Module: Welcome to Web 3.0
Module: Welcome to Web 3.0
 
Nzitf Velociraptor Workshop
Nzitf Velociraptor WorkshopNzitf Velociraptor Workshop
Nzitf Velociraptor Workshop
 
What's New in Globus - Internet2 TechEXtra
What's New in Globus - Internet2 TechEXtraWhat's New in Globus - Internet2 TechEXtra
What's New in Globus - Internet2 TechEXtra
 
Tokyo azure meetup #9 azure update, october
Tokyo azure meetup #9   azure update, octoberTokyo azure meetup #9   azure update, october
Tokyo azure meetup #9 azure update, october
 
Globus Integrations (GlobusWorld Tour - UCSD)
Globus Integrations (GlobusWorld Tour - UCSD)Globus Integrations (GlobusWorld Tour - UCSD)
Globus Integrations (GlobusWorld Tour - UCSD)
 
"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018
 
Automating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with GlobusAutomating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with Globus
 
RSA APJ Velociraptor Lab
RSA APJ Velociraptor LabRSA APJ Velociraptor Lab
RSA APJ Velociraptor Lab
 
Connecting Your System to Globus (APS Workshop)
Connecting Your System to Globus (APS Workshop)Connecting Your System to Globus (APS Workshop)
Connecting Your System to Globus (APS Workshop)
 
Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus
Gateways 2020 Tutorial - Automated Data Ingest and Search with GlobusGateways 2020 Tutorial - Automated Data Ingest and Search with Globus
Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus
 

Ähnlich wie Lots More LOCKSS for Web Archiving: Boons from the LOCKSS Software Re-Architecture

Unlocking LOCKSS with APIs
Unlocking LOCKSS with APIsUnlocking LOCKSS with APIs
Unlocking LOCKSS with APIsnullhandle
 
Lots of LOCKSS Keeping Stuff Safe: The Future of the LOCKSS Program
Lots of LOCKSS Keeping Stuff Safe: The Future of the LOCKSS ProgramLots of LOCKSS Keeping Stuff Safe: The Future of the LOCKSS Program
Lots of LOCKSS Keeping Stuff Safe: The Future of the LOCKSS Programnullhandle
 
Tool Academy: Web Archiving
Tool Academy: Web ArchivingTool Academy: Web Archiving
Tool Academy: Web Archivingnullhandle
 
Web and Twitter Archiving at the Library of Congress
Web and Twitter Archiving at the Library of CongressWeb and Twitter Archiving at the Library of Congress
Web and Twitter Archiving at the Library of Congressnullhandle
 
Produce and consume_linked_data_with_drupal
Produce and consume_linked_data_with_drupalProduce and consume_linked_data_with_drupal
Produce and consume_linked_data_with_drupalSTIinnsbruck
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Artefactual Systems - AtoM
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container EcosystemVinay Rao
 
Openstack – An introduction
Openstack – An introductionOpenstack – An introduction
Openstack – An introductionMuddassir Nazir
 
Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Reactive Fast Data & the Data Lake with Akka, Kafka, SparkReactive Fast Data & the Data Lake with Akka, Kafka, Spark
Reactive Fast Data & the Data Lake with Akka, Kafka, SparkTodd Fritz
 
การประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กร
การประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กรการประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กร
การประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กรDr. Thiti Vacharasintopchai, ATSI-DX, CISA
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Open Analytics
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenChristopher Whitaker
 
How Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useHow Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useMatthew Vaughn
 
Kubernetes meetup bangalore december 2017 - v02
Kubernetes meetup bangalore   december 2017 - v02Kubernetes meetup bangalore   december 2017 - v02
Kubernetes meetup bangalore december 2017 - v02Kumar Gaurav
 
Static Site Generators - Developing Websites in Low-resource Condition
Static Site Generators - Developing Websites in Low-resource ConditionStatic Site Generators - Developing Websites in Low-resource Condition
Static Site Generators - Developing Websites in Low-resource ConditionIWMW
 
KubeCon USA 2017 brief Overview - from Kubernetes meetup Bangalore
KubeCon USA 2017 brief Overview - from Kubernetes meetup BangaloreKubeCon USA 2017 brief Overview - from Kubernetes meetup Bangalore
KubeCon USA 2017 brief Overview - from Kubernetes meetup BangaloreKrishna-Kumar
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...David Wallom
 
What is New in W3C land?
What is New in W3C land?What is New in W3C land?
What is New in W3C land?Ivan Herman
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data StackZubair Nabi
 

Ähnlich wie Lots More LOCKSS for Web Archiving: Boons from the LOCKSS Software Re-Architecture (20)

Unlocking LOCKSS with APIs
Unlocking LOCKSS with APIsUnlocking LOCKSS with APIs
Unlocking LOCKSS with APIs
 
Lots of LOCKSS Keeping Stuff Safe: The Future of the LOCKSS Program
Lots of LOCKSS Keeping Stuff Safe: The Future of the LOCKSS ProgramLots of LOCKSS Keeping Stuff Safe: The Future of the LOCKSS Program
Lots of LOCKSS Keeping Stuff Safe: The Future of the LOCKSS Program
 
Tool Academy: Web Archiving
Tool Academy: Web ArchivingTool Academy: Web Archiving
Tool Academy: Web Archiving
 
Web and Twitter Archiving at the Library of Congress
Web and Twitter Archiving at the Library of CongressWeb and Twitter Archiving at the Library of Congress
Web and Twitter Archiving at the Library of Congress
 
Produce and consume_linked_data_with_drupal
Produce and consume_linked_data_with_drupalProduce and consume_linked_data_with_drupal
Produce and consume_linked_data_with_drupal
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container Ecosystem
 
Openstack – An introduction
Openstack – An introductionOpenstack – An introduction
Openstack – An introduction
 
Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Reactive Fast Data & the Data Lake with Akka, Kafka, SparkReactive Fast Data & the Data Lake with Akka, Kafka, Spark
Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
 
การประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กร
การประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กรการประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กร
การประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กร
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
 
How Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useHow Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-use
 
Kubernetes meetup bangalore december 2017 - v02
Kubernetes meetup bangalore   december 2017 - v02Kubernetes meetup bangalore   december 2017 - v02
Kubernetes meetup bangalore december 2017 - v02
 
Internet content as research data
Internet content as research dataInternet content as research data
Internet content as research data
 
Static Site Generators - Developing Websites in Low-resource Condition
Static Site Generators - Developing Websites in Low-resource ConditionStatic Site Generators - Developing Websites in Low-resource Condition
Static Site Generators - Developing Websites in Low-resource Condition
 
KubeCon USA 2017 brief Overview - from Kubernetes meetup Bangalore
KubeCon USA 2017 brief Overview - from Kubernetes meetup BangaloreKubeCon USA 2017 brief Overview - from Kubernetes meetup Bangalore
KubeCon USA 2017 brief Overview - from Kubernetes meetup Bangalore
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
 
What is New in W3C land?
What is New in W3C land?What is New in W3C land?
What is New in W3C land?
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
 

Mehr von nullhandle

Understanding Legal Use Cases for Web Archives
Understanding Legal Use Cases for Web ArchivesUnderstanding Legal Use Cases for Web Archives
Understanding Legal Use Cases for Web Archivesnullhandle
 
Interoperability and Technical Collaboration for Web and Social Media Archiving
Interoperability and Technical Collaboration for Web and Social Media ArchivingInteroperability and Technical Collaboration for Web and Social Media Archiving
Interoperability and Technical Collaboration for Web and Social Media Archivingnullhandle
 
Rethinking Web Archiving Quality Assurance for Impact, Scalability, and Susta...
Rethinking Web Archiving Quality Assurance for Impact, Scalability, and Susta...Rethinking Web Archiving Quality Assurance for Impact, Scalability, and Susta...
Rethinking Web Archiving Quality Assurance for Impact, Scalability, and Susta...nullhandle
 
2015 NDSA Web Archiving Survey Report Highlights
2015 NDSA Web Archiving Survey Report Highlights2015 NDSA Web Archiving Survey Report Highlights
2015 NDSA Web Archiving Survey Report Highlightsnullhandle
 
Collection Development for Selective Web Archiving
Collection Development for Selective Web ArchivingCollection Development for Selective Web Archiving
Collection Development for Selective Web Archivingnullhandle
 
Why Not Lots of Copies Keep(ing) Software Safe?
Why Not Lots of Copies Keep(ing) Software Safe?Why Not Lots of Copies Keep(ing) Software Safe?
Why Not Lots of Copies Keep(ing) Software Safe?nullhandle
 
WASAPI Web Archive Data Transfer APIs
WASAPI Web Archive Data Transfer APIsWASAPI Web Archive Data Transfer APIs
WASAPI Web Archive Data Transfer APIsnullhandle
 
Building Web Archiving Technology, Together
Building Web Archiving Technology, TogetherBuilding Web Archiving Technology, Together
Building Web Archiving Technology, Togethernullhandle
 
Outreach to Campus Webmasters for a Better Web, and Better Web Archiving
Outreach to Campus Webmasters for a Better Web, and Better Web ArchivingOutreach to Campus Webmasters for a Better Web, and Better Web Archiving
Outreach to Campus Webmasters for a Better Web, and Better Web Archivingnullhandle
 
Measure All the (Web Archiving) Things!
Measure All the (Web Archiving) Things!Measure All the (Web Archiving) Things!
Measure All the (Web Archiving) Things!nullhandle
 
A Snapshot of the U.S. Web Archiving Landscape through the 2013 NDSA Survey R...
A Snapshot of the U.S. Web Archiving Landscape through the 2013 NDSA Survey R...A Snapshot of the U.S. Web Archiving Landscape through the 2013 NDSA Survey R...
A Snapshot of the U.S. Web Archiving Landscape through the 2013 NDSA Survey R...nullhandle
 
Campaign Web Archives to Support Multi-Institutional Research
Campaign Web Archives to Support Multi-Institutional ResearchCampaign Web Archives to Support Multi-Institutional Research
Campaign Web Archives to Support Multi-Institutional Researchnullhandle
 
2013 NDSA Web Archiving Survey Report Highlights
2013 NDSA Web Archiving Survey Report Highlights2013 NDSA Web Archiving Survey Report Highlights
2013 NDSA Web Archiving Survey Report Highlightsnullhandle
 
Considerations for Strategic Web Archive Collection Development
Considerations for Strategic Web Archive Collection DevelopmentConsiderations for Strategic Web Archive Collection Development
Considerations for Strategic Web Archive Collection Developmentnullhandle
 
Boiling the Ocean, Together: Web Archive Collection Development in a Global C...
Boiling the Ocean, Together: Web Archive Collection Development in a Global C...Boiling the Ocean, Together: Web Archive Collection Development in a Global C...
Boiling the Ocean, Together: Web Archive Collection Development in a Global C...nullhandle
 
Advocating for Web Archivability
Advocating for Web ArchivabilityAdvocating for Web Archivability
Advocating for Web Archivabilitynullhandle
 
Building Archivable Websites
Building Archivable WebsitesBuilding Archivable Websites
Building Archivable Websitesnullhandle
 
Link Persistence, Website Persistence
Link Persistence, Website PersistenceLink Persistence, Website Persistence
Link Persistence, Website Persistencenullhandle
 
From Seed to Harvest: Web Archiving Program Considerations for SUL
From Seed to Harvest: Web Archiving Program Considerations for SULFrom Seed to Harvest: Web Archiving Program Considerations for SUL
From Seed to Harvest: Web Archiving Program Considerations for SULnullhandle
 
A Survey of Research Prospects for more Manageable Personal Digital Photo Col...
A Survey of Research Prospects for more Manageable Personal Digital Photo Col...A Survey of Research Prospects for more Manageable Personal Digital Photo Col...
A Survey of Research Prospects for more Manageable Personal Digital Photo Col...nullhandle
 

Mehr von nullhandle (20)

Understanding Legal Use Cases for Web Archives
Understanding Legal Use Cases for Web ArchivesUnderstanding Legal Use Cases for Web Archives
Understanding Legal Use Cases for Web Archives
 
Interoperability and Technical Collaboration for Web and Social Media Archiving
Interoperability and Technical Collaboration for Web and Social Media ArchivingInteroperability and Technical Collaboration for Web and Social Media Archiving
Interoperability and Technical Collaboration for Web and Social Media Archiving
 
Rethinking Web Archiving Quality Assurance for Impact, Scalability, and Susta...
Rethinking Web Archiving Quality Assurance for Impact, Scalability, and Susta...Rethinking Web Archiving Quality Assurance for Impact, Scalability, and Susta...
Rethinking Web Archiving Quality Assurance for Impact, Scalability, and Susta...
 
2015 NDSA Web Archiving Survey Report Highlights
2015 NDSA Web Archiving Survey Report Highlights2015 NDSA Web Archiving Survey Report Highlights
2015 NDSA Web Archiving Survey Report Highlights
 
Collection Development for Selective Web Archiving
Collection Development for Selective Web ArchivingCollection Development for Selective Web Archiving
Collection Development for Selective Web Archiving
 
Why Not Lots of Copies Keep(ing) Software Safe?
Why Not Lots of Copies Keep(ing) Software Safe?Why Not Lots of Copies Keep(ing) Software Safe?
Why Not Lots of Copies Keep(ing) Software Safe?
 
WASAPI Web Archive Data Transfer APIs
WASAPI Web Archive Data Transfer APIsWASAPI Web Archive Data Transfer APIs
WASAPI Web Archive Data Transfer APIs
 
Building Web Archiving Technology, Together
Building Web Archiving Technology, TogetherBuilding Web Archiving Technology, Together
Building Web Archiving Technology, Together
 
Outreach to Campus Webmasters for a Better Web, and Better Web Archiving
Outreach to Campus Webmasters for a Better Web, and Better Web ArchivingOutreach to Campus Webmasters for a Better Web, and Better Web Archiving
Outreach to Campus Webmasters for a Better Web, and Better Web Archiving
 
Measure All the (Web Archiving) Things!
Measure All the (Web Archiving) Things!Measure All the (Web Archiving) Things!
Measure All the (Web Archiving) Things!
 
A Snapshot of the U.S. Web Archiving Landscape through the 2013 NDSA Survey R...
A Snapshot of the U.S. Web Archiving Landscape through the 2013 NDSA Survey R...A Snapshot of the U.S. Web Archiving Landscape through the 2013 NDSA Survey R...
A Snapshot of the U.S. Web Archiving Landscape through the 2013 NDSA Survey R...
 
Campaign Web Archives to Support Multi-Institutional Research
Campaign Web Archives to Support Multi-Institutional ResearchCampaign Web Archives to Support Multi-Institutional Research
Campaign Web Archives to Support Multi-Institutional Research
 
2013 NDSA Web Archiving Survey Report Highlights
2013 NDSA Web Archiving Survey Report Highlights2013 NDSA Web Archiving Survey Report Highlights
2013 NDSA Web Archiving Survey Report Highlights
 
Considerations for Strategic Web Archive Collection Development
Considerations for Strategic Web Archive Collection DevelopmentConsiderations for Strategic Web Archive Collection Development
Considerations for Strategic Web Archive Collection Development
 
Boiling the Ocean, Together: Web Archive Collection Development in a Global C...
Boiling the Ocean, Together: Web Archive Collection Development in a Global C...Boiling the Ocean, Together: Web Archive Collection Development in a Global C...
Boiling the Ocean, Together: Web Archive Collection Development in a Global C...
 
Advocating for Web Archivability
Advocating for Web ArchivabilityAdvocating for Web Archivability
Advocating for Web Archivability
 
Building Archivable Websites
Building Archivable WebsitesBuilding Archivable Websites
Building Archivable Websites
 
Link Persistence, Website Persistence
Link Persistence, Website PersistenceLink Persistence, Website Persistence
Link Persistence, Website Persistence
 
From Seed to Harvest: Web Archiving Program Considerations for SUL
From Seed to Harvest: Web Archiving Program Considerations for SULFrom Seed to Harvest: Web Archiving Program Considerations for SUL
From Seed to Harvest: Web Archiving Program Considerations for SUL
 
A Survey of Research Prospects for more Manageable Personal Digital Photo Col...
A Survey of Research Prospects for more Manageable Personal Digital Photo Col...A Survey of Research Prospects for more Manageable Personal Digital Photo Col...
A Survey of Research Prospects for more Manageable Personal Digital Photo Col...
 

Kürzlich hochgeladen

Q4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxQ4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxeditsforyah
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书rnrncn29
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationLinaWolf1
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 
NSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentationNSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentationMarko4394
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书rnrncn29
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Sonam Pathan
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 

Kürzlich hochgeladen (17)

Q4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxQ4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptx
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 Documentation
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
NSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentationNSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentation
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 

Lots More LOCKSS for Web Archiving: Boons from the LOCKSS Software Re-Architecture

  • 1. LOTS OF COPIES KEEP STUFF SAFE Lots More LOCKSS for Web Archiving: Boons from the LOCKSS Software Re-Architecture Nicholas Taylor (@nullhandle) Program Manager, LOCKSS and Web Archiving Stanford University Libraries Web Archiving Week 15 June 2017
  • 2. overview • LOCKSS background • software re- architecture • software components • roadmap “LAX on take off” by Doug under CC BY-NC-ND 2.0
  • 3. LOCKSS Background “Padlocks I” by Domiriel under CC BY-NC 2.0
  • 4. beginnings • a serials librarian + a computer scientist • print journals → Web • conserve library’s role as preserver • collect from publishers’ websites • preserve w/ cheap, distributed, library- managed hardware • disseminate when unavailable from publisher Internet Archive: “LOCKSS | Lots of Copies Keep Stuff Safe”
  • 5. present day • financially self- sustaining • tens of networks • hundreds of institutions • all types of content “LOCKSS | Lots of Copies Keep Stuff Safe”
  • 6. lots of copies “The lost cannot be recovered; but let us save what remains: not by vaults and locks which fence them from the public eye and use, in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident.” National Archives: “from Thomas Jefferson to Ebenezer Hazard, 18 February 1791” “Thomas Jefferson” by Rembrandt Peale under Public Domain
  • 7. decentralized copies • no monopoly on copy- making • independent, de- correlated copies • no central point of failure or vulnerability • local custody, self- determination “Domino's” by david pacey under CC BY 2.0
  • 8. articulated threat model • long-term bit integrity is a hard problem • more (correlated) copies doesn’t necessarily keep stuff safe • don’t underestimate: • people making mistakes • attacks on information • organizational failure “Fragile” by Garrett Coakley under CC BY-NC 2.0
  • 9. community-validated • built upon peer- reviewed research • successfully operating for almost 20 years • CRL TRAC assessment of CLOCKSS • overall score matching previous best • only perfect technology score awarded to date
  • 10. Software Re-Architecture “The two bridges” by Frank Schulenburg under BY-SA 2.0
  • 11. why re-architect LOCKSS? • reduce support + operations costs • de-silo components + enable external integration • prepare to evolve w/ the Web “New York Reflection” by Reto Fetz under CC BY-NC-SA 2.0
  • 12. aligning with web archiving Web ARChive (WARC) format compatible technologies • Heritrix • OpenWayback • WarcBase • Web Archiving Proxy 12
  • 13. web archiving system APIs (WASAPI)
  • 15. Software Components “Gears” by William Warby under CC BY 2.0
  • 16. bibliographic metadata extraction functionality • for web harvest + file transfer content • map values in DOM tree to metadata fields • retrieve downloadable metadata from expected URL patterns • parse RIS + XML by schema fields • creator • title • published year • volume • issue • article name
  • 17. publisher/platform heuristics • plugins making bibliographic objects + their metadata on many publishing platforms machine- intelligible • a framework for creating such plugins • useful in absence of standard conventions, e.g. Signposting
  • 18. use cases for metadata extraction • apply to consistent subsets of content in larger corpora • curate OA materials within broader crawls • retrieve faculty publications posted online, license allowing • describe sub-sites collected while self- archiving from a single institutional CMS “Stanford University” “ARCADE”
  • 19. discovery via bibliographic metadata • submit DOI or OpenURL query • get OpenWayback access URLs • integrate w/ OpenURL resolver Stanford Libraries: “Find eJournal”
  • 20. on-access format migration • as described in 2005 D- Lib paper by DSHR et al • low obsolescence risk suggested by research from Holden, Jackson • implement upstream in OpenWayback • example: X-BitMap → GIF migration Stanford Libraries: “WorldWideWeb SLAC Home Page”
  • 21. on-access format migration • as described in 2005 D- Lib paper by DSHR et al • low obsolescence risk suggested by research from Holden, Jackson • implement upstream in OpenWayback • example: X-BitMap → GIF migration Stanford Libraries: “WorldWideWeb SLAC Home Page”
  • 22. audit + repair protocol • core preservation capability • network nodes conduct polls to validate integrity of distributed copies of data chunks • more nodes = more security • more nodes can be down • more copies can be corrupted • …and polls will still conclude • nonces force re-hashing • peers are untrusted • polls are slow, to allow damage detection “DSC_4346” by Dennis Jarvis under BY-SA 2.0
  • 23. harvest from content source content source LOCKSS network
  • 24. initiate poll w/ nonce What is the hash of 9815 + chunk 173?
  • 26. nodes respond to poll Hash of 9815 + chunk 173 is 3716 votes for 3716: 1
  • 27. nodes respond to poll Hash of 9815 + chunk 173 is 3716 votes for 3716: 2
  • 28. nodes respond to poll Hash of 9815 + chunk 173 is 8413 votes for 3716: 2 votes for 8413: 1
  • 29. initiate poll w/ nonce Hash of 9815 + chunk 173 is 3716 votes for 3716: 3 votes for 8413: 1
  • 30. repair from content source content source LOCKSS network
  • 31. use cases for audit + repair • other distributed digital preservation networks • repository storage replication layers • would like to support varied back-ends: tape, cloud, etc.
  • 32. Roadmap “Milestone?” by Matteo De Toffoli under CC BY-NC-SA 2.0
  • 33. development progress • access WARC-stored content via: • DOI • OpenURL • Memento (URL) • Solr full-text search • web services: • metadata extraction • metadata query “Milestones” by Dheeraj Nagwani under CC BY-NC-ND 2.0
  • 34. looking ahead • by end of 2017 • Docker-ize components • web harvest framework • polling + repair web service • by end of 2018 • IP address + Shibboleth access via OpenWayback • OpenWayback on-access format migration • full-text search web service “Printemps, work in progress” by Eric Gjerde under CC BY-NC 2.0
  • 35. follow + plug in • development (periodically) being pushed to Github • moving toward more community-oriented software development • announcement of work cycles • sprint closeout reports + demos • community engagement Github: “LOCKSS (Lots Of Copies Keep Stuff Safe)”
  • 36. questions for you • what potential do you see for LOCKSS technologies for web archiving, other use cases? • what standards or technologies could we use that we maybe haven’t considered? • how could we help you to use LOCKSS technologies? • how would you like to see LOCKSS plug in more to the web archiving community?

Hinweis der Redaktion

  1. update rosette