Tool Academy: Web Archiving
Nicholas Taylor
@nullhandle
Digital Cultural Heritage DC Meetup
December 20, 2012
“cobwebbed screw driver” by Flickr user Colby Gutierrez-Kraybill under CC BY 2.0
what does a web archive look like?
W/ARC (web archive container format)
flat files, directory tree
“Buckets and Buckets” by Flickr user Josh Kenzer under CC BY-NC-SA 2.0
“Files” by Flickr user Artform Canada under CC BY-NC-ND 2.0
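To make the container format on this slide concrete, here is a minimal Python sketch of the record layout described by the WARC standard: a version line, named header fields, a blank line, the content block, and two trailing CRLFs. The function names are hypothetical and the parser only handles the single uncompressed record it is given; real WARC files are usually gzipped and hold many records.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri, payload, content_type="text/plain"):
    """Serialize one WARC 'resource' record as bytes (hypothetical helper)."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # A record ends with two CRLFs after the content block.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


def parse_warc_record(record):
    """Split a single uncompressed record into (header fields, content block)."""
    head, _, rest = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    if not lines[0].startswith("WARC/"):
        raise ValueError("not a WARC record")
    fields = dict(line.split(": ", 1) for line in lines[1:])
    # Content-Length says how many bytes of the remainder belong to the block.
    return fields, rest[: int(fields["Content-Length"])]


record = build_warc_record("http://example.com/", b"hello")
fields, body = parse_warc_record(record)
print(fields["WARC-Type"], len(body))
```

The "flat files, directory tree" alternative needs no container at all, which is why it is easy to browse but loses the capture metadata that the WARC headers carry.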
CAPTURE TOOLS
“Crab traps?” by Flickr user Aviruthia under CC BY-NC-SA 2.0
HTTrack
• small-scale website copier
• recreates website structure as filesystem hierarchy
• Windows GUI or CLI
• *nix local web service or CLI
http://www.httrack.com/
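As a rough illustration of "recreates website structure as filesystem hierarchy", the Python sketch below maps a URL to a local path under a mirror directory. This is not HTTrack's actual naming algorithm; the `mirror_path` function, the `mirror` root, and the `index.html` default are assumptions made for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def mirror_path(url, root="mirror"):
    """Map a URL to a local file path: root/host/path (illustrative only)."""
    parts = urlsplit(url)
    path = parts.path
    # Directory-style URLs need a file name to live under on disk.
    if path == "" or path.endswith("/"):
        path += "index.html"
    return PurePosixPath(root, parts.netloc, path.lstrip("/"))


print(mirror_path("http://www.httrack.com/"))
print(mirror_path("http://example.com/docs/guide.html"))
```

A mapping like this is what makes the copy browsable from disk, at the cost of discarding HTTP headers and capture timestamps.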
Heritrix
• web-scale archival crawler
• WARC output
• configure and run through web service
• Java app, runs best on *nix
https://webarchive.jira.com/wiki/display/Heritrix/Heritrix
Wget
• retrieve Internet-accessible files
• supports WARC output
• CLI utility
http://archiveteam.org/index.php?title=Wget_with_WARC_output
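For readers who want to try the WARC output mentioned above, this Python sketch assembles a Wget command line and runs it only when asked. It assumes a Wget build with WARC support (1.14 or later); `wget_warc_command` and `run_capture` are hypothetical helper names, and the choice of `--mirror` with `--page-requisites` is one plausible configuration rather than a recommended one.

```python
import shutil
import subprocess


def wget_warc_command(url, warc_name):
    """Build an argv list that crawls `url` and records traffic to a WARC."""
    return [
        "wget",
        "--mirror",                   # recursive retrieval with timestamping
        "--page-requisites",          # also fetch images, CSS, and scripts
        f"--warc-file={warc_name}",   # Wget appends .warc.gz to this name
        "--warc-cdx",                 # also write a CDX index of the records
        url,
    ]


def run_capture(url, warc_name):
    """Run the capture; assumes wget is installed and on PATH."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget not found on PATH")
    return subprocess.run(wget_warc_command(url, warc_name), check=False).returncode


print(" ".join(wget_warc_command("http://example.com/", "example")))
```

Wget still writes its usual mirrored files alongside the WARC, so the same run yields both a browsable copy and an archival one.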
WARCreate
• archive single webpage(?) to WARC
• Chrome extension
• no production release yet
• may eventually bundle a self-contained Wayback Machine
http://matkelly.com/warcreate/
Warrick
• reconstruct website from web archives
• uses Memento protocol
• web service or downloadable Perl script
http://warrick.cs.odu.edu/
ArchiveFacebook
• archive an individual (authenticated) Facebook profile
• Firefox add-on
https://addons.mozilla.org/en-US/firefox/addon/archivefacebook/
REPLAY TOOLS
“Replay” by Flickr user shareski under CC BY-NC 2.0
Wayback Machine
• replay web resources stored in WARC and ARC files
• web service provided by Internet Archive
• also, downloadable software package
• Java app (Tomcat), runs best on *nix
https://github.com/internetarchive/wayback
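Replay is addressed by capture time plus original URL. The Python sketch below builds a replay URL in the form used by the Internet Archive's public service, a 14-digit `YYYYMMDDhhmmss` timestamp followed by the original URL. The `wayback_url` name is hypothetical, and the prefix of a locally installed Wayback instance depends on how it is deployed, so it is left as a parameter.

```python
from datetime import datetime


def wayback_url(original_url, when, prefix="http://web.archive.org/web/"):
    """Build a replay URL: prefix + 14-digit timestamp + '/' + original URL."""
    return f"{prefix}{when.strftime('%Y%m%d%H%M%S')}/{original_url}"


print(wayback_url("http://www.httrack.com/", datetime(2012, 12, 20, 12, 0, 0)))
```

The service answers with the capture closest to the requested timestamp, so the timestamp in the URL is a request, not a guarantee.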
MementoFox
• federated discovery of web archive resources
• uses Memento protocol
• utility limited by paucity of aggregated indexes
https://addons.mozilla.org/en-us/firefox/addon/mementofox/
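The Memento protocol used by MementoFox and Warrick works by datetime negotiation: the client sends an `Accept-Datetime` header to a TimeGate, which redirects to the capture closest to that time. The Python sketch below prepares such a request. The TimeGate prefix is an assumption (the value shown is a placeholder, not a real endpoint) and the function names are hypothetical.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime


def timegate_request(timegate_prefix, original_uri, when):
    """Return (url, headers) for a Memento datetime-negotiation request.

    `when` must be a timezone-aware datetime; it is sent in HTTP-date form.
    """
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return timegate_prefix + original_uri, {"Accept-Datetime": stamp}


def resolve_memento(timegate_prefix, original_uri, when):
    """Follow the TimeGate redirect; needs network access and a real TimeGate."""
    url, headers = timegate_request(timegate_prefix, original_uri, when)
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        # The final URL is the selected capture; the header gives its datetime.
        return response.geturl(), response.headers.get("Memento-Datetime")


url, headers = timegate_request(
    "http://timegate.example/",  # placeholder TimeGate prefix (assumption)
    "http://www.httrack.com/",
    datetime(2012, 12, 20, tzinfo=timezone.utc),
)
print(url, headers)
```

An aggregator exposes one TimeGate in front of many archives, which is why the usefulness of a client like MementoFox depends on how many archive indexes the aggregator covers.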
WORKFLOW TOOLS
“Replay” by Flickr user shareski under CC BY-NC 2.0
Web Curator Tool
• permissioning, job scheduling, harvesting, quality review, storing descriptive metadata
• coupled with Heritrix v1.0
• Java app (Tomcat)
http://webcurator.sourceforge.net/
NetarchiveSuite
• job scheduling, data transfer to preservation system, proxy replay
• built for national domain crawls
• coupled with Heritrix v1.0
• Java app (JMS)
https://sbforge.org/display/NAS/NetarchiveSuite
CINCH
• batch retrieval of Internet-accessible documents and transfer to preservation system
• web service for NC state government
• also, downloadable software package
• runs on *nix
http://cinch.nclive.org/Cinch/
HOSTED SERVICES
“Services” by Flickr user spodzone under CC BY-NC-ND 2.0
Archive-It
• integrated web archiving platform
• uses Heritrix and Wayback Machine
• contract service provided by Internet Archive
http://www.archive-it.org/
Web Archiving Service
• integrated web archiving platform
• uses Heritrix and Wayback Machine
• contract service provided by California Digital Library
http://webarchives.cdlib.org/was
FILE UTILITIES
“Begin at the Beginning” by Flickr user kate e. did under CC BY-NC-SA 2.0
HTTrack2Arc
• convert HTTrack output to ARC format
• CLI Java utility
http://code.google.com/p/httrack2arc/
warc-tools
• parse and re-write WARC files
• convert ARC files to WARC files
• no production release yet
• CLI Python utilities
http://code.hanzoarchives.com/warc-tools
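To give a sense of what ARC-to-WARC conversion involves, the Python sketch below maps one ARC version 1 URL-record header line (URL, IP address, archive date, content type, length) onto WARC named fields. It is not the warc-tools implementation; the function name is hypothetical, and it assumes the ARC record holds an HTTP response, which is not true of every ARC record.

```python
from datetime import datetime


def arc_header_to_warc_fields(line):
    """Translate an ARC v1 URL-record header line into WARC header fields."""
    url, ip_address, arc_date, _content_type, length = line.strip().split(" ")
    # ARC dates are YYYYMMDDhhmmss; WARC-Date uses ISO 8601 in UTC.
    when = datetime.strptime(arc_date, "%Y%m%d%H%M%S")
    return {
        "WARC-Type": "response",  # assumption: the record is an HTTP response
        "WARC-Target-URI": url,
        "WARC-IP-Address": ip_address,
        "WARC-Date": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # The ARC content type describes the payload, so it is not reused
        # as the WARC record's own Content-Type here.
        "Content-Length": length,
    }


print(arc_header_to_warc_fields(
    "http://example.com/ 192.0.2.1 20121220120000 text/html 1234"
))
```

A real converter also has to generate a record ID for each record and carry over the file-level metadata from the ARC's first record.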
Web Archive Transformation (WAT) Utilities
• extract metadata from WARCs for data analysis
• read data from local, http, or hdfs-accessible W/ARCs
• output JSON
https://webarchive.jira.com/wiki/display/Iresearch/Web+Archive+Transformation+%28WAT%29+Specification,+Utilities,+and+Usage+Overview
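Because the WAT utilities emit JSON, the extracted metadata can be analyzed with ordinary tooling. The Python sketch below pulls the target URI and outbound links from one record. The sample record and its key names (`Envelope`, `WARC-Header-Metadata`, `Payload-Metadata`, `HTML-Metadata`, `Links`) are assumptions modeled on my reading of the WAT layout and should be checked against the specification linked above; `extract_links` is a hypothetical helper.

```python
import json

# Hand-written sample; key names are assumed, not copied from real WAT output.
SAMPLE_WAT_RECORD = """
{
  "Envelope": {
    "WARC-Header-Metadata": {
      "WARC-Type": "response",
      "WARC-Target-URI": "http://example.com/"
    },
    "Payload-Metadata": {
      "HTTP-Response-Metadata": {
        "HTML-Metadata": {
          "Links": [
            {"path": "A@/href", "url": "http://example.com/about"},
            {"path": "IMG@/src", "url": "http://example.com/logo.png"}
          ]
        }
      }
    }
  }
}
"""


def extract_links(record):
    """Return (target URI, list of link URLs) from one parsed record."""
    envelope = record.get("Envelope", {})
    target = envelope.get("WARC-Header-Metadata", {}).get("WARC-Target-URI", "")
    html = (
        envelope.get("Payload-Metadata", {})
        .get("HTTP-Response-Metadata", {})
        .get("HTML-Metadata", {})
    )
    # Records without HTML metadata simply yield an empty link list.
    return target, [link["url"] for link in html.get("Links", []) if "url" in link]


target, links = extract_links(json.loads(SAMPLE_WAT_RECORD))
print(target, links)
```

Working from the metadata rather than the full W/ARC payloads keeps link-graph and similar analyses small enough to run without replaying the archive.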
thank you!
Nicholas Taylor
@nullhandle

Editor's notes

  1. http://www.httrack.com/
  2. https://webarchive.jira.com/wiki/display/Heritrix/Heritrix
  3. http://archiveteam.org/index.php?title=Wget_with_WARC_output
  4. http://matkelly.com/warcreate/
  5. http://warrick.cs.odu.edu/
  6. https://addons.mozilla.org/en-US/firefox/addon/archivefacebook/
  7. https://github.com/internetarchive/wayback
  8. https://addons.mozilla.org/en-us/firefox/addon/mementofox/
  9. http://webcurator.sourceforge.net/
  10. https://sbforge.org/display/NAS/NetarchiveSuite
  11. http://cinch.nclive.org/Cinch/
  12. http://www.archive-it.org/
  13. http://webarchives.cdlib.org/was
  14. http://code.google.com/p/httrack2arc/
  15. http://code.hanzoarchives.com/warc-tools
  16. https://webarchive.jira.com/wiki/display/Iresearch/Web+Archive+Transformation+%28WAT%29+Specification,+Utilities,+and+Usage+Overview