SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Archive What I See Now
Mat Kelly, Michael L. Nelson, Michele C. Weigle
Old Dominion University
{mkelly,mln,mweigle}@cs.odu.edu
Web Science and Digital Libraries Research Group
ws-dl.blogspot.com
What’s the Problem?
•
•
•
•

Web archives capture a lot but not everything
Individuals’ interests may not be captured
Timely capture is important
Capture capability must
be enabled for all

November 12, 2013
Salt Lake City, Utah

2
2013 Archive-It Partner Meeting
Timely Capture Is Important
Use Case: Capturing Breaking Stories

• Calls for seed URIs
are reactionary
• Not quick enough
for rapidly
evolving events

November 12, 2013
Salt Lake City, Utah

3
2013 Archive-It Partner Meeting
Timely Capture Is Important
Use Case: Capturing Breaking Stories

• Intermediate
mementos missed
• The story is
incomplete

November 12, 2013
Salt Lake City, Utah

4
2013 Archive-It Partner Meeting
Timely Capture Is Important
Use Case: Capturing Breaking Stories

November 12, 2013
Salt Lake City, Utah

5
2013 Archive-It Partner Meeting
Timely Capture Is Important
Use Case: Capturing Breaking Stories

November 12, 2013
Salt Lake City, Utah

6
2013 Archive-It Partner Meeting
The Amateur Archivist’s Approach
to Just-In-Time capture
• Users take ad hoc approaches
1. Screenshots of Pages

2. Other sub-optimal
approaches

November 12, 2013
Salt Lake City, Utah

7
2013 Archive-It Partner Meeting
Enabling The Amateur
Web Archivist
• Acknowledge the problem:
– THE TOOLS ARE DIFFICULT!

• Resolve the problem:
– Build more accessible tools (make it EASY)
– Appeal to standards (e.g., WARC)
– Make interoperable

November 12, 2013
Salt Lake City, Utah

28500:2009

8
2013 Archive-It Partner Meeting
The Institutional Dilemma
• Safety of Archives Requires $
• Institutions Require Funding
• Users’ Hard Drives Fail
– No Access to Save-As files
and Screenshots

• Hybrid approach needed
– Leverage institutional safety, formats, and tech
– allow direct user deposits

November 12, 2013
Salt Lake City, Utah

9
2013 Archive-It Partner Meeting
So we built it!
WARCreate – Google Chrome extension
• Create web archives from browser
• Capture personalized content
• Preserve on a whim

1.
2.

Mat Kelly and Michele C., "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage,"
In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2012). Washington, DC, June 2012, pp. 437-438
Mat Kelly, Michele C. Weigle , Michael Nelson. "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage,"
Digital Preservation 2012, Tools Demo Session: Web Archiving; 2012 Jul 25; Washington, DC.

November 12, 2013
Salt Lake City, Utah

10
2013 Archive-It Partner Meeting
WARCreate – How it Works

November 12, 2013
Salt Lake City, Utah

11
2013 Archive-It Partner Meeting
Preserving the Original Context
Use Case: Capturing Facebook
Archive created from
WARCreate in Wayback

Facebook-Supplied Data Dump

Liberated Data Doesn’t Give The Whole Picture
November 12, 2013
Salt Lake City, Utah

12
2013 Archive-It Partner Meeting
Preserving the Original Context
Use Case: Capturing Facebook
Using Scraping Tools (e.g. wget)

Archive created from
WARCreate in Wayback

The Target Controls What is Allowed
November 12, 2013
Salt Lake City, Utah

13
2013 Archive-It Partner Meeting
Preserving the Original Context
Use Case: Capturing Facebook
Archive created from
WARCreate in Wayback

A Crawler Has No Context

No Credentials  No Entry  No Archiving
November 12, 2013
Salt Lake City, Utah

14
2013 Archive-It Partner Meeting
Preserving the Original Context
Use Case: Capturing Facebook
Archive created from
WARCreate in Wayback

IA/HERITRIX OBEY ROBOTS

No Means No, if They Say and you Obey
November 12, 2013
Salt Lake City, Utah

15
2013 Archive-It Partner Meeting
So we built it!
WARCreate – Google Chrome extension
• Create web archives from browser
• Capture personalized content
• Preserve on a whim

November 12, 2013
Salt Lake City, Utah

16
2013 Archive-It Partner Meeting
Users can now create WARCs!
WARCreate – Google Chrome extension
• Create web archives from browser
• Capture personalized content
• Preserve on a whim

Users don’t know

WHAT TO DO
with WARC files
November 12, 2013
Salt Lake City, Utah

17
2013 Archive-It Partner Meeting
So, again, we built it!
Web Archiving Integration Layer (WAIL)
• Heritrix, Wayback, etc. packaged for PC
• GUI front-end allows “One-Click Preservation”
• Provides means to replay WARCs

1.
2.

November 12, 2013
Salt Lake City, Utah

Mat Kelly, Michele C. Weigle, Michael Nelson. "Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving,"
Personal Digital Archiving 2013, Poster Session; 2013 Feb 21; College Park, MD.
Mat Kelly, Michael Nelson and Michele C. Weigle. "WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy,"
Digital Preservation 2013, Workshops and Sessions: Web Archiving; 2013 Jul 24; Alexandria, VA

18

2013 Archive-It Partner Meeting
So, again, we built it!
Web Archiving Integration Layer (WAIL)
• Heritrix, Wayback, etc. packaged for PC
• GUI front-end allows “One-Click Preservation”
• Provides means to replay WARCs

November 12, 2013
Salt Lake City, Utah

19
2013 Archive-It Partner Meeting
The

Archive What I See Now
Project

November 12, 2013
Salt Lake City, Utah

20
2013 Archive-It Partner Meeting
The Archive What I See Now Project:
Three Goals
1. Port
2. Add functionality in:
…
to upload WARCs to:

&
&

3. Implement Sequential Archiving
November 12, 2013
Salt Lake City, Utah

21
2013 Archive-It Partner Meeting
Porting WARCreate to Firefox
• Disjoint extension/add-on APIs
– Little logic can be re-used

• Problems with HTTP header capture in
Chrome are trivial in Firefox
– Chrome = highly asynchronous fetching

• Code to save WARC to PC from browser
reusable in Firefox

November 12, 2013
Salt Lake City, Utah

22
2013 Archive-It Partner Meeting
The Archive What I See Now Project:
Three Goals

✓ In βeta now!

1. Port
2. Add functionality in:
…
to upload WARCs to:

&
&

3. Implement Sequential Archiving
November 12, 2013
Salt Lake City, Utah

23
2013 Archive-It Partner Meeting
The Archive What I See Now Project:
Three Goals
1. Port
2. Add functionality in:
…
to upload WARCs to:

&
&

3. Implement Sequential Archiving
November 12, 2013
Salt Lake City, Utah

24
2013 Archive-It Partner Meeting
Uploading WARCs:
An Open Question
• Working with Archive-It to determine
feasibility of user-provided WARCs
• Consideration of data integrity
• Should data be merged with A-IT crawled
WARCs?
– How do we account for
your www.facebook.com vs. my www.facebook.com

• Privacy?
November 12, 2013
Salt Lake City, Utah

25
2013 Archive-It Partner Meeting
The Archive What I See Now Project:
Three Goals
1. Port
2. Add functionality in:
…
to upload WARCs to:

&
&

3. Implement Sequential Archiving
November 12, 2013
Salt Lake City, Utah

26
2013 Archive-It Partner Meeting
Sequential Archiving?
• Similar to a focused crawl but URIs defined on
per-site basis to be comprehensive
– Akin to

but generalized

• Implemented into
WARCreate
• Utilize per-site specification to
keep tools from breaking★
personal stream

my tweets

news feed

streams

followees’ tweets

multimedia-photos

photos

photos

N/A

multimedia-videos

videos

videos

N/A

photo collection

albums

N/A

N/A

posts

notes

N/A

N/A

friends

November 12, 2013
Salt Lake City, Utah

posts

global stream

Discovery & Scraping:
The Information Retrieval Approach
- versus The Digital Libraries Approach★

wall

friends

circles

following

27
2013 Archive-It Partner Meeting
Online Hierarchy Definition
• Only (and optionally) applied on recognized sites
– scraping as fallback for establishing hierarchy

• Not limited to social media
– CNN.com, MSNBC.com, etc have similar hierarchies

• Lives online, tools allude to and are always
updated
• Standardized spec* prototype is live online
* M. Kelly, An Extensible Framework for Creating Personal Archives of Web Resources Requiring Authentication, Aug 2012
November 12, 2013
Salt Lake City, Utah

28
2013 Archive-It Partner Meeting
Summary
• Firefox WARCreate in Beta
– Chrome WARCreate Users Can Currently
Archive What They See Now with
&

• Sequential Archiving Implemented in Chrome
WARCreate, needs porting
• Next Big Hurdle: Working with Archive-It in
WARC upload logistics

November 12, 2013
Salt Lake City, Utah

29
2013 Archive-It Partner Meeting
Archive What I See Now
• Download Our Archiving Tools!
Web Archiving Integration Layer (WAIL)
http://matkelly.com/WAIL
One-Click Preservation
Heritrix, Wayback and Others On Your PC!

WARCreate for Chrome

http://WARCreate.com
Create WARC files form any web page from your browser

• Share Your Use Cases for Capturing the
Unpreserved and the Unpreservable
• Help Us Improve Our Tools, Give Feedback!
http://bit.ly/wc-wail
November 12, 2013
Salt Lake City, Utah

version in beta
Available Soon!

30
2013 Archive-It Partner Meeting

Weitere ähnliche Inhalte

Was ist angesagt?

Wiki Technology By It Rocks
Wiki Technology By It RocksWiki Technology By It Rocks
Wiki Technology By It Rocksnaveenv
 
Wiki Technology By IT ROCKS
Wiki Technology By IT ROCKSWiki Technology By IT ROCKS
Wiki Technology By IT ROCKSnaveenv
 
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising Anna Perricci
 
Introduction to Web Archiving
Introduction to Web ArchivingIntroduction to Web Archiving
Introduction to Web ArchivingAnna Perricci
 
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Anna Perricci
 
Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...
Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...
Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...Brian Gray
 
Visualizing linkeddata aall2012d-ss
Visualizing linkeddata aall2012d-ssVisualizing linkeddata aall2012d-ss
Visualizing linkeddata aall2012d-ssF. Tim Knight
 
Slide show sa cworkshop apr23
Slide show   sa cworkshop apr23Slide show   sa cworkshop apr23
Slide show sa cworkshop apr23Mary Bowman-Kruhm
 
Beyond Research Guides
Beyond Research GuidesBeyond Research Guides
Beyond Research GuidesWiLS
 
Nassau Library2
Nassau Library2Nassau Library2
Nassau Library2cara catalano
 
Wrangling Wikipedia
Wrangling WikipediaWrangling Wikipedia
Wrangling Wikipediamoniquekclark
 
Building Together With Collaborative Web Technologies Revised
Building Together With Collaborative Web Technologies RevisedBuilding Together With Collaborative Web Technologies Revised
Building Together With Collaborative Web Technologies RevisedMark-Shane Scale ♞
 
Building Web Archiving Technology, Together
Building Web Archiving Technology, TogetherBuilding Web Archiving Technology, Together
Building Web Archiving Technology, Togethernullhandle
 
Using Web 2.0 Principles to Become Librarian and Educator 2.0 - Wikis
Using Web 2.0 Principles to Become Librarian and Educator 2.0 - WikisUsing Web 2.0 Principles to Become Librarian and Educator 2.0 - Wikis
Using Web 2.0 Principles to Become Librarian and Educator 2.0 - WikisBrian Gray
 
Clicklaw wikibooks for beyond hope 2013
Clicklaw wikibooks for beyond hope 2013Clicklaw wikibooks for beyond hope 2013
Clicklaw wikibooks for beyond hope 2013Nathaniel Russell
 
Wikimedia, MediaWiki & Education in IT: Notes
Wikimedia, MediaWiki & Education in IT: NotesWikimedia, MediaWiki & Education in IT: Notes
Wikimedia, MediaWiki & Education in IT: NotesJeromy-Yu Maximiliain Chan
 
SCLS Web 2.0 2.0 presentation 2012
SCLS Web 2.0 2.0 presentation 2012SCLS Web 2.0 2.0 presentation 2012
SCLS Web 2.0 2.0 presentation 2012bmmsben
 

Was ist angesagt? (20)

Wiki Technology By It Rocks
Wiki Technology By It RocksWiki Technology By It Rocks
Wiki Technology By It Rocks
 
Wiki Technology By IT ROCKS
Wiki Technology By IT ROCKSWiki Technology By IT ROCKS
Wiki Technology By IT ROCKS
 
Data visualization and school finance
Data visualization and school financeData visualization and school finance
Data visualization and school finance
 
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
 
Introduction to Web Archiving
Introduction to Web ArchivingIntroduction to Web Archiving
Introduction to Web Archiving
 
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
 
Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...
Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...
Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...
 
Visualizing linkeddata aall2012d-ss
Visualizing linkeddata aall2012d-ssVisualizing linkeddata aall2012d-ss
Visualizing linkeddata aall2012d-ss
 
Slide show sa cworkshop apr23
Slide show   sa cworkshop apr23Slide show   sa cworkshop apr23
Slide show sa cworkshop apr23
 
Beyond Research Guides
Beyond Research GuidesBeyond Research Guides
Beyond Research Guides
 
Nassau Library2
Nassau Library2Nassau Library2
Nassau Library2
 
Wrangling Wikipedia
Wrangling WikipediaWrangling Wikipedia
Wrangling Wikipedia
 
Building Together With Collaborative Web Technologies Revised
Building Together With Collaborative Web Technologies RevisedBuilding Together With Collaborative Web Technologies Revised
Building Together With Collaborative Web Technologies Revised
 
Building Web Archiving Technology, Together
Building Web Archiving Technology, TogetherBuilding Web Archiving Technology, Together
Building Web Archiving Technology, Together
 
Using Web 2.0 Principles to Become Librarian and Educator 2.0 - Wikis
Using Web 2.0 Principles to Become Librarian and Educator 2.0 - WikisUsing Web 2.0 Principles to Become Librarian and Educator 2.0 - Wikis
Using Web 2.0 Principles to Become Librarian and Educator 2.0 - Wikis
 
Clicklaw wikibooks for beyond hope 2013
Clicklaw wikibooks for beyond hope 2013Clicklaw wikibooks for beyond hope 2013
Clicklaw wikibooks for beyond hope 2013
 
Wikimedia, MediaWiki & Education in IT: Notes
Wikimedia, MediaWiki & Education in IT: NotesWikimedia, MediaWiki & Education in IT: Notes
Wikimedia, MediaWiki & Education in IT: Notes
 
Cyberlaw presentation
Cyberlaw presentationCyberlaw presentation
Cyberlaw presentation
 
Wikipedia
Wikipedia Wikipedia
Wikipedia
 
SCLS Web 2.0 2.0 presentation 2012
SCLS Web 2.0 2.0 presentation 2012SCLS Web 2.0 2.0 presentation 2012
SCLS Web 2.0 2.0 presentation 2012
 

Ähnlich wie Archive What I See Now - Archive-It Partner Meeting 2013 2013

On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeMichael Nelson
 
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...✔ Eric David Benari, PMP
 
Digital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content ManagementDigital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content ManagementNoreen Whysel
 
Academic Makerspaces: Connections & Conversations - presentation at Internet ...
Academic Makerspaces: Connections & Conversations - presentation at Internet ...Academic Makerspaces: Connections & Conversations - presentation at Internet ...
Academic Makerspaces: Connections & Conversations - presentation at Internet ...Patrick "Tod" Colegrove
 
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
MementoMap: A Web Archive Profiling Framework for Efficient Memento RoutingMementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
MementoMap: A Web Archive Profiling Framework for Efficient Memento RoutingSawood Alam
 
Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...
Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...
Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...ScottAinsworth
 
Drupal for Project Managers, Part 3: Launching
Drupal for Project Managers, Part 3: LaunchingDrupal for Project Managers, Part 3: Launching
Drupal for Project Managers, Part 3: LaunchingAcquia
 
METRO Conference Presentation Jan 2015
METRO Conference Presentation Jan 2015METRO Conference Presentation Jan 2015
METRO Conference Presentation Jan 2015Victoria Steeves
 
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...The Frick Collection
 
Detecting and Analyzing Subpopulations within Connectivist MOOCs: Initial work
Detecting and Analyzing Subpopulations within Connectivist MOOCs: Initial workDetecting and Analyzing Subpopulations within Connectivist MOOCs: Initial work
Detecting and Analyzing Subpopulations within Connectivist MOOCs: Initial workMartin Hawksey
 
MCN 2013 Museum Digital Asset Management and Aggregation Survey
MCN 2013 Museum Digital Asset Management and Aggregation Survey MCN 2013 Museum Digital Asset Management and Aggregation Survey
MCN 2013 Museum Digital Asset Management and Aggregation Survey scottsayre
 
Social Bookmarking:Del.icio.us ExposĂŠ
Social Bookmarking:Del.icio.us ExposĂŠSocial Bookmarking:Del.icio.us ExposĂŠ
Social Bookmarking:Del.icio.us ExposĂŠJudy O'Connell
 
An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...Mat Kelly
 
RDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesRDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesASIS&T
 
The Evolution of the UC San Diego Library DAMS
The Evolution of the  UC San Diego Library DAMSThe Evolution of the  UC San Diego Library DAMS
The Evolution of the UC San Diego Library DAMSMatthew Critchlow
 
Metadata - Linked Data
Metadata - Linked DataMetadata - Linked Data
Metadata - Linked DataRichard Wallis
 
Metadata / Linked Data
Metadata / Linked DataMetadata / Linked Data
Metadata / Linked DataRichard Wallis
 
Practical Data Management - ACRL DCIG Webinar
Practical Data Management - ACRL DCIG WebinarPractical Data Management - ACRL DCIG Webinar
Practical Data Management - ACRL DCIG WebinarKristin Briney
 
JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Publi...
JCDL 2015 Doctoral Consortium - A Framework for AggregatingPrivate and Publi...JCDL 2015 Doctoral Consortium - A Framework for AggregatingPrivate and Publi...
JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Publi...Mat Kelly
 

Ähnlich wie Archive What I See Now - Archive-It Partner Meeting 2013 2013 (20)

On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over Time
 
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
 
Digital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content ManagementDigital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content Management
 
Academic Makerspaces: Connections & Conversations - presentation at Internet ...
Academic Makerspaces: Connections & Conversations - presentation at Internet ...Academic Makerspaces: Connections & Conversations - presentation at Internet ...
Academic Makerspaces: Connections & Conversations - presentation at Internet ...
 
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
MementoMap: A Web Archive Profiling Framework for Efficient Memento RoutingMementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
 
Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...
Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...
Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...
 
Drupal for Project Managers, Part 3: Launching
Drupal for Project Managers, Part 3: LaunchingDrupal for Project Managers, Part 3: Launching
Drupal for Project Managers, Part 3: Launching
 
METRO Conference Presentation Jan 2015
METRO Conference Presentation Jan 2015METRO Conference Presentation Jan 2015
METRO Conference Presentation Jan 2015
 
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
 
Detecting and Analyzing Subpopulations within Connectivist MOOCs: Initial work
Detecting and Analyzing Subpopulations within Connectivist MOOCs: Initial workDetecting and Analyzing Subpopulations within Connectivist MOOCs: Initial work
Detecting and Analyzing Subpopulations within Connectivist MOOCs: Initial work
 
MCN 2013 Museum Digital Asset Management and Aggregation Survey
MCN 2013 Museum Digital Asset Management and Aggregation Survey MCN 2013 Museum Digital Asset Management and Aggregation Survey
MCN 2013 Museum Digital Asset Management and Aggregation Survey
 
Social Bookmarking:Del.icio.us ExposĂŠ
Social Bookmarking:Del.icio.us ExposĂŠSocial Bookmarking:Del.icio.us ExposĂŠ
Social Bookmarking:Del.icio.us ExposĂŠ
 
An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...
 
RDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesRDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data services
 
Semantic wikis
Semantic wikisSemantic wikis
Semantic wikis
 
The Evolution of the UC San Diego Library DAMS
The Evolution of the  UC San Diego Library DAMSThe Evolution of the  UC San Diego Library DAMS
The Evolution of the UC San Diego Library DAMS
 
Metadata - Linked Data
Metadata - Linked DataMetadata - Linked Data
Metadata - Linked Data
 
Metadata / Linked Data
Metadata / Linked DataMetadata / Linked Data
Metadata / Linked Data
 
Practical Data Management - ACRL DCIG Webinar
Practical Data Management - ACRL DCIG WebinarPractical Data Management - ACRL DCIG Webinar
Practical Data Management - ACRL DCIG Webinar
 
JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Publi...
JCDL 2015 Doctoral Consortium - A Framework for AggregatingPrivate and Publi...JCDL 2015 Doctoral Consortium - A Framework for AggregatingPrivate and Publi...
JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Publi...
 

Mehr von Mat Kelly

Aggregating Private and Public Web Archives Using the Mementity Framework
Aggregating Private and Public Web Archives Using the Mementity FrameworkAggregating Private and Public Web Archives Using the Mementity Framework
Aggregating Private and Public Web Archives Using the Mementity FrameworkMat Kelly
 
Client-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer HeaderClient-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer HeaderMat Kelly
 
A Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web ArchivesA Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web ArchivesMat Kelly
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Mat Kelly
 
Exploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web ArchivesExploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web ArchivesMat Kelly
 
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Mat Kelly
 
Facilitation of the A Posteriori Replication of Web Published Satellite Imagery
Facilitation of the A Posteriori Replication of Web Published Satellite ImageryFacilitation of the A Posteriori Replication of Web Published Satellite Imagery
Facilitation of the A Posteriori Replication of Web Published Satellite ImageryMat Kelly
 
Slides
SlidesSlides
SlidesMat Kelly
 
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...Mat Kelly
 
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014Mat Kelly
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital PreservationMat Kelly
 
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction SystemIEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction SystemMat Kelly
 
Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving
Making Enterprise-Level Archive Tools Accessible for Personal Web ArchivingMaking Enterprise-Level Archive Tools Accessible for Personal Web Archiving
Making Enterprise-Level Archive Tools Accessible for Personal Web ArchivingMat Kelly
 
The Revolution Will Not Be Archived
The Revolution Will Not Be ArchivedThe Revolution Will Not Be Archived
The Revolution Will Not Be ArchivedMat Kelly
 
WARCreate - Create Wayback-Consumable WARC Files from Any Webpage
WARCreate - Create Wayback-Consumable WARC Files from Any WebpageWARCreate - Create Wayback-Consumable WARC Files from Any Webpage
WARCreate - Create Wayback-Consumable WARC Files from Any WebpageMat Kelly
 
NDIIPP/NDSA 2011 - YouTube Link Restoration
NDIIPP/NDSA 2011 - YouTube Link RestorationNDIIPP/NDSA 2011 - YouTube Link Restoration
NDIIPP/NDSA 2011 - YouTube Link RestorationMat Kelly
 
NDIIPP/NDSA 2011 - Archive Facebook
NDIIPP/NDSA 2011 - Archive FacebookNDIIPP/NDSA 2011 - Archive Facebook
NDIIPP/NDSA 2011 - Archive FacebookMat Kelly
 

Mehr von Mat Kelly (17)

Aggregating Private and Public Web Archives Using the Mementity Framework
Aggregating Private and Public Web Archives Using the Mementity FrameworkAggregating Private and Public Web Archives Using the Mementity Framework
Aggregating Private and Public Web Archives Using the Mementity Framework
 
Client-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer HeaderClient-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer Header
 
A Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web ArchivesA Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web Archives
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count
 
Exploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web ArchivesExploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web Archives
 
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
 
Facilitation of the A Posteriori Replication of Web Published Satellite Imagery
Facilitation of the A Posteriori Replication of Web Published Satellite ImageryFacilitation of the A Posteriori Replication of Web Published Satellite Imagery
Facilitation of the A Posteriori Replication of Web Published Satellite Imagery
 
Slides
SlidesSlides
Slides
 
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
 
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital Preservation
 
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction SystemIEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
 
Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving
Making Enterprise-Level Archive Tools Accessible for Personal Web ArchivingMaking Enterprise-Level Archive Tools Accessible for Personal Web Archiving
Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving
 
The Revolution Will Not Be Archived
The Revolution Will Not Be ArchivedThe Revolution Will Not Be Archived
The Revolution Will Not Be Archived
 
WARCreate - Create Wayback-Consumable WARC Files from Any Webpage
WARCreate - Create Wayback-Consumable WARC Files from Any WebpageWARCreate - Create Wayback-Consumable WARC Files from Any Webpage
WARCreate - Create Wayback-Consumable WARC Files from Any Webpage
 
NDIIPP/NDSA 2011 - YouTube Link Restoration
NDIIPP/NDSA 2011 - YouTube Link RestorationNDIIPP/NDSA 2011 - YouTube Link Restoration
NDIIPP/NDSA 2011 - YouTube Link Restoration
 
NDIIPP/NDSA 2011 - Archive Facebook
NDIIPP/NDSA 2011 - Archive FacebookNDIIPP/NDSA 2011 - Archive Facebook
NDIIPP/NDSA 2011 - Archive Facebook
 

KĂźrzlich hochgeladen

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel AraĂşjo
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 

KĂźrzlich hochgeladen (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Archive What I See Now - Archive-It Partner Meeting 2013 2013

  • 1. Archive What I See Now Mat Kelly, Michael L. Nelson, Michele C. Weigle Old Dominion University {mkelly,mln,mweigle}@cs.odu.edu Web Science and Digital Libraries Research Group ws-dl.blogspot.com
  • 2. What’s the Problem? • • • • Web archives capture a lot but not everything Individuals’ interests may not be captured Timely capture is important Capture capability must be enabled for all November 12, 2013 Salt Lake City, Utah 2 2013 Archive-It Partner Meeting
  • 3. Timely Capture Is Important Use Case: Capturing Breaking Stories • Calls for seed URIs are reactionary • Not quick enough for rapidly evolving events November 12, 2013 Salt Lake City, Utah 3 2013 Archive-It Partner Meeting
  • 4. Timely Capture Is Important Use Case: Capturing Breaking Stories • Intermediate mementos missed • The story is incomplete November 12, 2013 Salt Lake City, Utah 4 2013 Archive-It Partner Meeting
  • 5. Timely Capture Is Important Use Case: Capturing Breaking Stories November 12, 2013 Salt Lake City, Utah 5 2013 Archive-It Partner Meeting
  • 6. Timely Capture Is Important Use Case: Capturing Breaking Stories November 12, 2013 Salt Lake City, Utah 6 2013 Archive-It Partner Meeting
  • 7. The Amateur Archivist’s Approach to Just-In-Time capture • Users take ad hoc approaches 1. Screenshots of Pages 2. Other sub-optimal approaches November 12, 2013 Salt Lake City, Utah 7 2013 Archive-It Partner Meeting
  • 8. Enabling The Amateur Web Archivist • Acknowledge the problem: – THE TOOLS ARE DIFFICULT! • Resolve the problem: – Build more accessible tools (make it EASY) – Appeal to standards (e.g., WARC) – Make interoperable November 12, 2013 Salt Lake City, Utah 28500:2009 8 2013 Archive-It Partner Meeting
  • 9. The Institutional Dilemma • Safety of Archives Requires $ • Institutions Require Funding • Users’ Hard Drives Fail – No Access to Save-As files and Screenshots • Hybrid approach needed – Leverage institutional safety, formats, and tech – allow direct user deposits November 12, 2013 Salt Lake City, Utah 9 2013 Archive-It Partner Meeting
  • 10. So we built it! WARCreate – Google Chrome extension • Create web archives from browser • Capture personalized content • Preserve on a whim 1. 2. Mat Kelly and Michele C., "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2012). Washington, DC, June 2012, pp. 437-438 Mat Kelly, Michele C. Weigle , Michael Nelson. "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," Digital Preservation 2012, Tools Demo Session: Web Archiving; 2012 Jul 25; Washington, DC. November 12, 2013 Salt Lake City, Utah 10 2013 Archive-It Partner Meeting
  • 11. WARCreate – How it Works November 12, 2013 Salt Lake City, Utah 11 2013 Archive-It Partner Meeting
  • 12. Preserving the Original Context Use Case: Capturing Facebook Archive created from WARCreate in Wayback Facebook-Supplied Data Dump Liberated Data Doesn’t Give The Whole Picture November 12, 2013 Salt Lake City, Utah 12 2013 Archive-It Partner Meeting
  • 13. Preserving the Original Context Use Case: Capturing Facebook Using Scraping Tools (e.g. wget) Archive created from WARCreate in Wayback The Target Controls What is Allowed November 12, 2013 Salt Lake City, Utah 13 2013 Archive-It Partner Meeting
  • 14. Preserving the Original Context Use Case: Capturing Facebook Archive created from WARCreate in Wayback A Crawler Has No Context No Credentials  No Entry  No Archiving November 12, 2013 Salt Lake City, Utah 14 2013 Archive-It Partner Meeting
  • 15. Preserving the Original Context Use Case: Capturing Facebook Archive created from WARCreate in Wayback IA/HERITRIX OBEY ROBOTS No Means No, if They Say and you Obey November 12, 2013 Salt Lake City, Utah 15 2013 Archive-It Partner Meeting
  • 16. So we built it! WARCreate – Google Chrome extension • Create web archives from browser • Capture personalized content • Preserve on a whim November 12, 2013 Salt Lake City, Utah 16 2013 Archive-It Partner Meeting
  • 17. Users can now create WARCs! WARCreate – Google Chrome extension • Create web archives from browser • Capture personalized content • Preserve on a whim Users don’t know WHAT TO DO with WARC files November 12, 2013 Salt Lake City, Utah 17 2013 Archive-It Partner Meeting
  • 18. So, again, we built it! Web Archiving Integration Layer (WAIL) • Heritrix, Wayback, etc. packaged for PC • GUI front-end allows “One-Click Preservation” • Provides means to replay WARCs 1. 2. November 12, 2013 Salt Lake City, Utah Mat Kelly, Michele C. Weigle, Michael Nelson. "Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving," Personal Digital Archiving 2013, Poster Session; 2013 Feb 21; College Park, MD. Mat Kelly, Michael Nelson and Michele C. Weigle. "WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy," Digital Preservation 2013, Workshops and Sessions: Web Archiving; 2013 Jul 24; Alexandria, VA 18 2013 Archive-It Partner Meeting
  • 19. So, again, we built it! Web Archiving Integration Layer (WAIL) • Heritrix, Wayback, etc. packaged for PC • GUI front-end allows “One-Click Preservation” • Provides means to replay WARCs November 12, 2013 Salt Lake City, Utah 19 2013 Archive-It Partner Meeting
  • 20. The Archive What I See Now Project November 12, 2013 Salt Lake City, Utah 20 2013 Archive-It Partner Meeting
  • 21. The Archive What I See Now Project: Three Goals 1. Port 2. Add functionality in: … to upload WARCs to: & & 3. Implement Sequential Archiving November 12, 2013 Salt Lake City, Utah 21 2013 Archive-It Partner Meeting
  • 22. Porting WARCreate to Firefox • Disjoint extension/add-on APIs – Little logic can be re-used • Problems with HTTP header capture in Chrome are trivial in Firefox – Chrome = highly asynchronous fetching • Code to save WARC to PC from browser reusable in Firefox November 12, 2013 Salt Lake City, Utah 22 2013 Archive-It Partner Meeting
  • 23. The Archive What I See Now Project: Three Goals ✓ In βeta now! 1. Port 2. Add functionality in: … to upload WARCs to: & & 3. Implement Sequential Archiving November 12, 2013 Salt Lake City, Utah 23 2013 Archive-It Partner Meeting
  • 24. The Archive What I See Now Project: Three Goals 1. Port 2. Add functionality in: … to upload WARCs to: & & 3. Implement Sequential Archiving November 12, 2013 Salt Lake City, Utah 24 2013 Archive-It Partner Meeting
  • 25. Uploading WARCs: An Open Question • Working with Archive-It to determine feasibility of user-provided WARCs • Consideration of data integrity • Should data be merged with A-IT crawled WARCs? – How do we account for your www.facebook.com vs. my www.facebook.com • Privacy? November 12, 2013 Salt Lake City, Utah 25 2013 Archive-It Partner Meeting
  • 26. The Archive What I See Now Project: Three Goals 1. Port 2. Add functionality in: … to upload WARCs to: & & 3. Implement Sequential Archiving November 12, 2013 Salt Lake City, Utah 26 2013 Archive-It Partner Meeting
  • 27. Sequential Archiving? • Similar to a focused crawl but URIs defined on per-site basis to be comprehensive – Akin to but generalized • Implemented into WARCreate • Utilize per-site specification to keep tools from breaking★ personal stream my tweets news feed streams followees’ tweets multimedia-photos photos photos N/A multimedia-videos videos videos N/A photo collection albums N/A N/A posts notes N/A N/A friends November 12, 2013 Salt Lake City, Utah posts global stream Discovery & Scraping: The Information Retrieval Approach - versus The Digital Libraries Approach★ wall friends circles following 27 2013 Archive-It Partner Meeting
  • 28. Online Hierarchy Definition • Only (and optionally) applied on recognized sites – scraping as fallback for establishing hierarchy • Not limited to social media – CNN.com, MSNBC.com, etc have similar hierarchies • Lives online, tools allude to and are always updated • Standardized spec* prototype is live online * M. Kelly, An Extensible Framework for Creating Personal Archives of Web Resources Requiring Authentication, Aug 2012 November 12, 2013 Salt Lake City, Utah 28 2013 Archive-It Partner Meeting
  • 29. Summary • Firefox WARCreate in Beta – Chrome WARCreate Users Can Currently Archive What They See Now with & • Sequential Archiving Implemented in Chrome WARCreate, needs porting • Next Big Hurdle: Working with Archive-It in WARC upload logistics November 12, 2013 Salt Lake City, Utah 29 2013 Archive-It Partner Meeting
  • 30. Archive What I See Now • Download Our Archiving Tools! Web Archiving Integration Layer (WAIL) http://matkelly.com/WAIL One-Click Preservation Heritrix, Wayback and Others On Your PC! WARCreate for Chrome http://WARCreate.com Create WARC files form any web page from your browser • Share Your Use Cases for Capturing the Unpreserved and the Unpreservable • Help Us Improve Our Tools, Give Feedback! http://bit.ly/wc-wail November 12, 2013 Salt Lake City, Utah version in beta Available Soon! 30 2013 Archive-It Partner Meeting