Digital Preservation Research
at Old Dominion University
Justin F. Brunelle
The MITRE Corporation
Old Dominion University
(And hopefully MITRE, soon)
Why are we listening?
• Overview of the problem
• BRIEF introduction to ODU WSDL group
research
• Memento
• I’ll be skipping around, so don’t hesitate to
interrupt me
Digital Preservation
• Using the past Web
– Focus of our research
• Temporal Browsing
– Sessions in the past
• Recovering Lost Pages
– Is it really gone?
• 404s
– How to fix broken links?
Change on the Web
1. same URI maps to same or very similar content at a later time
   time A: U1 → C1; time B: U1 → C1
2. same URI maps to different content at a later time
   time A: U1 → C1; time B: U1 → C2
3. different URI maps to same or very similar content at the same or at a later time
   time A: U1 → C1; time B: U1 → 404, U2 → C1
4. the content cannot be found at any URI
   time A: U1 → C1; time B: U1 → ??
Time to Talk About Saving
Everything?
Dinner for one or two costs more than 1TB disk
Wikis have popularized versioning
Cool URIs (http://www.w3.org/Provider/Style/URI.html) are widely adopted, e.g.:
http://news.yahoo.com/s/ap/20100920/ap_on_el_se/us_alaska_senate
http://d.yimg.com/a/p/ap/20100918/capt.67567dbc0a874b689f0b4a5c392f379c-67567dbc0a874b689f0b4a5c392f379c-0.jpg
http://d.yimg.com/a/p/afp/20100918/thumb.photo_1284846332993-1-0.jpg
Also related projects with cool URI / permalink focus:
http://www.citability.org/
http://data.gov/
http://data.gov.uk/
Fortress Model
• Get a lot of money
• Buy lots of storage
• Hire lots of people
• “Look upon my archive ye Mighty, and
despair!”
Alternate Methods
• Lazy Preservation (McCown)
– “How much preservation do I get if I do absolutely
nothing?”
• Just-In-Time Preservation (Klein)
– Wait for it to disappear, then find a “good ‘nuff”
version
• Shared Infrastructure Preservation
– Push content to sites that might preserve it
• arXiv.org, IA, WebCite…
• Server Enhanced Preservation
– Create archival-ready resources
And Soon…
• Social Preservation
– Preserving resources using 3rd
party Web Services
– Repository for OAI-ORE ReMs
– Social network feel
– Lazy-esque, server-side reconstruction
But I digress…
• Few years away…
• Preliminary research
• And now back to the prior research…
Web Infrastructure (McCown, 2007)
WayBack Machine
http://web.archive.org/web/*/http://www.thecribs.com/
http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.thecribs.com/
from these we can
create time-based:
• indexes
• IDF values
• PageRank
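As one reading of "time-based IDF values", the sketch below computes inverse document frequency separately for each year of archived copies. The function name, the whitespace tokenizer, and the sample snapshots are all assumptions made for illustration; they are not taken from the WSDL group's tools.

```python
import math
from collections import defaultdict

def idf_by_year(snapshots):
    """snapshots: iterable of (year, text) pairs, one per archived copy.

    Returns {year: {term: idf}} with idf = log(N_year / df_year(term)).
    """
    docs_per_year = defaultdict(int)
    doc_freq = defaultdict(lambda: defaultdict(int))
    for year, text in snapshots:
        docs_per_year[year] += 1
        # Count each term once per document (document frequency).
        for term in set(text.lower().split()):
            doc_freq[year][term] += 1
    return {
        year: {term: math.log(docs_per_year[year] / df) for term, df in terms.items()}
        for year, terms in doc_freq.items()
    }

# Hypothetical archived copies, for illustration only.
SNAPSHOTS = [
    (2001, "band tour dates"),
    (2001, "band news"),
    (2008, "band album news"),
]
IDF = idf_by_year(SNAPSHOTS)
```

A term that appears in every copy from a given year gets an IDF of zero for that year, so the same term can be distinctive in one period and uninformative in another.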
Batch Recovery For Sites
http://warrick.cs.odu.edu/
Free limo rides for life?!
Reconstruction Diagram
• added: 20%
• identical: 50%
• changed: 33%
• missing: 17%
Real-Time Recovery for URIs
Synchronicity - www.cs.odu.edu/~mklein/
Memento wants to make navigating the
Web’s Past Easy
http://www.mementoweb.org
http://groups.google.com/group/memento-dev
What are you talking about?
• Uniform Resource Identifier (URI) ~= URL
• Resource:
– <HTML>
• Representation
W3C Web Architecture: Resource - URI - Representation
• A URI identifies a Resource
• A Representation represents the Resource
• Dereferencing the URI returns the Representation
W3C Web Architecture: Resource - URI - Representation
• A URI identifies a Resource
• Representation 1 and Representation 2 each represent the Resource
• Dereferencing the URI with content negotiation selects one Representation
Resources
Resources have Representations
Resources have Representations that Change over Time
Only the Current Representation is Available from a Resource
Old Representations are Lost Forever
Finding Archived Resources
• Go to http://www.archive.org/ and search http://cnn.com
• On http://web.archive.org/web/*/http://cnn.com, select desired datetime
Archived Resources
http://web.archive.org/web/20010911203610/http://www.cnn.com/
archived resource for http://cnn.com
Sep 11 2001, 20:36:10 UTC
http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333
archived resource for http://en.wikipedia.org/wiki/September_11_attacks
Dec 20 2001, 4:51:00 UTC
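The Wayback Machine URI above carries its capture time as a 14-digit timestamp (YYYYMMDDhhmmss), which is how 20010911203610 maps to Sep 11 2001, 20:36:10 UTC. A minimal sketch of pulling the two parts apart; the function name and the regular expression are illustrative and only cover URIs shaped like the one on this slide.

```python
import re
from datetime import datetime, timezone

# Matches Wayback Machine URIs of the form shown on the slide.
WAYBACK_URI = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")

def split_wayback_uri(uri_m):
    """Return (capture datetime in UTC, original URI) for a Wayback URI."""
    match = WAYBACK_URI.match(uri_m)
    if match is None:
        raise ValueError(f"not a Wayback Machine memento URI: {uri_m}")
    stamp, uri_r = match.groups()
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, uri_r

CAPTURED, ORIGINAL = split_wayback_uri(
    "http://web.archive.org/web/20010911203610/http://www.cnn.com/"
)
```

The Wikipedia `oldid` URI has no such timestamp in it, which is part of why each archive currently needs its own discovery step.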
Navigating Archived Resources
http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333
archived resource for http://en.wikipedia.org/wiki/September_11_attacks
Dec 20 2001, 4:51:00 UTC
http://en.wikipedia.org/wiki/The_Pentagon
current Pentagon
Current and Past Web are Not
Integrated
• Current and Past Web based on
same technology.
• But, going from Current to
Past Web is a matter of (manual)
discovery.
• Memento wants to make going
from Current to Past Web a
(HTTP) protocol matter.
• Memento wants to integrate
Current And Past Web.
One Memento HTTP Navigation
Memento HTTP Flow
1. HEAD R, Accept-Datetime → Link: G
2. GET G, Accept-Datetime → 302 to M, Vary, TCN, Link: R, B, M
3. GET M, Accept-Datetime → 200, Content-Datetime, Link: R, B, M
(R = original resource, G = TimeGate, M = Memento, B = TimeBundle)
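The three request/response pairs in this flow can be sketched as a small client: ask the original resource (URI-R) for its TimeGate (URI-G), let the TimeGate redirect to a Memento (URI-M), then fetch the Memento. Everything here is an assumption made for illustration: the `Response` type, the `transport` callback signature, and the canned replies stand in for real HTTP so the sketch runs offline.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)

# Hypothetical signature: transport(method, uri, headers) -> Response
Transport = Callable[[str, str, Dict[str, str]], Response]

TIMEGATE_LINK = re.compile(r'<([^>]+)>\s*;\s*rel="timegate"')

def find_memento(uri_r: str, accept_datetime: str, transport: Transport) -> str:
    headers = {"Accept-Datetime": accept_datetime}

    # Step 1: HEAD the original resource and read the TimeGate from Link.
    head = transport("HEAD", uri_r, headers)
    match = TIMEGATE_LINK.search(head.headers.get("Link", ""))
    if match is None:
        raise LookupError(f"{uri_r} does not advertise a TimeGate")
    uri_g = match.group(1)

    # Step 2: GET the TimeGate; it redirects to the selected Memento.
    gate = transport("GET", uri_g, headers)
    if gate.status != 302 or "Location" not in gate.headers:
        raise LookupError(f"TimeGate {uri_g} did not redirect to a Memento")
    uri_m = gate.headers["Location"]

    # Step 3: GET the Memento itself.
    memento = transport("GET", uri_m, headers)
    if memento.status != 200:
        raise LookupError(f"Memento {uri_m} returned {memento.status}")
    return uri_m

# Canned replies modelled on the cnn.com scenario, for illustration only.
CANNED = {
    ("HEAD", "http://cnn.com/"): Response(
        200, {"Link": '<http://web.archive.org/web/timegate/http://cnn.com>; rel="timegate"'}
    ),
    ("GET", "http://web.archive.org/web/timegate/http://cnn.com"): Response(
        302, {"Location": "http://web.archive.org/web/20010911203610/http://www.cnn.com"}
    ),
    ("GET", "http://web.archive.org/web/20010911203610/http://www.cnn.com"): Response(
        200, {"Content-Datetime": "Tue, 11 Sep 2001 20:36:10 GMT"}
    ),
}

def canned_transport(method, uri, headers):
    return CANNED.get((method, uri), Response(404))

URI_M = find_memento("http://cnn.com/", "Tue, 11 Sep 2001 20:35:00 GMT", canned_transport)
```

Passing the transport in as a parameter keeps the protocol logic separate from the HTTP library, so the same function could be pointed at a real client later.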
One Memento HTTP Navigation
Scenario
• cnn.com includes Link to TimeGate at Internet Archive
• URI-R on one server, URI-G & URI-M on another
Memento HTTP Flow: URI-R
HEAD R, Accept-Datetime
HEAD http://cnn.com/ HTTP/1.1
Host: cnn.com
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close
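The Accept-Datetime value in this request uses the usual HTTP date format. A minimal sketch of producing it from a Python datetime with the standard library; the helper name is illustrative.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def accept_datetime_headers(when):
    """Build request headers asking for the state of a resource at `when`."""
    # usegmt=True requires a UTC-aware datetime and emits the "GMT" suffix.
    when_utc = when.astimezone(timezone.utc)
    return {
        "Accept-Datetime": format_datetime(when_utc, usegmt=True),
        "Connection": "close",
    }

HEADERS = accept_datetime_headers(
    datetime(2001, 9, 11, 20, 35, 0, tzinfo=timezone.utc)
)
```

Note that `astimezone` treats a naive datetime as local time, so callers should pass timezone-aware values.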
Memento HTTP Flow: Success – URI-R
Link: G
HTTP/1.1 200 OK
Date: Thu, 21 Jan 2010 00:02:12 GMT
Server: Apache
Link: <http://web.archive.org/web/timegate/http://cnn.com>; rel="timegate"
Content-Length: 255
Connection: close
Content-Type: text/html; charset=iso-8859-1
Memento HTTP Flow: URI-G
GET G, Accept-Datetime
GET http://web.archive.org/web/timegate/http://cnn.com HTTP/1.1
Host: web.archive.org
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close
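The slides do not say how a TimeGate chooses among its captures. Assuming the policy is "closest in time to Accept-Datetime", the selection can be sketched as below, using the three capture times that appear in the Link headers of this example.

```python
from datetime import datetime, timezone

def closest_capture(requested, captures):
    """Pick the capture datetime nearest to the requested datetime.

    Assumed policy: minimal absolute time difference.
    """
    if not captures:
        raise ValueError("no captures available")
    return min(captures, key=lambda captured: abs(captured - requested))

UTC = timezone.utc
REQUESTED = datetime(2001, 9, 11, 20, 35, 0, tzinfo=UTC)
CAPTURES = [
    datetime(2001, 9, 11, 20, 30, 51, tzinfo=UTC),
    datetime(2001, 9, 11, 20, 36, 10, tzinfo=UTC),
    datetime(2001, 9, 11, 20, 47, 33, tzinfo=UTC),
]
CHOSEN = closest_capture(REQUESTED, CAPTURES)
```

Under this policy the 20:36:10 capture wins, since it is 70 seconds from the requested time while the 20:30:51 capture is 249 seconds away.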
Memento HTTP Flow: Success – URI-G
302 to M, Vary, Link: R, B, M
HTTP/1.1 302 Found
Date: Thu, 21 Jan 2010 00:06:50 GMT
Server: Apache
TCN: choice
Vary: negotiate, accept-datetime
Location: http://web.archive.org/web/20010911203610/http://www.cnn.com
Link: <http://cnn.com/>; rel="original",
  <http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle",
  <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel="first-memento"; datetime="Fri, 15 Sep 2000 11:28:26 GMT",
  <http://web.archive.org/web/20080708093433/http://www.cnn.com>; rel="last-memento"; datetime="Tue, 08 Jul 2008 09:34:33 GMT",
  <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="prev-memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT",
  <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="next-memento"; datetime="Tue, 11 Sep 2001 20:47:33 GMT"
Content-Length: 0
Connection: close
Content-Type: text/plain; charset=UTF-8
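A client that wants to step to the first, last, previous, or next Memento has to pull the relations out of the Link header. The sketch below is a deliberately small parser that only handles the quoted-parameter shape used in this response; it is not a complete Web Linking parser, and the names are illustrative.

```python
import re
from email.utils import parsedate_to_datetime

# One link-value: <uri> followed by zero or more ;name="value" parameters.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
LINK_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*"([^"]*)"')

def parse_link_header(value):
    """Return {rel: {"uri": str, "datetime": datetime or None}}."""
    links = {}
    for uri, raw_params in LINK_VALUE.findall(value):
        params = dict(LINK_PARAM.findall(raw_params))
        when = params.get("datetime")
        # A rel attribute may hold several space-separated relation types.
        for rel in params.get("rel", "").split():
            links[rel] = {
                "uri": uri,
                "datetime": parsedate_to_datetime(when) if when else None,
            }
    return links

LINK = (
    '<http://cnn.com/>; rel="original", '
    '<http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle", '
    '<http://web.archive.org/web/20080708093433/http://www.cnn.com>; '
    'rel="last-memento"; datetime="Tue, 08 Jul 2008 09:34:33 GMT"'
)
LINKS = parse_link_header(LINK)
```

Matching quoted values as a unit matters here, because the datetime values themselves contain commas and a naive split on "," would break them apart.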
Memento HTTP Flow: URI-M
GET M, Accept-Datetime
GET http://web.archive.org/web/20010911203610/http://www.cnn.com HTTP/1.1
Host: web.archive.org
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close
Memento HTTP Flow: Success – URI-M
200, Content-Datetime, Link: R, B, M
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
X-Archive-Orig-Accept-Ranges: bytes
…
Content-Type: text/html;charset=utf-8
Content-Length: 23364
Date: Thu, 21 Jan 2010 00:09:40 GMT
Content-Datetime: Tue, 11 Sep 2001 20:36:10 GMT
Link: <http://cnn.com/>; rel="original",
  <http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle",
  <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel="first-memento"; datetime="Fri, 15 Sep 2000 11:28:26 GMT",
  <http://web.archive.org/web/20080708093433/http://www.cnn.com>; rel="last-memento"; datetime="Tue, 08 Jul 2008 09:34:33 GMT",
  <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="prev-memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT",
  <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="next-memento"; datetime="Tue, 11 Sep 2001 20:47:33 GMT"
Connection: close
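The Memento returned is rarely captured at exactly the requested moment, so a client may want to report how far off it is. A minimal sketch comparing the Accept-Datetime sent with the Content-Datetime received; the function name is illustrative. (Later versions of the Memento specification call this response header Memento-Datetime.)

```python
from email.utils import parsedate_to_datetime

def memento_drift(accept_datetime, content_datetime):
    """Return served-minus-requested time as a timedelta."""
    requested = parsedate_to_datetime(accept_datetime)
    served = parsedate_to_datetime(content_datetime)
    return served - requested

DRIFT = memento_drift(
    "Tue, 11 Sep 2001 20:35:00 GMT",  # Accept-Datetime from the request
    "Tue, 11 Sep 2001 20:36:10 GMT",  # Content-Datetime from the response
)
```

For the exchange on these slides the drift is 70 seconds: the archive's nearest capture was taken just after the requested time.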
What does it all mean?
• Cutting edge technology
• Existing Infrastructure
• Redefining Web surfing
• MAJOR “real world” implications
Closing Thoughts
Preservation not for
privileged priesthood
http://doi.acm.org/10.1145/1592761.1592794
http://booktwo.org/notebook/wikipedia-historiography/
no more hoary stories
about format obsolescence:
http://blog.dshr.org/2010/09/reinforcing-my-point.html
Don't desiccate resources;
leave them on the web
Endless metadata is not
preservation…
archiving as branded service,
not infrastructure
http://blog.dshr.org/2010/06/jcdl-2010-keynote.html
Acknowledgements
• Slides borrowed from:
• Dr. Michael L. Nelson:
– http://www.slideshare.net/phonedude/my-point-of-view-michael-l-nelson-web-archiving-cooperative
– http://www.slideshare.net/phonedude/review-of-web-archiving
– http://www.slideshare.net/phonedude/memento-time-travel-for-the-web
• Martin Klein:
– http://www.slideshare.net/phonedude/synchronicity-justintime-discovery-of-lost-web-pages

Weitere ähnliche Inhalte

Ähnlich wie Digital Preservation - ODU

Memento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMapsMemento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMapsMichael Nelson
 
Summarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMapSummarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMapSawood Alam
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital PreservationMat Kelly
 
Herbert Van De Sompel - Time Travel for the Web
Herbert Van De Sompel - Time Travel for the WebHerbert Van De Sompel - Time Travel for the Web
Herbert Van De Sompel - Time Travel for the WebiMinds conference
 
Prototypes of pro-active approaches to support the archiving of web reference...
Prototypes of pro-active approaches to support the archiving of web reference...Prototypes of pro-active approaches to support the archiving of web reference...
Prototypes of pro-active approaches to support the archiving of web reference...EDINA, University of Edinburgh
 
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...EDINA, University of Edinburgh
 
Introduction to Web Programming - first course
Introduction to Web Programming - first courseIntroduction to Web Programming - first course
Introduction to Web Programming - first courseVlad Posea
 
Introducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research GroupIntroducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research GroupSawood Alam
 
Detecting Off-Topic Web Pages at #CUWARC
Detecting Off-Topic Web Pages at #CUWARCDetecting Off-Topic Web Pages at #CUWARC
Detecting Off-Topic Web Pages at #CUWARCMichele Weigle
 
Avoiding Spoilers On MediaWiki Fan Sites Using Memento
Avoiding Spoilers On MediaWiki Fan Sites Using MementoAvoiding Spoilers On MediaWiki Fan Sites Using Memento
Avoiding Spoilers On MediaWiki Fan Sites Using MementoShawn Jones
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web PagesMichael Nelson
 
Synchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesSynchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesMichael Nelson
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingMichael Nelson
 
ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013Ahmed AlSum
 
Useful Websites (Report in Online Journalism)
Useful Websites (Report in Online Journalism)Useful Websites (Report in Online Journalism)
Useful Websites (Report in Online Journalism)Kim Ortego
 
Filling in the Blanks: Capturing Dynamically Generated Content
Filling in the Blanks: Capturing Dynamically Generated ContentFilling in the Blanks: Capturing Dynamically Generated Content
Filling in the Blanks: Capturing Dynamically Generated ContentJustin Brunelle
 

Ähnlich wie Digital Preservation - ODU (20)

Memento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMapsMemento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMaps
 
The Memento protocol
The Memento protocolThe Memento protocol
The Memento protocol
 
Summarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMapSummarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMap
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital Preservation
 
Herbert Van De Sompel - Time Travel for the Web
Herbert Van De Sompel - Time Travel for the WebHerbert Van De Sompel - Time Travel for the Web
Herbert Van De Sompel - Time Travel for the Web
 
Prototypes of pro-active approaches to support the archiving of web reference...
Prototypes of pro-active approaches to support the archiving of web reference...Prototypes of pro-active approaches to support the archiving of web reference...
Prototypes of pro-active approaches to support the archiving of web reference...
 
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
 
Memento 101
Memento 101Memento 101
Memento 101
 
Introduction to Web Programming - first course
Introduction to Web Programming - first courseIntroduction to Web Programming - first course
Introduction to Web Programming - first course
 
Introducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research GroupIntroducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research Group
 
Detecting Off-Topic Web Pages at #CUWARC
Detecting Off-Topic Web Pages at #CUWARCDetecting Off-Topic Web Pages at #CUWARC
Detecting Off-Topic Web Pages at #CUWARC
 
Avoiding Spoilers On MediaWiki Fan Sites Using Memento
Avoiding Spoilers On MediaWiki Fan Sites Using MementoAvoiding Spoilers On MediaWiki Fan Sites Using Memento
Avoiding Spoilers On MediaWiki Fan Sites Using Memento
 
Internet content as research data
Internet content as research dataInternet content as research data
Internet content as research data
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages
 
Synchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesSynchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web Pages
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web Archiving
 
ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013
 
Useful Websites (Report in Online Journalism)
Useful Websites (Report in Online Journalism)Useful Websites (Report in Online Journalism)
Useful Websites (Report in Online Journalism)
 
Web decay and Internet Archive
Web decay and Internet ArchiveWeb decay and Internet Archive
Web decay and Internet Archive
 
Filling in the Blanks: Capturing Dynamically Generated Content
Filling in the Blanks: Capturing Dynamically Generated ContentFilling in the Blanks: Capturing Dynamically Generated Content
Filling in the Blanks: Capturing Dynamically Generated Content
 

Mehr von Justin Brunelle

Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...Justin Brunelle
 
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...Justin Brunelle
 
Not All Mementos Are Created Equal: Measuring The Impact Of Missing Mementos
Not All Mementos Are Created Equal: Measuring The Impact Of Missing MementosNot All Mementos Are Created Equal: Measuring The Impact Of Missing Mementos
Not All Mementos Are Created Equal: Measuring The Impact Of Missing MementosJustin Brunelle
 
How I spend my summer vacations
How I spend my summer vacationsHow I spend my summer vacations
How I spend my summer vacationsJustin Brunelle
 
An Evaluation of Caching Policies for Memento TimeMaps
An Evaluation of Caching Policies for Memento TimeMapsAn Evaluation of Caching Policies for Memento TimeMaps
An Evaluation of Caching Policies for Memento TimeMapsJustin Brunelle
 
Day in the Life of a Computer Scientist
Day in the Life of a Computer ScientistDay in the Life of a Computer Scientist
Day in the Life of a Computer ScientistJustin Brunelle
 
Agile Engineering - ODU ACM
Agile Engineering - ODU ACMAgile Engineering - ODU ACM
Agile Engineering - ODU ACMJustin Brunelle
 
Digital Preservation at ODU
Digital Preservation at ODUDigital Preservation at ODU
Digital Preservation at ODUJustin Brunelle
 

Mehr von Justin Brunelle (9)

Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
 
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
 
Not All Mementos Are Created Equal: Measuring The Impact Of Missing Mementos
Not All Mementos Are Created Equal: Measuring The Impact Of Missing MementosNot All Mementos Are Created Equal: Measuring The Impact Of Missing Mementos
Not All Mementos Are Created Equal: Measuring The Impact Of Missing Mementos
 
How I spend my summer vacations
How I spend my summer vacationsHow I spend my summer vacations
How I spend my summer vacations
 
An Evaluation of Caching Policies for Memento TimeMaps
An Evaluation of Caching Policies for Memento TimeMapsAn Evaluation of Caching Policies for Memento TimeMaps
An Evaluation of Caching Policies for Memento TimeMaps
 
Day in the Life of a Computer Scientist
Day in the Life of a Computer ScientistDay in the Life of a Computer Scientist
Day in the Life of a Computer Scientist
 
Agile Engineering - ODU ACM
Agile Engineering - ODU ACMAgile Engineering - ODU ACM
Agile Engineering - ODU ACM
 
Records expo
Records expoRecords expo
Records expo
 
Digital Preservation at ODU
Digital Preservation at ODUDigital Preservation at ODU
Digital Preservation at ODU
 

Kürzlich hochgeladen

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 

Kürzlich hochgeladen (20)

Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 

Digital Preservation - ODU

  • 1. Digital Preservation Research at Old Dominion University Justin F. Brunelle The MITRE Corporation Old Dominion University (And hopefully MITRE, soon)
  • 2. Why are we listening? • Overview of the problem • BRIEF introduction to ODU WSDL group research • Memento • I’ll be skipping around, so don’t hesitate to interrupt me
  • 3. Digital Preservation • Using the past Web – Focus of our research • Temporal Browsing – Sessions in the past • Recovering Lost Pages – Is it really gone? • 404s – How to fix broken links?
  • 4. 1 same URI maps to same or very similar content at a later time 2 same URI maps to different content at a later time 3 different URI maps to same or very similar content at the same or at a later time 4 the content can not be found at any URI U1 C1 U1 C1 timeA B U1 C2 U1 C1 timeA B U2 C1 U1 C1 U1 404 timeA B U1 ?? U1 C1 timeA B Change on the Web
  • 5. Time to Talk About Saving Everything? Dinner for one or two costs more than 1TB disk Wikis have popularized versioning Cool URIs (http://www.w3.org/Provider/Style/URI.html) are widely adopted, e.g.: http://news.yahoo.com/s/ap/20100920/ap_on_el_se/us_alaska_senate http://d.yimg.com/a/p/ap/20100918/capt.67567dbc0a874b689f0b4a5c392f379c-67567dbc0a874b689f0b4a5c392f379c-0.jpg http://d.yimg.com/a/p/afp/20100918/thumb.photo_1284846332993-1-0.jpg Also related projects with cool URI / permalink focus: http://www.citability.org/ http://data.gov/ http://data.gov.uk/
  • 6. Fortress Model • Get a lot of money • Buy lots of storage • Hire lots of people • “Look upon my archive ye Mighty, and despair!”
  • 7. Alternate Methods • Lazy Preservation (McCown) – “How much preservation do I get if I do absolutely nothing?” • Just-In-Time Preservation (Klein) – Wait for it to disappear, then find a “good ‘nuff” version • Shared Infrastructure Preservation – Push content to sites that might preserve it • arXiv.org, IA, WebCite… • Server Enhanced Preservation – Create archival-ready resources
  • 8. And Soon… • Social Preservation – Preserving resources using 3rd party Web Services – Repository for OAI-ORE ReMs – Social network feel – Lazy-esque, server-side reconstruction
  • 9. But I digress… • Few years away… • Preliminary research • And now back to the prior research…
  • 12. Batch Recovery For Sites http://warrick.cs.odu.edu/ Free limo rides for life?!
  • 14. Real-Time Recovery for URIs Synchronicity - www.cs.odu.edu/~mklein/
  • 15. Memento wants to make navigating the Web’s Past Easy 15 http://www.mementoweb.org http://groups.google.com/group/memento-dev
  • 16. What are you talking about? • Universal Resource Identifier (URI) ~= URL • Resource: – <HTML> • Representation
  • 17. W3C Web Architecture: Resource – URI - Representation Resource Representation Represents URI Identifies dereference 17
  • 18. dereference content negotiation W3C Web Architecture: Resource – URI - Representation Resource URI Identifies Representation 1 Represents Representation 2Represents 18
  • 21. Resources have Representations that Change over Time 21
  • 22. Only the Current Representation is Available from a Resource 22
  • 23. Old Representations are Lost Forever 23
  • 24. Finding Archived Resources Go to http://www.archive.org/ and search http://cnn.com On http://web.archive.org/web/*/http://cnn.com, select desired datetime 24
  • 25. Archived Resources http://web.archive.org/web/20010911203610/http://www.c nn.com/ archived resource for http://cnn.com http://en.wikipedia.org/w/index.php? title=September_11_attacks&oldid=282333 archived resource for http://en.wikipedia.org/wiki/September_11_attacks Sep 11 2001, 20:36:10 UTC Dec 20 2001, 4:51:00 UTC 25
  • 26. Navigating Archived Resources • http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333 is the archived resource for http://en.wikipedia.org/wiki/September_11_attacks (Dec 20 2001, 4:51:00 UTC) • http://en.wikipedia.org/wiki/The_Pentagon is the current Pentagon
  • 27. Current and Past Web are Not Integrated • Current and Past Web are based on the same technology. • But going from Current to Past Web is a matter of (manual) discovery. • Memento wants to make going from Current to Past Web an (HTTP) protocol matter. • Memento wants to integrate Current and Past Web.
  • 28. One Memento HTTP Navigation
  • 29. Memento HTTP Flow [diagram] • HEAD R, Accept-Datetime -> Link G • GET G, Accept-Datetime -> 302 M, Vary, TCN, Link R,B,M • GET M, Accept-Datetime -> 200, Content-Datetime, Link R,B,M
  • 30. One Memento HTTP Navigation: Scenario • cnn.com includes Link to TimeGate at Internet Archive • URI-R on one server, URI-G & URI-M on another
  • 32. Memento HTTP Flow: URI-R (HEAD R, Accept-Datetime)
      HEAD http://cnn.com/ HTTP/1.1
      Host: cnn.com
      Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
      Connection: close
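The Accept-Datetime value in this request uses the usual HTTP date format. As a rough sketch, the request headers could be produced in Python as below; `accept_datetime_headers` is a hypothetical helper name, and the header names are copied from the slide rather than from any later version of the Memento specification.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_headers(when: datetime) -> dict:
    """Return request headers asking for the resource as it was at `when`."""
    if when.tzinfo is None:
        # A naive datetime would be interpreted as local time; reject it instead.
        raise ValueError("when must be timezone-aware")
    # format_datetime(..., usegmt=True) requires a UTC datetime and emits "GMT".
    when_utc = when.astimezone(timezone.utc)
    return {
        "Accept-Datetime": format_datetime(when_utc, usegmt=True),
        "Connection": "close",
    }
```

For the datetime on the slide, `accept_datetime_headers(datetime(2001, 9, 11, 20, 35, tzinfo=timezone.utc))` gives the same Accept-Datetime value as the request shown.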
  • 34. Memento HTTP Flow: Success – URI-R (Link G)
      HTTP/1.1 200 OK
      Date: Thu, 21 Jan 2010 00:02:12 GMT
      Server: Apache
      Link: <http://web.archive.org/web/timegate/http://cnn.com>; rel="timegate"
      Content-Length: 255
      Connection: close
      Content-Type: text/html; charset=iso-8859-1
  • 36. Memento HTTP Flow: URI-G (GET G, Accept-Datetime)
      GET http://web.archive.org/web/timegate/http://cnn.com HTTP/1.1
      Host: web.archive.org
      Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
      Connection: close
  • 38. Memento HTTP Flow: Success – URI-G (302 M, Vary, Link R,B,M)
      HTTP/1.1 302 Found
      Date: Thu, 21 Jan 2010 00:06:50 GMT
      Server: Apache
      TCN: choice
      Vary: negotiate, accept-datetime
      Location: http://web.archive.org/web/20010911203610/http://www.cnn.com
      Link: <http://cnn.com/>; rel="original",
        <http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle",
        <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel="first-memento"; datetime="Fri, 15 Sep 2000 11:28:26 GMT",
        <http://web.archive.org/web/20080708093433/http://www.cnn.com>; rel="last-memento"; datetime="Tue, 08 Jul 2008 09:34:33 GMT",
        <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="prev-memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT",
        <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="next-memento"; datetime="Tue, 11 Sep 2001 20:47:33 GMT"
      Content-Length: 0
      Connection: close
      Content-Type: text/plain; charset=UTF-8
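The Link header in this response cannot be split on commas alone, because each datetime value itself contains a comma. One possible way to pull out the relations is sketched below in Python. `parse_link_header` is an illustrative name, and the regular expressions only cover the quoted `name="value"` parameter form seen on these slides; they are not a complete Link header parser.

```python
import re

# One link: <uri> followed by zero or more ; name="value" parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def parse_link_header(value: str) -> dict:
    """Map each rel value to its target URI and optional datetime parameter."""
    links = {}
    for uri, params in LINK_RE.findall(value):
        attrs = dict(PARAM_RE.findall(params))
        # A rel attribute may carry several space-separated relation types.
        for rel in attrs.get("rel", "").split():
            links[rel] = {"uri": uri, "datetime": attrs.get("datetime")}
    return links
```

Applied to the header above, `parse_link_header(value)["original"]["uri"]` would be `http://cnn.com/`, and the `last-memento` entry would carry its datetime string unchanged.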
  • 40. Memento HTTP Flow: URI-M (GET M, Accept-Datetime)
      GET http://web.archive.org/web/20010911203610/http://www.cnn.com HTTP/1.1
      Host: web.archive.org
      Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
      Connection: close
  • 42. Memento HTTP Flow: Success – URI-M (200, Content-Datetime, Link R,B,M)
      HTTP/1.1 200 OK
      Server: Apache-Coyote/1.1
      X-Archive-Orig-Accept-Ranges: bytes
      …
      Content-Type: text/html;charset=utf-8
      Content-Length: 23364
      Date: Thu, 21 Jan 2010 00:09:40 GMT
      Content-Datetime: Tue, 11 Sep 2001 20:36:10 GMT
      Link: <http://cnn.com/>; rel="original",
        <http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle",
        <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel="first-memento"; datetime="Fri, 15 Sep 2000 11:28:26 GMT",
        <http://web.archive.org/web/20080708093433/http://www.cnn.com>; rel="last-memento"; datetime="Tue, 08 Jul 2008 09:34:33 GMT",
        <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="prev-memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT",
        <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="next-memento"; datetime="Tue, 11 Sep 2001 20:47:33 GMT"
      Connection: close
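Taken together, the three request/response pairs in these slides amount to a short client-side procedure. The Python sketch below is one way to express it, under stated assumptions: `resolve_memento` and `find_rel` are hypothetical names, and `fetch(method, uri, headers)` is a caller-supplied stand-in for a real HTTP client that does not follow redirects and returns `(status, headers_dict)` with header names spelled as on the slides.

```python
import re


def find_rel(link_header: str, rel: str):
    """Return the first URI whose rel parameter contains `rel`, else None.

    Assumes rel is the first parameter after the URI, as in the slides.
    """
    for uri, rels in re.findall(r'<([^>]*)>\s*;\s*rel="([^"]*)"', link_header):
        if rel in rels.split():
            return uri
    return None


def resolve_memento(uri_r: str, accept_datetime: str, fetch):
    """Follow URI-R -> URI-G -> URI-M; return (uri_m, content_datetime) or None."""
    request_headers = {"Accept-Datetime": accept_datetime}

    # Step 1: HEAD the original resource to discover its TimeGate (URI-G).
    _, headers = fetch("HEAD", uri_r, request_headers)
    uri_g = find_rel(headers.get("Link", ""), "timegate")
    if uri_g is None:
        return None

    # Step 2: GET the TimeGate; a 302 names the best-matching Memento (URI-M).
    status, headers = fetch("GET", uri_g, request_headers)
    uri_m = headers.get("Location")
    if status != 302 or uri_m is None:
        return None

    # Step 3: GET the Memento and report the datetime it was archived.
    status, headers = fetch("GET", uri_m, request_headers)
    if status != 200:
        return None
    return uri_m, headers.get("Content-Datetime")
```

Passing the transport in as `fetch` keeps the sketch independent of any particular HTTP library and lets the flow be exercised with canned responses such as the ones shown on the slides.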
  • 43. What does it all mean? • Cutting edge technology • Existing Infrastructure • Redefining Web surfing • MAJOR “real world” implications
  • 44. Closing Thoughts • Preservation not for privileged priesthood: http://doi.acm.org/10.1145/1592761.1592794 http://booktwo.org/notebook/wikipedia-historiography/ • No more hoary stories about format obsolescence: http://blog.dshr.org/2010/09/reinforcing-my-point.html • Don't desiccate resources; leave them on the web • Endless metadata is not preservation… • Archiving as branded service, not infrastructure: http://blog.dshr.org/2010/06/jcdl-2010-keynote.html
  • 45. Acknowledgements • Slides borrowed from: • Dr. Michael L. Nelson: – http://www.slideshare.net/phonedude/my-point-of-view-michael-l-nelson-web-archiving-cooperative – http://www.slideshare.net/phonedude/review-of-web-archiving – http://www.slideshare.net/phonedude/memento-time-travel-for-the-web • Martin Klein: – http://www.slideshare.net/phonedude/synchronicity-justintime-discovery-of-lost-web-pages