SlideShare ist ein Scribd-Unternehmen logo
1 von 18
More Archives, More Better
Michael L. Nelson
Old Dominion University
ws-dl.blogspot.com
IIPC General Assembly
Ljubljana, Slovenia
April 23, 2013
Three Easy Pieces
‱ "An Evaluation of Caching Policies for Memento TimeMaps"
– 4000 aggregated TimeMaps downloaded daily for 3 months
– 20% of the time the TimeMaps shrink
‱ "How Much of The Web Is Archived?"
– 4000 URIs, 9 archives, 3 search engines
– 16% -- 79% of the web archived
‱ "Profiling Web Archive Coverage for Top-Level Domain and
Content Language"
– 153329 URIs, 12 archives
– querying only top 3 archives gives a complete TimeMap
84% of the time (52% of the time even if you exclude the IA)
An Evaluation of Caching Policies
for Memento TimeMaps
JCDL 2013
Justin Brunelle, Michael L. Nelson
Mean # Mementos per TimeMap per Day
download the same 4000 TimeMaps everyday
ODU OS
upgrade
IA API changes
ODU power outage
Frequency of TimeMap changes over 92 days
Optimal TimeMap Cache TTL=15 days
minimizes queries to archives, minimizes "lost" mementos*days,
will only cache new TimeMap if it is "bigger"
question: can we do this adaptively?
How Much of The Web Is Archived?
JCDL 2011
Scott Ainsworth, Ahmed AlSum, Hany SalahEldeen,
Michele C. Weigle, Michael L. Nelson
Public Archives, ca. late 2010 / early 2011
Three categories of archives
‱ Internet ArchiveInternet Archive (classic interface)(classic interface)
‱ Search engineSearch engine
‱ Other archivesOther archives
UK US
1000 URIs, ordered by first observation date
See also: http://ws-dl.blogspot.com/2011/06/2011-06-23-how-much-of-web-is-archived.html
see also: http://ws-dl.blogspot.com/2013/04/2013-04-19-carbon-dating-web.html
How Much of the Web is Archived?
It depends on which web

Including
SE cache
Excluding
SE Cache
90% 79%
97% 68%
35% 16%
88% 19%
Changes since 2011: no more free SE APIs; greatly reduced IA quarantine period
Profiling Web Archive Coverage for
Top-Level Domain and Content Language
(submitted for publication)
Ahmed AlSum, Michele C. Weigle, Michael L. Nelson,
Herbert Van de Sompel
12 (IIPC) Archives
153329 URIs from DMOZ, archive fulltext search, IA logs, Memento aggregator logs
Temporal Spread
Rate of Acquiring URI-Rs, URI-Ms
TLD / Archive (DMOZ TLD sample; others similar)
Archive / TLD Heatmap
Using Only Top-k Archives for URI Lookup
Yields Good Results
Even when there are 100s of archives, we only need to talk to a few.

Weitere Àhnliche Inhalte

Was ist angesagt?

Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Shawn Jones
 
Discovering Library Collections
Discovering Library CollectionsDiscovering Library Collections
Discovering Library Collections
Rosemie Callewaert
 
Rivers vs. Ponds
Rivers vs. PondsRivers vs. Ponds
Rivers vs. Ponds
Bert Zeeman
 
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Shawn Jones
 

Was ist angesagt? (20)

Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web Archives
 
The Web We Want
The Web We WantThe Web We Want
The Web We Want
 
Combining Storytelling and Web Archives
Combining Storytelling and Web ArchivesCombining Storytelling and Web Archives
Combining Storytelling and Web Archives
 
Wikipedia: Why? Who? and How?
Wikipedia: Why? Who? and How?Wikipedia: Why? Who? and How?
Wikipedia: Why? Who? and How?
 
A Future Role for the Library Discovery Interface
A Future Role for the Library Discovery InterfaceA Future Role for the Library Discovery Interface
A Future Role for the Library Discovery Interface
 
Storytelling With Web Archives
Storytelling With Web ArchivesStorytelling With Web Archives
Storytelling With Web Archives
 
csvconfyasmin2017_05_03
csvconfyasmin2017_05_03csvconfyasmin2017_05_03
csvconfyasmin2017_05_03
 
The Many Shapes of Archive-It
The Many Shapes of Archive-ItThe Many Shapes of Archive-It
The Many Shapes of Archive-It
 
The Off-Topic Memento Toolkit
The Off-Topic Memento ToolkitThe Off-Topic Memento Toolkit
The Off-Topic Memento Toolkit
 
Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...
 
Improving Collection Understanding in Web Archives
Improving Collection Understanding in Web ArchivesImproving Collection Understanding in Web Archives
Improving Collection Understanding in Web Archives
 
Exploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web ArchivesExploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web Archives
 
Why Blog
Why Blog Why Blog
Why Blog
 
Discovering Library Collections
Discovering Library CollectionsDiscovering Library Collections
Discovering Library Collections
 
Who and What Links to the Internet Archive
Who and What Links to the Internet ArchiveWho and What Links to the Internet Archive
Who and What Links to the Internet Archive
 
Rivers vs. Ponds
Rivers vs. PondsRivers vs. Ponds
Rivers vs. Ponds
 
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
 
Metadata / Linked Data
Metadata / Linked DataMetadata / Linked Data
Metadata / Linked Data
 
American Libraries Live—Libraries and COVID-19: Providing Virtual Services, L...
American Libraries Live—Libraries and COVID-19: Providing Virtual Services, L...American Libraries Live—Libraries and COVID-19: Providing Virtual Services, L...
American Libraries Live—Libraries and COVID-19: Providing Virtual Services, L...
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web Resources
 

Andere mochten auch

Andere mochten auch (17)

ïżŒUsing Web Archives to Enrich the Live Web Experience Through Storytelling
ïżŒUsing Web Archives to Enrich the Live Web Experience Through StorytellingïżŒUsing Web Archives to Enrich the Live Web Experience Through Storytelling
ïżŒUsing Web Archives to Enrich the Live Web Experience Through Storytelling
 
Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member
 
Software as a Well-Formed Research Object
Software as a Well-Formed Research ObjectSoftware as a Well-Formed Research Object
Software as a Well-Formed Research Object
 
When Should I Make Preservation Copies of Myself?
When Should I Make Preservation Copies of Myself?�When Should I Make Preservation Copies of Myself?�
When Should I Make Preservation Copies of Myself?
 
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageProfiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content Language
 
@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015
 
Web Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionWeb Archiving: A Brief Introduction
Web Archiving: A Brief Introduction
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
 
Assessing the Quality of Web Archives
Assessing the Quality of Web ArchivesAssessing the Quality of Web Archives
Assessing the Quality of Web Archives
 
Evaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesEvaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived Pages
 
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolEvaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
 
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web Archives
 
We Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesWe Need Multiple, Independent Web Archives
We Need Multiple, Independent Web Archives
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
 
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
OAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange ProjectOAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange Project
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
 
Why Care About the Past?
Why Care About the Past?Why Care About the Past?
Why Care About the Past?
 

Ähnlich wie More Archives, More Better

Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012
Roxanne Missingham
 
IIPC GA 2014 Solr
IIPC GA 2014 SolrIIPC GA 2014 Solr
IIPC GA 2014 Solr
Andy Jackson
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
Micah Altman
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archive
Lewis Crawford
 
Using Drupal In Libraries
Using Drupal In LibrariesUsing Drupal In Libraries
Using Drupal In Libraries
msfinney
 

Ähnlich wie More Archives, More Better (20)

Web archiving challenges and opportunities
Web archiving challenges and opportunitiesWeb archiving challenges and opportunities
Web archiving challenges and opportunities
 
Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012
 
IIPC GA 2014 Solr
IIPC GA 2014 SolrIIPC GA 2014 Solr
IIPC GA 2014 Solr
 
Internet content as research data
Internet content as research dataInternet content as research data
Internet content as research data
 
Creating and Maintaining Web Archives
Creating and Maintaining Web ArchivesCreating and Maintaining Web Archives
Creating and Maintaining Web Archives
 
Web Archiving – Lessons and Potential
 Web Archiving – Lessons and Potential Web Archiving – Lessons and Potential
Web Archiving – Lessons and Potential
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
 
ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013
 
Digging into the Web Archive at the British Library 2014-11-27
Digging into the Web Archive at the British Library 2014-11-27Digging into the Web Archive at the British Library 2014-11-27
Digging into the Web Archive at the British Library 2014-11-27
 
Professor Dame Wendy Hall - Saving the Web
Professor Dame Wendy Hall - Saving the WebProfessor Dame Wendy Hall - Saving the Web
Professor Dame Wendy Hall - Saving the Web
 
Dm2 e ontotext-nov2012
Dm2 e ontotext-nov2012Dm2 e ontotext-nov2012
Dm2 e ontotext-nov2012
 
Mariana Damova - Ontotext
Mariana Damova - OntotextMariana Damova - Ontotext
Mariana Damova - Ontotext
 
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
 
High and Lows of Library Linked Data
High and Lows of Library Linked DataHigh and Lows of Library Linked Data
High and Lows of Library Linked Data
 
Multimedia Database
Multimedia DatabaseMultimedia Database
Multimedia Database
 
Multimedia Database
Multimedia DatabaseMultimedia Database
Multimedia Database
 
Tpdl 2016 topic_coverage_indf-bf_crawls
Tpdl 2016 topic_coverage_indf-bf_crawlsTpdl 2016 topic_coverage_indf-bf_crawls
Tpdl 2016 topic_coverage_indf-bf_crawls
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archive
 
Using Drupal In Libraries
Using Drupal In LibrariesUsing Drupal In Libraries
Using Drupal In Libraries
 
Presentation on Koha
Presentation on KohaPresentation on Koha
Presentation on Koha
 

Mehr von Michael Nelson

Mehr von Michael Nelson (9)

Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035
 
Uncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pagesUncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pages
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed Originals
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed Originals
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 

KĂŒrzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

KĂŒrzlich hochgeladen (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

More Archives, More Better

  • 1. More Archives, More Better Michael L. Nelson Old Dominion University ws-dl.blogspot.com IIPC General Assembly Ljubljana, Slovenia April 23, 2013
  • 2. Three Easy Pieces ‱ "An Evaluation of Caching Policies for Memento TimeMaps" – 4000 aggregated TimeMaps downloaded daily for 3 months – 20% of the time the TimeMaps shrink ‱ "How Much of The Web Is Archived?" – 4000 URIs, 9 archives, 3 search engines – 16% -- 79% of the web archived ‱ "Profiling Web Archive Coverage for Top-Level Domain and Content Language" – 153329 URIs, 12 archives – querying only top 3 archives gives a complete TimeMap 84% of the time (52% of the time even if you exclude the IA)
  • 3. An Evaluation of Caching Policies for Memento TimeMaps JCDL 2013 Justin Brunelle, Michael L. Nelson
  • 4. Mean # Mementos per TimeMap per Day download the same 4000 TimeMaps everyday ODU OS upgrade IA API changes ODU power outage
  • 5. Frequency of TimeMap changes over 92 days
  • 6. Optimal TimeMap Cache TTL=15 days minimizes queries to archives, minimizes "lost" mementos*days, will only cache new TimeMap if it is "bigger" question: can we do this adaptively?
  • 7. How Much of The Web Is Archived? JCDL 2011 Scott Ainsworth, Ahmed AlSum, Hany SalahEldeen, Michele C. Weigle, Michael L. Nelson
  • 8. Public Archives, ca. late 2010 / early 2011 Three categories of archives ‱ Internet ArchiveInternet Archive (classic interface)(classic interface) ‱ Search engineSearch engine ‱ Other archivesOther archives UK US
  • 9. 1000 URIs, ordered by first observation date See also: http://ws-dl.blogspot.com/2011/06/2011-06-23-how-much-of-web-is-archived.html
  • 11. How Much of the Web is Archived? It depends on which web
 Including SE cache Excluding SE Cache 90% 79% 97% 68% 35% 16% 88% 19% Changes since 2011: no more free SE APIs; greatly reduced IA quarantine period
  • 12. Profiling Web Archive Coverage for Top-Level Domain and Content Language (submitted for publication) Ahmed AlSum, Michele C. Weigle, Michael L. Nelson, Herbert Van de Sompel
  • 13. 12 (IIPC) Archives 153329 URIs from DMOZ, archive fulltext search, IA logs, Memento aggregator logs
  • 15. Rate of Acquiring URI-Rs, URI-Ms
  • 16. TLD / Archive (DMOZ TLD sample; others similar)
  • 17. Archive / TLD Heatmap
  • 18. Using Only Top-k Archives for URI Lookup Yields Good Results Even when there are 100s of archives, we only need to talk to a few.