Evaluating the SiteStory Transactional Web Archive With the ApacheBench Tool
Justin F. Brunelle
Michael L. Nelson
Lyudmila Balakireva
Robert Sanderson
Herbert Van de Sompel
TPDL 2013, Sept 24 2013
September 7, 2011
September 12, 2011
September 16, 2011
Problem
• People view ABC News all the time
• No mementos for “all the time”
– Stories missing or incomplete
• Possible solutions:
– archive.org: crawl more often (how often is often enough?)
– abcnews.com: install a Transactional Web Archive
Agenda
• Traditional Archiving
• SiteStory
• Experiment Design
• Benchmark Results
• Conclusions
Traditional Web Archiving
• Active crawling
• Heritrix
Issues with Traditional Web Archiving
• Request can be rejected (robots.txt, user-agent, IP)
• Can be deceived (geo-location, user-agent)
• Can be trapped (crawl my calendar!)
• Resource-intense (bandwidth)
• Recrawl vs. change-rate
Missed Updates
seen by humans: C1, C3, C4; archived by crawler: C1, C3
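The gap between what readers see and what a crawler archives can be sketched as a small simulation. The timestamps, the crawl and view schedules, and the function name below are illustrative assumptions chosen to reproduce the C1..C4 example, not data from the talk.

```python
from bisect import bisect_right

def versions_observed(change_times, visit_times):
    """Return indices of the versions that are live at each visit time.

    A visit observes the most recent change at or before the visit.
    """
    observed = set()
    for t in visit_times:
        i = bisect_right(change_times, t) - 1  # last change at or before t
        if i >= 0:
            observed.add(i)
    return sorted(observed)

# Hypothetical timeline in hours: versions C1..C4 are published at these times.
labels = ["C1", "C2", "C3", "C4"]
changes = [0, 5, 6, 20]
views = [2, 7, 21]   # human page views
crawls = [1, 12]     # crawler visits

print("seen by humans:", [labels[i] for i in versions_observed(changes, views)])
print("archived by crawler:", [labels[i] for i in versions_observed(changes, crawls)])
```

In this sketch C2 is replaced before anyone requests it, while C4 is seen by a reader but never crawled, which is the case a transactional archive is meant to close.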
For each HTTP response, the Apache web server sends (i.e., HTTP PUT) the same entity to SiteStory.
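The idea of mirroring each served entity to the archive with an HTTP PUT can be sketched as follows. This is a minimal illustration only: in SiteStory the work is done inside Apache by mod_sitestory, and the archive endpoint, URL layout, and function name here are assumptions, not the real SiteStory API. The request is built but not sent.

```python
import urllib.request

# Hypothetical archive endpoint; the real SiteStory URL layout is not given in the slides.
ARCHIVE_BASE = "http://archive.example.org/sitestory/put/"

def build_archive_put(original_url: str, body: bytes, content_type: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying the entity just served to a client."""
    return urllib.request.Request(
        url=ARCHIVE_BASE + original_url,
        data=body,
        method="PUT",
        headers={"Content-Type": content_type},
    )

req = build_archive_put(
    "http://www.example.com/index.html",
    b"<html>hello</html>",
    "text/html",
)
print(req.get_method(), req.full_url)
```

Because the archive receives exactly the bytes that were sent to the client, every version a reader saw is also captured.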
Now we have them all
seen by humans: C1, C3, C4; archived by transactional archive: C1, C3, C4
Benchmark with ab
• ApacheBench: ab
– -n [Number of Requests]
– -c [Concurrency]
• Benchmarked with SiteStory on & off
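A minimal sketch of how the on/off comparison could be scripted, assuming ab is installed and that SiteStory is toggled by hand between runs. The helper names are hypothetical; only the -n (total requests) and -c (concurrent requests) flags come from ab itself.

```python
import shlex

def ab_command(url: str, requests: int, concurrency: int) -> list[str]:
    """Build an ApacheBench invocation: -n total requests, -c concurrent requests."""
    if concurrency > requests:
        # ab refuses a concurrency level above the total number of requests.
        raise ValueError("concurrency cannot exceed the number of requests")
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]

def relative_change(rps_off: float, rps_on: float) -> float:
    """Fractional change in requests per second when SiteStory is switched on."""
    return (rps_on - rps_off) / rps_off

cmd = ab_command("http://www.cs.odu.edu/", requests=10, concurrency=2)
print(shlex.join(cmd))
```

The command would be run once with SiteStory off and once with it on (for example via subprocess.run), and the two "Requests per second" values passed to relative_change.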
Benchmark with wget
[Figure: test setup diagram; labels: ws-dl-03.cs.odu.edu, x99, megalodon.lanl.gov, TWA@AWS]
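According to the speaker notes, the wget benchmark used 100 pages with 0 to 99 embedded images, each including the current datetime so that every response differs. The original pages were PHP; the sketch below is a Python stand-in, and the file names and image paths are assumptions.

```python
from datetime import datetime

def make_page(num_images: int) -> str:
    """Return an HTML page embedding num_images images plus the current datetime."""
    stamp = datetime.now().isoformat()
    # Hypothetical image naming scheme; the original test used PHP pages.
    imgs = "\n".join(f'<img src="img/{i}.png" alt="">' for i in range(num_images))
    return f"<html><body>\n<p>Generated: {stamp}</p>\n{imgs}\n</body></html>\n"

# One page per image count, 0 through 99, as in the described test set.
pages = {f"page{k}.html": make_page(k) for k in range(100)}
print(len(pages), pages["page3.html"].count("<img"))
```

Each page would then be fetched with `wget -p`, which also downloads the embedded images, and the total round-trip time recorded with SiteStory on and off.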
Testing LAN with ab
Testing LAN with ab
Benchmark with wget (unburdened)
Benchmark with wget (unburdened)
Benchmark with wget (burdened)
Benchmark with wget (burdened)
Results
• Negligible difference SiteStory On vs Off
• Limited to local LAN
• Performance over WAN?
WAN Testbed Performance
Results
• Distributed: Higher variance
• Increased delay due to network
• On vs. Off performance still comparable
Conclusions
• Small performance difference
• No gaps in coverage: archives every HTTP response sent (optimizations possible)
http://mementoweb.github.io/SiteStory/
get started now by
using this piece
SiteStory Testbed
• Use our SiteStory web archive on your server!
1. Install and configure mod_sitestory on your Apache server
2. Send an email containing:
   – Your contact info
   – Web server IP address
   – Web server domain name
3. Happy SiteStory’ing!
• mailto: SiteStory-Testbed@googlegroups.com
Backups
Sample ab output
$ ab -n 10 -c 2 "http://www.cs.odu.edu/"
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
…
Server Software: Apache/2.2.17
Server Hostname: www.cs.odu.edu
Server Port: 80
Document Path: /
Document Length: 62289 bytes
Concurrency Level: 2
Time taken for tests: 0.213 seconds
Complete requests: 10
Failed requests: 0
Write errors: 0
Total transferred: 624810 bytes
HTML transferred: 622890 bytes
Requests per second: 47.01 [#/sec] (mean)
Time per request: 42.540 [ms] (mean)
Time per request: 21.270 [ms] (mean, across all concurrent requests)
Transfer rate: 2868.66 [Kbytes/sec] received
…
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 0.0 1 1
Processing: 27 41 10.8 45 62
Waiting: 3 3 0.4 4 4
Total: 27 41 10.8 45 63
Percentage of the requests served within a certain time (ms)
50% 45
66% 46
75% 46
80% 46
90% 63
95% 63
98% 63
99% 63
100% 63 (longest request)
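The summary figures in the sample output are related by simple arithmetic, which the sketch below checks. The parsing helper is a hypothetical convenience, and SAMPLE copies only the lines needed from the output above. ab computes its means from the unrounded elapsed time, so values recomputed from the printed 0.213 seconds differ slightly.

```python
import re

SAMPLE = """\
Concurrency Level: 2
Time taken for tests: 0.213 seconds
Complete requests: 10
Requests per second: 47.01 [#/sec] (mean)
Time per request: 42.540 [ms] (mean)
"""

def field(text: str, label: str) -> float:
    """Return the first number following 'label:' in ab's summary output."""
    m = re.search(rf"^{re.escape(label)}:\s*([\d.]+)", text, re.MULTILINE)
    if m is None:
        raise ValueError(f"field not found: {label}")
    return float(m.group(1))

n = field(SAMPLE, "Complete requests")
c = field(SAMPLE, "Concurrency Level")
t = field(SAMPLE, "Time taken for tests")

rps = n / t                # requests per second (mean)
tpr_ms = c * t / n * 1000  # mean time per request, in ms

print(round(rps, 2), field(SAMPLE, "Requests per second"))
print(round(tpr_ms, 1), field(SAMPLE, "Time per request"))
```

Dividing the mean time per request by the concurrency level gives the "across all concurrent requests" figure (42.540 / 2 = 21.270 ms).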


Editor's notes

  1. The Internet Archive began aggressively crawling ABC News in July of 2011. Before that, there are large gaps in the mementos captured. We will take Jan 12, 2011 as our first observation date. There are three days without corresponding mementos before we arrive at our second observation date of Jan 16, 2011.
  2. Updates are the blue dots. We miss updates C2 and C4. Does it matter that we miss C2? (Tree falls in the woods…). It definitely matters if we miss C4 with the crawler.
  3. Describe SiteStory here: archives on servers based on HTTP GETs, stored in a Memento-compliant archive, etc.
  4. Updates are the blue dots. With SiteStory, we get all the updates except C2.
  5. ApacheBench is a tool for benchmarking Apache servers. It takes the number of requests and the concurrency of those requests as parameters. We benchmarked an Apache server with SiteStory both on and off. This measured the server’s ability to deliver content over a network.
  6. For the wget tests, we created 100 resources with 0-99 embedded images. These were PHP pages that also included the current datetime. We executed wget -p for each of them and timed the total round-trip time. We also executed this with SiteStory on and off. This measured the performance of the server when a resource was constantly changing and also had many embedded resources.
  7. We set up an experiment on a local LAN between two networked machines.
  8. Based on the ab tests, the server’s ability to return content is not impacted by SiteStory running.
  9. The wget tests show that (as expected) more embedded resources create a longer round-trip time. With SiteStory on, requests run slower as the number of files grows, and the gap relative to SiteStory off widens as more embedded resources are present. In these graphs, the middle line is the average over about 100 tests, and the filled-in area is the standard deviation. However, we were using an unburdened server. The dip at the beginning of the graph can be attributed to a cold start; the difference is on the order of milliseconds.
  10. We burdened the server by simulating user access to pages hosted by the server. The resulting statistics show that the burden creates higher variance, as expected, but the SiteStory
  11. The testbed has higher variance and poorer performance because of the longer network delays (between ODU and LANL).
  12. Describe SiteStory here: archives on servers based on HTTP GETs, stored in a Memento-compliant archive, etc.