Filling in the Blanks:
Capturing Dynamically
  Generated Content
                   Justin F. Brunelle
           Old Dominion University
      Advisor: Dr. Michael L. Nelson

      JCDL ’12 Doctoral Consortium
                        06/10/2012


Problem!
• Which version exists in the archive?
  – Probably the unauthenticated version, right?
• What factors created “my” representation?
  – Can I archive “my” representation?
• Am I seeing undead resources?
  – Mix of live and archived content?
• How can we capture, share, and
  archive user experiences?

Which version is in the Internet Archive?




Which version is in WebCite?




Craigslist.org
$ curl -I -L http://www.craigslist.org
HTTP/1.1 302 Found
Set-Cookie: …
Location: http://geo.craigslist.org/

HTTP/1.1 302 Found
Content-Type: text/html; charset=iso-8859-1
Connection: close
Location: http://norfolk.craigslist.org
Date: Thu, 31 May 2012 23:26:27 GMT
Set-Cookie: …
Server: Apache

HTTP/1.1 200 OK
Connection: close
Cache-Control: max-age=3600, public
Last-Modified: Thu, 31 May 2012 23:13:46 GMT
Set-Cookie: …
Transfer-Encoding: chunked
Date: Thu, 31 May 2012 23:13:46 GMT
Vary: Accept-Encoding
Content-Type: text/html; charset=iso-8859-1;
X-Frame-Options: Allow-From https://forums.craigslist.org
Server: Apache
Expires: Fri, 01 Jun 2012 00:13:46 GMT
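
The redirect chain above can be read mechanically. The following is a minimal sketch, not part of the original talk: it parses an abridged copy of the header dump and lists each hop. The function name `parse_hops` and the abridged `CURL_OUTPUT` string are illustrative assumptions; the elided cookie values are left out.

```python
import re

# Abridged copy of the header dump shown on the slide (cookies omitted).
CURL_OUTPUT = """\
HTTP/1.1 302 Found
Location: http://geo.craigslist.org/

HTTP/1.1 302 Found
Location: http://norfolk.craigslist.org
Server: Apache

HTTP/1.1 200 OK
Cache-Control: max-age=3600, public
Content-Type: text/html; charset=iso-8859-1;
"""


def parse_hops(dump):
    """Split a `curl -I -L` dump into one dict per HTTP response."""
    hops = []
    # Responses are separated by blank lines.
    for block in re.split(r"\n\s*\n", dump.strip()):
        lines = block.splitlines()
        status = int(lines[0].split()[1])  # "HTTP/1.1 302 Found" -> 302
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        hops.append({"status": status, "location": headers.get("location")})
    return hops


hops = parse_hops(CURL_OUTPUT)
for hop in hops:
    print(hop["status"], hop["location"] or "(final representation)")
```

The second hop is the one that matters for archiving: the server picks a city-specific host from the client's location, so a crawler running elsewhere would be sent to a different host and capture a different representation.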
Live Resource
Accessed from Norfolk




Archived Resource
      Submitted from Norfolk
• Submitted to WebCite from Norfolk




Live Norfolk Interactive Mapper




http://gisapp2.norfolk.gov/interactive_mapper/viewer.htm
Archived Norfolk Interactive Mapper




http://web.archive.org/web/20100924020604/http://gisapp2.norfolk.gov/interactive_mapper/viewer.htm
Web 2.0
• Crawlers aren’t enough
• Relies on interaction/personalization
• Users may want to archive personal
  content
• How do we capture user experiences?
  – Justin’s vs. Dr. Nelson’s experience? Both?
• What about sharing browsing sessions?

Things are better (but really worse)
• Better UI, worse archiving
• HTML5
• JavaScript
  – document.write
• Cookies
• User Interaction
• GeoIP

Traditional Representation Generation
[Diagram, from W3C Web Architecture: a URI identifies a Resource; a Representation represents the Resource; dereferencing the URI yields the Representation.]
Representation through Content Negotiation
[Diagram, from W3C Web Architecture: a URI identifies a Resource; a Representation represents the Resource; the Representation is reached by Dereference followed by Negotiate.]
Web 2.0 Representation Generation
[Diagram: a URI identifies a Resource; dereferencing is followed by client-side script and user interaction; the resulting Representation represents the Resource.]
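
One way to read the three representation-generation diagrams is as three functions with a growing set of inputs. The sketch below is illustrative only and assumes toy names (`Request`, `traditional`, `negotiated`, `web20`) and toy HTML strings that do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    uri: str
    accept_language: str = "en"  # consulted only by content negotiation
    events: tuple = ()           # user interactions, Web 2.0 only


def traditional(req):
    # The representation depends on the URI alone.
    return f"<html>{req.uri}</html>"


def negotiated(req):
    # The representation depends on the URI plus request headers.
    return f"<html lang='{req.accept_language}'>{req.uri}</html>"


def web20(req):
    # The server sends a base page; client-side script then changes it
    # after each user event, so the outcome depends on the session.
    dom = [traditional(req)]
    for event in req.events:
        dom.append(f"<!-- DOM changed after {event} -->")
    return "\n".join(dom)


crawler = Request("http://example.org/map")
user = Request("http://example.org/map", events=("dblclick", "drag-left"))

print(traditional(crawler) == traditional(user))  # same URI, same result
print(web20(crawler) == web20(user))              # differs once events occur
```

A crawler that only dereferences the URI plays the role of `crawler` here: it never produces the events, so it never sees the representation the user saw.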
Prior Work
• Capture for Debugging and Security
  – Mickens, 2010; Livshits, 2007, 2009, 2010; Dhawan, 2009
• Crawlers
  – Mesbah, 2008; Duda, 2008; Lowet, 2009
• Caching dynamic content
  – Benson, 2010; Karri, 2009; Boulos, 2010; Periyapatna,
    2009; Sivasubramanian, 2007
• Walden’s paths
  – http://www.csdl.tamu.edu/walden/
• IIPC Workshop 2012: Archiving the Future Web
  – http://blog.dshr.org/2012/05/harvesting-and-preserving-future-web.html
Two Current Solutions
• Browser-based crawling
  – Problematic at scale, misses post-render content, no
    session spanning, misses “personal” browsing
  – IA
  – To be released – Heritrix 3.X
• Transactional Web Archiving
  – Impact/depth is unknown, client-side changes are
    missed, must have server/content author buy-in
  – LANL
  – http://theresourcedepot.org/
What can Justin do about it?
• How can we capture THE user
  experience?
  – How much user-shared content is archivable?
  – What defines a dynamic representation?
     • Infinitely Changing?
  – How much dynamic content are archives missing?
  – What tools are required to capture the
    representation?
     • Browser Add-on?
  – How much will users contribute to the archives?
• Is this even possible?
Characteristics of a Potential Solution
• Browser Add-on
• Crowdsourced
  – User contributions to archives
• Opt-in representation archiving/sharing
• Capture client-side DOM
  – JS, HTML, representation, etc.
• Capture client-side events and resulting
  DOM
  – Includes Ajax and post-render data
• Package and submit to archives
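
The capture-and-package idea in the list above can be sketched as a small data model. This is a hypothetical illustration, not the proposed add-on: the class names `Snapshot` and `Session`, the trigger labels, and the JSON packaging format are all assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Snapshot:
    trigger: str  # e.g. "load", "dblclick", "ajax-response"
    dom: str      # serialized DOM as it stood after the trigger


@dataclass
class Session:
    uri: str
    snapshots: list = field(default_factory=list)

    def record(self, trigger, dom):
        """Store the DOM that resulted from one client-side event."""
        self.snapshots.append(Snapshot(trigger, dom))

    def package(self):
        """Serialize the session; the digest lets a receiver spot duplicates."""
        body = json.dumps(asdict(self), sort_keys=True)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        return {"sha256": digest, "body": body}


session = Session("http://example.org/map")
session.record("load", "<div id='map'></div>")
session.record("dblclick", "<div id='map' data-zoom='2'></div>")

package = session.package()
print(len(session.snapshots), "snapshots packaged")
```

Recording the DOM after each event, rather than only at page load, is what would let post-render and Ajax-driven content reach the archive.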
Dissertation Plan
• BEGIN
• Background Research
• Coursework
• Quals
• Prevalence of Unarchivable Resources (current state)
• Define test datasets (set of dynamic and static test pages)
• Define factors/equations of dynamic representations: What
  dynamic content can (and cannot) be captured for archiving?
• Construction of software solution: VCR for the Web: Record,
  Rewind, Replay
• Analysis of improved capture: Client-side Archiving: Client-side
  (human-assisted) Capture vs. Traditional Crawlers vs. Headless Clients
• Explore how personalized archives can be combined with public web
  archives
• PhD Defense
Current Work:
   How much can we archive?
• Sample from Bit.ly URIs from Twitter
• Load page in each environment:
  – Live
  – 3rd Party Archived
     • Submit and load from WebCite
  – Locally stored
     • wget -k -p and load from local drive
  – Local only
     • Load from local drive – no Internet access
Live
http://dctheatrescene.com/2009/06/11/toxic-avengers-john-rando/




Archived (WebCite)
http://www.webcitation.org/685EYfYEK




Locally Stored
http://localhost/dctheatrescene.com/2009/06/11/toxic-avengers-john-rando/




Local Only (No Internet)
http://localhost/dctheatrescene.com/2009/06/11/toxic-avengers-john-rando/



• Missing without Internet access: 12/78
  – dctheatrescene.com/…/uperfish.args.js?e83a2c
  – dctheatrescene.com/…/css/datatables.css?ver=1.9.3
• Small files, big impact
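
The 12/78 figure can be reproduced as a set difference between what loads live and what loads offline. The sketch below assumes the figure counts embedded resources; the resource names are placeholders, since the slide reports only the totals and two example files.

```python
def missing_resources(live, local_only):
    """Return resources that loaded live but not from the offline copy."""
    return sorted(set(live) - set(local_only))


# Placeholder resource names; only the totals (12 of 78) come from the slide.
live = [f"resource-{i}" for i in range(78)]
local_only = live[:66]

missing = missing_resources(live, local_only)
fraction = len(missing) / len(live)
print(f"{len(missing)}/{len(live)} missing ({fraction:.1%})")
# prints "12/78 missing (15.4%)"
```

A raw fraction treats every resource alike, which is the point of "small files, big impact": one small script or stylesheet can matter more to the rendered page than its share of the count suggests.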




Thought Experiment




Double Click 4x




Click and drag to left




Submit to Archive




Future Research Questions
• What dynamism can (and cannot) be
  captured for archiving?
• Client-side Archiving: Client-side Capture vs.
  Traditional Crawlers
• Client-side contributions to Web Archives:
  Archiving User Experiences




Conclusion
• Is dynamic content
  archivable?
• How much are we
  missing?
• Can you archive
  your experience?
    • For the betterment
      of archives
    • For personal
      capture
Backups




References
•   J. Mickens, J. Elson, and J. Howell. Mugshot: deterministic capture and replay for
    JavaScript applications. In Proceedings of the 7th USENIX conference on Networked
    systems design and implementation, NSDI'10, pages 11-11, Berkeley, CA, USA, 2010.
    USENIX Association.
•   K. Vikram, A. Prateek, and B. Livshits. Ripley: Automatically securing web 2.0 applications
    through replicated execution. In Proceedings of the Conference on Computer and
    Communications Security, November 2009.
•   E. Kiciman and B. Livshits. Ajaxscope: A platform for remotely monitoring the client-side
    behavior of web 2.0 applications. In the 21st ACM Symposium on Operating Systems
    Principles (SOSP '07), 2007.
•   B. Livshits and S. Guarnieri. Gulfstream: Incremental static analysis for streaming
    JavaScript applications. Technical Report MSR-TR-2010-4, Microsoft, January 2010.
•   M. Dhawan and V. Ganapathy. Analyzing information flow in JavaScript-based browser
    extensions. Annual Computer Security Applications Conference, pages 382-391, 2009.
•   A. Mesbah, E. Bozdag, and A. van Deursen. Crawling Ajax by inferring user interface state
    changes. In Web Engineering, 2008. ICWE '08. Eighth International Conference on, pages
    122-134, July 2008.
•   C. Duda, G. Frey, D. Kossmann, and C. Zhou. AjaxSearch: crawling, indexing and
    searching Web 2.0 applications. Proc. VLDB Endow., 1:1440-1443, August 2008.
•   D. Lowet and D. Goergen. Co-browsing dynamic web pages. In WWW, pages 941-950,
    2009.
•   S. Chakrabarti, S. Srivastava, M. Subramanyam, and M. Tiwari. Memex: A browsing
    assistant for collaborative archiving and mining of surf trails. In Proceedings of the 26th
    VLDB Conference, 2000.
•   R. Karri. Client-side page element web-caching, 2009.
•   E. Benson, A. Marcus, D. R. Karger, and S. Madden. Sync kit: a persistent client-side
    database caching toolkit for data intensive websites. In WWW, pages 121-130, 2010.
•   M. N. K. Boulos, J. Gong, P. Yue, and J. Y. Warren. Web GIS in practice VIII: HTML5 and the
    canvas element for interactive online mapping. International Journal of Health Geographics,
    March 2010.
•   S. Periyapatna. Total recall for Ajax applications Firefox extension, 2009.
•   S. Sivasubramanian, G. Pierre, M. van Steen, and G. Alonso. Analysis of caching and
    replication strategies for web applications. IEEE Internet Computing, 11:60-66, 2007.




Web Archives
• “Web archiving is the process of
  collecting portions of the World Wide
  Web and ensuring the collection
  is preserved … for future researchers,
  historians, and the public.”
  -- http://en.wikipedia.org/wiki/Web_archiving


What does this have to do with DLs?
•   Improved coverage
•   NARA regulation
•   Improved “memory”
•   Gathers missing User Experiences
    – Or at least an adequate sub-sample




Envisioned Solution

   SELECT PREVIOUS REPRESENTATION TO ARCHIVE:




User Event:         User Event:      User Event:
  Text Entered        Double Click     Text Entered
                                       Button Push

Ajax Event:         Ajax Event:      Ajax Event:
   XMLResponse         XMLResponse      XMLResponse
Google Maps




Current Web Applications




Web Applications with Session Archiver





Weitere ähnliche Inhalte

Andere mochten auch

Library 2.0 and User-Generated Content
Library 2.0 and User-Generated ContentLibrary 2.0 and User-Generated Content
Library 2.0 and User-Generated ContentPatrick Danowski
 
Art discovery group catalogue: Usage, content and new horizons
Art discovery group catalogue:  Usage, content and new horizonsArt discovery group catalogue:  Usage, content and new horizons
Art discovery group catalogue: Usage, content and new horizonsJanifer Gatenby
 
User experience of the next-gen catalogue
User experience of the next-gen catalogueUser experience of the next-gen catalogue
User experience of the next-gen catalogueAndrew Preater
 
Libraries as Learning Institutions - An Informal History by Dr Ian McShane, S...
Libraries as Learning Institutions - An Informal History by Dr Ian McShane, S...Libraries as Learning Institutions - An Informal History by Dr Ian McShane, S...
Libraries as Learning Institutions - An Informal History by Dr Ian McShane, S...Jane Cowell
 
Creating better user interfaces for libraries catalogues: how to present and ...
Creating better user interfaces for libraries catalogues: how to present and ...Creating better user interfaces for libraries catalogues: how to present and ...
Creating better user interfaces for libraries catalogues: how to present and ...Tanja Merčun
 
Collaborative activities in Australian research libraries by Janine Schmidt F...
Collaborative activities in Australian research libraries by Janine Schmidt F...Collaborative activities in Australian research libraries by Janine Schmidt F...
Collaborative activities in Australian research libraries by Janine Schmidt F...Jane Cowell
 
Web based catalogue.pptx22
Web based catalogue.pptx22Web based catalogue.pptx22
Web based catalogue.pptx22Gerald Louw
 
From Catalogue 2.0 to the digital humanities: exploring the future of librari...
From Catalogue 2.0 to the digital humanities: exploring the future of librari...From Catalogue 2.0 to the digital humanities: exploring the future of librari...
From Catalogue 2.0 to the digital humanities: exploring the future of librari...Sally Chambers
 
User-Generated Content and Social Discovery in the Academic Library Catalogu...
User-Generated Content and Social Discovery in the Academic Library Catalogu...User-Generated Content and Social Discovery in the Academic Library Catalogu...
User-Generated Content and Social Discovery in the Academic Library Catalogu...Steve Toub
 
Introducing Social Catalogues and Social Software into Public Libraries
Introducing Social Catalogues and Social Software into Public LibrariesIntroducing Social Catalogues and Social Software into Public Libraries
Introducing Social Catalogues and Social Software into Public LibrariesLaurel Tarulli
 
Social Catalogues: Enriching Content that Enhances RA Services
Social Catalogues: Enriching Content that Enhances RA ServicesSocial Catalogues: Enriching Content that Enhances RA Services
Social Catalogues: Enriching Content that Enhances RA ServicesLaurel Tarulli
 
OLA 2014: A Future of Freedom and Innovation in Library Catalogues
OLA 2014: A Future of  Freedom and  Innovation in Library CataloguesOLA 2014: A Future of  Freedom and  Innovation in Library Catalogues
OLA 2014: A Future of Freedom and Innovation in Library Cataloguesjocelyneandrews
 
Next Generataion Discovery
Next Generataion DiscoveryNext Generataion Discovery
Next Generataion DiscoveryJenny Emanuel
 
Rethinking the library catalogue: making search work for the library user
Rethinking the library catalogue: making search work for the library userRethinking the library catalogue: making search work for the library user
Rethinking the library catalogue: making search work for the library userSally Chambers
 
Cataloging 2.0: is there a future for the library catalogue?
Cataloging 2.0: is there a future for the library catalogue?Cataloging 2.0: is there a future for the library catalogue?
Cataloging 2.0: is there a future for the library catalogue?Sally Chambers
 
Web 2.0, web searching and web based catalogue
Web 2.0, web searching and web based catalogueWeb 2.0, web searching and web based catalogue
Web 2.0, web searching and web based catalogueGerald Louw
 
Next generation online catalogs
Next generation online catalogsNext generation online catalogs
Next generation online catalogsafraser246
 
Best Practices in Catalog Strategies
Best Practices in Catalog StrategiesBest Practices in Catalog Strategies
Best Practices in Catalog StrategiesSAP Ariba
 
Theory of Library Cataloguing
Theory of Library Cataloguing Theory of Library Cataloguing
Theory of Library Cataloguing Anupama Saini
 

Andere mochten auch (20)

Library 2.0 and User-Generated Content
Library 2.0 and User-Generated ContentLibrary 2.0 and User-Generated Content
Library 2.0 and User-Generated Content
 
Art discovery group catalogue: Usage, content and new horizons
Art discovery group catalogue:  Usage, content and new horizonsArt discovery group catalogue:  Usage, content and new horizons
Art discovery group catalogue: Usage, content and new horizons
 
User experience of the next-gen catalogue
User experience of the next-gen catalogueUser experience of the next-gen catalogue
User experience of the next-gen catalogue
 
Libraries as Learning Institutions - An Informal History by Dr Ian McShane, S...
Libraries as Learning Institutions - An Informal History by Dr Ian McShane, S...Libraries as Learning Institutions - An Informal History by Dr Ian McShane, S...
Libraries as Learning Institutions - An Informal History by Dr Ian McShane, S...
 
Creating better user interfaces for libraries catalogues: how to present and ...
Creating better user interfaces for libraries catalogues: how to present and ...Creating better user interfaces for libraries catalogues: how to present and ...
Creating better user interfaces for libraries catalogues: how to present and ...
 
Collaborative activities in Australian research libraries by Janine Schmidt F...
Collaborative activities in Australian research libraries by Janine Schmidt F...Collaborative activities in Australian research libraries by Janine Schmidt F...
Collaborative activities in Australian research libraries by Janine Schmidt F...
 
Web based catalogue.pptx22
Web based catalogue.pptx22Web based catalogue.pptx22
Web based catalogue.pptx22
 
From Catalogue 2.0 to the digital humanities: exploring the future of librari...
From Catalogue 2.0 to the digital humanities: exploring the future of librari...From Catalogue 2.0 to the digital humanities: exploring the future of librari...
From Catalogue 2.0 to the digital humanities: exploring the future of librari...
 
User-Generated Content and Social Discovery in the Academic Library Catalogu...
User-Generated Content and Social Discovery in the Academic Library Catalogu...User-Generated Content and Social Discovery in the Academic Library Catalogu...
User-Generated Content and Social Discovery in the Academic Library Catalogu...
 
Introducing Social Catalogues and Social Software into Public Libraries
Introducing Social Catalogues and Social Software into Public LibrariesIntroducing Social Catalogues and Social Software into Public Libraries
Introducing Social Catalogues and Social Software into Public Libraries
 
Social Catalogues: Enriching Content that Enhances RA Services
Social Catalogues: Enriching Content that Enhances RA ServicesSocial Catalogues: Enriching Content that Enhances RA Services
Social Catalogues: Enriching Content that Enhances RA Services
 
OLA 2014: A Future of Freedom and Innovation in Library Catalogues
OLA 2014: A Future of  Freedom and  Innovation in Library CataloguesOLA 2014: A Future of  Freedom and  Innovation in Library Catalogues
OLA 2014: A Future of Freedom and Innovation in Library Catalogues
 
Next Generataion Discovery
Next Generataion DiscoveryNext Generataion Discovery
Next Generataion Discovery
 
Rethinking the library catalogue: making search work for the library user
Rethinking the library catalogue: making search work for the library userRethinking the library catalogue: making search work for the library user
Rethinking the library catalogue: making search work for the library user
 
Cataloging 2.0: is there a future for the library catalogue?
Cataloging 2.0: is there a future for the library catalogue?Cataloging 2.0: is there a future for the library catalogue?
Cataloging 2.0: is there a future for the library catalogue?
 
Web 2.0, web searching and web based catalogue
Web 2.0, web searching and web based catalogueWeb 2.0, web searching and web based catalogue
Web 2.0, web searching and web based catalogue
 
Next generation online catalogs
Next generation online catalogsNext generation online catalogs
Next generation online catalogs
 
Library cataloging
Library catalogingLibrary cataloging
Library cataloging
 
Best Practices in Catalog Strategies
Best Practices in Catalog StrategiesBest Practices in Catalog Strategies
Best Practices in Catalog Strategies
 
Theory of Library Cataloguing
Theory of Library Cataloguing Theory of Library Cataloguing
Theory of Library Cataloguing
 

Ähnlich wie Filling in the Blanks: Capturing Dynamically Generated Content

2022-Devnexus-StatefulMicroservices.pptx.pdf
2022-Devnexus-StatefulMicroservices.pptx.pdf2022-Devnexus-StatefulMicroservices.pptx.pdf
2022-Devnexus-StatefulMicroservices.pptx.pdfGrace Jansen
 
Skb web2.0
Skb web2.0Skb web2.0
Skb web2.0animove
 
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...The Frick Collection
 
Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...Justin Brunelle
 
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred RepresentationsScripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred RepresentationsJustin Brunelle
 
Share point saturday presentation 9 29-2012-2
Share point saturday presentation 9 29-2012-2Share point saturday presentation 9 29-2012-2
Share point saturday presentation 9 29-2012-2Derek Gusoff
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptMichael Nelson
 
Cohere: Towards Web 2.0 Argumentation
Cohere: Towards Web 2.0 ArgumentationCohere: Towards Web 2.0 Argumentation
Cohere: Towards Web 2.0 ArgumentationSimon Buckingham Shum
 
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)AI4BD GmbH
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Artefactual Systems - AtoM
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital PreservationMat Kelly
 
全てのエンジニアのためのWeb標準技術とのつきあい方 OSC福岡 2011版
全てのエンジニアのためのWeb標準技術とのつきあい方 OSC福岡 2011版全てのエンジニアのためのWeb標準技術とのつきあい方 OSC福岡 2011版
全てのエンジニアのためのWeb標準技術とのつきあい方 OSC福岡 2011版Rikkyo University
 
Linked data and semantic wikis
Linked data and semantic wikisLinked data and semantic wikis
Linked data and semantic wikisSören Auer
 
Front End page speed performance improvements for Drupal
Front End page speed performance improvements for DrupalFront End page speed performance improvements for Drupal
Front End page speed performance improvements for DrupalPromet Source
 
Front End page speed performance improvements for Drupal
Front End page speed performance improvements for DrupalFront End page speed performance improvements for Drupal
Front End page speed performance improvements for DrupalAndy Kucharski
 
Web 3.0: The Upcoming Revolution
Web 3.0: The Upcoming RevolutionWeb 3.0: The Upcoming Revolution
Web 3.0: The Upcoming RevolutionNitin Godawat
 
Open Innovation means Open Source
Open Innovation means Open SourceOpen Innovation means Open Source
Open Innovation means Open SourceBertrand Delacretaz
 
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Anna Perricci
 
Tools for Managing the Past Web
Tools for Managing the Past WebTools for Managing the Past Web
Tools for Managing the Past WebMichele Weigle
 

Ähnlich wie Filling in the Blanks: Capturing Dynamically Generated Content (20)

2022-Devnexus-StatefulMicroservices.pptx.pdf
2022-Devnexus-StatefulMicroservices.pptx.pdf2022-Devnexus-StatefulMicroservices.pptx.pdf
2022-Devnexus-StatefulMicroservices.pptx.pdf
 
Skb web2.0
Skb web2.0Skb web2.0
Skb web2.0
 
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
 
Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
 
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred RepresentationsScripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
 
Share point saturday presentation 9 29-2012-2
Share point saturday presentation 9 29-2012-2Share point saturday presentation 9 29-2012-2
Share point saturday presentation 9 29-2012-2
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
 
Cohere: Towards Web 2.0 Argumentation
Cohere: Towards Web 2.0 ArgumentationCohere: Towards Web 2.0 Argumentation
Cohere: Towards Web 2.0 Argumentation
 
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital Preservation
 
全てのエンジニアのためのWeb標準技術とのつきあい方 OSC福岡 2011版
全てのエンジニアのためのWeb標準技術とのつきあい方 OSC福岡 2011版全てのエンジニアのためのWeb標準技術とのつきあい方 OSC福岡 2011版
全てのエンジニアのためのWeb標準技術とのつきあい方 OSC福岡 2011版
 
Linked data and semantic wikis
Linked data and semantic wikisLinked data and semantic wikis
Linked data and semantic wikis
 
W3 C Intro And Beyond - Eyal Sela
W3 C Intro And Beyond - Eyal SelaW3 C Intro And Beyond - Eyal Sela
W3 C Intro And Beyond - Eyal Sela
 
Front End page speed performance improvements for Drupal
Front End page speed performance improvements for DrupalFront End page speed performance improvements for Drupal
Front End page speed performance improvements for Drupal
 
Front End page speed performance improvements for Drupal
Front End page speed performance improvements for DrupalFront End page speed performance improvements for Drupal
Front End page speed performance improvements for Drupal
 
Web 3.0: The Upcoming Revolution
Web 3.0: The Upcoming RevolutionWeb 3.0: The Upcoming Revolution
Web 3.0: The Upcoming Revolution
 
Open Innovation means Open Source
Open Innovation means Open SourceOpen Innovation means Open Source
Open Innovation means Open Source
 
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
 
Tools for Managing the Past Web
Tools for Managing the Past WebTools for Managing the Past Web
Tools for Managing the Past Web
 

Mehr von Justin Brunelle

iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...Justin Brunelle
 
Not All Mementos Are Created Equal: Measuring The Impact Of Missing Mementos
Not All Mementos Are Created Equal: Measuring The Impact Of Missing MementosNot All Mementos Are Created Equal: Measuring The Impact Of Missing Mementos
Not All Mementos Are Created Equal: Measuring The Impact Of Missing MementosJustin Brunelle
 
How I spend my summer vacations
How I spend my summer vacationsHow I spend my summer vacations
How I spend my summer vacationsJustin Brunelle
 
An Evaluation of Caching Policies for Memento TimeMaps
An Evaluation of Caching Policies for Memento TimeMapsAn Evaluation of Caching Policies for Memento TimeMaps
An Evaluation of Caching Policies for Memento TimeMapsJustin Brunelle
 
Day in the Life of a Computer Scientist
Day in the Life of a Computer ScientistDay in the Life of a Computer Scientist
Day in the Life of a Computer ScientistJustin Brunelle
 
Agile Engineering - ODU ACM
Agile Engineering - ODU ACMAgile Engineering - ODU ACM
Agile Engineering - ODU ACMJustin Brunelle
 
Digital Preservation - ODU
Digital Preservation - ODUDigital Preservation - ODU
Digital Preservation - ODUJustin Brunelle
 
Digital Preservation at ODU
Digital Preservation at ODUDigital Preservation at ODU
Digital Preservation at ODUJustin Brunelle
 

Filling in the Blanks: Capturing Dynamically Generated Content

  • 1. Filling in the Blanks: Capturing Dynamically Generated Content. Justin F. Brunelle, Old Dominion University. Advisor: Dr. Michael L. Nelson. JCDL ’12 Doctoral Consortium, 06/10/2012
  • 4. Problem!
      • Which exists in the archive?
          – Probably the unauthenticated version, right?
      • What factors created “my” representation?
          – Can I archive “my” representation?
      • Am I seeing undead resources?
          – Mix of live and archived content?
      • How can we capture, share, and archive user experiences?
  • 5. Which version is in the Internet Archive?
  • 6. Which version is in WebCite?
  • 7. Craigslist.org
      $ curl -I -L http://www.craigslist.org
      HTTP/1.1 302 Found
      Set-Cookie: …
      Location: http://geo.craigslist.org/

      HTTP/1.1 302 Found
      Content-Type: text/html; charset=iso-8859-1
      Connection: close
      Location: http://norfolk.craigslist.org
      Date: Thu, 31 May 2012 23:26:27 GMT
      Set-Cookie: …
      Server: Apache

      HTTP/1.1 200 OK
      Connection: close
      Cache-Control: max-age=3600, public
      Last-Modified: Thu, 31 May 2012 23:13:46 GMT
      Set-Cookie: …
      Transfer-Encoding: chunked
      Date: Thu, 31 May 2012 23:13:46 GMT
      Vary: Accept-Encoding
      Content-Type: text/html; charset=iso-8859-1;
      X-Frame-Options: Allow-From https://forums.craigslist.org
      Server: Apache
      Expires: Fri, 01 Jun 2012 00:13:46 GMT
  • 9. Archived Resource Submitted from Norfolk
      • Submitted to WebCite from Norfolk
  • 10. Live Norfolk Interactive Mapper: http://gisapp2.norfolk.gov/interactive_mapper/viewer.htm
  • 11. Archived Norfolk Interactive Mapper: http://web.archive.org/web/20100924020604/http://gisapp2.norfolk.gov/interactive_mapper/viewer.htm
  • 12. Web 2.0
      • Crawlers aren’t enough
      • Relies on interaction/personalization
      • Users may want to archive personal content
      • How do we capture user experiences?
          – Justin’s vs. Dr. Nelson’s experience? Both?
      • What about sharing browsing sessions?
  • 13. Things are better (but really worse)
      • Better UI, worse archiving
      • HTML5
      • JavaScript
          – document.write
      • Cookies
      • User Interaction
      • GeoIP
  • 14. Traditional Representation Generation [diagram, from W3C Web Architecture: a URI identifies a Resource; dereferencing the URI yields a Representation, which represents the Resource]
  • 15. Representation through Content Negotiation [diagram, from W3C Web Architecture: as on slide 14, with a Negotiate step alongside Dereference]
  • 16. Web 2.0 Representation Generation [diagram: as on slide 14, with User Interaction and Client-side script also shaping the Representation]
  • 17. Prior Work
      • Capture for Debugging and Security
          – Mickens, 2010; Livshits, 2007, 2009, 2010; Dhawan, 2009
      • Crawlers
          – Mesbah, 2008; Duda, 2008; Lowet, 2009
      • Caching dynamic content
          – Benson, 2010; Karri, 2009; Boulos, 2010; Periyapatna, 2009; Sivasubramanian, 2007
      • Walden’s paths
          – http://www.csdl.tamu.edu/walden/
      • IIPC Workshop 2012: Archiving the Future Web
          – http://blog.dshr.org/2012/05/harvesting-and-preserving-future-web.html
  • 18. Two Current Solutions
      • Browser-based crawling
          – Problematic at scale, misses post-render content, no session spanning, misses “personal” browsing
          – IA – To be released
          – Heritrix 3.X
      • Transactional Web Archiving
          – Impact/depth is unknown, client-side changes are missed, must have server/content author buy-in
          – LANL – http://theresourcedepot.org/
  • 19. What can Justin do about it?
      • How can we capture THE user experience?
          – How much user-shared content is archivable?
          – What defines a dynamic representation?
              • Infinitely Changing?
          – How much dynamic content are archives missing?
          – What tools are required to capture the representation?
              • Browser Add-on?
          – How much will users contribute to the archives?
      • Is this even possible?
  • 20. Characteristics of a Potential Solution
      • Browser Add-on
      • Crowd sourced
          – User contributions to archives
      • Opt-in representation archiving/sharing
      • Capture client-side DOM
          – JS, HTML, representation, etc.
      • Capture client-side events and resulting DOM
          – Includes Ajax and post-render data
      • Package and submit to archives
  • 22. Dissertation Plan
      • BEGIN
      • Background Research
      • Coursework
      • Quals
      • Prevalence of Unarchivable Resources (Current State)
      • Define test datasets (set of dynamic and static test pages)
      • Define factors/equations of dynamic representations
          – What dynamic content can (and cannot) be captured for archiving?
      • Construction of software solution
          – VCR for the Web: Record, Rewind, Replay
      • Analysis of improved capture
          – Client-side Archiving: Client-side (human assisted) Capture vs. Traditional Crawlers vs. Headless clients
      • Explore how personalized archives can be combined with public web archives
      • PhD Defense
  • 23. Current Work: How much can we archive?
      • Sample from Bit.ly URIs from Twitter
      • Load page in each environment:
          – Live
          – 3rd Party Archived
              • Submit and load from WebCitation
          – Locally stored
              • wget -k -p and load from local drive
          – Local only
              • Load from local drive, no Internet access
  • 27. Local Only (No Internet): http://localhost/dctheatrescene.com/2009/06/11/toxic-avengers-john-rando/
      • Missing: 12/78 without Internet
      • dctheatrescene.com/…/uperfish.args.js?e83a2c
      • dctheatrescene.com/…/css/datatables.css?ver=1.9.3
      • Small files, big impact
  • 30. Click and drag to left
  • 32. Future Research Questions
      • What dynamism can (and cannot) be captured for archiving?
      • Client-side Archiving: Client-side Capture vs. Traditional Crawlers
      • Client-side contributions to Web Archives: Archiving User Experiences
  • 33. Conclusion
      • Is dynamic content archivable?
      • How much are we missing?
      • Can you archive your experience?
      • For the betterment of archives
      • For personal capture
  • 34. Backups
  • 35. References
      • J. Mickens, J. Elson, and J. Howell. Mugshot: Deterministic capture and replay for JavaScript applications. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, NSDI '10, pages 11-11, Berkeley, CA, USA, 2010. USENIX Association.
      • K. Vikram, A. Prateek, and B. Livshits. Ripley: Automatically securing Web 2.0 applications through replicated execution. In Proceedings of the Conference on Computer and Communications Security, November 2009.
      • E. Kiciman and B. Livshits. AjaxScope: A platform for remotely monitoring the client-side behavior of Web 2.0 applications. In the 21st ACM Symposium on Operating Systems Principles, SOSP '07, 2007.
      • B. Livshits and S. Guarnieri. Gulfstream: Incremental static analysis for streaming JavaScript applications. Technical Report MSR-TR-2010-4, Microsoft, January 2010.
      • M. Dhawan and V. Ganapathy. Analyzing information flow in JavaScript-based browser extensions. Annual Computer Security Applications Conference, pages 382-391, 2009.
      • A. Mesbah, E. Bozdag, and A. van Deursen. Crawling Ajax by inferring user interface state changes. In Web Engineering, 2008. ICWE '08. Eighth International Conference on, pages 122-134, July 2008.
      • C. Duda, G. Frey, D. Kossmann, and C. Zhou. AjaxSearch: Crawling, indexing and searching Web 2.0 applications. Proc. VLDB Endow., 1:1440-1443, August 2008.
      • D. Lowet and D. Goergen. Co-browsing dynamic web pages. In WWW, pages 941-950,
  • 36. References
      • S. Chakrabarti, S. Srivastava, M. Subramanyam, and M. Tiwari. Memex: A browsing assistant for collaborative archiving and mining of surf trails. In Proceedings of the 26th VLDB Conference, 2000.
      • R. Karri. Client-side page element web-caching, 2009.
      • E. Benson, A. Marcus, D. R. Karger, and S. Madden. Sync Kit: A persistent client-side database caching toolkit for data intensive websites. In WWW, pages 121-130, 2010.
      • M. N. K. Boulos, J. Gong, P. Yue, and J. Y. Warren. Web GIS in practice VIII: HTML5 and the canvas element for interactive online mapping. International Journal of Health Geographics, March 2010.
      • S. Periyapatna. Total recall for Ajax applications Firefox extension, 2009.
      • S. Sivasubramanian, G. Pierre, M. van Steen, and G. Alonso. Analysis of caching and replication strategies for web applications. IEEE Internet Computing, 11:60-66, 2007.
  • 37. Web Archives
      • “Web archiving is the process of collecting portions of the World Wide Web and ensuring the collection is preserved … for future researchers, historians, and the public.” -- http://en.wikipedia.org/wiki/Web_archiving
  • 38. What does this have to do with DLs?
      • Improved coverage
      • NARA regulation
      • Improved “memory”
      • Gathers missing User Experiences
          – Or at least an adequate sub-sample
  • 39. Envisioned Solution [mockup: “SELECT PREVIOUS REPRESENTATION TO ARCHIVE:” above a timeline of user events (Text Entered, Double Click, Text Entered, Button Push) paired with Ajax events (XMLResponse)]
  • 42. Web Applications with Session Archiver
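
The Craigslist example on slide 7 shows why one URI can lead to different representations: the request is redirected twice before a page is returned, and the final host depends on where the client is. As a rough illustration, the Python sketch below splits `curl -I -L` output into one record per response so the chain can be inspected. The `Hop` class, `parse_redirect_chain`, and the shortened `SAMPLE` dump are hypothetical names introduced here for illustration and are not part of the original slides. Repeated headers such as Set-Cookie are simply overwritten, which is an assumption made to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class Hop:
    """One HTTP response in a redirect chain (hypothetical helper type)."""
    status: int
    headers: dict = field(default_factory=dict)


def parse_redirect_chain(dump: str) -> list:
    """Split `curl -I -L` output into one Hop per HTTP response."""
    hops = []
    for line in dump.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate responses
        if line.startswith("HTTP/"):
            # Status line, e.g. "HTTP/1.1 302 Found"
            hops.append(Hop(status=int(line.split()[1])))
        elif hops and ":" in line:
            # Header line; a repeated header name overwrites the earlier value
            name, _, value = line.partition(":")
            hops[-1].headers[name.strip().lower()] = value.strip()
    return hops


# Shortened copy of the header dump shown on slide 7
SAMPLE = """\
HTTP/1.1 302 Found
Location: http://geo.craigslist.org/

HTTP/1.1 302 Found
Location: http://norfolk.craigslist.org

HTTP/1.1 200 OK
Cache-Control: max-age=3600, public
"""

chain = parse_redirect_chain(SAMPLE)
for hop in chain:
    print(hop.status, hop.headers.get("location", "(final representation)"))
```

A crawler running from a different network location would presumably see a different `Location` value at the second hop, which is the GeoIP effect listed on slide 13.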

Editor's Notes

  1. Live, local, WC, local no internet
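
The comparison across these environments (slides 23 and 27) amounts to counting how many of a page's embedded resources still load in each setting, as in "Missing: 12/78". The Python sketch below shows one way such a count might be made for a locally stored copy. It is not the tooling used in the study: `EmbeddedResourceParser`, `missing_resources`, and the example.org page are hypothetical, and only `img`, `script`, `iframe`, and stylesheet `link` elements are considered. Resources requested by JavaScript at run time would not be found this way, which is the gap the talk is about.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbeddedResourceParser(HTMLParser):
    """Collect absolute URIs of resources embedded directly in the markup."""

    # tag -> attribute that carries the embedded URI (simplified on purpose)
    EMBED_ATTRS = {"img": "src", "script": "src", "iframe": "src", "link": "href"}

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Only stylesheet links are fetched for rendering; skip other <link> tags
        if tag == "link" and "stylesheet" not in (attr_map.get("rel") or "").lower():
            return
        uri = attr_map.get(self.EMBED_ATTRS.get(tag, ""))
        if uri:
            self.resources.append(urljoin(self.base_uri, uri))


def missing_resources(html, base_uri, available):
    """Return (missing URIs, total embedded URIs) for one page."""
    parser = EmbeddedResourceParser(base_uri)
    parser.feed(html)
    missing = [uri for uri in parser.resources if uri not in available]
    return missing, len(parser.resources)


# Hypothetical page and local copy, for illustration only
PAGE = """
<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="http://cdn.example.net/lib.js"></script>
</head><body><img src="logo.png"></body></html>
"""
BASE = "http://example.org/post/"
LOCAL_COPY = {
    "http://example.org/css/site.css",
    "http://example.org/post/logo.png",
}

missing, total = missing_resources(PAGE, BASE, LOCAL_COPY)
print(f"Missing: {len(missing)}/{total}")
```

In this made-up case the script served from a third-party host is the one resource absent from the local copy.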