A Framework for Aggregating
Private and Public Web Archives
Mat Kelly
Old Dominion University, Norfolk, VA
Advisor: Michele C. Weigle
JCDL 2015 Doctoral Consortium
June 21, 2015
The Problem
[Figure: four private archives, two of them labeled "other"]
All Archives Cannot Be Aggregated
[Figure: private archives and other private archives alongside a TimeMap]
[Figure: t = k ≠ t = k-1]
[Figure: TimeMap spanning 1 year ago, 2 years ago, 10 years ago, …, 180 years ago]
[Figure: private archive]
Proactive Preservation
• Just-in-time WARC creation
• Personal and potentially private web archiving
• Mitigates deferral problem
Public vs. Private
Web Archiving
• Public Web Archiving
– Relies on deferred capture
– Uses WARC, Memento, etc.
– Integrates with other public web archives
• Private Web Archiving
– Same tools, less overhead, less bureaucracy
– Uses WARC, Memento, etc.
– Does not integrate
Typical Web Archive Access
1. Web User Interface
2. Memento
TimeGate
TimeMap
– Accept-Datetime (content negotiation)
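The datetime negotiation named above can be sketched as follows. This is a minimal illustration that only builds the request header a client would send to a TimeGate; the helper name and the TimeGate address are assumptions for illustration, not part of the slides.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_headers(when: datetime) -> dict:
    """Build headers asking a TimeGate for the capture nearest `when`."""
    # Memento (RFC 7089) expresses Accept-Datetime as an HTTP-date in GMT.
    gmt = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(gmt, usegmt=True)}


# Hypothetical TimeGate (URI-G) address; not a real endpoint.
URI_G = "http://timegate.example/timegate/https://www.facebook.com/"
headers = accept_datetime_headers(
    datetime(2015, 3, 5, 0, 1, 0, tzinfo=timezone.utc)
)
```

A client would send `headers` with a GET to `URI_G` and follow the redirect to the selected memento.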
[Figure: URI-G and TimeMap]
Aggregating Multiple Web Archives
• Memento Aggregator
– Temporally sorted TimeMap combined from multiple archives
– Allows temporal gaps in one archive to be filled in by another
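A temporally sorted merge of this kind might look like the sketch below. It assumes each archive's TimeMap is already sorted by capture datetime; the tuple layout and the example archives are hypothetical.

```python
import heapq
from datetime import datetime
from typing import Iterable, List, Tuple

Memento = Tuple[datetime, str]  # (capture datetime, URI-M)


def aggregate_timemaps(timemaps: Iterable[List[Memento]]) -> List[Memento]:
    """Merge per-archive TimeMaps, each sorted by datetime, into one."""
    return list(heapq.merge(*timemaps, key=lambda m: m[0]))


# Hypothetical archives: archive_b fills the gap between archive_a's captures.
archive_a = [
    (datetime(2015, 2, 28), "http://a.example/1"),
    (datetime(2015, 3, 10), "http://a.example/2"),
]
archive_b = [(datetime(2015, 3, 5), "http://b.example/1")]
merged = aggregate_timemaps([archive_a, archive_b])
```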
Archive Supplementation
• More captures → greater temporal coverage
• Content on Deep Web
• A large chunk of the Web is not preserved
– Tools’ inability
– Inconsistency over time due to personalization
Concerns in Aggregating Private
Web Archives
• Privacy
• Inconsistency of page representation
– URI is insufficient key for access
• Archival integrity
– Has the private archive's content been manipulated?
Why Individuals Might Want
Personalized Aggregations
• Show my private web archive captures
• Concerned about exposing sensitive info to
public
– But still want to view temporally inline
• Private/Restricted Archives are becoming ever
more common
Temporal Supplementation
My Archives Have
What They May Have Missed
The Concerns Distilled
• Access Control
– And indicators for private web archives (PWAs)
• Preservation of Private Content
• Interoperability without privacy compromise
Web Archive Usage Pattern 1:
Direct Access
[Figure: two direct-access options joined by "OR", the second a TimeMap]
Web Archive Usage Pattern 2:
Web Archive Aggregation
• Better results for a URI due to more sources
for capture
Previous Patterns: Status Quo
• Patterns 1 and 2 are status quo
– provided by framework
• Querying web archives currently only
considers public web content
– URI for lookup
• Framework introduces 2 new entities
– Memento Meta Aggregator (MMA)
– Private Web Archive Adapter (PWAA)
Memento Meta Aggregator (MMA)
• Functional superset of the Memento Aggregator (MA)
• Can act as intermediary client to relay MA
results to ultimate user
• Allows just-in-time (JIT) inclusion of archives
– as specified at query time
• Set of archives aggregated can be dynamic
– e.g., Results must not include IA
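The query-time behavior in the bullets above could be modeled roughly as below. This is a sketch under stated assumptions: the `MetaAggregator` class, its `include`/`exclude` parameters, and the archive names are hypothetical, and each archive is reduced to a function from URI-R to memento identifiers.

```python
from typing import Callable, Dict, Iterable, List, Optional

# An archive is modeled as a function from URI-R to memento identifiers.
Archive = Callable[[str], List[str]]


class MetaAggregator:
    def __init__(self, archives: Dict[str, Archive]):
        self.archives = dict(archives)

    def query(self, uri_r: str,
              include: Optional[Dict[str, Archive]] = None,
              exclude: Iterable[str] = ()) -> List[str]:
        """Aggregate configured archives plus any named at query time."""
        selected = {**self.archives, **(include or {})}
        skipped = set(exclude)
        results: List[str] = []
        for name, archive in selected.items():
            if name not in skipped:
                results.extend(archive(uri_r))
        return results


mma = MetaAggregator({
    "IA": lambda uri: ["ia-1", "ia-2"],
    "other": lambda uri: ["o-1"],
})
without_ia = mma.query("http://example.com/", exclude=["IA"])
with_mine = mma.query("http://example.com/",
                      include={"mine": lambda uri: ["m-1"]})
```

Passing the archive set per query, rather than fixing it at construction, is what makes the aggregation dynamic in this sketch.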
Aggregating My Captures
[Figure: various public web archives alongside my web archives (MY CNN CAPTURES, MY BANK CAPTURES)]
The Current Memento Aggregator
[Figure: public web archives labeled 100, 30, and 10; MY CNN CAPTURES; MY BANK CAPTURES]
Accessing the Aggregator
[Figure: public web archives labeled 100, 30, and 10; MY CNN CAPTURES; MY BANK CAPTURES]
Accessing the Aggregator
…does not include our archives
[Figure: the aggregator returns 140 from archives labeled 100, 30, and 10; MY CNN CAPTURES and MY BANK CAPTURES are marked NOT AGGREGATED]
Access via the Meta Aggregator
Pattern 3: Aggregator Relay
[Figure: archives labeled 100, 30, and 10; the aggregator returns 140 and the meta aggregator relays 140; MY CNN CAPTURES; MY BANK CAPTURES]
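Web Archive Usage Pattern 4:
Including Additional Archives in Aggregation
Access via the Meta Aggregator
…allows our archives to be included
[Figure: archives labeled 100, 30, and 10 yield 140 via the aggregator; an added archive labeled 15 raises the meta aggregator's total to 155; MY CNN CAPTURES; MY BANK CAPTURES]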
MMAs Allow Our Public Captures
to be Shared
[Figure: archives labeled 100, 30, 10, and 15; the aggregator returns 140, the meta aggregator returns 155, and 155 is passed on to further clients; MY CNN CAPTURES; MY BANK CAPTURES]
Web Archive Usage Pattern 5:
Recursive MMA Access
[Figure: nested aggregations over MY CNN CAPTURES, MY BANK CAPTURES, …, Bob’s public captures, and the organization’s public captures 1 and 2. One set contains A B C D, the next contains B C D, the next contains C D. Counts shown: 10, 5, 15, 15, 20, 35, 35, 15, 50, 50.]
New Framework Entity 1:
Memento Meta Aggregator
• Allow dynamic and JIT set of archives
• Superset can be recursively constructed
• Sets can be shared
My public captures can be integrated with those of public web archives
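Recursive construction can be illustrated with a small sketch in which an aggregator's sources are either single archives or other aggregators. The `Aggregator` class is hypothetical, archives are reduced to capture counts, and the figures 10, 5, 20, and 15 are assumed readings of the Pattern 5 diagram. The sketch also assumes the nested sets do not overlap, so no de-duplication is performed.

```python
from typing import List, Union

# A source is a capture count for one archive or a nested aggregator.
Source = Union[int, "Aggregator"]


class Aggregator:
    def __init__(self, sources: List[Source]):
        self.sources = sources

    def total(self) -> int:
        """Sum captures across archives, recursing into nested aggregators."""
        return sum(
            s.total() if isinstance(s, Aggregator) else s
            for s in self.sources
        )


inner = Aggregator([10, 5])
middle = Aggregator([inner, 20])
outer = Aggregator([middle, 15])
```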
Private Web Archive Adapter (PWAA)
• Regulates access to Private Web Archives (PWAs)
• Acts as token authorizer
• When credentials are valid, relays results as if the PWA were queried directly
User Establishes Access with PWA
GET TOKEN for PWA
Key: abcd1234
[Figure: public archives labeled 100, 30, and 10; MY CNN CAPTURES and MY BANK CAPTURES, labeled 3 captures and 10,000 captures]
MMA Relays Request
GET TOKEN for PWA
Key: abcd1234
PWAA Accepts Request
Generates Reusable Token
ACCESS OK
Token: 4f33c64
User Submits Request for URI-R
with Token
GET mementos for URI
Token: 4f33c64
MMA Relays Request (again)
GET mementos for URI
Token: 4f33c64
PWAA Verified & Relays Request
MA Gets Mementos, per usual
Token: 4f33c64 OK
GET mementos for URI (PWAA to PWA)
GET mementos for URI (MA to public archives)
Archives Return Mementos
Token: 4f33c64 OK
Returning mementos
Return mementos for URI
PWAA Relays TimeMap
[Figure: TimeMaps flow back to the MMA; counts shown: 140 captures, 3 captures, 10,000 captures, and 10,143]
MMA Annotates and Aggregates
[Figure: the MMA combines TimeMaps of 140 captures, 3 captures, and 10,000 captures into 10,143]
Web Archive Usage Pattern 6:
Aggregating Public & Private Archives
[Figure: a single TimeMap of 10,143 captures drawn from archives labeled 100, 30, 10, 3 captures, and 10,000 captures]
Regulated Access Can Be Shared
GET mementos for URI
Token: 4f33c64
GET mementos for URI
Token: c5463b4
GET TOKEN for PWA
Key: 2265eef3
No/invalid token returned
Access denied or 0 mementos
[Figure: private archives labeled 3 captures and 10,000 captures]
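The token exchange walked through on the preceding slides can be sketched as below. This is an illustrative model, not the framework's implementation: the class, method names, and the `reveal_denials` switch are assumptions, and token generation with `secrets.token_hex` stands in for whatever scheme a real PWAA would use.

```python
import secrets
from typing import Callable, List, Optional, Set


class AccessDenied(Exception):
    """Raised when a key or token is not accepted."""


class PrivateWebArchiveAdapter:
    def __init__(self, archive: Callable[[str], List[str]],
                 valid_keys: Set[str], reveal_denials: bool = True):
        self._archive = archive
        self._valid_keys = set(valid_keys)
        self._tokens: Set[str] = set()
        self._reveal_denials = reveal_denials

    def get_token(self, key: str) -> str:
        """Exchange a valid key for a reusable token."""
        if key not in self._valid_keys:
            raise AccessDenied("invalid key")
        token = secrets.token_hex(8)
        self._tokens.add(token)
        return token

    def get_mementos(self, uri_r: str, token: Optional[str]) -> List[str]:
        """Relay the query to the private archive only for a known token."""
        if token not in self._tokens:
            if self._reveal_denials:
                raise AccessDenied("no/invalid token")
            return []  # looks the same as an archive holding 0 mementos
        return self._archive(uri_r)


pwaa = PrivateWebArchiveAdapter(lambda uri: ["m1", "m2", "m3"],
                                {"abcd1234"}, reveal_denials=False)
token = pwaa.get_token("abcd1234")
granted = pwaa.get_mementos("https://bank.example/", token)
denied = pwaa.get_mementos("https://bank.example/", "not-a-token")
```

The `reveal_denials` flag models the two outcomes named on the slide: an explicit denial, or an empty result that does not disclose that the archive exists.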
Aggregating Multiple PWAs
GET TOKENs for PWAs
Key: abcd1234, Archive: My
Key: cab45cbf, Archive: Linda
Key: b0b01b, Archive: Bob
[Figure: MY BANK CAPTURES, Linda’s Private Captures, and Bob’s Private Captures, labeled 3 captures, 5 captures, and 10 captures]
Aggregating Multiple PWAs
Access OK, Token: 7790ca
Access OK, Token: b0b01b
ACCESS DENIED
[Figure: MY BANK CAPTURES, Linda’s Private Captures, and Bob’s Private Captures, labeled 3 captures, 5 captures, and 10 captures]
PWAs Can Then be Aggregated
GET mementos for URI
Token: 7790ca, Archive: My
Token: null, Archive: Linda
Token: b0b01b, Archive: Bob
[Figure: archives labeled 3, 5, and 10 captures; with no token for Linda’s archive it returns nothing (ø), and the archives returning 3 and 10 give 13 in total]
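The per-archive token handling shown above might be sketched as follows. The registry layout, the function name, and the expected token for Linda's archive are hypothetical; the capture counts of 3, 5, and 10 are assumed to belong to My, Linda, and Bob in that order.

```python
from typing import Dict, List, Optional

# Hypothetical private archives: name -> (expected token, captures held).
PWAS = {
    "My": ("7790ca", ["my-%d" % i for i in range(3)]),
    "Linda": ("f00d42", ["linda-%d" % i for i in range(5)]),
    "Bob": ("b0b01b", ["bob-%d" % i for i in range(10)]),
}


def aggregate_pwas(tokens: Dict[str, Optional[str]]) -> List[str]:
    """Collect captures from each PWA whose token matches; skip the rest."""
    results: List[str] = []
    for name, (expected, captures) in PWAS.items():
        if tokens.get(name) == expected:
            results.extend(captures)
    return results


mementos = aggregate_pwas({"My": "7790ca", "Linda": None, "Bob": "b0b01b"})
```

With a null token for Linda, only the 3 and 10 captures are returned, matching the total of 13 in the slide.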
Sample TimeMap
...
, <http://web.archive.org/web/20150228155703/https://facebook.com/>;rel="memento";
datetime="Sat, 28 Feb 2015 15:57:03 GMT"
, <http://web.archive.org/web/20150228163939/http://www.facebook.com/>;rel="memento";
datetime="Sat, 28 Feb 2015 16:39:39 GMT"
, <http://web.archive.org/web/20150303162841/https://www.facebook.com/>;rel="memento";
datetime="Tue, 03 Mar 2015 16:28:41 GMT"
, <http://users2machine.local/web/20150305000101/https://www.facebook.com/>;rel="memento";
datetime="Thu, 05 Mar 2015 00:01:00 GMT"; key="e395935019ee467c797034ee410cc91e"
, <//wayback.archive-it.org/all/20150305215922/https://facebook.com/>;rel="memento";
datetime="Thu, 05 Mar 2015 21:59:22 GMT"
, <http://previouslyUnaggregated.org/web/20150306123457/https://www.facebook.com/>;rel="memento";
datetime="Fri, 06 Mar 2015 12:34:57 GMT"
, <http://web.archive.org/web/20150310140721/https://www.facebook.com/>;rel="memento";
datetime="Tue, 10 Mar 2015 14:07:21 GMT"
...
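Reading entries like those in the sample TimeMap, including the extra `key` attribute on the private capture, could be done along these lines. This is a simplified sketch, not a full link-format parser: the regular expressions only handle `rel="memento"` entries shaped like the sample, and the function name is hypothetical.

```python
import re
from typing import Dict, List

# Matches one memento entry: <URI>;rel="memento"; followed by attributes.
ENTRY = re.compile(
    r'<([^>]+)>\s*;\s*rel="memento"\s*;\s*((?:\w+="[^"]*"\s*;?\s*)+)'
)
ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_mementos(timemap: str) -> List[Dict[str, str]]:
    """Extract memento entries and their attributes from link-format text."""
    entries = []
    for uri, attrs in ENTRY.findall(timemap):
        entry = {"uri": uri}
        entry.update(ATTR.findall(attrs))
        entries.append(entry)
    return entries


SAMPLE = (
    ', <http://web.archive.org/web/20150303162841/https://www.facebook.com/>'
    ';rel="memento";\n'
    'datetime="Tue, 03 Mar 2015 16:28:41 GMT"\n'
    ', <http://users2machine.local/web/20150305000101/https://www.facebook.com/>'
    ';rel="memento";\n'
    'datetime="Thu, 05 Mar 2015 00:01:00 GMT"; '
    'key="e395935019ee467c797034ee410cc91e"\n'
)
parsed = parse_mementos(SAMPLE)
private_entries = [e for e in parsed if "key" in e]
```

Entries carrying a `key` attribute are the ones that need it presented again when the memento itself is requested.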
Access Token Included in TimeMap
, <http://users2machine.local/web/20150305000101/https://www.facebook.com/>;rel="memento";
datetime="Thu, 05 Mar 2015 00:01:00 GMT"; key="e395935019ee467c797034ee410cc91e"
[Figure: the sample TimeMap again, with a callout labeled MY PRIVATE FACEBOOK CAPTURES]
My Public Web Archive,
Now Aggregated
, <http://previouslyUnaggregated.org/web/20150306123457/https://www.facebook.com/>;rel="memento";
datetime="Fri, 06 Mar 2015 12:34:57 GMT"
[Figure: the sample TimeMap again, with a callout labeled MY PUBLIC FACEBOOK CAPTURES]
Evaluation Plan
• How effective is the Framework?
• What are the scalability ramifications of the additional infrastructure?
• Is public-private tokenization the most suitable method for persistent access?
• How can a single archive be subdivided between private/public and access controlled?
Previous Work
PUBLICATIONS
Preservation and Replay
PDA 2013 - Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving
JCDL 2012 - WARCreate - Create Wayback-Consumable WARC Files from Any Webpage
Evaluating Capture
IJDL 2015 - Not All Mementos Are Created Equal: Measuring The Impact Of Missing Resources
IJDL 2015 - The Impact of JavaScript on Archivability
JCDL 2014 - Not All Mementos Are Created Equal: Measuring The Impact Of Missing Resources
JCDL 2014 - The Archival Acid Test: Evaluating Archive Performance on Advanced HTML and JavaScript
D-Lib 2013 - A Method for Identifying Personalized Representations in the Archives
TPDL 2013 - On the Change in Archivability of Websites Over Time
Archival Integration
JCDL 2015 - Mobile Mink: Merging Mobile and Desktop Archived Webs
JCDL 2014 - Mink: Integrating the Live and Archived Web Viewing Experience Using Web Browsers and Memento
SOFTWARE PRODUCTS
WARCreate – preserve from the browser
WAIL – private web archiving all-in-one suite
Mink – integrate the live and archived web
Current Work
• Other approaches to archival lookup beyond the URI
• Appropriate metadata to indicate private web content in WARC files
• Existing integration attempts by private web archives & individuals
Dissertation Plan
✓ Background Research
✓ PhD Requirements (Coursework, Qualifying Exam, etc.)
✓ Build preliminary framework model
✓ JCDL Doctoral Consortium
EXTENDED RESEARCH
• Research prevalence of private web archives
• Research access control methods in web archiving and other domains
• Investigate other access patterns and expound on those defined
• PhD Candidacy Exam describing merit of research plan
• Implement feedback received from candidacy exam committee
• Programmatically implement MMA and PWAA
CASE STUDIES (real-world application)
• Publicly Available Non-Aggregated Archives (e.g., Rhizome)
• Deep web preservation/access (bank account/Facebook feeds)
• DISSERTATION DEFENSE
Preliminary Publication Plan
JCDL 2016: Evaluation of User Access Patterns for Private Web Archives
TPDL 2016: Methods in adding JIT Inclusion of Private Web Archives in Memento
ACM SACMAT*: Research exploring tokenization and similar methods for archival access establishment
iPres 2016: Research investigating URI clash & other needed identifiers for distinguishing archived content from the “deep web” from archived content from the public live web.
* Symposium on Access Control Models and Technologies
Future Research Questions
• Can a PWAA perform content negotiation[1] on
the private-public spectrum?
• What level of security is needed?
– e.g., reporting UNAUTHORIZED vs. 0 mementos
[1] RFC2295 https://www.ietf.org/rfc/rfc2295.txt
Summation
• Why?
– No means exist to integrate private and public web archives.
• How to Evaluate?
– Does this framework fit real world needs? Scalable?
• When will I know I am done?
– Any public/private web archive* can be integrated.
* -compliant
Estermann Wikidata and Heritage Data 20170914Estermann Wikidata and Heritage Data 20170914
Estermann Wikidata and Heritage Data 20170914Beat Estermann
 
Linked Energy Data Generation
Linked Energy Data GenerationLinked Energy Data Generation
Linked Energy Data GenerationFilip Radulovic
 
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Anna Perricci
 
Essentials of Open Source Documentation
Essentials of Open Source DocumentationEssentials of Open Source Documentation
Essentials of Open Source DocumentationMoi Borah
 
BS 8878: Systematic Approaches to Documenting Web Accessibility Policies and ...
BS 8878: Systematic Approaches to Documenting Web Accessibility Policies and ...BS 8878: Systematic Approaches to Documenting Web Accessibility Policies and ...
BS 8878: Systematic Approaches to Documenting Web Accessibility Policies and ...lisbk
 

Ähnlich wie JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Public Web Archives (20)

Information sharing about Columbia University Library’s recent web archiving ...
Information sharing about Columbia University Library’s recent web archiving ...Information sharing about Columbia University Library’s recent web archiving ...
Information sharing about Columbia University Library’s recent web archiving ...
 
Building an Accessible Digital Institution
Building an Accessible Digital InstitutionBuilding an Accessible Digital Institution
Building an Accessible Digital Institution
 
Aggregating Private and Public Web Archives Using the Mementity Framework
Aggregating Private and Public Web Archives Using the Mementity FrameworkAggregating Private and Public Web Archives Using the Mementity Framework
Aggregating Private and Public Web Archives Using the Mementity Framework
 
Archiving Web-Based #musetech for Institutional Memory
Archiving Web-Based #musetech for Institutional MemoryArchiving Web-Based #musetech for Institutional Memory
Archiving Web-Based #musetech for Institutional Memory
 
Exploiting Wikipedia for Information Retrieval Tasks, SIGIR Tutorial
Exploiting Wikipedia for Information Retrieval Tasks, SIGIR TutorialExploiting Wikipedia for Information Retrieval Tasks, SIGIR Tutorial
Exploiting Wikipedia for Information Retrieval Tasks, SIGIR Tutorial
 
IR-AUDIT
IR-AUDITIR-AUDIT
IR-AUDIT
 
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-researchUc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
 
Capture All the URLs: First Steps in Web Archiving
Capture All the URLs: First Steps in Web ArchivingCapture All the URLs: First Steps in Web Archiving
Capture All the URLs: First Steps in Web Archiving
 
How you and your gateway can benefit from the services of the Science Gateway...
How you and your gateway can benefit from the services of the Science Gateway...How you and your gateway can benefit from the services of the Science Gateway...
How you and your gateway can benefit from the services of the Science Gateway...
 
Web archiving challenges and opportunities
Web archiving challenges and opportunitiesWeb archiving challenges and opportunities
Web archiving challenges and opportunities
 
JISC PoWR poster
JISC PoWR posterJISC PoWR poster
JISC PoWR poster
 
Unleashing library services with web 2.0 (ss)
Unleashing library services with web 2.0 (ss)Unleashing library services with web 2.0 (ss)
Unleashing library services with web 2.0 (ss)
 
METRO Conference 2014: How collaboration can save [more of] the web: recent p...
METRO Conference 2014: How collaboration can save [more of] the web: recent p...METRO Conference 2014: How collaboration can save [more of] the web: recent p...
METRO Conference 2014: How collaboration can save [more of] the web: recent p...
 
Time -Travel on the Internet
Time -Travel on the InternetTime -Travel on the Internet
Time -Travel on the Internet
 
Estermann Wikidata and Heritage Data 20170914
Estermann Wikidata and Heritage Data 20170914Estermann Wikidata and Heritage Data 20170914
Estermann Wikidata and Heritage Data 20170914
 
Linked Energy Data Generation
Linked Energy Data GenerationLinked Energy Data Generation
Linked Energy Data Generation
 
Web 2.0
Web 2.0Web 2.0
Web 2.0
 
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
 
Essentials of Open Source Documentation
Essentials of Open Source DocumentationEssentials of Open Source Documentation
Essentials of Open Source Documentation
 
BS 8878: Systematic Approaches to Documenting Web Accessibility Policies and ...
BS 8878: Systematic Approaches to Documenting Web Accessibility Policies and ...BS 8878: Systematic Approaches to Documenting Web Accessibility Policies and ...
BS 8878: Systematic Approaches to Documenting Web Accessibility Policies and ...
 

Mehr von Mat Kelly

Client-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer HeaderClient-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer HeaderMat Kelly
 
A Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web ArchivesA Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web ArchivesMat Kelly
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Mat Kelly
 
Exploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web ArchivesExploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web ArchivesMat Kelly
 
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Mat Kelly
 
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...Mat Kelly
 
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014Mat Kelly
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital PreservationMat Kelly
 
Archive What I See Now - Archive-It Partner Meeting 2013 2013
Archive What I See Now - Archive-It Partner Meeting 2013 2013Archive What I See Now - Archive-It Partner Meeting 2013 2013
Archive What I See Now - Archive-It Partner Meeting 2013 2013Mat Kelly
 
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction SystemIEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction SystemMat Kelly
 
Digital Preservation 2013
Digital Preservation 2013Digital Preservation 2013
Digital Preservation 2013Mat Kelly
 
Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving
Making Enterprise-Level Archive Tools Accessible for Personal Web ArchivingMaking Enterprise-Level Archive Tools Accessible for Personal Web Archiving
Making Enterprise-Level Archive Tools Accessible for Personal Web ArchivingMat Kelly
 
An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...Mat Kelly
 
The Revolution Will Not Be Archived
The Revolution Will Not Be ArchivedThe Revolution Will Not Be Archived
The Revolution Will Not Be ArchivedMat Kelly
 
WARCreate - Create Wayback-Consumable WARC Files from Any Webpage
WARCreate - Create Wayback-Consumable WARC Files from Any WebpageWARCreate - Create Wayback-Consumable WARC Files from Any Webpage
WARCreate - Create Wayback-Consumable WARC Files from Any WebpageMat Kelly
 
NDIIPP/NDSA 2011 - YouTube Link Restoration
NDIIPP/NDSA 2011 - YouTube Link RestorationNDIIPP/NDSA 2011 - YouTube Link Restoration
NDIIPP/NDSA 2011 - YouTube Link RestorationMat Kelly
 
NDIIPP/NDSA 2011 - Archive Facebook
NDIIPP/NDSA 2011 - Archive FacebookNDIIPP/NDSA 2011 - Archive Facebook
NDIIPP/NDSA 2011 - Archive FacebookMat Kelly
 

Mehr von Mat Kelly (17)

Client-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer HeaderClient-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer Header
 
A Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web ArchivesA Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web Archives
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count
 
Exploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web ArchivesExploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web Archives
 
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
 
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
 
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital Preservation
 
Archive What I See Now - Archive-It Partner Meeting 2013 2013
Archive What I See Now - Archive-It Partner Meeting 2013 2013Archive What I See Now - Archive-It Partner Meeting 2013 2013
Archive What I See Now - Archive-It Partner Meeting 2013 2013
 
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction SystemIEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
 
Digital Preservation 2013
Digital Preservation 2013Digital Preservation 2013
Digital Preservation 2013
 
Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving
Making Enterprise-Level Archive Tools Accessible for Personal Web ArchivingMaking Enterprise-Level Archive Tools Accessible for Personal Web Archiving
Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving
 
An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...
 
The Revolution Will Not Be Archived
The Revolution Will Not Be ArchivedThe Revolution Will Not Be Archived
The Revolution Will Not Be Archived
 
WARCreate - Create Wayback-Consumable WARC Files from Any Webpage
WARCreate - Create Wayback-Consumable WARC Files from Any WebpageWARCreate - Create Wayback-Consumable WARC Files from Any Webpage
WARCreate - Create Wayback-Consumable WARC Files from Any Webpage
 
NDIIPP/NDSA 2011 - YouTube Link Restoration
NDIIPP/NDSA 2011 - YouTube Link RestorationNDIIPP/NDSA 2011 - YouTube Link Restoration
NDIIPP/NDSA 2011 - YouTube Link Restoration
 
NDIIPP/NDSA 2011 - Archive Facebook
NDIIPP/NDSA 2011 - Archive FacebookNDIIPP/NDSA 2011 - Archive Facebook
NDIIPP/NDSA 2011 - Archive Facebook
 

KĂźrzlich hochgeladen

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Christopher Logan Kennedy
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 

KĂźrzlich hochgeladen (20)

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 

JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Public Web Archives

  • 1. A Framework for Aggregating Private and Public Web Archives. Mat Kelly, Old Dominion University, Norfolk, VA. Advisor: Michele C. Weigle. JCDL 2015 Doctoral Consortium, June 21, 2015
  • 2. The Problem [figure labels: private archive; private archive; other private archive; other private archive]
  • 3. Not All Archives Can Be Aggregated [figure labels: private archive; private archive; other private archive; other private archive; TimeMap]
  • 11. t = k ≠ t = k-1
  • 14. TimeMap: 1 year ago, 2 years ago, 10 years ago, …, 180 years ago
  • 15. [figure label: private archive]
  • 16. Proactive Preservation • Just-in-time WARC creation • Personal and potentially private web archiving • Mitigates deferral problem
  • 17. Public vs. Private Web Archiving • Public Web Archiving – Relies on deferred capture – Uses WARC, Memento, etc. – Integrates with other public web archives • Private Web Archiving – Same tools, less overhead, less bureaucracy – Uses WARC, Memento, etc. – Does not integrate
  • 18. Typical Web Archive Access 1. Web User Interface 2. Memento TimeGate (URI-G) and TimeMap – Accept-Datetime (content negotiation)
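The datetime negotiation mentioned on slide 18 can be sketched as follows. This is a minimal illustration, not code from the talk: the function names `accept_datetime_header` and `memento_datetime` are hypothetical, and only the header names `Accept-Datetime` and `Memento-Datetime` come from the Memento protocol itself. No network request is made; the sketch only builds and reads headers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime_header(when: datetime) -> dict:
    """Build the request header a client sends to a TimeGate (URI-G)."""
    # Memento datetimes use the HTTP-date format and are always in GMT.
    when_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


def memento_datetime(response_headers: dict) -> datetime:
    """Read the capture time that a memento reports in its response headers."""
    return parsedate_to_datetime(response_headers["Memento-Datetime"])
```

A client would pass the returned header with a GET request to the TimeGate and then follow the redirect to the capture closest to the requested time.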
  • 19. Aggregating Multiple Web Archives • Memento Aggregator – Temporally sorted TimeMap combined from multiple archives – Allows temporal gaps in one archive to be filled in by another
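The aggregation described on slide 19 amounts to merging per-archive TimeMaps into one temporally sorted list. A minimal sketch, assuming each TimeMap is modeled as a list of `(capture time, capture URI)` pairs; the `Memento` alias and `aggregate_timemaps` name are illustrative and not taken from any aggregator implementation:

```python
import heapq
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical model of one TimeMap entry: (capture time, URI of the capture).
Memento = Tuple[datetime, str]


def aggregate_timemaps(timemaps: Iterable[List[Memento]]) -> List[Memento]:
    """Merge per-archive TimeMaps into one temporally sorted TimeMap."""
    # heapq.merge expects sorted inputs, so each TimeMap is sorted first.
    return list(heapq.merge(*(sorted(tm) for tm in timemaps)))
```

With one archive holding captures from 2005 and 2015 and another holding a capture from 2010, the merged TimeMap lists 2005, 2010, 2015: the second archive fills the gap in the first.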
  • 20. Archive Supplementation • More captures mean greater temporal coverage • Content on the Deep Web • A large chunk of the Web is not preserved – Tools' inability to capture it – Inconsistency over time due to personalization
  • 21. Concerns in Aggregating Private Web Archives • Privacy • Inconsistency of page representation – A URI is an insufficient key for access • Archival integrity – Has a private archive's content been manipulated?
  • 22. Why Individuals Might Want Personalized Aggregations • Show my private web archive captures • Concerned about exposing sensitive info to the public – But still want to view captures temporally inline • Private/restricted archives are becoming ever more common
  • 23. Temporal Supplementation
  • 24. My Archives Have What They May Have Missed
  • 25. The Concerns Distilled • Access control – And indicators for private web archives (PWAs) • Preservation of private content • Interoperability without privacy compromise
  • 26. Web Archive Usage Pattern 1: Direct Access [figure labels: OR; TimeMap]
  • 27. Web Archive Usage Pattern 2: Web Archive Aggregation • Better results for a URI due to more sources for capture [figure label: TimeMap]
  • 28. Previous Patterns: Status Quo • Patterns 1 and 2 are the status quo – provided by the framework • Querying web archives currently considers only public web content – URI for lookup • Framework introduces 2 new entities – Memento Meta Aggregator (MMA) – Private Web Archive Adapter (PWAA)
  • 29. Memento Meta Aggregator (MMA) • Functional superset of a Memento Aggregator (MA) • Can act as an intermediary client to relay MA results to the ultimate user • Allows just-in-time (JIT) inclusion of archives – as specified at query time • Set of archives aggregated can be dynamic – e.g., results must not include IA
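The query-time behavior listed on slide 29 can be sketched as below. This is an assumption-laden illustration rather than the framework's actual interface: the `MetaAggregator` class, its `include` and `exclude` parameters, and the modeling of an archive as a function from a URI to a list of capture identifiers are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical model: an archive maps a URI to a list of capture identifiers.
Archive = Callable[[str], List[str]]


class MetaAggregator:
    """Sketch of an aggregator whose archive set is decided per query."""

    def __init__(self, default_archives: Dict[str, Archive]):
        self.default_archives = dict(default_archives)

    def query(
        self,
        uri: str,
        include: Optional[Dict[str, Archive]] = None,
        exclude: Iterable[str] = (),
    ) -> List[str]:
        # Start from the defaults, add just-in-time archives, drop exclusions.
        sources = dict(self.default_archives)
        sources.update(include or {})
        for name in exclude:
            sources.pop(name, None)
        results: List[str] = []
        for name in sorted(sources):
            results.extend(sources[name](uri))
        return results
```

Passing `exclude=["ia"]` corresponds to the slide's example that results must not include IA, while `include` adds an archive that the aggregator did not know about in advance.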
  • 30. Aggregating My Captures [figure labels: My CNN captures; My bank captures; Various public web archives; My web archives]
  • 31. The Current Memento Aggregator [figure labels: My CNN captures; My bank captures; 100; 30; 10]
  • 32. Accessing the Aggregator [figure labels: My CNN captures; My bank captures; 100; 30; 10]
  • 33. Accessing the Aggregator …does not include our archives [figure labels: My CNN captures (not aggregated); My bank captures (not aggregated); 100; 30; 10; 140]
  • 34. Pattern 3: Aggregator Relay • Access via the Meta Aggregator (diagram: MY CNN CAPTURES, MY BANK CAPTURES; capture counts 100, 30, 10; totals 140, 140)
  • 35. Web Archive Usage Pattern 4: Including Additional Archives in Aggregation • Access via the Meta Aggregator …allows our archives to be included (diagram: MY CNN CAPTURES, MY BANK CAPTURES; capture counts 100, 30, 10, 15; totals 140, 155)
  • 36. MMAs Allow Our Public Captures to be Shared (diagram: MY CNN CAPTURES, MY BANK CAPTURES; capture counts 100, 30, 10, 15; totals 140, 155, 155, 155)
  • 37. Web Archive Usage Pattern 5: Recursive MMA Access (diagram: MY CNN CAPTURES, MY BANK CAPTURES, …, Bob’s public captures, the organization’s public captures 1, the organization’s public captures 2; nested MMAs labeled "contains A B C D", "contains B C D", "contains C D"; capture counts 10, 5, 15, 15, 20, 35, 35, 15, 50, 50)
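Recursive access can be read as one MMA treating another MMA as just another source. The sketch below assumes that reading; `Archive`, `MMA`, `collect`, and `make_archive` are hypothetical names, and the assignment of the slide's counts to sources A through D is an assumption made only so the totals 15, 35, and 50 can be reproduced.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Archive:
    """Leaf source: a named archive holding a set of memento URIs."""
    name: str
    mementos: Set[str]


@dataclass
class MMA:
    """Hypothetical meta aggregator whose sources are archives or other MMAs."""
    name: str
    sources: List[Union["Archive", "MMA"]] = field(default_factory=list)


def collect(source: Union[Archive, MMA]) -> Set[str]:
    """Return every memento reachable from a source, descending into nested MMAs."""
    if isinstance(source, Archive):
        return set(source.mementos)
    result: Set[str] = set()
    for child in source.sources:
        result |= collect(child)  # set union also drops duplicates
    return result


def make_archive(name: str, count: int) -> Archive:
    """Build a stand-in archive with `count` made-up memento URIs."""
    return Archive(name, {f"http://{name}.example/{i}" for i in range(count)})
```

Using a set union means a memento reported by two nested aggregators is counted once; whether the framework deduplicates this way is not stated on the slide.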
  • 38. New Framework Entity 1: Memento Meta Aggregator • Allow dynamic and JIT set of archives • Superset can be recursively constructed • Sets can be shared • My public captures can be integrated with those of public web archives
  • 39. Private Web Archive Adapter (PWAA) • Regulates access to Private Web Archives (PWAs) • Acts as token authorizer • With credentials OK, relays results as if querying the PWA directly
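A PWAA's two roles (token authorizer and relay) might look like the sketch below. It is an illustration under assumptions: `PrivateWebArchiveAdapter`, `get_token`, and `get_mementos` are hypothetical names, the archive is a plain function, and returning an empty list on a bad token is only one of the denial behaviors the deck considers.

```python
import secrets
from typing import Callable, List, Optional, Set


class PrivateWebArchiveAdapter:
    """Hypothetical PWAA: exchanges a key for a reusable token, then relays queries."""

    def __init__(self, archive: Callable[[str], List[str]], authorized_keys: Set[str]) -> None:
        self._archive = archive
        self._authorized_keys = set(authorized_keys)
        self._tokens: Set[str] = set()

    def get_token(self, key: str) -> Optional[str]:
        """Return a new reusable token for an authorized key, else None."""
        if key not in self._authorized_keys:
            return None
        token = secrets.token_hex(4)
        self._tokens.add(token)
        return token

    def get_mementos(self, uri_r: str, token: Optional[str]) -> List[str]:
        """Relay the query to the private archive only when the token is known."""
        if token is None or token not in self._tokens:
            return []  # assumption: denial is reported as zero mementos
        return self._archive(uri_r)
```

Tokens are kept in memory and never expire here; expiry, revocation, and persistent storage are left out of the sketch.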
  • 40. User Establishes Access with PWA • GET TOKEN for PWA, Key: abcd1234 (diagram: MY CNN CAPTURES, MY BANK CAPTURES; capture counts 100, 30, 10; 3 captures; 10,000 captures)
  • 41. MMA Relays Request • GET TOKEN for PWA, Key: abcd1234 (diagram: MY CNN CAPTURES, MY BANK CAPTURES; capture counts 100, 30, 10; 3 captures; 10,000 captures)
  • 42. PWAA Accepts Request, Generates Reusable Token • ACCESS OK, Token: 4f33c64 (diagram: MY CNN CAPTURES, MY BANK CAPTURES; capture counts 100, 30, 10; 3 captures; 10,000 captures)
  • 43. User Submits Request for URI-R with Token • GET mementos for URI, Token: 4f33c64 (diagram: MY CNN CAPTURES, MY BANK CAPTURES; capture counts 100, 30, 10; 3 captures; 10,000 captures)
  • 44. MMA Relays Request (again) • GET mementos for URI, Token: 4f33c64 (diagram: MY CNN CAPTURES, MY BANK CAPTURES; capture counts 100, 30, 10; 3 captures; 10,000 captures)
  • 45. PWAA Verifies & Relays Request; MA Gets Mementos, per usual • Token: 4f33c64 OK • GET mementos for URI • GET mementos for URI (diagram: MY CNN CAPTURES, MY BANK CAPTURES; capture counts 100, 30, 10; 3 captures; 10,000 captures)
  • 46. Archives Return Mementos • Token: 4f33c64 OK • Returning mementos • Return mementos for URI (diagram: MY CNN CAPTURES, MY BANK CAPTURES; capture counts 100, 30, 10; 3 captures; 10,000 captures)
  • 47. PWAA Relays TimeMap (diagram: MY CNN CAPTURES, MY BANK CAPTURES; three TimeMaps; capture counts 100, 30, 10; 3 captures; 10,000 captures; 140 captures; totals 140, 10,000, 10,143)
  • 48. MMA Annotates and Aggregates (diagram: MY CNN CAPTURES, MY BANK CAPTURES; three TimeMaps; capture counts 100, 30, 10; 140 captures; 3 captures; 10,000 captures; total 10,143)
  • 49. Web Archive Usage Pattern 6: Aggregating Public & Private Archives (diagram: MY CNN CAPTURES, MY BANK CAPTURES; TimeMap; capture counts 100, 30, 10; 3 captures; 10,000 captures; total 10,143 captures)
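The "annotates and aggregates" step can be pictured as tagging each memento with its source before merging the TimeMaps chronologically. The sketch below is an assumption-laden illustration: the `Memento` record, `annotate`, `aggregate`, and `fake_entries` are hypothetical, and only the counts (140 public, 3 and 10,000 private, 10,143 in total) come from the slides.

```python
from datetime import datetime, timedelta
from heapq import merge
from typing import Iterable, List, NamedTuple, Tuple


class Memento(NamedTuple):
    when: datetime   # capture datetime
    uri_m: str       # memento URI
    source: str      # which aggregator or adapter supplied it
    private: bool    # annotation added by the MMA


def annotate(entries: Iterable[Tuple[datetime, str]], source: str, private: bool) -> List[Memento]:
    """Tag raw (datetime, URI-M) pairs with their source and return them sorted."""
    return sorted(Memento(when, uri_m, source, private) for when, uri_m in entries)


def aggregate(*timemaps: List[Memento]) -> List[Memento]:
    """Merge already-sorted TimeMaps into one chronological TimeMap."""
    return list(merge(*timemaps))


def fake_entries(prefix: str, count: int) -> List[Tuple[datetime, str]]:
    """Generate `count` made-up captures, one minute apart."""
    start = datetime(2015, 1, 1)
    return [(start + timedelta(minutes=i), f"{prefix}/{i}") for i in range(count)]
```

Keeping the `private` flag on each entry lets a client later decide how to present or filter private captures without re-querying the sources.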
  • 50. Regulated Access Can Be Shared • GET mementos for URI, Token: 4f33c64 • GET mementos for URI, Token: c5463b4 • GET TOKEN for PWA, Key: 2265eef3 • No/invalid token returned • Access denied or 0 mementos (diagram: MY CNN CAPTURES, MY BANK CAPTURES; 3 captures; 10,000 captures)
  • 51. Aggregating Multiple PWAs • GET TOKENs for PWAs – Key: abcd1234, Archive: My – Key: cab45cbf, Archive: Linda – Key: b0b01b, Archive: Bob (diagram: MY BANK CAPTURES, 3 captures; Linda’s Private Captures, 5 captures; Bob’s Private Captures, 10 captures)
  • 52. Aggregating Multiple PWAs • Access OK, Token: 7790ca • Access OK, Token: b0b01b • ACCESS DENIED (diagram: MY BANK CAPTURES, 3 captures; Linda’s Private Captures, 5 captures; Bob’s Private Captures, 10 captures)
  • 53. PWAs Can Then be Aggregated • GET mementos for URI – Token: 7790ca, Archive: My – Token: null, Archive: Linda – Token: b0b01b, Archive: Bob (diagram: MY BANK CAPTURES, 3 captures; Linda’s Private Captures, 5 captures; Bob’s Private Captures, 10 captures; returned 3, ø, and 10; aggregated total 13)
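The multi-PWA request above carries one token per archive, and an archive with a null token simply contributes nothing. A minimal sketch of that behavior follows; the `PWAS` table, `query_pwa`, and `aggregate_pwas` are hypothetical, and Linda's valid token is invented because the slides never show one being issued.

```python
from typing import Dict, List, Optional, Tuple

# Assumed stand-ins for three PWAAs: archive name -> (valid token, capture count).
# Tokens and counts for "My" and "Bob" are the ones shown on the slides;
# Linda's token is a made-up placeholder.
PWAS: Dict[str, Tuple[str, int]] = {
    "My": ("7790ca", 3),
    "Linda": ("hypothetical-token", 5),
    "Bob": ("b0b01b", 10),
}


def query_pwa(archive: str, token: Optional[str], uri_r: str) -> List[str]:
    """Return the archive's mementos for uri_r, or nothing if the token is wrong."""
    valid_token, count = PWAS[archive]
    if token != valid_token:
        return []  # denied: shown as ø on the slide
    return [f"{archive}:{uri_r}#{i}" for i in range(count)]


def aggregate_pwas(uri_r: str, tokens: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Query each named archive with its own token and keep results per archive."""
    return {archive: query_pwa(archive, token, uri_r) for archive, token in tokens.items()}
```

With tokens for "My" and "Bob" and `None` for "Linda", the per-archive results have 3, 0, and 10 entries, matching the slide's total of 13.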
  • 54. Sample TimeMap ... , <http://web.archive.org/web/20150228155703/https://facebook.com/>;rel="memento"; datetime="Sat, 28 Feb 2015 15:57:03 GMT" , <http://web.archive.org/web/20150228163939/http://www.facebook.com/>;rel="memento"; datetime="Sat, 28 Feb 2015 16:39:39 GMT" , <http://web.archive.org/web/20150303162841/https://www.facebook.com/>;rel="memento"; datetime="Tue, 03 Mar 2015 16:28:41 GMT" , <http://users2machine.local/web/20150305000101/https://www.facebook.com/>;rel="memento"; datetime="Thu, 05 Mar 2015 00:01:00 GMT"; key="e395935019ee467c797034ee410cc91e" , <//wayback.archive-it.org/all/20150305215922/https://facebook.com/>;rel="memento"; datetime="Thu, 05 Mar 2015 21:59:22 GMT" , <http://previouslyUnaggregated.org/web/20150306123457/https://www.facebook.com/>;rel="memento"; datetime="Fri, 06 Mar 2015 12:34:57 GMT" , <http://web.archive.org/web/20150310140721/https://www.facebook.com/>;rel="memento"; datetime="Tue, 10 Mar 2015 14:07:21 GMT" ... (diagram label: TimeMap)
  • 55. Access Token Included in TimeMap ... , <http://web.archive.org/web/20150228155703/https://facebook.com/>;rel="memento"; datetime="Sat, 28 Feb 2015 15:57:03 GMT" , <http://web.archive.org/web/20150228163939/http://www.facebook.com/>;rel="memento"; datetime="Sat, 28 Feb 2015 16:39:39 GMT" , <http://web.archive.org/web/20150303162841/https://www.facebook.com/>;rel="memento"; datetime="Tue, 03 Mar 2015 16:28:41 GMT" , <http://users2machine.local/web/20150305000101/https://www.facebook.com/>;rel="memento"; datetime="Thu, 05 Mar 2015 00:01:00 GMT"; key="e395935019ee467c797034ee410cc91e" , <//wayback.archive-it.org/all/20150305215922/https://facebook.com/>;rel="memento"; datetime="Thu, 05 Mar 2015 21:59:22 GMT" , <http://previouslyUnaggregated.org/web/20150306123457/https://www.facebook.com/>;rel="memento"; datetime="Fri, 06 Mar 2015 12:34:57 GMT" , <http://web.archive.org/web/20150310140721/https://www.facebook.com/>;rel="memento"; datetime="Tue, 10 Mar 2015 14:07:21 GMT" ... (callout: MY PRIVATE FACEBOOK CAPTURES)
  • 56. My Public Web Archive, Now Aggregated ... , <http://web.archive.org/web/20150228155703/https://facebook.com/>;rel="memento"; datetime="Sat, 28 Feb 2015 15:57:03 GMT" , <http://web.archive.org/web/20150228163939/http://www.facebook.com/>;rel="memento"; datetime="Sat, 28 Feb 2015 16:39:39 GMT" , <http://web.archive.org/web/20150303162841/https://www.facebook.com/>;rel="memento"; datetime="Tue, 03 Mar 2015 16:28:41 GMT" , <http://users2machine.local/web/20150305000101/https://www.facebook.com/>;rel="memento"; datetime="Thu, 05 Mar 2015 00:01:00 GMT"; key="e395935019ee467c797034ee410cc91e" , <//wayback.archive-it.org/all/20150305215922/https://facebook.com/>;rel="memento"; datetime="Thu, 05 Mar 2015 21:59:22 GMT" , <http://previouslyUnaggregated.org/web/20150306123457/https://www.facebook.com/>;rel="memento"; datetime="Fri, 06 Mar 2015 12:34:57 GMT" , <http://web.archive.org/web/20150310140721/https://www.facebook.com/>;rel="memento"; datetime="Tue, 10 Mar 2015 14:07:21 GMT" ... (callout: MY PUBLIC FACEBOOK CAPTURES)
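The sample TimeMaps above use the link-format style of Memento TimeMaps, with an extra `key` attribute on the private entry. The sketch below shows one way a client might read such entries and pick out those carrying a `key`. It is not a full link-format parser: the regular expressions, `parse_timemap`, and `private_mementos` are illustrative assumptions, and treating `key` as the marker of a private memento is an inference from the slides.

```python
import re
from typing import Dict, List

# One entry: <URI> followed by zero or more ;name="value" attributes.
ENTRY = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM = re.compile(r'([\w-]+)="([^"]*)"')


def parse_timemap(text: str) -> List[Dict[str, str]]:
    """Turn link-format text into a list of dicts with 'uri' plus its attributes."""
    entries = []
    for match in ENTRY.finditer(text):
        entry = {"uri": match.group(1)}
        entry.update(dict(PARAM.findall(match.group(2))))
        entries.append(entry)
    return entries


def private_mementos(entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep memento entries that carry a 'key' attribute."""
    return [e for e in entries if e.get("rel") == "memento" and "key" in e]


# Two entries copied from the sample TimeMap on the slides.
SAMPLE = (
    '<http://web.archive.org/web/20150303162841/https://www.facebook.com/>;rel="memento"; '
    'datetime="Tue, 03 Mar 2015 16:28:41 GMT" , '
    '<http://users2machine.local/web/20150305000101/https://www.facebook.com/>;rel="memento"; '
    'datetime="Thu, 05 Mar 2015 00:01:00 GMT"; key="e395935019ee467c797034ee410cc91e"'
)
```

Matching attributes as quoted `name="value"` pairs avoids splitting on the commas that appear inside `datetime` values.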
  • 57. Evaluation Plan • How effective is the Framework? • Scalability ramifications of additional infrastructure? • Is public-private tokenization the most suitable method for persistent access? • How can a single archive be sub-divided between private/public and access controlled?
  • 58. Previous Work • PUBLICATIONS – Preservation and Replay: PDA 2013 - Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving; JCDL 2012 - WARCreate - Create Wayback-Consumable WARC Files from Any Webpage – Evaluating Capture: IJDL 2015 - Not All Mementos Are Created Equal: Measuring The Impact Of Missing Resources; IJDL 2015 - The Impact of JavaScript on Archivability; JCDL 2014 - Not All Mementos Are Created Equal: Measuring The Impact Of Missing Resources; JCDL 2014 - The Archival Acid Test: Evaluating Archive Performance on Advanced HTML and JavaScript; D-Lib 2013 - A Method for Identifying Personalized Representations in the Archives; TPDL 2013 - On the Change in Archivability of Websites Over Time – Archival Integration: JCDL 2015 - Mobile Mink: Merging Mobile and Desktop Archived Webs; JCDL 2014 - Mink: Integrating the Live and Archived Web Viewing Experience Using Web Browsers and Memento • SOFTWARE PRODUCTS – WARCreate: preserve from the browser – WAIL: private web archiving all-in-one suite – Mink: integrate the live and archived web
  • 59. Current Work • Other approaches to archival lookup beyond URI • Appropriate metadata to indicate private web content in WARC files • Existing integration attempts by private web archives & individuals
  • 60. Dissertation Plan • Background Research • PhD Requirements (Coursework, Qualifying Exam, etc.) • Build preliminary framework model • JCDL Doctoral Consortium • EXTENDED RESEARCH – Research prevalence of private web archives – Research access control methods in web archiving and other domains – Investigate other access patterns and expound on those defined – PhD Candidacy Exam describing merit of research plan – Implement feedback received from candidacy exam committee – Programmatically implement MMA and PWAA • CASE STUDIES (real-world application) – Publicly Available Non-Aggregated Archives (e.g., Rhizome) – Deep web preservation/access (bank account/Facebook feeds) • DISSERTATION DEFENSE
  • 61. Preliminary Publication Plan • JCDL 2016: Evaluation of User Access Patterns for Private Web Archives • TPDL 2016: Methods for adding JIT Inclusion of Private Web Archives in Memento • ACM SACMAT*: Research exploring tokenization and similar methods for archival access establishment • iPres 2016: Research investigating URI clash & other needed identifiers for distinguishing archived content from the “deep web” from archived content from the public live web • * Symposium on Access Control Models and Technologies
  • 62. Future Research Questions • Can a PWAA perform content negotiation[1] on the private-public spectrum? • What level of security is needed? – e.g., reporting UNAUTHORIZED vs. 0 mementos • [1] RFC 2295 https://www.ietf.org/rfc/rfc2295.txt
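The "UNAUTHORIZED vs. 0 mementos" question is about how much a denial should reveal. The sketch below contrasts the two options; `DenialPolicy` and `respond` are hypothetical names, and the status codes (401 for an explicit denial, 200 with an empty list for a silent one) are assumptions, since the slides do not specify them.

```python
from enum import Enum
from http import HTTPStatus
from typing import List, Tuple


class DenialPolicy(Enum):
    REPORT_UNAUTHORIZED = "unauthorized"  # tell the client access was denied
    REPORT_EMPTY = "empty"                # answer as if the archive held nothing


def respond(token_valid: bool, mementos: List[str], policy: DenialPolicy) -> Tuple[HTTPStatus, List[str]]:
    """Return (status, mementos) for a PWAA query under the chosen denial policy."""
    if token_valid:
        return HTTPStatus.OK, mementos
    if policy is DenialPolicy.REPORT_UNAUTHORIZED:
        return HTTPStatus.UNAUTHORIZED, []
    return HTTPStatus.OK, []
```

Under `REPORT_EMPTY` a client cannot distinguish "no captures exist" from "captures exist but are withheld", which hides the existence of private captures at the cost of less informative errors.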
  • 63. Summation • Why? – No means exists to integrate private and public web archives. • How to Evaluate? – Does this framework fit real world needs? Scalable? • When will I know I am done? – Any public/private web archive* can be integrated. • * -compliant
  • 64. References • D. Abrams, R. Baecker, and M. Chignell. Information Archiving with Bookmarks: Personal Web Space Construction and Archiving. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 41–48, 1998. • A. AlSum, M. Weigle, M. Nelson, and H. Van de Sompel. Profiling Web Archive Coverage for Top-Level Domain and Content Language. International Journal on Digital Libraries, 14(3-4):149–166, 2014. • J. F. Brunelle, M. Kelly, H. SalahEldeen, M. C. Weigle, and M. L. Nelson. Not All Mementos Are Created Equal: Measuring The Impact Of Missing Resources. In Proceedings of JCDL, pages 321–330, London, England, 2014. • J. F. Brunelle, M. Kelly, M. C. Weigle, and M. L. Nelson. The Impact of JavaScript on Archivability. International Journal on Digital Libraries, pages 1–23, 2015. • J. F. Brunelle and M. L. Nelson. An Evaluation of Caching Policies for Memento TimeMaps. In Proceedings of JCDL, pages 267–276, 2013. • D. Gomes, S. Freitas, and M. J. Silva. Design and Selection Criteria for a National Web Archive. In Research and Advanced Technology for Digital Libraries, pages 196–207. Springer, 2006. • D. Hardt. The OAuth 2.0 Authorization Framework. IETF RFC 6749, October 2012. • M. Jones and D. Hardt. The OAuth 2.0 Authorization Framework: Bearer Token Usage. IETF RFC 6750, October 2012. • M. Kelly, J. F. Brunelle, M. C. Weigle, and M. L. Nelson. A Method for Identifying Personalized Representations in the Archives. D-Lib Magazine, 19(11/12), Nov/Dec 2013. • M. Kelly, J. F. Brunelle, M. C. Weigle, and M. L. Nelson. On the Change in Archivability of Websites Over Time. In Proceedings of the International Conference on Theory and Practice of Digital Libraries (TPDL), pages 35–47, Valletta, Malta, 2013. • M. Kelly, M. L. Nelson, and M. C. Weigle. Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving Using XAMPP. Poster and demo presented at Personal Digital Archiving, February 2013. • M. Kelly, M. L. Nelson, and M. C. Weigle. The Archival Acid Test: Evaluating Archive Performance on Advanced HTML and JavaScript. In Proceedings of JCDL, pages 25–28, London, England, September 2014.
  • 65. References • M. Kelly and M. C. Weigle. WARCreate - Create Wayback-Consumable WARC Files from Any Webpage. In Proceedings of JCDL, pages 437–438, Washington, DC, June 2012. • C. C. Marshall. Rethinking Personal Digital Archiving, Part 1. D-Lib Magazine, 14(3/4), Mar/Apr 2008. • C. C. Marshall. Rethinking Personal Digital Archiving, Part 2. D-Lib Magazine, 14(3/4), Mar/Apr 2008. • J. Niu. Functionalities of Web Archives. D-Lib Magazine, 18(3/4), Mar/Apr 2012. • M. Phillips. PANDORA, Australia’s Web Archive, and the Digital Archiving System that Supports It. http://pandora.nla.gov.au/pandas.html, 2003. • H. C.-H. Rao, Y.-F. Chen, and M.-F. Chen. A Proxy-based Personal Web Archiving Service. SIGOPS Oper. Syst. Rev., 35(1):61–72, Jan. 2001. • A. Rauber, M. Kaiser, and B. Wachter. Ethical Issues in Web Archive Creation and Usage-Towards a Research Agenda. In 8th International Web Archiving Workshop (IWAW08), 2008. • D. Rosenthal. Re-thinking Memento Aggregation. http://blog.dshr.org/2013/03/re-thinking-memento-aggregation.html, 2013. • T. Schwarz, M. Baker, S. Bassi, B. Baumgart, W. Flagg, C. van Ingen, K. Joste, M. Manasse, and M. Shah. Disk Failure Investigations at the Internet Archive. In Work-in-Progress session, NASA/IEEE Conference on Mass Storage Systems and Technologies (MSST2006), 2006. • S. Strodl, F. Motlik, K. Stadler, and A. Rauber. Personal & Soho Archiving. In Proceedings of JCDL, pages 115–123, 2008. • M. Thelwall and L. Vaughan. A fair history of the Web? Examining country balance in the Internet Archive. Library & Information Science Research, 26(2):162–176, 2004. • B. Tofel. ‘Wayback’ for Accessing Web Archives. In 7th International Web Archiving Workshop (IWAW07), 2007. • H. Van de Sompel, M. Nelson, and R. Sanderson. HTTP Framework for Time-Based Access to Resource States – Memento. IETF RFC 7089, December 2013. • T. Wang, M. Srivatsa, and L. Liu. Fine-Grained Access Control of Personal Data. In Proceedings of the 17th ACM Symposium on Access Control Models and Technologies, pages 145–156, 2012.
  • 66. A Framework for Aggregating Private and Public Web Archives Mat Kelly Old Dominion University, Norfolk, VA Advisor: Michele C. Weigle JCDL 2015 Doctoral Consortium June 21, 2015