When Should I Make Preservation Copies of Myself?
Charles L. Cartledge and Michael L. Nelson 
Old Dominion University 
Department of Computer Science 
Norfolk, VA 23529 USA 
JCDL 2014 
London, UK 
September 9, 2014
Unsupervised Small-World (USW) has multiple areas of interest
Preservation via benign neglect
Handwritten on the back of the photo: 
“Josie McClure picture taken Feb 30, 1907 at Poteau, I.T. 
Fifteen years of age When this was taken weighed 140 lbs.” 
(cultural context needed to make sense of the annotation!)
Will Josie last 100+ years as a web object (WO) on Flickr, Photobucket, et al.?
Crowdsourcing preservation
• "Everyone is a curator …"
  – Crowdsourced activity
  – Unscheduled
  – Willing to wait a long time
• Enlist humans in creation and maintenance: the opposite of benign neglect
Frank McCown, Michael L. Nelson, and Herbert Van de Sompel, Everyone is a Curator: Human-Assisted Preservation for ORE Aggregations, Proceedings of DigCCurr 2009. http://arxiv.org/abs/0901.4571
See also: http://ws-dl.blogspot.com/2013/10/2013-10-23-preserve-me-if-you-can-using.html
Emergent behavior: flocking boids
• Craig Reynolds: the basis of herd and flock behavior in computer animation
  – 3 rules
    • Collision avoidance
    • Velocity matching
    • Flock centering
  – No central control; everything is based on local knowledge only
• Simple rules produce complex, emergent behavior
[Figure: illustrations of collision avoidance, velocity matching, and flock centering; images from http://www.red3d.com/cwr/boids/]
Craig W. Reynolds, Computer Animation with Scripts and Actors, ACM SIGGRAPH, vol. 16, ACM, 1982, pp. 289-296.
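To make the three rules concrete, here is a minimal sketch of one boid's steering update in 2D. The `Boid` and `steer` names, the neighbor radius `min_dist`, and the rule weights are illustrative assumptions; this shows the general technique only, not Reynolds' code or the USW simulator.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def steer(boid: Boid, neighbors: List[Boid], min_dist: float = 1.0,
          w_sep: float = 1.0, w_align: float = 0.1, w_coh: float = 0.01) -> Tuple[float, float]:
    """Velocity change for one boid, computed only from its local neighbors."""
    if not neighbors:
        return (0.0, 0.0)
    n = len(neighbors)
    # Collision avoidance: move away from neighbors closer than min_dist.
    sep_x = sep_y = 0.0
    for other in neighbors:
        dx, dy = boid.x - other.x, boid.y - other.y
        if (dx * dx + dy * dy) ** 0.5 < min_dist:
            sep_x += dx
            sep_y += dy
    # Velocity matching: steer toward the neighbors' average velocity.
    align_x = sum(o.vx for o in neighbors) / n - boid.vx
    align_y = sum(o.vy for o in neighbors) / n - boid.vy
    # Flock centering: steer toward the neighbors' center of mass.
    coh_x = sum(o.x for o in neighbors) / n - boid.x
    coh_y = sum(o.y for o in neighbors) / n - boid.y
    return (w_sep * sep_x + w_align * align_x + w_coh * coh_x,
            w_sep * sep_y + w_align * align_y + w_coh * coh_y)


# Usage: a distant neighbor moving right pulls the boid to the right.
me = Boid(0.0, 0.0, 0.0, 0.0)
print(steer(me, [Boid(10.0, 0.0, 1.0, 0.0)]))
```

No rule consults a global view of the flock, which is the property the USW interpretation borrows.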
USW interpretation of flocking
Craig Reynolds' "boids" -> USW interpretation
• Collision avoidance -> each WO has a unique URI
• Velocity matching -> matching number of copies/family members
• Flock centering -> move with friends to new hosts
WOs wandering in the USW graph
• A wandering WO is "introduced" to an existing WO
• If a connection is not made, an introduction to another existing WO is attempted
• The process is repeated until a connection is made
• No global knowledge
  – No omnipotent enforcer
  – No omnipresent monitor
• No repositories
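The introduction loop on this slide can be sketched as follows, assuming each existing WO decides locally whether to accept a newcomer (the `accepts` predicate is hypothetical) and that introductions go to randomly chosen WOs. The `max_attempts` cap is an addition for this sketch so it always terminates; the slide itself simply repeats until a connection is made.

```python
import random
from typing import Callable, Optional, Sequence


def wander(new_wo: str, existing: Sequence[str],
           accepts: Callable[[str, str], bool],
           rng: random.Random, max_attempts: int = 1000) -> Optional[str]:
    """Introduce `new_wo` to random existing WOs until one accepts a friendship link.

    Each decision is local: the existing WO alone decides via `accepts`.
    Returns the new friend's URI, or None if no connection was made.
    """
    if not existing:
        return None
    for _ in range(max_attempts):
        candidate = rng.choice(existing)
        if accepts(candidate, new_wo):
            return candidate  # connection made: the two WOs are now friends
    return None


rng = random.Random(42)
friend = wander("http://example.org/wo/new",
                ["http://example.org/wo/a", "http://example.org/wo/b"],
                accepts=lambda existing, newcomer: existing.endswith("/b"), rng=rng)
print(friend)
```

The example.org URIs are placeholders; nothing here consults a global registry, matching the "no omnipotent enforcer, no omnipresent monitor" constraint.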
USW WO "friendship" links
• WOs have "friendship" links to other WOs
• These differ from HTML navigational links (i.e., <link> instead of <a>)
USW WO "families"
A family is a set of copies of the same WO.
USW hosts
Family members live on different hosts.
[Figure: family members distributed across Host #1, Host #2, and Host #3]
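The three slides above (friendship links, families, hosts) suggest a small data model. The sketch below is hypothetical: the `WebObject` fields, the `family` identifier, and the `can_place_copy` helper are illustrative names, and it assumes a per-host capacity like the $h_{cap}$ parameter defined later. It enforces the rule that family members live on different hosts.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class WebObject:
    uri: str          # each WO has a unique URI
    family: str       # hypothetical identifier shared by all copies of the same WO
    host: str         # host this copy lives on
    created: float    # creation timestamp
    friends: Set[str] = field(default_factory=set)  # URIs reached via "friendship" <link>s


def can_place_copy(family: Iterable[WebObject], host: str,
                   host_load: Dict[str, int], h_cap: int) -> bool:
    """A new copy may go on `host` only if no relative lives there and the host is not full."""
    if any(member.host == host for member in family):
        return False
    return host_load.get(host, 0) < h_cap
```

Checking the family first keeps two copies from sharing a single point of failure, which is the point of spreading family members across hosts.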
WO roles & responsibilities
• Hierarchy of family WOs
  – Progenitor: the initial WO
  – Copies: more recent WO copies
  – Each WO is timestamped with its creation time
• WO roles
  – Active maintainer: the eldest WO, charged with making copies and related housekeeping
  – Passive maintainer: all other WOs
• Order of precedence
  – If the progenitor is accessible, it is the active maintainer
  – Otherwise, if the declared active maintainer is accessible, it is the active maintainer
  – Otherwise, the WO declares itself the active maintainer
• If the family is disconnected, multiple active maintainers are possible; after reconnection, the eldest WO declares itself the active maintainer
[Figure: a progenitor and its copies]
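The order of precedence can be written as a pure function of what one WO can observe. This is a sketch under assumptions: `FamilyView` is a hypothetical structure standing in for a WO's local knowledge, and `eldest` models the reconnection rule by picking the earliest creation timestamp among the reconnected family members.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class FamilyView:
    """What one WO can currently observe about its family (hypothetical structure)."""
    progenitor: str                  # URI of the initial WO
    declared_active: Optional[str]   # URI most recently declared active maintainer, if any
    reachable: Set[str]              # family URIs this WO can contact now, itself included


def active_maintainer(me: str, view: FamilyView) -> str:
    """Apply the slide's order of precedence using local knowledge only."""
    if view.progenitor in view.reachable:
        return view.progenitor
    if view.declared_active is not None and view.declared_active in view.reachable:
        return view.declared_active
    return me  # nobody with precedence is reachable: declare myself active maintainer


def eldest(members: List[Tuple[str, float]]) -> str:
    """After reconnection, the WO with the earliest creation timestamp takes the active role."""
    return min(members, key=lambda m: m[1])[0]
```

Because each WO evaluates this independently, a partitioned family can briefly have several active maintainers, exactly as the last bullet allows.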
Active and passive maintenance activities
[Figure: timeline (Time axis) of Active and Passive roles within a family, with the labels "Progenitor is lost" (X), "Progenitor returns", and "Progenitor declares act. as copy."]
• Active maintainer (the WO with the earliest timestamp): currently charged with making copies and related housekeeping
• Passive maintainer: all other WOs
Progenitor is lost 
A new active maintainer 
Progenitor returns and assumes active maintainer role
Progenitor has made copies
[Figure: timeline (Time axis) of a family's copies, with the labels "Copies created", "Copy declares active", "Replacement created", "Excess copies", and "Excess deleted"]
A copy is disconnected from the family
Two active maintainers make copies
Disconnected copy is reconnected to the progenitor
Family has too many copies 
• Copy management policies
  – Active: explicit removal
  – Passive: "natural attrition"
• Equivalent of Reynolds' velocity matching: making and monitoring copies
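The two copy management policies can be sketched as a function that decides which copies to delete when a family has too many members. Two assumptions are made here and are not stated on the slides: every family member, progenitor included, counts against $c_{hard}$, and the youngest copies are the ones removed so that the eldest WOs survive.

```python
from typing import List, Tuple


def excess_to_delete(members: List[Tuple[str, float]], c_hard: int, policy: str) -> List[str]:
    """URIs an active maintainer would delete when its family exceeds c_hard members.

    members: (uri, created) pairs. Assumption for this sketch: the youngest copies
    are removed first so that the eldest WOs (including the progenitor) survive.
    """
    excess = len(members) - c_hard
    if excess <= 0:
        return []
    if policy == "passive":
        return []  # no explicit removal: wait for "natural attrition"
    if policy != "active":
        raise ValueError(f"unknown policy: {policy}")
    youngest_first = sorted(members, key=lambda m: m[1], reverse=True)
    return [uri for uri, _ in youngest_first[:excess]]
```

Under the passive policy the family simply stays above $c_{hard}$ until copies disappear on their own, trading host space for fewer messages.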
Parameters
• $c_{soft}$ = minimum number of preservation copies desired by a web object
  – e.g., $c_{soft} = 3$
• $c_{hard}$ = maximum number of preservation copies desired by a web object
  – e.g., $c_{hard} = 5$
• $h_{max}$ = maximum number of hosts
  – e.g., $h_{max} = 1000$
• $h_{cap}$ = host capacity for web objects
  – e.g., $h_{cap} = 5$
• $n_{max}$ = maximum number of web objects
  – e.g., $n_{max} = 500$
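The parameters can be bundled into one configuration object. This sketch uses the slide's example values as defaults; the class name, the consistency check, and `total_slots` are illustrative additions. If every WO (original or copy) occupies one slot, the example hosts offer $h_{max} \times h_{cap} = 1000 \times 5 = 5000$ slots in total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class USWParams:
    """Simulation knobs from the Parameters slide, defaulting to the example values."""
    c_soft: int = 3    # minimum number of preservation copies desired by a WO
    c_hard: int = 5    # maximum number of preservation copies desired by a WO
    h_max: int = 1000  # maximum number of hosts
    h_cap: int = 5     # host capacity for web objects
    n_max: int = 500   # maximum number of web objects

    def __post_init__(self) -> None:
        # Sanity check added for this sketch: the soft target cannot exceed the hard one.
        if not 0 < self.c_soft <= self.c_hard:
            raise ValueError("expected 0 < c_soft <= c_hard")

    def total_slots(self) -> int:
        """Host slots available overall, assuming every WO (original or copy) takes one slot."""
        return self.h_max * self.h_cap


print(USWParams().total_slots())
```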
Three USW copying policies
• Least aggressive: one at a time, up to $c_{hard}$
• Moderately aggressive: as quickly as possible up to $c_{soft}$, then one at a time up to $c_{hard}$
• Most aggressive: as quickly as possible up to $c_{hard}$
• Constraints:
  – WOs can only take action when woken up by interactive users or other WOs (i.e., mostly they lie dormant, waiting for crowdsourced preservation)
  – Copying continues until WOs can no longer find hosts that are not full
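The three policies differ only in how many copies a WO requests each time it is woken up. The sketch below assumes every requested copy finds a non-full host (the second constraint is ignored) and uses hypothetical policy names `"least"`, `"moderate"`, and `"most"`.

```python
def copies_to_make(policy: str, current: int, c_soft: int, c_hard: int) -> int:
    """New copies a WO requests on one wake-up, given `current` existing copies."""
    if current >= c_hard:
        return 0
    if policy == "least":
        return 1                     # one at a time up to c_hard
    if policy == "moderate":
        if current < c_soft:
            return c_soft - current  # as quickly as possible up to c_soft ...
        return 1                     # ... then one at a time up to c_hard
    if policy == "most":
        return c_hard - current      # as quickly as possible up to c_hard
    raise ValueError(f"unknown policy: {policy}")


def wakeups_to_reach_c_hard(policy: str, c_soft: int, c_hard: int) -> int:
    """Wake-ups needed to go from zero copies to c_hard, assuming every copy finds a host."""
    current, wakeups = 0, 0
    while current < c_hard:
        current += copies_to_make(policy, current, c_soft, c_hard)
        wakeups += 1
    return wakeups


for name in ("least", "moderate", "most"):
    print(name, wakeups_to_reach_c_hard(name, c_soft=3, c_hard=5))
```

With the example values $c_{soft} = 3$ and $c_{hard} = 5$, a WO starting from zero copies needs five wake-ups under the least aggressive policy, three under the moderately aggressive one, and one under the most aggressive one. The final number of copies is the same; only the pacing differs.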
Reading tree ring graphs
• WO preservation status (N = number of copies): None; $N < c_{soft}$; $c_{soft} \le N < c_{hard}$; $N = c_{hard}$
• Host utilization status: 0%; < 25%; < 50%; < 75%; > 75%
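The legend's buckets can be expressed as two small classification functions. This is a sketch: the function names are hypothetical, a host at exactly 75% utilization is assumed to fall into the top bucket (the legend does not say), and a WO with more than $c_{hard}$ copies is treated as outside the legend.

```python
def wo_ring(n: int, c_soft: int, c_hard: int) -> str:
    """Legend bucket for a WO with n preservation copies."""
    if n == 0:
        return "None"
    if n < c_soft:
        return "< c_soft"
    if n < c_hard:
        return "c_soft <= N < c_hard"
    if n == c_hard:
        return "N == c_hard"
    raise ValueError("more than c_hard copies is not covered by the legend")


def host_ring(load: int, h_cap: int) -> str:
    """Legend bucket for a host currently holding `load` WOs out of `h_cap`."""
    if load == 0:
        return "0%"
    pct = 100 * load / h_cap
    if pct < 25:
        return "< 25%"
    if pct < 50:
        return "< 50%"
    if pct < 75:
        return "< 75%"
    return "> 75%"  # assumption: exactly 75% is also placed in the top bucket
```

With $h_{cap} = 5$, each additional WO on a host moves it up by 20 percentage points, so a host holding 4 or 5 WOs lands in the top ring.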
Least aggressive (t = 1)
Least aggressive (t = 10)
Least aggressive (t = 50)
Least aggressive (t = 100)
A full YouTube video is available at: http://youtu.be/sHJGYphqtK4
Least aggressive (final)
• Results
  – System stabilized
  – Host capacity limited
  – Some WOs without any copies
  – Some hosts unused
• "Least aggressive" is not an effective policy
Which policy to choose?
• Moderately aggressive results in an additional 18% of WOs meeting their preservation goals and makes more efficient use of limited host resources sooner
• Most aggressive results in almost the same percentage of WOs meeting their goals, but with slightly more hosts having unused capacity
How does policy affect message exchange?
[Figure: message histograms for the least aggressive, moderately aggressive, and most aggressive policies]
The number of messages is constant, but it is amortized over different time scales.
Conclusions
• Based on simulations:
  – Be aggressive when making copies!
  – Moderately aggressive copying was approximately the same as most aggressive copying
    • Most aggressive reaches a steady state faster
    • But moderately aggressive distributes WOs over hosts more evenly
  – Moderately vs. most aggressive comes down to "go fast" vs. "spread the load"
Video URLs
• USW video
  – http://youtu.be/JnCMenp73YQ
• Least Aggressive
  – http://youtu.be/sHJGYphqtK4
• Moderately Aggressive
  – https://www.youtube.com/watch?v=pVI-VhPh7KQ
• Most Aggressive
  – https://www.youtube.com/watch?v=eIXz8Njh-QM
• "Death Star" message histogram
  – https://www.youtube.com/watch?v=X3EShyjFoc4
• "Traditional" message histogram
  – https://www.youtube.com/watch?v=9CcCup3Td-Q
Backup slides 
Some WO reference implementation details
• Sawood Alam, HTTP Mailbox - Asynchronous RESTful Communication, Master's thesis, Old Dominion University, Norfolk, VA, 2013.
• Carl Lagoze, Herbert Van de Sompel, Pete Johnston, Michael Nelson, Robert Sanderson, and Simeon Warner, ORE User Guide - Resource Map Implementation in Atom, Tech. report, Open Archives Initiative, 2008.
• Sawood Alam, Charles L. Cartledge, and Michael L. Nelson, Support for Various HTTP Methods on the Web, Tech. Report arXiv:1405.2330, 2014.
WO memory: simulated via the "edit" service
Direct WO-to-WO communication: simulated via the HTTP Mailbox
Preserve Me Viz! with new connections
• New friend connections
• New copy locations
Preserve Me "Basic" on a copy
• Differences between active and passive maintainers
• The active maintainer is responsible for making copies
• A passive maintainer sends alerts to the active maintainer
• A passive maintainer may assume the active maintainer role if the active maintainer is not available
A USW-instrumented splash page
…
<link rel="resourcemap" type="application/atom+xml;type=entry"
href="http://arxiv.cs.odu.edu/rems/arxiv-0704-3647v1.xml" />
<link rel="aggregation" href="http://arxiv.cs.odu.edu/rems/arxiv-0704-3647v1.xml#aggregation" />
<script src="http://www.cs.odu.edu/~salam/wsdl/uswdo/work/preserveme.js"></script>
…
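One way to see what the splash page advertises is to collect its `<link>` targets by `rel` value. This sketch uses Python's standard-library `html.parser` rather than the JavaScript of preserveme.js; `LinkRelCollector` is a hypothetical helper, and the markup is the fragment shown on the slide (the elided parts of the page are left out).

```python
from html.parser import HTMLParser
from typing import Dict


class LinkRelCollector(HTMLParser):
    """Collect <link> hrefs keyed by each token of their rel attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.links: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here via handle_startendtag.
        if tag != "link":
            return
        a = dict(attrs)
        rel, href = a.get("rel"), a.get("href")
        if rel and href:
            for token in rel.split():  # rel may hold several space-separated values
                self.links[token.lower()] = href


page = """
<link rel="resourcemap" type="application/atom+xml;type=entry"
      href="http://arxiv.cs.odu.edu/rems/arxiv-0704-3647v1.xml" />
<link rel="aggregation" href="http://arxiv.cs.odu.edu/rems/arxiv-0704-3647v1.xml#aggregation" />
<script src="http://www.cs.odu.edu/~salam/wsdl/uswdo/work/preserveme.js"></script>
"""
collector = LinkRelCollector()
collector.feed(page)
print(collector.links["resourcemap"])
```

Because the page uses `<link>` rather than `<a>`, these relationships stay machine-readable without appearing as navigational links to readers.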
USW algorithm popup
• Written in JavaScript
• Relies on domain services
  – Copy -> creates a copy of a WO
  – Edit -> updates the WO's own REM (resource map)
• Uses a communication mechanism based on Sawood Alam's master's thesis
USW copies: famine to feast
Final states for copying policies and named conditions
