A Framework for
Web Object Self-Preservation
A Ph.D. Defense
Chuck Cartledge
30 May 2014
A warning from Jeff Rothenberg
“Digital Information Lasts
Forever—Or Five Years,
Whichever Comes First.”
2
 Jeff Rothenberg, Ensuring the Longevity of Digital Information, Scientific
American 272 (1995), 42 - 47.
A warning from William Arms
“Tomorrow we could see the
National Library of Medicine
abolished by Congress, Elsevier
dismantled by a corporate raider,
the Royal Society declared
bankrupt, or the University of
Michigan Press destroyed by a
meteor. All are highly unlikely,
but over a long period of time
unlikely events will happen."
3
 William Y. Arms, Preservation of Scientific Serials: Three Current
Examples, Journal of Electronic Publishing 5 (1999), no. 2.
Overview
• Warnings
• Preservation context
• Research questions
• Background and related
work
• Unsupervised Small-World
– Emergent behavior
– Graph theory
– Preservation
• Demonstration
• Questions and answers
4
Preservation in an analog age
• Benign neglect
– Don’t touch
– Keep away from sunshine
– Keep away from moisture
– Keep away from insects
• Last for hundreds of
years
5
 Josie McClure picture taken Feb 30, 1907 at Poteau, I.T.
Fifteen years of age When this was taken weighed 140 lbs.
Preservation in a digital age
6
• Constant use
– Use often
– Exposure to lots of things
– Make lots of copies
– Monitor the integrity
• Last for ??? unknown
years
• This is a Brave New
World
 Google image search, 31 March 2014, about 91,700,000 results (0.84 seconds)
Everything has a lifespan
• Exponential growth of
digital artifacts
• Representing increasing
portion of personal and
cultural heritage
• Short human lifetime to
manage data
• Potentially, short
institutional life time
• Need to preserve
artifacts beyond human
lifespan and institutions
that create and house
artifacts
7 Dissertation, section 1.3
Research questions
• Can web objects (WOs)
be constructed to
outlive the people and
institutions that
created them?
• Can we leverage
aspects of naturally
occurring networks and
group behavior for
preservation?
8
A WO is a digital object that lives on the Web. A WO is a fundamental
element in this dissertation.
Unsupervised Small-World (USW) is
at the nexus of multiple disciplines
9
Mathematical
structures used to
model pairwise
relations between
objects
Ensuring that
digital information
of continuing
value remains
accessible and
usable
Movement of
the inanimate
Shifting gears
10
Focusing on
emergent
behavior
Emergent behavior: model
• Craig Reynolds – basis of herd
and flock behavior in
computer animations
– 3 rules
• Collision avoidance
• Velocity matching
• Flock centering
– No central control, everything
based on local knowledge only
• Simple rules
– Complex behavior
– Emergent behavior
11
 Craig W. Reynolds, Computer Animation with Scripts and
Actors, ACM SIGGRAPH, vol. 16, ACM, 1982, pp. 289 - 296.
 Images http://www.red3d.com/cwr/boids/ Flock centering
Velocity matching
Collision avoidance
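The three rules can be sketched in a few lines. This is a minimal illustration, not Reynolds' implementation: the `Boid` class, the `step` function, and every weight and radius below are assumed values chosen only to show that each boid updates itself from local neighbors alone.

```python
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def step(boids, radius=5.0, min_dist=1.0,
         w_center=0.01, w_align=0.05, w_avoid=0.05):
    """Advance every boid one tick using only neighbors within `radius`."""
    updated = []
    for b in boids:
        near = [o for o in boids if o is not b
                and (o.x - b.x) ** 2 + (o.y - b.y) ** 2 <= radius ** 2]
        vx, vy = b.vx, b.vy
        if near:
            n = len(near)
            # Flock centering: steer toward the neighbors' centre of mass.
            vx += w_center * (sum(o.x for o in near) / n - b.x)
            vy += w_center * (sum(o.y for o in near) / n - b.y)
            # Velocity matching: nudge velocity toward the neighbors' mean.
            vx += w_align * (sum(o.vx for o in near) / n - b.vx)
            vy += w_align * (sum(o.vy for o in near) / n - b.vy)
            # Collision avoidance: push away from neighbors that are too close.
            for o in near:
                if (o.x - b.x) ** 2 + (o.y - b.y) ** 2 < min_dist ** 2:
                    vx += w_avoid * (b.x - o.x)
                    vy += w_avoid * (b.y - o.y)
        updated.append(Boid(b.x + vx, b.y + vy, vx, vy))
    return updated
```

No function here sees the whole flock as a unit; any flock-level pattern comes from repeated calls to `step`.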
Emergent behavior: communication
• Need to know what
my neighbors are
doing
• Need to tell
neighbors what I
am doing
• A school of fish does
not have a Dagon
that controls it
12 Dissertation, section 5.3
Shifting gears
13
Focusing on
preservation
Preservation: primitives
14 William Y. Arms, Digital Libraries, The MIT Press, December 1999
png
png png png
Replication Emulation
png
tiff eps bmp
Migration
Preservation: OAIS model
• Provides
standard model
and terminology
for archival
systems
• Terms of interest
– Submission
Information
Package
– Ingest
– Data
Management
– Archival
Storage
– Access
– Dissemination
Information
Package
15
 Council of the Consultative Committee for Space Data Systems (CCSDS), Reference Model for an Open
Archival Information System (OAIS), Tech. report, Consultative Committee for Space Data Systems 650.0-M-2,
Magenta Book, 2012.
 Carl Lagoze, Herbert Van de Sompel, Pete Johnston, Michael Nelson, Robert Sanderson, and Simeon
Warner, ORE User Guide - Resource Map Implementation in Atom, Tech. report, Open Archives Initiative,
2004.
Shifting gears
16
Focusing on
graph theory
Graph theory: definitions
• Graph: G = (V,E)
• Graph can be connected or
disconnected
• Some graph metrics work
only with connected graphs
and not with disconnected
graphs
– Clustering coefficient (C(G))
– Average path length (L(G))
– Degree distribution
17
 Reka Albert and Albert-Laszlo Barabasi, Statistical Mechanics of
Complex Networks, Reviews of Modern Physics 74 (2002), no. 1, 47.
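As a rough sketch of two of these metrics, the functions below compute C(G) as the average local clustering coefficient and L(G) as the mean shortest-path length. The adjacency-dictionary representation and the function names are assumptions made for illustration. Note that L(G) is only defined here for a connected graph, which is the limitation the slide points out.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj):
    """Average local clustering coefficient; adj maps vertex -> set of neighbors."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # a vertex with fewer than two neighbors contributes 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all vertex pairs of a connected graph."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != len(adj):
            raise ValueError("graph is disconnected; L(G) is undefined")
        total += sum(dist.values())
        pairs += len(adj) - 1
    return total / pairs
```

For example, a triangle gives C(G) = 1 and L(G) = 1, while a three-vertex path gives C(G) = 0 and L(G) = 4/3.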
Graph theory: Watts and Strogatz
small-world
18
 Duncan J. Watts and
Steven H. Strogatz,
Collective dynamics
of `small world'
networks, Nature 393
(1998), 440 - 442.
 Stanley Milgram, The
Small-World Problem,
Psychology Today 2
(1967), no. 1, 60 - 67.
Small-world graphs are common
19
Graph                                         Nodes  Edges     Actual C(G)  Actual L(G)  Random C(G)  Random L(G)
West. Elect. Coord. Council (WECC)            4941   6594      0.0801       18.99        0.00054      8.7
C. elegans                                    248    511       0.21         2.87         0.05         2.62
Enron e-mail                                  148    ~500,000  0.44         2.25         0.11         2.0
Ake J Holmgren, Using Graph Models to Analyze the Vulnerability of Electric Power Networks, Risk Analysis 26
(2006), no. 4, 955 - 969.
Lav R Varshney, Beth L Chen, Eric Paniagua, David H Hall, and Dmitri B Chklovskii, Structural Properties of the
Caenorhabditis elegans Neuronal Network, PLoS computational biology 7 (2011), no. 2, e1001066.
Shinako Matsuyama and Takao Terano, Analyzing the ENRON Communication Network Using Agent-Based
Simulation, Journal of Networks 3 (2008), no. 7.
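One way to read this slide's table is to divide each measured value by its random-graph counterpart. The sketch below does that with the numbers copied from the slide; the dictionary layout and the function name are illustrative assumptions.

```python
# (actual C(G), actual L(G), random C(G), random L(G)), copied from the slide.
GRAPHS = {
    "WECC": (0.0801, 18.99, 0.00054, 8.7),
    "C. elegans": (0.21, 2.87, 0.05, 2.62),
    "Enron e-mail": (0.44, 2.25, 0.11, 2.0),
}


def small_world_ratios(c_actual, l_actual, c_random, l_random):
    """Return (C ratio, L ratio) of the actual graph relative to a random graph."""
    return c_actual / c_random, l_actual / l_random


for name, row in GRAPHS.items():
    c_ratio, l_ratio = small_world_ratios(*row)
    print(f"{name}: C ratio {c_ratio:.1f}, L ratio {l_ratio:.2f}")
```

In all three rows the clustering ratio is larger than the path-length ratio, which is the pattern the small-world label refers to.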
Small-world: high C(G) and low L(G)
20
• The ubiquitous presence of small-world graphs
points to something inherently “correct” and
desirable about them.
Symbol Meaning
k Degree
n Order of the graph
 Dissertation, section 5.2.3
USW is at the nexus of multiple
disciplines
21
Creation of
small-world
graphs that are
robust and
resilient
Meet
fundamental
requirements
of replication,
migration, and
data
management
WO’s use of
emergent
behavior to
create, monitor,
and optimize
the USW
system
Euclidean geometry
• To draw a straight line from any point
to any point.
• To produce [extend] a finite straight
line continuously in a straight line.
• To describe a circle with any center
and distance [radius].
• That all right angles are equal to one
another.
• That, if a straight line falling on two
straight lines make the interior angles
on the same side less than two right
angles, the two straight lines, if
produced indefinitely, meet on that
side on which are the angles less than
the two right angles.
22
The sum of angles A, B, and
C is equal to 180 degrees
 Euclid of Alexandria, The Elements, Alexandria, 300 BCE.
Non-Euclidean geometries
23
• Non-Euclidean geometry
– To draw a straight line from any point to any
point.
– To produce [extend] a finite straight line
continuously in a straight line.
– To describe a circle with any center and
distance [radius].
– That all right angles are equal to one another.
– That, if a straight line falling on two straight
lines make the interior angles on the same side
less than two right angles, the two straight
lines, if produced indefinitely, meet on that side
on which are the angles less than the two right
angles.
• Spherical geometry
– Two lines at right angles to the same line can
meet
– Triangles can have 180 to 540 degrees
– Circles are straight lines
Digital library world
• Digital libraries
– The technical framework exists within a legal
and social framework
– Understanding of digital library concepts is
hampered by terminology
– The underlying architecture should be separate
from the content stored in the library
– Names and identifiers are the basic building
block for the digital library
– Digital library objects are more than collections
of bits
– The digital library object that is used is
different from the stored object
– Repositories must look after the information
they hold
– Users want intellectual works, not digital
objects
• Basic digital library tenets
24
 Robert Kahn and Robert Wilensky, A Framework for Distributed Digital Object Services,
International Journal on Digital Libraries 6 (2006), no. 2, 115 - 123.
 William Y. Arms, Key Concepts in the Architecture of the Digital Library, D-Lib Magazine 1
(1995), no. 1.
Digital library worlds of
possibilities
25
• Digital libraries
– The technical framework exists within a legal and
social framework
– Understanding of digital library concepts is
hampered by terminology
– The underlying architecture should be separate
from the content stored in the library
– Names and identifiers are the basic building
block for the digital library
– Digital library objects are more than collections
of bits
– The digital library object that is used is different
from the stored object
– Repositories must look after the information
they hold
– Users want intellectual works, not digital objects
What if there were no repositories?
“No Repositories” → USW
• No global
knowledge
– No omnipotent
enforcer
– No omnipresent
monitor
• Opportunistic
preservation
• Self-describing
Web Objects
26
USW contributions
27
USW WO “friendship” links
• WOs have
“friendship”
links to other
WOs
• Different
than HTML
navigational
links
28 Dissertation, Chapter 6
USW WO “families”
A family is a
set of copies
of the same
WO
29 Dissertation, Chapter 6
USW hosts
Family
members live
on different
hosts
30
Host #1
Host #2
Host #3
 Dissertation, Chapter 6
Shifting gears
31
Focus on
emergent
behavior
USW interpretation of flocking
32
Flock
centering
Velocity
matching
Collision
avoidance
Craig Reynolds’ “boids” USW interpretation
Each WO has a unique URI
Matching number of copies/family members
Move with friends to new hosts
 Dissertation, Chapter 2
Building a USW graph
33
• Graph
exploration (b)
• Choosing
connections
• Detecting loss
 Dissertation, Chapter 5
WOs wandering in the USW graph
• Wandering WO is
“introduced” to an
existing WO
• If a connection is not
made, then an attempt
is made to another
existing WO
• Process is repeated
until a connection is
made
• No global knowledge
– No omnipotent
enforcer
– No omnipresent
monitor
• No repositories
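The wandering process can be sketched as a loop that only ever asks the WO it is currently visiting about that WO's links. This is an assumption-laden illustration, not the dissertation's algorithm: `neighbors_of` and `accepts` are hypothetical callbacks standing in for whatever a real WO would expose.

```python
import random


def wander(start, neighbors_of, accepts, rng=random):
    """Visit WOs one at a time until one accepts a connection.

    start        -- the existing WO the wanderer is introduced to
    neighbors_of -- hypothetical callback: WO -> iterable of linked WOs
    accepts      -- hypothetical callback: WO -> True if a link is made
    Returns (friend or None, list of WOs visited in order).
    """
    visited = []
    to_be_visited = [start]
    while to_be_visited:
        # Pick the next WO from what has been discovered so far (local knowledge only).
        candidate = to_be_visited.pop(rng.randrange(len(to_be_visited)))
        visited.append(candidate)
        if accepts(candidate):
            return candidate, visited
        for wo in neighbors_of(candidate):
            if wo not in visited and wo not in to_be_visited:
                to_be_visited.append(wo)
    return None, visited
```

The loop ends either with a connection or when no undiscovered WO remains; it never consults a global list of WOs.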
34
 Dissertation, Chapter 5
USW friend selection process
• Selection from possible sets
– WOset: WOs connected to candidate WO
– visitedSet: WOs that the wandering WO has explored
– toBeVisitedSet: WOs that the wandering WO has
discovered
• Selection approaches
– Random from visitedSet ∪ toBeVisitedSet
– FIFO from visitedSet ∪ toBeVisitedSet
– LIFO from visitedSet ∪ toBeVisitedSet
– Preferentially attach to WOset then random for remaining
35 Dissertation section 6.7.5
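The four selection approaches might be sketched as follows. The function name, the string labels for each approach, and the reading of "preferentially attach" (take members of WOset first, then fill randomly from the rest) are assumptions made for illustration.

```python
import random


def select_friends(wo_set, visited_set, to_be_visited_set, count,
                   approach, rng=random):
    """Choose up to `count` friends using one of the four approaches."""
    # Union of the two sets, de-duplicated and kept in discovery order.
    pool = list(dict.fromkeys(list(visited_set) + list(to_be_visited_set)))
    if approach == "random":
        return rng.sample(pool, min(count, len(pool)))
    if approach == "fifo":
        return pool[:count]          # earliest discovered first
    if approach == "lifo":
        return pool[::-1][:count]    # most recently discovered first
    if approach == "preferential":
        chosen = list(wo_set)[:count]
        rest = [wo for wo in pool if wo not in chosen]
        extra = min(count - len(chosen), len(rest))
        return chosen + rng.sample(rest, extra)
    raise ValueError(f"unknown approach: {approach}")
```

Lists are used instead of Python sets so that FIFO and LIFO have a well-defined order.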
Comparing USW to random
graphs
36
 Dissertation, section 6.4
Shifting gears
37
Focus on graph
theory
Robustness of USW graphs
• Definition: able to continue when
damaged
• Attack vs. failure
– Intentional vs. random
• Component selection
– Vertex
– Edge
• Selection attribute
– Degree
– Betweenness
• Attribute value
– High
– Low
• Attack profile notation: A{D|E|V}{H|L}
38
Sample graph
 Charles L. Cartledge and Michael L. Nelson, Connectivity Damage to a Graph
by the Removal of an Edge or Vertex, Tech. report, arXiv 1103.3075, ODU CS
Dept.
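A degree-based attack can be sketched as below, reading the D in the notation as vertex degree and H/L as high/low. Only the ADH and ADL profiles are covered; the edge and vertex betweenness profiles would need a betweenness computation that is not shown. All function names are illustrative assumptions.

```python
def pick_by_degree(adj, high=True):
    """Return the vertex with the highest (ADH) or lowest (ADL) degree."""
    chooser = max if high else min
    return chooser(sorted(adj), key=lambda v: len(adj[v]))


def remove_vertex(adj, victim):
    """Return a copy of the graph without `victim` and its edges."""
    return {u: nbrs - {victim} for u, nbrs in adj.items() if u != victim}


def is_connected(adj):
    """True if every vertex is reachable from an arbitrary start vertex."""
    if not adj:
        return True
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def attacks_until_disconnected(adj, high=True):
    """Count repeated attacks needed to fragment the graph, or None if it never fragments."""
    count = 0
    while len(adj) > 1:
        if not is_connected(adj):
            return count
        adj = remove_vertex(adj, pick_by_degree(adj, high))
        count += 1
    return None
```

On a star graph, one ADH attack removes the hub and disconnects the graph, while ADL attacks strip the leaves one by one without ever fragmenting it.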
Different attack profiles selections
39
AEH AEL AVH
ADL AVL ADH
 Charles L. Cartledge and Michael L. Nelson, Connectivity Damage to a Graph
by the Removal of an Edge or Vertex, Tech. report, arXiv 1103.3075, ODU CS
Dept.
Four attacks using AEL profile
40
Deletion #1. Deletion #2.
Deletion #3. Deletion #4.
Our damage vs. Albert, Jeong,
and Barabasi’s damage
41
s = Reka Albert, Hawoong Jeong, and Albert-Laszlo Barabasi, Error and Attack Tolerance of
Complex Networks, Nature 406 (2000), no. 6794, 378 - 382.
50 … 5 20 … 20
16 … 1 10 … 10
Measuring damage
• Desired characteristics
– Different fragmentation cases result
in different values
– Useful without additional graph
state information
42
 Charles L. Cartledge and Michael L. Nelson, Connectivity Damage to a Graph by the Removal of an Edge or
Vertex, Tech. report, arXiv 1103.3075, ODU CS Dept.
 Dissertation Appendix D
Local and global AE* damage
43
Global A{DV}H damage (100 nodes)
44
• 100 node graph
• Execution time:
~36 hours
• Attacker has total
knowledge of the
graph
• Attacker has
unrestricted
resources to
damage the graph
• Results:
– Small-world: the
most connected is
not the most
valuable
– Random and
scale-free:
degreeness does
not make a
difference
Attack profile efficacy on sample graph
Attack profile   Attacks                      Efficacy
AEdge High       The core of the graph        1.43
AEdge Low        The periphery of the graph   1.00
AVertex High     The core of the graph        1.42
AVertex Low      The periphery of the graph   1.00
ADegree High     The core of the graph        1.40
ADegree Low      The periphery of the graph   1.00
45
• If the attacker's goal is to disconnect the graph by repeated use of
the same attack profile, then the most effective profiles in order
are: AEH, AVH, and ADH.
• HTTP/HTML does not support AE* attack profiles
 Dissertation, section 5.6.6
Detecting loss of family members
• Each “active maintainer” WO
checks its family’s status
– Check family member
accessibility
– Check friend accessibility
• If family member is lost, use
friends to select candidate
host
• If too few candidate hosts, use
friends to explore and discover
new hosts
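The check performed by an active maintainer might look like the sketch below. It assumes family members and friends are identified by URIs and that `is_accessible` is a hypothetical probe (for example an HTTP request) supplied by the caller; the returned dictionary layout is also an assumption.

```python
from urllib.parse import urlsplit


def host_of(uri):
    """Return the host part of a WO's URI."""
    return urlsplit(uri).netloc


def check_family(family, friends, is_accessible):
    """Report lost family members and the hosts where replacements could go."""
    lost = [uri for uri in family if not is_accessible(uri)]
    surviving_hosts = {host_of(uri) for uri in family if uri not in lost}
    # Candidate hosts come from reachable friends and exclude hosts already in use.
    friend_hosts = {host_of(uri) for uri in friends if is_accessible(uri)}
    candidates = sorted(friend_hosts - surviving_hosts)
    return {
        "lost": lost,
        "candidate_hosts": candidates,
        # Too few candidates: friends must be asked to help discover new hosts.
        "explore": len(candidates) < len(lost),
    }
```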
46
 Dissertation, section 6.8
Shifting gears
47
Focus on
preservation
When to make family members?
• What is a copy?
• Who makes the copies?
• How many to make? Answer:
defined by originating domain
– 0 to start
– Soft lower limit (csoft)
– Hard upper limit (chard)
• Where to make them?
– Distributed across known hosts
– Too many or too few hosts
• When to make them?
48
 Norman Paskin, On Making and Identifying a Copy, D-Lib Magazine 9 (2003), no. 1.
 Henry M. Gladney and John L. Bennett, What Do We Mean by Authentic? What's the
Real McCoy?, D-Lib Magazine 9 (2003), no. 7/8.
USW preservation definitions
• Hierarchy of family WOs
– Progenitor – initial WO
– Copies – more recent WO copies
– Each WO is timestamped with creation time
• WO roles
– Active maintainer – eldest WO charged with
making copies and related housekeeping
– Passive maintainer – all other WOs
• Order of precedence
– If progenitor is accessible then it is the active
maintainer
– If declared active maintainer is accessible then it is
the active maintainer
– Otherwise, WO declares itself active maintainer
• If family is disconnected then multiple active
maintainers are possible until reconnection then the
eldest WO declares itself active maintainer
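The order of precedence can be sketched as a small decision function. The `WO` record, its field names, and the `is_accessible` callback are hypothetical; the second function covers the reconnection case, where the eldest claimant wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WO:
    uri: str
    created: float            # creation timestamp
    is_progenitor: bool = False


def active_maintainer(me, family, declared, is_accessible):
    """Decide, from the point of view of `me`, who the active maintainer is."""
    for wo in family:
        if wo.is_progenitor and is_accessible(wo):
            return wo                      # 1. an accessible progenitor wins
    if declared is not None and is_accessible(declared):
        return declared                    # 2. then the declared active maintainer
    return me                              # 3. otherwise this WO declares itself


def resolve_after_reconnect(claimants):
    """When several WOs claim the role after reconnection, the eldest keeps it."""
    return min(claimants, key=lambda wo: wo.created)
```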
49
Progenitor
Copies
 Dissertation, Appendix A
Active and passive maintenance
activities
50
• Active maintainer (the WO with earliest timestamp) – currently
charged with making copies and related housekeeping
• Passive maintainer – all other WOs
[Timeline figure: active and passive WO roles over time. Events: progenitor is lost; progenitor returns; progenitor declares act. as copy.]
Progenitor is lost
51
A new active maintainer
52
Progenitor returns and assumes
active maintainer role
53
Progenitor has made copies
54
[Timeline figure, events in order: copy declares active; copies created; replacement created; excess copies; excess deleted.]
A copy is disconnected from the
family
55
Two active maintainers make
copies
56
Disconnected copy is
reconnected to the progenitor
57
Family has too many copies
58
• Copy management policies
– Active: explicit removal
– Passive: “natural attrition”
• Equivalent of Reynolds’
velocity matching, making
and monitoring copies
USW copying policies
• Least aggressive –
one at a time to chard
• Moderately
aggressive – as
quickly as possible to
csoft and then one at a
time to chard
• Most aggressive – as
quickly as possible to
chard
• Different results
59
[Figure legend]
WO preservation status: None; < Csoft; Csoft <= N < Chard; N == Chard
Host utilization status: 0%; < 25%; < 50%; < 75%; > 75%
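The three copying policies differ only in how many copies a WO tries to make in one time step. The sketch below is one reading of them: the policy labels, the function name, and the interpretation of "as quickly as possible" as "everything still missing, limited by the hosts available" are assumptions.

```python
def copies_to_make(policy, current, c_soft, c_hard, free_hosts):
    """Return how many new copies to create in this time step."""
    if current >= c_hard or free_hosts <= 0:
        return 0                     # goal reached, or nowhere to put a copy
    if policy == "least":
        want = 1                     # one at a time up to c_hard
    elif policy == "moderate":
        # Jump to c_soft as fast as possible, then one at a time to c_hard.
        want = c_soft - current if current < c_soft else 1
    elif policy == "most":
        want = c_hard - current      # jump straight to c_hard
    else:
        raise ValueError(f"unknown policy: {policy}")
    return min(want, c_hard - current, free_hosts)
```

With c_soft = 3 and c_hard = 5, a WO with no copies would request 1, 3, or 5 copies under the least, moderately, and most aggressive policies respectively, provided enough hosts are free.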
 Dissertation, section 6.7.4
Least aggressive (t = 1)
60
Least aggressive (t = 10)
61
Least aggressive (t = 50)
62
Least aggressive (t = 100)
63
 A full YouTube video is available at: http://youtu.be/sHJGYphqtK4
Least aggressive (final)
64
• Results
– System
stabilized
– Host capacity
limited
– Some WOs
without any
copies
– Some hosts
unused
• “Least aggressive” is
not an effective
policy
Which policy to choose?
65
• Moderately aggressive
results in an additional 18%
of WOs meeting their
preservation goals and makes
more efficient use of limited
host resources sooner
• Most aggressive results in
almost the same percentage
of WOs meeting their goals,
but places a strain on the host
resources
 Charles L. Cartledge and Michael L. Nelson,
When Should I Make Preservation Copies of
Myself?, arXiv preprint arXiv:1202.4185
(2012).
Make new family members on
new hosts
• Spreading copies
across hosts
increases the
WO’s
preservation
likelihood
• Learn about new
hosts from
friends
66 Dissertation, Appendix A
Reynolds’
flock
centering
Move with friends
to new hosts
Crowd sourcing of family
member creation
• “Everyone is a curator …”
– Crowd sourced activity
– Unscheduled
– Willing to wait a long time
• Enlist humans in creation
and maintenance –
opposite of benign neglect
67
 Frank McCown, Michael L. Nelson, and Herbert Van de Sompel,
Everyone is a Curator: Human-Assisted Preservation for ORE
Aggregations, Proceedings of the DigCCurr 2009 (2009).
USW simulation vs.
implementation
68
                   USW theory      HTTP/HTML reality
Communications     Instantaneous   Asynchronous
Edges              Bidirectional   Directional
Temporal effects   None            Inconsistencies
Some WO reference
implementation details
69
 Sawood Alam, HTTP Mailbox - Asynchronous RESTful Communication, Master's thesis, Old
Dominion University, Norfolk, VA, 2013.
 Carl Lagoze, Herbert Van de Sompel, Pete Johnston, Michael Nelson, Robert Sanderson, and
Simeon Warner, ORE User Guide - Resource Map Implementation in Atom, Tech. report, Open
Archives Initiative, 2004.
 Sawood Alam, Charles L. Cartledge, and Michael L. Nelson, Support for Various HTTP Methods
on the Web, Tech. Report arXiv:1405.2330 (2014).
WO memory: simulated via “edit”
service
Direct WO to WO communication:
simulated via the HTTP Mailbox
Demonstration of the reference
implementation
1. Selection of a web page to be preserved
2. Creation of a WO from the web page
3. Adding the WO to an existing USW graph
a. Pages copied from flickr.com, arXiv.org, radiolab.org, and
gutenberg.org
b. All pages instrumented to become USW WOs
4. Creating preservation copies
5. Detecting that a copy was lost
6. Creating a replacement copy
70
USW contributions
71
Expanded graph theory by
creating an algorithm that
creates small-world graphs
based on locally collected
data (chapter 6)
Developed a new way to
quantify damage in
connected and disconnected
graphs (section 5.2)
Developed techniques to
optimize when and where to
create preservation copies
(section 5.5)
Developed techniques
to achieve emergent
behavior in WOs
(section 6.2)
Backup slides
72
Preserve Me Viz! with new
connections
• New friend
connections
• New copy
locations
73
Preserve Me “Basic” on a copy
• Differences between
active and passive
maintainers.
• Active maintainer is
responsible for making
copies.
• Passive maintainer
sends alerts to the
active maintainer
• Passive maintainer
may assume active
maintainer role if
active is not available.
74
A USW instrumented splash page
75
…
<link rel="resourcemap" type="application/atom+xml;type=entry"
href="http://arxiv.cs.odu.edu/rems/arxiv-0704-3647v1.xml" />
<link rel="aggregation" href="http://arxiv.cs.odu.edu/rems/arxiv-0704-3647v1.xml#aggregation" />
<script src="http://www.cs.odu.edu/~salam/wsdl/uswdo/work/preserveme.js"></script>
…
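A client that wants to find a WO's resource map could read these `<link>` elements from the splash page. The sketch below uses Python's standard `html.parser`; the class and function names are illustrative, and this is not how the reference implementation's JavaScript does it.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collect href values of <link> elements, keyed by their rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            attributes = dict(attrs)
            if "rel" in attributes and "href" in attributes:
                self.links[attributes["rel"]] = attributes["href"]


def find_rem_links(html):
    """Return a dict such as {'resourcemap': ..., 'aggregation': ...}."""
    parser = LinkRelParser()
    parser.feed(html)
    return parser.links
```

Calling `find_rem_links` on the markup above would yield the `resourcemap` and `aggregation` URIs from the two `<link>` elements.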
USW algorithm popup
76
• Written in
JavaScript
• Relies on domain
services
– Copy -> creates
copy of a WO
– Edit -> update
own REM
• Uses
communications
mechanism based
on Sawood Alam’s
master’s thesis
USW Preservation: copies (1 of 2)
• WO copies are not bit by
bit identical to the original
WO
• REsource Map (REM)
points to a resource
– Point to the “essence” of
the original
– Point to local copies of
the resources
– Can be used to recreate
the “essence” of the
original
• Resource has two
attributes:
– Size
– Update frequency
77
USW Preservation: copies (2 of 2)
78
Watts and Strogatz small-world
growth
79
Graph theory: random graph
80
Graph theory: scale free graph
81
Graph theory: lattice graph
82
Graph theory: Watts and Strogatz small-
world graph
83
Comparing graphs
• Small-world graphs occur in natural
and man made systems
• Small-world graphs are robust
• How to algorithmically and
incrementally create small-world
graphs?
84
Symbol Meaning
K Degree
<k> Average degree
N Order of the graph
Quantifying damage
• All graph components
are not equally
valuable
• How to identify most
valuable
• Greedy repair is the
obverse of identifying
the most damaging
component by
identifying where to
place the most
beneficial component
85
 Charles L. Cartledge and Michael L. Nelson, Connectivity Damage to a Graph
by the Removal of an Edge or Vertex, Tech. report, arXiv 1103.3075, ODU CS
Dept.
Global normalized A{DV}H damage
(40 - 750 nodes)
• Arithmetic
series of
possible
solutions
• Early attacks
are most
effective, later
attacks are
incrementally
effective
86
Long term growth analysis of
USW graph
Based on the idea of a
game
– Create the graph
– Attack the graph using
AVH profile to remove
10% of the WOs
– Repair the graph,
every surviving WO
gets 2 opportunities
(may be unsuccessful
in repair attempts)
– Repeat until steady
state
87
Possible paths in attack/repair
game
88
 Dissertation
section 6.9
USW copies: famine to feast
89
Final states for copying policies
and named conditions
90
 Dissertation,
Appendix H
Host capacity and WO desires
91
Famine Feast Straddle
B.Low
B.High
 Dissertation,
Appendix H
Man-made small-world graph: Western
Electricity Coordinating Council
92
Western Electricity Coordinating Council
Actual Random graph
Nodes Edges C(G) L(G) C(G) L(G)
4941 6594 0.0801 18.99 0.00054 8.7
 Ake J Holmgren, Using Graph Models to Analyze the Vulnerability of
Electric Power Networks, Risk Analysis 26 (2006), no. 4, 955 - 969.
Naturally occurring small-world graph: C.
elegans nematode
93
Caenorhabditis elegans
Actual Random graph
Nodes Edges C(G) L(G) C(G) L(G)
248 511 0.21 2.87 0.05 2.62
 Lav R Varshney, Beth L Chen, Eric Paniagua, David H Hall, and Dmitri B
Chklovskii, Structural Properties of the Caenorhabditis elegans Neuronal
Network, PLoS computational biology 7 (2011), no. 2, e1001066.
Organic small-world graph: Enron e-mail
94
Enron e-mail
Actual Random graph
Nodes Edges C(G) L(G) C(G) L(G)
148 ~500,000 0.44 2.25 0.11 2.0
 Shinako Matsuyama and Takao Terano, Analyzing the ENRON
Communication Network Using Agent-Based Simulation, Journal of
Networks 3 (2008), no. 7.
USW WO determines number of
friends
• Number of
connections
95 Dissertation, section 6.7
Symbol Meaning
ln, log2 Natural and base 2
logarithms
n Order of the
discovered USW
graph
g Simple scalar
Wandering activities
96
 Dissertation, Appendix B
Active maintenance activities
97
 Dissertation, Appendix B
Passive maintenance activities
98
 Dissertation, Appendix B
Video URLs
• USW video
– http://youtu.be/JnCMenp73YQ
• Least Aggressive
– http://youtu.be/sHJGYphqtK4
• Moderately Aggressive
– https://www.youtube.com/watch?v=pVI-VhPh7KQ
• Most Aggressive
– https://www.youtube.com/watch?v=eIXz8Njh-QM
• “Death Star” message histogram
– https://www.youtube.com/watch?v=X3EShyjFoc4
• “Traditional” message histogram
– https://www.youtube.com/watch?v=9CcCup3Td-Q
99
Useful URIs
• Flickr
– https://www.flickr.com/
• Flickr on cs.odu.edu
– http://flickr.cs.odu.edu/
• Adding image
– http://ws-dl-02.cs.odu.edu:10102/rem/generate/0.8/0.95/
• Preserve Me! Viz
– http://www.cs.odu.edu/~salam/preserveme/viz.html
• Delete REM
– http://ws-dl-02.cs.odu.edu:10102/rem/remove/http://flickr.cs.odu.edu/rems/flickr-24791103-N07-6661587389.xml
• Sawood on flickr
– https://www.flickr.com/photos/122128913@N05
• Chuck on flickr
– https://www.flickr.com/photos/24791103@N07/
• Court de Tomas De Torquemada on flickr
– https://www.flickr.com/photos/24791103@N07/12867674403/
100

Weitere ähnliche Inhalte

Ähnlich wie PhD defense

Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)Tin180 VietNam
 
Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...
Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...
Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...Daniel Katz
 
What is digital humanities ,By: Professor Lili Saghafi
What is digital humanities ,By: Professor Lili SaghafiWhat is digital humanities ,By: Professor Lili Saghafi
What is digital humanities ,By: Professor Lili SaghafiProfessor Lili Saghafi
 
Bibliometric network analysis: Software tools, techniques, and an analysis o...
Bibliometric network analysis: Software tools, techniques, and an analysis o...Bibliometric network analysis: Software tools, techniques, and an analysis o...
Bibliometric network analysis: Software tools, techniques, and an analysis o...Nees Jan van Eck
 
Higher-order Link Prediction GraphEx
Higher-order Link Prediction GraphExHigher-order Link Prediction GraphEx
Higher-order Link Prediction GraphExAustin Benson
 
International journal of engineering issues vol 2015 - no 1 - paper3
International journal of engineering issues   vol 2015 - no 1 - paper3International journal of engineering issues   vol 2015 - no 1 - paper3
International journal of engineering issues vol 2015 - no 1 - paper3sophiabelthome
 
The End(s) of e-Research
The End(s) of e-ResearchThe End(s) of e-Research
The End(s) of e-ResearchEric Meyer
 
Antoniou: complex systems and web
Antoniou: complex systems and webAntoniou: complex systems and web
Antoniou: complex systems and webOKFN-GR
 
A new software tool for large-scale analysis of citation networks
A new software tool for large-scale analysis of citation networksA new software tool for large-scale analysis of citation networks
A new software tool for large-scale analysis of citation networksNees Jan van Eck
 
Social Network Analysis - Lecture 4 in Introduction to Computational Social S...
Social Network Analysis - Lecture 4 in Introduction to Computational Social S...Social Network Analysis - Lecture 4 in Introduction to Computational Social S...
Social Network Analysis - Lecture 4 in Introduction to Computational Social S...Lauri Eloranta
 
Group and Community Detection in Social Networks
Group and Community Detection in Social NetworksGroup and Community Detection in Social Networks
Group and Community Detection in Social NetworksKent State University
 
Descobrindo o tesouro escondido nos seus dados usando grafos.
Descobrindo o tesouro escondido nos seus dados usando grafos.Descobrindo o tesouro escondido nos seus dados usando grafos.
Descobrindo o tesouro escondido nos seus dados usando grafos.Ana Appel
 
AI for Science Grand Challenges
AI for Science Grand ChallengesAI for Science Grand Challenges
AI for Science Grand ChallengesPFHub PFHub
 
How and why study big cultural data v2
How and why study big cultural data v2How and why study big cultural data v2
How and why study big cultural data v2Lev Manovich
 
Graphical Models 4dummies
Graphical Models 4dummiesGraphical Models 4dummies
Graphical Models 4dummiesxamdam
 

Ähnlich wie PhD defense (20)

Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)
 
Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...
Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...
Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...
 
What is digital humanities ,By: Professor Lili Saghafi
What is digital humanities ,By: Professor Lili SaghafiWhat is digital humanities ,By: Professor Lili Saghafi
What is digital humanities ,By: Professor Lili Saghafi
 
Bibliometric network analysis: Software tools, techniques, and an analysis o...
Bibliometric network analysis: Software tools, techniques, and an analysis o...Bibliometric network analysis: Software tools, techniques, and an analysis o...
Bibliometric network analysis: Software tools, techniques, and an analysis o...
 
07 Network Visualization
07 Network Visualization07 Network Visualization
07 Network Visualization
 
04 Data Visualization (2017)
04 Data Visualization (2017)04 Data Visualization (2017)
04 Data Visualization (2017)
 
Higher-order Link Prediction GraphEx
Higher-order Link Prediction GraphExHigher-order Link Prediction GraphEx
Higher-order Link Prediction GraphEx
 
International journal of engineering issues vol 2015 - no 1 - paper3
PhD defense

  • 1. A Framework for Web Object Self-Preservation A Ph.D. Defense Chuck Cartledge 30 May 2014
  • 2. A warning from Jeff Rothenberg “Digital Information Lasts Forever—Or Five Years, Whichever Comes First.” Jeff Rothenberg, Ensuring the Longevity of Digital Information, Scientific American 272 (1995), 42 - 47.
  • 3. A warning from William Arms “Tomorrow we could see the National Library of Medicine abolished by Congress, Elsevier dismantled by a corporate raider, the Royal Society declared bankrupt, or the University of Michigan Press destroyed by a meteor. All are highly unlikely, but over a long period of time unlikely events will happen.” William Y. Arms, Preservation of Scientific Serials: Three Current Examples, Journal of Electronic Publishing 5 (1999), no. 2.
  • 4. Overview • Warnings • Preservation context • Research questions • Background and related work • Unsupervised Small-World – Emergent behavior – Graph theory – Preservation • Demonstration • Questions and answers 4
  • 5. Preservation in an analog age • Benign neglect – Don’t touch – Keep away from sunshine – Keep away from moisture – Keep away from insects • Last for hundreds of years 5  Josie McClure picture taken Feb 30, 1907 at Poteau, I.T. Fifteen years of age When this was taken weighed 140 lbs.
  • 6. Preservation in a digital age 6 • Constant use – Use often – Exposure to lots of things – Make lots of copies – Monitor the integrity • Last for ??? unknown years • This is a Brave New World  Google image search, 31 March 2014, about 91,700,000 results (0.84 seconds)
  • 7. Everything has a lifespan • Exponential growth of digital artifacts • Representing increasing portion of personal and cultural heritage • Short human lifetime to manage data • Potentially, short institutional life time • Need to preserve artifacts beyond human lifespan and institutions that create and house artifacts 7 Dissertation, section 1.3
  • 8. Research questions • Can web objects (WOs) be constructed to outlive the people and institutions that created them? • Can we leverage aspects of naturally occurring networks and group behavior for preservation? 8 A WO is a digital object that lives on the Web. A WO is a fundamental element in this dissertation.
  • 9. Unsupervised Small-World (USW) is at the nexus of multiple disciplines • Mathematical structures used to model pairwise relations between objects • Ensuring that digital information of continuing value remains accessible and usable • Movement of the inanimate
  • 11. Emergent behavior: model • Craig Reynolds – basis of herd and flock behavior in computer animations – 3 rules • Collision avoidance • Velocity matching • Flock centering – No central control, everything based on local knowledge only • Simple rules – Complex behavior – Emergent behavior 11  Craig W. Reynolds, Computer Animation with Scripts and Actors, ACM SIGGRAPH, vol. 16, ACM, 1982, pp. 289 - 296.  Images http://www.red3d.com/cwr/boids/ Flock centering Velocity matching Collision avoidance
  • 12. Emergent behavior: communication • Need to know what my neighbors are doing • Need to tell neighbors what I am doing • A school of fish does not have a Dagon that controls it (Dissertation, section 5.3)
  • 14. Preservation: primitives • Replication • Emulation • Migration [figure file-format labels: png, tiff, eps, bmp] William Y. Arms, Digital Libraries, The MIT Press, December 1999
  • 15. Preservation: OAIS model • Provides standard model and terminology for archival systems • Terms of interest – Submission Information Package – Ingest – Data Management – Archival Storage – Access – Dissemination Information Package. Council of the Consultative Committee for Space Data Systems (CCSDS), Reference Model for an Open Archival Information System (OAIS), Tech. report, Consultative Committee for Space Data Systems 650.0-M-2, Magenta Book, 2012. Carl Lagoze, Herbert Van de Sompel, Pete Johnston, Michael Nelson, Robert Sanderson, and Simeon Warner, ORE User Guide - Resource Map Implementation in Atom, Tech. report, Open Archives Initiative, 2004.
  • 17. Graph theory: definitions • Graph: G = (V,E) • Graph can be connected or disconnected • Some graph metrics work only with connected graphs and not with disconnected graphs – Clustering coefficient (C(G)) – Average path length (L(G)) – Degree distribution 17  Reka Albert and Albert-Laszlo Barabasi, Statistical Mechanics of Complex Networks, Reviews of Modern Physics 74 (2002), no. 1, 47.
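The two metrics named on this slide, C(G) and L(G), can be computed directly from an adjacency map. The following is a minimal sketch, not code from the dissertation: it assumes an undirected graph stored as a dict of neighbor sets, takes C(G) to be the average of the local clustering coefficients, and takes L(G) to be the mean shortest-path length over all vertex pairs. The function names and the sample graph are illustrative only.

```python
from collections import deque


def clustering_coefficient(adj):
    """Average local clustering coefficient C(G) of an undirected graph."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # fewer than two neighbors: local coefficient counted as 0
        ordered = list(nbrs)
        # count edges that exist between pairs of neighbors of v
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if ordered[j] in adj[ordered[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length L(G); undefined for a disconnected graph."""
    n = len(adj)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != n:
            raise ValueError("L(G) is undefined for a disconnected graph")
        total += sum(dist.values())
    return total / (n * (n - 1))


# A triangle a-b-c with a pendant vertex d attached to c.
sample = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(clustering_coefficient(sample), average_path_length(sample))
```

Raising an error for a disconnected graph mirrors the slide's point that some metrics only work on connected graphs.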
  • 18. Graph theory: Watts and Strogatz small-world 18  Duncan J. Watts and Steven H. Strogatz, Collective dynamics of `small world' networks, Nature 393 (1998), 440 - 442.  Stanley Milgram, The Small-World Problem, Psychology Today 2 (1967), no. 1, 60 - 67.
  • 19. Small-world graphs are common (actual graph vs. random graph of the same size) – WECC (Western Electricity Coordinating Council): 4941 nodes, 6594 edges; actual C(G) = 0.0801, L(G) = 18.99; random C(G) = 0.00054, L(G) = 8.7 – C. elegans: 248 nodes, 511 edges; actual C(G) = 0.21, L(G) = 2.87; random C(G) = 0.05, L(G) = 2.62 – Email (Enron e-mail): 148 nodes, ~500,000 edges; actual C(G) = 0.44, L(G) = 2.25; random C(G) = 0.11, L(G) = 2.0. Ake J Holmgren, Using Graph Models to Analyze the Vulnerability of Electric Power Networks, Risk Analysis 26 (2006), no. 4, 955 - 969. Lav R Varshney, Beth L Chen, Eric Paniagua, David H Hall, and Dmitri B Chklovskii, Structural Properties of the Caenorhabditis elegans Neuronal Network, PLoS computational biology 7 (2011), no. 2, e1001066. Shinako Matsuyama and Takao Terano, Analyzing the ENRON Communication Network Using Agent-Based Simulation, Journal of Networks 3 (2008), no. 7.
  • 20. Small-world: high C(G) and low L(G) • The ubiquitous presence of small-world graphs points to something inherently “correct” and desirable about them. • Symbols: k = degree; n = order of the graph (Dissertation, section 5.2.3)
  • 21. USW is at the nexus of multiple disciplines • Creation of small-world graphs that are robust and resilient • Meet fundamental requirements of replication, migration, and data management • WO’s use of emergent behavior to create, monitor, and optimize the USW system
  • 22. Euclidean geometry • To draw a straight line from any point to any point. • To produce [extend] a finite straight line continuously in a straight line. • To describe a circle with any center and distance [radius]. • That all right angles are equal to one another. • That, if a straight line falling on two straight lines make the interior angles on the same side less than two right angles, the two straight lines, if produced indefinitely, meet on that side on which are the angles less than the two right angles. 22 The sum of angles A, B, and C is equal to 180 degrees  Euclid of Alexandria, The Elements, Alexandria, 300 BCE.
  • 23. Non-Euclidean geometries 23 • Non-Euclidean geometry – To draw a straight line from any point to any point. – To produce [extend] a finite straight line continuously in a straight line. – To describe a circle with any center and distance [radius]. – That all right angles are equal to one another. – That, if a straight line falling on two straight lines make the interior angles on the same side less than two right angles, the two straight lines, if produced indefinitely, meet on that side on which are the angles less than the two right angles. • Spherical geometry – Two lines at right angles to the same line can meet – Triangles can have 180 to 540 degrees – Circles are straight lines
  • 24. Digital library world • Digital libraries – The technical framework exists within a legal and social framework – Understanding of digital library concepts is hampered by terminology – The underlying architecture should be separate from the content stored in the library – Names and identifiers are the basic building block for the digital library – Digital library objects are more than collections of bits – The digital library object that is used is different from the stored object – Repositories must look after the information they hold – Users want intellectual works, not digital objects • Basic digital library tenets 24  Robert Kahn and Robert Wilensky, A Framework for Distributed Digital Object Services, International Journal on Digital Libraries 6 (2006), no. 2, 115 - 123.  William Y. Arms, Key Concepts in the Architecture of the Digital Library, D-Lib Magazine 1 (1995), no. 1.
  • 25. Digital library worlds of possibilities 25 • Digital libraries – The technical framework exists within a legal and social framework – Understanding of digital library concepts is hampered by terminology – The underlying architecture should be separate from the content stored in the library – Names and identifiers are the basic building block for the digital library – Digital library objects are more than collections of bits – The digital library object that is used is different from the stored object – Repositories must look after the information they hold – Users want intellectual works, not digital objects What if there were no repositories?
  • 26. “No Repositories” → USW • No global knowledge – No omnipotent enforcer – No omnipresent monitor • Opportunistic preservation • Self-describing Web Objects 26
  • 28. USW WO “friendship” links • WOs have “friendship” links to other WOs • Different than HTML navigational links 28 Dissertation, Chapter 6
  • 29. USW WO “families” A family is a set of copies of the same WO 29 Dissertation, Chapter 6
  • 30. USW hosts Family members live on different hosts 30 Host #1 Host #2 Host #3  Dissertation, Chapter 6
  • 32. USW interpretation of flocking (Craig Reynolds’ “boids” rule → USW interpretation) – Collision avoidance → Each WO has a unique URI – Velocity matching → Matching number of copies/family members – Flock centering → Move with friends to new hosts (Dissertation, Chapter 2)
  • 33. Building a USW graph 33 • Graph exploration (b) • Choosing connections • Detecting loss  Dissertation, Chapter 5
  • 34. WOs wandering in the USW graph • Wandering WO is “introduced” to an existing WO • If a connection is not made, then an attempt is made to another existing WO • Process is repeated until a connection is made • No global knowledge – No omnipotent enforcer – No omnipresent monitor • No repositories 34  Dissertation, Chapter 5
  • 35. USW friend selection process • Selection from possible sets – WOset: WOs connected to candidate WO – visitedSet: WOs that the wandering WO has explored – toBeVisitedSet: WOs that the wandering WO has discovered • Selection approaches – Random from visitedSet ∪ toBeVisitedSet – FIFO from visitedSet ∪ toBeVisitedSet – LIFO from visitedSet ∪ toBeVisitedSet – Preferentially attach to WOset then random for remaining (Dissertation, section 6.7.5)
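The first three selection approaches on this slide can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the dissertation's algorithm: the function name `select_candidates`, the `how_many` parameter, and the convention that both sets are kept as lists in discovery order are all hypothetical. The fourth approach (preferential attachment to WOset) is left out because the slide does not give enough detail to reproduce it.

```python
import random


def select_candidates(visited, to_be_visited, how_many, approach, rng=None):
    """Pick up to how_many friend candidates from visitedSet ∪ toBeVisitedSet.

    visited and to_be_visited are lists in discovery order (oldest first).
    """
    # ordered union: keeps first-seen order and drops duplicates
    pool = list(dict.fromkeys(list(visited) + list(to_be_visited)))
    count = min(how_many, len(pool))
    if approach == "random":
        rng = rng or random.Random()
        return rng.sample(pool, count)
    if approach == "fifo":
        return pool[:count]  # oldest discoveries first
    if approach == "lifo":
        return pool[::-1][:count]  # newest discoveries first
    raise ValueError(f"unknown approach: {approach}")


visited_set = ["wo1", "wo2"]
to_be_visited_set = ["wo2", "wo3", "wo4"]
print(select_candidates(visited_set, to_be_visited_set, 2, "fifo"))
print(select_candidates(visited_set, to_be_visited_set, 2, "lifo"))
```

Passing a seeded `random.Random` as `rng` makes the random approach repeatable, which helps when comparing simulation runs.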
  • 36. Comparing USW to random graphs 36  Dissertation, section 6.4
  • 38. Robustness of USW graphs • Definition: able to continue when damaged • Attack vs. failure – Intentional vs. random • Component selection – Vertex – Edge • Selection attribute – Degree – Betweenness • Attribute value – High – Low • Attack profile notation: A{D|E|V}{H|L} Charles L. Cartledge and Michael L. Nelson, Connectivity Damage to a Graph by the Removal of an Edge or Vertex, Tech. report, arXiv 1103.3075, ODU CS Dept.
  • 39. Different attack profile selections: AEH, AEL, AVH, AVL, ADH, ADL. Charles L. Cartledge and Michael L. Nelson, Connectivity Damage to a Graph by the Removal of an Edge or Vertex, Tech. report, arXiv 1103.3075, ODU CS Dept.
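As a rough illustration of the attack-profile idea, the sketch below implements only the degree-based profiles (written here as ADH and ADL): it removes the vertex with the highest or lowest degree and counts the connected components that remain. The function names, the tie-breaking rule, and the sample graph are assumptions made for this example; the edge and vertex betweenness profiles are not covered.

```python
def remove_by_degree(adj, high=True):
    """Delete the highest-degree (ADH) or lowest-degree (ADL) vertex.

    Returns the removed vertex and a new adjacency map without it.
    """
    pick = max if high else min
    # iterate in sorted order so ties are broken by vertex name
    target = pick(sorted(adj), key=lambda v: len(adj[v]))
    remaining = {v: nbrs - {target} for v, nbrs in adj.items() if v != target}
    return target, remaining


def count_components(adj):
    """Number of connected components, found by depth-first search."""
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components


# Hub h with three neighbors; a and b are also linked to each other.
graph = {"h": {"a", "b", "c"}, "a": {"h", "b"}, "b": {"h", "a"}, "c": {"h"}}
print(remove_by_degree(graph, high=True)[0], remove_by_degree(graph, high=False)[0])
```

On this small graph the high-degree attack removes the hub and splits the graph, while the low-degree attack removes a leaf and leaves the rest connected.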
  • 40. Four attacks using AEL profile 40 Deletion #1. Deletion #2. Deletion #3. Deletion #4.
  • 41. Our damage vs. Albert, Jeong, and Barabasi’s damage [figure comparing the two damage measures on example fragmentation cases; values not recoverable from the transcript] Reka Albert, Hawoong Jeong, and Albert-Laszlo Barabasi, Error and Attack Tolerance of Complex Networks, Nature 406 (2000), no. 6794, 378 - 382.
  • 42. Measuring damage • Desired characteristics – Different fragmentation cases result in different values – Useful without additional graph state information. Charles L. Cartledge and Michael L. Nelson, Connectivity Damage to a Graph by the Removal of an Edge or Vertex, Tech. report, arXiv 1103.3075, ODU CS Dept. Dissertation, Appendix D
  • 43. Local and global AE* damage 43
  • 44. Global A{DV}H damage (100 nodes) • 100 node graph • Execution time: ~36 hours • Attacker has total knowledge of the graph • Attacker has unrestricted resources to damage the graph • Results: – Small-world: the most connected is not the most valuable – Random and scale-free: degreeness does not make a difference
  • 45. Attack profile efficacy on sample graph (attack profile: what it attacks, efficacy) – AEH (edge, high): the core of the graph, 1.43 – AEL (edge, low): the periphery of the graph, 1.00 – AVH (vertex, high): the core of the graph, 1.42 – AVL (vertex, low): the periphery of the graph, 1.00 – ADH (degree, high): the core of the graph, 1.40 – ADL (degree, low): the periphery of the graph, 1.00 • If the attacker's goal is to disconnect the graph by repeated use of the same attack profile, then the most effective profiles in order are: AEH, AVH, and ADH. • HTTP/HTML does not support AE* attack profiles (Dissertation, section 5.6.6)
  • 46. Detecting loss of family members • Each “active maintainer” WO checks its family’s status – Check family member accessibility – Check friend accessibility • If family member is lost, use friends to select candidate host • If too few candidate hosts, use friends to explore and discover new hosts 46  Dissertation, section 6.8
  • 48. When to make family members? • What is a copy? • Who makes the copies? • How many to make? Answer: defined by originating domain – 0 to start – Soft lower limit (c_soft) – Hard upper limit (c_hard) • Where to make them? – Distributed across known hosts – Too many or too few hosts • When to make them? Norman Paskin, On Making and Identifying a Copy, D-Lib Magazine 9 (2003), no. 1. Henry M. Gladney and John L. Bennett, What Do We Mean by Authentic? What's the Real McCoy?, D-Lib Magazine 9 (2003), no. 7/8.
  • 49. USW preservation definitions • Hierarchy of family WOs – Progenitor – initial WO – Copies – more recent WO copies – Each WO is timestamped with creation time • WO roles – Active maintainer – eldest WO charged with making copies and related housekeeping – Passive maintainer – all other WOs • Order of precedence – If progenitor is accessible then it is the active maintainer – If declared active maintainer is accessible then it is the active maintainer – Otherwise, WO declares itself active maintainer • If family is disconnected then multiple active maintainers are possible until reconnection then the eldest WO declares itself active maintainer 49 Progenitor Copies  Dissertation, Appendix A
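The order of precedence on this slide reads naturally as a short decision function. The sketch below is an illustration under assumptions, not the reference implementation: `is_accessible` stands in for whatever reachability check a WO performs, and the names `resolve_active_maintainer` and `reconcile` are hypothetical.

```python
def resolve_active_maintainer(me, progenitor, declared_active, is_accessible):
    """Apply the slide's order of precedence from one WO's point of view."""
    if is_accessible(progenitor):
        return progenitor  # rule 1: an accessible progenitor is always active
    if declared_active is not None and is_accessible(declared_active):
        return declared_active  # rule 2: defer to the declared active maintainer
    return me  # rule 3: otherwise this WO declares itself active


def reconcile(active_claims, created_at):
    """After a split family reconnects, the eldest claimant stays active."""
    return min(active_claims, key=lambda wo: created_at[wo])


reachable = {"copy1", "copy2"}  # the progenitor is currently lost
print(resolve_active_maintainer("copy2", "progenitor", "copy1", lambda wo: wo in reachable))
print(reconcile(["copy2", "copy1"], {"copy1": 10, "copy2": 20}))
```

The second function covers the case in the last bullet, where a disconnected family can briefly have several active maintainers until the timestamps settle which one is eldest.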
  • 50. Active and passive maintenance activities • Active maintainer (the WO with the earliest timestamp) – currently charged with making copies and related housekeeping • Passive maintainer – all other WOs [Timeline figure; labels: Progenitor is lost, Progenitor returns, Progenitor declares act. as copy]
  • 51. Progenitor is lost [timeline figure, continued]
  • 52. A new active maintainer [timeline figure, continued]
  • 53. Progenitor returns and assumes active maintainer role [timeline figure, continued]
  • 54. Progenitor has made copies [Timeline figure; labels: Copy declares active, Copies created, Replacement created, Excess copies, Excess deleted]
  • 55. A copy is disconnected from the family [timeline figure, continued]
  • 56. Two active maintainers make copies [timeline figure, continued]
  • 57. Disconnected copy is reconnected to the progenitor [timeline figure, continued]
  • 58. Family has too many copies [timeline figure, continued] • Copy management policies – Active: explicit removal – Passive: “natural attrition” • Equivalent of Reynolds’ velocity matching, making and monitoring copies
  • 59. USW copying policies • Least aggressive – one at a time to c_hard • Moderately aggressive – as quickly as possible to c_soft and then one at a time to c_hard • Most aggressive – as quickly as possible to c_hard • Different results • Figure legend, WO preservation status: none; < c_soft; c_soft <= N < c_hard; N == c_hard • Figure legend, host utilization status: 0%; < 25%; < 50%; < 75%; > 75% (Dissertation, section 6.7.4)
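The three copying policies differ only in how many copies a WO asks for at each maintenance step. The sketch below is a simplified reading of the slide, with assumptions labeled: "as quickly as possible" is taken to mean requesting every missing copy in a single step, host capacity is ignored, and the function name and policy strings are hypothetical. The soft lower limit and hard upper limit are passed in as `c_soft` and `c_hard`.

```python
def copies_to_request(policy, current, c_soft, c_hard):
    """Number of new copies a WO tries to create in one maintenance step."""
    if current >= c_hard:
        return 0  # the hard upper limit has been reached
    if policy == "least":
        return 1  # one at a time, all the way to c_hard
    if policy == "moderate":
        # jump straight to the soft limit, then go one at a time
        return c_soft - current if current < c_soft else 1
    if policy == "most":
        return c_hard - current  # jump straight to the hard limit
    raise ValueError(f"unknown policy: {policy}")


for name in ("least", "moderate", "most"):
    print(name, copies_to_request(name, current=0, c_soft=3, c_hard=5))
```

In a fuller simulation the requested number would be capped by what the known hosts can actually store, which is where the policies start to produce different outcomes.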
  • 63. Least aggressive (t = 100) 63  A full YouTube video is available at: http://youtu.be/sHJGYphqtK4
  • 64. Least aggressive (final) 64 • Results – System stabilized – Host capacity limited – Some WOs without any copies – Some hosts unused • “Least aggressive” is not an effective policy
  • 65. Which policy to choose? 65 • Moderately aggressive results in an additional 18% of WOs meeting their preservation goals and makes more efficient use of limited host resources sooner • Most aggressive results in almost the same percentage of WOs meeting their goals, but places a strain on the host resources  Charles L. Cartledge and Michael L. Nelson, When Should I Make Preservation Copies of Myself?, arXiv preprint arXiv:1202.4185 (2012).
  • 66. Make new family members on new hosts • Spreading copies across hosts increases the WO’s preservation likelihood • Learn about new hosts from friends 66 Dissertation, Appendix A Reynolds’ flock centering Move with friends to new hosts
  • 67. Crowd sourcing of family member creation • “Everyone is a curator …” – Crowd sourced activity – Unscheduled – Willing to wait a long time • Enlist humans in creation and maintenance – opposite of benign neglect 67  Frank McCown, Michael L. Nelson, and Herbert Van de Sompel, Everyone is a Curator: Human-Assisted Preservation for ORE Aggregations, Proceedings of the DigCCurr 2009 (2009).
  • 68. USW simulation vs. implementation (USW theory vs. HTTP/HTML reality) – Communications: instantaneous vs. asynchronous – Edges: bidirectional vs. directional – Temporal effects: none vs. inconsistencies
  • 69. Some WO reference implementation details 69  Sawood Alam, HTTP Mailbox - Asynchronous RESTful Communication, Master's thesis, Old Dominion University, Norfolk, VA, 2013.  Carl Lagoze, Herbert Van de Sompel, Pete Johnston, Michael Nelson, Robert Sanderson, and Simeon Warner, ORE User Guide - Resource Map Implementation in Atom, Tech. report, Open Archives Initiative, 2004.  Sawood Alam, Charles L. Cartledge, and Michael L. Nelson, Support for Various HTTP Methods on the Web, Tech. Report arXiv:1405.2330 (2014). WO memory: simulated via “edit” service Direct WO to WO communication: simulated via the HTTP Mailbox
  • 70. Demonstration of the reference implementation 1. Selection of a web page to be preserved 2. Creation of a WO from the web page 3. Adding the WO to an existing USW graph a. Pages copied from flickr.com, arXiv.org, radiolab.org, and gutenburg.org b. All pages instrumented to become USW WOs 4. Creating preservation copies 5. Detecting that a copy was lost 6. Creating a replacement copy 70
  • 71. USW contributions • Expanded graph theory by creating an algorithm that creates small-world graphs based on locally collected data (chapter 6) • Developed a new way to quantify damage in connected and disconnected graphs (section 5.2) • Developed techniques to optimize when and where to create preservation copies (section 5.5) • Developed techniques to achieve emergent behavior in WOs (section 6.2)
  • 73. Preserve Me Viz! with new connections • New friend connections • New copy locations 73
  • 74. Preserve Me “Basic” on a copy • Differences between active and passive maintainers. • Active maintainer is responsible for making copies. • Passive maintainer sends alerts to the active maintainer • Passive maintainer may assume active maintainer role if active is not available. 74
  • 75. A USW instrumented splash page … <link rel="resourcemap" type="application/atom+xml;type=entry" href="http://arxiv.cs.odu.edu/rems/arxiv-0704-3647v1.xml" /> <link rel="aggregation" href="http://arxiv.cs.odu.edu/rems/arxiv-0704-3647v1.xml#aggregation" /> <script src="http://www.cs.odu.edu/~salam/wsdl/uswdo/work/preserveme.js"></script> …
  • 76. USW algorithm popup 76 • Written in JavaScript • Relies on domain services – Copy -> creates copy of a WO – Edit -> update own REM • Uses communications mechanism based on Sawood Alam’s master’s thesis
  • 77. USW Preservation: copies (1 of 2) • WO copies are not bit by bit identical to the original WO • REsource Map (REM) points to a resource – Point to the “essence” of the original – Point to local copies of the resources – Can be used to recreate the “essence” of the original • Resource has two attributes: – Size – Update frequency 77
  • 81. Graph theory: scale free graph 81
  • 83. Graph theory: Watts and Strogatz small-world graph
  • 84. Comparing graphs • Small-world graphs occur in natural and man-made systems • Small-world graphs are robust • How to algorithmically and incrementally create small-world graphs? • Symbols: K = degree; <k> = average degree; N = order of the graph
  • 85. Quantifying damage • All graph components are not equally valuable • How to identify the most valuable component • Greedy repair is the obverse of finding the most damaging component: it identifies where to place the most beneficial component. Charles L. Cartledge and Michael L. Nelson, Connectivity Damage to a Graph by the Removal of an Edge or Vertex, Tech. report, arXiv 1103.3075, ODU CS Dept.
  • 86. Global normalized A{DV}H damage (40 - 750 nodes) • Arithmetic series of possible solutions • Early attacks are most effective, later attacks are incrementally effective 86
  • 87. Long term growth analysis of USW graph Based on the idea of a game – Create the graph – Attack the graph using AVH profile to remove 10% of the WOs – Repair the graph, every surviving WO gets 2 opportunities (may be unsuccessful in repair attempts) – Repeat until steady state 87
  • 88. Possible paths in attack/repair game 88  Dissertation section 6.9
  • 89. USW copies: famine to feast 89
  • 90. Final states for copying policies and named conditions 90  Dissertation, Appendix H
  • 91. Host capacity and WO desires [figure; named conditions: Famine, Feast, Straddle, B.Low, B.High] (Dissertation, Appendix H)
  • 92. Man-made small-world graph: Western Electricity Coordinating Council • 4941 nodes, 6594 edges; actual C(G) = 0.0801, L(G) = 18.99; random graph C(G) = 0.00054, L(G) = 8.7. Ake J Holmgren, Using Graph Models to Analyze the Vulnerability of Electric Power Networks, Risk Analysis 26 (2006), no. 4, 955 - 969.
  • 93. Naturally occurring small-world graph: C. elegans nematode (Caenorhabditis elegans) • 248 nodes, 511 edges; actual C(G) = 0.21, L(G) = 2.87; random graph C(G) = 0.05, L(G) = 2.62. Lav R Varshney, Beth L Chen, Eric Paniagua, David H Hall, and Dmitri B Chklovskii, Structural Properties of the Caenorhabditis elegans Neuronal Network, PLoS computational biology 7 (2011), no. 2, e1001066.
  • 94. Organic small-world graph: Enron e-mail • 148 nodes, ~500,000 edges; actual C(G) = 0.44, L(G) = 2.25; random graph C(G) = 0.11, L(G) = 2.0. Shinako Matsuyama and Takao Terano, Analyzing the ENRON Communication Network Using Agent-Based Simulation, Journal of Networks 3 (2008), no. 7.
  • 95. USW WO determines number of friends • Number of connections • Symbols: ln, log2 = natural and base 2 logarithms; n = order of the discovered USW graph; g = simple scalar (Dissertation, section 6.7)
  • 97. Active maintenance activities 97  Dissertation, Appendix B
  • 98. Passive maintenance activities 98  Dissertation, Appendix B
  • 99. Video URLs • USW video – http://youtu.be/JnCMenp73YQ • Least Aggressive – http://youtu.be/sHJGYphqtK4 • Moderately Aggressive – https://www.youtube.com/watch?v=pVI-VhPh7KQ • Most Aggressive – https://www.youtube.com/watch?v=eIXz8Njh-QM • “Death Star” message histogram – https://www.youtube.com/watch?v=X3EShyjFoc4 • “Traditional” message histogram – https://www.youtube.com/watch?v=9CcCup3Td-Q 99
  • 100. Useful URIs • Flickr – https://www.flickr.com/ • Flickr on cs.odu.edu – http://flickr.cs.odu.edu/ • Adding image – http://ws-dl-02.cs.odu.edu:10102/rem/generate/0.8/0.95/ • Preserve Me! Viz – http://www.cs.odu.edu/~salam/preserveme/viz.html • Delete REM – http://ws-dl-02.cs.odu.edu:10102/rem/remove/http://flickr.cs.odu.edu/rems/flickr-24791103-N07-6661587389.xml • Sawood on flickr – https://www.flickr.com/photos/122128913@N05 • Chuck on flickr – https://www.flickr.com/photos/24791103@N07/ • Court de Tomas De Torquemada on flickr – https://www.flickr.com/photos/24791103@N07/12867674403/