Preserving a Web of Linked Data: Lessons and challenges from a fading web
1. 03/06/2018 Preserving a Web of Linked Data - Lessons and challenges from a fading Web
https://mielvds.github.io/MEPDaW2018/#1 1/60
Preserving a
Web of Linked Data
Lessons and challenges from a fading Web
Miel Vander Sande
Ghent University – imec
2.
There are many sides
to preservation.
3.
Web of
Linked Data?
8.
“
We are losing thousands of Alexandria
libraries each day
We have lost so much of the early Web history, just
as we have lost so much of early Human history.
—Kalev H. Leetaru - University of Illinois
9.
The forces of decay
Link Rot
Content Drift
Digital Preservation Business Case Toolkit http://wiki.dpconline.org/
10.
Link Rot
Illustration by the Project Twins
12.
Content Drift
Significant change in content
within a 3-Month Period
16.
Preserving a Web of Linked Data
1. Yesterday: Web archiving strategies
2. Today: Tools for a Web of Linked Data
3. Tomorrow: Things to keep in mind
18.
Strategies
Observational: perceived as discrete
Snapshot
Web archive
Historical: perceived as continuous
Versioning systems
Transactional
Notification-based
19.
Snapshot
20.
Web archive
See: Open Wayback
21.
Versioning systems
See: MediaWiki
22.
Transactional
See: SiteStory Apache plugin
23.
If a representation
changes and nobody is
around to see it,
should it be archived?
24.
Notification-based
25.
Memento: travelling to the Web of the Past
https://tools.ietf.org/html/rfc7089
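As a rough illustration of the datetime negotiation that RFC 7089 describes, the sketch below formats a requested moment as an Accept-Datetime header value and picks the nearest archived version from a list. It makes no network requests: the MEMENTOS list, the archive.example URIs and both function names are assumptions made for this example, and a real TimeGate may select versions differently.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime(moment):
    """Format an aware datetime as the HTTP-date used in Accept-Datetime."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def closest_memento(mementos, requested):
    """Return the (datetime, URI) pair nearest in time to the request."""
    return min(mementos, key=lambda memento: abs(memento[0] - requested))


# Hypothetical archived versions of one resource (placeholder URIs).
MEMENTOS = [
    (datetime(2016, 3, 1, tzinfo=timezone.utc),
     "http://archive.example/20160301/resource"),
    (datetime(2017, 6, 15, tzinfo=timezone.utc),
     "http://archive.example/20170615/resource"),
    (datetime(2018, 1, 10, tzinfo=timezone.utc),
     "http://archive.example/20180110/resource"),
]

requested = datetime(2017, 5, 1, tzinfo=timezone.utc)
headers = {"Accept-Datetime": accept_datetime(requested)}
chosen = closest_memento(MEMENTOS, requested)
print(headers["Accept-Datetime"], "->", chosen[1])
```

A client would send the header to a TimeGate for the original resource; here the selection step is simulated locally to keep the example self-contained.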
28.
Preserving a Web of Linked Data
1. Yesterday: Web archiving strategies
2. Today: Tools for a Web of Linked Data
3. Tomorrow: Things to keep in mind
29.
Archive or
Archiving?
30.
Linked Data archiving as the product
RDF indexes for versioning
Dydra, Virtuoso, XRDF3X, ...
Representations of versions, provenance & time:
PROV, LDPatch, LODE, ...
Technical
(Increasingly) Popular research tracks.
31.
Linked Data archiving as the process
Some technological building blocks
Linked Data interfaces, change detection, publishing,
crawling & querying
Technical, as well as Infrastructural & Societal.
Rather unknown territory (but there are technologies).
32.
What assumptions are there about data
evolution?
Historical Data
Provenance is a timeline.
Only one truth can exist at the same time.
Timeseries databases, Wikipedia
Versioned Data
Provenance is a directed acyclic graph.
Multiple truths can exist at the same time.
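The difference between the two assumptions can be sketched with a small provenance structure: each version lists the versions it derives from. A timeline has exactly one current version and no merges; a directed acyclic graph may have several current versions at once. The version names, the `parents` mapping and both helper functions are hypothetical and only serve this illustration; the mapping is assumed to be acyclic.

```python
def heads(parents):
    """Versions that no other version derives from: the current 'truths'."""
    referenced = {p for derived_from in parents.values() for p in derived_from}
    return sorted(version for version in parents if version not in referenced)


def is_timeline(parents):
    """True when the history is one linear chain (no merges, one head)."""
    no_merges = all(len(derived_from) <= 1 for derived_from in parents.values())
    return no_merges and len(heads(parents)) == 1


# Historical data: every version replaces exactly one predecessor.
HISTORICAL = {"v1": [], "v2": ["v1"], "v3": ["v2"]}

# Versioned data: two branches derived from v1 coexist.
VERSIONED = {"v1": [], "branch-a": ["v1"], "branch-b": ["v1"]}

print(is_timeline(HISTORICAL), heads(HISTORICAL))
print(is_timeline(VERSIONED), heads(VERSIONED))
```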
33.
Decay becomes more complex
Link Rot
Content Drift
Concept Drift
"Please don't change your vocabulary"
(Check out DRIFT-A-LOD workshop)
Problem in other domains as well (Machine Learning)
34.
Study these issues within Linked Data
Link Rot
Subject or Object cannot be dereferenced
Dataset/Interface is gone
Content Drift
Context graph of Subject or Object has changed
Concept Drift
Predicate or Object changes meaning
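A minimal sketch of how the first two forms of decay could be told apart when two snapshots of a dataset are available. Triples are plain Python tuples, and "cannot be dereferenced" is approximated by the resource having no triples left in the later snapshot; both are simplifying assumptions, as are the ex: identifiers and function names. Concept Drift is not covered, since a change of meaning is not visible from the triples alone.

```python
def context_graph(triples, resource):
    """All triples in which the resource appears as subject or object."""
    return {t for t in triples if resource in (t[0], t[2])}


def classify_decay(resource, earlier, later):
    """Compare the context graph of a resource across two snapshots."""
    before = context_graph(earlier, resource)
    after = context_graph(later, resource)
    if before and not after:
        return "link rot"
    if before != after:
        return "content drift"
    return "stable"


# Hypothetical snapshots of the same dataset at two points in time.
EARLIER = {
    ("ex:alice", "ex:worksAt", "ex:ugent"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:carol", "ex:name", "Carol"),
}
LATER = {
    ("ex:alice", "ex:worksAt", "ex:imec"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:carol", "ex:name", "Carol"),
}

for resource in ("ex:alice", "ex:bob", "ex:carol"):
    print(resource, classify_decay(resource, EARLIER, LATER))
```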
35.
Archiving for the
Reproducibility of Query results
Sustain the validity of claims
Backwards compatibility of applications
Federated querying is highly affected
How to shape a decentralized Quality of Service?
The Hyperlink is the simplest form of decentralization,
which we are already failing to preserve.
36.
Persistent Identification
Figure by Herbert Van de Sompel
37.
Persistent Identification
Dependency on publisher registering the PIDs
Possible loss of connection between PIDs and the
original
Dependency on the PID provider
Possibly replacing one potential Link rot problem by
another
38.
Who are you to tell me my URI is not
persistent?
ISWC Resources track:
Consensus on and trust in persistence in a decentralized
Web:
community-driven? standardization? blockchain,...?
39.
Robust links
<a href="B"
   data-versionurl="URL of snapshot of B"
   data-versiondate="datetime of snapshot of B">link text</a>
http://robustlinks.mementoweb.org/spec/
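To show how such attributes could be consumed, the sketch below uses Python's standard html.parser to collect anchors that carry data-versionurl or data-versiondate. The page fragment, the placeholder URLs and the RobustLinkParser class name are assumptions for this example; it is not a reference implementation of the specification.

```python
from html.parser import HTMLParser


class RobustLinkParser(HTMLParser):
    """Collect anchors that carry Robust Links attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "data-versionurl" in attributes or "data-versiondate" in attributes:
            self.links.append({
                "original": attributes.get("href"),
                "snapshot": attributes.get("data-versionurl"),
                "datetime": attributes.get("data-versiondate"),
            })


# Hypothetical page fragment: one robust link and one plain link.
PAGE = (
    '<p>See <a href="http://example.org/b" '
    'data-versionurl="http://archive.example/20180603/b" '
    'data-versiondate="2018-06-03">B</a> and '
    '<a href="http://example.org/c">C</a>.</p>'
)

parser = RobustLinkParser()
parser.feed(PAGE)
print(parser.links)
```

If the original URL stops resolving, a client holding this information can fall back to the snapshot URL, or ask an archive for a version close to the recorded datetime.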
40.
Robust Links
Open Annotation
& Memento vocab
Can be linked
to PROV
Figure by Herbert Van de Sompel
41.
Open challenges with Memento
Real-time data: HTTP Datetime format is per second
Parallel truths: No solution for accessing Versioned Data
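The per-second limitation can be illustrated with Python's standard library: two versions created 400 milliseconds apart serialize to the same HTTP-date, so a datetime header alone cannot tell them apart. The two timestamps are made up for this example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Two hypothetical versions of a real-time resource, 400 ms apart.
first = datetime(2018, 6, 3, 12, 0, 0, 100000, tzinfo=timezone.utc)
second = first + timedelta(milliseconds=400)

# The HTTP-date format carries no sub-second component.
first_header = format_datetime(first, usegmt=True)
second_header = format_datetime(second, usegmt=True)

print(first_header)
print(second_header)
print("distinguishable:", first_header != second_header)
```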
42.
Who will be responsible for archiving?
Publisher
Snapshot
Versioning systems
3rd party
Traditional
Hybrid: Publisher and/or 3rd party
Transactional
Notification-based
43.
Snapshot
Often "End of Term" archive (DBpedia version)
Exchangeable archives, e.g. file-based HDT
46.
Versioning systems
Significant progress in the RDF domain
Web: MediaWiki
RDF:
Storage: Dydra, Virtuoso, ...
Memento-supported publishing: DBpedia Wayback machine, Linked Data Fragments Server
Memento support can improve; depends on query expressivity
47.
Hybrid: Snapshot + Versioning
Discrete snapshots + index for continuous versions
Linked Data pages: Tailr, ...
Triple Patterns: Ostrich (offset-enabled), ...
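The general idea of combining discrete snapshots with a record of the changes in between can be sketched as follows. This is a toy in-memory model under stated assumptions (a full snapshot every `interval` versions, added and removed triples for the rest); the HybridArchive class and its methods are hypothetical and do not reflect how Tailr or Ostrich are actually implemented.

```python
class HybridArchive:
    """Store a full snapshot every `interval` versions, deltas in between."""

    def __init__(self, interval=3):
        self.interval = interval
        self.snapshots = {}  # version number -> frozenset of triples
        self.deltas = {}     # version number -> (added, removed)
        self.latest = frozenset()
        self.count = 0

    def commit(self, triples):
        """Record a new version and return its version number."""
        triples = frozenset(triples)
        if self.count % self.interval == 0:
            self.snapshots[self.count] = triples
        else:
            self.deltas[self.count] = (triples - self.latest,
                                       self.latest - triples)
        self.latest = triples
        self.count += 1
        return self.count - 1

    def materialize(self, version):
        """Rebuild a version from the nearest earlier snapshot plus deltas."""
        if not 0 <= version < self.count:
            raise KeyError(version)
        base = version - version % self.interval
        triples = set(self.snapshots[base])
        for number in range(base + 1, version + 1):
            added, removed = self.deltas[number]
            triples = (triples - removed) | added
        return triples


# Hypothetical evolution of a tiny dataset over five versions.
VERSIONS = [
    {("ex:a", "ex:p", "1")},
    {("ex:a", "ex:p", "1"), ("ex:b", "ex:p", "2")},
    {("ex:b", "ex:p", "2"), ("ex:c", "ex:p", "3")},
    {("ex:c", "ex:p", "3")},
    {("ex:c", "ex:p", "4")},
]

archive = HybridArchive(interval=3)
for triples in VERSIONS:
    archive.commit(triples)
print(archive.materialize(2))
```

The interval trades storage for lookup cost: more frequent snapshots mean fewer deltas to replay per request, at the price of storing more full copies.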
48.
Web archive
Not much in place yet
Indexes, but no notion of time
Sindice, LODCache, LODLaundromat
Many technologies
Targeted crawling, Sindice, LODLaundromat, Linked Data Crawling, ...
No guarantees on completeness
49.
Transactional
Decentralized, sustainable solution
A challenge for completeness
Dependence on resource granularity
e.g. SPARQL results or Linked Data pages?
Interested to see how far we would get...
50.
Notification-based
51.
Preserving a Web of Linked Data
1. Yesterday: Web archiving strategies
2. Today: Tools for a Web of Linked Data
3. Tomorrow: Things to keep in mind
52.
Data archiving interests more than curators & activists
For instance, data-driven journalism.
Product: transparency of the editorial process
Process: interaction with users, public
Scholarly communication, cultural heritage, legal publications, community databases (Wikipedia & Wikidata)
53.
Archivability of Linked Data
54.
Linked Data is in essence easier to archive.
Raw, self-contained data
Already machine processable/understandable
No obfuscation by client-side scripting
55.
“
Accessibility of content to stimulate
archiving.
The content in HTML+RDFa that dokieli produces is
accessible (readable) without requiring any CSS or
JavaScript, ie. text-browser safe. Breaking this
"rule" in future development should be considered
an anti-pattern (or a bug) in dokieli.
—dokieli documentation, Sarven Capadisli
56.
Choices in Linked Data interface increase or decrease archiving.
Interface offered by the server, from Intelligent Client to Intelligent Server: data dump, Triple Pattern Fragments, SPARQL endpoint
High resource granularity
Data not as accessible
Need to participate in archiving process
57.
Prevent mistakes from the past in
standardization
Query interfaces: what can be archived?
Protocols: is it accessible?
Domain Modeling: can the semantics be preserved?
How to select the subgraph?
58.
Preserving a Web of Linked Data
1. Yesterday: Web archiving strategies
2. Today: Tools for a Web of Linked Data
3. Tomorrow: Things to keep in mind
59.
There are many sides
to preservation.
We don't start from scratch,
many technologies are there.
Start covering the uncovered sides.
Add archiving to the discussion.
60.
Preserving a Web of Linked
Data
Lessons and challenges from a fading Web
Miel Vander Sande
Ghent University – imec