03/06/2018 Preserving a Web of Linked Data - Lessons and challenges from a fading Web
https://mielvds.github.io/MEPDaW2018/#1 1/60
Preserving a
Web of Linked Data
Lessons and challenges from a fading Web
Miel Vander Sande
Ghent University – imec
There are many sides
to preservation.
Web of
Linked Data?
“We are losing thousands of Alexandria
libraries each day.
We have lost so much of the early Web history, just
as we have lost so much of early Human history.”
—Kalev H. Leetaru - University of Illinois
The forces of decay
Link Rot
Content Drift
Digital Preservation Business Case Toolkit http://wiki.dpconline.org/
Link Rot
Illustration by the Project Twins
Content Drift
Significant change in content
within a 3-Month Period
Preserving a Web of Linked Data
1. Yesterday: Web archiving strategies
2. Today: Tools for a Web of Linked Data
3. Tomorrow: Things to keep in mind
Strategies
Observational: perceived as discrete
    Snapshot
    Web archive
Historical: perceived as continuous
    Versioning systems
    Transactional
    Notification-based
Snapshot
Web archive
See: OpenWayback
Versioning systems
See: MediaWiki
Transactional
See: SiteStory Apache plugin
If a representation
changes and nobody is
around to see it,
should it be archived?
Notification-based
Memento: travelling to the Web of the
Past
https://tools.ietf.org/html/rfc7089
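As a rough illustration of the datetime negotiation that RFC 7089 describes, the sketch below builds the `Accept-Datetime` request header a client would send and parses the `Memento-Datetime` header of a response. The function names are hypothetical and no network request is made; only the two header names come from the RFC.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime_header(moment: datetime) -> dict:
    """Build the Accept-Datetime request header for an aware datetime."""
    # Memento datetimes are HTTP-dates expressed in GMT.
    moment_utc = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(moment_utc, usegmt=True)}


def memento_datetime(response_headers: dict) -> datetime:
    """Parse the Memento-Datetime response header into an aware datetime."""
    return parsedate_to_datetime(response_headers["Memento-Datetime"])


# Ask for the state of a resource as it was at noon on the day of this talk.
request_headers = accept_datetime_header(
    datetime(2018, 6, 3, 12, 0, 0, tzinfo=timezone.utc)
)
```

A client would attach `request_headers` to a request for the original resource's TimeGate and read `Memento-Datetime` from the archived copy it is redirected to.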
2. Today: Tools for a Web of Linked Data
Archive or
Archiving?
Linked Data archiving as the product
RDF indexes for versioning
Dydra, Virtuoso, XRDF3X, ...
Representations of versions, provenance & time:
PROV, LDPatch, LODE, ...
Technical
(Increasingly) Popular research tracks.
Linked Data archiving as the process
Some technological building blocks
Linked Data interfaces, change detection, publishing,
crawling & querying
Technical, as well as Infrastructural & Societal.
Rather unknown territory (but there are technologies).
What assumptions are there about data
evolution?
Historical Data
Provenance is a timeline.
Only one truth can exist at the same time.
Timeseries databases, Wikipedia
Versioned Data
Provenance is a directed acyclic graph.
Multiple truths can exist at the same time.
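The two assumptions can be contrasted with a small sketch. Provenance is modeled here as a mapping from each version to the versions it derives from; the names `Provenance`, `heads`, and the version labels are illustrative assumptions, not part of any vocabulary mentioned in the talk.

```python
from typing import Dict, List, Set

# Each version maps to the list of versions it was derived from (its parents).
Provenance = Dict[str, List[str]]


def heads(provenance: Provenance) -> Set[str]:
    """Return the versions that no other version derives from."""
    derived_from = {parent for parents in provenance.values() for parent in parents}
    return set(provenance) - derived_from


# Historical data: a timeline in which every version has at most one parent and one child.
timeline: Provenance = {"v1": [], "v2": ["v1"], "v3": ["v2"]}

# Versioned data: a directed acyclic graph with a branch and two open heads.
branched: Provenance = {"v1": [], "v2a": ["v1"], "v2b": ["v1"]}

current_truths_timeline = heads(timeline)
current_truths_branched = heads(branched)
```

A timeline always has a single head, so "the current version" is well defined; a branched graph can have several heads, which is what "multiple truths at the same time" amounts to.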
Decay becomes more complex
Link Rot
Content Drift
Concept Drift
"Please don't change your vocabulary"
(Check out DRIFT-A-LOD workshop)
Problem in other domains as well (Machine Learning)
Study these issues within Linked Data
Link Rot
Subject or Object cannot be dereferenced
Dataset/Interface is gone
Content Drift
Context graph of Subject or Object has changed
Concept Drift
Predicate or Object changes meaning
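A minimal sketch of how the first two kinds of decay might be checked on triples, assuming two snapshots of a dataset are available as sets of `(subject, predicate, object)` strings and that a set of currently dereferenceable URIs has been collected separately. All function names and the `example.org` URIs are hypothetical. Concept drift is left out on purpose: a change in meaning is not visible from the triples alone.

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]


def context_graph(triples: Set[Triple], resource: str) -> FrozenSet[Triple]:
    """All triples in which the resource appears as subject or object."""
    return frozenset(t for t in triples if resource in (t[0], t[2]))


def has_content_drift(old: Set[Triple], new: Set[Triple], resource: str) -> bool:
    """True when the context graph of the resource differs between snapshots."""
    return context_graph(old, resource) != context_graph(new, resource)


def rotten_links(triples: Set[Triple], dereferenceable: Set[str]) -> Set[str]:
    """Subject and object URIs that can no longer be dereferenced."""
    terms = {t[0] for t in triples} | {t[2] for t in triples}
    return {
        term for term in terms
        if term.startswith("http") and term not in dereferenceable
    }


old_snapshot = {("http://example.org/a", "http://example.org/knows", "http://example.org/b")}
new_snapshot = old_snapshot | {("http://example.org/a", "http://example.org/name", "Alice")}
```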
Archiving for the
Reproducibility of Query results
Sustain the validity of claims
Backwards compatibility of applications
Federated querying is highly affected
How to shape a decentralized Quality of Service?
The Hyperlink is the simplest form of decentralization,
which we are already failing to preserve.
Persistent Identification
Figure by Herbert Van de Sompel
Persistent Identification
Dependency on publisher registering the PIDs
Possible loss of connection between PIDs and the
original
Dependency on the PID provider
Possibly replacing one potential Link rot problem by
another
Who are you to tell me my URI is not
persistent?
ISWC Resources track:
Consensus on and trust in persistence in a decentralized
Web:
community-driven? standardization? blockchain,...?
Robust links
<a href="B"
   data-versionurl="URL of snapshot of B"
   data-versiondate="datetime of snapshot of B">link text</a>
http://robustlinks.mementoweb.org/spec/
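The markup above could be generated along these lines. This is a sketch only: the function name and the example URLs are hypothetical, and serializing the snapshot datetime as ISO 8601 in UTC is an assumption to check against the Robust Links specification.

```python
from datetime import datetime, timezone
from html import escape


def robust_link(href: str, snapshot_url: str, snapshot_time: datetime, text: str) -> str:
    """Render an anchor decorated with the two Robust Links data- attributes."""
    # Assumption: the snapshot datetime is written as ISO 8601 in UTC.
    stamp = snapshot_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f'<a href="{escape(href, quote=True)}" '
        f'data-versionurl="{escape(snapshot_url, quote=True)}" '
        f'data-versiondate="{stamp}">{escape(text)}</a>'
    )


anchor = robust_link(
    "http://example.org/b",
    "http://archive.example.org/20180603/b",
    datetime(2018, 6, 3, 12, 0, 0, tzinfo=timezone.utc),
    "B",
)
```

The original `href` stays untouched, so the link keeps working as a normal hyperlink while carrying enough information to fall back on the snapshot.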
Robust Links
Open Annotation
& Memento vocab
Can be linked
to PROV
Figure by Herbert Van de Sompel
Open challenges with Memento
Real-time data
    HTTP Datetime format is per second
Parallel truths
    No solution for accessing Versioned Data
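The per-second limitation can be made concrete with a short sketch. It formats two distinct moments, half a second apart, as HTTP-dates using Python's standard library; the helper name `http_date` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def http_date(moment: datetime) -> str:
    """Format an aware datetime as an HTTP-date, which has no sub-second field."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


first = datetime(2018, 6, 3, 12, 0, 0, 100_000, tzinfo=timezone.utc)
second = first + timedelta(milliseconds=500)

# Two distinct versions, half a second apart, collapse to one HTTP-date.
same_header = http_date(first) == http_date(second)
```

For data that changes several times per second, two versions can therefore end up with the same datetime in a header that uses this format.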
Who will be responsible for archiving?
Publisher
    Snapshot
    Versioning systems
3rd party
    Traditional
Hybrid: Publisher and/or 3rd party
    Transactional
    Notification-based
Snapshot
Often "End of Term" archive (DBpedia version)
Exchangeable archives, e.g. file-based HDT
Versioning systems
Significant progress in the RDF domain
Web
    MediaWiki
    Memento support can improve
RDF
    Storage: Dydra, Virtuoso, ...
    Memento-supported publishing: DBpedia Wayback machine, Linked Data Fragments Server
    depends on query expressivity
Hybrid: Snapshot + Versioning
Discrete snapshots + index for continuous versions
Linked Data pages
    Tailr, ...
Triple Patterns
    Ostrich (offset-enabled), ...
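The general idea of combining a discrete snapshot with a chain of changes can be sketched as follows. This is not the storage layout of Tailr or Ostrich; it is a deliberately naive illustration, and the names `Delta` and `materialize` are assumptions.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Delta = Tuple[Set[Triple], Set[Triple]]  # (additions, deletions)


def materialize(snapshot: Set[Triple], deltas: List[Delta], version: int) -> Set[Triple]:
    """Rebuild a version by replaying the first `version` deltas on the snapshot."""
    if not 0 <= version <= len(deltas):
        raise ValueError("unknown version")
    triples = set(snapshot)
    for additions, deletions in deltas[:version]:
        triples -= deletions
        triples |= additions
    return triples


snapshot = {("ex:a", "ex:name", "Alice")}
deltas = [
    ({("ex:a", "ex:age", "30")}, set()),                          # version 1 adds a triple
    ({("ex:a", "ex:age", "31")}, {("ex:a", "ex:age", "30")}),     # version 2 replaces it
]
latest = materialize(snapshot, deltas, 2)
```

Version 0 is the snapshot itself; every later version costs one delta replay more, which is the trade-off an index over the versions is meant to soften.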
Web archive
Not much in place yet
Indexes, but no notion of time
Sindice, LODCache, LODLaundromat
Many technologies
targeted crawling, Sindice, LODLaundromat, Linked Data Crawling, ...
No guarantees on completeness
Transactional
Decentralized, sustainable solution
A challenge for completeness
Dependence on resource granularity
e.g. SPARQL results or Linked Data pages?
Interested to see how far we would get...
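A minimal sketch of the transactional idea, under the assumption that the server hands every response it serves to an archive, which stores it only when the content has changed. The class below is hypothetical and in-memory; it is not how SiteStory or any other existing tool is implemented.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, List, Tuple

Version = Tuple[datetime, str, bytes]  # (time served, content hash, body)


class TransactionalArchive:
    """Hypothetical in-memory store fed by the server on every response."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}

    def record(self, uri: str, body: bytes, served_at: datetime) -> bool:
        """Archive the served representation if it differs from the last one."""
        digest = hashlib.sha256(body).hexdigest()
        history = self._versions.setdefault(uri, [])
        if history and history[-1][1] == digest:
            return False  # unchanged since the last request, nothing to store
        history.append((served_at, digest, body))
        return True

    def datetimes(self, uri: str) -> List[datetime]:
        """Datetimes at which a new version of the resource was observed."""
        return [served_at for served_at, _, _ in self._versions.get(uri, [])]


archive = TransactionalArchive()
now = datetime(2018, 6, 3, 12, 0, 0, tzinfo=timezone.utc)
stored_first = archive.record("http://example.org/r", b"v1", now)
stored_again = archive.record("http://example.org/r", b"v1", now)
stored_changed = archive.record("http://example.org/r", b"v2", now)
```

Both concerns from the slide show up here: a version that nobody requests is never recorded, and what counts as `uri` (a SPARQL result or a Linked Data page) decides what can be replayed later.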
Notification-based
3. Tomorrow: Things to keep in mind
Data archiving interests more than curators
& activists
For instance, data-driven journalism.
Product: transparency of the editorial process
Process: interaction with users, public
Scholarly communication, cultural heritage, legal
publications, community databases (Wikipedia &
Wikidata)
Archivability of Linked Data
Linked Data is in essence easier to archive.
Raw, self-contained data
Already machine processable/understandable
No obfuscation by client-side scripting
Accessibility of content to stimulate
archiving.
“The content in HTML+RDFa that dokieli produces is
accessible (readable) without requiring any CSS or
JavaScript, ie. text-browser safe. Breaking this
"rule" in future development should be considered
an anti-pattern (or a bug) in dokieli.”
—dokieli documentation, Sarven Capadisli
Choices in Linked Data interface
increase or decrease archiving.
High resource granularity
Data not as accessible
Need to participate in archiving process
[Figure: interface offered by the server, from data dump through Triple Pattern Fragments to SPARQL endpoint, on an axis between Intelligent Client and Intelligent Server]
Prevent mistakes from the past in
standardization
Query interfaces: what can be archived?
Protocols: is it accessible?
Domain Modeling: can the semantics be preserved?
How to select the subgraph?
There are many sides
to preservation.
We don't start from scratch,
many technologies are there.
Start covering the uncovered sides.
Add archiving to the discussion.

Preserving a Web of Linked Data: Lessons and challenges from a fading web

  • 1. 03/06/2018 Preserving a Web of Linked Data - Lessons and challenges from a fading Web https://mielvds.github.io/MEPDaW2018/#1 1/60 Preserving a Web of Linked Data Lessons and challenges from a fading Web Miel Vander Sande Ghent University – imec
  • 2. 03/06/2018 Preserving a Web of Linked Data - Lessons and challenges from a fading Web https://mielvds.github.io/MEPDaW2018/#1 2/60 There are many sides to preservation.
  • 3. 03/06/2018 Preserving a Web of Linked Data - Lessons and challenges from a fading Web https://mielvds.github.io/MEPDaW2018/#1 3/60 Web of Linked Data?
  • 4. 03/06/2018 Preserving a Web of Linked Data - Lessons and challenges from a fading Web https://mielvds.github.io/MEPDaW2018/#1 4/60
  • 5. 03/06/2018 Preserving a Web of Linked Data - Lessons and challenges from a fading Web https://mielvds.github.io/MEPDaW2018/#1 5/60
  • 6. 03/06/2018 Preserving a Web of Linked Data - Lessons and challenges from a fading Web https://mielvds.github.io/MEPDaW2018/#1 6/60
  • 7. 03/06/2018 Preserving a Web of Linked Data - Lessons and challenges from a fading Web https://mielvds.github.io/MEPDaW2018/#1 7/60
  • 8. 03/06/2018 Preserving a Web of Linked Data - Lessons and challenges from a fading Web https://mielvds.github.io/MEPDaW2018/#1 8/60 “ We are loosing thousands of Alexandria libraries each day We have lost so much of the early Web history, just as we have lost so much of early Human history. —Kalev H. Leetaru - University of Illinois
  • 9. The forces of decay: Link Rot, Content Drift. (Digital Preservation Business Case Toolkit, http://wiki.dpconline.org/)
  • 10. Link Rot (illustration by the Project Twins)
  • 12. Content Drift: significant change in content within a 3-month period.
  • 16. Preserving a Web of Linked Data: (1) Yesterday: Web archiving strategies; (2) Today: Tools for a Web of Linked Data; (3) Tomorrow: Things to keep in mind.
  • 17. Preserving a Web of Linked Data: (1) Yesterday: Web archiving strategies; (2) Today: Tools for a Web of Linked Data; (3) Tomorrow: Things to keep in mind.
  • 18. Strategies. Observational (perceived as discrete): Snapshot, Web archive. Historical (perceived as continuous): Versioning systems, Transactional, Notification-based.
  • 19. Snapshot
  • 20. Web archive (see: Open Wayback)
  • 21. Versioning systems (see: MediaWiki)
  • 22. Transactional (see: SiteStory Apache plugin)
  • 23. If a representation changes and nobody is around to see it, should it be archived?
  • 24. Notification-based
  • 25. Memento: travelling to the Web of the Past. https://tools.ietf.org/html/rfc7089
  • 28. Preserving a Web of Linked Data: (1) Yesterday: Web archiving strategies; (2) Today: Tools for a Web of Linked Data; (3) Tomorrow: Things to keep in mind.
  • 29. Archive or Archiving?
  • 30. Linked Data archiving as the product. RDF indexes for versioning: Dydra, Virtuoso, XRDF3X, ... Representations of versions, provenance & time: PROV, LDPatch, LODE, ... Technical. (Increasingly) popular research tracks.
  • 31. Linked Data archiving as the process. Some technological building blocks: Linked Data interfaces, change detection, publishing, crawling & querying. Technical, as well as infrastructural & societal. Rather unknown territory (but there are technologies).
  • 32. What assumptions are there about data evolution? Historical Data: provenance is a timeline; only one truth can exist at the same time (timeseries databases, Wikipedia). Versioned Data: provenance is a directed acyclic graph; multiple truths can exist at the same time.
  • 33. Decay becomes more complex: Link Rot, Content Drift, Concept Drift. "Please don't change your vocabulary" (check out the DRIFT-A-LOD workshop). Problem in other domains as well (Machine Learning).
  • 34. Study these issues within Linked Data. Link Rot: Subject or Object cannot be dereferenced; Dataset/Interface is gone. Content Drift: context graph of Subject or Object has changed. Concept Drift: Predicate or Object changes meaning.
  • 35. Archiving for the reproducibility of query results. Sustain the validity of claims. Backwards compatibility of applications. Federated querying is highly affected. How to shape a decentralized Quality of Service? The hyperlink is the simplest form of decentralization, which we are already failing to preserve.
  • 36. Persistent Identification (figure by Herbert Van de Sompel)
  • 37. Persistent Identification. Dependency on publisher registering the PIDs. Possible loss of connection between PIDs and the original. Dependency on the PID provider. Possibly replacing one potential link rot problem by another.
  • 38. Who are you to tell me my URI is not persistent? ISWC Resources track. Consensus on and trust in persistence in a decentralized Web: community-driven? standardization? blockchain, ...?
  • 39. Robust links: <a href="B" data-versionurl="URL of snapshot of B" data-versiondate="datetime of snapshot of B"> http://robustlinks.mementoweb.org/spec/
  • 40. Robust Links. Open Annotation & Memento vocab. Can be linked to PROV. (Figure by Herbert Van de Sompel)
  • 41. Open challenges with Memento. Real-time data: HTTP datetime format is per second. Parallel truths: no solution for accessing Versioned Data.
  • 42. Who will be responsible for archiving? Publisher: Snapshot, Versioning systems. 3rd party: Traditional. Hybrid (Publisher and/or 3rd party): Transactional, Notification-based.
  • 43. Snapshot. Often "End of Term" archive (DBpedia version). Exchangeable archives, e.g. file-based HDT.
  • 46. Versioning systems (Web, RDF). Memento support can improve. Depends on query expressivity. Significant progress in the RDF domain. MediaWiki. Storage: Dydra, Virtuoso, ... Memento-supported publishing: DBpedia Wayback machine, Linked Data Fragments Server.
  • 47. Hybrid: Snapshot + Versioning. Discrete snapshots + index for continuous versions. Linked Data pages: Tailr, ... Triple Patterns: Ostrich (offset-enabled), ...
  • 48. Web archive. Not much in place yet. Indexes, but no notion of time: Sindice, LODCache, LODLaundromat. Many technologies targeted at crawling: Sindice, LODLaundromat, Linked Data Crawling, ... No guarantees on completeness.
  • 49. Transactional. Decentralized, sustainable solution. A challenge for completeness. Dependence on resource granularity, e.g. SPARQL results or Linked Data pages? Interested to see how far we would get...
  • 50. Notification-based
  • 51. Preserving a Web of Linked Data: (1) Yesterday: Web archiving strategies; (2) Today: Tools for a Web of Linked Data; (3) Tomorrow: Things to keep in mind.
  • 52. Data archiving interests more than curators & activists. For instance, data-driven journalism. Product: transparency of the editorial process. Process: interaction with users, public. Scholarly communication, cultural heritage, legal publications, community databases (Wikipedia & Wikidata).
  • 53. Archivability of Linked Data
  • 54. Linked Data is in essence easier to archive. Raw, self-contained data. Already machine processable/understandable. No obfuscation by client-side scripting.
  • 55. Accessibility of content to stimulate archiving. “The content in HTML+RDFa that dokieli produces is accessible (readable) without requiring any CSS or JavaScript, i.e. text-browser safe. Breaking this "rule" in future development should be considered an anti-pattern (or a bug) in dokieli.” (dokieli documentation, Sarven Capadisli)
  • 56. Choices in Linked Data interface increase or decrease archiving. High resource granularity. Data not as accessible. Need to participate in archiving process. [Figure: interface offered by the server (data dump, Triple Pattern Fragments, SPARQL endpoint), on an axis between Intelligent Server and Intelligent Client.]
  • 57. Prevent mistakes from the past in standardization. Query interfaces: what can be archived? Protocols: is it accessible? Domain modeling: can the semantics be preserved? How to select the subgraph?
  • 58. Preserving a Web of Linked Data: (1) Yesterday: Web archiving strategies; (2) Today: Tools for a Web of Linked Data; (3) Tomorrow: Things to keep in mind.
  • 59. There are many sides to preservation. We don't start from scratch; many technologies are there. Start covering the uncovered sides. Add archiving to the discussion.
  • 60. Preserving a Web of Linked Data: Lessons and challenges from a fading Web. Miel Vander Sande, Ghent University – imec
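
As a rough illustration of the Memento and Robust Links slides above, the Python sketch below formats an `Accept-Datetime` request header value in the HTTP-date format used by RFC 7089, and builds a link carrying the `data-versionurl` and `data-versiondate` attributes shown on the Robust links slide. The function names, the example URLs, and the choice of an ISO 8601 UTC timestamp for `data-versiondate` are assumptions made for this sketch, not part of the talk. Because an HTTP-date carries whole seconds only, two versions created within the same second map to the same header value, which is the per-second limitation listed among the open challenges with Memento.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from html import escape


def accept_datetime(moment: datetime) -> str:
    """Return an HTTP-date string usable as an Accept-Datetime header value.

    `moment` must be timezone-aware; it is converted to UTC first.
    Sub-second precision is dropped by the HTTP-date format.
    """
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def robust_link(href: str, snapshot_url: str, snapshot_time: datetime, text: str) -> str:
    """Build an <a> element decorated with the two Robust Links attributes.

    The timestamp format (ISO 8601, UTC, whole seconds) is an assumption
    of this sketch; check the Robust Links spec for the accepted forms.
    """
    stamp = snapshot_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f'<a href="{escape(href)}" '
        f'data-versionurl="{escape(snapshot_url)}" '
        f'data-versiondate="{stamp}">{escape(text)}</a>'
    )


if __name__ == "__main__":
    # Two hypothetical versions created within the same second.
    first = datetime(2018, 6, 3, 10, 15, 30, 100_000, tzinfo=timezone.utc)
    second = datetime(2018, 6, 3, 10, 15, 30, 900_000, tzinfo=timezone.utc)

    print(accept_datetime(first))
    # Both versions collapse onto the same header value.
    print(accept_datetime(first) == accept_datetime(second))

    # Hypothetical original and snapshot URLs.
    print(robust_link("http://example.org/B", "http://archive.example.org/B", first, "B"))
```

A client would send the first value as the `Accept-Datetime` header of a request to a Memento TimeGate; the sketch stops short of any network call so that it stays self-contained.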