© Copyright 2008 STI INNSBRUCK www.sti-innsbruck.at
Preserving Linked Data: Hands-on use cases
José María García
ESWC Summer School 2014
Outline
• Use case 1
• Use case 2
• Use case 3
• Additional technical/organizational/economic issues for preserving LD
Use case 1 – RDF data archiving and retrieval
• DBpedia data (in RDF or tabular CSV format) is archived, and the user requests specific data (or the entire dataset) as it was at a specific date in the past.
• The preservation mechanism must be able to provide the requested data in RDF (or table) format.
  – Timestamps specifying the interval during which the data was valid are a desirable feature of the mechanism: the date the data was created or last modified before the requested date, and the date of the first modification or deletion after it.
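The validity-interval idea can be illustrated with a minimal Python sketch. This is not the DBpedia archiving mechanism; the `VersionedTriple` class, the `snapshot` function, and the `ex:` identifiers and values are hypothetical and exist only for illustration. The sketch assumes each archived triple carries the date from which it was valid and the date of its next modification or deletion.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class VersionedTriple:
    """Hypothetical archive record: one triple plus its validity interval."""
    subject: str
    predicate: str
    obj: str
    valid_from: date          # created or last modified on this date
    valid_to: Optional[date]  # next modification/deletion; None if still valid


def snapshot(archive: List[VersionedTriple], as_of: date) -> List[VersionedTriple]:
    """Return the triples that were valid on the requested date."""
    return [
        t for t in archive
        if t.valid_from <= as_of and (t.valid_to is None or as_of < t.valid_to)
    ]


# Illustrative data only (not real DBpedia content).
archive = [
    VersionedTriple("ex:City", "ex:population", "100", date(2010, 1, 1), date(2012, 6, 1)),
    VersionedTriple("ex:City", "ex:population", "120", date(2012, 6, 1), None),
]

for triple in snapshot(archive, date(2011, 3, 15)):
    print(triple.obj, triple.valid_from, triple.valid_to)
```

Treating the interval as half-open (valid from `valid_from` up to, but not including, `valid_to`) is a design choice of this sketch; it guarantees that exactly one version of a value is returned for any date.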
Use case 1 – Exercise
• Browse through the DBpedia archives at http://wiki.dbpedia.org/Downloads
• Find the date when the last update was generated. Is it specific enough?
• Can you use the available files to reconstruct DBpedia?
• Look at the links to other datasets. Are the datasets also available there?
• Focus on the Amsterdam Museum Data dataset
  – Can you download the dataset to reconstruct DBpedia and its links to this dataset?
  – Can you retrieve the exact version of the dataset that was linked to DBpedia 3.9?
• Now look at previous versions of DBpedia, such as 3.7
  – Which differences can you find?
  – Is it also possible to reconstruct this previous version? What about linked datasets?
• What about other datasets?
  – Look for the Bricklink and GeoNames datasets, for instance.
  – Are they accessible?
  – Can you retrieve previous versions?
• Try out DBpedia Live at http://live.dbpedia.org/
Use case 1 – Extension
• Retrieving data for a specific time interval (e.g., 2010-2012) instead of a specific date is a more complex case, since all versions of the data must be returned together with their validity intervals relative to the requested interval.
• Currently, complex requests involving intervals are not handled by the DBpedia archiving mechanism. RDF data containing links to other LOD can be retrieved for specific time points, but the content behind links to external LOD is not preserved.
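An interval request of this kind might be answered as in the following Python sketch. The function name `versions_in_interval`, the tuple layout, and the `v1`..`v3` values are assumptions made for illustration; the DBpedia archive offers no such call. Every version whose validity overlaps the requested interval is returned, with its validity clipped to that interval.

```python
from datetime import date
from typing import List, Optional, Tuple

Version = Tuple[str, date, Optional[date]]  # (value, valid_from, valid_to)


def versions_in_interval(
    versions: List[Version], start: date, end: date
) -> List[Tuple[str, date, date]]:
    """Return versions overlapping [start, end), clipped to the request."""
    result = []
    for value, valid_from, valid_to in versions:
        # A version that is still valid has no end date yet.
        upper = valid_to if valid_to is not None else date.max
        if valid_from < end and start < upper:
            result.append((value, max(valid_from, start), min(upper, end)))
    return result


# Illustrative history of one value (not real DBpedia content).
history = [
    ("v1", date(2009, 5, 1), date(2010, 8, 1)),
    ("v2", date(2010, 8, 1), date(2011, 9, 1)),
    ("v3", date(2011, 9, 1), None),
]

# The years 2010-2012, expressed as a half-open interval.
overlapping = versions_in_interval(history, date(2010, 1, 1), date(2013, 1, 1))
for value, valid_from, valid_to in overlapping:
    print(value, valid_from, valid_to)
```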
Use case 1 – Discussion
• Are DBpedia archiving measures sufficient to ensure its long-term preservation?
• Who is responsible for archiving external datasets?
• Up to which link depth should datasets be archived/preserved?
Exercise 1 – Screenshots (images not included):
• DBpedia 3.9
• DBpedia 3.9 links to datasets
• Amsterdam Museum Data
• Amsterdam Museum Data dumps
• DBpedia 3.7
• DBpedia 3.7 links to datasets
• DBpedia to Bricklink links
• Bricklink dataset
• GeoNames data
• GeoNames dumps download
• GeoNames premium service
Use case 2 – Rendering data as Web page
• The user requests the DBpedia data for a specific topic at a given point in time or interval, presented as a Web page.
• The preservation mechanism should be able to return the data in RDF format. If the description was modified during the given interval, it should return, as in the first use case, all corresponding descriptions, the interval during which each distinct description was valid, the modification history, the differences between versions, and the editors.
• Rendering the requested data as a Web page introduces the following problem: can the functionality of external links be preserved and supported as well?
• Currently, rendering software is not part of the preservation mechanism, although using a standard representation (RDF) minimizes the risk of not being able to render the data in the future.
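The "differences between versions" mentioned above can be computed with plain set operations once two versions of a description are available as sets of triples. The Python sketch below assumes that representation; `diff_versions` and the `ex:` triples are hypothetical, and blank nodes, which would need extra care in real RDF data, are ignored.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def diff_versions(old: Iterable[Triple], new: Iterable[Triple]) -> Dict[str, Set[Triple]]:
    """Return the triples added to and removed from a description."""
    old_set, new_set = set(old), set(new)
    return {"added": new_set - old_set, "removed": old_set - new_set}


# Illustrative versions of one description (not real DBpedia content).
version_1 = {
    ("ex:Topic", "ex:label", "Topic"),
    ("ex:Topic", "ex:seeAlso", "ex:OldPage"),
}
version_2 = {
    ("ex:Topic", "ex:label", "Topic"),
    ("ex:Topic", "ex:seeAlso", "ex:NewPage"),
}

changes = diff_versions(version_1, version_2)
print("added:", changes["added"])
print("removed:", changes["removed"])
```

A modification history could then be kept as a list of such diffs, each stored with its timestamp and editor.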
Use case 2 – Exercise & Discussion
• Check some DBpedia pages, such as http://dbpedia.org/page/Semantic_Web
• Look for information on provenance and revisions
• Are previous versions available to the user?
• Is it easy to add modification history, diffs, etc.?
• Should we preserve the rendering mechanism along with the data dumps?
Exercise 2 – DBpedia entity rendered for the Web (screenshot; image not included)
Use case 3 – SPARQL endpoint functionality
• Reconstruct the functionality of the DBpedia SPARQL endpoint at a specific point in time in the past.
• The user specifies both the query and the time point (this use case can be extended to support interval queries, as in use cases 1 and 2). Different kinds of queries must be handled, corresponding to different use cases:
  a) Queries over RDF data in the DBpedia dataset only
  b) Queries over the DBpedia dataset and datasets directly connected to it (e.g., GeoNames)
  c) Queries over DBpedia data and external datasets connected to DBpedia indirectly (i.e., through links from the datasets of case b)
• Currently, SPARQL endpoint functionality is not directly preserved, i.e., users must retrieve the data and use their own SPARQL endpoint to query it.
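The three query types can be read as link depths measured from DBpedia: depth 0 for case (a), depth 1 for case (b), and depth 2 or more for case (c). The Python sketch below computes these depths with a breadth-first search over a dataset link graph and classifies a query by the deepest dataset it touches. The graph is an assumption for illustration: the DBpedia links to GeoNames and Bricklink are mentioned in these slides, while `ExampleDataset` and both function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List


def link_depths(links: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Breadth-first search: minimum number of links from root to each dataset."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for target in links.get(current, ()):
            if target not in depths:
                depths[target] = depths[current] + 1
                queue.append(target)
    return depths


def query_type(datasets_used: Iterable[str], depths: Dict[str, int]) -> str:
    """Classify a query as 'a', 'b' or 'c' by the deepest dataset it needs.

    Raises KeyError if a dataset is not reachable from the root.
    """
    deepest = max(depths[name] for name in datasets_used)
    if deepest == 0:
        return "a"
    return "b" if deepest == 1 else "c"


# Hypothetical link graph; ExampleDataset stands for any indirectly linked dataset.
links = {
    "DBpedia": ["GeoNames", "Bricklink"],
    "GeoNames": ["ExampleDataset"],
}

depths = link_depths(links, "DBpedia")
print(query_type(["DBpedia"], depths))
print(query_type(["DBpedia", "GeoNames"], depths))
print(query_type(["DBpedia", "ExampleDataset"], depths))
```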
Use case 3 – Extension
• Currently, SPARQL endpoint functionality is not directly preserved, i.e., users must retrieve the data and use their own SPARQL endpoint to query it.
• They will then be able to issue queries of type (a) above, but not queries of type (b) when the content of external links is requested.
• Requests of type (c) cannot be handled by the current preservation mechanism either.
Use case 3 – Exercise & Discussion
• Look at the DBpedia SPARQL endpoint: http://dbpedia.org/sparql
• Try a SPARQL query, such as:
  – PREFIX dbpprop: <http://dbpedia.org/property/>
    SELECT ?group WHERE {
      ?group a <http://schema.org/MusicGroup> .
      ?group dbpprop:origin ?origin .
      FILTER contains(str(?origin), "Crete")
    }
• How can we access previous datasets in the SPARQL endpoint?
• Can a SPARQL endpoint be reconstructed from the archived datasets?
• Should the SPARQL endpoint be preserved along with the data?
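For readers who prefer to issue the exercise query from a script rather than the endpoint's Web form, the following Python sketch builds a SPARQL Protocol GET request using only the standard library. The request is built but not sent, so the sketch runs offline. The `build_request` helper is hypothetical, and the explicit `PREFIX` line and the JSON `Accept` header are assumptions added for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"

# The exercise query, with the dbpprop prefix declared explicitly.
QUERY = """
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?group WHERE {
  ?group a <http://schema.org/MusicGroup> .
  ?group dbpprop:origin ?origin .
  FILTER contains(str(?origin), "Crete")
}
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
# To execute the query, pass `request` to urllib.request.urlopen and parse the JSON body.
```

The live endpoint answers from whatever data it currently serves, so a request like this cannot select a past version by itself; that is the gap the questions above point at.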
Exercise 3 (screenshot; image not included)
Technical issues
• OAIS model compliance
  – Standard for archiving
  – Other desirable standards to be used (DCAT, PROV…)
• Data ingestion
  – What amount of data should be preserved?
  – Implicit semantics
  – Inferred/derived knowledge
• Preserving URI dereferenceability
  – Bricklink dataset problem
• Dealing with changes
  – When to update an archive
  – DBpedia Live
Organizational issues
• Stakeholders
  – Who is interested in LD preservation?
  – E.g., DBpedia Association, users…
• Scope of preservation systems
  – Which datasets are worth preserving?
• Responsibilities
  – Who should take care of long-term preservation?
  – Data curators? Users? Government?
Economic issues
• Sustainability
  – Business plans should be created to offer long-term preservation
  – Is it viable to preserve Linked Data?
• Intellectual property rights and licenses
  – Open data licenses help
  – What about proprietary data?
• Liability and obligations
Thanks! 
Any questions? 
Drop me an email at jose.garcia@sti2.at 
28

Weitere ähnliche Inhalte

Was ist angesagt?

20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogsandrea huang
 
Virtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OO
Virtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OOVirtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OO
Virtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OOPaolo Cristofaro
 
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic DatasetsDiversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic DatasetsAdila Krisnadhi
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked DataEUCLID project
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the webChiara Del Vescovo
 
DataCite and its DOI infrastructure - IASSIST 2013
DataCite and its DOI infrastructure - IASSIST 2013DataCite and its DOI infrastructure - IASSIST 2013
DataCite and its DOI infrastructure - IASSIST 2013Frauke Ziedorn
 
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data FormatsSciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data FormatsQian Lin
 
Linked Media and Data Using Apache Marmotta
Linked Media and Data Using Apache MarmottaLinked Media and Data Using Apache Marmotta
Linked Media and Data Using Apache MarmottaSebastian Schaffert
 
March 2013 Bioinformatics Working Group
March 2013 Bioinformatics Working GroupMarch 2013 Bioinformatics Working Group
March 2013 Bioinformatics Working GroupGenomeInABottle
 
Publishing the British National Bibliography as Linked Open Data / Corine Del...
Publishing the British National Bibliography as Linked Open Data / Corine Del...Publishing the British National Bibliography as Linked Open Data / Corine Del...
Publishing the British National Bibliography as Linked Open Data / Corine Del...CIGScotland
 
PIDs and DOI registration with DataCite - IATUL Workshop 2013
PIDs and DOI registration with DataCite - IATUL Workshop 2013PIDs and DOI registration with DataCite - IATUL Workshop 2013
PIDs and DOI registration with DataCite - IATUL Workshop 2013Frauke Ziedorn
 
Open Data and CKAN Data Catalogues
Open Data and CKAN Data CataloguesOpen Data and CKAN Data Catalogues
Open Data and CKAN Data Cataloguesdavid-read
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
 
Introducing JDBC for SPARQL
Introducing JDBC for SPARQLIntroducing JDBC for SPARQL
Introducing JDBC for SPARQLRob Vesse
 
Iochem.carles bo
Iochem.carles boIochem.carles bo
Iochem.carles bomaredata
 

Was ist angesagt? (20)

20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs
 
Virtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OO
Virtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OOVirtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OO
Virtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OO
 
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic DatasetsDiversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
CKAN as an open-source data management solution for open data
CKAN as an open-source data management solution for open data CKAN as an open-source data management solution for open data
CKAN as an open-source data management solution for open data
 
OAISRB
OAISRBOAISRB
OAISRB
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the web
 
DataCite and its DOI infrastructure - IASSIST 2013
DataCite and its DOI infrastructure - IASSIST 2013DataCite and its DOI infrastructure - IASSIST 2013
DataCite and its DOI infrastructure - IASSIST 2013
 
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data FormatsSciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats
 
Linked Media and Data Using Apache Marmotta
Linked Media and Data Using Apache MarmottaLinked Media and Data Using Apache Marmotta
Linked Media and Data Using Apache Marmotta
 
March 2013 Bioinformatics Working Group
March 2013 Bioinformatics Working GroupMarch 2013 Bioinformatics Working Group
March 2013 Bioinformatics Working Group
 
Publishing the British National Bibliography as Linked Open Data / Corine Del...
Publishing the British National Bibliography as Linked Open Data / Corine Del...Publishing the British National Bibliography as Linked Open Data / Corine Del...
Publishing the British National Bibliography as Linked Open Data / Corine Del...
 
PIDs and DOI registration with DataCite - IATUL Workshop 2013
PIDs and DOI registration with DataCite - IATUL Workshop 2013PIDs and DOI registration with DataCite - IATUL Workshop 2013
PIDs and DOI registration with DataCite - IATUL Workshop 2013
 
Open Data and CKAN Data Catalogues
Open Data and CKAN Data CataloguesOpen Data and CKAN Data Catalogues
Open Data and CKAN Data Catalogues
 
Mapping the Repository Landscape
Mapping the Repository LandscapeMapping the Repository Landscape
Mapping the Repository Landscape
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
Introducing JDBC for SPARQL
Introducing JDBC for SPARQLIntroducing JDBC for SPARQL
Introducing JDBC for SPARQL
 
Metadata april 8 2013
Metadata april 8 2013Metadata april 8 2013
Metadata april 8 2013
 
Going for GOLD - Adventures in Open Linked Metadata
Going for GOLD - Adventures in Open Linked MetadataGoing for GOLD - Adventures in Open Linked Metadata
Going for GOLD - Adventures in Open Linked Metadata
 
Iochem.carles bo
Iochem.carles boIochem.carles bo
Iochem.carles bo
 

Andere mochten auch

ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...eswcsummerschool
 
OWF - Travailler aux frontières
OWF - Travailler aux frontièresOWF - Travailler aux frontières
OWF - Travailler aux frontièresAlexis Monville
 
Instructions: Student mini-projects - F.Flöck - ESWC SS 2014
Instructions: Student mini-projects - F.Flöck - ESWC SS 2014 Instructions: Student mini-projects - F.Flöck - ESWC SS 2014
Instructions: Student mini-projects - F.Flöck - ESWC SS 2014 eswcsummerschool
 
OWF Agilité et usabilité
OWF Agilité et usabilitéOWF Agilité et usabilité
OWF Agilité et usabilitéAlexis Monville
 
20090302 Pratic Solutions Linux
20090302 Pratic Solutions Linux20090302 Pratic Solutions Linux
20090302 Pratic Solutions LinuxAlexis Monville
 
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014eswcsummerschool
 
ESWC SS 2013 - Tuesday Projects Fabian Flöck: Introduction to Mini Projects
ESWC SS 2013 - Tuesday Projects Fabian Flöck: Introduction to Mini ProjectsESWC SS 2013 - Tuesday Projects Fabian Flöck: Introduction to Mini Projects
ESWC SS 2013 - Tuesday Projects Fabian Flöck: Introduction to Mini Projectseswcsummerschool
 

Andere mochten auch (7)

ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
 
OWF - Travailler aux frontières
OWF - Travailler aux frontièresOWF - Travailler aux frontières
OWF - Travailler aux frontières
 
Instructions: Student mini-projects - F.Flöck - ESWC SS 2014
Instructions: Student mini-projects - F.Flöck - ESWC SS 2014 Instructions: Student mini-projects - F.Flöck - ESWC SS 2014
Instructions: Student mini-projects - F.Flöck - ESWC SS 2014
 
OWF Agilité et usabilité
OWF Agilité et usabilitéOWF Agilité et usabilité
OWF Agilité et usabilité
 
20090302 Pratic Solutions Linux
20090302 Pratic Solutions Linux20090302 Pratic Solutions Linux
20090302 Pratic Solutions Linux
 
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
 
ESWC SS 2013 - Tuesday Projects Fabian Flöck: Introduction to Mini Projects
ESWC SS 2013 - Tuesday Projects Fabian Flöck: Introduction to Mini ProjectsESWC SS 2013 - Tuesday Projects Fabian Flöck: Introduction to Mini Projects
ESWC SS 2013 - Tuesday Projects Fabian Flöck: Introduction to Mini Projects
 

Ähnlich wie Wed garcia hands_on_d_bpedia preservation

PRELIDA Project Draft Roadmap
PRELIDA Project Draft RoadmapPRELIDA Project Draft Roadmap
PRELIDA Project Draft RoadmapPRELIDA Project
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Fabrizio Orlandi
 
MetadataTheory: Learning Repositories Technologies (9th of 10)
MetadataTheory: Learning Repositories Technologies (9th of 10)MetadataTheory: Learning Repositories Technologies (9th of 10)
MetadataTheory: Learning Repositories Technologies (9th of 10)Nikos Palavitsinis, PhD
 
Wed batsakis tut_challenges of preservations
Wed batsakis tut_challenges of preservationsWed batsakis tut_challenges of preservations
Wed batsakis tut_challenges of preservationseswcsummerschool
 
Wed batsakis tut_chalasdlenges of preservations
Wed batsakis tut_chalasdlenges of preservationsWed batsakis tut_chalasdlenges of preservations
Wed batsakis tut_chalasdlenges of preservationseswcsummerschool
 
Module 1 - Chapter1.pptx
Module 1 - Chapter1.pptxModule 1 - Chapter1.pptx
Module 1 - Chapter1.pptxSoniaDevi15
 
Duraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesDuraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesMatthew Critchlow
 
From Box to Hydra via Archivematica
From Box to Hydra via ArchivematicaFrom Box to Hydra via Archivematica
From Box to Hydra via ArchivematicaJisc RDM
 
Spring data presentation
Spring data presentationSpring data presentation
Spring data presentationOleksii Usyk
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructureAnubhav Jain
 
Data management for Quantitative Biology -Basics and challenges in biomedical...
Data management for Quantitative Biology -Basics and challenges in biomedical...Data management for Quantitative Biology -Basics and challenges in biomedical...
Data management for Quantitative Biology -Basics and challenges in biomedical...QBiC_Tue
 
ResourceSync Introduction at SWIB13
ResourceSync Introduction at SWIB13ResourceSync Introduction at SWIB13
ResourceSync Introduction at SWIB13Simeon Warner
 
Overview of Data Base Systems Concepts and Architecture
Overview of Data Base Systems Concepts and ArchitectureOverview of Data Base Systems Concepts and Architecture
Overview of Data Base Systems Concepts and ArchitectureRubal Sagwal
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data StackZubair Nabi
 

Ähnlich wie Wed garcia hands_on_d_bpedia preservation (20)

PRELIDA Project Draft Roadmap
PRELIDA Project Draft RoadmapPRELIDA Project Draft Roadmap
PRELIDA Project Draft Roadmap
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
 
MetadataTheory: Learning Repositories Technologies (9th of 10)
MetadataTheory: Learning Repositories Technologies (9th of 10)MetadataTheory: Learning Repositories Technologies (9th of 10)
MetadataTheory: Learning Repositories Technologies (9th of 10)
 
Wed batsakis tut_challenges of preservations
Wed batsakis tut_challenges of preservationsWed batsakis tut_challenges of preservations
Wed batsakis tut_challenges of preservations
 
Wed batsakis tut_chalasdlenges of preservations
Wed batsakis tut_chalasdlenges of preservationsWed batsakis tut_chalasdlenges of preservations
Wed batsakis tut_chalasdlenges of preservations
 
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge BasesLOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
 
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
 
Module 1 - Chapter1.pptx
Module 1 - Chapter1.pptxModule 1 - Chapter1.pptx
Module 1 - Chapter1.pptx
 
Duraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesDuraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository Services
 
From Box to Hydra via Archivematica
From Box to Hydra via ArchivematicaFrom Box to Hydra via Archivematica
From Box to Hydra via Archivematica
 
Spring data presentation
Spring data presentationSpring data presentation
Spring data presentation
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructure
 
Data management for Quantitative Biology -Basics and challenges in biomedical...
Data management for Quantitative Biology -Basics and challenges in biomedical...Data management for Quantitative Biology -Basics and challenges in biomedical...
Data management for Quantitative Biology -Basics and challenges in biomedical...
 
Presentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenbergPresentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenberg
 
ResourceSync Introduction at SWIB13
ResourceSync Introduction at SWIB13ResourceSync Introduction at SWIB13
ResourceSync Introduction at SWIB13
 
Ordbms
OrdbmsOrdbms
Ordbms
 
Overview of Data Base Systems Concepts and Architecture
Overview of Data Base Systems Concepts and ArchitectureOverview of Data Base Systems Concepts and Architecture
Overview of Data Base Systems Concepts and Architecture
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
 
Timbuctoo 2 EASY
Timbuctoo 2 EASYTimbuctoo 2 EASY
Timbuctoo 2 EASY
 

Mehr von eswcsummerschool

Semantic Aquarium - ESWC SSchool 14 - Student project
Semantic Aquarium - ESWC SSchool 14 - Student projectSemantic Aquarium - ESWC SSchool 14 - Student project
Semantic Aquarium - ESWC SSchool 14 - Student projecteswcsummerschool
 
Syrtaki - ESWC SSchool 14 - Student project
Syrtaki  - ESWC SSchool 14 - Student projectSyrtaki  - ESWC SSchool 14 - Student project
Syrtaki - ESWC SSchool 14 - Student projecteswcsummerschool
 
Keep fit (a bit) - ESWC SSchool 14 - Student project
Keep fit (a bit)  - ESWC SSchool 14 - Student projectKeep fit (a bit)  - ESWC SSchool 14 - Student project
Keep fit (a bit) - ESWC SSchool 14 - Student projecteswcsummerschool
 
Arabic Sentiment Lexicon - ESWC SSchool 14 - Student project
Arabic Sentiment Lexicon - ESWC SSchool 14 - Student projectArabic Sentiment Lexicon - ESWC SSchool 14 - Student project
Arabic Sentiment Lexicon - ESWC SSchool 14 - Student projecteswcsummerschool
 
FIT-8BIT An activity music assistant - ESWC SSchool 14 - Student project
FIT-8BIT An activity music assistant - ESWC SSchool 14 - Student projectFIT-8BIT An activity music assistant - ESWC SSchool 14 - Student project
FIT-8BIT An activity music assistant - ESWC SSchool 14 - Student projecteswcsummerschool
 
Personal Tours at the British Museum - ESWC SSchool 14 - Student project
Personal Tours at the British Museum  - ESWC SSchool 14 - Student projectPersonal Tours at the British Museum  - ESWC SSchool 14 - Student project
Personal Tours at the British Museum - ESWC SSchool 14 - Student projecteswcsummerschool
 
Exhibition recommendation using British Museum data and Event Registry - ESWC...
Exhibition recommendation using British Museum data and Event Registry - ESWC...Exhibition recommendation using British Museum data and Event Registry - ESWC...
Exhibition recommendation using British Museum data and Event Registry - ESWC...eswcsummerschool
 
Empowering fishing business using Linked Data - ESWC SSchool 14 - Student pro...
Empowering fishing business using Linked Data - ESWC SSchool 14 - Student pro...Empowering fishing business using Linked Data - ESWC SSchool 14 - Student pro...
Empowering fishing business using Linked Data - ESWC SSchool 14 - Student pro...eswcsummerschool
 
Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014
Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014 Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014
Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014 eswcsummerschool
 
Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014
Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014 Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014
Hands On: Amazon Mechanical Turk - M. Acosta - ESWC SS 2014 eswcsummerschool
 
Tutorial: Querying a Marine Data Warehouse Using SPARQL - I. Fundulaki - ESWC...
Tutorial: Querying a Marine Data Warehouse Using SPARQL - I. Fundulaki - ESWC...Tutorial: Querying a Marine Data Warehouse Using SPARQL - I. Fundulaki - ESWC...
Tutorial: Querying a Marine Data Warehouse Using SPARQL - I. Fundulaki - ESWC...eswcsummerschool
 
Mon norton tut_publishing01
Mon norton tut_publishing01Mon norton tut_publishing01
Mon norton tut_publishing01eswcsummerschool
 
Mon domingue introduction to the school
Mon domingue introduction to the schoolMon domingue introduction to the school
Mon domingue introduction to the schooleswcsummerschool
 
Mon norton tut_querying cultural heritage data
Mon norton tut_querying cultural heritage dataMon norton tut_querying cultural heritage data
Mon norton tut_querying cultural heritage dataeswcsummerschool
 
Tue acosta hands_on_providinglinkeddata
Tue acosta hands_on_providinglinkeddataTue acosta hands_on_providinglinkeddata
Tue acosta hands_on_providinglinkeddataeswcsummerschool
 
Thu bernstein key_warp_speed
Thu bernstein key_warp_speedThu bernstein key_warp_speed
Thu bernstein key_warp_speedeswcsummerschool
 
Fri schreiber key_knowledge engineering
Fri schreiber key_knowledge engineeringFri schreiber key_knowledge engineering
Fri schreiber key_knowledge engineeringeswcsummerschool
 
Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02eswcsummerschool
 
Mon fundulaki tut_querying linked data
Mon fundulaki tut_querying linked dataMon fundulaki tut_querying linked data
Mon fundulaki tut_querying linked dataeswcsummerschool
 
Mon domingue key_introduction to semantic
Mon domingue key_introduction to semanticMon domingue key_introduction to semantic
Mon domingue key_introduction to semanticeswcsummerschool
 

Mehr von eswcsummerschool (20)

Semantic Aquarium - ESWC SSchool 14 - Student project
Semantic Aquarium - ESWC SSchool 14 - Student projectSemantic Aquarium - ESWC SSchool 14 - Student project
Semantic Aquarium - ESWC SSchool 14 - Student project
 
Syrtaki - ESWC SSchool 14 - Student project
Syrtaki  - ESWC SSchool 14 - Student projectSyrtaki  - ESWC SSchool 14 - Student project
Syrtaki - ESWC SSchool 14 - Student project
 
Keep fit (a bit) - ESWC SSchool 14 - Student project
Keep fit (a bit)  - ESWC SSchool 14 - Student projectKeep fit (a bit)  - ESWC SSchool 14 - Student project

Wed garcia hands_on_d_bpedia preservation

  • 1. © Copyright 2008 STI INNSBRUCK www.sti-innsbruck.at Preserving Linked Data: Hands-on use cases. José María García, ESWC Summer School 2014
  • 2. Outline
      • Use case 1
      • Use case 2
      • Use case 3
      • Additional technical/organizational/economic issues for preserving LD
  • 3. Use case 1 – RDF data archiving and retrieval
      • DBpedia data (in RDF or tabular CSV format) is archived, and the user requests specific data (or the entire dataset) as it was at a specific date in the past.
      • The preservation mechanism must be able to provide the requested data in RDF (or table) format.
          – Timestamps specifying the interval during which the data was valid are a desirable feature of the mechanism, i.e., the date the data was created or last modified before the date given in the user request, and the date of its first modification or deletion after that date.
  • 4. Use case 1 – Exercise
      • Browse through the DBpedia archives at http://wiki.dbpedia.org/Downloads
      • Find the date when the last update was generated. Is it concrete enough?
      • Can you use the available files to reconstruct DBpedia?
      • Look at the links to other datasets. Are the datasets also available there?
      • Focus on the Amsterdam Museum Data dataset
          – Can you download the dataset to reconstruct DBpedia and its link to this dataset?
          – Can you retrieve the exact version of the dataset that was linked to DBpedia 3.9?
      • Now look at previous versions of DBpedia, such as 3.7
          – Which differences can you find?
          – Is it also possible to reconstruct this previous version? What about linked datasets?
      • What about other datasets?
          – Look for the Bricklink and GeoNames datasets, for instance.
          – Are they accessible?
          – Can you retrieve previous versions?
      • Try out DBpedia Live at http://live.dbpedia.org/
  • 5. Use case 1 – Extension
      • Retrieving data for a time interval (e.g., 2010-2012) instead of a specific date is a more complex case, since all versions of the data must be returned together with their validity intervals relative to the requested interval.
      • Currently, complex requests involving intervals are not handled by the DBpedia archiving mechanism. RDF data containing links to other LOD can be retrieved for specific time points, but the content behind links to external LOD is not preserved.
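
The interval request described in this use case can be sketched roughly as follows. This is only an illustration of the idea, not how the DBpedia archive works: the `Version` record, the half-open validity intervals `[valid_from, valid_to)`, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Version:
    """One archived state of a value, valid in [valid_from, valid_to)."""
    value: str
    valid_from: date
    valid_to: Optional[date]  # None means "still valid"


def version_at(history: List[Version], when: date) -> Optional[Version]:
    """Return the version that was valid on a single date, if any."""
    for v in history:
        if v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
            return v
    return None


def versions_in_interval(
    history: List[Version], start: date, end: date
) -> List[Tuple[Version, date, date]]:
    """Return every version overlapping [start, end), clipped to that interval."""
    result = []
    for v in history:
        overlaps = v.valid_from < end and (v.valid_to is None or v.valid_to > start)
        if overlaps:
            clipped_from = max(v.valid_from, start)
            clipped_to = end if v.valid_to is None else min(v.valid_to, end)
            result.append((v, clipped_from, clipped_to))
    return result


# Hypothetical history of one property value (made-up data).
history = [
    Version("value-A", date(2009, 1, 1), date(2010, 6, 1)),
    Version("value-B", date(2010, 6, 1), date(2011, 9, 1)),
    Version("value-C", date(2011, 9, 1), None),
]

# Request the interval 2010-2012, expressed as [2010-01-01, 2013-01-01).
for v, valid_from, valid_to in versions_in_interval(
    history, date(2010, 1, 1), date(2013, 1, 1)
):
    print(v.value, valid_from, valid_to)
```

A single-date request (`version_at`) returns at most one version, whereas the interval request returns all three sample versions, each with the part of its validity interval that falls inside the request.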
  • 6. Use case 1 – Discussion
      • Are DBpedia archiving measures sufficient to ensure its long-term preservation?
      • Who is responsible for archiving external datasets?
      • Up to which link depth should datasets be archived/preserved?
  • 8. Exercise 1 – DBpedia 3.9 links to datasets
  • 9. Exercise 1 – Amsterdam Museum Data
  • 10. Exercise 1 – Amsterdam Museum Data dumps
  • 11. Exercise 1 – DBpedia 3.7
  • 12. Exercise 1 – DBpedia 3.7 links to datasets
  • 13. Exercise 1 – DBpedia to Bricklink links
  • 14. Exercise 1 – Bricklink dataset
  • 15. Exercise 1 – GeoNames data
  • 16. Exercise 1 – GeoNames dumps download
  • 17. Exercise 1 – GeoNames premium service
  • 18. Use case 2 – Rendering data as Web page
      • The user requests the DBpedia data for a specific topic at a given temporal point or interval, presented in a Web page format.
      • The preservation mechanism should be able to return the data in RDF format. If the description was modified during the given interval, it should return all corresponding descriptions, the interval during which each distinct description was valid, the modification history, the differences between versions, and the editors, as in the first use case.
      • Rendering the requested data as a Web page introduces the following problem: can the functionality of external links be preserved and supported as well?
      • Currently, rendering software is not part of the preservation mechanism, although using a standard representation (RDF) minimizes the risk of not being able to render the data in the future.
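
The "differences between versions" mentioned in this use case could, in the simplest case, be computed as a set difference over the triples of two archived descriptions. The sketch below assumes each description is already available as a set of `(subject, predicate, object)` string tuples; the `ex:` names and values are made up. It also ignores blank nodes, which cannot be compared by plain equality.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def diff_snapshots(old: Set[Triple], new: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (added, removed) triples between two snapshots of a description."""
    added = new - old
    removed = old - new
    return added, removed


# Two hypothetical snapshots of the same entity description (made-up data).
snapshot_old = {
    ("ex:Topic", "ex:label", "Old label"),
    ("ex:Topic", "ex:seeAlso", "ex:Other"),
}
snapshot_new = {
    ("ex:Topic", "ex:label", "New label"),
    ("ex:Topic", "ex:seeAlso", "ex:Other"),
}

added, removed = diff_snapshots(snapshot_old, snapshot_new)
print("added:", sorted(added))
print("removed:", sorted(removed))
```

Applying this pairwise to consecutive snapshots gives a basic modification history; it says nothing about who made each change, which would need separate provenance data.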
  • 19. Use case 2 – Exercise & Discussion
      • Check some DBpedia pages, such as http://dbpedia.org/page/Semantic_Web
      • Look for information on provenance and revision
      • Are previous versions available for the user?
      • Is it easy to add modification history, diffs, etc.?
      • Should we preserve the rendering mechanism along with the data dumps?
  • 20. Exercise 2 – DBpedia entity rendered for web
  • 21. Use case 3 – SPARQL endpoint functionality
      • Reconstruct the functionality of the DBpedia SPARQL endpoint at a specific temporal point in the past.
      • The user specifies both the query and the time point (this use case can be extended to support interval queries, as in use cases 1 and 2). Different kinds of queries must be handled, corresponding to different use cases:
          a) Queries over RDF data in the DBpedia dataset only
          b) Queries spanning the DBpedia dataset and datasets directly connected to it (e.g., GeoNames)
          c) Queries spanning DBpedia data and external datasets connected to DBpedia indirectly (i.e., through links from the datasets of case b)
      • Currently, SPARQL endpoint functionality is not directly preserved, i.e., users must retrieve the data and use their own SPARQL endpoint to query it.
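
One way to read query cases (a) to (c) is in terms of link depth from DBpedia: depth 0 is DBpedia itself, depth 1 a directly linked dataset, depth 2 or more an indirectly linked one. The sketch below computes these depths with a breadth-first search over a dataset link graph. The graph is a hand-written assumption: the DBpedia links to GeoNames and Amsterdam Museum come from the exercise, while `ExampleDataset` is a hypothetical placeholder.

```python
from collections import deque
from typing import Dict, List


def link_depths(links: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Breadth-first search: shortest number of link hops from root to each dataset."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for target in links.get(current, []):
            if target not in depths:
                depths[target] = depths[current] + 1
                queue.append(target)
    return depths


def query_case(depth: int) -> str:
    """Map a link depth to query case (a), (b) or (c)."""
    if depth == 0:
        return "a"
    if depth == 1:
        return "b"
    return "c"


# Assumed link graph between datasets; "ExampleDataset" is hypothetical.
links = {
    "DBpedia": ["GeoNames", "Amsterdam Museum"],
    "GeoNames": ["ExampleDataset"],
}

for dataset, depth in link_depths(links, "DBpedia").items():
    print(dataset, depth, query_case(depth))
```

Cutting the search off at a chosen maximum depth would be one concrete answer to how far an archive should follow links.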
  • 22. Use case 3 – Extension
      • Currently, SPARQL endpoint functionality is not directly preserved, i.e., users must retrieve the data and use their own SPARQL endpoint to query it.
      • They will then be able to issue queries of type (a) above, but not queries of type (b) when the content of external links is requested.
      • Requests of type (c) cannot be handled by the current preservation mechanism either.
  • 23. Use case 3 – Exercise & Discussion
      • Look at the DBpedia SPARQL endpoint http://dbpedia.org/sparql
      • Try a SPARQL query, such as:
          – SELECT ?group WHERE { ?group a <http://schema.org/MusicGroup> . ?group dbpprop:origin ?origin . FILTER contains(str(?origin), "Crete") }
      • How can we access previous datasets in the SPARQL endpoint?
      • Can a SPARQL endpoint be reconstructed from the archived datasets?
      • Should the SPARQL endpoint be preserved along with the data?
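
The exercise query can also be sent programmatically. The following is a minimal sketch using only the Python standard library and the standard SPARQL protocol (`query` parameter over HTTP GET, JSON results requested via the `Accept` header). The function names are invented for this example, and the explicit `PREFIX` declaration assumes that `dbpprop:` stands for `http://dbpedia.org/property/`, which the public endpoint otherwise predefines.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"

# The query from the exercise, with the prefix declared explicitly (assumption).
QUERY = """
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?group WHERE {
  ?group a <http://schema.org/MusicGroup> .
  ?group dbpprop:origin ?origin .
  FILTER contains(str(?origin), "Crete")
}
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint: str, query: str) -> list:
    """Send the query and return the ?group bindings (requires network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as response:
        results = json.load(response)
    return [binding["group"]["value"] for binding in results["results"]["bindings"]]


if __name__ == "__main__":
    for group in run_query(ENDPOINT, QUERY):
        print(group)
```

The public endpoint only answers over the data it currently serves. To query a past state, the same code could be pointed at a self-hosted endpoint loaded with an archived dump by changing `ENDPOINT`, which matches the current situation described in use case 3.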
  • 25. Technical issues
      • OAIS model compliance
          – Standard for archiving
          – Other desirable standards to be used (DCAT, PROV…)
      • Data ingestion
          – What amount of data should be preserved?
          – Implicit semantics
          – Inferred/derived knowledge
      • Preserving URI dereferenceability
          – Bricklink dataset problem
      • Dealing with changes
          – When to update an archive
          – DBpedia Live
  • 26. Organizational issues
      • Stakeholders
          – Who is interested in LD preservation?
          – E.g., DBpedia Association, users…
      • Scope of preservation systems
          – Which datasets are worth preserving?
      • Responsibilities
          – Who should take care of long-term preservation?
          – Data curators? Users? Government?
  • 27. Economic issues
      • Sustainability
          – Business plans should be created to offer long-term preservation
          – Is it viable to preserve Linked Data?
      • Intellectual Property Rights and Licenses
          – Open data licenses help
          – What about proprietary data?
      • Liability and obligations
  • 28. Thanks! Any questions? Drop me an email at jose.garcia@sti2.at