DESWeb 2014
ICDE 2014, Chicago IL, USA, March 3
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information
Kai Schlegel (kai.schlegel@googlemail.com)
Florian Stegmaier, Sebastian Bayerl, Michael Granitzer, Harald Kosch

Outline
• Motivation
• SPARQL Rewriting & Federation
• Intermediate Results
Supported by the European Commission under the Seventh Framework Programme

“Linked Data is the heart of Semantic Web” - W3C Semantic Web Group

Motivation: Nothing-as-a-Service
• Easy access to Linked Data
• Query Linked Open Data with SPARQL
• Plethora of tools available
• Problems:
  • Business oriented
  • Complex setup
  • Maintenance
  • “Paper-only”
  • Not developer friendly
• Simple and “instant” SPARQL Query Federation (-as-a-Service)

Querying LOD
• How to get information about the German city “Passau”?
• Problem: LOD is not a single database!
Query sent to de.dbpedia.org:
SELECT ?p ?o WHERE {
  <http://de.dbpedia.org/resource/Passau> ?p ?o.
}
Result: relations, coordinates, leader, etc. What about the population?

Distributed Querying!
• Problem: Selection of appropriate endpoints
• Send the query to some endpoints and aggregate the results?
SELECT ?p ?o WHERE {
  <http://de.dbpedia.org/resource/Passau> ?p ?o.
}
Endpoints: de.dbpedia.org, linkedgeodata.org

Misunderstanding: Co-Referencing
• Problem: Different identifiers for the same semantic concept
SELECT ?p ?o WHERE {
  <http://de.dbpedia.org/resource/Passau> ?p ?o.
}
Endpoints: de.dbpedia.org, linkedgeodata.org
Known problem in linguistics:
“It’s a spud!”
“What?”
“I mean potato!”
Co-referencing: multiple expressions refer to the same thing.

Problem = Solution?
SPARQL-based crawling of co-reference information
Exploit co-reference information for
• accomplishing immediate SPARQL rewriting
• performing endpoint selection
• executing automatic query federation
Basic idea: Focusing distributed co-reference information
Main principle: Semantic entities over identifiers!

Components: balloon toolsuite

balloon Overflight
• SPARQL-based crawling of LOD endpoints
• Query: ask for subjects and objects that are related by a special predicate
• Simplified global view on
  • Equivalence: owl:sameAs, skos:exactMatch, coref:coreferenceData, ...
• Graph database: Neo4j
• Equivalence cluster: multiple synonym URIs representing the same semantic entity, including provenance
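
The crawl-and-cluster idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the balloon implementation: the function names `crawl_query` and `build_clusters` are hypothetical, and an in-memory union-find stands in for the Neo4j graph database named on the slide.

```python
from collections import defaultdict

# Full IRIs of two of the equivalence predicates named on the slide.
EQUIVALENCE_PREDICATES = [
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://www.w3.org/2004/02/skos/core#exactMatch",
]


def crawl_query(predicate, limit=1000, offset=0):
    """Build a paged SPARQL query for all subject/object pairs linked by `predicate`."""
    return (
        f"SELECT ?s ?o WHERE {{ ?s <{predicate}> ?o . }} "
        f"LIMIT {limit} OFFSET {offset}"
    )


def build_clusters(statements):
    """Group URIs into equivalence clusters with a union-find structure.

    `statements` holds (subject, object, endpoint) tuples, where `endpoint`
    is the (hypothetical) host the pair was crawled from.
    Returns a list of (set_of_uris, set_of_provenance_endpoints).
    """
    statements = list(statements)
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    # Every co-reference statement merges the clusters of its two URIs.
    for s, o, _ in statements:
        parent[find(s)] = find(o)

    uris = defaultdict(set)
    provenance = defaultdict(set)
    for u in list(parent):
        uris[find(u)].add(u)
    for s, _, endpoint in statements:
        provenance[find(s)].add(endpoint)
    return [(uris[root], provenance[root]) for root in uris]
```

Keeping the provenance set next to each cluster is what later allows endpoints to be selected by the origin of the co-reference information.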

balloon Fusion
SPARQL federation setup using co-reference information
SPARQL transformation for each basic graph pattern (BGP):
1. Determine synonym URIs
2. Select suitable endpoints
3. Adapt sub-queries to endpoints
4. Federated querying
Running example:
SELECT ?p ?o WHERE {
  <http://de.dbpedia.org/resource/Passau> ?p ?o.
}

1. Determine synonym URIs
SELECT ?p ?o WHERE {
  <http://de.dbpedia.org/resource/Passau> ?p ?o.
}
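
A rough sketch of this step, assuming the equivalence clusters are available as a simple lookup table: the names `CLUSTER_INDEX` and `synonyms_in_query` are hypothetical, and the regular expression is a simplification that picks up every IRI in the query text (including predicates) rather than parsing the BGP properly. The three URIs are the Passau synonyms that appear in step 3 of the slides.

```python
import re

# Hypothetical in-memory index: every URI maps to its full equivalence cluster.
PASSAU_CLUSTER = frozenset({
    "http://de.dbpedia.org/resource/Passau",
    "http://rdf.freebase.com/ns/m.01h5td",
    "http://linkedgeodata.org/triplify/node240057351",
})
CLUSTER_INDEX = {uri: PASSAU_CLUSTER for uri in PASSAU_CLUSTER}

# Matches <...> IRI references in the query text.
IRI_PATTERN = re.compile(r"<([^<>\s]+)>")


def synonyms_in_query(query, index=CLUSTER_INDEX):
    """Map each IRI found in the query text to its sorted list of synonym URIs.

    IRIs without a known cluster map to a list containing only themselves.
    """
    result = {}
    for iri in IRI_PATTERN.findall(query):
        result[iri] = sorted(index.get(iri, {iri}))
    return result


query = "SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }"
for uri, synonyms in synonyms_in_query(query).items():
    print(uri, "->", synonyms)
```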

2. Select suitable endpoints
• Provenance-based selection (PBS)
  • Endpoints that are involved in cluster composition
• Namespace-based selection (NBS)
  • Prefix and namespace matching of synonym URIs
Summarized: PBS uses the origin of the co-reference information; NBS uses the origin of the synonym URIs.

2. Select suitable endpoints (2)
Assumption:
• Provenance information only contains “linkedgeodata.org” as co-reference origin
• Namespaces for Freebase and DBpedia are available (datahub.io)
Selected endpoints:
• PBS: LinkedGeoData endpoint
• NBS: DBpedia endpoint
• NBS: Freebase endpoint
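
The two selection strategies can be sketched as below. This is an illustration only: `NAMESPACE_ENDPOINTS` and `select_endpoints` are hypothetical names, the namespace table is hard-coded here rather than loaded from datahub.io, and the endpoint URLs are the ones used in the federated query on the slides.

```python
# Hypothetical namespace registry: URI prefix -> SPARQL endpoint.
NAMESPACE_ENDPOINTS = {
    "http://de.dbpedia.org/resource/": "http://dbpedia.org/sparql",
    "http://rdf.freebase.com/ns/": "http://www.freebase.com/base/sparql",
}


def select_endpoints(synonyms, provenance_endpoints, namespaces=NAMESPACE_ENDPOINTS):
    """Return {endpoint: (strategy, uris_to_query_there)}.

    PBS: an endpoint that contributed co-reference statements to the cluster
         may know any of the synonyms, so it is asked about all of them.
    NBS: an endpoint whose namespace matches a synonym URI is asked only
         about the URIs in that namespace.
    """
    selection = {}
    for endpoint in provenance_endpoints:
        selection[endpoint] = ("PBS", sorted(synonyms))
    for uri in sorted(synonyms):
        for prefix, endpoint in namespaces.items():
            if not uri.startswith(prefix):
                continue
            if endpoint not in selection:
                selection[endpoint] = ("NBS", [uri])
            elif selection[endpoint][0] == "NBS" and uri not in selection[endpoint][1]:
                selection[endpoint][1].append(uri)
    return selection


synonyms = {
    "http://de.dbpedia.org/resource/Passau",
    "http://rdf.freebase.com/ns/m.01h5td",
    "http://linkedgeodata.org/triplify/node240057351",
}
for endpoint, (strategy, uris) in select_endpoints(
        synonyms, ["http://linkedgeodata.org/sparql/"]).items():
    print(strategy, endpoint, len(uris))
```

In this sketch an endpoint already chosen by PBS is not downgraded to NBS, since the PBS entry already covers every synonym.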

3. Adapt sub-queries to endpoints
Original query:
SELECT ?p ?o WHERE {
  <http://de.dbpedia.org/resource/Passau> ?p ?o.
}
NBS, Freebase endpoint:
SELECT ?p ?o WHERE {
  <http://rdf.freebase.com/ns/m.01h5td> ?p ?o.
}
NBS, DBpedia endpoint:
SELECT ?p ?o WHERE {
  <http://de.dbpedia.org/resource/Passau> ?p ?o.
}
PBS, LinkedGeoData endpoint:
SELECT ?p ?o WHERE {
  { <http://rdf.freebase.com/ns/m.01h5td> ?p ?o. }
  UNION
  { <http://linkedgeodata.org/triplify/node240057351> ?p ?o. }
  UNION
  { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
}
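
The adaptation step amounts to substituting the synonym URIs into the original pattern. A minimal sketch, assuming the simple `<uri> ?p ?o` shape of the running example (the helper name `sub_query` is hypothetical):

```python
def sub_query(uris, tail="?p ?o"):
    """Build the sub-query for one endpoint.

    One URI (NBS case) yields a plain triple pattern; several URIs
    (PBS case) yield a UNION with one branch per synonym.
    """
    uris = list(uris)
    if not uris:
        raise ValueError("at least one URI is required")
    if len(uris) == 1:
        body = f"<{uris[0]}> {tail} ."
    else:
        body = " UNION ".join(f"{{ <{u}> {tail} . }}" for u in uris)
    return f"SELECT ?p ?o WHERE {{ {body} }}"


# NBS: the DBpedia endpoint only sees its own identifier.
print(sub_query(["http://de.dbpedia.org/resource/Passau"]))

# PBS: the LinkedGeoData endpoint is asked about every synonym.
print(sub_query([
    "http://rdf.freebase.com/ns/m.01h5td",
    "http://linkedgeodata.org/triplify/node240057351",
    "http://de.dbpedia.org/resource/Passau",
]))
```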

4. Federated Querying
• W3C SPARQL 1.1 Federated Query Extension (SERVICE)
• A (partial) query can be executed against a remote SPARQL endpoint
• Distributed sub-queries don't contain SPARQL 1.1 features
SELECT ?p ?o WHERE {
  {
    SERVICE <http://dbpedia.org/sparql> {
      <http://de.dbpedia.org/resource/Passau> ?p ?o.
    }
  } UNION {
    SERVICE <http://www.freebase.com/base/sparql> {
      <http://rdf.freebase.com/ns/m.01h5td> ?p ?o.
    }
  } UNION {
    SERVICE <http://linkedgeodata.org/sparql/> {
      { <http://rdf.freebase.com/ns/m.01h5td> ?p ?o. }
      UNION
      { <http://linkedgeodata.org/triplify/node240057351> ?p ?o. }
      UNION
      { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
    }
  }
}
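
Assembling such a federated query from an endpoint selection could look roughly like this. The helper names `service_block` and `federated_query` are hypothetical; each SERVICE clause is wrapped in its own group so that the UNION operands are valid group graph patterns.

```python
def service_block(endpoint, uris, tail="?p ?o"):
    """Wrap the patterns for one endpoint in a braced SERVICE clause."""
    inner = " UNION ".join(f"{{ <{u}> {tail} . }}" for u in uris)
    return f"{{ SERVICE <{endpoint}> {{ {inner} }} }}"


def federated_query(selection, projection="?p ?o"):
    """Combine per-endpoint SERVICE blocks into a single UNION query.

    `selection` maps each endpoint URL to the list of URIs to ask it about.
    """
    blocks = [service_block(endpoint, uris) for endpoint, uris in selection.items()]
    body = "\n  UNION\n  ".join(blocks)
    return f"SELECT {projection} WHERE {{\n  {body}\n}}"


selection = {
    "http://dbpedia.org/sparql": ["http://de.dbpedia.org/resource/Passau"],
    "http://www.freebase.com/base/sparql": ["http://rdf.freebase.com/ns/m.01h5td"],
    "http://linkedgeodata.org/sparql/": [
        "http://rdf.freebase.com/ns/m.01h5td",
        "http://linkedgeodata.org/triplify/node240057351",
        "http://de.dbpedia.org/resource/Passau",
    ],
}
print(federated_query(selection))
```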

Optimizations (ongoing)
• Endpoint status check
  • Check routine in terms of availability and latency
• Minimize sub-queries
  • Group sub-queries with common endpoint
  • Push join to endpoint
• SPARQL features
  • Condense PBS UNION construct of synonym URIs
  • SPARQL 1.1 VALUES or FILTER with IN operator
  • Not well implemented in Linked Data endpoints
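
To illustrate the condensing idea, the UNION over synonym URIs can be expressed with SPARQL 1.1 `VALUES` or with `FILTER ... IN`. The helper names and the variable `?s` below are assumptions for this sketch; as the slide notes, not every endpoint supports these features.

```python
def values_pattern(uris, var="?s", tail="?p ?o"):
    """Condense the synonym UNION into a SPARQL 1.1 VALUES block."""
    iris = " ".join(f"<{u}>" for u in uris)
    return f"VALUES {var} {{ {iris} }} {var} {tail} ."


def filter_in_pattern(uris, var="?s", tail="?p ?o"):
    """Alternative form using FILTER with the IN operator."""
    iris = ", ".join(f"<{u}>" for u in uris)
    return f"{var} {tail} . FILTER ({var} IN ({iris}))"


uris = [
    "http://rdf.freebase.com/ns/m.01h5td",
    "http://linkedgeodata.org/triplify/node240057351",
    "http://de.dbpedia.org/resource/Passau",
]
print(values_pattern(uris))
print(filter_in_pattern(uris))
```

The `VALUES` form binds the subject up front, whereas the `FILTER` form first matches every `?s ?p ?o` triple and then filters, so an endpoint without a good optimizer may handle the latter poorly.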

balloon Overflight: Results
Results from a sounding balloon
balloon toolsuite

Statistics
• Datahub.io: Linked Open Data Cloud catalog
  • 337 datasets in total
  • 237 expose a SPARQL endpoint
  • 112 successfully queried for co-reference information
• balloon dataset (first run)
  • 17.6M co-reference statements
  • 22.4M distinct URIs
  • 8.4M equivalence clusters (~2.68 identifiers per cluster)
• Pending analysis
  • Distribution of cluster sizes, number of different hosts per cluster
  • Main representative per cluster and false friends
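
As a quick sanity check on these figures, the ratios can be recomputed from the rounded numbers on the slide (variable names are illustrative). The rounded inputs give an average slightly below the quoted ~2.68, presumably because the slide was computed from unrounded counts.

```python
# Figures as stated on the slide (the millions are rounded values).
datasets_total = 337
with_endpoint = 237
queried_ok = 112
distinct_uris = 22.4e6
clusters = 8.4e6

endpoint_share = with_endpoint / datasets_total   # share of datasets exposing SPARQL
success_share = queried_ok / with_endpoint        # share of endpoints crawled successfully
avg_cluster_size = distinct_uris / clusters       # identifiers per equivalence cluster

print(f"datasets with endpoint: {endpoint_share:.1%}")
print(f"endpoints queried:      {success_share:.1%}")
print(f"identifiers/cluster:    {avg_cluster_size:.2f}")
```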

Open Source: http://schlegel.github.io/balloon
• Demo, information and sources available (MIT License)
• X as a Service
  • SPARQL Rewriting (HTTP API)
  • Query Federation (SPARQL)

Single Point of Access
Summary:
• SPARQL-based crawling of distributed co-reference information
• Exploit co-reference information for SPARQL federation

Any questions?
“Research is formalized curiosity. It is poking and prying with a purpose.” - Zora Neale Hurston
