Weitere ähnliche Inhalte
Ähnlich wie Pal gov.tutorial2.session9.lab rdf-stores (14)
Mehr von Mustafa Jarrar (20)
Kürzlich hochgeladen (20)
Pal gov.tutorial2.session9.lab rdf-stores
- 1. أكاديمية الحكومة اإللكترونية الفلسطينية
The Palestinian eGovernment Academy
www.egovacademy.ps
Tutorial II: Data Integration and Open Information Systems
Session 9
RDF Stores: Challenges and Solutions
Dr. Mustafa Jarrar
University of Birzeit
mjarrar@birzeit.edu
www.jarrar.info
PalGov © 2011 1
- 2. About
This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the
Commission of the European Communities, grant agreement 511159-TEMPUS-1-
2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps
Project Consortium:
Birzeit University, Palestine
University of Trento, Italy
(Coordinator )
Palestine Polytechnic University, Palestine Vrije Universiteit Brussel, Belgium
Palestine Technical University, Palestine
Université de Savoie, France
Ministry of Telecom and IT, Palestine
University of Namur, Belgium
Ministry of Interior, Palestine
TrueTrust, UK
Ministry of Local Government, Palestine
Coordinator:
Dr. Mustafa Jarrar
Birzeit University, P.O.Box 14- Birzeit, Palestine
Telfax:+972 2 2982935 mjarrar@birzeit.eduPalGov © 2011
2
- 3. © Copyright Notes
Everyone is encouraged to use this material, or part of it, but should
properly cite the project (logo and website), and the author of that part.
No part of this tutorial may be reproduced or modified in any form or by
any means, without prior written permission from the project, who have
the full copyrights on the material.
Attribution-NonCommercial-ShareAlike
CC-BY-NC-SA
This license lets others remix, tweak, and build upon your work non-
commercially, as long as they credit you and license their new creations
under the identical terms.
PalGov © 2011 3
- 4. Tutorial Map
Topic h
Intended Learning Objectives
Session 1: XML Basics and Namespaces 3
A: Knowledge and Understanding
Session 2: XML DTD’s 3
2a1: Describe tree and graph data models.
Session 3: XML Schemas 3
2a2: Understand the notation of XML, RDF, RDFS, and OWL.
Session 4: Lab-XML Schemas 3
2a3: Demonstrate knowledge about querying techniques for data
models as SPARQL and XPath. Session 5: RDF and RDFs 3
2a4: Explain the concepts of identity management and Linked data. Session 6: Lab-RDF and RDFs 3
2a5: Demonstrate knowledge about Integration &fusion of Session 7: OWL (Ontology Web Language) 3
heterogeneous data. Session 8: Lab-OWL 3
B: Intellectual Skills Session 9: Lab-RDF Stores -Challenges and Solutions 3
2b1: Represent data using tree and graph data models (XML & Session 10: Lab-SPARQL 3
RDF). Session 11: Lab-Oracle Semantic Technology 3
2b2: Describe data semantics using RDFS and OWL. Session 12_1: The problem of Data Integration 1.5
2b3: Manage and query data represented in RDF, XML, OWL. Session 12_2: Architectural Solutions for the Integration Issues 1.5
2b4: Integrate and fuse heterogeneous data. Session 13_1: Data Schema Integration 1
C: Professional and Practical Skills Session 13_2: GAV and LAV Integration 1
2c1: Using Oracle Semantic Technology and/or Virtuoso to store Session 13_3: Data Integration and Fusion using RDF 1
and query RDF stores. Session 14: Lab-Data Integration and Fusion using RDF 3
D: General and Transferable Skills
2d1: Working with team. Session 15_1: Data Web and Linked Data 1.5
2d2: Presenting and defending ideas. Session 15_2: RDFa 1.5
2d3: Use of creativity and innovation in problem solving.
2d4: Develop communication skills and logical reasoning abilities. Session 16: Lab-RDFa 3
PalGov © 2011 4
- 5. Module ILOs
After completing this module students will be
able to:
- Demonstrate knowledge about querying techniques
for the graph data model.
- Manage and query data represented in RDF.
PalGov © 2011 5
- 6. Data Integration and Open Information Systems (Tutorial II)
The Palestinian e-Government Academy
January, 2012
Tutorial II: Data Integration and Open Information Systems
Part1: Querying SPO Tables
PalGov © 2011 6
- 7. Module Outline
Part I: Querying SPO Tables
Part 2: Querying SPO Tables using SQL
Part 3: Complexities and Solutions of Querying Graph
Shaped Data
PalGov © 2011 7
- 8. RDF as a Data Model
• RDF (Resource Description Framework) is the Data Model
behind the Semantic/Data Web.
• RDF is a graph-based data model.
• In RDF, Data is represented as triples: <Subject,
Predicate, Object>, or in short: <S,P,O>.
P O
S
PalGov © 2011 8
- 9. RDF as a Data Model
• RDF’s way of representing Data as triples is more
elementary than Databases or XML.
– This enables easy data integration and interoperability between
systems (as will be demonstrated later).
• EXAMPLE: the fact that the movie Sicko (2007) is directed by Michael
Moore from the USA can be represented by the following triples which
form a directed labeled graph:
< M1, name, Sicko > 2007
< M1, year, 2007 >
< M1, directedBy, D1 >
Sicko
< D1, name, Michael Moore > M1
< D1, country, C1 > name Michael
D1
< C1, name, USA> Moore
PalGov © 2011 9
- 10. Persistent Storage of RDF Graphs
S P O
• Typically, an RDF Graph is stored as M1 year 2007
one <S,P,O> Table in a Relational M1
M1
Name
directedBy
Sicko
D1
Database Management System. M2 directedBy D1
M2 Year 2009
2007
Sicko M2 Name Capitalism
Michael Moore M3 Year 1995
directedBy
M1 D1 Emmy Awards M3 directedBy D2
P1 1995 M3 Name Brave Heart
M2 2009
M4 Year 2007
Capitalism
USA M4 directedBy D3
C1
1995 M4 Name Caramel
Washington DC
directedBy D1 Name Michael Moore
M3 D2
actedIn P2 Oscars D1 hasWonPrizeIn P1
Brave Heart Mel Gibson
D1 Country C1
1996
2007 D2 Counrty C1
Nadine Labaki
directedBy D2 hasWonPrizeIn P2
M4 D3 C2 Lebanon
actedIn D2 Name Mel Gibson
Beirut D2 actedIn M3
Caramel location name
P3 Stockholm Festival D3 Name Nadine Labaki
C3
Sweden 2007 D3 Country C2
Stockholm D3 hasWonPrizeIn P3
D3 actedIn M4
… … …
PalGov © 2011 10
- 11. How to Query RDF data stored in one table?
S P O
• How do we query graph-shaped M1
M1
year
Name
2007
Sicko
data stored in one relational table? M1 directedBy D1
M2 directedBy D1
M2 Year 2009
M2 Name Capitalism
• How do we traverse a graph stored M3 Year 1995
M3 directedBy D2
in one relational table? M3 Name Brave Heart
M4 Year 2007
M4 directedBy D3
M4 Name Caramel
Self Joins! D1
D1
Name
hasWonPrizeIn
Michael Moore
P1
D1 Country C1
D2 Counrty C1
D2 hasWonPrizeIn P2
D2 Name Mel Gibson
D2 actedIn M3
D3 Name Nadine Labaki
D3 Country C2
D3 hasWonPrizeIn P3
D3 actedIn M4
… … …
PalGov © 2011 11
- 12. How to Query RDF data stored in one table?
S P O 2007
M1 year 2007 Sicko
Michael Moore
M1 Name Sicko directedBy
M1 D1 Emmy Awards
M1 directedBy D1
M2 directedBy D1 P1 1995
M2 2009
M2 Year 2009
M2 Name Capitalism Capitalism
C1 USA
M3 Year 1995 1995
M3 directedBy D2 Washington DC
directedBy
M3 Name Brave Heart M3 D2
actedIn P2
M4 Year 2007 Oscars
M4 directedBy D3 Brave Heart Mel Gibson 1996
M4 Name Caramel 2007 Nadine Labaki
D1 Name Michael Moore directedBy
M4 D3 C2 Lebanon
D1 hasWonPrizeIn P1 actedIn
… … … Beirut
Caramel location name
P3 Stockholm Festival
C3
Exercises: Sweden 2007
(1) What is the name of director D3? Stockholm
(2) What is the name of the director of the movie M1?
(3) List all the movies who have directors from the USA and their directors.
(4) List all the names of the directors from Lebanon who have won prizes and the
prizes they have won. PalGov © 2011 12
- 13. How to Query RDF data stored in one table?
S P O
• Exercise 1: A simple query. M1
M1
year
Name
2007
Sicko
– What is the name of director D3? M1 directedBy D1
M2 directedBy D1
– Consider the table MoviesTable (s,p,o). M2 Year 2009
– SQL: M2 Name Capitalism
M3 Year 1995
M3 directedBy D2
Select o M3 Name Brave Heart
From MoviesTable T M4 Year 2007
M4 directedBy D3
Where T.s = „D3‟ AND T.p = „Name‟; M4 Name Caramel
D1 Name Michael Moore
D1 hasWonPrizeIn P1
D1 Country C1
D2 Counrty C1
D2 hasWonPrizeIn P2
D2 Name Mel Gibson
D2 actedIn M3
D3 Name Nadine Labaki
D3 Country C2
D3 hasWonPrizeIn P3
D3 actedIn M4
… … …
PalGov © 2011 Ans: Nadine Labaki 13
- 14. How to Query RDF data stored in one table?
S P O
• Exercise 2: A path query with 1 join. M1
M1
year
Name
2007
Sicko
– What is the name of the director of the M1 directedBy D1
M2 directedBy D1
movie M1.
M2 Year 2009
– Consider the table MoviesTable (s,p,o). M2 Name Capitalism
M3 Year 1995
– SQL: M3 directedBy D2
M3 Name Brave Heart
Select T2.o M4 Year 2007
M4 directedBy D3
From MoviesTable T1, MoviesTable T2 M4 Name Caramel
Where {T1.s = „M1‟ AND D1 Name Michael Moore
D1 hasWonPrizeIn P1
T1.p = „directedBy‟ AND D1 Country C1
D2 Counrty C1
T1.o = T2.s AND D2 hasWonPrizeIn P2
T2.p = „name‟}; D2 Name Mel Gibson
D2 actedIn M3
D3 Name Nadine Labaki
D3 Country C2
D3 hasWonPrizeIn P3
D3 actedIn M4
… … …
PalGov © 2011 Ans: Michael Moore 14
- 15. How to Query RDF data stored in one table?
S P O
• Exercise 3: A path query with 2 M1
M1
year
Name
2007
Sicko
joins. M1 directedBy D1
M2 directedBy D1
– List all the movies who have directors from M2 Year 2009
the USA and their directors. M2 Name Capitalism
M3 Year 1995
– Consider the table MoviesTable(s,p,o). M3 directedBy D2
– SQL: M3 Name Brave Heart
… … …
D1 Name Michael Moore
Select T1.s, T1.o D1 hasWonPrizeIn P1
From MoviesTable T1, MoviesTable T2, D1 Country C1
D2 Counrty C1
MoviesTable T3 D2 hasWonPrizeIn P2
D2 Name Mel Gibson
Where {T1.o = T2.s AND
D2 actedIn M3
T2.o = T3.s AND … … …
C1 Name USA
T1.p = „directedBy‟ AND C1 Capital Washington DC
T2.p = „country‟ AND C2 Name Lebanon
C2 Capital Beirut
T3.p = „name‟ AND … … …
T3.o = „USA‟}; Ans: M1 D1; M2 D1; M3 D2
PalGov © 2011 15
- 16. How to Query RDF data stored in one table?
• Exercise 4: Star query.
– List all the names of the directors from Lebanon who have
won prizes and the prizes they have won.
S P O
– Consider the table MoviesTable (S,P,O). D1 Name Michael Moore
D1 hasWonPrizeIn P1
Select T1.o, T4.o D1 Country C1
From MoviesTable T1, MoviesTable T2, D2 Counrty C1
D2 hasWonPrizeIn P2
MoviesTable T3, MoviesTable T4
D2 Name Mel Gibson
Where {T1.p = „name‟ AND D2 actedIn M3
D3 Name Nadine Labaki
T2.s = T1.s AND D3 Country C2
T2.p = „country‟ AND D3 hasWonPrizeIn P3
D3 actedIn M4
T2.o = T3.s AND … … …
T3.p = „name‟ AND C1 Name USA
C1 Capital Washington DC
T3.o = „Lebanon‟ AND C2 Name Lebanon
T4.s = T1.s AND C2 Capital Beirut
… … …
T4.p = „hasWonPrizeIn‟}; Ans: ‘Nadine Labaki’ , P3
PalGov © 2011 16
- 17. Data Integration and Open Information Systems (Tutorial II)
The Palestinian e-Government Academy
January, 2012
Tutorial II: Data Integration and Open Information Systems
Part 2:
Querying SPO Tables using SQL
PalGov © 2011 17
- 18. Module Outline
Part I: Querying SPO Tables
Part 2: Querying SPO Tables using SQL
Part 3: Complexities and Solutions of Querying Graph
Shaped Data
PalGov © 2011 18
- 19. Practical Session
• Given the RDF graph in the next slide, do the following:
(1) Insert the data graph as <S,P,O> triples in a 3-column table in
any DBMS.
Schema of the table: Books (Subject, Predicate, Object)
(2) Write the following queries in SQL and execute them over the
newly created table:
• List all the authors born in a country which has the name Palestine.
• List the names of all authors with the name of their affiliation who are
born in a country whose capital’s population is14M. Note that the
author must have an affiliation.
• List the names of all books whose authors are born in Lebanon along
with the name of the author.
PalGov © 2011 19
- 20. Practical Session
Palestine
7.6K
Said
Capital Name
CN1 CA1 Jerusalem
AU1
BK1
CU Colombia University
Author Viswanathan 14.0M
BK2 AU2
India
Capital Name New Delhi
Wamadat CN2 CA2
Name
Author Naima
BK3 AU3 Lebanon
2.0M
The Prophet Gibran
CN3 CA3
BK4 Beirut
Author
AU4
This data graph is about books. It talks about four books (BK1-BK4).
Information recorded about a book includes data such as; its author,
affiliation, country of birth including its capital and the population of
its capital.
PalGov © 2011 20
- 21. Practical Session
• Each student should work alone.
• The student is encouraged to first answer each query mentally from the graph
before writing and executing the SQL query, and to compare the results of the
query execution with his/her expected results.
• The student is strongly recommended to write two additional queries, execute
them on the data graph, and hand them along with the required queries.
• The student must highlight problems and challenges of executing queries on
graph-shaped data and propose solutions to the problem.
• Each student must expect to present his/her queries at class and, more
importantly, he/she must be ready to discuss the problems of querying graph-
shaped data and his/her proposed solutions.
• The final delivery should include (i) a snapshot of the built table, (ii) the SQL
queries, (iii) The results of the queries, and (iv) a one-page discussion of the
problems of querying graph-shaped data and the proposed possible solutions.
These must be handed in a report form in PDF Format.
PalGov © 2011 21
- 22. Data Integration and Open Information Systems (Tutorial II)
The Palestinian e-Government Academy
January, 2012
Tutorial II: Data Integration and Open Information Systems
Part 3
Complexities and Solutions of Querying
Graph Shaped Data
PalGov © 2011 22
- 23. Module Outline
Part 1: Querying SPO Tables
Part 2: Querying SPO Tables using SQL
Part 3: Complexities and Solutions of Querying Graph
Shaped Data
PalGov © 2011 23
- 24. Complexity of querying graph-shaped data
• Where does the complexity of querying graph-shaped
data stored in a relational table lie?
• The complexity of querying a data graph lies in the
many self-joins that are to be performed on the table.
• Accessing multiple properties for a resource requires
subject-subject joins (Star-shaped queries).
• Path expressions require subject-object joins.
• In general, a query with n edges on an SPO table
requires n-1 self-joins of that table.
PalGov © 2011 24
- 25. Solution: Indexes and Dictionaries
• What solutions do you propose to optimized queries on graph-
shaped data?
• Build Indexes.
– On each column: S, P, O
– On Permutations of two columns: SP, SO, PS, PO, …
– On Permutation of three columns: SPO, SOP, PSO, POS,
OSP, OPS, …
• Dictionary Encoding String Data?
– Replacing all literals by ids.
– Triple stores are compressed.
– Allows for faster comparison and matching operations.
RDF3X Solution
PalGov © 2011 25
- 26. Solution: Store RDF in a Different Schema
Property Tables: Clustered Property Tables
Abadi et al, Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007.
Clustered Property Table:
• A clustered property table, contains clusters of
properties that tend to be defined together.
• Multiple property tables with different clusters of
properties may be created to optimize queries.
• A key requirement for this type of property table is
that a particular property may only appear in at most
one property table.
• Advantage: Some queries can be answered directly
from the property table with no joins.
- Ex: retrieve the title of the book authored in 2001.
PalGov © 2011 26
- 27. Solution: Store RDF in a Different Schema
Property Tables: Property-Class Tables
Abadi et al, Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007.
Property-Class Tables:
• A property-class table exploits the Type property of
subjects to cluster similar sets of subjects together in
the same table.
• Unlike the first type of property table, a property may
exist in multiple property-class tables.
• This solution is found in Jena2.
• Oracle adopts a property table-like data structure (they
call it a “subject-property matrix”). However, they are
not used for primary storage, but rather as an auxiliary
data structure.
PalGov © 2011 27
- 28. Solution: Store RDF in a Different Schema
Vertically Partitioned Approach
Abadi et al, Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007.
Six Vertically Partitioned Tables:
• The triples table is rewritten into n two column tables
where n is the number of unique properties in the data.
• In each of these tables, the first column contains the
subjects that define that property and the second
column contains the object values for those subjects.
• This method is called Vertical Partitioning.
• An experimental implementation of this approach was
performed using an open source column-oriented
database system (C-Store).
PalGov © 2011 28
- 29. Discussion
• Advantages and Disadvantages of storing RDF in other
schemas?
• What do you prefer?
• Any other suggestions to optimize queries over graph-
shaped data?
PalGov © 2011 29
- 30. optional
MashQL
(under development at Birzeit University, started at the univeristy of Cyprus)
• What about summarizing the data graph, and instead of querying
the original data, we query the summary and get the same results?
• What about summarizing 15M SPO triples into less then 500,000?
• This is called Graph Signature Indexing. It is currently in research
at Birzeit University.
Initial comparison
between GS performance
and Oracle’s Semantic
Technologies
PalGov © 2011 30
- 31. optional
MashQL
(under development at Birzeit University, started at the univeristy of Cyprus)
• A graphical query formulation language for the Data Web
• A general structured-data retrieval language (not merely an
interface) RDF Input
From:
http:www.site1.com/rdf
http:www.site2.comrdf
MashQL
Article
Title ArticleTitle
Author “^Hacker”
YearPubYear > 2000
Addresses all of the following challenges:
The user does not know the schema.
There is no offline or inline schemaontology.
The query may involve multiple sources.
The query language is expressive (not a single-purpose
interface)
PalGov © 2011 31
- 32. optional
MashQL Example
Celine Dion’s
Albums after
www.site1.com/rdf
2003?
RDF Input :A1 :Title “Taking Chances”
:A1 :Artist “Dion C.”
From: :A1 :ProdYear 2007
http://www.site1.com/rdf :A1 :Producer “Peer Astrom”
:A2 :Title “Monodose”
http://www.site2.com/rdf :A2 :Artist “Ziad Rahbani”
Query
www.site2.com/rdf
Everything :B1 rdf:Type bibo:Album
:B1 :Title “The prayer”
Artist “^Dion” :B1 :Artist “Celine Dion”
:B1 :Artist “Josh Groban”
YearProdYear > 2003 :B1 :Year 2008
Title AlbumTitle :B2 rdf:Type bibo:Album
:B2 :Title “Miracle”
:B2 :Author “Celine Dion”
:B2 :Year 2004
Interactive Query Formulation.
MashQL queries are translated into and executed as SPARQL.
PalGov © 2011 32
- 33. optional
MashQL Example
Celine Dion’s
Albums after
www.site1.com/rdf
2003?
RDF Input :A1 :Title “Taking Chances”
:A1 :Artist “Dion C.”
From: :A1 :ProdYear 2007
http://www.site1.com/rdf :A1 :Producer “Peer Astrom”
:A2 :Title “Monodose”
http://www.site2.com/rdf :A2 :Artist “Ziad Rahbani”
Query
www.site2.com/rdf
Everything :B1 rdf:Type bibo:Album
:B1 :Title “The prayer”
:B1 :Artist “Celine Dion”
:B1 :Artist “Josh Groban”
:B1 :Year 2008
:B2 rdf:Type bibo:Album
:B2 :Title “Miracle”
:B2 :Author “Celine Dion”
:B2 :Year 2004
PalGov © 2011 33
- 34. optional
MashQL Example
Celine Dion’s
Albums after
www.site1.com/rdf
2003?
RDF Input :A1 :Title “Taking Chances”
:A1 :Artist “Dion C.”
From: :A1 :ProdYear 2007
http://www.site1.com/rdf :A1 :Producer “Peer Astrom”
:A2 :Title “Monodose”
http://www.site2.com/rdf :A2 :Artist “Ziad Rahbani”
Query
www.site2.com/rdf
Everything :B1 rdf:Type bibo:Album
Everything :B1 :Title “The prayer”
Album :B1 :Artist “Celine Dion”
A1 :B1 :Artist “Josh Groban”
A2 :B1 :Year 2008
B1 :B2 rdf:Type bibo:Album
Types Individuals :B2 :Title “Miracle”
:B2 :Author “Celine Dion”
:B2 :Year 2004
Background queries
SELECT X WHERE {?X ?P ?O}
Union
SELECT X WHERE {?S ?P ?X}
SELECT X WHERE {?S rdf:Type ?X}
PalGov © 2011 34
- 35. optional
MashQL Example
Celine Dion’s
Albums after
www.site1.com/rdf
2003?
RDF Input :A1 :Title “Taking Chances”
:A1 :Artist “Dion C.”
From: :A1 :ProdYear 2007
http://www.site1.com/rdf :A1 :Producer “Peer Astrom”
:A2 :Title “Monodose”
http://www.site2.com/rdf :A2 :Artist “Ziad Rahbani”
Query
www.site2.com/rdf
Everything :B1 rdf:Type bibo:Album
:B1 :Title “The prayer”
Artist Contains Dion :B1 :Artist “Celine Dion”
Artist
Author Equals
Equals :B1 :Artist “Josh Groban”
Contains
Contains
Title OneOf :B1 :Year 2008
ProdYear Not :B2 rdf:Type bibo:Album
Type
Between :B2 :Title “Miracle”
LessThan
Year MoreThan :B2 :Author “Celine Dion”
:B2 :Year 2004
Background query
SELECT P WHERE {?Everything ?P ?O}
PalGov © 2011 35
- 36. optional
MashQL Example
Celine Dion’s
Albums after
www.site1.com/rdf
2003?
RDF Input :A1 :Title “Taking Chances”
:A1 :Artist “Dion C.”
From: :A1 :ProdYear 2007
http://www.site1.com/rdf :A1 :Producer “Peer Astrom”
:A2 :Title “Monodose”
http://www.site2.com/rdf :A2 :Artist “Ziad Rahbani”
Query
www.site2.com/rdf
Everything :B1 rdf:Type bibo:Album
:B1 :Title “The prayer”
Artist “^Dion :B1 :Artist “Celine Dion”
:B1 :Artist “Josh Groban”
Year ProdYear MoreTh 2003 :B1 :Year 2008
Artist Equals
Equals
Author
Contains :B2 rdf:Type bibo:Album
Title OneOf :B2 :Title “Miracle”
ProdYear Not
Between
:B2 :Author “Celine Dion”
Type LessThan :B2 :Year 2004
Year MoreThan
MoreThan
Background query
SELECT P WHERE {?Everything ?P ?O}
PalGov © 2011 36
- 37. optional
MashQL Example
Celine Dion’s
Albums after
www.site1.com/rdf
2003?
RDF Input :A1 :Title “Taking Chances”
:A1 :Artist “Dion C.”
From: :A1 :ProdYear 2007
http://www.site1.com/rdf :A1 :Producer “Peer Astrom”
:A2 :Title “Monodose”
http://www.site2.com/rdf :A2 :Artist “Ziad Rahbani”
Query
www.site2.com/rdf
Everything :B1 rdf:Type bibo:Album
:B1 :Title “The prayer”
Artist “^Dion :B1 :Artist “Celine Dion”
:B1 :Artist “Josh Groban”
YearProdYear > 2003 :B1 :Year 2008
:B2 rdf:Type bibo:Album
Title AlbumTitle :B2 :Title “Miracle”
Artist
Author :B2 :Author “Celine Dion”
Title
Title :B2 :Year 2004
ProdYear
Type
Year
Background query
SELECT P WHERE {?Everything ?P ?O}
PalGov © 2011 37
- 38. optional
MashQL Example
Celine Dion’s
Albums after
2003?
RDF Input
From:
http://www.site1.com/rdf
http://www.site2.com/rdf
Query
Everything
Artist “^Dion”
YearProdYear > 2003
Title AlbumTitle
PREFIX S1: <http://site1.com/rdf>
PREFIX S2: <http://site1.com/rdf>
SELECT ?AlbumTitle
FROM <http://site1.com/rdf>
FROM <http://site2.com/rdf>
WHERE {
{?X S1:Artist ?X1} UNION {?X S2:Artist ?X1}
{?X S1:ProdYear ?X2} UNION {?X S2:Year ?X2}
{{?X S1:Title ?AlbumTitle} UNION {?X S2:Title
?AlbumTitle}}
FILTER regex(?X1, “^Dion”)
FILTER (?X2 > 2003)}
PalGov © 2011 38
- 39. optional
MashQL Query Optimization
• The major challenge in MashQL is interactivity!
• User interaction must be within 100ms.
• However, querying RDF is typically slow.
• How about dealing with huge datasets!?
• Implemented Solutions (i.e., Oracle) proved to perform
weakly specially in MashQL background queries.
A query optimization solution was
developed. It is called Graph Signature
Indexing.
PalGov © 2011 39
- 40. optional
MashQL Query Optimization: Graph Signature
‘Graph Signature’ optimizes RDF queries by :
• Summarizing the RDF dataset
(Algorithms were developed)
• Then, executing the queries on the summary
instead of the original dataset
(A new execution plan was developed)
PalGov © 2011 40
- 41. optional
Experiments
• Experiments performed on 15M SPO triples (rows) of the Yago
Dataset.
• Graph Signature Index of less than 500K was built on the data.
• Two sets of queries were performed, and the results were as follows:
Extreme-case linear queries of depth 1-5:
Query A1 A2 A3 A4 A5
GS 0.344 1.953 5.250 10.234 24.672
Oracle 110.390 302.672 525.844 702.969 >20min
Queries that cover all MashQL Background queries some queries
that might occur as the final queries generated in MashQL:
Query Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9
GS 0.441 0.578 0.587 0.556 0.290 0.291 0.587 0.785 2.469
Oracle 25.640 29.875 29.593 29.641 19.922 21.922 62.734 64.813 32.703
PalGov © 2011 41
- 42. References
• Anton Deik, Bilal Faraj, Ala Hawash, Mustafa Jarrar: Towards Query
Optimization for the Data Web - Two Disk-Based algorithms: Trace
Equivalence and Bisimilarity.
• Mustafa Jarrar and Marios D. Dikaiakos: A Query Formulation
Language for the Data Web. IEEE Transactions on Knowledge and
Data Engineering.IEEE Computer Society.
• Abadi et al, Scalable Semantic Web Data Management Using Vertical
Partitioning. In VLDB 2007.
PalGov © 2011 42