Executing SPARQL Queries over the Web of Linked Data
Olaf Hartig*
Christian Bizer˚
Johann-Christoph Freytag*
*Humboldt-Universität zu Berlin ˚Freie Universität Berlin
●   Use URIs as names for things
●   Use HTTP URIs so that people can look up those names.
●   When someone looks up a URI, provide useful information.
●   Include links to other URIs so that they can discover more things.
                                        Tim Berners-Lee, July 2006

[Figure: the My Movie DB dataset]
Olaf Hartig - Executing SPARQL Queries over the Web of Linked Data
[Figure: the My Movie DB dataset illustrating the four principles]
●   The movies are named with HTTP URIs:
        http://mymovie.db/movie1342
        http://mymovie.db/movie0362
        http://mymovie.db/movie5112
        http://mymovie.db/movie2449
●   The URI http://mymovie.db/movie2449 is looked up
●   Links lead to URIs in a second dataset:
        http://geo.db/country21
        http://geo.db/country7
        http://geo.db/cityCJ
        http://geo.db/cityXA
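The second principle (HTTP URIs that can be looked up) can be illustrated with a small Python sketch. It is only an assumption about how a client might ask for RDF; the slides do not prescribe any client code. The names `RDF_ACCEPT`, `build_lookup_request` and `look_up` are hypothetical, the preference order of the media types is arbitrary, and `mymovie.db` is the fictitious host from the slides, so only the request is built here and nothing is sent.

```python
import urllib.request

# Media types of two RDF serializations; the preference order is an assumption.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> urllib.request.Request:
    """Build an HTTP GET request that asks for an RDF description of `uri`."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def look_up(uri: str, timeout: float = 10.0) -> bytes:
    """Dereference `uri` and return the raw response body (needs network access)."""
    request = build_lookup_request(uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Only inspect the request; the example host does not exist.
    request = build_lookup_request("http://mymovie.db/movie2449")
    print(request.get_method(), request.full_url)
    print("Accept:", request.get_header("Accept"))
```

A real client would pass the returned bytes to an RDF parser and then follow the URIs found in the parsed data, which is the fourth principle.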
●   The Web: a huge, globally distributed dataspace
 ●   Querying this dataspace opens new possibilities:
     ●   Aggregating data from different sources
     ●   Integrating fragmentary information
     ●   Achieving a more complete view




Traditional approach 1: data centralization
 ●   Querying a collection of
     copies from all relevant
     datasets




 ●   Misses unknown or new sources
 ●   Collection probably out of date
 ●   Will it scale?


Traditional approach 2: federated query processing
 ●   Querying a mediator which distributes
     subqueries to relevant sources and
     integrates the results
 ●   Requires sources to
     provide a query service
 ●   Requires information
     about the sources
 ●   Misses unknown
     or new sources

 [Figure: a mediator sending subqueries to several sources]
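To make the drawbacks of the federated approach concrete, here is a minimal Python sketch of a mediator. It is a toy under stated assumptions, not a real federation engine: the classes and functions (`Mediator`, `make_service`) are hypothetical, query services are simulated by in-memory lists, a subquery is a single triple pattern with `None` as a wildcard, and the triples themselves are made up for illustration.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = variable
QueryService = Callable[[Pattern], List[Triple]]


def make_service(triples: List[Triple]) -> QueryService:
    """Wrap a local triple list as a stand-in for a remote query service."""
    def service(pattern: Pattern) -> List[Triple]:
        return [t for t in triples
                if all(p is None or p == v for p, v in zip(pattern, t))]
    return service


class Mediator:
    """Sends a subquery to every registered source that advertises its predicate."""

    def __init__(self) -> None:
        # source name -> (predicates the source is known to use, its query service)
        self.catalog: Dict[str, Tuple[Set[str], QueryService]] = {}

    def register(self, name: str, predicates: Set[str], service: QueryService) -> None:
        self.catalog[name] = (predicates, service)

    def query(self, pattern: Pattern) -> List[Triple]:
        results: List[Triple] = []
        for predicates, service in self.catalog.values():
            if pattern[1] is None or pattern[1] in predicates:
                results.extend(service(pattern))
        return results


# Made-up example data: one registered source and one the mediator never heard of.
known = make_service(
    [("http://mymovie.db/movie2449", "filmingLocation", "http://geo.db/country21")])
unknown = make_service(
    [("http://other.example.org/movie1", "filmingLocation", "http://geo.db/country7")])

mediator = Mediator()
mediator.register("My Movie DB", {"filmingLocation"}, known)
answers = mediator.query((None, "filmingLocation", None))
print(answers)  # contains only the triple from the registered source
```

The second source is never consulted because it is missing from the catalog, which is the "misses unknown or new sources" drawback in miniature.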
Main drawback:

    You have to know the relevant data sources in advance.
    You restrict yourself to the selected sources.
    You do not tap the full potential of the Web!

A novel approach:

  Link Traversal Based Query Execution

                       Allows data sources to be discovered at runtime




Outline

     Part I
        Overview of Link Traversal based Query Execution


     Part II
        An Iterator based Implementation Approach




Main Idea
●   Intertwine query evaluation with traversal of RDF links
●   Alternately:
    ●   Evaluate parts of the query on a
        continuously augmented set of data
    ●   Look up URIs in intermediate
        solutions and add retrieved data
        to the queried data set



Example
●   Query (a chain of three triple patterns):
        http://.../movie2449 --filmingLocation--> ?loc --statistics--> ?stat --unemp_rate--> ?ur
●   Step 1: look up http://.../movie2449 and add the retrieved data
    to the queried data, including:
        http://.../movie2449   filmingLocation   http://geo.../Italy
●   Step 2: evaluate the first triple pattern on the queried data;
    intermediate solution:
        ?loc = http://geo.../Italy
●   Step 3: look up http://geo.../Italy and add the retrieved data
    to the queried data, including:
        http://geo.../Italy   statistics   http://stat.db/.../it
●   Step 4: evaluate the second triple pattern on the augmented data;
    intermediate solution:
        ?loc = http://geo.../Italy    ?stat = http://stat.db/.../it

In a Nutshell

 ●   Link traversal based query execution:
     ●   Evaluation on a continuously augmented dataset
     ●   Discovery of potentially relevant data during execution
     ●   Discovery driven by intermediate solutions


 ●   Main advantage:
     ●   No need to know all data sources in advance
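The interplay of evaluation and URI lookup can be sketched in a few lines of Python. This is a simplified illustration under several assumptions, not the authors' implementation: the Web is simulated by the dictionary `WEB`, all URIs in it are hypothetical stand-ins (the slides elide the real ones), the unemployment value is made up, and the helper names (`look_up`, `match`, `execute`) are invented for this sketch.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

MOVIE = "http://mymovie.db/movie2449"
ITALY = "http://geo.example.org/Italy"   # hypothetical URI
STAT = "http://stat.example.org/it"      # hypothetical URI

# Simulated Web: looking up a URI yields the triples published under it.
WEB: Dict[str, List[Triple]] = {
    MOVIE: [(MOVIE, "filmingLocation", ITALY)],
    ITALY: [(ITALY, "statistics", STAT)],
    STAT: [(STAT, "unemp_rate", "7.0")],  # made-up literal
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def look_up(term: str, dataset: Set[Triple], seen: Set[str]) -> None:
    """Dereference a URI once and add the retrieved triples to the queried data."""
    if term.startswith("http") and term not in seen:
        seen.add(term)
        dataset.update(WEB.get(term, []))


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            term = extended.get(term, term)  # substitute an existing binding
        if is_var(term):
            extended[term] = value
        elif term != value:
            return None
    return extended


def execute(patterns: List[Triple]) -> List[Binding]:
    dataset: Set[Triple] = set()
    seen: Set[str] = set()
    for pattern in patterns:                 # seed with URIs mentioned in the query
        for term in pattern:
            look_up(term, dataset, seen)
    solutions: List[Binding] = [{}]
    for pattern in patterns:                 # evaluate one triple pattern at a time
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in list(dataset):     # snapshot: the dataset grows below
                extended = match(pattern, triple, binding)
                if extended is not None:
                    for value in extended.values():
                        look_up(value, dataset, seen)  # traverse links
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


query = [(MOVIE, "filmingLocation", "?loc"),
         ("?loc", "statistics", "?stat"),
         ("?stat", "unemp_rate", "?ur")]
result = execute(query)
print(result)
```

Only the movie URI is known up front; the other two sources are found through intermediate solutions. Because each pattern is evaluated on a snapshot of the dataset, triples that arrive late are visible only to later patterns, so a sketch like this can miss solutions that a complete evaluation would find.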




Real-World Examples
Return phone numbers of authors of ontology engineering papers at ESWC'09.

 # Prefix declarations are omitted on the slide; the standard namespaces are assumed.
 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 PREFIX owl:  <http://www.w3.org/2002/07/owl#>
 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 PREFIX swc:  <http://data.semanticweb.org/ns/swc/ontology#>
 PREFIX swrc: <http://swrc.ontoware.org/ontology#>

 SELECT DISTINCT ?author ?phone WHERE {
     ?pub swc:isPartOf
           <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
     ?pub swc:hasTopic ?topic . ?topic rdfs:label ?topicLabel .
     FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .

     ?pub swrc:author ?author .
     { ?author owl:sameAs ?authorAlt }
     UNION
     { ?authorAlt owl:sameAs ?author }

     ?authorAlt foaf:phone ?phone
 }

 # of query results         2
 # of retrieved graphs      297
 # of accessed servers      16
 avg. execution time        1min 30sec
Outline

     Part I
        Overview of Link Traversal based Query Execution


     Part II
        An Iterator based Implementation Approach
         ➢   Introduction to the Iterator Paradigm
         ➢   Application to Link Traversal based Query Execution
         ➢   URI Prefetching
         ➢   Extension to the Iterator Paradigm
         ➢   Evaluation

Iterator based Query Execution
 ●   Iterator:
     ●   implements an operation
     ●   is a group of functions:
                OPEN, GETNEXT, CLOSE
 ●   Query execution uses
     a chain of iterators
 ●   Each iterator responsible
     for a single triple pattern

 [Figure: chain of iterators I1, I2, I3 for the query
    I1: ( http://.../movie2449  filmingLocation  ?loc )
    I2: ( ?loc  statistics  ?stat )
    I3: ( ?stat  unemp_rate  ?ur )]
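
The iterator paradigm can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the class names ListScan and Select, the method names open/get_next/close (standing in for OPEN, GETNEXT, CLOSE) and the sample numbers are assumptions made for this sketch.

```python
class ListScan:
    """Leaf iterator: hands out the items of an in-memory list one by one."""

    def __init__(self, items):
        self.items = items
        self.pos = 0

    def open(self):
        self.pos = 0

    def get_next(self):
        if self.pos >= len(self.items):
            return None  # signals the end of the result stream
        item = self.items[self.pos]
        self.pos += 1
        return item

    def close(self):
        self.pos = len(self.items)


class Select:
    """Passes on only those results of its predecessor that satisfy a predicate."""

    def __init__(self, predecessor, predicate):
        self.predecessor = predecessor
        self.predicate = predicate

    def open(self):
        self.predecessor.open()

    def get_next(self):
        # Pull from the predecessor until a result qualifies or the input ends.
        while True:
            item = self.predecessor.get_next()
            if item is None or self.predicate(item):
                return item

    def close(self):
        self.predecessor.close()


def run(root):
    """Drive a chain: OPEN once, GETNEXT until exhausted, then CLOSE."""
    results = []
    root.open()
    while (item := root.get_next()) is not None:
        results.append(item)
    root.close()
    return results


# Chain I1 -> I2 -> I3: scan numbers, keep the even ones, keep those above 2.
i1 = ListScan([1, 2, 3, 4, 5, 6])
i2 = Select(i1, lambda n: n % 2 == 0)
i3 = Select(i2, lambda n: n > 2)
print(run(i3))  # prints [4, 6]
```

Only the last iterator of the chain is called directly; each GETNEXT call pulls results through its predecessors on demand.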
Iterator based Query Execution
 Results from Ii-1 (input of Ii for tpi); μcur is the current input solution:

     ?c                           ?cStats
     http://geo.db/country/US     http://stats.example.org/USstatistics
     http://geo.db/country/IT     http://stats.example.org/ITstatistics
     http://geo.db/country/IT     http://stats.db/example/It
     http://example.db/ctry/DE    http://stats.example.org/Germany

 Example:  tpi = ( ?loc ex:stats ?s )
           μcur = { ?p → http://ex... , ?loc → http://geo... }

    1. Substitute tpcur = μcur [ tpi ]
                      tpcur = ( http://geo... ex:stats ?s )
    2. Find matching triples match(tpcur ) in queried data set
                      (http://geo... ex:stats http://db...), (http://geo... ex:stats http://ex...)
    3. Create solution μ' for each t in match(tpcur )
                      μ' = { ?s → http://db... }
    4. Return each μcur ∪ μ' as a result
                      { ?p → http://ex... , ?loc → http://geo.db/... , ?s → http://db... }
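
The four steps can be written down as a small Python sketch. The function names, the representation of solutions as dicts and of triples as tuples, and all http://example.org/ URIs are hypothetical choices for this illustration; the sketch also assumes that no variable occurs twice in one triple pattern.

```python
EX = "http://example.org/"  # hypothetical namespace for the sample data


def is_var(term):
    return term.startswith("?")


def substitute(mu, tp):
    """Step 1: replace every variable of tp that mu binds by its value."""
    return tuple(mu.get(term, term) for term in tp)


def match(tp, dataset):
    """Step 2: triples that agree with tp on all non-variable positions."""
    return [t for t in dataset
            if all(is_var(p) or p == v for p, v in zip(tp, t))]


def bindings(tp, triple):
    """Step 3: solution mu' that binds the remaining variables of tp."""
    return {p: v for p, v in zip(tp, triple) if is_var(p)}


def evaluate(mu_cur, tp_i, dataset):
    """Produce all results of iterator I_i for one input solution mu_cur."""
    tp_cur = substitute(mu_cur, tp_i)
    for t in match(tp_cur, dataset):
        mu_prime = bindings(tp_cur, t)
        yield {**mu_cur, **mu_prime}  # Step 4: mu_cur united with mu'


dataset = [
    (EX + "loc1", EX + "stats", EX + "statsA"),
    (EX + "loc1", EX + "stats", EX + "statsB"),
    (EX + "loc2", EX + "stats", EX + "statsC"),
]
mu_cur = {"?p": EX + "movie", "?loc": EX + "loc1"}
tp_i = ("?loc", EX + "stats", "?s")

results = list(evaluate(mu_cur, tp_i, dataset))
for mu in results:
    print(mu)
```

Each result keeps the bindings of μcur and adds a binding for ?s, one result per matching triple.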
Iterator based Query Execution
 ●   Results of Ii are solutions for tp1 , … , tpi

 [Figure: chain of iterators: Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1]

Application to Link Traversal
 ●   The queried data set grows
 ●   Look-up Requirement:
       Do not evaluate tpcur until the
       queried data set contains all
       data that can be retrieved from
       all URIs in tpcur

 [Figure: chain of iterators: Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1]

Application to Link Traversal
 [Figure: table of results from Ii-1 (columns ?c, ?cStats); μcur is the current input solution of Ii for tpi]

       1. Substitute tpcur = μcur [ tpi ]
       2. Ensure look-up requirement for tpcur
              (initiate look-ups and wait)
       3. Find matching triples match(tpcur ) in queried data set
       4. Create solution μ' for each t in match(tpcur )
       5. Return each μcur ∪ μ' as a result
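
A minimal Python sketch of these five steps follows. It is an illustration under stated assumptions, not the SWClLib code: the dict WEB is a hypothetical stand-in for HTTP look-ups, all http://example.org/ URIs are invented, and a look-up simply adds the triples stored under a URI to the queried data set.

```python
EX = "http://example.org/"  # hypothetical namespace

# Hypothetical stand-in for the Web: URI -> triples obtained by looking it up.
WEB = {
    EX + "movie": [(EX + "movie", EX + "filmingLocation", EX + "loc1")],
    EX + "loc1": [(EX + "loc1", EX + "stats", EX + "statsA")],
}


def is_var(term):
    return term.startswith("?")


def is_uri(term):
    return term.startswith("http://")


def substitute(mu, tp):
    return tuple(mu.get(term, term) for term in tp)


def match(tp, dataset):
    return [t for t in dataset
            if all(is_var(p) or p == v for p, v in zip(tp, t))]


def bindings(tp, triple):
    return {p: v for p, v in zip(tp, triple) if is_var(p)}


def ensure_lookups(tp_cur, dataset, looked_up):
    """Step 2: return only after every URI in tp_cur has been looked up."""
    for term in tp_cur:
        if is_uri(term) and term not in looked_up:
            looked_up.add(term)
            # A real engine would issue an HTTP GET here and wait for it.
            dataset.update(WEB.get(term, ()))


def evaluate(mu_cur, tp_i, dataset, looked_up):
    tp_cur = substitute(mu_cur, tp_i)             # step 1
    ensure_lookups(tp_cur, dataset, looked_up)    # step 2
    for t in match(tp_cur, dataset):              # step 3
        yield {**mu_cur, **bindings(tp_cur, t)}   # steps 4 and 5


def stage(inputs, tp, dataset, looked_up):
    """One iterator of the chain: evaluates tp for every input solution."""
    for mu_cur in inputs:
        yield from evaluate(mu_cur, tp, dataset, looked_up)


def execute(patterns, dataset):
    looked_up = set()
    stream = iter([{}])  # a single empty solution feeds the first iterator
    for tp in patterns:
        stream = stage(stream, tp, dataset, looked_up)
    return list(stream)


dataset = set()  # the queried data set starts empty and grows
query = [
    (EX + "movie", EX + "filmingLocation", "?loc"),
    ("?loc", EX + "stats", "?s"),
]
results = execute(query, dataset)
print(results)
print(len(dataset), "triples discovered during execution")
```

The second triple pattern can only be answered because the binding for ?loc, produced by the first iterator, led to a further look-up.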
Blocked Query Execution
 ●   Waiting for URI look-ups
     blocks query execution

 [Figure: chain of iterators: Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1; annotation: "Initiate look-ups and wait"]

URI Prefetching
 ●   Waiting for URI look-ups
     blocks query execution
 ●   URI prefetching: when a URI
     is bound to a variable, initiate
     look-up in the background

 [Figure: chain of iterators: Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1; annotations: "Initiate look-up", "Ensure look-up is finished", "Initiate look-ups and wait"]

URI Prefetching
 [Figure: table of results from Ii-1 (columns ?c, ?cStats); μcur is the current input solution of Ii for tpi]

       1. Substitute tpcur = μcur [ tpi ]
       2. Ensure look-up requirement for tpcur
       3. Find matching triples match(tpcur ) in queried data set
       4. Create solution μ' for each t in match(tpcur )
       5. Initiate parallel look-up for each new URI in μ'
       6. Return each μcur ∪ μ' as a result
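
One way to sketch prefetching in Python is with a thread pool from the standard library. Everything named here is an assumption for illustration: the Lookups class, the simulated fetch function with an artificial delay, the WEB dict and the http://example.org/ URIs. As a simplification, prefetched data is added to the queried data set only when a later step ensures the look-up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

EX = "http://example.org/"  # hypothetical namespace
WEB = {  # hypothetical stand-in for the Web
    EX + "movie": [(EX + "movie", EX + "filmingLocation", EX + "loc1"),
                   (EX + "movie", EX + "filmingLocation", EX + "loc2")],
    EX + "loc1": [(EX + "loc1", EX + "stats", EX + "statsA")],
    EX + "loc2": [(EX + "loc2", EX + "stats", EX + "statsB")],
}


def fetch(uri):
    time.sleep(0.05)  # simulated network latency
    return WEB.get(uri, [])


def is_var(term):
    return term.startswith("?")


def is_uri(term):
    return term.startswith("http://")


def substitute(mu, tp):
    return tuple(mu.get(term, term) for term in tp)


def match(tp, dataset):
    return [t for t in dataset
            if all(is_var(p) or p == v for p, v in zip(tp, t))]


def bindings(tp, triple):
    return {p: v for p, v in zip(tp, triple) if is_var(p)}


class Lookups:
    def __init__(self, workers=4):
        self.dataset = set()
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.futures = {}

    def prefetch(self, uri):
        """Start a background look-up unless one was started before."""
        if uri not in self.futures:
            self.futures[uri] = self.pool.submit(fetch, uri)

    def ensure(self, uri):
        """Block until the look-up of uri has finished; then add its data."""
        self.prefetch(uri)
        self.dataset.update(self.futures[uri].result())


def evaluate(mu_cur, tp_i, lookups):
    tp_cur = substitute(mu_cur, tp_i)                       # step 1
    for term in tp_cur:                                     # step 2
        if is_uri(term):
            lookups.ensure(term)
    solutions = [bindings(tp_cur, t)                        # steps 3 and 4
                 for t in match(tp_cur, lookups.dataset)]
    for mu_prime in solutions:                              # step 5
        for value in mu_prime.values():
            if is_uri(value):
                lookups.prefetch(value)
    for mu_prime in solutions:                              # step 6
        yield {**mu_cur, **mu_prime}


def stage(inputs, tp, lookups):
    for mu_cur in inputs:
        yield from evaluate(mu_cur, tp, lookups)


def execute(patterns):
    lookups = Lookups()
    stream = iter([{}])
    for tp in patterns:
        stream = stage(stream, tp, lookups)
    results = list(stream)
    lookups.pool.shutdown(wait=True)
    return results


results = execute([
    (EX + "movie", EX + "filmingLocation", "?loc"),
    ("?loc", EX + "stats", "?s"),
])
print(sorted(mu["?s"] for mu in results))
```

By the time the second iterator needs the data for a location URI, its look-up is already running in the background, so the wait in step 2 is shorter or disappears.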

URI Prefetching
 ●   Even with URI prefetching
     query execution may block
 ●   Possible solutions:
     ●   Program parallelism
     ●   Asynchronous pipeline
 ●   Drawback: requires major
     rewrite of existing
     query engines

 [Figure: chain of iterators: Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1; annotation: "Wait until look-up is finished"]

Postponing Iterator
 ●   Enabled by an extension of the iterator paradigm:
     ●   New function POSTPONE: take the most recently provided
                                result back
     ●   Adjusted GETNEXT: either return the next result or return
                           a formerly postponed result
 ●   POSTPONE allows an iterator to temporarily reject the input solution μcur

Postponing Iterator
 [Figure: table of results from Ii-1 (columns ?c, ?cStats); μcur is the current input solution of Ii for tpi]

 1. Substitute tpcur = μcur [ tpi ]
 2. POSTPONE μcur if look-up requirement doesn't hold for tpcur
 3. Find matching triples match(tpcur ) in queried data set
 4. Create solution μ' for each t in match(tpcur )
 5. Initiate parallel look-up for each new URI in μ'
 6. Return each μcur ∪ μ' as a result
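
The interplay of POSTPONE and the adjusted GETNEXT can be sketched as follows. The slides do not fix the details, so this sketch makes its own assumptions: the predecessor hands out postponed results only after its fresh results are exhausted, the consuming iterator rejects each input solution at most once and then waits, and finished look-ups are simulated by a plain set. The names Predecessor, process, finished and wait_for are hypothetical, and steps 3 to 6 are left out.

```python
from collections import deque

EX = "http://example.org/"  # hypothetical namespace


def is_uri(term):
    return term.startswith("http://")


def substitute(mu, tp):
    return tuple(mu.get(term, term) for term in tp)


class Predecessor:
    """Stands in for I_{i-1}: a result source extended with POSTPONE."""

    def __init__(self, solutions):
        self.fresh = deque(solutions)
        self.postponed = deque()
        self.last = None

    def get_next(self):
        """Return the next result, or a formerly postponed one."""
        if self.fresh:
            self.last = self.fresh.popleft()
        elif self.postponed:
            self.last = self.postponed.popleft()
        else:
            self.last = None
        return self.last

    def postpone(self):
        """Take the most recently provided result back."""
        if self.last is not None:
            self.postponed.append(self.last)
            self.last = None


def process(pred, tp_i, finished, wait_for):
    """Yield input solutions in the order in which I_i can work on them."""
    rejected = set()  # solutions that were postponed once already
    while (mu_cur := pred.get_next()) is not None:
        tp_cur = substitute(mu_cur, tp_i)                    # step 1
        pending = [t for t in tp_cur
                   if is_uri(t) and t not in finished]
        key = frozenset(mu_cur.items())
        if pending and key not in rejected:                  # step 2
            rejected.add(key)
            pred.postpone()
            continue
        for uri in pending:  # second encounter: block after all
            wait_for(uri)
        yield mu_cur         # steps 3 to 6 would follow here


# The look-up for loc2 has finished, the one for loc1 has not.
finished = {EX + "stats", EX + "loc2"}
pred = Predecessor([{"?loc": EX + "loc1"}, {"?loc": EX + "loc2"}])
order = list(process(pred, ("?loc", EX + "stats", "?s"),
                     finished, finished.add))
print(order)  # the loc2 solution is processed before the loc1 solution
```

Instead of blocking on the first input solution, the iterator works on the one whose data is already available and returns to the rejected solution later.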

Evaluation
 ●   Implementation: Semantic Web Client Library (SWClLib)
          http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
 ●   Berlin SPARQL Benchmark (BSBM)
     ●   Simulates e-commerce scenario
     ●   Mix of 12 SPARQL queries
     ●   Generates datasets of different sizes (scaling factor)
 ●   Simulation of the Web of Linked Data
     ●   Linked Data server publishes BSBM datasets
 ●   Experiment
     ●   Adjusted BSBM queries link to the simulation server
     ●   Execute query mix with SWClLib

Evaluation
 [Figure: plot of avg. execution time per query mix in seconds (axis 0 to 250)
  over the BSBM scaling factor (axis 10 to 60), with four series:
  w/o prefetching; w/ prefetching; non-blocking + prefetching;
  all data retrieved in advance]

   scal. factor   # of triples   # of entities
        10            4,971            613
        20            8,485            928
        30           11,999          1,245
        40           16,918          1,845
        50           22,616          2,599
        60           26,108          2,914

Take-away Summary
 ●   Novel query execution approach for the Web of Data:
     ●   Utilizes the characteristics of the Web
     ●   Traverses RDF links during query execution
     ●   Discovery of new data sources
     ●   No need to know all data sources in advance
 ●   Implementation approach:
     ●   Iterator based execution with URI Prefetching
     ●   Extension of the iterator paradigm (POSTPONE)
 ●   New research challenges:
     ●   Improving result completeness
     ●   Investigating suitable caching strategies
Try it!
 ●   SQUIN: http://squin.org
     ●   Provides SWClLib functionality as a Web service
     ●   Accessible like a SPARQL endpoint
 ●   Public SQUIN service at
                      http://squin.informatik.hu-berlin.de/SQUIN/
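
"Accessible like a SPARQL endpoint" suggests that a query can be sent as the query parameter of an HTTP GET request, as the SPARQL protocol defines. The sketch below only builds such a request URL and sends nothing. Whether the service URL from this slide is itself the endpoint, and whether the service is still online, are assumptions; the sample query and its http://example.org/ URI are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Service URL as given on the slide; using it directly as the endpoint is an assumption.
SQUIN_SERVICE = "http://squin.informatik.hu-berlin.de/SQUIN/"


def build_request_url(endpoint, query):
    """Encode a SPARQL query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


query = "SELECT * WHERE { <http://example.org/movie> ?p ?o } LIMIT 5"
url = build_request_url(SQUIN_SERVICE, query)
print(url)

# Sanity check: decoding the URL gives the original query back.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == query)  # prints True
```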




These slides have been created by Olaf Hartig
     http://olafhartig.de

This work is licensed under a
Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)

Weitere ähnliche Inhalte

Mehr von Olaf Hartig

LDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked DataLDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked DataOlaf Hartig
 
A Context-Based Semantics for SPARQL Property Paths over the Web
A Context-Based Semantics for SPARQL Property Paths over the WebA Context-Based Semantics for SPARQL Property Paths over the Web
A Context-Based Semantics for SPARQL Property Paths over the WebOlaf Hartig
 
Rethinking Online SPARQL Querying to Support Incremental Result Visualization
Rethinking Online SPARQL Querying to Support Incremental Result VisualizationRethinking Online SPARQL Querying to Support Incremental Result Visualization
Rethinking Online SPARQL Querying to Support Incremental Result VisualizationOlaf Hartig
 
Tutorial "Linked Data Query Processing" Part 5 "Query Planning and Optimizati...
Tutorial "Linked Data Query Processing" Part 5 "Query Planning and Optimizati...Tutorial "Linked Data Query Processing" Part 5 "Query Planning and Optimizati...
Tutorial "Linked Data Query Processing" Part 5 "Query Planning and Optimizati...Olaf Hartig
 
Tutorial "Linked Data Query Processing" Part 4 "Execution Process" (WWW 2013 ...
Tutorial "Linked Data Query Processing" Part 4 "Execution Process" (WWW 2013 ...Tutorial "Linked Data Query Processing" Part 4 "Execution Process" (WWW 2013 ...
Tutorial "Linked Data Query Processing" Part 4 "Execution Process" (WWW 2013 ...Olaf Hartig
 
Executing SPARQL Queries over the Web of Linked Data

  • 1. Executing SPARQL Queries over the Web of Linked Data Olaf Hartig* Christian Bizer˚ Johann-Christoph Freytag* *Humboldt-Universität zu Berlin ˚Freie Universität Berlin
  • 2. ● Use URIs as names for things ● Use HTTP URIs so that people can look up those names. ● When someone looks up a URI, provide useful information. ● Include links to other URIs so that they can discover more things. (Tim Berners-Lee, July 2006) Example: My Movie DB. Olaf Hartig - Executing SPARQL Queries over the Web of Linked Data
  • 3. My Movie DB names its things with HTTP URIs: http://mymovie.db/movie1342 http://mymovie.db/movie0362 http://mymovie.db/movie5112 http://mymovie.db/movie2449
  • 4-6. A client looks up http://mymovie.db/movie2449 in My Movie DB.
  • 7-8. The data includes links to URIs from another source: http://geo.db/country21 http://geo.db/country7 http://geo.db/cityCJ http://geo.db/cityXA
  • 9. The Web: a huge, globally distributed dataspace ● Querying this dataspace opens new possibilities: ● Aggregating data from different sources ● Integrating fragmentary information ● Achieving a more complete view
  • 10-11. Traditional approach 1: data centralization ● Querying a collection of copies from all relevant datasets ● Misses unknown or new sources ● Collection probably out of date ● Will it scale?
  • 12-13. Traditional approach 2: federated query processing ● Querying a mediator which distributes subqueries to relevant sources and integrates the results ● Requires sources to provide a query service ● Requires information about the sources ● Misses unknown or new sources
  • 14. Main drawback: You have to know the relevant data sources in advance. You restrict yourself to the selected sources. You do not tap the full potential of the Web!
  • 15. A novel approach: Link Traversal Based Query Execution. Allows data sources to be discovered at runtime.
  • 16. Outline Part I Overview of Link Traversal based Query Execution Part II An Iterator based Implementation Approach
  • 17. Main Idea ● Intertwine query evaluation with traversal of RDF links ● Alternately: ● Evaluate parts of the query on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the queried data set
  • 18. Main Idea (example) Query, as triple patterns: ( http://.../movie2449 filmingLocation ?loc ) ( ?loc statistics ?stat ) ( ?stat unemp_rate ?ur )
  • 19-22. Look up http://.../movie2449 and add the retrieved data to the queried data set.
  • 23. Queried data now contains: ( http://.../movie2449 filmingLocation http://geo.../Italy )
  • 24. Intermediate solution: ?loc → http://geo.../Italy
  • 25-29. Look up http://geo.../Italy and add the retrieved data to the queried data set.
  • 30. Queried data now contains: ( http://geo.../Italy statistics http://stat.db/.../it )
  • 31. Intermediate solution: ?loc → http://geo.../Italy , ?stat → http://stats.db/../it
  • 32. In a Nutshell ● Link traversal based query execution: ● Evaluation on a continuously augmented dataset ● Discovery of potentially relevant data during execution ● Discovery driven by intermediate solutions ● Main advantage: ● No need to know all data sources in advance
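The alternation described on the Main Idea slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Web is replaced by a hypothetical in-memory dictionary `WEB` that maps a URI to the triples a lookup would return, all URIs and the unemployment value are made up, and the names `match`, `evaluate` and `link_traversal_query` are invented for this example. For simplicity it re-evaluates every prefix of the query after each lookup, which a real engine would avoid.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical stand-in for the Web: URI -> triples returned by looking it up.
# All URIs and the unemployment value are made up for illustration.
WEB: Dict[str, List[Triple]] = {
    "http://mymovie.example/movie2449": [
        ("http://mymovie.example/movie2449", "filmingLocation", "http://geo.example/Italy"),
    ],
    "http://geo.example/Italy": [
        ("http://geo.example/Italy", "statistics", "http://stats.example/it"),
    ],
    "http://stats.example/it": [
        ("http://stats.example/it", "unemp_rate", "0.07"),
    ],
}

QUERY: List[Triple] = [
    ("http://mymovie.example/movie2449", "filmingLocation", "?loc"),
    ("?loc", "statistics", "?stat"),
    ("?stat", "unemp_rate", "?ur"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def is_uri(term: str) -> bool:
    return term.startswith("http://")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def evaluate(patterns: List[Triple], data: Set[Triple]) -> List[Binding]:
    """Evaluate a list of triple patterns over the data retrieved so far."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in data:
                m = match(pattern, triple, binding)
                if m is not None:
                    extended.append(m)
        solutions = extended
    return solutions


def link_traversal_query(patterns: List[Triple], web: Dict[str, List[Triple]]):
    data: Set[Triple] = set()      # the continuously augmented queried data set
    looked_up: Set[str] = set()
    # Seed the traversal with the URIs mentioned in the query itself.
    pending = {term for p in patterns for term in p if is_uri(term)}
    while pending:
        uri = pending.pop()
        looked_up.add(uri)
        data.update(web.get(uri, []))          # "look up" the URI, add its data
        # Intermediate solutions (for each query prefix) drive further lookups.
        for k in range(1, len(patterns) + 1):
            for solution in evaluate(patterns[:k], data):
                for value in solution.values():
                    if is_uri(value) and value not in looked_up:
                        pending.add(value)
    return evaluate(patterns, data), looked_up
```

Calling `link_traversal_query(QUERY, WEB)` starts from the single URI in the query and discovers the two other toy sources only through intermediate solutions, which mirrors the point of slide 32 that the sources need not be known in advance.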
  • 33. Real-World Examples. Return phone numbers of authors of ontology engineering papers at ESWC'09.
      SELECT DISTINCT ?author ?phone WHERE {
        ?pub swc:isPartOf <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
        ?pub swc:hasTopic ?topic .
        ?topic rdfs:label ?topicLabel .
        FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
        ?pub swrc:author ?author .
        { ?author owl:sameAs ?authorAlt } UNION { ?authorAlt owl:sameAs ?author }
        ?authorAlt foaf:phone ?phone
      }
      # of query results: 2
      # of retrieved graphs: 297
      # of accessed servers: 16
      avg. execution time: 1min 30sec
  • 34. Outline Part I Overview of Link Traversal based Query Execution Part II An Iterator based Implementation Approach ➢ Introduction to the Iterator Paradigm ➢ Application to Link Traversal based Query Execution ➢ URI Prefetching ➢ Extension to the Iterator Paradigm ➢ Evaluation
  • 35-37. Iterator based Query Execution ● Iterator: ● implements an operation ● is a group of functions: OPEN, GETNEXT, CLOSE ● Query execution uses a chain of iterators (I1, I2, I3) ● Each iterator is responsible for a single triple pattern: I1 for ( http://.../movie2449 filmingLocation ?loc ), I2 for ( ?loc statistics ?stat ), I3 for ( ?stat unemp_rate ?ur )
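As a rough illustration of the OPEN / GETNEXT / CLOSE interface and of chaining one iterator per triple pattern, here is a small Python sketch. It is an assumption-laden toy, not the implementation presented in the talk: the class names `RootIterator` and `TriplePatternIterator`, the helper `run_query`, and all URIs and values in `DATA` are hypothetical, and the data set is fixed in memory, so no URI lookups happen during execution.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


class RootIterator:
    """Start of the chain: yields one empty solution, then is exhausted."""

    def open(self) -> None:
        self._done = False

    def get_next(self) -> Optional[Binding]:
        if self._done:
            return None
        self._done = True
        return {}

    def close(self) -> None:
        pass


class TriplePatternIterator:
    """Iterator responsible for a single triple pattern (hypothetical sketch)."""

    def __init__(self, pattern: Triple, data: List[Triple], predecessor) -> None:
        self.pattern = pattern
        self.data = data
        self.predecessor = predecessor
        self._pending: List[Binding] = []

    def open(self) -> None:
        self.predecessor.open()
        self._pending = []

    def get_next(self) -> Optional[Binding]:
        # Pull solutions from the predecessor until one can be extended.
        while not self._pending:
            current = self.predecessor.get_next()
            if current is None:
                return None            # predecessor exhausted, so are we
            self._pending = self._extend(current)
        return self._pending.pop(0)

    def close(self) -> None:
        self.predecessor.close()

    def _extend(self, current: Binding) -> List[Binding]:
        """All extensions of `current` that make the pattern match a triple."""
        results = []
        for triple in self.data:
            candidate = dict(current)
            for p, t in zip(self.pattern, triple):
                if p.startswith("?"):
                    if candidate.setdefault(p, t) != t:
                        break
                elif p != t:
                    break
            else:
                results.append(candidate)
        return results


def run_query(patterns: List[Triple], data: List[Triple]) -> List[Binding]:
    """Build the chain I1, I2, ... and drain it through the last iterator."""
    iterator = RootIterator()
    for pattern in patterns:
        iterator = TriplePatternIterator(pattern, data, iterator)
    iterator.open()
    results = []
    solution = iterator.get_next()
    while solution is not None:
        results.append(solution)
        solution = iterator.get_next()
    iterator.close()
    return results


# Made-up data and query, modelled on the movie example from the slides.
DATA: List[Triple] = [
    ("http://mymovie.example/movie2449", "filmingLocation", "http://geo.example/Italy"),
    ("http://geo.example/Italy", "statistics", "http://stats.example/it"),
    ("http://stats.example/it", "unemp_rate", "0.07"),
]
QUERY: List[Triple] = [
    ("http://mymovie.example/movie2449", "filmingLocation", "?loc"),
    ("?loc", "statistics", "?stat"),
    ("?stat", "unemp_rate", "?ur"),
]
```

Only the last iterator is called directly; each `get_next` call pulls from its predecessor on demand, which is the pipelining behaviour the iterator paradigm is chosen for.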
  • 38-43. Iterator based Query Execution
    Results from I_(i-1) (one row is the current input solution μ_cur):
      ?c                          ?cStats
      http://geo.db/country/US    http://stats.example.org/USstatistics
      http://geo.db/country/IT    http://stats.example.org/ITstatistics
      http://geo.db/country/IT    http://stats.db/example/It
      http://example.db/ctry/DE   http://stats.example.org/Germany
    I_i for tp_i processes each input solution μ_cur as follows:
      1. Substitute: tp_cur = μ_cur[ tp_i ]
      2. Find matching triples match(tp_cur) in the queried data set
      3. Create a solution μ' for each t in match(tp_cur)
      4. Return each μ_cur ∪ μ' as a result
    Example, with tp_i = ( ?loc ex:stats ?s ) and μ_cur = { ?p → http://ex... , ?loc → http://geo... }:
      1. tp_cur = ( http://geo... ex:stats ?s )
      2. match(tp_cur) = (http://geo... ex:stats http://db...), (http://geo... ex:stats http://ex...)
      3. μ' = { ?s → http://db... }
      4. { ?p → http://ex... , ?loc → http://geo.db/... , ?s → http://db... }
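The four steps can be sketched as a small Python generator. This is a minimal sketch under assumptions: solutions are plain dicts, triples are 3-tuples of strings, variables start with `?`, and the function names and example.org URIs are invented for illustration (the slide's own URIs are elided and are not reproduced here).

```python
from typing import Dict, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]  # variable name -> bound term


def is_var(term: str) -> bool:
    return term.startswith("?")


def substitute(mu_cur: Solution, tp_i: Triple) -> Triple:
    """Step 1: replace every variable that mu_cur binds by its value."""
    return tuple(mu_cur.get(t, t) if is_var(t) else t for t in tp_i)


def match(tp_cur: Triple, dataset: Iterable[Triple]) -> Iterator[Solution]:
    """Steps 2 and 3: yield one solution mu' per triple that matches tp_cur."""
    for triple in dataset:
        mu_new: Solution = {}
        for term, value in zip(tp_cur, triple):
            if is_var(term):
                # A variable used twice must be bound to the same value.
                if mu_new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield mu_new


def evaluate(mu_cur: Solution, tp_i: Triple,
             dataset: Iterable[Triple]) -> Iterator[Solution]:
    """Step 4: return the union of mu_cur and each mu'."""
    tp_cur = substitute(mu_cur, tp_i)
    for mu_new in match(tp_cur, dataset):
        yield {**mu_cur, **mu_new}


# Hypothetical data, shaped like the example on the slide.
dataset = [
    ("http://example.org/geo/IT", "ex:stats", "http://example.org/stats/IT"),
    ("http://example.org/geo/US", "ex:stats", "http://example.org/stats/US"),
]
mu_cur = {"?p": "http://example.org/p1", "?loc": "http://example.org/geo/IT"}
for result in evaluate(mu_cur, ("?loc", "ex:stats", "?s"), dataset):
    print(result)
```

Because `?loc` is already bound in the input solution, only triples about that one location can match, and each result extends the input solution by a binding for `?s`.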
  • 44. Iterator based Query Execution
    ● Results of I_i are solutions for tp_1, …, tp_i
    [Figure: iterator chain I_(i-1) for tp_(i-1), I_i for tp_i, I_(i+1) for tp_(i+1)]
  • 45. Outline
    Part I: Overview of Link Traversal based Query Execution
    Part II: An Iterator based Implementation Approach
      ➢ Introduction to the Iterator Paradigm
      ➢ Application to Link Traversal based Query Execution
      ➢ URI Prefetching
      ➢ Extension to the Iterator Paradigm
      ➢ Evaluation
  • 46-47. Application to Link Traversal
    ● The queried data set grows
    ● Look-up requirement: do not evaluate tp_cur until the queried data set contains all data that can be retrieved from all URIs in tp_cur
    [Figure: iterator chain I_(i-1) for tp_(i-1), I_i for tp_i, I_(i+1) for tp_(i+1)]
  • 48-50. Application to Link Traversal
    Results from I_(i-1): the same ?c / ?cStats table as on slide 38; one row is the current input solution μ_cur.
    I_i for tp_i:
      1. Substitute: tp_cur = μ_cur[ tp_i ]
      2. Ensure the look-up requirement for tp_cur (initiate look-ups and wait)
      3. Find matching triples match(tp_cur) in the queried data set
      4. Create a solution μ' for each t in match(tp_cur)
      5. Return each μ_cur ∪ μ' as a result
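The added step, ensuring the look-up requirement, could look roughly like the following. This is a sketch under assumptions: `fetch` stands in for an HTTP look-up that dereferences a URI and returns RDF triples, the `WEB` dict simulates the Web, and all names and URIs are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_uri(term: str) -> bool:
    return term.startswith("http://") or term.startswith("https://")


def ensure_lookup_requirement(tp_cur: Triple,
                              dataset: Set[Triple],
                              looked_up: Set[str],
                              fetch: Callable[[str], Iterable[Triple]]) -> None:
    """Look up every URI in tp_cur that has not been looked up yet.

    The call returns only after all look-ups have finished, so the
    calling iterator is blocked in the meantime.
    """
    for term in tp_cur:
        if is_uri(term) and term not in looked_up:
            dataset.update(fetch(term))  # synchronous: wait for the data
            looked_up.add(term)


# Simulated Web: URI -> triples served when the URI is looked up.
WEB: Dict[str, List[Triple]] = {
    "http://example.org/geo/IT": [
        ("http://example.org/geo/IT", "http://example.org/stats",
         "http://example.org/stats/IT"),
    ],
}


def fetch(uri: str) -> List[Triple]:
    return WEB.get(uri, [])  # unknown URIs yield no data


dataset: Set[Triple] = set()
looked_up: Set[str] = set()
tp_cur = ("http://example.org/geo/IT", "http://example.org/stats", "?s")
ensure_lookup_requirement(tp_cur, dataset, looked_up, fetch)
print(len(dataset), sorted(looked_up))
```

Only after this call does the iterator search the (now larger) queried data set for triples matching tp_cur.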
  • 51. Blocked Query Execution
    ● Waiting for URI look-ups blocks query execution
    [Figure: iterator chain I_(i-1) for tp_(i-1), I_i for tp_i, I_(i+1) for tp_(i+1), annotated "Initiate look-ups and wait"]
  • 53. URI Prefetching
    ● Waiting for URI look-ups blocks query execution
    ● URI prefetching: when a URI is bound to a variable, initiate the look-up in the background
    [Figure: iterator chain I_(i-1) for tp_(i-1), I_i for tp_i, I_(i+1) for tp_(i+1), annotated "Initiate look-up", "Ensure look-up is finished", and "Initiate look-ups and wait"]
  • 54. URI Prefetching
    Results from I_(i-1): the same ?c / ?cStats table as on slide 38; one row is the current input solution μ_cur.
    I_i for tp_i:
      1. Substitute: tp_cur = μ_cur[ tp_i ]
      2. Ensure the look-up requirement for tp_cur
      3. Find matching triples match(tp_cur) in the queried data set
      4. Create a solution μ' for each t in match(tp_cur)
      5. Initiate a parallel look-up for each new URI in μ'
      6. Return each μ_cur ∪ μ' as a result
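One way to sketch prefetching in Python is with a thread pool from the standard library. This is an assumption-laden illustration, not the SWClLib implementation: the `Prefetcher` class, its method names, and the simulated `fetch` function are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


class Prefetcher:
    """Starts URI look-ups in the background and lets callers wait for them."""

    def __init__(self, fetch: Callable[[str], Iterable[Triple]],
                 max_workers: int = 4) -> None:
        self._fetch = fetch
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures: Dict[str, Future] = {}
        self.dataset: Set[Triple] = set()

    def prefetch(self, uri: str) -> None:
        """Step 5: initiate a look-up without waiting for its result."""
        if uri not in self._futures:
            self._futures[uri] = self._pool.submit(self._fetch, uri)

    def ensure(self, uri: str) -> None:
        """Step 2: block until the look-up for this URI is finished."""
        self.prefetch(uri)  # covers URIs that were never prefetched
        self.dataset.update(self._futures[uri].result())

    def shutdown(self) -> None:
        self._pool.shutdown()


fetched: List[str] = []  # records which URIs were actually looked up


def fetch(uri: str) -> List[Triple]:
    """Stand-in for an HTTP look-up; returns one made-up triple per URI."""
    fetched.append(uri)
    return [(uri, "http://example.org/stats", uri + "/statistics")]


p = Prefetcher(fetch)
p.prefetch("http://example.org/geo/IT")  # a predecessor binds the URI
p.ensure("http://example.org/geo/IT")    # a later iterator needs the data
p.ensure("http://example.org/geo/IT")    # repeated calls do not refetch
p.shutdown()
print(len(p.dataset), fetched)
```

If the look-up has already finished by the time `ensure` is called, the call returns immediately; otherwise it still has to wait.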
  • 55-56. URI Prefetching
    [Figure: iterator chain I_(i-1) for tp_(i-1), I_i for tp_i, I_(i+1) for tp_(i+1), annotated "Initiate look-up", "Ensure look-up is finished" (slide 56: "Wait until look-up is finished"), and "Initiate look-ups and wait"]
  • 57-58. URI Prefetching
    ● Even with URI prefetching, query execution may block
    ● Possible solutions:
      ● Program parallelism
      ● Asynchronous pipeline
    ● Drawback: requires a major rewrite of existing query engines
    [Figure: iterator chain I_(i-1) for tp_(i-1), I_i for tp_i, I_(i+1) for tp_(i+1), annotated "Wait until look-up is finished"]
  • 60. Postponing Iterator
    ● Enabled by an extension of the iterator paradigm:
      ● New function POSTPONE: takes the most recently provided result back
      ● Adjusted GETNEXT: returns either the next result or a formerly postponed result
    ● POSTPONE allows an iterator to temporarily reject the input solution μ_cur
  • 61. Postponing Iterator
    Results from I_(i-1): the same ?c / ?cStats table as on slide 38; one row is the current input solution μ_cur.
    I_i for tp_i:
      1. Substitute: tp_cur = μ_cur[ tp_i ]
      2. POSTPONE μ_cur if the look-up requirement doesn't hold for tp_cur
      3. Find matching triples match(tp_cur) in the queried data set
      4. Create a solution μ' for each t in match(tp_cur)
      5. Initiate a parallel look-up for each new URI in μ'
      6. Return each μ_cur ∪ μ' as a result
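A rough sketch of the POSTPONE extension follows. The slides do not say how GETNEXT chooses between a fresh and a postponed result, so the policy here (fresh results first, postponed ones afterwards) is an assumption, as are the names `PostponableSource`, `consume`, and `is_ready`.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Optional


class PostponableSource:
    """Predecessor iterator that can take its last result back."""

    def __init__(self, solutions: Iterable[str]) -> None:
        self._pending: Deque[str] = deque(solutions)
        self._postponed: Deque[str] = deque()
        self._last: Optional[str] = None

    def get_next(self) -> Optional[str]:
        """Adjusted GETNEXT: the next result, or a formerly postponed one."""
        if self._pending:
            self._last = self._pending.popleft()
        elif self._postponed:
            self._last = self._postponed.popleft()
        else:
            self._last = None
        return self._last

    def postpone(self) -> None:
        """POSTPONE: take the most recently provided result back."""
        if self._last is not None:
            self._postponed.append(self._last)
            self._last = None


def consume(source: PostponableSource,
            is_ready: Callable[[str], bool]) -> List[str]:
    """Process solutions, rejecting those whose look-ups are not finished.

    Note: this simple loop spins if a look-up never finishes; a real
    engine would need to wait once only postponed solutions remain.
    """
    processed = []
    mu_cur = source.get_next()
    while mu_cur is not None:
        if is_ready(mu_cur):
            processed.append(mu_cur)
        else:
            source.postpone()  # temporarily reject mu_cur
        mu_cur = source.get_next()
    return processed


# Simulation: the look-up for "slow" is finished only at the second check.
checks: Dict[str, int] = {"slow": 0}


def is_ready(mu: str) -> bool:
    if mu == "slow":
        checks["slow"] += 1
        return checks["slow"] > 1
    return True


order = consume(PostponableSource(["slow", "fast"]), is_ready)
print(order)
```

The solution whose look-up is still running is handed back instead of blocking the chain, so the iterator can work on other input solutions in the meantime.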
  • 63. Evaluation
    ● Implementation: Semantic Web Client Library (SWClLib) http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
    ● Berlin SPARQL Benchmark (BSBM)
      ● Simulates an e-commerce scenario
      ● Mix of 12 SPARQL queries
      ● Generates datasets of different sizes (scaling factor)
    ● Simulation of the Web of Linked Data
      ● Linked Data server publishes BSBM datasets
    ● Experiment
      ● Adjusted BSBM queries link to the simulation server
      ● Execute query mix with SWClLib
  • 64. Evaluation
    [Figure: avg. execution time per query mix in seconds (y-axis, 0 to 250) over the BSBM scaling factor (x-axis, 10 to 60), for four settings: w/o prefetching, w/ prefetching, non-blocking + prefetching, all data retrieved in advance]
    scal. factor   # of triples   # of entities
    10              4,971            613
    20              8,485            928
    30             11,999          1,245
    40             16,918          1,845
    50             22,616          2,599
    60             26,108          2,914
  • 65. Take-away Summary
    ● Novel query execution approach for the Web of Data:
      ● Utilizes the characteristics of the Web
      ● Traverses RDF links during query execution
      ● Discovery of new data sources
      ● No need to know all data sources in advance
    ● Implementation approach:
      ● Iterator based execution with URI Prefetching
      ● Extension of the iterator paradigm (POSTPONE)
    ● New research challenges:
      ● Improving result completeness
      ● Investigating suitable caching strategies
  • 66. Try it!
    ● SQUIN http://squin.org
      ● Provides SWClLib functionality as a Web service
      ● Accessible like a SPARQL endpoint
    ● Public SQUIN service at http://squin.informatik.hu-berlin.de/SQUIN/
  • 67. These slides have been created by Olaf Hartig http://olafhartig.de
    This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)