Query-Load aware partitioning of RDF data
Luis Galárraga
Saarbrücken, July 4th 2011

July 4th, 2011    Query load aware partitioning of RDF data   1/37
Outline

  ●   Motivation & background
  ●   Fragmentation in databases
  ●   Observations & goals
  ●   Proposed methodology
  ●   Preliminary results




Motivation
  ●   Increasing interest in semantic representations for knowledge.
            –    Increasing number of data providers (e.g. Linked Data initiative)
            –    Semantic Web: “Web of knowledge”
            –    Growing data sources (e.g. Wikipedia)
  ●   Need for efficient query processing
            –    Centralized solutions might become infeasible as data steadily grows.
            –    Taking advantage of parallelism can help improve performance.

Data keeps growing

  [Figure: DBpedia dataset size growth. Line chart of size in GB (0 to 40) against date (10/2006 to 04/2012).]

  ●   DBpedia 3.6 (http://dbpedia.org)
            –    3,500,000 resources
            –    0.5 billion facts
  ●   Semantic Web Challenge 2011 (http://challenge.semanticweb.org)
            –    2 billion triples
            –    20 GB dataset

RDF and triple stores
  ●   The Resource Description Framework (RDF) is a language to represent
      knowledge about resources (things).
            –    Resources are identified by URIs, e.g.
                 <http://www.mpii.de/yago/resource/John_Doe>
  ●   It uses statements, or triples:

        PREFIX yago: <http://www.mpii.de/yago/resource/>
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>

        yago:John_Doe    foaf:name     "John Doe"
        (Subject)        (Predicate)   (Object)

RDF and triple stores

  ●   Data in a triple store can be seen as a data graph or as a huge
      3-column relation.

      As a graph:

        yago:John_Doe        --foaf:name-->   "John Doe"
        yago:John_Doe        --foaf:knows-->  yago:Max_Mustermann
        yago:Max_Mustermann  --foaf:name-->   "Max Mustermann"

      As a relation:

        Subject               Predicate    Object
        yago:John_Doe         foaf:name    "John Doe"
        yago:John_Doe         foaf:knows   yago:Max_Mustermann
        yago:Max_Mustermann   foaf:name    "Max Mustermann"

  ●   Existing solutions like Jena or Sesame use some variation of the
      3-column relation.

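The two views of a triple store can be sketched in a few lines of Python. This is only an illustration of the abstraction, not how Jena or Sesame store data; the variable names `triples` and `graph` are made up for the example.

```python
from collections import defaultdict

# The three example triples, stored as one 3-column relation (a list of tuples).
triples = [
    ("yago:John_Doe", "foaf:name", '"John Doe"'),
    ("yago:John_Doe", "foaf:knows", "yago:Max_Mustermann"),
    ("yago:Max_Mustermann", "foaf:name", '"Max Mustermann"'),
]

# The same data as a graph: each subject maps to its outgoing (predicate, object) edges.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

print(graph["yago:John_Doe"])
```

Both structures hold exactly the same facts; the relation view is the one the fragmentation techniques on the following slides operate on.
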
How to query RDF?

  ●   Use of the data graph abstraction.
            –    SQL was designed for relational databases
  ●   SPARQL defines queries as subgraph patterns to be matched within the
      data graph.

        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        PREFIX yago: <http://www.mpii.de/yago/resource/>
        SELECT ?name
        WHERE {
          ?person a foaf:Person .
          ?person foaf:knows yago:Max_Mustermann .
          ?person foaf:name ?name .
        }

      Data graph:

        yago:John_Doe        --a-->           foaf:Person
        yago:John_Doe        --foaf:name-->   "John Doe"
        yago:John_Doe        --foaf:knows-->  yago:Max_Mustermann
        yago:Max_Mustermann  --a-->           foaf:Person
        yago:Max_Mustermann  --foaf:name-->   "Max Mustermann"

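To make "subgraph pattern matching" concrete, here is a minimal sketch of a naive matcher for a list of triple patterns. It is not how a SPARQL engine is implemented (no indexes, no join ordering); `match_pattern` and `evaluate` are hypothetical helper names, and terms starting with `?` are treated as variables.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on mismatch."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def evaluate(patterns, triples):
    """Join the patterns one by one with nested loops over the data."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match_pattern(pattern, t, b)) is not None]
    return bindings

data = [
    ("yago:John_Doe", "a", "foaf:Person"),
    ("yago:John_Doe", "foaf:name", '"John Doe"'),
    ("yago:John_Doe", "foaf:knows", "yago:Max_Mustermann"),
    ("yago:Max_Mustermann", "a", "foaf:Person"),
    ("yago:Max_Mustermann", "foaf:name", '"Max Mustermann"'),
]
query = [
    ("?person", "a", "foaf:Person"),
    ("?person", "foaf:knows", "yago:Max_Mustermann"),
    ("?person", "foaf:name", "?name"),
]
names = [b["?name"] for b in evaluate(query, data)]
print(names)
```

Every pattern after the first is joined with the earlier ones through the shared variable `?person`, which is why RDF query loads are join oriented.
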
Fragmentation in databases
  ●   Why? To exploit the processing power of multiple nodes by decomposing
      operations into parallel sub-operations.
  ●   In relational databases:
            –    Horizontal fragmentation [Dimovski, 2010]
            –    Vertical fragmentation [Hoffer, 1975]
            –    Workload-driven fragmentation [Curino, 2010]
  ●   It has to be combined with an allocation strategy (assignment of
      fragments to hosts).

Horizontal & vertical fragmentation

  Original relation:

        Subject               Predicate    Object
        yago:John_Doe         foaf:name    "John Doe"
        yago:Max_Mustermann   foaf:name    "Max Mustermann"
        yago:John_Doe         foaf:knows   yago:Max_Mustermann
        yago:John_Doe         foaf:mbox    "jdoe@wherever.com"
        yago:Juan_Perez       foaf:mbox    "jprz@wherever.com"

  Horizontal or tuple-based fragmentation (three fragments):

        Subject               Predicate    Object
        yago:John_Doe         foaf:name    "John Doe"
        yago:Max_Mustermann   foaf:name    "Max Mustermann"

        Subject               Predicate    Object
        yago:John_Doe         foaf:knows   yago:Max_Mustermann

        Subject               Predicate    Object
        yago:John_Doe         foaf:mbox    "jdoe@wherever.com"
        yago:Juan_Perez       foaf:mbox    "jprz@wherever.com"

  Vertical or column-based fragmentation (two fragments):

        Subject               Object
        yago:John_Doe         "John Doe"
        yago:Max_Mustermann   "Max Mustermann"
        yago:John_Doe         yago:Max_Mustermann
        yago:John_Doe         "jdoe@wherever.com"
        yago:Juan_Perez       "jprz@wherever.com"

        Subject               Predicate
        yago:John_Doe         foaf:name
        yago:Max_Mustermann   foaf:name
        yago:John_Doe         foaf:knows
        yago:John_Doe         foaf:mbox
        yago:Juan_Perez       foaf:mbox

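The two fragmentation styles from the slide can be sketched as follows. The function names are hypothetical, and splitting horizontally by predicate value is just the choice made in the slide's example, not the only possible one.

```python
from collections import defaultdict

rows = [
    ("yago:John_Doe", "foaf:name", '"John Doe"'),
    ("yago:Max_Mustermann", "foaf:name", '"Max Mustermann"'),
    ("yago:John_Doe", "foaf:knows", "yago:Max_Mustermann"),
    ("yago:John_Doe", "foaf:mbox", '"jdoe@wherever.com"'),
    ("yago:Juan_Perez", "foaf:mbox", '"jprz@wherever.com"'),
]

def horizontal_fragments(rows):
    """Split whole rows into fragments; here, one fragment per predicate value."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[1]].append(row)
    return dict(fragments)

def vertical_fragments(rows):
    """Split columns into two projections: (Subject, Object) and (Subject, Predicate)."""
    return [(s, o) for s, _, o in rows], [(s, p) for s, p, _ in rows]

by_predicate = horizontal_fragments(rows)
subject_object, subject_predicate = vertical_fragments(rows)
print(sorted(by_predicate), len(subject_object), len(subject_predicate))
```

Horizontal fragments keep complete triples, so each one can answer a triple pattern on its own; the vertical projections need a join to rebuild a triple, which is one reason the rest of the deck works with horizontal fragmentation.
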
Workload-driven fragmentation
  ●   Relationships between tuples are modeled as a graph.
            –    One node per tuple. Two nodes share an edge if they are
                 required by the same transaction.
  ●   Partition the graph.
  ●   Try to keep transactions as local as possible.

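The graph described on this slide could be built roughly as below. This sketch only constructs the weighted co-access graph; the partitioning step itself (typically a balanced min-cut heuristic) is left out. The tuple ids and the workload are invented for the example.

```python
from collections import Counter
from itertools import combinations

def co_access_graph(transactions):
    """Return edge weights: how many transactions touch both tuples of a pair."""
    edges = Counter()
    for tuple_ids in transactions:
        for a, b in combinations(sorted(set(tuple_ids)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical workload: each transaction lists the ids of the tuples it needs.
workload = [["t1", "t2"], ["t1", "t2", "t3"], ["t4"]]
edges = co_access_graph(workload)
print(dict(edges))
```

A partitioner would then try to cut as little edge weight as possible, so that heavily co-accessed tuples such as `t1` and `t2` stay in the same fragment.
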
Observations
   ●   RDF query load
             –   Updates and insertions are rare
             –   Join oriented
   ●   Data graph
             –   Subjects are more selective than objects, which are more
                 selective than predicates.
             –   Constants are unstable for fragmentation.
   ●   Distributed query processing
             –   Communication costs dominate distributed transactions

Goals
   ●   Fragment the RDF dataset based on a workload to guarantee:
             –   Small latency: limit communication costs by maximizing
                 local transactions while keeping parallelism
             –   High throughput
             –   Scalability
             –   Load balancing: allocate fragments such that hosts get
                 approximately the same load

Proposed methodology
  ●   Partitioning phase: determine a complete and non-redundant
      fragmentation of the triple store using a minimal set of predicates
      extracted from the query load.
  ●   Allocation phase: assign fragments to hosts to guarantee load
      balancing.

Normalizing the query load

  ●   Extract independent sub-queries.
            –    We still want independent sub-queries to run in parallel.
  ●   Normalize triple patterns:
            –    Turn infrequent URIs or literals into variables.
            –    Capture patterns of access.
            –    Not applicable to data types with a reduced value space
                 (e.g. xsd:boolean = {true, false})

Normalizing the query load

  Two queries that differ only in an infrequent literal:

        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?name
        WHERE {
          ?x foaf:name ?name .
          ?x foaf:mbox "alice@wherever.com"     # infrequent literal
        }

        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?name
        WHERE {
          ?x foaf:name ?name .
          ?x foaf:mbox "bob@wherever.com"       # infrequent literal
        }

  Both normalize to:

        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?name
        WHERE {
          ?x foaf:name ?name .
          ?x foaf:mbox ?mbox
        }

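A minimal sketch of the constant normalization step, under several assumptions that are not in the slides: queries are lists of (subject, predicate, object) patterns, "infrequent" means a constant occurs fewer than `min_freq` times in the whole load, and fresh variables are named `?c0`, `?c1`, and so on (assumed not to clash with existing variables). The exception for small value spaces such as xsd:boolean is not handled here.

```python
from collections import Counter

def normalize(queries, min_freq=2):
    """Replace constants that occur fewer than `min_freq` times by fresh variables."""
    counts = Counter(term for q in queries for pat in q for term in pat
                     if not term.startswith("?"))
    normalized = []
    for q in queries:
        fresh = {}  # constant -> fresh variable, per query
        new_q = []
        for pat in q:
            new_q.append(tuple(
                term if term.startswith("?") or counts[term] >= min_freq
                else fresh.setdefault(term, f"?c{len(fresh)}")
                for term in pat))
        normalized.append(new_q)
    return normalized

queries = [
    [("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", '"alice@wherever.com"')],
    [("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", '"bob@wherever.com"')],
]
normalized = normalize(queries)
print(normalized[0])
```

After normalization both queries are identical, so they count as two occurrences of the same access pattern instead of two unrelated queries.
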
Extracting predicates

        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        PREFIX yago: <http://www.mpii.de/yago/..>

        SELECT ?name
        WHERE {
          ?x foaf:name ?name .              # A
          ?x foaf:mbox ?mbox                # B
        }

        SELECT ?name
        WHERE {
          ?z foaf:name ?name .              # A
          ?z foaf:mbox ?mbox .              # B
          ?z foaf:knows yago:John_Doe .     # C
        }

  Extracted predicates:

        P1:   Predicate = foaf:name    (from A)
        P2:   Predicate = foaf:mbox    (from B)
        P3:   Predicate = foaf:knows   (from C)
        P4:   Object = yago:John_Doe   (from C)

  ●   Remember where the predicates come from.
  ●   Store join relationships between patterns among the queries: the
      Global Query Graph.

        Nodes:   A: ?x foaf:name ?name             Freq: 2
                 B: ?x foaf:mbox ?mbox             Freq: 2
                 C: ?x foaf:knows yago:John_Doe    Freq: 1
        Edges:   A -- B   weight 2
                 A -- C   weight 1
                 B -- C   weight 1

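The bookkeeping on this slide can be sketched as below. The assumptions are mine: two patterns are the same node when they agree after replacing every variable by `?`, and two patterns of a query are joined when they share a variable name. `canonical`, `simple_predicates` and `global_query_graph` are hypothetical names.

```python
from collections import Counter
from itertools import combinations

def canonical(pattern):
    """Identify a pattern by its constants only, so ?x and ?z patterns coincide."""
    return tuple("?" if t.startswith("?") else t for t in pattern)

def simple_predicates(pattern):
    """One 'column = constant' predicate per constant in the pattern."""
    columns = ("Subject", "Predicate", "Object")
    return [(c, t) for c, t in zip(columns, pattern) if not t.startswith("?")]

def global_query_graph(queries):
    """Return (pattern frequencies, join-edge weights) over the whole query load."""
    freq, joins = Counter(), Counter()
    for q in queries:
        for pat in q:
            freq[canonical(pat)] += 1
        for p1, p2 in combinations(q, 2):
            shared = {t for t in p1 if t.startswith("?")} & {t for t in p2 if t.startswith("?")}
            if shared:
                joins[frozenset((canonical(p1), canonical(p2)))] += 1
    return freq, joins

queries = [
    [("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")],
    [("?z", "foaf:name", "?name"), ("?z", "foaf:mbox", "?mbox"),
     ("?z", "foaf:knows", "yago:John_Doe")],
]
A = ("?", "foaf:name", "?")
B = ("?", "foaf:mbox", "?")
C = ("?", "foaf:knows", "yago:John_Doe")
freq, joins = global_query_graph(queries)
print(freq[A], freq[B], freq[C], joins[frozenset((A, B))])
```

On the two example queries this yields frequencies 2, 2, 1 for A, B, C and an A-B edge of weight 2, and `simple_predicates` applied to C returns the two predicates listed as P3 and P4.
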
Minterms & Fragments
   ●   Conjunctive expressions over a set of predicates, e.g.:

        P1: Predicate = foaf:name             Minterm 00 = ~P1 ^ ~P2
        P2: Predicate = foaf:mbox             Minterm 01 = P1 ^ ~P2
                                              Minterm 10 = ~P1 ^ P2
                                              Minterm 11 = P1 ^ P2

   ●   A minterm defines a fragment.
                 –   The set of triples satisfying the logical function
   ●   The set of all possible minterms determines a non-redundant and
       complete fragmentation.
                 –   But we want a minimal set of predicates.

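As a small illustration of why the minterms give a complete and non-redundant fragmentation, the sketch below assigns every triple to the one minterm it satisfies. The bit tuples follow the order in which the predicates are passed, which may differ from the labels on the slides; `fragment_by_minterms` is a hypothetical name.

```python
from itertools import product

COLUMN = {"Subject": 0, "Predicate": 1, "Object": 2}

def fragment_by_minterms(triples, predicates):
    """Map each minterm (one 0/1 bit per predicate, in the given order) to its triples."""
    fragments = {bits: [] for bits in product((0, 1), repeat=len(predicates))}
    for triple in triples:
        bits = tuple(int(triple[COLUMN[col]] == value) for col, value in predicates)
        fragments[bits].append(triple)
    return fragments

triples = [
    ("yago:John_Doe", "foaf:name", '"John Doe"'),
    ("yago:Max_Mustermann", "foaf:name", '"Max Mustermann"'),
    ("yago:John_Doe", "foaf:knows", "yago:Max_Mustermann"),
    ("yago:John_Doe", "foaf:mbox", '"jdoe@wherever.com"'),
    ("yago:Juan_Perez", "foaf:mbox", '"jprz@wherever.com"'),
]
predicates = [("Predicate", "foaf:name"), ("Predicate", "foaf:mbox")]
fragments = fragment_by_minterms(triples, predicates)
for bits, members in fragments.items():
    print(bits, len(members))
```

Each triple yields exactly one bit tuple, so the fragments cover the data without overlap. The fragment for (1, 1) stays empty because a triple has a single predicate value.
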
Optimal Horizontal Fragmentation
    ●   A predicate is redundant if the fragmentation is insensitive to its
        presence or absence.
    ●   Start with an empty set.
    ●   For every extracted predicate:
                 –   Add it to the set and fragment the database by
                     building the minterms.
                 –   If the predicate is redundant, ignore it.
                 –   If it is not redundant, check whether it made any
                     previously added predicate redundant.

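The loop above might look like the following sketch. It uses a simplified redundancy test that is my assumption, not necessarily the author's: a predicate is redundant when the set of non-empty fragments is the same with and without it. Predicates are (column index, constant) pairs and are assumed to be distinct.

```python
def fragmentation(triples, predicates):
    """Group triples by which predicates they satisfy; return the non-empty fragments."""
    groups = {}
    for t in triples:
        key = tuple(t[col] == value for col, value in predicates)
        groups.setdefault(key, set()).add(t)
    return {frozenset(g) for g in groups.values()}

def minimal_predicates(triples, candidates):
    chosen = []
    for p in candidates:
        if fragmentation(triples, chosen + [p]) == fragmentation(triples, chosen):
            continue  # p does not change the fragmentation: redundant, ignore it
        chosen.append(p)
        # p may have made an earlier predicate redundant; re-check each one.
        for q in chosen[:-1]:
            rest = [r for r in chosen if r != q]
            if fragmentation(triples, rest) == fragmentation(triples, chosen):
                chosen = rest
    return chosen

triples = [
    ("yago:John_Doe", "foaf:name", '"John Doe"'),
    ("yago:Max_Mustermann", "foaf:name", '"Max Mustermann"'),
    ("yago:John_Doe", "foaf:knows", "yago:Max_Mustermann"),
    ("yago:John_Doe", "foaf:mbox", '"jdoe@wherever.com"'),
    ("yago:Juan_Perez", "foaf:mbox", '"jprz@wherever.com"'),
]
candidates = [(1, "foaf:name"), (1, "foaf:mbox"), (1, "foaf:knows")]
selected = minimal_predicates(triples, candidates)
print(selected)
```

On this toy data the third candidate is dropped: once the name and mbox triples are separated, the knows triple is already alone in the remaining fragment. Each candidate triggers at most one re-check per previously chosen predicate, which is quadratic in the number of predicates.
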
Optimal Horizontal Partitioning
        P1: Predicate = foaf:name
        P2: Predicate = foaf:mbox

        Minterm 00:   Predicate != foaf:mbox AND Predicate != foaf:name
        Minterm 01:   Predicate != foaf:mbox AND Predicate = foaf:name
        Minterm 10:   Predicate = foaf:mbox AND Predicate != foaf:name
        Minterm 11:   Predicate = foaf:mbox AND Predicate = foaf:name

    ●   The algorithm is O(n^2) in the number of predicates.
    ●   Even though there is an exponential number of minterms, many will
        not be satisfiable.

Optimal Horizontal Partitioning
        P1: Predicate = foaf:name
        P2: Predicate = foaf:mbox

  After dropping the unsatisfiable minterm 11 and simplifying the rest:

        Minterm 00:   Predicate != foaf:mbox AND Predicate != foaf:name
        Minterm 01:   Predicate = foaf:name
        Minterm 10:   Predicate = foaf:mbox

Allocating the fragments
  ●   Fragments have access frequencies derived from their provenance and
      might join in the query load.

        Minterm 01: Predicate = foaf:name   (from A)
        Minterm 10: Predicate = foaf:mbox   (from B)

        Global query graph:
          A: ?x foaf:name ?name             Freq: 2
          B: ?x foaf:mbox ?mbox             Freq: 2
          C: ?x foaf:knows yago:John_Doe    Freq: 1
          Edges: A -- B (weight 2), A -- C (weight 1), B -- C (weight 1)

  ●   Allocate fragments to hosts so that:
      –   They are on the same host if they can join in the query load.
      –   Hosts receive approximately the same load.

Allocating the fragments
   ●    Sort fragments in descending order of load.
   ●    For every fragment, calculate the benefit of assigning it to every
        host:

        \[ T_L = \sum_{g} F_g \times S_g \qquad U_H = \frac{T_L}{n} \]

        \[ \mathrm{benefit}(f, H) = \frac{U_H}{U_H + CL_H} \times \sum_{g \in H} \left[ E_{f,g} + 1 \right] \]

        S_g = size of fragment g; F_g = frequency of access of fragment g
        n = number of hosts
        benefit(f, H) = benefit of assigning fragment f to host H
        CL_H = current load for host H (from fragments assigned so far)
        E_{f,g} = weight between fragments f and g in the global query graph

   ●    Assign it to the most beneficial host.

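A minimal sketch of the greedy loop, reading the benefit score as U_H / (U_H + CL_H) × Σ_{g∈H} (E_{f,g} + 1). Two things here are my assumptions and not from the slides: an empty host is given a join factor of 1 (the literal sum would be 0, so an empty host would never be chosen), and the fragment names, frequencies, sizes and edge weights are invented.

```python
def allocate(fragments, edges, n_hosts):
    """fragments: {name: (frequency, size)}; edges: {frozenset({f, g}): weight}."""
    load = {f: freq * size for f, (freq, size) in fragments.items()}
    uniform = sum(load.values()) / n_hosts       # U_H = T_L / n
    hosts = [[] for _ in range(n_hosts)]
    current = [0.0] * n_hosts                    # CL_H for every host

    def benefit(f, h):
        # Assumption: an empty host gets join factor 1 instead of an empty sum.
        if hosts[h]:
            joins = sum(edges.get(frozenset((f, g)), 0) + 1 for g in hosts[h])
        else:
            joins = 1
        return uniform / (uniform + current[h]) * joins

    # Heaviest fragments first; each goes to the host with the highest benefit.
    for f in sorted(load, key=load.get, reverse=True):
        best = max(range(n_hosts), key=lambda h: benefit(f, h))
        hosts[best].append(f)
        current[best] += load[f]
    return hosts

fragments = {"f1": (5, 2), "f2": (4, 2), "f3": (3, 2), "f4": (2, 2)}
edges = {frozenset(("f1", "f3")): 2, frozenset(("f2", "f4")): 1}
placement = allocate(fragments, edges, n_hosts=2)
print(placement)
```

In this toy run the joining pairs (f1, f3) and (f2, f4) end up together and the host loads are 16 and 12. The load factor pulls fragments toward lightly loaded hosts, while the join term pulls them toward hosts that already hold their join partners.
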
Evaluating query complexity

  ●   Local query graph

        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        PREFIX yago: <http://www.mpii.de/yago/..>
        SELECT ?name
        WHERE {
          ?z foaf:name ?name .
          ?z foaf:mbox ?mbox .
          ?z foaf:knows yago:John_Doe .
        }

        Nodes (one per triple pattern):
          ?z foaf:name ?name
          ?z foaf:mbox ?mbox
          ?z foaf:knows yago:John_Doe
        Edges connect patterns that join; here all three share ?z.

Evaluating the fragmentation
  ●   The distributed query graph for a query Q is obtained from the global
      query graph + the fragment definitions + the query graph of Q.

        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        PREFIX yago: <http://www.mpii.de/yago/..>
        SELECT ?name
        WHERE {
          ?z foaf:name ?name .              # A
          ?z foaf:mbox ?mbox .              # B
          ?z foaf:knows yago:John_Doe .     # C
        }

        Host 1:   Fragment 01   Predicate = foaf:name                               Relevant to A
                  Fragment 10   Predicate = foaf:mbox                               Relevant to B
        Host 2:   Fragment 00   Predicate != foaf:mbox AND Predicate != foaf:name   Relevant to C

        Local edge:    both fragments on the same host (A -- B)
        Remote edge:   fragments on different hosts (A -- C, B -- C)

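The counts used as metrics on the next slide can be derived mechanically from a query and a placement. The sketch below makes a simplifying assumption that is not in the slides: every triple pattern is relevant to exactly one fragment, held by exactly one host. `classify_edges` and `host_of` are hypothetical names.

```python
from itertools import combinations

def variables(pattern):
    return {t for t in pattern if t.startswith("?")}

def classify_edges(query, host_of):
    """query: {label: pattern}; host_of: {label: host holding the relevant fragment}.

    Returns (local edges, remote edges, number of hosts contacted).
    """
    local = remote = 0
    for a, b in combinations(query, 2):
        if variables(query[a]) & variables(query[b]):   # the two patterns join
            if host_of[a] == host_of[b]:
                local += 1
            else:
                remote += 1
    return local, remote, len({host_of[label] for label in query})

query = {
    "A": ("?z", "foaf:name", "?name"),
    "B": ("?z", "foaf:mbox", "?mbox"),
    "C": ("?z", "foaf:knows", "yago:John_Doe"),
}
host_of = {"A": 1, "B": 1, "C": 2}
result = classify_edges(query, host_of)
print(result)
```

For the placement shown on the slide this gives one local edge (A with B), two remote edges (A with C, B with C) and two contacted hosts. The local query graph has three edges, the sum of local and remote.
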
Preliminary results

  ●   Metrics to evaluate query complexity:
            –    Number of edges in local query graph
            –    Number of remote edges in the distributed
                  query graph
  ●   Metrics to evaluate fragmentation quality for a
      query
            –    Number of local edges in the distributed query
                  graph
            –    Number of hosts required to answer the query

Preliminary results
   Number of hosts: 5

   Run   Dataset          Dataset description                          File size (MB)   # triples   # sub-queries   # predicates
   1     Subset DBpedia   DBpedia foaf information (names and dates)   136              1745624     10              9
   2     Subset DBpedia   DBpedia foaf information (names and dates)   136              1745624     10              10
   3     YAGO Core        YAGO Core dataset                            2662.4           26227687    9               21
   4     YAGO sample      RDF-3x YAGO dump sample                      3276.8           35238246    19              35

   [Figure: Edge count in local query graphs vs. number of contacted hosts, per run of the algorithm. Y axis: average; X axis: runs 1 to 4. Series: hosts contacted, edges in local query graph.]

   [Figure: Local edges vs. remote edges in the distributed query graph, per run of the algorithm. X axis: runs 1 to 4. Series: average local edges, average remote edges.]

Conclusions

  ●   Use of standard techniques from relational databases.
  ●   Method independent of the actual storage implementation.
            –    Huge 3-column table abstraction
  ●   It can be easily extended to support redundancy.
  ●   Applicable to evolving query loads
            –    By changing the level of constant normalization

Future work
  ●   Evaluate the quality of the partitioning
            –    Using real execution costs: this needs a distributed index +
                 query planner + distributed cost model
            –    Against other approaches (e.g. fragmentation by predicate)
  ●   Evaluate the greedy allocation algorithm
            –    Against the optimal solution, round robin, etc.
  ●   Use of estimates for fragment sizes
            –    So far extracted via queries.

Thanks for your attention





More Related Content

What's hot

Connections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystifiedConnections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystifiedJakob .
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data TutorialSören Auer
 
GDG Meets U event - Big data & Wikidata - no lies codelab
GDG Meets U event - Big data & Wikidata -  no lies codelabGDG Meets U event - Big data & Wikidata -  no lies codelab
GDG Meets U event - Big data & Wikidata - no lies codelabCAMELIA BOBAN
 
Jgd User Group Demo
Jgd User Group DemoJgd User Group Demo
Jgd User Group Demobarakmich
 
Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02eswcsummerschool
 
when the link makes sense
when the link makes sensewhen the link makes sense
when the link makes senseFabien Gandon
 
Hack U Barcelona 2011
Hack U Barcelona 2011Hack U Barcelona 2011
Hack U Barcelona 2011Peter Mika
 
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologiesProf. Wim Van Criekinge
 
Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Prof. Wim Van Criekinge
 
2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_upload2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_uploadProf. Wim Van Criekinge
 
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)Ig Bittencourt
 
Linking the Open Data? by Petko Valtchev
Linking the Open Data? by Petko ValtchevLinking the Open Data? by Petko Valtchev
Linking the Open Data? by Petko ValtchevTrudat
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupalemmanuel_jamin
 
Piloting Linked Data to Connect Library and Archive Resources to the New Worl...
Piloting Linked Data to Connect Library and Archive Resources to the New Worl...Piloting Linked Data to Connect Library and Archive Resources to the New Worl...
Piloting Linked Data to Connect Library and Archive Resources to the New Worl...Laura Akerman
 
An introduction to Semantic Web and Linked Data
An introduction to Semantic Web and Linked DataAn introduction to Semantic Web and Linked Data
An introduction to Semantic Web and Linked DataFabien Gandon
 
History and Background of the USEWOD Data Challenge
History and Background of the  USEWOD Data ChallengeHistory and Background of the  USEWOD Data Challenge
History and Background of the USEWOD Data ChallengeKnud Möller
 
How to Build Linked Data Sites with Drupal 7 and RDFa
How to Build Linked Data Sites with Drupal 7 and RDFaHow to Build Linked Data Sites with Drupal 7 and RDFa
How to Build Linked Data Sites with Drupal 7 and RDFascorlosquet
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic WebRoberto García
 

What's hot (20)

Connections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystifiedConnections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystified
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data Tutorial
 
GDG Meets U event - Big data & Wikidata - no lies codelab
GDG Meets U event - Big data & Wikidata -  no lies codelabGDG Meets U event - Big data & Wikidata -  no lies codelab
GDG Meets U event - Big data & Wikidata - no lies codelab
 
Jgd User Group Demo
Jgd User Group DemoJgd User Group Demo
Jgd User Group Demo
 
Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02
 
when the link makes sense
when the link makes sensewhen the link makes sense
when the link makes sense
 
Hack U Barcelona 2011
Hack U Barcelona 2011Hack U Barcelona 2011
Hack U Barcelona 2011
 
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologies
 
Bio ontologies and semantic technologies[2]
Query-Load aware partitioning of RDF data

  • 1. Query-Load aware partitioning of RDF data Luis Galárraga Saarbrücken, July 4th 2011 July 4th, 2011 Query load aware partitioning of RDF data 1/37
  • 2. Outline
      ● Motivation & background
      ● Fragmentation in databases
      ● Observations & goals
      ● Proposed methodology
      ● Preliminary results
  • 5. Motivation
      ● Increasing interest in semantic representations of knowledge.
          – Increasing number of data providers (e.g. the Linked Data initiative)
          – Semantic Web: “Web of knowledge”
          – Growing data sources (e.g. Wikipedia)
      ● Need for efficient query processing
          – Centralized solutions might become infeasible as data steadily grows.
          – Taking advantage of parallelism can help improve performance.
  • 6. Data keeps growing
      [Figure: DBpedia dataset size growth; x-axis: date (10/10/06 to 04/01/12), y-axis: size in GB (0 to 40)]
      ● DBpedia 3.6: 3,500,000 resources, 0.5 billion facts (http://dbpedia.org)
      ● Semantic Web Challenge 2011: 2 billion triples, 20 GB dataset (http://challenge.semanticweb.org)
  • 7. Data keeps growing
  • 8. RDF and triple stores
      ● The Resource Description Framework (RDF) is a language for representing knowledge about resources (things).
          – Resources are identified by URIs, e.g. <http://www.mpii.de/yago/resource/John_Doe>
      ● It uses statements, or triples, of the form subject, predicate, object:
          PREFIX yago: <http://www.mpii.de/yago/resource/>
          PREFIX foaf: <http://xmlns.com/foaf/0.1/>
          yago:John_Doe   foaf:name     "John Doe"
          (subject)       (predicate)   (object)
  • 9. RDF and triple stores
      ● Data in a triple store can be seen as a data graph or as one huge three-column relation.
          Subject               Predicate    Object
          yago:John_Doe         foaf:name    "John Doe"
          yago:John_Doe         foaf:knows   yago:Max_Mustermann
          yago:Max_Mustermann   foaf:name    "Max Mustermann"
      ● Existing solutions such as Jena or Sesame use some variation of the three-column relation.
  • 10. How to query RDF?
      ● Use of the data graph abstraction.
          – SQL was designed for relational databases
      ● SPARQL defines queries as subgraph patterns to be matched within the data graph.
          PREFIX foaf: <http://xmlns.com/foaf/0.1/>
          PREFIX yago: <http://www.mpii.de/yago/resource/>
          SELECT ?name
          WHERE {
            ?person a foaf:Person .
            ?person foaf:knows yago:Max_Mustermann .
            ?person foaf:name ?name .
          }
      [Figure: data graph in which yago:John_Doe is a foaf:Person, has foaf:name "John Doe" and foaf:knows yago:Max_Mustermann, whose foaf:name is "Max Mustermann"]
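The subgraph-matching idea can be sketched in a few lines of Python. This is only an illustration, not how Jena, Sesame or any SPARQL engine is implemented: the in-memory list `TRIPLES`, the helper names and the nested-loop join are all assumptions made for the example, and `rdf:type` stands in for the SPARQL keyword `a`.

```python
# Hypothetical in-memory triple store: a list of (subject, predicate, object) tuples.
TRIPLES = [
    ("yago:John_Doe", "rdf:type", "foaf:Person"),
    ("yago:John_Doe", "foaf:name", "John Doe"),
    ("yago:John_Doe", "foaf:knows", "yago:Max_Mustermann"),
    ("yago:Max_Mustermann", "foaf:name", "Max Mustermann"),
]


def is_var(term):
    """Pattern terms starting with '?' are variables."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    new_binding = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check consistency with an earlier binding.
            if new_binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new_binding


def evaluate(patterns, triples):
    """Evaluate a conjunction of triple patterns with naive nested-loop joins."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


QUERY = [
    ("?person", "rdf:type", "foaf:Person"),
    ("?person", "foaf:knows", "yago:Max_Mustermann"),
    ("?person", "foaf:name", "?name"),
]
names = [b["?name"] for b in evaluate(QUERY, TRIPLES)]
print(names)
```

Each triple pattern acts as a join with the bindings produced so far, which is why RDF query loads are described later in the deck as join oriented.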
  • 12. Fragmentation in databases
      ● Why? To exploit the processing power of multiple nodes by decomposing operations into parallel sub-operations.
      ● In relational databases:
          – Horizontal fragmentation [Dimovski, 2010]
          – Vertical fragmentation [Hoffer, 1975]
          – Workload-driven fragmentation [Curino, 2010]
      ● Fragmentation has to be combined with an allocation strategy (the assignment of fragments to hosts).
  • 13. Horizontal & vertical fragmentation
      Original relation:
          Subject               Predicate    Object
          yago:John_Doe         foaf:name    "John Doe"
          yago:Max_Mustermann   foaf:name    "Max Mustermann"
          yago:John_Doe         foaf:knows   yago:Max_Mustermann
          yago:John_Doe         foaf:mbox    "jdoe@wherever.com"
          yago:Juan_Perez       foaf:mbox    "jprz@wherever.com"
      ● Horizontal or tuple-based fragmentation (three fragments):
          Subject               Predicate    Object
          yago:John_Doe         foaf:name    "John Doe"
          yago:Max_Mustermann   foaf:name    "Max Mustermann"

          Subject               Predicate    Object
          yago:John_Doe         foaf:knows   yago:Max_Mustermann

          Subject               Predicate    Object
          yago:John_Doe         foaf:mbox    "jdoe@wherever.com"
          yago:Juan_Perez       foaf:mbox    "jprz@wherever.com"
      ● Vertical or column-based fragmentation (two fragments):
          Subject               Object
          yago:John_Doe         "John Doe"
          yago:Max_Mustermann   "Max Mustermann"
          yago:John_Doe         yago:Max_Mustermann
          yago:John_Doe         "jdoe@wherever.com"
          yago:Juan_Perez       "jprz@wherever.com"

          Subject               Predicate
          yago:John_Doe         foaf:name
          yago:Max_Mustermann   foaf:name
          yago:John_Doe         foaf:knows
          yago:John_Doe         foaf:mbox
          yago:Juan_Perez       foaf:mbox
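As a rough illustration of the two schemes on this slide, the sketch below fragments the same five triples in Python. The function names and the choice to group rows by predicate are assumptions for the example; the slide only shows the resulting fragments.

```python
from collections import defaultdict

TRIPLES = [
    ("yago:John_Doe", "foaf:name", "John Doe"),
    ("yago:Max_Mustermann", "foaf:name", "Max Mustermann"),
    ("yago:John_Doe", "foaf:knows", "yago:Max_Mustermann"),
    ("yago:John_Doe", "foaf:mbox", "jdoe@wherever.com"),
    ("yago:Juan_Perez", "foaf:mbox", "jprz@wherever.com"),
]


def horizontal_fragments(triples):
    """Tuple-based fragmentation: whole rows are grouped, here by predicate."""
    fragments = defaultdict(list)
    for s, p, o in triples:
        fragments[p].append((s, p, o))
    return dict(fragments)


def vertical_fragments(triples):
    """Column-based fragmentation: each fragment keeps the subject plus one other column."""
    return {
        "subject_object": [(s, o) for s, _, o in triples],
        "subject_predicate": [(s, p) for s, p, _ in triples],
    }


horizontal = horizontal_fragments(TRIPLES)
vertical = vertical_fragments(TRIPLES)
for predicate, rows in horizontal.items():
    print(predicate, len(rows))
```

Horizontal fragments keep complete triples, so a triple pattern with a constant predicate can be answered from a single fragment.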
  • 14. Workload-driven fragmentation
      ● Relationships between tuples are modelled as a graph.
          – One node per tuple; two nodes share an edge if the same transaction requires both tuples.
      ● Partition the graph.
      ● Try to keep transactions as local as possible.
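A minimal sketch of the co-access graph described above, assuming transactions are given as lists of tuple identifiers. The actual partitioning step (typically delegated to a graph partitioner) is not shown; `cut_weight` only scores a given assignment, and all names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_access_graph(transactions):
    """Edge weight = number of transactions that touch both tuples."""
    edges = Counter()
    for tuple_ids in transactions:
        for a, b in combinations(sorted(set(tuple_ids)), 2):
            edges[(a, b)] += 1
    return edges


def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints live in different partitions."""
    return sum(w for (a, b), w in edges.items() if assignment[a] != assignment[b])


# Hypothetical workload: each transaction lists the tuple ids it needs.
TRANSACTIONS = [[1, 2], [1, 2, 3], [4, 5], [3, 4]]
edges = co_access_graph(TRANSACTIONS)

good = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}  # keeps co-accessed tuples together
bad = {1: 0, 2: 1, 3: 0, 4: 1, 5: 0}   # splits them across partitions
print(cut_weight(edges, good), cut_weight(edges, bad))
```

A lower cut weight means fewer transactions span partitions, which is the sense in which transactions stay local.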
  • 16. Observations
      ● RDF query load
          – Updates and insertions are rare
          – Join oriented
      ● Data graph
          – Subjects are more selective than objects, which are more selective than predicates.
          – Constants are unstable for fragmentation.
      ● Distributed query processing
          – Communication costs dominate distributed transactions
  • 17. Goals
      Fragment the RDF dataset based on a workload to guarantee:
      ● Low latency
          – Limit communication costs by maximizing local transactions while keeping parallelism
      ● High throughput
      ● Scalability
      ● Load balancing
          – Allocate fragments such that hosts get approximately the same load.
  • 19. Proposed methodology
      ● Partitioning phase: determine a complete and non-redundant fragmentation of the triple store using a minimal set of predicates extracted from the query load.
      ● Allocation phase: assign fragments to hosts to guarantee load balancing.
  • 20. Normalizing the query load
      ● Extract independent sub-queries.
          – We still want independent sub-queries to run in parallel.
      ● Normalize triple patterns:
          – Turn infrequent URIs or literals into variables.
          – Capture patterns of access.
          – Not applicable to data types with a small value space (e.g. xsd:boolean = {true, false})
  • 21. Normalizing the query load
      Two queries that differ only in an infrequent literal:
          PREFIX foaf: <http://xmlns.com/foaf/0.1/>
          SELECT ?name
          WHERE {
            ?x foaf:name ?name .
            ?x foaf:mbox "alice@wherever.com"    (infrequent literal)
          }

          PREFIX foaf: <http://xmlns.com/foaf/0.1/>
          SELECT ?name
          WHERE {
            ?x foaf:name ?name .
            ?x foaf:mbox "bob@wherever.com"      (infrequent literal)
          }
      Both are normalized to:
          PREFIX foaf: <http://xmlns.com/foaf/0.1/>
          SELECT ?name
          WHERE {
            ?x foaf:name ?name .
            ?x foaf:mbox ?mbox
          }
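The normalization step can be sketched as follows, assuming queries are already parsed into lists of triple patterns. The threshold `min_count`, the fresh-variable naming (`?c0`, `?c1`, ...) and the function name are assumptions for illustration; the slides do not say how "infrequent" is decided, and the exception for small value spaces such as xsd:boolean is not handled here.

```python
from collections import Counter


def normalize(queries, min_count=2):
    """Replace constants seen fewer than `min_count` times in the load by variables."""
    counts = Counter(
        term
        for query in queries
        for pattern in query
        for term in pattern
        if not term.startswith("?")
    )
    normalized = []
    for query in queries:
        fresh = {}  # one fresh variable per replaced constant, scoped to the query
        new_query = []
        for pattern in query:
            new_query.append(tuple(
                term if term.startswith("?") or counts[term] >= min_count
                else fresh.setdefault(term, f"?c{len(fresh)}")
                for term in pattern
            ))
        normalized.append(new_query)
    return normalized


QUERIES = [
    [("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", '"alice@wherever.com"')],
    [("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", '"bob@wherever.com"')],
]
normalized = normalize(QUERIES)
print(normalized[0])
```

After normalization both queries share the same access pattern, so their frequencies can be added up.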
  • 22. Extracting predicates
      Query load (triple patterns labelled A, B, C):
          PREFIX foaf: <http://xmlns.com/foaf/0.1/>
          PREFIX yago: <http://www.mpii.de/yago/..>
          SELECT ?name
          WHERE {
            A: ?x foaf:name ?name .
            B: ?x foaf:mbox ?mbox
          }

          SELECT ?name
          WHERE {
            A: ?z foaf:name ?name .
            B: ?z foaf:mbox ?mbox .
            C: ?z foaf:knows yago:John_Doe .
          }
      Extracted predicates:
          P1: Predicate = foaf:name      from A
          P2: Predicate = foaf:mbox      from B
          P3: Predicate = foaf:knows     from C
          P4: Object = yago:John_Doe     from C
      ● Remember where the predicates come from.
      ● Store the join relationships between patterns across the queries in a global query graph:
          Nodes:  A: ?x foaf:name ?name             Freq: 2
                  B: ?x foaf:mbox ?mbox             Freq: 2
                  C: ?x foaf:knows yago:John_Doe    Freq: 1
          Edges:  A–B weight 2, A–C weight 1, B–C weight 1
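One possible way to derive the predicates and the global query graph from the two example queries is sketched below. The representation is an assumption: variables are collapsed to "?" so that the same pattern from different queries maps to one node, and two patterns are joined when they share a variable within a query.

```python
from collections import Counter
from itertools import combinations

POSITIONS = ("Subject", "Predicate", "Object")


def canonical(pattern):
    """Collapse variable names so equal patterns from different queries coincide."""
    return tuple("?" if t.startswith("?") else t for t in pattern)


def variables(pattern):
    return {t for t in pattern if t.startswith("?")}


def analyse(queries):
    freq, joins, predicates = Counter(), Counter(), {}
    for query in queries:
        for pattern in query:
            node = canonical(pattern)
            freq[node] += 1
            # Every constant yields a simple predicate; remember its provenance.
            for position, term in zip(POSITIONS, node):
                if term != "?":
                    predicates.setdefault((position, term), set()).add(node)
        # Two patterns join if they share a variable in the same query.
        for a, b in combinations(query, 2):
            if variables(a) & variables(b):
                joins[frozenset((canonical(a), canonical(b)))] += 1
    return freq, joins, predicates


QUERIES = [
    [("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")],
    [("?z", "foaf:name", "?name"), ("?z", "foaf:mbox", "?mbox"),
     ("?z", "foaf:knows", "yago:John_Doe")],
]
freq, joins, predicates = analyse(QUERIES)
for predicate, sources in sorted(predicates.items()):
    print(predicate, sorted(sources))
```

On this input the node frequencies and edge weights agree with the global query graph on the slide.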
  • 23. Minterms & Fragments
      ● Minterms are conjunctive expressions over a set of predicates, e.g. for
          P1: Predicate = foaf:name
          P2: Predicate = foaf:mbox
        the minterms are
          Minterm 00 = ~P1 ^ ~P2
          Minterm 01 = ~P1 ^ P2
          Minterm 10 = P1 ^ ~P2
          Minterm 11 = P1 ^ P2
      ● A minterm defines a fragment.
          – The set of triples satisfying the logical function
      ● The set of all possible minterms determines a non-redundant and complete fragmentation.
          – But we want a minimal set of predicates.
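A small sketch of minterm fragmentation, under the assumption that each simple predicate is a Python callable over a triple. Rather than enumerating all 2^n minterms, it computes for every triple the tuple of truth values (P1, P2, ...), which directly identifies the one minterm the triple satisfies; unsatisfiable or empty minterms never appear.

```python
TRIPLES = [
    ("yago:John_Doe", "foaf:name", "John Doe"),
    ("yago:Max_Mustermann", "foaf:name", "Max Mustermann"),
    ("yago:John_Doe", "foaf:knows", "yago:Max_Mustermann"),
    ("yago:John_Doe", "foaf:mbox", "jdoe@wherever.com"),
    ("yago:Juan_Perez", "foaf:mbox", "jprz@wherever.com"),
]

# Simple predicates P1 and P2 from the slide, written as callables (an assumption).
PREDICATES = [
    lambda t: t[1] == "foaf:name",   # P1
    lambda t: t[1] == "foaf:mbox",   # P2
]


def minterm_fragments(triples, predicates):
    """Map each satisfied minterm, keyed by its (P1, P2, ...) truth values, to its triples."""
    fragments = {}
    for triple in triples:
        signature = tuple(int(p(triple)) for p in predicates)
        fragments.setdefault(signature, []).append(triple)
    return fragments


fragments = minterm_fragments(TRIPLES, PREDICATES)
for signature, rows in sorted(fragments.items()):
    print(signature, len(rows))
```

Because every triple has exactly one signature, the fragments are complete and non-overlapping by construction.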
  • 24. Optimal Horizontal Fragmentation
      ● A predicate is redundant if the fragmentation is insensitive to its presence or absence.
      ● Start with an empty set.
      ● For every extracted predicate:
          – Add it to the set and fragment the database by building the minterms.
          – If the predicate is redundant, ignore it.
          – If it is not redundant, check whether it made any previously added predicate redundant.
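The loop above might be sketched as follows. This is one reading of the slide, not the author's implementation: "redundant" is interpreted here as "adding or removing the predicate leaves the set of fragments unchanged", and the data, predicate names and helper functions are hypothetical.

```python
TRIPLES = [
    ("yago:John_Doe", "foaf:name", "John Doe"),
    ("yago:Max_Mustermann", "foaf:name", "Max Mustermann"),
    ("yago:Max_Mustermann", "foaf:knows", "yago:John_Doe"),
    ("yago:John_Doe", "foaf:mbox", "jdoe@wherever.com"),
    ("yago:Juan_Perez", "foaf:mbox", "jprz@wherever.com"),
]

CANDIDATES = [
    ("P1", lambda t: t[1] == "foaf:name"),
    ("P2", lambda t: t[1] == "foaf:mbox"),
    ("P3", lambda t: t[1] == "foaf:knows"),
    ("P4", lambda t: t[2] == "yago:John_Doe"),
]


def partition(triples, named_predicates):
    """Fragmentation induced by the minterms of the given predicates."""
    groups = {}
    for t in triples:
        signature = tuple(p(t) for _, p in named_predicates)
        groups.setdefault(signature, []).append(t)
    return {frozenset(g) for g in groups.values()}


def minimal_predicates(triples, candidates):
    selected = []
    for candidate in candidates:
        trial = selected + [candidate]
        # Redundant: the fragmentation does not change when the predicate is added.
        if partition(triples, trial) == partition(triples, selected):
            continue
        selected = trial
        # Check whether the new predicate made an earlier one redundant.
        for old in selected[:-1]:
            without = [c for c in selected if c is not old]
            if partition(triples, without) == partition(triples, selected):
                selected = without
    return [name for name, _ in selected]


chosen = minimal_predicates(TRIPLES, CANDIDATES)
print(chosen)
```

On this toy data P3 and P4 are dropped because P1 and P2 already separate the foaf:knows triple from the rest.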
  • 25. Optimal Horizontal Partitioning
          P1: Predicate = foaf:name
          P2: Predicate = foaf:mbox
          Minterm 00: Predicate != foaf:mbox AND Predicate != foaf:name
          Minterm 01: Predicate != foaf:mbox AND Predicate = foaf:name
          Minterm 10: Predicate = foaf:mbox AND Predicate != foaf:name
          Minterm 11: Predicate = foaf:mbox AND Predicate = foaf:name
      ● The algorithm is O(n²) in the number of predicates.
      ● Even though the number of minterms is exponential, many of them are unsatisfiable.
  • 27. Optimal Horizontal Partitioning
          P1: Predicate = foaf:name
          P2: Predicate = foaf:mbox
          Minterm 00: Predicate != foaf:mbox AND Predicate != foaf:name
          Minterm 01: Predicate = foaf:name
          Minterm 10: Predicate = foaf:mbox
      ● The algorithm is O(n²) in the number of predicates.
      ● Even though the number of minterms is exponential, many of them are unsatisfiable.
  • 28. Allocating the fragments
      ● Fragments have access frequencies derived from their provenance and might join in the query load.
          Minterm 01: Predicate = foaf:name    from A
          Minterm 10: Predicate = foaf:mbox    from B
        Global query graph:
          Nodes:  A: ?x foaf:name ?name             Freq: 2
                  B: ?x foaf:mbox ?mbox             Freq: 2
                  C: ?x foaf:knows yago:John_Doe    Freq: 1
          Edges:  A–B weight 2, A–C weight 1, B–C weight 1
      ● Allocate fragments to hosts so that:
          – They are on the same host if they can join in the query load.
          – Hosts receive approximately the same load.
  • 29. Allocating the fragments
      ● Sort the fragments in descending order of load.
      ● For every fragment, calculate the benefit of assigning it to every host:
          $T_L = \sum_{g} F_g \times S_g$
          $U_H = \frac{T_L}{n}$
          $\mathrm{benefit}(f, H) = \frac{U_H}{U_H + CL_H} \times \sum_{g \in H} \left[ E_{f,g} + 1 \right]$
        where
          $S_g$ = size of fragment $g$; $F_g$ = frequency of access of fragment $g$
          $n$ = number of hosts
          $\mathrm{benefit}(f, H)$ = benefit of assigning fragment $f$ to host $H$
          $CL_H$ = current load of host $H$ (from the fragments assigned so far)
          $E_{f,g}$ = weight between fragments $f$ and $g$ in the global query load graph
      ● Assign the fragment to the most beneficial host.
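The benefit computation can be sketched numerically. Note the assumption: the operators in the slide's formula did not survive extraction, so the form benefit(f, H) = U_H / (U_H + CL_H) × Σ_{g∈H} [E_{f,g} + 1] used below is a reconstruction, and the fragment names, loads and edge weights are made up for the example.

```python
# Hypothetical fragments: name -> (access frequency F_g, size S_g).
FRAGMENTS = {"name": (2, 2), "mbox": (2, 2), "rest": (1, 1)}

# Hypothetical join weights E_{f,g} between fragments in the global query graph.
EDGE_WEIGHT = {
    frozenset(("name", "mbox")): 2,
    frozenset(("mbox", "rest")): 1,
    frozenset(("name", "rest")): 1,
}


def total_load(fragments):
    """T_L: sum over fragments of frequency times size."""
    return sum(freq * size for freq, size in fragments.values())


def benefit(fragment, host_fragments, current_load, uniform_load, edge_weight):
    """Assumed form: U_H / (U_H + CL_H) * sum over g in H of (E_{f,g} + 1)."""
    affinity = sum(
        edge_weight.get(frozenset((fragment, g)), 0) + 1 for g in host_fragments
    )
    return uniform_load / (uniform_load + current_load) * affinity


n_hosts = 3
uniform = total_load(FRAGMENTS) / n_hosts  # U_H

# State after "name" went to host 1 and "rest" to host 2 (current loads 4 and 1).
b_host1 = benefit("mbox", ["name"], 4, uniform, EDGE_WEIGHT)
b_host2 = benefit("mbox", ["rest"], 1, uniform, EDGE_WEIGHT)
print(round(b_host1, 3), round(b_host2, 3))
```

The first factor shrinks as a host fills up, while the sum rewards hosts that already hold fragments joining with f, so the score trades load balance against locality.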
  • 30. Outline
      ● Motivation & background
      ● Fragmentation in databases
      ● Observations & goals
      ● Proposed methodology
      ● Preliminary results
  • 31. Evaluating query complexity
      ● Local query graph
          PREFIX foaf: <http://xmlns.com/foaf/0.1/>
          PREFIX yago: <http://www.mpii.de/yago/..>
          SELECT ?name WHERE {
              ?z foaf:name ?name .
              ?z foaf:mbox ?mbox .
              ?z foaf:knows yago:John_Doe .
          }
      [Figure: local query graph with nodes "?z foaf:name ?name", "?z foaf:mbox ?mbox" and "?z foaf:knows f:John_Doe"]
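A local query graph for this query can be built with a few lines of Python. The sketch assumes that the nodes are the triple patterns and that an edge connects two patterns sharing at least one variable; the labels A, B, C and the helper names are hypothetical.

```python
from itertools import combinations


def variables(pattern):
    """Return the SPARQL variables (terms starting with '?') in a pattern."""
    return {term for term in pattern if term.startswith("?")}


def local_query_graph(patterns):
    """Connect every pair of triple patterns that share a variable.

    patterns: dict label -> (subject, predicate, object)
    Returns a set of frozenset({label_a, label_b}) edges.
    """
    edges = set()
    for (a, pa), (b, pb) in combinations(patterns.items(), 2):
        if variables(pa) & variables(pb):
            edges.add(frozenset((a, b)))
    return edges


patterns = {
    "A": ("?z", "foaf:name", "?name"),
    "B": ("?z", "foaf:mbox", "?mbox"),
    "C": ("?z", "foaf:knows", "yago:John_Doe"),
}
graph = local_query_graph(patterns)
print(len(graph))  # edge count, one of the complexity metrics
```

All three patterns join on `?z`, so under this assumption the graph is a triangle.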
  • 32. Evaluating the fragmentation
      ● Distributed query graph for a query Q, obtained from the global query graph + the fragment definitions + the query graph of Q
          PREFIX foaf: <http://xmlns.com/foaf/0.1/>
          PREFIX yago: <http://www.mpii.de/yago/..>
          SELECT ?name WHERE {
              ?z foaf:name ?name .             # A
              ?z foaf:mbox ?mbox .             # B
              ?z foaf:knows yago:John_Doe .    # C
          }
      [Figure: distributed query graph; legend: local edge, remote edge]
          Host 1: Fragment 10 (Predicate = foaf:mbox, relevant to B); Fragment 01 (Predicate = foaf:name, relevant to A)
          Host 2: Fragment 00 (Predicate != foaf:mbox AND Predicate != foaf:name, relevant to C)
  • 33. Preliminary results
      ● Metrics to evaluate query complexity:
          – Number of edges in the local query graph
          – Number of remote edges in the distributed query graph
      ● Metrics to evaluate fragmentation quality for a query:
          – Number of local edges in the distributed query graph
          – Number of hosts required to answer the query
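These counts are simple to compute once each triple pattern is mapped to a host. The sketch below assumes that every pattern is relevant to exactly one fragment and that an edge is local when both of its patterns land on the same host; the function name, the dictionary layout, and the choice of edges in the example are illustrative assumptions.

```python
def fragmentation_metrics(edges, pattern_fragment, fragment_host):
    """Count local and remote edges and the hosts a query touches.

    edges:            iterable of (pattern_a, pattern_b) pairs
    pattern_fragment: dict pattern -> relevant fragment
    fragment_host:    dict fragment -> host
    """
    host_of = {p: fragment_host[f] for p, f in pattern_fragment.items()}
    edges = list(edges)
    # An edge is local when both endpoints are served by the same host.
    local = sum(1 for a, b in edges if host_of[a] == host_of[b])
    return {
        "local_edges": local,
        "remote_edges": len(edges) - local,
        "hosts": len(set(host_of.values())),
    }


# Example: A and B are served by host 1, C by host 2.
metrics = fragmentation_metrics(
    edges=[("A", "B"), ("A", "C"), ("B", "C")],
    pattern_fragment={"A": "01", "B": "10", "C": "00"},
    fragment_host={"01": 1, "10": 1, "00": 2},
)
print(metrics)
```

A lower remote-edge count and fewer contacted hosts would indicate a fragmentation that suits the query better, under the assumptions stated above.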
  • 34. Preliminary results
      Number of hosts: 5

      Run #  Dataset         Dataset description                         File size (MB)  # triples  # subqueries  # predicates
      1      Subset Dbpedia  Dbpedia foaf information (names and dates)  136             1745624    10            9
      2      Subset Dbpedia  Dbpedia foaf information (names and dates)  136             1745624    10            10
      3      YAGO Core       YAGO Core dataset                           2662.4          26227687   9             21
      4      YAGO sample     RDF-3x YAGO dump sample                     3276.8          35238246   19            35

      [Figure: Edge count in local query graphs vs. number of contacted hosts, per run of the algorithm. Series: hosts contacted; average edges in local query graph. x-axis: runs]
      [Figure: Local edges vs. remote edges in the distributed query graph, per run of the algorithm. Series: average local edges in distributed query graph; average remote edges in distributed query graph]
  • 35. Conclusions
      ● Uses standard techniques from relational databases.
      ● The method is independent of the actual storage implementation.
          – Huge three-column table abstraction
      ● It can be easily extended to support redundancy.
      ● Applicable to evolving query loads
          – By changing the level of constant normalization
  • 36. Future work
      ● Evaluate the quality of the partitioning
          – Using real execution costs: requires a distributed index + query planner + distributed cost model
          – Against other approaches (e.g., fragmentation by predicate)
      ● Evaluate the greedy allocation algorithm
          – Against the optimal solution, round robin, etc.
      ● Use estimates for fragment sizes
          – So far extracted via queries.
  • 37. Thanks for your attention