SlideShare ist ein Scribd-Unternehmen logo
1 von 38
Strabon
A Semantic Geospatial DBMS
 Kostis Kyzirakos, Manos Karpathiotakis, and
             Manolis Koubarakis



 Dept. of Informatics and Telecommunications,
 National and Kapodistrian University of Athens, Greece
Outline
 Introduction
 The data model stRDF
 The query language stSPARQL
  and a comparison to GeoSPARQL
 The system Strabon for
  stSPARQL and GeoSPARQL
 Experimental Evaluation
 Related Work & Conclusions
Main idea
 How do we represent and query geospatial
 information in the Semantic Web?

    Develop appropriate vocabularies and
     ontologies

    Extend RDF to take into account the
     geospatial dimension

    Extend SPARQL to query the new kinds of
     data

    Use Open Geospatial Consortium (OGC) and
     other geospatial industry standards
Example




          4
National Observatory of Athens:
Fire Products Ontology




                                  5
   Introduction
   The data model


                         The data model
    stRDF
   The query language
    stSPARQL and a


                         stRDF
    comparison to
    GeoSPARQL
   The system Strabon
    for stSPARQL and
    GeoSPARQL
   Experimental
    Evaluation
   Related Work &
    Conclusions
The Data Model stRDF

 stRDF extends RDF with:
    Spatial literals encoded by Boolean combinations of
     linear constraints
    New datatype for spatial literals (strdf:geometry)
    Valid time of triples encoded by Boolean
     combinations of temporal constraints


 stRDF (most recent version)
    Spatial literals encoded in Well-Known Text/GML
     (OGC standards)
    Valid time of triples ignored for the time being
Burnt Area Products
@prefix strdf:
<http://strdf.di.uoa.gr/
ontology#>.
@prefix noa:
<http://teleios.di.uoa.gr/
ontologies/noaOntology.owl#>.

noa:ba_15 rdf:type noa:BurntArea ;
          noa:hasGeometry noa:ba_15g.
                                 Spatial
                                  literal
noa:ba_15g noa:hasSerialization
  "MULTIPOLYGON(((393801.42 4198827.92,
             ..., 393008 424131)));
<http://www.opengis.net/def/crs/EPSG/0/2100>"
                        Spatial  ^^strdf:WKT.
                       data type
The stRDF Data Model
strdf:geometry rdf:type rdfs:Datatype;
              rdfs:subClassOf rdfs:Literal.




strdf:WKT   rdf:type rdfs:Datatype;
            rdfs:subClassOf strdf:geometry.

strdf:GML   rdf:type    rdfs:Datatype;
            rdfs:subClassOf   strdf:geometry.
   Introduction
   The data model
    stRDF
                     The query
   The query
    language         language
                     stSPARQL
    stSPARQL and a
    comparison to
    GeoSPARQL
   The system
    Strabon for
    stSPARQL and

                     and a comparison
    GeoSPARQL
   Experimental

                     to GeoSPARQL
    Evaluation
   Related Work &
    Conclusions
stSPARQL: Geospatial SPARQL 1.1
We define a SPARQL extension function for each function
defined in the OpenGIS Simple Features Access standard

 Basic functions
   Get a property of a geometry (e.g., strdf:srid)
   Get the desired representation of a geometry (e.g., strdf:AsText)
   Test whether a certain condition holds (e.g., strdf:IsEmpty,
    strdf:IsSimple)

 Functions for testing topological spatial relationships
 (e.g., strdf:equals, strdf:intersects)
   OGC Simple Features Access, Egenhofer, RCC-8


 Spatial analysis functions
    Construct new geometric objects from existing geometric objects (e.g.,
     strdf:buffer, strdf:intersection, strdf:convexHull)
    Spatial metric functions (e.g., strdf:distance, strdf:area)


 Spatial aggregate functions (e.g., strdf:union, strdf:extent)
stSPARQL: Geospatial SPARQL 1.1
Select clause
 Construction of new geometries (e.g., strdf:buffer(?geo, 0.1))
 Spatial aggregate functions (e.g., strdf:union(?geo))
 Metric functions (e.g., strdf:area(?geo))

Filter clause
 Functions for testing topological relationships between spatial terms (e.g.,
  strdf:contains(?G1, strdf:union(?G2, ?G3)))
 Numeric expressions involving spatial metric functions
  (e.g., strdf:area(?G1) ≤ 2*strdf:area(?G2))
 Boolean combinations

Having clause
 Boolean expressions involving spatial aggregate functions and spatial metric
  functions or functions testing for topological relationships between spatial
  terms (e.g., strdf:area(strdf:union(?geo))>1)
Updates
stSPARQL: An example (1/2)
Find coniferous forests that have been affected by
fires



SELECT ?forest ?burntArea
WHERE {
   ?burntArea rdf:type noa:BurntArea;
               noa:hasGeometry [
                noa:hasSerialization ?baGeo].
   ?forest   rdf:type noa:Region;
             clc:hasLandCover noa:coniferousForest;
   Spatial   clc:hasGeometry [
  Function   clc:hasSerialization ?fGeom].
    FILTER(strdf:intersects(?baGeom,?fGeom))
}
stSPARQL: An example (2/2)
Isolate the parts of the burnt areas that lie in
coniferous forests.                 Spatial
 SELECT ?burntArea                Aggregate
(strdf:intersection(?baGeom,
                     strdf:union(?fGeom))
 AS ?burntForest)
WHERE {
   ?burntArea rdf:type noa:BurntArea;
               noa:hasGeometry [
                noa:hasSerialization ?baGeo].
   ?forest   rdf:type noa:Region;
             clc:hasLandCover noa:coniferousForest;
   Spatial   clc:hasGeometry [
  Function   clc:hasSerialization ?fGeom].
   FILTER(strdf:intersects(?baGeom,?fGeom))
}
GROUP BY ?burntArea ?baGeom
The OGC Standard GeoSPARQL
                    Core



       Topology                  Geometry      Parameters
      Vocabulary                 Extension
      Extension
- relation family
                           - serialization
                           - version           • Serialization
                                                  • WKT
                                                  • GML
                       Geometry Topology
                           Extension
                       - serialization
                                               • Relation Family
                       - version
                       - relation family
                                                  • Simple
                                                     Features
                                                  • RCC-8
  Query Rewrite            RDFS Entailment
    Extension                 Extension           • Egenhofer
- serialization            - serialization
- version                  - version
- relation family          - relation family
   Introduction
   The data model
    stRDF            The system
   The query
    language
    stSPARQL and a
                     Strabon for
    comparison to
    GeoSPARQL        stSPARQL and
                     GeoSPARQL
   The system
    Strabon for
    stSPARQL and
    GeoSPARQL
   Experimental
    Evaluation


                     strabon.di.uoa.gr
   Related Work &
    Conclusions
Strabon Architecture
                             Strabon

 WKT       GML   Sesame v2.6.3
                   Query Engine         Storage Manager
                       Parser              Repository

                     Optimizer                 SAIL
       stRDF
       graphs         Evaluator                RDBMS

                     Transaction
                      Manager




  stSPARQL/
  GeoSPARQL
    queries
                                   GeneralDB




                     PostGIS
Storage Scheme                               uri_values
type_2
Triples                                      ID   VALUE
                                            Dictionary
SUBJEC OBJECT
SUBJECT PREDICATE OBJECT
             PREDICATE                      1
                                     OBJECT VALUE
                                           ID    noa:ba_15
T
noa:ba_15 3 rdf:type
noa:ba_15
1            rdf:type                      12 noa:ba_15
                                                 rdf:type
                                     noa:BurntArea .
1
noa:ba_15 2 noa:hasGeometry
                         3
noa:ba_15 6 noa:hasGeometry
5                                    noa:ba_15g. rdf:type
                                      noa:ba_15g noa:BurntArea
                                             23
1
noa:ba_15g4 rdf:type     5
noa:ba_15g rdf:type
hasgeom_4
                                     noa:Geometry .
                                             34 noa:BurntArea
                                      noa:Geometrynoa:hasGeometry
5
noa:ba_15g2 noa:hasSerialization
                         6           "MULTIPOLYGON(...)"^^
noa:ba_15g OBJECT
SUBJECT      noa:hasserialization            45 noa:hasGeometry
                                                  noa:ba_15g
                                      "MULTIPOLYGON(...)"^^
5         7              8            strdf:WKT .
                                      strdf:WKT noa:ba_15g
                                             56   noa:Geometry
1          5
                                             67 noa:Geometry
                                                  noa:hasSerialization
hasserial_7
                                            7label_values
                                                 noa:hasSerialization
SUBJECT OBJECT
                                            8 ID "MULTIPOLYGON(...)"^^
                                                    VALUE
5             8         geo_values               strdf:WKT
                                              8     "MULTIPOLYGON(...)"^^
                        ID   VALUE                 strdf:WKT
                        8                    datatype_values
                                             ID    VALUE
                                             8     strdf:WKT
Query Processing
                          Strabon   • Parser generates
 stSPARQL/                            abstract syntax tree
                Query Engine
GeoSPARQL
    query           Parser
                                    • Abstract syntax tree
                  Optimizer           mapped to internal
                   Evaluator          algebra of Sesame
                  Transaction
                   Manager          • Standard optimizations
                                      performed
                                    • Evaluator produces
                                      corresponding SQL
                        GeneralDB     query
                                    • DBMS evaluates the
                                      SQL query
             PostGIS
                                    • Post-processing
Query Processing (cont’d)
• Deviate from the evaluation strategy of Sesame
  for SPARQL extension functions
• Push the evaluation of extension functions to
  underlying DBMS
   • Spatial predicates evaluated by PostGIS
   • Spatial joins now affect query plan
   • Avoid Cartesian products


• Results may be returned in well-known industry
 formats
 • KML/KMZ
 • GeoJSON
 • GML
   Introduction
   The data model


                     Experimental
    stRDF
   The query
    language


                     Evaluation
    stSPARQL and a
    comparison to
    GeoSPARQL
   The system
    Strabon for
    stSPARQL and
    GeoSPARQL
   Experimental
    Evaluation
   Related Work &
    Conclusions
Experimental Evaluation
• Goal: Evaluate the performance of
 Strabon vs other systems

• Real workload based on geospatial
 linked datasets
 150 million triples


• Synthetic workload
  Half a billion triples
Strabon vs other systems

• Strabon over PostgreSQL
  (Strabon-PG)
• Strabon over System X
  (Strabon-X)
• Implementation over RDF-3X
  (Brodt et. al)
• Parliament (BBN Technologies)
• Naive Implementation over Sesame
Real world workload: Data
               DBpedia     GeoNames        LGD       Pachube   SwissEx      CLC        GADM


Size           7.1 GB       2.1 GB       6.6 GB      828 KB    33 MB       14 GB       146 MB

Triples       58,722,893   17,688,602   46,296,978    6,333    277,919   19,711,926      255

Spatial        386,205     1,262,356    5,414,032     101       687      2,190,214       51
Terms

Distinct       375,087     1,099,964    5,035,981      70       623      2,190,214       51
Spatial
Terms

Points         375,087     1,099,964    3,205,015      70       623          -             -

Linestrings       -            -         353,714        -         -          -     1       -
                                                                                       79,831
                                                                                       vertices
Polygons          -            -        1,704,650       -         -      2,190,214       51
Real world workload: Queries
               Commonly Used     Spatial Selection    Spatial Join
    Query 1           X
    Query 2           X
    Query 3                               X
    Query 4                               X
    Query 5                                                X
    Query 6           X                                    X
    Query 7           X                                    X
    Query 8                                                X

    Queries with Spatial Joins   Points       Lines    Polygons

    Query 5                        X                       X
                                   X                       X
    Query 6                        X           X           X
    Query 7                                                X
    Query 8                        X           X           X
Real world workload: Results
Cache    System       Q1     Q2     Q3      Q4      Q5       Q6      Q7     Q8
State
Cold     Naive        0.08   1.65   >8h     28.88   89       170     844    1.699
(sec.)
         Strabon      2.01   6.79   41.39   10.11   78.69    60.25   9.23   702.5
         PG                                                                 5
         Strabon      1.74   3.05   1623    46.52   12.57    2409    >8h    57.83
         X
         Parliament   2.12   6.46   >8h     229.72 1130      872     3627   3786

Warm     Naïve        0.01   0.03   >8h     0.79    43.07    88      708    1712
(sec.)
         Strabon      0.01   0.81   0.96    1.66    38.74    1.22    2.92   648.1
         PG
         Strabon      0.01   0.26   1604.9 35.59    0.18     3196    >8h    44.72
         X
         Parliament   0.01   0.04   >8h     10.91   358.92   483.29 2771    3502
Synthetic Workload
• Workload based on a synthetic dataset
  Dataset based on OpenStreetMaps
  10 million triples (2 GB) up to half a
   billion triples (50GB)
    Implemented custom bulk loader
 Triples with spatial literals: 1 up to 46
  million triples


• Response time of queries with various
 thematic and spatial selectivities
Synthetic Workload: Sample
Data
Synthetic Workload:
    Sample stSPARQL Query
SELECT *
WHERE {
    ?tag geordf:key "1" .
    ?node geordf:hasTag ?tag .
    ?node geo:hasGeography1 ?geo .
FILTER (strdf:inside(?geo,
        "POLYGON((-1 -1, 0.056568542 -1,
       0.056568542 0.056568542,
       -1 0.056568542, -1 -1))"^^strdf:WKT
      ))
}
Real-world Workload:
                 100 million triples – warm caches
Response time (sec)




                                                        Response time (sec)




                      number of Nodes in query region                         number of Nodes in query region

                             Tag 1                                                      Tag 1024
Real-world Workload:
100 million triples – cold caches




   number of Nodes in query region   number of Nodes in query region

         Tag 1                              Tag 1024
Strabon-PG: 500 million triples
Tags comparison
     Response time (sec)
Real-world Workload:
Query Plans
Findings
• Strabon over PostgreSQL outperforms
 other systems in case of warm caches

• Results in case of cold caches mixed


• PostreSQL optimizer needs to take into
 account spatial selectivity
 • PostGIS 2.0 moves towards it
System           Language      Index       Geometries    CRS support     Geospatial
                                                                          Function
                                                                          Support
Strabon        stSPARQL/    R-tree-over-   WKT / GML         Yes       • OGC-SFA
               GeoSPARQL*   GiST           support                     • Egenhofer
                                                                       • RCC-8
Parliament     GeoSPARQL*   R-Tree         WKT / GML         Yes       •OGC-SFA
                                           support                     •Egenhofer
                                                                       •RCC-8
Oracle 12c     GeoSPARQL    R-Tree,        WKT               Yes       •OGC-SFA
                            Quadtree

Brodt et al.   SPARQL       R-Tree         WKT support       No        OGC-SFA
(RDF-3X)
Perry          SPARQL-ST    R-Tree         GeoRSS GML        Yes       RCC-8


AllegroGraph   Extended     Distribution   2D point         Partial    •Buffer
               SPARQL       sweeping       geometries                  •Bounding Box
                            technique                                  •Distance

OWLIM          Extended     Custom         2D point          No        •Point-in-polygon
                                                                       •Buffer
               SPARQL                      geometries
                                                                       •Distance

Virtuoso       SPARQL       R-Tree         2D point          Yes       SQL/MM (subset)
                                           geometries

uSeekM         SPARQL       R-tree-over    WKT support       No        OGC-SFA
                            GiST
   Introduction




    The data model
    stRDF
    The query
                     Related Work &
    language
    stSPARQL and a
    comparison to
                     Conclusions
    GeoSPARQL
   The system
    Strabon for
    stSPARQL and
    GeoSPARQL
   Experimental
    Evaluation
   Related Work &
    Conclusions
Future Work
 Use even larger datasets
   Test with 1 billion triples successful

 Implement the temporal part of stSPARQL

 Develop a benchmark for geospatial RDF
 stores
  Consider more systems
    GeoSPARQL implementation of Oracle
    Virtuoso

 stSPARQL query processing in MonetDB

 Go distributed!
   Federated queries
Thanks! Any Questions?
 Strabon
   Manolis Koubarakis, Kostis Kyzirakos, Manos Karpathiotakis,
      Charalampos Nikolaou, Giorgos Garbis, Konstantina Bereta,
      Kallirroi Dogani, Stella Giannakopoulou and Panayiotis Smeros.
     Web site: http://strabon.di.uoa.gr
     Mercurial repository: http://hg.strabon.di.uoa.gr
     Trac: http://bug.strabon.di.uoa.gr
     Mailing list: http://cgi.di.uoa.gr/~mailman/listinfo/strabon-users


 Real Time Fire Monitoring Service,
 National Obervatory of Athens
   http://papos.space.noa.gr/fend_static


 Greek Linked Open Data
   http://www.linkedopendata.gr
  
 TELEIOS EU Project
   http://www.earthobservatory.eu


 SemsorGrid4Env EU Project
   http://www.semsorgrid4env.eu

Weitere ähnliche Inhalte

Ähnlich wie Strabon: A Semantic Geospatial Database System

Geographica: A Benchmark for Geospatial RDF Stores
Geographica: A Benchmark for Geospatial RDF StoresGeographica: A Benchmark for Geospatial RDF Stores
Geographica: A Benchmark for Geospatial RDF StoresKostis Kyzirakos
 
Scalable Data Science with SparkR
Scalable Data Science with SparkRScalable Data Science with SparkR
Scalable Data Science with SparkRDataWorks Summit
 
SWRL2SPIN: Converting SWRL to SPIN
SWRL2SPIN: Converting SWRL to SPINSWRL2SPIN: Converting SWRL to SPIN
SWRL2SPIN: Converting SWRL to SPINNick Bassiliades
 
Scalable Data Science in Python and R on Apache Spark
Scalable Data Science in Python and R on Apache SparkScalable Data Science in Python and R on Apache Spark
Scalable Data Science in Python and R on Apache Sparkfelixcss
 
(An Overview on) Linked Data Management and SPARQL Querying (ISSLOD2011)
(An Overview on) Linked Data Management and SPARQL Querying (ISSLOD2011)(An Overview on) Linked Data Management and SPARQL Querying (ISSLOD2011)
(An Overview on) Linked Data Management and SPARQL Querying (ISSLOD2011)Olaf Hartig
 
The Semantics of SPARQL
The Semantics of SPARQLThe Semantics of SPARQL
The Semantics of SPARQLOlaf Hartig
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeNational Institute of Informatics
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRathachai Chawuthai
 
Revisiting the Representation of and Need for Raw Geometries on the Linked Da...
Revisiting the Representation of and Need for Raw Geometries on the Linked Da...Revisiting the Representation of and Need for Raw Geometries on the Linked Da...
Revisiting the Representation of and Need for Raw Geometries on the Linked Da...Blake Regalia
 
final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)Ankit Rathi
 
Facete - Exploring the web of spatial data with facete
Facete - Exploring the web of spatial data with faceteFacete - Exploring the web of spatial data with facete
Facete - Exploring the web of spatial data with facetegeoknow
 
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungScalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungSpark Summit
 
Linking the world with Python and Semantics
Linking the world with Python and SemanticsLinking the world with Python and Semantics
Linking the world with Python and SemanticsTatiana Al-Chueyr
 
Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™Databricks
 
Toward Next Generation of Gazetteer: Utilizing GeoSPARQL For Developing Link...
Toward Next Generation of Gazetteer:  Utilizing GeoSPARQL For Developing Link...Toward Next Generation of Gazetteer:  Utilizing GeoSPARQL For Developing Link...
Toward Next Generation of Gazetteer: Utilizing GeoSPARQL For Developing Link...Dongpo Deng
 
Semantic Variation Graphs the case for RDF & SPARQL
Semantic Variation Graphs the case for RDF & SPARQLSemantic Variation Graphs the case for RDF & SPARQL
Semantic Variation Graphs the case for RDF & SPARQLJerven Bolleman
 
Le projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT Raster
Le projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT RasterLe projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT Raster
Le projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT RasterACSG Section Montréal
 
Geospatial Querying in Apache Marmotta - Apache Big Data North America 2016
Geospatial Querying in Apache Marmotta -  Apache Big Data North America 2016Geospatial Querying in Apache Marmotta -  Apache Big Data North America 2016
Geospatial Querying in Apache Marmotta - Apache Big Data North America 2016Sergio Fernández
 

Ähnlich wie Strabon: A Semantic Geospatial Database System (20)

Geographica: A Benchmark for Geospatial RDF Stores
Geographica: A Benchmark for Geospatial RDF StoresGeographica: A Benchmark for Geospatial RDF Stores
Geographica: A Benchmark for Geospatial RDF Stores
 
Scalable Data Science with SparkR
Scalable Data Science with SparkRScalable Data Science with SparkR
Scalable Data Science with SparkR
 
SWRL2SPIN: Converting SWRL to SPIN
SWRL2SPIN: Converting SWRL to SPINSWRL2SPIN: Converting SWRL to SPIN
SWRL2SPIN: Converting SWRL to SPIN
 
Final_show
Final_showFinal_show
Final_show
 
Scalable Data Science in Python and R on Apache Spark
Scalable Data Science in Python and R on Apache SparkScalable Data Science in Python and R on Apache Spark
Scalable Data Science in Python and R on Apache Spark
 
(An Overview on) Linked Data Management and SPARQL Querying (ISSLOD2011)
(An Overview on) Linked Data Management and SPARQL Querying (ISSLOD2011)(An Overview on) Linked Data Management and SPARQL Querying (ISSLOD2011)
(An Overview on) Linked Data Management and SPARQL Querying (ISSLOD2011)
 
The Semantics of SPARQL
The Semantics of SPARQLThe Semantics of SPARQL
The Semantics of SPARQL
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
Revisiting the Representation of and Need for Raw Geometries on the Linked Da...
Revisiting the Representation of and Need for Raw Geometries on the Linked Da...Revisiting the Representation of and Need for Raw Geometries on the Linked Da...
Revisiting the Representation of and Need for Raw Geometries on the Linked Da...
 
final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)
 
SWT Lecture Session 3 - SPARQL
SWT Lecture Session 3 - SPARQLSWT Lecture Session 3 - SPARQL
SWT Lecture Session 3 - SPARQL
 
Facete - Exploring the web of spatial data with facete
Facete - Exploring the web of spatial data with faceteFacete - Exploring the web of spatial data with facete
Facete - Exploring the web of spatial data with facete
 
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungScalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
 
Linking the world with Python and Semantics
Linking the world with Python and SemanticsLinking the world with Python and Semantics
Linking the world with Python and Semantics
 
Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™
 
Toward Next Generation of Gazetteer: Utilizing GeoSPARQL For Developing Link...
Toward Next Generation of Gazetteer:  Utilizing GeoSPARQL For Developing Link...Toward Next Generation of Gazetteer:  Utilizing GeoSPARQL For Developing Link...
Toward Next Generation of Gazetteer: Utilizing GeoSPARQL For Developing Link...
 
Semantic Variation Graphs the case for RDF & SPARQL
Semantic Variation Graphs the case for RDF & SPARQLSemantic Variation Graphs the case for RDF & SPARQL
Semantic Variation Graphs the case for RDF & SPARQL
 
Le projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT Raster
Le projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT RasterLe projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT Raster
Le projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT Raster
 
Geospatial Querying in Apache Marmotta - Apache Big Data North America 2016
Geospatial Querying in Apache Marmotta -  Apache Big Data North America 2016Geospatial Querying in Apache Marmotta -  Apache Big Data North America 2016
Geospatial Querying in Apache Marmotta - Apache Big Data North America 2016
 

Mehr von Kostis Kyzirakos

ESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial Data
ESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial DataESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial Data
ESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial DataKostis Kyzirakos
 
The spatiotemporal RDF store Strabon
The spatiotemporal RDF store StrabonThe spatiotemporal RDF store Strabon
The spatiotemporal RDF store StrabonKostis Kyzirakos
 
Linked Earth Observation Data:The Projects TELEIOS and LEO
Linked Earth Observation Data:The Projects TELEIOS and LEOLinked Earth Observation Data:The Projects TELEIOS and LEO
Linked Earth Observation Data:The Projects TELEIOS and LEOKostis Kyzirakos
 
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013Kostis Kyzirakos
 
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQLModeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQLKostis Kyzirakos
 
Data Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial DataData Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial DataKostis Kyzirakos
 
Data Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial DataData Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial DataKostis Kyzirakos
 

Mehr von Kostis Kyzirakos (7)

ESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial Data
ESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial DataESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial Data
ESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial Data
 
The spatiotemporal RDF store Strabon
The spatiotemporal RDF store StrabonThe spatiotemporal RDF store Strabon
The spatiotemporal RDF store Strabon
 
Linked Earth Observation Data:The Projects TELEIOS and LEO
Linked Earth Observation Data:The Projects TELEIOS and LEOLinked Earth Observation Data:The Projects TELEIOS and LEO
Linked Earth Observation Data:The Projects TELEIOS and LEO
 
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
 
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQLModeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
 
Data Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial DataData Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial Data
 
Data Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial DataData Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial Data
 

Kürzlich hochgeladen

microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 

Kürzlich hochgeladen (20)

microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 

Strabon: A Semantic Geospatial Database System

  • 1. Strabon A Semantic Geospatial DBMS Kostis Kyzirakos, Manos Karpathiotakis, and Manolis Koubarakis Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, Greece
  • 2. Outline  Introduction  The data model stRDF  The query language stSPARQL and a comparison to GeoSPARQL  The system Strabon for stSPARQL and GeoSPARQL  Experimental Evaluation  Related Work & Conclusions
  • 3. Main idea How do we represent and query geospatial information in the Semantic Web?  Develop appropriate vocabularies and ontologies  Extend RDF to take into account the geospatial dimension  Extend SPARQL to query the new kinds of data  Use Open Geospatial Consortium (OGC) and other geospatial industry standards
  • 5. National Observatory of Athens: Fire Products Ontology 5
  • 6. Introduction  The data model The data model stRDF  The query language stSPARQL and a stRDF comparison to GeoSPARQL  The system Strabon for stSPARQL and GeoSPARQL  Experimental Evaluation  Related Work & Conclusions
  • 7. The Data Model stRDF  stRDF extends RDF with:  Spatial literals encoded by Boolean combinations of linear constraints  New datatype for spatial literals (strdf:geometry)  Valid time of triples encoded by Boolean combinations of temporal constraints  stRDF (most recent version)  Spatial literals encoded in Well-Known Text/GML (OGC standards)  Valid time of triples ignored for the time being
  • 8. Burnt Area Products @prefix strdf: <http://strdf.di.uoa.gr/ ontology#>. @prefix noa: <http://teleios.di.uoa.gr/ ontologies/noaOntology.owl#>. noa:ba_15 rdf:type noa:BurntArea ; noa:hasGeometry noa:ba_15g. Spatial literal noa:ba_15g noa:hasSerialization "MULTIPOLYGON(((393801.42 4198827.92, ..., 393008 424131))); <http://www.opengis.net/def/crs/EPSG/0/2100>" Spatial ^^strdf:WKT. data type
  • 9. The stRDF Data Model strdf:geometry rdf:type rdfs:Datatype; rdfs:subClassOf rdfs:Literal. strdf:WKT rdf:type rdfs:Datatype; rdfs:subClassOf strdf:geometry. strdf:GML rdf:type rdfs:Datatype; rdfs:subClassOf strdf:geometry.
  • 10. Introduction  The data model stRDF The query  The query language language stSPARQL stSPARQL and a comparison to GeoSPARQL  The system Strabon for stSPARQL and and a comparison GeoSPARQL  Experimental to GeoSPARQL Evaluation  Related Work & Conclusions
  • 11. stSPARQL: Geospatial SPARQL 1.1 We define a SPARQL extension function for each function defined in the OpenGIS Simple Features Access standard  Basic functions  Get a property of a geometry (e.g., strdf:srid)  Get the desired representation of a geometry (e.g., strdf:AsText)  Test whether a certain condition holds (e.g., strdf:IsEmpty, strdf:IsSimple)  Functions for testing topological spatial relationships (e.g., strdf:equals, strdf:intersects)  OGC Simple Features Access, Egenhofer, RCC-8  Spatial analysis functions  Construct new geometric objects from existing geometric objects (e.g., strdf:buffer, strdf:intersection, strdf:convexHull)  Spatial metric functions (e.g., strdf:distance, strdf:area)  Spatial aggregate functions (e.g., strdf:union, strdf:extent)
  • 12. stSPARQL: Geospatial SPARQL 1.1 Select clause  Construction of new geometries (e.g., strdf:buffer(?geo, 0.1))  Spatial aggregate functions (e.g., strdf:union(?geo))  Metric functions (e.g., strdf:area(?geo)) Filter clause  Functions for testing topological relationships between spatial terms (e.g., strdf:contains(?G1, strdf:union(?G2, ?G3)))  Numeric expressions involving spatial metric functions (e.g., strdf:area(?G1) ≤ 2*strdf:area(?G2))  Boolean combinations Having clause  Boolean expressions involving spatial aggregate functions and spatial metric functions or functions testing for topological relationships between spatial terms (e.g., strdf:area(strdf:union(?geo))>1) Updates
  • 13. stSPARQL: An example (1/2) Find coniferous forests that have been affected by fires SELECT ?forest ?burntArea WHERE { ?burntArea rdf:type noa:BurntArea; noa:hasGeometry [ noa:hasSerialization ?baGeo]. ?forest rdf:type noa:Region; clc:hasLandCover noa:coniferousForest; Spatial clc:hasGeometry [ Function clc:hasSerialization ?fGeom]. FILTER(strdf:intersects(?baGeom,?fGeom)) }
  • 14. stSPARQL: An example (2/2) Isolate the parts of the burnt areas that lie in coniferous forests. Spatial SELECT ?burntArea Aggregate (strdf:intersection(?baGeom, strdf:union(?fGeom)) AS ?burntForest) WHERE { ?burntArea rdf:type noa:BurntArea; noa:hasGeometry [ noa:hasSerialization ?baGeo]. ?forest rdf:type noa:Region; clc:hasLandCover noa:coniferousForest; Spatial clc:hasGeometry [ Function clc:hasSerialization ?fGeom]. FILTER(strdf:intersects(?baGeom,?fGeom)) } GROUP BY ?burntArea ?baGeom
  • 15. The OGC Standard GeoSPARQL Core Topology Geometry Parameters Vocabulary Extension Extension - relation family - serialization - version • Serialization • WKT • GML Geometry Topology Extension - serialization • Relation Family - version - relation family • Simple Features • RCC-8 Query Rewrite RDFS Entailment Extension Extension • Egenhofer - serialization - serialization - version - version - relation family - relation family
  • 16. Introduction  The data model stRDF The system  The query language stSPARQL and a Strabon for comparison to GeoSPARQL stSPARQL and GeoSPARQL  The system Strabon for stSPARQL and GeoSPARQL  Experimental Evaluation strabon.di.uoa.gr  Related Work & Conclusions
  • 17. Strabon Architecture Strabon WKT GML Sesame v2.6.3 Query Engine Storage Manager Parser Repository Optimizer SAIL stRDF graphs Evaluator RDBMS Transaction Manager stSPARQL/ GeoSPARQL queries GeneralDB PostGIS
  • 18. Storage Scheme uri_values type_2 Triples ID VALUE Dictionary SUBJEC OBJECT SUBJECT PREDICATE OBJECT PREDICATE 1 OBJECT VALUE ID noa:ba_15 T noa:ba_15 3 rdf:type noa:ba_15 1 rdf:type 12 noa:ba_15 rdf:type noa:BurntArea . 1 noa:ba_15 2 noa:hasGeometry 3 noa:ba_15 6 noa:hasGeometry 5 noa:ba_15g. rdf:type noa:ba_15g noa:BurntArea 23 1 noa:ba_15g4 rdf:type 5 noa:ba_15g rdf:type hasgeom_4 noa:Geometry . 34 noa:BurntArea noa:Geometrynoa:hasGeometry 5 noa:ba_15g2 noa:hasSerialization 6 "MULTIPOLYGON(...)"^^ noa:ba_15g OBJECT SUBJECT noa:hasserialization 45 noa:hasGeometry noa:ba_15g "MULTIPOLYGON(...)"^^ 5 7 8 strdf:WKT . strdf:WKT noa:ba_15g 56 noa:Geometry 1 5 67 noa:Geometry noa:hasSerialization hasserial_7 7label_values noa:hasSerialization SUBJECT OBJECT 8 ID "MULTIPOLYGON(...)"^^ VALUE 5 8 geo_values strdf:WKT 8 "MULTIPOLYGON(...)"^^ ID VALUE strdf:WKT 8 datatype_values ID VALUE 8 strdf:WKT
  • 19. Query Processing Strabon • Parser generates stSPARQL/ abstract syntax tree Query Engine GeoSPARQL query Parser • Abstract syntax tree Optimizer mapped to internal Evaluator algebra of Sesame Transaction Manager • Standard optimizations performed • Evaluator produces corresponding SQL GeneralDB query • DBMS evaluates the SQL query PostGIS • Post-processing
  • 20. Query Processing (cont’d) • Deviate from the evaluation strategy of Sesame for SPARQL extension functions • Push the evaluation of extension functions to underlying DBMS • Spatial predicates evaluated by PostGIS • Spatial joins now affect query plan • Avoid Cartesian products • Results may be returned in well-known industry formats • KML/KMZ • GeoJSON • GML
  • 21. Introduction  The data model Experimental stRDF  The query language Evaluation stSPARQL and a comparison to GeoSPARQL  The system Strabon for stSPARQL and GeoSPARQL  Experimental Evaluation  Related Work & Conclusions
  • 22. Experimental Evaluation • Goal: Evaluate the performance of Strabon vs other systems • Real workload based on geospatial linked datasets 150 million triples • Synthetic workload Half a billion triples
  • 23. Strabon vs other systems • Strabon over PostgreSQL (Strabon-PG) • Strabon over System X (Strabon-X) • Implementation over RDF-3X (Brodt et. al) • Parliament (BBN Technologies) • Naive Implementation over Sesame
  • 24. Real world workload: Data DBpedia GeoNames LGD Pachube SwissEx CLC GADM Size 7.1 GB 2.1 GB 6.6 GB 828 KB 33 MB 14 GB 146 MB Triples 58,722,893 17,688,602 46,296,978 6,333 277,919 19,711,926 255 Spatial 386,205 1,262,356 5,414,032 101 687 2,190,214 51 Terms Distinct 375,087 1,099,964 5,035,981 70 623 2,190,214 51 Spatial Terms Points 375,087 1,099,964 3,205,015 70 623 - - Linestrings - - 353,714 - - - 1 - 79,831 vertices Polygons - - 1,704,650 - - 2,190,214 51
  • 25. Real world workload: Queries Commonly Used Spatial Selection Spatial Join Query 1 X Query 2 X Query 3 X Query 4 X Query 5 X Query 6 X X Query 7 X X Query 8 X Queries with Spatial Joins Points Lines Polygons Query 5 X X X X Query 6 X X X Query 7 X Query 8 X X X
  • 26. Real world workload: Results Cache System Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 State Cold Naive 0.08 1.65 >8h 28.88 89 170 844 1.699 (sec.) Strabon 2.01 6.79 41.39 10.11 78.69 60.25 9.23 702.5 PG 5 Strabon 1.74 3.05 1623 46.52 12.57 2409 >8h 57.83 X Parliament 2.12 6.46 >8h 229.72 1130 872 3627 3786 Warm Naïve 0.01 0.03 >8h 0.79 43.07 88 708 1712 (sec.) Strabon 0.01 0.81 0.96 1.66 38.74 1.22 2.92 648.1 PG Strabon 0.01 0.26 1604.9 35.59 0.18 3196 >8h 44.72 X Parliament 0.01 0.04 >8h 10.91 358.92 483.29 2771 3502
  • 27. Synthetic Workload • Workload based on a synthetic dataset Dataset based on OpenStreetMaps 10 million triples (2 GB) up to half a billion triples (50GB)  Implemented custom bulk loader Triples with spatial literals: 1 up to 46 million triples • Response time of queries with various thematic and spatial selectivities
  • 29. Synthetic Workload: Sample stSPARQL Query SELECT * WHERE { ?tag geordf:key "1" . ?node geordf:hasTag ?tag . ?node geo:hasGeography1 ?geo . FILTER (strdf:inside(?geo, "POLYGON((-1 -1, 0.056568542 -1, 0.056568542 0.056568542, -1 0.056568542, -1 -1))"^^strdf:WKT )) }
  • 30. Real-world Workload: 100 million triples – warm caches Response time (sec) Response time (sec) number of Nodes in query region number of Nodes in query region Tag 1 Tag 1024
  • 31. Real-world Workload: 100 million triples – cold caches number of Nodes in query region number of Nodes in query region Tag 1 Tag 1024
  • 32. Strabon-PG: 500 million triples Tags comparison Response time (sec)
  • 34. Findings • Strabon over PostgreSQL outperforms other systems in case of warm caches • Results in case of cold caches mixed • PostreSQL optimizer needs to take into account spatial selectivity • PostGIS 2.0 moves towards it
  • 35. System Language Index Geometries CRS support Geospatial Function Support Strabon stSPARQL/ R-tree-over- WKT / GML Yes • OGC-SFA GeoSPARQL* GiST support • Egenhofer • RCC-8 Parliament GeoSPARQL* R-Tree WKT / GML Yes •OGC-SFA support •Egenhofer •RCC-8 Oracle 12c GeoSPARQL R-Tree, WKT Yes •OGC-SFA Quadtree Brodt et al. SPARQL R-Tree WKT support No OGC-SFA (RDF-3X) Perry SPARQL-ST R-Tree GeoRSS GML Yes RCC-8 AllegroGraph Extended Distribution 2D point Partial •Buffer SPARQL sweeping geometries •Bounding Box technique •Distance OWLIM Extended Custom 2D point No •Point-in-polygon •Buffer SPARQL geometries •Distance Virtuoso SPARQL R-Tree 2D point Yes SQL/MM (subset) geometries uSeekM SPARQL R-tree-over WKT support No OGC-SFA GiST
  • 36. Introduction   The data model stRDF The query Related Work & language stSPARQL and a comparison to Conclusions GeoSPARQL  The system Strabon for stSPARQL and GeoSPARQL  Experimental Evaluation  Related Work & Conclusions
  • 37. Future Work  Use even larger datasets  Test with 1 billion triples successful  Implement the temporal part of stSPARQL  Develop a benchmark for geospatial RDF stores  Consider more systems  GeoSPARQL implementation of Oracle  Virtuoso  stSPARQL query processing in MonetDB  Go distributed!  Federated queries
  • 38. Thanks! Any Questions?  Strabon  Manolis Koubarakis, Kostis Kyzirakos, Manos Karpathiotakis, Charalampos Nikolaou, Giorgos Garbis, Konstantina Bereta, Kallirroi Dogani, Stella Giannakopoulou and Panayiotis Smeros.  Web site: http://strabon.di.uoa.gr  Mercurial repository: http://hg.strabon.di.uoa.gr  Trac: http://bug.strabon.di.uoa.gr  Mailing list: http://cgi.di.uoa.gr/~mailman/listinfo/strabon-users  Real Time Fire Monitoring Service, National Obervatory of Athens  http://papos.space.noa.gr/fend_static  Greek Linked Open Data  http://www.linkedopendata.gr   TELEIOS EU Project  http://www.earthobservatory.eu  SemsorGrid4Env EU Project  http://www.semsorgrid4env.eu

Hinweis der Redaktion

  1. Hello, I am KostisKyzirakos, and I will be presenting you the work done with my colleagues Manos Karpathiotakis and ManolisKoubarakis, on the development of a geospatial RDF store nick-named Strabon.
  2. Our extension of RDF for the representation of geospatial and temporal information is stRDF.stRDFextens RDF with a new kind of literals, called spatial literals, which are expresssionsof Boolean combinations of linear constraints.stRDF defines also a new datatype for spatial literals [which is a supertype of the respective datatypes for WKT and GML (standardized formats of the Open Geospatial Consortium for the encoding of geometries)]stRDF employs also a forth component to a triple to represent the valid time of triples.(???WHAT EXPRESSIONS CAN BE WRITTEN???)Constraints: theoretical representationIn the most recent version of stRDF, and to be compliant with geospatial industry standards,spatial literals are encoded in the WKT/GML formats, which are widely adopted standards of the Open Geospatial Consortium.The reason for moving to OGC standards was being realistic. No one had any spatial data expressedwith linear constraints! But all the nice work of the theoretical model remained valid. We only had to change the encoding used for spatial literals.(We are currently working on this feature, so this presentation does not consider time anymore.) We will describe our model through an example.
  3. Filter expressions are...stSPARQL supports also the construction of new spatial terms and spatial aggregate functions in SELECT clauses.(spatial aggregate functions are implemented in Strabon and we do not use the implementation of PostGIS to overcome issues with transformatios between cordinate reference systems, something that PostGIS does not take into account)OpenGIS Simple Feature Access specification defines a standard SQL schema that supports storage, retrieval, query and update of collections of spatial features using SQL
  4. Strabon extends the well-known RDF store Sesame, allowing it to manage both thematic and spatial data expressed in stRDF. In the case when PostGIS is used as the relational backend of Strabon,Strabon uses an R-tree-over-GiST (Generalised Search Tree) offered by PostGIS as a spatial index.Strabon is implemented by creating a layer that is included in Sesame&apos;s software stackin a transparent way so that it does not affect its range of functionalities, whilebenefitting from new versions of Sesame. Strabon 3.0 uses Sesame 2.6.3 andcomprises three modules: the storage manager, the query engine and the underlying databaseWe have extended Sesame as follows: Modified the storage scheme, optimizer, and evaluator to take into account geometries and not treat them as plain literals Added a new RDBMS abstract layer to Storage Manager so as any spatially-enabled database can be used with Strabon.Currently, we have been using PostGIS and MonetDB.
  5. “Decomposition Storage Model”
  6. The query engine works as follows. First, the parser generates an abstractsyntax tree. Then, this tree is mapped to the internal algebra of Sesame, re-sulting in a query tree. The query tree is then processed by the optimizer thatprogressively modies it, implementing the various optimization techniques ofStrabon. Afterwards, the query tree is passed to the evaluator to produce thecorresponding SQL query that will be evaluated by PostgreSQL. After the SQLquery has been posed, the evaluator receives the results and performs any post-processing actions needed. The nal step involves formatting the results. Besidesthe standard formats oered by RDF stores, Strabonoers KML and GeoJSONencodings, which are widely used in the mapping industry.We now discuss how the optimizer works. First, it applies all the Sesameoptimizations that deal with the standard SPARQL part of an stSPARQL query(e.g., it pushes down FILTERs to minimize intermediate results etc.). Then, twooptimizations specic to stSPARQL are applied. The rst optimization has todo with the extension functions of stSPARQL. By default, Sesame evaluatesthese after all bindings for the variables present in the query are retrieved. InStrabon, we modify the behaviour of the Sesame optimizer to incorporate allextension functions present in the SELECT and FILTER clause of an stSPARQLquery into the query tree prior to its transformation to SQL. In this way, theseextension functions will be evaluated using PostGIS spatial functions instead ofrelying on external libraries that would add an unneeded post-processing cost.The second optimization makes the underlying DBMS aware of the existence ofspatial joins in stSPARQL queries so that they would be evaluated eciently.Let us consider the query of Example 2. The rst three triple patterns of thequery are related to the rest via the topological function strdf:intersects.The query tree that is produced by the Sesame optimizer, fails to deal withthis spatial join appropriately. It will generate a Cartesian product for the thirdand the fth triple pattern of the query, and the evaluation of the spatial predi-catestrdf:intersects will be wrongly postponed after the calculation of thisCartesian product. Using a query graph as an intermediate representation of thequery, we identify such spatial joins and modify the query tree with appropriatenodes so that Cartesian products are avoided. For the query of Example 2, themodied query tree will result in a SQL query that contains a -join where isthe spatial function ST Intersects.At this stage, the query tree is a plain translation ofthe initial query to the algebra of Sesame. The query tree is then processed bythe optimizer that progressively restructures and enriches it, implementing thevarious optimization techniques of Strabon. Afterwards, the query tree is passedto the evaluator to produce the corresponding SQL query that will be evaluatedby PostgreSQL. After the SQL query has been posed, the evaluator receivesthe results and performs any post-processing actions needed.
  7. First, it applies all the Sesameoptimizations that deal with the standard SPARQL part of an stSPARQL query(e.g., it pushes down FILTERs to minimize intermediate results etc.). Then, twooptimizations specic to stSPARQL are applied. The rst optimization has todo with the extension functions of stSPARQL. By default, Sesame evaluatesthese after all bindings for the variables present in the query are retrieved. InStrabon, we modify the behaviour of the Sesame optimizer to incorporate allextension functions present in the SELECT and FILTER clause of an stSPARQLquery into the query tree prior to its transformation to SQL. In this way, theseextension functions will be evaluated using PostGIS spatial functions instead ofrelying on external libraries that would add an unneeded post-processing cost.The second optimization makes the underlying DBMS aware of the existence ofspatial joins in stSPARQL queries so that they would be evaluated eciently.Let us consider the query of Example 2. The rst three triple patterns of thequery are related to the rest via the topological function strdf:intersects.The query tree that is produced by the Sesame optimizer, fails to deal withthis spatial join appropriately. It will generate a Cartesian product for the thirdand the fth triple pattern of the query, and the evaluation of the spatial predi-catestrdf:intersects will be wrongly postponed after the calculation of thisCartesian product. Using a query graph as an intermediate representation of thequery, we identify such spatial joins and modify the query tree with appropriatenodes so that Cartesian products are avoided. For the query of Example 2, themodied query tree will result in a SQL query that contains a -join where isthe spatial function ST Intersects.At this stage, the query tree is a plain translation ofthe initial query to the algebra of Sesame. The query tree is then processed bythe optimizer that progressively restructures and enriches it, implementing thevarious optimization techniques of Strabon. Afterwards, the query tree is passedto the evaluator to produce the corresponding SQL query that will be evaluatedby PostgreSQL. After the SQL query has been posed, the evaluator receivesthe results and performs any post-processing actions needed.Query processing in Strabon is performed by the query engine which consistsof a parser, an optimizer, an evaluator and a transaction manager. The parserand the transaction manager are identical to the ones in Sesame. The optimizerand the evaluator have been implemented by modifying the corresponding com-ponents of Sesame as we describe below.The query engine works as follows. First, the parser generates an abstractsyntax tree. Then, this tree is mapped to the internal algebra of Sesame, re-sulting in a query tree. The query tree is then processed by the optimizer thatprogressively modies it, implementing the various optimization techniques ofStrabon. Afterwards, the query tree is passed to the evaluator to produce thecorresponding SQL query that will be evaluated by PostgreSQL. After the SQLquery has been posed, the evaluator receives the results and performs any post-processing actions needed. The nal step involves formatting the results. Besidesthe standard formats oered by RDF stores, Strabonoers KML and GeoJSONencodings, which are widely used in the mapping industry.We now discuss how the optimizer works. First, it applies all the Sesameoptimizations that deal with the standard SPARQL part of an stSPARQL query(e.g., it pushes down FILTERs to minimize intermediate results etc.). Then, twooptimizations specic to stSPARQL are applied. The rst optimization has todo with the extension functions of stSPARQL. By default, Sesame evaluatesthese after all bindings for the variables present in the query are retrieved. InStrabon, we modify the behaviour of the Sesame optimizer to incorporate allextension functions present in the SELECT and FILTER clause of an stSPARQLquery into the query tree prior to its transformation to SQL. In this way, theseextension functions will be evaluated using PostGIS spatial functions instead ofrelying on external libraries that would add an unneeded post-processing cost.The second optimization makes the underlying DBMS aware of the existence ofspatial joins in stSPARQL queries so that they would be evaluated eciently.Let us consider the query of Example 2. The rst three triple patterns of thequery are related to the rest via the topological function strdf:intersects.The query tree that is produced by the Sesame optimizer, fails to deal withthis spatial join appropriately. It will generate a Cartesian product for the thirdand the fth triple pattern of the query, and the evaluation of the spatial predi-catestrdf:intersects will be wrongly postponed after the calculation of thisCartesian product. Using a query graph as an intermediate representation of thequery, we identify such spatial joins and modify the query tree with appropriatenodes so that Cartesian products are avoided. For the query of Example 2, themodied query tree will result in a SQL query that contains a -join where isthe spatial function ST Intersects.At this stage, the query tree is a plain translation ofthe initial query to the algebra of Sesame. The query tree is then processed bythe optimizer that progressively restructures and enriches it, implementing thevarious optimization techniques of Strabon. Afterwards, the query tree is passedto the evaluator to produce the corresponding SQL query that will be evaluatedby PostgreSQL. After the SQL query has been posed, the evaluator receivesthe results and performs any post-processing actions needed.
  8. We designed a simple benchmark
  9. This section presents a detailed evaluation of the system Strabon using two different workloads: a workload based on linked data and a synthetic workload. Forboth workloads, we compare the response time of Strabon on top of PostgreSQL (called Strabon PG from now on) with our closest competitor implementationin [3], the naive, baseline implementation described in Section 4, and the RDF store Parliament. To identify potential benets from using a dierent DBMS asa relational backend for Strabon, we also executed the SQL queries produced by Strabon in a proprietary spatially-enabled DBMS (which we will call System X,and Strabon X the resulting combination). [3] presents an implementation which enhances the RDF-3X triple store withthe ability to perform spatial selections using an R-tree index. The implementation of [3] is not a complete system like Strabon and does not support afull-edged query language such as stSPARQL. In addition, the only way to load data in the system is the use of a generator which has been especially designed for the experiments of [3] thus it cannot be used to load other datasets in the implementation. Moreover, the geospatial indexing support of this implementation is limited to spatial selections. Spatial selections are pushed down inthe query tree (i.e., they are evaluated before other operators). Parliament is an RDF storage engine recently enhanced with GeoSPARQL processing capabilities [2], which is coupled with Jena (an RDF query processor) to provide a complete RDF system.
  10. We have used a number of linked geospatial datasets. I would like to stress the heterogeneity of the dataset.We have included datasets from many application domains (e.g., sensors, land use, administrative areas, gazeteer).They also vary significantly in terms of size and complexity of the spatial geometries. DBpediaGeoNamesLDG (LinkedGeoData, OpenStreetMaps)Pachube (“patch-bay”): Pachube (&quot;patch-bay&quot;) connects people to devices, applications, and the Internet of Things. As a web-based service built to manage the world&apos;s real-time data, Pachube gives people the power to share, collaborate, and make use of information generated from the world around them.SwissEx (Swiss Experiment): http://www.swiss-experiment.chA platform to enable real-time environmental experiments through wireless sensor networks and a common, modern, generic cyber-infrastructure. This infrastructure will be used to enable interdisciplinary environmental research, allowing scientists to work efficiently and collaboratively to find the key mechanisms in the triggering of natural hazards and to efficiently distribute the information to increase public awareness.CLC (Corine Land Cover)GADM (Global Administrative Areas): http://www.gadm.org/ GADM is a spatial database of the location of the world&apos;s administrative areas (or adminstrative boundaries) for use in GIS and similar software. Administrative areas in this database are countries and lower level subdivisions such as provinces, departments, bibhag, bundeslander, daerahistimewa, fivondronana, krong, landsvæðun, opština, sous-préfectures, counties, and thana. GADM describes where these administrative areas are (the &quot;spatial features&quot;), and for each area it provides some attributes, such as the name and variant names.
  11. Queries 1,2: no spatial dimension
  12. Cold caches vs work cachesGreen is the winning colorRed is the loosing colorThe various versions of Strabon are the winning onesNo cases where Parliament is the winning systemThere are many interesting things we found in this experiment. For example,Parliament: reference implementation of GeoSPARQLIn queries Q1 and Q2, the baseline implementation outperforms all other systems as these queries do not include any spatial function. As mentioned earlier, the native store of Sesame outperforms Sesame implementations on top of a DBMS. All other systems producecomparable result times for these non-spatial queries. In all other queries the DBMS-based implementations outperform Parliament and the naive implemen-tation. Strabon PG outperforms the naive implementation since it incorporates the two stSPARQL-specic optimizations discussed in Section 4. Thus, spatialoperations are evaluated by PostGIS using a spatial index, instead of being evaluated after all the results have been retrieved. The naive implementation andParliament fail to execute Q3 as this query involves a spatial join that is very expensive for systems using a naive approach. The only exception in the behaviorof the naive implementation is Q4 in the case of warm caches, where the non-spatial part of the query produces very few results and the file blocks needed forquery evaluation are cached in main memory. In this case, the non-spatial part of the query is executed rapidly while the evaluation of the spatial function overthe results thus far is not signicant. All queries except Q8 are executed significantly faster when run using Strabon on warm caches. Q8 involves many triplepatterns and spatial functions which result in the production of a large number of intermediate results. As these do not fit in the system&apos;s cache, the responsetime is unaffected by the cache contents. System X decides to ignore the spatial index in queries Q3, Q6-Q8 and evaluate any spatial predicate exhaustively overthe results of a thematic join. In queries Q6-Q8, it also uses Cartesian products in the query plan as it considers their evaluation more profitable. These decisionsare correct in the case of Q8 where it outperforms all other systems significantly, but very costly in the cases of Q3, Q6 and Q7.
  13. Για το synthetic dataset, έχουμε τρέξει πειράματα με 1 billion με τον Strabon. Το naive δεν μπορούσε να αποτιμήσει queries σε τόσο μεγάλο dataset οπότε παρουσιάζουμε γράφους μέχρι 500 millions.
  14. The data of the synthetic dataset follows a general version of the schema of Open Street Map. Each node has a spatial extent (the location of the node) and is placed uniformly on a grid with step 0.001 (min 0, max 10.24).In addition, each node is assigned a number of tags each of which consist of a key-value pair of strings.Every node is tagged with key 1, every second node with key 2, every fourth node with key 4, and so on up to key 1024.By modifying the step of the grid, we produce datasets of arbitrary size.
  15. In this query THIS_VALUE (value “1”) allows us to select a known fraction of the generated nodes. Since every node is tagged with key 1, this triple pattern will produce a binding for every node of the dataset.By altering this value to “2”, we select half the nodes of the dataset. By further changing it to “4”, we select half thenodes we previously did, and so on.The extent of the POLYGON in the filter expression defines how selective the spatial predicate is. We modify the size of the polygon in order to select 10, 100, 1000, etc. spatial nodes.
  16. After consulting the query logs of PostgreSQL, we no-ticed that the vast majority of the SQL queries that were posed and derivedfrom the query template adhered to one of the query plans of Figures 4,5. Ac-cording to plan A, query evaluation starts by evaluating the thematic selectionover table key using the appropriate index. The results are retrieved and joinedwith the hasTag and hasGeography predicate tables using appropriate indices.Finally, the spatial selection is evaluated by scanning the spatial index and theresults are joined with the intermediate results of the previous operations. Onthe contrary, plan B starts with the evaluation of the spatial selection, leavingthe application of the thematic selection for the end.Unfortunately, in the current version of PostGIS spatial selectivities are notcomputed properly. The functions that estimate the selectivity of a spatial se-lection/join return a constant number regardless of the actual selectivity of theoperator. Thus, only the thematic selectivity aects the choice of a query plan.To state it in terms of our query template, altering the value PARAM A between1 and 1024 was the only factor inuencing the selection of a query plan.
  17. In this case, as previously discussed, PostgreSQL does not select a good plan andthe response times are even higher than the times where PARAM A is equal to 1(Figure 6a), despite producing signicantly more intermediate results.
  18. After consulting the query logs of PostgreSQL, we no-ticed that the vast majority of the SQL queries that were posed and derivedfrom the query template adhered to one of the query plans of Figures 4,5. Ac-cording to plan A, query evaluation starts by evaluating the thematic selectionover table key using the appropriate index. The results are retrieved and joinedwith the hasTag and hasGeography predicate tables using appropriate indices.Finally, the spatial selection is evaluated by scanning the spatial index and theresults are joined with the intermediate results of the previous operations. Onthe contrary, plan B starts with the evaluation of the spatial selection, leavingthe application of the thematic selection for the end.Unfortunately, in the current version of PostGIS spatial selectivities are notcomputed properly. The functions that estimate the selectivity of a spatial se-lection/join return a constant number regardless of the actual selectivity of theoperator. Thus, only the thematic selectivity aects the choice of a query plan.To state it in terms of our query template, altering the value PARAM A between1 and 1024 was the only factor inuencing the selection of a query plan.
  19. Two different query plans.1. Evaluation of thematic selection first.2. Evaluation of spatial selection first.In Postgres, the functions that estimate the selectivity of a spatial selection or a spatial join return a constant number. In other words, the spatial selectivity of a query did not affect dynamically the decision whether the spatial selection should be the first operation to execute. Therefore, only the thematic selectivity affected the choice for a query plan. To state it in terms of our query template, altering the value for the key between 1 and 1024 was the only factor influencing the query plan.
  20. So, what I would like you to keep for this slide is that Strabon not only has a good performance but it is also one of the richest systems available at the moment.