This document summarizes Geographica, a benchmark developed to evaluate the performance of geospatial RDF stores. It describes the benchmark's two workloads. The real-world workload tests primitive spatial functions and common geospatial queries on real linked geospatial datasets for Greece, and simulates application scenarios such as reverse geocoding, web map search and browsing, and rapid mapping for fire monitoring. The synthetic workload runs queries of controlled thematic and spatial selectivity over generated data. The benchmark was used to compare three geospatial RDF stores: Strabon, Parliament, and uSeekM. When loading the real-world datasets, Strabon had the slowest storage time (550 sec) because it builds PostGIS tables and indices, uSeekM was fastest (214 sec) using the Sesame native store, and Parliament (250 sec) was slowed by geo:asWKT subproperty inferencing. On the synthetic dataset the order reversed, with Strabon fastest (221 sec).
Geographica: A Benchmark for Geospatial RDF Stores
1. Benchmarking Geospatial RDF Stores
George Garbis, Kostis Kyzirakos, Manolis Koubarakis
Dept. of Informatics and Telecommunications,
National and Kapodistrian University of Athens, Greece
1st International Workshop on Benchmarking
RDF Systems (BeRSys 2013)
2. 5/26/2013 2
Outline
• Motivation
• SPARQL extensions for querying geospatial data
expressed in RDF
• State-of-the-art geospatial RDF stores
• Related benchmarks
• The benchmark Geographica
• Evaluating the performance of geospatial RDF
stores using Geographica
• Conclusions
10. Our Contributions
• The data model stRDF and the query language
stSPARQL
• The system Strabon
11. The Data Model stRDF
• stRDF stands for spatiotemporal RDF.
• It is an extension of the W3C standard RDF for the
representation of geospatial data that may change
over time.
• stRDF extends RDF with:
• Spatial literals encoded in the OGC standards Well-Known Text (WKT) or GML
• New datatypes for spatial literals (strdf:WKT, strdf:GML and strdf:geometry)
• Valid time of triples (ignored in this talk) [ESWC 2013]
13. stRDF: An example (WKT)
ex:BurntArea1 rdf:type noa:BurntArea.
ex:BurntArea1 noa:hasID "1"^^xsd:decimal.
ex:BurntArea1 noa:hasArea "23.7636"^^xsd:double.
ex:BurntArea1 strdf:hasGeometry "POLYGON((38.16 23.7, 38.18 23.7, 38.18 23.8, 38.16 23.8, 38.16 23.7));<http://spatialreference.org/ref/epsg/4121/>"^^strdf:WKT .
• Spatial literal (OpenGIS Simple Features): the quoted value
• Spatial datatype (Well-Known Text): strdf:WKT
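The literal in the WKT example pairs a geometry with a CRS URI after a semicolon. A minimal sketch of how a client might take such a literal apart, assuming exactly the layout shown on the slide (WKT text, then `;`, then a URI that may be wrapped in angle brackets). The helper name `split_strdf_wkt` is hypothetical and is not part of Strabon:

```python
def split_strdf_wkt(lexical_form):
    """Split an stRDF WKT lexical form into (wkt, crs_uri).

    Assumes the layout shown on the slide: WKT text, optionally followed
    by ';' and a CRS URI (possibly wrapped in angle brackets).
    Returns None for the CRS when no URI is present.
    """
    wkt, sep, crs = lexical_form.partition(";")
    crs = crs.strip().strip("<>")
    return wkt.strip(), (crs if sep and crs else None)


literal = (
    "POLYGON((38.16 23.7, 38.18 23.7, 38.18 23.8, 38.16 23.8, 38.16 23.7));"
    "<http://spatialreference.org/ref/epsg/4121/>"
)
wkt, crs = split_strdf_wkt(literal)
print(wkt)
print(crs)
```

Keeping the CRS separate from the WKT text lets the geometry be handed to any WKT parser, while the URI decides how the coordinates are interpreted.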
14. stRDF: An example (GML)
ex:BurntArea1 rdf:type noa:BurntArea.
ex:BurntArea1 noa:hasID "1"^^xsd:decimal.
ex:BurntArea1 noa:hasArea "23.7636"^^xsd:double.
ex:BurntArea1 strdf:hasGeometry
"<gml:Polygon
srsName='http://www.opengis.net/def/crs/EPSG/0/4121'>
<gml:outerBoundaryIs>
<gml:LinearRing>
<gml:coordinates>38.16,23.70 38.18,23.70 38.18,23.80
38.16,23.80 38.16,23.70
</gml:coordinates>
</gml:LinearRing>
</gml:outerBoundaryIs>
</gml:Polygon>"^^strdf:GML .
• Spatial literal (GML Simple Features Profile): the quoted value
• Spatial datatype (GML): strdf:GML
15. stSPARQL: An example
• Find all burnt forests close to a city
SELECT ?BA ?BAGEO
WHERE {
?R rdf:type noa:Region .
?R strdf:hasGeometry ?RGEO .
?R noa:hasLandCover ?F .
?F rdfs:subClassOf clc:Forests .
?CITY rdf:type dbpedia:City .
?CITY strdf:hasGeometry ?CGEO .
?BA rdf:type noa:BurntArea .
?BA strdf:hasGeometry ?BAGEO .
FILTER(strdf:anyInteract(?RGEO,?BAGEO) &&
strdf:distance(?BAGEO,?CGEO) < 0.02) }
• Spatial functions (OGC Simple Feature Access): strdf:anyInteract, strdf:distance
16. stSPARQL: Geospatial SPARQL 1.1
• We start from SPARQL 1.1.
• We add a SPARQL extension function for each function defined in the
OGC standard OpenGIS Simple Feature Access – Part 2: SQL option
(ISO 19125) for adding geospatial data to relational DBMSs and SQL.
• We add appropriate geospatial extensions to the SPARQL 1.1 Update
language.
17. stSPARQL (cont’d)
• Basic functions
• Get a property of a geometry (e.g., strdf:srid)
• Get the desired representation of a geometry (e.g., strdf:AsText)
• Test whether a certain condition holds (e.g., strdf:IsEmpty, strdf:IsSimple)
• Functions for testing topological spatial relationships
(e.g., strdf:equals, strdf:intersects)
• OGC Simple Features Access, Egenhofer, RCC-8
• Spatial analysis functions
• Construct new geometric objects from existing geometric objects (e.g.,
strdf:buffer, strdf:intersection, strdf:convexHull)
• Spatial metric functions (e.g., strdf:distance, strdf:area)
• Spatial aggregate functions (e.g., strdf:union, strdf:extent)
18. stSPARQL (cont’d)
• SELECT clause
Construction of new geometries (e.g., strdf:buffer(?geo, 0.1))
Spatial aggregate functions (e.g., strdf:extent(?geo))
Metric functions (e.g., strdf:area(?geo))
• FILTER clause
Functions for testing topological spatial relationships between spatial
terms (e.g., strdf:contains(?G1, strdf:union(?G2, ?G3)))
Numeric expressions involving spatial metric functions
(e.g., strdf:area(?G1) <= 2*strdf:area(?G2)+1)
• HAVING clause
Spatial aggregate functions and spatial metric functions or functions
testing for topological relationships between spatial terms (e.g.,
strdf:area(strdf:union(?geo))>1)
19. stSPARQL: An example
• Isolate the parts of the burnt areas that lie in coniferous forests.
SELECT ?burntArea (strdf:intersection(?baGeom,
strdf:union(?fGeom)) AS ?burntForest)
WHERE {
?burntArea rdf:type noa:BurntArea;
noa:hasGeometry [ noa:hasSerialization ?baGeom ].
?forest rdf:type noa:Region;
clc:hasLandCover noa:coniferousForest;
clc:hasGeometry [ clc:hasSerialization ?fGeom ].
FILTER(strdf:intersects(?baGeom,?fGeom)) }
GROUP BY ?burntArea ?baGeom
20. GeoSPARQL
• Components (with the parameters each depends on):
• Core
• Topology Vocabulary Extension (relation family)
• Geometry Extension (serialization, version)
• Geometry Topology Extension (serialization, version, relation family)
• Query Rewrite Extension (serialization, version, relation family)
• RDFS Entailment Extension (serialization, version, relation family)
• Parameters:
• Serialization: WKT, GML
• Relation family: Simple Features, RCC8, Egenhofer
21. System | Language | Index | Geometries | CRS support | Comments on Functionality
Strabon | stSPARQL/GeoSPARQL* | R-tree-over-GiST | WKT / GML support | Yes | OGC-SFA, Egenhofer, RCC-8
Parliament | GeoSPARQL | R-Tree | WKT / GML support | Yes | OGC-SFA, Egenhofer, RCC-8
Brodt et al. (RDF-3X) | SPARQL | R-Tree | WKT support | No | OGC-SFA
Perry | SPARQL-ST | R-Tree | GeoRSS GML | Yes | RCC8
AllegroGraph | Extended SPARQL | Distribution sweeping technique | 2D point geometries | Partial | Buffer, Bounding Box, Distance
OWLIM | Extended SPARQL | Custom | 2D point geometries (W3C Basic Geo Vocabulary) | No | Point-in-polygon, Buffer, Distance
Virtuoso | SPARQL | R-Tree | 2D point geometries (in WKT) | Yes | SQL/MM (subset)
uSeekM | GeoSPARQL | R-tree-over-GiST | WKT support | No | OGC-SFA
23. Related Work
• Benchmarks for SPARQL query processing:
• LUBM (JWS 2005)
• DBpedia SPARQL Benchmark (ISWC 2011)
• …
• Benchmarks for geospatial relational DBMS:
• Sequoia 2000 (SIGMOD 1993)
• VESPA (BNCOD 2000)
• Jackpine (ICDE 2011)
• Benchmarks for spatial indexing and query processing operations
• Benchmarks for geospatial RDF stores:
• A Benchmark for Spatial Semantic Web Systems (Kolas, SSWS 2008)
24. The Benchmark Geographica
• Aim: measure the performance of today’s geospatial RDF
stores
• Organized around two workloads:
• Real-world workload:
• Based on existing linked geospatial datasets and known
application scenarios
• Synthetic workload:
• Measure performance in a controlled environment where we
can play around with properties of the data and the queries.
• Γεωγραφικά (Geographica): 17-volume geographical encyclopedia by Στράβων (Strabo), AD 17
25. Real-World Workload
• Datasets: Real-world datasets for the
geographic area of Greece playing an
important role in the LOD cloud or having
complex geometries
• LinkedGeoData (LGD) for rivers and roads in
Greece
• GeoNames for Greece
• DBpedia for Greece
• Greek Administrative Geography (GAG)
• CORINE land cover (CLC) for Greece
• Hotspots
27. Real-World Workload
Parts
• For this workload, Geographica has
two parts (following Jackpine):
• Micro part: Tests primitive spatial
functions offered by geospatial RDF
stores
• Macro part: Simulates some
typical application scenarios
28. Real-World Workload
Micro part
• 29 queries that consist of one or two triple patterns and a
spatial function.
• Functions included:
• Spatial analysis: boundary, envelope, convex hull,
buffer, area
• Topological: equals, intersects, overlaps, crosses,
within, distance, disjoint
• As used in spatial selections and spatial joins
• Spatial aggregates: extent, union
• Functions are applied to many representative types of geometries.
29. Example – spatial analysis
• Construct the boundary of all polygons of CLC
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ( geof:boundary(?o1) as ?ret )
WHERE {
GRAPH <http://geographica.di.uoa.gr/dataset/clc>
{ ?s1 <http://geo.linkedopendata.gr/corine/ontology#asWKT> ?o1}
}
30. Example – spatial selection
• Find all points in Geonames that are contained
in a given polygon.
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?s1 ?o1
WHERE {
GRAPH <http://geographica.di.uoa.gr/dataset/geonames>
{ ?s1 <http://www.geonames.org/ontology#asWKT> ?o1 }
FILTER( geof:sfWithin(?o1,
"GIVEN_POLYGON_IN_WKT"^^<http://www.opengis.net/ont/geosparql#wktLiteral>)). }
31. Example – spatial join
• Find all pairs of GAG and CLC polygons that overlap
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?s1 ?s2
WHERE {
GRAPH <http://geographica.di.uoa.gr/dataset/gag>
{?s1 <http://geo.linkedopendata.gr/gag/ontology/asWKT> ?o1}
GRAPH <http://geographica.di.uoa.gr/dataset/clc>
{?s2 <http://geo.linkedopendata.gr/corine/ontology#asWKT> ?o2}
FILTER( geof:sfOverlaps(?o1, ?o2) )
}
32. Real-World Workload
Micro part
• Spatial Selections (functions tested per dataset geometry type, listed in column order Query Point, Query Line, Query Polygon)
• Point: Within Buffer, Distance, Within, Disjoint
• Line: Equals, Crosses, Intersects, Disjoint
• Polygon: Intersects, Equals, Overlaps
• Spatial Joins (functions tested per geometry type, listed in column order Point, Line, Polygon)
• Point: Equals, Intersects, Intersects, Within
• Line: Intersects, Within, Crosses
• Polygon: Within, Touches, Overlaps
33. Real-World Workload
Macro part
• Reverse Geocoding: Attribute a street address and
place to a given point.
• Queries:
• Find the closest populated place (from GeoNames)
• Find the closest street (from LGD)
34. Real-World Workload
Macro part
• Web Map Search and Browsing
• Queries:
• Find the co-ordinates of a given POI based on thematic
criteria (from GeoNames)
• Find roads in a given bounding box around these co-ordinates (from LGD)
• Find other POI in a given bounding box around these co-ordinates (from LGD)
35. Real-World Workload
Macro part
• Rapid Mapping for Fire Monitoring: representative of typical
rapid mapping tasks carried out by space agencies in the case of
an emergency
36. Real-World Workload
Macro part
• Rapid Mapping for Fire Monitoring
• Queries:
• Simple tasks retrieving background mapping information:
• Find the land cover of areas inside a given bounding box (from
CLC)
• Find primary roads inside a given bounding box (from LGD)
• Find capitals of prefectures inside a given bounding box (from
GAG)
• (Often) more complex main mapping tasks:
• Find municipality boundaries inside a bounding box (from GAG)
• Find coniferous forests which are on fire (from CLC and
Hotspots)
• Find road segments which may be damaged by fire (from LGD
and Hotspots)
37. Experimental Evaluation
Real-world workload
• Geospatial RDF stores tested: Strabon, Parliament, uSeekM
• Machine: Intel Xeon E5620, 12MB L3 cache, 2.4GHz, 24GB RAM,
4 HDD with RAID-5
• Micro part:
• Metric: response time
• Run 3 times and compute the median
• Time out: 1 hour
• Run both on warm caches and cold caches
• Macro part:
• Run each scenario many times for one hour with warm
caches
• Metric: Average time for a complete execution
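The micro-part protocol above (three runs, median response time, one-hour time-out) can be sketched as a small timing harness. This is only an illustration under stated assumptions: `fake_query` stands in for sending a SPARQL query to a store and reading all results, the function names are hypothetical, and warm versus cold caches are not modeled.

```python
import statistics
import time


def time_once(run_query):
    """Return the wall-clock seconds taken by one call of run_query()."""
    start = time.perf_counter()
    run_query()
    return time.perf_counter() - start


def median_response_time(run_query, runs=3, timeout_s=3600.0):
    """Median response time over `runs` executions, or None on time-out.

    Simplification: a run is only flagged after it finishes; a real
    harness would have to cancel the query once the time-out expires.
    """
    times = []
    for _ in range(runs):
        elapsed = time_once(run_query)
        if elapsed > timeout_s:
            return None
        times.append(elapsed)
    return statistics.median(times)


def fake_query():
    """Hypothetical stand-in for evaluating one benchmark query."""
    sum(range(10_000))


result = median_response_time(fake_query)
print(f"median response time: {result:.6f} s")
```

Reporting the median rather than the mean keeps a single slow run (for example, one disturbed by background I/O) from dominating the result.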
38. Results
Data Storage
• Strabon is the slowest given the PostGIS
tables and indices it is building.
• uSeekM does much better using the Sesame
native store
• Parliament slows down due to geo:asWKT
subproperty inferencing
System Strabon uSeekM Parliament
Storage time 550 sec 214 sec 250 sec
40. Results
Macro part
Scenario | Strabon | uSeekM | Parliament
Reverse Geocoding | 65 sec | 0.77 sec | 2.6 sec
Map Search and Browsing | 0.9 sec | 0.6 sec | 22.2 sec
Rapid Mapping for Fire Monitoring | 207.4 sec | - | -
41. Synthetic Workload
• Goal: Evaluate performance in a controlled environment
where we can vary the thematic and spatial selectivity of
queries
• Thematic selectivity: the fraction of the total
geographic features of a dataset that satisfy the
non-spatial part of a query
• Spatial selectivity: the fraction of the total
geographic features of a dataset which satisfy the
topological relation in the FILTER clause of a query
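Both selectivities are plain fractions over the dataset. A minimal sketch on a toy grid of point features, assuming an invented tagging rule and a rectangular query window. The `Feature` class and both function names are hypothetical, and containment in a rectangle stands in for the topological relation of a real query:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    x: float
    y: float
    tag: str


def thematic_selectivity(features, tag):
    """Fraction of features that satisfy the non-spatial condition (tag match)."""
    return sum(f.tag == tag for f in features) / len(features)


def spatial_selectivity(features, rect):
    """Fraction of features inside rect = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    inside = sum(xmin <= f.x <= xmax and ymin <= f.y <= ymax for f in features)
    return inside / len(features)


# Toy 4x4 grid of points; the even/odd tagging rule is invented for illustration.
features = [
    Feature(x, y, "even" if (x + y) % 2 == 0 else "odd")
    for x in range(4)
    for y in range(4)
]

thematic = thematic_selectivity(features, "even")
spatial = spatial_selectivity(features, (0, 0, 1, 1))
print(thematic, spatial)
```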
42. Synthetic Workload
• Dataset: As in VESPA, the produced datasets are
geographic features on a synthetic map:
• States in a country ((n/3)^2)
• Land ownership (n^2)
• Roads (n)
• POI (n^2)
43. Synthetic Workload
Ontology
• Based roughly on the ontology of OpenStreetMap and
the GeoSPARQL vocabulary
• Tagging each feature with a key enables us to select a
known fraction of features in a uniform way
44. Synthetic Workload
Query template for spatial selections
SELECT ?s
WHERE {
?s ns:hasGeometry ?g.
?s c:hasTag ?tag.
?g ns:asWKT ?wkt.
?tag ns:hasKey "THEMA" .
FILTER(FUNCTION(?wkt, "GEOM"))}
• Parameters:
• ns: specifies the kind of feature (and geometry type) examined
• THEMA: defines the thematic selectivity of the query using another
parameter k
• FUNCTION: specifies the topological function examined
• GEOM: specifies a rectangle that controls the spatial selectivity of
the query
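The template can be instantiated by plain string substitution. The sketch below assumes hypothetical parameter values: the prefix `state`, the key `"1"`, and the rectangle are invented for illustration, and `geof:sfIntersects` is just one GeoSPARQL function that could fill the FUNCTION slot. Prefix declarations are omitted, as on the slide.

```python
# Query template from the slide; doubled braces are literal braces for str.format.
TEMPLATE = """SELECT ?s
WHERE {{
  ?s {ns}:hasGeometry ?g .
  ?s c:hasTag ?tag .
  ?g {ns}:asWKT ?wkt .
  ?tag {ns}:hasKey "{thema}" .
  FILTER({function}(?wkt, "{geom}"))
}}"""


def instantiate(ns, thema, function, geom):
    """Fill the four template parameters and return the query text."""
    return TEMPLATE.format(ns=ns, thema=thema, function=function, geom=geom)


# All four values below are hypothetical examples, not taken from the benchmark.
query = instantiate(
    ns="state",
    thema="1",
    function="geof:sfIntersects",
    geom="POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))",
)
print(query)
```

Varying only `geom` changes the spatial selectivity and varying only `thema` changes the thematic selectivity, which is what lets the two be controlled independently.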
46. Experimental Evaluation
Synthetic workload
• Geospatial RDF stores tested: Strabon, Parliament,
uSeekM
• Machine: Intel Xeon E5620, 12MB L3 cache, 2.4GHz,
24GB RAM, 4 HDD with RAID-5
• Details:
• Metric: response time
• Run 3 times and compute the median
• Time out: 1 hour
• Run both on warm caches and cold caches
47. Results
Data Storage
• We generate the synthetic dataset with n=512
and k=9. This results in:
• 28900 states
• 262144 land ownerships
• 512 roads
• 262144 points of interest
• Size: 3,880,224 triples (745 MB)
System Strabon uSeekM Parliament
Storage time 221 sec 406 sec 462 sec
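The feature counts reported for n=512 follow from the generator formulas on the synthetic dataset slide (states (n/3)^2, land ownership n^2, roads n, POI n^2), provided n/3 is rounded down. The rounding is an assumption, since the slides do not state it, but it is the reading that reproduces 28900 states. The function name is hypothetical:

```python
def synthetic_counts(n):
    """Number of generated features per type for map parameter n.

    Assumes n/3 is truncated to an integer before squaring.
    """
    return {
        "states": (n // 3) ** 2,
        "land_ownerships": n ** 2,
        "roads": n,
        "points_of_interest": n ** 2,
    }


counts = synthetic_counts(512)
for name, count in counts.items():
    print(f"{name}: {count}")
```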
51. Conclusions
• We defined Geographica, a new
comprehensive benchmark for geospatial RDF
stores, and used it to compare 3 relevant
systems (Strabon, Parliament, uSeekM).
• More implementation work is necessary in
adding features to other geospatial RDF stores
beyond the ones tested.
• More real-world scenarios can be added.
• Next target: spatiotemporal RDF stores
52. Advertisement
Strabon: http://strabon.di.uoa.gr
Geographica: http://geographica.di.uoa.gr
Tutorials/Survey paper
More at ESWC
Paper: Storing and Querying the Valid Time of Triples in Linked
Geospatial Data
Demo: Sextant, a web tool for browsing and mapping Linked
Geospatial Data http://strabon.di.uoa.gr:8080/sextant/
Project networking: TELEIOS
http://www.earthobservatory.eu