2. Geospatial Data
• Values of type Geometry
• Points – location information (latitude and longitude)
• Lines – roads, cables
• Polygons – countries, regions, provinces, cities, cell tower coverage areas
• Stored as strings in Well-Known-Text (WKT) format
CC BY-SA 3.0 https://en.wikipedia.org/wiki/Well-known_text
3. Multi-Geometry Types
• Multi-* – a collection of geometries of the same type
4. GeometryCollection
• A collection of geometries of different types
• Used to capture the result of an operation, e.g. intersection, difference, etc.
• Example: intersection of LINESTRING (…) and POLYGON(…) → GEOMETRYCOLLECTION (LINESTRING(…), POINT(…))
5. Geospatial Functions
• ISO Standard - SQL/MM Part 3
• MM – multimedia
• Part 3 Spatial
• ST_ prefix (S – spatial, T – temporal)
• https://prestodb.io/docs/current/functions/geospatial.html
6. WKT-to-Geometry
• To Geometry
• ST_GeometryFromText(wkt)
• ST_Point(x, y)
• ST_Point(longitude, latitude)
• To WKT
• ST_AsText
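To make the round trip concrete, here is a minimal Python sketch of the simplest case, a 2D POINT. The helper names point_from_wkt and point_as_wkt are hypothetical and are not part of Presto; a real WKT parser covers every geometry type.

```python
import re

# Hypothetical helpers for illustration only; they handle a 2D POINT and nothing else.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*$", re.IGNORECASE
)


def point_from_wkt(wkt):
    """Parse 'POINT (x y)' into an (x, y) tuple of floats."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


def point_as_wkt(x, y):
    """Format (x, y) as WKT; for geographic data x is longitude and y is latitude."""
    return f"POINT ({x!r} {y!r})"
```

Note the argument order: as with ST_Point(longitude, latitude), x comes first.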
7. Operations
• Inputs (and outputs) are geometry objects, not WKT strings
• Relationships and measurements: ST_Contains(g1, g2), ST_Intersects(g1, g2), ST_Distance(g1, g2) (*), ST_Area(g) (*), ST_Length(g) (*)
• Constructive operations: ST_Intersection(g1, g2), ST_ConvexHull(g), ST_Union(g1, g2), ST_Centroid(g), ST_Envelope(g)
(*) Computation is done on the Euclidean plane, in the units of the input geometries
8. Spatial Join
• ST_Contains, ST_Intersects and ST_Distance
• R-Tree index for the build side
CC BY-SA 3.0 https://en.wikipedia.org/wiki/R-tree
SELECT *
FROM points, polygons
WHERE ST_Contains(ST_GeometryFromText(wkt), ST_Point(lng, lat))
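The Python sketch below shows the general shape of such a join. It is a simplification under stated assumptions: the build side is a plain list of precomputed bounding boxes scanned linearly (a stand-in for the R-Tree, not an R-Tree), polygons are simple rings without holes, and the containment test is ordinary ray casting. All names are hypothetical.

```python
def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def contains(ring, point):
    """Ray-casting point-in-polygon test; boundary points get no special handling."""
    px, py = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > py) != (y2 > py):
            # x coordinate where this edge crosses the horizontal line through the point
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def spatial_join(points, polygons):
    """Yield (point, polygon_name) pairs where the polygon contains the point."""
    # Build side: compute each bounding box once, before probing.
    build = [(name, ring, bounding_box(ring)) for name, ring in polygons.items()]
    for point in points:
        px, py = point
        for name, ring, (min_x, min_y, max_x, max_y) in build:
            # Cheap bounding-box filter first, exact test only for candidates.
            if min_x <= px <= max_x and min_y <= py <= max_y and contains(ring, point):
                yield point, name


polygons = {
    "square": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "triangle": [(10, 0), (14, 0), (12, 3)],
}
points = [(1, 1), (12, 1), (20, 20)]
matches = list(spatial_join(points, polygons))
```

An R-Tree replaces the inner linear scan with a tree lookup over the same bounding boxes, which is where the speedup on the build side comes from.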
9. Spatial Join Types
• Inner join
• Left join enables scalar correlated subqueries
SELECT (SELECT arbitrary(name) FROM polygons WHERE ST_Contains(polygon, ST_Point(lng, lat)))
FROM points
10. Distance Query
• Logically equivalent to ST_Contains(circle(b.point, radius), a.point)
• Radius can be a constant value or an expression using symbols from b
• A lot more efficient than ST_Contains(ST_Buffer(b.point, radius), a.point)
• What about the units?
SELECT * FROM a, b
WHERE ST_Distance(a.point, b.point) <= radius
11. Angular units
• 1 degree of latitude ≈ 111.321 km and stays constant
• 1 degree of longitude ≈ 111.321 km * cos(latitude)
• ST_Distance, ST_Area, ST_Length return results in angular units
• Within small areas, multiply by 111.321 km * cos(radians(ST_Y(ST_Centroid(ST_Envelope(g1)))))
• ST_Y(ST_Centroid(ST_Envelope(g1))) is the latitude at the center of the bounding box of g1
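The conversion factors above can be sketched in Python as follows. The function names are hypothetical. The second function scales the two axes separately, which is an assumption added here for illustration and is not the single multiplication shown on the slide.

```python
import math

KM_PER_DEGREE = 111.321  # figure quoted on the slide


def km_per_degree_of_longitude(latitude_deg):
    """Length in km of one degree of longitude at the given latitude."""
    return KM_PER_DEGREE * math.cos(math.radians(latitude_deg))


def approx_offset_km(d_lng_deg, d_lat_deg, latitude_deg):
    """Approximate length in km of a small displacement given in degrees.

    Assumption: the area is small enough to treat as flat, so the east-west and
    north-south components are scaled separately and combined with Pythagoras.
    """
    east_west = d_lng_deg * km_per_degree_of_longitude(latitude_deg)
    north_south = d_lat_deg * KM_PER_DEGREE
    return math.hypot(east_west, north_south)
```

At 60 degrees of latitude a degree of longitude is half as long as at the equator, which is why a threshold expressed in degrees means different things at different latitudes.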
12. Distance Query in km: Step 1
• ST_Distance(center, p) <= r / 111.321
• For r = 1
• Circle of 1 km near equator
• Ellipse with the minor axis in the east-west (longitude) direction, shrinking to 0.34 km at the 70th parallel (cos 70° ≈ 0.34)
13. Distance Query in km: Step 2
• ST_Distance(center, p) <= r / (111.321 * cos(radians(center.latitude)))
• Ellipse with the minor axis fixed at r km and the major axis starting at r km near the equator and growing to about 3r at the 70th parallel (1 / cos 70° ≈ 2.9)
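A short Python sketch makes the two ellipses concrete by converting each threshold in degrees back into the distance it covers east-west and north-south. The function names are hypothetical and the flat-earth scaling is only reasonable for small radii.

```python
import math

KM_PER_DEGREE = 111.321  # figure quoted on the slides


def reach_km(threshold_deg, latitude_deg):
    """Return (east_west_km, north_south_km) covered by a threshold given in degrees."""
    east_west = threshold_deg * KM_PER_DEGREE * math.cos(math.radians(latitude_deg))
    north_south = threshold_deg * KM_PER_DEGREE
    return east_west, north_south


def step1_threshold(r_km):
    """Step 1: divide the radius by the length of a degree of latitude."""
    return r_km / KM_PER_DEGREE


def step2_threshold(r_km, latitude_deg):
    """Step 2: divide the radius by the length of a degree of longitude."""
    return r_km / (KM_PER_DEGREE * math.cos(math.radians(latitude_deg)))


for lat in (0, 70):
    print(lat, reach_km(step1_threshold(1.0), lat), reach_km(step2_threshold(1.0, lat), lat))
```

Step 1 undershoots east-west at high latitudes, so it can miss matches. Step 2 overshoots north-south instead, so it returns a superset that a later exact check can trim.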
14. Distance Query in km: Step 3
SELECT *
FROM a, b
WHERE ST_Distance(ST_Point(a.lng, a.lat), ST_Point(b.lng, b.lat)) <=
radius_km / (111.321 * cos(radians(b.lat)))
AND great_circle_distance(a.lat, a.lng, b.lat, b.lng) <= radius_km
• Divide the radius by 111.321 * cos(latitude)
• Refine spatial join results using great_circle_distance
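The two-stage predicate can be mimicked in Python as below. This is a sketch, not Presto's implementation: the haversine formula and the mean Earth radius of 6371.01 km are assumptions standing in for great_circle_distance, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.01  # assumed mean Earth radius


def great_circle_distance_km(lat1, lng1, lat2, lng2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def within_radius(a, b, radius_km):
    """Planar pre-filter in degrees, then refinement with the great-circle distance.

    a and b are (lat, lng) tuples in degrees.
    """
    a_lat, a_lng = a
    b_lat, b_lng = b
    planar_deg = math.hypot(a_lng - b_lng, a_lat - b_lat)
    threshold_deg = radius_km / (111.321 * math.cos(math.radians(b_lat)))
    if planar_deg > threshold_deg:
        return False  # rejected by the cheap check that the spatial join can index
    return great_circle_distance_km(a_lat, a_lng, b_lat, b_lng) <= radius_km
```

The first condition is the one the engine can turn into a spatial join. The second removes the extra north-south matches that the stretched threshold lets through.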
17. Distance Query using Bing Tiles
• Choose zoom level based on radius
• tile width >= radius
• Refine join results using great_circle_distance
SELECT *
FROM a, (
SELECT * FROM b
CROSS JOIN UNNEST (bing_tiles_around(lat, lng, 14)) as t(tile)
) b
WHERE bing_tile_at(a.lat, a.lng, 14) = b.tile
AND great_circle_distance(a.lat, a.lng, b.lat, b.lng) <= radius_km
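The same idea, sketched in Python: turn the distance join into an equality join on tile keys, then refine. The tile arithmetic follows the published Bing Maps tile system in simplified form. The names tile_at, tiles_around and tile_join are hypothetical, the 3x3 neighborhood is an assumption about what "tiles around" a point means, and the haversine helper is a stand-in for great_circle_distance.

```python
import math
from collections import defaultdict


def tile_at(lat, lng, zoom):
    """Return the (x, y) tile containing the point at the given zoom level."""
    lat = max(-85.05112878, min(85.05112878, lat))
    lng = max(-180.0, min(180.0, lng))
    n = 1 << zoom
    x = (lng + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    return (max(0, min(n - 1, int(x * n))), max(0, min(n - 1, int(y * n))))


def tiles_around(lat, lng, zoom):
    """Assumed behavior: the tile containing the point plus its neighbors (up to 3x3)."""
    tx, ty = tile_at(lat, lng, zoom)
    n = 1 << zoom
    return [(x, y) for x in range(tx - 1, tx + 2) for y in range(ty - 1, ty + 2)
            if 0 <= x < n and 0 <= y < n]


def haversine_km(a, b, radius_km=6371.01):
    """Great-circle distance between (lat, lng) tuples; the radius is an assumption."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(b[1] - a[1]) / 2) ** 2)
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(h)))


def tile_join(a_rows, b_rows, radius_km, zoom):
    """Equality join on tile keys, refined by the exact distance."""
    index = defaultdict(list)
    for b in b_rows:  # each b row is replicated once per surrounding tile
        for tile in tiles_around(b[0], b[1], zoom):
            index[tile].append(b)
    result = []
    for a in a_rows:
        for b in index.get(tile_at(a[0], a[1], zoom), []):
            if haversine_km(a, b) <= radius_km:
                result.append((a, b))
    return result
```

Because the join key is a plain equality, this form can run as an ordinary hash join. It only finds every match when the tile width at the chosen zoom level is at least the radius.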
18. How Large are Bing Tiles?
• Tile size depends on zoom level and latitude
• Smaller tiles at larger zoom levels and near the poles
[Chart: tile width in kilometers]
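Tile width can be estimated from the Web Mercator projection that Bing tiles use: the equator is divided into 2^zoom tiles, and the ground width shrinks with cos(latitude). The Python sketch below applies that formula and picks the largest zoom level whose tiles are still at least as wide as a given radius. The function names and the default zoom limit of 23 are assumptions made for this example.

```python
import math

# Equatorial circumference from the WGS84 equatorial radius of 6378.137 km.
EARTH_CIRCUMFERENCE_KM = 2 * math.pi * 6378.137


def tile_width_km(latitude_deg, zoom):
    """Approximate east-west ground width of a tile at the given latitude and zoom."""
    return EARTH_CIRCUMFERENCE_KM * math.cos(math.radians(latitude_deg)) / (1 << zoom)


def max_zoom_for_radius(radius_km, latitude_deg, max_zoom=23):
    """Largest zoom level whose tile width is still >= radius_km."""
    for zoom in range(max_zoom, 0, -1):
        if tile_width_km(latitude_deg, zoom) >= radius_km:
            return zoom
    raise ValueError("radius is larger than a zoom level 1 tile at this latitude")
```

For a 1 km radius this gives a coarser zoom level at high latitudes than at the equator, since tiles there are narrower on the ground.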
20. Spatial Join
• Spatial joins are similar to Hash joins
• Hash-based partitioning -> Spatial partitioning
• Hash table -> Spatial Index (R-Tree)
• Broadcast spatial join requires only spatial index
SELECT *
FROM polygons, points
WHERE ST_Contains(ST_GeometryFromText(wkt), ST_Point(lng, lat))
21. Spatial Partitioning
• Overall extent is split into non-overlapping rectangles
• KDB-Tree (K = 2)
• Total number of records, overall extent and a sample of the data are needed to compute the partitioning scheme
• Some records may go into multiple partitions
• Polygons may intersect multiple rectangles
• Efficient inline de-dup technique is necessary
• Reference point of the intersection of bounding boxes
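As a rough illustration of how a sample can drive the split, the Python sketch below recursively cuts the extent at the median of the sample, alternating between x and y. It is a simplified k-d style split, not the KDB-Tree implementation itself, and the function name and max_per_leaf parameter are hypothetical.

```python
def kd_partition(extent, sample, max_per_leaf, depth=0):
    """Split extent (min_x, min_y, max_x, max_y) into non-overlapping rectangles.

    sample is a list of (x, y) points; a rectangle stops splitting once it
    holds at most max_per_leaf sample points.
    """
    if len(sample) <= max_per_leaf:
        return [extent]
    axis = depth % 2  # alternate between x (0) and y (1)
    values = sorted(p[axis] for p in sample)
    split = values[len(values) // 2]
    lower = [p for p in sample if p[axis] < split]
    upper = [p for p in sample if p[axis] >= split]
    if not lower or not upper:
        return [extent]  # too many equal coordinates to split on this axis
    min_x, min_y, max_x, max_y = extent
    if axis == 0:
        lower_extent = (min_x, min_y, split, max_y)
        upper_extent = (split, min_y, max_x, max_y)
    else:
        lower_extent = (min_x, min_y, max_x, split)
        upper_extent = (min_x, split, max_x, max_y)
    return (kd_partition(lower_extent, lower, max_per_leaf, depth + 1)
            + kd_partition(upper_extent, upper, max_per_leaf, depth + 1))


sample = [(1, 1), (2, 8), (6, 3), (7, 7), (9, 2), (3, 5), (8, 9), (4, 4)]
rectangles = kd_partition((0, 0, 10, 10), sample, max_per_leaf=2)
```

Splitting at the median keeps the partitions roughly balanced in record count even when the data is heavily clustered, which a uniform grid would not do.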
22. Inline Deduplication
• Some shapes intersect multiple partitions
• Only one partition contains a reference point
• Lower left corner of the intersection of bounding boxes
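The reference-point rule can be sketched in Python as follows. Each partition emits a candidate pair only if it owns the lower left corner of the intersection of the two bounding boxes, so a pair seen by several partitions is reported once. The names are hypothetical, and the sketch stops at bounding boxes, where a real join would go on to test the exact geometries.

```python
def bbox_intersection(a, b):
    """Intersection of two boxes (min_x, min_y, max_x, max_y), or None if disjoint."""
    min_x, min_y = max(a[0], b[0]), max(a[1], b[1])
    max_x, max_y = min(a[2], b[2]), min(a[3], b[3])
    if min_x > max_x or min_y > max_y:
        return None
    return min_x, min_y, max_x, max_y


def owns(partition, point):
    """Half-open containment, so a point on a shared edge has exactly one owner.

    Assumption: points on the upper or right edge of the overall extent would
    need special handling, which this sketch omits.
    """
    min_x, min_y, max_x, max_y = partition
    return min_x <= point[0] < max_x and min_y <= point[1] < max_y


def join_partition(partition, left_boxes, right_boxes):
    """Emit intersecting pairs whose reference point falls inside this partition."""
    pairs = []
    for left in left_boxes:
        for right in right_boxes:
            common = bbox_intersection(left, right)
            if common is None:
                continue
            reference_point = (common[0], common[1])  # lower left corner
            if owns(partition, reference_point):
                pairs.append((left, right))
    return pairs


partitions = [(0, 0, 5, 10), (5, 0, 10, 10)]
left_boxes = [(3, 3, 7, 7)]   # straddles both partitions
right_boxes = [(4, 4, 8, 8)]  # straddles both partitions
results = [join_partition(p, left_boxes, right_boxes) for p in partitions]
```

The check is local to each partition and needs no shuffle or global state, which is what makes the de-duplication inline.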