This presentation discusses extending PostgreSQL's BRIN index type to support geospatial data in PostGIS. BRIN indexes store a summary for each range of table pages, which keeps the index very small. The authors implemented BRIN operator classes and support functions for the PostGIS geometry and geography data types, with bounding boxes as the stored summaries. Benchmarks on synthetic points, on USGS/OpenStreetMap data and on a LiDAR point cloud show BRIN indexes that are orders of magnitude smaller and much faster to build than GiST. Searches are generally slower than with GiST but far faster than a sequential scan, provided the data is spatially sorted, and BRIN stays usable for searches where the GiST index no longer fits in RAM. Future work could include supporting more operators and nearest-neighbor searches.
1. J. Rouhaud - julien.rouhaud@dalibo.com G. Broccolo – giuseppe.broccolo@2ndquadrant.it
PGConf EU 2016, 8th edition
Tallinn, Nov. 2nd, 2016
Block Range INdexing on geospatial data
extend BRIN support to PostGIS
2. Who we are?
Giuseppe Broccolo
@giubro
gbroccolo7
gbroccolo
gemini__81
@rjuju123
rjuju
rjuju
Julien Rouhaud
3. PostgreSQL index types
• Aka Access Methods
• Can add user defined new access methods
– Fully supported since 9.6 (thanks to postgrespro & 2ndQuadrant)
• CREATE ACCESS METHOD (avoid catalog update)
• Generic WAL interface (crash safe)
• Some external access methods available
– Bloom (postgrespro)
– RUM (postgrespro)
4. Btree
• Most famous access method
• Balanced tree
• Keys are sorted
• Only handle “standard” operators (=, <, <=, >, >= )
• Index and Index Only Scan
• Unique constraints
5. Extend native access methods
• Access methods use operator classes (opclass)
CREATE INDEX idx_name
USING method ON tbl (col opclass_name);
• Define, for each type and access method
– operators for the needed types
– support functions depending on the access method
• Can be extended by third-party extensions
6. BRIN operator class for geometry
CREATE OPERATOR CLASS brin_geometry_inclusion_ops_2d
  DEFAULT FOR TYPE geometry
  USING brin
  FAMILY brin_geometry_inclusion_ops_2d AS
    OPERATOR 3 &&(geometry, geometry),
    OPERATOR 7 ~(geometry, geometry),
    OPERATOR 8 @(geometry, geometry),
    FUNCTION 1 brin_inclusion_opcinfo(internal),
    FUNCTION 2 geom2d_brin_inclusion_add_value(internal, internal, internal, internal),
    FUNCTION 3 brin_inclusion_consistent(internal, internal, internal),
    FUNCTION 4 brin_inclusion_union(internal, internal, internal),
  STORAGE box2df;
7. BRIN - Overview
• Block Range Index
– S. Riggs, Á. Herrera, H. Linnakangas in 9.5
• “Summarized” index
– Split the table in ranges
• A range is a set of pages (blocks)
• By default 128 pages
• Can be overridden with the pages_per_range storage parameter
– Really small index, faster to create
CREATE INDEX ON tbl USING brin (col) WITH (pages_per_range = 64);
8. BRIN – How data are stored
• For each range, computes a summary of the blocks
• Two kinds of opclass:
– minmax: for numerical values. Key values are added following sorting criteria + sorting operators
– inclusion: for more exotic data. Key values are added following inclusion criteria + inclusion operators
• Extensibility
– Provide support functions + operator
http://www.postgresql.org/docs/9.5/static/brin-extensibility.html
9. BRIN – internal overview
10. BRIN – Search the index
• Finding the matching rows
– Scan the whole index for potentially matching ranges
– Scan and recheck all these table blocks to keep only matching rows
• Can be seen as partial/enhanced sequential scan
– Requires that rows are well distributed (physically ordered by the indexed value) to avoid scanning too many table blocks
– Useless on random data
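The two-step search above can be sketched with a toy model. This is only an illustration in Python, not PostgreSQL code: rows are 2D points, a "range" is a fixed number of consecutive rows, and the helper names (summarize, ranges_to_recheck) are hypothetical. It shows why the same query has to recheck one range on sorted data and many ranges on shuffled data.

```python
import random

def bbox_union(a, b):
    """Smallest box (xmin, ymin, xmax, ymax) covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def overlaps(a, b):
    """True when the two boxes share at least one point (an && style test)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def summarize(points, rows_per_range):
    """One bounding box per run of consecutive rows (a toy block range)."""
    summaries = []
    for start in range(0, len(points), rows_per_range):
        box = None
        for x, y in points[start:start + rows_per_range]:
            pt = (x, y, x, y)
            box = pt if box is None else bbox_union(box, pt)
        summaries.append(box)
    return summaries

def ranges_to_recheck(summaries, query):
    """Scan every summary; keep the ranges whose box overlaps the query."""
    return [i for i, box in enumerate(summaries) if overlaps(box, query)]

rng = random.Random(0)
sorted_pts = [(float(i), float(i)) for i in range(10_000)]
shuffled_pts = sorted_pts[:]
rng.shuffle(shuffled_pts)

query = (100.0, 100.0, 120.0, 120.0)
hits_sorted = ranges_to_recheck(summarize(sorted_pts, 100), query)
hits_shuffled = ranges_to_recheck(summarize(shuffled_pts, 100), query)
print(len(hits_sorted), "ranges to recheck on sorted data")
print(len(hits_shuffled), "ranges to recheck on shuffled data")
```

On shuffled rows almost every range ends up with a wide bounding box, so the "index" filters out very little and the scan degrades towards a sequential scan.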
11. BRIN - tips
• Tuning pages_per_range
– Bigger value means:
• Smaller index
• More false positives
• More table blocks to recheck
• Needs to be tuned depending on queries
– Less selective queries can usually afford bigger pages_per_range
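The trade-off can be made concrete with a small cost model. This is a hedged sketch, assuming perfectly sorted data and a query whose matching rows sit in a handful of consecutive heap pages; brin_cost is a hypothetical helper, not a PostgreSQL function.

```python
import math

def brin_cost(total_pages, pages_per_range, first_page, last_page):
    """Toy cost model for perfectly sorted data.

    Returns (index_entries, pages_rechecked) for a query whose matching
    rows live in heap pages first_page..last_page (inclusive).
    """
    index_entries = math.ceil(total_pages / pages_per_range)
    first_range = first_page // pages_per_range
    last_range = last_page // pages_per_range
    pages_rechecked = 0
    for r in range(first_range, last_range + 1):
        start = r * pages_per_range
        end = min(start + pages_per_range, total_pages)
        pages_rechecked += end - start  # every page of a matching range is read
    return index_entries, pages_rechecked

for ppr in (10, 128, 1024):
    entries, rechecked = brin_cost(100_000, ppr, 5_000, 5_004)
    print(f"pages_per_range={ppr}: {entries} summaries, {rechecked} pages rechecked")
```

A selective query (5 matching pages here) pays for the whole range it touches, which is why a smaller pages_per_range helps selective queries while less selective ones can afford a bigger value.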
12. BRIN – More tips
• New blocks (not in existing ranges) are not summarized
– Perform VACUUM
– Or call brin_summarize_new_values()
• BRIN never “shrinks” summarized data
– If you update or delete boundary data, you need to REINDEX
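Why a rebuild is needed after deleting boundary data can be shown with a minimal Python sketch. It assumes a summary is just the bounding box of the rows in a range; summarize is a hypothetical helper standing in for what the index stores.

```python
def bbox_union(a, b):
    """Smallest box (xmin, ymin, xmax, ymax) covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def summarize(points):
    """Bounding box of all points, or None for an empty range."""
    box = None
    for x, y in points:
        pt = (x, y, x, y)
        box = pt if box is None else bbox_union(box, pt)
    return box

rows = [(0, 0), (1, 1), (50, 50)]
stored_summary = summarize(rows)   # computed when the range was summarized

rows.remove((50, 50))              # DELETE of a boundary row
# The stored summary is not recomputed on delete, so it stays too wide.
rebuilt_summary = summarize(rows)  # what a full rebuild would store instead

print("stored :", stored_summary)
print("rebuilt:", rebuilt_summary)
```

The stale summary is still correct (it never hides a matching row), it just produces more false positives until the index is rebuilt.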
13. BRIN for PostGIS
• Original idea and POC by Giuseppe Broccolo
• C implementation and debugging at the OSGeo Code Sprint in Paris
– Bug in PostGIS fixed (cast to BOX3d)
– Thanks a lot to Ronan Dunklau
• Code cleanup and some improvements since
• Committed in the PostGIS repo on July 31st (2b3c01b)
• Bug found a few days ago
– Fix already submitted, see trac issue #3665
– Wait for PostGIS 2.3.1
14. BRIN for PostGIS – features
• Use PostGIS infrastructure for storing bounding box
– Some helper functions and casts added
• 3 opclass
– 2D (default), 3D, 4D geometry
– geography
– box2d/box3d (cross-operators defined in the opfamily)
• Operators: &&, @, ~ (2D), &&& (3D, 4D)
• No kNN support, since BRIN doesn't handle kNN
15. BRIN for PostGIS – storage datatype
• Storing only the bounding box, float-precision (same as GiST)
– gidx (3D, 4D geometry)
– box2df (2D geometry, geography)
• Fixed length, and keeps index as small as possible
• Allows indexing of big geometries ( > 8kB)
• CPU saving, comparing bounding boxes is really cheap
– Check done twice with BRIN (Bitmap Index Scan and Bitmap Heap Scan)
• Recheck with exact arithmetic needed for exact results
– Hopefully most rows are discarded by the index
– done in st_* functions (same as GiST)
16. BRIN for PostGIS – C infrastructure
• Use the BRIN inclusion infrastructure as much as possible
– support functions don’t handle storing a different datatype
• A new “CAST” support function would be nice :)
• brin_inclusion_add_value() needs to be redefined
– Datatype is known, save indirect function calls
– Take a geometry / geography and store (or merge) its bounding box
• Fixed number of dimensions, depending on operator class
• Handle NULL and empty geometries
• Still need to import 3 private defines
– #define INCLUSION_UNION 0
– #define INCLUSION_UNMERGEABLE 1
– #define INCLUSION_CONTAINS_EMPTY 2
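The behaviour described above (store or merge a bounding box, handle NULL and empty geometries) can be sketched in Python. This is a loose illustration, not the PostGIS C code: the dictionary layout, the has_nulls flag and the function signature are assumptions, and only the three slot numbers come from the defines listed on the slide.

```python
INCLUSION_UNION = 0           # slot holding the merged bounding box
INCLUSION_UNMERGEABLE = 1     # slot flagging values that cannot be merged
INCLUSION_CONTAINS_EMPTY = 2  # slot flagging that an empty value was seen

def new_summary():
    """Hypothetical per-range summary: a null flag plus the three slots."""
    return {"has_nulls": False, "values": [None, False, False]}

def contains(outer, inner):
    """True when box outer fully covers box inner."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def add_value(summary, box, is_null=False, is_empty=False):
    """Merge one row into the summary; return True if the summary changed."""
    values = summary["values"]
    if is_null:
        changed = not summary["has_nulls"]
        summary["has_nulls"] = True
        return changed
    if is_empty:
        changed = not values[INCLUSION_CONTAINS_EMPTY]
        values[INCLUSION_CONTAINS_EMPTY] = True
        return changed
    current = values[INCLUSION_UNION]
    if current is None:            # first bounding box seen in this range
        values[INCLUSION_UNION] = box
        return True
    if contains(current, box):     # already covered: nothing to write
        return False
    values[INCLUSION_UNION] = (min(current[0], box[0]), min(current[1], box[1]),
                               max(current[2], box[2]), max(current[3], box[3]))
    return True

s = new_summary()
print(add_value(s, (0, 0, 1, 1)), add_value(s, (0.2, 0.2, 0.5, 0.5)),
      add_value(s, (2, 2, 3, 3)), s["values"][INCLUSION_UNION])
```

Returning whether the summary changed lets the caller skip rewriting an index entry when a new row already falls inside the stored box.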
17. BRIN for PostGIS – What's next?
• Handle more operators for 3D and 4D (with SFCGAL lib)
• kNN? Would require new infrastructure in PostgreSQL
[Figure: grid of 8kB table blocks illustrating a kNN query: ORDER BY geoms <-> point LIMIT N]
18. Benchmarks:
• A simple test on a sorted dataset
• OSM data
– Speed up searches of lines intersecting the buffer of a set of points
• LiDAR dataset
– Speed up searches of 3D points with X,Y coordinates included inside a 2D polygon
19. 1st example: “sorted” geometry points
~10M, 3D, SRID=0, distribution range: 0÷100k units
=# CREATE INDEX idx_gist ON points USING gist
-# (geom gist_geometry_ops_nd);
=# CREATE INDEX idx_brin_128 ON points USING brin
-# (geom brin_geometry_inclusion_ops_3d);
=# CREATE INDEX idx_brin_10 ON points USING brin
-# (geom brin_geometry_inclusion_ops_3d)
-# WITH (pages_per_range=10);
Sizes:
BRIN(128)/GiST 1/2000
BRIN(10)/GiST 1/1000
Building times:
BRIN(128)/GiST 1/30
BRIN(10)/GiST 1/20
20. 1st example: “sorted” geometry points
~10M, 3D, SRID=0, distribution range: 0÷100k units
=# SELECT * FROM points
-# WHERE 'BOX3D(10 10 10, 12 12 12)'::box3d &&& geom;
Execution times:
BRIN(128)/GiST 50/1
BRIN(10)/GiST 6/1
BRIN(128)/SeqScan 1/60
BRIN(10)/SeqScan 1/350
21. 2nd example: “unsorted” geometry points
~10M, 3D, SRID=0, distribution range: 0÷100k units
=# CREATE TABLE unsorted_points AS
-# SELECT * FROM points ORDER BY random();
=# SELECT * FROM unsorted_points
-# WHERE 'BOX3D(10 10 10, 12 12 12)'::box3d &&& geom;
Execution times:
BRIN(128)/GiST 2800/1
BRIN(10)/GiST 400/1
BRIN(128)/SeqScan 1/1
BRIN(10)/SeqScan 1/12
23. ...what about INSERT’s?
INSERT INTO points
SELECT ST_MakePoint(a, a, a)
FROM generate_series(1, 1000) AS f(a);
GiST: 6 times slower than insertions without an index
BRIN (10): 3 times slower than insertions without an index
BRIN (128): 2 times slower than insertions without an index
...blocks may be “not sorted” anymore!
24. Earthquakes in the USA in 2016
http://earthquake.usgs.gov/ : ~100k lat/lon WGS84 points (earthquakes in the world)
http://www.openstreetmap.org/ : ~1M EPSG 102008 linestrings (railways in the west side)
Which railway was close (<10km) to an earthquake epicenter?
25. Earthquakes in the USA in 2016
GiST
=# CREATE INDEX idx_rl_gist ON westrailways USING gist
-# (ST_Buffer(GEOGRAPHY(ST_Transform(geom, 4326)), 10000));
=# CREATE INDEX idx_eq_gist ON worldearthquakes USING gist (coord);
BRIN
=# CREATE INDEX idx_rl_brin ON westrailways USING brin
-# (ST_Buffer(GEOGRAPHY(ST_Transform(geom, 4326)), 10000))
-# WITH (pages_per_range=10);
=# CREATE INDEX idx_eq_brin ON worldearthquakes USING brin (coord)
-# WITH (pages_per_range=10);
BRIN/GiST Sizes ~1/100
BRIN/GiST Times ~1/200
26. Earthquakes in the USA in 2016
=# WITH us_eq AS (
-# SELECT coord FROM world
-# WHERE coord && 'BOX2D(-126.90 49.73, -65.83 24.73)'::box2d
-# ) SELECT * FROM railways r, us_eq u
-# WHERE ST_Buffer(GEOGRAPHY(ST_Transform(geom, 4326)), 10000) && u.coord;
without indexes: ~60s
with GiST: ~20ms
with BRIN: ~400ms
BRIN: 20X slower than GiST – 150X faster than full scan
27. LiDAR dataset
• Build a Digital Elevation Model (DEM) of the terrain from a set of points (point cloud)
• Resolution: ~1÷50 “# of returns”/m² → billions of points!
28. Relational approach to LiDAR data with PG
• PostgreSQL 9.3 + pg_pointcloud + PostGIS
• GiST indexing in PostGIS, achieved performance:
– RAM 16GB, 1 billion points (~80GB)
– index size ~O(table size)
– Index was used:
• up to ~300M points in bbox inclusion searches
• up to ~10M points in kNN searches
LiDAR size: ~O(10^9÷10^11) → few % can be properly indexed!
http://www.slideshare.net/GiuseppeBroccolo/gbroccolo-foss4-geugeodbindex
29. The LiDAR dataset: the ahn2 project
• 3D point cloud, coverage: almost the whole Netherlands
– EPSG: 28992, 8÷25 points/m²
• 1.6TB, ~250G points in ~560M patches (compression: ~10x)
– .las files imported through PDAL driver – filter.chipper
• available RAM: 16GB
• Database based on pg_pointcloud extension
– the point structure:
X    Y    Z    scan  LAS   time  RGB   chipper
32b  32b  32b  40b   16b   64b   48b   32b
the “indexed” part (can be converted to PostGIS datatype)
(thanks to Tom Van Tilburg)
30. Typical searches on ahn2 - x3d_viewer
• Intersection with a polygon (PostGIS)
– the part that takes the longest: indexes are needed here!
• Patch “explosion” + NN sorting (pg_PointCloud+PostGIS)
• DEM: constrained Delaunay triangulation (SFCGAL)
31. All in the DB & just with one query!
WITH patches AS (
  SELECT patches FROM ahn2
  WHERE patches && ST_GeomFromText('POLYGON(...)')
), points AS (
  SELECT ST_Explode(patches) AS points
  FROM patches
), sorted_points AS (
  SELECT points,
    (ST_DumpPoints(ST_GeomFromText('POLYGON(...)'))).geom AS poly_pt
  FROM points ORDER BY points <#> poly_pt LIMIT 1
), sel AS (
  SELECT points FROM sorted_points
  WHERE points && ST_GeomFromText('POLYGON(...)')
)
SELECT ST_Dump(ST_Triangulate2DZ(ST_Collect(points))) FROM sel;
32. patches && polygons - GiST performance
index building: GiST 2.5 days
index size: GiST 29GB
searches based on GiST:
polygon size   timing
~O(10m)        ~20ms
~O(100m)       ~60ms
~O(1km)        ~3s
~O(10km)       hours
index not contained in RAM anymore (~5G points → ~3%)
33. patches && polygons - BRIN performance
index building: BRIN 4 h
index size: BRIN 15MB
searches based on BRIN:
polygon size   timing
~O(10m)        ~150s
34. patches && polygons - BRIN performance
How data was inserted...
[Figure: LAS files, chipper, PDAL driver (parallel processes), ahn2]
35. Spatial sorting of a LiDAR dataset
CREATE INDEX patch_geohash ON ahn2
USING btree (ST_GeoHash(ST_Transform(Geometry(patch), 4326), 20));
CLUSTER ahn2 USING patch_geohash;
Morton order [http://geohash.org/]
Needs free space of more than 2X the table size
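The Morton (Z-order) idea behind geohash sorting can be sketched in a few lines of Python. This is an illustration on a small integer grid, not what ST_GeoHash does internally: geohash applies the same kind of bit interleaving to longitude and latitude and encodes the result as base32 text. The function name interleave_bits is hypothetical.

```python
def interleave_bits(x, y, bits=16):
    """Morton (Z-order) key: bit i of x goes to bit 2i, bit i of y to bit 2i+1."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

# Sort the cells of a 4x4 grid by their Morton key.
cells = [(x, y) for y in range(4) for x in range(4)]
ordered = sorted(cells, key=lambda c: interleave_bits(c[0], c[1]))
print(ordered)
```

Cells that are close in 2D tend to be close in the sorted sequence, so rows stored in this order give each block range a compact bounding box, which is what a BRIN inclusion summary needs.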
36. Spatial sorting of a LiDAR dataset
CREATE TABLE ahn2_sorted AS
SELECT * FROM ahn2
ORDER BY ST_GeoHash(ST_Transform(Geometry(patch), 4326), 10)
COLLATE "C";
Morton order [http://geohash.org/]
Just needs space for a second table
37. Spatial sorting of a LiDAR dataset
Building ahn2_sorted took 45 hours
patches && polygons – BRIN performancepatches && polygons – BRIN performance
polygon
size
BRIN
timing
GiST
timing
~O(10m) ~380ms ~20ms
~O(100m) ~400ms ~60ms
~O(1Km) ~2.7s ~3s
~O(10km) ~4.7s hours
• After sorting: 150s→380ms
• GiST X20 faster than BRIN [r~O(10m)]
– BRIN X200 faster than full scan
• BRIN X1000 faster than full scan [r~O(100m)]
• BRIN ~ GiST [r~O(1km)]
• Only BRIN is usable for searches with r>1km
39. Is the drop in performance acceptable?
BRIN searches = x20 GiST scan
patch explosion + NN sorting = x10÷x100 GiST scan
constrained triangulation = x10÷x100 GiST scan
41. Conclusions
• BRINs can be successfully used in geo DBs based on PostgreSQL
– full support for PostGIS datatypes since PostGIS 2.3.0
– easier index maintenance, low indexing time
– less specific than GiST... but not too much!
• Really small indexes!
– GiST performance drops as soon as the index cannot be fully contained in RAM
• Can PostgreSQL be used to manage LiDAR data?
– Yes, at least for bbox inclusion searches
– make sure the data keeps the same sequential order as the .las files