Slides from my Introduction to PostGIS workshop at the FOSS4G conference in 2009. The material is available at http://revenant.ca/www/postgis/workshop/
Workshop Format Interactive C:\workshops\intro_to_postgis http://localhost/postgis/workshop Hands On All examples are executable Copy and Paste from HTML Exercises Answers will be provided (eventually) Try This sections for greater challenge
What is a Spatial Database? Spatial Data Types Point a single coordinate of two to four dimensions
What is a Spatial Database? Spatial Data Types Linestring a set of two or more coordinates linear interpretation of path between coordinates
What is a Spatial Database? Spatial Data Types Linearring a linestring with three or more coordinates the start and end points are the same
What is a Spatial Database? Spatial Data Types Polygon a set of one or more linearrings one ring defines the exterior boundary remainder defines the holes in the polygon
What is a Spatial Database? Spatial Data Types Multi-geometries (Multipoint, Multilinestring, Multipolygon) a set of like geometries
What is a Spatial Database? Spatial Data Types Geometrycollection a set of various (unmatched) geometries
What is a Spatial Database? Spatial Data Types Spatial Indexing R-tree Quadtree Grid-based
What is a Spatial Database? Spatial Data Types Spatial Indexing Spatial Functions Construction Serialisation Predicates Analysis Accessors Builders Aggregates
What is PostGIS? Spatial Extensions for PostgreSQL Provides Spatial Data Type Provides Spatial Indexing Provides Spatial Functions
What is PostGIS? Spatial Extensions for PostgreSQL PostgreSQL Extensions for Spatial ACID transaction guarantees Enterprise reliability Crash recovery Hot backup Replication SQL support
PostGIS History Initially released May 2001 with only load/store and index support. Functions added based on Simple Features for SQL (SFSQL) UMN MapServer added PostGIS support in mid-2001 Geometry Engine, Open Source (GEOS) was released, providing the hard SFSQL functions PostGIS 1.0 provided a faster, lightweight geometry object
Who uses PostGIS? Institut Geographique National, France National mapping agency of France Stores high-res topographic data GlobeXplorer Provides web-based access to petabytes of imagery PostGIS is used to manage metadata and search for relevant imagery
PostgreSQL Setup Installing PostgreSQL Software provided in C:\workshops\PostGIS\software Double click postgresql-8.1.1-1-windows.exe Creating a Spatial Database Using pgAdmin III Loading Spatial Data The horrors of the command line
Within a Distance SELECT namelow FROM jacksonco_streets, medford_parks WHERE ST_Distance(medford_parks.the_geom, jacksonco_streets.the_geom) < 5000 / 0.3048 AND medford_parks.name = 'Hawthorne Park / Pool';
Within a Distance SELECT namelow FROM jacksonco_streets, medford_parks WHERE ST_DWithin(medford_parks.the_geom, jacksonco_streets.the_geom, 5000 / 0.3048) AND medford_parks.name = 'Hawthorne Park / Pool';
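The `5000 / 0.3048` term in the queries above expresses a distance of 5000 metres in feet, on the assumption (suggested by the exercises, which quote distances in feet) that the workshop data is stored in a foot-based projection. A minimal sketch of that arithmetic in plain Python; the function name is made up for illustration:

```python
# One international foot is exactly 0.3048 metres.
METRES_PER_FOOT = 0.3048

def metres_to_feet(metres: float) -> float:
    """Convert a distance in metres to feet (hypothetical helper)."""
    return metres / METRES_PER_FOOT

# The search radius used in the slide queries: 5000 m expressed in feet.
radius_in_feet = metres_to_feet(5000)
print(round(radius_in_feet, 1))  # 16404.2
```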
Spatial Indexing SELECT namelow FROM jacksonco_streets, medford_parks WHERE jacksonco_streets.the_geom && medford_parks.the_geom AND medford_parks.name = 'Hawthorne Park / Pool';
Spatial Indexing SELECT namelow FROM jacksonco_streets, medford_parks WHERE ST_DWithin(medford_parks.the_geom, jacksonco_streets.the_geom, 5000 / 0.3048) AND medford_parks.name = 'Hawthorne Park / Pool';
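The idea behind the two queries above can be sketched outside the database: a bounding-box test (what `&&` does) is cheap and can be answered from the index, and a distance search can widen one box by the search radius before comparing. This is only an illustration in plain Python with made-up coordinates and helper names, not PostGIS code:

```python
def bbox(points):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) tuples."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def overlaps(a, b):
    """True when two boxes share at least one point (the idea behind &&)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def expand(box, distance):
    """Grow a box by `distance` on every side."""
    return (box[0] - distance, box[1] - distance,
            box[2] + distance, box[3] + distance)

# Hypothetical geometries: a square park and two streets.
park = [(0, 0), (10, 0), (10, 10), (0, 10)]
near_street = [(12, 0), (12, 10)]
far_street = [(100, 100), (120, 100)]

print(overlaps(bbox(park), bbox(near_street)))             # False
print(overlaps(expand(bbox(park), 5), bbox(near_street)))  # True
print(overlaps(expand(bbox(park), 5), bbox(far_street)))   # False
```

Rows that pass the box test are only candidates; an exact distance check on the real geometries still has to follow.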
Creating Indices CREATE INDEX jacksonco_schools_gix ON jacksonco_schools USING GIST (the_geom); CREATE INDEX jacksonco_taxlots_gix ON jacksonco_taxlots USING GIST (the_geom); CREATE INDEX medford_buildings_gix ON medford_buildings USING GIST (the_geom); CREATE INDEX medford_citylimits_gix ON medford_citylimits USING GIST (the_geom); CREATE INDEX medford_hydro_gix ON medford_hydro USING GIST (the_geom); CREATE INDEX medford_parks_gix ON medford_parks USING GIST (the_geom); CREATE INDEX medford_planzone_gix ON medford_planzone USING GIST (the_geom); CREATE INDEX medford_stormdrain_gix ON medford_stormdrain USING GIST (the_geom); CREATE INDEX medford_wards_gix ON medford_wards USING GIST (the_geom); CREATE INDEX medford_wetlands_gix ON medford_wetlands USING GIST (the_geom); CREATE INDEX medford_zoning_gix ON medford_zoning USING GIST (the_geom); CREATE INDEX tracts_gix ON tracts USING GIST (the_geom);
Vacuum Vacuum Recover or reuse disk space from obsolete rows Analyze Update query planner statistics Cluster Rewrite tables based on index ordering
Spatial Joins – Within SELECT name FROM jacksonco_schools, medford_citylimits WHERE ST_Within(jacksonco_schools.the_geom, medford_citylimits.the_geom);
Spatial Joins – Intersect SELECT SUM(ST_Length(jacksonco_streets.the_geom)) FROM jacksonco_streets, medford_citylimits WHERE ST_Intersects(jacksonco_streets.the_geom,medford_citylimits.the_geom);
Spatial Joins – Intersect SELECT jacksonco_schools.name, white_pop_1race * 1.0 / total_pop AS white_pop, black_pop_1race * 1.0 / total_pop AS black_pop, aindian_1race * 1.0 / total_pop AS indian_pop, asian_1race * 1.0 / total_pop AS asian_pop, hawaiian_1race * 1.0 / total_pop AS hawaiian_pop FROM tracts, race, jacksonco_schools WHERE tracts.ctidfp00 = race.geography_id2 AND ST_Intersects(tracts.the_geom, jacksonco_schools.the_geom);
Exercises How big is the largest building in Medford in square feet? SELECT ST_Area(the_geom) AS area FROM medford_buildings ORDER BY area DESC LIMIT 1; In square metres? SELECT ST_Area(ST_Transform(the_geom, 2839)) AS area FROM medford_buildings ORDER BY area DESC LIMIT 1;
Exercises What is the elevation of the 'South Medford' high school building? SELECT medford_buildings.elevation FROM medford_buildings, jacksonco_schools WHERE ST_Within(jacksonco_schools.the_geom, medford_buildings.the_geom) AND jacksonco_schools.name = 'South Medford';
Exercises What are the expected percentages of children in poverty at each school with a Kindergarten class? SELECT jacksonco_schools.name, poverty.poverty_level_under5years FROM jacksonco_schools, tracts, poverty WHERE tracts.ctidfp00 = geography_id2 AND ST_Within( jacksonco_schools.the_geom, tracts.the_geom ) AND jacksonco_schools.grade ~ 'K';
Exercises How much park area is there within the Medford city limits? SELECT SUM(ST_Area(medford_parks.the_geom)) FROM medford_parks, medford_citylimits WHERE ST_Intersects(medford_parks.the_geom, medford_citylimits.the_geom);
Exercises How many buildings are located within wetlands? SELECT count(*) FROM medford_buildings, medford_wetlands WHERE ST_Within(medford_buildings.the_geom, medford_wetlands.the_geom);
Exercises Which school is farthest from a park? SELECT jacksonco_schools.name, ST_Distance(jacksonco_schools.the_geom, medford_parks.the_geom) AS distance FROM jacksonco_schools, medford_parks ORDER BY distance desc LIMIT 1; Which is closest? SELECT jacksonco_schools.name, ST_Distance(jacksonco_schools.the_geom, medford_parks.the_geom) AS distance FROM jacksonco_schools, medford_parks ORDER BY distance asc LIMIT 1;
Exercises Which schools have the most park area within 400 feet? SELECT jacksonco_schools.name, SUM(ST_Area(medford_parks.the_geom)) AS area FROM jacksonco_schools, medford_parks WHERE ST_DWithin(jacksonco_schools.the_geom, medford_parks.the_geom, 400) GROUP BY jacksonco_schools.name ORDER BY area DESC; Within 1 km?
Exercises Which schools have the most park area within 400 feet? Within 1 km? SELECT jacksonco_schools.name, SUM(ST_Area(medford_parks.the_geom)) AS area FROM jacksonco_schools, medford_parks WHERE ST_DWithin(ST_Transform(jacksonco_schools.the_geom, 2839), ST_Transform(medford_parks.the_geom, 2839), 1000) GROUP BY jacksonco_schools.name ORDER BY area DESC;
Exercises Which schools have the most park area within 400 feet? Within 1 km? SELECT jacksonco_schools.name, SUM(ST_Area(medford_parks.the_geom)) AS area FROM jacksonco_schools, medford_parks WHERE ST_DWithin(jacksonco_schools.the_geom, medford_parks.the_geom, 1000 / 0.3048) GROUP BY jacksonco_schools.name ORDER BY area DESC;
Exercises What are the expected percentages of unmarried families for each school? SELECT s.name, 100.0 * t.hh_unmarried / t.hh_total FROM jacksonco_schools s, unmarriedbytract t WHERE ST_Contains(t.the_geom, s.the_geom);
Exercises How many storm drains are within 500 feet of 'Bear Creek'? SELECT count(*) FROM medford_stormdrain, medford_hydro WHERE ST_DWithin(medford_hydro.the_geom, medford_stormdrain.the_geom, 500) AND medford_hydro.stream_nam = 'Bear Creek';
Tuning PostgreSQL for Spatial C:\Program Files\PostgreSQL\8.4\data\postgresql.conf pgAdmin provides a Configuration Editor File -> Open postgresql.conf
shared_buffers Determines the amount of memory that is shared by back-end processes Default Value = 32MB Recommended Value = 500MB
work_mem Defines the amount of memory that a single process can use for sorting or hash operations Default Value = 1MB Recommended Value = 16MB
maintenance_work_mem Defines the amount of memory used for maintenance operations, such as vacuuming, index and foreign key creation. Default Value = 16MB Recommended Value = 16MB Can be set per-session before specific operations SET maintenance_work_mem TO '128MB'; VACUUM ANALYZE; SET maintenance_work_mem TO '16MB';
wal_buffers Amount of memory used by the write-ahead log (WAL). Default Value = 64kB Recommended Value = 1MB
checkpoint_segments Sets the number of log file segments that can be filled before the WAL logs are flushed to disk. Default Value = 3 Recommended Value = 6
seq_page_cost Represents the cost of a sequential page access from disk. Default Value = 1.0 Recommended Value = 1.0
Query Plans Set of steps that PostgreSQL can use to generate the results of a query Multiple query plans are produced, costed and selected Cost is based on configuration parameters such as random_page_cost and seq_page_cost PostgreSQL and pgAdmin provide a way to view the victorious query plan
Test Query SELECT namelow FROM jacksonco_streets, medford_citylimits WHERE ST_Intersects(jacksonco_streets.the_geom, medford_citylimits.the_geom) GROUP BY namelow;
Sequential Scan Linear scan of every row in the table (medford_citylimits) Can evaluate filter conditions on scan
Index Scan Linear scan of an index (jacksonco_streets_gix) Evaluates the bounding box comparison during scan (jacksonco_streets.the_geom && medford_citylimits.the_geom) Comparison is evaluated for each result of the previous scan Execution time overlaps the sequential scan
Nested Loop Performs the join between the two scans One (sequential scan) is the outer loop Other (index scan) is the inner loop Further filter is evaluated Execution time includes index scan
Hash Aggregate Performs the grouping based on an attribute Only available for attributes with a hashing algorithm Executes after the nested loop completes
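The plan steps above can be mimicked in a few lines of plain Python. This is a loose sketch with made-up rows: a list stands in for each table, a bounding-box test stands in for both the index condition and the exact `ST_Intersects` filter, and a dictionary stands in for the hash table. It is not how PostgreSQL is implemented.

```python
from collections import defaultdict

def boxes_overlap(a, b):
    """Boxes are (xmin, ymin, xmax, ymax); stands in for the && comparison."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def nested_loop(outer_rows, inner_rows):
    """Join step: for each outer row, probe the inner rows for matches."""
    for limit_box in outer_rows:            # outer loop: scan of city limits
        for name, street_box in inner_rows:  # inner loop: stands in for index scan
            if boxes_overlap(street_box, limit_box):
                yield name

def hash_aggregate(names):
    """Grouping step: collapse duplicate names using a hash table."""
    groups = defaultdict(int)
    for name in names:
        groups[name] += 1
    # Sorted only to make the printed result stable; GROUP BY promises no order.
    return sorted(groups)

# Hypothetical data: one city-limits box and four street segments.
city_limits = [(0, 0, 10, 10)]
streets = [
    ("main st", (1, 1, 4, 1)),
    ("main st", (4, 1, 9, 1)),
    ("oak ave", (8, 8, 12, 12)),
    ("far rd", (50, 50, 60, 60)),
]

print(hash_aggregate(nested_loop(city_limits, streets)))  # ['main st', 'oak ave']
```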
Visualisation uDig is used for visualisation It is available in c:\workshops\PostGIS\software\udig Double-click on udig.bat
Validity Simple Features for SQL provides strict definitions of a valid feature Some functions require validity to perform as expected ST_IsValid(geometry) is provided to ensure feature validity SELECT count(*), ST_IsValid(the_geom) FROM jacksonco_taxlots GROUP BY ST_IsValid;
Fixing Validity UPDATE jacksonco_taxlots SET the_geom = ST_Multi(ST_Buffer(the_geom, 0)); SELECT count(*), ST_IsValid(the_geom) FROM jacksonco_taxlots GROUP BY ST_IsValid;
Exactly Equal Point-by-point comparison of two geometries SELECT a.name, b.name, CASE WHEN a.poly ~= b.poly THEN 'Exactly Equal' ELSE 'Not Exactly Equal' end FROM polygons as a, polygons as b;
Spatially Equal Tests the topology of two geometries for equality SELECT a.name, b.name, CASE WHEN ST_Equals(a.poly, b.poly) THEN 'Spatially Equal' ELSE 'Not Equal' end FROM polygons as a, polygons as b;
Equal Bounds Tests for equality of the bounding box SELECT a.name, b.name, CASE WHEN a.poly = b.poly THEN 'Equal Bounds' ELSE 'Non-equal Bounds' end FROM polygons as a, polygons as b;
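The three notions of equality above can be told apart with a small plain-Python sketch. Rings are lists of `(x, y)` tuples and all names are made up for illustration. The `same_ring` check is a deliberate simplification of spatial equality: it only accepts rings that list the same vertices from a different start point or in the opposite direction, whereas a real topological test would also accept, for example, extra collinear vertices.

```python
def exactly_equal(a, b):
    """Point-by-point comparison: same vertices in the same order."""
    return a == b

def bounds(ring):
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))

def equal_bounds(a, b):
    """Only the bounding boxes are compared."""
    return bounds(a) == bounds(b)

def same_ring(a, b):
    """Simplified stand-in for spatial equality of two closed rings."""
    ra, rb = a[:-1], b[:-1]  # drop the repeated closing vertex
    if len(ra) != len(rb):
        return False
    for candidate in (rb, rb[::-1]):          # either direction
        for shift in range(len(candidate)):   # any start vertex
            if candidate[shift:] + candidate[:shift] == ra:
                return True
    return False

square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
rotated = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1)]  # same square, new start
triangle = [(0, 0), (1, 1), (1, 0), (0, 0)]         # same bounds, other shape

print(exactly_equal(square, rotated), same_ring(square, rotated))  # False True
print(equal_bounds(square, triangle), same_ring(square, triangle))  # True False
```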
Advanced Material Advanced Functions Aggregates / Deaggregates Processing Set Operations Performance Tools Manipulating the Query Planner Denormalization Data Partitioning
Using uDig uDig lacks a dynamic query capability Queries can be viewed by creating views CREATE VIEW example1 AS SELECT * FROM (VALUES (ST_GeomFromText( 'MULTIPOLYGON(((-77 56,-52 18,-88 -27,-10 -13,-11 38,-77 56)))')), (ST_GeomFromText('MULTIPOLYGON(((-49 63,-32 24,-39 -7,-66 -19,-72 -9,-74 31,-49 63)))'))) AS query(the_geom); SELECT populate_geometry_columns();
ST_Union(geometry) Merges geometries into a single (often multi-) geometry Supports aggregate form as well as: ST_Union(geomA, geomB) ST_Union(geomArray[ ])
ST_Union(geometry) SELECT ST_AsText(ST_Union(st_geomfromtext)) FROM (SELECT ST_GeomFromText( 'MULTIPOLYGON(((-77 56,-52 18,-88 -27,-10 -13,-11 38,-77 56)))') UNION ALL SELECT ST_GeomFromText( 'MULTIPOLYGON(((-49 63,-32 24,-39 -7,-66 -19,-72 -9,-74 31,-49 63)))') ) as a;
ST_Collect(geometry) Returns a single multi-geometry or collection, but performs no merging of geometries Faster than ST_Union Supports aggregate form as well as: ST_Collect(geomA, geomB) ST_Collect(geomArray[ ])
ST_Collect(geometry) SELECT ST_AsText(ST_Collect(the_geom)) FROM (SELECT 'LINESTRING(0 0, 0 1)'::geometry the_geom UNION ALL SELECT 'LINESTRING(1 0, 1 1)'::geometry the_geom UNION ALL SELECT 'LINESTRING(0 0,1 0)'::geometry the_geom UNION ALL SELECT 'LINESTRING(1 1, 0 1)'::geometry the_geom) as a;
ST_Polygonize(geometry) SELECT ST_AsText(ST_Polygonize(the_geom)) FROM (SELECT 'LINESTRING(0 0, 0 1)'::geometry the_geom UNION ALL SELECT 'LINESTRING(1 0, 1 1)'::geometry the_geom UNION ALL SELECT 'LINESTRING(0 0,1 0)'::geometry the_geom UNION ALL SELECT 'LINESTRING(1 1, 0 1)'::geometry) as a;
ST_Dump(geometry) Splits multi-geometries and collections into a set of simple geometries Inverse of ST_Collect (ish) Provides an index of the geometry within the collection (path) and the geometry itself (ST_Dump(the_geom)).geom (ST_Dump(the_geom)).path[1]
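The path and geom pairing can be pictured with a small plain-Python sketch, in which a nested list stands in for a collection and a WKT string stands in for a simple geometry. The representation and function name are assumptions made for illustration only; the 1-based path follows the `path[1]` usage on the slide.

```python
def dump(geometry, path=()):
    """Yield (path, geometry) for every simple geometry in a nested collection."""
    if isinstance(geometry, list):
        # A collection: recurse into each member, extending the 1-based path.
        for position, member in enumerate(geometry, start=1):
            yield from dump(member, path + (position,))
    else:
        # A simple geometry: emit it with the path that led here.
        yield path, geometry

collection = ["POINT(0 0)", ["POINT(1 1)", "POINT(2 2)"]]
for path, geom in dump(collection):
    print(path, geom)
# (1,) POINT(0 0)
# (2, 1) POINT(1 1)
# (2, 2) POINT(2 2)
```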
ST_DumpRings(geometry) Returns a set of polygons without holes Each polygon is one of the rings of the input polygon Also includes path and geom components (ST_DumpRings(geom)).geom (ST_DumpRings(geom)).path[1]
Set Operations Produce results based on inclusion or exclusion of points from a geometry A geometry includes all points on or within its boundary Excludes all other points
Set Operations Produce results based on inclusion or exclusion of points from a geometry Point includes only the point itself
Set Operations Produce results based on inclusion or exclusion of points from a geometry Linestring includes the two end points and all points along its length
Set Operations Produce results based on inclusion or exclusion of points from a geometry Polygon Includes all exterior and interior rings Includes all points contained within the exterior ring and not contained within the interior rings Excludes all points contained within interior rings
ST_Union(geomA, geomB) Same as the ST_Union(geometry) aggregate Any point included in either geometry is included in the result SELECT ST_AsText(ST_Union( ST_GeomFromText( 'MULTIPOLYGON(((-77 56,-52 18,-88 -27,-10 -13,-11 38,-77 56)))'), ST_GeomFromText( 'MULTIPOLYGON(((-49 63,-32 24,-39 -7,-66 -19,-72 -9,-74 31,-49 63)))')));
ST_Difference(geomA, geomB) All points included in geomA that are not included in geomB Non-commutative; the order of geomA and geomB matters SELECT ST_AsText(ST_Difference( ST_GeomFromText( 'MULTIPOLYGON(((-77 56,-52 18,-88 -27,-10 -13,-11 38,-77 56)))'), ST_GeomFromText( 'MULTIPOLYGON(((-49 63,-32 24,-39 -7,-66 -19,-72 -9,-74 31,-49 63)))')));
ST_SymDifference(geomA, geomB) All points that are included in only one of geomA and geomB, but not both ST_Union(ST_Difference(A,B),ST_Difference(B,A)) SELECT ST_AsText(ST_SymDifference( ST_GeomFromText( 'MULTIPOLYGON(((-77 56,-52 18,-88 -27,-10 -13,-11 38,-77 56)))'), ST_GeomFromText( 'MULTIPOLYGON(((-49 63,-32 24,-39 -7,-66 -19,-72 -9,-74 31,-49 63)))')));
ST_Intersection(geomA, geomB) All points included in both geomA and geomB SELECT ST_AsText(ST_Intersection( ST_GeomFromText( 'MULTIPOLYGON(((-77 56,-52 18,-88 -27,-10 -13,-11 38,-77 56)))'), ST_GeomFromText( 'MULTIPOLYGON(((-49 63,-32 24,-39 -7,-66 -19,-72 -9,-74 31,-49 63)))')));
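The point-set view of the four operations above maps directly onto ordinary set algebra. As a rough analogy only, the sketch below replaces each geometry with the set of unit grid cells it covers (two made-up overlapping squares) and uses Python's built-in set operators; real geometries are continuous, so this illustrates the logic rather than how PostGIS computes results.

```python
def cells(xmin, ymin, xmax, ymax):
    """Set of unit grid cells covered by an axis-aligned rectangle."""
    return {(x, y) for x in range(xmin, xmax) for y in range(ymin, ymax)}

geom_a = cells(0, 0, 4, 4)  # 16 cells
geom_b = cells(2, 2, 6, 6)  # 16 cells, overlapping geom_a in 4 cells

union = geom_a | geom_b            # in either
difference = geom_a - geom_b       # in A but not in B
sym_difference = geom_a ^ geom_b   # in exactly one of A and B
intersection = geom_a & geom_b     # in both

print(len(union), len(difference), len(sym_difference), len(intersection))
# 28 12 24 4

# Difference depends on argument order; symmetric difference is the
# union of the two one-sided differences, as stated on the slide.
print(difference == geom_b - geom_a)                                # False
print(sym_difference == (geom_a - geom_b) | (geom_b - geom_a))      # True
```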
ST_Buffer(geometry, distance) Returns a geometry covering all points within the given distance of the input geometry. Can take a third argument defining number of segments used to approximate a quarter circle (defaults to 8) SELECT ST_AsText(ST_Buffer(ST_GeomFromText( 'LINESTRING(-2 -2,-2 2,2 2,2 4)'), 1));
ST_ConvexHull(geometry) Returns a polygon that encloses the input geometry, removing all possible concave angles Analogous to 'shrink wrapping' the geometry SELECT ST_AsText(ST_ConvexHull( 'LINESTRING(-2 -2,-2 2,2 2,2 4)'));
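For readers curious how a hull can be computed, here is a minimal plain-Python sketch using Andrew's monotone chain algorithm on the vertices of the linestring from the slide. The choice of algorithm is an assumption made for illustration; nothing in the workshop says this is what PostGIS or GEOS does internally.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a left (CCW) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:  # build the lower chain left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build the upper chain right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

line = [(-2, -2), (-2, 2), (2, 2), (2, 4)]
print(convex_hull(line))  # [(-2, -2), (2, 2), (2, 4), (-2, 2)]
```

For this particular linestring every vertex lies on the hull, so the result is a four-sided polygon.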
ST_SnapToGrid(...) Snaps every point in the input geometry to the defined grid Allows you to: control the precision of data for reliable comparison reduce size of data Numerous variants to give you what you need
ST_SnapToGrid(...) Numerous variants to give you what you need ST_SnapToGrid(geom, size) ST_SnapToGrid(geom, sizeX, sizeY) ST_SnapToGrid(geom, originX, originY, sizeX, sizeY) ST_SnapToGrid(geom, originPoint, sizeX, sizeY, sizeZ, sizeM)
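Snapping amounts to rounding each coordinate to the nearest multiple of the cell size, measured from the grid origin. A minimal plain-Python sketch of the 2D case, with a hypothetical function name and made-up coordinates:

```python
def snap_to_grid(points, size_x, size_y=None, origin=(0.0, 0.0)):
    """Round every (x, y) to the nearest grid node.

    size_y defaults to size_x, giving a square grid.
    """
    if size_y is None:
        size_y = size_x
    ox, oy = origin
    return [
        (round((x - ox) / size_x) * size_x + ox,
         round((y - oy) / size_y) * size_y + oy)
        for x, y in points
    ]

print(snap_to_grid([(1.234, 5.678)], 0.5))  # [(1.0, 5.5)]

# Two slightly different digitisations of the same point compare equal
# once both are snapped to the same grid.
a = snap_to_grid([(1.01, 2.02)], 0.5)
b = snap_to_grid([(0.99, 1.98)], 0.5)
print(a == b)  # True
```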
ST_Simplify(geom, tolerance) Creates a simpler geometry Simplifications are made to ensure that the new line deviates from the original by less than the tolerance
ST_Simplify(geom, tolerance) The greatest distance between the candidate line and the original line is calculated If the distance is greater than the tolerance, the candidate line is rejected and two candidate lines are created
ST_Simplify(geom, tolerance) Each new candidate line is tested in the same manner as before When the distance is less than the tolerance, the candidate line is accepted
ST_Simplify(geom, tolerance) SELECT ST_AsText(geom) AS original, ST_AsText(ST_Simplify(geom, 3)) AS "3", ST_AsText(ST_Simplify(geom, 2.9)) AS "2.9" FROM ( SELECT ST_GeomFromText('LINESTRING(0 0,3 2.5,0 5)') AS geom) AS a;
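The candidate-line procedure described on the preceding slides is the Douglas-Peucker approach, and it can be sketched in plain Python as below. This is an illustrative implementation under the slide's stated rule (split when the greatest distance is greater than the tolerance), measuring distance to the infinite line through the candidate's end points; it is not PostGIS's own code. The input is the linestring from the query above.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:  # degenerate candidate line
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - (ax - px) * dy) / length

def simplify(points, tolerance):
    """Recursive Douglas-Peucker simplification of a list of (x, y) tuples."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]  # candidate line: first to last vertex
    index, farthest = max(
        ((i, perpendicular_distance(points[i], a, b))
         for i in range(1, len(points) - 1)),
        key=lambda pair: pair[1],
    )
    if farthest > tolerance:
        # Reject the candidate and test two new candidates split at the
        # farthest vertex.
        left = simplify(points[:index + 1], tolerance)
        right = simplify(points[index:], tolerance)
        return left[:-1] + right
    return [a, b]  # candidate accepted

line = [(0, 0), (3, 2.5), (0, 5)]
print(simplify(line, 3))    # [(0, 0), (0, 5)]
print(simplify(line, 2.9))  # [(0, 0), (3, 2.5), (0, 5)]
```

The middle vertex lies exactly 3 units from the candidate line, which is why the two tolerances in the query straddle that value.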
ST_SimplifyPreserveTopology Same algorithm as ST_Simplify Will not change the type of geometry SELECT ST_AsText(geom) AS original, ST_AsText(ST_Simplify(geom, 2)) AS Simplify, ST_AsText(ST_SimplifyPreserveTopology(geom, 2)) AS PreserveTopology FROM ( SELECT ST_GeomFromText('POLYGON((0 0,0 1,1 1,1 0,0 0))') AS geom) AS a;
Query Planner Manipulation Query planner only considers approved actions Various execution paths can be disabled Cost estimates can be manipulated on a per-session basis
Denormalisation Split feature types into multiple tables based on known or expected access patterns Roads are visualised with different style classes and rendered at different scales
Partitioning Data partitioning complicates things Updates need to be split across all tables Queries need to be directed at the appropriate table(s) Keeping both normalised and denormalised tables creates huge redundancy Partitioning addresses these problems Stores data in denormalised tables Provides a normalised interface to handle queries across the feature type
Fin Workshop Evaluations are Online (url removed) This material is made available under the Creative Commons Attribution-ShareAlike 3.0 licence http://creativecommons.org/licenses/by-sa/3.0/us/