This presentation was given to a company that makes software for churches and is considering a migration from SQL Server to PostgreSQL. It was designed to give a broad overview of PostgreSQL features, with an emphasis on full-text search; datatypes such as hstore, array, XML and JSON, as well as custom datatypes; TOAST compression; and a taste of other interesting features worth following up on.
2. How do you pronounce it?
Thanks to Thad for the suggestion
Answer            Responses   Percentage
post-gres-q-l     2379        45%
post-gres         1611        30%
pahst-grey        24          0%
pg-sequel         50          0%
post-gree         350         6%
postgres-sequel   574         10%
p-g               49          0%
database          230         4%
Total             5267
3. Who is this guy?
• I'm Barry Jones
• Been developing web apps since '98
• Not a DBA
• Performance and infrastructure nut
• Pragmatic Idealist
– Prefer the best solution possible for current circumstances vs the best solution possible at all costs
4. What is PostgreSQL NOT?
• NOT a silver bullet
• NOT the answer to life, the universe and everything
• NOT better at everything than everybody
• NOT always the best option for your needs
• NOT used to its potential in most cases
• NOT owned by Oracle or Microsoft
5. What IS PostgreSQL?
• Fully ACID compliant
• Feature rich and extensible
• Fast, scalable and leverages multicore processors very well
• Enterprise class with quality corporate support options
• Free as in beer
• It's kind of nifty
6. Laundry List of Features
• Multi-version Concurrency Control (MVCC)
• Point in Time Recovery
• Tablespaces
• Asynchronous replication
• Nested Transactions
• Online/hot backups
• Genetic query optimizer
• Multiple index types
• Write ahead logging (WAL)
• Internationalization: character sets, locale-aware sorting, case sensitivity, formatting
• Full subquery support
• Multiple index scans per query
• ANSI-SQL:2008 standard conformant
• Table inheritance
• LISTEN / NOTIFY event system
• Ability to make a PowerPoint slide run out of room
7. What are we covering today?
• Full-text search
• Built in data types
• User defined data types
• Automatic data compression
• A look at some other cool features and extensions, depending on how we're doing on time
8. Full-text Search
• What about...
– Solr
– Elastic Search
– Sphinx
– Lucene
– MySQL
• All have their purpose
– Distributed search of multiple document types: Sphinx
– Client search performance is all that matters: Solr
– Search constantly incoming data with streaming index updates: Elastic Search excels
– You really like Java: Lucene
– You want terrible search results that don't even make sense to you, much less your users: MySQL full text search = the worst thing in the world
9. Full-text Search
• Complications of standalone search engines
– Data synchronization: managing deltas and index updates; filtering, deleting or hiding expired data; search server outages and redundancy
– Learning curve
– Do the character sets match up with my database?
– Additional hardware / servers just for search
– Can feel like a black box when you get a support question asking "why is/isn't this showing up?"
10. Full-text Search
• But what if your needs are more like:
– Search within my database
– Avoid syncing data with outside systems
– Avoid maintaining outside systems
– Less black box, more control
11. Full-text Search
• tsvector
– The text to be searched
• tsquery
– The search query
• to_tsvector('the church is AWESOME') @@ to_tsquery(SEARCH)
• @@ to_tsquery('church') == true
• @@ to_tsquery('churches') == true
• @@ to_tsquery('awesome') == true
• @@ to_tsquery('the') == false
• @@ to_tsquery('churches & awesome') == true
• @@ to_tsquery('church & okay') == false
• to_tsvector('the church is awesome')
– 'awesom':4 'church':2
• to_tsvector('simple','the church is awesome')
– 'awesome':4 'church':2 'is':3 'the':1
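The matching rules on this slide can be tried directly in psql. The following is a minimal sketch, assuming the built-in 'english' and 'simple' configurations; the sample sentence is the one from the slide and the column aliases are illustrative.

```sql
-- The 'english' configuration drops stop words and stems what is left.
SELECT to_tsvector('english', 'the church is AWESOME') AS english_vector;

-- The 'simple' configuration only lowercases; stop words are kept.
SELECT to_tsvector('simple', 'the church is AWESOME') AS simple_vector;

-- Both sides are normalized the same way, so plurals and case do not matter.
SELECT q AS query,
       to_tsvector('english', 'the church is AWESOME')
           @@ to_tsquery('english', q) AS matches
FROM (VALUES ('church'),
             ('churches'),
             ('churches & awesome'),
             ('church & okay')) AS t(q);
```

Naming the configuration explicitly keeps the results independent of the session's default_text_search_config setting.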
12. Full-text Search
• ALTER TABLE mytable ADD COLUMN search_vector tsvector
• UPDATE mytable
SET search_vector = to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,'') || ' ' || coalesce(tags,''))
• CREATE INDEX search_text ON mytable USING gin(search_vector)
• SELECT title, body
FROM mytable
WHERE search_vector @@ to_tsquery('english','Jesus & awesome')
ORDER BY ts_rank(search_vector, to_tsquery('english','Jesus & awesome')) DESC
• CREATE TRIGGER search_update BEFORE INSERT OR UPDATE
ON mytable FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(search_vector, 'pg_catalog.english', title, body, tags)
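The steps on this slide can be strung together into one script. This is a sketch under stated assumptions: the table name articles, its index name and the sample rows are hypothetical, chosen so the script does not collide with an existing mytable; the columns and the built-in trigger function are the ones named on the slide.

```sql
-- Hypothetical table; the slide only names title, body, tags and search_vector.
CREATE TABLE articles (
    id            serial PRIMARY KEY,
    title         text,
    body          text,
    tags          text,
    search_vector tsvector
);

CREATE INDEX articles_search_idx ON articles USING gin (search_vector);

-- The built-in trigger function expects a schema-qualified configuration name.
CREATE TRIGGER articles_search_update
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE PROCEDURE
    tsvector_update_trigger(search_vector, 'pg_catalog.english', title, body, tags);

-- Made-up sample rows.
INSERT INTO articles (title, body, tags) VALUES
    ('Sunday service', 'The choir was awesome this week', 'music worship'),
    ('Parking update', 'The north lot is closed on Sunday', 'facilities');

-- The trigger filled search_vector on insert, so the rows are searchable right away.
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'choir & awesome')
ORDER BY ts_rank(search_vector, to_tsquery('english', 'choir & awesome')) DESC;
```

Because the trigger fires on UPDATE as well, the separate UPDATE statement from the slide is only needed to backfill rows that existed before the trigger was created.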
13. Full-text Search
• CREATE FUNCTION search_trigger() RETURNS trigger AS $$
begin
  new.search_vector :=
    setweight(to_tsvector('english',coalesce(new.title,'')),'A') ||
    setweight(to_tsvector('english',coalesce(new.body,'')),'D') ||
    setweight(to_tsvector('english',coalesce(new.tags,'')),'B');
  return new;
end
$$ LANGUAGE plpgsql;
• CREATE TRIGGER search_vector_update
BEFORE INSERT OR UPDATE OF title, body, tags ON mytable
FOR EACH ROW EXECUTE PROCEDURE search_trigger();
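To see what the weights buy you, the sketch below ranks two made-up documents that both contain the search word, once in the 'A' (title) part and once in the 'D' (body) part. The labels and sample text are illustrative, and ts_rank is called with its default weight settings.

```sql
-- Same word, different weight: 'A' for a title hit, 'D' for a body hit.
SELECT label,
       ts_rank(vec, to_tsquery('english', 'choir')) AS score
FROM (VALUES
    ('title match',
     setweight(to_tsvector('english', 'choir practice'), 'A') ||
     setweight(to_tsvector('english', 'moved to tuesday'), 'D')),
    ('body match',
     setweight(to_tsvector('english', 'schedule change'), 'A') ||
     setweight(to_tsvector('english', 'choir moved to tuesday'), 'D'))
) AS t(label, vec)
ORDER BY score DESC;
```

With the default weights the title match should sort first, which is the effect the trigger on this slide is after.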
14. Full-text Search
• A variety of dictionaries
– Various Languages
– Thesaurus
– Snowball, Stem, Ispell, Synonym
– Write your own
• ts_headline
– Snippet extraction and highlighting
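A small sketch of ts_headline, assuming the built-in 'english' configuration; the sentence and the query are made up for illustration. The StartSel and StopSel options set the markers wrapped around each matching word.

```sql
-- Highlight the words that match the query inside the original text.
SELECT ts_headline(
    'english',
    'The youth group meets every Wednesday evening in the fellowship hall.',
    to_tsquery('english', 'youth & meeting'),
    'StartSel=<b>, StopSel=</b>'
) AS snippet;
```

Note that ts_headline works on the original text rather than on a stored tsvector, so it is usually applied only to the handful of rows that are actually displayed.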
15. Datatypes: ranges
• int4range, int8range, numrange, tsrange, tstzrange, daterange
• SELECT int4range(10,20) @> 3 == false
• SELECT numrange(11.1,22.2) && numrange(20.0,30.0) == true
• SELECT int4range(10,20) * int4range(15,25) == [15,20)
• CREATE INDEX res_index ON schedule USING gist(during)
• ALTER TABLE schedule ADD EXCLUDE USING gist (during WITH &&)
ERROR: conflicting key value violates exclusion constraint "schedule_during_excl"
DETAIL: Key (during)=(["2010-01-01 14:45:00","2010-01-01 15:45:00")) conflicts with existing key (during)=(["2010-01-01 14:30:00","2010-01-01 15:30:00")).
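The error on this slide can be reproduced with a few statements. This is a sketch, assuming a schedule table whose only range column is during, as on the slide; the event column and the sample bookings are hypothetical. The second insert is wrapped in a DO block so the script keeps running after the constraint rejects it.

```sql
-- Range operators: containment, overlap and intersection.
SELECT int4range(10, 20) @> 3                        AS contains_3,
       numrange(11.1, 22.2) && numrange(20.0, 30.0)  AS overlaps,
       int4range(10, 20) * int4range(15, 25)         AS intersection;

-- No two rows may have overlapping "during" ranges.
CREATE TABLE schedule (
    event  text,
    during tsrange,
    EXCLUDE USING gist (during WITH &&)
);

INSERT INTO schedule VALUES ('Choir practice', '[2010-01-01 14:30, 2010-01-01 15:30)');

-- This booking overlaps the first one, so the exclusion constraint rejects it.
DO $$
BEGIN
    INSERT INTO schedule VALUES ('Youth group', '[2010-01-01 14:45, 2010-01-01 15:45)');
EXCEPTION WHEN exclusion_violation THEN
    RAISE NOTICE 'rejected: %', SQLERRM;
END
$$;
```

The exclusion constraint builds its own GiST index, so a separate CREATE INDEX on the same column is only needed for other access patterns.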
16. Datatypes: hstore
• properties
– {"author" => "John Grisham", "pages" => 535}
– {"director" => "Jon Favreau", "runtime" => 126}
• SELECT ... FROM mytable
WHERE properties -> 'director' LIKE '%Favreau'
– Does not use an index
• WHERE properties ? 'author' AND properties -> 'author' LIKE '%Grisham'
– The key-exists test (?) can use an index, so only properties with an 'author' are checked
• CREATE INDEX table_properties ON mytable USING gin(properties)
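A runnable sketch of the hstore queries, assuming the hstore extension from contrib is available to install. The table name media and its index name are hypothetical; the property values are the ones from the slide.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- Hypothetical table holding free-form key/value properties.
CREATE TABLE media (
    id         serial PRIMARY KEY,
    properties hstore
);

CREATE INDEX media_properties ON media USING gin (properties);

INSERT INTO media (properties) VALUES
    ('author => "John Grisham", pages => 535'),
    ('director => "Jon Favreau", runtime => 126');

-- Key-exists test (can use the GIN index), then a pattern match on the value.
SELECT properties -> 'author' AS author
FROM media
WHERE properties ? 'author'
  AND properties -> 'author' LIKE '%Grisham';

-- Exact containment (@>) can use the GIN index as well.
SELECT id
FROM media
WHERE properties @> 'director => "Jon Favreau"';
```

All hstore keys and values are stored as text, so numeric properties such as pages need a cast before they can be compared as numbers.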
18. Datatypes: JSON
• Validate JSON structure
• Convert row to JSON
• Functions and operators very similar to hstore
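A short sketch of the three points on this slide. The sample document is made up, and the ->> operator shown last is an assumption about the server version: it needs PostgreSQL 9.3 or later.

```sql
-- Casting to json validates the text; malformed input raises an error.
SELECT '{"name": "First Church", "members": 250}'::json AS valid_document;

-- row_to_json turns a whole row into a JSON object keyed by column name.
SELECT row_to_json(t) AS as_json
FROM (SELECT 1 AS id, 'First Church'::text AS name) AS t;

-- Operators mirror hstore: -> returns json, ->> returns text.
SELECT '{"name": "First Church", "members": 250}'::json ->> 'name' AS name;
```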
19. Datatypes: XML
• Validates well-formed XML
• Stores like a TEXT field
• XML operations like XPath
• Can't index an XML column, but you can index the result of an XPath function
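The indexing workaround might look like the sketch below. It assumes a server built with libxml support; the documents table, its index name and the sample document are hypothetical.

```sql
-- Hypothetical table with an xml column.
CREATE TABLE documents (
    id   serial PRIMARY KEY,
    body xml
);

INSERT INTO documents (body)
VALUES ('<book><title>The Firm</title><pages>535</pages></book>');

-- xpath() returns an array of xml nodes; take the first one and cast it to text.
SELECT (xpath('/book/title/text()', body))[1]::text AS title
FROM documents;

-- Expression index on the XPath result, since the xml column itself cannot be indexed.
CREATE INDEX documents_title_idx
    ON documents (((xpath('/book/title/text()', body))[1]::text));
```

A query has to repeat the indexed expression exactly for the planner to consider this index.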
20. Data compression with TOAST
• TOAST = The Oversized Attribute Storage Technique
• TOASTable data is automatically TOASTed
• Example:
– stored a 2.2 MB XML document
– storage size was 81 kB
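One way to observe the compression yourself is to compare the logical size of a value with its stored size. This is a sketch with a hypothetical table and an artificially repetitive document, so the ratio it shows will differ from the example on the slide.

```sql
-- Hypothetical table with one large text column.
CREATE TABLE toast_demo (
    id  serial PRIMARY KEY,
    doc text
);

-- A repetitive document of about 2 MB, far larger than a single row can hold inline.
INSERT INTO toast_demo (doc)
SELECT repeat('<member><name>John Smith</name></member>', 50000);

-- octet_length reports the logical size; pg_column_size reports the stored size.
SELECT octet_length(doc)   AS logical_bytes,
       pg_column_size(doc) AS stored_bytes
FROM toast_demo;
```

No configuration is needed for this to happen: text, xml and other variable-length columns are compressed and moved out of line automatically once a row grows too large.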
21. User created datatypes
• Built in types
– Numerics, monetary, binary, time, date, interval, boolean, enumerated, geometric, network address, bit string, text search, UUID, XML, JSON, array, composite, range
– Add-ons provide more, such as UPC and ISBN
• Create your own types
– Address (contains 2 streets, city, state, zip, country)
– Define how your datatype is indexed
– Custom datatypes can use GIN and GiST indexes
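The Address example could be written as a composite type. This is a minimal sketch: the type name, field names, table and sample row are all hypothetical, picked to match the fields listed on the slide.

```sql
-- Hypothetical composite type with the fields listed on the slide.
CREATE TYPE address AS (
    street1 text,
    street2 text,
    city    text,
    state   text,
    zip     text,
    country text
);

-- Hypothetical table using the new type as a column type.
CREATE TABLE congregations (
    name     text,
    location address
);

-- Made-up sample row.
INSERT INTO congregations
VALUES ('First Church', ROW('1 Main St', NULL, 'Greenville', 'SC', '29601', 'US'));

-- Parentheses are required around the column when selecting a single field.
SELECT name, (location).city, (location).zip
FROM congregations;
```

A composite type like this needs no custom index support; the "define how your datatype is indexed" point on the slide applies to base types with their own operator classes.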
22. Further exploration: PostGIS
• Adds Geographic datatypes
• Distance, area, union, intersection, perimeter
• Spatial indexes
• Tools to load available geographic data
• Distance, Within, Overlaps, Touches, Equals, Contains, Crosses
• SELECT name, ST_AsText(geom)
FROM nyc_subway_stations
WHERE name = 'Broad St'
• SELECT name, boroname
FROM nyc_neighborhoods
WHERE ST_Intersects(geom,
ST_GeomFromText('POINT(583571 4506714)',26918))
• SELECT sub.name, nh.name, nh.boroname
FROM nyc_neighborhoods AS nh
JOIN nyc_subway_stations AS sub
ON ST_Contains(nh.geom, sub.geom)
WHERE sub.name = 'Broad St'
23. Further exploration: Functions
• Can be used in queries
• Can be used in stored procedures and triggers
• Can be used to build indexes
• Can be used as table defaults
• Can be written in PL/pgSQL, PL/Tcl, PL/Perl, PL/Python out of the box
• PL/V8 is available as an extension to use JavaScript
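Three of these uses fit in one short script. This is a sketch only: the function normalize_email, the members table and the sample address are hypothetical, and the function is written in plain SQL so no extra language has to be installed.

```sql
-- Hypothetical helper: normalize an email address for comparison.
CREATE FUNCTION normalize_email(addr text) RETURNS text AS $$
    SELECT lower(trim(addr));
$$ LANGUAGE sql IMMUTABLE;

-- Hypothetical table; a function result serves as the column default.
CREATE TABLE members (
    id        serial PRIMARY KEY,
    email     text NOT NULL,
    joined_on date NOT NULL DEFAULT current_date
);

-- Only IMMUTABLE functions are allowed in index expressions.
CREATE UNIQUE INDEX members_email_idx ON members (normalize_email(email));

INSERT INTO members (email) VALUES ('  Pat@Example.org ');

-- Used in a query; the expression matches the index definition.
SELECT id, joined_on
FROM members
WHERE normalize_email(email) = normalize_email('pat@example.org');
```

Because the index is unique on the normalized value, a second insert that differs only in case or surrounding spaces would be rejected.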
24. Further exploration: PLV8
• CREATE OR REPLACE FUNCTION plv8_test(keys text[], vals text[])
RETURNS text AS $$
  var o = {};
  for (var i = 0; i < keys.length; i++) {
    o[keys[i]] = vals[i];
  }
  return JSON.stringify(o);
$$ LANGUAGE plv8 IMMUTABLE STRICT;
SELECT plv8_test(ARRAY['name','age'], ARRAY['Tom','29']);
• CREATE TYPE rec AS (i integer, t text);
CREATE FUNCTION set_of_records() RETURNS SETOF rec AS $$
  plv8.return_next({"i": 1, "t": "a"});
  plv8.return_next({"i": 2, "t": "b"});
$$ LANGUAGE plv8;
SELECT * FROM set_of_records();
25. Further exploration: Async commands / indexes
• Fine-grained control with libpq's asynchronous functions
– PQsendQuery
– PQsendQueryParams
– PQsendPrepare
– PQsendQueryPrepared
– PQsendDescribePrepared
– PQgetResult
– PQconsumeInput
• Per connection asynchronous commits
– set synchronous_commit = off
• Concurrent index creation to avoid blocking large tables
– CREATE INDEX CONCURRENTLY big_index ON mytable (things)
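Asynchronous commit can also be scoped more narrowly than a whole connection. The sketch below uses SET LOCAL to limit it to a single transaction; the audit_log table and its contents are hypothetical stand-ins for data you can afford to lose in a crash.

```sql
-- Hypothetical table for low-value, high-volume writes.
CREATE TABLE audit_log (
    id      serial PRIMARY KEY,
    message text
);

BEGIN;
-- Applies to this transaction only; the session setting is untouched.
SET LOCAL synchronous_commit = off;
INSERT INTO audit_log (message) VALUES ('page viewed');
-- COMMIT returns without waiting for the WAL record to reach disk.
COMMIT;

-- Reports the session value, which the SET LOCAL above did not change.
SHOW synchronous_commit;
```

CREATE INDEX CONCURRENTLY is left out of the script on purpose: it cannot run inside a transaction block, so it has to be issued as a statement of its own.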
27. References / Credits
• NOTE: Some code samples in this presentation have minor alterations for presentation clarity (such as leaving out dictionary specifications on some search calls, etc.)
• http://www.postgresql.org/docs/9.2/static/index.html
• http://workshops.opengeo.org/postgis-intro/
• http://stackoverflow.com/questions/15983152/how-can-i-find-out-how-big-a-large-text-field-is-in-postgres
• https://devcenter.heroku.com/articles/heroku-postgres-extensions-postgis-full-text-search
• http://railscasts.com/episodes/345-hstore?view=asciicast
• http://www.slideshare.net/billkarwin/full-text-search-in-postgresql