On Beyond Data Types
Jonathan S. Katz
PostgreSQL España
February 16, 2015
About
• CTO, VenueBook
• Co-Organizer, NYC PostgreSQL User Group (NYCPUG)
• Director, United States PostgreSQL Association
• First time in Spain!
• @jkatz05
A Brief Note on NYCPUG
• Active since 2010
• Over 1,300 members
• Monthly Meetups
• PGConf NYC 2014
• 259 attendees
• PGConf US 2015:
• Mar 25 - 27 @ New York Marriott Downtown
• Already 160+ registrations
http://www.pgconf.us
Community Updates
Data Types
• Fundamental
• 0 => 1
• 00001111
• Building Blocks
• 0x41424344
• Accessibility
• 1094861636
• 'ABCD'
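A small sketch of the "same bits, different accessibility" point, written in Python purely for illustration (the talk itself uses SQL): the four bytes 0x41 0x42 0x43 0x44 read as hex digits, as a big-endian integer, and as ASCII text.

```python
# The same four bytes, read three different ways.
raw = bytes([0x41, 0x42, 0x43, 0x44])

as_hex = raw.hex()                    # hexadecimal digits
as_int = int.from_bytes(raw, "big")   # big-endian unsigned integer
as_text = raw.decode("ascii")         # ASCII characters

print(f"0x{as_hex} == {as_int} == {as_text!r}")
# prints: 0x41424344 == 1094861636 == 'ABCD'
```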
C
• char
• int
• float, double
• bool (C99)
• (short, long, signed, unsigned)
PostgreSQL
• char, varchar, text
• smallint, int, bigint
• real, double precision
• bool
I kid you not, I can spend close to an hour on just those data types
PostgreSQL Primitives
Oversimplified Summary
• Strings
• Use "text" unless you need an actual length limit on strings; otherwise use "varchar"
• Don't use "char"
• Integers
• Use "int"
• If you seriously have big numbers, use "bigint"
• Numerical types
• Use "numeric" almost always
• If you have an IEEE 754 data source you need to record, use "float"
And If We Had More Time
• (argh no pun intended)
• timestamp with time zone, timestamp without time zone
• date
• time with time zone, time without time zone
• interval
Summary of PostgreSQL Date/Time Types
• They are AWESOME
• Flexible input that you can customize
• Can perform mathematical operations in native format
• Thank you intervals!
• IMO better support than most programming languages have, let alone databases
PostgreSQL is an ORDBMS
• Designed to support more complex data types
• Complex data types => additional functionality
• Data Integrity
• Performance
Let's Start Easy: Geometry
PostgreSQL Geometric Types
Name | Size | Representation | Format
point | 16 bytes | point on a plane | (x,y)
lseg | 32 bytes | finite line segment | ((x1, y1), (x2, y2))
box | 32 bytes | rectangular box | ((x1, y1), (x2, y2))
path | 16 + 16n bytes | closed path (similar to polygon), n = total points | ((x1, y1), (x2, y2), …, (xn, yn))
path | 16 + 16n bytes | open path, n = total points | [(x1, y1), (x2, y2), …, (xn, yn)]
polygon | 40 + 16n bytes | polygon | ((x1, y1), (x2, y2), …, (xn, yn))
circle | 24 bytes | circle (center point and radius) | <(x, y), r>
http://www.postgresql.org/docs/current/static/datatype-geometric.html
Geometric Operators
• 31 different operators built into PostgreSQL

Translation
obdt=# SELECT point(1,1) + point(2,2);
----------
(3,3)

Equivalence
obdt=# SELECT point(1,1) ~= point(2,2);
----------
f

obdt=# SELECT point(1,1) ~= point(1,1);
----------
t

Distance
obdt=# SELECT point(1,1) <-> point(4,4);
------------------
4.24264068711929

http://www.postgresql.org/docs/current/static/functions-geometry.html
Geometric Operators

Overlapping
obdt=# SELECT '((0,0),5)'::circle && '((2,2),3)'::circle;
----------
t

Containment
obdt=# SELECT '((0,0),5)'::circle @> point(2,2);
----------
t

Is Parallel?
obdt=# SELECT '((0,0), (1,1))'::lseg ?|| '((1,-1), (2,0))'::lseg;
----------
t

Intersection
obdt=# SELECT '((0,0), (1,1))'::lseg ?# '((0,0), (5,5))'::box;
----------
t

http://www.postgresql.org/docs/current/static/functions-geometry.html
Geometric Functions
• 13 functions built into PostgreSQL (not counting type-conversion functions)

Area
obdt=# SELECT area('((0,0),5)'::circle);
------------------
78.5398163397448

Center
obdt=# SELECT center('((0,0),(5,5))'::box);
-----------
(2.5,2.5)

Length
obdt=# SELECT length('((0,0),(5,5))'::lseg);
------------------
7.07106781186548

Width
obdt=# SELECT width('((0,0),(3,2))'::box);
-------
3

Height
obdt=# SELECT height('((0,0),(3,2))'::box);
--------
2
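The operators and functions above are plain planar geometry. This Python sketch uses hypothetical helper names, with tuples standing in for point, circle, lseg and box, and reproduces the results shown. PostgreSQL's own geometric comparisons apply a small built-in fuzz factor; the sketch only approximates that with `math.isclose` in the parallel test.

```python
import math

# Tuples stand in for PostgreSQL types (hypothetical helpers):
#   point = (x, y); circle = ((x, y), r); lseg/box = ((x1, y1), (x2, y2))

def translate(p, q):
    """point + point: component-wise addition."""
    return (p[0] + q[0], p[1] + q[1])

def distance(p, q):
    """point <-> point: Euclidean distance."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def circles_overlap(c1, c2):
    """circle && circle: the discs share at least one point."""
    (center1, r1), (center2, r2) = c1, c2
    return distance(center1, center2) <= r1 + r2

def circle_contains_point(c, p):
    """circle @> point."""
    center, r = c
    return distance(center, p) <= r

def lsegs_parallel(s1, s2):
    """lseg ?|| lseg: direction vectors have a (near) zero cross product."""
    (a, b), (c, d) = s1, s2
    cross = (b[0] - a[0]) * (d[1] - c[1]) - (b[1] - a[1]) * (d[0] - c[0])
    return math.isclose(cross, 0.0, abs_tol=1e-9)

def circle_area(c):
    return math.pi * c[1] ** 2

def box_center(b):
    (x1, y1), (x2, y2) = b
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def lseg_length(s):
    return distance(s[0], s[1])

def box_width(b):
    return abs(b[1][0] - b[0][0])

def box_height(b):
    return abs(b[1][1] - b[0][1])

print(translate((1, 1), (2, 2)))                                   # (3, 3)
print(distance((1, 1), (4, 4)))                                    # about 4.2426
print(circles_overlap(((0, 0), 5), ((2, 2), 3)))                   # True
print(lsegs_parallel(((0, 0), (1, 1)), ((1, -1), (2, 0))))         # True
print(box_center(((0, 0), (5, 5))))                                # (2.5, 2.5)
print(box_width(((0, 0), (3, 2))), box_height(((0, 0), (3, 2))))   # 3 2
```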
Geometric Performance
• Size on Disk
• Consider I/O on reads
• But indexing should help!

Geometric Performance
CREATE TABLE houses (plot box);

INSERT INTO houses
SELECT box(
    point((500 * random())::int, (500 * random())::int),
    point((750 * random() + 500)::int, (750 * random() + 500)::int)
)
FROM generate_series(1, 1000000);

obdt=# CREATE INDEX houses_plot_idx ON houses (plot);
ERROR: data type box has no default operator class for access method "btree"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
Solution #1: Expression Indexes
obdt=# EXPLAIN ANALYZE SELECT * FROM houses WHERE area(plot) BETWEEN 50000 AND 75000;
-------------
Seq Scan on houses (cost=0.00..27353.00 rows=5000 width=32) (actual time=0.077..214.431 rows=26272 loops=1)
  Filter: ((area(plot) >= 50000::double precision) AND (area(plot) <= 75000::double precision))
  Rows Removed by Filter: 973728
Total runtime: 215.965 ms

obdt=# CREATE INDEX houses_plot_area_idx ON houses (area(plot));

obdt=# EXPLAIN ANALYZE SELECT * FROM houses WHERE area(plot) BETWEEN 50000 AND 75000;
------------
Bitmap Heap Scan on houses (cost=107.68..7159.38 rows=5000 width=32) (actual time=5.433..14.686 rows=26272 loops=1)
  Recheck Cond: ((area(plot) >= 50000::double precision) AND (area(plot) <= 75000::double precision))
  -> Bitmap Index Scan on houses_plot_area_idx (cost=0.00..106.43 rows=5000 width=0) (actual time=4.300..4.300 rows=26272 loops=1)
        Index Cond: ((area(plot) >= 50000::double precision) AND (area(plot) <= 75000::double precision))
Total runtime: 16.025 ms
http://www.postgresql.org/docs/current/static/indexes-expressional.html
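Conceptually, an expression index stores the precomputed value of `area(plot)` in a B-tree, so the `BETWEEN` predicate becomes a range lookup instead of a call to `area()` for every row. The Python sketch below models only that idea with a sorted list and binary search. `ExpressionIndex` and `seq_scan_between` are hypothetical names, and a real B-tree is paged and persistent rather than an in-memory list.

```python
import bisect
import random

def box_area(box):
    """area(box) for a box given as ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = box
    return abs(x2 - x1) * abs(y2 - y1)

class ExpressionIndex:
    """Toy model of CREATE INDEX ... ON houses (area(plot)).

    The expression is evaluated once per row when the index is built and
    kept in sorted order, so BETWEEN becomes two binary searches.
    """

    def __init__(self, rows, expr):
        self._entries = sorted((expr(row), row_id) for row_id, row in enumerate(rows))
        self._keys = [key for key, _ in self._entries]

    def between(self, low, high):
        """Row ids whose indexed value lies in [low, high]."""
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return [row_id for _, row_id in self._entries[start:stop]]

def seq_scan_between(rows, expr, low, high):
    """What the Seq Scan plan does: evaluate the expression on every row."""
    return [row_id for row_id, row in enumerate(rows) if low <= expr(row) <= high]

# A smaller, seeded version of the slide's random houses.
rng = random.Random(42)
houses = [
    ((round(500 * rng.random()), round(500 * rng.random())),
     (round(750 * rng.random() + 500), round(750 * rng.random() + 500)))
    for _ in range(10_000)
]
index = ExpressionIndex(houses, box_area)
hits = index.between(50_000, 75_000)
print(len(hits), "houses with an area between 50000 and 75000")
```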
Solution #2: GiST Indexes
obdt=# EXPLAIN ANALYZE SELECT * FROM houses WHERE plot @> '((100,100),(300,300))'::box;
------------
Seq Scan on houses (cost=0.00..19853.00 rows=1000 width=32) (actual time=0.009..96.680 rows=40520 loops=1)
  Filter: (plot @> '(300,300),(100,100)'::box)
  Rows Removed by Filter: 959480
Total runtime: 98.662 ms

obdt=# CREATE INDEX houses_plot_gist_idx ON houses USING gist(plot);

obdt=# EXPLAIN ANALYZE SELECT * FROM houses WHERE plot @> '((100,100),(300,300))'::box;
------------
Bitmap Heap Scan on houses (cost=56.16..2813.20 rows=1000 width=32) (actual time=12.053..24.468 rows=40520 loops=1)
  Recheck Cond: (plot @> '(300,300),(100,100)'::box)
  -> Bitmap Index Scan on houses_plot_gist_idx (cost=0.00..55.91 rows=1000 width=0) (actual time=10.700..10.700 rows=40520 loops=1)
        Index Cond: (plot @> '(300,300),(100,100)'::box)
Total runtime: 26.451 ms
http://www.postgresql.org/docs/current/static/indexes-types.html
Solution #2+: KNN-GiST
obdt=# CREATE TABLE locations (geocode point);

obdt=# INSERT INTO locations
SELECT point(90 * random(), 180 * random())
FROM generate_series(1, 1000000);

obdt=# EXPLAIN ANALYZE SELECT * FROM locations ORDER BY geocode <-> point(41.88853,-87.628852) LIMIT 10;
------------
Limit (cost=39519.39..39519.42 rows=10 width=16) (actual time=319.306..319.309 rows=10 loops=1)
  -> Sort (cost=39519.39..42019.67 rows=1000110 width=16) (actual time=319.305..319.307 rows=10 loops=1)
        Sort Key: ((geocode <-> '(41.88853,-87.628852)'::point))
        Sort Method: top-N heapsort Memory: 25kB
        -> Seq Scan on locations (cost=0.00..17907.38 rows=1000110 width=16) (actual time=0.019..189.687 rows=1000000 loops=1)
Total runtime: 319.332 ms

obdt=# CREATE INDEX locations_geocode_gist_idx ON locations USING gist(geocode);

obdt=# EXPLAIN ANALYZE SELECT * FROM locations ORDER BY geocode <-> point(41.88853,-87.628852) LIMIT 10;
------------
Limit (cost=0.29..1.06 rows=10 width=16) (actual time=0.098..0.235 rows=10 loops=1)
  -> Index Scan using locations_geocode_gist_idx on locations (cost=0.29..77936.29 rows=1000000 width=16) (actual time=0.097..0.234 rows=10 loops=1)
        Order By: (geocode <-> '(41.88853,-87.628852)'::point)
Total runtime: 0.257 ms
http://www.slideshare.net/jkatz05/knn-39127023
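Without the GiST index, the planner computes `geocode <-> point(...)` for every row and keeps the 10 smallest with a bounded heap (the "top-N heapsort" in the plan). The Python sketch below mirrors only that un-indexed plan, using `heapq.nsmallest`; KNN-GiST's advantage is that it walks the index in distance order and stops after 10 rows, which this sketch does not attempt to model. The generator copies the slide's `90 * random(), 180 * random()` on a smaller, seeded sample, and `nearest_seq_scan` is a hypothetical name.

```python
import heapq
import math
import random

def distance(p, q):
    """point <-> point."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def nearest_seq_scan(points, target, k=10):
    """ORDER BY geocode <-> target LIMIT k, the un-indexed way: compute every
    distance and keep the k smallest with a bounded heap (top-N heapsort)."""
    return heapq.nsmallest(k, points, key=lambda p: distance(p, target))

rng = random.Random(0)
locations = [(90 * rng.random(), 180 * rng.random()) for _ in range(100_000)]
target = (41.88853, -87.628852)
closest = nearest_seq_scan(locations, target)
for p in closest:
    print(f"{p[0]:.4f}, {p[1]:.4f}  distance={distance(p, target):.4f}")
```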
Solution #3: PostGIS
• For when you are doing real things with shapes
• (and geographic information systems)
For more on PostGIS, please go back in time to yesterday and see Regina & Leo's tutorial
Let's Take a Break With UUIDs
2024e06c-44ff-5047-b1ae-00def276d043
UUID + PostgreSQL
• Universally Unique Identifiers
• 16 bytes on disk
• Acceptable input formats include:
– A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
– {a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
– a0eebc999c0b4ef8bb6d6bb9bd380a11
– a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
– {a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}
http://www.postgresql.org/docs/current/static/datatype-uuid.html
UUID Functions
http://www.postgresql.org/docs/current/static/uuid-ossp.html
obdt=# CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

obdt=# SELECT uuid_generate_v1();
           uuid_generate_v1
--------------------------------------
d2729728-3d50-11e4-b1af-005056b75e1e

obdt=# SELECT uuid_generate_v1mc();
          uuid_generate_v1mc
--------------------------------------
e04668a2-3d50-11e4-b1b0-1355d5584528

obdt=# SELECT uuid_generate_v3(uuid_ns_url(), 'http://www.postgresopen.org');
           uuid_generate_v3
--------------------------------------
d0bc1ba2-bf07-312f-bf6a-436e18b5b046

obdt=# SELECT uuid_generate_v4();
           uuid_generate_v4
--------------------------------------
0809d8fe-512c-4f02-ba37-bc2e9865e884

obdt=# SELECT uuid_generate_v5(uuid_ns_url(), 'http://www.postgresopen.org');
           uuid_generate_v5
--------------------------------------
d508c779-da5c-5998-bd88-8d76d446754e
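As a cross-check outside the database, Python's standard `uuid` module accepts all five input spellings listed earlier, resolves them to one 16-byte value, and offers counterparts to the generator functions above. `uuid.NAMESPACE_URL` is the RFC 4122 URL namespace, the same constant `uuid_ns_url()` returns; because v3 and v5 are name-based, repeating a call with the same namespace and name gives the same UUID. This sketch makes no claim of byte-for-byte agreement with any particular uuid-ossp build.

```python
import uuid

# The five input spellings accepted by PostgreSQL's uuid type (listed above).
spellings = [
    "A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11",
    "{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}",
    "a0eebc999c0b4ef8bb6d6bb9bd380a11",
    "a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11",
    "{a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}",
]
values = {uuid.UUID(s) for s in spellings}   # all parse to one 128-bit value
value = next(iter(values))
print(len(values), value, len(value.bytes))
# prints: 1 a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11 16

# Rough counterparts of the uuid-ossp generators shown above.
name = "http://www.postgresopen.org"
v1 = uuid.uuid1()                          # time + node based
v3 = uuid.uuid3(uuid.NAMESPACE_URL, name)  # name-based, MD5
v4 = uuid.uuid4()                          # random
v5 = uuid.uuid5(uuid.NAMESPACE_URL, name)  # name-based, SHA-1
print(v1.version, v3.version, v4.version, v5.version)  # 1 3 4 5
```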
Network Address Types
• inet (IPv4 & IPv6)
– SELECT '192.168.1.1'::inet;
– SELECT '192.168.1.1/32'::inet;
– SELECT '192.168.1.1/24'::inet;
• cidr (IPv4 & IPv6)
– SELECT '192.168.1.1'::cidr;
– SELECT '192.168.1.1/32'::cidr;
– SELECT '192.168.1.1/24'::cidr; (ERROR: cidr values may not have bits set to the right of the netmask)
• macaddr
– SELECT '08:00:2b:01:02:03'::macaddr;
http://www.postgresql.org/docs/current/static/datatype-net-types.html
Networks can do Math
http://www.postgresql.org/docs/current/static/functions-net.html
Postgres Can Help Manage Your Routing Tables
http://www.postgresql.org/docs/current/static/functions-net.html
...perhaps, with a foreign data wrapper and a background worker, it could fully manage your routing tables?
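Python's standard `ipaddress` module draws a line similar to the one between `inet` and `cidr`: `ip_interface` keeps a host address together with its netmask, while `ip_network` (strict by default) rejects host bits set to the right of the mask. The routing-table part is a hypothetical sketch: `ROUTES` and `next_hop` are illustrative names, and the lookup picks the most specific matching prefix. In SQL, a comparable lookup could combine the `>>=` (contains or equals) operator with `ORDER BY masklen(...) DESC LIMIT 1`, though no such query appears in the talk.

```python
import ipaddress

# inet-like: a host address that can carry a netmask.
host = ipaddress.ip_interface("192.168.1.1/24")
print(host.ip, host.network)          # 192.168.1.1 192.168.1.0/24

# cidr-like: a network; host bits right of the mask are an error.
lan = ipaddress.ip_network("192.168.1.0/24")
try:
    ipaddress.ip_network("192.168.1.1/24")
except ValueError as exc:
    print("rejected:", exc)

# Hypothetical routing table: prefix -> next hop.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "default-gw",
    ipaddress.ip_network("192.168.0.0/16"): "core",
    ipaddress.ip_network("192.168.1.0/24"): "lan",
}

def next_hop(address):
    """Longest-prefix match: the most specific route containing address."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in ROUTES if ip in net]
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]

print(next_hop("192.168.1.5"), next_hop("192.168.7.5"), next_hop("10.0.0.1"))
# prints: lan core default-gw
```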
Arrays
• ...because a database is an "array" of tuples
• ...and a "tuple" is kind of like an array
• ...can we have an array within a tuple?
Array Facts
obdt=# SELECT (ARRAY[1,2,3])[1];
-------
1

obdt=# SELECT (ARRAY[1,2,3])[0];
-------

Arrays are 1-indexed (an out-of-range subscript such as [0] returns NULL)

obdt=# CREATE TABLE lotto (
    numbers int[3]
);

obdt=# INSERT INTO lotto
VALUES (
    ARRAY[1,2,3,4]
);

obdt=# SELECT * FROM lotto;
-----------
{1,2,3,4}

Size constraints not enforced
Arrays Are Malleable
obdt=# UPDATE lotto SET numbers = ARRAY[1,2,3];

obdt=# SELECT * FROM lotto;
---------
{1,2,3}

obdt=# UPDATE lotto SET numbers[3] = '7';

obdt=# SELECT * FROM lotto;
---------
{1,2,7}

obdt=# UPDATE lotto SET numbers[1:2] = ARRAY[6,5];

obdt=# SELECT * FROM lotto;
---------
{6,5,7}
Array Operations
• <, <=, =, >=, >, <>
– full array comparisons
– B-tree indexable

Containment
SELECT ARRAY[1,2,3] @> ARRAY[1,2];
SELECT ARRAY[1,2] <@ ARRAY[1,2,3];

Concatenation
SELECT ARRAY[1,2,3] || ARRAY[3,4,5];
SELECT ARRAY[ARRAY[1,2]] || ARRAY[3,4];
SELECT ARRAY[1,2,3] || 4;

Overlaps
SELECT ARRAY[1,2,3] && ARRAY[3,4,5];
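The containment and overlap operators treat arrays roughly as sets: order and duplicates do not matter. A minimal Python sketch of those semantics for one-dimensional arrays without NULLs (the function names are hypothetical):

```python
def contains(a, b):
    """a @> b: every element of b appears somewhere in a."""
    return set(b) <= set(a)

def contained_by(a, b):
    """a <@ b."""
    return contains(b, a)

def overlaps(a, b):
    """a && b: at least one element in common."""
    return not set(a).isdisjoint(b)

def concat(a, b):
    """a || b for one-dimensional arrays: array || array or array || element."""
    return list(a) + (list(b) if isinstance(b, (list, tuple)) else [b])

print(contains([1, 2, 3], [1, 2]))       # True
print(contained_by([1, 2], [1, 2, 3]))   # True
print(concat([1, 2, 3], [3, 4, 5]))      # [1, 2, 3, 3, 4, 5]
print(concat([1, 2, 3], 4))              # [1, 2, 3, 4]
print(overlaps([1, 2, 3], [3, 4, 5]))    # True
```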
Integer Arrays Use GIN
obdt=# CREATE INDEX int_arrays_data_gin_idx ON int_arrays USING GIN(data);

obdt=# EXPLAIN ANALYZE SELECT *
FROM int_arrays
WHERE 5432 = ANY (data);
---------------
Seq Scan on int_arrays (cost=0.00..30834.00 rows=5000 width=33) (actual time=1.237..157.397 rows=3 loops=1)
  Filter: (5432 = ANY (data))
  Rows Removed by Filter: 999997
Total runtime: 157.419 ms

obdt=# EXPLAIN ANALYZE SELECT *
FROM int_arrays
WHERE ARRAY[5432] <@ data;
---------------
Bitmap Heap Scan on int_arrays (cost=70.75..7680.14 rows=5000 width=33) (actual time=0.023..0.024 rows=3 loops=1)
  Recheck Cond: ('{5432}'::integer[] <@ data)
  -> Bitmap Index Scan on int_arrays_data_gin_idx (cost=0.00..69.50 rows=5000 width=0) (actual time=0.019..0.019 rows=3 loops=1)
        Index Cond: ('{5432}'::integer[] <@ data)
Total runtime: 0.090 ms
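The `5432 = ANY (data)` form does not use the GIN index, while the equivalent `ARRAY[5432] <@ data` does. GIN ("Generalized Inverted Index") maps each element to the rows that contain it; the toy Python version below shows why a containment query then reduces to intersecting posting lists. `ToyGin` is a hypothetical, in-memory illustration, not how PostgreSQL lays out GIN pages.

```python
from collections import defaultdict

class ToyGin:
    """In-memory inverted index over integer arrays: element -> row ids."""

    def __init__(self, rows):
        self._row_count = len(rows)
        self._postings = defaultdict(set)
        for row_id, data in enumerate(rows):
            for element in data:
                self._postings[element].add(row_id)

    def rows_containing(self, query):
        """Row ids where ARRAY[query...] <@ data, i.e. data holds every query
        element: intersect the posting lists instead of scanning all rows."""
        if not query:
            return set(range(self._row_count))  # an empty array is contained by any array
        postings = [self._postings.get(element, set()) for element in query]
        return set.intersection(*postings)

rows = [[1, 2, 3], [5432, 7], [7, 8], [5432]]
gin = ToyGin(rows)
print(gin.rows_containing([5432]))      # {1, 3}
print(gin.rows_containing([5432, 7]))   # {1}
```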
Array Functions
• modification
SELECT array_append(ARRAY[1,2,3], 4);
SELECT array_prepend(1, ARRAY[2,3,4]);
SELECT array_cat(ARRAY[1,2], ARRAY[3,4]);
SELECT array_remove(ARRAY[1,2,1,3], 1);
SELECT array_replace(ARRAY[1,2,1,3], 1, -4);
• size
SELECT array_length(ARRAY[1,2,3,4], 1); -- 4
SELECT array_ndims(ARRAY[ARRAY[1,2], ARRAY[3,4]]); -- 2
SELECT array_dims(ARRAY[ARRAY[1,2], ARRAY[3,4]]); -- [1:2][1:2]
http://www.postgresql.org/docs/current/static/functions-array.html
Array Functions

Array to String
obdt=# SELECT array_to_string(ARRAY[1,2,NULL,4], ',', '*');
-----------------
1,2,*,4

Array to Set
obdt=# SELECT unnest(ARRAY[1,2,3]);
 unnest
--------
1
2
3
http://www.postgresql.org/docs/current/static/functions-array.html
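A few of these functions sketched in Python for one-dimensional arrays, with `None` standing in for NULL. The helpers are hypothetical re-implementations that follow the documented behavior, including that `array_to_string` omits NULL elements unless a null string is supplied.

```python
def array_remove(arr, value):
    """array_remove: drop every element equal to value."""
    return [x for x in arr if x != value]

def array_replace(arr, old, new):
    """array_replace: substitute new for every element equal to old."""
    return [new if x == old else x for x in arr]

def array_to_string(arr, sep, null_string=None):
    """array_to_string: NULL (None) elements are omitted unless null_string is given."""
    parts = []
    for x in arr:
        if x is None:
            if null_string is not None:
                parts.append(null_string)
        else:
            parts.append(str(x))
    return sep.join(parts)

print(array_remove([1, 2, 1, 3], 1))               # [2, 3]
print(array_replace([1, 2, 1, 3], 1, -4))          # [-4, 2, -4, 3]
print(array_to_string([1, 2, None, 4], ",", "*"))  # 1,2,*,4
print(array_to_string([1, 2, None, 4], ","))       # 1,2,4
```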
array_agg
• useful for variable-length lists or "unknown # of columns"
obdt=# SELECT
    t.title, array_agg(s.full_name)
FROM talk t
JOIN speakers_talks st ON st.talk_id = t.id
JOIN speaker s ON s.id = st.speaker_id
GROUP BY t.title;

     title      |       array_agg
----------------+------------------------
 Data Types     | {Jonathan,Jim}
 Administration | {Bruce}
 User Groups    | {Josh,Jonathan,Magnus}
http://www.postgresql.org/docs/current/static/functions-array.html
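Outside SQL, `array_agg` amounts to grouping rows by a key and collecting one column into a list. The Python sketch below uses hypothetical, already-joined `(title, full_name)` rows that reproduce the output above. In PostgreSQL the element order inside each array is not guaranteed unless the aggregate call includes an `ORDER BY`, e.g. `array_agg(s.full_name ORDER BY s.full_name)`; the sketch simply keeps input order.

```python
from collections import defaultdict

# Hypothetical rows, already joined: (talk title, speaker full name).
talk_speakers = [
    ("Data Types", "Jonathan"),
    ("Data Types", "Jim"),
    ("Administration", "Bruce"),
    ("User Groups", "Josh"),
    ("User Groups", "Jonathan"),
    ("User Groups", "Magnus"),
]

def array_agg(rows):
    """GROUP BY the first column, collecting the second column into a list."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)

for title, speakers in array_agg(talk_speakers).items():
    print(f"{title} | {speakers}")
```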
Ranges
• Scheduling
• Probability
• Measurements
• Financial applications
• Clinical trial data
• Intersections of ordered data
Why Range Overlaps Are Difficult
Before Postgres 9.2
• OVERLAPS

SELECT
    ('2013-01-08'::date, '2013-01-10'::date) OVERLAPS
    ('2013-01-09'::date, '2013-01-12'::date);

• Limitations:
• Only date/time
• Fixed bounds: Start <= x < End
Postgres 9.2+
• INT4RANGE (integer)
• INT8RANGE (bigint)
• NUMRANGE (numeric)
• TSRANGE (timestamp without time zone)
• TSTZRANGE (timestamp with time zone)
• DATERANGE (date)
http://www.postgresql.org/docs/current/static/rangetypes.html
Range Type Size
• Size on disk = 2 * (size of subtype) + 1 flag byte, plus an 8-byte header
• Smaller when the range is empty: with the default '[)' bounds, daterange(x, x) is empty, so no bounds are stored
obdt=# SELECT pg_column_size(daterange(CURRENT_DATE, CURRENT_DATE));
----------------
9

obdt=# SELECT pg_column_size(daterange(CURRENT_DATE,CURRENT_DATE + 1));
----------------
17
Range Bounds
• Ranges can be inclusive, exclusive or both
• [2,4] => 2 ≤ x ≤ 4
• [2,4) => 2 ≤ x < 4
• (2,4] => 2 < x ≤ 4
• (2,4) => 2 < x < 4
• Can also be empty
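The bound rules, and the way discrete types such as `int4range` are normalized to `[)` (so `'[1,10]'` comes back as `[1,11)`), can be sketched in Python. `IntRange` is a hypothetical class; `None` marks an unbounded side, and the overlap test relies on the half-open form, where `[a,b)` and `[c,d)` overlap exactly when `a < d` and `c < b`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class IntRange:
    """Toy integer range; None marks an unbounded side.

    bounds is a two-character string such as "[)" or "(]"; "[)" is also
    the default used by PostgreSQL's range constructor functions.
    """
    lower: Optional[int]
    upper: Optional[int]
    bounds: str = "[)"

    def contains(self, x):
        """x <@ range."""
        if self.lower is not None:
            if x < self.lower or (x == self.lower and self.bounds[0] == "("):
                return False
        if self.upper is not None:
            if x > self.upper or (x == self.upper and self.bounds[1] == ")"):
                return False
        return True

    def canonical(self):
        """Normalize to the '[)' form used for discrete types like int4range."""
        lo, hi = self.lower, self.upper
        if lo is not None and self.bounds[0] == "(":
            lo += 1
        if hi is not None and self.bounds[1] == "]":
            hi += 1
        # Unbounded sides are always shown as exclusive.
        return IntRange(lo, hi, ("[" if lo is not None else "(") + ")")

    def is_empty(self):
        c = self.canonical()
        return c.lower is not None and c.upper is not None and c.lower >= c.upper

    def overlaps(self, other):
        """range && range, using the half-open rule a < d and c < b."""
        a, b = self.canonical(), other.canonical()
        if a.is_empty() or b.is_empty():
            return False
        a_starts_before_b_ends = a.lower is None or b.upper is None or a.lower < b.upper
        b_starts_before_a_ends = b.lower is None or a.upper is None or b.lower < a.upper
        return a_starts_before_b_ends and b_starts_before_a_ends

    def __str__(self):
        lo = "" if self.lower is None else str(self.lower)
        hi = "" if self.upper is None else str(self.upper)
        return f"{self.bounds[0]}{lo},{hi}{self.bounds[1]}"

print(IntRange(1, 10, "[]").canonical())                               # [1,11)
print(IntRange(2, 4, "[)").contains(4))                                # False
print(IntRange(11000, 15001).overlaps(IntRange(13000, 15000, "[]")))   # True
```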
Infinite Ranges
• Ranges can be infinite
– [2,) => 2 ≤ x < ∞
– (,2] => -∞ < x ≤ 2
• CAVEAT EMPTOR
– “infinity” has special meaning with timestamp ranges
– [CURRENT_TIMESTAMP,) = [CURRENT_TIMESTAMP,]
– [CURRENT_TIMESTAMP, 'infinity') <> [CURRENT_TIMESTAMP, 'infinity']
Constructing Ranges
obdt=# SELECT '[1,10]'::int4range;
-----------
[1,11)
• Discrete range types (int4range, int8range, daterange) are normalized to '[)' form

Constructing Ranges
• Constructor defaults to '[)'
obdt=# SELECT numrange(9.0, 9.5);
------------
[9.0,9.5)
Finding Overlapping Ranges
obdt=# SELECT *
FROM cars
WHERE cars.price_range && int4range(13000, 15000, '[]')
ORDER BY lower(cars.price_range);
-----------
 id |        name         |  price_range
----+---------------------+---------------
  5 | Ford Mustang        | [11000,15001)
  6 | Lincoln Continental | [12000,14001)
http://www.postgresql.org/docs/current/static/functions-range.html
Ranges + GiST
obdt=# CREATE INDEX ranges_bounds_gist_idx ON ranges USING gist (bounds);

obdt=# EXPLAIN ANALYZE SELECT * FROM ranges WHERE int4range(500,1000) && bounds;
------------
Bitmap Heap Scan on ranges (actual time=0.283..0.370 rows=653 loops=1)
  Recheck Cond: ('[500,1000)'::int4range && bounds)
  -> Bitmap Index Scan on ranges_bounds_gist_idx (actual time=0.275..0.275 rows=653 loops=1)
        Index Cond: ('[500,1000)'::int4range && bounds)
Total runtime: 0.435 ms
Large Search Range?
test=# EXPLAIN ANALYZE SELECT * FROM ranges WHERE int4range(10000,1000000) && bounds;
QUERY PLAN
-------------
Bitmap Heap Scan on ranges (actual time=184.028..270.323 rows=993068 loops=1)
  Recheck Cond: ('[10000,1000000)'::int4range && bounds)
  -> Bitmap Index Scan on ranges_bounds_gist_idx (actual time=183.060..183.060 rows=993068 loops=1)
        Index Cond: ('[10000,1000000)'::int4range && bounds)
Total runtime: 313.743 ms
SP-GiST
• space-partitioned generalized search tree
• ideal for non-balanced data structures
– k-d trees, quad-trees, suffix trees
– divides search space into partitions of unequal size
• matching partitioning rule = fast search
• traditionally "in-memory" data structures, adapted to play nicely with disk I/O
http://www.postgresql.org/docs/9.3/static/spgist.html
GiST vs SP-GiST: Space
Rows / metric | GiST Clustered | SP-GiST Clustered | GiST Sparse | SP-GiST Sparse
100K Size | 6MB | 5MB | 6MB | 11MB
100K Time | 0.5s | 0.4s | 2.5s | 7.8s
250K Size | 15MB | 12MB | 15MB | 28MB
250K Time | 1.5s | 1.1s | 6.3s | 47.2s
500K Size | 30MB | 25MB | 30MB | 55MB
500K Time | 3.1s | 3.0s | 13.9s | 192s
1MM Size | 59MB | 52MB | 60MB | 110MB
1MM Time | 5.1s | 5.7s | 29.2s | 777s
Scheduling
obdt=# CREATE TABLE travel_log (
    id serial PRIMARY KEY,
    name varchar(255),
    trip_range daterange,
    EXCLUDE USING gist (trip_range WITH &&)
);

obdt=# INSERT INTO travel_log (name, trip_range) VALUES ('Chicago', daterange('2012-03-12', '2012-03-17'));

obdt=# INSERT INTO travel_log (name, trip_range) VALUES ('Austin', daterange('2012-03-16', '2012-03-18'));
ERROR: conflicting key value violates exclusion constraint "travel_log_trip_range_excl"
DETAIL: Key (trip_range)=([2012-03-16,2012-03-18)) conflicts with existing key (trip_range)=([2012-03-12,2012-03-17)).
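The exclusion constraint rejects any row whose `trip_range` overlaps (`&&`) an existing one. Here is a single-process Python sketch of that rule with half-open date ranges. `TravelLog`, `ExclusionViolation` and the extra "Boston" trip are hypothetical; the sketch uses a linear scan where PostgreSQL uses the GiST index, and unlike the real constraint it offers no protection against concurrent inserts. A trip starting on the day another ends does not conflict, because the upper bound is exclusive.

```python
from datetime import date

class ExclusionViolation(Exception):
    """Raised when a new range overlaps an existing one."""

class TravelLog:
    """Toy stand-in for EXCLUDE USING gist (trip_range WITH &&)."""

    def __init__(self):
        self.trips = []   # (name, start, end) with half-open [start, end)

    def insert(self, name, start, end):
        for other_name, s, e in self.trips:
            if start < e and s < end:   # [start, end) && [s, e)
                raise ExclusionViolation(
                    f"{name} [{start},{end}) conflicts with {other_name} [{s},{e})"
                )
        self.trips.append((name, start, end))

log = TravelLog()
log.insert("Chicago", date(2012, 3, 12), date(2012, 3, 17))
try:
    log.insert("Austin", date(2012, 3, 16), date(2012, 3, 18))
except ExclusionViolation as exc:
    print("rejected:", exc)
log.insert("Boston", date(2012, 3, 17), date(2012, 3, 20))   # adjacent, so allowed
```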
Extending Ranges
obdt=# CREATE TYPE inetrange AS RANGE (
    SUBTYPE = inet
);

obdt=# SELECT '192.168.1.8'::inet <@ inetrange('192.168.1.1', '192.168.1.10');
----------
t

obdt=# SELECT '192.168.1.20'::inet <@ inetrange('192.168.1.1', '192.168.1.10');
----------
f
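What makes `inetrange` useful is that the range compares addresses with `inet`'s own ordering rather than as text. A Python sketch using the standard `ipaddress` module and the default `[)` bounds; `in_inetrange` is a hypothetical helper and handles plain host addresses only, not netmasks.

```python
import ipaddress

def in_inetrange(address, lower, upper):
    """address <@ inetrange(lower, upper), default '[)' bounds,
    comparing host addresses numerically rather than as text."""
    a = ipaddress.ip_address(address)
    return ipaddress.ip_address(lower) <= a < ipaddress.ip_address(upper)

print(in_inetrange("192.168.1.8", "192.168.1.1", "192.168.1.10"))   # True
print(in_inetrange("192.168.1.20", "192.168.1.1", "192.168.1.10"))  # False

# Compared as plain strings, the order is wrong: "192.168.1.8" > "192.168.1.10".
print("192.168.1.8" < "192.168.1.10")   # False
```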
Now For Something Unrelated
Let's talk non-relational data in PostgreSQL
hstore
• key-value store in PostgreSQL
• binary storage
• key / values represented as strings when
querying
CREATE EXTENSION hstore;
SELECT 'jk=>1, jm=>2'::hstore;
--------------------
"jk"=>"1", "jm"=>"2"
http://www.postgresql.org/docs/current/static/hstore.html
Making hstore objects
obdt=# SELECT hstore(ARRAY['jk', 'jm'], ARRAY['1', '2']);
---------------------
"jk"=>"1", "jm"=>"2"

obdt=# SELECT hstore(ARRAY['jk', '1', 'jm', '2']);
---------------------
"jk"=>"1", "jm"=>"2"

obdt=# SELECT hstore(ROW('jk', 'jm'));
---------------------
"f1"=>"jk", "f2"=>"jm"
Accessing hstore
obdt=# SELECT ('jk=>1, jm=>2'::hstore) -> 'jk';
----------
1

obdt=# SELECT ('jk=>1, jm=>2'::hstore) -> ARRAY['jk','jm'];
----------
{1,2}

obdt=# SELECT delete('jk=>1, jm=>2'::hstore, 'jm');
-----------
"jk"=>"1"
hstore operators
obdt=# SELECT ('jk=>1, jm=>2'::hstore) @> 'jk=>1'::hstore;
----------
t

obdt=# SELECT ('jk=>1, jm=>2'::hstore) ? 'sf';
----------
f

obdt=# SELECT ('jk=>1, jm=>2'::hstore) ?& ARRAY['jk', 'sf'];
----------
f

obdt=# SELECT ('jk=>1, jm=>2'::hstore) ?| ARRAY['jk', 'sf'];
----------
t
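Modeled as a Python `dict` of strings, the hstore operators above reduce to simple key and pair checks. A minimal sketch with hypothetical helper names that ignores NULL values:

```python
def hs_contains(a, b):
    """a @> b: every key/value pair of b is present in a."""
    return all(key in a and a[key] == value for key, value in b.items())

def hs_exists(a, key):
    """a ? key."""
    return key in a

def hs_exists_all(a, keys):
    """a ?& keys: all of the keys are present."""
    return all(key in a for key in keys)

def hs_exists_any(a, keys):
    """a ?| keys: at least one key is present."""
    return any(key in a for key in keys)

def hs_delete(a, key):
    """delete(a, key): a copy of a without key."""
    return {k: v for k, v in a.items() if k != key}

store = {"jk": "1", "jm": "2"}   # 'jk=>1, jm=>2'::hstore; values are strings
print(store["jk"])                          # 1  (like -> 'jk')
print([store[k] for k in ["jk", "jm"]])     # ['1', '2']  (like -> ARRAY[...])
print(hs_delete(store, "jm"))               # {'jk': '1'}
print(hs_contains(store, {"jk": "1"}))      # True
print(hs_exists(store, "sf"))               # False
print(hs_exists_all(store, ["jk", "sf"]))   # False
print(hs_exists_any(store, ["jk", "sf"]))   # True
```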
hstore Performance
obdt=# EXPLAIN ANALYZE SELECT * FROM keypairs WHERE data ? '3';
-----------------------
Seq Scan on keypairs (cost=0.00..19135.06 rows=950 width=32) (actual time=0.071..214.007 rows=1 loops=1)
  Filter: (data ? '3'::text)
  Rows Removed by Filter: 999999
Total runtime: 214.028 ms

obdt=# CREATE INDEX keypairs_data_gin_idx ON keypairs USING gin(data);

obdt=# EXPLAIN ANALYZE SELECT * FROM keypairs WHERE data ? '3';
--------------
Bitmap Heap Scan on keypairs (cost=27.75..2775.66 rows=1000 width=24) (actual time=0.046..0.046 rows=1 loops=1)
  Recheck Cond: (data ? '3'::text)
  -> Bitmap Index Scan on keypairs_data_gin_idx (cost=0.00..27.50 rows=1000 width=0) (actual time=0.041..0.041 rows=1 loops=1)
        Index Cond: (data ? '3'::text)
Total runtime: 0.073 ms
JSON and PostgreSQL
• Started in 2010 as a Google Summer of Code Project
• https://wiki.postgresql.org/wiki/JSON_datatype_GSoC_2010
• Goal:
• be similar to XML data type functionality in Postgres
• be committed as an extension for PostgreSQL 9.1
What Happened?
• Different proposals over how to finalize the implementation
• binary vs. text
• Core vs Extension
• Discussions between “old” vs. “new” ways of packaging for extensions
Foreshadowing
PostgreSQL 9.2: JSON
• JSON data type in core PostgreSQL
• based on RFC 4627
• only “strictly” follows it if your database encoding is UTF-8
• text-based format
• checks for validity
PostgreSQL 9.2: JSON
obdt=# SELECT '[{"PUG": "NYC"}]'::json;
------------------
[{"PUG": "NYC"}]

obdt=# SELECT '[{"PUG": "NYC"]'::json;
ERROR: invalid input syntax for type json at character 8
DETAIL: Expected "," or "}", but found "]".
CONTEXT: JSON data, line 1: [{"PUG": "NYC"]
http://www.postgresql.org/docs/current/static/datatype-json.html
PostgreSQL 9.2: JSON
• array_to_json
obdt=# SELECT array_to_json(ARRAY[1,2,3]);
---------------
[1,2,3]
PostgreSQL 9.2: JSON
• row_to_json
obdt=# SELECT row_to_json(category) FROM category;
------------
{"cat_id":652,"cat_pages":35,"cat_subcats":17,"cat_files":0,"title":"Continents"}
PostgreSQL 9.2: JSON
• In summary, within core PostgreSQL, it was a starting point
PostgreSQL 9.3: JSON Ups its Game
• Added operators and functions to read / prepare JSON
• Added casts from hstore to JSON
PostgreSQL 9.3: JSON
Operator | Description | Example
-> | return JSON array element OR JSON object field | '[1,2,3]'::json -> 0; '{"a": 1, "b": 2, "c": 3}'::json -> 'b';
->> | return JSON array element OR JSON object field AS text | '[1,2,3]'::json ->> 0; '{"a": 1, "b": 2, "c": 3}'::json ->> 'b';
#> | return the JSON value at the given path | '{"a": 1, "b": 2, "c": [1,2,3]}'::json #> '{c, 0}';
#>> | return the JSON value at the given path AS text | '{"a": 1, "b": 2, "c": [1,2,3]}'::json #>> '{c, 0}';
http://www.postgresql.org/docs/current/static/functions-json.html
Operator Gotchas
SELECT * FROM category_documents
WHERE data->'title' = 'PostgreSQL';
ERROR: operator does not exist: json = unknown
LINE 1: ...ECT * FROM category_documents WHERE data->'title' = 'Postgre...
                                                             ^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Operator Gotchas
SELECT * FROM category_documents
WHERE data->>'title' = 'PostgreSQL';
-----------------------
{"cat_id":252739,"cat_pages":14,"cat_subcats":0,"cat_files":0,"title":"PostgreSQL"}
(1 row)
For the Upcoming Examples
• Wikipedia English category titles – all 1,823,644 that I downloaded
• Relation looks something like:
   Column    |  Type   |     Modifiers
-------------+---------+--------------------
 cat_id      | integer | not null
 cat_pages   | integer | not null default 0
 cat_subcats | integer | not null default 0
 cat_files   | integer | not null default 0
 title       | text    |
Performance?
EXPLAIN ANALYZE SELECT * FROM category_documents
WHERE data->>'title' = 'PostgreSQL';
---------------------
Seq Scan on category_documents (cost=0.00..57894.18 rows=9160 width=32) (actual time=360.083..2712.094 rows=1 loops=1)
  Filter: ((data ->> 'title'::text) = 'PostgreSQL'::text)
  Rows Removed by Filter: 1823643
Total runtime: 2712.127 ms
Performance?
CREATE INDEX category_documents_idx ON category_documents (data);
ERROR: data type json has no default operator class for access method "btree"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
Let’s Be Clever
• json_extract_path, json_extract_path_text
• Like #> and #>>, but take the path as a list of arguments
SELECT json_extract_path(
    '{"a": 1, "b": 2, "c": [1,2,3]}'::json,
    'c', '0');
--------
1
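A Python sketch of how path extraction walks a document: object steps are keys, array steps are integer indexes given as text, and a missing step yields nothing. The `_text` variant returns strings unquoted and other values as JSON text, which is why its result can be compared with a plain string. These are hypothetical re-implementations for illustration; in particular, they do not distinguish a JSON `null` from a missing path.

```python
import json

def json_extract_path(doc, *path):
    """Walk objects by key and arrays by integer index (given as text).
    Returns None when a step is missing."""
    current = doc
    for step in path:
        if isinstance(current, dict):
            if step not in current:
                return None
            current = current[step]
        elif isinstance(current, list):
            try:
                index = int(step)
            except ValueError:
                return None
            if not 0 <= index < len(current):
                return None
            current = current[index]
        else:
            return None
    return current

def json_extract_path_text(doc, *path):
    """Like json_extract_path, but strings come back unquoted and other
    values as JSON text, the way ->> and #>> behave."""
    value = json_extract_path(doc, *path)
    if value is None or isinstance(value, str):
        return value
    return json.dumps(value)

doc = json.loads('{"a": 1, "b": 2, "c": [1,2,3]}')
print(json_extract_path(doc, "c", "0"))        # 1
print(json_extract_path_text(doc, "c", "0"))   # 1  (as the string "1")

category = {"cat_id": 252739, "title": "PostgreSQL"}
print(json_extract_path_text(category, "title") == "PostgreSQL")   # True
```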
Performance Revisited
CREATE INDEX category_documents_data_idx
ON category_documents
    (json_extract_path_text(data, 'title'));

obdt=# EXPLAIN ANALYZE
SELECT * FROM category_documents
WHERE json_extract_path_text(data, 'title') = 'PostgreSQL';
------------
Bitmap Heap Scan on category_documents (cost=303.09..20011.96 rows=9118 width=32) (actual time=0.090..0.091 rows=1 loops=1)
  Recheck Cond: (json_extract_path_text(data, VARIADIC '{title}'::text[]) = 'PostgreSQL'::text)
  -> Bitmap Index Scan on category_documents_data_idx (cost=0.00..300.81 rows=9118 width=0) (actual time=0.086..0.086 rows=1 loops=1)
        Index Cond: (json_extract_path_text(data, VARIADIC '{title}'::text[]) = 'PostgreSQL'::text)
Total runtime: 0.105 ms
The Relation vs JSON
• Size on Disk
• category (relation) - 136MB
• category_documents (JSON) - 238MB
• Index Size for “title”
• category - 89MB
• category_documents - 89MB
• Average Performance for looking up “PostgreSQL”
• category - 0.065ms
• category_documents - 0.070ms
JSON Aggregates
• (this is pretty cool)
• json_agg
http://www.postgresql.org/docs/current/static/functions-json.html
SELECT b, json_agg(stuff)
FROM stuff
GROUP BY b;

  b   |             json_agg
------+----------------------------------
 neat | [{"a":4,"b":"neat","c":[4,5,6]}]
 wow  | [{"a":1,"b":"wow","c":[1,2,3]}, +
      |  {"a":3,"b":"wow","c":[7,8,9]}]
 cool | [{"a":2,"b":"cool","c":[4,5,6]}]
hstore gets in the game
• hstore_to_json
• converts hstore to json, treating all values as strings
• hstore_to_json_loose
• converts hstore to json, but also tries to distinguish between
data types and “convert” them to proper JSON representations
SELECT hstore_to_json_loose('"a key"=>1, b=>t, c=>null, d=>12345, e=>012345, f=>1.234, g=>2.345e+4');
----------------
{"b": true, "c": null, "d": 12345, "e": "012345", "f": 1.234, "g": 2.345e+4, "a key": 1}
Next Steps?
• In PostgreSQL 9.3, JSON became much more useful, but…
• Difficult to search within JSON
• Difficult to build new JSON objects
“Nested hstore”
• Proposed at PGCon 2013 by Oleg Bartunov and Teodor Sigaev
• Hierarchical key-value storage system that also supports arrays and is stored in a binary format
• Takes advantage of GIN indexing mechanism in PostgreSQL
• “Generalized Inverted Index”
• Built to search within composite objects
• Arrays, fulltext search, hstore
• …JSON?
http://www.pgcon.org/2013/schedule/attachments/280_hstore-pgcon-2013.pdf
How JSONB Came to Be
• JSON is the “lingua franca” for transmitting data on the web
• The PostgreSQL JSON type was in a text format and preserved text exactly as input
• e.g. duplicate keys are preserved
• Create a new data type that merges the nested hstore work into a JSON type stored in a binary format: JSONB
JSONB ≠ BSON
BSON is a data type created by MongoDB as a “superset of JSON”
JSONB lives in PostgreSQL and is just JSON that is stored in a binary format on disk
JSONB Gives Us More Operators
• a @> b - is b contained within a?
• '{"a": 1, "b": 2}'::jsonb @> '{"a": 1}'::jsonb -- TRUE
• a <@ b - is a contained within b?
• '{"a": 1}'::jsonb <@ '{"a": 1, "b": 2}'::jsonb -- TRUE
• a ? b - does the key b exist in JSONB a?
• '{"a": 1, "b": 2}'::jsonb ? 'a' -- TRUE
• a ?| b - do any of the keys in array b exist in JSONB a?
• '{"a": 1, "b": 2}'::jsonb ?| ARRAY['b', 'c'] -- TRUE
• a ?& b - do all of the keys in array b exist in JSONB a?
• '{"a": 1, "b": 2}'::jsonb ?& ARRAY['a', 'b'] -- TRUE
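The containment rule behind `@>` is recursive. Below is a Python sketch approximating it, plus the key-existence operators, with hypothetical names throughout. It omits PostgreSQL's special case where a top-level array may contain a bare scalar, and it checks keys of top-level objects only (PostgreSQL's `?` also matches string elements of a top-level array).

```python
def _scalar_equal(a, b):
    """JSON scalars are equal only if both or neither are booleans
    (Python treats True == 1, JSON does not)."""
    return isinstance(a, bool) == isinstance(b, bool) and a == b

def jsonb_contains(a, b):
    """Approximates a @> b.

    objects: every key of b is in a, and a's value contains b's value
    arrays:  every element of b is contained in some element of a
             (order and duplicates do not matter)
    scalars: equal
    """
    if isinstance(b, dict):
        return isinstance(a, dict) and all(
            key in a and jsonb_contains(a[key], value) for key, value in b.items()
        )
    if isinstance(b, list):
        return isinstance(a, list) and all(
            any(jsonb_contains(x, y) for x in a) for y in b
        )
    return not isinstance(a, (dict, list)) and _scalar_equal(a, b)

def jsonb_exists(a, key):
    """a ? key, checking top-level object keys only."""
    return isinstance(a, dict) and key in a

def jsonb_exists_any(a, keys):
    """a ?| keys."""
    return any(jsonb_exists(a, key) for key in keys)

def jsonb_exists_all(a, keys):
    """a ?& keys."""
    return all(jsonb_exists(a, key) for key in keys)

doc = {"a": 1, "b": 2}
print(jsonb_contains(doc, {"a": 1}))        # True
print(jsonb_contains({"a": 1}, doc))        # False
print(jsonb_exists(doc, "a"))               # True
print(jsonb_exists_any(doc, ["b", "c"]))    # True
print(jsonb_exists_all(doc, ["a", "b"]))    # True
```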
JSONB Gives us GIN
• Recall - GIN indexes are used to "look inside" objects
• JSONB has two flavors of GIN:
• Standard - supports @>, ?, ?|, ?&
• "Path Ops" - supports only @>
CREATE INDEX category_documents_data_idx ON category_documents USING gin(data);
CREATE INDEX category_documents_path_data_idx ON category_documents USING gin(data jsonb_path_ops);
JSONB Gives Us Flexibility
obdt=# SELECT * FROM category_documents WHERE
    data @> '{"title": "PostgreSQL"}';
----------------
{"title": "PostgreSQL", "cat_id": 252739, "cat_files": 0, "cat_pages": 14, "cat_subcats": 0}

obdt=# SELECT * FROM category_documents WHERE
    data @> '{"cat_id": 5432 }';
----------------
{"title": "1394 establishments", "cat_id": 5432, "cat_files": 0, "cat_pages": 4, "cat_subcats": 2}
JSONB Gives Us Speed
EXPLAIN ANALYZE SELECT * FROM category_documents
    WHERE data @> '{"title": "PostgreSQL"}';
------------
Bitmap Heap Scan on category_documents (cost=38.13..6091.65 rows=1824 width=153) (actual time=0.021..0.022 rows=1 loops=1)
  Recheck Cond: (data @> '{"title": "PostgreSQL"}'::jsonb)
  Heap Blocks: exact=1
  -> Bitmap Index Scan on category_documents_path_data_idx (cost=0.00..37.68 rows=1824 width=0) (actual time=0.012..0.012 rows=1 loops=1)
        Index Cond: (data @> '{"title": "PostgreSQL"}'::jsonb)
Planning time: 0.070 ms
Execution time: 0.043 ms
JSONB + Wikipedia Categories: By the Numbers
• Size on Disk
• category (relation) - 136MB
• category_documents (JSON) - 238MB
• category_documents (JSONB) - 325MB
• Index Size for “title”
• category - 89MB
• category_documents (JSON with one key using an expression index) - 89MB
• category_documents (JSONB, all GIN ops) - 311MB
• category_documents (JSONB, just @>) - 203MB
• Average Performance for looking up “PostgreSQL”
• category - 0.065ms
• category_documents (JSON with one key using an expression index) - 0.070ms
• category_documents (JSONB, all GIN ops) - 0.115ms
• category_documents (JSONB, just @>) - 0.045ms
Wow
• That was a lot of material
In Summary
• PostgreSQL has a lot of advanced data types
• They are easy to access
• They have a lot of functionality around them
• They are durable
• They perform well (but of course must be used correctly)
• Furthermore, you can extend PostgreSQL to:
• Better manipulate your favorite data type
• Create more data types
• ...well, do basically what you want it to do
And That's All
• Thank You!
• Questions?
• @jkatz05

Jsquery - the jsonb query language with GIN indexing support
 
Oh, that ubiquitous JSON !
Oh, that ubiquitous JSON !Oh, that ubiquitous JSON !
Oh, that ubiquitous JSON !
 
Geospatial and bitemporal search in cassandra with pluggable lucene index
Geospatial and bitemporal search in cassandra with pluggable lucene indexGeospatial and bitemporal search in cassandra with pluggable lucene index
Geospatial and bitemporal search in cassandra with pluggable lucene index
 
PostgreSQL: Advanced indexing
PostgreSQL: Advanced indexingPostgreSQL: Advanced indexing
PostgreSQL: Advanced indexing
 
Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015
 
Apply Hammer Directly to Thumb; Avoiding Apache Spark and Cassandra AntiPatt...
 Apply Hammer Directly to Thumb; Avoiding Apache Spark and Cassandra AntiPatt... Apply Hammer Directly to Thumb; Avoiding Apache Spark and Cassandra AntiPatt...
Apply Hammer Directly to Thumb; Avoiding Apache Spark and Cassandra AntiPatt...
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation Pipeline
 
NLP on a Billion Documents: Scalable Machine Learning with Apache Spark
NLP on a Billion Documents: Scalable Machine Learning with Apache SparkNLP on a Billion Documents: Scalable Machine Learning with Apache Spark
NLP on a Billion Documents: Scalable Machine Learning with Apache Spark
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei Milovidov
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
 
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
 

Similar to On Beyond (PostgreSQL) Data Types

Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...GeeksLab Odessa
 
SQL: Query optimization in practice
SQL: Query optimization in practiceSQL: Query optimization in practice
SQL: Query optimization in practiceJano Suchal
 
Sparse Matrix and Polynomial
Sparse Matrix and PolynomialSparse Matrix and Polynomial
Sparse Matrix and PolynomialAroosa Rajput
 
Dive into EXPLAIN - PostgreSql
Dive into EXPLAIN  - PostgreSqlDive into EXPLAIN  - PostgreSql
Dive into EXPLAIN - PostgreSqlDmytro Shylovskyi
 
Write Python for Speed
Write Python for SpeedWrite Python for Speed
Write Python for SpeedYung-Yu Chen
 
PostgreSQL10の新機能 ~ロジカルレプリケーションを中心に~
PostgreSQL10の新機能 ~ロジカルレプリケーションを中心に~PostgreSQL10の新機能 ~ロジカルレプリケーションを中心に~
PostgreSQL10の新機能 ~ロジカルレプリケーションを中心に~Atsushi Torikoshi
 
An overview of Python 2.7
An overview of Python 2.7An overview of Python 2.7
An overview of Python 2.7decoupled
 
Row patternmatching12ctech14
Row patternmatching12ctech14Row patternmatching12ctech14
Row patternmatching12ctech14stewashton
 
ぐだ生 Java入門第三回(文字コードの話)(Keynote版)
ぐだ生 Java入門第三回(文字コードの話)(Keynote版)ぐだ生 Java入門第三回(文字コードの話)(Keynote版)
ぐだ生 Java入門第三回(文字コードの話)(Keynote版)Makoto Yamazaki
 
Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++Satalia
 
CS101- Introduction to Computing- Lecture 35
CS101- Introduction to Computing- Lecture 35CS101- Introduction to Computing- Lecture 35
CS101- Introduction to Computing- Lecture 35Bilal Ahmed
 
Row Pattern Matching in Oracle Database 12c
Row Pattern Matching in Oracle Database 12cRow Pattern Matching in Oracle Database 12c
Row Pattern Matching in Oracle Database 12cStew Ashton
 
Postgres can do THAT?
Postgres can do THAT?Postgres can do THAT?
Postgres can do THAT?alexbrasetvik
 
Артём Акуляков - F# for Data Analysis
Артём Акуляков - F# for Data AnalysisАртём Акуляков - F# for Data Analysis
Артём Акуляков - F# for Data AnalysisSpbDotNet Community
 
Introduction to R
Introduction to RIntroduction to R
Introduction to RHappy Garg
 
Computer Graphics Unit 1
Computer Graphics Unit 1Computer Graphics Unit 1
Computer Graphics Unit 1aravindangc
 
Lecture 3.1 to 3.2 bt
Lecture 3.1 to 3.2 btLecture 3.1 to 3.2 bt
Lecture 3.1 to 3.2 btbtmathematics
 

Similar to On Beyond (PostgreSQL) Data Types (20)

Big datacourse
Big datacourseBig datacourse
Big datacourse
 
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
 
SQL: Query optimization in practice
SQL: Query optimization in practiceSQL: Query optimization in practice
SQL: Query optimization in practice
 
Sparse Matrix and Polynomial
Sparse Matrix and PolynomialSparse Matrix and Polynomial
Sparse Matrix and Polynomial
 
Dive into EXPLAIN - PostgreSql
Dive into EXPLAIN  - PostgreSqlDive into EXPLAIN  - PostgreSql
Dive into EXPLAIN - PostgreSql
 
Write Python for Speed
Write Python for SpeedWrite Python for Speed
Write Python for Speed
 
PostgreSQL10の新機能 ~ロジカルレプリケーションを中心に~
PostgreSQL10の新機能 ~ロジカルレプリケーションを中心に~PostgreSQL10の新機能 ~ロジカルレプリケーションを中心に~
PostgreSQL10の新機能 ~ロジカルレプリケーションを中心に~
 
An overview of Python 2.7
An overview of Python 2.7An overview of Python 2.7
An overview of Python 2.7
 
A tour of Python
A tour of PythonA tour of Python
A tour of Python
 
Row patternmatching12ctech14
Row patternmatching12ctech14Row patternmatching12ctech14
Row patternmatching12ctech14
 
ぐだ生 Java入門第三回(文字コードの話)(Keynote版)
ぐだ生 Java入門第三回(文字コードの話)(Keynote版)ぐだ生 Java入門第三回(文字コードの話)(Keynote版)
ぐだ生 Java入門第三回(文字コードの話)(Keynote版)
 
Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++
 
CS101- Introduction to Computing- Lecture 35
CS101- Introduction to Computing- Lecture 35CS101- Introduction to Computing- Lecture 35
CS101- Introduction to Computing- Lecture 35
 
Row Pattern Matching in Oracle Database 12c
Row Pattern Matching in Oracle Database 12cRow Pattern Matching in Oracle Database 12c
Row Pattern Matching in Oracle Database 12c
 
Postgres can do THAT?
Postgres can do THAT?Postgres can do THAT?
Postgres can do THAT?
 
Артём Акуляков - F# for Data Analysis
Артём Акуляков - F# for Data AnalysisАртём Акуляков - F# for Data Analysis
Артём Акуляков - F# for Data Analysis
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
dld 01-introduction
dld 01-introductiondld 01-introduction
dld 01-introduction
 
Computer Graphics Unit 1
Computer Graphics Unit 1Computer Graphics Unit 1
Computer Graphics Unit 1
 
Lecture 3.1 to 3.2 bt
Lecture 3.1 to 3.2 btLecture 3.1 to 3.2 bt
Lecture 3.1 to 3.2 bt
 

More from Jonathan Katz

Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)Jonathan Katz
 
Vectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLVectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLJonathan Katz
 
Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15Jonathan Katz
 
Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Jonathan Katz
 
High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!Jonathan Katz
 
Get Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMGet Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMJonathan Katz
 
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMSafely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMJonathan Katz
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesJonathan Katz
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationJonathan Katz
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Jonathan Katz
 
An Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesJonathan Katz
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesJonathan Katz
 
Indexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data TypesIndexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data TypesJonathan Katz
 

More from Jonathan Katz (13)

Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)
 
Vectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLVectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQL
 
Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15
 
Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!
 
High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!
 
Get Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMGet Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAM
 
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMSafely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with Kubernetes
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management Application
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018
 
An Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & Kubernetes
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
 
Indexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data TypesIndexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data Types
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Recently uploaded (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

On Beyond (PostgreSQL) Data Types

  • 19. PostgreSQL Geometric Types
    Name      Size             Representation                                        Format
    point     16 bytes         point on a plane                                      (x,y)
    lseg      32 bytes         finite line segment                                   ((x1,y1),(x2,y2))
    box       32 bytes         rectangular box                                       ((x1,y1),(x2,y2))
    path      16 + 16n bytes   closed path (similar to polygon); n = total points    ((x1,y1),(x2,y2),…,(xn,yn))
    path      16 + 16n bytes   open path; n = total points                           [(x1,y1),(x2,y2),…,(xn,yn)]
    polygon   40 + 16n bytes   polygon                                               ((x1,y1),(x2,y2),…,(xn,yn))
    circle    24 bytes         circle (center point and radius)                      <(x,y),r>
    http://www.postgresql.org/docs/current/static/datatype-geometric.html
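The sizes in the table follow a simple pattern: every coordinate is an 8-byte double precision value, so a point costs 16 bytes and each extra vertex of a path or polygon adds another 16. The Python sketch below only reproduces the table's arithmetic. The fixed 16- and 40-byte parts for path and polygon are copied from the table rather than derived, and the helper name `geometric_size` is hypothetical.

```python
# Hypothetical helper reproducing the storage sizes listed in the table above.
FLOAT8 = 8            # bytes in one double precision coordinate
POINT = 2 * FLOAT8    # an (x, y) pair: 16 bytes


def geometric_size(kind: str, n_points: int = 0) -> int:
    """Return the size in bytes listed in the table for a geometric type.

    n_points is the table's n and only matters for path and polygon.
    The 16- and 40-byte fixed parts are copied from the table, not derived.
    """
    fixed = {
        "point": POINT,            # (x,y)
        "lseg": 2 * POINT,         # two endpoints
        "box": 2 * POINT,          # two opposite corners
        "circle": POINT + FLOAT8,  # center point plus radius
    }
    if kind in fixed:
        return fixed[kind]
    if kind == "path":
        return 16 + POINT * n_points
    if kind == "polygon":
        return 40 + POINT * n_points
    raise ValueError(f"unknown geometric type: {kind}")


for kind in ("point", "lseg", "box", "circle"):
    print(kind, geometric_size(kind))
print("path, 4 points:", geometric_size("path", 4))        # 16 + 4 * 16 = 80
print("polygon, 4 points:", geometric_size("polygon", 4))  # 40 + 4 * 16 = 104
```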
  • 20. Geometric Operators
    • 31 different operators built into PostgreSQL
    Translation:
      obdt=# SELECT point(1,1) + point(2,2);
      ----------
       (3,3)
    Equivalence:
      obdt=# SELECT point(1,1) ~= point(2,2);
      ----------
       f
      obdt=# SELECT point(1,1) ~= point(1,1);
      ----------
       t
    Distance:
      obdt=# SELECT point(1,1) <-> point(4,4);
      ------------------
       4.24264068711929
    http://www.postgresql.org/docs/current/static/functions-geometry.html
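To make these three operators concrete, here is a minimal Python analogue (not PostgreSQL code) of `point + point`, `~=` and `<->`. The tolerance used for `~=` is an assumption: PostgreSQL compares geometric values with a small fuzz factor, and 1e-6 is used here as an approximation of that behavior.

```python
import math

Point = tuple[float, float]

# Assumed tolerance approximating PostgreSQL's fuzzy geometric comparison.
EPSILON = 1e-6


def translate(p: Point, q: Point) -> Point:
    """Analogue of point + point: component-wise addition."""
    return (p[0] + q[0], p[1] + q[1])


def same_as(p: Point, q: Point) -> bool:
    """Analogue of ~=: both coordinates equal within EPSILON."""
    return all(math.isclose(a, b, rel_tol=0.0, abs_tol=EPSILON) for a, b in zip(p, q))


def distance(p: Point, q: Point) -> float:
    """Analogue of <->: Euclidean distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


print(translate((1, 1), (2, 2)))  # (3, 3)
print(same_as((1, 1), (2, 2)))    # False
print(same_as((1, 1), (1, 1)))    # True
print(distance((1, 1), (4, 4)))   # sqrt(18); psql prints it rounded
```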
  • 21. Geometric Operators
    Overlapping:
      obdt=# SELECT '((0,0),5)'::circle && '((2,2),3)'::circle;
      ----------
       t
    Containment:
      obdt=# SELECT '((0,0),5)'::circle @> point(2,2);
      ----------
       t
    Is Parallel?
      obdt=# SELECT '((0,0), (1,1))'::lseg ?|| '((1,-1), (2,0))'::lseg;
      ----------
       t
    Intersection:
      obdt=# SELECT '((0,0), (1,1))'::lseg ?# '((0,0), (5,5))'::box;
      ----------
       t
    http://www.postgresql.org/docs/current/static/functions-geometry.html
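Each predicate on this slide reduces to a short geometric test. The Python functions below are hypothetical analogues written for illustration. Circle overlap compares the distance between centers with the sum of the radii, and containment compares it with the radius. Parallelism uses a cross-product test, whereas PostgreSQL itself compares slopes. Segment/box intersection uses Liang-Barsky clipping, which is one standard technique and not necessarily how PostgreSQL computes `?#`. Boxes are passed as ordered `(xmin, ymin)` and `(xmax, ymax)` corners, which is an assumption of the sketch.

```python
import math

Point = tuple[float, float]


def circles_overlap(c1: Point, r1: float, c2: Point, r2: float) -> bool:
    """Analogue of circle && circle: centers no farther apart than r1 + r2."""
    return math.dist(c1, c2) <= r1 + r2


def circle_contains_point(c: Point, r: float, p: Point) -> bool:
    """Analogue of circle @> point."""
    return math.dist(c, p) <= r


def segments_parallel(a1: Point, a2: Point, b1: Point, b2: Point) -> bool:
    """Analogue of lseg ?|| lseg: direction vectors have a zero cross product."""
    cross = (a2[0] - a1[0]) * (b2[1] - b1[1]) - (a2[1] - a1[1]) * (b2[0] - b1[0])
    return math.isclose(cross, 0.0, abs_tol=1e-9)


def segment_intersects_box(p1: Point, p2: Point, lo: Point, hi: Point) -> bool:
    """Analogue of lseg ?# box using Liang-Barsky clipping.

    The segment is p1 + t * (p2 - p1) for t in [0, 1]; each box edge narrows
    the allowed t range, and the segment hits the box if a range survives.
    """
    t0, t1 = 0.0, 1.0
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    for p, q in ((-dx, p1[0] - lo[0]), (dx, hi[0] - p1[0]),
                 (-dy, p1[1] - lo[1]), (dy, hi[1] - p1[1])):
        if p == 0:
            if q < 0:
                return False  # parallel to this edge and outside it
        elif p < 0:
            t0 = max(t0, q / p)  # segment enters the half-plane here
        else:
            t1 = min(t1, q / p)  # segment leaves the half-plane here
    return t0 <= t1


print(circles_overlap((0, 0), 5, (2, 2), 3))                   # True
print(circle_contains_point((0, 0), 5, (2, 2)))                # True
print(segments_parallel((0, 0), (1, 1), (1, -1), (2, 0)))      # True
print(segment_intersects_box((0, 0), (1, 1), (0, 0), (5, 5)))  # True
```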
  • 22. Geometric Functions
    • 13 non-type conversion functions built into PostgreSQL
    Area:
      obdt=# SELECT area('((0,0),5)'::circle);
      ------------------
       78.5398163397448
    Center:
      obdt=# SELECT center('((0,0),(5,5))'::box);
      -----------
       (2.5,2.5)
    Length:
      obdt=# SELECT length('((0,0),(5,5))'::lseg);
      ------------------
       7.07106781186548
    Width:
      obdt=# SELECT width('((0,0),(3,2))'::box);
      -------
       3
    Height:
      obdt=# SELECT height('((0,0),(3,2))'::box);
      --------
       2
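Each of these functions is ordinary plane geometry. A small Python sketch with hypothetical helper names reproduces the values on the slide. Boxes are passed as two opposite corners.

```python
import math

Point = tuple[float, float]


def circle_area(radius: float) -> float:
    """Analogue of area(circle): pi * r^2."""
    return math.pi * radius ** 2


def box_center(c1: Point, c2: Point) -> Point:
    """Analogue of center(box): midpoint of the two corners."""
    return ((c1[0] + c2[0]) / 2, (c1[1] + c2[1]) / 2)


def lseg_length(p1: Point, p2: Point) -> float:
    """Analogue of length(lseg): distance between the endpoints."""
    return math.dist(p1, p2)


def box_width(c1: Point, c2: Point) -> float:
    """Analogue of width(box): horizontal extent."""
    return abs(c2[0] - c1[0])


def box_height(c1: Point, c2: Point) -> float:
    """Analogue of height(box): vertical extent."""
    return abs(c2[1] - c1[1])


print(circle_area(5))
print(box_center((0, 0), (5, 5)))   # (2.5, 2.5)
print(lseg_length((0, 0), (5, 5)))
print(box_width((0, 0), (3, 2)))    # 3
print(box_height((0, 0), (3, 2)))   # 2
```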
  • 23. Geometric Performance
    • Size on disk
    • Consider I/O on reads
    • But indexing should help!!
  • 24. Geometric Performance
      CREATE TABLE houses (plot box);

      INSERT INTO houses
      SELECT box(
          point((500 * random())::int, (500 * random())::int),
          point((750 * random() + 500)::int, (750 * random() + 500)::int)
      )
      FROM generate_series(1, 1000000);

      obdt=# CREATE INDEX houses_plot_idx ON houses (plot);
      ERROR: data type box has no default operator class for access method "btree"
      HINT: You must specify an operator class for the index or define a default operator class for the data type.
  • 25. Solution #1: Expression Indexes
      obdt=# EXPLAIN ANALYZE SELECT * FROM houses WHERE area(plot) BETWEEN 50000 AND 75000;
      -------------
       Seq Scan on houses (cost=0.00..27353.00 rows=5000 width=32) (actual time=0.077..214.431 rows=26272 loops=1)
         Filter: ((area(plot) >= 50000::double precision) AND (area(plot) <= 75000::double precision))
         Rows Removed by Filter: 973728
       Total runtime: 215.965 ms

      obdt=# CREATE INDEX houses_plot_area_idx ON houses (area(plot));

      obdt=# EXPLAIN ANALYZE SELECT * FROM houses WHERE area(plot) BETWEEN 50000 AND 75000;
      ------------
       Bitmap Heap Scan on houses (cost=107.68..7159.38 rows=5000 width=32) (actual time=5.433..14.686 rows=26272 loops=1)
         Recheck Cond: ((area(plot) >= 50000::double precision) AND (area(plot) <= 75000::double precision))
         -> Bitmap Index Scan on houses_plot_area_idx (cost=0.00..106.43 rows=5000 width=0) (actual time=4.300..4.300 rows=26272 loops=1)
               Index Cond: ((area(plot) >= 50000::double precision) AND (area(plot) <= 75000::double precision))
       Total runtime: 16.025 ms
    http://www.postgresql.org/docs/current/static/indexes-expressional.html
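Here is why the expression index helps. PostgreSQL computes `area(plot)` once per row when it builds the index and stores the results in sorted B-tree order. A `BETWEEN` query then becomes a range lookup instead of a full scan. The Python sketch below models that idea with a sorted list and binary search. It is an analogy only, not how a B-tree is laid out on disk. The row count is an assumed smaller value, and the data generator only approximates the slide's INSERT.

```python
import bisect
import random

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(b: Box) -> float:
    """Same quantity as area(box) in PostgreSQL: width * height."""
    return abs(b[2] - b[0]) * abs(b[3] - b[1])


def build_expression_index(rows: list[Box]) -> list[tuple[float, int]]:
    """Compute area once per row and keep (area, row_id) pairs sorted,
    roughly what a B-tree on the expression area(plot) stores."""
    return sorted((area(b), row_id) for row_id, b in enumerate(rows))


def area_between(index: list[tuple[float, int]], lo: float, hi: float) -> list[int]:
    """Row ids with lo <= area <= hi, located by binary search, not a scan."""
    start = bisect.bisect_left(index, (lo, -1))
    end = bisect.bisect_right(index, (hi, float("inf")))
    return [row_id for _, row_id in index[start:end]]


# Assumed smaller row count; corner ranges follow the slide's INSERT.
random.seed(0)
houses = [
    (random.randint(0, 500), random.randint(0, 500),
     random.randint(500, 1250), random.randint(500, 1250))
    for _ in range(10_000)
]
index = build_expression_index(houses)
hits = area_between(index, 50_000, 75_000)
print(len(hits), "matching rows")
```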
  • 26. Solution #2: GiST Indexes
      obdt=# EXPLAIN ANALYZE SELECT * FROM houses WHERE plot @> '((100,100),(300,300))'::box;
      ------------
       Seq Scan on houses (cost=0.00..19853.00 rows=1000 width=32) (actual time=0.009..96.680 rows=40520 loops=1)
         Filter: (plot @> '(300,300),(100,100)'::box)
         Rows Removed by Filter: 959480
       Total runtime: 98.662 ms

      obdt=# CREATE INDEX houses_plot_gist_idx ON houses USING gist(plot);

      obdt=# EXPLAIN ANALYZE SELECT * FROM houses WHERE plot @> '((100,100),(300,300))'::box;
      ------------
       Bitmap Heap Scan on houses (cost=56.16..2813.20 rows=1000 width=32) (actual time=12.053..24.468 rows=40520 loops=1)
         Recheck Cond: (plot @> '(300,300),(100,100)'::box)
         -> Bitmap Index Scan on houses_plot_gist_idx (cost=0.00..55.91 rows=1000 width=0) (actual time=10.700..10.700 rows=40520 loops=1)
               Index Cond: (plot @> '(300,300),(100,100)'::box)
       Total runtime: 26.451 ms
    http://www.postgresql.org/docs/current/static/indexes-types.html
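A GiST index on boxes groups entries under bounding boxes and skips any group whose bounding box cannot satisfy the query. The Python sketch below is a deliberately simplified, single-level version of that idea for `@>` (contains). The page size, the sort-based grouping and all function names are assumptions for illustration. They do not reflect PostgreSQL's actual GiST split or storage logic.

```python
import random

Box = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def contains(outer: Box, inner: Box) -> bool:
    """Analogue of box @> box."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def union(boxes: list[Box]) -> Box:
    """Smallest box enclosing every box in the group."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def build_pages(rows: list[Box], page_size: int = 64) -> list[tuple[Box, list[int]]]:
    """Group row ids into fixed-size pages, each summarised by a bounding box.
    Sorting by lower-left corner keeps pages compact; real GiST splits differ."""
    order = sorted(range(len(rows)), key=lambda i: (rows[i][0], rows[i][1]))
    pages = []
    for start in range(0, len(order), page_size):
        ids = order[start:start + page_size]
        pages.append((union([rows[i] for i in ids]), ids))
    return pages


def search_contains(rows: list[Box], pages: list[tuple[Box, list[int]]], query: Box) -> list[int]:
    """Row ids whose box contains query.

    Every member box lies inside its page's bounding box, so if that
    bounding box does not contain query, no member can: the page is skipped.
    Pages that survive are checked row by row.
    """
    hits = []
    for page_box, ids in pages:
        if contains(page_box, query):
            hits.extend(i for i in ids if contains(rows[i], query))
    return hits


random.seed(1)
houses = [
    (random.randint(0, 500), random.randint(0, 500),
     random.randint(500, 1250), random.randint(500, 1250))
    for _ in range(5_000)
]
pages = build_pages(houses)
query = (100, 100, 300, 300)
hits = search_contains(houses, pages, query)
skipped = sum(1 for page_box, _ in pages if not contains(page_box, query))
print(len(hits), "matches;", skipped, "of", len(pages), "pages skipped")
```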
  • 27. Solution #2+: KNN-GiST
      obdt=# CREATE TABLE locations (geocode point);

      obdt=# INSERT INTO locations
      SELECT point(90 * random(), 180 * random())
      FROM generate_series(1, 1000000);

      obdt=# EXPLAIN ANALYZE SELECT * FROM locations ORDER BY geocode <-> point(41.88853,-87.628852) LIMIT 10;
      ------------
       Limit (cost=39519.39..39519.42 rows=10 width=16) (actual time=319.306..319.309 rows=10 loops=1)
         -> Sort (cost=39519.39..42019.67 rows=1000110 width=16) (actual time=319.305..319.307 rows=10 loops=1)
               Sort Key: ((geocode <-> '(41.88853,-87.628852)'::point))
               Sort Method: top-N heapsort Memory: 25kB
               -> Seq Scan on locations (cost=0.00..17907.38 rows=1000110 width=16) (actual time=0.019..189.687 rows=1000000 loops=1)
       Total runtime: 319.332 ms

      obdt=# CREATE INDEX locations_geocode_gist_idx ON locations USING gist(geocode);

      obdt=# EXPLAIN ANALYZE SELECT * FROM locations ORDER BY geocode <-> point(41.88853,-87.628852) LIMIT 10;
      ------------
       Limit (cost=0.29..1.06 rows=10 width=16) (actual time=0.098..0.235 rows=10 loops=1)
         -> Index Scan using locations_geocode_gist_idx on locations (cost=0.29..77936.29 rows=1000000 width=16) (actual time=0.097..0.234 rows=10 loops=1)
               Order By: (geocode <-> '(41.88853,-87.628852)'::point)
       Total runtime: 0.257 ms
    http://www.slideshare.net/jkatz05/knn-39127023
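Without the index, the plan has to compute the distance to every row and keep the 10 best with a top-N heapsort. The Python sketch below does the same kind of work with `heapq.nsmallest`, so its cost grows with table size. A KNN-GiST index avoids that by walking the index in distance order, which this sketch does not attempt to model. The generator mirrors the slide's INSERT at an assumed smaller row count.

```python
import heapq
import math
import random

Point = tuple[float, float]


def nearest(points: list[Point], target: Point, k: int = 10) -> list[Point]:
    """The k points closest to target by Euclidean distance (like <->).

    heapq.nsmallest keeps only k candidates while it scans every point,
    the same shape of work as the top-N heapsort in the plan above.
    """
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, target))


# Same generator as the slide's INSERT, with an assumed smaller row count.
random.seed(2)
locations = [(90 * random.random(), 180 * random.random()) for _ in range(100_000)]
target = (41.88853, -87.628852)
top10 = nearest(locations, target)
for p in top10:
    print(p, round(math.dist(p, target), 3))
```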
  • 28. Solution #3: PostGIS
    • For when you are doing real things with shapes
    • (and geographic information systems)
  • 29. For more on PostGIS, please go back in time to yesterday and see Regina & Leo's tutorial
  • 30. Let's Take a Break With UUIDs
    2024e06c-44ff-5047-b1ae-00def276d043
  • 31. UUID + PostgreSQL
    • Universally Unique Identifiers
    • 16 bytes on disk
    • Acceptable input formats include:
      – A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
      – {a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
      – a0eebc999c0b4ef8bb6d6bb9bd380a11
      – a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
      – {a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}
    http://www.postgresql.org/docs/current/static/datatype-uuid.html
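All five spellings on this slide denote the same 128-bit value. Python's standard `uuid` module can show that. Note that `uuid.UUID` is more permissive than PostgreSQL's parser: it strips the outer braces and every hyphen. So this sketch only illustrates the normalization, not PostgreSQL's exact input rules.

```python
import uuid

inputs = [
    "A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11",
    "{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}",
    "a0eebc999c0b4ef8bb6d6bb9bd380a11",
    "a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11",
    "{a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}",
]

# Each input parses to the same value; str() prints the canonical lowercase form.
canonical = {str(uuid.UUID(s)) for s in inputs}
raw_size = len(uuid.UUID(inputs[0]).bytes)  # the 16 bytes stored on disk
```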
  • 32. UUID Functions
    http://www.postgresql.org/docs/current/static/uuid-ossp.html
    obdt=# CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
    obdt=# SELECT uuid_generate_v1();
    uuid_generate_v1
    --------------------------------------
    d2729728-3d50-11e4-b1af-005056b75e1e
    obdt=# SELECT uuid_generate_v1mc();
    uuid_generate_v1mc
    --------------------------------------
    e04668a2-3d50-11e4-b1b0-1355d5584528
    obdt=# SELECT uuid_generate_v3(uuid_ns_url(), 'http://www.postgresopen.org');
    uuid_generate_v3
    --------------------------------------
    d0bc1ba2-bf07-312f-bf6a-436e18b5b046
    obdt=# SELECT uuid_generate_v4();
    uuid_generate_v4
    --------------------------------------
    0809d8fe-512c-4f02-ba37-bc2e9865e884
    obdt=# SELECT uuid_generate_v5(uuid_ns_url(), 'http://www.postgresopen.org');
    uuid_generate_v5
    --------------------------------------
    d508c779-da5c-5998-bd88-8d76d446754e
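The versions differ in where their bits come from. v1 uses time and node, v4 is random, and v3 and v5 hash a namespace plus a name, so v3 and v5 are deterministic. The Python `uuid` module provides the same RFC 4122 versions. This sketch assumes that `uuid_ns_url()` returns the RFC 4122 URL namespace, which Python exposes as `uuid.NAMESPACE_URL`.

```python
import uuid

URL = "http://www.postgresopen.org"

v1 = uuid.uuid1()                          # time + node based, cf. uuid_generate_v1()
v3 = uuid.uuid3(uuid.NAMESPACE_URL, URL)   # MD5 of namespace + name, cf. uuid_generate_v3()
v4 = uuid.uuid4()                          # random, cf. uuid_generate_v4()
v5 = uuid.uuid5(uuid.NAMESPACE_URL, URL)   # SHA-1 of namespace + name, cf. uuid_generate_v5()

# Name-based UUIDs are reproducible: the same namespace and name give the same value.
v3_again = uuid.uuid3(uuid.NAMESPACE_URL, URL)
```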
  • 33. Network Address Types
    • inet (IPv4 & IPv6)
      – SELECT '192.168.1.1'::inet;
      – SELECT '192.168.1.1/32'::inet;
      – SELECT '192.168.1.1/24'::inet;
    • cidr (IPv4 & IPv6)
      – SELECT '192.168.1.1'::cidr;
      – SELECT '192.168.1.1/32'::cidr;
      – SELECT '192.168.1.1/24'::cidr; -- ERROR: a cidr value may not have bits set to the right of the netmask
    • macaddr
      – SELECT '08:00:2b:01:02:03'::macaddr;
    http://www.postgresql.org/docs/current/static/datatype-net-types.html
  • 34. Networks can do Math
    http://www.postgresql.org/docs/current/static/functions-net.html
  • 35. Postgres Can Help Manage Your Routing Tables
    http://www.postgresql.org/docs/current/static/functions-net.html
    ...perhaps, with a foreign data wrapper and a background worker, it could even fully manage your routing tables?
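Python's `ipaddress` module offers rough analogues that make the inet/cidr distinction and the "network math" concrete. `ip_interface` behaves like inet: a host address that also remembers its netmask. A strict `ip_network` behaves like cidr. The pairing with specific PostgreSQL operators in the comments is an approximation for illustration, and `accepted_as_cidr` is a hypothetical helper.

```python
import ipaddress

host = ipaddress.ip_interface("192.168.1.1/24")   # inet-like: address + netmask
net = ipaddress.ip_network("192.168.1.0/24")      # cidr-like: a network


def accepted_as_cidr(text):
    """ip_network() is strict by default, so like cidr it rejects
    host bits set to the right of the netmask."""
    try:
        ipaddress.ip_network(text)
        return True
    except ValueError:
        return False


next_host = host.ip + 5                                  # cf. inet + integer
same_network = host.network == net                       # cf. network(inet)
inside = ipaddress.ip_address("192.168.1.200") in net    # cf. inet << cidr
```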
  • 36. Arrays
    • ...because a database is an "array" of tuples
    • ...and a "tuple" is kind of like an array
    • ...can we have an array within a tuple?
  • 38. Array Facts
    • Arrays are 1-indexed
      obdt=# SELECT (ARRAY[1,2,3])[1];
      -------
      1
      obdt=# SELECT (ARRAY[1,2,3])[0];
      -------
      (empty: NULL, because there is no element 0)
    • Size constraints are not enforced
      obdt=# CREATE TABLE lotto (
               numbers int[3]
             );
      obdt=# INSERT INTO lotto VALUES (
               ARRAY[1,2,3,4]
             );
      obdt=# SELECT * FROM lotto;
      -----------
      {1,2,3,4}
  • 39. Arrays Are Malleable
    obdt=# UPDATE lotto SET numbers = ARRAY[1,2,3];
    obdt=# SELECT * FROM lotto;
    ---------
    {1,2,3}
    obdt=# UPDATE lotto SET numbers[3] = '7';
    obdt=# SELECT * FROM lotto;
    ---------
    {1,2,7}
    obdt=# UPDATE lotto SET numbers[1:2] = ARRAY[6,5];
    obdt=# SELECT * FROM lotto;
    ---------
    {6,5,7}
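The two facts that most often trip up application code are 1-based subscripts and inclusive slice bounds. Here is a small Python sketch that translates them onto 0-based lists. `pg_get`, `pg_set` and `pg_set_slice` are hypothetical helpers. This sketch handles only in-bounds, same-size assignments, although PostgreSQL can also enlarge a one-dimensional array by assigning past its end.

```python
def pg_get(arr, i):
    """1-based subscript; an out-of-range index yields None (SQL NULL), not an error."""
    return arr[i - 1] if 1 <= i <= len(arr) else None


def pg_set(arr, i, value):
    """numbers[i] = value for an in-bounds 1-based index; returns a modified copy."""
    if not 1 <= i <= len(arr):
        raise IndexError("this sketch only handles in-bounds assignment")
    out = list(arr)
    out[i - 1] = value
    return out


def pg_set_slice(arr, lo, hi, values):
    """numbers[lo:hi] = values, where lo and hi are 1-based and both inclusive."""
    if not 1 <= lo <= hi <= len(arr) or len(values) != hi - lo + 1:
        raise ValueError("this sketch only handles in-bounds, same-size slices")
    out = list(arr)
    out[lo - 1:hi] = values  # Python slices are 0-based and exclusive at the end
    return out
```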
  • 40. Array Operations
    • <, <=, =, >=, >, <>
      – full array comparisons
      – B-tree indexable
    • Containment
      SELECT ARRAY[1,2,3] @> ARRAY[1,2];
      SELECT ARRAY[1,2] <@ ARRAY[1,2,3];
    • Concatenation
      SELECT ARRAY[1,2,3] || ARRAY[3,4,5];
      SELECT ARRAY[ARRAY[1,2]] || ARRAY[3,4];
      SELECT ARRAY[1,2,3] || 4;
    • Overlaps
      SELECT ARRAY[1,2,3] && ARRAY[3,4,5];
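Containment and overlap ignore element order and duplicates, so they behave like set operations. Concatenation is plain list joining. Here is a Python sketch for one-dimensional arrays without NULLs. The function names are hypothetical.

```python
def arr_contains(a, b):       # a @> b: every element of b appears in a
    return set(b) <= set(a)


def arr_contained_by(a, b):   # a <@ b: every element of a appears in b
    return set(a) <= set(b)


def arr_overlaps(a, b):       # a && b: at least one element in common
    return bool(set(a) & set(b))


def arr_concat(a, b):         # a || b for two one-dimensional arrays
    return list(a) + list(b)
```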
  • 41. Integer Arrays Use GIN
    obdt=# CREATE INDEX int_arrays_data_gin_idx ON int_arrays USING GIN(data);
    obdt=# EXPLAIN ANALYZE SELECT *
           FROM int_arrays
           WHERE 5432 = ANY (data);
    ---------------
    Seq Scan on int_arrays (cost=0.00..30834.00 rows=5000 width=33) (actual time=1.237..157.397 rows=3 loops=1)
      Filter: (5432 = ANY (data))
      Rows Removed by Filter: 999997
    Total runtime: 157.419 ms
    • "= ANY" is not an operator the GIN index supports, so the index is ignored; writing the same test with <@ lets the planner use it
    obdt=# EXPLAIN ANALYZE SELECT * FROM int_arrays WHERE ARRAY[5432] <@ data;
    ---------------
    Bitmap Heap Scan on int_arrays (cost=70.75..7680.14 rows=5000 width=33) (actual time=0.023..0.024 rows=3 loops=1)
      Recheck Cond: ('{5432}'::integer[] <@ data)
      -> Bitmap Index Scan on int_arrays_data_gin_idx (cost=0.00..69.50 rows=5000 width=0) (actual time=0.019..0.019 rows=3 loops=1)
           Index Cond: ('{5432}'::integer[] <@ data)
    Total runtime: 0.090 ms
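GIN is an inverted index. It maps each element to the rows that contain it, so a containment query becomes an intersection of a few posting lists instead of a scan over a million rows. Below is a toy in-memory Python sketch of that idea. It is not how PostgreSQL lays out GIN on disk, and `build_inverted_index` and `rows_containing` are hypothetical names.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each array element to the set of row ids whose array contains it."""
    index = defaultdict(set)
    for row_id, data in rows.items():
        for element in data:
            index[element].add(row_id)
    return index


def rows_containing(index, query):
    """Rows where data @> query (i.e. query <@ data): intersect the posting
    lists of the query's elements instead of scanning every row."""
    elements = set(query)
    if not elements:
        raise ValueError("empty queries are not handled by this sketch")
    postings = [index.get(e, set()) for e in elements]
    return set.intersection(*postings)


rows = {1: [1, 2, 3], 2: [5432, 7], 3: [7, 8], 4: [8, 5432, 7]}
index = build_inverted_index(rows)
```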
  • 42. Array Functions
    • modification
      SELECT array_append(ARRAY[1,2,3], 4);           -- {1,2,3,4}
      SELECT array_prepend(1, ARRAY[2,3,4]);          -- {1,2,3,4}
      SELECT array_cat(ARRAY[1,2], ARRAY[3,4]);       -- {1,2,3,4}
      SELECT array_remove(ARRAY[1,2,1,3], 1);         -- {2,3}
      SELECT array_replace(ARRAY[1,2,1,3], 1, -4);    -- {-4,2,-4,3}
    • size
      SELECT array_length(ARRAY[1,2,3,4], 1);               -- 4
      SELECT array_ndims(ARRAY[ARRAY[1,2], ARRAY[3,4]]);    -- 2
      SELECT array_dims(ARRAY[ARRAY[1,2], ARRAY[3,4]]);     -- [1:2][1:2]
    http://www.postgresql.org/docs/current/static/functions-array.html
  • 43. Array Functions
    • Array to String
      obdt=# SELECT array_to_string(ARRAY[1,2,NULL,4], ',', '*');
      -----------------
      1,2,*,4
    • Array to Set
      obdt=# SELECT unnest(ARRAY[1,2,3]);
      unnest
      --------
      1
      2
      3
    http://www.postgresql.org/docs/current/static/functions-array.html
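Here are Python equivalents of three of these functions, with `None` standing in for SQL NULL. The main subtlety is in `array_to_string`: NULL elements are skipped unless a replacement string is supplied. These are illustrative re-implementations, not the PostgreSQL source.

```python
def array_remove(arr, value):
    """Drop every element equal to value."""
    return [x for x in arr if x != value]


def array_replace(arr, old, new):
    """Replace every element equal to old with new."""
    return [new if x == old else x for x in arr]


def array_to_string(arr, sep, null_string=None):
    """Join elements as text; NULLs are omitted unless null_string is given."""
    parts = []
    for x in arr:
        if x is None:
            if null_string is not None:
                parts.append(null_string)
        else:
            parts.append(str(x))
    return sep.join(parts)
```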
  • 44. array_agg
    • useful for variable-length lists or an "unknown number of columns"
    obdt=# SELECT
             t.title,
             array_agg(s.full_name)
           FROM talk t
           JOIN speakers_talks st ON st.talk_id = t.id
           JOIN speaker s ON s.id = st.speaker_id
           GROUP BY t.title;
         title      |       array_agg
    ----------------+------------------------
    Data Types      | {Jonathan,Jim}
    Administration  | {Bruce}
    User Groups     | {Josh,Jonathan,Magnus}
    http://www.postgresql.org/docs/current/static/functions-array.html
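`array_agg` collapses each group into a single list-valued column. Here is a Python sketch of the same grouping over (title, speaker) pairs. The pairs are reconstructed from the output above, and `array_agg` here is a hypothetical helper. Without an `ORDER BY` inside the aggregate, PostgreSQL does not guarantee element order, whereas this sketch simply keeps input order.

```python
from collections import defaultdict


def array_agg(pairs):
    """Group (key, value) pairs into {key: [values...]},
    like SELECT key, array_agg(value) ... GROUP BY key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


pairs = [
    ("Data Types", "Jonathan"), ("Data Types", "Jim"),
    ("Administration", "Bruce"),
    ("User Groups", "Josh"), ("User Groups", "Jonathan"), ("User Groups", "Magnus"),
]
talks = array_agg(pairs)
```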
  • 45. Ranges
    • Scheduling
    • Probability
    • Measurements
    • Financial applications
    • Clinical trial data
    • Intersections of ordered data
  • 46. Why Range Overlaps Are Difficult
  • 47. Before Postgres 9.2
    • OVERLAPS
      SELECT
        ('2013-01-08'::date, '2013-01-10'::date) OVERLAPS ('2013-01-09'::date, '2013-01-12'::date);
    • Limitations:
      • Only date/time
      • Fixed bounds: each period is always treated as start <= x < end
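Overlap is tricky to express because of the number of cases: partial overlap on either side, one period inside the other, or identical periods. For half-open periods, however, a single condition covers all of them: each period must start before the other one ends. Here is a Python sketch. It assumes each start is not after its end and does not model the zero-length special case of OVERLAPS.

```python
from datetime import date


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open [start, end) overlap: each period starts before the other ends."""
    return a_start < b_end and b_start < a_end


d = date
slide_example = overlaps(d(2013, 1, 8), d(2013, 1, 10), d(2013, 1, 9), d(2013, 1, 12))
touching = overlaps(d(2013, 1, 8), d(2013, 1, 10), d(2013, 1, 10), d(2013, 1, 12))
```

Because the end is exclusive, periods that only touch at a boundary (`touching`) do not overlap.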
  • 48. Postgres 9.2+
    • INT4RANGE (integer)
    • INT8RANGE (bigint)
    • NUMRANGE (numeric)
    • TSRANGE (timestamp without time zone)
    • TSTZRANGE (timestamp with time zone)
    • DATERANGE (date)
    http://www.postgresql.org/docs/current/static/rangetypes.html
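  • 49. Range Type Size
    • Size on disk = 2 * (size of the subtype) + 1 flag byte, plus 8 bytes of header (17 = 8 + 2 * 4 + 1 for two 4-byte dates)
    • Smaller when the range is empty: with the default '[)' bounds, daterange(CURRENT_DATE, CURRENT_DATE) is empty, so no bounds are stored
    obdt=# SELECT pg_column_size(daterange(CURRENT_DATE, CURRENT_DATE));
    ----------------
    9
    obdt=# SELECT pg_column_size(daterange(CURRENT_DATE, CURRENT_DATE + 1));
    ----------------
    17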
  • 50. Range Bounds
    • Ranges can be inclusive, exclusive or both
      • [2,4] => 2 ≤ x ≤ 4
      • [2,4) => 2 ≤ x < 4
      • (2,4] => 2 < x ≤ 4
      • (2,4) => 2 < x < 4
    • Can also be empty
  • 51. Infinite Ranges
    • Ranges can be infinite
      – [2,) => 2 ≤ x < ∞
      – (,2] => -∞ < x ≤ 2
    • CAVEAT EMPTOR
      – "infinity" has special meaning with timestamp ranges
      – [CURRENT_TIMESTAMP,) = [CURRENT_TIMESTAMP,]
      – [CURRENT_TIMESTAMP, 'infinity') <> [CURRENT_TIMESTAMP, 'infinity']
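A Python sketch of range membership makes the bound rules and the caveat concrete. `None` stands for an omitted (unbounded) side, which is always exclusive. `math.inf` stands for the timestamp value 'infinity', which is a real value that a bound can include or exclude. `in_range` is a hypothetical helper, and empty ranges are not modeled.

```python
import math


def in_range(x, lower, upper, bounds="[)"):
    """Is x inside the range? None means unbounded on that side, so
    in_range(x, 2, None) behaves like [2,) => 2 <= x < infinity."""
    lower_inclusive = bounds[0] == "["
    upper_inclusive = bounds[1] == "]"
    if lower is not None and (x < lower or (x == lower and not lower_inclusive)):
        return False
    if upper is not None and (x > upper or (x == upper and not upper_inclusive)):
        return False
    return True


# The caveat: an explicit 'infinity' upper bound can exclude 'infinity' itself,
# while an omitted upper bound excludes nothing above the lower bound.
excl = in_range(math.inf, 0, math.inf, "[)")
incl = in_range(math.inf, 0, math.inf, "[]")
open_ended = in_range(math.inf, 0, None)
```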
  • 52. Constructing Ranges
    • Discrete range types such as int4range are stored in canonical '[)' form
    obdt=# SELECT '[1,10]'::int4range;
    -----------
    [1,11)
  • 53. Constructing Ranges
    • Constructor defaults to '[)'
    obdt=# SELECT numrange(9.0, 9.5);
    ------------
    [9.0,9.5)
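For a discrete type like integers, every bound style can be rewritten as '[)'. An exclusive lower bound moves up by one, and an inclusive upper bound moves up by one. This is why '[1,10]' comes back as [1,11). Here is a Python sketch of that canonicalization. `canonical_int_range` is a hypothetical helper; it returns `(lower, upper)` meaning '[lower,upper)' and returns `None` for an empty range. Continuous types such as numrange are not canonicalized this way.

```python
def canonical_int_range(lower, upper, bounds="[)"):
    """Rewrite a discrete integer range in canonical '[)' form.
    None means unbounded on that side. Assumes lower <= upper on input
    (PostgreSQL rejects a lower bound greater than the upper bound)."""
    if lower is not None and bounds[0] == "(":
        lower += 1          # (2,   ->  [3,
    if upper is not None and bounds[1] == "]":
        upper += 1          # ,10]  ->  ,11)
    if lower is not None and upper is not None and lower >= upper:
        return None         # e.g. (2,3) or [5,5): nothing left
    return (lower, upper)
```

Usage: `canonical_int_range(13000, 15000, "[]")` gives `(13000, 15001)`, matching the `int4range(13000, 15000, '[]')` search range used for the cars table.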
  • 54. Finding Overlapping Ranges
    obdt=# SELECT *
           FROM cars
           WHERE cars.price_range && int4range(13000, 15000, '[]')
           ORDER BY lower(cars.price_range);
    -----------
     id |        name         |  price_range
    ----+---------------------+---------------
      5 | Ford Mustang        | [11000,15001)
      6 | Lincoln Continental | [12000,14001)
    http://www.postgresql.org/docs/current/static/functions-range.html
  • 55. Ranges + GiST
    obdt=# CREATE INDEX ranges_bounds_gist_idx ON ranges USING gist (bounds);
    obdt=# EXPLAIN ANALYZE SELECT * FROM ranges WHERE int4range(500,1000) && bounds;
    ------------
    Bitmap Heap Scan on ranges (actual time=0.283..0.370 rows=653 loops=1)
      Recheck Cond: ('[500,1000)'::int4range && bounds)
      -> Bitmap Index Scan on ranges_bounds_gist_idx (actual time=0.275..0.275 rows=653 loops=1)
           Index Cond: ('[500,1000)'::int4range && bounds)
    Total runtime: 0.435 ms
  • 56. Large Search Range?
    test=# EXPLAIN ANALYZE SELECT * FROM ranges WHERE int4range(10000,1000000) && bounds;
    QUERY PLAN
    -------------
    Bitmap Heap Scan on ranges (actual time=184.028..270.323 rows=993068 loops=1)
      Recheck Cond: ('[10000,1000000)'::int4range && bounds)
      -> Bitmap Index Scan on ranges_bounds_gist_idx (actual time=183.060..183.060 rows=993068 loops=1)
           Index Cond: ('[10000,1000000)'::int4range && bounds)
    Total runtime: 313.743 ms
  • 57. SP-GiST
    • space-partitioned generalized search tree
    • ideal for non-balanced data structures
      – k-d trees, quad-trees, suffix trees
      – divides search space into partitions of unequal size
    • matching partitioning rule = fast search
    • these structures were traditionally designed for in-memory use; SP-GiST adapts them to play nicely with disk I/O
    http://www.postgresql.org/docs/9.3/static/spgist.html
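  • 58. GiST vs SP-GiST: Space
    Rows | Metric | GiST Clustered | SP-GiST Clustered | GiST Sparse | SP-GiST Sparse
    -----+--------+----------------+-------------------+-------------+----------------
    100K | Size   | 6MB            | 5MB               | 6MB         | 11MB
    100K | Time   | 0.5s           | 0.4s              | 2.5s        | 7.8s
    250K | Size   | 15MB           | 12MB              | 15MB        | 28MB
    250K | Time   | 1.5s           | 1.1s              | 6.3s        | 47.2s
    500K | Size   | 30MB           | 25MB              | 30MB        | 55MB
    500K | Time   | 3.1s           | 3.0s              | 13.9s       | 192s
    1MM  | Size   | 59MB           | 52MB              | 60MB        | 110MB
    1MM  | Time   | 5.1s           | 5.7s              | 29.2s       | 777s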
  • 59. Scheduling
    obdt=# CREATE TABLE travel_log (
             id serial PRIMARY KEY,
             name varchar(255),
             trip_range daterange,
             EXCLUDE USING gist (trip_range WITH &&)
           );
    obdt=# INSERT INTO travel_log (name, trip_range) VALUES ('Chicago', daterange('2012-03-12', '2012-03-17'));
    obdt=# INSERT INTO travel_log (name, trip_range) VALUES ('Austin', daterange('2012-03-16', '2012-03-18'));
    ERROR: conflicting key value violates exclusion constraint "travel_log_trip_range_excl"
    DETAIL: Key (trip_range)=([2012-03-16,2012-03-18)) conflicts with existing key (trip_range)=([2012-03-12,2012-03-17)).
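The exclusion constraint rejects any new row whose range overlaps (`&&`) an existing one. Below is a toy Python model of that rule using half-open date ranges. `TravelLog` is a hypothetical class. It checks with a linear scan, whereas PostgreSQL uses the GiST index to find conflicts, and it does not model concurrent inserts at all.

```python
from datetime import date


class TravelLog:
    """Toy model of EXCLUDE USING gist (trip_range WITH &&)."""

    def __init__(self):
        self.trips = []  # list of (name, start, end) with end exclusive

    def insert(self, name, start, end):
        for other, o_start, o_end in self.trips:
            if start < o_end and o_start < end:  # [start, end) && [o_start, o_end)
                raise ValueError(f"{name} conflicts with existing trip {other}")
        self.trips.append((name, start, end))


log = TravelLog()
log.insert("Chicago", date(2012, 3, 12), date(2012, 3, 17))
try:
    log.insert("Austin", date(2012, 3, 16), date(2012, 3, 18))
    austin_rejected = False
except ValueError:
    austin_rejected = True
# Starting on the day Chicago ends is fine: the upper bound is exclusive.
log.insert("Boston", date(2012, 3, 17), date(2012, 3, 20))
```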
  • 60. Extending Ranges
    obdt=# CREATE TYPE inetrange AS RANGE (
             SUBTYPE = inet
           );
    obdt=# SELECT '192.168.1.8'::inet <@ inetrange('192.168.1.1', '192.168.1.10');
    ----------
    t
    obdt=# SELECT '192.168.1.20'::inet <@ inetrange('192.168.1.1', '192.168.1.10');
    ----------
    f
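A custom range type works because the subtype has a total order. For plain IPv4 host addresses, that order is numeric. Here is a Python sketch of the slide's two checks with the default '[)' bounds. It assumes single-host addresses of one family, and `in_inet_range` is a hypothetical helper.

```python
import ipaddress


def in_inet_range(addr, lower, upper):
    """Model of addr <@ inetrange(lower, upper) with '[)' bounds,
    comparing IPv4 host addresses numerically."""
    a = ipaddress.ip_address(addr)
    return ipaddress.ip_address(lower) <= a < ipaddress.ip_address(upper)


inside = in_inet_range("192.168.1.8", "192.168.1.1", "192.168.1.10")
outside = in_inet_range("192.168.1.20", "192.168.1.1", "192.168.1.10")
```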
  • 61. Now For Something Unrelated
    Let's talk non-relational data in PostgreSQL
  • 62. hstore
    • key-value store in PostgreSQL
    • binary storage
    • keys / values represented as strings when querying
    CREATE EXTENSION hstore;
    SELECT 'jk=>1, jm=>2'::hstore;
    --------------------
    "jk"=>"1", "jm"=>"2"
    http://www.postgresql.org/docs/current/static/hstore.html
  • 63. Making hstore objects
    obdt=# SELECT hstore(ARRAY['jk', 'jm'], ARRAY['1', '2']);
    ---------------------
    "jk"=>"1", "jm"=>"2"
    obdt=# SELECT hstore(ARRAY['jk', '1', 'jm', '2']);
    ---------------------
    "jk"=>"1", "jm"=>"2"
    obdt=# SELECT hstore(ROW('jk', 'jm'));
    ---------------------
    "f1"=>"jk", "f2"=>"jm"
  • 64. Accessing hstore
    obdt=# SELECT ('jk=>1, jm=>2'::hstore) -> 'jk';
    ----------
    1
    obdt=# SELECT ('jk=>1, jm=>2'::hstore) -> ARRAY['jk','jm'];
    ----------
    {1,2}
    obdt=# SELECT delete('jk=>1, jm=>2'::hstore, 'jm');
    -----------
    "jk"=>"1"
  • 65. hstore operators
    obdt=# SELECT ('jk=>1, jm=>2'::hstore) @> 'jk=>1'::hstore;
    ----------
    t
    obdt=# SELECT ('jk=>1, jm=>2'::hstore) ? 'sf';
    ----------
    f
    obdt=# SELECT ('jk=>1, jm=>2'::hstore) ?& ARRAY['jk', 'sf'];
    ----------
    f
    obdt=# SELECT ('jk=>1, jm=>2'::hstore) ?| ARRAY['jk', 'sf'];
    ----------
    t
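Because hstore keys and values are both strings, a Python `dict[str, str]` is a close mental model. The sketch below mirrors the constructors and operators from the last three slides. The function names are hypothetical, `None` stands in for SQL NULL, and parsing of the hstore text format is not modeled.

```python
def hstore_from_arrays(keys, values):
    """hstore(ARRAY[keys], ARRAY[values]): pair up two text arrays."""
    if len(keys) != len(values):
        raise ValueError("key and value arrays must have the same length")
    return dict(zip(keys, values))


def hs_get(h, key):             # h -> key
    return h.get(key)


def hs_get_many(h, keys):       # h -> ARRAY[keys]
    return [h.get(k) for k in keys]


def hs_delete(h, key):          # delete(h, key)
    return {k: v for k, v in h.items() if k != key}


def hs_contains(a, b):          # a @> b: every pair of b is in a
    return all(k in a and a[k] == v for k, v in b.items())


def hs_exists(h, key):          # h ? key
    return key in h


def hs_exists_all(h, keys):     # h ?& ARRAY[keys]
    return all(k in h for k in keys)


def hs_exists_any(h, keys):     # h ?| ARRAY[keys]
    return any(k in h for k in keys)


h = hstore_from_arrays(["jk", "jm"], ["1", "2"])
```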
  • 66. hstore Performance
    obdt=# EXPLAIN ANALYZE SELECT * FROM keypairs WHERE data ? '3';
    -----------------------
    Seq Scan on keypairs (cost=0.00..19135.06 rows=950 width=32) (actual time=0.071..214.007 rows=1 loops=1)
      Filter: (data ? '3'::text)
      Rows Removed by Filter: 999999
    Total runtime: 214.028 ms
    obdt=# CREATE INDEX keypairs_data_gin_idx ON keypairs USING gin(data);
    obdt=# EXPLAIN ANALYZE SELECT * FROM keypairs WHERE data ? '3';
    --------------
    Bitmap Heap Scan on keypairs (cost=27.75..2775.66 rows=1000 width=24) (actual time=0.046..0.046 rows=1 loops=1)
      Recheck Cond: (data ? '3'::text)
      -> Bitmap Index Scan on keypairs_data_gin_idx (cost=0.00..27.50 rows=1000 width=0) (actual time=0.041..0.041 rows=1 loops=1)
           Index Cond: (data ? '3'::text)
    Total runtime: 0.073 ms
  • 67. JSON and PostgreSQL
    • Started in 2010 as a Google Summer of Code Project
      • https://wiki.postgresql.org/wiki/JSON_datatype_GSoC_2010
    • Goal:
      • be similar to XML data type functionality in Postgres
      • be committed as an extension for PostgreSQL 9.1
  • 68. What Happened?
    • Different proposals over how to finalize the implementation
      • binary vs. text
    • Core vs. Extension
    • Discussions between "old" vs. "new" ways of packaging for extensions
  • 71. PostgreSQL 9.2: JSON
    • JSON data type in core PostgreSQL
    • based on RFC 4627
      • only "strictly" follows it if your database encoding is UTF-8
    • text-based format
    • checks for validity
  • 72. PostgreSQL 9.2: JSON
    obdt=# SELECT '[{"PUG": "NYC"}]'::json;
    ------------------
    [{"PUG": "NYC"}]
    obdt=# SELECT '[{"PUG": "NYC"]'::json;
    ERROR: invalid input syntax for type json at character 8
    DETAIL: Expected "," or "}", but found "]".
    CONTEXT: JSON data, line 1: [{"PUG": "NYC"]
    http://www.postgresql.org/docs/current/static/datatype-json.html
  • 73. PostgreSQL 9.2: JSON
    • array_to_json
    obdt=# SELECT array_to_json(ARRAY[1,2,3]);
    ---------------
    [1,2,3]
  • 74. PostgreSQL 9.2: JSON
    • row_to_json
    obdt=# SELECT row_to_json(category) FROM category;
    ------------
    {"cat_id":652,"cat_pages":35,"cat_subcats":17,"cat_files":0,"title":"Continents"}
  • 75. PostgreSQL 9.2: JSON
    • In summary, within core PostgreSQL, it was a starting point
  • 76. PostgreSQL 9.3: JSON Ups its Game
    • Added operators and functions to read / prepare JSON
    • Added casts from hstore to JSON
  • 77. PostgreSQL 9.3: JSON
    Operator | Description                                            | Example
    ---------+--------------------------------------------------------+-----------------------------------------------------
    ->       | return JSON array element OR JSON object field         | '[1,2,3]'::json -> 0;
             |                                                        | '{"a": 1, "b": 2, "c": 3}'::json -> 'b';
    ->>      | return JSON array element OR JSON object field AS text | '[1,2,3]'::json ->> 0;
             |                                                        | '{"a": 1, "b": 2, "c": 3}'::json ->> 'b';
    #>       | return the JSON value at the given path                | '{"a": 1, "b": 2, "c": [1,2,3]}'::json #> '{c, 0}';
    #>>      | return the JSON value at the given path AS text        | '{"a": 1, "b": 2, "c": [1,2,3]}'::json #>> '{c, 0}';
    http://www.postgresql.org/docs/current/static/functions-json.html
  • 78. Operator Gotchas
    SELECT * FROM category_documents
    WHERE data->'title' = 'PostgreSQL';
    ERROR: operator does not exist: json = unknown
    LINE 1: ...ECT * FROM category_documents WHERE data->'title' = 'Postgre...
    HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
  • 79. Operator Gotchas
    • -> returns json, which has no equality operator; ->> returns text, which does
    SELECT * FROM category_documents
    WHERE data->>'title' = 'PostgreSQL';
    -----------------------
    {"cat_id":252739,"cat_pages":14,"cat_subcats":0,"cat_files":0,"title":"PostgreSQL"}
    (1 row)
  • 80. For the Upcoming Examples
    • Wikipedia English category titles
      – all 1,823,644 that I downloaded
    • Relation looks something like:
        Column    |  Type   |     Modifiers
      ------------+---------+--------------------
      cat_id      | integer | not null
      cat_pages   | integer | not null default 0
      cat_subcats | integer | not null default 0
      cat_files   | integer | not null default 0
      title       | text    |
  • 81. Performance?
    EXPLAIN ANALYZE SELECT * FROM category_documents
    WHERE data->>'title' = 'PostgreSQL';
    ---------------------
    Seq Scan on category_documents (cost=0.00..57894.18 rows=9160 width=32) (actual time=360.083..2712.094 rows=1 loops=1)
      Filter: ((data ->> 'title'::text) = 'PostgreSQL'::text)
      Rows Removed by Filter: 1823643
    Total runtime: 2712.127 ms
  • 82. Performance?
    CREATE INDEX category_documents_idx ON category_documents (data);
    ERROR: data type json has no default operator class for access method "btree"
    HINT: You must specify an operator class for the index or define a default operator class for the data type.
  • 83. Let's Be Clever
    • json_extract_path, json_extract_path_text
      • like #> and #>>, but the path is passed as a list of arguments
    SELECT json_extract_path(
      '{"a": 1, "b": 2, "c": [1,2,3]}'::json,
      'c', '0');
    --------
    1
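A path walk like `json_extract_path` is easy to model on parsed JSON. Object steps match keys, and array steps are 0-based positions written as text. Here is a Python sketch; the function names mirror the SQL ones but are hypothetical re-implementations. One simplification: a missing path and a JSON `null` both come back as `None`, and negative positions are simply treated as missing.

```python
import json


def json_extract_path(doc, *path):
    """Walk a parsed JSON value along text path elements (cf. #>).
    Returns None when the path does not exist."""
    current = doc
    for step in path:
        if isinstance(current, dict):
            if step not in current:
                return None
            current = current[step]
        elif isinstance(current, list):
            try:
                i = int(step)
            except ValueError:
                return None
            if not 0 <= i < len(current):
                return None
            current = current[i]
        else:
            return None
    return current


def json_extract_path_text(doc, *path):
    """Same walk, but the result as text (cf. #>>): strings come back unquoted."""
    value = json_extract_path(doc, *path)
    if value is None:
        return None
    return value if isinstance(value, str) else json.dumps(value)


doc = json.loads('{"a": 1, "b": 2, "c": [1,2,3]}')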
  • 84. Performance Revisited
    CREATE INDEX category_documents_data_idx
      ON category_documents
      (json_extract_path_text(data, 'title'));
    obdt=# EXPLAIN ANALYZE
           SELECT * FROM category_documents
           WHERE json_extract_path_text(data, 'title') = 'PostgreSQL';
    ------------
    Bitmap Heap Scan on category_documents (cost=303.09..20011.96 rows=9118 width=32) (actual time=0.090..0.091 rows=1 loops=1)
      Recheck Cond: (json_extract_path_text(data, VARIADIC '{title}'::text[]) = 'PostgreSQL'::text)
      -> Bitmap Index Scan on category_documents_data_idx (cost=0.00..300.81 rows=9118 width=0) (actual time=0.086..0.086 rows=1 loops=1)
           Index Cond: (json_extract_path_text(data, VARIADIC '{title}'::text[]) = 'PostgreSQL'::text)
    Total runtime: 0.105 ms
  • 85. The Relation vs JSON
    • Size on Disk
      • category (relation) - 136MB
      • category_documents (JSON) - 238MB
    • Index Size for "title"
      • category - 89MB
      • category_documents - 89MB
    • Average Performance for looking up "PostgreSQL"
      • category - 0.065ms
      • category_documents - 0.070ms
  • 86. JSON Aggregates
    • (this is pretty cool)
    • json_agg
    http://www.postgresql.org/docs/current/static/functions-json.html
    SELECT b, json_agg(stuff)
    FROM stuff
    GROUP BY b;
      b   |             json_agg
    ------+----------------------------------
    neat | [{"a":4,"b":"neat","c":[4,5,6]}]
    wow  | [{"a":1,"b":"wow","c":[1,2,3]}, +
         |  {"a":3,"b":"wow","c":[7,8,9]}]
    cool | [{"a":2,"b":"cool","c":[4,5,6]}]
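`json_agg` turns every row of a group into a JSON object and wraps the group in a JSON array. Here is a Python sketch of that grouping. The rows of `stuff` are reconstructed from the output above, and `json_agg_by` is a hypothetical helper. Compact separators are used so that the text resembles the slide; PostgreSQL's exact whitespace is not guaranteed to match.

```python
import json
from collections import defaultdict

stuff = [
    {"a": 1, "b": "wow", "c": [1, 2, 3]},
    {"a": 2, "b": "cool", "c": [4, 5, 6]},
    {"a": 3, "b": "wow", "c": [7, 8, 9]},
    {"a": 4, "b": "neat", "c": [4, 5, 6]},
]


def json_agg_by(rows, key):
    """SELECT key, json_agg(row) ... GROUP BY key, as {key: JSON array text}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {k: json.dumps(v, separators=(",", ":")) for k, v in groups.items()}


by_b = json_agg_by(stuff, "b")
```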
  • 87. hstore gets in the game
    • hstore_to_json
      – converts hstore to json, treating all values as strings
    • hstore_to_json_loose
      – converts hstore to json, but also tries to distinguish between data types and "convert" them to proper JSON representations
    SELECT hstore_to_json_loose('"a key"=>1, b=>t, c=>null, d=>12345, e=>012345, f=>1.234, g=>2.345e+4');
    ----------------
    {"b": true, "c": null, "d": 12345, "e": "012345", "f": 1.234, "g": 2.345e+4, "a key": 1}
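The "loose" conversion is a per-value heuristic: NULL becomes `null`, `t`/`f` become booleans, number-like strings become JSON numbers, and everything else stays a string. Here is a rough Python approximation. The number regex (which rejects leading zeros, so `012345` stays a string) and the treatment of `f` as `false` are assumptions for illustration, not a transcription of hstore's C code. The input dict stands in for an already-parsed hstore.

```python
import json
import re

# Assumed "looks like a number" rule: optional minus, no leading zeros,
# optional fraction, optional exponent.
_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def hstore_to_json_loose(pairs):
    """Render {key: text-or-None} as JSON text, guessing value types."""
    out = []
    for key, value in pairs.items():
        if value is None:
            rendered = "null"
        elif value in ("t", "f"):
            rendered = "true" if value == "t" else "false"
        elif _NUMBER.fullmatch(value):
            rendered = value  # keep the original spelling, e.g. 2.345e+4
        else:
            rendered = json.dumps(value)
        out.append(f"{json.dumps(key)}: {rendered}")
    return "{" + ", ".join(out) + "}"


result = hstore_to_json_loose({
    "a key": "1", "b": "t", "c": None, "d": "12345",
    "e": "012345", "f": "1.234", "g": "2.345e+4",
})
```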
  • 88. Next Steps?
    • In PostgreSQL 9.3, JSON became much more useful, but…
      • Difficult to search within JSON
      • Difficult to build new JSON objects
  • 90. "Nested hstore"
    • Proposed at PGCon 2013 by Oleg Bartunov and Teodor Sigaev
    • Hierarchical key-value storage system that also supports arrays and is stored in a binary format
    • Takes advantage of the GIN indexing mechanism in PostgreSQL
      • "Generalized Inverted Index"
      • Built to search within composite objects
      • Arrays, fulltext search, hstore
      • …JSON?
    http://www.pgcon.org/2013/schedule/attachments/280_hstore-pgcon-2013.pdf
  • 91. How JSONB Came to Be
    • JSON is the "lingua franca" for transmitting data on the web
    • The PostgreSQL JSON type was in a text format and preserved text exactly as input
      • e.g. duplicate keys are preserved
    • Create a new data type that merges the nested hstore work to create a JSON type stored in a binary format: JSONB
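The duplicate-key point is easy to see with Python's `json` module, which can expose either view of the same input. By default, a later duplicate key overwrites an earlier one, which matches jsonb's documented "last value wins" behaviour. An `object_pairs_hook` that returns the raw pairs keeps both entries, which is closer in spirit to the text-preserving json type. This is an analogy only; neither call runs PostgreSQL code.

```python
import json

text = '{"title": "PostgreSQL", "title": "Postgres"}'

# jsonb-like: one value per key, the last duplicate wins.
as_object = json.loads(text)

# json-like: every key/value pair from the input is kept, in order.
as_pairs = json.loads(text, object_pairs_hook=list)
```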
  • 92. JSONB ≠ BSON
    • BSON is a data type created by MongoDB as a "superset of JSON"
    • JSONB lives in PostgreSQL and is just JSON that is stored in a binary format on disk
  • 93. JSONB Gives Us More Operators
    • a @> b - is b contained within a?
      • '{"a": 1, "b": 2}'::jsonb @> '{"a": 1}' -- TRUE
    • a <@ b - is a contained within b?
      • '{"a": 1}'::jsonb <@ '{"a": 1, "b": 2}' -- TRUE
    • a ? b - does the key b exist in JSONB a?
      • '{"a": 1, "b": 2}'::jsonb ? 'a' -- TRUE
    • a ?| b - do any of the keys in array b exist in JSONB a?
      • '{"a": 1, "b": 2}'::jsonb ?| ARRAY['b', 'c'] -- TRUE
    • a ?& b - do all of the keys in array b exist in JSONB a?
      • '{"a": 1, "b": 2}'::jsonb ?& ARRAY['a', 'b'] -- TRUE
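Containment is recursive. For objects, every key of `b` must be in `a` with a contained value. For arrays, every element of `b` must be contained in some element of `a`, regardless of order. Scalars must be equal. Here is a simplified Python model over parsed JSON; the function names are hypothetical. It deliberately omits two PostgreSQL special cases: a top-level array containing a bare scalar, and `?` matching string elements of a top-level array.

```python
def jsonb_contains(a, b):
    """Simplified model of a @> b for parsed JSON values."""
    if isinstance(a, dict) and isinstance(b, dict):
        return all(k in a and jsonb_contains(a[k], v) for k, v in b.items())
    if isinstance(a, list) and isinstance(b, list):
        return all(any(jsonb_contains(x, y) for x in a) for y in b)
    if isinstance(a, (dict, list)) or isinstance(b, (dict, list)):
        return False  # container vs. scalar, or object vs. array
    if isinstance(a, bool) or isinstance(b, bool):
        # Keep JSON true distinct from the number 1 (Python treats True == 1).
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    return a == b


def jsonb_contained_by(a, b):   # a <@ b
    return jsonb_contains(b, a)


def jsonb_exists_any(a, keys):  # a ?| keys
    return isinstance(a, dict) and any(k in a for k in keys)


def jsonb_exists_all(a, keys):  # a ?& keys
    return isinstance(a, dict) and all(k in a for k in keys)
```

Nesting matters: under this model `[1, 2, [1, 3]]` does not contain `[1, 3]`, but it does contain `[[1, 3]]`.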
  • 94. JSONB Gives us GIN
    • Recall - GIN indexes are used to "look inside" objects
    • JSONB has two flavors of GIN:
      • Standard - supports @>, ?, ?|, ?&
      • "Path Ops" - supports only @>
    CREATE INDEX category_documents_data_idx ON category_documents USING gin (data);
    CREATE INDEX category_documents_path_data_idx ON category_documents USING gin (data jsonb_path_ops);
  • 95. JSONB Gives Us Flexibility
    obdt=# SELECT * FROM category_documents WHERE
             data @> '{"title": "PostgreSQL"}';
    ----------------
    {"title": "PostgreSQL", "cat_id": 252739, "cat_files": 0, "cat_pages": 14, "cat_subcats": 0}
    obdt=# SELECT * FROM category_documents WHERE
             data @> '{"cat_id": 5432 }';
    ----------------
    {"title": "1394 establishments", "cat_id": 5432, "cat_files": 0, "cat_pages": 4, "cat_subcats": 2}
  • 96. JSONB Gives Us Speed
    EXPLAIN ANALYZE SELECT * FROM category_documents
      WHERE data @> '{"title": "PostgreSQL"}';
    ------------
    Bitmap Heap Scan on category_documents (cost=38.13..6091.65 rows=1824 width=153) (actual time=0.021..0.022 rows=1 loops=1)
      Recheck Cond: (data @> '{"title": "PostgreSQL"}'::jsonb)
      Heap Blocks: exact=1
      -> Bitmap Index Scan on category_documents_path_data_idx (cost=0.00..37.68 rows=1824 width=0) (actual time=0.012..0.012 rows=1 loops=1)
           Index Cond: (data @> '{"title": "PostgreSQL"}'::jsonb)
    Planning time: 0.070 ms
    Execution time: 0.043 ms
  • 97. JSONB + Wikipedia Categories: By the Numbers
    • Size on Disk
      • category (relation) - 136MB
      • category_documents (JSON) - 238MB
      • category_documents (JSONB) - 325MB
    • Index Size for "title"
      • category - 89MB
      • category_documents (JSON with one key using an expression index) - 89MB
      • category_documents (JSONB, all GIN ops) - 311MB
      • category_documents (JSONB, just @>) - 203MB
    • Average Performance for looking up "PostgreSQL"
      • category - 0.065ms
      • category_documents (JSON with one key using an expression index) - 0.070ms
      • category_documents (JSONB, all GIN ops) - 0.115ms
      • category_documents (JSONB, just @>) - 0.045ms
  • 98. Wow
    • That was a lot of material
  • 99. In Summary
    • PostgreSQL has a lot of advanced data types
      • They are easy to access
      • They have a lot of functionality around them
      • They are durable
      • They perform well (but of course must be used correctly)
    • Furthermore, you can extend PostgreSQL to:
      • Better manipulate your favorite data type
      • Create more data types
      • ...well, do basically what you want it to do
  • 100. And That's All
    • Thank You!
    • Questions?
    • @jkatz05