This document summarizes a talk on managing temporal data in PostgreSQL. It covers period temporal data, which PostgreSQL represents conveniently with range types, and time series data, which can be managed with TimescaleDB, a PostgreSQL extension that provides partitioning, compression, and time-series functions such as gap filling and interpolation. The talk gives an overview of these tools and highlights areas where the presenter is working to improve temporal data management in PostgreSQL.
SFScon22 - Anton Dignoes - Managing Temporal Data in PostgreSQL.pdf
1. Managing Temporal Data in PostgreSQL
Anton Dignös
Free University of Bozen-Bolzano
SFScon 2022
November 11, 2022
Autonome Provinz Bozen - Südtirol / Provincia Autonoma di Bolzano - Alto Adige
Research Südtirol/Alto Adige 2019
Project ISTeP
CUP: I52F20000250003
EFRE 2014-2020
Project EFRE1164 PREMISE
CUP: I59C20000340009
2. Agenda
What is temporal data?
Period temporal data in Postgres
Time series data in Postgres
What are we working on?
This talk only provides a glimpse. If you are interested in more details,
I am happy to talk to you during the conference!
SFScon 2022 2/22 A. Dignös
3. Temporal data
Temporal data can be found in many applications
▶ HR contracts
▶ Insurance policies
▶ Tourism data
▶ Medical domain
▶ Stock market data
▶ Industrial data
4. What is temporal data?
Data with a “timestamp”
+
The “timestamp” indicates the validity of the data
Examples:
▶ A contract with a validity period
▶ A sensor reading with the measurement time
▶ An error event with the time at which it occurred
5. Basic utilities for date/time in Postgres
▶ Postgres provides different date/time datatypes [1]
▶ Many functions
▶ Operators (+, -)
▶ Calendar functions (EXTRACT, date_trunc)
▶ Anyone who has worked with dates and time zones will appreciate these
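As a rough illustration of the utilities listed above, the following sketch uses only built-in Postgres operators and functions. The literal dates are arbitrary example values, not taken from the talk.

```sql
-- Date arithmetic with the + and - operators
SELECT DATE '2022-11-11' + 7                               AS one_week_later,  -- date + integer days
       DATE '2022-11-11' - DATE '2022-08-01'               AS days_between,    -- integer number of days
       TIMESTAMP '2022-11-11 15:40' + INTERVAL '5 minutes' AS five_min_later;

-- Calendar functions
SELECT EXTRACT(YEAR FROM DATE '2022-11-11')              AS year,
       EXTRACT(DOW FROM DATE '2022-11-11')               AS day_of_week,  -- 0 = Sunday
       date_trunc('month', TIMESTAMP '2022-11-11 15:40') AS month_start;

-- Time zone conversion of a timestamp with time zone
SELECT TIMESTAMPTZ '2022-11-11 15:40+01' AT TIME ZONE 'UTC' AS utc_time;
```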
[1] https://www.postgresql.org/docs/current/datatype-datetime.html
6. Topic of today
Today it is about temporal data, not just storing dates or time
▶ Period temporal data
▶ Contracts
▶ Manufacturing periods
▶ Error states
▶ Time series data
▶ Sensor readings
▶ Stock market data
▶ Error events
Let’s have a peek at what Postgres and its ecosystem have to offer!
7. Highlights for period temporal data in Postgres
▶ Postgres provides range types [2] for managing period data
▶ What are range types?
▶ Datatypes for periods '[start, end)'
▶ Can have different forms: '[ , )', '[ , ]', '( , ]', '( , )'
▶ Available for different types, e.g., INT, NUMERIC, DATE
▶ Many predicates and functions
▶ Indexes available (GiST, SP-GiST, btree_gist)
▶ Very easy to use
▶ Avoid many programming mistakes
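To make the bullets above more concrete, here is a small sketch of how ranges can be built and inspected with built-in range operators and functions. The dates are example values chosen for illustration.

```sql
-- Constructing ranges: from a literal or with the constructor function
SELECT '[2021-08-01, 2022-08-01)'::daterange       AS from_literal,
       daterange('2021-08-01', '2022-08-01', '[)') AS from_constructor,
       daterange('2022-08-01', NULL)               AS open_ended;  -- no upper bound

-- A few predicates and functions on ranges
SELECT daterange('2021-08-01', '2022-08-01') @> DATE '2021-10-30'              AS contains_date,
       daterange('2021-08-01', '2022-08-01') && daterange('2022-04-01', NULL)  AS overlaps,
       daterange('2021-08-01', '2022-08-01') -|- daterange('2022-08-01', NULL) AS is_adjacent,
       daterange('2021-08-01', '2022-08-01') * daterange('2022-04-01', NULL)   AS intersection,
       lower(daterange('2021-08-01', '2022-08-01'))                            AS start_date,
       upper_inf(daterange('2022-08-01', NULL))                                AS has_no_end;
```

Because DATERANGE is a discrete range type, Postgres stores values in the canonical '[ , )' form, so a range written as '[2021-08-01, 2021-08-31]' is displayed as [2021-08-01,2021-09-01).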
[2] https://www.postgresql.org/docs/current/rangetypes.html
8. An example
Product prices that change over time
CREATE TABLE prices(
  product INT,
  period  DATERANGE,
  value   FLOAT);
INSERT INTO prices
VALUES (1, '[2021-08-01, 2022-08-01)', 25),
       (1, '[2022-08-01,)', 30),
       (2, '[2021-08-01, 2022-04-01)', 10),
       (2, '[2022-04-01,)', 20);
 product |         period          | value
---------+-------------------------+-------
       1 | [2021-08-01,2022-08-01) |    25
       1 | [2022-08-01,)           |    30
       2 | [2021-08-01,2022-04-01) |    10
       2 | [2022-04-01,)           |    20
9. Common queries
▶ What are the prices of products today?
WHERE period @> CURRENT_DATE
▶ What were the prices of products on 2021-10-30?
WHERE period @> DATE '2021-10-30'
▶ What were the previous prices of products?
WHERE period << daterange(CURRENT_DATE, NULL, '[)')
▶ What were the prices of products between 2021-10-30 and 2022-10-30?
WHERE period && daterange('2021-10-30', '2022-10-30', '[]')
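The WHERE clauses above can be completed into full statements against the prices table from the example. This is a sketch; the output column alias valid_within is introduced here for illustration only.

```sql
-- Prices valid today
SELECT product, value
FROM prices
WHERE period @> CURRENT_DATE;

-- Prices valid on 2021-10-30
-- (the DATE keyword makes the literal a date rather than a range literal)
SELECT product, value
FROM prices
WHERE period @> DATE '2021-10-30';

-- Previous prices: periods that end before today
SELECT product, period, value
FROM prices
WHERE period << daterange(CURRENT_DATE, NULL, '[)');

-- Prices valid at some point between 2021-10-30 and 2022-10-30,
-- together with the part of each period that falls inside that window
SELECT product, value,
       period * daterange('2021-10-30', '2022-10-30', '[]') AS valid_within
FROM prices
WHERE period && daterange('2021-10-30', '2022-10-30', '[]');
```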
10. Uniqueness Constraints
Ensure a product does not have two prices at the same time
-- "product WITH =" in a GiST constraint needs the btree_gist extension
CREATE TABLE prices(
  product INT,
  period  DATERANGE,
  value   FLOAT,
  EXCLUDE USING GIST (product WITH =, period WITH &&));
 product |         period          | value
---------+-------------------------+-------
       1 | [2021-08-01,2022-08-01) |    25
       1 | [2022-08-01,)           |    30
       2 | [2021-08-01,2022-04-01) |    10
       2 | [2022-04-01,)           |    20
INSERT INTO prices VALUES (1, '[2022-08-04,)', 100);
ERROR: conflicting key value violates exclusion constraint ...
DETAIL: Key (product, period)=(1, [2022-08-04,)) conflicts ...
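The constraint above can be tried end to end with the following sketch. The table name prices_checked and the constraint name no_overlapping_prices are hypothetical, chosen so the example does not clash with an existing prices table. Combining an equality on an INT column with a range overlap in one GiST exclusion constraint requires the btree_gist extension.

```sql
-- Provides GiST operator classes for scalar types such as INT
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE prices_checked (          -- hypothetical table name
  product INT       NOT NULL,
  period  DATERANGE NOT NULL,
  value   FLOAT,
  CONSTRAINT no_overlapping_prices     -- hypothetical constraint name
    EXCLUDE USING GIST (product WITH =, period WITH &&)
);

INSERT INTO prices_checked VALUES (1, '[2021-08-01, 2022-08-01)', 25);  -- accepted
INSERT INTO prices_checked VALUES (1, '[2022-08-01,)', 30);             -- accepted: adjacent, not overlapping
INSERT INTO prices_checked VALUES (2, '[2022-08-04,)', 100);            -- accepted: different product
INSERT INTO prices_checked VALUES (1, '[2022-08-04,)', 100);            -- rejected: overlaps product 1's current price
```

Declaring the columns NOT NULL is a design choice of this sketch: rows with a NULL product or period would otherwise never conflict and could slip past the constraint.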
11. Take home messages
▶ Range types are Postgres’ native period datatypes
▶ Convenient representation of periods
▶ Many base datatypes are supported
▶ Support different period definitions if needed
▶ Many convenient predicates and functions
▶ Less error-prone than custom-built solutions
▶ Can be sped up using GiST indexes
▶ Uniqueness constraints available
▶ Avoid inconsistencies at the source
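As a small sketch of the indexing point above, a GiST index on the range column can support the containment and overlap predicates. The index name prices_period_idx is an assumption made for this example.

```sql
-- GiST index on the range column (index name is hypothetical)
CREATE INDEX prices_period_idx ON prices USING GIST (period);

-- Inspect whether the planner uses the index for a containment query
EXPLAIN
SELECT product, value
FROM prices
WHERE period @> DATE '2021-10-30';
```

On a table as small as the four-row example the planner will most likely still choose a sequential scan, so the benefit only shows with larger data. A table that already carries the exclusion constraint has a GiST index on (product, period) created for it automatically.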
12. Highlights for time series data in Postgres
▶ TimescaleDB can be used to manage time series in Postgres
▶ What is TimescaleDB?
▶ TimescaleDB is a Postgres extension (based on UDFs)
▶ Runs on server side
▶ License (two versions of TimescaleDB with different support) [3]
▶ TimescaleDB Apache 2 Edition (Apache 2.0 license)
▶ TimescaleDB Community Edition (Timescale License – TSL)
▶ See https://docs.timescale.com/timescaledb/latest/timescaledb-edition-comparison
▶ Available for most platforms as a binary, or can be compiled from source
[3] Thanks to Chris Mair from 1006.org for pointing this out during a previous talk!
13. What does TimescaleDB do?
Eases time series data management
▶ Convenient time-series-specific functions (hyperfunctions)
▶ Gap-filling and Interpolation
▶ Weighted averages
▶ . . .
▶ Partitioning (hypertables)
▶ Access less data (faster runtime)
▶ Compression
▶ Make data smaller (also faster runtime)
15. Hyperfunctions/2
Produce a value every five minutes and interpolate missing ones
SELECT time_bucket_gapfill('5 minutes', time) AS five_min,
       avg(value) AS value,      -- average from data
       interpolate(avg(value))   -- interpolate average if missing
FROM sensor_signal
WHERE sensor_id = 3 AND time BETWEEN now() - INTERVAL '20 min'
                                 AND now()
GROUP BY five_min
ORDER BY five_min;
      five_min       | value | interpolate
---------------------+-------+-------------
 2022-11-11 15:40:00 |  16.2 |        16.2
 2022-11-11 15:45:00 |       |          16
 2022-11-11 15:50:00 |  15.8 |        15.8
 2022-11-11 15:55:00 |       |        11.9
 2022-11-11 16:00:00 |     8 |           8
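Linear interpolation is one way to fill the gaps shown above. As a sketch of an alternative, TimescaleDB's locf() hyperfunction carries the last observed value forward instead. The query assumes the same sensor_signal table (columns time, sensor_id, value) as the example above; the alias carried_forward is made up for illustration.

```sql
SELECT time_bucket_gapfill('5 minutes', time) AS five_min,
       avg(value)       AS value,            -- average from data, NULL for empty buckets
       locf(avg(value)) AS carried_forward   -- last observation carried forward
FROM sensor_signal
WHERE sensor_id = 3
  AND time >= now() - INTERVAL '20 minutes'
  AND time <  now()
GROUP BY five_min
ORDER BY five_min;
```

Carrying the last value forward tends to suit step-like signals such as states or setpoints, whereas interpolate() is usually the better fit for continuously changing measurements.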
17. Hypertables/2
Transform our table into a hypertable
SELECT create_hypertable(
  'sensor_signal',
  'time',
  chunk_time_interval => INTERVAL '2 days',
  partitioning_column => 'sensor_id',
  number_partitions   => 2,
  if_not_exists       => true,
  migrate_data        => true
);
▶ Partition by range on time, with one chunk every two days
▶ Partition by hash on sensor_id using 2 partitions
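For readers who want to try the call above from scratch, the following sketch sets up a table first and then lists the chunks that result. The column types of sensor_signal are an assumption, since the slides only show the column names time, sensor_id, and value, and the inserted rows are random test data.

```sql
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Assumed schema; only the column names appear in the talk
CREATE TABLE sensor_signal (
  time      TIMESTAMPTZ      NOT NULL,
  sensor_id INT              NOT NULL,
  value     DOUBLE PRECISION
);

SELECT create_hypertable(
  'sensor_signal',
  'time',
  chunk_time_interval => INTERVAL '2 days',
  partitioning_column => 'sensor_id',
  number_partitions   => 2
);

-- Six days of hourly random test data for one sensor
INSERT INTO sensor_signal
SELECT t, 3, random()
FROM generate_series(now() - INTERVAL '6 days', now(), INTERVAL '1 hour') AS t;

-- List the chunks (partitions) created so far
SELECT show_chunks('sensor_signal');
```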
18. Hypertables/3
▶ Be careful with the partitioning
▶ Relevant partitions are merged using UNION ALL
▶ New data keeps on adding partitions
▶ Example: 100 sensors and 3 years of data
chunk_time_interval => INTERVAL '7 days'
number_partitions => 50
Result: potentially 3 · 52 · 50 = 7800 tables!!
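The estimate above is simply years times weeks per year times hash partitions. The sketch below repeats that arithmetic and then counts the chunks that actually exist, using the timescaledb_information.chunks view as it is available in TimescaleDB 2.x.

```sql
-- 3 years * 52 weekly chunks per year * 50 hash partitions
SELECT 3 * 52 * 50 AS potential_chunks;  -- 7800

-- Actual number of chunks per hypertable
SELECT hypertable_name, count(*) AS chunks
FROM timescaledb_information.chunks
GROUP BY hypertable_name
ORDER BY chunks DESC;
```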
19. Compression
▶ Compression aims at reducing the size of the data
▶ Done per chunk (partition)
▶ Usually also improves query time
▶ Transparent to the user
▶ Done via a TimescaleDB function
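The following is a minimal sketch of how compression might be enabled for the sensor_signal hypertable, based on the TimescaleDB 2.x API. The segment-by and order-by columns and the 30-day threshold are assumptions chosen for illustration, and newer releases may offer different names for these features.

```sql
-- Enable compression and describe how rows are grouped and ordered inside a chunk
ALTER TABLE sensor_signal SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'sensor_id',
  timescaledb.compress_orderby   = 'time DESC'
);

-- Compress chunks automatically once they are older than 30 days
SELECT add_compression_policy('sensor_signal', INTERVAL '30 days');

-- Or compress existing old chunks manually
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('sensor_signal', older_than => INTERVAL '30 days') AS c;
```

Queries against sensor_signal stay unchanged after compression, which is what makes it transparent to the user.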
20. Take home messages
▶ Timescale handles timeseries data transparently
▶ For you it is just a relation
▶ SQL will still work as before
▶ Use hyperfunctions
▶ Handy and much faster than custom-built solutions
▶ They keep on improving
▶ Use hypertables
▶ Limit the search space
▶ But be careful with how to partition
▶ Use compression
▶ Improves performance substantially
▶ Should be used on (old) read-only data
21. What are we working on?
▶ Period temporal data (project: ISTeP [4])
▶ Temporal range and overlap joins
▶ Temporal anomalies in healthcare information systems
▶ Temporal key/foreign key constraints
▶ Temporal histograms for cardinality estimation
▶ Time series data (project: PREMISE [5])
▶ Predictive maintenance for industrial equipment
▶ Data ingestion infrastructure
▶ Data storage infrastructure
▶ Feature extraction
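To give an idea of what a temporal overlap join looks like, the sketch below writes one with today's range operators. It does not show the project's own approach. The discounts table and its columns are hypothetical, and prices is the table from the range type example.

```sql
-- Hypothetical second table with validity periods
CREATE TABLE discounts (
  product INT,
  period  DATERANGE,
  percent NUMERIC
);

-- Overlap join: pair each price with the discounts valid during part of its period,
-- and keep only the time span in which both are valid
SELECT p.product,
       p.period * d.period                AS valid,
       p.value * (1 - d.percent / 100.0)  AS discounted_value
FROM prices p
JOIN discounts d
  ON p.product = d.product
 AND p.period && d.period;
```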
[4] https://dbs.inf.unibz.it/projects/istep/
[5] https://dbs.inf.unibz.it/projects/premise/