SlideShare ist ein Scribd-Unternehmen logo
1 von 22
CLICKHOUSE MATERIALIZED VIEWS
A SECRET WEAPON FOR HIGH
PERFORMANCE ANALYTICS
Robert Hodges -- Percona Live 2018 Amsterdam
Introduction to Presenter
www.altinity.com
Leading software and services
provider for ClickHouse
Major committer and community
sponsor in US and Western Europe
Robert Hodges - Altinity CEO
30+ years on DBMS plus
virtualization and security.
ClickHouse is DBMS #20
Introduction to ClickHouse
Understands SQL
Runs on bare metal to cloud
Stores data in columns
Parallel and vectorized execution
Scales to many petabytes
Is Open source (Apache 2.0)
Is WAY fast!
a b c d
a b c d
a b c d
a b c d
ClickHouse materialized views are triggers
Source tableINSERT
(Trigger)
Target table
Target table
Target table
You can even create recursive views
Source tableINSERT
(Trigger)
Target table
Target table
Target table
Why might a materialized view be useful?
SELECT
toYear(FlightDate) AS year,
sum(Cancelled) / count(*) AS cancelled,
sum(DepDel15) / count(*) AS delayed_15
FROM airline.ontime
GROUP BY year ORDER BY year ASC LIMIT 10
...
10 rows in set. Elapsed: 0.894 sec. Processed 173.82
million rows, 1.74 GB (194.52 million rows/s., 1.95
GB/s.)
Can we make it faster?
Let’s precompute and store aggregates!
CREATE MATERIALIZED VIEW ontime_daily_cancelled_mv
ENGINE = SummingMergeTree
PARTITION BY tuple() ORDER BY (FlightDate, Carrier)
POPULATE
AS SELECT
FlightDate, Carrier, count(*) AS flights,
sum(Cancelled) / count(*) AS cancelled,
sum(DepDel15) / count(*) AS delayed_15
FROM ontime
GROUP BY FlightDate, Carrier
Query performance impact is significant
SELECT
toYear(FlightDate) AS year,
sum(flights) AS flights,
sum(cancelled) AS cancelled,
sum(delayed_15) AS delayed_15
FROM airline.ontime_daily_cancelled_mv
GROUP BY year ORDER BY year ASC LIMIT 10
. . .
10 rows in set. Elapsed: 0.007 sec. Processed 148.16
thousand rows, 3.85 MB (20.37 million rows/s., 529.50
MB/s.)
What’s going on under the covers?
ontime
(MergeTree)
.inner.ontime_daily_cancelled_mv
(SummingMergeTree)
ontime_daily_cancelled_mv
(materialized view)
INSERT
SELECT
Compressed size: ~14.6GB
Uncompressed size: ~55.4GB
Compressed size: ~2.31 MB
Uncompressed size: ~4.10 MB
SELECT
INSERT
(Trigger)
Materialized view is 6471x
smaller than source table!
A brief study of ClickHouse table structures
CREATE TABLE ontime (
Year UInt16,
Quarter UInt8,
Month UInt8,
...
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (Carrier, FlightDate)
Table engine type
How to break data
into parts
How to index and
sort data in each part
Possible ways to transform tables
ontime
(Materialized
view)
ontime_agg
ontime_resorted
ontime_kafka
Reduce
data
Change
layout
Change
table type
And/Or
And/Or
Exercise: the famous ‘last point problem’
Host
7023
Host
6522
CPU
Utilization
Host
9601
CPU
Utilization
CPU
UtilizationCPU
Utilization
CPU
UtilizationCPU
Utilization
CPU Table
Problem: Show the
current CPU utilization for
each host
CPU
Utilization
CPU
Utilization
ClickHouse can solve this using a subquery
SELECT t.hostname, tags_id, 100 - usage_idle usage
FROM (
SELECT tags_id, usage_idle
FROM cpu
WHERE (tags_id, created_at) IN
(SELECT tags_id, max(created_at)
FROM cpu GROUP BY tags_id)
) AS c
INNER JOIN tags AS t ON c.tags_id = t.id
ORDER BY
usage DESC,
t.hostname ASC
LIMIT 10
TABLE SCAN!
USE INDEX
OPTIMIZED
JOIN COST
SQL queries work but are inefficient
OUTPUT:
┌─hostname──┬─tags_id─┬─usage─┐
│ host_1002 │ 9003 │ 100 │
│ host_1116 │ 9117 │ 100 │
│ host_1141 │ 9142 │ 100 │
│ host_1163 │ 9164 │ 100 │
│ host_1210 │ 9211 │ 100 │
│ host_1216 │ 9217 │ 100 │
│ host_1234 | 9235 │ 100 │
│ host_1308 │ 9309 │ 100 │
│ host_1419 │ 9420 │ 100 │
│ host_1491 │ 9492 │ 100 │
└───────────┴─────────┴───────┘
Using direct query on table:
10 rows in set. Elapsed: 0.566 sec.
Processed 32.87 million rows, 263.13
MB (53.19 million rows/s., 425.81
MB/s.)
Can we bring last
point performance
closer to real-time?
Create target table for aggregate data
CREATE TABLE cpu_last_point_idle_agg (
created_date AggregateFunction(argMax, Date, DateTime),
max_created_at AggregateFunction(max, DateTime),
time AggregateFunction(argMax, String, DateTime),
tags_id UInt32,
usage_idle AggregateFunction(argMax, Float64, DateTime)
)
ENGINE = AggregatingMergeTree()
PARTITION BY tuple()
ORDER BY tags_id
Different table type
Different storage layout
Minimal data
argMaxState links columns with aggregates
CREATE MATERIALIZED VIEW cpu_last_point_idle_mv
TO cpu_last_point_idle_agg
AS SELECT
argMaxState(created_date, created_at) AS created_date,
maxState(created_at) AS max_created_at,
argMaxState(time, created_at) AS time,
tags_id,
argMaxState(usage_idle, created_at) AS usage_idle
FROM cpu
GROUP BY tags_id
Derive data
MV
table
Digression: How aggregation works
created_at
maxState(created_at)
maxMerge(max_created_at)
Source value
Partial
aggregate
Merged
aggregate
Selecting rows that match max value
created_at
maxState(created_at)
avgMerge(created_at)
Source
values
Partial
aggregates
Merged
aggregates
usage_idle
argMaxState(usage_idle,
created_at)
avgMaxMerge(usage_idle)
(Same row)
(Pick usage_idle from
aggregate with
matching created_at)
(Pick usage_idle value
from any row with
matching created_at)
Let’s hide the merge details with a view
CREATE VIEW cpu_last_point_idle_v AS
SELECT
argMaxMerge(created_date) AS created_date,
maxMerge(max_created_at) AS created_at,
argMaxMerge(time) AS time,
tags_id,
argMaxMerge(usage_idle) AS usage_idle
FROM cpu_last_point_idle_mv
GROUP BY tags_id
...Select again from the covering view
SELECT t.hostname, tags_id, 100 - usage_idle usage
FROM cpu_last_point_idle_v AS b
INNER JOIN tags AS t ON b.tags_id = t.id
ORDER BY usage DESC, t.hostname ASC
LIMIT 10
...
10 rows in set. Elapsed: 0.005 sec. Processed 14.00
thousand rows, 391.65 KB (2.97 million rows/s., 82.97
MB/s.)
Last point view is 113 times faster
Common uses for materialized views
● Precompute aggregates
● Fetch last point data
● Transform table on-disk indexing and sorting
○ Like a Vertica projection
● Keep aggregates after raw input is dropped
● Create data cleaning pipelines
● Read from Kafka queues
Thank you!
We’re hiring!
Presenter:
rhodges@altinity.com
ClickHouse:
https://github.com/ClickHouse/ClickHouse
Altinity:
https://www.altinity.com

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdf
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouse
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
 
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
 
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
 
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEODangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouse
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei Milovidov
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
Fun with click house window functions webinar slides 2021-08-19
Fun with click house window functions webinar slides  2021-08-19Fun with click house window functions webinar slides  2021-08-19
Fun with click house window functions webinar slides 2021-08-19
 
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZEMySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZE
 
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlareClickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
 
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
 
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
 

Ähnlich wie ClickHouse materialized views - a secret weapon for high performance analytics, by Robert Hodges, Altinity CEO

Column store indexes and batch processing mode (nx power lite)
Column store indexes and batch processing mode (nx power lite)Column store indexes and batch processing mode (nx power lite)
Column store indexes and batch processing mode (nx power lite)
Chris Adkin
 
What's New for Developers in SQL Server 2008?
What's New for Developers in SQL Server 2008?What's New for Developers in SQL Server 2008?
What's New for Developers in SQL Server 2008?
ukdpe
 

Ähnlich wie ClickHouse materialized views - a secret weapon for high performance analytics, by Robert Hodges, Altinity CEO (20)

A day in the life of a click house query
A day in the life of a click house queryA day in the life of a click house query
A day in the life of a click house query
 
Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"
 
AWS Office Hours: Amazon Elastic MapReduce
AWS Office Hours: Amazon Elastic MapReduce AWS Office Hours: Amazon Elastic MapReduce
AWS Office Hours: Amazon Elastic MapReduce
 
Scaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQLScaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQL
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Column store indexes and batch processing mode (nx power lite)
Column store indexes and batch processing mode (nx power lite)Column store indexes and batch processing mode (nx power lite)
Column store indexes and batch processing mode (nx power lite)
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
 
Less08 Schema
Less08 SchemaLess08 Schema
Less08 Schema
 
Data warehouse or conventional database: Which is right for you?
Data warehouse or conventional database: Which is right for you?Data warehouse or conventional database: Which is right for you?
Data warehouse or conventional database: Which is right for you?
 
Simplifying SQL with CTE's and windowing functions
Simplifying SQL with CTE's and windowing functionsSimplifying SQL with CTE's and windowing functions
Simplifying SQL with CTE's and windowing functions
 
Extra performance out of thin air
Extra performance out of thin airExtra performance out of thin air
Extra performance out of thin air
 
Google Cluster Innards
Google Cluster InnardsGoogle Cluster Innards
Google Cluster Innards
 
MySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMySQL innoDB split and merge pages
MySQL innoDB split and merge pages
 
Webinar slides: Adding Fast Analytics to MySQL Applications with Clickhouse
Webinar slides: Adding Fast Analytics to MySQL Applications with ClickhouseWebinar slides: Adding Fast Analytics to MySQL Applications with Clickhouse
Webinar slides: Adding Fast Analytics to MySQL Applications with Clickhouse
 
Aerospike Nested CDTs - Meetup Dec 2019
Aerospike Nested CDTs - Meetup Dec 2019Aerospike Nested CDTs - Meetup Dec 2019
Aerospike Nested CDTs - Meetup Dec 2019
 
Performance Tuning Cheat Sheet for MongoDB
Performance Tuning Cheat Sheet for MongoDBPerformance Tuning Cheat Sheet for MongoDB
Performance Tuning Cheat Sheet for MongoDB
 
What's New for Developers in SQL Server 2008?
What's New for Developers in SQL Server 2008?What's New for Developers in SQL Server 2008?
What's New for Developers in SQL Server 2008?
 
SQL Server 2008 Overview
SQL Server 2008 OverviewSQL Server 2008 Overview
SQL Server 2008 Overview
 

Mehr von Altinity Ltd

Mehr von Altinity Ltd (20)

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
 
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Kürzlich hochgeladen (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 

ClickHouse materialized views - a secret weapon for high performance analytics, by Robert Hodges, Altinity CEO

  • 1. CLICKHOUSE MATERIALIZED VIEWS A SECRET WEAPON FOR HIGH PERFORMANCE ANALYTICS Robert Hodges -- Percona Live 2018 Amsterdam
  • 2. Introduction to Presenter www.altinity.com Leading software and services provider for ClickHouse Major committer and community sponsor in US and Western Europe Robert Hodges - Altinity CEO 30+ years on DBMS plus virtualization and security. ClickHouse is DBMS #20
  • 3. Introduction to ClickHouse Understands SQL Runs on bare metal to cloud Stores data in columns Parallel and vectorized execution Scales to many petabytes Is Open source (Apache 2.0) Is WAY fast! a b c d a b c d a b c d a b c d
  • 4. ClickHouse materialized views are triggers Source tableINSERT (Trigger) Target table Target table Target table
  • 5. You can even create recursive views Source tableINSERT (Trigger) Target table Target table Target table
  • 6. Why might a materialized view be useful? SELECT toYear(FlightDate) AS year, sum(Cancelled) / count(*) AS cancelled, sum(DepDel15) / count(*) AS delayed_15 FROM airline.ontime GROUP BY year ORDER BY year ASC LIMIT 10 ... 10 rows in set. Elapsed: 0.894 sec. Processed 173.82 million rows, 1.74 GB (194.52 million rows/s., 1.95 GB/s.) Can we make it faster?
  • 7. Let’s precompute and store aggregates! CREATE MATERIALIZED VIEW ontime_daily_cancelled_mv ENGINE = SummingMergeTree PARTITION BY tuple() ORDER BY (FlightDate, Carrier) POPULATE AS SELECT FlightDate, Carrier, count(*) AS flights, sum(Cancelled) / count(*) AS cancelled, sum(DepDel15) / count(*) AS delayed_15 FROM ontime GROUP BY FlightDate, Carrier
  • 8. Query performance impact is significant SELECT toYear(FlightDate) AS year, sum(flights) AS flights, sum(cancelled) AS cancelled, sum(delayed_15) AS delayed_15 FROM airline.ontime_daily_cancelled_mv GROUP BY year ORDER BY year ASC LIMIT 10 . . . 10 rows in set. Elapsed: 0.007 sec. Processed 148.16 thousand rows, 3.85 MB (20.37 million rows/s., 529.50 MB/s.)
  • 9. What’s going on under the covers? ontime (MergeTree) .inner.ontime_daily_cancelled_mv (SummingMergeTree) ontime_daily_cancelled_mv (materialized view) INSERT SELECT Compressed size: ~14.6GB Uncompressed size: ~55.4GB Compressed size: ~2.31 MB Uncompressed size: ~4.10 MB SELECT INSERT (Trigger) Materialized view is 6471x smaller than source table!
  • 10. A brief study of ClickHouse table structures CREATE TABLE ontime ( Year UInt16, Quarter UInt8, Month UInt8, ... ) ENGINE = MergeTree() PARTITION BY toYYYYMM(FlightDate) ORDER BY (Carrier, FlightDate) Table engine type How to break data into parts How to index and sort data in each part
  • 11. Possible ways to transform tables ontime (Materialized view) ontime_agg ontime_resorted ontime_kafka Reduce data Change layout Change table type And/Or And/Or
  • 12. Exercise: the famous ‘last point problem’ Host 7023 Host 6522 CPU Utilization Host 9601 CPU Utilization CPU UtilizationCPU Utilization CPU UtilizationCPU Utilization CPU Table Problem: Show the current CPU utilization for each host CPU Utilization CPU Utilization
  • 13. ClickHouse can solve this using a subquery SELECT t.hostname, tags_id, 100 - usage_idle usage FROM ( SELECT tags_id, usage_idle FROM cpu WHERE (tags_id, created_at) IN (SELECT tags_id, max(created_at) FROM cpu GROUP BY tags_id) ) AS c INNER JOIN tags AS t ON c.tags_id = t.id ORDER BY usage DESC, t.hostname ASC LIMIT 10 TABLE SCAN! USE INDEX OPTIMIZED JOIN COST
  • 14. SQL queries work but are inefficient OUTPUT: ┌─hostname──┬─tags_id─┬─usage─┐ │ host_1002 │ 9003 │ 100 │ │ host_1116 │ 9117 │ 100 │ │ host_1141 │ 9142 │ 100 │ │ host_1163 │ 9164 │ 100 │ │ host_1210 │ 9211 │ 100 │ │ host_1216 │ 9217 │ 100 │ │ host_1234 | 9235 │ 100 │ │ host_1308 │ 9309 │ 100 │ │ host_1419 │ 9420 │ 100 │ │ host_1491 │ 9492 │ 100 │ └───────────┴─────────┴───────┘ Using direct query on table: 10 rows in set. Elapsed: 0.566 sec. Processed 32.87 million rows, 263.13 MB (53.19 million rows/s., 425.81 MB/s.) Can we bring last point performance closer to real-time?
  • 15. Create target table for aggregate data CREATE TABLE cpu_last_point_idle_agg ( created_date AggregateFunction(argMax, Date, DateTime), max_created_at AggregateFunction(max, DateTime), time AggregateFunction(argMax, String, DateTime), tags_id UInt32, usage_idle AggregateFunction(argMax, Float64, DateTime) ) ENGINE = AggregatingMergeTree() PARTITION BY tuple() ORDER BY tags_id Different table type Different storage layout Minimal data
  • 16. argMaxState links columns with aggregates CREATE MATERIALIZED VIEW cpu_last_point_idle_mv TO cpu_last_point_idle_agg AS SELECT argMaxState(created_date, created_at) AS created_date, maxState(created_at) AS max_created_at, argMaxState(time, created_at) AS time, tags_id, argMaxState(usage_idle, created_at) AS usage_idle FROM cpu GROUP BY tags_id Derive data MV table
  • 17. Digression: How aggregation works created_at maxState(created_at) maxMerge(max_created_at) Source value Partial aggregate Merged aggregate
  • 18. Selecting rows that match max value created_at maxState(created_at) avgMerge(created_at) Source values Partial aggregates Merged aggregates usage_idle argMaxState(usage_idle, created_at) avgMaxMerge(usage_idle) (Same row) (Pick usage_idle from aggregate with matching created_at) (Pick usage_idle value from any row with matching created_at)
  • 19. Let’s hide the merge details with a view CREATE VIEW cpu_last_point_idle_v AS SELECT argMaxMerge(created_date) AS created_date, maxMerge(max_created_at) AS created_at, argMaxMerge(time) AS time, tags_id, argMaxMerge(usage_idle) AS usage_idle FROM cpu_last_point_idle_mv GROUP BY tags_id
  • 20. ...Select again from the covering view SELECT t.hostname, tags_id, 100 - usage_idle usage FROM cpu_last_point_idle_v AS b INNER JOIN tags AS t ON b.tags_id = t.id ORDER BY usage DESC, t.hostname ASC LIMIT 10 ... 10 rows in set. Elapsed: 0.005 sec. Processed 14.00 thousand rows, 391.65 KB (2.97 million rows/s., 82.97 MB/s.) Last point view is 113 times faster
  • 21. Common uses for materialized views ● Precompute aggregates ● Fetch last point data ● Transform table on-disk indexing and sorting ○ Like a Vertica projection ● Keep aggregates after raw input is dropped ● Create data cleaning pipelines ● Read from Kafka queues