SlideShare ist ein Scribd-Unternehmen logo
1 von 46
ClickHouse 2018
How to stop waiting for your queries
to complete and start having fun
Alexander Zaitsev
Altinity
2
Who am I
M.Sc. In mathematics from Moscow State University
Software engineer since 1997
Developed distributed systems since 2002
Focused on high performance analytics since 2007
Director of Engineering in LifeStreet
Co-founder of Altinity – ClickHouse Service Provider
3
.. and I am not Peter’s brother :)
4
What Is ClickHouse?
5
© http://mattturck.com/
6
ClickHouse DBMS is
Column Store
MPP
Realtime
SQL
Open Source
7
http://clickhouse.yandex
• Developed by Yandex for Yandex.Metrica
- Yandex (NASDAQ: YNDX) – “Russian Google” (50% market share in search, 50+
b2b and b2c products)
- Yandex.Metrica – world 2nd largest web analytics platform
• Open Source since June 2016 (Apache 2.0 license)
• 200+ companies using in production today
• Several hundred experimenting, doing POC etc.
• Dozens of contributors to the source code
8
Why Yet Another DBMS?
9
SQLFlexible
11
OpenSource
Analytical
DBMS
Commercial
Analytical
DBMS
12
ClickHouse
Fast!
Flexible!
Free!
Fun!
13
How Fast?
14
:) select count(*) from dw.ad8_fact_event;
SELECT count(*)
FROM dw.ad8_fact_event
┌───────count()─┐
│ 1261705085657 │
└───────────────┘
1 rows in set. Elapsed: 3.552 sec. Processed 1.26 trillion rows, 1.26 TB (355.22 billion
rows/s., 355.22 GB/s.)
Altinity Ltd. www.altinity.com
1+ trillion rows table
15
:) select sum(price_cpm) from dw.ad8_fact_event where access_day=today()-1 and event_key=-2;
SELECT sum(price_cpm)
FROM dw.ad8_fact_event
WHERE (access_day = (today() - 1)) AND (event_key = -2)
┌────sum(price_cpm)─┐
│ 87579.09035192338 │
└───────────────────┘
1 rows in set. Elapsed: 0.168 sec. Processed 161.89 million rows, 2.91 GB (961.83 million
rows/s., 17.31 GB/s.)
Altinity Ltd. www.altinity.com
1+ trillion rows table
16
WikiStat data, 28B rows.
https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/
17
Query 1 Query 2 Query 3 Query 4 Setup
0.034 0.061 0.178 0.498 MapD & 2-node p2.8xlarge cluster
0.051 0.146 0.047 0.794 kdb+/q & 4 Intel Xeon Phi 7210 CPUs
- 2.415 3.599 4.962 ClickHouse at Altinity demo server
0.762 2.472 4.131 6.041 BrytlytDB 1.0 & 2-node p2.16xlarge cluster
1.034 3.058 5.354 12.748 ClickHouse, Intel Core i5 4670K
1.56 1.25 2.25 2.97 Redshift, 6-node ds2.8xlarge cluster
2 2 1 3 BigQuery
6.41 6.19 6.09 6.63 Amazon Athena
8.1 18.18 n/a n/a Elasticsearch (heavily tuned)
14.389 32.148 33.448 67.312 Vertica, Intel Core i5 4670K
22 25 27 65 Spark 2.3.0 & single i3.8xlarge w/ HDFS
35 39 64 81 Presto, 5-node m3.xlarge cluster w/ HDFS
152 175 235 368 PostgreSQL 9.5 & cstore_fdw
“1.1 Billion Taxi Rides Benchmarks”
http://tech.marksblogg.com/benchmarks.html
18• 19 queries, 1200M rows table, 3-node clusters
2016 LifeStreet benchmark
(unpublished)
19
Time Series benchmarks
(first time today!)
https://github.com/timescale/tsbs
Benchmark suite to automate testing
Loads 103M rows, 10 metrics per row
Runs 15 queries, 1000 runs each in 8 parallel threads
Supports TimescaleDB, InfluxDB, Cassandra, MongoDB and ClickHouse
(Altinity PR is submitted)
20
0
100
200
300
400
500
600
700
800
900
ClickHouse TimescaleDB InfluxDB
Load time (s)
21
0
10
20
30
40
50
60
70
80
ClickHouse
TimescaleDB
InfluxDB
“Light” queries, time in ms
22
0
10
20
30
40
50
60
70
80
90
ClickHouse
TimescaleDB
InfluxDB
“Heavy” queries, time in sec
23
How flexible?
24
ClickHouse runs at
Bare metal (any Linux)
Amazon
Azure
VMware, VirtualBox
Docker, K8s
25
ClickHouse solves business problems at:
Mobile App and Web analytics
AdTech bidding analytics
Operational Logs analytics
DNS queries analysis
Stock correlation analytics
Telecom
Security audit
Fintech SaaS
Manufactoring process control
BlockChain transactions analysis
26
Worldwide
* www.altinity.com visits in 2018
27
Size does not matter
Yandex: 500+ servers, 25B rec/day
LifeStreet: 60 servers, 75B rec/day
CloudFlare: 36 servers, 200B rec/day
Bloomberg: 102 servers, 1000B rec/day
Toutiao: 400 servers, moving to 1000 this month
28
How fun ☺
life←{↑1 ω∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂ω}
29
with (select groupArray(C) from C) as Ca
select id,
groupArray(S) Sa, groupArray(V) Va, groupArray(D) Da,
groupArray(P) Pa,
arrayMap(c -> arrayFirstIndex(s -> s > c, Sa)-1, Ca) Ka,
arrayMap((c,k) -> Va[k] + (Va[k+1] - Va[k])/(Sa[k+1] -
Sa[k])*(c-Sa[k]),Ca,Ka) Ta,
arrayMap(s -> arrayFirstIndex(c -> c>s, Ca)>0 ?
arrayFirstIndex(c -> c>s, Ca)-1 : toInt32(length(Ca)), Sa) Ja,
arrayMap(i -> Ta[i], Ja) Ra,
arrayMap((v,r) -> v - r, Va, Ra) ARa,
arraySum((x,y,z) -> x*y*z, ARa, Da, Pa) result
from T group by id
30
What’s new in 2018
• Table functions mysql/odbc/file/http
• clickhouse-copier
• Predicate pushdown for views/subselects
• LowCardinality datatype
• Decimal datatype
• JOIN enhancements
• ALTER TABLE UPDATE/DELETE
• WITH ROLLUP
… and tons of performance improvements and small features
31
More user friendly than ever!
• GDPR compliance – thanks to UPDATE/DELETE
• Easier BI integration – thanks to SQL compatibility changes
and improvements in ODBC driver
• Easier cluster operation – thanks to clickhouse-copier,
distributed DDL
• Easier integration with other systems. Thanks to:
• Table functions
• Kafka storage engine
• Logs integration with Logstash, ClickTail
• clickhouse-mysql for migration from MySQL
32
Case Study. Ivinco jump on to ClickHouse
Supports mature boardreader system
A lot of data collected from different sources
A lot of operational data (performance monitoring)
200TB in MySQL!
33
Operational problems
Hard to scale
Hard to make HA solution
Performance issues:
• ‘Manual’ partitioning and sharding
• Dozens of indexes per table etc.
34
Organizational problems
No development resources to rewrite
Minimal changes to current system are allowed
35
Binary log replication from MySQL
to ClickHouse
MySQL
clickhouse-mysql
Queries
Source Data
See details at:
https://www.altinity.com/blog/2018/6/30/realtime-mysql-clickhouse-replication-in-practice
36
Results
Seamless integration of ClickHouse into the current system
No developers/coding involved, project is done with DevOps
Easy to test performance side by side
ClickHouse is 100 times faster
Now ready to re-write main system
37
More ways to integrate with MySQL
• mysql() table function
• MySQL table engine
• MySQL external dictionaries
• ProxySQL
38
mysql() table function
select * from mysql('host:port', database, 'table', 'user', 'password');
https://www.altinity.com/blog/2018/2/12/aggregate-mysql-data-at-high-speed-with-clickhouse
• Easiest and fastest way to get data from MySQL
• Load to CH table and run queries much faster
39
MySQL table engine
CREATE TABLE …
Engine = MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query,
'on_duplicate_clause']);
•SELECTs and INSERTs!
•No caching, data is queried from remote server
https://clickhouse.yandex/docs/en/operations/table_engines/mysql/
40
MySQL external dictionaries
• Makes data from mysql database accessible in ClickHouse queries
• Stores in memory
• Updates when the source data changes
SELECT dictGetString(‘dim_geo’, ‘country_name’, geo_key) country_name,
sum(imps)
FROM T
GROUP BY country_name;
41
Accessing ClickHouse from MySQL
42
ClickTail
• Log ingesting based on honeycomb.io
• Understands Nginx Access Log, MySQL Slow Log,
MySQL Audit Logs, MongoDB and Regex Custom Format
• Easily extensible with other formats
https://github.com/Altinity/clicktail
https://www.altinity.com/blog/2018/3/12/clicktail-introduction
https://www.percona.com/blog/2018/02/28/analyze-raw-mysql-query-logs-clickhouse/
https://www.percona.com/blog/2018/03/29/analyze-mysql-audit-logs-clickhouse-
clicktail/
43
Kafka Engine
Engine = Kafka MV
Engine =
MergeTree
https://clickhouse.yandex/docs/en/operations/table_engines/kafka/
ClickHouse
Kafka
44
“Secret” Roadmap disclosed
ANSI SQL JOIN support:
• Multi-table joins – Q1/2019
• merge joins – Q2/2019
Protobuf/Parquet formats - Q4/2018
Per column compression/encoding settings – Q4/2018
Dictionary DDLs – Q1/2019
Secondary indexes – Q2/2019
LDAP integration, security enhancements -- Q2/2019
45
ClickHouse Today
Mature Analytic DBMS. Proven by many companies
2+ years in Open Source
Constantly improves – new cool features were added recently
Many community contributors
Emerging eco-system
Support from Altinity
46
ClickHouse Today
47
Q&A
Contact me:
alexander.zaitsev@lifestreet.com
alz@altinity.com
skype: alex.zaitsev
telegram: @alexanderzaitsev
Altinity

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
Analyzing MySQL Logs with ClickHouse, by Peter Zaitsev
Analyzing MySQL Logs with ClickHouse, by Peter ZaitsevAnalyzing MySQL Logs with ClickHouse, by Peter Zaitsev
Analyzing MySQL Logs with ClickHouse, by Peter Zaitsev
 
Supercharge your Analytics with ClickHouse, v.2. By Vadim Tkachenko
Supercharge your Analytics with ClickHouse, v.2. By Vadim TkachenkoSupercharge your Analytics with ClickHouse, v.2. By Vadim Tkachenko
Supercharge your Analytics with ClickHouse, v.2. By Vadim Tkachenko
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
 
Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
 Adventures in Observability: How in-house ClickHouse deployment enabled Inst... Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
 
Building an Observability platform with ClickHouse
Building an Observability platform with ClickHouseBuilding an Observability platform with ClickHouse
Building an Observability platform with ClickHouse
 
ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...
ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...
ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouse
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
Clickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaClickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek Vavrusa
 
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, AdjustShipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
 
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchLet's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
 
Clickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing
Clickhouse MeetUp@ContentSquare - ContentSquare's Experience SharingClickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing
Clickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
 
Five Great Ways to Lose Data on Kubernetes - KubeCon EU 2020
Five Great Ways to Lose Data on Kubernetes - KubeCon EU 2020Five Great Ways to Lose Data on Kubernetes - KubeCon EU 2020
Five Great Ways to Lose Data on Kubernetes - KubeCon EU 2020
 
Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache KafkaFast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
 
12 in 12 – A closer look at twelve or so new things in Postgres 12
12 in 12 – A closer look at twelve or so new things in Postgres 1212 in 12 – A closer look at twelve or so new things in Postgres 12
12 in 12 – A closer look at twelve or so new things in Postgres 12
 
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Micros...
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Micros...Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Micros...
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Micros...
 

Ähnlich wie ClickHouse 2018. How to stop waiting for your queries to complete and start having fun, by Alexander Zaitsev, Altinity CTO

Writing Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark API
Databricks
 

Ähnlich wie ClickHouse 2018. How to stop waiting for your queries to complete and start having fun, by Alexander Zaitsev, Altinity CTO (20)

Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
 
What is MariaDB Server 10.3?
What is MariaDB Server 10.3?What is MariaDB Server 10.3?
What is MariaDB Server 10.3?
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
Linuxfest Northwest 2022 - MySQL 8.0 Nre Features
Linuxfest Northwest 2022 - MySQL 8.0 Nre FeaturesLinuxfest Northwest 2022 - MySQL 8.0 Nre Features
Linuxfest Northwest 2022 - MySQL 8.0 Nre Features
 
Presentation
PresentationPresentation
Presentation
 
Confoo 2021 -- MySQL New Features
Confoo 2021 -- MySQL New FeaturesConfoo 2021 -- MySQL New Features
Confoo 2021 -- MySQL New Features
 
MySQL 8.0 New Features -- September 27th presentation for Open Source Summit
MySQL 8.0 New Features -- September 27th presentation for Open Source SummitMySQL 8.0 New Features -- September 27th presentation for Open Source Summit
MySQL 8.0 New Features -- September 27th presentation for Open Source Summit
 
MySQL 开发
MySQL 开发MySQL 开发
MySQL 开发
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptx
 
Quick Wins
Quick WinsQuick Wins
Quick Wins
 
MySQL 5.7 - What's new and How to upgrade
MySQL 5.7 - What's new and How to upgradeMySQL 5.7 - What's new and How to upgrade
MySQL 5.7 - What's new and How to upgrade
 
Apache Spark v3.0.0
Apache Spark v3.0.0Apache Spark v3.0.0
Apache Spark v3.0.0
 
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
 
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
 
Writing Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark API
 
HandlerSocket plugin for MySQL (English)
HandlerSocket plugin for MySQL (English)HandlerSocket plugin for MySQL (English)
HandlerSocket plugin for MySQL (English)
 
MySQL 8.0.1 DMR
MySQL 8.0.1 DMRMySQL 8.0.1 DMR
MySQL 8.0.1 DMR
 
MySQL 5.7 - What's new, How to upgrade and Document Store
MySQL 5.7 - What's new, How to upgrade and Document StoreMySQL 5.7 - What's new, How to upgrade and Document Store
MySQL 5.7 - What's new, How to upgrade and Document Store
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdf
 

Mehr von Altinity Ltd

Mehr von Altinity Ltd (20)

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
 

Kürzlich hochgeladen

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 

Kürzlich hochgeladen (20)

How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 

ClickHouse 2018. How to stop waiting for your queries to complete and start having fun, by Alexander Zaitsev, Altinity CTO

  • 1. ClickHouse 2018 How to stop waiting for your queries to complete and start having fun Alexander Zaitsev Altinity
  • 2. 2 Who am I M.Sc. In mathematics from Moscow State University Software engineer since 1997 Developed distributed systems since 2002 Focused on high performance analytics since 2007 Director of Engineering in LifeStreet Co-founder of Altinity – ClickHouse Service Provider
  • 3. 3 .. and I am not Peter’s brother :)
  • 6. 6 ClickHouse DBMS is Column Store MPP Realtime SQL Open Source
  • 7. 7 http://clickhouse.yandex • Developed by Yandex for Yandex.Metrica - Yandex (NASDAQ: YNDX) – “Russian Google” (50% market share in search, 50+ b2b and b2c products) - Yandex.Metrica – world 2nd largest web analytics platform • Open Source since June 2016 (Apache 2.0 license) • 200+ companies using in production today • Several hundred experimenting, doing POC etc. • Dozens of contributors to the source code
  • 13. 14 :) select count(*) from dw.ad8_fact_event; SELECT count(*) FROM dw.ad8_fact_event ┌───────count()─┐ │ 1261705085657 │ └───────────────┘ 1 rows in set. Elapsed: 3.552 sec. Processed 1.26 trillion rows, 1.26 TB (355.22 billion rows/s., 355.22 GB/s.) Altinity Ltd. www.altinity.com 1+ trillion rows table
  • 14. 15 :) select sum(price_cpm) from dw.ad8_fact_event where access_day=today()-1 and event_key=-2; SELECT sum(price_cpm) FROM dw.ad8_fact_event WHERE (access_day = (today() - 1)) AND (event_key = -2) ┌────sum(price_cpm)─┐ │ 87579.09035192338 │ └───────────────────┘ 1 rows in set. Elapsed: 0.168 sec. Processed 161.89 million rows, 2.91 GB (961.83 million rows/s., 17.31 GB/s.) Altinity Ltd. www.altinity.com 1+ trillion rows table
  • 15. 16 WikiStat data, 28B rows. https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/
  • 16. 17 Query 1 Query 2 Query 3 Query 4 Setup 0.034 0.061 0.178 0.498 MapD & 2-node p2.8xlarge cluster 0.051 0.146 0.047 0.794 kdb+/q & 4 Intel Xeon Phi 7210 CPUs - 2.415 3.599 4.962 ClickHouse at Altinity demo server 0.762 2.472 4.131 6.041 BrytlytDB 1.0 & 2-node p2.16xlarge cluster 1.034 3.058 5.354 12.748 ClickHouse, Intel Core i5 4670K 1.56 1.25 2.25 2.97 Redshift, 6-node ds2.8xlarge cluster 2 2 1 3 BigQuery 6.41 6.19 6.09 6.63 Amazon Athena 8.1 18.18 n/a n/a Elasticsearch (heavily tuned) 14.389 32.148 33.448 67.312 Vertica, Intel Core i5 4670K 22 25 27 65 Spark 2.3.0 & single i3.8xlarge w/ HDFS 35 39 64 81 Presto, 5-node m3.xlarge cluster w/ HDFS 152 175 235 368 PostgreSQL 9.5 & cstore_fdw “1.1 Billion Taxi Rides Benchmarks” http://tech.marksblogg.com/benchmarks.html
  • 17. 18• 19 queries, 1200M rows table, 3-node clusters 2016 LifeStreet benchmark (unpublished)
  • 18. 19 Time Series benchmarks (first time today!) https://github.com/timescale/tsbs Benchmark suite to automate testing Loads 103M rows, 10 metrics per row Runs 15 queries, 1000 runs each in 8 parallel threads Supports TimescaleDB, InfluxDB, Cassandra, MongoDB and ClickHouse (Altinity PR is submitted)
  • 23. 24 ClickHouse runs at Bare metal (any Linux) Amazon Azure VMware, VirtualBox Docker, K8s
  • 24. 25 ClickHouse solves business problems at: Mobile App and Web analytics AdTech bidding analytics Operational Logs analytics DNS queries analysis Stock correlation analytics Telecom Security audit Fintech SaaS Manufactoring process control BlockChain transactions analysis
  • 26. 27 Size does not matter Yandex: 500+ servers, 25B rec/day LifeStreet: 60 servers, 75B rec/day CloudFlare: 36 servers, 200B rec/day Bloomberg: 102 servers, 1000B rec/day Toutiao: 400 servers, moving to 1000 this month
  • 27. 28 How fun ☺ life←{↑1 ω∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂ω}
  • 28. 29 with (select groupArray(C) from C) as Ca select id, groupArray(S) Sa, groupArray(V) Va, groupArray(D) Da, groupArray(P) Pa, arrayMap(c -> arrayFirstIndex(s -> s > c, Sa)-1, Ca) Ka, arrayMap((c,k) -> Va[k] + (Va[k+1] - Va[k])/(Sa[k+1] - Sa[k])*(c-Sa[k]),Ca,Ka) Ta, arrayMap(s -> arrayFirstIndex(c -> c>s, Ca)>0 ? arrayFirstIndex(c -> c>s, Ca)-1 : toInt32(length(Ca)), Sa) Ja, arrayMap(i -> Ta[i], Ja) Ra, arrayMap((v,r) -> v - r, Va, Ra) ARa, arraySum((x,y,z) -> x*y*z, ARa, Da, Pa) result from T group by id
  • 29. 30 What’s new in 2018 • Table functions mysql/odbc/file/http • clickhouse-copier • Predicate pushdown for views/subselects • LowCardinality datatype • Decimal datatype • JOIN enhancements • ALTER TABLE UPDATE/DELETE • WITH ROLLUP … and tons of performance improvements and small features
  • 30. 31 More user friendly than ever! • GDPR compliance – thanks to UPDATE/DELETE • Easier BI integration – thanks to SQL compatibility changes and improvements in ODBC driver • Easier cluster operation – thanks to clickhouse-copier, distributed DDL • Easier integration with other systems. Thanks to: • Table functions • Kafka storage engine • Logs integration with Logstash, ClickTail • clickhouse-mysql for migration from MySQL
  • 31. 32 Case Study. Ivinco jump on to ClickHouse Supports mature boardreader system A lot of data collected from different sources A lot of operational data (performance monitoring) 200TB in MySQL!
  • 32. 33 Operational problems Hard to scale Hard to make HA solution Performance issues: • ‘Manual’ partitioning and sharding • Dozens of indexes per table etc.
  • 33. 34 Organizational problems No development resources to rewrite Minimal changes to current system are allowed
  • 34. 35 Binary log replication from MySQL to ClickHouse MySQL clickhouse-mysql Queries Source Data See details at: https://www.altinity.com/blog/2018/6/30/realtime-mysql-clickhouse-replication-in-practice
  • 35. 36 Results Seamless integration of ClickHouse into the current system No developers/coding involved, project is done with DevOps Easy to test performance side by side ClickHouse is 100 times faster Now ready to re-write main system
  • 36. 37 More ways to integrate with MySQL • mysql() table function • MySQL table engine • MySQL external dictionaries • ProxySQL
  • 37. 38 mysql() table function select * from mysql('host:port', database, 'table', 'user', 'password'); https://www.altinity.com/blog/2018/2/12/aggregate-mysql-data-at-high-speed-with-clickhouse • Easiest and fastest way to get data from MySQL • Load to CH table and run queries much faster
  • 38. 39 MySQL table engine CREATE TABLE … Engine = MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']); •SELECTs and INSERTs! •No caching, data is queried from remote server https://clickhouse.yandex/docs/en/operations/table_engines/mysql/
  • 39. 40 MySQL external dictionaries • Makes data from mysql database accessible in ClickHouse queries • Stores in memory • Updates when the source data changes SELECT dictGetString(‘dim_geo’, ‘country_name’, geo_key) country_name, sum(imps) FROM T GROUP BY country_name;
  • 41. 42 ClickTail • Log ingesting based on honeycomb.io • Understands Nginx Access Log, MySQL Slow Log, MySQL Audit Logs, MongoDB and Regex Custom Format • Easily extensible with other formats https://github.com/Altinity/clicktail https://www.altinity.com/blog/2018/3/12/clicktail-introduction https://www.percona.com/blog/2018/02/28/analyze-raw-mysql-query-logs-clickhouse/ https://www.percona.com/blog/2018/03/29/analyze-mysql-audit-logs-clickhouse- clicktail/
  • 42. 43 Kafka Engine Engine = Kafka MV Engine = MergeTree https://clickhouse.yandex/docs/en/operations/table_engines/kafka/ ClickHouse Kafka
  • 43. 44 “Secret” Roadmap disclosed ANSI SQL JOIN support: • Multi-table joins – Q1/2019 • merge joins – Q2/2019 Protobuf/Parquet formats - Q4/2018 Per column compression/encoding settings – Q4/2018 Dictionary DDLs – Q1/2019 Secondary indexes – Q2/2019 LDAP integration, security enhancements -- Q2/2019
  • 44. 45 ClickHouse Today Mature Analytic DBMS. Proven by many companies 2+ years in Open Source Constantly improves – new cool features were added recently Many community contributors Emerging eco-system Support from Altinity