Abstract: ClickHouse is an open-source, real-time analytical database that handles petabyte-scale data, scales out linearly, and is queried with an SQL-like language.
16. SELECT foo FROM distributed_table
• Server 1: SELECT foo FROM local_table GROUP BY col1
• Server 2: SELECT foo FROM local_table GROUP BY col1
• Server 3: SELECT foo FROM local_table GROUP BY col1
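The fan-out on slide 16 can be read as a scatter-gather: the query against the distributed table is sent to every server, each server aggregates its own local_table, and the partial results are merged. A minimal Python sketch of that idea follows. It assumes `foo` is a `count()` aggregate, and the names `SHARDS`, `local_query` and `distributed_query` are hypothetical; it is a toy model, not ClickHouse's actual execution engine.

```python
from collections import Counter

# Hypothetical stand-ins for local_table on each of the three servers:
# every row is represented only by its col1 value.
SHARDS = {
    "server1": ["a", "b", "a"],
    "server2": ["b", "c"],
    "server3": ["a", "c", "c"],
}

def local_query(rows):
    """Roughly: SELECT col1, count() FROM local_table GROUP BY col1 (one shard)."""
    return Counter(rows)

def distributed_query(shards):
    """Run the local query on every shard and merge the partial aggregates."""
    merged = Counter()
    for rows in shards.values():
        merged.update(local_query(rows))  # Counter.update adds the counts
    return dict(merged)

result = distributed_query(SHARDS)
print(sorted(result.items()))
```

Each shard only ever scans its own rows, so adding servers shrinks the per-server work; the merge step touches one small partial result per shard.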
17. N servers   1       3       140
    Time, sec   1.224   0.438   0.043
    Speedup     -       x2.8    x28.5
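The speedup row on slide 17 is the single-server time divided by the time on N servers. A short Python check of that arithmetic, using only the timings from the slide (the variable names are illustrative):

```python
# Timings from slide 17: number of servers -> query time in seconds.
timings = {1: 1.224, 3: 0.438, 140: 0.043}

baseline = timings[1]
# Speedup relative to one server, rounded to one decimal as on the slide.
speedup = {n: round(baseline / t, 1) for n, t in timings.items()}

for n, s in speedup.items():
    print(f"{n:>3} servers: x{s}")
```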
Analytic database landscape:
Commercial -- fast and expensive:
Vertica
RedShift
Teradata
etc.
Open source -- somewhat slow and buggy, but free:
InfiniDB (part of MariaDB now)
InfoBright
GreenPlum (started as commercial)
Hadoop systems
ClickHouse: fast and free!
ClickHouse story:
Yandex -- Russian Google
Yandex Metrika -- Russian Google Analytics
Interactive Ad Hoc reports at multiple petabytes
That's why they developed ClickHouse
ClickHouse is extremely fast and scalable.
Why ClickHouse is so fast
Popular Yandex answer -- because they had no choice
Technical details
Vectorised processing (see VectorWise)
True MPP
True shared nothing
True column store with late materialization (like C-Store and Vertica but unlike many others):
Data compression
Column locality
No random reads
Some technical details (in Russian): https://clickhouse.yandex/presentations/meetup7/internals.pdf
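One reason column locality and compression go together: values of a single column stored next to each other tend to form long runs, especially when the data is sorted. The sketch below uses run-length encoding purely as an illustration of that effect; it is an assumed stand-in, not a description of the codecs ClickHouse actually uses, and the `country` column is made up.

```python
from itertools import groupby

def rle_encode(column):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

# Hypothetical column, stored contiguously and sorted.
country = ["DE"] * 4 + ["RU"] * 5 + ["US"] * 3

encoded = rle_encode(country)
print(len(country), "values stored as", len(encoded), "runs:", encoded)
```

In a row-oriented layout the same values would be interleaved with the other fields of each row, so runs like these would not appear.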
What a column store is (I think it is important to explain)
Why it is good for queries like range scan + aggregation (look at the Yandex presentation above, there are examples)
Conclusion -- such an architecture allows very fast queries on a single table with filters and group by's
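To make the range scan + aggregation case concrete, here is a small Python sketch of a columnar layout. The table, its column names (`day`, `site`, `hits`, `payload`) and the function `hits_by_site` are all hypothetical; the point is only that the query reads the three columns it needs and never touches the wide one.

```python
from collections import defaultdict

# Row-oriented view of a made-up table: (day, site, hits, payload).
rows = [
    (1, "a", 10, "x" * 20),
    (2, "b", 20, "y" * 20),
    (3, "a", 30, "z" * 20),
    (4, "b", 40, "w" * 20),
    (5, "a", 50, "v" * 20),
]

# Column-oriented layout: one contiguous list per column.
columns = {
    "day": [r[0] for r in rows],
    "site": [r[1] for r in rows],
    "hits": [r[2] for r in rows],
    "payload": [r[3] for r in rows],
}

def hits_by_site(cols, first_day, last_day):
    """Roughly: SELECT site, sum(hits) WHERE day BETWEEN ... GROUP BY site."""
    totals = defaultdict(int)
    # Only three of the four columns are read; "payload" is never accessed.
    for day, site, hits in zip(cols["day"], cols["site"], cols["hits"]):
        if first_day <= day <= last_day:
            totals[site] += hits
    return dict(totals)

result = hits_by_site(columns, 2, 4)
print(result)
```

On disk the same idea means the filter and the aggregate scan a few narrow, sequential column files instead of every byte of every row.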
Not enough speed -- let's see how data distribution works
Care about reliability -- let's see how replication is set up (again, can use Yandex slides above as the source)
Benchmark 1
Benchmark 2
Benchmark 3
A few words on limitations:
Custom SQL dialect
As a consequence -- limited ecosystem (cannot plug into the standard SQL one)
No deletes/updates:
but there are mutable table types (engines)
there is a way to connect to external updatable data (dictionaries)
Somewhat hard to manage -- no tools
Final word:
Potential to be MySQL for Analytics
Invite to try
Need more info -- http://clickhouse.yandex
Need consulting/support -- http://www.altinity.com