Apache Druid Design and Future prospect

© Cloudera, Inc. All rights reserved.
Apache Druid: Present and Latest
Slim Bouguerra: (bslim@apache.org)
Apache Druid PMC
Apache Hive Committer
Apache Calcite Committer

Talk Agenda
1. Use cases overview
2. Design overview
3. Project status and what's coming next

Why Druid

TIME SERIES DATABASES
Data model and use case
• Event -> timestamp | attribute_1 … attribute_n | metric_1 … metric_n
05/01/2019 @ 9:02pm (UTC) | zipcode, …, state_id, departement_id | total_sales, …, total_added, …
• Questions:
  • Business intelligence type: Top 10 revenue deals over the last week, broken down by state?
  • User behavior analytics type:
    • Approximate number of unique users who visited site1.com and site2.com in the last hour?
    • Page views, impressions?
  • IoT type: Distinct count of truck drivers exceeding a certain speed limit over the past year?
  • Server metrics monitoring type: Top 100 server latencies grouped by region over the last Black Friday week?

TIME SERIES USE CASE
SELECT `user`, sum(`c_added`) AS s, EXTRACT(year FROM `__time`)
FROM druid_table
WHERE EXTRACT(year FROM `__time`)
BETWEEN 2010 AND 2011
GROUP BY `user`, EXTRACT(year FROM `__time`) ORDER BY s DESC LIMIT 10;
Time-series data workload != normal database workload
Expected workload:
● Queries always contain date columns as group-by keys.
● Queries filter by time.
● Queries touch few columns (< 10).
● Queries have very selective filters (fewer than thousands of rows out of billions).
● Clear distinction between attribute and metric columns.

Data Model
• Mostly an insert/append workload, very few updates.
  • application/server logs
  • field sensor logs
• Schema-less
  • Adding new server/application/OS type or upgrade
  • Adding new sensors

Requirements
• Data Ingestion Rate
• Ingest petabytes of data
• Queryable in real-time.
• Arbitrary Drill-Downs, Slice-n-Dice
• Arbitrary Boolean filters
• Availability
• Downtime is evil
• Runs on commodity computers
• Scale up at peak time.
• Scale down when traffic is slow.
• Cloud Native

Companies Using Druid http://druid.io/druid-powered

Who Uses Druid?
⬢ 100s of nodes
⬢ 30+ trillion events -> 100+ PB of raw data
⬢ 300 billion events/day
⬢ More than 1 million events/sec through Druid's real-time ingestion (at peak)
⬢ 20K – 100K ingested events per second per core.
⬢ Query latency
  ○ 90% < 1 second; 95% < 5 seconds; 99% < 10 seconds
⬢ Real-time ingestion, with hundreds of queries per second for thousands of users

⬢ Replaced 5,000 HBase cluster serving six petabytes of metrics to power Flurry mobile analytics alone [infoworld.com].
⬢ Tracking more than 2 billion mobile devices [Flurry SDK @ MDC 2016].
⬢ Real-time ingestion of 20 billion events per day [Flurry SDK @ MDC 2016].
⬢ Sub-second query latency.
⬢ Query the last 15 seconds.
Ad-hoc analytics.
High concurrency user-facing real-time slice-and-dice.
Real-time loads of 10s of billions of events per day.

Druid Internals: Segments (indexes)

Data is partitioned by time!
timestamp             publisher          advertiser  gender  country  ...  click  price
2011-01-01T00:01:35Z  bieberfever.com    google.com  Male    USA           0      0.65
2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Female  UK            0      0.87
...
⬢ Time-partitioned, immutable partitions.
⬢ Typically 5 million rows per segment.
⬢ Druid segments are stored in a column orientation (only what is needed is actually loaded and scanned).
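The time partitioning on this slide can be illustrated with a minimal Python sketch. It is not Druid's implementation: the hourly granularity and the helper names `segment_key`, `partition_by_time` and `prune` are assumptions made for the example. Rows are bucketed by the hour of their timestamp, and a time-range filter then selects whole buckets without looking at individual rows.

```python
from collections import defaultdict
from datetime import datetime, timezone


def segment_key(ts: str) -> str:
    """Truncate an ISO-8601 UTC timestamp to its hour bucket (assumed granularity)."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:00:00Z")


def partition_by_time(rows):
    """Group rows into one bucket ("segment") per hour."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_key(row["timestamp"])].append(row)
    return dict(segments)


def prune(segments, start: str, end: str):
    """Keep only segments whose bucket start falls in [start, end).

    Timestamps share one fixed format, so string comparison orders them correctly.
    """
    return {key: rows for key, rows in segments.items() if start <= key < end}


# The two sample rows from the slide (only a few columns kept).
ROWS = [
    {"timestamp": "2011-01-01T00:01:35Z", "publisher": "bieberfever.com", "price": 0.65},
    {"timestamp": "2011-01-01T01:00:00Z", "publisher": "ultratrimfast.com", "price": 0.87},
]
SEGMENTS = partition_by_time(ROWS)
```

Calling `prune(SEGMENTS, "2011-01-01T01:00:00Z", "2011-01-01T02:00:00Z")` returns only the second bucket, which is the kind of partition pruning a time-range predicate allows.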

COLUMN COMPRESSION - DICTIONARIES
• Create ids
• bieberfever.com -> 0, ultratrimfast.com -> 1
• Store
• publisher -> [0, 0, 0, 1, 1, 1]
• advertiser -> [0, 0, 0, 0, 0, 0]
...
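As a rough illustration of the dictionary step, the following Python sketch assigns ids in first-seen order and stores the column as a list of ids. The function names are hypothetical and the id-assignment order is an assumption; for the slide's data it yields the same ids as shown above.

```python
def dictionary_encode(values):
    """Return (dictionary, ids): each distinct value gets the next free integer id."""
    dictionary = {}
    ids = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)  # ids assigned in first-seen order (assumption)
        ids.append(dictionary[value])
    return dictionary, ids


def dictionary_decode(dictionary, ids):
    """Rebuild the original column from the dictionary and the stored ids."""
    reverse = {i: value for value, i in dictionary.items()}
    return [reverse[i] for i in ids]


# Six rows matching the slide: three per publisher, one advertiser throughout.
PUBLISHER = ["bieberfever.com"] * 3 + ["ultratrimfast.com"] * 3
ADVERTISER = ["google.com"] * 6

PUBLISHER_DICT, PUBLISHER_IDS = dictionary_encode(PUBLISHER)
ADVERTISER_DICT, ADVERTISER_IDS = dictionary_encode(ADVERTISER)
```

Storing small integers instead of repeated strings is what makes the column compact, and the ids double as keys for per-value indexes.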

BITMAP INDEXES
...
• bieberfever.com -> [0, 1, 2] -> [111000]
• ultratrimfast.com -> [3, 4, 5] -> [000111]
• Compress using CONCISE or Roaring (takes advantage of dimension sparsity)
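A minimal sketch of how such bitmaps answer Boolean filters, using plain Python integers as uncompressed bitmaps (no CONCISE or Roaring compression). The `GENDER` column values and the helper names are hypothetical, added only to show an AND across two dimensions.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_of(bitmap):
    """Return the row numbers whose bits are set, in ascending order."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


PUBLISHER = ["bieberfever.com"] * 3 + ["ultratrimfast.com"] * 3
GENDER = ["Male", "Female", "Male", "Female", "Male", "Female"]  # hypothetical values

PUBLISHER_INDEX = build_bitmap_index(PUBLISHER)
GENDER_INDEX = build_bitmap_index(GENDER)

# publisher = 'bieberfever.com' AND gender = 'Male' becomes a bitwise AND.
BOTH = PUBLISHER_INDEX["bieberfever.com"] & GENDER_INDEX["Male"]
# publisher = 'bieberfever.com' OR publisher = 'ultratrimfast.com' becomes a bitwise OR.
EITHER = PUBLISHER_INDEX["bieberfever.com"] | PUBLISHER_INDEX["ultratrimfast.com"]
```

Note that an integer keeps row 0 in its lowest bit, so the slide's left-to-right `[111000]` corresponds to the value `0b000111` here. `rows_of(BOTH)` gives the matching row numbers without scanning the column values.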

Druid Internals: Query Processing Flow

Query Language
Timeseries -> No grouping, or grouping on the time column only
● SELECT SUM(sales) FROM table;
● SELECT floor_hour(timecolumn), SUM(sales) FROM table
GROUP BY floor_hour(timecolumn);
TopN -> Grouping on the time column plus one column, then an approximate sort with a limit
● SELECT SUM(sales) AS s_sales, floor_hour(timecolumn), state_id FROM table
GROUP BY floor_hour(timecolumn), state_id ORDER BY s_sales DESC LIMIT 10;
GroupBy -> Grouping on the time column and/or multiple columns, then a limit
● SELECT SUM(sales) AS s_sales, floor_hour(timecolumn), state_id, departement_id FROM table
GROUP BY floor_hour(timecolumn), state_id, departement_id ORDER BY s_sales DESC LIMIT 10;

DATA PROCESSING
⬢ Prune Partitions based on Time range predicate.
○ Segments [2010-2011]
⬢ Split query by time intervals to target processing nodes:
○ Push down filter predicates on attributes.
○ Prune attribute/metric columns.
⬢ Each processing node locally:
○ Build an iterator over the columns/rows matching the filters
○ Compute partial aggregates (apply limits if possible)
⬢ Collect partial results:
○ Finalize aggregate results
○ Compute post-aggregates (e.g. AVG = SUM/COUNT or SUM(sales)/SUM(expenses))
○ Apply limits
SELECT `user`, sum(`c_added`) AS s, EXTRACT(year FROM `__time`)
FROM druid_table
WHERE EXTRACT(year FROM `__time`) BETWEEN 2010 AND 2011
AND `user_country` = 'USA'
GROUP BY `user`, EXTRACT(year FROM `__time`)
ORDER BY s DESC
LIMIT 10;
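The processing steps on this slide can be sketched in a few lines of Python. This is a toy, single-process illustration under stated assumptions, not Druid's query engine: the segment layout, the sample rows and the function names are all hypothetical. It prunes segments by year, computes a partial aggregate per segment with the country filter pushed down, then merges the partials and applies the limit.

```python
import heapq
from collections import Counter

# Hypothetical segments: (year, rows), each row being (user, user_country, c_added).
SEGMENTS = [
    (2009, [("alice", "USA", 50)]),
    (2010, [("alice", "USA", 5), ("carol", "FR", 100)]),
    (2010, [("alice", "USA", 2), ("bob", "USA", 9)]),
    (2011, [("alice", "USA", 4), ("bob", "USA", 1)]),
]


def prune_segments(segments, first_year, last_year):
    """Step 1: drop segments outside the queried time range."""
    return [(year, rows) for year, rows in segments if first_year <= year <= last_year]


def partial_aggregate(year, rows, country):
    """Step 2: per segment, filter rows and sum c_added per (user, year) group."""
    partial = Counter()
    for user, user_country, c_added in rows:
        if user_country == country:
            partial[(user, year)] += c_added
    return partial


def merge_and_limit(partials, limit):
    """Step 3: merge the partial sums, then keep the top `limit` groups by sum."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts for matching keys
    return heapq.nlargest(limit, merged.items(), key=lambda item: item[1])


def run_query(segments, first_year, last_year, country, limit):
    pruned = prune_segments(segments, first_year, last_year)
    partials = [partial_aggregate(year, rows, country) for year, rows in pruned]
    return merge_and_limit(partials, limit)


RESULT = run_query(SEGMENTS, 2010, 2011, "USA", 10)
```

With the sample data, the two 2010 segments each contribute a partial sum for `alice`, and the merge step combines them before the ordering and limit are applied.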

Theta Sketches KMV: Open sourced by Yahoo! [datasketches.github.io]
⬢ Predictable approximation error that can be traded off against sketch size
○ k = 4096 corresponds to an RSE of +/- 3.2% with 95% confidence.
○ k = 16K corresponds to an RSE of +/- 1.6% with 95% confidence.
⬢ Limited memory footprint, independent of data size
○ k = 4096 -> 32768 bytes.
○ k = 16384 -> 131072 bytes.
⬢ Mergeable at query time.
○ “merge rate of about 14.5 million sketches per second per processor thread” [http://datasketches.github.io/docs/Theta/ThetaMergeSpeed.html].
⬢ Intersection can be computed at query time.
⬢ Duplication insensitive.
https://speakerdeck.com/druidio/approximate-algorithms-and-sketches-in-druid
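To make the KMV ("k minimum values") idea concrete, here is a simplified Python sketch. It is not the DataSketches library: the SHA-256 based hash, the class name `KMVSketch` and the helper functions are assumptions for illustration, and intersection is left out. The sketch keeps only the k smallest distinct 64-bit hashes, so its size does not grow with the data, duplicates have no effect, and two sketches merge by taking the k smallest hashes of their union. The two small helpers reproduce the slide's arithmetic approximately: 8 bytes per retained hash, and an error of roughly 2/sqrt(k) at two standard deviations (about 3.1% for k = 4096).

```python
import hashlib
import math

HASH_SPACE = 2 ** 64


def hash64(item: str) -> int:
    """Hash an item to a 64-bit integer (first 8 bytes of SHA-256, an assumption)."""
    return int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")


class KMVSketch:
    def __init__(self, k: int):
        self.k = k
        self.hashes = set()  # the (at most) k smallest distinct hashes seen so far

    def update(self, item: str) -> None:
        self.hashes.add(hash64(item))  # adding a duplicate changes nothing
        if len(self.hashes) > self.k:
            self.hashes.remove(max(self.hashes))

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        merged = KMVSketch(min(self.k, other.k))
        merged.hashes = set(sorted(self.hashes | other.hashes)[: merged.k])
        return merged

    def estimate(self) -> float:
        if len(self.hashes) < self.k:
            return float(len(self.hashes))  # sketch not full: the count is exact
        return (self.k - 1) * HASH_SPACE / max(self.hashes)


def build(items, k: int) -> KMVSketch:
    sketch = KMVSketch(k)
    for item in items:
        sketch.update(item)
    return sketch


def sketch_bytes(k: int) -> int:
    """Memory for k retained 64-bit hashes."""
    return 8 * k


def two_sigma_error(k: int) -> float:
    """Approximate relative error at two standard deviations (about 95% confidence)."""
    return 2 / math.sqrt(k)
```

For example, `build(site1_users, 4096).merge(build(site2_users, 4096)).estimate()` would approximate the number of unique users across both sites, where `site1_users` and `site2_users` are hypothetical lists of user ids.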

Voltron-like Architecture:

Druid DIY

Current Druid Architecture
[Architecture diagram. Labels: Streaming Data; ETL (Samza, Kafka, Storm, Spark, etc.); Realtime Nodes; Handoff; Batch Data; Hadoop; Historical Nodes; Broker Node; Queries]

Coordinator Nodes
⬢ Assigns segments to historical nodes
⬢ Interval-based cost function to distribute segments
⬢ Makes sure query load is uniform across historical nodes
⬢ Handles replication of data
⬢ Configurable rules to load/drop data

Summary
⬢ Scalability
○ Horizontal Scalability.
○ Columnar storage, indexing and compression.
○ Multi-tenancy.
⬢ Real-time
○ Ingestion latency < seconds.
○ Query latency < seconds.
⬢ Arbitrary slice and dice of big data, like a ninja
○ No more pre-canned drill downs.
○ Query with more fine-grained granularity.
⬢ High availability and Rolling deployment capabilities
○ Less costly to run.
○ Very active open source community.

Apache Druid: Current() && Next()

Apache Incubating Druid: Current

Apache Incubating Druid 0.13.0
⬢ Over 400 new features, performance/stability/documentation improvements, and bug fixes from 81 contributors.
⬢ It is the first release of Druid in the Apache Incubator program.
⬢ Native parallel batch indexing
⬢ Automatic segment compaction
⬢ System schema tables
⬢ Improved indexing task status, statistics, and error reporting
⬢ SQL-compatible null handling
⬢ Result-level broker caching
⬢ Ingestion from RDBMS
⬢ Bloom filter support
⬢ Additional SQL result formats
⬢ Additional aggregators (stringFirst/stringLast, ArrayOfDoublesSketch, HllSketch)
⬢ Support for multiple grouping specs in groupBy query
⬢ Mutual TLS support
⬢ HTTP-based worker management
⬢ Broker backpressure

Apache Incubating Druid 0.14.0
⬢ Over 200 new features, performance/stability improvements, and bug fixes from 54 contributors.
⬢ New web console
⬢ Amazon Kinesis indexing service
⬢ Decommissioning mode for Historicals
⬢ Published segment cache in Broker
⬢ Bloom filter aggregator and expression
⬢ Updated Apache Parquet extension
⬢ Force push down option for nested GroupBy queries
⬢ Better segment handoff and drop rule handling
⬢ Automatically kill MapReduce jobs when Apache Hadoop ingestion tasks are killed
⬢ DogStatsD tag support for statsd emitter
⬢ New API for retrieving all lookup specs
⬢ New compaction options
⬢ More efficient caching cost segment balancing strategy

Apache Druid Next
➔[WIP] Java 11 compatibility #5589
➔[DONE] Query vectorization #7093
➔[PROPOSAL] Dynamic prioritization and laning #6993
➔[PROPOSAL] Add support for t-digest backed aggregators #7303
➔[PROPOSAL] Initial Join Support Druid #6416

Thanks! Q & A
Slim Bouguerra: (bslim@apache.org)
