SlideShare ist ein Scribd-Unternehmen logo
1 von 27
© Cloudera, Inc. All rights reserved.
Apache Druid: Present and Latest
Slim Bouguerra: (bslim@apache.org)
Apache Druid PMC
Apache Hive Committer
Apache Calcite Committer
© Cloudera, Inc. All rights reserved. 2
Talk Agenda
1.Use cases overview
2.Design Overview
3.Project status and What’s coming Next
© Cloudera, Inc. All rights reserved. 3
Why Druid
© Cloudera, Inc. All rights reserved. 4
TIME SERIES DATABASEs
Data model and use case
• Event -> timestamp, attribute_1…attribute_n, || metric_1….metric_n
05/01/2019 @ 9:02pm (UTC), | zipcode, ….,state_id, departement_id | | total_sales, ……, total_added …|
• Questions:
• Business intelligence type: Top 10 revenue deals the last week broken down by states?
• User behaviors analytics type:
• Approximate Number of unique users visited site1.com and site2.com last hour?
• Page Views, Impressions ?
• IOT type: Count distinct of truck drivers passing certain speed limit over the past year?
• Server metrics monitoring type: Top 100 latency of servers grouped by region over the last
black Friday week?
© Cloudera, Inc. All rights reserved. 5
TIME SERIES USE CASE
SELECT `user`, sum(`c_added`) AS s, EXTRACT(year FROM `__time`)
FROM druid_table
WHERE EXTRACT(year FROM `__time`)
BETWEEN 2010 AND 2011
GROUP BY `user`, EXTRACT(year FROM `__time`) ORDER BY s DESC LIMIT 10;
Time-series data workload != normal database workload
● Queries always contain date columns as group by keys.
● Queries Filters by time.
● Queries touch few columns < 10.
● Queries have very selective Filters (< thousands of rows out of billions)
● Clear Distinction between attributes and metrics columns
Expected Workload:
© Cloudera, Inc. All rights reserved. 6
Data Model
• Mostly an insert/append workload, very few
updates.
• application/server logs
• field sensor logs
• Schema less
• Adding new server/application/OS type or upgrade
• Adding new sensors
© Cloudera, Inc. All rights reserved. 7
Requirements
• Data Ingestion Rate
• Ingest petabytes of data
• Queryable in real-time.
• Arbitrary Drill-Downs, Slice-n-Dice
• Arbitrary Boolean filters
• Availability
• Downtime is evil
• Runs on commodity computers
• Scale up at peak time.
• Scale down when the traffic slow.
• Cloud Native
© Cloudera, Inc. All rights reserved.
Companies Using Druid http://druid.io/druid-powered
© Cloudera, Inc. All rights reserved.
Who Uses Druid?
⬢ 100s of nodes
⬢ 30+ trillion events -> 100+ PB of raw data
⬢ 300 billion events/day
⬢ More 1 million events/sec through Druid's real-time ingestion (at peak)
⬢ 20K – 100K ingested events per second per core.
⬢ Query Latency
– 90% < Second; 95% < 5 Seconds; 99% < 10 Seconds
⬢ Real-time Ingestion hundreds of queries per second for thousands of users
© Cloudera, Inc. All rights reserved.
⬢ Replaced 5,000 Hbase cluster serving six petabytes of metrics to power Flurry
mobile analytics alone.[infoworld.com]
⬢ Tracking more than 2 billion mobiles devices [Flurry SDK @ MDC 2016].
⬢ Real-time Ingestion 20 billion events per day [Flurry SDK @ MDC 2016].
⬢ Sub second query latency.
⬢ Query the last 15 second.
Ad-hoc analytics.
High concurrency user-facing real-time slice-and-dice.
Real-time loads of 10s of billions of events per day.
© Cloudera, Inc. All rights reserved.
Druid Internals: Segments (indexes)
© Cloudera, Inc. All rights reserved.
Data is partitioned by time !
timestamp publisher advertiser gender country ... click price
2011-01-01T00:01:35Z bieberfever.com google.com Male USA 0 0.65
2011-01-01T00:03:63Z bieberfever.com google.com Male USA 0 0.62
2011-01-01T00:04:51Z bieberfever.com google.com Male USA 1 0.45
2011-01-01T01:00:00Z ultratrimfast.com google.com Female UK 0 0.87
2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 0 0.99
2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 1 1.53
...
⬢ Time partitioned immutable partitions.
⬢ Typically 5 Million row per segment.
⬢ Druid segments are stored in a column orientation (only what is needed is actually
loaded and scanned).
© Cloudera, Inc. All rights reserved.
COLUMN COMPRESSION -
DICTIONARIES
• Create ids
• bieberfever.com -> 0, ultratrimfast.com -> 1
• Store
• publisher -> [0, 0, 0, 1, 1, 1]
• advertiser -> [0, 0, 0, 0, 0, 0]
timestamp publisher advertiser gender country ... click price
2011-01-01T00:01:35Z bieberfever.com google.com Male USA 0 0.65
2011-01-01T00:03:63Z bieberfever.com google.com Male USA 0 0.62
2011-01-01T00:04:51Z bieberfever.com google.com Male USA 1 0.45
2011-01-01T01:00:00Z ultratrimfast.com google.com Female UK 0 0.87
2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 0 0.99
2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 1 1.53
...
© Cloudera, Inc. All rights reserved.
BITMAP INDEXES
timestamp publisher advertiser gender country ... click price
2011-01-01T00:01:35Z bieberfever.com google.com Male USA 0 0.65
2011-01-01T00:03:63Z bieberfever.com google.com Male USA 0 0.62
2011-01-01T00:04:51Z bieberfever.com google.com Male USA 1 0.45
2011-01-01T01:00:00Z ultratrimfast.com google.com Female UK 0 0.87
2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 0 0.99
2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 1 1.53
...
• bieberfever.com -> [0, 1, 2] -> [111000]
• ultratrimfast.com -> [3, 4, 5] -> [000111]
• Compress using Concice or Roaring (take advantage of dimension sparsity)
© Cloudera, Inc. All rights reserved.
Druid Internals: Query Processing Flow
© Cloudera, Inc. All rights reserved. 17
Query Language
Timeseries : -> No grouping OR grouping time column
● Select SUM(sales) from table;
● Select floor_hour(timecolumn), SUM(sales) from table;
TopN :-> Grouping on time column + one column then approximate sort with
limit
● Select SUM(sales) as s_sales,floor_hour(timecolumn), state_id from
table group by state_id order by s_sales limit 10;
GroupBy :-> Grouping on time column and or multiple column then limit.
● Select SUM(sales) as s_sales,floor_hour(timecolumn), state_id,
departement_id from table group by state_id, departement_id order by
s_sales limit 10;
© Cloudera, Inc. All rights reserved.
DATA PROCESSING
⬢ Prune Partitions based on Time range predicate.
○ Segments [2010-2011]
⬢ Split query by time intervals to target processing nodes:
○ Push down filter predicates on attributes.
○ Prune attribute/metric columns.
⬢ Each processing node locally:
○ Build iterator columns/rows matching filters
○ Compute partial aggregate (apply limits if possible)
⬢ Collect partial results:
○ Finalize aggregate results
○ Compute post aggregate (eg AVG = SUM/COUNT or SUM(sales)/SUM(expenses))
○ Apply limits
SELECT `user`,
sum(`c_added`) AS s,
EXTRACT(year FROM
`__time`)
FROM druid_table
WHERE EXTRACT(year FROM
`__time`)
BETWEEN 2010 AND 2011
AND `user_country` =
‘USA’
GROUP BY `user`,
EXTRACT(year FROM
`__time`)
ORDER BY s DESC
LIMIT 10;
© Cloudera, Inc. All rights reserved.
Theta Sketches KMV: Open sourced by Yahoo! [datasketches.github.io]
⬢ Predictable approximation error can be trade-off by sketch size
○ k = 4096 corresponds to an RSE of +/- 3.2% with 95% confidence.
○ k = 16K corresponds to an RSE of +/- 1.6% with 95% confidence.
⬢ Limited memory footprint and independent from data size
○ k = 4096 -> 32768 bytes.
○ K = 16384 -> 131072 bytes.
⬢ Mergebale at query time.
○ “merge rate of about 14.5 million sketches per second per processor thread”
[http://datasketches.github.io/docs/Theta/ThetaMergeSpeed.html].
⬢ Intersection can be computed at query time.
⬢ Duplication insensitive.
https://speakerdeck.com/druidio/approximate-algorithms-and-sketches-in-druid
© Cloudera, Inc. All rights reserved.
Voltron Like Architecture:
© Cloudera, Inc. All rights reserved.
Druid DIY
© Cloudera, Inc. All rights reserved.
Current Druid Architecture
Hadoop
Historical
Node
Historical
Node
Historical
Node
Batch Data
Broker
Node
Queries
ETL
(Samza,
Kafka, Storm,
Spark etc)
Streaming
Data Realtime
Node
Realtime
Node
Handoff
© Cloudera, Inc. All rights reserved.
Coordinator Nodes
⬢ Assigns segments to historical nodes
⬢ Interval based cost function to distribute segments
⬢ Makes sure query load is uniform across historical nodes
⬢ Handles replication of data
⬢ Configurable rules to load/drop data
© Cloudera, Inc. All rights reserved.
Summary
⬢ Scalability
○ Horizontal Scalability.
○ Columnar storage, indexing and compression.
○ Multi-tennancy.
⬢ Real-time
○ Ingestion latency < seconds.
○ Query latency < seconds.
⬢ Arbitrary slice and dice big data like ninja
○ No more pre-canned drill downs.
○ Query with more fine-grained granularity.
⬢ High availability and Rolling deployment capabilities
○ Less costly to run.
○ Very active open source community.
© Cloudera, Inc. All rights reserved. 25
Apache Druid: Current() && Next()
© Cloudera, Inc. All rights reserved.
Apache Incubating Druid: Current
⬢ Over 200 new features, performance/stability
improvements, and bug fixes from 54 contributors.
⬢ New web console
⬢ Amazon Kinesis indexing service
⬢ Decommissioning mode for Historicals
⬢ Published segment cache in Broker
⬢ Bloom filter aggregator and expression
⬢ Updated Apache Parquet extension
⬢ Force push down option for nested GroupBy queries
⬢ Better segment handoff and drop rule handling
⬢ Automatically kill MapReduce jobs when Apache Hadoop
ingestion tasks are killed
⬢ DogStatsD tag support for statsd emitter
⬢ New API for retrieving all lookup specs
⬢ New compaction options
⬢ More efficient caching cost segment balancing strategy
⬢ Over 400 new features, performance/stability/documentation
improvements, and bug fixes from 81 contributors.
⬢ It is the first release of Druid in the Apache Incubator program.
⬢ native parallel batch indexing
⬢ automatic segment compaction
⬢ system schema tables
⬢ improved indexing task status, statistics, and error reporting
⬢ SQL-compatible null handling
⬢ result-level broker caching
⬢ ingestion from RDBMS
⬢ Bloom filter support
⬢ additional SQL result formats
⬢ additional aggregators (stringFirst/stringLast, ArrayOfDoublesSketch,
HllSketch)
⬢ Support for multiple grouping specs in groupBy query
⬢ mutual TLS support
⬢ HTTP-based worker management
⬢ Broker backpressure
Apache Incubating Druid 0.13.0 Apache Incubating Druid 0.14.0
© Cloudera, Inc. All rights reserved. 27
Apache Druid Next
➔[WIP] Java 11 compatibility #5589
➔[DONE] Query vectorization #7093
➔[PROPOSAL] Dynamic prioritization and laning #6993
➔[PROPOSAL] Add support for t-digest backed aggregators #7303
➔[PROPOSAL] Initial Join Support Druid #6416
© Cloudera, Inc. All rights reserved. 28
Thanks! Q & A
Slim Bouguerra: (bslim@apache.org)

Weitere ähnliche Inhalte

Was ist angesagt?

Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
DataWorks Summit
 

Was ist angesagt? (20)

Druid: Under the Covers (Virtual Meetup)
Druid: Under the Covers (Virtual Meetup)Druid: Under the Covers (Virtual Meetup)
Druid: Under the Covers (Virtual Meetup)
 
What does Netflix, NTT and Rubicon Project have in common? Apache Druid.
What does Netflix, NTT and Rubicon Project have in common? Apache Druid.What does Netflix, NTT and Rubicon Project have in common? Apache Druid.
What does Netflix, NTT and Rubicon Project have in common? Apache Druid.
 
What’s New in Imply 3.3 & Apache Druid 0.18
What’s New in Imply 3.3 & Apache Druid 0.18What’s New in Imply 3.3 & Apache Druid 0.18
What’s New in Imply 3.3 & Apache Druid 0.18
 
August meetup - All about Apache Druid
August meetup - All about Apache Druid August meetup - All about Apache Druid
August meetup - All about Apache Druid
 
Apache Druid®: A Dance of Distributed Processes
 Apache Druid®: A Dance of Distributed Processes Apache Druid®: A Dance of Distributed Processes
Apache Druid®: A Dance of Distributed Processes
 
Why data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsWhy data warehouses cannot support hot analytics
Why data warehouses cannot support hot analytics
 
Druid Adoption Tips and Tricks
Druid Adoption Tips and TricksDruid Adoption Tips and Tricks
Druid Adoption Tips and Tricks
 
Apache Druid Vision and Roadmap
Apache Druid Vision and RoadmapApache Druid Vision and Roadmap
Apache Druid Vision and Roadmap
 
Analytics over Terabytes of Data at Twitter
Analytics over Terabytes of Data at TwitterAnalytics over Terabytes of Data at Twitter
Analytics over Terabytes of Data at Twitter
 
The of Operational Analytics Data Store
The of Operational Analytics Data StoreThe of Operational Analytics Data Store
The of Operational Analytics Data Store
 
Druid @ branch
Druid @ branch Druid @ branch
Druid @ branch
 
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using DruidJuly 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
 
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
 
Building Data Applications with Apache Druid
Building Data Applications with Apache DruidBuilding Data Applications with Apache Druid
Building Data Applications with Apache Druid
 
Splunk: Druid on Kubernetes with Druid-operator
Splunk: Druid on Kubernetes with Druid-operatorSplunk: Druid on Kubernetes with Druid-operator
Splunk: Druid on Kubernetes with Druid-operator
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
 
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
 
Data Analytics with Druid
Data Analytics with DruidData Analytics with Druid
Data Analytics with Druid
 
Can My Inventory Survive Eventual Consistency?
Can My Inventory Survive Eventual Consistency?Can My Inventory Survive Eventual Consistency?
Can My Inventory Survive Eventual Consistency?
 
Druid
DruidDruid
Druid
 

Ähnlich wie Apache Druid Design and Future prospect

What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
DataStax
 

Ähnlich wie Apache Druid Design and Future prospect (20)

How to Lower TCO and Avoid Cloud Lock-in

How to Lower TCO and Avoid Cloud Lock-in
How to Lower TCO and Avoid Cloud Lock-in

How to Lower TCO and Avoid Cloud Lock-in

 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best Practices
 
Understanding apache-druid
Understanding apache-druidUnderstanding apache-druid
Understanding apache-druid
 
Everything You Need to Know About Sharding
Everything You Need to Know About ShardingEverything You Need to Know About Sharding
Everything You Need to Know About Sharding
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
 
MongoDB World 2019: From Traditional Oil and Gas to Sustainable Energy OR Fro...
MongoDB World 2019: From Traditional Oil and Gas to Sustainable Energy OR Fro...MongoDB World 2019: From Traditional Oil and Gas to Sustainable Energy OR Fro...
MongoDB World 2019: From Traditional Oil and Gas to Sustainable Energy OR Fro...
 
Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)
Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)
Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
Dynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the fieldDynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the field
 
Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform 
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
 
Beginners Guide to High Availability for Postgres
Beginners Guide to High Availability for PostgresBeginners Guide to High Availability for Postgres
Beginners Guide to High Availability for Postgres
 
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
 
Behind the Wizard’s Curtain: Scalability and Security at Zuora (Subscribed13)
Behind the Wizard’s Curtain:  Scalability and Security at Zuora (Subscribed13)Behind the Wizard’s Curtain:  Scalability and Security at Zuora (Subscribed13)
Behind the Wizard’s Curtain: Scalability and Security at Zuora (Subscribed13)
 
Introduction to Amazon EC2
Introduction to Amazon EC2Introduction to Amazon EC2
Introduction to Amazon EC2
 
Getting the most Bang for your Buck with #EC2 #Winning
Getting the most Bang for your Buck with #EC2 #WinningGetting the most Bang for your Buck with #EC2 #Winning
Getting the most Bang for your Buck with #EC2 #Winning
 
Get the Most Bang for Your Buck with #EC2 #WINNING
Get the Most Bang for Your Buck with #EC2 #WINNINGGet the Most Bang for Your Buck with #EC2 #WINNING
Get the Most Bang for Your Buck with #EC2 #WINNING
 
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
 
HBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseHBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBase
 

Kürzlich hochgeladen

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
jaanualu31
 

Kürzlich hochgeladen (20)

A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Air Compressor reciprocating single stage
Air Compressor reciprocating single stageAir Compressor reciprocating single stage
Air Compressor reciprocating single stage
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 

Apache Druid Design and Future prospect

  • 1. © Cloudera, Inc. All rights reserved. Apache Druid: Present and Latest Slim Bouguerra: (bslim@apache.org) Apache Druid PMC Apache Hive Committer Apache Calcite Committer
  • 2. © Cloudera, Inc. All rights reserved. 2 Talk Agenda 1.Use cases overview 2.Design Overview 3.Project status and What’s coming Next
  • 3. © Cloudera, Inc. All rights reserved. 3 Why Druid
  • 4. © Cloudera, Inc. All rights reserved. 4 TIME SERIES DATABASEs Data model and use case • Event -> timestamp, attribute_1…attribute_n, || metric_1….metric_n 05/01/2019 @ 9:02pm (UTC), | zipcode, ….,state_id, departement_id | | total_sales, ……, total_added …| • Questions: • Business intelligence type: Top 10 revenue deals the last week broken down by states? • User behaviors analytics type: • Approximate Number of unique users visited site1.com and site2.com last hour? • Page Views, Impressions ? • IOT type: Count distinct of truck drivers passing certain speed limit over the past year? • Server metrics monitoring type: Top 100 latency of servers grouped by region over the last black Friday week?
  • 5. © Cloudera, Inc. All rights reserved. 5 TIME SERIES USE CASE SELECT `user`, sum(`c_added`) AS s, EXTRACT(year FROM `__time`) FROM druid_table WHERE EXTRACT(year FROM `__time`) BETWEEN 2010 AND 2011 GROUP BY `user`, EXTRACT(year FROM `__time`) ORDER BY s DESC LIMIT 10; Time-series data workload != normal database workload ● Queries always contain date columns as group by keys. ● Queries Filters by time. ● Queries touch few columns < 10. ● Queries have very selective Filters (< thousands of rows out of billions) ● Clear Distinction between attributes and metrics columns Expected Workload:
  • 6. © Cloudera, Inc. All rights reserved. 6 Data Model • Mostly an insert/append workload, very few updates. • application/server logs • field sensor logs • Schema less • Adding new server/application/OS type or upgrade • Adding new sensors
  • 7. © Cloudera, Inc. All rights reserved. 7 Requirements • Data Ingestion Rate • Ingest petabytes of data • Queryable in real-time. • Arbitrary Drill-Downs, Slice-n-Dice • Arbitrary Boolean filters • Availability • Downtime is evil • Runs on commodity computers • Scale up at peak time. • Scale down when the traffic slow. • Cloud Native
  • 8. © Cloudera, Inc. All rights reserved. Companies Using Druid http://druid.io/druid-powered
  • 9. © Cloudera, Inc. All rights reserved. Who Uses Druid? ⬢ 100s of nodes ⬢ 30+ trillion events -> 100+ PB of raw data ⬢ 300 billion events/day ⬢ More 1 million events/sec through Druid's real-time ingestion (at peak) ⬢ 20K – 100K ingested events per second per core. ⬢ Query Latency – 90% < Second; 95% < 5 Seconds; 99% < 10 Seconds ⬢ Real-time Ingestion hundreds of queries per second for thousands of users
  • 10. © Cloudera, Inc. All rights reserved. ⬢ Replaced 5,000 Hbase cluster serving six petabytes of metrics to power Flurry mobile analytics alone.[infoworld.com] ⬢ Tracking more than 2 billion mobiles devices [Flurry SDK @ MDC 2016]. ⬢ Real-time Ingestion 20 billion events per day [Flurry SDK @ MDC 2016]. ⬢ Sub second query latency. ⬢ Query the last 15 second. Ad-hoc analytics. High concurrency user-facing real-time slice-and-dice. Real-time loads of 10s of billions of events per day.
  • 11. © Cloudera, Inc. All rights reserved. Druid Internals: Segments (indexes)
  • 12. © Cloudera, Inc. All rights reserved. Data is partitioned by time ! timestamp publisher advertiser gender country ... click price 2011-01-01T00:01:35Z bieberfever.com google.com Male USA 0 0.65 2011-01-01T00:03:63Z bieberfever.com google.com Male USA 0 0.62 2011-01-01T00:04:51Z bieberfever.com google.com Male USA 1 0.45 2011-01-01T01:00:00Z ultratrimfast.com google.com Female UK 0 0.87 2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 0 0.99 2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 1 1.53 ... ⬢ Time partitioned immutable partitions. ⬢ Typically 5 Million row per segment. ⬢ Druid segments are stored in a column orientation (only what is needed is actually loaded and scanned).
  • 13. © Cloudera, Inc. All rights reserved. COLUMN COMPRESSION - DICTIONARIES • Create ids • bieberfever.com -> 0, ultratrimfast.com -> 1 • Store • publisher -> [0, 0, 0, 1, 1, 1] • advertiser -> [0, 0, 0, 0, 0, 0] timestamp publisher advertiser gender country ... click price 2011-01-01T00:01:35Z bieberfever.com google.com Male USA 0 0.65 2011-01-01T00:03:63Z bieberfever.com google.com Male USA 0 0.62 2011-01-01T00:04:51Z bieberfever.com google.com Male USA 1 0.45 2011-01-01T01:00:00Z ultratrimfast.com google.com Female UK 0 0.87 2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 0 0.99 2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 1 1.53 ...
  • 14. © Cloudera, Inc. All rights reserved. BITMAP INDEXES timestamp publisher advertiser gender country ... click price 2011-01-01T00:01:35Z bieberfever.com google.com Male USA 0 0.65 2011-01-01T00:03:63Z bieberfever.com google.com Male USA 0 0.62 2011-01-01T00:04:51Z bieberfever.com google.com Male USA 1 0.45 2011-01-01T01:00:00Z ultratrimfast.com google.com Female UK 0 0.87 2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 0 0.99 2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 1 1.53 ... • bieberfever.com -> [0, 1, 2] -> [111000] • ultratrimfast.com -> [3, 4, 5] -> [000111] • Compress using Concice or Roaring (take advantage of dimension sparsity)
  • 15. © Cloudera, Inc. All rights reserved. Druid Internals: Query Processing Flow
  • 16. © Cloudera, Inc. All rights reserved. 17 Query Language Timeseries : -> No grouping OR grouping time column ● Select SUM(sales) from table; ● Select floor_hour(timecolumn), SUM(sales) from table; TopN :-> Grouping on time column + one column then approximate sort with limit ● Select SUM(sales) as s_sales,floor_hour(timecolumn), state_id from table group by state_id order by s_sales limit 10; GroupBy :-> Grouping on time column and or multiple column then limit. ● Select SUM(sales) as s_sales,floor_hour(timecolumn), state_id, departement_id from table group by state_id, departement_id order by s_sales limit 10;
  • 17. © Cloudera, Inc. All rights reserved. DATA PROCESSING ⬢ Prune Partitions based on Time range predicate. ○ Segments [2010-2011] ⬢ Split query by time intervals to target processing nodes: ○ Push down filter predicates on attributes. ○ Prune attribute/metric columns. ⬢ Each processing node locally: ○ Build iterator columns/rows matching filters ○ Compute partial aggregate (apply limits if possible) ⬢ Collect partial results: ○ Finalize aggregate results ○ Compute post aggregate (eg AVG = SUM/COUNT or SUM(sales)/SUM(expenses)) ○ Apply limits SELECT `user`, sum(`c_added`) AS s, EXTRACT(year FROM `__time`) FROM druid_table WHERE EXTRACT(year FROM `__time`) BETWEEN 2010 AND 2011 AND `user_country` = ‘USA’ GROUP BY `user`, EXTRACT(year FROM `__time`) ORDER BY s DESC LIMIT 10;
  • 18. © Cloudera, Inc. All rights reserved. Theta Sketches KMV: Open sourced by Yahoo! [datasketches.github.io] ⬢ Predictable approximation error can be trade-off by sketch size ○ k = 4096 corresponds to an RSE of +/- 3.2% with 95% confidence. ○ k = 16K corresponds to an RSE of +/- 1.6% with 95% confidence. ⬢ Limited memory footprint and independent from data size ○ k = 4096 -> 32768 bytes. ○ K = 16384 -> 131072 bytes. ⬢ Mergebale at query time. ○ “merge rate of about 14.5 million sketches per second per processor thread” [http://datasketches.github.io/docs/Theta/ThetaMergeSpeed.html]. ⬢ Intersection can be computed at query time. ⬢ Duplication insensitive. https://speakerdeck.com/druidio/approximate-algorithms-and-sketches-in-druid
  • 19. © Cloudera, Inc. All rights reserved. Voltron Like Architecture:
  • 20. © Cloudera, Inc. All rights reserved. Druid DIY
  • 21. © Cloudera, Inc. All rights reserved. Current Druid Architecture Hadoop Historical Node Historical Node Historical Node Batch Data Broker Node Queries ETL (Samza, Kafka, Storm, Spark etc) Streaming Data Realtime Node Realtime Node Handoff
  • 22. © Cloudera, Inc. All rights reserved. Coordinator Nodes ⬢ Assigns segments to historical nodes ⬢ Interval based cost function to distribute segments ⬢ Makes sure query load is uniform across historical nodes ⬢ Handles replication of data ⬢ Configurable rules to load/drop data
  • 23. © Cloudera, Inc. All rights reserved. Summary ⬢ Scalability ○ Horizontal Scalability. ○ Columnar storage, indexing and compression. ○ Multi-tennancy. ⬢ Real-time ○ Ingestion latency < seconds. ○ Query latency < seconds. ⬢ Arbitrary slice and dice big data like ninja ○ No more pre-canned drill downs. ○ Query with more fine-grained granularity. ⬢ High availability and Rolling deployment capabilities ○ Less costly to run. ○ Very active open source community.
  • 24. © Cloudera, Inc. All rights reserved. 25 Apache Druid: Current() && Next()
  • 25. © Cloudera, Inc. All rights reserved. Apache Incubating Druid: Current ⬢ Over 200 new features, performance/stability improvements, and bug fixes from 54 contributors. ⬢ New web console ⬢ Amazon Kinesis indexing service ⬢ Decommissioning mode for Historicals ⬢ Published segment cache in Broker ⬢ Bloom filter aggregator and expression ⬢ Updated Apache Parquet extension ⬢ Force push down option for nested GroupBy queries ⬢ Better segment handoff and drop rule handling ⬢ Automatically kill MapReduce jobs when Apache Hadoop ingestion tasks are killed ⬢ DogStatsD tag support for statsd emitter ⬢ New API for retrieving all lookup specs ⬢ New compaction options ⬢ More efficient caching cost segment balancing strategy ⬢ Over 400 new features, performance/stability/documentation improvements, and bug fixes from 81 contributors. ⬢ It is the first release of Druid in the Apache Incubator program. ⬢ native parallel batch indexing ⬢ automatic segment compaction ⬢ system schema tables ⬢ improved indexing task status, statistics, and error reporting ⬢ SQL-compatible null handling ⬢ result-level broker caching ⬢ ingestion from RDBMS ⬢ Bloom filter support ⬢ additional SQL result formats ⬢ additional aggregators (stringFirst/stringLast, ArrayOfDoublesSketch, HllSketch) ⬢ Support for multiple grouping specs in groupBy query ⬢ mutual TLS support ⬢ HTTP-based worker management ⬢ Broker backpressure Apache Incubating Druid 0.13.0 Apache Incubating Druid 0.14.0
  • 26. © Cloudera, Inc. All rights reserved. 27 Apache Druid Next ➔[WIP] Java 11 compatibility #5589 ➔[DONE] Query vectorization #7093 ➔[PROPOSAL] Dynamic prioritization and laning #6993 ➔[PROPOSAL] Add support for t-digest backed aggregators #7303 ➔[PROPOSAL] Initial Join Support Druid #6416
  • 27. © Cloudera, Inc. All rights reserved. 28 Thanks! Q & A Slim Bouguerra: (bslim@apache.org)