Apache Drill

Jakub Pieprzyk

Short introduction to Apache Drill

Data & Analytics

Apache Drill
Introduction
Jakub Pieprzyk

A long time ago… Hadoop
Big data batch processing.
Huge volumes of data.
Responsiveness was not a concern.

Batch processing by design
HDFS
MapReduce applications
[Diagram: chained map (M) and reduce (R) stages]

At some point...
Hadoop clusters hold a lot of data that is useful for ad-hoc analysis.
Data exploration is hard in batch mode (“data lake”, “schema on read”) because it involves a lot of iterative tasks.
Servers now have more RAM, SSD drives...

Wide range of products emerged...
Tez
Spark
Facebook: Presto
(Google Dremel) → Apache Drill
Cloudera: Impala

Apache Drill
Scalable query engine
Queries different data sources, both with a schema and schema-free
JDBC / MongoDB / file system / Hive / HBase
Text files / Parquet / sequence files / MapR-DB

Integration with existing BI tools
Apache Drill comes with JDBC/ODBC drivers.
Support for many data sources and formats, combined with responsiveness, makes it a good candidate for a Business Intelligence backend.

Interfaces
Command line (similar to Beeline)
JDBC/ODBC
Web Console
C/Java API
REST API

Architecture highlights
A cluster of nodes, each running the Drillbit service.
A Drillbit is responsible for receiving queries, generating the plan and executing it.
ZooKeeper is used to maintain cluster membership.
Clients can connect to any node (or via ZooKeeper) and submit queries.
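
The two connection paths above (straight to a node, or through ZooKeeper) correspond to two JDBC URL forms. The sketch below only builds the URL strings; it assumes Drill's `jdbc:drill:drillbit=...` and `jdbc:drill:zk=...` schemes, and the host names, the port 31010, the ZooKeeper path `drill` and the cluster ID `drillbits1` are placeholder values that may differ in a given installation.

```python
def direct_url(host, port=31010):
    """JDBC URL for connecting straight to a single Drillbit.

    The default port 31010 is an assumption; check the cluster configuration.
    """
    return f"jdbc:drill:drillbit={host}:{port}"


def zookeeper_url(zk_hosts, zk_root="drill", cluster_id="drillbits1"):
    """JDBC URL that lets the driver find a Drillbit through the ZooKeeper quorum.

    zk_root and cluster_id are placeholder values, not taken from the slides.
    """
    quorum = ",".join(zk_hosts)
    return f"jdbc:drill:zk={quorum}/{zk_root}/{cluster_id}"


if __name__ == "__main__":
    print(direct_url("node1.example.com"))
    print(zookeeper_url(["zk1.example.com:2181", "zk2.example.com:2181"]))
```

Going through ZooKeeper means the client does not depend on any single node being up, which is presumably why the slide lists it as an alternative.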

Architecture highlights (cont.)
The schema can be discovered at runtime, so there is no need to know it before executing the query.
Storage plugins can provide access to custom databases.
A distributed cache is used to share metadata, plans and statistics (Infinispan in-memory key-value data store).
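
To make the idea of runtime schema discovery concrete, here is a small illustration in plain Python. It is not how Drill implements it; it only shows that field names and types can be derived while reading the records, without a table definition. The sample records are made up.

```python
import json


def discover_schema(json_lines):
    """Infer {field name: set of type names} while scanning the records.

    No schema is declared up front; fields that appear only in some
    records are simply added when they are first seen.
    """
    schema = {}
    for line in json_lines:
        record = json.loads(line)
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Hypothetical input: the second record has a field the first one lacks.
records = [
    '{"id": 1, "name": "Ann"}',
    '{"id": 2, "name": "Bob", "salary": 1200.5}',
]
schema = discover_schema(records)
print(schema)
```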

Performance
Columnar processing
Data locality (when executed on Hadoop cluster)
Vectorization (processing a vector of values from a single column rather than whole rows)
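
The difference between row-wise and columnar processing can be sketched in a few lines of Python. This is only an illustration of the data layout, not of Drill's internals; the three sample records reuse values from the Wikipedia pageview data shown later in the deck.

```python
# Row-oriented layout: every record carries all of its fields.
rows = [
    {"page": "A1_road_in_London", "views": 1, "bytes": 35107},
    {"page": "A1_steak_sauce", "views": 1, "bytes": 13905},
    {"page": "A1_volleyball_league_(Greece)", "views": 1, "bytes": 17636},
]


def to_columns(rows):
    """Pivot a list of records into one list (vector) per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_bytes_row_wise(rows):
    """Walks whole records even though only one field is needed."""
    return sum(row["bytes"] for row in rows)


def total_bytes_columnar(columns):
    """Touches only the vector of values for the one column it needs."""
    return sum(columns["bytes"])


columns = to_columns(rows)
print(total_bytes_row_wise(rows), total_bytes_columnar(columns))
```

Both functions return the same total; the columnar version just never has to look at `page` or `views`.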

Simple query
FROM cp.`employee.json`
cp: reads data from the classpath
employee.json: the file is JSON
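
One way to run a query against the classpath file above is Drill's REST API. The sketch below is a minimal example, assuming a Drillbit on `localhost` with the web port 8047 and the `/query.json` endpoint; the `SELECT *` and `LIMIT 5` parts of the query are assumptions, since the slide only shows the FROM clause.

```python
import json
import urllib.request


def build_query_request(sql, host="localhost", port=8047):
    """Build (but do not send) a POST request for Drill's /query.json endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, host="localhost", port=8047):
    """Send the query and return the decoded JSON response.

    Needs a running Drillbit; host and port are assumptions.
    """
    with urllib.request.urlopen(build_query_request(sql, host, port)) as response:
        return json.load(response)


# Assumed full query; only the FROM clause comes from the slide.
request = build_query_request("SELECT * FROM cp.`employee.json` LIMIT 5")
print(request.full_url)
```

Calling `run_query(...)` instead of `build_query_request(...)` would actually submit the query, so it is left out of the example to keep it runnable without a cluster.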

Hive → Drill migration?
Apache Drill is a good candidate for a fast SQL solution over Hadoop.
When deployed alongside Hive, it adds ad-hoc query capabilities.
Can use the Hive Metastore
Can use Hive UDFs

Hive → Drill
Data types roughly match those in Hive (although DECIMAL is still in alpha).
Analytical functions are similar to those in Hive, but not yet 100% implemented; e.g. the moving average AVG(x) OVER (ORDER BY time ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) is not available.
Hive UDFs are supported (but the JAR needs to be uploaded to every host).
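
For readers unfamiliar with the window frame in the moving-average example, the following Python sketch shows what `ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING` computes. It assumes the values are already ordered by time and contain no NULLs; the function name and the sample values are illustrative only.

```python
def moving_average(values, preceding=2, following=2):
    """Average each value with up to `preceding` rows before and `following` rows after it.

    Near the edges the frame is simply shorter, as with a SQL ROWS frame.
    """
    result = []
    for i in range(len(values)):
        start = max(0, i - preceding)
        end = min(len(values), i + following + 1)
        frame = values[start:end]
        result.append(sum(frame) / len(frame))
    return result


averages = moving_average([1, 2, 3, 4, 5])
print(averages)  # [2.0, 2.5, 3.0, 3.5, 4.0]
```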

Web Console - Hive Plugin Configuration

Embarrassingly simple performance test...
...just to put some numbers in the presentation ;)
Hadoop cluster:
3 nodes: data node / node manager / Apache Drill
  2 nodes: 16 GB RAM, 2 CPUs x 2 cores
  1 node: 10 GB RAM, 2 CPUs x 2 cores
+1 node: name node / resource manager / Hive server

Hive MR vs. Drill
Wikipedia pageview counts (columns: project, article page, views, bytes):
en A1_road_in_London 1 35107
en A1_steak_sauce 1 13905
en A1_volleyball_league_(Greece) 1 17636
en A1chieve 1 6558
en A2%20road 1 7402
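
The sample lines above are space-delimited with four fields. As a rough sketch of that record layout, the following Python parses one line into named fields; the class and field names are illustrative choices, not part of the original deck.

```python
from typing import NamedTuple


class PageCount(NamedTuple):
    project: str      # e.g. "en"
    page: str         # article page title (URL-encoded)
    views: int        # number of page views
    size_bytes: int   # the "bytes" column


def parse_line(line):
    """Split one space-delimited pagecount line into typed fields."""
    project, page, views, size_bytes = line.rstrip("\n").split(" ")
    return PageCount(project, page, int(views), int(size_bytes))


record = parse_line("en A1_steak_sauce 1 13905")
print(record)
```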

Hive schema
create table wiki_pagecounts (
  prj string,    -- project
  page string,   -- article page
  pv int,        -- page views
  bytes bigint
)
partitioned by (ts string)
row format delimited fields terminated by ' ';

Timing: Hive (MR) vs. Drill
Q1 - simple count per partition (group by)
Q2 - top page within hour/language (row_number)
Q3 - mobile share (group by, case statement)
Q4 - top pages with percentage of page views (join, group by, row_number)

Integration with YARN?
Currently (Drill 1.5) not supported.
There is a ticket for this: DRILL-142.
It would make deployment much easier and resource management more efficient.

Kerberos?
Currently (Drill 1.5), Kerberos is not supported when accessing HDFS.
Ticket opened: DRILL-3584
Without it, fitting Drill into an existing secured Hadoop environment may be challenging.

Web Console
http://<node>:8047

Web Console - Running Queries

Web Console - Query Execution

Web Console - Storage Plugins

Apache Drill GitHub commits

Thanks!
