1
Apache Drill Technical Overview
Jacques Nadeau, jacques@apache.org
May 22, 2013
2
Basic Process
[Diagram: Zookeeper alongside three Drillbits; each Drillbit has a Distributed Cache and is paired with DFS/HBase storage; a Query enters at one Drillbit]
1. Query comes to any Drillbit
2. Drillbit generates execution plan based on affinity
3. Fragments are farmed to individual nodes
4. Data is returned to driving node
3
Core Modules within a Drillbit
SQL Parser
Optimizer
PhysicalPlan
DFS Engine
HBase Engine
RPC Endpoint
Distributed Cache
StorageEngineInterface
LogicalPlan
Execution
4
Query States
SQL
▪ What we want to do (analyst friendly)
Logical Plan
▪ What we want to do (language agnostic, computer friendly)
Physical Plan
▪ How we want to do it (the best way we can tell)
Execution Plan (fragments)
▪ Where we want to do it
5
SQL
SELECT
t.cf1.name as name,
SUM(t.cf1.sales) as total_sales
FROM m7://cluster1/sales t
GROUP BY name
ORDER BY total_sales DESC
LIMIT 10;
6
Logical Plan: API/Format using JSON
▪ Designed to be as easy as possible for language implementers to use
  – Sugared syntax such as the sequence meta-operator
▪ Don’t constrain ourselves to a SQL-specific paradigm; support complex data type operators such as collapse and expand as well
▪ Allow late typing
sequence: [
  { op: scan, storageengine: m7, selection: {table: sales}},
  { op: project, projections: [
    {ref: name, expr: cf1.name},
    {ref: sales, expr: cf1.sales}]},
  { op: segment, ref: by_name, exprs: [name]},
  { op: collapsingaggregate, target: by_name, carryovers: [name],
    aggregations: [{ref: total_sales, expr: sum(sales)}]},
  { op: order, ordering: [{order: desc, expr: total_sales}]},
  { op: store, storageengine: screen}
]
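As a rough illustration of what the operator sequence above computes, here is a minimal Python sketch that runs the same steps (scan, project, segment plus collapsing aggregate, order) over a few in-memory rows. The sample rows and every function name are assumptions made for illustration; this is not Drill's implementation or API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the scanned "sales" table.
ROWS = [
    {"cf1": {"name": "ann", "sales": 10}},
    {"cf1": {"name": "bob", "sales": 5}},
    {"cf1": {"name": "ann", "sales": 7}},
]

def project(rows):
    """project: expose cf1.name as `name` and cf1.sales as `sales`."""
    for row in rows:
        yield {"name": row["cf1"]["name"], "sales": row["cf1"]["sales"]}

def segment_and_aggregate(rows):
    """segment by name, then collapse each segment into one row with a sum."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["name"]] += row["sales"]
    return [{"name": name, "total_sales": total} for name, total in totals.items()]

def order_desc(rows, key):
    """order: sort rows by `key`, largest first."""
    return sorted(rows, key=lambda row: row[key], reverse=True)

result = order_desc(segment_and_aggregate(project(ROWS)), "total_sales")
```

The final store-to-screen step would simply print `result`.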
7
Physical Plan
▪ Insert points of parallelization where the optimizer thinks they are necessary
  – If we thought that the cardinality of name would be high, we might use the alternative sort > range-merge-exchange > streaming aggregate > sort > range-merge-exchange instead of the simpler hash-random-exchange > sorting-hash-aggregate.
▪ Pick the right version of each operator
  – For example, here we’ve picked the sorting hash aggregate. Since a hash aggregate is already a blocking operator, doing the sort simultaneously allows us to avoid materializing an intermediate state.
▪ Apply projection and other push-down rules into capable operators
  – Note that the projection is gone, applied directly to the m7scan operator.
{ @id: 1, pop: m7scan, cluster: def, table: sales, cols: [cf1.name, cf1.sales]}
{ @id: 2, pop: hash-random-exchange, input: 1, expr: 1}
{ @id: 3, pop: sorting-hash-aggregate, input: 2,
  grouping: 1, aggr: [sum(2)], carry: [1], sort: ~aggr[0]
}
{ @id: 4, pop: screen, input: 3}
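The simpler pipeline named above (hash exchange followed by a sorting hash aggregate) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `zlib.crc32` stands in for whatever hash function an exchange would use, the partition count and sample rows are made up, and none of the function names come from Drill.

```python
import zlib
from collections import defaultdict

def hash_exchange(rows, width):
    """Route each (name, sales) row to one of `width` partitions by key hash."""
    partitions = [[] for _ in range(width)]
    for name, sales in rows:
        partitions[zlib.crc32(name.encode()) % width].append((name, sales))
    return partitions

def sorting_hash_aggregate(partition):
    """Sum sales per name, then emit the groups ordered by descending total."""
    totals = defaultdict(int)
    for name, sales in partition:
        totals[name] += sales
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

rows = [("ann", 10), ("bob", 5), ("ann", 7), ("cy", 12)]
partials = [sorting_hash_aggregate(p) for p in hash_exchange(rows, width=2)]

# Every name lands in exactly one partition, so the per-partition totals are
# already final; the receiving side only has to merge the sorted outputs.
merged = sorted(
    (kv for part in partials for kv in part),
    key=lambda kv: kv[1],
    reverse=True,
)
```

Because the exchange partitions on the grouping key, no second aggregation pass is needed after the merge.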
8
Execution Plan
▪ Break plan into major fragments
▪ Determine quantity of parallelization for each task based on estimated costs as well as maximum parallelization for each fragment (file size for now)
▪ Collect endpoint affinity for each HasAffinity operator
▪ Assign particular nodes based on affinity, load and topology
▪ Generate minor versions of each fragment for individual execution
FragmentId:
▪ Major = portion of dataflow
▪ Minor = a particular version of that execution (1 or more)
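A minimal sketch of the major/minor identifier, assuming zero-based numbering and a hypothetical helper name; the slides do not specify either.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FragmentId:
    major: int  # which portion of the dataflow
    minor: int  # which parallel copy of that portion

def expand_major_fragment(major, width):
    """Create one minor fragment id per unit of parallelization (at least 1)."""
    return [FragmentId(major, minor) for minor in range(max(1, width))]

ids = expand_major_fragment(major=2, width=3)
```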
9
Execution Plan, cont’d
Each execution plan has:
▪ One root fragment (runs on driving node)
▪ Leaf fragments (first tasks to run)
▪ Intermediate fragments (won’t start until they receive data from their children)
▪ In the case where the query output is routed to storage, the root operator will often receive metadata to present rather than data
[Diagram: fragment tree with one Root fragment, two Intermediate fragments, and two Leaf fragments]
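The three fragment roles can be read off the shape of the fragment tree. The sketch below classifies fragments from child-to-parent edges; the fragment ids and the tree shape are assumptions chosen to match the diagram.

```python
# child fragment -> parent fragment (hypothetical ids)
PARENT = {1: 0, 2: 0, 3: 1, 4: 2}

def classify(parent):
    """Label each fragment as root, intermediate, or leaf."""
    nodes = set(parent) | set(parent.values())
    has_children = set(parent.values())
    kinds = {}
    for node in sorted(nodes):
        if node not in parent:
            kinds[node] = "root"          # no parent: runs on the driving node
        elif node in has_children:
            kinds[node] = "intermediate"  # waits for data from its children
        else:
            kinds[node] = "leaf"          # first tasks to run
    return kinds

kinds = classify(PARENT)
```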
10
Example Fragments
Leaf Fragment 1
{
  pop : "hash-partition-sender",
  @id : 1,
  child : {
    pop : "mock-scan",
    @id : 2,
    url : "http://apache.org",
    entries : [ {
      id : 1,
      records : 4000
    } ]
  },
  destinations : [ "Cglsb2NhbGhvc3QY0gk=" ]
}
Leaf Fragment 2
{
  pop : "hash-partition-sender",
  @id : 1,
  child : {
    pop : "mock-scan",
    @id : 2,
    url : "http://apache.org",
    entries : [ {
      id : 1,
      records : 4000
    }, {
      id : 2,
      records : 4000
    } ]
  },
  destinations : [ "Cglsb2NhbGhvc3QY0gk=" ]
}
Root Fragment
{
  pop : "screen",
  @id : 1,
  child : {
    pop : "random-receiver",
    @id : 2,
    providingEndpoints : [ "Cglsb2NhbGhvc3QY0gk=" ]
  }
}
Intermediate Fragment
{
  pop : "single-sender",
  @id : 1,
  child : {
    pop : "mock-store",
    @id : 2,
    child : {
      pop : "filter",
      @id : 3,
      child : {
        pop : "random-receiver",
        @id : 4,
        providingEndpoints : [ "Cglsb2NhbGhvc3QYqRI=",
                               "Cglsb2NhbGhvc3QY0gk=" ]
      },
      expr : " ('b') > (5) "
    }
  },
  destinations : [ "Cglsb2NhbGhvc3QYqRI=" ]
}
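The destinations and providingEndpoints strings appear to be base64-encoded protobuf messages. The sketch below decodes them with a minimal wire-format reader that handles only varint and length-delimited fields. Reading field 1 as a host name and field 3 as a port is an assumption based on the decoded values, not something the slides state.

```python
import base64

def read_varint(buf, pos):
    """Decode one protobuf varint starting at `pos`; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def decode_fields(encoded):
    """Return {field_number: value} for varint and length-delimited fields."""
    buf = base64.b64decode(encoded)
    pos, fields = 0, {}
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            fields[number], pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited bytes
            length, pos = read_varint(buf, pos)
            fields[number] = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return fields

first = decode_fields("Cglsb2NhbGhvc3QY0gk=")   # {1: b'localhost', 3: 1234}
second = decode_fields("Cglsb2NhbGhvc3QYqRI=")  # {1: b'localhost', 3: 2345}
```

Under that reading, the example fragments exchange data between two endpoints on localhost, at ports 1234 and 2345.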
11
Execution Flow
[Diagram components: Drill Client, UserServer, Query, Foreman, BitCom, Parser, Optimizer, Execution Planner]
12
SQL Parser
▪ Leverage Optiq
▪ Add support for “any” type
▪ Add support for nested and repeated[] references
▪ Add transformation rules to convert from SQL AST to Logical plan syntax
13
Optimizer
▪ Convert Logical to Physical
▪ Very much TBD
▪ Likely leverage Optiq
▪ Hardest problem in system, especially given lack of statistics
▪ Probably not parallel
14
Execution Planner
▪ Each scan operator provides a maximum width of parallelization based on the number of read entries (similar to splits)
▪ Decision of parallelization width is based on a simple disk cost (data size)
▪ Affinity orders the location of fragment assignment
▪ Storage, Scan and Exchange operators are informed of the actual endpoint assignments so they can then re-decide their entries (splits)
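One way to picture the two planner decisions above, width and placement, is the sketch below. The cost model (bytes divided by a target slice size), the round-robin placement, and all names and numbers are assumptions for illustration, not the planner's actual rules.

```python
import math

def choose_width(total_bytes, bytes_per_fragment, max_width):
    """Pick a parallelization width from data size, capped by the scan's maximum."""
    wanted = max(1, math.ceil(total_bytes / bytes_per_fragment))
    return min(wanted, max_width)

def assign_endpoints(affinity, width):
    """Place `width` minor fragments, preferring endpoints with higher affinity."""
    ranked = sorted(affinity, key=lambda endpoint: (-affinity[endpoint], endpoint))
    return [ranked[i % len(ranked)] for i in range(width)]

width = choose_width(total_bytes=3000, bytes_per_fragment=1000, max_width=8)
placement = assign_endpoints({"a": 0.2, "b": 0.7, "c": 0.1}, width=4)
```

A fuller planner would also weigh load and topology, as the Execution Plan slide notes.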
15
Grittier
16
Execution Engine
▪ Single JVM per Drillbit
▪ Small heap space for object management
▪ Small set of network event threads to manage socket operations
▪ Callbacks for each message sent
▪ Messages contain a header and a collection of native byte buffers
▪ Designed to minimize copies and ser/de costs
▪ Query setup and fragment runners are managed via processing queues & thread pools
17
Data
▪ Records are broken into batches
▪ Batches contain a schema and a collection of fields
▪ Each field has a particular type (e.g. smallint)
▪ Fields (a.k.a. columns) are stored in ValueVectors
▪ ValueVectors are façades to byte buffers
▪ The in-memory structure of each ValueVector is well defined and language agnostic
▪ ValueVectors are defined based on the width and nature of the underlying data
  – RepeatMap Fixed1 Fixed2 Fixed4 Fixed8 Fixed12 Fixed16 Bit FixedLen VarLen1 VarLen2 VarLen4
▪ There are three sub value vector types
  – Optional (nullable), required or repeated
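To make the "façade over a byte buffer" idea concrete, here is a small sketch of a required and an optional (nullable) 4-byte vector. The little-endian layout, the separate validity bitmap, and the class and method names are all assumptions; the slides do not describe Drill's actual memory layout.

```python
import struct

class Fixed4Vector:
    """Required vector of 4-byte signed ints stored in one contiguous buffer."""

    def __init__(self, count):
        self.buf = bytearray(4 * count)

    def set(self, i, value):
        struct.pack_into("<i", self.buf, 4 * i, value)

    def get(self, i):
        return struct.unpack_from("<i", self.buf, 4 * i)[0]

class NullableFixed4Vector(Fixed4Vector):
    """Optional variant: one validity bit per value, 0 meaning null."""

    def __init__(self, count):
        super().__init__(count)
        self.bits = bytearray((count + 7) // 8)

    def set(self, i, value):
        if value is None:
            self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF
        else:
            self.bits[i // 8] |= 1 << (i % 8)
            super().set(i, value)

    def get(self, i):
        if not (self.bits[i // 8] >> (i % 8)) & 1:
            return None
        return super().get(i)

vec = NullableFixed4Vector(3)
vec.set(0, 7)
vec.set(2, -1)
values = [vec.get(i) for i in range(3)]
```

Because the values live in a flat buffer with a fixed stride, the same bytes could be read from another language without any per-record deserialization.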
18
Execution Paradigm
▪ We will have a large number of operators
▪ Each operator works on a batch of records at a time
▪ A loose goal is for batches to be roughly the size of a single core’s L2 cache
▪ Each batch of records carries a schema
▪ An operator is responsible for reconfiguring itself if a new schema arrives (or rejecting the record batch if the schema is disallowed)
▪ Most operators are the combination of a set of static operations along with the evaluation of query-specific expressions
▪ Runtime-compiled operators are the combination of a pre-compiled template and a runtime-compiled set of expressions
▪ Exchange operators are converted into Senders and Receivers when the execution plan is materialized
▪ Each operator must support consumption of a SelectionVector, a partial materialization of a filter
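The SelectionVector idea can be sketched as a filter that emits row indices instead of copying the surviving rows, and a downstream operator that reads through those indices. The function names and the use of plain Python lists are assumptions for illustration.

```python
def filter_batch(column, predicate):
    """Return a selection vector: indices of the values that pass the filter."""
    return [i for i, value in enumerate(column) if predicate(value)]

def sum_selected(column, selection=None):
    """Aggregate a column, honoring an optional selection vector."""
    indices = range(len(column)) if selection is None else selection
    return sum(column[i] for i in indices)

batch = [3, 8, 1, 9]
selection = filter_batch(batch, lambda value: value > 5)
filtered_total = sum_selected(batch, selection)
full_total = sum_selected(batch)
```

The batch itself is never rewritten; only the list of indices changes as filters are applied.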
19
Storage Engine
▪ Input and output are done through storage engines
  – (and the specialized screen storage operator)
▪ A storage engine is responsible for providing metadata and statistics about the data
▪ A storage engine exposes a set of optimizer (plan rewrite) rules to support things such as predicate pushdown
▪ A storage engine provides one or more storage-engine-specific scan operators that can support affinity exposure and task splitting
  – These are generated based on a StorageEngine-specific configuration
▪ The primary interfaces are RecordReader and RecordWriter
▪ RecordReaders are responsible for
  – Converting stored data into Drill’s canonical ValueVector format a batch at a time
  – Providing a schema for each record batch
▪ Our initial storage engines will be for DFS and HBase
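The two RecordReader responsibilities listed above, converting stored rows into columnar batches and supplying a schema per batch, might look like the sketch below. The method name `next_batch`, the tuple return shape, and the list-backed reader are hypothetical; they are not Drill's actual RecordReader interface.

```python
from abc import ABC, abstractmethod

class RecordReader(ABC):
    @abstractmethod
    def next_batch(self):
        """Return (schema, columns) for the next batch, or None when exhausted."""

class ListRecordReader(RecordReader):
    """Toy reader over in-memory rows, standing in for a DFS or HBase reader."""

    def __init__(self, rows, batch_size):
        self.rows, self.batch_size, self.pos = rows, batch_size, 0

    def next_batch(self):
        chunk = self.rows[self.pos:self.pos + self.batch_size]
        if not chunk:
            return None
        self.pos += len(chunk)
        # Schema is derived per batch, so it may change between batches.
        schema = {name: type(value).__name__ for name, value in chunk[0].items()}
        # Pivot row-oriented records into one list per column.
        columns = {name: [row[name] for row in chunk] for name in schema}
        return schema, columns

reader = ListRecordReader(
    [{"name": "ann", "sales": 10}, {"name": "bob", "sales": 5}, {"name": "cy", "sales": 12}],
    batch_size=2,
)
first_batch = reader.next_batch()
second_batch = reader.next_batch()
end = reader.next_batch()
```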
20
Messages
▪ Foreman drives query
▪ Foreman saves intermediate fragments to distributed cache
▪ Foreman sends leaf fragments directly to execution nodes
▪ Executing fragments push record batches to their fragment’s destination nodes
▪ When a destination node receives the first fragment for a new query, it retrieves its appropriate fragment from the distributed cache, sets up the required framework, then waits until the start requirement is met:
  – A fragment is evaluated for the number of different sending streams that are required before the query can actually be scheduled, based on each exchange’s “supportsOutOfOrder” capability
  – When the IncomingBatchHandler recognizes that its start criteria have been reached, it begins
  – In the meantime, the destination node will buffer (potentially to disk)
▪ Fragment status messages are pushed back to the foreman directly from individual nodes
▪ A single failure status causes the foreman to cancel all other parts of the query
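The buffer-then-start behavior on the receiving node can be sketched as follows. The start criterion used here (at least one batch seen from every required stream) is an assumption; the slides only say it depends on each exchange's supportsOutOfOrder capability. The class body is illustrative and is not Drill's IncomingBatchHandler.

```python
class IncomingBatchHandler:
    """Buffers incoming batches until every required sending stream has reported."""

    def __init__(self, required_streams):
        self.required = set(required_streams)
        self.seen = set()
        self.buffer = []
        self.started = False

    def on_batch(self, stream_id, batch):
        """Accept a batch; return the batches released to the fragment (maybe none)."""
        self.seen.add(stream_id)
        self.buffer.append((stream_id, batch))
        if not self.started and self.required <= self.seen:
            self.started = True  # start criteria reached: fragment may begin
        if not self.started:
            return []            # keep buffering (a real node might spill to disk)
        released, self.buffer = self.buffer, []
        return released

handler = IncomingBatchHandler({"a", "b"})
r1 = handler.on_batch("a", 1)
r2 = handler.on_batch("a", 2)
r3 = handler.on_batch("b", 3)
r4 = handler.on_batch("a", 4)
```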
21
Scheduling
▪ Plan is to leverage the concepts inside Sparrow
▪ Reality is that receiver-side buffering and pre-assigned execution locations mean that this is very much up in the air right now
22
Operation/Configuration
▪ Drillbit is a single JVM
▪ Extension is done by building to an API and generating a jar file that includes a drill-module.conf file with information about where that module needs to be inserted
▪ All configuration is done via a JSON-like configuration metaphor that supports complex types
▪ Node discovery/service registry is done through Zookeeper
▪ Metrics are collected utilizing the Yammer metrics module
23
User Interfaces
▪ Drill provides DrillClient
  – Encapsulates endpoint discovery
  – Supports logical and physical plan submission, query cancellation, query status
  – Supports streaming of returned results
▪ Drill will provide a JDBC driver which converts JDBC into DrillClient communication
  – Currently SQL parsing is done client side
    • Artifact of the current state of Optiq
    • Need to slim down the JDBC driver and push stuff remotely
▪ In time, will add REST proxy for DrillClient
24
Technologies
▪ Jackson for JSON SerDe for metadata
▪ Typesafe HOCON for configuration and module management
▪ Netty 4 as core RPC engine, protobuf for communication
▪ Vanilla Java, LArray and Netty ByteBuf for off-heap large data structure help
▪ Hazelcast for distributed cache
▪ Curator on top of Zookeeper for service registry
▪ Optiq for SQL parsing and cost optimization
▪ Parquet (probably) as ‘native’ format
▪ Janino for expression compilation
▪ ASM for bytecode manipulation
▪ Yammer Metrics for metrics
▪ Guava extensively
▪ Carrot HPPC for primitive collections

Evolving from RDBMS to NoSQL + SQL
 

Kürzlich hochgeladen

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Technical Overview of Apache Drill by Jacques Nadeau

  • 1. Technical Overview. Jacques Nadeau, jacques@apache.org. May 22, 2013
  • 2. Basic Process
      (Diagram: Zookeeper, plus three Drillbits, each with a Distributed Cache and DFS/HBase; a Query arrives.)
      1. Query comes to any Drillbit
      2. Drillbit generates execution plan based on affinity
      3. Fragments are farmed to individual nodes
      4. Data is returned to driving node
  • 3. Core Modules within a Drillbit
      (Diagram: SQL Parser, Optimizer, LogicalPlan, PhysicalPlan, Execution, RPC Endpoint, Distributed Cache, StorageEngineInterface, DFS Engine, HBase Engine.)
  • 4. Query States
      - SQL: what we want to do (analyst friendly)
      - Logical Plan: what we want to do (language agnostic, computer friendly)
      - Physical Plan: how we want to do it (the best way we can tell)
      - Execution Plan (fragments): where we want to do it
  • 5. SQL
      SELECT
        t.cf1.name as name,
        SUM(t.cf1.sales) as total_sales
      FROM m7://cluster1/sales t
      GROUP BY name
      ORDER BY total_sales desc
      LIMIT 10;
  • 6. Logical Plan: API/Format using JSON
      - Designed to be as easy as possible for language implementers to utilize
          – Sugared syntax such as the sequence meta-operator
      - Don't constrain ourselves to a SQL-specific paradigm: support complex data type operators such as collapse and expand as well
      - Allow late typing
      sequence: [
        { op: scan, storageengine: m7, selection: {table: sales}}
        { op: project, projections: [
            {ref: name, expr: cf1.name},
            {ref: sales, expr: cf1.sales}]}
        { op: segment, ref: by_name, exprs: [name]}
        { op: collapsingaggregate, target: by_name, carryovers: [name],
          aggregations: [{ref: total_sales, expr: sum(sales)}]}
        { op: order, ordering: [{order: desc, expr: total_sales}]}
        { op: store, storageengine: screen}
      ]
  • 7. Physical Plan
      - Insert points of parallelization where the optimizer thinks they are necessary
          – If we thought that the cardinality of name would be high, we might use an alternative of sort > range-merge-exchange > streaming aggregate > sort > range-merge-exchange instead of the simpler hash-random-exchange > sorting-hash-aggregate.
      - Pick the right version of each operator
          – For example, here we've picked the sorting hash aggregate. Since a hash aggregate is already a blocking operator, doing the sort simultaneously allows us to avoid materializing an intermediate state.
      - Apply projection and other push-down rules into capable operators
          – Note that the projection is gone, applied directly to the m7scan operator.
      { @id: 1, op: m7scan, cluster: def, table: sales, cols: [cf1.name, cf1.sales]}
      { @id: 2, op: hash-random-exchange, input: 1, expr: 1}
      { @id: 3, op: sorting-hash-aggregate, input: 2, grouping: 1, aggr: [sum(2)], carry: [1], sort: ~aggr[0]}
      { @id: 4, op: screen, input: 3}
  • 8. Execution Plan
      - Break plan into major fragments
      - Determine quantity of parallelization for each task based on estimated costs as well as maximum parallelization for each fragment (file size for now)
      - Collect up endpoint affinity for each particular HasAffinity operator
      - Assign particular nodes based on affinity, load and topology
      - Generate minor versions of each fragment for individual execution
      FragmentId:
      - Major = portion of dataflow
      - Minor = a particular version of that execution (1 or more)
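The width and assignment steps on this slide can be sketched roughly as follows. This is a minimal illustration, not Drill's planner: `choose_width`, `assign_minor_fragments`, the `cost_per_task` parameter and the round-robin placement are all assumptions made for the example, and load and topology are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentId:
    major: int  # portion of the dataflow
    minor: int  # one parallel instance of that portion


def choose_width(estimated_cost, cost_per_task, max_width):
    """Pick enough tasks to cover the estimated cost, capped by max_width.

    cost_per_task is a hypothetical tuning knob, not a Drill setting.
    """
    wanted = -(-estimated_cost // cost_per_task)  # ceiling division
    return max(1, min(wanted, max_width))


def assign_minor_fragments(major, width, affinity):
    """Place each minor fragment on an endpoint, highest affinity first.

    affinity maps endpoint name -> affinity score; placement is round-robin
    over the ranked endpoints (an assumption; load and topology are ignored).
    """
    ranked = sorted(affinity, key=lambda e: (-affinity[e], e))
    return {FragmentId(major, i): ranked[i % len(ranked)] for i in range(width)}
```

For instance, `choose_width(1000, 300, 8)` asks for four tasks, while a maximum width of 2 would cap the same plan at two.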
  • 9. Execution Plan, cont'd
      Each execution plan has:
      - One root fragment (runs on driving node)
      - Leaf fragments (first tasks to run)
      - Intermediate fragments (won't start until they receive data from their children)
      - In the case where the query output is routed to storage, the root operator will often receive metadata to present rather than data
      (Diagram: fragment tree of Root, Intermediate and Leaf fragments.)
  • 10. Example Fragments
      Leaf Fragment 1
      {
        pop : "hash-partition-sender",
        @id : 1,
        child : {
          pop : "mock-scan",
          @id : 2,
          url : "http://apache.org",
          entries : [ { id : 1, records : 4000 } ]
        },
        destinations : [ "Cglsb2NhbGhvc3QY0gk=" ]
      }
      Leaf Fragment 2
      {
        pop : "hash-partition-sender",
        @id : 1,
        child : {
          pop : "mock-scan",
          @id : 2,
          url : "http://apache.org",
          entries : [ { id : 1, records : 4000 }, { id : 2, records : 4000 } ]
        },
        destinations : [ "Cglsb2NhbGhvc3QY0gk=" ]
      }
      Root Fragment
      {
        pop : "screen",
        @id : 1,
        child : {
          pop : "random-receiver",
          @id : 2,
          providingEndpoints : [ "Cglsb2NhbGhvc3QY0gk=" ]
        }
      }
      Intermediate Fragment
      {
        pop : "single-sender",
        @id : 1,
        child : {
          pop : "mock-store",
          @id : 2,
          child : {
            pop : "filter",
            @id : 3,
            child : {
              pop : "random-receiver",
              @id : 4,
              providingEndpoints : [ "Cglsb2NhbGhvc3QYqRI=", "Cglsb2NhbGhvc3QY0gk=" ]
            },
            expr : " ('b') > (5) "
          }
        },
        destinations : [ "Cglsb2NhbGhvc3QYqRI=" ]
      }
  • 12. SQL Parser
      - Leverage Optiq
      - Add support for "any" type
      - Add support for nested and repeated[] references
      - Add transformation rules to convert from SQL AST to Logical plan syntax
  • 13. Optimizer
      - Convert Logical to Physical
      - Very much TBD
      - Likely leverage Optiq
      - Hardest problem in system, especially given lack of statistics
      - Probably not parallel
  • 14. Execution Planner
      - Each scan operator provides a maximum width of parallelization based on the number of read entries (similar to splits)
      - The parallelization width is decided from a simple disk cost based on size
      - Affinity orders the location of fragment assignment
      - Storage, Scan and Exchange operators are informed of the actual endpoint assignments so that they can re-decide their entries (splits)
  • 16. Execution Engine
      - Single JVM per Drillbit
      - Small heap space for object management
      - Small set of network event threads to manage socket operations
      - Callbacks for each message sent
      - Messages contain header and collection of native byte buffers
      - Designed to minimize copies and ser/de costs
      - Query setup and fragment runners are managed via processing queues & thread pools
  • 17. Data
      - Records are broken into batches
      - Batches contain a schema and a collection of fields
      - Each field has a particular type (e.g. smallint)
      - Fields (a.k.a. columns) are stored in ValueVectors
      - ValueVectors are façades to byte buffers
      - The in-memory structure of each ValueVector is well defined and language agnostic
      - ValueVectors are defined based on the width and nature of the underlying data
          – RepeatMap, Fixed1, Fixed2, Fixed4, Fixed8, Fixed12, Fixed16, Bit, FixedLen, VarLen1, VarLen2, VarLen4
      - There are three sub value vector types
          – Optional (nullable), required or repeated
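To make the "façade to byte buffers" idea concrete, here is a small sketch of an optional (nullable) vector of 4-byte values. The class name `NullableFixed4Vector`, the little-endian encoding and the one-bit-per-value validity bitmap are assumptions chosen for illustration; the slides do not specify Drill's actual layout.

```python
import struct


class NullableFixed4Vector:
    """Hypothetical nullable vector of 4-byte signed ints over flat buffers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray(4 * capacity)              # packed values
        self.validity = bytearray((capacity + 7) // 8)   # one bit per value

    def set(self, index, value):
        byte, bit = divmod(index, 8)
        if value is None:
            # Clear the validity bit; the value bytes are left untouched.
            self.validity[byte] &= ~(1 << bit) & 0xFF
        else:
            struct.pack_into("<i", self.data, 4 * index, value)
            self.validity[byte] |= 1 << bit

    def get(self, index):
        byte, bit = divmod(index, 8)
        if not (self.validity[byte] >> bit) & 1:
            return None
        return struct.unpack_from("<i", self.data, 4 * index)[0]
```

Because the state is nothing but two flat buffers, the same bytes could be read by code in another language or sent over the wire without per-record serialization, which is the property the slide emphasizes.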
  • 18. Execution Paradigm
      - We will have a large number of operators
      - Each operator works on a batch of records at a time
      - A loose goal is that batches are roughly a single core's L2 cache in size
      - Each batch of records carries a schema
      - An operator is responsible for reconfiguring itself if a new schema arrives (or rejecting the record batch if the schema is disallowed)
      - Most operators are the combination of a set of static operations along with the evaluation of query-specific expressions
      - Runtime-compiled operators are the combination of a pre-compiled template and a runtime-compiled set of expressions
      - Exchange operators are converted into Senders and Receivers when the execution plan is materialized
      - Each operator must support consumption of a SelectionVector, a partial materialization of a filter
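The SelectionVector point can be illustrated with a short sketch: a filter records which row indices passed instead of copying the surviving rows, and a downstream operator reads the batch through those indices. The function names and the use of plain Python lists are assumptions for the example, not Drill's API.

```python
def filter_batch(column, predicate):
    """Return a selection vector: indices of the rows that pass the predicate."""
    return [i for i, value in enumerate(column) if predicate(value)]


def sum_selected(column, selection=None):
    """Sum a column, honoring an optional selection vector.

    With no selection vector, every row in the batch is consumed.
    """
    indices = range(len(column)) if selection is None else selection
    return sum(column[i] for i in indices)


sales = [3, 8, 5, 12]
selection = filter_batch(sales, lambda v: v > 5)
total = sum_selected(sales, selection)
```

The original batch is never rewritten; only the small index list travels between the two operators.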
  • 19. Storage Engine
      - Input and output is done through storage engines
          – (and the screen specialized storage operator)
      - A storage engine is responsible for providing metadata and statistics about the data
      - A storage engine exposes a set of optimizer (plan rewrite) rules to support things such as predicate pushdown
      - A storage engine provides one or more storage engine specific scan operators that can support affinity exposure and task splitting
          – These are generated based on a StorageEngine specific configuration
      - The primary interfaces are RecordReader and RecordWriter
      - RecordReaders are responsible for
          – Converting stored data into Drill canonical ValueVector format a batch at a time
          – Providing schema for each record batch
      - Our initial storage engines will be for DFS and HBase
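The two RecordReader responsibilities (columnar conversion a batch at a time, plus a schema per batch) might look like the sketch below. `CsvRecordReader`, its `next_batch` method and the all-"varchar" schema are hypothetical; they stand in for whatever a real storage engine would provide.

```python
import csv
import io


class CsvRecordReader:
    """Hypothetical reader that turns CSV text into columnar batches."""

    def __init__(self, text, batch_size):
        self.rows = csv.reader(io.StringIO(text))
        self.header = next(self.rows)   # first line names the columns
        self.batch_size = batch_size

    def next_batch(self):
        """Return (schema, columns) for the next batch, or None when done."""
        columns = {name: [] for name in self.header}
        count = 0
        for row in self.rows:
            for name, value in zip(self.header, row):
                columns[name].append(value)
            count += 1
            if count == self.batch_size:
                break
        if count == 0:
            return None
        # Every column is reported as varchar; real readers would infer types.
        schema = [(name, "varchar") for name in self.header]
        return schema, columns
```

Each call returns at most `batch_size` rows laid out column by column, so downstream operators never see row-oriented data.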
  • 20. Messages
      - Foreman drives query
      - Foreman saves intermediate fragments to distributed cache
      - Foreman sends leaf fragments directly to execution nodes
      - Executing fragments push record batches to their fragment's destination nodes
      - When a destination node receives the first fragment for a new query, it retrieves its appropriate fragment from the distributed cache, sets up the required framework, then waits until the start requirement is met:
          – A fragment is evaluated for the number of different sending streams that are required before the query can actually be scheduled, based on each exchange's "supportsOutOfOrder" capability
          – When the IncomingBatchHandler recognizes that its start criteria have been reached, it begins
          – In the meantime, the destination node will buffer (potentially to disk)
      - Fragment status messages are pushed back to the foreman directly from individual nodes
      - A single failure status causes the foreman to cancel all other parts of the query
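The buffering behaviour on the destination node can be sketched as a handler that holds batches until enough distinct sending streams have shown up. `IncomingBatchBuffer` and its `required_streams` count are invented for this illustration; the slides only name IncomingBatchHandler and do not describe its interface, and the sketch buffers in memory rather than spilling to disk.

```python
class IncomingBatchBuffer:
    """Hypothetical stand-in for the start-criteria logic described above."""

    def __init__(self, required_streams):
        self.required_streams = required_streams  # assumed start criterion
        self.seen_streams = set()
        self.buffered = []
        self.started = False

    def on_batch(self, stream_id, batch):
        """Accept a batch; return the batches that may now be processed."""
        self.seen_streams.add(stream_id)
        self.buffered.append(batch)
        if len(self.seen_streams) >= self.required_streams:
            self.started = True
        if not self.started:
            return []  # still waiting: keep buffering
        released, self.buffered = self.buffered, []
        return released
```

Until the criterion is met nothing is released; afterwards every call hands back whatever has accumulated, in arrival order.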
  • 21. Scheduling
      - Plan is to leverage the concepts inside Sparrow
      - Reality is that receiver-side buffering and pre-assigned execution locations mean that this is very much up in the air right now
  • 22. Operation/Configuration
      - Drillbit is a single JVM
      - Extension is done by building to an API and generating a jar file that includes a drill-module.conf file with information about where that module needs to be inserted
      - All configuration is done via a JSON-like configuration metaphor that supports complex types
      - Node discovery/service registry is done through Zookeeper
      - Metrics are collected utilizing the Yammer metrics module
  • 23. User Interfaces
      - Drill provides DrillClient
          – Encapsulates endpoint discovery
          – Supports logical and physical plan submission, query cancellation, query status
          – Supports streaming return results
      - Drill will provide a JDBC driver which converts JDBC into DrillClient communication
          – Currently SQL parsing is done client side
              * Artifact of the current state of Optiq
              * Need to slim up the JDBC driver and push stuff remotely
      - In time, will add REST proxy for DrillClient
  • 24. Technologies
      - Jackson for JSON SerDe for metadata
      - Typesafe HOCON for configuration and module management
      - Netty4 as core RPC engine, protobuf for communication
      - Vanilla Java, LArray and Netty ByteBuf for off-heap large data structure help
      - Hazelcast for distributed cache
      - Curator on top of Zookeeper for service registry
      - Optiq for SQL parsing and cost optimization
      - Parquet (probably) as 'native' format
      - Janino for expression compilation
      - ASM for bytecode manipulation
      - Yammer Metrics for metrics
      - Guava extensively
      - Carrot HPPC for primitive collections