SlideShare ist ein Scribd-Unternehmen logo
1 von 63
Downloaden Sie, um offline zu lesen
© MapR Technologies, confidential
© 2014 MapR Technologies
© MapR Technologies, confidential
Top Ranked 500+ CustomersCloud Leaders
MapR Enterprise Hadoop
© MapR Technologies, confidential
Hadoop Distributions
Open Source Open Source
Distribution A Distribution C
MANAGEMENT
Open Source
MANAGEMENT
ARCHITECTURAL INNOVATIONS
© MapR Technologies, confidential
Enterprise Hadoop from MapR
Management
MapR Data Platform
APACHE HADOOP ECOSYSTEM
4
Storm
Shark
Accumulo
Sentry
Spark
Impala
HBase
MapReduce
Hue
Solr
YARN
Flume
Cascading
Pig
Sqoop
Hive/
Stinger/
Tez
Whirr
Oozie
Mahout
Zookeeper
Enterprise-grade Inter-operability Multi-tenancy Security Operational
Drill
Apache
Drill
© MapR Technologies, confidential
© MapR Technologies, confidential
Hadoop an augmentation for EDW—Why?
© MapR Technologies, confidential
© MapR Technologies, confidential
© MapR Technologies, confidential
Consolidating multiple schemas is very hard.
Why? Since schema-on-write, retrieval is pre-determined.
© MapR Technologies, confidential
Silos make analysis very difficult
• How do I identify a
unique {customer,
trade} across data sets?
• How can I guarantee
the lack of anomalous
behavior  if  I  can’t  see  
all data?
© 2014 MapR Technologies 11
Hard  to  know  what’s  of  value  a priori
© 2014 MapR Technologies 12
Why Hadoop
© MapR Technologies, confidential
Rethink SQL for Big Data
Preserve
•ANSI SQL
• Familiar and ubiquitous
•Performance
• Interactive nature crucial for BI/Analytics
•One technology
• Painful to manage different technologies
•Enterprise ready
• System-of-record, HA, DR, Security, Multi-
tenancy,  …
Invent
•Flexible data-model
• Allow schemas to evolve rapidly
• Support semi-structured data types
•Agility
• Self-service possible when developer and DBA is
same
•Scalability
• In all dimensions: data, speed, schemas, processes,
management
© MapR Technologies, confidential
SQL is here to stay
© MapR Technologies, confidential
Hadoop is here to stay
© MapR Technologies, confidential
YOU  CAN’T  HANDLE  REAL  SQL
© MapR Technologies, confidential
SQL
select * from A
where exists (
select 1 from B where B.b < 100 );
• Did you know Apache HIVE cannot compute it?
– eg, Hive, Impala, Spark/Shark
© MapR Technologies, confidential
Self-described Data
select cf.month, cf.year
from hbase.table1;
• Did you know normal SQL cannot handle the above?
• Nor can HIVE and its variants like Impala, Shark?
• Because  there’s  no  meta-store definition available
© MapR Technologies, confidential
Self-Describing Data Ubiquitous
Centralized schema
- Static
- Managed by the DBAs
- In a centralized repository
Long, meticulous data preparation process (ETL,
create/alter schema, etc.)
– can take 6-18 months
Self-describing, or schema-less, data
- Dynamic/evolving
- Managed by the applications
- Embedded in the data
Less schema, more suitable for data that has
higher volume, variety and velocity
Apache Drill
© MapR Technologies, confidential
A Quick Tour through Apache Drill
© MapR Technologies, confidential
Data Source is in the Query
select timestamp, message
from dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet`
where errorLevel > 2
This is a cluster in Apache Drill
- DFS
- HBase
- Hive meta-store
A work-space
- Typically a sub-
directory
- HIVE database
A table
- pathnames
- Hbase table
- Hive table
© MapR Technologies, confidential
Combine data sources on the fly
• JSON
• CSV
• ORC (ie, all Hive types)
• Parquet
• HBase tables
• …  can  combine  them
Select USERS.name, PROF.emails.work
from
dfs.logs.`/data/logs` LOGS,
dfs.users.`/profiles.json` USERS,
where
LOGS.uid = USERS.uid and
errorLevel > 5
order by count(*);
© MapR Technologies, confidential
Can be an entire directory tree
// On a file
select errorLevel, count(*)
from dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet`
group by errorLevel;
// On the entire data collection: all years, all months
select errorLevel, count(*)
from dfs.logs.`/AppServerLogs`
group by errorLevel;
© MapR Technologies, confidential
Querying JSON
{ name: classic
fillings: [
{ name: sugar cal: 400 }]}
{ name: choco
fillings: [
{ name: sugar cal: 400 }
{ name: chocolate cal: 300 }]}
{ name: bostoncreme
fillings: [
{ name: sugar cal: 400 }
{ name: cream cal: 1000 }
{ name: jelly cal: 600 }]}
donuts.json
© MapR Technologies, confidential
Cursors inside Drill
DrillClient drill=newDrillClient().connect(…);
ResultReader r= drill.runSqlQuery("select*from`donuts.json`");
while(r.next()){
StringdonutName=r.reader(“name").readString();
ListReaderfillings=r.reader("fillings");
while(fillings.next()){
intcalories=fillings.reader("cal").readInteger();
if(calories>400)
print(donutName, calories,fillings.reader("name").readString());
}
}
{ name: classic
fillings: [
{ name: sugar cal: 400 }]}
{ name: choco
fillings: [
{ name: sugar cal: 400 }
{ name: chocolate cal: 300 }]}
{ name: bostoncreme
fillings: [
{ name: sugar cal: 400 }
{ name: cream cal: 1000 }
{ name: jelly cal: 600 }]}
© MapR Technologies, confidential
Direct queries on nested data
// Flattening maps in JSON, parquet and other
nested records
select name, flatten(fillings) as f
from dfs.users.`/donuts.json`
where f.cal < 300;
// lists the fillings < 300 calories
{ name: classic
fillings: [
{ name: sugar cal: 400 }]}
{ name: choco
fillings: [
{ name: sugar cal: 400 }
{ name: chocolate cal: 300 }]}
{ name: bostoncreme
fillings: [
{ name: sugar cal: 400 }
{ name: cream cal: 1000 }
{ name: jelly cal: 600 }]}
© MapR Technologies, confidential
Queries on embedded data
// embedded JSON value inside column donut-json inside column-family cf1
of an hbase table donuts
select d.name, count( d.fillings),
from (
select convert_from( cf1.donut-json, json) as d
from hbase.user.`donuts` );
© MapR Technologies, confidential
Queries inside JSON records
// Each JSON record itself can be a whole database
// example: get all donuts with at least 1 filling with > 300 calories
select d.name, count( d.fillings),
max(d.fillings.cal) within record as mincal
from ( select convert_from( cf1.donut-json, json) as d
from hbase.user.`donuts` )
where mincal > 300;
© MapR Technologies, confidential
a
• Schema can change over course of query
• Operators are able to reconfigure themselves on schema
change events
– Minimize flexibility overhead
– Support more advanced execution optimization based on actual data
characteristics
© MapR Technologies, confidential
De-centralized metadata
// count the number of tweets per customer, where the customers are in Hive, and
their tweets are in HBase. Note that the hbase data has no meta-data information
select c.customerName, hb.tweets.count
from hive.CustomersDB.`Customers` c
join hbase.user.`SocialData` hb
on c.customerId = convert_from( hb.rowkey, UTF-8);
© MapR Technologies, confidential
Underneath the Covers
© MapR Technologies, confidential
Basic Process
Zookeeper
DFS/HBase DFS/HBase DFS/HBase
Drillbit
Distributed Cache
Drillbit
Distributed Cache
Drillbit
Distributed Cache
Query 1. Query comes to any Drillbit (JDBC, ODBC, CLI, protobuf)
2. Drillbit generates execution plan based on query optimization & locality
3. Fragments are farmed to individual nodes
4. Result is returned to driving node
c c c
© MapR Technologies, confidential
Stages of Query Planning
Parser
Logical
Planner
Physical
Planner
Query
Foreman
Plan
fragments
sent to drill
bits
SQL
Query
Heuristic and
cost based
Cost based
© MapR Technologies, confidential
Query Execution
SQL Parser
Optimizer
Scheduler
Pig Parser
PhysicalPlan
Mongo
Cassandra
HiveQL Parser
RPC Endpoint
Distributed Cache
StorageEngineInterface
OperatorsOperators
Foreman
LogicalPlan
HDFS
HBase
JDBC Endpoint ODBC Endpoint
© MapR Technologies, confidential
A  Query  engine  that  is…
• Columnar/Vectorized
• Optimistic/pipelined
• Runtime compilation
• Late binding
• Extensible
© MapR Technologies, confidential
Columnar representation
A B C D E
A
B
C
D
On disk
E
© MapR Technologies, confidential
Columnar Encoding
• Values in a col. stored next to one-another
– Better compression
– Range-map: save min-max, can skip if not
present
• Only retrieve columns participating in
query
• Aggregations can be performed without
decoding
A
B
C
D
On disk
E
© MapR Technologies, confidential
Run-length-encoding & Sum
• Dataset encoded as <val> <run-length>:
– 2, 4    (4  2’s)
– 8,  10    (10  8’s)
• Goal: sum all the records
• Normally:
– Decompress: 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8
– Add: 2 + 2 + 2 + 2 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8
• Optimized work: 2 * 4 + 8 * 10
– Less memory, less operations
© MapR Technologies, confidential
Bit-packed Dictionary Sort
• Dataset encoded with a dictionary and bit-positions:
– Dictionary: [Rupert, Bill, Larry] {0, 1, 2}
– Values: [1,0,1,2,1,2,1,0]
• Normal work
– Decompress & store: Bill, Rupert, Bill, Larry, Bill, Larry, Bill, Rupert
– Sort: ~24 comparisons of variable width strings
• Optimized work
– Sort dictionary: {Bill: 1, Larry: 2, Rupert: 0}
– Sort bit-packed values
– Work: max 3 string comparisons, ~24 comparisons of fixed-width dictionary bits
© MapR Technologies, confidential
Drill 4-value semantics
• SQL’s  3-valued semantics
– True
– False
– Unknown
• Drill adds fourth
– Repeated
© MapR Technologies, confidential
Batches of Values
• Value vectors
– List of values, with same schema
– With the 4-value semantics for each value
• Shipped around in batches
– max 256k bytes in a batch
– max 64K rows in a batch
• RPC designed for multiple replies to a request
© MapR Technologies, confidential
Fixed Value Vectors
© MapR Technologies, confidential
Nullable Values
© MapR Technologies, confidential
Repeated Values
© MapR Technologies, confidential
Variable Width
© MapR Technologies, confidential
Repeated Map
© MapR Technologies, confidential
Vectorization
• Drill operates on more than one record at a time
– Word-sized manipulations
– SIMD instructions
• GCC, LLVM and JVM all do various optimizations automatically
– Manually code algorithms
• Logical Vectorization
– Bitmaps allow lightning fast null-checks
– Avoid branching to speed CPU pipeline
© MapR Technologies, confidential
Runtime Compilation is Faster
• JIT is smart, but
more gains with
runtime
compilation
• Janino: Java-based
Java compiler
From http://bit.ly/16Xk32x
© MapR Technologies, confidential
Drill compiler
Loaded class
Merge byte-
code of the two
classes
Janino compiles
runtime
byte-code
CodeModel
generates code
Precompiled
byte-code
templates
© MapR Technologies, confidential
Optimistic
0
20
40
60
80
100
120
140
160
Speed vs. check-pointing
No need to checkpoint
Checkpoint frequentlyApache Drill
© MapR Technologies, confidential
Optimistic Execution
• Recovery code trivial
– Running  instances  discard  the  failed  query’s  intermediate  state
• Pipelining possible
– Send results as soon as batch is large enough
– Requires barrier-less decomposition of query
© MapR Technologies, confidential
Pipelining
• Record batches are pipelined
between nodes
– ~256kB usually
• Unit of work for Drill
– Operators works on a batch
• Operator reconfiguration happens
at batch boundaries
DrillBit
DrillBit DrillBit
© MapR Technologies, confidential
Pipelining Record Batches
SQL Parser
Optimizer
Scheduler
Pig Parser
PhysicalPlan
Mongo
Cassandra
HiveQL Parser
RPC Endpoint
Distributed Cache
StorageEngineInterface
OperatorsOperators
Foreman
LogicalPlan
HDFS
HBase
JDBC Endpoint ODBC Endpoint
© MapR Technologies, confidential
DISK
Pipelining
• Random access: sort without copy or
restructuring
• Avoids serialization/deserialization
• Off-heap (no GC woes when lots of memory)
• Full specification + off-heap + batch
– Enables C/C++ operators (fast!)
• Read/write to disk
– when data larger than memory
Drill Bit
Memory
overflow
uses disk
© MapR Technologies, confidential
Cost-based Optimization
• Using Optiq, an extensible framework
• Pluggable rules, and cost model
• Rules for distributed plan generation
• Insert Exchange operator into physical plan
• Optiq enhanced to explore parallel query plans
• Pluggable cost model
– CPU, IO, memory, network cost (data locality)
– Storage engine features (HDFS vs HIVE vs HBase)
Query
Optimizer
Pluggable
rules
Pluggable
cost model
© MapR Technologies, confidential
Distributed Plan Cost
• Operators have distribution property
• Hash,  Broadcast,  Singleton,  …    
• Exchange operator to enforce distributions
• Hash: HashToRandomExchange
• Broadcast: BroadcastExchange
• Singleton: UnionExchange, SingleMergeExchange
• Enumerate all, use cost to pick best
• Merge Join vs Hash Join
• Partition-based join vs Broadcast-based join
• Streaming Aggregation vs Hash Aggregation
• Aggregation in one phase or two phases
• partial local aggregation followed by final aggregation
HashToRandomExchange
Sort
Streaming-Aggregation
Data Data Data
© MapR Technologies, confidential
Apache Drill
FLEXIBLE SCHEMA
MANAGEMENT
FRICTIONLESS ANALYTICS
ON NESTED DATA
PLUG AND PLAY
WITH EXISTING
Analyze data, self-
described or central
metadata
Reuse investments in
SQL/BI tools
and Apache Hive
Analyze semi structured
& nested data
…  and  with  an  architecture  built  ground  up  for  Low Latency queries at Scale
© MapR Technologies, confidential
Drill 1.0 Hive 0.13 w/ Tez Impala 1.x Shark 0.9
Latency Low Medium Low Medium
Files Yes (all Hive file formats,
plus    JSON,  Text,  …)
Yes (all Hive file formats) Yes (Parquet, Sequence,
…)
Yes (all Hive file
formats)
HBase/M7 Yes Yes, perf issues Yes, with issues Yes, perf issues
Schema Hive or schema-less Hive Hive Hive
SQL support ANSI SQL HiveQL HiveQL (subset) HiveQL
Client support ODBC/JDBC ODBC/JDBC ODBC/JDBC ODBC/JDBC
Hive compat High High Low High
Large datasets Yes Yes Limited Limited
Nested data Yes Limited No Limited
Concurrency High Limited Medium Limited
Interactive SQL-on-Hadoop options
© MapR Technologies, confidential
59
Management
MapR Data Platform
APACHE HADOOP & OSS ECOSYSTEM
Spark Hue
HP
Vertica
SharkImpalaDrill
Hive/
Stinger/
Tez
Storm SentrySolrMahoutCascadingZookeeperFlume
Oozie HBaseMapReduceYARNPigWhirrSqoop
SQL ACCESS
SQL-on-Hadoop
© MapR Technologies, confidential
Drill at MapR
• World-class SQL team, ~20 people
• 150+ years combined experience building commercial
databases
• Oracle, DB2, ParAccel, Teradata, SQLServer, Vertica
• Fixed some of the toughest problems in Apache Hive
© MapR Technologies, confidential
Active Drill Community
• Large community, growing rapidly
– 35-40 contributors, 16 committers
– Microsoft, Linked-in, Oracle, Facebook, Visa, Lucidworks,
Concurrent, many universities
• In 2014
– over 20 meet-ups, many more coming soon
– 2 hackathons, with 40+ participants
• Encourage you to join, learn, contribute  and  have  fun  …
© MapR Technologies, confidential
Apache Drill Resources
• Getting started with Drill is easy
– just download tarball and start running SQL queries on local files
• Mailing lists
– drill-user@incubator.apache.org
– drill-dev@incubator.apache.org
• Docs: https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Wiki
• Fork us on GitHub: http://github.com/apache/incubator-drill/
• Create a JIRA: https://issues.apache.org/jira/browse/DRILL
© MapR Technologies, confidential
Thank you!
M. C. Srivas
srivas@mapr.com
Did  I  mention  we  are  hiring…  

Weitere ähnliche Inhalte

Was ist angesagt?

Transport Layer Security (TLS)
Transport Layer Security (TLS)Transport Layer Security (TLS)
Transport Layer Security (TLS)Arun Shukla
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUponHBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUponCloudera, Inc.
 
Apache HBase Improvements and Practices at Xiaomi
Apache HBase Improvements and Practices at XiaomiApache HBase Improvements and Practices at Xiaomi
Apache HBase Improvements and Practices at XiaomiHBaseCon
 
Lightweight ETL pipelines with mara (PyData Berlin September Meetup)
Lightweight ETL pipelines with mara (PyData Berlin September Meetup)Lightweight ETL pipelines with mara (PyData Berlin September Meetup)
Lightweight ETL pipelines with mara (PyData Berlin September Meetup)Martin Loetzsch
 
Redis and its many use cases
Redis and its many use casesRedis and its many use cases
Redis and its many use casesChristian Joudrey
 
Introduction to SQL Server Security
Introduction to SQL Server SecurityIntroduction to SQL Server Security
Introduction to SQL Server SecurityJason Strate
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesDatabricks
 
Information and data security cryptographic hash functions
Information and data security cryptographic hash functionsInformation and data security cryptographic hash functions
Information and data security cryptographic hash functionsMazin Alwaaly
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive TutorialSandeep Patil
 
Network security Lab manual
Network security Lab manual Network security Lab manual
Network security Lab manual Vivek Kumar Sinha
 
Inside the InfluxDB storage engine
Inside the InfluxDB storage engineInside the InfluxDB storage engine
Inside the InfluxDB storage engineInfluxData
 
Troubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTroubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTanel Poder
 
Database Systems Security
Database Systems SecurityDatabase Systems Security
Database Systems Securityamiable_indian
 
Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15Bobby Curtis
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBMydbops
 

Was ist angesagt? (20)

Transport Layer Security (TLS)
Transport Layer Security (TLS)Transport Layer Security (TLS)
Transport Layer Security (TLS)
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUponHBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
 
Apache HBase Improvements and Practices at Xiaomi
Apache HBase Improvements and Practices at XiaomiApache HBase Improvements and Practices at Xiaomi
Apache HBase Improvements and Practices at Xiaomi
 
Apache hive
Apache hiveApache hive
Apache hive
 
Lightweight ETL pipelines with mara (PyData Berlin September Meetup)
Lightweight ETL pipelines with mara (PyData Berlin September Meetup)Lightweight ETL pipelines with mara (PyData Berlin September Meetup)
Lightweight ETL pipelines with mara (PyData Berlin September Meetup)
 
Redis and its many use cases
Redis and its many use casesRedis and its many use cases
Redis and its many use cases
 
Redis and it's data types
Redis and it's data typesRedis and it's data types
Redis and it's data types
 
Introduction to SQL Server Security
Introduction to SQL Server SecurityIntroduction to SQL Server Security
Introduction to SQL Server Security
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
 
Information and data security cryptographic hash functions
Information and data security cryptographic hash functionsInformation and data security cryptographic hash functions
Information and data security cryptographic hash functions
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive Tutorial
 
Network security Lab manual
Network security Lab manual Network security Lab manual
Network security Lab manual
 
Inside the InfluxDB storage engine
Inside the InfluxDB storage engineInside the InfluxDB storage engine
Inside the InfluxDB storage engine
 
Troubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTroubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contention
 
Database Systems Security
Database Systems SecurityDatabase Systems Security
Database Systems Security
 
Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15
 
Redis introduction
Redis introductionRedis introduction
Redis introduction
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDB
 

Ähnlich wie Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C. Srivas, Co-founder and CTO at MapR

Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, Howmcsrivas
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillTomer Shiran
 
Self-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache DrillSelf-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache DrillMapR Technologies
 
Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionMapR Technologies
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeMapR Technologies
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeDataWorks Summit
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoDataWorks Summit
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...Amazon Web Services
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drilltshiran
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopAvkash Chauhan
 
Data Warehouse Offload
Data Warehouse OffloadData Warehouse Offload
Data Warehouse OffloadJohn Berns
 
Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop BigDataEverywhere
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 

Ähnlich wie Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C. Srivas, Co-founder and CTO at MapR (20)

Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, How
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Self-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache DrillSelf-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache Drill
 
Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop Solution
 
Apache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real TimeApache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real Time
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R Workshop
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Data Warehouse Offload
Data Warehouse OffloadData Warehouse Offload
Data Warehouse Offload
 
Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 

Mehr von The Hive

"Responsible AI", by Charlie Muirhead
"Responsible AI", by Charlie Muirhead"Responsible AI", by Charlie Muirhead
"Responsible AI", by Charlie MuirheadThe Hive
 
Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...
Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...
Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...The Hive
 
Digital Transformation; Digital Twins for Delivering Business Value in IIoT
Digital Transformation; Digital Twins for Delivering Business Value in IIoTDigital Transformation; Digital Twins for Delivering Business Value in IIoT
Digital Transformation; Digital Twins for Delivering Business Value in IIoTThe Hive
 
Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18
Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18
Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18The Hive
 
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...The Hive
 
Data Science in the Enterprise
Data Science in the EnterpriseData Science in the Enterprise
Data Science in the EnterpriseThe Hive
 
AI in Software for Augmenting Intelligence Across the Enterprise
AI in Software for Augmenting Intelligence Across the EnterpriseAI in Software for Augmenting Intelligence Across the Enterprise
AI in Software for Augmenting Intelligence Across the EnterpriseThe Hive
 
“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...
“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...
“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...The Hive
 
"The Future of Manufacturing" by Sujeet Chand, SVP&CTO, Rockwell Automation
"The Future of Manufacturing" by Sujeet Chand, SVP&CTO, Rockwell Automation"The Future of Manufacturing" by Sujeet Chand, SVP&CTO, Rockwell Automation
"The Future of Manufacturing" by Sujeet Chand, SVP&CTO, Rockwell AutomationThe Hive
 
Social Impact & Ethics of AI by Steve Omohundro
Social Impact & Ethics of AI by Steve OmohundroSocial Impact & Ethics of AI by Steve Omohundro
Social Impact & Ethics of AI by Steve OmohundroThe Hive
 
The Hive Think Tank: AI in The Enterprise by Venkat Srinivasan
The Hive Think Tank: AI in The Enterprise by Venkat SrinivasanThe Hive Think Tank: AI in The Enterprise by Venkat Srinivasan
The Hive Think Tank: AI in The Enterprise by Venkat SrinivasanThe Hive
 
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...The Hive
 
The Hive Think Tank: The Future Of Customer Support - AI Driven Automation
The Hive Think Tank: The Future Of Customer Support - AI Driven AutomationThe Hive Think Tank: The Future Of Customer Support - AI Driven Automation
The Hive Think Tank: The Future Of Customer Support - AI Driven AutomationThe Hive
 
The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...
The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...
The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...The Hive
 
The Hive Think Tank: The Content Trap - Strategist's Guide to Digital Change
The Hive Think Tank: The Content Trap - Strategist's Guide to Digital ChangeThe Hive Think Tank: The Content Trap - Strategist's Guide to Digital Change
The Hive Think Tank: The Content Trap - Strategist's Guide to Digital ChangeThe Hive
 
Deep Visual Understanding from Deep Learning by Prof. Jitendra Malik
Deep Visual Understanding from Deep Learning by Prof. Jitendra MalikDeep Visual Understanding from Deep Learning by Prof. Jitendra Malik
Deep Visual Understanding from Deep Learning by Prof. Jitendra MalikThe Hive
 
The Hive Think Tank: Heron at Twitter
The Hive Think Tank: Heron at TwitterThe Hive Think Tank: Heron at Twitter
The Hive Think Tank: Heron at TwitterThe Hive
 
The Hive Think Tank: Unpacking AI for Healthcare
The Hive Think Tank: Unpacking AI for Healthcare The Hive Think Tank: Unpacking AI for Healthcare
The Hive Think Tank: Unpacking AI for Healthcare The Hive
 
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...The Hive
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive
 

Mehr von The Hive (20)

"Responsible AI", by Charlie Muirhead
"Responsible AI", by Charlie Muirhead"Responsible AI", by Charlie Muirhead
"Responsible AI", by Charlie Muirhead
 
Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...
Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...
Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...
 
Digital Transformation; Digital Twins for Delivering Business Value in IIoT
Digital Transformation; Digital Twins for Delivering Business Value in IIoTDigital Transformation; Digital Twins for Delivering Business Value in IIoT
Digital Transformation; Digital Twins for Delivering Business Value in IIoT
 
Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18
Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18
Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18
 
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
 
Data Science in the Enterprise
Data Science in the EnterpriseData Science in the Enterprise
Data Science in the Enterprise
 
AI in Software for Augmenting Intelligence Across the Enterprise
AI in Software for Augmenting Intelligence Across the EnterpriseAI in Software for Augmenting Intelligence Across the Enterprise
AI in Software for Augmenting Intelligence Across the Enterprise
 
“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...
“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...
“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...
 
"The Future of Manufacturing" by Sujeet Chand, SVP&CTO, Rockwell Automation
"The Future of Manufacturing" by Sujeet Chand, SVP&CTO, Rockwell Automation"The Future of Manufacturing" by Sujeet Chand, SVP&CTO, Rockwell Automation
"The Future of Manufacturing" by Sujeet Chand, SVP&CTO, Rockwell Automation
 
Social Impact & Ethics of AI by Steve Omohundro
Social Impact & Ethics of AI by Steve OmohundroSocial Impact & Ethics of AI by Steve Omohundro
Social Impact & Ethics of AI by Steve Omohundro
 
The Hive Think Tank: AI in The Enterprise by Venkat Srinivasan
The Hive Think Tank: AI in The Enterprise by Venkat SrinivasanThe Hive Think Tank: AI in The Enterprise by Venkat Srinivasan
The Hive Think Tank: AI in The Enterprise by Venkat Srinivasan
 
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
 
The Hive Think Tank: The Future Of Customer Support - AI Driven Automation
The Hive Think Tank: The Future Of Customer Support - AI Driven AutomationThe Hive Think Tank: The Future Of Customer Support - AI Driven Automation
The Hive Think Tank: The Future Of Customer Support - AI Driven Automation
 
The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...
The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...
The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...
 
The Hive Think Tank: The Content Trap - Strategist's Guide to Digital Change
The Hive Think Tank: The Content Trap - Strategist's Guide to Digital ChangeThe Hive Think Tank: The Content Trap - Strategist's Guide to Digital Change
The Hive Think Tank: The Content Trap - Strategist's Guide to Digital Change
 
Deep Visual Understanding from Deep Learning by Prof. Jitendra Malik
Deep Visual Understanding from Deep Learning by Prof. Jitendra MalikDeep Visual Understanding from Deep Learning by Prof. Jitendra Malik
Deep Visual Understanding from Deep Learning by Prof. Jitendra Malik
 
The Hive Think Tank: Heron at Twitter
The Hive Think Tank: Heron at TwitterThe Hive Think Tank: Heron at Twitter
The Hive Think Tank: Heron at Twitter
 
The Hive Think Tank: Unpacking AI for Healthcare
The Hive Think Tank: Unpacking AI for Healthcare The Hive Think Tank: Unpacking AI for Healthcare
The Hive Think Tank: Unpacking AI for Healthcare
 
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
 

Kürzlich hochgeladen

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Kürzlich hochgeladen (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C. Srivas, Co-founder and CTO at MapR

  • 1. © MapR Technologies, confidential © 2014 MapR Technologies
  • 2. © MapR Technologies, confidential Top Ranked 500+ CustomersCloud Leaders MapR Enterprise Hadoop
  • 3. © MapR Technologies, confidential Hadoop Distributions Open Source Open Source Distribution A Distribution C MANAGEMENT Open Source MANAGEMENT ARCHITECTURAL INNOVATIONS
  • 4. © MapR Technologies, confidential Enterprise Hadoop from MapR Management MapR Data Platform APACHE HADOOP ECOSYSTEM 4 Storm Shark Accumulo Sentry Spark Impala HBase MapReduce Hue Solr YARN Flume Cascading Pig Sqoop Hive/ Stinger/ Tez Whirr Oozie Mahout Zookeeper Enterprise-grade Inter-operability Multi-tenancy Security Operational Drill Apache Drill
  • 5. © MapR Technologies, confidential
  • 6. © MapR Technologies, confidential Hadoop an augmentation for EDW—Why?
  • 7. © MapR Technologies, confidential
  • 8. © MapR Technologies, confidential
  • 9. © MapR Technologies, confidential Consolidating multiple schemas is very hard. Why? Since schema-on-write, retrieval is pre-determined.
  • 10. © MapR Technologies, confidential Silos make analysis very difficult • How do I identify a unique {customer, trade} across data sets? • How can I guarantee the lack of anomalous behavior  if  I  can’t  see   all data?
  • 11. © 2014 MapR Technologies 11 Hard  to  know  what’s  of  value  a priori
  • 12. © 2014 MapR Technologies 12 Why Hadoop
  • 13. © MapR Technologies, confidential Rethink SQL for Big Data Preserve •ANSI SQL • Familiar and ubiquitous •Performance • Interactive nature crucial for BI/Analytics •One technology • Painful to manage different technologies •Enterprise ready • System-of-record, HA, DR, Security, Multi- tenancy,  … Invent •Flexible data-model • Allow schemas to evolve rapidly • Support semi-structured data types •Agility • Self-service possible when developer and DBA is same •Scalability • In all dimensions: data, speed, schemas, processes, management
  • 14. © MapR Technologies, confidential SQL is here to stay
  • 15. © MapR Technologies, confidential Hadoop is here to stay
  • 16. © MapR Technologies, confidential YOU  CAN’T  HANDLE  REAL  SQL
  • 17. © MapR Technologies, confidential SQL select * from A where exists ( select 1 from B where B.b < 100 ); • Did you know Apache HIVE cannot compute it? – eg, Hive, Impala, Spark/Shark
  • 18. © MapR Technologies, confidential Self-described Data select cf.month, cf.year from hbase.table1; • Did you know normal SQL cannot handle the above? • Nor can HIVE and its variants like Impala, Shark? • Because  there’s  no  meta-store definition available
  • 19. © MapR Technologies, confidential Self-Describing Data Ubiquitous Centralized schema - Static - Managed by the DBAs - In a centralized repository Long, meticulous data preparation process (ETL, create/alter schema, etc.) – can take 6-18 months Self-describing, or schema-less, data - Dynamic/evolving - Managed by the applications - Embedded in the data Less schema, more suitable for data that has higher volume, variety and velocity Apache Drill
  • 20. © MapR Technologies, confidential A Quick Tour through Apache Drill
  • 21. © MapR Technologies, confidential Data Source is in the Query select timestamp, message from dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet` where errorLevel > 2 This is a cluster in Apache Drill - DFS - HBase - Hive meta-store A work-space - Typically a sub- directory - HIVE database A table - pathnames - Hbase table - Hive table
  • 22. © MapR Technologies, confidential Combine data sources on the fly • JSON • CSV • ORC (ie, all Hive types) • Parquet • HBase tables • …  can  combine  them Select USERS.name, PROF.emails.work from dfs.logs.`/data/logs` LOGS, dfs.users.`/profiles.json` USERS, where LOGS.uid = USERS.uid and errorLevel > 5 order by count(*);
  • 23. © MapR Technologies, confidential Can be an entire directory tree // On a file select errorLevel, count(*) from dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet` group by errorLevel; // On the entire data collection: all years, all months select errorLevel, count(*) from dfs.logs.`/AppServerLogs` group by errorLevel;
  • 24. © MapR Technologies, confidential Querying JSON { name: classic fillings: [ { name: sugar cal: 400 }]} { name: choco fillings: [ { name: sugar cal: 400 } { name: chocolate cal: 300 }]} { name: bostoncreme fillings: [ { name: sugar cal: 400 } { name: cream cal: 1000 } { name: jelly cal: 600 }]} donuts.json
  • 25. © MapR Technologies, confidential Cursors inside Drill DrillClient drill=newDrillClient().connect(…); ResultReader r= drill.runSqlQuery("select*from`donuts.json`"); while(r.next()){ StringdonutName=r.reader(“name").readString(); ListReaderfillings=r.reader("fillings"); while(fillings.next()){ intcalories=fillings.reader("cal").readInteger(); if(calories>400) print(donutName, calories,fillings.reader("name").readString()); } } { name: classic fillings: [ { name: sugar cal: 400 }]} { name: choco fillings: [ { name: sugar cal: 400 } { name: chocolate cal: 300 }]} { name: bostoncreme fillings: [ { name: sugar cal: 400 } { name: cream cal: 1000 } { name: jelly cal: 600 }]}
  • 26. © MapR Technologies, confidential Direct queries on nested data // Flattening maps in JSON, parquet and other nested records select name, flatten(fillings) as f from dfs.users.`/donuts.json` where f.cal < 300; // lists the fillings < 300 calories { name: classic fillings: [ { name: sugar cal: 400 }]} { name: choco fillings: [ { name: sugar cal: 400 } { name: chocolate cal: 300 }]} { name: bostoncreme fillings: [ { name: sugar cal: 400 } { name: cream cal: 1000 } { name: jelly cal: 600 }]}
  • 27. © MapR Technologies, confidential Queries on embedded data // embedded JSON value inside column donut-json inside column-family cf1 of an hbase table donuts select d.name, count( d.fillings), from ( select convert_from( cf1.donut-json, json) as d from hbase.user.`donuts` );
  • 28. © MapR Technologies, confidential Queries inside JSON records // Each JSON record itself can be a whole database // example: get all donuts with at least 1 filling with > 300 calories select d.name, count( d.fillings), max(d.fillings.cal) within record as mincal from ( select convert_from( cf1.donut-json, json) as d from hbase.user.`donuts` ) where mincal > 300;
  • 29. © MapR Technologies, confidential a • Schema can change over course of query • Operators are able to reconfigure themselves on schema change events – Minimize flexibility overhead – Support more advanced execution optimization based on actual data characteristics
  • 30. © MapR Technologies, confidential De-centralized metadata // count the number of tweets per customer, where the customers are in Hive, and their tweets are in HBase. Note that the hbase data has no meta-data information select c.customerName, hb.tweets.count from hive.CustomersDB.`Customers` c join hbase.user.`SocialData` hb on c.customerId = convert_from( hb.rowkey, UTF-8);
  • 31. © MapR Technologies, confidential Underneath the Covers
  • 32. © MapR Technologies, confidential Basic Process Zookeeper DFS/HBase DFS/HBase DFS/HBase Drillbit Distributed Cache Drillbit Distributed Cache Drillbit Distributed Cache Query 1. Query comes to any Drillbit (JDBC, ODBC, CLI, protobuf) 2. Drillbit generates execution plan based on query optimization & locality 3. Fragments are farmed to individual nodes 4. Result is returned to driving node c c c
  • 33. © MapR Technologies, confidential Stages of Query Planning Parser Logical Planner Physical Planner Query Foreman Plan fragments sent to drill bits SQL Query Heuristic and cost based Cost based
  • 34. © MapR Technologies, confidential Query Execution SQL Parser Optimizer Scheduler Pig Parser PhysicalPlan Mongo Cassandra HiveQL Parser RPC Endpoint Distributed Cache StorageEngineInterface OperatorsOperators Foreman LogicalPlan HDFS HBase JDBC Endpoint ODBC Endpoint
  • 35. © MapR Technologies, confidential A  Query  engine  that  is… • Columnar/Vectorized • Optimistic/pipelined • Runtime compilation • Late binding • Extensible
  • 36. © MapR Technologies, confidential Columnar representation A B C D E A B C D On disk E
  • 37. © MapR Technologies, confidential Columnar Encoding • Values in a col. stored next to one-another – Better compression – Range-map: save min-max, can skip if not present • Only retrieve columns participating in query • Aggregations can be performed without decoding A B C D On disk E
  • 38. © MapR Technologies, confidential Run-length-encoding & Sum • Dataset encoded as <val> <run-length>: – 2, 4    (4  2’s) – 8,  10    (10  8’s) • Goal: sum all the records • Normally: – Decompress: 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8 – Add: 2 + 2 + 2 + 2 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 • Optimized work: 2 * 4 + 8 * 10 – Less memory, less operations
  • 39. © MapR Technologies, confidential Bit-packed Dictionary Sort • Dataset encoded with a dictionary and bit-positions: – Dictionary: [Rupert, Bill, Larry] {0, 1, 2} – Values: [1,0,1,2,1,2,1,0] • Normal work – Decompress & store: Bill, Rupert, Bill, Larry, Bill, Larry, Bill, Rupert – Sort: ~24 comparisons of variable width strings • Optimized work – Sort dictionary: {Bill: 1, Larry: 2, Rupert: 0} – Sort bit-packed values – Work: max 3 string comparisons, ~24 comparisons of fixed-width dictionary bits
  • 40. © MapR Technologies, confidential Drill 4-value semantics • SQL’s  3-valued semantics – True – False – Unknown • Drill adds fourth – Repeated
  • 41. © MapR Technologies, confidential Batches of Values • Value vectors – List of values, with same schema – With the 4-value semantics for each value • Shipped around in batches – max 256k bytes in a batch – max 64K rows in a batch • RPC designed for multiple replies to a request
  • 42. © MapR Technologies, confidential Fixed Value Vectors
  • 43. © MapR Technologies, confidential Nullable Values
  • 44. © MapR Technologies, confidential Repeated Values
  • 45. © MapR Technologies, confidential Variable Width
  • 46. © MapR Technologies, confidential Repeated Map
  • 47. © MapR Technologies, confidential Vectorization • Drill operates on more than one record at a time – Word-sized manipulations – SIMD instructions • GCC, LLVM and JVM all do various optimizations automatically – Manually code algorithms • Logical Vectorization – Bitmaps allow lightning fast null-checks – Avoid branching to speed CPU pipeline
  • 48. © MapR Technologies, confidential Runtime Compilation is Faster • JIT is smart, but more gains with runtime compilation • Janino: Java-based Java compiler From http://bit.ly/16Xk32x
  • 49. © MapR Technologies, confidential Drill compiler Loaded class Merge byte- code of the two classes Janino compiles runtime byte-code CodeModel generates code Precompiled byte-code templates
  • 50. © MapR Technologies, confidential Optimistic 0 20 40 60 80 100 120 140 160 Speed vs. check-pointing No need to checkpoint Checkpoint frequentlyApache Drill
  • 51. © MapR Technologies, confidential Optimistic Execution • Recovery code trivial – Running  instances  discard  the  failed  query’s  intermediate  state • Pipelining possible – Send results as soon as batch is large enough – Requires barrier-less decomposition of query
  • 52. © MapR Technologies, confidential Pipelining • Record batches are pipelined between nodes – ~256kB usually • Unit of work for Drill – Operators works on a batch • Operator reconfiguration happens at batch boundaries DrillBit DrillBit DrillBit
  • 53. © MapR Technologies, confidential Pipelining Record Batches SQL Parser Optimizer Scheduler Pig Parser PhysicalPlan Mongo Cassandra HiveQL Parser RPC Endpoint Distributed Cache StorageEngineInterface OperatorsOperators Foreman LogicalPlan HDFS HBase JDBC Endpoint ODBC Endpoint
  • 54. © MapR Technologies, confidential DISK Pipelining • Random access: sort without copy or restructuring • Avoids serialization/deserialization • Off-heap (no GC woes when lots of memory) • Full specification + off-heap + batch – Enables C/C++ operators (fast!) • Read/write to disk – when data larger than memory Drill Bit Memory overflow uses disk
  • 55. © MapR Technologies, confidential Cost-based Optimization • Using Optiq, an extensible framework • Pluggable rules, and cost model • Rules for distributed plan generation • Insert Exchange operator into physical plan • Optiq enhanced to explore parallel query plans • Pluggable cost model – CPU, IO, memory, network cost (data locality) – Storage engine features (HDFS vs HIVE vs HBase) Query Optimizer Pluggable rules Pluggable cost model
  • 56. © MapR Technologies, confidential Distributed Plan Cost • Operators have distribution property • Hash,  Broadcast,  Singleton,  …     • Exchange operator to enforce distributions • Hash: HashToRandomExchange • Broadcast: BroadcastExchange • Singleton: UnionExchange, SingleMergeExchange • Enumerate all, use cost to pick best • Merge Join vs Hash Join • Partition-based join vs Broadcast-based join • Streaming Aggregation vs Hash Aggregation • Aggregation in one phase or two phases • partial local aggregation followed by final aggregation HashToRandomExchange Sort Streaming-Aggregation Data Data Data
  • 57. © MapR Technologies, confidential Apache Drill FLEXIBLE SCHEMA MANAGEMENT FRICTIONLESS ANALYTICS ON NESTED DATA PLUG AND PLAY WITH EXISTING Analyze data, self- described or central metadata Reuse investments in SQL/BI tools and Apache Hive Analyze semi structured & nested data …  and  with  an  architecture  built  ground  up  for  Low Latency queries at Scale
  • 58. © MapR Technologies, confidential Drill 1.0 Hive 0.13 w/ Tez Impala 1.x Shark 0.9 Latency Low Medium Low Medium Files Yes (all Hive file formats, plus    JSON,  Text,  …) Yes (all Hive file formats) Yes (Parquet, Sequence, …) Yes (all Hive file formats) HBase/M7 Yes Yes, perf issues Yes, with issues Yes, perf issues Schema Hive or schema-less Hive Hive Hive SQL support ANSI SQL HiveQL HiveQL (subset) HiveQL Client support ODBC/JDBC ODBC/JDBC ODBC/JDBC ODBC/JDBC Hive compat High High Low High Large datasets Yes Yes Limited Limited Nested data Yes Limited No Limited Concurrency High Limited Medium Limited Interactive SQL-on-Hadoop options
  • 59. © MapR Technologies, confidential 59 Management MapR Data Platform APACHE HADOOP & OSS ECOSYSTEM Spark Hue HP Vertica SharkImpalaDrill Hive/ Stinger/ Tez Storm SentrySolrMahoutCascadingZookeeperFlume Oozie HBaseMapReduceYARNPigWhirrSqoop SQL ACCESS SQL-on-Hadoop
  • 60. © MapR Technologies, confidential Drill at MapR • World-class SQL team, ~20 people • 150+ years combined experience building commercial databases • Oracle, DB2, ParAccel, Teradata, SQLServer, Vertica • Fixed some of the toughest problems in Apache Hive
  • 61. © MapR Technologies, confidential Active Drill Community • Large community, growing rapidly – 35-40 contributors, 16 committers – Microsoft, Linked-in, Oracle, Facebook, Visa, Lucidworks, Concurrent, many universities • In 2014 – over 20 meet-ups, many more coming soon – 2 hackathons, with 40+ participants • Encourage you to join, learn, contribute  and  have  fun  …
  • 62. © MapR Technologies, confidential Apache Drill Resources • Getting started with Drill is easy – just download tarball and start running SQL queries on local files • Mailing lists – drill-user@incubator.apache.org – drill-dev@incubator.apache.org • Docs: https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Wiki • Fork us on GitHub: http://github.com/apache/incubator-drill/ • Create a JIRA: https://issues.apache.org/jira/browse/DRILL
  • 63. © MapR Technologies, confidential Thank you! M. C. Srivas srivas@mapr.com Did  I  mention  we  are  hiring…