SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Introduction to Apache Drill –
interactive query and analysis at scale

     Michael Hausenblas, MapR EMEA
        2013-02-22, HUG Munich
About Michael
• Background in large-scale data integration
• Chief Data Engineer EMEA, MapR
• Apache Drill contributor
Workloads
•   Batch processing (MapReduce)
•   Light-weight OLTP (HBase, Cassandra)
•   Stream processing (Storm, S4)
•   Search (Solr, Elasticsearch)
•   Interactive analysis
Interactive Query at scale



                       Impala




         low-latency
Use Case
• Jane, a marketing analyst
• Determine target segments
• Data from different sources
Today’s Solutions
• RDBMS-focused
  – ETL data from MongoDB/Hadoop
  – Query with SQL
• MapReduce-focused
  – ETL from RDBMS/MongoDB
  – Use Hive
Requirements
•   Support for different data sources
•   Support for different query interfaces
•   Low-latency/real-time
•   Ad-hoc queries
•   Scalable and fast
•   Reliable
Google’s Dremel




      http://research.google.com/pubs/pub36632.html
Apache Drill Overview
•   Inspired by Google Dremel
•   Standard SQL2003 support
•   …. other QL (DSL, etc.) possible
•   Plug-able data sources
•   Support for nested data (JSON, etc.)
•   Schema is optional
•   Community driven, open, 100’s involved
Apache Drill Overview
High-level Architecture
How does it work?
  • Drillbits per node, maximize data locality
  • Co-ordination, query planning, optimization,
    scheduling, execution are distributed

Source                     Logical                          Physical
Query       Parser          Plan                Optimizer    Plan       Execution




SQL 2003,             query: [ {                topology               Scanner API
                       @id: "log",
DrQL,                  op: "sequence",
                       do: [ {
MongoQL,                op: "scan",
                        source: “logs"}
DSL                     {
                         op: "filter",
                         condition: "x > 3"},
                      …
How does it work?
Key Features
•   Full SQL
•   Nested data
•   Optional schema
•   Extensibility points
Full SQL – ANSI SQL2003
• SQL-like is often not enough
• Integration with existing tools
  – Tableau, Excel, SAP Crystal Reports
  – Use standard ODBC/JDBC driver
Nested Data
• Nested data becoming prevalent
  – JSON/BSON, XML, ProtoBuf, Avro
  – Some data sources support it natively
    (MongoDB, etc.)
  – Innovation through Dremel
• Flattening nested data is error-prone
• Apache Drill supports nested data,
  extension to ANSI SQL2003
Optional Schema
• Many data sources don’t have rigid schemas
  – Schema changes rapidly
  – Different schema per record (e.g. HBase)
• Apache Drill supports queries against
  unknown schema
• user can define schema or via discovery
Extensibility Points
 •   Query language (parser) - UDFs
 •   Data sources/formats (scanner)
 •   Optimizer
 •   Custom operators (logical plan)


Source                Logical               Physical
Query     Parser       Plan     Optimizer    Plan      Execution
Demo
{
 "id": "0001",
 "type": "donut",
 "name": "Cake",
 "batters":
 {
                                                                    {
   "batter”:
                                                                         "sales" : 700.0,
   [
                                                                         "typeCount" : 1,
       { "id": "1001", "type": "Regular" },
                                                                         "quantity" : 700,
       { "id": "1002", "type": "Chocolate" },
                                                                         "ppu" : 1.0
…
                                                                    }
                                                                     {
                                                                         "sales" : 109.71,
data source: donuts.json                                                 "typeCount" : 2,
                                                                         "quantity" : 159,
 query:[ {                                                               "ppu" : 0.69
      op:"sequence",                                                }
      do:[                                                           {
           {                                                             "sales" : 184.25,
              op: "scan",                                                "typeCount" : 2,
              ref: "donuts",                                             "quantity" : 335,
              source: "local-logs",                                      "ppu" : 0.55
              selection: {data: "activity"}                         }
           },
           {                                                       result: out.json
              op: "filter",
              expr: "donuts.ppu < 2.00"
           },
…

logical plan: simple_plan.json
                                  https://cwiki.apache.org/confluence/display/DRILL/Demo+HowTo
Status
• Heavy development by multiple orgs
• Logical plan, reference interpreter available
• SQL interpreter, storage engine
  implementations (Accumolo, Cassandra,
  Hbase, etc.) are WIP
• Schedule:
  – Prototype Q1
  – Alpha Q2
Why do we do it?
Engage!
• Follow @ApacheDrill on Twitter

• Sign up at mailing lists (user|dev)
  http://incubator.apache.org/drill/mailing-lists.html


• Keep an eye on http://drill-user.org/

• Ping me: mhausenblas@maprtech.com

Más contenido relacionado

Was ist angesagt?

Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesCorley S.r.l.
 
Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons          Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons Provectus
 
Hive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDHive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDSATOSHI TAGOMORI
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupFrens Jan Rumph
 
Heuritech: Apache Spark REX
Heuritech: Apache Spark REXHeuritech: Apache Spark REX
Heuritech: Apache Spark REXdidmarin
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfCharles Givre
 
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...randyguck
 
9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab9b. Document-Oriented Databases lab
9b. Document-Oriented Databases labFabio Fumarola
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomynzhang
 
Structured Streaming for Columnar Data Warehouses with Jack Gudenkauf
Structured Streaming for Columnar Data Warehouses with Jack GudenkaufStructured Streaming for Columnar Data Warehouses with Jack Gudenkauf
Structured Streaming for Columnar Data Warehouses with Jack GudenkaufDatabricks
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
Expand data analysis tool at scale with Zeppelin
Expand data analysis tool at scale with ZeppelinExpand data analysis tool at scale with Zeppelin
Expand data analysis tool at scale with ZeppelinDataWorks Summit
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopCloudera, Inc.
 
Practical Hadoop using Pig
Practical Hadoop using PigPractical Hadoop using Pig
Practical Hadoop using PigDavid Wellman
 
Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop Guy Harrison
 
Swiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache DrillSwiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache DrillMapR Technologies
 
How to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analyticsHow to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analyticsJulien Le Dem
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to SparkLi Ming Tsai
 

Was ist angesagt? (20)

Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting Languages
 
Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons          Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons
 
Hive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDHive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TD
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark Meetup
 
Heuritech: Apache Spark REX
Heuritech: Apache Spark REXHeuritech: Apache Spark REX
Heuritech: Apache Spark REX
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
 
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
 
9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomy
 
Structured Streaming for Columnar Data Warehouses with Jack Gudenkauf
Structured Streaming for Columnar Data Warehouses with Jack GudenkaufStructured Streaming for Columnar Data Warehouses with Jack Gudenkauf
Structured Streaming for Columnar Data Warehouses with Jack Gudenkauf
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
Expand data analysis tool at scale with Zeppelin
Expand data analysis tool at scale with ZeppelinExpand data analysis tool at scale with Zeppelin
Expand data analysis tool at scale with Zeppelin
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for Hadoop
 
Practical Hadoop using Pig
Practical Hadoop using PigPractical Hadoop using Pig
Practical Hadoop using Pig
 
Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop
 
Swiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache DrillSwiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache Drill
 
How to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analyticsHow to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analytics
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
 
SQL and Search with Spark in your browser
SQL and Search with Spark in your browserSQL and Search with Spark in your browser
SQL and Search with Spark in your browser
 

Andere mochten auch

Merlin: The Ultimate Data Science Environment
Merlin: The Ultimate Data Science EnvironmentMerlin: The Ultimate Data Science Environment
Merlin: The Ultimate Data Science EnvironmentCharles Givre
 
Strata NYC 2015 What does your smart device know about you?
Strata NYC 2015 What does your smart device know about you?Strata NYC 2015 What does your smart device know about you?
Strata NYC 2015 What does your smart device know about you?Charles Givre
 
Km 65 tahun 2002
Km 65 tahun 2002Km 65 tahun 2002
Km 65 tahun 2002Bp Nafri
 
RAPIM 2011
RAPIM 2011RAPIM 2011
RAPIM 2011Bp Nafri
 
Apache Storm - Minando redes sociales y medios en tiempo real
Apache Storm - Minando redes sociales y medios en tiempo realApache Storm - Minando redes sociales y medios en tiempo real
Apache Storm - Minando redes sociales y medios en tiempo realAndrés Mauricio Palacios
 
What Does Your Smart Car Know About You? Strata London 2016
What Does Your Smart Car Know About You?  Strata London 2016What Does Your Smart Car Know About You?  Strata London 2016
What Does Your Smart Car Know About You? Strata London 2016Charles Givre
 
Apache Drill Workshop
Apache Drill WorkshopApache Drill Workshop
Apache Drill WorkshopCharles Givre
 
RAKORNIS 2010
RAKORNIS 2010RAKORNIS 2010
RAKORNIS 2010Bp Nafri
 
Pristine Advisers Presentation
Pristine Advisers PresentationPristine Advisers Presentation
Pristine Advisers PresentationPattyBaronowski
 
Killing ETL with Apache Drill
Killing ETL with Apache DrillKilling ETL with Apache Drill
Killing ETL with Apache DrillCharles Givre
 
Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache DrillMapR Technologies
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, Howmcsrivas
 
Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Charles Givre
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesDataWorks Summit/Hadoop Summit
 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...The Hive
 
KELAIKLAUTAN KAPAL DAN DOKUMENTASI KAPAL
KELAIKLAUTAN KAPAL DAN DOKUMENTASI KAPALKELAIKLAUTAN KAPAL DAN DOKUMENTASI KAPAL
KELAIKLAUTAN KAPAL DAN DOKUMENTASI KAPALBeny Jackson Maliota
 
Data Exploration with Apache Drill: Day 1
Data Exploration with Apache Drill:  Day 1Data Exploration with Apache Drill:  Day 1
Data Exploration with Apache Drill: Day 1Charles Givre
 

Andere mochten auch (20)

Merlin: The Ultimate Data Science Environment
Merlin: The Ultimate Data Science EnvironmentMerlin: The Ultimate Data Science Environment
Merlin: The Ultimate Data Science Environment
 
Strata NYC 2015 What does your smart device know about you?
Strata NYC 2015 What does your smart device know about you?Strata NYC 2015 What does your smart device know about you?
Strata NYC 2015 What does your smart device know about you?
 
Km 65 tahun 2002
Km 65 tahun 2002Km 65 tahun 2002
Km 65 tahun 2002
 
Narkoba
NarkobaNarkoba
Narkoba
 
RAPIM 2011
RAPIM 2011RAPIM 2011
RAPIM 2011
 
PSCO
PSCOPSCO
PSCO
 
Apache Storm - Minando redes sociales y medios en tiempo real
Apache Storm - Minando redes sociales y medios en tiempo realApache Storm - Minando redes sociales y medios en tiempo real
Apache Storm - Minando redes sociales y medios en tiempo real
 
What Does Your Smart Car Know About You? Strata London 2016
What Does Your Smart Car Know About You?  Strata London 2016What Does Your Smart Car Know About You?  Strata London 2016
What Does Your Smart Car Know About You? Strata London 2016
 
Apache Drill Workshop
Apache Drill WorkshopApache Drill Workshop
Apache Drill Workshop
 
RAKORNIS 2010
RAKORNIS 2010RAKORNIS 2010
RAKORNIS 2010
 
Pristine Advisers Presentation
Pristine Advisers PresentationPristine Advisers Presentation
Pristine Advisers Presentation
 
Killing ETL with Apache Drill
Killing ETL with Apache DrillKilling ETL with Apache Drill
Killing ETL with Apache Drill
 
Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache Drill
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, How
 
Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different Rules
 
ISPS Code
ISPS CodeISPS Code
ISPS Code
 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
 
KELAIKLAUTAN KAPAL DAN DOKUMENTASI KAPAL
KELAIKLAUTAN KAPAL DAN DOKUMENTASI KAPALKELAIKLAUTAN KAPAL DAN DOKUMENTASI KAPAL
KELAIKLAUTAN KAPAL DAN DOKUMENTASI KAPAL
 
Data Exploration with Apache Drill: Day 1
Data Exploration with Apache Drill:  Day 1Data Exploration with Apache Drill:  Day 1
Data Exploration with Apache Drill: Day 1
 

Ähnlich wie Introduction to Apache Drill - interactive query and analysis at scale

Building Highly Flexible, High Performance Query Engines
Building Highly Flexible, High Performance Query EnginesBuilding Highly Flexible, High Performance Query Engines
Building Highly Flexible, High Performance Query EnginesMapR Technologies
 
Gab document db scaling database
Gab   document db scaling databaseGab   document db scaling database
Gab document db scaling databaseMUG Perú
 
Apache Drill: An Active, Ad-hoc Query System for large-scale Data Sets
Apache Drill: An Active, Ad-hoc Query System for large-scale Data SetsApache Drill: An Active, Ad-hoc Query System for large-scale Data Sets
Apache Drill: An Active, Ad-hoc Query System for large-scale Data SetsMapR Technologies
 
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasBerlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasMapR Technologies
 
Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB MongoDB
 
MongoDB: a gentle, friendly overview
MongoDB: a gentle, friendly overviewMongoDB: a gentle, friendly overview
MongoDB: a gentle, friendly overviewAntonio Pintus
 
Webinar: Position and Trade Management with MongoDB
Webinar: Position and Trade Management with MongoDBWebinar: Position and Trade Management with MongoDB
Webinar: Position and Trade Management with MongoDBMongoDB
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopAvkash Chauhan
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance TuningMongoDB
 
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...confluent
 
Introducing Azure DocumentDB - NoSQL, No Problem
Introducing Azure DocumentDB - NoSQL, No ProblemIntroducing Azure DocumentDB - NoSQL, No Problem
Introducing Azure DocumentDB - NoSQL, No ProblemAndrew Liu
 
d3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in Berlind3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in BerlinToshiaki Katayama
 
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!Daniel Cousineau
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Michael Rys
 
Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill MapR Technologies
 
Azure DocumentDB: Advanced Features for Large Scale-Apps
Azure DocumentDB: Advanced Features for Large Scale-AppsAzure DocumentDB: Advanced Features for Large Scale-Apps
Azure DocumentDB: Advanced Features for Large Scale-AppsAndrew Liu
 
ELK Stack - Turn boring logfiles into sexy dashboard
ELK Stack - Turn boring logfiles into sexy dashboardELK Stack - Turn boring logfiles into sexy dashboard
ELK Stack - Turn boring logfiles into sexy dashboardGeorg Sorst
 
Unlock the Power of Streaming Data with Kinetica and Confluent Platform
Unlock the Power of Streaming Data with Kinetica and Confluent PlatformUnlock the Power of Streaming Data with Kinetica and Confluent Platform
Unlock the Power of Streaming Data with Kinetica and Confluent Platformconfluent
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 

Ähnlich wie Introduction to Apache Drill - interactive query and analysis at scale (20)

Building Highly Flexible, High Performance Query Engines
Building Highly Flexible, High Performance Query EnginesBuilding Highly Flexible, High Performance Query Engines
Building Highly Flexible, High Performance Query Engines
 
Gab document db scaling database
Gab   document db scaling databaseGab   document db scaling database
Gab document db scaling database
 
Apache Drill: An Active, Ad-hoc Query System for large-scale Data Sets
Apache Drill: An Active, Ad-hoc Query System for large-scale Data SetsApache Drill: An Active, Ad-hoc Query System for large-scale Data Sets
Apache Drill: An Active, Ad-hoc Query System for large-scale Data Sets
 
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasBerlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
 
Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB
 
MongoDB: a gentle, friendly overview
MongoDB: a gentle, friendly overviewMongoDB: a gentle, friendly overview
MongoDB: a gentle, friendly overview
 
Mongo scaling
Mongo scalingMongo scaling
Mongo scaling
 
Webinar: Position and Trade Management with MongoDB
Webinar: Position and Trade Management with MongoDBWebinar: Position and Trade Management with MongoDB
Webinar: Position and Trade Management with MongoDB
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R Workshop
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance Tuning
 
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...
 
Introducing Azure DocumentDB - NoSQL, No Problem
Introducing Azure DocumentDB - NoSQL, No ProblemIntroducing Azure DocumentDB - NoSQL, No Problem
Introducing Azure DocumentDB - NoSQL, No Problem
 
d3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in Berlind3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in Berlin
 
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
 
Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill
 
Azure DocumentDB: Advanced Features for Large Scale-Apps
Azure DocumentDB: Advanced Features for Large Scale-AppsAzure DocumentDB: Advanced Features for Large Scale-Apps
Azure DocumentDB: Advanced Features for Large Scale-Apps
 
ELK Stack - Turn boring logfiles into sexy dashboard
ELK Stack - Turn boring logfiles into sexy dashboardELK Stack - Turn boring logfiles into sexy dashboard
ELK Stack - Turn boring logfiles into sexy dashboard
 
Unlock the Power of Streaming Data with Kinetica and Confluent Platform
Unlock the Power of Streaming Data with Kinetica and Confluent PlatformUnlock the Power of Streaming Data with Kinetica and Confluent Platform
Unlock the Power of Streaming Data with Kinetica and Confluent Platform
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 

Mehr von MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareMapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLMapR Technologies
 

Mehr von MapR Technologies (20)

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Último

Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameKapil Thakar
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)codyslingerland1
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)IES VE
 
Where developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingWhere developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingFrancesco Corti
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updateadam112203
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2DianaGray10
 
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfQ4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfTejal81
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInThousandEyes
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdfThe Good Food Institute
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIVijayananda Mohire
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxNeo4j
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applicationsnooralam814309
 
How to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxHow to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxKaustubhBhavsar6
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kitJamie (Taka) Wang
 
Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...DianaGray10
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3DianaGray10
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and businessFrancesco Corti
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosErol GIRAUDY
 

Último (20)

Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First Frame
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)
 
Where developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingWhere developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is going
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 update
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2
 
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfQ4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAI
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applications
 
How to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxHow to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptx
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kit
 
Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and business
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenarios
 

Introduction to Apache Drill - interactive query and analysis at scale

  • 1. Introduction to Apache Drill – interactive query and analysis at scale Michael Hausenblas, MapR EMEA 2013-02-22, HUG Munich
  • 2. About Michael • Background in large-scale data integration • Chief Data Engineer EMEA, MapR • Apache Drill contributor
  • 3. Workloads • Batch processing (MapReduce) • Light-weight OLTP (HBase, Cassandra) • Stream processing (Storm, S4) • Search (Solr, Elasticsearch) • Interactive analysis
  • 4. Interactive Query at scale Impala low-latency
  • 5. Use Case • Jane, a marketing analyst • Determine target segments • Data from different sources
  • 6. Today’s Solutions • RDBMS-focused – ETL data from MongoDB/Hadoop – Query with SQL • MapReduce-focused – ETL from RDBMS/MongoDB – Use Hive
  • 7. Requirements • Support for different data sources • Support for different query interfaces • Low-latency/real-time • Ad-hoc queries • Scalable and fast • Reliable
  • 8. Google’s Dremel http://research.google.com/pubs/pub36632.html
  • 9. Apache Drill Overview • Inspired by Google Dremel • Standard SQL2003 support • …. other QL (DSL, etc.) possible • Plug-able data sources • Support for nested data (JSON, etc.) • Schema is optional • Community driven, open, 100’s involved
  • 12. How does it work? • Drillbits per node, maximize data locality • Co-ordination, query planning, optimization, scheduling, execution are distributed Source Logical Physical Query Parser Plan Optimizer Plan Execution SQL 2003, query: [ { topology Scanner API @id: "log", DrQL, op: "sequence", do: [ { MongoQL, op: "scan", source: “logs"} DSL { op: "filter", condition: "x > 3"}, …
  • 13. How does it work?
  • 14. Key Features • Full SQL • Nested data • Optional schema • Extensibility points
  • 15. Full SQL – ANSI SQL2003 • SQL-like is often not enough • Integration with existing tools – Tableau, Excel, SAP Crystal Reports – Use standard ODBC/JDBC driver
  • 16. Nested Data • Nested data becoming prevalent – JSON/BSON, XML, ProtoBuf, Avro – Some data sources support it natively (MongoDB, etc.) – Innovation through Dremel • Flattening nested data is error-prone • Apache Drill supports nested data, extension to ANSI SQL2003
  • 17. Optional Schema • Many data sources don’t have rigid schemas – Schema changes rapidly – Different schema per record (e.g. HBase) • Apache Drill supports queries against unknown schema • user can define schema or via discovery
  • 18. Extensibility Points • Query language (parser) - UDFs • Data sources/formats (scanner) • Optimizer • Custom operators (logical plan) Source Logical Physical Query Parser Plan Optimizer Plan Execution
  • 19. Demo { "id": "0001", "type": "donut", "name": "Cake", "batters": { { "batter”: "sales" : 700.0, [ "typeCount" : 1, { "id": "1001", "type": "Regular" }, "quantity" : 700, { "id": "1002", "type": "Chocolate" }, "ppu" : 1.0 … } { "sales" : 109.71, data source: donuts.json "typeCount" : 2, "quantity" : 159, query:[ { "ppu" : 0.69 op:"sequence", } do:[ { { "sales" : 184.25, op: "scan", "typeCount" : 2, ref: "donuts", "quantity" : 335, source: "local-logs", "ppu" : 0.55 selection: {data: "activity"} } }, { result: out.json op: "filter", expr: "donuts.ppu < 2.00" }, … logical plan: simple_plan.json https://cwiki.apache.org/confluence/display/DRILL/Demo+HowTo
  • 20. Status • Heavy development by multiple orgs • Logical plan, reference interpreter available • SQL interpreter, storage engine implementations (Accumolo, Cassandra, Hbase, etc.) are WIP • Schedule: – Prototype Q1 – Alpha Q2
  • 21. Why do we do it?
  • 22. Engage! • Follow @ApacheDrill on Twitter • Sign up at mailing lists (user|dev) http://incubator.apache.org/drill/mailing-lists.html • Keep an eye on http://drill-user.org/ • Ping me: mhausenblas@maprtech.com

Hinweis der Redaktion

  1. Hive: compile to MR, Aster: external tables in MPP, Oracle/MySQL: export MR results to RDBMSDrill, Impala, CitusDB: real-time
  2. Suppose a marketing analyst trying to experiment with ways to do targeting of user segments for next campaign. Needs access to web logs stored in Hadoop, and also needs to access user profiles stored in MongoDB as well as access to transaction data stored in a conventional database.
  3. Re ad-hoc:You might not know ahead of time what queries you will want to make. You may need to react to changing circumstances.
  4. Two innovations: handle nested-data column style (column-striped representation) and query push-down
  5. Source query is parsed and transformed to produce the logical planTypically, the logical plan lives in memory in the form of Java objects, but it also has a textual form. The logical query is then transformed and optimized into the physical plan.The physical plan represents the actual structure of computation as it is done by the system. One of the most important things the optimizer does is the introduction of parallel computation (other: columnar data to improve processing speed)