NoSQL no more
SQL on Druid with Apache Calcite
Gian Merlino
gian@imply.io
Who am I?
Gian Merlino
Committer & PMC member on Druid
Committer on Apache Calcite
Cofounder at Imply
2
Agenda
● What is Druid?
● What is NoSQL?
● What is Apache Calcite?
● From NoSQL to SQL
● Do try this at home!
3
4
open source, high-performance,
column-oriented, distributed data store
What is Druid?
● “high performance”: low query latency, high ingest rates
● “column-oriented”: best possible scan rates
● “distributed”: deployed in clusters, typically 10s–100s of nodes
● “data store”: the cluster stores a copy of your data
5
Why does Druid exist?
6
The Problem
● OLAP slice-and-dice for big data
● Interactive exploration
● Look under the hood of reports and dashboards
● And we want our data fresh, too
7
The Problem
8
Challenges
● Scale: big data is tough to process quickly
● Complexity: too much fine grain to precompute
● High dimensionality: 10s or 100s of dimensions
● Concurrency: many users and tenants
● Freshness: load from streams
9
Motivation
● Sub-second responses allow dialogue with data
● Rapid iteration on questions
● Remove barriers to understanding
10
Powered by Druid
11
Source: http://druid.io/druid-powered.html
Powered by Druid
“The performance is great ... some of the tables that we have
internally in Druid have billions and billions of events in them,
and we’re scanning them in under a second.”
12
Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html
From Yahoo:
Druid Key Features
● Low latency ingestion from Kafka
● Bulk load from Hadoop
● Can pre-aggregate data during ingestion
● “Schema light”
● Ad-hoc queries
● Exact and approximate algorithms
● Can keep a lot of history (years are ok)
13
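The “pre-aggregate during ingestion” bullet refers to Druid rollup: rows that share a (truncated) timestamp and identical dimension values are combined at ingest time and their metrics are summed. A purely conceptual sketch of that idea in plain Java; the class and field names are hypothetical and this is not a Druid API:

import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RollupSketch {
    // Hypothetical raw event: a timestamp, one dimension, one metric.
    record Event(Instant timestamp, String channel, long added) {}

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event(Instant.parse("2018-03-01T10:15:00Z"), "#en.wikipedia", 12),
            new Event(Instant.parse("2018-03-01T10:45:00Z"), "#en.wikipedia", 30),
            new Event(Instant.parse("2018-03-01T10:20:00Z"), "#fr.wikipedia", 7));

        // Roll up: group by (hour-truncated timestamp, channel) and sum the metric.
        Map<List<Object>, Long> rolledUp = new LinkedHashMap<>();
        for (Event e : events) {
            List<Object> key = List.of(e.timestamp().truncatedTo(ChronoUnit.HOURS), e.channel());
            rolledUp.merge(key, e.added(), Long::sum);
        }

        // Three raw rows become two stored rows; "added" is now a pre-aggregated sum.
        rolledUp.forEach((key, sum) -> System.out.println(key + " -> added=" + sum));
    }
}

In Druid itself, whether rollup happens and at what time granularity is controlled by the ingestion spec; the sketch only shows the shape of the transformation.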
Druid
Druid makes interactive data exploration fast and
flexible, and powers analytic applications.
14
What is NoSQL?
15
What is NoSQL?
“There's no strong definition of the concept out there, no
trademarks, no standard group, not even a manifesto.”
16
Source: https://martinfowler.com/bliki/NosqlDefinition.html
What is NoSQL?
Early examples:
Voldemort, Cassandra, Dynomite,
HBase, Hypertable, CouchDB, MongoDB
17
Source: https://martinfowler.com/bliki/NosqlDefinition.html
What is NoSQL?
What are they?
● Document stores
● Key/value stores
● Graph databases
● Timeseries databases
18
What is NoSQL?
● Not using the relational model (nor the SQL language)
● Open source
● Designed to run on large clusters
● Based on the needs of 21st century web properties
● No schema, allowing fields to be added to any record without
controls
19
Source: https://martinfowler.com/bliki/NosqlDefinition.html
Categorizing Druid
● Not using the relational model (nor the SQL language)
● Open source
● Designed to run on large clusters
● Based on the needs of 21st century web properties
● No schema, allowing fields to be added to any record without
controls
20
Source: https://martinfowler.com/bliki/NosqlDefinition.html
Categorizing Druid
Is avoiding the SQL language and
relational model really a good thing?
26
The Relational Model
● The relational model is based around relations
● SQL calls them tables and those tables have columns
● SQL queries describe relational operations
○ Scan
○ Project
○ Filter
○ Aggregate
○ Union
○ Join
27
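To make those operator names concrete, here is a small, self-contained Java sketch that applies scan, filter, project, and aggregate to an in-memory version of the “sales” table shown on the next slide; union and join are left out. The row type and field names are illustrative only, not part of any Druid or Calcite API.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RelationalOpsSketch {
    // Illustrative row type for a tiny "sales" relation.
    record Sale(String day, long productId, long userId, double revenue) {}

    public static void main(String[] args) {
        // Scan: read every row of the relation.
        List<Sale> sales = List.of(
            new Sale("2030-01-01", 212, 1, 180.00),
            new Sale("2030-01-01", 998, 2, 24.95),
            new Sale("2030-01-02", 212, 2, 180.00));

        // Filter, then Project, then Aggregate, expressed with Java streams.
        // Roughly: SELECT product_id, SUM(revenue) FROM sales WHERE revenue > 100 GROUP BY product_id
        Map<Long, Double> revenueByProduct = sales.stream()
            .filter(s -> s.revenue() > 100)                    // Filter: keep matching rows
            .map(s -> Map.entry(s.productId(), s.revenue()))   // Project: keep two columns
            .collect(Collectors.groupingBy(                    // Aggregate: group and sum
                Map.Entry::getKey,
                Collectors.summingDouble(Map.Entry::getValue)));

        System.out.println(revenueByProduct); // {212=360.0}
    }
}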
The Relational Model
28
Table: “sales”
timestamp product_id user_id revenue
2030-01-01 212 1 180.00
2030-01-01 998 2 24.95
Table: “products”
id name
212 Office chair
998 Coffee mug, 2-pack
Table: “users”
id country city user_gender user_age
1 US New York F 34
2 FR Paris M 28
Druid and the Relational Model
29
Datasource: “sales”
timestamp product country city gender age revenue
2030-01-01 Office chair US New York F 34 180.00
2030-01-01 Coffee mug, 2-pack FR Paris M 28 24.95
Druid and the Relational Model
30
Datasource: “sales”
Lookup: “products”
id name
212 Office chair
998 Coffee mug, 2-pack
timestamp product_id country city gender age revenue
2030-01-01 212 US New York F 34 180.00
2030-01-01 998 FR Paris M 28 24.95
Druid and the Relational Model
● Datasources are like tables
○ Druid “lookups” apply to a common join use case
○ Big, flat tables are common in SQL databases anyway, when
analytical performance is critical
● Benefits of offering SQL
○ Developers and analysts know it
○ Integration with 3rd party apps
31
32
Enter…
Apache Calcite
● SQL parser
● Query optimizer
● Query interpreter
● JDBC server (Avatica)
33
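Because Calcite ships Avatica, Druid SQL exposes a standard JDBC endpoint on the Broker (documented at /druid/v2/sql/avatica/). A minimal connection sketch, assuming a Broker on localhost at the default port 8082 and the Avatica client jar on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DruidSqlJdbcExample {
    public static void main(String[] args) throws Exception {
        // Druid SQL's Avatica endpoint lives on the Broker under /druid/v2/sql/avatica/.
        String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT countryName, SUM(added) AS added "
                     + "FROM wikipedia "
                     + "WHERE channel = '#en.wikipedia' "
                     + "GROUP BY countryName "
                     + "ORDER BY SUM(added) DESC "
                     + "LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString("countryName") + ": " + rs.getLong("added"));
            }
        }
    }
}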
Apache Calcite
● Widely used
○ Druid
○ Hive
○ Storm
○ Samza
○ Drill
○ Phoenix
○ Flink
34
Apache Calcite
35
SQL → SqlNode (parse tree) → RelNode (relational operator tree) → RelNode (optimized in the target calling convention)
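A minimal sketch of that pipeline using Calcite's Frameworks API: parse SQL text into a SqlNode, validate it, then convert it into a RelNode tree. The query is a trivial VALUES expression so the snippet stays self-contained; a real embedding such as Druid SQL registers its own tables in the schema first.

import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelRoot;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;

public class CalcitePipelineSketch {
    public static void main(String[] args) throws Exception {
        // Empty root schema; a real embedding registers its datasources here so that
        // queries such as "SELECT dim1, COUNT(*) FROM druid.foo ..." can be validated.
        SchemaPlus rootSchema = Frameworks.createRootSchema(true);
        FrameworkConfig config = Frameworks.newConfigBuilder()
            .defaultSchema(rootSchema)
            .build();

        Planner planner = Frameworks.getPlanner(config);

        // SQL text -> SqlNode (parse tree)
        SqlNode parsed = planner.parse("VALUES (1 + 1)");

        // SqlNode -> validated SqlNode
        SqlNode validated = planner.validate(parsed);

        // Validated SqlNode -> RelNode (relational operator tree)
        RelRoot root = planner.rel(validated);
        System.out.println(RelOptUtil.toString(root.rel));
    }
}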
SQL query
SELECT dim1, COUNT(*)
FROM druid.foo
WHERE dim1 IN ('abc', 'def', 'ghi')
GROUP BY dim1
36
SQL parse tree
SELECT dim1, COUNT(*)
FROM druid.foo
WHERE dim1 IN ('abc', 'def', 'ghi')
GROUP BY dim1
37
[Parse tree diagram: “SELECT”, “WHERE”, and “GROUP BY” keyword nodes, plus identifiers, an operator, and literals]
Relational operators
SELECT dim1, COUNT(*)
FROM druid.foo
WHERE dim1 IN ('abc', 'def', 'ghi')
GROUP BY dim1
38
LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
LogicalProject(dim1=[$2])
LogicalFilter(condition=[OR(=($2, 'abc'), =($2, 'def'), =($2, 'ghi'))])
LogicalTableScan(table=[[druid, foo]])
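The same kind of tree can be built programmatically with Calcite's RelBuilder. A sketch that rebuilds the plan above by hand, assuming a Calcite schema that exposes a “foo” table with a dim1 column; helper method names such as equals and count have shifted slightly between Calcite versions:

import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.RelBuilder;

public class RelBuilderSketch {
    // Assumes fooSchema exposes a "foo" table with a "dim1" column.
    static RelNode buildPlan(SchemaPlus fooSchema) {
        FrameworkConfig config = Frameworks.newConfigBuilder()
            .defaultSchema(fooSchema)
            .build();
        RelBuilder b = RelBuilder.create(config);

        RelNode plan = b.scan("foo")                                    // LogicalTableScan
            .filter(b.or(                                               // LogicalFilter
                b.equals(b.field("dim1"), b.literal("abc")),
                b.equals(b.field("dim1"), b.literal("def")),
                b.equals(b.field("dim1"), b.literal("ghi"))))
            .project(b.field("dim1"))                                   // LogicalProject
            .aggregate(b.groupKey("dim1"), b.count(false, "EXPR$1"))    // LogicalAggregate
            .build();

        System.out.println(RelOptUtil.toString(plan));
        return plan;
    }
}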
Query planner
● Planner rules
○ Match certain relational operator patterns
○ Can transform one set of operators into another
○ New set must have same behavior, but may have a different cost
● HepPlanner (heuristic)
○ Applies all matching rules
● VolcanoPlanner (cost based)
○ Applies rules while searching for low cost plans
39
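A sketch of driving the heuristic planner: a HepProgram lists the rules to try, and HepPlanner fires every match until the plan stops changing. The rule shown (transposing a Filter past a Project) is one of Calcite's stock rules; in recent Calcite versions the rule constants live in CoreRules, so treat the exact name as version-dependent.

import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.CoreRules;

public class HepPlannerSketch {
    // Applies one stock rewrite rule to an existing logical plan.
    static RelNode optimize(RelNode logicalPlan) {
        HepProgram program = new HepProgramBuilder()
            // Push filters below projections where the rule's pattern matches.
            .addRuleInstance(CoreRules.FILTER_PROJECT_TRANSPOSE)
            .build();

        HepPlanner planner = new HepPlanner(program);
        planner.setRoot(logicalPlan);

        // findBestExp() repeatedly fires matching rules until the plan stops changing.
        return planner.findBestExp();
    }
}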
Using Calcite
Calcite can be embedded or it can
be used directly by end-users.
Druid SQL embeds Calcite.
40
From NoSQL to SQL
41
Native vs SQL
{
"queryType": "topN",
"dataSource": “wikipedia”,
"dimension": "countryName",
"metric": {
"type": "numeric",
"metric": "added"
},
"intervals": "2018-03-01/2018-03-06",
"filter": {
"type": "and",
"fields": [
{
"type": "selector",
"dimension": "channel",
"value": "#en.wikipedia",
"extractionFn": null
},
{
"type": "not",
"field": {
"type": "selector",
"dimension": "countryName",
"value": "",
"extractionFn": null
}
}
]
},
"granularity": "all",
"aggregations": [
{
"type": "longSum",
"name": "added",
"fieldName": "added"
}
],
"threshold": 5
}
SELECT
countryName,
SUM(added)
FROM wikipedia
WHERE
channel = '#en.wikipedia'
AND countryName IS NOT NULL
AND __time BETWEEN '2018-03-01' AND '2018-03-06'
GROUP BY countryName
ORDER BY SUM(added) DESC
LIMIT 5
42
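Both forms are submitted to the Broker over HTTP: native queries are POSTed to /druid/v2/ and SQL is POSTed to /druid/v2/sql/ as a JSON object with a "query" field. A sketch of the SQL side using Java 11's HttpClient, assuming a Broker on localhost at the default port 8082:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DruidSqlHttpExample {
    public static void main(String[] args) throws Exception {
        String sql = "SELECT countryName, SUM(added) FROM wikipedia "
            + "WHERE channel = '#en.wikipedia' AND countryName IS NOT NULL "
            + "AND __time BETWEEN '2018-03-01' AND '2018-03-06' "
            + "GROUP BY countryName ORDER BY SUM(added) DESC LIMIT 5";

        // The SQL endpoint takes a JSON object with a "query" field.
        String body = "{\"query\": \"" + sql.replace("\"", "\\\"") + "\"}";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8082/druid/v2/sql/"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON array of result rows
    }
}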
SQL to Native translation
49
PartialDruidQuery (Druid’s query execution pipeline):
Scan → Filter → Project → Aggregate → Filter → Project → Sort
SQL to Native translation
SELECT dim1, COUNT(*)
FROM druid.foo
WHERE dim1 IN ('abc', 'def', 'ghi')
GROUP BY dim1
50
LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
LogicalProject(dim1=[$2])
LogicalFilter(condition=[OR(=($2, 'abc'), =($2, 'def'), =($2, 'ghi'))])
LogicalTableScan(table=[[druid, foo]])
SQL to Native translation
51
PartialDruidQuery (operators pushed in so far):
Scan(table=[[druid, foo]])
Filter(condition=[OR(=($2, 'abc'), =($2, 'def'), =($2, 'ghi'))])
Project(dim1=[$2])
Aggregate(group=[{0}], EXPR$1=[COUNT()])
(remaining slots: Filter, Project, Sort, not yet used)

Source plan from Calcite:
LogicalTableScan(table=[[druid, foo]])
LogicalFilter(condition=[OR(=($2, 'abc'), =($2, 'def'), =($2, 'ghi'))])
LogicalProject(dim1=[$2])
LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
SQL to Native translation
52
PartialDruidQuery:
Scan(table=[[druid, foo]])
Filter(condition=[OR(=($2, 'abc'), =($2, 'def'), =($2, 'ghi'))])
Project(dim1=[$2])
Aggregate(group=[{0}], EXPR$1=[COUNT()])
(unused slots: Filter, Project, Sort)

toDruidQuery()

{
  "queryType" : "groupBy",
  "dataSource" : "foo",
  "filter" : {
    "type" : "in",
    "dimension" : "dim1",
    "values" : [ "abc", "def", "ghi" ]
  },
  "dimensions" : [ "dim1" ],
  "aggregations" : [ {
    "type" : "count",
    "name" : "a0"
  } ]
}
SQL to Native translation
● Calcite implements:
○ SQL parser
○ Basic set of rules for reordering and combining operators
○ Rule-based optimizer frameworks
● Druid implements:
○ Construct Calcite catalog from Druid datasources
○ Cost functions guide reordering and combining operators
○ Rules to push operators one-by-one into a PartialDruidQuery
○ Convert PartialDruidQuery to DruidQuery
53
SQL to Native translation
Minimal performance overhead.
Can even be faster due to transferring less data to the client!
54
Challenges: Writing good queries
SQL makes it surprisingly easy to write inefficient queries.
Databases strive to optimize as best they can.
But the “EXPLAIN” tool is still essential.
55
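Druid SQL supports EXPLAIN PLAN FOR, which returns the native query that the SQL statement would be translated into, so you can inspect a plan before running it at scale. A sketch over the same Avatica JDBC endpoint as before (the Broker host and port are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DruidExplainExample {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/";
        String explain = "EXPLAIN PLAN FOR "
            + "SELECT countryName, SUM(added) FROM wikipedia "
            + "GROUP BY countryName ORDER BY SUM(added) DESC LIMIT 5";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(explain)) {
            while (rs.next()) {
                // The first column of the result holds the generated native query plan.
                System.out.println(rs.getString(1));
            }
        }
    }
}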
Challenges: Schema-lightness
● Druid is schema-light (columns and their types are flexible)
● SQL model has tables and columns with specific types
● Druid native queries use type coercions at query time (e.g. user
specifies: treat column XYZ as “string”)
● Druid SQL populates catalog with latest metadata
56
Challenges: Lookups
Think back to lookups.
57
Lookup: “products”
id name
212 Office chair
998 Coffee mug, 2-pack
timestamp product_id country city gender age revenue
2030-01-01 212 US New York F 34 180.00
2030-01-01 998 FR Paris M 28 24.95
Challenges: Lookups
SQL experts may think of this as a JOIN.
SELECT
products.name,
SUM(sales.revenue)
FROM sales JOIN products ON sales.product_id = products.id
GROUP BY products.name
58
Challenges: Lookups
Druid SQL does not support JOINs, but provides a “LOOKUP”
function instead.
SELECT
LOOKUP(product_id, 'products') AS product_name,
SUM(sales.revenue)
FROM sales
GROUP BY product_name
59
Future work
● Druid features not supported in Druid SQL
○ Multi-value dimensions
○ Spatial filters
○ Theta sketches (approx. set intersection, differences)
● JOIN related
○ Allow users to write lookups as a SQL JOIN
○ Allow JOINs between two Druid datasources
● Others: SQL window functions, SQL UNION, GROUPING SETS
60
Try this at home
61
Download
Druid community site: http://druid.io/
Imply distribution: https://imply.io/get-started
62
Contribute
63
http://druid.io/community
https://github.com/druid-io/druid
Contribute
64
Druid has recently begun migration to the Apache Incubator.
Apache Druid is coming soon!