Streaming SQL w/ Kafka and Flink
Kenny Gorman
Co-Founder and CEO
www.eventador.io
Intro
● Co-founded Eventador; we are a team of streaming data nerds
● 20 years of data experience, especially SQL and relational databases
● Co-Founded a MongoDB as a service company (ObjectRocket)
● Eventador.io: Streaming SQL Engine—SQLStreamBuilder and Fully Managed Apache Flink
● SQL is experiencing a resurgence and is an important part of our future
Use Case - Aircraft ADSB data
Let’s use an abstract use case to discuss Streaming SQL, shall we?
Aircraft emit radio signals with data about their flight. Automatic dependent surveillance – broadcast
(ADS–B) is a standardized data format that allows for the transmission of data in real time from aircraft to
ground, and from aircraft to aircraft. Aircraft transmit and receive this data as a constant stream of data
once a second.
Check out: https://eventador.io/blog/planestream-the-ads-b-datasource/
Erik’s serious about this..
[Photo labels: Me, Erik, Dump1090, antirez]
Use Case - Aircraft ADSB data
{
  "speed": "390",
  "lon": "",
  "flight": "",
  "timestamp": 1566145812,
  "lat": "",
  "counter": 61,
  "icao": "A5739A",
  "msg_type": "4",
  "timestamp_verbose": "2019-08-18 16:30:12.846211",
  "vr": "1728",
  "track": "285",
  "altitude": ""
}
{
  "speed": "",
  "lon": "",
  "flight": "",
  "timestamp": 1566145821,
  "lat": "",
  "counter": 118,
  "icao": "A5739A",
  "msg_type": "5",
  "timestamp_verbose": "2019-08-18 16:30:21.314870",
  "vr": "",
  "track": "",
  "altitude": "15300"
}
{
  "speed": "",
  "lon": "-98.24491",
  "flight": "",
  "timestamp": 1566145821,
  "lat": "30.27925",
  "counter": 121,
  "icao": "A5739A",
  "msg_type": "3",
  "timestamp_verbose": "2019-08-18 16:30:21.769869",
  "vr": "",
  "track": "",
  "altitude": "15300"
}
Use Case - Aircraft ADSB data
Callouts: Nulls, Key, Kafka TimeStamp
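The slide calls out nulls, the key, and the Kafka timestamp. As a minimal sketch of the first point, the plain-Java helper below treats empty-string fields as missing values. It assumes the message has already been decoded into a map of strings; the class and method names are hypothetical and not part of any Eventador or Flink API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: treats empty-string ADS-B fields as missing values.
final class AdsbFields {
    // Returns the field as an Optional, mapping "" and absent keys to empty.
    static Optional<String> field(Map<String, String> msg, String name) {
        String v = msg.get(name);
        return (v == null || v.isEmpty()) ? Optional.empty() : Optional.of(v);
    }

    // Parses a numeric field, or returns empty when the field is missing.
    static Optional<Integer> intField(Map<String, String> msg, String name) {
        return field(msg, name).map(Integer::valueOf);
    }

    // A few fields of the msg_type 5 sample message from the slide.
    static Map<String, String> sample() {
        Map<String, String> msg = new HashMap<>();
        msg.put("icao", "A5739A");
        msg.put("msg_type", "5");
        msg.put("speed", "");
        msg.put("altitude", "15300");
        return msg;
    }

    public static void main(String[] args) {
        Map<String, String> msg = sample();
        System.out.println(intField(msg, "speed"));    // Optional.empty
        System.out.println(intField(msg, "altitude")); // Optional[15300]
    }
}
```

Normalizing early like this keeps later filters such as "altitude is present" from depending on string comparisons against ''.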
Data flow
Some sort of processing apparatus
Stream processor
We like Apache Flink because:
● Killer state management
● Rich API
● Production grade
● Killer scalability model
● Uses Apache Calcite for SQL
● Good community
https://eventador.io/blog/apache_flink_checkpoints_and_savepoints/
Data flow
Table API + SQL
SQL on streams
Streaming SQL is very similar to the SQL you know and love, but has some fundamental differences:
● Relations are expressed over time, not point in time
● Results don’t use a cursor; instead, an endless query keeps emitting results
● Time bounded queries matter
○ Event Time vs Processing Time
● It’s super easy to iterate and reason about your data!
● SELECT * FROM foo WHERE 1=0;
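To make the event time vs processing time point concrete, here is a small plain-Java sketch of how a timestamp maps to a tumbling window. It assumes windows are aligned to the Unix epoch and uses a made-up arrival time; the class name is hypothetical and this is not Flink's own code.

```java
// Hypothetical helper showing how a timestamp maps to a tumbling window.
final class TumblingWindows {
    // Start (inclusive) of the epoch-aligned window of the given size that contains ts.
    static long windowStart(long tsSeconds, long sizeSeconds) {
        return tsSeconds - Math.floorMod(tsSeconds, sizeSeconds);
    }

    public static void main(String[] args) {
        long eventTime = 1566145812L;      // "timestamp" field of the first sample message
        long processingTime = 1566145830L; // assumed arrival time, 18 seconds later
        System.out.println(windowStart(eventTime, 5));      // 1566145810
        System.out.println(windowStart(processingTime, 5)); // 1566145830
    }
}
```

The same message lands in a different 5-second window depending on which clock is used, which is why the choice of time attribute changes query results.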
Creating processors - java/scala
https://github.com/kgorman/TrafficAnalyzer/
Creating processors - the Schema meat
// define a schema: dataTypes[i] is the type of fieldNames[i]
String[] fieldNames = {
    "flight", "timestamp_verbose", "msg_type", "track",
    "timestamp", "altitude", "counter", "lon",
    "icao", "vr", "lat", "speed" };
TypeInformation<?>[] dataTypes = {
    Types.INT, Types.SQL_TIMESTAMP, Types.STRING, Types.STRING,
    Types.SQL_TIMESTAMP, Types.STRING, Types.STRING, Types.STRING,
    Types.STRING, Types.STRING, Types.STRING, Types.STRING };
Creating processors - the SQL meat
// define a simple filtering SQL statement
String sql = "SELECT icao, lat, lon, altitude FROM flights WHERE altitude <> ''";
// or maybe something more complicated..
// (timestamp is a reserved word in SQL, so it is quoted with backticks)
String windowedSql = "SELECT icao, MAX(altitude) "
    + "FROM flights "
    + "GROUP BY TUMBLE(`timestamp`, INTERVAL '5' SECOND), icao";
// apply that statement to the table
// (newer Flink releases name this method sqlQuery)
tableEnv.registerTableSource("flights", kafkaTableSource);
Table result = tableEnv.sql(sql);
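To see what the windowed query computes, the sketch below does the same grouping in plain Java over a finite list: maximum altitude per aircraft per 5-second tumbling window. It is not Flink code. The Reading class, the integer altitude (the slide's schema declares it as a string), and the second and third sample readings are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of: SELECT icao, MAX(altitude) ... GROUP BY TUMBLE(ts, 5 s), icao
final class MaxAltitudePerWindow {
    static final class Reading {
        final String icao;
        final long ts;       // event time, epoch seconds
        final int altitude;  // feet, assumed already parsed from the string field
        Reading(String icao, long ts, int altitude) {
            this.icao = icao;
            this.ts = ts;
            this.altitude = altitude;
        }
    }

    // Groups by (window start, icao) and keeps the maximum altitude per group.
    static Map<String, Integer> maxByWindow(List<Reading> readings, long sizeSeconds) {
        Map<String, Integer> result = new TreeMap<>();
        for (Reading r : readings) {
            long start = r.ts - Math.floorMod(r.ts, sizeSeconds);
            result.merge(start + "/" + r.icao, r.altitude, Integer::max);
        }
        return result;
    }

    // First reading is from the slide; the other two are made up.
    static List<Reading> sample() {
        return Arrays.asList(
            new Reading("A5739A", 1566145821L, 15300),
            new Reading("A5739A", 1566145823L, 15400),
            new Reading("A5739A", 1566145826L, 15500));
    }

    public static void main(String[] args) {
        // prints {1566145820/A5739A=15400, 1566145825/A5739A=15500}
        System.out.println(maxByWindow(sample(), 5));
    }
}
```

A stream processor produces the same rows incrementally, emitting each window's result once that window closes rather than after all input has been read.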
Data flow
Using SQL against aircraft data
{
"output_timestamp": 1566145821,
"icao": "A5739A",
"speed": "390",
"altitude": "15300"
}
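The slides do not show the query that produced this record. As one hedged interpretation, the plain-Java sketch below folds the messages for an aircraft in arrival order and keeps the last non-empty value of each field, which yields the same speed and altitude. The class and helper names are hypothetical, and only a subset of the fields is modeled.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapse partial ADS-B messages into one record per aircraft.
final class LatestState {
    // Folds messages in arrival order, keeping the last non-empty value of each field.
    static Map<String, String> merge(List<Map<String, String>> messages) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Map<String, String> msg : messages) {
            for (Map.Entry<String, String> e : msg.entrySet()) {
                if (!e.getValue().isEmpty()) {
                    state.put(e.getKey(), e.getValue());
                }
            }
        }
        return state;
    }

    // Builds a message from alternating key/value arguments.
    static Map<String, String> msg(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // Selected fields of the msg_type 4 and msg_type 5 samples from the slides.
    static List<Map<String, String>> sample() {
        return Arrays.asList(
            msg("icao", "A5739A", "timestamp", "1566145812", "speed", "390", "altitude", ""),
            msg("icao", "A5739A", "timestamp", "1566145821", "speed", "", "altitude", "15300"));
    }

    public static void main(String[] args) {
        // prints {icao=A5739A, timestamp=1566145821, speed=390, altitude=15300}
        System.out.println(merge(sample()));
    }
}
```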
Streaming SQL - some hints
-- eventTimestamp is the Kafka timestamp
-- as unix timestamp. Magically added to every schema.
SELECT max(eventTimestamp) FROM solar_inputs;
-- make it human readable
SELECT CAST(max(eventTimestamp) AS varchar) as TS FROM solar_inputs;
-- date math with interval
SELECT * FROM payments
WHERE eventTimestamp > CURRENT_TIMESTAMP-interval '10' second;
-- detect multiple auths in a short window and
-- send to lock account topic/microservice
SELECT card,
MAX(amount) as theamount,
TUMBLE_END(eventTimestamp, interval '5' minute) as ts
FROM payments
WHERE lat IS NOT NULL
AND lon IS NOT NULL
GROUP BY card, TUMBLE(eventTimestamp, interval '5' minute)
HAVING COUNT(*) > 4 -- >4==fraud
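The logic of the query above (count authorizations per card per 5-minute tumbling window, keep the groups with more than 4) can be sketched in plain Java over a finite list. The Auth class, the sample data, and the "card@windowStart" key format are assumptions for illustration; the MAX(amount) column and the lat/lon filter are left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java sketch of GROUP BY card, TUMBLE(...) HAVING COUNT(*) > threshold.
final class AuthBursts {
    static final class Auth {
        final String card;
        final long ts; // event time, epoch seconds
        Auth(String card, long ts) {
            this.card = card;
            this.ts = ts;
        }
    }

    // Returns "card@windowStart" for every group with more than `threshold` auths.
    static Set<String> flagged(List<Auth> auths, long windowSeconds, int threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (Auth a : auths) {
            long start = a.ts - Math.floorMod(a.ts, windowSeconds);
            counts.merge(a.card + "@" + start, 1, Integer::sum);
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Made-up data: five auths for one card in one window, two spread out for another.
    static List<Auth> sample() {
        return Arrays.asList(
            new Auth("1234", 0L), new Auth("1234", 10L), new Auth("1234", 20L),
            new Auth("1234", 30L), new Auth("1234", 40L),
            new Auth("9999", 5L), new Auth("9999", 400L));
    }

    public static void main(String[] args) {
        System.out.println(flagged(sample(), 300, 4)); // prints [1234@0]
    }
}
```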
-- unnest each array element as separate row
SELECT b.*, u.*
FROM bgp_avro b,
UNNEST(b.path) AS u(pathitem)
-- union two different virtual tables
SELECT * FROM clickstream
WHERE useragent = 'Chrome/62.0.3202.84 Mobile Safari/537.36'
UNION ALL
SELECT * FROM clickstream
WHERE useragent = 'Version/4.0 Chrome/58.0.3029.83 Mobile Safari/537.36'
-- inline math
SELECT (amount+10)*upcharge AS total_amount
FROM payments
WHERE account_type = 'merchant'
-- convert C to F
SELECT (temp*1.8)+32 AS temp_fahrenheit
FROM reactor_core_sensors;
-- join multiple streams
SELECT o.name,
SUM(d.clicks),
HOP_END(r.eventTimestamp, interval '20' second, interval '40' second)
FROM click_stream o JOIN orgs r ON o.org_id = r.org_id
JOIN models d ON d.org_id = r.org_id
GROUP BY o.name,
HOP(r.eventTimestamp, interval '20' second, interval '40' second)
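Unlike a tumbling window, a hopping window with a 20-second slide and a 40-second size overlaps its neighbours, so each event belongs to two windows. The plain-Java sketch below lists the window starts for a timestamp, assuming epoch-aligned windows; the class name is hypothetical and this is not Flink's own implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: which hopping windows contain a given timestamp?
final class HoppingWindows {
    // Returns the start of every [start, start + size) window that contains ts,
    // newest first, for windows that begin every `slide` seconds.
    static List<Long> windowStarts(long ts, long slide, long size) {
        List<Long> starts = new ArrayList<>();
        long last = ts - Math.floorMod(ts, slide);
        for (long s = last; s > ts - size; s -= slide) {
            starts.add(s);
        }
        return starts;
    }

    public static void main(String[] args) {
        // An event at t=65 falls into [60, 100) and [40, 80).
        System.out.println(windowStarts(65L, 20L, 40L)); // prints [60, 40]
    }
}
```

HOP_END for each of these windows is the start plus the size, so the event at t=65 contributes to results stamped 100 and 80.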
BUT!
● Java coding required, easy to get started, gets tricky as you grow
● CI/CD pipeline
● Not iterative
● It’s a bit like punch cards eh?
Demo
We are hiring!
● We are hiring!
○ Full stack engineers
○ Java/Scala engineers
● Eventador.io SQLStreamBuilder Early Access ← join!
kgorman@eventador.io
If you liked this, you might also like: Secrets of the sky
Thank You
hello@eventador.io

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Streaming SQL w/ Kafka and Flink

  • 1. Streaming SQL w/ Kafka and Flink Kenny Gorman Co-Founder and CEO www.eventador.io
  • 2. Intro ● Co-Founded Eventador, we are a team of streaming data nerds ● 20 years of data experience, especially SQL and relational databases ● Co-Founded a MongoDB as a service company (ObjectRocket) ● Eventador.io: Streaming SQL Engine (SQLStreamBuilder) and Fully Managed Apache Flink ● SQL is experiencing a resurgence, and is an important part of our future
  • 3. Use Case - Aircraft ADSB data Let’s use an abstract use case to discuss Streaming SQL, shall we? Aircraft emit radio signals with data about their flight. Automatic dependent surveillance – broadcast (ADS–B) is a standardized data format that allows for the transmission of data in real time from aircraft to ground, and from aircraft to aircraft. Aircraft transmit and receive this data as a constant stream, once a second. Check out: https://eventador.io/blog/planestream-the-ads-b-datasource/
  • 4. Erik’s serious about this.. Me Erik
  • 6. Use Case - Aircraft ADSB data { "speed": "390", "lon": "", "flight": "", "timestamp": 1566145812, "lat": "", "counter": 61, "icao": "A5739A", "msg_type": "4", "timestamp_verbose": "2019-08-18 16:30:12.846211", "vr": "1728", "track": "285", "altitude": "" } { "speed": "", "lon": "", "flight": "", "timestamp": 1566145821, "lat": "", "counter": 118, "icao": "A5739A", "msg_type": "5", "timestamp_verbose": "2019-08-18 16:30:21.314870", "vr": "", "track": "", "altitude": "15300" } { "speed": "", "lon": "-98.24491", "flight": "", "timestamp": 1566145821, "lat": "30.27925", "counter": 121, "icao": "A5739A", "msg_type": "3", "timestamp_verbose": "2019-08-18 16:30:21.769869", "vr": "", "track": "", "altitude": "15300" }
  • 7. Use Case - Aircraft ADSB data { "speed": "390", "lon": "", "flight": "", "timestamp": 1566145812, "lat": "", "counter": 61, "icao": "A5739A", "msg_type": "4", "timestamp_verbose": "2019-08-18 16:30:12.846211", "vr": "1728", "track": "285", "altitude": "" } { "speed": "", "lon": "", "flight": "", "timestamp": 1566145821, "lat": "", "counter": 118, "icao": "A5739A", "msg_type": "5", "timestamp_verbose": "2019-08-18 16:30:21.314870", "vr": "", "track": "", "altitude": "15300" } { "speed": "", "lon": "-98.24491", "flight": "", "timestamp": 1566145821, "lat": "30.27925", "counter": 121, "icao": "A5739A", "msg_type": "3", "timestamp_verbose": "2019-08-18 16:30:21.769869", "vr": "", "track": "", "altitude": "15300" } Slide annotations: Nulls, Key, Kafka Timestamp
  • 8. Data flow Some sort of processing apparatus
  • 9. Stream processor We like Apache Flink because: ● Killer state management ● Rich API ● Production grade ● Killer scalability model ● Uses Apache Calcite for SQL ● Good community https://eventador.io/blog/apache_flink_checkpoints_and_savepoints/
  • 11. Table API + SQL
  • 12. SQL on streams Streaming SQL is very similar to the SQL you know and love, but has some fundamental differences: ● Relations are expressed over time, not at a point in time ● Results don’t come back through a cursor; rather, an endless query keeps emitting results ● Time bounded queries matter ○ Event Time vs Processing Time ● It’s super easy to iterate and reason about your data! ● SELECT * FROM foo WHERE 1=0;
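To make the "Event Time vs Processing Time" bullet concrete, here is a minimal plain-Java sketch (no Flink involved). The `WindowClock` class and the 9-second arrival delay are assumptions made up for illustration; only the event timestamp comes from the sample data. It shows that the same message can fall into a different 5-second window depending on which clock is used.

```java
// Hypothetical helper, not part of Flink: shows how the choice of clock
// changes which time-bounded window a message is counted in.
class WindowClock {
    // Start (epoch seconds) of the fixed-size window that contains ts,
    // with windows aligned to the epoch.
    static long windowStart(long ts, long sizeSeconds) {
        return ts - Math.floorMod(ts, sizeSeconds);
    }

    public static void main(String[] args) {
        long size = 5;
        long eventTime = 1566145812L;     // "timestamp" of the first sample message
        long arrivalTime = eventTime + 9; // assumed: processed 9 seconds after it was emitted

        System.out.println("event-time window starts at      " + windowStart(eventTime, size));
        System.out.println("processing-time window starts at " + windowStart(arrivalTime, size));
    }
}
```

With event time the result depends only on the data, so replaying the same topic should give the same windows; with processing time it depends on when the processor happened to see each message.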
  • 13. Creating processors - java/scala https://github.com/kgorman/TrafficAnalyzer/
  • 14. Creating processors - the Schema meat
      // define a schema: fieldNames[i] is paired with dataTypes[i]
      String[] fieldNames = { "flight", "timestamp_verbose", "msg_type", "track",
                              "timestamp", "altitude", "counter", "lon",
                              "icao", "vr", "lat", "speed" };
      TypeInformation<?>[] dataTypes = { Types.INT, Types.SQL_TIMESTAMP, Types.STRING, Types.STRING,
                                         Types.SQL_TIMESTAMP, Types.STRING, Types.STRING, Types.STRING,
                                         Types.STRING, Types.STRING, Types.STRING, Types.STRING };
  • 15. Creating processors - the SQL meat
      // define a simple filtering SQL statement
      String sql = "SELECT icao, lat, lon, altitude FROM flights WHERE altitude <> ''";
      // or maybe something more complicated..
      String windowedSql = "SELECT icao, max(altitude) FROM flights GROUP BY tumble(timestamp, INTERVAL '5' SECOND), icao";
      // apply that statement to the table
      tableEnv.registerTableSource("flights", kafkaTableSource);
      Table result = tableEnv.sql(sql);
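As a rough illustration of what the tumbling-window query on slide 15 computes, the plain-Java sketch below groups reports into 5-second buckets per `icao` and keeps the maximum altitude. This is not how Flink executes the query (there is no state backend or watermarking here); the `TumblingMaxAltitude` class, the `Report` type, and the two extra readings in `main` are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical batch re-creation of:
//   SELECT icao, max(altitude) FROM flights
//   GROUP BY tumble(timestamp, INTERVAL '5' SECOND), icao
class TumblingMaxAltitude {
    static final class Report {
        final String icao;
        final long timestamp; // epoch seconds (event time)
        final int altitude;
        Report(String icao, long timestamp, int altitude) {
            this.icao = icao;
            this.timestamp = timestamp;
            this.altitude = altitude;
        }
    }

    // Returns the max altitude keyed by "windowStart|icao".
    static Map<String, Integer> maxPerWindow(List<Report> reports, long sizeSeconds) {
        Map<String, Integer> result = new TreeMap<>();
        for (Report r : reports) {
            long start = r.timestamp - Math.floorMod(r.timestamp, sizeSeconds);
            result.merge(start + "|" + r.icao, r.altitude, Integer::max);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Report> reports = Arrays.asList(
            new Report("A5739A", 1566145821L, 15300), // from the sample data
            new Report("A5739A", 1566145823L, 15400), // assumed extra reading
            new Report("A5739A", 1566145826L, 15200)  // assumed extra reading
        );
        maxPerWindow(reports, 5).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

The key difference from the streaming version is that this sketch sees all rows at once, whereas a stream processor has to decide when a window is complete before emitting its result.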
  • 17. Using SQL against aircraft data { "speed": "390", "lon": "", "flight": "", "timestamp": 1566145812, "lat": "", "counter": 61, "icao": "A5739A", "msg_type": "4", "timestamp_verbose": "2019-08-18 16:30:12.846211", "vr": "1728", "track": "285", "altitude": "" } { "speed": "", "lon": "", "flight": "", "timestamp": 1566145821, "lat": "", "counter": 118, "icao": "A5739A", "msg_type": "5", "timestamp_verbose": "2019-08-18 16:30:21.314870", "vr": "", "track": "", "altitude": "15300" } { "speed": "", "lon": "-98.24491", "flight": "", "timestamp": 1566145821, "lat": "30.27925", "counter": 121, "icao": "A5739A", "msg_type": "3", "timestamp_verbose": "2019-08-18 16:30:21.769869", "vr": "", "track": "", "altitude": "15300" }
  • 18. Using SQL against aircraft data { "output_timestamp": 1566145821, "icao": "A5739A", "speed": "390", "altitude": "15300" }
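The slides do not show the query that turns the sparse messages on slide 17 into the combined record on slide 18, so the following is only one plausible reading: keep the latest non-empty value of each field per aircraft. The `LatestState` class and its helper methods are hypothetical, written in plain Java for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: fold sparse ADS-B messages into one "latest known" record.
class LatestState {
    // Non-empty fields in the message overwrite older values; empty strings are ignored.
    static void apply(Map<String, String> state, Map<String, String> message) {
        for (Map.Entry<String, String> e : message.entrySet()) {
            if (e.getValue() != null && !e.getValue().isEmpty()) {
                state.put(e.getKey(), e.getValue());
            }
        }
    }

    // Small helper to build a message from alternating key/value arguments.
    static Map<String, String> msg(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> state = new LinkedHashMap<>();
        // msg_type 4 carries speed but no altitude; msg_type 5 carries altitude but no speed.
        apply(state, msg("icao", "A5739A", "speed", "390", "altitude", ""));
        apply(state, msg("icao", "A5739A", "speed", "", "altitude", "15300"));
        System.out.println(state);
    }
}
```

After both messages are applied, the state holds speed 390 and altitude 15300 for A5739A, matching the fields shown on slide 18.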
  • 19. Streaming SQL - some hints
      -- eventTimestamp is the Kafka timestamp
      -- as unix timestamp. Magically added to every schema.
      SELECT max(eventTimestamp) FROM solar_inputs;

      -- make it human readable
      SELECT CAST(max(eventTimestamp) AS varchar) as TS FROM solar_inputs;

      -- date math with interval
      SELECT * FROM payments
      WHERE eventTimestamp > CURRENT_TIMESTAMP-interval '10' second;

      -- detect multiple auths in a short window and
      -- send to lock account topic/microservice
      SELECT card, MAX(amount) as theamount,
             TUMBLE_END(eventTimestamp, interval '5' minute) as ts
      FROM payments
      WHERE lat IS NOT NULL AND lon IS NOT NULL
      GROUP BY card, TUMBLE(eventTimestamp, interval '5' minute)
      HAVING COUNT(*) > 4 -- >4==fraud

      -- unnest each array element as separate row
      SELECT b.*, u.* FROM bgp_avro b, UNNEST(b.path) AS u(pathitem)

      -- union two different virtual tables
      SELECT * FROM clickstream
      WHERE useragent = 'Chrome/62.0.3202.84 Mobile Safari/537.36'
      UNION ALL
      SELECT * FROM clickstream
      WHERE useragent = 'Version/4.0 Chrome/58.0.3029.83 Mobile Safari/537.36'

      -- inline math
      SELECT (amount+10)*upcharge AS total_amount
      FROM payments
      WHERE account_type = 'merchant'

      -- convert C to F
      SELECT (temp*1.8)+32 AS temp_fahrenheit FROM reactor_core_sensors;

      -- join multiple streams
      SELECT o.name, SUM(d.clicks),
             HOP_END(r.eventTimestamp, interval '20' second, interval '40' second)
      FROM click_stream o
      JOIN orgs r ON o.org_id = r.org_id
      JOIN models d ON d.org_id = r.org_id
      GROUP BY o.name, HOP(r.eventTimestamp, interval '20' second, interval '40' second)
  • 20. BUT! ● Java coding required, easy to get started, gets tricky as you grow ● CI/CD pipeline ● Not iterative ● It’s a bit like punch cards eh?
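The last hint groups by HOP with a 20-second slide and a 40-second size, which means windows overlap and each event is counted in more than one of them. The plain-Java sketch below lists the windows a single timestamp belongs to. The `HopWindows` class is hypothetical and assumes windows aligned to the epoch with no offset; it is meant to illustrate the idea, not to reproduce Flink's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: which hopping (sliding) windows contain a given timestamp?
class HopWindows {
    // Returns the start of every window [start, start + size) that contains ts,
    // newest window first. Windows begin every `slide` seconds.
    static List<Long> windowStarts(long ts, long slide, long size) {
        List<Long> starts = new ArrayList<>();
        long last = ts - Math.floorMod(ts, slide); // most recent window start at or before ts
        for (long start = last; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // slide = 20 s, size = 40 s, as in HOP(..., interval '20' second, interval '40' second)
        for (long start : windowStarts(1566145821L, 20, 40)) {
            System.out.println("[" + start + ", " + (start + 40) + ")");
        }
    }
}
```

With size = 2 x slide, every event lands in exactly two windows, so sums such as `SUM(d.clicks)` are reported once per overlapping window rather than once overall.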
  • 21. Demo
  • 22. We are hiring! ● We are hiring! ○ Full stack engineers ○ Java/Scala engineers ● Eventador.io SQLStreamBuilder Early Access ← join! kgorman@eventador.io If you liked this, you might also like: Secrets of the sky