Streaming SQL with Kafka and Flink
1. Streaming SQL w/ Kafka and Flink
Kenny Gorman
Co-Founder and CEO
www.eventador.io
2. Intro
- Co-founded Eventador; we are a team of streaming data nerds
- 20 years of data experience, especially SQL and relational databases
- Co-founded a MongoDB-as-a-service company (ObjectRocket)
- Eventador.io: Streaming SQL Engine (SQLStreamBuilder) and Fully Managed Apache Flink
- SQL is experiencing a resurgence, and is an important part of our future
3. Use Case - Aircraft ADSB data
Let's use an abstract use case to discuss Streaming SQL, shall we?
Aircraft emit radio signals with data about their flight. Automatic Dependent Surveillance-Broadcast
(ADS-B) is a standardized data format that allows for the transmission of data in real time from aircraft to
ground, and from aircraft to aircraft. Aircraft transmit and receive this data as a constant stream,
once a second.
Check out: https://eventador.io/blog/planestream-the-ads-b-datasource/
9. Stream processor
We like Apache Flink because...
- Killer state management
- Rich API
- Production grade
- Killer scalability model
- Uses Apache Calcite for SQL
- Good community
https://eventador.io/blog/apache_flink_checkpoints_and_savepoints/
12. SQL on streams
Streaming SQL is very similar to the SQL you know and love, but has some fundamental differences:
- Relations are expressed over time, not at a point in time
- Results don't use a cursor; rather, an endless query emits results
- Time-bounded queries matter
- Event time vs. processing time
- It's super easy to iterate and reason about your data!
- SELECT * FROM foo WHERE 1=0;
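To make the event time vs. processing time point concrete, here is a minimal plain-Java sketch (not Flink code) of how a timestamp could be mapped to a tumbling window. The class name, the method names, and the 4.5 second arrival delay are assumptions made up for illustration.

```java
class TumbleWindows {
    // Start of the tumbling window that contains the given timestamp (epoch millis).
    static long windowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    // Exclusive end of that same window.
    static long windowEnd(long timestampMillis, long sizeMillis) {
        return windowStart(timestampMillis, sizeMillis) + sizeMillis;
    }

    public static void main(String[] args) {
        long size = 5_000L;                        // 5-second tumbling windows
        long eventTime = 1_566_145_821_000L;       // when the aircraft emitted the message
        long processingTime = eventTime + 4_500L;  // assumed arrival time at the processor

        // The same message lands in different windows depending on which clock is used.
        System.out.println(windowStart(eventTime, size));      // 1566145820000
        System.out.println(windowStart(processingTime, size)); // 1566145825000
    }
}
```

With event time, a late-arriving message is still counted in the window during which it was emitted; with processing time, it is counted in whichever window is open when it arrives.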
15. Creating processors - the SQL meat
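// register the Kafka-backed source as the "flights" table before querying it
tableEnv.registerTableSource("flights", kafkaTableSource);
// define a simple filtering SQL statement
String filterSql = "SELECT icao, lat, lon, altitude FROM flights WHERE altitude <> ''";
// or maybe something more complicated: max altitude per aircraft per 5-second tumbling window
// (timestamp is a reserved word in Flink SQL, hence the backticks)
String windowSql = "SELECT icao, MAX(altitude) "
    + "FROM flights "
    + "GROUP BY TUMBLE(`timestamp`, INTERVAL '5' SECOND), icao";
// apply one of the statements to the table (older Flink releases name this method sql())
Table result = tableEnv.sqlQuery(windowSql);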
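For intuition about what the windowed statement computes, here is a rough plain-Java emulation over a small in-memory batch. It is only a sketch: the Flight class, the "windowStart|icao" key format, and the sample values are hypothetical, and a real streaming engine emits each window's result incrementally rather than after reading a finite list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TumbleMaxSketch {
    // Hypothetical, simplified flight event; real ADS-B messages carry more fields.
    static final class Flight {
        final String icao;
        final long timestampMillis;
        final int altitude;

        Flight(String icao, long timestampMillis, int altitude) {
            this.icao = icao;
            this.timestampMillis = timestampMillis;
            this.altitude = altitude;
        }
    }

    // Groups by (tumbling window start, icao) and keeps the maximum altitude per group.
    static Map<String, Integer> maxAltitudePerWindow(List<Flight> flights, long sizeMillis) {
        Map<String, Integer> maxByKey = new TreeMap<>();
        for (Flight f : flights) {
            long windowStart = f.timestampMillis - Math.floorMod(f.timestampMillis, sizeMillis);
            maxByKey.merge(windowStart + "|" + f.icao, f.altitude, Integer::max);
        }
        return maxByKey;
    }

    public static void main(String[] args) {
        List<Flight> flights = Arrays.asList(
            new Flight("A5739A", 1_000L, 15300),
            new Flight("A5739A", 4_000L, 15400),
            new Flight("A5739A", 6_000L, 15350));
        // prints {0|A5739A=15400, 5000|A5739A=15350}
        System.out.println(maxAltitudePerWindow(flights, 5_000L));
    }
}
```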
18. Using SQL against aircraft data
{
  "output_timestamp": 1566145821,
  "icao": "A5739A",
  "speed": "390",
  "altitude": "15300"
}
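Note that speed and altitude arrive as strings in this sample, which is why the earlier filter compares altitude against an empty string. A small Java sketch of turning such a field into a number, treating blank or non-numeric values as absent, might look like this. The class and method names are hypothetical, and the input is assumed to be already extracted from the JSON.

```java
import java.util.OptionalInt;

class AdsbFields {
    // Converts a string-typed numeric field to an int; empty when the value is missing or unusable.
    static OptionalInt parseIntField(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIntField("15300")); // OptionalInt[15300]
        System.out.println(parseIntField(""));      // OptionalInt.empty
    }
}
```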
19. Streaming SQL - some hints
-- eventTimestamp is the Kafka timestamp
-- as unix timestamp. Magically added to every schema.
SELECT max(eventTimestamp) FROM solar_inputs;
-- make it human readable
SELECT CAST(max(eventTimestamp) AS varchar) AS TS
FROM solar_inputs;
-- date math with interval
SELECT * FROM payments
WHERE eventTimestamp > CURRENT_TIMESTAMP - interval '10' second;
-- detect multiple auths in a short window and
-- send to lock account topic/microservice
SELECT card,
       MAX(amount) AS theamount,
       TUMBLE_END(eventTimestamp, interval '5' minute) AS ts
FROM payments
WHERE lat IS NOT NULL
  AND lon IS NOT NULL
GROUP BY card, TUMBLE(eventTimestamp, interval '5' minute)
HAVING COUNT(*) > 4 -- more than 4 auths per window is flagged as fraud
-- unnest each array element as separate row
SELECT b.*, u.*
FROM bgp_avro b,
UNNEST(b.path) AS u(pathitem)
-- union two different virtual tables
SELECT * FROM clickstream
WHERE useragent = 'Chrome/62.0.3202.84 Mobile Safari/537.36'
UNION ALL
SELECT * FROM clickstream
WHERE useragent = 'Version/4.0 Chrome/58.0.3029.83 Mobile Safari/537.36'
-- inline math
SELECT (amount+10)*upcharge AS total_amount
FROM payments
WHERE account_type = 'merchant'
-- convert C to F
SELECT (temp * 1.8) + 32 AS temp_fahrenheit
FROM reactor_core_sensors;
-- join multiple streams
SELECT o.name,
       SUM(d.clicks),
       HOP_END(r.eventTimestamp, interval '20' second, interval '40' second)
FROM click_stream o
JOIN orgs r ON o.org_id = r.org_id
JOIN models d ON d.org_id = r.org_id
GROUP BY o.name,
         HOP(r.eventTimestamp, interval '20' second, interval '40' second)
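Unlike TUMBLE, a HOP window overlaps its neighbors, so one event can belong to several windows. The plain-Java sketch below lists the windows a timestamp falls into, assuming the HOP arguments are read as slide first (20 seconds) and size second (40 seconds). The class and method names are hypothetical and this is not Flink's own code.

```java
import java.util.ArrayList;
import java.util.List;

class HopWindows {
    // Start times (epoch millis) of every hopping window that contains the timestamp.
    static List<Long> windowStarts(long timestampMillis, long slideMillis, long sizeMillis) {
        List<Long> starts = new ArrayList<>();
        // Latest window start at or before the timestamp, aligned to the slide.
        long lastStart = timestampMillis - Math.floorMod(timestampMillis, slideMillis);
        // Walk backwards while the window [start, start + size) still covers the timestamp.
        for (long start = lastStart; start > timestampMillis - sizeMillis; start -= slideMillis) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // slide = 20 s, size = 40 s: each event falls into size / slide = 2 windows
        System.out.println(windowStarts(50_000L, 20_000L, 40_000L)); // [40000, 20000]
    }
}
```

Under these assumptions, HOP_END for each window is its start plus the size, so the event at 50000 contributes to windows ending at 80000 and 60000.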
20. BUT!
- Java coding required, easy to get started, gets tricky as you grow
- CI/CD pipeline
- Not iterative
- It's a bit like punch cards, eh?
22. We are hiring!
- We are hiring!
- Full stack engineers
- Java/Scala engineers
- Eventador.io SQLStreamBuilder Early Access: join!
kgorman@eventador.io