Stream processing is a concept used to act on real-time streaming data. This session shows and demos how teams in different industries leverage the Streams API from Apache Kafka to build and deploy mission-critical, real-time streaming applications and microservices.
The session discusses important streaming concepts like local and distributed state management, exactly-once semantics, embedding streaming into any application, and deployment to any infrastructure. It then explains key advantages of Kafka's Streams API, like distributed processing and fault tolerance with fast failover, no-downtime rolling deployments, and the ability to reprocess events so you can recalculate output when your code changes.
The session also introduces KSQL, the streaming SQL engine for Apache Kafka, which lets you write streaming SQL queries with the scalability, throughput, and failover of Kafka Streams under the hood.
The end of the session demos how to combine custom code with your streams application (either Kafka Streams or KSQL), using as an example an analytic model built with a machine learning framework such as Apache Spark ML or TensorFlow.
Rethinking Stream Processing with Apache Kafka, Kafka Streams and KSQL
1. RETHINKING
Stream Processing
With Apache Kafka, Kafka Streams and KSQL
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de
41.
Use Case:
Airline Flight Delay Prediction
Machine Learning Algorithm:
Neural Network
built with H2O and TensorFlow
Streaming Platform:
Apache Kafka and Kafka Streams
Live Demo
42.
H2O.ai Model +
Kafka Streams
Filter
Map
1) Create H2O Deep Learning model
2) Configure Kafka Streams Application
3) Apply H2O model to Streaming Data
4) Start Kafka Streams App
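The four steps above boil down to a filter → map pipeline in which the map step invokes the trained model. A minimal sketch of that pattern in plain Python, standing in for the Kafka Streams topology; `FlightDelayModel`, its toy rule, and the record fields are made up for illustration:

```python
class FlightDelayModel:
    """Toy stand-in for a model exported from H2O or TensorFlow."""
    def predict(self, features):
        # Hypothetical rule: a long taxi-out time suggests a delay.
        return "delayed" if features["taxi_out_minutes"] > 20 else "on_time"

def process_stream(records, model):
    results = []
    for record in records:                    # stream source (one event per flight)
        if record.get("origin") is None:      # filter(): drop incomplete events
            continue
        prediction = model.predict(record)    # map()/mapValues(): apply the model
        results.append({**record, "prediction": prediction})
    return results                            # sink: write enriched events out

flights = [
    {"origin": "SFO", "taxi_out_minutes": 25},
    {"origin": None, "taxi_out_minutes": 5},
    {"origin": "JFK", "taxi_out_minutes": 10},
]
predictions = process_stream(flights, FlightDelayModel())
print(predictions)
```

In the real demo the same logic runs as a Kafka Streams application, with the model applied inside the stream's map step rather than in a Python loop.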
44.
What if you are NOT a Java Coder?
[Chart: Population vs. Coding Sophistication. Java and the Kafka Streams API cover the existing realm of stream processing: core developers. KSQL opens a new, expanded realm: BI analysts, data engineers, and core developers who don't like Java.]
48.
What is it for?
Streaming ETL
• Kafka is popular for data pipelines
• KSQL enables easy transformations of data within the pipe
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
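The statement above is a stream-table left join followed by a filter. The same logic, sketched in plain Python to show what KSQL does per event (field names follow the KSQL example; the data itself is made up):

```python
users = {  # the `users` table, keyed by user_id
    "u1": {"level": "Platinum"},
    "u2": {"level": "Gold"},
}

clickstream = [  # the `clickstream` stream
    {"userid": "u1", "page": "/checkout", "action": "buy"},
    {"userid": "u2", "page": "/home", "action": "view"},
    {"userid": "u3", "page": "/home", "action": "view"},  # no matching user row
]

vip_actions = []
for event in clickstream:
    user = users.get(event["userid"])  # LEFT JOIN users u ON c.userid = u.user_id
    if user is not None and user["level"] == "Platinum":  # WHERE u.level = 'Platinum'
        vip_actions.append({k: event[k] for k in ("userid", "page", "action")})

print(vip_actions)
```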
49.
What is it for?
Analytics, e.g. Anomaly Detection
• Identifying patterns or anomalies in real-time data, surfaced in milliseconds
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTES)
GROUP BY card_number
HAVING count(*) > 3;
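Under the hood, the tumbling window assigns each event to a fixed five-minute bucket and counts per key within it. A plain-Python sketch of that aggregation (timestamps are epoch seconds; the events are invented):

```python
from collections import Counter

WINDOW_SECONDS = 5 * 60

def possible_fraud(attempts):
    counts = Counter()
    for ts, card_number in attempts:
        window_start = ts - (ts % WINDOW_SECONDS)  # WINDOW TUMBLING (SIZE 5 MINUTES)
        counts[(window_start, card_number)] += 1   # GROUP BY card_number
    return {key: n for key, n in counts.items() if n > 3}  # HAVING count(*) > 3

attempts = [(t, "4111") for t in (0, 10, 20, 30)] + [(400, "5500")]
print(possible_fraud(attempts))
```

Card 4111 makes four attempts inside one window and is flagged; card 5500's single attempt in a later window is not.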
50.
What is it for?
Real Time Monitoring
• Log data monitoring, tracking and alerting
• Sensor / IoT data
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
51.
KSQL is equally viable for S / M / L / XL / XXL use cases