@rmoff robin@confluent.io
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Robin Moffatt, Developer Advocate
@rmoff / Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL 2
$ whoami
• Developer Advocate @ Confluent
• Working in data & analytics since 2001
• Oracle ACE Director & Dev Champion
• Blogging : https://rmoff.net & http://cnfl.io/rmoff
• Twitter: @rmoff
Housekeeping Items
● This session will last about an hour.
● This session will be recorded.
● You can submit your questions by entering them into the GoToWebinar panel.
● The last 10-15 minutes will consist of Q&A.
● The slides and recording will be available after the talk.
Streaming ETL with Apache Kafka and KSQL
Database offload → Hadoop/Object Storage/Cloud DW for Analytics
[Diagram: RDBMS, HDFS / S3 / BigQuery etc]
Streaming ETL with Apache Kafka and KSQL
[Diagram: RDBMS, order items, customer, Stream Processing, customer orders]
Real-time Event Stream Enrichment with Apache Kafka and KSQL
[Diagram: order events, RDBMS, customer, Stream Processing, customer orders, <y>]
Transform Once, Use Many
[Diagram: order events, RDBMS, customer, Stream Processing, customer orders, <y>, New App <x>]
Transform Once, Use Many
[Diagram: order events, RDBMS, customer, Stream Processing, customer orders, <y>, HDFS / S3 / etc, New App <x>]
The Connect API of Apache Kafka®
Reliable and scalable integration of Kafka with other systems – no coding required.
✓ Centralized management and configuration
✓ Support for hundreds of technologies including RDBMS, Elasticsearch, HDFS, S3, syslog
✓ Supports CDC ingest of events from RDBMS
✓ Preserves data schema
✓ Fault tolerant and automatically load balanced
✓ Extensible API
✓ Single Message Transforms
✓ Part of Apache Kafka, included in Confluent Open Source
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
  "table.whitelist": "sales,orders,customers"
}
https://docs.confluent.io/current/connect/
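As a rough illustration of the "no coding required" point, the JDBC source configuration on this slide could be registered through the Kafka Connect REST API, which accepts a POST to /connectors with a name and a config object. The sketch below only builds the request and does not send it. The connector name "jdbc-source-demo" and the worker address http://localhost:8083 are assumptions for illustration and are not from the slides.

```python
import json
import urllib.request


def build_connector_request(connect_url, name, config):
    """Build (but do not send) a POST request that would register a connector."""
    payload = {"name": name, "config": config}
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Configuration copied from the slide.
config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
    "table.whitelist": "sales,orders,customers",
}

# The connector name and worker address are placeholders (assumptions).
request = build_connector_request("http://localhost:8083", "jdbc-source-demo", config)

# urllib.request.urlopen(request) would submit it to a running Connect worker.
```

Keeping the request construction separate from the network call makes the payload easy to inspect before anything is sent to a worker.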
Kafka Connect
[Diagram: Sources and Sinks (Amazon S3, syslog, flat file, CSV, JSON, MQTT), Kafka Connect Tasks and Workers, Kafka Brokers]
Considerations for Integration into Apache Kafka
Photo by Matthew Smith on Unsplash
• Chucking data over the fence into a Kafka topic is not enough
• We need standard ways of building data pipelines in Kafka
• Schema handling
• Serialisation formats
Considerations for Integration into Apache Kafka
Photo by Matthew Smith on Unsplash
• Confluent Schema Registry & Avro are a great way to do this
• Downstream users of the data can then easily use the data
• KSQL
• Kafka Connect
• Kafka Streams
• Custom apps
The Confluent Schema Registry
[Diagram: MySQL, Kafka Connect, Avro Message, Schema Registry, Avro Schema, Kafka Connect, Avro Message, Elasticsearch]
The Confluent Schema Registry
Source (MySQL) schema is preserved
Target (Elasticsearch) schema mapping is automagically built
Integrating Databases with Kafka
• CDC (Change Data Capture) is a generic term for capturing changing data, typically from an RDBMS.
• Two general approaches:
  • Query-based CDC
  • Log-based CDC
There are other options, including hacks with triggers, Flashback etc., but these are system- and/or technology-specific.
Read more: http://cnfl.io/kafka-cdc
Query-based CDC
• Use a database query to try and identify new & changed rows
SELECT * FROM my_table
WHERE col > <value of col last time we polled>
• Implemented with the open source Kafka Connect JDBC connector
• Can import based on table names, schema, or bespoke SQL query
• Incremental ingest driven through incrementing ID column and/or timestamp column
Read more: http://cnfl.io/kafka-cdc
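The polling query on this slide can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module and an in-memory table, not how the JDBC connector is implemented. The helper name poll_new_rows and the sample rows are hypothetical. It also shows why query-based CDC cannot see deletes: a deleted row never matches the query again.

```python
import sqlite3


def poll_new_rows(conn, table, column, last_seen):
    """Return rows whose `column` exceeds `last_seen`, plus the new high-water mark."""
    # Table and column names cannot be bound as parameters, so this sketch
    # assumes they come from trusted configuration rather than user input.
    query = f"SELECT * FROM {table} WHERE {column} > ? ORDER BY {column}"
    cursor = conn.execute(query, (last_seen,))
    names = [d[0] for d in cursor.description]
    rows = [dict(zip(names, r)) for r in cursor.fetchall()]
    if rows:
        last_seen = rows[-1][column]
    return rows, last_seen


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO my_table (item) VALUES (?)", [("a",), ("b",)])

# First poll picks up everything above the initial offset of 0.
first_batch, offset = poll_new_rows(conn, "my_table", "id", 0)

# One insert and one delete happen between polls.
conn.execute("INSERT INTO my_table (item) VALUES ('c')")
conn.execute("DELETE FROM my_table WHERE id = 1")

# Second poll sees only the inserted row; the delete leaves no trace.
second_batch, offset = poll_new_rows(conn, "my_table", "id", offset)
```

An incrementing ID catches inserts only; picking up updates as well would need a timestamp column that the source system maintains on every change.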
Log-based CDC
• Use the database's transaction log to identify every single change event
• Various CDC tools available that integrate with Apache Kafka (more on this later…)
Read more: http://cnfl.io/kafka-cdc
Query-based vs Log-based CDC
Photo by Matese Fields on Unsplash
• Query-based
+ Usually easier to set up, and requires fewer permissions
- Needs specific columns in source schema
- Impact of polling the DB (or higher latencies tradeoff)
- Can't track deletes
Read more: http://cnfl.io/kafka-cdc
Query-based vs Log-based CDC
Photo by Sebastian Pociecha on Unsplash
• Log-based
+ Greater data fidelity
+ Lower latency
+ Lower impact on source
- More setup steps
- Higher system privileges required
- For proprietary databases, usually $$$
Read more: http://cnfl.io/kafka-cdc
Which Log-Based CDC Tool?
• Open Source RDBMS, e.g. MySQL, PostgreSQL
  • Debezium
  • (+ paid options)
• Mainframe, e.g. VSAM, IMS
  • Attunity
  • SQData
• Proprietary RDBMS, e.g. Oracle, MS SQL
  • Attunity
  • IBM InfoSphere Data Replication
  • Oracle GoldenGate
  • SQData
  • HVR
All these options integrate with Apache Kafka and Confluent Platform, including support for the Schema Registry
For query-based CDC, use the Confluent Kafka Connect JDBC connector
Read more: http://cnfl.io/kafka-cdc
“But I need to join…aggregate…filter…”
KSQL is a Declarative Stream Processing Language
KSQL is the Streaming SQL Engine for Apache Kafka
KSQL in Development and Production
Interactive KSQL for development and testing
“Hmm, let me try out this idea...”
REST
Headless KSQL for Production
Desired KSQL queries have been identified
KSQL for Streaming ETL
Joining, filtering, and aggregating streams of event data
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u
ON c.userid = u.user_id
WHERE u.level = 'Platinum';
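To make the semantics of the vip_actions query concrete, here is a small plain-Python sketch of a stream-to-table lookup followed by a filter. It is an illustration only and says nothing about how KSQL executes the query. The function name enrich_and_filter and the sample events are hypothetical. Note that filtering on u.level discards clicks with no matching user, so the LEFT JOIN behaves like an inner join in this particular query.

```python
def enrich_and_filter(clicks, users, level="Platinum"):
    """Left-join click events to a user lookup table, then keep one membership level."""
    for click in clicks:
        # users.get returns None when there is no matching user (left-join miss).
        user = users.get(click["userid"])
        if user is not None and user["level"] == level:
            yield {
                "userid": click["userid"],
                "page": click["page"],
                "action": click["action"],
            }


# The "table": latest known state per user, keyed by user_id.
users = {
    1: {"user_id": 1, "level": "Platinum"},
    2: {"user_id": 2, "level": "Gold"},
}

# The "stream": an ordered sequence of click events.
clicks = [
    {"userid": 1, "page": "/home", "action": "view"},
    {"userid": 2, "page": "/cart", "action": "add"},
    {"userid": 3, "page": "/home", "action": "view"},
]

vip_actions = list(enrich_and_filter(clicks, users))
```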
KSQL for Anomaly Detection
Identifying patterns or anomalies in real-time data, surfaced in milliseconds
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
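The tumbling-window count in the possible_fraud query can be illustrated in plain Python. This is a batch sketch over a fixed list of (timestamp in seconds, card number) pairs, whereas KSQL evaluates the windows continuously as events arrive. The function signature and the sample data are assumptions made for illustration.

```python
from collections import Counter


def possible_fraud(attempts, window_size=5, threshold=3):
    """Count attempts per (card, tumbling window) and keep counts above the threshold."""
    counts = Counter()
    for timestamp, card_number in attempts:
        # Tumbling windows are fixed-size and non-overlapping, so each event
        # falls into exactly one window, identified here by its start time.
        window_start = int(timestamp // window_size) * window_size
        counts[(card_number, window_start)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


attempts = [
    (0.5, "4111"),
    (1.0, "4111"),
    (2.2, "4111"),
    (4.9, "4111"),
    (5.1, "4111"),  # falls into the next window, starting at 5 seconds
    (3.0, "5500"),
]

flagged = possible_fraud(attempts)
```

Only the card with four attempts inside a single 5-second window is flagged; the fifth attempt at 5.1 seconds starts a new window and does not add to that count.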
KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• syslog data
• Sensor / IoT data
CREATE STREAM SYSLOG_INVALID_USERS AS
SELECT HOST, MESSAGE
FROM SYSLOG
WHERE MESSAGE LIKE '%Invalid user%';
http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
KSQL for Data Transformation
Make simple derivations of existing topics from the command line
CREATE STREAM views_by_userid
WITH (PARTITIONS=6, REPLICAS=5,
VALUE_FORMAT='AVRO',
TIMESTAMP='view_time') AS
SELECT * FROM clickstream
PARTITION BY user_id;
DEMO!
[Diagram: MySQL, Debezium, Kafka Connect, Producer API, Kafka Connect, Elasticsearch]
Questions?
http://confluent.io/ksql
https://slackpass.io/confluentcommunity

Weitere ähnliche Inhalte

Was ist angesagt?

Simplify and Scale Data Engineering Pipelines with Delta Lake
Simplify and Scale Data Engineering Pipelines with Delta LakeSimplify and Scale Data Engineering Pipelines with Delta Lake
Simplify and Scale Data Engineering Pipelines with Delta LakeDatabricks
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseAldrin Piri
 
GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023Timothy Spann
 
Collecting AWS Logs & Introducing Splunk New S3 Compatible Storage (SmartStore)
Collecting AWS Logs & Introducing Splunk New S3 Compatible Storage (SmartStore) Collecting AWS Logs & Introducing Splunk New S3 Compatible Storage (SmartStore)
Collecting AWS Logs & Introducing Splunk New S3 Compatible Storage (SmartStore) Harry McLaren
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and FlinkBryan Bende
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultDataWorks Summit
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectKaufman Ng
 
Smart monitoring how does oracle rac manage resource, state ukoug19
Smart monitoring how does oracle rac manage resource, state ukoug19Smart monitoring how does oracle rac manage resource, state ukoug19
Smart monitoring how does oracle rac manage resource, state ukoug19Anil Nair
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiLev Brailovskiy
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on KubernetesDatabricks
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming JobsDatabricks
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayDataWorks Summit
 

Was ist angesagt? (20)

Simplify and Scale Data Engineering Pipelines with Delta Lake
Simplify and Scale Data Engineering Pipelines with Delta LakeSimplify and Scale Data Engineering Pipelines with Delta Lake
Simplify and Scale Data Engineering Pipelines with Delta Lake
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
 
GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023
 
Hbase at Salesforce.com
Hbase at Salesforce.comHbase at Salesforce.com
Hbase at Salesforce.com
 
Collecting AWS Logs & Introducing Splunk New S3 Compatible Storage (SmartStore)
Collecting AWS Logs & Introducing Splunk New S3 Compatible Storage (SmartStore) Collecting AWS Logs & Introducing Splunk New S3 Compatible Storage (SmartStore)
Collecting AWS Logs & Introducing Splunk New S3 Compatible Storage (SmartStore)
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and Flink
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
 
Smart monitoring how does oracle rac manage resource, state ukoug19
Smart monitoring how does oracle rac manage resource, state ukoug19Smart monitoring how does oracle rac manage resource, state ukoug19
Smart monitoring how does oracle rac manage resource, state ukoug19
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFi
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 
HDFS Erasure Coding in Action
HDFS Erasure Coding in Action HDFS Erasure Coding in Action
HDFS Erasure Coding in Action
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 

Ähnlich wie Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL

Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!confluent
 
Riviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLRiviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLFlorent Ramiere
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kai Wähner
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyKairo Tavares
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKai Wähner
 
KSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache KafkaKSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache KafkaMatthias J. Sax
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comCedric Vidal
 
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Paolo Castagna
 
Streaming etl in practice with postgre sql, apache kafka, and ksql mic
Streaming etl in practice with postgre sql, apache kafka, and ksql micStreaming etl in practice with postgre sql, apache kafka, and ksql mic
Streaming etl in practice with postgre sql, apache kafka, and ksql micBas van Oudenaarde
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Codemotion
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Codemotion
 
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020confluent
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafkaconfluent
 
Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...
Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...
Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...confluent
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Kai Wähner
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsSlim Baltagi
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKai Wähner
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLNick Dearden
 

Ähnlich wie Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL (20)

Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
 
Riviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLRiviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQL
 
Jug - ecosystem
Jug -  ecosystemJug -  ecosystem
Jug - ecosystem
 
Chti jug - 2018-06-26
Chti jug - 2018-06-26Chti jug - 2018-06-26
Chti jug - 2018-06-26
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
 
KSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache KafkaKSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache Kafka
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
 
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!
 
Streaming etl in practice with postgre sql, apache kafka, and ksql mic
Streaming etl in practice with postgre sql, apache kafka, and ksql micStreaming etl in practice with postgre sql, apache kafka, and ksql mic
Streaming etl in practice with postgre sql, apache kafka, and ksql mic
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
 
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafka
 
Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...
Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...
Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache Kafka
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQL
 

Mehr von confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL

  • 1. @rmoff robin@confluent.io Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL Robin Moffatt, Developer Advocate
  • 2. $ whoami • Developer Advocate @ Confluent • Working in data & analytics since 2001 • Oracle ACE Director & Dev Champion • Blogging: https://rmoff.net & http://cnfl.io/rmoff • Twitter: @rmoff
  • 3. Housekeeping Items ● This session will last about an hour. ● This session will be recorded. ● You can submit your questions by entering them into the GoToWebinar panel. ● The last 10-15 minutes will consist of Q&A. ● The slides and recording will be available after the talk.
  • 4. Streaming ETL with Apache Kafka and KSQL
  • 5. Database offload: Hadoop/Object Storage/Cloud DW for Analytics [diagram labels: RDBMS, HDFS / S3 / BigQuery etc]
  • 6. Streaming ETL with Apache Kafka and KSQL [diagram labels: RDBMS, order items, customer, Stream Processing, customer orders]
  • 7. Real-time Event Stream Enrichment with Apache Kafka and KSQL [diagram labels: order events, RDBMS, customer, Stream Processing, customer orders, <y>]
  • 8. Transform Once, Use Many [diagram labels: order events, RDBMS, customer, Stream Processing, customer orders, <y>, New App <x>]
  • 9. Transform Once, Use Many [diagram labels: order events, RDBMS, customer, Stream Processing, customer orders, <y>, HDFS / S3 / etc, New App <x>]
  • 12. The Connect API of Apache Kafka®
      Reliable and scalable integration of Kafka with other systems – no coding required.
      ✓ Fault tolerant and automatically load balanced
      ✓ Extensible API
      ✓ Single Message Transforms
      ✓ Part of Apache Kafka, included in Confluent Open Source
      ✓ Centralized management and configuration
      ✓ Support for hundreds of technologies including RDBMS, Elasticsearch, HDFS, S3, syslog
      ✓ Supports CDC ingest of events from RDBMS
      ✓ Preserves data schema
      {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
        "table.whitelist": "sales,orders,customers"
      }
      https://docs.confluent.io/current/connect/
  • 13. Kafka Connect [diagram labels: Kafka Brokers, Kafka Connect, Tasks, Workers, Sources, Sinks, Amazon S3, syslog, flat file, CSV, JSON, MQTT]
  • 14. Considerations for Integration into Apache Kafka • Chucking data over the fence into a Kafka topic is not enough • We need standard ways of building data pipelines in Kafka • Schema handling • Serialisation formats (Photo by Matthew Smith on Unsplash)
  • 15. Considerations for Integration into Apache Kafka • Confluent Schema Registry & Avro is a great way to do this • Downstream users of the data can then easily use the data • KSQL • Kafka Connect • Kafka Streams • Custom apps (Photo by Matthew Smith on Unsplash)
  • 16. The Confluent Schema Registry [diagram labels: MySQL, Kafka Connect, Avro Message, Avro Schema, Schema Registry, Kafka Connect, Elasticsearch]
  • 17. The Confluent Schema Registry • Source (MySQL) schema is preserved • Target (Elasticsearch) schema mapping is automagically built
  • 18. Integrating Databases with Kafka • CDC (change data capture) is a generic term for capturing changing data, typically from an RDBMS. • Two general approaches: • Query-based CDC • Log-based CDC There are other options, including hacks with Triggers, Flashback etc, but these are system- and/or technology-specific. Read more: http://cnfl.io/kafka-cdc
  • 19. Query-based CDC • Use a database query to try and identify new & changed rows • Implemented with the open source Kafka Connect JDBC connector • Can import based on table names, schema, or bespoke SQL query • Incremental ingest driven through incrementing ID column and/or timestamp column
      SELECT * FROM my_table
      WHERE col > <value of col last time we polled>
      Read more: http://cnfl.io/kafka-cdc
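The polling query on the Query-based CDC slide can be sketched in a few lines. This is a minimal illustration rather than how the JDBC connector is implemented: it assumes an incrementing `id` column, uses an in-memory SQLite database as a stand-in for the source RDBMS, and the `name` column and the `poll_new_rows` helper are hypothetical names introduced here.

```python
import sqlite3


def poll_new_rows(conn, last_seen_id):
    """Return rows whose id is above the last one seen, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name FROM my_table WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_mark = rows[-1][0] if rows else last_seen_id
    return rows, new_mark


# In-memory database standing in for the source system (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO my_table (id, name) VALUES (?, ?)", [(1, "a"), (2, "b")]
)

# First poll starts from 0 and picks up both existing rows.
first_batch, mark = poll_new_rows(conn, 0)

# Between polls: one insert and one delete happen on the source.
conn.execute("INSERT INTO my_table (id, name) VALUES (3, 'c')")
conn.execute("DELETE FROM my_table WHERE id = 1")

# Second poll only sees the new row; the delete leaves nothing to select.
second_batch, mark = poll_new_rows(conn, mark)
```

Because the poll selects only rows above the stored high-water mark, a row that was deleted between polls never shows up in a later batch.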
  • 20. Log-based CDC • Use the database's transaction log to identify every single change event • Various CDC tools available that integrate with Apache Kafka (more of this later…) Read more: http://cnfl.io/kafka-cdc
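To illustrate what "every single change event" buys a downstream consumer, here is a small sketch that replays a change log into a table image. The event shape (`op`, `key`, `after`) is a simplified, hypothetical stand-in and not the output format of any particular CDC tool.

```python
def apply_change(table, event):
    """Apply one change event to an in-memory copy of the table."""
    if event["op"] == "delete":
        # Deletes are explicit events in the log, so they can be replayed.
        table.pop(event["key"], None)
    else:
        # Inserts and updates both carry the row state after the change.
        table[event["key"]] = event["after"]
    return table


# Hypothetical change log, in commit order.
change_log = [
    {"op": "insert", "key": 1, "after": {"name": "a"}},
    {"op": "insert", "key": 2, "after": {"name": "b"}},
    {"op": "update", "key": 2, "after": {"name": "B"}},
    {"op": "delete", "key": 1, "after": None},
]

table = {}
for event in change_log:
    apply_change(table, event)
```

Replaying the log in order leaves only key 2 with its updated value, including the effect of the delete.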
  • 21. Query-based vs Log-based CDC • Query-based: + Usually easier to set up, and requires fewer permissions - Needs specific columns in source schema - Impact of polling the DB (or higher latencies tradeoff) - Can't track deletes Read more: http://cnfl.io/kafka-cdc (Photo by Matese Fields on Unsplash)
  • 22. Query-based vs Log-based CDC • Log-based: + Greater data fidelity + Lower latency + Lower impact on source - More setup steps - Higher system privileges required - For proprietary databases, usually $$$ Read more: http://cnfl.io/kafka-cdc (Photo by Sebastian Pociecha on Unsplash)
  • 23. Which Log-Based CDC Tool?
      For query-based CDC, use the Confluent Kafka Connect JDBC connector.
      • Open Source RDBMS, e.g. MySQL, PostgreSQL: Debezium (+ paid options)
      • Mainframe, e.g. VSAM, IMS: Attunity, SQData
      • Proprietary RDBMS, e.g. Oracle, MS SQL: Attunity, IBM InfoSphere Data Replication, Oracle GoldenGate, SQData, HVR
      All these options integrate with Apache Kafka and Confluent Platform, including support for the Schema Registry.
      Read more: http://cnfl.io/kafka-cdc
  • 24. “But I need to join…aggregate…filter…”
  • 25. KSQL is a Declarative Stream Processing Language
  • 26. KSQL is the Streaming SQL Engine for Apache Kafka
  • 27. KSQL in Development and Production • Interactive KSQL for development and testing (REST): “Hmm, let me try out this idea...” • Headless KSQL for production: desired KSQL queries have been identified
  • 28. KSQL for Streaming ETL: joining, filtering, and aggregating streams of event data
      CREATE STREAM vip_actions AS
      SELECT userid, page, action
      FROM clickstream c
      LEFT JOIN users u ON c.userid = u.user_id
      WHERE u.level = 'Platinum';
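For readers new to stream-table joins, the logic of the `vip_actions` query can be mimicked in plain Python. This is only a sketch of the join-then-filter idea, not of how KSQL executes it: the `users` table is modelled as a dict keyed by user ID, and the sample records are made up for illustration.

```python
def vip_actions(clickstream, users):
    """Yield (userid, page, action) for click events whose user is Platinum."""
    for click in clickstream:
        # Look the user up in the table; None mirrors an unmatched LEFT JOIN row.
        user = users.get(click["userid"])
        # The WHERE clause drops unmatched and non-Platinum rows.
        if user is not None and user["level"] == "Platinum":
            yield click["userid"], click["page"], click["action"]


# Hypothetical sample data.
users = {
    "u1": {"level": "Platinum"},
    "u2": {"level": "Gold"},
}
clickstream = [
    {"userid": "u1", "page": "/home", "action": "view"},
    {"userid": "u2", "page": "/home", "action": "view"},
    {"userid": "u3", "page": "/help", "action": "view"},
    {"userid": "u1", "page": "/cart", "action": "click"},
]

result = list(vip_actions(clickstream, users))
```

Each click event is enriched by a lookup against the current table contents, which is why the events for the Gold user and for the unknown user `u3` are dropped.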
  • 29. KSQL for Anomaly Detection: identifying patterns or anomalies in real-time data, surfaced in milliseconds
      CREATE TABLE possible_fraud AS
      SELECT card_number, count(*)
      FROM authorization_attempts
      WINDOW TUMBLING (SIZE 5 SECONDS)
      GROUP BY card_number
      HAVING count(*) > 3;
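The tumbling-window aggregation in `possible_fraud` can be approximated offline to see what the query computes. The sketch below is a batch calculation over a finite list, not a streaming engine: it assumes event timestamps in milliseconds, aligns windows to multiples of the window size, and uses invented card numbers.

```python
from collections import Counter

WINDOW_SIZE_MS = 5_000  # matches SIZE 5 SECONDS in the query


def possible_fraud(attempts, threshold=3):
    """Count attempts per (card_number, window_start); keep counts above threshold."""
    counts = Counter()
    for ts_ms, card_number in attempts:
        # Each event belongs to exactly one fixed, non-overlapping window.
        window_start = ts_ms - (ts_ms % WINDOW_SIZE_MS)
        counts[(card_number, window_start)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical authorization attempts as (timestamp_ms, card_number).
attempts = [
    (0, "1111"),
    (1_000, "1111"),
    (1_000, "2222"),
    (2_000, "1111"),
    (4_999, "1111"),
    (5_000, "1111"),  # falls into the next window
]

flagged = possible_fraud(attempts)
```

Only card `1111` in the window starting at 0 ms exceeds three attempts; its fifth attempt lands in the following window and is counted separately.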
  • 30. KSQL for Real-Time Monitoring • Log data monitoring, tracking and alerting • syslog data • Sensor / IoT data
      CREATE STREAM SYSLOG_INVALID_USERS AS
      SELECT HOST, MESSAGE
      FROM SYSLOG
      WHERE MESSAGE LIKE '%Invalid user%';
      http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
  • 31. KSQL for Data Transformation: make simple derivations of existing topics from the command line
      CREATE STREAM views_by_userid
      WITH (PARTITIONS=6, REPLICAS=5, VALUE_FORMAT='AVRO', TIMESTAMP='view_time') AS
      SELECT * FROM clickstream
      PARTITION BY user_id;
  • 32. DEMO!
  • 33. [Demo architecture diagram labels: MySQL, Debezium, Kafka Connect, Producer API, Elasticsearch, Kafka Connect]