Speaker: Perry Krol, Senior Sales Engineer, Confluent Germany GmbH
Title of Talk:
Introduction to Apache Kafka as Event-Driven Open Source Streaming Platform
Abstract:
Apache Kafka is the de facto standard event streaming platform: it is widely deployed as a messaging system and provides a robust data integration framework (Kafka Connect) and a stream processing API (Kafka Streams) to meet the needs that commonly attend real-time, event-driven data processing.
The open source Confluent Platform adds further components such as KSQL, the Schema Registry, the REST Proxy, clients for different programming languages, and connectors for different technologies and databases. This session explains the concepts, architecture and technical details, including live demos.
5.
ETL/Data Integration: Batch | Expensive | Time Consuming | Difficult to Scale
→ What happened in the world (stored records)

Messaging: Real-time | Fast (Low Latency) | No Persistence After Consumption | No Replay
→ What is happening in the world (transient messages)

Event Streaming: Real-time | Highly Scalable | Durable | Persistent | Maintains Order
→ What is contextually happening in the world (data as a continually updating stream of events)
6.
Why Combine Real-time With Historical Context?

Event-Driven App (Location Tracking): "Where is my driver?"
Only real-time events. Messaging queues and event streaming platforms can do this.

Contextual Event-Driven App (ETA): "When will my driver get here?" (2 min)
Real-time combined with stored data. Only event streaming platforms can do this.
9. C O N F I D E N T I A L
Apache Kafka, the de-facto OSS standard for
event streaming
Real-time | Uses disk structure for constant performance at Petabyte scale
Scalable | Distributed, scales quickly and easily without downtime
Persistent | Persists messages on disks, enables intra-cluster replication
Reliable | Replicates data, auto balances consumers upon failure
In production at more
than a third of the
Fortune 500
2 trillion messages a
day at LinkedIn
500 billion events a
day (1.3 PB) at Netflix
10.
About Confluent: We Are The Kafka Experts
30% of Fortune 100
Confluent founders
created Kafka
Confluent team wrote
80% of Kafka
We have over 300,000
hours of Kafka Experience
11.
Kafka Integration Architecture
PRODUCER | CONSUMER
12.
Kafka Cluster
Connect API Stream Processing Connect API
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
Stream Processing Analogy
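The pipeline above is the analogy: standard input and output streams correspond to Kafka topics, and each pipe stage corresponds to a stream processor. As a plain-Java illustration of the same filter-and-transform logic (not the Kafka Streams API itself, which would need the kafka-streams dependency; the class and method names here are mine):

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class PipelineAnalogy {

    // Mirrors: cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
    // In Kafka Streams terms: consume a topic, filter(), mapValues(), produce to a topic.
    static List<String> process(List<String> lines) {
        return lines.stream()
                .filter(l -> l.contains("ksql"))        // grep "ksql"
                .map(l -> l.toUpperCase(Locale.ROOT))   // tr a-z A-Z
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("ksql rocks", "other line")));
        // prints [KSQL ROCKS]
    }
}
```

The key difference in Kafka is that the "pipe" is a durable, replayable topic rather than a transient OS stream.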
13.
KSQL is the Streaming SQL Engine for Apache Kafka
14.
CREATE STREAM ATM_POSSIBLE_FRAUD_ENRICHED AS
SELECT t.account_id,
       a.first_name + ' ' + a.last_name cust_name,
       t.atm, t.amount,
       TIMESTAMPTOSTRING(t.ROWTIME, 'HH:mm:ss') tx_time
FROM atm_txns t
INNER JOIN accounts a
ON t.account_id = a.account_id;
Simple SQL syntax for expressing transformations along and across data streams.
User-defined functions can be written in Java.
Stream processing with KSQL
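A Java UDF like the one mentioned above can be sketched as follows. The function, its name, and the masking logic are hypothetical examples; in an actual KSQL deployment the class would additionally carry the @UdfDescription (class) and @Udf (method) annotations from Confluent's ksql-udf artifact, be packaged as a jar, and be placed on the KSQL extension path. Those annotations are omitted here so the sketch compiles without Confluent dependencies.

```java
// Core logic of a hypothetical KSQL UDF that masks all but the last four
// characters of an account id before it reaches downstream consumers.
public class MaskAccountUdf {

    public static String maskAccount(String accountId) {
        if (accountId == null || accountId.length() <= 4) {
            return accountId; // nothing (or too little) to mask
        }
        int masked = accountId.length() - 4;
        return "*".repeat(masked) + accountId.substring(masked);
    }
}
```

Once registered, such a function would be callable from a query like any built-in, e.g. MASK_ACCOUNT(t.account_id).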
15.
KSQL in Development and Production

Interactive KSQL (via REST) for development and testing:
"Hmm, let me try out this idea..."

Headless KSQL for production:
the desired KSQL queries have already been identified
16.
ATM Fraud Dataflow: Streaming ETL with KSQL
17.
What does KSQL look like?
● First load a topic into a stream
CREATE STREAM ATM_TXNS_GESS (account_id VARCHAR,
atm VARCHAR,
location STRUCT<lon DOUBLE, lat DOUBLE>,
amount INT,
timestamp VARCHAR,
transaction_id VARCHAR)
WITH (KAFKA_TOPIC='atm_txns_gess', VALUE_FORMAT='JSON',
      TIMESTAMP='timestamp',
      TIMESTAMP_FORMAT='yyyy-MM-dd HH:mm:ss X');
18.
What does KSQL look like?
● Create a table on topic for reference data
● Join stream to table for enrichment
CREATE TABLE ACCOUNTS
WITH (KAFKA_TOPIC='ACCOUNTS', VALUE_FORMAT='AVRO', KEY='ACCOUNT_ID');

CREATE STREAM ATM_POSSIBLE_FRAUD_ENRICHED AS
SELECT T.ACCOUNT_ID AS ACCOUNT_ID, T.TX1_TIMESTAMP,
       T.TX2_TIMESTAMP, T.TX1_AMOUNT, T.TX2_AMOUNT,
       T.TX1_ATM, T.TX2_ATM, T.TX1_LOCATION, T.TX2_LOCATION,
       T.TX1_TRANSACTION_ID, T.TX2_TRANSACTION_ID,
       T.DISTANCE_BETWEEN_TXN_KM, T.MILLISECONDS_DIFFERENCE,
       T.MINUTES_DIFFERENCE, T.KMH_REQUIRED,
       A.FIRST_NAME + ' ' + A.LAST_NAME AS CUSTOMER_NAME,
       A.EMAIL AS CUSTOMER_EMAIL, A.PHONE AS CUSTOMER_PHONE,
       A.ADDRESS AS CUSTOMER_ADDRESS, A.COUNTRY AS CUSTOMER_COUNTRY
FROM ATM_POSSIBLE_FRAUD T
INNER JOIN ACCOUNTS A
ON T.ACCOUNT_ID = A.ACCOUNT_ID;
20.
Or use the Kafka Streams API
● Java or Scala
● Can do multiple joins in one operation
● Provides an interactive query API which makes it possible to query the state
store.
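The stream-to-table join that both the KSQL statements and the Streams API express has simple semantics: a table is the latest value per key, and each incoming stream record is enriched with the current table value for its key. A plain-Java sketch of those semantics (not the Streams API; the class, method names and sample data are mine), assuming inner-join behavior:

```java
import java.util.HashMap;
import java.util.Map;

public class StreamTableJoinSketch {

    // A KTable is, conceptually, "latest value per key": model it as a Map.
    final Map<String, String> accountNames = new HashMap<>();

    // Table side: an update for a key overwrites the previous value.
    void upsertAccount(String accountId, String name) {
        accountNames.put(accountId, name);
    }

    // Stream side: enrich each transaction with the current table value.
    // Inner-join semantics: no matching key means no output record.
    String joinTxn(String accountId, String txnDetails) {
        String name = accountNames.get(accountId);
        return (name == null) ? null : txnDetails + " by " + name;
    }
}
```

The real Streams API adds what this sketch omits: the "map" is a fault-tolerant, replicated state store, and table updates arrive as records on a changelog topic.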
21. ATM Fraud Detection with Apache Kafka and KSQL
@rmoff
Confluent Hub
hub.confluent.io
One-stop place to discover and download:
• Connectors
• Transformations
• Converters
23. Confluent Community - What next?
About 10,000 Kafkateers are
collaborating every single day on the
Confluent Community Slack channel!
There are more than 35,000 Kafkateers
in around 145 meetup groups across all
five continents!
Join the Confluent Community
Slack Channel
Join your local Apache Kafka®
Meetup
Get frequent updates from key names in
Apache Kafka® on best practices,
product updates & more!
Subscribe to the
Confluent blog
cnfl.io/community-slack cnfl.io/meetups cnfl.io/read
Apache, Apache Kafka, Kafka and the Kafka logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no
affiliation with and does not endorse the materials provided at this event.