KSQL
Stream Processing simplified!
Guido Schmutz
DOAG Big Data 2018 – 20.9.2018
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 21 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer and Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Agenda
1. What is Apache Kafka?
2. KSQL in Action
3. Summary
What is Apache Kafka?
Apache Kafka – A Streaming Platform
High-Level Architecture
Distributed Log at the Core
Scale-Out Architecture
Logs do not (necessarily) forget
Hold Data for Long-Term – Data Retention
1. Never
2. Time based (TTL): log.retention.{ms | minutes | hours}
3. Size based: log.retention.bytes
4. Log compaction based (entries with same key are removed):

kafka-topics.sh --zookeeper zk:2181 \
  --create --topic customers \
  --replication-factor 1 \
  --partitions 1 \
  --config cleanup.policy=compact
Keep Topics in Compacted Form
Before compaction:

Offset | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9   | 10
Key    | K1 | K2 | K1 | K1 | K3 | K2 | K4 | K5 | K5 | K2  | K6
Value  | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11

After compaction (only the latest value per key is kept):

Offset | 3  | 4  | 6  | 8  | 9   | 10
Key    | K1 | K3 | K4 | K5 | K2  | K6
Value  | V4 | V5 | V7 | V9 | V10 | V11
Demo (I)
[Diagram: a Testdata-Generator by Hortonworks produces Driving Info events to the truck_driving_info topic and Position events to the truck_position topic; a console consumer reads both]

Driving Info event:
{"timestamp":1537343400827,"truckId":87,"driverId":13,"routeId":987179512,"eventType":"Normal","correlationId":"-3208700263746910537"}

Position event:
{"timestamp":1537342514539,"truckId":87,"latitude":38.65,"longitude":-90.21}
Demo (I) – Create Kafka Topic
$ kafka-topics --zookeeper zookeeper:2181 --create \
    --topic truck_position --partitions 8 --replication-factor 1

$ kafka-topics --zookeeper zookeeper:2181 --list
__consumer_offsets
_confluent-metrics
_schemas
docker-connect-configs
docker-connect-offsets
docker-connect-status
truck_position
Demo (I) – Run Producer and Kafka-Console-Consumer
Demo (I) – Java Producer to "truck_position"
Constructing a Kafka Producer
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// configure and create the producer
Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker-1:9092");
kafkaProps.put("key.serializer", "...StringSerializer");
kafkaProps.put("value.serializer", "...StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<String, String>(kafkaProps);

// send synchronously: key = driverId, value = event payload
ProducerRecord<String, String> record =
    new ProducerRecord<>("truck_position", driverId, eventData);
try {
    RecordMetadata metadata = producer.send(record).get();
} catch (Exception e) {
    // handle send failure (log / retry)
}
Apache Kafka – wait there is more!
[Diagram: around the Kafka Broker sit Source Connectors, Sink Connectors and Stream Processing; a trucking_driver database is shown as an example source]
Kafka Connect - Overview
[Diagram: Source Connectors move data from external systems into Kafka topics; Sink Connectors move data from Kafka topics to external systems]
Choosing the Right API
Consumer, Producer API
• Java, C#, C++, Scala, Python, Node.js, Go, PHP, …
• subscribe(), poll(), send(), flush()
• Anything Kafka

Kafka Connect
• Declarative
• Configuration
• REST API
• Out-of-the-box connectors
• Stream Integration

Kafka Streams
• Fluent Java API
• mapValues(), filter(), flush()
• Stream Analytics

KSQL
• SQL dialect
• SELECT … FROM …, JOIN … WHERE, GROUP BY
• Stream Analytics

Flexibility ← → Simplicity
Source: adapted from Confluent
Demo (II) – Connect to MQTT through Kafka Connect
[Diagram: trucks publish Position events to the MQTT topics truck/nn/position and Driving Info events to truck/nn/driving-info; two "mqtt to kafka" connectors forward them to the Kafka topics truck_position and truck_driving_info]

Driving Info event:
{"timestamp":1537343400827,"truckId":87,"driverId":13,"routeId":987179512,"eventType":"Normal","correlationId":"-3208700263746910537"}

Position event:
{"timestamp":1537342514539,"truckId":87,"latitude":38.65,"longitude":-90.21}
KSQL in Action
KSQL: a Streaming SQL Engine for Apache Kafka
• Enables stream processing with zero coding required
• The simplest way to process streams of data in real-time
• Powered by Kafka and Kafka Streams: scalable, distributed, mature
• All you need is Kafka – no complex deployments
• Available as developer preview!
• STREAM and TABLE as first-class citizens
• STREAM = data in motion
• TABLE = collected state of a stream
• join STREAM and TABLE
KSQL Architecture & Components
KSQL Server
• runs the engine that executes KSQL queries
• includes processing, reading, and writing data to and from the target Kafka cluster
• KSQL servers form KSQL clusters and can run in containers, virtual machines, and
bare-metal machines
• You can add and remove servers to/from the same KSQL cluster during live
operations to elastically scale KSQL’s processing capacity as desired
• You can deploy different KSQL clusters to achieve workload isolation
KSQL CLI
• You can interactively write KSQL queries by using the KSQL command line interface
(CLI).
• KSQL CLI acts as a client to the KSQL server
• For production scenarios you may also configure KSQL servers to run in a non-interactive "headless" configuration, thereby preventing KSQL CLI access
Demo (III) – Start the KSQL CLI
$ docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092
======================================
=      _  __ _____  ____  _          =
=     | |/ // ____|/ __ \| |         =
=     | ' /| (___ | |  | | |         =
=     |  <  \___ \| |  | | |         =
=     | . \ ____) | |__| | |____     =
=     |_|\_\_____/ \___\_\______|    =
=                                    =
=   Streaming SQL Engine for Kafka   =
Copyright 2017 Confluent Inc.
CLI v0.1, Server v0.1 located at http://localhost:9098
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
ksql>
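Once connected, the CLI can list what already exists in the cluster; a minimal sketch (output omitted):

ksql> SHOW TOPICS;
ksql> SHOW STREAMS;
ksql> SHOW TABLES;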
Terminology
Stream
• an unbounded sequence of structured data ("facts")
• facts in a stream are immutable: new facts can be inserted to a stream, but existing facts can never be updated or deleted
• streams can be created from a Kafka topic or derived from an existing stream
• a stream's underlying data is durably stored (persisted) within a Kafka topic on the Kafka brokers

Table
• materialized view of events with only the latest value for a key
• a view of a stream, or another table, and represents a collection of evolving facts
• the equivalent of a traditional database table but enriched by streaming semantics such as windowing
• facts in a table are mutable: new facts can be inserted to the table, and existing facts can be updated or deleted
• tables can be created from a Kafka topic or derived from existing streams and tables
CREATE STREAM
Create a new stream, backed by a Kafka topic, with the specified columns and
properties
Supported column data types:
• BOOLEAN, INTEGER, BIGINT, DOUBLE, VARCHAR or STRING
• ARRAY<ArrayType>
• MAP<VARCHAR, ValueType>
• STRUCT<FieldName FieldType, ...>
Supports the following serialization formats: CSV, JSON, AVRO
KSQL adds the implicit columns ROWTIME and ROWKEY to every stream
CREATE STREAM stream_name ( { column_name data_type } [, ...] )
WITH ( property_name = expression [, ...] );
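To illustrate the syntax, a minimal sketch against a hypothetical JSON topic; the topic name pageviews and its columns are assumptions, not part of the demo:

ksql> CREATE STREAM pageviews_s \
        (viewtime BIGINT, \
         userid VARCHAR, \
         tags ARRAY<VARCHAR>) \
      WITH (kafka_topic='pageviews', \
            value_format='JSON');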
CREATE TABLE
Create a new table with the specified columns and properties
Supports same data types as CREATE STREAM
KSQL adds the implicit columns ROWTIME and ROWKEY to every table as well
KSQL currently has the following requirements for creating a table from a Kafka topic
• message key must also be present as a field/column in the Kafka message value
• message key must be in VARCHAR aka STRING format
CREATE TABLE table_name ( { column_name data_type } [, ...] ) WITH (
property_name = expression [, ...] );
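A minimal sketch that satisfies both requirements, assuming a hypothetical users topic whose message key is also present as the VARCHAR column id:

ksql> CREATE TABLE users_t \
        (id VARCHAR, \
         name VARCHAR) \
      WITH (kafka_topic='users', \
            value_format='JSON', \
            key='id');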
Demo (III) – Create a STREAM on truck_driving_info
[Diagram: same pipeline as before; a KSQL Stream is now defined on top of the truck_driving_info topic]
Demo (III) - Create a STREAM on truck_driving_info
ksql> CREATE STREAM truck_driving_info_s \
        (ts VARCHAR, \
         truckId VARCHAR, \
         driverId BIGINT, \
         routeId BIGINT, \
         eventType VARCHAR, \
         correlationId VARCHAR) \
      WITH (kafka_topic='truck_driving_info', \
            value_format='JSON');
Message
----------------
Stream created
Demo (III) - Create a STREAM on truck_driving_info
ksql> describe truck_driving_info_s;

Field         | Type
---------------------------------
ROWTIME       | BIGINT
ROWKEY        | VARCHAR(STRING)
TS            | VARCHAR(STRING)
TRUCKID       | VARCHAR(STRING)
DRIVERID      | BIGINT
ROUTEID       | BIGINT
EVENTTYPE     | VARCHAR(STRING)
CORRELATIONID | VARCHAR(STRING)
SELECT
Selects rows from a KSQL stream or table
Result of this statement will not be persisted in a Kafka topic and will only be printed out
in the console
from_item is one of the following: stream_name, table_name
SELECT select_expr [, ...]
FROM from_item
[ LEFT JOIN join_table ON join_criteria ]
[ WINDOW window_expression ]
[ WHERE condition ]
[ GROUP BY grouping_expression ]
[ HAVING having_expression ]
[ LIMIT count ];
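Adding a projection and a LIMIT makes the query terminate on its own, which is handy for a quick peek at a stream. A sketch against the stream created in the demo:

ksql> SELECT driverId, eventType \
      FROM truck_driving_info_s \
      WHERE eventType != 'Normal' \
      LIMIT 5;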
Demo (III) – Use SELECT to browse from Stream
[Diagram: the KSQL CLI issues a SELECT against the Stream defined on the truck_driving_info topic]
Demo (III) – Use SELECT to browse from Stream
ksql> SELECT * FROM truck_driving_info_s;
1522847870317 | truck/13/position | 1522847870310 | 44 | 13 | 1390372503 | Normal | -2458274393837068406
1522847870376 | truck/14/position | 1522847870370 | 35 | 14 | 1961634315 | Normal | -2458274393837068406
1522847870418 | truck/21/position | 1522847870410 | 58 | 21 | 137128276 | Normal | -2458274393837068406
1522847870397 | truck/29/position | 1522847870390 | 18 | 29 | 1090292248 | Normal | -2458274393837068406

ksql> SELECT * FROM truck_driving_info_s WHERE eventType != 'Normal';
1522847914246 | truck/11/position | 1522847914240 | 54 | 11 | 1198242881 | Lane Departure | -2458274393837068406
1522847915125 | truck/10/position | 1522847915120 | 93 | 10 | 1384345811 | Overspeed | -2458274393837068406
1522847919216 | truck/12/position | 1522847919210 | 75 | 12 | 24929475 | Overspeed | -2458274393837068406
CREATE STREAM … AS SELECT …
Create a new KSQL stream along with the corresponding Kafka topic and stream the result of the SELECT query as a changelog into the topic
WINDOW clause can only be used if the from_item is a stream
CREATE STREAM stream_name
[WITH ( property_name = expression [, ...] )]
AS SELECT select_expr [, ...]
FROM from_stream [ LEFT | FULL | INNER ]
JOIN [join_table | join_stream]
[ WITHIN [(before TIMEUNIT, after TIMEUNIT) | N TIMEUNIT] ] ON join_criteria
[ WHERE condition ]
[PARTITION BY column_name];
INSERT INTO … SELECT …
Stream the result of the SELECT query into an existing stream and its underlying topic
The schema and partitioning column produced by the query must match the stream's schema and key. If the schema and partitioning column are incompatible with the stream, then the statement will return an error.

stream_name and from_item must both refer to a stream. Tables are not supported!

CREATE STREAM stream_name ...;

INSERT INTO stream_name
SELECT select_expr [, ...]
FROM from_stream
[ WHERE condition ]
[ PARTITION BY column_name ];
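The deck has no demo for INSERT INTO, so here is a minimal sketch; the second source stream dangerous_driving_eu_s is a hypothetical stream with the same schema as dangerous_driving_s:

ksql> CREATE STREAM all_dangerous_events_s AS \
      SELECT * FROM dangerous_driving_s;

ksql> INSERT INTO all_dangerous_events_s \
      SELECT * FROM dangerous_driving_eu_s;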
Demo (IV) – CREATE AS … SELECT …
[Diagram: a detect_dangerous_driving query reads the Stream on truck_driving_info and writes a new dangerous_driving Stream, backed by a Kafka topic of the same name]
Demo (IV) – CREATE AS … SELECT …
ksql> CREATE STREAM dangerous_driving_s \
      WITH (kafka_topic='dangerous_driving_s', \
            value_format='JSON') \
      AS SELECT * FROM truck_driving_info_s \
      WHERE eventtype != 'Normal';
Message
----------------------------
Stream created and running
ksql> select * from dangerous_driving_s;
1522848286143 | truck/15/position | 1522848286125 | 98 | 15 | 987179512 | Overspeed | -2458274393837068406
1522848295729 | truck/11/position | 1522848295720 | 54 | 11 | 1198242881 | Unsafe following distance | -2458274393837068406
1522848313018 | truck/11/position | 1522848313000 | 54 | 11 | 1198242881 | Overspeed | -2458274393837068406
Functions
Scalar Functions
• ABS, ROUND, CEIL, FLOOR
• ARRAYCONTAINS
• CONCAT, SUBSTRING, TRIM
• EXTRACTJSONFIELD
• GEO_DISTANCE
• LCASE, UCASE
• MASK, MASK_KEEP_LEFT,
MASK_KEEP_RIGHT, MASK_LEFT,
MASK_RIGHT
• RANDOM
• STRINGTOTIMESTAMP,
TIMESTAMPTOSTRING
Aggregate Functions
• COUNT
• MAX
• MIN
• SUM
• TOPK
• TOPKDISTINCT
User-Defined Functions (UDF) and User-Defined Aggregate Functions (UDAF)
• Currently only supported using Java
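To give a feel for the scalar functions, a sketch combining ROUND and GEO_DISTANCE on the position stream (created later for the stream-to-stream join); the reference coordinates are arbitrary assumptions:

ksql> SELECT truckId, \
             ROUND(GEO_DISTANCE(latitude, longitude, 38.63, -90.20, 'KM')) \
      FROM truck_position_s;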
Windowing
Introduction to Stream Processing
Since streams are unbounded, you need some meaningful time frames to do computations (e.g. aggregations)
Computations over events done using
windows of data
Windows give the power to keep a
working memory and look back at recent
data efficiently
Windows are tracked per unique key
[Diagram: a window of data within an unbounded stream of data]
• Sliding Window (aka Hopping Window) – uses eviction and trigger policies that are based on time: window length and sliding interval length
• Fixed Window (aka Tumbling Window) – eviction policy always based on the window being full and trigger policy based on either the count of items in the window or time
• Session Window – composed of sequences of temporarily related events terminated by a gap of inactivity greater than some timeout
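In KSQL these window types map to the WINDOW clause. A sketch of the hopping and session variants of the tumbling-window count used in the next demo:

ksql> SELECT eventType, count(*) \
      FROM dangerous_driving_s \
      WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS) \
      GROUP BY eventType;

ksql> SELECT eventType, count(*) \
      FROM dangerous_driving_s \
      WINDOW SESSION (60 SECONDS) \
      GROUP BY eventType;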
Demo (IV) – Aggregate and Window
[Diagram: a count_by_event_type query aggregates the dangerous_driving Stream into a count_by_event_type Table]
Demo (IV) – SELECT COUNT … GROUP BY
ksql> CREATE TABLE dangerous_driving_count AS \
      SELECT eventType, count(*) nof \
      FROM dangerous_driving_s \
      WINDOW TUMBLING (SIZE 30 SECONDS) \
      GROUP BY eventType;

Message
----------------------------
Table created and running

ksql> SELECT TIMESTAMPTOSTRING(ROWTIME, 'yyyy-MM-dd HH:mm:ss.SSS'), \
             eventType, nof \
      FROM dangerous_driving_count;
2018-09-19 20:10:59.587 | Overspeed | 1
2018-09-19 20:11:15.713 | Unsafe following distance | 1
2018-09-19 20:11:39.662 | Unsafe tail distance | 1
2018-09-19 20:12:03.870 | Unsafe following distance | 1
2018-09-19 20:12:04.502 | Overspeed | 1
2018-09-19 20:12:05.856 | Lane Departure | 1
Joining
Introduction to Stream Processing
Challenges of joining streams
1. Data streams need to be aligned as they come because they have different timestamps
2. Since streams are never-ending, the joins must be limited; otherwise the join will never end
3. The join needs to produce results continuously as there is no end to the data

Join variants:
• Stream-to-Static (Table) Join
• Stream-to-Stream Join (one window join)
• Stream-to-Stream Join (two window join)
Demo (V) – Join Table to enrich with Driver data
[Diagram: a jdbc-source connector loads the driver table from the trucking_driver database into the truck_driver topic; a join_dangerous_driving_driver query joins the dangerous_driving Stream with the driver Table and writes the result to the dangerous_driving_driver topic]

Driver row in the database:
27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00

Driver record in Kafka:
{"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}
Demo (V) – Join Table to enrich with Driver data
#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
  -H "Content-Type: application/json" \
  -d $'{
  "name": "jdbc-driver-source",
  "config": {
    "connector.class": "JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db/sample?user=sample&password=sample",
    "mode": "timestamp",
    "timestamp.column.name": "last_update",
    "table.whitelist": "driver",
    "validate.non.null": "false",
    "topic.prefix": "truck_",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "name": "jdbc-driver-source",
    "transforms": "createKey,extractInt",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractInt.field": "id"
  }
}'
Demo (V) - Create Table with Driver State
ksql> CREATE TABLE driver_t \
        (id BIGINT, \
         first_name VARCHAR, \
         last_name VARCHAR, \
         available VARCHAR) \
      WITH (kafka_topic='truck_driver', \
            value_format='JSON', \
            key='id');
Message
----------------
Table created
Demo (V) – Join Stream with Driver Table
ksql> CREATE STREAM dangerous_driving_and_driver_s \
      WITH (kafka_topic='dangerous_driving_and_driver_s', \
            value_format='JSON') \
      AS SELECT driverId, first_name, last_name, truckId, routeId, eventtype \
      FROM dangerous_driving_s \
      LEFT JOIN driver_t \
      ON dangerous_driving_s.driverId = driver_t.id;
Message
----------------------------
Stream created and running
ksql> select * from dangerous_driving_and_driver_s;
1511173352906 | 21 | 21 | Lila | Page | 58 | 1594289134 | Unsafe tail distance
1511173353669 | 12 | 12 | Laurence | Lindsey | 93 | 1384345811 | Lane Departure
1511173435385 | 11 | 11 | Micky | Isaacson | 22 | 1198242881 | Unsafe tail distance
Demo (VI) – Stream-to-Stream Join
[Diagram: in addition to the previous pipeline, a join_dangerous_and_position query joins the dangerous_driving_driver Stream with the truck_position Stream into a dangerous_driving_position Stream]
Demo (VI) – Stream-to-Stream Join
ksql> CREATE STREAM truck_position_s \
        (timestamp VARCHAR, \
         truckId VARCHAR, \
         latitude DOUBLE, \
         longitude DOUBLE) \
      WITH (kafka_topic='truck_position', \
            value_format='JSON');

ksql> SELECT ddad.driverid, ddad.first_name, ddad.last_name, ddad.truckid, \
             ddad.routeid, ddad.eventtype, tp.latitude, tp.longitude \
      FROM dangerous_driving_and_driver_s ddad \
      INNER JOIN truck_position_s tp \
      WITHIN 1 minute \
      ON tp.truckid = ddad.truckid;
11 | Micky | Isaacson | 47 | 1961634315 | Unsafe tail distance | 38.99 | -94.38
11 | Micky | Isaacson | 47 | 1961634315 | Unsafe tail distance | 38.67 | -94.38
12 | Laurence | Lindsey | 52 | 1198242881 | Lane Departure | 38.0 | -94.37
Summary
Summary
KSQL is another way to work with data in Kafka => you can (re)use some of your SQL
knowledge
Similar semantics to SQL, but for queries on continuous, streaming data
Well-suited for structured data (that's the "S" in KSQL)
KSQL is dependent on “Kafka core”
• KSQL consumes from Kafka broker
• KSQL produces to Kafka broker
KSQL runs as a Java application and can be deployed to various resource managers
Use Kafka Connect or any other Stream Data Integration tool to bring your data into
Kafka first
Technology on its own won't help you.
You need to know how to use it properly.
Weitere ähnliche Inhalte

Was ist angesagt?

Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020confluent
 
Overcoming the Perils of Kafka Secret Sprawl (Tejal Adsul, Confluent) Kafka S...
Overcoming the Perils of Kafka Secret Sprawl (Tejal Adsul, Confluent) Kafka S...Overcoming the Perils of Kafka Secret Sprawl (Tejal Adsul, Confluent) Kafka S...
Overcoming the Perils of Kafka Secret Sprawl (Tejal Adsul, Confluent) Kafka S...confluent
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registryconfluent
 
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time PersonalizationUsing Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time PersonalizationPatrick Di Loreto
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaGuido Schmutz
 
Apache Kafka Scalable Message Processing and more!
Apache Kafka Scalable Message Processing and more! Apache Kafka Scalable Message Processing and more!
Apache Kafka Scalable Message Processing and more! Guido Schmutz
 
Big Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingBig Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingAraf Karsh Hamid
 
Apache Kafka - A modern Stream Processing Platform
Apache Kafka - A modern Stream Processing PlatformApache Kafka - A modern Stream Processing Platform
Apache Kafka - A modern Stream Processing PlatformGuido Schmutz
 
How to build 1000 microservices with Kafka and thrive
How to build 1000 microservices with Kafka and thriveHow to build 1000 microservices with Kafka and thrive
How to build 1000 microservices with Kafka and thriveNatan Silnitsky
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaGuido Schmutz
 
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...confluent
 
UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...
UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...
UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...confluent
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Guido Schmutz
 
Building Out Your Kafka Developer CDC Ecosystem
Building Out Your Kafka Developer CDC  EcosystemBuilding Out Your Kafka Developer CDC  Ecosystem
Building Out Your Kafka Developer CDC Ecosystemconfluent
 
Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips confluent
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...confluent
 
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OSPutting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OSLightbend
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Timothy Spann
 
Kafka On YARN (KOYA): An Open Source Initiative to integrate Kafka & YARN
Kafka On YARN (KOYA): An Open Source Initiative to integrate Kafka & YARNKafka On YARN (KOYA): An Open Source Initiative to integrate Kafka & YARN
Kafka On YARN (KOYA): An Open Source Initiative to integrate Kafka & YARNDataWorks Summit
 

Was ist angesagt? (20)

Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
 
Overcoming the Perils of Kafka Secret Sprawl (Tejal Adsul, Confluent) Kafka S...
Overcoming the Perils of Kafka Secret Sprawl (Tejal Adsul, Confluent) Kafka S...Overcoming the Perils of Kafka Secret Sprawl (Tejal Adsul, Confluent) Kafka S...
Overcoming the Perils of Kafka Secret Sprawl (Tejal Adsul, Confluent) Kafka S...
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registry
 
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time PersonalizationUsing Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
 
Apache Kafka Scalable Message Processing and more!
Apache Kafka Scalable Message Processing and more! Apache Kafka Scalable Message Processing and more!
Apache Kafka Scalable Message Processing and more!
 
Big Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingBig Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb Sharding
 
Apache Kafka - A modern Stream Processing Platform
Apache Kafka - A modern Stream Processing PlatformApache Kafka - A modern Stream Processing Platform
Apache Kafka - A modern Stream Processing Platform
 
How to build 1000 microservices with Kafka and thrive
How to build 1000 microservices with Kafka and thriveHow to build 1000 microservices with Kafka and thrive
How to build 1000 microservices with Kafka and thrive
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
 
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
 
UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...
UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...
UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
 
Building Out Your Kafka Developer CDC Ecosystem
Building Out Your Kafka Developer CDC  EcosystemBuilding Out Your Kafka Developer CDC  Ecosystem
Building Out Your Kafka Developer CDC Ecosystem
 
Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
 
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OSPutting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
 
Kafka On YARN (KOYA): An Open Source Initiative to integrate Kafka & YARN
Kafka On YARN (KOYA): An Open Source Initiative to integrate Kafka & YARNKafka On YARN (KOYA): An Open Source Initiative to integrate Kafka & YARN
Kafka On YARN (KOYA): An Open Source Initiative to integrate Kafka & YARN
 

Ähnlich wie KSQL - Stream Processing simplified!

Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLBuilding a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLScyllaDB
 
Kafka Connect & Kafka Streams/KSQL - powerful ecosystem around Kafka core
Kafka Connect & Kafka Streams/KSQL - powerful ecosystem around Kafka coreKafka Connect & Kafka Streams/KSQL - powerful ecosystem around Kafka core
Kafka Connect & Kafka Streams/KSQL - powerful ecosystem around Kafka coreGuido Schmutz
 
Event streaming webinar feb 2020
Event streaming webinar feb 2020Event streaming webinar feb 2020
Event streaming webinar feb 2020Maheedhar Gunturu
 
London Apache Kafka Meetup (Jan 2017)
London Apache Kafka Meetup (Jan 2017)London Apache Kafka Meetup (Jan 2017)
London Apache Kafka Meetup (Jan 2017)Landoop Ltd
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsGuido Schmutz
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKai Wähner
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache MesosJoe Stein
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingYaroslav Tkachenko
 
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...Kai Wähner
 
Productionalizing spark streaming applications
Productionalizing spark streaming applicationsProductionalizing spark streaming applications
Productionalizing spark streaming applicationsRobert Sanders
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
 
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...Michael Noll
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKai Wähner
 
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...HostedbyConfluent
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Codemotion
 

Ähnlich wie KSQL - Stream Processing simplified! (20)

Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLBuilding a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
 
Kafka Connect & Kafka Streams/KSQL - powerful ecosystem around Kafka core
Kafka Connect & Kafka Streams/KSQL - powerful ecosystem around Kafka coreKafka Connect & Kafka Streams/KSQL - powerful ecosystem around Kafka core
Kafka Connect & Kafka Streams/KSQL - powerful ecosystem around Kafka core
 
Event streaming webinar feb 2020
Event streaming webinar feb 2020Event streaming webinar feb 2020
Event streaming webinar feb 2020
 
London Apache Kafka Meetup (Jan 2017)
London Apache Kafka Meetup (Jan 2017)London Apache Kafka Meetup (Jan 2017)
London Apache Kafka Meetup (Jan 2017)
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache Kafka
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processing
 
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
 
Productionalizing spark streaming applications
Productionalizing spark streaming applicationsProductionalizing spark streaming applications
Productionalizing spark streaming applications
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Couchbas for dummies
Couchbas for dummiesCouchbas for dummies
Couchbas for dummies
 
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
 
Jug - ecosystem
Jug -  ecosystemJug -  ecosystem
Jug - ecosystem
 
Chti jug - 2018-06-26
Chti jug - 2018-06-26Chti jug - 2018-06-26
Chti jug - 2018-06-26
 
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
 

Mehr von Guido Schmutz

30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as CodeGuido Schmutz
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureGuido Schmutz
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsGuido Schmutz
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!Guido Schmutz
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Guido Schmutz
 
Event Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureEvent Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureGuido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaGuido Schmutz
 
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) ArchitectureEvent Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) ArchitectureGuido Schmutz
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaGuido Schmutz
 
Location Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache KafkaLocation Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache KafkaGuido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaSolutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaGuido Schmutz
 
What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaGuido Schmutz
 
Location Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaLocation Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaGuido Schmutz
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming VisualisationGuido Schmutz
 
Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Guido Schmutz
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaGuido Schmutz
 
Fundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureFundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureGuido Schmutz
 
Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Guido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming VisualizationGuido Schmutz
 

Mehr von Guido Schmutz (20)

30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data Architecture
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
Event Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureEvent Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data Architecture
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) ArchitectureEvent Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache Kafka
 
Location Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache KafkaLocation Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache Kafka
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaSolutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
 
What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Location Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaLocation Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using Kafka
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming Visualisation
 
Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
 
Fundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureFundamentals Big Data and AI Architecture
Fundamentals Big Data and AI Architecture
 
Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 

Kürzlich hochgeladen

Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
KSQL - Stream Processing simplified!

  • 15. Demo (II) – Connect to MQTT through Kafka Connect
[Diagram: the MQTT topics truck/nn/position and truck/nn/driving-info are ingested through "mqtt to kafka" source connectors into the Kafka topics truck_position and truck_driving_info]
Sample messages:
{"timestamp":1537343400827,"truckId":87,"driverId":13,"routeId":987179512,"eventType":"Normal","correlationId":"-3208700263746910537"}
{"timestamp":1537342514539,"truckId":87,"latitude":38.65,"longitude":-90.21}
  • 17. KSQL: a Streaming SQL Engine for Apache Kafka • Enables stream processing with zero coding required • The simplest way to process streams of data in real-time • Powered by Kafka and Kafka Streams: scalable, distributed, mature • All you need is Kafka – no complex deployments • Available as a developer preview! • STREAM and TABLE as first-class citizens • STREAM = data in motion • TABLE = collected state of a stream • join STREAM and TABLE
  • 19. KSQL Architecture & Components KSQL Server • runs the engine that executes KSQL queries • includes processing, reading, and writing data to and from the target Kafka cluster • KSQL servers form KSQL clusters and can run in containers, virtual machines, and bare-metal machines • You can add and remove servers to/from the same KSQL cluster during live operations to elastically scale KSQL’s processing capacity as desired • You can deploy different KSQL clusters to achieve workload isolation KSQL CLI • You can interactively write KSQL queries by using the KSQL command line interface (CLI) • The KSQL CLI acts as a client to the KSQL server • For production scenarios you may also configure KSQL servers to run in a non-interactive “headless” configuration, thereby preventing KSQL CLI access (see the sketch below)
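For such a headless deployment, a minimal sketch (assuming Confluent's standard start script; the properties and queries file paths are hypothetical):
$ cat /etc/ksql/ksql-server.properties
bootstrap.servers=broker-1:9092
# headless mode: run the queries in this file and disable interactive CLI access
ksql.queries.file=/etc/ksql/queries.sql
$ ksql-server-start /etc/ksql/ksql-server.properties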
  • 20. Demo (IV) – Start Kafka KSQL
$ docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092
[ASCII-art KSQL banner: "Streaming SQL Engine for Kafka", Copyright 2017 Confluent Inc., CLI v0.1, Server v0.1 located at http://localhost:9098]
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
ksql>
  • 21. Terminology Stream • an unbounded sequence of structured data (“facts”) • Facts in a stream are immutable: new facts can be inserted to a stream, but existing facts can never be updated or deleted • Streams can be created from a Kafka topic or derived from an existing stream • A stream’s underlying data is durably stored (persisted) within a Kafka topic on the Kafka brokers Table • a materialized view of events with only the latest value for a key • a view of a stream, or another table, representing a collection of evolving facts • the equivalent of a traditional database table, but enriched by streaming semantics such as windowing • Facts in a table are mutable: new facts can be inserted to the table, and existing facts can be updated or deleted • Tables can be created from a Kafka topic or derived from existing streams and tables
  • 22. CREATE STREAM Create a new stream, backed by a Kafka topic, with the specified columns and properties Supported column data types: • BOOLEAN, INTEGER, BIGINT, DOUBLE, VARCHAR or STRING • ARRAY<ArrayType> • MAP<VARCHAR, ValueType> • STRUCT<FieldName FieldType, ...> Supports the following serialization formats: CSV, JSON, AVRO KSQL adds the implicit columns ROWTIME and ROWKEY to every stream
CREATE STREAM stream_name ( { column_name data_type } [, ...] )
WITH ( property_name = expression [, ...] );
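As a minimal sketch of the complex column types (assuming a hypothetical truck_status topic with a nested JSON payload):
CREATE STREAM truck_status_s (
  truckId VARCHAR,
  tags ARRAY<VARCHAR>,
  attributes MAP<VARCHAR, VARCHAR>,
  position STRUCT<latitude DOUBLE, longitude DOUBLE>
) WITH (kafka_topic='truck_status', value_format='JSON');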
  • 23. CREATE TABLE Create a new table with the specified columns and properties Supports the same data types as CREATE STREAM KSQL adds the implicit columns ROWTIME and ROWKEY to every table as well KSQL currently has the following requirements for creating a table from a Kafka topic: • the message key must also be present as a field/column in the Kafka message value • the message key must be in VARCHAR aka STRING format
CREATE TABLE table_name ( { column_name data_type } [, ...] )
WITH ( property_name = expression [, ...] );
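For example, a sketch assuming a hypothetical truck topic whose JSON value repeats the message key in an id field:
CREATE TABLE truck_t (
  id VARCHAR,
  model VARCHAR,
  maxSpeed DOUBLE
) WITH (kafka_topic='truck', value_format='JSON', key='id');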
  • 24. Demo (III) – Create a STREAM on truck_driving_info
[Diagram: the MQTT topics truck/nn/position and truck/nn/driving-info are ingested through "mqtt to kafka" connectors into the Kafka topics truck_position and truck_driving_info; a KSQL Stream is defined on truck_driving_info]
Sample messages:
{"timestamp":1537343400827,"truckId":87,"driverId":13,"routeId":987179512,"eventType":"Normal","correlationId":"-3208700263746910537"}
{"timestamp":1537342514539,"truckId":87,"latitude":38.65,"longitude":-90.21}
  • 25. Demo (III) – Create a STREAM on truck_driving_info
ksql> CREATE STREAM truck_driving_info_s (ts VARCHAR, truckId VARCHAR, driverId BIGINT, routeId BIGINT, eventType VARCHAR, correlationId VARCHAR)
WITH (kafka_topic='truck_driving_info', value_format='JSON');
Message
----------------
Stream created
  • 26. Demo (III) – Create a STREAM on truck_driving_info
ksql> describe truck_driving_info_s;
Field | Type
---------------------------------
ROWTIME | BIGINT
ROWKEY | VARCHAR(STRING)
TS | VARCHAR(STRING)
TRUCKID | VARCHAR(STRING)
DRIVERID | BIGINT
ROUTEID | BIGINT
EVENTTYPE | VARCHAR(STRING)
CORRELATIONID | VARCHAR(STRING)
  • 27. SELECT Selects rows from a KSQL stream or table The result of this statement will not be persisted in a Kafka topic and will only be printed out in the console from_item is one of the following: stream_name, table_name
SELECT select_expr [, ...]
FROM from_item
[ LEFT JOIN join_table ON join_criteria ]
[ WINDOW window_expression ]
[ WHERE condition ]
[ GROUP BY grouping_expression ]
[ HAVING having_expression ]
[ LIMIT count ];
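For instance, to peek at just a few abnormal events (a sketch against the stream created above):
ksql> SELECT truckId, eventType FROM truck_driving_info_s WHERE eventType != 'Normal' LIMIT 5;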
  • 28. Demo (III) – Use SELECT to browse from Stream
[Diagram: the MQTT topics truck/nn/position and truck/nn/driving-info flow through "mqtt to kafka" connectors into the Kafka topics truck_position and truck_driving_info; the KSQL CLI selects from the stream]
Sample messages:
{"timestamp":1537342514539,"truckId":87,"latitude":38.65,"longitude":-90.21}
{"timestamp":1537343400827,"truckId":87,"driverId":13,"routeId":987179512,"eventType":"Normal","correlationId":"-3208700263746910537"}
  • 29. Demo (III) – Use SELECT to browse from Stream
ksql> SELECT * FROM truck_position_s;
1522847870317 | "truck/13/position" | 1522847870310 | 44 | 13 | 1390372503 | Normal | 41.71 | -91.32 | -2458274393837068406
1522847870376 | "truck/14/position" | 1522847870370 | 35 | 14 | 1961634315 | Normal | 37.66 | -94.3 | -2458274393837068406
1522847870418 | "truck/21/position" | 1522847870410 | 58 | 21 | 137128276 | Normal | 36.17 | -95.99 | -2458274393837068406
1522847870397 | "truck/29/position" | 1522847870390 | 18 | 29 | 1090292248 | Normal | 41.67 | -91.24 | -2458274393837068406
ksql> SELECT * FROM truck_position_s WHERE eventType != 'Normal';
1522847914246 | "truck/11/position" | 1522847914240 | 54 | 11 | 1198242881 | Lane Departure | 40.86 | -89.91 | -2458274393837068406
1522847915125 | "truck/10/position" | 1522847915120 | 93 | 10 | 1384345811 | Overspeed | 40.38 | -89.17 | -2458274393837068406
1522847919216 | "truck/12/position" | 1522847919210 | 75 | 12 | 24929475 | Overspeed | 42.23 | -91.78 | -2458274393837068406
  • 30. CREATE STREAM … AS SELECT … Create a new KSQL stream along with the corresponding Kafka topic and stream the result of the SELECT query as a changelog into the topic The WINDOW clause can only be used if the from_item is a stream
CREATE STREAM stream_name
[WITH ( property_name = expression [, ...] )]
AS SELECT select_expr [, ...]
FROM from_stream
[ LEFT | FULL | INNER ] JOIN [join_table | join_stream]
[ WITHIN [(before TIMEUNIT, after TIMEUNIT) | N TIMEUNIT] ] ON join_criteria
[ WHERE condition ]
[PARTITION BY column_name];
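PARTITION BY re-keys the derived stream; a minimal sketch (assuming the dangerous_driving_s stream created later in the demo, with a hypothetical target topic name):
CREATE STREAM dangerous_by_driver_s WITH (kafka_topic='dangerous_by_driver', value_format='JSON') AS
SELECT * FROM dangerous_driving_s
PARTITION BY driverId;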
  • 31. INSERT INTO … AS SELECT … Stream the result of the SELECT query into an existing stream and its underlying topic The schema and partitioning column produced by the query must match the stream’s schema and key If the schema and partitioning column are incompatible with the stream, then the statement will return an error stream_name and from_item must both refer to a Stream. Tables are not supported!
CREATE STREAM stream_name ...;
INSERT INTO stream_name
SELECT select_expr [, ...]
FROM from_stream
[ WHERE condition ]
[ PARTITION BY column_name ];
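A minimal sketch (the second source stream truck_driving_info_eu_s is hypothetical and assumed to be schema-compatible with the target):
INSERT INTO truck_driving_info_s
SELECT * FROM truck_driving_info_eu_s
WHERE eventType != 'Normal';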
  • 32. Demo (IV) – CREATE AS … SELECT …
[Diagram: the detect_dangerous_driving query reads the stream on truck_driving_info and writes abnormal events to a new dangerous_driving stream, backed by its own Kafka topic]
Sample messages:
{"timestamp":1537342514539,"truckId":87,"latitude":38.65,"longitude":-90.21}
{"timestamp":1537343400827,"truckId":87,"driverId":13,"routeId":987179512,"eventType":"Normal","correlationId":"-3208700263746910537"}
  • 33. Demo (IV) – CREATE AS … SELECT …
ksql> CREATE STREAM dangerous_driving_s WITH (kafka_topic='dangerous_driving_s', value_format='JSON') AS
SELECT * FROM truck_position_s WHERE eventtype != 'Normal';
Message
----------------------------
Stream created and running
ksql> select * from dangerous_driving_s;
1522848286143 | "truck/15/position" | 1522848286125 | 98 | 15 | 987179512 | Overspeed | 34.78 | -92.31 | -2458274393837068406
1522848295729 | "truck/11/position" | 1522848295720 | 54 | 11 | 1198242881 | Unsafe following distance | 38.43 | -90.35 | -2458274393837068406
1522848313018 | "truck/11/position" | 1522848313000 | 54 | 11 | 1198242881 | Overspeed | 41.87 | -87.67 | -2458274393837068406
  • 34. Functions Scalar Functions • ABS, ROUND, CEIL, FLOOR • ARRAYCONTAINS • CONCAT, SUBSTRING, TRIM • EXTRACTJSONFIELD • GEO_DISTANCE • LCASE, UCASE • MASK, MASK_KEEP_LEFT, MASK_KEEP_RIGHT, MASK_LEFT, MASK_RIGHT • RANDOM • STRINGTOTIMESTAMP, TIMESTAMPTOSTRING Aggregate Functions • COUNT • MAX • MIN • SUM • TOPK • TOPKDISTINCT User-Defined Functions (UDF) and User-Defined Aggregate Functions (UDAF) • Currently only supported using Java
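A quick sketch of some of the scalar functions applied to the demo stream:
SELECT truckId,
       UCASE(eventType) AS event_type,
       TIMESTAMPTOSTRING(ROWTIME, 'yyyy-MM-dd HH:mm:ss') AS event_time
FROM dangerous_driving_s;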
  • 35. Windowing Introduction to Stream Processing Since streams are unbounded, you need some meaningful time frames to do computations (i.e. aggregations) Computations over events are done using windows of data Windows give you the power to keep a working memory and look back at recent data efficiently Windows are tracked per unique key
[Diagram: a window of data sliding along a stream of data over time]
  • 36. Windowing Introduction to Stream Processing • Sliding Window (aka Hopping Window) – uses eviction and trigger policies that are based on time: window length and sliding interval length • Fixed Window (aka Tumbling Window) – eviction policy always based on the window being full, trigger policy based on either the count of items in the window or time • Session Window – composed of sequences of temporally related events terminated by a gap of inactivity greater than some timeout (see the KSQL sketches below)
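The three window types map onto KSQL's WINDOW clause; sketches over the demo stream (window sizes are illustrative):
SELECT eventType, COUNT(*) FROM dangerous_driving_s WINDOW HOPPING (SIZE 60 SECONDS, ADVANCE BY 30 SECONDS) GROUP BY eventType;
SELECT eventType, COUNT(*) FROM dangerous_driving_s WINDOW TUMBLING (SIZE 60 SECONDS) GROUP BY eventType;
SELECT eventType, COUNT(*) FROM dangerous_driving_s WINDOW SESSION (300 SECONDS) GROUP BY eventType;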
  • 37. Demo (IV) – Aggregate and Window
[Diagram: detect_dangerous_driving derives the dangerous_driving stream from truck_driving_info; a count_by_eventType query aggregates it into the count_by_event_type table]
  • 38. Demo (IV) – SELECT COUNT … GROUP BY
ksql> CREATE TABLE dangerous_driving_count AS
SELECT eventType, count(*) nof
FROM dangerous_driving_s
WINDOW TUMBLING (SIZE 30 SECONDS)
GROUP BY eventType;
Message
----------------------------
Table created and running
ksql> SELECT TIMESTAMPTOSTRING(ROWTIME, 'yyyy-MM-dd HH:mm:ss.SSS'), eventType, nof
FROM dangerous_driving_count;
2018-09-19 20:10:59.587 | Overspeed | 1
2018-09-19 20:11:15.713 | Unsafe following distance | 1
2018-09-19 20:11:39.662 | Unsafe tail distance | 1
2018-09-19 20:12:03.870 | Unsafe following distance | 1
2018-09-19 20:12:04.502 | Overspeed | 1
2018-09-19 20:12:05.856 | Lane Departure | 1
  • 39. Joining Introduction to Stream Processing Challenges of joining streams: 1. Data streams need to be aligned as they come because they have different timestamps 2. Since streams are never-ending, the joins must be limited; otherwise the join will never end 3. The join needs to produce results continuously as there is no end to the data
[Diagram: three join variants over time – Stream-to-Static (Table) Join, Stream-to-Stream Join (one window join), Stream-to-Stream Join (two window join)]
  • 40. Demo (V) – Join Table to enrich with Driver data
[Diagram: a JDBC source connector copies the driver table from the trucking_driver database into the truck_driver topic, e.g. the row 27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00 becomes {"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}; the join_dangerous_driving_driver query joins the dangerous_driving stream with the driver table into the dangerous_driving_driver stream]
  • 41. Demo (V) – Join Table to enrich with Driver data
#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" -H "Content-Type: application/json" -d $'{
  "name": "jdbc-driver-source",
  "config": {
    "connector.class": "JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db/sample?user=sample&password=sample",
    "mode": "timestamp",
    "timestamp.column.name": "last_update",
    "table.whitelist": "driver",
    "validate.non.null": "false",
    "topic.prefix": "truck_",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "transforms": "createKey,extractInt",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractInt.field": "id"
  }
}'
  • 42. Demo (V) – Create Table with Driver State
ksql> CREATE TABLE driver_t (id BIGINT, first_name VARCHAR, last_name VARCHAR, available VARCHAR)
WITH (kafka_topic='truck_driver', value_format='JSON', key='id');
Message
----------------
Table created
  • 43. Demo (V) – Join Stream with Driver Table
ksql> CREATE STREAM dangerous_driving_and_driver_s WITH (kafka_topic='dangerous_driving_and_driver_s', value_format='JSON') AS
SELECT driverId, first_name, last_name, truckId, routeId, eventtype
FROM dangerous_driving_s
LEFT JOIN driver_t ON dangerous_driving_s.driverId = driver_t.id;
Message
----------------------------
Stream created and running
ksql> select * from dangerous_driving_and_driver_s;
1511173352906 | 21 | 21 | Lila | Page | 58 | 1594289134 | Unsafe tail distance
1511173353669 | 12 | 12 | Laurence | Lindsey | 93 | 1384345811 | Lane Departure
1511173435385 | 11 | 11 | Micky | Isaacson | 22 | 1198242881 | Unsafe tail distance
  • 44. Demo (VI) – Stream-to-Stream Join
[Diagram: the dangerous_driving_driver stream (dangerous events enriched with driver data) is joined with the truck_position stream by the join_dangerous_and_position query, producing the dangerous_driving_position stream]
  • 45. Demo (VI) – Stream-to-Stream Join
ksql> CREATE STREAM truck_position_s (timestamp VARCHAR, truckId VARCHAR, latitude DOUBLE, longitude DOUBLE)
WITH (kafka_topic='truck_position', value_format='JSON');
ksql> SELECT ddad.driverid, ddad.first_name, ddad.last_name, ddad.truckid, ddad.routeid, ddad.eventtype, tp.latitude, tp.longitude
FROM dangerous_driving_and_driver_s ddad
INNER JOIN truck_position_s tp WITHIN 1 minute ON tp.truckid = ddad.truckid;
11 | Micky | Isaacson | 47 | 1961634315 | Unsafe tail distance | 38.99 | -94.38
11 | Micky | Isaacson | 47 | 1961634315 | Unsafe tail distance | 38.67 | -94.38
12 | Laurence | Lindsey | 52 | 1198242881 | Lane Departure | 38.0 | -94.37
  • 47. Summary KSQL is another way to work with data in Kafka => you can (re)use some of your SQL knowledge Similar semantics to SQL, but for queries on continuous, streaming data Well-suited for structured data (there is the ”S” in KSQL) KSQL is dependent on “Kafka core” • KSQL consumes from the Kafka broker • KSQL produces to the Kafka broker KSQL runs as a Java application and can be deployed to various resource managers Use Kafka Connect or any other Stream Data Integration tool to bring your data into Kafka first
  • 48. Technology on its own won't help you. You need to know how to use it properly.