3. Confidential
Confluent is ...
…a company founded by the original creators of Apache Kafka
...a distributed streaming platform
• Built on Apache Kafka
• Confluent Open Source
• Confluent Enterprise
All components except Kafka
are optional when running Confluent.
Mix-and-match them as required.
4.
Kafka Streams is ...
… the easiest way to process data in Kafka (as of v0.10)
• Easy to use library
• Real stream processing / record by record / ms latency
• DSL
• Focus on applications
• No cluster / “cluster to-go”
• “DB to-go”
• Expressive
• Single record transformations
• Aggregations / Joins
• Time, windowing, out-of-order data
• Stream-table duality
• Tightly integrated within Kafka
• Fault-tolerant
• Scalable (s/m/l/xl), elastic
• Encryption, authentication, authorization
• Stateful
• Backed by Kafka
• Queryable / “DB to-go”
• Data reprocessing
• Application “reset button”
7.
Before Kafka Streams
Do-it-yourself stream Processing
• Hard to get right / lots of “glue code”
• Fault-tolerance / scalability … ???
Using a framework
• Requires a cluster
• Bare metal – hard to manage
• YARN / Mesos
• Test locally – deploy remotely
• “Can you please deploy my code?”
• Jar and dependency hell
How does your application interact with
your stream processing job?
Plain consumer/producer clients, and others …
13.
How to install Kafka Streams?
Not at all. It’s a library.
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>0.10.1.0</version>
</dependency>
14.
How do I deploy my app?
Whatever works for you. It’s just an app as any other!
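Since it is just a regular Java application, a minimal skeleton looks like the following sketch against the 0.10.1 `KStreamBuilder` API used throughout these slides. The application id, topic names, and bootstrap address are placeholder assumptions, not values from the talk:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class MyStreamsApp {
    public static void main(String[] args) {
        Properties config = new Properties();
        // application id doubles as the consumer group id -- placeholder name
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        config.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // build a trivial topology: read, transform record by record, write
        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        input.mapValues(v -> v.toUpperCase()).to("output-topic");

        // start it like any other JVM process -- no cluster submission step
        KafkaStreams streams = new KafkaStreams(builder, config);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

You run this with plain `java -jar …`, `systemd`, Docker, or whatever your ops setup already uses.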
15.
If it’s just a regular application…
• How does it scale?
• How can it be fault-tolerant?
• How does it handle distributed state?
Off-load hard problems to brokers.
• Kafka is a streaming platform: no need to reinvent the wheel
• Exploit consumer groups and group management protocol
25.
KStream/KTable
• KStream
• Record stream
• Each record describes an event in the real world
• Example: click stream
• KTable
• Changelog stream
• Each record describes a change to a previous record
• Example: position report stream
• In Kafka Streams:
• KTable holds a materialized view of the latest update per key as internal state
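The difference between the two readings can be shown without any Kafka code at all. The sketch below is plain Java, not the Kafka Streams API: for the same two input records for key "alice", the record-stream reading aggregates both events, while the changelog reading keeps only the latest update:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamTableDuality {
    // One keyed record, e.g. ("alice", 1)
    static final class Update {
        final String key;
        final int value;
        Update(String key, int value) { this.key = key; this.value = value; }
    }

    // KStream reading: every record is an independent event -> aggregate (sum) them all
    static int asStreamSum(List<Update> records) {
        int sum = 0;
        for (Update r : records) sum += r.value;
        return sum;
    }

    // KTable reading: every record is an UPDATE to its key -> only the latest value survives
    static Map<String, Integer> asTable(List<Update> records) {
        Map<String, Integer> table = new HashMap<>();
        for (Update r : records) table.put(r.key, r.value);
        return table;
    }

    public static void main(String[] args) {
        List<Update> records = Arrays.asList(new Update("alice", 1), new Update("alice", 2));
        System.out.println("as stream (sum): " + asStreamSum(records));            // 3
        System.out.println("as table (latest): " + asTable(records).get("alice")); // 2
    }
}
```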
27.
KTable (count moves)
Record stream (input): (alice, paris), (bob, zurich), (alice, berlin)
count() updates the KTable state and emits one changelog record per input record:
• after (alice, paris): KTable state { alice: 0 } → changelog output (alice, 0)
• after (bob, zurich): KTable state { alice: 0, bob: 0 } → changelog output (bob, 0)
• after (alice, berlin): KTable state { alice: 1, bob: 0 } → changelog output (alice, 1)
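The count-moves walkthrough can be replayed in plain Java (this is not the Kafka Streams API; a `HashMap` stands in for the KTable's state store):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountMoves {
    // `state` plays the KTable's materialized view; the returned list is the
    // changelog stream -- one update record per input record.
    static List<String> changelog(List<String> users, Map<String, Integer> state) {
        List<String> out = new ArrayList<>();
        for (String user : users) {
            // first sighting -> 0 moves; each further record for the key is one more move
            int count = state.containsKey(user) ? state.get(user) + 1 : 0;
            state.put(user, count);       // update the materialized view
            out.add(user + ":" + count);  // emit a changelog record
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> state = new HashMap<>();
        List<String> log = changelog(Arrays.asList("alice", "bob", "alice"), state);
        System.out.println(log); // [alice:0, bob:0, alice:1]
    }
}
```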
28.
KTable (cont.)
• Internal state:
• Continuously updating materialized view of the latest status
• Downstream result (“output”)
• Changelog stream, describing every update to the materialized view
KStream stream = …
KTable table = stream.aggregate(...)
It’s the changelog!
30.
Time and Windows
• Event time (default)
• Create time
• (Broker) Ingestion time
• Customized
• (Hopping) Time windows
• Overlapping or non-overlapping (tumbling)
• For aggregations
• Processing Time
• Sliding windows
• For KStream-KStream joins
KStream stream = …
KTable table = stream.aggregate(TimeWindows.of(10 * 1000L), ...);
31.
KTable Semantics
• Non-windowed:
• State is kept forever:
• Out-of-order/late-arriving records can be handled in a straightforward way
• KTable aggregation can be viewed as a landmark window (i.e., window size == infinite)
• Output is a changelog stream
• Windowed:
• Windows (i.e., state) are kept “forever” (well, there is a configurable retention time)
• Out-of-order/late-arriving records can be handled in a straightforward way
• Output is a changelog stream
• No watermarks required
• Early updates/results
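Why no watermarks are needed can be sketched in plain Java (not the Kafka Streams API; window size and timestamps are made-up values). Because window state is retained, a late record simply produces one more update for its old window:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WindowedCount {
    static final long SIZE_MS = 10_000L; // 10-second tumbling windows

    // State is keyed by window start: a late record updates its (old) window's
    // count once more -- no watermark needed, only retained window state.
    static Map<Long, Long> count(List<Long> eventTimestamps) {
        Map<Long, Long> countsPerWindow = new LinkedHashMap<>();
        for (long ts : eventTimestamps) {
            long windowStart = ts - (ts % SIZE_MS); // tumbling window the event falls into
            countsPerWindow.merge(windowStart, 1L, Long::sum);
        }
        return countsPerWindow;
    }

    public static void main(String[] args) {
        // events at t=1s and t=12s, then a LATE record for t=3s
        System.out.println(count(Arrays.asList(1_000L, 12_000L, 3_000L)));
        // {0=2, 10000=1} -- the late record updated window [0s,10s) again
    }
}
```

Each such update is emitted downstream as an early/updated result for that window, i.e., the changelog-stream output the slide describes.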
33.
Page Views per Region
[Diagram: stream-table join — the click stream <userId : page> is joined with the profile changelog <userId : region> (current user info per key), then counted to produce page views per region <region : page>]
34.
Page Views per Region
final KStreamBuilder builder = new KStreamBuilder();
// read record stream from topic "PageViews" and changelog stream from topic "UserProfiles"
final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>
35.
Page Views per Region
final KStreamBuilder builder = new KStreamBuilder();
// read record stream from topic "PageViews" and changelog stream from topic "UserProfiles"
final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>
// enrich page views with user’s region -- stream-table-join
final KStream<String, String> viewsWithRegionKey = views.leftJoin(userProfiles,
    (page, userRegion) -> page + "," + userRegion )
    // and set "region" as new key
    .map( (userId, pageAndRegion) -> new KeyValue<>(pageAndRegion.split(",")[1], pageAndRegion.split(",")[0]) );
36.
Page Views per Region
final KStreamBuilder builder = new KStreamBuilder();
// read record stream from topic "PageViews" and changelog stream from topic "UserProfiles"
final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>
// enrich page views with user’s region -- stream-table-join AND set “region” as new key
final KStream<String, String> viewsWithRegionKey = views.leftJoin(userProfiles, ...).map(...); // <region : page>
// count views by region, using hopping windows of size 5 minutes that advance every 1 minute
final KTable<Windowed<String>, Long> viewsPerRegion = viewsWithRegionKey
.groupByKey() // redistribute data
.count(TimeWindows.of(5 * 60 * 1000L).advanceBy(60 * 1000L), "GeoPageViewsStore");
37.
Page Views per Region
final KStreamBuilder builder = new KStreamBuilder();
// read record stream from topic "PageViews" and changelog stream from topic "UserProfiles"
final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>
// enrich page views with user’s region -- stream-table-join AND set “region” as new key
final KStream<String, String> viewsWithRegionKey = views.leftJoin(userProfiles, ...).map(...); // <region : page>
// count views by region, using hopping windows of size 5 minutes that advance every 1 minute
final KTable<Windowed<String>, Long> viewsByRegion = viewsWithRegionKey.groupByKey().count(TimeWindows.of(...)..., ...);
// write result
viewsByRegion.toStream( (windowedRegion, count) -> windowedRegion.toString() ) // prepare result
    .to(stringSerde, longSerde, "PageViewsByRegion"); // write to topic "PageViewsByRegion"
38.
Page Views per Region
final KStreamBuilder builder = new KStreamBuilder();
// read record stream from topic "PageViews" and changelog stream from topic "UserProfiles"
final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>
// enrich page views with user’s region -- stream-table-join AND set “region” as new key
final KStream<String, String> viewsWithRegionKey = views.leftJoin(userProfiles, ...).map(...); // <region : page>
// count views by region, using hopping windows of size 5 minutes that advance every 1 minute
final KTable<Windowed<String>, Long> viewsByRegion = viewsWithRegionKey.groupByKey().count(TimeWindows.of(...)..., ...);
// write result to topic "PageViewsByRegion"
viewsByRegion.toStream(...).to(..., "PageViewsByRegion");
// start application
final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration); // streamsConfiguration omitted for brevity
streams.start();
// stop application
streams.close();
/* https://github.com/confluentinc/examples/blob/3.1.x/kafka-streams/src/main/java/io/confluent/examples/streams/PageViewRegionExample.java */
39.
Interactive Queries
• KTable is a changelog stream with materialized internal view (state)
• KStream-KTable join can do lookups into the materialized view
• What if the application could do lookups, too?
https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
Yes, it can!
“DB to-go”
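A lookup via the interactive-queries API (introduced in 0.10.1) looks roughly like this fragment. The store name "WordCountStore" is a placeholder assumption for a store created earlier, e.g. by `stream.groupByKey().count("WordCountStore")`:

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// given a running application (see slide 38):
// KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
// streams.start();

// fetch a read-only handle on the locally materialized view of the store
ReadOnlyKeyValueStore<String, Long> store =
        streams.store("WordCountStore", QueryableStoreTypes.keyValueStore());
Long count = store.get("alice"); // key lookup served from local state
```

This serves only the keys hosted by the local instance; for keys on other instances the application has to discover and query the owning instance, as described in the linked blog post.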