Kafka Streams provides exactly-once stream processing by coordinating transactions across Kafka topics. Records are processed and written to output topics, offset commits and state updates occur atomically using transactions. This ensures each record is processed once even if failures occur. Connectors will extend exactly-once processing to data from outside Kafka.
2. Outline
• Stream processing with Kafka
• Exactly-once for stream processing
• How Kafka Streams enabled exactly-once
2
3. 3
Stream Processing with Kafka
Process
State
Ads Clicks
Ads Displays
Billing Updates
Fraud Suspects
Your App
4. 4
Stream Processing with Kafka
Process
State
Ads Clicks
Ads Displays
Billing Updates
Fraud Suspects
ack
ack
commit
Your App
5. 5
Stream Processing: Do itYourself
while (isRunning) {
// read some messages from Kafka
inputMessages = consumer.poll();
// do some processing…
// send output messages back to Kafka, wait for ack
producer.send(outputMessages).get();
// commit offsets for processed messages
consumer.commit(..);
}
6. 6
• Ordering
• Partitioning &
Scalability
• Fault tolerance
• State Management
• Time, Window &
Out-of-order Data
• Re-processing
DIY Stream Processing is Hard
8. 8
Exactly-Once
An application property for stream processing,
.. that for each received record,
.. its process results will be reflected exactly once,
.. even under failures
9. 9
Error Scenario #1: Duplicate Writes
Process
State
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
ack
Streams App
10. 10
Error Scenario #1: Duplicate Writes
Process
State
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
ack
producer config: retries = N (default = 0)
Streams App
11. 11
Error Scenario #2: Re-process
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
commit
ack
ack
State
Process
Streams App
12. 12
Error Scenario #2: Re-process
State
Process
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
State
Streams App
19. 19
• Building blocks to achieve exactly-once
• Idempotence: de-duped sends in order per partition
• Transactions: atomic multiple-sends across topic partitions
• Kafka Streams: enable exactly-once in a single knob
Exactly-once, the Kafka Way!(0.11+)
20. Kafka Streams (0.10+)
• New client library beyond producer and consumer
• Powerful yet easy-to-use
• Event time, stateful processing
• Out-of-order handling
• Highly scalable, distributed, fault tolerant
• and more..
20
24. Kafka Streams DSL
24
public static void main(String[] args) {
// specify the processing topology by first reading in a stream from a topic
KStream<String, String> words = builder.stream(”topic1”);
// count the words in this stream as an aggregated table
KTable<String, Long> counts = words.groupBy(..).count(”Counts”);
// write the result table to a new topic
counts.to(”topic2”);
// create a streams client and start running it
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
}
25. Kafka Streams DSL
25
public static void main(String[] args) {
// specify the processing topology by first reading in a stream from a topic
KStream<String, String> words = builder.stream(”topic1”);
// count the words in this stream as an aggregated table
KTable<String, Long> counts = words.groupBy(..).count(”Counts”);
// write the result table to a new topic
counts.to(”topic2”);
// create a streams client and start running it
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
}
26. Kafka Streams DSL
26
public static void main(String[] args) {
// specify the processing topology by first reading in a stream from a topic
KStream<String, String> words = builder.stream(”topic1”);
// count the words in this stream as an aggregated table
KTable<String, Long> counts = words.groupBy(..).count(”Counts”);
// write the result table to a new topic
counts.to(”topic2”);
// create a streams client and start running it
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
}
27. Kafka Streams DSL
27
public static void main(String[] args) {
// specify the processing topology by first reading in a stream from a topic
KStream<String, String> words = builder.stream(”topic1”);
// count the words in this stream as an aggregated table
KTable<String, Long> counts = words.groupBy(..).count(”Counts”);
// write the result table to a new topic
counts.to(”topic2”);
// create a streams client and start running it
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
}
31. State Store
31
State
KStream<..> stream1 = builder.stream(”topic1”);
KStream<..> stream2 = builder.stream(”topic2”);
KStream<..> joined = stream1.leftJoin(stream2, ...);
KTable<..> aggregated = joined.groupBy(…).count(”store”);
aggregated.to(”topic3”);
32. Kafka Streams DSL
32
public static void main(String[] args) {
// specify the processing topology by first reading in a stream from a topic
KStream<String, String> words = builder.stream(”topic1”);
// count the words in this stream as an aggregated table
KTable<String, Long> counts = words.groupBy(..).count(”Counts”);
// write the result table to a new topic
counts.to(”topic2”);
// create a streams client and start running it
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
}
40. 40
Process
State
Kafka Topic C
Kafka Topic D
ack
ack
Kafka Topic A
Kafka Topic B
commit
Exactly-once with Kafka
• Acked produce to sink topics
• Offset commit for source topics
• State update on processor
41. 41
• Acked produce to sink topics
• Offset commit for source topics
• State update on processor
Exactly-Once with Kafka
42. 42
• Acked produce to sink topics
• Offset commit for source topics
• State update on processor
All or Nothing
Exactly-Once with Kafka
43. 43
Exactly-Once with Kafka Streams (0.11+)
• Acked produce to sink topics
• Offset commit for source topics
• State update on processor
44. 44
Exactly-Once with Kafka Streams (0.11+)
• Acked produce to sink topics
• A batch of records sent to the offset topic
• State update on processor
45. 45
• Acked produce to sink topics
• A batch of records sent to the offset topic
Exactly-Once with Kafka Streams (0.11+)
• A batch of records sent to changelog topics
46. 46
Exactly-Once with Kafka Streams (0.11+)
• A batch of records sent to sink topics
• A batch of records sent to the offset topic
• A batch of records sent to changelog topics
47. 47
Exactly-Once with Kafka Streams (0.11+)
All or Nothing
• A batch of records sent to sink topics
• A batch of records sent to the offset topic
• A batch of records sent to changelog topics