Markus Günther provides an overview of Apache Kafka. Kafka is a distributed publish-subscribe messaging system that supports topic access semantics. Producers publish data to topics and consumers subscribe to topics of interest to consume data at their own pace. Kafka uses a persistent commit log to implement messaging, with publishers appending messages and consumers reading sequentially. It supports at-least-once and exactly-once delivery guarantees.
3. 3
Adding more systems increases the complexity of
communication channels in this kind of architecture.
[Diagram: six systems connected to one another by direct communication channels]
4. 4
A messaging solution can be used to decouple producing systems
from consuming systems and thus remove that complexity.
[Diagram: three producers and three consumers connected through a messaging solution]
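To put a number on the complexity argument of the last two slides, here is a small sketch. It assumes the worst case in which every system talks directly to every other system and counts each bidirectional channel once; the function names are made up for illustration.

```python
def point_to_point_channels(systems: int) -> int:
    """Channels needed if every system talks directly to every other one."""
    return systems * (systems - 1) // 2


def brokered_channels(systems: int) -> int:
    """Channels needed if every system only connects to the messaging solution."""
    return systems


# Compare both topologies for a growing number of systems.
for n in (3, 6, 12):
    print(n, point_to_point_channels(n), brokered_channels(n))
```

Under these assumptions the direct approach grows quadratically with the number of systems, while the brokered approach grows linearly.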
5. 5
Apache Kafka supports this communication model.
[Diagram: three producers and three consumers connected through an Apache Kafka cluster]
6. 6
Producers publish data to specific topics, consumers subscribe to
topics of interest and consume data at their own pace.
[Diagram: three producers publish to Topic A, Topic B, and Topic C; five consumers subscribe to these topics]
7. 7
Apache Kafka is a distributed publish-subscribe messaging
system that supports topic access semantics.
Intentions
▪ Designed for near-real-time processing of events
▪ Supports multiple delivery semantics
  ▪ At-least-once
  ▪ Exactly-once (well, not quite)
▪ Optimized binary protocol for client-to-broker communication
▪ No integration with JMS, …
History
▪ Apache Kafka originated at LinkedIn
▪ Maintained by the Apache Software Foundation
▪ Confluent drives further development
▪ Confluent provides various system components that enrich the Kafka ecosystem
8. 8
Apache Kafka is a distributed publish-subscribe messaging
system that supports topic access semantics. (cont.)
Innovations
▪ Messages are acknowledged in order
▪ Messages are persisted for days, weeks, or indefinitely
▪ Consumers manage their offsets
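Because consumers manage their own offsets, the order in which a consumer processes a message and commits its offset decides what happens after a crash. The following is a minimal simulation of that idea, not Kafka client code: the log is a plain list, and `consume`, `crash_after`, and `commit_first` are hypothetical names. The commit-first variant (often called at-most-once) is included only for contrast.

```python
def consume(log, committed, crash_after=None, commit_first=False):
    """Read `log` starting at the committed offset.

    Returns (processed_messages, new_committed_offset). `crash_after` is the
    offset at which the simulated consumer dies between its two steps.
    """
    processed = []
    offset = committed
    while offset < len(log):
        if commit_first:
            committed = offset + 1         # commit first, then process
            if crash_after == offset:
                return processed, committed
            processed.append(log[offset])
        else:
            processed.append(log[offset])  # process first, then commit
            if crash_after == offset:
                return processed, committed
            committed = offset + 1
        offset += 1
    return processed, committed


log = ["a", "b", "c"]

# Process-then-commit: the crash at offset 1 leaves "b" uncommitted,
# so the restarted consumer sees "b" a second time (at-least-once).
first, committed = consume(log, 0, crash_after=1)
second, committed = consume(log, committed)
print(first, second)

# Commit-then-process: the crash at offset 1 skips "b" for good.
first, committed = consume(log, 0, crash_after=1, commit_first=True)
second, committed = consume(log, committed)
print(first, second)
```

With process-then-commit, the consuming application has to tolerate duplicates, for example by making its processing idempotent.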
9. 9
Kafka uses a persistent log to implement publish-subscribe
messaging. Publishers append, consumers read sequentially.
[Diagram: a log with offsets 0 to 9; the producer publishes by appending to it; the consumers in consumer groups A and B read at current positions 8 and 3]
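The log idea from this slide can be sketched in a few lines. This is an in-memory toy, assuming a single log and one position per consumer group; the class `Log` and its methods are illustrative names and not part of any Kafka API. For brevity the log object stores the positions itself, whereas in Kafka the consumers are responsible for their offsets.

```python
class Log:
    """Append-only log with an independent read position per consumer group."""

    def __init__(self):
        self._records = []
        self._positions = {}  # consumer group -> next offset to read

    def append(self, value):
        """Publish a record at the end of the log and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def poll(self, group, max_records=1):
        """Read sequentially from the group's position and advance it."""
        start = self._positions.get(group, 0)
        batch = self._records[start:start + max_records]
        self._positions[group] = start + len(batch)
        return batch

    def position(self, group):
        return self._positions.get(group, 0)


log = Log()
for i in range(10):
    log.append(f"event-{i}")

log.poll("A", max_records=8)  # group A reads ahead
log.poll("B", max_records=3)  # group B lags behind without affecting A
print(log.position("A"), log.position("B"))
```

Reading never removes records, which is why each group can consume at its own pace.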
11. 11
A Kafka topic consists of at least one partition.
[Diagram: topic with 3 partitions (Partition 0, Partition 1, Partition 2) of different lengths, holding offsets 0 to 8, 0 to 1, and 0 to 4]
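A rough sketch of how records could end up in different partitions of one topic: the partition is chosen by hashing the record key, so records with the same key always land in the same partition and keep their relative order there. The CRC32 hash is a stand-in chosen because it ships with the Python standard library; it is not the hash that Kafka clients use. `Topic` and `choose_partition` are hypothetical names.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (stand-in for Kafka's own)."""
    return zlib.crc32(key) % num_partitions


class Topic:
    def __init__(self, name: str, num_partitions: int):
        if num_partitions < 1:
            raise ValueError("a topic needs at least one partition")
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key: bytes, value: str):
        """Append to the key's partition; return (partition, offset)."""
        partition = choose_partition(key, len(self.partitions))
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1


topic = Topic("orders", num_partitions=3)
print(topic.publish(b"customer-1", "created"))
print(topic.publish(b"customer-1", "paid"))  # same partition, next offset
```

Offsets are only meaningful within a single partition, which is why ordering holds per partition rather than across the whole topic.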
12. 12
Consumers in the same consumer group share the read workload of a topic
that has as many partitions as the group has consumers.
[Diagram: topic with 3 partitions read by a consumer group of three consumers]
13. 13
Kafka redistributes work if a consumer process fails and is no
longer able to process messages.
[Diagram: topic with 3 partitions and a consumer group of three consumers]
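The sharing and redistribution of work described on the last two slides can be illustrated with a simple round-robin assignment. This is a deliberately simplified stand-in for the assignment strategies that Kafka offers, and it leaves out how a failed consumer is detected; `assign` and the consumer names are made up.

```python
def assign(partitions, consumers):
    """Distribute partitions round-robin over the live consumers of a group."""
    members = sorted(consumers)
    if not members:
        raise ValueError("a consumer group needs at least one live consumer")
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2]

# Three consumers, three partitions: one partition each.
print(assign(partitions, ["c1", "c2", "c3"]))

# c2 fails: its partition is handed to one of the remaining consumers.
print(assign(partitions, ["c1", "c3"]))
```

Every partition is read by exactly one consumer of the group at any time, so a group with more consumers than partitions would leave some consumers idle.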
14. 14
A message (or record, or event, or what-have-you) contains
metadata alongside the actual message payload.
▪ Headers (optional)
▪ Key (optional)
▪ Value (set by application)
▪ Timestamp (set by Kafka or by application)
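The four parts of a message listed above map naturally onto a small value type. The sketch below is an illustration, not the record class of any Kafka client: the name `Record`, the field names, and the choice of millisecond timestamps are assumptions. The default timestamp mimics the case in which the application does not set one itself.

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    value: bytes                                   # payload, set by application
    key: Optional[bytes] = None                    # optional
    headers: Tuple[Tuple[str, bytes], ...] = ()    # optional name/value pairs
    # Falls back to the current wall-clock time if the application sets none.
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))


plain = Record(value=b"temperature=21.5")
keyed = Record(
    value=b"temperature=21.5",
    key=b"sensor-7",
    headers=(("content-type", b"text/plain"),),
    timestamp_ms=1_700_000_000_000,
)
print(plain.key, keyed.key, keyed.timestamp_ms)
```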
15. 15
Topic-partitions are spread across available brokers and can thus
span multiple machines in an Apache Kafka cluster.
[Diagram: Apache Kafka cluster hosting a topic with 3 partitions, replication factor = 1; Broker 1 holds partition 2, Broker 2 holds partition 0, Broker 3 holds partition 1]
16. 16
Topic-partitions are spread across available brokers and can thus
span multiple machines in an Apache Kafka cluster.
[Diagram: Apache Kafka cluster hosting a topic with 3 partitions, replication factor = 2; Broker 1 holds leader-partition 2 and follower-partition 0, Broker 2 holds leader-partition 0 and follower-partition 1, Broker 3 holds leader-partition 1 and follower-partition 2]
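One way to picture how leaders and followers end up on different brokers is a rotating placement. The function below is a simplified illustration and does not reproduce the placement logic of a real Kafka cluster; `place_replicas` is a hypothetical name. The layout it produces differs from the diagram in detail but shares its properties: each partition's replicas sit on distinct brokers, and leadership is spread evenly.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers, rotating the start."""
    if not 1 <= replication_factor <= len(brokers):
        raise ValueError("replication factor must be between 1 and the broker count")
    placement = {}
    for partition in range(num_partitions):
        replicas = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
        placement[partition] = {"leader": replicas[0], "followers": replicas[1:]}
    return placement


layout = place_replicas(3, ["broker-1", "broker-2", "broker-3"], replication_factor=2)
for partition, replicas in layout.items():
    print(partition, replicas)
```

With a replication factor of 2, losing a single broker still leaves one replica of every partition available.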
17. 17
The in-sync replica set (ISR) of a topic-partition contains the leader
and every follower that is caught up with it.
[Diagram: in-sync replica set for partition 0; Broker 2 holds leader-partition 0 and Broker 1 holds follower-partition 0, connected by "replicate" and "acknowledge" arrows]
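The replicate and acknowledge interplay can be modeled with a small sketch. It assumes a heavily simplified view in which a record counts as committed once every member of the in-sync replica set has it, so the committed boundary (the high watermark) is the smallest log end offset across that set. The class and method names are hypothetical, and the rules by which a real broker removes a lagging follower from the set are left out.

```python
class Partition:
    def __init__(self, leader, followers):
        self.leader = leader
        # Next offset to be written, tracked per replica.
        self.log_end = {replica: 0 for replica in [leader, *followers]}
        self.isr = {leader, *followers}

    def append(self, count=1):
        """The leader appends new records to its local log."""
        self.log_end[self.leader] += count

    def replicate(self, follower):
        """The follower copies everything the leader has, acknowledging it."""
        self.log_end[follower] = self.log_end[self.leader]

    def shrink_isr(self, follower):
        """Drop a follower that no longer keeps up from the in-sync set."""
        if follower == self.leader:
            raise ValueError("the leader is always part of the ISR")
        self.isr.discard(follower)

    def high_watermark(self):
        """Offsets below this value are replicated to all in-sync replicas."""
        return min(self.log_end[replica] for replica in self.isr)


partition = Partition(leader="broker-2", followers=["broker-1"])
partition.append(3)
print(partition.high_watermark())  # nothing committed before the follower catches up
partition.replicate("broker-1")
print(partition.high_watermark())  # all three records are now committed
```

In this model, a follower that stops replicating holds back the high watermark until it either catches up or is removed from the set.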
19. 19
A reference architecture helps us to sort things into categories that
are driven by certain (non-)functional requirements.
[Diagram: reference architecture with five tiers (Collection Tier, Messaging Tier, Analysis Tier, Persistence Tier, Data Access Tier); components shown: collection services for MQTT and HTTP, Topics 1 to 3, Subscribers 1 to 3 (stream processors), a cache, a search engine, an RDBMS, and a client application]
20. 20
Apache Kafka features a rich ecosystem of supporting services that
fit nicely into the tiers of a streaming architecture.
[Diagram: the same five tiers (Collection Tier, Messaging Tier, Analysis Tier, Persistence Tier, Data Access Tier); components shown: Kafka Connect (source connector), Kafka Client DSL (producing system), a Kafka cluster with Topics 1 to 3, Confluent Schema Registry, Confluent REST Proxy, Kafka Streams DSL or ksqlDB (stream processor), Kafka Client DSL (consuming system), Kafka Connect (sink connector), a search engine, an RDBMS, and a client application]
21. 21
Want to know more?
Books
▪ Narkhede N., Shapira G., Palino T., Kafka - The Definitive Guide: Real-time data and stream processing at scale, O'Reilly, 2nd Edition, 2021
▪ Koutanov E., Effective Kafka: A Hands-On Guide to Building Robust and Scalable Event-Driven Applications, Independently published, 2020
▪ Kreps J., I Heart Logs: Event Data, Stream Processing, and Data Integration, O'Reilly, 2014
▪ Seymour M., Mastering Kafka Streams and ksqlDB: Building Real-Time Data Systems by Example, O'Reilly, 2021
▪ Dunning T., Friedman E., Streaming Architecture: New Designs Using Apache Kafka and MapR Streams, O'Reilly, 2016
▪ Akidau T., Chernyak S., Lax R., Streaming Systems, O'Reilly, 2018
▪ Young G., Versioning in an Event-sourced system, Leanpub, 2017
22. 22
Want to know more?
Magazines
▪ Fresow B., Günther M., Nachrichten aus dem Archiv: Event-gestützte Applikationen mit Spring Kafka (Teil 3), JavaMagazin, 3/2018, p. 90-98
▪ Fresow B., Günther M., Briefe vom Windrad: Event-gestützte Applikationen mit Spring Kafka (Teil 2), JavaMagazin, 2/2018, p. 80-87
▪ Fresow B., Günther M., Frühlingsbotschaften: Event-gestützte Applikationen mit Spring Kafka (Teil 1), JavaMagazin, 1/2018, p. 73-77
▪ Günther M., Datenserialisierung mit Apache Avro, JavaSPEKTRUM, 5/2017, p. 35-38
▪ Günther M., Streaming-Applikationen mit Kafka Streams, JavaSPEKTRUM, 4/2017, p. 54-58
▪ Günther M., Skalierfähige, asynchrone Nachrichtenverarbeitung mit Apache Kafka, JavaSPEKTRUM, 3/2017, p. 48-51
23. 23
Want to know more?
Other
▪ Confluent Developer Portal, https://developer.confluent.io/
▪ Various blogs on testing, data exploration, etc., https://www.mguenther.net/tag/kafka.html/
GitHub
▪ Kafka for JUnit on GitHub, https://mguenther.github.io/kafka-junit/
▪ User Guide to Kafka for JUnit, https://mguenther.github.io/kafka-junit/
▪ Event-sourcing using Spring Kafka, https://github.com/mguenther/spring-kafka-event-sourcing-sampler
▪ Spring Kafka for Large-Scale Event Processing, https://github.com/mguenther/spring-kafka-event-processing-sampler
▪ Introduction to Spring Kafka, https://github.com/mguenther/spring-kafka-introduction