Streaming Data with Apache Kafka

Markus Günther
Freelance Software Engineer / Architect
mail@mguenther.net | mguenther.net | @markus_guenther

Point-to-point communication is simple to maintain – especially
if there is only a small number of systems involved.
[Diagram: two systems connected directly to each other.]

Adding more systems increases the complexity of
communication channels in this kind of architecture.
[Diagram: six systems connected point-to-point.]

A messaging solution can be used to decouple producing systems
from consuming systems and thus remove that complexity.
[Diagram: three producers and three consumers, all connected through a central messaging solution.]
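
The growth in complexity can be made concrete with a little arithmetic. The sketch below assumes the worst case for point-to-point integration, in which every system talks directly to every other system, and compares it with a setup in which each system holds exactly one connection to the messaging solution. The function names are made up for this illustration.

```python
def point_to_point_channels(systems: int) -> int:
    """Channels needed if every system talks directly to every other one."""
    return systems * (systems - 1) // 2


def brokered_channels(systems: int) -> int:
    """Channels needed if every system connects only to the messaging solution."""
    return systems


# Compare both styles for a growing number of systems.
for n in (2, 6, 20):
    print(n, point_to_point_channels(n), brokered_channels(n))
```

With two systems the direct link is the cheaper option, which matches the earlier observation that point-to-point communication is fine for a small number of systems. The quadratic term takes over quickly as systems are added.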

Apache Kafka supports this communication model.
[Diagram: three producers and three consumers, all connected through an Apache Kafka cluster.]

Producers publish data to specific topics, consumers subscribe to
topics of interest and consume data at their own pace.
[Diagram: three producers publish to Topic A, Topic B, and Topic C; five consumers subscribe to these topics.]

Apache Kafka is a distributed publish-subscribe messaging
system that supports topic access semantics.
Intentions
▪ Designed for near-real-time processing of events
▪ Supports multiple delivery semantics
  ▪ At-least-once
  ▪ Exactly-once (well, not quite)
▪ Optimized binary protocol for client-to-broker communication
▪ No integration with JMS, …
History
▪ Apache Kafka originated at LinkedIn
▪ Maintained by the Apache Software Foundation
▪ Confluent drives further development
▪ Confluent provides various system components that enrich the Kafka ecosystem

Apache Kafka is a distributed publish-subscribe messaging
system that supports topic access semantics. (cont.)
Innovations
▪ Messages are acknowledged in order
▪ Messages are persisted for days, weeks, or indefinitely
▪ Consumers manage their offsets
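
Because consumers manage their own offsets, the point at which an offset is committed determines the delivery semantics. The following is a minimal sketch of at-least-once consumption, assuming a plain Python list as the log and a hypothetical `crash_before_commit_at` parameter to simulate a failure. It is a toy model and does not use the Kafka client API.

```python
def consume_at_least_once(log, committed, handler, crash_before_commit_at=None):
    """Process records starting at `committed`; return the new committed offset.

    The offset is committed only after the handler has run, so a crash between
    handling and committing causes that record to be handled again on restart.
    """
    offset = committed
    while offset < len(log):
        handler(log[offset])
        if offset == crash_before_commit_at:
            return committed  # simulated crash: the commit for this record is lost
        offset += 1
        committed = offset  # commit after successful processing
    return committed


log = ["a", "b", "c"]
seen = []

# First run crashes after handling "b" but before committing it.
committed = consume_at_least_once(log, 0, seen.append, crash_before_commit_at=1)
# The restart resumes from the last committed offset and handles "b" again.
committed = consume_at_least_once(log, committed, seen.append)
print(seen, committed)
```

No record is lost, but one is processed twice. Handlers in an at-least-once setup therefore usually need to be idempotent.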

Kafka uses a persistent log to implement publish-subscribe
messaging. Publishers append, consumers read sequentially.
[Diagram: a log with offsets 0 to 9. A producer publishes by appending to the log; one consumer in consumer group A and one in consumer group B each read from their own current position (positions 8 and 3).]
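
The log abstraction can be sketched in a few lines. The classes below are a toy in-memory model, not Kafka code: `Log` and `GroupReader` are hypothetical names, and a record's offset is simply its index in a Python list. The point is that appending and reading are independent, and that each consumer group keeps its own position.

```python
class Log:
    """Toy append-only log; a record's offset is its index in the list."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records):
        return self._records[offset:offset + max_records]


class GroupReader:
    """Reads a log sequentially and remembers its own position."""

    def __init__(self, log, group):
        self.log = log
        self.group = group
        self.position = 0  # next offset to read

    def poll(self, max_records=1):
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)
        return batch


log = Log()
for i in range(10):
    log.append(f"event-{i}")

group_a = GroupReader(log, "A")
group_b = GroupReader(log, "B")
group_a.poll(max_records=8)
group_b.poll(max_records=3)
print(group_a.position, group_b.position)
```

Reading does not remove anything from the log, so a slow group never holds back a fast one.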

A Kafka topic consists of at least one partition.
[Diagram: a topic with 3 partitions. Partition 0 holds offsets 0 to 8, partition 1 holds offsets 0 to 1, and partition 2 holds offsets 0 to 4.]
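
A partitioned topic can be pictured as several independent logs under one name. The sketch below is a toy model with a hypothetical `Topic` class. It assumes the caller picks the partition explicitly and shows that offsets are counted per partition rather than per topic.

```python
class Topic:
    """Toy topic: one independent append-only list per partition."""

    def __init__(self, name, num_partitions=1):
        if num_partitions < 1:
            raise ValueError("a topic needs at least one partition")
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, value):
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1  # offset within this partition only

    def end_offsets(self):
        """Next offset to be written, per partition."""
        return [len(log) for log in self.partitions]


topic = Topic("example", num_partitions=3)
# Fill the partitions unevenly: 9, 2 and 5 records.
for partition, count in enumerate((9, 2, 5)):
    for i in range(count):
        topic.append(partition, f"p{partition}-{i}")

print(topic.end_offsets())
```

Since every partition numbers its records on its own, ordering is only defined within a partition, not across the whole topic.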

Consumers in the same consumer group share the read workload of a topic.
In this example, the topic has as many partitions as the group has consumers.
[Diagram: three partitions and a consumer group of three consumers; each consumer reads from one partition.]
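
How a group shares the work can be sketched as a mapping from partitions to consumers. The function below is an illustrative round-robin assignment under the assumption that each partition is read by exactly one consumer of the group. The function name and the consumer names are invented, and real Kafka clients offer several assignment strategies that this does not reproduce.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


# As many consumers as partitions: one partition each.
print(assign_partitions([0, 1, 2], ["c1", "c2", "c3"]))
# Fewer consumers than partitions: someone reads two partitions.
print(assign_partitions([0, 1, 2], ["c1", "c2"]))
# More consumers than partitions: the extra consumer stays idle.
print(assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"]))
```

The number of partitions therefore caps the useful parallelism within a single consumer group.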

Kafka redistributes work if a consumer process fails and is no
longer able to process messages.
[Diagram: three partitions and a consumer group of three consumers, illustrating reassignment of partitions after a consumer failure.]
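
The outcome of such a redistribution can be sketched as follows. This models only the result, assuming the failure has already been detected; how Kafka detects a failed consumer and which assignment strategy it applies are outside this toy example. `rebalance_after_failure` is a hypothetical helper that hands the orphaned partitions to the least-loaded survivors.

```python
def rebalance_after_failure(assignment, failed):
    """Return a new assignment without the failed consumer.

    Partitions of the failed consumer go to the surviving consumers,
    least-loaded first (ties broken by consumer name).
    """
    survivors = {c: list(p) for c, p in assignment.items() if c != failed}
    if not survivors:
        raise RuntimeError("no consumer left in the group")
    for partition in assignment.get(failed, []):
        target = min(survivors, key=lambda c: (len(survivors[c]), c))
        survivors[target].append(partition)
    return survivors


before = {"c1": [0], "c2": [1], "c3": [2]}
after = rebalance_after_failure(before, failed="c2")
print(after)
```

Every partition still has exactly one reader after the rebalance, so no data is skipped; the surviving consumers simply carry more load.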

A message (or record, or event, or what-have-you) contains
metadata alongside the actual message payload.
▪ Headers (optional)
▪ Key (optional)
▪ Value (set by application)
▪ Timestamp (set by Kafka or by application)
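
The record layout maps naturally onto a small data type. The dataclass below is an illustrative model, not the class used by any Kafka client: the field names, the byte-string types, and the millisecond timestamp are assumptions. The timestamp falls back to the current time when the application does not provide one, standing in for the case where Kafka sets it.

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


def _now_ms() -> int:
    return int(time.time() * 1000)


@dataclass(frozen=True)
class Record:
    value: bytes                                     # payload, set by the application
    key: Optional[bytes] = None                      # optional
    headers: Tuple[Tuple[str, bytes], ...] = ()      # optional (name, value) pairs
    timestamp_ms: int = field(default_factory=_now_ms)  # defaulted if not set


# Only the value is mandatory.
plain = Record(value=b"temperature=21.5")

# All parts set explicitly by the application.
full = Record(
    value=b"temperature=21.5",
    key=b"sensor-7",
    headers=(("trace-id", b"abc"),),
    timestamp_ms=1_700_000_000_000,
)
print(plain.key, full.key)
```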

Topic-partitions are spread across available brokers and can thus
span multiple machines in an Apache Kafka cluster.
Topic with 3 partitions, replication factor = 1
▪ Broker 1: Partition 2
▪ Broker 2: Partition 0
▪ Broker 3: Partition 1

Topic-partitions are spread across available brokers and can thus
span multiple machines in an Apache Kafka cluster.
Topic with 3 partitions, replication factor = 2
▪ Broker 1: Leader-partition 2, Follower-partition 0
▪ Broker 2: Leader-partition 0, Follower-partition 1
▪ Broker 3: Leader-partition 1, Follower-partition 2
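
Replica placement can be sketched as a simple round-robin over the brokers. The function below is a simplified illustration with hypothetical names. It does not reproduce the exact layout shown on the slide, and the strategy real brokers use is more elaborate. It only demonstrates the two properties that matter here: leaders are spread across brokers, and the replicas of one partition never share a broker.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Assign a leader and followers to each partition, round-robin."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for p in range(num_partitions):
        # Consecutive brokers, starting at a different broker per partition.
        replicas = [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        placement[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return placement


brokers = ["broker-1", "broker-2", "broker-3"]
layout = place_replicas(3, brokers, replication_factor=2)
for partition, replicas in layout.items():
    print(partition, replicas)
```

With a replication factor of 1 the follower lists are empty, which corresponds to the unreplicated topic shown earlier.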

The In-Sync-Replica set (ISR) of a topic-partition contains the leader and
every follower that is currently caught up with the leader.
[Diagram: In-Sync-Replica set for partition 0. Broker 2 hosts leader-partition 0 and replicates to follower-partition 0 on Broker 1, which acknowledges.]
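
The interplay of replication and acknowledgement can be sketched with a toy model of a single partition. The class below is an assumption-laden illustration, not broker code: replicas are tracked only by the offset up to which they have stored the log, a follower leaves the ISR through an explicit call rather than through lag detection, and a record counts as committed once every in-sync replica has it.

```python
class PartitionReplicas:
    """Toy model of one topic-partition with a leader and its followers."""

    def __init__(self, leader, followers):
        self.leader = leader
        # Next offset to be written, per replica.
        self.log_end = {replica: 0 for replica in [leader, *followers]}
        self.isr = {leader, *followers}

    def append(self):
        """The leader appends one record; returns its offset."""
        self.log_end[self.leader] += 1
        return self.log_end[self.leader] - 1

    def replicate(self, follower):
        """The follower catches up with the leader and thereby acknowledges."""
        self.log_end[follower] = self.log_end[self.leader]

    def drop_from_isr(self, follower):
        if follower == self.leader:
            raise ValueError("the leader is always part of the ISR")
        self.isr.discard(follower)

    def committed_up_to(self):
        """Records below this offset are stored on every in-sync replica."""
        return min(self.log_end[replica] for replica in self.isr)


partition_0 = PartitionReplicas(leader="broker-2", followers=["broker-1"])
partition_0.append()
partition_0.append()
before_ack = partition_0.committed_up_to()  # follower has not replicated yet
partition_0.replicate("broker-1")
after_ack = partition_0.committed_up_to()
print(before_ack, after_ack)
```

If a follower is removed from the ISR, the leader no longer waits for it, which trades some redundancy for availability.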

A reference architecture helps us to sort things into categories that
are driven by certain (non-)functional requirements.
[Diagram: a pipeline of five tiers: Collection Tier, Messaging Tier, Analysis Tier, Persistence Tier, Data Access Tier. Components shown: Collection Service (MQTT), Collection Service (HTTP), Topic 1 to Topic 3, Subscriber 1 to Subscriber 3 (Stream Processors), Cache, Search Engine, RDBMS, Client Application.]

Apache Kafka features a rich ecosystem of supporting services that
fit nicely into the tiers of a streaming architecture.
[Diagram: the five tiers (Collection Tier, Messaging Tier, Analysis Tier, Persistence Tier, Data Access Tier) populated with Kafka ecosystem components. Components shown: Kafka Connect (Source Connector), Kafka Client DSL (Producing System), Topic 1 to Topic 3 in a Kafka Cluster, Kafka Streams DSL or ksqlDB (Stream Processor), Kafka Client DSL (Consuming System), Kafka Connect (Sink Connector), Confluent Schema Registry, Confluent REST Proxy, Search Engine, RDBMS, Client Application.]

Want to know more?
Books
▪ Narkhede N., Shapira G., Palino T., Kafka: The Definitive Guide: Real-time data and stream processing at scale, O'Reilly, 2nd Edition, 2021
▪ Koutanov E., Effective Kafka: A Hands-On Guide to Building Robust and Scalable Event-Driven Applications, Independently published, 2020
▪ Kreps J., I Heart Logs: Event Data, Stream Processing, and Data Integration, O'Reilly, 2014
▪ Seymour M., Mastering Kafka Streams and ksqlDB: Building Real-Time Data Systems by Example, O'Reilly, 2021
▪ Dunning T., Friedman E., Streaming Architecture: New Designs Using Apache Kafka and MapR Streams, O'Reilly, 2016
▪ Akidau T., Chernyak S., Lax R., Streaming Systems, O'Reilly, 2018
▪ Young G., Versioning in an Event-sourced system, Leanpub, 2017

Want to know more?
Magazines
▪ Fresow B., Günther M., Nachrichten aus dem Archiv: Event-gestützte Applikationen mit Spring Kafka (Teil 3), JavaMagazin, 3/2018, pp. 90-98
▪ Fresow B., Günther M., Briefe vom Windrad: Event-gestützte Applikationen mit Spring Kafka (Teil 2), JavaMagazin, 2/2018, pp. 80-87
▪ Fresow B., Günther M., Frühlingsbotschaften: Event-gestützte Applikationen mit Spring Kafka (Teil 1), JavaMagazin, 1/2018, pp. 73-77
▪ Günther M., Datenserialisierung mit Apache Avro, JavaSPEKTRUM, 5/2017, pp. 35-38
▪ Günther M., Streaming-Applikationen mit Kafka Streams, JavaSPEKTRUM, 4/2017, pp. 54-58
▪ Günther M., Skalierfähige, asynchrone Nachrichtenverarbeitung mit Apache Kafka, JavaSPEKTRUM, 3/2017, pp. 48-51

Want to know more?
GitHub / Other
▪ Confluent Developer Portal, https://developer.confluent.io/
▪ Various blogs on testing, data exploration, etc., https://www.mguenther.net/tag/kafka.html/
▪ Kafka for JUnit on GitHub, https://mguenther.github.io/kafka-junit/
▪ User Guide to Kafka for JUnit, https://mguenther.github.io/kafka-junit/
▪ Event-sourcing using Spring Kafka, https://github.com/mguenther/spring-kafka-event-sourcing-sampler
▪ Spring Kafka for Large-Scale Event Processing, https://github.com/mguenther/spring-kafka-event-processing-sampler
▪ Introduction to Spring Kafka, https://github.com/mguenther/spring-kafka-introduction

Questions?
mail@mguenther.net | mguenther.net | @markus_guenther
