Streaming Data with Apache Kafka
Markus Günther
Freelance Software Engineer / Architect
mail@mguenther.net | mguenther.net | @markus_guenther
Point-to-point communication is simple to maintain, especially if only a small number of systems is involved.
[Figure: two systems connected directly to each other]
Adding more systems increases the number of communication channels, and with it the complexity of this kind of architecture.
[Figure: six systems with point-to-point connections between them]
A messaging solution can be used to decouple producing systems from consuming systems and thus remove that complexity.
[Figure: three producers and three consumers, all connected to a central messaging solution]
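The growth in complexity can be put into numbers. Assuming, for illustration, that every system talks to every other system, n systems need n(n-1)/2 point-to-point channels, whereas with a central messaging solution each system needs just one connection. A small Python sketch of that arithmetic (the function names are hypothetical):

```python
def point_to_point_channels(num_systems: int) -> int:
    """Channels in a full mesh, assuming every system talks to every other one."""
    return num_systems * (num_systems - 1) // 2


def brokered_connections(num_systems: int) -> int:
    """Connections when every system talks only to one central messaging solution."""
    return num_systems


for n in (2, 6, 20):
    print(f"{n:>2} systems: {point_to_point_channels(n):>3} point-to-point channels, "
          f"{brokered_connections(n):>2} connections to the messaging solution")
```

For two systems the direct channel is the cheaper option; for the six systems shown above, a full mesh already needs 15 channels against 6 broker connections, and for 20 systems it needs 190 against 20.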
Apache Kafka supports this communication model.
[Figure: three producers and three consumers, all connected to an Apache Kafka cluster]
Producers publish data to specific topics; consumers subscribe to topics of interest and consume data at their own pace.
[Figure: three producers publish to Topic A, Topic B and Topic C; five consumers subscribe to these topics]
Apache Kafka is a distributed publish-subscribe messaging system that supports topic access semantics.
Intentions
▪ Designed for near-real-time processing of events
▪ Supports multiple delivery semantics
  ▪ At-least-once
  ▪ Exactly-once (well, not quite: the guarantee covers read-process-write cycles within Kafka and builds on idempotent producers and transactions)
▪ Optimized binary protocol for client-to-broker communication
▪ No integration with JMS, …
History
▪ Apache Kafka originated at LinkedIn
▪ Maintained by the Apache Software Foundation
▪ Confluent drives further development
▪ Confluent provides various system components that enrich the Kafka ecosystem
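At-least-once delivery typically results from committing a consumer's offset only after a record has been processed: if the consumer crashes between processing and committing, the record is delivered again after the restart. The following Python sketch simulates this with an in-memory log; all names are hypothetical and no Kafka client is involved. It also shows how an idempotent (deduplicating) sink turns duplicate deliveries into an exactly-once effect. Kafka's own exactly-once support relies on idempotent producers and transactions instead; the sketch only illustrates the underlying idea.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PartitionLog:
    records: list[str]
    committed_offset: int = 0  # where a restarted consumer resumes reading


@dataclass
class Sink:
    """Hypothetical downstream store that can optionally deduplicate by offset."""
    idempotent: bool
    seen: set[int] = field(default_factory=set)
    writes: list[str] = field(default_factory=list)

    def write(self, offset: int, value: str) -> None:
        if self.idempotent and offset in self.seen:
            return  # redelivered record: already applied, so skip it
        self.seen.add(offset)
        self.writes.append(value)


def run_consumer(log: PartitionLog, sink: Sink, crash_at: Optional[int] = None) -> bool:
    """Process records and commit each offset only *after* processing (at-least-once).

    If crash_at is set, the consumer 'crashes' right after processing that offset,
    before committing it. Returns True if the consumer reached the end of the log.
    """
    offset = log.committed_offset
    while offset < len(log.records):
        sink.write(offset, log.records[offset])
        if offset == crash_at:
            return False  # processed, but the commit below never happens
        log.committed_offset = offset + 1  # commit the next offset to read
        offset += 1
    return True


# Without deduplication, the record at offset 1 is applied twice after the restart.
log = PartitionLog(["a", "b", "c"])
plain = Sink(idempotent=False)
run_consumer(log, plain, crash_at=1)
run_consumer(log, plain)
print(plain.writes)  # ['a', 'b', 'b', 'c']

# With an idempotent sink, the redelivered record has no additional effect.
log = PartitionLog(["a", "b", "c"])
deduplicated = Sink(idempotent=True)
run_consumer(log, deduplicated, crash_at=1)
run_consumer(log, deduplicated)
print(deduplicated.writes)  # ['a', 'b', 'c']
```

Committing before processing would flip the trade-off: no duplicates, but a crash could lose a record (at-most-once).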
Apache Kafka is a distributed publish-subscribe messaging system that supports topic access semantics. (cont.)
Innovations
▪ Messages are acknowledged in order (committing an offset acknowledges all earlier messages of that partition)
▪ Messages are persisted for days, weeks or indefinitely
▪ Consumers manage their own offsets
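How long messages are kept is configured per topic, for example via `retention.ms` (a value of -1 removes the time limit). Retention works on whole log segments rather than on individual messages. The following Python sketch models time-based retention under simplifying assumptions: the `Segment` type and `apply_time_retention` function are hypothetical, and the last segment is never deleted here.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Segment:
    base_offset: int       # offset of the first record stored in this segment
    max_timestamp_ms: int  # timestamp of the newest record in this segment


def apply_time_retention(segments: list[Segment], now_ms: int, retention_ms: int) -> list[Segment]:
    """Drop whole segments from the head of the log once they exceed the retention time.

    retention_ms == -1 means 'keep forever', mirroring retention.ms = -1.
    Simplification: the last segment is always kept in this sketch.
    """
    kept = list(segments)
    if retention_ms == -1:
        return kept
    while len(kept) > 1 and now_ms - kept[0].max_timestamp_ms > retention_ms:
        kept.pop(0)
    return kept


segments = [Segment(0, 1 * DAY_MS), Segment(100, 5 * DAY_MS), Segment(200, 9 * DAY_MS)]
remaining = apply_time_retention(segments, now_ms=10 * DAY_MS, retention_ms=7 * DAY_MS)
print([s.base_offset for s in remaining])  # [100, 200]: the log now starts at offset 100
```

The remaining records keep their offsets; only the start of the log moves forward, which is why consumers can continue from their stored positions.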
Kafka uses a persistent log to implement publish-subscribe messaging. Publishers append, consumers read sequentially.
[Figure: a log with offsets 0 to 9; a producer appends new records at the end; a consumer in consumer group A is at position 8, a consumer in consumer group B at position 3]
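The log model can be sketched in a few lines of Python, assuming a single in-memory log and hypothetical `Log` and `GroupConsumer` classes. Each consumer group keeps its own position, and reading never removes data, so group A and group B can sit at positions 8 and 3 of the same log.

```python
class Log:
    """Append-only log: producers append, consumers read sequentially by offset."""

    def __init__(self) -> None:
        self._records: list[str] = []

    def append(self, value: str) -> int:
        self._records.append(value)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset: int, max_records: int = 10) -> list[str]:
        return self._records[offset:offset + max_records]

    def end_offset(self) -> int:
        return len(self._records)


class GroupConsumer:
    """Tracks the position of one consumer group; the log itself is never modified."""

    def __init__(self, log: Log, group_id: str) -> None:
        self.log = log
        self.group_id = group_id
        self.position = 0  # next offset this group will read

    def poll(self, max_records: int = 10) -> list[str]:
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)
        return batch


log = Log()
for i in range(10):
    log.append(f"event-{i}")

group_a = GroupConsumer(log, "A")
group_b = GroupConsumer(log, "B")
group_a.poll(8)  # group A has consumed offsets 0 to 7, its position is 8
group_b.poll(3)  # group B lags behind at position 3
```

Because positions are per group, a slow consumer group does not hold back a fast one, and a new group can start reading from offset 0 at any time.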
A Kafka topic consists of at least one partition.
[Figure: a topic with 3 partitions; partition 0 holds offsets 0 to 8, partition 1 offsets 0 to 1, partition 2 offsets 0 to 4]
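Each partition is an independent log with its own offsets, and ordering is only guaranteed within a partition. Records that share a key are therefore routed to the same partition. The following Python sketch shows that routing with a hypothetical `Topic` class. It substitutes CRC32 for the murmur2 hash used by Kafka's Java client, so actual partition numbers will differ, and it spreads keyless records round-robin (recent Kafka clients use a "sticky" strategy instead).

```python
import zlib
from typing import Optional


class Topic:
    """A topic made of independent partitions, each with its own offsets."""

    def __init__(self, name: str, num_partitions: int) -> None:
        if num_partitions < 1:
            raise ValueError("a topic needs at least one partition")
        self.name = name
        self.partitions: list[list[tuple[Optional[str], str]]] = [[] for _ in range(num_partitions)]
        self._next = 0  # round-robin counter for records without a key

    def partition_for(self, key: Optional[str]) -> int:
        if key is None:
            p = self._next % len(self.partitions)
            self._next += 1
            return p
        # Stand-in hash: Kafka's Java client uses murmur2, so real partition numbers differ.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key: Optional[str], value: str) -> tuple[int, int]:
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset within that partition)


orders = Topic("orders", 3)
print(orders.send("customer-42", "created"))
print(orders.send("customer-42", "paid"))  # same partition, next offset
```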
Consumers that belong to the same consumer group share the read workload of a topic: its partitions are divided among the group's consumers, and each partition is read by exactly one consumer of the group.
[Figure: a topic with 3 partitions read by a consumer group of three consumers]
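How partitions are divided depends on the configured assignment strategy. The Python sketch below is modelled loosely on Kafka's range strategy for a single topic: consumers are sorted, each receives a contiguous range of len(partitions) // len(consumers) partitions, and the first len(partitions) % len(consumers) consumers get one extra. The function name `range_assign` is hypothetical.

```python
def range_assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Split one topic's partitions into contiguous ranges, one range per consumer."""
    if not consumers:
        return {}
    members = sorted(consumers)
    parts = sorted(partitions)
    base, extra = divmod(len(parts), len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = base + (1 if i < extra else 0)  # the first `extra` members get one more
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


print(range_assign([0, 1, 2], ["consumer-1", "consumer-2", "consumer-3"]))
# {'consumer-1': [0], 'consumer-2': [1], 'consumer-3': [2]}
```

With three partitions, a fourth consumer in the same group would receive an empty list and stay idle, which is why the partition count caps the parallelism of a consumer group.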
If a consumer process fails and is no longer able to process messages, Kafka redistributes its partitions among the remaining consumers of the group.
[Figure: the same topic with 3 partitions and a consumer group of three consumers, one of which has failed]
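Failed consumers are detected via heartbeats: if the group coordinator receives no heartbeat from a consumer within its `session.timeout.ms`, the consumer is removed from the group and a rebalance reassigns its partitions. The Python sketch below models only that idea; the function names and timestamps are hypothetical, and it deals partitions out round-robin for brevity rather than using one of Kafka's actual assignment strategies.

```python
def live_members(last_heartbeat_ms: dict[str, int], now_ms: int, session_timeout_ms: int) -> list[str]:
    """Members whose most recent heartbeat lies within the session timeout."""
    return sorted(m for m, ts in last_heartbeat_ms.items() if now_ms - ts <= session_timeout_ms)


def rebalance(partitions: list[int], last_heartbeat_ms: dict[str, int],
              now_ms: int, session_timeout_ms: int) -> dict[str, list[int]]:
    """Drop members with expired sessions, then deal partitions out round-robin."""
    members = live_members(last_heartbeat_ms, now_ms, session_timeout_ms)
    if not members:
        return {}  # nobody left to consume; partitions stay unassigned
    assignment: dict[str, list[int]] = {m: [] for m in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


heartbeats = {"consumer-1": 98_000, "consumer-2": 99_000, "consumer-3": 40_000}
print(rebalance([0, 1, 2], heartbeats, now_ms=100_000, session_timeout_ms=45_000))
# consumer-3 missed its session timeout: {'consumer-1': [0, 2], 'consumer-2': [1]}
```

The remaining consumers continue from the committed offsets of the reassigned partitions, so records that the failed consumer processed but did not commit are read again.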
A message (or record, or event, or what-have-you) contains metadata alongside the actual message payload:
▪ Headers (optional)
▪ Key (optional)
▪ Value (set by the application)
▪ Timestamp (set by Kafka or by the application)
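Whether the timestamp comes from the application or from Kafka is controlled by the topic-level setting `message.timestamp.type`, which is either `CreateTime` or `LogAppendTime`. The Python sketch below models the record structure and both timestamp behaviours; `Record` and `stamp` are hypothetical names, and filling in the current time when the application leaves the timestamp unset is an assumption of this sketch.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    value: bytes                                                    # payload, set by the application
    key: Optional[bytes] = None                                     # optional, drives partitioning
    headers: list[tuple[str, bytes]] = field(default_factory=list)  # optional key-value metadata
    timestamp_ms: Optional[int] = None                              # set by the application or by Kafka


def stamp(record: Record, log_append_time: bool, now_ms: Optional[int] = None) -> Record:
    """Fill in the timestamp, loosely mirroring Kafka's two timestamp types.

    CreateTime: keep the application's timestamp, or use the current time if unset.
    LogAppendTime: overwrite the timestamp with the (broker's) current time.
    """
    now = now_ms if now_ms is not None else int(time.time() * 1000)
    if log_append_time or record.timestamp_ms is None:
        record.timestamp_ms = now
    return record


event = Record(value=b'{"temperature": 21.5}', key=b"sensor-7",
               headers=[("source", b"mqtt")], timestamp_ms=1_000)
print(stamp(event, log_append_time=False, now_ms=5_000).timestamp_ms)  # 1000
```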
Topic-partitions are spread across the available brokers and can thus span multiple machines in an Apache Kafka cluster.
[Figure: a topic with 3 partitions and replication factor 1; Broker 1 hosts partition 2, Broker 2 hosts partition 0, Broker 3 hosts partition 1]
[Figure: the same topic with replication factor 2; Broker 1 holds the leader of partition 2, Broker 2 the leader of partition 0 and Broker 3 the leader of partition 1; a follower replica of each partition resides on one of the other brokers]
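A simplified round-robin placement reproduces the layout above. The Python sketch assumes a hypothetical `place_replicas` function: partition p starts at broker index (start_index + p), its replicas go to consecutive brokers, and the first replica acts as the leader. Kafka's own assignment is more elaborate (for example, it randomizes the starting broker and can take racks into account); what both share is that replicas of one partition always land on distinct brokers, so the replication factor cannot exceed the number of brokers.

```python
def place_replicas(num_partitions: int, brokers: list[int], replication_factor: int,
                   start_index: int = 0) -> dict[int, list[int]]:
    """Map each partition to its replica brokers; the first broker in each list is the leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement: dict[int, list[int]] = {}
    for p in range(num_partitions):
        first = (start_index + p) % len(brokers)
        placement[p] = [brokers[(first + r) % len(brokers)] for r in range(replication_factor)]
    return placement


print(place_replicas(3, [1, 2, 3], replication_factor=1, start_index=1))
# {0: [2], 1: [3], 2: [1]}
print(place_replicas(3, [1, 2, 3], replication_factor=2, start_index=1))
# {0: [2, 3], 1: [3, 1], 2: [1, 2]}
```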
The In-Sync-Replica set (ISR) of a topic-partition contains its leader and all followers that are fully caught up with the leader.
[Figure: In-Sync-Replica set for partition 0; the follower on Broker 1 replicates partition 0 from the leader on Broker 2 and acknowledges the replicated data]
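A follower drops out of the ISR when it has not caught up with the leader for longer than the broker setting `replica.lag.time.max.ms`. Together with the producer setting `acks=all` and the topic setting `min.insync.replicas`, the ISR size decides whether a write is accepted. The Python sketch below models both rules; the function names and timestamps are hypothetical, and it leaves out everything else a real broker tracks.

```python
def in_sync_replicas(leader: int, last_caught_up_ms: dict[int, int],
                     now_ms: int, replica_lag_time_max_ms: int) -> set[int]:
    """Leader plus every follower that caught up with the leader recently enough.

    last_caught_up_ms maps a follower broker id to the last time (in ms) at which
    that follower had replicated everything up to the leader's log end offset.
    """
    isr = {leader}
    for broker, caught_up_at in last_caught_up_ms.items():
        if now_ms - caught_up_at <= replica_lag_time_max_ms:
            isr.add(broker)
    return isr


def accepts_acks_all_write(isr: set[int], min_insync_replicas: int) -> bool:
    """With acks=all, the leader rejects writes while the ISR is too small."""
    return len(isr) >= min_insync_replicas


# Partition 0 from the slide: leader on Broker 2, follower on Broker 1.
healthy = in_sync_replicas(2, {1: 95_000}, now_ms=100_000, replica_lag_time_max_ms=30_000)
lagging = in_sync_replicas(2, {1: 10_000}, now_ms=100_000, replica_lag_time_max_ms=30_000)
print(sorted(healthy), accepts_acks_all_write(healthy, min_insync_replicas=2))  # [1, 2] True
print(sorted(lagging), accepts_acks_all_write(lagging, min_insync_replicas=2))  # [2] False
```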
Code, anyone?
A reference architecture helps us sort components into categories (tiers) that are driven by specific (non-)functional requirements.
[Figure: reference architecture with five tiers: Collection, Messaging, Analysis, Persistence and Data Access. Components shown: two collection services (MQTT, HTTP), Topics 1 to 3, three subscribers acting as stream processors, a cache, a search engine, an RDBMS and a client application]
Apache Kafka features a rich ecosystem of supporting services that fit nicely into the tiers of a streaming architecture.
[Figure: the five tiers of the reference architecture populated with Kafka ecosystem components: Kafka Connect (Source Connector), Kafka Client DSL (Producing System), a Kafka Cluster with Topics 1 to 3, Confluent Schema Registry, Confluent REST Proxy, Kafka Streams DSL or ksqlDB (Stream Processor), Kafka Connect (Sink Connector), Kafka Client DSL (Consuming System), a search engine, an RDBMS and a client application]
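In the analysis tier, a stream processor reads records from topics, transforms them and writes results back. The plain-Python sketch below mimics the shape of three Kafka Streams DSL operations (`filter`, `mapValues` and `groupByKey().count()`) on an in-memory list; the function names are hypothetical, and no Kafka or Kafka Streams API is used. In a real topology the input would come from a topic, and the counts would form a continuously updated table.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator

KeyValue = tuple[str, str]  # (key, value), e.g. (user id, event)


def filter_records(records: Iterable[KeyValue], predicate: Callable[[str, str], bool]) -> Iterator[KeyValue]:
    """Stateless: keep only the records matching the predicate."""
    return ((k, v) for k, v in records if predicate(k, v))


def map_values(records: Iterable[KeyValue], fn: Callable[[str], str]) -> Iterator[KeyValue]:
    """Stateless: transform the value and keep the key (in Kafka Streams, no repartitioning needed)."""
    return ((k, fn(v)) for k, v in records)


def count_by_key(records: Iterable[KeyValue]) -> dict[str, int]:
    """Stateful: maintain a count per key, i.e. a table of key -> count."""
    counts: Counter = Counter()
    for key, _ in records:
        counts[key] += 1
    return dict(counts)


events = [
    ("alice", "view:/home"),
    ("bob", "view:/checkout"),
    ("alice", "view:/checkout"),
    ("alice", "view:/checkout"),
]
checkouts = filter_records(events, lambda user, event: event.endswith("/checkout"))
pages = map_values(checkouts, lambda event: event.split(":", 1)[1])
print(count_by_key(pages))  # {'bob': 1, 'alice': 2}
```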
Want to know more?
Books
▪ Shapira G., Palino T., Sivaram R., Petty K., Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale, O'Reilly, 2nd Edition, 2021
▪ Koutanov E., Effective Kafka: A Hands-On Guide to Building Robust and Scalable Event-Driven Applications, Independently published, 2020
▪ Kreps J., I Heart Logs: Event Data, Stream Processing, and Data Integration, O'Reilly, 2014
▪ Seymour M., Mastering Kafka Streams and ksqlDB: Building Real-Time Data Systems by Example, O'Reilly, 2021
▪ Dunning T., Friedman E., Streaming Architecture: New Designs Using Apache Kafka and MapR Streams, O'Reilly, 2016
▪ Akidau T., Chernyak S., Lax R., Streaming Systems, O'Reilly, 2018
▪ Young G., Versioning in an Event Sourced System, Leanpub, 2017
Magazines
▪ Fresow B., Günther M., Nachrichten aus dem Archiv: Event-gestützte Applikationen mit Spring Kafka (Teil 3), JavaMagazin, 3/2018, p. 90-98
▪ Fresow B., Günther M., Briefe vom Windrad: Event-gestützte Applikationen mit Spring Kafka (Teil 2), JavaMagazin, 2/2018, p. 80-87
▪ Fresow B., Günther M., Frühlingsbotschaften: Event-gestützte Applikationen mit Spring Kafka (Teil 1), JavaMagazin, 1/2018, p. 73-77
▪ Günther M., Datenserialisierung mit Apache Avro, JavaSPEKTRUM, 5/2017, p. 35-38
▪ Günther M., Streaming-Applikationen mit Kafka Streams, JavaSPEKTRUM, 4/2017, p. 54-58
▪ Günther M., Skalierfähige, asynchrone Nachrichtenverarbeitung mit Apache Kafka, JavaSPEKTRUM, 3/2017, p. 48-51
GitHub and other resources
▪ Confluent Developer Portal, https://developer.confluent.io/
▪ Various blog posts on testing, data exploration, etc., https://www.mguenther.net/tag/kafka.html/
▪ Kafka for JUnit on GitHub, https://mguenther.github.io/kafka-junit/
▪ User Guide to Kafka for JUnit, https://mguenther.github.io/kafka-junit/
▪ Event-sourcing using Spring Kafka, https://github.com/mguenther/spring-kafka-event-sourcing-sampler
▪ Spring Kafka for Large-Scale Event Processing, https://github.com/mguenther/spring-kafka-event-processing-sampler
▪ Introduction to Spring Kafka, https://github.com/mguenther/spring-kafka-introduction
Questions?
mail@mguenther.net | mguenther.net | @markus_guenther
