SlideShare ist ein Scribd-Unternehmen logo
1 von 35
Deep Dive into Apache
Kafka
Jun Rao
co-founder of Confluent
Kafka Usage
Agenda
• High throughput
• Reliability and durability
• Compacted topic
Scaled-out Architecture
5
Kafka cluster
broker 1
…
producer producer producer
consumer consumer
broker 2 broker n topic partition
Persistent Log
6
Detailed Log Representation
7
offset 0 - 10000
timestamp
index
offset
index
offset 10001 - 20000 offset 20001 - 30000
offset
index
offset
index
timestamp
index
timestamp
index
Message Format
offset
message
length CRC timestamp
key
length
key
content
value
length
value
content
8 bytes 4 bytes 4 bytes
magic byte
1 byte
attribute
1 byte
8 bytes 4 bytes varies 4 bytes varies
Batching and Compression
compressed
batch 1send()
send()
send()
send()
producer
async
flush
poll()compressed
batch 2
compressed
batch 3
compressed
batch 1
compressed
batch 2
compressed
batch 3
consumerbroker
Agenda
• High throughput
• Reliability and durability
• Compacted topic
Kafka Replication
• Configurable replication factor
• Tolerating f – 1 failures with f replicas
• Unlike quorum based replication
• Automated failover
Replicas and Layout
• Topic partition has replicas
• Replicas spread evenly among brokers
topic1-part1
logs
broker 1
topic1-part2
logs
broker 2
topic2-part2
topic2-part1
logs
broker 3
topic1-part1
logs
broker 4
topic1-part2
topic2-part2 topic1-part1 topic1-part2
topic2-part1
topic2-part2
topic2-part1
High Level Data Flow in Replication
broker 1
producer
leader
broker 2
follower
broker 3
follower
4
2
2
3
commit
ack
When producer receives ack Latency Durability on failures
acks=0 (no ack) no network delay some data loss
acks=1 (wait for leader) 1 network roundtrip a few data loss
acks=all (wait for committed) 2 network roundtrips no data loss
topic1-part1 topic1-part1 topic1-part1
consumer
1
Extend to Multiple Partitions
Leaders are evenly spread among brokers
broker 1 broker 2
topic3-part1
follower
broker 3
topic3-part1
follower
topic1-part1
producer
leader
topic1-part1
follower
topic1-part1
follower
broker 4
topic3-part1
leader
producer
topic2-part1
producer
leader
topic2-part1
follower
topic2-part1
follower
In-sync Replicas (ISR)
broker 1
producer
leader
broker 2
follower
broker 3
follower
2
2
topic1-part1 topic1-part1 topic1-part1
1
m1 m1 m1
m2 m2 m2
ISR
last
committed
m2, m1
In-sync : replica reads from leader’s log
end within replica.lag.time.max.ms
Follower Failure
broker 1
producer
leader
broker 2
follower
broker 3
follower
2
2
topic1-part1 topic1-part1 topic1-part1
1
m1 m1 m1
m2 m2 m2
ISR
last
committed
Shrinking ISR
broker 1
producer
leader
broker 2
follower
broker 3
2
topic1-part1 topic1-part1 topic1-part1
1
m1 m1 m1
m2 m2 m2
ISR
m3 m3
m4 m4last
committed
m4, m3
follower
Failed Replica Coming Back
broker 1
producer
leader
broker 2
follower
broker 3
2
topic1-part1 topic1-part1 topic1-part1
1
m1 m1 m1
m2 m2 m2
ISR
m3 m3
m4 m4last
committed
m3
2
follower
Leader Failure
broker 1
producer
leader
broker 2
follower
broker 3
2
topic1-part1 topic1-part1 topic1-part1
1
m1 m1 m1
m2 m2 m2
ISR
m3 m3
m4 m4last
committed
m3
2
follower
Selecting New Leader from ISR
broker 1
producer
leader
broker 2
leader
broker 3
2
topic1-part1 topic1-part1 topic1-part1
1
m1 m1 m1
m2 m2 m2
ISR
m3 m3
m4 m4last
committed
m3
follower
Expanding ISR
broker 1
producer
leader
broker 2
leader
broker 3
2
topic1-part1 topic1-part1 topic1-part1
1
m1 m1 m1
m2 m2 m2
ISR
m3 m3
m4 m4
last
committed
m3
follower
m4
m5 m5
m5
m5
Unclean Leader Election
broker 1
producer
leader ???
broker 2
leader
broker 3
2
topic1-part1 topic1-part1 topic1-part1
1
m1 m1 m1
m2 m2 m2
ISR
m3 m3
m4 m4last
committed
m3
follower
m4
m5
m5
Guaranteed Replicas
broker 1
producer
broker 2
leader
broker 3
topic1-part1 topic1-part1 topic1-part1
1
m1 m1 m1
m2 m2 m2
m3 m3
m4 m4
last
committed
m3
m4
m5
ISR > min.insync.replicas ?
m6
Mission Critical Data
• Disable Unclean Leader Election
• unclean.leader.election.enable = false
• Set replication factor
• default.replication.factor = 3
• Set minimum ISRs
• min.insync.replicas = 2
• Set producer acks
• acks = all
24
Failure Detection and Controller Flow
broker 1 broker 2
topic3-part1
follower
broker 3
topic3-part1
follower
topic1-part1
controller
leader
topic1-part1
follower
topic1-part1
follower
broker 4
topic3-part1
leader
topic2-part1
leader
topic2-part1
follower
topic2-part1
follower
new leaders and ISRs
Agenda
• High throughput
• Reliability and durability
• Compacted topic
Log Compaction: What is it?
Use Case
product
catalog search
index
item1  “new description”
Adding a New Index Instance
product
catalog search
index
item1  “new description”
new
search
index
Using a Compacted Topic
product
catalog search
index
item1  “new description”
new
search
index
set consumer offset to 0
Log Cleaner Implementation
offset 3001 - 4000
firstDirty
1. build map
2. scan and probe map
firstDirty
offset 2001 - 3000offset 1001 - 2000 offset 4001 - 5000 offset 5001 - 6000
offset 3001 - 5000offset 1001 - 3000 offset 5001 - 6000
key1 3500
key2 3700
key3 4200key1 1500
key last offset
reject
key4 2100 keep
key3 4200 keep
Cleaning Configs
• log.cleaner.min.cleanable.ratio (default 0.5)
• dirty/total ratio when log cleaner is triggered
• log.cleaner.io.max.bytes.per.second (default infinite)
• Max rate cleaning can be done
• Can be used for throttling
Be Careful with Deletes
• Delete tombstone modeled as null message
• Danger of removing a deleted key too soon
• Consumer still assumes the old value with the key
• log.cleaner.delete.retention.ms (default 1 day)
• “Delete tombstone” removed after that time
• Consumer needs to finish consuming the tombstone before that
time
Summary
• Apache Kafka is a streaming platform
• The storage part supports
• High throughput
• High availability and durability
• Retaining database-like data
Coming Up Next
Date Title Speaker
10/27 Data Integration with Kafka Gwen Shapira
11/17 Demystifying Stream Processing Neha Narkhede
12/1 A Practical Guide To Selecting A Stream
Processing Technology
Michael Noll
12/15 Streaming in Practice: Putting Apache
Kafka in Production
Roger Hoover

Weitere ähnliche Inhalte

Was ist angesagt?

An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin PodvalMartin Podval
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache KafkaPaul Brebner
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developersconfluent
 
Securing Kafka
Securing Kafka Securing Kafka
Securing Kafka confluent
 
Removing performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configurationRemoving performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configurationKnoldus Inc.
 
No data loss pipeline with apache kafka
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafkaJiangjie Qin
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to KafkaAkash Vacher
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안SANG WON PARK
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaShiao-An Yuan
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producerconfluent
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planningconfluent
 

Was ist angesagt? (20)

An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Securing Kafka
Securing Kafka Securing Kafka
Securing Kafka
 
Removing performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configurationRemoving performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configuration
 
No data loss pipeline with apache kafka
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafka
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to Kafka
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Envoy and Kafka
Envoy and KafkaEnvoy and Kafka
Envoy and Kafka
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 

Ähnlich wie Deep Dive into Apache Kafka

Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache KafkaKafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafkaconfluent
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013Jun Rao
 
Kafka Technical Overview
Kafka Technical OverviewKafka Technical Overview
Kafka Technical OverviewSylvester John
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Productionconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningGuido Schmutz
 
Microservices interaction at scale using Apache Kafka
Microservices interaction at scale using Apache KafkaMicroservices interaction at scale using Apache Kafka
Microservices interaction at scale using Apache KafkaIvan Ursul
 
SignalFx Kafka Consumer Optimization
SignalFx Kafka Consumer OptimizationSignalFx Kafka Consumer Optimization
SignalFx Kafka Consumer OptimizationSignalFx
 
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)confluent
 
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...Databricks
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProChester Chen
 
MyHeritage Kakfa use cases - Feb 2014 Meetup
MyHeritage Kakfa use cases - Feb 2014 Meetup MyHeritage Kakfa use cases - Feb 2014 Meetup
MyHeritage Kakfa use cases - Feb 2014 Meetup Ran Levy
 
Grokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocolsGrokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocolsGrokking VN
 
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedApache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedGuozhang Wang
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controllerconfluent
 
TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)Erhwen Kuo
 
Kafka practical experience
Kafka practical experienceKafka practical experience
Kafka practical experienceRico Chen
 
Kafka zero to hero
Kafka zero to heroKafka zero to hero
Kafka zero to heroAvi Levi
 

Ähnlich wie Deep Dive into Apache Kafka (20)

Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache KafkaKafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
Kafka Technical Overview
Kafka Technical OverviewKafka Technical Overview
Kafka Technical Overview
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Production
 
Kafka overview v0.1
Kafka overview v0.1Kafka overview v0.1
Kafka overview v0.1
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
Microservices interaction at scale using Apache Kafka
Microservices interaction at scale using Apache KafkaMicroservices interaction at scale using Apache Kafka
Microservices interaction at scale using Apache Kafka
 
SignalFx Kafka Consumer Optimization
SignalFx Kafka Consumer OptimizationSignalFx Kafka Consumer Optimization
SignalFx Kafka Consumer Optimization
 
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
 
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
 
MyHeritage Kakfa use cases - Feb 2014 Meetup
MyHeritage Kakfa use cases - Feb 2014 Meetup MyHeritage Kakfa use cases - Feb 2014 Meetup
MyHeritage Kakfa use cases - Feb 2014 Meetup
 
Grokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocolsGrokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocols
 
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedApache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)
 
Kafka practical experience
Kafka practical experienceKafka practical experience
Kafka practical experience
 
Kafka zero to hero
Kafka zero to heroKafka zero to hero
Kafka zero to hero
 

Mehr von confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 
The Journey to Data Mesh with Confluent
The Journey to Data Mesh with ConfluentThe Journey to Data Mesh with Confluent
The Journey to Data Mesh with Confluentconfluent
 

Mehr von confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 
The Journey to Data Mesh with Confluent
The Journey to Data Mesh with ConfluentThe Journey to Data Mesh with Confluent
The Journey to Data Mesh with Confluent
 

Kürzlich hochgeladen

Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 

Kürzlich hochgeladen (20)

Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 

Deep Dive into Apache Kafka

  • 1. Deep Dive into Apache Kafka Jun Rao co-founder of Confluent
  • 3.
  • 4. Agenda • High throughput • Reliability and durability • Compacted topic
  • 5. Scaled-out Architecture 5 Kafka cluster broker 1 … producer producer producer consumer consumer broker 2 broker n topic partition
  • 7. Detailed Log Representation 7 offset 0 - 10000 timestamp index offset index offset 10001 - 20000 offset 20001 - 30000 offset index offset index timestamp index timestamp index
  • 8. Message Format offset message length CRC timestamp key length key content value length value content 8 bytes 4 bytes 4 bytes magic byte 1 byte attribute 1 byte 8 bytes 4 bytes varies 4 bytes varies
  • 9. Batching and Compression compressed batch 1send() send() send() send() producer async flush poll()compressed batch 2 compressed batch 3 compressed batch 1 compressed batch 2 compressed batch 3 consumerbroker
  • 10. Agenda • High throughput • Reliability and durability • Compacted topic
  • 11. Kafka Replication • Configurable replication factor • Tolerating f – 1 failures with f replicas • Unlike quorum based replication • Automated failover
  • 12. Replicas and Layout • Topic partition has replicas • Replicas spread evenly among brokers topic1-part1 logs broker 1 topic1-part2 logs broker 2 topic2-part2 topic2-part1 logs broker 3 topic1-part1 logs broker 4 topic1-part2 topic2-part2 topic1-part1 topic1-part2 topic2-part1 topic2-part2 topic2-part1
  • 13. High Level Data Flow in Replication broker 1 producer leader broker 2 follower broker 3 follower 4 2 2 3 commit ack When producer receives ack Latency Durability on failures acks=0 (no ack) no network delay some data loss acks=1 (wait for leader) 1 network roundtrip a few data loss acks=all (wait for committed) 2 network roundtrips no data loss topic1-part1 topic1-part1 topic1-part1 consumer 1
  • 14. Extend to Multiple Partitions Leaders are evenly spread among brokers broker 1 broker 2 topic3-part1 follower broker 3 topic3-part1 follower topic1-part1 producer leader topic1-part1 follower topic1-part1 follower broker 4 topic3-part1 leader producer topic2-part1 producer leader topic2-part1 follower topic2-part1 follower
  • 15. In-sync Replicas (ISR) broker 1 producer leader broker 2 follower broker 3 follower 2 2 topic1-part1 topic1-part1 topic1-part1 1 m1 m1 m1 m2 m2 m2 ISR last committed m2, m1 In-sync : replica reads from leader’s log end within replica.lag.time.max.ms
  • 16. Follower Failure broker 1 producer leader broker 2 follower broker 3 follower 2 2 topic1-part1 topic1-part1 topic1-part1 1 m1 m1 m1 m2 m2 m2 ISR last committed
  • 17. Shrinking ISR broker 1 producer leader broker 2 follower broker 3 2 topic1-part1 topic1-part1 topic1-part1 1 m1 m1 m1 m2 m2 m2 ISR m3 m3 m4 m4last committed m4, m3 follower
  • 18. Failed Replica Coming Back broker 1 producer leader broker 2 follower broker 3 2 topic1-part1 topic1-part1 topic1-part1 1 m1 m1 m1 m2 m2 m2 ISR m3 m3 m4 m4last committed m3 2 follower
  • 19. Leader Failure broker 1 producer leader broker 2 follower broker 3 2 topic1-part1 topic1-part1 topic1-part1 1 m1 m1 m1 m2 m2 m2 ISR m3 m3 m4 m4last committed m3 2 follower
  • 20. Selecting New Leader from ISR broker 1 producer leader broker 2 leader broker 3 2 topic1-part1 topic1-part1 topic1-part1 1 m1 m1 m1 m2 m2 m2 ISR m3 m3 m4 m4last committed m3 follower
  • 21. Expanding ISR broker 1 producer leader broker 2 leader broker 3 2 topic1-part1 topic1-part1 topic1-part1 1 m1 m1 m1 m2 m2 m2 ISR m3 m3 m4 m4 last committed m3 follower m4 m5 m5 m5
  • 22. m5 Unclean Leader Election broker 1 producer leader ??? broker 2 leader broker 3 2 topic1-part1 topic1-part1 topic1-part1 1 m1 m1 m1 m2 m2 m2 ISR m3 m3 m4 m4last committed m3 follower m4 m5
  • 23. m5 Guaranteed Replicas broker 1 producer broker 2 leader broker 3 topic1-part1 topic1-part1 topic1-part1 1 m1 m1 m1 m2 m2 m2 m3 m3 m4 m4 last committed m3 m4 m5 ISR > min.insync.replicas ? m6
  • 24. Mission Critical Data • Disable Unclean Leader Election • unclean.leader.election.enable = false • Set replication factor • default.replication.factor = 3 • Set minimum ISRs • min.insync.replicas = 2 • Set producer acks • acks = all 24
  • 25. Failure Detection and Controller Flow broker 1 broker 2 topic3-part1 follower broker 3 topic3-part1 follower topic1-part1 controller leader topic1-part1 follower topic1-part1 follower broker 4 topic3-part1 leader topic2-part1 leader topic2-part1 follower topic2-part1 follower new leaders and ISRs
  • 26. Agenda • High throughput • Reliability and durability • Compacted topic
  • 28. Use Case product catalog search index item1  “new description”
  • 29. Adding a New Index Instance product catalog search index item1  “new description” new search index
  • 30. Using a Compacted Topic product catalog search index item1  “new description” new search index set consumer offset to 0
  • 31. Log Cleaner Implementation offset 3001 - 4000 firstDirty 1. build map 2. scan and probe map firstDirty offset 2001 - 3000offset 1001 - 2000 offset 4001 - 5000 offset 5001 - 6000 offset 3001 - 5000offset 1001 - 3000 offset 5001 - 6000 key1 3500 key2 3700 key3 4200key1 1500 key last offset reject key4 2100 keep key3 4200 keep
  • 32. Cleaning Configs • log.cleaner.min.cleanable.ratio (default 0.5) • dirty/total ratio when log cleaner is triggered • log.cleaner.io.max.bytes.per.second (default infinite) • Max rate cleaning can be done • Can be used for throttling
  • 33. Be Careful with Deletes • Delete tombstone modeled as null message • Danger of removing a deleted key too soon • Consumer still assumes the old value with the key • log.cleaner.delete.retention.ms (default 1 day) • “Delete tombstone” removed after that time • Consumer needs to finish consuming the tombstone before that time
  • 34. Summary • Apache Kafka is a streaming platform • The storage part supports • High throughput • High availability and durability • Retaining database-like data
  • 35. Coming Up Next Date Title Speaker 10/27 Data Integration with Kafka Gwen Shapira 11/17 Demystifying Stream Processing Neha Narkhede 12/1 A Practical Guide To Selecting A Stream Processing Technology Michael Noll 12/15 Streaming in Practice: Putting Apache Kafka in Production Roger Hoover

Hinweis der Redaktion

  1. It’s widely used and in production at thousands of companies. Let’s walk through the the basics of Kafka and understand how it acts as a streaming platform.
  2. Now we’ve talked about these two motivations for streams---solving pipline spawl and asynchronous stream processing. It won’t surprise anyone that when I talk about this streaming platform that enables these pipelines and processing I am talking about Apache Kafka.
  3. New producer api. Be aware of non-java api.
  4. Put partitions inside topic? Maybe useful to use another picture. 3-tier
  5. New producer api. Be aware of non-java api.
  6. Leaders spread over brokers. Another picture. Animated transitioin.
  7. New producer api. Be aware of non-java api.
  8. Actually in 0.8.1
  9. Actually in 0.8.1