Common issues with Apache Kafka® Producer
Badai Aqrandista, Senior Technical Support Engineer
Introduction
• My name is Badai Aqrandista.
• I started as a web developer, building websites with Perl and PHP in 2005.
• I have experience supporting applications in Linux/UNIX environments, including a hotel booking engine, a telecommunication billing system, and a mining equipment monitoring system.
• I currently work for Confluent as a Senior Technical Support Engineer.
Kafka in a nutshell
• Kafka is a Pub/Sub system
• Kafka Producer sends records to the Kafka broker
• Kafka Consumer fetches records from the Kafka broker
• Kafka broker persists any data it receives until the retention period expires
[Diagram labels: PRODUCER, CONSUMER]
Kafka Producer Internals
• KafkaProducer API:
• public Future<RecordMetadata> send(ProducerRecord<K,V> record)
• public Future<RecordMetadata> send(ProducerRecord<K,V> record, Callback callback)
• The KafkaProducer#send method is asynchronous.
• It does not immediately send the record to the Kafka broker.
• It puts the record in an internal queue, and an internal thread sends multiple records as a batch.
[Diagram: a batch containing three records, each with a key and a value]
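The asynchronous behaviour described above can be illustrated with a toy model that uses only the JDK. This is a sketch, not Kafka's implementation: ToyProducer, send and drainBatch are hypothetical names, and drainBatch stands in for the work that the producer's internal thread does in the background.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Toy model of an asynchronous send path. This is NOT Kafka's code:
// ToyProducer, send and drainBatch are hypothetical names for illustration.
final class ToyProducer {
    private final List<String> queue = new ArrayList<>();
    private final List<CompletableFuture<Integer>> pending = new ArrayList<>();

    // Like KafkaProducer#send, this returns immediately with a future;
    // the record is only queued, not transmitted.
    CompletableFuture<Integer> send(String record) {
        CompletableFuture<Integer> future = new CompletableFuture<>();
        queue.add(record);
        pending.add(future);
        return future;
    }

    // Stands in for the internal sender thread: ships everything queued so
    // far as one batch and completes each future with the batch size.
    int drainBatch() {
        int batchSize = queue.size();
        for (CompletableFuture<Integer> future : pending) {
            future.complete(batchSize);
        }
        queue.clear();
        pending.clear();
        return batchSize;
    }

    public static void main(String[] args) {
        ToyProducer producer = new ToyProducer();
        CompletableFuture<Integer> first = producer.send("a");
        producer.send("b");
        System.out.println("done before drain: " + first.isDone()); // false
        System.out.println("records in batch: " + producer.drainBatch()); // 2
        System.out.println("done after drain: " + first.isDone()); // true
    }
}
```

With the real client, the same idea means that a returned Future (or the Callback) is the only place where a send failure becomes visible to the application.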
Kafka Producer Internals
• Each Kafka Producer batch corresponds to one partition.
• Kafka Producer determines the batch to append a record to based on the record key.
• If the record key is “null”, Kafka Producer chooses the partition itself, without using a key (round-robin in older clients, sticky partitioning since Apache Kafka 2.4).
• If the record key is not “null”, Kafka Producer uses the hash of the record key to determine the partition number.
• One or more batches are sent to the Kafka broker in a PRODUCE request.
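A minimal sketch of key-based partition selection follows. It assumes a simple stand-in hash (Arrays.hashCode) to stay self-contained; the real Java client hashes the serialized key with murmur2, so the partition numbers printed here will not match a real producer. PartitionSketch and partitionFor are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of key-based partition selection. Arrays.hashCode is a stand-in
// hash chosen for brevity; the real Java client uses murmur2.
final class PartitionSketch {
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Mask the sign bit so the result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition.
        System.out.println(partitionFor("customer-42", 6));
        System.out.println(partitionFor("customer-42", 6));
    }
}
```

The point of the sketch is the determinism: as long as the number of partitions does not change, all records with the same key land in the same partition and therefore in the same batch.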
Kafka Producer Internals
• The Kafka Producer's internal thread sends a batch to the Kafka broker based on these configurations:
• “batch.size” – defaults to 16 kB
• “linger.ms” – defaults to 0
• So, the internal thread sends a batch to the Kafka broker when:
• The total size of records in the batch reaches “batch.size”, or
• The time since batch creation reaches “linger.ms”, or
• The Kafka Producer “flush()” method is called (directly or indirectly via “close()”).
• Kafka Producer creates only one connection to each broker.
• As a result, every batch for a Kafka broker must be sent sequentially through this one connection.
• The maximum number of unacknowledged requests in flight to each broker at any one time is controlled by “max.in.flight.requests.per.connection”, which defaults to 5.
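The three send conditions on this slide can be written as a single predicate. This is a sketch under stated assumptions: BatchReadiness and isReady are hypothetical names, not part of the Kafka client API, and the sketch treats reaching a limit (rather than strictly exceeding it) as enough to send.

```java
// Sketch of the send conditions listed above. BatchReadiness and its
// parameters are hypothetical names, not part of the Kafka client API.
final class BatchReadiness {
    static boolean isReady(int batchBytes, long ageMs,
                           int batchSize, long lingerMs, boolean flushRequested) {
        boolean full = batchBytes >= batchSize;   // "batch.size" reached
        boolean lingered = ageMs >= lingerMs;     // "linger.ms" elapsed
        return full || lingered || flushRequested;
    }

    public static void main(String[] args) {
        int batchSize = 16384; // default "batch.size"
        // linger.ms=0: even a small, brand-new batch is sendable at once.
        System.out.println(isReady(100, 0, batchSize, 0, false));    // true
        // linger.ms=50: a small, young batch waits ...
        System.out.println(isReady(100, 10, batchSize, 50, false));  // false
        // ... unless it fills up or flush() is called.
        System.out.println(isReady(16384, 10, batchSize, 50, false)); // true
        System.out.println(isReady(100, 10, batchSize, 50, true));    // true
    }
}
```

The first case shows why “linger.ms=0” tends to produce small batches: a batch is eligible for sending as soon as it exists, so raising “linger.ms” trades a little latency for fuller batches.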
Kafka Producer Issues
1. Failure to connect to Kafka broker
2. Record is too large
3. Batch expires before sending
4. Not enough replicas error
Failure to connect to Kafka broker
• This error is not obvious, but it means failure to connect to Kafka broker.
• The error message looks like this:
• [2021-08-02 12:57:44,097] WARN [Producer clientId=producer-1] Connection to node -1
(kafka1/172.20.0.6:9093) could not be established. Broker may not be available.
(org.apache.kafka.clients.NetworkClient)
• How to fix this:
• Check the broker configuration to confirm the listener port and security protocol
• Check the hostname or the IP address of the broker
• Confirm that Kafka Producer’s “bootstrap.servers” configuration is correct
• Confirm that connectivity exists between Kafka Producer’s host and Kafka broker hosts with commands such as:
• ping {BROKER_HOST}
• nc {BROKER_HOST} {BROKER_PORT}
• openssl s_client -connect {BROKER_HOST}:{BROKER_PORT}
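When shell tools are not available on the producer host, a similar TCP check can be done from Java. The sketch below is a rough equivalent of the “nc” command above, using only java.net.Socket; ConnectivityCheck and canConnect are hypothetical names, and the host and port in main are placeholders.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Rough Java equivalent of "nc {BROKER_HOST} {BROKER_PORT}": it only proves
// that a TCP connection can be opened, not that the listener speaks Kafka.
final class ConnectivityCheck {
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Unknown host, refused connection, and timeout all end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port are placeholders; replace them with a real listener.
        System.out.println(canConnect("localhost", 9092, 2000));
    }
}
```

A successful TCP connection does not validate the security protocol, so a TLS listener still needs a check such as the “openssl s_client” command above.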
Record is too large
• This error occurs because the record size is greater than the “max.request.size” configuration, which defaults to 1048576 (1 MB).
• The error message looks like this:
• org.apache.kafka.common.errors.RecordTooLargeException: The message is 1600088 bytes when
serialized which is larger than 1048576, which is the value of the max.request.size configuration.
• How to fix it:
• Reduce the record size. This requires a change in the application that generates the record.
• If you cannot reduce the record size, you can increase producer configuration “max.request.size”. If you
do this, you also need to increase topic configuration “max.message.bytes”.
• Note: “max.request.size” is the maximum request size AFTER serialization but BEFORE
compression. So, setting compression will not fix this.
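An application can catch oversized records before calling send() with a simple pre-flight check. The sketch below is an approximation only: it compares the serialized key and value sizes against the limit, while the real client also counts per-record overhead. RecordSizeCheck and fits are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Pre-flight size check. The real client also counts record overhead
// (headers, framing), so treat this as an approximation only.
final class RecordSizeCheck {
    static final int DEFAULT_MAX_REQUEST_SIZE = 1048576; // 1 MB

    static boolean fits(byte[] key, byte[] value, int maxRequestSize) {
        int keyLen = key == null ? 0 : key.length;
        int valueLen = value == null ? 0 : value.length;
        return keyLen + valueLen <= maxRequestSize;
    }

    public static void main(String[] args) {
        byte[] key = "order-1".getBytes(StandardCharsets.UTF_8);
        byte[] small = new byte[1024];
        byte[] large = new byte[1600000]; // roughly the size in the error above
        System.out.println(fits(key, small, DEFAULT_MAX_REQUEST_SIZE)); // true
        System.out.println(fits(key, large, DEFAULT_MAX_REQUEST_SIZE)); // false
    }
}
```

Because the check runs on the serialized bytes, it reflects the size before compression, which is the size that “max.request.size” applies to.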
Batch expires before sending
• This error is a symptom of slow transfer time (on network) or slow processing (on Kafka
broker).
• The error looks like this:
• org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test1-0:1500 ms has
passed since batch creation
• Sanity checks:
• Is the topic partition online? A topic partition is online if one or more of the Kafka brokers hosting its replicas are online.
• Use “kafka-topics --bootstrap-server {BROKER_HOST}:{BROKER_PORT} --describe --topic {TOPIC_NAME}”
• “delivery.timeout.ms” – An upper bound on the time to report success or failure after a call to send() returns.
• The default value is 120000 ms (2 minutes).
• If “delivery.timeout.ms” is set to a very low value, batches can expire too early.
• “batch.size” – The maximum size of a record batch.
• The default value is 16384 bytes (16 kB).
• If the messages are large, this configuration may need to be increased to allow more records per batch. More records per batch means higher throughput and lower latency per record.
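The two settings above can be collected in a plain java.util.Properties object, which is what the Java producer accepts as configuration. In this sketch the property keys are the real configuration names, while ProducerTimeoutConfig, build, looksTooLow, the 32768 value, and the one-tenth threshold are illustrative assumptions.

```java
import java.util.Properties;

// Builds producer settings as plain Properties. The keys are real producer
// configuration names; the class and the chosen values are illustrative.
final class ProducerTimeoutConfig {
    static final int DEFAULT_DELIVERY_TIMEOUT_MS = 120000;

    static Properties build(int deliveryTimeoutMs, int batchSizeBytes) {
        Properties props = new Properties();
        props.setProperty("delivery.timeout.ms", Integer.toString(deliveryTimeoutMs));
        props.setProperty("batch.size", Integer.toString(batchSizeBytes));
        return props;
    }

    // Flags a delivery timeout far below the default. The one-tenth
    // threshold is an arbitrary choice for this sketch.
    static boolean looksTooLow(int deliveryTimeoutMs) {
        return deliveryTimeoutMs < DEFAULT_DELIVERY_TIMEOUT_MS / 10;
    }

    public static void main(String[] args) {
        Properties props = build(DEFAULT_DELIVERY_TIMEOUT_MS, 32768);
        System.out.println(props.getProperty("delivery.timeout.ms")); // 120000
        // A value in the range of the 1500 ms seen in the error message.
        System.out.println(looksTooLow(1500)); // true
    }
}
```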
Batch expires before sending
• How to investigate this issue (cont’d):
• First, identify whether the cause is slow transfer time or slow processing.
• To check for slow transfer time, run “ping {BROKER_HOST}” from the producer host. The round trip time (RTT) should be reasonable. For example, if the producer and the Kafka brokers are in the same data center, the RTT should be less than 10 ms, and mostly under 1 ms.
• If the “ping” result is good (i.e. consistently under 10 ms with 0% packet loss), then network latency is unlikely to be the cause.
• To check for slow processing, check the following on the Kafka brokers:
• Number of connections on the Kafka broker, with “netstat -n | grep 9092 | wc -l” (assuming the listener port is 9092). More than 1000 connections is usually too high and can cause slow processing or connectivity issues.
• Number of topic partitions per broker. More than 1500 partitions per broker is usually too high and can cause slow processing. Check it with “kafka-topics --bootstrap-server {BROKER_HOST}:{BROKER_PORT} --describe | awk '{print $5, $6}' | sort | uniq -c”.
• If the Kafka broker host has enough CPU and memory, you can increase “num.replica.fetchers” to 2 or 3 to allow more partitions per broker.
• Inter-broker “ping” latency. If the brokers run in multiple data centers (e.g. multiple Availability Zones), this may be a significant contributor to produce latency.
• CPU usage of the Kafka brokers. The following JMX metrics also show how idle the internal threads are:
• kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent – if this is low (< 0.5), the broker needs a higher “num.io.threads”, if CPU allows.
• kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent – if this is low (< 0.5), the broker needs a higher “num.network.threads”, if CPU allows.
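The two idle-percent rules of thumb can be encoded as a small helper, for example inside a monitoring script. This is only a sketch: ThreadPoolAdvice and settingsToRaise are hypothetical names, and the idle ratios are assumed to have been read already from the two JMX metrics named above.

```java
import java.util.ArrayList;
import java.util.List;

// Encodes the two rules of thumb above. ThreadPoolAdvice is a hypothetical
// helper; the idle ratios would come from the JMX metrics named in the text.
final class ThreadPoolAdvice {
    static final double LOW_IDLE = 0.5;

    static List<String> settingsToRaise(double requestHandlerIdle,
                                        double networkProcessorIdle) {
        List<String> settings = new ArrayList<>();
        if (requestHandlerIdle < LOW_IDLE) {
            settings.add("num.io.threads");
        }
        if (networkProcessorIdle < LOW_IDLE) {
            settings.add("num.network.threads");
        }
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(settingsToRaise(0.3, 0.8)); // [num.io.threads]
        System.out.println(settingsToRaise(0.9, 0.9)); // []
    }
}
```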
Not enough replicas error
• This means the number of replicas in the ISR is lower than the “min.insync.replicas” configuration.
• The error looks like this:
• [2021-08-03 01:34:05,077] WARN [Producer clientId=producer-1] Got error produce response with
correlation id 3 on topic-partition test2-0, retrying (2147483646 attempts left). Error:
NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender)
• This error occurs, for example, when fewer than 2 replicas are in sync and:
• Topic replication factor is 3.
• Topic configuration includes “min.insync.replicas=2”.
• Producer uses “acks=all” configuration.
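The rule behind this error can be modelled in a few lines. The sketch below is an illustration, not broker code: IsrCheck and acceptsWrite are hypothetical names. It reflects that “min.insync.replicas” is only enforced for producers using “acks=all” (or its equivalent “acks=-1”).

```java
// Models the check behind NOT_ENOUGH_REPLICAS. IsrCheck is a hypothetical
// name; only acks=all (or -1) is subject to "min.insync.replicas".
final class IsrCheck {
    static boolean acceptsWrite(int isrSize, int minInsyncReplicas, String acks) {
        boolean requiresIsr = acks.equals("all") || acks.equals("-1");
        return !requiresIsr || isrSize >= minInsyncReplicas;
    }

    public static void main(String[] args) {
        // Replication factor 3, min.insync.replicas=2, acks=all:
        System.out.println(acceptsWrite(3, 2, "all")); // true, all replicas in sync
        System.out.println(acceptsWrite(2, 2, "all")); // true, one replica lagging
        System.out.println(acceptsWrite(1, 2, "all")); // false, NOT_ENOUGH_REPLICAS
        // With acks=1 the same partition still accepts the write.
        System.out.println(acceptsWrite(1, 2, "1"));   // true
    }
}
```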
Not enough replicas error
• What is ISR? It is short for “In-Sync Replicas”: the set of replicas that are in sync with the leader, that is, the leader itself plus the follower replicas that have all the records the leader replica has.
• How can a replica become out of sync? Because the broker is offline, replication has failed, or replication is slow.
• How to fix this error:
• If the replica is out of sync because a Kafka broker is offline, start the broker hosting the offline replicas.
• If it is out of sync because of a replication failure, fix the failure. This is a separate discussion, but the most common cause is disk failure. If the disk storing the replica data is full, the Kafka broker stops replicating all replicas on that disk.
• If it is out of sync because of slow replication, fix the slow replication. This is also a separate discussion, but the most common cause is inter-broker latency or too many topic partitions per broker.
Thank you. Any questions?

 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Kürzlich hochgeladen (20)

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Common issues with Apache Kafka® Producer

  • 1. Common issues with Apache Kafka® Producer
    Badai Aqrandista, Senior Technical Support Engineer
  • 2. Introduction
    • My name is Badai Aqrandista.
    • I started as a web developer in 2005, building websites with Perl and PHP.
    • I have experience supporting applications in Linux/UNIX environments, including a hotel booking engine, a telecommunication billing system, and a mining equipment monitoring system.
    • I currently work for Confluent as a Senior Technical Support Engineer.
  • 3. Kafka in a nutshell
    • Kafka is a Pub/Sub system.
    • A Kafka Producer sends records to a Kafka broker.
    • A Kafka Consumer fetches records from a Kafka broker.
    • A Kafka broker persists any data it receives until the retention period expires.
    [Diagram labels: PRODUCER, CONSUMER]
  • 5. Kafka Producer Internals
    • KafkaProducer API:
      • public Future<RecordMetadata> send(ProducerRecord<K,V> record)
      • public Future<RecordMetadata> send(ProducerRecord<K,V> record, Callback callback)
    • The KafkaProducer#send method is asynchronous.
      • It does not immediately send the record to the Kafka broker.
      • It puts the record in an internal queue, and an internal thread sends multiple queued records together as a batch.
    [Diagram: a batch containing three records, each with a key and a value]
  • 6. Kafka Producer Internals
    • Each Kafka Producer batch corresponds to one partition.
    • Kafka Producer determines which batch to append a record to based on the record key.
      • If the record key is “null”, Kafka Producer chooses the batch randomly.
      • If the record key is not “null”, Kafka Producer uses the hash of the record key to determine the partition number.
    • One or more batches are sent to the Kafka broker in a PRODUCE request.
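The key-to-partition rule above can be sketched in plain Java. This is a simplified model, not Kafka's partitioner: the class name `PartitionChoiceSketch` is hypothetical, and `Arrays.hashCode` is only a stand-in for whatever hash function the real client applies to the serialized key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

/** Simplified model of key-based partition selection (not Kafka's actual partitioner). */
final class PartitionChoiceSketch {

    /**
     * Returns a partition number in [0, numPartitions).
     * Assumption: Arrays.hashCode stands in for the hash used by the real client.
     */
    static int choosePartition(byte[] keyBytes, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        if (keyBytes == null) {
            // No key: any partition may be chosen.
            return ThreadLocalRandom.current().nextInt(numPartitions);
        }
        // Clear the sign bit so the modulo result is never negative.
        int positiveHash = Arrays.hashCode(keyBytes) & 0x7fffffff;
        return positiveHash % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "order-42".getBytes(StandardCharsets.UTF_8);
        System.out.println("keyed record  -> partition " + choosePartition(key, 6));
        System.out.println("null-key record -> partition " + choosePartition(null, 6));
    }
}
```

The property that matters is that the same key always maps to the same partition for a fixed partition count, which is what preserves per-key ordering.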
  • 7. Kafka Producer Internals
    • The Kafka Producer internal thread sends a batch to the Kafka broker based on these configurations:
      • “batch.size”: defaults to 16 kB
      • “linger.ms”: defaults to 0
    • So, the Kafka Producer internal thread sends a batch to the Kafka broker when:
      • The total size of records in the batch reaches “batch.size”, or
      • The time since batch creation exceeds “linger.ms”, or
      • The Kafka Producer “flush()” method is called (directly or indirectly via “close()”).
    • Kafka Producer creates only one connection to each broker.
      • In the end, every batch for a Kafka broker must be sent sequentially through this one connection.
      • The maximum number of unacknowledged requests in flight to each broker at any one time is controlled by “max.in.flight.requests.per.connection”, which defaults to 5.
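The three send conditions above reduce to a single boolean check. The sketch below is a model of that rule only; the class and method names are hypothetical and none of this is Kafka client code.

```java
/** Model of the "when is a batch sent" rule described on the slide. */
final class BatchReadinessSketch {

    /**
     * @param batchBytes      total size of the records currently in the batch
     * @param ageMs           time elapsed since the batch was created
     * @param flushRequested  true if flush() or close() has been called
     * @param batchSizeBytes  value of batch.size
     * @param lingerMs        value of linger.ms
     */
    static boolean readyToSend(long batchBytes, long ageMs, boolean flushRequested,
                               long batchSizeBytes, long lingerMs) {
        boolean full = batchBytes >= batchSizeBytes;   // batch.size reached
        boolean lingered = ageMs >= lingerMs;          // linger.ms elapsed
        return full || lingered || flushRequested;
    }

    public static void main(String[] args) {
        // With the default linger.ms=0, even a tiny, brand-new batch is ready.
        System.out.println(readyToSend(100, 0, false, 16_384, 0));
        // With linger.ms=10, the same batch waits until it fills up or ages out.
        System.out.println(readyToSend(100, 3, false, 16_384, 10));
    }
}
```

Under this model, raising “linger.ms” is what gives small records time to accumulate into larger batches.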
  • 9. Kafka Producer Issues
    1. Failure to connect to Kafka broker
    2. Record is too large
    3. Batch expires before sending
    4. Not enough replicas error
  • 10. Failure to connect to Kafka broker
    • The log message for this error is not obvious, but it means the producer failed to connect to the Kafka broker.
    • The error message looks like this:
      • [2021-08-02 12:57:44,097] WARN [Producer clientId=producer-1] Connection to node -1 (kafka1/172.20.0.6:9093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
    • How to fix this:
      • Check the broker configuration to confirm the listener port and security protocol.
      • Check the hostname or the IP address of the broker.
      • Confirm that the Kafka Producer’s “bootstrap.servers” configuration is correct.
      • Confirm that connectivity exists between the Kafka Producer’s host and the Kafka broker hosts with commands such as:
        • ping {BROKER_HOST}
        • nc {BROKER_HOST} {BROKER_PORT}
        • openssl s_client -connect {BROKER_HOST}:{BROKER_PORT}
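When the command-line tools above are not available on the producer host, a plain TCP connect from the JVM gives roughly the same signal as “nc”. The sketch below uses only `java.net.Socket`; the class name, the default host and port, and the 3-second timeout are illustrative assumptions. It checks only that the port accepts connections, not that the listener's security protocol matches the producer's.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Rough Java equivalent of "nc {BROKER_HOST} {BROKER_PORT}". */
final class BrokerReachabilitySketch {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hostnames.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass the real broker host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        System.out.println(host + ":" + port + " reachable = " + canConnect(host, port, 3000));
    }
}
```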
  • 11. Record is too large
    • This error occurs because the record size is greater than the “max.request.size” configuration, which defaults to 1048576 bytes (1 MB).
    • The error message looks like this:
      • org.apache.kafka.common.errors.RecordTooLargeException: The message is 1600088 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.
    • How to fix it:
      • Reduce the record size. This requires a change in the application that generates the record.
      • If you cannot reduce the record size, you can increase the producer configuration “max.request.size”. If you do this, you also need to increase the topic configuration “max.message.bytes”.
      • Note: “max.request.size” is the maximum request size AFTER serialization but BEFORE compression. So, enabling compression will not fix this.
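An application can guard against this error by comparing the serialized size with the limit before calling send(). The sketch below is an approximation under stated assumptions: it counts only key and value bytes, whereas the real client also accounts for record framing and headers, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Approximate pre-send size check against max.request.size. */
final class RecordSizeSketch {

    /** Default value of max.request.size quoted on the slide. */
    static final int DEFAULT_MAX_REQUEST_SIZE = 1_048_576;

    /** Key bytes plus value bytes; ignores framing overhead and headers. */
    static int approximateSize(byte[] key, byte[] value) {
        int keyLength = key == null ? 0 : key.length;
        int valueLength = value == null ? 0 : value.length;
        return keyLength + valueLength;
    }

    static boolean fits(byte[] key, byte[] value, int maxRequestSize) {
        return approximateSize(key, value) <= maxRequestSize;
    }

    public static void main(String[] args) {
        byte[] key = "k".getBytes(StandardCharsets.UTF_8);
        byte[] largeValue = new byte[1_600_000]; // similar in size to the record in the error above
        System.out.println("fits = " + fits(key, largeValue, DEFAULT_MAX_REQUEST_SIZE));
    }
}
```

Because the check uses uncompressed bytes, it matches the note above: compression does not change the outcome.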
  • 12. Batch expires before sending
    • This error is a symptom of slow transfer time (on the network) or slow processing (on the Kafka broker).
    • The error looks like this:
      • org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test1-0:1500 ms has passed since batch creation
    • Sanity checks:
      • Is the topic partition online? A topic partition is online if one or more of the Kafka brokers hosting its replicas are online.
        • Use “kafka-topics --bootstrap-server {BROKER HOST:PORT} --describe --topic {TOPIC NAME}”
      • “delivery.timeout.ms”: an upper bound on the time to report success or failure after a call to send() returns.
        • The default value is 120000 ms (2 minutes).
        • If “delivery.timeout.ms” is set to a very low value, batches can expire too early.
      • “batch.size”: the maximum size of a record batch.
        • The default value is 16384 bytes (16 kB).
        • If the records are large, this configuration may need to be increased to allow more records per batch. More records per batch means higher throughput and lower latency per record.
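One way to catch a “delivery.timeout.ms” that is too low is to check it against the other timing settings before creating the producer. The sketch below assumes the rule from the Kafka producer configuration documentation that “delivery.timeout.ms” should be at least “linger.ms” plus “request.timeout.ms”, and assumes a “request.timeout.ms” default of 30000 ms (not stated in the slides). The class name is hypothetical.

```java
import java.util.Properties;

/** Consistency check for producer timeout settings held in a Properties object. */
final class DeliveryTimeoutSketch {

    /**
     * Returns true if delivery.timeout.ms >= linger.ms + request.timeout.ms.
     * Missing keys fall back to defaults: 120000 and 0 are from the slides,
     * 30000 for request.timeout.ms is an assumption.
     */
    static boolean isConsistent(Properties props) {
        long deliveryTimeoutMs = Long.parseLong(props.getProperty("delivery.timeout.ms", "120000"));
        long lingerMs = Long.parseLong(props.getProperty("linger.ms", "0"));
        long requestTimeoutMs = Long.parseLong(props.getProperty("request.timeout.ms", "30000"));
        return deliveryTimeoutMs >= lingerMs + requestTimeoutMs;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("delivery.timeout.ms", "1500"); // the value seen in the error above
        System.out.println("consistent = " + isConsistent(props));
    }
}
```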
  • 13. Batch expires before sending
    • How to investigate this issue (cont’d):
      • First, we need to identify whether this is caused by slow transfer time or slow processing.
      • To check if it is slow transfer time, execute “ping {BROKER HOST}” from the producer host. The round trip time (RTT) should be reasonable. For example, if both the producer and the Kafka brokers are in the same data center, the RTT should be less than 10 ms, and usually under 1 ms.
      • If the “ping” result is good (i.e. consistently under 10 ms with 0% packet loss), then network latency is unlikely to be the cause.
      • To check if it is slow processing, check the following on the Kafka brokers:
        • Number of connections on the Kafka broker, with “netstat -n | grep 9092 | wc -l”. More than 1000 connections is usually too high and can cause slow processing or connectivity issues.
        • Number of topic partitions per broker. More than 1500 partitions per broker is usually too high and can cause slow processing. Check it with “kafka-topics --bootstrap-server {BROKER HOST:PORT} --describe | awk '{print $5, $6}' | sort | uniq -c”.
          • If the Kafka broker host has enough CPU and memory, you can increase “num.replica.fetchers” to 2 or 3 to allow more partitions per broker.
        • Inter-broker “ping” latency. If the brokers are running in multiple data centers (e.g. multiple Availability Zones), this may be a significant contributor to produce latency.
        • CPU usage of the Kafka brokers. The following JMX metrics also show how idle the internal threads are:
          • kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent: if this is low (< 0.5), the broker needs a higher “num.io.threads”, if CPU allows.
          • kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent: if this is low (< 0.5), the broker needs a higher “num.network.threads”, if CPU allows.
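The broker-side thresholds above can be collected into a small checklist function. This is a sketch that only encodes the rules of thumb from the slide (1000 connections, 1500 partitions, idle ratio below 0.5); the class name and the wording of the findings are hypothetical, and the input values are expected to come from the netstat, kafka-topics, and JMX checks described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Encodes the slide's rule-of-thumb thresholds for a single broker. */
final class BrokerLoadSketch {

    static List<String> findings(int connections, int partitions,
                                 double requestHandlerIdle, double networkProcessorIdle) {
        List<String> result = new ArrayList<>();
        if (connections > 1000) {
            result.add("too many connections");
        }
        if (partitions > 1500) {
            result.add("too many partitions");
        }
        if (requestHandlerIdle < 0.5) {
            result.add("consider raising num.io.threads");
        }
        if (networkProcessorIdle < 0.5) {
            result.add("consider raising num.network.threads");
        }
        return result;
    }

    public static void main(String[] args) {
        // Example readings: many connections and busy request handler threads.
        System.out.println(findings(1200, 800, 0.3, 0.9));
    }
}
```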
  • 14. Not enough replicas error
    • This means the number of replicas in the ISR is less than the “min.insync.replicas” configuration.
    • The error looks like this:
      • [2021-08-03 01:34:05,077] WARN [Producer clientId=producer-1] Got error produce response with correlation id 3 on topic-partition test2-0, retrying (2147483646 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender)
    • For example, this error occurs when all of the following are true:
      • The topic replication factor is 3.
      • The topic configuration includes “min.insync.replicas=2”.
      • The producer uses the “acks=all” configuration.
      • Fewer than 2 replicas of the partition are currently in sync.
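The condition behind NOT_ENOUGH_REPLICAS can be written as one predicate. The sketch below is a model of the rule, not broker code; the class and method names are hypothetical. It treats “acks=-1” as equivalent to “acks=all”.

```java
/** Model of when a produce request is rejected with NOT_ENOUGH_REPLICAS. */
final class MinIsrSketch {

    /**
     * @param inSyncReplicas     current number of replicas in the ISR
     * @param minInsyncReplicas  value of min.insync.replicas on the topic
     * @param acks               producer acks setting ("0", "1", "all" or "-1")
     */
    static boolean rejectsWrite(int inSyncReplicas, int minInsyncReplicas, String acks) {
        boolean acksAll = "all".equals(acks) || "-1".equals(acks);
        // Only acks=all producers are subject to the min.insync.replicas check.
        return acksAll && inSyncReplicas < minInsyncReplicas;
    }

    public static void main(String[] args) {
        // Replication factor 3, min.insync.replicas=2, two brokers offline: ISR size is 1.
        System.out.println(rejectsWrite(1, 2, "all"));
        // One broker offline: ISR size is 2, so writes are still accepted.
        System.out.println(rejectsWrite(2, 2, "all"));
    }
}
```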
  • 15. Not enough replicas error
    • What is ISR? It is short for “In Sync Replicas”. This is the set of replicas that are in sync with the leader, in other words, the follower replicas that have all the records the leader replica has (the leader itself is also counted in the ISR).
    • How can a replica become out of sync? Because the broker is offline, because replication has failed, or because replication is slow.
    • How to fix this error:
      • If the replica is out of sync because the Kafka broker is offline, start the broker hosting the offline replicas.
      • If it is out of sync because of a replication failure, fix the failure. This is a separate discussion, but the most common cause is disk failure. If the disk storing the replica data is full, the Kafka broker will stop replicating all replicas on that disk.
      • If it is out of sync because of slow replication, fix the slow replication. This is also a separate discussion, but the most common cause is inter-broker latency or too many topic partitions per broker.
  • 16. Thank you. Any questions?