Help, My Kafka is Broken! (Emma Humber & Gantigmaa Selenge, IBM) Kafka Summit 2020
1. IBM Event Streams
Apache Kafka
© 2020 IBM Corporation
Help, My Kafka’s Broken
Emma Humber
Gantigmaa (Tina) Selenge
Kafka Summit 2020
2. Help, my Kafka’s broken
Prepare
Include resource names and routing between components in a topology diagram.
Collect logs and store JMX metrics published by Kafka brokers, clients, the JVM and the OS.
Make logs useful.
Change one thing at a time.
3. Help, my Kafka’s broken
Review
Use logs to create a timeline of events. Consult your metrics.
Compare with a working system.
Collect data for root-cause analysis before restarting.
Understand what you need to find the root cause next time.
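One way to build such a timeline is to merge the logs from several components and sort them by timestamp. The following is a minimal sketch, assuming every log uses the default Kafka log4j layout ("[2020-09-12 13:44:02,633] INFO ..."); the function name `build_timeline` and the source labels are hypothetical.

```python
import re
from datetime import datetime

# Assumption: lines start with a timestamp like "[2020-09-12 13:44:02,633]".
TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] (.*)$")

def build_timeline(logs):
    """Merge {source name: [log lines]} into one list sorted by timestamp."""
    events = []
    for source, lines in logs.items():
        for line in lines:
            match = TS.match(line)
            if match:  # lines without a timestamp (e.g. stack traces) are skipped
                when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
                events.append((when, source, match.group(2)))
    return sorted(events)

if __name__ == "__main__":
    sample = {
        "broker-0": ["[2020-09-12 13:44:02,633] INFO Replica loaded"],
        "broker-1": ["[2020-09-12 13:44:01,000] WARN Connection lost"],
    }
    for when, source, message in build_timeline(sample):
        print(when.isoformat(), source, message)
```

This only works if the clocks of the machines involved are reasonably well synchronised.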
5.
[2020-09-12 13:44:02,633] INFO Replica loaded for partition asdf-0 with initial high watermark 0 (kafka.cluster.Replica)
6. Logs
Find a log4j.properties
kafka-install/config/log4j.properties
Edit output location
log4j.appender.kafkaAppender.File=mylog123.log
Change log level
log4j.rootLogger=INFO, stdout, kafkaAppender
log4j.logger.kafka=DEBUG
log4j.logger.org.apache.kafka=TRACE
7. Hangs
Collect javacores at intervals.
kill -3 $JAVA_PID
jstack -l $JAVA_PID > javacore.txt
Look for threads that don’t change and deadlock alerts.
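Comparing dumps taken at intervals can be partly automated. The sketch below assumes HotSpot-style jstack text output (a quoted thread name on the header line, "at ..." frames, a blank line between threads); javacores written by other JVMs use a different layout and would need a different parser. The function names are hypothetical.

```python
import re

# Assumption: each thread block starts with a header such as
#   "kafka-request-handler-0" #45 daemon prio=5 ...
HEADER = re.compile(r'^"([^"]+)"')

def parse_dump(text):
    """Return {thread name: tuple of stack frame lines} for one dump."""
    threads = {}
    for block in text.split("\n\n"):
        lines = [line.strip() for line in block.strip().splitlines()]
        if lines and HEADER.match(lines[0]):
            name = HEADER.match(lines[0]).group(1)
            threads[name] = tuple(l for l in lines[1:] if l.startswith("at "))
    return threads

def unchanged_threads(dumps):
    """Names of threads whose non-empty stack is identical in every dump."""
    parsed = [parse_dump(dump) for dump in dumps]
    first = parsed[0]
    return sorted(
        name for name, stack in first.items()
        if stack and all(other.get(name) == stack for other in parsed[1:])
    )
```

An unchanged stack is only a hint: idle threads that are parked waiting for work also look identical between dumps, so the candidates still need a manual look.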
9. Memory
Excessive load or a memory leak?
Use Health Center to analyse heap dumps.
-XX:+HeapDumpOnOutOfMemoryError
10. Memory
Kubernetes can terminate containers that exceed configured resources.
Leave room for native memory allocation and pagecache, as well as heap.
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
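Exit code 137 follows the common convention of 128 plus the signal number, so 137 means the process received signal 9 (SIGKILL). A small sketch of that arithmetic, with a hypothetical helper name and assuming a POSIX platform where Python's `signal` module knows the signal numbers:

```python
import signal

def describe_exit_code(code):
    """Explain an exit code using the 128 + signal number convention."""
    if code > 128:
        try:
            return "killed by " + signal.Signals(code - 128).name
        except ValueError:  # not a known signal number on this platform
            return "exited with status " + str(code)
    return "exited with status " + str(code)

if __name__ == "__main__":
    print(describe_exit_code(137))
```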
11. Garbage collection
Kafka can be sensitive to garbage collection.
Unexplained delays in processing increase message latency. Gaps appear between log timestamps.
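Gaps between log timestamps can be found mechanically. This is a minimal sketch, assuming the default Kafka log4j timestamp layout; the five-second threshold is an arbitrary illustration, and a gap may have causes other than garbage collection (for example, a simply idle broker).

```python
import re
from datetime import datetime, timedelta

# Assumption: lines start with a timestamp like "[2020-09-12 13:44:02,633]".
TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")

def find_gaps(lines, threshold=timedelta(seconds=5)):
    """Return (start, end, gap) for consecutive timestamps further apart than threshold."""
    gaps = []
    previous = None
    for line in lines:
        match = TS.match(line)
        if not match:
            continue  # continuation lines carry no timestamp
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps
```

Any gap found this way is worth comparing against the JVM's garbage collection log for the same period.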
13. ZooKeeper
Send four-letter words to the ZooKeeper cluster to query state
echo "srvr" | nc <zookeeper_ip> 2181
Navigate the ZooKeeper tree
bin/zkCli.sh
If replication is stuck, consider deleting the znode representing the controller (/controller) to trigger re-election.
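The same four-letter-word query can be scripted, which helps when polling every server in the ensemble. The sketch below sends the command over a plain TCP socket and reads until the server closes the connection; the helper names are hypothetical, and the parser assumes the reply is made of "Key: value" lines.

```python
import socket

def four_letter_word(host, command="srvr", port=2181, timeout=5.0):
    """Send a four-letter word to one ZooKeeper server and return its text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_srvr(reply):
    """Turn the 'Key: value' lines of a reply into a dict."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields
```

For example, `parse_srvr(four_letter_word("<zookeeper_ip>")).get("Mode")` would show whether that server reports itself as leader, follower or standalone.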
15. Monitoring
[Diagram: a Kafka cluster of broker JVMs and a client application (producer, consumer) each expose metrics through a jmx_exporter server; external monitoring consists of Prometheus, Alert Manager, Grafana and PagerDuty.]
16. Monitoring
Metrics
Start with a few key metrics.
Alert on carefully selected, key data.
Refine metrics and alerts when you encounter a problem.
Watch for missing metrics!
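A source that stops reporting is easy to miss, because nothing crosses a threshold. As a rough sketch of the idea, the hypothetical helper below flags sources whose most recent sample is too old; the source names and the 120-second limit are made up for illustration.

```python
import time

def stale_sources(last_seen, max_age_seconds=120, now=None):
    """Return names of sources whose latest sample is older than max_age_seconds.

    last_seen maps a source name to the Unix time of its most recent sample.
    """
    now = time.time() if now is None else now
    return sorted(
        name for name, seen in last_seen.items()
        if now - seen > max_age_seconds
    )
```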
18. Broker metrics
Partitions
Under-replicated partition count.
kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
Partitions with fewer than the minimum in-sync replicas.
kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount
Offline partitions have no leader.
kafka.controller:type=KafkaController,name=OfflinePartitionsCount
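All three of these counts should normally be zero, which makes them simple to check. The sketch below assumes the values have already been collected into a dictionary keyed by JMX object name; the function name and the way samples are gathered are hypothetical.

```python
# Object names are taken from the slide; how the samples are collected is up to you.
ZERO_WHEN_HEALTHY = (
    "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
    "kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount",
    "kafka.controller:type=KafkaController,name=OfflinePartitionsCount",
)

def partition_alerts(samples):
    """Return (metric, problem) pairs for metrics that are missing or above zero."""
    alerts = []
    for name in ZERO_WHEN_HEALTHY:
        value = samples.get(name)
        if value is None:
            alerts.append((name, "no data"))  # a missing metric is itself a warning
        elif value > 0:
            alerts.append((name, value))
    return alerts
```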
20. Client metrics
Producer
Monitor metrics showing the trend of flow rates and latency.
kafka.producer:type=producer-metrics,name=
record-send-rate
record-error-rate
request-rate
request-latency-avg
response-rate
io-wait-time-ns-avg
21. Client metrics
Consumer
Understand what lag is acceptable.
kafka.consumer:type=consumer-fetch-manager-metrics,name=
records-lag
records-lead-min
records-consumed-rate
kafka-consumer-groups.sh
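Lag for a partition is the log end offset minus the consumer group's committed offset. A minimal sketch of checking it against an agreed limit follows; the function name and the partition labels are hypothetical, and a partition with no committed offset is treated as offset 0, which is an assumption.

```python
def partitions_over_lag(end_offsets, committed, acceptable_lag):
    """Return {partition: lag} for partitions whose lag exceeds acceptable_lag.

    end_offsets and committed map a partition label to an offset.
    """
    lag = {
        partition: end - committed.get(partition, 0)
        for partition, end in end_offsets.items()
    }
    return {p: value for p, value in lag.items() if value > acceptable_lag}
```

What counts as acceptable depends on the application: a steady lag of a few hundred records may be fine, while lag that keeps growing usually is not.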
24. Summary
Monitoring can get you ahead of problems before they happen.
Start with a small set of key metrics.
Alert on carefully selected metrics and avoid bad practices.
It is important to monitor OS-level metrics.
25. Thank you
Emma Humber
Support Lead - IBM Event Streams
—
emma.humber@uk.ibm.com
Gantigmaa (Tina) Selenge
DevOps Engineer – IBM Event Streams
—
gselenge@uk.ibm.com
© Copyright IBM Corporation 2020. All rights reserved. The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of
any kind, express or implied. Any statement of direction represents IBM’s current intent, is subject to change or withdrawal, and represent only goals and objectives. IBM, the IBM logo, and
ibm.com are trademarks of IBM Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM
trademarks is available at Copyright and trademark information.