
JavaZone 2015: How to make life with Kafka easier

You’ve just set up your Kafka cluster and now you are ready to process tens of thousands of events per second. You have decoupled your architecture, all communication goes through a pub-sub bus, and you can focus on delivering business value. It would be great if that were true. In real life you need a lot of tweaking to get your backbone ready to handle all the traffic you want.

  1. 1. PubSub++ How to make your life with Kafka easier Krzysztof Dębski @DebskiChris JavaZone 2015
  2. 2. Who am I @DebskiChris http://hermes.allegro.tech
  3. 3. Allegro Group 500+ people in IT 50+ independent teams 16 years on market 2 years after technical revolution
  4. 4. Kafka as a backbone
  5. 5. Kafka
  6. 6. Hermes
  7. 7. Kafka data
  8. 8. Partitioning Round robin partitioning (default) Key based partitioning
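A minimal sketch of the two strategies from the slide above, using the Kafka Java producer; the broker address, topic name, key and payloads are placeholders, not values from the talk. Records sent without a key are spread round robin over the partitions, while records with a key always land on the partition derived from that key, which preserves per-key ordering.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PartitioningExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // No key: the producer spreads records across partitions (round robin).
        producer.send(new ProducerRecord<>("test", "{\"event\":\"page-view\"}"));

        // Key-based: every record with key "user-42" goes to the same partition,
        // so events for one user keep their order.
        producer.send(new ProducerRecord<>("test", "user-42", "{\"event\":\"purchase\"}"));

        producer.close();
    }
}
```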
  9. 9. Performance issues
  10. 10. Rebalancing leaders Broker 1 P1 P0 Broker 2 P2 P1 Broker 3 P0 P2 Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000 Topic: test Partition: 0 Leader: 3 Replicas: 3, 1 ISR: 3, 1 Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2 Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3
  11. 11. Rebalancing leaders Broker 1 P1 P0 Broker 2 P2 P1 Broker 3 P0 P2 Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000 Topic: test Partition: 0 Leader: 3 Replicas: 3, 1 ISR: 3, 1 Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2 Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3 Brokers that should have partition copies
  12. 12. Rebalancing leaders Broker 1 P1 P0 Broker 2 P2 P1 Broker 3 P0 P2 Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000 Topic: test Partition: 0 Leader: 3 Replicas: 3, 1 ISR: 3, 1 Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2 Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3 In Sync Replicas
  13. 13. Rebalancing leaders Broker 1 P1 P0 Broker 2 P2 P1 Broker 3 P0 P2 Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000 Topic: test Partition: 0 Leader: 3 Replicas: 3, 1 ISR: 3, 1 Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2 Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3 Leader broker ID
  14. 14. Rebalancing leaders Broker 1 P1 P0 Broker 2 P2 P1 Broker 3 P0 P2 Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000 Topic: test Partition: 0 Leader: 3 Replicas: 3, 1 ISR: 3, 1 Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2 Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3
  15. 15. Rebalancing leaders Broker 1 P1 P0 Broker 2 P2 P1 Broker 3 P0 P2 Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000 Topic: test Partition: 0 Leader: 1 Replicas: 3, 1 ISR: 1 Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2 Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2
  16. 16. Rebalancing leaders Broker 1 P1 P0 Broker 2 P2 P1 Broker 3 P0 P2 Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000 Topic: test Partition: 0 Leader: 1 Replicas: 3, 1 ISR: 1, 3 Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2 Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3
  17. 17. Lost events
  18. 18. ACK levels: 0 - don’t wait for a response from the leader; 1 - only the leader has to respond; -1 - all replicas must be in sync. (Trade-off: speed vs. safety)
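On the producer side the ack level from this slide is a single configuration entry. A sketch assuming the Java client is used; the broker address is a placeholder:

```java
import java.util.Properties;

public class ProducerAckConfig {

    // Builds producer properties for a given ack level: "0", "1" or "all" (-1).
    static Properties producerProps(String acks) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        // "0"   - don't wait for the leader: fastest, events can be lost silently
        // "1"   - only the leader has to respond: the usual middle ground
        // "all" - same as -1: all in-sync replicas must acknowledge, slowest but safest
        props.put("acks", acks);
        return props;
    }
}
```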
  19. 19. Event identification
  20. 20. Lost events
  21. 21. Lost events ERROR [Replica Manager on Broker 2]: Error when processing fetch request for partition [test,1] offset 10000 from consumer with correlation id 0. Possible cause: Request for offset 10000 but we only have log segments in the range 8000 to 9000. (kafka.server.ReplicaManager)
  22. 22. Lost events Broker 1 Broker 2 Producer ACK = 1 Replication factor = 1 replica.lag.max.messages = 2000 committed offset = 10000 committed offset = 9000 Zookeeper
  23. 23. Lost events Broker 1 Broker 2 Producer ACK = 1 Replication factor = 1 replica.lag.max.messages = 2000 committed offset = 10000 committed offset = 9000 Zookeeper
  24. 24. Lost events Broker 1 Broker 2 Producer ACK = 1 Replication factor = 1 replica.lag.max.messages = 2000 committed offset = 10000 committed offset = 9000 Zookeeper committed offset = 9000
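Events get lost here because the producer is satisfied with a single acknowledgement (ACK = 1): if the acknowledging broker fails before the data reaches anyone else, the offsets it confirmed disappear. A hedged sketch of settings that narrow this window; the values are illustrative, not the configuration used at Allegro:

```java
import java.util.Properties;

public class SaferPublishingConfig {

    // Producer side: wait for all in-sync replicas and retry transient failures.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("acks", "all");
        props.put("retries", "3");
        return props;
    }

    // Broker / topic side settings are not set from Java code; they live in
    // server.properties or as per-topic overrides:
    //   min.insync.replicas=2                 - refuse writes when too few replicas are in sync
    //   unclean.leader.election.enable=false  - never promote an out-of-sync replica to leader
}
```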
  25. 25. Slow responses
  26. 26. Slow responses (chart: response time at the 75th, 99th and 99.9th percentiles)
  27. 27. Slow responses (chart: message size at the 75th, 99th and 99.9th percentiles) Is response time correlated to message size?
  28. 28. Slow responses (chart: response time at the 75th, 99th and 99.9th percentiles) Same distribution for fixed message size.
  29. 29. Slow responses (chart: response time at the 75th, 99th and 99.9th percentiles) Hermes overhead is just about 1 ms.
  30. 30. Kafka kernel 3.2.x
  31. 31. Kafka kernel 3.2.x
  32. 32. Kafka kernel 3.2.x kernel >= 3.8.x
  33. 33. Normal operation
  34. 34. Slow responses
  35. 35. Message size
  36. 36. Optimize message size (chart: 99.9th-percentile message size, all topics vs. the biggest topic)
  37. 37. Optimize message size JSON: human readable; big memory and network footprint; poor support for Hadoop
  38. 38. Optimize message size JSON Snappy ERROR Error when sending message to topic t3 with key: 4 bytes, value: 100 bytes with error: The server experienced an unexpected error when processing the request (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) java: target/snappy-1.1.1/snappy.cc:423: char* snappy::internal::CompressFragment(const char*, size_t, char*, snappy::uint16*, int): Assertion `0 == memcmp(base, candidate, matched)' failed. Errors when publishing a large volume of messages
  39. 39. Optimize message size JSON Snappy LZ4: failed on distributed data (chart: compression ratio, single topic vs. multiple topics)
  40. 40. Optimize message size JSON Snappy LZ4 Avro: small network footprint, Hadoop friendly, easy schema verification
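A minimal sketch of encoding one event as binary Avro; the PurchaseEvent schema and its field names are made up for illustration. The payload carries no field names, which is what keeps the network footprint small, and the schema doubles as verification of what producers publish.

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AvroEventSerializer {

    // Hypothetical event schema; real schemas live with the topic owners.
    private static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"PurchaseEvent\",\"fields\":["
      + "{\"name\":\"userId\",\"type\":\"string\"},"
      + "{\"name\":\"amount\",\"type\":\"double\"}]}");

    public static byte[] serialize(String userId, double amount) throws Exception {
        GenericRecord event = new GenericData.Record(SCHEMA);
        event.put("userId", userId);
        event.put("amount", amount);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(event, encoder);
        encoder.flush();
        // Binary Avro: no field names in the payload, so it is much smaller
        // than the equivalent JSON document.
        return out.toByteArray();
    }
}
```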
  41. 41. Improvements
  42. 42. Multi data center
  43. 43. Consumer backoff You can’t have exactly-once delivery http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/
  44. 44. Kafka offsets <=0.8.1 - Zookeeper >=0.8.2 - Zookeeper or Kafka >=0.9(?) - Kafka
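Because exactly-once delivery is off the table (slide 43), consumers end up with at-least-once semantics: crash after processing but before committing the offset, and the same events are delivered again. A sketch using the new Java consumer from the 0.9 line mentioned on the slide above; broker address, group id and topic are placeholders:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "example-group");              // assumed consumer group
        props.put("enable.auto.commit", "false");            // commit only after processing
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    process(record);    // must be idempotent: a crash before the commit
                }                       // below means these records arrive again
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("%s -> %s%n", record.key(), record.value());
    }
}
```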
  45. 45. Kafka Offset Monitor
  46. 46. Manage your topics
  47. 47. Improved security Authentication and authorization interfaces provided. By default: you can create any topic in your group; you can publish everywhere (in progress); group owner defines subscriptions
  48. 48. Improved offset management
  49. 49. Improved offset management
  50. 50. Improved offset management
  51. 51. Improved offset management
  52. 52. Turn back the time PUT /groups/{group}/topics/{topic}/subscriptions/{subscription}/retransmission -8h
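A hedged sketch of calling the retransmission endpoint from slide 52; the management host, group, topic and subscription names are hypothetical, and the request body simply follows the "-8h" example shown on the slide:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RetransmissionRequest {
    public static void main(String[] args) throws Exception {
        // Hypothetical Hermes management host and subscription coordinates.
        URL url = new URL("http://hermes-management.example.com"
                + "/groups/pl.allegro.events/topics/purchase/subscriptions/audit/retransmission");

        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("PUT");
        connection.setDoOutput(true);
        try (OutputStream body = connection.getOutputStream()) {
            // Ask for a replay of the last 8 hours, as in the "-8h" example on the slide.
            body.write("-8h".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Retransmission request returned HTTP " + connection.getResponseCode());
    }
}
```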
  53. 53. Blog: http://allegro.tech Twitter: @allegrotechblog Twitter: @debskichris
