"I will present how we do smart, automated capacity management on multi-tenant Kafka clusters at Booking.com. It was a long journey. In this end-to-end story, I will cover what the issues were at the beginning; how we came up with a plan, designed it, implemented it, and applied it smoothly to our existing clusters; and how clients can now monitor their usage and even get alerted before their reserved capacity is reached. What were the challenges and our learnings? What is next?
Why? At Booking.com, the infra team manages 60 different Kafka clusters with hundreds of topics in each. Some clusters run a hundred brokers. With hundreds of Kafka clients from tens of different departments, it is highly likely that some clients start abusing the cluster. Especially during peak times, when retention is set via retention.ms, or when the underlying message size changes, it is hard to predict the total storage that will be occupied. Finding the relevant clients, deciding which data to discard, and dealing with so many unknowns in a short period of time can be a hassle. These activities are also no fun, just toil for the team.
What? To avoid such tedious issues, the team chose to build a smart mechanism and put quotas in place. It saved us time for developing new features instead of chasing people to resolve collisions. You can think of it as an extension of the built-in producer/consumer rate-limit throttling provided by Apache Kafka, but it is much more than that. Several components will be explained during the presentation. One of them is our custom-built control plane, which manages the communication between clients and servers and automates many tasks.
Another is the custom policies that we plugged in on the Kafka side to validate configuration on the server side, even when a malicious configuration is attempted. The talk guarantees learning and shows examples of Kafka-at-scale problems at Booking.com."
3. Streaming Infra Team
Nurettin OMEROGLU
Senior Software Engineer
I am a member of the Streaming Infra Team (10 people) and have more than 4 years of experience with Apache Kafka client-side and server-side components. We manage an on-prem Kafka solution serving clients running on a variety of platforms such as bare-metal, Kubernetes, and also Cloud.
6. 100M monthly active app users
● 155,000 destinations around the world
● Car hire available in 140+ countries and pre-booked taxis in over 500 cities across 120+ countries
● 243M+ verified guest reviews and 24/7 customer service in 45 languages and dialects
● Since 2010, Booking.com has welcomed 4.5B+ guest arrivals
● 28M total reported listings worldwide
● 6.6M options in homes, apartments and other unique places to stay
● 30 different types of places to stay, including homes, apartments, B&Bs, hostels, farm stays, bungalows, even boats, igloos and treehouses
● 140 offices in 70 countries; over 5,000 employees in Amsterdam
7. Data Streaming Platform
[Diagram labels around the Data Streaming Platform: Payments, A/B Tests, MySQL, Cassandra, Hadoop, Cloud, ..., Events, Logs, Online ML, Fraud detection, Personalization, Bookings FPA reporting, Real-time analytics, MySQL, Cassandra, Hadoop, Cloud, ...]
● Transports and transposes data via pub/sub;
● Connects applications through data pipelines;
● Resilient, scalable, fault tolerant, secure, with SLO guarantees.
8. Scale of Streaming @Booking.com
How much data? ~2.2PB produced and consumed per day
How many clusters? 62
How many topics? ~34K
How many partitions? ~138K
How many servers? 900 Kafka brokers + 75 ZooKeeper nodes
10. Setup
● On-premise multi-tenant Kafka clusters running on bare-metal
● Local SSD storage (~3.5TB per broker)
● 32-thread CPU / 256GB memory / 10 Gb network
11. Existing Components
● Custom Configuration validations
● Custom Quota validations
○ Topics per principal
○ Partitions per principal
…
● Topics
● Custom quotas (booking-specific)
…
● Specific Configurations
● Custom Quotas
…
● Custom PrincipalBuilder
● Custom Policies (AbstractPolicy)
○ AlterConfigPolicy
○ CreateTopicPolicy
…
[Diagram components: MySQL (Metadata Store), Bkstreaming CLI (self-service, home-built), Kontrole (Control Center, home-built), Kafka Cluster]
12. Example Scenario for Custom Quota Validations
(1) Add topic for a service
(2) Auth: OK
(3) Topics per principal quota: OK
(4) Partitions per principal quota: OK
(5) Create topic
[Diagram components: MySQL (Metadata Store), Kontrole (Control Center), Kafka Cluster]
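The checks in this scenario could be sketched roughly as follows. This is a minimal illustration, not Kontrole's actual code: the names `PrincipalLimits`, `PrincipalUsage` and `validate_topic_request` are hypothetical, and the limits are assumed to come from the metadata store.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PrincipalLimits:
    """Hypothetical per-principal quotas, assumed to be read from the metadata store."""
    max_topics: int
    max_partitions: int


@dataclass(frozen=True)
class PrincipalUsage:
    """What the principal currently owns."""
    topics: int
    partitions: int


def validate_topic_request(authenticated: bool,
                           limits: PrincipalLimits,
                           usage: PrincipalUsage,
                           new_partitions: int) -> List[str]:
    """Return the names of the failed checks; an empty list means 'create topic'."""
    # Step (2): authentication must pass before any quota is looked at.
    if not authenticated:
        return ["auth"]
    failures = []
    # Step (3): topics-per-principal quota, counting the topic being added.
    if usage.topics + 1 > limits.max_topics:
        failures.append("topics_per_principal")
    # Step (4): partitions-per-principal quota, counting the new partitions.
    if usage.partitions + new_partitions > limits.max_partitions:
        failures.append("partitions_per_principal")
    return failures
```

Only when the returned list is empty would the control plane go on to step (5) and create the topic on the cluster.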
13. Reactive Approach
● Clients use the retention.ms configuration, which deletes messages after a certain amount of time.
● Dangerous situations if traffic spikes
● We were the middleman handling the toil / issues between multiple tenants
○ Increase number of brokers, or
○ Determine noisy neighbors and
■ Throttle, or
■ Communicate with clients (night?)
● Lack of visibility and forecasting to plan ahead
[Diagram: shared broker disk among topics (Topic 1 to Topic 6), with reserved space for safety]
15. IDEA?
retention.bytes deletes the oldest messages when the total size of a partition exceeds a threshold.
● Reserve storage per principal (quota)
● Let the clients manage their reserved storage
● Make retention.bytes mandatory on topics
● Give feedback to clients about their usage/growth
Discarded Options:
● Kubernetes elasticity
● Network attached or remote storage options
[Diagram: broker disk divided into reserved quotas per principal, with reserved space for safety]
16. Determine cluster capacity
1) Periodically fetch information about the cluster from Cruise Control: number of available brokers, disk information …
2) Use the minimum disk capacity among brokers to calculate cluster capacity
3) Target 90% disk usage (headroom)
Total capacity = (min broker disk * number of brokers) * 0.9
[Diagram components: Kontrole, Cruise Control, Graphite]
(1) Periodic cron job
(2) Available brokers, disk information
(3) Calculate capacity, publish metrics
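As a worked example of the capacity formula, here is a minimal sketch. It assumes the per-broker disk sizes have already been fetched; the `BrokerDisk` type and the function name are hypothetical, and the calls to Cruise Control and Graphite are left out. The 90% target is expressed as an integer percentage to avoid floating-point rounding.

```python
from dataclasses import dataclass
from typing import Sequence

HEADROOM_PERCENT = 90  # target 90% disk usage, as stated on the slide


@dataclass(frozen=True)
class BrokerDisk:
    """Hypothetical record for one available broker."""
    broker_id: int
    disk_capacity_bytes: int


def cluster_capacity_bytes(brokers: Sequence[BrokerDisk],
                           headroom_percent: int = HEADROOM_PERCENT) -> int:
    """Total capacity = (min broker disk * number of brokers) * 0.9."""
    if not brokers:
        raise ValueError("no available brokers reported")
    # The smallest disk bounds what every broker can safely hold.
    min_disk = min(b.disk_capacity_bytes for b in brokers)
    return min_disk * len(brokers) * headroom_percent // 100
```

Using the minimum disk rather than the sum is the conservative choice: partitions are spread across brokers, so the smallest broker fills up first.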
17. New Quota + Topic level configuration
● Reserve storage per principal (quota) (default 500MB)
● Add property `topic_capacity_bytes` per Kafka topic (not visible to Kafka brokers) to manage retention.bytes
● We base all calculations on this value (including retention.bytes)
topic_capacity_bytes = retention.bytes * partition_count * replica_count
● Whenever the partition count increases (i.e. done via Kontrole), retention.bytes (per partition) is recalculated accordingly.
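Solving the formula above for retention.bytes gives the per-partition value that would be set on the topic. The sketch below is illustrative only; the function name is hypothetical and rounding down with integer division is an assumption.

```python
def retention_bytes_per_partition(topic_capacity_bytes: int,
                                  partition_count: int,
                                  replica_count: int) -> int:
    """Invert topic_capacity_bytes = retention.bytes * partition_count * replica_count."""
    if partition_count <= 0 or replica_count <= 0:
        raise ValueError("partition_count and replica_count must be positive")
    # Round down so the topic never exceeds its reserved capacity.
    return topic_capacity_bytes // (partition_count * replica_count)


# Example: a topic with 600 MiB of capacity and replication factor 3.
capacity = 600 * 1024 * 1024
before = retention_bytes_per_partition(capacity, partition_count=10, replica_count=3)
# After a partition increase from 10 to 20, the capacity stays the same,
# so each partition gets half as much.
after = retention_bytes_per_partition(capacity, partition_count=20, replica_count=3)
```

With these numbers each partition is allowed 20 MiB before the increase and 10 MiB after it.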
19. New Topic Creation
[Diagram components: Kontrole, MySQL, Kafka Cluster]
(1) Create topic with topic_capacity_bytes
(2) Get principal’s quota
(3) Enough space for the new topic?
(4) No, reject. Ask for quota increase
(4) Yes, topic fits, go on! Create topic with relevant retention.bytes
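The topic creation flow might look like the sketch below. It assumes that "enough space" means the sum of `topic_capacity_bytes` over the principal's topics, plus the new topic, stays within the principal's quota; the slides do not spell this out. `PrincipalStorage`, `QuotaExceeded` and `plan_topic_creation` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict


class QuotaExceeded(Exception):
    """Raised when the new topic does not fit; the client should ask for a quota increase."""


@dataclass
class PrincipalStorage:
    """Hypothetical view of one principal's row in the metadata store."""
    quota_bytes: int
    topic_capacities: Dict[str, int] = field(default_factory=dict)

    def used_bytes(self) -> int:
        return sum(self.topic_capacities.values())


def plan_topic_creation(principal: PrincipalStorage, topic: str,
                        topic_capacity_bytes: int,
                        partition_count: int, replica_count: int) -> Dict[str, str]:
    """Steps (2) to (4): check the quota, then return the topic config to apply."""
    remaining = principal.quota_bytes - principal.used_bytes()
    if topic_capacity_bytes > remaining:
        raise QuotaExceeded(
            f"{topic} needs {topic_capacity_bytes} bytes, only {remaining} left in quota")
    # Record the reservation, then derive the per-partition retention.bytes.
    principal.topic_capacities[topic] = topic_capacity_bytes
    per_partition = topic_capacity_bytes // (partition_count * replica_count)
    return {"retention.bytes": str(per_partition)}
```

The returned dictionary stands in for the configuration that would be passed along when the topic is created on the Kafka cluster.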
22. Add Alerting
● Warn/notify before the topic_capacity_bytes configuration kicks in and starts deleting data.
● Actions:
○ reduce the retention.ms configuration, or
○ increase the topic capacity.
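A simple threshold check is one way such a warning could be produced. The sketch below is an assumption: the 80% threshold and the function name are hypothetical, and the talk does not state which threshold or alerting system is used.

```python
WARN_RATIO = 0.8  # hypothetical threshold; not taken from the talk


def capacity_alert(topic_size_bytes: int, topic_capacity_bytes: int,
                   warn_ratio: float = WARN_RATIO) -> str:
    """Return 'warn' when usage approaches topic_capacity_bytes, otherwise 'ok'."""
    if topic_capacity_bytes <= 0:
        raise ValueError("topic_capacity_bytes must be positive")
    usage_ratio = topic_size_bytes / topic_capacity_bytes
    # Warn before size-based retention starts deleting data.
    return "warn" if usage_ratio >= warn_ratio else "ok"
```

A client receiving the warning can then pick one of the two actions listed above.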
23. Onboard Existing Clusters
● Simulating scenarios on test cluster
● Operational documentation
● Stakeholder management
● Documentation for clients
● Enable capacity project on a cluster
○ Calculate / Add topic_capacity_bytes to each topic (with extra)
○ Calculate / Add quotas per principal
24. Migration Challenges
● Revert strategy
○ Dynamic flag to disable the project on cluster
● Sanity check if cluster is suitable
○ Brokers may have non-uniform storage capacity
○ With extras, all quotas may not fit into the available capacity
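The suitability check could be sketched as below. Everything here is an assumption for illustration: quotas are derived from each principal's current usage plus a hypothetical 20% extra, and the cluster capacity reuses the minimum-disk rule with a 90% target.

```python
from typing import Dict, Sequence


def onboarding_check(broker_disks_bytes: Sequence[int],
                     principal_usage_bytes: Dict[str, int],
                     extra_percent: int = 20,
                     headroom_percent: int = 90) -> dict:
    """Check whether a cluster looks suitable for enabling capacity management."""
    if not broker_disks_bytes:
        raise ValueError("no brokers given")
    min_disk, max_disk = min(broker_disks_bytes), max(broker_disks_bytes)
    capacity = min_disk * len(broker_disks_bytes) * headroom_percent // 100
    # Proposed quota = current usage plus the extra (hypothetical 20%).
    quotas = {p: used * (100 + extra_percent) // 100
              for p, used in principal_usage_bytes.items()}
    return {
        "uniform_disks": min_disk == max_disk,
        "cluster_capacity_bytes": capacity,
        "proposed_quotas": quotas,
        "quotas_fit": sum(quotas.values()) <= capacity,
    }
```

If `quotas_fit` is false, the options are to shrink the extras or add capacity before enabling the project; if `uniform_disks` is false, the minimum-disk rule leaves part of the larger disks unused.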
26. What is next?
● Allow teams to extend their quota if there is enough capacity (self service)
● Send usage reports to the teams, with the capacity allocated to the principal vs. their usage (cost attribution)