The Dark and Dirty Side of Fixing Uneven Partitions with Olena Babenko & Olena Kutsenko

The dark and dirty side of fixing uneven partitions
Olena Kutsenko, Sr. Developer Advocate, Aiven
Olena Babenko, Staff Software Engineer, Aiven

olena@aiven.io
@OlenaKutsenko aiven.io Olena Babenko:
It all started well…

Recommended strategies for partitioning
➔ Select the number of partitions based on how data is consumed
➔ Select a number of partitions that is neither too low nor too high
➔ Use keys with the highest cardinality
➔ Be mindful of data distribution over time
➔ Consider potential edge cases

You were pretty happy about the results

Nothing predicted the storm

Or so you thought

Data balancing gone wild
Partition 1 47%
Partition 2 34%
Partition 3 7%
Partition 4 4%
Partition 5 4%
Partition 6 4%

How uneven partitions affect the system
➔ Brokers:
◆ Heavy load on the file system -> slower brokers
➔ Consumers:
◆ Increased consumer lag
◆ Consumers that are assigned to a hot partition require more resources
◆ Underutilisation of resources when scaling vertically with k8s
◆ Out-of-memory exception cycle

What to do now?

“Premature optimization is the root of all evil”
Donald Knuth

You can’t avoid the change.
Embrace the inevitable.

Today you’ll learn
● Different recipes to deal with uneven partitioning
● From easiest 🌶 to more difficult 🌶🌶🌶

🌶🌶🌶 The advanced techniques will help you
● Rebalance records across partitions
● Scale your topic up or down
● Be effective at disaster recovery

Partition     Uneven (6 partitions)   Rebalanced (8 partitions)
Partition 1   47%                     15%
Partition 2   34%                     12%
Partition 3   7%                      13%
Partition 4   4%                      11%
Partition 5   4%                      13%
Partition 6   4%                      11%
Partition 7   -                       14%
Partition 8   -                       12%

Level 1. Easy 🌶
If you don’t use keys...

No keys - increase the number of partitions
- This way you can’t scale down, but you can scale up!
- Pay attention to
  - Data retention period
  - Number of consumers
  - Data distribution over time
  - linger.ms and batch.size for sticky partitioning
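
With keyless records the producer's sticky partitioning roughly keeps writing to one partition until a batch is done, so the batching settings influence how evenly records spread. The sketch below only shows where these two settings live in a Java producer configuration. The class name and the values are illustrative assumptions, not recommendations from the talk.

```java
import java.util.Properties;

final class StickyProducerConfig {
    // Builds producer properties for keyless records.
    // The values for linger.ms and batch.size are illustrative assumptions;
    // tune them against your own traffic.
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        // How long the producer waits for more records before sending a batch.
        props.setProperty("linger.ms", "5");
        // Upper bound, in bytes, for a single batch to one partition.
        props.setProperty("batch.size", "16384");
        return props;
    }
}
```

The resulting Properties object is what would be passed to a KafkaProducer constructor.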

Level 2. Moderate 🌶🌶
One or two keys are hot

You still can add new partitions… kinda

The key challenge:
github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/BuiltInPartitioner.java

You will need to
● Calculate which key is hot 🔥
● Keep the state
● Not mess up old keys
● Use a custom partitioner
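
The first item, finding the hot key, can be approximated by counting keys over a sample of traffic. This is a minimal sketch under that assumption: the class name, the String keys and the share threshold are all hypothetical, and a production version would need a bounded or windowed counter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HotKeyDetector {
    private final Map<String, Long> counts = new HashMap<>();
    private long total = 0;

    // Call once per observed record key.
    void record(String key) {
        counts.merge(key, 1L, Long::sum);
        total++;
    }

    // Returns keys whose share of all observed records is at least
    // `threshold` (for example 0.3 for 30%).
    List<String> hotKeys(double threshold) {
        List<String> hot = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (total > 0 && (double) e.getValue() / total >= threshold) {
                hot.add(e.getKey());
            }
        }
        return hot;
    }
}
```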

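Example
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import org.apache.kafka.common.utils.Utils;

public static int partitionForKey(final byte[] serializedKey, final int numPartitions) {
    // Compare the key bytes (assuming UTF-8 string keys); == would compare references
    if (Arrays.equals(serializedKey, "bananas🍌🍌".getBytes(StandardCharsets.UTF_8))) {
        // ... do the dirty magic here ...
    } else {
        // All other keys are hashed over one partition fewer than the topic now has
        return Utils.toPositive(Utils.murmur2(serializedKey)) % (numPartitions - 1);
    }
}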
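
One possible shape for the hot-key branch, shown as a self-contained sketch rather than the speakers' implementation: the hot key is spread round-robin over the partitions that were added, while every other key is still hashed over the old partition count so it stays where it was. The class, the `oldPartitions` parameter and the `hash` helper are assumptions for illustration. The hash is a stand-in and is not compatible with Kafka's murmur2, and spreading one key over several partitions gives up per-key ordering for that key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

final class HotKeyPartitionerSketch {
    private final byte[] hotKey;
    private final int oldPartitions; // partition count before partitions were added
    private final AtomicInteger next = new AtomicInteger(0); // state kept by the partitioner

    HotKeyPartitionerSketch(String hotKey, int oldPartitions) {
        this.hotKey = hotKey.getBytes(StandardCharsets.UTF_8);
        this.oldPartitions = oldPartitions;
    }

    // Stand-in hash for illustration only; NOT Kafka's murmur2.
    static int hash(byte[] key) {
        return Arrays.hashCode(key) & 0x7fffffff;
    }

    int partitionForKey(byte[] serializedKey, int numPartitions) {
        int added = numPartitions - oldPartitions;
        if (added > 0 && Arrays.equals(serializedKey, hotKey)) {
            // Hot key: rotate over the newly added partitions only.
            return oldPartitions + Math.floorMod(next.getAndIncrement(), added);
        }
        // Every other key keeps the partition it had before the topic grew.
        return hash(serializedKey) % oldPartitions;
    }
}
```

In a real producer this logic would sit behind the Partitioner interface, and records of the hot key written before the change would remain in their original partition.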

Level 3. Getting hot 🌶🌶🌶
Time to migrate to a new topic

The time will come… when you need to re-create the topic
➔ Rebalance records across partitions
➔ Scale your topic up or down
➔ Do disaster recovery

Let’s re-create the topic and MIGRATE!

Three main steps
[Diagram, three frames: producers and consumers, the old topic (partitions P0, P1) and the new topic (partitions P0 to P3)]

Our goals when migrating
1. Keep downtime to a bare minimum
2. No duplicates

Two options
(in reality way more, but similar in essence)

Option 1: Stop the world producers
Sharp cut
[Diagram, four frames: the sharp cut from the old topic (partitions P0, P1) to the new topic (partitions P0 to P3), with producers and consumers moving between them]

Advantages
➔ No skipped messages
➔ Prevention of duplicates
➔ No need for extra compute to replicate data from old to new topic

Limitations
➔ Downtime
➔ Difficult to test new setup and challenging to roll back
➔ Limited time window for migration
➔ Need for seamless collaboration among teams
➔ All-or-nothing migration

Option 2: Gradual switch relying on replicated data

Time for plan B
with Olena B

Strategy
Step 1
Step 2
Step 3
Step 4
Step 5

Step 1
New topic creation

Partition 1 44%
Partition 2 30%
Partition 3 6%
Partition 4 3%
Partition 5 3%
Partition 6 3%
Partition 7 3%
Partition 8 3%

Risks
Having to redo the whole process because of too few or too many partitions

Step 2
Fast and reliable data pump

Data pump application requirements
- Simple
- Fast
- Reliable
Kafka Streams, Java

Risks
- Requires too many resources if it is not simple enough
- Cannot keep up if it is too complicated
- Data loss if the application is not reliable
- Data loss or duplicates because records from different partitions get shuffled
WARNING: Records/keys will almost certainly be mixed.

New partitions have a mix of data from old partitions

Out of order events
If consumers are stopped while the order is not correct, they will either
- Read some records one more time
OR
- Skip some records

Be careful during data pump catch-up and if you use big batches to read data

Old topic timestamps from metadata could be used to preserve chronological order
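
As a rough illustration of the timestamp idea, a data pump could sort each batch it read from several old partitions by the original record timestamp before writing it to the new topic. The sketch assumes the timestamps are trustworthy, and the `OldRecord` type and method name are hypothetical. Sorting only restores order inside one batch, so it does not help with records that arrive in a later batch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class TimestampOrdering {
    // Hypothetical view of a record read from the old topic.
    static final class OldRecord {
        final long timestampMs;
        final int partition;
        final long offset;

        OldRecord(long timestampMs, int partition, long offset) {
            this.timestampMs = timestampMs;
            this.partition = partition;
            this.offset = offset;
        }
    }

    // Returns a sorted copy of the batch: by timestamp first, then by
    // partition and offset so that ties are resolved deterministically.
    static List<OldRecord> inChronologicalOrder(List<OldRecord> batch) {
        List<OldRecord> sorted = new ArrayList<>(batch);
        Collections.sort(sorted, new Comparator<OldRecord>() {
            @Override
            public int compare(OldRecord a, OldRecord b) {
                int byTime = Long.compare(a.timestampMs, b.timestampMs);
                if (byTime != 0) {
                    return byTime;
                }
                int byPartition = Integer.compare(a.partition, b.partition);
                return byPartition != 0 ? byPartition : Long.compare(a.offset, b.offset);
            }
        });
        return sorted;
    }
}
```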

Step 3
Gradual consumer switch

Risks
- Spikes
- Too long downtime for consumers
- Data loss or duplicates

Consumer group translations

Simple Consumer Group Translation
Old Consumer Group:
Partition 0: offset 13 (last consumed event)

Timestamps:
Partition 0: 07:01:04
Partition 2: 07:01:03
Earliest timestamp: 07:01:03
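
The simple translation above boils down to taking the minimum of the timestamps of the last consumed events across the old partitions. A minimal sketch, with hypothetical names and timestamps expressed in milliseconds:

```java
import java.util.Collections;
import java.util.Map;

final class SimpleGroupTranslation {
    // Input: old partition -> timestamp (ms) of the last event the old
    // consumer group consumed from it. Output: the earliest of them.
    static long earliestTimestamp(Map<Integer, Long> lastConsumedTimestampMs) {
        if (lastConsumedTimestampMs.isEmpty()) {
            throw new IllegalArgumentException("no partitions to translate");
        }
        return Collections.min(lastConsumedTimestampMs.values());
    }
}
```

The new consumer group can then be positioned at the first offset at or after that timestamp in every new partition, for example via KafkaConsumer.offsetsForTimes. Everything between that point and what the old group had actually processed is read again, which is why this approach tends to produce duplicates.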

Streaming Consumer Group Translation

Offset Translation
Consumer Group 1:
for r in records:
    P = r.metadata.old_partition
    # offsets[P]: offset of the old consumer group for old partition P
    if offsets[P] <= r.metadata.offset:
        # r has not been consumed by the old group yet
        return
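
The loop above can be sketched in Java as follows. It assumes the data pump stored the old partition and old offset with every record in the new topic, and that the old group's committed offset is the next offset it would read. The types and names are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class OffsetTranslation {
    // Hypothetical record of the new topic carrying its old coordinates.
    static final class NewRecord {
        final long newOffset;
        final int oldPartition;
        final long oldOffset;

        NewRecord(long newOffset, int oldPartition, long oldOffset) {
            this.newOffset = newOffset;
            this.oldPartition = oldPartition;
            this.oldOffset = oldOffset;
        }
    }

    // Scans one new partition in offset order and returns the new offset of
    // the first record the old group has not consumed, or -1 if there is none.
    static long firstUnconsumed(List<NewRecord> newPartitionRecords,
                                Map<Integer, Long> oldCommitted) {
        for (NewRecord r : newPartitionRecords) {
            Long committed = oldCommitted.get(r.oldPartition);
            // No commit for that old partition means nothing was consumed from it.
            if (committed == null || committed <= r.oldOffset) {
                return r.newOffset;
            }
        }
        return -1L;
    }
}
```

Starting the new group at the returned offset loses nothing, but records from other old partitions that sit after it and were already consumed will be seen again.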

MirrorMaker offset translation
Data pump -> MirrorSourceTask
Old + new record metadata -> records in the offset syncs topic
Offset translation -> MirrorCheckpointTask
Problems:
- The main use case is data transfer between two clusters, not within the same cluster
- Until version 3.3, offsets were translated by measuring the 'distance' between the MM2 offset sync and the upstream consumer group, and then assuming that the same distance applies in the downstream topic.
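
The distance-based estimate described in the last bullet is plain arithmetic. The sketch below is not MirrorMaker code; the class and parameter names are hypothetical and only restate the rule.

```java
final class DistanceTranslation {
    // syncUpstream / syncDownstream: one known pair of matching offsets
    // (an offset sync). groupUpstream: the consumer group's upstream offset.
    static long translate(long syncUpstream, long syncDownstream, long groupUpstream) {
        long distance = groupUpstream - syncUpstream;
        // Assumes the downstream topic has exactly the same distance.
        return syncDownstream + distance;
    }
}
```

For a sync pair (100 -> 40) and an upstream group offset of 110 the estimate is 50. The estimate is only right when offsets advance one-to-one on both sides, which does not hold once records from several old partitions are mixed into one new partition.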

Risks
- Spikes
- Too long downtime for consumers
- Data loss or duplicates
- Poor offset estimations
- Bad timing for offset translation

Bad timing for Offset Translation
Either
Start from 32: duplicate B1, B2
OR
Start from 35: A2 is lost
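
One way to avoid choosing between the two outcomes is to start from the earlier offset and drop every record the old group had already processed. The sketch assumes the data pump kept the old partition and old offset with each record. The names are hypothetical, and the slides do not say whether this is how the speakers track late events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class AlreadyConsumedFilter {
    // Hypothetical record of the new topic carrying its old coordinates.
    static final class NewRecord {
        final String name;
        final int oldPartition;
        final long oldOffset;

        NewRecord(String name, int oldPartition, long oldOffset) {
            this.name = name;
            this.oldPartition = oldPartition;
            this.oldOffset = oldOffset;
        }
    }

    // Keeps only records at or beyond the old group's committed offset
    // (the next offset it would have read) for their old partition.
    static List<NewRecord> unconsumedOnly(List<NewRecord> records,
                                          Map<Integer, Long> oldCommitted) {
        List<NewRecord> out = new ArrayList<>();
        for (NewRecord r : records) {
            Long committed = oldCommitted.get(r.oldPartition);
            if (committed == null || r.oldOffset >= committed) {
                out.add(r);
            }
        }
        return out;
    }
}
```

With a layout consistent with the slide (A2 followed by B1 and B2, an assumed order), the filter passes A2 through and drops B1 and B2.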

“Bounded” stream

Gradual consumer switch
Earliest offset -> Duplicates guaranteed
Consumer’s earliest timestamp -> High probability of duplicates
Offset translation -> A few duplicates
Offset translation + “late events” tracking -> Almost no duplicates
Offset translation + “late events” tracking + “Bounded” stream approach -> Rare/no duplicates

More topics to talk about
- Apache Kafka MirrorMaker implementation details
- Stateless vs stateful consumers
- Idempotence
- Changing schemas
- New key selection strategy

Step 4
Gradual producers switch

Risks
- Data loss or duplicates if the data pump is not fast enough

To summarize it all

Key learnings
● No keys - add partitions 🌶
● A few hot keys - you still can add partitions 🌶🌶
● Workarounds are not sufficient? - Migrate the topic 🌶🌶🌶

Migrate the topic 🌶🌶🌶
● Sharp cut - stop the producers first
○ Exactly once delivery
○ Expect downtime
○ All-or-nothing migration
● Generic gradual switch
○ Minimal downtime
○ Possibility to test before switching
○ Switch consumer groups gradually
○ Minimize chance of duplicates
