
More Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedIn


For several years, LinkedIn has been using Kafka MirrorMaker as the mirroring solution for copying data between Kafka clusters across data centers. However, as LinkedIn's data continued to grow, mirroring trillions of Kafka messages per day across data centers uncovered the scale limitations and operability challenges of Kafka MirrorMaker. To address these, we have developed a new mirroring solution, built on top of our stream ingestion service, Brooklin. Brooklin's mirroring solution aims to provide improved performance and stability, while facilitating better management via finer control of data pipelines. Through flushless Kafka produce, dynamic management of data pipelines, and per-partition error handling and flow control, we are able to increase throughput, better withstand consume and produce failures, and reduce overall operating costs. As a result, we have eliminated the major pain points of Kafka MirrorMaker.

In this talk, we will dive deeper into the challenges LinkedIn has faced with Kafka MirrorMaker, how we tackled them with Brooklin, and our plans for iterating further on this new mirroring solution.



  1. 1. More Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedIn Celia Kung Engineering Manager
  2. 2. Agenda Use Cases & Motivation Kafka MirrorMaker at LinkedIn Brooklin Mirroring Future
  3. 3. Use Cases & Motivation
  4. 4. Use Cases • Aggregating data from all data centers • Moving data from offline data stores into online environments • Moving data between LinkedIn and external cloud services
  5. 5. Motivation ● Kafka data at LinkedIn continues to grow rapidly
  6. 6. Motivation ● Kafka MirrorMaker (KMM) has not scaled well ● KMM is difficult to operate and maintain
  7. 7. Kafka MirrorMaker at LinkedIn
  8. 8. Kafka MirrorMaker at LinkedIn: 100+ pipelines, 9 data centers
  9. 9. Kafka MirrorMaker at LinkedIn: 100+ clusters, 6K+ hosts, 2T+ messages/day
  10. 10. Topology [diagram: in Datacenters A and B, tracking clusters are mirrored into aggregate-tracking clusters by KMM instances] • Each pipeline: ○ mirrors data from 1 source cluster to 1 destination cluster ○ constitutes its own KMM cluster
  11. 11. Topology [diagram: Datacenters A, B, C each host tracking, aggregate-tracking, metrics, and aggregate-metrics clusters, with a separate KMM cluster for every source-destination pipeline]
  12. 12. KMM does not scale well ● # of KMM clusters = (# of data centers)² x # of Kafka clusters ● More consumer-producer pairs → need to provision more hardware
  13. 13. KMM is difficult to operate ● Static configuration file per KMM cluster ● Changes require deploying to 100+ clusters
  14. 14. Topology [diagram: Datacenters A, B, C each host tracking, aggregate-tracking, metrics, and aggregate-metrics clusters, with a separate KMM cluster for every source-destination pipeline]
  15. 15. KMM is fragile ● Poor failure isolation ● Increased latency ● Unable to catch up with traffic
  16. 16. Brooklin Mirroring
  17. 17. Brooklin Mirroring ● Optimized for stability and operability ● Built on top of our streaming data pipelines service, Brooklin ● Brooklin Kafka mirroring has been in production for 1+ years ● Open-sourced Brooklin last month
  18. 18. Mirroring pipelines at LinkedIn: 100+ pipelines, 9 data centers
  19. 19. Kafka MirrorMaker: 100+ clusters, 6K+ hosts, 2T+ messages/day
  20. 20. Kafka MirrorMaker: 100+ clusters, 6K+ hosts, 2T+ messages/day
  21. 21. Brooklin Mirroring: 9 clusters, <2K hosts, 2T+ messages/day
  22. 22. Topology [diagram: in Datacenters A and B, a single Brooklin cluster mirrors tracking into aggregate-tracking] • A single Brooklin cluster encompasses multiple pipelines ○ 1 cluster per data center
  23. 23. Topology [diagram: Datacenters A, B, C each run one Brooklin cluster that handles all tracking and metrics mirroring pipelines]
  24. 24. Brooklin Mirroring Architecture
  25. 25. Kafka mirroring built on Brooklin [diagram: Brooklin sits between sources and destinations, which include messaging systems, Microsoft EventHubs, and databases]
  26. 26. Kafka mirroring built on Brooklin [same sources and destinations diagram]
  27. 27. Kafka mirroring built on Brooklin
  28. 28. Kafka mirroring built on Brooklin [architecture: Brooklin Engine with Kafka src connector and Kafka dest connector, Management Rest API, Diagnostics Rest API, ZooKeeper, management/monitoring portal, SRE/op dashboards]
  29. 29. Kafka mirroring built on Brooklin [same architecture diagram]
  30. 30. Dynamic Management [same architecture diagram]
  31. 31. Creating a pipeline (Brooklin Engine, Management Rest API, ZooKeeper)
      create: POST /datastream
        name: mm_DC1-tracking_DC2-aggregate-tracking
        connectorName: KafkaMirrorMaker
        source:
          connectionString: kafkassl://DC1-tracking-vip:12345/topicA|topicB
        destination:
          connectionString: kafkassl://DC2-aggregate-tracking-vip:12345
        metadata:
          num-streams: 5
  32. 32. Creating a pipeline [diagram: ZooKeeper coordinating a set of Brooklin hosts]
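As a hedged illustration only: issuing the create request shown on slide 31 from a plain Java client could look roughly like the sketch below. The brooklin-host name, the port, and the JSON rendering of the datastream fields are assumptions made for this example (Brooklin's Management API is Rest.li based, so the exact wire format may differ), and authentication is omitted.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CreateMirroringDatastream {
        public static void main(String[] args) throws Exception {
            // Datastream definition from slide 31, rendered as JSON (field layout is an assumption).
            String body = """
                {
                  "name": "mm_DC1-tracking_DC2-aggregate-tracking",
                  "connectorName": "KafkaMirrorMaker",
                  "source": {"connectionString": "kafkassl://DC1-tracking-vip:12345/topicA|topicB"},
                  "destination": {"connectionString": "kafkassl://DC2-aggregate-tracking-vip:12345"},
                  "metadata": {"num-streams": "5"}
                }
                """;

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://brooklin-host:32311/datastream"))  // placeholder host and port
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }

The update on slide 33 follows the same pattern with a PUT to the datastream's path and the expanded topic list.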
  33. 33. Updating a pipeline (Brooklin Engine, Management Rest API, ZooKeeper)
      update: PUT /datastream/mm_DC1-tracking_DC2-aggregate-tracking
        name: mm_DC1-tracking_DC2-aggregate-tracking
        connectorName: KafkaMirrorMaker
        source:
          connectionString: kafkassl://DC1-tracking-vip:12345/topicA|topicB|topicC|topicD
        destination:
          connectionString: kafkassl://DC2-aggregate-tracking-vip:12345
        metadata:
          num-streams: 10
  34. 34. Updating a pipeline [diagram: ZooKeeper coordinating a set of Brooklin hosts]
  35. 35. Dynamic Management [same architecture diagram]
  36. 36. On-demand Diagnostics (Brooklin Engine, Diagnostics Rest API, ZooKeeper)
      getAllStatus: GET /diag?datastream=mm_DC1-tracking_DC2-aggregate-tracking
        host1.prod.linkedin.com:
          datastream: mm_DC1-tracking_DC2-aggregate-tracking
          assignedTopicPartitions: [topicA-0, topicA-3, topicB-0, topicB-2]
          autoPausedPartitions: [{topicA-3: {reason: SEND_ERROR, description: failed to produce messages from this partition}}]
          manuallyPausedPartitions: []
        host2.prod.linkedin.com:
          datastream: mm_DC1-tracking_DC2-aggregate-tracking
          assignedTopicPartitions: [topicA-1, topicA-2, topicB-1, topicB-3]
          autoPausedPartitions: []
          manuallyPausedPartitions: []
  37. 37. Error Isolation ● Manually pause and resume mirroring at every level ○ Entire pipeline, topic, topic-partition ● Brooklin can automatically pause mirroring of partitions ○ Auto-resumes the partitions after a configurable duration ● Flow of messages from other partitions continues
  38. 38. Processing Loop
      while (!shutdown) {
        records = consumer.poll();
        producer.send(records);
        if (timeToCommit) {
          producer.flush();
          consumer.commit();
        }
      }
  39. 39. Producer flush can be expensive
      while (!shutdown) {
        records = consumer.poll();
        producer.send(records);
        if (timeToCommit) {
          producer.flush();
          consumer.commit();
        }
      }
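Spelled out with the Kafka Java client, the loop on slides 38-39 looks roughly like the sketch below. The topic name, bootstrap servers, and the fixed 60-second commit interval are illustrative assumptions, not LinkedIn's configuration; the relevant behavior is that producer.flush() blocks until every buffered record is acknowledged by the destination cluster, so one slow or unavailable destination partition stalls the entire consumer.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class FlushingMirrorLoop {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "source-kafka:9092");       // placeholder
            consumerProps.put("group.id", "mirror-example");
            consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "destination-kafka:9092");  // placeholder
            producerProps.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
            producerProps.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(producerProps)) {
                consumer.subscribe(List.of("topicA"));
                long lastCommit = System.currentTimeMillis();
                while (true) {  // a real mirror would check a shutdown flag here
                    ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<byte[], byte[]> r : records) {
                        // Re-produce each record to the same topic name on the destination cluster.
                        producer.send(new ProducerRecord<>(r.topic(), r.key(), r.value()));
                    }
                    if (System.currentTimeMillis() - lastCommit > 60_000) {
                        producer.flush();       // blocks until every in-flight send is acknowledged
                        consumer.commitSync();  // only then is it safe to commit consumed offsets
                        lastCommit = System.currentTimeMillis();
                    }
                }
            }
        }
    }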
  40. 40. Long Flush [chart] producer.flush() can take several minutes
  41. 41. Rebalance Storms [chart] consumers rebalance after max.poll.interval.ms
  42. 42. Increase max.poll.interval.ms? ● Reduces chances of consumer rebalance ● Risks detecting real failures late
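For reference, raising that limit is a single consumer setting. A minimal sketch, with the 10-minute value chosen purely for illustration rather than as a recommendation:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;

    public class MirrorConsumerConfig {
        // Returns consumer properties with a raised max.poll.interval.ms.
        // The default is 300000 ms (5 minutes).
        public static Properties withLongerPollInterval(Properties base) {
            Properties props = new Properties();
            props.putAll(base);
            // More headroom before the group coordinator evicts a slow consumer and
            // triggers a rebalance, at the cost of detecting genuinely stuck instances later.
            props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "600000");
            return props;
        }
    }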
  43. 43. Flushless Produce consumer.poll() → producer.send(records) → producer.flush() → consumer.commit()
  44. 44. Flushless Produce Only commit “safe” acknowledged checkpoints: consumer.poll() → producer.send(records) → consumer.commit(offsets)
  45. 45. Flushless Produce [diagram: consumer polls offsets o1, o2 from source partition sp0; producer sends them to destination partitions dp0, dp1 and reports ack(sp0, o2) to the checkpoint manager] ● Checkpoint manager maintains producer-acknowledged offsets for each source partition ● Source partition sp0: in-flight: [], acked: [], safe checkpoint: --
  46. 46. Flushless Produce ● Source partition sp0: in-flight: [o1, o2], acked: [], safe checkpoint: --
  47. 47. Flushless Produce ● Source partition sp0: in-flight: [o1, o2], acked: [], safe checkpoint: --
  48. 48. Flushless Produce ● Source partition sp0: in-flight: [o1, o2], acked: [], safe checkpoint: --
  49. 49. Flushless Produce ● Source partition sp0: in-flight: [o1, o2], acked: [], safe checkpoint: --
  50. 50. Flushless Produce ● Source partition sp0: in-flight: [o1], acked: [o2], safe checkpoint: --
  51. 51. Flushless Produce ● Source partition sp0: in-flight: [o1], acked: [o2], safe checkpoint: --
  52. 52. Flushless Produce [diagram: consumer polls offsets o3, o4 from sp0; producer reports ack(sp0, o1)] ● Update safe checkpoint to largest acknowledged offset that is less than oldest in-flight (if any) ● Source partition sp0: in-flight: [o3, o4], acked: [o1, o2], safe checkpoint:
  53. 53. Flushless Produce ● Source partition sp0: in-flight: [o3, o4], acked: [], safe checkpoint: o2
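A minimal Java sketch of this bookkeeping is shown below. The class and method names are hypothetical, not Brooklin's actual implementation; it only encodes the rule from slides 52-53: the safe checkpoint is the largest producer-acknowledged offset that is smaller than the oldest in-flight offset for that source partition.

    import java.util.Map;
    import java.util.TreeSet;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.kafka.common.TopicPartition;

    // Illustrative per-source-partition bookkeeping; names are hypothetical, not Brooklin's code.
    public class CheckpointManager {
        private static class PartitionState {
            final TreeSet<Long> inFlight = new TreeSet<>();  // sent to producer, not yet acknowledged
            final TreeSet<Long> acked = new TreeSet<>();     // acknowledged by the destination cluster
            Long safeCheckpoint = null;                      // highest offset that is safe to commit
        }

        private final Map<TopicPartition, PartitionState> states = new ConcurrentHashMap<>();

        // Called just before producer.send() for a record consumed from source partition sp.
        public synchronized void onSend(TopicPartition sp, long offset) {
            states.computeIfAbsent(sp, k -> new PartitionState()).inFlight.add(offset);
        }

        // Called from the producer's send callback once the record is acknowledged.
        public synchronized void onAck(TopicPartition sp, long offset) {
            PartitionState s = states.get(sp);
            if (s == null) {
                return;
            }
            s.inFlight.remove(offset);
            s.acked.add(offset);
            // Safe checkpoint = largest acked offset below the oldest in-flight offset
            // (or simply the largest acked offset if nothing is in flight).
            long bound = s.inFlight.isEmpty() ? Long.MAX_VALUE : s.inFlight.first();
            Long candidate = s.acked.lower(bound);
            if (candidate != null) {
                s.safeCheckpoint = candidate;
                s.acked.headSet(candidate, true).clear();  // offsets at or below the checkpoint are no longer needed
            }
        }

        // The commit path reads this per partition instead of relying on producer.flush().
        public synchronized Long safeCheckpoint(TopicPartition sp) {
            PartitionState s = states.get(sp);
            return s == null ? null : s.safeCheckpoint;
        }
    }

In such a loop, onSend() would be invoked just before producer.send(), onAck() from the producer's send callback, and the periodic commit would pass safeCheckpoint + 1 per source partition to consumer.commitSync(), with no producer.flush() required.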
  54. 54. Brooklin Mirroring Performance ● Use same consumer/producer configs as KMM ● Single host: 64 GB memory, 24 CPU (12 cores each) ● Metrics: ○ Throughput (output compressed bytes/sec) ○ Memory utilization % ○ CPU utilization %
  55. 55. Throughput
  56. 56. Memory Utilization
  57. 57. CPU Utilization
  58. 58. Brooklin Mirroring Performance ● Brooklin mirroring is CPU-bound ● Metrics (20 consumer-producer pairs): ○ Throughput: up to 28 MB/s ○ Memory utilization: 70% ○ CPU utilization: 97%
  59. 59. Performance • 70%+ CPU time spent in decompression & re-compression ○ GZIPInputStream.read(): ~10% ○ GZIPOutputStream.write(): ~61% • Introduced “Passthrough” mirroring
  60. 60. Future
  61. 61. Stability • Rebalances cause drop in availability • Brooklin-controlled partition assignment & Kafka low-level consumer
  62. 62. Scalability • Auto-scaling: adjust number of consumers based on throughput needs • Smarter (throughput-based) partition assignment
  63. 63. Thank you
