
Kafka Streams at Scale (Deepak Goyal, Walmart Labs) Kafka Summit NYC 2019


Walmart.com generates millions of events per second. At WalmartLabs, I work on a team called the Customer Backbone (CBB), where we wanted to upgrade to a platform capable of processing this event volume in real time and storing the state/knowledge of potentially every Walmart customer that this processing generates. Kafka Streams' event-driven architecture seemed like the obvious choice.

However, Walmart's scale poses a few challenges:
• the clusters need to be large, with all the operational problems that entails;
• changelog topics are retained indefinitely, wasting valuable disk space;
• stand-by task recovery after a node failure is slow (changelog topics hold GBs of data);
• Kafka Streams has no support for repartitioning.

As part of this event-driven development, and to address the challenges above, I'm going to talk about some bold new ideas we developed as features/patches to Kafka Streams to deal with the scale required at Walmart.
• Cold Bootstrap: when a Kafka Streams node fails, we bootstrap the standby from the active's RocksDB store (transferred via JSch) instead of recovering from the changelog topic, with careful offset management ensuring zero event loss.
• Dynamic Repartitioning: we added support for repartitioning in Kafka Streams, with existing state distributed among the new partitions. We can now elastically scale to any number of partitions and any number of nodes.
• Cloud/Rack/AZ-aware task assignment: no active and standby tasks of the same partition are assigned to the same rack.
• Decreased partition-assignment size: in large clusters like ours (>400 nodes with 3 stream threads each), the partition assignment of the Kafka Streams cluster runs to a few hundred MBs, so rebalances take a long time to settle.
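The zero-event-loss guarantee of cold bootstrap rests on one rule: checkpoint the changelog offset at the moment the RocksDB files are snapshotted, then replay only what the changelog received after that offset. A minimal sketch of that rule (class and method names are hypothetical, not from the actual patch):

```java
// Minimal sketch of the offset rule behind cold bootstrap's zero event loss.
// Records up to the checkpointed offset are already in the copied RocksDB
// files; only the tail of the changelog needs to be replayed afterwards.
public class ColdBootstrapOffsets {
    /** Offset the new stand-by resumes changelog consumption from. */
    public static long resumeOffset(long checkpointedAtSnapshot) {
        // Everything up to and including the checkpoint is in the copied
        // files; replay starts at the next offset, so nothing is applied twice.
        return checkpointedAtSnapshot + 1;
    }

    /** Changelog records still to replay once the file copy finishes. */
    public static long replayBacklog(long checkpointedAtSnapshot, long logEndOffset) {
        return Math.max(0, logEndOffset - resumeOffset(checkpointedAtSnapshot));
    }

    public static void main(String[] args) {
        long checkpoint = 41_999L;  // offset checkpointed with the snapshot
        long logEnd = 42_100L;      // changelog end offset after the copy
        System.out.println("resume at " + resumeOffset(checkpoint)
                + ", replay " + replayBacklog(checkpoint, logEnd) + " records");
    }
}
```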

Key Takeaways:
• A basic understanding of Kafka Streams.
• Productionizing Kafka Streams at scale.
• Using Kafka Streams as a distributed NoSQL database.

Published in: Technology


  1. Processing multi-million events per second (Walmart)
  2. Kafka Streams at Scale. Deepak Goyal, Customer Backbone, Walmart Labs
  3. Kafka Streams' Challenges: 1. Fault Recovery, 2. Horizontal Scalability, 3. Cloud Readiness, 4. RocksDB, 5. Large Clusters. @walmartlabs
  4. Kafka Streams Instance (diagram): each app instance runs several Kafka Streams tasks, and every task has its own consumer, processor, RocksDB store, producer, and Akka server.
  5. Event Flow (diagram): each input-topic partition feeds an active task, which writes to its RocksDB store and to the matching change-log topic partition; two stand-by tasks (task-0' and task-0'') maintain their own RocksDB copies.
  6. Challenges: 1. Fault Recovery, 2. Horizontal Scalability, 3. Cloud Readiness, 4. RocksDB, 5. Large Clusters.
  7. Default Bootstrap (diagram): on stand-by recovery, a new stand-by replays the change-log topic, the source of truth, into an empty RocksDB store.
  8. Default Bootstrap, with the change-log topic as the source of truth: slow stand-by recovery; log-compacted change-log topics; inefficient disk usage.
  9. Cold Bootstrap (diagram): enhanced stand-by recovery copies the active task's RocksDB store, the new source of truth, to the new stand-by instead of replaying the change-log topic.
  10. Cold Bootstrap, with the active's store as the source of truth: lightning-fast stand-by recovery; efficient disk usage; bootstrap across data centers.
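The copy step of cold bootstrap can be sketched locally. The talk transfers the active's RocksDB files over SSH with JSch; in this sketch plain java.nio file copies stand in for that transfer, and the changelog offset captured with the snapshot is persisted next to the copied files (all names and the `.checkpoint` file format are hypothetical):

```java
// Local stand-in for cold bootstrap: copy the active task's store directory
// to the new stand-by and record the snapshot's changelog offset, so replay
// can resume exactly where the copied files leave off.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class ColdBootstrap {
    /** Copy the active's store directory and record the snapshot offset. */
    public static void bootstrap(Path activeStore, Path standbyStore,
                                 long snapshotOffset) throws IOException {
        try (Stream<Path> files = Files.walk(activeStore)) {
            for (Path src : (Iterable<Path>) files::iterator) {
                Path dst = standbyStore.resolve(activeStore.relativize(src).toString());
                if (Files.isDirectory(src)) Files.createDirectories(dst);
                else Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        // The checkpoint tells the stand-by where changelog replay resumes.
        Files.write(standbyStore.resolve(".checkpoint"),
                Long.toString(snapshotOffset).getBytes(StandardCharsets.UTF_8));
    }

    /** Read back the checkpointed offset on the stand-by side. */
    public static long checkpointOf(Path standbyStore) throws IOException {
        return Long.parseLong(new String(
                Files.readAllBytes(standbyStore.resolve(".checkpoint")),
                StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path active = Files.createTempDirectory("active-rocksdb");
        Path standby = Files.createTempDirectory("standby-rocksdb");
        Files.write(active.resolve("000001.sst"),
                "kv-data".getBytes(StandardCharsets.UTF_8));
        bootstrap(active, standby, 42_000L);
        System.out.println("checkpoint = " + checkpointOf(standby));
    }
}
```

In the real system the copy crosses machines (and even data centers), but the invariant is the same: files and checkpoint travel together.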
  11. Challenges: 1. Fault Recovery, 2. Horizontal Scalability, 3. Cloud Readiness, 4. RocksDB, 5. Large Clusters.
  12. Repartitioning Logic (diagram): with 2 partitions, keys hash as key % 2 (partition-0: 0,2,4,6,8,10; partition-1: 1,3,5,7,9,11); after scaling to 4 partitions they redistribute as key % 4 (partition-0: 0,4,8; partition-1: 1,5,9; partition-2: 2,6,10; partition-3: 3,7,11).
  13. Dynamic Repartitioning (diagram): scaling up from 2 to 4 partitions; a future stand-by seeded from task-0's RocksDB becomes the new active task-2, and a new stand-by task-2' is created with an empty store.
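The repartitioning invariant on slide 12 can be checked in a few lines. With hash-mod partitioning, doubling the partition count from n to 2n means a key in old partition p can only land in new partition p or p + n, so each old partition's state splits cleanly into exactly two new partitions (a hedged sketch; names are illustrative, not from the patch):

```java
// Demonstrates the key-redistribution math behind dynamic repartitioning:
// when the partition count doubles, keys from old partition p move only to
// new partitions p and p + oldCount.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class RepartitionMath {
    public static int partitionFor(int key, int numPartitions) {
        return Math.floorMod(key, numPartitions);
    }

    /** New partitions that can receive keys from old partition p on doubling. */
    public static Set<Integer> splitTargets(int p, int oldCount) {
        return new TreeSet<>(Arrays.asList(p, p + oldCount));
    }

    public static void main(String[] args) {
        int newCount = 4;  // scaling up from 2 partitions, as on slide 12
        Map<Integer, List<Integer>> layout = new TreeMap<>();
        for (int key = 0; key <= 11; key++) {
            layout.computeIfAbsent(partitionFor(key, newCount),
                    p -> new ArrayList<>()).add(key);
        }
        System.out.println(layout);
        // {0=[0, 4, 8], 1=[1, 5, 9], 2=[2, 6, 10], 3=[3, 7, 11]}
    }
}
```

This is why the future stand-by on slide 13 can be seeded from task-0's store: new partitions 0 and 2 together hold exactly old partition 0's keys.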
  14. Scaling Lookups: queryable stand-bys; Akka server (non-blocking IO); partition-specific lookups.
  15. Challenges: 1. Fault Recovery, 2. Horizontal Scalability, 3. Cloud Readiness, 4. RocksDB, 5. Large Clusters.
  16. AZ/Rack-Aware Task Assignment (diagram): the active and two stand-by tasks of each partition are spread across AZ1, AZ2, and AZ3; StickyTaskAssignor using RACK_ID_CONFIG = "rack.id".
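The placement rule behind the patched StickyTaskAssignor is that no stand-by of a partition may share a rack/AZ with that partition's active task. A toy sketch of just that constraint (the real change lives inside Kafka Streams' assignor and reads the patched "rack.id" config; everything here is an illustrative stand-in):

```java
// Toy rack-aware placement: pick a stand-by rack different from the active's.
// The actual Walmart patch enforces this inside StickyTaskAssignor using a
// rack.id config supplied by each instance.
import java.util.Arrays;
import java.util.List;

public class RackAwareAssignment {
    /** First rack that differs from the active's; fails if none exists. */
    public static String standbyRack(String activeRack, List<String> racks) {
        for (String rack : racks) {
            if (!rack.equals(activeRack)) return rack;
        }
        throw new IllegalStateException("need at least two racks for stand-bys");
    }

    public static void main(String[] args) {
        List<String> azs = Arrays.asList("AZ1", "AZ2", "AZ3");
        String active = "AZ2";
        System.out.println("active in " + active
                + ", stand-by in " + standbyRack(active, azs));
    }
}
```

With three AZs and two stand-bys per partition, each replica lands in its own AZ, which is what makes the maintenance failover on slide 17 safe.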
  17. Cloud Maintenance (diagram): StickyTaskAssignor with ClusterSizeThreshold >= 4; when AZ2 is taken down, its active task-1 fails over to the stand-by task-1'' in another AZ, and AZ2's stand-bys are restored when it returns.
  18. Challenges: 1. Fault Recovery, 2. Horizontal Scalability, 3. Cloud Readiness, 4. RocksDB, 5. Large Clusters.
  19. Enhancements to Streams' RocksDB: eliminated synchronized GETs; column-family support; queryable in suspended and restoration states.
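"Eliminated synchronized GETs" can be illustrated with a read-write lock: many lookup threads read concurrently while writes still take the exclusive lock. This is a hedged sketch of the idea only; the in-memory map stands in for the RocksDB store wrapper, and the actual Streams patch may differ:

```java
// Sketch of lock-striping away synchronized GETs: a ReadWriteLock allows
// parallel readers, unlike a fully synchronized store wrapper where every
// GET serializes behind every other call.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ConcurrentReadStore {
    private final Map<String, String> store = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public String get(String key) {
        lock.readLock().lock();       // shared: many GETs proceed in parallel
        try { return store.get(key); }
        finally { lock.readLock().unlock(); }
    }

    public void put(String key, String value) {
        lock.writeLock().lock();      // exclusive: writers serialize
        try { store.put(key, value); }
        finally { lock.writeLock().unlock(); }
    }

    public static void main(String[] args) {
        ConcurrentReadStore s = new ConcurrentReadStore();
        s.put("customer-1", "profile");
        System.out.println(s.get("customer-1"));
    }
}
```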
  20. Challenges: 1. Fault Recovery, 2. Horizontal Scalability, 3. Cloud Readiness, 4. RocksDB, 5. Large Clusters.
  21. Large Clusters: rebalance time; bottleneck is the group-leader broker; partition-assignment info shrunk via compression and better encoding, 24x smaller.
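Why compression helps so much here: assignment metadata is extremely repetitive, so even generic gzip collapses it dramatically (the talk pairs compression with a denser binary encoding for the full 24x). A small illustration over a fabricated, hypothetical text encoding of an assignment:

```java
// Illustrates shrinking partition-assignment metadata: gzip a repetitive,
// verbose text encoding (one line per stream thread, as in a >400-node,
// 3-threads-per-node cluster) and compare sizes.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class AssignmentCompression {
    public static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Fake assignment info: one verbose line per stream thread.
        StringBuilder sb = new StringBuilder();
        for (int node = 0; node < 400; node++)
            for (int thread = 0; thread < 3; thread++)
                sb.append("node-").append(node).append("/thread-").append(thread)
                  .append(" -> task-").append((node * 3 + thread) % 1024).append('\n');
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes");
    }
}
```

Smaller assignments mean less data funneled through the group-leader broker, which is the rebalance bottleneck the slide names.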
  22. Broker Configs, overriding broker defaults: message.max.bytes, offsets.load.buffer.size, replica.fetch.max.bytes, socket.request.max.bytes.
  23. Streams Configs, overriding streams defaults: acks, linger.ms, auto.offset.reset, retries, state.cleanup.delay.ms.
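The overridden Streams-side configs from slide 23 as a Properties sketch. The keys are real Kafka configs; the values here are illustrative assumptions only, not Walmart's actual settings:

```java
// Streams/producer overrides named on slide 23. Keys are real Kafka config
// names; every value below is an assumed example, not the talk's setting.
import java.util.Properties;

public class StreamsOverrides {
    public static Properties overrides() {
        Properties p = new Properties();
        p.setProperty("acks", "all");                       // producer durability
        p.setProperty("linger.ms", "100");                  // batch for throughput
        p.setProperty("auto.offset.reset", "earliest");     // where new consumers start
        p.setProperty("retries", "2147483647");             // ride out transient errors
        p.setProperty("state.cleanup.delay.ms", "3600000"); // keep local state longer
        return p;
    }

    public static void main(String[] args) {
        overrides().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In a real app these would be passed to StreamsConfig (producer options via the producer prefix) rather than used as a bare Properties object.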
  24. Results, staging benchmark: Kafka cluster of 17 XL VMs; Streams cluster of 100 M VMs; processing rate of 2.3 million events per second.
  25. Up Next: feature extraction and model inferencing; cold bootstrap from other stand-bys; cold bootstrap and repartitioning for the DSL; TTL support for RocksDB; merge operator for RocksJava; chaining Kafka Streams apps; multi-tenancy.
  26. Keep streaming. We are hiring! @deepak-iiit, Walmart
