(Gwen Shapira + Matthias J. Sax, Confluent) Kafka Summit SF 2018
Kafka Streams, Apache Kafka’s stream processing library, allows developers to build sophisticated stateful stream processing applications that can be deployed in an environment of your choice. Kafka Streams is not only scalable but fully elastic, allowing for dynamic scale-in and scale-out, because the library handles state migration transparently in the background. By running Kafka Streams applications on Kubernetes, you can use Kubernetes’ powerful control plane to standardize and simplify application management, from deployment to dynamic scaling.
In this technical deep dive, we’ll explain the internals of dynamic scaling and state migration in Kafka Streams. We’ll then show, with a live demo, how a Kafka Streams application can run in a Docker container on Kubernetes and how it can be scaled dynamically there.
7. Partitions, Tasks, and Consumer Groups
[Diagram: tasks reading from the input topic and writing to the result topic]
● 4 input topic partitions => 4 tasks
● Each task executes the processor topology
● One consumer group: can be executed with 1 to 4 threads on 1 to 4 machines
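The partition-to-task arithmetic on this slide can be modeled in a few lines. The sketch below is a simplified illustration, not the Kafka Streams assignor: it creates one task per input partition and spreads the tasks round-robin over all available threads, whereas the real assignment is also sticky and state-aware. The function name, the `task-N` labels, and the `pod-N` instance names are hypothetical.

```python
from collections import defaultdict


def assign_tasks(num_partitions, instances):
    """Spread one task per input partition round-robin over all threads.

    `instances` maps a (hypothetical) instance name to its thread count.
    Simplified model only: the real Kafka Streams assignment also tries
    to keep tasks where their local state already lives.
    """
    # One task per input topic partition, as on the slide.
    tasks = [f"task-{p}" for p in range(num_partitions)]
    # Flatten all instances into individual (instance, thread index) slots.
    threads = [(name, t) for name, count in instances.items() for t in range(count)]
    assignment = defaultdict(list)
    for i, task in enumerate(tasks):
        assignment[threads[i % len(threads)]].append(task)
    return dict(assignment)


# 4 partitions on a single one-thread instance: that thread runs all 4 tasks.
single = assign_tasks(4, {"pod-0": 1})
# Scaled out to 4 one-thread instances: one task per thread.
scaled = assign_tasks(4, {f"pod-{i}": 1 for i in range(4)})
# A fifth thread gets no task, so it does not appear in the result.
over = assign_tasks(4, {f"pod-{i}": 1 for i in range(5)})
```

The last case shows why the slide caps the range at 4: with 4 tasks, any thread beyond the fourth has nothing to run.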
32. But I’ll want to scale out and back anyway. Besides, I don’t really trust my storage admin.
33. Recommendations:
● Keep changelog shards small
● If you trust your storage: use StatefulSets
● Use anti-affinity when possible
● Use “parallel” pod management
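As a rough illustration of how these recommendations could map onto a Kubernetes manifest, the sketch below builds a StatefulSet as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The field names are standard Kubernetes API fields; the app name, image, storage size, and mount path are assumptions made for the example and do not come from the talk.

```python
import json


def streams_statefulset(name, image, replicas, storage="10Gi"):
    """Build a StatefulSet manifest for a Kafka Streams app (sketch only)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # A StatefulSet expects a headless Service with this name.
            "serviceName": name,
            "replicas": replicas,
            # "Parallel" starts and stops pods together, not one by one.
            "podManagementPolicy": "Parallel",
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": {
                        "podAntiAffinity": {
                            # "preferred" = anti-affinity when possible:
                            # spread pods across nodes, but still schedule
                            # them if that cannot be satisfied.
                            "preferredDuringSchedulingIgnoredDuringExecution": [{
                                "weight": 100,
                                "podAffinityTerm": {
                                    "labelSelector": {"matchLabels": labels},
                                    "topologyKey": "kubernetes.io/hostname",
                                },
                            }]
                        }
                    },
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Assumed path; the app's state.dir setting would
                        # have to point at this mount.
                        "volumeMounts": [{"name": "state",
                                          "mountPath": "/var/lib/kafka-streams"}],
                    }],
                },
            },
            # One persistent volume claim per pod, kept across restarts.
            "volumeClaimTemplates": [{
                "metadata": {"name": "state"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


# Hypothetical app name and image, for illustration only.
manifest = streams_statefulset("streams-app", "example.com/streams-app:latest", replicas=3)
print(json.dumps(manifest, indent=2))
```

Scaling out or back then amounts to changing `replicas`, for example with `kubectl scale statefulset streams-app --replicas=4`.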
37. Automate Deployment and Management of Apache Kafka®
Confluent Operator enables you to:
● Automate provisioning of Kafka pods in minutes
● Monitor SLAs through Confluent Control Center or Prometheus
● Scale your Kafka clusters elastically
● Operate at scale with enterprise support from Confluent
Want to learn more about running Kafka on Kubernetes?
confluent.io/kubernetes
39. Summary
• Kafka Streams has recoverable state, which gives streams apps easy elasticity and high availability
• Kubernetes makes it easy to scale applications
• It also has StatefulSets for applications with state
• Now you know how to deploy Kafka Streams on Kubernetes and take advantage of all its scalability and high-availability capabilities