This document discusses how Pulsar operators can be used to automate lifecycle management of Pulsar clusters on Kubernetes. It describes how operators use custom resource definitions and controllers to reconcile the actual cluster state with the desired state. Specific examples are provided for how operators can perform controlled cluster upgrades, scale bookies, and clean up after cluster deletion. The integration of operators with Helm charts is also covered.
Manage Pulsar Cluster Lifecycles with Kubernetes Operators - Pulsar Summit NA 2021
1. Pulsar Virtual Summit North America 2021
Manage Pulsar Cluster Lifecycles with Kubernetes Operators
Yang Yang @ StreamNative
About Me
● Software Engineer in StreamNative Cloud
● Open Source Contributor
○ Apache Pulsar
○ Apache BookKeeper
● Alibaba -> Elastic -> StreamNative
StreamNative
Founded by the creators of Apache Pulsar, StreamNative provides a cloud-native, unified messaging and streaming platform powered by Apache Pulsar to support multi-cloud and hybrid-cloud strategies.
Apache Pulsar
● Apache Pulsar is a cloud-native, distributed messaging and streaming platform
○ Fast
○ Highly scalable
○ Lots of powerful features
● Deployment
○ Kubernetes
Deploy Pulsar Clusters on K8s
● Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.
○ Consistent, repeatable, reliable deployments on a wide variety of systems
○ Extensible control plane
● Many details, hard to manage directly
○ Hundreds of lines of manifests
● Helm is a tool for managing Kubernetes packages
○ Chart
■ Pulsar
○ Config
○ Release
● Works for day-1 operations
○ What about the days after?
Operations After “Day-1”
● It’s difficult to manage the entire lifecycle of Pulsar clusters automatically with Helm charts
○ Upgrades
○ Scale up/down
○ Failover
○ Fine-grained, complex workflows
● Requires a program that continuously performs these operations based on the state of Pulsar clusters
Pulsar Operators
● StreamNative developed a set of Pulsar operators to automate common operational activities
○ Put operational knowledge of Pulsar clusters into software
○ Let software manage software
● Fundamental elements
○ CRDs
○ Controllers
Pulsar Operators: CRDs
● CRD: Custom Resource Definition
● Declarative APIs
○ Different CRDs for different component clusters
■ ZooKeeper
■ BookKeeper
■ PulsarBroker
■ PulsarProxy
○ Treat a cluster as a single object rather than directly as a set of primitives (like Pod, StatefulSet, Service, etc.)
■ Metadata
■ Spec
● Defines the desired state
■ Status
● Exposes the actual state
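A minimal Python sketch of what "a cluster as a single object" means: one custom resource carrying metadata, a user-written spec (desired state), and a controller-written status (actual state). The kind `BookKeeperCluster`, the API version `example.io/v1alpha1`, and the `replicas`/`image`/`ready_replicas`/`observed_image` fields are hypothetical placeholders chosen for illustration, not the schema of StreamNative's actual CRDs.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ObjectMeta:
    """Identity of the object (a small subset of Kubernetes ObjectMeta)."""
    name: str
    namespace: str = "default"
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class BookKeeperClusterSpec:
    """Desired state, written by the user (hypothetical fields)."""
    replicas: int
    image: str


@dataclass
class BookKeeperClusterStatus:
    """Actual state, written back by the controller (hypothetical fields)."""
    ready_replicas: int = 0
    observed_image: str = ""


@dataclass
class BookKeeperCluster:
    """A whole bookie cluster as one object, instead of raw Pods/StatefulSets/Services."""
    metadata: ObjectMeta
    spec: BookKeeperClusterSpec
    status: BookKeeperClusterStatus = field(default_factory=BookKeeperClusterStatus)
    api_version: str = "example.io/v1alpha1"  # hypothetical group/version
    kind: str = "BookKeeperCluster"           # hypothetical kind

    def converged(self) -> bool:
        """True when the reported actual state matches the desired state."""
        return (self.status.ready_replicas == self.spec.replicas
                and self.status.observed_image == self.spec.image)


cluster = BookKeeperCluster(
    metadata=ObjectMeta(name="my-bookies", namespace="pulsar"),
    spec=BookKeeperClusterSpec(replicas=3, image="apachepulsar/pulsar:2.8.0"),
)
print(cluster.converged())  # False: no status has been reported yet
```

Users only edit `spec`; the controller owns `status`, which is what lets tools compare the two without inspecting individual pods.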
Pulsar Operators
● Controllers: encode operational knowledge into programs
○ Continuously reconcile the actual state toward the desired state
■ Resilient: level-based reconciliation
■ Eventually consistent
○ Triggered by events
■ Resource changes
■ Timers
■ Custom event sources
● Allow flexible, customized management workflows on resources
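The level-based idea can be sketched in a few lines of Python, under the simplifying assumption that a component's state is just a replica count and an image tag. `State`, `reconcile`, `apply_one`, and `control_loop` are hypothetical names, not the operators' real code. Because `reconcile` compares whole states instead of reacting to the specific event that fired, a missed or duplicated trigger does no harm: the next pass computes the same gap, which is what makes the loop resilient and eventually consistent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class State:
    """Simplified state of one component cluster (hypothetical model)."""
    replicas: int
    image: str


def reconcile(desired: State, actual: State) -> List[str]:
    """Level-based reconciliation: compare the complete desired and observed
    states and list the actions still needed to close the gap. The trigger
    (resource change, timer, custom event) is deliberately not an input."""
    actions: List[str] = []
    if actual.image != desired.image:
        actions.append(f"upgrade image {actual.image} -> {desired.image}")
    if actual.replicas < desired.replicas:
        actions += [f"add member {i}" for i in range(actual.replicas, desired.replicas)]
    elif actual.replicas > desired.replicas:
        # remove the highest-numbered members first
        actions += [f"remove member {i}"
                    for i in range(actual.replicas - 1, desired.replicas - 1, -1)]
    return actions


def apply_one(desired: State, actual: State) -> State:
    """Hypothetical effector that takes a single step toward the desired state."""
    if actual.image != desired.image:
        return State(actual.replicas, desired.image)
    step = 1 if actual.replicas < desired.replicas else -1
    return State(actual.replicas + step, actual.image)


def control_loop(desired: State, actual: State, max_passes: int = 100) -> State:
    """Re-run reconcile until nothing is left to do (eventual consistency)."""
    for _ in range(max_passes):
        if not reconcile(desired, actual):
            return actual
        actual = apply_one(desired, actual)
    raise RuntimeError("did not converge")


desired = State(replicas=5, image="v2")
print(reconcile(desired, State(replicas=3, image="v1")))
# ['upgrade image v1 -> v2', 'add member 3', 'add member 4']
print(control_loop(desired, State(replicas=3, image="v1")))
# State(replicas=5, image='v2')
```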
Controlled Cluster Upgrades
● Upgrade clusters smoothly in a controlled manner
○ Maintain cluster availability
○ Avoid data loss
○ Minimize impact
● Rolling upgrades in K8s?
○ Not good enough for stateful workloads
■ BookKeeper
● Data re-replication during upgrades
■ ZooKeeper
● Restarting the leader could make the system unavailable for a few seconds
ZooKeeper Upgrades
● Use the `OnDelete` update strategy to control the sequence of upgrades
● Collect membership information via REST APIs
● Upgrade the ZooKeeper leader node after the follower nodes to reduce impact
BookKeeper Upgrades
● Turn off auto-recovery during upgrades to avoid unnecessary data re-replication
● Turn auto-recovery back on automatically after upgrades
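A minimal sketch of the upgrade wrapper, assuming two hypothetical callbacks: `set_autorecovery`, which in a real deployment might shell out to `bin/bookkeeper shell autorecovery` (check the flags for your BookKeeper version), and `upgrade_bookie`, which restarts one bookie on the new image and waits until it is healthy. The `try`/`finally` is a design choice of this sketch: auto-recovery is re-enabled even if an upgrade step fails, so a bookie that never comes back still gets its ledgers re-replicated.

```python
from typing import Callable, Iterable


def upgrade_bookies(bookies: Iterable[str],
                    set_autorecovery: Callable[[bool], None],
                    upgrade_bookie: Callable[[str], None]) -> None:
    """Upgrade bookies with auto-recovery paused.

    While auto-recovery is off, a bookie that is briefly down for its upgrade
    is not treated as lost, so its ledgers are not re-replicated needlessly.
    """
    set_autorecovery(False)
    try:
        for bookie in bookies:
            upgrade_bookie(bookie)  # hypothetical: restart on the new image, wait until healthy
    finally:
        # Re-enable even if an upgrade fails, so real failures are still repaired.
        set_autorecovery(True)


log = []
upgrade_bookies(["bookie-0", "bookie-1"],
                set_autorecovery=lambda on: log.append(f"autorecovery={'on' if on else 'off'}"),
                upgrade_bookie=lambda b: log.append(f"upgraded {b}"))
print(log)
# ['autorecovery=off', 'upgraded bookie-0', 'upgraded bookie-1', 'autorecovery=on']
```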
Decommission Bookies
● Decommissioning bookies from a BookKeeper cluster is a complex and error-prone operation
○ The admin has to perform the following steps for each bookie to decommission
■ Run `bin/bookkeeper shell listunderreplicated` to check whether there are under-replicated ledgers
■ Stop the bookie if there are no under-replicated ledgers
■ Run `bin/bookkeeper shell decommissionbookie -bookieid <target bookieid>` to decommission the target bookie
● Replicate under-replicated ledgers
● Remove bookie metadata
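The same three steps as a Python sketch, with the environment-specific parts injected as hypothetical callbacks: `has_underreplicated_ledgers` (wrapping `listunderreplicated`; how its output is parsed is left open) and `stop_bookie`. Only the `decommissionbookie` command line comes from the steps above; `run_checked` shows one way to execute it with `subprocess`, and the bookie ID format in the example is an assumption.

```python
import subprocess
from typing import Callable, List, Sequence


def decommission_command(bookie_id: str) -> List[str]:
    """The `decommissionbookie` invocation from the manual steps, as an argv list."""
    return ["bin/bookkeeper", "shell", "decommissionbookie", "-bookieid", bookie_id]


def run_checked(argv: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(argv), check=True)


def decommission_bookie(bookie_id: str,
                        has_underreplicated_ledgers: Callable[[], bool],
                        stop_bookie: Callable[[str], None],
                        run: Callable[[Sequence[str]], None] = run_checked) -> bool:
    """Try to decommission one bookie; return False if it is not safe yet."""
    # Step 1: wraps `bin/bookkeeper shell listunderreplicated` (hypothetical callback).
    if has_underreplicated_ledgers():
        return False  # re-replication still in progress; retry later
    # Step 2: stop the bookie (hypothetical callback).
    stop_bookie(bookie_id)
    # Step 3: re-replicate its ledgers and remove its metadata.
    run(decommission_command(bookie_id))
    return True


print(decommission_command("bookie-3:3181"))
# ['bin/bookkeeper', 'shell', 'decommissionbookie', '-bookieid', 'bookie-3:3181']
```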
Decommission Bookies by Operator
● The operator can decommission bookies automatically
○ The administrator just needs to update the replicas value to the desired number; the operator then takes care of the decommissioning workflow
● How it works
○ Add a finalizer to each bookie pod as a pre-deletion hook
○ Create a job to decommission the bookie when the pod is deleted
○ Remove the finalizer from the pod after decommissioning completes
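One reconcile pass of this finalizer-based workflow might look like the following sketch. Pods are plain dicts shaped like Kubernetes objects (`metadata.finalizers`, `metadata.deletionTimestamp`), and the `jobs` dict stands in for the Job API; the finalizer name `example.io/decommission-bookie` and the job naming scheme are hypothetical. Kubernetes sets `deletionTimestamp` but keeps the pod while any finalizer remains, which is what gives the operator time to run the decommission job.

```python
from typing import Dict

FINALIZER = "example.io/decommission-bookie"  # hypothetical finalizer name


def reconcile_bookie_pod(pod: dict, jobs: Dict[str, str]) -> str:
    """One reconcile pass for a bookie pod.

    `pod` is a dict shaped like a Kubernetes Pod; `jobs` maps a Job name to its
    phase ("Running", "Succeeded" or "Failed") and stands in for the Job API.
    """
    meta = pod["metadata"]
    finalizers = meta.setdefault("finalizers", [])
    job_name = f"decommission-{meta['name']}"  # hypothetical naming scheme

    if meta.get("deletionTimestamp") is None:
        # Live pod: register the pre-deletion hook once.
        if FINALIZER not in finalizers:
            finalizers.append(FINALIZER)
            return "added finalizer"
        return "nothing to do"

    if FINALIZER not in finalizers:
        return "nothing to do"  # already released

    phase = jobs.get(job_name)
    if phase is None:
        jobs[job_name] = "Running"  # create the decommission job
        return "created job"
    if phase == "Succeeded":
        finalizers.remove(FINALIZER)  # Kubernetes can now finish deleting the pod
        return "removed finalizer"
    return "waiting for job"  # still running (or failed): keep the pod for now
```

Every pass re-derives its action from the pod and job state alone, so it can be re-run safely after an operator restart.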
Scale Down Bookies by Operator
● Scale down from 6 -> 3
○ Operator-controlled decommissioning, one bookie pod at a time
● Create a job to decommission the bookie
● Remove the finalizer from the pod when the decommissioning job completes successfully
○ The job will be garbage collected
● Keep decommissioning until the number of replicas reaches the expected value
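A sketch of the one-at-a-time scale-down, assuming the operator lowers the underlying StatefulSet's replica count by one step and only takes the next step once no bookie is still held by its decommission finalizer. The function names are hypothetical, and the simulation assumes every decommission job finishes before the next pass. Since a StatefulSet removes the pod with the highest ordinal first, scaling from 6 to 3 decommissions `bookie-5`, `bookie-4`, and `bookie-3` in that order.

```python
from typing import List


def next_statefulset_replicas(current: int, desired: int, terminating: int) -> int:
    """Replica count to set on the bookie StatefulSet in this reconcile pass."""
    if desired >= current:
        return desired   # scaling up (or no change) needs no decommissioning
    if terminating > 0:
        return current   # a bookie is still held by its finalizer; wait
    return current - 1   # remove exactly one bookie (the highest ordinal)


def simulate_scale_down(start: int, desired: int) -> List[str]:
    """Toy simulation in which every decommission job finishes before the next pass."""
    replicas, terminating, decommissioned = start, [], []
    while True:
        if terminating:
            decommissioned.append(terminating.pop())  # job done, finalizer removed
        new = next_statefulset_replicas(replicas, desired, len(terminating))
        if new >= replicas:
            return decommissioned  # reached the expected value (or scaling up)
        terminating.append(f"bookie-{new}")  # ordinals 0..new-1 remain
        replicas = new


print(simulate_scale_down(6, 3))  # ['bookie-5', 'bookie-4', 'bookie-3']
```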
Cleanup After Cluster Deletion
● Sometimes the metadata and data of a broker cluster need to be deleted explicitly after the cluster itself is deleted
○ e.g., in a shared storage architecture
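The slide does not say how the cleanup is triggered; one common Kubernetes pattern, assumed here, is a finalizer on the cluster's custom resource that runs the cleanup before the object is allowed to disappear. The finalizer name and the cleanup callbacks (for example, deleting the cluster's metadata and its data in the shared storage) are hypothetical.

```python
from typing import Callable, List

CLEANUP_FINALIZER = "example.io/cluster-cleanup"  # hypothetical finalizer name


def reconcile_cluster_deletion(cluster: dict,
                               cleanup_steps: List[Callable[[str], None]]) -> bool:
    """Return True once the cluster object holds no finalizers and can go away."""
    meta = cluster["metadata"]
    finalizers = meta.setdefault("finalizers", [])
    if meta.get("deletionTimestamp") is None:
        # Cluster is live: make sure the cleanup hook is registered.
        if CLEANUP_FINALIZER not in finalizers:
            finalizers.append(CLEANUP_FINALIZER)
        return False
    if CLEANUP_FINALIZER in finalizers:
        # Steps should be idempotent: if one raises, the finalizer stays and
        # the whole pass is retried on the next reconcile.
        for step in cleanup_steps:
            step(meta["name"])
        finalizers.remove(CLEANUP_FINALIZER)
    return not finalizers
```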
Helm Chart Integration
● StreamNative Platform provides a Helm chart for managing Pulsar clusters with operators
○ Use cases
■ Deploy a new Pulsar cluster much as you would with the open-source Helm chart
■ Help users migrate existing Pulsar clusters from Helm charts to operators
○ Map the configuration of Helm releases to the operators' custom resources
○ Operators control the clusters under the hood
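The mapping can be pictured as a pure function from Helm values to custom resources. This sketch assumes a simplified values layout loosely modelled on the open-source Pulsar Helm chart (`<component>.replicaCount`, `<component>.configData`, `<component>.enabled`) and hypothetical CR kinds and field names; the real StreamNative chart and CRD schemas may differ.

```python
from typing import Any, Dict

# Helm values section -> custom resource kind (hypothetical mapping)
COMPONENT_KINDS = {
    "zookeeper": "ZooKeeperCluster",
    "bookkeeper": "BookKeeperCluster",
    "broker": "PulsarBroker",
    "proxy": "PulsarProxy",
}


def helm_values_to_crs(values: Dict[str, Any], release: str) -> Dict[str, Dict[str, Any]]:
    """Translate a simplified Helm values tree into one custom resource per component."""
    crs: Dict[str, Dict[str, Any]] = {}
    for section, kind in COMPONENT_KINDS.items():
        component = values.get(section)
        if component is None or not component.get("enabled", True):
            continue  # component absent or switched off in the release
        crs[kind] = {
            "kind": kind,
            "metadata": {"name": f"{release}-{section}"},
            "spec": {
                "replicas": component.get("replicaCount", 1),
                "config": dict(component.get("configData", {})),
            },
        }
    return crs


values = {
    "zookeeper": {"replicaCount": 3},
    "bookkeeper": {"replicaCount": 4},
    "broker": {"replicaCount": 2, "configData": {"managedLedgerDefaultEnsembleSize": "2"}},
    "proxy": {"enabled": False},
}
crs = helm_values_to_crs(values, release="pulsar")
print(sorted(crs))  # ['BookKeeperCluster', 'PulsarBroker', 'ZooKeeperCluster']
```

Keeping the mapping free of side effects makes it easy to compare the resources the operators would manage with what an existing Helm release deployed, which helps with the migration use case.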
Conclusion
● Operators are powerful tools for managing the lifecycles of Pulsar clusters in production on Kubernetes
○ Especially for operations after “Day-1”
● The operators evolve with Pulsar and will keep getting better
○ Built-in structured configurations
○ Fine-grained automatic cluster management
● Want to try it out?
○ StreamNative Platform
○ StreamNative Cloud
StreamNative Platform
Self-managed enterprise offering of Pulsar
✓ Kafka-on-Pulsar
✓ Function Mesh for serverless streaming
✓ Enterprise-ready security
✓ Pulsar Operators
✓ Seamless StreamNative Cloud experience
https://streamnative.io/platform
StreamNative Cloud
Fully-managed Apache Pulsar-as-a-Service
✓ Massive scale without the ops overhead
✓ Built for hybrid and multi-cloud
✓ Cloud-Hosted & Cloud-Managed
✓ Stream across public clouds for multi-cloud applications
✓ Elastic, consumption-based pricing with ‘pay as you go’ model
✓ Reliably scale mission-critical apps
https://streamnative.io/cloud
We’re hiring
Build Pulsar with the team that builds Pulsar
✓ Work with the creators of Pulsar
✓ Exciting, growth-stage company
✓ Open and collaborative environment
✓ Competitive compensation and benefits
✓ Best teammates on earth
https://streamnative.io/careers