Apache Druid supports auto-scaling of Middle Manager nodes to handle changes in data ingestion load. On Kubernetes, this can be implemented using Horizontal Pod Autoscaling based on custom metrics exposed from the Druid Overlord process, such as the number of pending/running tasks and expected number of workers. The autoscaler scales the number of Middle Manager pods between minimum and maximum thresholds to maintain a target average load percentage.
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
1. Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Jinchul Kim
2. About Jinchul
• DevOps engineer and Senior Software Developer at SK Telecom (2017 ~)
• Scrum master for cloud platform development using Kubernetes, Docker, and a variety of applications
• Committer on the Apache Impala project (2018 ~)
• Worked on the SAP HANA in-memory engine at SAP Labs Korea (2008 ~ 2017)
• Designed and wrote server-side code in C++: SQL/SQLScript parser, semantic analyzer, SQL optimizer, rule/cost-based optimization, plan explanation and executor, SQL plan cache, and SQLScript debugger
• Received the "SAP Applaud Award" for strategic contribution with impact across teams/functions and for overcoming significant challenges in HANA scale-out quality
3. Agenda
• Motivation
• Background & terminology
• Apache Kafka
• Apache Druid
• Docker & Kubernetes
• Helm
• Auto Scaling in Druid
• Horizontal Pod Auto Scaling with Custom Metrics on Kubernetes
• Horizontal Pod Auto Scaling: scale-in issue and workarounds
• Conclusion
4. Motivation
• Why do we need auto-scaling?
• Cost savings through better resource management
• What kinds of information do we need for auto-scaling?
• Hardware resource metrics
• Custom metrics from the service
5. Motivation (Cont.)
• Drawbacks of Apache Druid's built-in auto-scaling
• Druid's auto scaling is only available on AWS
• VM start-up and shutdown take a few minutes
• Druid's auto scaling is tightly coupled with the AWS API
12. • Overlord
• Assigns ingestion tasks to Middle Managers
• Is the controller of data ingestion into Druid
• Watches over Middle Managers
• Coordinates segment publishing
• Middle Manager
• Processes that handle ingestion of new data into the cluster
• Reads external data sources and publishes new Druid segments
• Is also called a Worker node
• Executes submitted tasks
• Forwards tasks to Peons that run in separate JVMs
• Peon
• Runs a single task in a single JVM
• Is managed by a Middle Manager
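The division of labor above can be illustrated with a small toy model (a hypothetical sketch, not Druid's actual code): an Overlord assigns tasks to the least-loaded Middle Manager with a free slot, and each running task corresponds to one Peon, up to the worker's capacity.

```python
# Toy model of Druid's ingestion roles. The class and field names are
# illustrative; in real Druid each Peon is a separate JVM and capacity
# comes from druid.worker.capacity.

class MiddleManager:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity   # max concurrent Peons on this worker
        self.peons = []            # one Peon per running task

    def has_slot(self):
        return len(self.peons) < self.capacity

    def run(self, task):
        self.peons.append(task)    # real Druid forks a Peon JVM here

class Overlord:
    def __init__(self, workers):
        self.workers = workers
        self.pending = []

    def submit(self, task):
        # Assign to the least-loaded worker with a free slot,
        # otherwise leave the task pending.
        free = [w for w in self.workers if w.has_slot()]
        if free:
            min(free, key=lambda w: len(w.peons)).run(task)
        else:
            self.pending.append(task)

workers = [MiddleManager("mm-0", 2), MiddleManager("mm-1", 2)]
overlord = Overlord(workers)
for i in range(5):
    overlord.submit(f"index_kafka_{i}")

print([len(w.peons) for w in workers], len(overlord.pending))  # → [2, 2] 1
```

With four task slots total, the fifth task stays pending; it is exactly this pending backlog that motivates scaling out more Middle Managers.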
13. WHAT HAS DOCKER DONE FOR US?
• Continuous delivery
- Deliver software more often and with fewer errors
- No time spent on dev-to-ops handoffs
• Improved security
- Containers help isolate each part of your system and provide better control of each component
• Run anything, anywhere
- All languages, all databases, all operating systems
- Any distribution, any cloud, any machine
• Reproducibility
- Reduces the times we say "it only worked on my machine"
14. VMs vs. Containers
Source: https://www.docker.com/whatisdocker/
Containers are isolated, but share the OS and, where appropriate, bins/libraries
15. WHAT DOES KUBERNETES DO?
• Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.
• Improves reliability
- Continuously monitors and manages your containers
- Will scale your application to handle changes in load
• Better use of infrastructure resources
- Helps reduce infrastructure requirements by gracefully scaling your entire platform up and down
• Coordinates what containers run where and when across your system
16. Helm Architecture
• Package manager for Kubernetes applications
• Helm Charts help you define, install, and upgrade Kubernetes applications
• Renders k8s manifest files and sends them to the k8s API server => launches apps into the k8s cluster
[Diagram: Helm Client talks to the Tiller Server over gRPC; Tiller sends rendered manifests to the K8s API Server; a Chart Repository (RESTful) and a Docker Image Registry feed the Kubernetes Cluster running the apps]
17. The basic chart format consists of the templates directory, values.yaml, and other files, as shown below.
19. Description of Auto Scaling in Druid
"The Autoscaling mechanisms currently in place are tightly coupled with our deployment infrastructure but the framework should be in place for other implementations. We are highly open to new implementations or extensions of the existing mechanisms. In our own deployments, middle manager nodes are Amazon AWS EC2 nodes and they are provisioned to register themselves in a galaxy environment.
If autoscaling is enabled, new middle managers may be added when a task has been in pending state for too long. Middle managers may be terminated if they have not run any tasks for a period of time."
[Autoscaling, http://druid.io/docs/latest/design/overlord.html]
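The quoted policy can be sketched as a single decision function per evaluation cycle (a hypothetical illustration, not Druid's provisioning code; the threshold names and values are assumptions):

```python
import time

# Illustrative thresholds; Druid's real equivalents live in the Overlord's
# autoscaling configuration, not under these names.
PENDING_TASK_TIMEOUT_SEC = 60   # task pending "too long" => scale out
WORKER_IDLE_TIMEOUT_SEC = 600   # worker idle "for a period" => scale in

def scaling_decision(now, oldest_pending_since, worker_idle_since):
    """Return 'provision', 'terminate', or 'none' for one evaluation cycle.

    oldest_pending_since: timestamp of the longest-pending task, or None
    worker_idle_since: timestamps since each worker last ran a task
    """
    if oldest_pending_since is not None and \
            now - oldest_pending_since > PENDING_TASK_TIMEOUT_SEC:
        return "provision"   # add a new Middle Manager
    if any(now - t > WORKER_IDLE_TIMEOUT_SEC for t in worker_idle_since):
        return "terminate"   # remove an idle Middle Manager
    return "none"

now = time.time()
print(scaling_decision(now, now - 120, []))        # → provision
print(scaling_decision(now, None, [now - 700]))    # → terminate
print(scaling_decision(now, None, [now - 10]))     # → none
```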
22. Horizontal Pod Autoscaler
[Diagram: the Horizontal Pod Autoscaler drives a Deployment/ReplicaSet of MiddleManager pods; each pod runs a MiddleManager plus an Overlord Watcher; Prometheus scrapes the watchers and backs the Custom Metrics API (custom.metrics.k8s.io/v1beta1)]
Exposed metrics:
/druid_ingestion_num_peons
/druid_ingestion_num_workers
/druid_ingestion_num_pending_tasks
/druid_ingestion_num_running_tasks
/druid_ingestion_expected_num_workers
/druid_ingestion_current_load
24. Exposing Custom Metrics to Prometheus (Cont.)
Property | Description
druid_ingestion_num_peons | The number of peons for each worker
druid_ingestion_num_workers | The number of workers in the indexing service
druid_ingestion_num_pending_tasks | The number of pending tasks in the indexing service
druid_ingestion_num_running_tasks | The number of running tasks in the indexing service
druid_ingestion_expected_num_workers | The number of expected workers in the indexing service
druid_ingestion_current_load | Percentage of current load
HTTP endpoints of the Overlord process:
/druid/indexer/v1/workers
/druid/indexer/v1/pendingTasks
/druid/indexer/v1/runningTasks
1. RESTful HTTP request
2. Get JSON string
3. Parse the string and replace the property
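The three steps above can be sketched as follows. The JSON payloads are simulated stand-ins for real Overlord responses (in a live cluster you would fetch them from the endpoints listed above, e.g. with urllib), and the capacity-based definition of current load is an assumption:

```python
import json

# Simulated responses from the Overlord endpoints listed above; in a real
# deployment you would GET them from http://<overlord-host>/druid/indexer/v1/...
workers_json = json.dumps([
    {"worker": {"host": "mm-0", "capacity": 3}, "currCapacityUsed": 2},
    {"worker": {"host": "mm-1", "capacity": 3}, "currCapacityUsed": 1},
])
pending_json = json.dumps([{"id": "index_kafka_a"}])
running_json = json.dumps([{"id": "index_kafka_b"}, {"id": "index_kafka_c"},
                           {"id": "index_kafka_d"}])

def compute_metrics(workers_body, pending_body, running_body):
    """Step 2-3: parse the JSON strings and set the metric values."""
    workers = json.loads(workers_body)
    pending = json.loads(pending_body)
    running = json.loads(running_body)
    total_capacity = sum(w["worker"]["capacity"] for w in workers)
    used = sum(w["currCapacityUsed"] for w in workers)
    return {
        "druid_ingestion_num_workers": len(workers),
        "druid_ingestion_num_pending_tasks": len(pending),
        "druid_ingestion_num_running_tasks": len(running),
        # Assumed definition: used task slots as a percentage of total slots.
        "druid_ingestion_current_load": 100 * used // total_capacity,
    }

metrics = compute_metrics(workers_json, pending_json, running_json)
print(metrics["druid_ingestion_current_load"])  # → 50
```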
36. $ make certs
Generating TLS certs
Generating a 2048 bit RSA private key
......................................+++
.......................+++
writing new private key to 'metrics-ca.key'
-----
2018/09/19 20:05:54 [INFO] generate received request
2018/09/19 20:05:54 [INFO] received CSR
2018/09/19 20:05:54 [INFO] generating key: rsa-2048
2018/09/19 20:05:55 [INFO] encoded CSR
2018/09/19 20:05:55 [INFO] signed certificate with serial number
369504685819654624616304590957348031615297503101
Generating custom-metrics-api/cm-adapter-serving-certs.yaml
$
Exploring Middle Manager Auto Scaling based on Custom Metrics
38. $ kubectl create -f ./custom-metrics-api
secret "cm-adapter-serving-certs" created
clusterrolebinding.rbac.authorization.k8s.io "custom-metrics:system:auth-delegator"
created
rolebinding.rbac.authorization.k8s.io "custom-metrics-auth-reader" created
deployment.extensions "custom-metrics-apiserver" created
clusterrolebinding.rbac.authorization.k8s.io "custom-metrics-resource-reader" created
serviceaccount "custom-metrics-apiserver" created
service "custom-metrics-apiserver" created
apiservice.apiregistration.k8s.io "v1beta1.custom.metrics.k8s.io" created
clusterrole.rbac.authorization.k8s.io "custom-metrics-server-resources" created
clusterrole.rbac.authorization.k8s.io "custom-metrics-resource-reader" created
clusterrolebinding.rbac.authorization.k8s.io "hpa-controller-custom-metrics" created
$
40. $ kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
custom-metrics-apiserver-7dd968d85-zhrhw 1/1 Running 0 1m
prometheus-7dff795b9f-5ltcn 1/1 Running 0 4m
$
41. $ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
44. $ kubectl create -f ./druid/middlemanager-hpa.yaml
horizontalpodautoscaler.autoscaling "druid-mm" created
$
50. $ kubectl get hpa -n demo
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
druid-mm Deployment/druid-middlemanager 100/100 1 16 1 32s
$ kubectl get hpa -n demo
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
druid-mm Deployment/druid-middlemanager 300/100 1 16 8 1m
$ kubectl get hpa -n demo
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
druid-mm Deployment/druid-middlemanager 300/100 1 16 16 2m
$
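The replica counts in the demo follow the Horizontal Pod Autoscaler's standard scaling rule, desiredReplicas = ceil(currentReplicas × currentMetricValue / targetValue), clamped to the configured pod range; a quick sketch:

```python
from math import ceil

def desired_replicas(current_replicas, current_value, target_value,
                     min_pods=1, max_pods=16):
    """Kubernetes HPA scaling rule, clamped to [min_pods, max_pods]."""
    desired = ceil(current_replicas * current_value / target_value)
    return max(min_pods, min(max_pods, desired))

# With a target of 100 and a reported per-pod average of 300, one replica
# grows to 3; if the average stays at 300 across evaluation cycles, the
# HPA keeps scaling until it meets the target or hits maxReplicas.
replicas = 1
for _ in range(4):
    replicas = desired_replicas(replicas, 300, 100)
print(replicas)  # → 16
```

The real controller also applies stabilization windows and tolerance bands, so live clusters converge more gradually than this bare formula suggests.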
51. $ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/demo/pods/*/druid_ingestion_current_load
55. Scale-in Issue
• The Horizontal Pod Autoscaler evicts pods in a random fashion during scale-in
• Acceptable for a stateless web server
• Problematic for a Druid Middle Manager, which may still be running ingestion tasks
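One common workaround (a hypothetical sketch, not taken from the slides) is to disable a Middle Manager before terminating it, so it accepts no new tasks, and then wait for its running tasks to drain; Druid's Middle Manager exposes a disable endpoint for this purpose. The helper below takes injectable callables, since the actual HTTP calls are an assumption:

```python
import time

def drain_middle_manager(disable, get_running_task_count,
                         poll_sec=0.01, timeout_sec=5.0):
    """Disable a Middle Manager, then wait until its running tasks
    reach zero or the timeout expires.

    disable / get_running_task_count are injected callables; in a real
    setup they would call the Middle Manager's HTTP API
    (e.g. POST /druid/worker/v1/disable, then poll its task count).
    """
    disable()
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if get_running_task_count() == 0:
            return True
        time.sleep(poll_sec)
    return False

# Simulated worker that finishes one task per poll.
remaining = [3]
def fake_disable():
    pass
def fake_count():
    n = remaining[0]
    remaining[0] = max(0, n - 1)
    return n

print(drain_middle_manager(fake_disable, fake_count))  # → True
```

Wired into a pod's preStop hook, this kind of drain step lets scale-in remove pods without killing in-flight ingestion tasks.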
58. Conclusion
Who is the better controller for Druid Auto Scaling?
• Coverage
  Kubernetes: any (private/public) cloud platform where Kubernetes is available
  Druid: AWS EC2 only
• Start/stop instance
  Kubernetes: a few seconds
  Druid: a few minutes
• Ownership of auto-scaling
  Kubernetes: decoupled from the Druid core source
  Druid: tightly coupled with the Druid core source
• Extensibility
  Kubernetes: easily extensible to Druid Historical nodes and any other applications
  Druid: does not support Historical nodes