
Scylla on Kubernetes: Introducing the Scylla Operator

  1. Scylla on Kubernetes: Introducing the Scylla Operator Yannis Zarkadas, Software Engineer @ Arrikto
  2. Presenter Yannis Zarkadas, Software Engineer ■ Storage, DevOps, ML-Engineering ■ Open Source Enthusiast: ● Scylla Operator ● Cassandra Operator in rook.io ● Kubeflow
  3. Problem Statement. Scylla: ● Great database ● Requires operational expertise. Kubernetes: ● Great workload management platform. Can we leverage Kubernetes to write a great management layer for Scylla?
  4. [Diagram: Kubernetes architecture] kubectl apply -f saves an object through the API Server into etcd; Controllers write a new Pod; the Scheduler schedules the new Pod onto Node 4. The Master runs the API Server, etcd, the Scheduler and Various Controllers; a kubelet on each of Node 1 to Node 4 runs Pods (nginx, MySQL, tomcat).
  5. StatefulSet: deploys and scales stateful software. Provides guarantees for: ■ Pod uniqueness ● At most 1 of each Pod exists at any given time ■ Pod ordering ● Rolling Update and Deployment ■ Persistent network and storage identity ● DNS record and own Persistent Volume
  6. StatefulSet Controller [diagram]: kubectl apply -f saves a StatefulSet with spec.replicas: 3, backed by a Headless Service. The StatefulSet Controller writes Pods scylla-0, scylla-1 and scylla-2 one at a time, each with its own DNS name (scylla-0.scylla.default.svc.cluster.local, scylla-1.scylla.default.svc.cluster.local, scylla-2.scylla.default.svc.cluster.local), while status.replicas and status.readyReplicas climb from 0 to 3.
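
The stable DNS names on this slide follow a fixed pattern, which a short Python sketch can reproduce. The function name and parameters are hypothetical; the pattern `<statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster domain>` is the one visible in the slide's hostnames, and `cluster.local` is assumed as the cluster domain.

```python
def stateful_pod_dns(statefulset, service, namespace, replicas,
                     cluster_domain="cluster.local"):
    """Return the per-Pod DNS names a StatefulSet gets behind a Headless Service.

    Ordinals start at 0 and are stable: Pod N always keeps the same name.
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# The three Pods from the slide: StatefulSet "scylla", Service "scylla",
# namespace "default".
names = stateful_pod_dns("scylla", "scylla", "default", replicas=3)
```

Because the name depends only on the ordinal, a Pod that is rescheduled keeps its network identity, which is the guarantee the StatefulSet slide refers to.
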
  7. Controller Pattern: used everywhere in Kubernetes. A Controller reads Kubernetes Objects, which hold a Spec (desired) and a Status (real), and writes to Physical Resources in a loop: Observe → Calculate → Reconcile.
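
The Observe / Calculate / Reconcile loop can be sketched as a small in-memory simulation. This is not Kubernetes client code: `ClusterObject`, `reconcile` and `run_until_converged` are hypothetical names, and the "physical resource" is just a counter that the loop moves one step at a time toward the desired value.

```python
from dataclasses import dataclass


@dataclass
class ClusterObject:
    spec_replicas: int          # Spec (desired)
    status_replicas: int = 0    # Status (real); simulated here, observed in a real controller


def reconcile(obj):
    """One pass of the loop: observe both states, calculate the diff, act once."""
    diff = obj.spec_replicas - obj.status_replicas   # Observe + Calculate
    if diff > 0:                                      # Reconcile: one step up
        obj.status_replicas += 1
        return "create one pod"
    if diff < 0:                                      # Reconcile: one step down
        obj.status_replicas -= 1
        return "delete one pod"
    return "nothing to do"


def run_until_converged(obj, max_steps=100):
    """Repeat reconcile() until real state matches desired state."""
    actions = []
    for _ in range(max_steps):
        action = reconcile(obj)
        if action == "nothing to do":
            break
        actions.append(action)
    return actions


cluster = ClusterObject(spec_replicas=3)
actions = run_until_converged(cluster)
```

Each pass takes a single small action and then re-reads state, so the loop stays correct even if the desired value changes between passes.
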
  8. Custom Resource Definition ■ Store Custom Objects ■ Compatible with kubectl ● kubectl get clusters
  9. The Operator Pattern: Operator = Controller(s) + CRD(s). [Diagram: Controller loop of Observe → Calculate → Reconcile, with writes.]
  10. Why the StatefulSet is not enough
  11. StatefulSet: Confined to 1 Rack. A Member maps to a Pod and a Rack maps to a StatefulSet, but a Cluster spans Datacenters, each holding several Racks. Multiple Racks? Multiple Datacenters?
  12. Safe Scale Down: ● Want to leave ○ nodetool decommission ● Stream data ● Leave. [Diagram: Scylla Ring with tokens 0, 44, 88, 132, 176, 220; member-0 to member-4 are Up, member-5 goes from Up to Leaving.]
  13. StatefulSet: Unsafe Scale Down [diagram]: kubectl apply -f saves spec.replicas: 2 on a StatefulSet that runs scylla-0, scylla-1 and scylla-2 (3 replicas). StatefulSet Scale Down? The StatefulSet Controller removes scylla-2 right away. Data not streamed! In the Scylla Ring, scylla-0 and scylla-1 stay Up while scylla-2 goes from Up to Down. Potential Data Loss!
  14. StatefulSet: Cannot track Member identity [diagram]: Pods scylla-0, scylla-1 and scylla-2 run in a StatefulSet. After a Node Fail, a Pod comes back and its Member is Joining. Replace Member? Add new Member? Must know Member identity beforehand!
  15. Vanilla Solution: StatefulSet. Problems with: ■ Seeds ■ Multi-zone deployment ■ Scale Down ■ Loss of Persistence ■ Backups/Restores ■ Extensibility. What if we could create management software in the image of Kubernetes Controllers?
  16. Design
  17. Our goal: Operator = Controller(s) + CRD(s). [Diagram: Controller loop of Observe → Calculate → Reconcile, with writes.]
  18. [Diagram: design overview] The Controller watches the Cluster Custom Resource and writes one StatefulSet per rack (Rack 1, Datacenter 1; Rack 1, Datacenter 2; ... Rack N, Datacenter M). Each Pod runs a Sidecar that talks to Scylla over JMX/HTTP, and each Member gets a Member Service (Static IP). Controller communication through Labels / Annotations.
  19. Mapping of Abstractions: Member → Pod; Rack → StatefulSet; Datacenter → StatefulSets; Cluster → Cluster Custom Resource.
  20. Sidecar: CRD + Controller + Sidecar. The Sidecar runs in the Member's Pod and talks to Scylla over JMX/HTTP. Sidecar needed to: ■ Set up config files ■ Install plugins at startup ■ Backup and Restore functionality ■ Future extensibility
  21. An Alternative to DNS Records. Much-requested feature: what if we could have static IPs? Services already have a static IP, called ClusterIP. Solution: ClusterIP Service per Pod. Drawbacks? ■ Performance: iptables can handle a few hundred Members; IPVS can handle thousands with no problem. ■ ClusterIP CIDR Depletion: usually a /12 IP block, so plenty of addresses.
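
The "plenty of addresses" claim is easy to check with arithmetic: a /12 block leaves 32 - 12 = 20 host bits, so 2^20 addresses. The sketch below uses Python's standard `ipaddress` module; the concrete range `10.96.0.0/12` is an assumption chosen to match the 10.96.0.x Member Service IPs shown later in the deck, and `fraction_used` is a hypothetical helper.

```python
import ipaddress

# Assumed example Service range; only the /12 prefix length comes from the slide.
service_cidr = ipaddress.ip_network("10.96.0.0/12")

# 2 ** (32 - 12) addresses are available for ClusterIPs.
total_addresses = service_cidr.num_addresses


def fraction_used(members, other_services=0):
    """Share of the Service range consumed with one ClusterIP Service per Member."""
    return (members + other_services) / total_addresses


# Even a 1000-Member cluster takes less than 0.1% of the range.
share = fraction_used(1000)
```
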
  22. Implementation
  23. Cluster Creation & Scale Up [diagram]: kubectl apply saves a new Scylla Cluster with Spec eu-west1-b: 1 Member, eu-west1-c: 2 Members. The Scylla Operator writes StatefulSet eu-west1-b and StatefulSet eu-west1-c, raising their replicas from 0 one step at a time, and a Member Service per Pod: scylla-eu-west1-b-0 (10.96.0.1), scylla-eu-west1-c-0 (10.96.0.3), scylla-eu-west1-c-1 (10.96.0.4). In the Status, Members and ReadyMembers for each rack rise from 0 until they match the Spec.
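
The Member names on this slide can be derived from the per-rack Spec. A minimal sketch, assuming the naming pattern `<cluster>-<rack>-<ordinal>` that the slide's Pod names suggest; `member_names` is a hypothetical helper, not part of the operator.

```python
def member_names(cluster, racks):
    """Expand a {rack name: member count} Spec into the expected Pod names.

    One StatefulSet per rack is assumed, so ordinals restart at 0 in each rack.
    """
    return [
        f"{cluster}-{rack}-{ordinal}"
        for rack, members in racks.items()
        for ordinal in range(members)
    ]


# Spec from the slide: eu-west1-b has 1 Member, eu-west1-c has 2 Members.
expected = member_names("scylla", {"eu-west1-b": 1, "eu-west1-c": 2})
```
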
  24. Scale Down [diagram]: kubectl apply saves a scale down of eu-west1-c. The Scylla Operator sees the Cluster changed and the Member Service of scylla-eu-west1-c-1 (10.96.0.4) is marked decommissioned: false. nodetool decommission then runs in the Member Pod alongside the Sidecar; in the Scylla Ring, scylla-eu-west1-c-1 goes from Up to Leaving and streams its data, while scylla-eu-west1-b-0 and scylla-eu-west1-c-0 stay Up. Once the Member Service reads decommissioned: true, the replicas of StatefulSet eu-west1-c are reduced.
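
The handshake on this slide can be sketched as two cooperating steps that communicate only through a marker on the Member Service. Everything here is a simulation under assumptions: the split of duties (operator sets `decommissioned: false`, sidecar runs `nodetool decommission` and sets `true`) is inferred from the diagram labels, and `Rack`, `operator_step` and `sidecar_step` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Rack:
    name: str
    replicas: int
    # Member name -> value of the "decommissioned" marker on its Member Service.
    markers: dict = field(default_factory=dict)


def last_member(rack):
    """A StatefulSet always removes the highest ordinal first."""
    return f"scylla-{rack.name}-{rack.replicas - 1}"


def operator_step(rack, desired):
    """One reconcile pass: never shrink the StatefulSet before data is streamed."""
    if rack.replicas <= desired:
        return "in sync"
    member = last_member(rack)
    marker = rack.markers.get(member)
    if marker is None:
        rack.markers[member] = "false"      # ask the Member to leave
        return f"marked {member} decommissioned: false"
    if marker == "false":
        return f"waiting for {member} to decommission"
    rack.replicas -= 1                       # marker is "true": safe to shrink
    del rack.markers[member]
    return f"scaled StatefulSet down to {rack.replicas}"


def sidecar_step(rack, member, decommission):
    """Sidecar side: run the decommission callback, then report completion."""
    if rack.markers.get(member) == "false":
        decommission()                       # stands in for `nodetool decommission`
        rack.markers[member] = "true"
        return True
    return False
```

The point of the design is ordering: the StatefulSet replica count only drops after the Member has reported that its data was streamed, which is exactly what a plain StatefulSet scale down skips.
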
  25. Local Storage vs Network Attached. Local NVMe SSD: ■ Fast ■ Ephemeral. Network Attached Storage (AWS EBS, Google Persistent Disk): ■ Slow ■ Fault-tolerant. Scylla handles replication => Use Local Storage! Kubernetes v1.10: Local Persistent Volumes in Beta.
  26. Local Storage Failure Scenarios: ■ Disk Misbehaves ● Block errors ● Deteriorating performance => Pod still runs, Unhandled by K8s ■ Disk Fails ● Mount Point Disappears => Pod fails to start, Unhandled by K8s ■ Node Fails ● With Disk on it => Pod fails to be scheduled, Unhandled by K8s. Common in the Cloud!
  27. Node Fail [diagram]: member-0, member-1 and member-2 each run on a node with a local disk at /mnt/ssd1 and have a Member Service (10.96.0.1, 10.96.0.3, 10.96.0.4). Node 3, which hosts member-1, fails. An Admin or Fencing Software deletes Node 3; the Scylla Operator sees the StatefulSet changed and recreates the PVC. member-1 comes back with the same Member Service (10.96.0.3) but an Empty Disk.
  28. Node Fail, continued [diagram]: member-1 now runs on Node 2 with /mnt/ssd1, Member Service 10.96.0.3 and an Empty Disk. Algorithm: Cluster Member? (search with IP) → Yes → Empty Disk? → Yes → Stream Missing Data (replace_address_first_boot option).
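
The two-question algorithm on this slide can be written as a small decision function. This is a sketch: `startup_options` is a hypothetical name, the ring membership is passed in as a plain set of IPs, and only the Yes/Yes branch comes from the slide. Returning no extra options in the other cases is an assumption made to keep the example complete.

```python
def startup_options(member_ip, ring_ips, disk_is_empty):
    """Decide which extra Scylla option a starting Member needs.

    member_ip     -- the static Member Service IP of this Pod
    ring_ips      -- IPs the cluster already knows as Members
    disk_is_empty -- True if the data directory holds no data
    """
    is_cluster_member = member_ip in ring_ips        # "Cluster Member? (search with IP)"
    if is_cluster_member and disk_is_empty:          # "Empty Disk?"
        # Known identity but lost data: replace itself and stream missing data.
        return {"replace_address_first_boot": member_ip}
    # Assumed: a brand-new Member joins normally, and a Member that still
    # has its data simply restarts, so no replace option is needed.
    return {}


ring = {"10.96.0.1", "10.96.0.3", "10.96.0.4"}
options = startup_options("10.96.0.3", ring, disk_is_empty=True)
```

The check only works because the Member Service IP is static: the recreated Pod presents the same IP the ring already knows, so it can be recognised as a replacement rather than a new Member.
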
  29. Demo
  30. Take away Kubernetes helps to manage Scylla, but has some limitations: ■ CPU Pinning ● Huge performance gains. ● Must be enabled in the kubelet. ● Many managed solutions don’t enable it. ■ Local Storage ● Supported but still needs improvement. ● Some vendors don’t offer high storage machines for K8s. ■ Multi-Region Clusters ● Still an unsolved problem. “Cost of Containerization” by Moreno Garcia: https://www.scylladb.com/2018/08/09/cost-containerization-scylla/
  31. Future Work. Scylla Operator: ■ Repairs with Scylla Manager ■ Multi-Region Clusters ● Very early support in Kubernetes ● LoadBalancer per Pod is a possible workaround ■ Backups and Restores ■ File your own issue: ● https://github.com/scylladb/scylla-operator. Kubernetes: ■ Better Support for Local Storage ● Monitoring, scheduling
  32. Thank you Stay in touch Any questions? Yannis Zarkadas yanniszark@arrikto.com @yanniszark

Editor's notes

  1. Overview of distributed nature of Scylla
  2. Overview: each member stores a different portion of the data
  3. Intro to kubernetes: Smallest unit of processing: Pod Declarative nature: user declares desired state, Kubernetes works to satisfy
  4. Kubernetes’ solution for running DBs: StatefulSet
  5. Example of how the StatefulSet works
  6. Controller pattern that appears everywhere in K8s: 1. Observe desired state 2. Calculate actual state 3. Diff and take action
  7. What is missing to enable us to build our own controller? Custom Objects. CRDs enable us to store custom objects in etcd.
  8. Operator pattern. Controller acts as a human operator would.
  9. Examples of how our design addresses each of the StatefulSet’s shortcomings.