Kubernetes is the clear leader among cloud orchestration engines, and its adoption is growing daily. Naturally, people want to run both their applications and their databases on the same infrastructure.
There are many ways to deploy and run PostgreSQL on Kubernetes, but most of them are not cloud-native. About a year ago Zalando started running a high-availability (HA) PostgreSQL setup on Kubernetes, managed by Patroni. Those experiments were quite successful and produced a Helm chart for Patroni. That chart was useful, albeit with a single problem: Patroni depended on Etcd, ZooKeeper, or Consul.
Few people look forward to deploying two applications instead of one and supporting both later on. In this talk I would like to introduce Kubernetes-native Patroni. I will explain how Patroni uses the Kubernetes API to run leader elections and store the cluster state. I am going to live-demo a deployment of an HA PostgreSQL cluster on Minikube and share our own experience of running more than 130 clusters on Kubernetes.
Patroni is an open-source Python project developed by Zalando in cooperation with other contributors on GitHub: https://github.com/zalando/patroni
How to do a live demo with minikube:
1. git clone https://github.com/zalando/patroni
2. cd patroni
3. git checkout feature/demo
4. cd kubernetes
5. open demo.sh and edit line #4 (specify the minikube context)
6. docker build -t patroni .
7. optionally, docker push patroni
8. optionally, edit patroni_k8s.yaml line #22 and put the name of the patroni image you built there
9. install tmux
10. run tmux in one terminal
11. run bash demo.sh in another terminal and press Enter from time to time
ZALANDO
15 markets
6 fulfillment centers
20 million active customers
3.6 billion € net sales 2016
165 million visits per month
12,000 employees in Europe
AGENDA
● Bot pattern and Patroni
● Postgres-operator
● Patroni on Kubernetes, first attempt
● Kubernetes-native Patroni
● Live-demo
Bot pattern and Patroni
● small Python daemon
● implements the “bot” pattern
● runs next to PostgreSQL
● decides on promotion/demotion
● uses a DCS to run leader election and keep the cluster state
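The decision logic of that loop can be sketched in a few lines. This is an illustrative sketch only: `FakeDCS`, `FakePostgres`, and `ha_loop_step` are hypothetical stand-ins, not Patroni's actual API.

```python
# Sketch of one iteration of the bot's HA loop. The DCS and Postgres
# objects are in-memory stubs; names are illustrative, not Patroni's API.

class FakeDCS:
    """Stand-in for Etcd/ZooKeeper/Consul holding a single leader key."""
    def __init__(self):
        self.leader = None

    def get_leader(self):
        return self.leader

    def attempt_leader(self, name):
        """CREATE("/leader", name, prevExists=False)."""
        if self.leader is None:
            self.leader = name
            return True
        return False

    def update_leader(self, name):
        """UPDATE("/leader", name, prevValue=name)."""
        return self.leader == name


class FakePostgres:
    def __init__(self):
        self.role = 'standby'

    def promote(self):
        self.role = 'primary'

    def demote(self):
        self.role = 'standby'

    def follow(self, leader):
        self.role = 'standby'


def ha_loop_step(name, dcs, postgres):
    """Decide on promotion/demotion for one node, once."""
    leader = dcs.get_leader()
    if leader == name:
        if not dcs.update_leader(name):
            postgres.demote()              # lost the key: avoid split brain
    elif leader is None:
        if dcs.attempt_leader(name):
            postgres.promote()             # won the leader race
    else:
        postgres.follow(leader)            # a healthy leader exists
```

Running one step on each of two nodes shows the expected outcome: the first contender promotes, the second keeps following.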
DCS
● Distributed Consensus/Configuration Store (Key-Value)
● Uses Raft (Etcd, Consul) or ZAB (ZooKeeper)
● A write succeeds only if a majority of nodes (quorum) acknowledges it
● Supports atomic operations (CompareAndSet)
● Can expire objects after a TTL
http://thesecretlivesofdata.com/raft/
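The client-visible semantics Patroni relies on, CompareAndSet plus TTL expiry, fit in a short sketch. `ToyDCS` is a hypothetical single-node stand-in with none of the consensus machinery of a real Etcd, ZooKeeper, or Consul:

```python
import time

# Toy key-value store mimicking the DCS semantics Patroni relies on:
# create-if-absent, CompareAndSet, and TTL expiry. No quorum or
# replication here -- only the client-visible behavior.

class ToyDCS:
    def __init__(self, clock=time.monotonic):
        self.data = {}          # key -> (value, expires_at)
        self.clock = clock

    def get(self, key):
        item = self.data.get(key)
        if item and item[1] > self.clock():
            return item[0]
        self.data.pop(key, None)           # drop an expired key
        return None

    def create(self, key, value, ttl):
        """Succeeds only if the key does not exist (prevExists=False)."""
        if self.get(key) is not None:
            return False
        self.data[key] = (value, self.clock() + ttl)
        return True

    def update(self, key, value, ttl, prev_value):
        """Succeeds only if the current value matches (CompareAndSet)."""
        if self.get(key) != prev_value:
            return False
        self.data[key] = (value, self.clock() + ttl)
        return True
```

With a fake clock it is easy to see the leader-key lifecycle: a second `create` fails while the key is alive, and after the TTL elapses the key vanishes and can be taken by another node.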
Bot pattern: leader alive
[Diagram] Node A (Primary), Node B (Standby), Node C (Standby).
Node A: UPDATE(“/leader”, “A”, ttl=30, prevValue=”A”) → Success
Nodes B and C: WATCH(/leader)
DCS state: /leader: “A”, ttl: 30
Bot pattern: master dies, leader key holds
[Diagram] Node A (Primary, failed), Node B (Standby), Node C (Standby).
Nodes B and C: WATCH(/leader)
DCS state: /leader: “A”, ttl: 17
Bot pattern: who will be the next master?
[Diagram] Node B (Standby), Node C (Standby).
Node B: GET A:8008/patroni → failed/timeout; GET C:8008/patroni → wal_position: 100
Node C: GET A:8008/patroni → failed/timeout; GET B:8008/patroni → wal_position: 100
Bot pattern: leader race among equals
[Diagram] Node B (Standby), Node C (Standby).
Node B: CREATE(“/leader”, “B”, ttl=30, prevExists=False) → FAIL
Node C: CREATE(“/leader”, “C”, ttl=30, prevExists=False) → SUCCESS
DCS state: /leader: “C”, ttl: 30
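The race on this slide reduces to an atomic create-if-absent: however many contenders call it concurrently, exactly one wins. A minimal simulation, where the lock stands in for the DCS's internal atomicity and all names are illustrative:

```python
import threading

# Simulate the leader race: several nodes issue
# CREATE("/leader", <name>, prevExists=False) concurrently; the atomic
# create guarantees that exactly one of them succeeds and promotes.

store = {}
lock = threading.Lock()          # stands in for the DCS's atomicity
winners = []

def create_if_absent(key, value):
    with lock:
        if key in store:
            return False
        store[key] = value
        return True

def contend(name):
    if create_if_absent('/leader', name):
        winners.append(name)     # this node promotes
    # losers simply keep following the new leader

threads = [threading.Thread(target=contend, args=(n,)) for n in ('B', 'C')]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread gets in first owns the key; the outcome varies between runs, but there is always exactly one winner.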
Bot pattern: promote and continue replication
[Diagram] Node C promotes to Primary; Node B (Standby) replicates from it.
Node B: WATCH(/leader)
DCS state: /leader: “C”, ttl: 30
KUBERNETES
“Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units (Pods) for easy management and discovery. Kubernetes builds upon 15 years of experience of running production workloads at Google, combined with best-of-breed ideas and practices from the community.”
— kubernetes.io
Spilo & Patroni on K8S v1
● Deploy Etcd on Kubernetes
● Deploy Spilo with a PetSet (the old name for StatefulSet)
● Quickly hack a callback script for Patroni that labels the Pod it runs in with the current role (master, replica)
● Use Services with labelSelectors for traffic routing
Can we get rid of Etcd?
● Use a labelSelector to find all Kubernetes objects associated with the given cluster
○ Pods - cluster members
○ ConfigMaps or Endpoints to keep configuration
● On every iteration of the HA loop, update labels and metadata on those objects (the same way we update keys in Etcd)
● It is even possible to do CAS operations using the K8S API
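The CAS in the last bullet is possible because every Kubernetes object carries a resourceVersion, and the API server rejects an update made with a stale one (optimistic locking). A toy model of that behavior, with `FakeAPIServer` as a hypothetical stand-in for the real API server:

```python
# Toy model of Kubernetes optimistic locking: every object carries a
# resourceVersion, and an update supplying a stale version is rejected
# with a conflict. This is what lets Patroni do CAS through the K8S API.

class Conflict(Exception):
    pass

class FakeAPIServer:
    def __init__(self):
        self.objects = {}        # name -> (data, resource_version)
        self.counter = 0

    def get(self, name):
        return self.objects[name]

    def put(self, name, data, resource_version=None):
        if name in self.objects:
            _, current = self.objects[name]
            if resource_version != current:
                raise Conflict('resourceVersion mismatch')
        self.counter += 1        # every successful write bumps the version
        self.objects[name] = (data, self.counter)
        return self.counter
```

Two writers that both read the same version cannot both win: the second `put` with the old version raises a conflict, exactly like a failed CompareAndSet in Etcd.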
No K8S API for expiring objects
How to do leader election?
Do it on the client side!
● The leader should periodically update a ConfigMap or Endpoint
○ The update must happen as a CAS operation
○ Demote to read-only in case of failure
● All other members should check that the leader ConfigMap (or Endpoint) is being updated
○ If there are no updates during the TTL => do leader election
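A sketch of this client-side TTL, assuming a plain dict in place of the real ConfigMap and illustrative field names (`renewTime` is inspired by, but not identical to, what Patroni actually stores). Note that the follower measures staleness with its own clock, which is why clock skew matters:

```python
# Client-side TTL: the leader periodically refreshes an annotation on the
# leader ConfigMap; each follower remembers when (by its own clock) the
# annotation last changed, and starts an election only after TTL seconds
# without a change. Field names are illustrative.

TTL = 30

def leader_renew(configmap, name, now):
    """Leader refreshes its claim (a CAS update in the real system)."""
    configmap['annotations'] = {'leader': name, 'renewTime': now}

def follower_check(observed, configmap, now):
    """Return True when the leader looks dead and an election is due."""
    renew = configmap['annotations']['renewTime']
    if renew != observed.get('renew'):
        observed['renew'] = renew          # annotation changed:
        observed['at'] = now               # leader is alive, reset timer
        return False
    return now - observed['at'] > TTL      # stale => run leader election
```

The follower never trusts the timestamp inside the object directly; it only tracks whether the object keeps changing, which tolerates a constant clock offset between nodes (but not an arbitrary skew rate).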
Kubernetes API as DCS
PROS
● No dependency on Etcd
● When using an Endpoint for leader election, we can also maintain subsets with the IP of the leader Pod
● 100% Kubernetes-native solution
CONS
● Can’t tolerate an arbitrary clock skew rate
● OpenShift doesn’t allow putting an IP from the Pods range into the Endpoint
● The SLA for the K8S API on GCE promises only 99.5% availability
How to deploy it
● kubectl create -f your-cluster.yaml
● Use Patroni Helm Chart + Spilo
● Use postgres-operator
POSTGRES-OPERATOR
● Creates the CustomResourceDefinition Postgresql and watches it
● When a new Postgresql object is created, deploys a new cluster
○ Creates Secrets, Endpoints, Services, and a StatefulSet
● When a Postgresql object is updated, updates the StatefulSet
○ and does a rolling upgrade
● Periodically syncs running clusters with their manifests
● When a Postgresql object is deleted, cleans everything up
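The operator's behavior above can be sketched as an event handler plus a periodic sync. Everything here is an illustrative stand-in: the real operator talks to the Kubernetes API and manages actual Secrets, Endpoints, Services, and StatefulSets.

```python
# Sketch of the postgres-operator control loop: react to Postgresql
# custom-resource events, and periodically reconcile running clusters
# against their manifests. "clusters" stands in for the real objects.

def handle_event(clusters, event_type, manifest):
    name = manifest['name']
    if event_type == 'ADDED':
        clusters[name] = {                 # deploy a brand-new cluster
            'secrets': True, 'endpoints': True,
            'service': True, 'statefulset': manifest['spec'],
        }
    elif event_type == 'MODIFIED':
        clusters[name]['statefulset'] = manifest['spec']  # + rolling upgrade
    elif event_type == 'DELETED':
        clusters.pop(name, None)           # clean everything up

def sync(clusters, manifests):
    """Periodic reconciliation: make running clusters match manifests."""
    for m in manifests:
        if m['name'] not in clusters:
            handle_event(clusters, 'ADDED', m)
        elif clusters[m['name']]['statefulset'] != m['spec']:
            handle_event(clusters, 'MODIFIED', m)
    for name in list(clusters):            # drop clusters with no manifest
        if not any(m['name'] == name for m in manifests):
            handle_event(clusters, 'DELETED', {'name': name})
```

The periodic `sync` is what makes the operator self-healing: even if a watch event is missed, the next reconciliation converges the cluster to the manifest.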
[Architecture diagram] Inside the Kubernetes cluster: a database deployer creates the PostgreSQL manifest; the Postgres operator pod (configured via the Postgres operator config map) watches the manifest and creates the Cluster secrets, Endpoint, Service, and Stateful set; the Stateful set deploys the Spilo pod running PATRONI; Patroni updates the Endpoint with the actual master role; the client application connects through the Service.
Monitoring & Backups
● Things to monitor:
○ Pods status (via K8S API)
○ Patroni & PostgreSQL state
○ Replication state and lag
● Always do backups!
○ And always test them!
GET http://$POD_IP:8008/patroni for every Pod in the cluster, check that state=running, and compare xlog_position with the master.
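That check can be sketched as follows, assuming each member's GET /patroni response has already been parsed into a dict. The field names follow the slide (state, role, xlog_position); treat them as illustrative rather than the full Patroni REST schema.

```python
# Sketch of the per-cluster health check: verify every member answers
# and is running, then report replication lag against the master's
# WAL position. Input is {pod_name: parsed /patroni JSON, or None if
# the request to http://$POD_IP:8008/patroni failed or timed out}.

def check_cluster(members):
    problems = []
    master_pos = None
    for name, status in members.items():
        if status is None or status.get('state') != 'running':
            problems.append('%s is not running' % name)
        elif status.get('role') == 'master':
            master_pos = status['xlog_position']
    lags = {}
    if master_pos is not None:
        for name, status in members.items():
            if status and status.get('role') == 'replica':
                lags[name] = master_pos - status['xlog_position']
    return problems, lags
```

Feeding it one healthy master, one slightly lagging replica, and one unreachable pod yields one problem and one lag figure, which is exactly what a monitoring probe would alert on.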
Our learnings
● We run Kubernetes on top of AWS infrastructure
○ Availability of the K8S API in our case is very close to 100%
○ PersistentVolume (EBS) attach/detach is sometimes buggy and slow
● Kubernetes cluster upgrades
○ Require rotating all nodes and can cause multiple switchovers
■ Thanks to postgres-operator this is solved; now we need only one
● Kubernetes node autoscaler
○ Sometimes terminates the nodes where Spilo/Patroni/PostgreSQL runs
■ Patroni handles it gracefully by doing a switchover