Flagger is a Kubernetes operator that automates canary deployments using Istio routing and Prometheus metrics. It gradually shifts traffic to the canary while measuring key performance indicators like success rate and latency. If the canary meets the thresholds, it is promoted to primary, if not traffic is shifted back and the canary is aborted. Flagger implements a control loop that advances the canary in steps from 0% to 100% traffic while monitoring metrics to ensure stability before fully promoting the deployment.
2. What is Progressive Delivery?
Progressive Delivery is Continuous Delivery with fine-grained control over the
blast radius.
Building blocks
● User segmentation
● Traffic management
● Observability
● Automation
https://redmonk.com/jgovernor/2018/08/06/towards-progressive-delivery/
3. Introducing Flagger
Flagger is a Kubernetes operator that automates the promotion of
canary deployments using Istio routing for traffic shifting and
Prometheus metrics for canary analysis.
Flagger implements a control loop that gradually shifts traffic to the
canary while measuring key performance indicators. Based on the
KPIs analysis a canary is promoted or aborted.
6. Key Performance Indicators
The decision to pause the traffic shift, abort or promote a canary is
based on:
● Deployment health status
● Request success rate percentage
● Request latency average value
13. Flagger control loop stage 1-5
● scan for canary deployments
● create the primary deployment and HPA
● create the ClusterIP services for primary and canary
● create an Istio virtual service with weighted destinations mapped to primary and
canary ClusterIP services
● check primary and canary deployments status
○ halt advancement if a rolling update is underway
○ halt advancement if pods are unhealthy
● increase canary traffic weight percentage from 0% to 5% (step weight)
● check canary HTTP request success rate and latency
○ halt advancement if any metric is under the specified threshold
○ increment the failed checks counter
14. Flagger control loop stage 6-7
● check if the number of failed checks reached the threshold
○ route all traffic to primary
○ scale to zero the canary deployment
○ mark canary as failed
○ wait for the canary to be updated (revision bump) and start over
● increase canary traffic by 5% (step weight)
○ halt advancement while canary request success rate is under the threshold
○ halt advancement while canary request duration P99 is over the threshold
○ halt advancement if the primary or canary deployment becomes unhealthy
○ halt advancement while canary deployment is being scaled up/down by HPA
15. Flagger control loop stage 8-12
● promote canary to primary when canary weight reaches max
○ copy canary deployment spec template over primary
● wait for primary rolling update to finish
○ halt advancement if pods are unhealthy
● route all traffic to primary
● scale to zero the canary deployment
● mark canary as finished
● wait for the canary to be updated (revision bump) and start over