KUBERNETES IN PRODUCTION: LESSONS LEARNT
Introduction
● Kubernetes in production for 6+ months, handling 2K requests/second
● 100+ microservices and 200+ components such as databases, cache stores, and queues
● 1800+ pods
● New environment setup in weeks through automation
● Cost savings through optimum utilization of resources
Cluster Creation
● Cluster pod address range (cluster-ipv4-cidr)
○ Size
○ IP conflict between clusters in different Google Cloud Platform (GCP) projects
● Cluster type
○ Zonal
○ Regional
● Add storage class for SSD
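The IP-conflict concern in the list above can be checked before a cluster is created. The following is a minimal sketch, assuming hypothetical pod ranges for three clusters in different GCP projects; the cluster names and CIDR values are illustrative and not taken from the talk. It uses only Python's standard `ipaddress` module.

```python
import ipaddress


def find_conflicts(ranges):
    """Return pairs of cluster names whose pod CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    names = sorted(nets)
    conflicts = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if nets[first].overlaps(nets[second]):
                conflicts.append((first, second))
    return conflicts


# Hypothetical cluster-ipv4-cidr values, for illustration only.
pod_ranges = {
    "project-a/staging": "10.8.0.0/14",
    "project-b/production": "10.10.0.0/16",
    "project-c/tools": "10.12.0.0/14",
}

print(find_conflicts(pod_ranges))
```

Running a check like this against every planned range is cheap, whereas changing a cluster's pod range later generally means recreating the cluster.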
Namespaces != Environments
[Diagram: a single cluster with "staging" and "production" namespaces, contrasted with a separate Staging Cluster and Production Cluster, each with its own namespace and pods]
Team as Namespace
[Diagram: one cluster with a namespace per team, "platform" and "promotions", each running its own pods]
Helm & Tiller
Global vs Namespace-scoped Tiller
Caveat: a ClusterRoleBinding cannot be created using these (namespace-scoped) Tillers
CI/CD for Helm
Rolling Update & Readiness probe
[Diagram: rolling update of pods behind a Service, from two V1 instances to two V2 instances]
● Deploy one instance of new version
● Attach to load balancer
● Delete one instance of old version
● Deploy one instance of new version
● Delete another instance of old version
[Diagram: crash loop in new version, with V2 pods marked Unhealthy and Healthy]
maxSurge: 1
maxUnavailable: 1
minReadySeconds: 3
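To make the effect of `maxSurge` and `maxUnavailable` concrete, here is a rough sketch that computes the pod-count bounds a rolling update stays within. It assumes two replicas, as the diagram suggests; the helper names are hypothetical. Kubernetes rounds a percentage `maxSurge` up and a percentage `maxUnavailable` down, which the sketch mirrors.

```python
import math


def resolve(value, replicas, round_up):
    """Resolve an absolute count or a percentage string against replicas."""
    if isinstance(value, str) and value.endswith("%"):
        fraction = int(value[:-1]) * replicas / 100
        return math.ceil(fraction) if round_up else math.floor(fraction)
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return (max total pods, min available pods) during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return replicas + surge, replicas - unavailable


print(rollout_bounds(2, 1, 1))            # settings from the slide, 2 replicas assumed
print(rollout_bounds(10, "25%", "25%"))   # percentage form, for comparison
```

With the slide's values and two replicas, at most three pods exist at once and at least one stays available. `minReadySeconds: 3` additionally requires a new pod to stay ready for three seconds before it counts as available, which is what keeps a crash-looping V2 from replacing a healthy V1.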
Database on containers
● High availability is important in the container world
○ Pods are not durable
● Use persistent volumes
● StatefulSet - What & Why?
○ Ordered creation, deletion and scaling
○ Stable identifier for pods
○ Each pod has a dedicated persistent volume
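The stable identifiers mentioned above follow a fixed naming scheme: pods are named `<statefulset>-<ordinal>`, and each PersistentVolumeClaim is named `<claim template>-<statefulset>-<ordinal>`. The sketch below only generates those names; the StatefulSet name `postgres` and the claim template name `data` are assumptions chosen for illustration.

```python
def statefulset_identities(name, replicas, claim_template):
    """List (pod name, PVC name) pairs in creation order."""
    return [
        (f"{name}-{ordinal}", f"{claim_template}-{name}-{ordinal}")
        for ordinal in range(replicas)
    ]


# Hypothetical StatefulSet "postgres" with a volume claim template "data".
for pod, pvc in statefulset_identities("postgres", 3, "data"):
    print(pod, pvc)
```

Pods are created in ascending ordinal order and removed in descending order, and a rescheduled pod keeps both its name and its claim, so it comes back attached to the same volume.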
Database on containers
[Diagram: a StatefulSet in a K8s cluster running three pods: Master, Slave 1 and Slave 2]
● A StatefulSet alone is not enough to achieve high availability
● Postgres cluster => Stolon
● Use pod anti-affinity to reduce the impact of a node failure
Isolate Stateful & Stateless Apps
● Why?
○ Separation of concerns
○ Different resource consumption patterns for stateful and stateless apps
○ Apps undergo frequent updates while components do not
● Separate node pool
● Separate cluster
○ Consul and kube-consul-register for service discovery
Inter Cluster - Service Discovery
Resource Requests & Limits
Requests:
When containers have resource requests specified, the K8s scheduler can make better decisions about which nodes to place pods on.
Limits:
When containers have their limits specified, contention for resources on a node can be handled in a specified manner, because the kubelet and container runtime enforce those limits on the node.
Resource Requests & Limits
● How did we approach it?
○ Start with the default requests and limits, which are unlimited
○ Learn the usage patterns over time and introduce appropriate requests and limits
● Advantages:
○ Measure the full utilization requirement of each application separately
● Disadvantages:
○ Unbalanced pod scheduling, which led to a resource crunch
○ Node autoscaling in GKE doesn't work, because the cluster autoscaler acts on pod resource requests rather than actual usage
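The unbalanced scheduling described above follows from how placement is decided: the scheduler compares the sum of declared requests against a node's allocatable capacity, not against real usage. The sketch below is a deliberately simplified model of that check for CPU only; the node size and request values are hypothetical, and the real scheduler considers many more factors.

```python
def fits(allocatable_mcpu, scheduled_requests_mcpu, new_request_mcpu):
    """Scheduler-style check: compare summed CPU requests, not real usage."""
    return sum(scheduled_requests_mcpu) + new_request_mcpu <= allocatable_mcpu


node_allocatable = 4000  # millicores, hypothetical node size

# Without requests every pod counts as 0, so the node never looks full.
print(fits(node_allocatable, [0] * 50, 0))

# With requests, the same node rejects a pod that would oversubscribe it.
print(fits(node_allocatable, [500] * 8, 500))
```

In the first case any number of pods can pile onto one node, and since no pod ever fails this check, nothing signals that a new node is needed.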
Monitoring in K8s
● Why is it important in the container world?
● Tools:
○ Prometheus in K8s - Prometheus operator
○ Grafana
● Metrics exporters as separate pods:
○ Independent of the actual component
● Metrics exporters as a sidecar of the component pod:
○ Updating the exporter requires restarting the actual component
Monitoring in K8s
● Dashboards
○ Node metrics
○ Node Pod metrics
○ Ingress controller
○ K8s API latency
○ K8s persistent volumes
Alerting in K8s
● Pods - crash loops, readiness
● Nodes - restart, kubelet process restart, Docker daemon restart
● Sudden CPU, memory, and disk utilization spikes of pods and nodes
○ Indicate an anomaly
○ If a node's resource consumption exceeds the configured eviction thresholds, pods are evicted based on priority.
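One simple way to think about a "sudden spike" alert is to compare the latest sample against a short moving baseline. The sketch below shows only that logic in Python; the window size, the factor of 2, and the sample values are assumptions for illustration. In a Prometheus and AlertManager setup the same idea would normally be written as an alerting rule rather than application code.

```python
from statistics import mean


def is_spike(samples, window=5, factor=2.0):
    """Flag the latest sample if it exceeds factor x the mean of the prior window."""
    if len(samples) <= window:
        return False  # not enough history to judge
    baseline = mean(samples[-window - 1:-1])
    return baseline > 0 and samples[-1] > factor * baseline


cpu_percent = [22, 25, 24, 23, 26, 71]  # hypothetical utilization samples
print(is_spike(cpu_percent))
```

Alerting on a relative jump rather than a fixed threshold catches anomalies on lightly loaded pods as well as busy ones, at the cost of some noise when the baseline is very low.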
Monitoring & Alerting Setup
[Diagram: K8s cluster with a default node pool running the application pods and a "Monitoring & Alerting" node pool running Prometheus and AlertManager, with Grafana for dashboards and Slack for alerts]
Kubernetes API Gotchas
● Downtime during K8s master upgrades in GKE
○ Applications that depend on the Kubernetes API are affected
○ Maintenance Window (Beta) - GKE lets you configure a 4-hour maintenance window
● Reduce application runtime dependency on the K8s API
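One way to reduce the runtime dependency is to keep serving the last successful answer while the API server is unreachable. The following is a minimal sketch of that pattern, not the authors' implementation: `CachedLookup` and `flaky_fetch` are hypothetical names, and `flaky_fetch` merely simulates an API call that fails during a master upgrade.

```python
class CachedLookup:
    """Serve the last good result when the underlying call fails."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that would query the K8s API
        self._last_good = None
        self._has_value = False

    def get(self):
        try:
            self._last_good = self._fetch()
            self._has_value = True
        except Exception:
            if not self._has_value:
                raise  # nothing cached yet, so surface the failure
        return self._last_good


calls = {"n": 0}


def flaky_fetch():
    """Hypothetical stand-in: succeeds once, then fails like an unavailable master."""
    calls["n"] += 1
    if calls["n"] > 1:
        raise ConnectionError("API server unavailable")
    return ["10.4.0.7", "10.4.1.3"]


lookup = CachedLookup(flaky_fetch)
print(lookup.get())  # fresh result
print(lookup.get())  # cached result during the simulated outage
```

The trade-off is staleness: during an outage the application works from old data, so this suits lookups that change slowly, such as endpoint lists or configuration.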
GKE Limitations
● Only 16 disks can be attached per node
● Only 8 SSD disks can be attached per node
● A maximum of 50 internal load balancers is allowed per project in GKE
● The pod IP range determines the maximum number of nodes
● No control over K8s master nodes
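The link between pod IP range and node count is simple arithmetic once the per-node block size is known. The sketch below assumes each node is handed a /24 slice of the cluster pod range, which is the usual GKE default; treat that prefix as an assumption and adjust it for your cluster. The real ceiling can be somewhat lower because part of the range may be reserved for other purposes.

```python
import ipaddress


def max_nodes(pod_cidr, per_node_prefix=24):
    """Number of per-node blocks that fit in the cluster pod range."""
    network = ipaddress.ip_network(pod_cidr)
    if network.prefixlen > per_node_prefix:
        return 0
    return 2 ** (per_node_prefix - network.prefixlen)


# Hypothetical pod ranges of different sizes.
for cidr in ("10.0.0.0/14", "10.0.0.0/16", "10.0.0.0/20"):
    print(cidr, max_nodes(cidr))
```

A /14 leaves room for roughly a thousand nodes, while a /20 caps the cluster at 16, which is why the range size has to be decided at cluster creation.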
Development practices that help containerization
● Config - Store config in the environment
● Logs - Treat logs as event streams
○ Centralized logging - Stackdriver / ELK
● Processes - Execute app as one or more stateless processes
● Concurrency - Scale out via process model
● The Twelve-Factor App - https://12factor.net
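As a small illustration of "store config in the environment", the sketch below reads settings from environment variables with explicit defaults. The variable names and default values are hypothetical; in Kubernetes they would typically be injected through the pod spec, for example from a ConfigMap or Secret.

```python
import os


def load_config(env=os.environ):
    """Read settings from environment variables, with explicit defaults."""
    return {
        "database_url": env.get("DATABASE_URL", "postgres://localhost:5432/app"),
        "cache_host": env.get("CACHE_HOST", "localhost"),
        "worker_count": int(env.get("WORKER_COUNT", "4")),
    }


# Hypothetical variable names; a plain dict stands in for the real environment.
print(load_config({"DATABASE_URL": "postgres://db:5432/app", "WORKER_COUNT": "8"}))
```

Because the same image reads different values in each environment, nothing has to be rebuilt when promoting from staging to production.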
QUESTIONS?
THANK YOU
Arunvel Sriram & Prabhu Jayakumar
