Chennai, India
755 02 01 268
Narayanan.kmu@gmail.com
bribebybytes.github.io/landing-page
NARAYANAN
KRISHNAMURTHY
Technical Architect, ADP India
Cloud Architect with 15 Years in IT
CLOUD DEVOPS TELECOM
Skills
Languages
English
Tamil
Hindi
Being Social
Bribe By Bytes
Hobby
Have some
Bad News &
Good News!
Just because you containerized, Kubernetized and Cloudified your
application doesn't mean it's Reliable, Scalable and Secured automatically
Bad News First!
• Your Hardware will fail
• Your Enterprise grade application will fail
• Your Cloud will fail
• Your Kubernetes cluster will fail
Embrace it!
QoS Layer, Application Layer, Cluster Layer, Infrastructure Layer
Reliable
Scalable
Available
Secured
Performance
INVOLUNTARY
DISRUPTIONS
VOLUNTARY
DISRUPTIONS
INVOLUNTARY DISRUPTIONS
[Diagram: a cluster with one Master and two Workers (Worker 1, Worker 2) running the login and Emp pods]
• Cluster admin deletes a pod by mistake ☹
• A hardware failure of the physical machine or VM
• Cluster admin deletes a node by mistake ☹
• Pod gets evicted from node due to resource constraints [Diagram: many new pods crowd the workers]
VOLUNTARY DISRUPTIONS
[Diagram: the same cluster while one worker is drained]
• Draining a node for repair or upgrade or to scale down
PENDING QUEUE!
• Cluster admin deletes a pod by mistake
• A hardware failure of the physical machine or Virtual Machine
• Cluster admin deletes a node by mistake
• Pod gets evicted from node due to resource constraints
• Draining a node for repair or upgrade or to scale down
• Application Upgrade
Good news -
Solution?
https://www.plectica.com/maps/I7WZTGITU/edit/RAKHSLAXT
Choose Right Controller/Storage Req
Pod Replicas
Application Upgrade Strategy
https://www.youtube.com/watch?v=c7ytxiddImw
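spec:
  replicas: 1
deployment.spec.replicas
deployment.spec.strategy
  Recreate: deletes all existing pods before creating the new ones
  RollingUpdate: replaces pods gradually, bounded by maxSurge and maxUnavailable
statefulset.spec.updateStrategy
  OnDelete: a pod is updated only when it is deleted
  RollingUpdate: supports partition, which can be used for canary rollouts
daemonset.spec.updateStrategy
  OnDelete
  RollingUpdate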
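The strategy fields above could be set as in the following sketch. The replica counts, the maxSurge/maxUnavailable values, the partition value, and the names `emp-db` and `emp` are assumptions chosen for illustration, not values from the talk.

```yaml
# Deployment: rolling update that never drops below the desired replica count
apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment
spec:
  replicas: 3                # assumed; with 1 replica a rollout has no spare capacity
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # at most one extra pod above the desired count
      maxUnavailable: 0      # an old pod is removed only after its replacement is Ready
  selector:
    matchLabels:
      app: login
  template:
    metadata:
      labels:
        app: login
    spec:
      containers:
      - name: login
        image: "busybox:1"
        command: ["sleep", "7200"]
---
# StatefulSet: partitioned rolling update used as a canary
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: emp-db               # hypothetical name
spec:
  serviceName: emp-db        # hypothetical headless Service
  replicas: 3
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2           # only pods with ordinal >= 2 receive the new revision
  selector:
    matchLabels:
      app: emp
  template:
    metadata:
      labels:
        app: emp
    spec:
      containers:
      - name: emp
        image: "busybox:1"
        command: ["sleep", "7200"]
```

Lowering `partition` step by step rolls the new revision out to the remaining StatefulSet pods once the canary looks healthy.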
https://www.youtube.com/watch?v=GQJP9QdHHs8
Controllers: deployment, daemonset, statefulset, job. Storage: ephemeral, persistent
https://www.youtube.com/watch?v=c7ytxiddImw
Pod eviction during
resource constraints
Node disk or mem
pressures
liveness
Liveness and Readiness Probes
readiness
pods/probe/exec-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
pods/probe/tcp-liveness-readiness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
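Besides exec and tcpSocket, probes can also use httpGet. The sketch below is an assumption-laden illustration: the pod name is hypothetical, and it relies on the stock nginx image answering on `/` at port 80.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: login-http-probe     # hypothetical name
spec:
  containers:
  - name: login
    image: nginx
    ports:
    - containerPort: 80
    readinessProbe:          # failing readiness removes the pod from Service endpoints
      httpGet:
        path: /              # assumed health path; nginx serves its default page here
        port: 80
      periodSeconds: 10
    livenessProbe:           # failing liveness makes the kubelet restart the container
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 15
      periodSeconds: 20
      failureThreshold: 3    # restart only after three consecutive failures
```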
https://www.youtube.com/watch?v=u7sbDPmezAo
node-selectors
Affinity and Anti Affinity
nodeAffinity podAffinity and podAntiAffinity
pods/pod-nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
pods/pod-with-node-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
pods/pod-with-pod-affinity.yaml
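apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        # topologyKey is required for pod (anti-)affinity terms; on clusters older
        # than 1.17 the zone label is failure-domain.beta.kubernetes.io/zone
        topologyKey: topology.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: topology.kubernetes.io/zone
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0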
Pod eviction during
resource constraints
Node disk or mem
pressures
Not all similar Pods
flock together
naive-dep-login
Affinity and Anti Affinity
self-relialized-dep-login
01-affinity-antiaffinity1-naive-dep-login.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment
  labels:
    app: login
spec:
  replicas: 1
  selector:
    matchLabels:
      app: login
  template:
    metadata:
      labels:
        app: login
    spec:
      containers:
      - name: login
        image: "busybox:1"
        command:
        - sleep
        - "7200"
01-affinity-antiaffinity2-self-relialized-dep-login.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment
  labels:
    app: login
spec:
  replicas: 1
  selector:
    matchLabels:
      app: login
  template:
    metadata:
      labels:
        app: login
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: color
                operator: In
                values:
                - blue
            weight: 1
      containers:
      - name: login
        image: "busybox:1"
        command:
        - sleep
        - "7200"
taint
Taints and Tolerations
Node affinity is a property of Pods that attracts them to a set of nodes (either as a preference or a
hard requirement). Taints are the opposite: they allow a node to repel a set of pods.
toleration
kubectl taint nodes node1 example-key=value:NoSchedule
pods/pod-with-toleration.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "example-key"   # must match the key of the taint placed on the node
    operator: "Exists"
    effect: "NoSchedule"
One or more taints are applied to
a node; this marks that the node
should not accept any pods that
do not tolerate the taints.
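A toleration can also match a taint's value exactly instead of only its key. The sketch below assumes a node tainted with `dedicated=batch:NoSchedule`; the taint key, its value, and the pod name are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker         # hypothetical name
spec:
  containers:
  - name: worker
    image: "busybox:1"
    command: ["sleep", "7200"]
  tolerations:
  - key: "dedicated"         # assumed taint key
    operator: "Equal"        # key, value and effect must all match the taint
    value: "batch"           # assumed taint value
    effect: "NoSchedule"
```

A toleration only permits scheduling onto the tainted node; it does not attract the pod there. Pairing it with a nodeSelector or node affinity is the usual way to pin a workload to dedicated nodes.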
Deleting a Pod! - #ClaGIFied
Actors: kube-apiserver, Service controllers or kube-proxy, kubelet on the node, container runtime (e.g., Docker), containers
1. kubectl delete pod login-abcdf-123adfc
2. Pod is set to 'Terminating' state.
3. Pod is removed from Endpoints. The pod is no longer considered a valid replica, so controllers will start panicking.
4. The pre-stop hook is triggered and executed.
5. SIGTERM is initiated: the SIGTERM signal is sent to each container (kill <process>).
6. After the grace period (30 secs), SIGKILL is initiated: the SIGKILL signal is sent to each container (kill -9 <process>).
7. The pod is removed from the API server; pods are garbage collected and cleaned up.
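The pre-stop hook and the grace period in this sequence are both set in the pod spec. A minimal sketch follows; the pod name, the nginx image, and the 5-second sleep are assumptions made for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: login-graceful       # hypothetical name
  labels:
    app: login
spec:
  # Time allowed between the delete request and SIGKILL; 30 seconds is the default.
  # The pre-stop hook runs inside this window, before SIGTERM is sent.
  terminationGracePeriodSeconds: 30
  containers:
  - name: login
    image: nginx
    lifecycle:
      preStop:
        exec:
          # Assumed delay that gives endpoint removal time to propagate
          # before the container starts shutting down.
          command: ["/bin/sh", "-c", "sleep 5"]
```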
Draining a Node! - #ClaGIFied
Actors: kubectl, kube-apiserver, Service controllers or kube-proxy, kubelet on the node, container runtime (e.g., Docker), containers
1. kubectl drain node1
2. For every node: cordon it (mark it unschedulable).
3. For every pod: is the PDB met? If not, retry.
4. Pod is set to 'Terminating' state.
5. Pod is removed from Endpoints and is no longer considered a valid replica.
6. The pre-stop hook is triggered and executed.
7. SIGTERM is initiated: the SIGTERM signal is sent to each container (kill <process>).
8. After the grace period (30 secs), SIGKILL is initiated: the SIGKILL signal is sent to each container (kill -9 <process>).
9. The pod is removed from the API server; pods are garbage collected and cleaned up.
POD DISRUPTION BUDGET
A PDB limits the number of pods of a replicated
application that are down simultaneously from
voluntary disruptions.
How Does that Work?
Your Deployment
Pod Disruption Budget
PDB
e001/pdb.yaml
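apiVersion: policy/v1   # policy/v1beta1 on clusters older than 1.21; v1beta1 was removed in 1.25
kind: PodDisruptionBudget
metadata:
  name: login-budget    # object names must be lowercase, so loginBudget is rejected
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: login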
e001/dep-login.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment
  labels:
    app: login
spec:
  replicas: 1
  selector:
    matchLabels:
      app: login
.
.
.
.
.
.
Admin calls kubectl drain
Programmatically using
Eviction API
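For the programmatic path, the request body is an Eviction object posted to the pod's `eviction` subresource. The sketch below reuses the pod name from the deletion slide and assumes the `default` namespace.

```yaml
# POST /api/v1/namespaces/default/pods/login-abcdf-123adfc/eviction
apiVersion: policy/v1        # policy/v1beta1 on clusters older than 1.22
kind: Eviction
metadata:
  name: login-abcdf-123adfc  # the pod to evict
  namespace: default         # assumed namespace
```

If the eviction would violate a PDB, the API server answers with 429 Too Many Requests and the caller is expected to retry later, which is the retry loop that kubectl drain runs.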
https://www.youtube.com/watch?v=pNbkZMEDevs
VOLUNTARY DISRUPTIONS
Disclaimer: Not all disruptions are protected by a PDB.
Some examples include:
1. Deleting a deployment directly
2. Deleting a pod directly
How to determine right value for my PDB?
• There is no single rule for this.
A few examples:
1. You are running a Consul cluster on K8S and want to maintain a quorum of at least 3 server components for fault tolerance. In this case you can set the PDB's minAvailable to 3.
2. You are running a StatefulSet for your database on K8S. Here you can use a PDB to block disruption of the DB until, for example, the responsible team has taken DB backups and confirmed that the disruption can proceed.
3. For a stateless microservice, you might need at least 1 replica running at all times and set the PDB accordingly, as in the demo earlier.
4. And the list goes on.
So the PDB setup can differ for every workload you run in your cluster.
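The quorum and stateless cases could be written as below. This is a sketch only: the names and the `app: consul` / `component: server` labels are hypothetical, and the first budget assumes 5 Consul servers, since minAvailable: 3 on a 3-server cluster would block every voluntary eviction.

```yaml
# Quorum-style budget: never go below 3 running servers
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: consul-server-pdb    # hypothetical name
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: consul            # hypothetical labels
      component: server
---
# Stateless service: allow one pod at a time to be disrupted
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: login-pdb            # hypothetical name
spec:
  maxUnavailable: 1          # minAvailable and maxUnavailable cannot be combined in one PDB
  selector:
    matchLabels:
      app: login
```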
https://www.youtube.com/watch?v=pNbkZMEDevs
https://github.com/mikkeloscar/pdb-controller/
The controller simply gets all Pod Disruption Budgets for each namespace and
compares them to Deployments and StatefulSets. For any resource with
more than 1 replica and no matching Pod Disruption Budget, a default PDB
will be created
Cool tip on PDB controller
resources.requests(limits).cpu
Resource Constraints and PriorityClass
PriorityClass – Non-Namespaced object
containers:
- name: login
  image: "busybox:1"
  resources:
    requests:
      memory: "64Mi"
      cpu: "250m"
    limits:
      memory: "128Mi"
      cpu: "500m"
  command:
  - sleep
  - "7200"
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for High Priority service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 5000
globalDefault: false
description: "This priority class should be used for Low Priority service pods only."
resources.requests(limits).memory
spec:
  priorityClassName: high-priority
Pod Spec with reference to PriorityClassName
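Put together, a pod that carries both a priority class and resource settings might look like this sketch. The pod name is hypothetical, and the requests are deliberately set equal to the limits, which is an assumption made here to obtain the Guaranteed QoS class.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: login-high-priority  # hypothetical name
spec:
  priorityClassName: high-priority   # the PriorityClass must already exist in the cluster
  containers:
  - name: login
    image: "busybox:1"
    command: ["sleep", "7200"]
    resources:
      requests:              # what the scheduler reserves on the node
        memory: "128Mi"
        cpu: "500m"
      limits:                # equal to requests, so the pod is classed as Guaranteed
        memory: "128Mi"
        cpu: "500m"
```

With requests lower than limits, as in the container snippet from the slide, the pod is classed as Burstable instead.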
Pod eviction during
resource constraints
failure-domain.beta.kubernetes.io/zone (< 1.17)
topology.kubernetes.io/zone (>= 1.17)
Topology Spread – Hosts/Zones/Regions
pods/pod-with-pod-affinity.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: topo-emp-deployment
  labels:
    app: emp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: emp
  template:
    metadata:
      labels:
        app: emp
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              # Without a labelSelector the term matches no pods, so select
              # this Deployment's own replicas to spread them apart
              labelSelector:
                matchLabels:
                  app: emp
              topologyKey: failure-domain.beta.kubernetes.io/zone
      containers:
      - name: with-pod-affinity
        image: k8s.gcr.io/pause:2.0
Hosts: topologyKey: kubernetes.io/hostname
Regions: topologyKey: failure-domain.beta.kubernetes.io/region (< 1.17), topology.kubernetes.io/region (>= 1.17)
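The slides spread replicas with pod anti-affinity. A different mechanism, not shown in the talk, is the `topologySpreadConstraints` field, stable since Kubernetes 1.19. The sketch below swaps it in for the same Deployment; the replica count, container name, and maxSkew values are assumptions.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: topo-emp-deployment
spec:
  replicas: 4                # assumed
  selector:
    matchLabels:
      app: emp
  template:
    metadata:
      labels:
        app: emp
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                               # zones may differ by at most one emp pod
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway        # soft rule, like "preferred" anti-affinity
        labelSelector:
          matchLabels:
            app: emp
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule         # hard rule: keep hosts balanced or stay Pending
        labelSelector:
          matchLabels:
            app: emp
      containers:
      - name: emp
        image: k8s.gcr.io/pause:2.0
```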
Not all similar Pods flock together
Pods stay highly available during a zonal or regional failure
https://cloud.google.com/compute/docs/regions-zones
https://docs.microsoft.com/en-us/azure/availability-zones/az-overview
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-availability-zones
AZ’s are physically separated by a meaningful distance, many kilometers, from any other AZ, although all are within 100 km (60 miles) of each other.
Disruptions:
• Cluster admin deletes a pod by mistake
• A hardware failure of the physical machine or Virtual Machine
• Cluster admin deletes a node by mistake
• Pod gets evicted from node due to resource constraints
• Draining a node for repair or upgrade or to scale down
• Application Upgrade
Solutions:
• Choose Right Controller
• Pod Replicas
• Application Upgrade Strategy
• PDB
• Affinity and Anti Affinity/Taints and Tolerations
• Taints and Tolerations
• Topology Spread – Hosts/Zones/Regions
• Resource Constraints and PriorityClass
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 

Kürzlich hochgeladen (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Production Grade Kubernetes Applications

  • 1. Chennai, India 755 02 01 268 Narayanan.kmu@gmail.com bribebybytes.github.io/landing-page NARAYANAN KRISHNAMURTHY Technical Architect, ADP India Cloud Architect with 15 Years in IT CLOUD DEVOPS TELECOM Skills Languages English Tamil Hindi Being Social Bribe By Bytes Hobby 1
  • 3. Have some Bad News & Good News!
  • 4. Just because you containerized, Kubernetized and Cloudified your application doesn’t mean it’s Reliable, Scalable and Secured automatically
  • 5. Bad News First!
      • Your Hardware will fail
      • Your Enterprise grade application will fail
      • Your Cloud will fail
      • Your Kubernetes cluster will fail
      Embrace it!
  • 6. QoS by Layer. Layers: Application Layer, Cluster Layer, Infrastructure Layer. Qualities: Reliable, Scalable, Available, Secured, Performance
  • 9. INVOLUNTARY DISRUPTIONS: Cluster admin deletes a pod by mistake ☹ (Diagram: Master; Worker 1 and Worker 2 running login and Emp pods)
  • 10. INVOLUNTARY DISRUPTIONS: A hardware failure of the physical machine or VM (Diagram: Master; Worker 1 and Worker 2 running login and Emp pods)
  • 11. INVOLUNTARY DISRUPTIONS: Cluster admin deletes a node by mistake ☹ (Diagram: Master; Worker 1 and Worker 2 running login and Emp pods)
  • 12. INVOLUNTARY DISRUPTIONS: Pod gets evicted from node due to resource constraints (Diagram: Master; Worker 1 and Worker 2 running login and Emp pods while New Pods keep arriving)
  • 13. VOLUNTARY DISRUPTIONS (Diagram: Master; Worker 1 and Worker 2 running login and Emp pods)
  • 14. VOLUNTARY DISRUPTIONS: Draining a node for repair or upgrade or to scale down (Diagram: Master; Worker 1 and Worker 2 running login and Emp pods)
  • 15. VOLUNTARY DISRUPTIONS: Draining a node for repair or upgrade or to scale down (Diagram: Master; Worker 1 and Worker 2 running login and Emp pods)
  • 16. VOLUNTARY DISRUPTIONS: Draining a node for repair or upgrade or to scale down. Pending queue! (Diagram: Master; Worker 1 and Worker 2 running login and Emp pods)
  • 17. Disruptions covered:
      • Cluster admin deletes a pod by mistake
      • A hardware failure of the physical machine or Virtual Machine
      • Cluster admin deletes a node by mistake
      • Pod gets evicted from node due to resource constraints
      • Draining a node for repair or upgrade or to scale down
      • Application Upgrade
  • 20. Choose Right Controller/Storage Req, Pod Replicas, Application Upgrade Strategy (addresses: Pod eviction during resource constraints; Node disk or mem pressures)
      Choose Right Controller/Storage Req:
        Controllers: deployment, daemonset, statefulset, job
        Storage: ephemeral, persistent
      Pod Replicas:
        deployment.spec.replicas, for example spec: replicas: 1
      Application Upgrade Strategy:
        deployment.spec.strategy: Recreate (deletes all) | RollingUpdate (one pod upgrade at a time)
        statefulset.spec.updateStrategy: OnDelete (only on delete) | RollingUpdate (Partition for canary)
        daemonset.spec.updateStrategy: OnDelete | RollingUpdate
      Videos:
        https://www.youtube.com/watch?v=c7ytxiddImw
        https://www.youtube.com/watch?v=GQJP9QdHHs8
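As a rough illustration of how deployment.spec.replicas and deployment.spec.strategy fit together, the sketch below runs several replicas and rolls them one at a time. The login-deployment name and busybox image are borrowed from the demo manifests on later slides; the replica count and the maxSurge/maxUnavailable values are assumptions picked for illustration, not values from the talk.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment
  labels:
    app: login
spec:
  replicas: 3                # assumption: more than one replica so an upgrade never empties the service
  strategy:
    type: RollingUpdate      # the alternative is Recreate, which deletes all Pods first
    rollingUpdate:
      maxSurge: 1            # assumption: allow one extra Pod during the rollout
      maxUnavailable: 0      # assumption: never drop below the desired replica count
  selector:
    matchLabels:
      app: login
  template:
    metadata:
      labels:
        app: login
    spec:
      containers:
      - name: login
        image: "busybox:1"
        command:
        - sleep
        - "7200"
```

With these values a new Pod is started before an old one is removed, so capacity stays at three throughout the upgrade, at the cost of briefly needing room for a fourth Pod.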
  • 21. Liveness and Readiness Probes
      liveness: pods/probe/exec-liveness.yaml
      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          test: liveness
        name: liveness-exec
      spec:
        containers:
        - name: liveness
          image: k8s.gcr.io/busybox
          args:
          - /bin/sh
          - -c
          - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
          livenessProbe:
            exec:
              command:
              - cat
              - /tmp/healthy
            initialDelaySeconds: 5
            periodSeconds: 5
      readiness: pods/probe/tcp-liveness-readiness.yaml
      apiVersion: v1
      kind: Pod
      metadata:
        name: goproxy
        labels:
          app: goproxy
      spec:
        containers:
        - name: goproxy
          image: k8s.gcr.io/goproxy:0.1
          ports:
          - containerPort: 8080
          readinessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
      https://www.youtube.com/watch?v=u7sbDPmezAo
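The slide shows exec and tcpSocket probes; for an HTTP service the same idea is usually expressed with httpGet. The sketch below is an assumption-laden example, not part of the talk: the Pod name is hypothetical, nginx stands in for a real service, and the path, port and timing values are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: login-http-probes    # hypothetical name
  labels:
    app: login
spec:
  containers:
  - name: login
    image: nginx             # stand-in image that answers HTTP on port 80
    ports:
    - containerPort: 80
    readinessProbe:          # failing: Pod is taken out of Service endpoints
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:           # failing: container is restarted
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 15
      periodSeconds: 20
      failureThreshold: 3    # assumption: restart only after three consecutive failures
```

A real service would typically expose a dedicated health endpoint rather than probing `/`.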
  • 22. Affinity and Anti Affinity (addresses: Pod eviction during resource constraints; Node disk or mem pressures; Not all similar Pods flock together)
      node-selectors: pods/pod-nginx.yaml
      apiVersion: v1
      kind: Pod
      metadata:
        name: nginx
        labels:
          env: test
      spec:
        containers:
        - name: nginx
          image: nginx
          imagePullPolicy: IfNotPresent
        nodeSelector:
          disktype: ssd
      nodeAffinity: pods/pod-with-node-affinity.yaml
      apiVersion: v1
      kind: Pod
      metadata:
        name: with-node-affinity
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes.io/e2e-az-name
                  operator: In
                  values:
                  - e2e-az1
                  - e2e-az2
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                - key: another-node-label-key
                  operator: In
                  values:
                  - another-node-label-value
        containers:
        - name: with-node-affinity
          image: k8s.gcr.io/pause:2.0
      podAffinity and podAntiAffinity: pods/pod-with-pod-affinity.yaml
      apiVersion: v1
      kind: Pod
      metadata:
        name: with-pod-affinity
      spec:
        affinity:
          podAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: security
                  operator: In
                  values:
                  - S1
              # topologyKey is a required field that the slide omitted; the zone key is an assumed choice
              topologyKey: topology.kubernetes.io/zone
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: security
                    operator: In
                    values:
                    - S2
                # topologyKey is a required field that the slide omitted; the zone key is an assumed choice
                topologyKey: topology.kubernetes.io/zone
        containers:
        - name: with-pod-affinity
          image: k8s.gcr.io/pause:2.0
  • 23. Affinity and Anti Affinity
      naive-dep-login: 01-affinity-antiaffinity1-naive-dep-login.yaml
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: login-deployment
        labels:
          app: login
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: login
        template:
          metadata:
            labels:
              app: login
          spec:
            containers:
            - name: login
              image: "busybox:1"
              command:
              - sleep
              - "7200"
      self-relialized-dep-login: 01-affinity-antiaffinity2-self-relialized-dep-login.yaml
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: login-deployment
        labels:
          app: login
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: login
        template:
          metadata:
            labels:
              app: login
          spec:
            affinity:
              nodeAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                - preference:
                    matchExpressions:
                    - key: color
                      operator: In
                      values:
                      - blue
                  weight: 1
            containers:
            - name: login
              image: "busybox:1"
              command:
              - sleep
              - "7200"
  • 24. Taints and Tolerations
      Node affinity is a property of Pods that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite: they allow a node to repel a set of pods.
      One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.
      taint:
      kubectl taint nodes node1 key=value:NoSchedule
      toleration: pods/pod-with-toleration.yaml
      apiVersion: v1
      kind: Pod
      metadata:
        name: nginx
        labels:
          env: test
      spec:
        containers:
        - name: nginx
          image: nginx
          imagePullPolicy: IfNotPresent
        tolerations:
        - key: "example-key"
          operator: "Exists"
          effect: "NoSchedule"
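Note that the taint command on the slide uses the key `key` while the sample toleration uses `example-key`, so that Pod would not tolerate that particular taint. A minimal sketch of a toleration that does match `kubectl taint nodes node1 key=value:NoSchedule` could look like this (the Pod name is hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-tolerating-node1   # hypothetical name
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "key"              # must equal the taint key
    operator: "Equal"       # Equal compares the value too; Exists would ignore it
    value: "value"          # must equal the taint value
    effect: "NoSchedule"    # must equal the taint effect
```

A toleration only permits scheduling onto the tainted node; it does not attract the Pod there. Pairing it with node affinity is the usual way to pin a workload to dedicated nodes.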
  • 25. Deleting a Pod! - #ClaGIFied
      Actors: kube-api-server; service controllers or kube-proxy; kubelet in Node; Container Runtime (e.g., Docker); Containers
      Trigger: kubectl delete pod login-abcdf-123adfc
      1. Pod set to ‘Terminating’ state
      2. Pod removed from Endpoints; Pod no longer considered a valid replica (controllers will start panicking)
      3. pre-stop hook triggered and executed
      4. kubelet initiates SIGTERM; SIGTERM signal is sent to each container (kill <process>)
      5. 30 secs grace period
      6. kubelet initiates SIGKILL; SIGKILL signal is sent to each container (kill -9 <process>)
      7. Pod removed from API server; containers removed and cleaned up; Pods garbage collected
  • 26. Draining a Node! - #ClaGIFied
      Actors: kubectl; kube-api-server; service controllers or kube-proxy; kubelet in Node; Container Runtime (e.g., Docker); Containers
      Trigger: kubectl drain node1
      For every node: cordon it (mark unschedulable). For every Pod: is the PDB met? If not, retry; if so, evict the Pod as follows:
      1. Pod set to ‘Terminating’ state
      2. Pod removed from Endpoints; Pod no longer considered a valid replica
      3. pre-stop hook triggered and executed
      4. kubelet initiates SIGTERM; SIGTERM signal is sent to each container (kill <process>)
      5. 30 secs grace period
      6. kubelet initiates SIGKILL; SIGKILL signal is sent to each container (kill -9 <process>)
      7. Pod removed from API server; containers removed and cleaned up; Pods garbage collected
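The pre-stop hook and the 30-second grace period in the sequence above are both set in the Pod spec. The following sketch shows where they live; the Pod name and the 5-second sleep are assumptions for illustration, and the container is the busybox stand-in used in the demo manifests.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: login-graceful             # hypothetical name
  labels:
    app: login
spec:
  terminationGracePeriodSeconds: 30   # the default; SIGKILL follows once this budget runs out
  containers:
  - name: login
    image: "busybox:1"
    command:
    - sleep
    - "7200"
    lifecycle:
      preStop:                     # runs before SIGTERM is sent to the container
        exec:
          command:
          - /bin/sh
          - -c
          - sleep 5                # assumption: brief pause so endpoint removal can propagate
```

The time spent in the preStop hook counts against the grace period, so a long hook leaves less time for the application to react to SIGTERM.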
  • 28. POD DISRUPTION BUDGET A PDB limits the number of pods of a replicated application that are down simultaneously from voluntary disruptions.
  • 30. Pod Disruption Budget (PDB) and Your Deployment
      A PDB is honored when: Admin calls kubectl drain; Programmatically using Eviction API
      PDB: e001/pdb.yaml
      # The slide used policy/v1beta1, which was removed in Kubernetes 1.25; policy/v1 replaces it
      apiVersion: policy/v1
      kind: PodDisruptionBudget
      metadata:
        # The slide used loginBudget; object names must be lowercase, so it is written as login-budget here
        name: login-budget
      spec:
        minAvailable: 1
        selector:
          matchLabels:
            app: login
      Your Deployment: e001/dep-login.yaml
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: login-deployment
        labels:
          app: login
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: login
      . . . . . .
      https://www.youtube.com/watch?v=pNbkZMEDevs
  • 31. VOLUNTARY DISRUPTIONS Disclaimer: Not all disruptions are protected by a PDB. Some examples include:
      1. Deleting a deployment directly
      2. Deleting a pod directly
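With replicas: 1 and minAvailable: 1, as in the demo manifests, evicting the only Pod would always violate the budget, so a drain keeps retrying. A minimal sketch of a pairing that lets a drain proceed one Pod at a time is shown below; the replica count of 2 and the lowercase PDB name are assumptions, and the apiVersion is policy/v1 rather than the older beta version.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment
  labels:
    app: login
spec:
  replicas: 2                # assumption: one more replica than the budget requires
  selector:
    matchLabels:
      app: login
  template:
    metadata:
      labels:
        app: login
    spec:
      containers:
      - name: login
        image: "busybox:1"
        command:
        - sleep
        - "7200"
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: login-budget         # hypothetical lowercase name
spec:
  minAvailable: 1            # at most one of the two Pods may be evicted at a time
  selector:
    matchLabels:
      app: login
```

For this to help during a drain, the two replicas should not sit on the same node, which is where the anti-affinity and topology settings from the other slides come in.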
  • 32. How to determine right value for my PDB?
  • 33. How to determine right value for my PDB?
      There is no single rule for this. A few examples:
      1. You are running a Consul cluster on K8S and want to maintain a quorum of at least 3 server components for fault tolerance. In this case, set the PDB’s minAvailable to 3.
      2. You are running a StatefulSet for your database on K8S. Here you can use a PDB to block disruption of that DB, for example until the responsible team has taken DB backups and confirmed that the disruption can go ahead.
      3. For a stateless microservice, you might require at least 1 replica running at all times and set the PDB accordingly, as in the demo earlier.
      4. And the list goes on.
      So the PDB setup can differ for every workload running in your cluster.
      https://www.youtube.com/watch?v=pNbkZMEDevs
  • 34. Cool tip on PDB controller: https://github.com/mikkeloscar/pdb-controller/
      The controller simply gets all Pod Disruption Budgets for each namespace and compares them to Deployments and StatefulSets. For any resource with more than 1 replica and no matching Pod Disruption Budget, a default PDB will be created.
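The Consul example from the list could be written roughly as follows. The PDB name and the `app`/`component` labels are hypothetical; they need to match whatever labels the Consul server Pods actually carry.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: consul-server-budget   # hypothetical name
spec:
  minAvailable: 3              # keep at least 3 servers up, per the quorum requirement in the example
  selector:
    matchLabels:
      app: consul              # assumption: label on the Consul server Pods
      component: server        # assumption: label on the Consul server Pods
```

A budget like this only leaves room for voluntary evictions when more than 3 servers are running; with exactly 3, every drain of a node hosting a server will wait.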
  • 35. Resource Constraints and PriorityClass (addresses: Pod eviction during resource constraints)
      resources.requests(limits).cpu and resources.requests(limits).memory:
      containers:
      - name: login
        image: "busybox:1"
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        command:
        - sleep
        - "7200"
      PriorityClass – Non-Namespaced object:
      apiVersion: scheduling.k8s.io/v1
      kind: PriorityClass
      metadata:
        name: high-priority
      value: 1000000
      globalDefault: false
      description: "This priority class should be used for High Priority service pods only."
      ---
      apiVersion: scheduling.k8s.io/v1
      kind: PriorityClass
      metadata:
        name: low-priority
      value: 5000
      globalDefault: false
      description: "This priority class should be used for Low Priority service pods only."
      Pod Spec with reference to PriorityClassName:
      spec:
        priorityClassName: high-priority
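Putting the fragments from this slide together, a complete manifest might look like the sketch below. It assumes the high-priority PriorityClass from the slide already exists in the cluster; the Deployment name and replica count are illustrative.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment
  labels:
    app: login
spec:
  replicas: 2                            # assumption: illustrative replica count
  selector:
    matchLabels:
      app: login
  template:
    metadata:
      labels:
        app: login
    spec:
      priorityClassName: high-priority   # must reference an existing PriorityClass
      containers:
      - name: login
        image: "busybox:1"
        resources:
          requests:                      # what the scheduler reserves on the node
            memory: "64Mi"
            cpu: "250m"
          limits:                        # hard ceiling enforced at runtime
            memory: "128Mi"
            cpu: "500m"
        command:
        - sleep
        - "7200"
```

Because priorityClassName sits in the Pod template, every replica inherits it; a Pod that references a PriorityClass that does not exist is rejected at creation time.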
  • 36. Topology Spread – Hosts/Zones/Regions (addresses: Not all similar Pods flock together; Pods are HA during zonal or Region Failure)
      topologyKey options:
        Host: kubernetes.io/hostname
        Zone: failure-domain.beta.kubernetes.io/zone (< 1.17), topology.kubernetes.io/zone (>= 1.17)
        Region: failure-domain.beta.kubernetes.io/region (< 1.17), topology.kubernetes.io/region (>= 1.17)
      pods/pod-with-pod-affinity.yaml
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: topo-emp-deployment
        labels:
          app: emp
      spec:
        replicas: 2
        selector:
          matchLabels:
            app: emp
        template:
          metadata:
            labels:
              app: emp
          spec:
            affinity:
              podAntiAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    # labelSelector was missing on the slide; without it the term matches no Pods
                    labelSelector:
                      matchLabels:
                        app: emp
                    topologyKey: failure-domain.beta.kubernetes.io/zone
            containers:
            - name: with-pod-affinity
              image: k8s.gcr.io/pause:2.0
      Cloud provider references:
        https://cloud.google.com/compute/docs/regions-zones
        https://docs.microsoft.com/en-us/azure/availability-zones/az-overview
        https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-availability-zones
      AZs are physically separated by a meaningful distance, many kilometers, from any other AZ, although all are within 100 km (60 miles) of each other.
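The slide spreads Pods with podAntiAffinity. Newer clusters also offer topologySpreadConstraints for the same goal, which is a different mechanism from the one shown in the talk. A hedged sketch, reusing the emp Deployment from the slide with an assumed maxSkew of 1 and the zone key for clusters at 1.17 or later:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: topo-emp-deployment
  labels:
    app: emp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: emp
  template:
    metadata:
      labels:
        app: emp
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                                # assumption: zones may differ by at most one emp Pod
        topologyKey: topology.kubernetes.io/zone  # swap for kubernetes.io/hostname to spread across hosts
        whenUnsatisfiable: ScheduleAnyway         # soft rule; DoNotSchedule would make it a hard rule
        labelSelector:
          matchLabels:
            app: emp
      containers:
      - name: with-pod-affinity
        image: k8s.gcr.io/pause:2.0
```

ScheduleAnyway mirrors the "preferred" anti-affinity on the slide: the scheduler tries to balance zones but still places the Pod if it cannot.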
  • 37. Disruptions:
      • Cluster admin deletes a pod by mistake
      • A hardware failure of the physical machine or Virtual Machine
      • Cluster admin deletes a node by mistake
      • Pod gets evicted from node due to resource constraints
      • Draining a node for repair or upgrade or to scale down
      • Application Upgrade
      What helps:
      • Choose Right Controller
      • Pod Replicas
      • Application Upgrade Strategy
      • PDB
      • Affinity and Anti Affinity
      • Taints and Tolerations
      • Topology Spread – Hosts/Zones/Regions
      • Resource Constraints and PriorityClass
  • 38. QoS by Layer. Layers: Application Layer, Cluster Layer, Infrastructure Layer. Qualities: Reliable, Scalable, Available, Secured, Performance