Crashing Pods – How to compensate for such an outage?
Michael Hofmann
Hofmann IT-Consulting
info@hofmann-itconsulting.de
https://hofmann-itconsulting.de

Crashing pods?
● Controlled (error state): rolling update
● Application deadlock
  – Thread pool full
  – Thread deadlock situation (detection in JVM)
● Memory leak (out of memory)
● Bug in application or application server

Mitigation/Compensation Strategies
● Quick recognition of error state for recovery
● Short time for eventual consistency
● Controlled error state (e.g. rolling update)
● Intelligent routing (outlier detection)
● Classic resilience

Kubernetes Architecture
Source: https://kubernetes.io/docs/concepts/overview/components/

Liveness and readiness probes

    spec:
      containers:
        - name: crashing-pod
          image: hofmann/crashing-pod:latest
          imagePullPolicy: Never
          ports:
            - containerPort: 9080
          livenessProbe:
            httpGet:
              path: /health/live
              port: 9080
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 1
            failureThreshold: 5
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 9080
            initialDelaySeconds: 15
            periodSeconds: 5
            timeoutSeconds: 1
            failureThreshold: 5

● Difference between liveness and readiness probe
  – Liveness probe fails: kubelet restarts the container
  – Readiness probe fails: the pod is removed from the service's endpoints, no restart
● Only pods with a successful readiness probe are assigned to a service (their IP is added to the endpoints)
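
How long the probe configuration above needs to react to a hanging application can be estimated from its timing fields. The following Python sketch is an approximation under stated assumptions: the app hangs (so every failed probe runs into `timeoutSeconds`), and kubelet jitter and `initialDelaySeconds` are ignored. `Probe` and `detection_window` are hypothetical helpers, not a Kubernetes API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """The timing fields of a Kubernetes probe (names mirror the YAML)."""
    period_seconds: int
    timeout_seconds: int
    failure_threshold: int


def detection_window(probe: Probe) -> tuple[int, int]:
    """Approximate (best, worst) seconds from the moment the app starts hanging
    until kubelet has counted `failure_threshold` consecutive failures.

    Assumes every failed probe runs into its timeout (hanging app) and ignores
    kubelet jitter and initialDelaySeconds.
    """
    # Best case: the app hangs just before a probe fires.
    best = (probe.failure_threshold - 1) * probe.period_seconds + probe.timeout_seconds
    # Worst case: the app hangs just after a successful probe,
    # so one extra period passes before the first failing probe starts.
    worst = probe.failure_threshold * probe.period_seconds + probe.timeout_seconds
    return best, worst


# Values from the livenessProbe/readinessProbe above.
probe = Probe(period_seconds=5, timeout_seconds=1, failure_threshold=5)
print(detection_window(probe))  # (21, 26)
```

With these values a deadlocked pod keeps receiving traffic for roughly 21 to 26 seconds before kubelet marks it unready (and, via the liveness probe, restarts it); propagating the endpoint change to the rest of the cluster comes on top of that. Lowering `periodSeconds` or `failureThreshold` shortens the window at the cost of more false positives under load.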

From service to pod
● Service
  – Called by client: <svc-name>.<ns>.svc.cluster.local
  – Basis for DNS naming
  – References pods by labels
● Pods
  – Assigned IPs
● Endpoints
  – Connect the service to the pod instances (IPs)
  – Stored in etcd: IP and port
  – Endpoint refresh: pod created, pod deleted, pod label modified, pod readiness changed
  – Basis for: kube-proxy, ingress controller, CoreDNS, cloud provider, service mesh
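
To make the Service/Endpoints relationship concrete, here is a small Python model of what the endpoints controller computes: the pods matched by the Service's label selector, filtered to those whose readiness probe succeeds. It is a simplification (equality-based selectors only, one port, no list of not-ready addresses); `Pod`, `matches` and `endpoints` are illustrative names, not Kubernetes client APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict[str, str]
    ready: bool  # current result of the readiness probe


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based selector: every key/value pair must be present on the pod."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector: dict[str, str], pods: list[Pod], port: int) -> list[tuple[str, int]]:
    """(IP, port) pairs a Service routes to: selected AND ready pods only."""
    return [(p.ip, port) for p in pods if matches(selector, p.labels) and p.ready]


pods = [
    Pod("crashing-pod-a", "10.1.0.11", {"app": "crashing-pod"}, ready=True),
    Pod("crashing-pod-b", "10.1.0.12", {"app": "crashing-pod"}, ready=False),
    Pod("other-pod", "10.1.0.13", {"app": "other"}, ready=True),
]
selector = {"app": "crashing-pod"}
print(endpoints(selector, pods, 9080))  # [('10.1.0.11', 9080)]

# Pod b now passes its readiness probe: the endpoint list has to be recomputed.
pods[1].ready = True
print(endpoints(selector, pods, 9080))  # [('10.1.0.11', 9080), ('10.1.0.12', 9080)]
```

Every pod change (created, deleted, relabelled, readiness flipped) means recomputing this list and pushing it to every consumer (kube-proxy, CoreDNS, ingress controller, service mesh); the time this takes is where outdated endpoints come from.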

Workflow

Endpoint outdated
● Kubelet:
  – Readiness probes
  – Housekeeping interval to update endpoint
● Kube-proxy (iptables settings)
● Kubernetes DNS (CoreDNS)
● Caching of DNS values in client
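
The last item, client-side DNS caching, can outlive all the others. A minimal Python sketch of a TTL cache in front of a resolver shows the effect: until the entry expires, the client keeps using the old answer even though the cluster already knows better. This mainly matters where DNS returns pod IPs, e.g. for headless Services; a regular ClusterIP Service resolves to a stable virtual IP, and the staleness then sits in kube-proxy's rules instead. On the JVM this cache is controlled by the `networkaddress.cache.ttl` security property. `CachingResolver` and the fake lookup are hypothetical; the injected clock only makes the example deterministic.

```python
from __future__ import annotations

import time
from typing import Callable


class CachingResolver:
    """Client-side DNS cache: answers are reused for `ttl` seconds."""

    def __init__(self, lookup: Callable[[str], list[str]], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup          # the real DNS query (e.g. to CoreDNS)
        self._ttl = ttl
        self._clock = clock
        self._cache: dict[str, tuple[float, list[str]]] = {}

    def resolve(self, name: str) -> list[str]:
        now = self._clock()
        entry = self._cache.get(name)
        if entry is not None and now < entry[0]:
            return entry[1]            # cached answer, possibly stale
        addresses = self._lookup(name)
        self._cache[name] = (now + self._ttl, addresses)
        return addresses


# Fake DNS and clock for the example.
dns = {"crashing-pod.default.svc.cluster.local": ["10.1.0.11", "10.1.0.12"]}
now = 0.0
resolver = CachingResolver(lambda name: list(dns[name]), ttl=30, clock=lambda: now)

name = "crashing-pod.default.svc.cluster.local"
print(resolver.resolve(name))   # ['10.1.0.11', '10.1.0.12']
dns[name] = ["10.1.0.12"]       # pod .11 crashed, DNS is already updated
now = 10.0
print(resolver.resolve(name))   # ['10.1.0.11', '10.1.0.12'] (stale)
now = 31.0
print(resolver.resolve(name))   # ['10.1.0.12']
```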

Rolling update
● Update running pods
  – Defined by the rolling update strategy
● Influenced by
  – Liveness and readiness probes
  – preStop lifecycle hook
● Distributed infrastructure can react to the error state (update components)
  – SIGTERM (not SIGKILL)
● Shutdown hook in application server (finish open requests)
● Target: zero-downtime deployment (should...)
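
The shutdown hook boils down to: on SIGTERM, stop accepting new requests and let in-flight ones finish before the process exits. Below is a minimal, framework-free Python sketch; application servers implement this themselves, and `GracefulShutdown` with its method names is purely illustrative. For context: kubelet runs the preStop hook first, then sends SIGTERM, and only sends SIGKILL (which cannot be caught) once the termination grace period has expired.

```python
from __future__ import annotations

import signal
import threading


class GracefulShutdown:
    """Stop taking new requests on SIGTERM, then wait for in-flight ones."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._accepting = True
        self._in_flight = 0

    def install(self) -> None:
        # Must be called from the main thread; SIGKILL cannot be handled at all.
        signal.signal(signal.SIGTERM, self.on_sigterm)

    def on_sigterm(self, signum: int, frame: object) -> None:
        # A plain attribute write, so it is safe inside a signal handler.
        self._accepting = False

    def begin_request(self) -> bool:
        """Return False if the request must be rejected (pod is terminating)."""
        with self._cond:
            if not self._accepting:
                return False
            self._in_flight += 1
            return True

    def end_request(self) -> None:
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout: float) -> bool:
        """Wait until no request is in flight; False if `timeout` ran out first."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

In a server, `install()` would be called at startup, each handler wrapped in `begin_request()`/`end_request()`, and the main loop would call `drain()` with a timeout below the remaining grace period before exiting.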
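
Rolling update & preStop Hook

    # Container
    readinessProbe:
      ...
    lifecycle:
      preStop:
        exec:
          # runs before SIGTERM is sent; its duration counts against
          # terminationGracePeriodSeconds (default: 30s)
          command: ["/bin/bash", "-c", "sleep 30"]

    # Deployment
    strategy:              # RollingUpdate is the default strategy type of k8s
      type: RollingUpdate
      rollingUpdate:
        maxSurge: 1        # max. 1 over-provisioned pod (default: 25%)
        maxUnavailable: 0  # no unavailable pod during update (default: 25%)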
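
The strategy fields translate into simple bounds on the pod count during a rollout. The Python sketch below resolves `maxSurge` and `maxUnavailable` as the Kubernetes Deployment documentation describes (percentages: surge rounds up, unavailable rounds down; default 25% each) and adds a check that the preStop sleep plus the application's own shutdown fits into `terminationGracePeriodSeconds` (default 30s). The function names and the 5-second shutdown duration are assumptions for illustration.

```python
from __future__ import annotations


def _absolute(value: int | str, replicas: int, round_up: bool) -> int:
    """Turn 1 or "25%" into an absolute pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceil/floor division avoids float rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas: int, max_surge: int | str = "25%",
                   max_unavailable: int | str = "25%") -> tuple[int, int]:
    """(max pods in total, min available pods) during a rolling update."""
    surge = _absolute(max_surge, replicas, round_up=True)
    unavailable = _absolute(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # Both resolved to 0: allow one unavailable pod so the rollout can progress.
        unavailable = 1
    return replicas + surge, replicas - unavailable


def fits_grace_period(pre_stop_s: float, app_shutdown_s: float,
                      grace_period_s: float = 30) -> bool:
    """preStop runs first, then SIGTERM; both must finish before SIGKILL."""
    return pre_stop_s + app_shutdown_s <= grace_period_s


print(rollout_bounds(3, max_surge=1, max_unavailable=0))  # (4, 3): slide values
print(rollout_bounds(10))                                 # (13, 8): k8s defaults
print(fits_grace_period(30, 5))                           # False: no time left after sleep 30
print(fits_grace_period(30, 5, grace_period_s=60))        # True
```

With `maxSurge: 1` and `maxUnavailable: 0`, capacity never drops below the desired replica count; the price is one extra pod's worth of resources during the rollout. The last two lines show why a `sleep 30` preStop hook usually goes together with a larger `terminationGracePeriodSeconds`.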

Intelligent Routing
● Server-side load balancing
  – Endpoint handling done by the infrastructure (K8S)
  – Requests are still routed to a faulty instance until the platform evicts it
● Client-side load balancing
  – Client must know all endpoints: dependency on infrastructure (service registry)
  – Can react to a faulty request
● Outlier detection (in addition to client-side LB)
  – Faulty instance (HTTP >= 500) is evicted for a period of time
  – Reacts faster than the distributed infrastructure
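
Client-side load balancing with outlier detection fits in a few lines. The Python sketch below does round robin over the known endpoints, counts consecutive 5xx responses per endpoint and ejects an endpoint once the limit is reached; like Envoy (and therefore Istio), it lengthens the ejection each time the same endpoint is ejected again. Class and parameter names are hypothetical, and real proxies also cap how many endpoints may be ejected at once, which this sketch omits.

```python
from __future__ import annotations

import itertools
import time
from typing import Callable


class OutlierAwareBalancer:
    """Round robin that skips endpoints ejected after consecutive 5xx errors."""

    def __init__(self, endpoints: list[str], max_errors: int = 1,
                 base_ejection_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._endpoints = list(endpoints)
        self._ring = itertools.cycle(self._endpoints)
        self._max_errors = max_errors
        self._base_ejection_s = base_ejection_s
        self._clock = clock
        self._errors = dict.fromkeys(self._endpoints, 0)          # consecutive errors
        self._ejections = dict.fromkeys(self._endpoints, 0)       # times ejected so far
        self._ejected_until = dict.fromkeys(self._endpoints, 0.0)

    def pick(self) -> str:
        """Next endpoint in round-robin order that is not currently ejected."""
        now = self._clock()
        for _ in range(len(self._endpoints)):
            endpoint = next(self._ring)
            if self._ejected_until[endpoint] <= now:
                return endpoint
        raise RuntimeError("all endpoints are ejected")

    def report(self, endpoint: str, status: int) -> None:
        """Feed back the HTTP status of a call made to `endpoint`."""
        if status < 500:
            self._errors[endpoint] = 0                  # success resets the streak
            return
        self._errors[endpoint] += 1
        if self._errors[endpoint] >= self._max_errors:
            self._ejections[endpoint] += 1
            self._errors[endpoint] = 0
            # Each further ejection of the same endpoint lasts longer.
            duration = self._base_ejection_s * self._ejections[endpoint]
            self._ejected_until[endpoint] = self._clock() + duration


now = 0.0
lb = OutlierAwareBalancer(["pod-a", "pod-b", "pod-c"], clock=lambda: now)
print([lb.pick() for _ in range(3)])   # ['pod-a', 'pod-b', 'pod-c']
lb.report("pod-b", 503)                # pod-b is ejected for 30 s
print([lb.pick() for _ in range(4)])   # ['pod-a', 'pod-c', 'pod-a', 'pod-c']
now = 31.0
print([lb.pick() for _ in range(3)])   # ['pod-a', 'pod-b', 'pod-c']
```

The key difference to server-side load balancing is visible in `report()`: the very next request already avoids the faulty pod, without waiting for probes and endpoint propagation.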

Resilience
● Frameworks
  – Server-side load balancing
    ● Retry storm on faulty pods
  – Spring Cloud LoadBalancer (client-side LB)
    ● Since 2020
    ● Generic abstraction, replacement for Netflix Ribbon
    ● Kubernetes and Cloud Foundry service registry
● Service Mesh
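
A retry storm arises when many clients retry immediately against an instance that is already struggling, multiplying its load. A common countermeasure, sketched here in Python rather than taken from any of the frameworks named above, is a small, bounded number of attempts with capped exponential backoff and random jitter. `retry` and its parameters are illustrative; `sleep` and `rng` are injectable only to keep the example testable.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(call: Callable[[], T], attempts: int = 3, base_s: float = 0.1,
          cap_s: float = 2.0,
          retry_on: tuple[type[BaseException], ...] = (ConnectionError,),
          sleep: Callable[[float], None] = time.sleep,
          rng: random.Random | None = None) -> T:
    """Run `call` at most `attempts` times, backing off between tries.

    The wait before retry n is drawn uniformly from [0, min(cap_s, base_s * 2**n)]
    ("full jitter"), so clients that failed together do not retry together.
    Exceptions not listed in `retry_on` are raised immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    rng = rng or random.Random()
    for n in range(attempts):
        try:
            return call()
        except retry_on:
            if n == attempts - 1:
                raise                      # out of attempts: surface the last error
            sleep(rng.uniform(0, min(cap_s, base_s * 2 ** n)))
    raise AssertionError("unreachable")


outcomes = iter([ConnectionError("pod not reachable"),
                 ConnectionError("pod not reachable"), "ok"])


def flaky() -> str:
    result = next(outcomes)
    if isinstance(result, Exception):
        raise result
    return result


print(retry(flaky, sleep=lambda s: None))  # ok (after two failed tries)
```

Retrying is only safe if the repeated call does no harm when it is executed twice, which is the topic of idempotency.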

Idempotency
● Retries cause multiple calls!
● GET, HEAD, OPTIONS, DELETE (if exists)
● PUT
  – Idempotent by definition
    ● Must be implemented idempotently (DuplicateKeyException)
  – Primary key must be in the payload
● POST
  – Idempotency key in header
  – Idempotency key stored in a separate table
  – PUT semantics with primary key (header vs. payload)
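
The POST variant can be sketched as follows: the client sends a unique key with each logical request (the header name is not given here; `Idempotency-Key` is a common choice), and the server stores the key together with the response, so a retried request returns the original result instead of creating a second resource. In this Python sketch a dict stands in for the separate table; in a real database the key column would carry a unique constraint and be written in the same transaction as the order, so that concurrent duplicates fail with a duplicate-key error. `OrderService` and its methods are hypothetical.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    status: int
    body: dict


class OrderService:
    """POST /orders made idempotent via a client-supplied idempotency key."""

    def __init__(self) -> None:
        self._orders: dict[str, dict] = {}             # order table
        self._idempotency: dict[str, Response] = {}    # separate key table

    def create_order(self, idempotency_key: str, payload: dict) -> Response:
        # Retry of a request we already processed: replay the stored response.
        stored = self._idempotency.get(idempotency_key)
        if stored is not None:
            return stored
        order_id = str(uuid.uuid4())
        self._orders[order_id] = dict(payload)
        response = Response(status=201, body={"id": order_id, **payload})
        # In a database: same transaction as the insert above, unique key column.
        self._idempotency[idempotency_key] = response
        return response

    def order_count(self) -> int:
        return len(self._orders)


service = OrderService()
key = str(uuid.uuid4())                                  # generated once by the client
first = service.create_order(key, {"item": "book"})
retried = service.create_order(key, {"item": "book"})    # e.g. sent again after a timeout
print(first == retried, service.order_count())           # True 1
```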

Istio

Istio
● Resilience
● Client-side load balancing (knows pods)
● Outlier detection
● Does its own health checks (in addition to kubelet)
● kubelet checks sidecar and workload together

Istio

    apiVersion: networking.istio.io/v1alpha3
    kind: Gateway
    metadata:
      name: mesh-gateway
    spec:
      selector:
        istio: ingressgateway
      servers:
        - port:
            number: 80
            name: http
            protocol: HTTP
          hosts:
            - "*"

Istio

    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: crashing-pod
    spec:
      gateways:            # only traffic entering through this gateway;
        - mesh-gateway     # list "mesh" as well to cover sidecar-to-sidecar calls
      hosts:
        - "*"
      http:
        - match:
            - uri:
                prefix: /
          route:
            - destination:
                port:
                  number: 9080
                host: crashing-pod
                subset: v1
          retries:
            attempts: 3
            perTryTimeout: 1s
    ---
    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: crashing-pod
    spec:
      host: crashing-pod
      subsets:
        - name: v1
          labels:
            app: crashing-pod
      trafficPolicy:
        tls:
          mode: DISABLE
        loadBalancer:
          simple: ROUND_ROBIN
        outlierDetection:
          consecutiveGatewayErrors: 1   # 502, 503 and 504 count as gateway errors
          interval: 1.0s
          baseEjectionTime: 30s         # multiplied by the number of ejections

Recap: Quick Error-Recognition
● Interval of health probes (liveness, readiness) by kubelet
● Other error detection by kubelet (OOM)
● Problem: distributed architecture of K8S (propagation of the error event to all components)
● Error types:
  – Controlled error state (e.g. rolling update)
  – Quickly detectable errors
  – Slowly detectable errors

Demo

Summary
● Distributed architecture of K8S
● Controlled error state: 99.9% (see rolling update) --> 100%?!
● Mix of strategies necessary: 100%
  – Client-side load balancing
  – Outlier detection
  – Resilience
  – Idempotency
