Practical
Chaos Engineering
Andrea Tosatto Jacopo Nardiello
IDI 2018
Andrea Tosatto Jacopo Nardiello
$ who
@_hilbert_ @jnardiello
Founder & DevOps Engineer
SIGHUP - sighup.io
(PowerDNS) Solution Engineer
Open-Xchange - open-xchange.com
Agenda
- Chaos Engineering, what the heck!?
- A Practical Approach
- Starting with Simple Tools
- Reverse Engineering
- Predictive Failures
- The Cloud Native Approach
- Reproducibility, Build/Ship/Run
- Kubernetes
- The Chaos Engineer Toolkit
- Wrap Up
What the heck?!
Motivations
Modern architectures and businesses demand:
- Performance
- Availability
- Fault tolerance
- Velocity of feature releases
A balance of these four aspects should drive technological and architectural choices.
This is what eventually led to the rise of microservice-oriented architectures: each team
should be able to work and ship independently.
Motivations
The drawback of distributed architectures is complexity
Building Confidence
despite the Unknown
Core principle: Experiment
1. Identify the Steady State
2. Real World Events
3. Run Experiments in Production
4. Automate your Experiments
5. Identify and Minimize Blast Radius
Real World Events: trigger actual real-world events and measure how
your system reacts.
There’s also the human factor to consider: test and
measure the response from your team.
Run Experiments in Production: with testing you want to catch bugs as far as possible
from production; with Chaos Engineering it’s the opposite.
You want to run your experiments as close as possible to
production, because you are working within the unknown and
evaluating the system as a whole.
The environment and interactions in production are unique and almost
impossible to replicate reliably.
Automate your Experiments: once you have successfully run an experiment, automate it.
Run it routinely to build up confidence over time.
Identify and Minimize Blast Radius: Chaos Engineering is not about breaking production.
Chaos experiments should take careful, measured risks that
build upon each other to increase confidence.
Start small.
[Figure: a scale from concentrated experiments (small scale) to automated tests (large scale)]
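One way to keep the blast radius small is to cap how many targets a single experiment may touch and to widen that cap only after earlier runs went well. The sketch below is an illustration under assumptions: `pick_targets`, `fraction`, `max_targets` and the `web-N` host names are hypothetical and do not come from any chaos tool.

```python
import math
import random


def pick_targets(hosts, fraction=0.1, max_targets=1, seed=None):
    """Choose a small random subset of hosts to experiment on.

    fraction and max_targets are hypothetical knobs: the subset size is
    fraction * len(hosts) rounded up, then capped at max_targets.
    """
    if not hosts:
        return []
    wanted = math.ceil(len(hosts) * fraction)
    size = max(1, min(wanted, max_targets, len(hosts)))
    rng = random.Random(seed)  # a fixed seed makes the choice reproducible
    return sorted(rng.sample(list(hosts), size))


hosts = [f"web-{i}" for i in range(20)]

# First run: touch a single host, whatever the fraction says.
first_run = pick_targets(hosts, fraction=0.1, max_targets=1, seed=42)

# Later run, once confidence has grown: allow up to five hosts.
later_run = pick_targets(hosts, fraction=0.25, max_targets=5, seed=42)

print(first_run, later_run)
```

Keeping the cap as an explicit parameter makes widening the experiment a deliberate decision rather than a side effect of the fleet growing.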
Chaos Engineering
is not about new tools
A practical Approach - Start Small!
Introduce Chaos in your infrastructure
- Network latency / Packet loss - Network Emulation
- IO latency / Kernel Failure Injection - SystemTap
Reverse Engineering
- What’s the app doing - tracing
- What’s the bottleneck - profiling
Focus on a single host / application using standard (Linux) tools
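For the network latency and packet loss case, Linux network emulation is driven through `tc` with the `netem` queueing discipline. The sketch below only builds the command lines and does not run them, since applying them needs root on the target host. The helper names, the `eth0` interface and the 100 ms / 1% values are assumptions chosen for illustration.

```python
import shlex


def netem_add_command(interface, delay_ms=0, loss_percent=0):
    """Build (but do not run) a tc/netem command as an argument list."""
    if delay_ms <= 0 and loss_percent <= 0:
        raise ValueError("nothing to inject: set delay_ms or loss_percent")
    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    if delay_ms > 0:
        cmd += ["delay", f"{delay_ms}ms"]
    if loss_percent > 0:
        cmd += ["loss", f"{loss_percent}%"]
    return cmd


def netem_clear_command(interface):
    """Build the command that removes the root qdisc again."""
    return ["tc", "qdisc", "del", "dev", interface, "root"]


inject = netem_add_command("eth0", delay_ms=100, loss_percent=1)
rollback = netem_clear_command("eth0")

# Print shell-ready strings; run them by hand on a single test host first.
print(shlex.join(inject))
print(shlex.join(rollback))
```

Building the rollback command next to the injection command keeps the way back to the steady state one step away.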
Scaling our Chaos Experiments: The Chaos Toolkit
1. Defines a clear Open API to write and run your chaos engineering experiments
2. Integrates natively with cloud and cloud-native infrastructures
3. Ships with a number of predefined testing scenarios
4. Provides a simple CLI interface
5. It’s easy to automate
$ chaos discover && chaos init
{
    "version": "0.1.0",
    "title": "Moving a file from under our feet is forgivable",
    "description": "Our application should re-create a file that was removed",
    "steady-state-hypothesis": {
        "title": "The file must be around first",
        "probes": [
            {
                "type": "probe",
                "name": "file-must-exist",
                "tolerance": true,
                "provider": {
                    "type": "python",
                    "module": "os.path",
                    "func": "exists",
                    "arguments": {
                        "path": "some/file"
                    }
                }
            }
        ]
    },
    "method": [
        {
            "type": "action",
            "name": "file-be-gone",
            "provider": {
                "type": "python",
                "module": "os",
                "func": "remove",
                "arguments": {
                    "path": "some/file"
                }
            },
            "pauses": {
                "after": 5
            }
        },
        {
            "ref": "file-must-exist"
        }
    ]
}
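As a rough illustration of how a declarative experiment like this maps onto plain Python calls, the sketch below loads each provider's `module` and `func` with `importlib` and calls it with the declared arguments. It is not the Chaos Toolkit's implementation: `call_provider`, `steady` and `run_experiment` are hypothetical helpers, the experiment is trimmed to the fields it needs, the action uses `os.remove`, the file lives in a temporary directory, and the pause is set to 0 to keep the run short.

```python
import importlib
import os
import tempfile
import time


def call_provider(provider):
    """Import provider["module"] and call provider["func"] with its arguments."""
    module = importlib.import_module(provider["module"])
    func = getattr(module, provider["func"])
    return func(**provider.get("arguments", {}))


def steady(hypothesis):
    """True when every probe returns exactly its declared tolerance."""
    return all(
        call_provider(p["provider"]) == p["tolerance"]
        for p in hypothesis["probes"]
    )


def run_experiment(experiment):
    hypothesis = experiment["steady-state-hypothesis"]
    probes = {p["name"]: p for p in hypothesis["probes"]}
    if not steady(hypothesis):
        return {"steady_before": False, "steady_after": None}
    for step in experiment["method"]:
        # A {"ref": name} step re-runs a probe declared earlier.
        activity = probes[step["ref"]] if "ref" in step else step
        call_provider(activity["provider"])
        time.sleep(step.get("pauses", {}).get("after", 0))
    return {"steady_before": True, "steady_after": steady(hypothesis)}


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "some-file")
    open(path, "w").close()  # the file exists before the experiment starts
    experiment = {
        "steady-state-hypothesis": {
            "probes": [{
                "type": "probe",
                "name": "file-must-exist",
                "tolerance": True,
                "provider": {"type": "python", "module": "os.path",
                             "func": "exists", "arguments": {"path": path}},
            }],
        },
        "method": [
            {"type": "action", "name": "file-be-gone",
             "provider": {"type": "python", "module": "os",
                          "func": "remove", "arguments": {"path": path}},
             "pauses": {"after": 0}},
            {"ref": "file-must-exist"},
        ],
    }
    result = run_experiment(experiment)

print(result)
```

No application is running here to re-create the file, so the hypothesis holds before the method and no longer holds after it. That deviation is the kind of finding an experiment is meant to surface.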
Predictive Failures
The Cloud Native Approach
“Cloud Native is structuring teams, culture and technology to utilize
automation and architectures to manage complexity and
unlock velocity” - Joe Beda
Learning from Mistakes the Cloud-Native way
1. Application probing
2. Rolling updates (done right)
3. Workload scheduling and anti-affinity rules
4. Ingresses and LoadBalancers
5. Write monitoring exporters for your KPIs
k8s probing
Liveness and readiness probes for a µservice can each use an exec, http, or tcp check.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
    - name: X-Custom-Header
      value: Awesome
  initialDelaySeconds: 3
  periodSeconds: 3
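On the application side, an `httpGet` probe like this needs an endpoint that answers on `/healthz`. The kubelet counts any status code from 200 up to but not including 400 as success. The following is a minimal sketch using only the Python standard library; `HealthHandler` is a hypothetical name, the check always reports healthy, and the demo binds to a free local port instead of 8080 so it can run anywhere.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answer GET /healthz with 200 and everything else with 404."""

    def do_GET(self):
        if self.path == "/healthz":
            code, body = 200, b"ok"
        else:
            code, body = 404, b"not found"
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Send the same kind of request the probe would, custom header included.
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
conn.request("GET", "/healthz", headers={"X-Custom-Header": "Awesome"})
status = conn.getresponse().status
conn.close()

server.shutdown()
server.server_close()
print(status)
```

A real handler would check whatever the service depends on before answering, so that a failing probe reflects a real problem.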
k8s rolling updates
1. Use multiple replicas when possible (stating the obvious…)
2. Leverage labels and annotations to test canary deployments
3. Set your rollout strategy with maxUnavailable and maxSurge
4. Leverage history and plan your emergency rollbacks
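The effect of `maxUnavailable` and `maxSurge` is easier to reason about with numbers. For a Deployment, a percentage for `maxUnavailable` is rounded down and a percentage for `maxSurge` is rounded up, and both default to 25%. The sketch below works out the resulting bounds; `resolve` and `rollout_bounds` are hypothetical helper names, not part of any Kubernetes client.

```python
import math


def resolve(value, replicas, round_up):
    """Turn an absolute number or a percentage string into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        exact = replicas * int(value[:-1]) / 100
        return math.ceil(exact) if round_up else math.floor(exact)
    return int(value)


def rollout_bounds(replicas, max_unavailable="25%", max_surge="25%"):
    """Return the fewest available and the most total pods during a rollout."""
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    surge = resolve(max_surge, replicas, round_up=True)
    return {
        "min_available": replicas - unavailable,
        "max_total": replicas + surge,
    }


# Defaults on 10 replicas: 2.5 rounds down to 2 and up to 3.
default_bounds = rollout_bounds(10)

# Conservative setting: never drop below the desired count.
careful_bounds = rollout_bounds(3, max_unavailable=0, max_surge=1)

print(default_bounds, careful_bounds)
```

With few replicas the rounding matters: 25% of 3 replicas rounds down to 0 unavailable pods, so the rollout proceeds only through the surge pod.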
k8s scheduling
Node selectors and pod (container) affinity and anti-affinity rules let you schedule your
workloads across your cluster to leverage multi-AZ architectures and take control of where
workloads are executed.
[Diagram: Node 1, Node 2, Node 3]
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
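To see which nodes a rule like this would allow, it helps to evaluate it by hand: the entries under `nodeSelectorTerms` are ORed, while the `matchExpressions` inside one term are ANDed. The sketch below mimics that logic for the `In`, `NotIn`, `Exists` and `DoesNotExist` operators only. It is a simplified illustration rather than the scheduler's code, and the three nodes and their labels are made up.

```python
def expression_matches(labels, expr):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not covered by this sketch: {op}")


def node_matches(labels, terms):
    """Terms are ORed; the expressions inside one term are ANDed."""
    for term in terms:
        exprs = term.get("matchExpressions", [])
        if exprs and all(expression_matches(labels, e) for e in exprs):
            return True
    return False


# The required terms from the Pod spec above.
terms = [{
    "matchExpressions": [{
        "key": "kubernetes.io/e2e-az-name",
        "operator": "In",
        "values": ["e2e-az1", "e2e-az2"],
    }],
}]

# Hypothetical cluster: one node per availability zone.
nodes = {
    "node-1": {"kubernetes.io/e2e-az-name": "e2e-az1"},
    "node-2": {"kubernetes.io/e2e-az-name": "e2e-az2"},
    "node-3": {"kubernetes.io/e2e-az-name": "e2e-az3"},
}

eligible = sorted(name for name, labels in nodes.items()
                  if node_matches(labels, terms))
print(eligible)
```

Running the same check against the real node labels before a chaos experiment shows how many placement options remain once a zone is taken away.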
k8s ingresses
Leverage Ingresses and LoadBalancers to implement logical separation between
services/applications and infrastructure edges (the access layer); stating the obvious.
[Diagram: Users, Ingresses, Pods]
k8s monitoring
1. Instrument the application and the chaos probes to have a clear view of your
business metrics (metrics should represent your business KPIs and user behaviour,
not just ops availability).
2. When working with distributed architectures, use tracing to correlate actions and
events!
Chaos Engineering naturally fits
Cloud Native infrastructures
Wrap Up
Chaos Engineering is not a new technology / methodology
Chaos Engineering is a powerful tool to
- improve the confidence in managing complex distributed infrastructure
- meet business requirements on scalability and agility
Cloud Native Infrastructures are a natural fit for Chaos Engineering
- they natively provide APIs to implement fault tolerant and elastic computing
- they reduce the blast radius of our Chaos Experiments, minimizing the risks
Experiment - Learn - Improve - Automate
Embrace the Chaos
principlesofchaos.org
Useful links
- The Chaos Toolkit, http://chaostoolkit.org/
- Bloomberg/powerfulseal, https://github.com/bloomberg/powerfulseal
- Kube-monkey, https://github.com/asobti/kube-monkey
Thanks Andrea Tosatto Jacopo Nardiello
@_hilbert_ @jnardiello

  • 5. Motivations Modern architectures and businesses demand: - Performance - Availability - Fault tolerance - Velocity of feature releases. These four aspects should carry equal weight in technological and architectural choices. Balancing them has been the engine that eventually led to the rise of microservices-oriented architectures: each team should be able to work and ship independently.
  • 6. Motivations The drawback of distributed architectures is complexity
  • 8. Core principle: Experiment 1. Identify the Steady State 2. Real World Events 3. Run Experiments in Production 4. Automate your Experiments 5. Identify and Minimize Blast Radius
  • 10. Core principle: Experiment. 2. Real World Events: Trigger actual real-world events and measure how your system reacts. There’s also the human factor to consider: test and measure the response from your team.
  • 11. Core principle: Experiment. 3. Run Experiments in Production: With testing you want to catch bugs as far as possible from production; with Chaos Engineering it’s the opposite. You want to run your experiments as close as possible to prod, because we are working within the unknown and evaluating the system as a whole. The environment and interactions in prod are unique and almost impossible to replicate reliably.
  • 12. Core principle: Experiment. 4. Automate your Experiments: Once you have successfully run an experiment, automate it. Run it as a routine to build up confidence over time.
  • 13. Core principle: Experiment. 5. Identify and Minimize Blast Radius: Chaos Engineering is not about breaking production. Chaos experiments should take careful, measured risks that build upon each other to increase confidence. Start small. Concentrated experiments, automated tests, small scale, large scale.
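The experiment principles above can be sketched as a small loop: check the steady state, inject the event, check again, and always roll back. This is only an illustration under stated assumptions; `pick_targets`, `run_experiment` and the callables passed to them are hypothetical names, not part of any tool mentioned in the talk.

```python
from __future__ import annotations

import random
from typing import Callable, Sequence


def pick_targets(hosts: Sequence[str], fraction: float, seed: int = 0) -> list[str]:
    """Limit the blast radius: choose a small, reproducible subset of hosts."""
    count = max(1, int(len(hosts) * fraction))
    return random.Random(seed).sample(list(hosts), count)


def run_experiment(
    steady_state: Callable[[], bool],
    inject: Callable[[], None],
    rollback: Callable[[], None],
) -> str:
    """Run one experiment and report the outcome as a short string."""
    # Never start an experiment on a system that is already unhealthy.
    if not steady_state():
        return "aborted: system was not in steady state"
    try:
        inject()
        # The hypothesis holds if the steady state survives the event.
        return "passed" if steady_state() else "deviated"
    finally:
        # Undo the injected event whatever the outcome.
        rollback()


# Toy system: a single flag stands in for a real health measurement.
state = {"up": True}
outcome = run_experiment(
    steady_state=lambda: state["up"],
    inject=lambda: state.update(up=False),
    rollback=lambda: state.update(up=True),
)
print(outcome, pick_targets(["web-1", "web-2", "web-3", "web-4"], fraction=0.25))
```

Wrapping the rollback in `finally` is a deliberate choice here: the injected failure is undone even when the steady-state check itself raises.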
  • 14. Chaos Engineering is not about new tools
  • 15. A practical Approach - Start Small! Focus on a single host / application using standard (Linux) tools
        Introduce Chaos in your infrastructure
          - Network latency / Packet loss - Network Emulation
          - IO latency / Kernel Failure Injection - SystemTap
        Reverse Engineering
          - What’s the app doing - tracing
          - What’s the bottleneck - profiling
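As a hedged illustration of the network emulation item, the sketch below builds the `tc` / `netem` command lines that add latency and packet loss to one interface. It only constructs the argument lists and does not run them (doing so needs root). The interface name `eth0` and the helper names are assumptions for the example.

```python
from __future__ import annotations


def netem_command(device: str, delay_ms: int = 0, loss_pct: float = 0.0) -> list[str]:
    """Build a `tc qdisc add ... netem` command for latency and/or packet loss."""
    if delay_ms <= 0 and loss_pct <= 0:
        raise ValueError("specify a delay, a loss percentage, or both")
    cmd = ["tc", "qdisc", "add", "dev", device, "root", "netem"]
    if delay_ms > 0:
        cmd += ["delay", f"{delay_ms}ms"]
    if loss_pct > 0:
        cmd += ["loss", f"{loss_pct:g}%"]
    return cmd


def netem_cleanup(device: str) -> list[str]:
    """Build the command that removes the root qdisc again (the rollback)."""
    return ["tc", "qdisc", "del", "dev", device, "root"]


# "eth0" is an assumed interface name; check `ip link` on the target host.
print(" ".join(netem_command("eth0", delay_ms=100, loss_pct=1)))
print(" ".join(netem_cleanup("eth0")))
```

Passing either list to `subprocess.run(cmd, check=True)` as root would apply it; keeping the cleanup command next to the injection makes the rollback hard to forget.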
  • 16. Scaling our Chaos Experiments: The Chaos Toolkit 1. Defines a clear Open API to write and run your chaos engineering experiments 2. Integrates natively with cloud and cloud-native infrastructures 3. Ships with a number of predefined testing scenarios 4. Provides a simple CLI 5. Is easy to automate
  • 17. $ chaos discover && chaos init
        The experiment file, starting with its version, title and description:
        {
          "version": "0.1.0",
          "title": "Moving a file from under our feet is forgivable",
          "description": "Our application should re-create a file that was removed",
          "steady-state-hypothesis": {
            "title": "The file must be around first",
            "probes": [
              {
                "type": "probe",
                "name": "file-must-exist",
                "tolerance": true,
                "provider": {
                  "type": "python",
                  "module": "os.path",
                  "func": "exists",
                  "arguments": { "path": "some/file" }
                }
              }
            ]
          },
          "method": [
            {
              "type": "action",
              "name": "file-be-gone",
              "provider": {
                "type": "python",
                "module": "os",
                "func": "remove",
                "arguments": { "path": "some/file" }
              },
              "pauses": { "after": 5 }
            },
            { "ref": "file-must-exist" }
          ]
        }
  • 18. $ chaos discover && chaos init
        The steady-state hypothesis: a probe calls os.path.exists on "some/file" and tolerates only true.
        "steady-state-hypothesis": {
          "title": "The file must be around first",
          "probes": [
            {
              "type": "probe",
              "name": "file-must-exist",
              "tolerance": true,
              "provider": {
                "type": "python",
                "module": "os.path",
                "func": "exists",
                "arguments": { "path": "some/file" }
              }
            }
          ]
        },
  • 19. $ chaos discover && chaos init
        The method: an action removes the file with os.remove, pauses 5 seconds, then the probe runs again.
        "method": [
          {
            "type": "action",
            "name": "file-be-gone",
            "provider": {
              "type": "python",
              "module": "os",
              "func": "remove",
              "arguments": { "path": "some/file" }
            },
            "pauses": { "after": 5 }
          },
          { "ref": "file-must-exist" }
        ]
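To make the experiment file more concrete, here is a minimal sketch of what a Python provider entry boils down to: import the named module, look up the function, and call it with the given arguments. This is not the Chaos Toolkit's actual implementation; `call_provider` and `run_file_experiment` are hypothetical helpers. The action uses `os.remove`, since that is where the standard library's remove function lives, and the pause is skipped.

```python
import importlib
import os
import tempfile


def call_provider(provider: dict):
    """Resolve "module" and "func" by name and call them with "arguments"."""
    module = importlib.import_module(provider["module"])
    func = getattr(module, provider["func"])
    return func(**provider.get("arguments", {}))


def run_file_experiment() -> tuple:
    """Probe, remove the file, probe again; return both probe results."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "file")
        with open(path, "w") as handle:
            handle.write("important data")

        probe = {"module": "os.path", "func": "exists", "arguments": {"path": path}}
        action = {"module": "os", "func": "remove", "arguments": {"path": path}}

        before = call_provider(probe)  # steady state: the file is there
        call_provider(action)          # the chaos action: remove it
        after = call_provider(probe)   # nothing re-creates it in this sketch
        return before, after


print(run_file_experiment())
```

Because no application is watching the file in this sketch, the second probe fails, which is exactly the kind of deviation an experiment is meant to surface.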
  • 21. The Cloud Native Approach “Cloud Native is structuring teams, culture and technology to utilize automation and architectures to manage complexity and unlock velocity” - Joe Beda
  • 22. Learning from Mistakes the Cloud-Native way 1. Application probing 2. Rolling updates (done right) 3. Workload scheduling and anti-affinity rules 4. Ingresses and LoadBalancers 5. Write monitoring exporters for your KPIs
  • 23. k8s probing Liveness probe and Readiness probe against the µservice: exec, http, tcp
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            httpHeaders:
            - name: X-Custom-Header
              value: Awesome
          initialDelaySeconds: 3
          periodSeconds: 3
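On the application side, an HTTP liveness probe only needs an endpoint that answers with a success status while the service is healthy (Kubernetes treats any status from 200 up to, but not including, 400 as success). A minimal standard-library sketch follows; the `HealthHandler` class, its `healthy` flag and `make_server` are illustrative assumptions, while the `/healthz` path and port 8080 come from the probe definition.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    # Flip to False to simulate an unhealthy service during an experiment.
    healthy = True

    def do_GET(self):
        if self.path != "/healthz":
            status, body = 404, b"not found"
        elif self.healthy:
            status, body = 200, b"ok"
        else:
            status, body = 500, b"unhealthy"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep probe traffic out of the logs.
        pass


def make_server(host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Create (but do not start) the health endpoint server."""
    return ThreadingHTTPServer((host, port), HealthHandler)
```

Calling `make_server().serve_forever()` starts the endpoint. In a chaos experiment, setting `HealthHandler.healthy = False` is a cheap way to observe how the cluster reacts to a failing probe.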
  • 24. k8s rolling updates 1. Use multiple replicas when possible (stating the obvious…) 2. Leverage labels and annotations to test canary deployments 3. Set your rollout strategy with maxUnavailable and maxSurge 4. Leverage history and plan your emergency rollbacks
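To see what maxUnavailable and maxSurge mean for a given replica count, the arithmetic can be sketched as below. It assumes the documented Kubernetes behaviour that percentage values round down for maxUnavailable and up for maxSurge, with 25% as the default for both; the function names are made up for the example.

```python
from __future__ import annotations


def resolve(value: int | str, replicas: int, round_up: bool) -> int:
    """Turn an absolute number or a percentage string into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer arithmetic avoids floating-point surprises when rounding.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(
    replicas: int,
    max_unavailable: int | str = "25%",
    max_surge: int | str = "25%",
) -> tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rollout."""
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    surge = resolve(max_surge, replicas, round_up=True)
    return replicas - unavailable, replicas + surge


for count in (3, 10):
    low, high = rollout_bounds(count)
    print(f"{count} replicas: at least {low} available, at most {high} pods in total")
```

With few replicas the percentages round to very small numbers, which is one more reason to run multiple replicas before relying on rolling updates.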
  • 25. k8s scheduling Node selectors and pod (container) affinity and anti-affinity rules let you schedule your workloads across your cluster (Node 1, Node 2, Node 3) to leverage multi-AZ architectures and take control of where workloads are executed.
        apiVersion: v1
        kind: Pod
        metadata:
          name: with-node-affinity
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: kubernetes.io/e2e-az-name
                    operator: In
                    values:
                    - e2e-az1
                    - e2e-az2
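The matching rule behind the node affinity block can be sketched in a few lines: a node qualifies if any nodeSelectorTerm matches, and a term matches only if all of its matchExpressions hold. This is a simplified illustration, not the scheduler's code; it covers four operators only, and the node names and labels are invented.

```python
from __future__ import annotations


def matches_expression(labels: dict[str, str], expr: dict) -> bool:
    """Evaluate one matchExpression against a node's labels."""
    key, operator = expr["key"], expr["operator"]
    if operator == "In":
        return labels.get(key) in expr["values"]
    if operator == "NotIn":
        return labels.get(key) not in expr["values"]
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not covered by this sketch: {operator}")


def node_matches(labels: dict[str, str], terms: list[dict]) -> bool:
    """Terms are ORed together; expressions inside one term are ANDed."""
    for term in terms:
        expressions = term.get("matchExpressions", [])
        if expressions and all(matches_expression(labels, e) for e in expressions):
            return True
    return False


terms = [{"matchExpressions": [
    {"key": "kubernetes.io/e2e-az-name", "operator": "In", "values": ["e2e-az1", "e2e-az2"]},
]}]
# Hypothetical nodes and labels, for illustration only.
nodes = {
    "node-1": {"kubernetes.io/e2e-az-name": "e2e-az1"},
    "node-2": {"kubernetes.io/e2e-az-name": "e2e-az2"},
    "node-3": {"kubernetes.io/e2e-az-name": "e2e-az3"},
}
print([name for name, labels in nodes.items() if node_matches(labels, terms)])
```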
  • 26. k8s ingresses Leverage Ingresses and LoadBalancers to implement logical separation between services/applications and infrastructure edges (access layer) - stating the obvious. (Diagram: Users, Ingresses, Pods)
  • 27. k8s monitoring 1. Instrument the application and the chaos probes to have a clear view of your business metrics (metrics should represent your business KPIs and user behaviour, not just ops availability). 2. When working with distributed architectures, use tracing to correlate actions and events!
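As a small illustration of exposing business metrics, the sketch below renders a few KPIs in the Prometheus text exposition format using only the standard library. A real exporter would more likely use a client library; the metric names and values here are invented for the example.

```python
from __future__ import annotations


def render_metrics(metrics: dict[str, tuple[str, str, float]]) -> str:
    """Render {name: (type, help text, value)} as Prometheus text format."""
    lines = []
    for name, (kind, help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical business KPIs: what users achieve, not just whether pods are up.
print(render_metrics({
    "checkout_completed_total": ("counter", "Completed checkouts", 42),
    "search_latency_seconds": ("gauge", "Latency of the last search request", 0.25),
}), end="")
```

Serving this text from a `/metrics` endpoint lets the same KPIs act as the steady-state measurement for a chaos experiment.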
  • 28. Chaos Engineering naturally fits Cloud Native infrastructures
  • 29. Wrap Up
        Chaos Engineering is not a new technology / methodology
        Chaos Engineering is a powerful tool to
          - improve the confidence in managing complex distributed infrastructure
          - meet business requirements on scalability and agility
        Cloud Native Infrastructures are a natural fit for Chaos Engineering
          - they natively provide APIs to implement fault tolerant and elastic computing
          - they reduce the blast radius of our Chaos Experiments, minimizing the risks
        Experiment - Learn - Improve - Automate
  • 31. Useful links - The Chaos Toolkit, http://chaostoolkit.org/ - Bloomberg/powerfulseal, https://github.com/bloomberg/powerfulseal - Kube-monkey, https://github.com/asobti/kube-monkey
  • 32. Thanks Andrea Tosatto Jacopo Nardiello @_hilbert_ @jnardiello