Chaos Engineering
Organizations that are ignoring Chaos Engineering are leaving money on the table.
«Failures are a given, and everything will eventually fail over time»
(Werner Vogels, CTO of Amazon)
Why?
1. Growth of microservices and distributed cloud
architectures
2. The web has grown increasingly complex
3. We all depend on these systems more than ever
4. Failures have become much harder to predict
5. These failures cause serious outages for companies
From On-Premises ...
1. Before the Cloud, users were connected to our application through the Company’s local network
2. A server’s downtime was planned and involved stopping production
3. Applications were monolithic
... To Cloud
1. Now our users are connected through the Internet
2. The workload on our services will increase significantly as the applications themselves become more widely used
3. Many microservices replace one monolith
Microservices: is it really a matter of size?
Common Characteristics
Componentisation via services
Organised around business capabilities
Decentralised data management
Products not projects
Decentralised governance
Smart endpoints and dumb pipes
Evolutionary design
Infrastructure automation
Designed for failure
We cannot say there is a formal definition of the microservices
architectural style, but we can attempt to describe what we see
as common characteristics for architectures that fit the label.
(Martin Fowler, James Lewis)
Or is it a matter of paradigms?
Change Mindset
Building a reliable application in the cloud is different
than building a reliable application in an enterprise setting
Reactive Manifesto
1. Jonas Bonér, Dave Farley, Roland Kuhn, Martin Thompson, 16.09.2014
2. The single most important thing is to be responsive.
This means that a reactive system needs to remain responsive even when a failure occurs.
• https://www.reactivemanifesto.org/it
Resilient
System
• Networks
• Servers
• Applications
• Processes
• People
Resilience is the ability of a system to adapt to changes, failures & disturbances
Resilience is a function of People & Culture
Failures are given
Availability         Downtime per year
95% (1-nine)         18 days 6 hours
99% (2-nines)        3 days 15 hours
99.9% (3-nines)      8 hours 45 minutes
99.99% (4-nines)     52 minutes
99.999% (5-nines)    5 minutes
99.9999% (6-nines)   31 seconds
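The downtime column follows directly from the availability percentage. A minimal Python sketch, assuming a 365-day year; the function name `downtime_per_year` is illustrative and not from the slides:

```python
from datetime import timedelta


def downtime_per_year(availability_percent: float) -> timedelta:
    """Expected yearly downtime for a given availability, assuming a 365-day year."""
    unavailable_fraction = 1 - availability_percent / 100
    return timedelta(days=365 * unavailable_fraction)


# Print the downtime for each availability level listed in the table.
for pct in (95, 99, 99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% -> {downtime_per_year(pct)}")
```

The table rounds these values down to the nearest convenient unit, so the printed figures are slightly more precise than the ones shown on the slide.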
The beauty of Math at work
Component             Availability         Downtime
X                     99% (2-nines)        3 days 15 hours
Y                     99.99% (4-nines)     52 minutes
X and Y Combined      98.99%               3 days 16 hours 33 minutes
Component             Availability         Downtime
X                     99% (2-nines)        3 days 15 hours
Two X in parallel     99.99% (4-nines)     52 minutes
Three X in parallel   99.9999% (6-nines)   31 seconds
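The arithmetic behind both tables can be sketched in a few lines of Python. This assumes that components fail independently of each other, which the slides do not state explicitly; the function names `series` and `parallel` are illustrative:

```python
from math import prod


def series(*availabilities: float) -> float:
    """Every component must be up, so the availabilities multiply."""
    return prod(availabilities)


def parallel(availability: float, replicas: int) -> float:
    """The system is up as long as at least one of the replicas is up."""
    return 1 - (1 - availability) ** replicas


x, y = 0.99, 0.9999
print(f"X and Y combined:    {series(x, y):.4%}")
print(f"Two X in parallel:   {parallel(x, 2):.4%}")
print(f"Three X in parallel: {parallel(x, 3):.4%}")
```

Chaining dependencies lowers availability below that of the weakest component, while redundancy raises it quickly, which is why replicas of an unreliable component can beat a single more reliable one.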
Chaos Engineering
Chaos Engineering is the discipline of experimenting on a system in
order to build confidence in the system’s capability to withstand turbulent
conditions in production.
https://principlesofchaos.org
• Instead of trying to avoid failure, chaos engineering
embraces it
• Provide evidence of system weaknesses through scientific
chaos engineering experiments
• Which kind of weaknesses? Dark Debt
History
1. 1564-1642: Galileo Galilei introduces the experimental scientific method
2. 1879-1955: No amount of experiments will prove me right; a single experiment will prove me wrong (A. Einstein)
3. 2000: Game Day by Jesse Robbins, the Master of Disaster
4. 2010: Chaos Monkey by Netflix. Why? To support the move from physical infrastructure to cloud infrastructure
5. 2011: Simian Army. We have to design a cloud architecture where individual components can fail without affecting the
availability of the entire system
6. 2012: Netflix shared Chaos Monkey on GitHub
7. 2014: A new role. Chaos Engineer
Once upon a time in Seattle
«You don’t choose the moment, the moment chooses you»
«You only choose how prepared you are when it does»
Jesse Robbins, the Master of Disaster at Amazon
Chaos Experiment vs Testing
Testing
• Several sets of inputs and predicted outputs
• Limited scope
• Is a programming practice that instructs developers
• Testing, strictly speaking, does not create new knowledge
Chaos Experiment
• Discovers weaknesses through experiments
• Limited scope
• Experimentation creates new knowledge
Game Day
• An exercise designed to increase Resilience through large-scale fault
injection across critical systems.
• The goal of a Game Day is to practice how you, your team, and your
supporting systems deal with real-world turbulent conditions.
• Creating Resiliency through destruction
Sociotechnical System
Before starting your journey into chaos engineering, make sure you’ve done your homework and have built resiliency into every
level of your organization. Building resilient systems isn’t all about software. It starts at the infrastructure layer, progresses to
the network and data, influences application design and extends to people and culture.
Adrian Hornsby
Notifications and Approvals
Name           Role          Approved?
Bob Jennifer   Owner (CEO)   Yes
Table of notifications and approvals
• Remember Conway’s Law
Dark Debt
• Dark Debt is not recognizable at the time of creation.
• Dark Debt arises from the unforeseen interactions of hardware or software
with other parts of the framework.
• Dark Debt is invisible until an anomaly reveals its presence.
• Platform
• Applications
• People, practices, and processes
The Phases of Chaos Engineering
Chaos engineering is NOT about letting monkeys loose or allowing them to break things randomly without a purpose.
Chaos engineering is about breaking things in a controlled environment.
Start with Experiments
• Get your team together and come up with a picture of your system (including people, practices, processes)
• Ask the right questions:
  - Where would it be most valuable to create an experiment that helps us build trust and confidence in our system under turbulent conditions?
  - What could possibly go wrong?
• Chaos Engineering doesn’t guarantee you have the perfect system
• Chaos Engineering never ends
• Likelihood and Impact
Checkmate in three moves
Preparation
• Identification and mitigation of risks and impact from failure
• Reduces frequency of failures (higher MTBF)
• Reduces duration of recovery (lower MTTR)
Participation
• Builds confidence & competence responding to failure and under stress
• Strengthens individual and cultural ability to anticipate, mitigate, respond to, and recover from
failures of all types
Exercises
• Trigger and expose «latent defects»
• Choose to discover them, instead of letting that be determined by the next real disaster.
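The two metrics named under Preparation tie back to availability through the standard steady-state relation availability = MTBF / (MTBF + MTTR). A small Python sketch of that relation; the hour values are made-up illustrations, not measurements from the talk:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to recovery."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical numbers, chosen only to show the direction of the effect.
baseline = availability(mtbf_hours=500, mttr_hours=4)
fewer_failures = availability(mtbf_hours=1000, mttr_hours=4)   # failures half as frequent
faster_recovery = availability(mtbf_hours=500, mttr_hours=1)   # recovery four times faster

print(f"baseline:        {baseline:.4%}")
print(f"fewer failures:  {fewer_failures:.4%}")
print(f"faster recovery: {faster_recovery:.4%}")
```

Both levers improve availability, which is why preparation targets failure frequency and recovery time together.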
Likelihood-Impact Map
• The likelihood that a failure may occur
• The potential impact your system will
experience if it does
[Figure: likelihood-impact map. Recoverable labels: "API product becomes unavailable", "Contribution", "Availability"]
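One way to turn such a map into an ordered backlog of experiments is to score each candidate failure. The sketch below is only an illustration: the 1 to 5 scales, the likelihood × impact score, and every entry except the API example are assumptions, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureCandidate:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (frequent)
    impact: int      # hypothetical scale: 1 (negligible) to 5 (severe)

    @property
    def priority(self) -> int:
        # Assumed scoring rule: the product of likelihood and impact.
        return self.likelihood * self.impact


candidates = [
    FailureCandidate("Cache node restarts", likelihood=4, impact=2),
    FailureCandidate("API product becomes unavailable", likelihood=3, impact=5),
    FailureCandidate("Log shipping is delayed", likelihood=2, impact=1),
]

# Highest score first: these are the failures worth exploring first.
ranked = sorted(candidates, key=lambda c: c.priority, reverse=True)
for candidate in ranked:
    print(f"{candidate.priority:>2}  {candidate.name}")
```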
Describe Your Experiment
• A steady-state hypothesis: A set of measurements that indicates that the system is working in an expected way
from a business perspective, and within a given set of tolerances
• A method: The set of activities you’re going to use to inject the turbulent conditions into the target system
• Rollbacks: A set of remediating actions through which you will attempt to repair what you have done
knowingly in your experiment’s method
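The three parts can be written down as a declarative experiment. The sketch below builds one as a Python dictionary and prints it as JSON, modeled on the experiment format of the Chaos Toolkit listed in the resources; field names should be checked against its documentation. The URL, container name, and titles are hypothetical placeholders.

```python
import json

experiment = {
    "title": "The API stays available when one instance is stopped",  # hypothetical
    "description": "Stop one instance and check that the health endpoint still answers.",
    # Steady-state hypothesis: what "working as expected" looks like, with a tolerance.
    "steady-state-hypothesis": {
        "title": "The API responds successfully",
        "probes": [
            {
                "type": "probe",
                "name": "api-must-respond",
                "tolerance": 200,  # expected HTTP status code
                "provider": {
                    "type": "http",
                    "url": "http://localhost:8080/health",  # hypothetical endpoint
                    "timeout": 3,
                },
            }
        ],
    },
    # Method: the turbulent condition injected into the target system.
    "method": [
        {
            "type": "action",
            "name": "stop-one-instance",
            "provider": {"type": "process", "path": "docker", "arguments": "stop api-1"},
        }
    ],
    # Rollbacks: remediating actions that undo what the method did.
    "rollbacks": [
        {
            "type": "action",
            "name": "restart-instance",
            "provider": {"type": "process", "path": "docker", "arguments": "start api-1"},
        }
    ],
}

print(json.dumps(experiment, indent=2))
```

Keeping the experiment as data rather than as a script makes it easy to review, version, and rerun later as a regression check.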
Explore → Discover → Analyze → Improve → Validate
Demo
1. Using a chaos experiment to explore and discover weaknesses in the target system
2. Using a chaos experiment to discover and begin to analyze any weaknesses surfaced in the system
3. Once the challenge of analysis is done, it’s time to apply an improvement to the system (if needed)
4. Your chaos experiment becomes a chaos test to detect whether the weakness has indeed been overcome.
Under the skin of chaos run
1. Start: is the experiment valid? If no: experiment aborted.
2. If yes: check the steady-state hypothesis. Not within tolerances: experiment aborted.
3. Within tolerances: execute the method.
4. Check the steady-state hypothesis again. Within tolerances: no deviations found. Not within tolerances: deviations found.
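The flow of a run can be sketched as a small Python function. This is an illustration of the decision points only, not the implementation of any particular tool; the names `run_experiment`, `steady_state`, `method`, and `rollbacks` are assumptions, and running the rollbacks at the end is an addition that the flow diagram itself does not show.

```python
from typing import Callable, Sequence


def run_experiment(
    is_valid: bool,
    steady_state: Callable[[], bool],
    method: Sequence[Callable[[], None]],
    rollbacks: Sequence[Callable[[], None]] = (),
) -> str:
    """Walk through the run flow and return its outcome as a string."""
    if not is_valid:
        return "experiment aborted"
    # The hypothesis must hold before anything is injected.
    if not steady_state():
        return "experiment aborted"
    try:
        for action in method:
            action()
        # Check the same hypothesis again after the turbulent conditions.
        within_tolerances = steady_state()
    finally:
        # Assumption: rollbacks always run once the method has started.
        for rollback in rollbacks:
            rollback()
    return "no deviations found" if within_tolerances else "deviations found"


# Usage with a toy in-memory "system" whose health the method breaks.
state = {"healthy": True}
outcome = run_experiment(
    is_valid=True,
    steady_state=lambda: state["healthy"],
    method=[lambda: state.update(healthy=False)],
    rollbacks=[lambda: state.update(healthy=True)],
)
print(outcome)
```

Checking the hypothesis before the method matters: a system that is already out of tolerance cannot tell you anything about the effect of the injected failure.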
Steady-state hypothesis
Model that characterizes the steady-state of the system based on expected values of
the business metrics.
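A tolerance check on a business metric might look like the following Python sketch. The metric (orders per minute), its expected value, and the 10% tolerance are hypothetical examples rather than values from the talk:

```python
def within_tolerance(observed: float, expected: float, tolerance_pct: float) -> bool:
    """True when the observed value deviates from the expected one by at most tolerance_pct percent."""
    return abs(observed - expected) <= expected * tolerance_pct / 100


expected_orders_per_minute = 120.0  # hypothetical steady-state value

print(within_tolerance(114.0, expected_orders_per_minute, tolerance_pct=10))
print(within_tolerance(90.0, expected_orders_per_minute, tolerance_pct=10))
```

Using a business metric rather than a purely technical one keeps the hypothesis tied to what customers actually experience.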
Chaos Engineering
Canary Deployment
Start small and slowly build confidence within your team and your organization.
- How many customers are affected?
- What functionality is impaired?
- Which locations are impacted?
Benefits of Chaos Engineering
- First, chaos engineering helps you uncover the unknowns
in your system and fix them before they happen in
production at 3am during the weekend — so, first,
improved resiliency and sleep.
- Second, a successful chaos engineering practice always
generates a lot more changes than anticipated, and these
are mostly cultural. Probably the most important of these
changes is a natural evolution towards a “non-blaming”
culture: the “Why did you do that?” turns into a “How
can we avoid doing that in the future?” — resulting in
happier and more efficient, empowered, engaged and
successful teams. And that’s gold!
Books and Resources
Principles of Chaos Engineering
https://github.com/chaostoolkit
Thank you
@aacerbis
Linkedin
alberto.acerbis@4solid.it

More Related Content

What's hot

Chaos Engineering with Kubernetes
Chaos Engineering with KubernetesChaos Engineering with Kubernetes
Chaos Engineering with KubernetesArun Gupta
 
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...Ana Medina
 
Chaos engineering and chaos testing
Chaos engineering and chaos testingChaos engineering and chaos testing
Chaos engineering and chaos testingjeetendra mandal
 
Chaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWSChaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWSBilal Aybar
 
chaos-engineering-Knolx
chaos-engineering-Knolxchaos-engineering-Knolx
chaos-engineering-KnolxKnoldus Inc.
 
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgPrinciples Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgNils Meder
 
Chaos Engineering with Gremlin Platform
Chaos Engineering with Gremlin PlatformChaos Engineering with Gremlin Platform
Chaos Engineering with Gremlin PlatformAnshul Patel
 
Introduction to Chaos Engineering with Microsoft Azure
Introduction to Chaos Engineering with Microsoft AzureIntroduction to Chaos Engineering with Microsoft Azure
Introduction to Chaos Engineering with Microsoft AzureAna Medina
 
Practical Chaos Engineering
Practical Chaos EngineeringPractical Chaos Engineering
Practical Chaos EngineeringSIGHUP
 
Choose your own adventure Chaos Engineering - QCon NYC 2017
Choose your own adventure Chaos Engineering - QCon NYC 2017 Choose your own adventure Chaos Engineering - QCon NYC 2017
Choose your own adventure Chaos Engineering - QCon NYC 2017 Nora Jones
 
Chaos Engineering 101 by Russ Miles
Chaos Engineering 101 by Russ MilesChaos Engineering 101 by Russ Miles
Chaos Engineering 101 by Russ MilesRussell Miles
 
Chaos Engineering on Cloud Foundry
Chaos Engineering on Cloud FoundryChaos Engineering on Cloud Foundry
Chaos Engineering on Cloud FoundryKarun Chennuri
 
DevOps on AWS: DevOps Day San Francisco
DevOps on AWS: DevOps Day San FranciscoDevOps on AWS: DevOps Day San Francisco
DevOps on AWS: DevOps Day San FranciscoAmazon Web Services
 
Executing a Large-Scale Migration to AWS
Executing a Large-Scale Migration to AWSExecuting a Large-Scale Migration to AWS
Executing a Large-Scale Migration to AWSAmazon Web Services
 
Microservices, DevOps & SRE
Microservices, DevOps & SREMicroservices, DevOps & SRE
Microservices, DevOps & SREAraf Karsh Hamid
 
AWS re:Invent 2016: Identifying Your Migration Options: the 6 Rs (ENT311)
AWS re:Invent 2016: Identifying Your Migration Options: the 6 Rs (ENT311)AWS re:Invent 2016: Identifying Your Migration Options: the 6 Rs (ENT311)
AWS re:Invent 2016: Identifying Your Migration Options: the 6 Rs (ENT311)Amazon Web Services
 
Chaos Engineering Kubernetes
Chaos Engineering KubernetesChaos Engineering Kubernetes
Chaos Engineering KubernetesAlex Soto
 
Observability For Modern Applications
Observability For Modern ApplicationsObservability For Modern Applications
Observability For Modern ApplicationsAmazon Web Services
 
CI/CD Best Practices for Building Modern Applications - MAD302 - Anaheim AWS ...
CI/CD Best Practices for Building Modern Applications - MAD302 - Anaheim AWS ...CI/CD Best Practices for Building Modern Applications - MAD302 - Anaheim AWS ...
CI/CD Best Practices for Building Modern Applications - MAD302 - Anaheim AWS ...Amazon Web Services
 

What's hot (20)

Chaos Engineering with Kubernetes
Chaos Engineering with KubernetesChaos Engineering with Kubernetes
Chaos Engineering with Kubernetes
 
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
 
Chaos engineering and chaos testing
Chaos engineering and chaos testingChaos engineering and chaos testing
Chaos engineering and chaos testing
 
Chaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWSChaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWS
 
chaos-engineering-Knolx
chaos-engineering-Knolxchaos-engineering-Knolx
chaos-engineering-Knolx
 
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgPrinciples Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering Hamburg
 
Chaos Engineering with Gremlin Platform
Chaos Engineering with Gremlin PlatformChaos Engineering with Gremlin Platform
Chaos Engineering with Gremlin Platform
 
Introduction to Chaos Engineering with Microsoft Azure
Introduction to Chaos Engineering with Microsoft AzureIntroduction to Chaos Engineering with Microsoft Azure
Introduction to Chaos Engineering with Microsoft Azure
 
Practical Chaos Engineering
Practical Chaos EngineeringPractical Chaos Engineering
Practical Chaos Engineering
 
Choose your own adventure Chaos Engineering - QCon NYC 2017
Choose your own adventure Chaos Engineering - QCon NYC 2017 Choose your own adventure Chaos Engineering - QCon NYC 2017
Choose your own adventure Chaos Engineering - QCon NYC 2017
 
Chaos Engineering 101 by Russ Miles
Chaos Engineering 101 by Russ MilesChaos Engineering 101 by Russ Miles
Chaos Engineering 101 by Russ Miles
 
Chaos Engineering on Cloud Foundry
Chaos Engineering on Cloud FoundryChaos Engineering on Cloud Foundry
Chaos Engineering on Cloud Foundry
 
DevOps on AWS: DevOps Day San Francisco
DevOps on AWS: DevOps Day San FranciscoDevOps on AWS: DevOps Day San Francisco
DevOps on AWS: DevOps Day San Francisco
 
Executing a Large-Scale Migration to AWS
Executing a Large-Scale Migration to AWSExecuting a Large-Scale Migration to AWS
Executing a Large-Scale Migration to AWS
 
Microservices, DevOps & SRE
Microservices, DevOps & SREMicroservices, DevOps & SRE
Microservices, DevOps & SRE
 
IaC on AWS Cloud
IaC on AWS CloudIaC on AWS Cloud
IaC on AWS Cloud
 
AWS re:Invent 2016: Identifying Your Migration Options: the 6 Rs (ENT311)
AWS re:Invent 2016: Identifying Your Migration Options: the 6 Rs (ENT311)AWS re:Invent 2016: Identifying Your Migration Options: the 6 Rs (ENT311)
AWS re:Invent 2016: Identifying Your Migration Options: the 6 Rs (ENT311)
 
Chaos Engineering Kubernetes
Chaos Engineering KubernetesChaos Engineering Kubernetes
Chaos Engineering Kubernetes
 
Observability For Modern Applications
Observability For Modern ApplicationsObservability For Modern Applications
Observability For Modern Applications
 
CI/CD Best Practices for Building Modern Applications - MAD302 - Anaheim AWS ...
CI/CD Best Practices for Building Modern Applications - MAD302 - Anaheim AWS ...CI/CD Best Practices for Building Modern Applications - MAD302 - Anaheim AWS ...
CI/CD Best Practices for Building Modern Applications - MAD302 - Anaheim AWS ...
 

Similar to Chaos engineering

#ATAGTR2021 Presentation : "Chaos engineering: Break it to make it" by Anupa...
#ATAGTR2021 Presentation :  "Chaos engineering: Break it to make it" by Anupa...#ATAGTR2021 Presentation :  "Chaos engineering: Break it to make it" by Anupa...
#ATAGTR2021 Presentation : "Chaos engineering: Break it to make it" by Anupa...Agile Testing Alliance
 
Using security to drive chaos engineering - April 2018
Using security to drive chaos engineering - April 2018Using security to drive chaos engineering - April 2018
Using security to drive chaos engineering - April 2018Dinis Cruz
 
DockerCon SF 2019 - TDD is Dead
DockerCon SF 2019 - TDD is DeadDockerCon SF 2019 - TDD is Dead
DockerCon SF 2019 - TDD is DeadKevin Crawley
 
Chaos Engineering 101: A Field Guide
Chaos Engineering 101: A Field GuideChaos Engineering 101: A Field Guide
Chaos Engineering 101: A Field Guidematthewbrahms
 
ChaoSlingr: Introducing Security based Chaos Testing
ChaoSlingr: Introducing Security based Chaos TestingChaoSlingr: Introducing Security based Chaos Testing
ChaoSlingr: Introducing Security based Chaos TestingAaron Rinehart
 
Using security to drive chaos engineering
Using security to drive chaos engineeringUsing security to drive chaos engineering
Using security to drive chaos engineeringDinis Cruz
 
Green Custard Friday Talk 19: Chaos Engineering
Green Custard Friday Talk 19: Chaos EngineeringGreen Custard Friday Talk 19: Chaos Engineering
Green Custard Friday Talk 19: Chaos EngineeringGreen Custard
 
Resilience and Compliance at Speed and Scale
Resilience and Compliance at Speed and ScaleResilience and Compliance at Speed and Scale
Resilience and Compliance at Speed and ScaleJason Chan
 
Testing Hourglass at Jira Frontend - by Alexey Shpakov, Sr. Developer @ Atlas...
Testing Hourglass at Jira Frontend - by Alexey Shpakov, Sr. Developer @ Atlas...Testing Hourglass at Jira Frontend - by Alexey Shpakov, Sr. Developer @ Atlas...
Testing Hourglass at Jira Frontend - by Alexey Shpakov, Sr. Developer @ Atlas...Applitools
 
Andy singleton continuous delivery-fcb - nov 2014
Andy singleton   continuous delivery-fcb - nov 2014Andy singleton   continuous delivery-fcb - nov 2014
Andy singleton continuous delivery-fcb - nov 2014Brad Power
 
Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...
Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...
Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...agilemaine
 
From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018Christophe Rochefolle
 
OmniTestingConf: Taking Test Automation to the Next Level
OmniTestingConf: Taking Test Automation to the Next LevelOmniTestingConf: Taking Test Automation to the Next Level
OmniTestingConf: Taking Test Automation to the Next LevelSergio Freire
 
DevOps - Boldly Go for Distro
DevOps - Boldly Go for DistroDevOps - Boldly Go for Distro
DevOps - Boldly Go for DistroPaul Boos
 
Test Environment Management
Test Environment ManagementTest Environment Management
Test Environment ManagementKanoah
 
Stress testing of powered by fiware application: the Digital Enabler
Stress testing of powered by fiware application: the Digital EnablerStress testing of powered by fiware application: the Digital Enabler
Stress testing of powered by fiware application: the Digital EnablerAntonino Sirchia
 
5-Ways-to-Revolutionize-Your-Software-Testing
5-Ways-to-Revolutionize-Your-Software-Testing5-Ways-to-Revolutionize-Your-Software-Testing
5-Ways-to-Revolutionize-Your-Software-TestingMary Clemons
 
Continuous delivery
Continuous deliveryContinuous delivery
Continuous deliveryMasas Dani
 

Similar to Chaos engineering (20)

#ATAGTR2021 Presentation : "Chaos engineering: Break it to make it" by Anupa...
#ATAGTR2021 Presentation :  "Chaos engineering: Break it to make it" by Anupa...#ATAGTR2021 Presentation :  "Chaos engineering: Break it to make it" by Anupa...
#ATAGTR2021 Presentation : "Chaos engineering: Break it to make it" by Anupa...
 
Using security to drive chaos engineering - April 2018
Using security to drive chaos engineering - April 2018Using security to drive chaos engineering - April 2018
Using security to drive chaos engineering - April 2018
 
DockerCon SF 2019 - TDD is Dead
DockerCon SF 2019 - TDD is DeadDockerCon SF 2019 - TDD is Dead
DockerCon SF 2019 - TDD is Dead
 
ChaosEngineeringITEA.pptx
ChaosEngineeringITEA.pptxChaosEngineeringITEA.pptx
ChaosEngineeringITEA.pptx
 
Chaos Engineering 101: A Field Guide
Chaos Engineering 101: A Field GuideChaos Engineering 101: A Field Guide
Chaos Engineering 101: A Field Guide
 
ChaoSlingr: Introducing Security based Chaos Testing
ChaoSlingr: Introducing Security based Chaos TestingChaoSlingr: Introducing Security based Chaos Testing
ChaoSlingr: Introducing Security based Chaos Testing
 
Using security to drive chaos engineering
Using security to drive chaos engineeringUsing security to drive chaos engineering
Using security to drive chaos engineering
 
Green Custard Friday Talk 19: Chaos Engineering
Green Custard Friday Talk 19: Chaos EngineeringGreen Custard Friday Talk 19: Chaos Engineering
Green Custard Friday Talk 19: Chaos Engineering
 
Resilience and Compliance at Speed and Scale
Resilience and Compliance at Speed and ScaleResilience and Compliance at Speed and Scale
Resilience and Compliance at Speed and Scale
 
Testing Hourglass at Jira Frontend - by Alexey Shpakov, Sr. Developer @ Atlas...
Testing Hourglass at Jira Frontend - by Alexey Shpakov, Sr. Developer @ Atlas...Testing Hourglass at Jira Frontend - by Alexey Shpakov, Sr. Developer @ Atlas...
Testing Hourglass at Jira Frontend - by Alexey Shpakov, Sr. Developer @ Atlas...
 
Andy singleton continuous delivery-fcb - nov 2014
Andy singleton   continuous delivery-fcb - nov 2014Andy singleton   continuous delivery-fcb - nov 2014
Andy singleton continuous delivery-fcb - nov 2014
 
Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...
Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...
Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...
 
Design For Testability
Design For TestabilityDesign For Testability
Design For Testability
 
From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018
 
OmniTestingConf: Taking Test Automation to the Next Level
OmniTestingConf: Taking Test Automation to the Next LevelOmniTestingConf: Taking Test Automation to the Next Level
OmniTestingConf: Taking Test Automation to the Next Level
 
DevOps - Boldly Go for Distro
DevOps - Boldly Go for DistroDevOps - Boldly Go for Distro
DevOps - Boldly Go for Distro
 
Test Environment Management
Test Environment ManagementTest Environment Management
Test Environment Management
 
Stress testing of powered by fiware application: the Digital Enabler
Stress testing of powered by fiware application: the Digital EnablerStress testing of powered by fiware application: the Digital Enabler
Stress testing of powered by fiware application: the Digital Enabler
 
5-Ways-to-Revolutionize-Your-Software-Testing
5-Ways-to-Revolutionize-Your-Software-Testing5-Ways-to-Revolutionize-Your-Software-Testing
5-Ways-to-Revolutionize-Your-Software-Testing
 
Continuous delivery
Continuous deliveryContinuous delivery
Continuous delivery
 

Recently uploaded

Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.krishnachandrapal52
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfJOHNBEBONYAP1
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制pxcywzqs
 
Best SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasBest SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasDigicorns Technologies
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrHenryBriggs2
 
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查ydyuyu
 
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime NagercoilNagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoilmeghakumariji156
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsMonica Sydney
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样ayvbos
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsMonica Sydney
 
Power point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria IuzzolinoPower point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria Iuzzolinonuriaiuzzolino1
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge GraphsEleniIlkou
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查ydyuyu
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdfMatthew Sinclair
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdfMatthew Sinclair
 
PowerDirector Explination Process...pptx
PowerDirector Explination Process...pptxPowerDirector Explination Process...pptx
PowerDirector Explination Process...pptxgalaxypingy
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样ayvbos
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...kajalverma014
 

Chaos engineering

  • 1. Chaos Engineering Organizations that are ignoring Chaos Engineering are leaving money on the table.
  • 2. «Failures are given, and everything will eventually fail over time» (Werner Vogels – CTO Amazon)
  • 3. Why? 1. Growth of microservices and distributed cloud architectures 2. The web has grown increasingly complex 3. We all depend on these systems more than ever 4. Failures have become much harder to predict 5. These failures cause severe outages for companies
  • 4. From On-Premises ... 1. Before the Cloud, users were connected to our application through the Company’s local network 2. A server’s downtime was planned and involved stopping production 3. Monolithic
  • 5. ... To Cloud 1. Now our users are connected through the Internet 2. The workload to which our services are subjected will increase significantly, thanks to the wider reach of the applications themselves 3. Many microservices replace one monolith
  • 6. Microservices: is it really a matter of size? Common characteristics: Componentisation via services; Organised around business capabilities; Decentralised data management; Products not projects; Decentralised governance; Smart endpoints and dumb pipes; Evolutionary design; Infrastructure automation; Designed for failure. «We cannot say there is a formal definition of the microservices architectural style, but we can attempt to describe what we see as common characteristics for architectures that fit the label.» (Martin Fowler, James Lewis)
  • 7. Or is it a matter of paradigms?
  • 8. Or is it a matter of paradigms?
  • 9.
  • 10. Change Mindset Building a reliable application in the cloud is different than building a reliable application in an enterprise setting
  • 11. Reactive Manifesto 1. Jonas Bonér, Dave Farley, Roland Kuhn, Martin Thompson – 16.01.2014 2. The single most important thing is for the system to be responsive. This means that a reactive system needs to remain responsive even when a failure occurs. • https://www.reactivemanifesto.org/it
  • 12. Resilient System • Networks • Servers • Applications • Processes • People Resilience is the ability of a system to adapt to changes, failures & disturbances Resilience is a function of People & Culture
  • 13. Failures are given. Availability and downtime per year: 95% (1-nine): 18 days 6 hours; 99% (2-nines): 3 days 15 hours; 99.9% (3-nines): 8 hours 45 minutes; 99.99% (4-nines): 52 minutes; 99.999% (5-nines): 5 minutes; 99.9999% (6-nines): 31 seconds
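The downtime figures on this slide follow from multiplying the unavailable fraction by the length of a year. A minimal sketch, assuming a 365-day year; the name `downtime_per_year` is illustrative and does not come from the slides:

```python
from datetime import timedelta

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def downtime_per_year(availability_percent: float) -> timedelta:
    """Return the expected yearly downtime for a given availability percentage."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return timedelta(seconds=unavailable_fraction * SECONDS_PER_YEAR)


# Print the downtime for each availability level listed on the slide.
for pct in (95, 99, 99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% -> {downtime_per_year(pct)}")
```

Each extra nine divides the allowed downtime by ten, which is why the step from three to five nines is so expensive to achieve.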
  • 14. The beauty of Math at work. Table 1 (Component: Availability, Downtime): X: 99% (2-nines), 3 days 15 hours; Y: 99.99% (4-nines), 52 minutes; X and Y combined: 98.99%, 3 days 16 hours 33 minutes. Table 2 (Component: Availability, Downtime): X: 99% (2-nines), 3 days 15 hours; Two X in parallel: 99.99% (4-nines), 52 minutes; Three X in parallel: 99.9999% (6-nines), 31 seconds
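The combined figures on this slide can be reproduced with two small formulas. This is a sketch that assumes component failures are independent: a chain of dependencies multiplies availabilities, while redundant replicas multiply unavailabilities. The function names are illustrative.

```python
from math import prod


def serial_availability(*components: float) -> float:
    """Availability when every component must be up (a dependency chain)."""
    return prod(components)


def parallel_availability(component: float, replicas: int) -> float:
    """Availability when at least one of several independent replicas must be up."""
    return 1.0 - (1.0 - component) ** replicas


x, y = 0.99, 0.9999
print(f"X and Y combined:    {serial_availability(x, y):.4%}")
print(f"Two X in parallel:   {parallel_availability(x, 2):.4%}")
print(f"Three X in parallel: {parallel_availability(x, 3):.4%}")
```

The independence assumption is the weak point in practice: replicas that share a network, a deployment, or a configuration tend to fail together, which is one of the things chaos experiments are meant to expose.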
  • 15. Chaos Engineering Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production. https://principlesofchaos.org • Instead of trying to avoid failure, chaos engineering embraces it • Provide evidence of system weaknesses through scientific chaos engineering experiments • Which kind of weaknesses? Dark Debt
  • 16.
  • 17. History 1. 1564-1642: Galileo Galilei introduces the experimental scientific method 2. 1879-1955: «No amount of experiments will prove me right; a single experiment will prove me wrong» (A. Einstein) 3. 2000: Game Day by Jesse Robbins, the Master of Disaster 4. 2010: Chaos Monkey by Netflix. Why? To support the move from physical infrastructure to cloud infrastructure 5. 2011: Simian Army. We have to design a cloud architecture where individual components can fail without affecting the availability of the entire system 6. 2012: Netflix shared Chaos Monkey on GitHub 7. 2014: A new role. Chaos Engineer
  • 18. Once upon a time in Seattle «You don’t choose the moment, the moment chooses you» «You only choose how prepared you are when it does» Jesse Robbins, the Master of Disaster at Amazon
  • 19. Chaos Experiment vs Testing Testing • Several sets of inputs and predicted outputs • Limited scopes • Is a programming practice that instructs developers • Testing, strictly speaking, does not create new knowledge Chaos Experiment • Discover weaknesses through experiments • Limited scopes • Experimentation creates new knowledge
  • 20. Game Day • An exercise designed to increase Resilience through large-scale fault injection across critical systems. • The goal of a Game Day is to practice how you, your team, and your supporting system deal with real-world turbulent conditions. • Creating Resiliency through destruction
  • 21. Sociotechnical System Before starting your journey into chaos engineering, make sure you’ve done your homework and have built resiliency into every level of your organization. Building resilient systems isn’t all about software. It starts at the infrastructure layer, progresses to the network and data, influences application design and extends to people and culture. Adrian Hornsby
  • 22. Notifications and Approvals Name Role Approved? Bob Jennifer Owner (CEO) Yes • Remember Conway’s Law Table of notifications and approvals
  • 23.
  • 24. Dark Debt • Dark Debt is not recognizable at the time of creation. • Dark Debt arises from the unforeseen interactions of hardware or software with other parts of the framework. • Dark Debt is invisible until an anomaly reveals its presence. • Platform • Applications • People, practices, and processes
  • 25. The Phases of Chaos Engineering Chaos engineering is NOT about letting monkeys loose or allowing them to break things randomly without a purpose. Chaos engineering is about breaking things in a controlled environment.
  • 26. Start with Experiments • Get your team together and come up with a picture of your system (including people, practices, processes) • Ask the right questions: - Where would it be most valuable to create an experiment that helps us build trust and confidence in our system under turbulent conditions? - What could possibly go wrong? • Chaos Engineering doesn’t guarantee you have the perfect system • Chaos Engineering never ends • Likelihood and Impact
  • 27. Checkmate in three moves Preparation • Identification and mitigation of risks and impact from failure • Reduces frequency of failures (improves MTBF) • Reduces duration of recovery (improves MTTR) Participation • Builds confidence & competence responding to failure and under stress • Strengthens individual and cultural ability to anticipate, mitigate, respond to, and recover from failures of all types Exercises • Trigger and expose «latent defects» • Choose to discover them, instead of letting that be determined by the next real disaster.
  • 28. Likelihood-Impact Map • The likelihood that a failure may occur • The potential impact your system will experience if it does API products becomes unavailable Contribution Availability
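One way to turn a likelihood-impact map into an ordered list of experiment candidates is sketched below. The 1 to 5 scales and the likelihood-times-impact score are assumptions made for illustration, not something the slide prescribes, and every scenario except the first is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureScenario:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def priority(self) -> int:
        # Assumption: likelihood x impact as a simple ranking score.
        return self.likelihood * self.impact


scenarios = [
    FailureScenario("API product becomes unavailable", likelihood=3, impact=5),
    FailureScenario("Cache node restarts", likelihood=4, impact=2),         # hypothetical
    FailureScenario("Region-wide network outage", likelihood=1, impact=5),  # hypothetical
]

# Highest score first: these are the failures most worth exploring with an experiment.
ranked = sorted(scenarios, key=lambda s: s.priority, reverse=True)
for s in ranked:
    print(f"{s.priority:>2}  {s.name}")
```

A single score hides the difference between "likely but mild" and "rare but severe", so the two-dimensional map remains the better discussion tool; the ranking is only a starting point.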
  • 29. Describe Your Experiment • A steady-state hypothesis: A set of measurements that indicates that the system is working in an expected way from a business perspective, and within a given set of tolerances • A method: The set of activities you’re going to use to inject the turbulent conditions into the target system • Rollbacks: A set of remediating actions through which you will attempt to repair what you have done knowingly in your experiment’s method Explore Discover Analyze Validate Improve
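The three parts of an experiment described on this slide can be written down as plain data. The sketch below is not the format of any particular tool; the class names, the fake `orders_per_minute` metric, and the tolerance range are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Probe:
    """One steady-state measurement with its accepted tolerance range."""
    name: str
    measure: Callable[[], float]  # returns the current value of a business metric
    low: float
    high: float

    def within_tolerance(self) -> bool:
        return self.low <= self.measure() <= self.high


@dataclass
class Experiment:
    title: str
    steady_state: List[Probe]                 # the steady-state hypothesis
    method: List[Callable[[], None]]          # actions that inject turbulent conditions
    rollbacks: List[Callable[[], None]] = field(default_factory=list)

    def steady_state_holds(self) -> bool:
        return all(p.within_tolerance() for p in self.steady_state)


# Hypothetical example: a fake metric stands in for real monitoring.
orders_per_minute = {"value": 120.0}

experiment = Experiment(
    title="Checkout survives the loss of one instance",
    steady_state=[
        Probe("orders per minute", lambda: orders_per_minute["value"], 100.0, 200.0)
    ],
    method=[lambda: orders_per_minute.update(value=40.0)],      # simulated fault
    rollbacks=[lambda: orders_per_minute.update(value=120.0)],  # simulated repair
)

print(experiment.steady_state_holds())
```

Expressing the hypothesis as a business metric with a tolerance range, rather than as a pass/fail check on a single host, keeps the experiment focused on what users actually experience.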
  • 30.
  • 31. Demo Explore Discover Analyze Validate Improve 1. Using a chaos experiment to explore and discover weaknesses in the target system 2. Using a chaos experiment to discover and begin to analyze any weaknesses surfaced in the system 3. Once the challenge of analysis is done, it’s time to apply an improvement to the system (if needed) 4. Your chaos experiment becomes a chaos test to detect whether the weakness has indeed been overcome.
  • 32. Demo Explore Discover Analyze Validate Improve 1. Using a chaos experiment to explore and discover weaknesses in the target system 2. Using a chaos experiment to discover and begin to analyze any weaknesses surfaced in the system 3. Once the challenge of analysis is done, it’s time to apply an improvement to the system (if needed) 4. Your chaos experiment becomes a chaos test to detect whether the weakness has indeed been overcome.
  • 33. Demo Explore Discover Analyze Validate Improve 1. Using a chaos experiment to explore and discover weaknesses in the target system 2. Using a chaos experiment to discover and begin to analyze any weaknesses surfaced in the system 3. Once the challenge of analysis is done, it’s time to apply an improvement to the system (if needed) 4. Your chaos experiment becomes a chaos test to detect whether the weakness has indeed been overcome.
  • 34. Demo Explore Discover Analyze Validate Improve 1. Using a chaos experiment to explore and discover weaknesses in the target system 2. Using a chaos experiment to discover and begin to analyze any weaknesses surfaced in the system 3. Once the challenge of analysis is done, it’s time to apply an improvement to the system (if needed) 4. Your chaos experiment becomes a chaos test to detect whether the weakness has indeed been overcome.
  • 35. Under the skin of chaos run [Flowchart] Start, then: Experiment valid? No: experiment aborted. Yes: check the steady-state hypothesis. Not within tolerances: experiment aborted. Within tolerances: execute the method, then check the steady-state hypothesis again. Within tolerances: no deviations found. Not within tolerances: deviations found.
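The control flow on this slide can be sketched in a few lines. This is an illustration of the flow only, not the implementation of any real tool; `run_experiment` and its parameters are hypothetical, and running the rollbacks after every executed method is an assumption.

```python
from enum import Enum
from typing import Callable, Iterable


class Outcome(Enum):
    ABORTED = "experiment aborted"
    NO_DEVIATIONS = "no deviations found"
    DEVIATIONS = "deviations found"


def run_experiment(
    is_valid: Callable[[], bool],
    steady_state_ok: Callable[[], bool],
    method: Iterable[Callable[[], None]],
    rollbacks: Iterable[Callable[[], None]] = (),
) -> Outcome:
    # 1. An invalid experiment never touches the system.
    if not is_valid():
        return Outcome.ABORTED
    # 2. If the system is already outside tolerances, injecting faults teaches nothing.
    if not steady_state_ok():
        return Outcome.ABORTED
    try:
        # 3. Inject the turbulent conditions.
        for action in method:
            action()
        # 4. Check the hypothesis again; the result is computed before rollbacks run.
        return Outcome.NO_DEVIATIONS if steady_state_ok() else Outcome.DEVIATIONS
    finally:
        # 5. Assumption: rollbacks always run once the method has started.
        for undo in rollbacks:
            undo()


# Hypothetical system whose health flag is flipped by the injected fault.
state = {"healthy": True}
result = run_experiment(
    is_valid=lambda: True,
    steady_state_ok=lambda: state["healthy"],
    method=[lambda: state.update(healthy=False)],
    rollbacks=[lambda: state.update(healthy=True)],
)
print(result.value)
```

The early abort in step 2 matters: a deviation observed after the method is only evidence of a weakness if the system was within tolerances before the fault was injected.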
  • 36. Steady-state hypothesis Model that characterizes the steady-state of the system based on expected values of the business metrics. Chaos Engineering
  • 37. Canary Deployment Start small and slowly build confidence within your team and your organization. - How many customers are affected? - What functionality is impaired? - Which locations are impacted?
  • 38. Benefits of Chaos Engineering - First, chaos engineering helps you uncover the unknowns in your system and fix them before they happen in production at 3am during the weekend; so, first, improved resiliency and sleep. - Second, a successful chaos engineering practice always generates a lot more changes than anticipated, and these are mostly cultural. Probably the most important of these changes is a natural evolution towards a “non-blaming” culture: the “Why did you do that?” turns into a “How can we avoid doing that in the future?”, resulting in happier and more efficient, empowered, engaged and successful teams. And that’s gold!
  • 39. Books and Resources Principles of Chaos Engineering https://github.com/chaostoolkit