Antifragility and testing
distributed systems
Approaches for testing and improving resiliency
Failure
It’s inevitable
Microservice Architectures
■ Bounded contexts
■ Deterministic in nature
■ Simple behaviour
■ Independently testable (e.g. Pact)
Distributed Architectures
Conversely…
■ Unbounded context
■ Non-determinism
■ Exhibit chaotic behaviour
■ Emergent behaviour
■ Complex testing
Problems with traditional approaches
■ Integration test hell
■ Need to get by without E2E environments
■ Learnings are non-representative anyway
■ Slower
■ Costly (effort + $$)
Alternative?
Create an isolated, simulated environment
■ Run locally or on a CI environment
■ Fast - no need to set up complex test data, scenarios etc.
■ Enables single-variable hypothesis testing
■ Automatable
Lab Testing w/ Docker Compose
Hypothesis testing in simulated environments
Docker Compose
■ Docker container orchestration tool
■ Run locally or remotely
■ Works across platforms (Windows, Mac, *nix)
■ Easy to use
Nginx
Let’s take a practical, real-world example: Nginx as an API Proxy.
Simulating failure with Muxy
“A tool to help simulate distributed systems failures”
Hypothesis testing
Our job is to hypothesise, test, learn, change, and repeat
Nginx Testing
H0 = Introducing network latency does not cause errors
Test setup:
● Nginx running locally, with Production configuration
● DNSMasq used to resolve production URLs to other Docker
containers
● Muxy container set up, proxying the API
● A test harness to hit the API via Nginx n times, expecting 0
failures
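The test harness in the last bullet could look roughly like the sketch below. It is a minimal illustration, not the harness used in the demo: `run_trials`, `healthy_api` and `slow_api` are hypothetical names, and in a real run the callable would make an HTTP request through Nginx rather than return a canned status.

```python
import time
from typing import Callable


def run_trials(call_api: Callable[[], int], n: int, timeout_s: float) -> int:
    """Call the API n times and return the number of failed calls.

    A call counts as a failure if it raises, returns a non-2xx status,
    or takes longer than timeout_s seconds.
    """
    failures = 0
    for _ in range(n):
        start = time.monotonic()
        try:
            status = call_api()
        except Exception:
            failures += 1
            continue
        elapsed = time.monotonic() - start
        if not 200 <= status < 300 or elapsed > timeout_s:
            failures += 1
    return failures


def healthy_api() -> int:
    # Hypothetical stand-in for an HTTP call via Nginx.
    return 200


def slow_api() -> int:
    # Hypothetical stand-in for the same call with injected latency.
    time.sleep(0.05)
    return 200


if __name__ == "__main__":
    # H0 survives only if the failure count stays at 0.
    print("failures (healthy):", run_trials(healthy_api, n=5, timeout_s=1.0))
    print("failures (slow):", run_trials(slow_api, n=5, timeout_s=0.01))
```

Passing the call in as a function keeps the harness independent of how the failure is injected, so the same loop works whether latency comes from Muxy or from a local stub.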
Demo
Fingers crossed...
Knobs and Levers
We now have a number of levers to pull. What if we...
● Want to improve on our SLA?
● Want to see how it performs if the API is hard down?
● ...
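To make the idea of levers concrete, here is a small in-process sketch. In the talk the failures are injected at the network layer by Muxy; the wrapper below is only a stand-in for that, and `with_faults`, `latency_s` and `hard_down` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def with_faults(call: Callable[[], int], latency_s: float = 0.0,
                hard_down: bool = False) -> Callable[[], int]:
    """Wrap a call with two levers: added latency and a hard-down switch."""
    def wrapped() -> int:
        if hard_down:
            # Simulate the upstream refusing connections.
            raise ConnectionError("simulated: upstream is hard down")
        if latency_s > 0:
            # Simulate a slow network or a slow upstream.
            time.sleep(latency_s)
        return call()
    return wrapped
```

Each lever changes exactly one variable, which keeps every experiment a single-variable hypothesis test.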
Antifragility
Failure is inevitable, let’s make it normal
Titanic Architectures
“Titanic architectures are architectures that are good in theory, but
haven’t been put into practice”
Anti-titanic architectures?
“What doesn’t kill you makes you stronger”
Antifragility
“The resilient resists shocks and stays the same; the antifragile gets
better” - Nassim Taleb
Chaos Engineering
● We expect our teams to build resilient applications
○ Fault tolerance across and within service boundaries
● We expect servers and dependent services to fail
● Let’s make that normal
● Production is a playground
● Levelling up
Chaos Engineering - Principles
1. Build a hypothesis around Steady State Behavior
2. Vary real-world events
3. Run experiments in production
4. Automate experiments to run continuously
Requires the ability to measure - you need metrics!!
http://www.principlesofchaos.org/
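A steady-state hypothesis needs a number to compare against. The sketch below shows one simple way to phrase it, assuming a single metric sampled before and during the experiment; the function name, the 5% tolerance and the sample values are assumptions for illustration, not part of the Principles of Chaos.

```python
from statistics import mean
from typing import Sequence


def steady_state_holds(baseline: Sequence[float], experiment: Sequence[float],
                       tolerance: float = 0.05) -> bool:
    """Return True if the experiment mean is within `tolerance`
    (as a fraction) of the baseline mean."""
    base = mean(baseline)
    if base == 0:
        return mean(experiment) == 0
    return abs(mean(experiment) - base) / abs(base) <= tolerance


if __name__ == "__main__":
    # Hypothetical requests-per-second samples.
    normal = [1000, 1010, 990]
    print(steady_state_holds(normal, [980, 1005, 995]))  # small drift
    print(steady_state_holds(normal, [700, 720, 710]))   # large drop
```

A real check would likely use a more robust statistic than the mean, but the shape is the same: define normal first, then test whether the experiment moved it.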
Production Hypothesis Testing
H0 = Loss of an AWS region does not result in errors
Test setup:
● Multi-region application setup for the video playing API
● Apply Chaos Kong to us-west-2
● Measure aggregate production traffic for ‘normal’ levels
Kill an AWS region
http://techblog.netflix.com/2015/09/chaos-engineering-upgraded.html
Go/Hystrix API Demo
H0 = Introducing network latency does not cause API errors
Test setup:
● API1 running with a Hystrix circuit breaker that trips if API2 does
not respond within SLAs
● Muxy container setup, proxying upstream API2
● A test harness to hit API1 n times, expecting 0 failures
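For readers unfamiliar with the pattern, the sketch below shows the general circuit breaker idea in a few lines. It is not Hystrix and not the code from the demo: the class, its parameters and the fallback convention are simplified assumptions, written in Python rather than Go.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal circuit breaker: opens after `max_failures` consecutive
    failures and rejects calls until `reset_after_s` seconds have passed."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], str], fallback: Callable[[], str]) -> str:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Open: fail fast without touching the upstream.
                return fallback()
            # Half-open: let one trial call through; one failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        return result
```

Because every failed or rejected call returns the fallback, a harness hitting API1 can still see 0 failures while API2 is slow or down, which is exactly what H0 is probing.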
Human Factors
Technology is only part of the problem. Can we test
the human part too?
Chernobyl
● Worst nuclear disaster of all time (1986)
● Public information sketchy
● Estimated > 3M Ukrainians affected
● Radioactive clouds sent over Europe
● Combination of system + human errors
● Series of seemingly logical steps -> catastrophe
What we know about human factors
● Accidents happen
● 1am - 8am = higher incidence of human errors
● Humans will ignore directions
○ They sometimes need to (e.g. override)
○ Other times they think they need to (mistake)
● Computers are better at following processes
Let’s use a Production deployment as a key example:
● CI -> CD pipeline used to deploy
● Production incident occurs 6 hours later (2am)
● ...what do we do?
● We trust the build pipeline, avoid non-standard
actions
These events help us understand and improve our
systems
Translation
“ A game day exercise is where we intentionally try to
break our system, with the goal of being able to
understand it better and learn from it ”
Game Day Exercises
Prerequisites:
● A game plan
● All team members and affected staff aware of it
● Close collaboration between Dev, Ops, Test,
Product people etc.
● An open mind
● Hypotheses
● Metrics
● Bravery
Game Day Exercises
● Get entire team together
● Make a simple diagram of system on a whiteboard
● Come up with ~5 failure scenarios
● Write down hypotheses for each scenario
● Back up any data you can’t lose
● Induce each failure and observe the results
Game Day Exercises
https://stripe.com/blog/game-day-exercises-at-stripe
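One lightweight way to keep the scenarios, hypotheses and results together is a small record per scenario. The sketch below is only a suggestion for how a team might write them down; the `Scenario` fields and the `surprises` helper are hypothetical and do not come from the Stripe write-up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scenario:
    failure: str                    # what we will break
    hypothesis: str                 # what we expect to happen
    observed: Optional[str] = None  # filled in during the exercise
    held: Optional[bool] = None     # did the hypothesis hold?


def surprises(scenarios: List[Scenario]) -> List[Scenario]:
    """Return scenarios whose hypothesis was tested and did not hold."""
    return [s for s in scenarios if s.held is False]
```

The scenarios that come back from `surprises` are the ones worth a follow-up, since they are where the team's mental model and the real system disagree.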
Examples of things that fail:
● Application dies
● Hard disk fails
● Machine dies < AZ < Region…
● Github/Source control goes down
● Build server dies
● Loss of / degraded network connectivity
● Loss of dependent API
● ...
Game Day Exercises
Wrapping up
I hope I didn’t fail
■ Apply the scientific method
■ Use metrics to learn and make decisions
■ Docker-compose + Muxy to automate failure
■ Build resilience into software & architecture
■ Regularly test Production resilience until it’s normal
■ Production outages are opportunities to learn
■ Start small!
Wrapping up
Thank you
PRESENTED BY:
@matthewfellows
■ Antifragility (https://en.wikipedia.org/wiki/Antifragile)
■ Chaos Engineering (http://techblog.netflix.com/2014/09/introducing-chaos-engineering.html)
■ Principles of Chaos (http://www.principlesofchaos.org/)
■ Human factors in large-scale technological systems' accidents: Three Mile Island, Bhopal, Chernobyl (http://oae.sagepub.com/content/5/2/133.abstract)
References
■ Docker Compose (https://www.docker.com/docker-compose)
■ Muxy (https://github.com/mefellows/muxy)
■ Nginx resilience testing with Docker Compose (www.onegeek.com.au/articles/resilience-testing-nginx-with-docker-dnsmasq-and-muxy)
■ Golang + Hystrix resilience testing with Docker Compose (https://github.com/mefellows/muxy/tree/mst-meetup-demo/examples/hystrix)
Code / Tool References

Weitere ähnliche Inhalte

Was ist angesagt?

About performance testing with NanoCloud
About performance testing with NanoCloudAbout performance testing with NanoCloud
About performance testing with NanoCloud
artem_panasyuk
 

Was ist angesagt? (20)

Smarter deployments with octopus deploy
Smarter deployments with octopus deploySmarter deployments with octopus deploy
Smarter deployments with octopus deploy
 
TestWorks Conf Performance testing made easy with gatling - Guillaume Corré
TestWorks Conf Performance testing made easy with gatling - Guillaume CorréTestWorks Conf Performance testing made easy with gatling - Guillaume Corré
TestWorks Conf Performance testing made easy with gatling - Guillaume Corré
 
Introduction to Automated Testing
Introduction to Automated TestingIntroduction to Automated Testing
Introduction to Automated Testing
 
Five Easy Ways to QA Your Drupal Site
Five Easy Ways to QA Your Drupal SiteFive Easy Ways to QA Your Drupal Site
Five Easy Ways to QA Your Drupal Site
 
Agile testing for large projects
Agile testing for large projectsAgile testing for large projects
Agile testing for large projects
 
Hadoop Summit 2013 : Continuous Integration on top of hadoop
Hadoop Summit 2013 : Continuous Integration on top of hadoopHadoop Summit 2013 : Continuous Integration on top of hadoop
Hadoop Summit 2013 : Continuous Integration on top of hadoop
 
Php Inspections (EA Extended): The Cookbook
Php Inspections (EA Extended): The CookbookPhp Inspections (EA Extended): The Cookbook
Php Inspections (EA Extended): The Cookbook
 
Introduction to K6
Introduction to K6Introduction to K6
Introduction to K6
 
Getting started with Octopus Deploy
Getting started with Octopus DeployGetting started with Octopus Deploy
Getting started with Octopus Deploy
 
Nelson: Rigorous Deployment for a Functional World
Nelson: Rigorous Deployment for a Functional WorldNelson: Rigorous Deployment for a Functional World
Nelson: Rigorous Deployment for a Functional World
 
Raise the bar! Reloaded
Raise the bar! ReloadedRaise the bar! Reloaded
Raise the bar! Reloaded
 
Continuous delivery - tools and techniques
Continuous delivery - tools and techniquesContinuous delivery - tools and techniques
Continuous delivery - tools and techniques
 
Oscp - Journey
Oscp - JourneyOscp - Journey
Oscp - Journey
 
Software Testing
Software TestingSoftware Testing
Software Testing
 
Octopus Deploy Tech Fest 2014
Octopus Deploy Tech Fest 2014Octopus Deploy Tech Fest 2014
Octopus Deploy Tech Fest 2014
 
Blazing Fast Feedback Loops in the Java Universe
Blazing Fast Feedback Loops in the Java UniverseBlazing Fast Feedback Loops in the Java Universe
Blazing Fast Feedback Loops in the Java Universe
 
Continuous Integration as a Way of Life
Continuous Integration as a Way of LifeContinuous Integration as a Way of Life
Continuous Integration as a Way of Life
 
Fault tolerance - look, it's simple!
Fault tolerance - look, it's simple!Fault tolerance - look, it's simple!
Fault tolerance - look, it's simple!
 
Release Often Release Safely
Release Often Release SafelyRelease Often Release Safely
Release Often Release Safely
 
About performance testing with NanoCloud
About performance testing with NanoCloudAbout performance testing with NanoCloud
About performance testing with NanoCloud
 

Andere mochten auch

Automated Abstraction of Flow of Control in a System of Distributed Software...
Automated Abstraction of Flow of Control in a System of Distributed  Software...Automated Abstraction of Flow of Control in a System of Distributed  Software...
Automated Abstraction of Flow of Control in a System of Distributed Software...
nimak
 

Andere mochten auch (8)

The case for consumer-driven contracts
The case for consumer-driven contractsThe case for consumer-driven contracts
The case for consumer-driven contracts
 
Deploy with Confidence using Pact Go!
Deploy with Confidence using Pact Go!Deploy with Confidence using Pact Go!
Deploy with Confidence using Pact Go!
 
Automated Abstraction of Flow of Control in a System of Distributed Software...
Automated Abstraction of Flow of Control in a System of Distributed  Software...Automated Abstraction of Flow of Control in a System of Distributed  Software...
Automated Abstraction of Flow of Control in a System of Distributed Software...
 
Building A Distributed Build System at Google Scale (StrangeLoop 2016)
Building A Distributed Build System at Google Scale (StrangeLoop 2016)Building A Distributed Build System at Google Scale (StrangeLoop 2016)
Building A Distributed Build System at Google Scale (StrangeLoop 2016)
 
Continous Integration: A Case Study
Continous Integration: A Case StudyContinous Integration: A Case Study
Continous Integration: A Case Study
 
Distributed Testing Environment
Distributed Testing EnvironmentDistributed Testing Environment
Distributed Testing Environment
 
Microservice Architecture
Microservice ArchitectureMicroservice Architecture
Microservice Architecture
 
Microservices = Death of the Enterprise Service Bus (ESB)?
Microservices = Death of the Enterprise Service Bus (ESB)?Microservices = Death of the Enterprise Service Bus (ESB)?
Microservices = Death of the Enterprise Service Bus (ESB)?
 

Ähnlich wie Antifragility and testing for distributed systems failure

RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...
RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...
RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...
dcieslak
 
WSO2Con Asia 2014 - Agile DevOps in the Cloud
WSO2Con Asia 2014 - Agile DevOps in the CloudWSO2Con Asia 2014 - Agile DevOps in the Cloud
WSO2Con Asia 2014 - Agile DevOps in the Cloud
WSO2
 

Ähnlich wie Antifragility and testing for distributed systems failure (20)

Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
 
Expedia 3x3 presentation
Expedia 3x3 presentationExpedia 3x3 presentation
Expedia 3x3 presentation
 
Resilience Testing
Resilience Testing Resilience Testing
Resilience Testing
 
Continuous integration (eng)
Continuous integration (eng)Continuous integration (eng)
Continuous integration (eng)
 
MockServer-driven testing
MockServer-driven testingMockServer-driven testing
MockServer-driven testing
 
Writing Tests with the Unity Test Framework
Writing Tests with the Unity Test FrameworkWriting Tests with the Unity Test Framework
Writing Tests with the Unity Test Framework
 
Remote iOS Devices Server – Scaling iOS
Remote iOS Devices Server – Scaling iOSRemote iOS Devices Server – Scaling iOS
Remote iOS Devices Server – Scaling iOS
 
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst ITThings You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
 
RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...
RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...
RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...
 
Hacking Vulnerable Websites to Bypass Firewalls
Hacking Vulnerable Websites to Bypass FirewallsHacking Vulnerable Websites to Bypass Firewalls
Hacking Vulnerable Websites to Bypass Firewalls
 
ContainerCon - Test Driven Infrastructure
ContainerCon - Test Driven InfrastructureContainerCon - Test Driven Infrastructure
ContainerCon - Test Driven Infrastructure
 
[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...
[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...
[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...
 
JUST EAT: Embracing DevOps
JUST EAT: Embracing DevOpsJUST EAT: Embracing DevOps
JUST EAT: Embracing DevOps
 
Rise of the machines: Continuous Delivery at SEEK - YOW! Night Summary Slides
Rise of the machines: Continuous Delivery at SEEK - YOW! Night Summary SlidesRise of the machines: Continuous Delivery at SEEK - YOW! Night Summary Slides
Rise of the machines: Continuous Delivery at SEEK - YOW! Night Summary Slides
 
Deploying software at Scale
Deploying software at ScaleDeploying software at Scale
Deploying software at Scale
 
Jenkinsconf Presentation - Advance jenkins management with multiple projects.
Jenkinsconf Presentation - Advance jenkins management with multiple projects.Jenkinsconf Presentation - Advance jenkins management with multiple projects.
Jenkinsconf Presentation - Advance jenkins management with multiple projects.
 
CI
CICI
CI
 
Agile devops in the cloud
Agile devops in the cloudAgile devops in the cloud
Agile devops in the cloud
 
WSO2Con Asia 2014 - Agile DevOps in the Cloud
WSO2Con Asia 2014 - Agile DevOps in the CloudWSO2Con Asia 2014 - Agile DevOps in the Cloud
WSO2Con Asia 2014 - Agile DevOps in the Cloud
 
Raising ux bar with offline first design
Raising ux bar with offline first designRaising ux bar with offline first design
Raising ux bar with offline first design
 

Mehr von DiUS

Mehr von DiUS (19)

Lunch and Learn: You have the data, now what?
Lunch and Learn: You have the data, now what?Lunch and Learn: You have the data, now what?
Lunch and Learn: You have the data, now what?
 
How to build confidence in your release cycle
How to build confidence in your release cycleHow to build confidence in your release cycle
How to build confidence in your release cycle
 
Serverless microservices: Test smarter, not harder
Serverless microservices: Test smarter, not harderServerless microservices: Test smarter, not harder
Serverless microservices: Test smarter, not harder
 
Test Smart, not hard
Test Smart, not hardTest Smart, not hard
Test Smart, not hard
 
10 things-to-inspire-in-10-mins
10 things-to-inspire-in-10-mins10 things-to-inspire-in-10-mins
10 things-to-inspire-in-10-mins
 
Trends and development practices in Serverless architectures
Trends and development practices in Serverless architecturesTrends and development practices in Serverless architectures
Trends and development practices in Serverless architectures
 
Deploying large-scale, serverless and asynchronous systems - without integrat...
Deploying large-scale, serverless and asynchronous systems - without integrat...Deploying large-scale, serverless and asynchronous systems - without integrat...
Deploying large-scale, serverless and asynchronous systems - without integrat...
 
The Diversity Dilemma - Supporting our Sisters in STEM
The Diversity Dilemma - Supporting our Sisters in STEMThe Diversity Dilemma - Supporting our Sisters in STEM
The Diversity Dilemma - Supporting our Sisters in STEM
 
GameDay - Achieving resilience through Chaos Engineering
GameDay - Achieving resilience through Chaos EngineeringGameDay - Achieving resilience through Chaos Engineering
GameDay - Achieving resilience through Chaos Engineering
 
Crafting Quality Software
Crafting Quality SoftwareCrafting Quality Software
Crafting Quality Software
 
Metrics on the front, data in the back
Metrics on the front, data in the backMetrics on the front, data in the back
Metrics on the front, data in the back
 
DIY IoT Backend
DIY IoT BackendDIY IoT Backend
DIY IoT Backend
 
How to Build Hardware Lean
How to Build Hardware LeanHow to Build Hardware Lean
How to Build Hardware Lean
 
Behaviour Change and Coaching: What we can learn from BJ Fogg
Behaviour Change and Coaching: What we can learn from BJ FoggBehaviour Change and Coaching: What we can learn from BJ Fogg
Behaviour Change and Coaching: What we can learn from BJ Fogg
 
Power in Agile Teams
Power in Agile Teams Power in Agile Teams
Power in Agile Teams
 
The Diversity Dilemma: Attracting and Retaining Talented Women in Technology-...
The Diversity Dilemma: Attracting and Retaining Talented Women in Technology-...The Diversity Dilemma: Attracting and Retaining Talented Women in Technology-...
The Diversity Dilemma: Attracting and Retaining Talented Women in Technology-...
 
AWS Summit Melbourne 2014 | The Path to Business Agility for Vodafone: How Am...
AWS Summit Melbourne 2014 | The Path to Business Agility for Vodafone: How Am...AWS Summit Melbourne 2014 | The Path to Business Agility for Vodafone: How Am...
AWS Summit Melbourne 2014 | The Path to Business Agility for Vodafone: How Am...
 
Agile Australia 2014 | A light saber for your disruptive tool belt: the Busin...
Agile Australia 2014 | A light saber for your disruptive tool belt: the Busin...Agile Australia 2014 | A light saber for your disruptive tool belt: the Busin...
Agile Australia 2014 | A light saber for your disruptive tool belt: the Busin...
 
Agile Australia 2014 | UX: How to measure more than a gut feel by Amir Ansari
Agile Australia 2014 | UX: How to measure more than a gut feel by Amir AnsariAgile Australia 2014 | UX: How to measure more than a gut feel by Amir Ansari
Agile Australia 2014 | UX: How to measure more than a gut feel by Amir Ansari
 

Kürzlich hochgeladen

₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
Diya Sharma
 
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
@Chandigarh #call #Girls 9053900678 @Call #Girls in @Punjab 9053900678
 

Kürzlich hochgeladen (20)

Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceBusty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
 
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls DubaiDubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
 
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
 
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
 
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
 
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
Russian Call Girls in %(+971524965298  )#  Call Girls in DubaiRussian Call Girls in %(+971524965298  )#  Call Girls in Dubai
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
 
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
 
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
 
Katraj ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For S...
Katraj ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For S...Katraj ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For S...
Katraj ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For S...
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts ServiceReal Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
 
VVIP Pune Call Girls Mohammadwadi WhatSapp Number 8005736733 With Elite Staff...
VVIP Pune Call Girls Mohammadwadi WhatSapp Number 8005736733 With Elite Staff...VVIP Pune Call Girls Mohammadwadi WhatSapp Number 8005736733 With Elite Staff...
VVIP Pune Call Girls Mohammadwadi WhatSapp Number 8005736733 With Elite Staff...
 

Antifragility and testing for distributed systems failure

  • 1. Antifragility and testing distributed systems Approaches for testing and improving resiliency
  • 3. Microservice Architectures ■ Bounded contexts ■ Deterministic in nature ■ Simple behaviour ■ Independently testable (e.g. Pact)
  • 4.
  • 5. Distributed Architectures Conversely… ■ Unbounded context ■ Non-determinism ■ Exhibit chaotic behaviour ■ Emergent behaviour ■ Complex testing
  • 6.
  • 7. Problems with traditional approaches ■ Integration test hell ■ Need to get by without E2E environments ■ Learnings are non-representative anyway ■ Slower ■ Costly (effort + $$)
  • 8. Alternative? Create an isolated, simulated environment ■ Run locally or on a CI environment ■ Fast - no need to setup complex test data, scenarios etc. ■ Enables single-variable hypothesis testing ■ Automatable
  • 9. Lab Testing w Docker Compose Hypothesis testing simulated environments
  • 10. Docker Compose ■ Docker container orchestration tool ■ Run locally or remotely ■ Works across platforms (Windows, Mac, *nix) ■ Easy to use
  • 11.
  • 12. Nginx Let’s take a practical, real-world example: Nginx as an API Proxy.
  • 13.
  • 14. Simulating failure with Muxy “A tool to help simulate distributed systems failures”
  • 15. Hypothesis testing Our job is to hypothesise, test, learn, change, and repeat
  • 16. Nginx Testing H0 = Introducing network latency does not cause errors Test setup: ● Nginx running locally, with Production configuration ● DNSMasq used to resolve production urls to other Docker containers ● Muxy container setup, proxying the API ● A test harness to hit the API via Nginx n times, expecting 0 failures
  • 17.
  • 19. Knobs and Levers We can now have a number of levers to pull. What if we... ● Want to improve on our SLA? ● Want to see how it performs if the API is hard down? ● ...
  • 20. Antifragility Failure is inevitable, let’s make it normal
  • 22. Titanic Architectures “Titanic architectures are architectures that are good in theory, but haven’t been put into practice”
  • 23. Anti-titanic architectures? “What doesn’t kill you makes you stronger”
  • 24. Antifragility “The resilient resists shocks and stays the same; the antifragile gets better” - Nasim Taleb
  • 25. Chaos Engineering ● We expect our teams to build resilient applications ○ Fault tolerance across and within service boundaries ● We expect servers and dependent services to fail ● Let’s make that normal ● Production is a playground ● Levelling up
  • 26. Chaos Engineering - Principles 1. Build a hypothesis around Steady State Behavior 2. Vary real-world events 3. Run experiments in production 4. Automate experiments to run continuously Requires the ability to measure - you need metrics!! http://www.principlesofchaos.org/
  • 27. Production Hypothesis Testing H0 = Loss of an AWS region does not result in errors Test setup: ● Multi-region application setup for the video playing API ● Apply Chaos Kong to us-west-2 ● Measure aggregate production traffic for ‘normal’ levels
  • 28. Kill an AWS region http://techblog.netflix.com/2015/09/chaos-engineering-upgraded.html
  • 29. Go/Hystrix API Demo H0 = Introducing network latency does not cause API errors Test setup: ● API1 running with Hystrix circuit breaker enabled if API2 does not respond within SLAs ● Muxy container setup, proxying upstream API2 ● A test harness to hit API1 n times, expecting 0 failures
  • 30. Human Factors Technology is only part of the problem, can we test that too?
  • 31.
  • 32. Chernobyl ● Worst nuclear disaster of all time (1986) ● Public information sketchy ● Estimated > 3M Ukrainians affected ● Radioactive clouds sent over Europe ● Combination of system + human errors ● Series of seemingly logical steps -> catastrophe
  • 33. What we know about human factors ● Accidents happen ● 1am - 8am = higher incidence of human errors ● Humans will ignore directions ○ They sometimes need to (e.g. override) ○ Other times they think they need to (mistake) ● Computers are better at following processes
  • 34. Let’s use a Production deployment as a key example: ● CI -> CD pipeline used to deploy ● Production incident occurs 6 hours later (2am) ● ...what do we do? ● We trust the build pipeline, avoid non-standard actions These events help us understand and improve our systems Translation
  • 35. “ A game day exercise is where we intentionally try to break our system, with the goal of being able to understand it better and learn from it ” Game Day Exercises
  • 36. Prerequisites: ● A game plan ● All team members and affected staff aware of it ● Close collaboration between Dev, Ops, Test, Product people etc. ● An open mind ● Hypotheses ● Metrics ● Bravery Game Day Exercises
  • 37. ● Get entire team together ● Make a simple diagram of system on a whiteboard ● Come up with ~5 failure scenarios ● Write down hypotheses for each scenario ● Backup any data you can’t lose ● Induce each failure and observe the results Game Day Exercises https://stripe.com/blog/game-day-exercises-at-stripe
  • 38. Examples of things that fail: ● Application dies ● Hard disk fail ● Machine dies < AZ < Region… ● Github/Source control goes down ● Build server dies ● Loss of degraded network connectivity ● Loss of dependent API ● ... Game Day Exercises
  • 39. Wrapping up I hope I didn’t fail
  • 40. ■ Apply the scientific method ■ Use metrics to make learn and make decisions ■ Docker-compose + Muxy to automate failure ■ Build resilience into software & architecture ■ Regularly Production resilience until it’s normal ■ Production outages are opportunities to learn ■ Start small! Wrapping up
  • 42. ■ Antifragility (https://en.wikipedia.org/wiki/Antifragile) ■ Chaos Engineering (http://techblog.netflix.com/2014/09/introducing-chaos-engineering.html) ■ Principles of Chaos (http://www.principlesofchaos.org/) ■ Human factors in large-scale technological systems' accidents: Three Mile Island, Bhopal, Chernobyl (http://oae.sagepub.com/content/5/2/133.abstract) References
  • 43. ■ Docker Compose (https://www.docker.com/docker-compose) ■ Muxy (https://github.com/mefellows/muxy) ■ Nginx resilience testing with Docker Compose (www.onegeek.com.au/articles/resilience-testing-nginx-with-docker-dnsmasq-and-muxy) ■ Golang + Hystrix resilience testing with Docker Compose (https://github.com/mefellows/muxy/tree/mst-meetup-demo/examples/hystrix) Code Tool References
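
The game day steps on slide 37 (pick failure scenarios, write a hypothesis for each, induce the failure, observe) can be sketched as a small harness. This is only an illustration under stated assumptions: `Scenario`, `Result`, `run_scenario` and `make_client` are hypothetical names, and the "client" is a simulated stand-in that compares a scripted latency against a timeout. It does not call Muxy, Nginx or any real API. In a real exercise the `call` function would issue a request through the proxy that injects the failure.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Scenario:
    name: str                   # the failure we induce, e.g. "added latency"
    hypothesis: str             # what we expect to observe (our H0)
    call: Callable[[], bool]    # one request under that failure; True = success
    max_failures: int = 0       # H0 is rejected if failures exceed this


@dataclass
class Result:
    scenario: str
    attempts: int
    failures: int
    hypothesis_held: bool


def run_scenario(scenario: Scenario, attempts: int = 100) -> Result:
    """Hit the system `attempts` times and count failed calls."""
    failures = 0
    for _ in range(attempts):
        try:
            ok = scenario.call()
        except Exception:
            ok = False          # an exception counts as a failed call
        if not ok:
            failures += 1
    held = failures <= scenario.max_failures
    return Result(scenario.name, attempts, failures, held)


def make_client(latencies_ms: Iterable[float], timeout_ms: float) -> Callable[[], bool]:
    """Simulated client (assumption): a call succeeds if its latency fits the timeout."""
    source = itertools.cycle(latencies_ms)
    return lambda: next(source) <= timeout_ms


scenarios: List[Scenario] = [
    Scenario("baseline", "H0: with no injected failure there are no errors",
             make_client([20.0], timeout_ms=500.0)),
    Scenario("added latency", "H0: introducing network latency does not cause errors",
             make_client([100.0, 300.0, 700.0], timeout_ms=500.0)),
]

results = [run_scenario(s) for s in scenarios]
for r in results:
    verdict = "held" if r.hypothesis_held else "rejected"
    print(f"{r.scenario}: {r.failures}/{r.attempts} failures, hypothesis {verdict}")
```

Changing one thing per scenario (here, only the latency profile) keeps each run a single-variable experiment, so a rejected hypothesis points at that one change.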

Editor's notes

  1. DiUS - who we are: 100 or so developers, testers, UXers, BAs, IMs etc. in Melbourne and Sydney. We help businesses get their ideas to market - from software, to hardware and everything in between. We’re a lot more than that, so if you’re interested in hearing more about us and what we do, come and chat after. We are always on the lookout for talent. Yow? Hands? OK, we have HEAPS to cover. Inevitably with these things I get too excited and could go on for hours; it’s a really interesting topic, and one that could be the subject of 100 of these sessions, but we only have 20 or so minutes. It’s also a depressing talk, at least to begin with. We’re going to talk about failures and catastrophes... A lot. I’m also going to tempt the Demo Gods, which seems rather ironic, so if you could all please pray to your respective Gods that would be great. My hope is to pique your interest in a few areas and provide you with some materials to go further. If I do my job well, we’ll have some tools/practices at the end that you can take back to your teams and talk about. WHY - Why this talk? Approaches are too labour intensive. No simple way to exercise failure in a lab environment - needs to be repeatable, automatable and so on
  2. Put your hands up if you’ve never been involved with any sort of failure?
  3. Chaos Initial starting conditions dramatically change outcomes
  4. Context: User service, well defined boundaries 1 external collaborator, 0 dependencies Few places where things can go wrong Well defined practices to test / remediate
  5. Chaos Initial starting conditions dramatically change outcomes Testing Integration test hell Need to be able to get by without E2E environments It’s not Prod anyway, learnings will be non-representative
  6. Genuine example of a Netflix architecture (mapped with Spigo) Chaos Initial starting conditions dramatically change outcomes Testing Integration test hell Need to be able to get by without E2E environments It’s not Prod anyway, learnings will be non-representative Lots of places where failure can occur
  7. Chaos Initial starting conditions dramatically change outcomes Testing Integration test hell Need to be able to get by without E2E environments It’s not Prod anyway, learnings will be non-representative If you’re using AWS/Cloud for this, you will be paying for all of the services you provision for this E2E test Not to mention management of them (tools, people, process etc.)
  8. One alternative... Still non-representative, but cheaper, faster etc.
  9. And one tool we have in the kit is Docker Compose Hands up - Docker? Docker Compose?
  10. We want to be able to test this failover scenario
  11. Now, it’s a terrible name. (Mux router, multiplexing and so on). Unlike all of my other GH projects this one somehow got popular and it was too late to change! It lets me: Act as a proxy between 2 endpoints and intercept requests Alter the network behaviour on a machine at Layer 4 - configuration for the network devices. Alter the http request/response cycle at Layer 7 Another really nice tool is Toxy. Lots of bells/whistles, however it can’t screw with the actual network and requires Node.
  12. Null hypothesis: “The null hypothesis, denoted by H0, is usually the hypothesis that sample observations result purely from chance. Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample observations are influenced by some non-random cause.” In English, the thing/variable we’re changing isn’t the cause of any change in behaviour we observe.
  13. Out of the lab, and into real life
  14. The Titanic was over-engineered for its day, costing about $7.5M, which is between $200-400M today. But it failed and was a terrible catastrophe, mostly because we can now no longer get onto a boat without somebody impersonating Leonardo DiCaprio
  15. I struggled to find a really good ‘opposite’, and almost talked about the ancient Roman state enemy Mithridates. But the more I thought on it, the more I felt that we humans are a great example of the opposite. Whether you think we’ve been designed by a God, or crafted serendipitously by evolution, you can’t argue that once we are thrown into the real world - we get better. Hormesis (Mithridatism)
  16. Emerging field in Computing, initially from the world of Economics, where asymmetric payoffs + increased uncertainty = greater results. When applied to Engineering, the take-home point is that we need to be subjecting our systems to failure more often, and in increasingly brutal ways, to make our systems better Fragile < Resilient < Antifragile
  17. Netflix are the pioneers in this space, in fact they have a dedicated Chaos Team
  18. As per our lab experiments, we now take the same principles, but apply them to PRODUCTION
  19. Drop this in the likelihood we’ll be running low on time
  20. Here is Netflix, testing out their Chaos Kong (a bigger version of their Chaos Monkey) which takes out an entire AWS region.
  21. Drop this in the likelihood we’ll be running low on time
  22. Abnormal power surge in Reactor 4 Emergency shutdown (bad action) Huge power surge, resulting in steam explosions exposing the graphite core which then ignited
  23. Mistakes - Anyone at the Uber talk at Yow will note that their biggest incident happened between 12-1am (from memory)
  24. We jump into AWS console, see if we can SSH onto a box, holy crap the boxes are dead. Let’s check DNS settings, see if we can point it at the failover inactive environment…. Incorrect. We redeploy from CI. At 3am, you run a serious risk of delegating your top level domain to your personal blog. Manual actions = catastrophe
  25. Why wait until 3am? Let’s break our system more often! Everything is up for grabs: Process Technology People* This is where we all get to learn, not just the ones deploying or managing the infra.