NETFLIX’S CHAOS MONKEY
Michael Whitehead
“EVERYTHING FAILS ALL THE TIME”
- WERNER VOGELS
CHAOS MONKEY
A service that causes failure and wreaks havoc on instances in Auto Scaling Groups
A member of the Simian Army developed by Netflix
WHY WOULD WE INTENTIONALLY CAUSE FAILURE?!?
- Failure is inevitable
- Infrastructure is complex
- Forcing failure puts you in control
- Identify faults in your architecture
  ‱ Do your load balancers reroute traffic correctly?
  ‱ Do your instances function correctly when they come back up?
  ‱ Are your monitoring tools alerting you to important events?
GETTING STARTED WITH CHAOS MONKEY
- Amazon Web Services
- Must be using Auto Scaling Groups
- Uses Amazon SimpleDB for event storage
- Amazon Simple Email Service (SES) set up (optional, for notifications)
- Can be used with Netflix’s Asgard (optional)
- Java 7 JDK or newer
WOW!
EXAMPLE WITH CLOUDFORMATION
NEAT!
AWESOME!
COOL!
NO WAY!
BUILDING & CONFIGURATION
- Clone the SimianArmy repo from GitHub
- Builds using Gradle
- By default, runs 6 times a day during business hours (9am to 3pm)
- Does not run on holidays or weekends
- The timeframe and frequency of runs can be configured
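The default schedule described above can be modelled in a few lines of Python. This is a minimal sketch, not SimianArmy's scheduler: it assumes runs start at the opening hour, repeat at a fixed interval, and stop before the closing hour. The parameter names (`open_hour`, `close_hour`, `every_minutes`, `holidays`) are hypothetical; SimianArmy reads its real calendar and run frequency from its own configuration.

```python
from datetime import date, datetime, time, timedelta


def run_times(day, open_hour=9, close_hour=15, every_minutes=60, holidays=frozenset()):
    """Return the start times of the runs scheduled on `day` (hypothetical model)."""
    # No runs on weekends (Saturday=5, Sunday=6) or on any listed holiday.
    if day.weekday() >= 5 or day in holidays:
        return []
    runs = []
    current = datetime.combine(day, time(hour=open_hour))
    closing = datetime.combine(day, time(hour=close_hour))
    # Schedule a run at every interval that starts before closing time.
    while current < closing:
        runs.append(current)
        current += timedelta(minutes=every_minutes)
    return runs


if __name__ == "__main__":
    monday = date(2024, 5, 6)
    for run in run_times(monday):
        print(run.strftime("%a %H:%M"))
```

With the defaults, a weekday yields six runs, 09:00 through 14:00, which matches "6 times a day, 9am to 3pm"; a Saturday or a listed holiday yields none. Halving `every_minutes` doubles the number of runs in the same window.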
IMPORTANT PROPERTIES
- Enabling Chaos Monkey
  ‱ Set simianarmy.chaos.enabled = true
  ‱ Set simianarmy.chaos.leashed = false (while leashed, Chaos Monkey only logs the terminations it would have made)
- Daily probability that 1 instance is terminated in each ASG
  ‱ simianarmy.chaos.ASG.probability = 1.0
- Opt-in or opt-out model
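To see what a daily probability of 1.0 can mean in practice, here is a simplified model in Python. It assumes, as a labelled simplification, that the daily probability is divided evenly across the day's scheduled runs and that each run makes one independent random draw per ASG; SimianArmy's actual selection logic, and any per-day termination limits it applies, may differ. `run_once`, its arguments, and the instance IDs are hypothetical.

```python
import random


def per_run_probability(daily_probability, runs_per_day):
    """Chance of a termination in one ASG on a single run (simplified model)."""
    # Assumption: the daily value is split evenly across the scheduled runs.
    return daily_probability / runs_per_day


def chance_of_at_least_one(daily_probability, runs_per_day):
    """Chance that at least one run in the day terminates an instance in the ASG."""
    p = per_run_probability(daily_probability, runs_per_day)
    # Runs are modelled as independent draws, so the "no kill" chances multiply.
    return 1.0 - (1.0 - p) ** runs_per_day


def run_once(instances, daily_probability, runs_per_day, rng=random):
    """Return the instance to terminate on this run, or None if the draw skips."""
    if not instances:
        return None
    if rng.random() >= per_run_probability(daily_probability, runs_per_day):
        return None
    # Pick one victim uniformly at random from the group's instances.
    return rng.choice(instances)


if __name__ == "__main__":
    print(per_run_probability(1.0, 6))
    print(round(chance_of_at_least_one(1.0, 6), 3))
    print(run_once(["i-aaa", "i-bbb", "i-ccc"], 1.0, 6))
```

Under these assumptions, a value of 1.0 with six runs a day averages one termination per ASG per day, yet on roughly one day in three (about 0.335) the group loses no instance at all.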
OPT-IN / OPT-OUT MODEL
- simianarmy.chaos.ASG.enabled selects the model: false = opt-in, true = opt-out
  ‱ simianarmy.chaos.ASG.enabled = false
- When opt-in (false), you must enable each Auto Scaling group you want Chaos Monkey to run in
  ‱ simianarmy.chaos.ASG.<<auto scaling group name>>.enabled = true
- When opt-out (true), you must disable each Auto Scaling group you do not want it to run in
  ‱ simianarmy.chaos.ASG.<<auto scaling group name>>.enabled = false
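The opt-in/opt-out rule boils down to "a per-group setting, if present, overrides the default". The Python sketch below assumes the per-group key is spelled `simianarmy.chaos.ASG.<group name>.enabled`, as in SimianArmy's sample chaos.properties (check your version), treats a missing default as opt-in, and uses a deliberately minimal `key = value` parser rather than a full Java .properties parser. The group names are hypothetical.

```python
DEFAULT_KEY = "simianarmy.chaos.ASG.enabled"


def parse_properties(text):
    """Minimal `key = value` parser; skips blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_true(value):
    return value.strip().lower() == "true"


def group_enabled(group, props):
    """Return True if Chaos Monkey may act on the Auto Scaling group `group`."""
    # Default: false = opt-in, true = opt-out (missing is treated as opt-in here).
    default = is_true(props.get(DEFAULT_KEY, "false"))
    # A per-group setting, when present, overrides the default in either direction.
    override = props.get(f"simianarmy.chaos.ASG.{group}.enabled")
    return default if override is None else is_true(override)


if __name__ == "__main__":
    opt_in = parse_properties("""
simianarmy.chaos.ASG.enabled = false
simianarmy.chaos.ASG.api-prod.enabled = true
""")
    print(group_enabled("api-prod", opt_in), group_enabled("batch-prod", opt_in))
```

In the opt-in example only `api-prod` is eligible; flipping the default to `true` and setting `batch-prod` to `false` gives the opt-out behaviour, where every group except `batch-prod` is eligible.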
EMAIL NOTIFICATIONS
ARE TERMINATIONS ALL IT CAN DO?
- Block all network traffic
- Burn CPU
- Burn IO
- Fill Disk
- Kill Processes
- Network Loss
- Null-Route
  ‱ All EC2 <-> EC2 traffic
- Detach all EBS volumes
- Fail DNS
- Fail EC2 API
- Fail S3 API
- Fail DynamoDB API
- Network Corruption
- Network Latency
Note: most of these require SSH access to the instances.
LINKS
- CloudFormation Template: https://github.com/joehack3r/aws/blob/master/cloudformation/templates/chaosMonkey.json
- Chaos Monkey Announcement: http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
- Simian Army Quick Start Guide: https://github.com/Netflix/SimianArmy/wiki/Quick-Start-Guide
- Chaos Monkey Configuration: https://github.com/Netflix/SimianArmy/wiki/Chaos-Settings
- Chaos Monkey Army: https://github.com/Netflix/SimianArmy/wiki/The-Chaos-Monkey-Army

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 

Intro to Netflix's Chaos Monkey

  • 2. “EVERYTHING FAILS ALL THE TIME” - WERNER VOGELS
  • 3. CHAOS MONKEY
    - A service that causes failure and wreaks havoc on instances in Auto Scaling Groups
    - A member of the Simian Army developed by Netflix
  • 4. WHY WOULD WE INTENTIONALLY CAUSE FAILURE?!?
    - It is inevitable
    - Infrastructure is complex
    - Forcing failure puts you in control
    - Identify faults in your architecture:
      - Do your load balancers reroute traffic correctly?
      - Do your instances function correctly when they come back up?
      - Are your monitoring tools alerting you on important events?
  • 5. GETTING STARTED WITH CHAOS MONKEY
    - Runs on Amazon Web Services (AWS)
    - Must be using Auto Scaling Groups
    - Uses Amazon SimpleDB for event storage
    - Simple Email Service set up (optional, for notifications)
    - Can be used with Netflix's Asgard (optional)
    - Java 7 JDK or newer
  • 7. BUILDING & CONFIGURATION
    - Clone the SimianArmy repo from GitHub
    - Build with Gradle
    - By default, runs 6 times a day during business hours (9am to 3pm)
    - Does not run on holidays or weekends
    - The timeframes and frequency of runs can be configured
  • 8. IMPORTANT PROPERTIES
    - Enabling Chaos Monkey:
      - Set simianarmy.chaos.enabled = true
      - Set simianarmy.chaos.leashed = false (while leashed, Chaos Monkey only reports what it would have terminated)
    - Probability of 1 instance being terminated per day per ASG:
      - simianarmy.chaos.ASG.probability = 1.0
    - Opt-in or opt-out model
  • 9. OPT-IN / OPT-OUT MODEL
    - simianarmy.chaos.ASG.enabled = false
      - Set to false = opt-in; set to true = opt-out
    - When opt-in (false), you must enable each Auto Scaling Group you want Chaos Monkey to run in:
      - simianarmy.chaos.<<auto scaling group name>>.enabled = true
    - When opt-out (true), you must disable each Auto Scaling Group you do not want it to run in:
      - simianarmy.chaos.<<auto scaling group name>>.enabled = false
  • 11. ARE TERMINATIONS ALL IT CAN DO?
    - Block all network traffic
    - Burn CPU
    - Burn IO
    - Fill Disk
    - Kill Processes
    - Network Loss
    - Null-Route
      - All EC2 <-> EC2 traffic
    - SSH REQUIRED
    - Detach all EBS volumes
    - Fail DNS
    - Fail EC2 API
    - Fail S3 API
    - Fail DynamoDB API
    - Network Corruption
    - Network Latency
  • 12. LINKS
    - CloudFormation Template: https://github.com/joehack3r/aws/blob/master/cloudformation/templates/chaosMonkey.json
    - Chaos Monkey Announcement: http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
    - Simian Army Quick Start Guide: https://github.com/Netflix/SimianArmy/wiki/Quick-Start-Guide
    - Chaos Monkey Configuration: https://github.com/Netflix/SimianArmy/wiki/Chaos-Settings
    - Chaos Monkey Army: https://github.com/Netflix/SimianArmy/wiki/The-Chaos-Monkey-Army
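The default schedule from slide 7 (six runs a day during business hours, none on weekends or holidays) can be modeled as a small predicate. This is a minimal sketch, not SimianArmy's own scheduler: it assumes one run at the top of each hour from 9am up to, but not including, 3pm, which yields the six runs, and the `holidays` set is a hypothetical input you would supply yourself.

```python
from datetime import date, datetime, time, timedelta

# Assumed default window: hourly runs starting at 9:00, last run at 14:00,
# which gives the six runs per day described on slide 7.
START_HOUR = 9
END_HOUR = 15  # exclusive: no run starts at 15:00


def run_times_for_day(day: date, holidays: set[date]) -> list[datetime]:
    """Return the (assumed) Chaos Monkey run times for one calendar day."""
    # weekday() is 0-4 for Monday-Friday; skip Saturday (5), Sunday (6) and holidays.
    if day.weekday() >= 5 or day in holidays:
        return []
    return [datetime.combine(day, time(hour)) for hour in range(START_HOUR, END_HOUR)]


def runs_in_week(start: date, holidays: set[date]) -> int:
    """Count scheduled runs over the seven days beginning at `start`."""
    return sum(len(run_times_for_day(start + timedelta(days=i), holidays)) for i in range(7))


if __name__ == "__main__":
    # Hypothetical holiday calendar for illustration: Thursday 4 July 2024.
    holidays = {date(2024, 7, 4)}
    monday = date(2024, 7, 1)
    for t in run_times_for_day(monday, holidays):
        print(t.strftime("%a %H:%M"))
    print("runs this week:", runs_in_week(monday, holidays))
```

With 4 July marked as a holiday, the week starting Monday 1 July 2024 has four eligible weekdays and therefore 24 runs. Changing `START_HOUR`, `END_HOUR` or the holiday set is the sketch's stand-in for the configurable timeframes and frequency mentioned on the slide.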
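Slide 8's `simianarmy.chaos.ASG.probability = 1.0` is a per-day figure, while Chaos Monkey runs several times a day. One simple way to reason about it, assumed here rather than taken from SimianArmy's source, is to spread the daily probability evenly across the day's runs: a value of 1.0 with six runs gives each run a 1/6 chance of picking a victim, so the expected number of terminations per ASG per day is 1. The sketch also honors the `leashed` flag by only reporting its choice. The `.properties` parsing is deliberately minimal (`key = value` lines and `#` comments only), and the instance IDs are made up.

```python
import random
from typing import Dict, List, Optional

SAMPLE_PROPERTIES = """
# Enable Chaos Monkey and let it actually terminate instances
simianarmy.chaos.enabled = true
simianarmy.chaos.leashed = false
# Daily termination probability per Auto Scaling Group
simianarmy.chaos.ASG.probability = 1.0
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal parser for 'key = value' lines; ignores blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def per_run_probability(daily_probability: float, runs_per_day: int) -> float:
    """Assumed model: spread the daily probability evenly over the day's runs."""
    return daily_probability / runs_per_day


def run_once(props: Dict[str, str], instances: List[str], runs_per_day: int,
             rng: random.Random) -> Optional[str]:
    """Simulate one run against a single ASG; return the action taken, or None."""
    if props.get("simianarmy.chaos.enabled", "false").lower() != "true":
        return None
    # The probability property is required by this sketch (KeyError if missing).
    chance = per_run_probability(float(props["simianarmy.chaos.ASG.probability"]), runs_per_day)
    if not instances or rng.random() >= chance:
        return None
    victim = rng.choice(instances)
    # A missing leashed flag is treated as leashed, the safer choice in this sketch.
    if props.get("simianarmy.chaos.leashed", "true").lower() == "true":
        return f"would terminate {victim} (leashed)"
    return f"terminate {victim}"


if __name__ == "__main__":
    props = parse_properties(SAMPLE_PROPERTIES)
    rng = random.Random(42)
    asg = ["i-0001", "i-0002", "i-0003"]  # hypothetical instance IDs
    days = 1000
    kills = sum(run_once(props, asg, 6, rng) is not None for _ in range(days * 6))
    print(f"average terminations per day: {kills / days:.2f}")  # expected value is 1.0
```

Over many simulated days the average should land close to one termination per day, which matches the slide's description of `probability = 1.0`; on any single day, this model can terminate zero, one or several instances.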
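The opt-in / opt-out rule on slide 9 can be read as: a per-group `enabled` property wins when it is present; otherwise the default `simianarmy.chaos.ASG.enabled` decides. The sketch below encodes that reading. The per-group key follows the slide's `simianarmy.chaos.<<auto scaling group name>>.enabled` placeholder, and the group names are hypothetical; check the Chaos Settings wiki page for the exact key your SimianArmy version expects.

```python
from collections.abc import Mapping

DEFAULT_KEY = "simianarmy.chaos.ASG.enabled"


def group_key(asg_name: str) -> str:
    """Per-group override key, following the placeholder pattern on slide 9."""
    return f"simianarmy.chaos.{asg_name}.enabled"


def is_chaos_enabled(props: Mapping[str, str], asg_name: str) -> bool:
    """Decide whether Chaos Monkey may act on the given Auto Scaling Group.

    Opt-in  (default false): only groups explicitly set to true are eligible.
    Opt-out (default true):  every group is eligible unless set to false.
    """
    # A missing default is treated as opt-in (false) in this sketch.
    default = props.get(DEFAULT_KEY, "false").strip().lower() == "true"
    override = props.get(group_key(asg_name))
    if override is None:
        return default
    return override.strip().lower() == "true"


if __name__ == "__main__":
    # Opt-in: only "web-prod" has been enabled explicitly (hypothetical group names).
    opt_in = {DEFAULT_KEY: "false", group_key("web-prod"): "true"}
    # Opt-out: every group is eligible except "billing-prod".
    opt_out = {DEFAULT_KEY: "true", group_key("billing-prod"): "false"}
    for name in ("web-prod", "billing-prod", "batch-dev"):
        print(name, is_chaos_enabled(opt_in, name), is_chaos_enabled(opt_out, name))
```

In the opt-in configuration only `web-prod` is eligible; in the opt-out configuration everything except `billing-prod` is. Keeping the override lookup separate from the default makes it easy to see why a given group was or was not targeted.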