SlideShare ist ein Scribd-Unternehmen logo
1 von 56
Downloaden Sie, um offline zu lesen
CHAOS!
Breaking your systems to make them unbreakable
–Barack Obama
“You can’t let your failures define you.
You have to let your failures teach you.”
– Corey Bertram
“Hope is not a strategy.”
http://bit.ly/netflix-5-things
Chaos Monkey
J A S O N Y E E
Te c h n i c a l E v a n g e l i s t
Tr a v e l H a c k e r
W h i s k e y H u n t e r
P o k e m o n Tr a i n e r
Tw : @ g i t b i s e c t
E m : j y e e @ d a t a d o g h q . c o m
D ATA D O G
S a a S - b a s e d o b s e r v a b i l i t y
p l a t f o r m :
• M e t r i c s
• Tr a c e s ( A P M )
• L o g s
• S y n t h e t i c s
Tw : @ d a t a d o g h q
We ’ re h i r i n g :
j o b s . d a t a d o g h q . c o m
–Netflix Technology Blog, 2011
http://bit.ly/netflix-chaos
“By running Chaos Monkey in the middle of a
business day, in a carefully monitored environment
with engineers standing by to address any problems,
we can still learn the lessons about the weaknesses
of our system, and build automatic recovery
mechanisms to deal with them.
So next time an instance fails at 3 am on a Sunday,
we won’t even notice.”
“By running Chaos Monkey in the middle of a
business day, in a carefully monitored environment
with engineers standing by to address any problems,
we can still learn the lessons about the weaknesses
of our system, and build automatic recovery
mechanisms to deal with them.
So next time an instance fails at 3 am on a Sunday,
we won’t even notice.”
–Netflix Technology Blog, 2011
http://bit.ly/netflix-chaos
“By running Chaos Monkey in the middle of a
business day, in a carefully monitored environment
with engineers standing by to address any problems,
we can still learn the lessons about the weaknesses
of our system, and build automatic recovery
mechanisms to deal with them.
So next time an instance fails at 3 am on a Sunday,
we won’t even notice.”
–Netflix Technology Blog, 2011
http://bit.ly/netflix-chaos
“By running Chaos Monkey in the middle of a
business day, in a carefully monitored environment
with engineers standing by to address any problems,
we can still learn the lessons about the weaknesses
of our system, and build automatic recovery
mechanisms to deal with them.
So next time an instance fails at 3 am on a Sunday,
we won’t even notice.”
–Netflix Technology Blog, 2011
http://bit.ly/netflix-chaos
“By running Chaos Monkey in the middle of a
business day, in a carefully monitored environment
with engineers standing by to address any problems,
we can still learn the lessons about the weaknesses
of our system, and build automatic recovery
mechanisms to deal with them.
So next time an instance fails at 3 am on a Sunday,
we won’t even notice.”
–Netflix Technology Blog, 2011
http://bit.ly/netflix-chaos
Don’t be a jerk!
Game Days
90 Minutes
90 Minutes
30 minutes planning.
90 Minutes
30 minutes planning.
50 minutes playing.
90 Minutes
30 minutes planning.
50 minutes playing.
10 minutes reporting.
The Team
The Team
1 subject matter expert
The Team
1 subject matter expert
1 SRE
The Team
1 subject matter expert
1 SRE
Sometimes, 1 junior
engineer/developer
Before
• Schedule it.
Before
• Schedule it.
• Pick tests. Start easy.
Before
• Schedule it.
• Pick tests. Start easy.
• Write down what you expect to happen.
Before
• Schedule it.
• Pick tests. Start easy.
• Write down what you expect to happen.
• Write down your plan if things go wrong.
Before
• Schedule it.
• Pick tests. Start easy.
• Write down what you expect to happen.
• Write down your plan if things go wrong.
• Share your document!
During
• Staging -> Production (off-peak) -> Production (primetime)
During
• Staging -> Production (off-peak) -> Production (primetime)
• Announce start in group chat.
During
• Staging -> Production (off-peak) -> Production (primetime)
• Announce start in group chat.
• Maintain discussion in group chat.
During
• Staging -> Production (off-peak) -> Production (primetime)
• Announce start in group chat.
• Maintain discussion in group chat.
• Monitor for outages.
During
• Staging -> Production (off-peak) -> Production (primetime)
• Announce start in group chat.
• Maintain discussion in group chat.
• Monitor for outages.
• Run your test and take notes!
After
• Create cards/tickets to track issues that need work.
After
• Create cards/tickets to track issues that need work.
• Write a summary & key lessons.
After
• Create cards/tickets to track issues that need work.
• Write a summary & key lessons.
• Email to all of engineering.
After
• Create cards/tickets to track issues that need work.
• Write a summary & key lessons.
• Email to all of engineering.
• Celebrate!
🍾🎉🐶
Game Levels
0. Terminate the service. Block access to 1 dependency.
Game Levels
0. Terminate the service. Block access to 1 dependency.
1. Block access to all dependencies.
Game Levels
0. Terminate the service. Block access to 1 dependency.
1. Block access to all dependencies.
2. Terminate the host.
Game Levels
0. Terminate the service. Block access to 1 dependency.
1. Block access to all dependencies.
2. Terminate the host.
3. Degrade the environment.
Monitor & alert on Latency, Errors, Traffic, & Saturation
Game Levels
0. Terminate the service. Block access to 1 dependency.
1. Block access to all dependencies.
2. Terminate the host.
3. Degrade the environment.
4. Spike traffic.
Game Levels
0. Terminate the service. Block access to 1 dependency.
1. Block access to all dependencies.
2. Terminate the host.
3. Degrade the environment.
4. Spike traffic.
5. Terminate the region/cloud.
Experiment time!
Review
Share!
Plan!
Have fun!
Additional Resources
• systemctl, kill, iptables
Additional Resources
• systemctl, kill, iptables
• Comcast - https://github.com/tylertreat/comcast
Additional Resources
• systemctl, kill, iptables
• Comcast - https://github.com/tylertreat/comcast
• Vegeta - https://github.com/tsenart/vegeta
Additional Resources
• systemctl, kill, iptables
• Comcast - https://github.com/tylertreat/comcast
• Vegeta - https://github.com/tsenart/vegeta
• Kubernetes & Istio -
https://istio.io/docs/tasks/traffic-management/fault-injection/
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: wordpress-virtualservice
spec:
hosts:
- "*"
gateways:
- wordpress-gateway
http:
- route:
- destination:
host: wordpress
fault:
abort:
percentage:
value: 50.0
httpStatus: 500
delay:
percentage:
value: 50.0
fixedDelay: 30s
Additional Resources
• Gremlin - https://www.gremlin.com/
• ChaosToolkit - https://chaostoolkit.org/
Questions?
jason.yee@datadoghq.com
@gitbisect
Slides: bit.ly/cmr-chaos

Documents: bit.ly/cmr-chaos-docs
jason.yee@datadoghq.com
@gitbisect

Weitere ähnliche Inhalte

Ähnlich wie Jason Yee - Chaos! - Codemotion Rome 2019

The hunt of the unicorn, to capture productivity
The hunt of the unicorn, to capture productivityThe hunt of the unicorn, to capture productivity
The hunt of the unicorn, to capture productivityBrainhub
 
Larry Maccherone: "Probabilistic Decision Making"
Larry Maccherone: "Probabilistic Decision Making"Larry Maccherone: "Probabilistic Decision Making"
Larry Maccherone: "Probabilistic Decision Making"RedHatAgileDay
 
Five Cliches of Online Game Development
Five Cliches of Online Game DevelopmentFive Cliches of Online Game Development
Five Cliches of Online Game Developmentiandundore
 
How I failed to build a runbook automation system
How I failed to build a runbook automation systemHow I failed to build a runbook automation system
How I failed to build a runbook automation systemTimothyBonci
 
Hushcon 2016 Keynote: Test for Echo
Hushcon 2016 Keynote: Test for EchoHushcon 2016 Keynote: Test for Echo
Hushcon 2016 Keynote: Test for EchoDeja vu Security
 
Continuous Automated Testing - Cast conference workshop august 2014
Continuous Automated Testing - Cast conference workshop august 2014Continuous Automated Testing - Cast conference workshop august 2014
Continuous Automated Testing - Cast conference workshop august 2014Noah Sussman
 
Software testing (2) trainingin-mumbai
Software testing (2) trainingin-mumbaiSoftware testing (2) trainingin-mumbai
Software testing (2) trainingin-mumbaivibrantuser
 
Scrum Plus Extreme Programming (XP) for Hyper Productivity
Scrum Plus Extreme Programming (XP) for Hyper ProductivityScrum Plus Extreme Programming (XP) for Hyper Productivity
Scrum Plus Extreme Programming (XP) for Hyper ProductivityRon Quartel
 
Secrets and Mysteries of Automated Execution Keynote slides
Secrets and Mysteries of Automated Execution Keynote slidesSecrets and Mysteries of Automated Execution Keynote slides
Secrets and Mysteries of Automated Execution Keynote slidesAlan Richardson
 
Angus Fletcher - Error Handling in Concurrent Systems
Angus Fletcher - Error Handling in Concurrent SystemsAngus Fletcher - Error Handling in Concurrent Systems
Angus Fletcher - Error Handling in Concurrent SystemsMaritime DevCon
 
Let's Sharpen Your Agile Ax, It's Story Splitting Time
Let's Sharpen Your Agile Ax, It's Story Splitting TimeLet's Sharpen Your Agile Ax, It's Story Splitting Time
Let's Sharpen Your Agile Ax, It's Story Splitting TimeExcella
 
Getting Schooled DerbyCon 3.0
Getting Schooled DerbyCon 3.0Getting Schooled DerbyCon 3.0
Getting Schooled DerbyCon 3.0TonikJDK
 
Coaching teams in creative problem solving
Coaching teams in creative problem solvingCoaching teams in creative problem solving
Coaching teams in creative problem solvingFlowa Oy
 
Software testing (2) trainingin-mumbai...
Software testing (2) trainingin-mumbai...Software testing (2) trainingin-mumbai...
Software testing (2) trainingin-mumbai...vibrantuser
 
Chaos Engineering Talk at DevOps Days Austin
Chaos Engineering Talk at DevOps Days AustinChaos Engineering Talk at DevOps Days Austin
Chaos Engineering Talk at DevOps Days Austinmatthewbrahms
 
DevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 SlidesDevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 SlidesAlex Cruise
 
How To Run a 5 Whys (With Humans, Not Robots)
How To Run a 5 Whys (With Humans, Not Robots)How To Run a 5 Whys (With Humans, Not Robots)
How To Run a 5 Whys (With Humans, Not Robots)Dan Milstein
 
Put Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwarePut Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwareTheo Schlossnagle
 

Ähnlich wie Jason Yee - Chaos! - Codemotion Rome 2019 (20)

The hunt of the unicorn, to capture productivity
The hunt of the unicorn, to capture productivityThe hunt of the unicorn, to capture productivity
The hunt of the unicorn, to capture productivity
 
Larry Maccherone: "Probabilistic Decision Making"
Larry Maccherone: "Probabilistic Decision Making"Larry Maccherone: "Probabilistic Decision Making"
Larry Maccherone: "Probabilistic Decision Making"
 
Five Cliches of Online Game Development
Five Cliches of Online Game DevelopmentFive Cliches of Online Game Development
Five Cliches of Online Game Development
 
How I failed to build a runbook automation system
How I failed to build a runbook automation systemHow I failed to build a runbook automation system
How I failed to build a runbook automation system
 
Hushcon 2016 Keynote: Test for Echo
Hushcon 2016 Keynote: Test for EchoHushcon 2016 Keynote: Test for Echo
Hushcon 2016 Keynote: Test for Echo
 
Continuous Automated Testing - Cast conference workshop august 2014
Continuous Automated Testing - Cast conference workshop august 2014Continuous Automated Testing - Cast conference workshop august 2014
Continuous Automated Testing - Cast conference workshop august 2014
 
Software testing (2) trainingin-mumbai
Software testing (2) trainingin-mumbaiSoftware testing (2) trainingin-mumbai
Software testing (2) trainingin-mumbai
 
Scrum Plus Extreme Programming (XP) for Hyper Productivity
Scrum Plus Extreme Programming (XP) for Hyper ProductivityScrum Plus Extreme Programming (XP) for Hyper Productivity
Scrum Plus Extreme Programming (XP) for Hyper Productivity
 
Secrets and Mysteries of Automated Execution Keynote slides
Secrets and Mysteries of Automated Execution Keynote slidesSecrets and Mysteries of Automated Execution Keynote slides
Secrets and Mysteries of Automated Execution Keynote slides
 
Angus Fletcher - Error Handling in Concurrent Systems
Angus Fletcher - Error Handling in Concurrent SystemsAngus Fletcher - Error Handling in Concurrent Systems
Angus Fletcher - Error Handling in Concurrent Systems
 
Let's Sharpen Your Agile Ax, It's Story Splitting Time
Let's Sharpen Your Agile Ax, It's Story Splitting TimeLet's Sharpen Your Agile Ax, It's Story Splitting Time
Let's Sharpen Your Agile Ax, It's Story Splitting Time
 
Sharpen your Agile Axe by Brian Sjorber
Sharpen your Agile Axe by Brian SjorberSharpen your Agile Axe by Brian Sjorber
Sharpen your Agile Axe by Brian Sjorber
 
Getting Schooled DerbyCon 3.0
Getting Schooled DerbyCon 3.0Getting Schooled DerbyCon 3.0
Getting Schooled DerbyCon 3.0
 
Coaching teams in creative problem solving
Coaching teams in creative problem solvingCoaching teams in creative problem solving
Coaching teams in creative problem solving
 
Software testing (2) trainingin-mumbai...
Software testing (2) trainingin-mumbai...Software testing (2) trainingin-mumbai...
Software testing (2) trainingin-mumbai...
 
Chaos Engineering Talk at DevOps Days Austin
Chaos Engineering Talk at DevOps Days AustinChaos Engineering Talk at DevOps Days Austin
Chaos Engineering Talk at DevOps Days Austin
 
DevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 SlidesDevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 Slides
 
How To Run a 5 Whys (With Humans, Not Robots)
How To Run a 5 Whys (With Humans, Not Robots)How To Run a 5 Whys (With Humans, Not Robots)
How To Run a 5 Whys (With Humans, Not Robots)
 
Spaghetti gate
Spaghetti gateSpaghetti gate
Spaghetti gate
 
Put Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwarePut Some SRE in Your Shipped Software
Put Some SRE in Your Shipped Software
 

Mehr von Codemotion

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Codemotion
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyCodemotion
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaCodemotion
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserCodemotion
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Codemotion
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Codemotion
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Codemotion
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 - Codemotion
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Codemotion
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Codemotion
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Codemotion
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Codemotion
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Codemotion
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Codemotion
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Codemotion
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...Codemotion
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Codemotion
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Codemotion
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Codemotion
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Codemotion
 

Mehr von Codemotion (20)

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
 

Kürzlich hochgeladen

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Kürzlich hochgeladen (20)

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Jason Yee - Chaos! - Codemotion Rome 2019

  • 1. CHAOS! Breaking your systems to make them unbreakable
  • 2. –Barack Obama “You can’t let your failures define you. You have to let your failures teach you.”
  • 3. – Corey Bertram “Hope is not a strategy.”
  • 6. J A S O N Y E E Te c h n i c a l E v a n g e l i s t Tr a v e l H a c k e r W h i s k e y H u n t e r P o k e m o n Tr a i n e r Tw : @ g i t b i s e c t E m : j y e e @ d a t a d o g h q . c o m
  • 7. D ATA D O G S a a S - b a s e d o b s e r v a b i l i t y p l a t f o r m : • M e t r i c s • Tr a c e s ( A P M ) • L o g s • S y n t h e t i c s Tw : @ d a t a d o g h q We ’ re h i r i n g : j o b s . d a t a d o g h q . c o m
  • 8. –Netflix Technology Blog, 2011 http://bit.ly/netflix-chaos “By running Chaos Monkey in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problems, we can still learn the lessons about the weaknesses of our system, and build automatic recovery mechanisms to deal with them. So next time an instance fails at 3 am on a Sunday, we won’t even notice.”
  • 9. “By running Chaos Monkey in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problems, we can still learn the lessons about the weaknesses of our system, and build automatic recovery mechanisms to deal with them. So next time an instance fails at 3 am on a Sunday, we won’t even notice.” –Netflix Technology Blog, 2011 http://bit.ly/netflix-chaos
  • 10. “By running Chaos Monkey in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problems, we can still learn the lessons about the weaknesses of our system, and build automatic recovery mechanisms to deal with them. So next time an instance fails at 3 am on a Sunday, we won’t even notice.” –Netflix Technology Blog, 2011 http://bit.ly/netflix-chaos
  • 11. “By running Chaos Monkey in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problems, we can still learn the lessons about the weaknesses of our system, and build automatic recovery mechanisms to deal with them. So next time an instance fails at 3 am on a Sunday, we won’t even notice.” –Netflix Technology Blog, 2011 http://bit.ly/netflix-chaos
  • 12. “By running Chaos Monkey in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problems, we can still learn the lessons about the weaknesses of our system, and build automatic recovery mechanisms to deal with them. So next time an instance fails at 3 am on a Sunday, we won’t even notice.” –Netflix Technology Blog, 2011 http://bit.ly/netflix-chaos
  • 13. Don’t be a jerk!
  • 17. 90 Minutes 30 minutes planning. 50 minutes playing.
  • 18. 90 Minutes 30 minutes planning. 50 minutes playing. 10 minutes reporting.
  • 20. The Team 1 subject matter expert
  • 21. The Team 1 subject matter expert 1 SRE
  • 22. The Team 1 subject matter expert 1 SRE Sometimes, 1 junior engineer/developer
  • 24. Before • Schedule it. • Pick tests. Start easy.
  • 25. Before • Schedule it. • Pick tests. Start easy. • Write down what you expect to happen.
  • 26. Before • Schedule it. • Pick tests. Start easy. • Write down what you expect to happen. • Write down your plan if things go wrong.
  • 27. Before • Schedule it. • Pick tests. Start easy. • Write down what you expect to happen. • Write down your plan if things go wrong. • Share your document!
  • 28. During • Staging -> Production (off-peak) -> Production (primetime)
  • 29. During • Staging -> Production (off-peak) -> Production (primetime) • Announce start in group chat.
  • 30. During • Staging -> Production (off-peak) -> Production (primetime) • Announce start in group chat. • Maintain discussion in group chat.
  • 31. During • Staging -> Production (off-peak) -> Production (primetime) • Announce start in group chat. • Maintain discussion in group chat. • Monitor for outages.
  • 32. During • Staging -> Production (off-peak) -> Production (primetime) • Announce start in group chat. • Maintain discussion in group chat. • Monitor for outages. • Run your test and take notes!
  • 33. After • Create cards/tickets to track issues that need work.
  • 34. After • Create cards/tickets to track issues that need work. • Write a summary & key lessons.
  • 35. After • Create cards/tickets to track issues that need work. • Write a summary & key lessons. • Email to all of engineering.
  • 36. After • Create cards/tickets to track issues that need work. • Write a summary & key lessons. • Email to all of engineering. • Celebrate!
  • 38. Game Levels 0. Terminate the service. Block access to 1 dependency.
  • 39. Game Levels 0. Terminate the service. Block access to 1 dependency. 1. Block access to all dependencies.
  • 40. Game Levels 0. Terminate the service. Block access to 1 dependency. 1. Block access to all dependencies. 2. Terminate the host.
  • 41. Game Levels 0. Terminate the service. Block access to 1 dependency. 1. Block access to all dependencies. 2. Terminate the host. 3. Degrade the environment. Monitor & alert on Latency, Errors, Traffic, & Saturation
  • 42. Game Levels 0. Terminate the service. Block access to 1 dependency. 1. Block access to all dependencies. 2. Terminate the host. 3. Degrade the environment. 4. Spike traffic.
  • 43. Game Levels 0. Terminate the service. Block access to 1 dependency. 1. Block access to all dependencies. 2. Terminate the host. 3. Degrade the environment. 4. Spike traffic. 5. Terminate the region/cloud.
  • 47. Additional Resources • systemctl, kill, iptables • Comcast - https://github.com/tylertreat/comcast
  • 48. Additional Resources • systemctl, kill, iptables • Comcast - https://github.com/tylertreat/comcast • Vegeta - https://github.com/tsenart/vegeta
  • 49. Additional Resources • systemctl, kill, iptables • Comcast - https://github.com/tylertreat/comcast • Vegeta - https://github.com/tsenart/vegeta • Kubernetes & Istio - https://istio.io/docs/tasks/traffic-management/fault-injection/
  • 50.
  • 51.
  • 52.
  • 53. apiVersion: networking.istio.io/v1alpha3 kind: VirtualService metadata: name: wordpress-virtualservice spec: hosts: - "*" gateways: - wordpress-gateway http: - route: - destination: host: wordpress fault: abort: percentage: value: 50.0 httpStatus: 500 delay: percentage: value: 50.0 fixedDelay: 30s
  • 54. Additional Resources • Gremlin - https://www.gremlin.com/ • ChaosToolkit - https://chaostoolkit.org/