Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2pqx1Jn.
Heather Nakama tells the story of implementing chaos testing on a small product, and how several small and targeted early investments in chaos engineering saved time and effort. Filmed at qconsf.com.
Heather Nakama is a Senior Software Engineer at Azure Search, a managed search-as-a-service offering from Microsoft's Azure cloud service platform. She works on the backend infrastructure that deploys, monitors, elastically scales, and automatically heals clusters hosting customer services. She has a passion for distributed systems and ensuring that they are scalable, fault-tolerant, and reliable.
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/chaos-engineering-budget
Presented at QCon San Francisco
www.qconsf.com
19. “[Y]ou are doing enough testing if the following is true:
YOU RARELY GET BUGS THAT ESCAPE INTO PRODUCTION
YOU ARE RARELY HESITANT TO CHANGE SOME CODE FOR FEAR IT WILL CAUSE PRODUCTION BUGS”
-Martin Fowler
42. CONFIDENCE IN SYSTEM HEALTH
As system complexity increases, the states and operations that must potentially be tested increase exponentially
• Curse of dimensionality
[Figure: axes x, y, z]
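The exponential growth described on this slide can be illustrated with a small sketch. The dimension names and values below are hypothetical and are not taken from the talk; they only show how the number of combinations multiplies as dimensions are added.

```python
from itertools import product

# Hypothetical state dimensions (illustrative only, not from the talk).
dimensions = {
    "replica_count": [1, 2, 3, 4],
    "partition_count": [1, 2, 3, 4],
    "operation": ["create", "scale", "delete", "restart"],
}

def state_space_size(dims):
    """Count the combinations produced by testing every value of every dimension."""
    size = 1
    for values in dims.values():
        size *= len(values)
    return size

def exhaustive_states(dims):
    """Enumerate every combination explicitly (feasible only for tiny spaces)."""
    names = list(dims)
    return [dict(zip(names, combo)) for combo in product(*dims.values())]

print(state_space_size(dimensions))  # 4 * 4 * 4 = 64

# Each additional 4-valued dimension multiplies the total by 4.
sizes = [4 ** n for n in range(1, 7)]
print(sizes)  # [4, 16, 64, 256, 1024, 4096]
```

With only three small dimensions the space is easy to enumerate, but every added dimension multiplies the count, which is why pre-defining every case by hand stops scaling.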
45. RANDOMNESS
Reduces test set in an evenly distributed way that avoids engineer bias and blind spots
Generating test cases is very CHEAP, enabling more tests to be run
[Figure: axes x, y, z]
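A minimal sketch of the idea on this slide, assuming test cases are drawn uniformly at random from the state space instead of being hand-picked. The dimensions and the helper names `random_state` and `sample_states` are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical state dimensions (illustrative only, not from the talk).
dimensions = {
    "replica_count": [1, 2, 3, 4],
    "partition_count": [1, 2, 3, 4],
    "operation": ["create", "scale", "delete", "restart"],
}

def random_state(dims, rng):
    """Draw one state by picking each dimension's value uniformly at random."""
    return {name: rng.choice(values) for name, values in dims.items()}

def sample_states(dims, n, seed=None):
    """Generate n random test cases; a fixed seed makes a run reproducible."""
    rng = random.Random(seed)
    return [random_state(dims, rng) for _ in range(n)]

cases = sample_states(dimensions, n=10, seed=42)
for case in cases:
    print(case)
```

Generating a case costs a few random draws, so the number of cases is limited by how long each test takes to run rather than by engineer time spent writing them.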
46. MONTE CARLO APPROACH TO SYSTEM CONFIDENCE
Question: What’s the probability that my system will encounter a bug?
Prohibitively expensive approach: Pre-defining system states and operations through engineer effort and validating them
Cheaper approximation: Validating system health at regular intervals
Sample set: Randomly generated system states + transient failures from running in production
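The Monte Carlo framing above can be sketched as a small simulation. Everything here is an assumption made for illustration: the state dimensions, the failure names, and the `is_healthy` stand-in with one planted bug are hypothetical and do not describe Azure Search. The estimate is simply the fraction of randomly sampled states that fail the health check.

```python
import random

# Hypothetical dimensions and failure types (illustrative only).
REPLICAS = [1, 2, 3, 4]
OPERATIONS = ["create", "scale", "delete"]
FAILURES = [None, "node_restart", "network_drop"]  # None means no injected failure

def is_healthy(replicas, operation, failure):
    """Stand-in health check with one planted bug: scaling a single-replica
    service during a node restart leaves it unhealthy."""
    return not (replicas == 1 and operation == "scale" and failure == "node_restart")

def estimate_bug_probability(samples, failures, seed=0):
    """Estimate P(bug) as the fraction of random samples that fail validation."""
    rng = random.Random(seed)
    unhealthy = 0
    for _ in range(samples):
        state = (rng.choice(REPLICAS), rng.choice(OPERATIONS), rng.choice(failures))
        if not is_healthy(*state):
            unhealthy += 1
    return unhealthy / samples

with_failures = estimate_bug_probability(20_000, FAILURES)
without_failures = estimate_bug_probability(20_000, [None])
print(f"estimate with injected failures: {with_failures:.4f}")
print(f"estimate without failures:       {without_failures:.4f}")
```

In this toy setup the planted bug occupies 1 of 36 combinations, so the first estimate should land near 1/36. The second estimate is exactly zero because the bug needs a failure to trigger, which shows how a sample set without failures can report a healthy system that is not.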
47. MONTE CARLO APPROACH TO SYSTEM CONFIDENCE
“TRADITIONAL” CHAOS ENGINEERING
Question: What’s the probability that my system will encounter a bug?
Prohibitively expensive approach: Pre-defining system states and operations through engineer effort and validating them
Cheaper approximation: Validating system health at regular intervals
Sample set: Customer-generated system states from running in production
77. Heather Nakama, QCon, 11/14/17
DISTRIBUTED SYSTEMS ARE CHAOTIC AND REQUIRE CHAOTIC TESTING METHODS
MONTE CARLO-STYLE RANDOM SAMPLING IS A CHEAP WAY TO GAIN CONFIDENCE ABOUT SYSTEM HEALTH
…AS LONG AS YOU INCLUDE FAILURE IN YOUR SAMPLE SET
78. AZURE SEARCH
• Inside Azure Search: Chaos Engineering
• A Startup At Microsoft
HEATHER NAKAMA
• Distributed systems
• Chaos engineering
• Self-healing systems through machine learning
• Biologically inspired systems
Art designed by Vexels.com