Software projects were historically managed on a bet-the-farm model. They succeeded or they failed. And when they failed (as big software projects often did), the consequences were typically dire not only for organizations as a whole but also for many of the individuals involved. Today, by contrast, many software development projects have evolved toward a much more incremental, iterative, and experimental process that takes cues from the open source model, which excuses (and even rewards) certain types of failure.
In this session, we’ll discuss how failure can be turned into a positive. This includes the organizational dynamics associated with tolerating uncertain outcomes, the need to define acceptable failure parameters, and the technical means by which experimentation can be automated in ways that amplify the positive while minimizing the effect of negative outcomes.
WHAT HE LEARNED
• Kindergarteners do not spend 15 minutes in a bunch of status transactions trying to figure out who is going to be CEO of Spaghetti Corporation.
• They don’t sit around talking about the problem. They just start building to determine what works and what doesn’t.
THE RIGHT SCOPE
Constrain the impact of failure
• Enable experimentation
• Stop failures from cascading
• Make deployments incremental, frequent, and routine events
• Generally decouple activities and decisions from each other
• Small, autonomous, bounded-context services
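The slide's goal of keeping failures from cascading is not tied to a specific mechanism. One widely used technique is the circuit breaker, which fails fast once a dependency looks unhealthy instead of letting callers pile up behind it. The Python below is a minimal sketch under stated assumptions (a single-threaded caller and a consecutive-failure threshold); `CircuitBreaker`, `CircuitOpenError`, `max_failures`, and `reset_after` are hypothetical names, not an API from the talk.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical sketch: stop calling a failing dependency so failures don't cascade.

    After `max_failures` consecutive failures the breaker opens and rejects calls
    immediately for `reset_after` seconds; after that, the next call is a trial.
    """

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the sketch can be tested deterministically
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling more load onto a sick dependency.
                raise CircuitOpenError("dependency unavailable; failing fast")
            # Cool-down elapsed: let the next call through as a trial ("half-open").
            # The failure count is still at the limit, so one more failure re-opens.
            self.opened_at = None
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the breaker fully
        return result
```

Injecting the clock keeps the sketch testable; a production breaker would also need thread safety and would usually limit how many trial calls it admits while half-open.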
SMALL
• “Two-pizza teams”
• Well-defined functional units
• Organized around business capabilities (Conway's Law)
AUTONOMOUS
• Implementation changes can happen independently of other services
• Data and functionality exposed only through service calls over the network
• Designed to be externalizable
• No back doors
THE RIGHT APPROACH
Continuously experiment, iterate, and improve
• It’s about the process
• Identify mistakes early
• Establish safety nets
• Fail and move on
THE PROCESS
Involves people and communication
• The most effective processes have continuous communication (think scrums and Kanban)
• Allows for collaboration that can identify failures before they happen
• Allows for feedback to continuously improve and cultivate growth
• Provides transparency
DEV LESSONS: BREAKING CODE VIOLENTLY
Build in violent failures to highlight issues
• C/C++ lessons:
  • Sanity checks using assertions
  • Invariant checks
  • If ever I’m here in the code and these conditions aren’t met, then I have no business being here. Something is wrong and I should fail violently.
• Involves tracing through the failure
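The assertion and invariant-check lesson carries over to other languages. A minimal Python analogue is sketched below, assuming a hypothetical `BoundedQueue` whose invariant is that it never holds more than `capacity` items. The invariant is checked on entry to and exit from each operation, so once it is broken (here, by code that reaches into `items` directly) the next operation fails loudly instead of continuing with corrupt state. Like C's `assert` when `NDEBUG` is defined, Python's `assert` statements are stripped when the interpreter runs with `-O`, so they suit internal sanity checks rather than validation of external input.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class BoundedQueue:
    """Hypothetical FIFO queue with the invariant len(items) <= capacity."""
    capacity: int
    items: Deque[int] = field(default_factory=deque)

    def __post_init__(self) -> None:
        # Sanity check: a queue that can hold nothing is a programming error.
        assert self.capacity > 0, "capacity must be positive"

    def _check_invariant(self) -> None:
        # If this fails, the program is somewhere it has "no business being",
        # so stop loudly rather than carry on with corrupt state.
        assert len(self.items) <= self.capacity, (
            f"invariant violated: {len(self.items)} items, capacity {self.capacity}")

    def push(self, value: int) -> None:
        self._check_invariant()
        if len(self.items) >= self.capacity:
            raise OverflowError("queue is full")  # expected condition, not a bug
        self.items.append(value)
        self._check_invariant()

    def pop(self) -> int:
        self._check_invariant()
        if not self.items:
            raise IndexError("pop from empty queue")
        value = self.items.popleft()
        self._check_invariant()
        return value
```

Note the split: a full queue is an ordinary, recoverable condition and raises a normal exception, while a broken invariant is a bug and trips an assertion.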
AUTOMATED REGRESSION TESTING
• As products and services evolved, we discovered that maintaining and incrementally adding new tests was valuable
• These tests were (and still are) most often based on failures and bugs actually encountered
• Scripts were developed to run nightly builds against various developer changes to test for regressions
• Testing tools evolved, both proprietary and open source
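A regression test is typically written at the moment a bug is fixed, so the same failure cannot quietly return. The sketch below uses Python's standard `unittest` module; the `parse_version` function and the bug numbers in the test names are hypothetical, chosen only to show the shape of such tests.

```python
import unittest
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse '[v]MAJOR.MINOR.PATCH' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


class ParseVersionRegressionTests(unittest.TestCase):
    # Each test pins down a failure that (hypothetically) happened once,
    # so that the same bug cannot silently come back.

    def test_bug_101_versions_compare_numerically_not_as_strings(self):
        # Bug: comparing raw strings ranked "1.10.0" below "1.9.0".
        self.assertGreater(parse_version("1.10.0"), parse_version("1.9.0"))

    def test_bug_102_tag_with_v_prefix(self):
        # Bug: a release tag such as "v2.0.1" crashed the parser.
        self.assertEqual(parse_version("v2.0.1"), (2, 0, 1))
```

A nightly job (for example, a cron entry) could run `python -m unittest discover` against the day's merged changes and report any failures.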
OPS LESSONS: CHAOS MONKEY
Test robustness of recovery using failure
• The platform should provide uninterrupted service to the customer
• Therefore:
  • It should always recover in an acceptable amount of time
  • We should inject random failures to ensure that changes have not regressed or caused new recovery problems
http://understeer.hatenablog.com/entry/2012/02/29/224629
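Netflix's Chaos Monkey works by terminating real instances in production. The toy simulation below is not that tool. It is a sketch, under heavily simplified assumptions, of the same idea: a hypothetical `Cluster` of instances that a supervisor revives after `restart_ticks`, a seeded random "monkey" that kills instances, and two checks matching the slide: the service must never go fully dark, and it must recover within an acceptable time once the chaos stops.

```python
import random
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Cluster:
    """Toy service made of interchangeable instances (hypothetical model)."""
    size: int
    restart_ticks: int = 2  # how long a replacement instance takes to come up
    # instance id -> ticks remaining until it is serving again
    down: Dict[int, int] = field(default_factory=dict)

    def healthy(self) -> int:
        return self.size - len(self.down)

    def kill(self, instance: int) -> None:
        # Killing an instance that is already down changes nothing.
        self.down.setdefault(instance, self.restart_ticks)

    def tick(self) -> None:
        # The supervisor's recovery loop: advance restarts, revive finished ones.
        for instance in list(self.down):
            self.down[instance] -= 1
            if self.down[instance] <= 0:
                del self.down[instance]


def chaos_run(cluster: Cluster, rounds: int, kill_probability: float, seed: int) -> bool:
    """Randomly kill instances; True if service never went dark and fully recovered."""
    rng = random.Random(seed)  # seeded so a failing run can be replayed exactly
    for _ in range(rounds):
        if rng.random() < kill_probability:
            cluster.kill(rng.randrange(cluster.size))
        if cluster.healthy() == 0:
            return False  # customers would have seen an outage
        cluster.tick()
    # Chaos stops; every instance must be back within the restart budget.
    for _ in range(cluster.restart_ticks):
        cluster.tick()
    return cluster.healthy() == cluster.size


print(chaos_run(Cluster(size=3), rounds=1000, kill_probability=0.5, seed=42))  # True
print(chaos_run(Cluster(size=1), rounds=10, kill_probability=1.0, seed=7))     # False
```

With one kill per round and a two-tick restart, at most two instances are ever down at once, so a three-instance cluster always survives while a single instance does not; the random runs are what would surface a change that breaks that recovery behavior.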
THE RIGHT WORKFLOW
Repeatably automate for consistency
• The goal is repeatable automation
• Toyota’s yellow cord
• Initially, pipelines may be very different
  • Different tools
  • Traditional vs. “cloud native”
• It’s a journey
  • Consolidation evolves naturally
DESIRABLE ENTERPRISE CI/CD WORKFLOW
[Diagram: myRepo (local test) → commit/push → project repo → CI (build, test, pass/fail) → build repo → CD (review/approval, deliver, deploy) → release repo → monitor; a 3rd-party source is also shown]
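Toyota's yellow cord (the andon cord) lets any worker stop the line when a defect appears. A pipeline can apply the same rule: every stage runs in a fixed order, and the first failure halts everything downstream. The Python sketch below assumes each stage can be modeled as a function returning pass/fail; `run_pipeline` and the stage names are hypothetical and only loosely follow the diagram's labels.

```python
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; the action returns True on pass, False on fail.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order; the first failure "pulls the cord" and halts the line."""
    completed: List[str] = []
    for name, action in stages:
        if not action():
            print(f"STOP: stage '{name}' failed; downstream stages will not run")
            return False, completed
        completed.append(name)
    return True, completed


# Hypothetical stages; real ones would invoke build tools, test suites, and deployers.
example = [
    ("local test", lambda: True),
    ("build", lambda: True),
    ("test", lambda: False),   # simulated failing test
    ("deliver", lambda: True),
    ("deploy", lambda: True),
]
passed, done = run_pipeline(example)
print(passed, done)  # False ['local test', 'build']
```

Because the same ordered list runs for every change, a failure is always caught at the same point, which is what makes the automation repeatable.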
OPS LESSONS: RED/GREEN
Configuration as code has failure built in
[Diagram: src repo → Continuous Integration / Continuous Deployment → image, package & metadata repository → Dev./Build, QA, and Production (in OHC) environments, with events]
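The slide's red/green scheme resembles what is often called blue/green deployment: two identical environments, one live and one idle. A release goes to the idle side, is checked, and only then receives traffic, so a failed release never reaches customers and rollback is a single switch. The Python below is a minimal in-memory sketch; `RedGreenRouter` and the `health_check` callback are hypothetical stand-ins for a real load balancer and real smoke tests.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RedGreenRouter:
    """Hypothetical router: two environments, only the `live` one gets traffic."""
    versions: Dict[str, str]  # environment colour -> deployed version
    live: str = "red"

    def idle(self) -> str:
        return "green" if self.live == "red" else "red"

    def deploy(self, version: str, health_check: Callable[[str], bool]) -> bool:
        target = self.idle()
        self.versions[target] = version  # deploy to the idle side only
        if not health_check(version):
            # Failure is contained: customers are still on the old live side.
            return False
        self.live = target  # flip all traffic in one step
        return True

    def rollback(self) -> None:
        # Valid only while the idle side still runs the previous good release.
        self.live = self.idle()
```

Because the previous release stays running on the other side, reverting a bad but healthy-looking release is just another flip of `live`.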
THE RIGHT INCENTIVES
Align rewards and behavior with desirable outcomes
• Incentives (advancement, money, recognition) need to reward trust, cooperation, and innovation
• Peer reward systems are also valuable
• Individuals have control over their own success
• But people still have responsibility for their actions
THE RIGHT CULTURE
Build systems and organizations that allow for failing well
• Transparency
• Even good decisions can have bad outcomes
• Innovation is inherently risky
• Cut losses (avoid the sunk-cost fallacy)
This is why open source is so successful!
BUT CULTURE ISN’T SOMETHING YOU JUST CHANGE
• Lack of an agreed-upon model of what “right” culture looks like
• Different organizations require different behaviors
• Culture change is difficult to measure and quantify
• Culture is very hard to impose
• Culture is an output, not an input
CREDITS
Tacoma Narrows Bridge: Barney Elliott, The Camera Shop; screenshot taken from 16 mm Kodachrome motion picture film by Barney Elliott
Time cover: Time, Inc.
Wipeout: Flickr/CC, https://www.flickr.com/photos/andymorffew/15843725192
Marshmallow challenge: http://marshmallowchallenge.com/Welcome.html
Linux Collaboration Summit: Linux Foundation
Two pizzas: Flickr/CC, https://www.flickr.com/photos/dongkwan/283076601
Frog: Kathy, Flickr/CC, https://flic.kr/p/b9fFV
Square peg: Flickr/CC, https://www.flickr.com/photos/epublicist/3546059144/