8. “You don’t choose the moment,
the moment chooses you.
You only get to choose how
prepared you are when it does.”
-Fire Chief Mike Burtch
9. Cloud Operations
is the ability to consistently create
and deploy reliable software to an
unreliable platform that scales
horizontally.
http://radar.oreilly.com/2007/10/operations-is-a-competitive-ad.html
10. “It’s not my code, it’s your machines!”
http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
12. “It’s not my code, it’s your machines!”
Spock
• Little bit weird
• Sits closer to the boss
• Thinks too hard
Scotty
• Pulls levers & turns knobs
• Easily excited
• Yells a lot in emergencies
http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
19. Fingerpointyness
Timeline (left to right along the time axis):
• problem!!!
• argggh!
• freaking out, not talking, finding fault
• blaming, covering ass
• whining, hiding, hurt egos
• figuring it out
• fixing things
• fixed
http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
24. Being productive
Timeline (left to right along the time axis):
• problem!!!
• argggh!
• figuring it out
• fixing things
• fixed
• feeling guilty
• move on with life
http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
28. Catastrophic Potential
Horizontal axis: Complexity (Simple to Complex)
Vertical axis: Coupling (Loose to Tight)
Tight coupling and complex: KEEP OUT!!!
Created by Jesse Robbins
"Catastrophic Potential" adapted from Normal Accidents by Charles Perrow
51. Failure happens
A single datacenter is the problem
• Since they all fail at some point
Recovery procedures after failure
• Power was gone ~45 minutes
• Most services took hours to come back
• Some unnamed ones more than 12 hours
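One way to picture "a single datacenter is the problem" is a caller that does not depend on any one site. The sketch below is only an illustration, not something from the talk: the datacenter names, the `call_with_failover` helper and the `fake_request` stand-in are all hypothetical, and a real setup would also need health checks, timeouts and data replication.

```python
from typing import Callable, Sequence


class AllDatacentersFailed(Exception):
    """Raised when every datacenter in the list failed to serve the request."""


def call_with_failover(datacenters: Sequence[str],
                       request: Callable[[str], str]) -> str:
    """Try each datacenter in order and return the first successful response."""
    errors = []
    for name in datacenters:
        try:
            return request(name)
        except Exception as exc:  # any failure means: move on to the next site
            errors.append(f"{name}: {exc}")
    raise AllDatacentersFailed("; ".join(errors))


def fake_request(name: str) -> str:
    # Hypothetical stand-in for a real network call; "dc-east" has lost power.
    if name == "dc-east":
        raise ConnectionError("power outage")
    return f"served by {name}"


result = call_with_failover(["dc-east", "dc-west"], fake_request)
print(result)  # served by dc-west
```

With only one entry in the list, the same call raises `AllDatacentersFailed`, which is the single-datacenter situation the slide warns about.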