10+ Deploys Per Day: Dev and Ops Cooperation at Flickr


Communication and cooperation between development and operations aren't optional; they're mandatory. Flickr takes the idea of "release early, release often" to an extreme: on a normal day there are 10 full deployments of the site to our servers. This session discusses why this rate of change works so well, and the culture and technology needed to make it possible.

Published in: Technology
13 comments
661 likes
  • Great presentation ... I was wondering if Flickr uses staging or code review? And what types of testing, such as integration testing, functional testing, unit testing, or A/B testing, are used?
  • This is hilarious, you say no finger-pointing, but there is a clear message in this presentation that 'Ops' are the problem, with only a small nod to how developers can help. The general theme is that Ops are the ones causing problems with several slides focussing on this with no equivalent developers slide.
  • http://blip.tv/oreilly-velocity-conference/velocity-09-john-allspaw-10-deploys-per-day-dev-and-ops-cooperation-at-flickr-2297883
  • infra
  • Well made presentation, good work!

10+ Deploys Per Day: Dev and Ops Cooperation at Flickr

  1. 10 deploys per day: Dev & ops cooperation at Flickr. John Allspaw & Paul Hammond, Velocity 2009
  2. 3 billion photos, 40,000 photos per second (photo: http://flickr.com/photos/jimmyroq/415506736/)
  3. Dev versus Ops
  4. “It’s not my machines, it’s your code!”
  5. “It’s not my code, it’s your machines!”
  6. Spock: little bit weird; sits closer to the boss; thinks too hard. Scotty: pulls levers & turns knobs; easily excited; yells a lot in emergencies.
  7. Says “No” all the time; afraid that newfangled things will break the site; fingerpointy
  8. Ops stereotype: because the site breaks unexpectedly; because no one tells them anything; because they say “NO” all the time
  9. Traditional thinking: Dev’s job is to add new features; Ops’ job is to keep the site stable and fast (photo: http://www.flickr.com/photos/stewart/461099066/)
  10. Ops’ job is NOT to keep the site stable and fast
  11. Ops’ job is to enable the business (this is dev’s job too)
  12. The business requires change
  13. But change is the root cause of most outages!
  14. Discourage change in the interests of stability, or allow change to happen as often as it needs to
  15. Lowering risk of change through tools and culture
  16. Dev and Ops
  17. Ops who think like devs; devs who think like ops
  18. “But that’s me!”
  19. You can always think more like them
  20. Tools
  21. (1) Automated infrastructure: if there is only one thing you do…
  22. (1) Automated infrastructure: if there is only one thing you do… (examples shown: CFengine, Chef, BCfg2, FAI, System Imager, Puppet, Cobbler)
  23. Role & configuration management; OS imaging
  24. (2) Shared version control
  25. Everyone knows where to look (photo: http://www.flickr.com/photos/thunderchild5/1330744559/)
  26. (3) One step build
  27. (3) One step build and deploy
  28. [2009-06-22 16:03:57] [harmes] site deployed (changes...) Who? When? What?
  29. Small frequent changes (photo: http://www.flickr.com/photos/mauren/2429240906/)
  30. (4) Feature flags (aka branching in code)
  31. Desktop software: 1.0, 1.0.1, 1.0.2, 1.1, 1.1.1, 1.2
  32. Web software: r2301, r2302, r2306
  33. Always ship trunk (photo: http://www.flickr.com/photos/8720628@N04/2188922076/)
  34. Everyone knows exactly where to look (photo: http://www.flickr.com/photos/thunderchild5/1330744559/)
  35. Feature flags. PHP: if ($cfg['enable_feature_video']) { … } Smarty: {if $cfg.enable_feature_beehive} … {/if}
  36. Private betas (photo: http://www.flickr.com/photos/healthserviceglasses/3522809727/)
  37. Bucket testing (photo: http://www.flickr.com/photos/davidw/2063575447/)
  38. Dark launches (photo: http://www.flickr.com/photos/jking89/3031204314/)
  39. Free contingency switches (photo: http://www.flickr.com/photos/flattop341/260207875/)
  40. (5) Shared metrics
  41. Application level metrics
  42. Application level metrics
  43. Adaptive feedback loops (diagram labels: App, System, Metrics, “RU ok?”, “maybe?”)
  44. (6) IRC and IM robots
  45. Dev, Ops, and Robots having a conversation (diagram labels: build, deploy, logs, alerts, monitors, IRC, search engine)
  46. Culture
  47. (1) Respect: if there is only one thing you do…
  48. Don’t stereotype (not all developers are lazy) (photo: http://www.flickr.com/photos/aaronjacobs/64368770/)
  49. Respect other people’s expertise, opinions and responsibilities (photo: http://www.flickr.com/photos/chrisdag/2286198568/)
  50. Don’t just say “No” (photo: http://www.flickr.com/photos/jwheare/2580631103/)
  51. Don’t hide things (photo: http://www.flickr.com/photos/alancleaver/2661424637/)
  52. Developers: Talk to ops about the impact of your code:
      • what metrics will change, and how?
      • what are the risks?
      • what are the signs that something is going wrong?
      • what are the contingencies?
      This means you need to work this out before talking to ops.
  53. (2) Trust
  54. Ops needs to trust dev to involve them on feature discussions. Dev needs to trust ops to discuss infrastructure changes. Everyone needs to trust that everyone else is doing their best for the business. (photo: http://www.flickr.com/photos/85128884@N00/2650981813/)
  55. Shared runbooks & escalation plans (photo: http://www.flickr.com/photos/flattop341/224176602/)
  56. Provide knobs and levers (photo: http://www.flickr.com/photos/telstar/2861103147/)
  57. Ops: Be transparent, give devs access to systems (photo: http://www.flickr.com/photos/williamhook/3468484351/)
  58. (3) Healthy attitude about failure
  59. Failure will happen (photo: http://www.flickr.com/photos/pinksherbet/447190603/)
  60. If you think you can prevent failure then you aren’t developing your ability to respond (photo: http://www.flickr.com/photos/toms/2323779363/)
  61. (photo: http://www.flickr.com/photos/changereality/2349538868/)
  62. Fire drills (photo: http://www.flickr.com/photos/dnorman/2678090600)
  63. (4) Avoiding Blame
  64. No fingerpointing (photo: http://www.flickr.com/photos/rocketjim54/2955889085/)
  65. Fingerpointyness (timeline): problem!!! argggh! → freaking out, not talking, finding fault → blaming, covering ass → whining, hiding, hurt egos → figuring it out → fixing things → fixed.
  66. Being productive (timeline): problem!!! argggh! → figuring it out → fixing things → fixed. → feeling guilty → move on with life
  67. Developers: Remember that someone else will probably get woken up when your code breaks (photo: http://www.flickr.com/photos/alex-s/353218851/)
  68. Ops: provide constructive feedback on current aches and pains (photo: http://www.flickr.com/photos/allspaw/2819774755/)
  69. Tools: (1) Automated infrastructure, (2) Shared version control, (3) One step build and deploy, (4) Feature flags, (5) Shared metrics, (6) IRC and IM robots. Culture: (1) Respect, (2) Trust, (3) Healthy attitude about failure, (4) Avoiding Blame.
  70. This is not easy. You could just carry on shouting at each other…
  71. (Thank you)
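
The feature-flag slides (30 to 39) describe shipping trunk with new code switched off, then exposing it through private betas, bucket testing, and contingency switches. The block below is a minimal sketch of that idea in Python rather than the deck's PHP. Only the flag names enable_feature_video and enable_feature_beehive come from the slides; the flag enable_feature_new_search, the user "alice", the rule format, and the functions bucket and is_enabled are all hypothetical names invented for illustration, not Flickr's implementation.

```python
import hashlib

# Hypothetical flag table. Only the first two flag names appear in the slides;
# the third flag and the dict-based rule format are assumptions for this sketch.
CFG = {
    "enable_feature_video": True,      # launched to everyone
    "enable_feature_beehive": False,   # code is deployed but switched off
    "enable_feature_new_search": {     # partial rollout (assumed rule format)
        "percent": 10,                 # bucket test: share of users, 0 to 100
        "beta_users": {"alice"},       # private beta: always on for these users
    },
}


def bucket(user_id, flag):
    """Map a user to a stable bucket in 0..99 for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag, user_id=None, cfg=CFG):
    """Return True if the flag is on for this user (or globally)."""
    rule = cfg.get(flag, False)        # unknown flags default to off
    if isinstance(rule, bool):
        return rule                    # plain on/off switch
    if user_id is None:
        return False                   # partial rollouts need a known user
    if user_id in rule.get("beta_users", set()):
        return True
    return bucket(user_id, flag) < rule.get("percent", 0)


if __name__ == "__main__":
    if is_enabled("enable_feature_video"):
        print("render video player")
    if not is_enabled("enable_feature_beehive"):
        print("beehive code is deployed but hidden")
```

Under these assumptions, setting a flag to False in the table acts as the contingency switch from slide 39: the feature disappears without a rollback. Hashing the flag name together with the user ID keeps each user's bucket stable across requests while letting different flags pick different subsets of users.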