
Go or No-Go: Operability and Contingency Planning at Etsy.com

These are the slides from my talk at the Surge Conference in 2010, in Baltimore: http://omniti.com/surge/2010/speakers/john-allspaw

Published in: Technology, Business


  1. Go or No-Go: Operability and Contingency Planning. John Allspaw, Etsy.com
  2. Etsy as of now: Total Members: over 5.7 million; Total Sellers: over 400,000; Items Currently Listed: 6.5 million; Page Views per month: 775 million; Total $ sold (gross merchandise sales) in 2010: $179.4 million (through August)
  3. New Features
  4. Delivering Operable Software: Arch Review, Development/Ops, Go or No-Go, Launch*, Feedback Loop
  5. CONTINUOUS DEPLOYMENT != deploying new features without coordination and planning
  6. Operability Review Contingency Checklist
  7. Not An Innovative Idea http://en.wikipedia.org/wiki/Launch_status_check
  8. 10 minute get-together • Product • Development • Operations • Design • Community • Support
  9. Consensus
  10. Informally Codifies “OK”: “We all understand/agree/accept that we are OK here!” Participants: Dev, Ops, Product, Community, Support. Diagram labels: Buggy, Stable, Perfect!; Sloppy, Finished Enough For Launch, Unfinished
  11. Yes or No
  12. Has the feature been tested enough to deploy to production? Is there any final functional QA still needed?
  13. Is communication (blog post/forums/etc) about the feature ready to go out with the feature?
  14. Does everyone know: when it will go live, and who will push the feature?
  15. Has the feature been in production for staff (or some other specific subset of the users) already? If not, could it have been?
  16. Is it possible to dark launch this feature? Will this feature be dark launched? (or, has it already?)
  17. Is it possible to turn up this feature on a percentage basis? If so: will we?
  18. Does it involve any new infrastructure? If so: are those pieces in monitoring and metrics collection? (this answer can’t be “no” before launch)
  19. Do we have on/off switches for this feature? If so: are those switches documented? (this answer can’t be “no” before launch)
  20. Are all the leads (Dev, Ops, Product, Community, Support, etc.) available for the launch and in communication? (this answer can’t be “no” before launch)
  21. Is there a single and easy place for users to report bugs or concerns about the feature? (forum topic, etc.)
  22. Have all leads agreed upon a post-launch “it’s all DONE” time to declare the launch was successful?
  23. Have we done a Contingency Checklist™ and everyone reviewed it? (this answer can’t be “no” before launch)
  24. Contingency Checklist
  25. “What could possibly go wrong?” “When it does go wrong, WTF will we do?!”
  26. NOTE: This is worked out BEFORE launch (when we have saner heads), normally by product and development, involving others where needed.
  27. Checklist columns: Issue, Likelihood, Comment(s), Impact on Users, Engineering Response, Onsite Messaging, Forums, Blog, PR
  28. Checklist columns (repeated): Issue, Likelihood, Comment(s), Impact on Users, Engineering Response, Onsite Messaging, Forums, Blog, PR
  29. Example: Coffee! AWESOME NEW FEATURE • add coffee (like a tag) to your profile • others can favorite coffees • page showing all coffee favorites • bulk-add coffees to your profile • search people by their coffee
  30. Issue What could possibly go wrong with the feature launched in production? Example: “The Coffees-You’ve-Favorited page is too expensive.”
  31. Likelihood: How likely is this issue to come up? Example: “Low to mid.”
  32. Comment(s): Any extra info about this issue goes here. Example: “Because of how we paginate the coffee favorites page, they are somewhat harder than normal favorites. If we do have to turn this off, we’re saying that we need to re-design it, or it needs to stay off until the initial burst of traffic from the launch has passed.”
  33. Impact How much is this going to impact the experience of the feature, if it does become a concern? Example: “High”
  34. Engineering Response What will we do to mitigate the issue (i.e. can we gracefully degrade?) Example: “Set disable_coffee_favorites_page = 1”
  35. Onsite Messaging What is the messaging to the community in the forums/blog/etc., if this needs graceful degradation? Example: “The Coffee Favorites page is currently unavailable.” Or, in the forums: “We’re working through some issues with displaying Coffee Favorites; we’ll let you know the status as time goes on.”
  36. PR Is the issue so severe that we need PR involved? Example: “The CEO sends a press release, apologizing to Folger’s, Peet’s, and Starbucks with a witty yet calming voice of explanation and a humble request for patience.”
  37. * afterwards....
  38. * successful launch... Metrics? Are we there yet? OMG! Who to call if it breaks later?
  39. * non-successful launch... Metrics? What’d we miss? Post Mortem? Ramp down?
  40. Photos http://www.flickr.com/photos/jliba/3783269078/ http://www.flickr.com/photos/mybloodyself/2072928376/ http://www.flickr.com/photos/jacy/360020853/ http://www.flickr.com/photos/f-l-e-x/2319852529/ http://www.flickr.com/photos/16230215@N08/3023061528/ http://www.flickr.com/photos/proimos/4199675334/ http://www.flickr.com/photos/askal_bosch/2579320395/
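
The Go or No-Go questions in slides 12 to 23 can be sketched as a small data structure. This is a minimal sketch, not Etsy's actual tooling: the names `ChecklistItem`, `go_or_no_go`, and the `blocking` flag are assumptions introduced for illustration, and the question texts are paraphrased from the slides. Questions the slides mark with “this answer can’t be ‘no’ before launch” are modelled as blocking; treating an unanswered question as a no-go is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ChecklistItem:
    question: str
    blocking: bool = False         # True for "this answer can't be 'no' before launch"
    answer: Optional[bool] = None  # None means nobody has answered yet


def go_or_no_go(items: List[ChecklistItem]) -> Tuple[bool, List[str]]:
    """Return (go, reasons).

    The launch is a no-go if any question is unanswered (assumption) or
    any blocking question was answered "no".
    """
    reasons = []
    for item in items:
        if item.answer is None:
            reasons.append(f"unanswered: {item.question}")
        elif item.blocking and not item.answer:
            reasons.append(f"blocking 'no': {item.question}")
    return (not reasons, reasons)


# Hypothetical answers from a 10 minute get-together.
checklist = [
    ChecklistItem("Is it possible to dark launch this feature?", answer=False),
    ChecklistItem("Are the on/off switches documented?", blocking=True, answer=True),
    ChecklistItem("Has everyone reviewed the Contingency Checklist?",
                  blocking=True, answer=False),
]
go, reasons = go_or_no_go(checklist)
print("GO" if go else "NO-GO")
for reason in reasons:
    print(" -", reason)
```

A non-blocking “no” (the dark launch question here) does not stop the launch; it only records that the option was considered.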
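
Slides 17, 19 and 34 mention percentage-based turn-up, on/off switches, and the example response “Set disable_coffee_favorites_page = 1”. The following is a rough sketch of how such a switch and ramp could fit together, not a description of Etsy's implementation. Only `disable_coffee_favorites_page` comes from the slides; `coffee_favorites_percent`, `user_bucket`, and `render_coffee_favorites` are hypothetical names, and hashing the user id into 100 buckets is one common way to keep a ramp-up sticky per user.

```python
import hashlib
from typing import Optional

# Hypothetical config; only disable_coffee_favorites_page appears in the slides.
config = {
    "disable_coffee_favorites_page": 0,  # 1 = switched off (graceful degradation)
    "coffee_favorites_percent": 10,      # share of users who get the feature
}


def user_bucket(user_id: int, feature: str) -> int:
    """Map a user to a stable bucket in 0..99 so the ramp is sticky per user."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def render_coffee_favorites(user_id: int, cfg: dict) -> Optional[str]:
    if cfg["disable_coffee_favorites_page"]:
        # The off switch always wins; show the agreed onsite messaging.
        return "The Coffee Favorites page is currently unavailable."
    if user_bucket(user_id, "coffee_favorites") >= cfg["coffee_favorites_percent"]:
        return None  # not ramped up to this user yet; caller hides the page link
    return f"Coffee favorites for user {user_id}"


print(render_coffee_favorites(42, config))
```

Raising `coffee_favorites_percent` only ever adds users, because each user's bucket never changes, and flipping the disable switch needs no deploy of new code.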
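
One row of the Contingency Checklist (slides 27 to 36) could be recorded as a simple structure, as sketched below. The field names follow the column headings in the slides and the values are the coffee example from the deck; the class name `ContingencyEntry` and the helper `missing_fields` are assumptions added for illustration. The PR column is deliberately left empty here to show the completeness check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContingencyEntry:
    issue: str
    likelihood: str
    comments: str
    impact: str
    engineering_response: str
    onsite_messaging: str
    pr: str


def missing_fields(entry: ContingencyEntry) -> List[str]:
    """Names of columns still blank, so a row can be checked before launch."""
    return [name for name, value in vars(entry).items() if not value.strip()]


entry = ContingencyEntry(
    issue="The Coffees-You've-Favorited page is too expensive.",
    likelihood="Low to mid.",
    comments="Pagination makes coffee favorites harder than normal favorites.",
    impact="High",
    engineering_response="Set disable_coffee_favorites_page = 1",
    onsite_messaging="The Coffee Favorites page is currently unavailable.",
    pr="",  # not filled in yet
)
print(missing_fields(entry))
```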