Synchronized video and slides, plus MP3 and slide downloads, are available at https://bit.ly/2S6mtg5.
Damon Edwards takes a look at the techniques that high-performing operations organizations are using to finally transform how they identify, mobilize, and respond to incidents. Filmed at qconsf.com.
Damon Edwards is a Co-Founder of Rundeck Inc., the makers of Rundeck, the popular open source runbook automation. He has spent the past 19 years working with both the technology and business ends of IT Operations and is noted for being a leader in porting cutting-edge DevOps techniques to large-scale enterprise organizations.
2. InfoQ.com: News & Community Site
• Over 1,000,000 software developers, architects and CTOs read the site worldwide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and minibooks
• Over 40 new content items per week
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/incident-management-devops-sre/
3. Purpose of QCon
- to empower software development by facilitating the spread of knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
11. What Is an Incident?
An unplanned disruption impacting customers or business operations
Outages
Service Degradation
Work interruption
Delay/Waiting
“Short-Notice” Requests
27. Adrian Cockcroft, DockerCon EU 2014
[Diagram labels: Developer, Release Plan, Deploy Feature to Production, Bugs, Old Release Still Running]
Immutable microservice deployment scales, is faster with large teams and diverse platform components
Architecture enables speed. Speed is the advantage.
40. Principles of SRE
1. SRE needs Service Level Objectives, with consequences
2. SREs have time to make tomorrow better than today
3. SRE teams have the ability to regulate their workload
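The slide does not say how an SLO's "consequences" are decided. One common way to make them concrete is an error budget: the SLO target implies how many failures are tolerable in a window, and the team's response depends on how much of that allowance is left. The sketch below is only an illustration under that assumption. The `Slo` class, both function names, the 25% threshold, and the response strings are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical availability SLO, e.g. target=0.999 for 99.9% success."""
    target: float


def error_budget_remaining(slo: Slo, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    if total <= 0:
        return 1.0  # no traffic in the window, so nothing has been spent
    allowed_failures = (1.0 - slo.target) * total
    if allowed_failures == 0:
        # A 100% target leaves no budget at all.
        return 1.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed_failures


def consequence(remaining: float) -> str:
    """Map remaining budget to an agreed response (thresholds are assumptions)."""
    if remaining <= 0:
        return "freeze feature releases; prioritize reliability work"
    if remaining < 0.25:
        return "slow down releases; review recent incidents"
    return "ship normally"


if __name__ == "__main__":
    slo = Slo(target=0.999)
    remaining = error_budget_remaining(slo, total=1_000_000, failed=1_200)
    print(f"{remaining:.2f} -> {consequence(remaining)}")
```

The design point is that the response is agreed on in advance and triggered by the numbers, which also ties in with the third principle: a spent budget gives the team a defensible way to regulate its workload.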
77. Why? Why? Why? Why? Why?
There is no root cause. (That’s just a political distinction)
Right, Wrong, Safety II, and You.
Incidents = unplanned investments
REDeploy.io