2012 Velocity London presentation by Patrick Debois (@patrickdebois), Damon Edwards (@damonedwards), Gene Kim (@realgenekim), John Willis (@botchagalupe)
2012 Velocity London: DevOps Patterns Distilled
1. DevOps Patterns Distilled
Patrick Debois (@patrickdebois)
Damon Edwards (@damonedwards)
Gene Kim (@realgenekim)
John Willis (@botchagalupe)
Velocity Europe 2012
7. Every Company Is An IT Company…
95% of all capital projects have an IT component…
50% of all capital spending is technology-related
[Chart: “We are here…” vs. “Where we need to be…”; “IT is always in the way (again…)”]
8. The DevOps Cookbook (Coming H1 2013)
John Allspaw (@allspaw)
Patrick Debois (@patrickdebois)
Damon Edwards (@damonedwards)
Gene Kim (@realgenekim)
Mike Orzen (@mikeorzen_leanit)
John Willis (@botchagalupe)
10. The First Way:
Systems Thinking (Left To Right)
Understand the flow of work
Always seek to increase flow
Never unconsciously pass defects downstream
Never allow local optimization to cause global degradation
Achieve profound understanding of the system
12. The Second Way:
Amplify Feedback Loops (Right to Left)
Understand and respond to the needs of all customers, internal and external
Shorten and amplify all feedback loops: stop the line when necessary
Create quality at the source
Create and embed knowledge where we need it
15. Area 1: Extend delivery to production
think “Jez Humble”
[Diagram: DEV / OPS, Area 1]
16. Area 2: Extend operations feedback to project
think “John Allspaw”
[Diagram: DEV / OPS, Area 2]
17. Area 3: Embed Dev into Ops
think “Adrian Cockcroft”
[Diagram: DEV / OPS, Area 3]
18. Area 4: Embed Ops into Dev
think “Chris Read”
[Diagram: DEV / OPS, Area 4]
19. [Diagram: DEV / OPS with all four areas]
Area 1: Extend delivery to production
Area 2: Extend operations feedback to project
Area 3: Embed Project knowledge into Operations
Area 4: Embed Operations knowledge into Project
20. The Third Way:
Culture Of Continual Experimentation & Learning
Foster a culture that rewards:
Experimentation (taking risks) and learning from failure
Repetition is the prerequisite to mastery
Why?
You need a culture that keeps pushing into the danger zone
And the habits that enable you to survive in the danger zone
25. Step #3 - Think Continuous Integration + Infra as Code
• Version Control Everything
• Single Repository of Truth
• One-step Dev, Test, Prod environment build process
“Technology” Focused
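The “one-step” build bullet can be illustrated with a small Python sketch. It assumes a hypothetical version-controlled configuration (`BASE_CONFIG` plus per-environment `OVERRIDES`, with made-up values) and shows Dev, Test, and Prod all being built by the same function from the same source of truth, differing only in declared overrides. It is an illustration of the idea, not any particular tool’s API.

```python
# Hypothetical, version-controlled environment definition (single source of truth).
BASE_CONFIG = {
    "app_version": "1.4.2",  # made-up version string
    "packages": ["nginx", "python3"],
    "replicas": 1,
}

# Environments differ only by explicit, reviewable overrides.
OVERRIDES = {
    "dev": {},
    "test": {"replicas": 2},
    "prod": {"replicas": 6},
}


def build_environment(env: str) -> list[str]:
    """Return the ordered build steps for one environment.

    The same function builds every environment, so Dev, Test, and Prod
    cannot drift apart through hand-run, environment-specific steps.
    """
    if env not in OVERRIDES:
        raise ValueError(f"unknown environment: {env!r}")
    # Merge into a new dict; BASE_CONFIG itself is never modified.
    config = {**BASE_CONFIG, **OVERRIDES[env]}
    return [
        f"provision {config['replicas']} host(s) for {env}",
        f"install packages: {', '.join(config['packages'])}",
        f"deploy app version {config['app_version']}",
    ]


if __name__ == "__main__":
    for name in ("dev", "test", "prod"):
        print(name, build_environment(name))
```

Because the only per-environment input is the override table, a difference between Test and Prod has to be written down (and version-controlled) to exist at all.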
26. Step #4 - Think Continuous Delivery
• Extend Release into Prod
• Reduce Technical Debt
• Definition of Done
• Visualize Tasks/Bugs
Mind Shift
27. Step #5 - Integrate other roles in the process
QA
CAB
Security
Management
36. Anti-Pattern #6 - Organizational Inertia
Group experiment:
> 80% of people invest: investors get $80, others $0
< 80% of people invest: investors get -$10, others $0
Convergence to investing or not investing depends on the initial group decision
Nash Equilibrium - Game Theory
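The slide’s payoff rule can be checked with a short Python sketch. It assumes a hypothetical group of 10 players and treats exactly 80% as missing the threshold (the slide only specifies > 80% and < 80%). It tests every possible number of investors for whether any single player could do better by switching; only “nobody invests” and “everybody invests” survive, which is why the group’s starting point decides where it ends up.

```python
def payoff(invests: bool, investors: int, players: int) -> int:
    """Payoff for one player, given how many players (including them) invest."""
    if not invests:
        return 0
    # Integer form of "investors / players > 80%", which avoids float rounding.
    # Assumption: exactly 80% counts as NOT reaching the threshold.
    return 80 if investors * 5 > players * 4 else -10


def is_nash_equilibrium(investors: int, players: int) -> bool:
    """True if no single player gains by switching when `investors` of `players` invest."""
    # An investor who drops out earns 0 instead of their current payoff.
    if investors > 0 and payoff(False, investors - 1, players) > payoff(True, investors, players):
        return False
    # A non-investor who joins raises the investor count by one.
    if investors < players and payoff(True, investors + 1, players) > payoff(False, investors, players):
        return False
    return True


if __name__ == "__main__":
    n = 10
    print([k for k in range(n + 1) if is_nash_equilibrium(k, n)])  # [0, 10]
```

Both extremes are stable: when everyone invests, dropping out trades $80 for $0; when nobody invests, a lone investor loses $10.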
37. Outcomes Business Goal(s)
Shared Process
Trust People
Robust Technology
40. GOAL
Provide feedback and visibility
...but why?
41. GOAL
Provide feedback and visibility
to align your organization’s improvement efforts
42. HOW DO YOU ALIGN YOUR ORGANIZATION?
1. Clear goals and operating instructions
2. Shared situational awareness
43. HOW DO YOU CREATE SHARED SITUATIONAL AWARENESS?
44. FOUR TYPES OF DATA YOU NEED
[Diagram: Situational Awareness at the center of People & Process Data, Application Data, Infrastructure Data, and Business Data]
45. Step 1: MAKE ALL INFRASTRUCTURE DATA VISIBLE
• Network, Disk I/O, Memory, Utilization, etc...
• Present data in context of the application
• Standardize and extend to all environments
• Create awareness of deviations from norm
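The “deviations from norm” bullet, and the Statistical Process Control charts mentioned in the speaker notes, can be sketched in Python as simple control limits: learn a mean and spread from a baseline period, then flag readings outside mean ± 3 standard deviations. This is a simplified illustration with made-up CPU numbers; a textbook individuals (XmR) control chart estimates the spread from the moving range rather than the sample standard deviation.

```python
from statistics import mean, stdev


def control_limits(baseline: list[float], sigmas: float = 3.0) -> tuple[float, float]:
    """Lower/upper limits: baseline mean +/- `sigmas` sample standard deviations."""
    center = mean(baseline)
    spread = stdev(baseline)  # requires at least two baseline points
    return center - sigmas * spread, center + sigmas * spread


def deviations(samples: list[float], baseline: list[float],
               sigmas: float = 3.0) -> list[tuple[int, float]]:
    """Return (index, value) for each sample outside the control limits."""
    lower, upper = control_limits(baseline, sigmas)
    return [(i, x) for i, x in enumerate(samples) if x < lower or x > upper]


if __name__ == "__main__":
    # Hypothetical CPU utilization (%) from a "normal" week, then new readings.
    baseline = [40, 42, 41, 39, 40, 41, 40, 42, 39, 41]
    print(deviations([41, 40, 55, 38, 30], baseline))  # [(2, 55), (4, 30)]
```

Only the two readings that leave the band are surfaced, which matches the goal of pointing people at deviations instead of burying them in raw data.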
46. STEP 2: MAKE ALL APPLICATION DATA VISIBLE
• Performance, faults, availability, logs, etc...
• Dev takes ownership of instrumenting their applications, but anyone can view or extend
• Enable self-service metric creation (“one line of code”)
• Increase signal, decrease noise
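The “one line of code” idea can be sketched in Python with a small, hypothetical in-process registry (`Metrics`, `incr`, and `timer` are illustrative names, not a specific library’s API). The point is the call site: counting an event or timing a block adds one line to application code. A real setup would ship these values to a shared metrics service (a StatsD-style daemon, for example) so that anyone can graph them.

```python
import time
from collections import Counter
from contextlib import contextmanager


class Metrics:
    """Tiny in-process metrics registry (hypothetical; not a real library)."""

    def __init__(self) -> None:
        self.counters: Counter[str] = Counter()
        self.timings: dict[str, list[float]] = {}

    def incr(self, name: str, value: int = 1) -> None:
        """Count an event: one line at the call site."""
        self.counters[name] += value

    @contextmanager
    def timer(self, name: str):
        """Record how long the wrapped block took, in seconds."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings.setdefault(name, []).append(time.perf_counter() - start)


metrics = Metrics()


def register_user(email: str) -> str:
    # Hypothetical application code: each instrumentation point is one line.
    with metrics.timer("user.register.seconds"):
        user_id = email.lower()  # stand-in for real registration work
    metrics.incr("user.registered")
    return user_id
```

Keeping the call site this small is the design choice that matters: if adding a metric felt like a schema change, developers would skip it.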
47. STEP 3: BREAK BUSINESS DATA OUT OF ITS SILO
• Sales, signups, churn, clickstream, etc...
• Make goals explicit (KPIs, one metric that matters)
• Link all other metrics to business metrics
• Empower improvement by showing cause and effect
[Diagram: metric hierarchy, from Key Business Metric down through Secondary Business Metric and Technical/Process Metric to My activity]
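One way to make “link all other metrics to business metrics” concrete is to record, for each metric, the metric it rolls up into. The Python sketch below uses the chain from the speaker notes (a developer’s hand-off decisions affect promotion time, which affects cycle time, which affects time to market); the metric names and the dictionary representation are illustrative assumptions. Walking the links answers “which business goal does this number serve?”

```python
# Hypothetical links from each metric to the metric it rolls up into,
# following the example in the speaker notes:
# my activity -> release promotion time -> cycle time -> time to market.
ROLLS_UP_TO = {
    "my activity: how I hand off a release": "release promotion time",
    "release promotion time": "cycle time",
    "cycle time": "time to market for feature requests",
}


def path_to_key_metric(metric: str) -> list[str]:
    """Follow the links upward until reaching a metric with no parent (the KPI)."""
    path = [metric]
    while path[-1] in ROLLS_UP_TO:
        parent = ROLLS_UP_TO[path[-1]]
        if parent in path:  # guard against accidentally circular links
            raise ValueError(f"metric links form a loop at {parent!r}")
        path.append(parent)
    return path


if __name__ == "__main__":
    print(" -> ".join(path_to_key_metric("my activity: how I hand off a release")))
```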
48. STEP 4: COLLECT AND VISUALIZE ORGANIZATION & PROCESS DATA
• Change activity, quality, cycle time, effectiveness, etc...
• Focus on effectiveness, not efficiency
• Visualize the flow across the entire lifecycle
• Capture change data and enable overlays on any graph
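Capturing change data so it can be overlaid on any graph mostly means keeping a timestamped change log. The Python sketch below, with a hypothetical `ChangeEvent` record and made-up events, answers the first question asked during an outage, “what changed just before this?”, by returning the changes recorded in a window before the incident. A graphing tool would draw the same events as vertical markers on each chart.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    at: datetime        # when the change was made
    description: str    # what changed (deploy, config, schema, ...)


def changes_before(events: list[ChangeEvent], incident_at: datetime,
                   window: timedelta = timedelta(hours=1)) -> list[ChangeEvent]:
    """Changes recorded in the `window` leading up to an incident, oldest first."""
    start = incident_at - window
    return sorted((e for e in events if start <= e.at <= incident_at),
                  key=lambda e: e.at)


if __name__ == "__main__":
    log = [
        ChangeEvent(datetime(2012, 10, 3, 14, 40), "config: raise connection pool size"),
        ChangeEvent(datetime(2012, 10, 3, 9, 0), "db: add index on orders"),
        ChangeEvent(datetime(2012, 10, 3, 14, 5), "deploy: web app release"),
    ]
    for event in changes_before(log, incident_at=datetime(2012, 10, 3, 14, 50)):
        print(event.at.strftime("%H:%M"), event.description)
```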
49. USE TO DRIVE CONTINUOUS IMPROVEMENT
[Diagram: Situational Awareness at the center of Organization & Process Data, Application Data, Infrastructure Data, and Business Data]
53. Goals
Shorten and amplify feedback loops
Create knowledge and capabilities where we need them
Ensure that we’re optimizing for the entire system
54. “We found that when we woke up developers
at 2am, defects got fixed faster than ever”
Patrick Lightbody
Founder/CEO, BrowserMob
55. IT Operations As The Developers’ Best Friend
Tom Limoncelli Patrick Debois Adrian Cockcroft
56. Require That Dev Initially Maintain Their Own Service
Source: Tom Limoncelli, Google (Usenix 2012)
57. Test Whether Developers Qualify For IT Operations Resources
Types/frequency of pager alerts
Maturity of monitoring
System architecture review
Release process
Defect counts and severity
Production hygiene
Source: Tom Limoncelli, Google (Usenix 2012)
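The checklist on this slide can be expressed as a simple gate. The Python sketch below maps each criterion to a field of a hypothetical `ServiceReview` record; the field names and thresholds are invented for illustration and are not the actual criteria used at Google. Returning the list of unmet criteria, rather than a single yes/no, tells the Dev team what to fix before IT Operations takes on the service.

```python
from dataclasses import dataclass

# Hypothetical thresholds; each organization would set its own.
MAX_PAGES_PER_WEEK = 5
MAX_OPEN_SEVERE_DEFECTS = 0


@dataclass
class ServiceReview:
    pages_per_week: int               # types/frequency of pager alerts
    monitoring_mature: bool           # maturity of monitoring
    architecture_reviewed: bool       # system architecture review
    release_process_repeatable: bool  # release process
    open_severe_defects: int          # defect counts and severity
    production_hygiene_ok: bool       # production hygiene


def unmet_criteria(review: ServiceReview) -> list[str]:
    """Return the criteria a service still fails; an empty list means it qualifies."""
    failures = []
    if review.pages_per_week > MAX_PAGES_PER_WEEK:
        failures.append("pager alerts too frequent")
    if not review.monitoring_mature:
        failures.append("monitoring not mature")
    if not review.architecture_reviewed:
        failures.append("architecture not reviewed")
    if not review.release_process_repeatable:
        failures.append("release process not repeatable")
    if review.open_severe_defects > MAX_OPEN_SEVERE_DEFECTS:
        failures.append("too many open severe defects")
    if not review.production_hygiene_ok:
        failures.append("production hygiene issues")
    return failures
```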
59. Integrate Dev Into IT Operations
Integrate Dev into IT Operations escalation processes
Have Dev cross-train IT Operations staff
Have Dev improve the environment
64. Why
• Seeing End to End
• Sharing the Pain
• Operations Andon Cord
• Create a Common Language
• Educate Dev to Think Like Ops
• Flattening Knowledge Chain
• Create Patterns of Fault Tolerance
• Manage Technical Debt
65. Engagement Models for Embedding
• One Off
• Cross Functional Teams
• Mercenaries
• Specialized Teams
• NoOps
68. Institutionalize IT Operations Knowledge
• Building Reusable IT Operations
• Embedded Operations
• Design
• Architecture
• Controls
• Monitoring
• Deployment
69. Break Things Early And Often
“Do painful things more frequently, so you can
make it less painful… We don’t get pushback
from Dev, because they know it makes rollouts
smoother.”
-- Adrian Cockcroft, Architect, Netflix
80. When IT Fails: A Business Novel and
The DevOps Cookbook
Coming January 15, 2013 and Q1 2013
“The lessons in When IT Fails might just save your business if IT fails for
you. Every IT executive should share this book with their business
peers.” -James Turnbull, VP Operations, Puppet Labs and author of
“Pro Puppet”
“The greatest IT management book of our generation.” –Branden
Williams, CTO Marketing, RSA
“This book will have a profound effect on IT, just as The Goal did for
manufacturing.” - Jez Humble, co-author of the Jolt award-winning book
Continuous Delivery, and Principal at ThoughtWorks Studios.
81. Our Mission: Positively Impact The Lives
Of One Million IT Workers By 2017
For these slides, the “Top 10 Things You
Need To Know About DevOps,” Rugged
DevOps resources, and updates on the
books:
Or text “[email_address] 75271” to
+1 (858) 598-3980
Or signup at:
http://www.instantcustomer.com/go/75271
Or email genek@realgenekim.me
Editor’s Notes
Trust - Robustness -
In Area 1 we defined the scope of our system by creating an end-to-end process that reaches from requirements all the way to running services. Now in Area 2 we are going to focus on giving everyone feedback on and visibility into that system so we can improve upon it.
What’s the goal of this area? Well, it’s pretty obvious. The goal is to provide feedback and visibility.
But why? To what end? Is it just data for the sake of curiosity? Like all of DevOps, we have to look at it through the lens of “Why?”
The whole point of feedback and visibility is to align your organization’s improvement efforts. “Align” -- that’s the most interesting word in that phrase. Let’s think about where DevOps problems come from. I’m going to assume that your organization is full of smart people with good intentions (if not, you have bigger problems). If everyone is smart and wants the company to succeed, then why do DevOps problems exist? Because individuals and groups become misaligned to the point of becoming silos. Think about the classic example... Dev ends up seeing the world one way and takes dozens of daily actions according to that world view. Ops sees the world another way and takes a whole different series of actions during the day. Both are right from their own perspective. Both are wrong from the organization’s perspective. Misalignment ensues.
How do you align your organization? You must do two things. #1 is pretty straightforward and outside the scope of this presentation. #2 is shared situational awareness... (definition)
Allspaw isn’t going to like this photo... he’s a much more handsome devil in real life... but pay no attention to him... look at what’s on the wall behind him. Those screens are one example of how you create situational awareness: they are there to radiate it.
The first step towards shared situational awareness is to give everyone the same visibility into the 4 key types of data that fuel alignment... Application Data, Infrastructure Data, Business Data, and People and Process Data. Let’s quickly look at how to do that.
Step #1... Make all infrastructure data visible... these are your classic operations metrics: Network, Disk I/O, Memory, Utilization, etc. But don’t assume everyone is a hardcore sysadmin... so provide the metrics in an application context. Standardize collection and analysis across all shared environments... if the first time a developer sees feedback is in production, don’t expect them to be able to make heads or tails of it... collect the same data and present the same feedback in all preproduction environments. Focus on deviations... avoid burying people with data... use things like Statistical Process Control charts to point out deviations from the norm.
You probably already have a lot of this data as well... Performance, faults, availability, logs, etc. Focus on making this a shared effort between dev and ops... dev defines, ops enables, everybody can view. Focus on easy self-service... if adding a metric feels like a schema change, then people will avoid it. If it’s as simple as adding a single line of code, then people will do it. Get your org addicted to meaningful data... not just “all data”... teach everyone to keep the noise down and keep application and system output clean.
Business data... you probably also have a lot of this data... Sales, signups, churn, clickstream, etc. The problem is usually that this data is sitting in silos without operational context. Focus on linking technical and process metrics with KPIs set by the business (Amazon order rate... everything else is keyed off of that). Your goal is to have everyone in the organization understand the direct links between their day-to-day activity and the goals set by their executives. For example, I’m a developer and my decisions affect how long a handoff or promotion of a release to a new environment takes... which negatively impacts cycle time... which negatively impacts the business goal of shorter time to market for feature requests.
While you likely have lots of application, infrastructure, and business data... very few organizations have much data about the human activity that goes on inside their organization. If your goal is to improve your service delivery capabilities and solve your DevOps problems... it makes sense to have visibility into and metrics about those processes, right? Change activity, quality, cycle time, effectiveness, etc... capture it, store it, graph it. I don’t mean time tracking or individual productivity... I’m talking about the performance and effectiveness of the organization and its critical processes. Start with two straightforward things: 1. Visualize flow across the entire lifecycle -- tools like Kanban or a delivery pipeline visualization are easy wins and have powerful effects. They make the problems obvious to all, and you can swarm the troops to fix them. 2. Record change events and overlay them on every graph you have -- change is the root of outages (at a minimum, a change introduced the thing that caused the outage)... raise everyone’s awareness of what changed and when it changed.
Combine situational awareness with clear operating goals and you get a platform for driving continuous improvement. Self-organizing behavior... people know how to tell if they are doing the right thing. Consistent/predictable behavior... when aligned, I know what actions to expect from my colleagues, since we are working with the same operating rules while looking at the same data. You enable “swappable parts”... new people, or people new to a role, know what is expected, and their actions are guided by an understanding of what the current state is and which direction they are supposed to move it.
Buy big monitors and paint the walls with them... it’s cheap and has a psychological effect. Let’s look at what John has on the walls behind him... OK, you are now ready to kick your DevOps improvement program into high gear. On to Area 3...
What are we looking at in DevOps? Alignment... between dev and ops, but overall organizational alignment. How do we get alignment? One hack is embedding ops knowledge into dev... There are many examples that we can draw from lean, e.g., poka-yoke (error proofing), move the pain forward.
In the next 10 minutes I am going to focus on a specific hack that has been used by quite a few companies: embedding an Ops guy into a development organization... #I have joked that this is called the homeless mode... #Simply put, an organization moves an individual from the operations org to the dev org for some period of time. #A few days, more likely 3 months to a year... some even longer or permanent.
#Empathy: feel the pain in the tribe and do something about it; pair and be involved in things; influence and change the outcome through relationships. Pragmatic solutions: “Not logging enough? Logging too much?” Each can cause work. Get rid of the “you guys” syndrome... stop the blame game... It’s hard to yell at those “idiots” when one lives amongst you. #There’s a great story where a guy went into dev from ops, and three years later, when he came back to ops, they thought he was an embedded dev-to-ops guy because of turnover. #Systems thinking end to end... flattening the knowledge delivery chain, value chain as one system... #Tribal language: names of config files, conventions, standards, directory names and structures. Jargon: common terminology. Dev in London and Ops in NYC... “You guys have all sorts of crazy names for things, like calling soccer football.” #KATA: educating Dev so they can think like Ops... through repetition it becomes part of your routine... examples like feature flags, metrics collection within the application, measurement of resource usage, etc. Consider that the app will always have finite user resources: refactoring user registration shouldn’t take more CPU than it currently does. #Getting Dev used to patterns of fault tolerance (not evenly distributed amongst development teams). #We want everyone to think in a new way... no fear of failure...
#Help prioritize the backlog to manage technical debt (& non-functional requirements). Allocate 20% of Dev cycles to non-functional requirements (Product Management). #Andon cord... stop the line early... bring the pain forward... How many times does something have to wait until it gets into production before we find out it wasn’t operationally sound? Tell Sales no... never make an offer like that, because it wasn’t nearly as easy as we thought; here’s a better proposal; negotiate other terms.
#Two large teams (Dev and Ops): take a senior ops guy and put him/her in dev for 6 months to a year, with dotted-line authority (Silverpop, Dan Nemic). #When building a new project team (i.e., dev), put an ops guy on the team as a cross-functional team member. A lot of startups do this because they have no choice; however, National Instruments (Ernest Mueller) did this on a new “devops” project. #Mercenaries: build a team that can be used as a pool of resources when needed... a few days, 3 months, a year... These teams typically have a strong group of cross-functional experts (DRW Trading, Chris Read). #Specialized Teams: as it moves up the maturity levels, an organization might start having specialty pools of mercenaries (e.g., DBAs, security experts, ...). #I had to throw NoOps in here as a pattern: they run one big Dev org that “they say does not have operations,” when in fact they do have operations; it is just embedded in the dev org. Netflix runs like this and it works very well for them. However, there are some issues... culture and cost. Culture: this works at Netflix because they hire really well. Cost: this type of operation means a lot of commitment to automation.
Example: Subject Matter Experts. Example: Automation Specialist. Example: Big Picture. #In the mercenaries example you need plumbers. A team might be hurting in a specific area, and either the team realizes this itself or someone external to the team suggests it. They need an expert in Linux, network, or database: typically driven by a backlog or a skill-set deficit in dev (e.g., performance issues due to the database, but that’s all we know; or Linux discovery work). Sometimes an exec may say, “you’re dying in networking; get an ops person who knows what they’re doing”; ops does a review and recommends that they get an SRE. #One of the ways to tackle technical debt is to put an automation specialist into the dev organization, focused on automation for the business unit (e.g., “we’ve lost control of configurations and deployment, it’s all a mess, we have no idea what’s in production”, or automation in deployment). Don’t create work for Dev; work at the edges with minimal impact; create standardization models without impact. After it’s all done, explain to Dev what you’ve done. Responsibility for automation code... Slack capacity issues... Jumpstart... prioritize the backlog, 20% slack time to manage technical debt... pairing... touching the keyboard... make a dev work with the ops person, embedding knowledge where we need it... #Big picture person: talk to the management team, paint the vision and big picture, liaise between Ops, Dev, and business management (management boundary issue).
A local bloke... John is a poster child for Area 2. Chris is a poster child for Area 4... Jez Humble calls him his mentor... He wears the coolest shirts.
Explain chaos monkey ...compliance monkey...
#Don’t embed social misfits... The guys who just like to get things done on their own without anybody else’s help are usually not good candidates for embedding. Embedding is a social experiment to change organizational behavior. #Understand the motivation of an individual who wants to be embedded. Don’t look for hero motivation; look for sense-of-accomplishment motivation. This job can be somewhat thankless from an external viewpoint. Think of a lineman in American football... he gets his motivation from winning the game and knows he was a big part of it. Motivation comes not from being a hero but from accomplishment (he doesn’t need personal credit), vs. hero castle and worship. #It’s important to maintain the previous relationships with Ops: attending standups, external nights out, get-togethers. You don’t want to lose the tribal connections and thought process. Plus it can also mend bridges, as in “he ain’t that bad once you get to know him”... If they have to be reminded a year later, then the experiment didn’t work... At the end of the day you are trying to break down silos... Socialize, go out to the bars with them; they remain part of the original (virtual) team.