Blameless System Design
Douglas Land
Vast.com, Inc.
Hi, my name is Douglas Land. I'm the director of technical operations for a company in town called Vast.com. We do big data and analytics, and we're starting a foray into several consumer-facing products. I'm here today to present a concept called Blameless System Design.
Annotated: sample script in white boxes
I break systems… a LOT
Auth
Syslog
Chef
Ambassadors
Prod Frontends
I break things, A LOT. I've broken authentication across all our servers. I've broken syslog just by using it. I've created havoc via Chef runs across our whole infrastructure. I'm probably one of the worst offenders for breaking production on my team.
Sometimes I ‘break’ systems on purpose...
Service discovery by Chef
90% code in prod
No shared storage for CloudStack
Sometimes you just need to do things.
And sometimes I 'break' things on purpose. Sometimes you need to make trade-offs to meet your goals and objectives, and you don't have the time or resources to adhere to standards. Sometimes you simply need to get something done as soon as possible, regardless of consequences.
Higher standards
And yet, I still hold others to a higher standard...
Servers still on public internet???
Created a flat VLAN when we did move to private IPs???
No centralized management of virtualization infrastructure???
The only 'shared storage' is via DRBD and ha.d???
And yet I somehow still hold others to a higher standard than the one I tend to follow myself. Every time I start a new job and encounter a new environment, I look around at the choices that were made and the technical debt that's been generated, and I think, "What the heck is going on here?" "What are these guys thinking?"
Technical debtor’s prison
We’re obsessed with technical debt
Qualifying it:
Application Debt
Infrastructure Debt
Architecture Debt
Quantifying it:
size of code base
code coverage
coupling and cohesion reports
cyclomatic complexity
Halstead complexity measures
I think we're a little obsessed with technical debt. We spend a lot of time trying to qualify it and quantify it. We try to break it
down, measure it, and figure out what the actual cost is and how to improve our software, systems and infrastructure to
compensate for it.
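To make the "quantifying it" list less abstract, here is a minimal Python sketch of what two of those metrics actually compute: McCabe's cyclomatic complexity, M = E − N + 2P over a control-flow graph with E edges, N nodes and P connected components, and the basic Halstead measures. The function names are illustrative, not taken from any particular tool, and the sketch assumes the graph sizes and the operator/operand counts have already been extracted. Real analyzers differ in how they classify tokens as operators or operands, so their numbers will not necessarily match this.

```python
import math
from dataclasses import dataclass


def cyclomatic_complexity(edges: int, nodes: int, components: int = 1) -> int:
    """McCabe's cyclomatic complexity of a control-flow graph: M = E - N + 2P."""
    return edges - nodes + 2 * components


@dataclass
class Halstead:
    vocabulary: int    # n = n1 + n2
    length: int        # N = N1 + N2
    volume: float      # V = N * log2(n)
    difficulty: float  # D = (n1 / 2) * (N2 / n2)
    effort: float      # E = D * V


def halstead(n1: int, n2: int, N1: int, N2: int) -> Halstead:
    """Basic Halstead measures from pre-counted tokens.

    n1, n2: number of distinct operators / distinct operands
    N1, N2: total occurrences of operators / operands
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("need at least one distinct operator and operand")
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    return Halstead(vocabulary, length, volume, difficulty, difficulty * volume)


# A function with a single if/else: nodes = condition, then, else, exit (4);
# edges = cond->then, cond->else, then->exit, else->exit (4); so M = 2.
print(cyclomatic_complexity(edges=4, nodes=4))
print(halstead(n1=4, n2=4, N1=8, N2=8))
```

Both numbers are proxies: a high M says there are many independent paths to test, and a high Halstead effort says the code is dense, but neither says whether a given shortcut was an intentional trade-off.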
The myth of technical debt
Peter Norvig, “All code is liability”
Not actually technical debt:
● Maintenance
● Changes in understanding
● Operational inertia
● Poor code choices
● Dependency liabilities
In the process we end up including many things under that umbrella that have nothing to do with technical debt at all. Every platform or service is going to cease to be useful if we don't take the time to maintain it and understand how it has evolved and changed.
So what is technical debt?
Technical debt is the choices we intentionally make to speed up the development
or implementation of systems, and which we acknowledge will need to be
changed later.
Technical debt is the result of an Efficiency-Thoroughness Trade-Off at an
individual level.
Technical debt is the output of a project constraint model at an organizational
level.
So what is technical debt? I'd qualify it as something intentional, something we acknowledge we'll need to change later. At an individual level it's the result of an Efficiency-Thoroughness Trade-Off. At a business level it's the result of constraints like cost and speed.
The blame game
Shouldn't we stop blaming people for making the trade-offs they're forced to
make?
So if we acknowledge that we all need to make trade-offs, whether in the name of personal efficiency, cost savings, or time, I think we can also acknowledge that none of us wants to make those trade-offs; they're artifacts of the environment we work in. We shouldn't be blamed for them.
Being Blameless
● If we remove fear, we will have a more honest conversation about trade-offs
● If we're honest about those trade-offs, a crisis might be averted altogether
● If we understand our history, we won't be destined to repeat it
Being 'blameless' has, in fact, proven beneficial to business. If you're not afraid of retribution, you're more likely to be honest. The more honest you are, the more everyone can learn about all kinds of situations, and the more we learn, the more opportunity we have to improve.
What is blameless system design?
Assuming goodwill
Blameless post-mortems
Empathy
Experimentation
Honesty
Communication
So what is blameless system design? It's basically trying to look at things through others' eyes, and to give everyone as
much context as possible about any decisions being made. Since we in the tech community like acronyms, I also tried to
make a handy one. So Blameless System Design is A-BEECH.
Assume Goodwill
Your co-worker probably doesn’t come into work every day with
the intent of harming you or the organization.
*Most* people aren’t trying to cause issues. It's important to remember that everyone is generally trying to do the best job they can, and to start decisions and discussions from that perspective. When someone makes a mistake, it usually comes from a place of misunderstanding, not malice.
Blameless Post-mortems
“We must strive to understand that accidents don’t
happen because people gamble and lose.
Accidents happen because the person believes that:
…what is about to happen is not possible,
…or what is about to happen has no connection to
what they are doing,
…or that the possibility of getting the intended
outcome is well worth whatever risk there is.”
- Erik Hollnagel
While blameless system design isn't error-focused, it's important to have a framework in place for when there are issues. Blameless retrospectives remove fear from the process and encourage people to improve the system instead of seeking retribution, which is important for a high-functioning team.
Empathy
● Reject ‘contempt culture’
● Focus on the positive
● Consider others’ perspectives
You might be sitting next to the person who had to make the tough call you’re critiquing. Someday, that person might be you. Rather than jumping to judgements, it's important to try to understand how someone might have arrived at their narrative and how that narrative might have shaped the decisions they made.
Experimentation
The Engineering Design Process
Define the Problem
Do Background Research
Specify Requirements
Brainstorm Solutions
Choose the Best Solution
Do Development Work
Build a Prototype
Test and Redesign
No system lives in isolation, and complex system interactions can cause some very unexpected behavior. Without experiments, we have no way to test our assumptions about those interactions. This is why it's so important to measure and record everything. Design your experiments; don’t be a victim of them.
Honesty
● Publish ALL your results
● Document ALL your decisions
● Be honest about trade-offs
● Track mitigations
Publish all your experiments and results whether they met your expectations or not. Document your decisions somewhere
so future reviewers will understand them. Be explicit in the docs about issues you came across and how you addressed
them. Be honest about trade-offs.
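One lightweight way to document decisions, be explicit about trade-offs, and track mitigations is to keep a structured decision log next to the code. The sketch below is a hypothetical Python data structure, not a tool from this talk: the `Decision` and `DecisionLog` names and all of their fields are assumptions chosen to mirror this slide (context, the trade-off accepted, mitigations, and whether the debt has since been paid down).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Decision:
    """One entry in a hypothetical decision log (field names are illustrative)."""
    title: str
    decided_on: date
    context: str                 # what we knew and what constrained us at the time
    trade_off: str               # what we knowingly gave up: the debt taken on
    mitigations: List[str] = field(default_factory=list)
    resolved: bool = False       # set to True once the trade-off has been revisited


@dataclass
class DecisionLog:
    entries: List[Decision] = field(default_factory=list)

    def record(self, decision: Decision) -> None:
        """Append a decision; entries are never deleted, so history is preserved."""
        self.entries.append(decision)

    def open_debt(self) -> List[Decision]:
        """Decisions whose trade-offs have not been resolved yet."""
        return [d for d in self.entries if not d.resolved]

    def unmitigated(self) -> List[Decision]:
        """Open decisions with no tracked mitigation at all."""
        return [d for d in self.open_debt() if not d.mitigations]
```

Reviewing `unmitigated()` in a retrospective surfaces trade-offs nobody has a plan for yet, which matches the talk's definition of technical debt as an intentional choice we acknowledge will need to be changed later. Whether this lives in code, in a wiki, or in plain documents matters less than keeping it public and easy to find.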
Communication
● Broadcast expectations
● Honor achievements
● Make docs easy to find
● Open discussions
● Well-defined feedback channels
Broadcast cultural expectations throughout the organization, repeatedly if needed. Open up meetings and discussions to anyone who wants to participate; they just might provide unexpected insight. Clearly define both positive and negative feedback channels so everyone knows how to provide input.
Did someone say devops?
● Culture
● Measurement
● Sharing
● Feedback loops
If some of this sounds familiar, it's because it is. Blameless system design includes many of the attributes of devops in general. A huge part of devops is culture, and hopefully some of this is actionable for people trying to address that inside their organization.
The bad
It’s hard to change culture and get away from a retribution culture and the RCA mentality.
It’s hard to get over hindsight bias.
It’s a lot of work to encourage openness and honesty, and to define what that looks like.
It’s hard to get past impostor syndrome and/or contempt cultures.
It's hard to change an organization's culture. It's effectively asking an organization to accept risk: the risk of the unknown. Depending on the organization, that can be a little like steering the Titanic. You really need to co-opt your boss and have them co-opt their boss; it's turtles all the way up.
The good
● Remove fear
● Encourage ‘risk’
● Create feedback
● Reduce redundant learning
● Improve working environment, trust
But if you can pull it off, you remove fear as an obstacle to innovation, encourage people to take risks (which could lead to differentiation as a business), create better feedback loops, improve data flow, and build more trust at every level of your organization. I think you'll find it well worth the effort.
Douglas Land - Director of operations, Vast.com, Inc.
doug@webuilddevops.com | @webuilddevops
Some References:
http://www.datical.com/blog/technical-debt-devops/
http://laughingmeme.org/2016/01/10/towards-an-understanding-of-technical-debt/
http://blog.aurynn.com/86/contempt-culture
http://erikhollnagel.com/ideas/etto-principle/index.html
http://indecorous.com/fallible_humans/
https://hbr.org/2003/05/it-doesnt-matter/ar/pr
https://codeascraft.com/2014/07/18/just-culture-resources/
http://sidneydekker.com/just-culture/
I'd love to say we're at the end of the journey to blameless system design, but like many things, I suspect this is not a destination, and we're still a work in progress. But thanks to everyone who has contributed to the work I've cited, we're making progress day by day. Thank you.

Weitere ähnliche Inhalte

Was ist angesagt?

Intersection18: When a Framework Meets a Roadmap, New Vistas Open - Curtis Mi...
Intersection18: When a Framework Meets a Roadmap, New Vistas Open - Curtis Mi...Intersection18: When a Framework Meets a Roadmap, New Vistas Open - Curtis Mi...
Intersection18: When a Framework Meets a Roadmap, New Vistas Open - Curtis Mi...Intersection Conference
 
Why usability problems go unfixed - UX Bristol 2012
Why usability problems go unfixed - UX Bristol 2012Why usability problems go unfixed - UX Bristol 2012
Why usability problems go unfixed - UX Bristol 2012Francis Rowland
 
Rethinking enterprise software - Codemotion 2014
Rethinking enterprise software - Codemotion 2014Rethinking enterprise software - Codemotion 2014
Rethinking enterprise software - Codemotion 2014Alberto Brandolini
 
Empowering Agile Self-Organized Teams With Design Thinking
Empowering Agile Self-Organized Teams With Design ThinkingEmpowering Agile Self-Organized Teams With Design Thinking
Empowering Agile Self-Organized Teams With Design ThinkingWilliam Evans
 
L'illusione dell'ortogonalità
L'illusione dell'ortogonalitàL'illusione dell'ortogonalità
L'illusione dell'ortogonalitàAlberto Brandolini
 
Design Thinking for Aviation Safety by Dr. Benjamin Goodheart
Design Thinking for Aviation Safety by Dr. Benjamin GoodheartDesign Thinking for Aviation Safety by Dr. Benjamin Goodheart
Design Thinking for Aviation Safety by Dr. Benjamin GoodheartRodrigo Narcizo
 
#CSOAUS: Innovation - for a brighter future at News Corp Australia
#CSOAUS: Innovation - for a brighter future at News Corp Australia#CSOAUS: Innovation - for a brighter future at News Corp Australia
#CSOAUS: Innovation - for a brighter future at News Corp AustraliaMark Drasutis
 
Lessons in Rapid Experiments and Learning From Failure
Lessons in Rapid Experiments and Learning From FailureLessons in Rapid Experiments and Learning From Failure
Lessons in Rapid Experiments and Learning From FailurePaul Taylor
 
12 Trends Influencing the Future of How We Work
12 Trends Influencing the Future of How We Work12 Trends Influencing the Future of How We Work
12 Trends Influencing the Future of How We WorkPaul Taylor
 
Resilient Design Management
Resilient Design ManagementResilient Design Management
Resilient Design ManagementChris Avore
 
Kerry.mushkin
Kerry.mushkinKerry.mushkin
Kerry.mushkinNASAPMC
 
Why projects fail
Why projects failWhy projects fail
Why projects failPonto GP
 
Fast Track Innovation
Fast Track Innovation Fast Track Innovation
Fast Track Innovation Bromford Lab
 
Lastconf2017 Synchronous communication is overrated!
Lastconf2017   Synchronous communication is overrated!Lastconf2017   Synchronous communication is overrated!
Lastconf2017 Synchronous communication is overrated!Kelsey van Haaster
 
Growing Agility ebook - Nokia - #SmarterEveryday
Growing Agility ebook - Nokia - #SmarterEverydayGrowing Agility ebook - Nokia - #SmarterEveryday
Growing Agility ebook - Nokia - #SmarterEverydayNokia
 
Making sense of engagement
Making sense of engagementMaking sense of engagement
Making sense of engagementcontentli
 

Was ist angesagt? (20)

Intersection18: When a Framework Meets a Roadmap, New Vistas Open - Curtis Mi...
Intersection18: When a Framework Meets a Roadmap, New Vistas Open - Curtis Mi...Intersection18: When a Framework Meets a Roadmap, New Vistas Open - Curtis Mi...
Intersection18: When a Framework Meets a Roadmap, New Vistas Open - Curtis Mi...
 
Why usability problems go unfixed - UX Bristol 2012
Why usability problems go unfixed - UX Bristol 2012Why usability problems go unfixed - UX Bristol 2012
Why usability problems go unfixed - UX Bristol 2012
 
Rethinking enterprise software - Codemotion 2014
Rethinking enterprise software - Codemotion 2014Rethinking enterprise software - Codemotion 2014
Rethinking enterprise software - Codemotion 2014
 
Empowering Agile Self-Organized Teams With Design Thinking
Empowering Agile Self-Organized Teams With Design ThinkingEmpowering Agile Self-Organized Teams With Design Thinking
Empowering Agile Self-Organized Teams With Design Thinking
 
The sweet spot
The sweet spotThe sweet spot
The sweet spot
 
L'illusione dell'ortogonalità
L'illusione dell'ortogonalitàL'illusione dell'ortogonalità
L'illusione dell'ortogonalità
 
Design Thinking for Aviation Safety by Dr. Benjamin Goodheart
Design Thinking for Aviation Safety by Dr. Benjamin GoodheartDesign Thinking for Aviation Safety by Dr. Benjamin Goodheart
Design Thinking for Aviation Safety by Dr. Benjamin Goodheart
 
#CSOAUS: Innovation - for a brighter future at News Corp Australia
#CSOAUS: Innovation - for a brighter future at News Corp Australia#CSOAUS: Innovation - for a brighter future at News Corp Australia
#CSOAUS: Innovation - for a brighter future at News Corp Australia
 
G skills
G skillsG skills
G skills
 
Lessons in Rapid Experiments and Learning From Failure
Lessons in Rapid Experiments and Learning From FailureLessons in Rapid Experiments and Learning From Failure
Lessons in Rapid Experiments and Learning From Failure
 
12 Trends Influencing the Future of How We Work
12 Trends Influencing the Future of How We Work12 Trends Influencing the Future of How We Work
12 Trends Influencing the Future of How We Work
 
Resilient Design Management
Resilient Design ManagementResilient Design Management
Resilient Design Management
 
Kerry.mushkin
Kerry.mushkinKerry.mushkin
Kerry.mushkin
 
Why projects fail
Why projects failWhy projects fail
Why projects fail
 
Scrum x version 2
Scrum x version 2 Scrum x version 2
Scrum x version 2
 
Fast Track Innovation
Fast Track Innovation Fast Track Innovation
Fast Track Innovation
 
Lastconf2017 Synchronous communication is overrated!
Lastconf2017   Synchronous communication is overrated!Lastconf2017   Synchronous communication is overrated!
Lastconf2017 Synchronous communication is overrated!
 
Growing Agility ebook - Nokia - #SmarterEveryday
Growing Agility ebook - Nokia - #SmarterEverydayGrowing Agility ebook - Nokia - #SmarterEveryday
Growing Agility ebook - Nokia - #SmarterEveryday
 
The Lean Hardware Toolbox
The Lean Hardware ToolboxThe Lean Hardware Toolbox
The Lean Hardware Toolbox
 
Making sense of engagement
Making sense of engagementMaking sense of engagement
Making sense of engagement
 

Ähnlich wie Blameless system design - annotated

People are more complex than computers - Mairead O'Connor Equal Experts
People are more complex than computers - Mairead O'Connor Equal ExpertsPeople are more complex than computers - Mairead O'Connor Equal Experts
People are more complex than computers - Mairead O'Connor Equal ExpertsMairead O'Connor
 
Design Thinking talk
Design Thinking talkDesign Thinking talk
Design Thinking talkGlyn Britton
 
It's Okay to be Wrong (Accelerator Academy Oct '17)
It's Okay to be Wrong (Accelerator Academy Oct '17)It's Okay to be Wrong (Accelerator Academy Oct '17)
It's Okay to be Wrong (Accelerator Academy Oct '17)Matt Mower
 
From agile projects to agile organizations
From agile projects to agile organizations From agile projects to agile organizations
From agile projects to agile organizations maggie2morgan
 
Narrated Version Dallas MPUG
Narrated Version Dallas MPUGNarrated Version Dallas MPUG
Narrated Version Dallas MPUGGlen Alleman
 
How to Not Destroy the World - the Ethics of Web Design
How to Not Destroy the World - the Ethics of Web DesignHow to Not Destroy the World - the Ethics of Web Design
How to Not Destroy the World - the Ethics of Web DesignMorten Rand-Hendriksen
 
Agile Development Overview (with a bit about builds)
Agile Development Overview (with a bit about builds)Agile Development Overview (with a bit about builds)
Agile Development Overview (with a bit about builds)David Benjamin
 
Successful Data Center Transformation Must Include Proper Handling of Data Ce...
Successful Data Center Transformation Must Include Proper Handling of Data Ce...Successful Data Center Transformation Must Include Proper Handling of Data Ce...
Successful Data Center Transformation Must Include Proper Handling of Data Ce...Dana Gardner
 
Essay On Falcon Bird In Hindi. Online assignment writing service.
Essay On Falcon Bird In Hindi. Online assignment writing service.Essay On Falcon Bird In Hindi. Online assignment writing service.
Essay On Falcon Bird In Hindi. Online assignment writing service.Bridget Dodson
 
Architecting a Post Mortem - Velocity 2018 San Jose Tutorial
Architecting a Post Mortem - Velocity 2018 San Jose TutorialArchitecting a Post Mortem - Velocity 2018 San Jose Tutorial
Architecting a Post Mortem - Velocity 2018 San Jose TutorialWill Gallego
 
For Good or for Worse Making happy client relationships
For Good or for Worse Making happy client relationshipsFor Good or for Worse Making happy client relationships
For Good or for Worse Making happy client relationshipsImre Gmelig Meijling
 
Managing your tech career
Managing your tech careerManaging your tech career
Managing your tech careerGreg Jensen
 
Including the User: How insights drive business #pswud2017
Including the User: How insights drive business #pswud2017Including the User: How insights drive business #pswud2017
Including the User: How insights drive business #pswud2017Jeremy Johnson
 
Cycles: The simplest, proven way to build your business
Cycles: The simplest, proven way to build your businessCycles: The simplest, proven way to build your business
Cycles: The simplest, proven way to build your businessBryan Cassady
 
Leeroy driven development
Leeroy driven developmentLeeroy driven development
Leeroy driven developmentJohn Nicholas
 
Managing Uncertainty - 2011
Managing Uncertainty - 2011Managing Uncertainty - 2011
Managing Uncertainty - 2011RiskShare
 
Equation for design by committee
Equation for design by committeeEquation for design by committee
Equation for design by committeeQ7 Associates
 
How More Industries Can Cultivate A Culture of Operational Resilience
How More Industries Can Cultivate A Culture of Operational ResilienceHow More Industries Can Cultivate A Culture of Operational Resilience
How More Industries Can Cultivate A Culture of Operational ResilienceDana Gardner
 

Ähnlich wie Blameless system design - annotated (20)

People are more complex than computers - Mairead O'Connor Equal Experts
People are more complex than computers - Mairead O'Connor Equal ExpertsPeople are more complex than computers - Mairead O'Connor Equal Experts
People are more complex than computers - Mairead O'Connor Equal Experts
 
Design Thinking talk
Design Thinking talkDesign Thinking talk
Design Thinking talk
 
50.000 orange stickies later
50.000 orange stickies later50.000 orange stickies later
50.000 orange stickies later
 
It's Okay to be Wrong (Accelerator Academy Oct '17)
It's Okay to be Wrong (Accelerator Academy Oct '17)It's Okay to be Wrong (Accelerator Academy Oct '17)
It's Okay to be Wrong (Accelerator Academy Oct '17)
 
From agile projects to agile organizations
From agile projects to agile organizations From agile projects to agile organizations
From agile projects to agile organizations
 
Narrated Version Dallas MPUG
Narrated Version Dallas MPUGNarrated Version Dallas MPUG
Narrated Version Dallas MPUG
 
How to Not Destroy the World - the Ethics of Web Design
How to Not Destroy the World - the Ethics of Web DesignHow to Not Destroy the World - the Ethics of Web Design
How to Not Destroy the World - the Ethics of Web Design
 
Agile Development Overview (with a bit about builds)
Agile Development Overview (with a bit about builds)Agile Development Overview (with a bit about builds)
Agile Development Overview (with a bit about builds)
 
Successful Data Center Transformation Must Include Proper Handling of Data Ce...
Successful Data Center Transformation Must Include Proper Handling of Data Ce...Successful Data Center Transformation Must Include Proper Handling of Data Ce...
Successful Data Center Transformation Must Include Proper Handling of Data Ce...
 
Essay On Falcon Bird In Hindi. Online assignment writing service.
Essay On Falcon Bird In Hindi. Online assignment writing service.Essay On Falcon Bird In Hindi. Online assignment writing service.
Essay On Falcon Bird In Hindi. Online assignment writing service.
 
Architecting a Post Mortem - Velocity 2018 San Jose Tutorial
Architecting a Post Mortem - Velocity 2018 San Jose TutorialArchitecting a Post Mortem - Velocity 2018 San Jose Tutorial
Architecting a Post Mortem - Velocity 2018 San Jose Tutorial
 
For Good or for Worse Making happy client relationships
For Good or for Worse Making happy client relationshipsFor Good or for Worse Making happy client relationships
For Good or for Worse Making happy client relationships
 
Managing your tech career
Managing your tech careerManaging your tech career
Managing your tech career
 
Including the User: How insights drive business #pswud2017
Including the User: How insights drive business #pswud2017Including the User: How insights drive business #pswud2017
Including the User: How insights drive business #pswud2017
 
Cycles: The simplest, proven way to build your business
Cycles: The simplest, proven way to build your businessCycles: The simplest, proven way to build your business
Cycles: The simplest, proven way to build your business
 
Leeroy driven development
Leeroy driven developmentLeeroy driven development
Leeroy driven development
 
Managing Uncertainty - 2011
Managing Uncertainty - 2011Managing Uncertainty - 2011
Managing Uncertainty - 2011
 
Protect-Biz for non-profits
Protect-Biz for non-profitsProtect-Biz for non-profits
Protect-Biz for non-profits
 
Equation for design by committee
Equation for design by committeeEquation for design by committee
Equation for design by committee
 
How More Industries Can Cultivate A Culture of Operational Resilience
How More Industries Can Cultivate A Culture of Operational ResilienceHow More Industries Can Cultivate A Culture of Operational Resilience
How More Industries Can Cultivate A Culture of Operational Resilience
 

Kürzlich hochgeladen

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...

Blameless system design - annotated

  • 1. Blameless System Design
    Douglas Land, Vast.com, Inc.
    Annotated: sample script in white boxes
    Hi, my name is Douglas Land. I'm the director of technical operations for a company in town called Vast.com. We do big data and analytics, and we're starting a foray into several consumer-facing products. I'm here today to present a concept called Blameless System Design.
  • 2. I break systems… a LOT
    Auth, Syslog, Chef, Ambassadors, Prod Frontends
    I break things, A LOT. I've broken authentication across all our servers. I've broken syslog... just by using it. I've created havoc via Chef runs across our whole infrastructure. I'm probably one of the worst offenders on my team when it comes to breaking production.
  • 3. Sometimes I ‘break’ systems on purpose...
    Service discovery by Chef; 90% code in prod; no shared storage for CloudStack. Sometimes you just need to do things.
    And sometimes I 'break' things on purpose. Sometimes you need to make trade-offs to meet your goals and objectives, and you don't have the time or resources to adhere to standards. Sometimes you simply need to get something done as soon as possible, regardless of consequences.
  • 4. Higher standards
    And yet, I still hold others to a higher standard...
    Servers still on the public internet??? Created a flat VLAN when we did move to private IPs??? No centralized management of virtualization infrastructure??? The only 'shared storage' is via DRBD and ha.d???
    And yet I somehow still hold others to a higher standard than the one I tend to follow myself. Every time I start a new job and encounter a new environment, I look around at the choices that were made and the technical debt that's been generated, and I think, "What the heck is going on here? What are these guys thinking?"
  • 5. Technical debtor’s prison
    We’re obsessed with technical debt.
    Qualifying it: Application Debt, Infrastructure Debt, Architecture Debt.
    Quantifying it: size of code base, code coverage, coupling and cohesion reports, cyclomatic complexity, Halstead complexity measures.
    I think we're a little obsessed with technical debt. We spend a lot of time trying to qualify it and quantify it. We try to break it down, measure it, and figure out what the actual cost is and how to improve our software, systems, and infrastructure to compensate for it.
  • 6. The myth of technical debt
    Peter Norvig: “All code is liability.”
    Not actually technical debt:
    ● Maintenance
    ● Changes in understanding
    ● Operational inertia
    ● Poor code choices
    ● Dependency liabilities
    In the process, we end up including many things under that umbrella that have nothing to do with technical debt at all. Every platform or service will cease to be useful if we don't take the time to maintain it and understand how it has evolved and changed.
  • 7. So what is technical debt?
    Technical debt is the choices we intentionally make to speed up the development or implementation of systems, and which we acknowledge will need to be changed later. Technical debt is the result of an Efficiency-Thoroughness Trade-Off at an individual level. Technical debt is the output of a project constraint model at an organizational level.
    So what is technical debt? I'd qualify it as something intentional, something we acknowledge we'll need to change later. At an individual level, it's the result of an Efficiency-Thoroughness Trade-Off. At a business level, it's the result of constraints like cost and speed.
  • 8. The blame game
    Shouldn't we stop blaming people for making the trade-offs they're forced to make?
    So if we acknowledge that we all need to make trade-offs, whether in the name of personal efficiency, cost savings, or time, I think we can also acknowledge that none of us want to make those trade-offs; they're artifacts of the environment we work in. We shouldn't be blamed for them.
  • 9. Being Blameless
    ● If we remove fear, we will have a more honest conversation about trade-offs
    ● If we're honest about those trade-offs, a crisis might be averted altogether
    ● If we understand our history, we won't be destined to repeat it
    Being 'blameless' has, in fact, proven to be beneficial to business. If you're not afraid of retribution, you're more likely to be honest. The more honest you are, the more everyone can learn about all kinds of situations, and the more we learn, the more opportunity we have to improve.
  • 10. What is blameless system design?
    Assuming goodwill, Blameless post-mortems, Empathy, Experimentation, Honesty, Communication
    So what is blameless system design? It's basically trying to look at things through others' eyes and to give everyone as much context as possible about any decisions being made. Since we in the tech community like acronyms, I also tried to make a handy one: Blameless System Design is A-BEECH.
  • 11. Assume Goodwill
    Your co-worker probably doesn’t come into work every day with the intent of harming you or the organization. *Most* people aren’t trying to cause issues...
    It's important to remember that everyone is generally trying to do the best job they can, and to start decisions and discussions from that perspective. If someone makes a mistake, it comes from a place of misunderstanding, not malice.
  • 12. Blameless Post-mortems
    “We must strive to understand that accidents don’t happen because people gamble and lose. Accidents happen because the person believes that: …what is about to happen is not possible, …or what is about to happen has no connection to what they are doing, …or that the possibility of getting the intended outcome is well worth whatever risk there is.” - Erik Hollnagel
    While blameless system design isn't error-focused, it's important to have a framework in place for when there are issues. Blameless retrospectives remove fear from the process and encourage people to improve the system instead of seeking retribution, which is important for a high-functioning team.
  • 13. Empathy
    ● Reject ‘contempt culture’
    ● Focus on the positive
    ● Consider others’ perspectives
    You might be sitting next to the person who had to make the tough call you’re critiquing. Someday, that person might be you.
    Rather than jumping to judgments, it's important to try to understand how someone might have arrived at their narrative and how that might have shaped the decisions they made.
  • 14. Experimentation
    The Engineering Design Process: Define the Problem, Do Background Research, Specify Requirements, Brainstorm Solutions, Choose the Best Solution, Do Development Work, Build a Prototype, Test and Redesign
    No system lives in isolation, and complex system interactions can cause some very unexpected behavior. Without experiments, we have no way to qualify our assumptions about those interactions. This is why it's so important to measure and record everything. Design your experiments; don’t be a victim of them.
  • 15. Honesty
    ● Publish ALL your results
    ● Document ALL your decisions
    ● Be honest about trade-offs
    ● Track mitigations
    Publish all your experiments and results, whether they met your expectations or not. Document your decisions somewhere so future reviewers will understand them. Be explicit in the docs about issues you came across and how you addressed them. Be honest about trade-offs.
  • 16. Communication
    ● Broadcast expectations
    ● Honor achievements
    ● Make docs easy to find
    ● Open discussions
    ● Well-defined feedback channels
    Broadcast cultural expectations throughout the organization, repeatedly if needed. Open up meetings and discussions to anyone who wants to participate; they just might provide unexpected insight. Clearly define both positive and negative feedback channels so everyone knows how to provide input.
  • 17. Did someone say devops?
    ● Culture
    ● Measurement
    ● Sharing
    ● Feedback loops
    If some of this sounds familiar, it's because it is. Blameless system design includes many of the attributes of devops in general. A huge part of devops is culture, and hopefully some of this will be actionable for people trying to address culture inside their organizations.
  • 18. The bad
    It’s hard to change culture and get away from a retribution culture and the RCA mentality. It’s hard to get over hindsight bias. It’s a lot of work to encourage openness and honesty, and to define what that looks like. It’s hard for people to get over their impostor syndrome and/or contempt cultures.
    It's hard to change an organization's culture. It's effectively asking an organization to accept risk: the risk of the unknown. And depending on the organization, that can be a little like steering the Titanic. You really need to co-opt your boss and have him co-opt his boss; it's turtles all the way up.
  • 19. The good
    ● Remove fear
    ● Encourage ‘risk’
    ● Create feedback
    ● Reduce redundant learning
    ● Improve working environment, trust
    But if you can pull it off, removing fear as an obstacle to innovation, encouraging people to take risks (which could lead to differentiation as a business), creating better feedback loops, improving data flow, and building more trust at every level of your organization, I think you'll find it well worth the effort.
  • 20. Douglas Land, Director of Operations, Vast.com, Inc.
    doug@webuilddevops.com | @webuilddevops
    Some references:
    http://www.datical.com/blog/technical-debt-devops/
    http://laughingmeme.org/2016/01/10/towards-an-understanding-of-technical-debt/
    http://blog.aurynn.com/86/contempt-culture
    http://erikhollnagel.com/ideas/etto-principle/index.html
    http://indecorous.com/fallible_humans/
    https://hbr.org/2003/05/it-doesnt-matter/ar/pr
    https://codeascraft.com/2014/07/18/just-culture-resources/
    http://sidneydekker.com/just-culture/
    I'd love to say we're at the end of the journey to blameless system design, but like many things, I suspect this is not a destination, and we're still a work in progress. But thanks to everyone who has contributed to the work I've cited, we're making progress day by day. Thank you.
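Slide 5 lists Halstead complexity measures as one of the ways we try to quantify technical debt. The Python sketch below is a rough illustration of how those numbers come from operator and operand counts: vocabulary n = n1 + n2, length N = N1 + N2, volume V = N log2 n, difficulty D = (n1/2)(N2/n2), and effort E = D * V. The operator/operand split is an assumption made for illustration. It counts Python operator tokens and keywords as operators, and identifiers, numbers, strings, and `True`/`False`/`None` as operands. Real tools draw this line differently, so the numbers are only comparable within a single counting convention.

```python
import io
import keyword
import math
import tokenize
from collections import Counter

# Assumption: these keywords behave like values, so count them as operands.
_CONSTANTS = {"True", "False", "None"}


def halstead(source: str) -> dict:
    """Classic Halstead measures for a snippet of Python source (illustrative convention)."""
    operators: Counter = Counter()
    operands: Counter = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        is_keyword = tok.type == tokenize.NAME and keyword.iskeyword(tok.string)
        if tok.type == tokenize.OP or (is_keyword and tok.string not in _CONSTANTS):
            operators[tok.string] += 1
        elif tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
            operands[tok.string] += 1
        # NEWLINE, INDENT, COMMENT, ENDMARKER, etc. are ignored.

    n1, n2 = len(operators), len(operands)                    # distinct operators / operands
    N1, N2 = sum(operators.values()), sum(operands.values())  # total occurrences
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary) if vocabulary > 1 else 0.0
    difficulty = (n1 / 2) * (N2 / n2) if n2 else 0.0
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": difficulty * volume,
    }


if __name__ == "__main__":
    # Operators: '=', '+'; operands: x, a, b  ->  vocabulary 5, length 5
    print(halstead("x = a + b\n"))
```

A metric like this shows how large or dense a piece of code is. It cannot show whether the code was an intentional trade-off, and that distinction is exactly what the talk argues defines technical debt.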

Editor's Notes

  1. Checklist: name, title, company, about the talk
  2. Intro: name, occupation. Broke ALL of auth. Broke syslog by... using it. Broke all Chef runs innumerable times. Broke the FE by turning back up some old nodes that weren't properly decommissioned. Broke our ambassador setup with some bad template logic. https://i.ytimg.com/vi/GTkcjjt2TBY/maxresdefault.jpg
  3. I ship 90% code, which sometimes makes it into production. I hide a LOT of things behind config management that shouldn't be handled at that level. I decided to deploy our private cloud with no shared storage. I decided to attack service discovery with Chef rather than making devs register applications. Sometimes we make decisions we know are mistakes in the name of moving forward. http://paragondsi.com/wp-content/uploads/2015/06/office-space.jpg
  4. What were people thinking??? Why are they leaving all this technical debt behind??
  5. We're all constantly talking about technical debt and trying to quantify it. Application Debt: debt that resides in the software package. Infrastructure Debt: debt that resides in the operating environments. Architecture Debt: debt that resides in the design of the entire system. Measuring technical debt: size of code base, code coverage, coupling and cohesion reports, cyclomatic complexity, Halstead complexity measures. https://upload.wikimedia.org/wikipedia/commons/thumb/c/c7/William_Hogarth_018.jpg/1239px-William_Hogarth_018.jpg
  6. Rather, there is ONLY technical debt. Kellan Elliott-McCrea, former CTO of Etsy, in "Towards an understanding of technical debt": "Technical debt is the choices we made in our code, intentionally, to speed up development today, knowing we’d have to change them later." Things ascribed to technical debt are just facets of creating software: maintenance, change in understanding. Instead of treating it like an exception, we should just embrace it. http://cattype.deviantart.com/art/Tsunami-Relief-Fund-216541678
  7. No one *wants* not to do their job well. We’ve all had to make trade-offs to balance priorities. Fast, cheap, good: the only people who can beat the good/fast/cheap triangle can't even be running a business. As Erik Hollnagel stated, "The ETTO [Efficiency-Thoroughness Trade-Off] fallacy is that people are required to be both efficient and thorough at the same time – or rather to be thorough when with hindsight it was wrong to be efficient!" The more complex a system, the higher the likelihood of failure. Shouldn't we stop blaming people for making the trade-offs they're forced to make? https://www.flickr.com/photos/cafuego/12575046354
  8. Etsy has done a great job bringing 'just culture' to postmortems, but that can be expanded beyond the scope of issues. There are trade-offs in EVERY system design. Restorative vs. punitive model. If we remove fear, we will have a more honest conversation about those trade-offs. If we're honest about those trade-offs, a crisis might be averted altogether. If we understand our history, we won't be destined to repeat it. https://upload.wikimedia.org/wikipedia/commons/8/8c/Tumbeasts_servers.png https://upload.wikimedia.org/wikipedia/commons/4/49/Smurf_Zombies_-_Flickr_-_SoulStealer.co.uk.jpg
  9. Blameless system design is A-BEECH.
  10. Most people aren’t trying to bring about computergeddon. Bring empathy to the table when you’re discussing someone’s design. Has tooling improved? Did that shiny OSS project that will fix all of this ‘mess’ even exist in a production-ready state when this was implemented? What logic might have led to this design choice? Put yourself in their shoes. https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Goodwill_Industries_Logo.svg/341px-Goodwill_Industries_Logo.svg.png
  11. While blameless system design is not error-focused, it's important to have a constructive framework in place when there are problems. Such a framework ensures balanced accountability for both individuals and the organization, analyzes errors rather than judging people, removes fear from the process, and encourages people to improve the system instead of seeking retribution. https://upload.wikimedia.org/wikipedia/commons/a/af/Aachen_Allegory.jpg
  12. You might be sitting next to the person who had to make the tough call you’re critiquing. Someday, that person might be you. Reject 'contempt culture' and the trading of condescension for prestige. Try to understand how someone might have arrived at their self-taught narrative and how that might have shaped decisions. Focus on the good qualities of a design and see if those can be extended or applied in other places. https://upload.wikimedia.org/wikipedia/commons/8/85/Mother's_love.jpg
  13. No system lives in isolation. Without experiments, we have no way to qualify our assumptions about those interactions. Measure, measure, measure, and record! We deal with complex system interactions that can cause some very unexpected behavior. Record metrics at every step with every change to qualify your work. Design your experiments; don’t be a victim of them. https://upload.wikimedia.org/wikipedia/commons/e/e7/Atomic_Laboratory_Experiment_on_Atomic_Materials_-_GPN-2000-000663.jpg
  14. Publish all your experimentation results, whether they bore fruit or not. Document your decisions somewhere so future reviewers will understand them. Save future reviewers and architects some time by being explicit about issues you came across and how you addressed them. Be honest about trade-offs; this is not the place to be shy about the skeletons in the closet. Track mitigation responses, at least in a backlog, so they don't get buried over time only to re-emerge from their graves later. https://www.flickr.com/photos/rosengrant/3929869118
  15. Broadcast cultural expectations throughout the organization. Reinforce the organization with respect or a sense of achievement. Provide easy-to-find, easy-to-access information about all systems. Open up meetings and discussions to anyone who wants to participate; they just might provide unexpected insight. Establish both positive and negative feedback channels. https://upload.wikimedia.org/wikipedia/commons/thumb/3/37/Communication_shannon-weaver2.svg/2000px-Communication_shannon-weaver2.svg.png
  16. If some of this sounds familiar, it's because it is. Blameless system design includes many of the skills of the devops movement. We've got the CMS in CAMS: Culture, Measurement, Sharing. It creates feedback loops. http://www.bouwkennisblog.nl/wp-content/uploads/2014/04/luisteren.jpg
  17. It's hard to change a retribution culture and the RCA mentality. It's hard to get over hindsight bias. It's a lot of work: championing efforts, encouraging openness, defining what is broadcast. Everyone will need to get over their impostor syndrome and/or contempt cultures. The organization must be willing to accept risk: risk from new system design and complexity, risk from choosing to leave old systems in place, and risk from updating old systems. Once risk has caused failure, organizations must be willing to try restorative measures (and not break trust). Organizations must be willing to be honest and frank about both the good and the bad aspects of their systems. https://pixabay.com/static/uploads/photo/2013/07/13/10/32/bad-157437_960_720.png
  18. Why do this? It removes fear as an obstacle to innovation; encourages people to take risks, which could lead to differentiation as a business; creates good feedback loops to increase iterations; creates good data to prevent 'retracing each other's steps'; and improves the working environment and relationships. https://pixabay.com/static/uploads/photo/2013/07/13/10/32/good-157436_960_720.png
  19. https://upload.wikimedia.org/wikipedia/commons/c/c8/Thank_you_001.jpg
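Note 5 also lists cyclomatic complexity among the ways to measure technical debt. McCabe defines it on a control-flow graph as M = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components. For a single structured function, this usually comes out to one plus the number of decision points. The minimal Python sketch below shows both views. `graph_complexity` applies the formula directly. `approx_complexity` takes the decision-point shortcut by walking Python's `ast`. Which node types count as a decision is an assumption made for illustration, and real analyzers apply their own rules.

```python
import ast

# Assumed set of AST nodes that each add one branch; real tools differ in
# details (e.g. how they treat match/case, assert, or try/else).
_BRANCH_NODES = (
    ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While,
    ast.ExceptHandler, ast.comprehension,
)


def graph_complexity(edges: int, nodes: int, components: int = 1) -> int:
    """McCabe's definition on a control-flow graph: M = E - N + 2P."""
    return edges - nodes + 2 * components


def approx_complexity(source: str) -> int:
    """Approximate M for one function as 1 + the number of decision points."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` short-circuits at each operator: len(values) - 1 branches
            decisions += len(node.values) - 1
    return 1 + decisions


EXAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for d in (2, 3):
        if n % d == 0 and n != d:
            return "composite"
    return "other"
'''

if __name__ == "__main__":
    # if/else graph: start->test, test->then, test->else, then->end, else->end
    print(graph_complexity(edges=5, nodes=5))  # 2
    # if + elif + for + inner if + one `and` = 5 decisions
    print(approx_complexity(EXAMPLE))          # 6
```

As with any of these metrics, a high number tells you where code is hard to reason about. It doesn't tell you why someone made that choice, and that question is where assuming goodwill comes in.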