SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Downloaden Sie, um offline zu lesen
Root Cause Analysis
Fact and Fiction
Dustin Collins @dustinmm80
CyberArk Conjur
Origins of Root Cause Analysis
- First implemented in 1958 in Toyota manufacturing plants
(5 Whys)
- Has since been adopted and tailored to many other
industries
Why do RCA?
To better understand the underlying causes of problems so
we can address them and prevent them happening again.
When to do RCA?
- When issues happen more than once
- When an outage affects many users
- When a system is not functioning as designed
5 Whys
- Start with a problem statement
- Ask why
- Repeat until root cause is found
Issues with 5 Whys
Incorrect or leading problem statement can point to the wrong
issue.
Not very useful in complex situations, where you can’t answer
why in the moment.
Issues with 5 Whys
It’s not repeatable
- Different people may get different results
- Same people at a different time may get different results
Issues with 5 Whys
Linear thinking leads teams to drive towards one root cause.
There is usually more than one root cause.
Human error is not a valid root cause.
Biases
Hindsight bias
- “Should have known better”.
Outcome bias
- “Since the outcome was bad, the plan was bad”.
What changed?
Cynefin
Conceptual framework created by Dave Snowden to organize
intellectual capital at IBM.
Uses quadrants to organize problems by complexity and
suggests a course of action.
Cynefin
quadrant
steps to take
causality
knowledge
Cynefin
EntropyProgress
Simple/Obvious
Sense Categorize Respond
Alert: Disk almost full Disk full Prune Docker images
Internal webservice is not responding
Complicated
Sense Analyze Respond
Failing pipelines ran at
same time as other builds
Concurrent pipeline
collisions cause node
resource contention
Update pipelines to splay
across nodes
Increasing CI pipeline failures, happens only during the day
Complex
Probe Sense Respond
- Investigate logs
- Inspect exponential
backoff code
- Run load tests
- View existing alerts
- Analyze performance from
load test
Add backpressure throttling
to discovery service
Discovery service became overloaded, triggering cascading failure in
other services
Chaotic
Act Sense Respond
Move instances to another
AZ
Still unable to connect Cannot connect in new AZ
either
Unable to connect to instances in AWS us-east-1a. No AWS service
warnings.
Disorder
Reduce Analyze Iterate
What do we know for sure? What do we agree on? Move to a quadrant,
continue
Stems from a lack of agreement on the problem
Takeaways
- Be aware of the limitations of the RCA techniques you use
- Emergent behavior arises from complexity and increased rate of change
- Consider trying Cynefin to help you approach complex problems
Thank you!
Dustin Collins @dustinmm80

Weitere ähnliche Inhalte

Was ist angesagt?

Continuous Deployment
Continuous DeploymentContinuous Deployment
Continuous DeploymentBrian Henerey
 
How effective feedback can improve your software
How effective feedback can improve your softwareHow effective feedback can improve your software
How effective feedback can improve your softwareSven Peters
 
Community building lessons from Ansible
Community building lessons from AnsibleCommunity building lessons from Ansible
Community building lessons from AnsibleGreg DeKoenigsberg
 
Reliable tests with selenium web driver
Reliable tests with selenium web driverReliable tests with selenium web driver
Reliable tests with selenium web driverPawelPabich
 
DevOps vs The Enterprise
DevOps vs The EnterpriseDevOps vs The Enterprise
DevOps vs The EnterpriseCloudCheckr
 
Silver Lining for Miles: DevOps for Building Security Solutions
Silver Lining for Miles: DevOps for Building Security SolutionsSilver Lining for Miles: DevOps for Building Security Solutions
Silver Lining for Miles: DevOps for Building Security SolutionsSeniorStoryteller
 
Agile bodensee - Agile Testing: Bug prevention vs. bug detection
Agile bodensee - Agile Testing: Bug prevention vs. bug detectionAgile bodensee - Agile Testing: Bug prevention vs. bug detection
Agile bodensee - Agile Testing: Bug prevention vs. bug detectionMichael Palotas
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Codemotion
 
OSGi Best and Worst Practices
OSGi Best and Worst PracticesOSGi Best and Worst Practices
OSGi Best and Worst PracticesChris Aniszczyk
 
JavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systemsJavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systemsBert Jan Schrijver
 
DevOps - Automating Legacy
DevOps - Automating LegacyDevOps - Automating Legacy
DevOps - Automating LegacyDavid Tank
 
Breathing the breath of the monster combining agile and context-driven
Breathing the breath of the monster   combining agile and context-drivenBreathing the breath of the monster   combining agile and context-driven
Breathing the breath of the monster combining agile and context-drivenIlari Henrik Aegerter
 
You build it, you run it
You build it, you run itYou build it, you run it
You build it, you run itSkyscanner
 
A Happy Marriage between Context-Driven and Agile
A Happy Marriage between Context-Driven and AgileA Happy Marriage between Context-Driven and Agile
A Happy Marriage between Context-Driven and AgileIlari Henrik Aegerter
 
Agile Software Development for Non-Developers
Agile Software Development for Non-DevelopersAgile Software Development for Non-Developers
Agile Software Development for Non-Developershamvocke
 
Chaos Engineering
Chaos EngineeringChaos Engineering
Chaos EngineeringYury Roa
 
How to Build a Healthy On-Call Culture
How to Build a Healthy On-Call CultureHow to Build a Healthy On-Call Culture
How to Build a Healthy On-Call CultureAtlassian
 

Was ist angesagt? (20)

Continuous Deployment
Continuous DeploymentContinuous Deployment
Continuous Deployment
 
How effective feedback can improve your software
How effective feedback can improve your softwareHow effective feedback can improve your software
How effective feedback can improve your software
 
Community building lessons from Ansible
Community building lessons from AnsibleCommunity building lessons from Ansible
Community building lessons from Ansible
 
Reliable tests with selenium web driver
Reliable tests with selenium web driverReliable tests with selenium web driver
Reliable tests with selenium web driver
 
DevOps vs The Enterprise
DevOps vs The EnterpriseDevOps vs The Enterprise
DevOps vs The Enterprise
 
Silver Lining for Miles: DevOps for Building Security Solutions
Silver Lining for Miles: DevOps for Building Security SolutionsSilver Lining for Miles: DevOps for Building Security Solutions
Silver Lining for Miles: DevOps for Building Security Solutions
 
Agile bodensee - Agile Testing: Bug prevention vs. bug detection
Agile bodensee - Agile Testing: Bug prevention vs. bug detectionAgile bodensee - Agile Testing: Bug prevention vs. bug detection
Agile bodensee - Agile Testing: Bug prevention vs. bug detection
 
Introduction to Chaos Engineering
Introduction to Chaos EngineeringIntroduction to Chaos Engineering
Introduction to Chaos Engineering
 
Shift left-testing
Shift left-testingShift left-testing
Shift left-testing
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
OSGi Best and Worst Practices
OSGi Best and Worst PracticesOSGi Best and Worst Practices
OSGi Best and Worst Practices
 
JavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systemsJavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systems
 
DevOps - Automating Legacy
DevOps - Automating LegacyDevOps - Automating Legacy
DevOps - Automating Legacy
 
Breathing the breath of the monster combining agile and context-driven
Breathing the breath of the monster   combining agile and context-drivenBreathing the breath of the monster   combining agile and context-driven
Breathing the breath of the monster combining agile and context-driven
 
You build it, you run it
You build it, you run itYou build it, you run it
You build it, you run it
 
A Happy Marriage between Context-Driven and Agile
A Happy Marriage between Context-Driven and AgileA Happy Marriage between Context-Driven and Agile
A Happy Marriage between Context-Driven and Agile
 
Agile Software Development for Non-Developers
Agile Software Development for Non-DevelopersAgile Software Development for Non-Developers
Agile Software Development for Non-Developers
 
Chaos Engineering
Chaos EngineeringChaos Engineering
Chaos Engineering
 
How to Build a Healthy On-Call Culture
How to Build a Healthy On-Call CultureHow to Build a Healthy On-Call Culture
How to Build a Healthy On-Call Culture
 
Delhi first draft
Delhi first draftDelhi first draft
Delhi first draft
 

Ähnlich wie Root Cause Analysis: Fact and Fiction

Testing within an Agile Environment - Beyza Sakir and Chris Gollop
Testing within an Agile Environment - Beyza Sakir and Chris GollopTesting within an Agile Environment - Beyza Sakir and Chris Gollop
Testing within an Agile Environment - Beyza Sakir and Chris GollopJAXLondon2014
 
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.02014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0Joakim Lindbom
 
30 days or less: New Features to Production
30 days or less: New Features to Production30 days or less: New Features to Production
30 days or less: New Features to ProductionKarthik Gaekwad
 
Selenium Users Anonymous
Selenium Users AnonymousSelenium Users Anonymous
Selenium Users AnonymousDave Haeffner
 
Scaling Your Tests: Continued Change Without Fear
Scaling Your Tests: Continued Change Without FearScaling Your Tests: Continued Change Without Fear
Scaling Your Tests: Continued Change Without FearTechWell
 
Building an automated database deployment pipeline
Building an automated database deployment pipelineBuilding an automated database deployment pipeline
Building an automated database deployment pipelineRed Gate Software
 
Mere Paas Teensy Hai (Nikhil Mittal)
Mere Paas Teensy Hai (Nikhil Mittal)Mere Paas Teensy Hai (Nikhil Mittal)
Mere Paas Teensy Hai (Nikhil Mittal)ClubHack
 
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...Citrix
 
Chaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just ChaosChaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just ChaosCharity Majors
 
Green Screen ci at Travis Perkins
Green Screen ci at Travis PerkinsGreen Screen ci at Travis Perkins
Green Screen ci at Travis PerkinsBrian Leach
 
Using XP practices on 1960s green screen technology
Using XP practices on 1960s green screen technologyUsing XP practices on 1960s green screen technology
Using XP practices on 1960s green screen technologyniksilver
 
DevOps - Boldly Go for Distro
DevOps - Boldly Go for DistroDevOps - Boldly Go for Distro
DevOps - Boldly Go for DistroPaul Boos
 
Moving to Continuous Delivery Without Breaking Your Code
Moving to Continuous Delivery Without Breaking Your CodeMoving to Continuous Delivery Without Breaking Your Code
Moving to Continuous Delivery Without Breaking Your CodeXebiaLabs
 
2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...
2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...
2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...Joakim Lindbom
 
The Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedThe Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedTyler Treat
 
DevOps - Chaos Engineering on Kubernetes
DevOps - Chaos Engineering on KubernetesDevOps - Chaos Engineering on Kubernetes
DevOps - Chaos Engineering on KubernetesDavid Hsu
 
Ohio 2012-help-sysad-out
Ohio 2012-help-sysad-outOhio 2012-help-sysad-out
Ohio 2012-help-sysad-outmralexjuarez
 
How to test a Mainframe Application
How to test a Mainframe ApplicationHow to test a Mainframe Application
How to test a Mainframe ApplicationMichael Erichsen
 
DevOps Transition Strategies
DevOps Transition StrategiesDevOps Transition Strategies
DevOps Transition StrategiesAlec Lazarescu
 

Ähnlich wie Root Cause Analysis: Fact and Fiction (20)

Testing within an Agile Environment - Beyza Sakir and Chris Gollop
Testing within an Agile Environment - Beyza Sakir and Chris GollopTesting within an Agile Environment - Beyza Sakir and Chris Gollop
Testing within an Agile Environment - Beyza Sakir and Chris Gollop
 
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.02014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0
 
Kanban 101
Kanban 101Kanban 101
Kanban 101
 
30 days or less: New Features to Production
30 days or less: New Features to Production30 days or less: New Features to Production
30 days or less: New Features to Production
 
Selenium Users Anonymous
Selenium Users AnonymousSelenium Users Anonymous
Selenium Users Anonymous
 
Scaling Your Tests: Continued Change Without Fear
Scaling Your Tests: Continued Change Without FearScaling Your Tests: Continued Change Without Fear
Scaling Your Tests: Continued Change Without Fear
 
Building an automated database deployment pipeline
Building an automated database deployment pipelineBuilding an automated database deployment pipeline
Building an automated database deployment pipeline
 
Mere Paas Teensy Hai (Nikhil Mittal)
Mere Paas Teensy Hai (Nikhil Mittal)Mere Paas Teensy Hai (Nikhil Mittal)
Mere Paas Teensy Hai (Nikhil Mittal)
 
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
 
Chaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just ChaosChaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just Chaos
 
Green Screen ci at Travis Perkins
Green Screen ci at Travis PerkinsGreen Screen ci at Travis Perkins
Green Screen ci at Travis Perkins
 
Using XP practices on 1960s green screen technology
Using XP practices on 1960s green screen technologyUsing XP practices on 1960s green screen technology
Using XP practices on 1960s green screen technology
 
DevOps - Boldly Go for Distro
DevOps - Boldly Go for DistroDevOps - Boldly Go for Distro
DevOps - Boldly Go for Distro
 
Moving to Continuous Delivery Without Breaking Your Code
Moving to Continuous Delivery Without Breaking Your CodeMoving to Continuous Delivery Without Breaking Your Code
Moving to Continuous Delivery Without Breaking Your Code
 
2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...
2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...
2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...
 
The Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedThe Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going Distributed
 
DevOps - Chaos Engineering on Kubernetes
DevOps - Chaos Engineering on KubernetesDevOps - Chaos Engineering on Kubernetes
DevOps - Chaos Engineering on Kubernetes
 
Ohio 2012-help-sysad-out
Ohio 2012-help-sysad-outOhio 2012-help-sysad-out
Ohio 2012-help-sysad-out
 
How to test a Mainframe Application
How to test a Mainframe ApplicationHow to test a Mainframe Application
How to test a Mainframe Application
 
DevOps Transition Strategies
DevOps Transition StrategiesDevOps Transition Strategies
DevOps Transition Strategies
 

Kürzlich hochgeladen

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

Root Cause Analysis: Fact and Fiction

  • 1. Root Cause Analysis Fact and Fiction Dustin Collins @dustinmm80 CyberArk Conjur
  • 2. Origins of Root Cause Analysis - First implemented in 1958 in Toyota manufacturing plants (5 Whys) - Has since been adopted and tailored to many other industries
  • 3. Why do RCA? To better understand the underlying causes of problems so we can address them and prevent them happening again.
  • 4. When to do RCA? - When issues happen more than once - When an outage affects many users - When a system is not functioning as designed
  • 5. 5 Whys - Start with a problem statement - Ask why - Repeat until root cause is found
  • 6. Issues with 5 Whys Incorrect or leading problem statement can point to the wrong issue. Not very useful in complex situations, where you can’t answer why in the moment.
  • 7. Issues with 5 Whys It’s not repeatable - Different people may get different results - Same people at a different time may get different results
  • 8. Issues with 5 Whys Linear thinking leads teams to drive towards one root cause. There is usually more than one root cause. Human error is not a valid root cause.
  • 9. Biases Hindsight bias - “Should have known better”. Outcome bias - “Since the outcome was bad, the plan was bad”.
  • 11. Cynefin Conceptual framework created by Dave Snowden to organize intellectual capital at IBM. Uses quadrants to organize problems by complexity and suggests a course of action.
  • 14. Simple/Obvious Sense Categorize Respond Alert: Disk almost full Disk full Prune Docker images Internal webservice is not responding
  • 15. Complicated Sense Analyze Respond Failing pipelines ran at same time as other builds Concurrent pipeline collisions cause node resource contention Update pipelines to splay across nodes Increasing CI pipeline failures, happens only during the day
  • 16. Complex Probe Sense Respond - Investigate logs - Inspect exponential backoff code - Run load tests - View existing alerts - Analyze performance from load test Add backpressure throttling to discovery service Discovery service became overloaded, triggering cascading failure in other services
  • 17. Chaotic Act Sense Respond Move instances to another AZ Still unable to connect Cannot connect in new AZ either Unable to connect to instances in AWS us-east-1a. No AWS service warnings.
  • 18. Disorder Reduce Analyze Iterate What do we know for sure? What do we agree on? Move to a quadrant, continue Stems from a lack of agreement on the problem
  • 19. Takeaways - Be aware of the limitations of the RCA techniques you use - Emergent behavior arises from complexity and increased rate of change - Consider trying Cynefin to help you approach complex problems