The Elusive Root Cause Of IT Problems
And How To Easily Identify It


Noam Biran
Director of Product Management
Introduction
               Mr. Biran
               •    Director of Product Management at Neebula
               •    20 years of experience in systems management & BSM
               •    Innovation Product Management at BMC
               •    Co-founder of Appilog (now HP uCMDB & DDMA)



 About Neebula
  Neebula provides the first and only automatic service-centric IT management
  solution allowing IT organizations to improve the service provided to the business
  by shifting from managing disparate technology silos to managing the services
  running in the data center. Leveraging unique technology that automatically maps
  business services to the underlying infrastructure, Neebula enables the IT team to
  increase availability of the main services they manage and reduce the time to
  repair of problems.
Agenda
•   Introduction
•   Root cause analysis defined
•   The problem resolution process
•   Problem detection
•   Root cause analysis methods
•   Improving root cause analysis processes
Root Cause Analysis Definition
   ITIL V3
              An Activity that identifies the Root Cause of
              an Incident or Problem.
              Root Cause Analysis typically concentrates on
              IT Infrastructure failures.



  Wikipedia
              Root Cause Analysis is any structured
              approach to identify the factors that resulted
              in the harmful consequences of one or more
              past events
The importance of Root Cause Analysis
• Root Cause Analysis has a high impact on
  – IT processes
     • The efficiency of the overall incident/problem
       management process
     • Good RCA discipline requires well established
       configuration management
  – Organizational goals
     • Meeting internal and external SLAs
     • Financial (budget & revenue) implications
     • Brand / customer loyalty
Root Cause Analysis Nowadays
The Critical Role of Root Cause Analysis
• Improper (or lack of) identification of the real
  root cause may yield:
   – Repeating problems
   – Increased downtime
   – Waste of human resources on “fixing” the wrong issues
   – Risk to the business
The Life of The Operator
We expect the operator
    – To handle 1000’s of cryptic events
    – Understand impact on 100’s of services
    – Understand the correlation to
       customer service complaints
    – Understand what changed
    – Orchestrate the resolution
And make these decisions within minutes to
reduce MTTR

   Are we giving our operators the tools to
   succeed?
Problem Resolution Process
Problem Resolution Process
• Events coming in to the NOC
• NOC performs some investigation
• Root cause analysis is shared between NOC
  & 2nd/3rd level support (admins)
• Low level diagnostics & problem resolution
  is done by 2nd/3rd level support (admins)
Involved Parties & Tools

• Tools
  – Monitoring tools
  – Configuration management tools
• People
  – Users
  – NOC
  – Admins – specialized teams focused on specific
    area, e.g. system, database, network
  – Application support / developers
The Common Process – Blame Game
•   No structured process
•   Lack of overall cross-domain view
•   Each team has its own terminology and view
•   Each team is working on its own
Problem Detection
Potential Problem Symptoms
• Lack of certain functionality
  – A certain transaction does not work
• Performance degradation
  – Fund transfer response time is above 2 sec.
• Availability issue
  – Application doesn’t work
• None
  – Unnoticeable failure due to high availability
    configuration
Problem Detection
• Good problem detection methods are key for a
  structured root cause analysis process
• Problem detection tools should provide sufficient
  data to the root cause analysis process
• There are various distinct methods each with its
  pros and cons
• There is no single superior detection method
Detection – Users
• What it does
  – Compensates for unknown / unreported
    problems
• What it doesn’t
  – Supposedly accurate – actually might point in
    the wrong direction
  – Usually takes place too late for a quick fix
    and too late to avoid impact to the business
Detection – Infrastructure Monitoring
• What it does
  – Monitor each technical element
    comprising the service
  – Great way to identify
    specific availability failures
• What it doesn’t
  – Hard to correlate with real user experience
  – Too many false positives
  – Lots of events on symptoms rather than on the actual problem
Detection – End User Experience
• What it does
  – Measure overall response time of user transactions
  – Synthetic or real user transactions
  – The ultimate problem detection method
• What it doesn’t
  – No real breakdown to assist
    in pinpointing the problem
    or even the domain
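
As an illustration of the points above, here is a minimal sketch of a synthetic transaction check. It assumes a hypothetical `fake_fund_transfer` stand-in for a real scripted user transaction and reuses the 2-second threshold from the fund transfer symptom example; none of these names come from a specific monitoring product.

```python
import time
from typing import Callable

# Assumption: threshold borrowed from the "fund transfer above 2 sec." example.
THRESHOLD_SECONDS = 2.0


def measure_transaction(transaction: Callable[[], None]) -> float:
    """Run one synthetic transaction and return its elapsed time in seconds."""
    start = time.perf_counter()
    transaction()
    return time.perf_counter() - start


def is_degraded(elapsed_seconds: float, threshold: float = THRESHOLD_SECONDS) -> bool:
    """Return True when the overall response time is above the threshold."""
    return elapsed_seconds > threshold


def fake_fund_transfer() -> None:
    # Hypothetical stand-in for a real scripted user transaction.
    time.sleep(0.01)


if __name__ == "__main__":
    elapsed = measure_transaction(fake_fund_transfer)
    print(f"fund transfer took {elapsed:.3f}s, degraded={is_degraded(elapsed)}")
```

The check yields a single end-to-end number. It can say that the transaction is slow, but nothing in the measurement indicates which tier or domain is responsible.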
Detection – Transaction Breakdown
• What it does
  – Discovery of each transaction’s path
    within the data center
  – Highlight potential performance
    problems within the transaction
    execution
• What it doesn’t
  – No correlation to infrastructure
    monitoring
  – Cannot cover the entire data center
    – domain specific
Detection – Domain Specific Tools
• What it does
  – Drill down in a specific application
  – Great analysis & diagnostics within an application
• What it doesn’t
  – No data center wide view
  – Lack of insight into the
    connections between
    applications
Detection - Synergy
Root Cause Analysis Methods
Potential Root Cause Types

•   Configuration change
•   Version upgrade
•   Hardware fault
•   Software bug
•   Capacity problem
•   Resource collision
Common Ways for Root Cause Analysis

•   War room scenario
•   The log file approach
•   APM tools
•   Transaction management
•   Manual event correlation / analysis
War Room Scenario

•   Getting everyone in the same room
•   Each has its own data and terminology
•   Blame game
•   Takes a lot of time
The Log File Approach

• An admin sits and analyzes log files and
  other historical data from various sources
• A domain specific approach
• Certain degree of structured process
• Might identify problems that
  are not the root cause
  (distractions)
APM Tools

• An admin sits and analyzes log files and
  other historical data from various sources
• A domain specific approach
• Certain degree of structured process
• Might identify problems that
  are not the root cause
  (distractions)
Transaction Management

• A great tool to point to the probable area
  where the root cause resides
• Limited to specific domains
• Inability to correlate with infrastructure
  metrics / failures
Manual Event Correlation / Analysis

• Requires cross-domain expertise
• Requires understanding of dependencies
  between components
• Time consuming
• Lack of insight into other
  non-event data
Improving Root Cause Analysis Processes
Making The Best Of Existing Tools

• Choose problem detection methods that
  assist in the root cause analysis process
• Turn the root cause analysis into a
  structured process
  – Internal team processes
  – Inter-team processes
• Common language & visibility between
  teams
New Methods: Mapping

• Mapping of business services & applications
  to the supporting infrastructure
• Ties symptoms (user) to problems
  (technology)
• Introduces a common language between
  teams
• Enables a high level cross-domain view
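
One possible way to picture such a map is a small dependency graph. The sketch below is illustrative only: the component names (`web-01`, `db-01`, and so on) and the two business services are hypothetical, and the map is assumed to be acyclic.

```python
from typing import Dict, List, Set

# Hypothetical service map: each key depends directly on the listed components.
DEPENDS_ON: Dict[str, List[str]] = {
    "fund-transfer": ["web-01", "app-01"],
    "online-statements": ["web-01", "app-02"],
    "app-01": ["db-01"],
    "app-02": ["db-01"],
    "web-01": [],
    "db-01": ["san-01"],
    "san-01": [],
}
BUSINESS_SERVICES: Set[str] = {"fund-transfer", "online-statements"}


def all_dependencies(node: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every component the node depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(depends_on.get(node, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(depends_on.get(current, []))
    return seen


def impacted_services(component: str,
                      depends_on: Dict[str, List[str]],
                      services: Set[str]) -> List[str]:
    """List the business services that rely on the given component."""
    return sorted(s for s in services
                  if component in all_dependencies(s, depends_on))


if __name__ == "__main__":
    print(impacted_services("db-01", DEPENDS_ON, BUSINESS_SERVICES))
    print(impacted_services("app-01", DEPENDS_ON, BUSINESS_SERVICES))
```

With a map like this, a database team and an application team can both refer to the same business service when discussing a failing component.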
New Methods: Structured Process

• Define a structured process for problem
  investigation and root cause analysis
• Define how collaboration should occur
  during root cause analysis between teams
New Methods: Tools

• Use tools that provide a historical
  dimension for problem investigation
• Use tools that enable the correlation of
  problems to configuration changes
• Use topology-based correlation instead of
  rule-based (or manual) correlation
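
The last two bullets can be sketched roughly as follows. This is a simplified illustration, not any vendor's algorithm: the topology, the alerting components, the change records, and the 24-hour look-back window are all hypothetical, and the topology is assumed to be acyclic.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple

# Hypothetical topology: each key depends directly on the listed components.
DEPENDS_ON: Dict[str, List[str]] = {
    "fund-transfer": ["web-01", "app-01"],
    "app-01": ["db-01"],
    "web-01": [],
    "db-01": ["san-01"],
    "san-01": [],
}

# Hypothetical change record: (component, time of change, description).
Change = Tuple[str, datetime, str]


def transitive_dependencies(node: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every component the node depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(depends_on.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on.get(current, []))
    return seen


def root_cause_candidates(alerting: Set[str],
                          depends_on: Dict[str, List[str]]) -> List[str]:
    """Keep alerting components that have no alerting dependency beneath them."""
    return sorted(c for c in alerting
                  if not (transitive_dependencies(c, depends_on) & alerting))


def recent_changes(component: str, changes: List[Change], problem_start: datetime,
                   window: timedelta = timedelta(hours=24)) -> List[str]:
    """Return changes made to the component within the window before the problem."""
    return [desc for comp, when, desc in changes
            if comp == component and problem_start - window <= when <= problem_start]


if __name__ == "__main__":
    alerting = {"fund-transfer", "app-01", "db-01"}
    changes = [
        ("db-01", datetime(2024, 1, 1, 9, 0), "connection pool size changed"),
        ("web-01", datetime(2024, 1, 1, 9, 30), "certificate renewed"),
        ("db-01", datetime(2023, 12, 1, 9, 0), "minor patch applied"),
    ]
    problem_start = datetime(2024, 1, 1, 10, 0)
    for candidate in root_cause_candidates(alerting, DEPENDS_ON):
        print(candidate, recent_changes(candidate, changes, problem_start))
```

The topology does the work that hand-written rules would otherwise do: events on `fund-transfer` and `app-01` are treated as symptoms because something they depend on is also alerting, and the change history then narrows the investigation to what was recently modified on the remaining candidate.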

Editor's notes

  1. Introduction to the subject. Webinar logistics: presentation first, send questions during, answer questions at the end.
  2. RCA is problematic even to define. ITIL definition -> useless; ITIL failed. Wikipedia: structured, factors, consequences, past events (I’ll call them symptoms).
  3. Talk about each bullet.
  4. Many data sources (event feeds). All are mixed and funneled into the NOC. The NOC needs to filter them and put them in order based on relevance and on whether they are source or derived events. But the NOC doesn’t have the tools or processes to do this. There is no structured way to do this filtering (though the NOC is used to structured processes like run books).
  5. Taking care of the symptoms and not the problems. Associating wrong events -> figuring out the incorrect root cause.
  6. The NOC is used to structured processes (like run books). We don’t give them tools. We don’t give them structured processes (or any processes). They usually don’t possess cross-domain knowledge.
  7. Isolation – diagnostics. The NOC’s investigation may result in forwarding to the wrong team, and therefore in the wrong analysis being done in the wrong context.
  8. Explain each. How do they all tie together? Usually they don’t.
  9. Problem detection begins with the symptoms. The same symptoms may be caused by different problems.
  10. We need a combination of tools. Choose the right mix to assist in the RCA process. Need synergy between the methods.
  11. Cross-domain. Cross-discipline. Requires deep understanding.
  12. Not a structured approach.