incident & problem
management
LIGHTWEIGHT ITIL.
Berlin, November 2015
1.why?
2.overview
3.example
Agenda
Problem — too frequent incidents
in the live product
Let’s take a look into ITIL.
Wow! It has exactly what we need!
Let’s take the best out of
Problem and Incident Management!
https://www.flickr.com/photos/parkstreetparrot/9764446493
incident
— unplanned interruption or a serious
reduction in the service quality
why incident management?
○ restore service asap
○ avoid unnecessary involvement
○ avoid mistakes
how to manage incidents?
identify: really an incident?
handle: use defined procedure
close: add incident record
https://www.flickr.com/photos/parkstreetparrot/9764364985
https://www.flickr.com/photos/parkstreetparrot/9764378536
https://www.flickr.com/photos/parkstreetparrot/9764364706
EXAMPLE
The goal — to minimize the amount
and severity of incidents in
live online games
We don’t treat every bug as an incident.
Incident criteria were defined
identify
Incident only when:
➔ game becomes unavailable, or
➔ game revenue drops more
than €XXX, or
➔ severe issues with servers, or
➔ it can't wait for next
planned deployment
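A minimal sketch of how the four criteria above could be checked in code. The `IssueReport` type, its field names, and the `revenue_threshold_eur` parameter are hypothetical; the threshold stands in for the undisclosed "€XXX" value and has to be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class IssueReport:
    """Hypothetical snapshot of a reported live issue (field names are assumptions)."""
    game_unavailable: bool
    revenue_drop_eur: float            # observed revenue drop, in euros
    severe_server_issue: bool
    can_wait_for_next_deployment: bool


def is_incident(report: IssueReport, revenue_threshold_eur: float) -> bool:
    """Return True if at least one of the four criteria is met.

    revenue_threshold_eur is the caller-supplied stand-in for "€XXX".
    """
    return (
        report.game_unavailable
        or report.revenue_drop_eur > revenue_threshold_eur
        or report.severe_server_issue
        or not report.can_wait_for_next_deployment
    )
```

The criteria are joined with "or", so an ordinary bug that meets none of them stays out of the incident process.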
We don’t panic when the incident
occurs. We follow the process:
➔ Elect a SWAT team
➔ Plan Communication
➔ Kick-off
➔ Check the Knowledge Base
➔ Create an IM chat group
➔ Send email notifications to
stakeholders on every update
➔ Follow defined policies and
guidelines
handle
We act smartly after
the incident is resolved:
➔ prevent recurrences
➔ update stakeholders
➔ submit the Incident record
➔ update Knowledge Base if
necessary
➔ propose process improvements
close
Outcomes
★ incident resolved or worked around
★ incidents database updated
problem
management
why problem management?
○ recognize problems
○ permanent solutions
○ fewer emergencies
identify problems from incident records
submit problems (also add problems directly)
implement permanent solutions
we ask questions:
could any of the incidents be prevented?
can we detect incident symptoms?
are there any patterns?
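One way to look for patterns is to count how often each kind of incident recurs in the incident records. This is only a sketch: the record layout (a dict with a `category` key) and the `min_count` cut-off are assumptions, not something the slides specify.

```python
from collections import Counter


def recurring_categories(incident_records, min_count=2):
    """Return categories that appear at least min_count times.

    incident_records is an iterable of dicts with a hypothetical
    "category" key; recurring categories are candidate problems.
    """
    counts = Counter(record["category"] for record in incident_records)
    return {category: n for category, n in counts.items() if n >= min_count}


# Illustrative, made-up records.
records = [
    {"category": "login"},
    {"category": "payment"},
    {"category": "login"},
]
candidates = recurring_categories(records)
```

Categories that keep coming back are natural candidates for a problem record.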
Summary
“the aim of incident management is to restore the service as
quickly as possible, often through a workaround, rather than
through trying to find a permanent solution which is the aim
of problem management.”
Appendices
IDENTIFICATION
‣ Receive data regarding the incident and ensure it is complete and clear.
‣ Qualify the issue as an Incident.
DELIVERABLES
➨ Incident process is triggered
HANDLING
‣ Elect a SWAT team to fix the incident.
‣ Decide on a War Room for the Huddle.
‣ Huddle and lay down an Action Plan.
‣ Send out an email notification to all stakeholders. No one is allowed to disturb the SWAT team while they actively investigate or resolve the Incident.
‣ Huddle regularly to update the Action Plan.
‣ Send out updates to all stakeholders.
‣ If DevOps involvement is necessary, follow the “Emergency IT Support Policy”.
‣ Follow the “Live Actions Guidelines”.
DELIVERABLES
➨ Resolved incident (possibly via a workaround)
➨ Incident report(s) sent to XXX email
CLOSURE
After the incident has been solved, make sure to:
‣ Communicate the results to relevant stakeholders by sending an email that follows the 'Issue on Live' closure procedure and template.
‣ Take corrective actions to prevent the issue from happening again. Create JIRA tickets where possible.
‣ Evaluate possible procedure updates for the teams in the pipeline.
‣ Submit the “Incident Login” form.
DELIVERABLES
➨ Report sent to XXX email
➨ Related JIRA tickets submitted
➨ “Incident Login” form submitted
incident management process
[Flowchart with three phases: IDENTIFICATION, HANDLING, CLOSURE]
Steps: Incident detected; Send email (keep one thread); Action plan (five-minute huddle of the SWAT team in a war room); Create / Update JIRAs (contact OPS if necessary); Fix (first QA, then Live); Send email; Add / Update Incident record (via Incident Login form); Create JIRAs for fixing root cause or other related issues if possible
Decisions: Resolved? (yes / no); Open > 3 days? (yes / no)
problem management process
PROBLEM DETECTION
ACTIVITIES
‣ Define the problem
‣ Receive data regarding the problem from incident management
‣ Ensure the collected data is complete and clear
‣ Define which teams or departments are affected
‣ Gather other data from the day of the incident
‣ Analyze symptoms
‣ Analyze the data collected from various sources relating to the major incident
‣ Analyze historical data to see whether such a problem occurred before
DELIVERABLES
➨ Analyzed problem
➨ Updated incident record
ROOT CAUSE IDENTIFICATION
ACTIVITIES
Problem investigation and diagnosis (requires tech experts)
‣ Conduct root cause analysis, using various techniques if necessary:
• Make a sketch
• Draw an Ishikawa (fishbone) diagram
• Kepner-Tregoe
• Flow diagrams
• etc.
‣ Determine workarounds
‣ Think of potential solutions
‣ Assess the problem and the recommended actions to resolve it
DELIVERABLES
➨ Updated problem record
➨ Root cause detected
➨ Workaround(s) identified
SOLUTION DEFINITION
ACTIVITIES
‣ Identify the team for solution development
‣ Determine possible resolutions
‣ Choose the best approach
‣ Make sure the solution can effectively prevent recurrence
DELIVERABLES
➨ Updated problem record
➨ Other tasks in JIRA
➨ Updated incident records
➨ Defined resources that are necessary for implementation
PRIORITISATION
ACTIVITIES
‣ Identify the urgency and impact of this task
‣ Define a priority in the Problem management queue
‣ Identify who is responsible for the implementation
‣ Decide how this problem should be prioritized among the other tasks of the team
DELIVERABLES
➨ The task(s) have a priority
➨ The team leads are aware of the task and can plan it in their sprints
PROBLEM LOGGING
ACTIVITIES
‣ Create a new JIRA record or update the old one:
• Unique ID, timestamp
• Name of submitter
• Links to associated problem records (with hierarchy if applicable)
• Links to associated incident records
• Problem description
• Problem category
• Status
• Severity and Impact
• Responsible person, team
• Affected game
• Associated JIRA records
• History of all actions taken
• Workaround
• Permanent solution (if known already)
DELIVERABLES
➨ Created/updated problem record
➨ Analyzed and updated incident data
IMPLEMENTATION, CLOSURE
ACTIVITIES
‣ Conduct activities to implement the fix for the problem
‣ Verify that the solution is appropriate and close the problem record
‣ Submit a record to the Error Knowledge Base if applicable
‣ Share lessons learned via email if reasonable
‣ Ensure that all the associated incidents are closed with a proper fix or resolution
DELIVERABLES
➨ Updated incident record
➨ Updated problem record
➨ Updated Known Errors Knowledge Base spreadsheet
➨ Lessons learned shared
➨ Report is sent
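The record fields listed under PROBLEM LOGGING could be modelled roughly as below. This is an illustrative sketch, not the JIRA schema used by the team: the class name, field names, types, and the integer scales for severity and impact are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ProblemRecord:
    """Hypothetical in-memory model of a problem record."""
    problem_id: str                      # unique ID
    submitter: str
    description: str
    category: str
    status: str
    severity: int                        # assumed scale, e.g. 1 (low) to 3 (high)
    impact: int                          # assumed scale, e.g. 1 (low) to 3 (high)
    responsible: str                     # responsible person or team
    affected_game: str
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    linked_problems: List[str] = field(default_factory=list)
    linked_incidents: List[str] = field(default_factory=list)
    jira_keys: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)
    workaround: Optional[str] = None
    permanent_solution: Optional[str] = None   # filled in once known

    def log_action(self, note: str) -> None:
        """Append an entry to the history of actions taken."""
        self.history.append(note)


# Illustrative usage with made-up values.
record = ProblemRecord(
    "PRB-1", "alice", "Recurring login failures", "login",
    "open", 2, 3, "backend team", "Game A",
)
record.linked_incidents.append("INC-7")
record.log_action("Record created from incident INC-7")
```

Keeping the workaround and the permanent solution as separate optional fields mirrors the split between incident management (restore quickly) and problem management (fix permanently).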
[Flowchart with three phases]
DETECTION & LOGGING: Choose the problem area; Analyze related incident data (symptoms, relations, historical data); Request missing data (symptoms, relations, historical data); Create new Problem Record / update existing (JIRA)
ROOT CAUSE IDENTIFICATION & SOLUTION DEFINITION: Identify root cause, workarounds; Determine work for identified solutions (and choose implementation team); Prioritize
IMPLEMENTATION & CLOSURE: Implement (by defined implementation team); Close (update knowledge base, submit lessons learned, send email)
Updated along the way: Incident record(s), Problem record, Known Errors
Prepare Meeting
Who: Problem Manager
Process summary:
- to ensure the quality of the incident spreadsheet
- to select follow-ups
- to prefill the problem management spreadsheet
Effort: 3-5 mh
When: no later than 3 days before the meeting
Run Meeting
Chairperson: Problem Manager
Participants: PMs, APs, OPS representative
Frequency: monthly
Activities:
- identify problems
- prioritize & agree on actions
- define responsible teams
When: at the end of the month. In case of holidays or an emergency, moved to the next working day.
Outcomes: Assigned tasks
Take Actions
Who: a particular person is responsible for each problem, as defined in the meeting
Process summary:
- implement
- verify
- update all records
Outcomes: Updated incident and problem records
problem management process simplified
Valentyn Barmak
thank you!
ask for more:
http://www.linkedin.com/in/valentineb
https://www.xing.com/profile/Valentyn_Barmak
www.barmak.de

Dealing with Poor Performance - get the full picture from 3C Performance Mana...
 
How Software Developers Destroy Business Value.pptx
How Software Developers Destroy Business Value.pptxHow Software Developers Destroy Business Value.pptx
How Software Developers Destroy Business Value.pptx
 
internship thesis pakistan aeronautical complex kamra
internship thesis pakistan aeronautical complex kamrainternship thesis pakistan aeronautical complex kamra
internship thesis pakistan aeronautical complex kamra
 
digital Human resource management presentation.pdf
digital Human resource management presentation.pdfdigital Human resource management presentation.pdf
digital Human resource management presentation.pdf
 
Abortion pills in Jeddah |• +966572737505 ] GET CYTOTEC
Abortion pills in Jeddah |• +966572737505 ] GET CYTOTECAbortion pills in Jeddah |• +966572737505 ] GET CYTOTEC
Abortion pills in Jeddah |• +966572737505 ] GET CYTOTEC
 
Leaders enhance communication by actively listening, providing constructive f...
Leaders enhance communication by actively listening, providing constructive f...Leaders enhance communication by actively listening, providing constructive f...
Leaders enhance communication by actively listening, providing constructive f...
 
The Psychology Of Motivation - Richard Brown
The Psychology Of Motivation - Richard BrownThe Psychology Of Motivation - Richard Brown
The Psychology Of Motivation - Richard Brown
 

Incident and Problem management simplified

  • 1. incident & problem management LIGHTWEIGHT ITIL. Berlin, November 2015
  • 4. Problem — too frequent incidents in the live product
  • 5. Let’s take a look into ITIL. Wow! It has exactly what we need!
  • 6. Let’s take the best out of Problem and Incident Management!
  • 8. incident — unplanned interruption or a serious reduction in the service quality
  • 12. why incident management? ○ restore service asap ○ avoid unnecessary involvement ○ avoid mistakes
  • 17. how to manage incidents? identify (really an incident?) ➔ handle (use defined procedure) ➔ close (add incident record)
  • 20. The goal — to minimize the amount and severity of incidents in live online games
  • 21. We don’t treat every bug as an incident. Incident criteria were defined.
  • 25. identify: Incident only when: ➔ game becomes unavailable, or ➔ game revenue drops more than €XXX, or ➔ severe issues with servers, or ➔ it can't wait for next planned deployment
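The four criteria on slide 25 amount to a simple "any of these" check. A minimal sketch of that check follows, assuming a hypothetical `IssueReport` structure; the field names are illustrative and not part of the deck, and the revenue threshold (shown only as €XXX on the slide) is left as a parameter the caller must supply.

```python
from dataclasses import dataclass


@dataclass
class IssueReport:
    """Hypothetical summary of a reported issue; field names are assumptions."""
    game_unavailable: bool = False
    revenue_drop_eur: float = 0.0
    severe_server_issue: bool = False
    can_wait_for_next_deployment: bool = True


def is_incident(report: IssueReport, revenue_threshold_eur: float) -> bool:
    """Return True if at least one of the four slide-25 criteria holds.

    revenue_threshold_eur stands in for the elided "€XXX" value, which
    the deck does not disclose.
    """
    return (
        report.game_unavailable
        or report.revenue_drop_eur > revenue_threshold_eur
        or report.severe_server_issue
        or not report.can_wait_for_next_deployment
    )
```

Everything that fails this check stays an ordinary bug and goes through the normal planned deployment, which is the point of slide 21.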
  • 26. We don’t panic when the incident occurs. We follow the process:
  • 33. handle: ➔ Elect a SWAT team ➔ Plan Communication ➔ Kick-off ➔ Check the Knowledge Base ➔ Create an IM chat group ➔ Send email notifications to stakeholders on every update ➔ Follow defined policies and guidelines
  • 34. We act smartly after the incident is resolved:
  • 39. close: ➔ prevent recurrences ➔ update stakeholders ➔ submit the Incident record ➔ update Knowledge Base if necessary ➔ propose process improvements
  • 40. Outcomes: ★ incident resolved or worked around ★ incidents database updated
  • 44. why problem management? ○ recognize problems ○ permanent solutions ○ fewer emergencies
  • 53. we ask questions: could any of the incidents be prevented? can we detect incident symptoms? are there any patterns?
  • 54. Summary: “the aim of incident management is to restore the service as quickly as possible, often through a workaround, rather than through trying to find a permanent solution which is the aim of problem management.”
  • 56. incident management process
      IDENTIFICATION
        ‣ Receive data regarding the incident and ensure it is complete and clear.
        ‣ Qualify the issue as an Incident.
        DELIVERABLES
        ➨ Incident process is triggered
      HANDLING
        ‣ Elect a SWAT team to fix the incident issue.
        ‣ Decide on a War Room for the Huddle.
        ‣ Huddle and lay down an Action Plan.
        ‣ Send out email notification to all stakeholders.
        No one is allowed to disturb the SWAT team while they actively investigate/resolve the Incident.
        ‣ Huddle regularly to update the action plan.
        ‣ Send out updates to all stakeholders.
        ‣ If DevOps is necessary, follow the “Emergency IT Support Policy”.
        ‣ Follow the “Live Actions Guidelines”.
        DELIVERABLES
        ➨ Resolved incident (possibly via a workaround)
        ➨ Sent Incident report(s) to XXX email
      CLOSURE
        After the incident has been solved, make sure to:
        ‣ Communicate the results to relevant stakeholders by sending mail following the 'Issue on Live' closure procedure as per the template.
        ‣ Take corrective actions to prevent the issue from happening again. Create JIRA tickets where possible.
        ‣ Evaluate possible procedure updates that can be made in the teams in the pipeline.
        ‣ Submit the “Incident Login” form.
        DELIVERABLES
        ➨ Sent report to XXX email
        ➨ Submitted related JIRA tickets
        ➨ Submitted “Incident Login” form
  • 57. Flowchart phases: IDENTIFICATION, HANDLING, CLOSURE
      Steps:
        ‣ Incident detected
        ‣ Send email (keep one thread)
        ‣ Action plan (five-minute huddle of the SWAT team in a war room)
        ‣ Create / Update JIRAs (contact OPS if necessary)
        ‣ Fix (first QA, then Live)
        ‣ Add / Update Incident record (via Incident Login form)
        ‣ Create JIRAs for fixing root cause or other related issues if possible
        ‣ Add / Update Incident record (via Incident Login form)
        ‣ Send email
      Decisions:
        ‣ Resolved? (yes / no)
        ‣ Open > 3 days? (yes / no)
  • 58. problem management process
      PROBLEM DETECTION
        ACTIVITIES
        ‣ Define the problem
        ‣ Receive data regarding the problem from incident management
        ‣ Ensure the collected data is complete and clear
        ‣ Define which teams or departments are affected
        ‣ Gather other data from the day of the incident
        ‣ Analyze symptoms
        ‣ Analyze the data collected from various sources relating to the major incident
        ‣ Analyze historical data to see if such a problem occurred before
        DELIVERABLES
        ➨ Analyzed problem
        ➨ Updated incident record
      ROOT CAUSE IDENTIFICATION
        ACTIVITIES
        Problem investigation and diagnosis (requires tech experts)
        ‣ Conduct root cause analyses, using various techniques if necessary:
          • Make a sketch
          • Draw Ishikawa (fishbone) diagram
          • Kepner-Tregoe
          • Flow diagrams
          • etc.
        ‣ Determine workarounds
        ‣ Think of potential solutions
        ‣ Assess the problem and recommended actions to resolve the problem
        DELIVERABLES
        ➨ Updated problem record
        ➨ Root cause detected
        ➨ Workaround(s) identified
      SOLUTION DEFINITION
        ACTIVITIES
        ‣ Identify the team for solution development
        ‣ Determine possible resolutions
        ‣ Choose the best approach
        ‣ Make sure the solution can effectively prevent recurrence
        DELIVERABLES
        ➨ Updated problem record
        ➨ Other tasks in JIRA
        ➨ Updated incident records
        ➨ Defined resources that are necessary for implementation
      PRIORITISATION
        ACTIVITIES
        ‣ Identify the urgency and impact of this task
        ‣ Define a priority in the Problem management queue
        ‣ Identify who is responsible for the implementation
        ‣ Decide how this problem should be prioritized among other tasks of the team
        DELIVERABLES
        ➨ The task(s) has a priority
        ➨ The team leads are aware of the task and can plan it in their sprints
      PROBLEM LOGGING
        ACTIVITIES
        ‣ Create a new JIRA record or update the old one:
          • Unique ID, timestamp
          • Name of submitter
          • Link associated problem records (with hierarchy if applicable)
          • Link associated incident records
          • Problem description
          • Problem category
          • Status
          • Severity and Impact
          • Responsible person, team
          • Affected game
          • Associated JIRA records
          • History of all taken actions
          • Workaround
          • Permanent solution (if known already)
        DELIVERABLES
        ➨ Created/updated problem record
        ➨ Analyzed and updated incident data
      IMPLEMENTATION, CLOSURE
        ACTIVITIES
        ‣ Conduct activities to implement the fix to the problem
        ‣ Verify that the solution is appropriate and close the problem record
        ‣ Submit a record to the Error Knowledge Base if applicable
        ‣ Share lessons learned via email if reasonable
        ‣ Ensure that all the associated incidents are closed with a proper fix or resolution
        DELIVERABLES
        ➨ Updated incident record
        ➨ Updated problem record
        ➨ Updated Known Errors Knowledge Base spreadsheet
        ➨ Lessons learned shared
        ➨ Report is sent
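The field list in the PROBLEM LOGGING step of slide 58 maps naturally onto a record type. The sketch below is one possible shape for such a record, not the team's actual JIRA schema: every attribute name, the free-text `status`, `severity` and `impact` values, and the `log_action` helper are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ProblemRecord:
    """Hypothetical problem record mirroring the slide-58 field list."""
    record_id: str                       # unique ID
    submitter: str                       # name of submitter
    description: str                     # problem description
    category: str                        # problem category
    status: str                          # free text here; the deck lists no status values
    severity: str
    impact: str
    responsible: str                     # responsible person or team
    affected_game: str
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )                                    # timestamp
    linked_problem_ids: List[str] = field(default_factory=list)
    linked_incident_ids: List[str] = field(default_factory=list)
    jira_keys: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)   # all taken actions
    workaround: Optional[str] = None
    permanent_solution: Optional[str] = None            # filled in once known

    def log_action(self, note: str) -> None:
        """Append a note to the history of taken actions."""
        self.history.append(note)
```

Keeping the workaround and the permanent solution as separate optional fields reflects the split drawn on slide 54: incident management may stop at a workaround, while problem management keeps the record open until a permanent solution is found.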
  • 59. Flowchart phases: DETECTION & LOGGING, ROOT CAUSE IDENTIFICATION & SOLUTION DEFINITION, IMPLEMENTATION & CLOSURE
      Steps:
        ‣ Choose the problem area
        ‣ Analyze related incident data (symptoms, relations, historical data)
        ‣ Request missing data (symptoms, relations, historical data)
        ‣ Create new Problem Record / update existing (JIRA)
        ‣ Identify root cause, workarounds
        ‣ Determine work for identified solutions (and choose implementation team)
        ‣ Prioritize
        ‣ Implement (by defined implementation team)
        ‣ Close (update knowledge base, submit lessons learned, send email)
      Record updates along the way:
        ➨ Incident record(s) update (three times)
        ➨ Problem record update (four times)
        ➨ Known Errors update
  • 60. problem management process simplified
      Prepare Meeting
        Who: Problem Manager
        Process summary:
          - ensure the quality of the incident spreadsheet
          - select follow-ups
          - prefill the problem management spreadsheet
        Efforts: 3-5 mh
        When: no later than 3 days before the meeting
      Run Meeting
        Chairperson: Problem Manager
        Participants: PMs, APs, OPS representative
        Frequency: monthly
        Activities:
          - identify problems
          - prioritize & agree on actions
          - define responsible teams
        When: at the end of the month. In case of holidays or emergency, moved to the next working day.
        Outcomes: Assigned tasks
      Take Actions
        Who: a particular person is responsible for every problem, as defined in the meeting
        Process summary:
          - implement
          - verify
          - update all records
        Outcomes: Updated incident and problem records