What the NTSB teaches us about
incident management & postmortems
​Jeff Weiner
​Chief Executive Officer
​Michael Kehoe
​Staff Site Reliability Engineer
​Nina Mushiana
​Sr Site Reliability Manager
Agenda and Vision
Today’s
agenda
1 Introductions
2 Background on the NTSB
3 NTSB: Investigative Process
4 Recommendations & Most Wanted List
5 How does this apply to us?
6 Final thoughts
Michael Kehoe
​$ /USR/BIN/WHOAMI
● Staff Site Reliability Engineer @ LinkedIn
● Production-SRE Team
● Funny accent = Australian + 4 years American
Nina Mushiana
​$ /USR/BIN/WHOAMI
● Sr Site Reliability Engineer Manager @ LinkedIn
● Production-SRE Team & Site-Ops
Production-SRE Team @ LinkedIn
​$ /USR/BIN/WHOAMI
● Disaster Recovery - Planning & Automation
● Incident Response – Process & Automation
● Visibility Engineering – Making use of
operational data
● Reliability Principles – Defining best practice &
automating it
Incident Command System (ICS)
https://training.fema.gov/emiweb/is/icsresource/assets/reviewmaterials.pdf
Background on the NTSB
​JURISDICTION
● Aviation
● Surface Transportation
● Marine
● Pipeline
● Assistance to other agencies/ governments
“The NTSB shall investigate or have investigated and
establish the facts, circumstances, and cause or
probable cause of accidents…”
49 U.S. Code § 1131
“… The Board shall report on the facts and
circumstances of each accident investigated…The
Board shall make each report available to the public
at reasonable cost…”
49 U.S. Code § 1131
“The NTSB does not assign fault or blame for an
accident or incident…accident/incident
investigations are fact-finding proceedings with no
formal issues and no adverse parties … and are not
conducted for the purpose of determining the rights
or liabilities of any person.”
49 CFR § 831.4; 49 U.S. Code § 1154(b)
Similar Organizations
● Italy – Agenzia nazionale per la Sicurezza del Volo (ANSV)
● Canada – Transportation Safety Board of Canada (TSB)
● Indonesia – Komite Nasional Keselamatan Transportasi (NTSC)
● Netherlands – Dutch Safety Board (DSB)
● Australia – Australian Transport Safety Bureau (ATSB)
● United Kingdom – Air Accidents Investigation Branch (AAIB)
● Germany – Bundesstelle für Flugunfalluntersuchung (BFU)
● France – Bureau d’Enquêtes et d’Analyses pour la Sécurité de l’Aviation Civile (BEA)
NTSB Investigation Process
1. Pre-Investigation Preparation
2. Notification & Initial Response
3. On-Scene Activities
4. Post-On-Scene Activities
1. Pre-Investigation
Preparation
Pre-Investigation Preparation
​GO TEAM
● Go team: On-call investigators ready for
assignments
● Investigator-In-Charge (IIC) pre-assigned
● Full Go team may contain several subject
matter experts; e.g.
○ Human performance
○ Aircraft performance
○ Air Traffic Control
Pre-Investigation Preparation
​GO TEAM ROSTER
● On-call roster made available internally
○ Phone & pager numbers
● Updated weekly
● All personnel should be able to arrive at an
airport within 2 hours of notification
○ Should have essentials on them if they
live far away from an airport
● Division Chiefs are responsible for testing pagers
2. Notification & Initial
Response
Notification & Initial Response
​REGIONAL RESPONSE
1. Regional office notifies headquarters of
incident
2. Closest regional office to accident will
provide at least one investigator to perform
PR & “stakedown”
Notification & Initial Response
​HEADQUARTERS RESPONSE
1. After incident occurs: communication center
advises IIC and chief of Major Investigations
(who subsequently inform their superiors)
2. OAS director decides whether to launch a
Go-Team
3. Other executives are made aware by Chief of
Major Investigations
Notification & Initial Response
​NOTIFICATION & ASSIGNMENTS
● Go-Team composition determined by
incident circumstances
● Send more specialists if in doubt
Notification & Initial Response
​PARTY NOTIFICATION
● IIC gives party status to organizations that
can provide technical assistance (airlines,
aircraft manufacturers etc.)
● Communication center will help with travel
arrangements and on-site administrative
support
● Go-Team will travel together to accident site
3. On-Scene Activities
On-Scene Activities
​COMMAND ROOMS
● Have meeting rooms to accommodate at least
30 people
● Have space for media
● Ensure you have equipment in command
room
○ PCs
○ Telephone systems
○ Forms
● IIC is responsible for managing this
On-Scene Activities
​COMMAND ROOMS
● For Major investigations, Administrative
support is provided
● Government purchase card is available for
goods or services
On-Scene Activities
​ORGANIZATIONAL MEETING
● Share preliminary information
● Organize (assign) participants
● Organize observers
● Establish lines of authority
“The manner in which the IIC conducts the
organizational meeting will establish the tone of the
investigation. Therefore, the importance of being
organized, articulate, assertive, composed, and
understanding cannot be overstated”
Major Investigations Manual Sec 3.2
On-Scene Activities
​ACCIDENT SITE SAFETY PRECAUTIONS
● Safety officer identifies & classifies risks and
then develops counter-measures
● Safety officer performs daily briefings to
accident site team.
On-Scene Activities
​OBSERVERS
● Observers may be allowed if they do not have
self-interest
● May include:
○ Congressional oversight committee(s)
○ Military personnel
○ Foreign Governments
○ Federal Agencies
On-Scene Activities
​LINE OF AUTHORITY
● IIC is the most senior person on-scene and all
investigative activity is under his/ her control
● If IIC cannot resolve an issue, IIC may talk to
Chief of Major Investigations
● Ability to escalate further if required
On-Scene Activities
​PROGRESS MEETINGS
● On-site progress meetings are held daily to:
○ Disseminate information obtained
○ Plan the day’s activities
○ Discuss plans for subsequent
investigative activities
● Generally start at 6pm
● Plan next day’s meeting
On-Scene Activities
​DAILY ACTIVITIES OF IIC
● Headquarters briefing
● Safety board staff meeting
● Party coordinator meeting
● Site visit
4. Post-On-Scene Activities
NTSB Report Structure
● Factual Information: Gathering facts about the incident
● Analysis: Analyze how the facts contributed to the incident
● Conclusions: Draw conclusions about what happened
● Recommendations: Write detailed recommendations
● Appendices: Extra information
Post-On-Scene Activities
​WORK PLANNING
● Discuss activities that will follow the on-scene
phase of investigation
● Build timelines for work
● Provides avenues for various teams to work
together
Post-On-Scene Activities
​FACTS & ANALYSIS REPORT
● A factual report based on the field notes and
subsequent investigation activities
● Each group chairman shall submit an analysis
report based on the information contained in
his or her factual report.
Post-On-Scene Activities
​PUBLIC HEARING
● Led by IIC/ Hearing Officer
● Identify witnesses whose testimony is
appropriate
● The witnesses may be from the parties to the
investigation or can be suggested by one or
more of the parties.
● Purpose: To ensure all relevant information is
gathered before writing the report
Post-On-Scene Activities
​TECHNICAL REVIEW
● Provides an additional opportunity for all
parties to review all factual information
● Ensures all issues are resolved
● Technical Review is held as soon as possible
after public hearing
Post-On-Scene Activities
​PREPARATION OF FINAL REPORT
● Dedicated department to help write report
● Follows a standard template
○ Annex 13 to the Convention on International
Civil Aviation (ICAO)
● Contains formal recommendations to
manufacturers/ transportation authorities
Recommendations &
Most Wanted List
Recommendations & Most Wanted List
● NTSB advocates for particular action items
based on report(s):
○ Generally directed towards Transport
bodies/ manufacturers
● NTSB publicly tracks response of the
responsible body
https://www.ntsb.gov/safety/mwl/Pages/default.aspx
How does this relate to all of us?
1. Pre-Investigation
Preparation
Applying this to operations
​PRE-INCIDENT PREPARATION
● Have an Incident commander pre-assigned
● Publish on-call schedules
○ Manager is responsible
● Test on-call pagers regularly
● Ensure that you can respond within SLA
● Keep a printed copy of on-call contact info
● Disaster recovery (DR)
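The preparation checklist above can be turned into an automated readiness check. The following is a minimal sketch, not the process described in the talk: the `OnCallEntry` fields, the 15-minute SLA, the 7-day pager-test window, and the sample names and dates are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class OnCallEntry:
    name: str
    role: str                  # hypothetical role label, e.g. "incident-commander"
    last_pager_test: datetime  # when the last successful test page was sent
    ack_minutes: float         # how long the person took to acknowledge it


def readiness_gaps(roster, now, sla_minutes=15, max_test_age=timedelta(days=7)):
    """Return human-readable problems found in the on-call roster."""
    gaps = []
    # An incident commander should be pre-assigned before anything happens.
    if not any(e.role == "incident-commander" for e in roster):
        gaps.append("no incident commander pre-assigned")
    for e in roster:
        # Pagers should be tested regularly.
        if now - e.last_pager_test > max_test_age:
            gaps.append(f"{e.name}: pager not tested in the last {max_test_age.days} days")
        # Responders should be able to respond within the SLA.
        if e.ack_minutes > sla_minutes:
            gaps.append(f"{e.name}: acknowledged in {e.ack_minutes:g} min, SLA is {sla_minutes} min")
    return gaps


# Made-up sample data.
now = datetime(2019, 3, 25, 9, 0)
roster = [
    OnCallEntry("alice", "incident-commander", datetime(2019, 3, 22), 4),
    OnCallEntry("bob", "sre-oncall", datetime(2019, 3, 1), 22),
]
for gap in readiness_gaps(roster, now):
    print(gap)
```

Running a check like this on a schedule, and sending the result to the responsible manager, mirrors the NTSB practice of having Division Chiefs test pagers.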
2. Notification & Initial
Response
Applying this to operations
​NOTIFICATION & INITIAL RESPONSE
● NOC/SiteOps team notifies incident
commander + manager
○ Prod-SRE gets engaged
● Prod-SRE Manager/Oncall
○ Assess, Engage, Notify, Mitigate
Applying this to operations
​NOTIFICATION & INITIAL RESPONSE
● Once verified, we launch a full response for a
Major Incident
● Incident commander gives “party status” to
observers
● Manager informs executives & PR
○ Periodic updates
● Mitigate
3. On-Scene Activities
Applying this to operations
​ON-SCENE ACTIVITIES
● Private + public Slack work channels
● IC is empowered to make decisions
● Organizational call to ensure:
○ Problem is understood
○ Areas of investigation are assigned
Applying this to operations
​ON-SCENE ACTIVITIES
● War room
○ Incident commander drives the
war-room
○ Roles & responsibilities assigned to each
“party”
○ Communication at regular cadence to
execs
○ Admin ensures supplies and food
● Gathering data and updating timeline doc
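The timeline doc and the regular update cadence from the war-room list can be sketched in a few lines. This is an illustrative sketch only: the `Timeline` class, the `exec_update_due` helper, and the 30-minute cadence are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Timeline:
    entries: list = field(default_factory=list)  # (timestamp, author, note) tuples

    def add(self, when, author, note):
        self.entries.append((when, author, note))
        # Notes often arrive out of order during an incident; keep them chronological.
        self.entries.sort(key=lambda e: e[0])

    def render(self):
        return "\n".join(f"{t:%H:%M} [{a}] {n}" for t, a, n in self.entries)


def exec_update_due(last_update, now, cadence=timedelta(minutes=30)):
    """True when the next regular update to executives is due (cadence is assumed)."""
    return now - last_update >= cadence


# Made-up sample data.
start = datetime(2019, 3, 25, 14, 0)
tl = Timeline()
tl.add(start + timedelta(minutes=12), "ic", "War room opened, roles assigned")
tl.add(start, "noc", "Alert: elevated error rate")
print(tl.render())
```

Recording who added each note keeps the timeline factual, which makes it a usable input for the postmortem later.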
4. Post-On-Scene Activities
Applying this to operations
​POST ON-SCENE ACTIVITIES
● Post mortem
○ Dedicated team
○ PM Template
○ Blameless
● “Postmortem rollup”
○ Action items are prioritized
○ Weekly reporting on status of
action-items
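A "postmortem rollup" with weekly status reporting could look like the sketch below. It assumes a hypothetical `ActionItem` record with a numeric priority (0 = most urgent) and a due date; the talk does not specify the actual data model, and the incident IDs and owners are invented.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    incident: str
    title: str
    owner: str
    priority: int                 # assumed scale: 0 = most urgent
    due: date
    closed: Optional[date] = None


def weekly_rollup(items, today):
    """Summarize open action items: counts by priority, plus overdue ones."""
    open_items = [i for i in items if i.closed is None]
    # Overdue items are listed most urgent first, then oldest due date first.
    overdue = sorted(
        (i for i in open_items if i.due < today),
        key=lambda i: (i.priority, i.due),
    )
    return {
        "open_by_priority": dict(Counter(i.priority for i in open_items)),
        "overdue": [(i.incident, i.title, i.owner) for i in overdue],
    }


# Made-up sample data.
items = [
    ActionItem("INC-1", "Add alert on queue depth", "alice", 0, date(2019, 3, 1)),
    ActionItem("INC-1", "Update runbook", "bob", 2, date(2019, 4, 15)),
    ActionItem("INC-2", "Automate failover", "carol", 1, date(2019, 3, 10),
               closed=date(2019, 3, 8)),
]
report = weekly_rollup(items, date(2019, 3, 25))
print(report)
```

Listing the owner next to each overdue item is what makes the weekly report useful for accountability rather than just a count.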
Recommendations:
Most Wanted List
Applying this to operations
​MOST WANTED LIST
● Use the post-incident process to improve
and hold people accountable for action
items
● Keep track of recurring issues/ repeaters
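Tracking repeaters can be as simple as tagging each postmortem with its contributing causes and counting the tags. The sketch below assumes such tags exist; the tag names, incident IDs, and the `find_repeaters` helper are invented for illustration.

```python
from collections import defaultdict


def find_repeaters(postmortems, min_count=2):
    """Group incidents by contributing-cause tag; keep tags seen at least min_count times."""
    by_tag = defaultdict(list)
    for incident_id, tags in postmortems:
        for tag in set(tags):  # count each tag at most once per incident
            by_tag[tag].append(incident_id)
    repeaters = {t: ids for t, ids in by_tag.items() if len(ids) >= min_count}
    # Most frequent first, so the top of the list is the "most wanted".
    return sorted(repeaters.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Made-up sample data.
postmortems = [
    ("INC-1", ["config-push", "missing-alert"]),
    ("INC-2", ["capacity"]),
    ("INC-3", ["config-push"]),
    ("INC-4", ["missing-alert", "config-push"]),
]
for tag, ids in find_repeaters(postmortems):
    print(tag, ids)
```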
Final Thoughts
● NTSB Investigative Process: Complete Incident + Postmortem process
● Invest: The more you put in, the more you’ll get out
● Accountability: Accountability for improvements/action items
Questions?
  • 5. Nina Mushiana ​$ /USR/BIN/WHOAMI ● Sr Site Reliability Engineer Manager @ LinkedIn ● Production-SRE Team & Site-Ops
  • 6. Production-SRE Team @ LinkedIn ​$ /USR/BIN/WHOAMI ● Disaster Recovery - Planning & Automation ● Incident Response – Process & Automation ● Visibility Engineering – Making use of operational data ● Reliability Principles – Defining best practice & automating it
  • 7. Incident Command System (ICS) https://training.fema.gov/emiweb/is/icsresource/assets/reviewmaterials.pdf
  • 9. Background on the NTSB ​JURISDICTION ● Aviation ● Surface Transportation ● Marine ● Pipeline ● Assistance to other agencies/ governments
  • 10. “The NTSB shall investigate or have investigated and establish the facts, circumstances, and cause or probable cause of accidents…” U.S. Code § 1131
  • 11. “… The Board shall report on the facts and circumstances of each accident investigated…The Board shall make each report available to the public at reasonable cost…” U.S. Code § 1131
  • 12. “The NTSB does not assign fault or blame for an accident or incident…accident/incident investigations are fact-finding proceedings with no formal issues and no adverse parties … and are not conducted for the purpose of determining the rights or liabilities of any person.” U.S. Code § 1154
  • 13. Similar Organizations ● Italy – Agenzia Nazionale per la Sicurezza del Volo (ANSV) ● Canada – Transportation Safety Board of Canada (TSB) ● Indonesia – Komite Nasional Keselamatan Transportasi (NTSC) ● Netherlands – Dutch Safety Board (DSB) ● Australia – Australian Transport Safety Bureau (ATSB) ● United Kingdom – Air Accidents Investigation Branch (AAIB) ● Germany – Bundesstelle für Flugunfalluntersuchung (BFU) ● France – Bureau d’Enquêtes et d’Analyses pour la Sécurité de l’Aviation Civile (BEA)
  • 15. NTSB Investigation Process 1. Pre-Investigation Preparation 2. Notification & Initial Response 3. On-Scene Activities 4. Post-On-Scene Activities
  • 17. Pre-Investigation Preparation GO TEAM ● Go team: on-call investigators ready for assignments ● Investigator-In-Charge (IIC) pre-assigned ● Full Go team may contain several subject matter experts, e.g. ○ Human performance ○ Aircraft performance ○ Air Traffic Control
  • 18. Pre-Investigation Preparation GO TEAM ROSTER ● On-call roster made available internally ○ Phone & pager numbers ● Updated weekly ● All personnel should be able to arrive at an airport within 2 hours of notification ○ Should have essentials on them if they live far away from an airport ● Division Chiefs responsible for testing pagers
  • 19. 2. Notification & Initial Response
  • 20. Notification & Initial Response ​REGIONAL RESPONSE 1. Regional office notifies headquarters of incident 2. Closest regional office to accident will provide at least one investigator to perform PR & “stakedown”
  • 21. Notification & Initial Response HEADQUARTERS RESPONSE 1. After an incident occurs, the communication center advises the IIC and the Chief of Major Investigations (who subsequently inform their superiors) 2. OAS director decides whether to launch a Go-Team 3. Other executives are made aware by the Chief of Major Investigations
  • 22. Notification & Initial Response ​NOTIFICATION & ASSIGNMENTS ● Go-Team composition determined by incident circumstances ● Send more specialists if in doubt
  • 23. Notification & Initial Response ​PARTY NOTIFICATION ● IIC gives party status to organizations that can provide technical assistance (airlines, aircraft manufacturers etc.) ● Communication center will help with travel arrangements and on-site administrative support ● Go-Team will travel together to accident site
  • 25. On-Scene Activities ​COMMAND ROOMS ● Have meeting rooms to accommodate at least 30 people ● Have space for media ● Ensure you have equipment in command room ○ PCs ○ Telephone systems ○ Forms ● IIC is responsible for managing this
  • 26. On-Scene Activities ​COMMAND ROOMS ● For Major investigations, Administrative support is provided ● Government purchase card is available for goods or services
  • 27. On-Scene Activities ​ORGANIZATIONAL MEETING ● Share preliminary information ● Organize (assign) participants ● Organize observers ● Establish lines of authority
  • 28. “The manner in which the IIC conducts the organizational meeting will establish the tone of the investigation. Therefore, the importance of being organized, articulate, assertive, composed, and understanding cannot be overstated” Major Investigations Manual Sec 3.2
  • 29. On-Scene Activities ​ACCIDENT SITE SAFETY PRECAUTIONS ● Safety officer identifies & classifies risks and then develops counter-measures ● Safety officer performs daily briefings to accident site team.
  • 30. On-Scene Activities ​OBSERVERS ● Observers may be allowed if they do not have self-interest ● May include: ○ Congressional oversight committee(s) ○ Military personnel ○ Foreign Governments ○ Federal Agencies
  • 31. On-Scene Activities LINE OF AUTHORITY ● IIC is the most senior person on-scene and all investigative activity is under his/her control ● If the IIC cannot resolve an issue, the IIC may talk to the Chief of Major Investigations ● Ability to escalate further if required
  • 32. On-Scene Activities ​PROGRESS MEETINGS ● On-site progress meetings are held daily to: ○ Disseminate information obtained ○ Plan the day’s activities ○ Discuss plans for subsequent investigative activities ● Generally start at 6pm ● Plan next day’s meeting
  • 33. On-Scene Activities ​DAILY ACTIVITIES OF IIC ● Headquarters briefing ● Safety board staff meeting ● Party coordinator meeting ● Site visit
  • 35. NTSB Report Structure ● Factual Information: gathering facts about the incident ● Analysis: analyze how the facts contributed to the incident ● Conclusions: draw conclusions about what happened ● Recommendations: write detailed recommendations ● Appendices: extra information
  • 36. Post-On-Scene Activities ​WORK PLANNING ● Discuss activities that will follow the on-scene phase of investigation ● Build timelines for work ● Provides avenues for various teams to work together
  • 37. Post-On-Scene Activities ​FACTS & ANALYSIS REPORT ● A factual report based on the field notes and subsequent investigation activities ● Each group chairman shall submit an analysis report based on the information contained in his or her factual report.
  • 38. Post-On-Scene Activities ​PUBLIC HEARING ● Led by IIC/ Hearing Officer ● Identify witnesses whose testimony is appropriate ● The witnesses may be from the parties to the investigation or can be suggested by one or more of the parties. ● Purpose: To ensure all relevant information is gathered before writing the report
  • 39. Post-On-Scene Activities ​TECHNICAL REVIEW ● Provides an additional opportunity for all parties to review all factual information ● Ensures all issues are resolved ● Technical Review is held as soon as possible after public hearing
  • 40. Post-On-Scene Activities PREPARATION OF FINAL REPORT ● Dedicated department to help write report ● Follows a standard template ○ Annex 13 to the Convention on International Civil Aviation (ICAO) ● Contains formal recommendations to manufacturers/transportation authorities
  • 42. Recommendations & Most Wanted List ● NTSB advocates for particular action items based on report(s): ○ Generally directed towards Transport bodies/ manufacturers ● NTSB publicly tracks response of the responsible body https://www.ntsb.gov/safety/mwl/Pages/default.aspx
  • 43. How this relates to all of us?
  • 45. Applying this to operations PRE-INCIDENT PREPARATION ● Have an incident commander pre-assigned ● Publish on-call schedules ○ Manager is responsible ● Test on-call pagers regularly ● Ensure that you can respond within SLA ● Keep a printed copy of on-call contact info ● DR (disaster recovery) http://i.imgur.com/wvg8IDq.gif
  • 46. 2. Notification & Initial Response
  • 47. Applying this to operations NOTIFICATION & INITIAL RESPONSE ● NOC/SiteOps team notifies incident commander + manager ○ Prod-SRE gets engaged ● Prod-SRE Manager/On-call ○ Assess, Engage, Notify, Mitigate https://docs.microsoft.com/en-us/windows/uwp/design/shell/tiles-and-notifications/images/toast-mirroring.gif
  • 48. Applying this to operations ​NOTIFICATION & INITIAL RESPONSE ● Once verified, we launch full response for Major Incident ● Incident commander gives “party status” to observers ● Manager informs executives & PR ○ Periodic updates ● Mitigate http://www.roadrunneremaillogin.com/wp-content/uploads/2018/06/RoadRunner-Email.jpg
  • 50. Applying this to operations ON-SCENE ACTIVITIES ● Private + public Slack work-channels ● IC is empowered to make decisions ● Organizational call to ensure: ○ Problem is understood ○ Areas of investigation are assigned http://www.gpla.com/static/img/projects/ubisofts-e3-social-media-war-room/war-room.gif
  • 51. Applying this to operations ​ON-SCENE ACTIVITIES ● War room ○ Incident commander drives the war-room ○ Roles & responsibilities assigned to each “party” ○ Communication at regular cadence to execs ○ Admin ensures supplies and food ● Gathering data and updating timeline doc http://www.gpla.com/static/img/projects/ubisofts-e3-social-media-war-room/war-room.gif
  • 53. Applying this to operations ​POST ON-SCENE ACTIVITIES ● Post mortem ○ Dedicated team ○ PM Template ○ Blameless ● “Postmortem rollup” ○ Action items are prioritized ○ Weekly reporting on status of action-items https://www.economist.com/sites/default/files/imagecache/1280-width/20180414_OFP021.gif
  • 55. Applying this to operations ​MOST WANTED LIST ● Use the post-incident process to improve and hold people accountable for action items ● Keep track of recurring issues/ repeaters https://clip2art.com/images/meeting-clipart-animated-gif-2.gif
  • 57. Final Thoughts ● NTSB Investigative Process: complete incident + postmortem process ● Invest: the more you put in, the more you’ll get out ● Accountability: accountability for improvements/action items
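
The "postmortem rollup" and "most wanted list" slides describe prioritizing action items, reporting on their status weekly, and tracking recurring issues. A minimal sketch of that bookkeeping is shown below. It is an illustration only, not the tooling used by the presenters: the `ActionItem` fields, the priority scale (1 = most urgent), the `cause_tag` label, and the sample incidents are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    incident_id: str      # hypothetical incident identifier
    title: str
    owner: str            # person accountable for the item
    priority: int         # assumed scale: 1 = most urgent
    due: date
    done: bool = False
    cause_tag: str = ""   # hypothetical label used to spot repeat causes


def weekly_rollup(items, today):
    """Return open items sorted by priority, then due date.

    Each entry is (item, overdue), where overdue is True if the due
    date is before `today`.
    """
    open_items = [i for i in items if not i.done]
    open_items.sort(key=lambda i: (i.priority, i.due))
    return [(i, i.due < today) for i in open_items]


def repeaters(items, threshold=2):
    """Return cause tags seen in at least `threshold` distinct incidents."""
    seen = {(i.incident_id, i.cause_tag) for i in items if i.cause_tag}
    counts = Counter(tag for _, tag in seen)
    return sorted(tag for tag, n in counts.items() if n >= threshold)


# Made-up sample data for illustration.
items = [
    ActionItem("INC-1", "Add failover runbook", "alice", 2,
               date(2024, 1, 10), cause_tag="dns"),
    ActionItem("INC-2", "Alert on resolver latency", "bob", 1,
               date(2024, 1, 20), cause_tag="dns"),
    ActionItem("INC-2", "Fix deploy script", "carol", 3,
               date(2024, 1, 5), done=True, cause_tag="deploy"),
]

report = weekly_rollup(items, today=date(2024, 1, 15))
for item, overdue in report:
    status = "OVERDUE" if overdue else "open"
    print(f"P{item.priority} {item.owner}: {item.title} [{status}]")

print("Repeat causes:", repeaters(items))
```

Sorting by priority and then due date keeps the weekly report focused on what matters most, and naming an owner on every item supports the accountability point from the final slide. Counting cause tags across distinct incidents is one simple way to surface "repeaters" that may deserve a place on an internal most wanted list.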