4. OUTLINE
A plane crash on 8 January 1989
British Midland Flight 92, flying from Heathrow to Belfast,
crashed beside the M1 motorway near Kegworth while
attempting an emergency landing at East Midlands Airport
The plane was a Boeing 737-400, a new variant of the Boeing
737, in use by British Midland for less than two months
There were 118 passengers and 8 crew; 47 died and 74 were
seriously injured
5. SEQUENCE OF EVENTS
• The pilots hear a pounding noise and feel vibrations
(subsequently found to be caused by a fan blade breaking
inside the left engine).
• Smoke enters the cabin, and passengers sitting near the rear of
the plane notice flames coming from the left engine.
• The flight is diverted to East Midlands Airport.
• The pilot shuts down the right engine.
6. SEQUENCE OF EVENTS
• The pilots can no longer feel the vibrations, and do not notice
that the vibration detector is still reporting a problem. The
smoke disperses.
• The pilot informs the passengers and crew that there was a
problem with the right engine and that it has been shut down.
• 20 minutes later, on approach to East Midlands Airport, the
pilot increases thrust. This causes the left engine to burst into
flames and cease operating.
• The pilots try to restart the left engine, but crash short of the
runway.
7. WRONG ENGINE SHUT DOWN. WHY?
Incorrect assumption: The pilots believed the “bleed air” was
taken from the right engine, and therefore that the smoke
must be coming from the right. The original 737 took bleed
air from the right engine, but the 737-400 did not.
Psychologists call this a mistake in “knowledge-based
performance”.
Design issues: The pilots had no direct view of the engines, so
they relied on other information sources to explain the
vibrations. The vibration sensors were tiny and had a new
style of digital display. The vibration sensors were inaccurate
on the 737 but not on the 737-400.
Inadequate training: A one-day course, and no simulator
training.
8. ERROR NOT TRAPPED. WHY?
Coincidence: The smoke disappeared after shutting down the
right engine, and the vibrations lessened. Psychologists call
this “confirmation bias”.
Lapse in procedure: After shutting down the right engine the pilot
began checking all meters and reviewing decisions but
stopped after being interrupted by a transmission from the
airport asking him to descend to 12,000 ft.
Lack of Communication: Some cabin crew and passengers could
see the left engine was on fire, but did not inform the pilot,
even when the pilot announced he was shutting down the
right engine.
Design Issue: The vibration meters would have shown a problem
with the left engine, but were too difficult to read. There was
no alarm.
10. VIEWPOINTS
Traditional engineering view
• The crash was caused by an engine failure. Therefore we
must design better engines.
Traditional managerial view
• The crash was caused by the pilots. We must hire better
pilots.
The socio-technical systems engineering view, or “new view”
• The crash had no single cause, but involved problems in
testing, design, training, teamwork, communications,
procedure following, decision making, poor ‘upgrade’
management (and more).
• We need better engines, but we also need to expect problems
to happen and to be adequately prepared for them.
11. THE “NEW VIEW” OF HUMAN ERROR
The old view
• Human error is the cause of accidents
• Systems are inherently safe, and people introduce errors
• Bad things happen to bad people
The new view
• Human error is a symptom of trouble deeper inside a system
• Systems are inherently unsafe, and people usually keep them
running well
• All humans are fallible
12. THE “NEW VIEW” OF HUMAN ERROR
Is not new! “New view” is just a name; the idea has been
around for 20 years.
Draws the emphasis away from modelling human error, and
towards understanding what underlies human actions when
operating technology
• How do people get things right?
Argues too much emphasis is placed on “the sharp end”. It
argues that error is symptomatic of deeper trouble
Opposes the “blame culture” that has arisen in many
organisations. We are too quick to blame system operators
when managers and engineers are at fault.
13. HUMAN RELIABILITY
Humans don’t just introduce errors into systems, but are
often responsible for avoiding and correcting them too.
What do people really do when they are operating a
technology?
• Very little human work is driven by a clear and
unambiguous set of recipes or processes, even when
these are available
• All human work is situationally contingent. Work must
inevitably be more than following a set of steps.
• If people work to rule, accidents can happen. For example,
prior to the sinking of the SS Estonia, a crew member
did not report a leak because it was not his job.
14. CORRECT PROCEDURE?
There is not always a ‘correct’ procedure by which to judge
any action.
Sometimes trial and error processes are necessary
• In young organisations, best practices may not yet exist
• New and unusual situations may occur in which a trial and
error approach is appropriate
• Sometimes it is appropriate to play or experiment. This is
how innovation often happens.
So deciding when something is an error, and judging whether
an error was appropriate to a set of circumstances can be
highly context dependent.
16. FIELDWORK
Often we don’t notice that people need to do things to keep
complex systems running smoothly.
• Fieldwork is an important aspect of understanding how
systems are operated and how people work.
17. STUDYING SUCCESS
It is important to study and understand ordinary work
We can also learn lessons from “successful failures”,
including
• The Apollo 13 mission
• The Airbus A380 engine explosion over Batam Island
• The Sioux City crash
However, accounts of successful failures can turn into a form
of hero worship, and organisations that experience these
kinds of success against the odds can build a false sense of
invulnerability.
18. PROBLEMS WITH AUTOMATION
As work becomes automated, engineers often make the
mistake of automating the aspects that are easy to automate.
• The Fitts list (“MABA-MABA”: men-are-better-at,
machines-are-better-at) approach can lead to a dangerous
lack of awareness and control for system operators.
• The “paradox of automation” is that automation creates
and requires new forms of labour.
• The major design problem is no longer how to support
workflow, but how to support awareness across a system
and organisation, and how to support appropriate kinds of
intervention
19. CREW RESOURCE MANAGEMENT
One approach to improving reliability and reducing human
error is crew resource management (CRM)
• Developed in the aviation industry, and now widely used
• Formerly Cockpit Resource Management
CRM Promotes
• The effective use of all resources (human, physical,
software)
• Teamwork
• Proactive accident prevention
20. CREW RESOURCE MANAGEMENT
The focus of CRM is upon
• Communication: How to communicate clearly and
effectively
• Situational awareness: How to build and maintain an
accurate and shared picture of an unfolding situation
• Decision making: How to make appropriate decisions
using the available information. (and how to make
appropriate information available)
• Teamwork: Effective group work, effective leadership, and
effective followership.
• Removing barriers: How to remove barriers to the above
21. KEY POINTS
It can be too narrow to focus on human error
• Human errors are usually symptomatic of deeper
problems
• Human reliability is not just about humans not making
errors, but about how humans maintain dependability
We cannot rely on there being correct procedures for every
situation. Procedures are important, but we need to support
cooperative working.
Design approaches, as well as human and organisational
approaches, can be taken to support human reliability.
Editor's Notes
20 minutes to trap the error.
Not just planes, but ambulance dispatch, Terminal 5, passport issuing, enterprise systems.