Proceedings of the IEEE Workshop on Accelerated Stress Testing & Reliability (ASTR), Austin, Texas, October 3 - 5, 2005.
Ganesan et al., Identification and Utilization of Failure… IEEE ASTR 2005
2) Identify all element or function failure modes
3) Determine the effect(s) of each failure mode and its
severity
4) Determine the cause(s) of each failure mode and its
probability of occurrence
5) Identify the current controls in place to prevent or
detect the potential failure modes
6) Assess risk, prioritize failures and assign corrective
actions to eliminate or mitigate the risk
7) Document the process
To achieve the greatest value, FMEA should be conducted
before a failure mode has been unknowingly built into the
product [13]. A typical design FMEA worksheet is shown in
Figure 1. For risk assessment, an FMEA uses occurrence and
detection probabilities in conjunction with severity criteria to
develop a risk priority number (RPN). The RPN is the
product of severity, occurrence and detection. The calculated
RPNs are prioritized and corrective actions are taken to
mitigate the risk associated with the potential failure. Once
corrective actions are implemented, the severity, occurrence
and detection values are reassessed, and a new RPN is
calculated. This process continues until the risk level is
acceptable.
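The RPN arithmetic and the reassess-until-acceptable loop can be sketched as follows. This is a minimal Python sketch, not a prescribed FMEA procedure: it assumes the common 1 to 10 rating scales and a hypothetical acceptance threshold of 100, neither of which is given in the text, and the two failure modes and their ratings are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureModeRating:
    """Severity, occurrence and detection ratings for one failure mode."""

    name: str
    severity: int    # assumed scale: 1 (least severe) to 10 (most severe)
    occurrence: int  # assumed scale: 1 (rare) to 10 (almost certain)
    detection: int   # assumed scale: 1 (surely detected) to 10 (undetectable)

    def rpn(self) -> int:
        # The RPN is the product of severity, occurrence and detection.
        return self.severity * self.occurrence * self.detection


def needs_action(modes: list[FailureModeRating], threshold: int = 100) -> list[str]:
    """Names of modes whose RPN exceeds a hypothetical threshold, highest RPN first."""
    ranked = sorted(modes, key=lambda m: m.rpn(), reverse=True)
    return [m.name for m in ranked if m.rpn() > threshold]


modes = [
    FailureModeRating("solder joint open", severity=9, occurrence=4, detection=5),
    FailureModeRating("connector fretting", severity=6, occurrence=3, detection=4),
]
print(needs_action(modes))  # ['solder joint open']  (RPNs are 180 and 72)

# A corrective action lowers the occurrence rating; the RPN is then recalculated.
modes[0].occurrence = 2
print(needs_action(modes))  # []  (new RPN is 90)
```

Ranking by the raw product is exactly what the RPN does; note that very different severity/occurrence/detection combinations can yield the same number.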
A limitation of the FMEA and FMECA procedures is that
neither identifies the product failure mechanisms and models
in the analysis and reporting process. Failure mechanisms are
the processes by which physical, electrical, chemical and
mechanical stresses induce failure [14]. Investigation of the
possible failure modes and mechanisms of the product aids in
developing failure-free and reliable designs. A design team
must be aware of the possible failure mechanisms to design
hardware capable of withstanding loads without failing.
Failure mechanisms and their related physical models are also
important for planning tests and screens to audit nominal
design and manufacturing specifications, as well as the level
of defects introduced by excessive variability in
manufacturing and material parameters. Without information
on failure mechanisms, FMEA may not provide a meaningful
input to critical procedures such as virtual qualification, root
cause analysis, accelerated test programs, and remaining
life assessment. Another potential shortcoming of the
standard FMEA is that it does not use information on
environmental and operating conditions at a quantitative level.
In reliability simulation, failure models are used to
analytically estimate time-to-failure distributions. Reliability
simulation at the product development stage is essential to
reduce product development cost and time, because it allows
design weaknesses to be identified and design options to be
evaluated. Reliability simulation can only be technically
and economically effective if it considers the appropriate
failure mechanisms relevant to a particular design and
application environment.
Root cause investigations require an understanding of
possible failure mechanisms to guide the data collection for
incident analysis, and the root cause hypothesis development
and verification.
Accelerated testing is based upon the concept that a
product will exhibit the same failure mechanism and mode in
a short time under high stress conditions, as it would exhibit
in a longer time under actual life cycle stress conditions.
Accelerated tests are used to precipitate failures during
product development and qualification. Only with
knowledge of the relevant failure mechanisms can one design
appropriate tests (e.g., stress levels, physical architecture, and
durations) that will precipitate failures by the same
mechanism without causing spurious failures. Accelerated
test data can be used to estimate times to failure in the field
only when the mechanism, and the stresses that affect both
the mechanism and the times to failure, are known and
understood.
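When the mechanism is known to be thermally activated, one common way to relate test time to field time is an Arrhenius acceleration factor. The Python sketch below assumes Arrhenius behaviour, a hypothetical activation energy of 0.7 eV, and illustrative use and test temperatures; none of these values come from the text, and the result is meaningful only if the higher test temperature does not change the failure mechanism, which is the condition stated above.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev: float, t_use_c: float, t_test_c: float) -> float:
    """Acceleration factor between test and use temperatures (Arrhenius model).

    ea_ev is an assumed activation energy for the mechanism under study.
    """
    t_use_k = t_use_c + 273.15
    t_test_k = t_test_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_test_k))


def field_time_from_test(test_hours: float, ea_ev: float,
                         t_use_c: float, t_test_c: float) -> float:
    # Field time to failure = test time to failure x acceleration factor.
    return test_hours * arrhenius_af(ea_ev, t_use_c, t_test_c)


af = arrhenius_af(ea_ev=0.7, t_use_c=55.0, t_test_c=125.0)
print(f"AF = {af:.1f}")
print(f"1000 test hours ~ {field_time_from_test(1000.0, 0.7, 55.0, 125.0):.0f} field hours")
```

The same structure applies to other Table 1 models (for example, power-law models for cyclic loads); only the acceleration function changes.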
Health and usage monitoring of electronics involves the
selection and placement of appropriate sensors into a product
to monitor the loads experienced by the system. The
constraints on physical space and interfaces available for data
collection and transmission limit the number of sensors that
can be integrated into a product. Therefore, a prioritized list
of failure mechanisms and the environmental conditions that
affect them needs to be established to ensure that the
appropriate data is collected and utilized for remaining life
assessment.
3. Failure Modes, Mechanisms and Effects Analysis Methodology
A novel approach, Failure Modes, Mechanisms and
Effects Analysis (FMMEA), is proposed to address
weaknesses in the traditional FMEA and FMECA processes.
FMMEA is a systematic methodology to identify potential
failure mechanisms and models for all potential failure
modes, and to prioritize failure mechanisms. FMMEA
enhances the value of the FMEA and FMECA methods by
identifying high-priority failure mechanisms in order to create
an action plan to mitigate their effects. High-priority failure
mechanisms determine the operational stresses and the
environmental and operational parameters that need to be
controlled. Models for the failure mechanisms help in the
design and development of the product.

Figure 1. FMEA worksheet [15]. (Potential Failure Mode and Effects Analysis (Design FMEA) form. Header fields: system, subsystem, component, FMEA number, prepared by, FMEA date, design lead, key date, revision date, page, core team. Columns: item/function; potential failure mode(s); potential effect(s) of failure; Sev; potential cause(s) of failure; Prob; current design controls; Det; RPN; recommended action(s); responsibility and target completion date; action results: actions taken, new Sev, new Occ, new Det, new RPN.)
FMMEA is based on understanding the relationships
between product requirements and the physical characteristics
of the product (and their variation in the production process),
the interactions of product materials with loads (stresses at
application conditions) and their influence on the product
susceptibility to failure with respect to the use conditions.
This involves identifying the failure mechanisms and
reliability models to quantitatively evaluate the susceptibility
to failure. In addition to the information gathered and used
for FMEA, FMMEA uses life cycle environmental and
operating conditions and the duration of the intended
application with knowledge of the active stresses and
potential failure mechanisms. The steps of the FMMEA
process are summarized in Figure 2, and described in the
following sections.
3.1. System Definition, Elements and Functions
As illustrated in Figure 2, an FMMEA process begins by
defining the system to be analyzed, which is viewed as a
composite of subsystems or levels that are integrated to
achieve a specific objective. The system is divided into
various sub-systems or levels, continuing to the lowest
possible level, which is referred to as component or element.
The system breakdown can be performed either by
function (i.e., according to what the system elements “do”),
by location (i.e., according to where the system elements
“are”), or a combination of both (i.e., functional breakdown
by location, or vice versa). In a printed circuit board, for
example, a location breakdown would include the package,
plated through-hole (PTH), metallization, and the board itself.
For each element, all of the associated functions are listed.
For example, the primary function of a solder joint is to
connect two materials mechanically and electrically. Hence,
failure of a solder joint will relate to its inability to perform as
a physical and electrical interconnection. Further analysis is
conducted on each element thus identified.
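The location breakdown described above can be represented as a small tree in which each element carries its list of functions, with further analysis iterating over the lowest-level elements. This Python sketch is illustrative only: the class name `Element` is hypothetical, and apart from the solder joint's mechanical and electrical interconnection functions, the function strings are assumptions rather than statements from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    """One node of the system breakdown (hypothetical structure)."""

    name: str
    functions: list[str] = field(default_factory=list)
    children: list[Element] = field(default_factory=list)

    def leaves(self) -> list[Element]:
        """Return the lowest-level elements, where detailed analysis starts."""
        if not self.children:
            return [self]
        result: list[Element] = []
        for child in self.children:
            result.extend(child.leaves())
        return result


# Location breakdown of a printed circuit board assembly
# (function strings other than the solder joint's are hypothetical).
assembly = Element("PCB assembly", children=[
    Element("package", ["house and protect the die"]),
    Element("solder joint", ["mechanical interconnection", "electrical interconnection"]),
    Element("plated through-hole (PTH)", ["connect board layers electrically"]),
    Element("metallization", ["carry current between sites"]),
    Element("board", ["support components mechanically"]),
])

# Each (element, function) pair becomes a row to analyze for failure modes.
for element in assembly.leaves():
    for function in element.functions:
        print(f"{element.name}: {function}")
```

A functional breakdown, or a mixed one, uses the same tree; only the grouping of children changes.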
3.2. Potential Failure Modes
For all the elements that have been identified, all possible
failure modes are listed. For example, the potential failure
modes of a solder joint are an open circuit or an intermittent
change in resistance, either of which can hamper its function
as an interconnect. Where information on possible failure
modes is not available, potential failure modes
may be identified using numerical stress analysis, accelerated
tests to failure (e.g., HALT), past experience and engineering
judgment [12]. A potential failure mode at one level may be
the cause of a potential failure mode in a higher level system or
subsystem, or be the effect of one in a lower level component.
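Figure 2. FMMEA methodology. (Flowchart steps: define system and identify elements and functions to be analyzed; identify potential failure modes; identify potential failure causes; identify potential failure mechanisms; identify failure models; prioritize failure mechanisms; document the process. The flowchart also includes a box for identifying life cycle environmental and operating conditions.)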
3.3. Potential Failure Causes
A failure cause is defined as the circumstances during
design, manufacture, or use that lead to a failure mode [12].
For each failure mode, all possible ways a failure can result
are listed. Failure causes are identified by finding the basic
reason that may lead to a failure during design,
manufacturing, storage, transportation or use. The failure
causes can include environmental and operational conditions.
In an automotive underhood environment, for example, solder
joint failure modes such as open and intermittent change in
resistance can be caused by temperature cycling, random
vibration, and shock impact. Knowledge of the potential
failure causes can help identify the failure mechanisms that
drive the failure modes for a given element.
3.4. Potential Failure Mechanisms
Failure mechanisms are determined for each combination
of potential failure mode and cause based on known
mechanisms [16]. Studies on electronic material failure
mechanisms, and the application of physics-based damage
models to the design of reliable electronic products include
[17, 18].
The failure mechanisms identified are categorized as
either overstress or wearout mechanisms. Catastrophic
failures due to a single occurrence of a stress event when the
intrinsic strength of the material is exceeded are termed
overstress failures. Failure mechanisms due to monotonic
accumulation of incremental damage beyond the endurance of
the material are termed wearout mechanisms [12]. When the
damage exceeds the endurance limit of the component, failure
will occur. Unanticipated large stress events can either cause
an overstress (catastrophic) failure, or shorten life by causing
the accumulation of wearout damage. Examples of such
stresses are accidental abuse and acts of God. On the other
hand, in well-designed and high-quality hardware, stresses
should cause only uniform accumulation of wearout damage;
the threshold of damage required to cause eventual failure
should not be reached within the usage life of the product.
Failure mechanisms frequently occurring in electronics
can be classified as electrical performance failures, thermal
performance failures, mechanical performance failures,
radiation failures, and chemical failures. Electrical
performance failures can be caused by individual components
with improper electrical parameters, such as resistance,
impedance, capacitance, or dielectric properties, or by
inadequate shielding from electromagnetic interference (EMI)
or particle radiation. Failure modes can manifest as reversible
drifts in transient and steady-state responses, such as delay
time, rise time, attenuation, signal-to-noise ratio, and
crosstalk. Electrical failures common in electronic hardware
include overstress mechanisms due to electrical overstress
(EOS) and electrostatic discharge (ESD), which in
semiconductor components include dielectric breakdown,
junction breakdown, hot electron injection, surface and bulk
trapping, and surface breakdown, as well as wearout
mechanisms such as electromigration and stress-driven
diffusive voiding.
Thermal performance failures can arise due to poor
optimization of the heat transfer chain in an electronic
assembly. Thermal overstress failures are a result of heating a
component beyond critical temperatures such as the glass-
transition temperature, melting point, fictile point, or flash
point. Some examples of thermal wearout failures are aging
due to depolymerization, intermetallic growth, and
interdiffusion. Failures due to inadequate thermal design can
manifest as components operating at excessive temperature
and causing operational parameters to drift beyond
specifications, although the degradation is often reversible
upon cooling. Such failures can be caused either by direct
thermal loads or by electrical resistive loads, which in turn
generate excessive localized thermal stresses.
Mechanical performance failures include those that may
compromise the product performance without necessarily
causing any irreversible material damage, such as abnormal
elastic deformation in response to mechanical static loads,
abnormal transient response (such as natural frequency or
damping) to dynamic loads, and abnormal time-dependent
reversible (anelastic) response, as well as failures that cause
material damage, such as buckling, brittle and/or ductile
fracture, interfacial separation, fatigue crack initiation and
propagation, creep, and creep rupture. For example,
excessive elastic deformations in slender structures in
electronic packages can sometimes constitute functional
failure due to overstress loads, such as excessive flexing of
interconnection wires, package lids, or flex circuits in
electronic devices, causing shorting and/or excessive
crosstalk. However, when the load is removed, the
deformations (and consequent functional abnormalities)
disappear completely without any permanent damage.
Radiation failures are principally caused by uranium and
thorium contaminants, and secondary cosmic rays. Radiation
can cause wearout, aging, embrittlement of materials, and
overstress soft errors in electronic hardware, such as logic
chips.
Chemical failures occur in adverse chemical environments
that result in corrosion, oxidation, or ionic surface dendritic
growth. There may also be interactions between different
types of stresses. For example, metal migration may be
accelerated in the presence of chemical contaminants and
composition gradients, and thermal loads can accelerate a
failure mechanism due to a thermal expansion mismatch.
3.5. Failure Models
Failure models use appropriate stress and damage analysis
methods to evaluate the susceptibility to failure based on the
time-to-failure or likelihood of a failure for a given geometry,
material construction, and set of environmental and
operational conditions. Table 1 provides a list of failure
models for common failure mechanisms in electronics.
Table 1. Examples of failure mechanisms in electronics, relevant loads, and models [19].
Failure Mechanism                   | Failure Sites                                                                     | Relevant Loads                       | Sample Model
Fatigue                             | Die attach, Wirebond/TAB, Solder leads, Bond pads, Traces, Vias/PTHs, Interfaces | ΔT, Tmean, dT/dt, dwell time, ΔH, ΔV | Nonlinear Power Law (Coffin-Manson)
Corrosion                           | Metallizations                                                                    | M, V, T                              | Eyring (Howard)
Electromigration                    | Metallizations                                                                    | T, J                                 | Eyring (Black)
Conductive Filament Formation       | Between Metallizations                                                            | M, ∇V                                | Power Law (Rudra)
Stress Driven Diffusion Voiding     | Metal Traces                                                                      | s, T                                 | Eyring (Okabayashi)
Time Dependent Dielectric Breakdown | Dielectric layers                                                                 | V, T                                 | Arrhenius (Fowler-Nordheim)
Δ: Cyclic range; ∇: Gradient; V: Voltage; T: Temperature; s: Stress; M: Moisture; J: Current density; H: Humidity

Failure models for overstress mechanisms use stress analysis
to estimate the likelihood of a failure based on a single
exposure to a defined stress condition. The simplest
formulation for an overstress model is the comparison of an
induced stress versus the strength of the material that must
sustain the stress.
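The stress-versus-strength comparison, together with the occurrence rule for overstress mechanisms given in Section 3.6 (the highest rating if failure is precipitated, the lowest otherwise), can be sketched in a few lines of Python. The function names are hypothetical. The usage line reuses the glass-transition check from the automotive example (laminate limit of 150ºC versus a 121ºC maximum ambient), treating a temperature limit as a "strength", which is a simplification.

```python
def overstress_fails(induced_stress: float, strength: float) -> bool:
    """Simplest overstress model: failure if the induced stress exceeds the strength.

    Both arguments must be expressed in the same units.
    """
    return induced_stress > strength


def overstress_occurrence(induced_stress: float, strength: float) -> str:
    # Overstress mechanisms receive only the two extreme occurrence ratings.
    if overstress_fails(induced_stress, strength):
        return "Frequent"
    return "Extremely unlikely"


# Laminate glass transition (limit 150 C) versus maximum ambient temperature (121 C).
print(overstress_occurrence(induced_stress=121.0, strength=150.0))  # Extremely unlikely
```

In practice the induced stress would come from a stress analysis of the worst-case load, and a safety margin could be applied before comparing.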
In the case of wearout failures, damage is accumulated
over a period until the item is no longer able to withstand the
applied load. Wearout mechanisms are analyzed using both
stress and damage analysis to calculate the time required to
induce failure based on a defined stress condition. An
appropriate method for combining multiple conditions must
be determined for assessing the time to failure. Sometimes,
the damage due to the individual loading conditions may be
analyzed separately, and the failure assessment results may be
combined in a cumulative manner [4].
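One widely used way to combine damage from separately analyzed loading conditions is the Palmgren-Miner linear damage rule: each condition contributes the fraction of life it consumes, and failure is predicted when the fractions sum to one. The text does not state which cumulative method [4] uses, so this Python sketch is an assumption, and the cycle counts and cycles-to-failure values are illustrative only.

```python
def miner_damage(applied_cycles: dict[str, float],
                 failure_cycles: dict[str, float]) -> float:
    """Palmgren-Miner linear damage sum D = sum(n_i / N_i) over loading conditions."""
    return sum(n / failure_cycles[load] for load, n in applied_cycles.items())


def life_in_blocks(applied_cycles: dict[str, float],
                   failure_cycles: dict[str, float]) -> float:
    """Number of repeated loading blocks until D reaches 1 (predicted failure)."""
    return 1.0 / miner_damage(applied_cycles, failure_cycles)


# One year of hypothetical loading on a single solder joint, and the cycles to
# failure each condition would cause if applied alone (both illustrative).
yearly_cycles = {"thermal cycling": 730.0, "vibration": 2.0e6}
failure_cycles = {"thermal cycling": 7300.0, "vibration": 4.0e7}

d = miner_damage(yearly_cycles, failure_cycles)
years = life_in_blocks(yearly_cycles, failure_cycles)
print(f"damage per year = {d:.2f}, predicted life = {years:.1f} years")
# damage per year = 0.15, predicted life = 6.7 years
```

Linear summation ignores load-sequence and interaction effects, which is one reason the text notes that combining results for multiple stress conditions can limit the analysis.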
Analysis of the system’s susceptibility to failure may be
limited by the availability and accuracy of the models used for
quantifying the time to failure of the system. It may also be
limited by the ability to combine the results of multiple failure
models for a single failure site and the ability to combine the
results of a single model for multiple stress conditions [12].
If no failure model is available, the appropriate parameter(s)
to monitor can be selected based on an empirical model
developed from prior field failure data or models derived
from accelerated testing.
3.6. Failure Mechanism Prioritization
Ideally, all failure mechanisms and their interactions should
be considered for product design and analysis. In the life
cycle of a product, several failure mechanisms may be
activated by different environmental and operational
parameters acting at various stress levels, but in general only
a few operational and environmental parameters, and failure
mechanisms, are responsible for the majority of the failures.
High priority failure mechanisms determine the operational
stresses, and environmental and operational parameters, that
must be accounted for in the design or be controlled.
Prioritization of the failure mechanisms provides an
opportunity for effective utilization of resources. The
methodology for failure mechanism prioritization is shown in
Figure 3.
Initial prioritization of all potential failure mechanisms is
based upon environmental and operating conditions. If the
stress levels generated by certain operational and
environmental conditions are non-existent or negligible, the
failure mechanisms that are exclusively dependent on those
environmental and operating conditions are assigned a “low”
risk level and are eliminated from further consideration.
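The initial screening rule can be sketched as a filter: a mechanism is set to "low" risk and dropped only if every load it depends on is negligible in the life cycle profile. In this Python sketch the mechanism-to-load mapping is illustrative (loosely following Table 1), and the set of negligible loads and the function name are hypothetical; the EOS/ESD entry mirrors the kind of screening applied in the automotive example.

```python
# Loads that drive each mechanism (illustrative, loosely following Table 1).
MECHANISM_LOADS = {
    "solder joint fatigue": {"temperature cycling", "vibration"},
    "electromigration": {"temperature", "current density"},
    "corrosion": {"moisture", "voltage", "temperature"},
    "EOS/ESD damage": {"electrical overstress", "electrostatic discharge"},
}


def initial_prioritization(mechanism_loads: dict[str, set[str]],
                           negligible_loads: set[str]) -> tuple[list[str], list[str]]:
    """Split mechanisms into (retained, low_risk).

    A mechanism is assigned low risk only if it depends exclusively on
    loads that are negligible in the life cycle profile.
    """
    retained: list[str] = []
    low_risk: list[str] = []
    for mechanism, loads in mechanism_loads.items():
        if loads <= negligible_loads:  # set inclusion: all driving loads negligible
            low_risk.append(mechanism)
        else:
            retained.append(mechanism)
    return retained, low_risk


retained, low_risk = initial_prioritization(
    MECHANISM_LOADS,
    negligible_loads={"electrical overstress", "electrostatic discharge", "current density"},
)
print(low_risk)  # ['EOS/ESD damage']
print(retained)  # ['solder joint fatigue', 'electromigration', 'corrosion']
```

Electromigration stays in the list even though current density is negligible here, because it also depends on temperature, which is not.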
For all the failure mechanisms remaining after the initial
prioritization, the susceptibility to failure by those
mechanisms is evaluated using the identified failure models
when such models are available. For overstress mechanisms,
the susceptibility to failure is evaluated by conducting a stress
analysis to determine if failure is precipitated under the given
environmental and operating conditions. For wearout
mechanisms, the susceptibility to failure is evaluated by
determining the time-to-failure under the given environmental
and operating conditions. To determine the combined effect
of all wearout failures, the overall time-to-failure is evaluated
with all wearout mechanisms acting simultaneously. In cases
where no failure model is available, the evaluation is based on
past experience, manufacturer data, or handbooks.

Figure 3. Failure mechanism prioritization. (Flowchart: potential failure mechanisms; initial prioritization; evaluate failure susceptibility; evaluate occurrence; evaluate severity; final prioritization; high, medium, or low risk. The flowchart also contains a box labeled RPN.)
After evaluation of the susceptibility to failure, occurrence
ratings are assigned to the failure mechanisms for the
environmental and operating conditions experienced by the
system. The occurrence ratings are defined in Table 2. For
overstress failure mechanisms, the highest occurrence rating,
namely “frequent”, is assigned to mechanisms that actually
precipitate failure, and the lowest occurrence rating, namely
“extremely unlikely”, is assigned to overstress mechanisms
that do not precipitate any failure. For wearout failure
mechanisms, the ratings are assigned based on a comparison
of the individual time-to-failure for a given wearout
mechanism, with the overall time-to-failure, expected product
life, past experience and engineering judgment. “Frequent”,
“reasonably probable”, “occasional”, “remote” and
“extremely unlikely” ratings are assigned to wearout failure
mechanisms with very low, low, moderate, high, and very
high TTF, respectively.
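The wearout rating rule can be sketched by first combining the individual times to failure and then rating each mechanism by how its own TTF compares with the overall value. Two assumptions in this Python sketch are not from the text: the mechanisms are taken to accumulate damage linearly at the same site, so the combined TTF is 1 / sum(1/TTF_i), and the ratio thresholds separating the five ratings are hypothetical. The text also weighs expected product life, past experience and engineering judgment, which the sketch omits.

```python
def combined_ttf(ttfs: dict[str, float]) -> float:
    # Assumed linear damage accumulation: the damage rates 1/TTF_i add up.
    return 1.0 / sum(1.0 / t for t in ttfs.values())


# Hypothetical upper bounds on TTF_i / overall TTF, checked from lowest upward.
RATING_THRESHOLDS = [
    (2.0, "Frequent"),
    (5.0, "Reasonably probable"),
    (20.0, "Occasional"),
    (100.0, "Remote"),
]


def wearout_occurrence(ttf: float, overall: float) -> str:
    ratio = ttf / overall
    for limit, rating in RATING_THRESHOLDS:
        if ratio < limit:
            return rating
    return "Extremely unlikely"


# Illustrative individual times to failure, in years.
ttfs_years = {"solder joint fatigue": 4.0, "PTH fatigue": 60.0, "electromigration": 500.0}
overall = combined_ttf(ttfs_years)
print(f"overall TTF = {overall:.2f} years")
for mechanism, ttf in ttfs_years.items():
    print(f"{mechanism}: {wearout_occurrence(ttf, overall)}")
```

With these numbers the mechanisms come out as "Frequent", "Occasional" and "Extremely unlikely"; different thresholds would shift the boundaries.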
To provide a qualitative measure of the impact of the
failures, each failure mechanism is assigned a severity rating.
The impact of the failure is first assessed at the lowest level
of the system being analyzed, followed by the immediately
higher level, and the other intermediate levels, up to system
level [9]. The severity ratings are defined in Table 3. Their
assignment is primarily based on the impact of the failure
mechanism on safety and on the end system functionality.
Past experience and engineering judgment may also be used
in assigning severity ratings. In rating the severity of a
failure, the worst-case consequence is assumed for
the failure mechanism considered.
Table 2. Occurrence ratings.
Rating              | Criteria
Frequent            | Overstress failure or very low TTF
Reasonably Probable | Low TTF
Occasional          | Moderate TTF
Remote              | High TTF
Extremely Unlikely  | No overstress failure or very high TTF
Table 3. Severity ratings.
Rating                    | Criteria
Very high or catastrophic | System failure or safety-related catastrophic failure
High                      | Loss of function or severe injury
Moderate or significant   | Gradual performance degradation or minor injury
Low or minor              | System operable at reduced performance or no injury
Very low or none          | Minor nuisance
Table 4. Risk matrix.
SEVERITY \ OCCURRENCE     | Frequent      | Reasonably Probable | Occasional    | Remote        | Extremely Unlikely
Very high or catastrophic | High risk     | High risk           | High risk     | Moderate risk | Moderate risk
High                      | High risk     | High risk           | Moderate risk | Moderate risk | Low risk
Moderate or significant   | High risk     | Moderate risk       | Moderate risk | Low risk      | Low risk
Low or minor              | High risk     | Moderate risk       | Low risk      | Low risk      | Low risk
Very low or none          | Moderate risk | Moderate risk       | Low risk      | Low risk      | Low risk
A “very high or catastrophic” severity rating is assigned to
a failure mode that may involve loss of life or complete
failure of the system. A “high” severity rating is assigned to a
failure mode that might cause severe injury or a loss of
function of the system. A “moderate or significant” rating is
assigned to failure modes that may cause minor injury or
gradual degradation in performance over time through loss of
availability. A “low or minor” rating is assigned to a failure
mode that causes no injury or results in the system
operating at reduced performance. A “very low or none”
severity rating is associated with a failure that does not cause
any injury and has no impact on the system, or may be a
minor nuisance.
The final prioritization of the failure mechanisms is
performed by rating the failure mechanisms according to three
risk levels, namely “low”, “moderate” and “high”, using the
risk matrix presented in Table 4. In principle, all failure
mechanisms with a “high risk” level are high priority
mechanisms that need to be accounted for and controlled.
Further prioritization within a given risk level may be
performed depending on the product type, use conditions, or
needs and objectives of the organization.
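The risk matrix of Table 4 reduces to a lookup keyed by severity and occurrence, followed by a sort on risk level. This Python sketch transcribes Table 4; the function names are illustrative. The PTH and electromigration ratings in the usage example are those listed in Table 5 for the automotive assembly, while the third mechanism is hypothetical.

```python
OCCURRENCE = ["Frequent", "Reasonably probable", "Occasional", "Remote", "Extremely unlikely"]

# Table 4, one row per severity rating; columns follow OCCURRENCE.
RISK_MATRIX = {
    "Very high or catastrophic": ["High", "High", "High", "Moderate", "Moderate"],
    "High": ["High", "High", "Moderate", "Moderate", "Low"],
    "Moderate or significant": ["High", "Moderate", "Moderate", "Low", "Low"],
    "Low or minor": ["High", "Moderate", "Low", "Low", "Low"],
    "Very low or none": ["Moderate", "Moderate", "Low", "Low", "Low"],
}

RISK_ORDER = {"High": 0, "Moderate": 1, "Low": 2}


def risk_level(severity: str, occurrence: str) -> str:
    """Look up the risk level for a (severity, occurrence) pair in Table 4."""
    return RISK_MATRIX[severity][OCCURRENCE.index(occurrence)]


def final_prioritization(ratings: dict[str, tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (mechanism, risk) pairs sorted from high to low risk."""
    risks = [(name, risk_level(sev, occ)) for name, (sev, occ) in ratings.items()]
    return sorted(risks, key=lambda pair: RISK_ORDER[pair[1]])


ratings = {
    "PTH fatigue": ("Very low or none", "Remote"),
    "metallization electromigration": ("Very high or catastrophic", "Remote"),
    "hypothetical mechanism": ("Very high or catastrophic", "Occasional"),
}
for name, risk in final_prioritization(ratings):
    print(f"{name}: {risk} risk")
```

Because `sorted` is stable, mechanisms within one risk level keep their input order; any organization-specific ranking within a level would replace the sort key.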
3.7. Documentation
The FMMEA process facilitates the organization,
distribution, and analysis of failure data. In addition,
FMMEA documents the corrective actions considered
and implemented based on the results of the FMMEA. After
corrective actions are implemented, the FMMEA can be
maintained and updated to generate a new list of high priority
failure mechanisms.
For products already developed and manufactured, a root-
cause analysis of failures that occur during testing and usage
may be conducted, and corrective actions taken to eliminate
or reduce the impacts of the failures. The documented history
and lessons learned provide a framework for FMMEA of
future products or future product versions.
3.8. Application of FMMEA to an Electronic Assembly
A printed circuit board (PCB) assembly used in an
automotive application was selected to demonstrate the
FMMEA process. This assembly consisted of an FR-4 PCB
with copper metallization, plated through-holes (PTH), and
eight surface mount inductors that were soldered onto the
PCB pads using 63Sn-37Pb solder. This assembly was
mounted in the engine compartment of a 1997 Toyota
4Runner, and was mechanically connected to the
compartment at all four PCB corners. Assembly failure
was defined as one that would result in breakdown, or no
current passage in the event detector circuit. To detect
failure, the PTHs were solder filled and an event detector
circuit was connected in series with all inductors through the
PTHs. The assembly was powered independently from the
automobile electrical system using a three-volt battery source.
It was verified that no external high current, voltage, magnetic
or radiation sources significantly affected the assembly.
The FMMEA worksheet for this application is shown in
Table 5, which details the system elements, failure modes,
causes and mechanisms, models, susceptibility, occurrence,
severity and risk. For all of the elements listed, the
corresponding functions, the potential failure modes and their
physical locations were identified. For example, for solder
joint interconnections the potential failure modes are open
and intermittent change in resistance.
For demonstration purposes, it was assumed that the board,
its components, and the failure test apparatus were defect-
free. This is a valid assumption if proper screening is
conducted after manufacture. It was also assumed that the
assembly was not damaged after manufacture.
Based on these assumptions, potential failure causes were
identified for the failure modes identified. For example, for
the solder joint interconnections, potential failure causes for
open and intermittent change in resistance are temperature
cycling, random vibration or sudden shock impact caused by
vehicle collision.
Based on the potential failure causes that were associated
with the failure modes, the corresponding failure mechanisms
were identified. For the solder joints, for example, the
mechanisms driving an open circuit and intermittent change in
resistance were solder joint fatigue and fracture.
Appropriate failure models were identified for the failure
mechanisms from the literature. Product geometry was
obtained from design specification, board layout drawing and
component manufacturer data sheets. For example, for solder
joint fatigue, the Coffin-Manson model [20] was used for stress
and damage analysis under temperature cycling.
After all potential failure modes, causes, mechanisms and
models were identified for each element, an initial
prioritization of the failure mechanisms was made based on
the life cycle environmental and operating conditions.
Temperature, vibration and humidity conditions were based
on estimates provided by the Society of Automotive
Engineers (SAE) environmental handbook for automotive
underhood environments [21], as no corresponding
manufacturer field data were available for automotive
underhood environments in the Washington DC area. The
SAE handbook specifies a maximum ambient temperature of
121°C, and a maximum relative humidity of 98% at 38°C, for
automotive underhood environments [21]. The average daily
maximum and minimum temperature in the Washington DC
area over the duration of the study were 27ºC and 16ºC,
respectively [22]. The maximum shock level was assumed to
be 45G for 3 milliseconds. The car was assumed to operate
an average of three hours per day, divided into two trips
of equal duration in the Washington, DC area. Failures
induced by electrical overstress (EOS) and electrostatic
discharge (ESD) were assigned a “low” risk level for the test
assembly under analysis, considering both the absence of
active devices and the low voltage supplied by the battery.
Electromagnetic interference (EMI) was also assigned a
“low” risk level as the circuit function was not susceptible to
transients.
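For solder joint fatigue under temperature cycling, a simplified Coffin-Manson style power law relates cycles to failure to the cyclic temperature range, and the usage profile converts cycles into calendar time. This Python sketch is not the calcePWA calculation used in the study: it assumes the form N_f = C·ΔT^(-n) with hypothetical constants C and n, assumes that each of the two daily trips produces one thermal cycle, and uses illustrative ΔT values rather than the underhood temperatures above.

```python
def cycles_to_failure(delta_t_c: float, c: float, n: float) -> float:
    """Simplified Coffin-Manson power law: N_f = C * (delta T)^(-n)."""
    if delta_t_c <= 0:
        raise ValueError("temperature range must be positive")
    return c * delta_t_c ** (-n)


def years_to_failure(delta_t_c: float, c: float, n: float,
                     cycles_per_day: float = 2.0) -> float:
    # Assumption: two trips per day, each producing one thermal cycle.
    return cycles_to_failure(delta_t_c, c, n) / (cycles_per_day * 365.0)


# Hypothetical fit constants; real values depend on solder, package and board.
C, N_EXP = 1.0e7, 2.0
for delta_t in (40.0, 60.0, 80.0):
    print(f"dT = {delta_t:.0f} C: {years_to_failure(delta_t, C, N_EXP):.1f} years")
```

Doubling ΔT with n = 2 cuts the predicted life by a factor of four, which is why the temperature range in the life cycle profile dominates this mechanism.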
The occurrence ratings of the wearout failure mechanisms
were determined by comparing the time-to-failure of each
wearout mechanism, with the overall time-to-failure obtained
with all wearout mechanisms acting simultaneously. The times
to failure were calculated using calcePWA¹. In the absence of a
failure model for inductor insulation wearout, the
occurrence rating was derived from inductor failure rate data
published by Telcordia [23]. From prior knowledge of PCB
pad fatigue, this mechanism was assigned a “remote”
occurrence rating. CalcePWA predictions indicated that a
shock level of 45G for 3 ms would produce no interconnect or
board failure. Shock failure mechanisms were therefore
assigned an “extremely unlikely” occurrence rating. Since no
shock failure was expected to affect the board and second
level interconnections, it was assumed that shock would not
cause pad failure either, and this pad failure mechanism was
also assigned an “extremely unlikely” rating. As the board
laminate glass transition temperature, namely 150ºC,
exceeded the estimated maximum ambient air temperature,
121ºC [21], glass transition was assigned an “extremely
unlikely” rating.
In terms of severity rating, as the PTHs were only used as
terminations for the inductors, a short or open PTH would
have no impact on circuit functionality. Consequently, this
failure mechanism was assigned a “very low” severity rating.
For all other elements, any failure mode would impact circuit
functionality. Hence, all failure modes for all other elements
were assigned a “very high” severity rating.
Using the risk matrix presented in Table 4, of all failure
mechanisms considered, solder joint fatigue due to thermal
cycling and vibration were the only mechanisms that were
associated with a high risk and thus were considered as high
priority. This was confirmed by the corresponding field
experiment, where the board assembly failed by combined
solder joint thermal and vibrational fatigue [24].
4. Summary
FMMEA allows the design team to take into account the
available scientific knowledge of failure mechanisms and
merge it with the systematic features of the FMEA
template, in keeping with the “design for reliability”
philosophy. The FMEA elements incorporated in FMMEA
keep the identification process systematic, so that all
elements are considered and nothing is
overlooked. The idea of prioritization embedded in the
FMEA process is also utilized in FMMEA to identify the
mechanisms that are likely to cause failures during the
product life cycle.
FMMEA differs from FMEA in a few respects. In
FMEA, potential failure modes are examined individually and
the combined effects of coexisting failure causes are not
considered. FMMEA on the other hand considers the impact
of failure mechanisms acting simultaneously. FMEA
involves precipitation and detection of failure for updating
and calculating the Risk Priority Number (RPN), and cannot
be applied in cases that involve a continuous monitoring of
performance degradation over time. By contrast, FMMEA
does not require the failure to be precipitated and detected,
and the uncertainties associated with the detection estimation
are not present. The use of environmental and operating
conditions is not made at a quantitative level in FMEA.
Consequently, at best these conditions are used to eliminate
certain failure modes. FMMEA prioritizes the failure
mechanisms using information on stress levels of
environmental and operating conditions to identify high
priority mechanisms that must be accounted for in the design
or be controlled. This prioritization overcomes the
shortcomings of the RPN prioritization used in FMEA, which
can provide a false sense of granularity. Thus the use of
FMMEA provides additional quantitative information
regarding product reliability, and opportunities for
improvement, as it takes into account specific failure
mechanisms and the stress levels induced by environmental
and operating conditions in the analysis process.
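The granularity problem with the RPN can be seen in a short calculation. The sketch below assumes the common 1-10 ranking scale for severity, occurrence and detection (a higher detection ranking meaning the failure is harder to detect); the example rankings are illustrative and are not taken from this study.

```python
from collections import Counter
from itertools import product


def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk priority number: the product of the three rankings."""
    return severity * occurrence * detection


# A catastrophic but rare, fairly detectable failure...
critical_rare = rpn(severity=10, occurrence=2, detection=3)
# ...scores the same as a minor, moderately frequent, hard-to-detect one.
minor_frequent = rpn(severity=2, occurrence=5, detection=6)
print(critical_rare, minor_frequent)  # 60 60

# Over all 1000 combinations of 1-10 rankings, count the distinct RPN values.
values = Counter(rpn(s, o, d) for s, o, d in product(range(1, 11), repeat=3))
print(len(values), "distinct RPNs from", sum(values.values()), "combinations")
```

Identical RPNs for very different risk profiles, and many fewer distinct values than combinations, are what make RPN rankings look finer-grained than they are.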
¹ A physics-of-failure-based virtual reliability assessment tool developed by the CALCE Electronic Products and Systems Center, University of Maryland.
Table 5. FMMEA worksheet for a printed circuit board assembly mounted in an automotive underhood environment.

| Element | Potential failure mode | Potential failure cause | Potential failure mechanism | Mechanism type | Failure model | Failure susceptibility | Occurrence | Severity | Risk |
|---|---|---|---|---|---|---|---|---|---|
| PTH | Electrical open in PTH | Temperature cycling | Fatigue | Wearout | CALCE PTH barrel thermal fatigue [25] | > 10 years | Remote | Very low | Low |
| Metallization | Electrical short / open, or change in electrical resistance | High temperature | Electromigration | Wearout | Black [26] | > 10 years | Remote | Very high | Moderate |
| Metallization | Electrical short / open, or change in electrical resistance | High relative humidity; ionic contamination | Corrosion | Wearout | Howard [27] | > 10 years | Remote | Very high | Moderate |
| Component (inductor) | Electrical short / open between windings and core | High temperature | Wearout of winding insulation | Wearout | No model | --- | Remote* | Very high | Moderate |
| Interconnect | Electrical open, or intermittent change in electrical resistance | Temperature cycling | Fatigue | Wearout | Coffin-Manson [20] | 170 days | Frequent | Very high | High |
| Interconnect | Electrical open, or intermittent change in electrical resistance | Random vibration | Fatigue | Wearout | Steinberg [28] | 43 days | Frequent | Very high | High |
| Interconnect | Electrical open, or intermittent change in electrical resistance | Sudden impact | Shock | Overstress | Steinberg [28] | No failure | Extremely unlikely | Very high | Moderate |
| PCB | Electrical short between PTHs | High relative humidity | CFF | Wearout | Rudra and Pecht [29] | 4.6 years | Occasional | Very low | Low |
| PCB | Crack / fracture | Random vibration | Fatigue | Wearout | Basquin [28] | > 10 years | Remote | Very high | Moderate |
| PCB | Crack / fracture | Sudden impact | Shock | Overstress | Steinberg [28] | No failure | Extremely unlikely | Very high | Moderate |
| PCB | Loss of polymer strength | High temperature | Glass transition | Overstress | No model | No failure | Extremely unlikely | Very high | Moderate |
| PCB | Open | Discharge of high voltage through dielectric material | EOS/ESD | Overstress | No model | Eliminated in first-level prioritization | | | Low |
| PCB | Excessive noise | Proximity to high current or magnetic source | EMI | Overstress | No model | Eliminated in first-level prioritization | | | Low |
| Pad | Lift / crack | Temperature cycling / random vibration | Fatigue | Wearout | No model | --- | Remote | Very high | Moderate |
| Pad | Lift / crack | Sudden impact | Shock | Overstress | --- | | Extremely unlikely | Very high | Moderate |

* Based on failure rate data for inductors from Telcordia [23].
FMMEA has the potential to offer several benefits to organizations. It provides specific information on stress conditions so that acceptance and qualification tests yield usable results. Using failure models at the development stage of a product also enables “what-if” analysis of proposed technology upgrades.
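As an illustration of such a what-if analysis, consider the temperature-cycling fatigue of the interconnects in Table 5, which was assessed with a Coffin-Manson model. The sketch below assumes a simplified Coffin-Manson relation in which cycles to failure scale as (ΔT)^-n, where ΔT is the temperature swing and n is a material-dependent exponent; the default n = 2.0 and the temperature swings are hypothetical values, not data from this study. It also assumes the cycling rate is unchanged, so that life in days scales like cycles to failure, and it ignores the frequency and peak-temperature terms found in fuller models.

```python
def scaled_life(baseline_life: float, delta_t_old: float, delta_t_new: float,
                n: float = 2.0) -> float:
    """Rescale fatigue life when the thermal-cycling temperature swing changes.

    Assumes a simplified Coffin-Manson relation, cycles to failure
    proportional to delta_t ** -n, and an unchanged cycling rate, so life in
    days scales like cycles to failure. The default n = 2.0 is an assumed,
    material-dependent exponent.
    """
    if delta_t_old <= 0 or delta_t_new <= 0:
        raise ValueError("temperature swings must be positive")
    return baseline_life * (delta_t_old / delta_t_new) ** n


# Hypothetical what-if: a proposed design change halves the temperature swing.
# The 170-day baseline is the Table 5 interconnect estimate; the swings
# (80 and 40 degrees C) are illustrative values only.
print(round(scaled_life(170.0, delta_t_old=80.0, delta_t_new=40.0)))  # 680
```

The same pattern applies to the other models in Table 5: keep the calibrated baseline, change only the stress that the proposed upgrade affects, and compare the resulting life estimates.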
FMMEA can also support several design and development steps regarded as best practices, which can be performed, or performed well, only with knowledge of failure mechanisms and models. These steps include virtual qualification, accelerated testing, root cause analysis, life consumption monitoring and prognostics. Adopting FMMEA allows the technological and economic benefits of these practices to be realized more fully.
FMMEA enhances the value of FMEA by identifying and evaluating the relevant failure mechanisms and models using the stress levels of environmental and operating conditions, and it provides a high return on investment by supplying knowledge about possible failures and their causes in a quantifiable manner. While FMEA and FMECA are often implemented as a standard requirement or contractual obligation, FMMEA makes the process useful by incorporating scientific knowledge of failure mechanisms and models.
5. References
1. Coutinho, J. S., “Failure-Effect Analysis,” Trans. New York Academy of Sciences, Vol. 26, 1964, pp. 564-585.
2. Bowles, J.B., “Fundamentals of Failure Modes and Effects
Analysis,” Tutorial Notes Annual Reliability and
Maintainability Symposium, 2003.
3. Kara-Zaitri, C., Keller, A.Z., Fleming, P.V., “A Smart Failure Mode and Effect Analysis Package,” Annual Reliability and Maintainability Symposium Proceedings, pp. 414-421, 1992.
4. “Guidelines for Failure Mode and Effects Analysis for
Automotive, Aerospace and General Manufacturing
Industries,” Dyadem Press, Ontario, Canada, 2003.
5. Electronic Industries Association, “Failure Mode and Effect Analyses,” Electronic Industries Association G-41 Committee on Reliability, Reliability Bulletin No. 9, November 1971.
6. United States Department of Defense, “Procedures For Performing A Failure Mode Effects and Criticality Analysis,” US Mil-Std-1629 (ships), November 1, 1974; US Mil-Std-1629A, November 24, 1980; US Mil-Std-1629A/Notice 2, November 28, 1984.
7. Bowles, J.B. and Bonnell, R.D., “Failure Modes, Effects and
Criticality Analysis – What Is It and How To Use It,”
Tutorial Notes Annual Reliability and Maintainability
Symposium, 1998.
8. International Electrotechnical Commission, “Analysis Techniques for System Reliability - Procedure for Failure Mode and Effects Analysis (FMEA),” International Electrotechnical Commission, IEC Standard Pub. 812, 1985.
9. SAE Standard SAE J1739, “Potential Failure Mode and Effects Analysis in Design (Design FMEA) and Potential Failure Mode and Effects Analysis in Manufacturing and Assembly Processes (Process FMEA) and Effects Analysis for Machinery (Machinery FMEA),” August 2002.
10. ISO, “ISO/TS 16949 - The Harmonized Standard for the
Automotive Supply Chain,” ISO, 2002.
11. Signor, M.C., “The Failure-Analysis Matrix: a Kinder,
Gentler Alternative to FMEA for Information Systems,”
Annual Reliability and Maintainability Symposium
Proceedings, pp. 173-177, January 2002.
12. IEEE Standard 1413.1-2002, IEEE Guide for Selecting and
Using Reliability Predictions Based on IEEE 1413, 2003.
13. JEDEC Publication JEP 131 “Process Failure Modes and
Effects Analysis (FMEA),” February 1998.
14. Hu, J., Barker, D., Dasgupta, A., and Arora, A., “Role of
Failure-mechanism Identification in Accelerated Testing,”
Journal of the IES, Vol. 36, No. 4, pp. 39-45, July 1993.
15. “Failure Modes and Effects Analysis (FMEA): A Guide for Continuous Improvement for the Semiconductor Equipment Industry,” Technology Transfer #92020963B-ENG, SEMATECH, 1992.
16. JEDEC Publication JEP 148 “Reliability Qualification of
Semiconductor Devices Based on Physics of Failure Risk and
Opportunity Assessment,” April 2004.
17. Dasgupta, A. and Pecht, M., “Material Failure Mechanisms
and Damage Models,” IEEE Transactions on Reliability, Vol.
40, No. 5, pp. 531-536, December 1991.
18. JEDEC Publication JEP 122-B “Failure Mechanisms and
Models for Semiconductor Devices,” August 2003.
19. Lall, P., Pecht, M., and Hakim, E., “Influence of Temperature on Microelectronics and System Reliability,” CRC Press, New York, 1997.
20. Foucher, B., Boullie, J., Meslet, B., Das, D., “A Review of Reliability Prediction Methods for Electronic Devices,” Microelectronics Reliability, Vol. 42, No. 8, pp. 1155-1162, August 2002.
21. Society of Automotive Engineers, Recommended
Environmental Practices for Electronic Equipment Design,
SAE J1211, Rev. Nov 1978.
22. Monthly Temperature Averages for the Washington, DC Area, <http://www.weather.com/weather/climatology/monthly/USDC0001>, accessed August 17, 2003.
23. Telcordia Technologies, Special Report SR-332: “Reliability
Prediction Procedure for Electronic Equipment Issue 1,”
Telcordia Customer Service, Piscataway, N. J., May 2001.
24. Ramakrishnan, A., and Pecht, M., “A Life Consumption Monitoring Methodology for Electronic Systems,” IEEE Transactions on Components and Packaging Technologies, Vol. 26, No. 3, pp. 625-634, 2003.
25. Bhandarkar, S.M., et al., “Influence of Selected Design Variables on Thermomechanical Stress Distributions in Plated Through Hole Structures,” Transactions of the ASME, Journal of Electronic Packaging, Vol. 114, pp. 8-13, March 1992.
26. Black, J.R., “Physics of Electromigration,” IEEE Proceedings
of International Reliability Physics Symposium, pp. 142-149,
1983.
27. Howard, R.T., “Electrochemical Model for Corrosion of Conductors on Ceramic Substrates,” IEEE Transactions on CHMT, Vol. 4, No. 4, pp. 520-525, December 1981.
28. Steinberg, D.S., “Vibration Analysis for Electronic
Equipment,” 2nd Edition, John Wiley & Sons, 1988.
29. Rudra, A.B., Li, M., Pecht, M., and Jennings, D.,
“Electrochemical Migration in Multichip Modules,” Circuit
World, Vol. 22, No. 1, pp. 67-70, 1995.