International Journal of Electrical Engineering and Technology (IJEET), ISSN 0976 – 6545(Print),
ISSN 0976 – 6553(Online) Volume 5, Issue 5, May (2014), pp. 57-73 © IAEME
INTERMITTENT FAILURES IN HARDWARE AND SOFTWARE
Dr. Michael Pecht, Anwar Mohammed
CALCE Electronic Products and Systems Center, University of Maryland, College Park, MD 20742,
USA
Flextronics, 847 Gibraltar Drive, Milpitas, CA 95035, USA
ABSTRACT
Intermittent failures are a major concern in electronic systems because they are unpredictable
and non-repeatable. They can be very expensive for companies, damage the reputation of a company,
or cause catastrophic damage in safety-critical systems such as nuclear plants. This paper discusses,
both at the hardware and software level, the causes of intermittent failures and the methodology to
diagnose the causes. Mitigation strategies to help reduce the occurrence of these failures are
discussed and new, emerging technologies designed to minimize intermittent failures are also
reviewed. The paper concludes with recommendations designed to minimize the occurrence of
intermittent failures.
1. INTRODUCTION
Intermittent failures are sporadic failures that are not easily repeatable. According to IEEE,
an intermittent failure (IF) can be defined as the failure of an item for a limited period of time,
following which the item recovers its ability to perform its required function without being subjected
to any external corrective action [1]. When a product can no longer perform its designed function
over the intended time frame, it is considered to have failed. When the product manifests a loss of
some of its function or performance characteristics for a limited time, but shows subsequent
recovery, it has experienced intermittent failure. Intermittent failures are hard to replicate because of
their erratic behavioral pattern. Intermittent failures are often called “ghost failures” for the obvious
reason that they come and go, as well as being hard to reproduce on the bench [2].
Therefore, it is more difficult to conduct failure analysis for intermittent failures, understand
their root causes, and isolate their failure sites than it is for permanent failures. An intermittent
failure is not necessarily repeatable; however, it often is [3]. An intermittent failure may lead to
permanent failures in later stages of the life cycle.
During the inspection process in manufacturing, intermittent failures may be reported as
rejected parts with no failure found (NFF). This means that a failure was observed in the system, but
when the device was re-tested, a failure mode could not be identified or the failure could not be
duplicated. This is also known as trouble not identified (TNI), no trouble found (NTF), cannot
duplicate (CND), or retest ok (RTOK) [3]. These failures are hard to identify or replicate, even
though they are recurrent. Many different factors can cause intermittent failures, such as process
variations like a change in the humidity level, manufacturing residuals like solder fluxes and epoxy
bleed outs, radiation, vibration, wear out leading to opens, and voltage and temperature fluctuations
[3]. Such transient causes, seen both in hardware and software, are hard to reproduce and can lead to
negative consequences such as mission aborts and flight and train delays or cancellations. They can
increase system downtime and decrease system availability. A reduction in IF will increase system
availability more than a reduction in failure rate [4].
An intermittent failure can lead to unintended consequences such as increased operation cost,
higher downtime, and a perception of lower quality, especially in sensitive industries such as
aerospace. A system which has failed previous testing and then suddenly starts passing testing,
showing no signs of failure, can erode the trust in the testing methodology [5] and can cause an IF to
be identified as a false alarm even though a real failure exists in the system.
Intermittent failures inflict a heavy toll on companies. During retesting, when a failed part
cannot be validated as a failed part, extra testing must be conducted to identify the failure. These
extra tests impose additional costs. In the case of IFs, since the failures cannot be replicated
consistently, the retest and repair costs are higher than those for permanent failures. This is because
an effective repair cannot be made until the failure is validated. Maintenance can cost time and labor in
an attempt to identify a failure without any success, sometimes resulting in blind replacement of
parts that are suspected of having a defect (without finding any specific problem), which increases
the cost of inventory. For example, in 2001, fighter plane customers spent $10 million to replace
parts that were tested as intermittent failures at the shop level [6]. In another case, in the 1980s, the
thick film integrated (TFI) ignition modules of an automotive company were afflicted by intermittent
failures, leading to a lawsuit settlement by the company [3]. A study carried out in 2005 found that
IFs account for about 63% of the mobile phones returned to the manufacturer, costing the industry
$4.5 billion per year [7]. Kimseng et al. [8] carried out a study on intermittent failures in the
digital electronic cruise control modules made by a manufacturer for various automobiles and found
that 96% of the modules returned to the manufacturer passed the bench tests carried out by the
manufacturer. Kimseng et al. concluded that the bench tests were not representative of the actual
automotive environment, nor was the testing appropriate to assess the original failure.
A holistic approach is helpful to understand and eliminate intermittent failures. This approach
would include better diagnostic capability and efficient mitigation techniques. Therefore, this paper
discusses both hardware and software intermittent failures, including their causes, diagnosis, and
mitigation methodologies. Emerging developments in this technology space are also reviewed to
help formulate better solutions.
2. HARDWARE INTERMITTENT FAILURES
Tentative or temporary hardware malfunctions can cause intermittent failure in electronic
devices. This section describes common hardware components that experience intermittent failures
and their failure mechanisms. The diagnosis and mitigation of hardware intermittent failures is also
examined, and some recent technologies designed to overcome these problems are covered.
2.1 Causes of Failure
Unlike permanent failures with persistent causes, the failure cause in intermittent failures
may no longer exist during testing, because of changes in the working environment. Hardware
intermittent failures can have different root causes, such as mismatched thermal expansion, vibration,
corrosion, and electromigration. In this section, some key intermittent failure causes for hardware
components are investigated.
2.1.1 Wire Bond and Connector Failures
Wire bonds and connectors cause a high percentage of hardware intermittent failures [9].
Some common causes include coefficient of thermal expansion (CTE) mismatch, component wear
out caused by age or repeated usage, and corrosion. For example, the CTE mismatch between the
wire bonds and the copper bonding pads on a PCB can cause intermittent opens and shorts during
temperature excursions. In another example, the contact resistance of a new, tin-plated contact may
be a few milliohms, but after a thousand contact cycles, the resistance can become as high as several
ohms. With more usage, intermittent failures that disappear in the next contact cycle may also occur
[9]. The thermal and mechanical vibrations in the connectors can lead to fretting corrosion, causing
the contact resistance to increase, thus inducing intermittent connection failures [10, 11]. It has been
identified [3] that loose PCB interconnectors and aging connectors and components are some of the
common causes for electronic systems failure. Gibson et al. [12] concluded that over 50% of all
electronic failures are triggered by interconnector related problems. Other common causes are
vibration, stress relaxation, and the movement of the wiring harness generated by the magnetic field
[9]. The following paragraphs describe some of these failures in more detail.
Wire bond related intermittent failure occurs when a poorly connected wire bond temporarily
dislodges because of thermal expansion at temperatures above room temperature. The wire bond
may then return to its normal state once the thermal stress caused by CTE mismatch is removed.
The failure mode in such cases is usually an open circuit. On the other hand, a loose conductive
material floating on the package may connect with a wire bond on another part of the circuit,
resulting in a short circuit. When this floating piece moves away from the failure site, because of
vibration for example, the failure is no longer observed [13]. Loose materials can be detected by
using appropriate screening methods, including X-ray, vibration, and acoustic testing. Screening and
testing methodologies are designed based on the potential causes and effects of the short circuit on
the component performance [14]. Intermittent wire bond failures may also be induced by the
molding process, which can damage wire bonds. This damage is not easily detectable and is attributed
to the weakening and lifting of the gold bond during the molding process on the side of the package
opposite to where the injection molding occurred [15]. Proper molding process control parameters
and effective detection techniques would minimize such intermittent failures.
In a study of military aircraft, Sorensen [16] noted that 50% of all failures
were intermittent failures and that 80% of those were related to solder joints and connector pins. For the
aircraft industry, aging devices will lead to IFs, quite often as a prelude to permanent failures. Many
IFs are the result of the gradual degradation of a component or system. They may initially appear as
small noise fluctuations but could lead to permanent failures. Filho et al. [17] point out that for
continuous monitoring methods, intermittent failures can appear long before open circuits are
detected.
Corrosion can cause electrical degradation of the contact, which is initiated by a galvanic
reaction between two metals within the electrical circuit. Corrosion on electronic parts can result in
either of two scenarios: short circuits or an increase in the electrical resistance of the components.
When corrosion occurs, it is rarely uniform on the affected surfaces, which may result in the
appearance of an intermittent failure. With respect to the contacts of electronic parts, intermittent
failures occur because of frequent connections and disconnections, as seen in the corrosion of copper
connectors that have layers of nickel and gold to protect against wear out. In harsh environments
(high relative humidity and the presence of H2S), formation of the corrosive component Cu2S causes
intermittent failure behaviors [18]. With vibration and temperature fluctuations, this conductive path
can be connected and disconnected, resulting in intermittent failures. Intermittent failures due to
corrosion generally occur in the early stages (the first 50%) of the product life cycle. Intermittent
behavior aggravated by CTE mismatch or vibration generally appears during the later stages (the last
50%) of the life cycle of a product [19]. For example, electrochemical migration, which occurs
between anodes and cathodes (and can be a reason behind IF reports), is a corrosion-related failure
mechanism that forms dendrites between opposite biases and eventually results in short circuits. The
driving forces for this corrosion process are the potential voltage bias, contaminated surfaces (lack of
environmental control), and the fact that the metals that are commonly used (Sn, Pb, Cu and Ag) are
susceptible to corrosion. Since this process is not time induced, the intermittent failures are
manifested early in the product life cycle. Tin whiskering has been identified [20] as another
common cause for intermittent failures. A PCB with a pure tin finish, having non-compressive
internal stress, is known to create tin dendrites that can cause short failures. However, at elevated
temperatures, the dendrites may melt away and repair the short.
2.1.2 Digital Integrated Circuit Failures
Integrated chip devices are being scaled down rapidly. This reduction in size makes digital
integrated circuits more susceptible to permanent and intermittent behavior. Intermittent failure
modes in digital logic integrated circuits (ICs) have been categorized as timing violations,
stuck-at-zero or stuck-at-one failures, intermittent shorts or opens, or electromigration failures [21].
An increase in the resistance of interconnects due to thermal or mechanical loads,
electromigration, or material diffusion, increases the time for signal propagation and leads to a
timing violation [22]. These failures are manifested because of thermal and electrical loads and
signal frequency variations. Kothawade et al. [23] found that timing violation in a processor can be
attributed to multiple factors such as process variations, negative bias temperature instability (NBTI),
temperature fluctuations, hot carrier injection (HCI), and voltage fluctuations. Since timing
violations can be caused by many factors, it is challenging for processor designers to design fault
tolerance mechanisms. Time dependent HCI failures are generally permanent in nature. NBTI
failures caused by AC stress tend to be intermittent failures whereas failures caused by static stress
usually manifest as permanent failures. Within an integrated circuit, the thin oxide layers separating
the adjacent metal traces can also lead to intermittent shorting or opens caused by traces coming in
contact with each other or losing contact. Constantinescu [21] also studied the causes of intermittent
behavior in integrated circuits (ICs). The study attributed voltage fluctuations across ICs as the cause
for oxide layer breakdown. As ICs have become smaller, the thickness of the oxide layers has
decreased. This leads to an increased risk of oxide layer breakdown. When this oxide
layer breaks down, it creates a conducting path, thereby increasing the leakage current. The
introduction of high k dielectrics reduces the rate of oxide breakdown, enabling the use of thinner
dielectrics. However, this can also lead to timing violation failures. Before dielectric breakdown
becomes complete and leads to a permanent failure, there is a stage known as
dielectric soft breakdown; during this stage, a device may exhibit intermittent failures.
Intermittent stuck-at-zero or stuck-at-one failures occur in storage elements. Digital circuits
have two states, 0 or 1, and a fault occurs when a particular signal is tied to either 1 or 0. This
produces a logical error. Pan et al. [24] developed a metric for stuck-at-zero/stuck-at-one to
characterize the vulnerability of a microprocessor to intermittent failures based on its structure.
Experimental results show that the susceptibility varies significantly across different structures, and
the vulnerability of the reorder buffer is much higher than that of the register file. These storage
element intermittent failures have an active time and an inactive time. The active time is the time
during which the failure is in process and causes unexpected behavior, while the inactive time is the
time when the failure does not affect performance. The length of this active time determines how
significantly the failure affects the performance of a microprocessor.
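The active and inactive times described above can be illustrated with a minimal Python sketch. The class IntermittentStuckAt, its fields, and the strictly periodic burst pattern are assumptions made for illustration; real intermittent faults are not periodic, and this is not the metric developed in [24].

```python
from dataclasses import dataclass


@dataclass
class IntermittentStuckAt:
    """Hypothetical fault model: one bit is forced to stuck_value only while active."""
    bit: int              # position of the affected bit
    stuck_value: int      # 0 for stuck-at-zero, 1 for stuck-at-one
    active_cycles: int    # length of each active burst (assumed periodic)
    inactive_cycles: int  # gap between bursts

    def is_active(self, cycle: int) -> bool:
        period = self.active_cycles + self.inactive_cycles
        return cycle % period < self.active_cycles

    def apply(self, word: int, cycle: int) -> int:
        """Return the value actually read from the storage element in this cycle."""
        if not self.is_active(cycle):
            return word
        mask = 1 << self.bit
        return word | mask if self.stuck_value else word & ~mask


def count_corrupted_reads(fault: IntermittentStuckAt, words) -> int:
    """Count the cycles in which the value read differs from the value stored."""
    return sum(1 for cycle, w in enumerate(words) if fault.apply(w, cycle) != w)


stored = [0b0000] * 10
short_burst = IntermittentStuckAt(bit=2, stuck_value=1, active_cycles=1, inactive_cycles=9)
long_burst = IntermittentStuckAt(bit=2, stuck_value=1, active_cycles=5, inactive_cycles=5)
print(count_corrupted_reads(short_burst, stored))  # 1
print(count_corrupted_reads(long_burst, stored))   # 5
```

In this sketch a longer active time corrupts more reads, and a stuck-at-one fault on a bit that already holds 1 corrupts nothing, which is one reason an active fault does not always become a visible error.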
ICs are susceptible to intermittent failures due to electromigration. Electromigration is the
movement of metal atoms caused by momentum transfer from the electrons flowing through a conductor. This movement of atoms can
lead to an open or short circuit failure. In both cases, the failures appear initially as intermittent
failures and end up as permanent failures. As IC technology scales down, the wire widths
are reduced. When current flow is not scaled down proportionally, the current density rises and the ICs become more vulnerable to
electromigration [24].
2.1.3 Component Connection Failures
Another area of concern for intermittent failures is the area of component pins, whether it be
a multi-pin IC, resistor network or a simple two-lead capacitor. Intermittent failures can be caused
by imperfections in the solder process or a fractured lead where the two broken ends are
intermittently making and breaking connections. Once the pin is broken, the failure may show up
during thermal cycling or vibration testing. Remedies for these types of failure include better
methods of attaching larger components, such as a resistor network or large capacitor, to the circuit
card. Studies [25] have shown that intrinsic design flaws and substandard manufacturing processes
such as soldering play a big role in creating intermittent failures.
2.2 Diagnosis
The Failure Modes, Mechanisms, and Effects Analysis (FMMEA) can be used to detect
intermittent and permanent failures in hardware. Mathew et al. [26] have proposed the following
methodology.
Figure 1: FMMEA Methodology [26]
The first two steps identified in Figure 1 are ‘define system and identify elements and
functions to be analyzed’ and ‘identify potential failure modes.’ They are more challenging for
intermittent failures than for permanent failures. This is because, in the case of intermittent failures,
it is difficult to define which system has the failure in a complex system consisting of several
subsystems intermeshed together. A failure in one of the subsystems could affect another subsystem
and result in its failure. Finding the subsystem with the initial failure is challenging, since
intermittent failures are not always detected when the system is tested for faults. Identifying the
correct failure modes is also not easy because of the erratic nature of intermittent failures; this
requires extra work.
Kirkland [27] describes a variety of methods to detect failure modes for intermittent failures
in electronic devices, including signal looping, pattern looping, signal stepping, frequency deviation,
pattern adjustment in critical areas, signal strength variation, current path duplication, measuring
capacitance variations, Vcc adjustments, resistive or impedance rebounce, temperature change
application, and noise dissimilarity testing. Using these methods can help identify failure modes,
such as increased gate delays, degraded signals, increased leakage, and high frequency failures. A
minimum set of conditions (such as voltage drop threshold and temperature variations) needs to be
present to make the failure mode observable.
Another systematic approach for analyzing intermittent failures is employing a cause and
effect diagram, which is also known as a fishbone or Ishikawa diagram. An example is depicted
in Figure 2 [28].
Figure 2: Fishbone diagram for intermittent failures in hardware and software [28]
A cause and effect diagram defines the key failure (also known as key effect) and investigates
the possible causes of each of the effects and offers a list of all the possible causes leading to the
failure. It is an effective method for analyzing failures in complex systems. For example, intermittent
failures in plastic ball grid array packages were analyzed using this method, which narrowed down the possible causes
of failure and finally identified solder joint failure as the main cause of intermittent failure [28].
Steadman et al. [29] developed a test methodology for intermittent faults in aircraft. This method
subjected an avionics system to thermal and vibrational loads, while simultaneously monitoring the
system for faulty components, thus reducing the occurrence of intermittent failures. An improved
approach should include online monitoring of critical avionics components while the system is in
operation. This would reduce the overhead cost incurred by offline monitoring that uses load profiles
that do not accurately replicate the operating conditions. The monitoring of current to detect
intermittent failure has been recommended [30] because normal circuits would carry a significantly
different current load when compared to damaged circuits.
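As a rough illustration of current monitoring, the following Python sketch flags intervals in which a sampled current leaves a tolerance band around its nominal value and later recovers. The function name, the fixed nominal value, and the tolerance are assumptions for this example; they are not taken from [30].

```python
def flag_intermittent_events(samples, nominal, tolerance):
    """Find intervals where the current deviates from nominal and then recovers.

    Returns (events, unresolved): events is a list of (first_index, last_index)
    pairs for deviations that recovered; unresolved is the start index of a
    deviation still present at the end of the record, or None.
    """
    events = []
    start = None
    for i, value in enumerate(samples):
        deviates = abs(value - nominal) > tolerance
        if deviates and start is None:
            start = i                      # deviation begins
        elif not deviates and start is not None:
            events.append((start, i - 1))  # deviation ended: the circuit recovered
            start = None
    return events, start


# Hypothetical current samples in amperes, with two short dropouts.
current = [1.0, 1.02, 0.2, 0.1, 0.98, 1.01, 0.0, 1.0, 1.0]
events, unresolved = flag_intermittent_events(current, nominal=1.0, tolerance=0.1)
print(events)      # [(2, 3), (6, 6)]
print(unresolved)  # None
```

A deviation that never recovers is reported separately because it looks like a permanent failure rather than an intermittent one.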
In 1978, Savir [31] presented a paper on developing a model to detect intermittent failures in
a sequential circuit, which is a type of circuit with memory logic and is found in most digital
systems. He recommends leveraging both deterministic (non-random) and random test
procedures to optimize the probability of IF detection. The intermittent failures are divided into two
major categories: stationary failures (such as loose connections) and transient failures
(such as failures induced by electro-magnetic interference). In sequential circuits, the first
manifestation of an active fault may induce the circuit to enter an incorrect state without producing
an immediate output error. This state change may generate an output error later when the fault has
become inactive. The optimal value of detection probability is obtained by developing a graph of all
the input sequences and determining which sequences lead to intermittent failures.
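The delayed manifestation in sequential circuits can be reproduced with a toy example. The Python sketch below is an assumption-laden illustration, not Savir's model: it simulates a hypothetical modulo-4 counter whose enable input is stuck at zero for a single cycle and compares it with a fault-free run.

```python
def run_counter(inputs, fault_active_cycles=frozenset()):
    """Simulate a modulo-4 counter; the output is 1 only while the state is 3.

    During any cycle listed in fault_active_cycles, the enable input is forced
    to 0 (a hypothetical intermittent stuck-at-zero fault on that input).
    """
    state = 0
    outputs, states = [], []
    for cycle, enable in enumerate(inputs):
        outputs.append(1 if state == 3 else 0)  # output depends on the current state
        if cycle in fault_active_cycles:
            enable = 0                          # fault is active in this cycle only
        state = (state + enable) % 4
        states.append(state)                    # state after the clock edge
    return outputs, states


def first_mismatch(a, b):
    """Index of the first position where two traces differ, or None."""
    return next((i for i, (p, q) in enumerate(zip(a, b)) if p != q), None)


good_out, good_states = run_counter([1] * 6)
bad_out, bad_states = run_counter([1] * 6, fault_active_cycles={1})
print(first_mismatch(good_states, bad_states))  # 1
print(first_mismatch(good_out, bad_out))        # 3
```

The fault is active only in cycle 1, where it corrupts the state without changing the output. The first output error appears in cycle 3, after the fault has already become inactive.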
To detect intermittent failures, a minimum set of conditional requirements is necessary to
manifest the failure [32]. The challenge is in determining the environmental conditions when the
failure occurred and re-creating them. Harsh ambient conditions, such as high humidity and the
presence of halides, can initiate unintended conductive pathways on insulating surfaces. Such a
pathway could eventually become a permanent failure, but it could manifest itself in the earlier
stages as intermittent failure. Figure 3 offers a brief list of potential causes for hardware related
intermittent failures.
Figure 3: List of Causes for Intermittent Failure
Component shifting during solder reflow
Contamination (including oxidation at test sites)
Chemical degradation (including creep corrosion, fretting, whiskers, electron migration, etc.)
Cracked substrates
Damaged circuits
ESD induced
Floating leads (or other conductive pieces)
Ionizing radiation in semiconductors
Insulation oxide layer breakdown
Irregular or altered current path
Loose connections (wire bonds, connectors, etc.)
Magnetic field variations
Materials degradation (aging, chemical, stress, etc.)
Overstress (for example, high voltage on capacitor dielectric)
Partial delamination
PCB (warpage, via cracking, black pad, etc.)
Poor wire bonding (on high K dielectric, etc.)
Temperature sensitivity (CTE mismatch, etc.)
Vibration induced
Voltage overstress
Weak solder joints (varying with temp/stress)
Weak structural integrity
Wire sweep during molding
2.3 Mitigation
Integrated circuits try to compensate for breakdowns by having failure tolerance built into
them. Failure tolerance masks the occurrence of failures from the end user (it prevents end users
from experiencing performance drops). For example, most processors choose a max clock rate after
having guard-banded against unpredictable interactions and variations in the actual clock rate. ICs
also have chip-level failure tolerance, such as error correcting codes, self-checking circuits, and
hardware-implemented check pointing and retries [22].
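To show how an error correcting code can mask a fault from the user, the following Python sketch implements the classic Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. It is offered only as a small, well-known example; the text does not state which codes a given IC uses, and the function names are chosen for this illustration.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit; return (data_bits, error_position or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the flipped bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1         # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1                       # a transient upset flips one stored bit
recovered, position = hamming74_decode(stored)
print(recovered == word, position)   # True 5
```

The reader of the memory sees the correct data even though one stored bit was corrupted. Two simultaneous bit flips in the same codeword are beyond what this particular code can correct.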
Three main methodologies to mitigate the intermittent behavior in ICs are dynamic instruction
delaying, core frequency scaling, and thread migration. When the processor takes longer than the
expected time to execute an operation, delays and timing violations occur. This fault may be avoided
by using techniques such as dynamic instruction delaying. This is a type of algorithm that calculates
the scheduling priorities during the execution of the system. The objective is to respond dynamically
to the changing conditions and form a self-sustained, optimized configuration. Another approach to
mitigating delay is core frequency scaling, which scales down the performance of the CPU to a lower
frequency when less is needed and scales it up to a higher frequency when more is needed. Thread
migration is another technique used to overcome intermittent failure. A thread is an ordered set of
instructions that tells a computer exactly what to do. When a specific thread encounters failures, the
content of the thread within the faulty computer core is transferred to another thread within an idle
core, where the problem is addressed and solved.
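The idea behind thread migration can be sketched in software. In the Python example below, cores are modeled as plain functions, and a task that hits a fault on one core is retried on the next. This is only an analogy under stated assumptions: the names TimingViolation, run_with_migration, faulty_core, and healthy_core are hypothetical, and real processors migrate architectural state in hardware or in the operating system.

```python
class TimingViolation(Exception):
    """Raised by a simulated core that cannot finish a task within its timing budget."""


def run_with_migration(task, cores):
    """Run task on the first core that completes it, moving on after each fault."""
    attempts = []
    for core in cores:
        try:
            result = core(task)
        except TimingViolation:
            attempts.append((core.__name__, "fault"))  # migrate to the next core
            continue
        attempts.append((core.__name__, "ok"))
        return result, attempts
    raise RuntimeError("no healthy core available")


def faulty_core(task):
    # Simulated core that is currently experiencing an intermittent fault.
    raise TimingViolation("timing budget exceeded")


def healthy_core(task):
    # Simulated idle core that executes the task normally.
    return task()


result, attempts = run_with_migration(lambda: 2 + 3, [faulty_core, healthy_core])
print(result)    # 5
print(attempts)  # [('faulty_core', 'fault'), ('healthy_core', 'ok')]
```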
The intermittent failures in some avionic systems can be caused by failures in solder joints
and multi-layer ribbon cables [29]. These failures may be initiated by the variations in operating
conditions, such as temperature or current, and may disappear due to re-melting of the solder, closing
of the crack, or filling of the void due to thermal fluctuations. Development of robust soldering
processes which include appropriate material selection would mitigate soldering related intermittent
failures. The plethora of solder choices, which include leaded solder, lead-free solder, low
temperature solder, low silver solder, and soft solder, makes it even more critical to develop
appropriate processes for solder attach and solder reflow. Since there is no known, effective method
to mitigate solder joint and multi-layer ribbon cable failures, more research on improving the
robustness and consistency of solder joints is necessary, and self-repairing wire bonds should also be
developed.
2.4 New Technology Trends
Recent technological developments to solve hardware intermittent failures offer insight into
future solutions. The industry is addressing the IF problem by developing innovative approaches.
The focus is also shifting from failure detection to failure avoidance.
Intermittent failures on a silicon chip, such as Time Dependent Dielectric Breakdown
(TDDB) and Electromigration (EM), are caused by gate wear out because of extensive usage. Gate
usage can be monitored in the form of gate toggles [33]. Researchers [34] discovered that the
vulnerability to intermittent failure could be monitored by tracking the amount of gate toggles. They
studied four OpenSPARC RTL modules and tracked how each instruction moved through these four
modules while toggling different gates. The four modules studied were the IFU, EXU, FFU, and
LSU modules. They discovered that certain submodules, such as exu-alu within the EXU module
and lsu-dcdp within the load store unit (LSU), display a relatively high amount of toggling regardless of
the type of instruction being executed. This revealed that there could be groups of modules and
submodules with higher susceptibility to wear-out failures, resulting in intermittent
failures. Higher vulnerability by itself is not a good predictor of failure rate, but when
combined with operating conditions such as temperature, the degradation of a gate structure can be
forecasted. Preemptive steps could also be taken during the design stage to avoid the occurrence of
such intermittent failures.
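Counting gate toggles from a signal trace is straightforward, as the Python sketch below suggests. It assumes that each cycle of a hypothetical trace is recorded as an integer whose bits are the gate outputs. The function name and trace format are illustrative and are not the instrumentation used in [33] or [34].

```python
def count_toggles(trace, width):
    """Count how often each of `width` gate outputs changes value across a trace.

    trace is a sequence of integers, one per cycle; bit b of each integer is
    taken to be the output of gate b in that cycle.
    """
    toggles = [0] * width
    for previous, current in zip(trace, trace[1:]):
        changed = previous ^ current  # bits that differ between consecutive cycles
        for b in range(width):
            toggles[b] += (changed >> b) & 1
    return toggles


# Hypothetical 3-gate trace over five cycles.
trace = [0b000, 0b001, 0b011, 0b001, 0b101]
print(count_toggles(trace, width=3))  # [1, 2, 1]
```

Gates with the highest counts would be the first candidates for closer wear-out analysis. As noted above, toggle counts alone do not predict a failure rate without operating conditions such as temperature.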
The intermittent loss of connection between connectors is a very common failure in electrical
systems [35]. In spite of the extra caution during connector installation, this remains a problem in
avionics and military equipment. In 2012 an approach was suggested [36] to create an online
methodology to detect intermittent failures caused by intermittent connections. The idea is premised
around the principle derived from the Lorentz Law that any sudden flux change should create a large
voltage manifesting as an arc which would propagate along the circuitry as a traveling wave. The arc
is defined as the electrical discharge initiated by improper cable connections. Intermittent failures
caused by loose connector connections can be detected by monitoring for the presence of this arc.
The research describes an online monitoring methodology that detects this arc in order to flag
any connector disconnection failures.
Advances in semiconductor scaling technology have revealed that there is now greater
exposure and vulnerability to not only single event upsets (SEUs) in integrated memories but also to
single-event transients (SETs) in high speed logic [37]. SEUs are induced by environmental causes
such as cosmic radiation or alpha particle radiation. They initiate current pulses at random times and
locations in a digital circuit. SETs are caused by transient charge displacements which generate logic
errors in subsequent circuits. Both SEUs and SETs are responsible for creating intermittent failures.
This is a problem which is getting worse because of industry demand for semiconductor scaling. An
estimation methodology to monitor the SEUs and SETs in combinatorial circuits using CMOS
technology has been proposed [38]. Sources of alpha particle contamination include some packaging
materials, such as the filler materials used in molding compounds, and the lead in
leaded solders. SEU problems initiated by alpha particles have been essentially solved by the
industry, but cosmic rays still pose significant SEU problems [28].
A paper published in 2012 by Pan et al. [39] strives to address the CMOS technology scaling
problem from a different perspective. The paper proposes the quantitative characterization of the
vulnerability of the microprocessor structure to intermittent failures. This is called the intermittent
vulnerability factor (IVF), and it is the probability that an intermittent fault in the microprocessor
structure will manifest as an externally visible failure. Their research revealed that the intermittent
stuck-at-one fault model has the most serious impact on program execution. The IVF is
calculated after listing the causes of the intermittent failures, classifying them into different fault
models and setting parameters to determine when the intermittent fault will result in a visible error.
This information is used to develop IVF computational algorithms for different intermittent fault
models within a processor. The IVF data could now be used to improve the microprocessor quality,
reliability, and durability (QRD) by proper interventions during the design stage. The IVF could also
be used for intermittent fault detection and error recovery.
In a paper published in 2012, Correcher et al. [40] introduce the concept of modeling
intermittent failure dynamics. They propose two methodologies for characterizing the dynamics: the
probabilistic model and the temporal model. The probabilistic model allows the computing of
intermittent failure probability at any time; however, it needs historical data which may not always
be available. The temporal model is more practical, and it offers the measurement of failure density.
Research shows that the duration and frequency of intermittent failures increase with time, and the
failure density and pseudo-period can help predict this progression. The pseudo-period is the average time
difference between failures, which is normalized by the number of failures. It is related to MTBF
(mean time between failures) and used to model the reliability of repairable systems. The
pseudo-period can be used to predict the number of operations before replacement, which requires determining whether
the trend is better described by a linear or an exponential fit. A limitation of this approach is the difficulty of
deriving optimal values for the failure density and pseudo-period.
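One plausible reading of these two quantities can be computed from a list of failure onset times, as in the Python sketch below. The function names, the use of onset times only, and the window boundaries are assumptions for illustration. The exact definitions in [40] may differ, so this should not be taken as the authors' model.

```python
def failure_density(onsets, window_start, window_end):
    """Failures per unit time whose onset falls in [window_start, window_end)."""
    count = sum(1 for t in onsets if window_start <= t < window_end)
    return count / (window_end - window_start)


def mean_interval(onsets):
    """Average time between consecutive failure onsets, or None if fewer than two."""
    gaps = [later - earlier for earlier, later in zip(onsets, onsets[1:])]
    return sum(gaps) / len(gaps) if gaps else None


# Hypothetical onset times in hours; failures arrive closer together over time.
onsets = [0, 10, 18, 24, 28]
print(mean_interval(onsets))            # 7.0
print(failure_density(onsets, 0, 15))   # density in the earlier window
print(failure_density(onsets, 15, 30))  # density in the later window
```

In this made-up data the later window has a higher failure density than the earlier one, which is the kind of trend the temporal model is meant to capture.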
Recent research on component residual life is helpful for predictive maintenance systems.
The approach focuses not on avoiding intermittent failures but on predicting when the
negative effects of the IFs are no longer tolerable. A stochastic model has been proposed [41]
to predict the residual life of live components of a coherent system. A coherent system is a system
where, when a failed component is replaced by a new component, the system does not fail. The
conditional reliability of components within a working system exhibiting an increasing failure rate
has been shown to decrease with time. Also, when two coherent working systems comprising similar
components have ordered hazard rates, the corresponding residual lives are also stochastically
ordered.
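The statement that conditional reliability decreases with age under an increasing failure rate can
be checked numerically. The sketch below assumes a Weibull lifetime with a shape parameter greater
than one, which is only one convenient example of an increasing failure rate; it is not the model
of [41], and the parameter values are hypothetical. The conditional reliability is computed as
R(age + mission) / R(age).

```python
import math


def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t / scale) ** shape) for t >= 0."""
    return math.exp(-((t / scale) ** shape))


def conditional_reliability(age, mission, shape, scale):
    """Probability of surviving `mission` more time units, given survival to `age`."""
    return (weibull_reliability(age + mission, shape, scale)
            / weibull_reliability(age, shape, scale))


# Hypothetical parameters: shape > 1 gives an increasing failure rate.
SHAPE, SCALE, MISSION = 2.0, 1000.0, 100.0
values = [conditional_reliability(age, MISSION, SHAPE, SCALE)
          for age in (0.0, 500.0, 1000.0)]
```

The three values fall as the age grows, so an older component is less likely to complete the same
mission than a newer one.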
New approaches from Kleer et al. [42] offer a framework for diagnosing intermittent failures
in a continuously operating piece of machinery, where objects are transferred from one module to the
next, as in the case of a copying machine involving the transfer of paper from one site in the copier
to another. Research has shown [43, 44, 45] that by leveraging in-situ sensors, physics-of-failure
models, and life cycle monitoring, one can predict the occurrence of failure and measure degradation
and remaining useful life. Such information could become the building block for developing
methods to troubleshoot intermittent failures.
3. SOFTWARE INTERMITTENT FAILURES
Software intermittent failures are generated when some conditions occur simultaneously. For
example, if the available memory and CPU processing power are both below a certain threshold due
to other applications running on a computer, a selected program can exhibit intermittent failures due
to insufficient resources.
Software intermittent failures can also occur when two or more concurrent tasks (processes or threads)
are running simultaneously and “collide”. When this happens, the computer can end up in a lock-up
condition in which the software does not have a clear exit point, which may result in a “frozen
screen” on the computer monitor. These potential collisions may not be obvious
when the software code is being written for the many different subroutine modules used in the
computer.
An example of one such collision of processes involves a bank ATM where a customer may dip
their ATM card to open up a session, and at the same time the branch personnel may open the rear
safe door of the ATM (out of view from the customer). The resulting condition causes the computer
to “freeze up” and the screen to be stuck in one view, making the ATM non-responsive to the
customer.
Software may also contain bugs and exhibit intermittent failure whenever a user encounters
the buggy parts of the program. In the next sections, the causes of software intermittent behavior are
investigated, and then the methods for identification and mitigation of these failures are described.
Some recent research in this area is also briefly discussed.
3.1 Causes
Even though software intermittent failures occur in most software-based systems, the end
user may not always experience a drop in performance. The ability to perceive a failure is known as
observability of faults. The observability of software intermittent failures is affected by three factors:
processor speed, memory capacity, and processor load. A low processor speed increases the
possibility of occurrence of intermittent failures, whereas with high processor speed, intermittent
failures may be observed less frequently. A high memory capacity reduces the observability of
software intermittent failures, whereas an increase in the processor load could increase the
occurrence of intermittent failures. To mitigate the frequency of intermittent behavior, the factors
and fault causes of the intermittent behavior must be addressed.
Gracia et al. [46] classify the causes of software-related intermittent failures as timing
failures, errors in memory, unhandled exceptions, errors in disks, and concurrency-related failures.
Timing failures occur when process executions are delayed during processing or when the sequence
of their execution is disturbed. For example, because process executions are time-sensitive, the
timing of parallel processes running simultaneously can experience a delay if one of the processes
does not get completed within the expected time. Memory leaks and memory errors occur because of
improper memory allocation or de-allocation. This can happen when the memory footprint, which is
the amount of main memory a program uses or references, becomes very high. This may be caused
by prolonged memory usage and can result in intermittent freezes and crashes. Software failures
because of unhandled exceptions happen when an unexpected error occurs during execution and this
error is not handled by the software. For example, when the software tries to divide a number by zero, an
error is generated. If this error is not handled, it could lead to an intermittent failure. Disk error
failures are software intermittent failures resulting from physical errors in the disk drives.
Concurrency-related failures occur when concurrent tasks are being executed, leading to heavy usage
of the system.
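The divide-by-zero case above can be made concrete with a short Python sketch. The latency
calculation, the function names, and the sample data are hypothetical. The unprotected function
fails only in intervals where no requests happened to arrive, which is why such a fault appears
intermittent to the user; the protected function handles the exception and keeps running.

```python
def average_latency(total_ms, request_count):
    """Unprotected version: raises ZeroDivisionError when no requests arrived."""
    return total_ms / request_count


def safe_average_latency(total_ms, request_count):
    """Handled version: reports None for an empty interval instead of crashing."""
    try:
        return total_ms / request_count
    except ZeroDivisionError:
        return None


# Hypothetical per-interval samples: (total latency in ms, number of requests).
samples = [(120.0, 4), (0.0, 0), (90.0, 3)]
results = [safe_average_latency(total, count) for total, count in samples]
```

Returning None is one possible policy; the essential point is that the rare input is anticipated
rather than left to terminate the program.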
3.2 Diagnosis
In software, many different configurations are possible. It is difficult, if not impossible,
to test a product under all of these configurations, and intermittent failures can occur on configurations
that have not been fully tested.
While testing for intermittent behavior, the interaction between the hardware and software
needs to be considered, because hardware configuration can influence the frequency and length of
intermittent software failure. Syed et al. [47] observed that software testing results in a different
frequency of intermittent failures based upon the hardware configuration. For example, parameters
such as processor speed, memory, hard drive capacity, and processor load led to a variation in the
number of intermittent failures observed. Wei et al. [48] developed a test methodology to inject
faults at the hardware architecture level to understand the effect of hardware intermittent failures on
software failures. The authors discovered that different sites of the processor architecture affected the
software execution differently. They observed that the impact of a hardware fault on software will
depend upon the origination site and length of the hardware fault.
For the detection of intermittent software failures, five techniques [47] are used. The first
technique is known as deterministic replay debugging (DRB). It is the ability to replay precisely the
same set of instructions that led up to a software failure. Essentially, the engineer records all
instructions up to the point where the system crashes and then replays that recording to determine the
roots of the failure. It is used for bug detection, fault tolerance studies, and intrusion analysis [47]. It
is effective in debugging issues that arise in multithreaded and distributed applications. The second
technique is called fuzz testing (FT). It feeds the system random, invalid, or unexpected data and observes how
the system reacts. Fuzz testing is generally used for detecting failures related to corrupted data,
memory leaks, software crashes, and assertion failures [47]. FT is also used to enhance software security.
The third commonly used technique is termed high volume test automation (HVTA). In this
approach the software automatically generates, executes, and evaluates a large number of test cases
to detect failures. The high volume of testing, which is automatically generated, offers a higher
probability of detecting failures. HVTA techniques are generally used in detecting failures such as
buffer overruns, stack overflows, resource exhaustion, and timing-related errors. The fourth failure
detection technique is load testing, which includes tests such as stress testing (testing at the operating
condition limits until the system breaks) and volume testing (operating very large tasks). Load
testing involves a demand which is exerted on a system or device while the response is being
monitored. It assists in determining the maximum operating capacity and identifying the bottlenecks
and weak links in a system. The last technique is called disturbance testing (DT). In this case, the
normal operation of the system is disrupted by introducing physical failures such as by unplugging
the power cord. This technique is used for testing the fault tolerance and the overall quality of a
system.
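The random-input technique described above (commonly called fuzz testing or fuzzing) can be
sketched in a few lines of Python. The target function parse_record, its latent bug, and the input
alphabet are all hypothetical and exist only to give the harness something to find. The random
generator is seeded so that a run that exposes a failure can be repeated.

```python
import random


def parse_record(text):
    """Hypothetical target: parse 'key=value' into (key, int).

    Malformed input is expected to raise ValueError. The latent bug is the
    unchecked text[0] access, which raises IndexError on an empty string.
    """
    if text[0] == "#":              # bug: no check for empty input
        return None                 # comment line
    key, sep, value = text.partition("=")
    if not sep or not key:
        raise ValueError("expected key=value")
    return key, int(value)          # int() raises ValueError on non-numeric text


def random_inputs(count, seed, alphabet="ab=#1 ", max_len=6):
    """Yield `count` random strings, possibly empty, from a small alphabet."""
    rng = random.Random(seed)       # seeded so a failing run can be repeated
    for _ in range(count):
        length = rng.randint(0, max_len)
        yield "".join(rng.choice(alphabet) for _ in range(length))


def fuzz(target, inputs):
    """Run target on every input; collect inputs that cause unexpected errors."""
    failures = []
    for text in inputs:
        try:
            target(text)
        except ValueError:
            pass                    # documented rejection of malformed input
        except Exception as exc:
            failures.append((text, type(exc).__name__))
    return failures
```

A call such as fuzz(parse_record, random_inputs(1000, seed=1)) runs the harness on generated data.
Running the same harness with a very large number of generated cases moves it toward the high
volume test automation approach described above.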
3.3 Mitigation
The aim of fault mitigation is to prevent unexpected outputs and control errors. Anderson et
al. [49] discussed the phases that constitute fault mitigation: error detection, damage assessment, and
error recovery. Error detection is used to identify the source of intermittent faults, while damage
assessment determines the extent of disruption and losses suffered by the system. Once the nature of
the fault is clearly identified, the next phase, error recovery, mitigates these faults. This stage
minimizes the negative effects experienced by the end user.
There are three techniques for error recovery: recovery block, n-version software, and self-
checking software [50]. Recovery blocks were originally developed by Randell [51] to prevent faults
in software components from affecting functionality at the system level. In this approach, results
from sequences in a software component are verified by adjudicator software. Each of the outputs of
the software component needs to pass an acceptance test by the adjudicator. N-version programming
(NVP) is also known as multi-version programming. In this method, multiple versions of
functionally equivalent software are created independently using the identical original specifications.
This assumes that independently generated software will have a sharply reduced probability of the
same software faults. Statistical techniques are employed to determine the most common response
among these multiple versions, and measures are undertaken to mitigate the responses that deviate from it. N-version
software combines the advantages of redundancy (multiple software versions) and
statistical techniques [52]. Even though the NVP approach is commonly used in software developed
for electronic voting and switching trains, it is not free of controversy. There are critics who do not
agree that independently developed software versions will reduce the common errors. Self-checking
software [53] detects the occurrence of software errors, locates and identifies their causes, and stops the
propagation of errors. For self-checking software to perform successfully, the system needs to
monitor both functional aspects of the process and the data. Functional monitoring checks for infinite
loops and incorrect loop terminations in a software program, while data monitoring checks the
integrity of defined data structures in software.
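The first two recovery techniques above can be sketched as follows. This is a minimal Python
illustration that assumes pure functions (so no state checkpoint or rollback is needed) and
hashable outputs; the function names and the "largest element" task are hypothetical and are not
taken from [50], [51], or [52].

```python
from collections import Counter


def recovery_block(alternates, acceptance_test, data):
    """Try each alternate in order; return the first result that passes the test."""
    for alternate in alternates:
        try:
            result = alternate(data)
        except Exception:
            continue                    # a crash counts as a failed alternate
        if acceptance_test(data, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def majority_vote(versions, data):
    """Run every version; return the output produced by a strict majority."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(data))
        except Exception:
            pass                        # a crashed version casts no vote
    if outputs:
        value, votes = Counter(outputs).most_common(1)[0]
        if votes > len(versions) // 2:
            return value
    raise RuntimeError("no majority among versions")


# Hypothetical, independently written versions of "largest element".
def version_a(xs):
    return max(xs)


def version_b(xs):
    return sorted(xs)[-1]


def version_c(xs):
    return xs[0]                        # faulty: correct only if the maximum is first


def is_maximum(xs, result):
    """Acceptance test: the result is a member and no element exceeds it."""
    return result in xs and all(x <= result for x in xs)
```

With the input [3, 9, 4], recovery_block([version_c, version_a], is_maximum, [3, 9, 4]) rejects
the faulty primary and falls back to the alternate, while majority_vote outvotes the faulty
version. The voter only helps when the versions do not share the same fault, which is the
assumption the critics mentioned above dispute.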
3.4 New Technology Trends
New approaches are being developed to overcome software-related intermittent failures. Data
races can cause many intermittent failures in software. They are non-deterministic, hard to
debug, and cause problems at runtime [54]. A data race occurs when two threads access the same
memory location without synchronization and at least one of the accesses
is a write. Because of this complexity, the C and C++ language specifications leave
the behavior of such programs undefined [55], and the Java specification for such programs is complicated
and known to be buggy [56]. There is a trend of increased usage of multithreaded programs because of
the use of multicore processors, and multithreading is prone to data race issues. One approach to
overcome data race detection issues was presented in 2013 by Wester et al. [57]. It is called
parallelizing data race detection. They point out that traditional data race detectors are too slow to be
used regularly. Wester et al. propose to increase the speed by spreading the detection work across
multiple cores. Their strategy involves a process called uniparallelism, which allows the execution of
program time intervals in a parallel manner, providing scalability while executing all threads on a
single core to eliminate locking.
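The definition of a data race given above can be expressed as a check over a recorded access
trace. The sketch below is a simplified, lock-based check written for illustration only: it treats
a shared lock as the sole form of synchronization, compares every pair of accesses, and is not the
uniparallel detector of Wester et al. [57]. The trace format and all names are assumptions.

```python
from collections import namedtuple
from itertools import combinations

# One recorded memory access: thread id, address, whether it is a write,
# and the set of locks the thread held at that moment.
Access = namedtuple("Access", "thread address is_write locks")


def find_candidate_races(trace):
    """Return pairs of accesses that match the data race definition."""
    races = []
    for first, second in combinations(trace, 2):
        if (first.address == second.address
                and first.thread != second.thread
                and (first.is_write or second.is_write)
                and not (first.locks & second.locks)):   # no common lock held
            races.append((first, second))
    return races


# Hypothetical trace of six accesses by two threads.
trace = [
    Access("T1", "counter", True, frozenset({"m"})),
    Access("T2", "counter", True, frozenset({"m"})),   # protected by the same lock
    Access("T1", "flag", True, frozenset()),
    Access("T2", "flag", False, frozenset()),          # unsynchronized read of a written value
    Access("T1", "total", False, frozenset()),
    Access("T2", "total", False, frozenset()),         # two reads never race
]
races = find_candidate_races(trace)
```

Only the accesses to "flag" are reported. The pairwise comparison grows quadratically with trace
length, which hints at why practical detectors are slow and why spreading the work across cores is
attractive.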
Another emerging research area is automated software repair. Heuristic and algorithmic
approaches are leveraged for generating, evaluating, and repairing defective sites. This approach has
received attention in the fields of programming languages [58], operating systems [59], and software
engineering [60]. Automated repair is effective in solving concurrency bugs which lead to IF issues
[58]. Schulte et al. [61] presented a paper in 2013 outlining a methodology to employ automated
repair on arbitrary and non-repeatable software defects in embedded systems. This process has been
implemented on Nokia N900 smartphones. The algorithm used for localizing fault sites is based on
Gaussian convolution and stochastic sampling. It reduces memory requirements by 85% for
embedded systems. It is ten times faster and is suited for devices where direct instrumentation is not
feasible.
Sahoo et al. [62] published a paper in 2013 wherein automatic diagnostic techniques are
proposed for isolating root causes for software-related intermittent failures. Self-generated likely
program invariants are used with filtering techniques at sites close to the fault-triggering point to
select a set of candidate sites as possible root causes. Likely program invariants are effective
tools for detecting and diagnosing software errors [63]. They are program properties that are
observed to hold in some set of successful executions but not necessarily in all executions.
The set of candidate sites is trimmed down by dynamic backward slicing, which is a technique that
can pinpoint precisely which instructions affect a particular value in a single execution of a program
[64]. The list of candidates is further reduced by dependence filtering, which is based upon the
premise that if an invariant on one instruction fails, then a different dependent instruction may also
have a chance of invariant failure, but the underlying cause is the first invariant and not the second.
The second filtering approach assumes that if multiple similar inputs result in the same failure
symptom, they are likely to have the same cause. This is a promising approach for the automatic
diagnosis of software root causes; however, this approach only works on deterministic detectors.
Future work is planned to include non-deterministic detectors.
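The idea of likely program invariants can be illustrated with a small Python sketch. It learns
only simple value-range invariants from successful runs and then lists the program points whose
values leave the learned range in a failing run. The trace format, the program point names, and
the restriction to ranges are assumptions; the slicing and filtering stages described above are
not implemented here, and this is not the tool of Sahoo et al. [62].

```python
def infer_range_invariants(passing_runs):
    """Learn a (min, max) range per program point from successful executions."""
    invariants = {}
    for run in passing_runs:
        for point, value in run:
            low, high = invariants.get(point, (value, value))
            invariants[point] = (min(low, value), max(high, value))
    return invariants


def violated_points(invariants, failing_run):
    """List program points, in execution order, whose value leaves the learned range."""
    violations = []
    for point, value in failing_run:
        if point in invariants:
            low, high = invariants[point]
            if not low <= value <= high and point not in violations:
                violations.append(point)
    return violations


# Hypothetical traces: each run is a list of (program point, observed value).
passing_runs = [
    [("read_len", 4), ("alloc_size", 16), ("copy_count", 4)],
    [("read_len", 7), ("alloc_size", 28), ("copy_count", 7)],
]
failing_run = [("read_len", 5), ("alloc_size", 20), ("copy_count", 40)]

invariants = infer_range_invariants(passing_runs)
candidates = violated_points(invariants, failing_run)
```

Here only "copy_count" violates its learned range, so it becomes the single candidate root cause.
Because the invariants are learned from a finite set of runs, a violation is a hint rather than a
proof, which is why further filtering is needed in practice.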
The use of multicore processors has resulted in concurrency errors in multithreaded
programs. These errors can lead to intermittent failures arising from schedule-dependent failures.
These failures are caused by interactions between threads that were not anticipated by the program
developer [65]. An atomicity violation is another schedule-dependent failure that can cause intermittent failures.
This occurs when a thread accessing shared state is inadvertently allowed to interleave between a
pair of accesses in another thread. A paper from the University of Washington [65] in 2013 discusses
the development of automated techniques for avoiding schedule-dependent failures such as
concurrency errors and atomicity violations. The authors established a system for collecting relevant program events at
run time. When a program fails, the information collected is analyzed to generate hypotheses for
failure causes. Leveraging the multiple instances of the deployed software in operation, a predictive
statistical model and an empirical framework have been developed to identify which hypothesis is
most likely to be correct. Corrective actions are taken by manipulating future program executions.
The emphasis of the study is not on failure detection but on failure avoidance.
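An atomicity violation of the kind described above can be reproduced deterministically by
simulating the thread schedule. In the Python sketch below, each "thread" is a generator whose
yield marks a point where the scheduler may switch to the other thread; the schedule is supplied
explicitly instead of being left to the operating system. The names and the counter example are
hypothetical, and the sketch illustrates the failure only, not the avoidance system of [65].

```python
def increment(shared):
    """Read-modify-write split into two steps; the yield marks a possible switch."""
    value = shared["count"]          # first access: read the shared state
    yield
    shared["count"] = value + 1      # second access: write based on the earlier read


def run(schedule):
    """Execute two increment tasks in the order given by `schedule`."""
    shared = {"count": 0}
    tasks = {"A": increment(shared), "B": increment(shared)}
    for name in schedule:
        next(tasks[name], None)      # the default suppresses StopIteration at task end
    return shared["count"]


serial = run(["A", "A", "B", "B"])        # A completes before B starts
interleaved = run(["A", "B", "A", "B"])   # B reads between A's read and A's write
```

The serial schedule counts both increments, while the interleaved schedule loses one because both
tasks read the same initial value. The same program therefore passes or fails depending only on
the schedule, and steering execution away from the second schedule avoids the failure without
changing the code.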
4. RECOMMENDATIONS
Intermittent failures should be treated seriously not only because of their massive cost but also
because they could be early indicators of permanent failures. For intermittent failures, it is better to
focus on failure avoidance rather than failure detection or failure mitigation. From the hardware
design perspective it is recommended that the specification of minimum spacing requirements for
circuit traces should be dependent upon the current usage. With the increase in semiconductor
scaling, preemptive design strategies need to be developed that leverage data like IVF (Intermittent
Vulnerability Factor) discussed in this paper. On the packaging side it would be valuable to develop
new materials which offer better shielding from cosmic radiation to prevent SEUs (Single Event
Upsets). Self-repairing wire bonds and self-healing solder joints may sound futuristic but they can
diminish the occurrence of intermittent failures in hardware. Since connector disconnections are a
common cause of intermittent failures, it is recommended to develop effective methodologies for
monitoring traveling waves caused by sudden connector disconnections. For some avionic systems it is
recommended to develop an online test methodology rather than performing lab testing to increase
the probability of detecting intermittent failures.
Software intermittent failures should always be studied within the context of the hardware
being used and it is important to focus on fault causes rather than on the observability of intermittent
failures. There is a need for more detailed studies in solving system-level intermittent failures. With
the increase in multicore processor usage, it is recommended to anticipate and preempt IF problems
caused by data races in multithreaded programs. Parallelizing techniques should be
employed where possible to detect data races. It is recommended to use automated software
repair for solving concurrency issues and to use likely program invariants in automatic
diagnostic techniques for deterministic failures.
5. CONCLUSIONS
Intermittent failures are difficult to diagnose because, when they are investigated, the faults
cannot be replicated consistently. This paper undertakes a wider approach by describing the various
causes, diagnosis and mitigation strategies for intermittent failures manifested at the hardware and
software levels. Some promising upcoming technologies are highlighted that might help develop
future solutions for intermittent failures. Since diagnosing intermittent failure is challenging, helpful
tables and methodologies have been presented to detect the causes of hardware and software
intermittent failures. Recommendations have been offered to help minimize the occurrence of
intermittent failures in hardware and software. The paper strives to advance the state of the art and
practice by covering a wide diversity of intermittent failures, both in hardware and software while
offering an understanding of the underlying causes and proposing approaches and methodologies for
diagnosis and mitigation.
6. ACKNOWLEDGEMENTS
The authors would like to acknowledge the personnel associated with the University of
Maryland and CALCE (Center for Advanced Life Cycle Engineering) for their constant support and
assistance in developing this paper. Special appreciation and thanks are due to Diganta Das, Kelly
Smith, Mark Zimmerman, Faye Chai, Weifeng Liu and Ken Neubeck for guidance in the content,
structure and presentation of this paper.
7. REFERENCES
[1] Authoritative Dictionary of IEEE Standard Terms, 7th edition, IEEE 100, Standards
Information Network, IEEE Press, 2000.
[2] K. Neubeck, “Practical Reliability Analysis”, (Prentice Hall, 2004).
[3] D. A. Thomas, K. Ayers, and M. Pecht, “The ‘trouble not identified’ phenomenon in
automotive electronics,” Microelectronics Reliability, vol. 42, no. 4–5, pp. 641–651,
Apr. 2002.
[4] I. James, D. Lumbard, I. Willis, and J. Goble, “Investigating no fault found in the aerospace
industry,” in Reliability and Maintainability Symposium, 2003. Annual, 2003, pp. 441 – 446.
[5] P. Söderholm, “A system view of the No Fault Found (NFF) phenomenon,” Reliability
Engineering & System Safety, vol. 92, no. 1, pp. 1–14, Jan. 2007.
[6] B. Steadman, T. Pombo, I. Madison, J. Shively, and L. Kirkland, “Reducing No Fault Found
using statistical processing and an expert system,” in AUTOTESTCON Proceedings, 2002.
IEEE, 2002, pp. 872 – 878.
[7] WDS Global white paper, “No Fault Found returns cost the mobile industry $4.5 Billion per
year”, 2006. <online>
http://www.wds.co/news/whitepapers/20060717/MediaBulletinNFF.pdf.
[8] Kimseng K., Hoit M, Pecht M, “ Physics of failure assessment of a cruise control module”
Microelectronics Reliability, 1999, 39(10):423-444.
[9] C. Maul, J. W. McBride, and J. Swingler, “Intermittency phenomena in electrical
connectors,” Components and Packaging Technologies, IEEE Transactions on, vol. 24, no. 3,
pp. 370 –377, Sep. 2001.
[10] M. Antler, “Contact fretting of electronic connectors”, IEICE Trans. Electron, Vol E82-C, #1,
1994, pp 3-12.
[11] C. Maul, J. McBride and J. Swingler, “On the nature of intermittence in electrical contacts”,
in 20th Int. Conf. Electrical Contacts, Stockholm, 2000, pp 23-28.
[12] A. Gibson, S. Choi, T. Bieler and K. Subramanian, Environmental concerns and materials
issues in manufactured solder joints, Proceedings of the 1997 IEEE International Symposium,
In Electronics and the Environment (1997) 246–251.
[13] H. A. Schafft, “Failure Analysis of Wire Bonds,” in Reliability Physics Symposium, 1973.
11th Annual, 1973, pp. 98 –104.
[14] R. E. McCullough, “Screening Techniques for Intermittent Shorts,” in Reliability Physics
Symposium, 1972. 10th Annual, 1972, pp. 19 –22.
[15] T. Koch, W. Richliug, J. Whitlock, and D. Hall, “A Bond Failure Mechanism,” in Reliability
Physics Symposium, 1986. 24th Annual, 1986, pp. 55 –60.
[16] Sorensen B. Digital averaging-the smoking gun behind No-Fault-Found, Air Safety Week,
February, 24, 2003.
[17] W.C. Maia Filho, M. Brizoux, H.Fremont, Y. Danto, “Improved Physical Understanding of
Intermittent Failure in Continuous Monitoring Method”, Proceedings of 14th IPFA, 2007,
pp.141-146.
[18] M. Reid, J. Punch, G. Grace, L. F. Garfias, and S. Belochapkine, “Corrosion Resistance of
Copper-Coated Contacts,” Journal of The Electrochemical Society, vol. 153, no. 12, p. B513,
2006.
[19] D. Minzari, M. S. Jellesen, P. Møller, and R. Ambat, “On the electrochemical migration
mechanism of tin in electronics,” Corrosion Science, vol. 53, no. 10, pp. 3366–3379,
Oct. 2011.
[20] B. Sood, M. Osterman and M. Pecht, Tin whisker analysis of Toyota's electronic throttle
control, Circuit World 37(3) (2011) 4–9.
[21] C. Constantinescu, “Intermittent faults and effects on reliability of integrated circuits,” in
Reliability and Maintainability Symposium, 2008. RAMS 2008. Annual, 2008, pp. 370 –374.
[22] D. T. Blaauw, C. Oh, V. Zolotov, and A. Dasgupta, “Static electromigration analysis for on-
chip signal interconnects,” Computer-Aided Design of Integrated Circuits and Systems, IEEE
Transactions on, vol. 22, no. 1, pp. 39 – 48, Jan. 2003.
[23] S. Kothawade, K. Chakraborty, S. Roy, and Y. Han, “Analysis of intermittent timing fault
vulnerability,” Microelectronics Reliability, vol. 52, no. 7, pp. 1515–1522, Jul. 2012.
[24] S. Pan, Y. Hu, and X. Li, “IVF: Characterizing the vulnerability of microprocessor structures
to intermittent faults,” in Design, Automation Test in Europe Conf. Exhibition, 2010,
pp. 238 –243.
[25] N. Vichare and M. Pecht, Prognostics and health management of electronics IEEE
Transactions on Components and Packaging Technologies, 29(1) (2006) 222–229
[26] S. Mathew, D. Das, R. Rossenberger, and M. Pecht, “Failure mechanisms based prognostics,”
in Prognostics and Health Management, 2008. PHM 2008. International Conference, 2008,
pp. 1 –6.
[27] L. V. Kirkland, “When should intermittent failure detection routines be part of the legacy re-
host TPS?” in AUTOTESTCON, 2011 IEEE, 2011, pp. 54 –59.
[28] H. Qi, S. Ganesan, and M. Pecht, “No-fault-found and intermittent failures in electronic
products,” Microelectronics Reliability, vol. 48, no. 5, pp. 663–674, May 2008.
[29] Bryan Steadman, Floyd Berghout, Nathan Olsen, “Intermittent Fault Detection and Isolation
System”, IEEE AUTOTESTCON, 2008.
[30] M. Pecht, Prognostics and health monitoring of electronics, John Wiley & Sons, Ltd, 2008.
[31] J. Savir, “Detection of Intermittent Faults in Sequential Circuits” Stanford University, Rep.
TR-120, 1978.
[32] L. Kirkland, “When should intermittent failure detection routines be part of the Legacy
Re-Host TPS”, IEEE, Autotestcon, 2011, pp 54-59.
[33] R. Vattikonda, W. Wang and Y. Cao, “Modeling and minimization of PMOS NBTI effect for
robust nanometer design”, in proceedings of the Design Automation Conference, DAC 2006.
[34] M. Demertzi, B. Zandian, R. Rojas and M. Annavaram, “Benchmarking ISA Reliability to
Intermittent Failures”, IEEE International Symposium on Workload Characterization (IISWC),
2012, pp. 86-87.
[35] S. Hannel, S. Fouvry, P. Kapsa and L. Vincent “The fretting sliding transition as a criterion
for electrical contact performance” WEAR, Vol 49, 2001, pp 761-770.
[36] A.Ginart, I. Ali, J. Goldwin, P. Kalgren, M. Roemer, E. Balaban and J. Celaya “Sensing and
characterization of EMI during Intermittent Connector Anomalies” Aerospace Conference,
IEEE, March 3-10, 2012, pp 1-7.
[37] R. Rao, K. Chopra, D. Blaauw and D. Sylvester, “An efficient static algorithm for computing
the soft error rates of combinatorial circuits,” in Proceedings of Design, Automation and Test
in Europe, Vol. 1, March 2006, pp1-6.
[38] N. Kehl and W. Rosenstiel, “An efficient SER estimation method for Combinatorial
Circuits”, IEEE Transactions on Reliability, vol 60, number 4, 2011, pp 742-747.
[39] S. Pan, Y. Hu and X. Li, “IVF: Characterizing the Vulnerability of Microprocessor Structures
to Intermittent Faults”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
Vol. 20, number 5, 2012, pp 777-790.
[40] A. Correcher, E. Garcia, F. Morant, E. Quiles and L. Rodriguez, “Intermittent Failure
Dynamics Characterization”, IEEE Transactions on Reliability, Vol 61, Number 3,
pp 649-658, Sep. 2012.
[41] N. Balakrishnan and M. Asadi, “A proposed measure of Residual Life of Live Components
of a Coherent System”, IEEE Trans. Rel. Vol. 61, #1, pp 41-49.
[42] J. Kleer, B. Price, L.Kuhn, M. Doh, R. Zhou, “A framework for continuously estimating
persistent and intermittent failure probabilities”, Palo Alto Research Center Publications,
2008.
[43] J. Xie and M. Pecht, Applications of in-situ health monitoring and prognostic sensors, The
9th Pan Pacific microelectronics Symposium, Exhibits and Conference (2004) 10–12.
[44] S. Mathew, D. Das, M. Osterman, M. Pecht and N. Ferebee, Prognostic assessment of
aluminum support structure on printed circuit boards, ASME Journal of Electronic Packaging
128(4) (2006), 339–345.
[45] V. Shetty, D. Das, M. Pecht, D. Hiemstra and S. Martin, Remaining life assessment of shuttle
remote manipulator system end effector, Proceedings of the 22nd Space Simulation
Conference (2002), 21–23.
[46] J. Gracia, L. Saiz, J. C. Baraza, D. Gil, and P. Gil, “Analysis of the influence of intermittent
faults in a microcontroller,” in Design and Diagnostics of Electronic Circuits and Systems,
2008. DDECS 2008. 11th IEEE Workshop on, 2008, pp. 1 –6.
[47] R. A. Syed, B. Robinson, and L. Williams, “Does Hardware Configuration and Processor
Load Impact Software Fault Observability?” in Software Testing, Verification and Validation
(ICST), 2010 Third International Conference on, 2010, pp. 285 –294.
[48] J. Wei, L. Rashid, K. Pattabiraman, and S. Gopalakrishnan, “Comparing the effects of
intermittent and transient hardware faults on programs,” in Dependable Systems and
Networks Workshops (DSN-W), 2011 IEEE/IFIP 41st International Conference on, 2011,
pp. 53 –58.
[49] T. Anderson and J. C. Knight, “A Framework for Software Fault Tolerance in Real-Time
Systems,” IEEE Transactions on Software Engineering, vol. SE-9, no. 3, pp. 355 – 364,
May 1983.
[50] M. R. Lyu, Software Fault Tolerance. New York, NY, USA: John Wiley & Sons, Inc.,
1995.
[51] B. Randell, “System structure for software fault tolerance,” in Proceedings of the
international conference on Reliable software, New York, NY, USA, 1975, pp. 437–449.
[52] A. Avizienis, “The N-Version Approach to Fault-Tolerant Software,” IEEE Transactions on
Software Engineering, vol. SE-11, no. 12, pp. 1491 – 1501, Dec. 1985.
[53] Ronitt A. Rubinfeld, A mathematical theory of self-checking, self-testing and self-correcting
programs, University of California at Berkeley, Berkeley, CA, 1991.
[54] N. Leveson and C. Turner, “An investigation of the Therac-25 accidents”, IEEE Computer,
26(7): 18-41, July 1993.
[55] H. Boehm and S. Adve, “Foundations of the C++ concurrency memory model”, In Proc.
2008 ACM Conference on Programming Language Design and Implementation, pp. 69-78.
[56] J. Ševčík and D. Aspinall, “On validity of Program Transformations in the Java memory
Model”, in Proc. 2008 European Conference on Object-Oriented Programming, pp. 27-51.
[57] B. Wester, D. Devecsery, P. Chen, J. Flinn and S. Narayanasamy, “Parallelizing Data Race
Detection”, In ASPLOS 2013, Houston Texas, March 16-20, 2013.
[58] G. Jin, L. Song, W. Zhang, S. Lu and B. Liblit, “Automated atomicity violation fixing”, In
Programming Language Design and Implementation, 2011, pp. 389-400.
[59] J. Perkins, S. Kim, S. Larsen, S. Amarasinghe, J. Bachrach, and M. Carbin, “Automatically
patching errors in deployed software”, In Symposium on Operating Systems Principles, 2009,
pp. 87-102.
[60] Y. Wei, Y. Pei, C. Furia, L. Silva, S. Buchholz, B. Meyer and A. Zeller, “Automated fixing
of programs with contracts”, in International Symposium on Software Testing and Analysis,
2010, pp. 61-72.
[61] E. Schulte, J. DiLorenzo, W. Weimer, S. Forrest, “Automated repair of binary and assembly
programs for cooperating embedded devices”, In ASPLOS 2013, Houston Texas, March
16-20, 2013.
[62] S. Sahoo, J. Criswell, C. Geigle and V. Adve, “Using Likely Invariants for automated
Software Fault Localization”, In ASPLOS 2013, Houston Texas, March 16-20, 2013.
[63] M. Ernst, J. Cockrell, W. Griswold, and D. Notkin, “Dynamically discovering likely program
invariants to support program evolution” IEEE Trans. Software Eng., 2001.
[64] X. Zhang, R. Gupta and Y. Zhang, “Precise dynamic slicing algorithms”, In Proceedings of
the 25th International Conference on Software Engineering, 2003.
[65] B. Lucia and L. Ceze, “Cooperative Empirical Failure Avoidance for Multithread programs”,
In ASPLOS 2013, Houston Texas, March 16-20, 2013.
[66] V.Yuvaraj and T.Vasanth, “Simulation, Control and Analysis of HTS Resistive and Power
Electronic FCL for Fault Current Limitation and Voltage Sag Mitigation in Electrical
Network”, International Journal of Electrical Engineering & Technology (IJEET), Volume 4,
Issue 3, 2013, pp. 82 - 94, ISSN Print : 0976-6545, ISSN Online: 0976-6553.
