Probabilistic Design for Reliability (PDfR) in Electronics
                         Part I
                                 Dr. E. Suhir
                            ©2011 ASQ & Presentation Suhir
                           Presented live on January 3-6, 2011




http://reliabilitycalendar.org/The_Reliability_Calendar/Short_Courses/Short_Courses.html
ASQ Reliability Division
                  Short Course Series
                  One of the monthly webinars
                    on topics of interest to
                      reliability engineers.
                    To view recorded webinars (available to ASQ Reliability
                        Division members only), visit asq.org/reliability

                    To sign up for the live webinars, which are free and open to anyone,
                    visit reliabilitycalendar.org and select English
                    Webinars to find links to register for upcoming events


PROBABILISTIC DESIGN for RELIABILITY (PDfR) CONCEPT,
      the Roles of Failure Oriented Accelerated Testing (FOAT)
                 and Predictive Modeling (PM), and
           a Novel Approach to Qualification Testing (QT)
                                                                    “You can see a lot by observing”
                                                                Yogi Berra, American Baseball Player

                                                              “It is easy to see, it is hard to foresee”
                                                Benjamin Franklin, American Scientist and Statesman


                                              E. Suhir
     Bell Laboratories, Physical Sciences and Engineering Research Division, Murray Hill, NJ (ret),
                University of California, Dept. of Electrical Engineering, Santa Cruz, CA,
            University of Maryland, Dept. of Mechanical Engineering, College Park, MD, and
                         ERS Co. LLC, 727 Alvina Ct. Los Altos, CA, 94024, USA
                     Tel. 650-969-1530, cell. 408-410-0886, e-mail: suhire@aol.com

                     Four hour ASQ-IEEE RS Webinar short course
Dr. E. Suhir
                                  January 3-6, 2011                                               Page 1
Contents
Session I
1. Introduction: background, motivation, incentive
2. Reliability engineering as part of applied probability and Probabilistic Risk
    Management (PRM) bodies of knowledge
3. Failure Oriented Accelerated Testing (FOAT): its role, attributes, challenges, pitfalls
    and interaction with other accelerated test categories
Session II
4. Predictive Modeling (PM): FOAT cannot do without it
5. Example of a FOAT: physics, modeling, experimentation, prediction
Session III
6. Probabilistic Design for Reliability (PDfR), its role and significance
Session IV
7. General PDfR approach using probability density functions (pdf)
8. Twelve steps to be conducted to add value to the existing practice
9. Do electronic industries need new approaches to qualify their devices into products?
10. Concluding remarks
Session I
         1. Introduction: background, motivation, incentive


                                         “Vision without action is a daydream.
                                         Action without vision is a nightmare”
                                                              Japanese saying

                                       “The problem is not that old age comes.
                                        The problem is that young age passes”
                                                             Common Wisdom




Background

      The short-term down-to-earth and practical goal of a particular electronic or a
      photonic device manufacturer is to conduct and pass the established qualification
      tests, without questioning whether they are perfect or not

      The ultimate long-term and broad goal of the electronic, opto-electronic and photonic
      industries, regardless of a particular manufacturer or even a particular product, is to
      make the industries' deliverables sufficiently reliable in the field and consistently good
      in performance, and so to earn the trust of the customer

      Qualification testing (QT), such as that prescribed by the JEDEC, Telcordia,
      AEC or MIL specs, is the major means that the electronic, opto-electronic and
      photonic industries use to make their viable-and-promising devices into
      reliable-and-marketable products.




Motivation
  It is well known, however, that devices and systems that passed the existing qualification tests
often fail in the field. Should it be this way? Is this a problem indeed? Are the existing qualification
specifications adequate? Do electronic and photonic industries need new approaches to qualify
their devices into products?

  If they do, could today’s qualification specifications and testing procedures be improved to
such an extent that, if a device passed these tests, its performance in the field would be satisfactory?

  On the other hand, there is a perception, perhaps a rather well-substantiated one, that some electronic
components “never fail”. Although one should never say “never”, such a perception exists because
some products might be too robust and, as a consequence, more costly than
necessary. Could the situation be changed, and could the cost be brought down considerably, if one
were able to assess the actual, most likely superfluous, probability of non-failure in the field
and come up, for a particular product, with the best compromise between reliability, cost and
time-to-market?

  Would it be possible to “prescribe” (specify), predict and, if necessary, even control the low
enough probability of failure for a product that operates under the given stress (not necessarily
mechanical, of course) conditions for the given time?
Incentive
  We argue that the improvements in the QT, as well as in the existing best practices, are
indeed possible, provided that the Probabilistic Design for Reliability (PDfR) concept is
thoroughly developed and the corresponding methodologies are employed

  One effective way to improve the existing QT and specs is to

  conduct, on a wide scale, the appropriate Failure Oriented Accelerated Testing (FOAT) at both the
design stage (DFOAT) and the manufacturing stage (MFOAT), and, since DFOAT cannot do without
predictive modeling (PM),

  carry out, whenever and wherever possible, PM to understand the physics of failure, and to
predict, based on the DFOAT, the probability of failure in the field,

  revisit, review and revise, considering the DFOAT and, to a lesser extent, MFOAT data obtained
for the most vulnerable elements of the device of interest, the existing QT practices, procedures,
and specifications,

  develop and widely implement the PDfR methodologies and algorithms having in mind that
“nobody and nothing is perfect” and that probability of failure in the field is never zero, but could be
predicted and, if necessary, minimized, controlled and maintained at an acceptable low level during
product operation.

2. Reliability engineering as part of applied probability
           and Probabilistic Risk Management (PRM)
                     bodies of knowledge


                              “A pinch of probability is worth a pound of perhaps”
                               James G. Thurber, American writer and cartoonist

                                                  “In the long run we are all dead”
                                         John Maynard Keynes, British economist




Reliability engineering

      deals with failure modes and mechanisms, “root” causes of occurrence of various
      failures, role of various defects, methods to estimate and prevent failures, and
      probability-based designs for reliability;

      provides guidance on how to make a viable device into a reliable and marketable
      product;

      for products for which a certain level of failures is considered acceptable (such as
      consumer products), examines ways of bringing the failure rate down to an allowable
      level;

      for products, for which a failure is a catastrophe, examines and considers ways of
      making the probability of failure as low as necessary or possible.




Reliability engineering as part of applied probability and
        probabilistic risk management (PRM) bodies of knowledge

  Reliability is part of applied probability and probabilistic risk management (PRM) bodies
of knowledge, and includes the item's (system's) dependability, durability, maintainability,
reparability, availability, testability, etc., i.e., probabilities of the corresponding events or
characteristics

  Each of these characteristics is measured as a certain probability and could be of
greater or lesser importance depending on the particular function and operating
conditions of the item or the system, and on the consequences of failure

  Applied probability and Probabilistic Risk Management (PRM) approaches and
techniques put the art of Reliability Engineering on a solid “reliable” ground.




“If a man will begin with certainties, he will end with doubts; but if he will be
                       content to begin with doubts, he shall end in certainties.”
                        Sir Francis Bacon, English Philosopher and Statesman


 “We see that the theory of probability is at heart only common sense reduced
to calculations; it makes us appreciate with exactitude what reasonable minds
      feel by a sort of instinct, often without being able to account for it… The
 most important questions of life are, for the most part, really only problems of
                                                                      probability.”
                                              Pierre-Simon, Marquis de Laplace


   “Mathematical formulas have their own life, they are smarter than we, even
    smarter than their authors, and provide more than what has been put into
                                                                        them”
                                            Heinrich Hertz, German Physicist




Reliability should be taken care of
                          on a permanent basis
Reliability evaluation and assurance cannot be delayed until the device is made
   (although this is often the case in many industries). Reliability should be

      “conceived” at the early stages of the design (reliability and electronics engineers
      should start working together from the very beginning of the device/system
      development),

      implemented during manufacturing (through a high quality manufacturing process)

      qualified and evaluated by electrical, optical, environmental and mechanical testing
      both at the design and the manufacturing stages (the customer requirements and the
      general qualification requirements are to be considered),

      checked (screened) during production (by implementing an adequate burn-in process)
      and, if necessary and appropriate,

      maintained in the field during the product’s operation, especially at the early stages of
      the product’s use (by employing, e.g., technical diagnostics, prognostication and
      health monitoring methods and instrumentation).
Three classes of engineering products
                    from the reliability point of view
       See E. Suhir, Applied Probability for Engineers and Scientists, McGraw-Hill, 1997



      Class I. The product has to be made as reliable as possible. Failure should not be
      permitted. Examples are some military or space objects

      Class II. The product has to be made as reliable as possible, but only for a certain level
      of demand (stress, loading). Failure is a catastrophe. Examples are civil engineering
      structures, bridges, ships, aircraft, cars

      Class III. The reliability does not have to be very high. Failures are permitted, but
      should be restricted. Examples are consumer products, commercial electronics,
      agricultural equipment.




Class I (military or similar) products

      The product (object) has to be made as reliable as possible. Failure is viewed as a
      catastrophe. Examples are some weapon systems, military aircraft, battleships, spacecraft

      Cost is not a dominating factor

      The products usually have a single customer, such as the government or a big firm

       The reliability requirements are defined in the form of government standards

      The standards not only formulate the reliability requirements for the product, but also
      specify the methods that are to be used to prove (demonstrate) the reliability, and
      often even prescribe how the system must be manufactured, tested and screened

      It is typically the customer, not the manufacturer, who sets the reliability standards.


Class II (industrial or similar) products
  The product (system, structure) has to be made as reliable as possible, but only for a certain
specified level of loading (demand). If the actual load (waves, winds, earthquakes, etc.) happens to
be larger than the design demand, then the product might fail, although the probability of such a
failure should be determined beforehand and should/could be (made) very small

  Examples are: long-haul communication systems, civil engineering structures (bridges, tunnels,
towers), passenger elevators, ocean-going vessels, offshore structures, commercial aircraft,
railroad carriages, cars, some medical equipment

   These are highly expensive products, which are produced in large quantities, and therefore
application of Class I requirements will lead to unjustifiable, unfeasible and unacceptable
expenses. Failure is a catastrophe and might be associated with loss of human lives and with
significant economic losses

  The products are typically intended for industrial, rather than government, markets. These
markets are characterized by rather high volume of production (buildings, bridges, ships, aircraft,
automobiles, telecommunication networks, etc.), but also by fewer and more sophisticated
customers than in the commercial (Class III) market.
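
The Class II requirement that the probability of the load exceeding the design demand "should be determined beforehand" can be illustrated with a minimal sketch. It assumes, which the slide does not state, that the capacity (strength) and the demand (load) are independent and normally distributed. The function names and all numbers are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_demand_exceeds_capacity(mu_cap, sd_cap, mu_dem, sd_dem):
    """P(demand > capacity) for independent normal capacity and demand.

    The difference (capacity - demand) is then normal with mean
    mu_cap - mu_dem and standard deviation sqrt(sd_cap^2 + sd_dem^2).
    """
    safety_index = (mu_cap - mu_dem) / math.hypot(sd_cap, sd_dem)
    return normal_cdf(-safety_index)

# Hypothetical structure: capacity 100 +/- 10, demand 60 +/- 8 (same units).
p_fail = prob_demand_exceeds_capacity(100.0, 10.0, 60.0, 8.0)
print(f"probability that demand exceeds capacity: {p_fail:.2e}")
```

Under these assumptions the result depends only on the separation of the two means relative to the combined scatter, so increasing the margin or reducing the variability of either quantity lowers the probability of failure.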

Class III (consumer, commercial) products
  The typical market is the consumer market. An individual consumer is a very small part of the
total consumer base. The product is inexpensive and manufactured in mass quantities

  The demand for the product is usually driven by the cost of the product and time-to-market, rather
than by its reliability. As long as the product is “sellable”, its reliability does not have to be very
high: it should only be adequate for customer acceptance and reasonable satisfaction. Simple and
innovative products, which have a high degree of customer appeal and are in significant demand,
may be able to prosper, at least for some time, even if they are not very reliable

  Failure is not a catastrophe: a certain reasonable level of failures during normal operation of the
product is acceptable, as long as the failure rate is within the anticipated/expected range

  Reliability testing is limited, and the improvements are often implemented based on the field
feedback

  It is typically the manufacturer, not the consumer, who sets the reliability standards, if any, for the
product. Often no special reliability standards are followed, and it is customer satisfaction (on
a statistical basis) that is the major criterion of the viability and quality of the product.

Reliability, cost-effectiveness, and time-to-market

      Reliability, cost effectiveness and time-to-market considerations play an important role in the
      design, materials selection and manufacturing decisions, and are the key issues in competing
      in the global market-place. A company cannot be successful, if its products are not cost
      effective, or do not have a worthwhile lifetime and service reliability to match the expectations
      of the customer. Too low a reliability can lead to a total loss of business

      Product failures have an immediate, and often dramatic, effect on the profitability and even the
      very existence of a company. Profits decrease as the failure rate increases. This is due not only
      to the increase in the cost of replacing or repairing parts, but, more importantly, to the losses
      due to the interruption in service, not to mention the “moral losses”. These make obvious dents
      in the company’s reputation and, as a consequence, affect its sales

      The time to develop and to produce products is rapidly decreasing. This circumstance places a
      significant pressure on both business people and reliability engineers, who are supposed to
      come up with a reliable product and to confirm its long-term reliability in a short period of time
      to make their device a product and to make this product successful in the marketplace

      Each business, whether small or large, should try to optimize its overall approach to reliability.
      “Reliability costs money”, and therefore a business must understand the cost of reliability, both
      “direct” cost (the cost of its own operations), and the “indirect” cost (the cost to its customers
      and their willingness to make future purchases and to pay more for more reliable products).
3. Failure Oriented Accelerated Testing (FOAT): its role,
     attributes, challenges, pitfalls and interaction with
              other accelerated test categories



                    “Nothing is impossible. It is often merely for an excuse that we
                                                      say that things are impossible”
                                 Francois de La Rochefoucauld, French philosopher

                                         “The truth is rarely pure and never simple”
                     Oscar Wilde, British writer, “The Importance of Being Earnest”




Why accelerated tests?


      It is impractical and uneconomical to wait for failures when the mean-time-to-failure
      of a typical electronic device (equipment) today is on the order of hundreds of
      thousands of hours

      Accelerated testing (AT) enables one to gain greater control over the reliability of a
      product

      AT has become a powerful means of improving reliability. This is true regardless of
      whether (irreversible or reversible) failures actually occur during the
      FOAT (“testing to fail”) or QT (“testing to pass”)

      In order to accelerate the material’s (device’s) degradation and/or failure, one has to
      deliberately “distort” (“skew”) one or more parameters (temperature, humidity, load,
      current, voltage, etc.) affecting the device functional and/or mechanical performance
      and/or its environmental durability.
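
A rough illustration of why waiting for failures at use conditions is impractical. The sketch assumes a constant failure rate (exponentially distributed times to failure), which is a simplification not stated on the slide. The MTTF of 200,000 hours, the sample size and the test duration are illustrative numbers only.

```python
import math

def prob_failure_by(t_hours, mttf_hours):
    """Probability that one device fails by time t, assuming a constant
    failure rate (exponential time-to-failure model)."""
    return 1.0 - math.exp(-t_hours / mttf_hours)

def expected_failures(n_devices, t_hours, mttf_hours):
    """Expected number of failed devices among n tested for t hours."""
    return n_devices * prob_failure_by(t_hours, mttf_hours)

# Hypothetical test: 100 devices run for 1000 hours (about six weeks)
# at use conditions, with an assumed MTTF of 200,000 hours.
print(expected_failures(100, 1000.0, 200_000.0))
```

With these numbers the expected count is roughly half a failure, so a test of this size at use conditions would most likely end without a single failure to analyze.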



Accelerated test categories: traditional definitions

Product development (verification) tests (PDTs)
    Objective: Technical feedback to ensure that the taken design approach is viable (acceptable)
    End point: Time, type, level, and/or number of failures
    Follow-up activity: Failure analysis, design decision
    Perfect (ideal) test: Specific definition(s)

Qualification (“screening”) tests (QTs)
    Objective: Proof of reliability; demonstration that the product is qualified to serve in the given capacity
    End point: Predetermined time and/or the number of cycles, and/or the excessive (unexpected) number of failures
    Follow-up activity: Pass/fail decision
    Perfect (ideal) test: No failure in a long time

Accelerated life tests (ALTs), highly accelerated life tests (HALTs), and failure oriented accelerated tests (FOATs)
    Objective: Understand modes and mechanisms of failure and, time permitting, accumulate failure statistics
    End point: Predetermined number or percent of failures
    Follow-up activity: Failure analysis and, time permitting, statistical analysis of the test data
    Perfect (ideal) test: Numerous failures in a short time
Accelerated test categories: updated definitions

Product development (verification) testing (PDT)
    Objective: Technical feedback to ensure that the taken design approach is viable (acceptable)
    End point: Time, type, level, and/or number of failures
    Follow-up activity: Failure analysis, design decision
    Perfect (ideal) test: Specific definitions

Qualification (“screening”) testing (QT), at the design stage (DQT) and at the manufacturing stage (MQT)
    Objective: Proof of reliability; demonstration that the item is qualified into a product, i.e., is able to serve in the given capacity
    End point: Predetermined time and/or number of cycles, and/or the excessive (unexpected) number of failures
    Follow-up activity: Pass/fail decision
    Perfect (ideal) test: No failures in a long time

Accelerated Life Testing (ALT) = Failure Oriented Accelerated Testing (FOAT), at the design stage (DFOAT)
    Objective: Understand physics (modes and mechanisms) of failure, failure limits, and, time permitting, accumulate failure statistics
    End point: Predetermined number or percent of failures
    Follow-up activity: Failure analysis and, time permitting, also statistical analysis of the test data
    Perfect (ideal) test: Numerous failures in a short time

Accelerated Life Testing (ALT) = Failure Oriented Accelerated Testing (FOAT), at the manufacturing stage (MFOAT) = Hobbs’ Highly ALT (HHALT) = Accelerated burn-in
    Objective: Assess failure limits, weed out infant mortalities
    End point: Predetermined number or percent of failures
    Follow-up activity: Pass/fail decision
    Perfect (ideal) test: No failures in a long time
Some of the most common accelerated test conditions (stimuli)
High Temperature (Steady-State) Soaking/Storage/ Baking/Aging/ Dwell,
Low Temperature Storage,
Temperature (Thermal) Cycling,
Power Cycling,
Power Input and Output,
Thermal Shock,
Thermal Gradients,
Fatigue (Crack Propagation) Tests,
Mechanical Shock,
Drop Shock (Tests),
Random Vibration Tests,
Sinusoidal Vibration Tests (with the given or variable frequency),
Creep/Stress-Relaxation Tests,
Electrical Current Extremes,
Voltage Extremes,
High Humidity,
Radiation (UV, cosmic, X-rays),
Altitude,
Space Vacuum

Elevated Stress

AT uses an elevated stress level and/or a higher stress-cycle frequency as effective stimuli
to precipitate failures over a much shorter time frame

The “stress” in reliability engineering does not have to be mechanical or
thermo-mechanical: it could be electrical current or voltage, high (or low)
temperature, high humidity, high frequency, high pressure or vacuum, cycling rate, or
any other factor (stimulus) that affects the reliability of the device or the equipment

AT must be specifically designed for the product under test

The experimental design of AT should consider the anticipated failure modes and
mechanisms, typical use conditions, and the required or available test resources,
approaches and techniques.
Qualification Testing (QT) is a must
       The objective of the qualification testing (QT) is to prove that the reliability of the
    product-under-test is above a specified level. This level is usually measured by the
    percentage of failures per lot and/or by the number of failures per unit time (failure
    rate)

      The typical requirement is no more than a few percent failing parts out of the total
    lot (population)

      QT enables one to “reduce to a common denominator” different products, as well
    as similar products, but produced by different manufacturers

      QT reflects the state-of-the-art in a particular field of engineering, and typical
    requirements for the performance of the product. Industry cannot do without QT

       Testing is time limited and is generally non-destructive (not failure oriented).


Today’s Qualification Testing (QT): shortcomings

      Today’s qualification standards and requirements are only good for what they
    are intended for: to confirm that the given device is qualified into a product to serve in a
    particular capacity

      If a product passed the standardized qualification tests, it is not always clear why it
    was good, and if the product failed the tests, it is equally unclear what could be done
    to improve its reliability

       If a product passed the qualification tests, it does not mean that there will be no
    failures in the field, nor is it clear how likely or unlikely these failures might be

       Since QT is not failure oriented, it is unable to provide the most important ultimate
    information about the reliability of the product - the probability of its failure after the
    given time in service and under the given service (operation, stress) conditions.
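
A short sketch of why a passed "test to pass" says little about the probability of failure. It assumes, as a simplification not made on the slides, that the tested units fail independently with the same probability under the test conditions (a binomial model). With zero failures among n units, the one-sided upper confidence bound on that probability is 1 - (1 - C)^(1/n). The function name and the sample size of 77 are illustrative.

```python
def upper_bound_failure_prob(n_tested, confidence):
    """One-sided upper confidence bound on the per-unit failure probability
    when n_tested units were tested and none failed (binomial model)."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_tested)

# Hypothetical qualification lot: 77 units, zero failures, 90% confidence.
bound = upper_bound_failure_prob(77, 0.90)
print(f"failure probability could still be as high as {bound:.3f}")
```

Even a clean pass therefore leaves room for a failure probability of a few percent under the test conditions, and it says nothing directly about the probability of failure after a given time in the field.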



Failure Oriented Accelerated Testing (FOAT)-1
  FOAT is aimed at revealing and understanding the physics of expected or
observed failures. Unlike QT, FOAT is able to detect the possible failure modes and
mechanisms

  Another objective of the FOAT is to accumulate failure statistics. Thus, FOAT deals with
the two major aspects of the Reliability Engineering – physics and statistics of failure

  Adequately planned, carefully conducted, and properly interpreted FOAT provides a
consistent basis for the prediction of the probability of failure after the given time in
service. Well-designed and thoroughly implemented FOAT can dramatically facilitate the
solutions to many engineering and business-related problems, associated with the cost
effectiveness and time-to-market

  This information can be helpful in understanding what should be changed to design a
viable and reliable product. Indeed, any structural, materials and/or technological
improvement can be “translated”, using the FOAT data, into the probability of failure for
the given duration of operation under the given service (environmental) conditions.
Failure Oriented Accelerated Testing (FOAT)-2

  FOAT should be conducted in addition to, and preferably long before, the
qualification tests. There may also be situations in which FOAT can be used as an
effective substitute for the QT, especially for new products, for which acceptable
qualification standards do not yet exist

   While it is the QT that makes a device into a product, it is the FOAT that enables one
to understand the reliability physics behind the product and, based on the appropriate
PM, to create a reliable product with the predicted probability of failure

  There is always a temptation to broaden (enhance) the stress as far as possible to
achieve the maximum “destructive effect” (FOAT effect) in the shortest period of time.
Unfortunately, accelerated test conditions may sometimes hasten failure mechanisms
that are different from those that would actually be observed in service conditions
(“shift” in the modes and/or mechanisms of failure)



FOAT pitfalls
  Because of the existence of such “shifts”, it is always necessary to correctly identify the
expected failure modes and mechanisms, and to establish the appropriate stress limits, in
order to prevent “shifts” in the original (actual) dominant failure mechanism

  Examples are: change in materials properties at high or low temperatures, time-
dependent strain due to diffusion, creep at elevated temperatures, occurrence and
movement of dislocations caused by an elevated stress, or a situation when a bimodal
distribution of failures (a dual mechanism of failure) occurs

  In particular, since infant mortality (“early”) failures might occur concurrently with the
anticipated (“operational”) failures, it is imperative to make sure that the “early” and
“operational” failures are well separated in the tests

   Different failure mechanisms are characterized by different physical phenomena and
different activation energies, and therefore a simple superposition of the effects of two
mechanisms is unacceptable: it can result in erroneous reliability projections.
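
The point about different activation energies can be made concrete with a minimal sketch. It assumes the widely used Arrhenius temperature-acceleration model, which the slide does not name, and two hypothetical mechanisms with activation energies of 0.4 eV and 0.9 eV. The temperatures are illustrative as well.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_af(ea_ev, t_use_c, t_test_c):
    """Arrhenius acceleration factor between a use temperature and a
    (higher) test temperature, both given in degrees Celsius."""
    t_use_k = t_use_c + 273.15
    t_test_k = t_test_c + 273.15
    exponent = ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_test_k)
    return math.exp(exponent)

# Two hypothetical failure mechanisms, tested at 125 C for use at 55 C.
af_a = arrhenius_af(0.4, 55.0, 125.0)
af_b = arrhenius_af(0.9, 55.0, 125.0)
print(f"mechanism A is accelerated about {af_a:.0f} times")
print(f"mechanism B is accelerated about {af_b:.0f} times")
```

Because the two factors differ by more than an order of magnitude in this example, the mix of failures seen at the test temperature does not carry over to use conditions through any single acceleration factor. Each mechanism has to be projected separately.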

Burn-in testing (BIT) is a special type of FOAT-1
   Burn-in (“screening”) testing (BIT) is widely implemented to detect and eliminate infant
mortality failures. BIT could be viewed as a special type of manufacturing FOAT (MFOAT).
BIT is needed to stabilize the performance of the device in use

  BIT is supposed to stimulate failures in defective devices by accelerating the stresses
that will cause these devices to fail without damaging good items. The bathtub curve of a
device that has undergone BIT is supposed to consist of the steady-state and wear-out
portions only.

  The rationale behind the BIT is based on a concept that mass production of electronic
devices generates two categories of products that passed QT:
  1) robust (“strong”) components that are not expected to fail in the field and
  2) relatively unreliable (“weak”) components (“freaks”) that, if shipped to the customer,
will most likely fail in the field

  BIT can be based on high temperatures, thermal cycling, voltage, current density, high
humidity, etc., and is performed either by the manufacturer or by an independent test house.

Burn-ins – special type of FOAT-2

   For products that will be shipped out to the customer, BIT is nondestructive

  BIT is a costly process, and therefore its application must be thoroughly monitored. BIT
is mandatory on most high-reliability procurement contracts, such as defense, space, and
telecommunication systems. In today’s practice BIT is often used for consumer
products as well. For military applications BIT can last as long as a week (168 hours).
For commercial applications burn-ins typically do not last longer than two days (48 hours)

  Optimum BIT conditions can be established by assessment of the main expected failure
modes and their activation energies, and from the analysis of the failure statistics during
BIT

  Special investigations are usually required, if one wishes to ensure that cost-effective
BIT of smaller quantities is acceptable. A cost-effective simplification can be achieved, if
BIT is applied to the complete equipment (assembly or subassembly), rather than to an
individual component, unless it is a large system fabricated of several separately testable
assemblies.

Burn-ins – special type of FOAT-3

   Although there is always a possibility that some defects might escape the BIT, it is more
likely that BIT will introduce some damage to the “healthy” structure and/or might
“consume” a certain portion of the useful service life of the product: BIT not only “fights”
the infant mortality, but accelerates the very degradation process that takes place in the
actual operation conditions, unless the defectives have a much shorter lifetime than the
healthy products and have a more narrow (more “deterministic”, more “delta-like”)
probability-of-failure distribution density

  Some BITs (e.g., high electric fields for dielectric breakdown screening, mechanical
stresses below the fatigue limit) are harmless to the materials and structures under test,
and do not lead to an appreciable “consumption” of the useful lifetime (field life loss).
Others, although they do not trigger any new failure mechanisms, might consume some small
portion of the device lifetime.




Burn-ins – special type of FOAT-4

  When planning and conducting the BIT and evaluating its results, one should make sure that
the stress applied by the BIT is high enough to weed out infant mortalities, but low
enough neither to consume a significant portion of the product’s lifetime nor to introduce
permanent damage
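
As a rough illustration of the “consumed lifetime” concern, the sketch below converts burn-in hours into equivalent field hours, assuming a single Arrhenius-type degradation mechanism. The activation energy, temperatures and design life are hypothetical, and the helper names are illustrative only.

```python
import math

K_B = 8.6174e-5  # Boltzmann's constant, eV/K (value quoted in these slides)


def equivalent_field_hours(burn_in_hours, ea_ev, t_use_c, t_burn_in_c):
    """Field hours 'consumed' by a burn-in, for one Arrhenius-type mechanism."""
    t_use_k = t_use_c + 273.15
    t_burn_in_k = t_burn_in_c + 273.15
    af = math.exp(ea_ev / K_B * (1.0 / t_use_k - 1.0 / t_burn_in_k))
    return burn_in_hours * af


def consumed_life_fraction(burn_in_hours, ea_ev, t_use_c, t_burn_in_c, design_life_hours):
    """Share of the design life used up by the burn-in."""
    used = equivalent_field_hours(burn_in_hours, ea_ev, t_use_c, t_burn_in_c)
    return used / design_life_hours


# Hypothetical case: 48 h burn-in at 125 C, 55 C service, Ea = 0.7 eV,
# 10-year design life (87,600 h).
fraction = consumed_life_fraction(48.0, 0.7, 55.0, 125.0, 10 * 8760.0)
print(f"fraction of design life consumed: {fraction:.2%}")
```

Running the same calculation for a 168-hour burn-in, or for a higher burn-in temperature, shows how quickly the consumed share grows, which is the trade-off described above.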

  A natural concern associated with the BIT is that there is always a risk that BIT
might trigger some failure mechanisms that would not be possible in the actual use
conditions and/or might affect components that should not be viewed as defective
ones.

   In lasers, the “steady-state” portion is, in effect, not a horizontal but a slowly rising
curve. In addition, wear-out failures, which are characterized by a time-dependent failure
rate, occupy a significant portion of the failure-rate (bathtub) diagram. For laser devices,
standard production BIT should be combined with long-term life testing.



Wear-out failures



      For a well-designed and adequately manufactured product, the wear-out failures
    should occur at the late stages of operation and testing.

      If one observes that this is not the case (the steady-state portion of the “bathtub”
    curve is not long enough or does not exist at all), one should revisit the design and
    choose different materials and/or different design solutions, and/or a different (more
    consistent) manufacturing process, etc.

       In some electronics materials (such as BGA and PGA systems) and in some
    photonics products (e.g., lasers) the wear-out part of the bathtub curve can occupy a
    significant portion of the product’s lifetime, and should be carefully analyzed.




What one should/could possibly do
                         to prevent failures-1
       Develop an in-depth understanding of the physics of possible failures. No failure
    statistics, nor the most effective ways to accommodate failures (such as redundancy,
    trouble-shooting, diagnostics, prognostication, health monitoring, maintenance), can
    replace good understanding of the physics of failure and good (robust) physical
    design

      Assess the likelihood (the probability) that the anticipated failure modes and mechanisms
    might occur in service conditions, and minimize the likelihood of a failure by selecting
    the best materials and the best physical design for your product

      Understand and distinguish between different aspects of reliability: operational
    (functional) performance, structural/mechanical reliability (caused by mechanical
    loading) and environmental durability (caused by harsh environmental conditions).




What one should/could possibly do
                         to prevent failures-2

      Distinguish between the materials and structural reliability and assess the effect of
    the mechanical and environmental behavior of the materials and structures in his/her
    design on the functional performance of the product

      Understand the difference between the requirements of the qualification
    specifications and standards, and the actual operation conditions. In other words,
    understand well the QT conditions and design the product so that it would not only be
    able to withstand the operation conditions on the short- and long-term basis, but would
    also pass the QT

     Understand the role and importance of FOAT and conduct PM whenever and
    wherever possible.



Session II

  4. Predictive Modeling (PM): FOAT cannot do without it


               “The probability of anything happening is in inverse ratio to its desirability”
                                                John W. Hazard, American attorney-at-law

                              “Any equation longer than three inches is most likely wrong”
                                                         Unknown Experimental Physicist




FOAT cannot do without predictive modeling (PM)

       FOAT cannot do without simple and meaningful predictive models. It is on the
    basis of such models that one decides which parameter should be accelerated, how
    to process the experimental data and, most importantly, how to bridge the gap
    between what one “sees” as a result of the accelerated testing and what he/she will
    possibly “get” in the actual operation conditions

      By considering the fundamental physics that might constrain the final design, PM
    can result in significant savings of time and expense and shed additional light on the
    physics of failure

      PM can be very helpful to predict reliability at conditions other than the FOAT and
    can provide important information about the device performance

      Modeling can be helpful in optimizing the performance and lifetime of the device, as
    well as in finding the best compromise between reliability, cost effectiveness
    and time-to-market.

Requirements for a good predictive model
  A good FOAT PM does not need to reflect all the possible situations, but should be
simple, should clearly indicate what affects what in the given phenomenon or structure,
and should be suitable/flexible for new applications, with new environmental conditions and
technology developments, as well as for accumulating reliability statistics on its
basis.

   The scope of the model depends on the type and the amount of information available.

   A FOAT PM does not have to be comprehensive, but has to be sufficiently generic, and
should include all the major variables affecting the phenomenon (failure mode) of interest.
It should contain all the most important parameters that are needed to describe and to
characterize the phenomenon of interest, while parameters of second-order
importance should not be included in the model.

  FOAT PMs take inputs from various theoretical analyses, test data, field data, customer
requirements, qualification spec requirements, the state of the art in the given field,
consequences of failure for the given failure mode, etc.

What the existing FOAT PMs predict

  Before deciding on a particular FOAT PM, one should anticipate the predominant
failure mechanism and then apply the appropriate model

  The most widespread PMs identify the mean time-to-failure (MTTF) in steady-state
conditions

  If one assumes a certain probability density function for the particular failure
mechanism, then, for a two-parametric distribution (like, e.g., the normal one) he/she
could construct this function based on the determined mean-time-to-failure and the
measured standard deviation (STD)

  For a single-parametric probability density distribution function, like an exponential one,
the knowledge of the MTTF is sufficient to determine the failure rate and to determine the
probability of failure for the given time in operation.
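
The two cases above can be sketched as follows: for the exponential law the MTTF alone fixes the failure rate and the probability of failure, while for the normal law the measured STD is needed as well. The numerical inputs are hypothetical and the function names are illustrative.

```python
import math


def pof_exponential(t, mttf):
    """Exponential law: failure rate is 1/MTTF, P(t) = 1 - exp(-t / MTTF)."""
    failure_rate = 1.0 / mttf
    return 1.0 - math.exp(-failure_rate * t)


def pof_normal(t, mttf, std):
    """Normal law: P(t) is the normal CDF with mean MTTF and standard deviation STD."""
    return 0.5 * (1.0 + math.erf((t - mttf) / (std * math.sqrt(2.0))))


# Hypothetical inputs: MTTF = 50,000 h, STD = 10,000 h, 20,000 h in operation.
print(pof_exponential(20_000.0, 50_000.0))
print(pof_normal(20_000.0, 50_000.0, 10_000.0))
```

With the same MTTF, the two assumed distributions give very different probabilities of failure well before the mean, which is why the choice of the distribution matters as much as the MTTF itself.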



Most widespread predictive models (PMs)

      Power law (used when the PoF is unclear),
      Boltzmann-Arrhenius equation (used when there is a belief that the elevated temperature is
    the major cause of failure),
      Coffin-Manson equation (inverse power law; used particularly when there is a need to
    evaluate the low-cycle-fatigue lifetime),
      Crack growth equations (used to assess the fracture toughness of brittle materials),
      Bueche-Zhurkov and Eyring equations (used to assess the MTTF when both the high
    temperature and stress are viewed as the major causes of failure),
      Peck equation (used to consider the role of the combined action of the elevated temperature
    and relative humidity),
      Black equation (used to consider the roles of the elevated temperature and current density),
      Miner-Palmgren rule (used to consider the role of fatigue when the yield stress is not
    exceeded),
      Creep rate equations,
      Weakest link model (used to evaluate the MTTF in extremely brittle materials with defects),
      Stress-strength interference model, which is, perhaps, the most flexible and well
    substantiated model.

Example: Boltzmann-Arrhenius equation
The Boltzmann-Arrhenius equation underlies many FOAT-related concepts. The MTTF,
τ, is proportional to an exponential function whose argument is a fraction:
the activation energy, Ua (in eV), is in the numerator, and the product of
Boltzmann’s constant, $k = 8.6174 \times 10^{-5}$ eV/K, and the absolute temperature, T, is in the
denominator:

$$\tau = \tau_0 \exp\left[\frac{U_a}{k\,(T - T^*)}\right]$$
The equation was first obtained by L. Boltzmann in the statistical theory of gases, and
then applied by S. Arrhenius to describe the inversion of sucrose. Arrhenius drew
attention to the fact that physical processes and chemical reactions in solid
bodies are also enhanced by elevated absolute temperature

The Boltzmann-Arrhenius equation is applicable when the failure mechanisms are
attributed to a combination of physical and chemical processes. Since the rates of
many physical processes (such as, say, solid state diffusion, many semiconductor
degradation mechanisms) and chemical reactions (such as, say, those governing battery life) are
temperature dependent, it is the temperature that is the acceleration parameter.
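
A hedged sketch of how the equation can be used to bridge test and field conditions: MTTFs measured at two FOAT temperatures give the activation energy and the pre-exponential factor, which are then used to project the MTTF at the service temperature. The test results are hypothetical, the function names are illustrative, and T* is treated here simply as an input parameter defaulting to zero, which is an assumption of this sketch.

```python
import math

K_B = 8.6174e-5  # Boltzmann's constant, eV/K (value quoted in these slides)


def mttf(tau0, ua_ev, temp_k, t_star_k=0.0):
    """tau = tau0 * exp(Ua / (k * (T - T*)))."""
    return tau0 * math.exp(ua_ev / (K_B * (temp_k - t_star_k)))


def fit_from_two_tests(tau_1, temp_1_k, tau_2, temp_2_k, t_star_k=0.0):
    """Recover (Ua, tau0) from MTTFs measured at two test temperatures."""
    inv_1 = 1.0 / (K_B * (temp_1_k - t_star_k))
    inv_2 = 1.0 / (K_B * (temp_2_k - t_star_k))
    ua_ev = math.log(tau_1 / tau_2) / (inv_1 - inv_2)
    tau0 = tau_1 * math.exp(-ua_ev * inv_1)
    return ua_ev, tau0


# Hypothetical FOAT results: MTTF of 2000 h at 398.15 K and 500 h at 423.15 K.
ua, tau0 = fit_from_two_tests(2000.0, 398.15, 500.0, 423.15)
mttf_use = mttf(tau0, ua, 328.15)  # projection to an assumed 55 C service temperature
print(f"Ua = {ua:.3f} eV, projected MTTF at 55 C = {mttf_use:.3g} h")
```

Two temperatures are the minimum needed to determine both parameters; with more test temperatures a least-squares fit of ln τ against 1/(k(T − T*)) would be the natural extension.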

Boltzmann-Arrhenius Equation and the PDfR concept

      The Boltzmann-Arrhenius equation addresses degradation processes and attributes
    degradation and possible failures caused by degradation to elevated temperatures and
    possibly to elevated humidity as well, i.e., to environmental factors.

      The failure rate for a system whose MTTF is given by the Boltzmann-Arrhenius
    equation can be found as

    $$\lambda = \frac{1}{\tau_0} \exp\left[-\frac{U_a}{k\,(T - T^*)}\right]$$

      The probability of failure at the moment t of time can be found as

    $$P = 1 - e^{-\lambda t}$$

    This formula is known as the exponential formula of reliability. If the probability of failure
    P is established for the given time t in operation, then the exponential formula of
    reliability can be used to determine the acceptable failure rate.
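
The two formulas, and the inverse use mentioned in the last sentence, can be sketched as follows. The values of τ0, Ua, temperature, time in operation and allowed probability are hypothetical, the function names are illustrative, and T* defaults to zero as an assumption of this sketch.

```python
import math

K_B = 8.6174e-5  # Boltzmann's constant, eV/K (value quoted in these slides)


def failure_rate(tau0, ua_ev, temp_k, t_star_k=0.0):
    """lambda = (1 / tau0) * exp(-Ua / (k * (T - T*)))."""
    return math.exp(-ua_ev / (K_B * (temp_k - t_star_k))) / tau0


def probability_of_failure(lam, t):
    """P = 1 - exp(-lambda * t), evaluated with expm1 for small arguments."""
    return -math.expm1(-lam * t)


def acceptable_failure_rate(p_allowed, t):
    """Invert P = 1 - exp(-lambda * t) for lambda."""
    return -math.log1p(-p_allowed) / t


# Hypothetical case: tau0 = 1e-6 h, Ua = 0.7 eV, 55 C service, 5 years in operation.
lam = failure_rate(1e-6, 0.7, 328.15)
p = probability_of_failure(lam, 5 * 8760.0)
lam_max = acceptable_failure_rate(1e-4, 5 * 8760.0)
print(f"lambda = {lam:.3e} 1/h, P(5 years) = {p:.3f}")
print(f"failure rate allowed for P = 1e-4 over 5 years: {lam_max:.3e} 1/h")
```

Comparing `lam` with `lam_max` indicates directly whether the assumed design meets the specified probability of failure for the given time in operation.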

Coffin-Manson Equation (Inverse Power Law)-1
    Many electronic materials and especially solder joints fail primarily because of the
elevated mechanical stresses and deformations (strains). The numerous existing
empirical and semi-empirical methods and approaches that address the low-cycle-fatigue
lifetime of solders are, in one way or another, based on the pioneering work of Coffin and
Manson

  It has been established that materials that experience elevated stresses and strains
within the elastic range fail because of elevated stresses, whether steady-state or variable,
while the materials that experience high stresses exceeding yield stress fail primarily
because of the inelastic deformations. Such a behavior, known as low-cycle-fatigue
conditions, is typical for solder materials, including even lead-free solders whose yield
point might be substantially higher than that for tin-lead solders

  The original Coffin-Manson equation is just an inverse power law that is applicable to
highly compliant materials exhibiting significant plastic deformations prior to failure. The
inverse power law is also used in some other, physically quite different, applications, such
as MTTF in random vibration tests (Steinberg’s formula), aging in high-power lasers, etc.

Coffin-Manson Equation (Inverse Power Law)-2

  The studies carried out in the 1990s addressed primarily flip-chip tin-lead solder joint
interconnections. Today’s studies address primarily the thermal fatigue life of ball-grid-array
(BGA) and pad-grid-array (PGA) systems and especially lead-free solder joints

  The thermally-induced stresses and strains in the flip-chip solder joints are caused by
the CTE mismatch of the chip and the package substrate materials, as well as by the
temperature gradients because of the difference in temperature between the “hot” chip
and the “cold” substrate. In BGA and PGA systems the stresses and strains are caused
by the mismatch of the package structure and the PCB (“system’s substrate”)

  The numerous suggested phenomenological semi-empirical models address the prediction
and improvement of the fatigue life of the solder material; the fatigue is caused by the
accumulated cyclic inelastic strain in the solder. This strain is due to the temperature
fluctuations resulting from the changes in the ambient temperature (temperature cycling)
and/or from heat dissipation in the package (power cycling).

Coffin-Manson Equation (Inverse Power Law)-3

       The modified Coffin-Manson model

    $$N_f = A\, f^{-\alpha}\, \Delta T^{-\beta} \exp\left(\frac{U}{k\,T_{max}}\right)$$

    can be used to model crack growth in solder and other metals due to temperature
    cycling. In the above formula, N_f is the number of cycles to failure, f is the cycling
    frequency, ∆T is the temperature range during a cycle, Tmax is the maximum
    temperature reached in each cycle, and k is Boltzmann’s constant. Typical values for
    the cycling frequency exponent α and the temperature range exponent β are around
    -1/3 and 2, respectively. Reduction in the cycling frequency reduces the number of
    cycles to failure. An increase in Tmax, like an increase in ∆T, also reduces the number
    of cycles to failure. The activation energy U is around 1.25 eV.
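
A minimal sketch of how the model translates test cycles into field cycles, assuming the Arrhenius factor enters as exp(U/(k·Tmax)) so that a higher Tmax shortens life, and using the typical exponents quoted above as defaults. The two cycling profiles are hypothetical and the function names are illustrative.

```python
import math

K_B = 8.6174e-5  # Boltzmann's constant, eV/K (value quoted in these slides)


def cycles_to_failure(a, freq, delta_t, t_max_k, alpha=-1.0 / 3.0, beta=2.0, u_ev=1.25):
    """N_f = A * f**(-alpha) * dT**(-beta) * exp(U / (k * Tmax))."""
    return a * freq ** (-alpha) * delta_t ** (-beta) * math.exp(u_ev / (K_B * t_max_k))


def cycling_acceleration_factor(use, test, **exponents):
    """N_f(use) / N_f(test); the constant A cancels in the ratio."""
    n_use = cycles_to_failure(1.0, *use, **exponents)
    n_test = cycles_to_failure(1.0, *test, **exponents)
    return n_use / n_test


# Hypothetical profiles given as (cycles per day, delta T in K, Tmax in K).
use_profile = (2.0, 40.0, 333.15)
test_profile = (24.0, 100.0, 398.15)
af = cycling_acceleration_factor(use_profile, test_profile)
print(f"one test cycle represents about {af:.0f} field cycles")
```

Working with the ratio avoids the need to know the constant A, which is usually the least certain quantity in the model.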

      In recent years a visco-plastic rate-dependent constitutive model, known as Anand’s
    model, has often been used in combination with FEA simulation to predict solder
    joint reliability. In Anand’s model (which includes one flow equation and three evolution
    equations) plasticity and creep phenomena are unified and described by the same set
    of flow and evolution relations.

Stress-strength (“interference”) model
      Fig.20. Stress-strength (“Interference”) models-1


                  Stress (Demand) and Strength (Capacity) Distributions
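
The figure itself is not reproduced here. As an illustration of the interference idea, the sketch below computes the probability that the stress (demand) exceeds the strength (capacity), assuming both are independent and normally distributed. That distributional choice and the numbers are assumptions of this sketch, not taken from the slide.

```python
import math


def interference_pof(mean_strength, std_strength, mean_stress, std_stress):
    """P(stress > strength) for independent, normally distributed stress and strength."""
    margin = mean_strength - mean_stress
    spread = math.hypot(std_strength, std_stress)  # STD of (strength - stress)
    return 0.5 * math.erfc(margin / (spread * math.sqrt(2.0)))


# Hypothetical values in MPa: strength 100 +/- 8, stress 70 +/- 6.
p_fail = interference_pof(100.0, 8.0, 70.0, 6.0)
print(f"probability of failure: {p_fail:.2e}")
```

In this example the safety margin equals three standard deviations of the difference, which gives a probability of failure of about 1.35e-3. Widening either distribution increases the overlap and hence the probability of failure, even when the mean values stay the same.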




  5. EXAMPLE OF A FOAT: Physics, Modeling, Experimentation, Prediction
                 “A theory without an experiment is dead. An experiment without a theory is blind”
                                                                    Unknown Reliability Engineer




Finite Element Analysis (FEA) Data




Predicted Stresses and Strains
                     in a Short Cylinder




Experimental bathtub curve for the solder joint
               interconnections in a flip-chip multichip module




Probability of failure of the solder joint interconnections
                       vs. failure rate




Thank you
               for taking my course


                      © 2009





Weitere ähnliche Inhalte

Was ist angesagt?

Maintenance management
Maintenance managementMaintenance management
Maintenance management
Anupam Kumar
 
Introdution to POF reliability methods
Introdution to POF reliability methodsIntrodution to POF reliability methods
Introdution to POF reliability methods
ASQ Reliability Division
 
A CASE STUDY ON PREVENTIVE MAINTENANCE
A CASE STUDY ON PREVENTIVE MAINTENANCEA CASE STUDY ON PREVENTIVE MAINTENANCE
A CASE STUDY ON PREVENTIVE MAINTENANCE
Zubair Ali ali
 

Was ist angesagt? (20)

Maintenance management
Maintenance managementMaintenance management
Maintenance management
 
Introdution to POF reliability methods
Introdution to POF reliability methodsIntrodution to POF reliability methods
Introdution to POF reliability methods
 
Fundamentals of reliability engineering and applications part1of3
Fundamentals of reliability engineering and applications part1of3Fundamentals of reliability engineering and applications part1of3
Fundamentals of reliability engineering and applications part1of3
 
Reliability
ReliabilityReliability
Reliability
 
A CASE STUDY ON PREVENTIVE MAINTENANCE
A CASE STUDY ON PREVENTIVE MAINTENANCEA CASE STUDY ON PREVENTIVE MAINTENANCE
A CASE STUDY ON PREVENTIVE MAINTENANCE
 
Maintenance
MaintenanceMaintenance
Maintenance
 
Reliability
ReliabilityReliability
Reliability
 
MAINTENANCE.ppt
MAINTENANCE.pptMAINTENANCE.ppt
MAINTENANCE.ppt
 
PPT ON DESIGN FAILURE MODE AND EFFECT ANALYSIS (DFMEA)
PPT ON DESIGN FAILURE MODE AND EFFECT ANALYSIS (DFMEA)PPT ON DESIGN FAILURE MODE AND EFFECT ANALYSIS (DFMEA)
PPT ON DESIGN FAILURE MODE AND EFFECT ANALYSIS (DFMEA)
 
Maintenance Management (presentation)
Maintenance Management (presentation)Maintenance Management (presentation)
Maintenance Management (presentation)
 
PRINCIPLES AND PRACTICES OF MAINTENANCE PLANNING
PRINCIPLES AND PRACTICES OF MAINTENANCE PLANNINGPRINCIPLES AND PRACTICES OF MAINTENANCE PLANNING
PRINCIPLES AND PRACTICES OF MAINTENANCE PLANNING
 
AUTOMOTIVE SHOP SAFETY.pptx
AUTOMOTIVE SHOP SAFETY.pptxAUTOMOTIVE SHOP SAFETY.pptx
AUTOMOTIVE SHOP SAFETY.pptx
 
General Maintenance
General MaintenanceGeneral Maintenance
General Maintenance
 
Plant Maintenance
Plant MaintenancePlant Maintenance
Plant Maintenance
 
Reliability engineering ppt-Internship
Reliability engineering ppt-InternshipReliability engineering ppt-Internship
Reliability engineering ppt-Internship
 
Maintenance and Safety Engineering.pdf
Maintenance and Safety Engineering.pdfMaintenance and Safety Engineering.pdf
Maintenance and Safety Engineering.pdf
 
EQUIPMENT AND MACHINERY MAINTENANCE
EQUIPMENT AND MACHINERY MAINTENANCEEQUIPMENT AND MACHINERY MAINTENANCE
EQUIPMENT AND MACHINERY MAINTENANCE
 
Unit 1 - PRINCIPLES AND PRACTICES OF MAINTENANCE PLANNING
Unit 1 - PRINCIPLES AND PRACTICES OF MAINTENANCE PLANNINGUnit 1 - PRINCIPLES AND PRACTICES OF MAINTENANCE PLANNING
Unit 1 - PRINCIPLES AND PRACTICES OF MAINTENANCE PLANNING
 
Planned Maintenance
Planned MaintenancePlanned Maintenance
Planned Maintenance
 
Machinery Maintenance
Machinery MaintenanceMachinery Maintenance
Machinery Maintenance
 

Andere mochten auch

Equipo 3 innovación tecnológica , internet ,códigos QR
Equipo 3   innovación tecnológica , internet ,códigos QREquipo 3   innovación tecnológica , internet ,códigos QR
Equipo 3 innovación tecnológica , internet ,códigos QR
oscarsilvaaula51
 
Igv e ipm 2010
Igv e ipm  2010Igv e ipm  2010
Igv e ipm 2010
COEECI
 
Cedula mafl940711 mmcrlc03.pdf subes
Cedula mafl940711 mmcrlc03.pdf subesCedula mafl940711 mmcrlc03.pdf subes
Cedula mafl940711 mmcrlc03.pdf subes
Rebeca Marin Flores
 
Millorem la redacció (1)
Millorem la redacció (1)Millorem la redacció (1)
Millorem la redacció (1)
Antònia Travé
 
Carpentier la cultura de los pueblos que habitan en las tierras del mar caribe
Carpentier   la cultura de los pueblos que habitan en las tierras del mar caribeCarpentier   la cultura de los pueblos que habitan en las tierras del mar caribe
Carpentier la cultura de los pueblos que habitan en las tierras del mar caribe
Lapiscina
 
Opeb and prefunding
Opeb and prefundingOpeb and prefunding
Opeb and prefunding
taatla
 
Work management of sm es
Work management of sm esWork management of sm es
Work management of sm es
Simone Santos
 

Andere mochten auch (20)

Probabilistic design for reliability (pdfr) in electronics part2of2
Probabilistic design for reliability (pdfr) in electronics part2of2Probabilistic design for reliability (pdfr) in electronics part2of2
Probabilistic design for reliability (pdfr) in electronics part2of2
 
Quiz sobre el fútbol fran serrano
Quiz sobre el fútbol fran serranoQuiz sobre el fútbol fran serrano
Quiz sobre el fútbol fran serrano
 
Turismo y sustentabilidad ambiental en formosa.
Turismo y sustentabilidad ambiental en formosa.Turismo y sustentabilidad ambiental en formosa.
Turismo y sustentabilidad ambiental en formosa.
 
Intelligent Networks ATMS
Intelligent Networks ATMSIntelligent Networks ATMS
Intelligent Networks ATMS
 
Equipo 3 innovación tecnológica , internet ,códigos QR
Equipo 3   innovación tecnológica , internet ,códigos QREquipo 3   innovación tecnológica , internet ,códigos QR
Equipo 3 innovación tecnológica , internet ,códigos QR
 
Igv e ipm 2010
Igv e ipm  2010Igv e ipm  2010
Igv e ipm 2010
 
Cedula mafl940711 mmcrlc03.pdf subes
Cedula mafl940711 mmcrlc03.pdf subesCedula mafl940711 mmcrlc03.pdf subes
Cedula mafl940711 mmcrlc03.pdf subes
 
Millorem la redacció (1)
Millorem la redacció (1)Millorem la redacció (1)
Millorem la redacció (1)
 
La historia de internet . terminado
La historia de internet . terminadoLa historia de internet . terminado
La historia de internet . terminado
 
Fichero esp 4to
Fichero esp 4toFichero esp 4to
Fichero esp 4to
 
Neix Santana
Neix SantanaNeix Santana
Neix Santana
 
Cruz roja y proyecto alter
Cruz roja y proyecto alterCruz roja y proyecto alter
Cruz roja y proyecto alter
 
Carpentier la cultura de los pueblos que habitan en las tierras del mar caribe
Carpentier   la cultura de los pueblos que habitan en las tierras del mar caribeCarpentier   la cultura de los pueblos que habitan en las tierras del mar caribe
Carpentier la cultura de los pueblos que habitan en las tierras del mar caribe
 
Manual peal
Manual pealManual peal
Manual peal
 
Sesión Nº 3 Guía de Plan marketing-online para pymes
Sesión Nº 3 Guía de Plan marketing-online para pymesSesión Nº 3 Guía de Plan marketing-online para pymes
Sesión Nº 3 Guía de Plan marketing-online para pymes
 
LA CRÓNICA 614
LA CRÓNICA 614LA CRÓNICA 614
LA CRÓNICA 614
 
Opeb and prefunding
Opeb and prefundingOpeb and prefunding
Opeb and prefunding
 
Portafolioeducativo Alma Navarro1
Portafolioeducativo Alma Navarro1Portafolioeducativo Alma Navarro1
Portafolioeducativo Alma Navarro1
 
Work management of sm es
Work management of sm esWork management of sm es
Work management of sm es
 
Reggeaton
ReggeatonReggeaton
Reggeaton
 

Ähnlich wie Probabilistic design for reliability (pdfr) in electronics part1of2

Software Quality Analysis Using Mutation Testing Scheme
Software Quality Analysis Using Mutation Testing SchemeSoftware Quality Analysis Using Mutation Testing Scheme
Software Quality Analysis Using Mutation Testing Scheme
Editor IJMTER
 
Non Functional Requirements in Requirement Engineering.pdf
Non Functional Requirements in Requirement Engineering.pdfNon Functional Requirements in Requirement Engineering.pdf
Non Functional Requirements in Requirement Engineering.pdf
JeevaPadmini
 
Volume 2-issue-6-1983-1986
Volume 2-issue-6-1983-1986Volume 2-issue-6-1983-1986
Volume 2-issue-6-1983-1986
Editor IJARCET
 
Volume 2-issue-6-1983-1986
Volume 2-issue-6-1983-1986Volume 2-issue-6-1983-1986
Volume 2-issue-6-1983-1986
Editor IJARCET
 
SMRP 24th Conf Paper - Vextec -J Carter
SMRP 24th Conf Paper - Vextec -J CarterSMRP 24th Conf Paper - Vextec -J Carter
SMRP 24th Conf Paper - Vextec -J Carter
jcarter1972
 

Ähnlich wie Probabilistic design for reliability (pdfr) in electronics part1of2 (20)

Computational Modeling & Simulation in Orthopedics: Tools to Comply in an Ev...
Computational Modeling & Simulation in Orthopedics:  Tools to Comply in an Ev...Computational Modeling & Simulation in Orthopedics:  Tools to Comply in an Ev...
Computational Modeling & Simulation in Orthopedics: Tools to Comply in an Ev...
 
Accelerated reliability and durability testing technology flyer
Accelerated reliability and durability testing technology flyerAccelerated reliability and durability testing technology flyer
Accelerated reliability and durability testing technology flyer
 
Safety
SafetySafety
Safety
 
Practical reliability
Practical reliabilityPractical reliability
Practical reliability
 
U130402132138
U130402132138U130402132138
U130402132138
 
Software Quality Analysis Using Mutation Testing Scheme
Software Quality Analysis Using Mutation Testing SchemeSoftware Quality Analysis Using Mutation Testing Scheme
Software Quality Analysis Using Mutation Testing Scheme
 
Managing your OnStream Inspection Program and External vs Internal inspections
Managing your OnStream Inspection Program and External vs Internal inspectionsManaging your OnStream Inspection Program and External vs Internal inspections
Managing your OnStream Inspection Program and External vs Internal inspections
 
ATI Courses Professional Development Short Course Spacecraft Quality Assuranc...
ATI Courses Professional Development Short Course Spacecraft Quality Assuranc...ATI Courses Professional Development Short Course Spacecraft Quality Assuranc...
ATI Courses Professional Development Short Course Spacecraft Quality Assuranc...
 
Non Functional Requirements in Requirement Engineering.pdf
Non Functional Requirements in Requirement Engineering.pdfNon Functional Requirements in Requirement Engineering.pdf
Non Functional Requirements in Requirement Engineering.pdf
 
Rbi final report
Rbi final reportRbi final report
Rbi final report
 
Basics of Reliability Engineering
Basics of Reliability EngineeringBasics of Reliability Engineering
Basics of Reliability Engineering
 
Understanding IEC 62304
Understanding IEC 62304Understanding IEC 62304
Understanding IEC 62304
 
Volume 2-issue-6-1983-1986
Volume 2-issue-6-1983-1986Volume 2-issue-6-1983-1986
Volume 2-issue-6-1983-1986
 
Volume 2-issue-6-1983-1986
Volume 2-issue-6-1983-1986Volume 2-issue-6-1983-1986
Volume 2-issue-6-1983-1986
 
SMRP 24th Conf Paper - Vextec -J Carter
SMRP 24th Conf Paper - Vextec -J CarterSMRP 24th Conf Paper - Vextec -J Carter
SMRP 24th Conf Paper - Vextec -J Carter
 
FMEA: The Good, The Bad, and The Ugly
FMEA: The Good, The Bad, and The UglyFMEA: The Good, The Bad, and The Ugly
FMEA: The Good, The Bad, and The Ugly
 
Carolien Creemers & Maarten Bressinck Talk at UX Antwerp Meetup
Carolien Creemers & Maarten Bressinck Talk at UX Antwerp MeetupCarolien Creemers & Maarten Bressinck Talk at UX Antwerp Meetup
Carolien Creemers & Maarten Bressinck Talk at UX Antwerp Meetup
 
IRJET- A Study on Software Reliability Models
IRJET-  	  A Study on Software Reliability ModelsIRJET-  	  A Study on Software Reliability Models
IRJET- A Study on Software Reliability Models
 
Jamshed alam seinar report copy
Jamshed alam seinar report copyJamshed alam seinar report copy
Jamshed alam seinar report copy
 
Reliability Engineering and Terotechnology
Reliability Engineering and TerotechnologyReliability Engineering and Terotechnology
Reliability Engineering and Terotechnology
 





Probabilistic Design for Reliability (PDfR) in Electronics, Part 1 of 2

  • 1. Probabilistic Design for Reliability (PDfR) in Electronics, Part I. Dr. E. Suhir. ©2011 ASQ & Presentation Suhir. Presented live on Jan 03~06th, 2011. http://reliabilitycalendar.org/The_Reliability_Calendar/Short_Courses/Short_Courses.html
  • 2. ASQ Reliability Division Short Course Series. One of the monthly webinars on topics of interest to reliability engineers. To view recorded webinars (available to ASQ Reliability Division members only), visit asq.org/reliability. To sign up for the live webinars, which are free and open to anyone, visit reliabilitycalendar.org and select English Webinars to find links to register for upcoming events.
  • 3. PROBABILISTIC DESIGN for RELIABILITY (PDfR) CONCEPT, the Roles of Failure Oriented Accelerated Testing (FOAT) and Predictive Modeling (PM), and a Novel Approach to Qualification Testing (QT). “You can see a lot by observing.” Yogi Berra, American Baseball Player. “It is easy to see, it is hard to foresee.” Benjamin Franklin, American Scientist and Statesman. E. Suhir: Bell Laboratories, Physical Sciences and Engineering Research Division, Murray Hill, NJ (ret); University of California, Dept. of Electrical Engineering, Santa Cruz, CA; University of Maryland, Dept. of Mechanical Engineering, College Park, MD; and ERS Co. LLC, 727 Alvina Ct., Los Altos, CA, 94024, USA. Tel. 650-969-1530, cell. 408-410-0886, e-mail: suhire@aol.com. Four-hour ASQ-IEEE RS Webinar short course, Dr. E. Suhir, January 3-6, 2011, Page 1
  • 4. Contents. Session I: 1. Introduction: background, motivation, incentive; 2. Reliability engineering as part of applied probability and Probabilistic Risk Management (PRM) bodies of knowledge; 3. Failure Oriented Accelerated Testing (FOAT): its role, attributes, challenges, pitfalls and interaction with other accelerated test categories. Session II: 4. Predictive Modeling (PM): FOAT cannot do without it; 5. Example of a FOAT: physics, modeling, experimentation, prediction. Session III: 6. Probabilistic Design for Reliability (PDfR), its role and significance. Session IV: 7. General PDfR approach using probability density functions (pdf); 8. Twelve steps to be conducted to add value to the existing practice; 9. Do electronic industries need new approaches to qualify their devices into products?; 10. Concluding remarks.
  • 5. Session I. 1. Introduction: background, motivation, incentive. “Vision without action is a daydream. Action without vision is a nightmare.” Japanese saying. “The problem is not that old age comes. The problem is that young age passes.” Common wisdom.
  • 6. Background. The short-term, down-to-earth, practical goal of a particular electronic or photonic device manufacturer is to conduct and pass the established qualification tests, without questioning whether they are perfect or not. The ultimate long-term and broad goal of the electronic, opto-electronic and photonic industries, regardless of a particular manufacturer or even a particular product, is to make the industries' deliverables sufficiently reliable in the field and consistently good in performance, and so to earn the trust of the customer. Qualification testing (QT), such as that prescribed by the JEDEC, Telcordia, AEC or MIL specs, is the major means that the electronic, opto-electronic and photonic industries use to make their viable-and-promising devices into reliable-and-marketable products.
  • 7. Motivation. It is well known, however, that devices and systems that passed the existing qualification tests often fail in the field. Should it be this way? Is this indeed a problem? Are the existing qualification specifications adequate? Do electronic and photonic industries need new approaches to qualify their devices into products? If they do, could today's qualification specifications and testing procedures be improved to such an extent that, if a device passed these tests, its performance in the field would be satisfactory? On the other hand, there is a perception, perhaps a rather substantiated one, that some electronic components “never fail”. Although one should never say “never”, such a perception exists because some products might be too robust and, as a consequence, more costly than necessary. Could the situation be changed, and could the cost be brought down considerably, if one were able to assess the actual, most likely superfluous, probability of non-failure in the field and come up, for a particular product, with the best compromise between reliability, cost and time-to-market? Would it be possible to “prescribe” (specify), predict and, if necessary, even control a low enough probability of failure for a product that operates under the given stress (not necessarily mechanical, of course) conditions for the given time?
  • 8. Incentive. We argue that improvements in QT, as well as in the existing best practices, are indeed possible, provided that the Probabilistic Design for Reliability (PDfR) concept is thoroughly developed and the corresponding methodologies are employed. One effective way to improve the existing QT and specs is to: (1) conduct, on a wide scale, appropriate Failure Oriented Accelerated Testing (FOAT) at both the design stage (DFOAT) and the manufacturing stage (MFOAT); (2) since DFOAT cannot do without predictive modeling (PM), carry out PM, whenever and wherever possible, to understand the physics of failure and to predict, based on the DFOAT, the probability of failure in the field; (3) revisit, review and revise the existing QT practices, procedures and specifications, considering the DFOAT and, to a lesser extent, MFOAT data obtained for the most vulnerable elements of the device of interest; (4) develop and widely implement the PDfR methodologies and algorithms, bearing in mind that “nobody and nothing is perfect” and that the probability of failure in the field is never zero, but can be predicted and, if necessary, minimized, controlled and maintained at an acceptably low level during product operation.
  • 9. 2. Reliability engineering as part of applied probability and Probabilistic Risk Management (PRM) bodies of knowledge. “A pinch of probability is worth a pound of perhaps.” James G. Thurber, American writer and cartoonist. “In the long run we are all dead.” John Maynard Keynes, British economist.
  • 10. Reliability engineering: deals with failure modes and mechanisms, “root” causes of occurrence of various failures, the role of various defects, methods to estimate and prevent failures, and probability-based designs for reliability; provides guidance on how to make a viable device into a reliable and marketable product; for products in which a certain level of failures is considered acceptable (e.g., consumer products), examines ways of bringing the failure rate down to an allowable level; for products in which a failure is a catastrophe, examines and considers ways of making the probability of failure as low as necessary or possible.
  • 11. Reliability engineering as part of applied probability and probabilistic risk management (PRM) bodies of knowledge. Reliability is part of the applied probability and probabilistic risk management (PRM) bodies of knowledge, and includes the item's (system's) dependability, durability, maintainability, reparability, availability, testability, etc., i.e., probabilities of the corresponding events or characteristics. Each of these characteristics is measured as a certain probability and can be of greater or lesser importance depending on the particular function and operating conditions of the item or system, and on the consequences of failure. Applied probability and Probabilistic Risk Management (PRM) approaches and techniques put the art of Reliability Engineering on solid, “reliable” ground.
  • 12. “If a man will begin with certainties, he will end with doubts; but if he will be content to begin with doubts, he shall end in certainties.” Sir Francis Bacon, English Philosopher and Statesman. “We see that the theory of probability is at heart only common sense reduced to calculations; it makes us appreciate with exactitude what reasonable minds feel by a sort of instinct, often without being able to account for it… The most important questions of life are, for the most part, really only problems of probability.” Pierre-Simon, Marquis de Laplace. “Mathematical formulas have their own life, they are smarter than we, even smarter than their authors, and provide more than what has been put into them.” Heinrich Hertz, German Physicist.
  • 13. Reliability should be taken care of on a permanent basis. Reliability evaluation and assurance cannot be delayed until the device is made (although this is often the case in actual industrial practice). Reliability should be: “conceived” at the early stages of the design (a reliability engineer and an electronics engineer should start working together from the very beginning of the device/system development); implemented during manufacturing (through a high-quality manufacturing process); qualified and evaluated by electrical, optical, environmental and mechanical testing at both the design and the manufacturing stages (the customer requirements and the general qualification requirements are to be considered); checked (screened) during production (by implementing an adequate burn-in process); and, if necessary and appropriate, maintained in the field during the product’s operation, especially at the early stages of the product’s use (by employing, e.g., technical diagnostics, prognostication and health monitoring methods and instrumentation).
  • 14. Three classes of engineering products from the reliability point of view. See E. Suhir, Applied Probability for Engineers and Scientists, McGraw-Hill, 1997. Class I: The product has to be made as reliable as possible. Failure should not be permitted. Examples are some military or space objects. Class II: The product has to be made as reliable as possible, but only for a certain level of demand (stress, loading). Failure is a catastrophe. Examples are civil engineering structures, bridges, ships, aircraft, cars. Class III: The reliability does not have to be very high. Failures are permitted, but should be restricted. Examples are consumer products, commercial electronics, agricultural equipment.
  • 15. Class I (military or similar) products. The product (object) has to be made as reliable as possible. Failure is viewed as a catastrophe. Examples are some warfare systems, military aircraft, battleships, spacecraft. Cost is not a dominating factor. The products usually have a single customer, such as the government or a big firm. The reliability requirements are defined in the form of government standards. The standards not only formulate the reliability requirements for the product, but also specify the methods that are to be used to prove (demonstrate) the reliability, and often even prescribe how the system must be manufactured, tested and screened. It is typically the customer, not the manufacturer, who sets the reliability standards.
  • 16. Class II (industrial or similar) products. The product (system, structure) has to be made as reliable as possible, but only for a certain specified level of loading (demand). If the actual load (waves, winds, earthquakes, etc.) happens to be larger than the design demand, the product might fail, although the probability of such a failure should be determined beforehand and should (and can) be made very small. Examples are: long-haul communication systems, civil engineering structures (bridges, tunnels, towers), passenger elevators, ocean-going vessels, offshore structures, commercial aircraft, railroad carriages, cars, some medical equipment. These are highly expensive products that are produced in large quantities, and therefore application of Class I requirements would lead to unjustifiable, unfeasible and unacceptable expenses. Failure is a catastrophe and might be associated with loss of human lives and with significant economic losses. The products are typically intended for industrial, rather than government, markets. These markets are characterized by a rather high volume of production (buildings, bridges, ships, aircraft, automobiles, telecommunication networks, etc.), but also by fewer and more sophisticated customers than in the commercial (Class III) market.
  • 17. Class III (consumer, commercial) products. The typical market is the consumer market. An individual consumer is a very small part of the total consumer base. The product is inexpensive and manufactured in mass quantities. The demand for the product is usually driven by its cost and time-to-market, rather than by its reliability. As long as the product is “sellable”, its reliability does not have to be very high: it should only be adequate for customer acceptance and reasonable satisfaction. Simple and innovative products, which have a high degree of customer appeal and are in significant demand, may be able to prosper, at least for some time, even if they are not very reliable. Failure is not a catastrophe: a certain reasonable level of failures during normal operation of the product is acceptable, as long as the failure rate is within the anticipated/expected range. Reliability testing is limited, and improvements are often implemented based on field feedback. It is typically the manufacturer, not the consumer, who sets the reliability standards, if any, for the product. Often no special reliability standards are followed, and customer satisfaction (on a statistical basis) is the major criterion of the viability and quality of the product.
  • 18. Reliability, cost-effectiveness, and time-to-market. Reliability, cost-effectiveness and time-to-market considerations play an important role in design, materials selection and manufacturing decisions, and are the key issues in competing in the global marketplace. A company cannot be successful if its products are not cost-effective, or do not have a worthwhile lifetime and service reliability to match the expectations of the customer. Too low a reliability can lead to a total loss of business. Product failures have an immediate, and often dramatic, effect on the profitability and even the very existence of a company. Profits decrease as the failure rate increases. This is due not only to the increase in the cost of replacing or repairing parts, but, more importantly, to the losses due to the interruption in service, not to mention the “moral losses”. These make obvious dents in the company’s reputation and, as a consequence, affect its sales. The time available to develop and produce products is rapidly decreasing. This places significant pressure on both business people and reliability engineers, who are supposed to come up with a reliable product and to confirm its long-term reliability in a short period of time, in order to make their device a product and to make this product successful in the marketplace. Each business, whether small or large, should try to optimize its overall approach to reliability. “Reliability costs money”, and therefore a business must understand the cost of reliability, both the “direct” cost (the cost of its own operations) and the “indirect” cost (the cost to its customers and their willingness to make future purchases and to pay more for more reliable products).
  • 19. 3. Failure Oriented Accelerated Testing (FOAT): its role, attributes, challenges, pitfalls and interaction with other accelerated test categories. “Nothing is impossible. It is often merely for an excuse that we say that things are impossible.” Francois de La Rochefoucauld, French philosopher. “The truth is rarely pure and never simple.” Oscar Wilde, Irish writer, “The Importance of Being Earnest”.
  • 20. Why accelerated tests? It is impractical and uneconomical to wait for failures when the mean-time-to-failure of a typical present-day electronic device (piece of equipment) is on the order of hundreds of thousands of hours. Accelerated testing (AT) enables one to gain greater control over the reliability of a product. AT has become a powerful means of improving reliability. This is true regardless of whether (irreversible or reversible) failures will or will not actually occur during the FOAT (“testing to fail”) or QT (“testing to pass”). In order to accelerate the material’s (device’s) degradation and/or failure, one has to deliberately “distort” (“skew”) one or more parameters (temperature, humidity, load, current, voltage, etc.) affecting the device's functional and/or mechanical performance and/or its environmental durability.
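The argument on this slide can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes a constant failure rate (exponential time-to-failure), which the slides do not prescribe, and every number in it (the 200,000-hour MTTF, the 1,000-hour test, the 100-unit sample, the acceleration factor of 50) is hypothetical.

```python
import math


def prob_failure_by(t_hours, mttf_hours):
    """Probability that a unit fails by time t.

    Assumes a constant failure rate (exponential model); this is an
    illustrative assumption, not a model taken from the slides.
    """
    return 1.0 - math.exp(-t_hours / mttf_hours)


# All values below are hypothetical, chosen only for illustration.
MTTF_HOURS = 200_000.0        # "hundreds of thousands of hours"
TEST_HOURS = 1_000.0          # about six weeks of continuous testing
SAMPLE_SIZE = 100             # units placed on test
ACCELERATION_FACTOR = 50.0    # assumed life reduction under elevated stress

# Test at use conditions: almost nothing fails within the test window.
p_use = prob_failure_by(TEST_HOURS, MTTF_HOURS)
expected_failures_use = SAMPLE_SIZE * p_use

# Test at elevated stress: the effective MTTF shrinks by the acceleration factor.
p_acc = prob_failure_by(TEST_HOURS, MTTF_HOURS / ACCELERATION_FACTOR)
expected_failures_acc = SAMPLE_SIZE * p_acc

print(f"use conditions:  P(fail) = {p_use:.4f}, expected failures = {expected_failures_use:.2f}")
print(f"elevated stress: P(fail) = {p_acc:.4f}, expected failures = {expected_failures_acc:.2f}")
```

Under these assumptions, the unaccelerated test is expected to produce less than one failure, too few to learn anything about failure modes or statistics, while the accelerated test produces enough failures to analyze.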
  • 21. Accelerated test categories: traditional definitions.
      Test types (categories): product development (verification) tests (PDTs); qualification (“screening”) tests (QTs); accelerated life tests (ALTs), highly accelerated life tests (HALTs), and failure oriented accelerated tests (FOATs).
      Objective. PDTs: technical feedback to ensure that the taken design approach is viable (acceptable). QTs: proof of reliability; demonstration that the product is qualified to serve in the given capacity. ALTs/HALTs/FOATs: understand modes and mechanisms of failure and, time permitting, accumulate failure statistics.
      End point. PDTs: time, type, level, and/or number of failures. QTs: predetermined time and/or number of cycles, and/or an excessive (unexpected) number of failures. ALTs/HALTs/FOATs: predetermined number or percent of failures.
      Follow-up activity. PDTs: failure analysis, design decision. QTs: pass/fail decision. ALTs/HALTs/FOATs: failure analysis and, time permitting, statistical analysis of the test data.
      Perfect (ideal) test. PDTs: specific definition(s). QTs: no failure in a long time. ALTs/HALTs/FOATs: numerous failures in a short time.
  • 22. Accelerated test categories: updated definitions.
      Test types (categories): product development (verification) testing (PDT); qualification (“screening”) testing (QT), at the design stage (DQT) and at the manufacturing stage (MQT); accelerated life testing (ALT) = failure oriented accelerated testing (FOAT), at the design stage (DFOAT) and at the manufacturing stage (MFOAT) = Hobbs’ highly accelerated life testing (HHALT) = accelerated burn-in.
      Objective. PDT: technical feedback to ensure that the taken design approach is viable (acceptable). QT: proof of reliability; demonstration that the item is qualified into a product, i.e., is able to serve in the given capacity. DFOAT: understand the physics (modes and mechanisms) of failure and the failure limits and, time permitting, accumulate failure statistics. MFOAT: assess failure limits, weed out infant mortalities.
      End point. PDT: time, type, level, and/or number of failures. QT: predetermined time and/or number of cycles, and/or an excessive (unexpected) number of failures. FOAT: predetermined number or percent of failures.
      Follow-up activity. PDT: failure analysis, design decision. QT: pass/fail decision. DFOAT: failure analysis and, time permitting, also statistical analysis of the test data. MFOAT: pass/fail decision.
      Perfect (ideal) test. PDT: specific definitions. QT: no failures in a long time. DFOAT: numerous failures in a short time. MFOAT: no failures in a long time.
  • 23. Some of the most common accelerated test conditions (stimuli): High Temperature (Steady-State) Soaking/Storage/Baking/Aging/Dwell, Low Temperature Storage, Temperature (Thermal) Cycling, Power Cycling, Power Input and Output, Thermal Shock, Thermal Gradients, Fatigue (Crack Propagation) Tests, Mechanical Shock, Drop Shock (Tests), Random Vibration Tests, Sinusoidal Vibration Tests (with a given or variable frequency), Creep/Stress-Relaxation Tests, Electrical Current Extremes, Voltage Extremes, High Humidity, Radiation (UV, cosmic, X-rays), Altitude, Space Vacuum.
  • 24. Elevated stress. AT uses an elevated stress level and/or a higher stress-cycle frequency as effective stimuli to precipitate failures over a much shorter time frame. The “stress” in reliability engineering does not necessarily have to be mechanical or thermo-mechanical: it could be electrical current or voltage, high (or low) temperature, high humidity, high frequency, high pressure or vacuum, cycling rate, or any other factor (stimulus) responsible for the reliability of the device or the equipment. AT must be specifically designed for the product under test. The experimental design of AT should consider the anticipated failure modes and mechanisms, typical use conditions, and the required or available test resources, approaches and techniques.
  • 25. Qualification Testing (QT) is a must. The objective of qualification testing (QT) is to prove that the reliability of the product under test is above a specified level. This level is usually measured by the percentage of failures per lot and/or by the number of failures per unit time (failure rate). The typical requirement is no more than a few percent of failing parts out of the total lot (population). QT enables one to “reduce to a common denominator” different products, as well as similar products produced by different manufacturers. QT reflects the state of the art in a particular field of engineering, and the typical requirements for the performance of the product. Industry cannot do without QT. Testing is time-limited and is generally non-destructive (not failure oriented).
  • 26. Today’s Qualification Testing (QT): shortcomings. Today’s qualification standards and requirements are only good for what they are intended: to confirm that the given device is qualified as a product to serve in a particular capacity. If a product passed the standardized qualification tests, it is not always clear why it was good, and if the product failed the tests, it is equally unclear what could be done to improve its reliability. If a product passed the qualification tests, it does not mean that there will be no failures in the field, nor is it clear how likely or unlikely these failures might be. Since QT is not failure oriented, it is unable to provide the most important, ultimate information about the reliability of the product: the probability of its failure after the given time in service and under the given service (operation, stress) conditions.
  • 27. Failure Oriented Accelerated Testing (FOAT)-1. FOAT is aimed at revealing and understanding the physics of expected or observed failures. Unlike QT, FOAT is able to detect the possible failure modes and mechanisms. Another objective of FOAT is to accumulate failure statistics. Thus, FOAT deals with the two major aspects of reliability engineering: the physics and the statistics of failure. Adequately planned, carefully conducted, and properly interpreted FOAT provides a consistent basis for the prediction of the probability of failure after the given time in service. Well-designed and thoroughly implemented FOAT can dramatically facilitate the solutions to many engineering and business-related problems associated with cost effectiveness and time-to-market. This information can be helpful in understanding what should be changed to design a viable and reliable product. Indeed, any structural, materials and/or technological improvement can be “translated”, using the FOAT data, into the probability of failure for the given duration of operation under the given service (environmental) conditions.
  • 28. Failure Oriented Accelerated Testing (FOAT)-2. FOAT should be conducted in addition to, and preferably long before, the qualification tests. There might also be situations in which FOAT can be used as an effective substitute for QT, especially for new products for which acceptable qualification standards do not yet exist. While it is QT that makes a device into a product, it is FOAT that enables one to understand the reliability physics behind the product and, based on the appropriate PM, to create a reliable product with a predicted probability of failure. There is always a temptation to broaden (enhance) the stress as far as possible to achieve the maximum “destructive effect” (FOAT effect) in the shortest period of time. Unfortunately, accelerated test conditions may sometimes hasten failure mechanisms that are different from those that would actually be observed in service conditions (a “shift” in the modes and/or mechanisms of failure).
  • 29. FOAT pitfalls. Because of the existence of such “shifts”, it is always necessary to correctly identify the expected failure modes and mechanisms, and to establish the appropriate stress limits, in order to prevent “shifts” in the original (actual) dominant failure mechanism. Examples are: a change in materials properties at high or low temperatures, time-dependent strain due to diffusion, creep at elevated temperatures, occurrence and movement of dislocations caused by an elevated stress, or a situation in which a bimodal distribution of failures (a dual mechanism of failure) occurs. Since infant mortality (“early”) failures, in particular, might occur concurrently with the anticipated (“operational”) failures, it is imperative to make sure that the “early” and “operational” failures are well separated in the tests. Different failure mechanisms are characterized by different physical phenomena and different activation energies, and therefore a simple superposition of the effects of two mechanisms is unacceptable: it can result in erroneous reliability projections.
  • 30. Burn-in testing (BIT) is a special type of FOAT-1. Burn-in (“screening”) testing (BIT) is widely implemented to detect and eliminate infant mortality failures. BIT could be viewed as a special type of manufacturing FOAT (MFOAT). BIT is needed to stabilize the performance of the device in use. BIT is supposed to stimulate failures in defective devices by accelerating the stresses that will cause these devices to fail, without damaging good items. The bathtub curve of a device that has undergone BIT is supposed to consist of the steady-state and wear-out portions only. The rationale behind BIT is the concept that mass production of electronic devices generates two categories of products that passed QT: 1) robust (“strong”) components that are not expected to fail in the field and 2) relatively unreliable (“weak”) components (“freaks”) that, if shipped to the customer, will most likely fail in the field. BIT can be based on high temperatures, thermal cycling, voltage, current density, high humidity, etc., and is performed either by the manufacturer or by an independent test house.
  • 31. Burn-ins: special type of FOAT-2. For products that will be shipped out to the customer, BIT is nondestructive. BIT is a costly process, and therefore its application must be thoroughly monitored. BIT is mandatory on most high-reliability procurement contracts, such as those for defense, space, and telecommunication systems. In today’s practice BIT is often used for consumer products as well. For military applications BIT can last as long as a week (168 hours). For commercial applications burn-ins typically do not last longer than two days (48 hours). Optimum BIT conditions can be established by assessing the main expected failure modes and their activation energies, and by analyzing the failure statistics during BIT. Special investigations are usually required if one wishes to ensure that cost-effective BIT of smaller quantities is acceptable. A cost-effective simplification can be achieved if BIT is applied to the complete equipment (assembly or subassembly) rather than to an individual component, unless it is a large system fabricated of several separately testable assemblies.
  • 32. Burn-ins: special type of FOAT-3. Although there is always a possibility that some defects might escape BIT, it is more likely that BIT will introduce some damage to the “healthy” structure and/or “consume” a certain portion of the useful service life of the product. BIT not only “fights” infant mortality, but also accelerates the very degradation process that takes place in actual operation conditions, unless the defectives have a much shorter lifetime than the healthy products and a narrower (more “deterministic”, more “delta-like”) probability-of-failure distribution density. Some BITs (e.g., high electric fields for dielectric breakdown screening, mechanical stresses below the fatigue limit) are harmless to the materials and structures under test, and do not lead to an appreciable “consumption” of the useful lifetime (field life loss). Others, although they do not trigger any new failure mechanisms, might consume some small portion of the device lifetime.
  • 33. Burn-ins: special type of FOAT-4. When planning, conducting and evaluating BIT, one should make sure that the stress applied by BIT is high enough to weed out infant mortalities, but low enough neither to consume a significant portion of the product’s lifetime nor to introduce permanent damage. A natural concern associated with BIT is that there is always a risk that BIT might trigger failure mechanisms that would not be possible in actual use conditions and/or might affect components that should not be viewed as defective. In lasers, the “steady-state” portion is, in effect, not a horizontal line but a slowly rising curve. In addition, wear-out failures, which are characterized by a time-dependent failure rate, occupy a significant portion of the failure-rate (bathtub) diagram. For laser devices, standard production BIT should be combined with long-term life testing.
  • 34. Wear-out failures. For a well-designed and adequately manufactured product, wear-out failures should occur at the late stages of operation and testing. If one observes that this is not the case (the steady-state portion of the “bathtub” curve is not long enough or does not exist at all), one should revisit the design and choose different materials and/or different design solutions, and/or a different (more consistent) manufacturing process, etc. In some electronics products (such as BGA and PGA systems) and in some photonics products (e.g., lasers) the wear-out part of the bathtub curve can occupy a significant portion of the product’s lifetime, and should be carefully analyzed.
  • 35. What one should/could possibly do to prevent failures-1. Develop an in-depth understanding of the physics of possible failures. Neither failure statistics nor the most effective ways to accommodate failures (such as redundancy, trouble-shooting, diagnostics, prognostication, health monitoring, maintenance) can replace a good understanding of the physics of failure and good (robust) physical design. Assess the likelihood (the probability) that the anticipated failure modes and mechanisms might occur in service conditions, and minimize the likelihood of a failure by selecting the best materials and the best physical design for your product. Understand and distinguish between different aspects of reliability: operational (functional) performance, structural/mechanical reliability (affected by mechanical loading) and environmental durability (affected by harsh environmental conditions).
  • 36. What one should/could possibly do to prevent failures-2. Distinguish between materials reliability and structural reliability, and assess the effect of the mechanical and environmental behavior of the materials and structures in the design on the functional performance of the product. Understand the difference between the requirements of the qualification specifications and standards, and the actual operation conditions. In other words, understand the QT conditions well and design the product so that it can not only withstand the operation conditions on a short- and long-term basis, but also pass the QT. Understand the role and importance of FOAT and conduct PM whenever and wherever possible.
  • 37. Session II 4. Predictive Modeling (PM): FOAT cannot do without it “The probability of anything happening is in inverse ratio to its desirability” John W. Hazard, American attorney-at-law “Any equation longer than three inches is most likely wrong” Unknown Experimental Physicist Dr. E. Suhir Page 35
  • 38. FOAT cannot do without predictive modeling (PM). FOAT cannot do without simple and meaningful predictive models. It is on the basis of such models that one decides which parameter should be accelerated, how to process the experimental data and, most importantly, how to bridge the gap between what one “sees” as a result of the accelerated testing and what one will possibly “get” in the actual operation conditions. By considering the fundamental physics that might constrain the final design, PM can result in significant savings of time and expense and shed additional light on the physics of failure. PM can be very helpful in predicting reliability at conditions other than those of the FOAT and can provide important information about the device performance. Modeling can be helpful in optimizing the performance and lifetime of the device, as well as in finding the best compromise between reliability, cost effectiveness and time-to-market.
  • 39. Requirements for a good predictive model. A good FOAT PM does not need to reflect all the possible situations, but it should be simple, should clearly indicate what affects what in the given phenomenon or structure, and should be suitable and flexible for new applications, new environmental conditions and technology developments, as well as for the accumulation, on its basis, of reliability statistics. The scope of the model depends on the type and the amount of information available. A FOAT PM does not have to be comprehensive, but it has to be sufficiently generic and should include all the major variables affecting the phenomenon (failure mode) of interest. It should contain all the most important parameters that are needed to describe and characterize the phenomenon of interest, while parameters of second-order importance should not be included in the model. FOAT PMs take inputs from various theoretical analyses, test data, field data, customer requirements, qualification spec requirements, the state of the art in the given field, consequences of failure for the given failure mode, etc.
  • 40. What the existing FOAT PMs predict. Before deciding on a particular FOAT PM, one should anticipate the predominant failure mechanism in advance, and then apply the appropriate model. The most widespread PMs identify the mean time-to-failure (MTTF) in steady-state conditions. If one assumes a certain probability density function for the particular failure mechanism, then, for a two-parameter distribution (e.g., the normal one), one can construct this function from the determined MTTF and the measured standard deviation (STD). For a single-parameter probability density function, such as the exponential one, knowledge of the MTTF is sufficient to determine the failure rate and the probability of failure for the given time in operation.
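The step from an MTTF to a probability of failure can be illustrated with a short sketch. It assumes the two distributions named on the slide (exponential and normal); the function names and all numerical values are hypothetical and chosen only for illustration.

```python
import math


def exponential_failure_probability(mttf, t):
    """P(t) = 1 - exp(-t / MTTF); the failure rate is lambda = 1 / MTTF."""
    failure_rate = 1.0 / mttf
    return -math.expm1(-failure_rate * t)


def normal_failure_probability(mttf, std, t):
    """P(t) = Phi((t - MTTF) / STD) for a normally distributed time to failure."""
    z = (t - mttf) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    mttf_hours = 50_000.0  # hypothetical MTTF
    std_hours = 10_000.0   # hypothetical standard deviation
    t_hours = 10_000.0     # hypothetical time in operation
    print("exponential:", exponential_failure_probability(mttf_hours, t_hours))
    print("normal:     ", normal_failure_probability(mttf_hours, std_hours, t_hours))
```

The exponential case needs only the MTTF, while the normal case needs the STD as well, which is the distinction the slide draws between single- and two-parameter distributions.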
  • 41. Most widespread predictive models (PMs): power law (used when the PoF is unclear); Boltzmann-Arrhenius equation (used when there is a belief that elevated temperature is the major cause of failure); Coffin-Manson equation (inverse power law; used particularly when there is a need to evaluate the low-cycle fatigue lifetime); crack growth equations (used to assess the fracture toughness of brittle materials); Bueche-Zhurkov and Eyring equations (used to assess the MTTF when both high temperature and stress are viewed as the major causes of failure); Peck equation (used to consider the role of the combined action of elevated temperature and relative humidity); Black equation (used to consider the roles of elevated temperature and current density); Miner-Palmgren rule (used to consider the role of fatigue when the yield stress is not exceeded); creep rate equations; weakest link model (used to evaluate the MTTF in extremely brittle materials with defects); stress-strength interference model, which is, perhaps, the most flexible and well-substantiated model.
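As one concrete instance from this list, the Miner-Palmgren rule is commonly written as a linear damage sum, D = Σ nᵢ/Nᵢ, with failure expected when D reaches 1. The slide does not spell the rule out, so the sketch below relies on that textbook form; the function name and the loading blocks are hypothetical.

```python
def miner_damage(blocks):
    """Linear damage sum D = sum(n_i / N_i).

    blocks: iterable of (applied_cycles, cycles_to_failure_at_that_stress_level).
    """
    return sum(applied / to_failure for applied, to_failure in blocks)


if __name__ == "__main__":
    # Hypothetical loading history: two stress levels below the yield stress.
    history = [(50, 100), (25, 100)]
    damage = miner_damage(history)
    print("accumulated damage:", damage)
    print("failure expected:", damage >= 1.0)
```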
  • 42. Example: Boltzmann-Arrhenius equation. The Boltzmann-Arrhenius equation underlies many FOAT-related concepts. The MTTF, τ, is proportional to an exponential function whose argument is a fraction: the activation energy, Ua (in eV), is in the numerator, and the product of Boltzmann’s constant, k = 8.6174×10⁻⁵ eV/K, and the absolute temperature, T, is in the denominator:
    $$\tau = \tau_0 \exp\left[\frac{U_a}{k\,(T - T^*)}\right]$$
    The equation was first obtained by L. Boltzmann in the statistical theory of gases, and then applied by S. Arrhenius to describe the inversion of sucrose. Arrhenius noted that physical processes and chemical reactions in solid bodies are also accelerated by elevated absolute temperature. The Boltzmann-Arrhenius equation is applicable when the failure mechanisms are attributed to a combination of physical and chemical processes. Since the rates of many physical processes (such as solid-state diffusion and many semiconductor degradation mechanisms) and chemical reactions (such as those governing battery life) are temperature dependent, it is the temperature that is the acceleration parameter.
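A minimal sketch of how the Boltzmann-Arrhenius relation is typically used in accelerated testing: the ratio of the MTTF at the use temperature to the MTTF at the test temperature gives an acceleration factor. The slide's formula contains a temperature offset T* that is not defined in this transcript; the code keeps it as a parameter and assumes T* = 0 by default, which reduces the relation to the classical Arrhenius form. The activation energy and temperatures are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.6174e-5  # value quoted on the slide


def arrhenius_mttf(tau0, ua_ev, temp_k, t_star_k=0.0):
    """tau = tau0 * exp(Ua / (k * (T - T*))); T* = 0 is an assumption."""
    return tau0 * math.exp(ua_ev / (BOLTZMANN_EV_PER_K * (temp_k - t_star_k)))


def acceleration_factor(ua_ev, temp_use_k, temp_test_k, t_star_k=0.0):
    """Ratio MTTF(use) / MTTF(test); tau0 cancels out."""
    return (arrhenius_mttf(1.0, ua_ev, temp_use_k, t_star_k)
            / arrhenius_mttf(1.0, ua_ev, temp_test_k, t_star_k))


if __name__ == "__main__":
    ua = 0.7          # hypothetical activation energy, eV
    t_use = 328.15    # 55 C in kelvin, hypothetical use condition
    t_test = 398.15   # 125 C in kelvin, hypothetical test condition
    print("acceleration factor:", acceleration_factor(ua, t_use, t_test))
```

Because τ0 cancels in the ratio, the acceleration factor depends only on the activation energy and the two temperatures, which is why identifying the correct failure mechanism (and its Ua) matters so much.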
  • 43. Boltzmann-Arrhenius Equation and the PDfR concept. The Boltzmann-Arrhenius equation addresses degradation processes and attributes degradation, and possible failures caused by degradation, to elevated temperature and possibly to elevated humidity as well, i.e., to environmental factors. The failure rate for a system whose MTTF is given by the Boltzmann-Arrhenius equation can be found as
    $$\lambda = \frac{1}{\tau_0} \exp\left[-\frac{U_a}{k\,(T - T^*)}\right]$$
    The probability of failure at time t can be found as
    $$P = 1 - e^{-\lambda t}$$
    This formula is known as the exponential formula of reliability. If the probability of failure P is established for the given time t in operation, then the exponential formula of reliability can be used to determine the acceptable failure rate.
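The two formulas on this slide can be chained and inverted in a few lines. The sketch below evaluates the failure rate λ = (1/τ0)·exp[−Ua/(k(T − T*))] and the probability P = 1 − exp(−λt), and solves the exponential formula for the acceptable failure rate, λ = −ln(1 − P)/t. As T* is not defined in this transcript, T* = 0 is assumed by default; all numbers are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.6174e-5  # value quoted in the slides


def failure_rate(tau0, ua_ev, temp_k, t_star_k=0.0):
    """lambda = (1 / tau0) * exp(-Ua / (k * (T - T*))); T* = 0 is an assumption."""
    return math.exp(-ua_ev / (BOLTZMANN_EV_PER_K * (temp_k - t_star_k))) / tau0


def probability_of_failure(lam, t):
    """Exponential formula of reliability: P = 1 - exp(-lambda * t)."""
    return -math.expm1(-lam * t)


def acceptable_failure_rate(p_target, t):
    """Invert P = 1 - exp(-lambda * t) for lambda."""
    return -math.log1p(-p_target) / t


if __name__ == "__main__":
    lam = failure_rate(tau0=1.0e-4, ua_ev=0.7, temp_k=328.15)  # hypothetical inputs
    print("failure rate, 1/h:", lam)
    print("P after 5 years:", probability_of_failure(lam, 5 * 8760.0))
    print("rate for P = 1e-3 at 5 years:", acceptable_failure_rate(1.0e-3, 5 * 8760.0))
```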
  • 44. Coffin-Manson Equation (Inverse Power Law)-1. Many electronic materials, and especially solder joints, fail primarily because of elevated mechanical stresses and deformations (strains). The numerous existing empirical and semi-empirical methods and approaches that address the low-cycle-fatigue lifetime of solders are, in one way or another, based on the pioneering work of Coffin and Manson. It has been established that materials that experience elevated stresses and strains within the elastic range fail because of the elevated stresses, whether steady-state or variable, while materials that experience high stresses exceeding the yield stress fail primarily because of inelastic deformations. Such behavior, known as low-cycle-fatigue conditions, is typical for solder materials, including even lead-free solders whose yield point might be substantially higher than that of tin-lead solders. The original Coffin-Manson equation is just an inverse power law that is applicable to highly compliant materials exhibiting significant plastic deformations prior to failure. The inverse power law is also used in some other, physically quite different, applications, such as MTTF in random vibration tests (Steinberg’s formula), aging in high-power lasers, etc.
  • 45. Coffin-Manson Equation (Inverse Power Law)-2. The studies carried out in the 1990s addressed primarily flip-chip tin-lead solder joint interconnections. Today’s studies address primarily the thermal fatigue life of ball-grid-array (BGA) and pad-grid-array (PGA) systems, and especially lead-free solder joints. The thermally induced stresses and strains in flip-chip solder joints are caused by the CTE mismatch of the chip and the package substrate materials, as well as by the temperature gradients due to the difference in temperature between the “hot” chip and the “cold” substrate. In BGA and PGA systems the stresses and strains are caused by the mismatch between the package structure and the PCB (the “system’s substrate”). The numerous suggested phenomenological semi-empirical models aim at predicting and improving the fatigue life of the solder material, which is governed by the accumulated cyclic inelastic strain in the solder. This strain is due to the temperature fluctuations resulting from changes in the ambient temperature (temperature cycling) and/or from heat dissipation in the package (power cycling).
  • 46. Coffin-Manson Equation (Inverse Power Law)-3. The modified Coffin-Manson model
    $$N_f = A\, f^{-\alpha}\, \Delta T^{-\beta} \exp\left(\frac{U}{k\,T_{\max}}\right)$$
    can be used to model crack growth in solder and other metals due to temperature cycling. In the above formula, N_f is the number of cycles to failure, A is a constant, f is the cycling frequency, ΔT is the temperature range during a cycle, T_max is the maximum temperature reached in each cycle, and k is Boltzmann’s constant. Typical values for the cycling frequency exponent α and the temperature range exponent β are around −1/3 and 2, respectively. A reduction in the cycling frequency reduces the number of cycles to failure. The activation energy U is around 1.25 eV. In recent years a visco-plastic, rate-dependent constitutive model, known as the Anand model, has often been used in combination with FEA simulation to predict solder joint reliability. In Anand’s model (which includes one flow equation and three evolution equations) plasticity and creep phenomena are unified and described by the same set of flow and evolution relations.
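A short sketch of the modified Coffin-Manson relation, read here as N_f = A·f^(−α)·ΔT^(−β)·exp(U/(k·T_max)), with the typical exponents quoted on the slide used as defaults. The constant A cancels when two conditions are compared, so the sketch reports the ratio of field cycles to test cycles. The cycling profiles are hypothetical, and the eV unit for U is an assumption consistent with k being given in eV/K.

```python
import math

BOLTZMANN_EV_PER_K = 8.6174e-5  # value quoted in the slides


def cycles_to_failure(a_const, freq, delta_t, t_max_k,
                      alpha=-1.0 / 3.0, beta=2.0, u_ev=1.25):
    """N_f = A * f**(-alpha) * dT**(-beta) * exp(U / (k * T_max))."""
    arrhenius_term = math.exp(u_ev / (BOLTZMANN_EV_PER_K * t_max_k))
    return a_const * freq ** (-alpha) * delta_t ** (-beta) * arrhenius_term


def cycle_acceleration_factor(field, test):
    """N_f(field) / N_f(test); each argument is a (freq, delta_t, t_max_k) tuple."""
    return cycles_to_failure(1.0, *field) / cycles_to_failure(1.0, *test)


if __name__ == "__main__":
    field_profile = (6.0, 40.0, 333.15)    # hypothetical: 6 cycles/day, 40 K swing, 60 C peak
    test_profile = (48.0, 165.0, 398.15)   # hypothetical: 48 cycles/day, 165 K swing, 125 C peak
    print("field cycles per test cycle:",
          cycle_acceleration_factor(field_profile, test_profile))
```

With α = −1/3 the frequency enters as f^(1/3), so halving the cycling frequency lowers the predicted cycles to failure by a factor of 2^(1/3), in line with the slide's remark.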
  • 47. Stress-strength (“interference”) model. [Fig. 20: Stress-strength (“interference”) models-1: stress (demand) and strength (capacity) distributions.]
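The interference idea can be made concrete for one simple case. The sketch below assumes, purely for illustration, that stress (demand) and strength (capacity) are independent and normally distributed; the probability of failure is then the probability that strength falls below stress, P = Φ(−(μ_C − μ_D)/√(σ_C² + σ_D²)). The slides do not state which distributions the figure uses, and the numbers are hypothetical.

```python
import math


def interference_failure_probability(mean_strength, std_strength,
                                     mean_stress, std_stress):
    """P(strength < stress) for independent normal stress and strength."""
    safety_index = (mean_strength - mean_stress) / math.hypot(std_strength, std_stress)
    # Phi(-x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * math.erfc(safety_index / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical capacity and demand, e.g. in MPa.
    p_fail = interference_failure_probability(
        mean_strength=60.0, std_strength=5.0,
        mean_stress=40.0, std_stress=4.0,
    )
    print("probability of failure:", p_fail)
```

The overlap of the two distributions, not the distance between their means alone, determines the result: narrowing either distribution lowers the probability of failure even if the mean safety margin is unchanged.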
  • 48. 5. EXAMPLE OF A FOAT: Physics, Modeling, Experimentation, Prediction “A theory without an experiment is dead. An experiment without a theory is blind” Unknown Reliability Engineer Dr. E. Suhir Page 46
  • 56. Finite Element Analysis (FEA) data
  • 57. Predicted stresses and strains in a short cylinder
  • 59. Experimental bathtub curve for the solder joint interconnections in a flip-chip multichip module
  • 60. Probability of failure of the solder joint interconnections vs. failure rate
  • 62. Thank you for taking my course © 2009 Dr. E. Suhir Page 118