
What is Failure Analysis?

Failure analysis, also known as failure investigation, is the process of determining why a product failed so that the root cause can be identified and mitigated. It examines the environment that contributed to the failure, the specific mechanism that led to the failure, and the location of the failure site.

For electronics products, failure analysis isolates the failure to a location on the printed circuit board assembly (PCBA), then looks deeper at the components or board location to find the exact failure site.

When a Product Fails

Any product failure requires an investigation into what caused it. Isolating the failure is important, but a major reason for performing failure analysis is to prevent the failure from recurring. By understanding the underlying failure mechanisms and root causes, manufacturers can take corrective actions that keep the same issues from arising in the future. Field failures and warranty recalls are very expensive for companies and can cause serious financial and reputational damage. Late-stage failures are also a cause for concern.


Many industries use failure analysis as a quality control (QC) measure during their manufacturing or product support processes to identify any potential failures, determine the root cause of customer-reported failures, and ensure that consumers receive well-made products. Industries that frequently conduct failure analyses include the automotive, aerospace, defense, manufacturing, biomedical, and consumer goods sectors, but failure analysis processes can be used within any industry to find out how and where something has gone wrong during manufacturing or in the field.

Why Do Electronic Products Fail?

There are many reasons for electronic product failures. They are usually the result not of an electrical design problem but of material selection, thermal management, contamination, or mechanical design issues. A failure can stem from a thermal or mechanical load that was not anticipated, or from an anticipated load that had a more adverse effect than expected. In other cases, it can stem from board contamination, an incomplete understanding of material properties or behavior, or corrosion.

There are many different failure modes and mechanisms that cause failure at the PCBA and individual component level. Some common electronic failures include:

  • Wire bond breaking and liftoff
  • Delamination
  • Capacitor cracking
  • Die damage
  • Interconnect failure
  • Solder fatigue and overstress
  • Lead fracture
  • Contamination-induced current leakage
  • Electrochemical migration
  • Conductive anodic filament failure
  • Plated through hole fatigue
  • Pad cratering and trace fracture

Failure Analysis vs. Root Cause Analysis

Failure analysis and root cause analysis (RCA) are often used interchangeably, but this is not entirely correct. RCA describes the general problem-solving methodology concerned with why a failure occurred. RCA attempts to assess the relevant contributors to a failure and may consider organizational drivers, internal communications, design practices, poor specifications, product use environment, material science assumptions, and many other potential issues. Failure analysis is a category of RCA data-gathering techniques that focuses on the systematic examination of failed devices to identify the root cause of failure and inform potential mitigations that will prevent it from recurring. The following questions form the foundation of a robust failure analysis:

  1. What is the failure mode?
  2. How did the failure occur?
  3. Where is the failure site?
  4. What is the failure mechanism?
  5. What can be done to prevent recurrence?

There are many physical and chemical failure analysis techniques that can be used to look for failures directly in an electronic system, including:

  • X-ray microscopy
  • Acoustic microscopy
  • Scanning electron microscopy (SEM)
  • Optical microscopy
  • Energy dispersive X-ray spectroscopy (EDS)
  • Superconducting quantum interference device (SQUID) microscopy
  • Thermal imaging
  • Mechanical testing
  • Dye-and-pry analysis
  • Cross-sectional analysis

Common RCA techniques like the “five whys” method and Six Sigma often incorporate failure analysis as a data-gathering technique to inform failure mitigation actions resulting from the RCA.

Why Failure Analysis is Important

Product failures, such as EV or smartphone battery fires, are frequently in the news and can have severe consequences. Product failures are expensive, and they also erode consumer confidence.

Failure analysis provides manufacturers with a way to build that confidence through corrective action and continuous improvement of their products to meet the needs of the consumer. If a product has already failed in real-world use, finding the root cause and solving the problem is key to ensuring more products make it to market.

But this goes beyond the manufacturers themselves. In many sectors, manufacturers source components from multiple suppliers, and failure analysis methods help verify that those components are reliable and trustworthy enough to be used in the end product. Failure analysis, therefore, supports robustness and reliability across the wider manufacturing supply chain, regardless of industry.

What RCA Techniques are Typically Used?

When it comes to establishing the root cause of product failure, four RCA techniques are typically used:

Five whys: This method investigates the cause and effect of a failure to understand its root cause. It starts with a problem statement, followed by a series of “why” questions about the product and its environment, and continues until the root cause is found.

Fishbone (Ishikawa) diagram: The fishbone diagram is named for its finished shape, which resembles a fish skeleton. This tool starts from no assumptions about the environment, so engineers can assess each of the factors that could have led to failure and then narrow down the root cause.

Fault tree analysis: A fault tree analysis breaks a system down into its components and subsystems. It looks at the relationship between subsystem or component failure and the rest of the system to deduce the failure path for the higher-level system. Fault tree analyses essentially examine the location of faults in certain areas and assess how they affect the wider system.
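As a rough illustration of how a fault tree relates component failures to a system-level failure, the sketch below evaluates a small tree of AND and OR gates. It assumes that all basic events are statistically independent and that no event appears in more than one branch. The event names and probabilities are invented for the example and do not come from any real product.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class BasicEvent:
    """A leaf of the tree: a component-level failure with a known probability."""
    name: str
    probability: float


@dataclass(frozen=True)
class Gate:
    """An intermediate or top event, combining its inputs with AND or OR logic."""
    name: str
    kind: str        # "AND" or "OR"
    inputs: tuple    # BasicEvent or Gate instances


def probability(node) -> float:
    """Failure probability of a node, assuming independent, non-repeated events."""
    if isinstance(node, BasicEvent):
        return node.probability
    child_probs = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        # All inputs must fail for the gate to fail.
        return prod(child_probs)
    if node.kind == "OR":
        # The gate fails unless every input survives.
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


# Hypothetical tree: the board fails if it loses power OR overheats.
power_loss = Gate("power loss", "OR", (
    BasicEvent("solder joint crack", 0.02),
    BasicEvent("connector corrosion", 0.01),
))
overheating = Gate("overheating", "AND", (
    BasicEvent("fan A fails", 0.05),
    BasicEvent("fan B fails", 0.05),   # redundant fan
))
top_event = Gate("board failure", "OR", (power_loss, overheating))

p_top = probability(top_event)
print(f"P({top_event.name}) = {p_top:.4f}")
```

The AND gate shows why redundancy helps: both fans must fail before overheating contributes to the top event. If events are shared between branches or are correlated, this simple product rule no longer holds and a cut-set-based method is needed.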

Failure mode and effects analysis: Failure mode and effects analysis (FMEA) extends fault tree analysis by defining potential failure modes at each node and determining how they will impact subsystem and system performance. FMEA investigates failures down to the component and subsystem levels and looks at the effects on the wider system. FMEA goes into more detail than fault tree analysis (e.g., down to timing loss on a chip), and there are many types of FMEA with different specifications for different industries.
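One widely taught way to prioritize the entries in an FMEA worksheet is the risk priority number (RPN), the product of severity, occurrence, and detection ratings, each commonly scored from 1 to 10. The RPN is not described above and is used here only as an assumed convention. The sketch below ranks a few failure modes by RPN; the items and scores are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a hypothetical FMEA worksheet."""
    item: str
    mode: str
    severity: int     # 1 (negligible) to 10 (hazardous)
    occurrence: int   # 1 (remote) to 10 (very frequent)
    detection: int    # 1 (almost certain to detect) to 10 (undetectable)

    def __post_init__(self):
        for field in ("severity", "occurrence", "detection"):
            value = getattr(self, field)
            if not 1 <= value <= 10:
                raise ValueError(f"{field} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Illustrative scores only; real ratings come from the FMEA team.
worksheet = [
    FailureMode("connector", "lead fracture", 6, 2, 3),
    FailureMode("BGA", "solder fatigue crack", 8, 5, 6),
    FailureMode("ceramic capacitor", "flex crack", 7, 4, 7),
]

ranked = rank_by_rpn(worksheet)
for fm in ranked:
    print(f"{fm.rpn:4d}  {fm.item}: {fm.mode}")
```

Different industries use different rating scales and prioritization schemes, so the ranking should be treated as a starting point for discussion rather than a final ordering.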

How To Prevent Failures Before They Occur

While traditional RCA techniques are helpful, reliability physics and reliability engineering offer more robust insights into why a product has failed. They can be used during any phase of product development to inform RCA and prevent failures before they occur.

Reliability physics adds an extra layer of accuracy to failure analysis. A physics-based approach speeds up the assessment of failure modes and failure mechanisms by ruling out redundant or highly unlikely candidates.

Understanding the physics of the failure shows engineers how the mechanical, thermal, chemical, and electrical stresses inside a product can lead to failure. In most cases, failure is not driven primarily by electrical factors. Instead, most failure modes stem from thermal, material selection, contamination, and mechanical causes, sometimes in combination with electrical ones. These causes can be captured using simulation tools based on reliability physics to prevent a product from failing even before manufacture. For example, thermal cycling failure is a common problem in electronic devices that can be mitigated once failure analysis has identified the mechanism behind it.

Simulation Plus Physical Analysis of Hardware

Combining simulation and physical hardware analysis expedites failure assessment and helps engineers understand the physics of the failure.

A typical simulation approach might begin with a design review of the PCBAs, followed by a finite element analysis (FEA). Simulation methods can evaluate incoming materials, assess mechanical robustness, identify the failure modes to which the system is susceptible, determine contamination thresholds, and explore design variations that improve system reliability.

Some real-world examples where this can be put in place include:

  • Looking at the ideal temperature range for a potting compound
  • Examining potential degradation mechanisms inside a battery
  • Simulating the solder system of a PCBA
  • Simulating the impact of conformal coating on component reliability
  • Looking at creep, fatigue, and diffusion-based failures based on fundamental atomic- and molecular-scale behavior

Case Study Example: Solder Fatigue

One of the most common failure mechanisms in PCBAs is solder fatigue, driven by thermal cycling. Modern PCBAs are a combination of many different materials, including glass fiber laminates, ceramics, polymers, solder, silicon, and copper, which have widely varying material properties. One of the most critical properties to consider when assessing solder fatigue failures is the coefficient of thermal expansion (CTE).

Solder is used both inside electronic packages and to attach electronic components to printed circuit boards, and it typically connects materials with very different CTEs. Due to changes in the operating environment or component power dissipation, PCBAs and components undergo thermal cycling, which causes the materials to expand and contract at different rates. This differential expansion is absorbed by the solder as creep, and the accumulated creep strains in the solder lead to cracking and, eventually, the complete fracture of the solder joint.
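The CTE mismatch described above can be turned into a first-order number. The sketch below estimates the shear strain range in a solder joint from the distance to the neutral point, the joint height, the CTE difference, and the temperature swing. It then feeds that strain into a Coffin-Manson-type power law. The function names, geometry, and the ductility and exponent constants are assumptions chosen for illustration. They are not calibrated to any solder alloy, and this is not a description of how any particular simulation tool computes life.

```python
def shear_strain_range(distance_to_neutral_mm: float,
                       joint_height_mm: float,
                       cte_component_ppm: float,
                       cte_board_ppm: float,
                       delta_t_c: float) -> float:
    """First-order cyclic shear strain in a solder joint.

    The component and board expand by different amounts over delta_t_c;
    the joint of height joint_height_mm absorbs the relative displacement.
    """
    delta_alpha = abs(cte_board_ppm - cte_component_ppm) * 1e-6   # 1/degC
    displacement_mm = distance_to_neutral_mm * delta_alpha * delta_t_c
    return displacement_mm / joint_height_mm


def cycles_to_failure(strain_range: float,
                      fatigue_ductility: float = 0.325,   # illustrative placeholder
                      fatigue_exponent: float = -0.442    # illustrative placeholder
                      ) -> float:
    """Coffin-Manson-type estimate: N_f = 0.5 * (strain / (2 * eps_f)) ** (1 / c)."""
    return 0.5 * (strain_range / (2.0 * fatigue_ductility)) ** (1.0 / fatigue_exponent)


# Hypothetical ceramic component (about 6 ppm/degC) on a laminate board
# (about 17 ppm/degC), with the corner joint 5 mm from the neutral point.
strain_100 = shear_strain_range(5.0, 0.2, 6.0, 17.0, delta_t_c=100.0)
strain_50 = shear_strain_range(5.0, 0.2, 6.0, 17.0, delta_t_c=50.0)

print(f"strain range at 100 degC swing: {strain_100:.4f}")
print(f"relative life, 50 vs 100 degC swing: "
      f"{cycles_to_failure(strain_50) / cycles_to_failure(strain_100):.1f}x")
```

Even this crude model shows the levers available to a designer: a smaller temperature swing, a closer CTE match, a taller joint, or a smaller component all reduce the strain range, and life rises faster than linearly as strain falls. A full analysis would also account for solder creep behavior, dwell times, and board stiffness, which is where finite element simulation comes in.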

Physical analysis of failed samples, using techniques like electrical probing, X-ray, ultrasonic microscopy, cross-sectioning with optical inspection or SEM, and dye-and-pry, can be very effective at confirming the presence and location of solder cracks and the solder fatigue mechanism. But when it comes to determining why the failure occurred and proposing solutions to prevent further failures, simulation becomes a critical tool. With simulation, analysts can include the influence of the materials, geometry, environment, attachment methods, and other factors that may drive solder fatigue. Simulation results give insight into the physics driving the failure and enable companies to virtually test the impact of design or operating condition changes before implementing a fix.

Ansys Solutions to Failure Analysis

Whether applying physical analysis and testing or simulation to the solution of a failure analysis challenge, reliability physics is at the core of the Ansys approach. Our Reliability Engineering Services team includes experts in design for excellence, electronic system design, packaging, and manufacturing who apply physical analysis, testing, and simulation to solve even the most difficult failure analysis challenges. With years of experience in electronics design, the team always starts with nondestructive techniques to identify failure locations and failure mechanisms.

Ansys software can analyze many electronic systems to see what thermo-mechanical issues exist, or could exist, in an advanced technology product. Simulation is a powerful addition to the physical techniques of failure analysis and provides additional insight into the forces and material behaviors that may have led to failures.

Ansys Sherlock™ electronics reliability prediction software: Predicts failures driven by thermo-mechanical issues. Sherlock software can model the failed system in its native environment to reproduce the behavior that led to the failure. This reliability analysis approach also enables engineers to identify failure mechanisms in the components, board, and system so that the design can be better optimized for its intended application environment. Sherlock software makes reliability predictions at the PCBA level and can use inputs from Ansys Mechanical™ software and the Ansys Icepak® solution to simulate reliability beyond the PCBA level, such as modeling the housing around a PCBA or creating a cooling system that reduces component temperatures.

Mechanical shock simulation from Ansys Sherlock software

Ansys Mechanical structural FEA software: Provides simulations that look at the worst-case conditions in different loading scenarios that incorporate elements of the system outside of the PCBA (e.g., housings, mechanical stiffeners, and other higher-level subsystem mechanical components). Mechanical software can be used to derive board strains under different loading conditions in complex system-level assemblies. The results of a Mechanical analysis can be used to identify overstress failures or ported to Sherlock software to make component-level reliability predictions due to complex loading and constraint scenarios.

Ansys Icepak electronics cooling simulation software: Provides a thermal analysis that examines temperatures of different components on a PCBA under the influence of different cooling solutions. The results of Icepak analyses can be used to identify temperatures beyond component temperature ratings, assess component derating margins, or be incorporated into Sherlock analysis for component-level reliability predictions.

Electro-thermal analysis of a PCB using Ansys Icepak software

Ansys has helped more than 3,000 customers identify and mitigate the root causes of product failures and find solutions through simulation before problems arise. If you want to join the 300+ companies that choose Ansys every year to solve their technical challenges, contact our physics experts today.
