Failure analysis, also known as failure investigation, is the process of determining why a product failed by identifying the root cause of the failure so that it can be mitigated. It examines the environment in which the failure occurred, the specific mechanism that led to the failure, and the location of the failure site.
For electronics products, failure analysis isolates the failure to a location on the printed circuit board assembly (PCBA), then looks deeper at the components or board location to find the exact failure site.
Every product failure calls for an investigation into what caused it. Isolating the failure is important, but a major reason for performing failure analysis is to keep the failure from happening again. By understanding the underlying failure mechanisms and root causes, manufacturers can take corrective actions that prevent the same issues from recurring. Field failures and warranty recalls are especially costly, causing both financial and reputational damage. Failures discovered late in development are also a cause for concern.
Many industries use failure analysis as a quality control (QC) measure during their manufacturing or product support processes to identify any potential failures, determine the root cause of customer-reported failures, and ensure that consumers receive well-made products. Industries that frequently conduct failure analyses include the automotive, aerospace, defense, manufacturing, biomedical, and consumer goods sectors, but failure analysis processes can be used within any industry to find out how and where something has gone wrong during manufacturing or in the field.
There are many reasons for electronic product failures. They are usually not the result of an electrical design problem. More often they stem from material selection, thermal management, contamination, or mechanical design issues. A failure can come from a thermal or mechanical load that was not expected to be present. It can also come from a load that was accounted for but had a more harmful effect than anticipated. In other cases, it can result from contamination of the board, an incomplete understanding of material properties or behavior, or corrosion.
There are many different failure modes and mechanisms that cause failure at the PCBA and individual component level. Some common electronic failures include:
Failure analysis and root cause analysis (RCA) are often used interchangeably, but this is not entirely correct. RCA describes the general problem-solving methodology concerned with why a failure occurred. RCA attempts to assess the relevant contributors to a failure and may consider organizational drivers, internal communications, design practices, poor specifications, product use environment, material science assumptions, and many other potential issues. Failure analysis is a category of RCA data-gathering techniques that focuses on the systematic examination of failed devices to identify the root cause of failure and inform potential mitigations that will prevent it from recurring. The following questions form the foundation of a robust failure analysis:
There are many physical and chemical failure analysis techniques that can be used to look for failures directly in an electronic system, including:
Common RCA techniques like the “five whys” method and Six Sigma often incorporate failure analysis as a data-gathering technique to inform failure mitigation actions resulting from the RCA.
Product failures frequently make the news and can have severe consequences, such as battery fires in electric vehicles (EVs) or smartphones. Product failures are not only expensive; they also erode consumer confidence.
Failure analysis gives manufacturers a way to rebuild that confidence through corrective action and continuous improvement of their products to meet consumer needs. If a product has already failed in real-world use, finding and fixing the root cause is key to ensuring that future products reach the market without the same flaw.
But the benefits go beyond the manufacturers themselves. In many sectors, manufacturers source components from multiple suppliers. Failure analysis methods help verify that those components are reliable and trustworthy enough to be used in the end product. Failure analysis therefore supports robustness and reliability across the wider manufacturing supply chain, regardless of industry.
When it comes to establishing the root cause of product failure, four RCA techniques are typically used:
Five whys: This method traces cause and effect to reach the root cause of a failure. It starts with the problem statement and follows it with a series of "why" questions about the product and its environment. Each answer prompts the next question, until the root cause is found.
Fishbone (Ishikawa) diagram: The fishbone diagram is named for the shape of the finished chart. The tool assumes no prior knowledge of the cause, so engineers can consider every factor that could have led to the failure and then narrow down the root cause.
Fault tree analysis: A fault tree analysis breaks a system down into its subsystems and components. It examines how each subsystem or component failure relates to the rest of the system in order to deduce the failure path that leads to failure of the higher-level system. In essence, a fault tree analysis locates faults in specific parts of the system and assesses how they affect the system as a whole.
Failure mode and effects analysis: Failure mode and effects analysis (FMEA) extends fault tree analysis by defining potential failure modes at each node and determining how they will impact subsystem and system performance. FMEA investigates failures down to the component and subsystem levels and looks at the effects on the wider system. FMEA goes into more detail than fault tree analysis (e.g., down to timing loss on a chip), and there are many types of FMEA with different specifications for different industries.
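The gate logic behind a fault tree can be sketched in a few lines. The following minimal sketch is not tied to any particular tool. It assumes the basic events are statistically independent and that each has a known probability. The event names and probabilities in the example are hypothetical.

- An AND gate fails only if all of its inputs fail, so its probability is the product of the input probabilities.
- An OR gate fails if any input fails, so its probability is one minus the probability that every input survives.

```python
import math
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class BasicEvent:
    """A leaf of the tree, such as a single component failure mode."""
    name: str
    probability: float  # assumed probability of this event over the mission


@dataclass
class Gate:
    """An AND/OR gate combining lower-level events into a higher-level failure."""
    name: str
    kind: str  # "AND" or "OR"
    inputs: List["Node"] = field(default_factory=list)


Node = Union[BasicEvent, Gate]


def failure_probability(node: Node) -> float:
    """Evaluate the tree bottom-up, assuming independent basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    child_probs = [failure_probability(child) for child in node.inputs]
    if node.kind == "AND":
        # All inputs must fail for the gate output to fail.
        return math.prod(child_probs)
    if node.kind == "OR":
        # Any single input failure causes the gate output to fail.
        return 1.0 - math.prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown gate type: {node.kind}")


# Hypothetical tree: the board loses its output if a solder joint cracks
# OR if both the primary and backup regulators fail.
top = Gate("loss of output", "OR", [
    BasicEvent("solder joint crack", 0.01),
    Gate("power loss", "AND", [
        BasicEvent("primary regulator fails", 0.02),
        BasicEvent("backup regulator fails", 0.05),
    ]),
])
print(f"P(top event) = {failure_probability(top):.5f}")
```

Real fault trees often share basic events between branches, which breaks the independence assumption. In that case, analysts typically work from minimal cut sets instead of multiplying gate probabilities directly.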
While traditional RCA techniques are helpful, reliability physics and reliability engineering offer more robust insight into why a product has failed. They can be applied during any phase of product development to inform RCA and to prevent failures before they occur.
Reliability physics adds an extra layer of accuracy to failure analysis. A physics-based approach speeds up the assessment of failure modes and failure mechanisms by ruling out redundant or highly unlikely failure scenarios.
Understanding the physics of failure enables engineers to see how the mechanical, thermal, chemical, and electrical stresses inside a product can lead to failure. In the majority of cases, failure is not due to electrical factors. Most failure modes stem from thermal, material selection, contamination, and mechanical causes, although electrical causes also occur. These causes can be captured with simulation tools based on reliability physics, helping prevent a product from failing even before it is manufactured. For example, thermal cycling failure is a common problem in electronic devices that can be effectively mitigated through failure analysis.
Combining simulation and physical hardware analysis expedites failure assessment and helps engineers understand the physics of the failure.
A typical simulation approach might start with a design review of the PCBAs, followed by finite element analysis (FEA). Simulation methods can do the following:

- evaluate incoming materials
- assess mechanical robustness
- identify the failure modes the system is likely to be susceptible to
- determine contamination thresholds
- explore design variations that improve system reliability
Some real-world examples where this can be put in place include:
One of the most common failure mechanisms in PCBAs is solder fatigue, driven by thermal cycling. Modern PCBAs are a combination of many different materials, including glass fiber laminates, ceramics, polymers, solder, silicon, and copper, which have widely varying material properties. One of the most critical properties to consider when assessing solder fatigue failures is the coefficient of thermal expansion (CTE).
Solder is often used inside electronic packages to attach electronic components to printed circuit boards, and it typically connects materials with very different CTEs. Due to changes in the operating environment or component power dissipation, PCBAs and components undergo thermal cycling, which causes the materials to expand and contract at different rates. This differential expansion is absorbed by the solder as creep, and the accumulated creep strains in the solder lead to cracking and, eventually, the complete fracture of the solder ball.
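The mechanism described above is often quantified with the Engelmaier model, a widely cited empirical model for thermal-cycling fatigue of solder joints. It is not necessarily the model that Sherlock or any other tool uses. The sketch below works in two steps:

- It first estimates the cyclic shear strain range from the CTE mismatch, Δγ = F · L_D · Δα · ΔT / h.
- It then converts that strain into a median cycles-to-failure estimate.

The sketch assumes a leadless component and uses the model's published constants for eutectic tin-lead solder (fatigue ductility coefficient 0.325). The geometry and material values in the example are hypothetical. Lead-free alloys and leaded components need different constants.

```python
import math


def cyclic_shear_strain(l_d_mm: float, delta_cte_per_c: float,
                        delta_t_c: float, joint_height_mm: float,
                        f_factor: float = 1.0) -> float:
    """Engelmaier cyclic shear strain range for a leadless solder joint.

    l_d_mm: distance from the component's neutral point to the joint
    delta_cte_per_c: CTE mismatch between component and board (1/°C)
    delta_t_c: temperature swing of the thermal cycle (°C)
    joint_height_mm: solder joint height (same length unit as l_d_mm)
    f_factor: empirical "non-ideal" factor (roughly 0.7 to 1.5)
    """
    return f_factor * l_d_mm * abs(delta_cte_per_c) * delta_t_c / joint_height_mm


def engelmaier_cycles_to_failure(delta_gamma: float, mean_joint_temp_c: float,
                                 cycles_per_day: float,
                                 fatigue_ductility_coeff: float = 0.325) -> float:
    """Median cycles to failure; default constants are the published SnPb values."""
    # Fatigue ductility exponent c depends on mean joint temperature and cycle rate.
    c = (-0.442 - 6e-4 * mean_joint_temp_c
         + 1.74e-2 * math.log(1.0 + cycles_per_day))
    return 0.5 * (delta_gamma / (2.0 * fatigue_ductility_coeff)) ** (1.0 / c)


# Hypothetical case: ceramic part (~6 ppm/°C) on FR-4 (~17 ppm/°C in-plane),
# cycled between 0 °C and 100 °C twice a day.
strain = cyclic_shear_strain(l_d_mm=5.0, delta_cte_per_c=17e-6 - 6e-6,
                             delta_t_c=100.0, joint_height_mm=0.1)
n_f = engelmaier_cycles_to_failure(strain, mean_joint_temp_c=50.0, cycles_per_day=2)
print(f"shear strain range: {strain:.4f}, estimated cycles to failure: {n_f:.0f}")
```

For these inputs the exponent 1/c is roughly -2.2. Halving the CTE mismatch or doubling the joint height therefore raises the estimated life by a factor of about 2^2.2, or roughly 4.6. This sensitivity is one reason simulation is useful for comparing design changes before implementing a fix.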
Physical analysis of failed samples — using techniques like electrical probing, X-ray, ultrasonic microscopy, cross-sectioning with optical inspection or SEM, and dye-and-pry — can be very effective at confirming the presence and location of solder cracks and the solder fatigue mechanism. But when it comes to determining why the failure occurred and proposing solutions to prevent further failures, simulation becomes a critical tool. With simulation, analysts can include the influence of the materials, geometry, environment, attachment methods, and other factors that may drive solder fatigue. Simulation results give insight into the physics driving the failure and enable companies to virtually test the impact of design or operating condition changes before implementing a fix.
Whether applying physical analysis and testing or simulation to the solution of a failure analysis challenge, reliability physics is at the core of the Ansys approach. Our Reliability Engineering Services team includes experts in design for excellence, electronic system design, packaging, and manufacturing who apply physical analysis, testing, and simulation to solve even the most difficult failure analysis challenges. With years of experience in electronics design, the team always starts with nondestructive techniques to identify failure locations and failure mechanisms.
Ansys software can analyze many electronic systems to see what thermo-mechanical issues exist, or could exist, in an advanced technology product. Simulation is a powerful addition to the physical techniques of failure analysis and provides additional insight into the forces and material behaviors that may have led to failures.
Ansys Sherlock™ electronics reliability prediction software: Used for predicting failures based on thermo-mechanical issues. Sherlock software can model the failed system in its native environment to reproduce the behavior that led to the failure. This reliability analysis approach also enables engineers to identify failure mechanisms in the components, board, and system, so the design can be better optimized for its intended application environment. Sherlock software makes reliability predictions at the PCBA level. It can also use inputs from Ansys Mechanical™ software and the Ansys Icepak® solution to extend the analysis beyond the PCBA, such as modeling the housing around a PCBA or designing a cooling system that reduces component temperatures.
Mechanical shock simulation from Ansys Sherlock software
Ansys Mechanical structural FEA software: Provides simulations that look at the worst-case conditions in different loading scenarios that incorporate elements of the system outside of the PCBA (e.g., housings, mechanical stiffeners, and other higher-level subsystem mechanical components). Mechanical software can be used to derive board strains under different loading conditions in complex system-level assemblies. The results of a Mechanical analysis can be used to identify overstress failures or ported to Sherlock software to make component-level reliability predictions due to complex loading and constraint scenarios.
Ansys Icepak electronics cooling simulation software: Provides a thermal analysis that examines temperatures of different components on a PCBA under the influence of different cooling solutions. The results of Icepak analyses can be used to identify temperatures beyond component temperature ratings, assess component derating margins, or be incorporated into Sherlock analysis for component-level reliability predictions.
Electro-thermal analysis of a PCB using Ansys Icepak software
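The derating check mentioned for Icepak results comes down to comparing each predicted component temperature with its rating minus a safety margin. The sketch below is a hypothetical post-processing step, not an Icepak or Sherlock API. It rests on three assumptions:

- Predicted temperatures have already been exported from a thermal analysis.
- A flat 20 °C derating margin stands in as a placeholder policy. Real derating rules vary by organization, component type, and standard.
- The reference designators and temperatures are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ComponentTemp:
    ref: str             # reference designator (hypothetical)
    predicted_c: float   # predicted temperature from a thermal analysis
    rated_max_c: float   # maximum rated temperature from the datasheet


def derating_report(components: List[ComponentTemp],
                    derating_margin_c: float = 20.0) -> List[Tuple[str, float, str]]:
    """Return (ref, margin_c, status) for each component.

    margin_c is how far the predicted temperature sits below the derated limit
    (rated_max_c - derating_margin_c); a negative value means the limit is exceeded.
    """
    report = []
    for comp in components:
        derated_limit = comp.rated_max_c - derating_margin_c
        margin = derated_limit - comp.predicted_c
        if comp.predicted_c > comp.rated_max_c:
            status = "exceeds rating"
        elif margin < 0:
            status = "violates derating"
        else:
            status = "ok"
        report.append((comp.ref, margin, status))
    return report


parts = [
    ComponentTemp("U1", predicted_c=95.0, rated_max_c=125.0),
    ComponentTemp("U2", predicted_c=112.0, rated_max_c=125.0),
    ComponentTemp("Q3", predicted_c=160.0, rated_max_c=150.0),
]
for ref, margin, status in derating_report(parts):
    print(f"{ref}: margin {margin:+.1f} °C ({status})")
```

Separating "exceeds rating" from "violates derating" keeps outright overstress distinct from a loss of margin. The first is a likely failure cause; the second is a reliability risk worth feeding into a component-level prediction.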
Ansys has helped more than 3,000 customers identify and mitigate the root cause of product failures, and it uses simulation to find solutions before failures become an issue. If you want to join the 300+ companies that choose Ansys every year to solve their technical challenges, contact our physics experts today.