Fault analysis, also known as fault tree analysis, is a method for determining the chains of events that could cause a system to fail and compromise safety or stability. Engineers often use it in safety and hazard evaluations. The method models the complex relationships among hardware, software, and human actions using techniques derived from Boolean algebra, probability theory, and reliability theory. The end product is a logical, visual diagram that represents either the potential failures a system may suffer or failures that have already occurred, together with their causes.
The top of a fault tree diagram displays the final failed state of the system, while the events that branch off below it show the component states that could allow that final state to occur. The lines and shapes connecting the events show the logical relationships between them. For example, if either a closed valve or an unavailable pump could cause loss of cooling, a pointed dome shape representing "or" would connect these two possible causes to the final state. If both the closed valve and the unavailable pump were necessary to cause loss of cooling, however, a rounded dome representing "and" would be used instead. The next level down in this hypothetical fault tree diagram would show which component failures or conditions might lead to the closed valve or the unavailable pump.
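The valve and pump example above can be sketched in a few lines of Python. This is only an illustrative sketch, not a standard tool: the class names, the `occurs` and `probability` functions, and the failure probabilities (0.01 and 0.02) are all assumptions made for the example. The probability calculation further assumes that the basic events are independent and that each appears only once in the tree.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class BasicEvent:
    """A lowest-level cause, such as a single component failure."""
    name: str
    probability: float  # hypothetical value chosen for illustration


@dataclass
class Gate:
    """An intermediate or top event whose inputs are joined by AND or OR."""
    name: str
    kind: str  # "AND" or "OR"
    inputs: List[Union["Gate", BasicEvent]] = field(default_factory=list)


def occurs(node, failed):
    """Return True if the event occurs, given the set of failed basic event names."""
    if isinstance(node, BasicEvent):
        return node.name in failed
    results = [occurs(child, failed) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind}")


def probability(node):
    """Probability of the event, assuming independent, non-repeated basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    probs = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        # All inputs must occur: multiply their probabilities.
        result = 1.0
        for p in probs:
            result *= p
        return result
    if node.kind == "OR":
        # At least one input occurs: 1 minus the chance that none occur.
        none_occur = 1.0
        for p in probs:
            none_occur *= 1.0 - p
        return 1.0 - none_occur
    raise ValueError(f"unknown gate kind: {node.kind}")


valve = BasicEvent("valve closed", 0.01)
pump = BasicEvent("pump unavailable", 0.02)

# Either cause alone is enough for loss of cooling.
or_tree = Gate("loss of cooling", "OR", [valve, pump])
# Both causes are required for loss of cooling.
and_tree = Gate("loss of cooling", "AND", [valve, pump])

print(occurs(or_tree, {"valve closed"}))   # True: one cause suffices
print(occurs(and_tree, {"valve closed"}))  # False: the pump is still available
print(probability(or_tree))                # about 0.0298
print(probability(and_tree))               # about 0.0002
```

With these assumed numbers, the "or" arrangement makes loss of cooling roughly 150 times more likely than the "and" arrangement, which is why the choice of gate matters so much when reading a fault tree.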
H. A. Watson of Bell Laboratories first developed fault analysis in 1962 while working under a US Air Force contract to study the launch control system of the Minuteman intercontinental ballistic missile. Boeing recognized the benefits of the method and adapted it for use in the design of commercial aircraft. Fault analysis gained national attention after the Apollo 1 launch pad fire on January 27, 1967, when NASA hired Boeing to design a new safety program for the Apollo project.
Fault analysis then spread to the nuclear power industry, where it was used to analyze the Three Mile Island nuclear power plant incident of March 28, 1979. The nuclear power industry probably did more for the development of fault tree theory and software than any other group, according to "Fault Tree Analysis - A History" by Clifton A. Ericson II. Today, a variety of industries, including automotive, rail, and robotics, use this method.
Fault analysis software is now widely available and helps engineers build and evaluate fault tree diagrams. Ideally, fault analysis helps prevent major incidents or accidents by identifying root causes and improving users' understanding of the system. The method can also be used after an accident occurs to determine what went wrong.