A Novel Framework for Real-time Fault Diagnosis Based on Dynamic Fault Tree Analysis

To meet the real-time diagnosis requirements of complex systems, this study proposes a novel framework for real-time fault diagnosis using dynamic fault tree analysis. It pays special attention to two challenges: model development and real-time reasoning. For model development, we use a dynamic fault tree model to capture the dynamic behavior of system failure mechanisms and calculate some reliability results by mapping the dynamic fault tree into an equivalent Bayesian Network (BN) in order to avoid the infamous state space explosion problem. For real-time reasoning, we adopt a logic compilation based inference algorithm, which compiles the BN into an arithmetic circuit and retrieves answers to probabilistic queries by evaluating and differentiating the arithmetic circuit. Furthermore, we incorporate sensor data into fault diagnosis, account for sensor reliability and propose schemes for updating the Diagnostic Importance Factor (DIF) and the minimal cut sets. Finally, a case study is given to validate the efficiency of this method.


INTRODUCTION
Recent technological progress and innovation have led to a continuous increase in the complexity and functionality of systems. Failures within these systems can disrupt operation and may lead to huge economic loss. Fault location has therefore become a primary objective in engineering applications. Effective diagnostic approaches, which restore the system as quickly and as cheaply as possible, can decrease downtime and consequently enhance operational functionality. To address this issue, researchers have developed many effective theories and methodologies. As far as the reasoning model is concerned, there are the dependency model (Tsai and Hsu, 2011), fault tree (Lin et al., 2010), Petri net (Basile et al., 2009), directed graph (Gao et al., 2010), neural networks (Patan et al., 2008), BN (Mansour et al., 2012) and so on. Components' DIF or minimal cut sets' DIF was calculated based on static fault tree analysis, which determines the order of the system diagnosis (Assaf and Dugan, 2006). However, this method determines the diagnostic sequence by components' DIF or minimal cut sets' DIF alone, which usually causes minimal cut sets with a smaller DIF to be checked first. Assaf and Dugan (2008) put forward a method to incorporate evidence data from sensors into the diagnostic process. However, the solution for the dynamic fault tree was based on a Markov model, which suffers from the infamous state space explosion problem. It could not incorporate the evidence into the reasoning or update the components' posterior failure probability based on the sensor data, which affects diagnostic accuracy and efficiency.
The online implementation of these diagnosis techniques is becoming an important research topic due to the increasing demand for higher performance, efficiency, reliability and safety of system equipment. An online fault diagnostic scheme for nonlinear systems based on neurofuzzy networks was proposed (Mok et al., 2008). This scheme needed intact historical data about the process operation under various normal and faulty conditions, which are very difficult to obtain. Nan et al. (2008) proposed a knowledge-based fault diagnosis approach, which used the valuable knowledge of experts and operators as well as real-time data from many sensors. Fuzzy logic was also used to make inferences based on the real-time data and the knowledge. However, this methodology is data-driven and its performance depends on the quality of the expert knowledge as well as the frequency of data processing.
Motivated by the problems mentioned above, this study presents a novel framework for real-time fault diagnosis based on dynamic fault tree analysis. It focuses on two challenges: model development and real-time reasoning. In addition, we adopt an efficient diagnostic decision algorithm based on the reliability results to optimize fault diagnosis. This method does not need large amounts of fault data, makes full use of the qualitative and quantitative information available at the system design phase and can also be used to perform online diagnosis.

PROPOSED REAL-TIME DIAGNOSIS SYSTEM FRAMEWORK
The real-time diagnosis system uses the dynamic fault tree to model the complex system. All minimal cut sets are generated using qualitative analysis of the fault tree, while the DIF is calculated via quantitative analysis. The DIF is the cornerstone of this diagnosis algorithm. It is defined conceptually as the probability that an event has occurred given that the top event has also occurred (Assaf and Dugan, 2003). This quantitative measure allows us to discriminate between components or minimal cut sequences by their importance from a diagnostic point of view. The DIF of a minimal cut sequence is defined conceptually as the probability that the minimal cut sequence event has occurred given that the system failure has occurred:
$$DIF_C = P(C \mid S), \qquad DIF_{MCS_n} = P(MCS_n \mid S)$$
where $MCS_n$ is the $n$th minimal cut sequence and $C$ is a component in system $S$.
Because the occurrence of a minimal cut set implies system failure, the DIF of minimal cut sets is determined by:
$$DIF_{MCS_n} = \frac{P(MCS_n)}{P(S)}$$
where $P(S)$ represents the unreliability of the system.
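As a minimal sketch of this ratio, the following Python snippet computes the DIF of each cut set as P(cut set) / P(S). It assumes a static tree with statistically independent components (sequence dependence in a dynamic tree would need the BN treatment described later), and the failure probabilities and cut sets are hypothetical values chosen only for illustration.

```python
from itertools import product

def system_unreliability(cut_sets, q):
    """P(S): probability that at least one cut set has all components failed.

    Brute-force enumeration over component states; assumes independence.
    """
    names = sorted(q)
    total = 0.0
    for states in product([False, True], repeat=len(names)):
        failed = {n for n, s in zip(names, states) if s}
        if any(cs <= failed for cs in cut_sets):
            p = 1.0
            for n, s in zip(names, states):
                p *= q[n] if s else 1.0 - q[n]
            total += p
    return total

def cut_set_dif(cut_sets, q):
    """DIF of each cut set: P(cut set) / P(S)."""
    p_s = system_unreliability(cut_sets, q)
    difs = {}
    for cs in cut_sets:
        p_cs = 1.0
        for n in cs:
            p_cs *= q[n]
        difs[cs] = p_cs / p_s
    return difs

# Hypothetical component failure probabilities and cut sets.
q = {"A": 0.1, "B": 0.2, "C": 0.05}
cut_sets = [frozenset({"A", "B"}), frozenset({"C"})]
p_s = system_unreliability(cut_sets, q)
difs = cut_set_dif(cut_sets, q)
```

The enumeration is exponential in the number of components, so it only serves to make the definition concrete for very small examples.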
Based on the analysis above, a framework for real-time fault diagnosis is presented in Fig. 1. It focuses on two challenges: model development and real-time reasoning. To address the challenge of model development, it uses a dynamic fault tree model to capture the dynamic behavior of system failure mechanisms and calculates some reliability results by mapping the dynamic fault tree into an equivalent BN in order to avoid the infamous state space explosion problem. In addition, the BN can handle evidence data from sensors and update the DIF after receiving them. To address the real-time reasoning challenge, a logic compilation based inference is adopted and divided into two phases: an offline phase, which compiles the network into an arithmetic circuit and is run once; and an online phase, which may be invoked multiple times and answers many queries each time it is invoked. We also introduce the evidence information function to determine the location of sensors, incorporate sensor data into fault diagnosis, account for sensor reliability and propose schemes for updating the DIF as well as the cut sets. In addition, an efficient diagnostic decision algorithm is adopted to generate a Diagnostic Decision Tree (DDT), which guides the maintenance crew to make more efficient decisions when trying to repair a system. This method does not need large amounts of fault data, makes full use of the qualitative and quantitative information available at the system design phase and can be used to perform online diagnosis.

Implementation of the proposed real-time diagnosis system:
Dynamic fault tree analysis: The fault tree is a deductive, structured methodology for determining the potential causes that may result in the occurrence of a predefined undesired event, referred to as the top event. Dynamic fault trees extend static fault trees to capture the dynamic behavior of system failure mechanisms such as sequence-dependent events, spares, dynamic redundancy management and priorities of failure events.
All minimal cut sets are generated using qualitative analysis of the fault tree, while the DIF is calculated via quantitative analysis. The Semanderes algorithm and the Fussell-Vesely algorithm are the most effective methods for generating minimal cut sets, but they are not applicable to the dynamic fault tree. The Zero-suppressed Binary Decision Diagram (ZBDD) approach overcomes this shortcoming (Tang and Dugan, 2004): it separates logic constraints from timing constraints and converts the dynamic fault tree into a static fault tree. We generate the minimal cut sets of the resulting static fault tree using set operations and expand each minimal cut set into minimal cut sequences by considering the timing constraints.
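The set operations involved can be illustrated with a short Python sketch. This is not the ZBDD algorithm itself: it uses plain Python sets on a static tree (as obtained after the timing constraints have been separated out), and the gate helpers and the example tree are hypothetical. Expansion of each cut set into cut sequences is not shown.

```python
def minimize(cut_sets):
    """Keep only cut sets that have no proper subset among the candidates."""
    unique = {frozenset(cs) for cs in cut_sets}
    return {cs for cs in unique if not any(other < cs for other in unique)}

def or_gate(left, right):
    """Cut sets of an OR gate: union of the inputs' cut sets."""
    return minimize(set(left) | set(right))

def and_gate(left, right):
    """Cut sets of an AND gate: pairwise unions of the inputs' cut sets."""
    return minimize(a | b for a in left for b in right)

def basic_event(name):
    """A basic event has a single cut set containing only itself."""
    return {frozenset({name})}

# Hypothetical static tree: TOP = (A OR B) AND (A OR C)
top = and_gate(
    or_gate(basic_event("A"), basic_event("B")),
    or_gate(basic_event("A"), basic_event("C")),
)
```

For this tree the pairwise unions are {A}, {A, B}, {A, C} and {B, C}; absorption removes the two supersets of {A}, leaving the minimal cut sets {A} and {B, C}.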
Quantitative analysis of the dynamic fault tree is used to calculate the minimal cut sequences' DIF and the components' DIF. DIF values are usually acquired from marginal importance factors produced by the sensitivity analysis of dynamic fault trees solved via Markov chains. This calculation process is not only very complicated, but also leads to the infamous state space explosion problem. In this study, we map the dynamic fault tree into an equivalent Discrete-Time Bayesian Network (DTBN) (Boudali and Dugan, 2005) and use its inference engine to calculate the posterior probability of components given that the system has failed, which is also the components' DIF. In the DTBN the mission time is divided into $n$ intervals and state $n+1$ denotes that no failure occurs during the mission time. We enter the evidence that the system has failed, $P(S = n+1) = 0$ and $P(S = x) = 1/n$, $1 \le x \le n$, and solving the BN using a compilation based algorithm gives the posterior probability $P(C = n+1 \mid S)$, so we can calculate the component's DIF:
$$DIF_C = 1 - P(C = n+1 \mid S)$$
Incorporating sensors evidence into system diagnosis: When a system failure is observed, additional evidence is sometimes observed too, which may be collected from diagnostic sensors. This evidence can be used to optimize the system diagnosis. The performance of a diagnostic system also depends strongly on the number and location of sensors. The DIF allows us to discriminate between components or minimal cut sets by their importance from a diagnostic point of view: the higher the DIF, the more important the component or minimal cut set. It can therefore be used to decide between candidate monitor locations. The components that maximize the evidence information function are monitored by sensors, so the designer can simply select the components with the highest DIF as sensor locations (Duan et al., 2011). This sensor placement scheme considers the quantitative and qualitative data obtained from reliability analysis and can guarantee a lower expected diagnostic cost.
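The selection rule "monitor the components with the highest DIF" can be sketched in a few lines of Python. The function name, the component labels and the DIF values below are hypothetical; the evidence information function itself is not reproduced here, only the simplified ranking by DIF.

```python
def select_sensor_locations(component_dif, num_sensors):
    """Return the names of the num_sensors components with the highest DIF.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(component_dif.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:num_sensors]]

# Hypothetical DIF values for four components.
dif = {"X1": 0.12, "X2": 0.31, "X3": 0.05, "X4": 0.27}
chosen = select_sensor_locations(dif, 2)  # ["X2", "X4"]
```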
After the sensor locations are determined, we can use sensor data to optimize the diagnostic process. On one hand, we can use sensor data to narrow down the number of minimal cut sets to be diagnosed. The Cut sets Under Evidence (CUE) is the set of all essential minimal cut sets obtained after evidence eliminates some cut sets. For example, suppose a system has 5 minimal cut sets: {A, B}, {A, D}, {B, C}, {C, D} and {D, E}. When evidence is ignored, these minimal cut sets are captured in the system's characteristic function:
$$F = AB + AD + BC + CD + DE$$
If sensors detect the failure of C and D, the characteristic function is reduced accordingly to give the updated CUE function. On the other hand, we can update the DIF of the components and of the CUE. Components' DIF can be updated by solving the BN according to the sensor data, while the DIF of the CUE can be calculated using:
$$DIF_{CUE} = P(CUE \mid S, E) = \frac{P(CUE, S, E)}{P(S, E)}$$
where $S$ and $E$ represent the system and the variables with given evidence, respectively.
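The elimination step can be sketched for the simplest kind of evidence. The snippet below assumes one reading of the CUE idea: a cut set containing a component that a sensor reports as working cannot have occurred, so it is dropped. How evidence of failed components (such as C and D in the example) reshapes the function is not shown, and the function name is hypothetical.

```python
def cut_sets_under_evidence(cut_sets, working):
    """Drop every cut set that contains a component known to be working."""
    working = set(working)
    return [cs for cs in cut_sets if not (cs & working)]

# The five minimal cut sets from the example in the text.
cut_sets = [frozenset(s) for s in ("AB", "AD", "BC", "CD", "DE")]

# Hypothetical evidence: a sensor reports that component A is working.
cue = cut_sets_under_evidence(cut_sets, working={"A"})
```

With A known to be working, {A, B} and {A, D} are eliminated and only {B, C}, {C, D} and {D, E} remain to be diagnosed.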
Sensors might not be completely reliable. A sensor that provides false information can misguide the diagnosis process, so a sensor failure can make the DDT meaningless in diagnosing the system failure. We must therefore consider the effect of sensor reliability, which is reflected not only in changes to the DIF but also in changes to the CUE. The BN created from the dynamic fault tree is appropriate for reliability analysis. To use the BN for fault diagnosis, we need to add nodes representing the evidence to the network. Each evidence node is linked to the component in the BN that is observed by the corresponding sensor, and the links are directed from the component to the evidence node. As to the effect on the DIF, we just change the conditional probability tables of the evidence nodes when mapping the fault tree into the BN and update the DIF according to the evidence data. As far as the effect on the CUE is concerned, we augment the CUE function by adding sensors as cut sets, since a failed sensor can lead to a faulty diagnosis process. The DIF of a sensor with respect to the system is measured in the same way as the DIF of the components:
$$DIF_{Sensor} = \frac{q_{Sensor}}{Q_S}$$
where $q_{Sensor}$ and $Q_S$ represent the unreliability of the sensor and the system, respectively. So the updated $F_{CUE}$ is the previous CUE function with each sensor added as an additional cut set.
Real-time diagnosis: After mapping the dynamic fault tree into a BN, we apply inference algorithms to the model to calculate the components' DIF and update them according to the real-time evidence data. Some popular algorithms exploit global structure to a certain extent and run in time that is exponential in a measure known as treewidth. However, for BNs with larger treewidth these algorithms are ill-suited to exact, real-time inference. To address this problem, a number of approaches have been proposed (Larkin and Dechter, 2003; Poole and Zhang, 2003), which seek to exploit local structure. These techniques have achieved only limited success and do not answer multiple queries simultaneously. A BN derived from a dynamic fault tree has a great deal of local structure, including determinism (parameters equal to 0 or 1) and equal parameters. In addition, multiple queries need to be calculated simultaneously. A logic compilation based inference is therefore adopted, because logic allows many types of structure to be represented explicitly and allows us to leverage state-of-the-art algorithms for knowledge compilation. The compilation based inference method is divided into two phases, offline compilation and online inference. The offline phase compiles the network into an arithmetic circuit and is run once (Chavira and Darwiche, 2005;Darwiche and Corporation, 2009). In particular, the approach encodes the BN into a Conjunctive Normal Form (CNF), converts the CNF into a smooth deterministic Decomposable Negation Normal Form (sd-DNNF) circuit and then extracts an arithmetic circuit from the sd-DNNF circuit. The online phase may be invoked multiple times and answers many queries each time it is invoked. The main advantage of compilation is that a large amount of the work required for inference is performed once offline; this effort is then amortized over many online queries. Online inference is typically much faster when using a compilation approach and can be applied to BNs with large treewidth.
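The online phase can be illustrated with a small hand-built arithmetic circuit in Python. This sketch skips the CNF and sd-DNNF compilation steps entirely: the circuit for a two-node network (a component with one sensor evidence node) is written out by hand, and the prior of 0.1 and the sensor error rate of 0.02 are hypothetical. It relies on the standard property of the network polynomial that, for a variable not in the evidence, the partial derivative with respect to its indicator equals the joint probability of that value and the evidence.

```python
def evaluate(node, lam):
    """Upward pass: value of the circuit under indicator assignment lam."""
    kind = node[0]
    if kind == "param":
        return node[1]
    if kind == "ind":
        return lam[node[1]]
    values = [evaluate(child, lam) for child in node[1:]]
    if kind == "+":
        return sum(values)
    result = 1.0
    for v in values:
        result *= v
    return result

def differentiate(node, lam, upstream=1.0, grads=None):
    """Downward pass: partial derivatives with respect to each indicator.

    Child values are recomputed here for brevity; a real implementation
    would cache them so that both passes are linear in the circuit size.
    """
    if grads is None:
        grads = {}
    kind = node[0]
    if kind == "ind":
        grads[node[1]] = grads.get(node[1], 0.0) + upstream
    elif kind == "+":
        for child in node[1:]:
            differentiate(child, lam, upstream, grads)
    elif kind == "*":
        children = node[1:]
        values = [evaluate(c, lam) for c in children]
        for i, child in enumerate(children):
            others = 1.0
            for j, v in enumerate(values):
                if j != i:
                    others *= v
            differentiate(child, lam, upstream * others, grads)
    return grads

def sensor_branch(p_alarm):
    """Sub-circuit for the sensor reading given one component state."""
    return ("+",
            ("*", ("ind", "alarm"), ("param", p_alarm)),
            ("*", ("ind", "quiet"), ("param", 1.0 - p_alarm)))

# Hypothetical network: component (failed/ok) -> sensor (alarm/quiet).
circuit = ("+",
           ("*", ("ind", "failed"), ("param", 0.1), sensor_branch(0.98)),
           ("*", ("ind", "ok"), ("param", 0.9), sensor_branch(0.02)))

# Evidence: the sensor raised an alarm; the component state is unobserved.
lam = {"failed": 1.0, "ok": 1.0, "alarm": 1.0, "quiet": 0.0}
p_evidence = evaluate(circuit, lam)            # P(alarm)
grads = differentiate(circuit, lam)            # grads["failed"] = P(failed, alarm)
posterior_failed = grads["failed"] / p_evidence
```

One evaluation and one differentiation pass yield the posterior of every unobserved variable at once, which is why the approach suits the multiple simultaneous queries needed for updating component DIFs.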
As cut sets represent minimal sets of component failures that can cause a system failure, we should diagnose them one by one to find the root cause of the system failure. Only when we finish diagnosing one minimal cut set can we move on to the next. The order in which cut sets are checked depends on their DIF ordering, while the order of components within the same cut set is determined by the components' DIF. The cut sets with larger DIF are checked first. Accordingly, components with larger DIF in a cut set are checked first (Duan et al., 2011). This ensures a reduced number of system checks while fixing the system.
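The ordering rule above can be sketched as a two-level sort in Python. The function name and the DIF values are hypothetical, and the sketch only produces a flat checking order rather than a full DDT.

```python
def diagnostic_order(cut_set_dif, component_dif):
    """Order cut sets by descending DIF, and components within each
    cut set by descending component DIF (ties broken by name)."""
    ordered_sets = sorted(
        cut_set_dif, key=lambda cs: (-cut_set_dif[cs], sorted(cs))
    )
    return [
        sorted(cs, key=lambda c: (-component_dif[c], c))
        for cs in ordered_sets
    ]

# Hypothetical DIF values.
cut_set_dif = {frozenset({"A", "B"}): 0.29, frozenset({"C"}): 0.72}
component_dif = {"A": 0.35, "B": 0.40, "C": 0.72}
order = diagnostic_order(cut_set_dif, component_dif)  # [["C"], ["B", "A"]]
```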
Average diagnostic cost is often used to evaluate a fault diagnosis method: the lower the diagnostic cost, the better the method. Since the output of the fault diagnosis method is the DDT, we can evaluate it with the help of several decision tree evaluation measures. Traditional evaluation measures include the mean depth of the tree and the expected cost function. But these measures only consider the test cost and the failure probability of components and neglect the system's qualitative structure and the importance factors of each component. Also, they only diagnose one fault at a time and are not capable of detecting multiple faults in a single tree traversal. Based on these evaluation mechanisms, we adopt the Expected Diagnostic Cost (EDC), which incorporates the qualitative (structure) and quantitative (reliability analysis) information into one measure for predicting diagnosis cost:
$$EDC = \sum_{i} cp_i \cdot \frac{qcutset_i}{Q_s}$$
where $Q_s$ is the unreliability of the system, $cp_i$ is the sum of all test costs from the top node to the leaf node of cut set $i$ and $qcutset_i$ is the unreliability of cut sequence $i$.
Case study: Micro-computer controlled straight electro-pneumatic braking system: The micro-computer controlled straight electro-pneumatic braking system has been the first-choice braking system for urban rail transit, offering swift response, flexible operation, combined application with electric braking and anti-slip control. It is an electro-mechanical control system and achieves its function through the coordination of an electrical circuit part and an air circuit part. However, these modules are highly coupled and have complicated logic relationships. Much current research on the micro-computer controlled straight electro-pneumatic braking system has focused on its reliability analysis using a reliability block diagram or static fault tree. This study uses a dynamic fault tree to model its dynamic fault characteristics and proposes an efficient real-time diagnostic algorithm which can guide the maintenance crew to recover the failed system at the lowest cost.

APPLICATION OF DIAGNOSTIC METHODS
Figure 2 shows a dynamic fault tree for service braking failure of a micro-computer controlled straight electro-pneumatic braking system. Any one of braking control failure, air supply unit failure, braking control output failure and braking execution unit failure will result in service braking failure.
We generate all minimal cut sequences via the efficient ZBDD and calculate the DIF of components as well as minimal cut sequences by mapping the dynamic fault tree into the equivalent DTBN, as shown in Fig. 3. Since the service braking failure has been observed, its failure probability is set to 1. Solving this BN using the inference algorithm mentioned above gives the importance factors in Table 1. Assume the number of sensors is two. According to Table 1, X18 and X19 will be the best locations for the sensors. Assume sensors have a fixed probability of failure of 0.02; the DIF of the sensor is 0.357. If sensors monitor the failure of X18 and X19, we can use this evidence to optimize the diagnosis. On one hand, we can use it to reduce the characteristic function and generate the updated CUE function; on the other hand, we can update the DIF of components and CUE after receiving the evidence data. Table 2 shows the diagnostic data with sensor data. Based on the diagnostic decision [...]

Fig. 2: A dynamic fault tree for service braking failure of braking system

Table 1: Detailed diagnostic data without sensors

CONCLUSION
To meet the challenge of model development, we use a dynamic fault tree model to capture the dynamic behavior of system failure mechanisms and calculate some reliability results by mapping a dynamic fault tree into an equivalent BN in order to avoid the infamous state space explosion problem. To meet the real-time reasoning challenge, we compile the BN into an arithmetic circuit, an approach that supports real-time diagnosis in two ways. First, the use of an arithmetic circuit results in more predictable diagnostic inference times. Second, it results in much faster inference. Furthermore, we incorporate sensor data into fault diagnosis, account for sensor reliability and propose schemes for updating the DIF and the cut sets. In addition, an efficient diagnostic decision algorithm is developed based on these results to optimize system diagnosis. Finally, a case study is given to demonstrate the efficiency of this method. The proposed method combines the advantages of the dynamic fault tree for modeling, the BN for inference and the arithmetic circuit for real-time reasoning, which makes it especially suitable for the diagnosis of complex systems.
In future work, we will focus on system model optimization and evaluate diagnostic sensitivity to the underlying model and parameters to improve the effectiveness of our algorithms and methods.