Biological Nitrogen Removal Process Monitoring Based on Fuzzy Robust PCA

In this study, the Fuzzy Robust Principal Component Analysis (FRPCA) method is used to monitor a biological nitrogen removal process, and its performance is compared with that of classical Principal Component Analysis (PCA). The results demonstrate the superior performance of this robust extension over the conventional method. The fuzzy variant of PCA uses fuzzy memberships and becomes robust by assigning small membership values to outliers, which reduces their influence. The Squared Prediction Error (SPE) index is used for fault detection. Contribution plots and the Sensor Validity Index (SVI) are then used for fault localization.


INTRODUCTION
During the last few decades, wastewater treatment plants (WWTPs) have become large and complex industrial facilities. Requirements on modeling and process monitoring efficiency are becoming increasingly important for improving performance and reliability. Wastewater treatment modeling has been established in the scientific literature as a useful engineering tool for design, operation and control. The first technology to be modeled in this field was the activated sludge process, which has proved to be very stable. Today the Activated Sludge Process (ASP) is the most commonly used process for treating both municipal and industrial wastewaters, and the most successful model is the Activated Sludge Model No. 1 (Henze et al., 1987). This model triggered the general acceptance of biological modeling (Tomita et al., 2002); default stoichiometric and kinetic parameters have been proposed for it and shown to give realistic results. Efficient models are important for controller design. However, efficient monitoring techniques are a key element in optimizing WWTP operation. The number of process monitoring studies, including fault detection and diagnosis based on statistical process control, has increased almost exponentially over the last few decades (Yoo et al., 2003; Lee et al., 2004a; Zhao et al., 2004; Lee et al., 2004b; Wang and Romagnoli, 2005; Aguado and Rosen, 2008; Moon et al., 2009; Corona et al., 2013). Among these methods, PCA is the most popular; it has been successfully applied to the analysis of data collected from systems in operation in order to supervise their behavior. Several extensions of PCA have been investigated and applied to treatment plants. Haimi et al. (2013) review the publications on PCA applications in biological WWTPs over the last 15 years.
In this study, we investigate the effectiveness of the FRPCA method compared with PCA. Both methods are applied to a biological process in order to monitor it and detect faults early, so that corrective actions can be taken before a dangerous situation occurs. A realistic data set is used to validate the technique, and conclusions are drawn at the end.

MATHEMATICAL FORMULATION
Traditional PCA: Principal Component Analysis (PCA) is a projection-based technique that reduces the data dimension by constructing orthogonal principal components, which are linear combinations of the original variables and generally represent the data in far fewer dimensions (Runger and Alt, 1996). The principal components are ordered so that the first Principal Component (PC) is the linear combination with the greatest variance, the second PC is the linear combination with the next greatest variance, and so on for the remaining PCs. A small number of principal components is often sufficient to account for most of the structure in the data. For a centred and scaled data matrix $X$ with $n$ samples and $q$ variables, the transformation is defined by:

$$X = \sum_{j=1}^{m} t_j\, p_j^{T} + E = T P^{T} + E$$

where the $p_j$ are the $q \times 1$ loading vectors; they are defined here as orthonormal and are therefore the eigenvectors of the data covariance matrix $X^{T}X/(n-1)$. $t_j = X p_j$ is the $n \times 1$ score vector corresponding to the $j$th principal component, $m$ is the number of principal components retained and $E$ is the residual matrix. A new sample vector $x$ can be decomposed into two parts:

$$x = \hat{x} + \tilde{x}, \qquad \hat{x} = P P^{T} x, \qquad \tilde{x} = (I - P P^{T})\, x$$

where $\hat{x}$ is the projection of $x$ onto the principal component subspace and $\tilde{x}$ is the residual vector. For more information, the reader is referred to Jackson (1991), Jolliffe (1986) and Wold et al. (1987).
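A minimal Python sketch of the PCA model above, assuming NumPy and a reference data matrix with one sample per row. The function names fit_pca and decompose are hypothetical and are not part of the original study. The loadings $P$ are taken from the eigenvectors of the covariance matrix of the scaled data, and a new sample is split into its modelled part $\hat{x}$ and residual $\tilde{x}$.

```python
import numpy as np

def fit_pca(X, m):
    """Fit a PCA model with m retained components to reference data X (n x q)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                        # centre and scale each variable
    cov = Z.T @ Z / (Z.shape[0] - 1)            # covariance of the scaled data
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :m]                          # loading vectors p_1..p_m as columns
    return mean, std, P, eigvals

def decompose(x, mean, std, P):
    """Split a new sample into its modelled part x_hat and its residual x_tilde."""
    z = (x - mean) / std
    x_hat = P @ (P.T @ z)                       # P P^T z: projection onto the PCs
    x_tilde = z - x_hat                         # (I - P P^T) z: residual part
    return x_hat, x_tilde
```

The scores of the reference data, if needed, follow as `Z @ P`. Because $PP^{T}$ is an orthogonal projector, $\hat{x}$ and $\tilde{x}$ are orthogonal and sum to the scaled sample.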

Fuzzy robust principal component analysis: Fuzzy Robust Principal Component Analysis (FRPCA) algorithms are designed to address two major problems of the PCA approach: first, in real applications the data often arrive online; second, PCA is sensitive to outliers (Xu and Yuille, 1995). The robust rules developed by Xu and Yuille (1995) make FRPCA techniques attractive because they yield better accuracy. The FRPCA algorithms used here were introduced by Yang and Wang (1999), and a nonlinear extension was proposed by Luukka (2009). The algorithms of Yang and Wang (1999) build on those of Xu and Yuille (1995); their strength lies in the way they handle outliers in a given data set (Luukka, 2009). Yang and Wang (1999) defined a fuzzy objective function that includes the Xu and Yuille objective as a crisp special case. We briefly describe the theory of Xu and Yuille and then the modification proposed by Yang and Wang. Xu and Yuille (Yang and Wang, 1999) proposed an optimization function with an energy measure $e(x_i)$, subject to the membership set $u_i \in \{0, 1\}$:

$$E(U, w) = \sum_{i=1}^{n} u_i\, e(x_i) + \eta \sum_{i=1}^{n} (1 - u_i) \qquad (1)$$

where $X = \{x_1, x_2, \dots, x_n\}$ is the data set, $U = \{u_i \mid i = 1, \dots, n\}$ is the membership set and $\eta$ is the threshold. The variable $u_i$ decides whether $x_i$ is a sample or an outlier: when $u_i = 1$, the energy contributed by $x_i$ is taken into account; otherwise $x_i$ is treated as an outlier (Moon et al., 2009). The goal is to minimize $E(U, w)$ with respect to $u_i$ and $w$. Since $u_i$ is a binary variable and $w$ is a continuous one, this mixed problem is hard to solve by gradient descent. To overcome this difficulty, the minimization problem was transformed into the maximization of a Gibbs distribution using a partition function:

$$P(U, w) = \frac{\exp\left(-E(U, w)\right)}{Z} \qquad (2)$$

where $Z$ is the partition function ensuring $\sum_{U}\sum_{w} P(U, w) = 1$. The measure $e(x_i)$ can be one of the following functions:

$$e_1(x_i) = \lVert x_i - (w^{T}x_i)\, w \rVert^{2} \qquad (3)$$

$$e_2(x_i) = \lVert x_i \rVert^{2} - \frac{(w^{T}x_i)^{2}}{\lVert w \rVert^{2}} \qquad (4)$$
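The gradient descent rules for minimizing $E_1 = \sum_{i} e_1(x_i)$ and $E_2 = \sum_{i} e_2(x_i)$ are:

$$w_{new} = w_{old} + \alpha_t \left[ y\,(x_i - u) + (y - v)\, x_i \right] \qquad (5)$$

$$w_{new} = w_{old} + \alpha_t \left( x_i\, y - \frac{w_{old}}{w_{old}^{T} w_{old}}\, y^{2} \right) \qquad (6)$$

where $\alpha_t$ is the learning rate, $y = w^{T}x_i$, $u = y\,w$ and $v = w^{T}u$. Oja presented a nonlinear PCA (Oja, 1995) that minimizes $\lVert e_3(x_i) \rVert^{2}$, where $e_3(x_i) = x_i - w\, g(y)$, $y = w^{T}x_i$ and $g$ can be chosen as a nonlinear function. In this case the weight update is:

$$w_{new} = w_{old} + \alpha_t \left[ x_i\, e_3(x_i)^{T} w_{old}\, F + e_3(x_i)\, g(y) \right]$$

where $F = \mathrm{d}g(y)/\mathrm{d}y$. Yang and Wang proposed a fuzzy variant of objective function (1) in which the binary memberships $u_i$ are replaced by fuzzy memberships raised to the fuzziness variable $m$:

$$E = \sum_{i=1}^{n} u_i^{m}\, e(x_i) + \eta \sum_{i=1}^{n} (1 - u_i)^{m} \qquad (7)$$

subject to $u_i \in [0, 1]$ and $m \in [1, \infty)$. Here $u_i$ is the membership of $x_i$ in the data cluster and $1 - u_i$ is its membership in the noise cluster; $m$ is the so-called fuzziness variable. In this case, $e(x_i)$ measures the error between $x_i$ and the class center. This idea is similar to the C-means algorithm (Oja, 1982). Since $u_i$ is now continuous, the difficulty of mixed discrete and continuous optimization is avoided and gradient descent can be used. The derivative of (7) with respect to $u_i$ is:

$$\frac{\partial E}{\partial u_i} = m\, u_i^{m-1} e(x_i) - m\, \eta\, (1 - u_i)^{m-1} \qquad (8)$$

Setting this derivative to zero gives:

$$u_i = \frac{1}{1 + \left( e(x_i)/\eta \right)^{1/(m-1)}} \qquad (9)$$

Substituting this result into the objective function and simplifying, we obtain:

$$E = \sum_{i=1}^{n} \left( \frac{1}{1 + \left( e(x_i)/\eta \right)^{1/(m-1)}} \right)^{m-1} e(x_i) \qquad (10)$$

The gradient with respect to $w$ is:

$$\frac{\partial E}{\partial w} = \sum_{i=1}^{n} \beta(x_i)\, \frac{\partial e(x_i)}{\partial w} \qquad (11)$$

where

$$\beta(x_i) = \left( \frac{1}{1 + \left( e(x_i)/\eta \right)^{1/(m-1)}} \right)^{m} \qquad (12)$$

If $m = 1$, the fuzzy membership reduces to the hard membership, determined by the rule:

$$u_i = \begin{cases} 1, & e(x_i) < \eta \\ 0, & \text{otherwise} \end{cases} \qquad (13)$$

In this situation $\eta$ is a hard threshold. There is no general rule for setting $m$, but most papers set $m = 2$. Yang and Wang (1999) derived the following three algorithms for the optimization procedure: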

FRPCA1 algorithm:
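Step 1: Initially set the iteration count $t = 1$, the iteration bound $T$, the learning coefficient $\alpha_0 \in (0, 1]$ and the soft threshold $\eta$ to a small positive value, and randomly initialize the weight $w$.
Step 2: While $t$ is less than $T$, perform steps 3 to 9.
Step 3: Compute $\alpha_t = \alpha_0 (1 - t/T)$, set $i = 1$ and $\sigma = 0$.
Step 4: While $i \le n$, perform steps 5 to 8.
Step 5: Compute $y = w^{T}x_i$, $u = y\,w$ and $v = w^{T}u$.
Step 6: Update the weight: $w_{new} = w_{old} + \alpha_t\, \beta(x_i) \left[ y\,(x_i - u) + (y - v)\, x_i \right]$.
Step 7: Update the temporary count: $\sigma = \sigma + e_1(x_i)$.
Step 8: Add 1 to $i$.
Step 9: Compute $\eta = \sigma / n$ and add 1 to $t$.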
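The FRPCA1 steps, together with the fuzzy weight $\beta(x_i)$ of Eq. (12), can be sketched as follows for the first principal component. This is an illustrative sketch, not the authors' implementation: it assumes NumPy, centred and scaled data, a fuzziness variable $m > 1$ and a random unit-norm initial weight, and the names frpca1 and fuzzy_weight, as well as the default values of T, alpha0 and the initial eta, are hypothetical choices.

```python
import numpy as np

def fuzzy_weight(e, eta, m=2.0):
    """beta(x_i) = u_i**m, with u_i from Eq. (9); requires m > 1."""
    u = 1.0 / (1.0 + (e / eta) ** (1.0 / (m - 1.0)))
    return u ** m

def frpca1(X, T=50, alpha0=0.01, m=2.0, eta0=1e-3, seed=0):
    """Sketch of FRPCA1 (after Yang and Wang, 1999) for the first component.

    X: n x q array of centred and scaled samples, one sample per row.
    """
    rng = np.random.default_rng(seed)
    n, q = X.shape
    w = rng.normal(size=q)
    w /= np.linalg.norm(w)                      # Step 1: random unit-norm weight
    eta = eta0                                  # Step 1: small positive soft threshold
    for t in range(1, T):                       # Step 2: while t < T
        alpha = alpha0 * (1.0 - t / T)          # Step 3: decreasing learning rate
        sigma = 0.0                             # Step 3: reset the temporary count
        for x in X:                             # Steps 4 and 8: loop over the samples
            y = w @ x                           # Step 5
            u = y * w
            v = w @ u
            e1 = float(np.sum((x - u) ** 2))    # e1(x_i) = ||x_i - y w||^2, Eq. (3)
            b = fuzzy_weight(e1, eta, m)        # small for samples far from the model
            w = w + alpha * b * (y * (x - u) + (y - v) * x)   # Step 6
            sigma += e1                         # Step 7
        eta = sigma / n                         # Step 9: new soft threshold
    return w / np.linalg.norm(w)
```

Samples whose energy is large relative to $\eta$ receive a small $\beta(x_i)$ and therefore barely move the weight; with $\eta = 1$ and $m = 2$, a sample with $e = 3$ gets $u = 0.25$ and $\beta = 0.0625$. Further components would require deflating the data, which is not shown here.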

FRPCA2 algorithm:
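The same as FRPCA1 except for steps 6 and 7:
Step 6: Update the weight: $w_{new} = w_{old} + \alpha_t\, \beta(x_i) \left( x_i\, y - \frac{w_{old}}{w_{old}^{T} w_{old}}\, y^{2} \right)$.
Step 7: Update the temporary count: $\sigma = \sigma + e_2(x_i)$.

FRPCA3 algorithm:
The same as FRPCA1 except for steps 6 and 7:
Step 6: Update the weight: $w_{new} = w_{old} + \alpha_t\, \beta(x_i) \left( x_i\, y - w_{old}\, y^{2} \right)$.
Step 7: Update the temporary count: $\sigma = \sigma + e(x_i)$.

As can be seen, the three algorithms differ only slightly. In this work we applied the nonlinear FRPCA3 algorithm proposed by Luukka (2009), which changes the way the weight and the count are updated.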
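Steps 6 and 7 of FRPCA2 and FRPCA3 can be written as single-sample updates that would replace the corresponding lines of an FRPCA1 loop. In this sketch the count of FRPCA3 uses $e_1(x_i)$ as the energy $e(x_i)$; that choice, NumPy, and the function names step_frpca2 and step_frpca3 are assumptions. Each function returns the updated weight and the energy to add to $\sigma$.

```python
import numpy as np

def step_frpca2(w, x, alpha, b):
    """Steps 6-7 of FRPCA2: updated weight and e2(x_i)."""
    y = w @ x
    ww = w @ w
    e2 = x @ x - y ** 2 / ww                    # energy e2(x_i), Eq. (4)
    w_new = w + alpha * b * (x * y - w / ww * y ** 2)
    return w_new, e2

def step_frpca3(w, x, alpha, b):
    """Steps 6-7 of FRPCA3 (Oja-type rule); e1(x_i) is assumed for the count."""
    y = w @ x
    e = float(np.sum((x - y * w) ** 2))         # assumed energy: e1(x_i), Eq. (3)
    w_new = w + alpha * b * (x * y - w * y ** 2)
    return w_new, e
```

A sample lying exactly along a unit-norm weight leaves that weight unchanged under both rules, which is the fixed-point behaviour expected of a principal direction.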

New nonlinear FRPCA3 algorithm:
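The same as FRPCA3 except for steps 6 and 7:
Step 6: Compute $g(y)$, $F = \mathrm{d}g(y)/\mathrm{d}y$ and $e_3(x_i) = x_i - w_{old}\, g(y)$. Update the weight: $w_{new} = w_{old} + \alpha_t\, \beta(x_i) \left[ x_i\, e_3(x_i)^{T} w_{old}\, F + e_3(x_i)\, g(y) \right]$.
Step 7: Update the temporary count: $\sigma = \sigma + \lVert e_3(x_i) \rVert^{2}$.
The weight $w$ in these updating rules converges to the principal component vector almost surely (Oja, 1982; Oja and Karhunen, 1985).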
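A single-sample sketch of step 6 of the nonlinear FRPCA3 update. The text does not fix the nonlinearity, so $g = \tanh$ is an assumption here, as are NumPy and the name step_nonlinear_frpca3. With $\alpha_t \beta(x_i) = 1$, the update equals minus one half of the gradient of $\lVert e_3(x_i) \rVert^{2}$, which is how the rule can be checked numerically.

```python
import numpy as np

def step_nonlinear_frpca3(w, x, alpha, b, g=np.tanh,
                          dg=lambda y: 1.0 - np.tanh(y) ** 2):
    """Steps 6-7 of the nonlinear FRPCA3 update for one sample (sketch).

    g is the nonlinearity (tanh is an assumption) and dg its derivative.
    Returns the updated weight and ||e3(x_i)||^2 for the temporary count.
    """
    y = w @ x                                   # y = w^T x_i
    gy = g(y)
    F = dg(y)                                   # F = dg(y)/dy
    e3 = x - w * gy                             # residual e3(x_i) = x_i - w_old g(y)
    w_new = w + alpha * b * (x * (e3 @ w) * F + e3 * gy)
    return w_new, float(e3 @ e3)
```

Any other differentiable $g$ can be passed in together with its derivative `dg`.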

FAULT DETECTION AND IDENTIFICATION
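Fault detection: Once an FRPCA model is built, a criterion is needed to judge whether the process data remain consistent with this model. Multivariate statistical process monitoring relies on such criteria. Here the Squared Prediction Error (SPE) index, also known as the Q statistic, is used to test data against the FRPCA-based model. The SPE is given by:

$$SPE(k) = \sum_{j=1}^{q} \left( x_j(k) - \hat{x}_j(k) \right)^{2} = \lVert \tilde{x}(k) \rVert^{2}$$

where $q$ is the number of process variables, $x_j(k)$ is an FRPCA input and $\hat{x}_j(k)$ is its prediction from the FRPCA model. The control limit $\delta_\alpha^{2}$ is:

$$\delta_\alpha^{2} = \theta_1 \left[ \frac{c_\alpha \sqrt{2\theta_2 h_0^{2}}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^{2}} \right]^{1/h_0}$$

where $\alpha$ is the confidence limit, $c_\alpha$ is the corresponding standard normal deviate and:

$$\theta_i = \sum_{j=m+1}^{q} \lambda_j^{i}, \quad i = 1, 2, 3, \qquad h_0 = 1 - \frac{2\theta_1\theta_3}{3\theta_2^{2}}$$

where $m$ is the number of retained principal components and $\lambda_j$ is the $j$th eigenvalue of the covariance matrix. The control limit is calculated from reference data. If the SPE exceeds its control limit, the system is considered to be in a faulty state, which reflects a change in the correlation structure of the process variables.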
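The SPE and the control limit formula above can be computed as in the following sketch. It assumes NumPy and Python's `statistics.NormalDist` for the normal deviate $c_\alpha$; the argument `confidence` plays the role of the confidence limit $\alpha$, and the function names spe and spe_limit are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def spe(x_tilde):
    """SPE (Q statistic) of one residual vector."""
    x_tilde = np.asarray(x_tilde, dtype=float)
    return float(x_tilde @ x_tilde)

def spe_limit(eigvals, m, confidence=0.95):
    """Control limit delta^2 computed from the discarded eigenvalues."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1][m:]  # lambda_{m+1}..lambda_q
    th1, th2, th3 = (float(np.sum(lam ** i)) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2 ** 2)
    c = NormalDist().inv_cdf(confidence)        # standard normal deviate c_alpha
    inner = (c * np.sqrt(2.0 * th2 * h0 ** 2) / th1 + 1.0
             + th2 * h0 * (h0 - 1.0) / th1 ** 2)
    return float(th1 * inner ** (1.0 / h0))
```

A sample is flagged as faulty when `spe(x_tilde) > spe_limit(eigvals, m)`, with the eigenvalues taken from the reference (fault-free) data.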
Fault identification: When a fault is detected, the variable responsible for it must be identified; this is called fault localization. Several methods for locating faulty variables have been proposed in the literature; in this study we use contribution plots and fault reconstruction.

Contribution plots:
The classical approach to fault identification with PCA is based on the contribution of each variable to the detection index (Alcala and Qin, 2009): the variable with the largest contribution to the detection indicator is considered responsible for the increase in the SPE index. The contribution of the $j$th variable to the SPE statistic is defined as:

$$cont_j^{SPE}(k) = \left( x_j(k) - \hat{x}_j(k) \right)^{2} = \tilde{x}_j(k)^{2}$$

Fault reconstruction: The principle of reconstruction consists of estimating one variable of the data vector $x(k)$ at a given time, denoted $x_i(k)$, from the PCA model and the other variables. Three different approaches can be used to reconstruct a faulty sensor; in this study the iterative reconstruction algorithm was used.
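The contribution $cont_j^{SPE}$ defined above only requires the residual vector. A minimal sketch, assuming NumPy, a scaled sample `z` and a loading matrix `P` with the retained PCs as columns; the name spe_contributions is hypothetical.

```python
import numpy as np

def spe_contributions(z, P):
    """Per-variable contributions to SPE for one scaled sample z, given loadings P."""
    z = np.asarray(z, dtype=float)
    x_tilde = z - P @ (P.T @ z)                 # residual vector x_tilde(k)
    return x_tilde ** 2                         # cont_j = x_tilde_j(k)^2

# Toy model with one PC along [1, 1, 1] and a deviation on the third variable.
P = np.ones((3, 1)) / np.sqrt(3.0)
cont = spe_contributions([1.0, 1.0, 3.0], P)
suspect = int(np.argmax(cont))                  # index 2, the third variable
```

The contributions add up to the SPE of the sample, so a contribution plot is simply a bar chart of this vector.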
Assuming that the fault direction is known, the $i$th variable is reconstructed from the selected principal components by an iterative technique in which the value of the faulty sensor is repeatedly replaced by its predicted value:

$$\hat{x}_i^{new} = c_i^{T}\, \hat{x}^{old}$$

where $\hat{x}^{old}$ is the vector $x$ with its $i$th entry replaced by the previous estimate, $C = P P^{T}$ is the projection matrix onto the principal component subspace and $c_i$ is its $i$th column, so that $c_i^{T}\hat{x}^{old}$ is the $i$th entry of the projection of $\hat{x}^{old}$ onto the PCs. The iterative process converges to:

$$x_i^{*} = \frac{c_{-i}^{T}\, x}{1 - c_{ii}}$$

where $c_{-i}$ is the $i$th column of $C$ with its $i$th element $c_{ii}$ replaced by 0, and $c_{ii} \neq 1$. If $c_{ii} = 1$, the $i$th variable cannot be reconstructed by this method.
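The iterative reconstruction and its closed-form limit can be compared directly. This sketch assumes NumPy, a scaled sample `z` and the symmetric projector $C = PP^{T}$; the function names are hypothetical, and the tolerance and iteration cap are illustrative values.

```python
import numpy as np

def reconstruct_iterative(z, C, i, tol=1e-12, max_iter=10_000):
    """Repeatedly replace z_i by its model prediction (C z)_i until it settles."""
    z = np.array(z, dtype=float)
    for _ in range(max_iter):
        z_i_new = C[i] @ z                      # i-th entry of the projection C z
        converged = abs(z_i_new - z[i]) < tol
        z[i] = z_i_new
        if converged:
            break
    return float(z[i])

def reconstruct_closed_form(z, C, i):
    """Limit of the iteration: x_i* = c_{-i}^T z / (1 - c_ii)."""
    if np.isclose(C[i, i], 1.0):
        raise ValueError("c_ii = 1: this variable cannot be reconstructed")
    c = C[:, i].copy()
    c[i] = 0.0                                  # c_{-i}: i-th column with c_ii set to 0
    return float(c @ np.asarray(z, dtype=float)) / (1.0 - C[i, i])
```

Each iteration shrinks the error by the factor $c_{ii}$, which is why convergence requires $c_{ii} < 1$.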
When the faulty sensor is reconstructed, the SPE is significantly reduced. However, in some situations reconstruction reduces the SPE for every variable, which makes the faulty sensor unidentifiable. A Sensor Validity Index (SVI) was therefore introduced (Dunia et al., 1996). Assuming that only one sensor fault occurs in the process, this index determines the status of each sensor. It is defined as:

$$\eta_i^{2} = \frac{SPE(x_i^{*})}{SPE(x)} = \frac{\lVert (I - C)\, x_i^{*} \rVert^{2}}{\lVert (I - C)\, x \rVert^{2}}$$

where $x_i^{*}$ is the vector $x$ with its $i$th entry replaced by the reconstructed value.
Clearly, $\eta_i$ lies in $[0, 1]$, because the reconstruction minimizes the SPE with respect to $x_i$, so that $SPE(x_i^{*}) \le SPE(x)$. When $\eta_i$ is close to 0, the $i$th sensor is faulty; when $\eta_i$ is close to 1, the variations of this sensor are consistent with those of the other sensors.
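The SVI of each sensor follows from the closed-form reconstruction. A minimal sketch, assuming NumPy and the projector $C = PP^{T}$; the name sensor_validity_index is hypothetical, and returning 1 when the sample has no residual at all is an assumed convention, not part of the original definition.

```python
import numpy as np

def sensor_validity_index(z, C, i):
    """SVI eta_i = sqrt(SPE(x_i*) / SPE(x)) for sensor i (single-fault assumption)."""
    z = np.asarray(z, dtype=float)
    R = np.eye(len(z)) - C                      # residual projector I - C
    spe_x = z @ R @ z
    if spe_x <= 0.0:
        return 1.0                              # assumed convention: no residual to explain
    c = C[:, i].copy()
    c[i] = 0.0
    z_rec = z.copy()
    z_rec[i] = (c @ z) / (1.0 - C[i, i])        # reconstructed value x_i*
    spe_rec = max(z_rec @ R @ z_rec, 0.0)       # guard against round-off below zero
    return float(np.sqrt(spe_rec / spe_x))
```

Under the single-fault assumption, the sensor with the smallest SVI, typically below a chosen threshold, is declared faulty.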

SIMULATED ACTIVATED SLUDGE MODEL
In this section, an Activated Sludge Process (ASP) for nitrogen removal is presented. This process is described in detail in Lopez-Arenas et al. (2004). The basic design of an ASP is shown in Fig. 1. In its basic configuration, the activated sludge process consists of two reactors and a settler. To enhance nitrogen removal, anoxic and aerobic processes operate alternately. During the aerobic phase, ammonia nitrogen is converted into nitrite by Nitrosomonas and subsequently into nitrate by Nitrobacter; in the anoxic phase, the nitrate produced is converted into harmless nitrogen gas. As illustrated in Fig. 1, the raw wastewater $Q_{in}$ passes through the anoxic zone before entering the aeration reactor; the outflow $Q_{out}$ is then fed into a settler, which separates the stream into clean water and sludge. Most of the sludge is recycled to the reactor ($Q_r$) and a small part is wasted ($Q_w$). The process model is based on the Activated Sludge Model No. 1 (ASM1), as used by Kim et al. (2011). It was adopted with two modifications (Lopez-Arenas et al., 2004):
- Nitrification is modeled as a two-step process: the conversion of ammonia to nitrite by Nitrosomonas bacteria and the conversion of nitrite to nitrate by Nitrobacter.
- The hydrolysis of rapidly biodegradable substrate is included.
The resulting biodegradation model consists of 18 state variables (particulate and soluble concentrations) and 30 model parameters. However, the model can be reduced; such a reduced model was proposed by Gomez-Quintero et al. (2000).
The reduced model is driven by six exogenous inputs representing the influent concentrations and flow rates. It requires only five reaction rates and has twelve model parameters.

RESULTS AND DISCUSSION
Simulated activated sludge process: In our case study, the process monitoring system addresses the faulty-sensor problem by estimating the measurement vector Y from the inputs $u_1 = Q_{in}$ (influent flow rate), $u_2 = Q_r$ (recycle flow rate) and $u_3 = q_{air}$, the aeration rate, which affects the process through the oxygen mass transfer coefficient $k_La$. The process monitoring algorithms, including fault detection and identification, are then applied to the process. The SPE for the data in the faulty state is given in Fig. 2 to 5, which clearly indicate that the faulty sensor is sensor 3 ($s_x = 3$), since its sensor validity index drops below the threshold $\delta$.

Case study:
The WWTP of Annaba city: In this case study, the wastewater treatment plant of Annaba city (situated in the North-East of Algeria) has two principal stages.
The primary stage: This stage includes bar racks, a grit chamber, de-oiling and sand filters; its objective is the removal of solids.
The secondary stage: Its objective is the biological treatment of the organic load, which is essential for removing the organic matter present in the incoming wastewater. The organic matter serves as food for the growing micro-organism culture. Biological treatment is classified by the type of micro-organisms used to remove the organic pollutants and is aerobic, anaerobic or both.
Interest is normally focused on secondary biological treatment; the WWTP considered in this study uses extended aeration and is well instrumented. A schematic of the plant is shown in Fig. 6. The data were collected over a period of three months in 2011. A data matrix X of N = 180 observations representing normal process operation was formed. The data in this matrix are centered and scaled using the means and standard deviations of the data reserved for model building. For the monitoring model, a measurement vector of 12 variables was selected. Figure 7 shows the SPE statistic of the data set during normal process operation with a 95% confidence level. After building the multivariate statistical monitoring model from normal data, and in order to illustrate the fault diagnosis performance, an offset fault affecting $x_7$ (oxygen at the output of the WWTP) is simulated from sample k = 45, with a magnitude of 10% of the maximum amplitude of variation of $x_7$. Figure 8 shows the SPE statistic. The SPE shown in Fig. 9 detects the fault immediately. To identify the faulty sensor, the contribution approach based on the SPE detection index was used.
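The fault scenario above can be reproduced on any test matrix with a short helper. This sketch assumes NumPy, that k = 45 is a 1-based sample index, that $x_7$ is column index 6, and that the "maximum amplitude of variation" is the range (maximum minus minimum) of the fault-free variable; inject_offset and first_alarm are hypothetical names.

```python
import numpy as np

def inject_offset(X, var, start, fraction=0.10):
    """Copy of X with an offset fault on column `var` from 1-based sample `start`."""
    Xf = np.array(X, dtype=float)               # work on a copy of the data
    amplitude = Xf[:, var].max() - Xf[:, var].min()   # assumed: range of the variable
    Xf[start - 1:, var] += fraction * amplitude
    return Xf

def first_alarm(spe_values, limit):
    """1-based index of the first sample whose SPE exceeds the limit, or None."""
    above = np.flatnonzero(np.asarray(spe_values) > limit)
    return int(above[0]) + 1 if above.size else None

# Hypothetical usage: offset on x_7 (column index 6) from sample k = 45.
# X_faulty = inject_offset(X_test, var=6, start=45)
```

The detection delay is then the difference between the value returned by first_alarm and the fault onset sample.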

CONCLUSION
In many industries, it is important to determine when significant adverse process changes occur. The idea is to detect these changes while they are still relatively minor, before substandard product or significant pollution is produced. This study discusses the use of a robust process monitoring method for wastewater data. Traditional statistical process control methods have previously been shown to be sensitive to noise. The present study investigates robust multivariate statistical process monitoring for a biological nutrient removal plant and shows that FRPCA is an efficient multivariate statistical process monitoring tool. Despite its good performance, the method can be improved in future work, especially to cover a wide range of normal operating conditions resulting from changing influent water parameters and organic load.

ACKNOWLEDGMENT
The work reported in this study was supported by the Ministry of Higher Education and Research, within the research project under grant J0201120100088 on the advanced control and monitoring of biological wastewater treatment plants.

Fig. 1: Schematic of a typical wastewater treatment plant

Fig. 2: Time evolution of the selected variables