Synergy Interval Partial Least Squares (siPLS) with Potentiometric Titration Multivariate Calibration for the Simultaneous Determination of Amino Acids in Mixtures

The simultaneous determination of amino acids in solution is important for the food and nutrient industries. This study investigated the feasibility of potentiometric titration combined with synergy interval Partial Least Squares (siPLS) for the simultaneous determination of glycine, glutamic acid and phenylalanine in aqueous solution. The method was compared with traditional Partial Least Squares (PLS) regression based on the full pH range. Performance was evaluated by the Root Mean Square Error of Cross-Validation (RMSECV), the Root Mean Square Error of Prediction (RMSEP) and the correlation coefficient (R). By optimizing the pH region with siPLS, a good linear model was produced for the calibration set, with correlation coefficients R of 0.9947, 0.9956 and 0.9913 for glycine, glutamic acid and phenylalanine, respectively. A set of synthetic mixtures was tested independently and good results were obtained. The results show that the siPLS method can locate the informative region through a graphically oriented interface that is easy to use and interpret. The study demonstrates the feasibility of potentiometric titration with chemometrics for the simultaneous determination of amino acid mixtures without preliminary separation.


INTRODUCTION
The simultaneous determination of different amino acids in solution is especially important for food and biopharmaceutical manufacturing as well as other related industries (Wakayama et al., 2010). Until now, HPLC-based methods have been the most popular approach for analyzing amino acid components. A wide range of pre- or post-column fluorescent labeling reagents, such as fluorescamine, ninhydrin, 7-fluoro-4-nitrobenz-2,1,3-oxadiazole and dansyl chloride, have been developed for the HPLC analysis of different amino acids with a fluorescence detector (Zhao et al., 2013; Redruello et al., 2013; Chen et al., 2005). In addition to HPLC, gas chromatography, capillary electrophoresis and liquid chromatography combined with mass spectrometry have also been developed for the determination of amino acids (Mudiam et al., 2012; Mohabbat and Drew, 2008; Smith, 1997). These methods are well suited to the simultaneous determination of amino acids in complex systems, especially when unknown substances are present in the mixture. However, in some cases the mixtures contain the same types of amino acids at different concentrations, such as amino acid nutrient solutions (Ohtani et al., 2006; Cerdán et al., 2013) and amino acid mixtures extracted from protein hydrolysates (Liebster et al., 1961; Moore and Stein, 1949) in the food industry. For these samples, a simple and efficient alternative to the expensive and time-consuming HPLC methods for the simultaneous determination of different amino acids would be beneficial.
Potentiometric titration is commonly used to determine the concentration of a single amino acid in solution, but it can hardly be used for the simultaneous determination of several amino acids in a mixture (Michalowski et al., 2005; Ni and Kokot, 2008). The application of chemometrics to the resolution of multi-component mixtures has become an area of interest in the food industry, because it offers the advantages of speed, fewer preliminary separation steps and no need for additional chemical reagents (Munck et al., 1998; Christensen et al., 2006). For potentiometric titration, chemometrics was first introduced by Lindberg and Kowalski (1988) for the simultaneous analysis of acids with Partial Least Squares (PLS) regression. Since then, several papers have applied PLS calibration to acid-base titration (Ni, 1998; Shamsipur et al., 2002), potentiometric precipitation titration (Ni and Wu, 1999) and complexometric titration (Zhang et al., 2005). More recently, other algorithms such as Artificial Neural Networks (ANN) and orthogonal signal correction have been applied to the potentiometric titration of different acid mixtures (Song et al., 1993; Aktaş and Yaşar, 2004; Ghorbani et al., 2006). For example, orthogonal signal correction has been applied to the simultaneous conductometric titration of mixtures of acetic acid, monochloroacetic acid and trichloroacetic acid (Ghorbani et al., 2006). However, all these multivariate calibration methods used the whole pH region to build the calibration model, and very little attention has been given to optimized variable selection before regression.
It has been demonstrated that optimal variable selection before regression can improve the accuracy and robustness of multivariate calibration models (Li et al., 2014; Du et al., 2004; Jiang et al., 2002). Many algorithms for optimal variable selection have been developed and applied, such as moving window based methods (Du et al., 2004; Jiang et al., 2002; Fang et al., 2009), genetic algorithms (Goicoechea and Olivieri, 2003), interval PLS (iPLS) and synergy interval PLS (siPLS) (Norgaard et al., 2000; Leardi and Nørgaard, 2004) and CLoVA (Hemmateenejad et al., 2013; Hemmateenejad and Karimi, 2011). Very recently, a genetic algorithm and the Competitive Adaptive Reweighted Sampling (CARS) method were used to select the effective wavelengths for the simultaneous determination of three branched-chain amino acids (leucine, isoleucine and valine) by Fourier transform near-infrared spectroscopy (Wei et al., 2014). Among these methods, iPLS and siPLS (Norgaard et al., 2000; Leardi and Nørgaard, 2004) have been shown to be efficient tools for choosing the optimal subregions in a graphical manner, which provides an overall picture of model performance in the different subintervals. Many papers have indicated that the optimal subintervals selected by iPLS or siPLS can provide more precise predictions than the traditional PLS model based on the full spectral region. For example, spectrophotometric methods combined with iPLS or siPLS have been successfully applied to the determination of the total volatile basic nitrogen content of pork (Cai et al., 2011), quality parameters of biodiesel/diesel blends (Ferrão et al., 2011), contents in vinegar (Chen et al., 2012a, b) and antioxidant activity in dark soy sauce (Ouyang et al., 2012). However, to the best of our knowledge, the application of iPLS and siPLS to the selection of optimal pH subintervals in potentiometric titration multivariate calibration has not yet been explored (Fang et al., 2009).
The aim of this study was to explore the potential of the siPLS algorithm for selecting optimal subintervals in potentiometric titration multivariate calibration and its use for the simultaneous determination of amino acid mixtures in aqueous solution. Ternary mixtures of glycine, glutamic acid and phenylalanine in aqueous solution were used as a model system. Performance was evaluated according to the Root Mean Square Error of Cross-Validation (RMSECV), the Root Mean Square Error of Prediction (RMSEP) and the correlation coefficient (R).

Materials:
All amino acids were analytical-reagent grade chemicals. Stock solutions of hydrochloric acid (HCl, 0.1 M), L-glycine (0.05 M), L-glutamic acid (0.05 M) and L-phenylalanine (0.05 M) were prepared by standard methods, with ultra-pure water used throughout. The resistivity of the ultra-pure water was ≥18.2 MΩ·cm. Since no standardization procedure was necessary for potentiometric titration multivariate calibration, the prepared 0.1 M NaOH solution was used for both calibration and prediction. Solutions of sodium chloride (NaCl, 1.0 M) and hydrochloric acid (HCl, 0.1 M) were used to adjust the ionic strength and the initial pH of the sample solutions, respectively.

Equipment and apparatus:
The titrations were conducted in batch mode with a magnetic stirrer, a syringe pump and glass vessels, all of which were standard equipment. A syringe pump (Harvard Apparatus) was used for the precise addition of the titrant. Measurements of pH (± 0.001) were carried out with a Metrohm pH meter and a combined glass electrode. All experiments were performed at room temperature, about 22°C. All calculations were performed on a PC running the Windows operating system with Excel and Matlab. The iToolbox (Norgaard et al., 2000; Leardi and Nørgaard, 2004) for Matlab was used for variable selection and for building the iPLS and siPLS multivariate models.
Procedure: In a typical procedure, suitable amounts of the three amino acid stock solutions were placed in a glass vessel. Then 1.0 mL of 1.0 M sodium chloride (NaCl) solution and 0.4 mL of 0.1 M hydrochloric acid (HCl) solution were added to the vessel to adjust the ionic strength and the initial pH of the solution, respectively. Since the same amount of HCl was added to all samples in the calibration and prediction sets, its effect on the calibration was negligible. The solution was finally diluted to 5.0 mL with ultra-pure water. The mixture was stirred and then titrated by the precise addition of sodium hydroxide at a flow rate between 0.15 and 0.35 mL/min. The pH meter was used to monitor the solution pH during the titration, and the titrant volumes added to reach the predetermined pH values were recorded. Finally, the experimental data were processed and two matrices were obtained: the volume of titrant at each pH point (0.1 pH interval from 3.0 to 12.0) formed the first matrix V, and the concentrations of the three amino acids formed the second matrix C.
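The construction of one row of the matrix V can be sketched as follows. This is a minimal illustration, not the authors' processing script: the function names, the raw readings and the concentration values are hypothetical, and linear interpolation onto the pH grid is an assumption for the case where the recorded readings do not fall exactly on the predetermined pH values.

```python
def ph_grid(start=3.0, stop=12.0, step=0.1):
    """Predetermined pH points: 3.0, 3.1, ..., 12.0 (91 values)."""
    n_points = round((stop - start) / step) + 1
    return [round(start + i * step, 1) for i in range(n_points)]


def volume_at_ph(ph_readings, volumes, target_ph):
    """Linearly interpolate the titrant volume at target_ph.

    Assumes ph_readings increases monotonically during the titration.
    """
    pairs = list(zip(ph_readings, volumes))
    for (p0, v0), (p1, v1) in zip(pairs, pairs[1:]):
        if p0 <= target_ph <= p1:
            if p1 == p0:
                return v0
            return v0 + (v1 - v0) * (target_ph - p0) / (p1 - p0)
    raise ValueError(f"pH {target_ph} lies outside the recorded range")


def titration_row(ph_readings, volumes):
    """One row of V: titrant volume at every predetermined pH point."""
    return [volume_at_ph(ph_readings, volumes, p) for p in ph_grid()]


# Hypothetical raw readings for one sample (pH, cumulative NaOH volume in mL).
raw_ph = [2.9, 4.0, 6.5, 9.0, 10.5, 12.1]
raw_ml = [0.00, 0.20, 0.35, 0.40, 0.70, 0.95]

V = [titration_row(raw_ph, raw_ml)]  # one row per sample, 91 columns
C = [[2.0, 1.0, 3.0]]                # hypothetical Gly, Glu, Phe concentrations
```

Each additional sample would contribute one more row to V and the matching row of reference concentrations to C.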

Statistical analysis:
To build the chemometric models, common statistical parameters were chosen to evaluate the ability of each model to predict glycine, glutamic acid and phenylalanine in aqueous mixtures. The first is the root mean square error, computed as the Root Mean Square Error of Cross-Validation (RMSECV) for the calibration set and as the Root Mean Square Error of Prediction (RMSEP) for the prediction set, as shown in Eq. (1). It expresses the average error in the analysis of each amino acid in the calibration or prediction set:

$$\mathrm{RMSECV\ (or\ RMSEP)} = \sqrt{\frac{\sum_{i=1}^{n} \left(c_{i,\mathrm{pred}} - c_i\right)^2}{n}} \qquad (1)$$

Another commonly used parameter is the relative error of prediction (REP), which expresses the prediction error relative to the reference concentrations and indicates the predictive ability of the established chemometric model for each component. Here, $c_i$ is the reference concentration and $c_{i,\mathrm{pred}}$ is the concentration of the analyte in sample $i$ predicted by the chemometric model, and $n$ is the number of samples used. The correlation coefficient R was calculated as:

$$R = \sqrt{1 - \frac{\sum_{i=1}^{n} \left(c_{i,\mathrm{pred}} - c_i\right)^2}{\sum_{i=1}^{n} \left(c_i - c_{i,\mathrm{ave}}\right)^2}}$$

where $c_{i,\mathrm{ave}}$ is the mean of the reference results for all samples in the calibration or prediction set.
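The statistical parameters described above can be computed with a few lines of code. The sketch below assumes the standard definitions: the root mean square error over n samples, a relative error written in the RSEP form (squared errors divided by the squared reference concentrations, which is an assumption since more than one convention exists), and R derived from the ratio of residual to total sums of squares. All function names and the example numbers are hypothetical.

```python
import math


def rmse(reference, predicted):
    """Root mean square error (RMSECV or RMSEP, depending on the data set)."""
    n = len(reference)
    return math.sqrt(sum((p - c) ** 2 for c, p in zip(reference, predicted)) / n)


def relative_error_percent(reference, predicted):
    """Relative error in percent, RSEP convention (an assumption)."""
    ss_res = sum((p - c) ** 2 for c, p in zip(reference, predicted))
    ss_ref = sum(c ** 2 for c in reference)
    return 100.0 * math.sqrt(ss_res / ss_ref)


def correlation_coefficient(reference, predicted):
    """R computed from residual and total sums of squares."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((p - c) ** 2 for c, p in zip(reference, predicted))
    ss_tot = sum((c - mean_ref) ** 2 for c in reference)
    # Clamp at zero so a very poor model does not raise a math domain error.
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))


# Hypothetical reference and predicted concentrations for one analyte.
c_ref = [1.0, 2.0, 3.0, 4.0]
c_pred = [1.1, 1.9, 3.2, 3.8]

error = rmse(c_ref, c_pred)
rel_error = relative_error_percent(c_ref, c_pred)
r_value = correlation_coefficient(c_ref, c_pred)
```

The same functions apply to the calibration set (with cross-validated predictions) and to the independent prediction set.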

RESULTS AND DISCUSSION
In traditional acid-base titration, the endpoints are usually determined with visual indicators. However, for acid mixtures in which the ∆pK between any two acids is less than 4 logarithmic units, the titration curves overlap, which makes the endpoints more difficult to determine. The pK values of the amino acids used in this study are listed in Table 1. Table 1 shows that the pK values of the three species are very close, differing by far less than 4 units. In the traditional titration method, the titration curves of these amino acids therefore overlap and affect each other. Under these conditions, multivariate calibration methods can be applied to the pH titration for the simultaneous determination of amino acids in mixtures, with the advantage of eliminating preliminary separation steps.

Experimental design of the calibration sets:
The multivariate calibration process requires a training data set (Ni, 1998). All the concentrations of the ternary mixtures in the calibration set are summarized in Table 2.
Many studies have pointed out that the raw spectra must be preprocessed in order to develop stable and reliable calibration models (Ni, 1998; Shamsipur et al., 2002; Ni and Wu, 1999). In this study, the simple smoother developed by Savitzky and Golay (1964) was applied. Savitzky-Golay (SG) smoothing strikes a balance in which the noise is removed as far as possible while the signal features are kept intact as far as possible (Savitzky and Golay, 1964). The SG algorithm has two parameters, the polynomial order N and the window size W. Here, the Root Mean Square Error (RMSE) of the calibration set obtained by full-spectrum PLS was compared under different polynomial orders N and window sizes W and used as the selection criterion. An optimized combination of N = 3 and W = 5 was finally chosen as a good compromise in practice. The titration curves of the calibration mixtures after SG smoothing are presented in Fig. 1. All the titration curves show a large jump in the pH range 5-8, which corresponds to the acid groups of the amino acids with pK between 2 and 4. It can also be seen that the titration volume V on the y-axis and the pH on the x-axis play the roles of absorbance and wavelength in traditional spectrophotometry, respectively. From this point of view, the titration curve can be regarded as a titration spectrum, and any chemometric algorithm used for spectra can be applied to titration multivariate calibration.
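For the chosen settings (N = 3, W = 5), SG smoothing reduces to a fixed weighted average over five neighbouring points, with the well-known convolution weights (-3, 12, 17, 12, -3)/35. The sketch below applies these weights to one titration curve. The function name is hypothetical, and leaving the two points at each end unsmoothed is an assumption about edge handling that the text does not specify.

```python
# Savitzky-Golay weights for a 5-point window and a cubic polynomial.
SG_5_CUBIC = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def savgol_5pt(values):
    """Smooth a curve with the 5-point cubic Savitzky-Golay filter.

    The first and last two points are copied unchanged (edge handling
    is an assumption of this sketch).
    """
    half = 2
    smoothed = list(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        smoothed[i] = sum(w * v for w, v in zip(SG_5_CUBIC, window))
    return smoothed


# A cubic trend passes through the filter unchanged, while an isolated
# spike is damped.
cubic = [float(x ** 3) for x in range(7)]
cubic_smoothed = savgol_5pt(cubic)
spike_smoothed = savgol_5pt([0.0, 0.0, 1.0, 0.0, 0.0])
```

Because a cubic is reproduced exactly, smooth features of the titration curve are preserved while point-to-point noise is attenuated, which is the balance described above.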
siPLS model for selecting optimal regions: To build the iPLS and siPLS models, the full pH range between 3.0 and 12.0 was first divided into 7 equidistant subintervals, with 13 pH points in each interval. For every subinterval, PLS models with different numbers of latent variables were established. The RMSECV of every model was calculated and compared with that of the whole pH region model. Figure 2 shows the RMSECV obtained by iPLS for each subinterval; the number of latent variables of each model is given by the number in the corresponding bar. The RMSECV for the full pH region using 6 latent variables is shown by the dotted line for comparison. The iPLS model for interval number 6, at pH between 9.5 and 10.8, gives the smallest RMSECV value of 0.6085. This result is better than the value of 0.7936 obtained with the full pH region PLS model, as shown in Table 3.
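The interval search described above can be sketched in plain Python. This is a minimal illustration, not the iToolbox implementation: it uses a single-response NIPALS PLS with mean centering only, leave-one-out cross-validation (the cross-validation scheme is an assumption), and synthetic data in place of the titration matrix. All function names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_lv):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_lv):
        w = [dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]  # scores
        tt = dot(t, t)
        p = [dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]
        q = dot(f, t) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    x_mean, y_mean, components = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [ei - t * pj for ei, pj in zip(e, p)]
    return y_hat


def rmsecv(X, y, n_lv):
    """Leave-one-out RMSECV (the cross-validation scheme is an assumption)."""
    sq_err = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_lv)
        sq_err += (pls1_predict(model, X[i]) - y[i]) ** 2
    return math.sqrt(sq_err / len(X))


def split_intervals(n_vars, n_intervals):
    """Column indices of equal-width intervals (assumes exact divisibility)."""
    size = n_vars // n_intervals
    return [list(range(k * size, (k + 1) * size)) for k in range(n_intervals)]


def ipls(X, y, n_intervals, max_lv):
    """Best (RMSECV, n_lv) pair of a local PLS model on each interval."""
    results = []
    for cols in split_intervals(len(X[0]), n_intervals):
        X_sub = [[row[j] for j in cols] for row in X]
        results.append(min((rmsecv(X_sub, y, a), a)
                           for a in range(1, min(max_lv, len(cols)) + 1)))
    return results


# Synthetic check: the response depends only on the second interval.
rng = random.Random(0)
X_demo = [[rng.random() for _ in range(6)] for _ in range(10)]
y_demo = [row[2] + 2.0 * row[3] for row in X_demo]
demo = ipls(X_demo, y_demo, n_intervals=3, max_lv=2)
best_interval = min(range(len(demo)), key=lambda k: demo[k][0]) + 1
```

With the 91 pH points of the titration matrix, `split_intervals(91, 7)` yields seven blocks of 13 columns, and `ipls` would be run once per amino acid with the corresponding column of C as the response.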
The iPLS method can identify the optimal single subinterval; however, combinations of subintervals may result in models with better predictive ability. Therefore, based on the results obtained above, variable selection by siPLS was implemented to test different combinations of intervals. The principle of the siPLS method is to evaluate all combinations of two, three or four of the subintervals obtained by the iPLS model. The results of the different combinations obtained by PLS, iPLS and siPLS are shown in Table 3. As can be seen in the table, many combinations of subintervals give better results than the full region and subinterval 6. The combination of subintervals 2, 6 and 7 with 6 latent variables gives the lowest RMSECV, 0.4740, which is far better than that of subinterval 6 alone. The results demonstrate that the siPLS algorithm can avoid the loss of relevant information regions, which improves the performance of the calibration model.
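The exhaustive search over interval combinations can be sketched as follows. The scoring function is deliberately left as a caller-supplied argument: in practice it would return the RMSECV of a PLS model built on the pooled variables of the chosen intervals, whereas the `toy_score` used here is only an illustrative stand-in that happens to favour intervals 2, 6 and 7. All names are hypothetical.

```python
from itertools import combinations


def sipls_search(n_intervals, score, max_size=4):
    """Score every combination of 2..max_size intervals, lowest score first.

    `score` is a caller-supplied function (hypothetical here) that returns
    the RMSECV of a PLS model built on the given interval numbers.
    """
    results = []
    for size in range(2, max_size + 1):
        for combo in combinations(range(1, n_intervals + 1), size):
            results.append((score(combo), combo))
    results.sort()
    return results


def toy_score(combo):
    """Stand-in for RMSECV: counts mismatches against intervals 2, 6 and 7."""
    return len(set(combo) ^ {2, 6, 7})


ranked = sipls_search(7, toy_score)
n_models = len(ranked)  # 21 pairs + 35 triples + 35 quadruples
best_score, best_combo = ranked[0]
```

For 7 subintervals the search covers 91 candidate models, which is small enough to evaluate exhaustively; the number grows quickly if the pH range is split more finely.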
For a full view of the model comparison, a graphical comparison of the PLS, iPLS and siPLS models for glycine is shown in Fig. 3. The results in Table 3 and Fig. 3 clearly show that selecting the optimal pH region in potentiometric titration multivariate calibration improves model performance. The correlation coefficient R improves from 0.9844 for the PLS model to 0.9947 for the siPLS model. As shown in Fig. 3, the combined subintervals 2, 6 and 7, covering the pH regions 4.3-5.6 and 9.5-12, were finally selected to construct the calibration model for glycine. It is interesting to note that this pH range lies in the information regions of glycine, which has a pK1 of 2.43 and a pK2 of 9.60. This result further confirms that siPLS can automatically locate the informative region for a given substance in multicomponent potentiometric titration. The graphically oriented interface of the iPLS and siPLS toolbox, as shown in Fig. 3, also makes the results easier to use and interpret.
Fig. 3: The graphic and statistic results of PLS, iPLS and siPLS for glycine
Table 3 (partial): siPLS entries with 3 latent variables include the combinations covering pH 3-4.3, 5.6-6.9 and 9.5-10.8 and pH 3-4.3, 5.6-6.9 and 8.2-9.5 (intervals 1, 3, 5); the R and RMSECV values listed for these entries are 0.9933 and 0.5213, and 0.9932 and 0.5265.
Table 4 shows the results of the PLS, iPLS and siPLS models for glutamic acid. The full pH region PLS calibration model gives an RMSECV of 0.5866 with 3 latent variables. The selected subintervals and their combinations all give better results than the full pH region PLS model. However, combinations of two or three subintervals do not give better results than the single subinterval 2.
As shown in Table 4, subinterval 2, with a pH range of 4.3-5.6, gives the best result, with a correlation coefficient R of 0.9956 and an RMSECV of 0.4189. This interval was therefore selected to build the final iPLS model for glutamic acid. The result for glutamic acid is interesting, since the addition of other subintervals does not improve model performance as it does for glycine. Unlike glycine and phenylalanine, the glutamic acid molecule contains an additional side-chain carboxyl group (γ-COOH) with a pK of 4.1. The optimal pH region of 4.3-5.6 obtained by iPLS may correspond to the information area of this acid group. The distinct pK of the γ-COOH group is sufficient to characterize glutamic acid in the known mixture. Adding the information areas of the α-COOH and α-NH3+ groups, whose pK values are very close to those of glycine and phenylalanine, would in turn introduce more interference and perturbation. This result demonstrates the need for pH (or wavelength) selection in potentiometric titration multivariate calibration in order to build a robust model (Fig. 4).
siPLS was also applied to the quantification of phenylalanine. In the same way, the whole pH range was divided into 7 equidistant subintervals. The correlation coefficient R and the RMSECV were calculated and are shown in Table 5. The best iPLS model, which uses subinterval 5 with a pH region of 8.2-9.5, does not produce better results than the full pH region PLS model. On the other hand, combinations of two or three subintervals can give lower RMSECV values than the full pH region PLS model. These combinations always contain subinterval 6 or 7, with pH regions of 9.5-10.8 and 10.8-12, respectively. The α-NH3+ group of phenylalanine has the lowest pK, 9.15, among the corresponding groups of the three amino acids, so the information area for phenylalanine may be located in the high pH regions. The combination of subintervals 5, 6 and 7 gives an RMSECV of 0.6178 with only 4 latent variables (LVs). Although this RMSECV is slightly larger than the value of 0.5809 obtained with the combination of subintervals 3, 6 and 7, the model built on subintervals 5, 6 and 7 uses fewer LVs. Overfitting is known to occur when the number of LVs is large. As pointed out by Norgaard et al. (2000), overfitting is not specific to the siPLS toolbox but applies to all variable selection methods. Therefore, for variable selection in potentiometric titration or spectral multivariate calibration, careful selection guided by knowledge of the molecular information helps to build a robust model. Here, the siPLS model using pH subintervals 5, 6 and 7 was developed and used for the quantification of phenylalanine.
Table 7 summarizes the statistical parameters, including RMSEP, RSEP% and R, obtained for each amino acid in the prediction set. All the chemometric methods based on factor analysis achieve good results in resolving the overlapping potentiometric titration curves of glycine, glutamic acid and phenylalanine in their ternary mixtures, although the solution equilibria in the acid-base titration procedure are complex. The results also show that using the variables (pH points) selected by the iPLS and siPLS methods gives better predictions, with lower errors, than the full pH region PLS model. The results clearly show the successful application of iPLS and siPLS variable selection to potentiometric titration multivariate calibration as a pre-processing step before traditional PLS regression.
The purpose of this study was to determine the feasibility of applying the siPLS method to potentiometric titration multivariate calibration. The graphically oriented interface of the siPLS method makes the results obtained in potentiometric titration multivariate calibration easier to use and interpret. The successful application of the iPLS and siPLS variable selection methods to potentiometric titration indicates that the approach may have general implications for other types of titration calibration, such as potentiometric precipitation titration (Ni and Peng, 1995) and complexometric titration (Ni and Wu, 1997). Finally, it must be pointed out that, as with any calibration method (Assefa et al., 2013; Xu et al., 2012), large errors may result when the mixtures contain substantial noise or interfering substances that are not included in the calibration set.

CONCLUSION
This study investigated the applicability of the iPLS and siPLS algorithms for variable selection in potentiometric titration multivariate calibration. Potentiometric titration multivariate calibration combined with variable selection by iPLS and siPLS was tested on the simultaneous determination of glycine, glutamic acid and phenylalanine in aqueous solution. The results show that using the variables selected by the iPLS and siPLS methods gives better predictions, with lower errors, than the full pH region PLS model. The toolbox can automatically locate the informative region for a given substance through a graphically oriented interface, which makes the results easy to use and interpret. The study demonstrates the feasibility of potentiometric titration with chemometric methods for the simultaneous determination of amino acids in mixtures without preliminary separation steps, and also demonstrates the utility of siPLS for optimal variable selection in titration multivariate calibration.

Fig. 4: Regions selected to build models with subinterval 2 at pH 4.3-5.6 and results for glutamic acid

Table 3: Results of PLS, iPLS and siPLS models for glycine determination

Table 5: Results of PLS, iPLS and siPLS models for phenylalanine determination

Table 6: Added and found results of the synthetic mixture of glycine (Gly), glutamic acid (Glu) and phenylalanine (Phe) by the PLS and siPLS methods

Table 7: Statistical parameters of the optimal results by using PLS and siPLS to determine glycine (Gly), glutamic acid (Glu) and phenylalanine (Phe) in synthetic mixtures