Traffic Accidents Forecasting Based on Neural Network and Principal Component Analysis

Abstract: A number of factors may affect the occurrence of road traffic accidents, and the information these factors carry may overlap, which sometimes obscures the real traffic characteristics and their inherent laws. To improve the accuracy of traffic accident forecasting models, this study proposes a new traffic accident forecasting method based on a neural network and principal component analysis. The results show that the model based on a neural network and principal component analysis is more accurate than the other models compared.


INTRODUCTION
At present, road traffic accident forecasting methods mainly include the gray forecasting method, the time series method, regression analysis and the BP neural network (Guo-Hong, 2006; Dong-Ping, 2007; Xiang-Yong, 2004). Domestic scholars have done a great deal of work on road traffic accident forecasting. Typical examples are the forecasting model for fatalities established by the Beijing Transportation Research Institute, the forecasting model for traffic accident fatalities in Tianjin established by the Research Institute of Tianjin city, the time series decomposition prediction method of Jilin University, the traffic accident time series models of Beijing University of Technology and the neural network based traffic accident forecasting of Shandong University of Technology (Sayed, 2000; Ren-De et al., 2008; Xiang-Yong, 2003).
Road traffic accidents are caused by various factors, and the information these factors carry may overlap, which sometimes obscures the real characteristics and inherent laws of traffic accidents. This study therefore brings principal component analysis (PCA) into road traffic accident forecasting to eliminate part of the overlapping information and combines it with the BP neural network to forecast road traffic accidents (the BP neural network based on PCA). The predicted results are compared with those of a BP neural network to which principal component analysis was not applied. The conclusion is that the BP neural network based on PCA achieves significantly better prediction precision than the plain BP neural network.

LITERATURE REVIEW
In empirical research on road traffic accidents, we tend to consider multiple indicators that influence traffic accidents in order to reflect their characteristics and laws of development more fully and accurately; in multivariate statistics these indicators are known as variables. This produces the following problem. On the one hand, as many indicators as possible are considered in order to avoid missing important information. On the other hand, each additional indicator increases the complexity of the accident research and, because all the indicators reflect the same traffic accidents, a large amount of overlapping information is inevitable; this overlap sometimes obscures the real characteristics and inherent laws of traffic accidents. We therefore hope to involve fewer variables in traffic accident research while retaining more information. Principal component analysis is a multivariate statistical method that studies how to explain most of the information in the original variables through a few linear combinations of them.
The many variables involved in accidents are correlated, so there must be common factors that play a dominant role. Based on this, principal component analysis studies the internal structure of the correlation matrix or covariance matrix of the original variables and uses linear combinations of the original variables to form several composite indicators (principal components). This reduces the dimension and simplifies the problem while keeping the main information of the original variables, which makes it easier to grasp the principal contradiction in the study of a traffic accident problem. Generally speaking, the principal components obtained by principal component analysis and the original variables have the following basic relationships:
• The number of principal components is far less than the number of the original variables
• The principal components retain the vast majority of the information in the original variables
• Each principal component is a linear combination of the original variables
• The principal components are uncorrelated with each other
Through principal component analysis of the factors influencing road traffic accidents, we can find a few principal components in the complicated relationships between the variables. This allows the large amount of statistical data to be analyzed quantitatively and effectively, reveals the inner relationships between the variables, gives deeper insight into the characteristics and laws of development of traffic accidents and carries the research work further.

PRINCIPAL COMPONENT ANALYSIS OF ROAD TRAFFIC ACCIDENTS
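The derivation of sample principal components: In the study of traffic accidents, the population covariance matrix Σ and correlation matrix R are usually unknown, so they must be estimated from sample data. Suppose there are m samples and each sample has n indicators, giving mn data in total. The original data matrix, the sample covariance matrix and the sample correlation matrix are as follows:

$$X = (x_{ij})_{m \times n} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$

$$S = (s_{ij})_{n \times n}, \qquad s_{ij} = \frac{1}{m-1} \sum_{k=1}^{m} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)$$

$$R = (r_{ij})_{n \times n}, \qquad r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}\, s_{jj}}}$$

In which: $\bar{x}_i = \frac{1}{m}\sum_{k=1}^{m} x_{ki}$ is the sample mean of indicator i; S is the sample covariance matrix, an unbiased estimate of the population covariance matrix Σ; R is the sample correlation matrix, the estimate of the population correlation matrix.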
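As a minimal sketch of these estimates, the following Python code computes S and R from an m × n data matrix with NumPy. The function names and the small data array are hypothetical and are not the city A data. The last lines also check that standardizing the columns first makes the covariance matrix coincide with R.

```python
import numpy as np

def covariance_and_correlation(X):
    """Return the sample covariance matrix S and correlation matrix R.

    X is an (m, n) array: m samples in rows, n indicators in columns.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    centered = X - X.mean(axis=0)            # subtract each column mean
    S = centered.T @ centered / (m - 1)      # unbiased estimate (divisor m - 1)
    std = np.sqrt(np.diag(S))                # sample standard deviations
    R = S / np.outer(std, std)               # r_ij = s_ij / sqrt(s_ii * s_jj)
    return S, R

def standardize(X):
    """Scale every column to zero mean and unit sample variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Hypothetical data, for illustration only: 5 samples, 3 indicators.
X_demo = np.array([[1.0, 10.0, 3.0],
                   [2.0, 14.0, 1.0],
                   [3.0, 19.0, 4.0],
                   [4.0, 23.0, 2.0],
                   [5.0, 31.0, 6.0]])

S_demo, R_demo = covariance_and_correlation(X_demo)
S_std, _ = covariance_and_correlation(standardize(X_demo))
print(np.allclose(S_std, R_demo))  # covariance of standardized data equals R
```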
As the foregoing discussion shows, if the original data matrix X is standardized, the covariance matrix obtained from X is the correlation matrix, that is, S and R are exactly the same. Because solving for the principal components from the covariance matrix follows the same process as solving from the correlation matrix, below we only introduce the solution based on the correlation matrix R. Let u = (u_{ij}) be the n × n orthogonal matrix of coefficients that maps the original variables to the principal components Y. The covariance matrix of the principal components Y is:

$$\mathrm{Cov}(Y) = u R u^{T} = \Lambda$$

where Λ is a diagonal matrix:

$$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$$

Assume that the data matrix X has already been standardized, so that the correlation matrix takes the place of the covariance matrix and the relation above can be expressed as $u R u^{T} = \Lambda$. Left-multiplying by $u^{T}$ and using $u^{T} u = I$ gives $R u^{T} = u^{T} \Lambda$. Expanding this matrix equation gives n^2 equations; here we only consider the n equations derived from the first column of the matrix product:

$$\begin{cases} r_{11}u_{11} + r_{12}u_{12} + \cdots + r_{1n}u_{1n} = \lambda_1 u_{11} \\ r_{21}u_{11} + r_{22}u_{12} + \cdots + r_{2n}u_{1n} = \lambda_1 u_{12} \\ \qquad \vdots \\ r_{n1}u_{11} + r_{n2}u_{12} + \cdots + r_{nn}u_{1n} = \lambda_1 u_{1n} \end{cases}$$

Rearranging gives:

$$\begin{cases} (r_{11} - \lambda_1)u_{11} + r_{12}u_{12} + \cdots + r_{1n}u_{1n} = 0 \\ r_{21}u_{11} + (r_{22} - \lambda_1)u_{12} + \cdots + r_{2n}u_{1n} = 0 \\ \qquad \vdots \\ r_{n1}u_{11} + r_{n2}u_{12} + \cdots + (r_{nn} - \lambda_1)u_{1n} = 0 \end{cases}$$

For this homogeneous system to have a non-zero solution, the theory of linear equations requires the determinant of the coefficient matrix to be 0, that is:

$$|R - \lambda_1 I| = 0$$

Completely similar equations are obtained for λ_2, …, λ_n. Thus the variances λ_i (i = 1, 2, …, n) of the new variables (principal components) we seek are the n roots of |R - λI| = 0; each λ_i is an eigenvalue of the correlation matrix and the corresponding u_{ij} are the components of its eigenvector.
R is a positive semi-definite matrix, so its eigenvalues are non-negative real numbers. Arrange them in descending order λ_1 ≥ λ_2 ≥ … ≥ λ_n ≥ 0 and denote the corresponding unit eigenvectors by γ_1, γ_2, …, γ_n. The variance of $Y_i = \gamma_i^{T} X$ is:

$$\mathrm{var}(Y_i) = \gamma_i^{T} R\, \gamma_i = \lambda_i, \qquad i = 1, 2, \ldots, n$$

That is, Y_1 has the maximum variance, Y_2 the second largest variance, …, and the covariances are:

$$\mathrm{cov}(Y_i, Y_j) = \gamma_i^{T} R\, \gamma_j = 0, \qquad i \neq j$$

Y_1, Y_2, …, Y_n are referred to respectively as the first, second, …, n-th principal component. The process of seeking the principal components shows that the direction of each principal component is geometrically the direction of an eigenvector of R, and that the variance contribution of each principal component is the corresponding eigenvalue of R. In this way, solving for the principal components of the sample data is converted into finding the eigenvalues and eigenvectors of the correlation matrix or the covariance matrix (Xue, 2004; Shu and Jian-She, 2008; Jian-Xi and De-Yan, 2011).
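The eigenvalue route described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the SPSS procedure used in this study: the function name `principal_components` and the random demonstration data are hypothetical, and NumPy's `np.linalg.eigh` is chosen because R is symmetric.

```python
import numpy as np

def principal_components(X):
    """Principal components of an (m, n) data matrix via its correlation matrix.

    Returns the eigenvalues in descending order, the matching eigenvectors
    (column i is gamma_i) and the component scores of the m samples.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized data
    R = np.corrcoef(X, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                  # re-sort: largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                               # Y_i = gamma_i^T x per sample
    return eigvals, eigvecs, scores

# Hypothetical data: 10 samples, 4 indicators, two of them strongly correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(10, 2))
X_demo = np.column_stack([base[:, 0],
                          base[:, 0] + 0.1 * rng.normal(size=10),
                          base[:, 1],
                          rng.normal(size=10)])

eigvals, eigvecs, scores = principal_components(X_demo)
score_cov = np.cov(scores, rowvar=False)
# The scores are uncorrelated and their variances are the eigenvalues of R.
print(np.allclose(score_cov, np.diag(eigvals)))
```

Because the data are standardized, the eigenvalues sum to the number of indicators, so each eigenvalue divided by n is that component's share of the total variance.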

Index selection of principal component analysis and principal component determination:
The index statistics of city A from 1995 to 2004 (Table 1) are used to predict road traffic accidents. Thirteen factors that affect traffic accidents are selected, namely: GDP (X1), per capita income (X2), total retail sales of social consumer goods (X3), resident population (X4), vehicle ownership (X5), uninspected vehicles (X6), total passenger traffic (X7), total freight (X8), light controlled intersections (X9), number of traffic police (X10), urban road length (X11), road area (X12) and number of drivers (X13).
The thirteen indicators influencing road traffic accidents are subjected to principal component analysis using SPSS 13.0, and the results are shown in Tables 2-5. From Table 2 we can see that there are extremely significant correlations among GDP, per capita income, total retail sales of social consumer goods, resident population, vehicle ownership, uninspected vehicles, total freight, light controlled intersections and the number of drivers.
The first m principal components are retained, and the extraction principle is that the corresponding eigenvalue is greater than 1. An eigenvalue can to some extent be regarded as an index of the strength of a principal component's influence. If an eigenvalue is less than 1, the principal component explains less than a single original variable does on average, so eigenvalues greater than 1 are used as the inclusion criterion. According to Table 3, two principal components are extracted, m = 2. From Table 5 we can see that GDP, per capita income, total retail sales of social consumer goods, resident population, vehicle ownership, uninspected vehicles, total freight, light controlled intersections and the number of drivers have high loadings on the first principal component, so the first principal component basically reflects the information of these indexes. Urban road length and road area have high loadings on the second principal component, so the second principal component basically reflects the information of these two indexes. We therefore adopt two new variables in place of the original thirteen variables.
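The eigenvalue-greater-than-1 rule and the loading matrix can be illustrated with a short Python sketch. The correlation matrix below is invented so that exactly two eigenvalues exceed 1; it does not reproduce Tables 3 or 5, and the function name `select_components` is hypothetical. Loadings are taken here as eigenvector components scaled by the square root of the eigenvalue, which is the usual definition for a correlation-matrix PCA.

```python
import numpy as np

def select_components(eigvals, eigvecs, threshold=1.0):
    """Keep the components whose eigenvalue exceeds the threshold.

    Returns the number kept (m), their shares of total variance and
    the loading matrix (one column per kept component).
    """
    eigvals = np.asarray(eigvals, dtype=float)
    eigvecs = np.asarray(eigvecs, dtype=float)
    keep = eigvals > threshold
    explained = eigvals / eigvals.sum()                # variance contribution rate
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return int(keep.sum()), explained[keep], loadings

# Made-up correlation matrix with two blocks of correlated indicators.
R_demo = np.array([[1.0, 0.9, 0.0, 0.0],
                   [0.9, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.8],
                   [0.0, 0.0, 0.8, 1.0]])

vals, vecs = np.linalg.eigh(R_demo)        # ascending eigenvalues
order = np.argsort(vals)[::-1]             # sort to descending order
vals, vecs = vals[order], vecs[:, order]

m, explained, loadings = select_components(vals, vecs)
print(m)                 # number of components with eigenvalue > 1
print(explained.sum())   # cumulative variance share of the kept components
```

In this invented case the eigenvalues are 1.9, 1.8, 0.2 and 0.1, so two components are kept and together they carry 92.5% of the total variance.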

THE FORECAST OF ROAD TRAFFIC ACCIDENT BASED ON NEURAL NETWORK
The multilayer perceptron trained with the BP (back-propagation) algorithm is the most widely used neural network so far. A multilayer perceptron includes an input layer, a hidden layer and an output layer (Jian-Xi and De-Yan, 2011; Li-Qun, 2007; Hecht-Nielsen, 1989; Xiu, 2007).
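For readers who want to see the three-layer structure concretely, here is a minimal NumPy sketch of a multilayer perceptron trained by back-propagation. Everything specific in it is an assumption made for illustration: the sigmoid activations, the four hidden units, the learning rate, the mean squared error loss and the random toy data are not taken from this study, whose network settings are not stated in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BPNetwork:
    """Input layer -> one hidden layer -> output layer, sigmoid units."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)        # hidden activations
        self.y = sigmoid(self.h @ self.W2 + self.b2)   # network output
        return self.y

    def train_step(self, X, T, lr=0.5):
        """One batch gradient-descent step; returns the loss before the update."""
        Y = self.forward(X)
        err = Y - T
        # Back-propagate the error through the sigmoid derivatives.
        delta2 = err * Y * (1.0 - Y)
        delta1 = (delta2 @ self.W2.T) * self.h * (1.0 - self.h)
        m = X.shape[0]
        self.W2 -= lr * self.h.T @ delta2 / m
        self.b2 -= lr * delta2.mean(axis=0)
        self.W1 -= lr * X.T @ delta1 / m
        self.b1 -= lr * delta1.mean(axis=0)
        return float(np.mean(err ** 2))

# Toy data (hypothetical): 8 samples, 2 inputs, targets already inside (0, 1).
rng = np.random.default_rng(1)
X_demo = rng.uniform(0.0, 1.0, size=(8, 2))
T_demo = 0.2 + 0.6 * X_demo.mean(axis=1, keepdims=True)

net = BPNetwork(n_in=2, n_hidden=4, n_out=1)
losses = [net.train_step(X_demo, T_demo) for _ in range(2000)]
print(losses[0], losses[-1])   # training error at the first and last step
```

A sigmoid output unit only produces values between 0 and 1, which is one reason the targets of such a network are usually rescaled to that range before training.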
The forecast of road traffic accidents based on BP neural network: The input samples are each year's gross national product (X1), per capita income (X2), total retail sales of social consumer goods (X3), permanent population (X4), motor vehicle ownership (X5), uninspected vehicles (X6), passenger traffic (X7), freight traffic (X8), light controlled intersections (X9), number of traffic police (X10), length of urban roads (X11), road area (X12) and number of drivers (X13).
• The accident frequency is the output sample. The sample data are normalized with the formula y = (x - x_min)/(x_max - x_min); the normalized sample data are given in Table 2.
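The normalization formula y = (x - x_min)/(x_max - x_min) and its inverse can be written as a short Python sketch. The function names and the accident counts are hypothetical, chosen only so that the arithmetic is easy to follow; the inverse mapping is what turns a network output back into an accident frequency.

```python
def min_max_normalize(values):
    """Map values linearly onto [0, 1]; also return the range used."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal, so the range is zero")
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled, lo, hi

def reverse_normalize(scaled, lo, hi):
    """Invert min_max_normalize: x = y * (x_max - x_min) + x_min."""
    return [s * (hi - lo) + lo for s in scaled]

# Hypothetical yearly accident counts, for illustration only.
counts = [120, 150, 180, 210, 240]
scaled, lo, hi = min_max_normalize(counts)
restored = reverse_normalize(scaled, lo, hi)

print(scaled)    # [0.0, 0.25, 0.5, 0.75, 1.0]
print(restored)  # [120.0, 150.0, 180.0, 210.0, 240.0]
```

When the network is tested on later years, the x_min and x_max of the training years would normally be reused, so that training and test data share one scale.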

The accident forecast of BP neural network based on PCA:
• The selection of sample: As before, the data of city A from 1995 to 2002 are chosen as the training sample and the data of 2003 and 2004 as the testing sample. Each year's principal component values F1 and F2 are the input samples and the accident frequency is the output sample.

CONCLUSION
Road traffic accidents are caused by various factors, and the information these factors carry may overlap, which sometimes obscures the real characteristics and inherent laws of traffic accidents. This study therefore brought principal component analysis into road traffic accident forecasting to eliminate part of the overlapping information and combined it with the BP neural network to forecast road traffic accidents (the BP neural network based on PCA). The predicted results were compared with those of a BP neural network to which principal component analysis was not applied. The conclusion is that the BP neural network based on PCA achieves significantly better prediction precision than the plain BP neural network.

Fig. 1: Training diagram of the BP neural network for accident prediction

Fig. 3: Training diagram of BP accident number prediction based on principal component analysis

Fig. 5: Error curves of the BP neural network and the BP neural network based on PCA

Table 2: The matrix of correlation coefficients

Table 3: Variance decomposition of principal component extraction (initial eigenvalues)

Table 5: The loading matrix of the original factors

• The selection of sample: In the course of modeling, the sample should be divided into two parts, the training sample and the test sample; the test sample is mainly used to check and test the network model. This study selects the related data of city A from 1995 to 2004 (Table 1), with the data of 1995-2002 as the training sample and the data of 2003 and 2004 as the test sample.

Table 7: The sample data after normalization

Table 8: The sample data after reverse normalization

Table 9: The prediction results of BP neural network and the BP neural