Measuring Linear and Nonlinear Associations

In this study, we propose a new approach based on two nonparametric techniques to capture linear and nonlinear associations. Singular spectrum analysis, a powerful method for filtering noisy series, is used for noise reduction, and mutual information is used to measure the level of association. The performance of the proposed approach is assessed using simulated and real time series.


INTRODUCTION
Various measures have been considered for measuring the degree of association. The best known is the linear correlation coefficient, but its application requires a purely linear relationship. This statistic may not be helpful in determining serial dependence if there is some kind of nonlinearity in the data (Granger and Lin, 1994).
It has been shown that a measure based on mutual information, which captures linear and nonlinear dependencies without requiring any assumptions to be specified, is better than the linear correlation coefficient for measuring association and serial correlation in noisy time series, particularly financial series (Hassani et al., 2009a).
It is clear that a significant noise level reduces the accuracy with which the association between two series can be measured. For example, consider a noisy time series $y_t = s_t + g_t$ ($t = 1, \ldots, T$), where $s_t$ is the deterministic part and $g_t$ is the stochastic part. Usually the second part is considered as noise.
In this study, we mainly consider two different approaches to obtain the level of association. In the first approach, we capture the level of association directly from the noisy series, so the noise level is ignored. In the second approach, we first filter the noisy time series in order to reduce the noise level and then calculate the measures. The second approach should give more accurate results than the first, provided that a proper method is selected for filtering the series. Here, we provide the necessary theoretical background and concisely describe the singular spectrum analysis technique and mutual information, respectively. Using these techniques, we propose an association test for measuring the linear and nonlinear association between two series. The test is based on the filtering approach.

Singular spectrum analysis (SSA):
In recent years, SSA has been developed as a powerful technique of time series analysis and applied to many practical problems (Hassani, 2007; Hassani, 2009; Hassani et al., 2009a-c; Hassani and Zhigljavsky, 2009e; Ghodsi et al., 2009; Hassani and Thomako, 2010; Mahmoudvand and Zokaei, 2011). A thorough description of the theoretical and practical foundations of the SSA technique (with several examples) can be found in Golyandina et al. (2001).
It should be noted that although many probabilistic and statistical elements are employed in SSA-based techniques, the technique does not make any statistical assumptions concerning either signal or noise while performing the analysis and investigating the properties of the algorithms (Hassani, 2007). This can be considered one of the advantages of the technique over classical methods, which usually rely on restrictive assumptions.
The SSA technique consists of two complementary stages, decomposition and reconstruction, each of which includes two separate steps. The original time series is decomposed into a number of additive time series, each of which can be easily identified as being part of the modulated signal or part of the random noise. This is followed by a reconstruction of the original series. Here, we mainly follow Hassani (2007).
Consider the real-valued non-zero time series $Y_T = (y_1, \ldots, y_T)$ of length $T$. For a window length $L$, define the trajectory matrix

$$X = [X_1, \ldots, X_K], \qquad (1)$$

where $X_j = (y_j, \ldots, y_{L+j-1})^T$. We then consider $X$ as multivariate data with $L$ characteristics and $K = T - L + 1$ observations. The columns $X_j$ of $X$, considered as vectors, lie in the $L$-dimensional space $\mathbb{R}^L$. Define the matrix $XX^T$. The singular value decomposition (SVD) of $XX^T$ provides us with the collection of $L$ eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_L \ge 0$ and the corresponding eigenvectors $U_1, \ldots, U_L$, where $U_i$ is the normalized eigenvector corresponding to the eigenvalue $\lambda_i$ ($i = 1, \ldots, L$).
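The embedding and decomposition steps described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation; the function names `trajectory_matrix` and `eigen_decomposition` and the toy series are assumptions made for the example.

```python
import numpy as np

def trajectory_matrix(y, L):
    """Build the L x K trajectory matrix, K = T - L + 1 (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    K = y.size - L + 1
    # Column j is the lagged vector (y_j, ..., y_{j+L-1})^T.
    return np.column_stack([y[j:j + L] for j in range(K)])

def eigen_decomposition(X):
    """Eigenvalues (descending) and normalized eigenvectors of X X^T."""
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)
    order = np.argsort(eigvals)[::-1]
    # Clip tiny negative values that arise only from floating-point round-off.
    return np.clip(eigvals[order], 0.0, None), eigvecs[:, order]

# Toy series with T = 6 and window length L = 3, so K = 4.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X = trajectory_matrix(y, L=3)
lam, U = eigen_decomposition(X)
print(X.shape)  # (3, 4)
```

The eigenvalues of $XX^T$ coincide with the squared singular values of $X$, so in practice one can equally apply `numpy.linalg.svd` to $X$ directly.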
A group of $r$ (with $1 \le r < L$) eigenvectors determines an $r$-dimensional hyperplane in the $L$-dimensional space $\mathbb{R}^L$ of vectors $X_j$. If we choose the first $r$ eigenvectors $U_1, \ldots, U_r$, then the squared $L_2$-distance between $X$ and its projection onto the hyperplane they span is equal to $\sum_{j=r+1}^{L} \lambda_j$. According to the Basic SSA algorithm, the $L$-dimensional data are projected onto this $r$-dimensional subspace, and the subsequent averaging over the diagonals allows us to obtain an approximation to the original series.
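A minimal sketch of the projection and diagonal-averaging steps, assuming NumPy, is given below. The function names (`rank_r_projection`, `diagonal_average`, `ssa_reconstruct`) and the sine-plus-noise test series are hypothetical choices for illustration; the window length and the number of components are inputs that the analyst must choose.

```python
import numpy as np

def rank_r_projection(X, r):
    """Project the columns of X onto the span of the first r eigenvectors of X X^T."""
    # Left singular vectors of X are eigenvectors of X X^T;
    # squared singular values are the corresponding eigenvalues.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Ur = U[:, :r]
    return Ur @ (Ur.T @ X), s ** 2

def diagonal_average(Xr):
    """Average entries with constant i + j to turn an L x K matrix into a series."""
    L, K = Xr.shape
    total = np.zeros(L + K - 1)
    count = np.zeros(L + K - 1)
    for i in range(L):
        total[i:i + K] += Xr[i, :]
        count[i:i + K] += 1.0
    return total / count

def ssa_reconstruct(y, L, r):
    """Basic SSA sketch: embed, project onto r leading eigenvectors, average diagonals."""
    y = np.asarray(y, dtype=float)
    K = y.size - L + 1
    X = np.column_stack([y[j:j + L] for j in range(K)])
    Xr, _ = rank_r_projection(X, r)
    return diagonal_average(Xr)

# Illustrative use on an assumed test signal (not data from the paper).
rng = np.random.default_rng(0)
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 20)
noisy = signal + 0.5 * rng.standard_normal(t.size)
filtered = ssa_reconstruct(noisy, L=40, r=2)
```

Here `r=2` is used because a single sinusoid occupies two SSA components; for other series the choice of `r` would typically be guided by the eigenvalue spectrum.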
Mutual information:
The mutual information, which is based on the concept of entropy, is a useful technique to capture the relation, either linear or nonlinear, between two series. Below, we provide a brief introduction and mainly follow Hassani et al. (2009a). The mutual information of two continuous random variables $X$ and $Y$ can be defined as:

$$I(X;Y) = \int\!\!\int p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \, dx \, dy, \qquad (2)$$

where $p(x, y)$ is the joint probability density function of $X$ and $Y$, and $p(x)$ and $p(y)$ are the marginal probability density functions of $X$ and $Y$, respectively. In the discrete case, we replace the double integral by a double summation. Intuitively, mutual information measures the information that $X$ and $Y$ share: it measures how much knowing one of these variables reduces our uncertainty about the other. Mutual information can be expressed as:

$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y), \qquad (3)$$

where $H(X)$ and $H(Y)$ are the marginal entropies, $H(X|Y)$ and $H(Y|X)$ are the conditional entropies, and $H(X, Y)$ is the joint entropy of $X$ and $Y$.
Since $H(X) \ge H(X|Y)$, we have $I(X;Y) \ge 0$, with equality if and only if $X$ and $Y$ are statistically independent. Therefore, the mutual information between the vectors of random variables $X$ and $Y$ can be considered as a measure of dependence between these variables, or better yet, the statistical correlation of $X$ and $Y$. The statistic defined in Eq. (3) satisfies some of the desirable properties of a good measure of dependence (Hassani et al., 2009a).
The mutual information defined in Eq. (3) takes a value between 0 and infinity, which makes comparisons between different samples difficult. In this context, the following transformation has been used by Granger and Lin (1994) as a standardized measure of mutual information:

$$\lambda = \sqrt{1 - e^{-2 I(X;Y)}}. \qquad (4)$$

Note that $\lambda$ captures the overall dependence, both linear and nonlinear, between $X$ and $Y$. This measure varies between 0 and 1 and is thus directly comparable to the linear correlation coefficient, $\rho$, based on the relationship between the measures of information theory and variance analysis. Note that here we are not interested in the direction of dependency. Thus, both linear and nonlinear measures are comparable. Our proposed approach consists of two separate stages: in the first stage we filter the noisy series using the SSA technique, and in the second stage we apply Eq. (4) to measure the level of association.
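To illustrate the standardized measure, the sketch below estimates the mutual information with a simple plug-in histogram estimator and then applies the Granger and Lin (1994) transformation $\lambda = \sqrt{1 - e^{-2I}}$. The estimator, the bin count, and the function names `mutual_information` and `lambda_measure` are assumptions made for this example; the source does not state which estimator was used. For a bivariate Gaussian pair with correlation $\rho$, $I = -\tfrac{1}{2}\log(1 - \rho^2)$, so $\lambda = |\rho|$, which gives a convenient sanity check.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in histogram estimate of I(X;Y) in nats (an assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, bins)
    mask = pxy > 0                        # skip empty cells to avoid log(0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def lambda_measure(x, y, bins=16):
    """Standardized dependence measure lambda = sqrt(1 - exp(-2 I))."""
    mi = max(mutual_information(x, y, bins), 0.0)
    return float(np.sqrt(1.0 - np.exp(-2.0 * mi)))

rng = np.random.default_rng(0)
n = 20000
x = rng.standard_normal(n)

# Linear dependence: Gaussian pair with correlation 0.8.
y_lin = 0.8 * x + np.sqrt(1.0 - 0.8 ** 2) * rng.standard_normal(n)
# Nonlinear dependence: Pearson correlation near zero, strong dependence.
y_sq = x ** 2 + 0.1 * rng.standard_normal(n)

lam_lin = lambda_measure(x, y_lin)
lam_sq = lambda_measure(x, y_sq)
rho_sq = np.corrcoef(x, y_sq)[0, 1]
```

The plug-in estimator is biased upward in finite samples, so independent series yield a small positive $\lambda$ rather than exactly zero; the bias grows with the number of bins and shrinks with the sample size.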

EMPIRICAL RESULTS
Below, we consider two types of time series: an artificially generated series (a Hénon map) and real financial data. We consider the values of $\lambda$ before and after noise reduction. Let us first consider the performance of the proposed approach on a chaotic time series, the Hénon map (Hénon, 1976):

$$x_{t+1} = 1 - A x_t^2 + B x_{t-1}, \qquad (5)$$

with the usual parameter values $A = 1.4$ and $B = 0.3$. Here, we follow the same approach considered in Hassani et al. (2009c) and generate 3000 observations. Furthermore, we consider different noise levels to create noisy series. Table 1 shows the values of $\lambda$ before and after filtering for different noise levels ($\sigma_1$: the smallest noise level; $\sigma_4$: the largest). Note that the value of $\lambda$ before adding noise is 0.601. Therefore, the performance of the proposed approach is promising if the captured values are close to 0.601. We also consider Autoregressive Moving Average (ARMA) and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models as linear and nonlinear filtering methods. These models serve as our benchmarks. We chose the GARCH model because it works well in nonlinear situations and the ARMA model because it is a well-known linear model. Therefore, these benchmark models cover both linear and nonlinear situations.
As Table 1 shows, different noise levels give different values of $\lambda$. Table 1 confirms that the values of $\lambda$ obtained from our proposed approach are more robust than those of the other methods considered here. Furthermore, the results show that the values of $\lambda$ after filtering by ARMA, GARCH and the proposed method are more accurate than those of the noisy series. Similar results have been obtained by Hassani et al. (2009a).
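A short sketch of how such a simulated series could be generated is given below, using the standard scalar form of the Hénon map, $x_{t+1} = 1 - A x_t^2 + B x_{t-1}$, with $A = 1.4$ and $B = 0.3$. The initial values, the burn-in length, and the way the noise level is defined (Gaussian noise scaled as a fraction of the series' standard deviation) are assumptions made for illustration, since the source does not specify them; the names `henon_series` and `add_noise` are hypothetical.

```python
import numpy as np

def henon_series(n, a=1.4, b=0.3, burn_in=100):
    """Generate n observations of the scalar Henon recursion after a burn-in."""
    prev, curr = 0.0, 0.0            # assumed initial values
    out = np.empty(n)
    for t in range(n + burn_in):
        # x_{t+1} = 1 - a * x_t^2 + b * x_{t-1}
        prev, curr = curr, 1.0 - a * curr * curr + b * prev
        if t >= burn_in:
            out[t - burn_in] = curr
    return out

def add_noise(series, noise_ratio, seed=0):
    """Add Gaussian noise with std = noise_ratio * std(series) (assumed definition)."""
    rng = np.random.default_rng(seed)
    scale = noise_ratio * np.std(series)
    return series + scale * rng.standard_normal(series.size)

x = henon_series(3000)
# Four increasing noise levels; the actual levels used in Table 1 are not given here.
noisy_versions = [add_noise(x, ratio, seed=k)
                  for k, ratio in enumerate([0.05, 0.1, 0.2, 0.4])]
```

Each noisy version could then be filtered and passed to an association measure to compare the value obtained with the noise-free benchmark.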
Let us now consider the performance of the proposed approach for capturing association between two real series. Here, we mainly follow the approach considered by Menezes et al. (2011). The real data set used in our empirical analysis consists of 7 daily stock price series representing the G7 countries: US, Canada, Japan, UK, Germany, France and Italy. The data are the relative price indexes for these markets and cover the period from 1 January 2000 to 1 January 2010. All series are flatter than the Gaussian distribution and slightly skewed. The series also show some level of non-stationarity.
The results in Table 2 indicate that there are different levels of relationship among these countries. The results also show that there are reasonable associations within the EU and North American countries (similar results were reported by Menezes et al. (2011)). However, if we consider these results for the original series (rather than the first differences), we observe very strong long-run relationships between all markets.

CONCLUSION
In this study, we proposed a new approach based on two nonparametric techniques to capture linear and nonlinear associations. The proposed test is based on the filtering approach. The performance of the proposed approach on simulated and real data sets indicates the capability of the proposed test in measuring association between two time series with different structures and features. Extending this idea to the multivariate case would be a new development in this area.

Table 1: The values of $\lambda$ for different noise levels

Table 2: The values of $\lambda$ for the G7 countries in the first differences