Research on Heuristic Feature Extraction and Classification of EEG Signal Based on BCI Data Set

In this study, we propose an EEG signal classification framework. The framework comprises three feature extraction methods, each based on an optimization strategy. First, we select optimal electrodes according to single-electrode classification performance and combine the data of all optimal electrodes as the feature. Second, we examine the contribution of each time span of the EEG signal for each electrode and join the data of all optimal time spans for classification. Third, we further select useful information from the original data using a genetic algorithm. Finally, the performance is evaluated with Bayes and SVM classifiers on the BCI Competition 2003 data set Ia. The genetic algorithm approach reaches an accuracy of 91.81%. The experimental results show that our methods offer better performance for reliable classification of EEG signals.


INTRODUCTION
A person's state of mind is underpinned by brain activity. EEG is one of the brain imaging and recording techniques that can be used to investigate the activity of the human brain. Recently, EEG-based Brain Computer Interface (BCI) has been an area of significant research activity, with a variety of techniques being used to recognize and interpret brain events as a form of interface to a computer or other device, rather than for medical diagnosis or neuroscience research. Such techniques will open up new ways of controlling robots or making robots behave more like human beings.
BCI technology originated in the United States. Many researchers have used EEG to control external devices. For instance, Andrew Schwartz's team at the University of Pittsburgh reported that a trained monkey could feed itself zucchini with a mechanical arm controlled by a BCI system (Santucci et al., 2005). In the 1990s, Niels Birbaumer and others analyzed the brain signals of paralyzed patients and enabled them to move a computer cursor (Just Short of Telepathy, 2003). Hunter Peckham and his team classified patients' thoughts of moving up and down by studying beta waves extracted from the EEG signals of limb-impaired patients and thereby restored part of their hand movement function (Cane and Alan, 2005). Recently, researchers from Zhejiang University completed an experiment in which a monkey's brain signals controlled a manipulator. This result is on par with the advanced international level in the BCI field. Its significance is that the nerve signals produced by the movement of five fingers were decoded precisely. Most recently, the Chinese University of Hong Kong developed a BCI system that translates brain waves into traditional Chinese characters, giving paralyzed patients who are otherwise unable to communicate an opportunity to communicate with the world.
The effective application of BCI technology depends on the accurate classification of EEG signals. Taking the BCI Competition 2003 data set Ia as an example, the winner (Mensh et al., 2004) achieved 88.7% accuracy using gamma band power combined with SCPs. Shiliang and Changshui (2005) improved the classification performance to 90.44% by combining SCPs with the spectral centroid. In the same year, Wang et al. increased the classification accuracy by 1.07% over Sun and Zhang, using SCPs and beta band specific energy as feature vectors (Baojun et al., 2005). In addition, Wu et al. (2008) proposed a novel method based on wavelet packet decomposition (WPD) and obtained a 90.8% accuracy rate by selecting the energy of special sub-bands and the corresponding WPD coefficients as features.
In this study, we improve the classification accuracy with three methods. The first is optimal electrode recombination, the second is optimal time span recombination and the third is based on a genetic algorithm combined with Bayes and SVM classifiers. The experimental results show that these methods can enhance the classification accuracy.
The data set description: The data set used in the experiment is the BCI Competition 2003 data set Ia (Mensh et al., 2004). Six healthy subjects (evenly divided between male and female) participated in the experiment. The subjects were between 22 and 35 years old. The signals acquired were their Slow Cortical Potentials (SCPs). The task of the subjects was to move a cursor up and down through imagination. The EEG signals were collected from 6 recording electrodes at a sampling rate of 256 Hz, with the central parietal electrode Cz as the reference electrode. The distribution of the electrodes on the scalp surface, according to the international 10-20 standard, is shown in Fig. 1. The acquisition process included three stages: a rest stage (1 s), a prompting imagination stage (1.5 s) and a feedback stage (3.5 s). In the prompting imagination stage, a cursor instruction, either up or down, appeared in the center of the screen. The cursor did not disappear until the end of the feedback stage. The data analyzed in the experiment was the SCP recorded in the feedback stage. Defining the average voltage of the 2 mastoid electrodes (A1, A2) within the last 0.5 s of the prompting imagination stage as the cortex negative potential, the voltage amplitude of the reference electrode Cz became positive. The cortex negative potential is also a slow cortical potential and is related to brain activity when people are in a state of alertness, expectation or preparation.
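The stage durations and sampling rate above fix the size of the raw data per trial. The arithmetic can be sketched in Python as follows; the constant and function names are illustrative assumptions and do not come from the original MATLAB experiments.

```python
# Sample counts implied by the acquisition protocol described above.
# Names are illustrative; only the numeric values come from the text.
SAMPLING_RATE_HZ = 256
STAGES_S = {"rest": 1.0, "prompt": 1.5, "feedback": 3.5}
N_ELECTRODES = 6


def samples_in(duration_s, rate_hz=SAMPLING_RATE_HZ):
    """Number of samples recorded in duration_s seconds at rate_hz."""
    return int(round(duration_s * rate_hz))


# Only the feedback stage is used for classification.
feedback_samples = samples_in(STAGES_S["feedback"])
# Samples per electrode over the whole trial (rest + prompt + feedback).
trial_samples = sum(samples_in(d) for d in STAGES_S.values())
# Raw feature dimension if all electrodes and all feedback samples are used.
feature_dim = N_ELECTRODES * feedback_samples

print(feedback_samples, trial_samples, feature_dim)
```

Under these values each electrode contributes 896 samples from the feedback stage, so using every electrode and every sample would give a 5376-dimensional raw feature vector per trial.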

The framework of the EEG signal classification:
Studies have found that different motor imagery activates different brain regions. For example, Leonardo found that when a subject imagined his fingers touching his thumb, the primary motor area was activated (Cicinelli et al., 2006; Leonardo et al., 1995; Gerardin et al., 2000; Lotze et al., 1999). Research has also suggested that motor imagery of the fingers, toes and tongue activates the corresponding body-specific parts of the primary motor area (Nair et al., 2003). We supposed that signals collected from different electrodes represent the state of different brain regions. In view of this, we proposed the optimal electrodes recombination strategy. Given that the EEG signal is time-locked, we inferred that the whole time span of every electrode contains two kinds of information components: positive components related to the stimulus and negative components. We also supposed that the positive information components are conducive to classification, while the negative information components reduce the accuracy of classification and discrimination. Therefore, to improve the accuracy, optimal time spans recombination was used to reduce the negative information components. Both methods above rely on manual selection: there may still be negative information in the optimal part, or positive information in the non-optimal part. It is therefore appropriate to select features automatically with a genetic algorithm. To describe the three heuristic feature extraction and classification methods clearly, we propose the framework of EEG signal classification shown in Fig. 2. The details of the three methods and the corresponding experiments are described in the next two sections.

CLASSIFICATION ENHANCEMENT METHODS
In this section, we introduce the three classification enhancement methods in turn.
• Firstly, we define 500 ms as the unit of each time span. As the EEG signal lasts 3500 ms, there are 7 time sub-spans for each electrode. We choose $N_{ij}$ as an initial signal feature to investigate the contribution of each sub-time series extracted from an EEG signal. When the classification accuracy of a time sub-span exceeds 70%, we put this time span into the optimal time span area ($S_1$). Otherwise, we put it into the non-optimal time span area ($S_2$).
• Secondly, our model joins $m$ ($1 \le m \le 6 \times 7$) EEG spans from the optimal time sub-span area ($S_1$) to produce a new EEG signal combination ($X$).
• Finally, we choose 2 classifiers (SVM and Bayes) to classify the EEG signal features based on $X$.
At last, the encoding method, genetic operator and self-fitness function are determined as follows:
• Encoding method: We set the initial feature vector $X = [x_1, x_2, \ldots, x_D]^T$, where each component represents a feature and $D$ is the dimension of $X$. We choose a binary vector to represent an individual: $S = [s_1, s_2, \ldots, s_D],\ s_i \in \{0, 1\},\ i = 1, 2, \ldots, D$. Each element of $S$ corresponds to a sub-time sequence: when this sub-time sequence is chosen, we set $s_i = 1$; otherwise we set $s_i = 0$.
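The binary encoding can be illustrated with a short Python sketch: an individual is a 0/1 vector with one entry per candidate feature, and applying it to the feature vector keeps only the selected components. The function name and the numeric values are hypothetical and serve only to show the mechanism.

```python
def apply_mask(features, mask):
    """Keep the components of `features` whose mask entry is 1."""
    if len(features) != len(mask):
        raise ValueError("feature vector and mask must have the same length")
    return [x for x, s in zip(features, mask) if s == 1]


# Hypothetical 5-dimensional feature vector and one individual (chromosome).
x = [0.5, -1.2, 3.0, 0.7, 2.2]
s = [1, 0, 1, 0, 1]

selected = apply_mask(x, s)
print(selected)
```

With this encoding, the search space of the genetic algorithm is the set of all binary vectors of length D, and each individual maps directly to one candidate feature subset.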

EXPERIMENTS
We analyze the EEG signal classification results of the three feature extraction methods in this section. Part I is based on the optimal electrodes recombination, Part II discusses the performance of the optimal time spans recombination and Part III compares the genetic algorithm with the first two methods. Bayes and SVM classifiers are used in our experiments. The Bayes classifier is a Naive Bayes classifier created with a NaiveBayes class object in MATLAB. The SVM classifier uses the LIBSVM toolbox provided by Chih-Jen Lin, Taiwan, with a sigmoid kernel function.
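For reference, the sigmoid kernel in LIBSVM has the form tanh(gamma * u'v + coef0). The following Python sketch evaluates it for two feature vectors; it is not the MATLAB/LIBSVM code used in the experiments, and the vectors and the values of gamma and coef0 are assumptions, since the kernel parameters are not reported here.

```python
import math


def sigmoid_kernel(u, v, gamma, coef0=0.0):
    """Sigmoid kernel in the LIBSVM form: tanh(gamma * <u, v> + coef0)."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    return math.tanh(gamma * dot + coef0)


# Hypothetical 4-dimensional feature vectors, for illustration only.
u = [0.2, -0.1, 0.4, 0.0]
v = [0.1, 0.3, -0.2, 0.5]
gamma = 1.0 / len(u)  # assumed value: 1 / number of features

k_uv = sigmoid_kernel(u, v, gamma)
print(k_uv)
```

The kernel value is symmetric in its two arguments and always lies strictly between -1 and 1.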
Evaluation of the EEG signal classification method based on optimal electrodes combination: According to the first step of Algorithm 1, the 6 electrodes are divided into optimal electrodes and non-optimal electrodes, as shown in Table 1.
Based on this division, we design an experiment to compare classification with the optimal electrodes combination against classification with the non-optimal electrodes combination. Firstly, we use the optimal electrodes combination and the non-optimal electrodes combination, respectively, as the EEG signal feature. Then, we choose SVM and Bayes to classify. Finally, we compare the EEG classification results based on the two combinations. The results are shown in Table 2. Firstly, regardless of which classifier is chosen, the classification results based on the optimal electrodes combination are better than those using all electrodes as the EEG signal feature, and data from the non-optimal electrodes give a much lower performance. Secondly, no matter which electrodes combination is used as the EEG signal feature, the Bayes classification accuracy is relatively higher. Therefore, the Bayes classifier gives good classification results in this electrode-combination-based BCI classification.
Evaluation of the EEG signal classification method based on optimal time span combination: We divide the signal of each single electrode into 7 time sub-spans in order to improve the EEG signal classification based on the time spans combination. According to the first step of Algorithm 2, we obtain the optimal time spans of each electrode, listed in Table 3; the remaining time spans are non-optimal. This result is obtained by marking each time span of every electrode according to its individual classification result. For example, A1 has optimal time spans at 500-1000, 1000-1500 and 1500-2000 ms, and A2 has optimal time spans at 500-1000, 1000-1500, 1500-2000 and 2000-2500 ms.
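The time span procedure can be sketched in Python under the stated settings (256 Hz sampling, 500 ms spans, a 70% threshold). The per-span accuracies below are made-up placeholders rather than measured results, and the function names are illustrative.

```python
SAMPLING_RATE_HZ = 256
SPAN_MS = 500
THRESHOLD = 0.70  # a span is optimal when its accuracy is above 70%


def split_into_spans(signal, rate_hz=SAMPLING_RATE_HZ, span_ms=SPAN_MS):
    """Cut one electrode's signal into consecutive, non-overlapping spans."""
    span_len = rate_hz * span_ms // 1000  # 128 samples per 500 ms span
    last_start = len(signal) - span_len
    return [signal[i:i + span_len] for i in range(0, last_start + 1, span_len)]


def select_optimal_spans(accuracy, threshold=THRESHOLD):
    """Return the (electrode, span index) pairs whose accuracy exceeds threshold."""
    return sorted(key for key, acc in accuracy.items() if acc > threshold)


def recombine(trial, optimal):
    """Concatenate the samples of all optimal spans into one feature vector."""
    features = []
    for electrode, j in optimal:
        features.extend(split_into_spans(trial[electrode])[j])
    return features


# Hypothetical trial: 896 feedback-stage samples per electrode (dummy values).
trial = {"A1": list(range(896)), "A2": list(range(896))}
# Hypothetical single-span accuracies, NOT results from the experiments.
accuracy = {("A1", 0): 0.55, ("A1", 1): 0.74, ("A2", 3): 0.71, ("A2", 6): 0.70}

optimal = select_optimal_spans(accuracy)
x = recombine(trial, optimal)
print(optimal, len(x))
```

Each selected span contributes 128 samples, so the recombined feature vector grows linearly with the number of optimal spans m.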
According to the EEG classification performance, on the one hand, compared with the results of classification based on the optimal electrodes combination, classification based on the optimal time spans performs better (Table 4).
Evaluation of the EEG signal classification method based on genetic algorithm: According to Algorithm 3, we propose the EEG signal classification method based on the genetic algorithm, using Bayes and SVM as classifiers. In the process, the EEG signal classification accuracy is taken as the fitness function value. Compared with the two methods above, we conclude that features selected automatically by the genetic algorithm better represent the content of the EEG signals. The results based on the genetic algorithm are shown in Table 5. The accuracy of the genetic algorithm reaches 91.81%.
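Using classification accuracy as the fitness value can be sketched as follows. This Python example deliberately swaps in a simple nearest-centroid classifier as a stand-in for the Bayes and SVM classifiers, so it only illustrates how a binary mask is scored; the data, labels and function names are hypothetical.

```python
def select(row, mask):
    """Keep the features of one sample that the mask switches on."""
    return [x for x, s in zip(row, mask) if s]


def centroids(rows, labels):
    """Mean feature vector per class (stand-in for training a classifier)."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(row, cents):
    """Assign the class whose centroid is closest in squared distance."""
    return min(cents, key=lambda c: sum((x - m) ** 2 for x, m in zip(row, cents[c])))


def make_fitness(train_x, train_y, val_x, val_y):
    """Build fitness(mask) = validation accuracy using only the masked features."""
    def fitness(mask):
        if not any(mask):
            return 0.0  # an empty feature subset cannot classify anything
        cents = centroids([select(r, mask) for r in train_x], train_y)
        hits = sum(predict(select(r, mask), cents) == y for r, y in zip(val_x, val_y))
        return hits / len(val_y)
    return fitness


# Toy data: feature 0 separates the classes, feature 1 carries no class information.
train_x = [[1.0, 5.0], [1.2, -5.0], [-1.0, 5.0], [-1.1, -5.0]]
train_y = [0, 0, 1, 1]
val_x = [[0.9, -4.0], [-0.8, 4.0]]
val_y = [0, 1]

fitness = make_fitness(train_x, train_y, val_x, val_y)
print(fitness([1, 0]), fitness([0, 1]))
```

In this toy case the mask that keeps only the informative feature scores higher than the mask that keeps only the uninformative one, which is the signal the genetic algorithm exploits.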

CONCLUSION
In this study, three heuristic methods are proposed to improve the classification accuracy of the EEG signals in the BCI Competition 2003 data set. Although the accuracy is indeed improved, the computation is somewhat slow. In the future, we will therefore use parallel computing to speed up the recombination. Moreover, we will apply the classification methods to new data sets to validate the performance of the algorithm. We will also compare them with a variety of other feature extraction methods on public EEG data sets.

Fig. 1: The distribution diagram of the six electrodes in scalp potentials (Wu et al., 2008)

Fig. 2: The framework of the EEG signal classification

Method 1 :
Fig. 3: The diagram of EEG classification based on optimal electrodes recombination

Fig. 5: EEG signal feature extraction scheme based on genetic algorithm

Method 3: EEG classification based on genetic algorithm: Although the methods mentioned above provide a heuristic information search, they are not self-adaptive algorithms. We therefore propose an EEG signal classification method based on the genetic algorithm in this section. We select the optimal time points, recombine them and then evaluate the result by calculating the self-fitness function. Finally, we obtain the ideal result. Figure 5 shows the EEG signal feature extraction scheme based on the genetic algorithm. The details of the genetic algorithm are described in Algorithm 3.

Algorithm 3: The genetic algorithm scheme for optimizing EEG signal feature extraction and selection is as follows:
• Import the EEG data into MATLAB and obtain the data for training and testing
• Initialize the algorithm parameters, including population size (popsize), generation number (generation), length of the individual chromosome (chromlength), crossover probability ($P_c$), mutation probability in the earlier stage ($P_{m1}$) and mutation probability in the later stage ($P_{m2}$)
• Population size: popsize = 100
• Generation number: generation = 50
• Crossover probability: $P_c = 0.6$
• Mutation probability: $P_{m1} = 0.08$, $P_{m2} = 0.001$
• Self-fitness function: The classification of the recombined EEG signal is used as the fitness function, with the Bayes classifier selected; the classification accuracy is taken as the fitness function value, which reflects the differences in classification performance
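A minimal Python sketch of a genetic algorithm with the parameter values listed above is given below. Several details are assumptions, because they are not specified here: tournament selection, single-point crossover, switching from the earlier to the later mutation probability at the halfway generation, and a toy fitness function that stands in for the classification accuracy.

```python
import random

POPSIZE = 100      # population size
GENERATIONS = 50   # generation number
P_C = 0.6          # crossover probability
P_M1 = 0.08        # mutation probability, earlier stage
P_M2 = 0.001       # mutation probability, later stage


def tournament(pop, scores, rng):
    """Assumed selection operator: the fitter of two random individuals."""
    i, j = rng.sample(range(len(pop)), 2)
    return pop[i] if scores[i] >= scores[j] else pop[j]


def run_ga(fitness, chromlength, rng, popsize=POPSIZE, generations=GENERATIONS):
    """Evolve binary chromosomes and return the best one seen."""
    pop = [[rng.randint(0, 1) for _ in range(chromlength)] for _ in range(popsize)]
    best = max(pop, key=fitness)
    for gen in range(generations):
        # Assumption: the later stage starts at the halfway generation.
        p_m = P_M1 if gen < generations // 2 else P_M2
        scores = [fitness(ind) for ind in pop]
        children = []
        while len(children) < popsize:
            a = tournament(pop, scores, rng)[:]
            b = tournament(pop, scores, rng)[:]
            if rng.random() < P_C:
                # Assumed operator: single-point crossover.
                cut = rng.randint(1, chromlength - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                # Bit-flip mutation with the stage-dependent probability.
                children.append([1 - g if rng.random() < p_m else g for g in child])
        pop = children[:popsize]
        best = max(pop + [best], key=fitness)
    return best


# Toy fitness standing in for classification accuracy: the fraction of bits
# that match a hypothetical mask of useful features.
target = [1, 0] * 10


def toy_fitness(mask):
    return sum(m == t for m, t in zip(mask, target)) / len(target)


best_mask = run_ga(toy_fitness, chromlength=len(target), rng=random.Random(0))
print(toy_fitness(best_mask))
```

In the setting described here, the toy fitness would be replaced by the accuracy of the Bayes classifier on the features selected by the mask, which is far more expensive to evaluate and explains why computation time matters.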

Table 1: The difference of optimal electrodes and non-optimal electrodes

Table 4: EEG classification performance based on time span selection

Time span selection performs better because the EEG features based on the time spans combination are related to the task and are advantageous for EEG signal classification. Hence, selecting EEG data by time span can greatly improve classification performance through the reduction and recombination of the EEG signal. On the other hand, the Bayes classifier gives good classification results in Algorithm 1; similarly, it gives good and stable classification results in Algorithm 2. The results are shown in Table 4.