Neuropathology Classifier Based on Higher Order Spectra

Epilepsy is the most common neuropathology. Statistical studies of the disease report that 20% to 25% of epileptic patients continue to have seizures even while under drug treatment. This article presents a strategy for improved detection of this neuropathology from the electroencephalogram (EEG), using a classifier built with support vector machines (SVC). The SVC is designed on features extracted from the higher order spectra of time series derived from the EEG of epileptic patients and control patients. As the study shows, the EEG time series are highly nonlinear and non-Gaussian and therefore have nonzero higher order spectra; features extracted from these spectra improve the accuracy of the SVC. The results of this study support the development of highly accurate computational tools for the diagnosis of this dreaded neuropathology.


Introduction
Electroencephalography is the neurophysiological measurement of the brain's electrical activity by means of electrodes placed on the scalp. The resulting set of traces, known as the electroencephalogram (EEG), represents the electrical signal (postsynaptic biopotentials) of a large number of neurons, that is, brain activity. The electrodes are distributed so that adjacent sites are separated by 10% or 20% of the total front-to-back or right-to-left distance of the skull; this standardized arrangement is known as the international 10-20 system of electrode placement [1,2]. The EEG is used primarily to monitor and diagnose brain disorders such as epilepsy, syncope and sleep disorders and, in some cases, to assess dementia, coma and brain death (in some jurisdictions it is accepted as legal evidence of brain death). The EEG has the great advantage of being a noninvasive and painless diagnostic method, hence the importance of its use and analysis. From the signal-processing viewpoint, EEG analysis is a problem of time-series processing, a topic extensively developed in stochastic modeling, signal and system identification, and pattern recognition. The aim of this paper is to develop a classifier built with support vector machines, based on features extracted from EEG signals using higher order spectral statistics.

EEG Analysis
The dynamic interpretation of the EEG has been the subject of much discussion among researchers; recently, it has focused on two different models. The first approach treats EEG time series as linear stochastic processes, i.e., the EEG signals are analyzed using linear techniques such as parametric spectral models (e.g., ARMA models) or nonparametric methods (Fourier transforms or wavelets) [3]. Gaussian behavior was verified in short records from parkinsonian patients [4]. The other approach is based on nonlinear dynamics and regards the EEG as a deterministic but chaotic signal, since some records show a characteristic 1/f trend (f denotes frequency) that cannot be described by linear analysis. EEG processing and analysis must therefore resolve a fundamental question: should the signal be analyzed from a deterministic or a stochastic point of view? In the deterministic case, as outlined in [5], one must further establish whether the signal is chaotic. Conventional second-order analysis, based on the power spectrum and the autocorrelation function, is limited, among other reasons, by its loss of phase information. This severe limitation can be avoided by using Higher Order Statistics (HOS) [6,7]. HOS are the moments and cumulants of order higher than two (the first- and second-order statistics are the mean and the variance, respectively).
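The loss of phase information can be checked numerically. The minimal sketch below is an illustration, not part of the original study; the tone frequencies, phases and the 256-sample length are arbitrary choices. It builds two three-tone signals with identical amplitudes whose phases differ only in the quadratic-coupling term: their periodograms coincide, while the zero-lag third-order moment, a higher-order statistic, tells them apart.

```python
import numpy as np

FS, N = 256.0, 256                 # 1 s at 256 Hz, so FFT bins are 1 Hz apart
t = np.arange(N) / FS


def three_tones(p1, p2, p3, f1=10.0, f2=25.0):
    """Unit-amplitude tones at f1, f2 and f1 + f2 with phases p1, p2, p3."""
    return (np.cos(2 * np.pi * f1 * t + p1)
            + np.cos(2 * np.pi * f2 * t + p2)
            + np.cos(2 * np.pi * (f1 + f2) * t + p3))


p1, p2 = 0.3, 1.1
coupled = three_tones(p1, p2, p1 + p2)               # quadratically phase-coupled
shifted = three_tones(p1, p2, p1 + p2 + np.pi / 2)   # same tones, coupling broken

# Second order: the periodograms are identical, so the phase change is invisible.
same_psd = np.allclose(np.abs(np.fft.rfft(coupled)) ** 2,
                       np.abs(np.fft.rfft(shifted)) ** 2)

# Third order: zero-lag third moment (equal to C3x(0, 0) here, as the mean is 0).
# Analytically it equals 1.5 * cos(p1 + p2 - p3).
m3_coupled = np.mean(coupled ** 3)   # ≈ 1.5
m3_shifted = np.mean(shifted ** 3)   # ≈ 0
print(same_psd, m3_coupled, m3_shifted)
```

Because every tone completes an integer number of cycles, the only term of $x^3$ with a nonzero time average is the triple product of the three tones, which is why the third moment depends on $p_1 + p_2 - p_3$ alone.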

Definition and Properties of HOS
The HOS are defined in terms of moments and cumulants and their spectra; the most interesting are those of third and fourth order and their Fourier transforms, termed the bispectrum and the trispectrum, respectively. Higher-order moments are natural generalizations of the autocorrelation sequence, while the cumulants $C_{ix}(\cdot)$ are nonlinear combinations of those moments, as shown in Equations (1)-(4).
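For reference, the standard textbook definitions for a real-valued, zero-mean, stationary process $x(n)$ are given below. They are not a reconstruction of the article's own Equations (1)-(4), whose complex-valued form places conjugates on some factors, and the symbol $m_{kx}$ for the $k$th-order moment is an assumed notation.

```latex
\begin{aligned}
m_{2x}(\tau_1) &= E\{x(n)\,x(n+\tau_1)\}\\
m_{3x}(\tau_1,\tau_2) &= E\{x(n)\,x(n+\tau_1)\,x(n+\tau_2)\}\\
m_{4x}(\tau_1,\tau_2,\tau_3) &= E\{x(n)\,x(n+\tau_1)\,x(n+\tau_2)\,x(n+\tau_3)\}\\[4pt]
C_{2x}(\tau_1) &= m_{2x}(\tau_1)\\
C_{3x}(\tau_1,\tau_2) &= m_{3x}(\tau_1,\tau_2)\\
C_{4x}(\tau_1,\tau_2,\tau_3) &= m_{4x}(\tau_1,\tau_2,\tau_3) - m_{2x}(\tau_1)\,m_{2x}(\tau_3-\tau_2)\\
&\quad - m_{2x}(\tau_2)\,m_{2x}(\tau_3-\tau_1) - m_{2x}(\tau_3)\,m_{2x}(\tau_2-\tau_1)\\[4pt]
C_{4x}(0,0,0) &= E\{x^4(n)\} - 3\,\sigma_x^4
\end{aligned}
```

The last line shows why the fourth-order cumulant vanishes for a Gaussian process, for which $E\{x^4\} = 3\sigma_x^4$.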
In Equations (1)-(4), the superscript asterisk denotes complex conjugation. At zero lag, the cumulants are named as follows: $C_{2x}(0)$ is the variance $\sigma_x^2$; $C_{3x}(0,0)$ and $C_{4x}(0,0,0)$ are usually denoted $\gamma_{3x}$ and $\gamma_{4x}$, and their normalized values, $\gamma_{3x}/(\sigma_x^2)^{3/2}$ and $\gamma_{4x}/(\sigma_x^2)^{2}$, are the skewness and the kurtosis (more precisely, the excess kurtosis, which is zero for a Gaussian process), respectively. These normalized quantities are invariant to shifts and to positive scaling of the signal; owing to shift invariance, the signal can be assumed to have zero mean without loss of generality.
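The zero-lag quantities can be estimated directly from samples. The following is a minimal NumPy sketch, assuming a real-valued signal and simple (biased) sample averages; the function name `skewness_kurtosis` and the test distributions are illustrative choices, not part of the original study. It also checks the shift and scale invariance stated above.

```python
import numpy as np


def skewness_kurtosis(x):
    """Normalized zero-lag cumulants: gamma_3x / sigma^3 and gamma_4x / sigma^4."""
    xc = np.asarray(x, dtype=float) - np.mean(x)   # remove the mean first
    c2 = np.mean(xc ** 2)                          # C2x(0), the variance
    c3 = np.mean(xc ** 3)                          # C3x(0, 0) = gamma_3x
    c4 = np.mean(xc ** 4) - 3.0 * c2 ** 2          # C4x(0, 0, 0) = gamma_4x
    return c3 / c2 ** 1.5, c4 / c2 ** 2


rng = np.random.default_rng(0)
gauss = rng.standard_normal(200_000)
expo = rng.exponential(size=200_000)
print(skewness_kurtosis(gauss))   # both near 0 (theory: 0 and 0)
print(skewness_kurtosis(expo))    # theory for exponential data: 2 and 6
# Shift and scale invariance: a + b * x with b > 0 leaves both values unchanged.
print(np.allclose(skewness_kurtosis(expo), skewness_kurtosis(5.0 + 3.0 * expo)))  # True
```

Nonzero values of either quantity already rule out Gaussianity, which is the property the tests below build on.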
The bispectrum $S_{3x}(f_1, f_2)$ and the trispectrum $S_{4x}(f_1, f_2, f_3)$ are the Fourier transforms of the third- and fourth-order cumulants, respectively (Equations (5) and (6)). Equivalently, the bispectrum $S_{3x}(f_1, f_2)$ can be obtained from the spectral response $X(f)$ of the signal, evaluated in the non-redundant region bounded by the Nyquist frequency $f_N$. Another useful statistic for testing the linearity and Gaussianity of time series is the bicoherence. For three signals $x$, $y$ and $z$ it is called the cross-bicoherence, $\mathrm{bic}_{xyz}(f_1, f_2)$; for a single signal, the case of most interest in this application, it is denoted $\mathrm{bic}_{xxx}(\cdot)$ (Equation (9)).
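For a real, stationary process with cumulants $C_{3x}$ and $C_{4x}$ and frequencies in cycles per sample, these relations are commonly written as below. The bicoherence shown uses one common normalization, bounded between 0 and 1 by the Cauchy-Schwarz inequality; the article's Equation (9) may normalize differently (for example, by the power spectra at $f_1$, $f_2$ and $f_1 + f_2$).

```latex
\begin{aligned}
S_{3x}(f_1,f_2) &= \sum_{\tau_1=-\infty}^{\infty}\sum_{\tau_2=-\infty}^{\infty} C_{3x}(\tau_1,\tau_2)\,e^{-j2\pi(f_1\tau_1+f_2\tau_2)}\\
S_{4x}(f_1,f_2,f_3) &= \sum_{\tau_1}\sum_{\tau_2}\sum_{\tau_3} C_{4x}(\tau_1,\tau_2,\tau_3)\,e^{-j2\pi(f_1\tau_1+f_2\tau_2+f_3\tau_3)}\\
S_{3x}(f_1,f_2) &\propto E\{X(f_1)\,X(f_2)\,X^{*}(f_1+f_2)\}\\
\mathrm{bic}_{xxx}(f_1,f_2) &= \frac{\left|E\{X(f_1)\,X(f_2)\,X^{*}(f_1+f_2)\}\right|^{2}}{E\{|X(f_1)\,X(f_2)|^{2}\}\;E\{|X(f_1+f_2)|^{2}\}}
\end{aligned}
```

For a Gaussian process $C_{3x} \equiv 0$, so both the bispectrum and this bicoherence are zero at every bifrequency.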

Linearity and Gaussianity Test
Hinich's algorithm [6] is a statistical test of the Gaussianity and linearity of a signal; it is based on detecting skewness (a nonzero third-order cumulant). It relies on the fact that for a Gaussian process all cumulants of order greater than two are zero, and consequently so are the bispectrum and the bicoherence. The null hypothesis of Gaussianity is therefore rejected if the bispectrum is found to be nonzero; moreover, if the bicoherence is not constant, the process must be considered nonlinear. A freely available toolbox developed for Matlab, HOSA [6], implements the Hinich algorithm (routine "glstat"), producing consistent and unbiased estimates of the bicoherence of Equation (9).
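The decision logic above rests on the bicoherence being near zero for a Gaussian process and large where phases are quadratically coupled. The minimal NumPy sketch below estimates the bicoherence by averaging windowed FFTs over segments, with a normalization bounded by 1. It is an assumption-laden illustration: it is neither the HOSA routine glstat nor an implementation of Hinich's test statistic, and the segment length, Hann window and synthetic test signals (`qpc_segments` is a hypothetical helper) are illustrative choices.

```python
import numpy as np


def bicoherence(x, nfft=128):
    """Segment-averaged bicoherence on the FFT-bin grid 0 <= f1, f2 < nfft/2."""
    x = np.asarray(x, dtype=float)
    k = len(x) // nfft                                   # number of segments
    segs = x[: k * nfft].reshape(k, nfft)
    segs = segs - segs.mean(axis=1, keepdims=True)       # zero-mean segments
    X = np.fft.fft(segs * np.hanning(nfft), axis=1)      # windowed spectra
    half = nfft // 2
    f1, f2 = np.meshgrid(np.arange(half), np.arange(half), indexing="ij")
    X1, X2, X12 = X[:, f1], X[:, f2], X[:, f1 + f2]      # each (k, half, half); f1 + f2 < nfft
    num = np.abs(np.mean(X1 * X2 * np.conj(X12), axis=0)) ** 2
    den = np.mean(np.abs(X1 * X2) ** 2, axis=0) * np.mean(np.abs(X12) ** 2, axis=0)
    return num / np.maximum(den, np.finfo(float).tiny)   # in [0, 1] (Cauchy-Schwarz)


def qpc_segments(k, nfft=128, b1=10, b2=25, coupled=True, noise=0.5, seed=0):
    """k segments with tones at bins b1, b2 and b1 + b2; fresh random phases per segment."""
    rng = np.random.default_rng(seed)
    n = np.arange(nfft)
    segs = []
    for _ in range(k):
        p1, p2, p3 = rng.uniform(-np.pi, np.pi, size=3)
        if coupled:
            p3 = p1 + p2                                 # quadratic phase coupling
        segs.append(np.cos(2 * np.pi * b1 * n / nfft + p1)
                    + np.cos(2 * np.pi * b2 * n / nfft + p2)
                    + np.cos(2 * np.pi * (b1 + b2) * n / nfft + p3)
                    + noise * rng.standard_normal(nfft))
    return np.concatenate(segs)


rng = np.random.default_rng(1)
b_noise = bicoherence(rng.standard_normal(64 * 128))[10, 25]      # Gaussian noise
b_qpc = bicoherence(qpc_segments(64))[10, 25]                     # coupled tones
b_rand = bicoherence(qpc_segments(64, coupled=False))[10, 25]     # independent phases
print(f"Gaussian {b_noise:.2f}, coupled {b_qpc:.2f}, uncoupled {b_rand:.2f}")
```

With 64 segments, the Gaussian and phase-uncoupled cases give values on the order of 1/64, while the coupled case approaches 1, even though the coupled and uncoupled signals share the same power spectrum. A formal decision would compare such estimates with a threshold derived from their distribution under the null hypothesis, which is what Hinich's test provides.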

Methodology and Analysis of Results
This section describes the methodology developed for EEG signal processing: removal or reduction of artifacts; measurement of nonlinearity and non-Gaussianity, or at least selection of the segments that exhibit these properties, so that they can be processed with nonlinear analysis techniques (higher-order spectra); and, finally, extraction of features to train a classifier that discriminates between healthy patients and patients suffering from a neuropathology (specifically, epilepsy, in this paper). The EEG signals processed are encephalographic records from a database used by Guillén et al. [3] in a preliminary work that classified them using symbolic dynamics techniques; the database contains 20 EEGs, 10 from healthy patients and 10 from epileptic patients. Each EEG signal is organized as a matrix of 21 columns (channels) of 15,000 samples, captured at a sampling frequency of 256 Hz, corresponding to time series approximately 60 seconds long, on average. Given the EEG signals from control and epileptic patients, the following methodology was applied:
1) The signals were loaded into the workspace of the mathematical software using EEGLAB [8,9], band-pass filtered between 0.1 and 80 Hz, and additionally filtered to reject 60 Hz line noise. Figure 1 shows the set of signals, labeled by electrode, in a 5-second segment between instants 22 and 27; the vertical scale was calibrated at 37. Figure 2 shows the spectral response of the EEG channel with the highest energy (channel O1). Note the 10 Hz spectral component, which exhibits the maximum peak power (≈+7.12 dB) and is indicative of the patient's corticothalamic activity, as well as the strong rejection of common-mode line noise, approximately 44 dB below the highest-energy components. In the EEG of epileptic patients, higher energy was observed in all patterns of each channel (low-frequency components above +25 dB, against little more than 10 dB in the patterns of healthy patients).
2) For each record, the channel with the highest power was identified [9], and 10 non-overlapping segments of 2000 samples were extracted at random from its time series, yielding a database of 200 segments of 2000 samples each.
3) The nonlinearity and non-Gaussianity tests were applied to each segment using the methods and algorithms described above, verifying that condition. Figure 3 is the quantile-quantile (QQ) plot [10] for the Gaussianity test of the selected channel signal; the data depart from the straight reference line expected for a normal (Gaussian) distribution, and the Kolmogorov-Smirnov test [10] confirms that the EEG signal of this neurologically healthy patient is non-Gaussian. In each experiment, the maximum bicoherence, …, and the frequency pair at which it occurred were measured. Each segment was associated with a two-component feature vector: the maximum bicoherence and the power of the highest-power channel, calculated as the variance of the signal. This yields a matrix of 200 rows and 2 columns (200 two-component vectors); 80% of the vectors were randomly selected to build the training set and the remaining 20% formed the validation set. On average, the vectors of epileptic patients showed higher power and lower bicoherence than those of control patients; the lower bicoherence was associated with weaker quadratic phase coupling [6] in the EEG of these patients (see the frequency terms in the denominator of Equation (9)).
4) A support vector machine classifier (SVC [11][12][13]) was implemented, trained and validated with the matrices described above. In this application, after different kernel functions were tested and their parameters tuned, the radial basis function kernel gave the best performance. The best tuned parameters for the designed SVC were … = 0.001, C = 100 and … = 0.1, achieving a validation success rate above 92%. Figure 4 shows the graphical output of the SVC [14].

Conclusion
It was verified that the time series derived from EEG records exhibit highly nonlinear and non-Gaussian behavior, which supports the choice of the higher-order statistical parameters of these processes as features for detecting neurophysiological patterns. The implemented SVC performed satisfactorily as a classifier, which points to promising future applications of SVCs in modeling biochemical and electrophysiological processes. The results presented in this article support the development of new computational tools for highly accurate diagnosis of epilepsy, a dreaded neuropathology.
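The processing chain of the Methodology and Analysis of Results section can be sketched in Python: band-pass and notch filtering, selection of the highest-power channel, random non-overlapping segments, two-component feature vectors, an 80/20 split and an RBF-kernel SVC. This is a minimal sketch under stated assumptions, not the authors' EEGLAB/Matlab implementation. It uses SciPy filters (the 2nd-order Butterworth band-pass and the notch quality factor Q = 30 are assumptions), a hypothetical segment-averaged `max_bicoherence` helper (nfft = 128, Hann window), and scikit-learn's `SVC` with C = 100 and `gamma="scale"` (the other tuned parameters reported in the text are not reproduced). Synthetic random records stand in for the Guillén et al. database; they are 20,000 samples long so that ten non-overlapping 2000-sample segments fit in each.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

FS = 256.0  # sampling frequency reported in the text (Hz)


def preprocess(record, fs=FS):
    """Band-pass 0.1-80 Hz, then notch 60 Hz line noise (zero-phase; time on axis 0)."""
    sos = butter(2, [0.1, 80.0], btype="bandpass", fs=fs, output="sos")  # assumed order
    x = sosfiltfilt(sos, record, axis=0)
    b, a = iirnotch(60.0, 30.0, fs=fs)                                   # assumed Q = 30
    return filtfilt(b, a, x, axis=0)


def random_segments(record, n_seg=10, seg_len=2000, rng=None):
    """Pick the highest-power channel and cut n_seg random non-overlapping segments."""
    channel = record[:, np.argmax(record.var(axis=0))]
    slots = len(channel) // seg_len                 # disjoint slots of seg_len samples
    if slots < n_seg:
        raise ValueError("record too short for the requested number of segments")
    starts = rng.choice(slots, size=n_seg, replace=False) * seg_len
    return [channel[s:s + seg_len] for s in starts]


def max_bicoherence(seg, nfft=128):
    """Largest segment-averaged bicoherence over 1 <= f2 <= f1, f1 + f2 < nfft / 2."""
    k = len(seg) // nfft
    s = seg[: k * nfft].reshape(k, nfft)
    X = np.fft.fft((s - s.mean(axis=1, keepdims=True)) * np.hanning(nfft), axis=1)
    f1, f2 = np.meshgrid(np.arange(nfft // 2), np.arange(nfft // 2), indexing="ij")
    keep = (f2 >= 1) & (f2 <= f1) & (f1 + f2 < nfft // 2)
    a, b = f1[keep], f2[keep]
    num = np.abs(np.mean(X[:, a] * X[:, b] * np.conj(X[:, a + b]), axis=0)) ** 2
    den = (np.mean(np.abs(X[:, a] * X[:, b]) ** 2, axis=0)
           * np.mean(np.abs(X[:, a + b]) ** 2, axis=0))
    return float(np.max(num / den))


# Hypothetical stand-in data: 10 "control" and 10 "epileptic" records of 21 channels;
# the "epileptic" ones simply have 3x the amplitude (higher power).
rng = np.random.default_rng(42)
features, labels = [], []
for label, scale in [(0, 1.0), (1, 3.0)]:
    for _ in range(10):
        record = preprocess(scale * rng.standard_normal((20_000, 21)))
        for seg in random_segments(record, rng=rng):
            features.append([max_bicoherence(seg), seg.var()])  # [bicoherence, power]
            labels.append(label)

X_feat = np.array(features)   # 200 x 2 feature matrix
y = np.array(labels)
X_tr, X_va, y_tr, y_va = train_test_split(X_feat, y, test_size=0.2,
                                          random_state=0, stratify=y)
clf = SVC(kernel="rbf", C=100, gamma="scale").fit(X_tr, y_tr)
print(f"validation accuracy: {clf.score(X_va, y_va):.2f}")
```

On this synthetic data the power feature alone separates the classes, so the accuracy says nothing about real EEG. With real records, the features would usually be standardized before training, and the kernel parameters tuned, for example by cross-validated grid search.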

Figure 1. EEG waveforms of a control patient.

Figure 3 .
Figure 2. Spectral response of a control patient.