A new approach for epileptic seizure detection: sample entropy based feature extraction and extreme learning machine

The electroencephalogram (EEG) signal plays a key role in the diagnosis of epilepsy. Ambulatory recording systems generate substantial amounts of EEG data, and detection of epileptic activity requires a time-consuming analysis of the complete length of the EEG time series by a neurology expert. A variety of automatic epilepsy detection systems have been developed during the last ten years. In this paper, we investigate the potential of a recently proposed statistical measure, Sample Entropy (SampEn), as a feature extraction method for classifying three different kinds of EEG signals (normal, interictal and ictal) and detecting epileptic seizures. It is known that the value of SampEn falls suddenly during an epileptic seizure, and this fact is utilized in the proposed diagnosis system. Two different classification models, the back-propagation neural network (BPNN) and the recently developed extreme learning machine (ELM), are tested in this study. Results show that the proposed automatic epilepsy detection system, which uses SampEn as the only input feature together with an ELM classification model, achieves not only high classification accuracy (95.67%) but also very fast speed.


INTRODUCTION
Epilepsy, the second most common serious neurological disorder in human beings after stroke, is a chronic condition of the nervous system characterized by recurrent unprovoked seizures. Approximately one in every 100 individuals worldwide suffers from epilepsy [1]. Electroencephalography (EEG) is an important clinical tool for monitoring, diagnosing and managing neurological disorders related to epilepsy. In comparison with other methods such as the electrocorticogram (ECOG), EEG is a clean and safe technique for monitoring brain activity.
In spite of the available dietary, drug and surgical treatment options, nearly one out of three epilepsy patients currently cannot be treated. They are completely subject to sudden and unforeseen seizures, which have a great effect on their daily life through temporary impairments of perception, speech, motor control, memory and/or consciousness. Many new therapies are being investigated, and among them the most promising are implantable devices that deliver direct electrical stimulation to affected areas of the brain. These treatments will depend greatly on robust algorithms for seizure detection to perform effectively. Because the onset of seizures cannot be predicted in a short period, a continuous recording of the EEG is required to detect epilepsy. However, visual inspection of long EEG recordings in order to find traces of epilepsy is tedious, time-consuming and costly. Therefore, automated detection of epilepsy has been a goal of many researchers for a long time. With the advent of digital technology, EEG data can be fed into an automated seizure detection system, allowing physicians to treat more patients in a given time because the time taken to review the EEG data is greatly reduced by automation.
In recent years, there has been increasing interest in the application of pattern recognition (PR) methods to automatic epileptic seizure detection. Several methods have been developed for EEG signal classification, and among these the Multi-layer Perceptron Neural Network (MLPNN) [2][3][4][5][6][7] and the Support Vector Machine (SVM) [8][9][10] are two widely used classification paradigms. Most automatic epileptic seizure detection systems are built from time-frequency domain based feature extraction followed by one of a variety of classification models. It has been found that the classification performance of these automatic detection systems depends heavily on the feature extraction of the EEG time series [6,11,12].
Deterministic chaos plays a key role as an effective tool for the detection and characterization of signals. Many chaos-producing mechanisms have been created and applied for recognizing the behaviour of the dynamics of a system. Physiological time-series signals are considered chaotic. Recently, studies based on measuring entropies have been employed in biomedical research [13]. The randomness of non-linear time series data is well captured by calculating entropies of the time series, which can supply a recognizable difference between normal and abnormal physiological signals. Entropy is a measure of uncertainty, and the level of chaos can be measured by the entropy of the system: higher entropy stands for higher uncertainty and a more chaotic system. X. L. Li, et al. [14] investigated permutation entropy as a tool to predict the absence seizures of genetic absence epilepsy rats from EEG recordings. H. Ocak [15] presented a scheme based on approximate entropy and the discrete wavelet transform to detect epileptic seizures from EEG time series recorded from normal subjects and epileptic patients. K. S. Pravin, et al. [16] showed some initial investigations of wavelet entropy for epileptic seizure detection.
In this study, we propose a new method for epileptic seizure detection that uses feature extraction based on sample entropy (SampEn) followed by two non-linear classification models, namely the back-propagation neural network (BPNN) and the extreme learning machine (ELM), a recently proposed classification model [17]. The proposed scheme was tested using clinical electroencephalogram (EEG) signals obtained from five healthy subjects and five epileptic patients during both interictal and ictal periods. The results showed that the proposed scheme (SampEn + ELM) was capable of detecting epileptic seizures not only with high accuracy but also with very fast speed, which demonstrates its potential for real-time implementation in an automated epilepsy detection and diagnosis support system. To the best of our knowledge, no study in the literature has assessed the classification performance of sample entropy based feature extraction followed by an ELM classification model when applied specifically to the normal/interictal/ictal discrimination problem. Figure 1 shows the schematics of the proposed diagnosis expert system.

DATA ANALYZED
In this study, a publicly-available database introduced in

Sample Entropy
Entropy is a concept handling predictability and randomness, with higher values of entropy always related to less system order and more randomness. In recent years, a variety of estimators have been proposed to quantify the entropy of time series. These methods can be roughly divided into two categories, embedding entropy and spectral entropy [19]. Embedding entropy supplies information regarding how EEG time series signals change with time, by comparing each time series signal with a lagged form of itself [20]. In [13], a new family of statistics called Sample Entropy (SampEn) was introduced and characterized. This measure is an embedding entropy that quantifies the complexity of time series data without the weaknesses of widely utilized non-linear approaches. The SampEn is less sensitive to noise and can be applied to short time series data [13]. Additionally, it is resistant to short strong transient interferences (outliers) such as spikes. These characteristics make Sample Entropy an appealing tool for nonlinear analysis of physiological signals.
In spite of its advantages over other non-linear estimators, the SampEn is not widely used. In [20], sample entropy was used to analyze the electroencephalogram background activity of Alzheimer's disease patients for testing the hypothesis that the regularity of their EEGs is higher than that of age-matched controls. M. Aboy [21] conducted a characterization study of SampEn for supplying additional insights about the interpretation of this complexity metric in the context of biomedical signal analysis. Moreover, this entropy measure has been used to evaluate the signal complexity of the cyclic behaviour of heart rate variability (HRV) in obstructive sleep apnea syndrome [22]. In this study, SampEn is investigated for the first time as a feature extracted in the automatic detection of epilepsy.
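For calculating the SampEn, the embedding dimension (m) and vector comparison distance (r) must be specified. It is common to set the embedding dimension parameter m to be m = 1, 2 or 3 and to set the vector comparison distance r to be some percentage of the standard deviation of the time series so as not to depend on the absolute amplitude of the signal [13]. SampEn(m, r, N) is the negative logarithm of the conditional probability that two sequences similar for m points remain similar at the next point, where self-matches are not included in calculating the probability. Thus, a larger value often corresponds to more irregularity or complexity in the time series data. In the proposed automated epileptic seizure detection system, the value of the SampEn is determined as shown in the following steps:
1) Given N data points from a time series $\{x(n)\} = x(1), x(2), \ldots, x(N)$, form the vectors $X_m(i) = [x(i), x(i+1), \ldots, x(i+m-1)]$ of length m.
2) Let r denote the noise filter level, which is defined as $r = g \times \mathrm{Std}$ for $g = 0.1, 0.2, \ldots, 0.5$, where Std represents the standard deviation of the data sequence X.
3) The distance between vectors $X_m(i)$ and $X_m(j)$ is defined as the maximum absolute difference between their scalar components:
$$d[X_m(i), X_m(j)] = \max_{k=0,\ldots,m-1} |x(i+k) - x(j+k)|.$$
4) For a given $X_m(i)$, count the number of $j$ ($1 \le j \le N-m$, $j \ne i$), denoted $B_i$, such that $d[X_m(i), X_m(j)] \le r$, and set
$$B_i^m(r) = \frac{B_i}{N-m-1}, \qquad 1 \le i \le N-m.$$
Here, note that only the first $N-m$ vectors of length m are considered in order to ensure that for $1 \le i \le N-m$ the vector $X_{m+1}(i)$ of length m + 1 is also defined.
5) Define $B^m(r)$ as the mean of the $B_i^m(r)$:
$$B^m(r) = \frac{1}{N-m} \sum_{i=1}^{N-m} B_i^m(r).$$
6) We increment the dimension to m + 1 and compute $A_i$, the number of $X_{m+1}(j)$ such that $d[X_{m+1}(i), X_{m+1}(j)] \le r$, where j ranges from 1 to $N-m$ ($j \ne i$). We then define $A_i^m(r)$ and $A^m(r)$ as
$$A_i^m(r) = \frac{A_i}{N-m-1}, \qquad A^m(r) = \frac{1}{N-m} \sum_{i=1}^{N-m} A_i^m(r).$$
Thus, $B^m(r)$ represents the probability that two sequences will match for m points, whereas $A^m(r)$ represents the probability that two sequences will match for m + 1 points.
The sample entropy is defined by
$$\mathrm{SampEn}(m, r) = \lim_{N \to \infty} \left\{ -\ln \frac{A^m(r)}{B^m(r)} \right\}.$$
Since the time series length is finite, it is estimated by the statistic
$$\mathrm{SampEn}(m, r, N) = -\ln \frac{A^m(r)}{B^m(r)}.$$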
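The steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in the study: the function name `sample_entropy`, the parameter `g` (so that r = g * Std) and the choice to return infinity when no matches are found are assumptions made here for the example.

```python
import numpy as np


def sample_entropy(x, m=2, g=0.2):
    """Estimate SampEn(m, r, N) with r = g * Std (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    r = g * np.std(x)  # tolerance as a fraction of the standard deviation

    def mean_match_probability(length):
        # Only the first N - m templates are used, for length m and m + 1 alike.
        count = n - m
        templates = np.array([x[i:i + length] for i in range(count)])
        total = 0.0
        for i in range(count):
            # Maximum absolute difference between corresponding components.
            dist = np.max(np.abs(templates - templates[i]), axis=1)
            matches = np.sum(dist <= r) - 1  # exclude the self-match
            total += matches / (count - 1)
        return total / count

    b = mean_match_probability(m)      # probability of a match for m points
    a = mean_match_probability(m + 1)  # probability of a match for m + 1 points
    if a == 0 or b == 0:
        return float("inf")  # assumption: report infinity when undefined
    return -np.log(a / b)
```

The double loop is quadratic in the frame length, which is acceptable for a sketch; a regular signal such as a sine wave should give a smaller value than white noise of the same length.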

An Example of the Computation Procedure of SampEn
Assume that the sequence $X_N$ is composed of 50 sampling points (i.e., N = 50) and that the whole sequence is periodic with period 5. Let us choose m = 5 and r = 2, so that we have the vectors $X_5(1)$, $X_5(2)$, and so on. First, we want to find the number of $X_5(i)$ that are similar to $X_5(1)$. Because we have chosen r = 2 as the threshold parameter, each of the five elements of $X_5(i)$ has to be within ±2 units of the corresponding element of $X_5(1)$. For instance, $X_5(2)$ is not similar to $X_5(1)$ because the last elements in these two sequences (51, 55) differ by more than two units. The conditions of similarity to $X_5(1)$ are satisfied only by $X_5(6)$, $X_5(11)$, $X_5(16)$, $X_5(21)$, ..., $X_5(41)$ (this excludes $X_5(1)$ since j ≠ i, and also excludes $X_5(46)$ since only the first 45 of the $X_5(i)$ are considered under the definition of SampEn). Thus, we get $B_1 = 8$. Because the total number of $X_5(i)$ considered is N − m = 45, we have
$$B_1^5(2) = \frac{B_1}{N-m-1} = \frac{8}{44}.$$
The above steps are repeated to determine the number of $X_5(i)$ that are similar to $X_5(2)$, $X_5(3)$ and so on. By the same inference, $X_5(2)$ is similar to $X_5(7)$, $X_5(12)$, $X_5(17)$, ..., $X_5(42)$. Thus, we also get $B_2 = 8$. Generally, in this example, we have $B_i = 8$ for $1 \le i \le 45$. Therefore, each $B_i^5(2)$ is 8/44, and the mean value of all 45 of the $B_i^5(2)$ is
$$B^5(2) = \frac{1}{45} \sum_{i=1}^{45} \frac{8}{44} = \frac{8}{44}.$$
In order to get SampEn(5, 2, N), the above computation procedure needs to be repeated for m = 6. Doing so, we get
$$A^5(2) = \frac{8}{44}.$$
Finally, we compute the value of SampEn as follows:
$$\mathrm{SampEn}(5, 2, N) = -\ln \frac{A^5(2)}{B^5(2)} = -\ln 1 = 0.$$
This is the smallest value of SampEn, which indicates that the original time series data is highly regular and predictable.

Copyright © 2010 SciRes. JBiSE
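The arithmetic of this example can be checked by brute force. The sketch below assumes a concrete sequence, the values 51 to 55 repeated ten times; this is only one sequence consistent with the description (period 5, last elements 51 and 55) and is not taken from the paper. The helper name `match_counts` is likewise hypothetical.

```python
import math

# Assumed sequence: the values 51..55 repeated ten times (N = 50, period 5).
x = [51 + (k % 5) for k in range(50)]
m, r = 5, 2
count = len(x) - m  # only the first N - m = 45 vectors are used


def match_counts(length):
    """Count, for each vector i, the vectors j != i within distance r."""
    vectors = [x[i:i + length] for i in range(count)]
    counts = []
    for i in range(count):
        matches = 0
        for j in range(count):
            if j == i:
                continue  # self-matches are excluded
            if max(abs(a - b) for a, b in zip(vectors[i], vectors[j])) <= r:
                matches += 1
        counts.append(matches)
    return counts


b_counts = match_counts(m)      # B_i for vectors of length 5
a_counts = match_counts(m + 1)  # A_i for vectors of length 6
b_mean = sum(c / (count - 1) for c in b_counts) / count
a_mean = sum(c / (count - 1) for c in a_counts) / count
samp_en = -math.log(a_mean / b_mean)
```

Under this assumed sequence every vector has eight matches at both lengths, so the two mean probabilities coincide and the entropy estimate is zero.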

Levenberg-Marquardt Algorithm
Artificial neural network training is often regarded as a nonlinear least-squares problem. The Levenberg-Marquardt (LM) algorithm is a least-squares estimation algorithm utilizing the maximum neighbourhood idea, and it appears to be the fastest method for training feed-forward neural networks. Let $E(w)$ be an objective error function composed of n individual error terms $e_j^2(w)$ as follows:
$$E(w) = \sum_{j=1}^{n} e_j^2(w) = \|f(w)\|^2,$$
where $e_j(w) = d_j - y_j$, $d_j$ is the desired value of output neuron j, $y_j$ is the actual output of that neuron, and $f(w) = [e_1(w), \ldots, e_n(w)]^T$.
The objective of the Levenberg-Marquardt algorithm is to calculate the weight vector w so that $E(w)$ is minimized. By utilizing the LM algorithm, a new weight vector $w_{p+1}$ can be obtained from the previous weight vector $w_p$ as follows:
$$w_{p+1} = w_p + \delta w_p, \tag{10}$$
where $\delta w_p$ is defined by
$$(J_p^T J_p + \gamma I)\, \delta w_p = -J_p^T f(w_p). \tag{11}$$
In Equation (11), $J_p$ is the Jacobian of f assessed at $w_p$, γ is the Marquardt parameter, and I is the identity matrix. The Levenberg-Marquardt algorithm can be described as follows: 1) Calculate $E(w_p)$. 2) Begin with a small value of γ. 3) Solve Equation (11) for
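A damped least-squares update of this kind can be sketched as follows. This is a generic illustration rather than the training code used in the study: the function name `lm_fit`, the starting value of γ, the fixed iteration count, and the rule of dividing γ by 10 after a successful step are all assumptions (the text only states that γ is increased by a factor of 10 when a step fails). The residual is taken as desired minus actual output, so the Jacobian passed in must be the Jacobian of that residual.

```python
import numpy as np


def lm_fit(residual, jacobian, w, gamma=0.01, iterations=50):
    """Minimise E(w) = sum(residual(w)**2) with damped Gauss-Newton steps."""
    error = np.sum(residual(w) ** 2)
    for _ in range(iterations):
        f, J = residual(w), jacobian(w)
        # Solve (J^T J + gamma * I) step = -J^T f for the weight update.
        step = np.linalg.solve(J.T @ J + gamma * np.eye(w.size), -J.T @ f)
        new_error = np.sum(residual(w + step) ** 2)
        if new_error < error:
            w, error = w + step, new_error
            gamma /= 10.0  # assumed rule: relax damping after a good step
        else:
            gamma *= 10.0  # step rejected: increase damping and retry
    return w


# Toy usage: recover the weights of a straight line d = 2 + 3 t.
t = np.linspace(0.0, 1.0, 20)
d = 2.0 + 3.0 * t
fitted = lm_fit(
    residual=lambda w: d - (w[0] + w[1] * t),
    jacobian=lambda w: -np.column_stack([np.ones_like(t), t]),
    w=np.zeros(2),
)
```

A large γ makes the step behave like a short gradient-descent step, while a small γ approaches the Gauss-Newton step, which is why γ is adapted according to whether the error decreased.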

Extreme Learning Machine (ELM)
Current research on automatic epileptic seizure detection has focused on high accuracy but has not considered the time taken to train the classification models. Training time should be an important factor in developing an EEG-based detection device for epileptic seizures, because an online device will need to update its training during use. Therefore, some classification models with high classification accuracy may not be satisfactory when considering the trade-off between classification accuracy and training time. In our study, in addition to exploring the potential of a nonlinear feature of the EEG signal called sample entropy for electroencephalogram time series classification and epileptic seizure detection, we also investigate the use of a novel learning machine paradigm called the Extreme Learning Machine (ELM) [17], in order to obtain a balance between high classification accuracy and short training time. In recent years, the Extreme Learning Machine has become increasingly popular in classification tasks due to its high generalization ability and fast learning speed. In [23], a classification system is built using ELM to classify protein sequences with ten classes of super-families obtained from a domain database, and its performance is compared with that of back-propagation neural networks.
where $w_i$ denotes the weight vector connecting the i-th hidden neuron and the input neurons, $\beta_i$ denotes the weight vector connecting the i-th hidden neuron and the output neurons, and $b_i$ represents the threshold of the i-th hidden neuron. These equations can be written compactly as $H\beta = T$, where H is the hidden layer output matrix of the SLFN, β is the matrix of output weights and T is the matrix of targets. Hence, for fixed arbitrary input weights $w_i$ and hidden layer biases $b_i$, training a single-layer feed-forward network is equivalent to finding a least-squares solution $\hat{\beta}$ of the linear system $H\beta = T$; $\hat{\beta} = H^{\dagger} T$ is the best set of weights, where $H^{\dagger}$ is the Moore-Penrose generalized inverse of H. According to [17], the Extreme Learning Machine uses this Moore-Penrose inverse approach to obtain good generalization performance with extremely fast learning speed. Unlike some conventional methods, for example the back-propagation algorithm, the Extreme Learning Machine avoids the problems of tuning control parameters (learning epochs, learning rate, and so on) and of getting trapped in local minima.
The procedure of ELM for single-layer feedforward networks is expressed as follows: 1) Choose arbitrary values for the input weights and biases of the hidden neurons.
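The training idea described above (random hidden layer, then a pseudo-inverse solve for the output weights) can be sketched as follows. This is a minimal illustration, not the configuration used in the experiments: the function names `elm_train` and `elm_predict`, the sigmoid activation, the uniform range of the random weights, the number of hidden neurons and the one-hot target encoding are all assumptions.

```python
import numpy as np


def elm_train(X, T, hidden=20, seed=0):
    """Fit an ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=hidden)                 # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                  # hidden layer output matrix
    beta = np.linalg.pinv(H) @ T                            # Moore-Penrose solution
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Return network outputs; argmax over columns gives the class."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta


# Toy usage with one synthetic feature per sample and three classes.
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 30)
centres = np.array([-0.8, 0.0, 0.8])
X = (centres[labels] + 0.05 * rng.standard_normal(labels.size)).reshape(-1, 1)
T = np.eye(3)[labels]  # one-hot targets
W, b, beta = elm_train(X, T)
predicted = elm_predict(X, W, b, beta).argmax(axis=1)
```

Because no iterative weight tuning takes place, the cost of training is dominated by a single pseudo-inverse computation, which is consistent with the short learning times discussed in the text.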

Performance Evaluation Parameters
All the simulations were run on a 2.27 GHz 2-core CPU with 2 GB memory. In order to compare the performance of ELM classifiers, we also implemented a back-propagation neural network (BPNN) based on the Levenberg-Marquardt back-propagation (LMBP) learning algorithm, which is thought of as the fastest method for training moderate-sized feed-forward neural networks according to [27]. For the BPNN and ELM, all of the input values were normalized to the range [-1, 1]. The performance of the BPNN and ELM algorithms was

Training and Testing: 10-Fold Cross-Validation
There are a variety of methods for dividing the EEG dataset into training and testing datasets. To reduce the bias of training and testing data, a 10-fold cross-validation technique is used. 10-fold cross-validation is an improvement over the holdout method. It is used to estimate how well the classification models that learn from the training data will perform on future data not seen during training. Generally, with 10-fold cross-validation, the data set is divided into 10 subsets, and the holdout approach is repeated 10 times. Each time, one of the 10 subsets is used as the testing dataset and the other 9 subsets are put together to form the training dataset. Then the average error across all 10 trials is calculated. According to [28], the result obtained from a single 10-fold cross-validation may not be dependable. In order to get low mean square error and bias, the 10-fold cross-validation procedure is performed 10 times. All the simulation results were averaged over ten repetitions of 10-fold cross-validation.
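The evaluation protocol described above can be sketched as a small index generator. This is an illustrative sketch only: the function names `repeated_kfold_indices` and `cross_validated_accuracy`, the random shuffling before each repetition and the `fit_predict` callback signature are assumptions, not details given in the paper.

```python
import numpy as np


def repeated_kfold_indices(n_samples, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) for `repeats` shuffled k-fold splits."""
    rng = np.random.default_rng(seed)
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(n_samples), k)
        for i in range(k):
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            yield train, folds[i]


def cross_validated_accuracy(fit_predict, X, y, **kwargs):
    """Average test accuracy over all folds and repetitions."""
    scores = []
    for train, test in repeated_kfold_indices(len(y), **kwargs):
        predictions = fit_predict(X[train], y[train], X[test])
        scores.append(np.mean(predictions == y[test]))
    return float(np.mean(scores))
```

Here `fit_predict` stands for any routine that trains a classifier on the training fold and returns predicted labels for the test fold, so the same loop could wrap either classifier being compared.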

Experiment Results and Discussion
Although the pattern length parameter m, the threshold r and the number of sampling points N of the time series play an important role in determining the outcome of SampEn, there are no guidelines for setting the values of these parameters. In essence, the accuracy and confidence of the entropy estimate improve when the number of matches of length m and m + 1 increases. The number of matches can be increased by choosing small m and large r. However, if r is too large, some fluctuations of the signal are not detected, and if r is too small, noise affects the SampEn measure [29]. In this study, SampEn values are calculated for selected combinations of m, r, and N. The values of m, r, and N that are employed in the experiments are as follows: 1) m = 1, 2, 3; 2) r = 10%-50% of the standard deviation of the EEG data sequence in increments of 10%; 3) N = 256, 512, 1024, 2048, 4096.
Values of SampEn are calculated for all normal (healthy segments), interictal (seizure-free epileptogenic zone segments) and ictal (epileptic seizure segments) EEG signals, and are fed to the two classification models. Using rectangular windows of different sizes, data frames of different lengths (256, 512, 1024, 2048, 4096) are formed and the values of SampEn are computed for each data frame. From these figures, it can be noted that a simple linear discriminator may not achieve good results, since SampEn demonstrates a clear distinction among the normal, interictal and ictal EEG signals only for several particular parameter combinations of m, r and N. For example, a simple linear discriminator would be inefficient for the SampEn values demonstrated in Figure 6, because a clear partial overlap among the normal, interictal and ictal EEG signals can be seen. From the results, it can be concluded that, generally, ELM outperforms BPNN for most of the parameter combinations.
Tables 2 and 3 show, as two confusion matrices, the classification results with the highest accuracies of the BPNN (95.33%) and the ELM (95.67%), respectively. In the confusion matrix for BPNN, all healthy segments were classified correctly, 2 seizure-free epileptogenic zone segments were classified incorrectly as healthy segments, 3 seizure-free epileptogenic zone segments were classified incorrectly as epileptic seizure segments and 2 epileptic seizure segments were classified incorrectly as seizure-free epileptogenic zone segments. In the confusion matrix for ELM, all healthy segments were correctly classified, 2 seizure-free epileptogenic zone segments were classified incorrectly as healthy segments, 3 seizure-free epileptogenic zone segments were classified incorrectly as epileptic seizure segments and 2 epileptic seizure segments were classified incorrectly as seizure-free epileptogenic zone segments.
The values of the statistical evaluation parameters introduced in Subsection 4.1.1 are given in Table 4. As can be seen, the BPNN discriminated healthy segments, seizure-free epileptogenic zone segments and epileptic seizure segments with average accuracies of 97.54%, 92.91% and 95.81%, respectively. Overall, the three kinds of segments were classified with an average accuracy of 95.33%. The average accuracies of the ELM were 98.77% for healthy segments, 91.06% for seizure-free epileptogenic zone segments, and 97.26% for epileptic seizure segments. Overall, the three kinds of segments were classified with an average accuracy of 95.67%. Hence, the average accuracy of the ELM classifier is slightly higher than that of the BPNN classifier. In addition, Table 4 shows that the learning time of the ELM classifier is 0.0250 seconds while the learning time of the BPNN classifier is 86.4807 seconds. The ELM classifier can therefore be trained about 3459 times faster than the BPNN classifier. Thus, for real-time implementation of an epilepsy diagnosis support system, ELM classifiers are more appropriate than BPNN classifiers.
In Table 5, we present a comparison of the classification performance achieved by different methods. We quote results from our proposed method and also from those recently reported in [30] and [31]. The datasets used in these experiments are the same. The table shows that the result obtained by our approach is the best reported for this dataset, an improvement of between 0.84% and 9.77% over other approaches proposed in the literature.
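The parameter sweep listed above amounts to 3 × 5 × 5 = 75 combinations. The sketch below simply enumerates them and shows one way a recording could be cut into frames of N samples; the names `M_VALUES`, `G_VALUES`, `N_VALUES` and `frames`, and the use of non-overlapping frames, are assumptions for illustration.

```python
from itertools import product

import numpy as np

M_VALUES = (1, 2, 3)
G_VALUES = (0.1, 0.2, 0.3, 0.4, 0.5)  # r = g * Std
N_VALUES = (256, 512, 1024, 2048, 4096)

# Every (m, g, N) combination examined in the experiments.
grid = list(product(M_VALUES, G_VALUES, N_VALUES))


def frames(signal, n):
    """Split a signal into non-overlapping rectangular-window frames of length n."""
    signal = np.asarray(signal)
    usable = (signal.size // n) * n  # drop any incomplete trailing frame
    return signal[:usable].reshape(-1, n)
```

Each frame would then yield one SampEn value per (m, g) pair, giving the one-dimensional feature that is passed to the classifiers.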
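The speed-up factor quoted above follows directly from the two learning times reported in Table 4; the symbols $t_{\mathrm{BPNN}}$ and $t_{\mathrm{ELM}}$ are introduced here only for this check.

```latex
\frac{t_{\mathrm{BPNN}}}{t_{\mathrm{ELM}}}
  = \frac{86.4807\ \text{s}}{0.0250\ \text{s}}
  \approx 3459.2
```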

CONCLUSIONS
This study presents an attempt to develop a general-purpose EEG epilepsy detection scheme that can be used for classifying different kinds of EEG time series signals.
Diagnosing epilepsy is not an easy task: it requires the acquisition of patients' EEG recordings and the collection of additional clinical information. The proposed system employs a recently proposed statistical parameter referred to as Sample Entropy (SampEn), together with the extreme learning machine (ELM), a recently developed classification model, to classify subjects as normal subjects, patients not having an epileptic seizure or patients having an epileptic seizure. This supplies a valuable diagnostic decision support tool for physicians treating potential epilepsy. Experimental results show that the proposed scheme achieves excellent performance, with not only an accuracy as high as 95.67% but also a very fast learning speed (0.0250 seconds), which demonstrates its potential for real-time implementation in an epilepsy diagnosis support system.

Figure 1. Schematics of the proposed diagnostic expert system.

Figure 3. Sample EEG recordings. (a) Normal EEG; (b) Interictal EEG; (c) Ictal EEG.

seizure activity (ictal periods) using depth electrodes placed within the epileptogenic zone of the brain. Figure 2 describes the electrode placement for recording of EEG signals. A summary of the data set is given in Table 1. Figure 3 shows an example of the EEG signals from each of the three data sets.


increment γ by a factor of 10 and go to step 3.

Figure 4 shows the structure of ELM.

Figure 4. The structure of ELM.

Figure 5 shows sample plots of SampEn with a clear distinction among normal, interictal and ictal EEG signals. The values of SampEn shown in Figure 5(a) and Figure 5(b) are computed with N = 2048 and 1024, respectively. From Figure 5, one can see that the value of SampEn is small for ictal EEG signals (between 0.5 and 1.5) compared to normal EEG signals (larger than 1.5). The value of SampEn of interictal EEG signals (less than 0.5) is smaller than that of ictal EEG signals. Figure 6 shows sample plots of the values of SampEn for N = 1024 and 512, which have a partial overlap among normal, interictal and ictal EEG signals. From these figures, we find that the capability of SampEn for classifying normal, interictal and ictal EEG signals depends strongly on the parameter values of m, r, and N.

Figures 7-12 show the overall classification accuracy achieved by the neural network and the extreme learning machine with SampEn as the input feature. It can be observed from Figures 7-12 that BPNN shows good average accuracy, in the range of 94.25%-95.33%, only for several combinations of m, r, and N (for example, m = 2, r = 0.3*Std, N = 2048; m = 3, r = 0.1*Std, N = 1024 and m = 2, r = 0.2*Std, N = 2048). The BPNN achieves its best average accuracy of 95.33% with m = 2, r = 0.2*standard deviation of the time series and N = 2048. For ELM, high average classification accuracy in the range of 94.97%-95.67% is obtained for some combinations of m, r, and N (for example, m = 2, r = 0.1*Std, N = 1024; m = 2, r = 0.2*Std, N = 2048 and m = 3, r = 0.1*Std, N = 1024). The ELM obtains its best average accuracy of 95.67% with m = 3, r = 0.1*standard deviation of the time series and N = 1024.

Figure 5. Sample figures of SampEn showing clear discrimination among normal, interictal and ictal EEG signals.

Figure 8. Average classification accuracy achieved by ELM with m = 1.

Figure 9. Average classification accuracy achieved by BPNN with m = 2.

The average accuracies achieved by ELM for other parameter combinations range from 91.19% to 94.83%, which are also acceptable for clinical diagnosis.

Figure 10. Average classification accuracy achieved by ELM with m = 2.

Figure 12. Average classification accuracy achieved by ELM with m = 3.

Table 1. Summary of the clinical data.

Table 4. Performance comparison of ELM and BPNN.

Table 5. Comparison of the classification accuracy obtained by our approach for the detection of epileptic seizures with the classification accuracy obtained by other researchers.