A novel approach for detection of deception using Smoothed Pseudo Wigner-Ville Distribution (SPWVD)

For many years, the uncertainty of lie-detection systems has been a concern of defense-related agencies. To be acceptable to judicial systems, the results of these systems must generalize with high accuracy. In this paper, a new lie-detection method based on the P300 component is proposed. The test protocol is a concealed information recognition test based on the oddball paradigm. The test was performed on 32 people, and their brain signals were acquired. After preprocessing, classic features are extracted from each single trial. A time-frequency (TF) transformation is then applied to the sweeps, and TF features are produced from it. Next, the best combinational feature vector is selected to improve classifier accuracy. Finally, guilty and innocent persons are classified with KNN and MLP classifiers. We found that combining time-frequency and classic features achieves higher accuracy. The results show that the proposed method detects deception with an accuracy of 89.73%, which is higher than that of previously reported methods.


INTRODUCTION
The detection of deception has a long history. The first proposed technology was the polygraph, which records autonomic arousal and has been used to determine guilt or innocence [1]. A researchable hypothesis is that by looking at brain function more directly, it might be possible to understand and ultimately detect deception [2,3]. Based on this hypothesis, a number of neurophysiological signals have recently been investigated for deception detection, including functional magnetic resonance imaging (fMRI) [4][5][6][7][8][9][10] and event-related potentials (ERP) [2,11,12]. ERP-based techniques have been studied more widely and have achieved more satisfying results [2,11]. Moreover, in [2] Farwell introduced another measure of participants' responses to test items: the reaction time (RT), i.e., the time needed to classify each test item as a probe or an irrelevant item. Farwell showed that the reaction time to probes is longer in guilty subjects than in innocent subjects, while the reaction time to irrelevant stimuli is the same for both groups.
Brain signal processing, first considered for deception detection in the 1980s [13], is one of the most common approaches. EEG background activity has been used in a few studies, such as [14]. However, the analysis of event-related potentials (ERPs), especially the P300 waveform, has become more popular in the Guilty Knowledge Test (GKT) [15,16]. The P300 is elicited in the oddball paradigm, in which rare, meaningful stimuli are presented within a sequence of frequent, ordinary stimuli [15]. The P300 is a positive-going wave. Its scalp amplitude is largest parietally (at Pz), smallest frontally (at Fz) and intermediate centrally (at Cz); Fz, Cz and Pz are scalp sites along the midline of the head. Its peak has a typical latency of 300 to 1000 ms from stimulus onset. The P300 amplitude at a given recording site is inversely related to the probability of the stimulus, so rarer stimuli evoke larger P300s. In practice, probabilities < 0.3 are typically used. The meaningfulness of the stimulus also strongly influences P300 size [17].
The GKT is a polygraphy method used to detect concealed knowledge held by guilty persons. It assumes that only guilty persons know the details of the crime. Presenting these details to guilty subjects in an oddball paradigm will therefore elicit a P300 component in their EEG [19]. The GKT uses three types of stimuli [18]:
1) Probe (P): related to the crime; only guilty persons are familiar with it, whereas innocent persons are not.
2) Irrelevant (I): unrelated to the crime, and therefore unknown to all subjects.
3) Target (T): unrelated to the crime but known to all subjects.
Irrelevant stimuli outnumber the other two types many times over, so probes and targets are rare stimuli. The T stimuli force the subject to pay attention to the items, because failure to respond to them suggests that the subject is not cooperating [17]. The T stimuli are also rare and task-relevant, and thus evoke a P300 component. This T-P300 has been used as the subject's typical P300 in subsequent analysis of the probes [2], although the assumption that the T-P300 is a faithful rendition of the standard P300 has been shown to be sometimes wrong [19]. The basic assumption of the P300-based GKT is that, if the subject has guilty knowledge of the probe stimuli, the infrequency of these items will cause them to elicit a P300 like that for the T stimuli. If the person has no knowledge of the probe items, however, they will be perceived as belonging to the irrelevant set and will elicit an ERP with a small P300 component or none at all.
There are conventionally two approaches to analyzing the signals and detecting deception in the P300-based GKT. In the first, used by Rosenfeld et al., the P300 amplitudes for P and I items are compared [19]. In guilty subjects one expects P > I, while in innocent subjects P is just another I, so no P-I difference is expected. On this basis, Rosenfeld introduced and used the bootstrapped amplitude difference (BAD) method. The second approach, introduced by Farwell and Donchin [2], rests on the expectation that in a guilty person the P and T stimuli evoke similar P300 responses, whereas in an innocent person the P responses look more like the I responses. This method compares the cross-correlation of the P and T waveforms with that of P and I. In guilty subjects the P-T correlation is expected to exceed the P-I correlation, and the opposite is expected in innocent subjects.
Many previous studies have used the GKT paradigm to detect concealed object recognition, but the P300 component has also been shown to be sensitive to concealed information recognition [13,20]. A simple way to extract the ERP is to average the ERP trials. This treats EEG background activity as noise that averaging removes, whereas the ERP is a deterministic signal that survives the process. One disadvantage of averaging is that it requires a large number of single trials to reduce the noise [21]. Because long-term EEG recording is difficult, it is reasonable to look for methods based on single trials. Previous studies introduced pattern recognition systems based on frequency, wavelet and time-domain analysis for P300 detection in a P300-based GKT [17,22]. Another method, introduced in [17], assesses ERPs in a P300-based GKT using wavelet features and a statistical classifier.
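The Farwell-Donchin comparison described above (is the averaged probe response more correlated with the target response than with the irrelevant one?) can be sketched in a few lines. This is a minimal illustration, assuming averaged ERPs of equal length and plain Pearson correlation. The bootstrap procedure used in the original method is not reproduced, and the waveforms are synthetic.

```python
import numpy as np

def correlation_decision(probe_avg, target_avg, irrelevant_avg):
    """Sketch of the P-T versus P-I correlation comparison.

    Each argument is a 1-D averaged ERP of the same length.
    Returns (r_pt, r_pi, is_guilty), where is_guilty means r_pt > r_pi.
    """
    r_pt = np.corrcoef(probe_avg, target_avg)[0, 1]      # probe vs target
    r_pi = np.corrcoef(probe_avg, irrelevant_avg)[0, 1]  # probe vs irrelevant
    return r_pt, r_pi, bool(r_pt > r_pi)

# Synthetic example (not real data): 1 s at 256 Hz with a P300-like bump near 500 ms.
t = np.arange(256) / 256.0
p300 = np.exp(-((t - 0.5) ** 2) / (2 * 0.08 ** 2))
rng = np.random.default_rng(0)
target = p300 + 0.1 * rng.standard_normal(t.size)
irrelevant = 0.1 * rng.standard_normal(t.size)
guilty_probe = 0.9 * p300 + 0.1 * rng.standard_normal(t.size)
print(correlation_decision(guilty_probe, target, irrelevant)[2])  # True
```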
In this study, a new combinational feature vector obtained from the Wigner-Ville transform is introduced. Principal component analysis (PCA) is then applied to the optimal feature vector, and two classifiers are used to discriminate between guilty and innocent subjects in a concealed information recognition test. The resulting lie-detection algorithm works on single ERP sweeps. After the EEG signal is preprocessed and the probe sweeps are extracted, suitable features are first extracted from each single trial. These classic features (morphological and frequency features) are chosen because they carry information about the inspected phenomenon. Next, a time-frequency transformation is applied to the sweeps, and further features are produced from it. The best features are then selected to build an optimal combinational feature vector, and PCA is applied to the feature space to reduce its dimension. Finally, a multilayer perceptron (MLP) neural network and a k-nearest neighbor (KNN) classifier are used to separate guilty subjects from innocent ones.
The aim of our research is to find a new combinational feature space built from classic (morphological and frequency) features and Wigner-Ville features in order to discriminate innocent subjects from guilty ones. A further aim is to compare its performance with the results obtained using wavelet features and with previously reported methods.

PARTICIPANTS
Thirty-two Iranian subjects participated in this study. They were generally undergraduate or postgraduate students, all had normal or corrected-to-normal vision, and none had any neurological disease. Participants were naive to the experimental design, and their mean age was 25 years. They were randomly divided into two groups (16 guilty and 16 innocent). Data from three subjects were discarded because of too many artifacts or machine failure.

DATA ACQUISITION
The implementation methodology used in this study is the same as that employed in Ref [23]. EEG data were recorded with Ag/AgCl electrodes placed at midline sites of the head (Fz, Cz and Pz), but only the Pz channel was analyzed. Vertical and horizontal EOG were also recorded. EEG was recorded continuously and digitized at a sampling rate of 256 Hz. All signals were filtered in the range 0.5-30 Hz with a zero-phase digital filter.
Procedure: Before the experiments, seven slides presenting seven different words were prepared for each subject. The slides were presented in random order on a computer screen. Each slide lasted 1500 ms and was repeated 40 times, with an inter-stimulus interval of 1000 ms. Of the seven slides, one was the probe stimulus, one was the target, and the remaining five were irrelevant stimuli. Both guilty and innocent subjects must recognize the target stimulus well. It was therefore a word (name) related to one of the subject's family members, or the name of a well-known person such as a sports star, politician or movie star, or of one of the most famous landmarks (such as famous monuments, city icons, places of power and worship, sacred mountains, etc.). Slides of meaningless words were used as irrelevant stimuli. The probe for guilty subjects was a meaningful word. Innocent subjects had never seen the probe word and had no information about it.
Each subject held two push buttons, one in each hand. Subjects were asked to press one button to answer "Yes, I know" and the other to answer "No, I don't know". Innocent subjects replied honestly to all stimuli. Guilty subjects replied honestly only to the target and irrelevant stimuli and replied falsely to the probe stimulus.
Figure 2 shows examples of the slides shown to guilty and innocent subjects. These slides include both meaningful and meaningless words. Meaningless words were produced from randomly selected characters, whereas names of famous places or known persons were chosen as meaningful words.
Figure 3 shows one second of EEG signal for an innocent subject (a) and a guilty subject (b). Each trace is the average of the single trials of all 20 iterations for that subject. As seen in Figure 3(a), the probe response of the innocent subject contains no P300 component and is similar to the irrelevant response. This is because, for the innocent subject, the probe is a word without meaning. In contrast, the probe response of the guilty subject, shown in Figure 3(b), contains a P300 component and is similar to the target response. This is because the guilty subject gave a false answer ("I don't know") to the slide of a meaningful word (i.e., her/his name).

MATERIAL AND METHODS
Based on previous studies, two types of classic features are used in this research: morphological and frequency features. These features have performed well in similar studies and were therefore expected to be useful for this application as well. In addition, time-frequency features are extracted with the Wigner-Ville transform. It provides better time-frequency resolution than nonparametric linear methods (i.e., the wavelet and short-time Fourier transforms) and independent control of time and frequency filtering. It also gives power estimates with lower variance than parametric methods when rapid changes occur.

Pre Processing
After filtering, each continuous record was separated into single sweeps according to the known times of stimulus presentation. Each sweep is 1000 ms long and contains 256 samples. The EOG data were checked for blink artifacts by visual inspection, and sweeps with blink artifacts were removed.
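The filtering and sweep extraction steps above can be sketched as follows. This is a minimal sketch under stated assumptions. The text does not say which zero-phase filter was used, so an FFT brick-wall mask (real and symmetric, hence zero-phase) stands in for it. Blink rejection is represented by a hypothetical `rejected` list of sweep indices, as would come from the visual EOG inspection. Sweeps are assumed to start at stimulus onset.

```python
import numpy as np

FS = 256  # sampling rate in Hz, as stated in the text

def zero_phase_bandpass(x, fs=FS, lo=0.5, hi=30.0):
    """Zero-phase band-pass: zero every FFT bin outside [lo, hi] Hz.

    Stand-in filter (assumption); a real, symmetric mask adds no phase shift.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def extract_sweeps(x, onsets, length=FS, rejected=()):
    """Cut 1000 ms (256-sample) sweeps starting at each stimulus onset.

    rejected: indices into `onsets` of blink-contaminated sweeps (hypothetical input).
    Sweeps that would run past the end of the record are skipped.
    """
    rejected = set(rejected)
    sweeps = [x[o:o + length] for i, o in enumerate(onsets)
              if i not in rejected and o + length <= len(x)]
    return np.array(sweeps)

# Synthetic 4 s record: DC offset + 10 Hz + 50 Hz components.
t = np.arange(4 * FS) / FS
raw = 3.0 + np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 50 * t)
clean = zero_phase_bandpass(raw)              # only the 10 Hz component survives
sweeps = extract_sweeps(clean, onsets=[0, 256, 512, 800], rejected=[1])
erp = sweeps.mean(axis=0)                     # averaged ERP (the averaging method)
print(sweeps.shape)                           # (2, 256)
```

A Butterworth filter applied forward and backward is a common alternative for the zero-phase step. The FFT mask is used here only because it is short and exactly zero-phase.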
Note that in all of the following descriptions of the P300, only results at site Pz are reported. Pz is where the P300 is usually reported to be maximal, so the analytic procedures below were performed on Pz data only.

Classic Features Analyses
The classic features consist of morphological and frequency features, which are expected to contain information relevant to the inspected phenomenon.

Morphological Features
The first group contains 17 morphological features.
These features were previously used by Kalatzis et al. to discriminate depressive patients from healthy controls using the P600 component of the ERP signal [24]. They are defined and calculated as follows [17]. Because of the characteristics and typical delay of the cognitive components of the ERP, the analysis window was confined to 400-800 ms after the stimulus. Here s(t) is the ERP single trial during 400-800 ms after the stimulus, $s_{max}$ and $s_{min}$ are the maximum and minimum signal values in this interval, and $t_{smax}$ and $t_{smin}$ are the times at which they occur. In addition, τ is the sampling interval of the signal (τ = 3.9 ms for the sampling rate of 256 Hz), and n is the number of samples of the digital signal.
1) Amplitude ($s_{max}$): the maximum signal value.
2) Absolute amplitude ($|s_{max}|$).
3) Latency time ($t_{smax}$): the time at which the maximum signal value appears.
4) Latency/amplitude ratio ($t_{smax}/s_{max}$).
11) Average absolute signal slope: $\frac{1}{n}\sum_{t} \frac{|s(t+\tau)-s(t)|}{\tau}$.
12) Peak-to-peak (pp): $pp = s_{max} - s_{min}$.
13) Peak-to-peak time window: $t_{pp} = t_{smax} - t_{smin}$.
14) Peak-to-peak slope: $s_{pp} = pp / t_{pp}$.
15) Zero crossings: $n_{ZC} = \sum_{t} \delta_s$, where $\delta_s = 1$ if $s(t) = 0$ and 0 otherwise.
16) Zero-crossing density (ZCD, $d_{ZC}$): zero crossings per unit time in the peak-to-peak time window, $d_{ZC} = n_{ZC}/t_{pp}$.
17) Slope sign alterations ($n_{sa}$): the number of slope sign alterations between adjacent points of the ERP signal, $n_{sa} = \sum_{t} \frac{1}{2}\left|\frac{s(t-\tau)-s(t)}{|s(t-\tau)-s(t)|} + \frac{s(t+\tau)-s(t)}{|s(t+\tau)-s(t)|}\right|$.
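The 17 morphological features can be computed from one sweep as sketched below. This is an illustrative sketch, not the authors' code. It assumes that sample 0 of the sweep is the stimulus onset and that the window is 400-800 ms. Zero crossings are counted as sign changes between adjacent samples rather than as samples that are exactly zero; this deviation is an assumption, because exact zeros are rare in sampled data. The dictionary keys are hypothetical names.

```python
import numpy as np

def morphological_features(sweep, fs=256, t_start=0.4, t_end=0.8):
    """17 morphological features of one ERP sweep in the 400-800 ms window."""
    tau = 1.0 / fs                                    # sampling interval (3.9 ms at 256 Hz)
    t_all = np.arange(len(sweep)) / fs
    in_win = (t_all >= t_start) & (t_all <= t_end)
    s, t = np.asarray(sweep, dtype=float)[in_win], t_all[in_win]
    i_max, i_min = np.argmax(s), np.argmin(s)
    s_max, s_min, t_smax, t_smin = s[i_max], s[i_min], t[i_max], t[i_min]
    a_p = s[s > 0].sum()                              # positive area
    a_n = s[s < 0].sum()                              # negative area (<= 0)
    pp, t_pp = s_max - s_min, t_smax - t_smin
    d = np.diff(s)                                    # s(t + tau) - s(t)
    n_zc = int(np.sum(np.signbit(s[:-1]) != np.signbit(s[1:])))  # sign changes
    n_sa = int(np.sum(np.sign(d[:-1]) * np.sign(d[1:]) < 0))     # slope reversals
    return {
        "amplitude": s_max,
        "abs_amplitude": abs(s_max),
        "latency": t_smax,
        "latency_amplitude_ratio": t_smax / s_max,
        "abs_latency_amplitude_ratio": abs(t_smax / s_max),
        "positive_area": a_p,
        "negative_area": a_n,
        "total_area": a_p + a_n,
        "abs_total_area": abs(a_p + a_n),
        "total_abs_area": a_p + abs(a_n),
        "avg_abs_slope": np.sum(np.abs(d)) / (len(s) * tau),
        "peak_to_peak": pp,
        "pp_time_window": t_pp,
        "pp_slope": pp / t_pp,
        "zero_crossings": n_zc,
        "zero_crossing_density": n_zc / t_pp,
        "slope_sign_alterations": n_sa,
    }

# Synthetic P300-like bump peaking at 500 ms.
t = np.arange(256) / 256
sweep = 5.0 * np.exp(-((t - 0.5) ** 2) / (2 * 0.05 ** 2))
f = morphological_features(sweep)
print(f["latency"], f["amplitude"])  # 0.5 5.0
```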

Frequency Features
The second group consists of three frequency characteristics of the signal: mode frequency, median frequency and mean frequency [17,25-28]. They are described and calculated as follows, where S(f) is the power spectral density (PSD) of the signal and f is frequency:
1) Mode frequency: $f_{mode}$ is the frequency with the greatest energy content in the signal spectrum, i.e., the frequency at which the PSD is maximal: $S(f_{mode}) = \max_f S(f)$.
2) Median frequency: $f_{median}$ divides the power spectrum into two areas of equal energy: $\int_0^{f_{median}} S(f)\,df = \int_{f_{median}}^{\infty} S(f)\,df$.
3) Mean frequency: $f_{mean}$ is the centroid of the spectrum, obtained as the power-weighted average of the frequencies: $f_{mean} = \int_0^{\infty} f\,S(f)\,df \,/\, \int_0^{\infty} S(f)\,df$.
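The three frequency features follow directly from a discrete PSD estimate. A minimal sketch is shown below. It assumes a plain periodogram (|FFT|²) computed after removing the mean; the text does not state which PSD estimator was used. The integrals become sums over FFT bins.

```python
import numpy as np

def frequency_features(sweep, fs=256):
    """Mode, median and mean frequency of one sweep from a periodogram PSD."""
    x = np.asarray(sweep, dtype=float)
    x = x - x.mean()                              # remove DC (assumption)
    psd = np.abs(np.fft.rfft(x)) ** 2             # periodogram estimate of S(f)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    f_mode = freqs[np.argmax(psd)]                # frequency of the PSD maximum
    cum = np.cumsum(psd)
    f_median = freqs[np.searchsorted(cum, cum[-1] / 2.0)]  # half the energy below
    f_mean = np.sum(freqs * psd) / np.sum(psd)    # spectral centroid
    return f_mode, f_median, f_mean

t = np.arange(256) / 256
print(frequency_features(np.sin(2 * np.pi * 10 * t)))
```

For a pure 10 Hz tone all three values are 10 Hz, up to rounding in the mean.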

Time-Frequency Domain Analysis
Time-frequency (TF) analysis transforms time-domain signals into the so-called time-frequency distribution (TFD). A TFD can be interpreted as the joint (simultaneous) distribution of signal energy in the time and frequency domains, without the scaling effects of wavelet transforms [29].
Time-frequency methods are one approach to analyzing non-stationary EEG signals. They can be divided into three main categories:
- nonparametric linear TF methods based on linear filtering, including the short-time Fourier transform [30][31][32] and the wavelet transform [33][34][35][36][37][38][39];
- nonparametric quadratic TF representations, including the Wigner-Ville distribution [40][41][42] and its filtered versions;
- parametric time-varying methods based on autoregressive models with time-varying coefficients [43][44][45][46][47][48][49][50].
In this paper the smoothed pseudo Wigner-Ville distribution (SPWVD) is preferred. It provides better time-frequency resolution than nonparametric linear methods, independent control of time and frequency filtering, and power estimates with lower variance than parametric methods when rapid changes occur [51]. The main drawback of Wigner-Ville-based distributions is the presence of cross terms, which the time and frequency filtering of the SPWVD is meant to suppress. The SPWVD of the discrete signal x(n) is defined by
$$\mathrm{SPWVD}(n,m) = 2\sum_{k=-N+1}^{N-1} |h(k)|^2 \sum_{p=-M+1}^{M-1} g(p)\, r_x(n+p,k)\, e^{-j 4\pi k f_m / f_s}$$
where n and m are the discrete time and frequency indexes, respectively, $f_m$ is the frequency of bin m and $f_s$ is the sampling rate. The frequency-smoothing window h(k) is symmetric, normed and of length 2N − 1, and the time-smoothing window g(p) is symmetric, normed and of length 2M − 1. The instantaneous autocorrelation function $r_x(n,k)$ is defined as
$$r_x(n,k) = x(n+k)\, x^*(n-k)$$
Figure 4 shows the result of applying the Wigner-Ville transform to the probe signal.
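A direct discrete implementation of the SPWVD can be sketched with numpy. This is a sketch under assumptions, not the authors' implementation. The analytic signal (built with the same FFT construction as `scipy.signal.hilbert`) replaces the real x(n), a common choice to limit aliasing and negative-frequency cross terms. Frequency bins are laid out so that row m corresponds to m·fs/(2·n_freq) Hz. The window lengths in the usage example are arbitrary choices.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (same construction as scipy.signal.hilbert)."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def spwvd(x, g, h, n_freq=128):
    """Smoothed pseudo Wigner-Ville distribution (sketch).

    g: symmetric time-smoothing window g(p), odd length 2M-1
    h: symmetric lag window h(k) (frequency smoothing), odd length 2N-1
    Returns a real array (n_freq, len(x)); row m <-> m * fs / (2 * n_freq) Hz.
    """
    z = analytic_signal(np.asarray(x, dtype=float))
    T = len(z)
    N = (len(h) + 1) // 2
    if n_freq < len(h) or len(g) > T:
        raise ValueError("need n_freq >= len(h) and len(g) <= len(x)")
    kernel = np.zeros((n_freq, T), dtype=complex)
    for k in range(-(N - 1), N):
        # instantaneous autocorrelation r(t, k) = z(t+k) z*(t-k), zero outside the record
        r = np.zeros(T, dtype=complex)
        lo, hi = abs(k), T - abs(k)
        if hi > lo:
            t = np.arange(lo, hi)
            r[lo:hi] = z[t + k] * np.conj(z[t - k])
        # time smoothing sum_p g(p) r(n+p, k); g is symmetric, so convolution == correlation
        smoothed = np.convolve(r, g, mode="same")
        kernel[k % n_freq] += np.abs(h[k + N - 1]) ** 2 * smoothed
    # FFT over the lag index; the kernel is Hermitian in k, so the result is real
    return 2.0 * np.real(np.fft.fft(kernel, axis=0))

fs = 256
t = np.arange(256) / fs
x = np.cos(2 * np.pi * 10 * t)                          # 1 s, 10 Hz test tone
tfr = spwvd(x, g=np.hamming(31), h=np.hamming(63), n_freq=128)
print(np.argmax(tfr[:, 128]) * fs / (2 * 128))         # 10.0 (peak in Hz at t = 0.5 s)
```

Longer windows give more smoothing, so cross terms are attenuated more but resolution is lost. g controls the time direction and h the frequency direction independently, which is the property the text cites.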

TF Features Extraction
Each probe sweep is divided into 10 segments of equal length, each approximately 100 ms long. The average energy of each segment is then computed using our previously reported method [52]. The features are:
FLT: frequency of latency time (i.e., the frequency at the time where the maximum signal value appears).
MAX_w: maximum energy over the windows.
MIN_w: minimum energy over the windows.
DIF_w: difference between the maximum and minimum energy of the windows.
STD_w: standard deviation of the energy of the time windows.
The TF-domain signal is also divided into four frequency bands according to the brain rhythms:
E_Delta: total energy in the delta band (0.5-4 Hz), divided by the bandwidth (3.5 Hz).
E_Theta: total energy in the theta band (4-8 Hz), divided by the bandwidth (4 Hz).
E_Alpha: total energy in the alpha band (8-13 Hz), divided by the bandwidth (5 Hz).
E_Beta: total energy in the beta band (13-30 Hz), divided by the bandwidth (17 Hz).
F_Delta: average energy in the delta band (0.5-4 Hz).
F_Theta: average energy in the theta band (4-8 Hz).
F_Alpha: average energy in the alpha band (8-13 Hz).
F_Beta: average energy in the beta band (13-30 Hz).
We also define a first-order derivative feature to capture the difference between adjacent windows. Inspecting the features over 100 ms time spans showed that, in guilty subjects, the features change much more prominently from one window to the next, which motivated this feature. It is the difference between the average energies of consecutive windows:
$$D_n = W_n - W_{n-1}$$
where $W_n$ is the average energy in window n and $W_{n-1}$ is the average energy in window n − 1.
The defined features (34 features) were extracted from all probe sweeps of all subjects.
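The TF features above can be computed from any energy map such as an SPWVD, as sketched below. Several readings of the text are assumptions and are marked as such. The "average energy" of a window is taken as the mean of the map over all frequencies and the window's samples. FLT is read as the frequency of maximal energy at the latency sample. Band edges are half-open intervals. The text reports a single derivative feature but does not say how the nine differences are summarized, so the sketch returns them all.

```python
import numpy as np

# Frequency bands from the text (Hz); half-open [low, high) edges are an assumption.
BANDS = {"Delta": (0.5, 4.0), "Theta": (4.0, 8.0), "Alpha": (8.0, 13.0), "Beta": (13.0, 30.0)}

def tf_features(tfr, freqs, latency_idx, n_windows=10):
    """TF features from an energy map tfr[frequency, time].

    freqs: frequency (Hz) of each row; latency_idx: sample of the ERP maximum.
    """
    tfr = np.asarray(tfr, dtype=float)
    # ten (almost) equal windows: 256 samples -> 25 or 26 samples each (~100 ms)
    windows = np.array_split(np.arange(tfr.shape[1]), n_windows)
    w = np.array([tfr[:, idx].mean() for idx in windows])  # W_1 ... W_10
    feats = {
        "FLT": freqs[np.argmax(tfr[:, latency_idx])],  # assumed reading of FLT
        "MAX_w": w.max(),
        "MIN_w": w.min(),
        "DIF_w": w.max() - w.min(),
        "STD_w": w.std(),
    }
    for name, (lo, hi) in BANDS.items():
        band = tfr[(freqs >= lo) & (freqs < hi)]
        feats["E_" + name] = band.sum() / (hi - lo)  # band energy / bandwidth
        feats["F_" + name] = band.mean()             # average band energy
    feats["D"] = np.diff(w)                          # D_n = W_n - W_(n-1)
    return feats

freqs = np.arange(128) * 1.0   # 1 Hz bins, e.g. an SPWVD with 128 bins at 256 Hz
tfr = np.ones((128, 256))
tfr[10, 100] = 5.0             # synthetic energy peak at 10 Hz, sample 100
feats = tf_features(tfr, freqs, latency_idx=100)
print(feats["FLT"], feats["DIF_w"])
```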

Feature Selection
In any classification task, some of the extracted features may be redundant. Such features increase the cost and running time of the system and reduce its generalization performance. Selecting the most discriminative features therefore plays an important role in constructing classifiers. To identify the best features for classification, a sequential forward search is applied. First, classification is performed separately with each individual feature, and the feature giving the highest classification accuracy is selected. This feature is then combined with each of the other individual features to find the best pair. The process continues until the best combination of features is reached.

CLASSIFICATION
A multilayer perceptron (MLP) neural network and a k-nearest neighbor (KNN) classifier were used to discriminate between the probe responses of guilty and innocent persons. The classifiers compare the features extracted from the probe sweeps of guilty persons with those of innocent persons. For each subject, the probe sweeps of the other subjects, with their true labels (G-probe/I-probe), were used as training data. The trained classifier was then applied to the probe sweeps of the given subject.
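The leave-one-subject-out evaluation above and the sequential forward selection from the previous subsection can be combined as sketched below. This sketch simplifies the data to one feature row per subject, whereas the text classifies probe sweeps. The stopping rule (stop when no remaining feature improves accuracy) is an assumption, because the text does not give one. `nearest_mean` is a hypothetical placeholder classifier; any function with the same `fit_predict(X_train, y_train, X_test)` shape can be plugged in.

```python
import numpy as np

def loso_accuracy(X, y, fit_predict):
    """Leave-one-subject-out accuracy: row i is predicted by a model trained on the rest."""
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        pred = fit_predict(X[train], y[train], X[i:i + 1])
        correct += int(pred[0] == y[i])
    return correct / len(y)

def forward_selection(X, y, fit_predict):
    """Sequential forward selection driven by LOSO accuracy (assumed stopping rule)."""
    remaining, selected, best_acc = list(range(X.shape[1])), [], -1.0
    while remaining:
        step_j, step_acc = None, -1.0
        for j in remaining:                     # try adding each remaining feature
            acc = loso_accuracy(X[:, selected + [j]], y, fit_predict)
            if acc > step_acc:                  # ties keep the lower feature index
                step_j, step_acc = j, acc
        if step_acc <= best_acc:                # no improvement: stop
            break
        selected.append(step_j)
        remaining.remove(step_j)
        best_acc = step_acc
    return selected, best_acc

def nearest_mean(X_train, y_train, X_test):
    """Hypothetical placeholder classifier: pick the class with the closest mean."""
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dist, axis=1)]

# Toy data: feature 0 separates the two groups, feature 1 does not.
y = np.array([0] * 10 + [1] * 10)
X = np.column_stack([5.0 * y + 0.01 * np.arange(20), np.tile([0.0, 1.0], 10)])
print(forward_selection(X, y, nearest_mean))   # ([0], 1.0)
```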

Principal Components Analysis
Principal component analysis (PCA) is used to reduce the dimensionality of the input features, improve classification performance and reduce learning time. PCA finds an orthogonal transformation that projects the features onto uncorrelated directions of maximum variance, so that most of the variance is retained in fewer components [53][54][55][56][57]. PCA reduced the features obtained with the feature selection method (26 features) to 21 and those obtained without feature selection (34 features) to 25.
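A compact PCA via the singular value decomposition of the mean-centred data is sketched below. It reduces 26 features to 21, matching the dimensions in the text, but uses random stand-in data. In a leave-one-subject-out setting it is safer to fit the PCA on the training subjects only and then project the held-out subject; the text does not say how this was handled.

```python
import numpy as np

def pca_fit(X, n_components):
    """PCA by SVD of the mean-centred data (rows = observations, columns = features)."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)        # variance fraction per component
    return mean, vt[:n_components], explained[:n_components]

def pca_transform(X, mean, components):
    """Project observations onto the retained principal axes."""
    return (X - mean) @ components.T

rng = np.random.default_rng(1)
X = rng.standard_normal((32, 26))              # stand-in: 32 subjects x 26 features
mean, comps, explained = pca_fit(X, n_components=21)
Z = pca_transform(X, mean, comps)
print(Z.shape)                                 # (32, 21)
```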

Multilayer Perceptron Neural Network
The MLP classifier is a three-layer network trained with the error back-propagation algorithm and a variable learning rate. The input layer has as many nodes as the input vector length. The output layer has a single node, since only two classes are to be distinguished. Networks with different numbers of hidden neurons were trained, and the optimal number was 5. The hidden layer uses a sigmoid transfer function, and the output node uses a linear one. Training continued until the mean square error fell below 0.01 or the number of training iterations reached 1000. Because the data set is small, leave-one-out cross-validation was used [58,59]. At each stage one observation was held out as test data and the other 31 were used for training, and this was repeated 32 times. The network error was computed at each stage, and the errors were finally averaged. One advantage of this approach is that every observation is used for both training and testing (in different folds).
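The MLP configuration above (sigmoid hidden layer of 5 units, one linear output, MSE goal 0.01, at most 1000 iterations) can be sketched in numpy. Several details are assumptions because the text does not specify them: full-batch gradient descent, the weight initialization, the 0.5 decision threshold on 0/1 targets, and the adaptation rule for the variable learning rate. Under that rule, a step that raises the error is discarded and the rate is multiplied by 0.7; otherwise the step is kept and the rate is multiplied by 1.05.

```python
import numpy as np

def _forward(params, X):
    W1, b1, W2, b2 = params
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))   # sigmoid hidden layer
    return H, H @ W2 + b2                        # single linear output node

def train_mlp(X, y, n_hidden=5, max_epochs=1000, goal=0.01, lr=0.05, seed=0):
    """Back-propagation with an adaptive learning rate (assumed rule, see lead-in)."""
    rng = np.random.default_rng(seed)
    params = [rng.normal(0.0, 0.5, (X.shape[1], n_hidden)), np.zeros(n_hidden),
              rng.normal(0.0, 0.5, n_hidden), 0.0]
    H, out = _forward(params, X)
    err = np.mean((out - y) ** 2)
    history = [err]
    for _ in range(max_epochs):
        if err < goal:                                   # MSE goal reached
            break
        W1, b1, W2, b2 = params
        d_out = 2.0 * (out - y) / len(y)                 # dMSE / d(output)
        d_hid = np.outer(d_out, W2) * H * (1.0 - H)      # back-propagate through sigmoid
        grads = [X.T @ d_hid, d_hid.sum(axis=0), H.T @ d_out, d_out.sum()]
        trial = [p - lr * g for p, g in zip(params, grads)]
        H_t, out_t = _forward(trial, X)
        new_err = np.mean((out_t - y) ** 2)
        if not np.isfinite(new_err) or new_err > err:
            lr *= 0.7                                    # reject step, shrink rate
        else:
            params, H, out, err = trial, H_t, out_t, new_err
            lr *= 1.05                                   # accept step, grow rate
        history.append(err)
    return params, history

def predict(params, X):
    """Class 1 if the linear output exceeds 0.5 (targets coded 0/1)."""
    return (_forward(params, X)[1] > 0.5).astype(int)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.3, (16, 3)), rng.normal(1.0, 0.3, (16, 3))])
y = np.array([0.0] * 16 + [1.0] * 16)
params, history = train_mlp(X, y)
print(history[0], history[-1])
print(predict(params, X))
```

`train_mlp` followed by `predict` can be wrapped as a `fit_predict(X_train, y_train, X_test)` function to run the leave-one-out loop described in the text.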

k-Nearest Neighbor
The k-nearest neighbor (KNN) algorithm is one of the most effective non-parametric methods in pattern recognition [60]. It classifies objects based on their distance to the training examples in the feature space and makes no assumption about the statistical distribution of the training examples. Several distance measures can be used, but the Euclidean distance is commonly preferred. An object is classified by a majority vote of its neighbors, i.e., it is assigned to the class most common among its k nearest neighbors. The number k is usually small; if k = 1, the object is simply assigned to the class of its nearest neighbor. The selected feature set is then used to determine the best value of k. Different numbers of nearest neighbors (k = 1, 3, 5, 7, 9, 11, 13) were tested in the KNN classifier to obtain the best performance [61]. The performance of each classifier was measured by its accuracy, and the maximum performance was obtained with a 7-nearest neighbor classifier.
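The KNN rule and the search over k described above can be sketched as follows. The code uses Euclidean distance, a majority vote, and leave-one-out accuracy for each candidate k. It chooses the smallest k among equally good values, which is a tie-breaking assumption. The data are synthetic.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=7):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1, kind="stable")[:, :k]
    return np.array([Counter(y_train[row]).most_common(1)[0][0] for row in nearest])

def choose_k(X, y, ks=(1, 3, 5, 7, 9, 11, 13)):
    """Pick k by leave-one-out accuracy over the candidate values listed in the text."""
    best_k, best_acc = None, -1.0
    for k in ks:
        if k >= len(y):                        # need at least k training points
            continue
        correct = 0
        for i in range(len(y)):
            mask = np.arange(len(y)) != i
            correct += int(knn_predict(X[mask], y[mask], X[i:i + 1], k)[0] == y[i])
        acc = correct / len(y)
        if acc > best_acc:                     # ties keep the smaller k
            best_k, best_acc = k, acc
    return best_k, best_acc

# Two well-separated synthetic groups in one dimension.
X = np.r_[np.arange(10) * 0.1, 10 + np.arange(10) * 0.1].reshape(-1, 1)
y = np.array([0] * 10 + [1] * 10)
print(choose_k(X, y))                          # (1, 1.0)
```

With an odd k and two classes, the majority vote cannot tie, which is one reason odd values are tested.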

RESULT
In this research, many types of features were investigated for classifying guilty and innocent subjects. After feature extraction, we sought the best combinational feature vector through the feature selection method. To improve classifier performance and reduce execution time, PCA was applied. Finally, guilty and innocent subjects were classified with the classifier that achieved the best accuracy. The mean classification accuracies of the MLP neural network and the KNN classifier are given in Table 1. These results show that the optimal combinational feature vector distinguishes innocent from guilty subjects better. Table 2 compares the results of our proposed method with those reported in [18]. As can be seen, the deception detection accuracy improved from 86% to 89.73%. We attribute the advantage of the Wigner-Ville features over wavelet-based analyses to two properties: the independent control of time and frequency smoothing, and the simultaneous use of time and frequency information.

DISCUSSION
P300-based GKT, as a new method for the psychophysiological detection of concealed information, was tested on innocent and guilty subjects who were concealing information regarding a mock crime committed as a part of the experiment.
As can be seen in Figure 1(a), EEG acquisition with the oddball paradigm indicates that the P300 component is sensitive to the "meaningful word". This is consistent with the notion that the P300 component is sensitive to concealed information [18,21,24]. In other words, the unexpected and significant target (T) and probe (P) stimuli elicit similar P300 responses in the guilty subject. In innocent subjects, by contrast, the response to the probe stimulus is more similar to the response to the irrelevant stimulus.
These results therefore agree with our expectation that probe stimuli evoke a larger P300 in guilty subjects than in innocent subjects. Guilty subjects also showed a significant increase in the positive area and total area features and a significant decrease in all frequency features. This is consistent with the appearance of a low-frequency positive peak (the P300) in the brain response, in line with previous studies [18].
In this research, guided by the oddball paradigm, some changes were made to the common GKT methods and to the way the reference test [24] was implemented. The aim was to improve the detection rate of the P300 component in the recognition of concealed information, and hence the classification of innocent and guilty subjects. In particular, compared with [24], the test time was increased, which corresponds to increasing the number of iterations in the oddball paradigm from 30 to 40. Nevertheless, the negative consequences of a longer task (fatigue, habituation to the images, and an increased rate of blink artifacts) must be considered. Our results suggest that the longer test time improves the detection rate of the P300 component and, as a result, the separation accuracy. The increased test time may therefore be one of the reasons for the improvement.

Table 2. Lie detection accuracy values for the proposed method and the method used in [18].
Method    The proposed method    Ref [18]
MLP       89.73%                 86%
Finally, a comparison of our findings with the results reported in [18] indicates that the optimal combined feature vector derived from the Wigner-Ville transform performs better than the classic and wavelet-based methods. It also has a greater ability to detect guilty subjects. The most important reasons might be the advantages of the Wigner-Ville method over the wavelet transform, the simultaneous use of classic and time-frequency features in a combinational form, and the use of PCA.

Figure 2. Some examples of slides presented to the guilty and innocent groups for data acquisition. P and Ir slides are shown randomly 40 times, and the T slide is shown once to each subject.

Figure 3. Averaged EEG signals over the single trials of all 20 iterations after showing target, irrelevant and probe slides for two typical subjects: (a) innocent subject; (b) guilty subject.

5) Absolute latency/amplitude ratio ($|t_{smax}/s_{max}|$).
6) Positive area ($A_p$): the sum of the positive signal values.
7) Negative area ($A_n$): the sum of the negative signal values.
8) Total area ($A_{pn} = A_p + A_n$): the sum of the positive and negative signal values.
9) Absolute total area ($|A_{pn}|$).
10) Total absolute area (TAAR, $A_{p|n|} = A_p + |A_n|$).

Figure 4. Wigner-Ville transform of the probe signal of a guilty person: (a) 2D view; (b) 3D view.

This table shows how the combined use of feature selection and PCA can increase the accuracy. For both classifiers (MLP and KNN), PCA alone gives better results than feature selection alone. Using feature selection and PCA together to produce a new combinational feature vector increases the accuracy by 4 to 5%.

Table 1. The accuracy values of the MLP and KNN classifiers with the feature dimension reduction methods (individual and combinational) for the guilty and innocent groups.