
For many years, the uncertainty of lie-detection systems has been a concern of defense-related agencies. Clearly, such systems must deliver highly accurate results to be acceptable to judicial systems. In this paper, a new lie-detection method based on the P300 component is proposed. The test protocol is designed as a concealed information recognition test based on the oddball paradigm. The test was performed on 32 people and their brain signals were recorded. After preprocessing, classic features are extracted from each single trial. Next, a time-frequency (TF) transformation is applied to the sweeps and TF features are derived from it. The best combinational feature vector is then selected to improve classifier accuracy. Finally, guilty and innocent subjects are classified with KNN and MLP classifiers. We found that the combination of time-frequency and classic features achieves higher accuracy. The results show that the proposed method detects deception with an accuracy of 89.73%, which is better than previously reported methods.

The detection of deception has a long history. The first proposed technology was the polygraph, which recorded autonomic arousal and was used in the determination of guilt or innocence [

1) Probe (P): a crime-related item that is familiar only to guilty subjects, not to innocent ones.

2) Irrelevant (I): an item unrelated to the crime and therefore unknown to all subjects.

3) Target (T): an item unrelated to the crime but known to all subjects.

Irrelevant stimuli outnumber the other two types many times over, so probes and targets are rare stimuli. Target stimuli force the subject to pay attention to the items, because failing to respond correctly to them suggests that the subject is not cooperating [

In this study, a new combinational feature vector that includes Wigner-Ville transform features is introduced; principal component analysis (PCA) is then applied to the optimal feature vector, and finally two classifiers are used to discriminate between guilty and innocent subjects in a concealed information recognition test. The proposed lie-detection algorithm works on ERP sweeps as follows. After preprocessing the EEG signal and extracting the probe sweeps, suitable features are first extracted from each single trial. These features, namely morphological and frequency features (classic features), are chosen because they carry useful information about the phenomenon under study. Next, a time-frequency transformation is applied to the sweeps to produce further features. The best features are then selected to form an optimal combinational feature vector, and PCA is applied to reduce the dimensionality of the feature space. Finally, a multilayer perceptron (MLP) neural network and a k-nearest neighbor (KNN) classifier are used to separate guilty subjects from innocent ones.

In summary, the aim of our research is to build a new combinational feature space from classic (morphological and frequency) features and Wigner-Ville features in order to distinguish innocent from guilty subjects, and to compare its performance with the results obtained using wavelet features and with previously reported methods.

Thirty-two Iranian subjects, mostly undergraduate or postgraduate students, participated in this study. All had normal or corrected-to-normal vision and none had any neurological disease. Participants were naive to the experimental design, and their mean age was 25 years. Data from three subjects were discarded because of excessive artifacts or machine failure.

The thirty-two subjects were randomly divided into two groups of 16 (guilty and innocent). The implementation follows the methodology employed in Ref. [

EEG data were recorded with Ag/AgCl electrodes placed at midline sites of the head (Fz, Cz and Pz), but only the Pz channel was analyzed. Vertical and horizontal EOG were also recorded. EEG was recorded continuously and digitized at a sampling rate of 256 Hz. All signals were band-pass filtered between 0.5 and 30 Hz with a zero-phase digital filter.
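A minimal sketch of this filtering step, assuming a 4th-order Butterworth band-pass run forward and backward with SciPy's `butter` and `sosfiltfilt`. The text specifies only a zero-phase 0.5-30 Hz filter at 256 Hz; the filter family, the order and the helper name `bandpass_zero_phase` are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 256.0  # sampling rate reported in the text (Hz)


def bandpass_zero_phase(eeg, low_hz=0.5, high_hz=30.0, fs=FS, order=4):
    """Zero-phase band-pass filter (hypothetical helper; Butterworth, order 4 assumed).

    sosfiltfilt filters forward and then backward, so the phase shifts cancel and
    ERP latencies are not displaced. Second-order sections keep the very low
    0.5 Hz cutoff numerically stable.
    """
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(eeg, dtype=float), axis=-1)
```

Running the filter in both directions squares the magnitude response and cancels the phase response, which is why a zero-phase design suits latency-sensitive P300 analysis.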

Procedure: Before the experiment, seven slides presenting seven different words were prepared for each subject. The slides were presented in random order on a computer screen, 40 times each; each presentation lasted 1500 ms, with an inter-stimulus interval of 1000 ms. Of the seven slides, one was the probe, one was the target and the remaining five were irrelevant stimuli. Both guilty and innocent subjects must recognize the target well, so it should be a word (name) related to one of the subject's family members, a well-known person such as a sports star, politician or movie star, or a famous landmark (such as a famous monument, city icon, place of power or worship, or sacred mountain). Slides of meaningless words were used as irrelevant stimuli. The probe for guilty subjects was a meaningful word. Innocent subjects had never seen the probe word slide and had no information about those words.
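The presentation schedule can be sketched as follows, assuming a fully shuffled order and that the 1000 ms inter-stimulus interval is a blank gap after each 1500 ms slide (a stimulus onset asynchrony of 2500 ms). The function name and the placeholder words are hypothetical.

```python
import random


def build_stimulus_sequence(probe, target, irrelevants, repetitions=40,
                            display_ms=1500, isi_ms=1000, seed=None):
    """Shuffled oddball sequence in which every slide is shown `repetitions` times.

    Returns (onset_ms, word, stimulus_type) tuples, with types "P", "T" and "I".
    Onsets assume the ISI is a blank gap following each slide (an assumption).
    """
    slides = [(probe, "P"), (target, "T")] + [(word, "I") for word in irrelevants]
    trials = slides * repetitions
    random.Random(seed).shuffle(trials)          # reproducible when a seed is given
    soa_ms = display_ms + isi_ms                 # onset-to-onset interval
    return [(i * soa_ms, word, kind) for i, (word, kind) in enumerate(trials)]
```

With one probe, one target and five irrelevant words, the schedule has 280 presentations, 200 of them (about 71%) irrelevant, which is what makes probes and targets the rare stimuli of the oddball design.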

Subjects held a push button in each hand and were asked to press one of them whenever they wanted to answer “Yes, I know” and the other when they wanted to answer “No, I don’t know”. Innocent subjects answered all stimuli honestly, whereas guilty subjects answered honestly only to target and irrelevant stimuli and falsely to the probe stimulus.

Based on previous studies, two types of features, morphological and frequency features, were adopted in this research as classic features. These features have performed well in similar studies and were therefore expected to be useful for this application as well. In addition, time-frequency features were extracted with the Wigner-Ville transform, because it provides better time-frequency resolution than nonparametric linear methods (e.g., wavelet and short-time Fourier transform), independent control of time and frequency filtering, and power estimates with lower variance than parametric methods when rapid changes occur.

After filtering, each continuous record was segmented into single sweeps time-locked to the known stimulus onsets. Each sweep is 1000 ms long, i.e., 256 samples at 256 Hz. EOG data were visually inspected for blinks, and sweeps containing blink artifacts were removed.
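A minimal sketch of the epoching step, assuming stimulus onsets are already available as sample indices. The blink check is a hypothetical automatic stand-in (EOG peak-to-peak amplitude above a chosen threshold) for the visual inspection actually used; the function name and threshold parameter are illustrative.

```python
import numpy as np


def extract_sweeps(eeg, onsets, sweep_len=256, eog=None, blink_threshold=None):
    """Cut 1000 ms sweeps (256 samples at 256 Hz) starting at each stimulus onset.

    If `eog` and `blink_threshold` are given, sweeps whose EOG peak-to-peak amplitude
    exceeds the threshold are dropped (hypothetical replacement for visual inspection).
    """
    eeg = np.asarray(eeg, dtype=float)
    kept = []
    for onset in onsets:
        stop = onset + sweep_len
        if onset < 0 or stop > len(eeg):
            continue                              # incomplete sweep at a record edge
        if eog is not None and blink_threshold is not None:
            segment = np.asarray(eog[onset:stop], dtype=float)
            if segment.max() - segment.min() > blink_threshold:
                continue                          # treat as blink-contaminated
        kept.append(eeg[onset:stop])
    return np.array(kept).reshape(-1, sweep_len)  # shape (n_sweeps, sweep_len)
```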

In all of the P300 descriptions that follow, only results at Pz are reported, since the P300 is usually reported to be maximal at Pz; the analysis procedures below were therefore applied to Pz data only.

The classic features, i.e., morphological and frequency features, were chosen because they carry useful information about the phenomenon under study.

The first group contains 17 morphological features, previously used by Kalatzis et al. to discriminate depressive patients from healthy controls using the P600 component of the ERP signal [

1) Amplitude (s_{max}): the maximum signal value, $s_{max} = \max_t \{s(t)\}$.

2) Absolute amplitude (|s_{max}|).

3) Latency time (t_{s_{max}}): the time at which the maximum signal value appears, $t_{s_{max}} = \{t \mid s(t) = s_{max}\}$,

where s(t) is the ERP single trial during 400 - 800 ms after stimulus and s_{max} is the maximum signal value in this time interval.

4) Latency/amplitude ratio ($LAR = t_{s_{max}}/s_{max}$).

5) Absolute latency/amplitude ratio ($ALAR = |t_{s_{max}}/s_{max}|$).

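6) Positive area (A_{p}): the sum of the positive signal values, $A_p = \sum_t \tfrac{1}{2}\big(s(t) + |s(t)|\big)$.

7) Negative area (A_{n}): the sum of the negative signal values, $A_n = \sum_t \tfrac{1}{2}\big(s(t) - |s(t)|\big)$.

8) Total area (A_{pn}): the sum of the positive and negative signal values, $A_{pn} = A_p + A_n$.

9) ATAR ($|A_{pn}|$): absolute total area.

10) TAAR ($A_{p|n|} = A_p + |A_n|$): total absolute area.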

11) AASS: average absolute signal slope, $\dot{s} = \frac{1}{n}\sum_{t=\tau}^{n\tau} \frac{|s(t) - s(t-\tau)|}{\tau}$,

where τ is the sampling interval of the signal, n the number of samples of the digital signal, and s(t) the signal value of the sample.

12) Peak-to-peak (pp):

$pp = s_{max} - s_{min}$

where s_{max} and s_{min} are the maximum and the minimum signal values, respectively.

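13) Peak-to-peak time window (t_{pp}): $t_{pp} = t_{s_{max}} - t_{s_{min}}$, where t_{s_{min}} is the time at which the minimum signal value appears.

14) Peak-to-peak slope (s_{pp}): $s_{pp} = pp / t_{pp}$.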

15) Zero crossings (ZC, n_{ZC}): the number of time instants t at which s(t) = 0 within the peak-to-peak time window, $n_{ZC} = \sum_{t} \delta_s$,

where $\delta_s = 1$ if $s(t) = 0$ and 0 otherwise.

16) Zero crossings density (ZCD, d_{ZC}): zero crossings per time unit within the peak-to-peak time window, $d_{ZC} = n_{ZC} / t_{pp}$,

where n_{ZC} is the number of zero crossings and t_{pp} is the peak-to-peak time window.

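17) Slope sign alterations (n_{sa}): the number of slope sign alterations between adjacent points of the ERP signal, $n_{sa} = \sum_{t} \frac{1}{2}\left|\frac{s(t-\tau) - s(t)}{|s(t-\tau) - s(t)|} + \frac{s(t+\tau) - s(t)}{|s(t+\tau) - s(t)|}\right|$,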

where τ is the sampling interval of the signal (τ = 3.9 ms, for the sampling rate of 256 Hz).
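The 17 definitions above can be collected into one function. This is a sketch under stated assumptions: the sweep starts at stimulus onset and is sampled at 256 Hz, areas are plain sums as in the definitions, and zero crossings are counted as sign changes because a sampled signal rarely equals 0 exactly. The function and key names are hypothetical.

```python
import numpy as np


def morphological_features(sweep, fs=256.0, start_s=0.4, end_s=0.8):
    """The 17 morphological features of one sweep, computed on the 400-800 ms interval."""
    tau = 1.0 / fs                                   # sampling interval (about 3.9 ms)
    i0, i1 = int(round(start_s * fs)), int(round(end_s * fs))
    s = np.asarray(sweep, dtype=float)[i0:i1]
    t = np.arange(i0, i1) * tau                      # time after stimulus onset (s)
    i_max, i_min = int(np.argmax(s)), int(np.argmin(s))
    s_max, s_min = s[i_max], s[i_min]
    t_max, t_min = t[i_max], t[i_min]
    a_p = s[s > 0].sum()                             # positive area
    a_n = s[s < 0].sum()                             # negative area
    pp = s_max - s_min
    t_pp = t_max - t_min                             # negative if the minimum follows the maximum
    lo, hi = sorted((i_min, i_max))
    window = s[lo:hi + 1]                            # peak-to-peak time window
    n_zc = int(np.count_nonzero(np.diff(np.sign(window)) != 0))   # sign changes
    slope = np.diff(s)
    n_sa = int(np.count_nonzero(np.diff(np.sign(slope)) != 0))    # slope sign changes
    return {
        "amplitude": s_max,
        "abs_amplitude": abs(s_max),
        "latency": t_max,
        "lar": t_max / s_max,
        "abs_lar": abs(t_max / s_max),
        "pos_area": a_p,
        "neg_area": a_n,
        "total_area": a_p + a_n,
        "atar": abs(a_p + a_n),
        "taar": a_p + abs(a_n),
        "aass": np.mean(np.abs(slope)) / tau,
        "peak_to_peak": pp,
        "t_pp": t_pp,
        "pp_slope": pp / t_pp if t_pp else 0.0,
        "zero_crossings": n_zc,
        "zc_density": n_zc / t_pp if t_pp else 0.0,
        "slope_sign_alterations": n_sa,
    }
```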

The second group consists of three frequency features of the signals: mode frequency, median frequency and mean frequency [17,25-28], which are described and calculated as follows:

1) Mode frequency: f_{mode} is the frequency with the highest energy content in the signal spectrum, i.e., the frequency at which the power spectral density of the signal is maximal: $f_{mode} = \arg\max_f S(f)$,

where S is the power spectral density of the signal and f is frequency.

2) Median frequency: the median frequency (f_{median}) divides the power spectrum into two areas of equal energy: $\sum_{f \le f_{median}} S(f) = \frac{1}{2}\sum_{f} S(f)$.

3) Mean frequency: the mean frequency (f_{mean}) is the centroid of the spectrum, computed as the power-weighted average of the frequencies in the power spectral density of the signal: $f_{mean} = \frac{\sum_f f\,S(f)}{\sum_f S(f)}$.
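A sketch of the three frequency features, assuming a plain periodogram (squared FFT magnitude) as the PSD estimate S(f) and removing the sweep mean first. The text does not say which spectral estimator or which part of the sweep was used; the function name is hypothetical.

```python
import numpy as np


def frequency_features(sweep, fs=256.0):
    """Mode, median and mean frequency of one sweep from a periodogram PSD estimate."""
    x = np.asarray(sweep, dtype=float)
    x = x - x.mean()                                   # drop DC so it does not dominate
    psd = np.abs(np.fft.rfft(x)) ** 2                  # periodogram (unnormalized)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    f_mode = f[np.argmax(psd)]                         # frequency of maximum power
    cumulative = np.cumsum(psd)
    f_median = f[np.searchsorted(cumulative, cumulative[-1] / 2.0)]  # halves total power
    f_mean = np.sum(f * psd) / np.sum(psd)             # spectral centroid
    return f_mode, f_median, f_mean
```

A Welch estimate would reduce the variance of S(f), at the cost of coarser frequency resolution on a 256-sample sweep.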

4.2.2.1. Time-Frequency Domain Analysis

Time-frequency (TF) analysis transforms time-domain signals into a so-called time-frequency distribution (TFD), which can be interpreted as a joint (simultaneous) distribution of signal energy in the time and frequency domains, without the scaling effects of wavelet transforms [

Time-frequency (TF) methods are one approach to analyzing non-stationary EEG signals. They can be divided into three main categories: nonparametric linear TF methods based on linear filtering, including the short-time Fourier transform [30-32] and the wavelet transform [33-39]; nonparametric quadratic TF representations, including the Wigner-Ville distribution [40-42] and its filtered versions; and parametric time-varying methods based on autoregressive models [43-50] with time-varying coefficients. In this paper, the smoothed pseudo Wigner-Ville distribution (SPWVD) is preferred because it provides better time-frequency resolution than nonparametric linear methods, independent control of time and frequency filtering, and power estimates with lower variance than parametric methods when rapid changes occur [

In the discrete SPWVD, n and m are the discrete time and frequency indexes, respectively; h(k) is the frequency-smoothing symmetric normed window of length 2N − 1; g(p) is the time-smoothing symmetric normed window of length 2M − 1; and r_{x}(n, k) is the instantaneous autocorrelation function, defined as $r_x(n, k) = x(n+k)\,x^{*}(n-k)$, where $x^{*}$ denotes the complex conjugate of x.
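The SPWVD can be sketched directly from these definitions. Assumptions: Hamming windows for h and g, an FFT-based analytic signal for x (the usual way to avoid aliasing in the discrete Wigner-Ville distribution) and no constant scale factor; window lengths and function names are illustrative. Because the instantaneous autocorrelation spans a lag of 2k samples, FFT bin m corresponds to m·fs/(2·nfft) Hz.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT: zero negative frequencies, double positive ones."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def spwvd(x, fs, lag_half=63, time_half=7, nfft=256):
    """Smoothed pseudo Wigner-Ville distribution of a real signal (sketch).

    h has length 2N - 1 with N - 1 = lag_half (frequency smoothing);
    g has length 2M - 1 with M - 1 = time_half (time smoothing).
    Returns tfr of shape (len(x), nfft) and the frequency (Hz) of each column.
    """
    z = analytic_signal(np.asarray(x, dtype=float))
    n_samples = len(z)
    lag_half = min(lag_half, (nfft - 1) // 2)          # all lags must fit in one FFT frame
    h = np.hamming(2 * lag_half + 1)                   # frequency-smoothing (lag) window
    g = np.hamming(2 * time_half + 1)
    g /= g.sum()                                       # normed time-smoothing window
    idx = np.arange(n_samples)
    kernel = np.zeros((n_samples, nfft), dtype=complex)
    for k in range(-lag_half, lag_half + 1):
        valid = (idx - abs(k) >= 0) & (idx + abs(k) < n_samples)
        r = np.zeros(n_samples, dtype=complex)
        r[valid] = z[idx[valid] + k] * np.conj(z[idx[valid] - k])   # r_x(n, k)
        smoothed = np.convolve(r, g, mode="same")      # sum over p of g(p) r_x(n + p, k)
        kernel[:, k % nfft] = h[k + lag_half] * smoothed
    tfr = np.fft.fft(kernel, axis=1).real              # kernel is Hermitian in k
    freqs = np.arange(nfft) * fs / (2.0 * nfft)        # lag 2k halves the bin spacing
    return tfr, freqs
```

Because h and g are separate windows, the time and frequency smoothing can be tuned independently, which is the property that motivates choosing the SPWVD here.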

4.2.2.2. TF Features Extraction

Each probe sweep is divided into 10 segments of equal length, each approximately 100 ms long. The average energy of each segment was then computed using our previously reported method in [

FLT: frequency of the latency time, i.e., the frequency at the time at which the maximum signal value appears.

MAX_{w}: maximum energy among the windows.

MIN_{w}: minimum energy among the windows.

DIF_{w}: difference between the maximum and minimum window energies.

STD_{w}: standard deviation of the window energies.

The TF representation is also divided into four frequency bands corresponding to the standard EEG rhythms:

E_{Delta}: total signal energy in the delta band (0.5-4 Hz), divided by the bandwidth (3.5 Hz).

E_{Theta}: total signal energy in the theta band (4-8 Hz), divided by the bandwidth (4 Hz).

E_{Alpha}: total signal energy in the alpha band (8-13 Hz), divided by the bandwidth (5 Hz).

E_{Beta}: total signal energy in the beta band (13-30 Hz), divided by the bandwidth (17 Hz).

F_{Delta}: average signal energy in the delta band (0.5-4 Hz).

F_{Theta}: average signal energy in the theta band (4-8 Hz).

F_{Alpha}: average signal energy in the alpha band (8-13 Hz).

F_{Beta}: average signal energy in the beta band (13-30 Hz).

We also define a first-order derivative feature to capture the difference between adjacent windows. It is the difference between the average energies of successive windows:

$D_{n} = W_{n} - W_{(n-1)}$

where W_{n} is the average energy in window n and W_{(n-1)} is the average energy in window n - 1.

Examining the features over 100 ms windows showed that the window-to-window changes are much more prominent in guilty subjects, which motivates the first-order derivative feature.
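The window and band features can be computed from any time-frequency energy map of a sweep (rows = time samples, columns = frequencies), for instance an SPWVD. This sketch assumes that a window's energy is the mean over its samples of the energy summed across frequency, and it interprets the band energy of the E features as the total (summed) energy. The text reports the derivative as a single feature without saying how the nine window differences are combined, so they are returned as a vector. All names are hypothetical.

```python
import numpy as np

BANDS_HZ = {"Delta": (0.5, 4.0), "Theta": (4.0, 8.0), "Alpha": (8.0, 13.0), "Beta": (13.0, 30.0)}


def tf_features(tfr, freqs, latency_index, n_windows=10):
    """Window, band and derivative features from a (time x frequency) energy map."""
    tfr = np.asarray(tfr, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    energy_per_sample = tfr.sum(axis=1)                       # energy at each time sample
    windows = np.array_split(energy_per_sample, n_windows)    # ~100 ms each for 256 samples
    w = np.array([segment.mean() for segment in windows])     # W_1 ... W_10
    feats = {
        "FLT": freqs[int(np.argmax(tfr[latency_index]))],     # dominant frequency at latency
        "MAX_w": w.max(),
        "MIN_w": w.min(),
        "DIF_w": w.max() - w.min(),
        "STD_w": w.std(),
    }
    for name, (lo, hi) in BANDS_HZ.items():
        band = tfr[:, (freqs >= lo) & (freqs < hi)]
        feats[f"E_{name}"] = band.sum() / (hi - lo)           # total energy / bandwidth
        feats[f"F_{name}"] = band.mean()                      # average energy in the band
    feats["dW"] = np.diff(w)                                  # W_n - W_(n-1), n = 2..10
    return feats
```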

The defined features (34 features) were extracted from all probe sweeps of all subjects.

In any classification task, some of the extracted features may be redundant. Such features increase the cost and running time of the system and decrease its generalization performance, so selecting the most discriminative features plays an important role in constructing classifiers. To identify the best features for classification, a sequential search method is applied. First, classification is performed separately with each individual feature, and the feature with the highest classification accuracy is selected. This feature is then combined with each of the other individual features to find the best pair. The process continues in this way until the best combination of features is obtained.
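The search described above is sequential forward selection. A minimal sketch, assuming leave-one-out 1-nearest-neighbor accuracy as the scoring function and stopping once no remaining feature improves the score; the text names neither the scoring classifier nor the stopping rule, and the function names are hypothetical.

```python
import numpy as np


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbor rule (hypothetical scoring choice)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a sample may not vote for itself
    return float(np.mean(y[np.argmin(d, axis=1)] == y))


def forward_selection(X, y, score_fn=loo_1nn_accuracy):
    """Greedy forward selection: repeatedly add the feature that most improves the score.

    Stops when no remaining feature increases the score (an assumed stopping rule).
    """
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    while remaining:
        scores = {j: score_fn(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = scores[j_best]
    return selected, best_score
```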

To reduce the dimensionality of the input features, improve classification performance and shorten the learning time, principal component analysis (PCA) is used. PCA finds an orthogonal transform that projects the features onto uncorrelated components ordered by decreasing variance, so that most of the variance is retained in fewer dimensions [53-57]. The feature sets obtained with feature selection (26 features) and without it (34 features) were reduced to 21 and 25 features, respectively, by PCA.
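A minimal PCA sketch via the singular value decomposition of the mean-centered feature matrix (rows = sweeps, columns = features). It keeps a chosen number of components, e.g. 21 for the 26 selected features; the function names are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA by SVD of the centered data; returns mean, components, explained ratios."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]                       # orthonormal principal directions
    explained = (s ** 2 / np.sum(s ** 2))[:n_components] # variance share per component
    return mean, components, explained


def pca_transform(X, mean, components):
    """Project feature vectors onto the retained principal components."""
    return (X - mean) @ components.T
```

Two design choices the text does not specify: because the features have very different units (µV, ms, Hz), standardizing each column before PCA is common, and in a leave-one-out evaluation the mean and components would normally be fitted on the training fold only.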

To discriminate between the probe responses of guilty and innocent subjects, a multilayer perceptron (MLP) neural network and a k-nearest neighbor (KNN) classifier were used. The features extracted from the probe sweeps of guilty subjects were compared with those of innocent subjects. For each subject, the probe sweeps of all other subjects, with their true labels (G-probe/I-probe), were used as training data, and the trained classifier was then applied to the probe sweeps of the held-out subject.

The first classifier is a three-layer MLP trained with the error back-propagation algorithm and a variable learning rate. The input layer has as many nodes as the input vector length. The output layer consists of a single node, since only two classes are to be distinguished. Networks with each candidate number of hidden neurons were trained and compared, and the optimal number was 5. The output node has a linear transfer function and the hidden layer uses a sigmoid function. Training continued until the mean squared error fell below 0.01 or the number of training iterations reached 1000. Because of the limited size of the data set, leave-one-out cross-validation was used for training [58,59]: at each stage one observation was held out as test data and the remaining 31 were used for training, and this was repeated 32 times. The network error was computed at each step and then averaged. One advantage of this approach is that every observation is used in both training and testing.
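A NumPy sketch of such a network, assuming full-batch gradient descent on the mean squared error, targets of 1 (guilty) and 0 (innocent) with a 0.5 decision threshold, and a simple adaptive rule that multiplies the learning rate by 1.05 when the error falls and by 0.7 when it rises. These constants, the weight initialization and the function names are assumptions, not the authors' settings.

```python
import numpy as np


def _sigmoid(a):
    return 1.0 / (1.0 + np.exp(-np.clip(a, -60.0, 60.0)))


def train_mlp(X, y, n_hidden=5, lr=0.05, goal_mse=0.01, max_epochs=1000,
              lr_inc=1.05, lr_dec=0.7, seed=0):
    """Three-layer MLP: sigmoid hidden layer, one linear output node, batch back-propagation.

    The learning rate grows by lr_inc while the MSE falls and shrinks by lr_dec when it
    rises (assumed rule); training stops at goal_mse or after max_epochs.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    t = np.asarray(y, dtype=float).reshape(-1, 1)    # 1 = guilty, 0 = innocent
    w1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    prev_mse = np.inf
    for _ in range(max_epochs):
        hidden = _sigmoid(X @ w1 + b1)
        err = hidden @ w2 + b2 - t                   # linear output minus target
        mse = float(np.mean(err ** 2))
        if mse < goal_mse:
            break
        lr *= lr_inc if mse < prev_mse else lr_dec
        prev_mse = mse
        d_out = 2.0 * err / len(X)                   # dMSE / d(output)
        d_hidden = (d_out @ w2.T) * hidden * (1.0 - hidden)
        w2 -= lr * hidden.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * X.T @ d_hidden
        b1 -= lr * d_hidden.sum(axis=0)
    return w1, b1, w2, b2


def predict_mlp(params, X):
    """Threshold the linear output at 0.5 to get class labels."""
    w1, b1, w2, b2 = params
    out = _sigmoid(np.asarray(X, dtype=float) @ w1 + b1) @ w2 + b2
    return (out.ravel() >= 0.5).astype(int)
```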

The k-nearest neighbor (KNN) algorithm is one of the most effective non-parametric methods in pattern recognition [
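A sketch of a KNN classifier inside the leave-one-subject-out scheme described for the evaluation, assuming Euclidean distance, k = 5 and binary labels (1 = guilty, 0 = innocent). The text does not give k or the rule for turning sweep-level decisions into a verdict per subject, so the function returns the mean sweep-level accuracy; function names are hypothetical.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=5):
    """Majority vote among the k nearest training sweeps (Euclidean distance)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)   # odd k avoids ties


def leave_one_subject_out(X, y, subject_ids, k=5):
    """Train on all other subjects' probe sweeps, test on the held-out subject's sweeps."""
    accuracies = []
    for subject in np.unique(subject_ids):
        test = subject_ids == subject
        pred = knn_predict(X[~test], y[~test], X[test], k=k)
        accuracies.append(np.mean(pred == y[test]))
    return float(np.mean(accuracies))
```

Holding out whole subjects, rather than single sweeps, keeps sweeps from the same person out of both training and test sets, which matches the per-subject protocol in the text.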

In this research, many types of features were investigated for classifying guilty and innocent subjects. After feature extraction, we sought the best combinational feature vector through feature selection. To improve classifier performance and reduce execution time, PCA was applied, and guilty and innocent subjects were finally classified with the classifier that achieved the best accuracy. Mean classification accuracies for both the MLP neural network and the KNN classifier are given in

P300-based GKT, as a new method for the psychophysiological detection of concealed information, was tested on innocent and guilty subjects who were concealing information regarding a mock crime committed as a part of the experiment.

As can be seen in

These results are therefore consistent with our expectation that probe stimuli elicit a larger P300 in guilty subjects than in innocent ones. In addition, the significant increase in the positive-area and total-area features and the significant decrease in all frequency features for guilty subjects are consistent with the appearance of a low-frequency positive peak (P300) in the brain response, in agreement with previous studies in [

In this research, following the oddball paradigm, some changes were made to the common GKT methods and to the way the reference test was implemented [

Finally, comparing our findings with the results reported in [

TF: Time-Frequency; ERP: Event-Related Potentials; fMRI: Functional Magnetic Resonance Imaging; GKT: Guilty Knowledge Test; EEG: Electroencephalography; P: Probe; I: Irrelevant; T: Target; PCA: Principal Component Analysis; MLP: Multilayer Perceptron; KNN: K-Nearest Neighbor; TFD: Time-Frequency Distribution; SPWVD: Smoothed Pseudo Wigner-Ville Distribution; ZCD: Zero Crossings Density.