Keystroke Dynamics Based Authentication Using Information Sets

This paper presents a keystroke dynamics based authentication system using the information set concept. Two types of membership functions (MFs) are computed: one based on the timing features of all the samples and another based on the timing features of a single sample. These MFs lead to two types of information components (spatial and temporal), which are concatenated and modified to produce different feature types. The Two-Component Information Set (TCIS) is proposed for keystroke dynamics based user authentication. The keystroke features are converted into TCIS features, which are then classified by SVM, Random Forest and the proposed Convex Entropy Based Hanman Classifier. The TCIS features are capable of representing the spatial and temporal uncertainties. The performance of the proposed features is tested on the CMU benchmark dataset in terms of error rates (FAR, FRR, EER) and accuracy. In addition, the proposed features are tested on an Android touch-screen based mobile keystroke dataset. The TCIS features improve the performance, giving lower error rates and better accuracy than the existing features in the literature.


Introduction
Security has been a concern since the advent of computers. The need for robust and ubiquitous security systems is more apparent due to the widespread use of the Internet and rapidly growing online business transactions, e-banking, shopping, social interactions and emails, to name a few. User authentication involves both identification and verification, with keystroke timing information as the feature subset. For recording the timestamps, special scan codes are used in the interrupt handler of the standard keyboard. When two keys are pressed such that the first key is not yet released when the second gets pressed, a negative time measurement occurs, which is a limitation. To overcome this, a modified latency measurement is suggested in [11]. A combination of key hold time and digraph latency metrics is used in [12] to reduce the error drastically. The features used in [6] are of four types: key code (ASCII code of the key being pressed) and three timing features: Down-Down time (DD), Up-Down time (UD) and Down-Up time (DU). The first two timing features denote the inter-key latencies and the third indicates the hold time.
For authentication that involves both identification and verification of a user by a keystroke dynamics based system, many classifiers have been used. They are divided into three broad categories, viz. statistical methods, neural networks and pattern recognition based techniques.
The statistical methods in the first category apply statistical tools to the basic keystroke features and use a distance metric to authenticate a user. The initial work on keystroke dynamics by Gaines et al. [7] performs a t-test on digraph features to check the similarity of the mean vectors and covariance matrices of two multivariate normal populations, giving an FAR of 0% and an FRR of 4%. But this is difficult to achieve in a real-life situation where the number of users is small.
Umphress and Williams [13] identify a user by comparing the keystroke latencies and digraphs of the test sample with a reference profile comprising the mean keystroke latency and the average time to press two consecutive keys. A confidence score is specified to achieve an FAR of 17% and an FRR of 30%. Joyce and Gupta [10] have developed a mean reference signature consisting of a set of four vectors of keystroke latencies for the username, password, first name and last name.
The norm is computed between the test keystroke pattern and the reference signature, and the user is authenticated against a predefined threshold; by this, an FAR of 0.25% and an FRR of 16.67% are achieved. Teh et al. [14] have proposed a statistical fusion approach for a keystroke dynamics based recognition system; they authenticate a user using the weighted sum of Gaussian scores and Direction Similarity Measure based scores.
We now detail the neural network based approaches under the second category. Giroux et al. [15] have used keypress ratios as a measure of authentication, with a dedicated Artificial Neural Network (ANN) employed for the authentication of each user. A function f: R^(m−1) → {−1, 1} is learned by the ANN, where x ∈ R^(m−1) denotes the m − 1 keypress interval timing ratios for an m-character password and f(x) ∈ {−1, 1} indicates whether the input keypress interval ratios correspond to that user or not. For every individual, a feed-forward ANN is trained with back-propagation, resulting in weights that are subsequently used for authentication. Bleha et al. [16] have used a linear perceptron to authenticate users and reported error rates FAR and FRR of 9% and 8% respectively. The pattern recognition based techniques falling under the third category are now discussed. A Support Vector Machine (SVM) based on keystroke latency in [17] gives an FAR of 0.02 and an FRR of 0.1 for 10 users. The keystroke latency and key hold time are used as features for a k-nearest neighbour classifier in [18]. Classical pattern recognition algorithms such as back-propagation with a sigmoid transfer function, sum of products, hybrid sum of products, Bayes' decision theory and the potential function method are used in [19] for combining key hold times and inter-key latencies. Among the techniques used in [19], the potential function gives the best results, with FAR and FRR of 0.7% and 1.9% respectively.

Motivation for the Present Work
From the literature survey, it can be seen that most approaches to keystroke dynamics are evaluated on self-created datasets, and they report results on either desktop or mobile data but not both. It is difficult to compare the performance of different approaches due to the lack of a common benchmark dataset. So we have tested the proposed approach on benchmark datasets under both desktop and mobile environments, and the results obtained are found to be superior to the best so far.
The organization of the paper is as follows: Section 2 presents the information set (IS) and some of its properties; it also formulates the IS based features and higher forms of IS features. Section 3 develops an algorithm for the two-component information set approach. Section 4 describes the databases used in the present work and Section 5 discusses the results of implementation. Section 6 gives the conclusions and future work.

An Introduction to Information Set
A fuzzy set deals with vagueness or fuzziness [20]. It is characterized by a membership function (MF) that maps the information source values to degrees of association in the range [0, 1]. The MF value of x_i in a fuzzy set F is denoted by μ_F(x_i).

Given a collection of attribute values {x_1, x_2, …, x_n}, a fuzzy set suffers from some drawbacks [21]: i) the values of the MF are separate from the information source values, and there is no way to link the two into a single entity; ii) the MF does not provide the overall fuzziness/vagueness of F but only the degree of association of every information source value to a vague concept; and iii) time-varying information source values are not easily represented by an MF. To eliminate these drawbacks of a fuzzy set, Hanmandlu and his co-workers have developed information set theory, which can be found in [21]-[27], based on the information theoretic entropy function christened the Hanman-Anirban entropy function. The properties of information sets given later in this section will highlight their power.
Our primary goal being the representation of the overall uncertainty in keystroke dynamics, we are inclined to investigate the suitability of information set based features. We now discuss how a fuzzy set paves the way for the information set while representing the uncertainty in its elements using an entropy function.

Information Set Concept
Consider a set of keystroke timing features T = {T_ij}, where T_ij is the j-th feature in the i-th keystroke sample. When a set of keystroke timing features is fitted with a membership function, denoted by {μ_ij}, each pair of a keystroke timing value and its membership value forms an element of a fuzzy set. An information set connects the two components of each pair into a single entity called the information value using the Hanman-Anirban entropy function [22], which has the facility to represent both probabilistic and possibilistic uncertainties. The probabilistic uncertainty in a fuzzy set is defined by the Hanman-Anirban entropy function, which has a polynomial in its exponential gain function:

H = Σ_i p_i e^{−(a p_i³ + b p_i² + c p_i + d)}    (1)

where p_i are the probabilities and a, b, c and d are real-valued parameters that need to be selected appropriately. As shown in [23], the possibilistic uncertainty is a better representation of uncertainty than the probabilistic uncertainty given by Equation (1). Moreover, the number of probabilities is limited in the context of keystroke dynamics; this is why we explore the possibilistic uncertainty.
To bring Equation (1) into the information set domain, let us call the keystroke timing features T_ij the information source values. We replace the probability p_i with T_ij in Equation (1) and convert the exponential gain function into the Gaussian membership function by selecting the parameters as a = 0, b = 1/(2σ²), c = −T_ref/σ² and d = T_ref²/(2σ²), so that the gain becomes

μ_ij = e^{−(T_ij − T_ref)²/(2σ²)}    (2)

A more general entropy function is presented by Mamta and Hanmandlu in [24]. This entropy function not only converts the exponential gain into the generalized Gaussian membership function with an exponent power of β but also modifies the information source values with a power of α. It is defined as

H = Σ_ij T_ij^α e^{−(c T_ij + d)^β}    (3)

where, with a suitable choice of c and d, the gain function becomes μ_ij^β. The product of the information source value and the membership function, T_ij^α μ_ij^β, is termed the information value and is more general than the one arising from Equation (2).
The sum of all the information values,

H = Σ_i Σ_j T_ij^α μ_ij^β,

gives the effective information. In this work, we use only the information values as features.
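A minimal numerical sketch of the information-value computation described above, assuming a Gaussian membership function fitted to the timing features (the function name and the sample timings are illustrative, not from the paper):

```python
import numpy as np

def information_values(T, alpha=1.0, beta=1.0):
    """Return H_ij = T_ij^alpha * mu_ij^beta for an array of timing features."""
    t_ref = T.mean()                 # reference value (here: the mean timing)
    sigma = T.std() + 1e-12          # spread; epsilon guards against zero variance
    mu = np.exp(-((T - t_ref) ** 2) / (2.0 * sigma ** 2))   # Gaussian MF
    return (T ** alpha) * (mu ** beta)

T = np.array([[0.10, 0.21], [0.12, 0.19]])   # 2 samples x 2 timing features (seconds)
H = information_values(T)
effective_information = H.sum()              # sum of all information values
```

Since 0 < μ_ij ≤ 1, each information value is at most the corresponding source value when α = β = 1.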

Definition of an Information Set
The set {T_ij^α μ_ij^β} is called the information set, such that each information value is the product of an information source value and the corresponding membership function value. The values of α and β need to be selected appropriately.

Some Properties of Information Sets
The properties of information sets are presented in [25]. The important properties are: 1) The membership function can be empowered to act as an agent with capabilities beyond the scope of a fuzzy set. For example, the complement of a membership function can be an agent; any intuitionistic membership function can also be a contender; and the membership function can even be formed from information source values not associated with the same fuzzy set. Thus an agent extends the scope of a fuzzy set.
2) Higher forms of information sets, called transforms, can be derived from the information values. This is shown in the sequel.
3) The information set arises from representing the varying information source values in either time or space. For example, a variation in the keystroke data within a sample gives the spatial information values, whereas the variation in keystroke timings over a number of samples gives the temporal information values.
4) An information set can represent both probabilistic and possibilistic uncertainties. To represent the probabilistic uncertainty, the frequencies of occurrence of the information source values, called the probabilities, are considered; for the possibilistic uncertainty, attribute values such as the keystroke timing values are considered.

Derivation of Information Set Based Features
We now derive the information set based features. The use of basic information set features such as the sigmoid and energy features appears in [26]. It is important to note that our unit of information is either the information value T_ij^α μ_ij^β or the complement information value T_ij^α μ̄_ij^β.

b) Complement Information Value
As per the second property of information sets stated above, the complement of the membership function, μ̄_ij = 1 − μ_ij, is found to be useful as an agent: an empowered membership function with a scope extending beyond that of a fuzzy set. As a result, the complement information value T_ij^α μ̄_ij^β serves as a feature. Note that the complement membership function has its domain outside the fuzzy set.

c) Energy Feature
As the information value depends on the membership function empowered as an agent, we can generate different kinds of information values by changing the agent. To generate the energy feature, the agent is taken as the square of the membership function, so the energy feature is T_ij^{2α} μ_ij^{2β} and the complement energy feature is T_ij^{2α} (1 − μ_ij)^{2β}.
d) Sigmoid Feature
According to the first property of information sets, the information value, considered as a unit of information, can be modified by applying a function such as the sigmoid function; the effectiveness of the information value (feature) is enhanced by this modification. The sigmoid feature is defined as

S_ij = 1 / (1 + e^{−T_ij^α μ_ij^β})

e) Multi-Quadratic Feature
The multi-quadratic function either increases or decreases monotonically from the center. Using this function, the membership function is computed as

μ_ij^{mq} = √((T_ij − T_ref)² + c²)

where c is a constant. The multi-quadratic information value is T_ij^α (μ_ij^{mq})^β. The inverse multi-quadratic function is the reverse of the multi-quadratic function.
The membership function for the inverse multi-quadratic feature is

μ_ij^{imq} = 1 / √((T_ij − T_ref)² + c²)

and the inverse multi-quadratic information value is therefore T_ij^α (μ_ij^{imq})^β.
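A sketch of the feature variants described above. The exact parameterizations are partly elided in the paper, so the forms used here (energy as the squared information value, multiquadric offset c) are assumptions for illustration:

```python
import numpy as np

def gaussian_mf(T):
    # Gaussian membership function with the mean as reference value
    t_ref, sigma = T.mean(), T.std() + 1e-12
    return np.exp(-((T - t_ref) ** 2) / (2.0 * sigma ** 2))

def feature_family(T, alpha=1.0, beta=1.0, c=0.1):
    mu = gaussian_mf(T)
    info = T ** alpha * mu ** beta            # basic information value
    comp = T ** alpha * (1.0 - mu) ** beta    # complement information value
    energy = info ** 2                        # energy feature (squared unit of information)
    sigmoid = 1.0 / (1.0 + np.exp(-info))     # sigmoid feature
    dist = np.sqrt((T - T.mean()) ** 2 + c ** 2)
    mq = T ** alpha * dist ** beta            # multiquadric information value
    imq = T ** alpha / dist ** beta           # inverse multiquadric information value
    return info, comp, energy, sigmoid, mq, imq

T = np.array([0.10, 0.21, 0.12, 0.19])        # illustrative timing features
info, comp, energy, sigmoid, mq, imq = feature_family(T)
```

Changing the agent (μ, its complement, or the multiquadric form) is all that distinguishes these feature types; the unit of information stays a source value weighted by its agent.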

Higher Form of Information sets
So far, we have utilized the basic information values for deriving different features. We will now derive higher forms of information set based features. This requires the adaptive Mamta-Hanman entropy function, in which the parameters of the exponential gain function are assumed to be variables.

Some important properties of this adaptive entropy function are relegated to Appendix A.

a) Hanman Transform
The Hanman transform is a higher form of information derived from the adaptive Mamta-Hanman entropy function in [24]; the use of this transform appears in [27]. The idea of the transform is to use the first-level information values in getting the second-level information values, and thus to obtain a better representation of the uncertainty in the information source values. The Hanman transform (HT) is defined as

HT = Σ_ij T_ij^α e^{−T_ij^α μ_ij^β}    (12)

b) Shannon Transform
Again, we resort to the adaptive Mamta-Hanman entropy function (3) and set its parameters so as to obtain the Pal-Pal transform

PP = Σ_ij T_ij^α μ_ij^β e^{1 − T_ij^α μ_ij^β}

This can be shown to be equivalent to what we term the non-linear Shannon transform

Sh = −Σ_ij T_ij^α μ_ij^β log(T_ij^α μ_ij^β)    (14)

where the logarithmic function operates on the information values. In some applications, the use of the complement of μ_ij in the transform improves its effectiveness. The Shannon inverse transform, in which the evaluation of the information source values is based upon the complement agent, is expressed as

S̄h = −Σ_ij T_ij^α μ̄_ij^β log(T_ij^α μ̄_ij^β)

In the above transforms, the Gaussian membership function defined in Equation (5) is best suited. These transforms can have realistic applications in social networks, though this has not been attempted so far. For example, we gather information about an unknown person of some interest to us; this is the first level of information. We then evaluate him again to get the second level of information, compared with the first level. The transforms can be used to evaluate not only the information source values but also the membership function values, to see whether the selected membership function is appropriate.

c) Composite Transform
For creating the sigmoid and energy features, we considered the basic information value T_ij^α μ_ij^β as the unit of information. To create the composite transform, we instead consider the Hanman transform feature as the unit of information and apply the log function to it, leading to the composite transform

C_ij = log(T_ij^α e^{−T_ij^α μ_ij^β})    (18)

which is the ij-th component of the transform C = Σ_ij log(T_ij^α e^{−T_ij^α μ_ij^β}). By interchanging the log and exponential functions we can formulate yet another composite transform,

C'_ij = e^{−T_ij^α μ_ij^β log(T_ij^α μ_ij^β)}    (19)

The difference between Equations (18) and (19) is that in the former the log function is applied to the Hanman transform, whereas in the latter the exponential function is applied to the Shannon transform. In this paper, we show the results of Equation (18).
The complement composite transform is easily obtained by considering the complement Hanman transform as the unit of information and applying the log function to it: C̄_ij = log(T_ij^α e^{−T_ij^α μ̄_ij^β}).
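A sketch of the second-level transforms for α = β as passed in. The forms below (HT as T·exp(−Tμ), Sh as −Tμ·log(Tμ), and the composite as the log of the Hanman transform) follow the text, but since the exact equations are elided in the extraction they should be read as reconstructions:

```python
import numpy as np

def transforms(T, mu, alpha=1.0, beta=1.0, eps=1e-12):
    info = T ** alpha * mu ** beta          # first-level information values
    hanman = T ** alpha * np.exp(-info)     # gain argument = first-level info value
    shannon = -info * np.log(info + eps)    # log acting on the information values
    composite = np.log(hanman + eps)        # log applied to the Hanman transform
    return hanman, shannon, composite

T = np.array([0.11, 0.18, 0.25])            # illustrative timings
mu = np.exp(-((T - T.mean()) ** 2) / (2.0 * (T.std() + 1e-12) ** 2))
hanman, shannon, composite = transforms(T, mu)
```

Because the gain e^{−Tμ} never exceeds 1 for non-negative information values, the Hanman transform never exceeds the source values themselves.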

d) Convex Hanman-Anirban Entropy Function
Let φ_i be a convex, twice-differentiable function and ψ a convex, twice-differentiable and strictly increasing function. Then the generalized mean can be written as [28]

M(x) = ψ(Σ_i w_i φ_i(x_i))    (21)

where the weights satisfy w_i ≥ 0 and Σ_i w_i = 1, and its second derivative is non-negative, so M is convex. Taking ψ as the identity and φ_i(x) = x e^x (which is convex for x ≥ 0) and substituting in (21) yields the convex entropy function

H_c = Σ_i w_i x_i e^{x_i}    (25)

If we take x_i = T_i μ_i, which is the unit of information, in Equation (25), we obtain the convex Hanman-Anirban entropy function

H_c = Σ_i w_i T_i μ_i e^{T_i μ_i}

We can use this in the design of a classifier, or to modify one; here we use it to modify the Hanman classifier.

The Two-Component Information Set (TCIS)
We have at our disposal several samples of keystroke dynamics for each user. To calculate the membership functions, we adopt the two-component information set approach. In this approach, the temporal information I1 is the first component, for which the membership function μ1 is computed using all the training samples. The spatial information I2 is the second component, for which the membership function μ2 is computed using all the features within a single sample. These membership functions, along with T, give us the two components.
Step 4: Concatenate I1 and I2 and generate new features such as the information value, energy, sigmoid, Hanman transform, etc. Then train the SVM/Random Forest classifier or the Convex Entropy Based Classifier using these features.
Step 5: For each test sample, compute I1 using the T_avg1 and σ1 computed in Step 1.
Step 6: For each test sample, compute I2 using the T_avg2 and σ2 computed in Step 2.
Step 7: Concatenate I1 and I2 to obtain I, and use the new features for classification using the SVM/Random Forest classifier or the Convex Entropy Based Classifier.
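The two-component extraction in the steps above can be sketched as follows; the function and variable names are illustrative, and the random matrix stands in for a user's training samples:

```python
import numpy as np

def tcis_features(samples):
    """samples: (n_samples, n_features) matrix of keystroke timings."""
    # Temporal component: statistics over all training samples, per feature
    t_avg1 = samples.mean(axis=0)
    s1 = samples.std(axis=0) + 1e-12
    mu1 = np.exp(-((samples - t_avg1) ** 2) / (2.0 * s1 ** 2))
    I1 = samples * mu1
    # Spatial component: statistics over the features within each single sample
    t_avg2 = samples.mean(axis=1, keepdims=True)
    s2 = samples.std(axis=1, keepdims=True) + 1e-12
    mu2 = np.exp(-((samples - t_avg2) ** 2) / (2.0 * s2 ** 2))
    I2 = samples * mu2
    # Concatenate the two information components into the TCIS feature vector I
    return np.hstack([I1, I2])

X = np.abs(np.random.default_rng(0).normal(0.15, 0.05, size=(8, 21)))  # 8 samples, 21 timings
F = tcis_features(X)
```

The concatenated vector doubles the feature dimension; these TCIS vectors are what the classifiers in Steps 4 and 7 consume.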

Design of Hanman Classifier (HC) Using Convex Entropy Function
As I is a feature vector, let us denote the training feature vector of the r-th sample of the l-th user by P_l(r, k), and let Q(k) be the test feature vector, where k refers to the k-th feature value. The training and testing feature vectors are subjected to min-max normalization. In view of Equation (25), the test feature vector is rewritten as Q(k) e^{Q(k)}; similarly, each training feature vector can be written as P_l(r, k) e^{P_l(r, k)}.

Use of the Conditional Entropy Function
The conditional Hanman-Anirban entropy, termed here the conditional possibility c_poss, of a test feature vector Q(k) given the training feature vector P_l(r, k) is expressed, following [25], in terms of the error vector E_r(k) = |Q(k) − P_l(r, k)|. The conditional possibility of the intersection of two training feature vectors given the test feature vector can be written using a t-norm of the corresponding error vectors:

E(k) = t_F(E_i(k), E_j(k))    (30)

As the t-norm is a conjunction operator, it gives the minimum difference between any two vectors in (30). We use the Frank t-norm for t_F, as it is found to be the most effective [24]; it is given by

t_F(a, b) = log_s(1 + (s^a − 1)(s^b − 1)/(s − 1)),  s > 0, s ≠ 1    (31)

We call E(k) the normed error vector, as it is the result of applying the t-norm to a pair of error vectors. We now invoke the convex entropy function to represent the uncertainty in the normed error vectors.
To improve the above convex entropy function, we convert it into the parametric form

h(l) = Σ_k E(k) e^{γ E(k) + ρ}    (33)

where γ and ρ are parameters whose values are selected experimentally on the keystroke dataset. We compute h(l) over all pairs (i, j) of training samples, and the minimum value associated with l gives the identity of the unknown user; the criterion function is therefore selected as

l* = arg min_l h(l)
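A minimal sketch of this matching step, under the assumption that the score follows Equation (33) with the Frank t-norm of (31); the parameter values s, γ and ρ are illustrative (the paper tunes them experimentally):

```python
import numpy as np

def frank_tnorm(a, b, s=2.0):
    # Frank t-norm: log_s(1 + (s^a - 1)(s^b - 1)/(s - 1))
    return np.log1p((s ** a - 1.0) * (s ** b - 1.0) / (s - 1.0)) / np.log(s)

def hanman_score(Q, P, gamma=1.0, rho=0.0):
    """Score test vector Q against one user's training matrix P (r samples x k features)."""
    E = np.abs(P - Q)                       # error vector of each training sample
    best = np.inf
    for i in range(len(E)):                 # all pairs (i, j) of training samples
        for j in range(i + 1, len(E)):
            Ek = frank_tnorm(E[i], E[j])    # normed error vector, Eq. (30)-(31)
            h = np.sum(Ek * np.exp(gamma * Ek + rho))   # parametric convex entropy, Eq. (33)
            best = min(best, h)
    return best

# The claimed identity is the user l whose training matrix yields the minimum score.
```

A perfect match gives zero error vectors, hence a zero score; larger normed errors inflate the score exponentially through the e^{γE} gain.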

Description of Databases Used
For the evaluation of the keystroke dynamics based authentication system, the following publicly available datasets are used:
a) CMU Keystroke Dynamics Benchmark Dataset [1]. This database, comprising 51 users, is collected over 8 sessions, with 50 repetitions of the same password recorded in each session; we thus have 400 samples per user. The authentication accuracy is evaluated using the EER (Equal Error Rate), the point on the ROC curve where the False Acceptance Rate (FAR) equals the False Rejection Rate (FRR).
FAR is the rate at which an unauthorized person (i.e. an imposter) would be given access to the system as a genuine user [30], whereas FRR is the rate at which an authorized user would be denied access to the system, being considered an imposter. FAR is calculated as the ratio of imposters granted access to the total number of imposter attempts, while FRR is calculated as the ratio of genuine users denied access to the total number of genuine attempts.
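A small worked example of these two definitions, with made-up attempt counts:

```python
def far(imposters_accepted, imposter_attempts):
    # imposters wrongly granted access / total imposter attempts
    return imposters_accepted / imposter_attempts

def frr(genuine_rejected, genuine_attempts):
    # genuine users wrongly denied access / total genuine attempts
    return genuine_rejected / genuine_attempts

print(far(3, 200))   # 3 of 200 imposter attempts accepted -> 0.015
print(frr(5, 100))   # 5 of 100 genuine attempts rejected  -> 0.05
```

Sweeping the decision threshold trades FAR against FRR; the EER is read off where the two curves cross.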
For the SVM and Random Forest classifiers, the performance measure EER is calculated for each set of genuine and imposter users, i.e. 51 × 50 sets of experiments. The mean of the EERs is then calculated over all the experiments. In addition to the EER, we also report the FAR, FRR and authentication accuracy.
For the convex entropy based classifier, the performance is reported in terms of the EER (mean), FAR, FRR and accuracy. Accuracy is calculated as the ratio of the number of users correctly classified as genuine/imposter to the total number of user attempts across all the experiments. FAR is calculated as the ratio of the number of users incorrectly accepted as genuine to the total number of imposter attempts across all the experiments. FRR is calculated as the ratio of the number of users incorrectly rejected as imposters to the total number of genuine attempts across all the experiments.
b) Sapientia University Keystroke Benchmark Dataset for the Android platform [31]. This data is collected from 42 users, with 51 samples per user over at least 2 sessions per user. Each user types the password ".tie5Roanl", which requires 14 key presses. We have used all 71 features of the dataset, given in Table 1, for our work. (The Random Forest classifier operates on the principle of majority voting to obtain the classification; in addition to the two standard classifiers, the third is the proposed convex entropy based classifier discussed in Section 3.2.)

Results of Implementation
Before presenting our results, let us review the state of the art on keystroke dynamics in the literature. Table 2 shows the EERs of some of the best-performing algorithms on the CMU dataset. The first algorithm in Table 2 is an anomaly detector that uses the Manhattan distance [1] [32]. This method computes the mean of the timing samples and the mean absolute deviation of each feature [32]. Given a test feature vector, a distance score is calculated using the scaled Manhattan distance

d = Σ_i |x_i − y_i| / a_i

where x_i and y_i are the i-th components of the test feature vector and the mean vector respectively, and a_i is the mean absolute deviation of the i-th feature.
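A sketch of this anomaly detector on stand-in data (the training matrix and its dimensions are illustrative):

```python
import numpy as np

def scaled_manhattan(train, x):
    """Scaled Manhattan score: sum_i |x_i - y_i| / a_i."""
    y = train.mean(axis=0)                        # per-feature mean timing vector
    a = np.abs(train - y).mean(axis=0) + 1e-12    # per-feature mean absolute deviation
    return np.sum(np.abs(x - y) / a)

rng = np.random.default_rng(1)
train = rng.normal(0.15, 0.03, size=(20, 21))     # 20 training samples, 21 timing features
```

A test vector identical to the training mean scores 0; larger scores indicate a likely imposter, with a threshold deciding acceptance.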
Zhong et al. [33] have developed a new distance metric by combining the Mahalanobis and Manhattan distances:

d(x, y) = ||S^{−1/2}(x − y)||_1

where S^{−1/2} is the inverse of the principal square root of the covariance matrix S.
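A sketch of this combined metric, computing S^{−1/2} via eigendecomposition (the ridge term 1e-6·I is an assumption added here to keep the covariance invertible on small samples):

```python
import numpy as np

def mahalanobis_manhattan(train, x):
    """Manhattan distance taken after whitening with S^{-1/2}."""
    y = train.mean(axis=0)
    S = np.cov(train, rowvar=False) + 1e-6 * np.eye(train.shape[1])
    w, V = np.linalg.eigh(S)                       # S = V diag(w) V^T
    S_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T      # inverse principal square root
    return np.sum(np.abs(S_inv_sqrt @ (x - y)))

rng = np.random.default_rng(2)
train = rng.normal(size=(30, 5))                   # illustrative training matrix
```

Whitening decorrelates and rescales the features, so the subsequent L1 norm weighs each direction by its (co)variance rather than treating the features as independent.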
Deng and Zhong [29] have used Deep Belief Networks by stacking together Gaussian RBM (Restricted Boltzmann Machine) with 31 visible units and 100 hidden units, and a binary RBM with 100 visible units and 100 hidden units, and obtained a mean EER of 0.035.
Table 3 shows the EERs obtained on SU dataset in [31] for two-class classifiers using all 71 features as shown in Table 1.These features also include touch based features such as finger area and key press pressure.
The performance of the different information set based features with α = 1 on the CMU dataset is listed in Table 4 in terms of FAR, FRR, EER and accuracy, using SVM as the classifier. The average ROC for the various information set features with α = 1 on the CMU dataset is shown in Figure 2, using SVM as classifier.
The features of Table 4 are applied to the same CMU dataset with α = 2 using SVM, and the results are given in Table 5. Here the best EER of 0.0225 is obtained with the sigmoid features. Comparing Table 4 and Table 5, we note that the sigmoid feature is best in terms of EER. The average ROC for the various information set features with α = 2 on the CMU dataset is shown in Figure 3, using SVM. The features used in Table 4 and Table 5, along with the additional features, contribute to the EERs and accuracy figures in Table 8 on the same CMU data with α = 1 but with the Random Forest classifier. In this case the best EER of 0.0103 is obtained with the Hanman transform. The averages of the ROCs for some of the features are shown in Figure 6.
The features shown in Table 8 are now obtained with α = 2. The Random Forest is used on the CMU dataset and the results are given in Table 9. The mean ROC curves for some of these features are displayed in Figure 7.
The results of some of the features of Table 8 and Table 9 on the SU dataset with Random Forest for α = 1 are given in Table 10. With this classifier, the best EER is obtained with the information value and energy features; the composite transform lags slightly behind them in performance. The average ROC for these features is shown in Figure 8.
These features are also tested with α = 2 on the SU dataset with the Random Forest classifier, and the results are shown in Table 11. The mean ROC for these features is shown in Figure 9. The best EER is obtained using the Hanman transform.

Appendix A: Adaptive Mamta-Hanman Entropy Function
Let us make H in (3) adaptive by considering the parameters as functions rather than constants, as per the original definition in [24]. For simplicity, we consider the adaptive 1D form

H = Σ_i p_i e^{−(c_i p_i + d_i)}

in which the parameters c_i and d_i are assumed to be variables. Since the exponential gain function is bounded, so is H bounded. 3) With an increase in p_i, the gain function decreases. To prove that H is concave, the Hessian matrix must be negative definite.
The Hessian is computed as follows: its off-diagonal entries are zero and its diagonal entries are negative. 7) The entropy is minimum if and only if all the p_i except one are equal to zero and the single remaining p_i = 1.
To obtain a better representation of uncertainty, we introduce higher forms of uncertainty representation.
a) Information Value
The basic information values H_ij = T_ij^α μ_ij^β can also be used as features in our study. The membership function is taken to be Gaussian, with the variance σ² computed from the T_ij values. One can take any value such as the mean, maximum or median as the reference.

Substituting the first-level information values into the gain function of (3), we obtain (12). Note that the exponential gain function has as its argument the first-level information value, and after evaluation using Equation (12) we get the second-level information value. This is called a transform because the original information source value T_ij is modified by the information value T_ij μ_ij. The complement Hanman transform is easily obtained by replacing μ_ij with μ̄_ij = 1 − μ_ij. The Shannon transform is an offshoot of the Hanman transform, as it can only be derived from it; its features have been shown to be useful in face recognition in [22]. The Shannon transform (Sh) features are computed from Equation (14).

The spatial membership function μ2 uses the mean T_avg2 and standard deviation σ2 of a single sample. The concatenation of these two information components results in the two-component information set features, denoted by I. The flowchart in Figure 1 explains how the features are computed in both the training and test parts; the features from these parts go to a classifier for the authentication of a keystroke sample.
Algorithm:
Step 1: Compute the mean T_avg1 and the variance σ1² over all the training samples.
Step 2: Compute the mean T_avg2 and the variance σ2² over all the features in a single training sample.

Step 3: Compute μ1 using T_avg1 and σ1, and then compute μ2 using T_avg2 and σ2.
Figure 1. Flow chart for authentication using TCIS features.
This parametric entropy, which is no longer restricted to a convex form, follows from the Mamta-Hanman entropy function with a proper substitution of parameters; the best values of the parameters are found experimentally on the keystroke dataset.
The CMU benchmark dataset has the keystroke features DD (Down-Down) time, UD (Up-Down) time and H (Hold) time. A 10-character password (.tie5Roanl) is typed by each user. In our study, we have used H and UD, since they give the best results. Accordingly, we have 21 features: 11 H values for the 10 characters and the Enter key, and 10 UD latencies between the 11 key presses. Considering each of the 51 users as both genuine and imposter, we have a pool of 51 × 50 sets of experiments. Half of the feature vectors of every user in each session are treated as training data and the remaining half as positive test data, i.e. 200 samples each. In addition, the first 5 samples from each of the remaining users are used as negative test data in every experiment. As demonstrated in [29], including the background users' data during the training phase in keystroke dynamics reduces the error rates significantly. Similarly, to train a classifier, we take the first 4 samples of each of the remaining users as negative training data, resulting in 196 samples. The classifier is trained in each experiment such that the samples of the imposter are not visible to the classifier during training.
The data is collected by typing the password ".tie5Roanl" on Android based mobile devices, a Nexus 7 tablet and an LG Optimus L7 II P710 phone. The key sequence resulting from typing the password is ". t i e [123?] 5 [abc] [Shift] R [Shift] o a n l".

For every user, 45 samples are randomly selected as training data and the remaining 6 samples constitute the positive test data. We take 1 sample from each of the remaining users, so as to have 41 negative test samples, and include the first 2 samples of the remaining users, who are neither genuine nor imposter, in the training data, resulting in 80 samples for the negative class as imposter training data. Here again, each of the 42 users is considered as both genuine and imposter, giving 42 × 41 sets of experiments. The classifier is trained in each of these experiments such that the samples of the imposter are unavailable to the classifier during training. Performance is then measured in terms of the error rates EER, FAR and FRR, and the accuracy.
c) Classification of the proposed features. In our work, we use three classifiers. The first is a two-class SVM with a linear kernel. The second is the Random Forest classifier, which generates an ensemble of decision trees from the training data; every test input vector is evaluated by all the decision trees.
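The SU-dataset split described above can be sketched schematically with random stand-in data (shapes are illustrative; note that 2 samples from each of the 41 remaining users gives 82 vectors here, slightly more than the 80 reported, so the paper's exact selection evidently differs):

```python
import numpy as np

rng = np.random.default_rng(0)
# 42 users x 51 samples x 71 features of random stand-in data
data = {u: rng.normal(size=(51, 71)) for u in range(42)}

genuine = 0                                   # the user under evaluation
train_pos = data[genuine][:45]                # 45 genuine training samples
test_pos = data[genuine][45:]                 # remaining 6 positive test samples
# first 2 samples of every other user as imposter (negative) training data
train_neg = np.vstack([data[u][:2] for u in data if u != genuine])
# 1 further sample per other user as negative test data
test_neg = np.vstack([data[u][2:3] for u in data if u != genuine])
```

The loop over all 42 users, each playing the genuine role in turn, yields the 42 × 41 experiment sets.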

Figure 2 .
Figure 2. Average ROC for various Information Set based features on CMU dataset with α = 1 and SVM as classifier.

Figure 3 .
Figure 3. Average ROC for various Information Set based features on CMU dataset with α = 2 and SVM as classifier.

Figure 4 .
Figure 4. Average ROC for various Information Set based features on CMU dataset with α = 1 and Convex Entropy Classifier.

Figure 5 .
Figure 5. Average ROC for various Information Set based features on CMU dataset with α = 2 and Convex Entropy Classifier.

Figure 6 .
Figure 6. Average ROC for various Information Set based features on CMU dataset with α = 1 and Random Forest as classifier.

Figure 7 .
Figure 7. Average ROC for various Information Set based features on CMU dataset with α = 2 and Random Forest as classifier.

Figure 8 .
Figure 8. Average ROC for various Information Set based features on SU dataset with α = 1 and Treebagger as classifier.
We will now prove some important properties of the adaptive entropy function. To simplify the proofs, we set α = β = 1. The gain function is a continuous function, being a product of two continuous functions, and H, being a sum of continuous functions, is also continuous and is an increasing function of n.

Table 2 .
EER for different algorithms on CMU dataset.

Table 4 .
Comparison of results for various Information Set based features on CMU dataset with α = 1 and SVM.

Table 6 shows the performance of the Convex Entropy Based Classifier for the different information set features with α = 1 in terms of FAR, FRR, EER and accuracy. The best performance is obtained for the information feature, with an EER of 0.0112 and an accuracy of 0.9875. The average ROC for the various information set features with α = 1 is shown in Figure 4. The features used in Table 6 are obtained with α = 2 for the Convex Entropy Based Classifier and the results are shown in Table 7. Here we get the best performance, an EER of 0.0111 and an accuracy of 0.9866, for the composite transform. The average ROC with α = 2 is shown in Figure 5.

Table 5 .
Comparison of results for various Information Set based features on CMU with α = 2 and SVM.

Table 6 .
Comparison of results for various Information Set based features on CMU with α = 1 and Convex Entropy Classifier.

Table 7 .
Comparison of results for various Information Set based features on CMU with α = 2 and Convex Entropy Classifier.

Table 8 .
Comparison of results for various Information Set based features on CMU with α=1 and Random Forest (Treebagger) as classifier.

Table 9 .
Comparison of results for various Information Set based features on CMU with α =2 and Random Forest (Treebagger) as classifier.

Table 10 .
Comparison of results for various Information Sets based features on SU dataset with α = 1 and Random Forest (Treebagger) as classifier.
Hence all the eigenvalues of the Hessian matrix are negative, so the Hessian is negative definite and H is concave. 6) The entropy H is maximum when all the p_i are equal, i.e., p_i = 1/n for all i.