
This paper presents a keystroke dynamics based authentication system using the information set concept. Two types of membership functions (MFs) are computed: one based on the timing features of all the samples and another based on the timing features of a single sample. These MFs lead to two types of information components (temporal and spatial, respectively), which are concatenated and modified to produce different feature types. The Two Component Information Set (TCIS) is proposed for keystroke dynamics based user authentication. The keystroke features are converted into TCIS features, which are then classified by SVM, Random Forest and the proposed Convex Entropy Based Hanman Classifier. The TCIS features are capable of representing the spatial and temporal uncertainties. The performance of the proposed features is tested on the CMU benchmark dataset in terms of error rates (FAR, FRR, EER) and accuracy. In addition, the proposed features are tested on an Android touchscreen based mobile keystroke dataset. The TCIS features give lower error rates and better accuracy than the existing features in the literature.

Security has been a concern since the advent of computers. The need for robust and ubiquitous security systems is more apparent due to the widespread use of the Internet and rapidly growing online activities such as business transactions, e-banking, shopping, social interactions and email. User authentication involving both identification and verification has become a necessity before access to system resources is allowed. The most common user authentication system to date employs a username and password/PIN. Whether a user chooses a very easy password or forgets the password he has chosen, the system is prone to misuse in either case. Even the most difficult password can be stolen or cracked by brute force methods. The use of biometrics for personal authentication is becoming more acceptable because it is convenient and because a biometric trait cannot be lost like a smart card or forgotten like a password/PIN. Biometrics deals with physiological or behavioral human traits for authentication of a user and provides significantly more security than username/password, smart cards, etc. Among biometric traits, keystroke dynamics is the most convenient since a keyboard is available in most computer systems and no special device is required, unlike other biometric modalities such as fingerprint and palmprint. Keystroke dynamics based authentication is concerned with analyzing the human typing rhythm and behavior. Further, keystroke dynamics is difficult to conceal and disguise, just as human behavior is difficult to copy. Keystroke dynamics can also be implemented on a network or distributed architecture.

A keystroke dynamics based authentication system depends on the individual typing pattern. It is based mainly on how a user types rather than what the user types on the keyboard. It measures human typing characteristics, which are shown to be unique to an individual and difficult to copy. In keystroke dynamics, there are two main metrics: Dwell Time, which is how long a key is pressed, and Flight Time, which is how long it takes the user to move from one key to the next. As the user types, an application running on the system captures these two features.
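A minimal sketch of how the two metrics could be derived from raw key events is given below. It assumes each event carries a key-down and a key-up timestamp in seconds, and it takes flight time as the interval from the release of one key to the press of the next (one common convention). The `KeyEvent` type and the function names are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeyEvent:
    key: str
    press: float    # key-down timestamp in seconds (assumed unit)
    release: float  # key-up timestamp in seconds (assumed unit)


def dwell_times(events: List[KeyEvent]) -> List[float]:
    """Dwell time: how long each key is held down."""
    return [e.release - e.press for e in events]


def flight_times(events: List[KeyEvent]) -> List[float]:
    """Flight time: release of one key to press of the next (assumed convention)."""
    return [nxt.press - cur.release for cur, nxt in zip(events, events[1:])]


# Hypothetical timings for three consecutive key presses.
events = [
    KeyEvent("t", 0.00, 0.10),
    KeyEvent("i", 0.25, 0.33),
    KeyEvent("e", 0.40, 0.52),
]
dwell = dwell_times(events)
flight = flight_times(events)
```

A string of m key presses yields m dwell times and m − 1 flight times under this convention.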

There are some publicly available keystroke datasets. Most of these datasets use static text entry, in which a user is asked to type a predetermined text string. Some of the static entry datasets are from: Killourhy and Maxion [

Keystroke Dynamics features mainly include: Keystroke Latencies, Dwell Time and Flight Time. Gaines et al. [

For authentication that involves both identification and verification of a user by keystroke dynamics based system, many classifiers have been used. They are divided into three broad categories, viz., statistical methods, neural networks and pattern recognition based techniques.

The statistical methods related to the first category employ statistical tools on basic keystroke features and apply distance metric to authenticate a user. The initial work on Keystroke Dynamics by Gaines et al. [

We now detail the neural network based approaches under the second category. Giroux et al. [ R^{m−1} → {−1, 1} is learned from an ANN, where x ∈ R^{m−1} denotes the m − 1 keypress interval timing ratios for an m-character password and f(x) ∈ {−1, 1} indicates whether the input keypress interval ratios correspond to that user or not. For every individual, a feed-forward ANN is trained with back-propagation, resulting in weights that are subsequently used for authentication. Bleha et al. [

The pattern recognition based techniques falling under the third category are now discussed. Support Vector machine (SVM) based on keystroke latency in [

From the literature survey, it can be seen that most of the approaches to keystroke dynamics are evaluated on self-created datasets and report results either on desktop or on mobile but not both. It is difficult to compare the performance of different approaches due to the lack of a common benchmark dataset. We have therefore tested the proposed approach on benchmark datasets in both desktop and mobile environments, and the results obtained are found to be superior to the best reported so far.

The organization of the paper is as follows: Section 2 presents the information set (IS) and some of its properties. It also formulates the IS based features and higher form of IS features. Section 3 develops an algorithm for the two-way information set approach. Section 4 describes the databases for the present work and Section 5 discusses the results of implementation. Section 6 gives the conclusions and the future work.

A fuzzy set deals with vagueness or fuzziness [_{i} in a fuzzy set (F) is denoted by μ_{F}(x_{i}). Given a collection of attribute values

Our primary goal being the representation of overall uncertainty in keystroke dynamics, we are inclined to investigate the suitability of the information set based features. We will now discuss how a fuzzy set paves the way for the information set while representing the uncertainty in its elements using an entropy function.

Consider a set of keystroke timing features T = {T_{ij}} where T_{ij} is the j^{th} feature in i^{th} keystroke sample. When a set of keystroke timing features is fitted with a membership function, denoted by {μ_{ij}}, a pair of keystroke timing value and its membership function forms an element in a fuzzy set. Information set connects the two components of each pair into a single entity called the information value using the Hanman-Anirban Entropy function [

where

and a, b, c and d are real valued parameters which need to be selected appropriately. It may be noted that p represents the probabilities. As shown in [

To bring Equation (1) into the information set domain, let us call the keystroke timing features T_{ij} as the information source values. We then replace the probability p with T_{ij} in Equation (1) and convert the exponential gain function into the Gaussian membership function by selecting the parameters as

A more general entropy function is presented by Mamta and Hanmandlu in [

where

By taking

The product of Information source value and membership function is termed as the information value and this is more general than the one in Equation (2). The sum of all information values,

Definition of Information Set: A set of information values

The properties of information sets are presented in [

1) The membership function can be empowered to act as an agent with capabilities that are beyond the scope of a fuzzy set. For example, the complement of a membership function can be an agent. Any intuitionistic membership function can also be a contender. The membership function can be formed from other information source values not associated with the same fuzzy set. Thus, an agent extends the scope of a fuzzy set.

2) The higher form of information sets called transforms can be derived based on the information values. This is shown in the sequel.

3) The information set arises out of representing the varying information source values in either time or space. For example, a variation in the keystroke data within a sample gives the spatial information values whereas the variation in keystroke timings over a number of samples gives the temporal information values.

4) Information set can represent both probabilistic and possibilistic uncertainties. To represent the probabilistic uncertainty, frequencies of occurrence of the information source values called the probabilities are considered but for the possibilistic uncertainty, attribute values like keystroke timing values are considered.

We will now derive the information set based features. The use of basic information set features like sigmoid and energy appears in [

a) Information Value

The basic information values

where the reference,

b) Complement Information Value

As per the first property of information sets stated above, the complement of the membership function, i.e.,

where

c) Energy features

As the information value depends on the membership function empowered as an agent, we can generate different kinds of information values by changing the agent. To generate Energy feature, the agent is taken as

So, the complement energy feature is:

d) Sigmoid feature

According to the first property of information set, information value (

e) Multi Quadratic feature

The multi-quadratic function either increases or decreases monotonically from the center. Using this function, the membership function is computed as:

where

f) Inverse Multi Quadratic feature

Inverse multi-quadratic function is the reverse of multi-quadratic function. Membership function for the inverse multi quadratic feature is given by

The inverse multi-quadratic information value is therefore

So far, we have utilized the basic information values for deriving different features. We will now derive higher form of information set based features. This requires us to consider the adaptive Mamta-Hanman entropy function in which the parameters of the exponential gain function are assumed to be variables. Some important properties of this adaptive entropy function are relegated to Appendix A.

a) Hanman Transform

Hanman Transform is a higher form of information derived from the adaptive Mamta-Hanman entropy function in [

where

Proof: By taking

Note that the exponential gain function has its argument as the first-level information value and after evaluation using Equation (12) we get the second-level information value. This is called transform because the original Information source value

The Complement Hanman Transform is easily obtained by setting

b) Shannon Transform

Shannon Transform is an offshoot of Hanman Transform as it can only be derived from the Hanman Transform and its features are shown to be useful in the face recognition in [

Proof:

Again, we resort to the adaptive Mamta-Hanman entropy function (3) and set

where

This is Pal-Pal transform. This can be shown to be equivalent to what we term as the non-linear Shannon transform in Equation (14) where the logarithmic function is operating on the information values. In some applications, the use of complement of μ_{ij} in the transform improves its effectiveness. The Shannon inverse transform where the evaluation of the information source values is based upon the complement agent is expressed as:

In the above transforms, the Gaussian membership function as defined in Equation (5) is best suited. These transforms can have realistic applications in social networks, though this has not been attempted so far. For example, suppose we gather information about an unknown person of some interest to us. This is the first level of information; we then evaluate him again to get the second level of information, combined with the first level. The transforms can be used to evaluate not only the information source values but also the membership function values, to see whether the selected membership function is appropriate.
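The two transforms can be sketched numerically as follows. This is an illustration under stated assumptions rather than the paper's exact equations: the Gaussian membership function is assumed to be exp(−(T − ref)² / (2σ²)), the Hanman transform is assumed to apply the exponential gain to the first-level information value μT, and the Shannon transform is assumed to apply the logarithm to the same information value. The timing values are assumed to be normalized to (0, 1], and all function names are hypothetical.

```python
import numpy as np


def gaussian_mf(t, ref, var, eps=1e-12):
    """Assumed Gaussian membership function around a reference value."""
    return np.exp(-((t - ref) ** 2) / (2.0 * (var + eps)))


def hanman_transform(t, mu):
    """Assumed form: source value weighted by exp(-(first-level information value))."""
    return t * np.exp(-(mu * t))


def shannon_transform(t, mu, eps=1e-12):
    """Assumed form: source value weighted by -log(first-level information value)."""
    info = np.clip(mu * t, eps, None)  # guard against log(0)
    return -t * np.log(info)


# Hypothetical normalized timing values from one sample.
t = np.array([0.2, 0.5, 0.8])
mu = gaussian_mf(t, t.mean(), t.var())
hanman = hanman_transform(t, mu)
shannon = shannon_transform(t, mu)
```

In both cases the first-level information value μT is evaluated once more against the source value T, which is what makes the result a second-level quantity.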

c) Composite Transform

For creating sigmoid and energy features, we have considered the basic information value

In fact, this is the ij component of the following transform:

By interchanging log and exponential function we can formulate yet another composite transform as given by

Note that the difference between Equations (18) and (19) is that in the former the log function is applied on the Hanman transform, whereas in the latter the exponential function is applied on the Shannon transform. In this paper, we report the results of Equation (18).

The Complement Composite Transform is easily obtained by considering the complement Hanman Transform as the unit of information and applying the log function on it. It is given by

d) Convex Hanman-Anirban Entropy Function

Let

where

Its double derivative is therefore

Supposing

If

If we take

It can be used to design or to modify a classifier. Here we use it to modify the Hanman classifier.

We have at our disposal several samples of keystroke dynamics for each user. To calculate the membership function, we have adopted Two-Component information set approach. In this approach, the temporal information I_{1} is the first component for which the membership function _{2} is the second component for which the membership function

Algorithm:

Step 1: Compute mean

Step 2: Compute mean

Step 3: For each training sample, compute

Step 4: Concatenate I_{1} and I_{2} and generate new features such as Information, Energy, Sigmoid, Hanman Transform etc. Then train SVM/Random Forest classifier or Convex Entropy Based Classifier using these features.

Step 5: For each test sample, compute I_{1} using

Step 6: Compute mean _{2} using

Step 7: Concatenate I_{1} and I_{2} to obtain I and use the new features for the classification using SVM/ Random Forest classifier or Convex Entropy Based Classifier.
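The steps above can be sketched as follows for the basic information value (source value times membership value). The sketch assumes a Gaussian membership function of the form exp(−(T − mean)² / (2·variance)), with the temporal component using the per-feature mean and variance over all training samples and the spatial component using the mean and variance within each single sample. Function names and the random stand-in data are hypothetical.

```python
import numpy as np


def gaussian_mf(t, mean, var, eps=1e-12):
    """Assumed Gaussian membership function."""
    return np.exp(-((t - mean) ** 2) / (2.0 * (var + eps)))


def fit_temporal_stats(train):
    """Per-feature mean and variance over all training samples (rows)."""
    return train.mean(axis=0), train.var(axis=0)


def tcis_features(samples, temporal_mean, temporal_var):
    """Concatenate temporal (I1) and spatial (I2) information values."""
    samples = np.atleast_2d(samples)
    # I1: membership measured against statistics of all training samples.
    i1 = samples * gaussian_mf(samples, temporal_mean, temporal_var)
    # I2: membership measured against statistics of the sample itself.
    row_mean = samples.mean(axis=1, keepdims=True)
    row_var = samples.var(axis=1, keepdims=True)
    i2 = samples * gaussian_mf(samples, row_mean, row_var)
    return np.hstack([i1, i2])


# Hypothetical data: 20 training samples with 21 timing features each.
rng = np.random.default_rng(0)
train = rng.uniform(0.05, 0.5, size=(20, 21))
t_mean, t_var = fit_temporal_stats(train)
train_feats = tcis_features(train, t_mean, t_var)      # used to fit the classifier
test_feats = tcis_features(train[0], t_mean, t_var)    # a test sample reuses training statistics for I1
```

Other feature types (Energy, Sigmoid, the transforms) would replace the membership agent or post-process these information values before classification.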

As I is a feature vector, let us denote the training feature vector of r^{th} sample of l^{th} user by P_{l}(r, k) and Q(k) be the test feature vector where k refers to the k^{th} feature value. The training and testing feature vectors are subjected to min-max normalization. In view of Equation (25), the test feature vector is rewritten as:

Similarly, each training feature vector can also be denoted in the above form as:

a) Use of Conditional Entropy Function

The conditional Hanman-Anirban entropy termed here as conditional possibility, c poss of a test feature vector Q(k) given the training feature vector P_{l}(r, k) is expressed by following [

The conditional possibility of intersection of two training feature vectors given the test feature vector can be written as:

The t-norm, being the conjunction operator, gives the minimum difference between any two vectors in (30), where we have used the Frank t-norm for t_{F} as it is found to be most effective [

We call

In order to improve the above convex entropy function, we convert it into parametric form:

where

For the evaluation of the keystroke dynamics based authentication system, the following publicly available datasets are used:

a) CMU Keystroke Dynamics Benchmark Dataset [

This database comprises 51 users and is collected in 8 sessions, with 50 repetitions of the same password recorded in each session, giving 400 samples per user. The CMU benchmark dataset has the keystroke features DD (Down-Down) time, UD (Up-Down) time and H (Hold) time. A 10-character password (.tie5Roanl) is typed by each user. In our study, we have used H and UD since they give the best results. Accordingly, we have 21 features: 11 H values for the 10 characters and the Enter key, and 10 UD latencies between the 11 key presses. Considering each of the 51 users as both genuine and imposter, we have a pool of 51 × 50 sets of experiments.

Half of feature vectors of every user in each session is treated as the training data and the remaining half as the positive test data, i.e. 200 samples each. In addition to this, the first 5 samples from each of the remaining users are assumed to be the negative test data in every experiment. As demonstrated in [

The authentication accuracy is evaluated using EER (Equal Error Rate) where False Acceptance Rate (FAR) equals False Rejection Rate (FRR) on ROC curve. FAR is the rate at which an unauthorized person (i.e. imposter) would be given access to the system as a genuine user [

For SVM and Random Forest Classifier, the performance measure EER is calculated for each set of genuine and imposter users, i.e. 51 × 50 sets of such experiments. The mean of the performance measure values (EERs) is then calculated for all the experiments. In addition to EER we also report FAR and FRR values and authentication accuracy.
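A minimal sketch of the per-experiment EER and its mean over experiments is given below. It assumes that the classifier outputs a score where a higher value means "more likely genuine", and it approximates the EER at the threshold where FAR and FRR are closest. The function names and the toy scores are hypothetical.

```python
import numpy as np


def far_frr(genuine, imposter, threshold):
    """FAR and FRR at one threshold (scores >= threshold are accepted)."""
    far = float(np.mean(imposter >= threshold))  # imposters wrongly accepted
    frr = float(np.mean(genuine < threshold))    # genuine users wrongly rejected
    return far, frr


def equal_error_rate(genuine, imposter):
    """Approximate EER: average of FAR and FRR where they are closest."""
    genuine = np.asarray(genuine, dtype=float)
    imposter = np.asarray(imposter, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, imposter]))
    rates = [far_frr(genuine, imposter, t) for t in thresholds]
    far, frr = min(rates, key=lambda p: abs(p[0] - p[1]))
    return (far + frr) / 2.0


def mean_eer(experiments):
    """Mean EER over (genuine_scores, imposter_scores) pairs, one per experiment."""
    return float(np.mean([equal_error_rate(g, i) for g, i in experiments]))


# Two hypothetical experiments with toy scores.
exp_a = ([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
exp_b = ([0.9, 0.8], [0.1, 0.2])
eer_a = equal_error_rate(*exp_a)
eer_b = equal_error_rate(*exp_b)
overall = mean_eer([exp_a, exp_b])
```

With 51 users, the list passed to `mean_eer` would hold one entry per ordered (genuine, imposter) pair.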

For convex entropy based classifier, the performance is reported in terms of EER(mean), FAR, FRR and Accuracy, calculated as the ratio of number of users correctly classified as genuine/imposter user to the total number of user attempts across all the experiments. FAR is calculated as the ratio of number of users which are incorrectly accepted as genuine to the total number of imposter user attempts across all the experiments. FRR is calculated as the ratio of number of users which are incorrectly rejected as imposter to the total number of genuine user attempts across all the experiments.
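The pooled ratios described above can be sketched as follows, assuming that each experiment is summarized by four counts. The `ExperimentCounts` type and the example numbers are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ExperimentCounts:
    genuine_attempts: int   # positive test samples
    false_rejects: int      # genuine attempts rejected as imposter
    imposter_attempts: int  # negative test samples
    false_accepts: int      # imposter attempts accepted as genuine


def pooled_rates(experiments: Iterable[ExperimentCounts]) -> Tuple[float, float, float]:
    """FAR, FRR and accuracy computed from counts summed across all experiments."""
    exps = list(experiments)
    genuine = sum(e.genuine_attempts for e in exps)
    imposter = sum(e.imposter_attempts for e in exps)
    fr = sum(e.false_rejects for e in exps)
    fa = sum(e.false_accepts for e in exps)
    far = fa / imposter
    frr = fr / genuine
    accuracy = (genuine + imposter - fr - fa) / (genuine + imposter)
    return far, frr, accuracy


# Two hypothetical experiments.
far, frr, accuracy = pooled_rates([
    ExperimentCounts(200, 4, 50, 1),
    ExperimentCounts(200, 6, 50, 2),
])
```

Pooling the counts before dividing weights every attempt equally, which differs from averaging per-experiment rates when experiments have different numbers of attempts.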

b) Sapientia University Keystroke Benchmark Dataset for Android platform [

This data is collected from 42 users with 51 samples per user and at least 2 sessions per user. Each user types the password “.tie5Roanl” on the Android based mobile devices Nexus 7 Tablet and Mobil LG Optimus L7 II P710. The key sequence resulting from typing the password is “. t i e [123?] 5 [abc] [Shift] R [Shift] o a n l”, which amounts to 14 key presses. We have used all 71 features of the dataset given in

For every user 45 samples are randomly selected as the training data and the remaining 6 samples constitute the positive test data. We take 1 sample from each one of the remaining users so as to have 41 negative test samples and include the first 2 samples of the remaining users who are neither genuine nor imposter in the training data resulting in 80 samples for the negative class as the imposter training data. Here again, each one of 42 users is considered as both genuine and imposter to conduct 42 × 41 sets of experiments. The classifier is trained on each of these experiments such that the samples of an imposter are unavailable to the classifier during training. Performance is then measured in terms of error rates EER, FAR, FRR and accuracy.
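The sample counts in this protocol can be checked with a short sketch. It only enumerates the experiments and the set sizes stated above; the function and key names are hypothetical.

```python
from itertools import permutations

N_USERS = 42
TRAIN_POS = 45            # genuine samples used for training
TEST_POS = 6              # remaining genuine samples used for testing
NEG_TRAIN_PER_USER = 2    # first 2 samples of each user who is neither genuine nor imposter


def experiment_sizes(genuine: int, imposter: int, n_users: int = N_USERS) -> dict:
    """Set sizes for one (genuine, imposter) experiment."""
    others = [u for u in range(n_users) if u not in (genuine, imposter)]
    return {
        "train_pos": TRAIN_POS,
        "train_neg": NEG_TRAIN_PER_USER * len(others),  # imposter's samples are held out
        "test_pos": TEST_POS,
        "test_neg": n_users - 1,                        # 1 sample from every non-genuine user
    }


experiments = [experiment_sizes(g, m) for g, m in permutations(range(N_USERS), 2)]
```

Each of the 42 × 41 ordered pairs gives one experiment with 80 negative training samples (2 × 40) and 41 negative test samples.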

c) Classification of the proposed features

Feature Name | No. of Features |
---|---|
Key Hold Time (H) | 14 |
Down-Down Time (DD) | 13 |
Up-Down Time (UD) | 13 |
Key Press Pressure (P) | 14 |
Finger Area (FA) | 14 |
Mean Hold Time | 1 |
Mean Finger Area | 1 |
Mean Pressure | 1 |
Total | 71 |

In our work, we have used three classifiers. The first is a two-class SVM classifier with a linear kernel. The second is the Random Forest classifier, which generates an ensemble of decision trees based on the training data. Every test input vector is evaluated by all decision trees in the Random Forest classifier, which operates on the principle of majority voting to arrive at the classification. In addition to these standard classifiers, the third is the proposed convex entropy based classifier discussed in Section 3.2.
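The majority-vote rule used by the Random Forest can be illustrated with a small sketch. It assumes each tree has already produced a class label for the test vector (here +1 for genuine and −1 for imposter, an assumed encoding); the function name is hypothetical and this is not the library implementation used in the experiments.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(tree_predictions: Sequence[Hashable]) -> Hashable:
    """Return the label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical votes from five trees for one test vector.
votes = [1, 1, -1, 1, -1]
decision = majority_vote(votes)
```

Using an odd number of trees avoids ties in a two-class problem.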

Before presenting our results, let us see the state of the art on keystroke dynamics in the literature.

where ^{th} test feature and i^{th} mean vectors respectively and ^{th} feature.

Zhong et al. [

where

Deng and Zhong [

The performance of different information set based features with α = 1 on CMU dataset is listed in

Algorithm | EER |
---|---|
Manhattan (scaled) [ | 0.096 |
Combined Mahalanobis and Manhattan distance [ | 0.084 |
DBN [ | 0.035 |

Classifier | EER |
---|---|
Random Forest | 3.1% |
Bayes Network Classifier | 4.3% |
K-NN | 8.3% |

Features | FAR | FRR | EER (mean) | Accuracy (mean) |
---|---|---|---|---|
Information Feature | 0.0157 | 0.0275 | 0.0201 | 0.9791 |
Sigmoid Feature | 0.0156 | 0.0277 | 0.0201 | 0.9790 |
Energy Feature | 0.0193 | 0.0326 | 0.0237 | 0.9748 |
Multi-Quadratic Feature | 0.0391 | 0.0292 | 0.0334 | 0.9653 |
Hanman Transform | 0.0291 | 0.0293 | 0.0290 | 0.9708 |
Complement Hanman Transform | 0.0220 | 0.0235 | 0.0223 | 0.9773 |

SVM. The best EER of 0.0201 is obtained with Information and Sigmoid features. The average ROC for various information set features with α = 1 on CMU dataset is shown in

The features of

Features | FAR | FRR | EER (mean) | Accuracy (mean) |
---|---|---|---|---|
Information Feature | 0.0193 | 0.0308 | 0.0227 | 0.9756 |
Sigmoid Feature | 0.0191 | 0.0305 | 0.0225 | 0.9758 |
Energy Feature | 0.0224 | 0.0379 | 0.0282 | 0.9707 |
Complement Hanman Transform | 0.0357 | 0.0300 | 0.0319 | 0.9668 |

Features | FAR | FRR | EER (mean) | Accuracy |
---|---|---|---|---|
Information Feature | 0.0147 | 0.0099 | 0.0112 | 0.9875 |
Sigmoid Feature | 0.0147 | 0.0099 | 0.0112 | 0.9874 |
Energy Feature | 0.0126 | 0.0156 | 0.0127 | 0.9861 |
Complement Hanman Transform | 0.0171 | 0.0081 | 0.0113 | 0.9869 |
Composite Transform | 0.0180 | 0.0097 | 0.0125 | 0.9857 |
Complement Composite Transform | 0.0198 | 0.0091 | 0.0129 | 0.9850 |

Features | FAR | FRR | EER (mean) | Accuracy |
---|---|---|---|---|
Information Feature | 0.0219 | 0.0124 | 0.0158 | 0.9823 |
Sigmoid Feature | 0.0218 | 0.0122 | 0.0159 | 0.9825 |
Energy Feature | 0.0191 | 0.0163 | 0.0168 | 0.9821 |
Complement Hanman Transform | 0.0322 | 0.0181 | 0.0232 | 0.9741 |
Composite Transform | 0.0177 | 0.0080 | 0.0111 | 0.9866 |
Complement Composite Transform | 0.0181 | 0.0081 | 0.0118 | 0.9863 |

The features used in

The features shown in

The results of some of the features of

These features are also tested with α = 2 on SU dataset with Random Forest

Feature | FAR | FRR | EER (mean) | Accuracy |
---|---|---|---|---|
Information Feature | 0.0077 | 0.0251 | 0.0129 | 0.9846 |
Sigmoid Feature | 0.0077 | 0.0252 | 0.0128 | 0.9845 |
Energy Feature | 0.0084 | 0.0260 | 0.0131 | 0.9838 |
Hanman Transform | 0.0082 | 0.0173 | 0.0103 | 0.9878 |
Multi Quadratic Feature | 0.0114 | 0.0221 | 0.0146 | 0.9838 |
Inverse Multi Quadratic Feature | 0.0128 | 0.0255 | 0.0169 | 0.9815 |
Complement Energy Feature | 0.0128 | 0.0240 | 0.0160 | 0.9822 |
Complement Hanman Transform | 0.0086 | 0.0183 | 0.0110 | 0.9871 |
Complement Information | 0.0139 | 0.0267 | 0.0176 | 0.9804 |
Composite Transform | 0.0085 | 0.0168 | 0.0104 | 0.9878 |

Feature | FAR | FRR | EER (mean) | Accuracy |
---|---|---|---|---|
Information Feature | 0.0081 | 0.0242 | 0.0125 | 0.9848 |
Sigmoid Feature | 0.0081 | 0.0244 | 0.0126 | 0.9847 |
Energy Feature | 0.0078 | 0.0265 | 0.0130 | 0.9839 |
Hanman Transform | 0.0090 | 0.0178 | 0.0109 | 0.9871 |
Multi Quadratic Feature | 0.0112 | 0.0213 | 0.0145 | 0.9843 |
Inverse Multi Quadratic Feature | 0.0109 | 0.0221 | 0.0148 | 0.9841 |
Complement Energy Feature | 0.0155 | 0.0264 | 0.0180 | 0.9797 |
Complement Hanman Transform | 0.0092 | 0.0177 | 0.0112 | 0.9870 |
Complement Information Feature | 0.0168 | 0.0312 | 0.0210 | 0.9768 |
Composite Transform | 0.0084 | 0.0167 | 0.0102 | 0.9879 |

Feature | FAR | FRR | EER (mean) | Accuracy |
---|---|---|---|---|
Information Feature | 0.0216 | 0.0605 | 0.0228 | 0.9734 |
Sigmoid Feature | 0.0331 | 0.0554 | 0.0286 | 0.9641 |
Energy Feature | 0.0179 | 0.0676 | 0.0228 | 0.9757 |
Hanman Transform | 0.0279 | 0.0556 | 0.0248 | 0.9686 |
Shannon Transform | 0.0246 | 0.0600 | 0.0262 | 0.9708 |
Composite Transform | 0.0279 | 0.0552 | 0.0268 | 0.9686 |

classifier and the results are shown in

Feature | FAR | FRR | EER (mean) | Accuracy |
---|---|---|---|---|
Information Feature | 0.0225 | 0.0589 | 0.0236 | 0.9728 |
Sigmoid Feature | 0.0488 | 0.0652 | 0.0446 | 0.9491 |
Energy Feature | 0.0198 | 0.0574 | 0.0223 | 0.9754 |
Hanman Transform | 0.0281 | 0.0569 | 0.0219 | 0.9682 |
Shannon Transform | 0.0244 | 0.0518 | 0.0227 | 0.9721 |
Composite Transform | 0.0270 | 0.0540 | 0.0249 | 0.9696 |

less data.

Discussion of Results: Out of 10 features, a subset of 6 features has been found to be effective on the two datasets, CMU and SU, using three classifiers: SVM, Convex Entropy and Random Forest (Treebagger). The best results on the CMU dataset are due to the Composite Transform feature, with an EER of 0.0102 for the Treebagger classifier and an EER of 0.0111 for the Convex Entropy classifier. The EERs obtained by the literature features (See

The possibilistic uncertainty in the keystroke timing values termed as information source values when represented by the entropy function gives rise to the information values which are shown to be the products of information source values and the corresponding membership function values. Two Gaussian membership functions are employed: one using the mean and variance of all the samples which lead to temporal information values and the other using the mean and variance of a single sample which lead to spatial information values. These two kinds of information values, viz., spatial and temporal components are concatenated to provide us the two-component information set (TCIS) features. From the concatenated features, various new features such as Information Value, Energy, Sigmoid, Hanman Transform, Shannon Transform, Multi-quadratic, Composite Transform and their complements are generated. In this work, Hanman Classifier is redesigned by the use of Convex Entropy Function.

TCIS features from two benchmark datasets, CMU and SU, are classified using the Convex Entropy based classifier, SVM and Random Forest classifiers. Their performance is evaluated in terms of error rates (FAR, FRR, EER) and accuracy. These features are also tested on the Android touchscreen based mobile keystroke dataset, where they outperform the literature features.

We plan to extend this work by considering new features based on information set theory and on type-2 and interval fuzzy sets. It is observed that the effectiveness of a feature type depends on the database. Out of all the features investigated, the Sigmoid, Energy, Hanman transform and Composite transform features have proved effective. We have not attempted the fusion of the effective features. If these features are fused at either the feature level or the score level, the fused feature vector is likely to perform well on all the datasets considered.

There are two limitations of the proposed approach. The first limitation is that it is not suitable for capturing global characteristics as its main forte is in local characteristics. The second limitation is the choice of membership function. Generally Gaussian function serves as an effective membership function.

Bhatia, A. and Hanmandlu, M. (2017) Keystroke Dynamics Based Authentication Using Information Sets. Journal of Modern Physics, 8, 1557-1583. https://doi.org/10.4236/jmp.2017.89094

Let us make H adaptive in (3) by considering the parameters as functions rather than constants as per the original definition in [

In this the parameters

where

Properties of Adaptive Entropy Function

1)

functions and H being the sum of continuous functions is also a continuous function.

2)

3) With the increase in

4) If

Hence this is proved.

5) Note that

To prove that this is concave the Hessian matrix must be negative definite. The Hessian is computed as follows:

As c_{i}, p_{i} are in [0, 1].

where

6) Entropy H is maximum when all p_{i}’s are equal. In other words,

That is,

In that case,

7) The entropy is minimum if and only if all p_{i}’s except one are equal to zero and a single p_{i} = 1.

To make better representation of uncertainty, we will introduce higher form of uncertainty representation.
