Grӧbner Bases Method for Biometric Traits Identification and Encryption

Biometric identification systems are principally related to the information security as well as data protection and encryption. The paper proposes a method to integrate biometrics data encryption and authentication into error correction techniques. The normal methods of biometric templates matching are replaced by a more powerful and high quality identification approach based on Grӧbner bases computations. In the normal biometric systems, where the data are always noisy, an approximate matching is expected; however, our cryptographic method gives particularly exact matching.


Introduction
Digital data sent over communication channels are subject to distorting as a result of various circumstances such as electromagnetic fluctuations.Also, it might not be able to restore correct data from a hard disk or a digital audio (or video) system as most of the storage media is liable for errors.Another extreme example is the images remitted from space-probes, where a considerable error rate takes place and re-transmission is often not possible.The last example is the biometric feature vectors made of the attributes of the individuals which are noisy by nature.The consequence is that the digital data received (read or captured) may be not the same as the primarily sent (stored or enrolled).In this regard codes provide a methodical technique to transmit messages, with some supplementary information (check digits) in such a way that an error occurring in the original messages will not only be detected by the receiver, but in many cases, it could be corrected.More algebraic structures should be added to the code spaces to solve the decoding and encoding problems efficiently.In particular, linear codes are excessively exploited for controlling errors because they are well understood, powerful and easy to generate.
Biometrics can be defined as a largely automated measurement of physiological and behavioral characteristics that are used to prove or confirm identities of human beings [1] [2].The physiological characteristics are the physical human traits such as hand shape, face, fingerprints, eyes, ear shapes, and hand veins.The behavioral characteristics of human beings are the way to sign their names, walk, speak or even keystroke dynamics; interested readers are referred to the literature [3]- [9].Biometric identification methods show several particularities over traditional methods such as identification cards (tokens) and personal identification numbers (passwords) for many reasons.For example, users to be identified have to attend physically at the particular place of identification [10].Also, biometric based identification systems avoid carrying tokens or remembering passwords.In addition, biometrics prevents misuse of stolen identification cards, credit cards and passports.Eventually, biometric systems can work in verification (authentication) or identification modes [11].Several serious matters have to be considered when designing a successful biometric system such as users must be first enrolled in the system as biometric templates which are securely stored in a database or on smart cards issued to the users.These templates can be used for matching when the users need to be identified or authorized to log into the secured system.This means that before the system can be used, during the enrollment step, the biometric imprints are acquired and data extracted from that imprints are saved as biometric templates in a central database.Later, during the authentication step, a new biometric imprint is acquired and the data extracted from that imprint are compared with the reference template [12].The result of that comparison, with (a very small) percentage of error, is a match or no match.
Once a biometric trait is captured, residual random noise is removed by using filters (as a directed smoothing process); see for instance [13].Then, the extracted feature vector or codeword, as we will more commonly call (see below), is turned out as a binary vector.Since the binary feature vectors of biometric templates acquired from the same person are most probably different from each other, it is necessary to detect and rectify the difference between the data acquired in the enrollment and verification steps [14] [15].This correction takes the place of the normal templates matching in present biometric systems.Therefore, the matching between the enrolled and verified feature vectors can be modeled as transmitting messages through a noisy communication channel.Thus, the proposed matching algorithm measures diversities between the extracted and enrolled codewords.Each codeword (or bit string) is represented (or encoded) into secure data by computing its syndrome with respect to a preferable low-density parity-check (LDPC) code.The LDPC codes were first discovered by Gallager in 1962 [16].In this work we develop a new biometric authentication algorithm that is not only based on the data contained in biometric templates but also on a randomly selected codeword from an LDPC code.Consequently, we propose a syndrome decoding algorithm (as replacement of matching process) based on Grӧbner bases calculations as a decoding scheme for the combination of data contained in biometric traits and suggested codewords.
A Grӧbner basis as a vigorous tool initiated from a commutative algebra is defined as a set of polynomials computed, using Buchberger's algorithm [17], from another set of polynomials.Grӧbner bases algorithms have been comprehensively studied, revised and implemented on most computer-algebra systems.In addition, Grӧbner bases have many interesting properties and applications in commutative (and non-commutative) algebra [18]- [20].For instance, we can decide ideal membership or ideal equivalence with the aid of Grӧbner bases.Also, the elimination property of Grӧbner bases enables us to solve non-linear systems of algebraic equations in multiple variables.The application that we are actually interested in is to use Grӧbner bases in cryptography and, therefore, in biometric identification systems.In the case of binary linear codes, the Grӧbner bases will consist of all binomials which correspond to the problem's codewords.The first connection between linear codes and Grӧbner bases was established in [21].The main obstacle to our algorithm is that the computations of Grӧbner bases are expensive, and (in non-commutative algebras) are not guaranteed to stop [22].

Theoretical Background
In this section, we give the substantial background required to understand the syndrome decoding problem and present some principles of linear codes.We also recall the elementary theory of Grӧbner basis algorithm for the case of multivariate polynomials.We show that the basic component of the Grӧbner basis theory is the concept of polynomial reduction that is used to compute the appropriately defined normal form of a specified polynomial.

Syndrome Decoding Problem
be the finite field of two elements and let n and k be positive integers with k n ≤ .In this work we assume that the feature vectors that represent biometric traits are given as elements of the vector space n F .The (Hamming) weight of a word n ∈ w F is defined to be the number of nonzero entries in w and the (Hamming) distance of , n ∈ v w F , denoted by ( ) , dist v w , is the weight of the difference − v w .A coding function φ is defined by the injective mapping Let C be an ( ) , n k -code for some n and k.We say that The elements of C are written as row vectors and are called codewords.For d ∈ N , we consider an ( ) n k d -code as an ( ) , n k -code for which d is the minimum distance between any two distinct codewords.A priority of linear codes over any other arbitrary codes is that the minimum distance between any two codewords is much easier to calculate.Here, we assume that there exists an effectual algorithm that has the ability to detect up to t errors in a given corrupted pattern, where 2 1 + , then any error pattern containing t or fewer errors can be corrected and any error pattern containing 1 t + errors can be detected.
A generator matrix for the code C is a matrix , where the first k-entries of the transmitted codeword G w contain the message vector w .During the identification, it is possible that several bits of z are changed and, hence, an incorrect word y is obtained.Therefore, we need to solve the decoding problem, that is, , where d is the minimum distance of any two different codewords, then x is equal to the original vector z .
The parity-check binary matrix of such code C is defined by the ( ) We define the syndrome ( ) In this respect, each codeword turned out from the standard generator matrix G satisfies the condition 0 H = w .Given a paritycheck matrix H of a code C , the problem is to determine the minimum distance and the weight of such code, see e.g.[23].We will solve this problem by computing the Grӧbner basis for an ideal from the parity-check matrix.

Grӧbner Bases
We introduce some basic definitions which we need to explain the Grӧbner basis theory.We only cite the theorem for the existence of the Grӧbner basis of an ideal.In 1965 Buchberger [17] gave an appropriate framework for the study of polynomial ideals in multivariate polynomials in commuting n variables over a computable field, with an introduction of Grӧbner basis.Furthermore, Mora [19] presented an algorithm for multivariate polynomials in non-commuting n variables over a computable field.We can say that both Buchberger and Mora algorithms, which based on a generalization of the Euclidean division algorithm to several variables, use the reality that coefficients are in a specified field.Therefore, given any two polynomials f and g, we can write f as f gq r = + , where r has lower degree than g or r is equal to 0. For polynomials of one variable, this gives an algorithm for ideal membership: f is in the ideal induced by g if and only if r = 0. Now, if 1 , , s I g g =  then, using the generalized division algorithm, we can write a polynomial f as f g r = + , where g I ∈ while no term of r is divisible by any of the leading terms of the i g .For the case of one variable polynomials, we follow the natural ordering 2 1 X X < < < while, in several variables case, there are numerous preferences for the term ordering, see [18].It is worth noting that the computation of Grӧbner basis could vary according to the type of ordering.A Grӧbner basis for an ideal has the property 0 f I r ∈ ⇔ =.
Buchberger not only proved that each ideal has a basis for which the problem of ideal membership is computationally solvable, he also described an algorithm that can be exploited to get such a basis.Here, we should mention that polynomial reduction is the cornerstone in the Grӧbner bases algorithms as it represents the most intensive portion in terms of computations.In this regard we say that a polynomial f reduces to a different polynomial r, denoted as f r → , if and only if r is the remainder of f upon division by some polynomial set F. The polynomial reduction is not only recognized for one reducing polynomial instead it is also defined for sets of polynomials.For any polynomials ( ) LM f is the leading monomial of f.Now, the S-polynomial of f and g is written in terms of the two polynomials as LT f is the leading term of f.Of course, in non-commutative case the situation is more sophisticated since the monomials are words and there can be either more than one S-polynomial or none.A finite set of polynomials Now we are in a position to give the layout of the Buchberger algorithm.It launches with the initial basis then h is appended to the basis.The process, with other added technical details, is reiterated till we obtain a basis satisfying the condition that is mentioned above.In the commutative situation, Buchberger showed that the process constantly terminates and gives at the end a Grӧbner basis.On the other hand, Mora noted that the process in non-commutative case does not always terminate-but, when it does, it should produce a Grӧbner basis.
For the proofs of the existence and uniqueness of a Grӧbner basis (in fact reduced Grӧbner basis) G for an ideal and the improvements of Buchberger's algorithm, readers are referred to [18].

Syndrome Decoding and Grӧbner Bases
The interaction between coding theory and Grӧbner bases has been observed from the property that each function from n F to F can be represented as a polynomial in [ ] K X .This section presents a modelling of the syndrome decoding problem to find the distance of a code.This can be achieved via computation of a Grӧbner basis in terms of an ideal and a solution of (corresponding) system of equations [24].Let = + v w e be a received corrupted word with a t-error, where ∈ w C is the codeword that was sent and e is the error vector.Since 0 H = w , then the syndromes of v and e with respect to H are equal and can be written as a linear combination of t rows of H as ( ) ( ) In order to solve the syndrome decoding problem we need to find such linear combination that gives the syndrome vector.
Let H BA = , where B is an n n × matrix with columns 1 , , n b b  .The matrix B is invertible provided that its column form a basis for n F .Now, the unknown syndrome ( ) u e of the error vector e with respect to B is defined as ( ) u B = e e .The entries of ( )  .Then, we can the error vector as ( ) . Thus, the idea is to find the unknown syndrome of an error vector with respect to some fixed bases B.
For any two vectors , n ∈ x y F we define coordinate-wise star product by ( ) We can write the star product of two columns i b and j b of B as a linear combination of 1 , , where ij l u ∈ F called the structure constants of the basis 1 , , n b b  .We define the n n × matrix of unknown syndrome of e whose rank is equal to the weight of e as ( ) ( ) The relation between

( ) U e and ( )
u e is given by ( ) ( ) Now we can express the unknown syndromes in terms of the known syndromes as ( ) ( ) ( ) ( ) Here, we have a system of n k − linear equations in the n unknowns ( ) ( ) Since the rank of ( ) U e is t, then there exists By representing ij u and , 1 , we get Let the unknowns , respectively.We have the combined system, ( ) The ideal generated by the combined system defines the set of solutions that satisfies both systems.The reduced Grӧbner basis of the combined system with respect to a monomial ordering takes the form where t is the smallest positive integer such that the system has solution (see the proof and some details on all of the above in Bulygin and Pellikaan [25]).The unique solution of the system corresponds to the unknown syndrome ( ) u e .Assuming that the number of error capacity of the proposed code exceeds the number of occurred errors, then the syndrome decoding algorithm using Grӧbner bases can be successfully formulated and implemented.

Biometric Matching Algorithm
In [26] [27] we showed that the biometric imprint (for security reason) is stored in encrypted binary form instead of plain-text.The biometric authentication device should agree access as long as two (enrolled and identified) biometric feature vectors do not differ by more than a definite amount of bits (the threshold).Here, the biometric identification systems should be able to deal with a large amount of bits.One more difficulty is how to design a system that can remedy 10% to 20% anticipated errors in extracted feature vectors [28].Therefore, we require an ( ) , , n k d -code with considerable dimension over the binary field.In addition to that we request codes with large relative minimum distance.Only low rate codes can afford these requirements.In this regard, the LDPC codes are good candidates.The number of non-zero entries, which is also fixed for each row and column, in the parity-check matrix of an LDPC code is small relative to the dimension of the matrix.Moreover, LDPC codes which are actually constitute a large family of codes are linear block codes and can handle relatively high error rates.
A biometric image is acquired and a feature vector is extracted from an enhance version of such image.Let n ∈ x F be the feature vector that we want to enroll on the database.The system requires to choose a random codeword ∈ y C .The algorithm then computes the vector − x y and stores a corrected and encrypted version of the vector on the database.In Algorithm 1 we formulate the enrollment process of a given feature vector of a biometric trait.Set the system ( )

Repeat
Find the reduced Grӧbner bases G of ( ) , J t v with respected the specified ordering The authentication process is similar to the enrollment process for any new acquired biometric image.The result of such process is called the match score.Here, we assume that the number of errors is fewer than the code correcting capacity.Algorithm 2 shows how to verify (match)a feature vector of a new captured biometric trait.Set the system ( ) Find the reduced Grӧbner bases G of ( ) , J t v with respected the specified ordering Find the vector ( ) u e of unknown syndromes of v Computethe error vector ( ) If the word w equalsthe first k bits of ⊕ v e , then matching is accepted Our technique is in fact different from conventional biometric authentication techniques which use numerical measure of the similarity of two biometric traits acquired at enrollment and verification steps.These conventional biometric systems require powerful digital signal processing algorithms in order to enhance the captured images before extracting the hidden characteristics.This process, which is called feature vector extraction, indeed plays the most critical part of biometrics identification.Our method is able to overcome most of the problems which might be resulting from the extraction of the biometric information as binary feature vectors from the realization of biometric traits.

Experimental Result
The implementation (as a proof-of-concept prototype) of our more promising approach has been done using various feature vectors of fingerprints and palm vein images as test data.We evaluated the algorithm using samples of 50 different users, with 5 samples per each user.The algorithm was implemented using an interpreted code as well as several built-in functions of MAGMA [29].Because of our particular and special error correction and encryption approach the feature vectors exploited here are different from the biometric feature vectors used in the traditional biometric systems.For example, the dimension of the required code is minimal compared to other cryptographic based biometric systems.Using our approach, we can accept poor-quality biometric images which is always rejected (before matching) in most of conventional biometric technologies.Our approach also enables the use of low-cost sensors or even wireless biometric systems.For the used segmentation process and feature vectors extraction strategies the reader is referred to the literature [26] [27].The LDPC code (with appropriate dimension), together with its standard generated matrix, was taken as the decoding strategy and proved to achieve optimal performance in terms of templates matching.We obtained an equal error rate of 0.1% for both false acceptance rate and false rejection rate, while the rate is about 5% with a template matcher in most conventional biometric systems.
Although the proposed approach does not yet fulfil the anticipated performance in terms of a Grӧbner basis computation complexity and latency, it does provide a low-cost secure biometric encryption architecture.On top of that it reveals various factors and provides beneficial insights that motivate the researchers in the area of integrating computational algebra with biometrics.The method is also suitable for other biometric systems such as iris biometrics that seem also very promising.All of these reasons make the algorithm feasible for different practical uses.

Conclusion
Even though many commercial and academic systems for biometrics identification are working out, the considerable number of publications on this domain states the necessity for extensive research for the sake of obtaining better performance and enhancing the reliability of such systems.In this paper, the problem for merging biometrics and cryptography was tackled.We used an algebraic method that allowed an exact recovery of a given a binary word (representing a biometric feature vector) using a randomly chosen word from a proposed code.We showed how to match a feature vector of a biometric trait by exploiting the theory of error-correcting codes over the field of two elements 2 F .Special attention was given to linear codes since these codes could be defined using generator matrices.In this regard, a linear code was considered as a subspace of the vector space 2 n F .The problem of errors decoding was shown to be equivalent to the problem of calculating the ideal generated by a set of polynomials.The algorithm is also applicable to general codes over q F , where q is a power of prime number.Our next aim is to give an accurate analysis of the efficiency of the algorithm.

Algorithm 1 :
Enrollment process Givena biometricfeature vector x , a randomly chosen word w of length k and a generated matrix G that defined the error correcting code Encodethe word w as a codeword G = y w of length n Find the encrypted feature vector = ⊕ v x y Store the first k bits, ′w , of the corrected codeword ⊕ v e and the word w together with the user information in the central database

Algorithm 2 :
Authentication processGivena user feature vector x and the corrected encrypted version of the feature vector,