Mexican Sign Language Recognition Using Jacobi-Fourier Moments

This work introduces a system for recognizing static signs in Mexican Sign Language (MSL) using Jacobi-Fourier Moments (JFMs) and Artificial Neural Networks (ANN). The original color images of static signs are cropped, segmented and converted to grayscale. Then, to reduce computational cost, 64 JFMs are calculated to represent each image. The JFMs are sorted to select a subset that improves recognition, according to a metric we propose based on a ratio between dispersion measures. A Multilayer Perceptron tested in WEKA with this subset of JFMs reached a recognition rate of 95%.


Introduction
Sign Language Recognition (SLR) is a research field that has grown in recent years, and researchers around the world are increasingly interested in it. Sign Language (SL) is the main and most natural form of communication for the deaf community; however, most people interact verbally, so the use of SL is mainly limited to deaf people and the hearing people closest to them [1]. SL is not universal; each country or region has its own, such as Mexican Sign Language (MSL), American Sign Language (ASL), Chinese Sign Language (ChSL), Japanese Sign Language (JSL) and Persian Sign Language (PSL), to name a few. They differ according to the uses and customs of the places where they are used [2].
Expressing SL usually involves the hands, arms, body movements and facial expressions; because of this, developing a system that recognizes all these expressions is a complex task, and in fact most systems are very limited. SLR systems can be classified into two main classes according to how they acquire information. The first type uses digital image processing, which lets users interact with the system in a more natural way; however, accurate data are harder to acquire because most of these proposals work with 2D images, which makes it difficult to follow the position or movement of the fingers or the hand shape itself. The second type uses electronic devices physically attached to the user's body, which yield accurate data on the position, movement or velocity of the fingers and other points of interest, with the disadvantage of restricting the user's free movement.
This work extends a paper previously published in this journal [3]. The aim of this report is to recognize static signs by digital image processing (without movement sensors, special wires or any electronic device attached to the signer), avoiding the use of special color markers or clothes.

Database
The Mexican Sign Language alphabet used here consists of 26 signs. Two of them ("j" and "z") are expressed by movement; for this reason, the database contains the 24 static signs (see Figure 1). A solid white background was used for segmentation purposes. Images were captured with a Canon EOS Rebel T3 digital camera (EF-S 18-55 lens) using flash mode in order to reduce shadows. Five versions per sign were captured from a single signer.

Jacobi-Fourier Moments
Jacobi-Fourier Moments (JFMs) [4] are a powerful tool extensively used in image analysis. JFMs extract relevant information from a function (in this case, the segmented gray-scale image of a sign) and, owing to their orthogonality property, can represent that function with few values and minimum redundancy.
The general expression of the JFMs depends on the order n and the repetition m. It combines the image function in polar coordinates, $f(r, \theta)$, with a kernel formed by the Fourier term $e^{jm\theta}$ and the radial orthogonal Jacobi polynomial $J_n(\alpha, \beta, r)$. The radial polynomial is in turn expressed through the Jacobi polynomials, the weight function and the normalization constant $b_n(\alpha, \beta)$, which can be described using the gamma function (Γ) [5]. The restrictions for α and β are 0

Proposed System
Figure 2 shows the block diagram of the proposed system. The original image can be seen in Figure 2(a). To reduce computational cost, a Region Of Interest (ROI) was selected by cropping the original image (see Figure 2(b)). Figure 2(c) illustrates the segmented sign for the letter "A" in gray scale, which is used to calculate 64 JFMs (Figure 2(d)). JFMs are used as descriptors of the signs and depend on four parameters (order p, repetition q, α and β); p and q correspond to n and m in the general expression. Experimentally, the best recognition rate for this database was found with α = β = 1. The 64 JFMs were computed for all combinations of p = 0, 1, 2, …, 7 and q = 0, 1, 2, …, 7; these ranges were also adjusted experimentally. The 64 JFMs were then sorted according to a proposed metric that measures the performance of each JFM (process in Figure 2(e)). Finally, 64 tests were run using a Multilayer Perceptron in WEKA [6]: the first test uses only the first (best) JFM, the second test uses the first two sorted JFMs, the third uses the first three, and so on, until test number 64 uses all 64 JFMs (process in Figure 2(f)).
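As an illustration of the descriptor grid, the sketch below enumerates the 64 (p, q) pairs and builds the coefficients of the radial Jacobi polynomial for α = β = 1. It assumes the series form of the Jacobi polynomials commonly given in the JFM literature, G_n(α, β, r) = [n! Γ(β) / Γ(α + n)] Σ_{s=0..n} (−1)^s Γ(α + n + s) / [(n − s)! s! Γ(β + s)] r^s, which for α = β = 1 reduces to a shifted Legendre polynomial. The function names are hypothetical, and the weight and normalization factors of the full radial kernel are deliberately left out.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def jacobi_coeffs_alpha_beta_1(n):
    """Coefficients c_0..c_n of G_n(1, 1, r) = sum_s c_s * r**s.

    Assumed series form (see lead-in); with alpha = beta = 1 the gamma
    functions collapse to factorials.
    """
    return [
        Fraction((-1) ** s * factorial(n + s),
                 factorial(n - s) * factorial(s) ** 2)
        for s in range(n + 1)
    ]


def inner_product_unit_interval(a, b):
    """Exact integral over [0, 1] of the product of two polynomials,
    each given as a list of coefficients in increasing powers of r."""
    total = Fraction(0)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            # integral of r**(i + j) over [0, 1] is 1 / (i + j + 1)
            total += a_i * b_j / (i + j + 1)
    return total


# The 64 (order, repetition) index pairs used as descriptors: p, q in 0..7.
ORDER_REPETITION_PAIRS = list(product(range(8), repeat=2))
```

Under these assumptions, `jacobi_coeffs_alpha_beta_1(2)` gives the coefficients of 1 − 6r + 6r², polynomials of different order are orthogonal on [0, 1], and the squared norm of the order-3 polynomial is 1/7, the value a normalization constant would have to cancel.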
The metric proposed to sort the JFMs by performance, and thereby perform feature selection, is described as follows. First, an M × N matrix D of descriptors is defined to hold the values of one JFM computed over the database, where M and N are the numbers of signs and versions, respectively. A desirable JFM should be numerically similar when calculated on different versions of the same sign and should change when calculated on different signs. This means that a JFM (with a particular α, β, p and q) computed over the whole database and represented by the matrix D should be nearly invariant along each row (low dispersion) and, at the same time, should vary along each column (high dispersion).
To obtain a metric that accounts for both requirements, some intermediate quantities are calculated. First, a vector with the average over versions for each sign is computed as

$$DN_i = \frac{1}{N} \sum_{j=1}^{N} D_{i,j}, \quad i = 1, 2, \ldots, M,$$

where $DN_i$ stores the mean of the versions of sign i. This vector is used to calculate the variance of versions, $SDN_i$ for i = 1, 2, …, M, that is, the dispersion of the values $D_{i,j}$ around $DN_i$. It is expected to be close to zero (minimum dispersion in versions), because descriptors should not change between versions of the same sign. Then the average of the entries $DN_i$ is calculated in order to compute the variance of signs, $S_{DN}$, that is, the dispersion of the $DN_i$ around their average. These two dispersion measures ($SDN_i$, variance of versions, and $S_{DN}$, variance of signs) are combined in a weighted ratio that estimates with a single value, mo, whether a JFM is good or not. Here mo is the objective metric: if $S_{DN} \gg 0$ and $SDN_i \cong 0$, then $mo \cong 1$, which is the desirable case (minimum variance in versions and large variance in signs), whereas $mo \ll 1$ indicates a poor descriptor because $SDN_i \gg 0$, $S_{DN} \cong 0$, or both (large variance in versions and/or minimum variance in signs).
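The idea behind the metric can be sketched in a few lines. The exact weighted ratio is not reproduced here, so the form used below, between-sign variance divided by the sum of between-sign and mean within-sign variance, is only an assumption chosen because it tends to 1 for a good descriptor and to 0 for a poor one, as described above. Population variances, real-valued descriptors (for example moment magnitudes) and the function names are likewise assumptions.

```python
from statistics import mean, pvariance


def dispersion_ratio(descriptor_matrix):
    """Score one descriptor from its M x N matrix (rows: signs, columns: versions).

    Assumed form: between / (between + within). This is NOT necessarily
    the weighted ratio used in the paper, only one with the same limits.
    """
    sign_means = [mean(row) for row in descriptor_matrix]              # DN_i
    within = mean([pvariance(row) for row in descriptor_matrix])       # mean of SDN_i
    between = pvariance(sign_means)                                    # S_DN
    if between + within == 0:
        return 0.0  # constant descriptor: carries no information
    return between / (between + within)


def rank_descriptors(matrices):
    """Return descriptor indices ordered best first (highest assumed score)."""
    scores = [dispersion_ratio(m) for m in matrices]
    return sorted(range(len(matrices)), key=lambda k: scores[k], reverse=True)


# Toy data: 2 signs x 3 versions per descriptor.
stable = [[1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]    # same within a sign, differs across signs
unstable = [[1.0, 5.0, 9.0], [1.0, 5.0, 9.0]]  # varies within a sign, identical across signs
ranking = rank_descriptors([unstable, stable])
```

With this toy data the stable descriptor scores 1.0, the unstable one scores 0.0, and the ranking places the stable descriptor first.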
The 64 JFMs were calculated and sorted by the mo metric in ascending order, and then 64 tests were run in WEKA [6] using a Multilayer Perceptron (first introduced by Rosenblatt [7]). The first test uses a single descriptor to represent each image in the database, the second test uses two descriptors, and so on. The last test uses all 64 descriptors to represent each image. Every test was run using cross validation.
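The incremental testing scheme can be sketched as follows. To keep the example free of external dependencies, a nearest-centroid classifier stands in for the WEKA Multilayer Perceptron, and leave-one-out is used as the cross-validation scheme; both are assumptions made for illustration only, as are all function names.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x
    (stand-in classifier, not the Multilayer Perceptron of the paper)."""
    sums, counts = {}, {}
    for features, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2
                   for s, v in zip(sums[label], x))

    return min(sums, key=squared_distance)


def leave_one_out_accuracy(X, y):
    """Fraction of samples classified correctly when each is held out once."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


def incremental_subset_scores(X, y, ranking):
    """Test k uses the first k ranked descriptors, for k = 1..len(ranking)."""
    scores = []
    for k in range(1, len(ranking) + 1):
        columns = ranking[:k]
        X_k = [[row[c] for c in columns] for row in X]
        scores.append(leave_one_out_accuracy(X_k, y))
    return scores


# Toy data: descriptor 0 separates the classes, descriptor 1 is noise.
X = [[0.0, 9.0], [0.1, 1.0], [1.0, 8.0], [1.1, 2.0]]
y = ["a", "a", "b", "b"]
scores = incremental_subset_scores(X, y, ranking=[0, 1])
best_size = scores.index(max(scores)) + 1
```

In this toy case the first descriptor alone classifies every held-out sample correctly, and adding the noisy second descriptor lowers the score, which mirrors why a ranked subset can outperform the full descriptor set.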
Table 1 shows the results of each classification test. The first test, which uses the JFM with p = 0 and q = 0, achieves a recognition rate of 8.3333%; the second test, which uses two JFMs (p = 0, q = 0 and p = 0, q = 2), also achieves 8.3333%. The best subset is the one that uses the first 27 JFMs to represent each image in the database, achieving a recognition rate of 95.0%.
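The reported rates can be read as image counts. The short check below assumes that every one of the 24 × 5 = 120 images is tested exactly once under cross validation; the counts are inferred from the percentages, not taken from Table 1.

```python
SIGNS, VERSIONS = 24, 5
TOTAL_IMAGES = SIGNS * VERSIONS  # 120 images in the database


def rate_percent(correct, total=TOTAL_IMAGES):
    """Recognition rate in percent, rounded to four decimals."""
    return round(100 * correct / total, 4)


first_test = rate_percent(10)     # consistent with the reported 8.3333%
best_subset = rate_percent(114)   # consistent with the reported 95.0%
all_moments = rate_percent(107)   # consistent with the reported 89.1667%
chance_level = round(100 / SIGNS, 4)  # uniform guessing over 24 signs
gain_points = round(best_subset - all_moments, 4)
```

Under that assumption, the first test corresponds to 10 correct images out of 120 (twice the chance level of about 4.17%), the best subset to 114, and the full set of 64 JFMs to 107, a difference of about 5.83 percentage points.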

Conclusion
JFMs can be used to extract descriptors of static signs. JFMs reduce the computational cost of MSL recognition, since an image can be represented by only 27 values. MSL recognition can be achieved without gloves or special markers (only a plain white background is required). A Multilayer Perceptron can classify the signs using the JFMs and achieve a recognition rate of 95% in a cross-validation scheme. The proposed metric can improve the global recognition rate: Table 1 shows that using all 64 JFMs yields a recognition rate of 89.1667%, whereas using the first 27 JFMs improves the recognition rate for this database by almost 6 percentage points.

Figure 1. Static Mexican Sign Language (MSL) alphabet signs captured with a white background and without gloves or special color markers.

Figure 2. Block diagram of the proposed system. (a) Original captured image of the letter "A"; (b) cropped image; (c) segmented image converted from RGB to gray scale; (d) 64 JFMs computed from (c); (e) JFM subset selected according to the proposed metric; and (f) database classification using (e) and a Multilayer Perceptron.

Table 1. 64 classification tests using a Multilayer Perceptron.