Static Digits Recognition Using Rotational Signatures and Hu Moments with a Multilayer Perceptron

This paper presents two systems for recognizing static signs (digits) from American Sign Language (ASL). These systems avoid the use of color markers or gloves, relying instead on low-pass and high-pass filters in the space and frequency domains and on color space transformations. The first system uses rotational signatures based on a correlation operator, with minimum distance for the classification task. The second system computes the seven Hu invariants from binary images; these descriptors are fed to a Multi-Layer Perceptron (MLP) in order to recognize the 9 different classes. The first system achieves a 100% recognition rate with leave-one-out validation, and the second achieves a 96.7% recognition rate with Hu moments and 100% using 36 normalized moments with k-fold cross-validation.


Introduction
Sign Language is the basic means of communication for the deaf community, and consists of a set of signs represented by different static hand shapes (dynamic in some cases) [1]. Sign Language is not universal; there are several of them, such as American Sign Language (ASL), Taiwan Sign Language (TSL), Mexican Sign Language (MSL), German Sign Language (GSL), Persian Sign Language (PSL) and so on. Sign Languages differ because of the customs and cultures that characterize each region or country.
The whole set of movements that people make to express an idea in Sign Language generally includes facial gestures, body postures, and arm and hand expressions, among others [2]; this is why Sign Language Recognition is so complex. This leads most works to constrain the problem with predefined conditions.
Several papers focus on static hand gestures, which mainly carry information about the static alphabet signs of a certain Sign Language [3]-[5], reporting recognition rates of approximately 92% to 98%. Working only with this data reduces the dimension of the problem; nevertheless, because of the degrees of freedom of the human hand, recognizing the static signs of any language is a complex task.
A lot of research on Sign Language has been done over the last decades. In general terms, Sign Language Recognition systems can be classified into two main categories: those using computer vision and those using electronic sensors attached to the fingers, hands or joints [6]. The former allow signs to be expressed in a more natural way, because the signer does not need to be physically connected to the system by wires, batteries or other hardware; nevertheless, with these systems signs are much more complex to analyze, because it is hard to get accurate finger data, such as position and orientation, from 2D images. On the other hand, electronic systems that use motion sensors or similar hardware can generate precise hand gesture data with or without CCDs or digital cameras, but they do not allow signers to express themselves freely, because the hands and fingers are connected to hardware.

Databases
To observe the visual behavior of the rotational-correlation signatures of static signs, a set of 9 digit signs from American Sign Language (ASL) was selected for this work. Five signs (1-5) face the dorsum of the hand to the digital camera; the remaining four (6-9) face the palm to the sensor.
These datasets use neither electronic hardware attached to the hand or fingers nor special color markers or gloves, so that the signs can be analyzed in a more natural form. The signs were captured with a Canon EOS 1100D digital camera with a CMOS sensor.
Figure 1 shows a complete version of the 9 static signs. This dataset was named "Database01" and was used to analyze the rotational-correlation signatures, in order to see whether this technique can represent the extended fingers. Database01 has 5 versions per sign or digit; the images are 2848 pixels wide by 4272 pixels high, with a depth of 24 bits per pixel, and were stored in the RGB color space.
Figure 2 shows pictures from the dataset named "Database02", which has 3 versions per sign. The displayed images were cropped from the originals to give a better view of the gestures.
Database02 is used for the second experiment, based on Hu moments. The original images are 720 pixels wide by 1280 pixels high; color is represented by RGB channels.

Rotational-Correlation Operator with Minimum Distance
Correlation can be expressed as

$$g(x, y) = f(x, y) \circ h(x, y) = \sum_{m} \sum_{n} f(m, n)\, h(x + m, y + n)$$

This operator can be used to generate a rotational-correlation signature [7] when h is a rotated version of f, which contains a binary object. The function g can be represented by a single scalar, its maximum correlation value. If f is rotated 360 times, with θ = 1, 2, 3, ..., 360°, then f can be characterized by a set of 360 maximum correlation values; normalizing these values yields a rotational-correlation signature. As shown in Figure 3, binary objects can be described by rotational signatures. This technique has some valuable properties for this particular problem: rotational-correlation signatures are scale and rotation invariant, and they do not require a continuous border along the perimeter of the hand shape.
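The construction of the signature can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: it assumes the binary object is stored as a set of (row, col) foreground pixels on a square grid, that rotation is done by nearest-neighbour resampling about the grid centre, and that normalization means dividing by the largest value. The function names and the `step` parameter are hypothetical. For binary images, the correlation at a given shift equals the number of foreground pixel pairs separated by that shift, so the maximum correlation is the count of the most frequent displacement.

```python
import math
from collections import Counter


def rotate_binary(pixels, size, angle_deg):
    """Rotate a set of (row, col) foreground pixels about the grid centre.

    Assumption: inverse nearest-neighbour mapping on a size x size grid.
    """
    centre = (size - 1) / 2.0
    angle = math.radians(angle_deg)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    rotated = set()
    for row in range(size):
        for col in range(size):
            x, y = col - centre, row - centre
            # Map each output pixel back to the source pixel it came from.
            src_x = cos_a * x + sin_a * y
            src_y = -sin_a * x + cos_a * y
            if (round(src_y + centre), round(src_x + centre)) in pixels:
                rotated.add((row, col))
    return rotated


def max_correlation(f_pixels, h_pixels):
    """Maximum of the cross-correlation of two binary images.

    Each (f pixel, h pixel) pair votes for the shift that aligns them;
    the best shift is the one with the most votes.
    """
    if not f_pixels or not h_pixels:
        return 0
    votes = Counter(
        (hr - fr, hc - fc) for fr, fc in f_pixels for hr, hc in h_pixels
    )
    return max(votes.values())


def rotational_signature(pixels, size, step=1):
    """Normalized maximum correlations for theta = step, 2*step, ..., 360."""
    raw = [
        max_correlation(pixels, rotate_binary(pixels, size, theta))
        for theta in range(step, 361, step)
    ]
    peak = max(raw)
    return [value / peak for value in raw]
```

With `step=1` the function returns the 360 values described in the text. For a straight horizontal bar, the signature peaks at 180° and 360°, where the rotated bar maps onto itself, and drops at 90° and 270°.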
The scheme of the first system is shown in Figure 4. It starts with the original image of digit 3, version 1. The original RGB image is then transformed into the HSI color space, where the Saturation channel was selected experimentally because it gave the best segmented image. This image is binarized and, finally, the rotational-correlation signature is calculated with 360 values.
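As a rough sketch of the segmentation step, the Saturation channel of the HSI model can be computed per pixel and thresholded. The formula S = 1 - 3 min(R, G, B) / (R + G + B) is the usual HSI definition; the fixed threshold value, the assumption that the hand is more saturated than the background, and the function names are illustrative assumptions, since the text does not state how the binarization was performed.

```python
def hsi_saturation(r, g, b):
    """Saturation of the HSI model for one RGB pixel, in the range [0, 1]."""
    total = r + g + b
    if total == 0:
        return 0.0  # black pixel: saturation is undefined, treated as 0
    return 1.0 - 3.0 * min(r, g, b) / total


def binarize_by_saturation(image, threshold=0.3):
    """Binarize an image given as rows of (R, G, B) tuples.

    Assumption: foreground pixels are those whose saturation reaches
    the (hypothetical) threshold.
    """
    return [
        [1 if hsi_saturation(*pixel) >= threshold else 0 for pixel in row]
        for row in image
    ]
```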
[Figure: nine static signs representing the digits one to nine respectively, captured with a digital camera without gloves or special color markers]

Figure 5 shows nine graphs, corresponding to digits 1 to 9 respectively. Signs 1 and 2 are similar, and sign 5 is the most dynamic one. Using this technique with minimum distance, the recognition rate reaches 100% in a leave-one-out scheme.
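The evaluation protocol can be illustrated with a short sketch. It assumes that "minimum distance" means assigning each signature to the class whose mean signature is nearest in Euclidean distance, which is one common reading of the term; the exact distance and prototype used in this work are not stated, and all names below are hypothetical. In leave-one-out validation, each sample is classified with class means computed from all the other samples.

```python
import math


def class_means(samples):
    """Mean feature vector per class from a list of (label, vector) pairs."""
    sums, counts = {}, {}
    for label, vector in samples:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [s / counts[label] for s in acc] for label, acc in sums.items()
    }


def classify(vector, means):
    """Label of the class mean closest to the vector (Euclidean distance)."""
    return min(means, key=lambda label: math.dist(vector, means[label]))


def leave_one_out_accuracy(samples):
    """Fraction of samples classified correctly when each is held out once."""
    hits = 0
    for i, (label, vector) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        hits += classify(vector, class_means(training)) == label
    return hits / len(samples)
```

For Database01, `samples` would hold 45 pairs (9 digits with 5 versions each), each vector being a 360-value signature.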

Hu Moments with MLP
[Figure 5: each graph shows five rotational-correlation signatures, for digits one to nine respectively]

Invariant moments describe objects in digital images geometrically, through values that tend to remain constant despite changes in translation, rotation, scale and other transformations. Low-order moments give geometrical information about the object, such as its area, center of mass or mass distribution; if the moments are normalized, they can be interpreted as statistical measures such as the mean or the variance. Geometric moments are expressed as follows:

$$m_{pq} = \sum_{x} \sum_{y} x^{p} y^{q} f(x, y) \qquad (1)$$

Central moments are defined as

$$\mu_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y), \qquad \bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}} \qquad (2)$$

and are invariant to translation. Normalized moments have a scale factor; they can be expressed by central moments as

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p + q}{2} + 1 \qquad (3)$$

Hu derived his seven moments invariant to rotation [8], and they are given as

$$
\begin{aligned}
\phi_1 &= \eta_{20} + \eta_{02} \\
\phi_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \\
\phi_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\
\phi_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \\
\phi_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
       &\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \\
\phi_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \\
\phi_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
       &\quad - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]
\end{aligned}
$$

Figure 6 shows the scheme of the second system. It starts with the original image of sign 7, version 1, which is cropped and segmented in the Red channel (selected experimentally from the RGB and HSI color spaces). After binarization, the sign is enhanced with a Laplacian high-pass filter; then 7 Hu moments and 36 normalized moments are extracted. Finally, a Multilayer Perceptron performs the classification in two tests. In the first test, with the 7 Hu moments, the system achieves 96.29% using k-fold cross-validation, supported by the Weka pattern recognition software. The second test achieves a 100% recognition rate with a Multilayer Perceptron and k-fold cross-validation using 49 normalized invariants (see Equation (3)), obtained by combining the values p, q = 0, 1, ..., 6.

Conclusions
Two systems were presented for the recognition of static signs (digits from one to nine) using two datasets (one per system). The first system tested "Database01" with rotational-correlation signatures; this technique is a binary object descriptor and can be used to recognize some static signs. Some signatures of different classes look similar; nevertheless, with minimum distance the similar classes can be separated numerically, reaching a 100% recognition rate. The second system used "Database02" with invariant moment extraction, using 7 Hu moments and 49 normalized moments to achieve recognition rates of 96.29% and 100%, respectively.
One of the most challenging tasks in Sign Language Recognition is the segmentation phase that enhances the hand shape; this is why many authors use gloves or color markers. Other works, like this one, use a solid background color, and there are other strong limitations in this field. For better segmentation results, each database used its own preprocessing: the first system changed the color space from RGB to HSI and used the Saturation channel, selected experimentally, to obtain a solid hand shape, while the second system segmented the hand shape in the RGB color space using the Red channel.
Although both systems achieve a 100% recognition rate, invariant moments use fewer descriptors (7 or 49) than rotational-correlation signatures (360 in this case); invariant moments therefore have a lower computational cost.
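To make the moment-based descriptors of the second system concrete, the following sketch computes central moments, normalized moments and the seven Hu invariants from a binary image stored as a list of rows. It assumes the standard textbook definitions of these quantities and is not the code used in this work (classification there was done with Weka); the function names and the `max_order` parameter are hypothetical.

```python
def central_moments(img, max_order):
    """Return {(p, q): mu_pq} for 0 <= p, q <= max_order."""
    m00 = sum(v for row in img for v in row)
    xc = sum(x * v for row in img for x, v in enumerate(row)) / m00
    yc = sum(y * v for y, row in enumerate(img) for v in row) / m00
    mu = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1):
            mu[p, q] = sum(
                (x - xc) ** p * (y - yc) ** q * v
                for y, row in enumerate(img)
                for x, v in enumerate(row)
            )
    return mu


def normalized_moments(img, max_order=6):
    """Scale-normalized central moments eta_pq; 49 values for max_order=6."""
    mu = central_moments(img, max_order)
    return {
        (p, q): value / mu[0, 0] ** ((p + q) / 2 + 1)
        for (p, q), value in mu.items()
    }


def hu_moments(img):
    """The seven Hu invariants, built from moments of order 2 and 3."""
    n = normalized_moments(img, max_order=3)
    a = n[3, 0] + n[1, 2]
    b = n[2, 1] + n[0, 3]
    c = n[3, 0] - 3 * n[1, 2]
    d = 3 * n[2, 1] - n[0, 3]
    return [
        n[2, 0] + n[0, 2],
        (n[2, 0] - n[0, 2]) ** 2 + 4 * n[1, 1] ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n[2, 0] - n[0, 2]) * (a ** 2 - b ** 2) + 4 * n[1, 1] * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]
```

Calling `normalized_moments(img)` with the default `max_order=6` yields the 7 x 7 = 49 combinations of p, q = 0, ..., 6, while `hu_moments(img)` yields 7 values. Either vector is far shorter than a 360-value rotational-correlation signature.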