Aiming at the problem of interactive gesture recognition between a lunar robot and an astronaut, a novel gesture detection and recognition algorithm is proposed. In the gesture detection stage, building on saliency detection via Graph-Based Manifold Ranking (GBMR), the depth information of the foreground is added to the superpixel computation. By increasing the weight of connectivity domains in the graph model, the foreground boundary is highlighted and the impact of the background is weakened. In the gesture recognition stage, the Pyramid Histogram of Oriented Gradient (PHOG) feature and the Gabor amplitude and phase features of the image samples are extracted. To highlight the Gabor amplitude feature, we propose a novel feature calculation that fuses the features of different directions at the same scale. Because of its strong classification capability and resistance to overfitting, AdaBoost is applied as the classifier to realize gesture recognition. Experimental results show that the improved gesture detection algorithm remains robust to the influences of a complex environment. Based on multi-feature fusion, the error rate of gesture recognition stays at about 4.2%, i.e. a recognition rate of around 95.8%.

In the field of pattern recognition and human-computer interaction, gesture recognition has become a research hotspot. For example, when a lunar robot performs a space task, accurate interpretation of gesture semantics allows the robot to carry out the corresponding motion control.

Many researchers focus on the salient object detection problem for images [

Early gesture recognition systems mainly used mechanical equipment, such as data gloves, to obtain hand pose information in space. At present, compared with wearable devices, gesture recognition based on computer vision can accommodate the freedom of human motion, so researchers have proposed many vision-based gesture detection and recognition algorithms. As mentioned by T.H. Kim [

In the gesture detection stage, state-of-the-art algorithms can be divided into two groups: algorithms based on motion information and algorithms based on appearance feature extraction. Typical examples of the first group are the Background Subtraction (BS) method and the Optical Flow (OF) method. The BS method needs to obtain the image background in advance, and it assumes that the colors of foreground and background differ clearly, so it is susceptible to external conditions such as illumination changes. The OF method does not have to obtain the background image in advance, but it also requires relatively constant illumination. Methods in the second group are modeled in different color spaces; they are likewise susceptible to external conditions and not robust to complex backgrounds.

In the recognition stage, we also divide the relevant algorithms into two groups: algorithms based on template matching and algorithms based on artificial neural networks. The former compares the target with templates learned from a large number of samples, and the category judgment is carried out by a similarity measure. The latter is also built on the premise of a large number of training samples; the complexity of the network structure and the number of parameters make it a great challenge for practical application.

With the increasing popularity of depth cameras, which capture color data and depth data simultaneously, more and more visual tasks use depth acquisition equipment. Fusing depth data with color data can benefit the feature extraction of samples. Based on such image acquisition equipment, this paper adds depth data to realize gesture detection in complex environments.

In this paper, a novel gesture detection and recognition algorithm is proposed. In the gesture detection stage, applying saliency detection via Graph-Based Manifold Ranking (GBMR), the depth information of the foreground is added to the superpixel computation. By increasing the weight of connectivity domains in the graph model, the foreground boundary is highlighted and the impact of the background is weakened. In the gesture recognition stage, the Pyramid Histogram of Oriented Gradient (PHOG) feature and the Gabor amplitude and phase features of the image samples are extracted. To highlight the Gabor amplitude feature, we propose a novel feature calculation that fuses the features of different directions at the same scale. Because of its strong classification capability and resistance to overfitting, AdaBoost is applied as the classifier to realize gesture recognition. The structural flow of the algorithm is shown in

Saliency Detection via Graph-Based Manifold Ranking (GBMR) was proposed by Chuan Yang [

The GBMR algorithm process can be expressed as follows:

After the superpixel (SP) segmentation of the input image, a k-regular graph over the single-layer image is constructed to establish the relationships between the SP blocks, and the manifold ranking (MR) algorithm is applied to calculate the ranking scores between the query points and the non-query points.

Given a dataset $X = \{x_1, x_2, \cdots, x_n\} \in \mathbb{R}^{m \times n}$, the algorithm uses a vector $y = (y_1, \cdots, y_n)^T$ to record the labeling of the data: when $y_i = 1$, the corresponding $x_i$ is a query point; when $y_i = 0$, the corresponding $x_i$ is unlabeled data to be ranked.

Define the ranking function $f : X \to \mathbb{R}^n$, which outputs the corresponding ranking scores $f = (f_1, f_2, \cdots, f_n)^T$ for the data points $x_i$.

1) Construct a graph model $G = (V, E)$ based on the dataset X, where V is the node set and E is the edge set.

2) Calculate the E-based affinity matrix $W = [\omega_{ij}]_{n \times n}$, where

$\omega_{ij} = \exp\left(-\dfrac{dist^2(x_i, x_j)}{\sigma^2}\right)$.

3) Calculate the degree matrix of the graph $D = \mathrm{diag}(d_{11}, d_{22}, \cdots, d_{nn})$, where $d_{ii} = \sum_j \omega_{ij}$.

4) The manifold ranking function is $f^* = (D - \alpha W)^{-1} y$.
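The closed-form ranking step above can be sketched as follows; the affinity matrix and the value of $\alpha$ are illustrative toy inputs, not the paper's superpixel graph:

```python
import numpy as np

def manifold_ranking(W, y, alpha=0.99):
    """Closed-form manifold ranking: f* = (D - alpha*W)^(-1) y.

    W : (n, n) affinity matrix between graph nodes (superpixels)
    y : (n,) indicator vector (1 for query nodes, 0 otherwise)
    """
    D = np.diag(W.sum(axis=1))             # degree matrix d_ii = sum_j w_ij
    return np.linalg.solve(D - alpha * W, y)

# Toy chain graph of 4 nodes; node 0 is the query.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([1.0, 0.0, 0.0, 0.0])
scores = manifold_ranking(W, y)
# Nodes closer to the query receive higher ranking scores.
```

Solving the linear system directly avoids forming the inverse explicitly, which is both cheaper and numerically safer for larger superpixel graphs.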

In the SP segmentation of the input image, if only RGB color information is considered, the gesture segmentation result is not effective when the background is complicated. Incomplete segmentation of the target, or segmentation that includes parts of the background, will adversely affect the subsequent graph modeling and ranking. In this paper, we add depth information to the SP segmentation so that the target boundary is highlighted. In the calculation of boundary weights in the graph model, depth information is also added to weaken the influence of the background.

In this paper, in the process of implementing SLIC superpixel segmentation [

$d_c = \sqrt{(l_i - l_j)^2 + (a_i - a_j)^2 + (b_i - b_j)^2}$ (1)

$d_s = \sqrt{(D_i - D_j)^2}$ (2)

$d_{xy} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$ (3)

In these formulas, $d_c$ is the $(l, a, b)$ distance measure between pixels i and j in the CIELAB color space, $d_s$ is the distance between the depth values of the pixels in the depth image, and $d_{xy}$ is the spatial distance between the pixel coordinates in the image. The final distance metric is then:

$Dist = \sqrt{(\alpha \cdot d_c)^2 + (\beta \cdot d_s)^2 + (\gamma \cdot d_{xy})^2}$ (4)

In the formula, the parameters $\alpha$, $\beta$, $\gamma$ are the balance weights of $d_c$, $d_s$, $d_{xy}$ respectively. With the above distance measure, the boundary of the target in the SP segmentation stage is clearer, which benefits the subsequent graph model and ranking score algorithm. The result is shown in
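A minimal sketch of the combined distance of Eq. (4); the weight values are illustrative defaults, not the paper's tuned parameters:

```python
import numpy as np

def slic_dist(lab_i, lab_j, depth_i, depth_j, xy_i, xy_j,
              alpha=1.0, beta=1.0, gamma=0.5):
    """Depth-augmented SLIC distance: colour (CIELAB), depth, and spatial
    terms combined as in Eq. (4). alpha/beta/gamma are illustrative."""
    d_c = np.linalg.norm(np.asarray(lab_i, float) - np.asarray(lab_j, float))
    d_s = abs(float(depth_i) - float(depth_j))
    d_xy = np.linalg.norm(np.asarray(xy_i, float) - np.asarray(xy_j, float))
    return np.sqrt((alpha * d_c) ** 2 + (beta * d_s) ** 2 + (gamma * d_xy) ** 2)
```

In a full SLIC implementation this distance would be evaluated between each pixel and the nearby cluster centers during the assignment step.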

In order to make the edge weights between nodes in the graph model more discriminative, this paper updates the weight calculation as:

$\omega_{ij} = \exp\left(-\dfrac{dist^2(x_i, x_j)}{\sigma_1^2} - \lambda \cdot \dfrac{dist^2(D_i, D_j)}{\sigma_2^2}\right)$ (5)

where $\lambda$, $\sigma_1$ and $\sigma_2$ are balance coefficients, and $dist^2(x_i, x_j)$ and $dist^2(D_i, D_j)$ of each sub-block are measured by the $\chi^2$ distance.
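A sketch of this depth-augmented edge weight, assuming each superpixel is described by a colour histogram and a depth histogram (the $\chi^2$ implementation and the parameter values are illustrative):

```python
import numpy as np

def chi2(p, q):
    """Chi-square distance between two histograms."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    denom = p + q
    mask = denom > 0
    return 0.5 * np.sum((p[mask] - q[mask]) ** 2 / denom[mask])

def edge_weight(x_i, x_j, D_i, D_j, sigma1=0.1, sigma2=0.1, lam=0.5):
    """Eq. (5): edge weight combining a colour term and a depth term.
    x_i, x_j : colour histograms; D_i, D_j : depth histograms."""
    return np.exp(-chi2(x_i, x_j) / sigma1**2
                  - lam * chi2(D_i, D_j) / sigma2**2)
```

Identical superpixels get weight 1; large colour or depth differences drive the weight toward 0, weakening background links in the graph.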

The experimental hardware environment in this paper is an Intel Core i3 processor with a main frequency of 3.60 GHz. We select the ChaLearn Kinect dataset [

The PHOG feature was initially proposed by Anna Bosch [

1) After segmenting the Region of Interest (ROI) in the input images and converting the RGB image to grayscale, the Canny operator is applied to obtain the edge information of the image.

2) Image layering. The first layer is the entire input image, labeled Level = 0. The second layer divides the image into 2 × 2 sub-regions, labeled Level = 1. The third layer divides the image into 4 × 4 sub-regions, labeled Level = 2.

3) Calculate the gradient direction per pixel at each layer. Divide the π or 2π angle range into several bins, and obtain the statistical histogram to generate a one-dimensional vector (the HOG feature).

4) Concatenate the HOG features of all layers to obtain the PHOG feature of the entire image.
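The four steps above can be sketched as follows; for brevity a plain image gradient stands in for the Canny edge map, and the bin count of 9 is illustrative:

```python
import numpy as np

def phog(gray, levels=3, bins=9):
    """PHOG sketch: orientation histograms over a 1x1, 2x2, 4x4 spatial
    pyramid (Level = 0..2), concatenated and normalised."""
    gray = np.asarray(gray, float)
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)                         # edge strength
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # orientations in [0, pi)
    feats = []
    for level in range(levels):
        n = 2 ** level                             # n x n grid at this level
        for rows in np.array_split(np.arange(gray.shape[0]), n):
            for cols in np.array_split(np.arange(gray.shape[1]), n):
                cell_ang = ang[np.ix_(rows, cols)].ravel()
                cell_mag = mag[np.ix_(rows, cols)].ravel()
                hist, _ = np.histogram(cell_ang, bins=bins,
                                       range=(0, np.pi), weights=cell_mag)
                feats.append(hist)
    f = np.concatenate(feats)
    return f / (f.sum() + 1e-9)                    # normalise the pyramid
```

With 3 levels and 9 bins the descriptor has 9 × (1 + 4 + 16) = 189 dimensions.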

In this paper, the example result of PHOG features extraction is shown in

The Gabor feature originates from the Gabor transform, which extends the one-dimensional Gabor filter to two-dimensional image feature extraction. The two-dimensional Gabor kernel function is defined as:

$\phi_{u,v}(z) = \dfrac{\|k_{u,v}\|^2}{\sigma^2} \exp\left(-\dfrac{\|k_{u,v}\|^2 \|z\|^2}{2\sigma^2}\right) \cdot \left(\exp(i k_{u,v} z) - \exp\left(-\dfrac{\sigma^2}{2}\right)\right)$ (6)

In the formula, u and v are the direction and scale of the Gabor kernel respectively, $z = (x, y)$ is the coordinate of a given point in the image, and $k_{u,v}$ is the central frequency of the filter. For a given direction and scale, $k_{u,v}$ is calculated as:

$k_{u,v} = k_v e^{i \phi_u}$ (7)

In the formula, $k_v = k_{max} / f^v$, $v \in \{0, 1, 2, 3, 4\}$; $\phi_u = \pi u / 8$, $u \in \{0, 1, 2, \cdots, 7\}$.

If the grayscale image is $I(z)$, the Gabor feature of the image is the convolution of $I(z)$ with the Gabor kernel function:

$Q_{u,v}(z) = I(z) \otimes \phi_{u,v}(z)$ (8)

In the formula, $Q_{u,v}(z)$ is the feature description of the image $I(z)$ in direction u and at scale v.
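Equations (6)-(8) can be sketched as below; the kernel size, $k_{max} = \pi/2$, $f = \sqrt{2}$ and $\sigma = 2\pi$ are common choices for this family of filters, assumed here rather than taken from the paper:

```python
import numpy as np

def gabor_kernel(u, v, size=31, k_max=np.pi / 2, f=np.sqrt(2), sigma=2 * np.pi):
    """Complex Gabor kernel of Eqs. (6)-(7) for direction u (0..7), scale v (0..4)."""
    k = (k_max / f**v) * np.exp(1j * np.pi * u / 8)   # k_{u,v} = k_v e^{i phi_u}
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    dot = k.real * x + k.imag * y                      # k_{u,v} . z
    zz = x**2 + y**2                                   # ||z||^2
    kk = abs(k) ** 2                                   # ||k_{u,v}||^2
    return (kk / sigma**2) * np.exp(-kk * zz / (2 * sigma**2)) \
        * (np.exp(1j * dot) - np.exp(-sigma**2 / 2))

def gabor_response(img, u, v):
    """Eq. (8): 'same'-size filtering of the image with the Gabor kernel
    (implemented as a sliding-window sum for self-containment)."""
    ker = gabor_kernel(u, v)
    kh, kw = ker.shape
    ph, pw = kh // 2, kw // 2
    img = np.asarray(img, dtype=complex)
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape, dtype=complex)
    for dy in range(kh):
        for dx in range(kw):
            out += ker[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out
```

In practice an FFT-based convolution would replace the explicit loops, but the result per pixel is the same complex response whose magnitude and angle give the amplitude and phase features used below.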

As shown in

When extracting the feature data of the input image, grayscale processing is first applied. In this paper, the Gabor transform of the image is carried out by Gabor filters with 8 directions and 5 scales. If the 40 resulting feature maps are cascaded directly, the dimension of the feature data is expanded 40 times, and some of the data contributes little to the description of the image as a whole, resulting in redundancy. In this paper, by fusing the Gabor features of different directions at the same scale, the dimension is lowered while the valid information is retained.

Take the maximum value of the Gabor feature over the different directions at the same scale, that is,

$Q'_v(z) = \max\limits_{u \in \{0, 1, 2, \cdots, 7\}} Q_{u,v}(z)$ (9)

In the formula, $Q'_v(z)$ is the fused eigenvalue at scale v over the different directions, and $z = (x, y)$ is the coordinate of a given point in the image. The fused feature result is shown in
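The max-fusion of Eq. (9) amounts to a per-pixel maximum of the amplitude maps over the 8 directions of one scale:

```python
import numpy as np

def fuse_directions(responses):
    """Eq. (9): at each pixel, keep the maximum Gabor amplitude over the
    directional responses of one scale (any iterable of complex maps)."""
    amps = np.stack([np.abs(r) for r in responses])   # (U, H, W) amplitudes
    return amps.max(axis=0)                           # (H, W) fused amplitude map
```

This collapses the 8 directional maps of each scale into a single map, so the 40 Gabor maps reduce to 5 before any further compression.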

To further reduce the Gabor feature dimension, the Gabor fusion map of each scale is divided into non-overlapping sub-graphs. The mean and standard deviation of each sub-graph are calculated and recorded as $(m, \gamma)$, and the $(m, \gamma)$ pairs of all sub-graphs are cascaded to constitute the eigenvector of the fusion map. Assuming that the fusion map of each scale is divided into n sub-graphs and the number of Gabor scales is v, the final Gabor feature dimension of the image to be measured is $2 \times n \times v$.
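This block-statistics step can be sketched as follows; the 4 × 4 grid (n = 16 sub-graphs) is an illustrative choice:

```python
import numpy as np

def block_stats(fused, grid=(4, 4)):
    """Split one scale's fused Gabor map into non-overlapping sub-blocks and
    keep (mean, standard deviation) per block -> a vector of length 2*n."""
    feats = []
    for rows in np.array_split(np.arange(fused.shape[0]), grid[0]):
        for cols in np.array_split(np.arange(fused.shape[1]), grid[1]):
            block = fused[np.ix_(rows, cols)]
            feats.extend([block.mean(), block.std()])
    return np.asarray(feats)
```

Concatenating these vectors over the v scales yields the 2 × n × v dimensional Gabor amplitude descriptor.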

When the Gabor filtering is carried out, both amplitude and phase features are output. The real part and the imaginary part of the Gabor filter coefficients are encoded by Quadrant Binary Coding (QBC) for each pixel. The phase feature is then described by the Local XOR Pattern (LXP) operator, which is adapted from the Local Binary Pattern (LBP) [

After the PHOG feature, the fused Gabor amplitude feature and the phase feature of the image samples are obtained, the data are merged as the final feature vector of the input image.

The AdaBoost classifier is an adaptive, high-precision classifier, which takes a weak classification algorithm as the base classifier in the boosting [
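As a sketch of this classification stage, using scikit-learn's AdaBoostClassifier on synthetic stand-in features rather than the paper's fused PHOG + Gabor vectors:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Synthetic stand-in for the fused feature vectors and gesture labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # toy binary labels

# AdaBoost with its default weak learner (a depth-1 decision stump):
# each boosting round reweights the samples the previous stumps misclassified.
clf = AdaBoostClassifier(n_estimators=100).fit(X, y)
train_acc = clf.score(X, y)
```

In the paper's multi-class setting the same classifier is applied over the 6 gesture categories; here a binary toy problem keeps the sketch minimal.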

In the experiments, gesture recognition samples are selected from the American Sign Language (ASL) database, with 12,210 samples in 6 categories. The number of samples per category is shown in

The comparison of different algorithms is shown in

In this paper, an improved gesture detection and recognition algorithm is proposed; the contributions can be summarized as follows:

1) In the gesture detection stage, the GBMR algorithm is improved: depth information is added to the SP segmentation and to the boundary weight calculation of the graph model, highlighting the boundary of the target region and weakening the background impact.

2) In the gesture recognition stage, the Gabor amplitude features of different directions at the same scale are fused, highlighting texture information, and the dimension of the Gabor amplitude feature is reduced by a block statistics method.

3) In the gesture recognition stage, the multi-scale PHOG feature, the fused Gabor amplitude feature and the phase feature are integrated, and the AdaBoost classifier is applied to realize recognition.

The authors declare no conflicts of interest regarding the publication of this paper.

Zhu, X.F., Yan, W.Z., Chen, D.Z. and Gao, C.C. (2019) Research on Gesture Recognition Based on Improved GBMR Segmentation and Multiple Feature Fusion. Journal of Computer and Communications, 7, 95-104. https://doi.org/10.4236/jcc.2019.77010