Unsupervised human height estimation from a single image

A single image containing only a human face, a setting not previously addressed in the literature, is used to estimate body height. The human face, and especially its vertical proportions, carries information that correlates strongly with stature. These vertical proportions remain relatively constant during human growth. Only a few facial features, namely the eyes, the lip and the chin, need to be extracted. The metric stature is estimated from statistical measurement sets and the facial vertical golden proportion. The method is tested on several individuals using only a single facial image each, and its performance is compared with that of similar methods. The experimental results indicate that the proposed method performs better and estimates stature with high accuracy.


INTRODUCTION
Human height estimation has many important applications, such as soft biometrics and human tracking [1]. In the first case, the stature can be used to rule out the possibility that a particular person is the same person seen by the surveillance cameras [2,3]. In the latter case, it can be exploited to distinguish among a small set of tracked people in the scene [1,4,5,6,7]. The stature may therefore become a very useful identification feature. When two or more images are available, stature estimation by stereo matching is computationally expensive, and the ambiguity inherent in stereo correspondence has not yet been overcome efficiently [8,9,10]. When only one image is available, the stature has to be measured from a single image. Many single-view approaches based on geometric structures or models have been proposed [1,2,3,4,5,6,7,11,12,13,14,15,16]. In [1,2,3,4,5,6,7,12,13,14,15,16], plane metrology algorithms based on vanishing points and lines are developed to measure distances or lengths on planar surfaces parallel to the reference plane. These methods have several limitations. Any slight inaccuracy in measuring the vanishing points results in large errors [2,17]. The reference points defining the top and bottom of the object must be clear and unambiguous. The reference objects must lie in the same plane as the target object. Moreover, the full body of the user must be visible to measure the stature. In [11], BenAbdelkader and Yacoob incorporated certain statistical properties of human anthropometry into stature estimation. With that approach it is difficult to obtain the stature when some anthropometric values, such as those of the acromion and trapezius (or neck), are not available. Besides, if the whole body and/or the facial plane is at an angle with respect to the camera, the weak perspective assumption cannot be exploited.
Although many single-view approaches have been proposed, some problems remain in the literature. To improve both the validity and the automation of the procedure, we have developed an approach to estimating the stature with the following advantages. 1) The image used contains only the human face, a setting not previously addressed in the literature. The human face, and especially its vertical proportions, carries information that correlates strongly with the stature [18]. The facial vertical proportion used is based on the golden proportion [19,20]. A number of investigators have commented on the relative constancy of the facial vertical proportions during human growth [21]. 2) Only a few facial features need to be extracted, and the procedure is carried out automatically by image analysis operators. 3) The extracted facial features and the facial vertical proportions are used together to estimate the stature. The estimate is tested on several individuals using only a facial image each and shows high estimation accuracy, which supports the approach as an objective, automated tool for estimating the stature. The potential applications of this work include biometric diagnoses, user authentication, smart video surveillance, human-machine interfaces, human tracking, athletic sports analysis, virtual reality, and so on.
The rest of the paper is organized as follows. Section 2 describes the facial proportions used for estimating the stature. Section 3 discusses face detection and facial feature extraction. Section 4 describes the stature estimation based on a calibrated camera. Experimental results and a performance evaluation are given in Section 5, followed by conclusions and future work in Section 6.

FACIAL PROPORTIONS
All living organisms, including humans, are encoded to develop and conform to a certain proportion [20]. The human face, and especially its vertical proportions, carries information that correlates with the stature [18]. The facial vertical proportions include the golden proportion [19,20] and the facial thirds method [22,23]. The facial golden proportion is approximately the ratio of 1.618 to 1, as shown in Figure 1. It states that the human face may be divided in golden proportion by drawing horizontal lines through the forehead hairline, the nose, and the chin, or through the eyes, the lip, and the chin.
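As a small worked relation, the golden division through the eyes, the lip and the chin can be written out explicitly. The symbol h for the lip-to-chin distance is borrowed from the stature estimation section, and the numerical value h = 44.96 mm is the one quoted in the experiments; the distance names are labels introduced here for illustration only.

```latex
\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618, \qquad
d_{\text{eyes--lip}} = \varphi\, h, \qquad
d_{\text{eyes--chin}} = (1+\varphi)\, h = \varphi^{2} h \approx 2.618\, h .

% Example with h = 44.96 mm:
d_{\text{eyes--lip}} \approx 1.618 \times 44.96 \approx 72.7\ \text{mm}, \qquad
d_{\text{eyes--chin}} \approx 2.618 \times 44.96 \approx 117.7\ \text{mm}.
```

The factor 2.618 is the offset that places the center of the eyes at Z + 2.618h when the chin is at height Z.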
The facial thirds method states that the face may be divided into roughly equal thirds by drawing horizontal lines through the forehead hairline, the eyebrows, the base of the nose, and the edge of the chin, as shown in Figure 2. In addition, the distance between the lip and the chin is double the distance between the base of the nose and the lip [22,23].
The golden proportion and the facial thirds are similar to each other. The former specifies a larger number of proportions than the latter. They use different features, so they cannot be compared directly [19]. Our experiments show that the stature estimated with the golden proportion is more consistent with the ground-truth data than that estimated with the facial thirds.

FOREGROUND DETECTION AND FACIAL FEATURE EXTRACTION
Robust and efficient extraction of the foreground object from image sequences is a key operation, and many algorithms have been developed for it [24,25,26]. The algorithm proposed in [26] is employed to extract the foreground blob, as shown in Figure 3B. After the foreground objects are extracted, accurate facial feature extraction is important for reliable estimation of the stature. A number of methods have been developed for extracting facial features [27,28,29]. Among the facial features, such as the eyes, nose, lip and chin, the eyes are one of the most important [30,31]. Since effective automatic location and tracking of a person's forehead hairline is difficult, we select the eyes, the lip and the chin as the facial features used in this study. The first step consists of locating the facial region in order to remove irrelevant information. Human skin color, though it differs widely from person to person, is distributed over a very small area of the CbCr plane [32,33]. This model is robust against different types of skin, such as those of people from Europe, Asia and Africa. The skin-tone pixels are detected using the Cb and Cr components: given lower and upper thresholds on Cb and Cr, a pixel is classified as skin tone if its values [Cb, Cr] fall within the thresholds. Each pixel in the Cb and Cr layers that does not fall within the range is set to zero. In some cases the resulting mask has concavities or spikes, as shown in Figure 3D, which affect the location of the facial features. We use the algorithm in [34] to handle this problem.
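The CbCr thresholding step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the threshold values used in the paper are not reproduced in the text, so the default ranges below are an assumption (values commonly quoted for CbCr skin detection), and the function names `rgb_to_cbcr` and `skin_mask` are hypothetical. The color conversion assumes the full-range ITU-R BT.601 (JPEG) formulas.

```python
import numpy as np


def rgb_to_cbcr(rgb):
    """Convert an (..., 3) RGB array (0-255) to Cb and Cr planes.

    Uses the full-range BT.601 (JPEG) conversion; this choice is an assumption.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Boolean mask of pixels whose [Cb, Cr] values fall within the thresholds.

    The default ranges are illustrative assumptions, not the paper's values.
    """
    cb, cr = rgb_to_cbcr(rgb)
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))


def keep_skin_pixels(rgb, **ranges):
    """Set every pixel outside the skin range to zero, as described in the text."""
    rgb = np.asarray(rgb)
    return np.where(skin_mask(rgb, **ranges)[..., None], rgb, 0)
```

In practice the ranges would be tuned once for the camera and lighting and then kept fixed for all images, as in the experiments.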
There are many fairly long horizontal edges near the facial features. To make the edge detection more stable, we compute the second derivative of the image intensity and then project it horizontally to determine the positions of the horizontal facial features [34] (see Figure 3E). The positions of the peaks in the horizontal projection curve correspond to the horizontal facial features, including the eyes, the nose and the lip. In some cases the horizontal transition is stronger at the lip than at the eyes. The lip can be detected in hue/saturation color space [35]. We take the peak with the maximal value above the lip as the position of the eyes. The chin is then located automatically between the lip and the neck. The locations of the eyes, the lip and the chin are shown in Figure 3F.
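The projection step can be illustrated with a short sketch. It assumes that the second derivative is taken along the vertical direction (so that horizontal edges respond strongly) and that its absolute value is summed along each row; the exact operator of [34] is not given in the text, and the names `horizontal_projection` and `projection_peaks` are hypothetical.

```python
import numpy as np


def horizontal_projection(gray):
    """Row-wise sum of the absolute vertical second derivative of the intensity."""
    g = np.asarray(gray, dtype=np.float64)
    d2 = np.zeros_like(g)
    # Central second difference along the vertical (row) direction.
    d2[1:-1, :] = g[2:, :] - 2.0 * g[1:-1, :] + g[:-2, :]
    return np.abs(d2).sum(axis=1)


def projection_peaks(profile, min_height):
    """Row indices of local maxima of the profile that reach min_height."""
    p = np.asarray(profile, dtype=np.float64)
    is_peak = (p[1:-1] > p[:-2]) & (p[1:-1] >= p[2:]) & (p[1:-1] >= min_height)
    return np.flatnonzero(is_peak) + 1
```

Given the row of the lip (found separately, for example in hue/saturation space), the eye row would be the strongest of the returned peaks above it.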

STATURE ESTIMATION
Stature estimation is discussed here to highlight the use of the extracted facial features. Assume that the person stands or walks on a plane and that the camera is calibrated with respect to this plane. We compute the 3D positions of the extracted facial features according to the golden proportion. Since the facial vertical proportions remain relatively constant during human growth [21], the 3D positions of the extracted facial features can be determined if a certain length or distance is known.
The need for a more detailed metric description of the head and face was recognized quite early. Major U.S. surveys, in which large numbers of measurements were made on samples of a thousand or more individuals, have been carried out on military personnel [18]. The measurement device could provide a sensitivity of less than 0.01 mm in each axis, and an accuracy on the order of 0.1 mm could be achieved [36]. Shiang [18] has also carried out extensive 3D statistical work on the human head and face. Using these measurement sets, the metric stature can be estimated with a calibrated camera. The camera model used is a central projection. Effects such as radial distortion can be removed and are not detrimental to the method. The camera perspective projection model can be represented by a 3×4 matrix M with entries m_{jk}. The image coordinates (u_i, v_i) of a point P_i = (X_i, Y_i, Z_i), expressed in a homogeneous coordinate system, are given as follows:
\[ s_i \, [u_i, \; v_i, \; 1]^T = M \, [X_i, \; Y_i, \; Z_i, \; 1]^T \tag{1} \]
where s_i is a scale factor. When estimating the stature, we assume that the depth difference among the eyes, the lip and the chin is negligible. To simplify the computation, assume that the chin point (u_1, v_1) has 3D coordinates (X, Y, Z) and lies where its horizontal line intersects the vertical line through the center of the eyes. The lip point (u_2, v_2) has coordinates (X, Y, Z+h) and lies where its horizontal line intersects the same vertical line. According to the golden proportion, the center-of-the-eyes point (u_3, v_3) has coordinates (X, Y, Z+2.618h). If h, the height between the chin and the lip, is known, (1) can be used to infer the 3D coordinates of the chin point. Expanding (1) for (X, Y, Z) gives
\[ A \, [X, \; Y, \; Z]^T = b \tag{2} \]
where, with d_1 = 0, d_2 = h and d_3 = 2.618h, the rows of A and the entries of b for i = 1, 2, 3 are
\[ A_{2i-1} = [\, m_{11} - u_i m_{31}, \;\; m_{12} - u_i m_{32}, \;\; m_{13} - u_i m_{33} \,], \qquad A_{2i} = [\, m_{21} - v_i m_{31}, \;\; m_{22} - v_i m_{32}, \;\; m_{23} - v_i m_{33} \,] \tag{3} \]
\[ b_{2i-1} = u_i m_{34} - m_{14} - (m_{13} - u_i m_{33}) \, d_i, \qquad b_{2i} = v_i m_{34} - m_{24} - (m_{23} - v_i m_{33}) \, d_i \tag{4} \]
From (2), the linear least-squares solution is given by
\[ [X, \; Y, \; Z]^T = (A^T A)^{-1} A^T b \tag{5} \]
Once the person's head-top point (u_4, v_4) is known, we can rearrange (2) to estimate the stature H using the coordinates (X, Y):
\[ H = \frac{v_4 m_{34} - m_{24} - (m_{21} - v_4 m_{31}) X - (m_{22} - v_4 m_{32}) Y}{m_{23} - v_4 m_{33}} \tag{6} \]
It is found that expressing (6) as a function of v_4 gives a more stable estimate of the stature.
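The estimation described above can be sketched numerically. This is a sketch under stated assumptions, not the authors' code: the world Z axis is taken as vertical with the reference plane at Z = 0, the head top is assumed to lie vertically above the chin at (X, Y), and the least-squares system is built by expanding the projection (1) for the chin, lip and eye points. All function names, and the `synthetic_camera` helper used only to produce a test matrix M, are hypothetical.

```python
import numpy as np

EYES_OFFSET = 2.618  # eyes lie 2.618*h above the chin (golden proportion)


def project(M, point):
    """Project a 3D point with the 3x4 matrix M and return (u, v)."""
    p = M @ np.append(np.asarray(point, dtype=float), 1.0)
    return p[:2] / p[2]


def chin_position(M, chin_uv, lip_uv, eyes_uv, h):
    """Least-squares 3D chin position (X, Y, Z) from three image points.

    Each image point at height Z + d contributes two linear equations in
    (X, Y, Z), obtained by clearing the denominator of the projection.
    """
    offsets = (0.0, h, EYES_OFFSET * h)
    rows, rhs = [], []
    for (u, v), d in zip((chin_uv, lip_uv, eyes_uv), offsets):
        for coord, r in ((u, 0), (v, 1)):
            a = M[r, :3] - coord * M[2, :3]
            rows.append(a)
            rhs.append(coord * M[2, 3] - M[r, 3] - a[2] * d)
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return solution


def stature_from_head_top(M, X, Y, v4):
    """Height of the head-top point above the plane, from its image row v4."""
    a = M[1, :3] - v4 * M[2, :3]
    return (v4 * M[2, 3] - M[1, 3] - a[0] * X - a[1] * Y) / a[2]


def synthetic_camera(focal_px=570.0, center=(320.0, 240.0),
                     position=(0.0, -3000.0, 2500.0),
                     target=(0.0, 0.0, 1500.0)):
    """Hypothetical calibrated camera (units: mm) looking obliquely downward."""
    K = np.array([[focal_px, 0.0, center[0]],
                  [0.0, focal_px, center[1]],
                  [0.0, 0.0, 1.0]])
    C = np.asarray(position, dtype=float)
    forward = np.asarray(target, dtype=float) - C
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, [0.0, 0.0, 1.0])
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)
    R = np.vstack([right, down, forward])
    return K @ np.hstack([R, (-R @ C)[:, None]])
```

With noise-free synthetic points the chin position and the stature are recovered exactly; with real detections the three facial points lie close together, so small pixel errors mainly affect the recovered depth.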

EXPERIMENTAL RESULTS
To evaluate the performance of the proposed method, we have carried out experiments with several individuals. The experiments are performed with a CCD camera that produces 640×480 pixel image sequences. The camera is mounted overhead and looks down at an oblique angle to capture the human face. In the experiment, the skin segmentation thresholds are fixed for all images. The parameter h used in (4) is set to 44.96 mm. The experimental setup includes a wall screen with a maximum size of 2.4 m × 4 m, which is divided into 24 panels at which the users point. The users stand about 2 m to 4.5 m away from the screen and can walk freely in the experimental room. The room size is about 3 m × 5 m. The experiment is performed as follows. We capture the person while she/he is pointing at the intended panel. We extract the pointing user's face and estimate his or her stature. Figure 4(a) shows some input video images of a user pointing at the panels. The tested user's face is shown as light blue pixels superimposed on the original images. The estimated stature (unit: mm) is also given in the image. The standard deviation σ of the stature estimated by the proposal is 6.744 mm, and the maximal deviation from the real height is 17.80 mm, which is within 3σ. Correspondingly, the σ of the thirds-based estimated height is 10.4188 mm, while the maximal deviation from the real height is 36.9 mm, which is outside 3σ. The proposed method outperforms the thirds-based method in estimating the stature.
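The 3σ statements above can be checked with a few lines of arithmetic: 3 × 6.744 = 20.232 mm, which exceeds 17.80 mm, while 3 × 10.4188 = 31.2564 mm, which is below 36.9 mm. The sketch below encodes this check and a simple summary of a series of estimates. How σ was computed in the paper is not stated, so the population standard deviation about the mean is an assumption, and the function names are hypothetical.

```python
import statistics


def within_three_sigma(max_deviation_mm, sigma_mm):
    """True if the maximal deviation does not exceed three standard deviations."""
    return max_deviation_mm <= 3.0 * sigma_mm


def summarize(estimates_mm, real_height_mm):
    """Mean, standard deviation and maximal deviation from the real height.

    The standard deviation is taken about the mean of the estimates
    (population form); this is an assumption, not the paper's definition.
    """
    mean = statistics.fmean(estimates_mm)
    sigma = statistics.pstdev(estimates_mm)
    max_dev = max(abs(e - real_height_mm) for e in estimates_mm)
    return mean, sigma, max_dev
```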
The deviation arises because the relative error increases with the distance between the camera and the user: each pixel subtends a fixed view angle, so the resolution is coarser at larger distances. In the experiment, a one-pixel difference corresponds to a resolution ambiguity of about 5.26 mm at a distance of 3 m. Besides, inaccurate camera calibration affects the measurement results. Another factor is human gesture, which involves periodic up-and-down displacement. Other factors, such as occlusion and face orientation, may also affect the estimation result. This problem can be avoided by using multiple cameras and estimating the stature from the camera with the best view. We hope to discuss it in the future. The processing speed of the proposal is roughly 15 frames/s for a single object with a frontal face in the scene.
Table 1 (fragment; only the following entries survive). Algorithm: Wang [12], 1820.4; Lee [15], 1818.5.
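The relation between distance and per-pixel resolution quoted in the error discussion (about 5.26 mm per pixel at 3 m) can be illustrated with a simple pinhole approximation. The sketch assumes a surface roughly facing the camera near the optical axis, so that one pixel covers about distance / focal length; the focal length is not given in the text and is inferred here from the quoted figure, and the function names are hypothetical.

```python
def implied_focal_px(distance_mm, mm_per_pixel):
    """Focal length in pixels implied by a known footprint at a known distance."""
    return distance_mm / mm_per_pixel


def mm_per_pixel(distance_mm, focal_px):
    """Approximate footprint of one pixel at the given distance (pinhole model)."""
    return distance_mm / focal_px


# Focal length implied by 5.26 mm per pixel at 3 m (an inference, not a stated value).
FOCAL_PX = implied_focal_px(3000.0, 5.26)
```

Under this approximation the footprint grows linearly with distance, so a one-pixel localization error costs more millimetres the farther the user stands from the camera.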
To verify the effectiveness of the approach, we have performed experiments with several moving individuals. The performance of the proposal for four tested individuals with known stature is compared with that of some similar methods [7,12,15] in Table 1. Table 1 summarizes the average measured (AM) stature, the standard deviation (σ), and the maximal deviation (MD) for each tested individual. The developed approach is seen to perform better.

CONCLUSIONS AND FUTURE WORKS
We have developed an unsupervised single-view method for robust, real-time estimation of the stature. The image contains only a face or upper body, a case little discussed in the literature. Only a few facial features, namely the eyes, the lip and the chin, need to be extracted. The metric stature is estimated from the statistical measurement sets and the facial vertical golden proportion. The estimate is tested on several individuals using only a facial image each and shows high accuracy, which supports the proposal as an objective, automated tool for estimating the stature. An extension to the uncalibrated case will be developed in the future.

Figure 4(b) shows the stature as the user points at panels at different locations. The symbols in the legend refer to the different statures: the real human height (real H), the height estimated from the thirds proportion (thirds based H), and the height estimated from the golden proportion (the proposed H).

Figure 4. Height results. (a) The person pointing at the panels, with the extracted face shown as light blue pixels superimposed on the original images. The height estimated by the proposal is also given. (b) Heights at different locations.
as a function of v 4 can get a more stable estimation of the stature.