Depth-Aided Tracking Multiple Objects under Occlusion

In this paper, we present a novel tracking method aiming at detecting objects and maintaining their label/identification over time. The key factors of this method are the use of depth information and of different strategies to track objects under various occlusion scenarios. Foreground objects are detected and refined by background subtraction and shadow cancellation. Occlusion detection is based on the information of foreground blobs in successive frames. The occlusion regions are projected onto the projection plane XZ to analyze the occlusion situation. According to the occlusion analysis results, corresponding strategies are introduced to track objects under various occlusion scenarios, including tracking occluded objects in a similar depth layer and in different depth layers. The experimental results show that our proposed method can track moving objects under the most typical and challenging occlusion scenarios.


Introduction
Object tracking in video sequences plays an important role in the research area of computer vision and in a wide range of applications, such as video monitoring and surveillance, video conferencing, and video summarization. Based on different camera configurations, objects can be tracked using a single camera or stereo/multiple cameras. Object tracking with a single camera has been studied in many works, and different methods have been developed, such as model-based tracking [1], appearance-based methods [2][3][4], feature-based tracking [5], and statistical methods [6][7][8]. Many algorithms obtain good results in some cases, for example when the targets are well separated. However, multiple object tracking is still a challenging task due to the non-rigid motion of deformable objects, persistent occlusion, and the dynamic change of object attributes such as color distribution, shape, and visibility. In real scenes, occlusion between objects often occurs; for example, in a typical surveillance scenario a person is partially or fully occluded by other people. Unfortunately, these occlusions lead to tracking failures. Some classical frameworks have been extended to track multiple objects. In the multi-object tracking system of [9], the level-set method is used to handle contour splitting and merging. Further methods, i.e., Monte Carlo based probabilistic methods [10], game-theory based approaches [11], and appearance-model based deterministic methods [12,13], have been presented to solve the mutual occlusion problem.
Another attractive research direction is stereo or multiple-camera based methods. While object detection and tracking with a single camera are well-explored topics, the use of multi-camera technology for this purpose has attracted much attention recently due to the availability and low price of new hardware. A multi-camera system observes the scene from two or more different views and obtains more comprehensive information than a monocular camera system; in particular, it can take advantage of depth information to improve the performance of the tracking system. Some tracking methods focus on using depth information only [14], on using depth information for better foreground segmentation [15], or on using depth information as a feature fused in a maximum-likelihood model to predict 3D object positions [16].
In this work, we present a novel tracking method aiming at detecting objects and maintaining their label/identification across the video frame sequence. The main points of this method are the use of depth information and of different strategies to track objects under various occlusion scenarios. Figure 1 shows the flowchart of our tracking system.
The rest of this paper is organized as follows: Section 2 presents the proposed tracking method; Section 3 shows experimental results; and, finally, Section 4 concludes this paper.

Proposed Tracking Method
Our proposed tracking system is shown in Figure 1; it consists of the following main steps.

Depth Estimation
Depth estimation aims at calculating the structure and depth of objects in a scene from two views or a set of multiple views. This topic has attracted extensive attention in the research community. A comprehensive survey and evaluation of dense two-view stereo matching algorithms can be found in [17].
In this work, depth is estimated based on the block matching algorithm proposed in [18]. This block matching technique is a one-pass stereo matching algorithm that uses a sliding sum-of-absolute-differences (SAD) window between pixels in the left image and pixels in the right image. An example of a depth image is shown in Figure 2.
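To make the SAD matching idea concrete, the following is a minimal sketch on single 1-D scanlines. It is illustrative only: the method of [18] operates on full images with a sliding window, and all function names and toy values here are our own, not from the paper.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length patches."""
    return sum(abs(x - y) for x, y in zip(a, b))

def disparity(left, right, x, half=1, max_disp=4):
    """Best disparity d for pixel x of `left`: the horizontal shift that
    minimises the SAD between the window around left[x] and the same-size
    window moved d pixels to the left in `right`."""
    patch = left[x - half:x + half + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(0, max_disp + 1):
        if x - half - d < 0:          # window would fall off the image
            break
        cost = sad(patch, right[x - half - d:x + half + 1 - d])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Toy scanline pair: the feature in the right view sits 2 pixels to the left.
left  = [0, 0, 10, 50, 10, 0, 0, 0]
right = [10, 50, 10, 0, 0, 0, 0, 0]
```

Larger disparities then correspond to nearer objects, which is what gives the depth (gray-level) image used in the rest of the pipeline.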

Foreground Segmentation and Shadow Cancellation
Our method performs foreground segmentation to speed up the process of object tracking. There are many foreground segmentation algorithms, for instance the Gaussian mixture model [19,20]. In our method, we use a simple technique based on absolute differences between the current image and a background image.
In some cases fixed cameras observe the scene, so an image of the scene background may be available. In most cases, however, this background is not readily available. Moreover, the background scene often evolves over time, for example because lighting conditions change or because objects are added to or removed from the background. It is therefore necessary to build the background model dynamically by updating it regularly. This can be accomplished by computing a moving average:

μ_t = (1 − α)·μ_{t−1} + α·p_t,    (1)

where p_t is the pixel value at a given time t, μ_t is the current average value, and α is the learning rate, which defines the influence of the current value.
In our method, a color background model is first created by computing a moving average for each channel (R, G, and B channels of the color image) of each pixel over the incoming frames (around 10 frames). The decision to mark a foreground pixel is simply based on comparing the current frame with the background model, which is then updated. Specifically,

F(p) = 1 if |I_t(p) − I_bg(p)| > H_t, and F(p) = 0 otherwise,    (2)

where I_t(p) and I_bg(p) are the values of pixel p in the current frame and in the background model, evaluated per color channel, and H_t is a threshold, which for each color channel can be set to 0.3·I_bg(p). Blobs are also given temporary identifications.
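The running-average update and the per-pixel foreground test described above can be sketched as follows. The learning rate value and the helper names are our own illustration; only the update rule and the 0.3 factor come from the text.

```python
def update_background(bg_pixel, pixel, alpha=0.05):
    """Running average per channel: mu_t = (1 - alpha)*mu_{t-1} + alpha*p_t."""
    return [(1 - alpha) * m + alpha * p for m, p in zip(bg_pixel, pixel)]

def is_foreground(pixel, bg_pixel):
    """A pixel is foreground if any channel differs from the background
    model by more than H = 0.3 * background value of that channel."""
    return any(abs(p - b) > 0.3 * b for p, b in zip(pixel, bg_pixel))
```

In a real system these functions would run over every pixel of the frame, with the foreground mask then cleaned by shadow removal and morphology as described below.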

 
To avoid the effects of shadows, the shadow detection method described in [21] is employed; for more detail on this method, please refer to [21]. The result of the shadow detection is shown in Figure 4.
Blobs Extraction and Blobs' Information Storage

First, we extract the blobs of objects from the segmented foreground image. Blob extraction is performed on the foreground binary image by connected-component labeling using the CvBlobsLib library [22]. The foreground binary image is obtained by a simple threshold operation followed by erosion and dilation on the segmented foreground image. CvBlobsLib provides two basic functionalities: extracting 8-connected components, referred to as blobs, in binary or grayscale images using Chang's contour-tracing algorithm [23], and filtering the obtained blobs to keep the objects of interest. In our method, we remove any blob whose area is smaller than 100 pixels.

We define two kinds of distance: the distance between blob i and blob j in the same frame, and the distance between blob i of frame t and blob j of frame t−1. The distance d_t(i, j) between blob i and blob j in the same frame t is computed by

d_t(i, j) = √((x_i^t − x_j^t)^2 + (y_i^t − y_j^t)^2),    (3)

where (x_i^t, y_i^t) and (x_j^t, y_j^t) are the center coordinates of blob i and blob j at frame t, respectively. Similarly, the distance between blob i of frame t and blob j of frame t−1 is calculated by

d_{t,t−1}(i, j) = √((x_i^t − x_j^{t−1})^2 + (y_i^t − y_j^{t−1})^2),    (4)

where (x_i^t, y_i^t) and (x_j^{t−1}, y_j^{t−1}) are the center coordinates of blob i at frame t and blob j at frame t−1. For every frame, after extracting the blobs, information about each blob is stored in a structured record for later processing steps. The blob's information includes its temporary identification, its center coordinates, the total number of pixels (blob area BA), and the average depth value of the blob (Z_min).

Occlusion Detection
We detect occlusion in the current frame according to the blobs' information at frame t and the previous frame t−1. Detection is based on two clues: the first comes from the shortest distance between blobs at the same frame t−1, and the second from the difference in the number of blobs between frames t and t−1. First, we find the shortest distance between blobs in frame t−1, assuming that it occurs between blob m and blob n, i.e., d_{t−1}(m, n) = min_{i,j} d_{t−1}(i, j). We define an occlusion flag f_occ_start, which indicates that occlusion is found at frame t:

f_occ_start = 1 if d_{t−1}(m, n) < T_d and N_t < N_{t−1}, and 0 otherwise,

where T_d is the threshold on the blob distance within the same frame and N_t denotes the number of blobs at frame t.

Similarly, we also detect when the occlusion terminates. The end of occlusion is checked based on the shortest distance between blobs at the current frame t and the difference in the number of blobs between current frame t and previous frame t−1. The end-of-occlusion flag is defined analogously:

f_occ_end = 1 if d_t(m, n) > T_d and N_t > N_{t−1}, and 0 otherwise.
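A compact sketch of this start/end-of-occlusion test follows. The distance threshold value and the helper names are ours; the two clues (closest-pair distance and blob-count change) are the ones described above.

```python
import math

def min_pair_distance(centers):
    """Shortest centre-to-centre distance among the blobs of one frame."""
    best = float("inf")
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            (x1, y1), (x2, y2) = centers[i], centers[j]
            best = min(best, math.hypot(x1 - x2, y1 - y2))
    return best

def occlusion_starts(prev_centers, curr_centers, t_d=40.0):
    """Occlusion starts when two blobs were close in frame t-1 and the
    blob count drops at frame t (two blobs merged into one)."""
    return (min_pair_distance(prev_centers) < t_d
            and len(curr_centers) < len(prev_centers))

def occlusion_ends(prev_centers, curr_centers, t_d=40.0):
    """Occlusion ends when blobs separate again: the blob count grows and
    the closest pair in the current frame is farther apart than t_d."""
    return (len(curr_centers) > len(prev_centers)
            and min_pair_distance(curr_centers) > t_d)
```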

Object Tracking
According to the result of occlusion detection, the tracked objects can be divided into two types: tracking objects without occlusion and tracking objects under occlusion.

Tracking Objects without Occlusion
Video object correspondence under non-occlusion is obtained through the shortest distance between blobs in the previous frame and blobs in the current frame. This distance is calculated by Equation (4). Once a foreground blob i at frame t finds its corresponding blob j in frame t−1, its label or identification is updated to the ID of blob j.
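This nearest-neighbour label propagation can be sketched as below. The data layout is an assumption of ours; the matching rule (shortest inter-frame centre distance, as in Equation (4)) is the one from the text.

```python
import math

def propagate_ids(prev_blobs, curr_centers):
    """prev_blobs: list of (id, (x, y)) from frame t-1.
    Returns, for each blob centre in the current frame, the ID of the
    nearest previous-frame blob."""
    ids = []
    for cx, cy in curr_centers:
        best_id = min(prev_blobs,
                      key=lambda b: math.hypot(b[1][0] - cx,
                                               b[1][1] - cy))[0]
        ids.append(best_id)
    return ids
```

For well-separated objects this greedy per-blob assignment is enough; the occlusion strategies below take over when blobs merge.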

Tracking Objects under Occlusion
The main idea of our tracking method is that the object label or identification (ID) is maintained consistently during occlusion and even after objects switch their positions.
When occlusion occurs, we detect and extract the occlusion region. We also detect and separately record the list of objects that were non-occluded in the previous frame but overlap each other in the current frame.
To track objects under occlusion, depth information is used to analyze the occlusion situation. First, the occluded regions are projected onto the ground plane XZ according to their horizontal position and their depth gray level (more detail in the next subsection). Then, according to the XZ plane, the occluded objects can be divided into two types based on their depth ranges: 1) in different depth layers or 2) in the same depth layer. These situations are handled as follows.
1) Projecting Occlusion Regions onto the Ground Plane XZ. Each foreground pixel in the occlusion region has a depth value obtained from the depth map. These pixels are projected onto the ground plane XZ. The projected point located at (x, z) is denoted p(x, z); its value is the total number of pixels at position x in the depth map that have the same gray level (depth value z). Figure 5 illustrates the image plane XY and the ground plane XZ.

To remove noisy points, if the value at point p(x, z) is less than a threshold T_1, the point p(x, z) is discarded. We then also apply morphological operations (dilation and erosion) to remove remaining noisy points and connect nearby points. The remaining points in the XZ plane are grouped into blobs by connected-component analysis using the CvBlobsLib library [22]. If a projected blob is smaller than a threshold T_2, it is considered noise and removed. The projected blobs are denoted PB_1, …, PB_m, where m is the total number of projected blobs. Each projected blob PB_j is marked as an object region. Figure 6 shows an example of projected blobs in the XZ plane.
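The projection step can be sketched as a 2-D counting histogram over (column, depth) pairs. The toy depth map, mask layout, and the T_1 value here are our own; the paper's real input is a full-resolution depth image.

```python
def project_to_xz(depth, mask, t1=2):
    """depth, mask: 2-D lists (rows of the image); mask marks foreground
    pixels of the occlusion region. Returns {(x, z): count}, i.e. how many
    foreground pixels in column x have depth gray level z, keeping only
    cells whose count reaches the noise threshold t1."""
    p = {}
    for row_d, row_m in zip(depth, mask):
        for x, (z, fg) in enumerate(zip(row_d, row_m)):
            if fg:
                p[(x, z)] = p.get((x, z), 0) + 1
    return {k: v for k, v in p.items() if v >= t1}
```

Connected-component grouping and the T_2 blob-size filter would then run on this sparse plane exactly as on a binary image.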
As mentioned before, according to the projected blobs in the XZ plane, the occluded objects can be divided into two types based on their depth ranges: 1) in different depth layers or 2) in one depth layer.
2) Tracking Occluded Objects in Different Depth Layers. Figure 6 shows a case where the occluded objects are in different depth layers. Once occluded objects lie in different depth layers, they can be segmented in the color image by means of their depth ranges; an example of such segmentation is shown in Figure 7. In our method, object correspondence across different layers is based on the Bhattacharyya distance [24] between color histograms. In statistics, the Bhattacharyya distance measures the similarity of two probability distributions; in our case, it represents the similarity between two normalized histograms. It is calculated by

BD(p, q) = √(1 − Σ_{i=1..N} √(p_i·q_i)),    (8)

where BD denotes the Bhattacharyya distance, p and q are the two normalized color histograms, and N is the number of bins in a histogram. Let O_1, …, O_m be the occluded objects at frame t and U_1, …, U_n denote the existing non-occluded objects in the previous frame (the frame before the occlusion is found). In this paper, color histograms are created from the hue component of the HSV color space.

For each occluded object O_i, we calculate the Bhattacharyya distance between the color histogram of this object and the color histogram of every object U_j and then find the shortest distance. The Bhattacharyya distance is computed according to Equation (8). The occluded object O_i then updates its ID according to the closest matching U_j. Figure 9 shows the numeric results of calculating the Bhattacharyya distance between two histograms. Figure 10 illustrates an example of tracking occluded objects in different depth layers.
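Equation (8) and the closest-match ID assignment can be sketched as follows; the histogram values in the test are made up, and the helper names are ours.

```python
import math

def bhattacharyya(p, q):
    """BD(p, q) = sqrt(1 - sum_i sqrt(p_i * q_i)) for normalised p, q.
    0 means identical histograms, 1 means no overlap."""
    s = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - s))  # clamp guards rounding error

def match_object(occluded_hist, existing):
    """existing: dict id -> normalised hue histogram.
    Returns the ID whose histogram has the shortest BD to the object."""
    return min(existing,
               key=lambda k: bhattacharyya(occluded_hist, existing[k]))
```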
3) Tracking Occluded Objects in One Depth Layer. Figure 11 shows an example of occluded objects in a similar depth layer.
When occluded objects have a similar depth range or full occlusion occurs, it is difficult to segment and track multiple objects with the above technique. To deal with this problem, we propose a tracking method based on the camshift (Continuously Adaptive Mean Shift) algorithm [25]: the occluded objects are processed in depth order; for each object, a back projection of the hue plane of the occlusion region is calculated using the pre-computed histogram of that object, camshift localizes the object, and the located region is then removed before the next foremost object is processed. Figure 13 demonstrates a result of tracking occluded objects in one depth layer.
In practice, when projecting partly or fully occluded objects with similar depth ranges onto the XZ plane, we obtain only one blob/region in the XZ plane. When the objects in a similar depth range are only partly occluded, the above algorithm works well (see Figure 13). However, an object fully occluded in the current frame t will reappear as a partially occluded object behind its occluder in a later frame, and it will then be tracked by our method (see the example in Figure 14).
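A heavily reduced 1-D stand-in for this depth-ordered camshift idea is sketched below: mean-shift moves a window over a back-projection (per-pixel histogram weights) until its centroid stops moving, and an object's support is erased before the next, deeper object is tracked. A real implementation would use OpenCV's CamShift with an adaptive 2-D window; everything here, including the toy weight rows, is our own illustration.

```python
def mean_shift(weights, cx, half=1, iters=10):
    """1-D mean shift: move the window centre cx toward the centroid of
    the back-projection weights under the window until convergence."""
    for _ in range(iters):
        lo, hi = max(0, cx - half), min(len(weights), cx + half + 1)
        total = sum(weights[lo:hi])
        if total == 0:
            break
        new_cx = round(sum(i * w for i, w in
                           enumerate(weights[lo:hi], lo)) / total)
        if new_cx == cx:
            break
        cx = new_cx
    return cx

def track_in_depth_order(backprojections, starts, half=1):
    """backprojections: one weight row per object, foremost object first.
    After locating each object, zero its window in the remaining maps so
    deeper objects cannot lock onto an occluder's pixels."""
    maps = [list(b) for b in backprojections]
    positions = []
    for k, cx in enumerate(starts):
        pos = mean_shift(maps[k], cx, half)
        positions.append(pos)
        for later in maps[k + 1:]:
            for i in range(max(0, pos - half),
                           min(len(later), pos + half + 1)):
                later[i] = 0
    return positions
```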

Experimental Results
In this section, we show experimental results to evaluate the proposed tracking method. We evaluate tracking performance by the capability of detecting foreground objects and keeping their IDs constant during occlusion and after the occlusion is over.

The proposed tracking method has been tested on several video sequences. The input of our method is a pair of video sequences, and the output is the left video sequence, in which the set of moving objects is labeled with IDs and bounding boxes of different colors.

Figure 15 shows results of tracking non-occluded objects. In example 15(a), over three frames (left to right) each object appears one after another, and in the fourth frame one object has gone out of the scene. These results illustrate the ability of our algorithm to detect objects and to assign and maintain their IDs.

We demonstrate the result of tracking occluded objects in different depth layers in Figure 16. Our proposed method can successfully detect objects under partial occlusion. All objects keep constant labels over time, even while moving through a variety of poses and positions.
Tracking occluded objects in a similar depth layer is shown in Figure 17. These examples show that the proposed algorithm can detect and track a partially occluded object as long as at least half of the human body is visible.
Figure 14 shows a case where an object is fully occluded and in a similar depth layer to its occluder; our system cannot detect it. However, when this object later reappears as a partially occluded object, the system detects it and maintains its ID. This example demonstrates the capability of the proposed algorithm in maintaining constant labels for objects over the video sequence.

Conclusions
In this paper, we have presented a novel tracking method aiming at detecting objects and maintaining their IDs over time. The key factor of this method is the use of depth information to track objects under various occlusion scenarios. Different object tracking strategies are applied according to the occlusion situation, including finding corresponding objects based on the Bhattacharyya distance between two histograms and using a camshift-based algorithm with the help of object depth ordering. The experimental results have confirmed the capability of our proposed object tracking algorithm under the most typical and challenging occlusion scenarios.
However, the proposed algorithm works only in indoor or medium-sized environments, since the reliability of depth information diminishes with distance from the camera, and only when objects move slowly. In future work, to construct a robust moving-object tracking system for both indoor and outdoor environments, we will study the use of more object features to classify and track objects.

Figure 1. The flowchart of the proposed tracking method.

Figure 2. Color image and depth image.

An example of a foreground image is shown in Figure 3. However, the segmented foreground image includes noise caused by shadows, which are parts of the moving objects; shadow detection and removal are therefore used to refine the foreground.

Figure 4. Shadow region detection. (a) segmented foreground image; (b) shadow detection (the blue pixels are the detected shadow).

Foreground pixels are projected onto the ground plane XZ according to their horizontal position and their depth gray level, where X is the width of the depth map and the range of Z is [0, 255].

Figure 5. The image plane XY and ground plane XZ.

Figure 6. Projected foreground blobs in the XZ plane. (a) input image; (b) foreground blob (occlusion region); (c) occlusion region (depth image); (d) projected blobs in XZ plane.

Figure 8 shows an example of the color histogram.

Figure 7. An example of image segmentation by means of depth ranges. (a) input image; (b) foreground blob (occlusion region); (c) occlusion region (depth image); (d) projected blobs in XZ plane (occluded objects in different depth layers); (e) segmented object based on depth layer 1 (red color in (d)); (f) segmented object based on depth layer 2 (blue color in (d)).

Figure 10. Tracking occluded objects in different depth layers. (a) input frame (having occluded objects); (b) output (tracking occluded objects in different layers).

Figure 11. An example of tracking occluded objects in one depth layer. (a) input frame (having occluded objects); (b) occlusion region (depth image); (c) projected blobs in XZ plane (occluded objects in one depth layer).

Figure 12 shows an example of the projection image.

Figure 13. An example of tracking occluded objects in one depth layer. (a) input frame (having occluded objects); (b) output (tracking occluded objects in one depth layer).