
Disparity estimation is an ill-posed problem in computer vision. It has been studied extensively because of its usefulness in areas such as 3D scene reconstruction, robot navigation, parts inspection, virtual reality and image-based rendering. In this paper, we propose a hybrid disparity generation algorithm that combines census-based and segmentation-based approaches. The census transform does not give good results in textureless areas but is suitable for highly textured regions, whereas segment-based stereo matching techniques give good results in textureless regions. Coarse disparities obtained from the census transform are combined with the region information extracted by mean shift segmentation, so that region matching can be applied using an affine transformation. The affine transformation is used to remove noise from each segment. Mean shift segmentation creates more than one segment for the same object, resulting in a non-smooth disparity map. Region merging is applied to obtain a refined, smooth disparity map. Finally, multilateral filtering is applied to the estimated disparity map to preserve information and to smooth the map. The proposed algorithm generates better results than the classic census transform. It addresses standard problems such as occlusions, repetitive patterns, textureless regions, perspective distortion, specular reflection and noise. Experiments performed on the Middlebury stereo test bed demonstrate that the proposed algorithm achieves high accuracy, efficiency and robustness.

Stereo vision is a fundamental problem in computer vision. An extensive analysis of stereo matching algorithms can be found in [ P_{l} in the left image and P_{r} in the right image match if they result from the projection of the same point P in the scene; for example, P_{l} and P_{r} will have similar intensity or color. The difference in the location of two corresponding points in their respective images is called disparity. The appearance of corresponding points varies between stereo images because of the different perspective projections. For a point in one image, there are many possible corresponding points in the second image, especially in regions with repetitive texture or no texture. This is called the ambiguity problem. Disparity depends on both the position of the point in the scene and the position, orientation and physical characteristics of the stereo cameras.

Matching techniques are usually classified by the type of matching primitives used [

In feature-based approaches [

In transform-based approaches, transformation of the pixel values in the stereo images is carried out before matching. The transformed images may then be matched using area-based metrics. Non-parametric transforms including the rank and census transforms [

There are various issues [

Zabih and Woodfill [

The census transform may create noise in uniform regions while maintaining good results in regions with varying disparity. In uniform regions most of the pixels belong to the same object and their intensities are nearly identical, so it is difficult to determine the best disparity value from the magnitude relationship alone. In regions with varying disparity, on the other hand, different objects with their respective disparities are all included in the census transform window, which provides a large amount of information about the centre pixel through magnitude differences. This clear difference in the performance of the census transform in textured and textureless regions is the motivation for our proposed approach. As adjacent pixels with similar colors have similar or continuous disparity, image segmentation is utilized to simplify the stereo problem [

The rest of the paper is organized as follows. Section 2 explains related work in detail. Section 3 presents the proposed algorithm. Section 4 presents and discusses the experimental results and demonstrates the performance of our algorithm. Section 5 draws conclusions and outlines future work. Finally, Section 6 lists the bibliography.

A large number of algorithms for disparity map generation are pixel based, that is, the disparity is calculated pixel by pixel. Pixel-based methods do not give good results on textureless surfaces, and noise is typically present in the disparity calculated for such surfaces. Compared with the large number of research papers on pixel-based disparity map generation algorithms, there are not many on region-based disparity estimation algorithms [

The study of different similarity measures is performed in [

While most of the pixel based methods are susceptible to variation in camera gain or bias, non-parametric methods such as rank and census transforms [

Census [

In the census transform, if a few pixels in a local neighborhood have an intensity distribution very different from that of the majority, only the comparisons involving those few pixels are affected. The size of the variable that stores the census value increases with the dimensions of the window. A 3 × 3 census window produces 8 comparison bits, which fit in a variable of 2^{3} or 8 bits. A 5 × 5 window produces 24 comparison bits, which require a variable of 2^{5} or 32 bits.
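The bit-string construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `census_transform`, the row-major neighbour ordering and the convention that a bit is 1 when the neighbour is darker than the centre are all assumptions made for illustration.

```python
def census_transform(image, window=3):
    """Census-transform a grayscale image given as a list of rows.

    Returns (codes, n_bits): codes[y][x] is an integer whose bits encode
    whether each neighbour is darker than the centre pixel (assumed
    convention), and n_bits = window * window - 1 is the bit-string length.
    Border pixels without a full window keep the code 0.
    """
    r = window // 2
    h, w = len(image), len(image[0])
    n_bits = window * window - 1
    codes = [[0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            centre = image[y][x]
            code = 0
            # Visit neighbours in row-major order, skipping the centre.
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    if dy == 0 and dx == 0:
                        continue
                    bit = 1 if image[y + dy][x + dx] < centre else 0
                    code = (code << 1) | bit
            codes[y][x] = code
    return codes, n_bits


patch = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
codes, n_bits = census_transform(patch)

# Changing gain and bias (here 2 * v + 7) preserves the intensity ordering,
# so the census code of the centre pixel is unchanged.
brighter = [[2 * v + 7 for v in row] for row in patch]
codes_brighter, _ = census_transform(brighter)
```

Because only the ordering of intensities enters the code, `codes` and `codes_brighter` agree, which is the invariance to gain and bias that motivates the transform.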

Thus, using the census transform, every pixel within an image is transformed into a sequence of bits representing the intensity relations between the centre pixel and its neighboring pixels. The census transform is invariant to changes in gain and bias. As shown in

In [

of the matching quality at textureless regions and occlusions. Census based correlation method is used to calculate the local cost. The confidence of a match is calculated and by computing a disparity plane for the corresponding segment, non-confident or non-textured pixels are estimated. Modified Semi-Global Matching (SGM) step with sub pixel accuracy is utilized to enhance the quality of the local optimized matches. Instead of whole image, horizontal stripes of the image are used for disparity optimization.

Many well-performing stereo matching algorithms are complex and computationally expensive. In [

Segment-based methods [

Small segments may be inefficient for estimating surfaces such as slanted planes, while segmentation errors in large segments can affect the efficiency of disparity cost estimation. Similar colors in an image do not always represent similar disparity values. For example, the projected image region of an extremely slanted Lambertian plane with uniform texture tends to be categorized as one image segment and assigned a similar disparity, although its true disparity varies across the segment. An image can be segmented by any of a large number of available segmentation methods, and further processing is then carried out on the segmented image. Some segmentation-based stereo matching algorithms do not take the quality of the segments into consideration, which results in incorrect disparity computation.

Segment-based stereo matching algorithms usually consist of four successive steps. The first step is to segment the reference image using a suitable segmentation technique; the second step is to generate an initial disparity map using a local matching technique; in the third step, a plane fitting method is utilized to obtain disparity planes; lastly, an optimal disparity map is estimated using an optimization technique such as BP or graph cut.

A hybrid disparity map generation method which combines the pixel-based and region-based approaches is proposed in [

Mean shift segmentation technique [

Color mean-shift segmentation on the reference image is carried out and thereafter local matching based on windows is utilized in [

A novel stereo matching algorithm is presented in [

A novel census and segmentation-based disparity estimation algorithm using region merging is proposed, which produces a quality disparity map from an input stereo image pair. The census transform produces quality results in depth-discontinuous regions but may generate noise in textureless regions. A region matching technique is used to solve this issue. Our algorithm handles occluded regions and keeps edges sharp and clear while preserving the smoothness of surfaces. These problems cannot be solved by census-based or segmentation-based techniques separately. The proposed algorithm produces better results than the classic census transform.

A rectified stereo image pair is given as input to the proposed algorithm. In rectified images the pixel rows are aligned parallel to the baseline, which makes matching efficient. Rectified images satisfy the epipolar constraint, which restricts the correspondence search to the corresponding row. Bilateral filter [

Stereo image pairs are generally acquired by different cameras, sometimes at different times. Typically the brightness is inconsistent in corresponding areas of a stereo image pair. This increases the difficulty for stereo matching techniques that assume brightness consistency between the two images. The census transform uses the relative intensities of the input images, which makes it robust to differences in absolute intensity and to noise. The census transform is applied to both filtered images for disparity estimation. It can be divided into two steps: a transform step and a correlation step. In the transform step, a bit string is calculated that summarizes the local texture around the current pixel in the left and right windows. In the correlation step, two bit strings are compared using the Hamming distance, i.e. the count of differing bits. Finally, the disparity is selected as that of the window pair with the minimum Hamming distance. The details of both steps are given below:

The census transform is realized with a comparison function ξ (Equation (1)) which converts the intensity values into 1 or 0. This function compares the intensity value of the centre pixel

where,

Matching is the next step after the census transform. The cost of each possible match has to be calculated for each pixel. The Hamming distance between census-transformed pixels is computed by performing an XOR operation between the two binary strings and counting the number of set bits in the output string, which gives the matching value for each pixel. The costs are computed using Equation (2) and are stored in a three-dimensional data structure, the disparity space image (DSI) [

where the left image is the reference image and the right image shifts horizontally from right to left.

The hamming distance is minimized after applying the census transform to calculate the matching value. For a pixel

where ∑ is the hamming distance between two bit strings

Thus, the disparity map between the left and right images is computed using Equations (2) and (3). The output disparity has an integer value, but the true disparity generally lies somewhere between two pixels. For this reason the minimum-cost disparity and both adjacent disparities are taken into consideration and sub-pixel refinement is applied. The sub-pixel refinement adds accuracy to the output disparity map.
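The correlation step and the sub-pixel refinement can be sketched as follows for a single rectified row of census codes. This is an illustrative sketch under stated assumptions, not the authors' code: the names `hamming`, `subpixel` and `wta_disparity_row` are hypothetical, the left image is taken as reference with the candidate match at `x - d` in the right image, and the refinement is the commonly used parabola fit through the minimum cost and its two neighbours, which the text does not specify.

```python
def hamming(a, b):
    """Hamming distance: XOR the two bit strings and count the set bits."""
    return bin(a ^ b).count("1")


def subpixel(costs, d):
    """Refine integer disparity d by fitting a parabola through the costs
    at d - 1, d and d + 1 (an assumed, commonly used refinement)."""
    if d == 0 or d == len(costs) - 1:
        return float(d)  # no neighbour on one side
    c0, c1, c2 = costs[d - 1], costs[d], costs[d + 1]
    denom = c0 - 2 * c1 + c2
    if denom == 0:
        return float(d)  # flat cost curve, keep the integer value
    return d + 0.5 * (c0 - c2) / denom


def wta_disparity_row(left_codes, right_codes, max_disp):
    """Winner-takes-all disparity for one row of census codes.

    Returns a list of (integer_disparity, subpixel_disparity) per pixel.
    """
    result = []
    for x in range(len(left_codes)):
        costs = []
        for d in range(max_disp + 1):
            if x - d < 0:
                break  # candidate falls outside the right image
            costs.append(hamming(left_codes[x], right_codes[x - d]))
        best = min(range(len(costs)), key=costs.__getitem__)
        result.append((best, subpixel(costs, best)))
    return result


# Toy row: the right codes are the left codes shifted by one pixel.
left_row = [0b0000, 0b0001, 0b0011, 0b0111, 0b1111]
right_row = [0b0001, 0b0011, 0b0111, 0b1111, 0b0000]
row_result = wta_disparity_row(left_row, right_row, max_disp=2)
```

In a full implementation the per-pixel costs would typically be aggregated over a window before the minimum is taken; the sketch omits aggregation to keep the two ideas (Hamming cost and sub-pixel interpolation) visible.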

A median filter is applied to the resulting disparity map to remove outliers. This filter is a popular method for removing salt-and-pepper noise from images. It also removes the noise occasionally generated by sub-pixel refinement. Filtering the disparity map can increase the accuracy of the output. In this way, the coarse disparity map is generated.
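A minimal sketch of the median filtering step is given below. The window radius and the border handling (the window is simply clipped at the image boundary) are assumptions; the text does not state which settings were used.

```python
from statistics import median_low


def median_filter(disparity, radius=1):
    """Replace each disparity by the median of its (2*radius+1)^2 window.

    The window is clipped at the image border. median_low is used so that
    the output is always one of the input disparities.
    """
    h, w = len(disparity), len(disparity[0])
    out = [row[:] for row in disparity]
    for y in range(h):
        for x in range(w):
            window = [
                disparity[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = median_low(window)
    return out


# A single outlier (99) in an otherwise constant disparity patch is removed.
noisy = [[5, 5, 5],
         [5, 99, 5],
         [5, 5, 5]]
smoothed = median_filter(noisy)
```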

The next step is to estimate the region-based disparity map. The chance of making an incorrect decision for an area is greatly reduced, because an area contains more information than individual pixels. The precision of disparity map generation depends on how well the color segmentation step segments the image. The color segmentation technique rests on two hypotheses: a) within segmented areas the disparity value changes smoothly; b) depth discontinuities occur only on edges. First of all left image

Segmentation-based techniques make it possible to match large textureless regions very well, which is a considerable problem for area-based stereo matching techniques. The time complexity increases with the number of segments produced by the mean shift segmentation method. The disparity map is improved by removing noise in each area using an affine transformation, but non-smoothness exists between some neighboring areas because of over-segmentation. Region merging is applied to the segmented image to solve this problem and to improve the output. Region merging merges neighboring areas that fulfil a similarity condition. The resulting disparity maps are smooth within the segments and have disparity discontinuities on the edges.

The first step in region-based disparity estimation is to compute the disparity for the areas extracted in the preceding segmentation step. The segmented image and the coarse disparity map are the inputs to this step, and each segment is assigned the median of the disparity values of its pixels. All the pixels of a region are assigned the same disparity value, as it is assumed that pixels within the same region have the same disparity. It is supposed that the coordinates

Thus, the disparity

Every pixel within the region gives one equation of the form of Equation (5). If the number of pixels within a region is N, we have N such equations for that region. Usually the number of pixels within a region is larger than the number of affine parameters (e.g., three for a 1-D affine transform). Therefore, the calculated
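Since a region with N pixels yields N equations for only three affine parameters, the system is overdetermined and can be solved in the least-squares sense. The sketch below assumes the affine disparity model d = a·x + b·y + c, which is a common choice but is not confirmed by the surviving text; the function names are hypothetical, and the normal equations are solved with Cramer's rule only to keep the example free of dependencies.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_disparity_plane(samples):
    """Least-squares fit of d = a*x + b*y + c (assumed model).

    samples: iterable of (x, y, d) for the pixels of one region.
    Returns (a, b, c). Raises ValueError if the pixels are degenerate,
    for example all on one line.
    """
    sxx = sxy = syy = sx = sy = sxd = syd = sd = 0.0
    n = 0
    for x, y, d in samples:
        sxx += x * x
        sxy += x * y
        syy += y * y
        sx += x
        sy += y
        sxd += x * d
        syd += y * d
        sd += d
        n += 1
    # Normal equations of the least-squares problem.
    normal = [[sxx, sxy, sx],
              [sxy, syy, sy],
              [sx, sy, float(n)]]
    rhs = [sxd, syd, sd]
    det = det3(normal)
    if abs(det) < 1e-12:
        raise ValueError("region is too small or degenerate for a plane fit")
    params = []
    for col in range(3):
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = rhs[r]  # Cramer's rule: replace one column by rhs
        params.append(det3(m) / det)
    return tuple(params)


# Pixels of a hypothetical 4 x 4 region lying on the plane d = 0.5x - 0.25y + 3.
region = [(x, y, 0.5 * x - 0.25 * y + 3.0) for x in range(4) for y in range(4)]
a, b, c = fit_disparity_plane(region)
```

Replacing each pixel's coarse disparity by the fitted value a·x + b·y + c is one way the per-segment noise removal described here could be realized.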

Region-based algorithms should be capable of dealing with segmentation errors. When the stereo image pair is segmented, errors may occur because of factors such as noise, bad imaging conditions, over-segmentation and limitations of the segmentation procedure used. The mean shift algorithm segments the images using color and intensity information and hence produces more than one segment for the same object. Some homogeneous color regions belong to the same planar or surface model but are separated in the partition labels because of over-segmentation; the disparity map can be refined by assigning one common disparity plane or surface to all of them. Grouping similar homogeneous color segments to extract their disparity layer helps regions that lack sufficient inliers, because of noise or occlusion, to obtain a good plane or surface estimate from the affine fit. In such situations, more accurate points for the disparity estimation can be obtained by merging regions with similar disparities, resulting in bigger regions.

Region merging is a method that groups two different segments into one segment based on two conditions: proximity and homogeneity. A criterion is needed to decide which neighboring regions are good candidates for merging. Two regions represented by the same set of model parameters can be successfully merged. The second criterion, homogeneity, is satisfied by a similarity measure that computes the similarity between regions and selects the optimal regions to be merged. We compute the intensity variation between all neighboring regions, and if the difference is less than a threshold value then the corresponding regions are merged. In this technique, choosing the threshold value is an overhead.
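The threshold-based merging rule can be sketched as below. The sketch is a simplification under stated assumptions: each region is summarized by a single mean intensity, the means are not recomputed after a merge, and the names `merge_regions`, `mean_intensity` and `neighbours` are hypothetical. A union-find structure keeps track of which regions have been grouped.

```python
def merge_regions(mean_intensity, neighbours, threshold):
    """Merge neighbouring regions whose mean intensities differ by less
    than threshold.

    mean_intensity: dict region_id -> mean intensity of that region
    neighbours:     iterable of (region_a, region_b) adjacent pairs
    Returns a dict region_id -> representative id of its merged group.
    """
    parent = {r: r for r in mean_intensity}

    def find(r):
        # Follow parent links to the group representative (path halving).
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in neighbours:
        if abs(mean_intensity[a] - mean_intensity[b]) < threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the smaller id as representative for determinism.
                parent[max(ra, rb)] = min(ra, rb)
    return {r: find(r) for r in mean_intensity}


# Regions 0 and 1 are similar, as are 2 and 3, but 1 and 2 are not.
means = {0: 100.0, 1: 104.0, 2: 180.0, 3: 183.0}
adjacent = [(0, 1), (1, 2), (2, 3)]
groups = merge_regions(means, adjacent, threshold=10.0)
```

The result depends directly on `threshold`, which reflects the remark above that choosing the threshold value is an overhead of this technique.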

The overview of the region merging [

To facilitate the region merging process, a matrix representation of region adjacency is created. The Region Adjacency Matrix (RAM) is the lower triangle of a square table in which rows and columns represent regions. If cell C_{ij} is marked true, regions i and j are neighbors; if it is marked false, the regions are assumed not to be neighbors.
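A lower-triangular adjacency table of this kind can be built from a label image as sketched below. The use of 4-connectivity (only horizontal and vertical neighbours) and the assumption that region labels are the integers 0 to n_regions - 1 are illustrative choices, not details taken from the text.

```python
def region_adjacency_matrix(labels, n_regions):
    """Build the lower triangle of the region adjacency table.

    labels: 2-D list of integer region ids in 0 .. n_regions - 1
    Returns ram, where ram[i][j] (with j < i) is True when regions i and j
    share a horizontal or vertical pixel border (4-connectivity, assumed).
    """
    # Row i holds only columns 0 .. i-1, i.e. the strict lower triangle.
    ram = [[False] * i for i in range(n_regions)]
    h, w = len(labels), len(labels[0])
    for y in range(h):
        for x in range(w):
            # Checking the right and lower neighbour covers every pixel pair once.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w:
                    a, b = labels[y][x], labels[ny][nx]
                    if a != b:
                        ram[max(a, b)][min(a, b)] = True
    return ram


segmentation = [[0, 0, 1],
                [0, 2, 1],
                [2, 2, 1]]
ram = region_adjacency_matrix(segmentation, n_regions=3)
```

Storing only the lower triangle halves the memory of a full symmetric table, since adjacency between regions i and j is the same as between j and i.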

Finally, multilateral filtering is applied to the resulting disparity map to preserve information and to smooth the map in occluded regions at object boundaries and in discontinuous and textureless areas, which yields the final disparity map.

In this section, we present and discuss the experimental results of our algorithm. The Middlebury dataset [

neighboring regions fulfilling the similarity condition. However, a few regions containing occlusions still give poor results. Multilateral filtering is applied to solve this problem, and the final disparity map is estimated.

A quantitative approach is required to assess the performance of a stereo matching algorithm by estimating the quality of the final disparity map. The quality of the estimated disparity map is determined with respect to the ground truth using the Root Mean Square Error (RMSE) as similarity measure. RMSE is computed in terms of disparity units between the resultant disparity map

where N is the total number of pixels.
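For reference, the RMSE between a computed disparity map and the ground truth can be evaluated as in the sketch below, which uses the standard definition (square root of the mean squared difference over all N pixels). The function and argument names are hypothetical, and both maps are assumed to have the same size and to be expressed in the same disparity units.

```python
import math


def rmse(computed, ground_truth):
    """Root mean square error between two disparity maps of equal size."""
    total = 0.0
    n = 0
    for row_c, row_t in zip(computed, ground_truth):
        for c, t in zip(row_c, row_t):
            total += (c - t) ** 2
            n += 1
    return math.sqrt(total / n)


estimate = [[1, 2], [3, 4]]
truth = [[1, 2], [3, 6]]
error = rmse(estimate, truth)  # one pixel is off by 2 over N = 4 pixels
```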

The performance of the proposed algorithm is summarized in

Sr. No | Stereo Image Pair | RMSE (SAD) | RMSE (Census Based Approach [ | RMSE (Segment Based Approach [ | RMSE (Our Proposed Approach) |
---|---|---|---|---|---|
1 | Tsukuba | 24.34 | 4.978 | 35.38 | 0.0019 |
2 | Cones | 25.32 | 3.235 | 39.42 | 0.0037 |
3 | Teddy | 20.39 | 7.519 | 38.98 | 0.0011 |
4 | Venus | 17.95 | 5.892 | 31.19 | 0.0048 |

From the

The test images consist of regions having different characteristics like occluded, disparity discontinuous and textureless portion. Our proposed algorithm gives excellent disparity map results in all cases. From the results shown in Figures 4.1-4.4 and

This paper presents a novel, robust, efficient and flexible stereo matching algorithm that combines census-based and region-based approaches. The algorithm operates on a rectified stereo image pair. It is shown that the segmentation-based algorithm works well with the census-based algorithm. The originality of our algorithm lies in the fact that it offers a robust technique for several long-standing problems in disparity map generation: keeping regions smooth while keeping edges clear and sharp, occluded regions, textureless regions, repetitive patterns, perspective distortion, specular reflection, noise and disparity-discontinuous regions. These issues cannot be solved by either approach independently. Census measures are appropriate for highly textured areas, although they are somewhat computationally complex. The census transform offers high resistance to noise because it is based on the relative ordering of local pixel intensity values. Segment-based methods, in turn, are popular for their excellent performance in dealing with textureless areas, edges and noise. The chance of selecting a wrong disparity for a segment is significantly reduced because segments contain much more information than individual pixels. It is shown that the segmentation-based algorithm makes it possible to match large textureless regions very well, which is a major issue for standard area-based stereo matching techniques. The time complexity of the algorithm increases with the number of segments produced by mean shift segmentation.

Disparity map results are improved using the affine transformation, as it removes noise in each region, but non-smoothness remains between some neighboring regions because of over-segmentation. To solve this problem we have applied region merging to the segmented image to refine the disparity map. Region merging merges neighboring areas that satisfy the similarity condition. The estimated disparity maps are smooth within the segments and have disparity discontinuities on the object boundaries. Finally, multilateral filtering is applied to the disparity map to preserve information and to smooth the map in occluded areas at edges and in discontinuous and textureless regions, which yields the final disparity map. The non-parametric census transform works well in image areas having the same colors, and commonly used area-based similarity measures such as SAD and SSD produce quality results for image areas with similar local structures. Real-time application of our algorithm is left as future work. The proposed algorithm could be implemented on an FPGA or GPU for hardware acceleration.

The authors declare that there is no conflict of interests regarding the publication of this paper.

The authors would like to convey gratitude to Daniel Scharstein and Richard Szeliski for providing stereo images with ground truth.

Viral H. Borisagar, Mukesh A. Zaveri (2015) Census and Segmentation-Based Disparity Estimation Algorithm Using Region Merging. Journal of Signal and Information Processing, 06, 191-202. doi: 10.4236/jsip.2015.63018