Fast Stereo Matching Fully Utilizing Super-Pixels

In this paper, we propose a depth image generation method based on stereo matching on a super-pixel (SP) basis. In the proposed method, block matching is performed only at the center of each SP, and the obtained disparity is applied to all pixels of the SP. Next, in order to improve the disparity, a new SP-based cost filter is introduced. This filter multiplies the matching cost of each surrounding SP by a weight based on reliability and similarity and sums the weighted costs of the neighbors. In addition, we propose two new error checking methods. The one-way check uses only a unidirectional disparity estimation with a small amount of calculation to detect errors. Cross recovery uses cross checking and error recovery to repair missing parts of objects, which are problematic in SP-based matching. As a result of the experiment, the execution time of the proposed method using the one-way check was about 1/100 of the full search, and the accuracy was almost equivalent. The accuracy using cross recovery exceeded the full search, and the execution time was about 1/60. Speeding up while maintaining accuracy increases the application range of depth images.


Introduction
The depth image is an image in which the distance to the object in three-dimensional space is projected as shading or color on an imaging plane set at the observer's viewpoint. Depth images are used for driving safety support, automatic driving, autonomous robots, user interfaces, monitoring, and so on.
Distance measurement using millimeter waves or laser light is accurate, but it depends on the physical properties of the object and is susceptible to disturbance.
On the other hand, in the method using a stereo camera, the disparity is obtained from two images captured by the left and right cameras. Then, based on the principle of triangulation, the distance from the camera to the object is obtained from the disparity. The stereo method requires matching between left and right pixels, and the amount of calculation is huge. For this reason, achieving both accuracy and processing speed is a challenge of stereo matching.
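Once a disparity is estimated, the triangulation step is direct; a minimal sketch follows (the focal length and baseline values in the usage example are illustrative, not from the paper):

```python
def disparity_to_depth(d, focal_px, baseline_m):
    """Depth Z = f * B / d: f is the focal length in pixels,
    B is the baseline between the two cameras in meters,
    and d is the disparity in pixels."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / d

# e.g. a 100-pixel disparity with f = 500 px and B = 0.1 m
print(disparity_to_depth(100, 500.0, 0.1))  # → 0.5 (meters)
```

Note that depth is inversely proportional to disparity, so a one-pixel disparity error matters much more for distant objects than for near ones.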
So we introduce super-pixels (SPs) for depth image generation. An SP is a small area formed by collecting adjacent similar pixels. SP segmentation over-divides the inside of objects but extracts object boundaries well.
Processing on an SP basis requires far less calculation than processing on a pixel basis. By using SPs for depth image generation, depth images with clear object boundaries can be created at high speed.
This paper newly proposes a fast method of depth image generation using SPs.
To reduce the amount of computation, we use an existing method in which block matching is performed only at the center of the SP and the obtained disparity is applied to all pixels in the SP [1]. Then, filtering of the matching cost on an SP basis is newly proposed to improve the disparities. Furthermore, in contrast to the usual cross check, which requires disparity estimations in both left and right directions, we newly introduce the one-way check, which detects errors using only a unidirectional disparity estimation. We also propose a new method called cross recovery that restores missing parts of objects, which are problematic in SP-based matching. As a result of the experiment, the execution time of the proposed method with the one-way check was about 1/100 of the full search, and the accuracy was equivalent. The accuracy using the cross recovery exceeded the full search, and the execution time was about 1/60. Compared with the state of the art using pixel-based cost filtering, the speed is 15 times or more, though the accuracy decreases a little.
In contrast to conventional pixel-based methods, this work fully utilizes SPs to speed up disparity estimation while maintaining accuracy. For this purpose, we newly propose the SP-based cost filter, the one-way check, and the cross recovery, which are the contributions of this work. In depth image generation, speedup while keeping accuracy is important for real-time processing, resolution enhancement, and search range expansion. This will extend the application scope of the depth image. This paper extends our previous work [2] with the cross recovery proposal and a comparison with the state of the art.
The structure of this paper is as follows. Section 2 describes related work. Section 3 details the proposed method. Section 4 explains the experiment setup and results. Section 5 concludes this paper.

Related Work
Many studies on depth image generation by stereo matching have been conducted, and a survey of them has been performed [3]. This survey also provides the benchmark data and evaluation programs used for comparing various algorithms, and they are released to the public.
Stereo matching methods are roughly divided into local methods and global methods. A local method estimates the disparity using only pixels around the target pixel. Block matching, the most basic method, sets a block-shaped support region around the pixel of interest. The block most similar to the support block is searched for in the opposite image, and the distance from the center of the similar block to the original position is taken as the disparity.
At this time, if the support area contains an object different from the one to which the target pixel belongs, erroneous estimation easily occurs.
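Block matching as described above can be sketched as follows; this is a minimal SAD (sum of absolute differences) version assuming grayscale NumPy images, not the paper's exact implementation:

```python
import numpy as np

def block_match(left, right, y, x, block=3, max_disp=16):
    """Return the disparity of left-image pixel (x, y): slide a
    block along the same row of the right image and keep the
    offset with the smallest sum of absolute differences."""
    h = block // 2
    ref = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.int32)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        if x - d - h < 0:  # candidate block would leave the image
            break
        cand = right[y - h:y + h + 1,
                     x - d - h:x - d + h + 1].astype(np.int32)
        cost = np.abs(ref - cand).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

If the support block straddles two objects at different depths, the minimum-cost offset can be wrong, which is exactly the failure mode noted above.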
A method that collects only similar pixels surrounding the target pixel in a cross shape to construct the support area and efficiently aggregates the matching costs between pixels in the area has been proposed [4]. A method that multiplies the matching cost of each surrounding pixel by a weight corresponding to its similarity and distance to the target pixel, aggregates them over the neighborhood, and selects the disparity with the minimum cost has also been proposed [5] [6]. These methods are accurate and fast, but they process on a pixel basis. The support area is set for every pixel and is not usually reused for other purposes.
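The weighting idea of [5] [6] can be sketched as follows; the exponential form and the gamma parameters are typical choices for illustration, not necessarily the exact ones used in those papers:

```python
import math

def support_weight(color_diff, spatial_dist, gamma_c=10.0, gamma_d=14.0):
    """Weight of a neighboring pixel: large when the neighbor is
    similar in color and spatially close to the target pixel
    (gamma values are illustrative tuning parameters)."""
    return math.exp(-color_diff / gamma_c - spatial_dist / gamma_d)

def aggregate_cost(costs, weights):
    """Weighted average of the neighbors' matching costs; the
    disparity minimizing this aggregated cost is selected."""
    return sum(w * c for w, c in zip(weights, costs)) / sum(weights)
```

The proposed method reuses this weighted-aggregation idea, but on an SP basis rather than a pixel basis.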
Local methods often adopt simple error detection by cross checking. Cross checking detects errors such as occlusion by taking the difference between the disparities of corresponding left and right pixels. Error pixels are filled with the smaller of the left and right disparities.
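For a single scanline, cross checking can be sketched as follows (a simplified 1-D version with hypothetical disparity arrays):

```python
def cross_check(disp_left, disp_right, thresh=1):
    """Flag left-image pixels whose left-to-right and right-to-left
    disparities disagree by more than `thresh`; such pixels are
    typically occlusions or mismatches."""
    width = len(disp_left)
    valid = []
    for x in range(width):
        xr = x - disp_left[x]  # corresponding right-image pixel
        ok = 0 <= xr < width and abs(disp_left[x] - disp_right[xr]) <= thresh
        valid.append(ok)
    return valid
```

Note that this requires disparity estimation in both directions, which is the cost the one-way check of the proposed method avoids.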
The global method performs disparity estimation with global optimization over the entire image. Dynamic programming is typical for the optimization. In this method, disparities are optimized under the constraint that the arrangement order of the corresponding pixels of the left and right images does not intersect. A method that increases the speed by limiting the constraint to 8 directions has been proposed [7]. Besides this, a method based on graph cuts [8] and a method using belief propagation [9] are known.
Global methods often adopt error detection with global optimization. After optimization by belief propagation, an error detection method using constraints by warping and cross check has been proposed [10]. After global stereo matching, an error detection method using ordering, uniqueness, and color similarity constraints has been proposed [11]. A method that alternately performs disparity estimation and occlusion detection by belief propagation using a visibility constraint has been proposed [12]. Compared to local methods, global methods are disadvantageous in terms of speed.
Methods using SPs for depth image generation have been proposed. To reduce computational complexity, a method that performs block matching only at the center of each SP rather than over the whole image has been proposed [1]. This method uses SPs only for pixel sampling to estimate the disparity; it uses them neither to create support areas nor to improve disparities.
A method for improving the disparity model of an SP with a Markov random field consisting of the connections of adjacent SPs has been proposed [13].
This method uses the semi-global method to obtain the pixel disparities, and calculates an initial model from all the disparities in an SP. A method in which the disparities of all the pixels in the SP are obtained by the semi-global method, the median value is taken as the initial value of the SP disparity, and the SP disparity is improved by filtering using the surrounding SP disparities has been proposed [14]. In order to clarify the object boundary, a method that adds SP-basis filtering after the conventional pixel-wise cost filtering has been proposed [15], but the execution time increases. An SP-based method that explicitly uses graph-based energy minimization to deal with occlusion has been proposed, but it is slow, as it is a global method [16]. These methods use SPs only for disparity improvement. In contrast, the proposed method fully utilizes SPs for all of pixel sampling, support area creation, and disparity improvement.

Proposed Method
This section explains the details of the proposed method. The procedure of the method with the one-way check is as follows: 1) SP segmentation; 2) Center search; 3) One-way check; 4) Reliability calculation; 5) Smoothing with cost filter; 6) One-way check on SP basis; 7) Gap filling. The procedure of the method with the cross check and recovery is as follows: 1) SP segmentation; 2) Center search; 3) Cross check; 4) Reliability calculation; 5) Smoothing with cost filter; 6) Cross check and recovery; 7) Gap filling. The process for the test image cones is illustrated in Figure 1. The SP segmentation divides the image into SPs. For this purpose, we adopt Simplified SEEDS (SS) [17]. SS initially divides the image into a lattice shape and repeats boundary updates using color similarity. SS has high speed and high segmentation accuracy. A segmentation result of cones by SS is shown in Figure 1(a).
For the center search, block matching is performed only at the center of the SP [1]. The disparity of the center pixel is used as the disparity of the entire super-pixel. The formula for the center search is as follows:

d_R = argmin_d Σ_{(u,v) ∈ R_S} | I_L(x + u, y + v) − I_R(x + u − d, y + v) |

Here, d is a disparity, R_S is a support area of the SP R, u, v are horizontal and vertical offsets within the support area, x, y are the center coordinates of R, and I represents a pixel intensity, with subscripts L and R denoting the left and right images.

As can be seen from this example, the SP of the background, partially hidden by the foreground SP, is often estimated to have the same disparity as the foreground SP. In this situation, it is impossible to detect all erroneous estimations with step 1 alone. In order to find erroneous disparity estimations of all the pixels in the background SP, re-matching the loser, warped with the disparity of the winner in step 2, is necessary.
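The center search can be sketched as follows; grayscale NumPy images, a precomputed list of SP center coordinates, and a SAD matching cost are assumed (names are illustrative):

```python
import numpy as np

def center_search(left, right, centers, block=7, max_disp=16):
    """Block-match only at each SP center (y, x); the resulting
    disparity is later applied to every pixel of that SP."""
    h = block // 2
    disparities = []
    for (y, x) in centers:
        ref = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.int32)
        best_d, best_cost = 0, np.inf
        for d in range(max_disp + 1):
            if x - d - h < 0:  # candidate block would leave the image
                break
            cand = right[y - h:y + h + 1,
                         x - d - h:x - d + h + 1].astype(np.int32)
            cost = np.abs(ref - cand).sum()  # SAD over the support block
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities
```

Because matching runs once per SP instead of once per pixel, the number of searches drops by roughly the average SP size, which is the main source of the speedup.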
In this check, the pixel-wise error is used for the correspondence judgment of the left and right pixels, and the support area is not used. For this reason, the judgment is error-prone. Therefore, aggregation is performed on an SP basis, and if the ratio of the number of erroneously estimated pixels to all pixels in the SP is larger than the threshold TH_check, the SP is determined to be erroneously estimated. As shown in Figure 1(d), erroneous estimations due to occlusion and texture shortage are detected as a result of the one-way check.
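The SP-basis aggregation can be sketched as follows; the flat per-pixel arrays and the default threshold value are illustrative:

```python
from collections import defaultdict

def sp_error_check(error_flags, sp_labels, th_check=0.5):
    """Decide per super-pixel whether its disparity is erroneous:
    an SP fails when the fraction of its pixels flagged as wrong
    exceeds th_check (the threshold TH_check in the text)."""
    total = defaultdict(int)  # pixels per SP
    bad = defaultdict(int)    # flagged pixels per SP
    for flag, sp in zip(error_flags, sp_labels):
        total[sp] += 1
        bad[sp] += flag
    return {sp: bad[sp] / total[sp] > th_check for sp in total}
```

Aggregating over the whole SP makes the decision robust to isolated pixel-wise misjudgments.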
In SP-based estimation, parts of objects are often missing due to erroneous estimation. Cross recovery is the way to repair these missing parts. The algorithm of the cross recovery is shown in Figure 3(a) and explained using the example of Figure 3(b).

Experiment Setup
The program of the state-of-the-art method used for comparison was created by the authors of the paper [6]. We used Microsoft Visual Studio 2013 for the software development. The OS of the PC that executed the program was Windows 10, the CPU was an Intel Core i5 with an operating frequency of 3 GHz, and the memory capacity was 4 GB.

Results
The result for cones is shown in Figure 4. Because the merged segmentation map is used to create the support area, only the pixels in the same object are matched, so the object boundary of FNC15 is clear. With the small block size of 7 × 7, disparities in the same object vary due to matching failures, and a mottled pattern is observed in CNC07. CSC07, CSO07, and CSR07 have smooth disparity changes within objects thanks to the smoothing. There are few erroneous estimations in texture-less portions such as the tree lattice at the upper right.
Compared with CSC07 and CSO07, CSR07 shows that missing parts of objects, such as the cones in front, are corrected by the cross recovery.
The result for teddy is shown in Figure 5. The roof of the house is texture-less, so CNC07 has many matching failures. CSC07, CSO07, and CSR07 use the same center search result as CNC07, but there are no big mis-predictions thanks to the smoothing. Furthermore, in CSO07, the occlusion part between the roof and the back wall is correctly detected by the one-way check. Even without using the cross check, the roof boundary is clear and there is no blur due to erroneous estimation on the boundary.
The accuracy and execution time of each algorithm for each image are shown in Table 1. The image resolution for each image is listed in the "res." row. "#SP L, R" means the number of SPs in the left image and the right image. Compared to FNC15, the accuracy degradation of CNC07 is great, but that of CSO07 is little. The processing time of CSO07 is about 1/100 of that of FNC15. In the proposed method with the one-way check, the center search can be done in only one direction thanks to the introduction of the one-way check, and the block size can be as small as 7 × 7 by introducing the smoothing. Furthermore, since the support area of the right image has no effect in the case of a 7 × 7 block, the SP segmentation of the right image is unnecessary. Due to these factors, the proposed method with the one-way check is fast.
The accuracy of CSR07 in many cases exceeded the accuracy of FNC15, and the execution time was about 1/60. The accuracy of CSR07 exceeded the accuracy of CSC07, showing the effect of the recovery. The accuracy of CSR07 was lower than that of CVF, but the speed was more than 15 times higher. The proposed method is as accurate as the full search and orders of magnitude faster than the state of the art. Such a property contributes to real-time depth image generation, resolution enhancement, and search range expansion, and expands the application scope of the depth image.
The execution time of each process for the image cones is shown in Table 2. In the one-way check, error calculation is performed for the several left pixels matching the same right pixel, and re-matches are performed for the losers.
Therefore, the execution time of the one-way check is longer than that of the cross check, which only confirms the agreement of the left and right disparities. However, since the error calculation is a comparison between pixels, its execution time is much shorter than that of the center search, which performs block matching. The execution time of two one-way checks is less than half of that of one center search. As a result, the execution time of CSO07 is about 60% of that of CNC07.

Conclusion
In this paper, we proposed a fast method of depth image generation using SPs.
The proposed method reduces the computational complexity without severe accuracy degradation by fully utilizing SPs for all of pixel sampling, creation of the support area, and disparity improvement. We introduced the one-way check, which performs error detection using the unidirectional disparity obtained by a single center search. We also introduced a cost filter on an SP basis. Disparities of texture-less parts are improved by the cost filter, which aggregates the weighted matching costs of the surrounding SPs. As a result of the experiment, the execution time of the proposed method was less than 1/100 of the full search and the accuracy degradation was slight. We also proposed a new method called cross recovery that restores missing parts of objects, which are problematic in SP-based matching. The accuracy of the method using cross recovery exceeded the full search, and the execution time was about 1/60. Speeding up while keeping accuracy extends the application range of the depth image. Realizing real-time processing of high-resolution video by developing a dedicated processor based on the proposed algorithm is future work.

Figure 2. One-way check algorithm and its example.

Example of Figure 3(b): The disparity of the R region is +1, that of the O region is +2, and that of the G region is +3. Notice that pixels C and F are occluded. As a result of the center search from left to right, it is estimated that the disparity of the R region is +1, that of the O region is +3, and that of the G region is +3. In the case of right to left, the disparity of the R region is +1, that of the O region is +2, and that of the G region is +3. The top table of Figure 3(c) shows the correspondence of the left-to-right disparity (LRd), right pixel (LRp), and error (LRe) with respect to the left pixel (Lp). The second table shows the correspondence of the right-to-left disparity (RLd), left pixel (RLp), and error (RLe) with respect to the right pixel (Rp). The third table shows the result of the usual cross check (CC). For example, although the disparity of C corresponding to D is +1, since the disparity of d corresponding to D is +2, the check fails. Since the pixels C to F become undefined due to the check failure, the disparity of the O region is incorrectly estimated as +1 after the filling of disparities. The bottom table is the result of the cross recovery. Consider pixel C first. There is no right pixel to be warped to the left pixel C, and the RLd of C is unknown. When the RLd is unknown (condition 1), the disparity Ld is determined to be unknown.

Table 1. Accuracy and execution time of each algorithm for each image.

Table 2. Execution time of each process for cones.