Regular Stereo Matching Improvement System Based on Kinect-supporting Mechanism

In this paper, we build an experimental stereoscopic video model, referred to as the Kinect-supporting improved stereo matching scheme. Because the depth maps offered by the Kinect IR-projector are low-resolution, noisy, distance-limited, unstable, and material-sensitive, appropriate de-noising, stabilization, and filtering are first performed to retrieve useful IR-projector depths. Disparities are then linearly computed from the refined IR-projector depths to serve as reference disparities. By fully exploiting these references, the proposed mechanism greatly improves both the speed and the accuracy of stereo matching, which supports better generation of extra virtual views and points to the possibility of an affordable stereoscopic camera with an embedded IR-projector.


Introduction
Multi-view video systems are an emerging subject for naked-eye (glasses-free) stereo vision. A camera array is a promising way to produce multi-view content [1], but in general a camera array used for multi-view video capture is not suited to dynamic setups. In depth-image-based rendering (DIBR), a pre-processing filter is applied to the depth map so that fewer holes appear when another view is synthesized [2]. If the filter is designed only for hole reduction [2], side effects such as bending distortion [3] are hard to avoid. Moreover, with one view and a single-channel depth map, DIBR can hardly provide adequate view-angle extension. With little calibration effort, a binocular 3D camera can directly capture stereoscopic two-view videos. However, when the depth decision relies entirely on stereo matching of the paired color images, acquiring suitable depths in sparse-texture regions is quite difficult. An IR sensor that emits infrared (IR) light to acquire depths stably is usually expensive [4] and cannot sense materials that are undetectable by IR. Therefore, a more valuable approach is to turn a common low-cost IR sensor into an efficient auxiliary apparatus for generating high-quality depths. The Kinect IR-projector is not usually considered useful for depth measurement because of its low spatial and depth resolution, distance-limited sensing, unstable depth capture, and difficulty in detecting specular, transparent, and reflective objects. However, Kinect has become a very popular, low-priced, off-the-shelf depth detector, and stereoscopic research based on Kinect has begun to emerge [5].
In [5], stereo matching is first performed between Kinect's IR image and RGB image to generate a depth map. This depth map and the internal depth map computed from the patterns captured by the Kinect IR-projector are then fused into a higher-quality depth map. Because Kinect captures the IR and RGB images alternately rather than synchronously, the mechanism in [5] appears efficient but is at most suited to monotonous, low-activity videos.
Theoretically, adding an IR sensor should make the stereo matching problem much easier. In practice, effectively integrating two heterogeneous images of very different quality, the Kinect IR-projector image and the 3D-camera image, is quite challenging. Our work therefore seeks a practical approach to developing an inexpensive IR-sensor-embedded 3D camera that facilitates multi-view production. Because high-resolution depth maps can readily be obtained by stereo matching on the two views captured by the 3D camera, the depths captured by the Kinect IR-projector play only a reference (consultation) role rather than that of an arbiter in the stereo matching operations.
Our strategy uses the IR-projector depth purely as a stereo matching indicator, so some IR-projector problems no longer need to be handled. And fortunately, the add-on charge-coupled device (CCD) camera in Ki-
The proposed system comprises a series of refinement processes on the IR-projector image, the registration from the main-view image to the IR-projector depth image, and the stereo matching improvement supported by the refined IR-projector image, as shown in Figure 1. The computation of the homographic transform matrices for the registration between the Kinect and 3D-camera images is the most complex part of the proposed system, but it is performed only once, at the initialization of depth generation for a 3D video sequence.

Cross Geometrical Image Relation Identification
The most troublesome barrier in registration is the large content difference between the 3D-camera color image and the IR-projector depth image. Since the Kinect add-on color image can be easily calibrated to the synchronously captured IR-projector depth image, it serves as a good intermediate interface. Specifically, the registration mapping runs from the 3D-camera main-view image to the Kinect add-on color image, and then to the IR-projector depth image. For good mutual registration, the geometrical image relation must be identified in advance. This identification includes image localization (partition), by analyzing the distribution of feature points, and global translation registration, by selecting a representative registration feature point according to the precedence of center location and salient response strength. Generally speaking, the photographing planes of the 3D camera and the Kinect sensor can be made adequately parallel by manual adjustment, which constrains their perspective difference as much as possible. The remaining registration task is therefore to find the relative translation and rotation between the two frames they capture. The global center registration between the 3D-camera image and the Kinect color image is implemented as described below.
To simplify the registration of two images with a large resolution difference, the faster way is to reduce the resolution of the high-resolution image to match the low-resolution one. Therefore, in this study, the main-view color image of the 3D camera (abbreviated as MCI) undergoes large-scale down-sampling followed by small-scale linear interpolation to attain the same low horizontal and vertical resolutions as the Kinect RGB image. Before the image registration (homographic transform) matrix is computed, the relative global location difference between the resolution-reduced main-view color image (RMCI) and the Kinect color image (KCI) needs to be estimated. This location difference is then compensated by shifting one image back when the two images are registered. In this work, a center zone in RMCI is allocated to extract several speeded-up robust features (SURFs) whose response strengths exceed a given threshold; these serve as the candidate anchor nodes. The same number of SURFs is extracted in the same way from the simultaneously sampled KCI as its candidate anchor nodes. A double-checking process is then used to identify trustworthy reference points between the two images for the global translation needed at the beginning of registration. The process tests the anchor nodes one by one, from the one nearest the mask center to those around the mask borders, in the clockwise or counter-clockwise direction, in two phases as follows.

Undetectable Depth Assignment and Edge Preserving De-noising
To increase confidence in the use of the IR-projector depth image, the image, after undetectable depth assignment and edge-preserving de-noising, is further refined by three proposed processes. The first process detects single impulse noise in the IR-projector depth image by taking the difference between the strength (sensed depth value) of each pixel and the strengths of its 8 neighbors. If all 8 differences of a pixel are larger than a threshold, the pixel is regarded as single impulse noise. Its strength is then replaced by the mean of the 8 neighboring strengths to remove its impact on the subsequent processes; the processed image is called the impulse-noise dropped IR-projector depth image (IDIRI). The second process marks the so-called steady edges. After Sobel filtering is performed on IDIRI, each edge point is examined to see whether its filtered intensity is similar to that of any of its 8 neighbors. If it is, the point is denoted as a steady edge point. To enhance the edge-preserving effect of bilateral filtering in a simple way, the bilateral filtering procedure proposed herein skips the steady edge points. Except for the steady edge points, which are grouped as the set G_ssp, the remaining points of IDIRI are processed by a bilateral filter with an adaptive piece-wise mask.
The bilateral filter uses a window of size (2L+1)×(2L+1), and the filtered intensity at (x, y) is denoted by I_SIR(x, y). Several window sizes are applied: L grows with the quantized I_IR(x, y), and its maximum, denoted by L_max, is governed by the standard deviation of the quantized I_IR(x, y), both taken on the impulse-noise dropped IR depth image. More specifically, a point closer to the camera (with a larger depth value) is filtered with a bigger window, and a larger standard deviation of the quantized I_IR(x, y) leads to a smaller L_max.
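The first process described above (single impulse noise removal) can be illustrated with a minimal sketch in plain Python. The function name, the use of absolute differences, and the decision to leave border pixels untouched are assumptions made for illustration; the text does not specify them.

```python
def drop_single_impulse_noise(depth, threshold):
    """Return a copy of `depth` in which isolated impulse pixels are
    replaced by the mean of their 8 neighbors.

    Assumptions (not stated in the text): differences are absolute,
    and border pixels, which lack 8 neighbors, are left unchanged.
    """
    height, width = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            center = depth[y][x]
            neighbors = [
                depth[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dx, dy) != (0, 0)
            ]
            # A pixel is impulse noise only if it differs strongly
            # from every one of its 8 neighbors.
            if all(abs(center - n) > threshold for n in neighbors):
                out[y][x] = sum(neighbors) / 8.0
    return out


depth = [
    [10, 10, 10, 10],
    [10, 90, 10, 10],
    [10, 10, 10, 12],
    [10, 10, 10, 10],
]
cleaned = drop_single_impulse_noise(depth, threshold=30)
```

Detection reads from the original image while replacement writes to the copy, so a corrected pixel does not influence the test of its neighbors.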

IR-Projector Supported Stereo Matching Improvement
The candidate anchor nodes of the paired RMCI and KCI are used as inputs to Random Sample Consensus (RANSAC) processing to obtain the registration matrix between the pixels of KCI and those of MCI.
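Once registration matrices are available, using them amounts to a projective mapping in homogeneous coordinates. The sketch below does not perform RANSAC; it only shows how two 3×3 matrices could be chained for the route from the main-view image through the Kinect color image to the IR-projector depth image. The matrix values and all names are hypothetical.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def apply_homography(h, x, y):
    """Map (x, y) through the 3x3 matrix h using homogeneous coordinates."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    s = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / s, v / s


# Hypothetical matrices: scale and shift from the main view to the
# Kinect color image, then a pure shift to the IR depth image.
H_MCI_TO_KCI = [[0.5, 0.0, 4.0], [0.0, 0.5, -2.0], [0.0, 0.0, 1.0]]
H_KCI_TO_IR = [[1.0, 0.0, 3.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]]

# Composing once gives a single matrix that can be reused per pixel.
H_MCI_TO_IR = matmul3(H_KCI_TO_IR, H_MCI_TO_KCI)

x_ir, y_ir = apply_homography(H_MCI_TO_IR, 100, 60)
```

Because the matrices are computed once at initialization, composing them ahead of time keeps the per-pixel cost to a single mapping.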

Adjustment of Referred Search Location and Search Region
The depth detected by the Kinect IR-projector can offer a good and fast search reference for two-view stereo matching. Further adjustment is necessary if two adjacent pixels have close chrominance and luminance but quite different search reference positions according to the IR-projector image. Such a case usually occurs near the border between two occluding objects. In this case, the suspect search reference location should be replaced by one of higher confidence, although in practice confidence is not easy to define or measure. For stereo matching from the left view to the right view, the original search reference position at (x, y), denoted by P_S_ref(x, y), is given by (5) in terms of the disparity of the pixel at (x, y) in the left-view image, where (x', y') is the position registered to (x, y) by the obtained registration matrix. The search range centered at P_S_ref(x, y) is then set by (6) to find an adequately matched pixel in the right-view image, where the searching offset χ(I_IR(x', y')) varies with I_IR(x', y'). In (5) and (6), Γ(·) and χ(·) mainly depend on the display screen parameters and the dynamic range of the depths captured by the IR-projector.
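As a rough illustration of how a search reference and a search range could be derived from a registered IR-projector depth, the sketch below uses a linear depth-to-disparity map in place of Γ(·) and a depth-dependent half-width in place of χ(·). The coefficients, the subtraction of the disparity for left-to-right matching, and the clamping to the image width are all assumptions; the actual forms depend on the display parameters and the IR depth range.

```python
def gamma_disparity(ir_depth, scale=0.25, offset=0.0):
    """Hypothetical linear stand-in for Γ(.): IR depth value to disparity."""
    return scale * ir_depth + offset


def chi_offset(ir_depth, base=2.0, per_level=0.02):
    """Hypothetical stand-in for χ(.): search half-width in pixels."""
    return int(round(base + per_level * ir_depth))


def search_window(x, ir_depth, width):
    """Return (reference column, first column, last column) to search
    in the right view for the left-view pixel in column x.

    Assumption: the match lies to the left of x by the disparity.
    """
    ref = int(round(x - gamma_disparity(ir_depth)))
    ref = min(max(ref, 0), width - 1)  # keep the reference inside the image
    half = chi_offset(ir_depth)
    first = max(0, ref - half)
    last = min(width - 1, ref + half)
    return ref, first, last


window = search_window(x=100, ir_depth=80, width=640)
edge_window = search_window(x=5, ir_depth=200, width=640)
```

With these illustrative numbers, only nine columns around the reference are examined instead of a full disparity range, which is where the speed gain would come from.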

Pixel Extrapolation Outsides the IR-Projector Image Region
When the registered location of a referred point falls outside the IR-projector image region, effective extrapolation is necessary to determine the depth value at that location. Since pixel extrapolation can be regarded as an extension of the image size, a continuity-preference prediction is used to extend the IR-projector image. A registered pixel that lies outside the current extended IR-projector image but still touches its border has three (or two) neighbors with depth values (IR-projector detected intensities) among its 8 neighboring locations. Assume the pixel position is (x, y). Among the straight lines radiating from (x, y) through these neighbors, the one with the lowest depth change is selected, and the difference between two successive pixels on it is used to extrapolate the pixel intensity at (x, y). In the continuity-preference prediction of the pixel intensity at (x, y), denoted by Ĩ_IR(x, y), the steps (∆x = 1, ∆y = γ), (∆x = −1, ∆y = −γ), (∆x = γ^−1, ∆y = 1), and (∆x = −γ^−1, ∆y = −1) are set for extrapolating (extending) the exterior pixel along the left border, the right border, the bottom, and the top of the IR-projector image, respectively. Parameter γ is the slope of the continuity preference.
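A minimal sketch of the continuity-preference idea for a pixel just outside the left border follows. It assumes three candidate directions (slopes −1, 0, and 1) and a linear continuation of the last difference along the chosen line; the image is stored as a dictionary so that coordinates outside the original region can be added. These choices and all names are illustrative, not the exact formula.

```python
def extrapolate_pixel(image, x, y, directions):
    """Predict the value at (x, y) from the line with the lowest depth change.

    `image` maps (x, y) to a depth value. Each direction (dx, dy) points
    from the exterior pixel into the image. Returns None if no direction
    has two known successive pixels.
    """
    best = None
    for dx, dy in directions:
        p1 = (x + dx, y + dy)          # nearest known pixel on the line
        p2 = (x + 2 * dx, y + 2 * dy)  # next known pixel on the line
        if p1 in image and p2 in image:
            change = image[p1] - image[p2]
            if best is None or abs(change) < abs(best[0]):
                best = (change, image[p1])
    if best is None:
        return None
    change, nearest = best
    # Continue the selected line by repeating its last difference.
    return nearest + change


rows = [
    [10, 20, 30, 40],
    [50, 48, 46, 44],
    [30, 33, 36, 39],
    [70, 70, 71, 72],
    [90, 60, 30, 0],
]
image = {(x, y): v for y, row in enumerate(rows) for x, v in enumerate(row)}

LEFT_BORDER_DIRECTIONS = [(1, -1), (1, 0), (1, 1)]
value = extrapolate_pixel(image, -1, 2, LEFT_BORDER_DIRECTIONS)
image[(-1, 2)] = value  # the extended image now contains the new pixel
```

Here the horizontal line changes least (30 to 33), so the exterior pixel continues that trend.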

False-Reduced Modification for Stereo Matching Search References
After the homographic transform matrix is applied to map coordinates from the IR-projector image to the main-view image, the original search locations are adjusted so that they are mutually reasonable rather than geometrically exact. The proposed method follows the raster-scan processing order for prompt adjustment, and its procedure is as follows. For the non-leftmost pixels at (x ≠ 0, y), a criterion involving the color vector at (x, y) and the empirical thresholds δ_C and δ_S is tested. Similarly, for the leftmost pixels at (0, y), if the corresponding criterion holds, then P_S_ref(x, y−1) substitutes for P_S_ref(x, y). To alleviate the mismatching error that results from an unsuitable modification of a suspect P_S_ref(x, y), the search region for the pixel at (x, y) is enlarged as a protective compensation in stereo matching. The straightforward way adopted here is to add a fraction of δ_S to the search distance.
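The raster-scan adjustment can be sketched as follows under stated assumptions: a pixel is treated as suspect when its color is within δ_C of the previously scanned neighbor while its search reference differs by more than δ_S; the neighbor is the left pixel, or the upper pixel for the leftmost column; and the reference is stored as a disparity offset so that it can be copied between columns. The exact criteria, the sum-of-absolute-differences color distance, and the `widen` fraction are hypothetical.

```python
def color_distance(c1, c2):
    """Sum of absolute channel differences (an assumed color metric)."""
    return sum(abs(a - b) for a, b in zip(c1, c2))


def adjust_search_references(colors, refs, half_widths,
                             delta_c, delta_s, widen=0.5):
    """Replace suspect references in raster-scan order.

    Returns new (refs, half_widths); the inputs are not modified.
    """
    height, width = len(refs), len(refs[0])
    refs = [row[:] for row in refs]
    half_widths = [row[:] for row in half_widths]
    for y in range(height):
        for x in range(width):
            if x > 0:
                px, py = x - 1, y      # left neighbor
            elif y > 0:
                px, py = x, y - 1      # upper neighbor for the leftmost column
            else:
                continue               # first pixel has no predecessor
            similar = color_distance(colors[y][x], colors[py][px]) < delta_c
            jump = abs(refs[y][x] - refs[py][px]) > delta_s
            if similar and jump:
                refs[y][x] = refs[py][px]
                # Enlarge the search region to protect against a wrong swap.
                half_widths[y][x] += widen * delta_s
    return refs, half_widths


gray, green = (100, 100, 100), (10, 200, 30)
colors = [[gray, gray, gray, green],
          [gray, gray, gray, gray]]
refs = [[20, 21, 45, 46],
        [44, 20, 20, 20]]
half_widths = [[3, 3, 3, 3],
               [3, 3, 3, 3]]
new_refs, new_half = adjust_search_references(
    colors, refs, half_widths, delta_c=15, delta_s=8)
```

In the first row, the third pixel matches its left neighbor in color but jumps in reference, so it inherits the neighbor's reference; the fourth pixel has a different color and is left alone.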

Stereo Matching Acceleration by IR-Projector Depth Image
To facilitate the mapping between the refined IR-projector depths and the stereo matching depths, and to remove wrong or inappropriate differences in flat zones, the refined IR-projector depth is quantized in advance. Then the level mapping relation between the stereo matching depth and the refined IR-projector depth, which is associated with dynamic-region registration and one-to-one statistical observation, can be statistically estimated. Its formula is treated as a mutual mapping function between heterogeneous depths. Thus, via the heterogeneous mapping of the refined IR-projector depth to the dual images of the 3D camera, the initial search coordinate can be obtained to improve both the speed and the accuracy of stereo matching when computing the depth at the tested point. This applies to all existing stereo matching algorithms, including the supporting-weight [6], cross-based, and census-transform ones [7].
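One simple way to picture the quantization and the statistically estimated level mapping is a lookup table that stores, for each quantized IR depth level, the mean stereo disparity observed at registered pixels. The quantization step, the use of the mean as the estimator, and the function names are assumptions for illustration only.

```python
def quantize(depth, step):
    """Map a depth value to its quantization level."""
    return int(depth // step)


def estimate_level_mapping(ir_depths, stereo_disparities, step):
    """Average the observed stereo disparities per quantized IR depth level.

    `ir_depths` and `stereo_disparities` are paired observations taken
    at registered pixel positions.
    """
    sums, counts = {}, {}
    for depth, disparity in zip(ir_depths, stereo_disparities):
        level = quantize(depth, step)
        sums[level] = sums.get(level, 0.0) + disparity
        counts[level] = counts.get(level, 0) + 1
    return {level: sums[level] / counts[level] for level in sums}


ir_depths = [10, 14, 35, 38, 61]
stereo_disparities = [4, 6, 12, 14, 20]
mapping = estimate_level_mapping(ir_depths, stereo_disparities, step=16)

# Initial search disparity for a new pixel whose refined IR depth is 36.
initial_disparity = mapping[quantize(36, 16)]
```

Quantizing first means that small depth fluctuations inside a flat zone fall into the same level and therefore yield the same initial search coordinate.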

Simulation Results
In our experiments, stereo matching adopts the AD-Census method [7] to find the best-matched point in the right-view images for each point in the left (main) view images. Figure 2 displays an original IR-projector image and its refined result after the proposed processes; the refinement of IR-projector images reaches 30 frames per second. It demonstrates that the refined IR-projector images are quite stable and that various noise, photography artifacts, and holes caused by parts undetectable by IR can be removed. In Figure 3, the stereo matching outcomes without and with the proposed Kinect-supporting improvement are compared. With the Kinect-supporting improvement, the stereo matching accuracy is raised, especially in flat or sparse-texture parts. The speed-up of the Kinect-supporting stereo matching over the original stereo matching is 34.08%. The frame-by-frame computational overhead of acquiring the referred search points is counted in the former, except for the generation of the homographic transform matrices, which belongs to the system-setup initialization.

Conclusion
In this study, a Kinect-supporting mechanism with a regular structure is proposed for efficiently improving stereo matching. Through the proposed registration of heterogeneous images of different resolutions, the disparities linearly computed from the refined IR-projector depths are applied to the main-view color image of the 3D camera as disparity search references. By exploiting these disparity references, the proposed scheme improves both the accuracy and the speed of stereo matching. This investigation points to the possibility of developing a low-cost IR-sensor-embedded 3D camera, with which multi-view video of more than five views can be generated rapidly as soon as a user (or artist) shoots a two-view video sequence.