Journal of Software Engineering and Applications, 2012, 5, 144-148
doi:10.4236/jsea.2012.512b028 Published Online December 2012 (http://www.scirp.org/journal/jsea)
Copyright © 2012 SciRes. JSEA
A Fast Depth-Map Generation Algorithm based on Motion Search from 2D Video Contents
Weiwei Wang1, Yuesheng Zhu2
1Communication and Information Security Lab; 2Shenzhen Graduate School, Peking University, China.
Email: cxjxfkl@126.com, zhuys@pkusz.edu.cn
Received 2012
ABSTRACT
Generation of a depth-map from 2D video is the kernel of DIBR (Depth Image Based Rendering) in 2D-3D video conversion systems. However, it consumes most of the system resources, and the motion search module accounts for about 90% of the running time in typical motion estimation-based depth-map generation algorithms. In order to reduce the computational complexity, a new fast depth-map generation algorithm based on motion search is developed in this paper, in which a fast diamond search algorithm is adopted and the Sobel operator decides whether a 16x16 or 4x4 block size is used in the motion search module to obtain a sub-depth-map. This sub-depth-map is then fused with the sub-depth-maps obtained from the depth-from-color-component-Cr and depth-from-linear-perspective modules to compensate and refine the detail of the depth-map, and finally a better depth-map is obtained. The simulation results demonstrate that the new approach can reduce the computational complexity by over 50% compared to other existing methods.
Keywords: Block-Matching; Depth-Map; Motion Search; DIBR
1. Introduction
Commercialization and industrialization of three-dimensional television (3D TV) [1] depend not only on the development of 3D displays and standardized technology, but also on a large amount of 3D video content. Although 3D movies are on their way, the available 3D video content is still not rich enough to satisfy the needs of the 3D-video market. In fact, the market is overwhelmed with 2D video. Converting 2D video into 3D video automatically, so that existing movies can be played on 3D displays, has become an important way to alleviate the shortage of 3D programs, and the 2D-to-3D conversion technique can deliver 3D videos effectively and efficiently. Therefore, the transition from 2D to 3D video is a low-cost solution for the 3D industry compared with capturing 3D video directly.
There are several approaches for converting 2D video into 3D video [2-10]. A depth-map contains information about the distance of scene objects from a viewpoint in the video content, and generating a depth-map effectively from 2D video is the kernel of DIBR in 2D-3D video conversion systems. The basic principle of DIBR [2,11] is to obtain a depth-map from 2D video and then synthesize the left and right views. Depth from motion (DFM) [5] is a kind of depth-map generation algorithm in which the video is segmented first and the frame disparity is estimated to obtain the depth-map, but DFM requires that moving objects exist in successive frames. Fusion with color information can improve the depth-map quality [12,13]. In [12], the depth-map generated from motion parallax is fused with color segmentation to obtain a clear and reliable depth-map, but the color segmentation algorithm introduces high computational complexity. In [13], motion estimation is performed by using luminance and chrominance information in the motion search module to yield a reliable depth-map and reduce the computational complexity. However, the computational complexity of using the color information in the motion search module is still high.
In this paper, a new fast depth-map generation algorithm based on motion search is developed, in which a fast diamond search algorithm is adopted and the Sobel operator [14,15] decides whether a 16x16 or 4x4 block size is used in the motion search module, without using color information, to obtain a main sub-depth-map. Then the depth from color component Cr [4] and the depth from linear perspective [6] are used as auxiliary sub-depth-maps to fuse with the main sub-depth-map. Finally, a bilateral filter is adopted to
eliminate the block effect and the staircase edges that remain in the fused depth-map. The results show that with the proposed algorithm a smooth and reliable depth-map, and hence better visual 3D video, can be obtained with low computational complexity compared to the methods in [12] and [13].
The remainder of the paper is organized as follows. The proposed depth-map generation algorithm is presented in Section 2. Experimental results are provided in Section 3. Finally, a conclusion is given in Section 4.
2. The Proposed Depth-Map Generation Algorithm
The block diagram of the proposed algorithm is shown in Figure 1.
As shown in Figure 1, the final depth-map is fused from three sub-depth-maps: depth from improved motion estimation, depth from color component Cr, and depth from linear perspective. In the depth-map fusion, depth from color component Cr and depth from linear perspective are used as auxiliary sub-depth-maps to compensate the main sub-depth-map obtained from the improved motion estimation. A bilateral filter is then adopted to eliminate the block effect and the staircase edges. In this section, the improved block-matching based depth-from-motion-estimation module is developed, and the approach and the corresponding algorithms are described in detail as follows.
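As a concrete illustration of the smoothing step, a bilateral filter weights each neighbor by both spatial distance and intensity difference, so block artifacts are smoothed while depth discontinuities are preserved. A minimal single-channel sketch follows; the radius and sigma values are illustrative assumptions, not settings from the paper:

```python
import numpy as np

def bilateral_filter(depth, radius=2, sigma_s=2.0, sigma_r=25.0):
    """Smooth a single-channel depth-map while preserving strong edges."""
    img = depth.astype(float)
    h, w = img.shape
    out = np.empty_like(img)
    # Spatial (domain) weights are fixed for the whole image
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2.0 * sigma_s**2))
    pad = np.pad(img, radius, mode='edge')
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range weights penalize large intensity differences (edge-preserving)
            rng = np.exp(-(win - img[i, j])**2 / (2.0 * sigma_r**2))
            wgt = spatial * rng
            out[i, j] = (wgt * win).sum() / wgt.sum()
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

The classic formulation is due to Tomasi and Manduchi [8]; production code would use an optimized library routine rather than this per-pixel loop.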
2.1. Depth from Improved Motion Estimation
In [7], the motion estimation is performed by using luminance information, which may cause mismatches in areas where the luminance components are distributed uniformly. In [10], luminance and chrominance information is adopted in the motion vector processing, which uses Y (luminance component), Cr (red-difference component) and Cb (blue-difference component) to calculate the motion vectors. Our comparison of the time consumption in the motion search module for four cases, (1) Y, (2) Y and Cb, (3) Y and Cr, (4) Y, Cr and Cb, is shown in Figure 2.
Figure 1. Block diagram of proposed algorithm.
Figure 2. The time consumption in the motion search module in the four cases.
The results indicate that using color information in the motion search module increases the computational complexity. Therefore, in the proposed module, the motion estimation is performed by using luminance information only, without color information in the motion search module; instead, the depth from color component Cr is used as an auxiliary sub-depth-map to fuse with the sub-depth-map obtained from the improved motion estimation.
In the motion search module, the current frame is divided into small blocks for depth assignment, while the corresponding small blocks in the reference frame are used as centers and expanded into bigger blocks for matching. Then block-matching based motion estimation is performed to find the best matching block, and the generated motion vectors are used to assign depth to the small blocks. The block-matching based motion estimation [7] utilizes the fact that objects with different motions usually have different depths: near objects move faster than far objects, and the relative motions are used to estimate the depth-map. The depth value D(i,j,k) is estimated from the magnitude of the motion vectors as follows:

D(i,j,k) = C × √(MV(i,j,k)x² + MV(i,j,k)y²)   (1)

where MV(i,j,k)x and MV(i,j,k)y are the horizontal and vertical components of the motion vectors and C is a predefined constant. It is noted that the motion search module takes as much as 40 percent of the total time consumption. In order to reduce the computational complexity, a fast diamond search algorithm is adopted in the new motion search module.
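Formula (1) can be sketched directly in code; a minimal version assuming the per-block motion vectors are stored in two arrays, with the value of C chosen for illustration:

```python
import numpy as np

def depth_from_motion(mv_x, mv_y, C=10.0):
    """Formula (1): D = C * sqrt(MVx^2 + MVy^2), clamped to an 8-bit depth range."""
    d = C * np.sqrt(mv_x.astype(float)**2 + mv_y.astype(float)**2)
    return np.clip(np.rint(d), 0, 255).astype(np.uint8)
```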
Common algorithms generate the depth-map based on motion estimation with a 4x4 block size. However, we observe that if the frame is sufficiently homogeneous, the depth values of blocks are close to those of their neighbors, so much computation can be saved by using a 16x16 block size instead of a 4x4 block size in homogeneous areas. To evaluate the smoothness of a picture, statistical measurements such as standard deviation, variance,
skewness and kurtosis [16] are used. In [14,15] the Sobel operator is used to create edge maps of pictures for high efficiency of the video coding process. In order to classify the homogeneity of a block, the amplitude of the edge vector is defined by formula (2). The horizontal and vertical gradients at a luminance or chrominance pixel at position (i,j) with value v(i,j) are defined by formulas (3) and (4), and the block homogeneity measurement H is then set by formula (5):

Amp(E(i,j)) = |Ex(i,j)| + |Ey(i,j)|   (2)

Ex(i,j) = v(i−1,j+1) + 2·v(i,j+1) + v(i+1,j+1) − v(i−1,j−1) − 2·v(i,j−1) − v(i+1,j−1)   (3)

Ey(i,j) = v(i+1,j−1) + 2·v(i+1,j) + v(i+1,j+1) − v(i−1,j−1) − 2·v(i−1,j) − v(i−1,j+1)   (4)

H = 1 if Σ Amp(E(i,j)) < Thd, summed over all pixels of the block; H = 0 otherwise   (5)

where Thd is a predefined threshold.
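Formulas (2)-(5) amount to a block-homogeneity test. A minimal NumPy sketch is below; Thd = 5000 follows the Sobel threshold reported in Section 3, and summing the amplitude over the whole block is our reading of formula (5):

```python
import numpy as np

def block_homogeneity(block, thd=5000):
    """Return 1 if the block is homogeneous (keep 16x16), 0 otherwise (split to 4x4)."""
    v = block.astype(np.int64)
    # Sobel gradients, formulas (3) and (4), evaluated at interior pixels
    ex = (v[:-2, 2:] + 2 * v[1:-1, 2:] + v[2:, 2:]
          - v[:-2, :-2] - 2 * v[1:-1, :-2] - v[2:, :-2])
    ey = (v[2:, :-2] + 2 * v[2:, 1:-1] + v[2:, 2:]
          - v[:-2, :-2] - 2 * v[:-2, 1:-1] - v[:-2, 2:])
    amp = np.abs(ex) + np.abs(ey)       # formula (2)
    return 1 if amp.sum() < thd else 0  # formula (5)
```

A flat block yields zero gradient everywhere and is classified homogeneous; a block containing a strong edge exceeds the threshold and is split.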
After the block size is assigned, the fast diamond search algorithm runs two rounds of iterations to calculate the motion vectors [17]. In the first round, it takes the large diamond search pattern (Figure 3(a)) and 9 test points are compared to find the best point; the iteration stops when the center point is the best matching point. The second round finds the best point in the small diamond search pattern (Figure 3(b)). The first round takes at least 90% of the time according to related experiments, so we design a quick-quit scheme to exit the iterative loop early if the SAD (Sum of Absolute Differences) of the current point is less than a predefined threshold. The fast depth-map generation algorithm based on motion search is as follows.

Step 1. Read the current macro block data (16x16 block size) and compute the homogeneity measurement H of the block by formulas (2)-(5). If H equals 0, the block is divided into 16 small 4x4 blocks and the motion search module operates on the 4x4 block size; otherwise the large 16x16 block size is used.
Figure 3. Diamond search patterns. (a) Large diamond search pattern. (b) Small diamond search pattern.
Step 2. Compute the SAD of the 9 points of the large diamond pattern around the center point (Figure 3(a)). If the SAD value of the current point is less than the given threshold T1, go to Step 4; if it is larger than T1 but less than T2, go to Step 3; otherwise, take the point with the minimum SAD value as the new center and repeat over the 9 points.

Step 3. Compute the SAD of the 5 points of the small diamond pattern around the center point (Figure 3(b)). The best point is chosen as the matching point.

Step 4. Compute the depth value of the block with formula (1).
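The two-round search with the quick-quit scheme can be sketched as follows. This is a simplified version: the SAD thresholds t1 and t2 and the out-of-frame handling are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Candidate offsets of the large (9-point) and small (5-point) diamond patterns
LARGE = [(-2, 0), (2, 0), (0, -2), (0, 2), (-1, -1), (-1, 1), (1, -1), (1, 1)]
SMALL = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def sad(ref, cur, y, x, dy, dx, bs):
    """SAD between the current block at (y, x) and the reference block displaced by (dy, dx)."""
    ry, rx = y + dy, x + dx
    if ry < 0 or rx < 0 or ry + bs > ref.shape[0] or rx + bs > ref.shape[1]:
        return float('inf')  # displaced block falls outside the frame
    diff = ref[ry:ry + bs, rx:rx + bs].astype(int) - cur[y:y + bs, x:x + bs].astype(int)
    return int(np.abs(diff).sum())

def diamond_search(ref, cur, y, x, bs=4, t1=16, t2=256):
    """Return the motion vector (dy, dx) for the current block at (y, x)."""
    cy = cx = 0
    while True:  # round 1: large diamond pattern
        center = sad(ref, cur, y, x, cy, cx, bs)
        if center < t1:
            return cy, cx  # quick quit: current point already matches well
        best, by, bx = min((sad(ref, cur, y, x, cy + dy, cx + dx, bs), cy + dy, cx + dx)
                           for dy, dx in LARGE)
        if best >= center:
            break  # center point is the best match: stop round 1
        cy, cx = by, bx
        if best < t2:
            break  # close enough: refine with the small pattern
    # round 2: small diamond refinement around the final center
    _, cy, cx = min((sad(ref, cur, y, x, cy + dy, cx + dx, bs), cy + dy, cx + dx)
                    for dy, dx in SMALL)
    return cy, cx
```

Because the center SAD strictly decreases on each recentering, round 1 always terminates; the quick quit short-circuits it whenever a near-perfect match appears.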
2.2. Depth from Color Component Cr
The research in [4] has shown that different objects have different hues in 2D color video sequences, and each hue has its own associated grey-level intensities in the Cr color component image. If we take the grey-level intensities as indexes of depth, the depth at the boundary of each object differs from that of its immediate surroundings. Therefore, the grey-intensity image associated with the color component Cr of a standard 2D color video sequence can be used as a proxy depth-map. In our method, the depth from color component Cr, which is derived directly from the current frame of the 2D image, is used as an auxiliary sub-depth-map to fuse with the sub-depth-map from the improved motion estimation; the fused depth-map increases the accuracy and detail of the sub-depth-map. Moreover, for a static scene we can obtain different depth values using color component Cr, whereas other methods [7] produce the same depth values, so the depth from color component Cr can strengthen the layering of the stereo videos. The depth from color component Cr is shown in Figure 4(b).
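A minimal sketch of using the Cr component as a proxy depth-map, assuming an RGB input frame; the BT.601 conversion coefficients are a standard choice, not one specified by the paper:

```python
import numpy as np

def depth_from_cr(rgb):
    """Use the Cr (red-difference) channel of an RGB frame as a proxy depth-map.

    Cr per ITU-R BT.601: Cr = 0.5*R - 0.4187*G - 0.0813*B + 128.
    """
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    cr = 0.5 * r - 0.4187 * g - 0.0813 * b + 128.0
    return np.clip(np.rint(cr), 0, 255).astype(np.uint8)
```

For YCbCr-encoded video the Cr plane is of course available directly, without this conversion.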
2.3. Depth from Linear Perspective
Research in [6] shows that depth from linear perspective can make stereoscopic video more comfortable for humans to watch. In our method, a near-to-far global scene depth gradient is applied as the auxiliary depth map, since human visual perception tends to interpret most of the
Figure 4. Depth from color component Cr. (a) The original “cheerleader” video image. (b) The depth from color component Cr.
images as representing scenes in which the bottom part is related to the ground, and is consequently close to us, while the upper part represents the sky, and is consequently far from us.
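The near-to-far global depth gradient can be generated as follows; the linear ramp over an 8-bit range is our assumption, since the paper does not give the exact gradient shape:

```python
import numpy as np

def depth_from_linear_perspective(height, width):
    """Vertical depth gradient: top rows (far, sky) -> 0, bottom rows (near, ground) -> 255."""
    ramp = np.linspace(0.0, 255.0, height)
    return np.tile(ramp[:, None], (1, width)).astype(np.uint8)
```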
2.4. Depth-Map Fusion
The final depth-map is fused from three sub-depth-maps: depth from improved motion estimation, depth from color component Cr, and depth from linear perspective. In this paper a simple linear model is used to fuse the three sub-depth-maps. The fused depth-map can be described by the following equations:

Dall = Dm × Wm + Dc × Wc + Dl × Wl   (6)

Wm + Wc + Wl = 1   (7)

where Dm, Dc and Dl are the values of the sub-depth-maps estimated by motion estimation, color component Cr and linear perspective respectively, Dall is the value of the fused depth-map, and Wm, Wc and Wl are their weights. Because depth from improved motion estimation is used as the main sub-depth-map, while depth from color component Cr and depth from linear perspective are used as the auxiliary sub-depth-maps, the values of Wm, Wc and Wl are selected following the principle that Wm is larger than Wc and Wl.
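Formulas (6) and (7) can be sketched as follows; the weight values below are illustrative assumptions, since the paper only requires Wm to be larger than Wc and Wl:

```python
import numpy as np

def fuse_depth(d_m, d_c, d_l, w_m=0.6, w_c=0.25, w_l=0.15):
    """Formula (6): D_all = Dm*Wm + Dc*Wc + Dl*Wl, with Wm + Wc + Wl = 1 (formula (7))."""
    assert abs(w_m + w_c + w_l - 1.0) < 1e-9  # enforce formula (7)
    fused = (w_m * d_m.astype(float) + w_c * d_c.astype(float) + w_l * d_l.astype(float))
    return np.clip(np.rint(fused), 0, 255).astype(np.uint8)
```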
3. Experimental Results
To evaluate the proposed algorithm, several test sequences are used to run the methods of [12], [13] and our method and compare their efficiency. We set the threshold of the Sobel operator to 5000, while the thresholds of the motion search module are 0.15 and 0.3. The test results are shown in Table 1.
Table 1. The test results of several sequences (time in seconds).

Sequence | Po [12] | Chen [13] | Our method
FLOWER | 4.12 | 8.12 | 2.92
COASTGUARD | 3.45 | 8.23 | 1.98
CARPHONE | 4.23 | 7.98 | 2.03
FOREMAN | 4.47 | 8.43 | 2.65
MOBILE | 5.01 | 8.90 | 3.21
CALENDAR | 5.43 | 8.79 | 3.45
According to Table 1, the efficiency of our method is better than that of the algorithms in [12] and [13], with about 31% time saving over [12] and 65% over [13]. The improvement is especially remarkable for CARPHONE and COASTGUARD because of the large smooth areas in these two sequences; this is the contribution of the Sobel operator, which decides whether a 16x16 or 4x4 block size is used, and of the quick-quit scheme that exits the iterative loop early in the search module.
The sub-depth-map obtained from the improved motion estimation module is shown in Figure 5(b). It contains many isolated points, so this sub-depth-map is fused with the sub-depth-maps obtained from the depth-from-color-component-Cr and depth-from-linear-perspective modules to eliminate these isolated points and compensate the sub-depth-map, yielding the better fused depth-map shown in Figure 5(c). Finally, a smooth and reliable depth-map, shown in Figure 5(d), is obtained by passing the fused depth-map through the bilateral filter.
4. Conclusions
In this paper, a fast depth-map generation algorithm based on motion search is proposed to enhance the efficiency of depth-map generation. In the newly proposed modules, a fast diamond search algorithm is adopted and the Sobel operator decides whether a 16x16 or 4x4 block size is used in the motion search module, without using color information, to obtain a sub-depth-map; this sub-depth-map is then fused with the sub-depth-maps obtained from the depth-from-color-component-Cr and depth-from-linear-perspective modules to compensate it and obtain an improved fused depth-map. Finally, the bilateral
Figure 5. Depth-maps from the proposed algorithm. (a) The original “flower” video image. (b) The sub-depth-map estimated by the improved motion estimation module. (c) The depth-map after fusing the three sub-depth-maps. (d) The final depth-map after the bilateral filter.
filter is adopted to eliminate the block effect and the staircase edges that remain in the fused depth-map. The results show that with the proposed algorithm a smooth and reliable depth-map and better visual 3D video can be obtained with over 50% reduction in computational complexity compared to the other methods.
REFERENCES
[1] M. Op de Beeck and A. Redert, “Three Dimensional Video for the Home,” Proceedings of the International Conference on Augmented, Virtual Environments and Three-Dimensional Imaging, May-June 2001, pp. 188-191.
[2] C. Fehn, “Depth-Image-Based Rendering (DIBR), Compression, and Transmission for a New Approach on 3D-TV,” Proceedings of SPIE, vol. 5291, no. 2, pp. 93-104, 2004.
[3] P. Harman, J. Flack, S. Fox and M. Dowley, “Rapid 2D to 3D Conversion,” Proceedings of SPIE, vol. 4660, pp. 78-86, 2002.
[4] W. J. Tam, C. Vázquez and F. Speranza, “Three-Dimensional TV: A Novel Method for Generating Surrogate Depth Maps Using Color Information,” Proc. SPIE Electronic Imaging - Stereoscopic Displays and Applications XX, 2009.
[5] D. Kim, D. Min and K. Sohn, “Stereoscopic Video Generation Method Using Motion Analysis,” Proceedings of the 3DTV Conference, Kos Island, May 2007.
[6] S.-F. Tsai, C.-C. Cheng, C.-T. Li and L.-G. Chen, “A Real-Time 1080p 2D-to-3D Video Conversion System,” IEEE Transactions on Consumer Electronics, vol. 57, no. 2, pp. 803-804, May 2011.
[7] I. Ideses, L. P. Yaroslavsky and B. Fishbain, “Real-Time 2D to 3D Video Conversion,” Journal of Real-Time Image Processing, vol. 2, no. 1, pp. 3-9, 2007.
[8] C. Tomasi and R. Manduchi, “Bilateral Filtering for Gray and Color Images,” Proceedings of the IEEE International Conference on Computer Vision, Bombay, January 1998, pp. 839-846.
[9] M. T. Pourazad, P. Nasiopoulos and R. K. Ward, “An H.264-Based Scheme for 2D to 3D Video Conversion,” IEEE Transactions on Consumer Electronics, vol. 55, no. 2, pp. 742-748, 2009.
[10] A.-M. Huang and T. Nguyen, “Motion Vector Processing Using the Color Information,” IEEE International Conference on Image Processing (ICIP), Cairo, 2009, pp. 1605-1608.
[11] W. J. Tam, F. Speranza, L. Zhang, R. Renaud, J. Chan and C. Vázquez, “Depth Image Based Rendering for Multiview Stereoscopic Displays: Role of Information at Object Boundaries,” Three-Dimensional TV, Video, and Display IV, vol. 6016, pp. 75-85, 2005.
[12] L. Po, X. Xu, Y. Zhu and S. Zhang, “Automatic 2D-to-3D Video Conversion Technique Based on Depth-from-Motion and Color Segmentation,” IEEE International Conference on Signal Processing (ICSP), 2010, pp. 1000-1003.
[13] J. Chen, Y. Zhu and X. Liu, “A New Block-Matching Based Approach for Automatic 2D to 3D Conversion,” The 4th International Conference on Computer Engineering and Technology (ICCET), Thailand, 2012, pp. 109-113.
[14] D. Wu, S. Wu, K. Lim, F. Pan, Z. Li and X. Lin, “Block INTER Mode Decision for Fast Encoding of H.264,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), vol. 3, 2004, pp. iii-181.
[15] F. Pan, X. Lin, R. Susanto, K. P. Lim, Z. G. Li, G. N. Feng, D. J. Wu and S. Wu, “Fast Mode Decision Algorithm for Intra Prediction in JVT,” 7th JVT Meeting, JVT-G013, Thailand, March 2003.
[16] K. R. Castleman, “Digital Image Processing,” Prentice Hall Inc., 1996.
[17] S. Zhu and K.-K. Ma, “A New Diamond Search Algorithm for Fast Block-Matching Motion Estimation,” IEEE Transactions on Image Processing, vol. 9, no. 2, pp. 287-290, Feb. 2000.