A Hierarchical Grab Cut Image Segmentation Algorithm

This paper aims to speed up a segmentation algorithm “Grab Cut” by separating the process of segmentation into hierarchical steps. The Grab Cut algorithm segments images by means of the color clustering concept and the process requires a lot of iteration for it to get converged. Therefore, it is a time-consuming process which we are interested in improving this process. In this study, we adopt the idea of hierarchical processing. The first step is to compute at low resolution to make the iteration much faster, and the second step use the result of the first step to carry on iteration at original resolution so that the total execution time can be reduced. Specifically speaking, segmentation of a low resolution image will lead to high-speed and similar-segmentation result to the segmentation at original resolution. Hence, once the iterations at low resolution have converged, we can utilize the parameters of segmentation result to initialize the next segmentation on original resolution. This way, the number of iteration of segmentation at original resolution will be reduced through the initialization of those parameters. Since the execution time of low resolution images is relatively short, the total hierarchical execution time will be reduced consequently. Also, we made a comparison among the four methods of reduction on image resolution. Finally, we found that reducing the number of basins by “Median Filter” resulted in best segmentation speed.

important for image editing. There are two kinds of segmentation methods: "Auto" [1] and "Semi-Auto". Auto segmentation processes image without any other information. Therefore, the results are not reliable enough.
On the other hand, Semi-Auto segmentation added the user information, for example, a desired region assigned by user, will be more accurate and faster than auto segmentation. There is a powerful, popular semi-auto segmentation algorithm which is called "Graph-Cut" [2] [3]. Users only need to plot some points on the image for deciding which is background or foreground. Then the algorithm will segment the image with the user information automatically. Recently, an algorithm, which improved graphcut, is called "Grab Cut" [4]. Grab cut adds one more step in the processing. Before assigning the points, users need to mark two corners of a rectangle so that the first segmentation result will show up in the rectangle. Therefore, users don't have to specify so many points.
Our work focuses on the rectangle segmentation stage. Since the rectangle segmentation utilizes the clustering concept [5] [6], the process will need a lot of iterations to achieve converging. As a result, we provided the process that makes the original process faster. Specifically speaking, there are two steps of the process. The first step is reducing the resolution of the image and doing the segmentation which has the similar result as the original segmentation. Second, we take the converged result parameters of the first stage, then go back to the original resolution and use the parameters to do segmentation iterations. Since the result from low resolution stage is similar to original segmentation and will help the segmentation on original resolution to get converged with lesser iterations. Still the speed under low resolution process is faster than the original process. So, we will make the grand process faster than the traditional method.
The practical measure is about "Lazy Snapping" [7] which mainly discusses a concept of considering a basin produced by the watershed algorithm [8] [9] as a pixel. This way, the number of pixels will decrease heavily and the speed will improve greatly. Because a basin is now considered as a pixel, we use the smooth filter to reduce basins [10] as the way of decreasing the resolution. Additionally, we experiment with four methods of decreasing resolution which are median filter, mean filter, gaussian filter, and down sample. The experiment result is that the median filter is the best choice to our algorithm.

Gaussian Mixture Model
The implementation of grab cut segmentation is based on "Clustering" and "Max-Flow Min-Cut Theorem". Clutering is a concept and also an algorithm that separates multi-dimension data into several groups. Gaussain Mixture Mo-del (GMM) [11] is one of the existing clustering algorithms. The concept of GMM is to distribute several gaussian probability models to every data point and find out the model with the highest probability for each point. As a result, every data point belongs to one gaussian model. This way, we may distribute all the The equation form of the probability produced by the k-th gaussian model Now we have learned each data point belongs to the model with the highest probability. Therefore, we can calculate the mean value and the variance (matrix) of every gaussian probability model (component), the results are (2) and (3). Then we do the steps above until the results to get converged.

Max-Flow Min-Cut Theorem
If we build a flow network as Figure 1, we can see there are sources, "S", and a sinks "T". The denominator means the capacity of the flow. The numerator means the amount of current flow. All the flow cannot be disappeared; they must be the same amount as the converge flow. In a flow network, there can be different flow path from S to T, as Figure 1. So, there are different path to achieve max flow. If we separate two regions for S and T, it will be a "cut", as  amount of them are all the same. Because current flow value must be identical to the previous flow, or the flow will disappear. We can find out that in Figure 2, the capacities are 29, 26, 23, but the real flow amount must be 23 which just equal to the max flow amount in Figure 1. The reason is that the max flow is constrained to the bottle necks of a flow network, and the cut with capacity 23 is so called "Min-cut". As a result, if we want to find the max flow value, we can find the min cut and its capacity; if we want to find a min cut capacity, we can find the max flow value.
If we image there is a chessboard, the cross point is the pixel, we can apply this theorem [12] to images.

Grab-Cut Algorithm
For a segmentation algorithm, the cost function is always the necessity, they are as Equation (4)- (6), where k n means the n-th pixel belonging to k-th component of GMMs, and α n only possess value 0 and 1 which means the n-th pixel belonging to back/foreground. θ are the parameters of GMMs; they are weighting coefficients, mean values, and covariance matrix. z n is the pixel value of n-th pixel. Moreover, COV(α n , k n ) refers to the covariance matrix of the component that belongs to back/foreground and corresponds to the n-th pixel. L We can see that the function D in (4)-(5) is a muti-dimension gaussian probility distribution. Therefore, function D means how possible a pixel belongs to back/foreground. The total cost function is as (7) which includes function U and V. The function V is written as (8) If two neighbor pixels have large difference in value, they can be considered as background and foreground with low cost. If the values of two neighbor pixels are close, the cost of assigning them different back/foreground will be large. The segmentations executing with max-flow min-cut theorem and cost function (7) are as Figure 3.
In the end, the process of grab cut can be divided into four steps. First, user marks two corner of a rectangle to ensure a segmentation region, and the pixels inside will be used to build foreground GMMs while the pixels outside are used to build background GMMs. Second, these GMMs are used to form the cost function U, and the segmentation will begin. Third, the segmentation result forms new back/foreground GMMs. We need to check if the new GMMs are identical to the old ones. If they didn't change, it means that the segmentation is converged, the program will go to step four. If they did change, the program will back to step two until it converges. The fourth step is using graph cut to mark the imperfect region to complete the segmentation.

Lazy Snapping
Since the computation complexity of graph cut is O(mn 2 ), which n is the pixel number and m is the flow number. It is a third power of pixel number. So, [7] has proposed an algorithm that uses watershed segmentation and consider a Our work is based on lazy snapping and [10]. [10] has proposed a method that is using median filter to smooth the image and the number of basins will be decreased. Therefore, we can consider different degree of smooth as different image resolution, due to the reduction of basins.

Speeding up Method
Our proposed method is as Figure 4; the first step is to segment image for 2 iterations so that the result will be similar to the result of traditional method for sure. We can see Figure 5, for all experiment patterns, the energy (cost) at least decreased the amount of getting converged by 80%. The second step is to reduce resolution by using smooth filter, then use the GMMs parameters of the first step to continue segmentation. As a result, the GMMs parameters of the second step are closer to those converged parameters.
The third step is to go back to the original image and use the final parameters of step two to carry out a new segmentation. Although the process is more complicated, the execution time decreases due to the replacing in part of entire process by high speed computation in low resolution stage.

Simulation and Comparisons
We pick four approaches to accomplish the proposed speeding up algorithm.
They are median filter, mean filter, Gaussian filter, and down sampling. By using these filters and down sampling, we can lower the number of basins so that the speed of execution on these images is increased.
We use a method called "normalized ranking" with four methods to find out which method has the best execution time. The normalized ranking is to set the longest execution time as 1, the others are divided by the longest one. Therefore, we will get a grand score by summing scores on all patterns. Obviously, the largest grand score means the longest execution time. In our simulation, the median filter has the best grand score among four methods. The simulation platform is a personal computer with Intel Core i3-530, 2.93 GHz, and 4 G byte DRAM.
Moreover, in Table 1, we can see the average speeding up rate for various mask size of median filter. The number "9-1" means the segmentation was processed on an image pretreated by a 9 × 9 mask median filter, and then back to the original image. We can find out that the speeding up rate is from 20% to 30% for median filter of those sizes.

Conclusion
Image segmentation is such a highly developed field and so many researchers are still trying to make some improvement. Our improvement is mainly about speeding up and we used four methods to make the segmentation faster. Finally, we find out that median filter has the best performance among four methods. In the future, if there exists any need is strong to real-time while with less emphasis   quality of image, such as object tracking, our proposed algorithm may be useful for these kind of needs.