Video Frame’s Background Modeling: Reviewing the Techniques

Background modeling is a technique for extracting moving objects from video frames. It is used in machine vision applications such as video frame compression and monitoring. A model of the scene background is first constructed; the current frame is then subtracted from this model, and the difference identifies the moving objects. This paper evaluates a number of existing background modeling techniques in terms of accuracy, speed, and memory requirement.


Introduction
In most techniques, detecting objects or persons in a video sequence requires removing the background from the scene. A common method for extracting moving objects in video sequences is background subtraction [1,2]. This technique can be used in monitoring applications such as workplace security, traffic control, and video frame compression [3][4][5]. To detect the moving objects, a model of the scene background (i.e., the image without the moving objects) is first constructed; the current frame is then subtracted from the background model, and the difference determines the moving objects [6,7].
Background modeling approaches can be classified into two main groups: non-statistical [8-10] and statistical approaches [2,11,12]. In the former group, the background image, usually taken from the initial frame, is updated along the frame sequence, and the moving objects are extracted by computing the difference between the current frame and the background model. Non-statistical approaches are fast and are therefore suitable for real-time applications.
The non-statistical background modeling method presented in [1,13], namely RGABM, classifies each pixel in a frame as belonging either to a moving object (simply, the object) or to the background. In this approach, the first frame is taken as the background and subsequent frames are subtracted from it; pixels whose difference exceeds a threshold are considered to be objects. The background is updated along the frame sequence.
In the second group, the statistical approaches, the probability distribution function of the background pixels is estimated; then, in each video frame, the likelihood that a pixel belongs to the background is computed. Statistical approaches model the background of outdoor scenes better than non-statistical approaches. However, they may require more memory and processing time and can therefore be slower than non-statistical approaches.
One of the important statistical approaches to background modeling is the Gaussian mixture model. This approach uses a mixture of models (a multimodal representation) to represent the statistics of the pixels in the scene. Multimodal background modeling is very useful for suppressing repetitive motion such as shimmering water, leaves on a branch, or a waving flag [14,15]. The approach is based on the finite mixture model, and its parameters are estimated using the expectation maximization (EM) algorithm.
The non-parametric statistical background modeling method presented in [11] can handle scenes whose background is cluttered and not completely static; in other words, the background may contain small wiggling motions, as with tree branches and bushes. This model estimates the probability of observing a given value for a pixel from its previous values in earlier frames. The model is updated frequently to adapt to changes in the scene, which allows sensitive detection of moving targets.
Choosing between statistical and non-statistical methods involves a trade-off among computational speed, memory requirement, and accuracy. Users need to know the capabilities of the different techniques to choose a suitable method for their applications; providing this comparison is the aim of this paper.
Several issues need to be considered in any background modeling technique: separating objects from the background, updating the background over time, and extracting moving objects from video frames. These issues serve as the comparison factors in the evaluation in this paper.
The rest of this paper is organized as follows: Sections 2 and 3 review a number of non-statistical and statistical background modeling methods, respectively. Experimental results of evaluating the different methods on various videos are presented in Section 4. Finally, Section 5 concludes the paper.

Non-Statistical-Based Background Modeling Methods
The non-statistical approaches assume that the background is an image, usually the initial frame, which is updated along the frame sequence. To extract the moving objects, these approaches use the difference between the current frame and the background model: each frame is subtracted from the background, and pixels whose difference exceeds a threshold are considered to be objects. Because they are considerably fast, non-statistical methods are suitable for real-time applications. A number of existing non-statistical background modeling methods are briefly described in the following subsections.
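The subtract-and-threshold step shared by all of these methods can be sketched as follows. This is a minimal illustration, assuming gray-scale frames stored as flat lists of pixel values (row by row) and a hypothetical threshold; the methods below differ only in how the background passed to this function is built and updated.

```python
def foreground_mask(frame, background, threshold):
    """Return 1 for object pixels and 0 for background pixels.

    frame, background: flat lists of gray values of equal length (M*N).
    threshold: hypothetical scalar T; the text does not fix a value.
    """
    if len(frame) != len(background):
        raise ValueError("frame and background must have the same size")
    # A pixel is an object pixel when |I(x, y) - B(x, y)| exceeds the threshold.
    return [1 if abs(f - b) > threshold else 0 for f, b in zip(frame, background)]


# Usage: a 2x3 frame flattened row by row.
background = [10, 10, 10, 10, 10, 10]
frame = [10, 12, 90, 95, 10, 11]
print(foreground_mask(frame, background, threshold=25))  # [0, 0, 1, 1, 0, 0]
```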

Background Modeling Independent of Time
This is the simplest approach for computing the background. Because the model does not change over time, the method is named BMIT (Background Modeling Independent of Time) [14,16,17]. In this approach, the first frame of the video sequence is taken as the background and remains unchanged along the sequence. The background model can be written as

$$B_k(x,y) = I_1(x,y), \qquad k = 1, 2, \dots \tag{1}$$

where $B_k(x,y)$ is pixel $(x,y)$ of the $k$-th background model and $I_1(x,y)$ is pixel $(x,y)$ of the first captured frame.
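A minimal sketch of BMIT, assuming frames are flat lists of gray values and using a hypothetical threshold of 25. The example also shows why BMIT is sensitive to varying luminance: a global brightness change with no moving object flags every pixel, because the frozen first frame is never updated.

```python
def bmit_models(frames):
    """Return the background model B_k for each frame under BMIT.

    BMIT keeps the first frame I_1 as the background for every k.
    Frames are flat lists of gray values (a simplifying assumption).
    """
    first = list(frames[0])
    return [first for _ in frames]


def detect(frame, background, threshold):
    # Pixels whose absolute difference exceeds the threshold are objects.
    return [1 if abs(f - b) > threshold else 0 for f, b in zip(frame, background)]


frames = [
    [50, 50, 50, 50],
    [50, 50, 50, 50],
    [90, 90, 90, 90],  # a global lighting change, no moving object
]
models = bmit_models(frames)
print(detect(frames[2], models[2], threshold=25))  # [1, 1, 1, 1]
```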

The Improved Basic Background Modeling
BMIT suffers from noise and varying luminance in the image sequence. The improved basic background modeling (IBBM) method was developed in [5] to alleviate these deficiencies. If a pixel value of the absolute difference frame is larger than the threshold $T$, the pixel is regarded as part of the foreground; otherwise, it is assigned to the background. Pixels assigned to the background are updated with the current frame, whereas pixels belonging to the moving object keep their previous background value. Accordingly, IBBM can be expressed as follows [16]:

$$B_k(x,y) = \begin{cases} B_{k-1}(x,y), & AD_k(x,y) > T \\ I_k(x,y), & \text{otherwise} \end{cases} \tag{2}$$

where $AD_k(x,y)$ is pixel $(x,y)$ of the absolute difference frame between the $k$-th captured frame $I_k$ and the $(k-1)$-th background model, i.e.,

$$AD_k(x,y) = \left| I_k(x,y) - B_{k-1}(x,y) \right| \tag{3}$$

IBBM reduces the effects of noise and varying luminance. However, it suffers from a deckle effect: fragments of the foreground are updated into the background model [16,18]. In addition, if the foreground and background have similar colors, wrong updating occurs.
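One IBBM step can be sketched as below. This is a minimal illustration, assuming the common selective-update rule in which only pixels classified as background are refreshed with the current frame, flat lists of gray values as frames, and a hypothetical threshold of 20. The small drift on background pixels is absorbed, while the object pixel leaves the background untouched.

```python
def ibbm_step(frame, background, threshold):
    """One IBBM step (sketch): returns (new_background, foreground_mask).

    Assumed rule: AD_k = |I_k - B_{k-1}|; pixels with AD_k > T are
    foreground and keep B_{k-1}; the rest are refreshed with I_k.
    """
    new_background, mask = [], []
    for i_k, b_prev in zip(frame, background):
        ad = abs(i_k - b_prev)          # absolute difference frame AD_k
        if ad > threshold:
            mask.append(1)              # foreground: background not updated
            new_background.append(b_prev)
        else:
            mask.append(0)              # background: refresh with current pixel
            new_background.append(i_k)
    return new_background, mask


background = [50, 50, 50, 50]
frame = [55, 52, 160, 48]   # small luminance drift plus one object pixel
background, mask = ibbm_step(frame, background, threshold=20)
print(mask)        # [0, 0, 1, 0]
print(background)  # [55, 52, 50, 48]
```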

The Long-Term Average Background Modeling
To solve the deckle effect problem of IBBM, the long-term average background modeling (LTABM) was suggested in [15,19], defined as

$$B_k(x,y) = \frac{1}{k}\sum_{i=1}^{k} I_i(x,y) \tag{4}$$

where $B_k(x,y)$ is pixel $(x,y)$ of the $k$-th background model. Its recursive form is

$$B_k(x,y) = \frac{k-1}{k}\,B_{k-1}(x,y) + \frac{1}{k}\,I_k(x,y) \tag{5}$$

LTABM computes the average over all past frames up to the current frame, so its behavior depends on the number of frames $k$. When $k$ is small, each frame has a large weight, and the noise in each frame strongly affects the background.
As $k$ increases, the weight of each frame decreases and the model adapts slowly; consequently, luminance variations produce a considerable amount of error in the background.
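The recursive LTABM update can be checked against the batch average with a short sketch, assuming flat lists of gray values as frames. Only the previous background and the frame count are kept, which is why LTABM needs a single background image of memory.

```python
def ltabm_update(background, frame, k):
    """Recursive LTABM (sketch): B_k = ((k - 1) * B_{k-1} + I_k) / k.

    This is algebraically the same as (k-1)/k * B_{k-1} + 1/k * I_k.
    """
    return [((k - 1) * b + f) / k for b, f in zip(background, frame)]


frames = [[10.0, 20.0], [20.0, 20.0], [30.0, 50.0]]
background = list(frames[0])          # B_1 = I_1
for k in range(2, len(frames) + 1):
    background = ltabm_update(background, frames[k - 1], k)

# Batch long-term average over all frames, for comparison.
batch = [sum(px) / len(frames) for px in zip(*frames)]
print(background, batch)  # [20.0, 30.0] [20.0, 30.0]
```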

The Moving Average Background Modeling
The moving average background modeling (MABM) improves LTABM by employing the following definition for the background model [10]:

$$B_k(x,y) = \frac{1}{W}\sum_{i=k-W+1}^{k} I_i(x,y) \tag{6}$$

where $W$ is the moving length. The background is the average of the $W$ most recently captured frames, each with equal weight. Unlike the recursive LTABM update, this average cannot be updated from the previous background alone: the last $W$ frames must be stored, which results in a very high memory requirement.
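The memory cost of MABM is easy to see in a sketch: the model has to buffer the last W frames. This minimal version assumes flat lists of gray values as frames and uses `collections.deque` with `maxlen` so the oldest frame drops out automatically; the class name is hypothetical.

```python
from collections import deque


class MovingAverageBackground:
    """MABM sketch: background = mean of the most recent W frames.

    All W frames are buffered, which is the source of the W*M*N memory cost.
    During warm-up (fewer than W frames seen) the available frames are averaged.
    """

    def __init__(self, window):
        self.frames = deque(maxlen=window)  # oldest frame is discarded automatically

    def update(self, frame):
        self.frames.append(list(frame))
        n = len(self.frames)
        return [sum(px) / n for px in zip(*self.frames)]


model = MovingAverageBackground(window=2)
for f in ([10.0, 0.0], [20.0, 0.0], [40.0, 8.0]):
    background = model.update(f)
print(background)  # [30.0, 4.0]
```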

Running Gaussian Averaging Background Modeling
The running Gaussian averaging background modeling (RGABM) approach not only reduces the effects of varying luminance and noise, but can also be written in a recursive form [16]:

$$B_k(x,y) = \alpha\, I_k(x,y) + (1-\alpha)\, B_{k-1}(x,y) \tag{7}$$

where $B_{k-1}(x,y)$ is the background accumulated from the previous frames and $\alpha$ is the background updating rate. The value of $\alpha$ is critical in this method and is typically set to 0.05 [18].
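A minimal sketch of the recursive RGABM mean update, assuming flat lists of gray values as frames. The default rate follows the typical value of 0.05 quoted above; the usage example uses a larger, purely illustrative rate of 0.5 so that the convergence toward a new scene value is visible in a few steps.

```python
def rgabm_update(background, frame, alpha=0.05):
    """RGABM sketch: B_k = alpha * I_k + (1 - alpha) * B_{k-1}."""
    return [alpha * f + (1.0 - alpha) * b for b, f in zip(background, frame)]


background = [100.0]
for _ in range(3):
    # Each step moves the background halfway toward the new value 200.
    background = rgabm_update(background, [200.0], alpha=0.5)
print(background)  # [187.5]
```

A small alpha makes the model robust to noise but slow to absorb real scene changes; a large alpha adapts quickly but can absorb slow-moving objects into the background.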

Statistical-Based Background Modeling Methods
In statistical background modeling, the probability density function of the background is estimated; this function gives the probability that a pixel belongs to the background. Unlike non-statistical methods, these approaches are suitable for modeling outdoor and dynamic scenes. A number of them are reviewed in the following subsections.

Gaussian Mixture Model
One of the challenging issues in background modeling is handling repetitive motion in the video, such as shimmering water, leaves on a branch, or a waving flag.
Stauffer and Grimson [2] introduced the Gaussian mixture model (GMM) to capture the statistics of the repetitive motion that often exists in outdoor scenes. This approach employs the finite mixture method [20] to estimate the background model. In the finite mixture method, the probability density function $f(x)$ is estimated as the sum of $c$ weighted kernels:

$$f(x) = \sum_{i=1}^{c} p_i\, \phi(x;\theta_i) \tag{8}$$

where $p_i$ denotes the weight of the $i$-th kernel and $\phi(x;\theta_i)$ is the kernel probability density function with parameter $\theta_i$. A given feature vector $x$ is considered background if $f(x) > \tau$, where $\tau$ is a threshold. The parameters in (8) should be set so that higher density values are assigned to samples that belong to the background; this allows background and foreground pixels to be classified with acceptable accuracy.
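The mixture density in (8) and the background test can be sketched for a gray-level pixel (d = 1) with Normal kernels. The kernel weights, means, variances, and the threshold tau below are hypothetical illustration values; the example models a pixel that alternates between two intensities, which a single Gaussian could not represent.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """f(x) = sum_i p_i * phi(x; mu_i, sigma_i^2) for a gray-level x (d = 1)."""
    return sum(p * normal_pdf(x, m, v) for p, m, v in zip(weights, means, variances))


def is_background(x, weights, means, variances, tau):
    # tau is a hypothetical density threshold; the text does not fix its value.
    return mixture_density(x, weights, means, variances) > tau


# Two modes, e.g. a pixel alternating between water (100) and glare (200).
weights, means, variances = [0.6, 0.4], [100.0, 200.0], [25.0, 25.0]
print(is_background(102.0, weights, means, variances, tau=1e-3))  # True
print(is_background(150.0, weights, means, variances, tau=1e-3))  # False
```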
The authors in [2] used the EM algorithm to optimize the model parameters. EM requires the number of kernels to be given in advance, together with initial estimates of the kernel parameters; the parameters are then updated using the new data. The first step is to determine the posterior probabilities

$$\hat{\tau}_{ij} = \frac{\hat{p}_i\, \phi(x_j;\hat{\mu}_i,\hat{\Sigma}_i)}{\hat{f}(x_j)}, \qquad i = 1,\dots,c;\; j = 1,\dots,n \tag{9}$$

where $\hat{\tau}_{ij}$ represents the estimated posterior probability that $x_j$ belongs to the $i$-th kernel, the kernel $\phi$ in (8) is assumed to be Normal with $\theta_i = (\mu_i, \Sigma_i)$, and $\hat{f}(x_j)$ is the finite mixture estimate at $x_j$. The posterior probability in (9) provides the likelihood that a point belongs to each of the separate kernel densities, and it can be used to obtain a weighted update of the parameters of each kernel. Following the EM algorithm, the updated mixing coefficients, means, and covariance matrices are obtained as follows [20]:

$$\hat{p}_i = \frac{1}{n}\sum_{j=1}^{n}\hat{\tau}_{ij} \tag{10}$$

$$\hat{\mu}_i = \frac{1}{n}\sum_{j=1}^{n}\frac{\hat{\tau}_{ij}\, x_j}{\hat{p}_i} \tag{11}$$

$$\hat{\Sigma}_i = \frac{1}{n}\sum_{j=1}^{n}\frac{\hat{\tau}_{ij}\,(x_j-\hat{\mu}_i)(x_j-\hat{\mu}_i)^{T}}{\hat{p}_i} \tag{12}$$

Stauffer and Grimson [2] instead used computationally simplified update equations that can be implemented recursively.
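The EM iteration described above can be sketched for a one-dimensional (gray-level) mixture: an E-step computing the posteriors of (9), followed by weighted updates of the mixing coefficients, means, and variances. This is a minimal batch illustration, not the recursive Stauffer and Grimson variant; the sample values, the initial parameters, and the small variance floor are assumptions made for the example.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(xs, weights, means, variances):
    """One EM iteration for a 1-D Gaussian mixture."""
    n, c = len(xs), len(weights)
    # E-step, eq. (9): posterior tau[i][j] that x_j belongs to kernel i.
    tau = [[0.0] * n for _ in range(c)]
    for j, x in enumerate(xs):
        f = sum(weights[i] * normal_pdf(x, means[i], variances[i]) for i in range(c))
        for i in range(c):
            tau[i][j] = weights[i] * normal_pdf(x, means[i], variances[i]) / f
    # M-step: weighted updates of the mixing coefficients, means and variances.
    new_w, new_m, new_v = [], [], []
    for i in range(c):
        p = sum(tau[i]) / n
        mu = sum(t * x for t, x in zip(tau[i], xs)) / (n * p)
        var = sum(t * (x - mu) ** 2 for t, x in zip(tau[i], xs)) / (n * p)
        new_w.append(p)
        new_m.append(mu)
        new_v.append(max(var, 1e-6))  # guard against a collapsing kernel
    return new_w, new_m, new_v


# Gray values of one pixel over time, drawn from two intensity levels.
xs = [98.0, 101.0, 99.0, 102.0, 198.0, 201.0, 200.0]
w, m, v = [0.5, 0.5], [90.0, 210.0], [100.0, 100.0]
for _ in range(20):
    w, m, v = em_step(xs, w, m, v)
print([round(p, 3) for p in w], [round(mu, 1) for mu in m])  # [0.571, 0.429] [100.0, 199.7]
```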

Non-Parametric Model
The previous method assumes that the model parameters are unknown, so the initial values of the kernel parameters must be optimized; however, optimization is a time-consuming operation. To overcome this problem, a non-parametric model was introduced in [1], in which the parameters of each kernel do not need to be optimized. To model cluttered, fast-wiggling backgrounds, the model must be updated continuously in order to capture fast changes in the scene background.
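The per-pixel kernel density estimate that this non-parametric model relies on (formalized in the next paragraph) can be sketched for gray-level pixels with a Normal kernel. The pixel history, the bandwidth sigma, and the detection threshold are hypothetical values; a pixel is labeled background when its estimated density is above the threshold.

```python
import math


def kde_probability(x_t, samples, sigma):
    """Pr(x_t) = (1/N) * sum_i K(x_t - x_i) with a Normal kernel (gray-level, d = 1)."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    total = sum(norm * math.exp(-0.5 * ((x_t - x_i) / sigma) ** 2) for x_i in samples)
    return total / len(samples)


# Recent history of one pixel on a swaying branch: leaf (about 60) vs sky (about 180).
history = [60.0, 62.0, 178.0, 59.0, 181.0, 61.0, 180.0, 63.0]
sigma, threshold = 5.0, 1e-3   # hypothetical bandwidth and threshold
for value in (61.0, 179.0, 120.0):
    p = kde_probability(value, history, sigma)
    label = "background" if p > threshold else "foreground"
    print(value, label)
# prints: 61.0 background / 179.0 background / 120.0 foreground
```

Both the leaf and the sky intensities are accepted as background without fitting any kernel parameters, which is the advantage claimed for this model; the cost is storing and scanning the whole sample history for every pixel.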
To describe this model, let $x_1, x_2, \dots, x_N$ be a recent sample of intensity values for a pixel. The probability density that the pixel has intensity value $x_t$ at time $t$ can be estimated using a kernel estimator $K$:

$$\Pr(x_t) = \frac{1}{N}\sum_{i=1}^{N} K(x_t - x_i)$$

If the kernel estimator $K$ is chosen to be a Normal function $N(0, \Sigma)$, where $\Sigma$ represents the kernel bandwidth, then the density can be estimated as [11]:

$$\Pr(x_t) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x_t - x_i)^{T}\,\Sigma^{-1}(x_t - x_i)\right)$$

If we assume independence between the different color channels, with the $j$-th channel having its own kernel bandwidth $\sigma_j^2$, the bandwidth matrix becomes

$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \end{pmatrix}$$

and the density estimation reduces to

$$\Pr(x_t) = \frac{1}{N}\sum_{i=1}^{N}\prod_{j=1}^{d}\frac{1}{\sqrt{2\pi\sigma_j^2}}\, e^{-\frac{(x_{t_j} - x_{i_j})^2}{2\sigma_j^2}}$$

A pixel is considered a foreground pixel if this estimated probability is below a threshold.

Evaluation
Three measures are used to evaluate the performance of the methods described above: memory requirement, time consumption, and accuracy of background modeling. In the following sections, we discuss these factors for each approach; experimental results for the last two factors are also provided. In the evaluation, the video frames are assumed to be gray-scale, with a size of M*N pixels.

Memory Requirement
In this section, we compare the memory requirements of the approaches described above, first for the statistical approaches and afterward for the non-statistical approaches.

Statistical Approaches
The GMM algorithm has two phases: the updating phase and the moving-object extraction phase. In the updating phase, the mean, variance, and weight coefficient associated with each of the K kernels are updated. In addition, d features are considered for each pixel. In the following discussion, the memory needed to store the mean, variance, and coefficient is analyzed for one video frame.
The following assumptions are made in evaluating the memory requirement:
- The total number of pixels in a frame is M*N.
- Each pixel is modeled by K kernels.
According to these two assumptions, storing one value per kernel requires memory of order O(M*N*K) for each frame.
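Since each kernel stores a d-dimensional mean, a d*d covariance matrix, and a scalar weight, the covariance term dominates; hence, the first phase of the GMM algorithm needs O(M*N*K*d*d) units of memory.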
The second phase of the GMM algorithm extracts the moving objects. At this step, the stored mean, covariance, and weight values are read from memory; therefore, as in the first phase, the memory requirement for this step is also O(M*N*K*d*d).
The second statistical approach is non-parametric background modeling. In this approach, the memory requirement depends directly on the kernel function. If the kernel function has a Normal distribution, the number of stored samples per pixel is H, and the feature dimension of each pixel is d, then the required memory is of order O(H*M*N*d*d).

Non-Statistical Approaches
The non-statistical approaches of background modeling represent the background as a single image; hence, these approaches need M*N units of pixel-data storage, except for the MABM technique, which needs L*M*N entries (L denotes the number of frames in the moving window). Table 1 summarizes the memory requirements of all the techniques discussed above.
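To make these orders concrete, the following sketch evaluates them for the 320 × 240 frames used in the timing experiment. K, d, H and L are hypothetical illustration values, not settings reported in this paper, and constant factors are dropped exactly as in the O(·) expressions, so the numbers indicate relative sizes rather than byte counts.

```python
def memory_entries(method, M, N, K=3, d=1, H=50, L=30):
    """Number of stored values according to the orders derived in this section.

    Constant factors are dropped, as in the O(.) expressions; the defaults
    for K, d, H and L are hypothetical illustration values.
    """
    pixels = M * N
    orders = {
        "GMM": pixels * K * d * d,
        "non-parametric": H * pixels * d * d,
        "MABM": L * pixels,
        "BMIT": pixels,
        "IBBM": pixels,
        "LTABM": pixels,
        "RGABM": pixels,
    }
    return orders[method]


for name in ("BMIT", "RGABM", "MABM", "GMM", "non-parametric"):
    print(name, memory_entries(name, 320, 240))
# BMIT 76800, RGABM 76800, MABM 2304000, GMM 230400, non-parametric 3840000
```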

Time Consumption
In this section, we compute the time consumption of the different background modeling methods. As mentioned earlier, updating and extraction are the two phases of the GMM algorithm. To compute the order of time, we assume the same constant time for each multiplication and division operation and neglect the time consumed by addition and subtraction operations. Then:
1) In (13), the order of time is O(d*d).
2) In (14), the order of time is O(d).
3) In (15), the order of time is O(d*d).
4) In (16), the order of time is O(1).
5) In (17), the order of time is O(d*d).
6) If the model has K kernels, the order of time for the updating phase is O(K*d*d) per pixel; thus, the order of time for each frame is O(M*N*K*d*d).
We need (19) to extract the moving objects from the video sequence. Its order of time is O(K*d*d) per pixel; therefore, the time consumption for each frame is O(M*N*K*d*d).
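As a rough illustration of where this per-pixel cost comes from, here is a minimal sketch of the widely used Stauffer and Grimson online update and classification for one gray-level pixel (d = 1). It is written from the standard published formulation rather than transcribed from equations (13)-(17), and the learning rate, background threshold T, and replacement-component values are hypothetical. Each call makes a small, fixed number of passes over the pixel's K components (plus sorting them), so the work per frame grows as M*N*K here, and with d*d in the multivariate case.

```python
import math

ALPHA = 0.01                    # learning rate alpha (hypothetical value)
MATCH_SIGMAS = 2.5              # a match lies within 2.5 standard deviations
T_BG = 0.7                      # cumulative-weight threshold T (hypothetical value)
INIT_W, INIT_VAR = 0.05, 900.0  # weight/variance of a replacement component (hypothetical)


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def rank(comps):
    # Order components by w / sigma: reliable background modes come first.
    comps.sort(key=lambda c: c[0] / math.sqrt(c[2]), reverse=True)


def update_pixel(x, comps):
    """Update one gray pixel's components [w, mu, var]; return True if x is foreground."""
    rank(comps)
    match = next((c for c in comps if abs(x - c[1]) < MATCH_SIGMAS * math.sqrt(c[2])), None)
    for c in comps:
        # w <- (1 - alpha) * w + alpha * M, with M = 1 only for the matched component.
        c[0] = (1 - ALPHA) * c[0] + (ALPHA if c is match else 0.0)
    if match is None:
        # No component explains x: replace the least probable one.
        comps[-1] = [INIT_W, float(x), INIT_VAR]
    else:
        rho = ALPHA * normal_pdf(x, match[1], match[2])
        match[1] = (1 - rho) * match[1] + rho * x
        match[2] = (1 - rho) * match[2] + rho * (x - match[1]) ** 2
    total = sum(c[0] for c in comps)
    for c in comps:
        c[0] /= total
    # The first B components whose cumulative weight exceeds T form the background.
    rank(comps)
    cumulative, background = 0.0, []
    for c in comps:
        background.append(c)
        cumulative += c[0]
        if cumulative > T_BG:
            break
    return match is None or all(c is not match for c in background)


pixel = [[0.7, 100.0, 25.0], [0.2, 200.0, 25.0], [0.1, 50.0, 400.0]]
print(update_pixel(101.0, pixel))  # False: explained by the dominant mode
print(update_pixel(240.0, pixel))  # True: no component matches
```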
In the non-parametric method, if K is the number of kernels and A is the time needed to compute the membership of a pixel in each kernel, the order of time is O(K*A) for each pixel and O(M*N*K*A) for each frame. BMIT performs no update, so its order of time is constant, O(1). For IBBM, computing the absolute difference frame and applying the update rule takes O(M*N) per frame. LTABM uses its recursive update, which requires two multiplications and one addition per pixel; consequently, its order of time is O(M*N). The MABM algorithm constructs the model by calculating the mean of L frames, so its time consumption is of order O(L*M*N). Finally, the recursive update of RGABM has an order of time of O(M*N).
To evaluate the time consumption of the different background modeling techniques in practice, we applied them to a sample video containing 100 frames, each of size 320 × 240 pixels. The experimental results are provided in Table 2. The results demonstrate that the non-statistical methods are faster than the statistical methods; hence, they are suitable for real-time applications. In contrast, the statistical methods (in particular, the non-parametric method) consume much more time than the non-statistical methods; therefore, they cannot be used for real-time applications. These results confirm those reported in [5,10,16].

Accuracy
The last factor discussed in this paper is accuracy, which is the ability of a method to separate the objects from the background. To compute the accuracy, we define two error parameters. The first parameter, the false negative (FN) rate, is the number of object pixels misclassified as background pixels over the total number of object pixels. The second parameter, the false positive (FP) rate, is the number of background pixels misclassified as object pixels over the total number of background pixels. Noise and illumination changes are the causes of FP errors.
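The two error rates defined above can be computed from a predicted mask and a ground-truth mask. This minimal sketch assumes binary masks stored as flat lists (1 = object, 0 = background); returning 0.0 when a class is absent from the ground truth is an assumption made to avoid division by zero.

```python
def error_rates(predicted, ground_truth):
    """Return (FN rate, FP rate) for binary masks (1 = object, 0 = background).

    FN rate = object pixels labeled background / total object pixels.
    FP rate = background pixels labeled object / total background pixels.
    """
    fn = sum(1 for p, g in zip(predicted, ground_truth) if g == 1 and p == 0)
    fp = sum(1 for p, g in zip(predicted, ground_truth) if g == 0 and p == 1)
    n_obj = sum(1 for g in ground_truth if g == 1)
    n_bg = len(ground_truth) - n_obj
    fn_rate = fn / n_obj if n_obj else 0.0
    fp_rate = fp / n_bg if n_bg else 0.0
    return fn_rate, fp_rate


truth = [0, 0, 1, 1, 1, 1, 0, 0]
predicted = [0, 1, 1, 0, 1, 1, 0, 0]
print(error_rates(predicted, truth))  # (0.25, 0.25)
```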
We used four frames from two video sequences, one indoor and one outdoor, to evaluate the accuracy of each method. The error rates for the indoor video sequence are shown in Table 3. The FN results show that MABM has the lowest error rate, while the accuracy of the GMM, LTABM, and RGABM methods is not acceptable. High FN error rates mean that parts of the objects are not extracted, so the extracted objects cannot be used for object indexing applications. The FP error rates show that GMM and RGABM give the best results, confirming that these methods are relatively robust to noise. The results in Table 3 indicate that BMIT, IBBM, and MABM are more suitable for indoor background modeling than the other approaches evaluated in this research.
Table 3. Comparing accuracy of different background modeling methods

Table 4 presents the accuracy of the methods in modeling the background of the outdoor video. The FN error rates of the BMIT, IBBM, and MABM methods are very high; therefore, the objects extracted by these approaches are incomplete. Consequently, these methods may not be suitable for real-world applications that require completely identified objects. The results in Table 4 indicate that the FN error rates of GMM and RGABM are less than 1%; however, these methods have a high FP error rate. The results indicate that any of the evaluated approaches

Conclusions
Background modeling methods can be classified into two main groups: statistical and non-statistical approaches. Approaches in the former group use probability density estimates for background modeling. The memory requirement and time consumption of these methods are very high; hence, statistical approaches are not suitable for real-time applications. On the other hand, these methods can be suitable for outdoor environments if their parameters are properly tuned, because they operate robustly under noise and sudden changes. Non-statistical methods, in contrast, are very easy to implement, and their memory requirement is very low compared with statistical methods. In addition, their time consumption is rather low; consequently, these methods are suitable for indoor environments and real-time applications.