Predictive Block-Matching Algorithm for Wireless Video Sensor Network Using Neural Network

This paper proposes a back propagation neural network model for predictive block-matching. Predictive block-matching significantly decreases the computational complexity of motion estimation, but the traditional prediction model was proposed 26 years ago; it is straightforward but not accurate enough. The proposed back propagation neural network has 5 inputs, 5 neurons and 1 output. Because of its simplicity, it requires very little computation, which is negligible compared with the existing computational complexity. The test results show 10% to 30% higher prediction accuracy and a PSNR improvement of up to 0.3 dB. These advantages make it a feasible replacement for the current model.


Introduction
A good wireless multimedia sensor network (WMSN) must be energy efficient, because it may operate in harsh environments with little maintenance [1] [2] [3]. However, video encoding requires a massive amount of computing power.
Taking the H.264 standard as an example, processing an HDTV (720p) video signal in real time requires a 3600 GIPS processor with 5570 GB/s memory bandwidth [4] [5], which exceeds the computing power of most desktop processors. Apart from adopting more advanced processors with a higher energy efficiency ratio, which increases the hardware cost, the other option is to use a more efficient encoding algorithm.
For a video encoding system, predictive coding is the key part and requires more calculation than other parts such as entropy coding. Motion estimation plays an important role in predictive coding [6] [7]. Therefore, in the field of video stream transmission for WMSNs, an energy-efficient block-matching algorithm is of high importance. There is a variety of block-matching search algorithms for motion estimation, such as three-step search, four-step search, diamond search, modified diamond-square search and gradient descent search [6] [7].
The prediction technique proposed by Zhang and Zafar [8] [9] uses a simple prediction model to reduce the search window size for block-matching, and therefore significantly reduces the computational complexity of motion estimation with only a small sacrifice in image quality. It has been widely used by many researchers in different block-matching algorithms [10] [11] [12]. The prediction model proposed by Zhang and Zafar is straightforward: it simply assumes that the current block has the same motion vector as the block at the same place in the previous frame, or as a block around it in the same frame. However, in our tests the model did not provide high prediction accuracy. Instead of taking a mean value, we believe the reference motion vectors should carry individual weights. This motivated the use of an artificial neural network (ANN).
An ANN loosely mimics the learning process of a human brain. Instead of relying on complex rules and mathematical routines, ANNs are able to learn the key information patterns within a multidimensional information domain [13] [14]. In this report we aim to use a simple back propagation neural network instead of the traditional prediction model, to improve the prediction accuracy. The results show that the proposed neural network can significantly improve the prediction accuracy and leads to better image quality. Because of the simplicity of the neural network, the required calculation is small.

Motion Estimation
The principle of block-matching motion estimation is to find the movement of each macroblock relative to the previous frame. Obtaining the motion vector is the first step of predictive encoding for video compression. To find the movement information, the current frame is divided into macroblocks of 16 × 16 pixels. Each macroblock is compared with the reference frame within a search window of a certain size, in order to locate the block with the minimum MAD (mean absolute difference) value, as shown in Figure 1. The displacements of the macroblock along the x axis and y axis, in pixels, form the motion vector.
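The matching criterion described above can be sketched in a few lines of Python. This is a minimal, unoptimised illustration and not the implementation used in the tests: frames are assumed to be lists of rows of grey-level values, and the names `mad` and `full_search`, the block origin `(bx, by)` and the ±7 window default are choices made for this example only.

```python
def mad(cur, ref, bx, by, dx, dy, n):
    """Mean absolute difference between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (n * n)


def full_search(cur, ref, bx, by, n=16, w=7):
    """Exhaustively test every displacement within +/-w and return
    (motion vector, MAD) for the best match."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), mad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            # Skip candidates that fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = mad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

With a ±7 window this visits up to 15 × 15 = 225 candidates per block, which is the cost that the faster search patterns below try to avoid.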

Three Step Search
Three-step search (TSS) algorithm was first introduced in 1981 by Koga et al. [15]. It is a classic search algorithm for motion estimation. It is an efficient algorithm which needs only three steps to finish the search in a search window.
Step 1: The step size is set as half of (search window size − 1). Together with the initial point, the SAD values of 8 search points at the distance of step size are compared.
Step 2: The point which has the minimum SAD value is picked as the new initial point, and the step size is halved.
Steps 1 and 2 are repeated until the step size is smaller than 1.
Figure 2 shows the search pattern of TSS algorithm.
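The steps above can be sketched as follows. This is an illustrative reading rather than a reference implementation: it assumes the classic step sequence 4, 2, 1 for a ±7 window, frames stored as lists of rows, and the hypothetical names `sad` and `three_step_search`.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences for the n x n block at (bx, by)
    compared with the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def three_step_search(cur, ref, bx, by, n=16, w=7):
    """Return (motion vector, SAD, number of points checked)."""
    height, width = len(ref), len(ref[0])

    def cost(dx, dy):
        # None marks a candidate outside the search window or the frame.
        if abs(dx) > w or abs(dy) > w:
            return None
        if bx + dx < 0 or by + dy < 0:
            return None
        if bx + dx + n > width or by + dy + n > height:
            return None
        return sad(cur, ref, bx, by, dx, dy, n)

    centre = (0, 0)
    best = cost(0, 0)
    checked = 1
    step = (w + 1) // 2              # assumed: 4 for a +/-7 window
    while step >= 1:
        winner = centre
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue         # the centre was already evaluated
                c = cost(centre[0] + ox, centre[1] + oy)
                if c is None:
                    continue
                checked += 1
                if c < best:
                    best, winner = c, (centre[0] + ox, centre[1] + oy)
        centre = winner              # move to the minimum-SAD point
        step //= 2                   # halve the step size
    return centre, best, checked
```

When every candidate lies inside the frame, the sketch evaluates 1 + 3 × 8 = 25 points per block, compared with 225 for an exhaustive ±7 search.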

Enhanced Modified Orthogonal Search
Orthogonal search (OS) algorithm was introduced by Soongsathitanon et al. [16] in 2005. The search is conducted alternately in the horizontal and vertical directions (Figure 3(a)). It may be considered an improved TSS algorithm that uses fewer search points. However, this algorithm has the same disadvantage as TSS: it is inefficient for small motion estimation.
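Modified orthogonal search (MOS) algorithm was introduced by Metkar et al. [17] in 2010, meant to solve the small motion estimation problem. It adds 8 new search points as a 3 × 3 square around the initial point in the first search step (Figure 3(b)). The added 8 search points improve its performance on small motion estimation. Enhanced modified orthogonal search (EMOS) algorithm was introduced by Pandian et al. [18] in 2011. It replaced the 3 × 3 square by a small diamond search pattern (Figure 3(c)). EMOS further improved the motion estimation performance compared with MOS, especially in reducing search points. The steps of EMOS are as follows.
Step 1: An extra small diamond search pattern around the initial point is added to the original horizontal search points in the first step of the OS algorithm. If the initial point (0, 0) of the search window has the minimum SAD value, the block is assumed to have zero movement and the search is terminated. Otherwise, the point which has the minimum SAD value is selected as the new initial point.
Step 2: The step size is halved if the minimum SAD point is any one of the points in the small diamond pattern. Two new search points at a distance of the step size in the horizontal direction from the initial point are compared together with the initial point itself. The initial point is moved to the winning point.
Step 3: Two points in the vertical direction at a distance of the step size from the initial point are compared with the initial point. The initial point is moved to the winning one.
Step 4: The step size is halved and the search is repeated until the step size is smaller than 1.
The early termination of the algorithm at step 1 helps to reduce the computations significantly while retaining the simplicity and regularity of the EMOS algorithm.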
The number of search points needed to find the motion vector for EMOS as compared with the MOS, OS and TSS algorithms.

Predictive Block-Matching Motion Estimation Scheme
The predictive block-matching motion estimation scheme, namely predictive pattern search (PPS), was introduced in 1991 [8] [9]. PPS has two prediction modes: inter-block prediction and inter-frame prediction. Inter-block prediction assumes that neighboring blocks usually have the same or similar moving direction and distance. Therefore, by referring to a nearby motion vector, the search window for the current block can be reduced, which reduces the number of search points.
Inter-frame prediction assumes that the current block moves in the same or a similar direction, over the same or a similar distance, as the corresponding block in the previous frame. Figure 4 demonstrates the way inter-block and inter-frame prediction work.
Predictive Enhanced Modified Orthogonal Search (PEMOS) performs both inter-block and inter-frame prediction. The estimate of the current motion vector can be calculated using the autoregressive model of [8] [9], in which $\{\alpha_{p,q}\}$ is a set of prediction coefficients and $\theta$ is a causal, semi-causal or noncausal set as defined in [8]. According to [8] and [9], a causal model is used for inter-block prediction.
For inter-frame prediction, any one of the three might be used.
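For orientation, a two-dimensional autoregressive predictor of this kind is commonly written in the form below. The notation is an assumption made here for illustration and is not copied from [8] or [9]: $v(m,n)$ denotes the motion vector of the block at position $(m,n)$ and $\hat{v}(m,n)$ its estimate.

$$
\hat{v}(m,n) = \sum_{(p,q)\,\in\,\theta} \alpha_{p,q}\, v(m-p,\, n-q)
$$

With a causal $\theta$, only blocks that have already been processed in scan order contribute to the sum, which is consistent with its use for inter-block prediction.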
The predictive search scheme has been added to different search algorithms and proved effective. For example, [10] proposed predictive three step search, which adds the predictive block-matching scheme to three step search. The results show that predictive three step search significantly reduces computational complexity, with a negligible decrease in image quality.
Another example is [11], which added the predictive block-matching scheme to EMOS. EMOS is believed to be one of the most efficient block-matching algorithms [6]. According to [11], the proposed algorithm uses Zhang and Zafar's prediction model to estimate the motion vectors, so the first step of EMOS can be skipped, i.e., a smaller search window is used. If the estimated motion vector is very small, an even smaller search pattern is used. As a result, PEMOS can reduce computational complexity by up to 30%, again with a very slight decrease in image quality.
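The general idea of using a predicted vector to shrink the search can be sketched as below. This is a simplified illustration, not the PEMOS pattern of [11]: it simply searches a small square centred on the predicted vector, and the names `predictive_search`, `pred` and the radius `r` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences for the n x n block at (bx, by)
    compared with the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def predictive_search(cur, ref, bx, by, pred, n=16, r=1, w=7):
    """Search only the (2r + 1)^2 candidates centred on the predicted
    vector `pred`, clipped to the +/-w window and to the frame.
    Returns (motion vector, SAD, number of points checked)."""
    height, width = len(ref), len(ref[0])
    best, best_cost, checked = None, None, 0
    for dy in range(pred[1] - r, pred[1] + r + 1):
        for dx in range(pred[0] - r, pred[0] + r + 1):
            if abs(dx) > w or abs(dy) > w:
                continue             # outside the original search window
            if bx + dx < 0 or by + dy < 0:
                continue             # outside the frame
            if bx + dx + n > width or by + dy + n > height:
                continue
            c = sad(cur, ref, bx, by, dx, dy, n)
            checked += 1
            if best_cost is None or c < best_cost:
                best, best_cost = (dx, dy), c
    return best, best_cost, checked
```

With `r=1` at most 9 candidates are evaluated, so the quality of the result depends directly on how close `pred` is to the true motion vector.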

Back Propagation Neural Network
Rumelhart and McClelland published the book Parallel Distributed Processing [19], which discusses the error back propagation algorithm in detail. Today the back propagation neural network has become one of the most widely used artificial neural network models [13]. It is widely used in function approximation, pattern recognition, data mining, system identification, automation technology and so on [14].
Figure 5 shows a typical single hidden layer back propagation neural network. In both the hidden layer and the output layer, each neuron applies the transform function f(x) to the weighted sum of its inputs. In Equations (2) and (3), f(x) can be a sigmoid function such as those in Equations (7) or (8), depending on the specific application.

Proposed Neural Network for Predictive Block-Matching
A back propagation neural network model with 5 inputs, 1 hidden layer containing 5 neurons, and 1 output is used in this report (Figure 6).
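A network of this size can be sketched in plain Python as below. The sketch rests on several assumptions that are not stated in the text: a tanh hidden layer, a linear output, per-sample gradient descent on the squared error, and inputs that are five reference motion-vector components scaled to roughly [-1, 1]. The class name `BPNetwork` and its methods are hypothetical.

```python
import math
import random


class BPNetwork:
    """5-input, 5-hidden-neuron, 1-output network trained by back propagation."""

    def __init__(self, n_in=5, n_hidden=5, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return (hidden activations, output) for one input vector."""
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, out

    def train_step(self, x, target, lr=0.05):
        """One gradient-descent update; returns the error before the update."""
        hidden, out = self.forward(x)
        err = out - target                  # dE/d(out) for E = 0.5 * err^2
        for j, h in enumerate(hidden):
            # Back-propagate through the tanh of hidden neuron j,
            # using the output weight from before this update.
            grad_h = err * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * err * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
            self.b1[j] -= lr * grad_h
        self.b2 -= lr * err
        return 0.5 * err * err
```

A predicted vector component would then be read from `forward(x)[1]`, one call per component, after training on pairs of reference vectors and the vectors actually found by the search.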

Results and Discussion
To evaluate the effectiveness of neural network prediction, three block-matching algorithms were tested: full search, three step search and enhanced modified orthogonal search. We recorded the prediction accuracy of the different prediction schemes, the image quality in terms of peak signal-to-noise ratio (PSNR), and the energy efficiency in terms of the average number of search points required per block.
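For reference, PSNR between an original frame and its motion-compensated reconstruction can be computed as in the sketch below. It assumes 8-bit samples (peak value 255) and frames stored as lists of rows; the function name `psnr` is chosen for this example.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized frames (lists of rows)."""
    count = 0
    squared_error = 0
    for row_o, row_r in zip(original, reconstructed):
        for a, b in zip(row_o, row_r):
            squared_error += (a - b) ** 2
            count += 1
    if squared_error == 0:
        return math.inf              # identical frames
    mse = squared_error / count
    return 10 * math.log10(peak * peak / mse)
```

Because the scale is logarithmic, a difference of a few tenths of a dB corresponds to a change of a few percent in mean squared error.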
In our tests, for the traditional prediction scheme, the mean value of the inter-block and inter-frame predictions is used as the estimate of the motion vector. Several videos were tested using different block-matching search algorithms.
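The traditional scheme described above amounts to the following sketch. Rounding the mean to whole pixels and scoring accuracy as the fraction of exactly matching vectors are assumptions made for this example, and the function names are hypothetical.

```python
def mean_prediction(inter_block_mv, inter_frame_mv):
    """Component-wise mean of two reference motion vectors,
    rounded to whole pixels (rounding is an assumption)."""
    return tuple(round((a + b) / 2)
                 for a, b in zip(inter_block_mv, inter_frame_mv))


def prediction_accuracy(predicted, actual):
    """Fraction of blocks whose predicted vector equals the vector
    found by the search (assumed definition of accuracy)."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)
```

For example, `mean_prediction((2, 4), (4, 0))` gives `(3, 2)`.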
Table 1 shows the prediction accuracy of the proposed back propagation neural network compared with the traditional prediction scheme. Tables 2-7 give the average number of search points needed per block and the average peak signal-to-noise ratio (PSNR), which represents the image quality, for different videos and different search algorithms. Table 3 and Table 6 show that using prediction schemes does have a negative effect on image quality, causing a drop in average PSNR. Videos with larger movements such as "Bus" and "Stefan" appear to be more affected than videos with smaller movements such as "Akiyo" and "Claire". However, the average number of search points decreases to around a quarter when prediction schemes are used.
Compared with the traditional prediction scheme, neural network prediction provides slightly higher image quality because of its higher prediction accuracy, as shown in Table 1.
Using predictive block-matching schemes with three step search and EMOS shows similar results, as illustrated in Tables 4-7. Orthogonal search (OS) can be seen as an improved version of TSS, and EMOS is an improved OS. EMOS is already one of the most efficient block-matching algorithms. However, using predictive schemes is still an effective way to further improve its efficiency. Here again, using a neural network to predict the motion vector is more accurate than the traditional approach, and leads to higher image quality without adding much computational complexity.

Future Work
As the proposed back propagation neural network requires little calculation, it could be implemented on a real-time system in the future. The simplicity of the neural network model makes it straightforward to implement on either FPGA systems or embedded systems with general-purpose processors. Its performance in practice remains to be measured and compared with the MATLAB simulation results.

Conclusion
This report proposed a predictive model for block-matching based on a neural network. The model is a back propagation neural network with 5 neurons, aiming to improve the prediction accuracy of predictive block-matching schemes at little computational cost. The results confirm that the proposed model is effective. Firstly, compared with the traditional predictive scheme, the neural network shows a great advantage in prediction accuracy, especially when the video has large movements.
In addition, we did three sets of tests, respectively: without using prediction,

Figure 5. An example of back propagation neural network.

Table 1. Prediction accuracy comparison between neural network and traditional method.

Table 3. Average search points per block with full search (search window ±7).

Table 5. Average search points per block with three step search (search window ±7).

Table 7. Average search points per block with EMOS (search window ±7).

Table 8 illustrates the computational complexity of the neural network for each block. Take EMOS as an example, and assume that the number of required search points per block is 7. The required numerical operations then include 3584 additions or subtractions and 1792 absolute values for each block. The additional computational complexity of the neural network is negligible compared with this existing calculation. Considering the improvement in prediction accuracy and image quality, it is very feasible to use a neural network for motion vector prediction.
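The operation counts quoted above follow from the 16 × 16 block size, as the sketch below shows. It assumes one subtraction and one accumulation per pixel for each SAD evaluation. The multiplication count for the network is that of a generic fully connected 5-5-1 layout and is given only for scale; it is not taken from Table 8.

```python
def sad_operations(search_points, block_size=16):
    """Return (add/sub count, absolute-value count) for evaluating SAD
    at the given number of search points."""
    pixels = block_size * block_size
    # One subtraction and one accumulation per pixel (assumed).
    add_sub = 2 * pixels * search_points
    abs_ops = pixels * search_points
    return add_sub, abs_ops


def network_multiplications(n_in=5, n_hidden=5, n_out=1):
    """Multiplications in one forward pass of a fully connected network."""
    return n_in * n_hidden + n_hidden * n_out
```

`sad_operations(7)` returns `(3584, 1792)`, matching the figures in the text, while `network_multiplications()` returns 30.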

Table 8. Computational complexity of back propagation neural network.