Evaluation of Impairment Caused by MPEG Video Frame Loss

This article presents a study on the impact of video frame losses on the quality perceived by users. Video compression standards, such as MPEG, organize frames in a sequence called Group of Pictures (GOP), in which a frame is expressed in terms of one or more neighboring frames. This dependence between frames directly affects quality, because the loss of a reference frame prevents the decoding of other frames in the GOP, thereby reducing the user-perceived quality. In this article, quality is estimated by the Peak Signal to Noise Ratio (PSNR), which compares the original and the received images. Computer simulations were used to show that the quality degradation may vary for different GOP patterns and types of lost frames.


Introduction
Multimedia transmission systems represent a significant portion of the use of current telecommunication systems. The evolution of software and hardware technologies for data transmission has enabled the improvement of new multimedia services such as IPTV (Internet Protocol Television) and VOD (Video on Demand) [1] [2]. These advances and emerging technologies have been gaining a great deal of space in the development of services and applications. Among these services, video streaming stands out for its high bandwidth consumption [3] [4]. Services such as IPTV and VOD have become very popular in recent years, generating an enormous amount of data, with video streaming being the most popular one [5], running on devices such as smartphones, desktops, wireless computers, and tablets [6]. With the increase in the data being transmitted, current systems have to be upgraded to guarantee the quality of the service. The user-perceived quality of video streaming applications is very sensitive to delay, loss, and throughput. Quality of Service (QoS) involves the totality of characteristics of a telecommunication service that bear on its ability to satisfy stated and implied needs of the user of the service [7]. The most important QoS parameters are delay, packet loss probability, jitter, and throughput [8].
Telecommunications companies have focused on user-perceived quality, mainly because customer experience has become one of the most important factors in a competitive market environment [9]. The analysis of image quality during data transmission can help to estimate the user-perceived quality of video streaming. A better understanding of the effects of frame loss on user-perceived quality can be used to improve the network configuration [10].
Mean Opinion Score (MOS) is a subjective measure of user-perceived quality that gives a numerical indication of the quality of the media received, where 1 is the worst and 5 the best possible quality. However, subjective methods like MOS have high costs, since the requirements to implement the test environment are expensive [11]. On the other hand, objective methods use tools and statistical approaches to evaluate quality. The most used objective methods are Peak Signal to Noise Ratio (PSNR), Structural Similarity (SSIM) and Video Quality Metric (VQM) [12].
MPEG-4 is a standard video coding format that uses sub-sampling, spatial compression, and temporal compression. Sub-sampling discards color information that is not noticeable to the human eye. Spatial compression exploits redundant information within a frame. Temporal compression compares the changes between frames of a GOP and stores only the data that represents those changes [13] [14] [15].
A GOP is composed of three types of frames: the I-frame (Intra) is encoded without any reference to other frames and uses only spatial compression; the P-frame (Predictive) is encoded using the previous I-frame or P-frame as a base; and the B-frame (Bi-directional) uses information from an earlier I- or P-frame together with a later I- or P-frame as references for its encoding [16] [17].
The GOP always starts with an I-frame, followed by P- and B-frames, as shown in Figure 1. To represent the GOP sequence of frames, it is common to use the notation (M, N), where M represents the number of frames per GOP and N is the number of consecutive B-frames. Since the I-frame is used as the base for the other frames, an impairment in an I-frame is propagated to the subsequent frames of the GOP [18]. This paper shows the impact of frame loss on video streaming quality, using different GOP settings for video encoded using MPEG-4 Part 10 [19]. Performance evaluation was carried out using computational simulations, with quality evaluated by PSNR.
Figure 1. GOP sequence of frames.
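The (M, N) notation and the propagation of an impairment through the GOP can be illustrated with a minimal sketch. The function names `gop_pattern` and `undecodable_after_loss` are hypothetical, and the dependency model is a simplifying assumption: each run of N B-frames is closed by a P-frame, every P-frame depends on the previous I- or P-frame, every B-frame depends on its two neighboring reference frames, and trailing B-frames use the I-frame of the next GOP, which is assumed to be intact.

```python
from typing import List, Set


def gop_pattern(m: int, n: int) -> List[str]:
    """Frame types of one GOP (M, N) in display order (assumed layout)."""
    if m < 1 or n < 0:
        raise ValueError("m must be >= 1 and n must be >= 0")
    frames = ["I"]  # a GOP always starts with an I-frame
    while len(frames) < m:
        # Up to n B-frames, then one P-frame, never exceeding m frames.
        for _ in range(n):
            if len(frames) < m:
                frames.append("B")
        if len(frames) < m:
            frames.append("P")
    return frames


def undecodable_after_loss(pattern: List[str], lost: int) -> Set[int]:
    """Indices of frames that cannot be decoded correctly if frame `lost` is lost."""
    refs = [i for i, t in enumerate(pattern) if t in ("I", "P")]
    broken = {lost}
    if pattern[lost] == "B":
        return broken  # no frame references a B-frame in this model
    # Every later reference frame chains back to the lost one.
    broken.update(i for i in refs if i > lost)
    # A B-frame is affected if either of its neighboring references is broken.
    for i, frame_type in enumerate(pattern):
        if frame_type != "B":
            continue
        prev_ref = max(r for r in refs if r < i)
        next_refs = [r for r in refs if r > i]
        if prev_ref in broken or (next_refs and next_refs[0] in broken):
            broken.add(i)
    return broken


pattern = gop_pattern(12, 2)
print("".join(pattern))
print(sorted(undecodable_after_loss(pattern, 0)))  # loss of the I-frame
print(sorted(undecodable_after_loss(pattern, 3)))  # loss of the first P-frame
print(sorted(undecodable_after_loss(pattern, 9)))  # loss of the last P-frame
```

Under these assumptions, losing the I-frame affects the whole GOP, while losing a P-frame affects only the frames that depend on it directly or through later P-frames.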
Besides this introductory section, this paper is organized as follows: Section 2 presents the related works; Section 3 describes quality evaluation using PSNR; Section 4 explains the methodology, the parameters, and the software tools used; Section 5 presents the results and discussion; and Section 6 gives a short conclusion and an outlook on future work.

Related Works
The problems caused by packet loss in video playback are called artifacts [14] [20]. The main artifacts are the slice error and the blocking (or pixelization) error. A study on the quality degradation due to the loss of I-frames is presented in [21], with the evaluation done using a small resolution format known as Quarter Common Intermediate Format (QCIF). The metric used was PSNR, but only one GOP configuration was used, and it was not reported in the paper.
The quality degradation caused by the loss of P-frames is analyzed in [22], with video quality estimated using VQM. The authors do not consider the effects of the loss of I- or B-frames. SSIM is used in [23] to indicate the loss of full I-, P-, and B-frames; however, that work does not show the impact of the loss of a specific frame in the GOP.
A study of quality degradation is presented in [24] using the videos Foreman, Akiyo, Coastguard, Football and Tennis, encoded with a GOP (15.2) and MPEG-2. The tests were performed using IPTV/VOD configurations and random destinations on the Internet. The results show that videos with high motion patterns are the most affected in quality by packet loss.
A method to set an optimal GOP configuration that maximizes encoding efficiency and improves the quality of video streaming is proposed in [25]. The results show that a larger GOP length results in better user-perceived quality. The number of B-frames between two reference frames (I- or P-frames) was investigated in [26]. According to the results, the number of B-frames in a GOP should be between 1 and 4 to improve quality, while [27] states that this number should vary from 0 to 2.
PSNR is used in [28] to estimate video quality. Different Variable Bit Rate (VBR) and Constant Bit Rate (CBR) settings were used to analyze the quality degradation caused by frame loss. That research also uses H.264 video compression. However, it does not delve into the issues of video characteristics and GOP structure.
None of the previous works studied the effects of frame loss using different GOP configurations and the loss of specific frames in the GOP. Another important aspect is the fact that an I-frame is sent using several packets. The loss of a packet at the beginning or at the end of the frame results in different impairments to the video quality.
Thus, a lack of analyses and tests was identified regarding different GOP configurations, modern video codecs, and the loss of specific frames, which would allow a better understanding of the impact of frame loss on user-perceived quality.

Peak Signal to Noise Ratio
PSNR is an objective method for quality evaluation that uses the ratio between the maximum possible value of the signal and the power of the corrupting noise that affects the quality of the received image. This method belongs to a category called Full Reference (FR), which indicates that both the original and the received images are available for evaluation [29] [30].
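The PSNR uses the Mean Square Error (MSE), evaluated as:

$$\mathrm{MSE} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left[ f(x, y) - g(x, y) \right]^2$$

where $M$ and $N$ represent the width and the height of the frame, respectively, $x$ and $y$ are the horizontal and vertical coordinates, $f(x, y)$ is the original frame and $g(x, y)$ is the received frame. The PSNR is obtained from:

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{L_{\max}^2}{\mathrm{MSE}} \right)$$

where $L_{\max} = 2^n - 1$ is the largest value that a pixel can have and $n$ is the number of bits per pixel.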
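A minimal sketch of the PSNR computation, assuming the standard definitions: the MSE is the mean of the squared pixel differences between the original and the received frame, and the PSNR is 10 log10(L_max^2 / MSE). Frames are represented here as lists of rows of luminance values, which is an assumption made for illustration only. The function names are hypothetical and this is not the EvalVid implementation.

```python
import math
from typing import Sequence

Frame = Sequence[Sequence[int]]  # rows of pixel values (assumed layout)


def mse(original: Frame, received: Frame) -> float:
    """Mean Square Error between two frames of identical dimensions."""
    height = len(original)
    width = len(original[0])
    same_shape = len(received) == height and all(
        len(row) == width for row in list(original) + list(received)
    )
    if not same_shape:
        raise ValueError("frames must have the same dimensions")
    total = sum(
        (f - g) ** 2
        for row_f, row_g in zip(original, received)
        for f, g in zip(row_f, row_g)
    )
    return total / (width * height)


def psnr(original: Frame, received: Frame, bits_per_pixel: int = 8) -> float:
    """PSNR in dB; returns infinity when the two frames are identical."""
    l_max = 2 ** bits_per_pixel - 1  # largest value a pixel can have
    error = mse(original, received)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(l_max ** 2 / error)


original = [[10, 20], [30, 40]]
received = [[11, 21], [31, 41]]  # every pixel is off by one
print(round(psnr(original, received), 2))
```

For a video, the per-frame PSNR values are typically averaged over the sequence, which is how an average PSNR per test can be obtained.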
The mapping between PSNR, SSIM and MOS is shown in Table 1.
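As an illustration of how such a mapping can be applied, the sketch below converts a PSNR value into a MOS grade. The thresholds are an assumption: they follow the PSNR-to-MOS conversion commonly cited in the EvalVid literature and may differ from the values in Table 1. The handling of values that fall exactly on a boundary is also an assumption, and the SSIM column is not covered.

```python
def psnr_to_mos(psnr_db: float) -> int:
    """Map a PSNR value in dB to a MOS grade from 1 (worst) to 5 (best).

    Thresholds are assumed, not taken from Table 1.
    """
    if psnr_db > 37:
        return 5  # excellent
    if psnr_db > 31:
        return 4  # good
    if psnr_db > 25:
        return 3  # fair
    if psnr_db > 20:
        return 2  # poor
    return 1      # bad


for value in (40.0, 33.0, 28.0, 22.0, 15.0):
    print(value, psnr_to_mos(value))
```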

Methodology
PSNR evaluation was done using EvalVid [32], an open source tool developed at the Technical University of Berlin. The video was encoded using ffmpeg [33], an open source multimedia framework able to encode, decode, transcode, stream, and play video using the MPEG standards.
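The exact encoder settings are not listed here, so the following is only a sketch of how an ffmpeg command line for a given GOP length and number of B-frames could be assembled. The helper name `ffmpeg_encode_args`, the file names, and the choice of the `libx264` encoder are assumptions. The options `-g` (GOP size), `-bf` (maximum number of consecutive B-frames), `-c:v` (video encoder) and `-r` (frame rate) are standard ffmpeg options.

```python
from typing import List


def ffmpeg_encode_args(src: str, dst: str, gop_length: int,
                       b_frames: int, fps: int = 24) -> List[str]:
    """Build an ffmpeg argument list for an H.264 encode with a given GOP setup."""
    if gop_length < 1 or b_frames < 0:
        raise ValueError("gop_length must be >= 1 and b_frames must be >= 0")
    return [
        "ffmpeg",
        "-i", src,                # input video (hypothetical file name)
        "-c:v", "libx264",        # assumed H.264 encoder
        "-g", str(gop_length),    # GOP size (keyframe interval)
        "-bf", str(b_frames),     # consecutive B-frames between references
        "-r", str(fps),           # output frame rate
        dst,
    ]


print(" ".join(ffmpeg_encode_args("akiyo_cif.yuv", "akiyo_gop12.mp4", 12, 2)))
```

Note that libx264 treats `-g` as a maximum keyframe interval, so the encoder may insert additional I-frames (for example at scene changes), and the resulting GOP structure should be checked rather than assumed. Raw YUV input would also need its resolution and pixel format specified, which is omitted here.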

Video Test Sequences
The video test sequences used in the simulations are openly available and are part of a library used for research and projects related to video transmission and encoding. The videos used were Coastguard, Football, Akiyo and BlueSky.
The identification of each frame in the GOP is important for analyzing the impairment caused by frame loss. In this paper, a notation was used to identify each frame in the GOP. Twelve different models of frame loss were studied for each GOP. Table 3 presents the test setup.
Models M2 and M3 consider the loss of the first 50% and the last 50%, respectively, of the I-frame packets. This approach allows the investigation of the impairments caused by a burst of losses in the I-frame. Modern network packet loss models indicate that loss is not random, but occurs in bursts [36]. The frame loss occurs sequentially during the transmission of all the videos.
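The burst-loss models applied to the I-frame can be sketched as follows. The sketch assumes that a frame is split into packets carrying up to 1460 bytes each (the MTU value used in this study, treated here as payload size for simplicity), that model M1 drops all packets of the I-frame, and that for an odd number of packets the middle packet survives in both M2 and M3. The function names are hypothetical.

```python
import math
from typing import List

MTU_BYTES = 1460  # packet size used in the study, treated here as payload size


def packet_count(frame_size_bytes: int, mtu: int = MTU_BYTES) -> int:
    """Number of packets needed to carry one frame."""
    return math.ceil(frame_size_bytes / mtu)


def lost_packets(frame_size_bytes: int, model: str,
                 mtu: int = MTU_BYTES) -> List[int]:
    """Indices of the I-frame packets dropped by loss model M1, M2 or M3."""
    total = packet_count(frame_size_bytes, mtu)
    half = total // 2  # assumed rounding for an odd number of packets
    if model == "M1":
        return list(range(total))                 # whole I-frame lost
    if model == "M2":
        return list(range(half))                  # first 50% of the packets
    if model == "M3":
        return list(range(total - half, total))   # last 50% of the packets
    raise ValueError(f"unsupported loss model: {model}")


i_frame_size = 14600  # hypothetical I-frame size in bytes (10 packets)
for model in ("M1", "M2", "M3"):
    print(model, lost_packets(i_frame_size, model))
```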
The video quality was measured using PSNR, with the video encoded with the H.264/AVC codec and processed by a selective frame loss generator. Initially, the original video was encoded in MPEG-4 and submitted to the frame loss generator according to Table 3; finally, the resulting video was compared with the original using PSNR.

Results and Discussion
The results are presented in three phases: first, a comparison of the number of frames generated for each test setup; second, an analysis of the frame sizes; and finally, the resulting PSNR for each test.
Table 4 presents the number of I-, P-, and B-frames of all video test sequences.
Figure 2 shows the maximum possible value of PSNR for each video analyzed.
It is possible to notice that different GOP configurations lead to different maximum PSNR values. For example, the maximum PSNR of Akiyo varies slightly as the GOP length increases, while the other CIF videos keep a constant maximum PSNR. The video BlueSky, on the other hand, shows a lower maximum PSNR as the GOP length increases. The variation of the GOP length leads to a variation in the number of frames generated. Figure 3 shows the relationship between the total size in Mbytes (represented by bars) and the average size in Kbytes (represented by lines) for all video test sequences and GOPs used. In videos with a low motion pattern, like Coastguard and especially Akiyo, the spatial compression is higher, resulting in bigger I-frames. The size of I-frames depends on the quantization matrix used in the spatial compression. For the video Akiyo, the total size of the I-frames is much bigger than that of the P-frames and B-frames for all GOP options.
The average size of the frames presents the same behavior for all GOPs studied.
The quality evaluation per video and frame loss model is shown in Figure 4. It is possible to notice that the frame loss models M1, M2 and M3, which are related to the loss of the I-frame, result in smaller PSNR values than the other models, which involve the loss of B- and P-frames. Comparing the PSNR of models M1 and M2, the loss of the first half of the packets of the I-frame has the same result as losing the entire frame. In model M3, the loss of the second half of the I-frame results in severe impairment, but does not prevent the video decoding as the M1 and M2 loss configurations do. In the M4 setup, the total loss of P-frame packets impairs the quality in different ways depending on the GOP configuration, with the exception of the video Akiyo. However, the worst PSNR was observed with GOP (9.2) or (12.2). For the same GOP configurations, the video Coastguard presents a low PSNR. In the videos Football and BlueSky, the impairments caused by the loss of P-frames increase for bigger GOP lengths. Still, the videos Coastguard and BlueSky present nearly the same average PSNR with the loss of all packets of P-frames as with the loss of the second half of the I-frame packets. However, for the video Football, the results showed that the loss of all P-frames leads to a lower average PSNR than the loss of the second half of the I-frame packets.
Considering the loss of P-frames in a GOP, in all cases the loss of the first P-frame results in more severe impairment than the loss of the last P-frame. This happens because P-frames are encoded using a previous I- or P-frame. Thus, the loss of the first P-frame prevents the decoding of the next P-frames.
For the other loss models, the increase of the GOP length results in a lower average PSNR. The effects of spatial compression in combination with the GOP setting will be the object of future research.

Conclusions
The search for network systems that lead to quality improvement in video streaming is an important area of research. The loss of different types of frames impairs the user-perceived quality in distinct ways. From the simulations, it was possible to identify the behavior of the average PSNR decrease caused by frame loss. The loss of an I-frame results in worse video quality than the loss of P- or B-frames. The loss of the second half of the I-frame packets leads to a better average PSNR than the loss of the first half. Other results showed that the loss of P-frames results in a worse average PSNR than the loss of B-frames, but the impairment level depends on the position of the P- or B-frame in the GOP sequence.
The motion pattern also plays a fundamental role in the decrease of user-perceived quality resulting from frame loss. If temporal compression is more efficient, there are more packets of I-frames than of other frame types, and the loss of an I-frame impairs the user-perceived quality differently. In videos with a high motion pattern, with more packets carrying P- and B-frames than I-frames, the loss of P- or B-frames leads, depending on the GOP length, to a bigger average PSNR decrease than in low motion pattern videos.

In the MSE definition, the original frame is represented by $f(x, y)$ and the received frame by $g(x, y)$.

Figure 3. Total and average size for each frame type.

Table 1. Mapping between PSNR, SSIM and MOS [31].

[...] at a rate of 24 fps. In all cases, a Maximum Transfer Unit (MTU) of 1460 bytes was used. Table 2 summarizes the main characteristics of the video test sequences used.

Table 2. Main characteristics of CIF and HD videos used for simulation and testing.

Table 3. Test setup for frame loss in each GOP.

Table 4. Number of I-frames, P-frames, and B-frames for each GOP configuration.
Figure 2. Maximum values of PSNR.