A Perceptual Video Coding Based on JND Model

In view of the fact that the current High Efficiency Video Coding (HEVC) standard does not consider the characteristics of human vision, this paper proposes a perceptual video coding algorithm based on the just noticeable distortion (JND) model. The adjusted JND model is integrated into the transform and quantization process of HEVC to remove more visual redundancy while maintaining compatibility. First, we design JND models in the pixel domain and the transform domain respectively: the pixel-domain model gives the JND threshold directly at each pixel, while the transform-domain model introduces the contrast sensitivity function, making the threshold estimation more precise. Second, the proposed JND model is embedded in the HEVC coding framework. For the transform skip mode (TSM) in HEVC, we adopt the existing pixel-domain nonlinear additivity model for masking (NAMM); for the non-transform-skip mode (non-TSM), we use the transform-domain JND model to further reduce visual redundancy. Simulation results show that, at the same subjective visual quality, the proposed algorithm saves more bitrate.


Introduction
Nowadays, high-definition video is becoming more and more popular. However, the growth of storage capacity and network bandwidth cannot keep up with the demands that high-resolution video places on storage and transmission. Therefore, ITU-T and ISO/IEC jointly released a new generation of video coding standard, High Efficiency Video Coding (HEVC) [1]. HEVC still follows the traditional hybrid coding framework and exploits statistical correlation to remove spatial and temporal redundancy in order to achieve the highest possible compression. However, as the ultimate receiver of video, the human visual system (HVS) [2] leaves some visual redundancy untouched because of its own characteristics. To capture this perceptual redundancy, researchers have done a great deal of work, of which the most widely accepted result is the just noticeable distortion (JND) model. Video coding based on just noticeable distortion mainly exploits the visual masking mechanisms of the human eye: when the distortion is smaller than the sensitivity threshold of the human eye, it is imperceptible [3].
In recent years, the JND model has received wide attention in video and image coding [4] [5], digital watermarking [6], image quality assessment [7], and so on. The JND models proposed to date fall into two categories: models based on the pixel domain and models based on the transform domain.
A pixel-domain JND model usually considers two main factors: luminance adaptation masking and the contrast masking effect. C. H. Chou and Y. C. Li [8] proposed the first pixel-domain JND model, taking the larger of the computed luminance adaptation value and contrast masking value as the final JND threshold. Yang et al. [9] proposed the classical nonlinear additivity model for masking (NAMM), in which the two masking effects are added together to obtain the JND value; to some extent, this accounts for the interaction between the two effects. To address the lack of precision in the contrast masking term of the above methods, Liu et al. [10] assigned different weights to texture and edge regions through texture decomposition on the basis of the NAMM model, which improved the accuracy of the JND estimate.
Wu et al. [11] proposed a JND model based on luminance adaptation and structural similarity, which further considers the sensitivity of the human eye to regular and irregular regions when computing texture masking.
A transform-domain JND model can easily incorporate the contrast sensitivity function (CSF) with high accuracy. Since most image coding standards adopt the DCT, JND models based on the DCT domain have attracted much attention from researchers. Ahumada et al. [12] obtained a JND model for grayscale images by calculating the spatial CSF. Based on this, Watson [13] proposed the DCTune method, further considering luminance adaptation and contrast masking. Zhang et al. [14] made the JND model more accurate by adding a luminance adaptation factor and a contrast masking factor. Wei et al. [15] introduced gamma correction into the JND model and proposed a still more accurate video image JND model.

Nonlinear Additively Masking Model
The NAMM model estimates the pixel-domain JND threshold by modeling two effects in the pixel domain: luminance adaptation and texture masking.
The JND estimate in the pixel domain can be written as the nonlinear additivity of luminance adaptation and texture masking, as shown in Equation (1):

T_JND(x, y) = T_l(x, y) + T_t(x, y) − C_lt · min{T_l(x, y), T_t(x, y)}    (1)

where T_l(x, y) is the luminance adaptation threshold, T_t(x, y) is the texture masking threshold, and C_lt (0 ≤ C_lt ≤ 1) gauges the overlap between the two effects: the larger C_lt is, the stronger the superposition between the adaptive background luminance and texture masking. When C_lt is 1, the superposition effect between the two factors is greatest; when C_lt is 0, there is no superposition effect between the two. In practice the superposition lies between these extremes, and C_lt is set to 0.3. T_l(x, y) can be determined according to the visual threshold curve in Figure 1:

T_l(x, y) = 17 · (1 − sqrt(bg(x, y)/127)) + 3,   if bg(x, y) ≤ 127
T_l(x, y) = (3/128) · (bg(x, y) − 127) + 3,      otherwise            (2)

where bg(x, y) is the average background luminance around the pixel at (x, y). Due to the characteristics of the HVS itself, distortion occurring in plain and edge areas is more noticeable than in texture areas. To estimate the JND threshold more accurately, edge and non-edge regions must be distinguished. Therefore, taking edge information into account, the texture masking threshold T_t(x, y) is calculated as

T_t(x, y) = β · G(x, y) · W_θ(x, y)    (3)

where β is a control parameter whose value is set to 0.117, G(x, y) is the maximal weighted average of gradients around the pixel at (x, y), and W_θ(x, y) is an edge-related weight:

G(x, y) = max_{k=1,...,4} |grad_k(x, y)|    (4)

grad_k(x, y) = (1/16) · Σ_{i=1..5} Σ_{j=1..5} p(x − 3 + i, y − 3 + j) · g_k(i, j)    (5)

where g_k(i, j) are four directional high-pass filters for texture detection, as shown in Figure 2.
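The pixel-domain computation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names and the toy inputs are ours, and the luminance curve follows the classical piecewise form plotted in Figure 1.

```python
import numpy as np

C_LT = 0.3  # overlap gauge between the two masking effects (Equation (1))

def luminance_threshold(bg):
    """Luminance adaptation threshold T_l for an average background
    luminance bg in [0, 255], per the visual threshold curve of Figure 1."""
    bg = np.asarray(bg, dtype=np.float64)
    dark = 17.0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0  # bg <= 127
    bright = 3.0 / 128.0 * (bg - 127.0) + 3.0        # bg > 127
    return np.where(bg <= 127, dark, bright)

def namm_jnd(t_l, t_t):
    """Nonlinear additivity: sum the two masking thresholds and subtract
    C_LT * min(t_l, t_t) to discount their overlap."""
    return t_l + t_t - C_LT * np.minimum(t_l, t_t)

# Toy usage: a flat mid-grey block with an assumed texture masking value.
t_l = luminance_threshold(np.full((4, 4), 127.0))  # 3.0 everywhere
t_t = np.full((4, 4), 5.0)
jnd = namm_jnd(t_l, t_t)                           # 3 + 5 - 0.3 * 3 = 7.1
```

Because min(t_l, t_t) is subtracted with weight C_lt = 0.3, the combined threshold always stays below the plain sum, matching the partial superposition described above.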

Improved JND Model Based on DCT Domain
A typical DCT-domain JND model is expressed as the product of a base threshold and several modulation factors. Let t be the frame index in the video sequence, n the block index in the t-th frame, and (i, j) the DCT coefficient index. The corresponding JND threshold can then be expressed as

T_JND(n, i, j, t) = T_basic(n, i, j, t) · a_Lum(n, t) · a_Contrast(n, i, j, t)    (6)

where T_basic(n, i, j, t) is the spatial-temporal base distortion threshold, calculated from the spatial-temporal contrast sensitivity function; a_Lum(n, t) is the luminance adaptive factor; and a_Contrast(n, i, j, t) is the contrast masking factor.
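In code, Equation (6) is just a pointwise product of the three terms; a one-function sketch (names ours) that the following subsections fill in:

```python
def dct_jnd_threshold(t_basic, a_lum, a_contrast):
    """DCT-domain JND (Equation (6)): the base distortion threshold
    modulated by the luminance and contrast masking factors."""
    return t_basic * a_lum * a_contrast
```

With NumPy arrays holding per-subband values, the same expression computes the whole 8×8 threshold map at once.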

Spatial-Temporal Contrast Sensitivity Function
Psychophysical experiments show that the visual sensitivity of the human eye depends on the spatial and temporal frequency of the input signal. The contrast sensitivity function is usually used to quantify this relationship; it is defined as the inverse of the contrast at which distortion first becomes perceptible to the human eye. The spatial-temporal contrast sensitivity function curve is shown in Figure 3. For the (i, j)-th coefficient of the n-th DCT block in the t-th frame, the corresponding CSF can be written as

S(n, i, j, t) = (k1 + k2 · |log10(ν(n, t)/3)|^3) · ν(n, t) · (c0 · 2π · ρ_{i,j})^2 · exp(−c1 · 4π · ρ_{i,j} · (ν(n, t) + 2)/k3)    (7)

where ν(n, t) depicts the associated retinal image velocity; the empirical constants k1, k2 and k3 are set to 6.1, 7.3 and 23.0; c0 and c1 control the magnitude and the bandwidth of the CSF curve; and ρ_{i,j} is the spatial subband frequency:

ρ_{i,j} = (1/(2N)) · sqrt((i/ϖ_x)^2 + (j/ϖ_y)^2)    (8)

where N is the DCT block size, and ϖ_x and ϖ_y are the horizontal and vertical sizes of a pixel in degrees of visual angle, respectively. They are related to the viewing distance l and the display width Λ of a pixel on the monitor as follows:

ϖ_x = 2 · arctan(Λ_x/(2l)),   ϖ_y = 2 · arctan(Λ_y/(2l))    (9)
When Equation (7) is used to predict the distortion threshold due to the spatial-temporal CSF, several factors need to be considered: 1) the sensitivity modeled by Equation (7) is the inverse of the distortion threshold; 2) the CSF threshold, expressed in luminance, needs to be scaled into gray levels for digital images; 3) since Equation (7) comes from experimental data on one-dimensional spatial frequency, the threshold for an arbitrary sub-band is actually higher than the one given by Equation (7), so a compensating term needs to be introduced for each DCT sub-band. With all of the above considerations, the base threshold for a DCT sub-band is determined as

T_basic(n, i, j, t) = (M/(L_max − L_min)) · (1/(φ_i · φ_j)) · (1/S(n, i, j, t)) · 1/(r + (1 − r) · cos^2 θ_{i,j})    (10)

where L_max and L_min represent the display luminance values corresponding to the maximum and minimum gray levels, respectively; M is the number of gray levels, generally 256; φ_i and φ_j are the DCT normalization factors; θ_{i,j} accounts for the effect of an arbitrary sub-band direction; and r is set to 0.6.
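The viewing-geometry part of the base threshold (Equations (8) and (9)) can be sketched as follows; the 8×8 block size, the pixel pitch and the viewing distance are illustrative assumptions of ours, not values from the paper.

```python
import math

N = 8  # DCT block size (assumed)

def pixel_visual_angle(pixel_width, viewing_distance):
    """Visual angle (degrees) subtended by one pixel of width Lambda seen
    from distance l: 2 * arctan(Lambda / (2 * l)), per Equation (9)."""
    return math.degrees(2.0 * math.atan(pixel_width / (2.0 * viewing_distance)))

def subband_frequency(i, j, w_x, w_y, n=N):
    """Spatial frequency rho_{i,j} (cycles/degree) of DCT subband (i, j),
    per Equation (8)."""
    return (1.0 / (2.0 * n)) * math.sqrt((i / w_x) ** 2 + (j / w_y) ** 2)

# Toy usage: a 0.5 mm pixel viewed from 500 mm.
w = pixel_visual_angle(0.5, 500.0)   # roughly 0.057 degrees per pixel
rho = subband_frequency(3, 4, w, w)  # frequency of subband (3, 4)
```

Higher subband indices and smaller pixel angles both raise ρ, which pushes the CSF of Equation (7) toward its low-sensitivity tail and hence raises the base threshold.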

Luminance Adaptive Factor and Contrast Masking Factor
The luminance masking mechanism is related to brightness changes in the image. According to the Weber-Fechner law, the minimum perceptible luminance difference of the human eye shows a higher threshold in regions with very bright or very dark backgrounds, which is called the luminance adaptation effect. The luminance adaptive factor is calculated as

a_Lum(n, t) = (60 − Ī)/150 + 1,    Ī ≤ 60
a_Lum(n, t) = 1,                   60 < Ī < 170
a_Lum(n, t) = (Ī − 170)/425 + 1,   Ī ≥ 170      (11)

where Ī represents the average brightness of the block.
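A sketch of the luminance adaptive factor follows. Since Equation (11) here is reproduced from the standard DCT-domain JND formulation, the breakpoints (60, 170) and the slopes should be read as that model's values rather than as newly derived constants.

```python
def luminance_factor(avg_intensity):
    """Luminance adaptive factor a_Lum (Equation (11)): the factor, and
    hence the JND threshold, rises in very dark or very bright regions."""
    if avg_intensity <= 60:
        return (60.0 - avg_intensity) / 150.0 + 1.0
    if avg_intensity >= 170:
        return (avg_intensity - 170.0) / 425.0 + 1.0
    return 1.0  # mid-grey background: no extra masking

# a_Lum is 1 for mid-grey blocks and grows toward the extremes.
factors = [luminance_factor(v) for v in (0, 128, 255)]
```

Note the asymmetry: the dark-side slope (1/150) is steeper than the bright-side slope (1/425), reflecting the eye's poorer sensitivity in dark regions.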
The contrast masking effect is an important perceptual property of the HVS, usually referring to the reduced visibility of one signal in the presence of another.
To calculate the contrast masking factor, the image is first processed with Canny edge detection, and the image blocks are classified into three types: plain, edge and texture. Since the human eye is more sensitive to distortion occurring in plain and edge areas, different weights need to be assigned to different areas. Based on the above considerations, the weighting factor for each block class is determined by the following equation:

ψ(n, i, j) = 2.25,   (i^2 + j^2) ≤ 16, in texture region
ψ(n, i, j) = 1.25,   (i^2 + j^2) > 16, in texture region
ψ(n, i, j) = 1,      in plain and edge region              (12)

where i and j are the DCT coefficient indices.
Taking the masking effect within the frame into account, the final contrast masking factor is

a_Contrast(n, i, j, t) = ψ(n, i, j),   in plain and edge region
a_Contrast(n, i, j, t) = ψ(n, i, j) · min{4, max{1, (|C(n, i, j, t)|/(T_basic(n, i, j, t) · a_Lum(n, t)))^0.36}},   in texture region      (13)

where C(n, i, j, t) is the (i, j)-th DCT coefficient of the n-th block.
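Equations (12) and (13) together can be sketched as below; the block classification itself (Canny edge detection plus labeling) is outside the sketch and is passed in as a label, and the function names are ours.

```python
def block_weight(i, j, region):
    """Weighting factor psi (Equation (12)) for subband (i, j) given the
    block classification ('plain', 'edge' or 'texture')."""
    if region == "texture":
        return 2.25 if (i * i + j * j) <= 16 else 1.25
    return 1.0  # plain and edge regions: no extra weight

def contrast_factor(coeff, t_basic, a_lum, i, j, region):
    """Contrast masking factor a_Contrast (Equation (13)). In texture
    regions the factor grows with the coefficient-to-threshold ratio,
    clipped to the range [1, 4]."""
    psi = block_weight(i, j, region)
    if region != "texture":
        return psi
    ratio = abs(coeff) / (t_basic * a_lum)
    return psi * min(4.0, max(1.0, ratio ** 0.36))
```

The clipping keeps the factor from falling below ψ for weak coefficients or growing without bound for very strong texture.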

Evaluation of the Improved JND Model Based on Transform Domain
In order to verify the effectiveness of our proposed DCT-domain JND model, we selected eight test images of different content and complexity, shown in Figure 4, for the simulation experiments. Theoretical analysis shows that, at a given visual quality, the larger the threshold of the JND model is, the more visual redundancy can be exploited; conversely, under the same injected noise energy, a more accurate JND model leads to better perceived quality. To verify the validity of the model, the thresholds calculated by the corresponding JND models are injected as noise into the DCT coefficients:

C̃(n, i, j, t) = C(n, i, j, t) + S_rand(n, i, j) · T_JND(n, i, j, t)    (14)

where C(n, i, j, t) is the original DCT coefficient and S_rand(n, i, j) takes the value +1 or −1 at random. Table 1 shows the PSNR of the different models under this noise injection. Table 2 shows the comparison of the performance of the proposed algorithm with Chen's [17] and Bae's [4] schemes under different quantization parameters.
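The noise-injection test of Equation (14) can be sketched as follows. Computing PSNR directly on the coefficients is a simplification of ours; since the DCT is orthonormal, the coefficient-domain MSE equals the pixel-domain MSE, so the PSNR comes out the same either way.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_jnd_noise(coeffs, jnd):
    """Perturb every DCT coefficient by +/- its JND threshold with a
    random sign S_rand, per Equation (14)."""
    signs = rng.choice([-1.0, 1.0], size=coeffs.shape)
    return coeffs + signs * jnd

def psnr(ref, dist, peak=255.0):
    """PSNR in dB between a reference and a distorted signal."""
    mse = np.mean((ref - dist) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy usage: inject a flat threshold of 4 into random 8x8 coefficients.
coeffs = rng.normal(0.0, 20.0, size=(8, 8))
noisy = inject_jnd_noise(coeffs, np.full((8, 8), 4.0))
```

A larger, still-imperceptible threshold drives the PSNR down; that is exactly the comparison reported in Table 1.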
The experimental results show that, compared with the algorithm of [4], our algorithm reduces the encoding bitrate by 4.3%, and compared with Chen's algorithm [17], the encoding bitrate decreases by up to 7.58%.
In order to show the bitrate reduction of each algorithm more intuitively, Figure 6 compares the bitrates at different QP values. It can be observed that, compared with the other methods, more bit savings can be obtained by our method in most cases. It can also be seen that the smaller the QP value is, the more bits are saved; this is because finer quantization results in a larger JND threshold.

Figure 1.
Figure 1 shows the curve of the background luminance and the visual threshold obtained from the experimental results.It simulates the background luminance model and shows the distortion threshold that the human eye can tolerate under a certain background luminance.
Here G(x, y) is the maximal weighted average of gradients around the pixel at (x, y); W_θ(x, y) is an edge-related weight of the pixel at (x, y), and its corresponding matrix W_θ is obtained by applying a Gaussian low-pass filter to the detected edge map.

Figure 2.
Figure 2. Directional high-pass filters for texture detection.


Figure 6.
Figure 6 shows the comparison of bitrates at different QP values.
DOI: 10.4236/jcc.2018.6400554 Journal of Computer and Communications

Table 1.
PSNR between different models.

Table 2.
Comparison of the performance of each scheme.