To address the problems of image super-resolution algorithms based on convolutional neural networks, such as large numbers of parameters, high computational complexity and blurred image texture, we propose a new algorithm model. We improve the classical convolutional neural network by adjusting the convolution kernel size to reduce the number of parameters and by adding pooling layers to reduce the dimension. These changes lower the computational complexity, increase the learning speed and shorten the training time. The iterative back-projection algorithm is then combined with the convolutional neural network to form a hybrid model. Experimental results show that the proposed method performs better than traditional face hallucination methods.

Image super-resolution is a classical problem in computer vision. It aims to infer a high-resolution (HR) image containing the crucial information from the given low-resolution (LR) images. Face hallucination is a branch of image super-resolution that exploits domain-specific prior knowledge closely tied to the face domain. It was first introduced by Baker and Kanade [

Interpolation-based algorithms are the most basic methods in face super-resolution research, including nearest-neighbor interpolation, bilinear interpolation and bicubic interpolation. Reconstruction-based methods are fast and give a small improvement in image quality. However, because they are limited by the original information in the image, the ambiguity caused by low-resolution sampling cannot be removed. Freeman et al. [

Inspired by the above literature, we apply deep learning theory to face hallucination reconstruction [

The image acquisition process may be affected by motion blur, optical blur, signal aliasing caused by down-sampling and various kinds of noise, all of which degrade the picture. Elad proposed a matrix-vector approach to describe the low-resolution imaging model [

$$y = HDXw + n \tag{1}$$

$X$ represents the high-resolution image, $y$ the low-resolution image, and $n$ additive Gaussian noise. $D$, $H$ and $w$ denote the down-sampling matrix, the blur matrix and the geometric transformation matrix respectively. Face hallucination is the inverse process of face image degradation: given the low-resolution image $y$, the aim is to restore the original high-resolution image $X$.
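The degradation model of Equation (1) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the paper: the geometric transformation is the identity, the blur is a 5 × 5 Gaussian kernel, down-sampling is plain decimation, and the function names `gaussian_kernel`, `blur` and `degrade` are hypothetical.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Return a normalized 2-D Gaussian kernel (entries sum to 1)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def blur(img, kernel):
    """Blur operator: 'same'-size filtering with replicated edges."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def degrade(x, scale=2, sigma=1.0, noise_std=0.0, rng=None):
    """Simulate an LR image: blur, decimate, then add Gaussian noise.

    Assumption: the geometric transformation is the identity.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    blurred = blur(x.astype(float), gaussian_kernel(5, sigma))
    y = blurred[::scale, ::scale]          # down-sampling by decimation
    return y + noise_std * rng.standard_normal(y.shape)

# A constant image stays constant after blur and decimation.
x = np.full((8, 8), 100.0)
y = degrade(x, scale=2)
print(y.shape)
```

Face hallucination then amounts to inverting `degrade`, which is ill-posed because many HR images map to the same LR image.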

Convolutional neural networks (CNNs) are a biologically inspired variant of multi-layer perceptron networks (MLPs), specialized for image processing. First popularized by LeCun et al., they are similar to other hierarchical feature extraction methods such as the Neocognitron and HMAX.

The structure of a typical CNN consists of alternating convolutional and pooling layers followed by an output classification layer. Each type of layer contains several feature maps, or groups of neurons, in a rectangular configuration. The receptive field itself is simply a number of weighted connections; that is, each connecting edge has a weight. The group of weights applied by a neuron is called a weight kernel. A distinguishing property of these networks is that all neurons in a feature map share the same weight kernel. The idea behind this configuration is that a spatial feature detector, for example a vertical edge detector, should be useful across an entire image instead of just at a particular location. The convolutional layers perform the majority of the processing in these networks, with the feature maps in the pooling layers simply down-sampling their corresponding feature maps in the convolutional layer [

In the past few years, deep learning methods have been steadily improved and updated. They are applied not only to image classification but also to tasks ranging from face recognition to semantic segmentation. Recently, deep learning has also been applied to low-level vision tasks, including image denoising, image enhancement and image super-resolution. The seminal work on the super-resolution convolutional neural network (SRCNN) was done by Dong et al. [

The model is mainly composed of three convolutional layers, which broadly emulate a sparse-coding pipeline. The three convolutional layers accomplish the following tasks: patch extraction and representation, nonlinear mapping and reconstruction [

1) Patch extraction and representation: patches are extracted from the low-resolution image and each patch is represented as a high-dimensional vector. These high-dimensional vectors comprise a set of feature maps, and the dimension of each vector equals the number of maps.

2) Non-linear mapping: this step nonlinearly maps each high-dimensional vector onto another high-dimensional vector. Each mapped vector is conceptually the representation of an HR patch. These vectors comprise another set of feature maps.

3) Reconstruction: this step aggregates the above HR patch-wise representations to generate the final HR image. This image is expected to be similar to the ground truth R.
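The three steps above can be sketched as a three-layer forward pass in NumPy. This is an illustrative sketch rather than the trained model: the weights are random, the ReLU activation and the 3-1-3 kernel sizes with 64 filters per layer are assumptions, the input is assumed to be already interpolated to the target size, and `conv2d`, `init_params` and `srcnn_forward` are hypothetical names.

```python
import numpy as np

def conv2d(x, w, b):
    """'Same'-size multi-channel filtering (cross-correlation).

    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,).
    """
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)), mode="edge")
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + h, j:j + wd]                   # (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out + b[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def init_params(rng, sizes=(3, 1, 3), n1=64, n2=64):
    """Random weights for the three layers (illustrative, untrained)."""
    chans = [1, n1, n2, 1]
    params = []
    for i, k in enumerate(sizes):
        w = 0.01 * rng.standard_normal((chans[i + 1], chans[i], k, k))
        params.append((w, np.zeros(chans[i + 1])))
    return params

def srcnn_forward(y_up, params):
    """y_up: interpolated LR image of shape (H, W); returns an (H, W) estimate."""
    f1 = relu(conv2d(y_up[None], *params[0]))   # 1) patch extraction and representation
    f2 = relu(conv2d(f1, *params[1]))           # 2) non-linear mapping
    return conv2d(f2, *params[2])[0]            # 3) reconstruction

rng = np.random.default_rng(0)
out = srcnn_forward(rng.random((12, 12)), init_params(rng))
print(out.shape)
```

Each output pixel of the first layer is an `n1`-dimensional vector describing the patch around it, which is the "high-dimensional vector" referred to in step 1.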

The iterative back-projection (IBP) algorithm proposed by Irani is a representative image restoration method [

$$y_0 = H x_0 + n \tag{2}$$

If $x_0$ equals the original high-resolution image and the simulated imaging process matches the actual one, the simulated low-resolution image $y_0$ is identical to the observed low-resolution image $y$. If they differ, the difference between $y$ and $y_0$ is projected back onto $x_0$ to correct it.

In the IBP algorithm, the HR image is obtained by back-projecting the error, that is, the difference between the simulated LR images and the observed LR images, through up-sampling, a reverse blur filter and a reverse motion transform [

$$E^{(n)} = \frac{1}{N} \sum_{k=1}^{N} H_k^{BP} \left( y_k - \hat{y}_k^{(n)} \right) \tag{3}$$

$$\hat{z}^{(n+1)} = \hat{z}^{(n)} + \lambda E^{(n)} \tag{4}$$

where $\hat{z}^{(n)}$ and $\hat{z}^{(n+1)}$ denote the super-resolution images obtained from the $n$th and $(n+1)$th iterations respectively, $y_k$ is the $k$th observed LR image, $\hat{y}_k^{(n)}$ denotes the $k$th simulated LR image generated from $\hat{z}^{(n)}$ under the imaging degradation model, $E^{(n)}$ is the back-projected difference between the observed LR images and the simulated LR images, $H_k^{BP}$ is the $k$th back-projection operation and $\lambda$ is the iteration step.
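The update of Equations (3) and (4) can be sketched in a few lines of NumPy. The sketch makes simplifying assumptions that are not from the paper: the degradation is block averaging without blur or motion, the back-projection operator is nearest-neighbor up-sampling for every $k$, and `downsample`, `upsample` and `ibp` are hypothetical names.

```python
import numpy as np

def downsample(z, s):
    """Simulated imaging: average each s x s block (assumed degradation)."""
    h, w = z.shape
    return z.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(e, s):
    """Back-projection operator: nearest-neighbor up-sampling (assumed)."""
    return np.kron(e, np.ones((s, s)))

def ibp(lr_images, z0, s=2, lam=0.5, n_iter=20):
    """Iterative back-projection following Eqs. (3) and (4)."""
    z = z0.astype(float).copy()
    for _ in range(n_iter):
        y_hat = downsample(z, s)          # simulated LR image (same model for all k)
        # Eq. (3): average the back-projected errors over the N observed images.
        err = np.mean([upsample(y_k - y_hat, s) for y_k in lr_images], axis=0)
        z = z + lam * err                 # Eq. (4)
    return z

rng = np.random.default_rng(0)
y_obs = rng.random((4, 4))
z = ibp([y_obs], z0=np.zeros((8, 8)))
residual = np.abs(downsample(z, 2) - y_obs).max()
print(residual < 1e-4)
```

With these particular operators the simulated LR image converges geometrically to the observation, with the residual shrinking by a factor of $1 - \lambda$ per iteration. The result is only consistent with the observed image; it adds no new high-frequency detail, which is why IBP is paired with a learned model in the hybrid algorithm.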

On its own, the iterative back-projection algorithm does not give outstanding reconstruction results, but it can be combined with other super-resolution methods to improve performance. In this paper, the super-resolution algorithm based on the convolutional neural network is improved and combined with the iterative back-projection algorithm [

In the convolution layer, we mainly consider the influence of the size and number of convolution kernels on the processing quality and processing speed of the model. The convolutional neural network proposed by Dong et al. [

The larger the convolution kernel, the better the super-resolution effect, but the corresponding computation also increases [

resolution image is 0.1 dB higher than that of the 3-1-3 model. Considering both the processing results and the computing speed, we select the 3-1-3 model.

The number and size of the convolution kernels jointly determine the super-resolution effect. Starting from the 9-1-5 super-resolution model proposed by Dong et al., we tested different numbers of convolution kernels in the first and second layers of the 3-1-3 model and finally selected $n_1 = 64$, $n_2 = 64$.
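To make the effect of kernel size on model size concrete, the following sketch counts the convolution weights of a three-layer network. It assumes a single input and output channel and ignores biases; the filter counts used for the 9-1-5 configuration ($n_1 = 64$, $n_2 = 32$) are an assumption for illustration, and the helper names are hypothetical.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in one convolutional layer (biases excluded)."""
    return kernel_size * kernel_size * in_channels * out_channels

def three_layer_weights(f1, f2, f3, n1, n2, channels=1):
    """Total weights of a three-layer f1-f2-f3 network with n1 and n2 filters."""
    return (conv_weights(f1, channels, n1)
            + conv_weights(f2, n1, n2)
            + conv_weights(f3, n2, channels))

# 9-1-5 with n1 = 64, n2 = 32 (assumed) versus 3-1-3 with n1 = 64, n2 = 64.
print(three_layer_weights(9, 1, 5, 64, 32))  # 8032
print(three_layer_weights(3, 1, 3, 64, 64))  # 5248
```

Under these assumptions the 3-1-3 model has fewer weights even with more filters in the second layer, because the 9 × 9 and 5 × 5 kernels dominate the count in the 9-1-5 model.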

According to the selection of the number of convolution kernel [

At the same time, we consider how the amount of computation and information affects the speed of image super-resolution. The patch size is set to 33 × 33, a moderate increase in sub-image size. Experimental results show that increasing the input block size improves training speed and shortens training time.

Dong et al. proposed applying a convolutional neural network model to image super-resolution; their model has no pooling layer. In addition to adjusting the size and number of convolution kernels, we also introduce pooling layers [

kernel number | $n_1 = 64$, $n_2 = 32$ | $n_1 = 64$, $n_2 = 64$ | $n_1 = 128$, $n_2 = 64$ |
---|---|---|---|
PSNR (dB) | 33.57 | 33.65 | 30.14 |

To give the pooling unit translation invariance, that is, so that after a small translation the image still produces the same features as before, we can choose a contiguous region in the image as the pooling area and pool only the features generated by the same hidden unit. The number of feature maps input to the pooling layer therefore does not change, but the size of each feature map is reduced. This process is actually a down-sampling process [

$$x_j^{k} = f\left[ \beta_j^{k} \, \mathrm{down}\left( x_j^{k-1} + b_j^{k} \right) \right] \tag{5}$$
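Equation (5) can be illustrated for a single feature map as follows. The text does not define the symbols, so the sketch assumes that $\mathrm{down}(\cdot)$ is non-overlapping 2 × 2 mean pooling, $\beta$ is a multiplicative weight, $b$ is a bias and $f$ is the activation function; `down` and `pool_layer` are hypothetical names.

```python
import numpy as np

def down(x, s=2):
    """Non-overlapping s x s mean pooling (assumed form of down(.))."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def pool_layer(x_prev, beta=1.0, b=0.0, s=2, f=np.tanh):
    """One pooled feature map following the structure of Eq. (5)."""
    return f(beta * down(x_prev + b, s))

x_prev = np.arange(16, dtype=float).reshape(4, 4)
pooled = pool_layer(x_prev, f=lambda v: v)   # identity activation for readability
print(pooled)
# [[ 2.5  4.5]
#  [10.5 12.5]]
```

The number of feature maps is unchanged because each map is pooled independently, while each side of the map shrinks by the factor `s`.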

The whole model of the hybrid algorithm is divided into nine layers: four convolutional layers, two pooling layers, two resampling layers (one for down-sampling, the other for up-sampling) and one difference layer [

1) The first five layers form the framework of the convolutional neural network super-resolution algorithm (SRCNN model). These five layers mainly implement patch extraction and representation, nonlinear mapping and reconstruction.

2) Down-sampling layer. This operation down-samples the image derived from the third layer. As a result, an LR version of the reconstructed image is obtained.

3) Difference layer. This operation calculates the difference between the down-sampled version of the original HR image and the corresponding counterpart acquired above. The difference is treated as the reconstruction simulation error, and it can also be considered as prior guidance that has been introduced.

4) Up-sampling layer. This operation up-samples the simulation error to generate an HR version of the simulation error.

5) Update layer. This operation performs a convolution on the above simulation error, and then the final HR image is updated by combining the third layer’s result with the convolved simulation error.
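Steps 2) to 5) can be sketched as a single refinement pass applied to the CNN output. The sketch assumes block-average down-sampling, nearest-neighbor up-sampling and a fixed 3 × 3 kernel in place of the learned update convolution; `refine` and its helpers are hypothetical names, not the authors' implementation.

```python
import numpy as np

def downsample(img, s):
    """Block-average down-sampling (assumed)."""
    h, w = img.shape
    return img.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(img, s):
    """Nearest-neighbor up-sampling (assumed)."""
    return np.kron(img, np.ones((s, s)))

def conv_same(img, kernel):
    """'Same'-size filtering with replicated edges."""
    k = kernel.shape[0]
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def refine(sr, lr_ref, kernel, s=2):
    """sr: CNN reconstruction; lr_ref: down-sampled original HR image."""
    lr_sr = downsample(sr, s)               # 2) down-sampling layer
    err_lr = lr_ref - lr_sr                 # 3) difference layer
    err_hr = upsample(err_lr, s)            # 4) up-sampling layer
    return sr + conv_same(err_hr, kernel)   # 5) update layer

rng = np.random.default_rng(0)
hr = rng.random((8, 8))
lr_ref = downsample(hr, 2)
sr = hr + 0.1 * rng.standard_normal(hr.shape)   # stand-in for the CNN output
identity = np.zeros((3, 3))
identity[1, 1] = 1.0                            # placeholder for the learned kernel
refined = refine(sr, lr_ref, identity)
print(np.allclose(downsample(refined, 2), lr_ref))
```

With the identity kernel the refined image reproduces the LR reference exactly after down-sampling; a learned kernel would instead spread the correction over neighboring pixels.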

For testing and comparison, the image degradation model we adopt generates low-resolution images by down-sampling the original high-resolution images after Gaussian blur. The initial high-resolution image is fed into the convolutional neural network model trained off-line, and the final high-resolution image is generated by the network using the learned model parameters [

We choose a Bicubic [

Image | BI PSNR | BI SSIM | NE PSNR | NE SSIM | ScSR PSNR | ScSR SSIM | SRCNN PSNR | SRCNN SSIM | Proposed PSNR | Proposed SSIM |
---|---|---|---|---|---|---|---|---|---|---|
1 | 27.50 | 0.798 | 28.30 | 0.803 | 29.27 | 0.825 | 30.23 | 0.834 | 31.01 | 0.871 |
2 | 28.91 | 0.837 | 28.36 | 0.844 | 29.89 | 0.874 | 30.59 | 0.882 | 31.87 | 0.912 |
3 | 28.01 | 0.823 | 28.81 | 0.832 | 30.61 | 0.853 | 31.02 | 0.875 | 32.23 | 0.891 |
4 | 27.04 | 0.812 | 28.36 | 0.829 | 28.69 | 0.843 | 30.22 | 0.862 | 31.35 | 0.886 |
5 | 28.70 | 0.833 | 29.09 | 0.838 | 29.21 | 0.853 | 31.21 | 0.880 | 31.93 | 0.879 |
6 | 32.48 | 0.897 | 32.56 | 0.898 | 33.45 | 0.905 | 34.00 | 0.914 | 34.08 | 0.917 |
Average | 28.77 | 0.833 | 29.25 | 0.840 | 30.197 | 0.859 | 31.21 | 0.875 | 32.08 | 0.890 |

than the competing methods, which validates that introducing image prior for face hallucination works well.
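For reference, PSNR values such as those reported in the comparison can be computed as in the sketch below. It assumes the standard definition for 8-bit images with a peak value of 255, which the text does not state explicitly; the function name `psnr` is hypothetical.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB (standard definition, assumed)."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")               # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((4, 4), 100.0)
est = np.full((4, 4), 110.0)              # uniform error of 10 gray levels
value = psnr(ref, est)
print(round(value, 2))
```

A higher PSNR means a smaller mean squared error against the ground-truth HR image. SSIM is a separate structural measure and is not covered by this sketch.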

In this paper, by analyzing the training process of convolutional neural networks, we have made a series of improvements to the image super-resolution algorithm based on convolutional neural networks. Compared with traditional algorithms, the improved algorithm yields better reconstruction, sharper edges and clearer pictures. The improved convolutional neural network algorithm achieves better results with fewer iterations and significantly reduces training time.

The authors declare no conflicts of interest regarding the publication of this paper.

Xia, J.F., Yang, Z.Z., Li, F., Xu, Y.D., Ma, N. and Wang, C.X. (2018) Human Face Super-Resolution Based on Hybrid Algorithm. Advances in Molecular Imaging, 8, 39-47. https://doi.org/10.4236/ami.2018.84004