With the development of the economy and the surge in car ownership, used cars are being welcomed by more and more buyers, whose main concern is information about the condition of the vehicle. The frame number is a unique number assigned to each vehicle, and identifying it makes it possible to quickly find the vehicle model and manufacturer. Traditional character recognition methods suffer from complex feature extraction, whereas the convolutional neural network has unique advantages in processing two-dimensional images. This paper analyzes the key techniques of convolutional neural networks in comparison with traditional neural networks and proposes improvements to these key techniques, thereby increasing the character recognition rate, and applies them to the recognition of frame number characters.

At the 2018 Boao Forum for Asia (BFA) annual meeting, the auto industry was mentioned twice in the speech given by Chinese President Xi Jinping. With the development of automotive technology and rising oil prices, the second-hand car market is also developing rapidly. When a car is purchased, frame number identification makes it quick to match the service providers' data and to make all information about the car clear, including accidents and traffic violations. Once a vehicle database is established, the buyer can check the status of a car anytime and anywhere [

In current character recognition algorithms, a traditional multi-layer neural network may underfit because it has too few layers. By adding feature learning to the multi-layer neural network, and thanks to its unique advantages in image recognition, the Convolutional Neural Network (CNN) is widely studied in computer vision, model matching and pattern recognition. CNN-based character recognition not only simulates the feature extraction mechanisms of the human visual system to extract low-level image features, but also uses end-to-end feature extraction to improve generalization and avoid the inaccuracy caused by image distortion.

The traditional multi-layer neural network includes an input layer, an output layer and several hidden layers between them. Every layer consists of a number of neurons. Between neighboring layers, every neuron in the latter layer is connected to every neuron in the former layer. In image identification, the input layer represents the feature vector and each neuron of this layer represents one feature value.

In vehicle frame number character identification, each neuron in the input layer represents the gray value of one pixel of the frame number image. However, several problems exist. On the one hand, the spatial structure of the image is not considered, so the positional information of the frame number characters is lost and identification efficiency is limited. On the other hand, there are too many connections between the neurons of neighboring layers, which limits the training speed.

CNN can be used to solve these problems of the traditional multi-layer neural network. Because its structure is designed specifically for image identification, a CNN can be trained faster. This efficiency makes it easier to build networks with many layers, and networks with many layers have a great advantage in identification accuracy. As a result, the use of CNN, one of the deep learning algorithms, in frame number identification, license plate identification and traffic management is attracting attention and continues to be updated and improved.
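To make the connection-count argument concrete, the following sketch counts trainable parameters for a fully connected layer and for a convolutional layer. The sizes used (a 32 × 32 grayscale input, 100 hidden neurons, six 5 × 5 kernels) are illustrative assumptions and are not values taken from this paper.

```python
def dense_params(n_inputs, n_hidden):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_hidden + n_hidden


def conv_params(n_kernels, kernel_size, in_channels=1):
    """Weights plus biases of one convolutional layer with shared kernels."""
    return n_kernels * (kernel_size * kernel_size * in_channels + 1)


# Hypothetical sizes, chosen only for illustration.
fully_connected = dense_params(32 * 32, 100)
convolutional = conv_params(6, 5)
print(fully_connected, convolutional)
```

Because each kernel is shared across all image positions, the convolutional layer in this example needs several hundred times fewer parameters than the fully connected one.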

The CNN consists of an input layer, hidden layers and an output layer, where the hidden layers consist of several convolutional layers and subsampling layers. The structure of the CNN is as

In

map. C2 and S2 repeat the operations of C1 and S1. Every hidden stage of the CNN has the same structure as C1 and S1: this alternation of convolutional and pooling layers extracts features while lowering the resolution, increasing the number of feature maps generated and capturing more feature information. The vehicle information is obtained from the 1D vector formed by flattening the last pooled feature maps, which is fully connected to the output layer [

By analogy with the local receptive fields of biological visual cells, the convolutional layer applies a local filter to the input, and each filter response becomes the value at the corresponding position of the convolutional output matrix. In the CNN structure, besides the size of the convolutional kernel, the number of convolutional kernels affects the precision and the choice of activation function determines the time efficiency of the algorithm. To represent the data better, the convolutional layer usually provides more than one such local filter and thus produces multiple output matrices. Each output matrix has side length n − m + 1, where n and m are taken here to be the side lengths of the input and of the kernel, and the details of the computation are given in Formula (1):

$$x_k^{l} = f\left(\sum_{i \in M_k} x_i^{l-1} * H_{ik}^{l} + b_k^{l}\right) \tag{1}$$

Here $l$ is the index of the network layer; $H$ is the convolutional kernel; $M_k$ is the set of feature maps of layer $l-1$ that feed the $k$-th output feature map; $b$ is the bias of the output map; and $f$ is the activation function.
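A minimal pure-Python sketch of the convolutional layer computation in Formula (1) is given here for illustration. It assumes square inputs and kernels, no padding and a stride of 1, and it uses the sliding-window sum without flipping the kernel, as is common in CNN implementations; the function names are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide an m x m kernel over an n x n image without padding."""
    n, m = len(image), len(kernel)
    size = n - m + 1  # side length of the output feature map
    out = [[0.0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            out[r][c] = sum(
                image[r + u][c + v] * kernel[u][v]
                for u in range(m)
                for v in range(m)
            )
    return out


def conv_layer_output(inputs, kernels, bias, activation):
    """One output map: sum the per-input responses, add the bias, apply f."""
    responses = [conv2d_valid(x, h) for x, h in zip(inputs, kernels)]
    size = len(responses[0])
    return [
        [activation(sum(fm[r][c] for fm in responses) + bias) for c in range(size)]
        for r in range(size)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
feature_map = conv_layer_output([image], [kernel], bias=-10,
                                activation=lambda v: max(v, 0))
print(feature_map)
```

With a 3 × 3 input and a 2 × 2 kernel the output map is 2 × 2, in agreement with the side length n − m + 1.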

In theory, the smaller the convolutional kernel, the finer the feature information extracted. In practice, however, the image may be corrupted by noise to varying degrees, so a kernel that is too small can easily extract spurious information [

The function of the pooling layer is to lower the dimensionality of the matrix without damaging the internal relationships of the data. This layer can be implemented by taking the average or the maximum value. The input of this layer comes from the preceding convolutional layer, and its output can serve as the input of the following convolutional layer [

Suppose the input feature map matrix is $F$, the pooling region is a $c \times c$ matrix $P$, the bias is $b_2$, the subsampled feature map is $S$, and $c$ is also the pooling stride. Average pooling lowers the dimensionality by computing local averages. The details are given in Formula (2):

$$S_{ij} = \frac{1}{c^{2}}\left(\sum_{i=1}^{c}\sum_{j=1}^{c} F_{ij}\right) + b_2 \tag{2}$$

Maximum pooling is computed as in Formula (3):

$$S_{ij} = \max_{i=1,\,j=1}^{c}\left(F_{ij}\right) + b_2 \tag{3}$$

Here $\max_{i=1,\,j=1}^{c}(F_{ij})$ denotes the maximum element of the $c \times c$ pooling region of the input feature map $F$.
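The two pooling rules can be sketched in a few lines of pure Python. This is an illustrative sketch that assumes non-overlapping c × c windows (stride equal to c) and a feature map whose side lengths are multiples of c; the function name `pool` and its parameters are hypothetical.

```python
def pool(feature_map, c, mode="max", bias=0.0):
    """Non-overlapping c x c pooling with stride c, plus a scalar bias."""
    rows, cols = len(feature_map) // c, len(feature_map[0]) // c
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Collect the elements of the (i, j)-th pooling region.
            window = [
                feature_map[i * c + u][j * c + v]
                for u in range(c)
                for v in range(c)
            ]
            if mode == "max":
                value = max(window)
            elif mode == "average":
                value = sum(window) / (c * c)
            else:
                raise ValueError("mode must be 'max' or 'average'")
            row.append(value + bias)
        out.append(row)
    return out


F = [[1, 2, 5, 6],
     [3, 4, 7, 8],
     [9, 10, 13, 14],
     [11, 12, 15, 16]]
print(pool(F, 2, "max"))
print(pool(F, 2, "average"))
```

Both modes reduce the 4 × 4 map to 2 × 2; they differ only in how each window is summarized.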

The traditional convolutional neural network takes the grayscale image directly as the original data and feeds it into the network for training and recognition. The team of He S.F. proposed an improved scheme with multi-path input [

The following experiment was carried out: 3000 frame number pictures were selected for training and 6000 for testing. The single-path CNN comprises two convolutional layers and two pooling layers and takes the normalized grayscale frame number image as input. The multi-path CNN takes images computed with the Sobel operator as input, and each of its three CNN paths has the same structure as the single-path model. The error rates of the two algorithms are shown in
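The Sobel preprocessing step can be illustrated with the short sketch below. The text does not state which images feed the three paths, so the combination used here (the grayscale image plus its horizontal and vertical Sobel responses) is an assumption, and the helper names are hypothetical.

```python
# Standard 3 x 3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(image, kernel):
    """Apply a 3 x 3 kernel at every interior pixel (no padding)."""
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[r + u][c + v] * kernel[u][v]
                for u in range(3) for v in range(3))
            for c in range(w - 2)
        ]
        for r in range(h - 2)
    ]


def multipath_inputs(gray):
    """Assumed three-path input: cropped grayscale, Sobel-x and Sobel-y."""
    cropped = [row[1:-1] for row in gray[1:-1]]  # match the filtered size
    return cropped, filter3x3(gray, SOBEL_X), filter3x3(gray, SOBEL_Y)


# A 4 x 4 test image with a vertical edge between columns 1 and 2.
gray = [[0, 0, 1, 1] for _ in range(4)]
path_gray, path_gx, path_gy = multipath_inputs(gray)
print(path_gx, path_gy)
```

On this test image the horizontal gradient responds to the vertical edge while the vertical gradient is zero everywhere, which is the kind of complementary information a multi-path input is meant to supply.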

It can be seen from the experimental results that the multi-path CNN has a lower recognition error rate than the single-path CNN in frame number character recognition.

As an important part of the CNN, the activation function adjusts the output of the convolutional layer so that the feature extraction result of every layer meets the requirements of human vision. The sigmoid function and the hyperbolic tangent function are common choices, but they can only adjust the output range [

$$h^{(i)} = \max\left(w^{(i)T}x,\ 0\right) = \begin{cases} w^{(i)T}x, & w^{(i)T}x > 0 \\ 0, & w^{(i)T}x \le 0 \end{cases} \tag{4}$$

The result of the convolution is set to 0 if it is less than or equal to 0; otherwise it keeps its original value. Although this method forces some of the data to 0, experiments have shown that the CNN adapts well to such a sparsity constraint and that identification efficiency improves significantly, which shows that ReLU can induce sparsity to a certain degree.
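The rectification rule and the sparsity it produces can be seen in a small sketch. The input values are made up for illustration and the helper names are hypothetical.

```python
def relu(z):
    """Keep positive values, set everything else to zero."""
    return z if z > 0 else 0.0


def relu_map(feature_map):
    """Apply ReLU element-wise to a 2D feature map."""
    return [[relu(v) for v in row] for row in feature_map]


def sparsity(feature_map):
    """Fraction of entries that are exactly zero."""
    values = [v for row in feature_map for v in row]
    return sum(1 for v in values if v == 0) / len(values)


pre_activation = [[-1.5, 2.0], [0.0, 3.0]]
activated = relu_map(pre_activation)
print(activated, sparsity(activated))
```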

Average pooling and maximum pooling are the most common classical pooling models, but feature extraction with them can damage the precision and the representation of global features. Based on the maximum pooling algorithm, a dynamic self-adaptive pooling model is proposed to improve the pooling model [

$$S_{ij} = \mu \max_{i=1,\,j=1}^{c}\left(F_{ij}\right) + b_2 \tag{5}$$

This is the basic form; in essence, it uses the factor $\mu$ to optimize the maximum pooling algorithm.

The factor $\mu$ is given by Formula (6):

$$\mu = \rho\,\frac{a\left(v_{\max} - a\right)}{v_{\max}^{2}} + \theta \tag{6}$$

Here $a$ is the average of the elements in the pooling region excluding the maximum value; $v_{\max}$ is the maximum value in the pooling region; $\theta$ is the alignment error factor; and $\rho$ is the characteristic coefficient:

$$\rho = \frac{c}{1 + \left(n_{epo} - 1\right) c^{\,n_{epo}^{2} + 1}} \tag{7}$$

$\rho$ is determined by the side length $c$ of the pooling region and the number of iterations $n_{epo}$.

When the pooling region and the iteration count remain unchanged, $\mu$ is determined automatically. For the same pooling region, $\mu$ adjusts dynamically toward the optimum as the number of iterations $n_{epo}$ changes. Since $\mu \in (0, 1)$, the model takes both maximum and average pooling into account and preserves precision when it processes a pooling region with a clear maximum feature. Moreover, because it weakens the effect of maximum pooling when it processes the remaining pooling regions, the CNN can maintain the precision of feature extraction across different iteration counts $n_{epo}$ and different pooling regions [
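The dynamic self-adaptive pooling rule can be sketched as follows. The sketch assumes that Formula (6) reads μ = ρ · a(v_max − a) / v_max² + θ and that Formula (7) reads ρ = c / (1 + (n_epo − 1) · c^(n_epo² + 1)); this reading of the formulas, the function names, the default θ = 0 and the handling of a zero maximum are all assumptions made for illustration.

```python
def rho(c, n_epo):
    """Characteristic coefficient under the assumed reading of Formula (7)."""
    return c / (1 + (n_epo - 1) * c ** (n_epo ** 2 + 1))


def mu(window, c, n_epo, theta=0.0):
    """Pooling factor under the assumed reading of Formula (6).

    `window` is the flat list of the c * c elements of one pooling region,
    with c >= 2 so that elements other than the maximum exist.
    """
    v_max = max(window)
    if v_max == 0:
        raise ValueError("the factor is undefined when the maximum is zero")
    rest = list(window)
    rest.remove(v_max)  # drop one occurrence of the maximum
    a = sum(rest) / len(rest)  # average of the remaining elements
    return rho(c, n_epo) * a * (v_max - a) / v_max ** 2 + theta


def adaptive_pool(window, c, n_epo, theta=0.0, bias=0.0):
    """Formula (5): scale the window maximum by mu and add the bias."""
    return mu(window, c, n_epo, theta) * max(window) + bias


window = [1, 2, 3, 4]  # one 2 x 2 pooling region, flattened
print(mu(window, c=2, n_epo=1), adaptive_pool(window, c=2, n_epo=1))
```

In this example the factor is 0.5, so the pooled value lies between what average pooling and maximum pooling would return for the same window.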

The frame number is generally composed of the 10 digits 0-9 and 23 of the letters A-Z (all except I, O and Q). There are therefore 33 characters to be recognized, so the number of output layer nodes is set to 33.
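The 33-class output layer can be illustrated by building the character set and mapping between characters and output node indices. The ordering of the classes (digits first, then letters) and the helper names are assumptions; the text only fixes the number of classes.

```python
import string

# 10 digits plus the 23 letters that remain after excluding I, O and Q.
FRAME_CHARS = string.digits + "".join(
    ch for ch in string.ascii_uppercase if ch not in "IOQ"
)


def one_hot(ch):
    """Target vector for the output layer: 1 at the node of `ch`, else 0."""
    target = [0] * len(FRAME_CHARS)
    target[FRAME_CHARS.index(ch)] = 1
    return target


def decode(scores):
    """Return the character whose output node has the highest score."""
    best = max(range(len(scores)), key=lambda k: scores[k])
    return FRAME_CHARS[best]


print(len(FRAME_CHARS), FRAME_CHARS)
```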

A total of 5000 pictures were selected from the frame number database as the training set. First, principal component analysis was used to reconstruct the images. Then a BP neural network, the traditional LeNet-5 CNN and the improved CNN were used to analyse the same character sets. The experimental results are shown in

It can be seen from the above results that the LeNet-5 CNN has an obvious advantage in recognition rate over the traditional BP neural network. Although the improved CNN and the LeNet-5 CNN perform similarly, the improved CNN still achieves a higher recognition rate.

In summary, the principle of the CNN was briefly introduced; the key technical points of the CNN were then analysed, and an optimization scheme was proposed for each.

In frame number character recognition, any one of these improvements can be used to build the system, or they can be combined to create a better model. As one of the most widely used deep learning algorithms, the CNN has already demonstrated its feature extraction ability through its excellent performance in image identification [

The authors declare no conflicts of interest regarding the publication of this paper.

Li, H.M., Liu, Y.X. and Wang, Y. (2018) Optimization of Convolutional Neural Network for Recognition of Vehicle Frame Number. Journal of Computer and Communications, 6, 209-215. https://doi.org/10.4236/jcc.2018.611020