Underwater Inhomogeneous Light Field Based on Improved Convolutional Neural Net Fish Image Recognition
1. Introduction
With the rapid development of the aquaculture industry, aquaculture enterprises still lack the means to observe the growth of domestic fish around the clock and to distinguish fish species quickly. Image recognition can improve the speed and accuracy of identifying domestic fish species. However, the accuracy and speed of existing image recognition technology still leave room for improvement in underwater environments with a non-uniform light field. Observation in aquaculture waters is hampered by excessive feed residue, media in the water, surface ripples, fish occlusion and turbidity, all of which contribute to the non-uniform light field. When juvenile and mature domestic fish are observed manually, it is difficult to recognize them reliably by eye under these conditions.
Traditional image recognition also cannot meet the demand for high-precision, uninterrupted observation in a non-uniform light field. Image recognition technology based on deep learning plays an important role in improving aquatic product quality and in the intelligent development of the aquaculture industry. Manual fish detection is less efficient than detection based on image recognition [1] . Traditional recognition methods use shape features [2] and texture information [3] [4] to classify fish images. They first require the position of the fish in the image to be segmented manually, and then classify the fish based on that segmentation. For example, a balance-guaranteed optimized tree algorithm [5] was applied to 3179 underwater images of 10 fish species, and its recognition rate was better than that of classification based on spots, stripes and morphological characteristics. A recognition method based on a least squares support vector machine model [6] achieves a recognition rate of about 90%. A classification method combining SIFT features and principal component analysis [7] achieved 92% accuracy on a data set of 162 fish pictures in 6 categories. A method based on the SVM model and shape features [8] was trained on 76 pictures and reached an accuracy of 78.59% on a data set of 74 pictures. A fish recognition method integrated with SVM decision [9] reaches a recognition accuracy of 90%. A torsion method was established for image preprocessing [10] [11] , after which an SVM was used for classification, achieving 90% accuracy on 320 images. These methods are mostly applied to fish data with clear, noise-free images. Traditional image recognition technology therefore needs to be adapted before it can be used in underwater ecological environments where a non-uniform light field introduces noise.
At present, fish identification methods are mainly based on deep learning [12] [13] [14] , with the convolutional neural network [15] (CNN) and the generative adversarial network [16] at the core. As the most representative of these networks, the convolutional neural network has advantages in image processing [17] [18] [19] , image classification [20] [21] and image recognition [22] [23] [24] [25] [26] . Therefore, the convolutional neural network is chosen in this paper as the network model for recognizing individual fish images. Compared with the fully connected neural network (FCNN), the neurons of each layer in a convolutional neural network are arranged in three dimensions, like an image, and the nodes of one layer are connected to only some of the nodes of the next layer through the convolution operation. A convolutional neural network can include many hidden layers, so that it learns feature information of different granularity layer by layer. However, if too many layers are used, the number of neurons grows and the resulting network complexity slows learning.
2. Data Collection and Processing
In this paper, imaging equipment and a polarizer are combined to reduce the noise interference of the underwater non-uniform light field while the image data of domestic fish are collected. The image data are first processed by sub-pixel convolution reconstruction to improve their quality. The reconstructed data are then augmented by translation and flipping to build the domestic fish image database.
2.1. Experimental Equipment
The underwater CCD device used in this paper is GoPro9-Black. Its parameters are shown in Table 1.
The imaging equipment is combined with the polarizer to optimize the non-uniform light field and reduce the imaging noise. Parameters of the polarizer are shown in Table 2.
Table 2. Polarizer parameter table.
Polarization imaging is an imaging technology based on polarized light that can be used to obtain information such as the surface morphology and material properties of objects. It relies mainly on the interaction between polarized light and the surface of the object: by measuring the change in the polarization state of the light, information about the object surface is obtained. Polarization imaging usually requires optical elements such as polarization filters and polarization splitters to control and separate the polarization state of light. During imaging, the surface information of objects at different angles can be obtained by changing the direction and intensity of the polarized light. By synthesizing the imaging results from multiple angles, more accurate information about surface morphology and material properties can be obtained. In this paper, the polarizer is combined with the GoPro9 to improve image quality and reduce noise interference in underwater imaging.
2.2. Data Collection
In this paper, image data are collected under two varying conditions: different angles and different depths. The collection periods are 9:00 - 10:30 and 14:00 - 15:30 on sunny days. A total of 342 images were collected for the experiment.
Step 1: Select a polarizer with a polarization angle of 45˚ and adjust the imaging device until the image shows no obvious brightness difference. Step 2: Take samples at 30˚, 45˚, 60˚ and 90˚ at the same water depth. Step 3: Place the CCD at 30 cm, 60 cm, 90 cm, 120 cm, 150 cm and 180 cm, and repeat Step 2 at each position. Operation at other water depths is similar and is not described here. The collection method is shown in Figure 1.
2.3. Data Processing
In Figure 2, images (a)-(f) were collected under a non-uniform light field, and (a*)-(f*) are the corresponding gray histograms. Images collected under the underwater non-uniform light field are of poor quality: image features are not obvious, the images are not clear, and the noise level is high. In the gray histograms this shows up as gray levels concentrated in a narrow range and low contrast.
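The "concentrated gray levels, low contrast" observation can be made concrete with a small sketch. The function names and the toy pixel values below are illustrative assumptions and are not taken from the paper's pipeline; the code only shows how a gray histogram and the span of occupied gray levels could be computed.

```python
def gray_histogram(image, levels=256):
    """Count how many pixels fall on each gray level (0..levels-1)."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    return hist


def occupied_range(hist):
    """Return the lowest and highest gray level that actually occurs."""
    used = [level for level, count in enumerate(hist) if count > 0]
    return used[0], used[-1]


# Toy 3x4 low-contrast image: all values crowd into a narrow band.
low_contrast = [
    [100, 102, 101, 103],
    [101, 100, 102, 104],
    [103, 101, 100, 102],
]

hist = gray_histogram(low_contrast)
low, high = occupied_range(hist)
contrast_span = high - low  # a small span means a concentrated histogram
```

A span of only a few gray levels out of 256 corresponds to the narrow histogram peaks described for panels (a*)-(f*).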
Before the collected image data are input into the convolutional neural network, sub-pixel convolution [27] is used to reconstruct them. Sub-pixel convolution reconstruction performs linear-interpolation up-sampling convolution on low-resolution images to obtain high-resolution image data, as shown in Figure 3.
Figure 3. Principle of subpixel convolution reconstruction.
Linear interpolation connects two pixel points $(x_0, y_0)$ and $(x_1, y_1)$ with a straight line and calculates the value $y$ on that line at a feature point $x$ between $x_0$ and $x_1$. The formula is as follows:

$$y = y_0 + (x - x_0)\,\frac{y_1 - y_0}{x_1 - x_0} \tag{1}$$
When applied to image pixels, the value y obtained between two pixels by this linear-interpolation up-sampling is called a sub-pixel point (Figure 4).
After sub-pixel convolution reconstruction, the gray levels in the histograms of images (a)-(f) are evenly distributed rather than concentrated. Image quality is improved, which helps raise the recognition rate of the convolutional neural network.
A total of 342 images were collected in this paper. To extract the various features of domestic fish fully during training, fish images were randomly selected as training input [28] , and the domestic fish image data were extended by random translation. The blank area left by translation is filled with a similar color [29] [30] [31] to keep the image colors consistent, as shown in Figure 5.
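As a minimal sketch of the interpolation step only (not the learned convolution), the code below inserts interpolated sub-pixel values between neighbouring pixels of a single image row. The helper names `lerp` and `upsample_row` and the up-sampling factor are assumptions made for illustration.

```python
def lerp(x, x0, y0, x1, y1):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def upsample_row(row, factor=2):
    """Insert (factor - 1) interpolated sub-pixel values between neighbours."""
    out = []
    for i in range(len(row) - 1):
        for k in range(factor):
            # Position i + k/factor lies between pixel i and pixel i + 1.
            out.append(lerp(i + k / factor, i, row[i], i + 1, row[i + 1]))
    out.append(float(row[-1]))  # keep the last original pixel
    return out


row = [10, 20, 40]
dense = upsample_row(row, factor=2)
```

Applying the same routine along rows and then along columns would up-sample a full two-dimensional image.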
Image translation is a basic image processing technique that shifts an image along the horizontal or vertical direction. The basic idea is to move each pixel in the image in a specified direction. Specifically, for a two-dimensional image, the position of a pixel after translation can be calculated by the following formula:

$$x' = x + dx, \qquad y' = y + dy \tag{1}$$
where (x, y) represents the position of the original pixel, (dx, dy) represents the amount of translation along the horizontal and vertical directions, and (x', y') represents the new position after the translation. In image translation, each pixel in the image needs to be calculated according to the above formula and moved to a new position. In the process of moving, pixels beyond the image boundary can be processed by image filling and other methods to ensure image quality and continuity.
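The translation rule can be sketched in a few lines. The function name `translate` and the constant fill value are illustrative assumptions; pixels that move outside the image are discarded and uncovered positions receive the fill value.

```python
def translate(image, dx, dy, fill=0):
    """Shift a 2-D image by (dx, dy); uncovered pixels get `fill`."""
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy  # x' = x + dx, y' = y + dy
            if 0 <= new_x < width and 0 <= new_y < height:
                out[new_y][new_x] = image[y][x]
    return out


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
shifted = translate(image, dx=1, dy=1)
```

With a fill value of 0 the uncovered border stays black, which is the situation that a subsequent filling step has to repair.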
After translation, the domestic fish image contains black regions at its upper, lower and left borders. These regions interfere with the recognition weights and reduce the accuracy of domestic fish recognition by the convolutional neural network. In this paper, the image is therefore filled after translation. Edge filling is an image filling method that extends the edge of the image. Its basic idea is to copy the edge pixel values of the image outward, so that image processing operations can be applied right up to the image border.
Specifically, edge filling proceeds as follows: at each affected edge of the original image, the pixel values of the last row or column are copied until the newly generated image matches the original image size.
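One way to realize translation followed by edge filling is to clamp the source coordinate of every output pixel to the image bounds, which repeats the nearest edge row or column in the vacated area. This is a sketch under that assumption; the paper does not state its exact filling routine, and the name `translate_edge_fill` is hypothetical.

```python
def translate_edge_fill(image, dx, dy):
    """Shift a 2-D image by (dx, dy) and fill vacated pixels by edge replication."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        # Source row for this output row, clamped to the valid range.
        src_y = min(max(y - dy, 0), height - 1)
        row = []
        for x in range(width):
            # Source column, clamped so border pixels are repeated.
            src_x = min(max(x - dx, 0), width - 1)
            row.append(image[src_y][src_x])
        out.append(row)
    return out


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
filled = translate_edge_fill(image, dx=1, dy=1)
```

Compared with a constant black fill, no artificial dark border is introduced, so the output keeps the original size without adding a region that the network could mistake for a feature.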
3. Network Construction
3.1. A-D-CNN Construction
There are four kinds of domestic fish: silver carp, bighead carp, black carp and grass carp. Because the images are collected at different angles and show different feature points, recognition accuracy drops and recognition takes longer. For this reason, a convolutional neural network suited to identifying domestic fish species is constructed. The model structure is shown in Figure 6.
1) Input layer: The input consists of individual domestic fish images collected by underwater cameras. The morphological characteristics of individual fish are the important indicators for distinguishing species, while color and size are not included in the feature index. Therefore, the input image size is set to 128 × 128 pixels, and the images are passed to the convolutional layers for feature extraction.
2) Convolutional layer: Three convolutional layers are adopted, which gives fewer parameters, more nonlinearity and a deeper network, and benefits learning speed. To extract more features in the underwater non-uniform light field environment, 3 × 3 convolution kernels are adopted, and the numbers of kernels in the three layers are 32, 64 and 128.
3) Pooling layer: The pooling layer reduces the dimension of the feature maps and further compresses the features, yielding smaller feature maps. This reduces both the amount of computation and the overfitting of the network. This paper adopts max pooling as the down-sampling method to obtain individual features of domestic fish, and the filter size of the pooling layer is 1 × 1.
4) Adam-Dropout optimization: The Adam algorithm iteratively updates the weights of the neural network based on the training data and adaptively adjusts the learning rate, which helps avoid exploding gradients. The Dropout algorithm randomly deactivates neurons with a set probability after the fully connected layer, which reduces overfitting, that is, the gap between training-set and test-set performance.
5) Output layer: Outputs the domestic fish species identified for the input image.
Figure 6. Convolutional neural network structure diagram.
In this paper, a three-layer convolutional neural network is used: input − convolution − pooling − convolution − pooling − convolution − pooling − full connection.
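The layer sequence can be traced with a small shape and parameter calculator. The sketch assumes three input channels, stride-1 convolutions with "same" padding, and pooling whose stride equals its window size; none of these details is stated in the text, and the 2 × 2 pooling case is a hypothetical comparison only.

```python
def conv_same(shape, out_channels, kernel=3):
    """3x3 convolution, stride 1, 'same' padding (assumed): H and W unchanged."""
    height, width, in_channels = shape
    # Weights plus one bias per output channel.
    params = kernel * kernel * in_channels * out_channels + out_channels
    return (height, width, out_channels), params


def max_pool(shape, size):
    """Non-overlapping max pooling: stride equals the window size (assumed)."""
    height, width, channels = shape
    return (height // size, width // size, channels)


def trace(pool_size, input_shape=(128, 128, 3), filters=(32, 64, 128)):
    """Return the feature-map shape fed to the fully connected layer
    and the total number of convolution parameters."""
    shape, total_params = input_shape, 0
    for out_channels in filters:
        shape, params = conv_same(shape, out_channels)
        total_params += params
        shape = max_pool(shape, pool_size)
    return shape, total_params


shape_1x1, conv_params = trace(pool_size=1)  # pooling size stated in the text
shape_2x2, _ = trace(pool_size=2)            # hypothetical alternative
```

Under these assumptions a 1 × 1 pooling window leaves the 128 × 128 spatial size unchanged, whereas a 2 × 2 window would halve it at every stage; the convolution parameter count is the same in both cases.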
The activation function of this convolutional neural network is ReLU. Its mathematical expression is

$$f(x) = \max(0, x) \tag{1}$$

that is, the output is x when x > 0 and 0 when x ≤ 0.
The ReLU activation function improves the expressive ability of the model, converges quickly and is robust to small perturbations of the input, and it effectively mitigates the vanishing gradient problem.
Stochastic gradient descent (SGD) is used to minimize the loss function of the convolutional neural network: the model parameters are adjusted constantly so that the loss decreases and the prediction performance of the model improves. It calculates the gradient of the loss function on each training sample and updates the model parameters according to the direction and magnitude of that gradient.
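A short sketch of ReLU and its derivative illustrates the behaviour described above. The function names are illustrative; the derivative at x = 0 is taken as 0 here, which is a common convention rather than something the text specifies.

```python
def relu(x):
    """ReLU: pass positive inputs through, clamp the rest to zero."""
    return x if x > 0 else 0.0


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0


outputs = [relu(v) for v in (-2.0, 0.0, 3.5)]
grads = [relu_grad(v) for v in (-2.0, 0.0, 3.5)]
```

Because the derivative is exactly 1 for every positive input, gradients passing through active units are not shrunk layer by layer, which is the usual explanation for ReLU's resistance to vanishing gradients.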
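The per-sample update can be sketched on a toy one-parameter model. The squared-error loss, the data and the learning rate below are assumptions chosen for illustration and are unrelated to the paper's network or data set.

```python
def sgd_step(weight, grad, lr):
    """Move the weight against the gradient, scaled by the learning rate."""
    return weight - lr * grad


# Toy data generated by y = 2x; the model is y_hat = w * x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0
for _ in range(50):                      # passes over the data
    for x, y in samples:                 # one update per training sample
        grad = 2.0 * (w * x - y) * x     # d/dw of the loss (w*x - y)^2
        w = sgd_step(w, grad, lr=0.05)
```

After training, `w` is close to 2, the slope used to generate the toy data.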
3.2. Adam-Dropout Optimization Network
During network training, the learning rate needs to be adjusted dynamically according to how training is progressing. In the early stage of training, the pixel information of the input images is completely unknown, and an inappropriate learning rate easily leads the model into overfitting. In the later stage of training, an excessive learning rate causes large oscillations of the loss value. Therefore, this paper combines Adam and Dropout [32] so that the learning rate adjusts automatically, the model converges to a local optimum, and the gap between training-set and test-set performance caused by overfitting is reduced.
3.2.1. Dropout Optimizes Network Operation
A Dropout operation is added to the fully connected layer of the model to address overfitting. In individual fish image recognition, even fish of the same species differ considerably between individuals, so the recognition rate on the training set is much higher than on the test set and the model overfits easily. For each batch of data, Dropout randomly deletes some hidden neurons in the network while leaving the input and output neurons unchanged; the input is propagated forward through the modified network, and the error is then propagated back through the same modified network. The operation is repeated for the next batch of training samples. In other words, during forward propagation the activation value of each neuron is set to zero, so that the neuron stops working, with a certain probability p.
The choice of p matters. If p is zero, Dropout has no effect. If p is set too high, too many neurons are discarded, the model under-learns and its recognition accuracy suffers. If p is set too low, the discarding cannot do its job. Therefore, this paper tests discard rates from 0.1 to 0.7. The discard operation is shown in Figure 7, and the recognition effect is shown in Figure 8.
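The discard step can be sketched as follows. This is the "inverted" Dropout variant, in which surviving activations are rescaled by 1/(1 - p) during training; that rescaling is an assumption, since the text does not say which variant is used, and the function name and seed are illustrative.

```python
import random


def dropout(activations, p, rng, training=True):
    """Zero each activation with probability p (requires p < 1) and rescale
    the survivors by 1 / (1 - p) so the expected value is unchanged."""
    if not training or p == 0.0:
        return list(activations)  # no discarding at test time or when p = 0
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)            # fixed seed for a repeatable sketch
hidden = [1.0] * 10000            # toy activations of a fully connected layer
dropped = dropout(hidden, p=0.5, rng=rng)
drop_fraction = dropped.count(0.0) / len(dropped)
```

With p = 0.5 roughly half of the activations are zeroed in each call, and a fresh random mask is drawn for every batch.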
The relationship between recognition weight and Dropout is:
(2)
is the recognition weight and i is the number of training.
3.2.2. Adam Optimizes Network Operation
During training, the model should adapt its learning rate automatically. In this paper, the Adam optimization operation is added after the discard operation. During training for individual fish recognition, Adam adjusts the learning automatically according to the set parameter values, avoiding overfitting of the model when the image features are unknown. The Adam optimization algorithm is an adaptive-learning-rate optimization algorithm first presented at the ICLR conference in 2015. It can be understood as an adaptive-learning-rate optimizer with momentum, and it updates the network weights more effectively than stochastic gradient descent. It estimates the first and second moments of the gradient of each parameter of the objective function using exponential moving averages. To cope with noisy and sparse gradients during iteration in parameter space, the update of each parameter is invariant to rescaling of its gradient.
Figure 8. Relation between discard rate and recognition rate.
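The principle is as follows: calculate the sliding mean of the gradient, accumulate the squared gradient, correct the deviation, and update the parameters.

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \tag{3}$$

where $m$ is the sliding mean of the gradient (the first-order moment), $v$ is the sliding mean of the squared gradient (the second-order moment), $g$ is the gradient, and $\beta_1$, $\beta_2$ are the gradient coefficients.

Deviation correction:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \tag{4}$$

Update of the learning parameters $\theta$, where $lr$ is the learning rate and $\epsilon$ is the fuzzy factor:

$$\theta_t = \theta_{t-1} - lr \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \tag{5}$$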
After multiple training runs, the parameters were set as follows: lr was set to 0.01, the fuzzy factor to 1e−8, and the gradient coefficient to between 0.99 and 0.999.
In the experiments, the convolution kernel size and the number of convolutional layers affect the accuracy of fish feature extraction, so both were increased gradually. The learning rate of Adam and the Dropout rate also need to be fine-tuned experimentally to find the settings that best meet the requirements of high precision and short recognition time.
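The four steps (sliding mean, squared-gradient accumulation, deviation correction, parameter update) can be sketched for a single scalar parameter. The learning rate and fuzzy factor follow the values given above; the first-moment coefficient 0.9 is the commonly used default and is an assumption, as are the toy objective, the step count and the function name.

```python
import math


def adam_minimize(grad_fn, theta, lr=0.01, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=2000):
    """Minimize a scalar objective with Adam, given its gradient function."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g        # sliding mean of the gradient
        v = beta2 * v + (1 - beta2) * g * g    # sliding mean of the squared gradient
        m_hat = m / (1 - beta1 ** t)           # deviation correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)  # parameter update
    return theta


# Toy objective (theta - 3)^2 with gradient 2 * (theta - 3).
theta = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0)
```

In the first iterations the corrected ratio between the two moment estimates is close to 1 in magnitude, so each step has a size of roughly lr regardless of how large the raw gradient is.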
4. Analysis of Experimental Results
In this paper, the data of the four species of domestic fish were randomly split into a training set and a test set at a ratio of 8:2. In the experiment, the number of training iterations was set to 15, as shown in Figure 9. At 15 iterations, the recognition accuracy and loss of the training set tended to be stable. In order to provide recognition results faster, 15 iterations were used for training in this experiment.
The loss rate and success rate of the A-D-CNN model and a typical convolutional neural network model [33] on the training set and test set are shown in Figure 10 (o stands for the typical convolutional neural network model, c stands for the A-D-CNN model; Training is the training set and Validation is the test set).
The test-set loss of the typical convolutional neural network model (1.4) is higher than that of the A-D-CNN model (0.17), and the success rates are 59.38% and 96.97%, respectively, as shown in Figure 11.
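A random 8:2 split can be sketched as below. The function name, the seed and the use of integer image identifiers are illustrative assumptions; 342 stands for the number of collected images, and the augmented database may be larger.

```python
import random


def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle the items and cut them into training and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded for a repeatable split
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


image_ids = list(range(342))
train_ids, test_ids = split_dataset(image_ids)
```

Shuffling before cutting keeps images taken at the same angle or depth from ending up entirely in one subset.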
In addition to the comparison between A-D-CNN and the typical convolutional neural network model, this paper compares A-D-CNN with neural network models such as ResNet50, GoogleNet and YOLOv5. With the same data set and the same number of training iterations, the recognition rates of ResNet50 and GoogleNet are shown in Figure 12 and Figure 13. On the same data set as A-D-CNN, the recognition rates of ResNet50, GoogleNet and A-D-CNN were 77.27%, 81.82% and 96.97%, respectively. The comparison of recognition rates is shown in Table 3.
Figure 9. Training times and success rate/loss rate.
Figure 10. Comparison of loss rate of test set between typical convolutional neural network model and A-D-CNN model.
Figure 11. Comparison of success rate between typical convolutional neural network model and A-D-CNN model.
Figure 12. Recognition rate of ResNet50 on individual fish images.
Figure 13. Recognition rate of GoogleNet on individual fish images.
Table 3. Comparison results of different models.
Table 4. Comparison between typical model identification time and A-D-CNN model identification time.
Based on the above experimental results, it can be seen that the method proposed in this paper can better complete the task of individual recognition of domestic fish.
As Table 4 shows, the recognition times of the two models differ by 15 s, so A-D-CNN is also faster than the typical convolutional neural network.
5. Conclusions
Based on the different characteristics of four species of domestic fish, this paper proposes an individual recognition model for domestic fish built on an improved convolutional neural network. A Dropout operation with a dropout rate of 0.5 is used in the model to reduce the dependence between neurons, and the adaptive moment estimation (Adam) algorithm is used to adjust the learning parameters dynamically. Experiments show that, in a non-uniform light field environment, the convolutional neural network constructed in this paper (A-D-CNN) reaches a recognition rate of 96.97% for domestic fish species, and the recognition time decreases from 54 seconds to 39 seconds. The model identifies different species of domestic fish with high quality, which makes fish identification in the aquaculture industry more efficient and convenient and raises the intelligence level of aquaculture.
Although the accuracy and speed of individual fish recognition in this paper meet the experimental requirements, the occlusion of fish is not taken into account. In the next step, convolutional neural networks will continue to be used to analyze and solve the recognition of occluded fish, so that the model can serve more application scenarios.