Garbage Classification Detection Based on Improved YOLOV4

As the rate of garbage generation increases, traditional garbage disposal methods are being phased out, so garbage classification has become an inevitable choice. Multi-category garbage classification and recognition accuracy have therefore become the focus of attention. Existing garbage classification algorithms suffer from single categories, few object types, and low accuracy. This paper proposes an improved YOLOV4 network framework to detect 3 categories comprising 15 types of objects, achieving an average precision (mAP) of 64% and a detection speed of 92 frames per second. The results show that the improved YOLOV4 can better detect garbage categories and is suitable for embedded devices.

the quality of life. Some progress has been made in the detection of garbage classification. For example, Yue Xiaoming et al. proposed an anchor-free CenterNet network for garbage classification and detection [1]. Sang Shenggao et al. proposed combining the Internet with garbage collection: by having people report garbage online, the cost of garbage collection was reduced [2]. Chen Ningxin, Yang Jiarui, et al. proposed a dry and wet garbage sorting bin based on a binary classification strategy [3]. Using the analysis functions of the Abaqus software, Yang Mingwei et al. designed a multi-function automatic sorting trash can that processes trash through voice input [4]. However, because the types of garbage are very abundant and public knowledge of garbage categories is still not widespread, it is very important to classify and detect garbage in order to avoid mishandling it.
In recent years, with the development of deep learning, neural networks have been widely used in image detection. Existing neural network models can be divided into two types. One requires extracting candidate regions from the original image, from the sliding-window extraction of image candidates in the AlexNet era to the later selective-search-based R-CNN, Fast R-CNN, Faster R-CNN [5], and R-FCN. The other takes the original image as input and directly regresses the detection targets through the convolutional network; the candidate-region extraction step is omitted, which speeds up image detection but at the same time reduces detection accuracy. Its main representatives are the YOLO series and SSD [6]. The improved YOLOV4 in this article is a target detection method based on the YOLO series. Although the original YOLOV4 has high accuracy, its parameter count and network model are very large, so it is not suitable for embedded devices. Therefore, the improved YOLOV4 simplifies the network structure and reduces the number of parameters, improving image detection efficiency while ensuring accurate target detection. This article trains the improved YOLOV4 object detection framework from pre-trained weights on the VOC data set, and detects 3 categories comprising 15 types of garbage across a total of 22,000 images.

YOLOV4 Network Introduction
The YOLOV4 algorithm developed from YOLOV1 through YOLOV2 and YOLOV3. The YOLO network can generally be divided into three parts: the backbone network, the neck network, and the prediction head. The backbone network is mainly responsible for extracting image features. However, with the development of deep learning it has been found that, although more network layers yield richer feature information, they also increase the cost of training, and beyond a certain depth the training effect actually decreases. Therefore, there is an increasing expectation to use lightweight networks to replace complex and computationally intensive neural networks, on the premise that effective image features can still be extracted.
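As an illustrative sketch only (not the paper's code), this three-part division can be written as a PyTorch module whose `backbone`, `neck`, and `head` arguments are placeholders for the concrete networks:

```python
import torch
import torch.nn as nn

class YoloStyleDetector(nn.Module):
    """Minimal sketch of the backbone -> neck -> prediction split.

    `backbone`, `neck`, and `head` stand in for the actual
    CSPDarknet / SPP+PANet / YOLO-head modules; only the data flow is shown.
    """
    def __init__(self, backbone: nn.Module, neck: nn.Module, head: nn.Module):
        super().__init__()
        self.backbone = backbone  # extracts multi-scale image features
        self.neck = neck          # fuses features across scales
        self.head = head          # predicts boxes, objectness and classes

    def forward(self, images: torch.Tensor):
        features = self.backbone(images)  # feature maps at several scales
        fused = self.neck(features)       # aggregated feature pyramid
        return self.head(fused)           # raw predictions per scale
```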

Data Augmentation
Because the ability to collect data sets is limited, YOLOV4 creates new training samples from the existing data set; performing data augmentation on the existing data improves the generalization ability of the trained model. In current image processing, diversified data augmentation maximizes the use of the data set and is key to performance breakthroughs in object detection frameworks. The photometric distortion in YOLOV4 adjusts the brightness, contrast, hue, and saturation of the image.
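As a minimal illustration of photometric distortion (the jitter ranges and the file name below are assumptions, not values from the paper), the four adjustments can be applied with torchvision's ColorJitter:

```python
from PIL import Image
from torchvision import transforms

# Hypothetical jitter ranges; the paper does not list exact values.
photometric_distortion = transforms.ColorJitter(
    brightness=0.3,  # randomly scale brightness by up to +/-30%
    contrast=0.3,    # randomly scale contrast by up to +/-30%
    saturation=0.3,  # randomly scale saturation by up to +/-30%
    hue=0.05,        # randomly shift hue by up to +/-0.05
)

img = Image.open("garbage_sample.jpg").convert("RGB")  # placeholder image path
augmented = photometric_distortion(img)  # new training sample with distorted colors
```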

Neck
The neck is mainly used to generate feature pyramids. A feature pyramid enhances the model's detection of objects at different scales, allowing it to recognize the same object at different sizes. FPN had long been the state of the art for the feature aggregation layer of object detection frameworks until the appearance of PANet (Path Aggregation Network).
The neck network in YOLOV4 uses SPP (Spatial Pyramid Pooling) and PANet.
SPP uses four kernel sizes, 1 × 1, 5 × 5, 9 × 9, and 13 × 13, applying max-pooling at each scale so that feature maps of the same spatial dimensions are obtained [8]. SPP preserves the spatial size of each candidate feature map and concatenates the feature maps produced by the different kernel sizes as output, yielding an output feature map of fixed structure. The SPP network structure is shown in Figure 2.
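A minimal SPP block consistent with this description can be sketched in PyTorch as follows; the 1 × 1 branch is realized as the identity, and the remaining pooling kernel sizes follow the text (an illustrative sketch, not the paper's code):

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial Pyramid Pooling block in the style used by YOLOV4 necks.

    Max-pooling with several kernel sizes (stride 1, 'same' padding) keeps the
    spatial size unchanged; the pooled maps are concatenated along channels.
    """
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The 1x1 branch is the identity; the others are the pooled versions.
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)

# Example: a 512-channel feature map becomes 512 * 4 = 2048 channels.
feat = torch.randn(1, 512, 19, 19)
print(SPP()(feat).shape)  # torch.Size([1, 2048, 19, 19])
```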

Loss Function
The regression part of the YOLOV4 loss uses the CIoU loss, which is built on the IoU and GIoU measures:

$$\mathrm{IoU}=\frac{\left|Prediction\cap GroundTruth\right|}{\left|Prediction\cup GroundTruth\right|}$$

$$\mathrm{GIoU}=\mathrm{IoU}-\frac{\left|A_{C}\right|-U}{\left|A_{C}\right|}$$

$$\mathrm{CIoU}=\mathrm{IoU}-\frac{\rho^{2}}{c^{2}}-\partial v,\qquad L_{CIoU}=1-\mathrm{CIoU}$$

where $A_{C}$ is the minimum closure area of the prediction box and the ground-truth box, $U$ is the union area of the prediction box and the ground-truth box, Prediction means the prediction box, GroundTruth means the ground-truth box, $\partial$ is the weight coefficient newly added on the basis of GIoU, $v$ measures the similarity of the aspect ratio between the predicted box and the ground-truth box, $c$ is the length of the diagonal of the minimum closure box, and $\rho^{2}$ is the square of the distance between the centers of the ground-truth box and the predicted box.
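A sketch of the CIoU loss for axis-aligned boxes given as (x1, y1, x2, y2) follows; it mirrors the formula above, with `alpha` playing the role of the weight coefficient $\partial$ (an illustrative implementation, not the paper's code):

```python
import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """CIoU loss for boxes of shape (N, 4) in (x1, y1, x2, y2) format."""
    # Intersection and union -> IoU
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)

    # rho^2: squared distance between the two box centers
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2

    # c^2: squared diagonal of the minimum closure (enclosing) box
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # v: aspect-ratio consistency term; alpha: its weight coefficient
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)
```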

YOLOV4 Improvements
The original YOLOV4 detects 80 categories on the COCO data set, which has many categories, whereas in this garbage classification task only 15 types of garbage need to be detected. Moreover, the backbone network parameters of the original YOLOV4 reach 52.49 M across 334 layers, with a large number of floating-point operations; although this can increase the accuracy of garbage classification and detection, it is not suitable for embedded devices. Therefore, the backbone feature-extraction part of the original YOLOV4 can be modified to reduce the memory required by the model and meet the requirements of embedded devices, as sketched below.
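As a hypothetical illustration of such a backbone swap (the paper replaces the backbone with MobileNetv3, as described in the next subsection; the tap-point indices below are assumptions for a 416 × 416 input), torchvision's MobileNetV3-Large feature extractor can be inspected as follows:

```python
import torch
from torchvision.models import mobilenet_v3_large

# MobileNetV3's feature extractor as a lightweight stand-in for CSPDarknet53,
# illustrating the reduction in parameter count for embedded deployment.
backbone = mobilenet_v3_large().features

num_params = sum(p.numel() for p in backbone.parameters())
print(f"MobileNetV3-Large backbone parameters: {num_params / 1e6:.2f} M")

# Feature maps taken at three depths would feed the neck.
x = torch.randn(1, 3, 416, 416)
for i, layer in enumerate(backbone):
    x = layer(x)
    if i in (6, 12, 16):  # example tap points for 52x52, 26x26, 13x13 scales
        print(i, tuple(x.shape))
```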

Recursive-FPN
Since changing the backbone network in YOLOV4 to MobileNetv3 reduces the amount of image feature information that can be acquired, PANet is replaced with Recursive-FPN in the neck network. PANet is a simple two-way fusion of the feature maps output by the backbone extraction network; although it retains the feature information of the image to a certain extent, garbage classification involves rich and similar image information, such as kitchen garbage. Therefore, Recursive-FPN is used to fuse the feature information output by the traditional FPN and feed it back into the backbone for a second cycle. In this way, the feature information of the backbone network is retained as much as possible (Figure 6).
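A minimal sketch of this recursive fusion is shown below; the interface that lets the backbone consume per-stage feedback is an assumption for illustration, not the paper's implementation:

```python
import torch
import torch.nn as nn

class RecursiveFPN(nn.Module):
    """Sketch of recursive feature fusion: the FPN output is fed back to the
    backbone for a second pass so backbone feature information is reused.

    `backbone` is assumed to accept an optional list of per-stage feedback
    features; `fpn` performs the top-down fusion over the stage outputs.
    """
    def __init__(self, backbone: nn.Module, fpn: nn.Module, num_loops: int = 2):
        super().__init__()
        self.backbone = backbone
        self.fpn = fpn
        self.num_loops = num_loops

    def forward(self, images: torch.Tensor):
        feedback = None
        for _ in range(self.num_loops):
            stage_feats = self.backbone(images, feedback)  # second pass sees FPN output
            pyramid = self.fpn(stage_feats)                # top-down fusion
            feedback = pyramid                             # fed back on the next loop
        return pyramid
```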

Experiment Environment
In this training, a total of 22,000 garbage images were collected through web crawlers and actual image collection, and were divided into 3 categories, namely

Result
Using 22,000 images of 15 types of garbage objects for training and 6600 images for testing, it is found that when the number of iterations is between 0 and 200, the CIoU loss decreases significantly and the accuracy improves significantly.
From 700 to 1200 iterations, the training performance gradually stabilizes, and the mAP value eventually reaches 64%. The training loss of the improved YOLOV4 is shown in Figure 7. The trained model is then used to detect the garbage categories.
The detection results are shown in Figure 8. From the figure, it can be seen that the model not only detects the garbage object and its confidence value, but also detects the garbage category.
The improved YOLOV4 is compared with YOLOV4 and YOLOV3; the specific training parameters are shown in Table 1. It can be seen from the experimental results that the three models, YOLOV3, YOLOV4, and the improved YOLOV4,

Conclusion
Compared with the original YOLOV4, the accuracy of the improved YOLOV4 remains almost unchanged at 64%, while the improved YOLOV4 achieves a higher FPS of 92 f/s. Therefore, the improved YOLOV4 is better suited to garbage classification detection on embedded devices.