Intelligent Detection Method of Substation Environmental Targets Based on MD-Yolov7

Abstract

The complex operating environment of a substation, with different safety distances for live equipment, is a typical high-risk working area, and accurately identifying the type of live equipment during automated operations is crucial. This paper investigates the detection of live equipment under complex backgrounds and noise disturbances, designs a lightweight disturbance data expansion method that fits Gaussian-stretched positional information with a recurrent neural network and iterative optimization, and proposes MD-Yolov7, an intelligent detection method for substation environmental targets that fuses multi-layer feature fusion (MLFF) and a detection transformer (DETR). To verify the performance of the proposed method, an experimental test platform was built to carry out performance validation experiments. The results show that the proposed method significantly improves detection accuracy for live equipment compared with the comparison algorithms, with a mean average precision (mAP) of 99.2%, which verifies the feasibility and accuracy of the proposed method and demonstrates its high application value.

1. Introduction

The substation is the hub of the grid that connects transmission lines at all voltage levels. Its role is to raise and lower voltage and to allocate pooled power resources. With the development of electric power technology and the rising power quality requirements of users, the automation, security and intelligent construction of substations are in urgent demand [1]. Among them, live operation is an important means of ensuring smooth and continuous power supply, and automation equipment needs to work near live high-voltage equipment. However, the environment around live equipment in substations is complex, and if the target detection of automation equipment fails and the safety clearance is violated, extremely serious consequences and economic losses may follow, severely affecting the safe production of power supply and substation projects [2]. Therefore, accurate detection of targets within the substation environment is required. Traditional detection of energized equipment such as transformers, busbars and pointer meters has limitations: it suffers serious interference from illumination and background noise, low robustness and poor self-adaptability [3].

In recent years, intelligent detection algorithms represented by deep learning, reinforcement learning and transfer learning have been widely used, and deep learning detection frameworks are the mainstream of current research. Deep learning-based target detection algorithms currently follow two main directions: two-stage target detection algorithms and single-stage target detection algorithms [4]. In two-stage detection algorithms, pre-selected boxes that may contain the objects to be detected are generated first, and convolutional neural networks are then used for sample classification, with R-CNN [5] and Fast R-CNN [6] as typical representatives.

Single-stage target detection algorithms are represented by the Single Shot MultiBox Detector (SSD) and the Yolo family. Dai et al. [7] trained an SSD deep learning model to detect weak targets in complex environments; the trained model achieved 92.35% accuracy on the test set. The Yolo family of networks combines detection accuracy with detection speed and performs excellently in detecting targets in complex environments [8]. Guan et al. [9] proposed a DCNN model based on the fusion of deep learning and Google Data Analytics, using the improved network Yolov4 to test images after quality-level classification, with a recognition accuracy of 95%. The literature [10] introduced CDMA into Yolov5 for dynamic target tracking and successfully improved the detection accuracy and real-time performance of the algorithm.

With in-depth research on applying deep learning technology to substation target recognition, the Yolo series algorithms have been partially applied in the power grid: they can track multiple grid faults in real time, detect various targets in the transmission grid in real time, and perform real-time recognition and obstacle avoidance using images collected by power inspection equipment. However, a number of scholars have pointed out the limitations of deep learning techniques. Yolov7 is currently the best-performing Yolo series algorithm, but its neck network uses single-scale feature maps, which limits the expression of feature information and thus constrains detection accuracy [11]. The literature [12] pointed out that substation targets are numerous and their backgrounds complex; how to quickly identify the device type against a complex background is also limited by the backbone network structure, so optimizing and improving the network becomes especially important.

This paper studies the intelligent identification of energized equipment under complex backgrounds and noise interference, obtains a lightweight disturbance dataset through adaptive data expansion, and then proposes a Yolov7-based intelligent detection method for substation environment targets that fuses multi-layer feature fusion (MLFF) and a detection transformer (DETR) to improve the detection accuracy of substation equipment identification.

2. Substation Target Analysis

Serious electromagnetic interference in the substation, the pronounced unstructured characteristics of the environment, severe occlusion by obstacles and other objects, and the wide variety of equipment present a huge challenge to accurate target identification.

2.1. Target Analysis

The equipment in a substation is divided into primary equipment and secondary equipment. Primary equipment mainly performs electric power production, efficient transmission, centralized distribution and electric power use, including transformers, capacitors, reactors, disconnect switches, busbars, etc.; secondary equipment measures, monitors, controls and protects the operating conditions of the primary equipment, including relay protection devices, automatic devices, measurement and control devices (current transformers, voltage transformers), metering devices, etc. In this study, the total number of substation equipment types to be identified is 26, covering energized equipment from 10 kV to 1000 kV. The safety distances between different energized equipment are shown in Table 1.

Because the electromagnetic environment is complex, the goal is to increase the sample diversity of the experimental dataset in order to improve the robustness of the trained model, while also addressing DETR's need for a large amount of sample data to achieve excellent results.

Table 1. Safety distances between vehicles and uncovered energized parts.

2.2. Lightweight Perturbation Dataset

Since there is no proprietary dataset of substation equipment, the data from related resources are insufficient and the samples lack diversity. At the same time, in order to train a Yolo recognition model with good robustness and strong generalization ability, it is necessary to expand the existing data and establish a lightweight disturbance dataset that accounts for disturbance factors such as illumination, shading and noise.

The lightweight perturbation dataset is obtained by fitting Gaussian-stretched poses to multiple photos of an independent scene containing a dynamic target using a recurrent neural network with iterative optimization. Based on Gaussian fitting by the recurrent neural network, iterative learning can quickly generate a dataset containing thousands of images with target poses, eliminating the time-consuming step of collecting real-world data.

The image is processed by Gaussian stretching based on the target pose to synthesize new images of the scene as if acquired from different camera poses. For example, each Gaussian stretch in units of (1 cm, 1 cm, 1 cm, 10˚, 10˚, 10˚) corresponds to a change of $(p_x, p_y, p_z, r_x, r_y, r_z)$ in the target pose, from which the target pose and the stretched image can be calculated; the recurrent neural network is trained to fit and iteratively learn the target pose and the 2D image of the Gaussian stretch results, yielding the constructed dataset T:

$T = \{(I_0, r_0), (I_1, r_1), \ldots, (I_n, r_n)\}$ (1)

where n is the number of synthetic dataset images. This fitting optimization process can be described as follows:

$r^* = \arg\min_{r} \rho(r, \hat{r})$ (2)

where $r$ is an arbitrary initial pose, $\hat{r}$ is the stretched desired pose, and $r^*$ is the optimized fitted pose; when $r^* = \hat{r}$ the system has converged optimally. $\rho(\cdot)$ is any cost function with a global minimum.
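
As a minimal illustration of the fitting step in Eq. (2), the sketch below drives an arbitrary initial pose toward a Gaussian-stretched desired pose with a generic optimizer. The cost function rho, the 6-DoF pose vectors and the optimizer choice are illustrative assumptions, not the paper's actual implementation, which uses a recurrent neural network.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical cost rho(r, r_hat): a simple squared distance between a candidate
# pose r and the stretched desired pose r_hat (6-DoF vectors: px, py, pz, rx, ry, rz).
# The paper only requires rho to have a global minimum.
def rho(r, r_hat):
    return float(np.sum((r - r_hat) ** 2))

def fit_pose(r_init, r_hat):
    """Iteratively optimize an arbitrary initial pose toward r_hat (cf. Eq. (2))."""
    result = minimize(rho, x0=r_init, args=(r_hat,), method="Nelder-Mead")
    return result.x  # optimized fitted pose r*

# Example: a Gaussian "stretch" drawn with (1 cm, 1 cm, 1 cm, 10 deg, 10 deg, 10 deg) scales
rng = np.random.default_rng(0)
r_hat = rng.normal(scale=[0.01, 0.01, 0.01, 10.0, 10.0, 10.0])  # desired pose offset
r_star = fit_pose(np.zeros(6), r_hat)
print(np.allclose(r_star, r_hat, atol=1e-2))  # converged when r* is close to r_hat
```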

Dataset light-weighting improves network training speed by extracting the region around the target, filtering out non-essential data, and compressing the dataset to reduce data volume and prevent overfitting, while ensuring the integrity of the dynamic target data and the accuracy of the trained network:

$T' = \{(I_0, r_0), (L(I_1), r_1), \ldots, (L(I_n), r_n)\}$ (3)

where $L(I_a) = \{I_a(x, y) \mid (x, y) \in \eta \cdot r_a^p\}$, $\eta$ is the target region range threshold, $r_a^p$ is the location range of the target region, $L(I_a)$ is the portion of image $I_a$ retained within the target region defined by the threshold, and $T'$ is the lightweight dataset.
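
The light-weighting operation $L(\cdot)$ can be sketched as a simple crop around the annotated target region, as below; the function names, the bounding-box input and the margin parameter standing in for the threshold $\eta$ are assumptions made for illustration.

```python
import numpy as np

def crop_around_target(image, box, margin=0.2):
    """Keep only the region around the target (a stand-in for L(I_a) in Eq. (3)).

    image: H x W x 3 array; box: (x_min, y_min, x_max, y_max) target region;
    margin: fraction of the box size kept around it (plays the role of eta).
    """
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    dx, dy = int(margin * (x1 - x0)), int(margin * (y1 - y0))
    x0, y0 = max(0, x0 - dx), max(0, y0 - dy)
    x1, y1 = min(w, x1 + dx), min(h, y1 + dy)
    return image[y0:y1, x0:x1]

# Build the lightweight dataset T' = {(L(I_a), r_a)} from synthetic triples (I_a, r_a, box_a)
def lightweight_dataset(samples):
    return [(crop_around_target(img, box), pose) for img, pose, box in samples]
```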

In the established dataset, perturbation factors are randomly added to create the lightly perturbed dataset, where $T = \{(I_0, r_0), (I_1, r_1), \ldots, (I_n, r_n)\}$ is the synthetic dataset containing the perturbation factors.

To address the problem of the training set being too small, data enhancement is needed. In this paper, images of signal lights, digital meters and pointer meters are generated, and data enhancement is further achieved by adding noise, rotation, horizontal and vertical flipping, and cropping to both the original and the generated images.
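
A minimal sketch of the image-level augmentations named above (noise, rotation, flips, cropping) is given below using torchvision transforms; the parameter values and output image size are assumptions, and for detection training the bounding-box labels would need to be transformed alongside the images.

```python
import torch
from torchvision import transforms

# Gaussian noise is not a built-in torchvision transform, so a small lambda adds it here.
add_noise = transforms.Lambda(lambda x: torch.clamp(x + 0.02 * torch.randn_like(x), 0.0, 1.0))

augment = transforms.Compose([
    transforms.ToTensor(),                      # PIL image -> float tensor in [0, 1]
    transforms.RandomRotation(degrees=15),      # small random rotations
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomResizedCrop(size=(640, 640), scale=(0.7, 1.0)),  # random cropping
    add_noise,                                  # additive Gaussian noise
])
```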

3. Materials and Methods

For the autonomous inspection vehicle's substation equipment identification task, a network of a certain depth is required to extract feature information, owing to factors such as complex backgrounds and low sample distinguishability. We chose the Yolov7 algorithm as the base algorithm under a common GPU operating environment to meet the requirements of backbone detection efficiency and real-time detection, and optimized the established Yolov7 through MLFF and DETR to improve its feature extraction capability; the model structure is shown in Figure 1.

3.1. Yolov7

Yolov7 [13] is one of the newest base models in the Yolo series and outperforms most known target detector network models in terms of speed and accuracy. It consists of four main modules: Input, Backbone, Neck and Head. Input performs Mosaic data enhancement, image size scaling and adaptive anchor frame calculation. The Backbone mainly consists of convolution, the E-ELAN module, the MPConv module and the SPPCSPC module.

Figure 1. MD-YOLOv7 network model.

3.2. MLFF

In the feature extraction network, the shallow feature maps generated by Yolov7 contain rich detail about target texture features. In contrast, the deep feature maps generated by the deeper layers extract richer semantic information through a larger receptive field. Feature maps from different layers are useful for substation equipment identification in different ways. To obtain a more adequate feature map representation, we fuse different feature layers to aggregate contextual information and dynamically adjust the weight of each input feature layer through adaptive learning, so as to reasonably allocate the proportion of feature information required by the detection task.

MLFF generates a new feature map by multi-scale fusion of the feature maps $P_1, P_2, P_3, P_4$ of the four layers. First, in order to weight and fuse them, the feature maps of the four layers need to be normalized to a consistent size and number of channels. The adjusted feature maps are then concatenated along the channel dimension to obtain the fused feature map $P$:

$P = F_u([P_1, P_2, P_3, P_4])$ (4)

Then, the nonlinear relationship between the channels is fitted by two fully connected layers, and the corresponding weight $s$ is generated with the Sigmoid activation function from the intermediate feature $f$:

$f = F_e(x, W) = \sigma(W_2 \delta[W_1, z])$ (5)

where $F_e$ denotes the excitation mapping, $\sigma$ is the Sigmoid function, $\delta$ is the ReLU6 nonlinear activation function, and $[\cdot, \cdot]$ is the rotation splicing transformation. The weight is multiplied with the feature map $P$ to obtain the feature map $U$, which highlights important information and suppresses redundant information.

The intermediate feature $f$ is split down the middle into two independent features along the horizontal and vertical directions; each is passed through a convolutional layer with a 1 × 1 kernel and a nonlinear activation layer to obtain the weights in the two directions, and the Softmax function then normalizes the weights over the spatial extent of the feature map to obtain the weight matrix

$W \in \mathbb{R}^{4 \times H \times W}$ (6)

The weights are expanded along their respective directions into three-dimensional arrays $\alpha, \beta, \gamma, \lambda \in \mathbb{R}^{1 \times H \times W}$ with the same dimensionality as the input features, giving the importance weight parameters of each feature map layer, completing the rescaling of the features, and yielding the new fused feature map $L_3$:

$L_3 = \alpha \cdot P_1 + \beta \cdot P_2 + \gamma \cdot P_3 + \lambda \cdot P_4$ (7)
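
A compact PyTorch sketch of the fusion described by Eqs. (4)-(7) is given below: the four feature maps are aligned in size and channels, concatenated, re-weighted channel-wise through two fully connected layers with ReLU6 and Sigmoid, and finally combined using Softmax-normalized spatial weights. The channel counts, reduction ratio and interpolation mode are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLFF(nn.Module):
    """Sketch of multi-layer feature fusion (Eqs. (4)-(7)); dimensions are illustrative."""
    def __init__(self, in_channels=(128, 256, 512, 1024), mid_channels=256, reduction=16):
        super().__init__()
        # 1x1 convs bring all four layers to the same channel count
        self.align = nn.ModuleList(nn.Conv2d(c, mid_channels, 1) for c in in_channels)
        fused = 4 * mid_channels
        # Two FC layers (squeeze-excitation style) produce the intermediate feature f
        self.fc1 = nn.Linear(fused, fused // reduction)
        self.fc2 = nn.Linear(fused // reduction, fused)
        # 1x1 conv turns the excited feature map into 4 spatial weight maps (Eq. (6))
        self.weight_conv = nn.Conv2d(fused, 4, 1)

    def forward(self, p1, p2, p3, p4):
        size = p1.shape[-2:]
        feats = [conv(F.interpolate(p, size=size, mode="nearest"))
                 for conv, p in zip(self.align, (p1, p2, p3, p4))]
        fused = torch.cat(feats, dim=1)                       # Eq. (4)
        z = fused.mean(dim=(2, 3))                            # global average pooling
        f = torch.sigmoid(self.fc2(F.relu6(self.fc1(z))))     # Eq. (5)
        excited = fused * f[:, :, None, None]                 # channel re-weighting -> U
        w = torch.softmax(self.weight_conv(excited), dim=1)   # Eq. (6): 4 x H x W weights
        a, b, g, l = w.split(1, dim=1)
        return a * feats[0] + b * feats[1] + g * feats[2] + l * feats[3]  # Eq. (7)
```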

3.3. DETR

To further improve the extraction of effective information from energized substation equipment, the DETR module is built to improve the model's target feature learning for primary and secondary equipment. As shown in Figure 2, the proposed DETR consists of an encoder, a decoder and a prediction unit. In the encoder part, N objects are converted into embedded outputs by reducing the dimensionality of the Backbone output feature map: the feature map of dimension $C \times H \times W$ is convolved with a 1 × 1 convolution kernel to obtain multiple one-dimensional vectors, which are fed to the Transformer-based encoder and decoder together with the position vectors. These embedded output vectors are independently decoded in parallel into N categories and prediction boxes by a feed-forward network with shared weights, and the entire input is predicted in parallel based on the Transformer encoder-decoder structure.

Subsequently, we transform the feature map into a sequence by flattening the spatial dimensions $H \times W$, obtaining a two-dimensional feature map of length $HW$. Finally, this feature map is combined with positional encoding, computed as follows:

$PE(pos, 2i) = \sin\left(pos / 10000^{2i/d}\right)$ (8)

$PE(pos, 2i+1) = \cos\left(pos / 10000^{2i/d}\right)$ (9)
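
The sinusoidal positional encoding of Eqs. (8)-(9) can be sketched as follows for the flattened $HW$ token sequence; the feature-map size and embedding dimension d used in the example are assumptions.

```python
import torch

def sinusoidal_positional_encoding(num_positions, d):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(...) (Eqs. (8)-(9))."""
    pos = torch.arange(num_positions, dtype=torch.float32).unsqueeze(1)   # (HW, 1)
    i = torch.arange(0, d, 2, dtype=torch.float32)                        # even indices
    div = torch.pow(10000.0, i / d)
    pe = torch.zeros(num_positions, d)
    pe[:, 0::2] = torch.sin(pos / div)
    pe[:, 1::2] = torch.cos(pos / div)
    return pe  # (HW, d), added to the flattened feature tokens

# e.g. a 20 x 20 feature map flattened to HW = 400 tokens of dimension d = 256
tokens_pe = sinusoidal_positional_encoding(20 * 20, 256)
```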

In the Transformer-based encoding stage, the attention matrix has size $(H \times W) \times (H \times W)$, and a point on the attention matrix corresponds to two different points on the feature block represented by the tokens; since the number of input tokens to the encoder equals the number of feature map elements, this defines a frame. Therefore, the DETR model has a unique advantage in the target detection task.

Figure 2. DETR structure.

4. System and Experiment

In this section, we propose an unmanned vehicle inspection system for substations, and describe the hardware and software system components in detail, as shown in Figure 3.

On this basis, performance verification experiments are carried out using DaSiamRPN, SSD, Faster R-CNN and Yolov7 as comparison algorithms, and the performance advantages of the proposed algorithm are analyzed based on the experimental results.

4.1. System Composition

The inspection vehicle test system mainly includes an environment sensing subsystem, a human-computer interaction subsystem, a drive subsystem, the unmanned vehicle and a communication module. The specific equipment parameters are shown in Table 2.

Table 2. Equipment parameters of the unmanned vehicle inspection test system.

Figure 3. Unmanned vehicle inspection and testing system.

4.2. Experimental Testing

Based on the constructed lightweight perturbation dataset, we establish training, validation and test sets, set up comparison algorithms to carry out performance validation experiments, introduce evaluation indexes such as recall, precision and mAP, and quantitatively analyze algorithm performance based on the experimental results.

$PRC = \dfrac{TP}{TP + FP}$ (10)

$REC = \dfrac{TP}{TP + FN}$ (11)

TP denotes a real target that is detected as a target; FP denotes a detection that does not correspond to a real target; FN denotes a real target that is not detected. The mean average precision (mAP) is the mean over all classes of the area enclosed by the precision-recall (PRC-REC) curve.
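
The sketch below computes precision (Eq. (10)), recall (Eq. (11)) and the per-class AP as the area under the precision-recall curve; averaging the AP over the 26 equipment classes gives the mAP. The counts and curve points in the example are purely illustrative.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    prc = tp / (tp + fp) if (tp + fp) else 0.0   # Eq. (10)
    rec = tp / (tp + fn) if (tp + fn) else 0.0   # Eq. (11)
    return prc, rec

def average_precision(recalls, precisions):
    """Trapezoidal area under the precision-recall curve for one class;
    mAP is the mean of this value over all classes."""
    r = np.asarray(recalls, dtype=float)
    p = np.asarray(precisions, dtype=float)
    order = np.argsort(r)
    r, p = r[order], p[order]
    return float(np.sum(np.diff(r) * (p[1:] + p[:-1]) / 2.0))

# Example: counts for one class at a fixed confidence threshold
print(precision_recall(tp=95, fp=3, fn=5))   # -> (~0.969, 0.95)
```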

As shown in Figure 4, the accuracy and recall of the MD-Yolov7 target detection network model stabilized and reached a fitted state after about 20 training epochs, and the mAP@0.3, which is determined by the detection accuracy and recall, also stabilized, verifying the accuracy and rapidity of the proposed target detection network model.

In order to further test algorithm performance, the comparison algorithms SSD, Faster R-CNN and Yolov7 were evaluated on the test set. Figure 5 shows a visualization of the results on six types of data: transformer, extreme-noise transformer, bus, extreme-noise bus, capacitor and extreme-noise capacitor.

1) The proposed MD-Yolov7 algorithm has the highest recognition mAP of 98.85% for the same training and test sets, which demonstrates the superiority of our proposed algorithm.

Figure 4. Feasibility validation results.

Figure 5. Selected data results.

2) The performance of Yolov7 and SSD is close, slightly higher than Faster R-CNN but lower than DaSiamRPN.

3) SSD, Faster R-CNN and Yolov7 show substantial performance degradation under extreme noise conditions, while the proposed MD-Yolov7 remains more stable, verifying the robustness of the algorithm.

To further evaluate the trained target detection models, the experiments compared the performance of the five algorithms in terms of PRE, REC, mAP and AST on the detection tasks of the 26 types of energized equipment; the results are shown in Table 3.

Table 3. Performance comparison of the five detection algorithms on the test set.

Table 3 shows the recognition performance of each algorithm on the test set; the recognition rates of the four comparison methods are around 86%. We can draw the following conclusions.

1) The proposed MD-Yolov7 has clear performance advantages in terms of PRE, REC, mAP and AST. In particular, the proposed MD-Yolov7 model is 12.5% higher than Faster R-CNN in the mAP metric;

2) The recognition time of Faster R-CNN is much longer than that of the others, yet its performance is not superior, because Faster R-CNN is a two-stage network, which leads to a slow training process. Yolov7 and SSD are single-stage networks that directly output classification and localization results, which improves training speed but incurs some loss in detection accuracy;

3) MD-Yolov7 takes the least time and achieves the best detection results, which verifies its rapid convergence and real-time recognition capability. With low resource occupancy, it converges faster and has the highest average recognition accuracy.

The above experimental results verify the correctness, validity and reliability of the proposed algorithm and the trained model, providing a new approach to safe substation operations.

5. Conclusions

In summary, in order to solve the problem of accurately identifying energized equipment in the substation environment and realize intelligent inspection operations in substations, this paper constructs a lightweight disturbance dataset and proposes MD-Yolov7, a multi-scale target detection algorithm for complex substation backgrounds. The MD fusion improvement module is established by optimizing the design of the MLFF module and the DETR module. The experimental results show that for 26 kinds of energized targets against complex backgrounds, the proposed MD-Yolov7 algorithm substantially improves recognition accuracy and efficiency compared with SSD, Faster R-CNN and Yolov7, with an average detection accuracy above 99.2% and no missed detections or false alarms, which verifies the correctness of the algorithm and model.

The next step is to further improve detection speed without degrading the performance of the algorithm and model, which will be the direction of subsequent research.

Funding

This research was funded by State Grid Limited Science and Technology Project Grant No. 5205C0220001.

Conflicts of Interest

The authors declare no conflict of interest.

References

[1] Xia, Y.Q., Gao, R.Z. and Lin, M. (2020) Green Energy Complementary Based on Intelligent Power Plant Cloud Control System. Acta Automatica Sinica, 46, 1844-1868.
[2] Chen, Y.J., Zhu, X.T. and Yu, Y.R. (2022) Empirical Analysis of Lightning Network: Topology, Evolution, and Fees. Ruan Jian Xue Bao/Journal of Software, 33, 3858-3873.
[3] Zheng, H.B., Sun, Y.H. and Liu, X.H. (2021) Infrared Image Detection of Substation Insulators Using an Improved Fusion Single Shot Multibox Detector. IEEE Transactions on Power Delivery, 36, 3351-3359.
https://doi.org/10.1109/TPWRD.2020.3038880
[4] Ala, G., Favuzza, S. and Mitolo, M. (2022) Forensic Analysis of Fire in a Substation of a Commercial Center. IEEE Transactions on Industry Applications, 56, 3218-3223.
https://doi.org/10.1109/TIA.2020.2971675
[5] Nassu, B.T., Marchesi, B. and Wagner, R. (2022) A Computer Vision System for Monitoring Disconnect Switches Distribution Substations. IEEE Transactions on Power Delivery, 37, 833-841.
https://doi.org/10.1109/TPWRD.2021.3071971
[6] Balouji, E., Backstrom, K. and McKelvey, T. (2020) Deep-Learning-Based Harmonics and Interharmonics Predetection Designed for Compensating Significantly Time-Varying EAF Currents. IEEE Transactions on Industry Applications, 56, 3250-3260.
https://doi.org/10.1109/TIA.2020.2976722
[7] Guan, X., Gao, W. and Peng, H. (2022) Image-Based Incipient Fault Classification of Electrical Substation Equipment by Transfer Learning of Deep Convolutional Neural Network. IEEE Canadian Journal of Electrical and Computer Engineering, 45, 1-8.
https://doi.org/10.1109/ICJECE.2021.3109293
[8] Zheng, H.B., Cui, Y.H. and Yang, W.Q. (2022) An Infrared Image Detection Method of Substation Equipment Combining Iresgroup Structure and CenterNet. IEEE Transactions on Power Delivery, 37, 4757-4765.
https://doi.org/10.1109/TPWRD.2022.3158818
[9] Ou, J.H., Wang, J.G., Xue, J. and Zhou, X. (2023) Infrared Image Target Detection of Substation Electrical Equipment Using an Improved Faster R-CNN. IEEE Transactions on Power Delivery, 38, 387-396.
https://doi.org/10.1109/TPWRD.2022.3191694
[10] Han, S., Yang, F. and Jiang, H. (2021) A Smart Thermography Camera and Application in the Diagnosis of Electrical Equipment. IEEE Transactions on Instrumentation and Measurement, 70, 1-8.
https://doi.org/10.1109/TIM.2021.3094235
[11] Li, J., Xu, Y. and Nie, K. (2023) PEDNet: A Lightweight Detection Network of Power Equipment in Infrared Image Based on Yolov4-Tiny. IEEE Transactions on Instrumentation and Measurement, 72, 1-12.
https://doi.org/10.1109/TIM.2023.3235416
[12] Zhou, N., Luo, L. and Sheng, G. (2019) High Accuracy Insulation Fault Diagnosis Method of Power Equipment Based on Power Maximum Likelihood Estimation. IEEE Transactions on Power Delivery, 34, 1291-1299.
https://doi.org/10.1109/TPWRD.2018.2882230
[13] Fan, Z., Shi, L. and Xi, C. (2022) Real Time Power Equipment Meter Recognition Based on Deep Learning. IEEE Transactions on Instrumentation and Measurement, 71, 1-15.
https://doi.org/10.1109/TIM.2022.3191709
