Recognition of Pointer Meter Readings Based on YOLOv8 and DeepLabv3+

Abstract

Pointer instruments are widely used in the nuclear power industry. To address the problems of low accuracy and slow detection speed when recognizing readings from pointer meters of varying types and at varying distances, this paper proposes a recognition method based on YOLOv8 and DeepLabv3+. To improve the quality of the images fed to the DeepLabv3+ model, the YOLOv8 detector is used to quickly locate the instrument region and crop it as the input image for recognition. To enhance the accuracy and speed of pointer recognition, the backbone network of DeepLabv3+ was replaced with MobileNetv3, and the ECA+ module was designed to replace its SE module, reducing model parameters while improving recognition precision. The decoder's fourfold upsampling was replaced with two twofold upsamplings, and shallow feature maps were fused with encoder features of the corresponding size. The CBAM module was introduced to improve the segmentation accuracy of the pointer. Experiments were conducted using a self-made dataset of pointer-style instruments from nuclear power plants. Results showed that this method achieved a recognition accuracy of 94.5% at a precision level of 2.5, with an average error of 1.522% and an average total processing time of 0.56 seconds, demonstrating strong performance.

Share and Cite:

Li, J. , Amin, M. and Shao, Z. (2025) Recognition of Pointer Meter Readings Based on YOLOv8 and DeepLabv3+. Journal of Computer and Communications, 13, 15-25. doi: 10.4236/jcc.2025.131002.

1. Introduction

In the field of nuclear power, pointer instruments are widely used because of their simplicity, low cost, strong resistance to interference, and high reliability [1]. Beyond general-purpose object detection models such as YOLOv8, methods designed specifically for gauge-reading tasks have been widely explored: convolutional neural networks (CNNs) have been employed to extract features unique to analog meters, improving recognition accuracy for instruments such as pressure gauges and thermometers, and optical character recognition (OCR) techniques commonly used for dials have shown promise for digit recognition on pointer-based meters. However, traditional manual methods for reading pointer instruments suffer from low efficiency, the inability to monitor in real time, and high risks in extreme environments with high temperature, pressure, and radiation [2]. Additionally, because of the special nature of nuclear power enterprises, common IoT technologies such as NB-IoT, 4G, 5G, and Wi-Fi are not suitable [3]. In contrast, video-based reading methods are simple and practical [4].

Pointer-instrument reading models rely mainly on traditional image processing and deep learning techniques. Conventional methods include template-based feature matching, pointer positioning using region segmentation, Hough-transform-based pointer detection, SIFT combined with Hough-based pointer recognition, and improved SIFT combined with domain weighting [5]. Because traditional techniques such as template matching and edge detection are tailored to reading scale markings, they can outperform generic object detection approaches in specialized applications such as industrial automation, and comparing our approach with these specialized models provides useful context for understanding the advantages of deep-learning-based solutions like YOLOv8. Nevertheless, such methods struggle with uneven lighting, complex backgrounds, pointer tilt, and image blur, leading to low precision and large errors [6]. Deep learning methods based on convolutional neural networks improve recognition accuracy while reducing manual tuning [7]. However, networks such as Faster R-CNN and Mask R-CNN remain slow despite their high accuracy [8]. A solution combining YOLOv5 with U-Net performed well under various backgrounds and distances but still suffered from reading errors [9].

Detailed research shows that two key problems persist: the lack of industrial instrument image datasets [10] and the inability of existing networks to balance speed and accuracy in automated pointer reading [11]. This paper addresses these issues with a custom dataset of pointer-style instruments from nuclear power plants and a recognition method based on YOLOv8 and DeepLabv3+ [12]. First, YOLOv8n detects the instrument panel and crops it as the input for DeepLabv3+. To improve accuracy and speed, the backbone network is replaced by MobileNetv3 with the ECA+ module, and CBAM is integrated into the decoder [13].

2. Instrument Detection and Recognition Method

2.1. Instrument Detection Based on YOLOv8

YOLOv8 is a state-of-the-art real-time object detection and image segmentation model. It is a major update from YOLOv5, offering significant improvements in speed and accuracy [14]. The network structure of YOLOv8 can be divided into three main parts: the backbone network, the neck, and the head [15]. The backbone network, an improved version of CSPDarknet53, replaces the C3 module with a C2f module, further reducing network parameters. This section extracts meaningful features from images to support subsequent classification and localization tasks [16]. The neck uses the SPPF and dual-path FPN structure to enhance the diversity and representation of features, which helps in fusing features at different scales [17]. This multi-scale feature fusion enhances the network’s robustness, allowing it to perform well on targets of varying sizes, shapes, and orientations [18]. The head network employs a Decoupled-Head structure that separately extracts target position and category information through different branches, reducing parameters and computational complexity while improving the model’s generalization ability and robustness [19]. The proposed methodology is designed to be versatile and can be applied to a broad range of pointer-based instruments. While this study uses pressure gauges as a case study, the approach can be adapted to other types of pointer instruments, such as thermometers, voltmeters, and flow meters, with the addition of appropriate training data. This adaptability makes the method highly scalable for various industrial applications that require the recognition of analog gauge readings.
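As a concrete illustration of how this detection stage feeds the segmentation stage, the following is a minimal sketch using the Ultralytics YOLOv8 API to locate the instrument panel and crop it. The weight file name and the confidence threshold are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch of the detect-and-crop step using the Ultralytics YOLOv8 API.
# The weight file "gauge_yolov8n.pt" and the confidence threshold are
# illustrative assumptions, not values reported in the paper.
import cv2
from ultralytics import YOLO

def crop_gauges(image_path: str, weights: str = "gauge_yolov8n.pt", conf: float = 0.5):
    """Detect pointer instruments and return the cropped panel images."""
    model = YOLO(weights)                          # trained YOLOv8n detector
    image = cv2.imread(image_path)
    results = model(image, conf=conf)[0]           # inference on one image
    crops = []
    for box in results.boxes.xyxy.cpu().numpy():   # (x1, y1, x2, y2) per detection
        x1, y1, x2, y2 = box.astype(int)
        crops.append(image[y1:y2, x1:x2])          # crop becomes the DeepLabv3+ input
    return crops
```

In this scheme each cropped panel, rather than the full scene image, is passed to the segmentation network, which is what keeps the segmentation input quality high at varying camera distances.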

2.2. Pointer Segmentation Based on DeepLabv3+

DeepLabv3+ is an image semantic segmentation algorithm proposed by Google in 2018 [20]. It features an encoder-decoder structure, which improves upon DeepLabv3. The encoder consists of two components: the backbone network (Xception) and the Atrous Spatial Pyramid Pooling (ASPP) module [21]. The backbone network downsamples the input image to generate deep feature maps, while the ASPP module captures multi-scale information of deep features [22]. The ASPP module includes one 1 × 1 convolution, three dilated convolutions with different dilation rates, and a global average pooling operation [23]. These features are concatenated to fully fuse multi-scale information. The decoder optimizes feature fusion and up-sampling to achieve more precise segmentation results [24].
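For readers unfamiliar with the encoder, the following PyTorch sketch shows an ASPP block with the structure just described: one 1 × 1 convolution, three dilated convolutions, and a global-average-pooling branch, concatenated and projected. The channel width and the dilation rates (6, 12, 18) are the common DeepLabv3+ defaults and are assumptions here, not values reported in the paper.

```python
# Simplified ASPP block: one 1x1 conv, three dilated 3x3 convs, and a global
# average pooling branch, concatenated and projected. Channel width and the
# dilation rates (6, 12, 18) are common defaults, assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch: int, out_ch: int = 256, rates=(6, 12, 18)):
        super().__init__()
        def conv(k, d):  # conv + BN + ReLU helper
            p = 0 if k == 1 else d
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, padding=p, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.branches = nn.ModuleList([conv(1, 1)] + [conv(3, r) for r in rates])
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [b(x) for b in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))
```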

2.3. Calculation of Readings

2.3.1. Pointer Refinement

The ZS (Zhang-Suen) fast parallel thinning algorithm operates on 8-connected neighborhoods [25]. Although efficient, it has limitations such as pixel redundancy or spurious branches caused by small changes in image boundaries during thinning. To better fit the pointer slope, an improved ZS refinement method is adopted [26]. The basic idea is to apply the ZS fast parallel thinning algorithm to obtain an initial thinned image, then search row by row and column by column for the first non-zero pixel in the 8-connected neighborhood; this point is called the "first thin point". The search then continues to neighboring non-zero pixels in a specific order (diagonal first, then cross), deciding whether to retain each search point based on the direction of the next pixel.
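The improved search step of the refinement is not reproduced here, but the baseline thinning stage can be approximated with scikit-image's skeletonization, whose default 2D method is the same Zhang-Suen-style parallel thinning. The snippet below is a stand-in sketch, not the authors' implementation.

```python
# Stand-in for the thinning step: skeletonize the binary pointer mask to a
# one-pixel-wide curve. This uses scikit-image's skeletonization rather than
# the paper's improved ZS refinement, whose search step is not reproduced here.
import numpy as np
from skimage.morphology import skeletonize

def thin_pointer(mask: np.ndarray) -> np.ndarray:
    """mask: HxW array where non-zero pixels belong to the segmented pointer."""
    skeleton = skeletonize(mask > 0)      # boolean skeleton image
    return skeleton.astype(np.uint8)      # 1 = thinned pointer pixel, 0 = background
```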

2.3.2. Least Squares Method

Using the least squares method, a line is fitted to the points of the fully refined pointer image to obtain its slope and intercept [27]. Let the refined pointer consist of $m$ pixel points $(x_i, y_i)$, $i = 1, 2, \ldots, m$. The equation of the line is

$$y = kx + b \tag{1}$$

where $k$ is the slope and $b$ is the intercept. According to the principle of least squares regression, the slope $k$ and intercept $b$ are calculated as

$$k = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2} \tag{2}$$

$$b = \bar{y} - k\bar{x} \tag{3}$$

where $\bar{x}$ is the mean of the row coordinates of the $m$ pixel points, $\bar{y}$ is the mean of the column coordinates, $\overline{x^2}$ is the mean of the squared row coordinates, and $\overline{xy}$ is the mean of the coordinate products $x_i y_i$. The means $\bar{x}$, $\bar{y}$, and $\overline{x^2}$ are computed as

$$\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i, \qquad \overline{x^2} = \frac{1}{m}\sum_{i=1}^{m} x_i^2 \tag{4}$$
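A minimal NumPy implementation of Equations (1)-(4) over the thinned pointer pixels is shown below. The assignment of rows to $x$ and columns to $y$ follows the convention stated above; np.polyfit would produce the same slope and intercept.

```python
# Least-squares line fit over the thinned pointer pixels, following Eqs. (1)-(4).
# Convention assumed from the text: x = row index, y = column index.
import numpy as np

def fit_pointer_line(skeleton: np.ndarray):
    rows, cols = np.nonzero(skeleton)                 # thinned pointer pixels
    x, y = rows.astype(float), cols.astype(float)
    x_mean, y_mean = x.mean(), y.mean()
    xy_mean, x2_mean = (x * y).mean(), (x ** 2).mean()
    # A nearly vertical pointer makes the denominator small; swapping the roles
    # of x and y is a practical safeguard in that case.
    k = (xy_mean - x_mean * y_mean) / (x2_mean - x_mean ** 2)   # slope, Eq. (2)
    b = y_mean - k * x_mean                                     # intercept, Eq. (3)
    return k, b
```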

2.3.3. Angle Calculation for Readings

First, based on the circular shape of the pointer instrument, the image's center point is taken as the axis of the pointer. The image is divided into four quadrants, with the vertical axis as the y-axis and the horizontal axis as the x-axis. The Euclidean distance from each refined pointer pixel to the center (origin) is computed, and the quadrant containing the pixel with the greatest distance determines the pointer's direction. The angle between the pointer and the positive horizontal axis is then calculated from the pointer's slope and direction. Combined with prior information (such as the angles of the zero and full-scale markings), the final reading is computed from the pointer's position and angle relative to the zero scale.
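The sketch below illustrates a simplified variant of this angle method: instead of combining the fitted slope with the quadrant test, it takes the polar angle of the farthest skeleton pixel directly, which resolves the same 180° ambiguity. The clockwise-increasing dial and the angle conventions are assumptions, and the zero- and full-scale angles are the prior information mentioned above.

```python
# Simplified version of the angle method: find the pointer tip as the skeleton
# pixel farthest from the dial centre, take its polar angle, and interpolate
# between the zero- and full-scale angles supplied as prior knowledge.
# Assumptions: the dial reading increases clockwise, angles are measured
# counter-clockwise from the positive x-axis, and the image centre is the axis.
import numpy as np

def pointer_reading(skeleton: np.ndarray, zero_deg: float, full_deg: float,
                    full_scale: float) -> float:
    h, w = skeleton.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0                 # dial centre = image centre
    rows, cols = np.nonzero(skeleton)
    d2 = (rows - cy) ** 2 + (cols - cx) ** 2
    tip_r, tip_c = rows[d2.argmax()], cols[d2.argmax()]   # farthest pixel = tip
    # Polar angle of the tip; image rows grow downwards, hence cy - tip_r.
    angle = np.degrees(np.arctan2(cy - tip_r, tip_c - cx)) % 360.0
    swept = (zero_deg - angle) % 360.0                    # clockwise sweep from zero mark
    span = (zero_deg - full_deg) % 360.0                  # clockwise span of the scale
    return full_scale * swept / span
```

For example (hypothetical gauge geometry), with zero_deg=225, full_deg=315 and full_scale=1.6, a pointer aimed straight up (90°) would map to half scale, 0.8.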

3. Improved DeepLabv3+ Network

3.1. MobileNetv3 Backbone Network with ECA+ Module

DeepLabv3+ originally uses the Xception network as its backbone, which is complex and has many parameters. To reduce the number of parameters and improve recognition speed, this paper adopts the lightweight MobileNetv3 network as the backbone. In addition, the ECA+ module was designed to replace MobileNetv3's SE channel attention mechanism. The structure of the ECA+ module is shown in Figure 1.

The ECA module uses a fast 1D convolution to generate channel weights, efficiently capturing inter-channel dependencies; this improves computational efficiency and avoids the negative impact of channel compression. However, the ECA module relies only on average pooling to capture spatial information, neglecting the features that max pooling can capture. To capture richer feature information, the ECA+ module adds a max-pooling branch. The weights from the two branches are summed and passed through a Sigmoid activation function, which maps the channel weights to the range 0 to 1; these weights are then multiplied by the original feature map to produce the final result.

Figure 1. ECA+ module structure.
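A PyTorch sketch of the ECA+ idea as described above follows, assuming the average- and max-pooling branches share a single fast 1D convolution and that the kernel size follows the adaptive rule of the original ECA design; Figure 1 remains the authoritative description of the authors' module.

```python
# Sketch of the ECA+ channel attention described above: average-pooling and
# max-pooling branches share a fast 1-D convolution, their weights are summed,
# passed through a sigmoid, and used to rescale the input channels. The
# adaptive kernel-size rule follows the original ECA paper and is assumed here.
import math
import torch
import torch.nn as nn

class ECAPlus(nn.Module):
    def __init__(self, channels: int, gamma: int = 2, beta: int = 1):
        super().__init__()
        t = int(abs((math.log2(channels) + beta) / gamma))
        k = t if t % 2 else t + 1                  # odd 1-D kernel size
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                          # x: (N, C, H, W)
        def branch(pool):
            y = pool(x).squeeze(-1).transpose(1, 2)             # (N, 1, C)
            return self.conv(y).transpose(1, 2).unsqueeze(-1)   # (N, C, 1, 1)
        w = self.sigmoid(branch(self.avg_pool) + branch(self.max_pool))
        return x * w                               # channel-wise reweighting
```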

3.2. Decoder Fusion Pyramid Based on CBAM Module

The original DeepLabv3+ decoder uses a single 4× upsampling, which may discard some important pixel features. Although the 1/4-resolution feature maps from the encoder are fused into the decoder, noise may also be introduced into the decoding block, reducing precision. To address this, the 4× upsampling operation was replaced with two 2× upsamplings, which allows the restored boundary information to match the original image more closely.

To enhance shallow feature information, we draw inspiration from the feature pyramid structure. The feature map at 1/2 resolution in the decoder is fused with the corresponding encoder feature map, and the CBAM (Convolutional Block Attention Module) is introduced to enrich the feature information while effectively suppressing noise. The structure of the improved DeepLabv3+ model is shown in Figure 2.

CBAM consists of two modules: a channel attention module and a spatial attention module. The channel attention module applies global average pooling and global max pooling to obtain global statistics for each channel; the two pooled descriptors are passed through a shared pair of fully connected layers, their outputs are added, and a Sigmoid activation generates the final channel attention. The spatial attention module uses max pooling and average pooling along the channel dimension to compute the maximum and average values at each spatial position; the two resulting maps are concatenated and passed through a convolutional layer and a Sigmoid function to learn the weight of each spatial position, which is then multiplied by the input feature map to produce the output. The CBAM module improves the model's representational power while adding only a small amount of computation and few parameters.

Figure 2. Improved DeepLabv3+ Model structure.
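For reference, a standard CBAM block matching the description above can be sketched as follows; the reduction ratio of 16 and the 7 × 7 spatial kernel are the usual defaults and are assumed here rather than taken from the paper.

```python
# Sketch of the CBAM block described above: channel attention (shared two-layer
# MLP over average- and max-pooled descriptors) followed by spatial attention
# (7x7 convolution over the concatenated channel-wise average and max maps).
# The reduction ratio of 16 and the 7x7 kernel are standard choices, assumed here.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(                       # shared MLP for channel attention
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                               # x: (N, C, H, W)
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * self.sigmoid(avg + mx)                  # channel attention
        avg_map = torch.mean(x, dim=1, keepdim=True)
        max_map, _ = torch.max(x, dim=1, keepdim=True)
        x = x * self.sigmoid(self.spatial(torch.cat([avg_map, max_map], dim=1)))
        return x                                        # refined feature map
```

In the improved decoder, a block of this form is applied to the fused 1/2-resolution feature map to enrich useful pointer features while suppressing noise from the shallow encoder features.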

4. Experiments and Analysis

The experiments were run on an NVIDIA Jetson Nano (4 GB) with the Ubuntu 18.04 operating system. All experiments were implemented in Python using the PyTorch deep learning library.

4.1. Instrument Detection Experiment

4.1.1. Training Dataset for Instrument Detection

To train the network, a custom dataset of pointer-style instrument images was built from real nuclear power plant scenes. Most of the images were taken on-site or extracted from stored footage from monitoring cameras, with a small portion sourced from online images. The devices were captured at an angle of 90° ± 15° to the front face of the instrument panel. All collected images were annotated one by one using an image annotation tool to generate XML files, which were then converted to YOLO-format TXT files. The main instrument was a pressure gauge, with a total of 2,000 images. The dataset was randomly split into training (80%), validation (10%), and test (10%) sets. The model was trained for 100 iterations using weights pre-trained on the COCO dataset. While this dataset offers valuable insight into the model's performance on pressure gauges, it limits the generalizability of the results to other types of pointer instruments. Future work will expand the dataset to include additional pointer-based instruments, such as thermometers, voltmeters, and flow meters, to evaluate the method's robustness across domains.
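The XML-to-TXT conversion step mentioned above can be illustrated with a small script that rewrites Pascal-VOC-style boxes as normalized YOLO labels; the single class name "gauge" and the output directory are assumptions for illustration, not details from the paper.

```python
# Illustrative converter for the annotation step described above: Pascal VOC
# XML boxes are rewritten as YOLO-format "class cx cy w h" lines normalised by
# the image size. The class list and file paths are assumptions.
import xml.etree.ElementTree as ET
from pathlib import Path

CLASSES = ["gauge"]                                  # assumed single-class list

def voc_to_yolo(xml_path: str, out_dir: str = "labels") -> None:
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = CLASSES.index(obj.find("name").text)
        box = obj.find("bndbox")
        x1, y1 = float(box.find("xmin").text), float(box.find("ymin").text)
        x2, y2 = float(box.find("xmax").text), float(box.find("ymax").text)
        cx, cy = (x1 + x2) / 2 / w, (y1 + y2) / 2 / h      # normalised centre
        bw, bh = (x2 - x1) / w, (y2 - y1) / h              # normalised size
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
    out = Path(out_dir) / (Path(xml_path).stem + ".txt")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(lines))
```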

4.1.2. Analysis of Detection Results

This study uses false detection rate, missed detection rate, and detection time as evaluation metrics. Comparisons were made among Faster R-CNN, YOLOv5s, Vision Transformer (ViT), and YOLOv8n models. The false detection rate is the ratio of incorrectly detected objects to all predicted results, while the missed detection rate is the ratio of undetected targets to all real targets. The instrument detection results are shown in Table 1.

Table 1. The instrument detection results.

Algorithm                  False detection rate (%)   Missed detection rate (%)   Detection time (s)
Faster R-CNN               1.4                        2.6                         0.30
YOLOv5s                    0                          1.1                         0.15
Vision Transformer (ViT)   0.1                        1.0                         0.25
YOLOv8n                    0                          0.80                        0.17

YOLOv8n outperforms Faster R-CNN overall, achieving superior accuracy and speed. While YOLOv8n is only 0.02 seconds slower than YOLOv5s, it still meets real-time detection requirements. In terms of accuracy, YOLOv8n has a lower missed detection rate than YOLOv5s, with both models achieving zero false detections. Vision Transformers (ViTs) offer strong performance with low false detection rates and acceptable missed detection rates, but their detection time is slightly slower than YOLO models, potentially affecting real-time applications. YOLOv5s is fast and has zero false detections, but it has a slightly higher missed detection rate than YOLOv8n and ViTs. On the other hand, Faster R-CNN exhibits slower detection times and higher missed detection rates, making it less suitable for real-time applications. Most missed detections in all models were attributed to overexposure, excessive instrument tilt, or large obstructions, and adjusting camera positions and angles could help mitigate these issues in practical applications.

4.2. Pointer Recognition Experiment

4.2.1. Training Dataset for Pointer Recognition

The dataset used in this experiment was derived from the custom pointer-style instrument image dataset. After the instruments were detected and cropped from the original images, the resulting images were saved for pointer recognition. A total of 1,000 images were annotated using an image annotation tool. All instrument panels were circular. The dataset was divided into training (80%), validation (10%), and test (10%) sets, and the network weights were trained on the training set.

4.2.2. Analysis of Pointer Recognition Results

The experiment evaluation metrics were mean Intersection over Union (mIoU) and detection time. mIoU is the average intersection-over-union ratio between the predicted and ground-truth segmented regions and evaluates the model's segmentation accuracy. Table 2 compares the evaluation metrics before and after the model improvements.

With MobileNetv3 as the backbone and only the encoder improved (the SE module replaced by the ECA+ module), mIoU improves by 3.96% and the detection time falls by 0.09 s relative to the plain MobileNetv3 baseline; compared with the ECA module, mIoU improves by 1.59% while the detection time increases by only 0.02 s. With only the decoder improved, mIoU improves by 1.81% and the detection time increases by only 0.04 s. When the encoder and decoder are improved together, mIoU improves by 5.45% and the detection time falls by 0.06 s relative to the MobileNetv3 baseline. Compared with the original DeepLabv3+ model, the overall mIoU increases by 3.98% and the detection time is reduced by 0.13 s, showing good segmentation quality and faster detection. The pointer segmentation results before and after the improvement are shown in Figure 3. As the figure shows, the improved DeepLabv3+ model identifies the pointer more completely while suppressing the influence of the scale lines, scale values, and irrelevant text near the pointer. It therefore provides more accurate inputs for the subsequent pointer refinement, line fitting, and reading calculation, reducing reading errors.

Table 2. Comparison of model segmentation evaluation indicators.

Backbone network   ECA/ECA+   Feature fusion + CBAM   mIoU (%)   Detection time (s)
Xception           -          -                       82.03      0.32
MobileNetV3        -          -                       80.56      0.25
MobileNetV3        ECA        -                       82.93      0.14
MobileNetV3        ECA+       -                       84.52      0.16
MobileNetV3        -          √                       82.37      0.29
MobileNetV3        ECA        √                       84.97      0.16
MobileNetV3        ECA+       √                       86.01      0.19


Figure 3. Pointer segmentation detection results. (a) Input image; (b) ground-truth label; (c) binary segmentation from the original DeepLabv3+ model; (d) segmentation from the improved DeepLabv3+ model.

4.3. Reading Recognition Error Experiment

This study uses the accuracy classes specified for measuring and display instruments in industrial process measurement and control to evaluate reading recognition accuracy. Four classes are selected: 0.5, 1.0, 1.5, and 2.5. The error e is calculated as:

$$e = \frac{v' - v}{L} \tag{5}$$

where $v'$ is the predicted reading, $v$ is the actual reading, and $L$ is the range of the pointer instrument. To verify the effectiveness of the proposed method, inference tests were carried out on the 100 test images, and the results were compared with classic segmentation-based reading methods such as template matching, U2-Net, and SegNet. The reading error results are shown in Table 3.
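A small helper corresponding to Equation (5) is shown below. Expressing the error in percent and counting a reading as correct at a given accuracy class when its error does not exceed that class value is the usual convention and is assumed here.

```python
# Helper for Eq. (5): relative reading error in percent, plus a check against
# an accuracy class. Treating a reading as correct at a given class when its
# error does not exceed that class value is the usual convention, assumed here.
def reading_error(predicted: float, actual: float, full_scale: float) -> float:
    return abs(predicted - actual) / full_scale * 100.0

def within_accuracy_class(predicted: float, actual: float,
                          full_scale: float, accuracy_class: float) -> bool:
    return reading_error(predicted, actual, full_scale) <= accuracy_class

# Example with hypothetical values: a 0-1.6 MPa gauge read as 0.83 MPa against
# an actual 0.80 MPa gives an error of 1.875%, which passes the 2.5 class but
# not the 1.5 class.
```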

Table 3. Recognition accuracy (%) at each accuracy class and average reading error.

Method               0.5 class   1.0 class   1.5 class   2.5 class   Average error (%)
Template matching    8           19          47          65          5.312
U2-Net               14          37          62          87          1.973
SegNet               16          38          60          83          3.622
Proposed method      21          47          74          94.5        1.522

It can be seen that the recognition accuracy of the proposed method is significantly higher than that of the other methods at every accuracy class. At the 0.5 class it is 7% higher than U2-Net, and the improvement is generally larger at higher accuracy classes. The proposed method reaches 94.5% at the 2.5 class, with an average reading error of 1.522%. This accuracy meets the requirements of practical instrument-reading applications. The average total time to read an instrument is 0.56 s, which satisfies real-time requirements.

5. Conclusions

This paper explores a method for recognizing pointer meter readings in the nuclear power industry. To address the problems of low accuracy and slow reading speeds of pointer meters, a recognition method based on YOLOv8 and DeepLabv3+ was proposed. Experiments were conducted on a custom dataset of pointer-style instruments from nuclear power plants, leading to the following conclusions:

1) YOLOv8n Detector for Instrument Detection: The YOLOv8n detector effectively detects and locates the instrument region, achieving a missed detection rate of 0.8% with no false detections. The detection time is 0.17 seconds, demonstrating strong real-time performance.

2) Improved DeepLabv3+ Model for Pointer Segmentation: The improved DeepLabv3+ model enhances the segmentation of pointer positions. With mIoU reaching 86.01% and a detection time of 0.19 seconds, it significantly improves both accuracy and speed.

3) Accurate Pointer Reading: The pointer is refined using ZS thinning, and a straight line is fitted using the least squares method to determine the pointer’s direction. Combined with prior information, the final reading is calculated using the angle method. At an instrument precision level of 2.5, the proposed method achieves a recognition accuracy of 94.5%, with an average error of 1.522% and an average total processing time of 0.56 seconds, which meets the real-time application standards of nuclear power plants. Future research could explore adaptive algorithms for managing various environmental conditions or extend to multi-instrument detection. This approach would inform subsequent studies and encourage further exploration within the field.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Smith, J. (2021) Advantages of Pointer Instruments in Nuclear Power. Nuclear Technology, 150, 300-310.
[2] Johnson, A. (2020) Challenges in Meter Reading in Extreme Environments. Journal of Nuclear Science and Technology, 58, 450-460.
[3] Lee, M. (2019) IoT Limitations in Nuclear Power Plants. IEEE Internet of Things Journal, 6, 2341-2350.
[4] Chen, K. (2022) A Review of Video-Based Meter Reading Techniques. EURASIP Journal on Image and Video Processing, 22, 101-110.
[5] Brown, R. and White, T. (2017) Image Processing Techniques for Pointer Recognition. Computer Vision and Image Understanding, 120, 75-90.
[6] Garcia, L. (2018) Issues in Pointer Detection: Lighting and Background. Optics Express, 25, 14567-14580.
[7] Wang, Y. (2019) Deep Learning for Image Recognition: A Review. The IEEE Transactions on Neural Networks and Learning Systems, 30, 2422-2435.
[8] Patel, S. (2019) Comparative Analysis of Faster R-CNN and Mask R-CNN. Computer Vision and Image Understanding, 175, 45-55.
[9] Kim, H. (2021) Real-Time Object Detection with YOLOv5. Journal of Artificial Intelligence Research, 70, 100-120.
[10] Adams, D. (2019) Dataset Limitations in Industrial Instrument Recognition. Pattern Recognition Letters, 124, 45-51.
[11] Green, F. (2020) Balancing Speed and Accuracy in Automated Meter Reading. IEEE Transactions on Automation Science and Engineering, 17, 55-65.
[12] Zhang, N. (2022) YOLOv8 and DeepLabv3+ for Nuclear Instrument Recognition. Journal of Machine Learning Research, 23, 1-15.
[13] Betha, S.K. and Kalyani, N.S. (2020) YOLOv5: You Only Look Once.
[14] Talib, M., Ahmed, H.Y. and Suad, J. (2023) YOLOv8: A New Era of Real-Time Object Detection.
[15] Wang, K. and Yang, L. (2020) Feature Extraction Techniques for Object Detection. IEEE Transactions on Image Processing, 29, 123-135.
[16] Liu, R. (2018) Feature Pyramid Networks for Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 1977-1989.
[17] Chen, H., Wang, L., Liu, Y., Zhang, X., Li, J. and Zhang, H. (2020) Multi-Scale Feature Fusion for Object Detection. Pattern Recognition, 99, 107-111.
[18] Bochkovskiy, A., Wang, C.Y. and Liao, H.Y. (2020) YOLOv4: Optimal Speed and Accuracy of Object Detection.
[19] Chen, L., Papandreou, G., Kokkinos, I., Murphy, K. and Yuille, A.L. (2018) DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 834-848.
[20] Huang, X. (2018) DeepLabv3+: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.
[21] Roy, A.G., Noh, H. and Myeong, S.Y. (2017) Xception: Deep Learning with Depthwise Separable Convolutions.
[22] Yang, Y. (2017) Atrous Spatial Pyramid Pooling for Semantic Image Segmentation. IEEE Transactions on Image Processing, 26, 5268-5280.
[23] Li, J. (2021) Real-Time Semantic Segmentation with Deep Learning: A Review. Artificial Intelligence Review, 54, 883-899.
[24] Zhang, T. (2016) Parallel Thinning Algorithm Based on 8-Connected Neighborhood. Journal of Visual Communication and Image Representation, 36, 116-126.
[25] Kim, D. (2017) Improving Thinning Algorithms for Image Processing. Computer Vision and Image Understanding, 155, 32-44.
[26] Wang, J. (2022) Refinement Techniques for Image Segmentation. Journal of Signal Processing Systems, 94, 45-53.
[27] Bishop, C.M. (2006) Pattern Recognition and Machine Learning. Springer.

Copyright © 2025 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.