Defect Detection in Manufacturing: An Integrated Deep Learning Approach
1. Introduction
Operational efficiency is a key consideration in the manufacturing industry. Errors are certain to occur during the production cycle: engineering functions are highly repetitive in nature, which raises the likelihood of errors, mistakes and defects [1]. Defect detection is a quality control procedure in production settings that reduces waste and enhances the quality of manufactured products. Traditional means of detecting defects in the pre-Industry 4.0 era revolve around manual operations, which have long cycle times and are labour intensive (Table 1).
Table 1. Various traditional approaches in defect detection.
Traditional Method | Description | Weakness
Manual Measurement | Measuring tools such as gauges, callipers and micrometers are used to investigate defects in shape or size | Subjective; high human error; limited precision and accuracy [2] [3]
Physical Inspection | Trained inspectors visually check for defects | Subjective; high human error; limited precision and accuracy [4] [5]
Ultrasonic Testing | High-frequency sound waves are used to detect defects | Directional sensitivity; inaccurate interpretation by operators [6] [7]
As noted by [8], traditional methods of detecting defects are subjective in nature due to over-reliance on human judgement. This limits the outcomes that can be achieved and motivates the need for automation. While [9] asserted that a paradigm shift from manual work to automation characterizes the Industry 4.0 age, [10] supported this view by highlighting various automated sensors that improve quality levels in the production cycle. With advanced and sophisticated technologies involving parallel computing, big data, machine learning, AI and others, most production processes are now handled intelligently to detect defects [11]. Machine learning is an approach in which patterns are identified automatically from data and then used to make predictions [12]. It involves creating computer systems that learn from a specific dataset and automatically produce models that deliver predictions. Generally, machine learning performs classification, clustering, feature selection, regression, ranking and prediction. Feature selection is the aspect of machine learning concerned with selecting highly relevant (non-redundant) variables and removing irrelevant (low-weight) ones [13]. With feature selection, model prediction accuracy is greatly improved while processing time and computational cost are reduced. Many machine learning algorithms have been used to detect production defects in the era of Industry 4.0. Defects are an important phenomenon in production quality control [14]. Defect detection studies using machine learning typically require labelled datasets for training, evaluating, and fine-tuning models. The quality, diversity, realism, and size of the dataset directly impact a model's performance, generalization, and real-world applicability [15]-[17].
For defect detection, various classes of datasets have been used: image, video, sensor, point cloud, text or document, and composite datasets. Image datasets are the most common and can be highly effective for defect detection studies using machine learning. They are often preferred for several reasons: rich visual information, availability, and ease of collection. Image datasets also offer strong synergy with deep learning and wide opportunities for transfer learning. The combination of deep learning and image datasets has revolutionized defect detection by enabling accurate and automated analysis of visual data. The ability of deep learning models to learn hierarchical features, train end-to-end, leverage transfer learning, and benefit from data augmentation has significantly advanced defect detection across a wide range of industries and applications [18]-[20]. The aim of this research is to develop a robust deep learning system that detects defects with the highest possible accuracy by integrating classification and segmentation techniques. The motivation for choosing deep learning for defect detection aligns well with the core principles of Industry 4.0: deep learning techniques are computationally efficient and can operate in automated mode. Traditional defect detection methods (Table 1) such as manual inspection and ultrasonic testing are limited by human error, subjective interpretation, and poor precision [21]. Manual methods also involve significant labour costs and extended turnaround times, which greatly reduce overall efficiency in a modern industrial setup [8]. In contrast, deep learning models automatically learn and extract relevant features from large volumes of data while delivering high accuracy and reliability in defect detection [22].
These models can process images or sensor data faster and more consistently than traditional techniques while improving production quality [23]. Importantly, the ability of deep learning models to scale with increasing data availability is critical in Industry 4.0, where large-scale IoT sensor networks continuously generate data [24].
2. Literature Review
As noted by [25], in any standardized production setting, quality control and management must be optimized. Traditional means of handling the quality of processes and end-products are manual and highly subjective. The sophistication of the entire engineering set-up demands modern solutions to monitor various aspects of production and the products. With the emergence of Industry 4.0, manufacturing processes have advanced with strong support from computing technologies, leading to improved all-round efficiency. In this context, machine learning has become the new magic wand for detecting anomalies and optimizing processes [26]. Machine learning approaches possess stronger objectivity with less human interaction. The machine learning approach is modern and relies largely on the existence of suitable data on which models are trained sufficiently to identify defects when applied to new data. Several studies have been conducted in this domain. A summary of the works is presented in Table 2.
Table 2. Summary of existing works.
Author | Summary of the Work | Results
Wu and Zhou (2021) [27] | Classified and detected defective components from industrial images using CNN and compared results with SVM, KNN, BPN, and MLP. | CNN: Accuracy 91.4%, Recall 84.9%, F1 Score 88.0%. MLP: Accuracy 85.5%. BPN: Accuracy 84.7%. KNN: Accuracy 79.6%. SVM: Accuracy 76.3%.
Westphal and Seitz (2021) [25] | Used VGG16 and Xception CNNs to detect defects in the powder bed of Selective Laser Sintering (SLS) products. | VGG16: Accuracy 95.8%, Precision 93.9%, Recall 98.0%, F1 Score 95.9%, ROC-AUC 0.982. Xception: Accuracy 89.4%, F1 Score 89.7%, ROC-AUC 0.982.
Yang et al. (2020) [28] | Conducted a survey on defect detection and compared machine learning techniques across various products. | Liu et al. (2017) CNN: Accuracy 94.68%. Kumar et al. (2018) CNN: Accuracy 86.2%. He et al. (2019) [23] Fully CNN: Accuracy 99.14%. Lv and Song (2019): 97.25%.
Rameshrao and Bhelkar (2022) [29] | Studied defects in manufacturing using CNN, RCNN, Fast RCNN, and Faster RCNN models. | Faster RCNN: Accuracy 99.90%, Fast RCNN: 98.70%, RCNN: 98.70%, CNN: 98.50%. Prediction time: Faster RCNN 0.2 s, RCNN 40 s.
Khalfaoui et al. (2022) [10] | Detected defects using sensor data in automotive production with ML models: LR, GNB, DT, LDA, RF, and DNN. | DNN: Accuracy 74%, RF: 64%, GNB: 63%, LR: 63%, LDA: 57%, DT: 56%.
Wang, Wu and Wu (2020) [30] | Detected defects in vehicle parts using VGG16 compared with HOG + SVM. | VGG16: Accuracy 95.29%, HOG+SVM: Accuracy 93.88%.
The reviews show that deep learning possesses enhanced capabilities for defect detection compared with traditional machine learning techniques. However, most research has focused solely on classification methods when addressing defect detection. This research extends the knowledge around defect detection by combining classification and segmentation approaches to build a robust system. In addition to the general related works on defect detection, the literature using the Severstal dataset was reviewed. The studies in Section 2.1 serve as benchmarks for the current research.
2.1. Related Works: Benchmarks
Abu et al. 2020 [25] used the Severstal dataset to predict surface defects in steel production using four deep learning techniques: VGG16, MobileNet, DenseNet121 and ResNet101. The dataset was pre-processed by using OpenCV to rescale images to 256 × 480 pixels, while 50 epochs and a batch size of 32 were used for their models. The accuracy of their models is shown in Table 3.
Table 3. Deep learning results of Abu et al. 2020 investigation.
Model | Accuracy
VGG16 | 50.00%
MobileNet | 79.91%
DenseNet121 | 70.34%
ResNet101 | 70.50%
The results showed that MobileNet yielded the highest accuracy of the four models.
Akhyar et al. 2023 [22] investigated surface defects in steel manufacturing using the Severstal steel dataset, the NEU dataset and the DACM dataset. They proposed a deep learning approach termed the forceful steel defect detector (FDD), which is rooted in R-CNN with deformable convolution and deformable ROI pooling to adapt to the geometric shapes of defects. Model accuracy was measured in terms of average recall (AR) and mean average precision (mAP), and comparisons were made (Table 4).
Table 4. Deep learning results of Akhyar et al. 2023 [22] investigation.
Model | Author/Year | Model Backbone | AR | mAP
YOLOv4 | Bochkovskiy, Wang and Liao 2020 [31] | CSPDarknet | 0.904 | 0.608
YOLOv5 | GitHub 2021 [32] | CSPDarknet | 0.891 | 0.601
YOLOX | Ge et al. 2021 [33] | CSPDarknet | 0.863 | 0.652
Cascade R-CNN | Qiao, Chen and Yuille 2020 [34] | ResNet 50 | 0.855 | 0.675
FDD (proposed) | | ResNet 50 | 0.969 | 0.783
3. Methodology
The implementation of the predictive system that detects defects is broken down into six segments, as represented in the workflow in Figure 1: data preparation, multi-label classification, segmentation and detection, thresholding and post-processing, output processing, and evaluation on the test set.
3.1. Data Preparation
In the ML programming domain, a dataset is necessary to build models that make predictions. The process involves feeding a relevant dataset into algorithms that learn from the patterns embedded in the data to make informed predictions [24]. Clearly, the development of effective models starts with data preparation [35] [36]. Once a relevant dataset for the targeted investigation is acquired, the data is prepared through cleaning and transformation to make it ready for computational analysis. The manufacturing dataset used in this research was published by [37]. It was sourced from Severstal, a leading steel company located in Russia. Severstal created a very large industrial data lake from the production of flat sheet steel in order to monitor defects (Figure 2).
Figure 1. System workflow.
Figure 2. Sample flat sheet steel from the Severstal dataset.
The dataset is in three segments: train images (count = 12,568), test images (count = 5506) and a train CSV (7095 rows × 3 columns). The 12,568 images span four classes with various kinds of defects; the most common defect type is class 3 (Figure 3).
The data preparation step involves resizing the images to match the fixed dimension of the input shape, creating arrays to hold the images and their corresponding class identifiers, and transforming the categorical labels into integers using LabelEncoder (Figure 4). The images' pixel intensities were also normalized to the range 0 - 1. To handle overfitting and enable the model to learn from a more generalized set of features, data augmentation techniques were applied, including shifting, rotation, zooming, and flipping. For the segmentation task, the provided run-length encoded masks were transformed into binary masks.
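The mask-decoding step can be sketched as follows. This is an illustrative implementation assuming the Severstal competition format (pairs of 1-indexed start positions and run lengths, in column-major pixel order); the function name `rle_to_mask` and the default 256 × 1600 image size are chosen for illustration.

```python
import numpy as np

def rle_to_mask(rle: str, height: int = 256, width: int = 1600) -> np.ndarray:
    """Decode a run-length-encoded string into a binary mask.

    Assumes the Severstal format: pairs of (start, length), where starts
    are 1-indexed positions in column-major (Fortran) order.
    """
    mask = np.zeros(height * width, dtype=np.uint8)
    if rle:
        values = list(map(int, rle.split()))
        starts, lengths = values[0::2], values[1::2]
        for start, length in zip(starts, lengths):
            mask[start - 1 : start - 1 + length] = 1
    return mask.reshape((height, width), order="F")

# A run of 3 defect pixels starting at position 1 fills the first
# three rows of column 0 in a small 4 x 4 example.
demo = rle_to_mask("1 3", height=4, width=4)
```

An empty string decodes to an all-zero mask, which is how defect-free images are represented.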
Figure 3. Defect type count plot.
Figure 4. Snippet of Python code for pre-processing.
3.2. Multi-Label Classification
Machine learning systems are versatile tools capable of handling a variety of tasks, including classification, regression, and segmentation [38] [39]. Classification tasks are common; they involve categorizing data into predefined classes based on input features [40] [41]. When only two labels exist in the dataset, the classification is binary. When there are more than two labels, and each sample may carry several of them at once, the task becomes multi-label classification. The Severstal dataset contains a target variable with more than two labels (defects): defect 1, defect 2, defect 3 and defect 4. A multi-label classifier is employed in the ML system of the current research, enabling prediction of the different defect classes in the dataset. The classifier is trained on the training set and evaluated on the validation set. The classifier's input consists of the generated training images (256 × 512) and the corresponding labelled masks. As represented in Figure 1, a probability score is calculated for each prediction, revealing the model's confidence in the presence of each detected defect.
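A minimal sketch of the multi-label output stage: each defect class receives an independent sigmoid probability, so one image can carry zero, one, or several defect labels at once. The function names and the 0.5 cutoff are illustrative assumptions, not the exact code used in the study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def predict_defects(logits, threshold=0.5):
    """Turn raw per-class scores into multi-label predictions.

    Unlike a softmax multi-class head, each of the four defect classes
    gets an independent sigmoid probability, so the binary labels are
    not mutually exclusive.
    """
    probs = sigmoid(logits)
    labels = (probs >= threshold).astype(int)
    return probs, labels

# Hypothetical raw scores for one sheet image (4 defect classes):
# classes 1 and 3 are flagged, classes 2 and 4 are not.
probs, labels = predict_defects([2.0, -1.5, 0.1, -3.0])
```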
3.3. Segmentation and Detection
Segmentation is another important machine learning task, common in image processing, object identification and biomedical applications [42]-[44]. The main goal of segmentation in all these applications is to partition the image under investigation into regions that are more coherent and easier to interpret visually [44] [45]. In this research, segmentation has been integrated with classification to further localize the defects detected by the classifier. The multi-label classifier in Section 3.2 identifies the types of defects that exist in a particular sheet, while segmentation is implemented to determine the part of the sheet where each defect occurs. For each defect class (Defect 1, Defect 2, Defect 3, Defect 4) identified by the classifier, pixel-level predictions indicate which part of the image contains that defect.
3.4. Thresholding and Post-Processing
Thresholding is important when classification is integrated with segmentation, as it enables proper refinement of the predictions made by the classification algorithm. Basically, the purpose of thresholding is to establish a cutoff for classifier outputs to decide which predictions should be considered valid defect detections. By adopting thresholds at the 2nd and 98th percentiles, the thresholding method ensures that predictions with low confidence (potential false positives or false negatives) are discarded. As affirmed by [46] and [47], thresholding increases the reliability of model performance. The remasking process that follows thresholding further improves the accuracy of the segmentation. Remasking involves reintroducing the predicted defects and comparing them against the original pixel mask (ground truth). This process refines the segmented regions by eliminating possible noise while validating the areas where defects are predicted [48].
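The percentile rule can be sketched as below. This is an illustrative NumPy interpretation of discarding predictions outside the 2nd-98th percentile band; the study's exact rule may differ in detail.

```python
import numpy as np

def percentile_threshold(prob_mask, low=2, high=98):
    """Zero out pixels whose predicted probability falls outside the
    [low, high] percentile band of the mask, keeping only the central,
    higher-confidence region for the subsequent remasking step.
    """
    lo, hi = np.percentile(prob_mask, [low, high])
    return np.where((prob_mask >= lo) & (prob_mask <= hi), prob_mask, 0.0)
```

Applied to a predicted probability mask, this suppresses the extreme outliers at both tails before the mask is compared against the ground truth.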
3.5. Output Processing
The re-masked images are resized back to the original resolution (256 × 1600) before final output. The final output is a CSV file containing: ImageId (the unique identifier for each image), ClassId (the identified defect class: Defect 1, Defect 2, Defect 3 or Defect 4), and EncodedPixels (the run-length encoded pixel values representing the locations of the defects).
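The EncodedPixels column can be produced by the inverse run-length transform. A minimal sketch, again assuming the 1-indexed, column-major encoding of the Severstal format; the function name is chosen for illustration.

```python
import numpy as np

def mask_to_rle(mask: np.ndarray) -> str:
    """Encode a binary mask as space-separated (start, length) pairs,
    1-indexed, in column-major (Fortran) pixel order."""
    pixels = mask.flatten(order="F")
    # Pad with zeros so runs touching the borders are detected too.
    padded = np.concatenate([[0], pixels, [0]])
    # Indices where the value changes mark run starts and run ends.
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    starts, ends = changes[0::2], changes[1::2]
    lengths = ends - starts
    return " ".join(f"{s} {l}" for s, l in zip(starts, lengths))
```

One row of the output CSV would then pair an ImageId and ClassId with the string this function returns for the corresponding predicted mask.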
3.6. Evaluation on Test Set
Evaluation of machine learning models directly involves assessment of model performance [49]. The evaluation reflects the applicability and efficiency of the models across different domains. It is important for determining the suitability of the models implemented for a specific investigation while ensuring their effectiveness in real-world applications [50]. Many metrics are used in the machine learning domain to measure model performance [51]. Given the integration of classification and segmentation, the applicable metrics include accuracy, precision, recall, F1-score, Dice coefficient and Dice loss. Accuracy measures the proportion of correct predictions made by a model out of all predictions [52]. Precision focuses on the model's ability to correctly identify only the relevant positive instances [53]; it is computed as the ratio of true positives to the sum of true positives and false positives. Recall, also known as sensitivity, evaluates the model's ability to detect all possible positive cases by dividing true positives by the sum of true positives and false negatives. The F1-score is the harmonic mean of precision and recall; it yields a balanced metric when a trade-off exists between precision and recall. The Dice coefficient is a similarity metric [54] [55] that measures the overlap between predicted and actual segmentations, with values ranging from 0 (no overlap) to 1 (perfect overlap). Conversely, Dice loss is a loss function derived from the Dice coefficient; it trains models by penalizing poor overlap between predicted and actual regions.
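The two segmentation metrics can be written down directly. A NumPy sketch; the `smooth` term is a common stabilizer for empty masks and is an assumption here, not a value taken from the study.

```python
import numpy as np

def dice_coefficient(y_true, y_pred, smooth=1e-6):
    """Overlap between ground-truth and predicted masks:
    2*|A intersect B| / (|A| + |B|); 1.0 is perfect overlap, 0.0 none."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth)

def dice_loss(y_true, y_pred):
    """Loss used to train the segmentation models: 1 - Dice, so poor
    overlap is penalized and perfect overlap gives zero loss."""
    return 1.0 - dice_coefficient(y_true, y_pred)
```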
4. Model Selection and Implementation
4.1. Initial Experimentation: Classification
Prior testing was carried out before final model selection to improve model performance in the overall system. Initial experimentation used four models for classification: EfficientNetB1, ResNet50, DenseNet121 and VGG16. EfficientNetB1 is part of a family of models that uses a compound scaling method, balancing network depth, width, and resolution to achieve high accuracy with fewer parameters [56]. The EfficientNet architecture is designed to optimize both accuracy and efficiency, which makes it suitable for large-scale image classification tasks [57]. Like EfficientNetB1, ResNet50 is a member of a model family, ResNet. As its name indicates, ResNet50 has a depth of 50 layers. It is a deep CNN architecture built around residual connections [58]. The residual connections resolve the issue of vanishing gradients and thereby support the training of deeper networks. ResNet50 can capture detailed information in images, with improved adaptability for recognizing items across images [59]. DenseNet121 is a CNN architecture built on dense connections and feature reuse [60]. Fundamentally, in DenseNet every layer is linked to every other layer in a feed-forward fashion. While in a traditional deep learning setup each layer's output usually acts as the only input to the subsequent layer, the DenseNet configuration introduces dense connections so that every layer receives the feature maps of all layers before it [61] [62]. This arrangement controls the growth of the network and the computational complexity of the algorithm, which supports improved performance [63]. VGG is also a CNN model, developed by the Visual Geometry Group (VGG) at the University of Oxford. The VGG16 architecture is the variant with 16 layers with learnable weights [64] [65]. This architecture reduces the spatial dimensions of the input image while increasing the depth of the feature maps [66]. VGG16 produces strong classification results due to its ability to focus on targeted image areas [67]. Justified by this extensive evidence of their strengths in image classification, EfficientNetB1, ResNet50, DenseNet121 and VGG16 were implemented.
The experiment showed DenseNet121 performing better than the other deep learning models (Table 5). Thus, DenseNet121 was selected for classification in the final model, as specified in the project workflow. DenseNet-121 has shown strong performance in various computer vision tasks, such as image classification, object detection, and image segmentation. Its architecture allows for better gradient flow throughout the network, reducing the vanishing gradient problem, and its dense connections promote feature reuse, allowing the network to learn more compact and efficient representations. This makes it suitable for the task of defect detection in steel plate images. Specifically, the DenseNet-121 algorithm was used for binary and multi-class classification in the main analysis.
Table 5. Result of initial experimentation (Classification).
Metric | EfficientNetB1 | ResNet50 | DenseNet121 | VGG16
Accuracy | 0.9128 | 0.9201 | 0.9234 | 0.7259
4.2. Initial Experimentation: Segmentation
Similarly, U-Net and DeepLabV3 were tested for segmentation. The architecture of U-Net is characterized by its U-shaped structure: a contracting path and an expansive path that together allow the model to capture both context and precise localization during segmentation [68]. U-Net is adaptable, efficient and high performing [69] [70]. DeepLab is a family of neural networks specifically designed for semantic image segmentation [71] [72]. Implementing these two models achieved the results in Table 6.
Table 6. Result of initial experimentation (Segmentation).
Metric | U-Net | DeepLab
Dice Coefficient | 0.7220 | 0.8518
4.3. Main Modelling: Model Training
Based on the outcomes of the initial experimentation, DenseNet121 and DeepLab were selected to build the classification and segmentation models in the system workflow. The input of DenseNet121 is configured with a shape of 100 × 100 pixels with 3 colour channels (RGB) and set up with five layers (Figure 5). The layers are divided into DenseBlocks, where an average pooling layer averages each feature map and reduces the spatial dimensions. Each fully-connected (dense) layer is followed by a ReLU activation function. Batch normalization is applied to normalize the activations, stabilize the model and improve training speed.
Figure 5. Python code to execute DenseNet121 (Google Colab).
A dropout of 0.3 was applied after the first two dense layers; thus, during model training, 30% of the units are randomly set to zero in order to prevent overfitting. A sigmoid activation function, well suited to binary classification, is added at the final dense layer. The DenseNet-121 model was trained on the pre-processed dataset. The training process involved backpropagation and gradient descent to update the model's parameters. The model was trained to identify and categorize the defects present in the images of steel plates.
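The dropout behaviour described above can be illustrated with inverted dropout, the variant modern frameworks apply at training time. This standalone NumPy sketch is not the study's Keras code; the function name and seed handling are illustrative.

```python
import numpy as np

def dropout(activations, rate=0.3, rng=None):
    """Inverted dropout: during training, a fraction `rate` of units is
    randomly zeroed, and the survivors are scaled by 1/(1 - rate) so
    the expected activation is unchanged at inference time."""
    rng = np.random.default_rng(rng)
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)
```

With rate=0.3, roughly 30% of units are dropped on each forward pass, and every kept unit is scaled up by 1/0.7.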
The DeepLabV3+ model was set up by initializing the input size as (512, 512, 3) and the number of output classes as n_classes. The n_classes parameter makes the model flexible and adaptable to the various defect classes in the dataset (Figure 6), which makes the model generalizable. The encoder block was configured with a Conv2D layer with 32 filters, a kernel size of (3, 3), and stride 2, followed by a custom convolutional layer (Conv2D_custom) with 64 filters, a kernel size of (1, 3), and stride 1, and a BatchNormalization layer. Xception blocks are applied with different sizes, such as (128, 128, 128), (256, 256, 256), and (728, 728, 728), with strides depending on the layer. The SeparableConv2D layers use 256 filters with dilation rates of 6, 12, and 18.
Figure 6. The Python code extract for DeepLabV3+ Set up (Google Colab).
4.4. Integration: Classifier Probability and Thresholding Masks
The predicted defect classes from the classifier are fed into segmentation through a decision flow based on the 2nd/98th percentile rule: areas outside the range between the 2nd and the 98th percentile in the predicted masks are neglected. With the application of thresholding, model performance (reliability and robustness) is improved by focusing on the regions most relevant to the study's predictions and discarding the rest.
4.5. Model Evaluation
After training, the classification model’s performance was evaluated using various metrics, including precision, recall, F1-score, and accuracy (Figure 7). These metrics provided a quantitative measure of the model’s ability to accurately identify and categorize the defects in the steel plate images.
Figure 7. Result of DenseNet121 classification.
The segmentation models are trained using the Adam optimizer and the Dice loss function. The Dice coefficient is used as a metric for model evaluation. The models are trained for a specified number of epochs, with the best model saved based on the maximum Dice coefficient achieved on the validation set. Each trained model is then evaluated on the train, validation, and test sets. The evaluation scores (Dice loss and Dice coefficient) for each defect class are displayed (Figure 8). Additionally, the model’s predictions on a subset of the train, validation, and test datasets are visualized by displaying the original image, the ground truth mask, and the predicted mask side by side (Figures 9-12).
These models were trained and evaluated separately for each type of defect.
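The best-model rule described above (keep the epoch with the maximum validation Dice coefficient) reduces to an argmax over the epoch history; in a Keras pipeline this is typically handled by the `ModelCheckpoint` callback with `mode='max'`. A plain-Python sketch with illustrative names:

```python
def select_best_epoch(val_dice_history):
    """Return (epoch index, dice value) for the epoch whose validation
    Dice coefficient is highest -- the epoch whose weights would be
    kept by a save-best-only checkpoint."""
    best_epoch = max(range(len(val_dice_history)),
                     key=val_dice_history.__getitem__)
    return best_epoch, val_dice_history[best_epoch]

# Hypothetical validation Dice values over five epochs.
best_epoch, best_dice = select_best_epoch([0.61, 0.70, 0.68, 0.72, 0.71])
```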
5. Results and Discussion
The DenseNet-121 model was trained and evaluated on the Severstal steel defect detection task. The model demonstrated promising results, successfully identifying and categorizing defects in the steel plate images. The performance of the
Figure 8. Performance for segmentation models for Defect 1, 2, 3 and 4.
Figure 9. Original image, the ground truth mask, and the predicted mask (Defect 1).
Figure 10. Original image, the ground truth mask, and the predicted mask (Defect 2).
Figure 11. Original image, the ground truth mask, and the predicted mask (Defect 3).
Figure 12. Original image, the ground truth mask, and the predicted mask (Defect 4).
model was evaluated using several metrics, including precision, recall, F1-score, and accuracy. These metrics were calculated for each class of defects, providing a detailed view of the model’s performance across different types of defects. The precision, recall, and F1-score provided insights into the model’s ability to correctly identify the presence of a defect and correctly categorize it. The accuracy metric provided an overall view of the model’s performance across all classes. The results showed that the model was able to achieve a high level of accuracy, demonstrating its effectiveness in identifying and categorizing defects in steel plate images.
The model performs well across all three data sets (Table 7), with slightly lower accuracy on the validation and testing sets. The values of F1 are consistent with the accuracy metrics, with a slight drop in performance from training to testing. The models have high precision, especially in the validation set. The recall is slightly lower in validation and testing sets, indicating that the model might miss some positive cases. The DeepLab for segmentation demonstrated promising results (Table 8) in identifying, categorizing, and localizing defects in the steel plate images.
Table 7. Classification Results for Train, Validation and Test dataset.
Dataset | Accuracy | Binary Cross-Entropy | F1 Score | Precision | Recall
Training | 0.9135 | 0.2042 | 0.9152 | 0.9177 | 0.9231
Validation | 0.9032 | 0.2383 | 0.9002 | 0.9503 | 0.8643
Testing | 0.8990 | 0.2301 | 0.8910 | 0.9376 | 0.8591
Table 8. Segmentation Results for Train, Validation and Test dataset.
Dataset | Dice Coefficient | Dice Loss
Training | 0.8421 | 0.1696
Validation | 0.6977 | 0.3331
5.1. Dice Coefficient for Defect Class
The coefficients range from 64.81% to 87.69% across datasets and defect classes, indicating varying performance for different defects (Table 9). Some insights can be drawn. Defect 2 consistently has the highest Dice coefficients across all sets, suggesting that the model is best at detecting this defect. Defect 1 has the lowest Dice coefficient in the testing set, hinting at potential challenges in detecting this specific defect.
Table 9. Dice Coefficient for each defect class.
Dataset | Defect 1 | Defect 2 | Defect 3 | Defect 4
Training | 0.7262 | 0.8596 | 0.7366 | 0.8112
Validation | 0.6721 | 0.8451 | 0.7160 | 0.7639
Testing | 0.6481 | 0.8769 | 0.7109 | 0.7877
5.2. Model Comparison
Compared with other commonly used image recognition models such as VGG16 and ResNet, DenseNet-121 showed superior performance. This can be attributed to its unique architecture, in which each layer is connected to every other layer in a feed-forward fashion, allowing better gradient flow and feature reuse. This architecture enables DenseNet-121 to learn more compact and efficient representations, making it more suitable for complex tasks like defect detection in steel plate images. Different variations of the DeepLab models exist; the use of DeepLabV3 provides the benefits of a powerful, efficient convolutional network that scales well with increasing amounts of data and computational resources. Each segmentation model was trained separately to detect a specific type of defect in the steel plates. Performance was evaluated using the Dice coefficient, a popular metric for image segmentation tasks that measures the overlap between the predicted and actual results. A high Dice coefficient value (Table 10) indicates a high degree of overlap and, thus, successful defect detection. The models demonstrated effective learning, with performance improving over successive training epochs. The visualization of the models' predictions further confirmed their ability to accurately detect and categorize defects.
Table 10. Comparing study models with benchmarked literature results (Accuracy and Dice Coefficient).
Model | EfficientNetB1 (Acc) | ResNet50 (Acc) | DenseNet121 (Acc) | VGG16 (Acc) | DeepLab (Dice)
Study Model | 0.9128 | 0.9201 | 0.9234 | 0.7259 | 0.8421
Amin & Akhter 2020 [73] | | | | | 0.5430
Abu et al. 2021 [25] | | 0.7050 (ResNet101-CPU), 0.7235 (ResNet101-GPU) | 0.7034 (CPU), 0.7027 (GPU) | 0.5000 |
VGG16 showed the lowest accuracy, which may relate to the smaller number of layers in its configuration, although this observation also indicates that a larger layer count does not guarantee improved accuracy. The study models achieved higher accuracies than the outcomes of [25] because the configurations were specifically tailored towards improved performance through optimization of model parameters (pooling, activation functions, batch normalization).
The study segmentation model performed better than the benchmarked paper from the literature review (Amin and Akhter 2020 [73]). DeepLabV3 delivered higher Dice coefficients across the defect classes, while the Amin and Akhter 2020 model exhibited imbalanced performance across defect classes and failed to predict defects for classes 1, 2 and 4.
The proposed methodology offers several improvements over the existing benchmarked works in defect detection. Deep learning models like VGG16 and ResNet101 used in the research by [25] yielded lower accuracies of 50% and 70.5%, respectively. In contrast, the proposed system utilizing DenseNet121 achieves a significantly higher accuracy of 92.34% (Table 11), demonstrating its superior capability in learning and identifying defect patterns. The DenseNet121 architecture outperforms other deep learning techniques by leveraging dense connections between layers, feature reuse, and improved gradient flow, all of which help mitigate the vanishing gradient problem. Moreover, the overall system in this research integrates multi-label classification and segmentation, providing both identification and precise localization of defects. This combination surpasses models that focus solely on classification, like those by [22] and [25], which achieved accuracies of 79.91% with MobileNet and 96.9% with the FDD model. Their models lacked segmentation capabilities, which limits practical application when defect localization is a top priority. Based on the initial experimentation and final modelling, the use of DeepLabV3 for segmentation in the current research further improves performance. The Dice coefficient for segmentation ranges from 64.81% to 87.69% across defect classes, providing more robust and reliable detection than the previous approach of [73], whose model achieved a Dice coefficient of only 54.30%. DenseNet121 maintains a higher feature map resolution in its configuration while the network is densely connected. This allows it to capture more detailed spatial information and produce a higher precision of 92.31%, compared with the YOLO models, which downsample the input image multiple times and achieved a highest precision of 65.20% in the work of Akhyar et al. (2023) [22].
Generally, the proposed methodology adopted in this research excels by offering a more integrated approach with the combination of classification and segmentation to improve accuracy and precision that surpasses models in existing works.
Table 11. Comparing study models with benchmarked literature results (Precision).
Model | YOLOv4 (Precision) | YOLOv5 (Precision) | YOLOX (Precision) | Cascade R-CNN (Precision) | FDD (Precision) | DenseNet (Precision)
Akhyar et al. 2023 [22] | 0.6080 | 0.6010 | 0.6520 | 0.6750 | 0.7830 |
Study Model | | | | | | 0.9231
6. Conclusion and Future Work
The system adopted in this research addresses the issues of subjectivity and human error prevalent in traditional defect detection methods by incorporating automation through deep learning techniques. As detailed in the background review, traditional defect detection methods like manual measurement and physical inspection are highly dependent on human judgment, which leads to inconsistencies, particularly when well-hidden defects are targeted. Manual techniques are also prone to operator fatigue, skill variability, and bias, which significantly reduce accuracy and repeatability. The system developed in this research combines DenseNet121 for classification with DeepLabV3 for segmentation to automate the entire defect detection process. This deep learning system, trained on the large Severstal dataset, learns to identify defects autonomously without the need for subjective human judgment. The automation offered by these models eliminates human-related errors and introduces a level of consistency that is impossible with manual methods. DenseNet121, with its dense connections, allows better feature reuse and gradient flow and produces higher accuracy in the identification and classification of defects. Similarly, segmentation using DeepLabV3 provides precise localization of defects and further improves accuracy by visualizing the exact position of the defects within an image. This two-stage approach, integrating classification with segmentation, improves the system's ability to accurately detect and localize defects while reducing the false positives and negatives that are common in manual inspections. The system also applies thresholding techniques that discard low-confidence predictions and improve the reliability of defect identification.
Future Work:
1) Multimodal Learning: In situations where more diverse datasets are available, future work should explore multimodal learning, training models to utilize information from different data sources (such as images, sensors, and text) simultaneously to create a more robust and reliable defect detection ML system.
2) Systems with Feedback Loops: Implement systems in which the outputs of the machine learning models are continuously fed back for retraining, ensuring that the models evolve and adapt to changing production dynamics and new types of defects.