Impact of Data Augmentation Technique on Automatic Brain Tumor Detection Using ResNet CNN
1. Introduction
Brain tumors pose a major challenge in neurology, with significant implications for public health. Globally, one in five individuals will develop cancer during their lifetime. Early detection and accurate characterization are essential for effective treatment and optimal clinical outcomes. However, interpreting conventional MRI images for brain tumor detection can be complex and requires substantial expertise, as it involves separating tumor tissues from normal tissues, including gray matter and cerebrospinal fluid.
In recent years, deep learning, a branch of artificial intelligence inspired by the functioning of the human brain, has emerged as a powerful tool for analyzing medical images. Precise segmentation of brain tumors is of great importance for medical diagnosis, surgical planning, and treatment planning [1]. Most body functions are managed by the brain, including analysis, integration, organization, decision-making, and issuing commands to the rest of the body. The human brain has an extremely complex anatomical structure [2].
Specifically, it is crucial to separate tumor tissues, such as necrosis, edema, enhanced core, and non-enhanced core, from normal brain tissues, including gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). However, accurate segmentation is extremely challenging, primarily for the following reasons:
The shape, location, appearance, and size of gliomas can vary significantly from one patient to another [3].
This raises the question: how can the use of image augmentation techniques improve the accuracy of automatic brain tumor detection?
2. Literature Review
A) Introduction
Automatic detection of brain tumors from MRI images is a rapidly growing field due to its potential to improve diagnostic accuracy and accelerate patient treatment. Advances in deep learning, particularly the use of convolutional neural networks (CNNs), have shown promising results in the analysis of medical images. This literature review aims to provide an overview of existing methods, identify their strengths and weaknesses, and highlight the current challenges in the field.
Traditional Tumor Detection Methods
Early brain tumor detection methods relied primarily on classical image analysis techniques, such as contour-based segmentation, k-means algorithms, and region-growing segmentation methods. While these methods can provide reasonable results in certain cases, they are generally limited by their inability to handle the variability in tumor shapes and intensities in MRI images.
Machine Learning-Based Approaches
Before the advent of deep learning, supervised machine learning techniques, such as support vector machines (SVMs) and traditional artificial neural networks, were applied to tumor detection. These methods required preprocessing and complex feature extraction, where manually defined features such as skeletons, shapes, and intensities were extracted from images and analyzed by learning algorithms. Although these machine learning-based approaches showed improvements over traditional methods, they were still constrained by their reliance on manually extracted features.
Recent Advances in Deep Learning
Recent progress in deep learning, especially the use of CNNs, is revolutionizing the field of medical image analysis. CNNs can automatically learn discriminative features (e.g., skeletons, shapes, and intensities) directly from raw data, eliminating the need for manual feature extraction. Architectures such as AlexNet, VGG, ResNet, and more recently U-Net and its variants, are currently being used for medical image segmentation and classification.
U-Net and Its Variants
U-Net, introduced by Ronneberger et al., is particularly effective for medical image segmentation due to its “U”-shaped structure, which allows for detailed capture of spatial features while preserving global contextual information. Variants of U-Net, such as 3D U-Net and Attention U-Net, have been developed to further enhance the accuracy of tumor segmentation.
ResNet
Residual networks, or ResNet, enable the training of very deep networks without suffering from the vanishing gradient problem. Their ability to extract complex features has been exploited in several studies to improve the accuracy of brain tumor detection.
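For illustration, the sketch below shows a minimal residual block in Keras; the layer sizes and normalization choices are assumptions made for exposition, not the configuration of any model cited here.

```python
# A minimal sketch of a residual block in Keras, illustrating the skip
# connection that lets ResNet train very deep networks. Layer sizes are
# illustrative assumptions, not the configuration of any cited model.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Two 3x3 convolutions plus a skip (identity) connection."""
    shortcut = x
    if x.shape[-1] != filters:                # match channel counts
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    # Adding the input back in gives gradients a direct path around the
    # convolutions, mitigating the vanishing-gradient problem.
    y = layers.Add()([shortcut, y])
    return layers.ReLU()(y)

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = residual_block(inputs, 64)
```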
B) Current Challenges
Despite advancements, several challenges remain in the automatic detection of brain tumors:
1) Data Variability
The variability in tumor size, shape, location, and intensity makes detection particularly challenging. Deep learning models need to be robust to these variations to become clinically viable.
2) Limited Annotated Data
Annotated medical data are often scarce and expensive to obtain. Deep learning approaches require large datasets to achieve optimal performance, posing a significant challenge in the medical context.
3) Interpretability
Deep learning models are often perceived as “black boxes”, lacking interpretability, which can limit their acceptance in clinical practice.
C) Related Work
This section reviews prior research on brain tumor detection using deep learning models:
1) Almadhoun et al. [4]:
Almadhoun and colleagues proposed a deep learning-based model using an MRI dataset for brain tumor detection. In addition to the deep learning model, they applied four transfer learning models: VGG16, MobileNet, ResNet-50, and Inception V3. Their dataset contained 10,000 MRI images with a resolution of 200 × 200 pixels, divided into two categories of 5000 images each: brain tumors and non-brain tumors. The proposed deep learning model achieved superior results, with a training accuracy of 100% and a test accuracy of 98%.
2) Musallam et al. [5]:
Musallam et al. introduced a DCNN model using an MRI dataset for brain tumor detection. Their proposed lightweight model employed a few convolutions, max-pooling, and iterations. They also analyzed VGG16, VGG19, and CNN-SVM. The dataset contained four subcategories: glioma (934), meningioma (945), no tumor (606), and pituitary (909), with a total of 3394 MRI images. The suggested model achieved an overall accuracy of 97.72%, with a detection rate of 99% for glioma, 98.26% for meningioma, 95.95% for pituitary, and 97.14% for normal images.
3) Nayak et al. [6]:
Nayak et al. proposed Dense EfficientNet, a CNN-based network for detecting brain tumor images using MRI. They also analyzed ResNet-50, MobileNet, and MobileNetV2, with their Dense EfficientNet outperforming the others. The model achieved an accuracy of 98.78% and an F1 score of 98.0% after training. Their research utilized four different types of MRI to identify brain tumors, with a total dataset of 3260 MRI images.
4) Obeidavi et al. [7]:
Obeidavi et al. introduced a CNN-based residual network for early detection of brain tumors using a dataset of 2000 MRI images. They used the BRATS 2015 MRI dataset, and the results for residual networks were promising. The proposed model achieved an accuracy of 97.05%. Additionally, they reported other metrics, including an average precision of 97.05%, an overall precision of 94.43%, an average IoU of 54.21%, a weighted IoU of 93.64%, and an average BF score of 57.027%. The model was trained over 100 epochs to improve performance.
5) AlexNet (2012):
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton [8] introduced AlexNet, achieving a top-5 accuracy of 84.6% on the ImageNet dataset. This model marked a turning point in the application of deep neural networks for computer vision.
3. Methodology
a) Data Preparation
Data Collection:
The data was collected in May 2024 from the Kaggle platform. It includes 148 MRI scans of healthy brains and 154 MRI scans of brains with tumors. We applied a 60° rotation augmentation technique, resulting in more than 1000 images for each category.
Data Annotation:
Images of patients with tumors were annotated with the letter “Y” followed by a number, while images of healthy individuals were annotated with the expression “no” followed by a number, running up to the total number of images in each category.
Data Preprocessing:
Due to the limited amount of data available, we employed an image rotation augmentation technique to expand our dataset. The augmented data was then split into a training set (1990 images, roughly 90%) and a validation set (205 images, roughly 10%). Configuration and training were carried out in the Google Colab environment, which provides free GPU resources.
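For illustration, the sketch below shows one way to set up such a pipeline in Keras. The directory layout ("brain_mri/yes", "brain_mri/no"), image size, and batch size are assumptions; note also that we applied a fixed 60° rotation offline to expand the dataset, whereas this sketch uses random on-the-fly rotations of up to 60°, a common alternative.

```python
# A hedged sketch of rotation-based augmentation plus a train/validation
# split using Keras. The folder layout and sizes are assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalise pixel intensities to [0, 1]
    rotation_range=60,        # random rotations of up to 60 degrees
    validation_split=0.1,     # roughly 1990 train / 205 validation images
)

train_gen = datagen.flow_from_directory(
    "brain_mri",              # assumed root folder containing yes/ and no/
    target_size=(224, 224),
    batch_size=32,
    class_mode="binary",
    subset="training",
)
val_gen = datagen.flow_from_directory(
    "brain_mri",
    target_size=(224, 224),
    batch_size=32,
    class_mode="binary",
    subset="validation",
    shuffle=False,            # keep order stable for later evaluation
)
```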
b) Model Architecture
Model Selection:
For medical image segmentation and classification, architectures like AlexNet, VGG, ResNet, and more recently U-Net and its variants are commonly used.
ResNet: ResNet (Residual Networks) allows the training of very deep networks without suffering from the gradient degradation problem.
U-Net: U-Net and its variants are particularly effective for medical image segmentation due to their “U”-shaped architecture. This structure allows pixel-by-pixel image segmentation by predicting the class of each pixel, which is essential for detailed medical imaging tasks.
The combination of these architectures was chosen based on their ability to handle complex medical image segmentation and classification problems efficiently.
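As a hedged illustration of this choice, the sketch below builds a binary tumor/no-tumor classifier on a ResNet-50 backbone with ImageNet weights; the head layers, optimizer, and frozen backbone are assumptions for exposition, not our exact experimental configuration. It can be trained with the generators from the preprocessing sketch above.

```python
# A hedged transfer-learning sketch: a ResNet-50 backbone with a small
# binary classification head for tumor vs. no-tumor. The head design,
# optimizer, and frozen backbone are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(224, 224, 3))
base.trainable = False                       # reuse ImageNet features as-is

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # probability of "tumor"
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Training for 10 epochs, as in the experiments reported below:
# model.fit(train_gen, validation_data=val_gen, epochs=10)
```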
4. Results
Data augmentation:
Figure 1 below shows an example of an MRI image rotated 60 degrees during the data augmentation process. This technique was used to increase the dataset size and improve model training accuracy by introducing diversity in the input data.
Figure 1. An example of an MRI image that has undergone the rotation technique.
Figure 2 shows the classification report without the augmentation technique, and Figure 3 shows the classification report after applying it.
Figure 2. Classification report without augmentation technique.
Figure 3. Classification report after augmentation technique.
Figure 4 shows the loss and accuracy curves without the augmentation technique, and Figure 5 shows them after applying it.
Figure 4. Loss and accuracy curves without augmentation technique.
Figure 5. Loss and accuracy curves after augmentation technique.
Confusion matrix results
Figure 6 shows the confusion matrix results for both techniques.
Figure 6. Confusion matrix results for both techniques.
The following values and derived metrics summarize the model's performance with and without the augmentation technique.
With the augmentation technique
True Negatives (TN): 91
False Positives (FP): 10
False Negatives (FN): 10
True Positives (TP): 94
Performance Metrics:
Using these values, we can calculate key performance metrics:
1) Accuracy (Overall Correct Classification Rate):
Accuracy = (Total correct predictions)/(Total samples)
Accuracy = (TP + TN)/(TP + TN + FP + FN)
Calculation: (94 + 91)/(94 + 91 + 10 + 10) = 0.9024
Accuracy = 90.24%
2) Precision (Positive Predictive Value):
Precision = (Correctly classified tumor cases)/(Total predicted tumor cases)
Precision = TP/(TP + FP) = 94/(94 + 10) = 0.9038
Precision = 90.38%
3) Recall (Sensitivity or Tumor Detection Rate):
Recall = (Correctly classified tumor cases)/(Total actual tumor cases)
Recall = TP/(TP + FN) = 94/(94 + 10) = 0.9038
Recall = 90.38%
4) Specificity (Non-Tumor Detection Rate):
Specificity = (Correctly classified non-tumor cases)/(Total actual non-tumor cases)
Specificity = TN/(TN + FP) = 91/(91 + 10) = 0.901
Specificity = 90.1%
5) F1-Score (Harmonic Mean of Precision and Recall):
F1 = 2 × (Precision × Recall)/(Precision + Recall) = 2 × ((0.9038 × 0.9038)/(0.9038 + 0.9038)) = 0.9038
F1 = 90.38%
Without the augmentation technique
True Negatives (TN): 22
False Positives (FP): 20
False Negatives (FN): 11
True Positives (TP): 31
Performance Metrics:
Using these values, we can calculate key performance metrics:
1) Accuracy (Overall Correct Classification Rate):
Accuracy = (Total correct predictions)/(Total samples)
Accuracy = (TP + TN)/(TP + TN + FP + FN)
Calculation: (31 + 22)/(31 + 22 + 20 + 11) = 0.631
Accuracy = 63.1%
2) Precision (Positive Predictive Value):
Precision = (Correctly classified tumor cases)/(Total predicted tumor cases)
Precision = TP/(TP + FP) = 31/(31 + 20) = 0.608
Precision = 60.8%
3) Recall (Sensitivity or Tumor Detection Rate):
Recall = (Correctly classified tumor cases)/(Total actual tumor cases)
Recall = TP/(TP + FN) = 31/(31 + 11) = 0.738
Recall = 73.8%
4) Specificity (Non-Tumor Detection Rate):
Specificity = (Correctly classified non-tumor cases)/(Total actual non-tumor cases)
Specificity = TN/(TN + FP) = 22/(22 + 20) = 0.524
Specificity = 52.4%
5) F1-Score (Harmonic Mean of Precision and Recall):
F1 = 2 × (Precision × Recall)/(Precision + Recall) = 2 × ((0.608 × 0.738)/(0.608 + 0.738)) = 0.667
F1 = 66.7%
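The calculations above can be reproduced directly from the raw confusion matrix counts; the short helper below is a sketch (the function name and structure are ours, not part of the original pipeline).

```python
# Reproducing the metric calculations from the raw confusion matrix
# counts reported above (helper name and structure are illustrative).
def metrics_from_counts(tp, tn, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                   # sensitivity
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "precision":   precision,
        "recall":      recall,
        "specificity": tn / (tn + fp),
        "f1":          2 * precision * recall / (precision + recall),
    }

print(metrics_from_counts(tp=94, tn=91, fp=10, fn=10))   # with augmentation
print(metrics_from_counts(tp=31, tn=22, fp=20, fn=11))   # without augmentation
```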
5. Discussion
We trained our model for 10 epochs, given the quantity of images submitted to the ResNet CNN. Training and testing took a little under an hour on Google Colab, even though the dataset was stored in Google Drive. For the augmented data, the training and validation curves converge as the epochs increase. This is not the case for the non-augmented data, where the curves remain apart. Without augmentation, the small dataset led to overfitting: training accuracy reached 95% while validation accuracy was only 63%.
In 2022, Obeidavi et al. obtained an accuracy of 97.05% with a residual model, whereas our results show an accuracy of 90% after data augmentation. However, our approach is distinguished by a data augmentation method adapted to medical images, which improved the robustness of the model even with an initially limited dataset.
Although the augmentation technique improved the performance of the model, the initial size of the dataset remains a limiting factor. Using the BRATS databases could increase the generalization and robustness of our model across diverse cases of brain tumors.
When data is imbalanced, relying on accuracy alone can be misleading. We therefore calculated additional metrics: sensitivity, specificity, and AUC-ROC. These indicators further highlight the model's ability to detect tumors effectively.
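As a sketch of how these additional indicators can be computed, the snippet below uses scikit-learn; `model` and `val_gen` refer to the earlier sketches and are assumptions about the pipeline (the validation generator must be created with shuffle=False so labels align with predictions).

```python
# Computing sensitivity, specificity, and AUC-ROC with scikit-learn;
# `model` and `val_gen` follow the earlier sketches (assumptions).
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = val_gen.classes                      # ground-truth labels, in order
y_prob = model.predict(val_gen).ravel()       # predicted tumor probabilities
y_pred = (y_prob >= 0.5).astype(int)          # threshold at 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("AUC-ROC:", roc_auc_score(y_true, y_prob))  # threshold-independent
```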
We also evaluated an alternative architecture, U-Net, on the same dataset; its performance is reported in Table 1 below:
Table 1. Evaluation results for an alternative neural architecture.

| Model     | Accuracy (%) | Sensitivity (%) | Specificity (%) | AUC-ROC (%) |
|-----------|--------------|-----------------|-----------------|-------------|
| Our model | 90           | 90.38           | 90.09           | 90.23       |
| U-Net     | 87.5         | 82.4            | 90.5            | 91          |
6. Summary of Contributions and Future Work
We demonstrated that it is possible to improve tumor detection accuracy using the ResNet convolutional neural network model. By increasing the number of images, we take a step toward more precise early detection of brain tumors. However, this result is not perfect, and there remains room to approach the ideal of 100% accuracy. In future experiments, we aim to achieve better outcomes using the VGG16 convolutional neural network model.
Acknowledgements
Our heartfelt thanks go to the thesis supervisor, Professor James T. KOUAWA, for his availability and guidance.