Deep Learning Approaches for the Identification and Classification of Skin Cancer
1. Introduction
Dermatoscopy and epiluminescence microscopy are tools used by medical practitioners to detect melanoma skin cancer. Our melanoma detection method uses deep learning algorithms to analyze dermoscopic images and identify melanoma reliably. Two sets of dermoscopic images were employed: one of malignant lesions and one of benign lesions. A dataset of 2357 images from the ISIC (International Skin Imaging Collaboration) archive was gathered to build the skin cancer detection system.
Melanoma cases have increased globally over the past 40 years. Melanoma is thought to account for about 1% of malignant tumors but causes about 60% of skin cancer deaths [1]. The incidence of malignant melanoma has risen in Europe and the U.S. during the previous three decades [2]. From 2002 to 2006, the annual cost of treating skin cancer with medications was 3.6 billion; from 2007 to 2011, that amount rose to 8.1 billion [3]. The death rate from melanoma is rising as well. Melanoma is curable if detected early enough, even though it spreads to other organs such as the brain and bones more quickly than other types of cancer. Once it has spread beyond the skin to other parts of the body, however, the chances of recovery almost completely vanish and therapy becomes more challenging. Early diagnosis of malignant melanoma is therefore crucial.
Building an artificial neural network that performs well on challenging tasks requires careful feature extraction and selection. These steps reduce the dimensionality of the data, eliminate redundant or irrelevant information, and increase the model’s accuracy and efficiency. For this reason, such techniques have been developed to identify malignant melanoma quickly [4]-[13]. To improve accuracy, the Convolutional Neural Network model is used with the R-CNN algorithm.
2. Literature Survey
Inspection for melanoma skin cancer requires specialized dermatologists and oncologists, which is costly and time-consuming. Recognizing skin cancer lesions also requires further education and experience. For these reasons, automatic skin cancer detection using deep learning has recently been studied. The following paragraphs discuss some recent studies on automatic skin cancer diagnosis.
In [14], the authors employed two sets of dermoscopic images, one with 1497 malignant images and the other with 1800 benign images, to train deep learning models to detect skin cancer (melanoma). They gathered a dataset of images scaled to 224 × 224 resolution from the International Skin Imaging Collaboration (ISIC). Preprocessing included one-hot encoding, normalization, data augmentation, noise removal, and the creation of training, validation, and test sets. They then employed CNN models to identify skin cancer and reported 93.58% accuracy for the CNN model [13] [15]-[22].
Kaur and Inha [23] used deep learning techniques to create a quick and precise classifier for an autonomous skin cancer detection system. The dataset, which includes 2150 benign and malignant images, was gathered from a Canadian hospital. The preprocessing stage included dropout and data augmentation. The authors compared the outcomes of the VGG16 and AlexNet models and ultimately applied a custom model that obtained an accuracy of 0.8295.
The authors of [24] used HAM10000, a skin lesion image dataset from the ISIC repository, to present an automatic skin cancer detection model. To obtain more accurate results, they used a variety of deep learning models, including CNN, VGG16, ResNet, and DenseNet. They reported a validation accuracy of 0.79 for the CNN model and 0.85 for the VGG11_BN model. The ResNet50 and DenseNet121 models reached the highest final accuracy, 0.90.
The authors of [25] presented a method of melanoma detection based on segmentation and image classification. They proposed an automated machine learning-based approach for classifying dermoscopic images and scaled the features linearly to unit variance. RGB dermoscopic images were gathered from the ISIC database and preprocessed before being fed to the system. The researchers found that the dataset was imbalanced, with 20% of the images cancerous and the remaining 80% benign. The accuracy of random forest, SVM, and quadratic discriminant analysis was 93%, 88%, and 85%, respectively, so random forest obtained the highest accuracy.
The authors of [26] developed a melanoma skin cancer detection system for better self-examination using deep learning and machine learning. The experiment used a public ISIC (International Skin Imaging Collaboration) dataset of 23,000 images, of which only 640 images of size 124 × 124 were used. Data preprocessing came first, starting with hair removal: the authors used a color enhancement technique and 2-D derivatives of Gaussian for hair removal and segmentation. To maximize accuracy, they combined convolutional neural networks, conventional machine learning classifiers, and the appearance of the skin lesion. KNN and SVM were applied together with a CNN. KNN achieved the lowest accuracy, 57.3%, SVM reached 71.8%, and SVM with CNN provided the highest accuracy, 85.5% [27]-[38].
The authors of [39] applied three deep learning models to a public dataset of 25,000 images to identify skin cancer. All images were downsized to 224 × 224, and 80% of them were used for training. To avoid overfitting, the authors applied data augmentation. They employed VGG16, ResNet, CapsNet, and a proposed ensemble; the proposed ensemble achieved the highest accuracy, 93.5%.
The authors of [40] applied two deep learning models to a public dataset of 3097 images to detect skin cancer. The images were resized to 224 × 224 pixels, and 2437 of them were used for training. Data augmentation was used to prevent overfitting. Of the two models applied, ResNet101 and InceptionV3, InceptionV3 obtained the highest accuracy, 87.42%.
The authors of [41] introduced automatic melanoma detection using deep learning approaches with some updated models. Melanoma is the deadliest variety of skin cancer. To address this, they proposed a deep learning-based method for automatically recognizing and segmenting melanoma lesions. A novel lesion classifier categorizes skin lesions as melanoma or non-melanoma based on the results of pixel-wise classification. Several models drew on the most extensively researched state-of-the-art techniques in the literature, including FrCN, CDNN, FCN, and UNET. The accuracy and Dice coefficient were 0.95 and 0.92 on the PH2 dataset and 0.95 and 0.93 on the ISIC 2017 dataset.
The authors of [42] proposed a method for predicting melanoma using deep learning. For the validation cohort, they gathered data from 11 patients using the open-source CPTAC-CM dataset. Every image was downsized to 224 × 224 resolution. The work used CNN, ResNet50, DenseNet201, and InceptionV3. The authors first trained three pre-trained CNNs on the dataset and added SVM classifiers, producing three models: inceptionSVM, DenseSVM, and ResSVM. They then combined the output of the three models into a DeepSVM ensemble model and included a Clinical Model built from clinical data. The final accuracy was 72.2%, lower than that of existing works.
The authors of [43] used several deep learning models to detect melanoma from dermoscopic images. They used the public HAM10000 dataset. In preprocessing, they resized all images to 227 × 227 resolution; rotation and cropping were used for augmentation. They took 80% of the data for training and 20% for testing. VGG16, InceptionV3, AlexNet, Xception, and a modified Xception were applied. The custom Xception model gave the best accuracy, reported as 100% with a 95.53% F1 score.
Ghadah Alwakid and others [44] proposed a method for detecting skin cancer using deep learning algorithms. The technique uses ESRGAN to pre-process the lesion image, which is then analyzed by a CNN and a modified version of ResNet-50. The accuracy of the proposed method, tested on the HAM10000 dataset, was 86%, exceeding earlier research. As future work, a more varied dataset, including lesion-free skin, is to be used to test the method’s effectiveness, and the method is to be assessed with alternative deep learning models such as VGG, AlexNet, or DenseNet.
We studied a wide range of publications and blogs on using deep learning to detect melanoma. The majority used a limited number of images and outdated deep learning algorithms. Furthermore, several articles did not include data augmentation or advanced pre-processing techniques, whereas this paper applies updated pre-processing techniques and a wide range of data augmentation. We therefore applied a CNN model based on the ResNet50 architecture to the selected dataset, along with different data augmentation techniques.
In this research, dermoscopic pictures and deep learning approaches were used to identify melanoma skin cancer disease. The 2357 dermoscopic images were collected from the ISIC dataset. In preprocessing, we applied reshaping techniques, normalization, and data augmentation to prevent overfitting. In addition, a Convolutional Neural Network with Resnet50 has been applied. Finally, this model obtained better accuracy than other works.
Deep learning is used in this paper to detect melanoma skin cancer. This work has made the following notable contributions:
A major contribution of this project is to resize all the images of the ISIC dataset, which contains 2357 dermoscopic images, into 64 × 64. The dataset includes nine classes.
A Convolutional Neural Network has been applied to the dataset to classify melanoma skin cancer.
To prevent overfitting, data augmentation has been applied to every model.
This research’s novelty is using a deep learning model named Resnet50 with advanced pre-processing techniques and several data augmentation techniques to construct a system that automatically detects skin cancer using the ISIC dataset.
3. Proposed Methodology
This section briefly covers the dataset, preprocessing, and deep learning models. Convolutional neural networks (CNNs) are one kind of artificial intelligence model whose primary use is image analysis. A CNN operates by dividing the image into smaller regions and searching them for patterns. These patterns, such as shapes and edges, aid in recognizing objects in the image. Numerous CNN methods exist for a wide range of applications, including image segmentation, object detection, and recognition. Many widely used CNN architectures, such as InceptionNet, ResNet, DenseNet, VGGNet, and YOLO, have demonstrated high accuracy in a variety of computer vision applications [45]. The CNN design contains several distinct kinds of layers: the input layer, the convolutional layer (which aids in pattern recognition and feature extraction), the activation layer, the pooling layer, the fully connected layer, and the output layer. Each of these layers contributes to the melanoma detection system.
3.1. Dataset
A total of 2357 images were gathered from the ISIC database, which is also available on Kaggle [46]. The images show malignant and benign lesions. Figure 1 shows some images from the dataset. The dataset is divided into subsets, which contain the same number of pictures. The subsets mainly correspond to the types of disease. The subsets are as follows:
Figure 1. Dataset [46].
3.2. Pre-Processing
Preprocessing is required to get image data ready for model input. For example, the fully connected layers of convolutional neural networks require that each image be an array of the same size. Preprocessing may also accelerate model inference and reduce the time needed for model training [47]. We applied several pre-processing techniques in this work.
Normalization: Normalization is a pre-processing method for standardizing data; in other words, it brings data from disparate sources into the same range. If the data is not normalized before training, the network may experience issues that make training much more difficult and slow down learning. The shape of an image array is (450, 600, 3), where 3 denotes the three channels: red, green, and blue. The mean is taken across axes (0, 1), that is, over the height and width of each channel.
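As an illustration of the normalization step, the sketch below scales 8-bit pixel values to [0, 1] and subtracts the per-channel mean computed over the height and width axes (axes 0 and 1). The function name `normalize_image`, the tiny 2 × 2 × 3 image, and the choice to mean-center after scaling are assumptions; the text does not give the exact formula used.

```python
def normalize_image(image):
    """Scale 8-bit pixels to [0, 1] and subtract the per-channel mean.

    `image` is a nested list with shape (height, width, channels).
    The mean is taken over axes (0, 1), so each channel gets one mean.
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    scaled = [[[value / 255.0 for value in pixel] for pixel in row] for row in image]

    means = []
    for c in range(channels):
        total = sum(scaled[y][x][c] for y in range(height) for x in range(width))
        means.append(total / (height * width))

    centered = [
        [[pixel[c] - means[c] for c in range(channels)] for pixel in row]
        for row in scaled
    ]
    return centered, means


# Toy 2 x 2 RGB image (illustrative values only).
tiny_image = [
    [[0, 128, 255], [255, 128, 0]],
    [[0, 128, 255], [255, 128, 0]],
]
normalized, channel_means = normalize_image(tiny_image)
```

After this step every channel has zero mean, so no single channel dominates the early weight updates simply because of its brightness range.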
Resize images: CNNs analyze images as matrices of pixel values, so the number of pixels affects the computational cost. Larger images, such as 512 × 512, have many more pixels and require more calculations. Each pixel increases the dimensionality of the input, which raises the number of operations in each convolutional layer and the number of weights in the first fully connected layer. Scaling images to a square shape guarantees uniform input dimensions, which facilitates CNN model design and training. Resizing to a smaller resolution such as 64 × 64 reduces the number of input values from hundreds of thousands to a manageable 4096 (for greyscale images) or 12,288 (for RGB images). This significant reduction in memory use and computational cost allows faster training on less hardware. All images are therefore resized to 64 × 64 for the Convolutional Neural Network. This standard size is also required for batch processing, which increases training speed and stability.
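The figures quoted for the resized input follow from simple arithmetic, sketched below. The helper name `input_values` is hypothetical, and the (450, 600, 3) shape is the image array shape given for the normalization step.

```python
def input_values(height, width, channels):
    """Number of scalar values a CNN receives for one image."""
    return height * width * channels


full_rgb = input_values(450, 600, 3)    # original array shape from the text
small_gray = input_values(64, 64, 1)    # 64 x 64 greyscale
small_rgb = input_values(64, 64, 3)     # 64 x 64 RGB
reduction = full_rgb / small_rgb        # how many times fewer values per image
```

Under these numbers, resizing cuts the per-image input by a factor of roughly 66.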
Data augmentation: Data augmentation uses existing data to create modified copies of a dataset, thereby artificially expanding the training set. It entails introducing small changes to the existing data or producing new data points via deep learning. The augmentation parameters we used are listed in Table 1.
Table 1. Data augmentation.
| Type | Value |
|---|---|
| Rotation | +40, −40 |
| Width-shift | 0.2 |
| Shear | 0.2 |
| Zoom | 0.2 |
| Horizontal-flip | True |
| Fill | nearest |
Table 1 lists the data augmentation types and their corresponding values.
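The settings in Table 1 can be read as ranges from which a random transform is drawn for each training image. The sketch below is a minimal pure-Python illustration under that assumption; the dictionary keys, the symmetric sampling ranges, and the helper functions are hypothetical, since the text does not name the library or code used. Only the flip and the width shift with nearest-pixel fill are implemented here.

```python
import random

# Values copied from Table 1; the dictionary keys are illustrative names.
AUGMENTATION = {
    "rotation_degrees": 40,   # assumed to be sampled from [-40, +40]
    "width_shift": 0.2,       # assumed fraction of the image width
    "shear": 0.2,
    "zoom": 0.2,              # assumed zoom factor range [0.8, 1.2]
    "horizontal_flip": True,
    "fill": "nearest",
}


def sample_parameters(rng):
    """Draw one random set of transform parameters within the Table 1 ranges."""
    cfg = AUGMENTATION
    return {
        "rotation": rng.uniform(-cfg["rotation_degrees"], cfg["rotation_degrees"]),
        "width_shift": rng.uniform(-cfg["width_shift"], cfg["width_shift"]),
        "shear": rng.uniform(-cfg["shear"], cfg["shear"]),
        "zoom": rng.uniform(1 - cfg["zoom"], 1 + cfg["zoom"]),
        "flip": cfg["horizontal_flip"] and rng.random() < 0.5,
    }


def horizontal_flip(image):
    """Mirror each row of a (height, width) nested list."""
    return [row[::-1] for row in image]


def shift_width_nearest(image, pixels):
    """Shift columns right by `pixels` (left if negative).

    Positions uncovered by the shift repeat the nearest edge pixel.
    """
    width = len(image[0])
    return [
        [row[min(max(x - pixels, 0), width - 1)] for x in range(width)]
        for row in image
    ]


image = [[1, 2, 3, 4, 5]]
params = sample_parameters(random.Random(0))
flipped = horizontal_flip(image)
shifted = shift_width_nearest(image, 1)
```

Because a fresh set of parameters is drawn for every image in every epoch, the network rarely sees exactly the same input twice, which is what limits overfitting.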
3.3. Feature Selection and Finalization
To learn the best features and allow for easy backpropagation of errors, a basic network is required for feature selection. Yet, to understand the intricate manifold of data and recreate it using a small number of carefully chosen features, data reconstruction requires a complex network [48]. To construct a machine learning neural network that can function well on challenging tasks, feature extraction and selection are essential stages. They aid in reducing the number of dimensions in the data, eliminating superfluous or unnecessary information, and enhancing the effectiveness and precision of the model.
Based on the ABCD rule, the tumor in the image is described by seven shape features and one derived color feature. The shape features are represented by (A, B, D), while the chromatic information is represented by (C). Feature importance is a stage in the construction of a machine learning model that scores each input feature to determine its significance in the decision-making process. The higher a feature’s score, the greater its influence on the model’s prediction of the target variable.
3.4. Deep Learning Algorithms
One model, a CNN classifier, has been used in this project. It is described below.
Convolutional Neural Network: In this project, a Convolutional Neural Network has been used to build the model. After preprocessing, the network takes the images resized to 64 × 64. The input feature map is (64, 64, 3), where 3 denotes the three color channels: red, green, and blue. The first convolutional layer has 16 filters of size 3 × 3 and is followed by a 2 × 2 max pooling layer; together they form the first convolutional block. The second block has 32 filters of size 3 × 3 followed by a 3 × 3 max pooling layer. The third block has 64 filters of size 3 × 3 followed by a 3 × 3 max pooling layer. A fully connected layer with 512 hidden units and ReLU activation follows, with a dropout rate of 0.5. The Adam optimizer was used with a learning rate of 0.001, and the model was trained for 30 epochs. Figure 2 shows a sample architecture of this model. The model has 2,124,839 parameters in total.
Figure 2. CNN architecture [49].
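The reported parameter total can be checked with a few lines of arithmetic. The sketch below is a reconstruction under stated assumptions, not the authors' code: the text does not give the padding mode or the number of output units, so the function assumes 3 × 3 kernels with 'same' padding and a final dense output layer. Under those assumptions, 2 × 2 pooling in all three blocks with 7 output units reproduces the reported 2,124,839 parameters, whereas the 3 × 3 pooling and nine classes described in the text give a smaller count.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel convolution."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units


def count_parameters(input_size=64, channels=3, filters=(16, 32, 64),
                     pools=(2, 2, 2), hidden_units=512, classes=7):
    """Total trainable parameters for three conv/pool blocks and two dense layers.

    Assumes 'same' padding, so only the pooling layers shrink the feature map.
    """
    total, size, in_channels = 0, input_size, channels
    for n_filters, pool in zip(filters, pools):
        total += conv_params(3, in_channels, n_filters)
        size //= pool                  # pooling layers have no parameters
        in_channels = n_filters
    flattened = size * size * in_channels
    total += dense_params(flattened, hidden_units)
    total += dense_params(hidden_units, classes)
    return total


assumed_total = count_parameters()                             # 2 x 2 pools, 7 outputs
as_described = count_parameters(pools=(2, 3, 3), classes=9)    # pooling as in the text
```

Almost all of the parameters sit in the first dense layer (4096 × 512 weights under the assumed configuration), which is why the pooling sizes have such a large effect on the total.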
Convolutional Layer: A CNN consists of three types of layers: the convolutional layer, which extracts features; the pooling layer, which reduces the spatial dimensions; and the fully connected layer, which uses the extracted features for tasks such as regression or classification. Figure 2 shows the CNN architecture. In an ordinary neural network, each input neuron is connected to every neuron in the subsequent hidden layer. In a CNN, each hidden layer neuron is connected to only a small region of the input layer neurons.
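The local connectivity described above can be illustrated with a small convolution written out by hand. This is a minimal sketch: the function name, the 4 × 4 image, and the edge-detecting filter are illustrative choices, and no padding is applied.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, as used in CNN layers.

    Each output value depends only on a kernel-sized patch of the input.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j] for i in range(k) for j in range(k))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# A vertical edge: dark columns on the left, bright columns on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 3 x 3 filter that responds to left-to-right brightness increases.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = convolve2d(image, vertical_edge)
```

In a trained CNN the filter values are learned rather than hand-picked, but the sliding-window computation is the same.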
Pooling Layer: The pooling layer reduces the dimensionality of the feature map. A CNN’s hidden layers contain numerous activation and pooling layers. Pooling layers, sometimes referred to as downsampling layers, are crucial for deep learning applications: they reduce the width and height of the data they receive while retaining the most significant information. The convolution and pooling layers are hidden layers, since they are positioned between the input and output layers. In a fully connected layer, by contrast, the weights connect to every output of the preceding layer.
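A short sketch of max pooling shows how width and height shrink while the strongest responses survive. The function name and the 4 × 4 feature map are illustrative; non-overlapping windows (stride equal to the window size) are assumed.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling.

    Trailing rows or columns that do not fill a whole window are dropped.
    """
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[y * size + i][x * size + j]
                for i in range(size)
                for j in range(size)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool2d(feature_map)   # 4 x 4 input becomes 2 x 2
```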
Fully-Connected Layer: The final several layers of the network are fully connected (FC) layers. Their input is the output of the last pooling or convolutional layer, flattened into a vector. In an FC layer, often called a dense layer, each neuron in the preceding layer is connected to every neuron in the current layer; this total connectivity gives the layer its name. FC layers generate the final output predictions and are normally located at the end of a neural network architecture.
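The path from a flattened feature vector to class probabilities can be sketched as follows. The weights and inputs are toy values chosen for illustration, and the use of softmax on the output layer of this CNN is an assumption.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: every input contributes to every output unit."""
    return [
        sum(x * w for x, w in zip(inputs, unit_weights)) + b
        for unit_weights, b in zip(weights, biases)
    ]


def relu(values):
    """Rectified linear unit: negative activations become zero."""
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


flattened = [0.5, -1.0, 2.0]   # flattened pooling output (toy values)
hidden = relu(dense(flattened, [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]], [0.0, 0.0]))
logits = dense(hidden, [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], [0.0, 0.0, 0.0])
probabilities = softmax(logits)
```

The predicted class is the index of the largest probability; here that is class 0.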
ResNet-50 Architecture: ResNet-50 is a CNN architecture in the Residual Network (ResNet) family, a group of models created to tackle the difficulties of training deep neural networks. ResNet-50 was created by Microsoft Research and is known for its depth and effectiveness in image classification. Training a robust image classification model such as ResNet50 on large datasets can achieve state-of-the-art results. Its key advance is the use of residual connections, which let the network learn a set of residual functions that map the input to the desired output. Thanks to these connections, the network can learn far deeper structures without suffering from vanishing gradients.
ResNet50’s architecture has four primary components: convolutional layers, convolutional blocks, identity blocks, and fully connected layers. The convolutional layers extract features from the input image, the identity and convolutional blocks process and transform those features, and the fully connected layers produce the final classification. The convolutional layers are followed by batch normalization and ReLU activation and extract features such as edges, textures, and shapes from the input image. Max pooling layers come after the convolutional layers and reduce the spatial dimensions of the feature maps while keeping the salient features. The identity block and the convolutional block are the two main building blocks. The identity block passes the input through several convolutional layers and then adds the input back to the result, so the network learns residual functions that map the input to the desired output. The convolutional block resembles the identity block but includes a 1 × 1 convolutional layer that reduces the number of filters before the 3 × 3 convolutional layer. The fully connected layers make up the last part of ResNet50 and determine the final classification: the output of the last fully connected layer is fed into a softmax activation function to obtain the class probabilities.
Deep residual networks also have limitations. Networks with many layers are better at memorizing training data, which can cause overfitting, especially when the dataset is small; regularization methods such as weight decay and dropout may be needed to prevent it. Numerous hyperparameters, including the learning rate, network depth, batch size, and skip-connection topology, need to be tuned, which can be computationally expensive and time-consuming. Although ResNets mitigate the vanishing gradient problem, they can occasionally suffer from exploding gradients, in which gradients grow extremely large; this may make training unstable and complicate convergence. Residual networks are deeper than typical neural networks, so they demand more memory and computing power during both training and inference, which makes them harder to train and deploy in resource-constrained settings. Finally, skip connections only partially alleviate the vanishing gradient problem, which can still occur in very deep networks; gradients can then become very small, making training harder and slowing convergence or leading to inferior solutions.
The ResNet50 model has some drawbacks, such as sensitivity to imbalanced data, overfitting, and a reduced capacity for detecting small objects. The ResNet50 model can also be imprecise.
ResNet-50 is made up of 50 layers split up into 5 blocks, each of which has a collection of residual blocks. Resnet50 Architecture is shown in Figure 3. The network can learn more accurate representations of the input data by using the residual blocks, which enable the preservation of information from previous levels [50].
Figure 3. Resnet50 architecture [51].
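The effect of a residual connection can be illustrated with a toy example in which the stacked layers F(x) produce very small outputs. This is a simplified sketch, not ResNet50 itself: the function names are hypothetical, plain lists stand in for feature maps, and a fixed scaling by 0.01 stands in for the convolutions.

```python
def residual_block(x, transform):
    """Identity-style residual block: output = F(x) + x."""
    fx = transform(x)
    return [f + xi for f, xi in zip(fx, x)]


def small_transform(x):
    """Stand-in for the stacked convolutions F(x); it shrinks its input."""
    return [0.01 * v for v in x]


activations = [1.0, -2.0, 3.0]

with_skip = activations
for _ in range(50):              # 50 stacked blocks with skip connections
    with_skip = residual_block(with_skip, small_transform)

without_skip = activations
for _ in range(50):              # the same layers without skip connections
    without_skip = small_transform(without_skip)
```

With the skip connection the signal keeps a usable magnitude after 50 blocks, while without it the values collapse toward zero. Gradients flowing backward through the same path behave analogously, which is the intuition behind residual networks easing the vanishing gradient problem.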
4. Results and Discussions
In this study, we identified melanoma skin cancer using deep learning techniques and dermoscopic images. We used the 2357 dermoscopic images of the ISIC dataset. During preprocessing, we applied normalization and reshaping, and we used data augmentation to avoid overfitting. A Convolutional Neural Network that incorporates R-CNN was also used, and ResNet50 was used for image classification. The pre-processing methods applied to the training images and the data augmentation contributed to an improvement in accuracy; with all of these methods, the accuracy increased to 89.03%. In the end, this model outperformed previous efforts in accuracy. Table 2 reports the training, validation, and test accuracy.
Table 2. Training Accuracy, Test Accuracy, and Validation Accuracy.
| Model Name | Training Accuracy | Validation Accuracy | Test Accuracy |
|---|---|---|---|
| ResNet50 | 98% | 90% | 94% |
4.1. Confusion Matrix
A confusion matrix is a tabular summary of a classification model’s performance that compares the predicted labels with the true labels. It counts the true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) among the model’s predictions. For N target classes it is an N × N matrix. By showing which classes are confused with one another, the matrix gives a comprehensive picture of the types of mistakes the model makes and helps improve its predictive precision. Figure 4 shows the confusion matrix. The accuracy for “nv” (nevus) is 0.98, indicating that it is easily recognizable. “akiec” and “bcc” are also usually identified correctly, at 0.73 and 0.81, respectively.
Figure 4. Confusion matrix.
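The construction of a confusion matrix and the per-class values read from its diagonal can be sketched as follows. The labels below are made up for illustration, and interpreting the per-class figures as row-normalized diagonal entries (recall) is an assumption; only the class names come from the text.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Build an N x N matrix: rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def per_class_recall(matrix):
    """Diagonal count divided by the row total for each true class."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


classes = ["akiec", "bcc", "nv"]   # class names quoted in the text
y_true = ["nv", "nv", "nv", "bcc", "bcc", "akiec", "akiec", "nv"]   # made-up labels
y_pred = ["nv", "nv", "nv", "bcc", "nv", "akiec", "bcc", "nv"]      # made-up predictions
matrix = confusion_matrix(y_true, y_pred, classes)
recall = per_class_recall(matrix)
```

Off-diagonal entries show which classes are mistaken for which; in this toy data one "bcc" lesion is predicted as "nv" and one "akiec" lesion as "bcc".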
4.2. Accuracy
Accuracy is the proportion of all predictions that are correct; larger values signify better model performance. It is computed by dividing the number of correct predictions by the total number of predictions. Accuracy characterizes the model’s performance across all classes and is most helpful when all classes are equally important. Being a discrete metric, it cannot be directly optimized. The ResNet50 deep learning model achieves 94% accuracy. Loss measures the discrepancy between the predicted and actual values; lower values correspond to better model performance. It is a continuous metric that can be directly optimized through training. A model that is overfitting, that is, one that has learned the training data too closely, including its noise, may have a low training loss yet perform poorly on new data. Conversely, a model that makes minor errors across a large number of predictions, rather than significant errors on a few of them, may attain acceptable accuracy with a higher overall loss. Figure 5 shows that the training and validation accuracy of the ResNet50 model increases with more epochs, while the training and validation losses decrease.
Figure 5. Accuracy and loss graphs.
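The difference between accuracy and loss can be made concrete with two toy models. The sketch below assumes cross-entropy as the loss, which the text does not specify, and uses made-up probabilities for a two-class problem.

```python
import math


def accuracy(true_labels, predicted_labels):
    """Correct predictions divided by total predictions."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def cross_entropy(true_indices, probabilities):
    """Mean negative log-probability assigned to the true class."""
    total = sum(math.log(p[t]) for t, p in zip(true_indices, probabilities))
    return -total / len(true_indices)


def predict(probabilities):
    """Pick the class with the highest probability for each sample."""
    return [row.index(max(row)) for row in probabilities]


y_true = [0, 1, 1, 0]
# Model A: always right, but only barely confident.
probs_a = [[0.55, 0.45], [0.45, 0.55], [0.45, 0.55], [0.55, 0.45]]
# Model B: confident, with one wrong prediction.
probs_b = [[0.95, 0.05], [0.05, 0.95], [0.05, 0.95], [0.40, 0.60]]

acc_a, loss_a = accuracy(y_true, predict(probs_a)), cross_entropy(y_true, probs_a)
acc_b, loss_b = accuracy(y_true, predict(probs_b)), cross_entropy(y_true, probs_b)
```

Model A has the higher accuracy but also the higher loss, because it is only marginally confident in every prediction. This is the situation described above in which many minor errors yield acceptable accuracy with a higher overall loss.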
Table 3. Comparison of this work with existing systems.
| Author | Dataset | Network | Training Accuracy | Validation Accuracy | Test Accuracy |
|---|---|---|---|---|---|
| [4] | ISIC | CNN | 0.98 | 0.89 | 0.93 |
| [5] | Canadian hospital (open source) | Their Model | 0.83 | 0.82 | 0.82 |
| [6] | HAM10000 | ResNet50 | 0.9 | 0.89 | 0.87 |
| [7] | ISIC-ISBI | Random Forest | | | 0.93 |
| [8] | ISIC | CNN | 0.88 | | 0.85 |
| [9] | ISIC | Proposed Ensemble | 0.93 | 0.87 | 0.84 |
| [10] | ISIC-Archive | Inception-V3 | 0.95 | 0.9 | 0.87 |
| [11] | PH2 | Their Model | 0.95 | 0.95 | 0.93 |
| [12] | CPTAC-CM | DeepSVM | | | 0.72 |
| [13] | HAM10000 | Proposed Xception | | | 0.93 |
| [14] | ICBI | Model6 | | | 0.90 |
| [15] | HAM10000 | Modified ResNet50 | | | 0.86 |
| This Work | ISIC | ResNet50 | 0.98 | 0.9 | 0.94 |
Table 3 compares the training, validation, and test accuracy obtained in this work with those of existing systems across datasets. ISIC is a popular dataset for skin cancer detection, and most of the papers used it. We also used this dataset and achieved 94% test accuracy using ResNet50, which is higher than the test accuracy of the other works in the table.
5. Conclusions
In conclusion, we built three models with better accuracy using deep learning, which can detect melanoma skin cancer from dermoscopic images. We collected our dataset from ISIC; it contains 2357 images. The Convolutional Neural Network model was used to obtain a good outcome. We implemented data preprocessing, data augmentation, and classification to train our model. Data augmentation and a CNN model based on the ResNet50 architecture were applied to the chosen dataset. Several preprocessing techniques were employed to extract average color information and normalize all color channels. We then resized the images and prepared the data for classification. Data augmentation was also used to prevent overfitting. As a result, ResNet50 achieved better accuracy than the other models: 0.98 on training, 0.9 on validation, and 0.94 on the test set. Our models can detect melanoma skin cancer quickly and at very low cost.
In the future, we will increase the accuracy of the models used in this work. We will also include or train more models to build a project with better accuracy.