A Multi-Task Deep Learning Framework for Simultaneous Detection of Thoracic Pathology through Image Classification
1. Introduction
Multi-task deep learning, in which a single model is trained on several tasks simultaneously [1] , has become an active research topic because it can improve classification performance. Researchers iteratively refine these models until no further performance improvement is attainable. This article examines multi-task deep learning paradigms and their application to chest imaging. Multi-task deep learning can be an effective and fast method for detecting COVID-19 from CT scans or X-ray images [2] . A multi-task convolutional neural network can classify different chest abnormalities, and can therefore detect COVID-19 and pneumonia simultaneously using both CT scans and chest X-ray (CXR) images. It combines all available data labels in a single model, which makes it possible to carry out different but related tasks with one neural network architecture. This paradigm uses the convolutional layers efficiently and can improve performance over single-task models [3] . In general, multi-task learning (MTL) aims to learn multiple tasks at the same time, which improves the generalization of the model. MTL can be defined as the optimization of several loss functions in a single model such that a shared representation supports the related tasks [4] .
Multi-task learning creates challenges that are absent from single-task learning. In particular, several tasks may have conflicting requirements. In this situation, improving the performance of a model on one task degrades its performance on another, a phenomenon known as negative transfer or destructive interference. A primary objective of MTL techniques is therefore to minimize negative transfer. Sharing information between tasks is a delicate balance, and many architectures include features designed to reduce negative transfer, such as task-specific feature spaces and attention mechanisms. The goal is to allow information to flow among tasks where it yields positive transfer and to discourage sharing where it would cause negative transfer. In our work, the tasks are Pneumonia, COVID-19, and Normal. There are two main types of MTL, soft and hard parameter sharing; these most widely used deep learning MTL techniques are described in Section 3.2.2.
In this study, we developed a multi-task deep learning framework capable of simultaneously detecting both COVID-19 and pneumonia through image classification. Our contributions include training several well-established pre-trained CNN architectures, namely VGG-16, InceptionV3, ResNet-50, and EfficientNetB0, until a high level of accuracy in disease detection was achieved. Furthermore, we conducted a comparative analysis of these CNN architectures based on their respective accuracies.
2. Related Work
Thoracic disease detection has been a challenging problem for many years, especially since the emergence of COVID-19. Several studies have addressed this problem to help practitioners detect disease accurately, making extensive use of machine learning and deep learning approaches.
2.1. Traditional Machine Learning Approaches
Santos and Melin [5] emphasized the importance of developing computational-intelligence techniques for pneumonia classification. They used an ML approach for diagnosis and detection in which a segmentation technique extracts the region of interest from the chest X-ray. Their experimental results show a mean accuracy of 95%. Tuncer et al. [6] used a traditional machine learning-based system. Their dataset was small and imbalanced, containing 135 COVID-19, 150 pneumonia, and 150 normal radiographs, and they achieved an accuracy of 97.01%. For pneumonia X-ray classification, Sharma et al. [7] designed a simple CNN model and used data augmentation to address data scarcity. The achieved accuracy was around 90.68%.
2.2. Deep Learning Approaches
Danilov et al. [8] evaluated four deep learning networks, EfficientNet B1 and B2, MobileNetV2, and VGG16, for pneumonia and COVID-19 classification on 2631 CXRs. With fine-tuning, the presented model reached an accuracy of 78%, which is not a promising value. To improve model performance, the authors used Grad-CAM to guide attention to the chest region based on a U-Net segmentation. This technique improved COVID-19 and pneumonia classification. In [9] , the authors used deep learning as an alternative to the reverse transcription polymerase chain reaction (RT-PCR) method for COVID-19 identification, because RT-PCR was found to be time-consuming whereas deep learning shows promising performance. The dataset consists of 13,569 X-ray images split into healthy, non-COVID-19 pneumonia, and COVID-19 patients. The results show an accuracy of 93.9%, a COVID-19 sensitivity of 96.8%, and a positive predictive value of 100%. The authors in [10] collected a dataset of X-ray images of healthy and COVID-19 patients, which became the largest such dataset. They then trained a CNN model and achieved a total accuracy of 87.88%. Bhattacharyya et al. [11] presented a deep learning method that combines image segmentation and classification models to distinguish COVID-19 from healthy cases efficiently. They segmented raw X-ray images with a conditional generative adversarial network (C-GAN) using the pix2pix algorithm. The segmented images were then passed to a feature-extraction stage built from several deep neural networks, such as VGG-16, VGG-19, and DenseNet-160. Finally, they classified the extracted features with different ML techniques, such as SoftMax, SVM, and XGBoost. They obtained the highest accuracy, 96.6%, with the VGG-19 model.
2.3. Multi-Task Learning Approaches
Multi-task learning has proved its ability to handle classification and detection problems, and it offers a recent, pragmatic way to address them [12] . It can be applied to different diseases using, for instance, chest X-ray images and CT scan images as input data. In [13] , the authors proposed an artificial neural network as a tool to detect lung diseases such as chronic obstructive pulmonary disease (COPD), pneumonia, and lung cancer. After pre-processing the images to remove irrelevant data, statistical features such as standard deviation, entropy, and mean are extracted. A feed-forward neural network is then used to identify the diseases. The dataset was collected from Sasoon Hospital. In [14] , CXR thoracic diseases were diagnosed with a CNN variant called the Attention Guided Convolutional Neural Network (AG-CNN), which combines global and local information to improve recognition performance. The authors in [15] proposed an algorithm called CheXNet, a CNN architecture composed of 121 layers. It takes a chest image as input and outputs the likelihood of thorax diseases. CheXNet has been applied successfully to 14 thoracic diseases. The authors in [16] noted that multi-task learning with deep neural networks has historically been divided into two structures: soft parameter sharing and hard parameter sharing. In hard parameter sharing, the parameter set is typically divided into shared and task-specific parameters, whereas in soft parameter sharing each task has its own set of parameters, with a separate head for each parameter set.

Udeshani et al. [17] proposed different approaches in which chest X-ray images are used to identify lung cancer. One approach achieved an accuracy of 96%, while the feature-based technique achieved 88%. Al-Edhari et al. [18] used a deep learning algorithm with X-ray images as input to classify cases as normal or abnormal (pneumonia). Liu et al. [19] used a multi-task architecture that divides the feature space into a shared part and a task-specific part. They noted that multi-task learning is a powerful technique that can enhance the performance of a single task with the aid of other tasks [20] [21] . Hemdan et al. [22] developed several deep-learning approaches to classify CXR images of COVID-19 and non-COVID-19 patients. They introduced a framework named COVIDX-Net to help doctors automatically detect the disease in CXR images, achieving an accuracy of 90% with VGG16 and DenseNet201. In [23] , Yuxing Tang et al. presented a convolutional neural network based on attention-guided curriculum learning (AGCL) to increase the accuracy of localization and classification of thoracic diseases. In addition, they used an attention-guided improvement framework to enhance both localization and classification performance. In [24] , the authors used deep convolutional neural networks to detect thoracic disease from chest X-rays. Applying a CNN to the whole image has two potential drawbacks. First, irrelevant objects in chest X-ray images may introduce noise. Second, downscaling the images reduces their resolution. To address both issues, the authors proposed a Segmentation-based Deep Fusion Network, which exploits domain knowledge and high-resolution information from the lung regions. Varshni et al. [25] proposed a deep learning model that uses a convolutional neural network for feature extraction with a transfer learning approach. They built a pneumonia identification system based on a Densely Connected Convolutional Neural Network (DenseNet-169). This model, which combines CNN feature extraction with a supervised classifier, emerged as an optimal solution for detecting normal and abnormal chest X-ray images.

Wesley et al. [26] used the AlexNet deep learning network with transfer learning. A dataset of 5659 images was used for training, and the model achieved 72% accuracy. The last three layers of the AlexNet model were replaced by a fully connected layer; the rest of the layers are standard classification layers. The goal of the work presented in [2] is to identify patients with COVID-19 using a multi-task deep learning methodology. The authors employed their Inception Residual Recurrent Convolutional Neural Network with a transfer learning approach to detect COVID-19 from X-ray and CT scan images. The achieved accuracy was 98.78% for CT scan images and around 84.67% for X-ray images. The authors in [27] described a framework that relies on an encoder-decoder multi-task model. They used a real-world CXR dataset containing healthy, bacterial pneumonia, viral pneumonia, and COVID-19 pneumonia images. The average achieved accuracy was 87.9%. In [28] , the authors proposed a multi-classification deep learning architecture to diagnose and classify four categories: COVID-19, pneumonia, lung cancer, and normal cases. The dataset combines CT scans and CXR images. The results showed that VGG19+CNN achieved the highest accuracy, 98.05%. The authors in [29] employed a multitask approach to consolidate both triage approaches and proposed a deep convolutional neural network for identification and severity quantification of COVID-19.
3. Materials and Methods
3.1. Dataset
In this study, two datasets were used. The first is a set of chest X-ray radiographs [30] . It contains 5856 files labeled pneumonia or normal and split into a training set (1349 normal and 3883 pneumonia) and a testing set (234 normal and 390 pneumonia). The images are stored in JPEG format. Before resizing, they had a width of 1233 and a height of 1460 pixels. They were resized to 224 × 224 × 3 to match the input size of the models; all models take 224 × 224 × 3 inputs except InceptionV3, for which an input size of 150 × 150 × 3 was used. The second dataset contains CT scans for COVID-19 [30] , with a total of 8055 files labeled as 5427 COVID and 2628 Non-COVID. This dataset is not pre-split into training and testing sets, so we manually split it into 80% for training and 20% for testing. Figure 1 shows samples from both datasets.
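The manual 80/20 split of the CT dataset can be illustrated with a short sketch. The paper does not state how the split was drawn, so the per-class (stratified) sampling, the fixed seed, and the function name `stratified_split` are assumptions made for illustration only; the class counts are the ones reported above.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), splitting each class separately.

    Hypothetical helper: stratification and the seed are assumptions,
    not details taken from the paper.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Class counts reported for the CT dataset (5427 COVID, 2628 Non-COVID).
labels = ["COVID"] * 5427 + ["Non-COVID"] * 2628
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps the COVID to Non-COVID ratio roughly the same in both subsets, which matters when the classes are imbalanced as they are here.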
3.2. Proposed Models
Two different models of machine learning are proposed. Firstly, single-task deep learning is used for the two disease classifications. Secondly, a multi-task deep learning model for both diseases is developed. These two machine-learning models are evaluated on both CT scans for COVID-19 and X-ray images for pneumonia disease.
Figure 1. Samples from the dataset: X-Ray for pneumonia and CT-scan for COVID-19.
3.2.1. Single-Task Learning
For single-task learning, the input is a medical image and the goal is to predict a single disease outcome (see Figure 2 for the pneumonia task).
3.2.2. Multi-Task Learning Methods
In general, multi-task learning aims to learn multiple tasks at the same time, which improves the generalizability of the CNN models. MTL can be defined as the optimization of several losses in a single model such that a shared representation supports the related tasks [4] . In deep learning, multi-task learning is usually implemented with either soft or hard parameter sharing of the convolutional layers.
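To make the idea of optimizing several losses in a single model concrete, the sketch below combines two per-task losses into one objective as a weighted sum. The binary cross-entropy loss, the equal default weights, and the toy predictions are assumptions chosen for illustration; the paper does not specify how its task losses are combined.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy for one task."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

def multi_task_loss(task_losses, task_weights=None):
    """Weighted sum of per-task losses (equal weights are an assumption)."""
    if task_weights is None:
        task_weights = [1.0] * len(task_losses)
    return sum(w * l for w, l in zip(task_weights, task_losses))

# Toy labels and predicted probabilities for two tasks.
pneumonia_loss = binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.2]))
covid_loss = binary_cross_entropy(np.array([0.0, 1.0]), np.array([0.1, 0.7]))
total = multi_task_loss([pneumonia_loss, covid_loss])
print(pneumonia_loss, covid_loss, total)
```

Because the gradient of the summed loss flows through whatever layers the tasks share, both tasks shape the shared representation during training.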
1) Soft Parameter Sharing
In soft parameter sharing, each task has its own model with its own parameters. The distance between the parameters of the different models is then regularized, which pushes the parameters to remain similar.
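A minimal sketch of such a regularizer is given below, assuming a squared L2 distance between corresponding parameter arrays of two task models. The distance measure, the `strength` coefficient, and the function name `soft_sharing_penalty` are illustrative assumptions; the paper does not say which distance it uses.

```python
import numpy as np

def soft_sharing_penalty(params_a, params_b, strength=0.01):
    """Squared L2 distance between matching parameter arrays of two models.

    Hypothetical helper: in training, this value would be added to the
    sum of the task losses so that the two models stay close.
    """
    distance = sum(float(np.sum((a - b) ** 2)) for a, b in zip(params_a, params_b))
    return strength * distance

rng = np.random.default_rng(0)
task_a = [rng.normal(size=(4, 3)), rng.normal(size=3)]  # weights, bias
task_b = [p.copy() for p in task_a]                     # identical copy
print(soft_sharing_penalty(task_a, task_b))             # identical models: 0.0

task_b[0] = task_b[0] + 0.5                             # shift all 12 weights
penalty = soft_sharing_penalty(task_a, task_b)
print(penalty)
```

A larger `strength` ties the task models more tightly together, while a value of zero recovers two fully independent single-task models.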
2) Hard Parameter Sharing
Hard parameter sharing is the most widely used MTL technique in neural networks and goes back to [31] . In this method, all tasks share the hidden layers, while each task keeps its own output layer. It is typically implemented by sharing the hidden feature units across tasks during training while preserving several task-specific layers. Hard parameter sharing considerably reduces overfitting. The authors in [32] showed that the risk of overfitting the shared parameters is an order of N smaller, where N is the number of tasks, than the risk of overfitting the task-specific parameters (the output layers). The model structure used in this work is shown in Figure 3: the model starts with shared layers and then splits into two task-specific branches.
Figure 2. The overall architecture for single-task learning (Pneumonia) [45] .
Figure 3. The proposed hard parameter-sharing structure.
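The hard parameter-sharing structure can be sketched as a forward pass with one shared block followed by one output head per task. This is only a toy stand-in: a single dense layer replaces the shared convolutional backbone, the weights are random rather than trained, and the class name `HardSharingNet`, the layer sizes, and the sigmoid outputs are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class HardSharingNet:
    """Toy network: one shared hidden layer, one output head per task."""

    def __init__(self, n_features, n_hidden, task_names):
        # Shared parameters, used by every task.
        self.shared_w = rng.normal(scale=0.1, size=(n_features, n_hidden))
        self.shared_b = np.zeros(n_hidden)
        # Task-specific parameters, one (weights, bias) pair per head.
        self.heads = {
            name: (rng.normal(scale=0.1, size=(n_hidden, 1)), np.zeros(1))
            for name in task_names
        }

    def forward(self, x):
        hidden = relu(x @ self.shared_w + self.shared_b)  # computed once
        return {
            name: sigmoid(hidden @ w + b).ravel()
            for name, (w, b) in self.heads.items()
        }

net = HardSharingNet(n_features=8, n_hidden=4, task_names=["pneumonia", "covid19"])
batch = rng.normal(size=(5, 8))  # 5 feature vectors standing in for images
outputs = net.forward(batch)
print({name: probs.shape for name, probs in outputs.items()})
```

The shared representation is computed once per input and reused by both heads, which is what makes this structure cheaper than running two separate single-task networks.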
4. Results
The experiments are conducted using the two learning approaches. First, single-task deep learning is used for each of the two disease classifications. Second, a multi-task deep learning architecture is used for both diseases. Both approaches are evaluated on CT scans for COVID-19 and X-ray images for pneumonia. Accuracy, defined as the number of correct predictions divided by the total number of predictions, is used to compare the different architectures for both single-task learning (STL) and multi-task learning (MTL). The accuracy obtained with single-task learning varies across models and is further improved by the multi-task learning approach.
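The accuracy metric can be written as a short function. The labels in the usage example are made up for illustration and are not results from the paper.

```python
def accuracy(y_true, y_pred):
    """Number of correct predictions divided by the total number of predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)

# Illustrative labels only: three of the four predictions are correct.
score = accuracy(
    ["covid", "normal", "pneumonia", "normal"],
    ["covid", "normal", "normal", "normal"],
)
print(score)  # 0.75
```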
4.1. Single-Task Learning
Four state-of-the-art pre-trained CNN (PCNN) models, namely VGG-16, Inception-V3, EfficientNetB0, and ResNet-50, were employed within the single-task framework. These models were implemented using Python machine-learning libraries and run in the Google Colab environment. All models were retrained to find weights that achieve better performance. The hyperparameters used for ResNet50 and EfficientNetB0 are shown in Table 1.
These networks were trained for 5 to 10 epochs; to avoid overfitting, a limit of 5 epochs is used for multi-task training. The models reach high performance with little overfitting. The batch size was 16, and the network parameters were optimized with the Adam optimizer at a learning rate of 0.0001, decayed by a factor of 0.1 at the 5th epoch. In addition, the dataset used is balanced, meaning that each label has an equal number of images. In this experiment, there were 500 images for each label of the COVID-19 and pneumonia datasets. Table 2 and Table 3 show the accuracies achieved for single-task learning. For Pneumonia, the highest score is achieved by the EfficientNetB0 model (96.50%), whereas for COVID-19 the ResNet-50 model reached the highest accuracy, 99.5%. The highest single-task accuracy was therefore obtained with the ResNet-50 architecture (99.5%), and it is used as the PCNN in the MTL experiments. Figure 4 illustrates a sample of the accuracy during the training and validation phases together with the variation of the loss function.
Table 1. Hyperparameters used for network structure (COVID-19) with ResNet50 and (Pneumonia) with EfficientNetB0.
Table 2. Comparison of different pre-trained CNN for Pneumonia [43] .
Table 3. Comparison of different pre-trained CNN for COVID-19 [43] .
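Two of the single-task training settings reported in this subsection, the learning-rate decay and the balanced subset of 500 images per label, can be sketched as follows. Several details are assumptions: epochs are taken to be 1-indexed, the decayed rate is assumed to apply from the 5th epoch onward, the subset is assumed to be drawn at random, and the helper names are hypothetical.

```python
import random

def step_decay_lr(epoch, base_lr=1e-4, decay_factor=0.1, decay_epoch=5):
    """Learning rate for a 1-indexed epoch, decayed once from decay_epoch onward."""
    return base_lr * decay_factor if epoch >= decay_epoch else base_lr

def balanced_subsample(labels, per_class=500, seed=0):
    """Pick the same number of sample indices for every label."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    chosen = []
    for label, indices in by_class.items():
        if len(indices) < per_class:
            raise ValueError(f"not enough samples for label {label!r}")
        chosen.extend(rng.sample(indices, per_class))
    return sorted(chosen)

schedule = [step_decay_lr(epoch) for epoch in range(1, 11)]
print(schedule)

# Training-set class counts reported for the chest X-ray dataset.
xray_labels = ["normal"] * 1349 + ["pneumonia"] * 3883
subset = balanced_subsample(xray_labels)
print(len(subset))
```

Drawing the same number of images per label removes the class imbalance of the original X-ray training set at the cost of discarding most of the pneumonia images.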
4.2. Multi-Task Learning
The single-task model was extended by building multi-task frameworks using the two main MTL methods, hard parameter sharing and soft parameter sharing. Hard parameter sharing was the most effective, achieving an accuracy of 99.9%, whereas the performance with soft parameter sharing was less promising. The hyperparameters largely follow those used in single-task learning, with slight differences: the Adam optimizer was used with a batch size of 64 and a learning rate of 0.001, and the number of epochs varied from 5 to 10. These parameters were adjusted until the desired accuracy was achieved. Figure 5 and Figure 6 depict the training and validation accuracies with their corresponding loss functions.
4.3. Models Performance and Discussion
In this work, we evaluated and compared the performance of different models. First, we compared the accuracy of the pre-trained architectures (PCNNs) to determine the best one. ResNet-50 outperformed the other PCNN models and was therefore chosen for further enhancement in the multi-task approach. Table 4 shows the accuracies achieved for COVID-19 and Pneumonia, together with the total number of parameters of each model. Table 5 reports the performance of multi-task learning with the ResNet-50 model, which achieved its best accuracy under the hard parameter sharing structure, compared with the soft parameter sharing structure (see Table 4 and Table 5).
Figure 4. Training/validation accuracy (a) with the loss function variation (b) (single-task learning).
Figure 5. Training/validation accuracy with the loss function variation for multi-task learning (Soft parameter sharing).
Figure 6. Training/validation accuracy with the loss function variation (Hard parameter sharing).
Table 4. Accuracy achieved with different PCNNs for CT scan and CXR.
Table 5. MTL methods with the ResNet-50 model and their achieved accuracy.
Moreover, this work is compared with other studies that used almost the same datasets and architectures. The models and achieved accuracies are shown in Table 6 for single-task learning and Table 7 for multi-task learning, together with the results of the proposed scheme. The proposed models outperform the reported accuracies in both the single-task and multi-task settings. We therefore consider that the proposed approach overcomes their limitations and improves classification accuracy.
Furthermore, the accuracies achieved with the multi-task deep learning framework (see Table 5) are promising under hard parameter sharing, where the accuracy is 99.9%. With the second method, soft parameter sharing, the accuracy reached 93.8% with the ResNet-50 model. The same experiment was also run with EfficientNetB0, since it achieved high accuracy in single-task learning for pneumonia, but it performed poorly in the multi-task setting and was therefore discarded. We used only the ResNet-50 architecture, since it achieved the highest result among the architectures in single-task learning, and further improved it using multi-task learning.
Figure 7 shows several prediction scores of the proposed method. The trained model clearly distinguishes between Pneumonia, COVID-19, and Normal cases.
Figure 7. Experimental results using the proposed method.
Table 6. Comparison with recent models using single-task learning with CT scan and CXR.
Table 7. Comparison with other models using the multi-task learning method.
5. Conclusion
In this work, two deep learning approaches, single-task learning and multi-task learning, were implemented and evaluated on two real-world datasets used for training and testing. The single-task model was trained with several state-of-the-art architectures, and the architecture that achieved the best performance among the PCNNs, ResNet-50, was then used in the multi-task learning approach. All models were evaluated on two types of medical images: CT scans for COVID-19 and CXR for pneumonia. The achieved results were compared with recent works, and the proposed model under a multi-task structure with hard parameter sharing outperformed their reported accuracies. The performance was improved by fine-tuning hyperparameters, pre-processing images, and using a balanced dataset. Finally, deep learning models of this kind may help physicians detect such chest diseases early, which will help protect human life and health.