Detection and Classification of Lung Cancer Cells Using Swin Transformer

Lung cancer is one of the greatest threats to human health. Examining pathological images of lung cancer cells is an effective way to detect the disease, so improving the accuracy and stability of diagnosis is very important. In this study, we develop an automatic detection scheme for lung cancer cells based on convolutional neural networks and the Swin Transformer. Microscopic images of patients' lung cells are first segmented using a Mask R-CNN-based network, yielding a separate image for each cell. Part of the background information is preserved by Gaussian-blurring the surrounding cells, while the target cell is highlighted. The Swin Transformer-based classification model not only reduces computation but also achieves better results than the classical CNN model ResNet50. The final results show that the accuracy of the proposed method reaches 96.16%. The method is therefore helpful for the detection and classification of lung cancer cells.

Common methods of lung cancer diagnosis include computed tomography (CT), chest X-ray, and cytopathological identification. Screening and detecting lung cancer cells are crucial in cancer prevention and control efforts [3]. Lung cancer diagnosis and ancillary tests rely on cytology and small biopsy specimens obtained by minimally invasive means [4]. Specimens of lung cancer cells are usually obtained from exfoliated cells in patients' sputum, alveolar lavage fluid, bronchial secretions, or pleural effusions.
Compared with other screening methods, cytopathological examination is convenient, quick, and essentially non-invasive, making it well suited for initial screening.
Traditionally, pathologists or physicians analyze cell morphology, number, differentiation, and other characteristics in lung cancer cytopathological images to reach a diagnosis. In recent years, the growing number of lung cancer patients has produced vast amounts of data to analyze, and processing it requires many trained professionals. Given the shortage of pathologists in some areas, relying entirely on manual review wastes scarce human resources, and the long-term, repetitive, and monotonous work also increases the chance of misjudgment. Research on cytopathological image-assisted diagnosis systems for lung cancer is therefore of great practical significance. Combining advanced computer technology with the diagnostic experience of cytology experts can, to a certain extent, ease the difficulties of cancer cell diagnosis and reduce both the workload of pathologists and subjective variation. Such work can greatly improve the efficiency of early lung cancer screening and reduce the mortality of lung cancer patients [5].
In the last decade, with the development of computer hardware and deep learning algorithms, artificial intelligence has been used to process the stream of data generated throughout the clinical pathway [6]. Computer-aided medical analysis techniques have also developed rapidly with advances in image analysis algorithms and the rise of big data methods [7]. Using machine learning algorithms to identify and detect cancer has been shown to be feasible [8] [9] [10]. Today, many cytopathological recognition methods have been proposed as image classification techniques have matured. However, since cells in different organs and tissues have different characteristics, the guidelines physicians use to decide whether a cell is diseased may change accordingly, and there is no universal cytopathological image recognition method. Current methods for lung cancer cell detection suffer from low prediction accuracy, high resource consumption, and poor real-time performance. In this paper, we propose a transformer-based lung cancer cell detection network, which alleviates these problems to some extent.

Related Work
The identification and detection of lung cancer cells consist of two main steps: cell nucleus segmentation and cell image classification. Segmentation involves isolating one or more lung cells in an image to facilitate subsequent classification. Many traditional image segmentation methods are widely used for cell nucleus segmentation. Threshold segmentation [11] [12] is the simplest way to distinguish foreground objects from the background. The basic idea of clustering segmentation [13] [14] is to compute the similarity between pixels and group highly similar pixels into one class, thereby segmenting the image. Other traditional methods for segmenting cell nuclei include the watershed algorithm [15] [16] and the active contour method [17]. All of these traditional methods have clear advantages and disadvantages; each is applicable only under specific scene conditions and often breaks down in the complex environments encountered in practice. A combination of multiple methods is therefore often used, which in turn introduces new problems such as heavy computation and complicated computational principles.
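As an illustration of the simplest of these techniques, the sketch below implements Otsu's thresholding in plain NumPy: it picks the gray level that maximizes between-class variance and marks darker pixels (stained nuclei) as foreground. The function names and the dark-foreground assumption are ours, not from the paper's implementation.

```python
import numpy as np

def otsu_threshold(img):
    """Find the gray level that maximizes between-class variance (Otsu's method)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0        # background mean
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1   # foreground mean
        var = w0 * w1 * (mu0 - mu1) ** 2                  # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def segment(img):
    """Foreground mask: pixels darker than the Otsu threshold (stained nuclei)."""
    return img < otsu_threshold(img)
```

In a bimodal image (dark nuclei on a bright background) the chosen threshold falls between the two intensity modes, which is exactly the situation where thresholding works and the complex scenes mentioned above are where it fails.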
The success of deep learning has brought new life to medical image segmentation. In 2019, Yiming Liu et al. [18] combined coarse and fine segmentation: Mask R-CNN was first trained to obtain coarse segmentation results, a locally fully connected conditional random field was used for fine segmentation, and the two were finally fused. In 2020, Cai et al. [19] proposed a further deep-learning-based segmentation method. The traditional approach to lung cancer cell classification requires manual extraction of cell features; the advent of deep learning has simplified this step [21].
The local connectivity and weight sharing of convolutional neural networks make them well suited for processing images, which has led to many classical CNN models. In 2014, Simonyan and Zisserman proposed the VGG model [22] with a deeper network structure. Compared to other neural networks, it uses smaller convolutional kernels, which increases the nonlinear representation of the network while reducing the number of parameters. In 2015, ResNet [23] was introduced to address the vanishing-gradient problem common in deep neural networks: it introduced residual blocks whose shortcut connections perform identity mapping, with good results. In 2016, Huang et al. [24] effectively alleviated the vanishing-gradient problem by reusing feature maps within the network while strengthening feature propagation. In 2017, Teramoto et al. [25] developed an automatic classification scheme for lung cancer based on microscopic images using deep convolutional neural networks (DCNN); their classification accuracy, evaluated with three-fold cross-validation, showed that about 71% of the images were classified correctly. In 2017, the Transformer framework proposed by Google [27] attracted wide attention: it not only became the mainstream model in natural language processing but also began to expand into computer vision. In 2020, Google proposed the Vision Transformer (ViT) [28]. ViT has also been used in medical image processing; for example, the authors of [30] [31] used transformers to distinguish COVID-19 from other types of pneumonia in computed tomography (CT) or X-ray images, meeting the urgent need for fast and effective treatment of COVID-19 patients.
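The identity mapping at the heart of ResNet can be written in a few lines. The following NumPy sketch (function names are ours; a real residual block uses convolutions, batch normalization, and learned weights) shows the key idea: the block computes y = F(x) + x, so the shortcut lets the signal, and during training the gradient, bypass the transformation F.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x): the identity shortcut eases training of deep nets."""
    out = relu(x @ w1)    # first transformation with nonlinearity
    out = out @ w2        # second transformation (activation applied after the add)
    return relu(out + x)  # add the identity shortcut, then activate
```

Note that if F collapses to zero (e.g. all-zero weights), the block degrades gracefully to an identity-like mapping rather than destroying the signal, which is why stacking many such blocks remains trainable.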
As the references show, current lung cancer cytopathology image detection technology is not yet mature, and detection accuracy is low. A CNN extracts only local features through its convolution kernels, whereas the ViT model learns features of the whole image through an attention mechanism and can therefore analyze the image more thoroughly. The work in this paper is thus meaningful for the early diagnosis of lung cancer.
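The global attention mechanism contrasted with convolution above can be sketched in NumPy as scaled dot-product self-attention over patch embeddings (the names `self_attention`, `Wq`, `Wk`, `Wv` are illustrative, not from the paper's code): every patch aggregates information from every other patch, rather than from a fixed local receptive field.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (num_patches, dim). Each patch attends to all patches, so features
    are mixed globally rather than within a local convolution window."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise similarity, scaled
    A = softmax(scores)                      # attention weights; each row sums to 1
    return A @ V                             # weighted combination of all patches
```

Because the weights in each row of `A` sum to one, the output of every patch is a convex combination of all patch values, which is the "whole-image" view a convolution kernel lacks.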

Cell Segmentation and Data Enhancement
In this study, NucleAIzer [32], a deep learning framework for cell nucleus segmentation, is used to segment the cell images.
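The abstract describes preserving background context by Gaussian-blurring surrounding cells while keeping the target cell sharp. A minimal sketch of that enhancement step, assuming a binary mask for the target cell is available from segmentation (the function names and the pure-NumPy separable Gaussian are ours, not the paper's implementation):

```python
import numpy as np

def gaussian_blur(image, sigma=3.0):
    """Separable Gaussian blur of a 2-D image using 1-D convolutions."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    padded = np.pad(image.astype(float), r, mode="edge")
    # filter along rows, then along columns
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, tmp)

def highlight_cell(image, cell_mask, sigma=3.0):
    """Keep the target cell sharp while blurring everything else, so some
    background context survives without distracting from the cell."""
    blurred = gaussian_blur(image, sigma)
    return np.where(cell_mask, image, blurred)
```

The target region is copied through unchanged, while surrounding cells are smoothed into soft context, matching the description in the abstract.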

Swin Transformer Structure
The transformer structure used for lung cancer cytopathology image classification is shown in Figure 2, as is the structure of one of its core modules, the Swin Transformer Block.
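The computational saving of the Swin Transformer over plain ViT comes from restricting self-attention to non-overlapping local windows and shifting the windows between successive blocks so information still crosses window boundaries. A NumPy sketch of those two operations (the helper names are ours):

```python
import numpy as np

def window_partition(x, w):
    """Split an (H, W, C) feature map into non-overlapping w x w windows.
    Attention inside each window costs O(w^2) per token instead of O(H*W)."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)

def shift_windows(x, s):
    """Cyclically shift the map by s pixels before re-partitioning, so the
    next block's windows straddle the previous block's window boundaries."""
    return np.roll(x, shift=(-s, -s), axis=(0, 1))
```

Alternating plain and shifted window partitions is what lets local attention approximate global mixing over a stack of blocks.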

Experimental Environment
The experiments in this paper were conducted on Ubuntu 18.04.5.

Experimental Results and Analysis
The performance of the Swin Transformer model on the test set is shown in Figure 3. From the confusion matrix, we can see that all images in the noise category are classified correctly, and only a small number of errors occur in the abnormal and normal categories.

Extended Experiments
To assess the performance of the Swin Transformer model, the ResNet50 and ResNet50 + FPN models, which perform very well in image classification, were selected for comparison, where FPN stands for Feature Pyramid Network, a feature fusion technique.
The basic idea of FPN is to improve the network by fusing higher-layer and lower-layer features together, i.e., multi-scale feature fusion, so as to make full use of the features from each stage of the network.
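One top-down fusion step of an FPN can be sketched as follows (illustrative names; a real FPN also applies 1x1 convolutions to equalize channel counts and 3x3 convolutions after the merge): the coarse, semantically strong map is upsampled and added to the finer, spatially precise map.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_fuse(c_high, c_low):
    """Top-down FPN step: merge a coarse high-level map into the next
    finer-resolution map by upsampling and element-wise addition."""
    return c_low + upsample2x(c_high)
```

Repeating this step down the backbone yields a pyramid of maps that all carry high-level semantics, which is the multi-scale fusion described above.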
Experiments were then conducted with the three models on two publicly available cervical cell datasets, Herlev and SIPaKMeD, to demonstrate the generalization performance of the model. The SIPaKMeD dataset is a five-category labeled cervical cell dataset containing 4049 cervical cells in total. The overall precision, recall, and specificity of each model were obtained by averaging the per-category values. The experimental setup and dataset split were the same as before. Table 2 shows the results of the different models on the different datasets.
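The per-category averaging described above is macro-averaging over a confusion matrix. A small sketch of how those three metrics are computed (our helper, assuming `cm[i, j]` counts samples of true class `i` predicted as class `j`):

```python
import numpy as np

def macro_metrics(cm):
    """Macro-averaged precision, recall, and specificity from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as class k but actually another class
    fn = cm.sum(axis=1) - tp          # class-k samples predicted as something else
    tn = cm.sum() - tp - fp - fn      # everything else
    precision = np.mean(tp / (tp + fp))
    recall = np.mean(tp / (tp + fn))
    specificity = np.mean(tn / (tn + fp))
    return precision, recall, specificity
```

Macro-averaging weights each category equally, which matters for imbalanced cell datasets where a frequency-weighted average would be dominated by the majority class.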
The results show that Swin Transformer performs slightly worse than ResNet50 on the SIPaKMeD dataset, but on the other two datasets its results are significantly better than those of all other classification models. On the lung cancer cell dataset, which is our main focus, the accuracy reached 96.14%, nearly two percentage points higher than ResNet50. This demonstrates the effectiveness of the Swin Transformer for lung cancer cell image classification and shows that it also performs well on other cell image datasets.

Conclusion
In this paper, a Swin Transformer-based lung cancer cell classification model is proposed. The experimental results show that the classification accuracy reached 96.16%. This demonstrates that using the Swin Transformer to detect lung cancer cells is effective.