Performance Comparison of Vision Transformer- and CNN-Based Image Classification Using Cross Entropy: A Preliminary Application to Lung Cancer Discrimination from CT Images - Journal of Biomedical Science and Engineering

Journal of Biomedical Science and Engineering

Volume 17, Issue 9 (September 2024)

ISSN Print: 1937-6871 ISSN Online: 1937-688X

Performance Comparison of Vision Transformer- and CNN-Based Image Classification Using Cross Entropy: A Preliminary Application to Lung Cancer Discrimination from CT Images

Pages: 157-170

DOI: 10.4236/jbise.2024.179012

Author(s)

Eri Matsuyama¹, Haruyuki Watanabe², Noriyuki Takahashi³

Affiliation(s)

¹Faculty of Informatics, The University of Fukuchiyama, Kyoto, Japan.
²School of Radiological Technology, Gunma Prefectural College of Health Sciences, Gunma, Japan.
³School of Health Sciences, Fukushima Medical University, Fukushima, Japan.

ABSTRACT

This study evaluates the performance and reliability of a vision transformer (ViT) against a convolutional neural network (CNN), ResNet50, in classifying lung CT images into four categories: lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), large cell carcinoma (LULC), and normal. Although CNNs have driven significant advances in medical imaging, their limited capacity to capture long-range dependencies has prompted interest in ViTs, which use self-attention to model global image context. The study used a dataset of 748 lung CT images to train both models with standardized input sizes and assessed them with conventional metrics (accuracy, precision, recall, F1 score, specificity, and AUC) as well as cross entropy, introduced here as an additional metric for evaluating prediction uncertainty. Both models achieved similar accuracy (95%), with ViT showing a slight edge over ResNet50 in precision and F1 score for specific classes. However, ResNet50 exhibited higher recall for LULC, indicating fewer missed cases. Cross entropy analysis showed that the ViT model had lower average uncertainty than ResNet50, particularly in the LUAD, Normal, and LUSC classes. This finding suggests that ViT predictions are generally more reliable, though ResNet50 performed better for LULC. The study underscores that accuracy alone is insufficient for model comparison, because cross entropy offers additional insight into the reliability and confidence of model predictions. The results highlight the value of reporting cross entropy alongside traditional metrics for a more comprehensive evaluation of deep learning models in medical image classification.
While the ViT outperformed the CNN-based ResNet50 in lung cancer classification based on cross-entropy values, the performance differences were minor and may not hold clinical significance. Therefore, it may be premature to consider replacing CNNs with ViTs in this specific application.
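
The abstract's central point, that two classifiers with the same accuracy can differ in cross entropy, can be illustrated with a minimal sketch. It assumes the standard per-sample definition for a one-hot label (the negative log of the probability assigned to the true class), averaged per class; the abstract does not state the exact formulation used in the paper. The probability values, the names `model_a` and `model_b`, and the helper functions are hypothetical and are not data or code from the study.

```python
import math

CLASSES = ["LUAD", "LUSC", "LULC", "Normal"]


def cross_entropy(probs, true_idx, eps=1e-12):
    """Per-sample cross entropy for a one-hot label: -log p(true class)."""
    return -math.log(max(probs[true_idx], eps))


def evaluate(prob_rows, labels):
    """Return (accuracy, {class name: mean cross entropy over that class's samples})."""
    correct = 0
    per_class = {name: [] for name in CLASSES}
    for probs, y in zip(prob_rows, labels):
        pred = max(range(len(probs)), key=probs.__getitem__)  # argmax
        correct += int(pred == y)
        per_class[CLASSES[y]].append(cross_entropy(probs, y))
    accuracy = correct / len(labels)
    mean_ce = {k: sum(v) / len(v) for k, v in per_class.items() if v}
    return accuracy, mean_ce


# Made-up softmax outputs, one sample per class (illustration only).
labels = [0, 1, 2, 3]
model_a = [
    [0.90, 0.05, 0.03, 0.02],
    [0.05, 0.85, 0.05, 0.05],
    [0.10, 0.10, 0.70, 0.10],
    [0.02, 0.02, 0.01, 0.95],
]
model_b = [
    [0.55, 0.25, 0.10, 0.10],
    [0.20, 0.50, 0.20, 0.10],
    [0.05, 0.05, 0.85, 0.05],
    [0.20, 0.10, 0.10, 0.60],
]

acc_a, ce_a = evaluate(model_a, labels)
acc_b, ce_b = evaluate(model_b, labels)
print("accuracy:", acc_a, acc_b)
for name in CLASSES:
    print(f"{name}: A={ce_a[name]:.3f}  B={ce_b[name]:.3f}")
```

In this toy example both models classify every sample correctly, yet model A assigns higher probability to the true class for LUAD, LUSC, and Normal, while model B is more confident on LULC. Accuracy cannot distinguish the two; per-class mean cross entropy can.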

KEYWORDS

Lung Cancer Classification, Vision Transformers, Convolutional Neural Networks, Cross Entropy, Deep Learning

Share and Cite:

Matsuyama, E., Watanabe, H. and Takahashi, N. (2024) Performance Comparison of Vision Transformer- and CNN-Based Image Classification Using Cross Entropy: A Preliminary Application to Lung Cancer Discrimination from CT Images. Journal of Biomedical Science and Engineering, 17, 157-170. doi: 10.4236/jbise.2024.179012.
