Journal of Biomedical Science and Engineering

Volume 17, Issue 9 (September 2024)

ISSN Print: 1937-6871   ISSN Online: 1937-688X

Google-based Impact Factor: 1.68  Citations  

Performance Comparison of Vision Transformer- and CNN-Based Image Classification Using Cross Entropy: A Preliminary Application to Lung Cancer Discrimination from CT Images

  XML Download Download as PDF (Size: 2899KB)  PP. 157-170  
DOI: 10.4236/jbise.2024.179012    140 Downloads   662 Views  

ABSTRACT

This study evaluates the performance and reliability of a vision transformer (ViT) compared to convolutional neural networks (CNNs) using the ResNet50 model in classifying lung cancer from CT images into four categories: lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), large cell carcinoma (LULC), and normal. Although CNNs have made significant advancements in medical imaging, their limited capacity to capture long-range dependencies has led to the exploration of ViTs, which leverage self-attention mechanisms for a more comprehensive global understanding of images. The study utilized a dataset of 748 lung CT images to train both models with standardized input sizes, assessing their performance through conventional metrics—accuracy, precision, recall, F1 score, specificity, and AUC—as well as cross entropy, a novel metric for evaluating prediction uncertainty. Both models achieved similar accuracy rates (95%), with ViT demonstrating a slight edge over ResNet50 in precision and F1 scores for specific classes. However, ResNet50 exhibited higher recall for LULC, indicating fewer missed cases. Cross entropy analysis showed that the ViT model had lower average uncertainty, particularly in the LUAD, Normal, and LUSC classes, compared to ResNet50. This finding suggests that ViT predictions are generally more reliable, though ResNet50 performed better for LULC. The study underscores that accuracy alone is insufficient for model comparison, as cross entropy offers deeper insights into the reliability and confidence of model predictions. The results highlight the importance of incorporating cross entropy alongside traditional metrics for a more comprehensive evaluation of deep learning models in medical image classification, providing a nuanced understanding of their performance and reliability. While the ViT outperformed the CNN-based ResNet50 in lung cancer classification based on cross-entropy values, the performance differences were minor and may not hold clinical significance. Therefore, it may be premature to consider replacing CNNs with ViTs in this specific application.

Share and Cite:

Matsuyama, E. , Watanabe, H. and Takahashi, N. (2024) Performance Comparison of Vision Transformer- and CNN-Based Image Classification Using Cross Entropy: A Preliminary Application to Lung Cancer Discrimination from CT Images. Journal of Biomedical Science and Engineering, 17, 157-170. doi: 10.4236/jbise.2024.179012.

Cited by

No relevant information.

Copyright © 2025 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.