A Wavelet-Based Two-Stage Vision Transformer Model for Histological Subtypes Classification of Lung Cancers on CT Images
ABSTRACT
Accurate histological classification of lung cancer in CT images is essential for diagnosis and treatment planning. In this study, we propose a vision transformer (ViT) model with two-stage fine-tuning using wavelet transformation to improve classification performance. In the first stage, feature extraction is enhanced using wavelet-transformed images, and in the second stage, the model is fine-tuned with the original CT images. This method improves classification accuracy and enhances model robustness. Experimental results show that the proposed method outperforms conventional ViT and CNN fine-tuning methods. It achieves a classification accuracy of 0.971, surpassing the 0.953 obtained with conventional ViT fine-tuning and 0.945 with ResNet50 fine-tuning. Moreover, the proposed method reduces classification uncertainty, with particularly significant improvements in the classification of large cell lung carcinoma. These results demonstrate the effectiveness of incorporating wavelet-based feature extraction into ViT fine-tuning for lung cancer classification. Future research will focus on developing optimization techniques, applying the method to multimodal medical imaging, and integrating explainable AI technologies to further improve its applicability in clinical settings.
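To make the first-stage input concrete, the following is a minimal sketch of a single-level 2D wavelet decomposition in pure Python. The Haar wavelet, the single decomposition level, and the function names (`haar_1d`, `haar_2d`, `transpose`) are assumptions chosen for illustration. The abstract does not state which wavelet family or level was used, nor how the resulting sub-bands are arranged into the images fed to the ViT.

```python
# Illustrative sketch only: single-level 2D Haar decomposition.
# ASSUMPTION: Haar wavelet, one level; the abstract does not specify either.

def haar_1d(values):
    """Split an even-length sequence into low-pass and high-pass halves."""
    s = 2 ** 0.5  # orthonormal Haar scaling factor
    low = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return low, high


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


def haar_2d(image):
    """Return sub-bands (ll, lh, hl, hh) of an image with even height and width.

    ll: low-pass along rows and columns (coarse approximation)
    lh: low-pass along rows, high-pass along columns
    hl: high-pass along rows, low-pass along columns
    hh: high-pass along rows and columns
    """
    # Step 1: filter every row into a low-pass and a high-pass half.
    row_low, row_high = [], []
    for row in image:
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)

    # Step 2: filter every column of each half.
    def filter_columns(half):
        lows, highs = [], []
        for column in transpose(half):
            low, high = haar_1d(column)
            lows.append(low)
            highs.append(high)
        return transpose(lows), transpose(highs)

    ll, lh = filter_columns(row_low)
    hl, hh = filter_columns(row_high)
    return ll, lh, hl, hh


# Usage: a tiny 4x4 "image" with a vertical edge between columns 2 and 3.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
ll, lh, hl, hh = haar_2d(image)
```

Under these assumptions, each sub-band has half the height and width of the input. In a two-stage schedule such as the one described above, images derived from these sub-bands would be used for the first fine-tuning pass and the original CT images for the second.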
Share and Cite:
Matsuyama, E., Watanabe, H. and Takahashi, N. (2025) A Wavelet-Based Two-Stage Vision Transformer Model for Histological Subtypes Classification of Lung Cancers on CT Images. Open Journal of Medical Imaging, 15, 57-72. doi: 10.4236/ojmi.2025.152005.