TITLE:
Vision Transformers (ViTs) for Early Identification of Alzheimer’s Disease
AUTHORS:
Aayush Rajesh Jadhav
KEYWORDS:
Alzheimer’s Disease, Computed Tomography, Magnetic Resonance Imaging, Deep Neural Networks, Recurrent Neural Networks, Convolutional Neural Networks, Vision Transformers
JOURNAL NAME:
Open Journal of Applied Sciences, Vol.15 No.6, June 27, 2025
ABSTRACT: This thesis applies image processing, computer vision, machine learning, and deep learning, in particular the Vision Transformer (ViT) model, to the early identification of Alzheimer’s disease (AD), the most common form of dementia, which progresses from mild memory loss to significant impairment in daily interactions. Unlike prior studies that rely on conventional Convolutional or Recurrent Neural Networks, this research combines ViT with a pre-processing strategy that includes sagittal-plane slicing, PCA-based feature reduction, and watershed segmentation to enhance regional interpretability. A key novelty lies in training on a clean, pre-augmented Kaggle dataset and testing on the real-world, imbalanced OASIS-3 dataset, which demonstrates the model’s ability to generalize from curated to noisy clinical data. The study details the ViT model’s architecture, pretraining, and fine-tuning processes, employing a two-step training approach for efficient classification. The ViT-Base-Patch16-224 model is pretrained on ImageNet-21k and fine-tuned on ImageNet 2012; data pre-processing includes partitioning each image into patches, adding positional embeddings, and applying various transformations. Training uses the AdamW optimizer, learning rate adjustments, exponential moving averages, and early stopping callbacks. Evaluation on the Kaggle Alzheimer’s and OASIS-3 datasets shows promising performance, with 97.34% accuracy on Kaggle and 81.25% on OASIS-3. Confusion matrix and F1 score analyses highlight the model’s strengths and areas for improvement, showing high precision and recall for different classes, particularly in Alzheimer’s disease identification. This study contributes to medical image analysis by emphasizing the ViT model’s accuracy in classifying Alzheimer’s cases and by presenting a novel framework that is adaptable to varied MRI datasets and offers interpretable, transferable results for clinical use.
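The image partitioning mentioned in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes 3-channel input (grayscale MRI slices are often replicated across three channels, but the abstract does not say how the slices were prepared) and a single classification token prepended to the patch sequence, as in the standard ViT design. The names `PatchLayout`, `patch_layout`, and `patch_index` are hypothetical helpers introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchLayout:
    patches_per_side: int  # patches along one image edge
    num_patches: int       # total patches per image
    patch_dim: int         # flattened values per patch, before linear projection
    sequence_length: int   # patches plus one classification token (assumed)


def patch_layout(image_size: int = 224, patch_size: int = 16, channels: int = 3) -> PatchLayout:
    """Token layout for a square image cut into non-overlapping square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    num_patches = per_side * per_side
    patch_dim = patch_size * patch_size * channels
    return PatchLayout(per_side, num_patches, patch_dim, num_patches + 1)


def patch_index(row: int, col: int, image_size: int = 224, patch_size: int = 16) -> int:
    """Row-major index of the patch that contains pixel (row, col)."""
    per_side = image_size // patch_size
    return (row // patch_size) * per_side + (col // patch_size)


layout = patch_layout()
print(layout)
```

Under these assumptions, a 224 x 224 input with 16 x 16 patches gives a 14 x 14 grid of 196 patches, each flattened to 768 values, and a sequence of 197 tokens. Positional embeddings are needed because the transformer otherwise has no notion of where each of those patches sat in the original slice.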