Image Classification Based on Vision Transformer
ABSTRACT
This research introduces an approach to image classification based on the Vision Transformer (ViT) architecture. Vision Transformers have emerged as a promising alternative to convolutional neural networks (CNNs) for image analysis tasks, offering scalability and improved performance. ViT models capture global dependencies and relationships among image elements, which enhances feature representation. When trained on diverse datasets, the ViT demonstrates strong classification capabilities across different image categories. Its ability to process image patches directly, without relying on spatial hierarchies, streamlines the classification process and improves computational efficiency. In this research, we present a Python implementation using TensorFlow that employs the ViT model for image classification. Images from four animal categories (cow, dog, horse, and sheep) are used for classification. The ViT model extracts meaningful features from the images, and a classification head is added to predict the class labels. The model is trained on the CIFAR-10 dataset and evaluated for accuracy and performance. The findings of this study demonstrate the effectiveness of the Vision Transformer in image classification tasks as well as its potential as a powerful tool for solving complex visual recognition problems. This research addresses existing gaps in knowledge by introducing an approach that challenges traditional CNNs in computer vision: while CNNs have been the dominant architecture for image classification, they are limited in capturing long-range dependencies in image data and rely on hand-designed hierarchical feature extraction.
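As a rough illustration of the pipeline the abstract describes (patch extraction, a ViT-style transformer encoder, and a classification head trained on CIFAR-10 with TensorFlow), a minimal sketch might look like the following. The patch size, embedding dimension, number of encoder blocks, and training settings below are illustrative assumptions, not the paper's reported configuration; NUM_CLASSES is set to 10 for CIFAR-10 and would be 4 for the cow/dog/horse/sheep subset mentioned in the abstract.

```python
# Minimal ViT-style classifier sketch in TensorFlow/Keras.
# Hyperparameters are assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras import layers

IMAGE_SIZE = 32                                # CIFAR-10 images are 32x32x3
PATCH_SIZE = 4                                 # 8x8 = 64 patches per image
NUM_PATCHES = (IMAGE_SIZE // PATCH_SIZE) ** 2
EMBED_DIM = 64
NUM_HEADS = 4
NUM_LAYERS = 4
NUM_CLASSES = 10                               # 4 for the animal subset


class Patches(layers.Layer):
    """Split an image into non-overlapping patches and flatten each patch."""
    def call(self, images):
        patches = tf.image.extract_patches(
            images=images,
            sizes=[1, PATCH_SIZE, PATCH_SIZE, 1],
            strides=[1, PATCH_SIZE, PATCH_SIZE, 1],
            rates=[1, 1, 1, 1],
            padding="VALID",
        )
        patch_dims = patches.shape[-1]
        batch = tf.shape(images)[0]
        return tf.reshape(patches, [batch, NUM_PATCHES, patch_dims])


class PatchEncoder(layers.Layer):
    """Linearly project patches and add learned position embeddings."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.projection = layers.Dense(EMBED_DIM)
        self.position_embedding = layers.Embedding(NUM_PATCHES, EMBED_DIM)

    def call(self, patches):
        positions = tf.range(start=0, limit=NUM_PATCHES, delta=1)
        return self.projection(patches) + self.position_embedding(positions)


def build_vit_classifier():
    inputs = layers.Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
    x = PatchEncoder()(Patches()(inputs))

    # Transformer encoder blocks: pre-norm multi-head self-attention + MLP.
    for _ in range(NUM_LAYERS):
        h = layers.LayerNormalization(epsilon=1e-6)(x)
        h = layers.MultiHeadAttention(num_heads=NUM_HEADS, key_dim=EMBED_DIM)(h, h)
        x = layers.Add()([x, h])
        h = layers.LayerNormalization(epsilon=1e-6)(x)
        h = layers.Dense(EMBED_DIM * 2, activation="gelu")(h)
        h = layers.Dense(EMBED_DIM)(h)
        x = layers.Add()([x, h])

    # Classification head on top of the pooled patch representations.
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


model = build_vit_classifier()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train and evaluate on CIFAR-10, as described in the abstract.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
model.fit(x_train / 255.0, y_train, epochs=10, validation_split=0.1)
model.evaluate(x_test / 255.0, y_test)
```

This sketch pools the encoded patch tokens with global average pooling before the classification head; a learnable class token, as in the original ViT formulation, is an equally common design choice.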
Share and Cite:
Omer, A. (2024) Image Classification Based on Vision Transformer. Journal of Computer and Communications, 12, 49-59. doi: 10.4236/jcc.2024.124005.