Journal of Computer and Communications

Volume 12, Issue 4 (April 2024)

ISSN Print: 2327-5219   ISSN Online: 2327-5227

Google-based Impact Factor: 1.98

Image Classification Based on Vision Transformer

PP. 49-59
DOI: 10.4236/jcc.2024.124005

ABSTRACT

This research introduces an innovative approach to image classification based on the Vision Transformer (ViT) architecture. ViTs have emerged as a promising alternative to convolutional neural networks (CNNs) for image analysis tasks, offering scalability and improved performance. ViT models can capture global dependencies and relationships among the elements of an image, which enhances feature representation. When trained on different datasets, the ViT model demonstrates strong classification capabilities across different image categories. The ViT's ability to process image patches directly, without relying on spatial hierarchies, streamlines the classification process and improves computational efficiency. In this research, we present a Python implementation that uses TensorFlow to apply the ViT model to image classification. Images from four animal categories (cow, dog, horse, and sheep) are used for classification. The ViT model extracts meaningful features from the images, and a classification head is added to predict the class labels. The model is trained on the CIFAR-10 dataset and evaluated for accuracy and performance. The findings of this study demonstrate not only the effectiveness of the Vision Transformer model in image classification tasks but also its potential as a powerful tool for solving complex visual recognition problems. This research addresses existing gaps in knowledge by introducing a novel approach that challenges traditional CNNs in the field of computer vision. Although CNNs have been the dominant architecture for image classification tasks, they have limitations in capturing long-range dependencies in image data and require hand-designed hierarchical feature extraction.
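
To make the phrase "process image patches directly" concrete, the following minimal sketch shows one way an image can be cut into non-overlapping patches and each patch flattened into a vector, which is the form of input a ViT encoder consumes. It is an illustration only, not the paper's TensorFlow implementation: it uses plain Python lists, and the function name `split_into_patches` and the patch size of 4 are assumptions chosen for the example. The 32 x 32 RGB image size matches a CIFAR-10 sample.

```python
def split_into_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened,
    non-overlapping patches, scanned left to right, top to bottom.

    Hypothetical helper for illustration; not taken from the paper.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    # Append all channel values of this pixel.
                    patch.extend(image[row][col])
            patches.append(patch)
    return patches


# A dummy 32 x 32 RGB image, the size of a CIFAR-10 sample.
image = [[[0.0, 0.0, 0.0] for _ in range(32)] for _ in range(32)]

# Patch size 4 is an assumed value, not one stated in the abstract.
patches = split_into_patches(image, patch_size=4)
print(len(patches), len(patches[0]))  # 64 patches, each of length 4 * 4 * 3 = 48
```

In a full ViT, each flattened patch would then be linearly projected to an embedding, combined with a position embedding, and passed through Transformer encoder layers before the classification head predicts a label.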

Share and Cite:

Omer, A. (2024) Image Classification Based on Vision Transformer. Journal of Computer and Communications, 12, 49-59. doi: 10.4236/jcc.2024.124005.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.