TITLE:
Seeing the Walk: Vision Transformers for Accurate Human Gait Recognition
AUTHORS:
Nouf Nayish M. Alghamdi, Osamah A. M. Ghaleb
KEYWORDS:
Human Gait Recognition, Vision Transformer Models, Gait Pattern
JOURNAL NAME:
Journal of Computer and Communications, Vol. 14 No. 3, March 16, 2026
ABSTRACT: Human gait recognition (HGR) is a non-invasive biometric modality suited to large-scale surveillance and security systems. However, current HGR systems remain sensitive to covariates such as viewpoint, clothing, and carrying conditions. Recent deep learning methods, especially convolutional neural networks, have improved recognition performance, but they are limited in modeling global spatiotemporal interactions and require large amounts of training data. This study presents a transfer learning-based HGR model built on Vision Transformer (ViT) models, which use self-attention to learn strong global feature representations. Two pre-trained transformer models, ViT-B/16 and ViT-L/32, were further trained on gait image sequences from the CASIA-B dataset. The framework was evaluated across four viewing angles (0˚, 18˚, 36˚, and 54˚) under varying covariate conditions. Training and testing accuracy and loss were used as performance metrics, with learning rates of 0.001 and 0.0001. Experiments covered both frontal-view and cross-view analysis. The results indicate that transformer-based models achieve strong recognition performance: ViT-L/32 achieved the highest average testing accuracy at 87.87%, followed closely by ViT-B/16 at 86.99%. Both models outperformed several recently proposed HGR approaches. The attention-based architecture extracts discriminative gait features across image patches, is more robust to viewpoint and appearance changes, and incurs lower computational cost because it relies on pre-trained models. These results highlight Vision Transformers as an effective and precise alternative to traditional deep learning methods for human gait recognition in biometric applications.
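As a rough illustration of how the two model variants named in the abstract differ, the sketch below counts the tokens that self-attention operates on for patch sizes 16 and 32. The 224×224 input resolution is an assumption (the abstract does not state the image size), and the helper name `vit_token_count` is hypothetical; the extra class token follows the standard ViT design.

```python
def vit_token_count(image_size: int, patch_size: int) -> int:
    """Number of tokens a standard ViT attends over for a square image.

    Assumes non-overlapping square patches plus one class token.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1  # +1 for the class token


# Assumed input resolution; not taken from the abstract.
IMAGE_SIZE = 224

for name, patch in (("ViT-B/16", 16), ("ViT-L/32", 32)):
    tokens = vit_token_count(IMAGE_SIZE, patch)
    # Self-attention compares every token with every other token,
    # so the number of pairwise scores grows with tokens squared.
    print(f"{name}: {tokens} tokens, {tokens * tokens} attention pairs per head")
```

Under this assumption, the larger patch size of ViT-L/32 yields far fewer tokens per image than ViT-B/16, which offsets part of the cost of its larger backbone.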