TITLE:
Variational Auto-Encoder and Speeded-Up Robust Features Hybrid Model for Anomaly Detection and Localization in Video Sequence with Scale Variation
AUTHORS:
Sammy Wambugu Kingori, Lawrence Nderu, Dennis Njagi
KEYWORDS:
Variational Auto-Encoder, Speeded-Up Robust Features Hybrid Model
JOURNAL NAME:
Journal of Computer and Communications, Vol.13 No.4, April 27, 2025
ABSTRACT: Anomaly detection in complex crowd scenes is a challenging task due to the inherent variability in crowd behaviors, interactions, and scales. This paper proposes a novel hybrid model that synergistically integrates Variational Autoencoders (VAEs) and Speeded-Up Robust Features (SURF) to address these challenges. The VAE component captures latent temporal patterns in crowd dynamics, while SURF ensures robust, scale-invariant feature extraction. The proposed model leverages multi-resolution analysis, edge computing, and federated learning to enable real-time anomaly detection and localization. Additionally, tensor decomposition is employed for effective spatial-temporal feature integration. A detailed explanation of the feature fusion process between VAE and SURF is provided, highlighting how their interaction contributes to the overall performance improvement. To ensure reproducibility and credibility, we provide specific details about the architecture of the VAE, the implementation of SURF, hyperparameter tuning, the training process, and dataset specifics. To thoroughly evaluate the model’s novelty and performance, we conduct extensive comparisons not only with traditional methods like Hidden Markov Models (HMMs) but also with state-of-the-art deep learning-based anomaly detection models, including Generative Adversarial Networks (GANs), Convolutional Neural Networks (CNNs), and Spatio-Temporal Autoencoders. Furthermore, we provide a comprehensive computational complexity analysis and evaluate real-time performance metrics such as latency and throughput. Experimental evaluations on benchmark datasets demonstrate the model’s superior performance in terms of accuracy, robustness, and computational efficiency, making it a promising solution for real-time applications in surveillance and crowd monitoring.
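As a rough illustration of how a VAE can flag anomalies, the sketch below scores a sample by its reconstruction error plus the closed-form KL divergence between the encoder's diagonal Gaussian and a standard normal prior. This is a generic VAE scoring rule, not the method described in the paper: the abstract does not state the scoring function, and the names `anomaly_score`, `beta`, and `threshold` are assumptions introduced here for illustration. The encoder and decoder outputs (`mu`, `log_var`, `x_hat`) are taken as given inputs.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def reconstruction_error(x, x_hat):
    """Sum of squared differences between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def anomaly_score(x, x_hat, mu, log_var, beta=1.0):
    """Negative-ELBO-style score (assumed rule): higher means less typical."""
    return reconstruction_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


def is_anomalous(score, threshold):
    """Flag a sample whose score exceeds a threshold chosen on normal data."""
    return score > threshold
```

In practice the threshold would be calibrated on scores from anomaly-free training footage, for example as a high percentile of those scores; that choice is likewise an assumption rather than something the abstract specifies.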