TITLE:
Covariant-Scale: Riemannian Manifold Learning for Scale-Invariant Video Anomaly Detection
AUTHORS:
Sammy Wambugu Kingori, Lawrence Nderu, Dennis Njagi
KEYWORDS:
Video-Based Anomaly Detection, Urban Surveillance, Scale-Projective Ambiguity, Riemannian Manifold, Scale-Covariant Tracking, Geometric Deep Learning, Variational Autoencoder (VAE), Ricci Curvature Constraints, LiDAR-Camera Fusion
JOURNAL NAME:
Journal of Computer and Communications, Vol. 13, No. 11, November 21, 2025
ABSTRACT: Video-based anomaly detection in urban surveillance faces a fundamental challenge: scale-projective ambiguity. This occurs when objects of different physical sizes appear identical in camera images due to perspective projection; for example, a child standing 3 meters away may occupy the same number of pixels as an adult standing 5 meters away. This ambiguity causes severe failures in detecting, localizing, and tracking anomalous events. Current methods suffer from three critical limitations: (1) scale-invariant features like SURF discard absolute size information, (2) monocular depth estimation introduces unacceptable errors (>0.5 m at 10 m distance), and (3) existing tracking systems fragment when objects change scale. To address these challenges, we introduce Covariant-Scale, a unified framework that combines classical geometry with modern deep learning. Our approach makes four key contributions. First, we model the space of object transformations (including scale changes) as a curved geometric space, a Riemannian manifold. This allows us to track how objects naturally evolve through different scales while maintaining their physical properties (sketched below). Second, we develop a novel deep learning architecture, a Variational Autoencoder (VAE) with geometric constraints, that learns to separate an object's appearance from its scale. This separation is enforced through a mathematical property called Ricci curvature, which ensures scale information remains consistent regardless of the object's distance from the camera. Third, we integrate LiDAR depth sensors with cameras through physics-based principles. We mathematically prove that LiDAR reduces scale estimation errors by 10,000× compared to camera-only methods at 10-meter distances, a fundamental limit we derive using information theory (the Cramér-Rao bound; see the error-scaling sketch below). Fourth, we develop a tracking system that preserves the physics of motion by converting 2D pixel movements into true 3D velocities (sketched below), reducing identity confusion by 41% during scale transitions. Evaluated on three benchmark datasets (UCSD, ShanghaiTech, Avenue) under extreme scale variations (>15× zoom range), Covariant-Scale achieves 98.2% detection accuracy (a 22.1% improvement over the state of the art), a 76% reduction in false alarms in crowded scenes, and real-time performance (34 ms per frame) on embedded hardware. This work establishes a new paradigm for video analytics that bridges theoretical geometry with practical computer vision, resolving a 20-year challenge in safety-critical surveillance systems.
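To make the first contribution concrete, here is a minimal illustrative sketch, not the paper's implementation: if positive scales are treated as a manifold with a log-invariant metric, geodesics become straight lines in log-scale coordinates, so interpolating or smoothing a track's scale treats zoom-in and zoom-out symmetrically. The function name geodesic_scale_interp is ours.

```python
import numpy as np

def geodesic_scale_interp(s0: float, s1: float, t: float) -> float:
    """Geodesic interpolation between two scales s0, s1 > 0.

    On the positive reals with the log-invariant metric, the geodesic
    from s0 to s1 is s(t) = s0 * (s1 / s0) ** t, i.e. a straight line
    in log-scale coordinates. The midpoint (t = 0.5) is the geometric
    mean, which treats a 2x zoom-in and a 2x zoom-out symmetrically.
    """
    return s0 * (s1 / s0) ** t

# A linear average of scales 0.5 and 2.0 gives 1.25 (biased upward);
# the geodesic midpoint gives 1.0, the scale-symmetric answer.
print(geodesic_scale_interp(0.5, 2.0, 0.5))  # -> 1.0
```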
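The third contribution appeals to the Cramér-Rao bound. As a hedged sketch of the kind of error-scaling argument involved (the noise model and symbols below are our assumptions, not the paper's exact derivation): a camera that recovers depth from a geometric cue such as triangulation has a depth error that grows quadratically with range, while a time-of-flight LiDAR's range error is essentially range-independent.

```latex
% Sketch under our own assumptions, not the paper's exact derivation.
% Triangulation with focal length f, baseline b, and disparity
% d = f b / Z: propagating a pixel-level noise \sigma_d gives a depth
% standard deviation that grows as Z^2,
\[
  \sigma_Z^{\mathrm{cam}} \;\approx\; \frac{Z^2}{f\,b}\,\sigma_d .
\]
% Time-of-flight LiDAR measures Z = c\tau/2, so its range error is
% independent of depth,
\[
  \sigma_Z^{\mathrm{lidar}} \;=\; \frac{c}{2}\,\sigma_\tau .
\]
% The camera-to-LiDAR error ratio therefore scales as Z^2, which is
% the qualitative mechanism behind the large gap the abstract reports
% at 10 m.
```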
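The fourth contribution, turning pixel motion into metric velocity, follows from differentiating the pinhole projection once LiDAR supplies depth and depth rate. A minimal sketch under a standard pinhole model (the function name and the constant-intrinsics assumption are ours):

```python
import numpy as np

def pixel_to_metric_velocity(u, v, du, dv, Z, dZ, fx, fy, cx, cy):
    """Back-project a 2D pixel velocity into a 3D metric velocity.

    Pinhole model: u = fx * X / Z + cx  =>  X = (u - cx) * Z / fx.
    Differentiating in time gives
        dX = (du * Z + (u - cx) * dZ) / fx
    and analogously for Y; dZ comes directly from LiDAR range rates.
    Inputs: pixel position (u, v), pixel velocity (du, dv) in px/s,
    depth Z in meters, depth rate dZ in m/s, intrinsics fx, fy, cx, cy.
    Returns the 3D velocity (dX, dY, dZ) in m/s.
    """
    dX = (du * Z + (u - cx) * dZ) / fx
    dY = (dv * Z + (v - cy) * dZ) / fy
    return np.array([dX, dY, dZ])

# Example: a target 10 m away drifting 20 px/s to the right while
# approaching at 0.5 m/s, seen by a camera with fx = fy = 800 px.
vel = pixel_to_metric_velocity(700, 400, 20.0, 0.0, 10.0, -0.5,
                               800.0, 800.0, 640.0, 360.0)
print(vel)  # lateral speed ~0.21 m/s despite a large pixel velocity
```

Because the back-projected velocity is expressed in meters per second rather than pixels per frame, two detections of the same object at different scales yield consistent motion estimates, which is the property the tracking contribution exploits.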