TITLE:
Asynchronous Multi-Camera-IMU Pose Estimation Algorithm Based on Depth Confidence Optimization
AUTHORS:
Zhi Li, Guoliang Wei, Zhixuan Miao
KEYWORDS:
Multi-Camera, Confidence Scoring, Chi-Square Test, Indoor Localization, Gated Recurrent Unit, Factor Graph
JOURNAL NAME:
Open Journal of Applied Sciences, Vol.16 No.2, February 24, 2026
ABSTRACT: Traditional multi-camera-IMU state estimation systems suffer from insufficient localization accuracy in complex environments and poor robustness under abnormal IMU observations. To address these limitations, this paper proposes an asynchronous, tightly coupled multi-camera-IMU navigation algorithm that incorporates a depth confidence scoring strategy and a chi-square test mechanism. The proposed method is built on the factor graph optimization (FGO) framework. On top of conventional multi-camera-IMU fusion, a gated recurrent unit (GRU)-based depth confidence scoring model is introduced to mitigate uncertainty and partially suppress noise errors. The confidence scores are then used as weights in the factor graph, so that the optimization minimizes a weighted sum of squared residuals, thereby improving the accuracy of the optimized state variables. Furthermore, a dynamic outlier rejection strategy based on chi-square testing is designed. It operates in three stages (pre-screening, initialization, and iterative statistical optimization) to preferentially use well-distributed, high-quality observation subsets while excluding noise-sensitive measurements, thus strengthening the 6-DoF pose constraints. Extensive evaluations are conducted in indoor simulation scenarios and on the public VINS-Multi dataset (sequence raw_515_435). The results show that, after incorporating the depth confidence scores, the pose uncertainty in asynchronous settings is reduced from an error range of 0.1 to 0.2 down to within 0.05. Across all sequences in the dataset, the proposed method achieves an average improvement of 31.6% in localization accuracy, with gains of up to 46.8% in challenging low-texture regions and scenarios with large motion magnitudes.
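The two mechanisms named in the abstract, confidence-weighted residuals and chi-square outlier rejection, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the isotropic noise level `sigma`, the 2-D residual layout, and the significance level `alpha` are all assumptions made for illustration, and the sketch performs a single gating pass, not the three-stage procedure described above. Confidence scores are passed in as plain numbers; the GRU-based scoring model is not reproduced.

```python
import math


def chi2_threshold_2dof(alpha=0.05):
    """Upper-tail critical value of the chi-square distribution with 2 DoF.

    For 2 DoF the CDF is 1 - exp(-x / 2), so the critical value has the
    closed form -2 * ln(alpha) and needs no statistics library.
    """
    return -2.0 * math.log(alpha)


def gate_and_weight(residuals, confidences, sigma=1.0, alpha=0.05):
    """Reject outliers with a chi-square gate, then accumulate a weighted cost.

    residuals   : list of (rx, ry) 2-D residuals (hypothetical layout)
    confidences : list of scores in [0, 1], assumed to come from some
                  external scoring model
    sigma       : assumed isotropic measurement standard deviation
    Returns (kept_indices, weighted_cost).
    """
    threshold = chi2_threshold_2dof(alpha)
    kept = []
    cost = 0.0
    for i, ((rx, ry), w) in enumerate(zip(residuals, confidences)):
        # Squared Mahalanobis distance under isotropic noise.
        d2 = (rx * rx + ry * ry) / (sigma * sigma)
        if d2 > threshold:
            continue  # statistically inconsistent measurement, skip it
        kept.append(i)
        cost += w * d2  # confidence-weighted squared residual
    return kept, cost


if __name__ == "__main__":
    residuals = [(0.5, 0.5), (3.0, 3.0), (1.0, -1.0)]
    confidences = [0.9, 0.8, 0.5]
    kept, cost = gate_and_weight(residuals, confidences)
    print(kept, cost)
```

In this toy input the second residual has a squared distance of 18, well above the 2-DoF threshold of about 5.99, so it is dropped before the weighted cost is formed. In a full factor graph the surviving weighted terms would be handed to a nonlinear least-squares solver and not simply summed once.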