Research on Vehicle Tracking Method Based on YOLOv8 and Adaptive Kalman Filtering: Integrating SVM Dynamic Selection and Error Feedback Mechanism
1. Introduction
Vehicle tracking technology finds extensive applications in intelligent transportation systems, autonomous driving, and video surveillance [1]. Its core task is to precisely localize and continuously track target vehicles, supporting key functions such as path planning, traffic flow analysis, and behavior prediction [2].
In autonomous driving, accurate vehicle tracking enables vehicles to identify dynamic obstacles in complex environments, facilitating safe and efficient decision-making. In traffic surveillance, it aids in detecting traffic violations, monitoring congestion, and improving road efficiency.
However, the highly dynamic and unpredictable nature of real-world environments presents numerous challenges for vehicle tracking:
1) Occlusion Problem
Occlusion is a common issue in vehicle tracking within multi-vehicle scenarios [3]. For instance, when a vehicle is partially or fully obscured by other vehicles, pedestrians, or static objects, the continuity of detection and tracking may be severely affected.
2) Dynamic Scenarios and Complex Backgrounds
In densely populated dynamic traffic scenes, vehicle motion can exhibit nonlinear behaviors such as acceleration, deceleration, and turning. Additionally, complex backgrounds—including multi-target interference, strong lighting, or shadows—can significantly reduce the accuracy of detection and tracking [4].
3) Error Accumulation
In traditional tracking frameworks, discrepancies may arise between detection results and trajectory predictions by filters. When tracking persists over time or the dynamics of the scene intensify, these errors can accumulate, ultimately leading to tracking failures.
To overcome these challenges, it is crucial to develop vehicle tracking methods that offer high accuracy, strong robustness, and dynamic adaptability. This paper proposes an integrated approach combining object detection with trajectory prediction. The method utilizes the deep learning-based object detection model YOLOv8 [5] and an adaptive filter selector to achieve efficient and robust vehicle tracking.
To address the limitations of existing methods, this paper introduces two key innovations:
1) SVM-Based Adaptive Filter Selection Mechanism
Vehicle motion characteristics (e.g., constant velocity, acceleration, nonlinear movement) vary across different scenarios, making it challenging for a single filter to effectively handle diverse conditions. To address this, an adaptive filter selection mechanism based on Support Vector Machine (SVM) is designed. This mechanism dynamically selects the optimal filter—standard Kalman Filter (KF), Extended Kalman Filter (EKF), or Unscented Kalman Filter (UKF)—based on the current motion characteristics of the vehicle. This approach significantly enhances tracking accuracy and robustness in complex and dynamic environments.
2) Integration of Detection and Tracking Error Feedback Mechanism
To minimize discrepancies between detection and prediction, an error feedback mechanism is proposed. This mechanism dynamically feeds the error between YOLOv8 detection results and filter predictions back to the filter parameter adjustment module. By adaptively tuning the process noise covariance and measurement noise covariance, the filter can better handle sudden events (e.g., abrupt acceleration or sharp turns), thereby improving the consistency and stability of tracking.
2. Related Work
Vehicle tracking technology is a key research area in intelligent transportation, autonomous driving, and security fields. Its main objective is to accurately locate the target vehicle and predict its motion trajectory. Currently, vehicle tracking methods are mainly divided into two categories: deep learning-based object detection methods [6] and classical filtering-based trajectory prediction methods [7]. This section reviews these two approaches and discusses their advantages and limitations based on the latest research.
2.1. Deep Learning-Based Object Detection Methods
With the development of deep learning, vehicle tracking methods based on object detection have gradually become mainstream. Object detectors generate candidate bounding boxes by detecting each video frame, and vehicle trajectories are formed using subsequent association strategies.
1) YOLO Series Models
The YOLO (You Only Look Once) model, proposed by Redmon et al. [8], pioneered single-stage object detection. The current popular version, YOLOv8, optimizes the network structure and feature fusion, demonstrating excellent accuracy and speed in vehicle detection tasks [9] [10].
The advantage of the YOLOv8 model is its ability to achieve real-time vehicle detection with a high frame rate, performing well even in occlusion and low-light conditions. However, its detection results are highly dependent on the diversity of the training data, and there is limited coordination with subsequent tracking modules.
2) Faster R-CNN
Faster R-CNN, introduced by Ren et al. [11], significantly improves detection accuracy by incorporating a Region Proposal Network (RPN). Zhang Ying et al. [12] designed a vehicle detection solution based on the Faster R-CNN model for unmanned aerial vehicle (UAV) platforms. Ouyang Bo et al. [13] proposed a lightweight tracking model, FA-SORT, which achieved a tracking speed of 29.93 FPS when tested on the UAVDT dataset.
The advantage of Faster R-CNN lies in its high accuracy, making it particularly suitable for complex backgrounds and dense target scenarios. However, its computational complexity is relatively high, making it difficult to meet real-time processing requirements.
2.2. Kalman Filtering and Its Variants
Classic filtering methods, represented by the Kalman filter, estimate and update vehicle trajectories by combining detection results with predicted states. These methods are typically used in conjunction with object detectors for multi-target tracking tasks.
1) Kalman Filter (KF)
The Kalman Filter (KF) is an efficient linear state estimation method widely used for vehicle tracking in scenarios involving constant or linear motion. Gordon et al. [14] applied KF in real-time traffic flow monitoring, significantly improving the accuracy of vehicle trajectory predictions. However, KF performs poorly when dealing with nonlinear or complex motion.
2) Extended Kalman Filter (EKF)
The Extended Kalman Filter (EKF) extends the KF to nonlinear systems by performing a first-order linearization, making it suitable for a wider range of applications. Reid et al. [15] applied EKF to multi-target tracking, achieving high accuracy in vehicle scenarios involving rapid acceleration and sharp turns. However, its performance in highly nonlinear scenarios is still limited by linearization errors.
3) Unscented Kalman Filter (UKF)
The Unscented Kalman Filter (UKF) improves trajectory prediction accuracy in nonlinear systems by using sigma-point sampling, eliminating the need for linearization. The UKF algorithm, proposed by van der Merwe et al. [16], demonstrates excellent performance in dynamic target tracking. Farag et al. [17] combined UKF with deep detectors to perform multi-target tracking in complex traffic environments.
2.3. Integrated Approaches Combining Deep Learning and Classical Filtering
In recent years, researchers have explored the deep integration of object detection and filtering techniques to optimize vehicle tracking performance. Bui T et al. [18] proposed a tracking framework that combines YOLOv5 with an optimized Kalman filter (KF) algorithm in DeepSORT, achieving a 15.5% improvement in mAP and a 14.2% increase in MOTA. Gao J et al. [19] reduced the impact of observation noise on detection accuracy during nonlinear motion by incorporating a lightweight channel block attention mechanism (LCBAM) and noise-adaptive Extended Kalman Filter (NSA-EKF). These methods demonstrate the complementarity of deep detectors and classical filtering techniques in vehicle tracking tasks, making them a current research hotspot.
Deep learning-based object detection methods offer efficient target recognition capabilities, while classical filtering methods provide irreplaceable advantages in state estimation and trajectory prediction. Integrating the strengths of both approaches into a unified tracking framework is an effective solution for vehicle tracking in complex dynamic environments.
3. Methodology
3.1. YOLOv8 Detection Module
In this study, YOLOv8 is responsible for the initial vehicle detection, providing high-quality target information for the subsequent tracking module. This information is then used to enhance the overall tracking performance through Kalman filtering and error feedback mechanisms. The following section outlines the configuration and structural optimizations of the YOLOv8 model used in this research.
1) Overall Framework Configuration
In this study, YOLOv8 still uses CSPDarknet as the backbone network, incorporating a bottleneck structure (Bottleneck CSP) to reduce redundant computations and lower model complexity, thereby improving detection speed while maintaining accuracy. Additionally, multi-scale feature fusion modules (FPN and PAN) are introduced in the structure to enhance the model’s performance in detecting vehicles of varying sizes. Furthermore, the application of depthwise separable convolutions significantly reduces the computational cost of YOLOv8, enabling real-time processing. Finally, to improve detection accuracy, a reinforced data augmentation strategy is employed during YOLOv8 training, including random scaling, translation, flipping, and color adjustment, which enhances the model’s robustness and generalization ability. The overall framework of YOLOv8 is shown in Figure 1.
Figure 1. YOLOv8 structural frame diagram.
2) Input and Output Formats
The input format for the YOLOv8 detection module is a standard RGB image, which is normalized and resized to a fixed dimension (e.g., 640 × 640) before being fed into the model. This ensures consistent detection accuracy across images with different resolutions. Because YOLOv8 uses an anchor-free detection head, the model does not depend on predefined anchor boxes and can flexibly accommodate vehicles of various sizes and perspectives.
The output format of YOLOv8 includes the coordinates of the predicted bounding boxes, target categories, and confidence scores. For vehicle detection tasks, the model outputs the vehicle’s bounding box positions (x, y, width, height) along with the probability distribution for the corresponding category (e.g., car, truck, motorcycle), where the confidence score is used to filter out low-confidence targets. After post-processing with Non-Maximum Suppression (NMS), overlapping redundant boxes are eliminated to ensure the accuracy of the output results. In this study, the detection boxes output by YOLOv8 are directly fed into the Kalman filter module for further trajectory prediction and state estimation.
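The post-processing described above can be sketched in NumPy as follows. This is a minimal illustration rather than the detector's internal implementation: it assumes that (x, y) denotes the box center, uses the confidence and NMS thresholds listed in Table 3 (0.5 and 0.45), and all helper names are hypothetical.

```python
import numpy as np

def xywh_to_xyxy(boxes):
    """Convert (cx, cy, w, h) boxes to (x1, y1, x2, y2) corners."""
    cx, cy, w, h = boxes.T
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

def iou_one_to_many(box, others):
    """IoU between one corner-format box and an array of corner-format boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)

def filter_detections(boxes_xywh, scores, conf_thres=0.5, iou_thres=0.45):
    """Confidence filtering followed by greedy NMS; returns kept indices."""
    idx = np.flatnonzero(scores >= conf_thres)     # drop low-confidence boxes
    corners = xywh_to_xyxy(boxes_xywh[idx])
    order = np.argsort(-scores[idx])               # highest score first
    kept = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        kept.append(int(idx[best]))
        ious = iou_one_to_many(corners[best], corners[rest])
        order = rest[ious <= iou_thres]            # suppress heavy overlaps
    return kept

# Hypothetical detections: two overlapping boxes, one separate box, one weak box.
boxes = np.array([[100, 100, 50, 40], [102, 101, 50, 40],
                  [300, 200, 60, 50], [10, 10, 5, 5]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.3])
kept = filter_detections(boxes, scores)
centers = boxes[kept, :2]   # (x, y) observations handed to the filter
```

The resulting centers are two-dimensional observations, which matches the observation dimension of 2 listed in Table 4.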
3.2. Kalman Filtering and Its Improvements
In vehicle tracking tasks, the Kalman filter is a commonly used state estimation method that combines observation and prediction to estimate the object’s position and motion state. While the standard Kalman filter (KF) performs excellently in linear systems with Gaussian noise, it has limitations in dynamic, nonlinear, and non-Gaussian environments. Therefore, this study introduces several variants of the Kalman filter, including the Extended Kalman Filter (EKF), Unscented Kalman Filter (UKF), and Interacting Multiple Model (IMM) filter. Additionally, an adaptive filter selection mechanism based on Support Vector Machine (SVM) is proposed to dynamically select the optimal filter in different scenarios, thus optimizing detection and tracking performance.
3.2.1. Mathematical Models and Characteristics of Kalman Filter Variants
1) Standard Kalman Filter (KF)
The standard Kalman filter assumes that both the system state model and the measurement model are linear and that the noise follows a Gaussian distribution. The state estimation process consists of two stages: prediction and update.
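The state prediction equations are given by Equations (1) and (2).
$$\hat{x}_{k|k-1} = F\,\hat{x}_{k-1|k-1} \tag{1}$$
$$P_{k|k-1} = F\,P_{k-1|k-1}\,F^{T} + Q \tag{2}$$
The state update equations are given by Equations (3), (4), and (5).
$$K_k = P_{k|k-1}\,H^{T}\left(H\,P_{k|k-1}\,H^{T} + R\right)^{-1} \tag{3}$$
$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\left(z_k - H\,\hat{x}_{k|k-1}\right) \tag{4}$$
$$P_{k|k} = \left(I - K_k H\right)P_{k|k-1} \tag{5}$$
Wherein, $F$ represents the state transition matrix, $H$ is the measurement matrix, $Q$ and $R$ denote the process noise and measurement noise covariance matrices, respectively, and $P$ is the estimation error covariance matrix.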
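The prediction and update stages can be sketched in NumPy as follows, using the conventional symbols F, H, Q, R, and P. The constant-velocity state layout [x, y, vx, vy], the frame interval dt, and the synthetic detections are assumptions made for illustration; the noise scales follow the KF row of Table 4.

```python
import numpy as np

def make_cv_model(dt=1.0):
    """Constant-velocity model: state [x, y, vx, vy], observation [x, y]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    return F, H

def kf_predict(x, P, F, Q):
    """Propagate the state estimate and its covariance one step ahead."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x_pred, P_pred, z, H, R):
    """Correct the prediction with the observation z."""
    S = H @ P_pred @ H.T + R                   # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new

F, H = make_cv_model(dt=1.0)
Q, R = 0.1 * np.eye(4), 0.05 * np.eye(2)       # KF row of Table 4
x, P = np.zeros(4), 1.0 * np.eye(4)

# Synthetic, noise-free detections of a vehicle moving at (2, 1) pixels per frame.
for k in range(1, 21):
    z = np.array([2.0 * k, 1.0 * k])
    x, P = kf_predict(x, P, F, Q)
    x, P = kf_update(x, P, z, H, R)
```

After a few frames the estimate settles on the true position and velocity, which is the expected behavior when the motion really is linear.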
2) Extended Kalman Filter (EKF)
The Extended Kalman Filter (EKF) extends the linear assumption of the standard Kalman Filter (KF) to accommodate mildly nonlinear systems. The EKF achieves approximate linearization by performing a first-order Taylor expansion on the nonlinear state transition function and observation function.
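The equations for the state prediction phase (nonlinear) are given by Equations (6) and (7).
$$\hat{x}_{k|k-1} = f\left(\hat{x}_{k-1|k-1}\right) \tag{6}$$
$$P_{k|k-1} = F_k\,P_{k-1|k-1}\,F_k^{T} + Q \tag{7}$$
The state update equations are given by Equations (8), (9), and (10).
$$K_k = P_{k|k-1}\,H_k^{T}\left(H_k\,P_{k|k-1}\,H_k^{T} + R\right)^{-1} \tag{8}$$
$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\left(z_k - h\left(\hat{x}_{k|k-1}\right)\right) \tag{9}$$
$$P_{k|k} = \left(I - K_k H_k\right)P_{k|k-1} \tag{10}$$
In this context, $f$ and $h$ represent the nonlinear functions for state transition and observation, respectively, and $F_k$ and $H_k$ are their Jacobian matrices obtained from the first-order Taylor expansion, evaluated at $\hat{x}_{k-1|k-1}$ and $\hat{x}_{k|k-1}$.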
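A single EKF step can be sketched as follows. The paper does not specify the nonlinear functions, so the motion model used here (state [x, y, speed, heading], observation [x, y]) is purely an assumption for illustration, as are the function names; the noise scales follow the EKF row of Table 4.

```python
import numpy as np

DT = 1.0  # assumed frame interval

def f(x):
    """Assumed nonlinear transition: move along the current heading."""
    px, py, v, th = x
    return np.array([px + v * np.cos(th) * DT, py + v * np.sin(th) * DT, v, th])

def jac_f(x):
    """Jacobian of f with respect to the state."""
    _, _, v, th = x
    return np.array([[1, 0, np.cos(th) * DT, -v * np.sin(th) * DT],
                     [0, 1, np.sin(th) * DT,  v * np.cos(th) * DT],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1]], dtype=float)

def h(x):
    """Observation function: the detector reports position only."""
    return x[:2]

def jac_h(x):
    """Jacobian of h (constant because h is linear here)."""
    return np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0]])

def ekf_step(x, P, z, Q, R):
    """One predict/update cycle with first-order linearization."""
    F = jac_f(x)                        # linearize around the previous estimate
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    H = jac_h(x_pred)                   # linearize around the prediction
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

Q, R = 0.15 * np.eye(4), 0.05 * np.eye(2)          # EKF row of Table 4
x, P = np.array([0.0, 0.0, 1.5, 0.6]), np.eye(4)   # deliberately imperfect start
true_v, true_th = 2.0, np.pi / 4
for k in range(1, 31):
    z = np.array([true_v * np.cos(true_th) * k, true_v * np.sin(true_th) * k])
    x, P = ekf_step(x, P, z, Q, R)
```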
3) Unscented Kalman Filter (UKF)
The Unscented Kalman Filter (UKF) addresses the state estimation problem in nonlinear systems through a set of weighted sample points known as “sigma points.” Without the need for linearization, the UKF achieves higher-precision state estimation under stronger nonlinear conditions.
Sigma Point Generation and Propagation: A set of weighted sigma points is generated, and the propagation results of each sigma point are used to update the mean and covariance, thereby calculating the estimated state distribution.
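Sigma point generation and propagation can be sketched as follows. The paper does not state which sigma-point scheme or scaling parameters it uses, so the scaled (Merwe-style) construction and the values of alpha, beta, and kappa below are assumptions, and the function names are hypothetical.

```python
import numpy as np

def merwe_sigma_points(mean, cov, alpha=1.0, beta=2.0, kappa=1.0):
    """Return 2n+1 sigma points with mean weights wm and covariance weights wc."""
    n = mean.size
    lam = alpha ** 2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * cov)      # columns are the offsets
    points = np.vstack([mean, mean + L.T, mean - L.T])
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1.0 - alpha ** 2 + beta)
    return points, wm, wc

def unscented_transform(points, wm, wc, fn, noise_cov):
    """Propagate sigma points through fn and recover mean and covariance."""
    propagated = np.array([fn(p) for p in points])
    mean = wm @ propagated
    diff = propagated - mean
    cov = (wc[:, None] * diff).T @ diff + noise_cov
    return mean, cov

mean = np.array([1.0, 2.0, 0.5, -0.5])
cov = np.diag([1.0, 2.0, 0.5, 0.1])
points, wm, wc = merwe_sigma_points(mean, cov)

# Sanity check with the identity map: the original moments are recovered.
m_id, c_id = unscented_transform(points, wm, wc, lambda p: p, np.zeros((4, 4)))

# Prediction through a constant-velocity step with the UKF process noise of Table 4.
F = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
m_pred, c_pred = unscented_transform(points, wm, wc, lambda p: F @ p, 0.2 * np.eye(4))
```

For a linear map the transform reproduces the Kalman prediction exactly; the benefit of the UKF appears when the propagated function is nonlinear.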
3.2.2. SVM-Based Adaptive Filter Selection Mechanism
In complex vehicle tracking scenarios, a single filter struggles to cope with all situations due to the presence of different motion patterns (such as uniform speed, acceleration, nonlinear trajectories, etc.). Therefore, an adaptive filter selection mechanism based on Support Vector Machines (SVMs) is proposed. This mechanism utilizes SVMs to classify motion patterns in the input feature space, thereby selecting the most suitable Kalman filter variant according to the characteristics of the scene. This dynamic selection can significantly enhance the adaptability of the filter and ensure the accuracy and robustness of the tracking process.
The design of the SVM-based adaptive filter selection mechanism can be divided into three main steps: feature extraction, SVM classifier training, and dynamic filter selection.
1) Feature Extraction
To achieve accurate filter selection, the system needs to extract features that can effectively distinguish between motion patterns. These features include the target’s velocity ($v$), acceleration ($a$), and the rate of change in its motion trajectory (such as curvature changes). The distributions of these features under different motion patterns are typically distinct, serving as effective bases for classification. Therefore, we define the feature vector as shown in Equation (11):
$$X = [v,\ a,\ \Delta\theta]^{T} \tag{11}$$
Wherein, $v$ represents the instantaneous velocity of the target at the current moment; $a$ denotes the acceleration of the target; and $\Delta\theta$ indicates the change angle of the movement direction (i.e., steering angle or trajectory curvature).
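One way to compute such a feature vector from the last three tracked positions is sketched below. The finite-difference scheme, the frame interval dt, and the function name are assumptions; the paper does not specify how the features are computed.

```python
import math

def motion_features(track, dt=1.0):
    """Return [speed, acceleration, heading change] from the last three (x, y) points.

    track: sequence of at least three (x, y) centers, oldest first.
    """
    (x0, y0), (x1, y1), (x2, y2) = track[-3:]
    vx_prev, vy_prev = (x1 - x0) / dt, (y1 - y0) / dt
    vx, vy = (x2 - x1) / dt, (y2 - y1) / dt
    v_prev, v = math.hypot(vx_prev, vy_prev), math.hypot(vx, vy)
    a = (v - v_prev) / dt                                   # change in speed
    dtheta = math.atan2(vy, vx) - math.atan2(vy_prev, vx_prev)
    dtheta = (dtheta + math.pi) % (2 * math.pi) - math.pi   # wrap to [-pi, pi)
    return [v, a, dtheta]

uniform = motion_features([(0, 0), (1, 0), (2, 0)])
accelerating = motion_features([(0, 0), (1, 0), (3, 0)])
turning = motion_features([(0, 0), (1, 0), (1, 1)])
```

In practice the raw positions are noisy, so smoothing over a longer window before differencing would likely be needed.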
2) SVM Classifier
Within the feature space, the feature vector X is input into the Support Vector Machine (SVM) classifier. The SVM utilizes the trained classification boundary to separate different motion patterns. Specifically, we train a multi-class SVM classifier that categorizes vehicle motion into three classes: uniform speed, acceleration, and complex nonlinear motion.
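Assume the training data is denoted as $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ represents the input features and $y_i$ corresponds to the motion pattern labels (1 for uniform motion, 2 for accelerated motion, and 3 for nonlinear motion). The objective is to find an optimal classification boundary by solving the following optimization problem using SVM, as shown in Equations (12) and (13):
$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{N}\xi_i \tag{12}$$
subject to
$$y_i\left(w^{T}\phi(x_i) + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1, \dots, N \tag{13}$$
Wherein, $w$ represents the classification weight vector, $b$ is the bias, $\xi_i$ stands for the slack variable, $C$ is the penalty parameter, and $\phi(\cdot)$ denotes the kernel-induced mapping of features into a high-dimensional space. Equations (12) and (13) are written for a binary subproblem in which the labels are recoded as $y_i \in \{-1, +1\}$; the three-class classifier combines several such subproblems. By solving the aforementioned optimization problem, the SVM obtains the optimal separating hyperplane, enabling classification of newly input data points.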
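A three-class motion classifier of this kind can be sketched with scikit-learn. The use of scikit-learn, the RBF kernel, the feature standardization step, and the synthetic training data are all assumptions made for illustration; the paper does not state its SVM implementation, kernel, or training set. scikit-learn's SVC handles the multi-class case internally with a one-vs-one scheme.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def synth(n, v, a, dtheta, label):
    """Synthetic [v, a, dtheta] samples scattered around one motion pattern."""
    X = rng.normal([v, a, dtheta], [0.3, 0.1, 0.05], size=(n, 3))
    return X, np.full(n, label)

parts = [synth(100, 10.0, 0.0, 0.0, 1),   # 1: uniform motion
         synth(100, 10.0, 2.0, 0.0, 2),   # 2: accelerated motion
         synth(100, 10.0, 0.5, 0.6, 3)]   # 3: nonlinear motion
X = np.vstack([p[0] for p in parts])
y = np.concatenate([p[1] for p in parts])

# Standardize features so that no single feature dominates the RBF distance.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)

pred = clf.predict([[10.0, 0.0, 0.0], [10.0, 2.0, 0.0], [10.0, 0.5, 0.6]])
```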
3) Filter Selection Strategy
Once the current motion pattern is determined, the system dynamically selects the corresponding Kalman filter variant based on the SVM classification result. The specific selection strategy is as follows:
For uniform motion, the standard Kalman Filter (KF) is selected;
For accelerated motion, the Extended Kalman Filter (EKF) is chosen;
For nonlinear motion, the Unscented Kalman Filter (UKF) is utilized.
Therefore, given the input features X at the current time, the SVM outputs a label $y \in \{1, 2, 3\}$, and the corresponding filter is selected based on this label to enhance vehicle tracking performance.
Compared to a fixed filter, the SVM-based adaptive selection mechanism overcomes the limitations of a single filter in complex scenarios by dynamically switching filters, thereby improving tracking accuracy and stability.
Figure 2 illustrates the flowchart of the SVM-based adaptive filter selection mechanism.
Figure 2. Flowchart of the adaptive filter selection mechanism.
Firstly, data preprocessing and feature extraction are conducted to compute the current speed, acceleration, and direction change angle from the target’s historical trajectory, forming the feature vector X.
Subsequently, the feature vector X is input into an SVM classifier, which outputs the current motion mode label y (in a multi-class SVM, based on the trained classification boundaries, the SVM can effectively distinguish between uniform, accelerated, and complex nonlinear motions).
Next, based on the classification result y, the corresponding Kalman filter variant is selected. For instance, when y = 1, the system employs the standard Kalman Filter (KF); when y = 2, the Extended Kalman Filter (EKF) is used; and when y = 3, the Unscented Kalman Filter (UKF) is chosen.
Finally, with the selected filter, the YOLOv8 detection results are input as observations to the filter, and the updated target state is calculated to obtain more precise location information.
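The label-to-filter dispatch in this flow can be sketched as follows. The mapping comes from the selection strategy stated in the text; the per-frame label sequence and the helper names are hypothetical.

```python
from collections import Counter

# Label-to-filter mapping taken from the selection strategy in the text.
FILTER_BY_LABEL = {1: "KF", 2: "EKF", 3: "UKF"}

def select_filter(label):
    """Return the filter name for an SVM motion label (1, 2, or 3)."""
    if label not in FILTER_BY_LABEL:
        raise ValueError(f"unknown motion label: {label!r}")
    return FILTER_BY_LABEL[label]

def count_switches(labels):
    """Number of frames at which the selected filter differs from the previous frame."""
    chosen = [select_filter(lbl) for lbl in labels]
    return sum(1 for prev, cur in zip(chosen, chosen[1:]) if prev != cur)

labels = [1, 1, 2, 2, 3, 3, 3, 1]   # hypothetical per-frame SVM outputs
usage = Counter(select_filter(lbl) for lbl in labels)
switches = count_switches(labels)
```

A practical implementation would also need to carry the state estimate and covariance across a switch so that the newly selected filter does not restart from scratch; how this hand-over is performed is not stated in the paper.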
3.3. Error Feedback Mechanism
Apart from selecting the appropriate Kalman filter, dynamic changes or unexpected events in the environment (such as sudden acceleration, deceleration, occlusion, or unexpected direction changes of the target) can also lead to increased tracking errors. To enhance the system’s responsiveness to these unexpected events, this study proposes an error feedback mechanism that dynamically adjusts the parameters of the Kalman filter by feeding back detection errors. This mechanism not only compensates for errors generated during detection and tracking but also adapts when the filter’s response is poor, thereby improving the system’s stability and accuracy in complex environments.
The design of the error feedback mechanism is based on dynamic feedback adjustment of detection errors and mainly consists of the following steps:
1) Error Calculation
In each frame, the detection error $e_k$ is calculated based on the difference between the target position detected by YOLOv8 (detection result) and the predicted position by the Kalman filter (prediction result).
Assuming the detection result is $z_k$ (observation) and the filter’s prediction is $\hat{z}_{k|k-1}$, the error is defined as Equation (14):
$$e_k = z_k - \hat{z}_{k|k-1} \tag{14}$$
Wherein, $e_k$ denotes the detection error vector at time $k$, encompassing information on the differences in position and velocity.
2) Error Weight Calculation
Different feedback intensities should be applied to detection errors in various scenarios. By introducing an error weight $w_k$, the errors are weighted accordingly. The error weight can be dynamically adjusted based on the magnitude of the detection error and the frequency of occurrence of unexpected events. For instance, a larger feedback weight is assigned to significantly increased errors.
The feedback weight can be defined by Equation (15):
(15)
Wherein, Equation (15) has two adjustable parameters, and $\|e_k\|$ represents the magnitude of the error. By adjusting the feedback weight, the filter’s responsiveness to unexpected events can be enhanced.
3) Filter Gain Adjustment
The error is fed back to the process noise covariance $Q$ and measurement noise covariance $R$ of the Kalman filter. By dynamically adjusting $Q$ and $R$, the filter can enhance its adaptability to different error scenarios, thereby improving tracking accuracy.
The adjustment model for the feedback mechanism can be expressed as Equations (16) and (17):
(16)
(17)
The error feedback adjusts the process noise covariance matrix $Q$ and the measurement noise covariance matrix $R$, enabling the filter to adaptively adjust its gain matrix based on the current error.
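The three feedback steps (error calculation, weighting, and covariance adjustment) can be sketched as follows. The exact forms of Equations (15) to (17) are not reproduced here, so the logistic weight, the linear scaling of Q and R, and every parameter value below are assumptions chosen only to illustrate the idea that a larger error should raise the filter gain.

```python
import numpy as np

def innovation(z, x_pred, H):
    """Detection error: observed position minus predicted position."""
    return z - H @ x_pred

def feedback_weight(e, alpha=1.0, beta=5.0):
    """ASSUMED logistic weight in (0, 1) that grows with the error norm."""
    return 1.0 / (1.0 + np.exp(-alpha * (np.linalg.norm(e) - beta)))

def adapt_noise(Q0, R0, w, q_gain=4.0, r_gain=0.5):
    """ASSUMED rule: inflate Q and shrink R as w approaches 1."""
    return (1.0 + q_gain * w) * Q0, (1.0 - r_gain * w) * R0

def kalman_gain(P_pred, H, R):
    """Standard Kalman gain for a given predicted covariance."""
    return P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)

H = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0]])
Q0, R0 = 0.1 * np.eye(4), 0.05 * np.eye(2)   # KF row of Table 4
P = 0.1 * np.eye(4)                          # illustrative converged covariance
x_pred = np.array([100.0, 50.0, 2.0, 1.0])

gains = {}
observations = {"steady": np.array([100.5, 50.2]),    # small error
                "sudden": np.array([112.0, 58.0])}    # abrupt maneuver
for name, z in observations.items():
    w = feedback_weight(innovation(z, x_pred, H))
    Q, R = adapt_noise(Q0, R0, w)
    # P + Q stands in for the predicted covariance (F = I for brevity).
    gains[name] = kalman_gain(P + Q, H, R)[0, 0]
```

With these assumed rules the position gain rises from roughly 0.80 in the steady case to roughly 0.96 after the abrupt maneuver, so the filter follows the detector more closely when its own prediction has become unreliable.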
Figure 3 illustrates the overall process of adaptive tracking.
Figure 3. Overall flowchart of adaptive tracking.
4. Experimental Setup
4.1. Datasets and Evaluation Metrics
To comprehensively evaluate the performance of the vehicle detection and tracking system proposed in this study, two widely used standard datasets, KITTI [20] and UA-DETRAC [21], are selected. These datasets cover a variety of scenarios (such as urban roads, highways, and adverse weather conditions) and different dynamic conditions (such as occlusion and low-light environments). Additionally, to quantify the model’s performance, mean Average Precision (mAP), Multiple Object Tracking Accuracy (MOTA), and Frames Per Second (FPS) for real-time performance are adopted as the core evaluation metrics.
4.1.1. Dataset Selection
1) KITTI Dataset
The KITTI dataset is one of the standard datasets for autonomous driving research. It primarily consists of images from common traffic scenarios such as urban roads and highways, including 7481 training images and 7518 test images.
2) UA-DETRAC Dataset
The UA-DETRAC dataset focuses on vehicle detection and tracking tasks. It covers high-density traffic areas such as urban roads and intersections. It provides 10 hours of video data, comprising 1210 video clips and 140,000 annotated frames.
Table 1 summarizes the main characteristics of the KITTI and UA-DETRAC datasets:
Table 1. Main characteristics of the KITTI and UA-DETRAC datasets.
| Dataset | Task Type | Data Volume | Scenario Coverage | Challenges |
| --- | --- | --- | --- | --- |
| KITTI | Detection & Tracking | 7481 Training Images | Urban, Highways | Occlusion, Multi-target, Illumination Changes |
| UA-DETRAC | Detection & Tracking | 1210 Video Clips | Urban, Traffic Intersections | Complex Background, Dense Targets, Low-light Scenarios |
4.1.2. Evaluation Metrics
This paper quantitatively evaluates the system performance from both detection and tracking perspectives, mainly using the following metrics:
1) Mean Average Precision (mAP@0.5)
mAP is a core metric for object detection, used to measure the overall performance of a detection model across all categories. mAP@0.5 indicates the mean average precision calculated with an Intersection over Union (IoU) threshold of 0.5. Its calculation is shown in Equation (18):
$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N}\mathrm{AP}_i \tag{18}$$
where $N$ is the number of categories, and $\mathrm{AP}_i$ is the average precision for the i-th category.
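The averaging in Equation (18) can be sketched as follows. The per-class AP is computed here with all-point interpolation of the precision-recall curve, which is an assumption because the paper does not state its interpolation scheme. The sketch also assumes that each detection has already been marked as a true or false positive by matching it to ground truth at IoU ≥ 0.5.

```python
def average_precision(scores, is_tp, n_gt):
    """Area under the interpolated precision-recall curve for one class."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # Make precision monotonically non-increasing from right to left.
    for j in range(len(precisions) - 2, -1, -1):
        precisions[j] = max(precisions[j], precisions[j + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

def mean_ap(ap_per_class):
    """Equation (18): unweighted mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)

# Hypothetical detections for two classes.
ap_car = average_precision([0.9, 0.8, 0.7], [True, False, True], n_gt=2)
ap_truck = average_precision([0.95, 0.6], [True, True], n_gt=2)
map_50 = mean_ap([ap_car, ap_truck])
```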
2) Multiple Object Tracking Accuracy (MOTA)
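MOTA is used to quantify the overall performance of a tracking task, taking into account object losses, mis-tracking, and ID switches. Its calculation is shown in Equation (19):
$$\mathrm{MOTA} = 1 - \frac{\sum_{t}\left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_{t}\mathrm{GT}_t} \tag{19}$$
where $\mathrm{FN}_t$, $\mathrm{FP}_t$, and $\mathrm{IDSW}_t$ represent the number of false negatives, false positives, and ID switches at frame $t$, respectively, and $\mathrm{GT}_t$ represents the total number of ground truth objects at frame $t$.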
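Equation (19) can be evaluated directly from per-frame counts, as in the following sketch. The tuple layout and the example counts are hypothetical.

```python
def mota(frames):
    """MOTA from per-frame (fn, fp, idsw, gt) counts."""
    frames = list(frames)
    errors = sum(fn + fp + idsw for fn, fp, idsw, _ in frames)
    total_gt = sum(gt for _, _, _, gt in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - errors / total_gt

# Two frames with 10 ground-truth vehicles each:
# frame 1 has one miss; frame 2 has one false positive and one ID switch.
score = mota([(1, 0, 0, 10), (0, 1, 1, 10)])
```

Because the errors are summed before dividing, MOTA can become negative when the tracker makes more errors than there are ground-truth objects.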
3) Real-time Performance (FPS)
Frames Per Second (FPS) is used to measure the speed performance of the model and is an important indicator of real-time performance. A high FPS value indicates that the system is suitable for practical applications.
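A simple way to measure FPS is to time the per-frame processing function over a sequence of frames, as sketched below. The warm-up count and the stand-in workload are assumptions; in the real system process_frame would run detection followed by filtering.

```python
import time

def measure_fps(process_frame, frames, warmup=5):
    """Average frames per second of process_frame over the given frames."""
    for frame in frames[:warmup]:      # warm-up runs are not timed
        process_frame(frame)
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

def dummy_pipeline(frame):
    """Stand-in workload so the sketch runs without a detector."""
    return sum(i * i for i in range(1000))

frames = list(range(200))
fps = measure_fps(dummy_pipeline, frames)
latency_ms = 1000.0 / fps              # mean per-frame latency in milliseconds
```

FPS and per-frame latency are reciprocal, so a pipeline running at 45 FPS spends about 22 ms on each frame.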
4.2. Experimental Parameters and Environment
The experiments in this study were conducted in a high-performance computing environment to ensure the real-time and reliable performance of vehicle detection and tracking tasks. The following details the software and hardware configurations of the experimental environment, as well as the specific hyperparameter settings for the YOLOv8 detection module and Kalman filter.
4.2.1. Software and Hardware Configuration
The software and hardware configurations used in the experiments are shown in Table 2:
Table 2. Software and hardware configuration table.
| Software/Hardware Component | Configuration Details |
| --- | --- |
| Operating System | Windows 11 |
| Graphics Card | NVIDIA RTX 3060 12 GB |
| Memory | 128 GB DDR4 |
| Python | 3.9 |
| PyTorch | 1.12.1 |
| CUDA | 11.6 |
The GPU (RTX 3060) supports rapid training and inference for YOLOv8, and the sufficient memory ensures the loading and processing of the dataset. The detection part of the experiment is implemented based on the PyTorch framework, while the Kalman filter and data processing modules utilize NumPy for matrix operations.
4.2.2. YOLOv8 Hyperparameter Settings
The hyperparameters of the YOLOv8 model have been fine-tuned to ensure a balance between detection accuracy and speed. The specific configurations are shown in Table 3:
Table 3. YOLOv8 hyperparameter configuration table.
| Parameter | Setting | Description |
| --- | --- | --- |
| Image Size | 640 × 640 | Adjust all input images to a uniform size. |
| Confidence Threshold | 0.5 | Filter out detection bounding boxes with low confidence scores. |
| NMS (Non-Maximum Suppression) IoU Threshold | 0.45 | Remove excessively overlapping detection bounding boxes. |
| Epochs | 50 | Number of iterations for model training. |
| Batch Size | 16 | Number of images input to the model per batch. |
| Learning Rate (Initial Value) | 0.01 | Dynamically adjust the learning rate using a cosine annealing strategy. |
| Optimizer | SGD | Optimize model parameters using stochastic gradient descent. |
| Data Augmentation | Enabled | Data augmentation techniques include random cropping, rotation, scaling, and color jittering. |
These hyperparameter settings have been optimized on the KITTI and UA-DETRAC datasets, effectively enhancing the detection performance of YOLOv8 in various scenarios.
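If the Ultralytics Python package is used (an assumption; the paper only states that detection is implemented in PyTorch), the settings in Table 3 map onto training and prediction arguments roughly as follows. The weights file, dataset YAML, and video source are left as caller-supplied parameters because the paper does not name them, and the augmentation magnitudes are omitted for the same reason.

```python
# Table 3 expressed as keyword arguments (argument names follow Ultralytics).
TRAIN_ARGS = {
    "imgsz": 640,         # image size
    "epochs": 50,         # training epochs
    "batch": 16,          # batch size
    "lr0": 0.01,          # initial learning rate
    "cos_lr": True,       # cosine annealing schedule
    "optimizer": "SGD",   # stochastic gradient descent
}
PREDICT_ARGS = {
    "imgsz": 640,
    "conf": 0.5,          # confidence threshold
    "iou": 0.45,          # NMS IoU threshold
}

def train_and_detect(weights, data_yaml, source):
    """Train with the Table 3 settings, then run detection on source."""
    # Imported lazily so the configuration can be inspected without the package.
    from ultralytics import YOLO
    model = YOLO(weights)
    model.train(data=data_yaml, **TRAIN_ARGS)
    return model.predict(source, **PREDICT_ARGS)
```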
4.2.3. Kalman Filter Hyperparameter Settings
To adapt to dynamic scenarios in vehicle tracking, the initialization parameters for various Kalman filters are shown in Table 4:
Table 4. Kalman filter hyperparameter configuration table.
| Filter Type | Process Noise Covariance (Q) | Measurement Noise Covariance (R) | Initial Error Covariance (P) | State Dimension | Observation Dimension |
| --- | --- | --- | --- | --- | --- |
| KF | 0.1I | 0.05I | 1.0I | 4 | 2 |
| EKF | 0.15I | 0.05I | 1.0I | 4 | 2 |
| UKF | 0.2I | 0.1I | 1.0I | 4 | 2 |
Where Q, R, and P represent the process noise covariance matrix, measurement noise covariance matrix, and initial error covariance matrix, respectively; I denotes the identity matrix.
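The initialization in Table 4 can be written as a small lookup, as sketched below. It assumes that Q and P are sized by the state dimension (4 × 4) and R by the observation dimension (2 × 2); the names are hypothetical.

```python
import numpy as np

STATE_DIM, OBS_DIM = 4, 2

# (Q scale, R scale, P scale) per filter type, copied from Table 4.
NOISE_SCALES = {
    "KF": (0.10, 0.05, 1.0),
    "EKF": (0.15, 0.05, 1.0),
    "UKF": (0.20, 0.10, 1.0),
}

def init_covariances(filter_type):
    """Return the initial (Q, R, P) matrices for the given filter type."""
    q, r, p = NOISE_SCALES[filter_type]
    return q * np.eye(STATE_DIM), r * np.eye(OBS_DIM), p * np.eye(STATE_DIM)

Q, R, P = init_covariances("UKF")
```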
The hardware and software environments of the experiment provide powerful computational support for large-scale detection and tracking tasks. The optimized hyperparameters of YOLOv8 enable it to be efficient and robust in multiple scenarios, while the reasonable initialization and dynamic adjustment strategies of the Kalman filter ensure the accuracy and real-time performance of the tracking module.
5. Experimental Results and Analysis
5.1. Comparative Analysis of Detection and Tracking
5.1.1. Analysis of Detection Performance
Vehicle detection serves as the foundational step in achieving multi-object tracking, with its detection accuracy directly impacting the predictive performance of subsequent filters. To evaluate the performance of the YOLOv8 detection model used in this paper, comprehensive comparative experiments were conducted with mainstream object detection models (such as YOLOv5, Faster R-CNN, and RetinaNet) across multiple typical scenarios (including daytime, nighttime, occlusion, low-light, etc.). Table 5 summarizes the performance of different detection models in these typical scenarios.
Table 5. Comparison of performance among different detection models.
| Models | Daytime mAP@0.5 | Nighttime mAP@0.5 | Occlusion mAP@0.5 | Low-light mAP@0.5 | FPS | Loss |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv8 | 94.3% | 90.1% | 87.8% | 84.5% | 45 | 0.43 |
| YOLOv5 | 91.2% | 85.4% | 83.1% | 79.8% | 35 | 0.57 |
| Faster R-CNN | 88.5% | 83.2% | 79.5% | 75.4% | 15 | 0.62 |
| RetinaNet | 89.1% | 82.8% | 80.3% | 77.0% | 20 | 0.59 |
From the comparison of mean Average Precision (mAP) at IoU threshold of 0.5, it can be seen that YOLOv8 outperforms other models in all scenarios, particularly in nighttime and occlusion scenarios, achieving mAP values of 90.1% and 87.8%, respectively.
The final loss value of YOLOv8 is 0.43, which is approximately 25% lower than that of YOLOv5 (0.57), 27% lower than that of RetinaNet (0.59), and 31% lower than that of Faster R-CNN (0.62). These results support the selection of YOLOv8 as the detection module in this paper, providing robust support for subsequent tracking tasks.
5.1.2. Comparative Analysis of Kalman Filtering
Table 6 presents the performance of the SVM-based adaptive filter selection mechanism across different scenarios. The experimental results indicate that the adaptive selection mechanism significantly outperforms fixed filter schemes in terms of tracking accuracy and robustness in complex scenarios such as accelerated motion and nonlinear motion.
Table 6. Performance of the SVM-based adaptive filter selection mechanism.
| Scene Types | Filter Selection | Tracking Accuracy (MOTA) | Mean Squared Error (MSE) | Computational Latency (ms) |
| --- | --- | --- | --- | --- |
| Uniform Motion | KF (Kalman Filter) | 89% | 1.5 | 12 |
| Accelerated Motion | EKF (Extended Kalman Filter) | 92% | 1.2 | 14 |
| Complex Nonlinear Motion | UKF (Unscented Kalman Filter) | 94% | 0.9 | 20 |
| SVM-Based Adaptive Selection | Dynamic Switching | 95% | 0.8 | 15 |
5.2. Effectiveness of Error Feedback Mechanism
5.2.1. Comparison of Tracking Accuracy
In scenarios involving occlusion, sudden motion, and multi-object environments, the error feedback + SVM dynamic selection mechanism significantly improves tracking accuracy compared to other methods. Table 7 compares the Mean Squared Error (MSE), trajectory smoothness, and computational latency of each method across different scenarios.
Table 7. Performance of the error feedback + SVM dynamic selection mechanism.
| Scene Types | Methods | MSE | Trajectory Smoothness | Computational Latency (ms) |
| --- | --- | --- | --- | --- |
| Occlusion Scenarios | Single KF | 15.2 | Low | 10 |
| Occlusion Scenarios | Single EKF | 12.4 | Medium | 12 |
| Occlusion Scenarios | Single UKF | 11.5 | Medium | 20 |
| Occlusion Scenarios | SVM Dynamic Selection Mechanism | 9.7 | High | 18 |
| Occlusion Scenarios | Error Feedback + SVM Dynamic Selection Mechanism | 7.1 | High | 20 |
| Sudden Motion Scenarios | Single KF | 18.7 | Low | 10 |
| Sudden Motion Scenarios | Single EKF | 13.2 | Medium | 15 |
| Sudden Motion Scenarios | Single UKF | 11.0 | Medium | 20 |
| Sudden Motion Scenarios | SVM Dynamic Selection Mechanism | 9.2 | High | 18 |
| Sudden Motion Scenarios | Error Feedback + SVM Dynamic Selection Mechanism | 6.9 | High | 20 |
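As shown in Table 7, the error feedback + SVM dynamic selection mechanism achieves the lowest MSE in both scenarios: 7.1 under occlusion and 6.9 under sudden motion, which is about 27% and 25% lower than the standalone SVM dynamic selection mechanism (9.7 and 9.2) and about 53% and 63% lower than a single KF (15.2 and 18.7). Trajectory smoothness remains high. This accuracy gain comes at a modest cost in latency: at 20 ms, the combined mechanism matches a single UKF and is slightly slower than the standalone SVM dynamic selection mechanism (18 ms).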
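The relative MSE reductions implied by Table 7 can be checked with a few lines of arithmetic. The values below are copied from Table 7; the dictionary keys and the helper name are hypothetical.

```python
# MSE values copied from Table 7.
MSE = {
    "occlusion": {"KF": 15.2, "EKF": 12.4, "UKF": 11.5, "SVM": 9.7, "EF+SVM": 7.1},
    "sudden": {"KF": 18.7, "EKF": 13.2, "UKF": 11.0, "SVM": 9.2, "EF+SVM": 6.9},
}

def reduction_pct(baseline, value):
    """Relative reduction of value with respect to baseline, in percent."""
    return 100.0 * (baseline - value) / baseline

summary = {
    scene: {method: round(reduction_pct(mse, rows["EF+SVM"]), 1)
            for method, mse in rows.items() if method != "EF+SVM"}
    for scene, rows in MSE.items()
}
```

For example, the combined mechanism lowers MSE by 26.8% relative to standalone SVM selection under occlusion and by 25.0% under sudden motion.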
5.2.2. Analysis of Interaction Strategies between Error Feedback and SVM
To further assess how error feedback should interact with SVM dynamic selection, two error feedback strategies are compared across multiple scenarios, as shown in Table 8:
1) Error Feedback Standalone Adjustment Strategy: Error feedback only adjusts the parameters of the currently selected filter without triggering a re-selection.
2) Error Feedback + Filter Re-selection Strategy: Error feedback not only adjusts the parameters but also triggers a re-selection of the filter.
Table 8. Comparative experiments of different strategies.
Scene Types | Strategies | MSE | Switching Frequency (times/second)
Occlusion Scenarios | Standalone Adjustment Strategy | 8.8 | 2
Occlusion Scenarios | Filter Re-selection Strategy | 7.1 | 3
Sudden Motion Scenarios | Standalone Adjustment Strategy | 8.5 | 3
Sudden Motion Scenarios | Filter Re-selection Strategy | 6.9 | 4
The performance advantages of SVM in dynamic filter selection primarily stem from the following aspects:
a) Discriminative power of motion features: Experiments have demonstrated that velocity, acceleration, and direction change angles exhibit significant distribution differences across different motion patterns, providing a clear basis for filter classification.
b) Nonlinear classification capability: Unlike a purely linear classifier, an SVM with a kernel function can separate nonlinear patterns by mapping the features into a higher-dimensional space, enabling more precise selection among KF, EKF, and UKF.
c) Matching of errors with scenario characteristics: By incorporating an error feedback mechanism, SVM dynamically adjusts selection strategies based on real-time error variations, further enhancing the system’s adaptability.
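The motion features named in item a) can be computed from a short history of bounding-box centers. The sketch below is one plausible way to do so with finite differences; the function name `motion_features`, the use of the last three centers, and the frame interval `dt` are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def motion_features(track: List[Point], dt: float = 1.0) -> Tuple[float, float, float]:
    """Return (speed, acceleration, direction change in degrees).

    `track` holds bounding-box centers at consecutive frames (at least three).
    All quantities are finite-difference estimates over the last three points.
    """
    if len(track) < 3:
        raise ValueError("need at least three points")
    (x0, y0), (x1, y1), (x2, y2) = track[-3:]

    # Velocity vectors over the two most recent frame intervals.
    v_prev = ((x1 - x0) / dt, (y1 - y0) / dt)
    v_curr = ((x2 - x1) / dt, (y2 - y1) / dt)

    speed_prev = math.hypot(*v_prev)
    speed = math.hypot(*v_curr)

    # Scalar acceleration: change in speed per unit time.
    accel = (speed - speed_prev) / dt

    # Signed angle between the two velocity vectors.
    cross = v_prev[0] * v_curr[1] - v_prev[1] * v_curr[0]
    dot = v_prev[0] * v_curr[0] + v_prev[1] * v_curr[1]
    turn_deg = math.degrees(math.atan2(cross, dot))

    return speed, accel, turn_deg


print(motion_features([(0, 0), (1, 0), (2, 0)]))  # constant-velocity motion
print(motion_features([(0, 0), (2, 0), (3, 0)]))  # braking
print(motion_features([(0, 0), (1, 0), (1, 1)]))  # right-angle turn
```

A feature vector of this form is what a classifier such as the SVM would receive as input; the three example tracks differ in exactly one feature each, which illustrates why such features can separate motion patterns.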
5.3. Robustness of Adaptive Kalman Filtering
To validate the robustness of the SVM-based adaptive Kalman filtering, this paper analyzes the visualization results of tracking and recognition in various complex environments, as shown in Figure 4.
(a) The four frames in row (a) illustrate the tracking and recognition performance of the model for a target vehicle with ID 17 in a turning scenario. It can be observed that the model demonstrates good performance.
(b) The four frames in row (b) show the tracking and recognition performance of the model for a target vehicle with ID 1 at different angles during nighttime. It is evident that the model maintains good locking on the target vehicle, even during the process of being overtaken.
(c) The four frames in row (c) present the tracking and recognition performance of the model for a target vehicle with ID 1 across multiple scenarios on a road segment. It is clear that the model can effectively identify and track the target vehicle even in multi-target and complex scenarios.
From this, it can be concluded that the SVM-based adaptive Kalman filtering exhibits strong robustness in complex environments. It can provide decision support for intelligent transportation, autonomous driving, security, and other fields.
Figure 4. Visualization of inference results.
5.4. Comparison with Existing Methods
Experimental comparisons with current mainstream vehicle tracking methods such as DeepSORT and FAIR MOT show that the proposed method has advantages in detection accuracy, tracking accuracy, and system stability.
As shown in Table 9, on the KITTI dataset the YOLOv8 detection module combined with the dynamic filter selection mechanism achieves 94.3% mAP, compared with 91.5% for DeepSORT and 89.2% for FAIR MOT, indicating stronger adaptability to complex scenarios. The proposed method also runs at 45 FPS, against 35 FPS for DeepSORT and 30 FPS for FAIR MOT, so it improves processing speed while meeting accuracy requirements, which makes it suitable for practical deployment. The same ordering holds on UA-DETRAC, where the proposed method reaches 92.1% mAP, 89.7% MOTA, and 40 FPS.
Table 9. Comparison of experimental results.
Datasets | Methods | mAP | MOTA | FPS
KITTI | DeepSORT | 91.5% | 87.5% | 35
KITTI | FAIR MOT | 89.2% | 88.3% | 30
KITTI | Ours | 94.3% | 90.6% | 45
UA-DETRAC | DeepSORT | 90.2% | 85.4% | 33
UA-DETRAC | FAIR MOT | 88.1% | 86.1% | 28
UA-DETRAC | Ours | 92.1% | 89.7% | 40
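For reference, the MOTA column in Table 9 is assumed here to follow the standard CLEAR MOT definition, which the section itself does not restate. Under that assumption, with $\mathrm{FN}_t$, $\mathrm{FP}_t$, and $\mathrm{IDSW}_t$ denoting the missed targets, false positives, and identity switches in frame $t$, and $\mathrm{GT}_t$ the number of ground-truth objects in that frame:

```latex
\mathrm{MOTA} = 1 - \frac{\sum_{t} \left( \mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t \right)}{\sum_{t} \mathrm{GT}_t}
```

Read this way, a MOTA of 90.6% means that the three error types together amount to 9.4% of all ground-truth object instances over the sequence.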
6. Conclusions and Future Work
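This paper proposes an SVM-based adaptive filter selection mechanism and an error feedback mechanism to address the challenges of vehicle tracking in dynamic and complex scenarios. On the KITTI dataset, the resulting system achieves 94.3% mAP and 90.6% MOTA at 45 FPS, and on UA-DETRAC it achieves 92.1% mAP and 89.7% MOTA at 40 FPS.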
Although the integration of the components (YOLOv8, KF variants, SVM dynamic selection, and error feedback mechanism) represents incremental innovation, it yields the following significant system-level advantages:
1) Enhanced adaptability in complex dynamic scenarios, reducing accuracy degradation due to changing motion patterns through dynamic filter selection.
2) Improved coordination between detection and prediction, with the error feedback mechanism ensuring rapid system response to unexpected events.
3) Simplified complexity in multi-module development through deep integration, while optimizing both real-time performance and resource utilization.
Future research will focus on two directions. The first is multimodal data fusion, incorporating sensor data from LiDAR and mmWave radar to enhance system applicability in adverse weather conditions. The second is large-scale deployment, exploring distributed computing methods to meet the real-time demands of large-scale traffic monitoring systems.