1. Introduction
Dust storms are hazardous weather phenomena that occur under specific geographical and surface conditions, where strong winds lift large quantities of dust particles from the ground into the atmosphere, resulting in turbid air and a significant reduction in horizontal visibility [1] [2]. As a typical meteorological disaster in arid, semi-arid, and desert regions, dust storms are characterized by their sudden onset, short duration, and wide spatial impact [3] [4]. Globally, the main dust storm-prone regions include Central Asia, North America, Central Africa, and Australia [5]. Northern China, as part of the Central Asian dust source region, plays a significant role in contributing dust particles that can be transported over long distances, exerting considerable influence on both global and regional atmospheric environments and climate change [6] [7]. Therefore, the timely and accurate monitoring of the intensity and spatiotemporal evolution of dust storms is of great importance to fields such as environmental protection and meteorology [8]. During dust storm events, the concentration of PM₁₀ in the atmosphere increases sharply, making it a reliable indicator for dust storm monitoring [9]. For instance, during the dust storm that occurred in northern China in March 2021, the PM₁₀ concentration in the Beijing area peaked at over 7000 μg/m3.
At present, the monitoring of dust storms mainly relies on ground-based observations and satellite remote sensing. However, both approaches have certain limitations [10] [11]. Traditional ground-based observations, such as meteorological monitoring stations, can provide high temporal resolution data, but they are associated with high maintenance costs, limited spatial coverage, and sparse station distribution, making it difficult to monitor large-scale dust storm events effectively [12] [13].
Satellite remote sensing technologies using visible light, infrared, and laser are currently the main methods for dust storm monitoring and can achieve large-area detection [14] [15]. Nevertheless, these remote sensing products typically suffer from low temporal and spatial resolution, and their performance is easily affected by cloud cover and surface conditions, which limits their capacity for dynamic, real-time monitoring and forecasting of dust storms [16] [17]. In recent years, with the rapid development of the Global Navigation Satellite System (GNSS) and in-depth research on GNSS signal characteristics [18], GNSS-based environmental parameter sensing has emerged as a novel remote sensing application. It offers advantages such as multiple signal sources, all-time and all-weather operation, wide coverage, and high efficiency. GNSS has been widely applied in meteorological fields such as water vapor retrieval [19] [20], haze detection [21] [22], and wildfire monitoring [23] [24], with notable success. Researchers have also begun introducing this technology into the study of dust storm monitoring [25] [26].
In the application of GNSS technology to dust storm monitoring research, scholars have made a series of significant advances. Studies have found that the sharp increase in atmospheric particulate matter concentration during dust storms causes notable changes in GNSS tropospheric delay, providing a theoretical basis for using GNSS technology to monitor dust storms [9] [27]. Zhou et al. proposed a ZNHD residual monitoring method based on Singular Spectrum Analysis (SSA), which extracts dust storm features by decomposing high-frequency signals. Their research showed that during dust storms, the correlation coefficient between PM₁₀ and the ZNHD residual can exceed 0.7, and when the residual peak surpasses 80 mm, it can serve as an early warning indicator for dust storms [9]. Han S and Gurbuz G, through linear regression analysis, found that under clear sky conditions, the correlation coefficient between GNSS precipitable water vapor (PWV) and PM₁₀ is weak (r < 0.2), whereas during dust storms, the correlation significantly increases to above 0.6, indicating that variations in particulate matter concentrations affect GNSS retrieval parameters [28] [29]. To eliminate water vapor interference, Xie Peng innovatively proposed the GNSS PWV difference method (ΔGNSS PWV), which effectively separates dust contributions by comparing GNSS-retrieved PWV with ERA5 model PWV, increasing the correlation between particulate matter concentration and PWV difference to over 70% [26]. Subsequently, Tokunaga R and colleagues applied GNSS Precise Point Positioning (PPP) technology to dust storm monitoring and introduced azimuth filtering to optimize the Zenith Total Delay (ZTD) estimation process. Results demonstrated that this improved method enhanced the correlation between additional delay and suspended particulate matter data, effectively highlighting the particulate contribution to ZTD and improving the reliability of dust storm monitoring [30]. It is evident that researchers have developed a series of effective methods for monitoring dust storm events by modeling the relationship between GNSS retrieval parameters and relevant air quality indicators, thereby demonstrating the feasibility of using GNSS technology for dust storm monitoring. However, existing studies mostly focus on correlation analyses between PM10 concentrations and GNSS tropospheric delay or atmospheric water vapor (PWV), with few advances in developing further models for monitoring and forecasting PM10 concentrations.
Therefore, in response to the dust storm event that occurred in northern China on March 15, 2021, this study proposes a PM10 concentration retrieval method that integrates ground-based GNSS ZNHD, meteorological factors, and PM2.5 concentration. The core concept of this study is to calculate the Zenith Non-Hydrostatic Delay (ZNHD) using GNSS satellite data. Then, the ZNHD sequence is adaptively decomposed and reconstructed using Ensemble Empirical Mode Decomposition (EEMD) to extract signal components more closely related to dust storms. Finally, using multi-source data with a 1-hour resolution―including PM10, PM2.5, ZNHD, temperature, atmospheric pressure, and relative humidity―a rolling estimation model for short-term PM10 concentration retrieval is established based on nonlinear least squares support vector machine regression (LS-SVM).
The remainder of this paper is organized as follows. Section 2 provides a detailed introduction to the GNSS stations and data. Section 3 elaborates on the methodological principles. Subsection 3.1 describes the process of retrieving ZNHD based on GNSS data, while subsection 3.2 explains the separation and reconstruction of ZNHD using the EEMD method. Subsection 3.3 details the design and application of the proposed model. Section 4 outlines the procedure for retrieving PM10 concentrations using multi-source data. Section 5 presents and discusses the retrieval results. Finally, Section 6 offers conclusions and recommendations.
2. Data
This study uses GNSS observation data, basic observation data from ground meteorological stations, the fifth-generation European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis dataset (ERA5), and PM10 and PM2.5 air quality data as the experimental data.
The GNSS observation data were provided by the International GNSS Service (IGS) Data Center at Wuhan University (http://www.igs.gnsswhu.cn/) [31]. This study selected two GNSS stations, BJFS and CHAN, whose station information is shown in Table 1. The BJFS and CHAN monitoring stations are located in Fangshan District, Beijing, and Changchun City, Jilin Province, respectively. These areas were severely affected by the dust storm in March 2021.
![]()
Table 1. GNSS station related parameters.
The global air quality data platform (https://aqicn.org/data-platform/register/) provided ground monitoring data of PM10 and PM2.5 for the areas around the two GNSS stations from March 1 to March 31, 2021.
The ERA5 meteorological reanalysis data can be accessed on the official ECMWF website (https://www.ecmwf.int). ERA5 is the latest generation of meteorological reanalysis data released by ECMWF, providing various weather parameters including hourly temperature, dew point temperature, and others [32].
3. Method
3.1. ZNHD Solving Based on GNSS Data
The retrieval process of ZNHD serves as the foundation for subsequent experiments. This study employs the GNSS data processing software GAMIT/GLOBK 10.7 to estimate ZNHD. When processing data with GAMIT/GLOBK 10.7, the parameter settings are as follows [9] [33]: the cutoff elevation angle is set to 10°, and the sampling interval is set to 30 seconds. The tropospheric delay model and mapping function are set to the Saastamoinen model and VMF1 model, respectively. The ocean tide model used is otl.grid. ZNHD is estimated once per hour, with the main process described as follows.
First, the hourly values of the tropospheric Zenith Total Delay (ZTD) are retrieved using the original GNSS observation data. The main principle is to solve for the tropospheric total delay (Zenith Total Delay, ZTD) as an unknown parameter by employing the GNSS carrier phase observation equation. The GNSS phase observation equation is:
(1)
In the equation,
denotes the receiver;
denotes the satellite;
is the geometric distance from the receiver to the satellite;
is the speed of light;
is the receiver clock error;
is the satellite clock error;
is the tropospheric mapping function;
represents the ionospheric delay;
is the wavelength of the phase observation;
is the phase ambiguity; and
represents other errors.
Then, the Zenith Hydrostatic Delay (ZHD) is calculated using the classical Saastamoinen model [33]. This model comprehensively accounts for the effects of station pressure and geographic location, offering high theoretical accuracy. The equation is as follows:
(2)
In the equation,
represents the atmospheric pressure;
is the latitude of the GNSS station;
denotes the height of the GNSS station.
Finally, ZNHD can be obtained by subtracting the Zenith Hydrostatic Delay from the tropospheric Zenith Total Delay. The equation is:
(3)
3.2. EEMD Separation and Reconstruction of ZNHD Signals
Ensemble Empirical Mode Decomposition (EEMD) is a noise-assisted adaptive time-frequency analysis method suitable for the decomposition of nonlinear and non-stationary signals [34] [35]. Compared with traditional Empirical Mode Decomposition (EMD), EEMD effectively suppresses the mode mixing problem and enhances decomposition robustness by performing EMD decomposition multiple times with added Gaussian white noise and then averaging the results. This method provides a multi-scale analysis framework for examining anomalies in non-hydrostatic delay caused by dust storms. In this study, EEMD is applied to decompose the ZNHD time series retrieved from GNSS observations. By comparing the correlation between different Intrinsic Mode Function (IMF) components and PM10 concentration, dust storm-related modal components are extracted to construct the optimal signal combination. The specific steps are as follows:
Let the original ZNHD time series be denoted as
:
(4)
In the equation,
represents the observation time.
The basic idea of EEMD is to repeatedly add Gaussian white noise to the original signal
, and decompose each noise-perturbed signal into mm Intrinsic Mode Functions (IMFs) with different time scales and a residual term
[34]. Each IMF must satisfy the following two conditions: 1) throughout the entire sequence, the number of extrema and the number of zero crossings must either be equal or differ at most by one; 2) the mean of the envelope defined by the local maxima and the envelope defined by the local minima is zero [34] [35]. Finally, by ensemble averaging the IMF components obtained from all noise perturbations, stable IMF components are derived, effectively suppressing the mode mixing problem.
The decomposition process of
can be expressed as:
(5)
In the equation,
is the residual term of the series, representing the low-frequency trend of the signal;
denotes the Intrinsic Mode Function components, which reflect the inherent and intrinsic characteristics of the signal.
It can be seen that the original signal
is decomposed into a finite number of IMFs containing the local characteristic signals of the original signal. The first IMF component represents the highest frequency component in the original signal
. As the IMF order increases, the corresponding frequency components gradually decrease [35].
3.3. LS-SVM Regression Model
Least Squares Support Vector Machine (LS-SVM) is a machine learning method based on the principle of structural risk minimization. By transforming the quadratic programming problem of traditional Support Vector Machines (SVM) into a set of linear equations, LS-SVM significantly improves computational efficiency [36]. This method uses kernel functions to map nonlinear samples from a low-dimensional space to a high-dimensional feature space for linear regression. It effectively addresses problems involving small samples, nonlinearity, and high-dimensional pattern recognition, demonstrating strong generalization ability and is particularly suitable for modeling complex nonlinear systems. Therefore, to achieve short-term retrieval monitoring of PM₁₀ concentration, this study proposes a rolling retrieval model based on LS-SVM. The model takes GNSS ZNHD reconstructed components, ground meteorological observation data (such as relative humidity, atmospheric pressure, temperature), and PM2.5 concentration as independent variables, with PM₁₀ concentration as the dependent variable. By effectively integrating multi-source data through LS-SVM and employing a rolling window to continuously update the regression model, the approach adapts to the time-varying and nonlinear characteristics of PM₁₀ concentration changes.
Let the training samples be
, where
represents the m dimensional input variables, and
is the corresponding PM₁₀ concentration value. The LS-SVM regression problem is then formulated as the following optimization problem:
(6)
In the equation,
is the weight vector,
is the bias term,
represents the error term,
denotes the nonlinear mapping function, and
is the regularization parameter.
The fundamental idea behind LS-SVM regression estimation is to use a kernel function to map nonlinear samples
from a low-dimensional space to a high-dimensional feature space, and then perform linear regression in that space. For detailed principles, refer to reference [36]. Considering that the performance of LS-SVM largely depends on the optimal selection of the kernel function ϕ(⋅), the kernel parameter σ, and the regularization parameter
, this study defines the Gaussian kernel function (Radial Basis Function, RBF), which satisfies Mercer’s condition and effectively reflects the model complexity, as the kernel function for LS-SVM. Meanwhile, σ and
are optimized using a grid search method [37]. By solving the LS-SVM, the retrieval expression for PM₁₀ concentration at a given point is obtained as:
(7)
In the equation,
is the Radial Basis Function (RBF) kernel, where
alpha and
are the optimization variables.
To dynamically adapt to the time-varying characteristics during atmospheric evolution, the model introduces a rolling window mechanism. The specific setup is as follows: the training window is defined as
, and the test window is
. In each iteration, the LS-SVM model is constructed using the most recent training window to predict the PM₁₀ concentrations within the subsequent test window. After prediction, both the training and test windows slide forward by one hour, and the process is repeated until the entire data period is covered.
4. PM₁₀ Retrieval
According to the method described in Section 3, the PM₁₀ concentration retrieval process is illustrated in Figure 1.
5. Results and Discussion
5.1. Analysis of ZNHD and PM10
Firstly, the variation trends and correlations between ZNHD and PM2.5/PM₁₀ before, during, and after the sandstorm event from March 1 to 31, 2021, were analyzed for the BJFS and CHAN stations. The results are shown in Figure 2 and Figure 3.
As shown in Figure 2, during the sandstorm event, the PM10 concentration in Beijing exceeded 7000 μg/m3, which was several tens of times higher than normal
Figure 1. The workflow of PM₁₀ concentration retrieval.
![]()
Figure 2. Variation of GNSS ZNHD and PM10 at BJFS station.
![]()
Figure 3. Variation of GNSS ZNHD and PM10 at CHAN station.
![]()
Table 2. Correlation between ZNHD and PM10 in three periods.
levels. From March 1 to March 14, the trends of ZNHD and PM10 at the BJFS station were relatively consistent. When the sandstorm occurred from March 15 to March 16, the consistency between ZNHD and PM10 increased significantly. From March 16 to March 20, after the sandstorm, the variations in ZNHD and PM10 remained closely aligned. As shown in Figure 3, during the sandstorm, the PM10 concentration in Changchun reached over 800 μg/m3, about seven times higher than normal. From March 1 to March 14, the ZNHD and PM10 trends at the Changchun station were relatively consistent. When the sandstorm occurred from March 15 to March 16, ZNHD and PM10 showed strong consistency in their variations. From March 16 to March 20, after the sandstorm, the trends of ZNHD and PM10 remained highly consistent. In addition, beginning on March 28, a second sandstorm in March affected most parts of northwestern and northern China, resulting in blowing dust conditions in both Beijing and Changchun. This event had a noticeable impact on GNSS ZNHD retrievals and also led to increased PM10 concentrations.
To better illustrate the relationship between ZNHD and PM10, we calculated the correlation between ZNHD and PM10 before, during, and after the sandstorm. The results are shown in Table 2. According to Table 2, before the sandstorm, the correlation between ZNHD and PM10 at the two GNSS stations was low. This is mainly because the tropospheric delay caused by PM10 contributed little to ZNHD, which was primarily influenced by water vapor. During the sandstorm, the correlation between ZNHD and PM10 increased significantly, with the proportion of tropospheric delay caused by PM10 in ZNHD noticeably higher than before the sandstorm. After the sandstorm, the correlation between PM10 and ZNHD decreased significantly. This is similar to the situation before the sandstorm, as PM10 concentrations dropped and ZNHD variations were mainly caused by water vapor.
In summary, sandstorms have a significant impact on PM10, making PM10 a useful indicator for assessing dust weather. During sandstorms, the variation trends of GNSS ZNHD and PM10 show high consistency, which can indirectly reflect sandstorm events.
5.2. ZNHD Reconstruction Based on EEMD
To use GNSS more effectively and accurately for monitoring sandstorms, this study applies EEMD to decompose and reconstruct the ZNHD signal sequence. This approach aims to obtain more effective ZNHD signal components for subsequent modeling experiments. First, the ZNHD time series is decomposed by EEMD. Then, different combinations of IMF components are formed. Based on the linear regression relationship between each IMF combination and PM10 concentration, the IMF combinations with higher correlation are selected as the effective ZNHD component combinations.
Due to limited space, only partial experimental results are presented. The following figures show the experimental results from two stations during the sandstorm and its adjacent periods in March 2021. As shown in Figure 4, EEMD can adaptively decompose the original ZNHD into a series of IMF components with different frequencies. With increasing decomposition levels, the IMF components from IMF1 to residual gradually become smoother, with frequencies decreasing sequentially. Further correlation analysis between each component and PM10
(a)
(b)
Figure 4. EEMD decomposition results of GNSS ZNHD. (a) BJFS; (b) CHAN.
![]()
![]()
Figure 5. Correlation between EEMD-decomposed GNSS ZNHD IMF components and PM10 concentration. Left: BJFS station; Right: CHAN station.
concentration is shown in Figure 5. The correlation coefficients of low-frequency components are generally higher than those of high-frequency components, because high-frequency components typically reflect detailed fluctuations in the signal, while low-frequency components reflect the overall trend. The reconstructed signal combinations selected during the experiment are shown in Figure 6. The correlation coefficients between the selected combinations and PM10 concentration improved compared to the original signal, and the method effectively captured the abnormal variations during the sandstorm. This demonstrates that the decomposition and reconstruction method used in this study is effective.
(a)
(b)
Figure 6. Comparison between reconstructed ZNHD signals and PM10 concentration. (a) BJFS; (b) CHAN.
5.3. Retrieval Results Based on LS-SVM
After completing the decomposition and reconstruction of the ZNHD signal, the LS-SVM model enters the processing stage. Following the experimental procedure described in Section 3.3, the model input parameters are selected as: reconstructed ZNHD signal, PM2.5, temperature, humidity, wind speed, and wind direction; the output parameter is PM10. This study builds LS-SVM models based on three different schemes, with detailed settings shown in Table 3. Since PM2.5 is an internal factor influencing PM10 concentration changes, it is included in all schemes. During model construction, the rolling method requires defining the training sample window. To balance real-time performance and daily-scale characteristics, at least 24 hours of sample data are needed to describe the overall features of PM10 and its influencing factors. Meanwhile, the stability and reliability of correlations tend to decrease as the time window lengthens. Therefore, in this study, the training sample window is set to 24 hours. After training is completed, the test set is input into the model for PM10 retrieval. Figure 7 and Figure 8 show the comparison between
(a)
(b)
(c)
Figure 7. Comparison between retrieved and reference PM10 values at the BJFS station. (a) scheme 1; (b) scheme 2; (c) scheme 3.
(a)
(b)
(c)
Figure 8. Comparison between retrieved and reference PM10 values at the CHAN station. (a) scheme 1; (b) scheme 2; (c) scheme 3.
![]()
Table 4. Comparison between retrieved and reference PM10 values at the BJFS station.
![]()
Table 5. Comparison between retrieved and reference PM10 values at the CHAN station.
retrieved PM10 and reference values for each scheme at different stations. Table 4 and Table 5 list the correlation coefficients between retrieved and reference PM10 concentrations for different schemes at the respective stations.
Based on Figure 7 and Figure 8, it can be seen that the retrieved values at both stations can reflect the sudden increase of PM10 concentration during the sandstorm to some extent. The predicted values show a trend consistent with the actual values, but the retrieval accuracy gradually decreases over time. Combined with the analysis of Table 4 and Table 5, the correlation coefficients between retrieved and actual values gradually decline over time, from strong correlation to no correlation, which corresponds to the retrieval results shown in Figure 7 and Figure 8. Specifically, at the BJFS station, Scheme 1 achieves the highest correlation coefficient of 0.814 at the 1-hour retrieval, indicating that including GNSS ZNHD in PM10 retrieval modeling is effective. At the CHAN station, Scheme 3 achieves the highest correlation coefficient of 0.768 at the 1-hour retrieval, demonstrating that incorporating multi-source data fusion is also effective and can improve the retrieval results to better match the actual trend. Moreover, as the prediction time increases, the improvement effect gradually diminishes. For both stations, the correlation coefficients of retrieved values in all schemes remain above 0.5 during the first 2 hours, indicating a good consistency between predicted and actual trends. However, after 3 hours, the correlation decreases gradually, showing weak or no correlation, which suggests a significant inconsistency between the development trends.
In summary, incorporating ZNHD significantly improves the overall retrieval performance of the model. It is feasible to include ZNHD as an independent variable, in addition to meteorological factors, for short-term PM10 concentration prediction.
6. Conclusions
This study is based on GNSS observation data provided by IGS stations during the 2021 sandstorm events in northern China. High-precision Zenith Tropospheric Wet Delay (ZNHD) data were calculated and analyzed for correlation with PM10 concentrations in the corresponding regions to establish a new method for dust monitoring. The study employed Ensemble Empirical Mode Decomposition (EEMD) to decompose and reconstruct the ZNHD signal, extracting characteristic components related to dust weather. A Least Squares Support Vector Machine (LS-SVM) rolling model was then constructed to retrieve PM10 concentrations. The results show that:
Here is the English translation of the four points:
1) During sandstorm events, there is a significant correlation between PM10 concentration and GNSS ZNHD, indicating that ZNHD is an effective indicator for dust monitoring.
2) The EEMD method can effectively separate dust-related components from the ZNHD signal, and the reconstructed signal shows an improved correlation with PM10 concentration.
3) The LS-SVM model based on GNSS ZNHD performs well in the short-term retrieval of PM10 concentration during sandstorms, validating the feasibility of applying GNSS technology to dust monitoring.
4) The study finds that the relationships among GNSS ZNHD, meteorological factors, and PM10 concentration are highly complex and unstable, requiring more refined models to capture their nonlinear interactions.
This study combines high-precision, real-time, all-weather GNSS observation data with meteorological and air quality data to provide a new method for short-term PM10 concentration retrieval monitoring with high temporal resolution. Due to the limited number of stations selected in the experiment, the data coverage and time span are constrained, and the regional representativeness of the model needs further validation. Future research is recommended to: 1) expand the station network coverage to achieve regional-scale PM10 retrieval monitoring; 2) incorporate more environmental variables to improve model accuracy; 3) develop dynamic adaptive algorithms to extend the effective retrieval duration.
Funding
This work was supported by the Key Program of Joint Fund of the National Natural Science Foundation of China and Shandong Province under Grant U22A20586, the Natural Science Foundation of Shandong Province under Grant ZR2022MD015, the Fundamental Research Funds for the Central Universities under Grant 24CX02030A, the National Natural Science Foundation of China under Grant 41701513, 61371189, and 41772350, and the Key Research and Development Program of Shandong Province under Grant 2019GGX101033.
Acknowledgements
We sincerely thank the International GNSS Service (IGS) data center of Wuhan University for providing GNSS data. We are also grateful to the Massachusetts Institute of Technology (MIT), Scripps Institution of Oceanography (SIO), and Harvard University for providing the GAMIT/GLOBK software.