
Tropospheric ozone (O3) is one of the pollutants with a significant impact on human health. It can increase the frequency of asthma attacks and cause permanent lung damage and even death. Predicting its concentration levels is therefore important for planning atmospheric protection strategies. The aim of this study is to predict the daily mean O3 concentration one day ahead in the Grand Casablanca area of Morocco using primary pollutants and meteorological variables. Since the available explanatory variables are multicollinear, multiple linear regression is likely to lead to unstable models. To counteract the multicollinearity problem, we compared several alternative regression methods: 1) Continuum Regression; 2) Ridge and Lasso regressions; 3) Principal Components Regression (PCR); 4) Partial Least Squares (PLS) regression and sparse PLS; and 5) Biased Power Regression. These models are fitted on a training dataset (years 2013 and 2014), tested on observed data from 2015 and validated on forecast data from the same year. The Lasso model showed the best performance for the prediction of ozone concentrations compared to multiple linear regression and the other alternative methods.

Tropospheric ozone (O3) is a dangerous air pollutant that threatens human health [

Like all large cities in the world, Casablanca faces a serious photochemical tropospheric ozone (O3) air pollution problem. Urban O3 formation is driven by meteorological factors (sunshine exposure, temperature and wind speed) and by a series of atmospheric reactions involving precursor pollutants emitted by traffic and industry.

Various statistical methods are available to predict daily O3 [

In this study, we compare different regression models to predict the daily mean O3 concentration in the Grand Casablanca area using O3 persistence and meteorological variables. We proceed in two successive stages. In the first stage, we fit statistical models using two years (2013-2014, calibration sets) of pollutant and observed meteorological data. In the second stage, in order to choose the best predictive model, we compare the prediction abilities using the observed dataset of 2015 (test set) and the forecasted meteorological dataset of 2015 (prediction test set). The aim of the study is to select the best model in terms of prediction ability.

The three years of data used in this study were provided by the National Meteorological Office of Morocco (DMN). Their collection spans January 2013 to December 2015. A detailed description of the 25 available variables is given in Appendix A. The data consist of daily O3 pollutant concentrations observed at the “Jahid” monitoring site, located in the western center of Casablanca, the city's most important industrial area (

Inevitably, the collected data contain missing values, and it is important to tackle this problem before further analyses are performed. The reasons for which a value can be missing are numerous. For instance, in air quality applications, data can be missing due to a dysfunction of the equipment or an insufficient resolution of a sensor device. Therefore, it is necessary to identify missing values and choose an appropriate imputation technique in order to keep as much data as possible e.g. [

Once the data were imputed, we applied a standardized Principal Components Analysis (PCA) to investigate the relationships between the variables and assess the degree of collinearity among the predictor variables [

Several statistical models are available to predict tropospheric ozone concentration. Since the available explanatory variables are potentially highly correlated, we investigate alternative methods to the classical Multiple Linear Regression (MLR) that circumvent the problem of multicollinearity which is likely to lead to unstable models. The emphasis is put on: 1) classical regularized regression methods: Principal Components Regression [

We assume the MLR model given by Equation (1):

$\mathrm{O3}_i = \beta_0 + \beta_1\,\mathrm{O3}_{i-1} + \sum_{j=2}^{p} \beta_j\,\mathrm{varmeteo}_{ij} + e_i$ (1)

where $\mathrm{O3}_i$ is the ozone concentration at day $i$; $\mathrm{O3}_{i-1}$ the ozone concentration at day $i-1$ (i.e. the persistence); and $\mathrm{varmeteo}_{ij}$ the meteorological variable $j$ observed on day $i$.

Equation (1) can be written in the usual matrix format after centering the response variable:

$y = X\beta + e$ (2)

where $y$ is an $(n \times 1)$ vector of the centered dependent variable (O3 concentration at day $i$), $X$ is an $(n \times p)$ matrix of standardized predictors (observed meteorological variables and the O3 concentration at day $i-1$), $\beta$ is a $(p \times 1)$ vector of unknown regression coefficients and $e$ is an $(n \times 1)$ vector of random errors. Classically, $e$ is assumed to be normally distributed with mean $0$ and variance-covariance matrix $\sigma^2 I$, where $I$ is the identity matrix.

The usual unbiased Ordinary Least Squares (OLS) estimator is expressed by (3) [

$\hat{\beta}_{\mathrm{OLS}} = (X^T X)^{-1} X^T y$ (3)

The prediction of $y$ using OLS is given by $\hat{y}_{\mathrm{OLS}} = X \hat{\beta}_{\mathrm{OLS}}$. It is well known that this estimator is likely to lead to an unstable model and poor predictions in the presence of quasi-collinearity among the predictors, or in small-sample, high-dimensional settings.
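As a minimal illustration (on synthetic data, not the Casablanca dataset), the sketch below computes the closed-form OLS estimator of Equation (3) and shows how the conditioning of $X^T X$ degrades as soon as a nearly collinear column is appended, which is what destabilizes the solution:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Closed-form OLS estimator of Equation (3)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Appending a nearly collinear column makes X^T X ill-conditioned,
# so small perturbations of y produce large swings in the coefficients
X_col = np.column_stack([X, X[:, 0] + 1e-6 * rng.standard_normal(n)])
cond = np.linalg.cond(X_col.T @ X_col)
```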

The principal components regression (PCR) approach involves running PCA on the predictor variables and, thereafter, using the first $m$ principal components (PCs) $F_1, \cdots, F_m$, with $1 \le m \le p$, as the predictors in a linear regression model [

The appropriate number, m, of first principal components to be introduced in the model can be determined in practice by a validation technique such as Leave One Out cross validation (LOO). The PCR model can be written as in Equation (4):

$y = F_{(m)} \beta_{\mathrm{PCR}(m)} + e$ (4)

where $F_{(m)}$ is the matrix whose $m$ columns contain the first $m$ PCs. The Ordinary Least Squares (OLS) estimator of $\beta_{\mathrm{PCR}}$ is given by $\hat{\beta}_{\mathrm{PCR}} = (F_{(m)}^T F_{(m)})^{-1} F_{(m)}^T y$.

It is easy to express this model in terms of the original variables by noting that $F_{(m)} = X U_{(m)}$, where $U_{(m)}$ (with $U^T U = I_p$) contains the $m$ dominant normalised eigenvectors of $X^T X$. It follows that $\hat{\beta}_{\mathrm{PCR}} = U_{(m)}^T \hat{\beta}_{\mathrm{OLS}}$.

PCR gives a biased estimate of the regression coefficients. If all the PCs are included in the model, we retrieve the usual MLR estimator, β ^ OLS .
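A minimal PCR sketch (scikit-learn assumed, synthetic collinear data): the pipeline standardizes the predictors, projects on the first $m$ PCs and regresses $y$ on them, with $m$ selected by LOO cross-validation as described above:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 60
t = rng.standard_normal(n)                                 # one latent factor
# Three nearly collinear predictors driven by the same factor
X = np.column_stack([t + 0.01 * rng.standard_normal(n) for _ in range(3)])
y = 2.0 * t + 0.1 * rng.standard_normal(n)

# LOO cross-validation to choose the number m of principal components
scores = {}
for m in (1, 2, 3):
    pcr = make_pipeline(StandardScaler(), PCA(n_components=m), LinearRegression())
    mse = -cross_val_score(pcr, X, y, cv=LeaveOneOut(),
                           scoring="neg_mean_squared_error").mean()
    scores[m] = mse
best_m = min(scores, key=scores.get)
```

Here one component is enough to capture the latent factor, so larger $m$ mostly adds noise.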

As with PCR, PLS regression, introduced by [

The major difference between PCR and PLS regression is that, whereas PCR uses only X to construct the components to be used as regressors, PLS regression uses both X and y to determine these components. More precisely, the PLS components are determined sequentially and, at each step, we seek a new component, constrained to be orthogonal to the components determined at the previous stages, so as to maximize the covariance between this component and the dependent variable.

Suppose that $m$ PLS components are determined. Again, the number $m$ of latent variables to be introduced in the model can in practice be selected by means of the LOO cross-validation technique. These latent variables can be stacked into a matrix $T_{(m)} = X W_{(m)}$, where $W_{(m)} = (w_1, \cdots, w_m)$ is the X-weight matrix. Equation (5) gives the vector of fitted values obtained by regressing $y$ on $T_{(m)}$:

$\hat{y}_{\mathrm{PLS}_m} = T_{(m)} (T_{(m)}^T T_{(m)})^{-1} T_{(m)}^T y$ (5)

The PLS regression coefficients can be expressed as in Equation (6):

$\hat{\beta}_{\mathrm{PLS}} = W_{(m)} (P_{(m)}^T W_{(m)})^{-1} (T_{(m)}^T T_{(m)})^{-1} T_{(m)}^T y$ (6)

where $P_{(m)} = X^T T_{(m)} (T_{(m)}^T T_{(m)})^{-1}$ and the first column of $W_{(m)}$ is $w_1 = X^T y / \|X^T y\|$, $\|\cdot\|$ denoting the $L^2$ norm.

PLS regression is often helpful to reduce the number of predictors to a small number of latent variables constructed by linear combinations of the columns of original predictors. It yields a biased estimate of the regression coefficients.

The Sparse PLS method defined by [

In SPLS regression, the first weight vector w is sought as an optimal solution to:

$\max_{w} \; w^T M w \quad \text{subject to} \quad w^T w = 1, \; \|w\|_1 \le \eta,$

where $M = X^T y y^T X$, $\|w\|_1$ is the $L^1$-norm of the vector $w$, and $\eta > 0$ is a scalar which controls the degree of sparsity.

The regression coefficients of y on X are estimated in the following way: the coefficients of the non-selected variables are set to 0, and the coefficients of the selected variables are those obtained by means of the “standard” PLS regression. The SPLS regression coefficients can also be expressed as in (7) [

$(\hat{\beta}_{\mathrm{SPLS}})_j = \begin{cases} (\hat{\beta}_{\mathrm{PLS}})_j, & \text{if } w_j \ne 0 \text{ and } j = 1, \cdots, m \\ 0, & \text{otherwise} \end{cases}$ (7)

The interest of SPLS is twofold. On the one hand, thanks to sparsity, it yields an easy-to-interpret model; on the other hand, it avoids the multicollinearity problem by relying on the PLS framework. The SPLS estimator is biased compared to the OLS estimator. Moreover, SPLS is computationally efficient, with a tunable sparsity parameter to select the important variables.
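The sparsity mechanism can be sketched as follows. This is only an illustration of the $L^1$-constrained first weight vector via soft-thresholding of $X^T y$, not the full SPLS algorithm of the cited authors; the function name and the thresholding rule (a fraction `eta` of the largest absolute entry) are our own simplification:

```python
import numpy as np

def sparse_first_weight(X, y, eta):
    """Sketch of a sparse first weight vector: soft-threshold X^T y at a
    fraction `eta` of its largest absolute entry, then renormalise.
    Entries of weak predictors are set exactly to zero."""
    z = X.T @ y
    w = np.sign(z) * np.maximum(np.abs(z) - eta * np.max(np.abs(z)), 0.0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 5))
y = 3.0 * X[:, 0] + 0.1 * rng.standard_normal(100)
w = sparse_first_weight(X, y, eta=0.5)   # the strong predictor dominates
```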

The CR prediction model is chosen from a continuum of candidates among which we find methods of analysis related to OLS estimation, PCR and PLS regression. [

CR aims at transforming the explanatory variables into new latent predictors which are orthogonal to each other and constructed as linear combinations of the original predictors. It makes it possible to circumvent the problem of multicollinearity between predictors. However, the CR regression does not specifically aim at selecting a subset of variables [

Another general strategy to circumvent the problem of multicollinearity consists in imposing a constraint on the vector of regression coefficients. The two most popular methods in this context are Ridge and Lasso regressions.

Ridge regression

Ridge regression is the first regularization procedure that was proposed to cope with the multicollinearity problem [

$\hat{\beta}_R = (X^T X + k I)^{-1} X^T y$ (8)

where k ≥ 0 is a constant to be selected. Note that if k = 0 , the Ridge estimator amounts to the least-squares estimator.

The Ridge estimator can also be obtained as the solution to the constrained least squares problem defined by (9):

$\hat{\beta}_R = \arg\min_{\beta \in \mathbb{R}^p,\, \|\beta\|^2 \le \delta} \|y - X\beta\|^2 \quad \text{where } \delta \ge 0$ (9)

There is a one-to-one correspondence between the Ridge parameter k and the upper bound, δ, imposed on the vector of regression coefficients, β. From a practical point of view, these parameters can be selected by means of a cross-validation technique.

Ridge regression shrinks the OLS estimator towards 0. It yields a biased estimator, but with a smaller variance than that of the OLS estimator.
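The closed form of Equation (8) is a one-liner, sketched below on synthetic data; the helper name `ridge` is ours. Note that k = 0 recovers OLS exactly, and k > 0 strictly reduces the norm of the coefficient vector:

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimator of Equation (8): (X^T X + k I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(4)
X = rng.standard_normal((50, 4))
y = X @ np.array([1.0, 0.5, -1.0, 0.0]) + 0.1 * rng.standard_normal(50)

b_ols = ridge(X, y, 0.0)    # k = 0: ordinary least squares
b_r = ridge(X, y, 10.0)     # k > 0: coefficients shrunk towards 0
```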

Lasso regression

The Least Absolute Shrinkage and Selection Operator, or Lasso [, is obtained when the L^{2} penalty of ridge regression is replaced by an L^{1} penalty: $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$. This subtle change has important consequences. Indeed, the constraint entails that some of the regression coefficients are shrunk exactly to zero. This means that this regression strategy performs de facto a selection of variables, since the unimportant variables are discarded, their regression coefficients being equal to zero. Formally, the Lasso estimator is given as a solution to the following optimization problem (10):

$\hat{\beta}_{\mathrm{Lasso}} = \arg\min_{\beta \in \mathbb{R}^p,\, \|\beta\|_1 \le \delta} \|y - X\beta\|^2,$ (10)

where δ ≥ 0 .

The parameter δ controls the degree of sparsity and, in practice, it is determined by a Leave One Out (LOO) cross-validation procedure. The smaller this parameter, the larger the number of discarded variables. Conversely, if δ is larger than $\delta_0 = \sum_{j=1}^{p} |\hat{\beta}_j|$ (where $\hat{\beta}_j$ are the OLS estimates), then $\hat{\beta}_{\mathrm{Lasso}} = \hat{\beta}_{\mathrm{OLS}}$. Lasso regression has the double effect of shrinking the β coefficients, which decreases the variance of the regression coefficients as with Ridge regression, and, more importantly, of performing an automatic selection of variables by setting some $\beta_j$ coefficients to zero.
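A Lasso sketch with the penalty chosen by LOO cross-validation, as in the study (scikit-learn's `LassoCV` is assumed; it parametrizes the constraint through the equivalent penalty weight rather than δ itself):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(5)
n, p = 60, 8
X = rng.standard_normal((n, p))
# Only two of the eight predictors carry signal
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(n)

# The penalty is selected by Leave One Out cross-validation
lasso = LassoCV(cv=LeaveOneOut()).fit(X, y)
n_selected = int(np.count_nonzero(lasso.coef_))  # variables kept in the model
```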

Recently, a new biased regression strategy called Biased Power Regression (BPR) was proposed [

$\hat{\beta}_{\mathrm{BP}} = (X^T X)^{\alpha - 1} X^T y$ (11)

where α is a tuning parameter which ranges between 0 and 1.

In practice, α is selected using a cross validation procedure.

Clearly, when α = 0 , we retrieve the OLS estimator and as α increases, the variance-covariance matrix of the predictor variables is shrunk to the identity matrix. The prediction of y using BPR, y ^ BP is given by y ^ BP = X β ^ BP .
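Equation (11) involves a fractional matrix power, which can be computed through the eigendecomposition of the symmetric matrix $X^T X$. The sketch below (the helper name is ours) checks the two limiting cases: α = 0 recovers OLS, and α = 1 reduces the estimator to $X^T y$:

```python
import numpy as np

def biased_power(X, y, alpha):
    """Biased Power estimator of Equation (11): (X^T X)^(alpha-1) X^T y.
    The matrix power is taken through the eigendecomposition of X^T X."""
    vals, vecs = np.linalg.eigh(X.T @ X)
    return vecs @ np.diag(vals ** (alpha - 1.0)) @ vecs.T @ X.T @ y

rng = np.random.default_rng(6)
X = rng.standard_normal((40, 3))
y = rng.standard_normal(40)

b0 = biased_power(X, y, 0.0)                     # alpha = 0 -> OLS
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b1 = biased_power(X, y, 1.0)                     # alpha = 1 -> X^T y
```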

BP-regression shares the same properties as Ridge regression (see Section 2.3.4) and thus can highlight those variables whose coefficients become very small. However, it was not designed to select a subset of variables [

To assess the prediction ability of the various models listed above on the Grand Casablanca O3 data, we performed cross-validation on a training set to determine the appropriate parameters (number of components, Ridge or Lasso constant, …) to be used in the prediction models. Using these parameters, the performance of the different models is then assessed on fresh data. More precisely, we partitioned the available data into two complementary datasets: 1) the summer periods of 2013 and 2014 (the training set), used to fit the models; and 2) the summer period of 2015 (the validation or testing set), used to “test” the models obtained in the training phase. The models fitted on the training set are used to predict the ozone responses for a) the observed meteorological data of 2015 (obstest) and b) the forecasted meteorological data of 2015 for real-world validation (prevtest).

The performance of the models is measured with standard indicators defined by Equations (12)-(14) generally used to compare statistical models [

In a first stage, an internal validation (2013 and 2014 datasets) is performed on the basis of the following criteria in order to assess the quality of the model adjustment:

The multiple correlation coefficient R^{2} allows us to assess the quality of the adjustment based on the training set: $R^2 = 1 - \frac{\sum_{i=1}^{n_{\mathrm{train}}} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n_{\mathrm{train}}} (y_i - \bar{y})^2}$, where $n_{\mathrm{train}}$ is the size of the training sample.

The Root Mean Squared Error (RMSE) is computed according to the following expression:

$\mathrm{RMSE} = \sqrt{\frac{1}{n_{\mathrm{train}}} \sum_{i=1}^{n_{\mathrm{train}}} (y_i - \hat{y}_i)^2}$ (12)

The smallest value of this criterion corresponds to the best adjustment of the model.

For the external validation (on summer 2015 observed dataset), the following criterion is used to assess the prediction ability of the models [

The Root Mean Squared Error of Prediction (RMSEP) is similar to the RMSE but, this time, the validation dataset is used instead of the training dataset:

$\mathrm{RMSEP}_{\mathrm{obs}} = \sqrt{\frac{1}{n_{\mathrm{obstest}}} \sum_{i=1}^{n_{\mathrm{obstest}}} (y_i - \hat{y}_i)^2}$ (13)

where $n_{\mathrm{obstest}}$ is the size of the observed validation set (obstest).

Obviously, the best predictive model corresponds to the smallest RMSEP.

The following criterion is used to assess the performance of the models with observed meteorological data (obstest) and real meteorological forecast data (prevtest) for summer period of 2015.

In the same way, we define the RMSEP of prevision based on the forecasted dataset as:

$\mathrm{RMSEP}_{\mathrm{prev}} = \sqrt{\frac{1}{n_{\mathrm{prevtest}}} \sum_{i=1}^{n_{\mathrm{prevtest}}} (y_i - \hat{y}_i)^2}$ (14)

where $n_{\mathrm{prevtest}}$ is the sample size of the forecasted dataset (prevtest).
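All three criteria (12)-(14) share the same root-mean-squared form and differ only in the dataset plugged in, so a single helper suffices (the function name is ours):

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error. The same formula yields RMSE on the
    training set and RMSEP_obs / RMSEP_prev on the observed-test and
    forecast-test sets, respectively."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))
```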

Experiments were run on a computer with an Intel(R) Core(TM) i7-6600U CPU at 2.60 GHz and 8 GB of RAM, under Windows 10 Professional 64-bit.

All the statistical analyses were performed using the free software R (http://www.rproject.org/).

In this study, the dataset is composed of 25 explanatory variables. Appendix A gives the abbreviation of these variables.

Variable | Min | Max | Mean | St.Dev | NA |
---|---|---|---|---|---|
TMPMAX | 16.2 | 37.5 | 24.5 | 3.09 | 0 |
TMPMIN | 8.20 | 23.50 | 18.35 | 3.02 | 0 |
TMPMOY | 12.40 | 29.90 | 21.45 | 2.88 | 0 |
RRQUOT | 0.00 | 19.30 | 0.39 | 1.98 | 0 |
DRINSQ | 0.00 | 13.30 | 9.72 | 2.79 | 0 |
HUMREL06h | 50.00 | 100.0 | 87.42 | 8.00 | 0 |
HUMREL12h | 34.00 | 95.00 | 68.32 | 8.78 | 0 |
HUMREL18h | 28.00 | 97.00 | 75.66 | 9.66 | 0 |
PRESTN06h | 997.7 | 1017.3 | 1008.2 | 2.97 | 0 |
PRESTN12h | 997.7 | 1016.5 | 1008.9 | 2.91 | 0 |
PRESTN18h | 999 | 1016 | 1008 | 2.88 | 0 |
FFVM06h | 0.00 | 4.00 | 1.55 | 0.80 | 3 |
FFVM12h | 0.00 | 6.00 | 3.58 | 0.98 | 4 |
FFVM18h | 0.00 | 7.00 | 3.46 | 1.04 | 4 |
DDVM06degre | 0.00 | 360.0 | 176.4 | 117.87 | 3 |
DDVM12hDEG | 0.00 | 360.0 | 227.3 | 141.63 | 4 |
DDVM18hDEG | 0.00 | 360.0 | 189.2 | 152.21 | 4 |
Vx06 | −2.95 | 3.46 | −0.05 | 1.06 | 3 |
Vx12 | −5.91 | 3.94 | −0.59 | 1.98 | 4 |
Vx18 | −5.91 | 4.50 | −0.10 | 1.84 | 4 |
Vy06 | −4.00 | 4.00 | 0.08 | 1.38 | 3 |
Vy12 | −3.06 | 6.00 | 2.75 | 1.39 | 4 |
Vy18 | −5.36 | 6.00 | 2.79 | 1.36 | 4 |
O3veilleJahid | 10.00 | 130.0 | 52.83 | 25.66 | 23 |
O3Jahid | 10.00 | 130.0 | 52.84 | 25.62 | 23 |

Minimum, maximum, mean and standard deviation statistics are provided to describe the characteristics of the data set.

The 2013 and 2014 study periods are characterized by high temperatures. In the Grand Casablanca area, the maximal temperature (TMPMAX) ranges from 16.2˚C up to 37.5˚C. The maximal daily total sunshine duration is 13.3 hours. There is almost no rain in these periods (RRQUOT). The wind speed at 18 h is relatively high, reaching 7 m/s (FFVM18h). The O3 concentrations range between 10 and 130 µg/m^{3}.

There are in total 90 missing values for the 366 recording days, distributed on 14 variables. This represents around 2% of missing values to be imputed before the prediction models are performed.

As mentioned above, an imputation strategy based on the K-nearest neighbors was applied. Different values of K have been used in the literature, and the choice K = 10 led to the best results [
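A minimal K-nearest-neighbors imputation sketch (scikit-learn's `KNNImputer` assumed, on a toy matrix; the study retained K = 10, but K = 2 is used here only because of the tiny sample):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy matrix with missing entries in both columns
X = np.array([[1.0, 2.0],
              [2.0, np.nan],
              [3.0, 6.0],
              [np.nan, 8.0],
              [4.0, 8.0]])

# Each missing entry is replaced by the mean of that feature over the
# K nearest rows (distances computed on the observed features)
X_full = KNNImputer(n_neighbors=2).fit_transform(X)
```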

The diagonal entries show the histograms associated with the various variables and the upper entries indicate the coefficients of correlations between pairs of variables.

We also performed a PCA on the imputed dataset. PCA is run on the complete data (2013 and 2014) after imputation and standardization of the variables. The data is composed of 366 days (from April to September of 2013 and 2014) and 24 variables. The first five principal components recover up to 65% of the total variance (
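The standardized PCA step above can be sketched as follows, on synthetic correlated data standing in for the 24 meteorological predictors (scikit-learn assumed; the two-factor structure below is illustrative, not the study's data):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n = 366
latent = rng.standard_normal((n, 2))
# Six correlated variables driven by two latent factors plus noise
X = np.column_stack([latent @ rng.standard_normal(2)
                     + 0.3 * rng.standard_normal(n) for _ in range(6)])

Z = StandardScaler().fit_transform(X)   # standardized (normed) PCA
pca = PCA().fit(Z)
cum_var = np.cumsum(pca.explained_variance_ratio_)  # cumulative variance
```

Inspecting `cum_var` tells how many components are needed to recover a given share of the total variance, exactly as in the eigenvalue table reported for the real data.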

The first PC is linked to wind direction (Vx06, Vx12, Vy12 and Vx18) and pressure variables (PRESTN at 06 h, 12 h and 18 h). We can notice, for example, that the variables TMPMAX, TMPMIN and TMPMOY, as well as PRESTN06h, PRESTN12h and PRESTN18h, are strongly correlated. A strong correlation also exists between the variables Vx06, Vy06, Vx18 and Vx12. The O3veilleJahid variable is strongly correlated with O3Jahid, but it is not well represented in the (PC1, PC2) plane.

In this section, we compare the results obtained from the different regression models described in section 2.3, namely: 1) the Multiple Linear Regression (MLR) model applied to all the variables of the dataset (24 variables), 2) the Reduced MLR with seven variables selected by means of Akaike Information Criterion (AIC) [

Component | Eigenvalue | Percentage of variance | Cumulative percentage of variance |
---|---|---|---|
comp 1 | 4.83 | 23.02 | 23.02 |
comp 2 | 3.67 | 17.47 | 40.49 |
comp 3 | 1.91 | 9.11 | 49.60 |
comp 4 | 1.77 | 8.43 | 58.04 |
comp 5 | 1.52 | 7.27 | 65.31 |
comp 6 | 1.30 | 6.19 | 71.51 |
comp 7 | 1.01 | 4.82 | 76.33 |
comp 8 | 0.99 | 4.73 | 81.06 |
comp 9 | 0.83 | 3.93 | 84.99 |
comp 10 | 0.64 | 3.05 | 88.05 |

regressions, 7) Biased Power regression (BP-regression).

A cross validation procedure (LOO) is applied on the data collected during the period extending from April 1^{st} to September 30^{th} in 2013 and 2014 (training data) to determine for each model the parameters (number of components, Ridge and Lasso parameters…) leading to the minimum of the Root Mean Squared Error (RMSE). Then, the prediction ability according to the Root Mean Squared Error Predicted (RMSEP) of the various models is assessed on the basis of: 1) observed data (test data), RMSEP_{obs}, and 2) forecasted data, RMSEP_{prev} from the summer period of 2015.

The models are compared according to the criteria RMSE, R^{2}, RMSEPobs and RMSEPprev.

Concerning the adjustment of the models on the training data (internal validation), not surprisingly, the MLR model leads to the lowest RMSE (9.503), but the other models lead to close values while taking the multicollinearity problem into account. However, if the goal is to obtain the best predictive model, the RMSE alone is insufficient and we need to analyze the RMSEP to assess the predictive quality of each model.

As for the criterion RMSEP_{obs} in external validation, Lasso, Ridge, PLS, BP

Model | MLR | Reduced MLR | PCR | PLS | Sparse PLS | CR | Ridge | Lasso | BP Reg |
---|---|---|---|---|---|---|---|---|---|
Nb var | 24 | 7 | 24 | 24 | 14 | 24 | 24 | 11 | 24 |
Parameter | | | ncp = 10 | ncp = 5 | ncp = 3, η = 0.56 | ncp = 1, α = 0.1 | λopt = 9.322 | Fract = 0.2 | α = 0.01 |
RMSE | 9.503 | 9.587 | 10.521 | 9.59 | 9.703 | 10.11 | 9.537 | 9.676 | 9.535 |
R^{2} | 0.862 | 0.859 | 0.831 | 0.859 | 0.858 | 0.872 | 0.818 | 0.829 | 0.834 |
RMSEPobs | 11.84 | 11.68 | 13.45 | 11.74 | 12.24 | 11.83 | 11.73 | 11.58 | 11.74 |
RMSEPprev | 15.40 | 14.49 | 15.80 | 13.49 | 14.92 | 15.21 | 14.98 | 12.74 | 14.39 |

regressions and CR outperform the other methods. Lasso shows the best predictive ability since it has the smallest RMSEPobs. Moreover, this method has yet another advantage: it relies on fewer predictive variables (11 variables, namely TMPMAX, TMPMIN, DRINSQ, HUMREL12h, PRESTN06h, FFVM06h, FFVM18h, DDVM12h, Vx06h, Vy06h and O3veilleJahid) than the other models, with the exception of the reduced model (7 predictive variables, namely TMPMIN, TMPMOY, DRINSQ, PRESTN06, Vx06, Vx12 and O3veilleJahid). Among these selected variables, TMPMIN and TMPMOY are strongly correlated, so the reduced model, unlike the Lasso model, does not solve the multicollinearity problem.

Most important are the results for RMSEP_{prev}, based on the forecasted meteorological data for 2015. We recall that forecasted meteorological data, as produced by the Aladin-Maroc numerical forecast model, are the data that will be used on a daily basis to predict the O3 concentration. It turns out that Lasso has by far the best RMSEP_{prev} (12.74), a value close to its RMSEP_{obs} (11.58), followed by the PLS and BP regression models. However, these last two models keep all the predictive variables, unlike the Lasso model, which keeps fewer variables and thus yields a model that is simple and easy to interpret.

The most important finding (

ozone concentration.

Starting with a multiple linear regression model, which is plagued by multicollinearity among the predictor variables, we have considered several more or less recent alternative methods to relate meteorological and pollution variables. The emphasis was put on the ability of these models to predict the daily tropospheric ozone in the Grand Casablanca area, in what is the first comparative study of its type in this region.

We selected the Lasso model based on a comparison of several linear forecasting methods designed to reduce the multicollinearity problem. The results obtained over two years of training data (2013 and 2014), tested on observed data (2015) and validated on forecast data (2015), show that the Lasso model has the best predictive capacity for O3 at the Jahid station in the Grand Casablanca area. The Lasso model also has the advantage of being relatively simple and easily interpretable. Its choice is explained by the fact that it yields the best criteria (R², RMSE, RMSEPobs and RMSEPprev) in comparison to the alternative models discussed in this paper. Furthermore, besides yielding a more stable model than multiple linear regression, Lasso relies on a relatively small number of explanatory variables. This feature is a significant advantage for the daily prediction of the ozone concentration in the Grand Casablanca area.

This contribution proposes the first linear model of daily O3 concentration forecast in Morocco and more particularly in the Grand Casablanca area.

Looking ahead, we plan to widen our study by comparing the performance of the Lasso model with that of other non-parametric models, and we will add more data (2017-2018) to consolidate model validation. The most appropriate forecast model will be routinely implemented by the National Meteorological Office of Morocco (DMN).

The National Meteorological Office of Morocco (Direction de la Météorologie Nationale DMN) is gratefully acknowledged for providing the necessary data for undertaking the present study.

The authors declare no conflicts of interest regarding the publication of this paper.

Oufdou, H., Bellanger, L., Bergam, A., El Ghaziri, A., Khomsi, K. and Qannari, E.M. (2018) Comparison of Different Regularized and Shrinkage Regression Methods to Predict Daily Tropospheric Ozone Concentration in the Grand Casablanca Area. Advances in Pure Mathematics, 8, 793-812. https://doi.org/10.4236/apm.2018.810049

Abbreviation | Variable | Unit |
---|---|---|
TMPMAX | Maximal temperature | ˚C |
TMPMIN | Minimal temperature | ˚C |
TMPMOY | Average temperature | ˚C |
RRQUOT | Total precipitation | mm |
DRINSQ | Sunshine duration | hour |
HUMREL06h | Relative humidity at 06 h | % |
HUMREL12h | Relative humidity at 12 h | % |
HUMREL18h | Relative humidity at 18 h | % |
PRESTN06h | Pressure at the station level at 06 h | hPa |
PRESTN12h | Pressure at the station level at 12 h | hPa |
PRESTN18h | Pressure at the station level at 18 h | hPa |
FFVM06h | Wind force at 06 h | m/s |
FFVM12h | Wind force at 12 h | m/s |
FFVM18h | Wind force at 18 h | m/s |
DDVM06h | Wind direction at 06 h | degree |
DDVM12h | Wind direction at 12 h | degree |
DDVM18h | Wind direction at 18 h | degree |
Vx06 | Horizontal wind at 06 h | m/s |
Vx12 | Horizontal wind at 12 h | m/s |
Vx18 | Horizontal wind at 18 h | m/s |
Vy06 | Vertical wind at 06 h | m/s |
Vy12 | Vertical wind at 12 h | m/s |
Vy18 | Vertical wind at 18 h | m/s |
O3veilleJahid | Ozone concentrations of the day before | µg/m^{3} |
O3veille | Ozone concentrations | µg/m^{3} |

Variables | Complete Reg | Reduced Reg | PCR | PLS | SPLS | CR | Ridge | Lasso | BP Reg |
---|---|---|---|---|---|---|---|---|---|
TMPMAX | 24.55 | 0.00 | 0.03 | −1.06 | −1.41 | −1.74 | −2.52 | −0.55 | 21.62 |
TMPMIN | 30.45 | 6.99 | 1.13 | 1.09 | 1.60 | 2.16 | 2.85 | 0.79 | 27.38 |
TMPMOY | −51.62 | −6.55 | 0.61 | −0.001 | 0.08 | 0.17 | −0.003 | 0.00 | −45.91 |
RRQUOT | −0.08 | 0.00 | 0.84 | 0.27 | 0.00 | 0.00 | −0.03 | 0.00 | −0.08 |
DRINSQ | 2.15 | 2.11 | −1.47 | 0.66 | 0.92 | 1.66 | 1.97 | 1.02 | 2.08 |
HUMREL06h | 0.51 | 0.00 | −0.40 | −0.43 | 0.00 | 0.10 | 0.36 | 0.00 | 0.48 |
HUMREL12h | 0.27 | 0.00 | 1.37 | 0.84 | 0.00 | 0.53 | 0.33 | 0.14 | 0.28 |
HUMREL18h | −0.62 | 0.00 | −1.33 | −0.36 | 0.00 | −0.34 | −0.48 | 0.00 | −0.59 |
PRESTN06h | −1.46 | −1.64 | −0.51 | −1.04 | −0.65 | −0.97 | −1.15 | −0.87 | −1.42 |
PRESTN12h | −0.02 | 0.00 | −0.46 | −0.89 | −0.41 | −0.43 | −0.28 | 0.00 | −0.05 |
PRESTN18h | 0.08 | 0.00 | −0.54 | −0.85 | −0.39 | −0.15 | 0.06 | 0.00 | 0.06 |
FFVM06h | 0.27 | 0.00 | −0.19 | 0.64 | 0.45 | 0.41 | 0.31 | 0.26 | 0.28 |
FFVM12h | 0.54 | 0.00 | 0.37 | −0.03 | 0.69 | 0.31 | 0.42 | 0.00 | 0.52 |
FFVM18h | 0.41 | 0.00 | 0.35 | 0.47 | 1.18 | 0.43 | 0.41 | 0.52 | 0.41 |
DDVM06deg | 0.05 | 0.00 | 0.66 | 0.45 | 0.56 | 0.22 | 0.11 | 0.00 | 0.07 |
DDVM12hDEG | −0.11 | 0.00 | −0.87 | −0.60 | 0.00 | −0.36 | −0.19 | −0.11 | −0.11 |
DDVM18hDEG | −0.58 | 0.00 | −0.18 | −0.09 | 0.00 | −0.39 | −0.54 | 0.00 | −0.56 |
Vx06 | −1.33 | −1.35 | −1.16 | −1.23 | −0.94 | −1.34 | −1.31 | −0.82 | −1.31 |
Vx12 | 1.28 | 1.10 | 1.03 | 0.39 | 0.00 | 0.62 | 0.98 | 0.00 | 1.22 |
Vx18 | −0.70 | 0.00 | 0.38 | −0.24 | 0.51 | −0.61 | −0.71 | 0.00 | −0.68 |
Vy06 | 0.69 | 0.00 | −2.20 | 0.26 | 0.00 | 0.79 | 0.73 | 0.53 | 0.69 |
Vy12 | −0.97 | 0.00 | 1.86 | 0.41 | 0.00 | −0.19 | −0.69 | 0.00 | 0.91 |
Vy18 | 0.24 | 0.00 | 1.75 | 1.18 | 0.00 | 0.67 | 0.36 | 0.00 | 0.27 |
O3veilleJahid | 23.36 | 23.26 | 22.34 | 23.03 | 23.31 | 23.21 | 22.75 | 23.14 | 22.97 |