Forecast Urban Air Pollution in Mexico City by Using Support Vector Machines : A Kernel Performance Approach

Developing forecasting models for air pollutants is a non-trivial process, since pollutant concentrations show nonlinear dynamic behavior. The literature reports multiple models of particulate pollutants based on soft computing and machine learning techniques such as multilayer perceptrons, neural networks, support vector machines and kernel algorithms. This paper presents a pollution prediction model using support vector machines with three kernel functions: Gaussian, Polynomial and Spline. Finally, the prediction results for ozone (O3), particulate matter (PM10) and nitrogen dioxide (NO2) in Mexico City are presented as a case study using these techniques.


Introduction
In recent times, urban air pollution has been a growing problem, especially for urban communities. Size, shape and chemical properties govern the lifetime of particles in the atmosphere and the site of deposition within the respiratory tract. Health effects differ depending on the size of airborne particulates [1]. In this contribution, PM10 (particles less than or equal to 10 micrometers in diameter) and PM2.5 (particles less than or equal to 2.5 micrometers in diameter), Ozone and Nitrogen dioxide are considered due to their effects on human health. This is the primary motivation for this research: to monitor and model the levels and spread of harmful particles in urban environments.
In previous contributions, it has been shown that concentration levels of PM10 can be forecast using other techniques such as neural networks and various fuzzy clustering algorithms [2]. However, there are other harmful pollutants such as Ozone and Nitrogen dioxide, which makes it essential to model the nonlinear behavior of the system accurately, by designing a more robust model with an enhanced method to reduce the error between the raw data and the model. For this reason, support vector machines (SVM) are chosen for this work. In this appraisal, the nonlinear behavior is modeled using support vector machines working in regression mode.
Support vector machines are a relatively recent statistical learning technique grounded in machine learning and generalization theory; they can be regarded as a method for minimizing risk.
A kernel approach is discussed, and the forecast results for Mexico City are illustrated. This is carried out using PM10, O3 and NO2. Finally, the results obtained by SVM kernel algorithms are validated using the raw data from the monitoring stations located at various sites in Mexico City.
Mexico City is one of the largest urban areas in the world, with over 20 million inhabitants within the city and an annual growth rate of between 3.3% and 5%. Mexico City has an area of approximately 1300 km² and is naturally open to the north and enclosed to the south, east and west by mountains that rise 1000 m above the city. Like most large cities located in valleys and surrounded by mountains, it has air pollution problems for certain particles.
Mexico City is a dry region of moderate climate with a diurnal pattern of winds blowing from the northwest and the northeast. The rainy season lasts from June to October. The industrial area comprises more than 30% of the whole national industry and is mostly located in the northern sectors of the city.
The high levels of fine particulate matter in Mexico City are of concern since they may induce severe public health effects [1].

Related Work
In this section, significant works on soft computing, machine learning and computational methods designed for urban airborne pollution forecasting are listed. Several of these works combine methods such as neural networks, support vector machines and fuzzy inference systems with other techniques.
For instance, Kolehmainen [3] has constructed a model using Self-Organized Map (SOM), Sammon's mapping and fuzzy distance metrics to forecast levels of NO2, CO and PMx, while works such as the one carried out by Pokrovsky [4] present a fuzzy logic based method to study the impact of meteorological factors on the evolution of air pollutant levels in order to describe them quantitatively.
In general, soft computing models have been built by combining two models, for example Box-Jenkins methods with ANN [9], or a Hidden Markov Model with Fuzzy Logic [10].
In previous contributions, urban air pollution models have been carried out using different fuzzy clustering techniques and fuzzy inference systems on particulate matter of less than 10 µm in diameter (PM10) [2].
Lastly, in terms of forecasting airborne pollution using support vector machines, the work of Osowski [11] can be mentioned. In this work, the authors present a method for daily air pollution forecasting using SVMs and wavelet decomposition. However, only the Gaussian kernel is considered in that work.
In this contribution, support vector machines are used to construct models that forecast the levels of different pollutants, in order to determine the performance of the kernels used (Gaussian, Polynomial and Spline).

Urban Airborne Pollution
The health impact of air pollution became apparent during episodes in the USA and Europe in 1952 and 1958. Subsequent analysis of data for the London winters of 1958-1971 demonstrated that mortality and morbidity were associated with air pollution. The ability to measure the environmental health effects of pollution has improved over the last several decades, owing to advances in pollution monitoring and in statistical techniques [12].
The sources of air pollutants are numerous and varied. Three categories of sources may be defined: 1) natural (those that are not associated with human activities); 2) anthropogenic (those produced by human activities); and 3) secondary (those formed in the atmosphere from natural and anthropogenic air pollutants) [13].
Most major pollutants can alter pulmonary function, in addition to other health effects, when the exposure concentrations are high. This is especially severe in vulnerable sectors of the population such as children, asthmatics and the elderly, and has been widely documented [14][15][16][17].
In this work, three pollutants were chosen on the basis of their availability at the monitoring sites and their toxicity: Ozone, Nitrogen Dioxide (NO2) and Particulate Matter of less than 10 micrometers in diameter (PM10). The datasets are separated according to month of the year and type of pollutant. There is one measurement per hour for each pollutant at each of the five sites, which makes it difficult to extract information from the datasets using commonly used methods; hence the importance of using novel methods for data extraction and analysis such as the one used in this work, especially when dealing with the non-linear behavior of airborne particle concentration.

Ozone
Ozone is a natural component of the atmosphere that is found in low concentrations and is crucial for life. Air pollution caused by high concentrations of ozone is a common problem in large cities throughout the world [18].
Mexico City is among the cities suffering from this problem. It is well known that individuals exposed for a long period of time to high concentrations of ozone may experience serious health problems [19]. Epidemiological studies have found associations between daily ozone levels and hospital admissions [20]. This pollutant is associated with respiratory symptoms, especially coughing, which are aggravated in patients with asthma [21].

Nitrogen Dioxide
Nitrogen Dioxide (NO2) is a particularly important compound, not only for its health effects, but also because it absorbs visible light and contributes to decreased visibility. It also plays a critical role in the production of ozone, because the photolysis of NO2 is the initial step in the photochemical reaction that forms ozone [13,18].
In nature, nitrogen dioxide occurs at concentrations of 10 to 50 parts per billion (ppb). However, high levels of nitrogen dioxide are due to industrial processes and fossil fuel sources. Furthermore, motor vehicles contribute substantially to urban levels of nitrogen oxides through their engine combustion processes [22]. According to several authors, the monitoring of NO2 is critically important in order to assess the potential effect of NO2 on human health and ecosystems, as well as to develop strategies for the effective control of NO2 pollution [23][24][25].

PM10
Airborne particulate matter (PM) is a mixture of small particles and liquid droplets suspended in the atmosphere, which contributes significantly to urban air quality problems such as acid rain and visibility degradation [26].
In airborne pollution, these particles can be any solid or liquid materials with a diameter between 0.002 and 500 micrometers (µm). Airborne particulates of 10 µm diameter and less are of concern from the perspective of air pollution [27]. A variety of national and worldwide standards, directives and guidelines exist to define acceptable particulate levels in the air. These types of particles are classified according to their effect on human health and their physical characteristics [28,29].

Support Vector Machines
The support vector machines (SVM) theory was developed by Vapnik [30]. This method is applied in many machine-learning applications such as object classification, time series prediction, regression analysis and pattern recognition. Support vector machines are based on the principle of structural risk minimization (SRM) [31,32].
In the analysis using SVM, the main idea is to map the original data x into a feature space F of higher dimensionality via a non-linear mapping function $\varphi$, which is generally unknown, as shown in Figure 1, and then carry out linear regression in the feature space [33,34].
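Thus, the regression approximation addresses the problem of estimating a function based on a given data set $D = \{(x_i, d_i)\}_{i=1}^{p}$ (where $x_i$ represent the input vectors and $d_i$ are the desired values), which is produced by an unknown underlying function. The SVM method approximates the function by:

$$y(x) = \sum_{j=1}^{m} w_j \varphi_j(x) + b = w^{T}\varphi(x) + b \qquad (1)$$

where $w = [w_1, \cdots, w_m]$ represents the weights vector, $b$ is the bias coefficient and $\varphi(x) = [\varphi_1(x), \cdots, \varphi_m(x)]$ is the basis function vector. The learning task is transformed into minimizing the error function while keeping the weights of the network at a minimum. The error function is defined through the $\varepsilon$-insensitive loss function $L_\varepsilon(d, y(x))$, given by:

$$L_\varepsilon(d, y(x)) = \begin{cases} |d - y(x)| - \varepsilon, & |d - y(x)| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

The optimization problem so defined is solved by introducing the Lagrange multipliers $\alpha_i, \alpha_i^{*}$ ($i = 1, 2, \cdots, p$), responsible for the functional constraints defined in Equation (2). The minimization of the Lagrange function is changed to the dual problem [35]:

$$\max_{\alpha, \alpha^{*}} \; \sum_{i=1}^{p} d_i (\alpha_i - \alpha_i^{*}) - \varepsilon \sum_{i=1}^{p} (\alpha_i + \alpha_i^{*}) - \frac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{p} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*}) K(x_i, x_j) \qquad (3)$$

with constraints:

$$\sum_{i=1}^{p} (\alpha_i - \alpha_i^{*}) = 0, \qquad 0 \le \alpha_i \le C, \qquad 0 \le \alpha_i^{*} \le C$$

where C is a regularization constant that determines the trade-off between the training risk and the model uniformity.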
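As an illustration of the ε-insensitive loss function described above, the following minimal Python sketch evaluates the loss for single points and for a short series. The function names and the sample values are hypothetical; the default margin of 0.5 is the value of ε reported later in this paper.

```python
def eps_insensitive_loss(d, y, eps=0.5):
    """Epsilon-insensitive loss for one desired value d and one model output y.

    Errors smaller than eps are ignored; larger errors are penalized linearly.
    """
    residual = abs(d - y)
    return residual - eps if residual >= eps else 0.0


def total_loss(desired, predicted, eps=0.5):
    """Sum of the epsilon-insensitive losses over paired sequences."""
    return sum(eps_insensitive_loss(d, y, eps) for d, y in zip(desired, predicted))


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise both branches of the loss.
    print(eps_insensitive_loss(10.0, 10.3))   # inside the margin
    print(eps_insensitive_loss(10.0, 11.0))   # outside the margin
    print(total_loss([1.0, 2.0, 3.0], [1.2, 3.0, 1.0]))
```

Points whose error falls inside the margin contribute nothing to the loss, which is why only a subset of the training data ends up as support vectors.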
According to the nature of quadratic programming, only those data corresponding to non-zero $(\alpha_i - \alpha_i^{*})$ pairs can be referred to as support vectors. In Equation (3), $K(x_i, x_j)$ is the inner product kernel, which satisfies Mercer's condition [13] and is required for the generation of kernel functions, given by:

$$K(x_i, x_j) = \varphi(x_i)^{T} \varphi(x_j)$$
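A polynomial mapping is a widely used method for non-linear modeling [35], defined by:

$$K(x, x') = \langle x, x' \rangle^{d} \qquad (8)$$

where d is the degree of the polynomial. However, the use of Equation (8) carries an inherent problem, since some support vectors become zero; it is therefore preferable to rewrite the expression as:

$$K(x, x') = (\langle x, x' \rangle + 1)^{d}$$

In this work, a Spline kernel is also presented as a choice for modeling, due to its flexibility. A spline of order k with N knots located at $\tau_s$ is expressed by:

$$K(x, x') = \sum_{r=0}^{k} x^{r} x'^{r} + \sum_{s=1}^{N} (x - \tau_s)_{+}^{k} (x' - \tau_s)_{+}^{k}$$

Thus, the output y(x), associated with the support vectors and with the input data x, can be defined by:

$$y(x) = \sum_{i=1}^{N_{sv}} (\alpha_i - \alpha_i^{*}) K(x, x_i) + b$$

where $N_{sv}$ is the number of support vectors and $x_i$ are learning vectors. This leads to the SVM architecture shown in Figure 2, which can also be found in [2].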
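The two polynomial kernel forms and the kernel expansion of the regression output can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it uses the standard SVR expansion, in which each support vector carries one signed coefficient (the difference of its two Lagrange multipliers), and the function names, coefficients and sample vectors are hypothetical rather than values from this study.

```python
def dot(x, z):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, z))


def poly_kernel_homogeneous(x, z, degree=2):
    """Polynomial kernel without offset: <x, z>^d."""
    return dot(x, z) ** degree


def poly_kernel(x, z, degree=2):
    """Polynomial kernel with offset: (<x, z> + 1)^d."""
    return (dot(x, z) + 1.0) ** degree


def svr_predict(x, support_vectors, coeffs, bias, kernel):
    """Kernel expansion: sum_i coeff_i * K(x, x_i) + bias.

    coeffs[i] stands for the difference of the two Lagrange multipliers
    of support vector i (assumption: standard SVR formulation).
    """
    return sum(c * kernel(x, sv) for c, sv in zip(coeffs, support_vectors)) + bias


if __name__ == "__main__":
    # Orthogonal inputs: the homogeneous form vanishes, the offset form does not.
    print(poly_kernel_homogeneous((1.0, 0.0), (0.0, 1.0)))
    print(poly_kernel((1.0, 0.0), (0.0, 1.0)))
    # Hypothetical two-support-vector model.
    svs = [(1.0, 0.0), (0.0, 1.0)]
    print(svr_predict((1.0, 1.0), svs, [0.5, -0.25], 0.1, poly_kernel))
```

The first two calls show why the offset form is usually preferred: for orthogonal inputs the kernel without offset is exactly zero, whereas the offset form still returns a non-zero value.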

Kernel Functions
The use of an appropriate kernel is the key feature in support vector applications, since it provides the capability of mapping non-linear data into "feature" spaces that are in essence linear, so that an optimization process can be applied as in the linear case. This provides a means to dimension the problem properly; nonetheless, the results still depend on a good selection of training data.
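If k = 1, the Spline kernel function (in its infinite-knot form) is defined as:

$$K(x, x') = 1 + \langle x, x' \rangle + \frac{1}{2} \langle x, x' \rangle \min(x, x') - \frac{1}{6} \min(x, x')^{3}$$

where the solution is a piecewise cubic. The Gaussian kernel function is defined in [11,30] as:

$$K(x, x') = \exp\left(-\frac{\|x - x'\|^{2}}{2\sigma^{2}}\right) \qquad (7)$$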
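The Gaussian kernel and the first-order spline kernel can be sketched in Python as shown below. This is an illustrative sketch only: the spline is written for scalar inputs in its commonly used infinite-knot form, and the function names and default width are assumptions, not settings taken from this study.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def spline_kernel_1d(x, z):
    """First-order spline kernel for scalar inputs (infinite-knot form).

    K(x, z) = 1 + x*z + (1/2)*x*z*min(x, z) - (1/6)*min(x, z)^3
    """
    m = min(x, z)
    return 1.0 + x * z + 0.5 * x * z * m - (m ** 3) / 6.0


if __name__ == "__main__":
    print(gaussian_kernel((0.0, 0.0), (0.0, 0.0)))   # identical points
    print(gaussian_kernel((1.0, 0.0), (0.0, 0.0)))   # unit distance
    print(spline_kernel_1d(0.5, 1.0))
```

Both kernels are symmetric in their arguments, which is one of the properties required of a valid inner product kernel.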

Other Considerations
There are other considerations when working with SVMs in regression mode. The most important are bias analysis, free parameters and the quadratic problem. These issues are discussed in this section.
The Gaussian kernel process delivers an estimate of the reliability of the prediction in the form of the variance of the predictive distribution, and the analysis can be used to estimate the evidence in favor of a particular choice of covariance function. The covariance or kernel function can be seen as a model of the data, thus providing a principled method for model selection [31,35]. The inclusion of a bias within the kernel function generally leads to a more efficient implementation and a slightly more accurate model [11]. However, the solutions achieved with an implicit or an explicit bias are not the same. This dichotomy emphasizes the difficulties with the interpretation of generalization in high-dimensional feature spaces. In this work the explicit bias approach is used.
Another important issue in support vector applications is the selection of free parameters such as the regularization constant C and the error margin ε. The regularization constant is the weight that determines the balance between the complexity of the network and the approximation error, while ε is the margin within which the error is neglected; for the Gaussian kernel function, the variance σ must also be selected [11]. In previous contributions [35], the optimum regularization coefficient was found to be 100 and the error ε to be 0.5.
In terms of quadratic programming (QP), it becomes a problem when the number of data points exceeds a certain quantity (e.g. 2000). For small numbers of data points, SVM training works flawlessly [32,36].
In the case study of this work, the number of data points is 365, where each data point represents the daily average of PMx concentration. Therefore, the analysis and solution of the QP problem are outside the scope of this work.
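The role of the two free parameters can be made explicit through the standard primal form of the ε-insensitive regression problem, given here as a sketch. The slack variables $\xi_i, \xi_i^{*}$ and the sample count $p$ are notation introduced for this illustration and do not appear elsewhere in this paper:

```latex
\min_{w,\, b,\, \xi,\, \xi^{*}} \quad
  \frac{1}{2}\,\lVert w \rVert^{2} + C \sum_{i=1}^{p} \left( \xi_i + \xi_i^{*} \right)
\qquad \text{subject to} \qquad
\begin{cases}
  d_i - y(x_i) \le \varepsilon + \xi_i \\
  y(x_i) - d_i \le \varepsilon + \xi_i^{*} \\
  \xi_i,\ \xi_i^{*} \ge 0
\end{cases}
\qquad i = 1, \dots, p
```

The first term penalizes model complexity and the second penalizes errors larger than ε, so a larger C favors fitting the training data more closely, while a larger ε widens the band inside which errors are ignored.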
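Since the monitoring data are recorded hourly while each data point in the model is a daily average, the aggregation step can be sketched in Python as follows. The function name and the treatment of missing readings (marked as None and skipped) are assumptions made for illustration; the source does not describe how gaps were handled.

```python
from statistics import mean


def daily_averages(hourly, hours_per_day=24):
    """Collapse a flat list of hourly readings into one mean per day.

    A reading of None marks a missing value and is skipped (assumption).
    A day with no valid reading yields None.
    """
    days = []
    for start in range(0, len(hourly), hours_per_day):
        valid = [v for v in hourly[start:start + hours_per_day] if v is not None]
        days.append(mean(valid) if valid else None)
    return days


if __name__ == "__main__":
    # Hypothetical readings: one complete day and one day with gaps.
    readings = [10.0] * 24 + [20.0] * 12 + [None] * 12
    print(daily_averages(readings))
```

Applied to a full year of hourly readings (365 × 24 values), the function returns the 365 daily points used as the data set.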

Proposed Methodology
The proposed methodology has been taken from Lu [36] and Wang [37]. This methodology provides the general steps for modeling and predicting pollutants using SVM working in regression mode. In this work, Gaussian, Polynomial and Spline kernel functions are used. The aim of this work is to provide a natural representation of the system behavior, comparing the performance of the kernels used for this particular case study. In order to perform an appropriate design, training and testing of the SVM, this article describes a generic methodology based on a review of [30,32], as shown in Figure 3.
The steps taken, based on the methodology shown in Figure 3, are as follows:
a) Preprocessing of the input data by selecting the most relevant features, scaling the data in the range [−1, 1], and checking for possible outliers.
b) Selecting an appropriate kernel function that determines the hypothesis space of the decision and regression function.
c) Selecting the parameters of the kernel function: the degree for polynomial kernels and the variance for Gaussian kernels.
d) Choosing the penalty factor C and the desired accuracy by defining the ε-insensitive loss function.
e) Validating the model obtained on test data not seen during training and, if the result is not satisfactory, iterating between steps "c" or, eventually, "d".
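The preprocessing and validation steps listed above can be sketched in Python as follows. This is a minimal sketch, assuming min-max scaling to [−1, 1] and a chronological hold-out split; the function names, the 20% test fraction and the sample concentrations are hypothetical and are not taken from this study.

```python
def fit_scaler(values):
    """Return the (min, max) pair used to map values into [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi


def scale(values, lo, hi):
    """Map values linearly so that lo -> -1 and hi -> +1."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def unscale(scaled, lo, hi):
    """Inverse of scale(): map values in [-1, 1] back to original units."""
    return [(s + 1.0) * (hi - lo) / 2.0 + lo for s in scaled]


def holdout_split(series, test_fraction=0.2):
    """Keep the last part of the series as test data unseen during training."""
    cut = len(series) - int(len(series) * test_fraction)
    return series[:cut], series[cut:]


if __name__ == "__main__":
    concentrations = [20.0, 35.0, 50.0]          # hypothetical readings
    lo, hi = fit_scaler(concentrations)
    scaled = scale(concentrations, lo, hi)
    print(scaled)
    print(unscale(scaled, lo, hi))
    print(holdout_split(list(range(10))))
```

In practice the scaler would be fitted on the training portion only, so that the test data stay unseen until the validation step.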
The fundamental reason for considering SVM working in regression mode as an approach for urban air pollution modeling is the non-linear aspect of the application.
There is no predetermined heuristic for the choice of free parameters and the design of the SVM; many approaches appear to be application-specific, aiming to improve SVM performance through the automatic adjustment of free parameters. Using SVM in real-time applications appears to be rather complex due to the computational demands of deriving the results.

Experimental Results
Table 1 shows the forecasting results for Ozone considering all three kernels discussed in Section 3.2.1 for every month of the year. It shows the number of SVMs used to construct the model for each kernel in the given month. The larger the number of SVMs used, the higher the computational cost of computing the model. Hence, the accuracy (the share of predicted points among all predicted and unpredicted points) should be as high as possible while keeping the number of SVMs small.
Table 1 shows that the number of support vector machines for the Gaussian kernel varies from 4 (February 2009) to 10 (December 2009). Although the accuracy is high for most predictions, it is worth noting that both November and December have an accuracy far below the average (83.3% and 83.8%, respectively), and that their number of SVMs is also much higher than in most of the other months, the only exception being July. The best models with this kernel are those for February and June, where the accuracy is high (89.2% and 90%, respectively) while the computational cost remains low, with only 4 SVMs.
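The accuracy measure can be illustrated with a short Python sketch. The criterion used here to decide whether a point counts as predicted (an absolute error no larger than a given tolerance) is an assumption made for illustration, since the exact criterion is not stated; the function name and the sample series are likewise hypothetical.

```python
def forecast_accuracy(observed, predicted, tolerance):
    """Percentage of points whose prediction lies within `tolerance`
    of the observation (assumed definition of a predicted point)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) <= tolerance)
    return 100.0 * hits / len(observed)


if __name__ == "__main__":
    # Hypothetical 31-day month in which 28 days are predicted within tolerance.
    observed = [50.0] * 31
    predicted = [50.0] * 28 + [60.0] * 3
    print(round(forecast_accuracy(observed, predicted, tolerance=0.5), 1))
```

With one data point per day, a month of 31 days in which 28 points are predicted gives 28/31, or about 90.3%.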
In terms of the polynomial kernel, the mode of the number of SVMs is between 5 and 6, with a lowest accuracy of 74.2% and a highest of 90.3% for some months. It is worth noting that the accuracy is low (less than 80%) for the last three months of the year, a performance similar to that of the Gaussian kernel, where the last months show a lower accuracy than the rest of the year. Furthermore, for the spline kernel, a low number of SVMs was needed to model Ozone, and the accuracy varied from 67.7% (in December) to 93.5% (in March). In general, the lowest accuracy was shown for the last months of the year (especially December) for all three kernels. The kernel that showed the most consistent performance for this pollutant was the Gaussian, although for some months the number of SVMs was higher, with a substantial increase in computational cost. An example of SVM forecasting for ozone using the Gaussian kernel is shown in Figure 4.
Table 2 shows the forecasting results for Nitrogen Dioxide for all three kernels. For this pollutant, the highest accuracy is also obtained with the Gaussian kernel, which replicates the results for ozone. In general, the spline kernel requires the lowest number of SVMs to represent the model, but also gives the lowest accuracy in percentage terms. An example of this behavior may be seen in Figure 5. It can also be noted that, regardless of the kernel, the lowest accuracy for each kernel occurs in the last months of the year (especially December). This is consistent with the forecasting results for ozone.
In terms of the forecasting results for PM10 (Table 3), it can be noted that the Gaussian kernel performs better in terms of accuracy, despite the high number of SVMs needed to construct the model. Both the spline and polynomial kernels are relatively accurate with a lower number of SVMs, with a few exceptions (e.g. the Spline kernel for April gives 10 SVMs with only 67.7% accuracy). An example of this behavior for the spline kernel is shown in Figure 6.
Factors such as bias and free parameters were considered in constructing the models. However, the results show that a trade-off must be made between computational cost, in terms of the number of support vector machines, and accuracy. It can be inferred that the Gaussian kernel works better provided that the time to compute the results is not an issue. In general, the Polynomial kernel does not offer adequate performance in comparison with the Gaussian kernel for this particular case study.
As future work, other kernel functions such as genetic, wavelet-based and principal component analysis (PCA) kernels, among others, may be implemented. Real-time prediction may also be carried out using sensor networks and embedded systems.

Figure 1. Feature map can simplify the classification and regression tasks. (a) Input space; (b) Feature space.

Figure 4. Forecasting of O3 concentration in January of 2009 using Gaussian kernel.

Figure 5. Forecasting of O3 concentration in January of 2009 using Polynomial kernel.

Figure 6. Forecasting of O3 concentration in January of 2009 using Spline kernel.

Conclusions and Future Work
This method presents a feasible technique for modeling monthly atmospheric pollution by applying support vector machines with Gaussian, Polynomial and Spline kernel functions working in regression mode. The application of SVM has made it possible to obtain good accuracy in modeling the pollutant concentrations of O3, NO2 and PM10 in Mexico City. The methods, techniques and alternatives offered in the SVM field provide a flexible and scalable tool for implementing sophisticated solutions for dynamical and non-linear data. It is