
This contribution shows the feasibility of improving the modeling of the non-linear behavior of airborne pollution in large cities. In previous works, models have been built using many machine learning algorithms; however, many of them do not work for all pollutants, or are not consistent or robust across cities. In this paper, an improved algorithm is proposed that applies Ant Colony Optimization (ACO) to models created by a neuro-fuzzy system. This method reduces the prediction error, yielding more reliable prediction models.

In recent years, the environment has been affected by pollutants such as ozone (O_{3}), nitrogen dioxide (NO_{2}), carbon monoxide (CO), sulfur dioxide (SO_{2}), particulate matter smaller than 10 microns (PM_{10}) and particulate matter smaller than 2.5 microns (PM_{2.5}) [

For several years, air quality in Mexico City has been monitored and evaluated in order to prevent toxic levels that harm health and the environment [

Measurements have been performed to obtain efficient and reliable information on air quality. Some of these contributions predict pollution levels, such as the work of Cortina [ … (O_{3}), (PM_{10}) and nitrogen dioxide in Mexico City.

Patterns of pollution levels do not show linear behavior [

A Fuzzy Inference System (FIS) is based on fuzzy logic, which allows intermediate values between conventional binary evaluations such as Yes/No or True/False.

A FIS is the main unit of a fuzzy logic system: the part where the decision is made, by constructing appropriate rules based on the theory of fuzzy sets. The rules are constructed linguistically (IF-THEN) and have the general form "If A Then B", where A and B are (collections of) propositions containing linguistic variables linked by connectors (AND, OR) that lead to the decision [

A fuzzy inference system consists of 5 blocks as described in

A fuzzy inference system operates as follows: the input data are first fuzzified; then the fuzzy rules are formed and applied to the fuzzified inputs. Finally, a defuzzification method converts the value obtained from the fuzzy rules into a crisp output value applicable to the real world.

The steps of fuzzy reasoning, i.e., the inference operations on the IF-THEN rules performed by the FIS, are:

· Compare the input variables with the membership functions of each linguistic value; this process is called fuzzification.

· Combine by an operator (

· Combine the resulting outputs to produce a crisp value; this step is called defuzzification.
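The three steps above can be illustrated with a minimal Mamdani-style sketch. It assumes triangular membership functions, two hypothetical linguistic terms (LOW, HIGH) shared by two inputs and the output, the minimum as the AND operator, maximum aggregation and centroid defuzzification; none of these choices is taken from the original system.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical linguistic terms, reused for both inputs and for the output y in [0, 10].
LOW = (-10.0, 0.0, 10.0)
HIGH = (0.0, 10.0, 20.0)


def infer(x1, x2, steps=1001):
    # Step 1: fuzzification of each input against each linguistic term.
    x1_low, x1_high = tri(x1, *LOW), tri(x1, *HIGH)
    x2_low, x2_high = tri(x2, *LOW), tri(x2, *HIGH)
    # Step 2: rule strengths, with AND implemented as min.
    w_low = min(x1_low, x2_low)     # IF x1 is LOW AND x2 is LOW THEN y is LOW
    w_high = min(x1_high, x2_high)  # IF x1 is HIGH AND x2 is HIGH THEN y is HIGH
    # Step 3: clip each consequent, aggregate with max, take the centroid.
    num = den = 0.0
    for i in range(steps):
        y = 10.0 * i / (steps - 1)
        mu = max(min(w_low, tri(y, *LOW)), min(w_high, tri(y, *HIGH)))
        num += y * mu
        den += mu
    return num / den if den > 0 else float("nan")
```

For example, `infer(8.0, 9.0)` lands above the middle of the output range because the HIGH rule fires more strongly than the LOW rule.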

A fuzzy rule in a fuzzy model has the general structure shown in equation (1):

where

A typical rule in a FIS model has the form shown in equation (2) [

The final output of the FIS is the weighted average of the outputs of all the rules:
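In the usual TSK formulation this average is y = Σ_i w_i z_i / Σ_i w_i, where w_i is the firing strength of rule i and z_i its output. The following is a minimal sketch, assuming first-order rules whose outputs are linear functions of the inputs (z_i = p_i x + q_i y + r_i); the function names are hypothetical.

```python
def linear_consequent(params, inputs):
    """First-order TSK rule output: z = p_1*x_1 + ... + p_n*x_n + r."""
    *coeffs, bias = params
    return sum(c * x for c, x in zip(coeffs, inputs)) + bias


def tsk_output(firing, consequents):
    """Weighted average of rule outputs: y = sum(w_i * z_i) / sum(w_i)."""
    total = sum(firing)
    if total == 0:
        raise ValueError("no rule fired for this input")
    return sum(w * z for w, z in zip(firing, consequents)) / total


inputs = (2.0, 1.0)
z = [linear_consequent((1.0, 1.0, 0.0), inputs),   # rule 1: z = x + y
     linear_consequent((0.5, 0.0, 1.0), inputs)]   # rule 2: z = 0.5x + 1
y = tsk_output([0.75, 0.25], z)  # rule 1 fires three times as strongly as rule 2
```

Dividing by the total firing strength makes the result a convex combination of the rule outputs, so it always lies between the smallest and the largest z_i.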

The Takagi-Sugeno model or TSK model proposed in 1985 [

A fuzzy system needs an antecedent and a consequent to express a logical connection between the input and output data, which serves as the basis for producing the desired behavior of the system.

Clustering is the grouping of data according to certain criteria so that the members of each cluster share similar properties and characteristics; clusters are based mainly on distance, interpreted as similarity [

Fuzzy C-Means (FCM) is a clustering method that allows a data point to belong to more than one cluster; it was developed by Dunn [

where

Starting with the desired number of clusters

To measure the performance of a clustering algorithm, validity and accuracy are required. Validity: whether the algorithm finds all the internal types present in the collected data. Accuracy: whether the algorithm assigns data of the same kind to the same cluster and data of different kinds to different clusters. We define the rate of decline of clustering

After the number of clusters, the value of m and the stopping criterion have been determined, the FCM algorithm performs two steps: first, it calculates the membership functions using equation (5); second, it updates the prototypes using equation (7). The two steps are repeated iteratively until the stopping criterion is met, i.e., until the change in the values produced by these equations is minimal. The
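A minimal pure-Python sketch of the two-step FCM loop follows. It assumes that equations (5) and (7) take the standard Bezdek form (membership u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m-1)), prototype v_i = Σ_j u_ij^m x_j / Σ_j u_ij^m), Euclidean distance, prototypes initialized from randomly chosen data points, and a stopping criterion on prototype displacement. The function `fcm` and its parameters are illustrative, not the authors' code.

```python
import math
import random


def fcm(points, c, m=2.0, eps=1e-6, max_iter=300, seed=0):
    """Fuzzy C-Means sketch: returns (prototypes, memberships).

    points: list of equal-length tuples; c: number of clusters; m > 1: fuzzifier.
    memberships[j][i] is the degree to which point j belongs to cluster i.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    protos = [tuple(p) for p in rng.sample(points, c)]  # assumed initialization
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(max_iter):
        # Step 1: membership update, u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        u = []
        for x in points:
            d = [math.dist(x, v) for v in protos]
            if 0.0 in d:  # the point coincides with one or more prototypes
                hits = d.count(0.0)
                u.append([1.0 / hits if di == 0.0 else 0.0 for di in d])
            else:
                u.append([1.0 / sum((di / dk) ** power for dk in d) for di in d])
        # Step 2: prototype update, v_i = sum_j u_ij^m x_j / sum_j u_ij^m
        new = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            sw = sum(w)
            new.append(tuple(sum(wj * x[k] for wj, x in zip(w, points)) / sw
                             for k in range(dim)))
        # Stopping criterion: the prototypes barely move between iterations
        shift = max(math.dist(a, b) for a, b in zip(protos, new))
        protos = new
        if shift < eps:
            break
    return protos, u
```

Each membership row sums to 1 by construction, which is what lets a point belong partially to several clusters.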

This method is a type of neural network that performs fuzzy modeling: it learns information about a data set by generating fuzzy rules, which is why it is called neuro-fuzzy. The architecture, known as ANFIS (Adaptive Neuro-Fuzzy Inference System), was proposed by Jang in 1993 [

The classical ANFIS consists of five layers with specific tasks, as shown in

First layer (Layer 1). Layer 1 performs the input fuzzification: each input is assigned a membership degree for each of the fuzzy sets defined over that input variable [

where:

While the parameters

Second Layer (Layer 2 from

For all the terms nodes

Third layer (Layer 3 from figure 3). Normalizes the rule firing strengths computed in Layer 2.

The output of the

Fourth layer (Layer 4 from figure 3) executes the consequent part of the fuzzy rules. Each node

The weight

Layer 5 computes the overall output of the fuzzy system by summing the outputs of Layer 4.
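To make the five layers concrete, here is a sketch of a single ANFIS forward pass for a first-order Sugeno model with grid-partitioned rules. The Gaussian membership functions and the helper names are assumptions (Jang's original formulation commonly uses generalized bell functions), and parameter training (the hybrid least-squares/gradient-descent procedure) is not shown.

```python
import math
from itertools import product


def gauss(x, c, s):
    """Gaussian membership function (assumed shape) with centre c and width s."""
    return math.exp(-0.5 * ((x - c) / s) ** 2)


def anfis_forward(x, mfs, consequents):
    """One forward pass of a first-order Sugeno ANFIS with grid-partitioned rules.

    x: input vector; mfs[k]: list of (centre, width) pairs for input k;
    consequents: one tuple (p_1, ..., p_n, r) per rule, rules ordered as
    itertools.product over the membership terms of each input.
    """
    # Layer 1: fuzzification, membership degree of each input in each of its terms
    mu = [[gauss(xk, c, s) for (c, s) in terms] for xk, terms in zip(x, mfs)]
    # Layer 2: firing strength of each rule = product of one term per input
    w = [math.prod(combo) for combo in product(*mu)]
    if len(consequents) != len(w):
        raise ValueError("need one consequent per rule")
    # Layer 3: normalized firing strengths
    total = sum(w)
    w_norm = [wi / total for wi in w]
    # Layer 4: normalized strength times the linear consequent f_i = p . x + r
    f = [sum(p * xk for p, xk in zip(params[:-1], x)) + params[-1]
         for params in consequents]
    layer4 = [wn * fi for wn, fi in zip(w_norm, f)]
    # Layer 5: overall output is the sum of the layer-4 outputs
    return sum(layer4)


mfs = [[(0.0, 1.0), (5.0, 1.0)],   # input 1: two terms
       [(0.0, 2.0), (4.0, 2.0)]]   # input 2: two terms -> 2 x 2 = 4 rules
```

With two inputs and two terms each there are four rules; training would adjust the (centre, width) pairs in Layer 1 and the consequent tuples in Layer 4.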

An optimization algorithm is a numerical method that finds a value

As mentioned in [

Exhaustive optimization techniques are those guaranteed to find the optimum (maximum or minimum), in the worst case by traversing the whole solution space (which can be considerably large) [

Non-exhaustive techniques aim to obtain good-enough solutions without exceeding the established time or memory constraints. They proceed in steps, each producing new solutions that approximate the optimum more closely, thereby increasing solution quality.

Some of the best known are:

· Genetic Algorithms

· Tabu search

· Ant Colony Optimization

· Greedy Randomized Adaptive Search Procedure (GRASP)

· Scatter Search

· Simulated Annealing
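The contrast between exhaustive and non-exhaustive techniques can be illustrated on a tiny travelling salesman problem: the exhaustive search below enumerates every tour and is guaranteed to be optimal but grows factorially with the number of cities, while a nearest-neighbour greedy heuristic runs in quadratic time with no optimality guarantee. This is an illustrative sketch; the problem instance and the function names are hypothetical and not part of the original study.

```python
import math
from itertools import permutations


def tour_length(tour, pts):
    """Length of a closed tour visiting pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def exhaustive_tsp(pts):
    """Exhaustive: tries every tour starting at city 0; optimal, O((n-1)!)."""
    rest = min(permutations(range(1, len(pts))),
               key=lambda r: tour_length((0,) + r, pts))
    return (0,) + rest


def greedy_tsp(pts):
    """Non-exhaustive nearest-neighbour heuristic: O(n^2), no guarantee."""
    unvisited = set(range(1, len(pts)))
    tour = [0]
    while unvisited:
        last = pts[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, pts[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tuple(tour)


pts = [(0, 0), (3, 0), (1, 1), (5, 2), (2, 4), (0, 3)]
```

On instances this small both methods are instant; the exhaustive search becomes impractical after roughly a dozen cities, which is where metaheuristics such as ACO become attractive.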

Ant colony optimization (ACO) is a bio-inspired optimization technique based on the foraging behavior of real ant colonies. The algorithm was introduced by Dorigo in 1992 [

The ACO algorithm solves complex combinatorial optimization problems in various fields such as engineering, commerce and industry. Many of these problems are NP-hard, i.e., no polynomial-time algorithm is known that solves them exactly. ACO provides approximate solutions to such combinatorial optimization problems, as shown in the

The probability of an ant

where:
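In the original Ant System, the transition rule is commonly written as p_ij = τ_ij^α η_ij^β / Σ_{l ∈ allowed} τ_il^α η_il^β, where τ is the pheromone level on edge (i, j), η the heuristic desirability (for example the inverse of the distance), and α, β weighting exponents. The sketch below assumes that form; the function names and the defaults α = 1, β = 2 are illustrative.

```python
import random


def transition_probabilities(current, allowed, tau, eta, alpha=1.0, beta=2.0):
    """p_ij = tau_ij^alpha * eta_ij^beta / sum over allowed l of tau_il^alpha * eta_il^beta."""
    scores = {j: (tau[current][j] ** alpha) * (eta[current][j] ** beta) for j in allowed}
    total = sum(scores.values())
    return {j: s / total for j, s in scores.items()}


def choose_next(current, allowed, tau, eta, rng, alpha=1.0, beta=2.0):
    """Roulette-wheel selection of the next node according to p_ij."""
    probs = transition_probabilities(current, allowed, tau, eta, alpha, beta)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[j] for j in nodes], k=1)[0]


tau = [[1.0] * 3 for _ in range(3)]          # uniform initial pheromone
eta = [[0.0, 1.0, 0.5],                      # eta = 1 / distance
       [1.0, 0.0, 1.0],
       [0.5, 1.0, 0.0]]
p = transition_probabilities(0, [1, 2], tau, eta)
```

With equal pheromone, the nearer node 1 (η = 1) is chosen with probability 0.8 and node 2 (η = 0.5) with probability 0.2; as pheromone accumulates on good edges, τ shifts these odds.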

Trail Update

The pheromone must be updated because it changes the paths followed over the iterations, increasing the likelihood of good choices based on the accumulated experience [

In this case, the pheromone level increases along the best solution path according to equation (14)

where:

The accumulated trail is proportional to the quality of the solutions.
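One common form of this update combines evaporation on every edge with a deposit proportional to solution quality (q / L_best, where L_best is the length of the best path) on the edges of the best path, so that the accumulated trail tracks quality. The sketch assumes a symmetric problem stored as a pheromone matrix `tau`, an evaporation rate `rho` and a deposit constant `q`; these names and defaults are hypothetical, and the exact form of equation (14) may differ.

```python
def update_pheromone(tau, best_tour, best_length, rho=0.1, q=1.0):
    """Evaporate every trail, then deposit q / best_length on the best tour's edges."""
    n = len(tau)
    # Evaporation: every trail decays by a factor (1 - rho).
    for i in range(n):
        for j in range(n):
            tau[i][j] *= 1.0 - rho
    # Deposit: shorter (better) tours leave more pheromone.
    deposit = q / best_length
    for k in range(len(best_tour)):
        a, b = best_tour[k], best_tour[(k + 1) % len(best_tour)]
        tau[a][b] += deposit
        tau[b][a] += deposit  # symmetric problem assumed


tau = [[1.0] * 3 for _ in range(3)]
update_pheromone(tau, (0, 1, 2), best_length=4.0)
```

After one update, edges on the best tour hold 0.9 + 0.25 = 1.15 while unused entries decay to 0.9; evaporation is what keeps early, poor choices from dominating.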

Every time an ant visits a node, it applies the following rule:

The stop condition

The process is repeated iteratively until the established stopping criteria are met.
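Putting the pieces together, the following sketch runs a complete ACO loop on a small travelling salesman instance: probabilistic tour construction, a local pheromone update applied each time an ant moves to a node (in the style of Ant Colony System), a global update on the best-so-far tour, and a stop condition based on an iteration limit or on a number of iterations without improvement. All parameter names and values are illustrative assumptions; the original work applies ACO to the combination of prediction models, and how it encodes that problem is not reproduced here.

```python
import math
import random


def aco_tsp(pts, n_ants=10, alpha=1.0, beta=2.0, rho=0.1, xi=0.1,
            tau0=1.0, max_iter=100, patience=20, seed=0):
    """Sketch of an ACO loop (Ant Colony System style) for a symmetric TSP.

    Assumes distinct points (no zero distance between different cities).
    Returns (best_tour, best_length).
    """
    rng = random.Random(seed)
    n = len(pts)
    dist = [[math.dist(a, b) for b in pts] for a in pts]
    tau = [[tau0] * n for _ in range(n)]
    best_tour, best_len, stale = None, math.inf, 0

    for _ in range(max_iter):
        improved = False
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            allowed = set(range(n)) - {tour[0]}
            while allowed:
                i = tour[-1]
                cand = sorted(allowed)
                # Transition rule: tau^alpha * eta^beta with eta = 1 / distance.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in cand]
                j = rng.choices(cand, weights=weights, k=1)[0]
                # Local update, applied every time an ant moves to a node.
                tau[i][j] = tau[j][i] = (1.0 - xi) * tau[i][j] + xi * tau0
                tour.append(j)
                allowed.remove(j)
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            if length < best_len - 1e-12:
                best_tour, best_len, improved = tour, length, True

        # Global update: evaporation everywhere, deposit on the best-so-far tour.
        for row in tau:
            for j in range(n):
                row[j] *= 1.0 - rho
        for k in range(n):
            a, b = best_tour[k], best_tour[(k + 1) % n]
            tau[a][b] += rho / best_len
            tau[b][a] = tau[a][b]

        # Stop condition: iteration limit or `patience` iterations without improvement.
        stale = 0 if improved else stale + 1
        if stale >= patience:
            break

    return best_tour, best_len


square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
tour, length = aco_tsp(square)
```

The local update slightly lowers the pheromone on edges that ants have just used, which encourages later ants in the same iteration to explore other paths.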

In

The sites used in this contribution are listed in figure 5, and the proposed methodology is shown in figure 6.

The monitoring stations used in this case study were chosen because of data availability and because they are located in specific industrial or polluted areas.

The data obtained from these stations are filtered, i.e., only complete and validated readings are used.

The data are arranged as time series over the day; clustering is then performed to characterize groups according to pollution levels. These are grouped through the fuzzy

The fuzzy system is improved with training data; in this case, the training data are a validated dataset from a previous year. Training is carried out with the neuro-fuzzy system ANFIS, which yields a more robust data modeling system.


Three models for predicting air quality are generated; these models are then passed to the ant colony optimization algorithm, which brings the prediction closer to the measured outcome. These three models, along with the real measured data and the corresponding prediction, are shown in

The mean square error is computed for each model generated with neuro-fuzzy and with neuro-fuzzy plus ACO, to measure the difference between the estimated and the measured values.
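A minimal sketch of this error measure, assuming the usual definition MSE = (1/n) Σ_t (y_t − ŷ_t)² over paired measured values y_t and predicted values ŷ_t; the function name is illustrative.

```python
def mean_squared_error(measured, predicted):
    """MSE = (1/n) * sum((y_t - yhat_t)^2) over paired observations."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((y - yhat) ** 2 for y, yhat in zip(measured, predicted)) / len(measured)
```

Because errors are squared, a few large misses weigh more than many small ones, so a lower MSE favors models that avoid large deviations from the measured series.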

The models are created using fuzzy logic, with the rules formed by clustering; ANFIS then refines each model to make it more accurate, and the ACO algorithm uses the three models to improve the forecast, choosing among the options they offer. The models are named Mod 1, Mod 2 and Mod 3. They are generated from the characteristics of the stations, i.e., stations whose measurements show greater similarity generate a common model.

The results are shown in Table 1 and Table 2. CO is sensed in the northwest area; three models were generated from the 2009 data of the stations in that area, and validation used data from the same stations in 2010. Likewise, Table 2 shows the models for ozone (O_{3}).

Table 1. Mean square error of the CO models at each station.

Model | Station 1 | Station 2 | Station 3 | Station 4
---|---|---|---|---
Mod 1 | 0.845 | 0.431 | 0.678 | 0.652
Mod 2 | 1.732 | 0.591 | 0.572 | 0.890
Mod 3 | 0.97 | 2.540 | 1.041 | 0.860
Mod ACO | 0.882 | 0.373 | 0.559 | 0.594

Table 2. Mean square error of the O_{3} models at each station.

Model | Station 1 | Station 2 | Station 3 | Station 4
---|---|---|---|---
Mod 1 | 0.0067 | 0.0143 | 0.0116 | 0.0103
Mod 2 | 0.0193 | 0.0124 | 0.0283 | 0.0179
Mod 3 | 0.0078 | 0.0213 | 0.0407 | 0.0718
Mod ACO | 0.0074 | 0.0100 | 0.0103 | 0.0117

The mean square error in these tables indicates how close the predictions at Station 1, Station 2, Station 3 and Station 4 are to the real values for the Mod 1, Mod 2, Mod 3 and ACO models. For CO, the ACO model gives the lowest error at Stations 2, 3 and 4; the station with no good approximation was Station 1, where Mod 1 is more accurate. For O_{3}, the ACO model gives the lowest error at Stations 2 and 3, while Mod 1 is more accurate at Stations 1 and 4.
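The per-station comparison can be checked directly from the values in Table 1 and Table 2. The short script below is a sketch (the dictionary layout is an assumption) that selects, for each station, the model with the lowest mean square error.

```python
CO_MSE = {  # Table 1: CO, columns are Station 1..4
    "Mod 1": [0.845, 0.431, 0.678, 0.652],
    "Mod 2": [1.732, 0.591, 0.572, 0.890],
    "Mod 3": [0.97, 2.540, 1.041, 0.860],
    "Mod ACO": [0.882, 0.373, 0.559, 0.594],
}

O3_MSE = {  # Table 2: O3, columns are Station 1..4
    "Mod 1": [0.0067, 0.0143, 0.0116, 0.0103],
    "Mod 2": [0.0193, 0.0124, 0.0283, 0.0179],
    "Mod 3": [0.0078, 0.0213, 0.0407, 0.0718],
    "Mod ACO": [0.0074, 0.0100, 0.0103, 0.0117],
}


def best_per_station(table):
    """Return, for each station column, the model with the lowest MSE."""
    n_stations = len(next(iter(table.values())))
    return [min(table, key=lambda model: table[model][s]) for s in range(n_stations)]
```

Running `best_per_station` on both dictionaries reproduces the station-by-station reading of the tables given in the text.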

Figure 7 compares the models created by the fuzzy logic system trained with ANFIS against the ACO model, to show the difference in accuracy between the models generated with neuro-fuzzy and with neuro-fuzzy improved by ACO.

Algorithms based on swarm intelligence are a feasible alternative to typical machine learning methods for this kind of problem. This work has shown that combining modeling and optimization methods can improve prediction for this type of non-linear problem: working cooperatively, the combined methods complement each other well and integrate into a single system.

For these reasons, ACO can improve the prediction of pollution when used properly. ACO treats prediction as a combinatorial problem; its shortest-path search, guided by the pheromone, provides a closer approximation to the real data.

This work was made possible through funding provided by the National Council of Science and Technology CONACYT.