Monitoring airborne pollutants is a priority task because poor air quality is a public health concern: pollutants, and PM10 particles in particular, cause problems mainly in the respiratory and cardiovascular systems. In this contribution, a base model is generated with an Adaptive Neuro-Fuzzy Inference System (ANFIS) and later optimized using a swarm intelligence technique, the Bacterial Foraging Optimization Algorithm (BFOA). Several experiments were carried out with the BFOA parameters, tuning them to find the configuration that produces the best optimized model and thereby demonstrating how the optimization process is influenced by the choice of parameters.

The present work proposes a method to model particulate matter concentrations using BFOA. The method is considered novel because no application of the BFO algorithm to the problem of modeling particulate matter concentration was found in the literature. A further contribution is to demonstrate how adjusting the parameters of the algorithm affects the result, and how each parameter individually influences it.

The methodology uses BFOA as the optimizer of a base model generated with another technique, ANFIS. The ANFIS model presents some inaccuracies because it is unstable on highly non-linear problems such as the one modeled in this work, which is why this method was devised to improve the accuracy of the base model. Once the model optimized with BFOA has been generated, it is compared against the one generated with ANFIS.

Using a swarm intelligence algorithm with several agents, such as BFOA, offers the opportunity to find an optimal solution because several relatively simple agents explore the search space. This gives a greater probability of finding the global optimum and of avoiding getting stuck in a local solution, as happens with other methods such as neural networks. Such systems are also robust and flexible, with no central control issuing orders to the agents [

To better understand what the problem is, it is important to define some concepts, which are presented below.

Air quality has been an essential issue for humanity since the industrial age, and nowadays it is more relevant than ever. Although pollutant concentrations have decreased in countries such as Japan, the United States and Brazil, less developed countries still have the poorest air quality [

The pollutants that are monitored include gases such as ozone (O_{3}), nitrogen dioxide (NO_{2}), sulfur dioxide (SO_{2}) and particulate matter (PM2.5 and PM10) [

The risks of air pollution include not only pulmonary diseases such as asthma and even lung cancer, but also cardiovascular disease, which is related in particular to particulate matter pollution because of the size of the particles, on the order of micrometers [

Particulate matter is classified as PM2.5, with an aerodynamic diameter of 2.5 μm (micrometers) or less, and PM10, with a diameter of 10 μm or less; this size allows the particles to be inhaled by humans and can even cause deaths among the vulnerable population [

In addition, methods have been developed to model the behavior of PM10 particles specifically; these include artificial neural networks that predict the concentration of the pollutant 24 hours in advance [_{3} forecasting [

Social organisms such as ants, bees and bacterial colonies perform common tasks as a society, such as gathering food and building nests, for the well-being of the community; they also have the ability to self-organize, forming decentralized swarms. The term Swarm Intelligence (SI) first appeared in the late 1980s [

In this contribution, a Bacterial Foraging Optimization Algorithm (BFOA) is used to model the behavior of PM10 in Mexico City.

Bacterial Foraging Optimization Algorithm

The foraging process of the Escherichia coli bacterium inspires the Bacterial Foraging Optimization Algorithm (BFOA) [

BFOA has already been accepted as an optimization algorithm, and its efficiency has been demonstrated in several areas, for instance in electrical engineering and control [

The following sections of this paper present the materials and methods for the development of the environmental particle concentration PM10 behavior model, explaining where the data used comes from and how they are used in the generation of the optimized model, as well as a detailed explanation of the methodology developed for this application. Finally, the results are presented explaining how the adjustment of the parameters of the BFOA algorithm was made, as well as its final configuration to obtain an optimized model.

The data used to build the model were obtained from the Atmospheric Monitoring System (“Sistema de Monitoreo Atmosférico”, SIMAT) [

Likewise, atmospheric data were taken from the Meteorology and Solar Radiation Network (REDMET), which is a subsystem of SIMAT. From the REDMET, the data of temperature (TMP), relative humidity (RH), wind direction (WDR) and wind speed (WSP) are used. These data are part of the factors for modeling the pollutant concentration [

SIMAT has monitoring stations distributed over different areas of the city. These monitoring stations collect information on concentrations of pollutants and atmospheric conditions every hour.

The model was validated using 2015 data from the same stations, and data from 2017 were used to evaluate the model performance.

Data | Unit of measurement | Subsystem |
---|---|---|
PM10 | µg/m^{3} | RAMA |
TMP | Degrees Celsius (°C) | REDMET |
RH | Percentage (%) | REDMET |
WDR | Azimuth degrees | REDMET |
WSP | Meters per second (m/s) | REDMET |

ID | Name | Location |
---|---|---|
CHO | Chalco | State of Mexico |
CUA | Cuajimalpa | Mexico City |
CUT | Cuautitlán | State of Mexico |
FAC | FES Acatlán | State of Mexico |
HGM | Hospital General de México | Mexico City |
MGH | Miguel Hidalgo | Mexico City |
SAG | San Agustín | State of Mexico |

The approach of this work is the optimization of an existing model (base model), applying the bacterial foraging optimization algorithm as an optimization method.

The main idea is to start from an existing model whose accuracy can be improved, namely the base model. The technique proposed to generate this base model is an adaptive neuro-fuzzy inference system (ANFIS). Fuzzy logic has been used in the past as an optimization method [

ANFIS is a type of artificial neural network that incorporates a Takagi-Sugeno fuzzy inference system. Systems of this kind have been used in the past for real-time object identification [

ANFIS constructs a fuzzy inference system (FIS) from a set of input/output data; the membership function parameters are adjusted using a backpropagation algorithm, alone or in combination with a least squares method.

A FIS can be defined as a set of fuzzy rules of the type IF-THEN, which are expressions with the form IF A THEN B, where A and B are the labels of fuzzy sets [

As an example, consider an ANFIS with two inputs (x, y) and one output, f, built from five layers. Each layer has several nodes that can be adaptive (square nodes) or fixed (circle nodes) [

Layer 1. This is the fuzzy layer; it converts the inputs of the model into fuzzy sets by means of membership functions (MF). The node functions are:

$$O_{1,i} = \mu_{A_i}(X_1), \quad i = 1, 2 \tag{1}$$

$$O_{1,i} = \mu_{B_{i-2}}(Y_1), \quad i = 3, 4 \tag{2}$$

where $X_1$ and $Y_1$ are the input nodes, A and B are the linguistic labels associated with these nodes, and $\mu(X_1)$ and $\mu(Y_1)$ are the membership functions (MF). The parameters of this layer are called premise parameters.

Layer 2. The nodes in this layer are fixed and labeled Π; each node multiplies its input signals, and the product is the output signal.

$$O_{2,i} = w_i = \mu_{A_i}(X_1) \cdot \mu_{B_i}(Y_1), \quad i = 1, 2 \tag{3}$$

where $O_{2,i}$ is the output of the layer and $w_i$ represents the firing strength of the rule.

Layer 3. The nodes in this layer are also fixed and are labeled N; their function is to normalize the firing strength by calculating the ratio of the i-th firing strength to the sum of the firing strengths of all the rules.

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \tag{4}$$

where $O_{3,i}$ is the output of Layer 3 and $\bar{w}_i$ is the normalized firing strength.

Layer 4. The nodes of this layer are adaptive and are defined by

$$O_{4,i} = \bar{w}_i \cdot f_i, \quad i = 1, 2 \tag{5}$$

where $f_1$ and $f_2$ represent the fuzzy IF-THEN rules, which are defined as follows:

Rule 1. IF $X_1$ is $A_1$ and $Y_1$ is $B_1$, THEN $f_1 = p_1 X_1 + q_1 Y_1 + r_1$

Rule 2. IF $X_1$ is $A_2$ and $Y_1$ is $B_2$, THEN $f_2 = p_2 X_1 + q_2 Y_1 + r_2$

where $p_i$, $q_i$ and $r_i$ are the parameters of this layer, referred to as consequent parameters.

Layer 5. The nodes are fixed and labeled ∑; their function is to calculate the total output, defined by:

$$O_{5,i} = \sum_i \bar{w}_i \cdot f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} = f_{out} = \text{total output} \tag{6}$$

ANFIS uses a simple learning rule, backpropagation, which recursively computes the error signals from the output layer (Layer 5) back to the input layer (Layer 1).
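The five layers can be illustrated with a minimal forward-pass sketch for the two-rule example. It is only an illustration under stated assumptions, not the implementation used in this work: the function names `bell_mf` and `anfis_forward` and all parameter values are hypothetical, and the membership functions are assumed to be the generalized bell functions of Equation (9).

```python
def bell_mf(x, a, b, c):
    """Generalized bell membership function: 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2.0 * b))


def anfis_forward(x, y, premise_a, premise_b, consequent):
    """Forward pass of a two-rule, first-order Takagi-Sugeno ANFIS.

    premise_a, premise_b: one (a, b, c) tuple per label A_i, B_i (hypothetical values).
    consequent: one (p, q, r) tuple per rule, so that f_i = p*x + q*y + r.
    """
    # Layer 1: fuzzify each input with its membership functions.
    mu_a = [bell_mf(x, *params) for params in premise_a]
    mu_b = [bell_mf(y, *params) for params in premise_b]
    # Layer 2: firing strength of each rule (product of the memberships).
    w = [ma * mb for ma, mb in zip(mu_a, mu_b)]
    # Layer 3: normalize the firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: output of each rule, weighted by its normalized strength.
    f = [p * x + q * y + r for (p, q, r) in consequent]
    # Layer 5: total output as the sum of the weighted rule outputs.
    return sum(wb * fi for wb, fi in zip(w_bar, f))
```

In a trained ANFIS the premise and consequent parameters would come from the learning stage; here they are simply passed in by the caller.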

The BFO algorithm mimics the foraging process of a real bacterium, whose locomotion is achieved through the movement of its flagella, which let it swim or tumble; these two operations are the basis of foraging. When the flagella rotate clockwise the bacterium tumbles, and in a noxious environment it tumbles more often in search of nutrients. When the flagella rotate counterclockwise the bacterium swims, and in a favorable environment it swims over greater distances [

When a bacterium finds enough nutrients and the environment has the adequate temperature, the bacterium will reproduce dividing in two and creating a replica of itself, forming a colony of bacteria. Likewise, if an attack occurs or the environment suddenly changes, a group of bacteria is dispersed to other areas of the environment or is eliminated; this event is called elimination-dispersion.

Suppose that we want to find the minimum of $J(\theta)$, where $\theta \in \mathbb{R}^p$ (θ is a p-dimensional vector), and that we have neither an analytical description nor measurements of the gradient $\nabla J(\theta)$.

BFOA imitates the main mechanisms present in a real E. coli colony: chemotaxis, swarming (formation of the colony), reproduction, and elimination-dispersion events, with which a gradient-free optimization problem can be solved. A virtual bacterium is a trial solution that moves over the function surface to locate the global optimum [

To implement the BFO algorithm, it is essential to define some terms. A chemotactic step is a tumble followed by a swim, or a tumble followed by a tumble. In addition, j is the index of the chemotactic steps, k is the index of the reproduction steps, and l is the index of the elimination/dispersion events.

The algorithm has certain parameters that must be initialized and on which its performance depends:

p: Dimension of the search space

S: Number of bacteria in the population

Nc: Number of chemotactic steps

Ns: Length of swim

Nre: Number of reproduction steps

Ned: Number of elimination-dispersion events

Ped: Probability that a bacterium will be eliminated or dispersed

C(i): Size of the step taken in the random direction specified by the tumble.

Let $P(j,k,l) = \{\theta^i(j,k,l) \mid i = 1, 2, \dots, S\}$ be the positions of the members of the population of S bacteria at the j-th chemotactic step, the k-th reproduction step and the l-th elimination-dispersion event; a cost $J(i,j,k,l)$ can then be associated with each position $\theta^i(j,k,l)$.

Next, each of the stages of BFOA is described:

Chemotaxis: Suppose that $\theta^i(j,k,l)$, where $i = 1, 2, \dots, S$, is the position of each member of the population of S bacteria at the j-th chemotactic step, the k-th reproduction step and the l-th elimination-dispersion event, and that C(i) is the size of the step taken in the random direction specified by the tumble. The movement of artificial chemotaxis is then represented by:

$$\theta^i(j+1,k,l) = \theta^i(j,k,l) + C(i)\,\frac{\Delta(i)}{\sqrt{\Delta^T(i)\,\Delta(i)}} \tag{7}$$

where $\Delta(i)$ is a random direction vector whose elements lie in [−1, 1].
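As a rough illustration of the chemotactic movement, the sketch below draws a random direction with elements in [−1, 1], normalizes it to unit length, and moves a position by a step of size C(i) along it. The function names `tumble_direction` and `chemotactic_step` are hypothetical and are not taken from the implementation used in this work.

```python
import math
import random


def tumble_direction(p, rng=random):
    """Random unit vector: Delta has elements in [-1, 1] and is divided by its norm."""
    while True:
        delta = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(d * d for d in delta))
        if norm > 0.0:  # redraw in the (very unlikely) case of a zero vector
            return [d / norm for d in delta]


def chemotactic_step(theta, step_size, direction):
    """Move the position theta by step_size along the given unit direction."""
    return [t + step_size * d for t, d in zip(theta, direction)]
```

Because the direction is normalized, every move has length exactly equal to the step size, so C(i) alone controls how far a bacterium travels in one step.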

Swarm. Real cells respond to chemical stimuli by forming groups of cells that travel together in the environment. Cell-to-cell signals are represented as follows:

$$J_{cc}(\theta, P(j,k,l)) = \sum_{i=1}^{S}\left[-d_{\text{attractant}} \exp\left(-w_{\text{attractant}} \sum_{m=1}^{p} (\theta_m - \theta_m^i)^2\right)\right] + \sum_{i=1}^{S}\left[h_{\text{repellant}} \exp\left(-w_{\text{repellant}} \sum_{m=1}^{p} (\theta_m - \theta_m^i)^2\right)\right] \tag{8}$$

where $J_{cc}$ is the value added to the objective function to be minimized, $w_{\text{attractant}}$ quantifies the diffusion rate of the attractant, and $d_{\text{attractant}}$ quantifies the amount of attractant released. Likewise, a cell repels any nearby cell, in the sense that two cells cannot physically occupy the same location. This is modeled with the height of the repellent, $h_{\text{repellant}}$, which is the magnitude of its effect and is set to $h_{\text{repellant}} = d_{\text{attractant}}$, while $w_{\text{repellant}}$ measures the diffusion rate of the repellent. These coefficients must be chosen appropriately for the search space.

The function in (8) is centered on the location of each cell: moving radially away from the cell, the function decreases and then increases. This models how distant cells tend not to be attracted, while nearby cells tend to climb the nutrient gradient toward each other and, therefore, try to form a swarm. It is important to note that as a cell moves, so does the function representing its release of chemicals. Because all the cells move, the function varies with time, and if many cells gather there will be a large amount of attractant and, therefore, a greater probability that other cells will move toward the group forming the swarm.
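The cell-to-cell term can be sketched as a small function. This is an illustrative sketch only: the name `cell_to_cell` is hypothetical, the repellent term is assumed to enter with a positive sign (so that it penalizes cells that are very close together), and the default coefficients are the values reported later in this work for the initialization step.

```python
import math


def cell_to_cell(theta, population, d_attractant=0.05, w_attractant=0.15,
                 h_repellant=0.05, w_repellant=10.0):
    """Swarming term J_cc for position theta, given the positions of all bacteria."""
    total = 0.0
    for other in population:
        # Squared Euclidean distance between theta and the other bacterium.
        sq_dist = sum((tm - om) ** 2 for tm, om in zip(theta, other))
        # Attractant: lowers the cost near other cells.
        total += -d_attractant * math.exp(-w_attractant * sq_dist)
        # Repellent (assumed positive sign): raises the cost very close to other cells.
        total += h_repellant * math.exp(-w_repellant * sq_dist)
    return total
```

With the repellent height equal to the attractant depth, the two terms cancel at zero distance, the sum is negative at moderate distances, and it vanishes far from the cell, which matches the "decreases and then increases" shape described above.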

Reproduction. The general criterion for this stage is that the less healthy bacteria must die while the healthiest bacteria, which are the ones that have a lower value in the objective function, will be reproduced by dividing them in two, keeping the size of the population constant.

Elimination/dispersion. To simulate dispersion and elimination events, a group of bacteria is randomly eliminated with a small probability, and replacements are initialized randomly over the search space.
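The reproduction and elimination/dispersion stages can be sketched as follows. This is a minimal illustration under stated assumptions: the names `reproduce` and `eliminate_disperse` are hypothetical, the population size is assumed to be even, `health` is assumed to hold the cost accumulated by each bacterium (lower is healthier), and `bounds` is an assumed list of (low, high) limits for each dimension of the search space.

```python
import random


def reproduce(population, health):
    """Keep the healthiest half (lowest accumulated cost) and split each survivor in two."""
    order = sorted(range(len(population)), key=lambda i: health[i])
    survivors = [list(population[i]) for i in order[: len(population) // 2]]
    # Each survivor is duplicated, so the population size stays constant.
    return survivors + [list(b) for b in survivors]


def eliminate_disperse(population, bounds, p_ed, rng=random):
    """With probability p_ed, replace a bacterium by a random point inside bounds."""
    result = []
    for bacterium in population:
        if rng.random() < p_ed:
            result.append([rng.uniform(lo, hi) for lo, hi in bounds])
        else:
            result.append(list(bacterium))
    return result
```

Both functions return new lists rather than modifying their inputs, which keeps each stage easy to test in isolation.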

Below are presented the steps of the methodology to obtain the optimized model (

1) Acquisition of data: The raw data to create the model are obtained from a database.

2) Data filtering: The database contains some data that is not valid or that could be partial and must be filtered to avoid having a biased model.

3) Data entry: Once the data has been reviewed and is valid, it can be fed into the model.

4) Fuzzification: This is the process of converting the input data into linguistic values; it depends on the membership functions. In this case, generalized bell-shaped membership functions were used. The bell function depends on three parameters a, b and c, and is given by:

$$f(x; a, b, c) = \frac{1}{1 + \left|\frac{x - c}{a}\right|^{2b}} \tag{9}$$

Evaluation of the rules: the rules of the model are evaluated with respect to the fuzzy rules and the values of the membership functions.

5) Defuzzification: The defuzzification method used was the weighted average of all rule outputs.

6) Model Fuzzy inference system: Once the evaluation and defuzzification steps have been completed, the model is constructed with the specific equations of ANFIS expressed in the construction section of the model.

7) Definition of the search space: The search space is the set of all feasible solutions to the problem. It must therefore lie within the valid data for the environmental factors (relative humidity, temperature, wind direction and wind speed) and within the feasible values of PM10; given the nature of the problem, defining the search space in any other way is very difficult because the range of possibilities is very broad.

8) Generation of the population: the population with S bacteria must be generated, in initial random positions within a range of possible values that the actual data may have for the time when the bacteria are being generated. The objective is to perform experiments using different values of S to determine how the size of the population affects the optimized model.

We used the ANFIS models generated with data from the seven stations that had enough data to build and optimize the model. Data from the same period of time were taken to generate the search space in which the bacteria minimize the difference, as shown in

9) Initialization of parameters: The parameters of BFOA must be initialized. This includes the loop counters for chemotaxis j, reproduction k and elimination/dispersion l, and the bacterium index i, as well as the attraction and repulsion parameters ($w_{\text{attractant}}$, $d_{\text{attractant}}$, $h_{\text{repellant}}$, $w_{\text{repellant}}$), which generate the swarm effect. In this case the values $d_{\text{attractant}} = 0.05$, $w_{\text{attractant}} = 0.15$, $h_{\text{repellant}} = d_{\text{attractant}}$ and $w_{\text{repellant}} = 10$ were used.

These values were initially selected according to studies carried out by some authors [

width of the attraction signal to avoid having two bacteria at the same point. This is modeled by making $h_{\text{repellant}} = d_{\text{attractant}}$. It is also recommended that the attraction signal be very small compared with the nutrient concentration values in the search space, and that the repellent be large enough to prevent bacteria from being very close to each other. However, experimenting with different variants of these values to locate their optimal setting is left as future work.

10) Calculation of the objective function: We must calculate the criterion that tells us how healthy a bacterium is. In the algorithm, the swarm factor $J_{cc}$ is calculated with Equation (8) and added to the cost J of the bacterium at its current position.

Update of the bacteria position: The position of each bacterium is updated by comparing its cost at the current position with the cost at the next position. To do this, the tumble of the bacterium is calculated and a swim is made in that direction, and the cost of the new position is calculated. If the new cost is lower, the new position becomes the best position of the bacterium, which keeps moving in that direction. Otherwise, the swim loop ends; if no better position was found, the bacterium is not located in an adequate environment, and the algorithm continues with the next bacterium.

11) Reproduction and elimination-dispersion: The best half of the population reproduces, that is, an exact copy is made of the bacteria with the lowest total cost, and the other half is replaced by randomly generated bacteria; in addition, a group of bacteria is scattered randomly over the search space with low probability.

12) Optimization criteria: Once the iterations of the BFO algorithm have been completed, the optimization criteria are met and the bacteria have converged to a value for the current data, in this case the PM10 concentration for that hour. Convergence is governed by the parameters Nre and Ned, which are the events that influence the convergence of the algorithm [

13) Optimized model: if all the loops of the BFO algorithm and the hours of the month have been completed, also if the optimization criteria are met, it is said that an Optimized Model has been created.
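The optimization stages of the methodology above can be summarized with a compact, generic BFOA loop. This is a simplified sketch under several assumptions, not the implementation used in this work: the function name `bfoa_minimize` and all default values are hypothetical, the swarming term of Equation (8) is omitted, the same step size is used for every bacterium, positions are not clipped to the bounds after a move, and only moves that lower the cost are accepted, as described in the position update step.

```python
import math
import random


def bfoa_minimize(cost, bounds, S=10, Nc=10, Ns=4, Nre=4, Ned=2,
                  Ped=0.25, step=0.1, seed=0):
    """Minimize cost(theta) over the box given by bounds with a basic BFOA loop."""
    if S % 2:
        raise ValueError("S must be even so that the population can be halved")
    rng = random.Random(seed)
    p = len(bounds)
    # Generation of the population at random positions inside the search space.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(S)]
    best_theta, best_cost = None, math.inf

    for _ in range(Ned):              # elimination-dispersion events (index l)
        for _ in range(Nre):          # reproduction steps (index k)
            health = [0.0] * S
            for _ in range(Nc):       # chemotactic steps (index j)
                for i in range(S):
                    j_last = cost(pop[i])
                    # Tumble: random unit direction.
                    delta = [rng.uniform(-1.0, 1.0) for _ in range(p)]
                    norm = math.sqrt(sum(d * d for d in delta)) or 1.0
                    direction = [d / norm for d in delta]
                    # Swim: keep moving while the cost improves, at most Ns times.
                    for _ in range(Ns):
                        trial = [t + step * d for t, d in zip(pop[i], direction)]
                        j_trial = cost(trial)
                        if j_trial < j_last:
                            pop[i], j_last = trial, j_trial
                        else:
                            break
                    health[i] += j_last
                    if j_last < best_cost:
                        best_theta, best_cost = list(pop[i]), j_last
            # Reproduction: the healthiest half (lowest accumulated cost) splits in two.
            order = sorted(range(S), key=lambda i: health[i])
            survivors = [pop[i] for i in order[: S // 2]]
            pop = [list(b) for b in survivors] + [list(b) for b in survivors]
        # Elimination-dispersion: each bacterium is re-seeded with probability Ped.
        for i in range(S):
            if rng.random() < Ped:
                pop[i] = [rng.uniform(lo, hi) for lo, hi in bounds]
    return best_theta, best_cost
```

For the PM10 problem, `cost` would measure the difference between the model output and the measured concentration for the current hour; any callable that maps a position to a number can be used to try the sketch, for example a simple sum of squares.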

To illustrate the complexity of the problem, the actual PM10 concentration data and its non-linear behavior can be observed in

To better understand the complexity of the data, it is necessary to analyze the variability of the data coming from the different monitoring stations. This variability originates in the nature of the behavior of atmospheric particles, in

Once the complexity of the problem is established, we can state that the purpose of using the BFOA optimization method is the reduction of the error that exists when applying the model created with ANFIS, which is why the experiments conducted are aimed at testing the efficiency of the BFO algorithm and the different configurations of its parameters.

It is important to calculate the error that is obtained when using ANFIS to create the model, which can be observed in

Station | Minimum | Maximum | Range | Standard deviation | Mean |
---|---|---|---|---|---|
HGM | 5 | 196 | 191 | 30.74678334 | 46.325452 |
FAC | 1 | 264 | 263 | 36.87219649 | 45.9357045 |
SAG | 3 | 238 | 235 | 34.85784066 | 58.5607094 |
CHO | 1 | 220 | 219 | 44.80596269 | 56.6182573 |
CUA | 1 | 168 | 167 | 27.56101595 | 33.4013699 |
CUT | 1 | 290 | 289 | 51.84183066 | 73.6511628 |
MGH | 3 | 177 | 174 | 29.99927561 | 41.217862 |

make the comparison with the optimized model. The quantification of the error between the real data and those calculated by the ANFIS model was carried out using the root mean square error (RMSE) which is a method historically used to measure the accuracy of data forecasts [

The problem of using ANFIS to generate a pollutant concentration model is that the values obtained with ANFIS present large differences compared to the real values.

In the case of the model generated with ANFIS, an RMSE of 24.147 is obtained, which is expected to be reduced with BFOA.
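For reference, the RMSE between measured and modeled concentrations can be computed as in the following sketch. The function name `rmse` and the argument names are illustrative and do not come from the implementation used in this work.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences of values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, a few large deviations between the model and the measurements raise the RMSE more than many small ones.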

As for the optimization of the model, the objective is to generate a more accurate model, which can be achieved by varying the parameters of the algorithm, such as the number of bacteria that move in the search space, the number of chemotactic steps and the number of reproduction steps. However, because BFOA is built from nested loops, some parameter settings lead to long execution times; for this reason, the processing time of each parameter configuration was also taken into account in the experiments.
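The effect of the nested loops on execution time can be made concrete by counting cost evaluations. The sketch below assumes a simple implementation in which each bacterium evaluates the cost once before tumbling and at most once per swim step; the function name `max_cost_evaluations` is hypothetical and the count is an upper bound, not a measured figure.

```python
def max_cost_evaluations(S, Nc, Ns, Nre, Ned):
    """Upper bound on cost evaluations for one BFOA run with nested loops."""
    # One evaluation before the tumble plus at most Ns swim evaluations,
    # repeated for every bacterium, chemotactic step, reproduction step
    # and elimination-dispersion event.
    return Ned * Nre * Nc * S * (1 + Ns)
```

Under this assumption the bound grows in direct proportion to each parameter, so doubling the population size or the number of reproduction steps roughly doubles the work.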

Each experiment focused on a single parameter and its effect on the model; the other parameters were kept fixed at low values to avoid unnecessarily increasing the execution time and interfering with the parameter under test.

As context for the runtime tests, a PC running 64-bit Windows 7 with an Intel Core i3-2100 3.10 GHz processor and 12 GB of RAM was used.

The number of bacteria in the population, S, is perhaps the first parameter to choose, since each bacterium represents a possible solution to the optimization problem, although increasing S also increases the computational cost. However, when S is large and the initial population is distributed randomly over the search space, there is a greater probability that some bacteria start near an optimal point, and also a greater probability that a higher density of bacteria accumulates in that optimal region during the execution of the algorithm.
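This intuition can be made explicit with a simple calculation. As an illustrative assumption, suppose that the region close to the optimum, R, occupies a fraction q of the search space and that the S initial positions are drawn independently and uniformly; neither q nor R is a quantity measured in this work. The probability that at least one bacterium starts inside R is then

$$P(\text{at least one bacterium in } R) = 1 - (1 - q)^{S}$$

For example, with q = 0.01 this probability is approximately 0.096 for S = 10 and approximately 0.993 for S = 500, which shows why larger populations are more likely to start with a bacterium near the optimum.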

The proposed values to experiment with the number of bacteria were S = {10, 50, 100, 200, 500}, in

Bacteria | RMSE | Time (s) |
---|---|---|
10 | 19.8851195 | 419.84 |
50 | 10.1072 | 2268.786 |
100 | 7.07398003 | 4185.117 |
200 | 5.32684396 | 8062.675 |
500 | 3.75769854 | 20,625.8073 |

Increasing the number of chemotactic steps, that is, giving Nc a larger value, results in greater optimization progress, since the bacteria have more opportunities to reach an optimal point. Therefore, in

500 bacteria. This seems to indicate that, for small populations, the number of steps the bacteria take has little effect on the results, because with few bacteria it takes longer to find an optimal solution regardless of the number of steps.

The reproduction steps, Nre, determine how the algorithm ignores regions with few nutrients and focuses on regions with high nutritional content for the bacteria. In other words, they reflect whether the bacteria are finding better solutions, given that bacteria in bad regions die and bacteria in good regions tend to reproduce faster. Furthermore, if Nre is very small, the algorithm converges prematurely; if, on the contrary, the number of reproduction steps is very high, the computational cost increases considerably.

In

In terms of computational complexity, in

The number of elimination/dispersion events, Ned, refers to how many times a group of bacteria is eliminated and new bacteria are generated at random positions throughout the search space. With a low value of Ned, the algorithm does not rely on random elimination/dispersion events to find favorable regions, whereas a higher value of Ned gives the bacteria access to more regions of the search space, in which they might find higher concentrations of nutrients. It must also be taken into account that increasing the number of these events can increase the computational complexity.

As a result of the tests carried out with different values for Ned, it can be seen in

Ned | S = 10 | S = 100 |
---|---|---|
2 | 21.9864716 | 8.53933847 |
4 | 14.1218685 | 2.93864778 |
10 | 7.00605537 | 1.12131925 |
16 | 4.17984705 | 1.18270545 |

C(i), where i = 1, 2, …, S, is defined as the size of the step taken in the random direction specified by the tumble; in other words, C(i) is the step size with which the BFO algorithm advances. This makes it one of the main parameters to experiment with in this contribution. Intuitively, if the values of C(i) are very large and the optimum lies within a valley with very steep edges, the search could jump over the valley without stopping, or it may skip local minima by swimming through them. On the other hand, if the values of C(i) are very small, convergence becomes slower.

In

For example, when C(i) = 30, the error is even greater than the one obtained with the non-optimized model. This is expected: with a larger step it is easier to pass over regions with high nutrient concentration when they lie within a valley, so the search skips over local minima.

The previous sections explain the effect of the most significant parameters on obtaining the optimized model and the individual effect that each of them has on the model. Based upon these results, it may be concluded that, provided an appropriate configuration of the parameters is selected, the algorithm succeeds in reducing the root mean square error (RMSE) of a set of models for airborne particulate matter PM10 without sacrificing computational speed.

The final configuration, proposed for the model optimized with BFO, is shown in

In

with BFOA, is better adjusted to the real data than the ANFIS method alone, in

Parameter | Value |
---|---|
S | 300 |
Nc | 10 |
Ned | 10 |
Nre | 12 |
C(i) | 4 |

S | RMSE ANFIS | RMSE BFOA |
---|---|---|
S = 10 | 25.1144672 | 4.62980105 |
S = 100 | 25.1144672 | 2.32101394 |

BFOA, but with different sizes of population S, where it can be seen that the significant value is still the size of S, since we see that even errors of large scale (

In general, it can be concluded that the Bacterial Foraging Optimization Algorithm (BFOA) proved useful for optimizing the PM10 concentration model. The contribution of this work is the successful application of BFOA to an environmental problem. More specifically, the previous section shows how varying the parameters modifies the optimized model. The size of the bacterial population, S, is one of the fundamental parameters and has to be chosen appropriately to obtain an optimization without unduly increasing the execution time. Similarly, the step size, C(i), was shown to be an essential parameter, because its variation has a significant influence on the optimized model and the range containing the appropriate value is narrow. The number of chemotactic steps, Nc, and the number of reproduction steps, Nre, also contribute to the final parameter configuration used to obtain the optimized model.

It is also important to note that BFOA is a stochastic method and contains some degree of randomness, for instance in the generation of the population. For this reason, several tests must be performed to establish that the algorithm is reliable, along with tests using validation data.

It should be mentioned that the implementation of the BFO algorithm was based on the author’s original version [

The authors declare no conflicts of interest regarding the publication of this paper.

Cabrera-Hernandez, M.C., Aceves-Fernandez, M.A., Ramos-Arreguin, J.M., Vargas-Soto, J.E. and Gorrostieta-Hurtado, E. (2019) Parameters Influencing the Optimization Process in Airborne Particles PM10 Using a Neuro-Fuzzy Algorithm Optimized with Bacteria Foraging (BFOA). International Journal of Intelligence Science, 9, 67-91. https://doi.org/10.4236/ijis.2019.93005