In this work, some non-homogeneous Poisson models are considered to study the behaviour of ozone in the city of Puebla, Mexico. Several functions are used as the rate function for the non-homogeneous Poisson process. In addition to their dependence on time, these rate functions also depend on some parameters that need to be estimated. In order to estimate them, a Bayesian approach is taken. The expressions for the distributions of the parameters involved in the models are very complex. Therefore, Markov chain Monte Carlo algorithms are used to estimate them. The methodology is applied to the ozone data from the city of Puebla, Mexico.

It is a well known fact that air pollution may cause serious health problems to a susceptible population present in an environment affected by it. For instance, in [

Given the oxidant nature of ozone, high levels of this pollutant may cause damage to the upper respiratory system. Therefore, a person who already has some health problems may have their condition worsened. Thus, it is important to monitor the level of ozone and thereby avoid population exposure to that pollutant. Hence, environmental authorities in several countries have implemented ozone standards and have used monitoring stations to keep track of ozone levels and to see when the set standard is not obeyed. Using a continuous monitoring system, measures may be implemented in order to decrease ozone concentration and/or to alert the population to high levels.

During the past 30 years, environmental authorities in Mexico and many of its cities have also implemented measures to monitor ozone concentration as well as to alert the population to high levels of this and other pollutants. One of those measures is the construction of monitoring networks. Among the cities with some type of monitoring network is Puebla. Puebla is the capital of the state of the same name and has more than 5 million inhabitants, with a car fleet of more than one million units. Puebla’s monitoring network was set up in the year 2000 and has four stations, namely, Tecnológico (UTP), Ninfas (Nin), Serdán (Ser), and Agua Santa (AS). In addition to ozone, other pollutants are also measured.

The interest here is in analysing the behaviour of the ozone data from the monitoring network of the city of Puebla in terms of estimating the probability of having an ozone environmental threshold exceeded a certain number of times in a time interval of interest. The study is performed using non-homogeneous Poisson models allowing the presence of change-points. Different rate functions are taken into account. They may depend on some parameters that need to be estimated. Estimation of the parameters involved is performed under the Bayesian point of view via Markov chain Monte Carlo algorithms.
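The probability mentioned above follows from a basic property of non-homogeneous Poisson processes: the number of exceedances in (t, t + s] is Poisson distributed with mean m(t + s) - m(t), where m is the cumulative mean function. The sketch below illustrates this for a model without change-points. The function names and the power-law mean function m(t) = (t/beta)^alpha are assumptions made for illustration and need not match the parametrization used in this work.

```python
import math


def exceedance_probability(mean_fn, t, s, k):
    """P(N(t + s) - N(t) = k) for an NHPP with cumulative mean function mean_fn."""
    mu = mean_fn(t + s) - mean_fn(t)
    if mu == 0.0:
        return 1.0 if k == 0 else 0.0
    # Poisson pmf evaluated on the log scale for numerical stability.
    log_p = k * math.log(mu) - mu - math.lgamma(k + 1)
    return math.exp(log_p)


def power_law_mean(alpha, beta):
    """Hypothetical mean function m(t) = (t / beta) ** alpha (assumed form)."""
    return lambda t: (t / beta) ** alpha


# Illustrative parameter values only; they are not estimates from the Puebla data.
m = power_law_mean(alpha=0.8, beta=20.0)
p0 = exceedance_probability(m, t=100.0, s=30.0, k=0)
print(f"P(no exceedance in days 100-130) = {p0:.3f}")
```

Passing the mean function as a callable keeps the probability calculation independent of the particular rate function chosen.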

This type of question has been posed and studied before, for instance, [

This paper is organised as follows. In Section 2, the several versions of the model considered here are described. Section 3 gives the Bayesian formulation of the models. In Section 4, an application to the case of ozone data from the city of Puebla, Mexico, is given. Finally, in Section 5, some comments are made.

In order to use non-homogeneous Poisson models, let

Let

Several forms of rate functions are considered. We take some of the ones used in [

where

When the presence of change-points is allowed, we have the following. Assume that there are

with

where we take

The vectors of parameters to be estimated are

Estimation of the parameters will be performed using Bayesian inference ( [

Therefore, following in that direction, we use the existing relationship between prior and posterior distributions and the likelihood function of the model. The posterior distribution of a vector of parameters

Since we are considering a non-homogeneous Poisson model, the general likelihood function when no change-points are present is given by ( [

where

where

The particular forms of the likelihood function, i.e., when we substitute the expression for the rate and mean functions in (3) and (4), are given in [

Since many versions of the non-homogeneous Poisson model are considered, we need some criteria to select the model that best explains the behaviour of the data. Therefore, in addition to the graphical criterion, we also consider the deviance information criterion (DIC). The deviance is defined by

Remark. Note that when we use the DIC we are contrasting the different models. The deviance takes into account the information provided by the likelihood function as well as the difference between the expected value of the deviance and its value when evaluated at the posterior mean of the vector of parameters. Therefore, we test one model against the other based on the information provided by the likelihood function of the model and taking into account how much the expected deviance differs from the one obtained using the estimated parameters.
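As a rough illustration of the remark above, the DIC can be computed from posterior draws using its standard form, DIC = D(theta_bar) + 2 p_D, where D = -2 log L is the deviance, theta_bar is the posterior mean, and p_D is the mean deviance minus D(theta_bar). The sketch below is not the code used in this work: the power-law (Weibull-type) rate, the function names, and the toy inputs are all assumptions.

```python
import math


def nhpp_log_lik(theta, event_days, horizon):
    """NHPP log-likelihood on (0, horizon]: sum of log-rates minus m(horizon).

    Assumed rate: lambda(t) = (alpha / beta) * (t / beta) ** (alpha - 1),
    so that m(t) = (t / beta) ** alpha.
    """
    alpha, beta = theta
    log_rates = sum(
        math.log(alpha / beta) + (alpha - 1.0) * math.log(d / beta)
        for d in event_days
    )
    return log_rates - (horizon / beta) ** alpha


def dic(log_lik, draws):
    """Return (DIC, p_D) computed from a list of posterior draws."""
    deviances = [-2.0 * log_lik(th) for th in draws]
    mean_dev = sum(deviances) / len(deviances)
    n_par = len(draws[0])
    theta_bar = tuple(sum(th[j] for th in draws) / len(draws) for j in range(n_par))
    dev_at_mean = -2.0 * log_lik(theta_bar)
    p_d = mean_dev - dev_at_mean  # effective number of parameters
    return dev_at_mean + 2.0 * p_d, p_d


# Toy inputs, made up for illustration (not the Puebla data or MCMC output).
event_days = [12.0, 40.0, 95.0, 180.0, 310.0]
draws = [(0.9, 60.0), (1.0, 65.0), (1.1, 70.0), (1.0, 62.0)]
value, p_d = dic(lambda th: nhpp_log_lik(th, event_days, 365.0), draws)
print(f"DIC = {value:.2f}, p_D = {p_d:.2f}")
```

In practice `draws` would be the thinned MCMC sample for the model under consideration, and the model with the smallest DIC would be preferred.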

In this section we apply the models described in previous sections to the ozone data obtained from the monitoring network of the city of Puebla, Mexico. The measurements correspond to the daily maximum ozone levels obtained from 01 January 2001 to 31 December 2009, giving a total of

After obtaining the daily maximum measurements for stations UTP, Nin, Ser, and AS, there were 10%, 36%, 35%, and 21%, respectively, of data missing. Therefore, it was necessary to estimate the missing values. The methodology used was the following. If on the ith day of the jth year the measurement was missing, then if on that same day on the

After completing the datasets and obtaining the daily maximum ozone measurements for the city, we have that during the observational period the mean measurement (in parts per million, ppm) was 0.065 with a standard deviation of 0.026. The environmental threshold considered here was 0.11 ppm (which was the Mexican environmental threshold during the time frame in which the measurements were taken [

We start by considering the case where no change-points are allowed and move to assume their presence as necessary. In all cases we assume prior independence of the parameters. Also, in all cases the estimation of the parameters was performed with a sample of size 10,000 using a sample gap of 10 collected after a burn-in period of 20,000.
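To make the sampling schedule concrete: if the sample gap of 10 means that every tenth draw after burn-in is kept, then a sample of size 10,000 requires 20,000 + 10,000 × 10 = 120,000 iterations. The sketch below applies that schedule to a toy random-walk Metropolis sampler. The sampler, the target density, and the step size are assumptions for illustration; the algorithm actually used for the ozone models is not specified in this passage.

```python
import math
import random


def metropolis(log_target, x0, step, n_keep, gap, burn_in, seed=0):
    """Random-walk Metropolis with burn-in and thinning (illustrative only)."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    kept = []
    total = burn_in + n_keep * gap
    for it in range(1, total + 1):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_target(prop)
        # Accept with probability min(1, target(prop) / target(x)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        # Keep every gap-th draw once the burn-in period is over.
        if it > burn_in and (it - burn_in) % gap == 0:
            kept.append(x)
    return kept


def toy_log_target(x, c=3.0, d=1.1):
    """Unnormalised log-density of a gamma distribution with mean c / d."""
    return (c - 1.0) * math.log(x) - d * x if x > 0 else float("-inf")


sample = metropolis(toy_log_target, x0=1.0, step=1.5,
                    n_keep=10_000, gap=10, burn_in=20_000)
print(len(sample), sum(sample) / len(sample))
```

Thinning reduces the autocorrelation between retained draws at the cost of running the chain for longer.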

Throughout this work we use U(a, b) to indicate the uniform distribution on the interval (a, b), and we use Gamma(c, d) to indicate the gamma distribution with mean c/d and variance c/d^{2}.
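Since the mean is c/d, the second argument of Gamma(c, d) is a rate rather than a scale, which matters when drawing from these priors with standard libraries. As a small sketch using only the Python standard library (the helper names are hypothetical): `random.gammavariate(alpha, beta)` expects a scale `beta`, so the rate d must be passed as 1/d.

```python
import random


def gamma_mean_var(c, d):
    """Mean and variance of Gamma(c, d) in the shape/rate convention of the text."""
    return c / d, c / d ** 2


def sample_gamma(c, d, rng):
    """Draw from Gamma(c, d); gammavariate takes a scale, hence 1 / d."""
    return rng.gammavariate(c, 1.0 / d)


# Gamma(724, 8) is one of the prior distributions listed in the tables of this work.
mean, var = gamma_mean_var(724, 8)
rng = random.Random(1)
draws = [sample_gamma(724, 8, rng) for _ in range(20_000)]
print(mean, var, sum(draws) / len(draws))
```

For Gamma(724, 8) this gives a mean of 90.5 and a variance of 11.3125, that is, a fairly concentrated prior.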

We will report the estimated parameters only for the selected model. The inclusion of change-points will be performed in the cases where the graphical fit requires so.

When no change-points are considered, the likelihood function is given by (3). The prior distributions of the parameters in the case of no change-points are given in

Remark. We would like to point out that, in a preliminary run of the algorithm, we considered uniform prior distributions for all parameters. By doing that, we let the weight of information about their behaviour be dictated by the likelihood function of the model. After this preliminary run and after analysing the shape of the posterior distributions of the parameters, more informative prior distributions were assigned to them. The shapes suggested that either uniform or gamma distributions could be taken as prior distributions. We have also taken into account the fact that the parameters are all greater than zero. The hyperparameters used in the preliminary run were such that we had distributions with large variance. In the case of the final run, the hyperparameters were obtained based on the results of the preliminary run.

The values of DIC were 1249, 1288, and 1250 in the cases of W, MO, and WG rate functions, respectively. When considering the cases of the GO and GGO models, the value of the DIC was 1248. Therefore, the models with smallest DIC are models GO and GGO followed closely by the W and WG. The only model with a more distinctive value of DIC is the MO. Note that the DIC values for models W, GO, GGO, and WG differ by less than 10. Hence, from [

In

Parameter | W | MO | GO | GGO | WG
---|---|---|---|---|---
a | U(0, 1.5) | Gamma(66.01, 0.507) | U(150, 250) | U(100, 500) | U(0.3, 0.9)
b | Gamma(3.86, 0.76) | Gamma(68.77, 2.55) | U(0, 0.003) | U(1E-05, 0.001) | Gamma(3, 1.1)
g | -- | -- | -- | U(0.5, 1) | --
p | -- | -- | -- | -- | U(0, 1)

It is possible to see that, even though the models using the WG, W, GO, and GGO rate functions are the ones that best approach the observed mean, at least one change-point needs to be included in the models. Hence, we start by assuming the presence of just one change-point. By the information provided by the DIC, we have that the MO model is the only one that is significantly different from the others. Additionally, we may say that there is no significant difference among the remaining models. Therefore, due to its simplicity, we choose to continue with the Weibull rate function. Thus, we assume the presence of a change-point only in the W and MO models.
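The DIC comparison for the models without change-points can be summarised with a few lines of code. The sketch below uses the DIC values reported above and the rule of thumb applied throughout this work, namely that DIC values differing by less than 10 are not taken as significantly different. The function name and the strict inequality are assumptions.

```python
# DIC values reported for the models without change-points.
dic_values = {"W": 1249, "MO": 1288, "GO": 1248, "GGO": 1248, "WG": 1250}


def comparable_models(dic_by_model, threshold=10):
    """Models whose DIC is within `threshold` of the smallest DIC."""
    best = min(dic_by_model.values())
    return sorted(name for name, value in dic_by_model.items()
                  if value - best < threshold)


print(comparable_models(dic_values))  # ['GGO', 'GO', 'W', 'WG']
```

Only the MO model falls outside this set, which agrees with the conclusion drawn in the text.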

The vector of parameters in the present case is

The values of DIC for the W and MO models are 1226 and 1249, respectively. Hence, the chosen model would be the one assuming a Weibull rate function.

W | W | W | MO | MO | MO
---|---|---|---|---|---
U(0, 3) | U(10, 70) | U(150, 300) | U(0, 50,000) | U(10, 10,000) | U(50, 150)
U(0, 2) | Gamma(6.56, 0.18) | -- | U(0, 2000) | U(50, 200) | --

allowed and rate functions W and MO are used.

Looking at

If we consider the presence of two change-points, then the vector of parameters to be estimated is

The values of DIC for the W and MO models are 1220 and 1259, respectively. Therefore, based on the DIC values, the chosen model would be the one assuming a Weibull rate function.

W | W | W | MO | MO | MO
---|---|---|---|---|---
U(0, 2) | U(0, 35) | U(50, 90) | U(500, 900) | U(100, 200) | U(50, 80)
U(1, 2.5) | U(10, 50) | U(200, 300) | U(150, 200) | U(30, 80) | U(150, 300)
U(0.6, 1) | U(0, 15) | -- | U(200, 250) | U(50, 100) | --

where two change-points are present and models W and MO are considered.

Looking at

In the models with three change-points, the vector of parameters to be estimated is

W | W | W | MO | MO | MO
---|---|---|---|---|---
U(0, 2) | U(0, 35) | U(50, 90) | U(0, 50,000) | U(10, 1500) | U(70, 130)
U(1, 2.5) | U(10, 50) | U(200, 300) | U(0, 2000) | U(50, 200) | U(200, 300)
U(0.2, 0.6) | U(0, 0.1) | U(800, 1100) | Gamma(1.88, 0.016) | U(10, 55) | U(800, 880)
U(0.9, 1.2) | U(10, 25) | -- | U(0, 500) | Gamma(724, 8) | --

the prior distributions of all parameters in the W and MO models.

The values of DIC for the W and MO models are 1225 and 1235, respectively. Therefore, we see that the chosen model would be the one assuming a Weibull rate function. However, the DIC obtained when using the W rate function differs from the one obtained using the MO rate function by a value of 10. Hence, ( [

Note that even though the MO model presents a good fit in the beginning of the observational period, we have that at the end of this period the W model fits better. We also have that the model assuming a Weibull rate function gives the best overall fit. However, it is clear that the inclusion of more change-points could be necessary.

When four change-points are allowed, the vector of parameters to be estimated is

The values of DIC for the W and MO models are 1199 and 1200, respectively. Hence, we may see that, of the models with four change-points, the one with the smallest DIC is the one assuming a Weibull rate function. However, we may also see that the model using the Musa-Okumoto rate function is not significantly different from the one using the Weibull rate function ( [

Looking at

W | W | W | MO | MO | MO
---|---|---|---|---|---
U(0, 3) | U(0, 50) | U(50, 90) | U(0, 55,000) | U(100, 5000) | U(70, 130)
U(1, 2.5) | U(10, 50) | U(200, 300) | U(0, 2000) | U(50, 350) | U(200, 300)
U(0.2, 0.6) | U(0, 0.1) | U(900, 1000) | U(0, 2000) | U(10, 200) | U(800, 860)
U(0.5, 1) | U(10, 50) | U(1900, 2000) | U(400, 2500) | Gamma(724, 8) | U(1500, 2100)
U(0.5, 1.5) | U(5, 30) | -- | Gamma(1.506, 0.001) | U(0, 300) | --

If we consider five change-points, then the vector of parameters to be estimated is

The values of DIC for the W and MO models are 1139 and 1148, respectively. Based on these values, we may see that the smallest value corresponds to the W model. However, since the DIC for the MO model differs by less than 10, there is no significant evidence that the W model is better than the MO model.

W | W | W | MO | MO | MO
---|---|---|---|---|---
U(0, 4) | U(20, 80) | U(60, 70) | U(0, 60,000) | U(100, 5000) | U(70, 130)
U(1, 3) | U(20, 60) | U(210, 250) | U(0, 20,000) | U(50, 450) | U(200, 300)
U(0.4, 0.6) | U(0, 0.5) | U(980, 1000) | U(0, 2000) | U(10, 200) | U(800, 860)
U(0.5, 1) | U(0, 100) | U(1900, 1920) | U(1000, 3000) | Gamma(724, 8) | U(1500, 2100)
U(0.8, 1.2) | U(10, 100) | U(2690, 2750) | Gamma(1.506, 0.001) | U(0, 300) | U(2600, 2800)
Gamma(93.5, 161.1) | U(0, 200) | -- | U(90, 100) | U(0, 30) | --

of the estimated and observed accumulated means in the case of presence of five change-points and models W and MO.

Looking at

In this work we have, initially, considered several rate functions for the non-homogeneous Poisson process counting the number of times that the ozone environmental threshold 0.11 ppm was exceeded in a time interval of interest. Some rate functions considered previously in the literature ( [

Models without the presence of change-points and allowing their presence are taken into account. The model that best explains the behaviour of the data is selected using a graphical criterion and the DIC. When using the DIC, the smallest value corresponds to the model using the Weibull rate function and allowing the presence of five change-points. However, when we look at the DIC in the case of the model using the Musa-Okumoto rate function with the presence of five change-points, we may notice that the value differs from that of the W model by less than 10. Therefore, there is not enough evidence to conclude that the two models are significantly different ( [

Remark. Note that a preliminary analysis of the accumulated observed mean (i.e., the mean obtained directly from the data) shows the possible existence of change-points. In order to either confirm or discard their presence, several versions of the model were considered. The different results were compared using the DIC and the graphical fit. Hence, we have tested the different hypotheses about the model that would better explain the behaviour of the data. Also, note that the change-points were estimated as well as the parameters of the rate functions. We have assigned a prior distribution to them and, using a sample drawn from the respective posterior distribution, we have estimated those change-points as well as their credible intervals. Note that plots of the estimated means were drawn using the estimated parameters, which were the product of a sample from their posterior distributions. We would like to call attention to the fact that, in addition to the plots of the accumulated observed and estimated means, a measure of the discrepancy between observed and estimated means was also used, namely the DIC.
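The accumulated observed mean referred to in this remark can be obtained directly from the series of daily maxima by counting, day by day, how many times the threshold has been exceeded so far. The following minimal sketch shows one way to do this; the function names and the toy series are assumptions, and a strict inequality is assumed for "exceeded".

```python
from itertools import accumulate


def accumulated_exceedances(daily_max, threshold=0.11):
    """Cumulative number of days whose maximum exceeds the threshold (ppm)."""
    return list(accumulate(1 if x > threshold else 0 for x in daily_max))


def exceedance_days(daily_max, threshold=0.11):
    """Days (counted from 1) on which the threshold was exceeded."""
    return [day for day, x in enumerate(daily_max, start=1) if x > threshold]


# Toy series of daily maxima in ppm, made up for illustration.
daily_max = [0.06, 0.12, 0.09, 0.115, 0.11, 0.13]
print(accumulated_exceedances(daily_max))  # [0, 1, 1, 2, 2, 3]
print(exceedance_days(daily_max))          # [2, 4, 6]
```

Plotting the cumulative counts against the day index gives the observed curve against which the estimated mean functions are compared.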

If we use the graphical criterion to select the model that best explains the behaviour of the data, then, when looking at Figures 2-7, we may see that the fit of the estimated accumulated mean to the observed one improves as we let the number of change-points increase. We may also see that the best graphical fit is provided by the model assuming the Musa-Okumoto rate function and allowing the presence of five change-points.

Therefore, taking into account the two model selection criteria considered here, there is an indication that the model that should be chosen to explain the behaviour of the ozone data from the city of Puebla is the one assuming the Musa-Okumoto rate function allowing the presence of five change-points. This result differs from the ones obtained in the case of Mexico City, where the Weibull model is the one that provides the best fit in almost all cases ( [

We would also call attention to the fact that when we consider the Weibull rate function it presents a decreasing behaviour between the second and fourth change-points, and after the fifth (values of

Mean (W) | Mean (MO) | SD (W) | SD (MO) | 95% Credible Interval (W) | 95% Credible Interval (MO)
---|---|---|---|---|---
2.889 | 39,280 | 1.12 | 13,930 | (0.932, 4.859) | (9571, 59,100)
1.956 | 1410 | 0.324 | 639.7 | (1.29, 2.458) | (206.3, 2690)
0.492 | 1458 | 0.032 | 401.2 | (0.41, 0.54) | (528, 1978)
0.795 | 2423 | 0.086 | 413.5 | (0.625, 0.955) | (1462, 2977)
1.116 | 1391 | 0.092 | 790.5 | (0.918, 1.25) | (163.2, 3071)
0.555 | 95 | 0.052 | 2.96 | (0.456, 0.66) | (90.24, 99.75)
43.86 | 2412 | 12.84 | 1151 | (19.5, 73.24) | (466.7, 4689)
40.16 | 304.2 | 12.43 | 103.2 | (14.38, 58.72) | (85.51, 444.9)
0.119 | 304.2 | 0.053 | 103.2 | (0.015, 0.197) | (53.24, 175.9)
31.31 | 89.95 | 11.58 | 3.302 | (11.16, 49.24) | (83.56, 96.48)
25.04 | 288.6 | 9.634 | 60.1 | (7.146, 39.33) | (180.6, 392.5)
19.5 | 10.25 | 6.731 | 6.515 | (6.429, 29.56) | (1.38, 25.64)
65 | 104 | 11.34 | 14.47 | (50.44, 88.5) | (77.03, 126.1)
243 | 249 | 11.16 | 12.92 | (218.6, 265.7) | (229.4, 278.4)
967 | 842 | 27.08 | 19.26 | (903.6, 999.4) | (802.7, 859.8)
1911 | 1897 | 7.036 | 28.13 | (1901, 1933) | (1819, 1942)
2716 | 2719 | 12.92 | 14.97 | (2704, 2751) | (2704, 2760)

Since the Musa-Okumoto model with five change-points is the one whose graphical fit is the best, and since by the DIC there is no significant difference between the W and MO models with five change-points, from now on we consider only the latter.

In the same manner as in previous works ( [

We thank an anonymous reviewer for the comments and questions that helped us to improve the presentation of this work. ERR and JACJ received partial financial support from the projects PAPIIT-IN102713 and IN102416 of the Dirección General de Apoyo al Personal Académico of the Universidad Nacional Autónoma de México (UNAM), Mexico. JACJ thanks the Instituto de Matemáticas-UNAM, Mexico, for the hospitality and support received. This work is a consequence of JACJ’s Master’s Dissertation.

Cruz-Juárez, J.A., Reyes-Cervantes, H. and Rodrigues, E.R. (2016) Analysis of Ozone Behaviour in the City of Puebla-Mexico Using Non-Homogeneous Poisson Models with Multiple Change-Points. Journal of Environmental Protection, 7, 1886-1903. http://dx.doi.org/10.4236/jep.2016.712149