Analysis of Key Features of Non-Linear Behavior Using Recurrence Plots. Case Study: Urban Pollution at Mexico City

Recurrence plots have been used extensively in various fields. In this work, recurrence plots (RPs) are used to investigate changes in the non-linear behaviour of urban air pollution using large datasets of raw hourly data. This analysis has not previously been used to extract information from large datasets for this type of non-linear problem. Two approaches are used to tackle the problem. The first presents the results by monitoring network. The second presents the results by particle type. The analysis shows the feasibility of using recurrence analysis for pollution monitoring and control.


Introduction
The states of natural systems typically change over time. Investigating these changes in complex systems helps to understand and describe them. A relatively new method based on non-linear data analysis has become popular for describing such changes: the recurrence plot [1,2].
In this contribution, the non-linear behaviour of urban air pollution is quantified and analysed at various sites in Mexico City, using large datasets covering several years. The aim is to show the feasibility of analysing key features embedded in the raw datasets.

Urban Airborne Pollution
In recent times, urban air pollution has been a growing problem, especially for urban communities. The size, shape and chemical properties of particles govern their lifetime in the atmosphere and their site of deposition within the respiratory tract. Air pollution has also been held responsible for various health disorders, especially respiratory complications, resulting in an increase in the number of asthma cases and hospital admissions in some parts of the world; this has been widely documented [3-5].
Most major pollutants can alter pulmonary function, in addition to causing other health effects, when exposure concentrations are high. The effects are especially severe in vulnerable sectors of the population such as children, asthmatics and the elderly, and have been extensively documented [6-9].
In this work, five particles were chosen based on their availability at the monitoring sites and their toxicity: ozone (O3), carbon monoxide (CO), nitrogen dioxide (NO2), sulphur dioxide (SO2) and particulate matter of less than 10 micrometres (PM10). The datasets are separated by month of the year and by type of particle. There is one data point per hour for each particle at each of the five sites, which makes it difficult to extract information from the datasets using common methods.

Ozone
Ozone is a natural component of the atmosphere that is found at low concentrations and is crucial for life. Air pollution caused by high concentrations of ozone is a common problem in large cities throughout the world [10], and Mexico City is among those affected. It is well known that individuals exposed to high concentrations of ozone for long periods may experience serious health problems [11,12]. Epidemiological studies have found associations between daily ozone levels and hospital admissions [13]. This pollutant is associated with respiratory symptoms, especially coughing, which are aggravated in patients with asthma [14].
In Mexico City, ozone levels have been decreasing, especially since 2009. However, according to the official Mexican air quality standards [16], there is still a risk of overexposure, mainly in the southwest region [15].

Carbon Monoxide
Carbon monoxide (CO) is a tasteless, odorless gaseous pollutant, ubiquitous in the outdoor atmosphere, that is generated by combustion [17]. CO is produced by the incomplete combustion of hydrocarbons. Its main source is vehicle exhaust emissions; other important sources are industry, heating and fires. CO concentrations and their fluctuations are related to a large extent to road traffic [18].
Adverse health effects of CO exposure include death from asphyxiation at high exposure levels and, at lower levels, impaired neuropsychological performance and risk of myocardial ischemia and rhythm disturbances in persons with cardiovascular disease. The most definitive evidence on CO comes largely from controlled exposure studies involving CO inhalation at concentrations that mimic exposures previously typical of urban environments [19,20]. Carbon monoxide poisoning has also been held responsible for many hospital admissions: in the US alone, around 40,000 people are admitted to hospital for this cause each year [21].
In Mexico, the Official Norm NOM-021-SSA1-1993 sets the maximum level for carbon monoxide at 11.0 parts per million (ppm) as an 8-hour average, which may not be exceeded more than once a year. By comparison, the United States federal standard is 9 ppm for an 8-hour average and 35 ppm for a 1-hour average.

Nitrogen Dioxide
Nitrogen dioxide (NO2) is a particularly important compound, not only for its health effects but also because it absorbs visible light and contributes to reduced visibility. It also plays a critical role in the production of ozone, because the photolysis of NO2 is the initial step in the photochemical formation of ozone [10].
In nature, nitrogen dioxide occurs at concentrations of 10 to 50 parts per billion (ppb). High levels of nitrogen dioxide, however, are due to industrial processes and fossil fuel sources. Furthermore, motor vehicles contribute substantially to urban levels of nitrogen oxides through their engine combustion processes [22]. According to several authors, the monitoring of NO2 is critically important in order to assess the potential effect of NO2 on human health and ecosystems, as well as to develop strategies for the effective control of NO2 pollution [23-25].
In Mexico, the official norm NOM-023-SSA1-1993 [16] establishes a maximum allowed concentration of 0.21 ppm as an hourly mean, which may not be exceeded more than once a year. By comparison, the United States federal standard establishes a value of 0.053 ppm as an annual mean. The WHO recommends a value of 40 µg/m3 (0.021 ppm) for the annual mean and 200 µg/m3 (0.106 ppm) for the hourly mean.
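The paired WHO figures above (µg/m3 and ppm) can be cross-checked with a short conversion. This is a minimal sketch that assumes ideal-gas behaviour at 25 °C and 1 atm (molar volume 24.45 L/mol); the source does not state its reference conditions, and the function name `ugm3_to_ppm` is purely illustrative.

```python
# Molar volume of an ideal gas at 25 degC and 1 atm, in litres per mole.
# Assumption: the reference conditions are not given in the text.
MOLAR_VOLUME_L_PER_MOL = 24.45

NO2_MOLAR_MASS = 46.01  # g/mol


def ugm3_to_ppm(conc_ugm3, molar_mass_g_per_mol):
    """Convert a mass concentration (ug/m3) to a volume mixing ratio (ppm)."""
    # ug/m3 -> mg/m3 (divide by 1000), then mg/m3 * 24.45 / molar mass -> ppm
    return conc_ugm3 * MOLAR_VOLUME_L_PER_MOL / (molar_mass_g_per_mol * 1000.0)


print(round(ugm3_to_ppm(40.0, NO2_MOLAR_MASS), 3))   # annual mean: 0.021
print(round(ugm3_to_ppm(200.0, NO2_MOLAR_MASS), 3))  # hourly mean: 0.106
```

Under these assumed conditions the conversion reproduces the ppm values quoted for the WHO recommendations.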

Sulphur Dioxide
Sulphur dioxide contributes to the deterioration of air quality. Several epidemiological studies have demonstrated a direct association between inhaled atmospheric sulphur dioxide and respiratory diseases, pulmonary damage and mortality in the population [8]. In urban environments, sulphur dioxide is generated by many sources, one of which is the burning of solid compounds or petroleum-derived products.
In the past three decades in Europe, and more recently in the United States, there have been substantial reductions in SO2 emissions [26,27].
The World Health Organization recommends a concentration of 100 to 150 µg/m3 as a 24-hour mean and 40 to 60 µg/m3 as an annual mean. The official Mexican norm NOM-022-SSA1-1993 establishes a limit of 341 µg/m3 as a 24-hour mean, not to be exceeded more than once a year, and 79 µg/m3 as an annual mean, to protect the vulnerable population.

PM 10
Airborne particulate matter (PM) is a mixture of small particles and liquid droplets suspended in the atmosphere, which contributes significantly to urban air quality problems such as acid rain and visibility degradation [28].
In airborne pollution, a particle can be any solid or liquid material with a diameter between 0.002 and 500 micrometres (µm). Airborne particulates with a diameter of 10 µm or less are of concern from the perspective of air pollution. A variety of national and worldwide standards, directives and guidelines exist to define acceptable particulate levels in the air.
These types of particles are classified according to their effect on human health and their physical characteristics.

Mexico's Sites
Mexico City is located in the Valley of Mexico. This valley, also known as the Valley of the Damned, is a large valley in the high plateaus at the centre of Mexico, at an altitude of 2240 metres (7349 feet). The Federal District of Mexico City is situated in central-south Mexico and is surrounded by the State of Mexico to the west, north and east, and by the state of Morelos to the south. The city covers an area of around 1485 km2 (571 sq mi).

Recurrence Plots
The recurrence plot (RP) exhibits characteristic patterns for typical dynamical behavior. A collection of single recurrence points, homogeneously and irregularly distributed over the whole plot, reveals a mainly stochastic process [29].
The recurrence plot is a graphical tool introduced by Eckmann et al. (1987) to extract qualitative characteristics of a time series. The recurrence of the state at time i at a different time j is pictured within a two-dimensional square matrix of black and white dots, where black dots represent a recurrence and both axes represent time [30,31].
Such an RP can be expressed mathematically as:

$$R_{i,j} = \Theta\left(\varepsilon - \lVert \vec{y}_i - \vec{y}_j \rVert\right), \quad i, j = 1, \dots, N,$$

where N is the number of considered states $\vec{y}_i$, $\varepsilon$ is a threshold distance, $\lVert \cdot \rVert$ is a norm and $\Theta(\cdot)$ is the Heaviside function [32]. Since $R_{i,i} = 1$ by definition, the RP has a black main diagonal line called the line of identity (LOI). In this context, a recurrence is a state $\vec{y}_j$ that falls into the m-dimensional neighbourhood of radius $\varepsilon$ centred at $\vec{y}_i$ [33].
Using the time series of a single observable variable (particle concentration, in this case), it is possible to reconstruct a phase space trajectory. Starting from the scalar time series $u_i$, the set of embedded vectors

$$\vec{y}_i = \left(u_i, u_{i+\tau}, \dots, u_{i+(m-1)\tau}\right)$$

is generated [34], where m is the embedding dimension and $\tau$ is the time delay. Each unknown point of the phase space at time i is reconstructed by the delayed vector $\vec{y}_i$ in an m-dimensional space called the reconstructed phase space.
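The time-delay reconstruction described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in this work; the function name `delay_embed` and the toy series are hypothetical.

```python
def delay_embed(series, m, tau):
    """Return the delay vectors (u[i], u[i+tau], ..., u[i+(m-1)*tau]).

    m is the embedding dimension and tau the time delay, both in samples.
    """
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive integers")
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors < 1:
        raise ValueError("series too short for this m and tau")
    return [tuple(series[i + k * tau] for k in range(m))
            for i in range(n_vectors)]


# Toy series standing in for hourly concentrations (illustrative values only).
hourly = [1, 2, 3, 4, 5]
print(delay_embed(hourly, m=2, tau=2))  # [(1, 3), (2, 4), (3, 5)]
```

A series of length n yields n - (m - 1)τ embedded vectors, so the last (m - 1)τ samples do not start a vector of their own.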
Determining the embedding parameters must be the first step in analysing non-linear systems [29,34-37]. For this reason, a search for the best dimension and time delay must be made first. In this contribution, the best dimension is calculated using the false nearest neighbours (FNN) algorithm, as described in [32,38].
Also, when calculating an RP, a norm must be chosen [39]. The most widely used norms are the L1, L2 (Euclidean) and L∞ norms [30]. In this work, the Euclidean norm was used. Figure 2 shows the recurrence plots of a random signal, a sine wave and two randomly chosen RPs of airborne particle concentration.
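As an illustration of how an RP is obtained from a set of states with the Euclidean norm, the sketch below thresholds all pairwise distances. The threshold `eps`, the convention that a distance exactly equal to `eps` counts as a recurrence, and the function name are assumptions made here for illustration.

```python
import math


def recurrence_matrix(states, eps):
    """Binary recurrence matrix: entry (i, j) is 1 if states i and j are
    within Euclidean distance eps of each other, and 0 otherwise."""
    n = len(states)
    return [[1 if math.dist(states[i], states[j]) <= eps else 0
             for j in range(n)]
            for i in range(n)]


# Three two-dimensional states; only the first two are close to each other.
states = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
for row in recurrence_matrix(states, eps=1.0):
    print(row)
# [1, 1, 0]
# [1, 1, 0]
# [0, 0, 1]
```

The matrix is symmetric and its main diagonal (the line of identity) is always 1, since every state is at distance zero from itself. Replacing `math.dist` with another distance function would give the L1 or L∞ variants.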
Although it is possible to tell the plots in Figures 2(c) and (d) apart, some experience is needed to interpret RPs [40]. For this reason, recurrence quantification analysis (RQA) is used to characterize such RP structures.
The main idea of this project is to reconstruct the (unknown) system dynamics in phase space using time-delay embedding, then to compute the distances between all pairs of embedded vectors, generating a symmetric two-dimensional square matrix for each dataset as shown in Figures 2(c) and (d), and finally to apply RQA to each dataset.

Recurrence Quantification Analysis (RQA) for RPs
Zbilut [40] developed some of the methods used today for the quantitative analysis of recurrence plots. It has been shown that these measures are able to capture dynamical transitions in complex systems [38], defining measures of complexity based on certain characteristics of the recurrence plots [41,42].
In general, the characteristics measured in an RP are recurrence rate, determinism, ratio, entropy and trend. In this contribution, two extensions of these characteristics were also considered: laminarity and trapping time.

Recurrence Rate
The recurrence rate is a measure of the density of recurrence points in the RP. This rate gives the mean probability of recurrences in the system [41,43].
The recurrence rate is given by:

$$RR = \frac{1}{N^{2}} \sum_{i,j=1}^{N} R_{i,j}$$

in the case of time series, and

$$RR = \frac{1}{N^{2d}} \sum_{\vec{i},\vec{j}} R_{\vec{i},\vec{j}}$$

in the case of d-dimensional spatial data [44].
The recurrence rate thus represents the fraction of recurrent points with respect to the total number of possible recurrences.
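For a time series, the recurrence rate is simply the fraction of entries of the recurrence matrix that are recurrent. A minimal sketch follows, assuming the matrix is stored as a list of rows of 0s and 1s and that the line of identity is included in the count; the function name is illustrative.

```python
def recurrence_rate(R):
    """Fraction of recurrence points in a square binary matrix R."""
    n = len(R)
    return sum(sum(row) for row in R) / (n * n)


R = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
print(recurrence_rate(R))  # 5 recurrence points out of 9 entries
```

The result is a fraction between 0 and 1; multiplying by 100 expresses it as a percentage.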

Determinism
Determinism is a measure of the predictability of the system [37,38]. It can also be understood as the percentage of recurrent points forming line segments parallel to the line of identity (LOI). The determinism is given by Gao [36]:

$$DET = \frac{\sum_{l=l_{\min}}^{N} l\,P(l)}{\sum_{l=1}^{N} l\,P(l)},$$

where P(l) denotes the probability of finding a diagonal line of length l in the RP and $l_{\min}$ is the minimum length for a diagonal to be counted as a line. The measure of determinism (DET) ranges from 0 to 1. Values near zero indicate randomness, while those approaching one indicate the presence of a strong signal component [32,35]. The average diagonal line length $L_{mean}$ is defined as:

$$L_{mean} = \frac{\sum_{l=l_{\min}}^{N} l\,P(l)}{\sum_{l=l_{\min}}^{N} P(l)}.$$

This characterizes the average time that two segments of a trajectory stay in the vicinity of each other, and is related to the mean predictability time [38].
The choice of $l_{\min}$ can also be used to exclude short temporal scales that are not of interest [39].
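The determinism and the average diagonal line length can both be computed from a histogram of diagonal line lengths. The sketch below makes two assumptions that the text does not spell out: the line of identity is excluded, and only the upper triangle is scanned (the matrix is symmetric, so both ratios are unchanged). Function names are illustrative.

```python
from collections import Counter


def diagonal_histogram(R):
    """Count diagonal lines by length in the upper triangle of R (LOI excluded)."""
    n = len(R)
    hist = Counter()
    for k in range(1, n):            # k = offset of the diagonal from the LOI
        run = 0
        for i in range(n - k):
            if R[i][i + k]:
                run += 1
            else:
                if run:
                    hist[run] += 1   # a line just ended
                run = 0
        if run:
            hist[run] += 1           # line reaching the edge of the plot
    return hist


def determinism(hist, l_min=2):
    """Share of recurrence points lying on diagonal lines of length >= l_min."""
    total = sum(l * c for l, c in hist.items())
    if total == 0:
        return 0.0
    return sum(l * c for l, c in hist.items() if l >= l_min) / total


def mean_diagonal_length(hist, l_min=2):
    """Average length of the diagonal lines of length >= l_min."""
    count = sum(c for l, c in hist.items() if l >= l_min)
    if count == 0:
        return 0.0
    return sum(l * c for l, c in hist.items() if l >= l_min) / count


# RP of a period-2 signal: states recur whenever i - j is even.
R = [[1 if (i - j) % 2 == 0 else 0 for j in range(5)] for i in range(5)]
hist = diagonal_histogram(R)         # one line of length 3, one isolated point
print(determinism(hist))             # 0.75
print(mean_diagonal_length(hist))    # 3.0
```

Raising `l_min` discards short lines from the numerator, which is how short temporal scales are excluded.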

Ratio
The ratio is defined as the quotient of the determinism (DET) and the recurrence rate (REC). It is useful for detecting transitions between states: the ratio increases during transitions but settles down when a new quasi-steady state is reached [39].

Entropy
The entropy measure refers to the Shannon entropy of the frequency distribution of the diagonal line lengths [45]. According to several authors, the basic idea is that the information (Shannon) entropy of a random process carries abundant qualitative and quantitative information about the object under study [39,42,43,45]. The entropy of a system is given by:

$$ENTR = -\sum_{l=l_{\min}}^{N} p(l) \ln p(l),$$

where $p(l) = P(l) / \sum_{l=l_{\min}}^{N} P(l)$ is the probability that a diagonal line has exactly length l.
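A short worked example may help make the entropy measure concrete. The sketch assumes the histogram of diagonal line lengths is already available as a mapping from length to count, that lines shorter than `l_min` are ignored, and that the natural logarithm is used; all three are assumptions of this illustration.

```python
import math


def diagonal_entropy(hist, l_min=2):
    """Shannon entropy of the distribution of diagonal line lengths >= l_min."""
    counts = [c for l, c in hist.items() if l >= l_min and c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts)


# Two lengths that are equally frequent give ln(2); a single length gives 0.
print(diagonal_entropy({1: 5, 2: 2, 3: 2}))  # lengths 2 and 3 only
print(diagonal_entropy({4: 7}))
```

A wider spread of line lengths gives a higher entropy, while an RP whose lines all have the same length has zero entropy.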

Trend
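The trend is a linear regression coefficient over the recurrence point density of the diagonals parallel to the line of identity (LOI), as a function of their distance from the LOI. The trend measurement is given by:

$$TREND = \frac{\sum_{i=1}^{\tilde{N}} \left(i - \tilde{N}/2\right)\left(RR_i - \langle RR_i \rangle\right)}{\sum_{i=1}^{\tilde{N}} \left(i - \tilde{N}/2\right)^{2}},$$

where $RR_i$ is the recurrence point density of the i-th diagonal parallel to the LOI, $\langle RR_i \rangle$ is its average over the diagonals considered and $\tilde{N} < N$ is the number of diagonals considered.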
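The regression described above can be sketched as an ordinary least-squares slope of the diagonal-wise recurrence density against the distance from the LOI. This is an assumption made for illustration: the centring and normalisation used in the RQA literature can differ slightly from a plain least-squares fit, and the parameter `n_diagonals` (how many diagonals to include) is hypothetical.

```python
def diagonal_densities(R, n_diagonals=None):
    """Recurrence density of each diagonal at offset k = 1, 2, ... from the LOI."""
    n = len(R)
    if n_diagonals is None:
        n_diagonals = n - 1          # use every diagonal above the LOI
    return [sum(R[i][i + k] for i in range(n - k)) / (n - k)
            for k in range(1, n_diagonals + 1)]


def trend(R, n_diagonals=None):
    """Least-squares slope of diagonal density versus distance from the LOI."""
    dens = diagonal_densities(R, n_diagonals)
    ks = range(1, len(dens) + 1)
    k_mean = sum(ks) / len(dens)
    d_mean = sum(dens) / len(dens)
    num = sum((k - k_mean) * (d - d_mean) for k, d in zip(ks, dens))
    den = sum((k - k_mean) ** 2 for k in ks)
    return num / den if den else 0.0


# Densities fall from 1.0 to 0.5 to 0.0 moving away from the LOI.
R = [[1, 1, 1, 0],
     [1, 1, 1, 0],
     [1, 1, 1, 1],
     [0, 0, 1, 1]]
print(diagonal_densities(R))  # [1.0, 0.5, 0.0]
print(trend(R))               # -0.5
```

A slope near zero corresponds to a homogeneous ("flat") RP, whereas a negative slope means the plot pales towards its corners. In practice the outermost diagonals contain very few points, so they are often left out by choosing a smaller `n_diagonals`.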

Laminarity
Laminarity may be defined as the fraction of recurrence points that form vertical lines [34]. Thus, laminarity (LAM) can be quantified as expressed in Equation (8):

$$LAM = \frac{\sum_{v=v_{\min}}^{N} v\,P(v)}{\sum_{v=1}^{N} v\,P(v)}, \tag{8}$$

where P(v) is the frequency distribution of the lengths v of the vertical lines and $v_{\min}$ is the minimum length for a vertical structure to be counted as a line. It is noteworthy that laminarity is evidence of chaotic transitions and is related to the amount of laminar phases in the system (intermittency) [34].

Trapping Time
Trapping time (TT) is the average length of the vertical lines and is given by Equation (9):

$$TT = \frac{\sum_{v=v_{\min}}^{N} v\,P(v)}{\sum_{v=v_{\min}}^{N} P(v)}, \tag{9}$$

where v is the length of the vertical lines, $v_{\min}$ is the shortest length that is considered a line segment and P(v) is the distribution of the corresponding lengths. TT shows the time that the system has been trapped in the same state [39].
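Laminarity and trapping time both follow from a histogram of vertical line lengths. The sketch below assumes that each column of the recurrence matrix is scanned in full, including the point on the line of identity; implementations differ on this detail, and the function names are illustrative.

```python
from collections import Counter


def vertical_histogram(R):
    """Count vertical lines (runs of 1s within a column) by length."""
    n = len(R)
    hist = Counter()
    for j in range(n):               # one column at a time
        run = 0
        for i in range(n):
            if R[i][j]:
                run += 1
            else:
                if run:
                    hist[run] += 1   # a vertical line just ended
                run = 0
        if run:
            hist[run] += 1           # line reaching the bottom of the column
    return hist


def laminarity(hist, v_min=2):
    """Share of recurrence points lying on vertical lines of length >= v_min."""
    total = sum(v * c for v, c in hist.items())
    if total == 0:
        return 0.0
    return sum(v * c for v, c in hist.items() if v >= v_min) / total


def trapping_time(hist, v_min=2):
    """Average length of the vertical lines of length >= v_min."""
    count = sum(c for v, c in hist.items() if v >= v_min)
    if count == 0:
        return 0.0
    return sum(v * c for v, c in hist.items() if v >= v_min) / count


R = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
hist = vertical_histogram(R)   # two lines of length 2, one isolated point
print(laminarity(hist))        # 0.8
print(trapping_time(hist))     # 2.0
```

Since one sample corresponds to one hour in the datasets used here, a trapping time of v samples would correspond to v hours spent in roughly the same state.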

Results by Monitoring Network
The first approach is to extract information from the recurrence plots by site. As explained in Section 2.2, the monitoring networks were separated as Northwest, Northeast, Downtown, Southwest and Southeast. This approach does not take into account the type of particle, but rather the location of the monitoring network.
Figure 3 shows the recurrence rate for all particles and all monitoring networks. It is worth noting that the median recurrence rate lies between 10 and 13, with the lowest recurrence rate found for Mexico Northwest. The other sites seem to have a fairly constant spread of the percentiles.
Figure 4 shows the determinism for the years 2005-2010 by monitoring network, with a fairly constant mean for all sites. However, there are outliers for all sites, reaching values above 80 in many cases. This exceptionally high determinism may be explained in greater detail by the analysis by particle, which could give an insight into why it occurred.
The ratio for particle concentration (Figure 5) seems fairly stable for all sites, with means ranging from 0 to 2 and percentiles increasing up to 4 in some cases. The only exception is the Northwest monitoring network, with ratios up to 10 and an outlier of about 38.
Furthermore, the entropy is also stable. Its median oscillates between 2 and 3 for all monitoring networks, with the percentiles reaching up to 5. This could also be due to the type of particle rather than the site. This is shown in Figure 6.
Figure 7 shows the trapping time for particle concentration separated by monitoring network. In this case, the average time that this non-linear system stays in the same state seems fairly constant, with a low median between 10 and 13. It is also worth noting that for most sites there are outliers which cannot be explained in detail using this approach.
Figure 8 shows the laminarity for particle concentration by monitoring network. Since the laminarity is a measure of chaotic transitions, it can be inferred that, regardless of the site, the changes in laminar phases in the system seem high.
The last measure for this approach was the trend. The trend measures the positioning of recurrent points away from the central diagonal, that is, the paling of the RP towards its edges [34]. A "flat" diagram indicates stationarity, whereas drift in the signal results in an overall increase or reduction of distances as one moves away from the main diagonal. In this respect, most of the sites have a median between 11 and 14, as shown in Figure 9.

Results by Particle Type
For this approach, the recurrence quantification analysis is carried out by particle type. Figure 10 shows the recurrence rate for all five particles (CO, O3, NO2, SO2 and PM10). It is worth noting that the median recurrence rate varies between particles. For CO, the median lies between 13 and 18. Both ozone and particulate matter show a low recurrence rate, below 6 for all years, regardless of the monitoring site. Nitrogen dioxide shows a slightly higher recurrence rate, with a median between 10 and 12. Sulphur dioxide, however, shows a median above 30 for some years. This explains Figure 3, where the whiskers of the boxplots show a much higher recurrence rate in some cases: these cases correspond to sulphur dioxide. This higher recurrence rate may be due to the low variance of the values in the datasets for all years, which makes it easier for RQA to detect recurrences.
Furthermore, the determinism for SO2 is also much higher than for the other particles, with a median between 70 and 90, as shown in Figure 11. Although it seems lower due to the scaling of the boxplots, the median shows otherwise. The spread between the 25th and 75th percentiles and the length of the whiskers may be due to high variances in the datasets for that type of particle.
It is also worth noting that the entropy (Figure 12) is slightly higher for sulphur dioxide as well. The other particles seem to have a steady entropy, with a median that oscillates between 1 and 3.
Figure 13 shows the trapping time separated by particle type. Sulphur dioxide again shows a much higher trapping time, with a median of 40 for some years (e.g. 2008 and 2009), compared with medians between 10 and 13 for the other particles. This explains the outliers seen in Figure 7, which could not be explained using the monitoring network approach.
Figure 14 shows the laminarity for particle concentration by particle type. The figure shows that the chaotic transitions are high for carbon monoxide and PM10, and especially for sulphur dioxide. This gives further insight into the chaotic transitions: whereas Figure 8 shows that the laminarity is high for all sites, Figure 14 shows that only some particles produce these large changes in laminar phases.

Conclusions and Future Work
Numerous experiments were carried out with different particles and over different years. Using recurrence quantification analysis, it was shown that information can be extracted from large datasets of dissimilar airborne particles over a considerable time span (six years, in this case) for five monitoring networks. Trends could be identified using these tools, and preliminary conclusions suggest that important information such as density distribution and drifts, among others, can be drawn. Also, by using more than one approach, some hidden features could be identified, showing the feasibility of this approach.
For future work, it could be useful to combine RQA with prediction algorithms such as Support Vector Machines to carry out prognosis of the airborne particle data. Another useful approach would be the use of cross recurrence plots (CRPs), making a comparison between two recurrence plots to determine trends.

Figure 1. Map of the monitoring sites in Mexico City.

Figure 2. Recurrence plots of (a) a random signal; (b) a sine wave; (c) the concentration of carbon monoxide at Mexico Downtown for 2009; and (d) the concentration of PM10 at Mexico Northeast over 2010 (daily mean), showing the line of identity (diagonal line).