Water Quality Assessment of a Tropical Mexican Lake Using Multivariate Statistical Techniques

J. Badillo-Camacho et al.

The water quality of Lake Chapala, a tropical Mexican lake, was assessed with two multivariate statistical techniques, cluster analysis (CA) and principal component analysis (PCA), applied to ten physicochemical variables and six metals measured at ten monitoring sites. The study evaluated and interpreted complex water quality data sets and apportioned pollution sources to obtain better information about water quality. The descriptive statistics showed that the highest metal concentrations occurred during the dry season. This trend was explained by an unusual rain event in February 2009 that carried metals into the lake as runoff from the nearby mountains. Compared with international water quality criteria for aquatic life [USEPA], only Zn concentrations were below the criteria, whereas Ni, Pb, Cd and Fe exceeded the corresponding values (Ni: 52 µg·L⁻¹, Pb: 2.5 µg·L⁻¹, Cd: 0.25 µg·L⁻¹, and Fe: 1000 µg·L⁻¹). Correlations among the variables were explored by PCA, and the PCA scores were used to classify the samples by CA. Seven significant cluster groups of sampling locations were detected on the basis of the similarity of their water quality: (sites 4 and 5), (sites 3 and 9), (site 7), (site 10), (sites 2 and 6), (site 8) and (site 1). The results revealed that the stress exerted on the lake by waste sources follows the order: domestic > agricultural > industrial.


Introduction
The pollution of water sources such as lakes and rivers is a problem of global concern, in large part because many of these water bodies supply water for human consumption [1]. Uncontrolled human settlements affect the water quality of these water bodies because they often lack infrastructure (e.g., water treatment plants) for the appropriate treatment of domestic, municipal or industrial wastes, especially in developing countries. In this context, water quality is a main factor controlling health and disease in humans and animals. In general, natural processes and anthropogenic inputs determine the water quality of surface water bodies [2]-[4]. Runoff is an example of a natural input: it comes from seasonal events affected by climate (weathering and soil erosion) [5]. Anthropogenic inputs include discharges from municipal, industrial and agricultural activities and constitute a permanent source of pollutants, especially if the discharges are not treated appropriately [6] [7].
Several studies have focused on the anthropogenic contamination of ecosystems [8] [9]. However, monitoring programs that provide representative and reliable data are not easily implemented because water quality varies both spatially and temporally [10]. This difficulty arises in part from the large amount of data generated by the measured physicochemical variables, which makes the data difficult to interpret and meaningful conclusions difficult to draw. Multivariate statistical techniques such as principal component analysis (PCA), factor analysis (FA) and cluster analysis (CA) provide a valuable tool for interpreting complex water quality data matrices and the ecological status of a region or water body of interest. Consequently, these techniques can support the reliable management of water resources and provide a rapid response to pollution problems, especially in water bodies used for human consumption [11] [12].
Shallow tropical Lake Chapala is Mexico's largest freshwater body and the most important lake in the country, in part because it is used as a water supply source by the city of Guadalajara. Pollution of this water body has prompted concern because large quantities of wastes from municipal, industrial and agricultural activities across the entire Lerma-Chapala basin flow largely untreated into the lake through the Lerma River, the main tributary to the lake (see Figure 1) [13] [14]. Therefore, the main objective of the present study was to assess the water quality of this lake through the concentrations of macronutrients (nitrate-N, nitrite-N, total nitrogen and total phosphorus), other physicochemical variables (pH, alkalinity, total suspended solids, conductivity and chemical oxygen demand) and heavy metals (Fe, Cd, Ni, Zn, Cr and Pb). Multivariate techniques such as PCA, CA and FA were applied to the large data sets obtained during a two-period sampling program (one during the dry season and the other at the end of the rainy season of 2009) to find similarities and dissimilarities among the sampling sites and to ascertain the influence of the pollution sources on the water quality variables.

Study Zone
Lake Chapala is part of the Lerma-Chapala watershed and is located in west-central Mexico at about 1515 m above sea level; it covers a surface area of ~80 × 15 km², and its average depth varies between 4 and 7 m (Figure 1). The lake provides ~7.5 m³/s of the water consumed by the city of Guadalajara (pop. ~4 million, see Figure 1) [15] and has a unique ecosystem. Because of wind action and the shallow depth, stratification is limited; consequently, the lake is oxygenated at all depths [16], as reflected by dissolved oxygen values of 6.8-7.0 mg·L⁻¹, which represent a saturation level > 90%. This is attributed to the almost complete mixing of the water column and the shallow depth of the lake [16] [17]. Human activities have significantly changed the original regime of the lake over the last 50 years, mainly through inputs from the Lerma River associated with the industrial development of the cities along this river. The lake acts as a sink for dissolved substances (e.g., heavy metals) and suspended sediments transported from upstream agricultural, urban and industrial areas through the Lerma River and the local watershed [18].
In this study, the sampling sites were selected to cover key locations that reasonably represent the water quality of the lake, accounting for the tributary and for inputs from wastewater drains that affect water quality. Water samples were collected at ten sites during the dry season of 2009 (May) and at the end of the rainy season of 2009 (November) (Figure 1). The sites were chosen on the basis of their potential for pollution from anthropogenic activities in nearby towns, and they were distributed at intervals around the lake for representativeness and to account for the major pollution sources. Sites were labeled according to the sampling campaign: for example, S1A corresponds to Site 1 sampled during the dry season (May 2009) and S1B to Site 1 sampled after the rainy season (November 2009); the remaining sites follow the same notation.

Water Sample Collection and Analysis
Water samples were collected manually approximately 30 cm below the water surface in polyethylene bottles previously soaked in 10% v/v HNO₃ for 24 h and rinsed several times with deionized water. All samples were cooled to 4 °C and transported to the laboratory for further analysis. Temperature and pH were measured in the field. The physicochemical variables (e.g., PO₄³⁻) were analyzed following the protocols established by the Mexican Standard [19] and APHA [20]. For metal analysis, samples were acidified in the field with concentrated HNO₃ to pH < 2. Samples were digested according to APHA Method 3030 K (Microwave-Assisted Digestion) [20] in a programmable microwave oven (Ethos Touch model; Sorisole, Italy), and the extracts were analyzed for heavy metals by atomic absorption spectrometry. Analytical blanks were processed in the same way as the samples, and metal concentrations were determined using standard solutions prepared in the same acid matrix.

Statistical Methods
Descriptive statistics (mean, standard deviation and maximum) were calculated for the physicochemical variables in the water samples. The Pearson correlation coefficient was used to describe the degree of association between pairs of variables. FA was employed to establish possible relationships among the physicochemical variables of the sampled sites. This multivariate method was used here 1) to extract the most relevant characteristics of the physicochemical variables with minimal loss of the original information [1] [21] [22], and 2) to create a new set of factors, much smaller than the original set of variables, that reduces the contribution of the less significant variables and further simplifies the data structure obtained from principal component analysis [23]. CA was used to explore the similarities between water samples [24] and to group the sites according to the similarity of their contaminants. Hierarchical agglomerative CA was performed on the normalized data set using squared Euclidean distance as the distance measure. CA is a classification procedure that involves measuring either the distance or the similarity between the objects to be clustered. Details of this method can be found elsewhere [25] [26]. All statistical tests were performed using StatGraphics Centurion XV software (StatPoint Inc., USA).
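The correlation and clustering steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the StatGraphics procedure used in the study: the data matrix is a hypothetical toy example (not this study's measurements), each variable is normalized by z-scores, and the linkage is the nearest-neighbor (single-linkage) rule on squared Euclidean distances. Ward's method and the factor analysis with varimax rotation are not shown. The function names `zscore_columns`, `pearson`, `sq_euclidean` and `single_linkage` are hypothetical helpers written for this sketch.

```python
from itertools import combinations
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each column (variable) to zero mean and unit sample SD."""
    cols = list(zip(*rows))
    stats = [(mean(c), stdev(c)) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def sq_euclidean(u, v):
    """Squared Euclidean distance between two observation vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def single_linkage(points, n_clusters):
    """Agglomerative clustering with nearest-neighbor (single) linkage.

    Starts with one cluster per observation and repeatedly merges the two
    clusters whose closest members are nearest (squared Euclidean distance)
    until n_clusters remain. Returns clusters as sorted lists of row indices.
    """
    clusters = [{i} for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(sq_euclidean(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] |= clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data, NOT measurements from this study: rows are sites
# A-F, columns are pH, conductivity (uS/cm) and PO4 (mg/L).
sites = ["A", "B", "C", "D", "E", "F"]
data = [
    [8.6, 400.0, 0.30],
    [8.7, 410.0, 0.32],
    [9.1, 800.0, 0.65],
    [9.2, 820.0, 0.70],
    [8.9, 600.0, 0.50],
    [8.8, 590.0, 0.48],
]

ph, cond, po4 = zip(*data)
print("r(pH, conductivity) =", round(pearson(ph, cond), 3))

z = zscore_columns(data)  # normalize so no single variable dominates distances
groups = single_linkage(z, n_clusters=3)
print([[sites[i] for i in g] for g in groups])  # [['A', 'B'], ['C', 'D'], ['E', 'F']]
```

Normalizing first matters because conductivity values are hundreds of times larger than pH or PO₄³⁻ values; without it, the distances would effectively reflect conductivity alone.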

Quality Assurance
All glassware and glass fiber filters were rinsed with deionized water with a resistivity of 18 MΩ·cm (Barnstead, Chicago, Ill) and heated at 450 °C prior to use. The relative standard deviation (RSD), determined by analyzing a standard 10 times and dividing the standard deviation of the replicates by their mean, was less than 4% for all analyses.
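The RSD check described above amounts to a single ratio. The sketch below assumes the sample standard deviation (n - 1 denominator) is used, which the text does not state. The replicate readings and the helper name `rsd_percent` are hypothetical and serve only to illustrate the calculation and the 4% acceptance limit.

```python
from statistics import mean, stdev


def rsd_percent(replicates):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(replicates) / mean(replicates)


# Hypothetical replicate readings of one standard (10 runs, arbitrary units);
# illustrative values only, not data from this study.
readings = [1.00, 1.02, 0.98, 1.01, 0.99, 1.00, 1.03, 0.97, 1.00, 1.00]
rsd = rsd_percent(readings)
print(f"RSD = {rsd:.2f}%  within 4% limit: {rsd < 4.0}")
```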

Water Analysis Results
The results for the measured water quality variables in the two sampling seasons are shown in Table 1. The pH values were alkaline (8.6-9.2), with the highest value at site S8A, which receives Lerma River inputs directly. The mean pH value was 8.8, which lies within the appropriate range for aquatic life [27]. Conductivity values were higher at the sites sampled after the rainy season than in the dry season campaign; sites S1B and S8B presented the highest values (840 and 860 µS·cm⁻¹, respectively). Alkalinity values were highest at sites S2A and S7A (318 and 340 mg·L⁻¹, respectively). These values can be attributed to the bicarbonate ion entering the lake dissociating to carbon dioxide, which in turn is consumed by algae for their synthesis needs [28]. Sites S2A and S7A are located in a zone where tourism activities take place and where wastewater discharges, mainly of domestic origin, provide an appropriate medium for algal proliferation. Nutrient variables such as NO₂⁻ and PO₄³⁻ showed significant variations from one season to the other. For example, NO₂⁻ could be detected at sites S1A, S4A and S6A; NO₃⁻ values ranged between 1.7 and 12.5 mg·L⁻¹, with the highest values found after the rainy season. PO₄³⁻ concentrations were in the range of 0.3-0.7 mg·L⁻¹, with the highest values measured during the dry season. A significant decrease of N_T, NO₂⁻ and NO₃⁻ was observed after the rainy season, with undetected values at site S6B. The highest N_T values were measured at site S4A during the dry period. They are attributed to the location of this site in a hilly zone with intense raspberry and blueberry cultivation, where large amounts of N- and P-rich pesticides are widely used and may end up in the lake through illegal discharges. Total solids followed an approximately constant trend at all sites during the two sampling seasons, except at site S1A. This site is located at the entrance of the Lerma River, which is the main contributor of suspended particles to Lake Chapala. COD values were higher after the rainy season; site S4B presented the highest value (34 mg·L⁻¹), site S9A the lowest (4 mg·L⁻¹), and the mean value for the two sampling campaigns was 17.20 mg·L⁻¹. SO₄²⁻ values showed little variation throughout the lake (64-82 mg·L⁻¹) during the two campaigns, with a mean value of 76.7 mg·L⁻¹ and the highest concentrations obtained after the rainy season campaign.
For metals, Fe and Pb predominated in both seasons; the maximum Fe value was measured at site S4A (2.9 mg·L⁻¹), and Pb was present at sites S1A, S2A, S5A, S8A and S10A (1.3 mg·L⁻¹ at each site); Cr was detected at sites S1A and S5A. Cd, one of the most toxic metals, was present at 0.2 mg·L⁻¹ at sites S1A, S4A, S7A and S10A. In general, the highest metal concentrations occurred during the dry season. This trend can be explained by an unusual rain event in February 2009 that carried metals into the lake as runoff from the nearby mountains. Increases in metal concentrations in surface water bodies during unusual rain events have been reported by others (Roberto et al. 2008; Gupta et al. 2010). Compared with international water quality criteria for aquatic life [29], only Zn concentrations were below the criteria, whereas Ni, Pb, Cd and Fe exceeded the corresponding values (Ni: 52 µg·L⁻¹, Pb: 2.5 µg·L⁻¹, Cd: 0.25 µg·L⁻¹, and Fe: 1000 µg·L⁻¹).

Statistical Analysis Results
Few correlations were found between the monitored variables, and they were generally positive (see Table 2). pH had the greatest number of correlations: negative with conductivity and slightly positive with NO₂⁻ and Pb. Other significant positive correlations were found between Pb and Cd (0.6) and between Pb and Ni (0.62). Finally, conductivity presented weak negative correlations with the metals, except with Fe.

Factor Analysis Results
After principal component extraction and varimax rotation, six factors were retained that together explained 81.8% of the variability of the original data. The loadings of each variable on each factor are shown in Table 3 and Figure 2 (biplots). Factor F1 (34.7% of variance) comprised variables associated with domestic wastewater and agricultural runoff (pH, conductivity S, COD, PO₄³⁻) and suspended particles (TS) from erosion, human and agricultural activities; factors F4 (9.8% of variance) and F5 (6.9% of variance) comprised Fe and Zn, respectively; factor F6 (6.3% of variance) included N_T, a nutrient associated with the eutrophication of surface waters. Based on the factor analysis, the contaminants exerting the highest environmental stress on Lake Chapala are wastes of domestic origin, followed by heavy metals from either natural sources or nonpoint discharges containing industrial residues. Finally, the contaminants exerting the least environmental stress are those from agricultural activities along the lake and the inputs from the Lerma River.

Cluster Analysis Results
Ward's method and the nearest-neighbor (single-linkage) method were applied in the cluster analysis using the averaged concentration of each variable; the results are presented in Figure 3. The first cluster contained sites S4 (Jocotepec) and S5 (San Juan Cosala). These sites are located in the zones with the highest agricultural and tourism activity, and consequently this cluster represents the sites under the highest environmental stress in Lake Chapala.
The second cluster contains sites S3 and S9, which are located close to land where agricultural activities predominate. The third cluster comprises site S7, where tourism activity is lower than at sites S4 and S5. The fourth cluster contains site S10, which is far from human settlements and from agricultural or industrial activities. Sites S2 and S6 form the fifth cluster; these sites are characterized by little commercial, agricultural or tourism activity. Notably, site S6 is where the water intake that supplies the nearby city of Guadalajara is located. The sixth and seventh clusters comprise sites S8 and S1, respectively; these are the least contaminated sites of the lake. Based on the cluster analysis, the environmental stress on Lake Chapala arises mainly from the activities of the populations located along the lakeshore, and the stress exerted on the lake by waste sources follows the order: domestic > agricultural > industrial.

Conclusion
Most of the analyzed physicochemical variables of Lake Chapala waters are within international guideline values (WHO and USEPA). The presence of Cd, Ni and Pb (even in small amounts) in the lake waters indicates the need for careful monitoring of the water supplied to the city of Guadalajara for human consumption. Phosphates and nitrates exceed the corresponding limits set by WHO and USEPA. The high concentrations of these nutrients were attributed to the widespread use of phosphate-rich fertilizers and detergents on land along the lake shoreline and along parts of the Lerma River close to the lake. A study is in progress to determine more accurately (via transport models) the point sources of metal pollution, which would help in proposing pollution control policies aimed at reducing the presence of toxic metals in the lake.

Figure 3. Cluster analysis results on water samples of ten sites of Lake Chapala.

Table 1. Descriptive statistics of ten variables and six metals (as total metal) measured in waters of ten sites of Lake Chapala. Concentrations in mg·L⁻¹; conductivity S in µS·cm⁻¹.
* Alkalinity as

Table 2. Correlation analysis of ten variables and six metals at ten sites of Lake Chapala.

Table 3. Factor loadings after varimax rotation of ten variables and six metals at ten sites of Lake Chapala.