The Water Quality Evaluation in Balihe Lake Based on Principal Component Analysis

Water pollution in Balihe Lake, the largest tributary of the Shaying River Basin in Anhui Province, China, has put great pressure on the improvement of water quality in the Huai River. On October 16, 2017, 11 major pollution indexes were measured at 15 sampling points in Balihe Lake. Based on the measured data, the water quality in Balihe Lake was analyzed using Principal Component Analysis (PCA) in SPSS. The results suggest that the major components are oxygen-consuming pollutants, eutrophication pollutants and ammonia nitrogen, among which oxygen-consuming pollutants play the dominant role. In addition, the upper part of Balihe Lake is seriously polluted and needs treatment focused on oxygen-consuming pollutants.


Introduction
As water pollution becomes an increasingly prominent problem, research on comprehensive water quality evaluation methods becomes particularly important (Sina, 2017; Noori et al., 2019). In a comprehensive water quality evaluation, the large number of complex environmental factors makes the work heavy and the data analysis difficult, and the root cause of water quality deterioration may still not be found (Mena-Rivera et al., 2017; Yang, 2010).
At present, many comprehensive water quality evaluation methods are in common use, such as the comprehensive index method, the fuzzy evaluation method and the neural network method (Deng & Li, 2010; Wong & Hu, 2013). Although these methods can evaluate the water quality status well, they cannot determine the main factors affecting water quality (Chang et al., 2011). Principal Component Analysis (PCA) can extract the relevant factors from many variables, determine the main factors affecting water quality, and thus provide a reasonable explanation (Friedman, Hastie, & Tibshirani, 2010; Olsen et al., 2012; Zhong et al., 2018; Sun et al., 2019). For example, Sun et al. (2019) used PCA to analyze the temporal and spatial patterns of river water quality and to evaluate the pollution status of a natural river. Similar studies can also be found for lake ecosystems (Zhong et al., 2018).
In this study, a set of field sampling data was collected in a freshwater lake and used to evaluate the water quality in the sampling area with the PCA method, in order to identify the main factors affecting the water quality in the area and to provide guidance for water environmental governance and improvement.

Study Area
Balihe Lake is located at the intersection area of the Huaihe River and the Shaying River in Fuyang City, Anhui Province, P. R. China. An artificially excavated lake, Balihe Lake was originally the largest tributary of the Shaying River Basin in Anhui Province. With geographical coordinates of E116˚14'-116˚19' and N32˚33'-32˚36', it belongs to the semi-humid monsoon climate zone of the subtropical and warm temperate zones. The total drainage area of Balihe Lake is about 500 km², accounting for about one-eighth of the total area of the Anhui section of the Shaying River Basin. In addition, as can be seen in Figure 1, three rivers, the Disanhugou River, the Liugou River and the Wulihugou River, flow into the lake.
Water pollution in the Balihe Lake Basin not only seriously affected the economic development of the basin and the stability of the ecosystem, but also affected the ecological environment and water quality of the Shaying River, bringing tremendous pressure to the improvement of water quality in the Huaihe River Basin. Generally speaking, there were two main sources of water pollution in the lake drainage area: first, non-point source pollution carried from the lake shoreline by rainfall runoff; second, pollutants carried by the rivers flowing into the lake. Therefore, the analysis of water quality at different locations in Balihe Lake is of great significance to water pollution control in the Huaihe River Basin. In this study, the PCA method was applied to the comprehensive water quality evaluation of Balihe Lake. The water pollution status of Balihe Lake was analyzed comprehensively and the main pollution factors were identified, which may provide some guidance for water pollution control in Balihe Lake and the Huaihe River Basin.
[Figure 1: map of Balihe Lake; labels surviving in the extracted text: sampling sites S8 to S15, Wulihugou River, Disanhugou River, Outlet, Balihe Lake.]
Journal of Geoscience and Environment Protection

Sample Collection and Analysis
In October 2017, a field sampling survey was carried out at 15 sampling sites in Balihe Lake (see Figure 1). Surface water samples were collected because the water depth in the survey area was within 10 m. Based on the indicators of greatest concern in water environment monitoring in China, the water quality indicators included dissolved oxygen (DO), total nitrogen (TN), total dissolved nitrogen (TDN), ammonia nitrogen (NH3-N), nitrate nitrogen (

PCA Method
Principal component analysis (PCA), also known as principal variable analysis, uses the idea of dimensionality reduction to transform multiple indicators into a few comprehensive indicators while minimizing the loss of information in the data (Debels et al., 2005; Ouyang, 2005). In PCA, each comprehensive indicator obtained by the transformation is called a principal component. Each principal component is a linear combination of the original variables, and the principal components are uncorrelated with one another. Therefore, only a few principal components need to be considered to capture the main features of a complex problem and to avoid collinearity between variables, while the main information in the original data is retained. As a result, the efficiency of the analysis can be improved significantly. Using IBM SPSS Statistics 25.0, PCA was carried out on the 11 water quality indicators of the 15 sampling sites mentioned above.

Standardization of the Experimental Data
The original data of these 11 indexes were standardized to eliminate the influence of magnitude and dimension among the different indexes. After standardization, each index has a mean of 0 and a standard deviation of 1. Equation (1) is the calculation formula, and the results are shown in Table 1.
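Equation (1) is not reproduced in this text. A standardization that yields a mean of 0 and a standard deviation of 1 is the usual z-score, sketched below. The symbols are assumptions introduced here for illustration: x_ij is the measured value of index j at site i, n = 15 is the number of sites, and the use of the sample (n - 1) standard deviation is assumed rather than confirmed by the source.

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}
```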
L. Zhang et al.

Correlation Analysis and Principal Component Extraction
The standardized data were analyzed by the PCA method. Table 2 shows that the KMO statistic is 0.624 (>0.500) and that the significance level of Bartlett's test of sphericity is less than 0.001. This indicates that the variables are interrelated and that the data meet the basic requirements of PCA.
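For readers without SPSS, the two suitability checks can be sketched from their textbook definitions. This is a minimal illustration and not the SPSS implementation: the function names `bartlett_sphericity` and `kmo` are hypothetical, and the input is synthetic placeholder data, not the measured values. Converting the chi-square statistic to a significance level would additionally need a chi-square distribution function, which is left out here.

```python
import numpy as np

def bartlett_sphericity(data):
    """Return (chi-square statistic, degrees of freedom) of Bartlett's test."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # ln|R| computed stably; |R| <= 1 for a correlation matrix, so logdet <= 0
    _, logdet = np.linalg.slogdet(corr)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof

def kmo(data):
    """Return the overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations obtained from the inverse correlation matrix
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + a2)

# Synthetic placeholder data: 15 sites x 4 indicators sharing a common factor
rng = np.random.default_rng(42)
common = rng.normal(size=(15, 1))
data = common + 0.5 * rng.normal(size=(15, 4))

chi2, dof = bartlett_sphericity(data)
kmo_value = kmo(data)
print(f"Bartlett chi2 = {chi2:.2f} (df = {dof}), KMO = {kmo_value:.3f}")
```

A KMO value above 0.5, as reported in Table 2, is the threshold used in the text for proceeding with PCA.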
Spearman correlation analysis was used to examine the correlations among these 11 indicators, and the resulting correlation coefficients are shown in Table 3. The greater the absolute value of the correlation coefficient between two indicators, the stronger the correlation between them. A positive coefficient indicates a positive correlation between the two indicators, and a negative coefficient indicates a negative one. As can be seen in Table 3, some indicators are strongly correlated. For example, 7 indicators are negatively correlated with DO, which indicates that these indicators may be oxygen-consuming ones. Chl-a is positively correlated with DO, which is consistent with the understanding that Chl-a is the main pigment for photosynthesis. Besides, there are strong positive correlations between TP, TDP and
According to the explanatory table of total variance (Table 4)
The factorial load matrix (Table 5)
The factorial load matrix is not the principal component coefficient matrix. Dividing each column of the factor load matrix by the square root of the corresponding principal component eigenvalue gives the principal component coefficient matrix (Table 6). Multiplying the standardized data by this coefficient matrix gives the evaluation functions F1, F2 and F3 corresponding to each principal component, from which the comprehensive evaluation function F can be obtained. Based on these evaluation functions, the pollution score of each sampling site can be described quantitatively. The higher the score, the more serious the pollution. The expressions of each function are as follows:
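The steps from load matrix to site scores can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SPSS workflow: the data are synthetic placeholders, two components are retained only for the demonstration, and the comprehensive score F is assumed to weight each component by its share of the retained variance, a common choice that the source does not spell out.

```python
import numpy as np

# Synthetic placeholder data: 15 sites x 4 indicators (not the measured values)
rng = np.random.default_rng(0)
raw = rng.normal(size=(15, 4))
raw[:, 1] += 0.8 * raw[:, 0]  # induce some correlation between two indicators

# Standardize each indicator (sample standard deviation, an assumption)
z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

# Eigen-decomposition of the correlation matrix, sorted by descending eigenvalue
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2  # components retained in this demo
# Factor load matrix: eigenvectors scaled by the square root of the eigenvalues
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
# Coefficient matrix: load matrix divided by the square root of the eigenvalues
coeffs = loadings / np.sqrt(eigvals[:k])

# Evaluation functions F1..Fk for every site
scores = z @ coeffs

# Comprehensive score F: weighted by share of retained variance (assumed weighting)
weights = eigvals[:k] / eigvals[:k].sum()
composite = scores @ weights

print(np.round(composite, 3))
```

Dividing the loadings by the square root of the eigenvalues recovers the unit-length eigenvectors, which is why the load matrix itself cannot be used directly as the coefficient matrix.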

Comprehensive Water Quality Evaluation
The scores and ranks of principal components 1, 2 and 3 and of the comprehensive principal component were calculated and are shown in Table 7. As can be seen in the table, 1) according to the ranks for the 1st principal component, the top five sampling sites are 4, 5, 6, 3 and 2, indicating that oxygen-consuming pollution is relatively serious at these places, with site 4 ranking highest; 2) for principal component 2, the top five sites are 8, 7, 6, 5 and 10, which means that eutrophication pollution may be more serious there than at the other places; 3) the top five sites for principal component 3 are 1, 2, 3, 8 and 4, where ammonia nitrogen pollution may be serious; 4) the top five sites for the comprehensive principal component, 4, 5, 6, 7 and 3, are similar to those of the 1st principal component. Considering items 1) and 4) together, it can be concluded that the upper part of Balihe Lake is seriously affected by oxygen-consuming pollution and should be treated adequately. In addition, the main treatment measures should target the sources of oxygen-consuming pollutants, such as domestic sources and non-point sources. The lower ranks of sites 11, 12, 13, 14 and 15 indicate that the water quality of the lower part of the lake is better, so conservative measures may be taken in this area.
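The ranking step behind a table such as Table 7 amounts to sorting the sites by score in descending order. The sketch below uses a hypothetical helper `rank_sites` and made-up scores for four sites; none of the numbers are taken from the paper.

```python
def rank_sites(scores):
    """Return {site: rank}, where rank 1 is the highest (most polluted) score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {site: position + 1 for position, site in enumerate(ordered)}

# Hypothetical comprehensive scores, for illustration only
example_scores = {"S1": 0.10, "S4": 1.20, "S5": 0.95, "S12": -0.80}

ranks = rank_sites(example_scores)
# Sites listed from most to least polluted, keeping the top two
top_sites = [site for site, _ in sorted(ranks.items(), key=lambda kv: kv[1])][:2]
print(ranks, top_sites)
```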

Conclusion
Based on the field measurements of 11 environmental factors at 15 sampling sites, the water quality in Balihe Lake was evaluated using the PCA method. The following conclusions can be drawn.
1) There are obvious correlations between some of these 11 environmental factors. The 3 extracted principal components, accounting for 92.016% of the total variance, explain the water quality status in Balihe Lake well. The 1st, 2nd and 3rd principal components represent pollution by oxygen-consuming pollutants, eutrophication and ammonia nitrogen, respectively.
2) Sampling sites 4, 5, 6, 7 and 3, which have relatively high PCA scores, are all located in the upper part of the lake. Pollution at these sites is more serious and the main pollutants are oxygen-consuming. Therefore, more attention should be paid to these areas in future water pollution prevention and treatment.
3) Sites 11, 12, 13, 14 and 15, which have lower PCA scores, are concentrated in the lower part of the lake. Given the better water quality in this area, conservative measures may be taken there.
Although the results of this study may provide some guidance for water pollution prevention and treatment in Balihe Lake, more research on this topic based on other methods is necessary in the future.