Rating Curve Estimation of Surface Water Quality Data Using LOADEST

Nutrient concentrations in streams are usually measured on a weekly, biweekly or monthly basis because of limited resources, so concentrations and loads must be estimated for the periods when no data are available. The objectives of this study were to test the performance of a suite of regression models in predicting continuous water quality loading data and to determine systematic biases in the prediction. This study used the LOADEST model, which includes several predefined regression models that specify the model form and complexity. Water quality data, primarily nitrogen and phosphorus, from five monitoring stations in the Neuse River Basin in North Carolina, USA were used in the development and analyses of rating curves. We found that LOADEST generally performed well in predicting loads and observed trends, with a general tendency towards overestimation. Estimated Total Nitrogen (TN) varied from observation ("true" load) by −1% to 9%, whereas Total Phosphorus (TP) varied by −2% to 27%. Statistical evaluation using R², Nash-Sutcliffe Efficiency (NSE) and Partial Load Factor (PLF) showed a strong correlation in prediction.


Introduction
Accurate estimation of nutrient loads in rivers and streams is necessary for many applications, including determining sources of nutrient loads in watersheds [1,2], calibrating and validating watershed models [3-6] and evaluating long-term trends in loads [7,8]. The instantaneous load is the product of the nutrient concentration C(t) and the discharge Q(t) at a given time t, and the load over an extended period of time T is given by

$$L_T = \int_0^T C(t)\,Q(t)\,dt$$

Limited resources prohibit the continuous measurement of water quality constituent concentration and discharge on a long-term basis. Even though daily discharge measurements are frequently available, nutrient concentrations are measured less frequently, and the gap between measurements can range from weeks to months. Estimating the load over an extended period requires continuous data, so the weekly or monthly data must be converted into daily data. It is well known that nutrient load estimation is subject to many potential sources of error and uncertainty [9], and rating curve generation is one of them.
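The load over a period, that is the time integral of C(t)·Q(t), can be approximated by a sum of daily loads when daily values are available. The following is a minimal sketch, assuming daily mean concentration in mg/L and daily mean discharge in m³/s; the units and function names are illustrative assumptions, not taken from the study.

```python
SECONDS_PER_DAY = 86_400


def daily_load_kg(conc_mg_per_l, flow_m3_per_s):
    """Load for one day in kg from mean concentration (mg/L) and mean flow (m^3/s)."""
    # 1 mg/L equals 1 g/m^3, so C * Q is in g/s; scale to kg per day.
    return conc_mg_per_l * flow_m3_per_s * SECONDS_PER_DAY / 1000.0


def total_load_kg(conc_series, flow_series):
    """Approximate the integral of C(t) * Q(t) dt by summing daily loads."""
    if len(conc_series) != len(flow_series):
        raise ValueError("concentration and flow series must have equal length")
    return sum(daily_load_kg(c, q) for c, q in zip(conc_series, flow_series))


if __name__ == "__main__":
    # Hypothetical three-day record, for illustration only.
    conc = [1.2, 1.0, 0.9]
    flow = [10.0, 12.0, 15.0]
    print(total_load_kg(conc, flow))
```

The sum is only as good as the daily concentration series, which is why a rating curve is needed to fill the days without samples.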
There are many methods of load estimation. Many studies have compared these methods and applied various techniques to measure model performance [3,9-13]. Some studies evaluated load uncertainty and model performance by under-sampling against a true load [9,12], while others applied different algorithms to the same dataset [11,13]. These studies show that nutrient load estimates vary considerably. For example, the error in the estimated annual phosphorus load using a regression model was found to be 30% in [14] and 34% in [11]. Annual nitrate loads have differed by as much as 64% depending on the sampling strategy, load estimation method and monitoring period used [3].
This study evaluates multiple regression models [15] in predicting water quality loadings in the Neuse River Basin in North Carolina, USA (Figure 1). The model algorithms are incorporated in LOADEST, a user-friendly computer program for load estimation developed by the United States Geological Survey (USGS) [16]. It is widely used to estimate nitrogen and phosphorus loads in rivers [17-21]. The model has been used to estimate nutrient flux in the major rivers flowing to the Gulf of Mexico [22] and to calculate "observed" loads in the USGS SPARROW model [23].
Model performance was evaluated using statistical techniques by comparing model-predicted nutrient loads to actual measured loads. How well the regression model reproduced the measured loads is assumed to indicate how well the model might perform for days on which no samples were collected. The objectives of this study were to test the performance of a suite of regression models in predicting continuous water quality loading data and to determine systematic bias in the prediction.

LOADEST Description
LOADEST was developed by the USGS [16] to estimate constituent loads in streams and rivers through a set of regression models. It requires a time series of streamflow, additional data variables and constituent concentrations. In its simplest form, the linear regression equation relates the log of the instantaneous load L to one or more explanatory variables:

$$\ln(L) = a_0 + \sum_{j=1}^{NV} a_j X_j$$

where a_0 and a_j are model coefficients, NV is the number of explanatory variables, and X_j is an explanatory variable. The number and form of the explanatory variables depend strongly on the system and the constituent under study. For example, one variable (log of streamflow) was found sufficient for the prediction of suspended sediments [24], whereas a model with six explanatory variables was found suitable for a nutrient rating curve [15]. While the estimation method is straightforward, the process is complicated by several statistical issues, including retransformation bias, data censoring and non-normality of the data. Reference [25] noted that model bias could cause estimates to deviate by as much as 50 percent from the true load. LOADEST provides three methods to deal with these issues: maximum likelihood estimation (MLE), adjusted maximum likelihood estimation (AMLE) and least absolute deviation (LAD).
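A minimal sketch of the simplest rating curve mentioned above, with the log of streamflow as the only explanatory variable. It uses ordinary least squares rather than LOADEST's estimation methods, and applies Duan's smearing estimator as one common correction for retransformation bias. Both are substitutions made for illustration, and all function names are hypothetical.

```python
import math


def fit_log_linear(flows, loads):
    """Ordinary least squares fit of ln(L) = a0 + a1 * ln(Q)."""
    x = [math.log(q) for q in flows]
    y = [math.log(load) for load in loads]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = y_mean - a1 * x_mean
    residuals = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
    return a0, a1, residuals


def smearing_factor(residuals):
    """Duan's smearing estimator: the mean of exp(residual) in log space."""
    return sum(math.exp(r) for r in residuals) / len(residuals)


def predict_load(flow, a0, a1, correction=1.0):
    """Back-transform the log-space prediction, scaled by a bias correction."""
    return correction * math.exp(a0 + a1 * math.log(flow))


if __name__ == "__main__":
    # Hypothetical paired samples of flow and load, for illustration only.
    flows = [2.0, 5.0, 9.0, 20.0, 40.0]
    loads = [11.0, 35.0, 70.0, 210.0, 480.0]
    a0, a1, res = fit_log_linear(flows, loads)
    print(predict_load(15.0, a0, a1, smearing_factor(res)))
```

Without the correction factor, exponentiating the fitted log-load tends to underestimate the mean load, which is the retransformation bias the text refers to.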
In this study, we used the AMLE method [26], the primary load estimation method, which assumes that the model residuals are normally distributed with constant variance. LOADEST provides a series of 9 predefined modeling options that differ in the number of explanatory variables. The predefined model can be selected based on the user's knowledge of the hydrologic and biogeochemical system. Alternatively, the model's automated selection method can be used, which selects as the "best" model the one with the lowest value of the Akaike Information Criterion (AIC) and the highest value of the Schwarz Posterior Probability Criterion (SPPC). Further background information on LOADEST can be found in [16].
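The idea of selecting the candidate with the lowest AIC can be sketched as follows. This is an illustration under stated assumptions, not LOADEST's implementation: it fits each candidate by ordinary least squares, uses the textbook Gaussian form AIC = n·ln(RSS/n) + 2k (up to an additive constant), and compares only two hypothetical candidates rather than the nine predefined models.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares_rss(design, y):
    """Fit y on the design rows via normal equations; return the residual sum of squares."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def gaussian_aic(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant."""
    rss = max(rss, 1e-300)  # guard against log(0) for an exact fit
    return n * math.log(rss / n) + 2 * k


def select_by_aic(candidates, ln_q, ln_load):
    """Return the name of the lowest-AIC candidate and all AIC scores."""
    n = len(ln_load)
    scores = {}
    for name, make_row in candidates.items():
        design = [make_row(x) for x in ln_q]
        rss = least_squares_rss(design, ln_load)
        scores[name] = gaussian_aic(rss, n, len(design[0]))
    return min(scores, key=scores.get), scores


# Hypothetical candidate forms: each maps ln(Q) to a row of regressors.
CANDIDATES = {
    "linear": lambda x: [1.0, x],
    "quadratic": lambda x: [1.0, x, x * x],
}
```

The 2k term penalizes extra coefficients, so a more complex candidate is chosen only when it reduces the residual sum of squares enough to offset the penalty.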

Study Area and Data Analysis
This study performed statistical analyses to evaluate historical water quality and flow data collected from water quality monitoring stations located throughout the Neuse River Basin, North Carolina, USA (Figure 1). The basin is located in the southeastern part of the country and drains an area of over 15,540 km² into the Atlantic Ocean. A major threat to water quality in the basin is the large quantity of nutrients, primarily nitrogen and phosphorus, contributed mainly by non-point sources.
Water quality and flow data were obtained for five water quality monitoring stations across the watershed from USGS (www.nwis.waterdata.usgs.gov). A total of seven water quality parameters, mainly nitrogen, phosphorus and their variants, were collected. Table 1 lists pertinent information on these sites. While total nitrogen (TN) and total phosphorus (TP) have a significant amount of monitoring data, data for the others (ammonia, nitrate, nitrite and ortho-phosphate) are limited and in many cases unavailable. Monitoring locations were selected so that drainage area varied among sites, allowing possible correlation between drainage area and model performance to be examined.
The performance of the regression model was evaluated using three statistical measures.
1) The partial load factor (PLF) was obtained by dividing the long-term average estimated load by the long-term average measured load. A PLF of 1 means perfect estimation. A value less than 1 indicates under prediction, while a value greater than 1 indicates over prediction.
2) The Nash-Sutcliffe Efficiency (NSE) coefficient can be defined as:

$$E = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2}$$

where O_i is the i-th observed value, P_i is the corresponding predicted value, and \bar{O} is the mean of the observed values. The value of NSE can range from −∞ to 1. An efficiency of 1 (E = 1) corresponds to a perfect match between modeled and observed values. An efficiency of 0 (E = 0) indicates that the model predictions are as accurate as the mean of the observed data, whereas an efficiency less than zero (E < 0) occurs when the residual variance is larger than the data variance.
3) The coefficient of determination (R²) describes the degree of collinearity between simulated and measured data. It ranges from 0 to 1, with higher values indicating less error variance; values greater than 0.5 are typically considered acceptable.
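The three measures listed above can be computed from paired estimated and measured loads. The sketch below uses the standard definitions (NSE as one minus the ratio of the squared prediction errors to the variance of the observations about their mean, and R² as the squared Pearson correlation); the function names are illustrative, not part of LOADEST.

```python
import math


def partial_load_factor(estimated, measured):
    """Ratio of mean estimated load to mean measured load (1 = perfect)."""
    return (sum(estimated) / len(estimated)) / (sum(measured) / len(measured))


def nash_sutcliffe(estimated, measured):
    """Nash-Sutcliffe Efficiency: 1 - sum((O - P)^2) / sum((O - mean(O))^2)."""
    mean_obs = sum(measured) / len(measured)
    error_ss = sum((o - p) ** 2 for o, p in zip(measured, estimated))
    total_ss = sum((o - mean_obs) ** 2 for o in measured)
    return 1.0 - error_ss / total_ss


def r_squared(estimated, measured):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(measured)
    mean_e = sum(estimated) / n
    mean_o = sum(measured) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, measured))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in measured)
    return (cov / math.sqrt(var_e * var_o)) ** 2


if __name__ == "__main__":
    # Hypothetical loads: every estimate is exactly double the measurement.
    measured = [1.0, 2.0, 3.0, 4.0]
    estimated = [2.0, 4.0, 6.0, 8.0]
    print(partial_load_factor(estimated, measured))
    print(nash_sutcliffe(estimated, measured))
    print(r_squared(estimated, measured))
```

In the hypothetical example, R² is insensitive to the proportional over prediction, while PLF and NSE both expose it, which is the reason for reporting all three.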

Results and Discussion
The collected water quality data from all five stations were tested for outliers and consistency, and then reformatted as required by LOADEST. The model was then executed for each parameter individually. The model's built-in option for automatic selection of the predefined regression model using the AIC statistic was chosen in all model executions. The model output for the AMLE estimation method was selected for analysis.

LOADEST Prediction
The model output for the average annual loading was compared with the "true" loading, which was estimated by calculating the instantaneous daily loading (multiplying concentration by daily flow volume) and multiplying the resulting load by 365 days to obtain the annual loading. Table 2 shows the percent difference between the loading predicted by LOADEST and the true loading. It indicates that the model over predicted the loading in most cases. TN was over predicted by an average of 5% relative to the true loading at four stations and under predicted by 1% at one station. Ammonia and TP were over predicted by 43% and 28%, respectively. This is consistent with the results of [14] and [11], where the TP load was found to differ by 30% and 34%, respectively.
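The comparison described above can be sketched as follows. The sketch assumes that the "true" annual loading is the mean of the sampled daily loads scaled by 365 days, with concentration in mg/L and daily flow volume in m³; this averaging step, the units and the function names are assumptions made for illustration.

```python
def annual_load_from_samples(conc_mg_per_l, daily_volume_m3):
    """Mean sampled daily load (kg/day) scaled to an annual load (kg/yr)."""
    # mg/L times m^3 gives grams; divide by 1000 for kilograms.
    daily_loads_kg = [c * v / 1000.0 for c, v in zip(conc_mg_per_l, daily_volume_m3)]
    mean_daily_kg = sum(daily_loads_kg) / len(daily_loads_kg)
    return mean_daily_kg * 365


def percent_difference(estimated, true):
    """Signed percent difference; positive values mean over prediction."""
    return 100.0 * (estimated - true) / true


if __name__ == "__main__":
    # Hypothetical samples and model estimate, for illustration only.
    true_load = annual_load_from_samples([1.0, 3.0], [1000.0, 1000.0])
    print(true_load)
    print(percent_difference(766.5, true_load))
```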
Model-estimated loads were compared with the measured loads, and the statistical indicators PLF, R² and NSE were calculated. Table 3 lists all statistics for each of the seven parameters at each monitoring location. R² values for TN at all stations varied from 0.97 to 0.99, indicating a very strong correlation between estimated and measured loads. NSE also indicated good agreement, with values ranging from 0.71 to 0.92. PLF indicated that predictions ranged from −2% to +9% of the measured load, a positive bias with mild over prediction.
TP prediction showed results similar to those of TN, except for station 02087500, where NSE was −1.76. The negative efficiency indicated a bias in prediction, with values clustered well in one place (higher R²) but far from the 1:1 line of the estimated vs. measured comparison. PLF values identified a range of prediction from −23% to +28%, indicating a roughly balanced bias with slight over prediction.
Org N estimation showed good correlation, with a bias toward over prediction that ranged from −1% to +12%. Ammonia was significantly over predicted (43%), and the correlation was very poor to nonexistent. Nitrate, nitrite and ortho-phosphate behaved similarly, with good correlation and little to no over prediction. Figures 2 and 3 are example plots of the best and worst predictions of LOADEST as determined by R².
The analysis was also extended to examine whether the sample size has any impact on the model's prediction accuracy. There appeared to be a pattern of smaller sample sizes causing over prediction, but it was not conclusive. For example, a sample size of 77 at site # 02087500 had an average annual loading of 7394 kg/km², whereas a sample size of 294 at site # 02089500 had 3162 kg/km².

Time Series Analysis
Time series graphs (Figure 4) were plotted for two parameters, TN and TP, for all five monitoring sites for which complete datasets were available. The graphs facilitated a visual inspection of model performance and identified possible bias and trends both in the model predictions and in the observed (measured) data. As indicated in Figure 4, LOADEST-estimated loadings were extrapolated beyond the period for which measured data were available. This is because the flow data were available for a longer duration and all of them were used in the LOADEST estimation. The load estimation pattern differed between simulations (i.e. model executions). This may be because each simulation chose a different regression model from the set of 9 predefined models.
LOADEST selects the best model based on the minimum value of AIC, which balances goodness-of-fit against model complexity. For three sites, the model selected the same predefined model for both TN and TP, but the selection differed for the other two sites. No conclusion could be reached on whether the model selection was adequate. More research is needed in this area.
In general, the prediction and its trend seem to follow the observed data very well. Site-by-site analyses of model prediction are provided in Table 4.
Overall, the LOADEST model was found to perform well in predicting loads. However, a clear understanding of the regression models is needed for better application of the model. Analyzing several of the regression models provided within the LOADEST environment may provide enough evidence to select the best model for load estimation. Even when the statistical indicators show strong correlation, they can often be misleading, so a visual inspection is necessary to complement the process of selecting the best regression model for the chosen parameter and study region.
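One simple numerical companion to the visual inspection is to check how far the estimated vs. measured points lie from the 1:1 line. The sketch below fits a straight line of estimated on measured loads and reports its slope, intercept and the mean bias; it is an illustrative diagnostic with hypothetical names, not a procedure used in the study.

```python
def one_to_one_diagnostics(measured, estimated):
    """Slope and intercept of estimated vs. measured, plus the mean bias.

    Points on the 1:1 line give slope 1, intercept 0 and mean bias 0.
    """
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    sxx = sum((m - mean_m) ** 2 for m in measured)
    sxy = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_m
    mean_bias = mean_e - mean_m
    return slope, intercept, mean_bias


if __name__ == "__main__":
    # Hypothetical loads: perfectly correlated but shifted by a constant offset.
    measured = [10.0, 12.0, 14.0, 16.0, 18.0]
    estimated = [m + 20.0 for m in measured]
    print(one_to_one_diagnostics(measured, estimated))
```

In this hypothetical case the correlation is perfect, yet the intercept and mean bias show that every estimate sits well above the 1:1 line.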

Conclusions
Accurate estimation of nutrient loads in rivers and streams is necessary for many applications. There are many methods of load estimation, and they vary widely in model selection and in the range of errors they can produce. This study used the LOADEST model, which includes several predefined regression models that specify the model form and complexity. Water quality data, primarily nitrogen and phosphorus, from five monitoring stations in the Neuse River Basin in North Carolina, USA were used in the development and analyses of the rating curves. The AMLE option was used along with the automatic option for selecting among the predefined regression models. The performance of the model was evaluated using three statistical indicators: R², NSE and PLF. We conclude that LOADEST performed well in predicting stream loading across varying sample sizes and drainage areas, with a bias towards overestimation most of the time. TN was over predicted by an average of 5% relative to the true loading at four stations and under predicted by 1% at one station. Ammonia and TP were over predicted by 43% and 28%, respectively. The model performed very well statistically for TN, as indicated by R² ranging from 0.97 to 0.99, NSE ranging from 0.71 to 0.92, and PLF ranging from −2% to 9%. Time series analyses of the trend in predicted loads produced mixed results, ranging from well-selected models to poorly fitting ones. In most cases, however, model selection did a good job of predicting loads and capturing the trend of the observed data.
Even though LOADEST seems to work well in general and the statistics suggest a strong correlation in prediction, a clear understanding of the regression model and its selection is important for its application. A time series analysis is essential for detecting potential problems with model selection and for identifying possible trends in the observed data as well as in the estimated rating curves. It is recommended to test multiple predefined regression models before settling on a final model for rating curve development. Future studies should also consider filtering the observed data and excluding exogenous data from the analysis, which may otherwise affect the model selection process and lead to erroneous prediction.

Figure 1. Neuse River Basin in North Carolina, USA with locations of water quality monitoring stations.

Table 4. Site-by-site analyses of model prediction.

Site # | Parameter | Analysis of Rating Curve
02085000 | TN | Prediction covered majority of observed data; a decreasing trend with a very mild slope
02085000 | TP | Over prediction of low concentration values; prediction may be biased due to extreme data point
02085500 | TN | Under predicts the true nature of observation; no apparent trend in observed or predicted data
02085500 | TP | Over prediction of low concentration values; prediction is too amplified; not a good choice of model
02086500 | TN | Lack of good prediction; missed more than half of the observation points; decreasing trend in observation and prediction
02086500 | TP | Poor model selection; missed majority of observation data
02089500 | TN | Prediction covered majority of points; no trend is visible
02089500 | TP | Prediction is misleading; TP seems to follow TN closely (same pattern); no clear trend is expected
02089500 | TN | Prediction followed the trend of observation; fails to capture the variation in observation; trend is not clear
02089500 | TP | An interesting and clear decreasing trend; good model selection and prediction

Figure 4. LOADEST derived rating curve estimation and comparison with the observed data.