Sea surface temperature (SST) has a significant influence on the hydrological cycle and affects stream discharge. SST is an indicator of atmospheric circulation that provides predictive information about hydrologic variability in regions around the world. Using the right SST location for a given stream gage can capture the effect of oceanic-atmospheric interaction and improve the predictive ability of a model. This study aims to identify the best SST locations for selected stream gages in the state of Utah, which spatially cover the state from south to north, and to use them to predict streamflow volume for the next six months. A data-driven model derived from statistical learning theory was used. Using an appropriate SST location together with local climatic conditions and the state of the basin, streamflow for the next six months was predicted accurately and reliably. The influence of Pacific Ocean SST on streamflow in Utah was observed to be stronger than that of Atlantic Ocean SST. North Pacific SST produced the best model at most of the selected stream gages. The robustness of each model was checked by bootstrap analysis. Long-term streamflow prediction is important for water resource planning and management at the river basin scale and is a key step for successful water resource management in arid regions.
Streamflow depends on the distribution of precipitation in time and space, which in turn depends on climatic conditions. Shivakumar [
Precise information about the quantity of water available in the next season can be very useful for agricultural planning, watershed management, and other decision-making processes [
Machine learning regression models have been used as an alternative to physically based models. The complexity of physically based models, together with the difficulty and expense of acquiring the data they require, has limited their application. Machine learning models are good at capturing the underlying physics of the system by relating input and output. They are robust and are capable of making reasonable predictions using historical data [
Five stream gages were selected at different locations in Utah, spatially covering the state from north to south (
The Relevance Vector Machine (RVM) is a supervised learning model based on sparse Bayesian learning. It has the same functional form as the support vector machine (SVM) developed by Vapnik [
For the given input-target pairs $\{x_n, t_n\}_{n=1}^{N}$, the model learns the dependency of the target $t$ (streamflow in this study) on the inputs $x$ (e.g., SST, snow, and temperature data) with the objective of making accurate predictions of the target for previously unseen values of the input [
Target
Site ID | Name | Basin Area (km²) | Stream Length (km) | Stream Slope | Gage Latitude (°) | Gage Longitude (°)
---|---|---|---|---|---|---
10128500 | Weber River near Oakley | 419.8 | 40.7 | 0.020 | 40.737 | −111.247
10131000 | Chalk Creek at Coalville | 643.1 | 60.4 | 0.010 | 40.921 | −111.401
10174500 | Sevier River at Hatch | 880.6 | 50.1 | 0.007 | 37.651 | −112.430
09330500 | Muddy Creek near Emery | 272.0 | 32.3 | 0.004 | 38.982 | −111.249
10149000 | Sixth Water Creek near Springville | 38.9 | 1.6 | 0.048 | 40.118 | −111.314
The unknown function y is the product of the design matrix (Φ) and the weight parameter (w). In vector form, Equation (1) can be written as,
where the target and weight vector are expressed as
independent Gaussian noise is assumed. Thus,
The maximum likelihood estimate of w and σ² in Equation (2) may suffer from overfitting [
mated from Bayes’ rule [
where
The predictions for new input
where
can be found in Tipping [
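For orientation, the standard RVM formulation from the sparse Bayesian learning literature is summarised below. This is the textbook form that the passage above appears to follow, not a recovery of this paper's own numbered equations. The kernel $K$, the hyperparameters $\alpha_i$ with $\mathbf{A} = \mathrm{diag}(\alpha_0, \ldots, \alpha_N)$, the posterior mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$, the basis vector $\boldsymbol{\phi}(\mathbf{x}_*)$, and the subscript MP (most probable values) are assumptions taken from that literature; $\boldsymbol{\Phi}$ is the design matrix and $\sigma^2$ the noise variance.

```latex
\begin{align*}
t_n &= y(\mathbf{x}_n; \mathbf{w}) + \varepsilon_n,
  \qquad \varepsilon_n \sim \mathcal{N}(0, \sigma^2) \\
y(\mathbf{x}; \mathbf{w}) &= \sum_{i=1}^{N} w_i K(\mathbf{x}, \mathbf{x}_i) + w_0 \\
\mathbf{t} &= \boldsymbol{\Phi}\mathbf{w} + \boldsymbol{\varepsilon} \\
p(\mathbf{t} \mid \mathbf{w}, \sigma^2) &= (2\pi\sigma^2)^{-N/2}
  \exp\!\left\{ -\frac{1}{2\sigma^2}
  \lVert \mathbf{t} - \boldsymbol{\Phi}\mathbf{w} \rVert^2 \right\} \\
p(\mathbf{w} \mid \boldsymbol{\alpha}) &= \prod_{i=0}^{N}
  \mathcal{N}\!\left(w_i \mid 0, \alpha_i^{-1}\right) \\
p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}, \sigma^2) &=
  \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}),
  \qquad
  \boldsymbol{\Sigma} = \left(\sigma^{-2}\boldsymbol{\Phi}^{\mathsf{T}}\boldsymbol{\Phi}
  + \mathbf{A}\right)^{-1},
  \qquad
  \boldsymbol{\mu} = \sigma^{-2}\boldsymbol{\Sigma}\boldsymbol{\Phi}^{\mathsf{T}}\mathbf{t} \\
p(t_* \mid \mathbf{t}, \boldsymbol{\alpha}_{\mathrm{MP}}, \sigma^2_{\mathrm{MP}}) &=
  \mathcal{N}(t_* \mid y_*, \sigma_*^2),
  \qquad
  y_* = \boldsymbol{\mu}^{\mathsf{T}}\boldsymbol{\phi}(\mathbf{x}_*),
  \qquad
  \sigma_*^2 = \sigma^2_{\mathrm{MP}}
  + \boldsymbol{\phi}(\mathbf{x}_*)^{\mathsf{T}}\boldsymbol{\Sigma}\,\boldsymbol{\phi}(\mathbf{x}_*)
\end{align*}
```

The zero-mean prior with one precision $\alpha_i$ per weight is what produces sparsity: many $\alpha_i$ grow very large during training, which drives the corresponding weights to zero and leaves only a few relevance vectors. The predictive variance $\sigma_*^2$ is the source of the confidence intervals attached to each forecast.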
The model (Equation (4)) predicts the total volume of water passing the stream gage over the next six months. Inputs to the model are past streamflow data, snow water equivalent (SWE) and temperature from nearby SnoTel stations, and SST. The input variables were selected based on the underlying physical processes and climatic factors that influence the generation of streamflow.
where
The Smith and Reynolds SST data set, which covers most of the world's oceans on a 2° × 2° grid, was used in this study [
In Utah, snow is an important variable affecting stream discharge. When precipitation falls as snow, it settles, compacts, and melts several months later, and it is a prominent source of streamflow [
Chalk#2 were used for Chalk Creek at Coalville; Buck Flat and Dill’s Camp were used for Muddy Creek near Emery; and Strawberry Divide was used for Sixth Water Creek near Springville. Although some SnoTel sites were physically outside the watershed, they were still included in the model because of their strong correlation with the streamflow processes. The SWE data were collected from the Natural Resources Conservation Service (NRCS) (http://www.wcc.nrcs.usda.gov/snow). The period 1980-2009 was used in this study because the data in the basins are relatively complete for these years.
Temperature affects the melting rate of snow, which in turn affects stream discharge. The high discharge in the spring and early summer months is due to rising temperature when there is enough snowpack in the watershed. The temperature data were also collected from the SnoTel stations operated by NRCS, and the period of data collection for local temperature was the same as that for SWE.
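As a rough illustration of how such inputs and targets might be assembled, the sketch below builds a six-month forward volume target and a trailing average from monthly series. The function names, the six-month trailing window for the predictors, and the sample numbers are all assumptions made for illustration; the text does not state the exact aggregation windows used.

```python
import numpy as np

def next_six_month_volume(monthly_volume, horizon=6):
    """For each month i, total volume over months i+1 .. i+horizon.

    Months without a complete future window are left as NaN.
    """
    flow = np.asarray(monthly_volume, dtype=float)
    target = np.full(flow.shape, np.nan)
    # csum[k] holds the sum of flow[0 .. k-1]
    csum = np.concatenate(([0.0], np.cumsum(flow)))
    for i in range(len(flow) - horizon):
        target[i] = csum[i + 1 + horizon] - csum[i + 1]
    return target

def trailing_mean(series, window=6):
    """Mean of the current month and the (window - 1) months before it.

    Months without a complete past window are left as NaN.
    """
    x = np.asarray(series, dtype=float)
    out = np.full(x.shape, np.nan)
    for i in range(window - 1, len(x)):
        out[i] = x[i - window + 1 : i + 1].mean()
    return out

# Hypothetical monthly volumes (1000 ac-ft), for illustration only
monthly = [1, 2, 3, 4, 5, 6, 7, 8]
target = next_six_month_volume(monthly)   # target[0] sums months 2..7
smoothed = trailing_mean(monthly)         # smoothed[5] averages months 1..6
```

Pairing `smoothed[i]` for each predictor with `target[i]` gives one training example per month in which both windows are complete.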
The model was trained on 1980-2001 and tested on 2002-2009 for Weber River near Oakley, Chalk Creek at Coalville, and Muddy Creek near Emery. The model for Sevier River at Hatch was trained on 1982-2001 and tested on 2002-2009, while the model for Sixth Water Creek near Springville was trained on 2000-2006 and tested on 2007-2009. Individual SST locations as well as combinations of locations were tried as inputs. The best model was selected based on the test statistics (RMSE and Nash-Sutcliffe efficiency in the test phase).
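The two test statistics named above have standard definitions, sketched here in NumPy. The function names and the sample values are illustrative assumptions, not data from the study.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared error
    to the variance of the observations about their mean."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return float(1.0 - np.sum((obs - pred) ** 2)
                 / np.sum((obs - obs.mean()) ** 2))

# Made-up volumes (1000 ac-ft), for illustration only
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 39.0]
print(round(rmse(obs, pred), 3))            # 2.121
print(round(nash_sutcliffe(obs, pred), 3))  # 0.964
```

An efficiency of 1 indicates a perfect match, and a value of 0 means the model is no better than predicting the observed mean.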
The test statistics were computed for each individual SST location for the volume of water passing the stream gage over the next six months. The SST locations that produced the best test statistics are shown in
A 95% confidence interval for the median RMSE is shown in
Using the best SST locations, the volume of water passing through each selected stream gage was predicted (
Streamflow Site | 95% CI Lower (1000 ac-ft) | 95% CI Upper (1000 ac-ft) | Best RMSE (1000 ac-ft) | Remark
---|---|---|---|---
Weber River near Oakley | 8.66 | 11.45 | 8.31 | NP, CP and TP
Chalk Creek at Coalville | 2.75 | 4.46 | 2.65 | NP
Muddy Creek near Emery | 2.52 | 4.11 | 2.44 | NP
Sevier River at Hatch | 5.36 | 7.72 | 5.04 | NP
Sixth Water Creek near Springville | 0.88 | 1.32 | 0.73 | NP

Stream Site | Test RMSE (1000 ac-ft) | Efficiency | Best Combination of SST Locations
---|---|---|---
Weber River near Oakley | 8.307 | 0.965 | NP, CP and TP
Chalk Creek at Coalville | 2.653 | 0.968 | NP
Muddy Creek near Emery | 2.438 | 0.951 | NP
Sevier River at Hatch | 5.042 | 0.987 | NP
Sixth Water Creek near Springville | 0.732 | 0.739 | NP

The best SST location for a given stream gage is discussed below. Monthly data consist of seasonal, annual, and interannual components, and the seasonal component is stronger than the others. However, when the variables are accumulated or averaged over time for the model (Equation (2)), the seasonal component is eliminated. The remaining annual to interannual components are low-frequency components. North Pacific (NP) SST has low-frequency components (annual, interannual to interdecadal), so it is reasonable that NP SST has more influence than any other SST location on the volumetric predictions at most of the streamflow sites in Utah. These sites are Chalk Creek at Coalville, Muddy Creek near Emery, Sixth Water Creek near Springville, and Sevier River at Hatch. When monthly data were used, the best prediction for Sevier River at Hatch was obtained from TP SST. However, when predictions were made for the volume of water passing the streamflow site, the variables were averaged or accumulated over time, which eliminated the seasonality and left the low-frequency components. These components were best represented by the NP region, so the best predictions were obtained from NP SST. This result is consistent with the result obtained by Asefa et al. [
For Weber River near Oakley, the combination of CP, NP, and TP SST produced the best model. However, this result was very close to the prediction from the combination of NP and CP SST. The principal moisture source of this area is the Pacific Ocean. In addition, this stream gage is outside the region of ENSO dominance. Because there is no seasonality component, NP and CP SST appeared to be the most important inputs.
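The search over individual SST locations and their combinations can be sketched as an exhaustive enumeration of subsets, scored by test RMSE. The function `best_sst_combination`, the scoring callback, and the RMSE values in the example are hypothetical; in practice the callback would train a model on the chosen SST inputs and return its test RMSE.

```python
from itertools import combinations

def best_sst_combination(regions, test_rmse):
    """Try every non-empty subset of SST regions and keep the one with
    the lowest test RMSE.

    regions   : list of region names, e.g. ["NP", "CP", "TP"]
    test_rmse : callable taking a tuple of region names and returning
                the test RMSE of a model trained with those inputs
    """
    best_combo, best_score = None, float("inf")
    for size in range(1, len(regions) + 1):
        for combo in combinations(regions, size):
            score = test_rmse(combo)
            if score < best_score:
                best_combo, best_score = combo, score
    return best_combo, best_score

# Made-up scores standing in for trained-model results
fake_scores = {
    ("NP",): 3.0, ("CP",): 4.0, ("TP",): 5.0,
    ("NP", "CP"): 2.5, ("NP", "TP"): 3.5, ("CP", "TP"): 4.5,
    ("NP", "CP", "TP"): 2.4,
}
combo, score = best_sst_combination(["NP", "CP", "TP"], fake_scores.__getitem__)
print(combo, score)  # ('NP', 'CP', 'TP') 2.4
```

With only a handful of candidate regions the number of subsets stays small, so exhaustive search is affordable. When two subsets score almost the same, as with the NP and CP pair here, the smaller one may be preferred for simplicity.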
The bootstrap analysis is a data-based simulation method for statistical inference [
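A minimal sketch of one common bootstrap scheme is given below: the training set is resampled with replacement, the model is refit on each resample, and the spread of the resulting test RMSE values gives a percentile interval. The exact resampling procedure used in the study is not recoverable from the text, so this scheme is an assumption. Ordinary least squares is used purely as a stand-in for the RVM, and the synthetic data and function names are hypothetical.

```python
import numpy as np

def fit_linear(X, t):
    """Least-squares fit with an intercept (stand-in for the RVM)."""
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, t, rcond=None)
    return w

def predict_linear(w, X):
    return np.column_stack([np.ones(len(X)), X]) @ w

def bootstrap_rmse(X_train, t_train, X_test, t_test, n_boot=1000, seed=0):
    """Median and 95% percentile interval of the test RMSE over
    bootstrap resamples of the training set."""
    rng = np.random.default_rng(seed)
    n = len(t_train)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # sample rows with replacement
        w = fit_linear(X_train[idx], t_train[idx])
        resid = t_test - predict_linear(w, X_test)
        scores[b] = np.sqrt(np.mean(resid ** 2))
    lower, upper = np.percentile(scores, [2.5, 97.5])
    return float(np.median(scores)), float(lower), float(upper)

# Synthetic data, for illustration only
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 2))
t = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=40)
median, lower, upper = bootstrap_rmse(X[:30], t[:30], X[30:], t[30:], n_boot=200)
```

A narrow interval suggests that the model's test performance does not depend strongly on which training years happen to be included, which is the sense of robustness used here.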
The Relevance Vector Machine successfully transformed the input variables (sea surface temperature, local meteorological conditions, and SWE) into reasonably accurate forecasts of streamflow for the next six months. For each gage, the best SST location was identified. Pacific Ocean SST gave better predictions than Atlantic Ocean SST because this region represents the majority of the ocean-atmosphere climate influence in the western U.S. [
The predictions were highly accurate for the unimpaired stream gages and satisfactory for the impaired gage (Sixth Water Creek near Springville). Because human-induced effects were not incorporated in the model for the impaired gage, its lower efficiency compared with the unimpaired gages is to be expected. The model predicted high flows well, but low flows were captured less accurately. Overall, however, the predictions were accurate and agreed well with the observed streamflow values. The uncertainty of the predictions was captured and presented as confidence intervals. The reliability and robustness of the model were tested by bootstrap analysis, which confirmed its good predictability and robustness.
This study has demonstrated that, with appropriate inputs, the RVM model can be used to forecast long-term streamflow successfully. Accurate and reliable long-term streamflow prediction is crucial for managing water resources at the basin scale. This information could help water managers and stakeholders in planning and decision making, which ultimately reduces the financial risk that future water shortages pose to water users.