Development of a Regional Regression Model for Estimating Annual Runoff in the Hailar River Basin of China

The Hailar River, a first-grade tributary of the Erguna River that borders China and Russia, is the main water source for local industry and agriculture. However, because there are only 11 flow gauging stations and those stations cannot monitor all runoff paths, it is hard to use the existing flow data directly to estimate the annual runoffs from all subbasins of interest, although such estimates are needed for the utilization and protection of the water resources of the Hailar River. Thus, this study implemented an indirect approach (i.e., a regional regression model) by correlating annual runoff with annual rainfall and water surface evaporation as well as hydrologic characteristics of the 11 subbasins monitored by the gauging stations. The study used 51 years of data (1956 to 2006). The results indicated a significant correlation (R > 0.87) between annual runoff and the selected subbasin characteristics and showed the model to be robust, because the predicted runoffs for the validation period are consistent with the corresponding observed values. In addition, the model was used to estimate the annual runoffs for the subbasins that are not monitored by the 11 flow gauging stations, which adds new information to the existing literature.


Introduction
While the availability of hydrological (e.g., flow) data is crucial for water resources planning and management, most rivers in developing countries, including China, do not have sufficient data, partially due to poorly maintained monitoring networks [1][2][3]. In addition, it is almost impractical to monitor all subbasins of interest within large basins [4], such as the 54,805 km² Hailar River basin located in northeastern China. Thus, researchers have developed and/or used various methods to estimate runoff from ungauged basins/subbasins. Those methods include sophisticated simulation models as well as simple statistical models. In practice, simple models have been widely used by water agencies to estimate annual runoff at a regional scale.
Among the simple models, the simplest methods transfer streamflow from a nearby hydrologically similar basin by assuming that the runoff per unit drainage area is constant [5], or directly use a runoff map [6][7][8][9]. Other approaches use multiple regression techniques to exploit the spatial relationship between annual runoff and readily measured basin characteristics, such as rainfall, potential evapotranspiration, drainage area, land use, and geomorphology. For example, previous studies related annual runoff to geomorphic and climate characteristics for three selected basins in the western, central and southern U.S. [5], for one basin in the western U.S. [10], for one basin in the northeastern U.S. [11], for several areas in South Dakota [12], for the whole U.S. [13], and for New England [14,15]. Also, the basin characteristics can be easily determined using remote sensing (RS) and geographic information systems (GIS) [16][17][18], greatly facilitating the application of these approaches. For example, a GIS-based rainfall-runoff model with independent variables of rainfall, land use, and soil characteristics was adopted for the Tapi River basin, India [19], and a GIS model was exploited to estimate basin geomorphological, geological, soil, and climatic characteristics for predicting total streamflow [20].
However, a study similar to those mentioned above is lacking for the Hailar River basin, which is located in an undeveloped area of northeastern China and has limited hydrologic data. The objectives of this study were to: 1) develop a regional regression model for estimating annual runoff in the Hailar River basin; and 2) use the regression model to predict the amounts of runoff for the return periods of interest.

Study Area
The 54,805 km² Hailar River basin (Figure 1), located in southestern Hulunbeir of China, is very sensitive to climatic and environmental changes [21,22]. This study selected this basin to demonstrate the estimation of annual runoff because the basin is typical for water resources planning and management. The river originates from the Da Xingan mountains, has a main channel length of 708.5 km, and is fed by 12 major tributaries upstream of its confluence with the Erguna River. The Hailar River basin has a temperate continental monsoon climate. That is, the basin is strongly influenced by the eastern Asian summer monsoon and frequently suffers from extreme climates: short, cool summers and long, cold winters [22,23]. Based on the data from 1956 to 2006, the basin receives an average annual rainfall of 347.6 mm, and has an average annual water surface evaporation of 801.7 mm and an annual average temperature of -1.2℃. The topography is dominated by mountains and hills, plains, and wetlands, with elevations ranging from 510 to 1622 m above mean sea level.

Available Data
There are 8 rainfall, 17 weather, and 11 flow gauging stations in and adjacent to the Hailar River basin (Figure 1). These stations, maintained by the local climatic and hydrologic monitoring departments, collect data on rainfall, water surface evaporation, and/or streamflow. Most of the stations have data from 1956 to 2006, although the datasets for several stations are not continuous. The missing values were filled using an interpolation and extension approach detailed in the Methods section. The basin characteristics (Table 1) were obtained from the Hailar Hydrographic Bureau.

Methods
This study investigated the correlations of annual runoff (R) with several selected independent variables, including annual rainfall (P), annual water surface evaporation (E), subbasin centroid coordinates (X, Y), subbasin centroid elevation (H), subbasin area (A_B), subbasin wetland area (A_W), and subbasin shape factor (K). The investigation was used to establish a regional regression model for estimating the annual runoff of subbasins that cannot be monitored by the 11 flow gauging stations. The investigation was realized by: 1) using the multiple-period universal kriging spatial estimation theory (MUKSE) [24,25] to estimate values of annual rainfall and water surface evaporation from 1956 to 2006 for the subbasins that are monitored by the 11 flow gauging stations and for the 12 ungauged subbasins; 2) using the principal component regression (PCR) technique provided in the Eviews 7.0 software package to build and validate a regression model between annual runoff and subbasin characteristics, based on the data of the subbasins monitored by the 11 flow gauging stations; and 3) using the validated regression model and its coefficients to estimate the annual runoffs for the 12 ungauged subbasins of the Hailar River basin.

Annual Rainfall Estimation
This study used the MUKSE to estimate annual rainfall because this method is superior to the conventional interpolation methods [26,27], including the Thiessen polygon method, arithmetic average method, inverse distance or inverse distance square method, isopluvial line method, and Kriging method. The MUKSE is an improved version of the Kriging method and implements an optimal technique to estimate rainfall for small subbasins or localized areas where adequate rainfall data do not exist. The estimation was realized through the following six steps.
Step 1: Rainfall interpolation and extension. In order to derive a complete annual rainfall dataset from 1956 to 2006 for the 36 stations in and adjacent to the Hailar River basin, univariate and bivariate statistical regression methods were implemented to fill the missing values. Herein, the annual rainfall at station i and year j was designated Z_i(t_j) (i = 1, 2, …, 36; j = 1, 2, …, 51).
Step 2: Annual rainfall stationarity testing. The MUKSE requires that Z_i(t_j) be stationary. This study used Fourier cycle analysis to discern the cycles of the annual rainfall time series and then used a moving average method to test for stationarity.
Step 3: Spatial drift equations determination. The available annual rainfall (m_i, i = 1, 2, …, 36) data were used to estimate the missing values at the 36 stations. m_i was regressed on station location (x_i, y_i) (i = 1, 2, …, 36) by using a trend surface analysis [24] to develop the spatial drift equations of the annual rainfall time series.
Step 4: Robust experimental variogram and optimal approximation. The spatial drift and experimental variogram were computed from the following quantities: m_i is the spatial drift of station i; T is the time series length; R_i(t_j) and R_k(t_j) are the residuals at stations i and k, respectively; and r(i, k) is the experimental variogram between stations i and k. The computed experimental variograms for the stations were trimmed in a robust statistical sphere [28,29] to eliminate the influence of large errors in individual data points. Subsequently, the directional variograms were determined along the directions of 0°, 90°, 45° and -45°, each with an angular tolerance of ±22.5°.
According to the trends of the annual rainfall experimental variograms along those four directions, a spherical theoretical variogram model was developed by fitting the variograms using the hydrology system package procedure [24]. The model was used to create an "overlapping" variogram that is used in the next step (i.e., Step 5).
Step 5: Theoretical variogram and spatial drift equation. The overlapping theoretical variogram model and spatial drift equation were tested as follows: for any year t_j at station i, the annual rainfall series of the other stations (excluding station i) were integrated into equation (4), which in turn was used to estimate the annual rainfall Z_i*(t_j) at station i. The theoretical variogram and spatial drift equation were considered to be reasonable (i.e., the MUKSE model is valid) [24,29] when the errors between the estimated and observed values satisfied a set of statistical criteria.
Step 6: Annual rainfall estimation. The subbasins that are monitored by the 11 flow gauging stations were subdivided into 2 km × 2 km quincunx grids. The annual rainfalls for the grids were estimated using equations (4) and (5), and then were averaged to get the annual rainfalls for the subbasins.
In equations (4) and (5), u_0 and u_l are Lagrange multipliers.
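The quantities defined in Step 4 can be illustrated with a short sketch. It assumes that the drift m_i is the time average of Z_i(t_j), that the residual is R_i(t_j) = Z_i(t_j) - m_i, and that the experimental variogram is r(i, k) = (1/2T) Σ_j [R_i(t_j) - R_k(t_j)]². These forms follow a common convention and are assumptions here, not the equations of the original study. The function names and the rainfall values are hypothetical.

```python
def station_drift(series):
    """Time-averaged value of one station's annual series (assumed drift m_i)."""
    return sum(series) / len(series)


def experimental_variogram(series_i, series_k):
    """Half the mean squared difference of the residuals at two stations (assumed form)."""
    if len(series_i) != len(series_k):
        raise ValueError("both stations need the same number of years")
    m_i = station_drift(series_i)
    m_k = station_drift(series_k)
    n_years = len(series_i)  # T, the time series length
    squared_diffs = [
        ((z_i - m_i) - (z_k - m_k)) ** 2
        for z_i, z_k in zip(series_i, series_k)
    ]
    return sum(squared_diffs) / (2 * n_years)


# Made-up annual rainfall (mm) for two stations over four years
station_a = [300.0, 350.0, 400.0, 350.0]
station_b = [320.0, 340.0, 420.0, 360.0]
print(experimental_variogram(station_a, station_b))
```

The robust trimming and the directional binning described in Step 4 are not reproduced in this sketch.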

Annual Water Surface Evaporation Estimation
The aforementioned six steps for annual rainfall estimation were also implemented to estimate the annual water surface evaporations for the subbasins. Herein, the data at 24 stations were used.

Regional Regression Model Establishment
The flow hydrographs observed at the 11 gauging stations were used to calculate the annual runoffs for the years from 1956 to 2006. The maximum annual usage of surface water was 2271.38 Mm³, accounting for only 0.66% of the total annual runoff observed at the Cuogang station, which is located at the Hailar River mouth. This indicates that the water usage can be neglected when estimating natural runoff.
As stated above, the dependent variable is annual runoff (R), while the independent variables are annual rainfall (P) and annual water surface evaporation (E) calculated using the MUKSE, subbasin centroid coordinates (X, Y), subbasin centroid elevation (H), subbasin area (A_B), subbasin wetland area (A_W), and subbasin shape factor (K), which was calculated by the Hailar Hydrographic Bureau.
The regression was done using the PCR technique embedded in the Eviews 7.0 software package. The crucial feature of this technique is to transform the independent variables into uncorrelated principal components, each of which is a linear combination of the independent variables. R is then regressed on the principal components. The regression procedure implemented in this study is as follows: 1) calculate the eigenvalues; 2) calculate the principal components of the independent variables, which are functions of the independent variables and independent of each other; and 3) regress R on the leading principal components, because the change of the independent variables can generally be sufficiently described by the first m principal components. For each year from 1956 to 2006, the resulting regression model can then be expressed in terms of the independent variables (Equation 9).
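The PCR procedure can be sketched in a few lines. The study used Eviews 7.0; the NumPy version below is a substitute written for illustration only. It assumes standardized predictors and an eigen-decomposition of their correlation matrix, and the function name `pcr_fit`, the number of retained components, and the synthetic data are all hypothetical.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression; returns the intercept and the
    coefficients expressed on the original (unstandardized) variables."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                       # standardized predictors

    # Eigen-decomposition of the correlation matrix of the predictors
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    V = eigvecs[:, order[:n_components]]

    scores = Z @ V                             # mutually uncorrelated components
    y_mean = y.mean()
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)

    beta = (V @ gamma) / std                   # back to original units
    intercept = y_mean - mean @ beta
    return intercept, beta


# Synthetic data with two nearly collinear predictors (not observed data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
X_demo[:, 2] = X_demo[:, 0] + 0.01 * rng.normal(size=40)
y_demo = 2.0 + 1.5 * X_demo[:, 0] - 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=40)
intercept, beta = pcr_fit(X_demo, y_demo, n_components=2)
print(intercept, beta)
```

Dropping the component with the smallest eigenvalue is what stabilizes the coefficients of collinear predictors. When all components are retained, the result coincides with ordinary least squares.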

Annual Rainfall and Water Surface Evaporation Estimation
The Fourier cycle analysis shows that the annual rainfall and water surface evaporation series have 7-, 18-, or 44-year cycles. Using the 44-year cycle in the moving average method, it was found that the series for each gauging station met the stationarity requirement of the MUKSE.
The calculated experimental variograms in different directions revealed that the point groups were comparatively concentrated and had obvious trends, indicating that rainfall and evaporation had anisotropic spatial structures. The optimal simulation indicated that, for rainfall, the semimajor axis is 555 km, the semiminor axis is 315 km, the direction angle is 58°, and the anisotropy ratio is 1.762; for evaporation, the semimajor axis is 580 km, the semiminor axis is 240 km, the direction angle is 92°, and the anisotropy ratio is 2.417. Further, the overlapping theoretical variogram model and spatial drift equation are judged to be reasonable (Table 2), and the MUKSE can be used to estimate the annual rainfalls and water surface evaporations for the subbasins within the study area (Figures 2 and 3).
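As a rough illustration of how cycles can be discerned from an annual series, the sketch below computes a plain discrete Fourier periodogram and reports the period with the largest power. The source does not describe its Fourier cycle analysis in detail, so this generic approach, the function names, and the synthetic series are assumptions.

```python
import cmath
import math


def periodogram(series):
    """Return (period_in_years, power) pairs for the non-zero Fourier frequencies."""
    n = len(series)
    mean = sum(series) / n
    centered = [value - mean for value in series]
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        result.append((n / k, abs(coeff) ** 2 / n))
    return result


def dominant_period(series):
    """Period (in years) with the largest spectral power."""
    return max(periodogram(series), key=lambda pair: pair[1])[0]


# Synthetic 56-year series with an exact 7-year cycle (not observed data)
synthetic = [350.0 + 40.0 * math.sin(2 * math.pi * t / 7) for t in range(56)]
print(dominant_period(synthetic))
```

With a 51-year record, only periods of the form 51/k are resolved exactly, so cycles identified from real data are approximate.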
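The reported anisotropy ratios can be checked directly from the fitted axes, assuming the ratio is defined as the semimajor axis divided by the semiminor axis (the symbols below are introduced here for illustration only):

```latex
\lambda = \frac{a_{\max}}{a_{\min}}, \qquad
\lambda_{\text{rainfall}} = \frac{555}{315} \approx 1.762, \qquad
\lambda_{\text{evaporation}} = \frac{580}{240} \approx 2.417
```

Both values agree with the ratios stated in the text.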

Annual Runoff Estimation
The MUKSE and PCR for estimating annual runoff were judged to be good, as indicated by the large R² (> 0.826) (Figure 4) and the significant F statistics (p-value < 0.05), while some multicollinearity likely exists because the t-tests of the independent variables in the model were insignificant at a significance level of α = 0.05. The multicollinearity problem was resolved by using the principal components. Further, the model did a very good job in reproducing the observed annual runoffs at the 11 flow gauging stations (Figure 5). Table 3 presents the coefficients of the independent variables for the model (i.e., Equation 9). It shows that annual rainfall, basin centroid coordinates, basin centroid elevation and basin shape factor are positively correlated with annual runoff, implying that annual runoff tends to increase as these independent variables increase. Because rainfall is the origin of runoff generation, more rainfall will logically generate more runoff. The increasing trend of runoff with longitude, latitude and elevation is consistent with that presented by a runoff depth isogram developed by the Hailar Hydrologic Bureau. On the other hand, annual water surface evaporation, subbasin area and subbasin wetland area are negatively correlated with annual runoff, implying that annual runoff tends to decrease as these independent variables increase. Evaporation can reduce the portion of rainfall that is converted into runoff, while wetlands likely increase surface storage, lowering the generation of runoff.
a Hailar River is the whole Hailar River basin; b C_S is the coefficient of skewness, and C_V is the coefficient of variation.

Subbasin Annual Runoff Estimation
For the subbasins that are not monitored by the 11 stations, the annual runoffs for the years from 1956 to 2006 were estimated using Equation (9) with the model coefficients presented in Table 3.
Based on the estimated annual runoffs, the means, coefficients of variation (C_V), and ratios of the coefficient of skewness (C_S) to C_V were computed and are presented in Table 4. In terms of C_V and C_S/C_V, the runoffs for the return periods of interest (i.e., 75, 90, 95, and 97%) were computed by assuming a Pearson type III distribution and are also shown in Table 4.
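The calculation of design runoff from the mean, C_V, and C_S/C_V can be sketched as follows. The sketch treats the 75 to 97% values as exceedance probabilities, which is an assumption, and it uses the Wilson-Hilferty approximation to the Pearson type III frequency factor rather than exact tables or whatever procedure the study used. The function names and the input statistics are hypothetical.

```python
from statistics import NormalDist


def pearson3_frequency_factor(skew, exceedance_prob):
    """Approximate Pearson type III frequency factor (Wilson-Hilferty transformation)."""
    z = NormalDist().inv_cdf(1.0 - exceedance_prob)  # standard normal quantile
    if abs(skew) < 1e-8:
        return z                                     # zero skew reduces to the normal case
    k = skew / 6.0
    return (2.0 / skew) * ((1.0 + k * z - k * k) ** 3 - 1.0)


def design_runoff(mean, cv, cs_over_cv, exceedance_prob):
    """Runoff exceeded with the given probability, from the mean, C_V and C_S/C_V."""
    skew = cs_over_cv * cv
    factor = pearson3_frequency_factor(skew, exceedance_prob)
    return mean * (1.0 + cv * factor)


# Hypothetical statistics for one subbasin (not taken from Table 4)
for p in (0.75, 0.90, 0.95, 0.97):
    print(p, round(design_runoff(mean=100.0, cv=0.4, cs_over_cv=2.0, exceedance_prob=p), 1))
```

The approximation is generally adequate for moderate skewness; for strongly skewed series, exact Pearson type III quantiles would be preferable.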

Conclusions
This study set up a regional regression model by using the observed data of annual runoff, annual rainfall, annual water surface evaporation as well as other basin characteristics of the Hailar River basin from 1956 to 2006, through the methods of multiple-period universal kriging spatial estimation theory (MUKSE) and principal component regression (PCR) technique.
The testing results indicated that the MUKSE was an effective method for estimating the annual rainfall and annual water surface evaporation of ungauged subbasins, and that PCR can resolve the multicollinearity problem while giving a significant correlation (R² > 0.87) between annual runoffs and the subbasin characteristics. Finally, the model was used to predict the amounts of runoff for the return periods of interest. These results add new information to the existing literature.

Figure 1. Map showing the location and drainage network of the Hailar River basin.
The remaining terms in equations (4) and (5) are the variograms between station x_d and another station, and between station x_d and x_0; the lth drift basis functions of stations x_d and x_0, respectively; and the variance of error (σ²_uk).

Figure 5. Plots showing the model estimated vs. observed annual runoffs at the 11 flow gauging stations.

Table 1. The characteristics of the subbasins in the Hailar River basin.
a Runoff stations represent subbasins monitored by the gauging stations.
Criteria used in Step 5 to judge the theoretical variogram and spatial drift equation: over the 51 years (j = 1, 2, …, 51), the estimation errors should give: 1) an average error (Me) that approximates zero; 2) a variance of error (σ_e²) that approximates the average kriging variance (S*²); 3) an error histogram that approximately represents a normal distribution; 4) an absolute standard deviation of the error histogram that approximately represents a normal distribution; and 5) more than 95% of the absolute values of Me that are less than 1.96 S*².