Prediction of Flow Duration Curves for Ungauged Basins with Quasi-Newton Method

Prediction of flow duration curves (FDCs) is an important task for water resources planning and management and for hydroelectric energy production. Classifying basins as carstic or non-carstic may help in estimating FDC parameters with predictive tools for catchments with or without observed stream flow. FDCs are needed at ungauged stations for efficient water resource planning. This study therefore proposes a new approach, called the EREFDC model, in which the FDC parameters are obtained with a quasi-Newton method. The parameters are first estimated for the gauged stations; they are then estimated for ungauged stations from drainage area, annual mean precipitation, mean permeability, mean slope, latitude, longitude, and elevation above mean sea level. The EREFDC model, which consists of various types of linear and nonlinear mathematical equations, is able to predict a wide range of FDC parameters for gauged and ungauged basins. The method is applied to 72 unimpaired catchments with about 50 years of average daily measured stream flow. Results showed that the EREFDC model may be used to estimate FDC parameters, and hence FDCs, for ungauged hydrological basins. Results also showed that the EREFDC model performs better in carstic regions than in non-carstic regions. In addition, FDC parameters for tributaries in basins with insufficient or no flow data may be determined from basin characteristics.


Introduction
Efficient use of energy sources is a major problem all over the world, especially for renewable energy, which is a core prerequisite for sustainable development. Hydroelectric energy is one of the sustainable energy sources that need to be carefully planned for future generations. Moreover, technological development will gradually increase energy needs in the future, but these resources are usually not equally distributed in place and time around the world.
Modeling a flow duration curve (FDC) is essential for run-of-the-river power plants at sites where flow measurements could not be performed. This is one of the main reasons why hydrologists give so much importance to this subject. In addition, prediction of FDCs at ungauged stations is still a challenging problem for the hydrological community.
Efficient planning and use of hydroelectric energy require good measured data for all stream flows in the hydrological basins. This is usually impossible, since gauging all the basins requires a considerable amount of money. Thus, there is a need for a method that deals with the parameter estimation of FDCs for gauged and ungauged basins. The FDC is a parametric method that supplies the necessary information for various water resource applications [1]. The values of daily FDCs present the most valuable information on the regional flow regime for hydroelectric power station applications in a streambed [2]. In addition, a stream flow system can be defined by an FDC showing the distribution of flow frequencies obtained from measured flows. If the data are unattainable or limited, many sources should be evaluated. Therefore, estimation of FDCs is needed for places where measurements cannot be carried out. An empirical FDC can be easily obtained from flow observations by using standard nonparametric procedures. The regionalization of an FDC is important when working with basins without gauging stations or with a shortage of flow data.
The usefulness of the FDC is that it is a main input for hydroelectric power plants (HEPPs), which are classified into two main groups: 1) HEPPs that directly divert natural flows without storage or regulation; and 2) HEPPs with storage reservoirs, for which the flows, which are random in time, are regulated by storage so that reliable and firm energy may be obtained from the regulated amount of water. In non-storage HEPPs, where the topography provides no storage area, the energy produced in the powerhouse changes as a function of the existing flow in the river bed. Therefore, this type of HEPP requires realistic estimation of flow quantity for relevant design and efficient use of stream flow.
Long-term hydrologic data are generally not available in many hydrological basins. Annual mean flow values are commonly considered in many hydrologic design studies. In order to obtain daily flow data at a project point, an index station with long-term observations is selected considering similar geographical conditions. If the annual flow data are persistent and representative for the region, the data transfer is assumed to be done properly. FDCs are known to be synthetic (artificial) curves, because the order of occurrence of the flows is disturbed by sorting the flows in descending or ascending order. An FDC is not a cumulative probability curve, because the time series of flows in a stream is not stationary over intervals shorter than a year: statistical characteristics such as the mean, standard deviation, and coefficient of skewness change throughout the year. Therefore, the exceedance probability of the flow on a certain day depends on where that day falls in the year [3]. If a generalized FDC is drawn for each stream basin with observed data, an FDC with a certain error may be obtained for basins without gauging stations.
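As an illustration of the ranking described above, the following minimal Python sketch builds an empirical FDC by sorting daily flows in descending order and assigning each flow an exceedance percentage. The Weibull plotting position i/(n + 1), the function name `empirical_fdc`, and the sample flows are assumptions made for this example and are not taken from the study.

```python
def empirical_fdc(daily_flows):
    """Return (exceedance_percent, flow) pairs for an empirical FDC."""
    # Rank the flows from largest to smallest; the date order is discarded.
    ranked = sorted(daily_flows, reverse=True)
    n = len(ranked)
    # Weibull plotting position rank / (n + 1): an assumed, common choice.
    return [(100.0 * rank / (n + 1), q) for rank, q in enumerate(ranked, start=1)]


# Illustrative flows in m^3/s (made-up values, not station data).
flows = [12.0, 3.5, 7.1, 0.9, 5.0, 20.4, 1.8]
curve = empirical_fdc(flows)
for percent, q in curve:
    print(f"{percent:5.1f}% of the time flow >= {q:.1f}")
```

The largest flow receives the smallest exceedance percentage, so the curve falls monotonically from left to right.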
Fennessey and Vogel [4] developed FDCs for regions without flow regulation and gauging stations in Massachusetts, and they analyzed new models related to regional FDCs. They found that FDCs have a complex structure, frequently requiring probability density functions with three or more parameters. They approximated daily FDCs by using a two-parameter lognormal probability density function. Mimikou and Kaemaki [5] regionalized the FDC by using the morpho-climatological properties of the drainage basin. They explained the regional variability of each FDC parameter with multiple regression techniques, using annual mean regional precipitation, basin area, hypsometric head and stream length. Alkan [6] suggested the dimensionless FDC form given in Equation (1).
where Q is the flow (m³/s), t is the time, and α and β are the parameters of the FDC. Alkan [6] found a nonlinear dependence between the natural logarithm of the initial value of the exponential model parameters and the natural logarithm of the coefficient of annual flows. This parametric model has been applied to stream gauging stations in carstic and non-carstic basins in Turkey. Singh et al. [7] modeled FDCs for small water projects without gauging stations and for basins with insufficient measurements in the Himalayas. Dimensionless FDCs were transferred from basins with gauging stations to basins without gauging stations by using normal, lognormal and exponential transformations. Yu and Yang [8] obtained FDCs for Cho-Shuei Creek in Taiwan and tested their validity. From an uncertainty analysis of the obtained FDCs, they determined that the polynomial method contains less uncertainty than the area index method.
Studies on the deficiencies of flow measurements have been carried out by many researchers in many parts of the world, such as Greece [5], the USA [4], Italy [9], India [7], Taiwan [10] and Portugal [11]. Crocker et al. [11] aimed to obtain a regional model to estimate the FDC for basins without measurements in some parts of Portugal. They used a cumulative distribution function to combine a model for estimating the FDC when flow is not zero with a model for estimating the percentage of time when there is no flow [11]. Cole et al. [12] indicated that users of flow data need an independent quality indicator in order to use the data safely, and they suggested long-term FDCs as such an indicator. This method visually reveals irregularities in the flow data and gives the location and form of the fault.
The authors of [13] developed a model to estimate an FDC for basins without gauging stations. FDCs were obtained empirically by using a medium value and a distribution coefficient, and they were then defined as regional FDCs or theoretical regional curves. The development of the first-order moments of FDCs along a river system and their dependence on local scale, such as basin area, were analyzed, and interpolations along the river system were prepared carefully. Daily flow data from Costa Rica were used in the study. Estimation errors are relatively high, about 30%, for durations longer than 85%. However, for durations shorter than 20% and in the center of the FDC, they are smaller, about 10% and 8%, respectively. The differences between empirical and theoretical FDCs are small, and better results were obtained in the central part of the FDC.
Bari and Islam [14] applied a stochastic approach to obtain an FDC associated with a one-year period and to overcome the difficulty of a traditional FDC, in which the chronological order of flows is masked. They investigated the theoretical development of a stochastic FDC and the probability distribution suitable for the average daily flow distribution. The model was applied to four selected streams in Bangladesh. Small catchment areas are very important for the development of local water resources. As the global pressure on water resources increases, the potential of these drainage areas will continue to increase. Generally, highland catchment areas with important water resources are suitable for the development of small hydroelectric schemes. Estimation of an FDC is important for the design of hydraulic structures and the related environmental assessment.
Niadas [15] suggested an approach for developing symbolic daily FDCs for small catchment areas by combining regional data with actual instantaneous flow data. Annual mean flow values were estimated by using instantaneous flow data from two regions that statistically represent the flow regime.
Castellarin et al. [16] showed the relation between the frequency and magnitude of the overflow in an FDC. Their study also aimed to estimate the FDC of streams without flow records by evaluating the effectiveness and reliability of the data. The study was carried out for a large area in eastern Italy. To evaluate the uncertainty of the regional FDCs, they adopted the jack-knife cross-validation method. Results included: a) an evaluation of the reliability of regional FDCs for ungauged areas; b) the similar reliability of the best three regional models presented; and c) the finding that empirical FDCs based on limited data samples generally provide a better fit to the long-term FDCs than regional FDCs.
Ming et al. [17] proposed an index model for predicting FDCs. The index model was defined as a nonparametric relationship between each FDC parameter and a linear combination of predictors. They found that the index model improved the prediction performance for ungauged stations. A similar study is due to Ganora et al. [18], in which a distance-based model was used to predict FDCs for ungauged stations. They found that the distance-based model produced better estimates of the FDCs using only a few catchment descriptors. Yokoo and Sivapalan [19] proposed an FDC reconstruction with climatic and landscape controls. A similar study was carried out by Viola et al. [20], in which regional FDCs were obtained for Sicily by using regional regression estimates.
In all approaches involving the regionalization of FDCs, the applicability of the estimation methods to small catchment areas with ungauged stations is quite limited. In addition, the regression techniques developed so far for regional estimation may not best represent the basin characteristics. Therefore, accurate estimations for small catchment areas need to be made with proper mathematical equations and with data commonly obtainable for the region, such as drainage area and mean precipitation rate. Moreover, there are many studies on the prediction of FDCs with linear regression techniques and statistical methods, but there is only limited study of FDC estimation with nonlinear equations in regional parameters. One way of estimating the FDC parameters may be the use of a numerical method such as the quasi-Newton method. The most popular quasi-Newton algorithm is the BFGS method, named after its discoverers Broyden, Fletcher, Goldfarb, and Shanno. The BFGS method is derived from Newton's method in optimization, a class of hill-climbing optimization techniques that seek a stationary point of a function, where the gradient is zero. Newton's method assumes that the function can be locally approximated by a quadratic Taylor expansion in the region around the optimum, and uses the first and second derivatives to find the stationary point. Many nonlinear equations for FDC parameter estimation are solved with the BFGS algorithm using the tools in Excel Solver [21]. The α and β parameters of the FDC given in Equation (1) are subsequently solved with the Solver tool in Excel by minimizing the difference between observed and estimated stream flow, using drainage area, annual mean precipitation, mean permeability, mean slope, latitude, longitude, and elevation above mean sea level. During the estimation, the α and β parameters are first obtained from the parametric Equation (1) for each gauged station; then, by using regional parameters (such as drainage area, mean slope, etc.) as independent variables, the α and β parameters are regionalized with the set of linear and nonlinear equations given in Section 2.
The data needed for estimating the FDC parameters are obtained from the US Geological Survey (USGS). Detailed information about the method [22] used in USA applications, in which data are transferred to ungauged areas by means of the correlation between concurrent flows, can be found in USGS articles and reports [23,24].
The rest of the paper is organized as follows. The next section is about model development. Section 3 is about the BFGS algorithm. Section 4 is on data collection and evaluation and, finally, conclusions are given in Section 5.

Model Development
The modeling procedure is carried out in two steps.
Step I: Obtaining parameters for each gauging station
The parameters of the FDC for each of the gauged stations are obtained from Equation (1). To obtain the α and β parameters in Equation (1), the average daily flow data are used for each station, averaged over 60 years of measured daily flow. The average daily flows for a one-year period, referenced to January 1 of that year, are sorted from maximum to minimum. A typical FDC is given in Figure 1 for station 1, Pawnee R. at Rozel, in Kansas. As can be seen in Figure 1, the measured FDC is in good agreement with the fitted theoretical FDC. The α and β parameters are estimated first for each of the stations, and then the regionalization is made at Step II.

At Step I, the α and β parameters are calculated for each gauging station by minimizing Equation (2):

$$\mathrm{SSE} = \sum_{t=1}^{T} \left( Q_t - Q_{est,t} \right)^2 \qquad (2)$$

where SSE is the sum of squared errors between the observed stream flow, Q, and the estimated stream flow, Q_est, and T is the total number of daily observed stream flows, set as T = 365. The solution uses the quasi-Newton method in the solver toolbox.
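The Step I objective can be sketched in Python as follows. Equation (1) is not reproduced in this text, so the exponential form Q_est(t) = α·exp(−β·t) used below is an assumption made only for illustration, as are the function names and the synthetic flows. The sketch evaluates the SSE and derives a starting guess for α and β from a straight-line fit of ln(Q) against t, which a quasi-Newton solver could then refine.

```python
import math


def fdc_model(t, alpha, beta):
    # Assumed exponential form; the study's Equation (1) may differ.
    return alpha * math.exp(-beta * t)


def sse(params, days, observed):
    """Sum of squared errors between observed and estimated flows."""
    alpha, beta = params
    return sum((q - fdc_model(t, alpha, beta)) ** 2 for t, q in zip(days, observed))


def log_linear_start(days, observed):
    """Starting guess from an ordinary least-squares line through ln(Q) vs t."""
    n = len(days)
    logs = [math.log(q) for q in observed]
    t_mean = sum(days) / n
    l_mean = sum(logs) / n
    slope = (sum((t - t_mean) * (l - l_mean) for t, l in zip(days, logs))
             / sum((t - t_mean) ** 2 for t in days))
    intercept = l_mean - slope * t_mean
    return math.exp(intercept), -slope


T = 365
days = list(range(1, T + 1))
# Synthetic "observed" flows generated from known parameters (not station data).
observed = [fdc_model(t, 40.0, 0.01) for t in days]
alpha0, beta0 = log_linear_start(days, observed)
print(alpha0, beta0, sse((alpha0, beta0), days, observed))
```

The log-linear fit minimizes errors in ln(Q) rather than in Q, so on real data it only provides a starting point for the SSE minimization.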
Step II: Regionalization
By using the α and β parameters obtained at Step I, the regionalization is carried out at Step II with the following regional parameters: drainage area (DA), annual mean precipitation (AMP), mean permeability (MP), mean slope (MS), latitude (LAT), longitude (LONG), and elevation above mean sea level (EL). The regional equations are given in Equations (3) and (4).
where x1 = drainage area (km²); x2 = annual mean precipitation (mm); x3 = mean permeability (mm/h); x4 = mean slope (%); x5 = latitude (°); x6 = longitude (°); x7 = elevation (m); and ω are the weighting coefficients of the nonlinear equations. It is quite difficult for field engineers to use the FDC given in Equations (3) and (4) directly, since most of them may not have the optimization knowledge; thus, the α and β parameters are obtained by the quasi-Newton method known as BFGS, described in Section 3. Before applying Equations (3) and (4), the hydrological basins are clustered into two groups, carstic and non-carstic. The reason for clustering is the difference in discharge between carstic and non-carstic regions in terms of drainage and flow characteristics.
During the solution process, Equations (3) and (4) are used in the objective functions given in Equations (5) and (6). As can be seen in Figure 2, the EREFDC model starts by obtaining the FDC parameters first; then, by using the regional geographical and hydrological parameters, the parameters of the EREFDC are obtained with the quasi-Newton method.
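The regionalization objective can be illustrated with a small Python sketch. Equations (3) to (6) are not reproduced in this text, so the purely linear predictor, the squared-error objective, the function names and all numbers below are hypothetical; the study's equations also contain nonlinear terms that are not shown here.

```python
def predict_linear(weights, x):
    # Hypothetical linear form: w0 + w1*x1 + ... + w7*x7.
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


def regional_objective(weights, stations, fitted):
    """Sum over stations of squared (fitted minus predicted) parameter values."""
    return sum((a - predict_linear(weights, x)) ** 2
               for x, a in zip(stations, fitted))


# Made-up characteristics per station, in the order x1..x7:
# area (km^2), precipitation (mm), permeability (mm/h), slope (%),
# latitude (deg), longitude (deg), elevation (m).
stations = [
    (120.0, 650.0, 40.0, 2.1, 38.2, -98.5, 520.0),
    (900.0, 820.0, 75.0, 1.4, 37.6, -96.1, 310.0),
    (2400.0, 480.0, 22.0, 3.3, 39.0, -100.2, 760.0),
]
true_weights = [1.0, 0.002, 0.001, -0.01, 0.3, 0.02, 0.01, 0.0005]
# Step I values of alpha, synthesized here from the hypothetical weights.
fitted_alpha = [predict_linear(true_weights, x) for x in stations]

perturbed = [w * 1.1 for w in true_weights]
print(regional_objective(true_weights, stations, fitted_alpha))
print(regional_objective(perturbed, stations, fitted_alpha))
```

A solver would adjust the weights to drive this objective toward its minimum, once for α and once for β, separately for the carstic and non-carstic groups.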

BFGS Algorithm
The most popular quasi-Newton algorithm is the BFGS method, named after its discoverers Broyden, Fletcher, Goldfarb, and Shanno. The BFGS method is derived from Newton's method in optimization, a class of hill-climbing optimization techniques that seek a stationary point of a function, where the gradient is zero. Newton's method assumes that the function can be locally approximated by a quadratic Taylor expansion in the region around the optimum, and uses the first and second derivatives to find the stationary point. A detailed discussion of the BFGS method can be found in numerical optimization textbooks [25,26]. The BFGS algorithm can be summarized as follows [26,27]:
Step 1: Estimate an initial design vector $x^{(0)}$. Choose a symmetric positive definite matrix $H^{(0)}$ as an estimate for the Hessian of the cost function. In the absence of more information, let $H^{(0)} = I$. Choose a convergence parameter $\varepsilon$, set $k = 0$, and compute the gradient vector as $c^{(0)} = \nabla g(x^{(0)})$, where k is the iteration index and g is the cost function of the design vector.
Step 2: Calculate the norm of the gradient vector, $\|c^{(k)}\|$. If $\|c^{(k)}\| < \varepsilon$, then stop the iterative process; otherwise continue.
Step 3: Solve the linear system of equations $H^{(k)} d^{(k)} = -c^{(k)}$ to obtain the search direction, where d is the search direction vector.
Step 4: Compute the optimum step size $\lambda_k$ to minimize $g(x^{(k)} + \lambda d^{(k)})$.
Step 5: Update the design as $x^{(k+1)} = x^{(k)} + \lambda_k d^{(k)}$.
Step 6: Update the Hessian approximation for the cost function as $H^{(k+1)} = H^{(k)} + D^{(k)} + E^{(k)}$, where $D^{(k)}$ and $E^{(k)}$ are correction matrices.
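The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the Excel Solver implementation used in the study: the exact step-size minimization of Step 4 is replaced by a simple backtracking (Armijo) line search, the Hessian approximation is updated with the standard BFGS correction terms, and the function names, tolerances and test function are chosen only for this example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bfgs(g, grad, x, eps=1e-8, max_iter=200):
    n = len(x)
    # Step 1: start from the identity as the Hessian estimate.
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    c = grad(x)
    for _ in range(max_iter):
        # Step 2: stop when the gradient norm is small enough.
        if dot(c, c) ** 0.5 < eps:
            break
        # Step 3: search direction from H d = -c.
        d = solve(H, [-ci for ci in c])
        # Step 4 (approximate): halve the step until the cost decreases enough.
        step, g0, slope = 1.0, g(x), dot(c, d)
        while (g([xi + step * di for xi, di in zip(x, d)]) > g0 + 1e-4 * step * slope
               and step > 1e-12):
            step *= 0.5
        # Step 5: update the design.
        s = [step * di for di in d]
        x_new = [xi + si for xi, si in zip(x, s)]
        c_new = grad(x_new)
        # Step 6: add the two correction matrices; skip if curvature is not positive.
        y = [a - b for a, b in zip(c_new, c)]
        ys = dot(y, s)
        if ys > 1e-12:
            for i in range(n):
                for j in range(n):
                    H[i][j] += y[i] * y[j] / ys + c[i] * c[j] / slope
        x, c = x_new, c_new
    return x


def cost(x):
    return (x[0] - 3.0) ** 2 + 2.0 * (x[1] + 1.0) ** 2


def cost_grad(x):
    return [2.0 * (x[0] - 3.0), 4.0 * (x[1] + 1.0)]


x_min = bfgs(cost, cost_grad, [0.0, 0.0])
print(x_min)
```

The test function is a convex quadratic with its minimum at (3, −1); for the FDC problem, `g` would be the SSE objective and `x` the parameter vector.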

Flow Data
The stream flow considered is uncontrolled flow. Stations and their corresponding data are given in Table 1. As can be seen in Table 1, the drainage area varies between 10 and 3100 km², annual precipitation between 100 and 1000 mm, average basin permeability between 10 and 140 mm/h, basin slope between 0.8% and 6.0%, and elevation between 200 and 800 m.
Each station has a record of average daily stream flow with 366 values per year. One example of the data is given in the Appendix for station number 6814000, Turkey C. Seventy-two stations are taken into account during EREFDC model development, since there are no homogeneous data at other stations in the basin in Kansas City, USA. Data are classified into carstic and non-carstic groups and ordered from the minimum to the maximum station number. Of the carstic and non-carstic data, 80% are used for EREFDC model development and 20% are used for EREFDC model testing.
The carstic map is given in Figure 3. To produce Figure 3, the gauged stations are extracted according to their coordinates, and those coordinates are then matched with carst maps taken from http://pubs.usgs.gov/of/2004/1352/. After the carstic maps are obtained, the stations are grouped into carstic and non-carstic regions.

The EREFDC Application and Regionalization
The EREFDC model is applied to uncontrolled measured flows at 21 carstic and 37 non-carstic stations to estimate the parameters of the EREFDC models. Five of the carstic and nine of the non-carstic stations are used for testing the EREFDC. For carstic stations, the predicted EREFDC model parameters for α and β are given in Equations (7) and (8), respectively. Similarly, for non-carstic stations, the EREFDC model parameters for α and β are given in Equations (9) and (10), respectively.
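The station counts above are consistent with an 80%/20% division of 26 carstic and 46 non-carstic stations. The short Python sketch below reproduces that arithmetic with a random split; the function name, the seed, and the use of integer placeholders instead of real station numbers are assumptions made for illustration only.

```python
import random


def split_stations(station_ids, dev_fraction=0.8, seed=0):
    """Randomly split station identifiers into development and testing sets."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(station_ids)
    rng.shuffle(shuffled)
    n_dev = round(len(shuffled) * dev_fraction)
    return shuffled[:n_dev], shuffled[n_dev:]


# Placeholder identifiers 0..25 and 0..45 stand in for real station numbers.
carstic_dev, carstic_test = split_stations(range(26))
non_carstic_dev, non_carstic_test = split_stations(range(46))
print(len(carstic_dev), len(carstic_test))
print(len(non_carstic_dev), len(non_carstic_test))
```

Keeping the testing stations out of the development set is what allows the testing errors to be read as an estimate of performance at ungauged sites.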

The EREFDC Testing
In order to test the EREFDC model, 20% of the data in Tables 2(a) and (b) are randomly selected, and none of the selected data are used during model development. Table 4 shows the average relative errors between observed and predicted stream flows obtained by the EREFDC model. The errors given in the third column are averaged over a year, and the average relative errors for the testing stations are then obtained as given in Table 4. As can be seen in Table 4, the average relative errors are between 11% and 88%, but only three of the stations, numbered 7140850, 7157500 and 718350[...], [...] since the data may [...] introduction of hydraulic structure. Table 4 also shows that the relative error for carstic regions is considerably lower than for non-carstic regions.
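The error measure can be illustrated with a brief Python sketch. The exact formula behind Table 4 is not given in this text, so the definition below, the mean of |Q − Q_est| / Q over all days expressed as a percentage, is an assumption, and the flows are made-up values.

```python
def average_relative_error(observed, predicted):
    """Mean of |Q - Q_est| / Q over all days, in percent (assumed definition)."""
    total = sum(abs(q - p) / q for q, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Illustrative flows in m^3/s; a real record would hold one value per day.
observed = [10.0, 8.0, 5.0, 4.0]
predicted = [12.0, 6.0, 5.0, 5.0]
error = average_relative_error(observed, predicted)
print(f"{error:.1f}%")
```

Because each term is divided by the observed flow, small observed flows at the low end of the FDC can dominate this measure.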

Conclusions
This study deals with the prediction of flow duration curves for ungauged hydrological basins. For this purpose, a two-step procedure is proposed. At Step I, the FDC parameters are obtained for each gauged station by grouping the stations as carstic and non-carstic. The FDC parameters are obtained with the Excel solver toolbox. At Step II, by using the parameters obtained at Step I, regionalization is made with the geographical, physical and hydrological data given in Table 1. For this aim, a quadratic-type EREFDC regional model is proposed and solved with the BFGS algorithm. The following results may be drawn from this study:
1) FDCs at ungauged hydrological basins may be predicted with the proposed EREFDC model, with errors of 27% to 37% for carstic and non-carstic hydrological basins, using the mathematical optimization technique called the BFGS algorithm.
2) The two-step approach may be useful for obtaining the FDC parameters in order to regionalize the FDC model in carstic and non-carstic basins.
3) The EREFDC model is applied to 72 unimpaired USGS catchments in the USA with 60 years of average daily measured stream flow. Results showed that FDC parameters for tributaries in upper basins with insufficient or no flow data may be determined from basin characteristics for the studied area.
4) Results showed that the EREFDC model gave an average relative error of about 37% for non-carstic basins and 27% for carstic basins. Thus, the EREFDC model performs better in carstic regions than in non-carstic regions.
5) This study focuses on the development of a regional mathematical model for estimating the parameters of FDCs for carstic and non-carstic regions. The average relative errors may be considered quite high for non-carstic regions. Future studies should aim to improve the prediction performance of the EREFDC model for uncontrolled stream flows with various data from around the world.

Figure 1. Typical FDC of Pawnee R. at Rozel, in Kansas.
The objective functions in Equations (5) and (6) are evaluated over the total number of gauged stations in each of the carstic and non-carstic groups; α and β are the FDC parameters obtained from Step I, and α_pre and β_pre are the predicted values. The flowchart of the proposed Estimation of REgionalized Flow Duration Curve (EREFDC) model is given in Figure 2.
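In Step 6 of the BFGS algorithm, the correction matrices $D^{(k)}$ and $E^{(k)}$ are given as $D^{(k)} = \dfrac{y^{(k)} y^{(k)T}}{y^{(k)} \cdot s^{(k)}}$ and $E^{(k)} = \dfrac{c^{(k)} c^{(k)T}}{c^{(k)} \cdot d^{(k)}}$, where $c^{(k)}$ is the gradient vector, $d^{(k)}$ is the search direction, $y^{(k)} = c^{(k+1)} - c^{(k)}$ is the change in gradient, and $s^{(k)}$ is the change in design.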
Data generation is carried out at Step I in the following way. The fitted FDC parameters defined at Step I are obtained with the solver toolbox, and the values are given in Tables 2(a) and (b). As can be seen in Tables 2(a) and (b), the coefficient of determination R² varies from 44% to 98%. The low R² at stations 714275 and 7154500 may be due to measurement error or nonhomogeneity in the stream flow data. Tables 2(a) and (b) show the non-carstic and carstic data, respectively; of the 72 gauged stations, 46 are in the non-carstic group and 26 are in the carstic group.

Figures 4(a)-(i) show the observed and estimated FDCs by the EREFDC model for non-carstic testing stations, excluding the 10% and 90% flow exceedance percentiles. Figures 5(a)-(e) show the observed and estimated FDCs by the EREFDC model for carstic testing stations, excluding the 10% flow exceedance percentile.

In the BFGS update, $s^{(k)} = \lambda_k d^{(k)}$ is the change in design; after the update, set $k = k + 1$ and go to Step 2.

Table 3(a) is for randomly selected non-carstic stations and Table 3(b) is for carstic stations.