Average Rainfall Estimation: Methods Performance Comparison in the Brazilian Semi-Arid

Considering the importance of rainfall in hydrological modeling, the objective of this study was to compare the performance, in terms of convergence, of techniques often used to estimate the average rainfall over an area: the Thiessen Polygon (TP) Method, the Reciprocal Distance Squared (RDS) Method, the Kriging Method (KM) and the Multiquadric Equations (ME) Method. The comparison was done indirectly, using the GORE and BALANCE indices to assess the convergence of each method's results as the rain gauge density in the region was increased through six scenarios. The study area, the Coremas/Mãe D'Água Watershed, covers 8385 km² and is situated in the Brazilian semi-arid region. The results showed that the TP, RDS and ME techniques can be employed successfully to obtain the average rainfall over an area, with ME standing out. On the other hand, KM, using two variogram models, behaved unstably, indicating that a prior study of the data and of the choice of variogram is needed for practical application.


Introduction
The average rainfall over an area may be considered the main input to the watershed modeling process, especially for models that deal with surface runoff, partly because rain is, in general, the only climatic variable that can explain rapid increases in flow [1][2][3]. Moreover, several studies show that the spatial variability of rainfall over the basin and its distribution pattern, as well as its interaction with the basin, have a considerable effect on the generated runoff response [4][5][6][7].
In this sense, despite developments in radar technology, the limitations and characteristics of these spatialized measurements mean that point measurements made by rain gauges are still required for better modeling [1][8][9][10]. Besides, in many places point rainfall measurements are the only available time-series source with enough spatial density for hydrological studies, both for runoff modeling and for water resources planning. Thus, the study of techniques to estimate the average rainfall from point measurements, together with the analysis of its distribution pattern, remains indispensable.
In addition to the mean rainfall, RDS, KM and ME also estimate a continuous surface fitted to the known rainfall values, which is useful for determining point rainfall values within the basin. This characteristic is very useful for evaluating the spatial distribution of rainfall [19] and provides spatialized data for robust models of other hydrological processes.
However, the real value of the average rainfall and its distribution remain unknown [1]. Thus, the direct

Methods and Data
It is not possible to compare the values from any method with the real value, because the real value is unknown. It is possible, however, to evaluate the variability (or convergence) of a method caused by a change in data availability. Thus, even though the methods cannot be compared directly, it is feasible to find which of them needs less spatial data to achieve the results that are presumably more reliable (those obtained in the best data scenario), and so to compare them indirectly. The reasoning is as follows: in a favorable situation, in which the rain gauge density is high enough, all techniques would probably give very good results, very close to the real values of mean rainfall; so it is reasonable to believe that methods with good convergence behavior are also more reliable in a less favorable situation. [4] suggested two indices, GORE and BALANCE, to compare average rainfall values given by different data subsets. On the other hand, several studies have dealt with rainfall analyses

PEN ACCESS JWARP
from the rainfall-runoff modeling perspective [1,6,7,24,25,27]. In some cases, it was even found that reducing the number of rain gauges used could improve model accuracy. However, for water resources management at large time and space scales, it is thought that a larger dataset will yield a better estimate of the mean rainfall and of its behavior, which in some cases is the objective of the study itself. Thus, the methods cited above were compared with the two indices using five scenarios with different spatial data densities, and their results were compared with those obtained from a sixth scenario with the highest available density, the reference set.
For this task, 56 rain gauge stations with daily records for ten consecutive years, from 1965 to 1974, were selected within the study area, allowing at most one missing year. These records were obtained from the ANA (Brazilian National Water Agency) database. Gaps were filled using only the simple mean, and the annual series for each gauge were then constructed.
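The station screening and gap filling described above can be sketched as follows. This is only one possible reading, assumed here: each station is represented by its ten annual totals, a station is kept only if at most one year is missing, and a missing year is replaced by the simple mean of that station's available years. The text does not state the time step at which gaps were filled, so the function `annual_series` and its arguments are hypothetical.

```python
def annual_series(totals, max_missing=1):
    """Return a station's annual totals with missing years (None) replaced by
    the mean of its available years, or None if the station fails screening.

    Assumption: gap filling at the annual level with the station's own mean.
    """
    available = [t for t in totals if t is not None]
    n_missing = len(totals) - len(available)
    if n_missing > max_missing or not available:
        return None  # station rejected: too many missing years
    mean = sum(available) / len(available)
    return [mean if t is None else t for t in totals]


# Hypothetical three-year example: the gap is filled with (800 + 600) / 2.
print(annual_series([800.0, None, 600.0]))
```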
The watershed of the Coremas/Mãe D'Água Dam was chosen as the study area for this work (Figure 1). Its outlet is located at 06.99˚ S, 37.96˚ W, giving a drainage area of 8385 km² and a perimeter of 528 km. The data search region was defined as a north-south oriented rectangle extending 0.5˚ beyond the basin's limits. Records from 56 stations in the states of Paraiba and Pernambuco were used in this study. Figure 1 also shows the distribution of the stations used in each scenario.
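A minimal sketch of the data search region, assuming the basin boundary is available as lists of vertex longitudes and latitudes in decimal degrees (west and south negative) and each station is a hypothetical `(name, lon, lat)` tuple. The function names are illustrative only.

```python
def search_box(basin_lon, basin_lat, offset=0.5):
    """North-south oriented rectangle extending `offset` degrees beyond the
    basin's extreme longitudes and latitudes: (lon_min, lon_max, lat_min, lat_max)."""
    return (min(basin_lon) - offset, max(basin_lon) + offset,
            min(basin_lat) - offset, max(basin_lat) + offset)


def stations_in_box(stations, box):
    """Keep the (name, lon, lat) stations whose coordinates fall inside the box."""
    lon_min, lon_max, lat_min, lat_max = box
    return [s for s in stations
            if lon_min <= s[1] <= lon_max and lat_min <= s[2] <= lat_max]


# Hypothetical basin extent and stations, for illustration only.
box = search_box([-38.0, -37.5], [-7.5, -7.0])
print(box)
print(stations_in_box([("A", -37.9, -7.2), ("B", -36.0, -7.2)], box))
```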
From the 56 selected stations, six scenarios were built by increasing the number of gauges up to the maximum, as shown in Figure 1. The scenarios were organized only to prioritize a homogeneous spatial distribution, without any preliminary evaluation, in an attempt to ensure a random character that favored no method. Scenario 6 was taken as the reference, because it has the most spatial data available. For all calculations, a spatial discretization of 0.05˚ (decimal degrees) was used.
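The 0.05˚ discretization can be illustrated with a grid of cell centres restricted to the basin by a ray-casting point-in-polygon test. This sketch assumes the basin boundary is a simple polygon given as `(lon, lat)` vertices in decimal degrees; the function names are hypothetical. Under this assumption, the basin average of any interpolated surface is the mean of its values over the returned cells.

```python
import numpy as np


def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the polygon [(x0, y0), ...]."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # crossing lies to the right of the point
                inside = not inside
    return inside


def basin_grid(poly, step=0.05):
    """Centres of `step`-degree cells of the bounding box that fall inside the basin."""
    xs, ys = zip(*poly)
    lon = np.arange(min(xs) + step / 2, max(xs), step)
    lat = np.arange(min(ys) + step / 2, max(ys), step)
    cells = [(x, y) for x in lon for y in lat if point_in_polygon(x, y, poly)]
    return np.array(cells)
```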

The GORE Index
The GORE index [4] was adapted as follows: let $P_i$ be the real average rainfall in a given time interval $i$ (here taken to be equal to the result obtained in Scenario 6, because the real value is unknown in practice), and let $P_i^E$ be the estimated rainfall for the same interval $i$ in a given scenario. In the GORE expression, Equation (1), $n$ is the number of time intervals and $\bar{P}$ is the mean of the $P_i$ values over the whole period considered.
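The adapted GORE expression itself is not reproduced in this text. Purely as an assumption, the sketch below uses a Nash-Sutcliffe-type form built from the symbols defined above, $\mathrm{GORE} = 1 - \sum_{i=1}^{n}(P_i^E - P_i)^2 / \sum_{i=1}^{n}(P_i - \bar{P})^2$, with Scenario 6 values standing in for $P_i$. It may differ from the adaptation used in this study and from the original index in [4]; the function name `gore` is hypothetical.

```python
import numpy as np


def gore(p_ref, p_est):
    """ASSUMED Nash-Sutcliffe-type agreement between estimated (P_i^E) and
    reference (P_i) average rainfall over n intervals."""
    p_ref = np.asarray(p_ref, dtype=float)   # P_i, e.g. Scenario 6 results
    p_est = np.asarray(p_est, dtype=float)   # P_i^E, results of a sparser scenario
    p_bar = p_ref.mean()                     # mean of the P_i values over all n intervals
    return 1.0 - np.sum((p_est - p_ref) ** 2) / np.sum((p_ref - p_bar) ** 2)
```

Under this assumed form, 1 means perfect agreement with the reference scenario, and negative values mean the estimates are worse than simply using $\bar{P}$.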

Thiessen Polygon Method-TP
Developed by [28], TP is a simple method created to obtain the average rainfall over large areas. It is frequently used [11,19], and its formulation consists of a weighted average of the rainfall amounts at each station, in which the weights are determined by the area of influence of each station. TP is a good technique when the gauge network is reasonably dense; otherwise, errors may be considerable. However, according to [29], care should be taken regarding the type of precipitation being analyzed: convective rainfall presents high temporal variability, so the measurement intervals must be compatible with it. [19] presents some variants of TP.

Reciprocal Distance Squared Method-RDS
[30] proposed RDS as a tool to determine the mean rainfall over a given area. This method assumes that the rainfall at any point in a given area can be estimated from the observed values, weighted by the inverse of the squared distance from that point to each measurement point. RDS may be considered one of many interpolation techniques based on weighting as a function of distance. It is often used in a wide range of rainfall studies [16], being cited by [13][14][15] and others.
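Both weighting schemes can be sketched on a grid of basin cell centres. As an assumption, the Thiessen weights are approximated by assigning each cell to its nearest gauge, so each station's weight is its share of basin cells, rather than by constructing the polygons exactly; distances are planar distances in decimal degrees. The function names are hypothetical; for RDS, the basin mean is the average of the interpolated cell values.

```python
import numpy as np


def _distances(cells, stations):
    """Planar distance matrix (k cells x m stations)."""
    return np.linalg.norm(cells[:, None, :] - stations[None, :, :], axis=2)


def thiessen_mean(stations, rain, cells):
    """Grid approximation of the Thiessen mean: each cell takes its nearest gauge's value."""
    stations = np.asarray(stations, float)  # (m, 2) lon, lat
    rain = np.asarray(rain, float)          # (m,) rainfall at each gauge
    cells = np.asarray(cells, float)        # (k, 2) cell centres inside the basin
    nearest = _distances(cells, stations).argmin(axis=1)
    return rain[nearest].mean()


def rds_surface(stations, rain, cells, power=2.0):
    """Reciprocal distance squared interpolation at each cell centre."""
    stations = np.asarray(stations, float)
    rain = np.asarray(rain, float)
    cells = np.asarray(cells, float)
    est = np.empty(len(cells))
    for j, row in enumerate(_distances(cells, stations)):
        hit = row == 0.0
        if hit.any():                       # cell coincides with a gauge
            est[j] = rain[hit][0]
        else:
            w = 1.0 / row ** power          # weights 1 / d^2
            est[j] = np.sum(w * rain) / np.sum(w)
    return est
```

The Thiessen version never interpolates between gauges, which is why it yields a basin mean but no usable point surface, whereas `rds_surface` returns a value for every cell.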

Multiquadric Equations Method-ME
The use of quadric surfaces to interpolate point data was initially developed by [20] for application in the geophysical sciences. Later, [21] employed it to fit rainfall surfaces, pointing to ME as a good alternative tool. It is assumed that the real rainfall surface can be obtained by superposing individual quadric surfaces, each one centred on a known point. These surfaces may be parabolic or hyperbolic; the latter give a smoother fit and, especially in the conical case, a simpler implementation [22], which is the formulation adopted in this study. [23] established a formal equivalence between ME and KM. [24], comparing both, chose ME because it is more practical and gives similar results. Furthermore, [16] showed how to reduce the bias of ME.
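A minimal multiquadric sketch, assuming the common kernel $\sqrt{d^2 + c^2}$ centred on each gauge, where $c$ is a hypothetical shape parameter ($c = 0$ gives cones, which may correspond to the conical formulation mentioned above, although the exact formulation of [22] is not reproduced here). The function name is illustrative only.

```python
import numpy as np


def multiquadric_surface(stations, rain, cells, c=0.0):
    """Rainfall surface as a sum of quadrics sqrt(d^2 + c^2), one per gauge,
    evaluated at the given cell centres (c = 0 gives cones)."""
    stations = np.asarray(stations, float)  # (m, 2) gauge coordinates
    rain = np.asarray(rain, float)          # (m,) observed rainfall
    cells = np.asarray(cells, float)        # (k, 2) evaluation points

    def kernel(a, b):
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
        return np.sqrt(d ** 2 + c ** 2)

    # All coefficients come from ONE linear system, so every gauge's
    # contribution depends on the whole network at once.
    coef = np.linalg.solve(kernel(stations, stations), rain)
    return kernel(cells, stations) @ coef
```

Because the coefficients are solved simultaneously, the surface passes exactly through the gauge values; the basin mean is then the average of the surface over the basin cells.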

Kriging Method-KM
Based on the concept of regionalized variables developed by [17], KM consists of a set of techniques that estimate surfaces by modeling the spatial correlation structure of the variables in question. KM assumes that a phenomenon has a large-scale pattern, a local pattern and some local randomness [31]. Moreover, the technique has been regarded as the best linear unbiased estimator [18]. The weight of each observation is determined by fitting a variogram model: the procedure starts from the existing data and their positions to compute the correlations among them, and a model is then fitted to these results. The kriging formulation also allows the estimation errors to be assessed [31]. However, the method poses the problem of choosing the variogram [19]. In this work, two variogram models were tested: KM with a Gaussian variogram (KG) and KM with a cubic spline variogram (KCS).
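The kriging step can be sketched as ordinary kriging with a Gaussian variogram, in the spirit of the KG variant. The variogram parameters below (nugget, partial sill, range parameter `a`) are hypothetical placeholders rather than values fitted to the data, the function names are illustrative, and the cubic spline variogram (KCS) is not shown.

```python
import numpy as np


def gaussian_variogram(h, nugget=0.0, sill=1.0, a=0.5):
    """Gaussian model gamma(h) = nugget + sill * (1 - exp(-(h/a)^2)), with gamma(0) = 0.
    `sill` is the partial sill; all parameter values here are hypothetical."""
    h = np.asarray(h, float)
    g = nugget + sill * (1.0 - np.exp(-(h / a) ** 2))
    return np.where(h == 0.0, 0.0, g)


def ordinary_kriging(stations, rain, point, **vparams):
    """Ordinary kriging estimate at `point`; returns (estimate, weights)."""
    stations = np.asarray(stations, float)
    rain = np.asarray(rain, float)
    m = len(rain)
    d = np.linalg.norm(stations[:, None, :] - stations[None, :, :], axis=2)
    # Kriging system: [Gamma 1; 1^T 0] [lambda; mu] = [gamma_0; 1]
    A = np.ones((m + 1, m + 1))
    A[:m, :m] = gaussian_variogram(d, **vparams)
    A[m, m] = 0.0
    d0 = np.linalg.norm(stations - np.asarray(point, float), axis=1)
    b = np.append(gaussian_variogram(d0, **vparams), 1.0)
    sol = np.linalg.solve(A, b)
    weights = sol[:m]                        # sol[m] is the Lagrange multiplier
    return float(weights @ rain), weights
```

The last row of the system forces the weights to sum to one, which is the unbiasedness condition. A small nugget is a common way to improve conditioning with the Gaussian model, one illustration of why the variogram choice matters in practice.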

Results and Discussion
The average rainfall over the watershed for each year and scenario (1 to 6), estimated by each method tested, is given in Table 1. Table 2 shows the GORE and BALANCE indices obtained by each method when comparing Scenarios 1 to 5 with reference Scenario 6. Table 1 shows that reducing the spatial data generally resulted in an underestimation of the annual mean rainfall. However, since all techniques were exposed to the same data conditions, it is believed that they can be compared directly. All methods gave poor results in Scenario 1, with negative index values. In general, the indices tended to improve as the number of rain gauges increased, but the irregular behavior of both KG and KCS should be noted. Although KCS showed the best results in Scenario 1, it only gave good results again in Scenario 4, and even then not as good as those of TP, RDS or ME. KG, on the other hand, showed good results in Scenario 2 but gave the worst estimates in Scenarios 4 and 5, demonstrating significant instability. Thus, unlike the other techniques, KM was the only method that did not show the expected improvement as data availability increased. These results show that applying geostatistical techniques to rainfall data requires a preliminary study of the data employed and of the variogram model applied.
TP also performed excellently from Scenario 2 onwards, for both indices. However, care should be taken with this technique. TP weights the rainfall measurements by the area of influence of each station within the basin, which implies that good homogeneity combined with sufficient density may yield reliable values of average rainfall, as seen here. Nevertheless, at short time scales or with a poor spatial distribution of data, rainfall variability, or even errors in the records, can contribute greatly to a discrepancy from reality because of the method's formulation. Moreover, TP is not suitable for estimating rainfall at a specific point or sub-region of the basin.
ME and RDS also gave very good and very similar results, with ME standing out somewhat in Scenarios 2, 4 and 5. Like TP, both showed the expected improvement as the number of stations increased. Given its theoretical basis, ME has the advantage that the weights of all stations are determined simultaneously, which allows possible recording errors to be diluted and results in a more reliable rainfall surface than that given by RDS. In other words, ME accounts for the spatial structure of the events. Regarding the BALANCE index specifically, RDS obtained better results only in Scenarios 1 and 3, and in the first of these all methods failed to determine the average rainfall, underestimating it significantly.

Conclusions
A direct comparison of techniques for estimating average rainfall is not possible because the real values are generally unknown. However, this research took another approach to compare some methods indirectly: rather than using only the results obtained, which are sometimes very similar, it analyzed how much data each method requires to approach the results given by the best spatial data scenario; in other words, it compared their convergence behavior using the GORE and BALANCE indices. The expected behavior of each technique was a continuous improvement in the estimated results as the spatial data density increased. Thus, it is reasonable to infer that a method is appropriate, with good performance, when its behavior shows that, even with a small amount of spatial data, its results approach those that would be obtained under better conditions.
The GORE and BALANCE results indicate that TP, RDS and ME can be applied satisfactorily to obtain the average rainfall over an area. KM, on the other hand, tested with two variogram models, showed an unexpectedly unstable behavior.
This reflects the need for preliminary studies of the data and of the variograms to be applied, which is a disadvantage because of the added complexity, especially from the point of view of automating procedures and building management tools.
Among the methods with good performance, ME deserves emphasis for its very good results and for its formulation, which helps to mitigate occasional data errors while allowing a continuous rainfall surface to be estimated. Moreover, the study was carried out in a semi-arid region of Brazil, where rainfall is highly variable even at longer time scales, which reinforces the results obtained.
Further studies using this indirect comparison of techniques are suggested because, despite technological advances, there are many regions where rainfall monitoring is still scarce and reliable water resources planning tools are needed. Along the same lines, more scenario combinations could be tested, using different watersheds and spatial and temporal discretizations, so that the methods can be evaluated under various conditions.

Figure 1. Scenarios used to compare the methods.