Effects of Finer Scale Soil Survey and Land-Use Classification on SWAT Hydrological Modelling Accuracy in Data-Poor Study Areas ()
1. Introduction
Many countries have serious problems managing and modelling their water resources. Droughts, flooding, and water scarcity are widespread in cities and rural areas alike. Water managers often rely on hydrological modelling to help manage water resources, predict scarcity, identify the causes of scarcity problems and improve decision-making. Moreover, models can help assess the impact of incentive-based programs on protecting and maintaining water resources, such as payments for hydrological environmental services (PES), which have expanded rapidly around the world [1].
Researchers have demonstrated that modelling rainfall-runoff processes are greatly influenced by the scale and the quality of the spatial data available [2] - [7]. The use of data from small-scale sources has produced results with poor precision in temporal and spatial models [8] [9] [10]. Nonetheless, other studies have shown that varying the scale of the soil data has a limited effect on runoff predictions [11] [12] [13].
In developing countries, however, hydrological models suffer from a series of drawbacks, including the absence of data at an appropriate scale. Measuring spatial processes at the incorrect scale may produce errors, inefficiencies and low accuracy predictions [14]. The production of high-quality and large-scale maps is too costly and not feasible for public or private institutions. This is the case of Mexico, a country where soil and land use/land cover (LULC) maps covering the entire country have been published at a scale of 1:1,000,000 and 1:250,000 by Mexico’s National Institute of Statistics and Geography (INEGI). The exception is the topographic maps, which can be obtained at the scale of 1:50,000. Soil maps at the scale of 1:250,000 provide a low spatial data density on soil properties since there is approximately one soil sampling site every 230 km2. Therefore, if a small basin is studied, there will probably be no edaphological information from samples obtained on the site. As a result, researchers must parameterise the models with data obtained from soil samples collected in fields. The combined effect of the data gaps and estimates can reduce the model fit [9] [15].
The Soil and Water Assessment Tool (SWAT) model can comprehensively consider soil, land use/land cover, slope, meteorology, reservoir and runoff information to simulate different hydrological cycle processes. Most scholars that have used the SWAT hydrological model in Mexico have pulled detailed information from others studies and the INEGI maps. References [16] [17] [18] [19] used the SWAT model to analyse sediment movement, in [20] assessed land-use and land cover (LULC) change, in [21] on climate change, in [22] [23] [24] on multi-gauge calibration and characterising, in [25] on potential evapotranspiration, and [26] on pollution of rivers obtained accuracies that varied from satisfactory to very good. Only [27] achieved poor to satisfactory accuracies due to the remoteness of the gauging station used to calibrate (located at more than 30 km from their studied basins) and by the absence of observed data to calibrate. Based on these antecedents, we think that these authors have assumed that using small scale data is not suitable to perform hydrological modelling with SWAT. Hence, data at a finer scale must be obtained.
The studies above considered large basins with gauging stations or at least stations very nearby. However, this condition is difficult to meet in many sites around the world because the density of gauging stations is very low. In this context, it seems pertinent to ask if satisfactory results can be obtained in smaller basins. This kind of assessment is needed in Mexico, where payments for hydrological ecosystem services (PES) have widely been applied in the whole country since the year 2003 [28]. Payments are given to communities or individual property owners for conserving forest land in areas as small as ~2 km2. There is strong interest in understanding the impact of these programs in protecting forests, biodiversity and enhancing water services [29]. In Mexico, studies of water services are mainly focused on urban basins [30] [31] [32] with areas between 24 km2 and 33 km2 in Mexico City, where they used large scale data to determine the provision and quality of water services, and the identification of the benefits to the population as well as its alternatives for sustainable water use. None of them focused on the evaluation of PES. Until now, PES effects have been mainly assessed regarding the change in forest cover area [33], while studies focused on assessing water services associated with forest conservation in Mexico are scarce [34] [35].
The objective of this study is to quantify the improved accuracy of streamflow estimates when using data from maps at a larger scale in small basins that are more suitable for assessing the impact of PES programs. We define small basins as those with a surface area under 1,000 km2. A second objective was to evaluate to what extent the publicly available cartography in Mexico can be used in hydrological modelling when assessing PES programs. The study was conducted in a basin of 256 km2 located in “La Frailescana” Natural Protected Area (NPA) of Chiapas, Mexico. Soil and land use cover maps at a larger scale were produced from field work and satellite image interpretation. We chose the SWAT model, because it allows for simulating the effects of land management, changes in land use, and climatic factors in hydrological processes of basins [36]. It is essential to calibrate the model and evaluate its performance.
2. Site Description
We examined the basin of the Blanco River in the Sierra Madre mountain range of Chiapas (SMC), Mexico. The basin is located between 15˚51'14" and 16˚01'51" north latitude and 93˚07'43" and 93˚18'21" west longitude, within the “La Frailescana” Protected Area (Figure 1).
The climate of the basin is warm-humid, with median annual precipitation of 1395 mm. The average annual minimum and maximum temperatures are 14˚C and 36˚C respectively, with an average annual daily temperature of 23.7˚C [37]. Rainfall events are generally intense from July to October, and the dry season runs from December to April. As a result, runoff is highly seasonal, with most of the rainy season producing streamflow from May to November [38].
The Blanco River is approximately 20 km in length with an estimated 256 km2 contributing source area. The river originates in the highest part of the Sierra
Figure 1. Location of the Blanco River basin in the “La Frailescana” protected area. Slope, sub-basin division and location of the weather and gauging station.
Madre at an elevation of 2483 m. Just over half of the surface area of the basin (54%) has slopes greater than 45%. Geological traits consist of metagranite-metagranodiorite, metamorphic complex, siltstone-sandstone and sandy limestone. Metagranite-metagranodiorite and metamorphic formations are mainly found in the valley and slightly dissected mountain [39]. Litosol, Phaeozem and Regosol are soil units mostly found in the slightly dissected mountain and moderately dissected mountain [40]. Due to its geological and soil characteristics, the basin is permeable and the water drains through streams that descend to the slightly dissected mountain, where the infiltration of a large part of the water takes place [41].
Land cover mainly consists of oak forests, cloud forest, pine-oak forest, pine forest, evergreen tropical forest, and secondary tree and shrub vegetation, which together comprise 66% of the basin. Pasture and crop agriculture occupy almost 34% of the basin area, and 0.2% consists of human settlements. Forest cover was lost at an annual rate of 0.9% from 1985 to 2005, although the basin has begun to show signs of forest recovery since 2007. The basin concentrates the PES program of “La Frailescana”, which covers 16% of the basin area. Between the years 2005 and 2012, eight out of the 35 communities in the basin were granted PES funds with an average conservation area of 6.8 km2 per community.
3. Material and Methods
3.1. Description of Model SWAT
SWAT is a physically-based, distributed-parameter model developed to simulate the effects of water and land management practices on basin hydrology and water quality. SWAT is widely used in assessing soil erosion prevention and control, non-point source pollution control and regional management in basins [42]. It is a basin-scale, deterministic model that operates at a daily time step and on continuous-time simulations using topography, soil properties, LULC information, and climate data as main inputs. SWAT divides a basin into sub-basins, which in turn are subdivided into hydrological response units (HRUs), unique combinations of LULC, soil type, and slope class.
The hydrological components simulated in SWAT include evapotranspiration, surface runoff, percolation and lateral runoff (subsurface runoff). To simulate evapotranspiration, the model separately estimates the evaporation of water from the soil and plant transpiration [43]. The Penman-Monteith method was used to estimate potential evapotranspiration in this study [44]. Surface runoff from daily precipitation was calculated using a modification of the Soil Conservation Service (SCS) curve number (CN), which varies according to the soil water content. Percolation was approximated using a storage routing technique combined with a crack-flow model to predict flow through each soil layer [45]. The hydrological components were simulated for each HRU and then aggregated to the basin.
3.2. Model Set-up
The SWAT model version 2012 was set up and parameterised using the Arc-GIS (version 10.2) interface for SWAT (ArcSWAT). The topography variables were obtained from the Digital Elevation Model 3.0 (DEM) obtained from the Mexican Geographical Survey [45], with a resolution of 15 × 15 m. We used a map of rivers from the Basin Water Flow Simulator (INEGI, 2014) to complement the DEM information to delimit the basin and sub-basins in the ArcSWAT. The size of 25 sub-basins was adjusted from 3.6 to 17 km2.
Following the [40] criteria for preparation of soil maps, the slope was divided into four classes: 0% - 15%, 15% - 30%, 30% - 45%, and >45%. Slopes lower than 15% lead to valleys and inter-rill erosion, slopes between 15% and 45% lead to a slightly dissected mountain and when the slope is higher than 45%, the moderately dissected mountain and processes of stream erosion start [40].
We used weather data from The National Center for Environmental Prediction’s Climate Forecast System Reanalysis (CFSR) for the period 1990-2014. The data of the nearest weather station was not used because the station is located 18 km outside the basin and has many gaps. Most of the missing information is in the period 1992-2009, where there were profound changes in land use in the Sierra Madre. The CFSR weather was obtained for a bounding box of latitude 16.18˚, 15.67˚ and longitude −92.81˚, −93.47˚ from the Texas A&M University spatial sciences website [46]. It includes daily temperature maximum and minimum, precipitation, wind speed, relative humidity and solar radiation. The CFRS weather is produced using cutting-edge data-assimilation techniques as well as highly advanced atmospheric, oceanic, and surface-modelling components at 38 km of resolution [3] [47]. The production of CFRS data involves various spatial and temporal interpolation techniques using conventional weather data and satellite products.
3.3. Land Cover and Land Use Mapping and Parameterisation
We used two LULC maps: 1) a Land Use and Vegetation Series II map, key E15-11, at a scale of 1:250,000 from [48], which represents the smaller scale in this study, and 2) a map produced through supervised classification from a Landsat OLI/TIRS image acquired on January 2015. The scale of 1:100,000 is the upper limit of this type of satellite images [49], so this map represents the large-scale.
Due to time and financial constraints, both maps were parametrised by assigning pre-established SWAT classes to the land cover and land use groups [43]. We used generic classes of land cover included in the SWAT database because some classes were not found in the SWAT crops database. Adjustments were made only in maximum leaf area index (BLAI) and maximum canopy height (CHTMX) parameters for forest categories [50] (Table 1). Cloud forest and secondary pine-oak tree vegetation were included in the “mixed forest” category. Pine forest
Table 1. Reclassification of land use/land cover types in SWAT categories.
aBLAI means maximum leaf area index, CHTMX means maximum canopy height.
and high evergreen forest were reclassified as “evergreen forest”. The secondary shrub vegetation class of the pine-oak forest together with the pine-oak forest as a whole were assigned SWAT’s “deciduous forest” class considering that the species of the Quercus genus found in the area have a deciduous behaviour. We selected SWAT’s maize class for agriculture as maize is the principal crop in the study area. The main differences between both maps are the number of classes, the fragmentation of each class, which is much higher for the map at 1:100,000 scale, and the total area of each class (Figure 2).
Figure 2. (a) Land use/land cover maps; (b) Soil maps used in SWAT (meaning of abbreviations in Table 2 and Table 3).
Table 2. Landscape units in the Blanco River basin.
Table 3. Basic properties of soil classes in the upper soil horizon (see Table 2 for the map description of landscape units).
aI = Litosols, Hh = Phaeozem, Re = Regosol, Bc = Cambisol, Je = Fluvisol, Lc = Luvisol.
3.4. Soil Mapping and Parameterisation
We also used two soil maps. The first was the Soil Series I map, produced by [51] at the scale of 1:250,000. This map uses the FAO/UNESCO soil taxonomy, and its cartographic units are composed commonly of two or three classes where the first listed is the dominant soil. Five soil class associations occur in the studied watershed. The second map was based on landscape units that incorporate soil information obtained from soil sampling.
To parametrise the first map, we obtained the data needed by SWAT from the table annexed to the chart that lists the soil properties of the sites where a soil sample was collected and analysed. We chose the 13 closest sites with the same soil classes that are dominant in our basin. None of sites were within the basin. To parametrise the second map, 72 soils samples were collected from 30 sites, whose location was based on field accessibility, type of land use, and type of vegetation. The trial pit technique [52] was applied to characterise the soil profiles at each site.
Samples were analysed in the laboratory following the Mexican Official Standards (NOM-021-RECNAT-2000) [53] to obtain percentages of sand, silt, and clay, texture class, and percentage of organic matter. Other soil properties-saturated hydraulic conductivity, available water capacity, and soil bulk density were calculated using the SPAW software, version 6.02 [54]. To determine the hydrological group, we used the NumCur software. Soil erodibility was calculated using the Modified Universal Soil Loss Equation (MUSLE) [55]; while organic carbon content was calculated using the following equation [2] [56]:
(1)
Thirty soil sampling sites in an area of 256 km2 is equal to one site every 8.5 km2, a sampling density similar to a 1:50,000 soil map produced by INEGI. Hence, the second map represents the larger scale in our comparison. The input soil parameters entered to SWAT were obtained by averaging the values of the sites corresponding to the same class. For the second map, sites were grouped into nine landscape units constructed under the criteria proposed by [52], who point out that the landscape unit combines relief, geology, and climate, as the position occupied by soil in the landscape determines the state of the water table and, therefore, soil drainage conditions and associated morphological traits. Landscape units were defined by combining the climate map by [57] at the scale of 1:250,000, the geomorphological map of the basin drawn up based on the methodology of [58] with a scale of 1:100,000, and the geological map of the Mexican Geological System [39] at the scale of 1:250,000 (Table 2, Figure 2).
The parameters of basic soil properties in the two maps are presented in Table 3. Whereas there are no clear differences in organic matter content and horizon depth between maps, soils in the larger scale map have a higher sand percentage.
3.5. Landscape Representation Using SWAT
The subdivision of the hydrological response units (HRUs) was performed with slope class, soil type and LULC maps. HRUs were defined as multiple (no thresholds) because the use of maps at different scales leads to a loss of information by defining dominant HRUs [59]. When multiple HRUs were defined, the model used all information from the maps, and a higher accuracy was obtained. In contrast, when dominant HRUs were delineated, LULC classes of large-scale maps had 51% of their area changed while small-scale maps had 8% of their area modified (Figure 3). Similar conclusions may be reached by comparing the two soil maps.
The model processes are controlled by many parameters, which are presented in [60] and [61]. In this study, the focus is on the model components and parameters affected by the soil and LULC mapping approaches. A visual comparison of the small and large-scale maps was performed, and the spatial distribution of soil classes was related to their parameterisation. Finally, the combination of the two LULC maps with the two soil maps resulted in four different versions of the model (Table 4).
3.6. Streamflow Data
Considering the availability of observed streamflow data for calibration and
Figure 3. Variation in the spatial distribution of multiple and dominant HRUs by model.
Table 4. Versions of the SWAT model based on the combination of LULC and soil data.
validation, monthly calibration was performed from 1 January 1993 to 31 December 1998, and validation from 1 January 1999 to 31 December 2001. A 25-year period (1 January 1990 to 31 December 2014) was selected for simulation. The first three years were used to allow parameters to reach equilibrium (warm-up period), and the remaining period was used for calibration and validation. Streamflow data is provided on a monthly basis and was obtained from the National Data Bank of Surface Water (by its initials in Spanish, BANDAS) [62]. The gauging station near the basin was “El Brillante”, located 12.4 km from the reservoir of the basin studied. Stream flows into “El Brillante” station from a surface area of 714 km2. According to [63] [64], streamflow data from BANDAS in Chiapas are homogeneous, and therefore data from streamflow observed was adjusted proportionally to the smaller size of the study basin.
3.7. Parameters for Calibration and Uncertainty Analysis
Sensitivity, calibration and uncertainty analysis were performed using the Sequential Uncertainty Fitting, version 2 (SUFI-2) procedure of the software SWAT-CUP (SWAT Calibration and Uncertainty Programs) version 5.1.4 [65]. The sensitivity analysis was performed with the uncertainties of the parameter ranges, which are sampled through the Latin Hypercube-One factor At a Time (LH-OAT) method [66]. This approach has the advantage of producing a ranked list of sensitive parameters. We selected a list of 10 parameters related to streamflow for the sensitivity analysis. Subsequently, we removed the parameter that was found to be insensitive for the model output. Only the sensitive parameters were used for calibration. The choice of parameters, as well as their ranges, was based on studies in Mexico [18] [21] [30] [67] [68]. Parameters that govern the surface runoff, evapotranspiration, groundwater and soil were used (Table 5).
The calibration parameter ranges were based on sensitivity analysis using the SUFI-2 algorithm. SUFI-2 provides a platform to conduct calibration and validation, as well as uncertainty analysis. The know uncertainties are calibrated to bracket most of the observed data in the prediction uncertainty for a confidence level of 95% (95PPU). This algorithm was used to estimate the most observed variables within the 95 PPU band, which is quantified with an accumulated distribution of 2.5% and 97.5%. SUFI-2 considers the p-factor and r-factor to evaluate the performance of the calibration. P-factor is the percentage of measured data bracketed by the 95PPU band, and varies between 0 for useless simulation and 1 for perfect simulation. R-factor is calculated through the ratio of the average width of the 95PPU band (Prediction Uncertainty) and the standard deviation of the observed variable. The r-factor represents the range of the uncertainty interval and should be as small as possible.
3.8. Objective Function
The performance of the model was evaluated with the statistical Nash-Sutcliffe efficiency index (NSE) Equation (2). NSE measures whether an observed value
Table 5. Parameters selected for SWAT model calibration.
is better estimated by the model result or by the average of observed values. According to [69], for monthly discharge, the NSE ranges from ∞ to 1 and measures how well the plot of observed versus simulated data fits the 1:1 line.
(2)
where Qobs and Qsim are the observed and simulated streamflow at the i = 1 time step, respectively; Qmean is the average of the observed streamflow, and n is the total number of observations.
3.9. Model Evaluation
The procedures and criteria used for calibration and validation processes are:
1) We analysed the performance of the model graphically using the hydrograph and representing the over and underestimation, both general and seasonal, by plotting the observed and simulated data for the calibration period.
2) The objective function selected in the calibration process was NSE. However, we analysed model accuracy using R2, NSE (Equation (2)) and the percent bias (PBIAS) (Equation (3)), since they are the most widely used statistics for hydrological calibration and validation [70]. The R2 ranges from −1 to 1, and describes the proportion of the variance in measured data explained by the model [70]. The optimal value of PBIAS is 0, positive values indicate model underestimation bias, and negative values indicate model overestimation bias [71].
(3)
The variables in Equation (3) have similar meanings to those in Equation (2).
3) Due to memory limitations, the model simulations were run 150 times each, with a maximum of three iterations. The number of simulations and the iterations were based on the performance of the initial simulation.
4) Best simulation with a lower discrepancy between observed and simulated signals according to the objective functions was obtained.
5) Validation was performed afterwards following the SUFI-2 procedure with three years of observed data.
4. Results and Discussion
4.1. Evaluation of the Hydrological Model
Graphical results (Figure 4) indicated adequate calibration and validation over the range of observed streamflow, although the calibration results showed a better match than the validation results. The model with large-scale data (model 4) showed a better match with the observed streamflow than model 1, which used data from smaller-scale maps and models 2 and 3 that combined large and small scales. Also, only model 4 showed a better adjustment in the validation period than in the calibration one.
The volume of streamflow simulated in model 4 was more precise than model 1. By using information from large-scale maps (model 4), simulated streamflow was greater, and a better adjustment was obtained to the observed data (Figure 4). Upon calibrating simulated streamflow in model 1, an overestimation was observed during the dry seasons.
Model 2 adjusted better to the values observed during the rainy seasons, but the bias of simulated streamflow is greater than that obtained from the other models during the dry seasons. Model 3 shows a better adjustment to the observed data, although it is not sufficiently precise for the high levels of streamflow observed. Using the landscape map in models 2 and 4, streamflow and percolation increase so that the soil map, which represents the large-scale, improves precision during rainy seasons. In the simulation and the adjustment of results, models 1, 2 and 3 have similar precision levels. Concerning temporal variability, the four models represent quite precisely the streamflow observed due to the use of the same climatic database (Table 6). Actual and potential evapotranspiration increase using small-scale soil data (models 1 and 3) and have a significant decrease using large-scale soil data (models 2 and 4) (Table 6). Similar results were found for the shallow aquifer recharge.
4.2. Differences between Soil Databases
We found that the parameterisation of soil maps showed large variations in the magnitudes of the values of saturated hydraulic conductivity and erodibility factor.
Figure 4. Monthly/year observed and simulated streamflow for the calibration and validation period 1993-2001 according to the version of the model.
For example, the IHhRe class of the small soil scale map covered a larger area in the basin when we overlapped it with the large-scale soil map. The erodibility factor was higher in the IHhRe class from the small-scale map, while erodibility decreased when the percentage of surface area decreased on the large-scale map. On the large-scale map, the saturated hydraulic conductivity values and soil
Table 6. Mean annual water balance for the calibration period 1993-2002.
*All units in mm.
carbon increased while these values decreased in the small-scale map (Figure 5). From this variability, a significant effect can be expected on the components of the water balance, as well as on the streamflow. Based on the soil properties described in Figure 4, the saturated hydraulic conductivity in the upper soil horizon varied between “slow” and “moderate to rapid” [43], therefore the large-scale soil map can explain the gradients of the water dynamics in the soil. Similar results could be reached by comparing the IReHh class (on the small-scale map) with the large-scale map. Only the available water capacity values had similar behaviour in both soil maps.
4.3. Sensitivity Analysis
The sensitivity analysis indicated that SWAT model was sensitive to moisture conditions, followed by infiltration parameters (NSE ≤ 0.05). These parameters agreed in general with previous SWAT modelling in the basins of southern Mexico [24] [25]. The parameters used in the model are grouped into sensitive, less sensitive and insensitive. Sensitive parameters are the parameters that contribute an effect on the output, less sensitive are the parameters with a little effect in the presence of several changes in values, and insensitive parameters are those that do not have any effect of change. To achieve the best model fit, only the most sensitive parameters were adjusted in the calibration process (Table 7). The initial values were defined to see a change in the output for each parameter [24] [60] [72]. After that, the initial values were adjusted depending on soil type, and the sensitive behaviour in relation to the accuracy of the model. It was observed that the most sensitive parameters were those with the greatest difference between the maximum and minimum NSE.
The model with large-scale soil data (models 2 and 4) had better accuracies when the range of initial values was reduced, while the model with small-scale soil (models 1 and 3) retained the initial values. Moisture condition curve number was the first parameter calibrated in the model; this result is in accordance with [73], which suggests that the sensitive parameter of a model is mostly related to the infiltration. In the model with large-scale soil data, it was observed that
Figure 5. Soil properties for selected reference parametrisation of the two databases of the upper soil horizon for soil erodibility, soil carbon, available water capacity, and soil hydraulic conductivity by soil associations area.
Table 7. Sensitivity ranking of parameters used to calibrate and validate streamflow (ranked in descending order from the most sensitive to the not sensitive).
aDefault values are given by SWAT. In the parameter names, r_means the existing parameter value is multiplied by (1 + a given value); v_means the existing parameter value is to be replaced by the given value and a_means a given value is added to the existing parameter [65].
more water from the upper horizon was allowed to satisfy the streamflow demand in rainy seasons and did not overestimate the streamflow in dry periods. Only in the simulations that use data from the smaller scale soil map (models 1 and 3) were adjusted by ±20%.
GW_DELAY was the second most sensitive parameter, and it was decreased from 450 days to 31 days. This represents that the model produces a large amount of water available for contribution to streamflow. Consequently, the shallow aquifer releases water more quickly, and the system behaves more like a permeable layer.
However, the volume of water that contributes to the streamflow when using small scale soil (models 2 and 4) comes from the shallow aquifer, while the volume of water contributing to the streamflow using small-scale soil data (models 1 and 3) comes from the surface runoff and lateral flow (Table 7). If GW_DELAY had been with larger values (around 400 days), consequently the shallow aquifer releases the water more slowly, and the system behaves more like a semi-permeable layer, the basin discharge would be determined by the volume of stored water when it should be characterized by non-linear rate coefficients [25] [43] [74].
In all four models, the threshold water depth in the shallow aquifer (REVAPMN) was the third most sensitive parameter. “REVAP” relates to soil evaporation and plant transpiration. By reducing the GW_DELAY, REVAPMN decreased from 1000 to 750 mm. Other adjusted parameters were saturated hydraulic conductivity (SOL_K), available water capacity of soil layer (SOL_AWC) and baseflow alpha factor (ALPHA_BF). GW_DELAY remained sensitive when ALPHA_BF was modified. The ALPHA_BF describes the rate at which streamflow decreases when groundwater recharges the stream channel. The best ALPHA_BF values are obtained by analysing measured streamflow during periods of no recharge in the basin (Table 7). ALPHA_BF increased while those parameters with impact on the formation of surface runoff such as SOL_K and SOL_AWC decreased. However, it could be observed that using the larger-scale soil data (models 2 and 4) a slow response to recharge was obtained while models 1 and 3 had a fast response. Therefore, the volume of water in soil is higher when using small-scale soil maps leading to an overestimation of runoff, mainly in dry seasons.
4.4. Calibration and Validation
The model performance during calibration can be qualified as good since the objective function NSE ≤ 0.5, based on the general performance ratings for a monthly time step given by [70]. The model using large-scale LULC maps (models 3 and 4) produced the best simulation accuracy and can be considered good (Table 8 and Figure 6). The other two models (1 and 2), in contrast, can be considered marginally satisfactory; whereas their NSE values fluctuate between 0.52 and 0.57 while the goodness of fit between the measured and the simulated R2 performed the same way.
The correlation between the measured flow and the simulated showed that physical processes implicated in the generation of streamflow in the study basin are satisfactorily captured by the model during calibration. However, the model performance during validation can be qualified as no-satisfactory because R2 showed values below 0.5, while model 4 maintains a good R2. One of the major
Figure 6. Comparison of simulated streamflow with values observed according to the model.
Table 8. Final calibration and validation statistics for the four versions of the model.
In the performance rating, 0.50 < NSE ≤ 0.65 is satisfactory and 0.65 < NSE ≤ 0.75 is good. The R2 have similar meanings to NSE. ±15 ≤ PBIAS < ±25 is satisfactory, ±10 ≤ PBIAS < ±15 is good, and PBIAS < ±10 is very good [69].
reasons is that the model induces the fit of the simulated peak flows with those observed during the calibration period, and therefore the model maintains the same fit in the validation period. However, the observed peak flows were lower in the validation period, so the simulated peak flows tend to be closer. This behaviour can also be observed in the PBIAS values, where the simulated peak flows are lower than the observed peaks for all four simulations (<12%) for the calibration period, and the PBIAS values were less than 10% in the validation period. Local evidence suggested that the study basin had a higher deforestation rate, about 0.9% of the total area from 1985 to 2005, and a forest recovery process from 2007 due to the implementation of conservation activities with the PES funds [50]. In addition, the basin has faced hydro-meteorological phenomena such as extreme rainfall and floods derived from tropical cyclones in 1998 and 2005 [38] [75], all of which create uncertainty in streamflow volumes in the period selected for model calibration and validation.
Based on the sensitivity analysis, the accuracy of models 1 and 3 can be higher if it is considered that curve number (CN2) values were allowed to vary up to 20%. However, according to [60], a variation in the CN2 values by ±10% can be considered an acceptable range of error. If we had maintained the interval of CN2 within a variation of less than 10%, the values of NSE and R2 would have been lower. Several authors suggest that a balance must be found between the variation that can be allowed in the CN2 parameter and the accuracy of the model [60] [76] [77]. Therefore, we consider that a variation of ±20% in the CN2 parameter results in a loss of accuracy that is still acceptable. A CN2 value higher than ±20% would have produced significant changes in surface runoff, and the model would have been wrong even if it had good accuracy. In contrast, in models based on large-scale soil data (2 and 4), it was not necessary to allow CN2 values to vary beyond 10%. In summary, the use of large-scale soil maps can better explain the sensitivity of the parameters that evaluate the basin with satisfactory accuracy, while with the use of large-scale LULC maps (model 3), a better accuracy of the model can be obtained, the latter can be explained because the numerical curve method in SWAT model is based on the Soil Conservation Service (SCS) method which considers the antecedent moisture and LULC type. Although this method is based in LULC from the United States, it is widely used in Mexico because of its simplicity and the lack of instrumentations in basins [75] [78].
5. Conclusions
It is clear that the quality of the hydrological model depends on the underlying database. In this study, it was shown that it is not necessary to have maps at a larger scale than INEGI maps to have good streamflow estimations, but also to consider the criteria in the development of a soil or LULC map for application in hydrological modelling, which may depend on the problem to be studied. INEGI maps may still be useful for exploratory purposes in hydrological modelling.
Our results suggest that greater care should be taken in selecting the scale of the maps due to soil maps can better explain the subsurface and surface hydric dynamics of the basin, while LULC maps can improve the accuracy in the model; this can be observed in the performance of the four models and the information implemented in each one of them. The sensitivity analysis does not consider the impact of HRUs and the number of sub-basins in the model setup, factors that are generally significant [79] [80]. However, our study addresses the uncertainty of the parameterisations of the four models, allowing us to observe that the impact of CN2 on the streamflow simulation by using a large-scale soil map is smaller than using a smaller scale.
Previous studies using SWAT in Mexico show that hydrological modelling is restricted to areas where there is a gauging station at least, data at larger scales than INEGI maps and basins larger than 1000 km2. Our study shows that it is possible to achieve good model performance in smaller basins, but this requires any specific data from the study area.
Given that the public budget in Mexico is limited, we propose the following strategy. First, a preliminary evaluation of INEGI maps must be done before they can be used, after an initial evaluation is done to secure that the accuracy of data available to the public is at least satisfactory. In the exploratory analysis can be determined which input data is required at larger scales and which available data can be introduced to reduce, in this way, the cost of a limited budget where it can be allocated in areas with a requirement to assess the effect of a land cover conservation scheme on hydrological services.
Finally, priority areas necessarily must include a gauging station inside or have one very close to them. Gauging stations can be located further downstream and serve several small basins. Therefore, in addition to the fundamental role of the gauging stations, it is important to select a period of the observed streamflow where the basin has not presented events that lead to an uncertainty of the hydrological service (water quantity) in order to have optimal values for calibration and validation. However, if a basin has trajectories of environmental change contrasting, then an analysis of the impact of land-use change on streamflow would be necessary.
Acknowledgements
This work was supported by the Consejo Nacional de Ciencia y Tecnología (CONACYT; Mexico’s National Science and Technology Council) with grant number 284431. This work was developed during an academic stay at the School of Community and Regional Planning, University of British Columbia.