Natural Resources, 2010, 1, 11-18
doi:10.4236/nr.2010.11002 Published Online September 2010 (http://www.SciRP.org/journal/nr)
Copyright © 2010 SciRes. NR
Evaluation of Various Linear Regression Methods for Downscaling of Mean Monthly Precipitation in Arid Pichola Watershed
Manish Kumar Goyal, Chandra Shekhar Prasad Ojha
Dept. of Civil Engineering, Indian Institute of Technology, Roorkee, India.
Email: vipmkgoyal@rediffmail.com
Received July 20th, 2010; revised August 4th, 2010; accepted August 11th, 2010.
ABSTRACT
In this paper, downscaling models are developed using various linear regression approaches, namely direct, forward, backward and stepwise regression. The models downscale GCM output to mean monthly precipitation under IPCC SRES scenarios at the watershed-basin scale in an arid region of India. The effectiveness of these regression approaches is evaluated by applying them to downscale the predictand for the Pichola lake region in Rajasthan state, India, which is considered to be a climatically sensitive region. The predictor variables are extracted from (1) the National Centers for Environmental Prediction (NCEP) reanalysis dataset for the period 1948–2000, and (2) the simulations from the third-generation Canadian Coupled Global Climate Model (CGCM3) for emission scenarios A1B, A2, B1 and COMMIT for the period 2001–2100. Because reanalysis data are based on a wide range of meteorological measurements and observations, the selection of important predictor variables is a crucial issue in developing downscaling models. Direct regression was found to yield the best performance of the regression techniques explored in the present study. The results of the downscaling models show that precipitation is likely to increase in the future for the A1B, A2 and B1 scenarios, whereas no trend is discerned for the COMMIT scenario.
Keywords: Backward, Forward, Precipitation, Regression, Stepwise
1. Introduction
Global circulation models (GCMs) are an important tool for assessing climate change. They are numerical models designed to simulate the past, present and future climate [1]. However, these models remain relatively coarse in resolution and cannot resolve significant sub-grid scale features. In most climate change impact studies, such as those of the hydrological impacts of climate change, impact models must simulate sub-grid scale phenomena and therefore require input data at a similar sub-grid scale. The methods used to convert GCM outputs into the local meteorological variables required for reliable hydrological modeling are usually referred to as “downscaling” techniques [2,3]. Precipitation is an important parameter in climate change impact studies. Various water resources planning and hydro-climatological scenarios therefore require a proper assessment of probable future precipitation and its variability.
A number of papers have previously addressed the downscaling of precipitation at various time scales. These include 1) low-frequency rainfall events [4], 2) daily precipitation [5], 3) seasonal precipitation [6], 4) daily and monthly precipitation [7], 5) monthly precipitation [8-11] and 6) annual precipitation [3].
In this paper, we explore four linear regression approaches as downscaling methodologies: (a) direct regression, (b) forward regression, (c) backward regression and (d) stepwise regression. We use them to study the impact of climate change over the Pichola lake basin in an arid region. To the authors' knowledge, no study in the literature has evaluated these regression approaches simultaneously. The objectives of this study are therefore 1) to rank the various regression approaches and 2) to use the best-performing approach to downscale mean monthly precipitation from CGCM3 simulations for the latest IPCC scenarios. The scenarios studied in this paper are relevant to the Intergovernmental Panel on Climate Change’s (IPCC’s) fourth
assessment report (AR4) which was released in 2007.
2. Study Region
The study area is the Pichola lake catchment in Rajasthan, India, situated within 72.5°E to 77.5°E and 22.5°N to 27.5°N. It receives an average annual precipitation of 597 mm. The region has a tropical monsoon climate, in which most of the precipitation is confined to a few months of the monsoon season. The south–west (summer) monsoon carries warm winds from the Indian Ocean, causing copious precipitation during June–September.
The Pichola watershed, located in Udaipur district, Rajasthan, is one of the major sources of water supply for this arid region. During the past several decades, the streamflow regime in the catchment has changed considerably. This has resulted in water scarcity, low agricultural yield and degradation of the ecosystem in the study area [12]. Regions with arid and semi-arid climates can be sensitive even to small changes in climatic characteristics [13]. Temperature affects evapotranspiration [14], evaporation and desertification processes, and it is also considered an indicator of environmental degradation and climate change. Understanding the relationships among the hydrologic regime, climate factors and anthropogenic effects is important for the sustainable management of water resources in the entire catchment. This study area was chosen for these reasons. The location map of the study region is shown in Figure 1.
3. Data Extraction
The monthly mean atmospheric variables were derived from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR, hereafter called NCEP) reanalysis data set [15] for the period January 1948 to December 2000. The data have a horizontal resolution of 2.5° latitude × 2.5° longitude and seventeen constant pressure levels in the vertical. The atmospheric variables are extracted for nine grid points at a spatial resolution of 2.5°, with latitudes ranging from 22.5°N to 27.5°N and longitudes from 72.5°E to 77.5°E. Monthly precipitation records are available for Pichola Lake, located in Udaipur at 24°34’N latitude and 73°40’E longitude, for the period January 1990 to December 2000 [12].

The Canadian Center for Climate Modeling and Analysis (CCCma) provides GCM data for a number of surface and atmospheric variables for the CGCM3 T47 version. This version has a horizontal resolution of roughly 3.75° latitude by 3.75° longitude and a vertical resolution of 31 levels. The data comprise present-day (20C3M) simulations and future simulations forced by four emission scenarios, namely A1B, A2, B1 and COMMIT. The nine grid points surrounding the study region are selected as the spatial domain of the predictors, so that the various circulation domains of the predictors considered in this study are adequately covered. The GCM data are re-gridded to a common 2.5° grid using the inverse square interpolation technique [16]. The utility of this interpolation algorithm has been examined in previous downscaling studies [17,18].

Figure 1. Location map of the study region (Pichola lake) in Rajasthan State of India with NCEP grid.
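The re-gridding step can be illustrated with a short inverse-distance-weighting sketch. The text does not spell out the formulation of the inverse square technique of [16]. The sketch therefore assumes that each source node is weighted by 1/d², with d measured as plain Euclidean distance in degrees. The function name `idw_regrid`, the 3.75° source node coordinates and the field values are all hypothetical; none of them are CGCM3 data.

```python
import numpy as np


def idw_regrid(src_lat, src_lon, src_val, tgt_lat, tgt_lon, power=2.0):
    """Inverse-distance-weighted interpolation of one field onto target points.

    Distances are Euclidean distances in degrees of latitude/longitude, a
    simplifying assumption; a great-circle distance could be used instead.
    """
    src_lat = np.asarray(src_lat, dtype=float).ravel()
    src_lon = np.asarray(src_lon, dtype=float).ravel()
    src_val = np.asarray(src_val, dtype=float).ravel()
    out = np.empty(len(tgt_lat))
    for k, (la, lo) in enumerate(zip(tgt_lat, tgt_lon)):
        d = np.hypot(src_lat - la, src_lon - lo)
        coincident = d < 1e-9
        if coincident.any():            # target sits exactly on a source node
            out[k] = src_val[coincident][0]
            continue
        w = 1.0 / d**power              # power = 2 gives inverse-square weights
        out[k] = np.sum(w * src_val) / np.sum(w)
    return out


# Target: the nine 2.5-degree NCEP points (22.5-27.5 N, 72.5-77.5 E).
lat2d, lon2d = np.meshgrid([22.5, 25.0, 27.5], [72.5, 75.0, 77.5], indexing="ij")
tgt_lat, tgt_lon = lat2d.ravel(), lon2d.ravel()

# Hypothetical 3.75-degree source nodes and a synthetic field (not CGCM3 data).
slat2d, slon2d = np.meshgrid([18.75, 22.5, 26.25, 30.0], [71.25, 75.0, 78.75], indexing="ij")
field = 300.0 - 0.5 * slat2d          # synthetic temperature-like field (K)

regridded = idw_regrid(slat2d, slon2d, field, tgt_lat, tgt_lon)
print(regridded.round(2))
```

Because the weights are positive and normalized, every interpolated value is a convex combination of the source values. An interpolated value therefore never falls outside the range of the surrounding nodes.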
4. Regression Approaches
In the statistical (automated) selection methods described below, the order in which the predictor variables are entered into (or removed from) the model is determined by the strength of their correlation with the criterion variable (the predictand).
In direct regression, all available predictor variables are entered into the equation at once. Each is then assessed on the basis of the proportion of variance in the criterion variable (Y) that it uniquely accounts for.
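A sketch of direct regression follows. It is written in Python/NumPy and is an illustration, not the authors' code. It fits ordinary least squares with every predictor entered at once. It then measures each predictor's unique contribution as the drop in R² when that predictor alone is removed from the full model. The helper names `fit_direct`, `r_squared` and `unique_contributions` and the synthetic data are assumptions.

```python
import numpy as np


def fit_direct(X, y):
    """Direct ('enter') regression: every predictor enters at once via OLS."""
    A = np.column_stack([np.ones(len(y)), X])      # intercept + all predictors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef                                     # [b0, b1, ..., bp]


def r_squared(X, y, cols):
    """R^2 of an OLS fit (with intercept) on the predictor columns `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def unique_contributions(X, y):
    """Drop in R^2 when each predictor alone is removed from the full model."""
    full = list(range(X.shape[1]))
    r2_full = r_squared(X, y, full)
    return np.array([r2_full - r_squared(X, y, [c for c in full if c != j])
                     for j in full])


rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=120)  # x2 irrelevant
print(fit_direct(X, y).round(2))
print(unique_contributions(X, y).round(3))
```

The irrelevant third predictor still receives a coefficient, close to zero, because direct regression never discards variables.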
In forward selection, the variables are entered into the model one at a time, in an order determined by the strength of their correlation with the criterion variable. The effect of adding each variable is assessed as it is entered, and variables that do not significantly add to the success of the model are excluded [19].
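A minimal forward-selection sketch is shown below. It assumes the usual partial F test as the significance criterion, with a hypothetical F-to-enter threshold of 4.0 (roughly a 5% level for large samples). At the first step, the largest partial F corresponds to the predictor most strongly correlated with the criterion variable. Later steps use that correlation after adjusting for the variables already entered. The helper names and the synthetic data are illustrative.

```python
import numpy as np


def sse(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return float(resid @ resid)


def partial_f(X, y, small, big):
    """Partial F for the one extra column that `big` has over `small`."""
    df = len(y) - len(big) - 1                 # residual d.o.f. of larger model
    if df <= 0:                                # no residual d.o.f. left
        return 0.0
    sse_small, sse_big = sse(X, y, small), sse(X, y, big)
    if sse_big <= 1e-12:                       # (near-)perfect fit
        return float("inf")
    return (sse_small - sse_big) / (sse_big / df)


def forward_select(X, y, f_in=4.0):
    """Add, one at a time, the candidate with the largest partial F;
    stop when no candidate reaches the F-to-enter threshold `f_in`."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        scores = {c: partial_f(X, y, selected, selected + [c]) for c in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < f_in:
            break                              # nothing adds significantly
        selected.append(best)
        remaining.remove(best)
    return selected


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 1.5 * X[:, 0] - 2.0 * X[:, 3] + 0.3 * rng.normal(size=200)
sel = forward_select(X, y)
print(sel)
```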
In backward selection, all the predictor variables are first entered into the model. The weakest predictor variable is then removed and the regression recalculated. If this significantly weakens the model, the predictor variable is re-entered; otherwise it is deleted. This procedure is repeated until only useful predictor variables remain in the model [20,21].
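Backward elimination can be sketched with the same partial F test, here computed for dropping each variable from the current model. The F-to-remove threshold of 4.0 is an assumption. In this sketch the "re-entered" case in the text corresponds to stopping: when even the weakest variable is significant, nothing more is removed. The helpers and the data are illustrative.

```python
import numpy as np


def sse(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return float(resid @ resid)


def partial_f(X, y, small, big):
    """Partial F for the one extra column that `big` has over `small`."""
    df = len(y) - len(big) - 1
    if df <= 0:
        return 0.0
    sse_small, sse_big = sse(X, y, small), sse(X, y, big)
    if sse_big <= 1e-12:
        return float("inf")
    return (sse_small - sse_big) / (sse_big / df)


def backward_eliminate(X, y, f_out=4.0):
    """Start with all predictors; repeatedly drop the weakest one."""
    selected = list(range(X.shape[1]))
    while selected:
        # F for dropping each variable from the current model
        scores = {c: partial_f(X, y, [s for s in selected if s != c], selected)
                  for c in selected}
        weakest = min(scores, key=scores.get)
        if scores[weakest] >= f_out:
            break                              # removal would weaken the model
        selected.remove(weakest)
    return selected


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 1.5 * X[:, 0] - 2.0 * X[:, 3] + 0.3 * rng.normal(size=200)
sel = backward_eliminate(X, y)
print(sel)
```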
Stepwise regression is the most sophisticated of these statistical methods. Each variable is entered in sequence and its value assessed. If adding the variable contributes to the model, it is retained. All other variables in the model are then re-tested to see whether they still contribute to the success of the model, and any that no longer contribute significantly are removed. This method is therefore intended to yield the smallest possible set of predictor variables in the final model [22].
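The stepwise procedure can be sketched as a forward step followed by a re-test of every variable already in the model. The thresholds are hypothetical: F-to-enter 4.0 and F-to-remove 3.9. Setting F-to-remove slightly below F-to-enter is a common guard against a variable cycling in and out, and `max_steps` is a second guard. This is an illustration under those assumptions, not the implementation used in the study.

```python
import numpy as np


def sse(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return float(resid @ resid)


def partial_f(X, y, small, big):
    """Partial F for the one extra column that `big` has over `small`."""
    df = len(y) - len(big) - 1
    if df <= 0:
        return 0.0
    sse_small, sse_big = sse(X, y, small), sse(X, y, big)
    if sse_big <= 1e-12:
        return float("inf")
    return (sse_small - sse_big) / (sse_big / df)


def stepwise_select(X, y, f_in=4.0, f_out=3.9, max_steps=100):
    """Forward step, then re-test every variable already in the model."""
    selected = []
    for _ in range(max_steps):
        changed = False
        remaining = [c for c in range(X.shape[1]) if c not in selected]
        if remaining:
            scores = {c: partial_f(X, y, selected, selected + [c]) for c in remaining}
            best = max(scores, key=scores.get)
            if scores[best] >= f_in:           # enter the strongest candidate
                selected.append(best)
                changed = True
        if len(selected) > 1:                  # re-test variables already in
            scores = {c: partial_f(X, y, [s for s in selected if s != c], selected)
                      for c in selected}
            weakest = min(scores, key=scores.get)
            if scores[weakest] < f_out:
                selected.remove(weakest)
                changed = True
        if not changed:
            break
    return selected


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 1.5 * X[:, 0] - 2.0 * X[:, 3] + 0.3 * rng.normal(size=200)
sel = stepwise_select(X, y)
print(sel)
```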
5. Selection of Predictors
For downscaling a predictand, the selection of appropriate predictors is one of the most important steps. Various authors have used large-scale atmospheric variables as predictors for downscaling GCM output to mean monthly precipitation over a catchment [8,10,23]. These include air temperature (at the 925, 500 and 200 mb pressure levels), geopotential height (at the 500 and 200 mb pressure levels), and zonal (u) and meridional (v) wind velocities (at the 925 and 200 mb pressure levels).
Predictors have to be selected based both on their relevance to the downscaled predictands and on their ability to be accurately represented by the GCMs. Cross-correlations have been used in predictor selection to examine whether the dependence structure is linear or nonlinear [23,24]. Scatter plots and cross-correlations between each of the predictor variables in the NCEP and GCM datasets are useful to verify whether the predictor variables are realistically simulated by the GCM. The cross-correlations between the predictor variables in the NCEP and GCM datasets (Table 1) are estimated using three measures of dependence, namely the product moment correlation, Spearman’s rank correlation and Kendall’s tau.
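The three measures of dependence can be sketched in plain NumPy. The product moment correlation is the Pearson correlation. Spearman's rank correlation is the Pearson correlation of average ranks. For Kendall's tau, the tau-b variant, which corrects for ties, is assumed here. The text does not state which variant was used, or how NCEP and GCM months were paired. The series below are synthetic stand-ins over an assumed common period.

```python
import numpy as np


def average_ranks(x):
    """Ranks 1..n; tied values receive the average of the ranks they span."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1                              # extend the run of ties
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks


def pearson(a, b):
    """Product moment correlation."""
    return float(np.corrcoef(a, b)[0, 1])


def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def kendall_tau_b(a, b):
    """Kendall's tau-b from all pairs (O(n^2) memory; fine for monthly series)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    iu = np.triu_indices(len(a), k=1)          # each pair (i < j) once
    da = np.sign(a[:, None] - a[None, :])[iu]
    db = np.sign(b[:, None] - b[None, :])[iu]
    s = np.sum(da * db)                        # concordant minus discordant
    return float(s / np.sqrt(np.count_nonzero(da) * np.count_nonzero(db)))


# Synthetic stand-ins for one predictor from NCEP and from the GCM (not real data).
rng = np.random.default_rng(5)
ncep = rng.normal(size=240)
gcm = 0.8 * ncep + 0.6 * rng.normal(size=240)
print({"P": pearson(ncep, gcm), "S": spearman(ncep, gcm), "K": kendall_tau_b(ncep, gcm)})
```

A monotone but nonlinear relation gives rank correlations of exactly 1 while the product moment correlation stays below 1. This is why comparing the three measures can hint at nonlinearity in the dependence structure.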
6. Development of Downscaling Models
For downscaling precipitation, the probable predictor variables selected to develop the models are considered at each of the nine grid points surrounding and within the study region. In this study, various linear regression approaches are used to downscale mean monthly precipitation. The potential predictor data are first standardized. Standardization is widely used before statistical downscaling to reduce bias (if any) in the mean and variance of the GCM predictors relative to the NCEP reanalysis data [24]. Standardization is done for a baseline period of 1948 to 2000. This period is long enough to establish a reliable climatology, yet neither so long nor so contemporary that it includes a strong global change signal [24].
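Standardization against a baseline can be sketched as follows. The code assumes a common reading of [24]: each dataset is converted to z-scores using the mean and standard deviation of its own 1948–2000 baseline. For a GCM series, this means the GCM's own statistics are applied to the whole record, including the future period. The function names and the synthetic warming series are hypothetical.

```python
import numpy as np


def baseline_stats(values, years, start=1948, end=2000):
    """Mean and standard deviation of `values` over the baseline years."""
    mask = (years >= start) & (years <= end)
    return values[mask].mean(), values[mask].std(ddof=1)


def standardize(values, mean, sd):
    """z-scores relative to a baseline mean and standard deviation."""
    return (values - mean) / sd


# Synthetic monthly GCM series, 1948-2100, with a warming trend (not real data).
years = np.repeat(np.arange(1948, 2101), 12)
rng = np.random.default_rng(2)
gcm = 290.0 + 0.02 * (years - 1948) + rng.normal(0.0, 1.5, size=years.size)

mu, sd = baseline_stats(gcm, years)        # GCM's own 1948-2000 statistics
z = standardize(gcm, mu, sd)
print(z[years <= 2000].mean(), z[years <= 2000].std(ddof=1))
```

Over the baseline the standardized series has zero mean and unit variance by construction. Any trend in later years then appears as a departure from that baseline climatology.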
A feature vector (standardized predictor) is formed for each month of the record from the standardized NCEP predictor variables. The feature vector is the input to the linear regression models, and the contemporaneous
Table 1. Cross-correlation computed between probable predictors in NCEP and GCM datasets.
     Ta925   Ua925   Va925   Va200   Ta200   Zg200   Ua200   Ta500   Zg500
P    0.83    0.79    0.67    -0.18   0.66    0.81    0.23    0.81    0.60
S    0.68    0.56    0.43    -0.14   0.46    0.64    0.57    0.64    0.39
K    0.87    0.76    0.61    -0.20   0.68    0.85    0.73    0.85    0.59
Here P, S and K represent product moment correlation, Spearman’s rank correlation and Kendall’s tau respectively.
value of the predictand is the output. To develop the downscaling models, the feature vectors prepared from the NCEP record are partitioned into a training set and a validation set. Feature vectors in the training set are used to calibrate the models, and those in the validation set are used for validation. The 11-year series of observed mean monthly precipitation was divided into a calibration period (1990 to 1995) and a validation period (1996 to 2000). Four models, M1, M2, M3 and M4, were developed for the predictand (precipitation) using stepwise, forward, backward and direct regression, respectively. Various error criteria are used as indices of model performance. The models for mean monthly precipitation were evaluated on the accuracy of their predictions for the validation data set.
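The calibration/validation split can be sketched on a monthly time axis. The 63-column feature matrix here is a placeholder of zeros, not real data. The resulting 72 calibration months and 60 validation months agree with Table 2: there, the ratio SSE/MSE is about 72 for the training columns and 60 for the validation columns.

```python
import numpy as np

# Monthly time axis of the observed record: January 1990 to December 2000.
years = np.repeat(np.arange(1990, 2001), 12)   # 11 years x 12 months = 132 rows
months = np.tile(np.arange(1, 13), 11)

calib = years <= 1995                            # 1990-1995: calibration
valid = years >= 1996                            # 1996-2000: validation

# Hypothetical feature matrix (one standardized 63-element vector per month).
features = np.zeros((years.size, 63))
X_train, X_valid = features[calib], features[valid]

print(int(calib.sum()), int(valid.sum()))  # 72 60
```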
7. Results and Discussions
Downscaling models were developed following the methodology discussed in the preceding section. The results are presented and discussed in this section.
7.1. Potential Predictor Selection
The most relevant probable predictor variables for developing the downscaling models are identified using the three measures of dependence, following the procedure described in Section 5. Table 1 shows the cross-correlations, which allow the reliability of the GCM simulations of the predictor variables to be verified. In general, most of the predictor variables are realistically simulated by the GCM. Air temperature at 925 mb (Ta925) is the most realistically simulated variable, with a CC greater than 0.8. Meridional wind at 200 mb (Va200) is the least correlated variable between the NCEP and GCM datasets (CC = -0.17). Table 1 shows that the following variables are better correlated than meridional wind at 200 mb (Va200) and zonal wind at 200 mb (Ua200):
- air temperature at 925 mb (Ta925), 500 mb (Ta500) and 200 mb (Ta200);
- meridional wind at 925 mb (Va925);
- zonal wind at 925 mb (Ua925);
- geopotential height at 200 mb (Zg200) and 500 mb (Zg500).
7.2. Downscaling and Performance of GCM Models
Seven predictor variables are used as the standardized potential predictors: air temperature at 925 mb, 500 mb and 200 mb; zonal wind (925 mb); meridional wind (925 mb); and geopotential height at 500 mb and 200 mb. They are taken at the 9 NCEP grid points, giving a dimensionality of 7 × 9 = 63. These feature vectors are provided as input to the various regression downscaling models. The results of the different regression models (M1 to M4) described in the previous section are tabulated in Table 2. Some of the precipitation values simulated by these models were negative. Since negative precipitation on a basin is physically impossible, these values were set to zero before the various error statistics were computed.
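The assembly of the 63-element feature vectors and the zero-clipping of negative predictions can be sketched as follows. The variable order (variable-major, nine grid points per variable) is an arbitrary assumption, as are the helper names and the synthetic data. The fit is a plain direct OLS regression with all 63 predictors.

```python
import numpy as np

# Order of the seven potential predictors (labels follow Table 1).
VARIABLES = ["Ta925", "Ta500", "Ta200", "Ua925", "Va925", "Zg500", "Zg200"]
N_GRID = 9                                      # 3 x 3 NCEP grid points


def build_feature_vectors(fields):
    """(n_months, 7, 9) standardized fields -> (n_months, 63) feature vectors.

    Flattening is variable-major: the first 9 entries are Ta925 at the nine
    grid points, the next 9 are Ta500, and so on (an arbitrary but fixed order).
    """
    n_months, n_vars, n_grid = fields.shape
    assert (n_vars, n_grid) == (len(VARIABLES), N_GRID)
    return fields.reshape(n_months, n_vars * n_grid)


def fit_and_predict(X_train, y_train, X_new):
    """Direct (all-predictor) OLS fit; negative predictions are set to zero."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef = np.linalg.lstsq(A, y_train, rcond=None)[0]
    pred = np.column_stack([np.ones(len(X_new)), X_new]) @ coef
    return np.clip(pred, 0.0, None)             # no negative precipitation


rng = np.random.default_rng(3)
fields = rng.normal(size=(132, 7, 9))           # synthetic standardized data
X = build_feature_vectors(fields)
rain = np.clip(50.0 + 40.0 * X[:, 0] + rng.normal(0, 10, 132), 0.0, None)  # synthetic
pred = fit_and_predict(X[:72], rain[:72], X[72:])
print(X.shape, pred.min() >= 0.0)
```

If all 63 predictors enter a direct model calibrated on 72 months, one coefficient per predictor plus an intercept leaves only 8 residual degrees of freedom.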
For the predictand precipitation, the coefficient of correlation
Table 2. Various performance statistics of models using various regression approaches.
Model   CC (T)  CC (V)  SSE (T)     SSE (V)     MSE (T)   MSE (V)   RMSE (T)  RMSE (V)
M1      0.90    0.79    111573.52   125884.77   1549.63   2098.08   39.37     45.80
M2      0.91    0.79    111304.52   125884.77   1545.90   2098.08   39.32     45.80
M3      0.94    0.65    73875.77    182400.92   1026.05   3040.02   32.03     55.14
M4      0.95    0.60    55529.22    204162.48   771.24    3402.71   27.77     58.33

Model   NMSE (T)  NMSE (V)  N-S Index (T)  N-S Index (V)  MAE (T)  MAE (V)
M1      0.19      0.46      0.81           0.53           0.63     0.37
M2      0.19      0.46      0.81           0.53           0.63     0.37
M3      0.13      0.67      0.87           0.32           0.70     0.25
M4      0.09      0.75      0.90           0.24           0.72     0.23
Here CC, SSE, MSE, RMSE, NMSE, N-S Index and MAE indicate the coefficient of correlation, sum of squared errors, mean square error, root mean square error, normalized mean square error, Nash–Sutcliffe efficiency index and mean absolute error, respectively; (T) and (V) denote the training and validation data sets.
(CC) was in the range of 0.60-0.95 for the regression-based models (M1 to M4) over the training and validation sets. Over the same sets, the RMSE was in the range of 27.77-58.33, the N-S Index in the range of 0.24-0.90 and the MAE in the range of 0.23-0.72. Table 2 shows that, on the training data set, the direct regression model clearly outperforms the forward, backward and stepwise regression models. On the validation data set, the stepwise and forward regression models clearly outperform the backward and direct regression models. The results of forward and stepwise regression are very similar, and identical on the validation set. It can be inferred that model M4, using direct regression, performed best for the predictand precipitation.
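The paper does not give formulas for its error statistics, so the sketch below uses the conventional definitions, which are an assumption. MSE = SSE/n, RMSE = √MSE, NMSE = MSE divided by the variance of the observations, and N-S = 1 − SSE/Σ(obs − mean)². Under these definitions N-S = 1 − NMSE exactly. The N-S Index values in Table 2 are approximately 1 − NMSE (e.g. 0.81 and 0.19 for M1 training). The sample values are hypothetical.

```python
import numpy as np


def performance(obs, sim):
    """Conventional goodness-of-fit statistics (assumed definitions)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    err = sim - obs
    sse = float(err @ err)                          # sum of squared errors
    mse = sse / len(obs)
    ss_obs = float(np.sum((obs - obs.mean()) ** 2))
    return {
        "CC": float(np.corrcoef(obs, sim)[0, 1]),
        "SSE": sse,
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "NMSE": mse / (ss_obs / len(obs)),          # MSE over variance of obs
        "NS": 1.0 - sse / ss_obs,                   # Nash-Sutcliffe efficiency
        "MAE": float(np.mean(np.abs(err))),
    }


obs = np.array([12.0, 0.0, 3.5, 150.2, 210.7, 95.4])   # hypothetical monthly values (mm)
sim = np.clip(np.array([20.1, -4.0, 8.0, 120.9, 180.3, 110.0]), 0.0, None)
print(performance(obs, sim))
```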
Figure 2 compares the observed mean monthly precipitation with precipitation simulated using the direct regression model M4 for the calibration and validation periods. The calibration period is from 1990 to 1995, and the rest is the validation period.
Once the downscaling models have been calibrated and validated, the next step is to use them to downscale the control scenario simulated by the GCM. The GCM simulations are run through the calibrated and validated direct regression model M4 to obtain future simulations of the predictand. The predictand patterns are analyzed with box plots for 20-year time slices. Figure 3 presents typical results for the downscaled predictand. In part (a) of Figure 3, box plots compare the precipitation downscaled using the NCEP and GCM datasets with the observed precipitation for the study region. Parts (b), (c), (d) and (e) show the projected precipitation for the four scenarios A1B, A2, B1 and COMMIT, respectively, for 2001–2020, 2021–2040, 2041–2060, 2061–2080 and 2081–2100.
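The 20-year time-slice analysis can be sketched as grouping a downscaled monthly series into five slices and computing each slice's five-number summary. The five-number summary (minimum, quartiles, maximum) is assumed here. Plotting libraries often draw whiskers at 1.5 × IQR instead, so the box plots in Figure 3 may use a different whisker convention. The gamma-distributed series is synthetic and stands in for downscaled output.

```python
import numpy as np


def slice_summaries(years, values, start=2001, end=2100, width=20):
    """Five-number summary (min, Q1, median, Q3, max) per `width`-year slice."""
    summaries = {}
    for s in range(start, end + 1, width):
        e = min(s + width - 1, end)
        v = values[(years >= s) & (years <= e)]
        q1, med, q3 = np.percentile(v, [25, 50, 75])
        summaries[f"{s}-{e}"] = (float(v.min()), float(q1), float(med),
                                 float(q3), float(v.max()))
    return summaries


years = np.repeat(np.arange(2001, 2101), 12)
rng = np.random.default_rng(4)
precip = rng.gamma(shape=0.8, scale=60.0, size=years.size)  # synthetic monthly series (mm)
for label, stats in slice_summaries(years, precip).items():
    print(label, np.round(stats, 1))
```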
The box plots of the downscaled predictand (Figure 3) show that precipitation is projected to increase in the future for the A1B, A2 and B1 scenarios. The projected increase is high for the A1B and A2 scenarios, whereas it is least for the B1 scenario. This is likely because, among the scenarios considered, A1B and A2 have the highest atmospheric carbon dioxide (CO2) concentrations, 720 ppm and 850 ppm respectively, against 550 ppm for B1 and 370 ppm for COMMIT. A rise in the atmospheric concentration of CO2 causes the earth’s average temperature to increase, which in turn increases evaporation, especially at lower latitudes. The evaporated water eventually precipitates [10,25]. The COMMIT scenario holds atmospheric greenhouse gas concentrations at year-2000 levels, and no significant trend in the projected future precipitation could be discerned for it. Overall, the results suggest that the projections obtained for precipitation are robust.
8. Conclusions
This paper investigates the applicability of various linear regression approaches, namely direct, forward, backward and stepwise regression, to downscale precipitation from
Figure 2. Typical results for comparison of the monthly observed precipitation with precipitation simulated using the direct regression downscaling model M4 for NCEP data. In the figure, the calibration period is from 1990 to 1995, and the rest is the validation period.
Figure 3. Box plots of results from the direct regression-based downscaling model M4 for the predictand precipitation, panels (a) to (e).
GCM output to the local scale. The effectiveness of these models is demonstrated through an application to a lake catchment in an arid region of India. The predictand is downscaled from simulations of CGCM3 for four IPCC scenarios, namely SRES A1B, A2, B1 and COMMIT. Four regression models are developed, and their performance is evaluated using the statistical measures CC, SSE, MSE, RMSE, NMSE, η (the Nash–Sutcliffe index) and MAE.
The overall conclusions of this evaluation study are as
follows:
1) Overall, direct regression performed best, followed by backward regression. Backward regression was in turn followed by forward and stepwise regression, which yielded similar results.
2) Direct regression yielded better results for the training data set, while forward regression performed better for the validation data set.
3) The results of the downscaling models show that precipitation is projected to increase in the future for the A2 and A1B scenarios. The projected increase is least for the B1 scenario, and no clear trend is discerned for the COMMIT scenario.
REFERENCES
[1] A. Robock, R. P. Turco, M. A. Harwell, T. P. Ackerman,
R. Andressen, H-S Chang and M. V. K. Sivakumar, “Use
of General Circulation Model Output in the Creation of
Climate Change Scenarios for Impact Analysis,” Climatic
Change, Vol. 23, No. 4, 1993, pp. 293-335.
[2] F. Giorgi and L. O. Mearns, “Approaches to the Simulation of Regional Climate Change: A Review,” Reviews of Geophysics, Vol. 29, No. 2, 1991, pp. 191-216.
[3] S. Maxime, G. Hartmut, R. Lars, K. Nicole and O. Ricardo, “Statistical Downscaling of Precipitation and Temperature in North-Central Chile: An Assessment of Possible Climate Change Impacts in an Arid Andean Watershed,” Hydrological Sciences Journal, Vol. 55, No. 1, 2010, pp. 41-57.
[4] R. L. Wilby, C. W. Dawson and E. M. Barrow, “SDSM–
A Decision Support Tool for the Assessment of Climate
Change Impacts,” Environmental Modelling & Software,
Vol. 17, No. 2, 2002, pp. 147-159.
[5] E. P. Salathe, “Comparison of Various Precipitation
Downscaling Methods for the Simulation of Streamflow
in a Rainshadow River Basin,” International Journal of
Climatology, Vol. 23, No. 8, 2003, pp. 887-901.
[6] M. K. Kim, I. S. Kang, C. K. Park and K. M. Kim, “Super
Ensemble Prediction of Regional Precipitation over Ko-
rea,” International Journal of Climatology, Vol. 24, No.6,
2004, pp. 777-790.
[7] F. Wetterhall, S. Halldin and C. Y. Xu, “Statistical Precipitation Downscaling in Central Sweden with the Analogue Method,” Journal of Hydrology, Vol. 306, No. 1-4, 2005, pp. 136-174.
[8] S. Tripathi, V. V. Srinivas and R. S. Nanjundiah, “Downscaling of Precipitation for Climate Change Scenarios: A Support Vector Machine Approach,” Journal of Hydrology, Vol. 330, No. 3-4, 2006, pp. 621-640.
[9] R. E. Benestad, “A Comparison between Two Empirical
Downscaling Strategies,” International Journal of Cli-
matology, Vol. 21, No. 13, 2001, pp. 1645-1668.
[10] A. Anandhi, V. V. Srinivas, R. S. Nanjundiah and D. N.
Kumar, “Downscaling Precipitation to River Basin for
IPCC SRES Scenarios Using Support Vector Machines,”
International Journal of Climatology, Vol. 28, 2008, pp.
401-420.
[11] S. Zekai, “Precipitation Downscaling in Climate Model-
ling Using a Spatial Dependence Function,” International
Journal of Global Warming, Vol. 1, No. 1-3, pp. 29-42.
[12] S. D. Khobragade, “Studies on Evaporation from Open
Water Surfaces in Tropical Climate,” PhD Dissertation,
Indian Institute of Technology, Roorkee, India, 2009.
[13] H. Linz, I. Shiklomanov and K. Mostefakara, “Chapter 4
Hydrology and Water Likely Impact of Climate Change
IPCC WGII Report WMO/UNEP Geneva,” 1990.
[14] C. R. Jessie, R. M. Antonio and S. P. Stahis, “Climate
Variability, Climate Change and Social Vulnerability in
the Semi-arid Tropics,” Cambridge University Press,
Cambridge, 1996.
[15] E. Kalnay, et al., “The NCEP/NCAR 40-Year Reanalysis
Project,” Bulletin of the American Meteorological Society,
Vol. 77, No. 3, 1996, pp. 437-471.
[16] C. J. Willmott, C. M. Rowe and W. D. Philpot,
“Small-scale Climate Map: A Sensitivity Analysis of
Some Common Assumptions Associated with the
Grid-Point Interpolation and Contouring,” American Car-
tographer, Vol. 12, No. 2, 1985, pp. 5-16.
[17] D. A. Shannon and B. C. Hewitson, “Cross-scale Rela-
tionships Regarding Local Temperature Inversions at
Cape Town and Global Climate Change Implications,”
South African Journal of Science, Vol. 92, No. 4, 1996,
pp. 213-216.
[18] R. G. Crane and B. C. Hewitson, “Doubled CO2 Precipi-
tation Changes for the Susquehanna Basin: Down-Scaling
from the Genesis General Circulation Model,” Interna-
tional Journal of Climatology, Vol. 18, No. 1, 1998, pp.
65-76.
[19] J. Neter, M. Kutner, C. Nachtsheim and W. Wasserman,
“Applied Linear Statistical Models,” McGraw-Hill Com-
panies, Inc., New York, 1996.
[20] A. C. Rencher, “Methods of Multivariate Analysis,” John
Wiley & Sons Inc., New York, 1995.
[21] Novell Courseware Server, Acadia University, http://plato.acadiau.ca/courses/psyc/mcleod/2023Research/Multipl3-Regression-types.html
[22] A. A. Al-Subaihi, “Variable Selection in Multivariable
Regression Using SAS/IML,” Journal of Statistical Soft-
ware, Vol. 7, No. 12, 2002, pp. 1-20.
[23] Y. B. Dibike and P. Coulibaly, “Temporal Neural Networks for Downscaling Climate Variability and Extremes,” Neural Networks, Vol. 19, No. 2, 2006, pp. 135-144.
[24] R. L. Wilby, S. P. Charles, E. Zorita, B. Timbal, P. Whet-
ton and L. O. Mearns, “The Guidelines for Use of Climate
Scenarios Developed from Statistical Downscaling Meth-
ods,” p. 27, 2004. http://ipcc-ddc.cru.uea.ac.uk
[25] M. K. Goyal and C. S. P. Ojha, “Robust Weighted Regression as a Downscaling Tool in Temperature Projections,” International Journal of Global Warming, 2010. http://www.inderscience.com/browse/index.php?journalID=331&action=coming