On the Distributional Forecasting of UK Economic Growth with Generalised Additive Models for Location Scale and Shape (GAMLSS)
1. Introduction
Accurately forecasting economic growth is essential for businesses, researchers, and governments to make informed decisions and develop effective strategies. Traditional forecasting methods usually focus on point predictions, that is, single estimates of future economic growth. However, these techniques often overlook the inherent uncertainty and complexity of economic data. To address this limitation, distributional forecasting has gained prominence by offering a comprehensive picture of potential outcomes, projecting the entire distribution of future values rather than a single-point estimate. Prior to the popularity of Generalised Additive Models for Location Scale and Shape (GAMLSS) developed by [1], various methods were used for forecasting UK economic growth. Conventional time series models such as Autoregressive Integrated Moving Average (ARIMA), Vector Autoregression (VAR), Autoregressive Distributed Lags (ARDL) and Error Correction Models (ECM) were used in previous studies to capture autocorrelation and interdependencies and to forecast future economic growth, but they produced low forecast accuracy. Quantile regression provides insight into conditional quantiles but lacks comprehensive distributional modelling. Bayesian techniques offer a probabilistic basis but require complex calculations. Machine learning techniques such as random forests and gradient boosting capture non-linear relationships but focus mainly on point predictions. Although Generalised Linear Models (GLMs) extend linear regression to different distributions, GAMLSS has proved superior in modelling higher-order moments and other distributional properties.
This paper investigates the distributional forecasting of economic growth in the United Kingdom using GAMLSS. This flexible statistical framework enhances conventional GLMs by allowing all three groups of distribution parameters (location, scale, and shape) to be modelled as functions of explanatory variables. This capability facilitates the identification of intricate connections and non-linear effects in the data, leading to more precise and comprehensive projections of economic growth distributions. The accurate forecasting of economic growth is a cornerstone of effective policymaking, business strategy, and financial planning. Recently, there has been a growing recognition of the limitations of traditional linear models in capturing the complex dynamics of economic indicators. This has spurred interest in more flexible modelling frameworks that can accommodate non-linear relationships, heteroscedasticity, and non-normality. One such advanced approach is GAMLSS, which offers a comprehensive framework for distributional forecasting. This paper explores the application of GAMLSS to the distributional forecasting of UK economic growth, highlighting how it outperforms conventional methods.
Conventional studies rely primarily on models such as ARIMA, VAR, ARDL, and ECM, which focus on point estimates and linear relationships. While these models have been instrumental in the development of econometrics, they often fall short in accounting for the intricacies inherent in economic data. For instance, economic growth data frequently exhibit volatility clustering, skewness and kurtosis, which are characteristics that linear models struggle to address adequately. GAMLSS goes beyond mean estimation to model the entire distribution of the response variable, offering a more nuanced understanding of economic phenomena. GAMLSS models are built on the foundation of Generalised Additive Models (GAMs) but extend them by allowing the parameters of the distribution (such as location, scale, and shape) to be modelled as smooth functions of covariates. This flexibility is beneficial for economic data, which may exhibit non-linear trends and heterogeneity across different time periods and economic conditions. By modelling these parameters as smooth functions, GAMLSS can capture the underlying distributional changes over time, providing more informative forecasts.
The complex and dynamic nature of the UK economy presents a compelling case study for the application of GAMLSS. In the past few decades, the UK economy has undergone significant structural changes, including shifts in industrial composition, globalisation effects, and policy reforms. These changes have introduced non-linearities and varying degrees of volatility in patterns of economic growth. Traditional linear models often fail to capture these dynamics adequately, bringing about suboptimal forecasting performance. GAMLSS, with its ability to model non-linear relationships and changing distributions, offers a promising alternative. The main advantage of GAMLSS over conventional methods is that it can incorporate a wide range of distributions, including those that can explicitly model skewness and kurtosis. This is very relevant for economic growth data that often deviates from normality. For instance, during periods of economic recession or boom, growth rates can become highly skewed, and the variance may increase or decrease significantly. GAMLSS allows for these distributional characteristics to be directly modelled to improve forecast accuracy and provide more reliable predictive outcomes. This feature is essential for policymakers and analysts who rely on accurate risk assessments and scenario analysis.
GAMLSS can accommodate a variety of covariates as smooth functions, including macroeconomic indicators, financial variables, and policy measures. This flexibility enables the model to capture the complex interactions between different economic drivers and their impact on growth. For instance, the effect of interest rates on economic growth may not be linear and could vary depending on the current state of the economy. GAMLSS can provide a more detailed and accurate depiction of how different factors influence economic growth by modelling these effects as smooth functions. In addition to its flexibility and comprehensive distributional modelling capabilities, GAMLSS also offers robust diagnostic tools for model evaluation and validation. These tools help assess the goodness-of-fit, ensure that the model assumptions are met, and identify model misspecifications. This rigorous diagnostic process is crucial for building reliable forecasting models and enhancing their credibility in practical applications.
By employing GAMLSS for distributional forecasting, this study aims to contribute to the growing body of research that seeks to increase the accuracy and reliability of economic growth forecasts. The output of this study will strengthen decision-making processes and provide a greater understanding of the uncertainty associated with economic growth forecasting. The GAMLSS framework represents a significant advancement in the field of economic forecasting. Its ability to model the entire distribution of economic growth, rather than just the mean, provides a deeper and more accurate understanding of economic dynamics. By applying GAMLSS to UK economic growth data, this paper aims to demonstrate the practical advantages of this approach and encourage its broader adoption in economic forecasting and policy analysis. Thus, the insights from this application have the potential to enhance decision-making processes and contribute to more effective economic management.
The remainder of this paper is structured as follows: Section 2 gives the literature review; Section 3 describes the research methodology; Section 4 presents the data analysis and discussion; and Section 5 concludes the paper with a summary of the main points and suggestions for future research directions.
2. Literature Review
Economic performance is a major indicator of good or bad governance, and every governing party is seriously concerned about the stability, or otherwise, of its policies to promote economic growth and development. The UK economic growth trajectory was unstable from 2019 to 2023. Both the Office for Budget Responsibility and the Office for National Statistics have reported that the UK annual economic growth rate was 1.64% for 2019 (a 0.24 percentage-point increase from 2018), −10.36% for 2020 (a 12 percentage-point decline from 2019), 8.67% for 2021 (a 19.03 percentage-point increase from 2020) and 4.35% for 2022 (a 4.33 percentage-point decline from 2021). The aftermath of these alternating increases and declines was a technical recession in the third and fourth quarters of 2023, as confirmed by the British Chambers of Commerce. However, the UK economy has been projected to grow by 0.5%, 0.7% and 1.0% in 2024, 2025 and 2026, respectively.
This economic projection has prompted researchers to assess its accuracy and, furthermore, to predict the sectors in which this growth may be achieved. An accurate statistical economic forecast helps the government to plan effectively and take decisions that enhance growth on a sectoral basis for the overall benefit of the UK economy, and helps companies to make risk-aware investment decisions. Forecasting economic growth is significant for any country and plays a key role in planning and policy-making [2]. Economic forecasting involves the use of various economic indicators to predict future economic growth and conditions [3]. Scholars have applied different methods to analyse historical data on relevant variables from previous reports and surveys to show relationships and their overall impact on economic growth in the UK and across the globe [4]-[6]. Many studies have used the Gross Domestic Product (GDP) as a measure or benchmark for economic growth [7]-[10]. Economic growth forecasting is simply predicting a country’s GDP growth over time [9].
Companies predict future sales volume based on the GDP growth rate and the overall performance of the country’s economy [11]. Also, governments use economic forecast information to plan policy-making processes [12]. For example, government entities monitor the GDP growth rate before formulating fiscal and monetary policies [13]. They enact expansionary fiscal and monetary policies to boost aggregate demand during periods of low economic growth or recession [14]. The overall idea is to help increase consumer spending on products and services and to increase the amount of money in circulation to boost economic activities [15]. On the other hand, contractionary fiscal and monetary policies are put in place to cut government spending and raise taxes during economic booms [16]. Before Brexit and the COVID-19 pandemic, several research studies were undertaken to forecast UK economic growth [17]-[19]. However, the UK economy after Brexit and the COVID-19 pandemic has been very dynamic, uncertain, and highly unpredictable [20].
The emergence of the COVID-19 pandemic and its convergence with Brexit resulted in a negative economic outlook for the UK economy [21]. During the pandemic, the UK economy declined sharply by 18.9 percent in one month, the most significant monthly fall in UK GDP on record [22] [23]. Several sectors such as transport, tourism, and entertainment were severely affected. The economic uncertainties resulting from the combined effect of Brexit and COVID-19 created fear among UK investors, which resulted in a decline in consumer confidence and the depreciation of the British pound. The disruptions caused by Brexit and the COVID-19 pandemic have also reduced the accuracy of earlier economic predictions [24]. Hence, there is a need for research using emerging models to accurately forecast the UK’s future economic growth out-of-sample.
The UK economy was gradually recovering, with a GDP growth rate of 8.7 percent in 2021 and 4.3 percent in 2022 [22] [23]. However, the current inflation in energy and food prices resulted in negative GDP growth in the last two quarters of 2023 [22]. The UK economy experienced a recession between late 2023 and early 2024; however, the pound has stabilised and is anticipated to remain stable in the coming months [25]. The impact of Brexit and COVID-19 has influenced government decisions and policies aimed at reorganising and improving the UK economy so that it can absorb shocks from macroeconomic indicators such as stock price volatility and inflation, which is expected to positively affect the economy in the coming years [24]. However, these positive expectations about the UK economy call for empirical evaluation and validation. Studies that have focused on the current and future situation of the UK economy remain scarce; hence, there is a need for more current studies to evaluate and accurately predict the UK’s future economy.
However, the majority of the parametric models used for forecasting economic conditions depend mainly on fitting data that are normally distributed, with a pre-specified linear relationship between the independent variables and the response variable [26]. Scholars argue that the normality and linearity assumptions of these models fail to capture the true underlying relationship between the dependent variable and the explanatory variables [27].
The low precision level of parametric models in forecasting the UK economy has triggered a new wave of research works using semiparametric and nonparametric models to improve precision rates [28]. Some semiparametric models depend on machine learning algorithms that relax the normal distribution assumptions for explanatory variables and the linear relationships assumptions [1]. They utilise an algorithm to identify the function that gives a better explanation of relationships [26]. Based on the limitations of the normality and linearity assumptions, scholars are now focusing more on the use of machine learning models such as neural networks, GAM, random forest regressions, support vector regressions and the GAMLSS to explore the vast availability of econometric data that are known to be very complex in explaining relationships [29].
While several studies have applied machine learning methods to evaluate effect relationships in different research areas [30]-[34], research in macroeconomics using machine learning methods is rare. Some scholars applied machine learning techniques to predict GDP growth and reported a significant improvement in precision compared to basic statistical methods [35]. In the same vein, [28] investigated the GDP growth rate of different countries using deep learning techniques and found a higher precision rate compared to previous studies that applied parametric methods. Another study by [26] explains that during economic recessions, non-linear machine learning models forecast economic conditions better than linear models. This argument was supported by a study that compared the predictive performance of linear and non-linear models using macroeconomic variables to forecast UK economic growth and confirmed that, during recessions, machine learning models perform better than linear models. Additionally, some scholars have advocated the adoption of GAMLSS for economic forecasts due to its perceived accuracy and reliability. Some researchers have applied GAMLSS as a forecasting model in different settings, including the short-term forecasting of electricity price volatility [36]-[38], business sales revenue [39], centile estimation [40], and rainfall predictions [41].
GAMLSS is a general framework for fitting regression-type models in which the distribution of the response variable does not have to belong to the exponential family and may be a highly skewed or kurtotic continuous or discrete distribution [27]. GAMLSS is a semi-parametric model that allows the fitting of different distributional shapes (normal, skewed, kurtotic) and relationships (linear and non-linear). There are benefits associated with applying GAMLSS when compared to the other models mentioned earlier. For instance, GAMLSS offers more general distribution functions that allow normal, skewed and kurtotic distributions. Also, the dependent variable in GAMLSS is not restricted to follow the Gaussian distribution. Again, modelling in GAMLSS is not limited to the mean (i.e. the location parameter), but also extends to the dispersion, skewness, and kurtosis (the scale and shape parameters) of responses that are not normally distributed [27]. However, very few researchers have applied GAMLSS in macroeconomic research. Therefore, the objective of this study is to explore GAMLSS as a distributional regression with sufficient model complexity to adequately fit the data, predict the distribution of UK economic growth and forecast future economic growth out-of-sample.
3. Methodology
The research involves a critical examination of the Generalised Additive Model for Location, Scale and Shape (GAMLSS) as a flexible distributional regression for forecasting economic growth in-sample and out-of-sample, compared with the conventional Autoregressive Distributed Lag (ARDL) model and Error Correction Model (ECM). As the empirical literature has documented the usefulness of ARDL and ECM in forecasting most economic and financial time series variables, including economic growth, it is statistically and economically meaningful to compare the various GAMLSS models with the ARDL and ECM in this paper. This is quantitative research involving a dataset obtained from the Office for National Statistics, covering 105 monthly observations of major economic indicators in the UK from January 2015 to September 2023. It consists of eleven variables: economic growth (Econ), consumer price index (CPI), inflation (Infl), manufacturing (Manuf), electricity and gas (ElGas), construction (Const), industries (Ind), wholesale and retail (WRet), real estate (REst), education (Edu) and health (Health). The response variable is Econ, while CPI, Infl, Manuf, ElGas, Const, Ind, WRet, REst, Edu and Health are the explanatory variables. Thus, Econ is related to the explanatory variables in the form:
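$$\text{Econ} = f(\text{CPI}, \text{Infl}, \text{Manuf}, \text{ElGas}, \text{Const}, \text{Ind}, \text{WRet}, \text{REst}, \text{Edu}, \text{Health}) \quad (1)$$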
R software version 4.4.1 was used for the computations and graphics in this paper.
3.1. Augmented Dickey-Fuller Test
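The Augmented Dickey-Fuller (ADF) test is employed to investigate whether each time series variable has a unit root, i.e., whether it is non-stationary. Given a time series variable $y_t$, the ADF test is based on the model:
$$\Delta y_t = \alpha + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \varepsilon_t \quad (2)$$
where $\alpha$ is the intercept; $\gamma = \rho - 1$, with $\rho$ the autoregressive coefficient of $y_t$ on $y_{t-1}$; $\gamma$ is the coefficient of $y_{t-1}$; $\Delta y_t$ is the first difference of $y_t$; $\delta_i$ are the coefficients of the $p$ lagged differences; and $\varepsilon_t$ is the error term. The null hypothesis $H_0: \gamma = 0$ against the alternative hypothesis $H_1: \gamma < 0$ is tested. Rejection of $H_0$ implies that the time series is stationary, otherwise it is non-stationary.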
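As a minimal sketch of how the ADF regression can be estimated, the Python code below fits the test regression by least squares and returns the t-ratio on the coefficient of the lagged level. It assumes an intercept-only specification with `p` lagged differences and relies on NumPy; the function name `adf_statistic` and the simulated series are illustrative and are not taken from the paper, whose computations were carried out in R. The resulting statistic has to be compared with Dickey-Fuller critical values, not with the usual t tables.

```python
import numpy as np

def adf_statistic(y, p=1):
    """t-ratio on gamma in: dy_t = alpha + gamma*y_{t-1} + sum_i delta_i*dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                            # dy[j] = y[j+1] - y[j]
    target = dy[p:]                            # left-hand side: dy_t
    cols = [np.ones(len(target)), y[p:-1]]     # intercept and lagged level y_{t-1}
    for i in range(1, p + 1):
        cols.append(dy[p - i:len(dy) - i])     # lagged difference dy_{t-i}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    dof = len(target) - X.shape[1]
    sigma2 = resid @ resid / dof               # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # OLS covariance matrix
    return coef[1] / np.sqrt(cov[1, 1])

# Illustrative data (assumption): white noise is stationary, a random walk is not.
rng = np.random.default_rng(0)
noise = rng.normal(size=300)
walk = np.cumsum(rng.normal(size=300))
stat_noise = adf_statistic(noise, p=1)
stat_walk = adf_statistic(walk, p=1)
```

A strongly negative statistic, as for the white-noise series, is evidence against a unit root; the random walk yields a statistic much closer to zero.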
3.2. Autoregressive Distributed Lag Model
The autoregressive distributed lag (ARDL) model uses the lags of the response variable together with the lags of the explanatory variables as predictors for forecasting. Let $y_t$ be the response variable and let $x_t = (x_{1,t}, \ldots, x_{k,t})$ be a k-dimensional set of explanatory variables, such that $(y_t, x_t)$ has a stationary distribution. An ARDL model with $p$ lags in the response variable and $q$ lags of $k$ additional explanatory variables used as predictors will take the form:
$$y_t = \alpha + \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{j=1}^{k} \sum_{l=0}^{q} \beta_{j,l} x_{j,t-l} + \varepsilon_t \quad (3)$$
where $\alpha$ is the intercept; $\phi_i$ and $\beta_{j,l}$ are the coefficients; $\varepsilon_t$ is the error term with conditional zero mean given all explanatory variables and their lags [42]. The parameters of the ARDL model are estimated by ordinary least squares (OLS). If the estimated parameters are unbiased and consistent, then a one-step-ahead out-of-sample forecast $\hat{y}_{T+1}$ can be obtained by rolling the window. Assuming a forecast horizon $h$, then the out-of-sample forecasts $\hat{y}_{T+h}$ for various estimation windows are obtained.
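The rolling-window scheme described above can be sketched as follows. This is an illustrative Python example under several assumptions: a single explanatory variable, a fixed window length, and a contemporaneous regressor that is treated as observed at forecast time (a pseudo out-of-sample exercise). The helper names `ardl_design` and `rolling_one_step` and the simulated data are hypothetical and do not reproduce the paper's R implementation.

```python
import numpy as np

def ardl_design(y, x, p, q):
    """Design matrix with rows [1, y_{t-1..t-p}, x_t, x_{t-1}, ..., x_{t-q}] and target y_t."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    n, m = len(y), max(p, q)
    cols = [np.ones(n - m)]
    cols += [y[m - i:n - i] for i in range(1, p + 1)]   # lags of the response
    cols += [x[m - l:n - l] for l in range(0, q + 1)]   # current and lagged x
    return np.column_stack(cols), y[m:]

def rolling_one_step(y, x, p, q, window):
    """Re-estimate by OLS on each window and forecast the next observation."""
    X, target = ardl_design(y, x, p, q)
    preds, actual = [], []
    for start in range(len(target) - window):
        Xw, yw = X[start:start + window], target[start:start + window]
        coef, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
        preds.append(X[start + window] @ coef)          # one-step-ahead forecast
        actual.append(target[start + window])
    return np.array(preds), np.array(actual)

# Simulated ARDL(1, 1) data (assumption, for illustration only).
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 + 0.4 * y[t - 1] + 0.8 * x[t] - 0.3 * x[t - 1] + 0.1 * rng.normal()

X_full, y_full = ardl_design(y, x, p=1, q=1)
beta_hat = np.linalg.lstsq(X_full, y_full, rcond=None)[0]   # [alpha, phi_1, beta_0, beta_1]
preds, actual = rolling_one_step(y, x, p=1, q=1, window=100)
mae = float(np.mean(np.abs(actual - preds)))
```

Each pass drops the oldest observation and adds the newest one, so the number of forecasts equals the number of usable observations minus the window length.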
3.3. Error Correction Model
Augmented Dickey-Fuller (ADF) unit root tests on the time series variables are likely to reveal a combination of I(1) and I(0) orders of integration. [43] suggests a bounds testing approach to deal with this issue. In this paper, an unrestricted error correction model (ECM) is proposed to deal with any problem arising from a combination of I(1) and I(0) variables in the model. The unrestricted ECM with different lag lengths $(p, q_1, \ldots, q_k)$ will take the form:
$$\Delta y_t = \alpha + \sum_{i=1}^{p} \psi_i \Delta y_{t-i} + \sum_{j=1}^{k} \sum_{l=0}^{q_j} \omega_{j,l} \Delta x_{j,t-l} + \lambda_0 y_{t-1} + \sum_{j=1}^{k} \lambda_j x_{j,t-1} + \kappa' D_t + \varepsilon_t \quad (4)$$
where $\alpha$ is the intercept; $\Delta$ is the first difference operator; $\psi_i$ and $\omega_{j,l}$ are short term dynamic coefficients of the lagged variables; $\lambda_0, \lambda_1, \ldots, \lambda_k$ are the long run multipliers; $\kappa$ is the coefficient of the dummy variable; $D_t$ is a vector of dummy variables; $\varepsilon_t$ is independent and identically distributed white noise with zero mean, homoscedastic and with no autocorrelation. The unknown parameters of the model are estimated by ordinary least squares (OLS), involving $(p+1)^k$ regressions to be estimated to obtain the optimal lag lengths in the model. Assuming a forecast horizon $h$, then the out-of-sample forecasts for various estimation windows are obtained.
3.4. Granger Causality Test
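The Granger [44] causality test assumes that if two variables are uncorrelated, then they are independent, i.e., one does not affect the other. If the lagged values of $x_t$ are statistically significant in, or improve, the prediction of future values of $y_t$, then $x_t$ Granger-causes $y_t$. The Granger causality model is defined as follows:
$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \sum_{j=1}^{q} b_j x_{t-j} + u_t \quad (5)$$
or
$$x_t = c_0 + \sum_{i=1}^{p} c_i x_{t-i} + \sum_{j=1}^{q} d_j y_{t-j} + v_t \quad (6)$$
The null hypothesis $H_0: b_1 = \cdots = b_q = 0$ in (5), or $H_0: d_1 = \cdots = d_q = 0$ in (6), against the alternative hypothesis $H_1$ that at least one of these coefficients is non-zero, is tested using the F-statistic [45]. The null hypothesis is that $x_t$ does not Granger-cause $y_t$, or that $y_t$ does not Granger-cause $x_t$.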
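The F-test behind the Granger causality comparison can be sketched as a comparison of a restricted regression (own lags only) with an unrestricted one (own lags plus lags of the other variable). The Python code below is a hedged illustration using NumPy; `lagmat`, `granger_f` and the simulated series are hypothetical names and data, and the sketch returns only the F-statistic, which would then be compared with an F critical value with `q` and `dof` degrees of freedom.

```python
import numpy as np

def lagmat(v, lags, m):
    """Columns v_{t-1}, ..., v_{t-lags} for t = m, ..., n-1."""
    n = len(v)
    return np.column_stack([v[m - i:n - i] for i in range(1, lags + 1)])

def granger_f(y, x, p, q):
    """F-statistic for H0: the q lags of x add nothing to predicting y beyond p lags of y."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    n, m = len(y), max(p, q)
    target = y[m:]
    X_restricted = np.hstack([np.ones((n - m, 1)), lagmat(y, p, m)])
    X_unrestricted = np.hstack([X_restricted, lagmat(x, q, m)])

    def rss(X):
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coef
        return resid @ resid

    rss_r, rss_u = rss(X_restricted), rss(X_unrestricted)
    dof = len(target) - X_unrestricted.shape[1]
    return ((rss_r - rss_u) / q) / (rss_u / dof)

# Simulated example (assumption): x leads y by one period, but y does not lead x.
rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.3 * y[t - 1] + 0.5 * x[t - 1] + rng.normal()

f_x_to_y = granger_f(y, x, p=2, q=2)   # expected to be large
f_y_to_x = granger_f(x, y, p=2, q=2)   # expected to be small
```

Running the test in both directions, as here, mirrors the two equations of the Granger causality model.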
3.5. Generalised Additive Model for Location, Scale and Shape
The Generalised Additive Model for Location, Scale and Shape (GAMLSS) is a flexible non-parametric or semi-parametric distributional regression developed by [1], in which the location, scale and shape parameters of the response distribution are modelled as linear or smooth functions of the explanatory variables. Let $y$ be the response variable with four distributional parameters $(\mu, \sigma, \nu, \tau)$, and let $g_k(\cdot)$ with $k = 1, 2, 3, 4$ be a known monotonic link function which connects the distributional parameters to $J_k$ explanatory variables or predictors, then:
$$g_1(\mu) = \eta_1 = X_1 \beta_1 + \sum_{j=1}^{J_1} h_{j1}(x_{j1}) \quad (7)$$
$$g_2(\sigma) = \eta_2 = X_2 \beta_2 + \sum_{j=1}^{J_2} h_{j2}(x_{j2}) \quad (8)$$
$$g_3(\nu) = \eta_3 = X_3 \beta_3 + \sum_{j=1}^{J_3} h_{j3}(x_{j3}) \quad (9)$$
$$g_4(\tau) = \eta_4 = X_4 \beta_4 + \sum_{j=1}^{J_4} h_{j4}(x_{j4}) \quad (10)$$
with general form:
$$g_k(\theta_k) = \eta_k = X_k \beta_k + \sum_{j=1}^{J_k} h_{jk}(x_{jk}) \quad (11)$$
where $\theta_k$ and $\eta_k$ are vectors of length $n$; $\beta_k$ is a parameter vector of length $J'_k$; $h_{jk}$ is a smooth non-parametric function of variables $x_{jk}$; $\sum_{j=1}^{J_k} h_{jk}(x_{jk})$ are smoothing additive terms [46]. The implementation of GAMLSS in R is flexible, with many distributions together with smoothing additive terms. In this paper, the Gaussian (NO) distribution is employed with different smoothing additive terms, designated as GAMLSS 1, GAMLSS 2 and GAMLSS 3, respectively. GAMLSS 1 is the GAMLSS model without smoothing additive terms as explanatory variables; GAMLSS 2 is the GAMLSS model which includes the penalised beta spline pb() as smoothing additive terms; and GAMLSS 3 is the GAMLSS model which includes the penalised varying coefficients function pvc() as smoothing additive terms. The parameters of the models can be obtained by maximum likelihood estimation (MLE). A one-step-ahead out-of-sample forecast $\hat{y}_{T+1}$ can be obtained by rolling the window. Assuming a forecast horizon $h$, then the out-of-sample forecasts $\hat{y}_{T+h}$ for various estimation windows will be obtained.
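To illustrate what it means to model more than the mean, the sketch below fits a Gaussian location-scale regression by maximum likelihood, with an identity link for the mean and a log link for the standard deviation. It is only an assumption-laden Python illustration of the parametric special case (linear predictors, no smoothers): it does not use the gamlss R package, its fitting algorithms, or the pb() and pvc() smoothers, and the function names and simulated data are hypothetical. The fit alternates a weighted least squares step for the mean coefficients with a Fisher scoring step for the log-scale coefficients.

```python
import numpy as np

def fit_gaussian_location_scale(y, X, Z, n_iter=200, tol=1e-8):
    """Maximise the Gaussian log-likelihood with mu = X @ beta and log(sigma) = Z @ gamma."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    gamma = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        sigma2 = np.exp(2.0 * (Z @ gamma))
        XtW = X.T * (1.0 / sigma2)                       # weights 1 / sigma_i^2
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)     # weighted least squares for the mean
        r2 = (y - X @ beta_new) ** 2
        score = Z.T @ (r2 / sigma2 - 1.0)                # score with respect to gamma
        gamma_new = gamma + np.linalg.solve(2.0 * Z.T @ Z, score)  # Fisher scoring step
        change = max(np.max(np.abs(beta_new - beta)), np.max(np.abs(gamma_new - gamma)))
        beta, gamma = beta_new, gamma_new
        if change < tol:
            break
    return beta, gamma

def gaussian_loglik(y, mu, sigma):
    return float(np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2))

# Simulated heteroscedastic data (assumption): both mean and spread increase with x.
rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])
ones = np.ones((n, 1))
true_beta, true_gamma = np.array([1.0, 2.0]), np.array([-0.5, 0.8])
y = X @ true_beta + np.exp(X @ true_gamma) * rng.normal(size=n)

beta_ls, gamma_ls = fit_gaussian_location_scale(y, X, X)      # sigma depends on x
beta_c, gamma_c = fit_gaussian_location_scale(y, X, ones)     # constant sigma
aic_ls = 2 * 4 - 2 * gaussian_loglik(y, X @ beta_ls, np.exp(X @ gamma_ls))
aic_c = 2 * 3 - 2 * gaussian_loglik(y, X @ beta_c, np.exp(ones @ gamma_c))
```

On data of this kind the model with a covariate-dependent scale should attain the lower AIC, which is the type of comparison a distributional regression makes possible.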
In this paper, Akaike information criterion (AIC), mean absolute error (MAE) and Diebold-Mariano (DM) test were employed to compare the performance of the various GAMLSS models with the ARDL and unrestricted ECM, both in-sample and out-of-sample.
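The three comparison measures can be written compactly as below. This is a minimal Python sketch with illustrative numbers; it assumes one-step-ahead forecasts and an absolute-error loss for the Diebold-Mariano statistic, so that the variance of the loss differential needs no autocovariance correction, and it uses a normal approximation for the p-value. The function names and the toy forecasts are hypothetical.

```python
import math

def mae(actual, predicted):
    """Mean absolute error of a set of forecasts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def aic(log_likelihood, n_params):
    """Akaike information criterion: smaller values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

def diebold_mariano(actual, pred_a, pred_b):
    """DM statistic and two-sided normal p-value for one-step forecasts, absolute-error loss."""
    d = [abs(a - pa) - abs(a - pb) for a, pa, pb in zip(actual, pred_a, pred_b)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((v - d_bar) ** 2 for v in d) / n    # lag-0 autocovariance of the loss differential
    stat = d_bar / math.sqrt(var_d / n)             # undefined if all differentials are equal
    p_value = math.erfc(abs(stat) / math.sqrt(2))   # equals 2 * (1 - Phi(|stat|))
    return stat, p_value

# Toy forecasts (assumption): model A tracks the actual values more closely than model B.
actual = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.1, 1.9, 3.2, 3.9]
pred_b = [1.5, 2.6, 2.2, 4.7]

mae_a, mae_b = mae(actual, pred_a), mae(actual, pred_b)
dm_stat, dm_p = diebold_mariano(actual, pred_a, pred_b)
```

A negative DM statistic with a small p-value indicates that the first set of forecasts has a significantly lower loss than the second.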
4. Results and Discussion
The dataset was obtained from the Office for National Statistics, covering monthly observations of major economic indicators in the UK from January 2015 to September 2023. It consists of eleven variables: economic growth (Econ), consumer price index (CPI), inflation (Infl), manufacturing (Manuf), electricity and gas (ElGas), construction (Const), industries (Ind), wholesale and retail (WRet), real estate (REst), education (Edu) and health (Health). The study aims to investigate the effectiveness and superiority of the GAMLSS models with smoothing additive terms in a machine learning framework over the conventional time series models (ARDL and ECM) in forecasting UK monthly economic growth out-of-sample using a rolling window. All computations and graphics in this paper were obtained using R software version 4.4.1. Diagnostics were conducted to ensure the validity of the data and the models in this paper. The Johansen cointegration test, based on the trace test and the maximal eigenvalue test, was used to investigate whether the time series variables exhibit a long-term relationship over time. There is no evidence of multiple cointegrating relationships, and the Johansen test could not be relied upon because the variables are not integrated of the same order.
The ADF test was conducted on all time series variables to ascertain the presence or absence of a unit root, and hence whether they are stationary or not. According to the analysis in Table 1, the variables ElGas, WRet, Edu and Health are stationary without differencing. Econ, Manuf, Const, Ind and REst are stationary after differencing their series once, while CPI and Infl are stationary at difference order two. The ADF test thus establishes whether each time series variable satisfies the stationarity (no unit root) assumption with or without differencing. An autocorrelation test reveals that the autocorrelation between observations at different time lags remains constant for each time series variable. As the variables are not integrated of the same order, the autoregressive distributed lag (ARDL) model and the unrestricted error correction model (ECM) were employed. The assumptions pertaining to the ARDL and ECM, which include stationarity, no autocorrelation, no heteroscedasticity, normality, cointegration and dynamics, were all investigated. It is worth noting that the I(2) order of CPI and Infl triggers the deployment of the ECM in the analysis.
Table 1. The Augmented Dickey-Fuller (ADF) test results.
| Variable | I(d) | ADF statistic | p-value | Decision |
|---|---|---|---|---|
| Econ | 1 | −5.9713 | <0.01** | Stationary at I(1) |
| CPI | 2 | −8.0043 | <0.01** | Stationary at I(2) |
| Infl | 2 | −7.6508 | <0.01** | Stationary at I(2) |
| Manuf | 1 | −4.0596 | <0.01** | Stationary at I(1) |
| ElGas | 0 | −4.0527 | <0.01** | Stationary at I(0) |
| Const | 1 | −5.5165 | <0.01** | Stationary at I(1) |
| Ind | 1 | −5.8719 | <0.01** | Stationary at I(1) |
| WRet | 0 | −4.1332 | <0.01** | Stationary at I(0) |
| REst | 1 | −4.5856 | <0.01** | Stationary at I(1) |
| Edu | 0 | −3.4696 | 0.0482* | Stationary at I(0) |
| Health | 0 | −3.7559 | 0.0237* | Stationary at I(0) |
Note: I(d) indicates the number of times a time series variable is differenced to become stationary; “*” and “**” represent the 5% and 1% significance levels, respectively.
Several ARDL models were fitted at distinct lag orders and the model with the best lag order was chosen. In the ARDL results (see Table 2), the history of Econ at lag orders 1 to 4 and the CPI do not provide statistically significant evidence for forecasting future economic growth. The other time series variables are statistically significant at some lag orders. In particular, Infl is significant at lag orders 2 and 3; Manuf is highly significant at lag order 0; ElGas is significant at lag orders 0, 2, 4 and 5; Const is significant at lag orders 0, 1 and 3; Ind is significant at lag order 0; WRet is significant at lag orders 1 and 5; REst is significant at lag orders 3 and 4; Edu is significant at lag orders 2 and 5; and Health is significant at lag order 3. The analysis of the ARDL model reveals that the monthly historical values of economic growth and the consumer price index do not possess significant statistical power in forecasting future monthly economic growth. Meanwhile, the historical values of inflation, manufacturing, electricity and gas, construction, industries, wholesale and retail, real estate, education, and health possess statistically significant power in forecasting future monthly economic growth at appropriate lag orders.
Turning to the unrestricted ECM (see Table 2), Econ is statistically significant at lag order 1; CPI and Infl are statistically insignificant; Manuf is highly significant; the lag of ElGas and its differences at orders 0, 1, 3 and 4 are statistically significant; the lag of Const and its differences at orders 0, 1 and 2 are statistically significant; the differences of Ind at orders 0, 1 and 2 are statistically significant; the difference of WRet at order 4 is statistically significant; the difference of REst at order 3 is statistically significant; the lag of Edu and its differences at orders 1, 3 and 4 are statistically significant; and the differences of Health at orders 1 and 2 are statistically significant.
Unlike the ARDL model, which does not take the differencing of explanatory variables into account, the ECM uses most of these variables in differenced form, and in that form they seem to possess statistically significant power in forecasting future economic growth, as evidenced by their test statistics and p-values. Evidently, the inclusion of error correction terms in the ECM seems to provide more useful statistical information than the ARDL. Thus, these time series variables could be regarded as leading economic indicators in the UK, owing to their influence on UK economic growth.
Table 2. The ARDL and ECM results.
ARDL Results |
ECM Results |
Explanatory Variable |
Coeff. & Std. Error |
Explanatory Variable |
Coeff. & Std. Error |
Intercept |
0.0001378 (0.01913) |
Intercept |
0.0001378 (0.0191276) |
Lag(Econ, 1) |
0.04983 (0.09437) |
Lag(Econ, 1) |
−1.0896530 (0.1401809)*** |
Lag(Econ, 2) |
−0.09891 (0.07688) |
d(Lag(Econ, 1)) |
0.1394799 (0.1007185) |
Lag(Econ, 3) |
0.03848 (0.07420) |
d(Lag(Econ, 2)) |
0.0405703 (0.0811820) |
Lag(Econ, 4) |
−0.07905 (0.07583) |
d(Lag(Econ, 3)) |
0.0790461 (0.0758300) |
CPI |
0.02080 (0.02577) |
CPI |
0.0207996 (0.0257665) |
Infl |
−0.02393 (0.03153) |
Lag(Infl, 1) |
−0.0288039 (0.0331542) |
Lag(Infl, 1) |
0.001030 (0.01756) |
d(Infl, 1) |
−0.0239340 (0.0315261) |
Lag(Infl, 2) |
−0.03455 (0.01714)* |
d(Lag(Infl, 1)) |
0.0151747 (0.0114269) |
Lag(Infl, 3) |
0.04711 (0.01866)* |
d(Lag(Infl, 2)) |
−0.0193715 (0.0124551) |
Lag(Infl, 4) |
−0.02774 (0.01393) |
d(Lag(Infl, 3)) |
0.0277421 (0.0139281) |
Manuf |
0.07565 (0.009926)*** |
Manuf |
0.0756465 (0.0099258)*** |
ElGas |
0.03598 (0.007397)*** |
Lag(ElGas, 1) |
0.0629529 (0.0220472)** |
Lag(ElGas, 1) |
−0.008102 (0.006115) |
d(ElGas) |
0.0359808 (0.0073973)*** |
Lag(ElGas, 2) |
0.02102 (0.008670)* |
d(Lag(ElGas, 1)) |
−0.0350740 (0.0141776)* |
Lag(ElGas, 3) |
0.001019 (0.004577) |
d(Lag(ElGas, 2)) |
−0.0140500 (0.0084812) |
Lag(ElGas, 4) |
0.02277 (0.004838)*** |
d(Lag(ElGas, 3)) |
−0.0130306 (0.0061103)* |
Lag(ElGas, 5) |
−0.009740 (0.004641)* |
d(Lag(ElGas, 4)) |
0.0097396 (0.0046414)* |
Const |
0.07813 (0.008929)*** |
Lag(Const, 1) |
0.0054806 (0.0284925)*** |
Lag(Const, 1) |
−0.03861 (0.01503)* |
d(Const) |
0.0781349 (0.0089293)*** |
Lag(Const, 2) |
0.00005401 (0.01332) |
d(Lag(Const, 1) |
0.0340478 (0.0164448)* |
Lag(Const, 3) |
−0.02105 (0.009583)* |
d(Lag(Const, 2) |
0.0341018 (0.0117403)** |
Lag(Const, 4) |
−0.01306 (0.01049) |
d(Lag(Const, 3) |
0.0130565 (0.0104875) |
Ind |
0.7568 (0.03925)*** |
Lag(Ind, 1) |
0.9448038 (0.1315999) |
Lag(Ind, 1) |
−0.1036 (0.08525) |
d(Ind) |
0.7568079 (0.0392534)*** |
Lag(Ind, 2) |
0.08365 (0.06094) |
d(Lag(Ind, 1)) |
−0.2916016 (0.1163270)* |
Lag(Ind, 3) |
0.1227 (0.08608) |
d(Lag(Ind, 2)) |
−0.2079539 (0.0946743)* |
Lag(Ind, 4) |
0.08520 (0.05012) |
d(Lag(Ind, 3)) |
−0.0852047 (0.0501179) |
WRet |
0.01250 (0.01600) |
Lag(WRet, 1) |
0.0391124 (0.0235487) |
Lag(WRet, 1) |
0.04910 (0.01593)** |
d(Lag(WRet) |
0.0125012 (0.0159998) |
Lag(WRet, 2) |
0.003097 (0.01289) |
d(Lag(WRet, 1)) |
0.0224849 (0.0238150) |
Lag(WRet, 3) |
−0.01640 (0.01748) |
d(Lag(WRet, 2)) |
0.0255815 (0.0239915) |
Lag(WRet, 4) |
0.00875 (0.009806) |
d(Lag(WRet, 3)) |
0.0091855 (0.0132512) |
Lag(WRet, 5) |
−0.01794 (0.005885)** |
d(Lag(WRet, 4)) |
0.0179367 (0.0058848)** |
REst |
−0.01983 (0.04826) |
Lag(REst, 1) |
−0.0338830 (0.0775607) |
Lag(REst, 1) |
0.005177 (0.04166) |
d(REst) |
−0.0198341 (0.0482629) |
Lag(REst, 2) |
−0.0001837 (0.03278) |
d(Lag(REst, 1)) |
0.0192254 (0.0549368) |
Lag(REst, 3) |
−0.1138 (0.04312)* |
d(Lag(REst, 2)) |
0.0190417 (0.0501819) |
Lag(REst, 4) |
0.09473 (0.03436)** |
d(Lag(REst, 3)) |
−0.0947296 (0.0343610)** |
Edu |
0.007176 (0.009406) |
Lag(Edu, 1) |
0.0359072 (0.0160207)*** |
Lag(Edu, 1) |
−0.006223 (0.007609) |
d(Edu) |
0.0071763 (0.0094058) |
Lag(Edu, 2) |
0.02632 (0.009413)** |
d(Lag(Edu, 1)) |
−0.0349541 (0.0107874)** |
Lag(Edu, 3) |
−0.01175 (0.009001) |
d(Lag(Edu, 2)) |
−0.0086385 (0.0103674) |
Lag(Edu, 4) |
0.001720 (0.008780) |
d(Lag(Edu, 3)) |
−0.0203899 (0.0091609)* |
Lag(Edu, 5) |
0.02211 (0.005527)*** |
d(Lag(Edu, 4)) |
−0.0221096 (0.0055268)*** |
Health |
0.001897 (0.004988) |
Lag(Health, 1) |
−0.0102198 (0.0067410) |
Lag(Health, 1) |
0.002939 (0.006325) |
d(Health) |
0.0018966 (0.0049877) |
Lag(Health, 2) |
0.003628 (0.005039) |
d(Lag(Health, 1)) |
0.0150556 (0.0062107)* |
Lag(Health, 3) |
−0.01835 (0.006617)** |
d(Lag(Health, 2)) |
0.0186837 (0.0054551)** |
Lag(Health, 4) |
0.004068 (0.003909) |
d(Lag(Health, 3)) |
0.0003377 (0.0032873) |
Lag(Health, 5) |
−0.004406 (0.002500) |
d(Lag(Health, 4)) |
0.0044056 (0.0025000) |
Note: Numbers in parentheses are standard errors; “*”, “**” and “***” denote significance at the 5%, 1% and 0.1% levels, respectively.
Granger causality rests on the argument that future economic growth may be forecast better by combining the history of another time series variable with the history of economic growth than by using the history of economic growth alone. This argument was tested statistically, and the results are reported in Table 3. The findings show that CPI, Infl, Manuf and Edu do not Granger-cause Econ at the 5% significance level. That is, the data provide no statistically significant evidence that the history of any of these variables, combined with the history of economic growth, forecasts future economic growth better than the history of economic growth alone. By contrast, there is statistically significant evidence that ElGas, Const, Ind, WRet, REst and Health Granger-cause Econ. Thus, the history of each of these variables, combined with the history of economic growth, forecasts future economic growth significantly better than the history of economic growth alone. Note that the outcome of a Granger causality test depends on the lag order used. In this paper, each Granger causality model was tuned separately, and the optimal lag order was chosen for each case.
Table 3. The Granger causality test results.
Causality Model | Optimal Lag Length | F-statistic | p-value
Econ with CPI | 1 | 1.5397 | 0.2175
Econ with Infl | 1 | 1.6419 | 0.2030
Econ with Manuf | 1 | 1.6766 | 0.1983
Econ with ElGas | 1 | 24.463 | <0.001***
Econ with Const | 1 | 4.9195 | 0.0288*
Econ with Ind | 4 | 3.2762 | 0.01472*
Econ with WRet | 3 | 7.2533 | 0.0001964***
Econ with REst | 2 | 3.3743 | 0.03827*
Econ with Edu | 5 | 2.1788 | 0.06351
Econ with Health | 1 | 19.272 | <0.001***
Note: “*”, “**” and “***” denote significance at the 5%, 1% and 0.1% levels, respectively.
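The logic of the test can be sketched as a comparison of two regressions: a restricted one that uses only lags of the target series and an unrestricted one that adds lags of the candidate series. The Python sketch below computes the resulting F statistic on simulated data. The function names `ols_rss` and `granger_f` and the simulated series are illustrative assumptions, not the software or data used in the paper, and the p-value step is omitted.

```python
import random

def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit (normal equations)."""
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def granger_f(y, x, p):
    """F statistic for H0: lags 1..p of x add nothing beyond lags 1..p of y."""
    n = len(y)
    target = y[p:]
    restricted = [[1.0] + [y[t - j] for j in range(1, p + 1)]
                  for t in range(p, n)]
    unrestricted = [row + [x[t - j] for j in range(1, p + 1)]
                    for row, t in zip(restricted, range(p, n))]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    dof = len(target) - (2 * p + 1)
    return ((rss_r - rss_u) / p) / (rss_u / dof)

# Simulated example (not the paper's data): x leads y by one period.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.0]
for t in range(1, 200):
    y.append(0.3 * y[t - 1] + 0.8 * x[t - 1] + random.gauss(0, 0.5))

f_xy = granger_f(y, x, p=1)  # does x help forecast y?
f_yx = granger_f(x, y, p=1)  # does y help forecast x?
print(f_xy, f_yx)
```

In this simulation the statistic for x forecasting y should be far larger than the one for the reverse direction, mirroring the contrast between the significant and insignificant rows of Table 3.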
The GAMLSS, a flexible semi-parametric or non-parametric distributional regression in a machine learning framework, was employed to model and forecast both future economic growth and its future distribution. Three GAMLSS models were fitted: GAMLSS 1, which includes no smoothing additive terms; GAMLSS 2, which includes penalised B-splines, pb(), as smoothing additive terms; and GAMLSS 3, which includes penalised varying coefficients, pvc(), as smoothing additive terms. Model selection was based on the Akaike Information Criterion (AIC), with the lowest AIC indicating the best model. According to the in-sample results in Table 5, GAMLSS 2 has the lowest AIC (232.9040) and was therefore chosen. The summary statistics for GAMLSS 2 are displayed in Table 4. Unlike the ARDL and ECM models, GAMLSS 2 produced a statistically significant intercept. The terms pb(CPI), pb(Infl) and pb(Health) are statistically insignificant, whereas pb(Manuf), pb(ElGas), pb(Const), pb(Ind), pb(WRet), pb(REst) and pb(Edu) are all statistically significant. In the GAMLSS models, the histories of the other time series variables serve as explanatory variables, and Econ one step ahead serves as the response variable. In the GAMLSS models with flexible smoothing additive terms, all explanatory variables enter with the same lag order of 1, unlike in the conventional ARDL and ECM time series models. Dropping the insignificant variables would yield a more parsimonious model. The results in Table 4 indicate that the pb() terms for manufacturing, electricity and gas, construction, industries, wholesale and retail, real estate and education are statistically useful in predicting future economic growth both in-sample and out-of-sample.
Unlike related analyses in the existing literature, which forecast future economic growth alone, the GAMLSS in this paper forecasts both future economic growth and its future distribution in the long run. Model diagnostics were conducted on each GAMLSS model to check compliance with the underlying assumptions (see Figures 1-3). In particular, the normal density curves, Q-Q plots and worm plots indicate that the quantile residuals are approximately normally distributed. The diagnostics thus support the validity and robustness of the GAMLSS forecasting models.
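To illustrate what a distributional forecast adds to a point forecast, the sketch below turns a forecast location and scale into forecast quantiles and a probability of negative growth. It assumes a normal response distribution, which this section does not state, and the values of `mu` and `sigma` are hypothetical placeholders rather than results from the paper.

```python
from statistics import NormalDist

def forecast_distribution(mu, sigma, levels=(0.05, 0.25, 0.5, 0.75, 0.95)):
    """Quantiles of an assumed normal forecast distribution N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return {level: dist.inv_cdf(level) for level in levels}

# Hypothetical one-step-ahead parameter forecasts (not taken from the paper).
mu, sigma = 0.4, 0.6

quantiles = forecast_distribution(mu, sigma)
prob_negative_growth = NormalDist(mu, sigma).cdf(0.0)

for level, value in quantiles.items():
    print(f"{level:.2f} quantile: {value:.3f}")
print(f"P(growth < 0) = {prob_negative_growth:.3f}")
```

A point forecast would report only `mu`; the quantiles and the tail probability are what the scale (and, for non-normal families, shape) parameters make available.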
Table 4. The summary results for the best GAMLSS Model.
Explanatory Variable | Coeff. (Std. Error)
Intercept | −0.027999 (0.011600)*
pb(CPI) | 0.003031 (0.019517)
pb(Infl) | −0.002420 (0.023883)
pb(Manuf) | 0.063743 (0.004116)***
pb(ElGas) | 0.016935 (0.001618)***
pb(Const) | 0.055364 (0.003132)***
pb(Ind) | 0.754518 (0.014350)***
pb(WRet) | 0.012404 (0.002280)***
pb(REst) | 0.077845 (0.016430)***
pb(Edu) | 0.011246 (0.003437)**
pb(Health) | −0.002073 (0.001842)
Summary of the quantile residuals: mean = −1.543809e−15; variance = 1.009615; coef. of skewness = 0.2068231; coef. of kurtosis = 2.896212; Filliben correlation coefficient = 0.9940068
Note: Numbers in parentheses are standard errors; “*”, “**” and “***” denote significance at the 5%, 1% and 0.1% levels, respectively.
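The quantile-residual summary in Table 4 can be read against the values expected under normality: mean 0, variance 1, skewness 0, kurtosis 3 and a Filliben correlation close to 1. The sketch below computes these quantities for simulated residuals. The Filliben coefficient is taken here as the correlation between the ordered residuals and Filliben's approximate normal order-statistic medians; whether the fitting software uses exactly this formula is an assumption, and the function name `residual_summary` is illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def residual_summary(res):
    """Moments and Filliben correlation for a list of residuals."""
    n = len(res)
    mean = fmean(res)
    dev = [r - mean for r in res]
    m2, m3, m4 = (sum(d ** k for d in dev) / n for k in (2, 3, 4))
    # Filliben's approximate medians of the uniform order statistics.
    p = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    p[-1] = 0.5 ** (1 / n)
    p[0] = 1 - p[-1]
    z = [NormalDist().inv_cdf(q) for q in p]
    s = sorted(res)
    z_bar, s_bar = fmean(z), fmean(s)
    cov = sum((a - z_bar) * (b - s_bar) for a, b in zip(z, s))
    norm = sqrt(sum((a - z_bar) ** 2 for a in z)
                * sum((b - s_bar) ** 2 for b in s))
    return {
        "mean": mean,
        "variance": m2 * n / (n - 1),   # sample variance
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,       # equals 3 for a normal distribution
        "filliben": cov / norm,
    }

# Simulated standard normal residuals (not the paper's residuals).
random.seed(1)
summary = residual_summary([random.gauss(0, 1) for _ in range(500)])
print(summary)
```

On this reading, the reported skewness of about 0.21, kurtosis of about 2.90 and Filliben coefficient of about 0.994 are all close to their normal benchmarks.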
Figure 1. (a) Normality plot for GAMLSS without smoothing additive terms [GAMLSS 1]; (b) Worm plot for GAMLSS without smoothing additive terms [GAMLSS 1].
Figure 2. (a) Normality plot for GAMLSS with pb() smoothing additive terms [GAMLSS 2]; (b) Worm plot for GAMLSS with pb() smoothing additive terms [GAMLSS 2].
Figure 3. (a) Normality plot for GAMLSS with pvc() smoothing additive terms [GAMLSS 3]; (b) Worm plot for GAMLSS with pvc() smoothing additive terms [GAMLSS 3].
The analysis was split into in-sample and out-of-sample parts. The in-sample estimation window covers monthly observations from January 2015 to December 2019 (see Table 5). Two rolling windows were used for the out-of-sample forecasts, and an expanding window was also employed to double-check the rolling-window forecasts. The out-of-sample forecasting windows cover monthly observations from January 2020 to December 2021 (window 1) and from July 2021 to September 2023 (window 2). The AIC and the mean absolute error (MAE) were the in-sample performance evaluation metrics. In-sample, the ECM has a lower AIC and a lower MAE than the ARDL. Out-of-sample, the ECM yields a smaller MAE than the ARDL in both forecasting windows. The Diebold-Mariano test indicates that the ECM forecasts are significantly more accurate than the ARDL forecasts at the 5% significance level in window 2 (p = 0.04132), but not in window 1 (p = 0.05203) (see Table 5). Nevertheless, the ECM generally outperformed the ARDL both in-sample and out-of-sample on the evaluation metrics, which suggests that the error correction term in the ECM improves its predictive performance.
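The difference between the rolling and expanding schemes, and the MAE used to score them, can be sketched as follows. The window width, the toy series and the placeholder forecaster `mean_forecaster` are assumptions made for illustration; the paper does not report its window widths here, and its forecasts come from the ARDL, ECM and GAMLSS models rather than from a sample mean.

```python
def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def one_step_forecasts(series, first_test, width, fit_predict, expanding=False):
    """One-step-ahead forecasts for series[first_test:].

    Rolling scheme: train on the last `width` observations before t.
    Expanding scheme: train on every observation before t.
    """
    forecasts = []
    for t in range(first_test, len(series)):
        start = 0 if expanding else t - width
        forecasts.append(fit_predict(series[start:t]))
    return forecasts

def mean_forecaster(train):
    """Placeholder model: forecast the mean of the training window."""
    return sum(train) / len(train)

# Toy series with a repeating pattern 0, 1, 2, 3, 4 (not the paper's data).
series = [float(i % 5) for i in range(40)]
actual = series[30:]

rolling = one_step_forecasts(series, 30, 10, mean_forecaster)
expanding = one_step_forecasts(series, 30, 10, mean_forecaster, expanding=True)

print("rolling MAE:  ", mae(actual, rolling))
print("expanding MAE:", mae(actual, expanding))
```

Any model can be substituted for `mean_forecaster` as long as it maps a training window to a single one-step-ahead forecast.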
Table 5. The in-sample and out-of-sample performance evaluation.
Model | In-Sample Window | In-Sample AIC | In-Sample MAE | Out-of-Sample Window | Out-of-Sample DM test | Out-of-Sample MAE
ARDL | Jan. 2015 to Dec. 2019 | 299.1194 | 1.1141 | Jan. 2020 to Dec. 2021 | - | 1.9462
ECM | Jan. 2015 to Dec. 2019 | 272.3514 | 0.9654 | Jan. 2020 to Dec. 2021 | 0.05203 | 1.8773
ARDL | - | - | - | Jul. 2021 to Sept. 2023 | - | 1.3214
ECM | - | - | - | Jul. 2021 to Sept. 2023 | 0.04132* | 1.0978
The GAMLSS
GAMLSS 1 | Jan. 2015 to Dec. 2019 | 267.3639 | 1.2220 | Jan. 2020 to Dec. 2021 | 0.03054* | 1.2435
GAMLSS 2 | Jan. 2015 to Dec. 2019 | 232.9040 | 0.7243 | Jan. 2020 to Dec. 2021 | 0.02791* | 0.4183
GAMLSS 3 | Jan. 2015 to Dec. 2019 | 256.1926 | 0.5967 | Jan. 2020 to Dec. 2021 | 0.02639* | 0.5211
GAMLSS 1 | - | - | - | Jul. 2021 to Sept. 2023 | 0.05465 | 1.0773
GAMLSS 2 | - | - | - | Jul. 2021 to Sept. 2023 | 0.01414* | 0.3542
GAMLSS 3 | - | - | - | Jul. 2021 to Sept. 2023 | 0.03024* | 0.4461
Note: The window is the estimation period (one window in-sample, two windows out-of-sample); GAMLSS 1 is the GAMLSS model without smoothing additive terms, GAMLSS 2 is the GAMLSS model with penalised B-splines, pb(), as smoothing additive terms, and GAMLSS 3 is the GAMLSS model with penalised varying coefficients, pvc(), as smoothing additive terms; the DM values are the p-values of the Diebold-Mariano tests; “*” indicates significance at the 5% level.
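The Diebold-Mariano comparison behind the DM column can be sketched as a test on the mean difference between the losses of two forecast series. The version below uses absolute-error loss, to match the MAE metric, and a normal approximation for a two-sided p-value. Both choices are assumptions, since the loss function, the sidedness of the test and any small-sample correction used in the paper are not stated here, and the data are simulated.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def diebold_mariano(actual, f1, f2, h=1):
    """DM statistic and two-sided p-value under absolute-error loss.

    A positive statistic means forecast f1 has the larger average loss,
    so forecast f2 is the more accurate of the two.
    """
    d = [abs(a - p1) - abs(a - p2) for a, p1, p2 in zip(actual, f1, f2)]
    n = len(d)
    d_bar = fmean(d)

    def gamma(k):
        # Sample autocovariance of the loss differential at lag k.
        return sum((d[t] - d_bar) * (d[t - k] - d_bar)
                   for t in range(k, n)) / n

    # Long-run variance using autocovariances up to lag h - 1.
    lrv = gamma(0) + 2 * sum(gamma(k) for k in range(1, h))
    stat = d_bar / sqrt(lrv / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p_value

# Simulated example (not the paper's forecasts): f2 is the more accurate.
random.seed(2)
actual = [random.gauss(0, 1) for _ in range(100)]
f1 = [a + random.gauss(0, 2.0) for a in actual]
f2 = [a + random.gauss(0, 0.5) for a in actual]

stat, p_value = diebold_mariano(actual, f1, f2)
print(stat, p_value)
```

A p-value below 0.05, as in the starred entries of Table 5, indicates that the difference in forecast accuracy is unlikely to be due to chance alone.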
Turning to GAMLSS 1, 2 and 3, all three models have a smaller AIC than the ARDL and ECM models in-sample. GAMLSS 2 and 3 also have a smaller in-sample MAE than the ARDL and ECM models, whereas GAMLSS 1 (MAE 1.2220) has a higher in-sample MAE than both the ARDL (1.1141) and the ECM (0.9654) according to Table 5. Out-of-sample, the three GAMLSS models outperformed the ARDL and ECM forecasting models in both windows, as judged by their MAEs. The Diebold-Mariano test was then employed to compare the forecast accuracy of the ECM with that of each GAMLSS forecasting model. In window 1, each of the three GAMLSS forecasting models is significantly more accurate than the best ECM forecasting model. In window 2, GAMLSS 2 and 3 significantly outperform the best ECM forecasting model, whereas GAMLSS 1 does not (see Table 5). Overall, the three GAMLSS forecasting models outperformed both the ARDL and ECM forecasting models out-of-sample in both forecasting windows, with GAMLSS 2 and 3 superior to GAMLSS 1. Thus, the introduction of penalised smoothing additive terms in the GAMLSS appears to improve the predictive performance of the resulting models. The paper has therefore shown the superiority of the GAMLSS models, as flexible distributional regression with penalised smoothing additive terms, over the conventional techniques used in forecasting future economic growth, thereby contributing to the empirical literature. Unlike other techniques, including the machine learning techniques used in previous studies, the GAMLSS in this paper forecasts both future monthly economic growth and the future distribution of economic growth.
The output of this paper thus adds to the empirical literature and provides government bodies with meaningful economic information on reasonable economic adjustments and decisive steps towards achieving sustainable economic growth in the long run. It should also be useful to future researchers in related disciplines who wish to extend the study.
5. Conclusions
The study assesses the effectiveness of GAMLSS, as distributional regression in a machine learning framework with smoothing additive terms, against some conventional econometric models in forecasting the UK’s monthly economic growth in-sample and out-of-sample using a rolling window. In particular, the ARDL and ECM were compared with the flexible GAMLSS models. The study revealed that the GAMLSS models deliver better forecast accuracy than the ARDL and ECM models. The research confirmed the limitations of traditional economic indicators such as the consumer price index and inflation in predicting economic growth when using ARDL and ECM models. Instead, it identified critical sectors such as manufacturing, electricity and gas, construction, industries, wholesale and retail, real estate, education, and health as key drivers of economic growth.
Notably, the GAMLSS models, particularly those incorporating smoothing additive terms, deliver better forecast accuracy than the conventional econometric models, as judged by their MAEs and DM tests. Model selection by AIC confirmed that the GAMLSS models are preferable to the ARDL and ECM. The ability of the GAMLSS model to adjust for various distributions of economic data offers a robust alternative to traditional models. Granger causality tests further underscored the importance of the identified sectors, revealing significant causal relationships with economic growth.
Unlike other models used in the literature, the GAMLSS models were able to forecast both future economic growth and the future distribution of that growth, thereby supporting long-run consistency over time and contributing to the empirical literature. It is worth noting that the inclusion of smoothing additive terms, such as penalised B-splines and penalised varying coefficients, improves the predictive performance of the GAMLSS. Because other machine learning techniques focus mainly on forecasting future economic growth alone, the distribution of future economic growth, together with checks on the underlying assumptions, may be lacking. In this respect, the GAMLSS models are more reliable than other machine learning techniques: they forecast both future economic growth and its future distribution, which allows the assumptions of the forecasting models to be checked throughout the out-of-sample periods.
In light of these findings, the following recommendations are proposed for future research and policy-making:
1) Development of Early Warning Systems: Utilising the key indicators identified, there is a strong case for developing early warning systems that can detect potential economic downturns or instability. Such systems would enable timely interventions, helping to mitigate the impact of adverse economic conditions.
2) Policy Focus on Key Economic Sectors: Policymakers should prioritise the sectors identified as significant predictors of economic growth, namely manufacturing, energy, construction, industries, wholesale and retail, real estate, education, and health. Investments and supportive policies in these areas could substantially enhance economic resilience and spur growth.
3) Enhancement of GAMLSS Models: Given the better forecast accuracy of the GAMLSS models, it is recommended that future research focus on further optimising these models. This could involve experimenting with different combinations of smoothing terms and distributions to maximise forecasting accuracy across various economic scenarios.
4) Integration of Additional Variables: To improve the robustness of forecasting models, future studies should incorporate additional variables, particularly those reflecting external shocks or global events, such as Brexit or pandemics. This will help create a more comprehensive model that accounts for a broader range of economic influences.
5) Validation through Longitudinal Studies: It is recommended that longitudinal studies be conducted to validate the predictive models over more extended periods and under varying economic conditions. This will ensure the reliability and adaptability of the models in forecasting future economic trends.
In conclusion, this paper contributes valuable insights to the field of economic forecasting by demonstrating the potential and superiority of GAMLSS models over conventional time series econometric models. Stakeholders, especially decision-making bodies and future researchers, are therefore encouraged to implement these recommendations to achieve more accurate economic growth forecasts and to develop more effective strategies for fostering sustainable economic growth in the UK. However, the findings in this paper are based on a dataset of variables relating to UK economic growth. Future researchers can extend the technique to forecast the economic growth of countries beyond the UK.