Univariate Time-Series Analysis of Second-Hand Car Importation in Zambia

Stanley Jere; Bornwell Kasense; Bwalya Bupe Bwalya

doi:10.4236/ojs.2017.74050

Open Journal of Statistics > Vol.7 No.4, August 2017

Department of Mathematics and Statistics, Mulungushi University, Kabwe, Zambia.

Abstract

Zambia depends largely on the international second-hand car (SHC) market for its motor vehicle supply. The importation of second-hand cars into Zambia presents a time series problem. The data used in this paper are monthly figures on SHC importation from 1 January 2014 to 31 December 2016. The data were analyzed using Exponential Smoothing (ES) and Autoregressive Integrated Moving Average (ARIMA) models; the ES models considered were Simple (SES), Double (DES) and Triple (TES) Exponential Smoothing. The results showed that ARIMA (2, 1, 2) was the best fit for SHC importation, since its errors were smaller than those of the SES, DES and TES models. The four error measures used were root-mean-square error (RMSE), mean absolute error (MAE), mean percentage error (MPE) and mean absolute percentage error (MAPE). Forecasts were also produced using the ARIMA (2, 1, 2) model for the 18 months from January 2017. Although there was a percentage increase of 90.6% from November 2015 to December 2016, SHC importation in Zambia has generally been decreasing, with a percentage change of 59.5% from January 2014 to December 2016. The forecasts also show a gradual percentage decrease of 1.12% by June 2018. These results should be useful to policy and decision makers in Government departments such as the Zambia Revenue Authority (ZRA) and the Road Development Agency (RDA) as they plan and execute their duties.

Keywords

Zambia, Importation, Second Hand Car, Exponential Smoothing Models, ARIMA Models, Forecasting

Share and Cite:

Jere, S. , Kasense, B. and Bwalya, B. (2017) Univariate Time-Series Analysis of Second-Hand Car Importation in Zambia. Open Journal of Statistics, 7, 718-730. doi: 10.4236/ojs.2017.74050.

1. Introduction

Over the last two decades private vehicle ownership in developing countries has increased at an unprecedented pace. Between 1990 and 2005 the total number of registered vehicles in developing countries rose from 110 million to 210 million, and by some estimates it is forecast to reach 1.2 billion by 2030 [1] . Rising incomes explain a large share of this growth; as people get richer, they can afford the personal mobility that an automobile confers. High-income countries export large numbers of second-hand vehicles to low-income countries, and this trade will probably grow [2] . Zambia, being one of the developing countries, has experienced strong economic growth over the last decade and the country’s growth outlook is also positive [3] . The sustained positive growth of the Zambian economy has resulted in many shifts in the consumption patterns of Zambian households. The economic reality of Zambia is that the majority of the population is middle class and hence consists of middle-income earners. This economic reality has forced Zambians to depend on the second-hand market for their motor vehicle supply. This is supported by [3] , which noted that consumers with less purchasing power are more likely to be able to afford second-hand motor vehicles. In addition, car purchasing in most developing countries has been high owing to a rapid increase in ICT usage (Internet and mobile penetration), rising GDP and an emerging middle-class society [4] .

The importation of second-hand cars into Zambia presents a time series problem. There are several time series techniques, but this study concentrates only on the Exponential Smoothing (ES) and Autoregressive Integrated Moving Average (ARIMA) models. The smoothing techniques discussed in [5] include: Simple Exponential Smoothing (SES), which is applied when the pattern is nearly horizontal; Double Exponential Smoothing (DES), also known as Holt’s Exponential Smoothing (HES), which is applied when the data show a trend; and Holt-Winters (HW), also known as Triple Exponential Smoothing (TES), which is applied when the data show both trend and seasonality. The best fit of the three ES models, chosen according to whether the data exhibit a level and/or trend and/or seasonality, will be compared with the ARIMA model. The ARIMA model is another method used to model and forecast time series data. ARIMA modelling is also known as the “Box-Jenkins” approach, following the work of Box and Jenkins [6] . This paper therefore focuses on a major class of decision-making tools, univariate time-series models.

2. Methodology

Figure 1 gives the flowchart of the methodology.

Two main classes of models are considered in this paper: the Exponential Smoothing (ES) and Autoregressive Integrated Moving Average (ARIMA) models. The first class involves the SES, DES and TES models. The three models will be analysed and the best-fit model will be chosen depending on whether the data exhibit a level and/or trend and/or seasonality. The second class involves the ARIMA models, with the following model-building process: tentative identification of a model, estimation of the parameters of the identified model, and diagnostic checks. The best fits from the two classes will finally be compared to choose the model for forecasting (Figure 1).

Figure 1. Model identification procedure.

2.1. Exponential Smoothing (ES) Models

2.1.1. Simple Exponential Smoothing (SES)

The SES is applied when the data pattern is nearly horizontal, that is, when the previous data show no particular trend or seasonal variation. For the series $ϕ_{1}, ϕ_{2}, \dots, ϕ_{t}$ the forecast for the next value $ϕ_{t + 1}$ , say ${\bar{ϕ}}_{t + 1}$ , is based on the weights $α$ and $1 - α$ applied to the most recent observation $ϕ_{t}$ and the most recent forecast ${\bar{ϕ}}_{t}$ respectively, where $α$ is the smoothing constant, $ϕ_{t}$ is the actual value for period t and ${\bar{ϕ}}_{t}$ is the forecast value for period t. The model is of the form

${\bar{ϕ}}_{t + 1} = {\bar{ϕ}}_{t} + α (ϕ_{t} - {\bar{ϕ}}_{t}), \quad 0 < α < 1 \text{ and } t > 0.$ (1)

The value of $α$ is chosen subjectively: a value close to zero smooths out unwanted cyclical and irregular components, while a value close to one, which makes the forecast respond quickly to the most recent observation, is used for forecasting.
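As an illustration of Equation (1), the recursion can be sketched in a few lines of Python. This is a minimal sketch and not the R routine used in the paper; the function name `simple_exponential_smoothing` and the choice of the first observation as the default starting forecast are assumptions made here for illustration.

```python
def simple_exponential_smoothing(series, alpha, initial_forecast=None):
    """Apply Equation (1) recursively.

    Returns a list whose element t is the forecast made for series[t];
    the final extra element is the forecast for the next, unobserved period.
    The default starting forecast (the first observation) is an assumption.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    forecast = series[0] if initial_forecast is None else initial_forecast
    forecasts = [forecast]
    for observed in series:
        # new forecast = old forecast + alpha * (observation - old forecast)
        forecast = forecast + alpha * (observed - forecast)
        forecasts.append(forecast)
    return forecasts
```

For example, `simple_exponential_smoothing(monthly_imports, alpha=0.5)[-1]` would give the forecast for the month following the last observation, where `monthly_imports` is a hypothetical list of monthly counts.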

2.1.2. Double Exponential Smoothing (DES)

This technique is used when the data exhibit a trend, that is, when the time series can be described using an additive model with an increasing or decreasing trend and no seasonality. The model is

${\bar{ϕ}}_{t} = α ϕ_{t} + (1 - α) ({\bar{ϕ}}_{t - 1} + β_{t - 1}), 0 < α < 1,$ (2)

$β_{t} = θ ({\bar{ϕ}}_{t} - {\bar{ϕ}}_{t - 1}) + (1 - θ) β_{t - 1}, 0 < θ < 1,$ (3)

${\hat{ϕ}}_{t + m} = {\bar{ϕ}}_{t} + β_{t} m$ (4)

where $ϕ_{t}$ is the actual value at time t, ${\bar{ϕ}}_{t}$ is the level of the series at time t, $β_{t}$ is the slope (trend) of the time series at time t and ${\hat{ϕ}}_{t + m}$ is the forecast m periods ahead. $α$ and $θ$ (each taking values $0.1, 0.2, \dots, 0.9$ ) are the smoothing coefficients for the level and the trend respectively. The best values of $α$ and $θ$ are those that give the minimum mean square error (MSE).
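Equations (2)-(4) can be sketched as follows. This is an illustrative Python sketch, not the authors' R code; the function name `double_exponential_smoothing` and the starting values (level set to the first observation, slope set to the first difference) are assumptions, since the text does not state how the recursion is initialised. The argument `theta` is the trend smoothing coefficient of Equation (3).

```python
def double_exponential_smoothing(series, alpha, theta, horizon=1):
    """Holt's linear method following Equations (2)-(4).

    Returns forecasts for 1..horizon periods beyond the end of the series.
    Initial level and slope are illustrative assumptions.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for observed in series[1:]:
        previous_level = level
        # Equation (2): update the level
        level = alpha * observed + (1 - alpha) * (previous_level + trend)
        # Equation (3): update the slope
        trend = theta * (level - previous_level) + (1 - theta) * trend
    # Equation (4): extrapolate the last level along the last slope
    return [level + trend * m for m in range(1, horizon + 1)]
```

On a perfectly linear series the level and slope are reproduced exactly, so the forecasts simply continue the line.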

2.1.3. Triple Exponential Smoothing (TES)

The TES model is applied when time series data exhibit seasonality. It incorporates three smoothing equations; first for the level, second for trend and third for seasonality. The Triple exponential smoothing model is:

${\bar{ϕ}}_{t} = \frac{α ϕ_{t}}{S_{t - p}} + (1 - α) ({\bar{ϕ}}_{t - 1} + β_{t - 1}), 0 < α < 1,$ (5)

$β_{t} = θ ({\bar{ϕ}}_{t} - {\bar{ϕ}}_{t - 1}) + (1 - θ) β_{t - 1}, 0 < θ < 1,$ (6)

$S_{t} = \frac{γ ϕ_{t}}{{\bar{ϕ}}_{t}} + (1 - γ) S_{t - p}, 0 < γ < 1,$ (7)

The prediction for time period $T + τ$ is then:

${\hat{ϕ}}_{T + τ} = ({\bar{ϕ}}_{T} + τ β_{T}) S_{T}$ (8)

where ${\bar{ϕ}}_{T}$ is the smoothed estimate of the level at time T, $β_{T}$ is the smoothed estimate of the trend at time T and $S_{T}$ is the smoothed estimate of the appropriate seasonal component at T. In Equations (5)-(7), $α$ , $θ$ and $γ$ are the level, trend and seasonal smoothing parameters respectively, ${\bar{ϕ}}_{t}$ is the smoothed level at time t, $β_{t}$ is the smoothed trend at time t, $S_{t}$ is the seasonal smooth at time t and p is the number of seasons per year.
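The multiplicative recursion in Equations (5)-(8) can be sketched in Python as below. This is a hedged illustration rather than the routine used in the paper. The function name `triple_exponential_smoothing` is hypothetical, and the initialisation (level from the mean of the first season, slope from the difference between the first two seasonal means, seasonal indices from the first season) is an assumption. The forecast step reads "the appropriate seasonal component" as the latest estimated index for the matching season, which is also an assumption. The sketch needs strictly positive data and at least two full seasons.

```python
def triple_exponential_smoothing(series, period, alpha, theta, gamma, horizon=1):
    """Multiplicative Holt-Winters following Equations (5)-(8).

    `theta` is the trend smoothing parameter of Equation (6).
    Initial values are illustrative assumptions.
    """
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    first_season = series[:period]
    second_season = series[period:2 * period]
    level = sum(first_season) / period
    trend = (sum(second_season) - sum(first_season)) / period ** 2
    seasonal = [value / level for value in first_season]
    for t in range(period, len(series)):
        observed = series[t]
        season_index = seasonal[t - period]
        previous_level = level
        # Equation (5): deseasonalised level update
        level = alpha * observed / season_index + (1 - alpha) * (previous_level + trend)
        # Equation (6): trend update
        trend = theta * (level - previous_level) + (1 - theta) * trend
        # Equation (7): seasonal index update
        seasonal.append(gamma * observed / level + (1 - gamma) * season_index)
    n = len(series)
    forecasts = []
    for tau in range(1, horizon + 1):
        # Equation (8), using the latest index estimated for the same season
        matching_index = seasonal[n - period + (tau - 1) % period]
        forecasts.append((level + tau * trend) * matching_index)
    return forecasts
```

For monthly data `period` would be 12, so the 36 observations described in the abstract would give three full seasons.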

2.2. Autoregressive Integrated Moving Average (ARIMA) Model

The ARIMA modelling process has the following stages: identification, estimation, diagnosis and prediction. The “I” stands for integrated, which implies that the series is differenced before modelling and that, upon completion of the modelling, the results are integrated (summed back) to produce the final predictions and estimates [7] . The model is denoted ARIMA (p, d, q), which reduces to a stationary ARMA (p, q) process once the series has been differenced d times. In ARIMA (p, d, q), p is the order of the autoregressive (AR) part, d is the number of times the data need to be differenced to become stationary and q is the order of the moving average (MA) part. The R statistical package was used to perform the identification, estimation, diagnosis and prediction stages. The expressions for the AR, MA and ARMA models are:

AR model:

${\hat{Y}}_{t} = ϑ_{1} Y_{t - 1} + ϑ_{2} Y_{t - 2} + \dots + ϑ_{p} Y_{t - p} + ε_{t} = \sum_{i = 1}^{p} ϑ_{i} Y_{t - i} + ε_{t},$ (9)

MA model:

${\hat{Y}}_{t} = φ_{1} ε_{t - 1} + φ_{2} ε_{t - 2} + \dots + φ_{q} ε_{t - q} = \sum_{i = 1}^{q} φ_{i} ε_{t - i},$ (10)

and ARMA model:

${\hat{Y}}_{t} = \sum_{i = 1}^{p} ϑ_{i} Y_{t - i} + ε_{t} + \sum_{i = 1}^{q} φ_{i} ε_{t - i}$ (11)

where $ϑ_{i}$ is the i-th autoregressive parameter, $φ_{i}$ is the i-th moving-average parameter and $ε_{t}$ is the error term at time t.
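To make Equation (11) concrete, the sketch below computes a one-step-ahead ARMA prediction from past values and past errors. It is an illustration only: the function name `arma_one_step` is hypothetical, and the current error term is replaced by its expected value of zero, which is the usual convention when predicting but is stated here as an assumption.

```python
def arma_one_step(past_values, past_errors, ar_coefs, ma_coefs):
    """One-step-ahead prediction from Equation (11).

    past_values[-1] is Y_{t-1}, past_values[-2] is Y_{t-2}, and so on;
    past_errors is ordered the same way. The unknown current error is
    taken as zero (its expected value).
    """
    if len(past_values) < len(ar_coefs) or len(past_errors) < len(ma_coefs):
        raise ValueError("not enough history for the requested orders")
    ar_part = sum(coef * past_values[-i] for i, coef in enumerate(ar_coefs, start=1))
    ma_part = sum(coef * past_errors[-i] for i, coef in enumerate(ma_coefs, start=1))
    return ar_part + ma_part
```

For an ARIMA (p, 1, q) model the same function would be applied to the first-differenced series, and the prediction added back to the last observed level.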

2.3. Assumption: Stationarity

The stationarity assumption implies that the mean, variance and autocorrelation structures do not change over time. Stationarity will mean a flat looking series, without trend, constant variance over time and no periodic fluctuations (seasonality). However, this assumption of stationarity applies to ARIMA models and not ES models. When the data is found to be non-stationary, the first difference (d = 1) will be used. Only in extreme cases will second difference (d = 2) be applied.
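Differencing, as used for d = 1 and d = 2, can be sketched as follows; the function name `difference` is chosen here for illustration.

```python
def difference(series, d=1):
    """Return the d-th order difference of a series.

    Each pass replaces the series with the changes between
    consecutive values, shortening it by one element.
    """
    result = list(series)
    for _ in range(d):
        result = [current - previous for previous, current in zip(result, result[1:])]
    return result
```

A series with a linear trend becomes constant after one difference, and a series with a quadratic trend becomes constant after two.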

2.4. Model-Selection Criteria

Four model-selection metrics are used to evaluate the performance of the estimated Exponential Smoothing models and the estimated ARIMA model: the Root Mean Square Error (RMSE), the Mean Absolute Percentage Error (MAPE), the Mean Percentage Error (MPE) and the Mean Absolute Error (MAE). The best-fit model is the one with the smallest error on the largest number of these measures.

Table 1 shows how the error measures are calculated.
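The four measures can be sketched in Python as below. The contents of Table 1 are not reproduced in this text, so the sketch uses the standard textbook definitions, with the error taken as actual minus fitted; that sign convention and the function name `accuracy_metrics` are assumptions. The percentage measures require non-zero actual values.

```python
import math


def accuracy_metrics(actual, fitted):
    """Compute RMSE, MAE, MPE and MAPE using standard definitions.

    Error is defined as actual minus fitted (an assumed sign convention);
    MPE and MAPE are expressed in percent.
    """
    if len(actual) != len(fitted) or not actual:
        raise ValueError("actual and fitted must be non-empty and of equal length")
    errors = [a - f for a, f in zip(actual, fitted)]
    percent_errors = [100.0 * e / a for e, a in zip(errors, actual)]
    n = len(errors)
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(percent_errors) / n,
        "MAPE": sum(abs(p) for p in percent_errors) / n,
    }
```

Because MPE keeps the sign of each error, over- and under-predictions can cancel, which is why it is usually read alongside MAPE rather than on its own.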

3. Results and Discussion

The data collected were loaded into R version 3.3.3 to perform the analysis outlined in the subsections that follow. Figure 2 shows a plot of the original SHC imports data.

Figure 2 indicates a trend.

3.1. Exponential Smoothing Output

Using the appropriate coding in R, the following output was automatically generated.

3.1.1. Simple Exponential Smoothing

Table 1. Model accuracy metrics.

Figure 2. Time plot for imported second-hand cars in Zambia from Jan 2014 to Dec 2016.

Table 2. Model Information for Simple Exponential Smoothing.

Table 3. Model Information for Double Exponential Smoothing.

The R output for the SES model was as shown in Table 2. The smoothing parameter was estimated at $α = 0.9104$ , with initial state $l = 5521.3676$ and $AIC = 578.5190$ .

The fitted model for this result took the form

${\bar{ϕ}}_{t + 1} = 0.9104 ϕ_{t} + 0.0896 {\bar{ϕ}}_{t}$ (12)

3.1.2. Double Exponential Smoothing

The R output for the DES model was as shown in Table 3. The level and trend smoothing parameters were estimated at $α = 0.8006$ and $β = 0.0004$ respectively (the latter being the trend smoothing coefficient $θ$ of Equation (3)), with initial states $l = 6033.9228$ and $b = - 118.9081$ , and $AIC = 581.2877$ .

Using Equations (2) and (3), the following equations constituted the fitted DES model for SHC importation.

${\bar{ϕ}}_{t} = 0.8006 ϕ_{t} + 0.1994 ({\bar{ϕ}}_{t - 1} + β_{t - 1}),$ (13)

$β_{t} = 0.0004 ({\bar{ϕ}}_{t} - {\bar{ϕ}}_{t - 1}) + 0.9996 β_{t - 1}.$ (14)

3.1.3. Triple Exponential Smoothing

Table 4 shows the R output for the HW model. The level, trend and seasonal smoothing parameters were estimated at $α = 0.7706147$ , $β = 0$ and $γ = 1$ respectively. The estimated coefficients include $a = 2158.93914$ .

Table 4. Model Information for Triple Exponential Smoothing.

Figure 3. Plot of Exponential smoothing models of observed and fitted. (a) Plot of fitted TES model; (b) Plot of fitted DES model; (c) Plot of fitted SES model.

Using Equations (5)-(7), we fitted the HW model for SHC imports as:

${\bar{ϕ}}_{t} = 0.7706147 \frac{ϕ_{t}}{S_{t - p}} + 0.2293853 ({\bar{ϕ}}_{t - 1} + β_{t - 1}),$ (15)

$β_{t} = β_{t - 1},$ (16)

$S_{t} = \frac{ϕ_{t}}{{\bar{ϕ}}_{t}}$ (17)

3.1.4. Choice of Appropriate Exponential Smoothing Technique

Figure 3 shows plots of the three fitted ES models against the original data, for ease of comparison and selection.

Figure 3(a), which shows TES, is easily eliminated because, by graphical inspection, it does not mimic the time plot of the observed data as closely as the other two. The choice was then between Figure 3(b) and Figure 3(c), which look very much alike and both mimic the observed data plot quite well. Therefore, to make a good choice, the AIC of each was calculated and compared, and the model with the smaller AIC was chosen.

Clearly the AICs in Table 5 show that the SES was a better fit than DES. Hence the appropriate ES technique of the three ES compared was chosen to be SES.

3.2. Autoregressive Integrated Moving Average

3.2.1. Model Identification and Selection

To model an ARIMA, a time plot is the first step. Figure 4 shows time plots of the SHC imports for d = 0 and d = 1. The observed data are non-stationary, as evidenced by Figure 4(a) and by the ACF and PACF plots in Figure 5. ARIMA modelling requires that the observed data be stationary and, if they are not, they must be made stationary. Hence Figure 4(b), Figure 5(c) and Figure 5(d) are the result of taking the first difference, that is d = 1.

Model selection requires that the ACF and PACF plots for d = 1 in Figure 5 be examined to establish the most suitable ARIMA model. However, the ACF and PACF plots did not give a clear indication of significant spikes at any one lag. Hence, several tentative ARIMA models and their respective AICs were examined, as shown in Table 6. ARIMA (2, 1, 2) was chosen as the best fit of the tentative ARIMA models examined. Although the first six models had smaller AICs, their parameters were found to be insignificant. ARIMA (2, 1, 2) had all its estimated parameters significant, as Table 7 shows.

3.2.2. Estimation

When estimating the parameters, R gave the following output for ARIMA (2, 1, 2) in Table 7(a). Then their significance was tested by use of p-value (see Table 7(b) for p-values of each parameter).

Table 5. AIC for SES and DES.

Figure 4. Time Plot of undifferenced and differenced time series data. (a) Undifferenced time series plot; (b) First difference time series plot.

Figure 5. ACF and PACF of undifferenced and differenced time series data. (a) ACF for undifferenced data; (b) PACF for differenced data; (c) Undifferenced time series plot; (d) First difference time series plot.

Table 6. Measure of Accuracy for selected ARIMA models.

Note: *Number of significant parameters and lesser prediction errors

Table 7. (a) Model Information for ARIMA (2,1,2); (b) p-values for estimated coefficients.

Note: *implies p-value < 0.05 hence significant coefficient.

Figure 6. Plot of fitted ARIMA (2, 1, 2) model.

The parameters found significant were AR (1), AR (2), MA (1) and MA (2) at the 5% significance level. Hence the fitted ARIMA (2, 1, 2) using Equation (11) was:

${\hat{Y}}_{t} = 1.0907 ε_{t - 1} + 0.9983 ε_{t - 2} - 1.0536 Y_{t - 1} - 0.9947 Y_{t - 2}$ (18)

Figure 6 is a plot of the fitted model against the observed SHC imports, which shows that the fitted values track the actual SHC imports closely.

3.2.3. Diagnostic Checking

The best-fit model was checked by analysing its residuals to ensure that they form a white noise process. The ACF of the residuals, the Q-Q plot and the histogram of the residuals were used for this purpose. Figure 7 shows that the residuals are white noise, and all p-values of the Ljung-Box test are greater than 0.05. Hence ARIMA (2, 1, 2) is retained as the best-fit model.
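The Ljung-Box statistic behind those p-values can be sketched as below. This is a plain-Python illustration of the standard formula $Q = n (n + 2) \sum_{k = 1}^{h} r_{k}^{2} / (n - k)$ , where $r_{k}$ is the lag-k sample autocorrelation of the residuals; it is not the R function used in the paper, and the name `ljung_box_q` is hypothetical. The sketch returns only Q. Converting it to a p-value requires a chi-square distribution whose degrees of freedom are commonly taken as the number of lags minus the number of estimated ARMA parameters, which is stated here as an assumption about the usual practice.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic for lags 1..max_lag.

    Large values of Q indicate autocorrelated residuals, i.e. evidence
    against the white-noise hypothesis.
    """
    n = len(residuals)
    if max_lag < 1 or max_lag >= n:
        raise ValueError("max_lag must be between 1 and len(residuals) - 1")
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denominator = sum(c * c for c in centered)
    total = 0.0
    for lag in range(1, max_lag + 1):
        # lag-k sample autocorrelation of the residuals
        autocorr = sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denominator
        total += autocorr ** 2 / (n - lag)
    return n * (n + 2) * total
```

Residuals that alternate in sign, for instance, are strongly negatively correlated at lag 1 and produce a large Q, whereas residuals from an adequate model should give a small Q at every lag examined.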

3.3. Discussion

The preceding sections revealed that, of the three Exponential Smoothing techniques used for this analysis (SES, DES and TES), SES fitted the SHC imports data better than DES and TES. Its fitted model, Equation (12), was estimated to be

${\bar{ϕ}}_{t + 1} = 0.9104 ϕ_{t} + 0.0896 {\bar{ϕ}}_{t} .$

It was also revealed that ARIMA (2, 1, 2) fitted the data well as compared to other tentative ARIMA models suggested. ARIMA (2, 1, 2) was estimated to be

${\hat{Y}}_{t} = 1.0907 ε_{t - 1} + 0.9983 ε_{t - 2} - 1.0536 Y_{t - 1} - 0.9947 Y_{t - 2} .$

Figure 7. Diagnostic Checks.

Table 8. Criteria for selecting the better model between SES, DES, TES and ARIMA.

Note: Smaller error (*) implies better fit.

But then the question remains as to which is the best fit for the SHC imports data among the four models considered in this report, as highlighted in Figure 1. The accuracy of each fit was evaluated using the four metrics discussed in Section 2.4. Each metric was applied to determine and rank the performances of the models for the given time series. Table 8 summarizes the four models and their forecasting performances.

The results indicate that the ARIMA model performs better than any of the other models for this time series. ARIMA (2, 1, 2) has smaller prediction errors than the SES on more of the measures, and so it was concluded that ARIMA (2, 1, 2) is the best-fit model for the SHC imports data. Thus it can be used to forecast future imports of SHCs. Note, however, that although the SES model gives the second-best forecast after the ARIMA model, the performance of each model depends on the data used.

Here, it should be noted that differences between their performances are related to the differences between the methods of determining forecasts in the ES and in the ARIMA models. The forecasting method in the ES models relies on a weighted average of the past observed values in which the weights decline exponentially. This basically implies that the data for more recent observations contribute significantly more than the previous data does. The ARIMA model, however, has three parts: autoregression, integration and moving average, with the future value of a variable being a linear combination of the past values and the associated errors.
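The exponentially declining weights can be made explicit. Unrolling Equation (1) shows that the observation k periods back receives weight $α (1 - α)^{k}$ ; the short sketch below computes these weights (the function name `ses_weights` is chosen here for illustration).

```python
def ses_weights(alpha, n_terms):
    """Weights that simple exponential smoothing places on the most
    recent observation (k = 0) and on each of the n_terms - 1 before it."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return [alpha * (1 - alpha) ** k for k in range(n_terms)]
```

With the fitted value $α = 0.9104$ , the most recent observation carries weight 0.9104 and the one before it about 0.0816, so the SES forecast is dominated by the latest month.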

4. Forecasting

Forecasting is usually the last stage in time series analysis, as shown in Figure 1. It plays a significant role in planning and decision making for policy makers. When both current and future events are taken into account, those entrusted with decision making can make better-informed decisions. Thus, however uncertain forecasts might appear, decision makers should not ignore them. Table 9 shows forecasts for 18 months (from January 2017 to June 2018).

Figure 8 is a graphical representation of the forecast for ARIMA (2, 1, 2) for a future period of 18 months starting at January 2017.

5. Conclusion

Table 9. ARIMA (2, 1, 2) forecasts for the next 18 months.

Figure 8. Plot of ARIMA (2, 1, 2) forecasts for the next 18 months.

Zambia depends largely on the international second-hand car market for its motor vehicle supply. In this paper, monthly time series data on second-hand car (SHC) importation were analyzed using SES, DES, TES and ARIMA techniques. The quality of each technique was determined by comparing the predictive power of its fitted model with the observed data. The results showed that ARIMA (2, 1, 2) was the best fit for SHC importation because its errors were smaller than those of the SES, DES and TES models. The four error measures used were RMSE, MAE, MPE and MAPE. Forecasts were also produced using the ARIMA (2, 1, 2) model for the 18 months from January 2017. Although there was a percentage increase of 90.6% from November 2015 to December 2016, SHC importation in Zambia has generally been decreasing, with a percentage change of 59.5% from January 2014 to December 2016. The forecasts also show a gradual percentage decrease of 1.12% by June 2018. Ultimately, these results can be used by Government departments such as the Zambia Revenue Authority and the Road Development Agency as they plan and execute their duties.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1]	Dargay, J., Gately, D. and Sommer, M. (2007) Vehicle Ownership and Income Growth, Worldwide: 1960-2030. Energy Journal, 28, 143-170.
[2]	Davis, L.W. and Kahn, M.E. (2011) Cash for Clunkers? The Environmental Impact of Mexico’s Demand for Used Vehicles. Access, No. 38, 15. http://www.accessmagazine.org/articles/spring-2011/cash-clunkers-environmental-impact-mexicos-demand-used-vehicles/
[3]	Chikuba, Z. (2014) Used Motor Vehicle Imports and the Impact on Transportation in Zambia. Working Paper No. 21, Zambia Institute for Policy Analysis and Research (ZIPAR).
[4]	Kamau, H. (2014) Trade in Second-Hand Vehicles: Sustainable Transport Africa.
[5]	Gardner, E.S. (1985) Exponential Smoothing: The State of the Art. Journal of Forecasting, 4, 1-28.
[6]	Box, G. and Jenkins, G. (1970) Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.
[7]	Tularam, G.A. and Saeed, T. (2016) Oil-Price Forecasting Based on Various Univariate Time-Series Models. American Journal of Operations Research, 6, 226-235.
