White Noise Analysis: A Measure of Time Series Model Adequacy

doi:10.4236/am.2019.1011069

Applied Mathematics
Vol.10 No.11(2019), Article ID:96506,15 pages
10.4236/am.2019.1011069

Imoh Udo Moffat¹, Emmanuel Alphonsus Akpan²

●How to Cite this Article

¹Department of Mathematics and Statistics, University of Uyo, Uyo, Nigeria

²Department of Mathematical Science, Abubakar Tafawa Balewa University, Bauchi, Nigeria

This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).

http://creativecommons.org/licenses/by/4.0/

Received: September 2, 2019; Accepted: November 18, 2019; Published: November 21, 2019

ABSTRACT

The purpose of this study is to apply white noise process in measuring model adequacy targeted at confirming the assumption of independence. This ensures that no autocorrelation exists in any time series under consideration, and that the autoregressive integrated moving average (ARIMA) model entertained is able to capture the linear structure in such series. The study explored the share price series of Union bank of Nigeria, Unity bank, and Wema bank obtained from Nigerian Stock Exchange from January 3, 2006 to November 24, 2016 comprising 2690 observations. ARIMA models were used to model the linear dependence in the data while autocorrelation function (ACF), partial autocorrelation function (PACF), and Ljung-Box test were applied in checking the adequacy of the selected models. The findings revealed that ARIMA(1,1,0) model adequately captured the linear dependence in the return series of both Union and Unity banks while ARIMA(2,1,0) model was sufficient for that of Wema bank. Also, evidence from ACF, PACF and Ljung-Box test revealed that the residual series of the fitted models were white noise, thus satisfying the conditions for stationarity.

Keywords:

Autocorrelation, Model Identification, Model Estimation, Diagnostic Checking, Time Series

1. Introduction

The fundamental building block of time series is stationarity and basically, the idea behind stationarity is that the probability laws that govern the behaviour of the process do not change overtime. This is to ensure that the time series process is in a state of statistical equilibrium and would in turn enhance a statistical setting for describing and making inferences about the structure of data that somehow fluctuate in a random manner [1] [2] [3]. According to [3], a process is said to be strictly stationary if the whole probability structure must depend only on time differences. A less restrictive requirement, called weak stationarity of order k, is that the moments up to some order k depends only on time lags and that the second order stationarity plus an assumption of normality are sufficient to produce strict stationarity (see also, [4] [5]). For simplicity, a time series is said to be stationary, if it has a mean, variance and autocovariance function that are constant over time (see [6]). Moreover, one most important and fundamental example of a stationary process is the white noise process which is defined as a sequence of independent (uncorrelated) and identically distributed random variables with zero mean and constant variance [2] [3] [5]. Thus, the white noise process is particularly important and constitutes an essential bedrock in time series model building.

In this study, our aim is to apply white noise process in measuring model adequacy targeted at confirming independence assumption, which ensures that no autocorrelation exists in the time series considered and that the ARMA model entertained is able to capture the linear structure in the dataset.

The motivation stems from the fact that the problem of statistical modeling is to achieve parsimony (i.e. the principle of model selection with the possibility of having the smallest number of parameters that completely express the linear dependence structure, providing better prediction, and generalization of new observations) conditional on the restriction of model adequacy.

Testing for model adequacy or diagnostic checking as defined by [7] incorporates all relevant information and when calibrated to the data no important significant departures from statistical assumptions made can be found. Actually, model adequacy involves residual analysis and overfitting. In time series modeling, a good model parameter estimates must be reasonably close to the true values, should have the dependence structure of the data adequately captured, and should also produce residuals that are approximately uncorrelated [2] [6] [8]. These residuals are obtained by taking the difference between an observed value of a time series and a predicted value from fitting a candidate model to the data. They are useful in checking whether a model has adequately captured the information in the data. According to [6], model adequacy is related primarily to the assumption that residuals are independent. Moreover, if the residuals of a given model are correlated, the model must be refined because it does not completely capture the statistical relationship amongst the time series [2]. Furthermore, a model is said to be adequate if the residuals are statistically independent implying that the residual series is uncorrelated. Therefore, in testing for model adequacy, which is mainly to check for independence of the residual series, an autocorrelation function (ACF), Partial autocorrelation function (ACF) and Ljung-Box test on the residuals are considered.

Another adequacy checking tool is overfitting, which has to do with adding another coefficient to a fitted model so as to see if the resulting model is better. The following are identified as the implications of fitting and overfitting:

1) Specify the original model carefully. If a simple model seems promising, check it out before trying a more complicated model.

2) When overfitting, do not increase the orders of both the autoregressive (AR) and moving average (MA) parts of the model simultaneously.

3) Extend the model in directions suggested by the analysis of the residuals. However, one setback of overfitting is the tendency of the violation of the principle of parsimony [2] [6].

Model adequacy has also been explored by the following studies: [7] - [17].

In addition, the remaining part of this work is organized as follows; Section 2 takes care of the methodology, followed by the results and then the discussion in Section 3, while the conclusion of the overall results is handled in Section 4.

2. Methodology

2.1. Return Series

The return series, $R_{t}$ , can be obtained given that $P_{t}$ is the price of a unit share at time, t, while $P_{t - 1}$ is the share price at time, t − 1.

$R_{t} = \nabla \ln (P_{t}) = (1 - B) \ln (P_{t}) = \ln (P_{t}) - \ln (P_{t - 1})$ (1)

here, $R_{t}$ is regarded as a transformed series of the share price, $P_{t}$ , meant to attain stationarity while B is the backshift operator. Thus, both the mean and the variance of the series are stable [18] [19].

2.2. Autoregressive Integrated Moving Average (ARIMA) Model

In [3] the extension of ARMA model to deal with homogenous non-stationary time series in which $X_{t}$ , itself is non-stationary but its $d^{t h}$ difference is a stationary ARMA model. Denoting the $d^{t h}$ difference of $X_{t}$ by

$φ (B) = ϕ (B) \nabla^{d} X_{t} = θ (B) ε_{t}$ (2)

where $φ (B)$ is the nonstationary autoregressive operator such that d of the roots of $φ (B) = 0$ are unity and the remainder lie outside the unit circle. $ϕ (B)$ is a stationary autoregressive operator (see also, [20] [21]).

2.3. Stationarity

The foundation of time series analysis is stationarity. Consider a finite set of return variables ${R_{t_{1}}, R_{t_{2}}, \dots, R_{t_{n}}}$ from a time series process, ${R (t) : t = 0, + 1, + 2, \dots}$ . The k-dimensional distribution function is defined as

$F_{R_{t_{1}} \dots R_{t_{k}}} (r_{1}, r_{2}, \dots, r_{k}) = P {R_{t_{1}} \leq r_{1}, R_{t_{2}} \leq r_{2}, \dots, R_{t_{k}} \leq r_{k}},$ (3)

where $r_{j}, j = 1, 2, \dots, k$ are any real numbers.

A process is said to be:

1) first-order stationary in distribution if its one-dimensional distribution is time invariant. That is, if

$F_{R_{t_{1}}} (r_{1}) = F_{R_{t_{1} + k}} (r_{1})$ , (4)

for any integers $t_{1}, k$ and $t_{1} + k$ .

2) second-order stationary in distribution if

$F_{R_{t_{1}}, R_{t_{2}}} (r_{1}, r_{2}) = F_{R_{t_{1} + k}, R_{t_{2} + k}} (r_{1}, r_{2}),$ (5)

for any integers $t_{1}, t_{2}, k, t_{1} + k$ and $t_{2} + k$ .

3) the $n^{t h}$ -order stationary in distribution if

$F_{R_{t_{1}}, R_{t_{2}}, \dots, R_{t_{n}}} (r_{1}, \dots, r_{n}) = F_{R_{t_{1} + k}, R_{t_{2} + k}, R_{t_{n} + k}} (r_{1}, \dots, r_{n}),$ (6)

for any $(t_{1}, \dots, t_{n})$ and k integers.

A process is said to be strictly stationary if (3.6) is true for any n, that is, $n = 1, 2, \dots$

According to [3], a process ${R_{t}}$ is weakly stationary if the mean $E (R_{t}) = μ$ is a fixed constant for all t and the autocovariances $C o v (R_{t}, R_{t + k}) = γ_{k}$ depends only on the time difference or time lag k for all t.

Stationary in the wide sense or covariance stationary is also referred to as second-order stationary process.

2.4. White Noise Process

A process ${a_{t}}$ is called a white noise process if it is a sequence of uncorrelated random variables from a fixed distribution with constant mean, $E (a_{t}) = μ_{a}$ , usually assumed to be zero, constant variance, $V a r (a_{t}) = σ_{a}^{2}$ and $γ_{k} = C o v (a_{t}, a_{t + k}) = 0$ , for all $k \neq 0$ . It is denoted by $a_{t} ~ WN (0, σ_{a}^{2})$ , where WN stands for white noise [5]. By definition, a white noise process ${a_{t}}$ is stationary with autocovariance function,

$γ_{k} = {\begin{cases} σ_{a}^{2}, k = 0, \\ 0, k \neq 0. \end{cases}$ (7)

The autocorrelation function is given as:

$ρ_{k} = {\begin{cases} 1, k = 0, \\ 0, k \neq 0. \end{cases}$ (8)

while the partial autocorrelation function is

$φ_{k k} = {\begin{cases} 1, k = 0, \\ 0, k \neq 0. \end{cases}$ (9)

Thus, the implication of a white noise specification is that the ACF and PACF are identically equal to zero.

2.5. Autocovariance and Autocorrelation Functions

According to [5], covariance between $R_{t}$ and $R_{t + k}$ denoted by $C o v (R_{t}, R_{t + k})$ , which is a function of the time difference, k, is called the autocovariance function ${γ_{k}}$ of the stochastic process. As a function of k, $γ_{k}$ is called the autocovariance function in the time series analysis since it represents the covariance between $R_{t}$ and $R_{t + k}$ from the same process. It is defined as

$γ_{k} = C o v (R_{t}, R_{t + k}) = E (R_{t} - μ) (R_{t + k} - μ) .$ (10)

The sample estimate of $γ_{k}$ is $C_{k}$ given by

$C_{k} = \frac{1}{n} \sum_{t = 1}^{n - k} (R_{t} - \bar{R}) (R_{t + k} - \bar{R})$ (11)

Similarly, the correlation between $R_{t}$ and $R_{t + k}$ denoted by $C o r r (R_{t}, R_{t + k})$ , which is a function of the time difference, k, is called the autocorrelation function ${ρ_{k}}$ of the stochastic process. As function of k, $ρ_{k}$ is called the autocorrelation function in time series analysis since it represents the correlation between $R_{t}$ and $R_{t + k}$ from the same process. It is defined as

$ρ_{k} = \frac{C o v (R_{t}, R_{t + k})}{\sqrt{V a r (R_{t})} \sqrt{V a r (R_{t + k})}} = \frac{γ_{k}}{γ_{0}}$ (12)

The corresponding sample estimate is given by

${\hat{ρ}}_{k} = \frac{C_{k}}{C_{0}}, k = 0, 1, 2, \dots$ (13)

2.6. Partial Autocorrelation Function (PACF)

The conditional correlation between $R_{t}$ and $R_{t + k}$ after their mutual linear dependency on the intervening variables $(R_{t + 1}, R_{t + 2}, \dots, R_{t + k - 1})$ has been removed, given by $C o r r (R_{t}, R_{t + k} / R_{t + 1}, R_{t + 2}, \dots, R_{t + k - 1})$ , is usually referred to as the partial autocorrelation in time series analysis ([5]).

Partial autocorrelation can be derived from the regression model, where the dependent variable, $R_{t + k}$ , from a zero-mean stationary process is regressed on k-lagged variables $R_{t + k - 1}, R_{t + k - 2}, \dots$ and $R_{t}$ , that is

$R_{t + k} = φ_{k 1} R_{t + k - 1} + φ_{k 2} R_{t + k - 2} + \dots + φ_{k k} R_{t} + α_{t + k},$ (14)

where $φ_{k i}$ denotes the $i^{t h}$ regression parameter and $α_{t + k}$ is an error term with mean zero and uncorrelated with $R_{t + k - j}$ , for $j = 1, 2, \dots, k$ . Multiplying $R_{t + k - j}$ on both sides of the above regression equation and taking the expectation, we get

$γ_{j} = φ_{k 1} γ_{j - 1} + φ_{k 2} γ_{j - 2} + \dots + φ_{k k} γ_{j - k}$ . (15)

Hence,

$ρ_{j} = φ_{k 1} ρ_{j - 1} + φ_{k 2} ρ_{j - 2} + \dots + φ_{k k} ρ_{j - k}$ . (16)

2.7. Diagnostic Checking of Linear Time Series Models

Diagnostic checking is applied with an objective of uncovering a possible lack-of-fit of the tentative model and possibly unraveling the cause of such a case. If no lack-of-fit is indicated, the model is ready for use. In other words, if any inadequacy is found, the iterative cycle of identification, estimation and diagnostic checking is repeated until a suitable and appropriate representation is obtained.

Once the parameters of the tentative models have been estimated, we check whether or not the residuals obtained from the estimated equation are approximately white noise. This is done by examining the ACF and PACF of the residuals to see whether they are statistically insignificant, that is, within two standard deviations at 5% level of significance. If the residuals are approximately white noise, the model may be entertained provided the parameters are significantly different from zero.

The Portmanteau lack-of-fit test uses the residual sample ACFs as a unit to check the joint null hypothesis test, which requires that several autocorrelations of $a_{t}$ are zero. [17] proposed the Portmanteau statistics given as:

$Q^{*} (m) = T \sum_{l = 1}^{m} {\hat{ρ}}_{l}^{2}$ , (17)

where T is the number of observations.

A test statistic for the null hypothesis, $H_{0} : ρ_{1} = \dots = ρ_{m} = 0$ , against the alternative, $H_{a} : ρ_{i} \neq 0$ , for some $i \in {1, \dots, m}$ under the assumption that ${a_{t}}$ is an i.i.d. sequence with certain moment conditions while $Q^{*} (m)$ is asymptotically a Chi-square random variable with m degrees of freedom.

[22] modified the $Q^{*} (m)$ statistic to increase the power of the test in finite samples as follows:

$Q (m) = T (T + 2) \sum_{l = 1}^{m} \frac{{\hat{ρ}}_{l}^{2}}{T - l}$ , (18)

where T is the number of observations.

The decision rule is to reject $H_{0}$ if $Q (m) > χ_{α}^{2}$ , where $χ_{α}^{2}$ denotes the 100(1 – α)th percentile of a Chi-squared distribution with m – (p + q) degrees of freedom. The decision rule can also reject $H_{0}$ if the p-value is less than or equal to α, the significance level.

In practice, the selection of m may affect the performance of the Q(m) statistic. The choice $m \approx \ln (T)$ provides better power performance [4].

3. Results and Discussion

3.1. Dataset

Data collection was based on secondary source from the records of Nigerian Stock Exchange. The data on the daily closing share prices of the sampled banks (Union bank, Unity bank and Wema bank) from January 3, 2006 to November 24, 2016 were obtained from the Nigerian Stock Exchange [23] and delivered through contactcentre@nigerianstockexchange.com.

3.2. Interpretation of Time Plot

Figures 1-3 represent the share price series for the three banks. The share prices of all the banks do not fluctuate around a common mean, which clearly indicates the presence of a stochastic trend in the share prices, and is also an indication of non-stationarity. Since the share price series are found to be non-stationary, the first difference of the natural logarithm of share price series is taken to obtain stationary (returns) series. The inclusion of the log transformation is to stabilize the variance. Figures 4-6 show that the returns series appear to be stationary.

Figure 1. Share price series of union bank of Nigeria.

Figure 2. Share price series of unity bank.

Figure 3. Share price series of wema bank.

Figure 4. Return series of union bank of Nigeria.

Figure 5. Return series of unity bank.

Figure 6. Return series of wema bank.

3.3. Building Autoregressive Integrated Moving Average (ARIMA) Model

3.3.1. Building Autoregressive Integrated Moving Average (ARIMA) Model of Union Bank of Nigeria

1) Model identification

From Figure 7 and Figure 8, both ACF and PACF indicate that mixed model could be entertained. The following models; ARIMA(1,1,0), ARIMA(0,1,1) and ARIMA(1,1,1) were considered tentatively.

Figure 7. ACF of return series of union bank of Nigeria.

Figure 8. PACF of return series of union bank of Nigeria.

2) Estimation of parameters

From Table 1, ARIMA(1,1,0) model is selected based on the grounds of significance of the parameters and minimum AIC.

3) Diagnostic checking of the model

From Figure 9 and Figure 10, all the lags coefficients of ACF and PACF are within the significance bands, that is, they are zero implying that the residual series of ARIMA(1,1,0) model appears to be a white noise series, that is, the series is independent and identically distributed with mean zero and constant variance.

Evidence from Ljung-Box Q-statistics in Table 2 shows that ARIMA(1,1,0) model is adequate at 5% level of significance given the Q-statistic at Lags 1, 4, 8 and 24. That is, the hypothesis of no autocorrelation is not rejected. Thus, confirming the independence of residual series.

3.3.2. Building Autoregressive Integrated Moving Average (ARIMA) Model of Unity

Bank

1) Model identification

From Figure 11 and Figure 12, both ACF and PACF indicate that mixed model could be entertained. The following models; ARIMA(1,1,0), ARIMA(0,1,1) and ARIMA(1,1,1) were considered tentatively.

2) Estimation of parameters

From Table 3, ARIMA(1,1,0) model is selected based on the grounds of significance of the parameters and minimum AIC.

3) Diagnostic checking of the model

From Figure 13 and Figure 14, all the lags coefficients of ACF and PACF are within the significance bands except lag 9, that is, they are zero implying that the residual series of ARIMA(1,1,0) model appears to be a white noise series, that is, the series is independent and identically distributed with mean zero and constant variance.

Evidence from Ljung-Box Q-statistics in Table 4 shows that ARIMA(1,1,0) model is adequate at 5% level of significance given the Q-statistic at Lags 1, 4, 8 and 24. That is, the hypothesis of no autocorrelation is not rejected. Hence, confirming the independence of the residual series.

3.3.3. Building Autoregressive Integrated Moving Average (ARIMA) Model of Wema Bank

1) Model identification

From Figure 15 and Figure 16, both ACF and PACF indicate that mixed model could be entertained. The following models; ARIMA(1,1,0), ARIMA(2,1,0), ARIMA(0,1,2) and ARIMA(2,1,1) were considered tentatively.

2) Estimation of parameters

From Table 5, ARIMA(2,1,0) model is selected based on the grounds of significance of the parameters and minimum AIC.

3) Diagnostic checking of the model

From Figure 17 and Figure 18, all the lags coefficients of ACF and PACF are within the significance bands, that is, they are zero implying that the residual series of ARIMA(2,1,0) model appears to be a white noise series, that is, the series is independent and identically distributed with mean zero and constant variance.

Evidence from Ljung-Box Q-statistics in Table 6 shows that ARIMA(2,1,0) model is adequate at 5% level of significance given the Q-statistic at Lags 1, 4, 8 and 24. That is, the hypothesis of no autocorrelation is not rejected. Thus, confirming the independence of the residual series.