Some Applications of Higher Moments of the Linear Gaussian White Noise Process

The linear Gaussian white noise process is an independent and identically distributed (iid) sequence with zero mean and finite variance, with distribution $N(0,\sigma^2)$. If $X_1, X_2, \ldots, X_n$ is a realization of such an iid sequence, this paper studies in detail the covariance structure of $X_1^d, X_2^d, \ldots, X_n^d$, $d=1, 2, \ldots$. This study shows that: 1) all powers of a linear Gaussian white noise process are iid, but the powers $d\ge 2$ are not normally distributed; and 2) the higher moments (variance and kurtosis) of $X_t^d$, $d=2, 3, \ldots$, can be used to distinguish the linear Gaussian white noise process from other processes with a similar covariance structure.


Iwueze, I., Arimie, C., Iwu, H. and Onyemachi, E. (2017) Some Applications of Higher Moments of the Linear Gaussian White Noise Process. Applied Mathematics, 8, 1918-1938. doi: 10.4236/am.2017.812136.

1. Introduction

The objective of estimation procedures is to produce residuals (the estimated noise sequence) with no apparent deviations from stationarity and, in particular, with no dependence among the residuals. If there is no dependence among the residuals, we can regard them as observations of independent random variables, and there is no further modeling to be done except to estimate their mean and variance. If there is significant dependence among the residuals, we need to look for a noise model that accounts for the dependence.

In this paper, we examine the covariance structure of powers of the noise sequence when the noise sequence is assumed to consist of independent and identically distributed normal (Gaussian) random variates with mean zero and finite variance $\sigma^2>0$. Some simple tests of the hypothesis that the residuals and their powers are observed values of independent and identically distributed random variables are also considered, as are tests for normality of the residuals and their powers.

The stochastic process $X_t, t\in T$ is said to be strictly stationary if its joint distribution functions are time invariant; that is, for every $m$, every choice of $t_1, t_2, \cdots, t_m$ and every shift $k$,

$F\left({x}_{{t}_{1}},{x}_{{t}_{2}},\cdots ,{x}_{{t}_{m}}\right)=F\left({x}_{{t}_{1}+k},{x}_{{t}_{2}+k},\cdots ,{x}_{{t}_{m}+k}\right)$ (1.1)

where

$F\left(x_{t_1},x_{t_2},\cdots,x_{t_m}\right)=P\left(X_{t_1}\le x_{t_1},X_{t_2}\le x_{t_2},\cdots,X_{t_m}\le x_{t_m}\right)$ (1.2)

That is, the probability measure for the sequence $\langle X_t\rangle$ is the same as that for $\langle X_{t+k}\rangle$ for all $k$. A series that satisfies the following three conditions is said to be weakly, or covariance, stationary.

$\left.\begin{array}{l}1.\quad E\left(X_t\right)=\mu,\quad t=1,2,\cdots\\ 2.\quad E\left[\left(X_t-\mu\right)^2\right]=\sigma^2<\infty\\ 3.\quad E\left[\left(X_{t_1}-\mu\right)\left(X_{t_2}-\mu\right)\right]=R\left(t_2-t_1\right)\end{array}\right\}$ (1.3)

If the process is covariance stationary, all the variances are the same and all the covariances depend only on the difference $t_2-t_1$. The moments

$E\left[\left(X_t-\mu\right)\left(X_{t+k}-\mu\right)\right]=R\left(k\right),\quad k=0,1,2,\cdots$ (1.4)

define the autocovariance function. The autocorrelations, which do not depend on the units of measurement of $X_t$, are given by

$\rho \left(k\right)=\frac{R\left(k\right)}{R\left(0\right)},\text{\hspace{0.17em}}k=0,\text{\hspace{0.17em}}1,\text{\hspace{0.17em}}2,\text{\hspace{0.17em}}\cdots$ (1.5)

A stochastic process $X_t, t\in Z$, where $Z=\{\cdots,-1,0,1,\cdots\}$, is called white noise if it has finite mean and variance and all the autocovariances (1.4) are zero except at lag zero [$R(k)=0$ for $k\ne 0$]. In many applications, $X_t, t\in Z$ is assumed to be normally distributed with mean zero and variance $\sigma^2<\infty$, and the series is called a linear Gaussian white noise process if:

$\left.\begin{array}{l}E\left(X_t\right)=0\\ \mathrm{var}\left(X_t\right)=\sigma^2\\ R\left(k\right)=\begin{cases}\sigma^2, & k=0\\ 0, & \text{otherwise}\end{cases}\\ \rho\left(k\right)=\begin{cases}1, & k=0\\ 0, & \text{otherwise}\end{cases}\end{array}\right\}$ (1.6)

and

$\varphi_{kk}=\mathrm{corr}\left(X_t,X_{t+k}\mid X_{t+1},X_{t+2},\cdots,X_{t+k-1}\right)=\begin{cases}1, & k=0\\ 0, & \text{otherwise}\end{cases}$ (1.7)

where ${\varphi }_{kk}$ is known as the partial autocorrelation function. For large n, the sample autocorrelations:

${\stackrel{^}{\rho }}_{X}\left(k\right)=\frac{\underset{t=1}{\overset{n-k}{\sum }}\left({X}_{t}-\stackrel{¯}{X}\right)\left({X}_{t+k}-\stackrel{¯}{X}\right)}{\underset{t=1}{\overset{n}{\sum }}{\left({X}_{t}-\stackrel{¯}{X}\right)}^{\text{​}2}}$ (1.8)

of an iid sequence $X_1,X_2,\cdots,X_n$ with finite variance are approximately distributed as $N\left(0,\frac{1}{n}\right)$. We can use this to test the significance of the autocorrelation coefficients by constructing a confidence interval: if $X_1,X_2,\cdots,X_n$ is a realization of such an iid sequence, about $100\left(1-\alpha\right)\%$ of the sample autocorrelations should fall between the bounds:

$±\frac{{Z}_{1-\frac{\alpha }{2}}}{\sqrt{n}}$ (1.9)

where $Z_{1-\frac{\alpha}{2}}$ is the $1-\frac{\alpha}{2}$ quantile of the standard normal distribution. The bounds (1.9) are used to test the null and alternative hypotheses:

${H}_{\text{​}0}:{\rho }_{X}\left(k\right)=0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\forall k\ne 0\text{\hspace{0.17em}}\text{ }\text{ }\text{and}\text{\hspace{0.17em}}\text{ }\text{ }{H}_{\text{​}1}:{\rho }_{X}\left(k\right)\ne 0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{for}\text{\hspace{0.17em}}\text{some}\text{\hspace{0.17em}}k\ne 0$ (1.10)

where $\rho_X(k)$ is the autocorrelation at lag $k$ of the process that generated $X_1,X_2,\cdots,X_n$. $H_0$ is rejected at lag $k$ if $\hat{\rho}_X(k)$ falls outside the bounds (1.9).
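As a minimal sketch, assuming NumPy is available, Equation (1.8) and the bounds (1.9) can be computed as follows. The function names `sample_acf` and `acf_bounds`, the seed, and the choice $\alpha=0.05$ (so $Z_{0.975}\approx 1.96$) are illustrative assumptions, not part of the paper.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(k), k = 1, ..., max_lag, as in Equation (1.8)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()
    denom = np.sum(dev ** 2)  # denominator sums over all n observations
    return np.array([np.sum(dev[: n - k] * dev[k:]) / denom for k in range(1, max_lag + 1)])


def acf_bounds(n, z=1.96):
    """Half-width of the bounds (1.9); z = 1.96 assumes alpha = 0.05."""
    return z / np.sqrt(n)


rng = np.random.default_rng(seed=1)  # arbitrary seed
x = rng.normal(loc=0.0, scale=1.0, size=500)
rho = sample_acf(x, max_lag=20)
bound = acf_bounds(x.size)
inside = np.mean(np.abs(rho) <= bound)
print(f"bound = +/-{bound:.4f}; share of lags inside = {inside:.2f}")
```

Under $H_0$ roughly 95% of the lags should fall inside the bounds; a noticeably larger share outside them suggests dependence.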

We can also test the joint hypothesis that the first $m$ autocorrelation coefficients $\rho_X(1),\cdots,\rho_X(m)$ are simultaneously equal to zero. The null and alternative hypotheses are:

${\text{H}}_{0}:\rho_X\left(1\right)=\rho_X\left(2\right)=\cdots=\rho_X\left(m\right)=0\quad\text{and}\quad{\text{H}}_{1}:\rho_X\left(i\right)\ne 0\ \text{for some}\ i\in\{1,2,\cdots,m\}$ (1.11)

A widely used test of (1.11) is the Box-Pierce portmanteau test, which takes the form

${Q}_{BP}\left(m\right)=n\underset{k=1}{\overset{m}{\sum }}{\left[{\stackrel{^}{\rho }}_{X}\left(k\right)\right]}^{2}$ (1.12)

where $m$ is the so-called lag truncation number, typically assumed to be fixed. Under the assumption that $X_1,X_2,\cdots,X_n$ is an iid sequence, $Q_{BP}(m)$ is asymptotically a chi-squared random variable with $m$ degrees of freedom. Ljung and Box modified the $Q_{BP}(m)$ statistic to improve its finite-sample performance as

${Q}_{LB}\left(m\right)=n\left(n+2\right)\underset{k=1}{\overset{m}{\sum }}\left(\frac{{\left[{\stackrel{^}{\rho }}_{X}\left(k\right)\right]}^{2}}{n-k}\right)$ (1.13)

Several values of $m$ are often used, and simulation studies suggest that the choice $m\approx\ln(n)$ gives better power performance.
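The statistics (1.12) and (1.13) can be sketched in Python as below, assuming NumPy. The helper names and the simulated series are hypothetical, and $m$ is set to the nearest integer to $\ln(n)$ following the rule of thumb above.

```python
import math

import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations of Equation (1.8) for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[: n - k] * dev[k:]) / denom for k in range(1, max_lag + 1)])


def box_pierce(x, m):
    """Q_BP(m) = n * sum_{k=1}^{m} rho_hat(k)^2, Equation (1.12)."""
    return len(x) * np.sum(sample_acf(x, m) ** 2)


def ljung_box(x, m):
    """Q_LB(m) = n(n+2) * sum_{k=1}^{m} rho_hat(k)^2 / (n-k), Equation (1.13)."""
    n = len(x)
    k = np.arange(1, m + 1)
    return n * (n + 2) * np.sum(sample_acf(x, m) ** 2 / (n - k))


rng = np.random.default_rng(seed=2)  # arbitrary seed
x = rng.normal(size=300)
m = max(1, round(math.log(len(x))))  # m close to ln(n); ln(300) is about 5.7, so m = 6
q_bp, q_lb = box_pierce(x, m), ljung_box(x, m)
print(f"m = {m}, Q_BP = {q_bp:.3f}, Q_LB = {q_lb:.3f}")
```

Both values are compared with the chi-squared distribution with $m$ degrees of freedom. $Q_{LB}(m)$ always exceeds $Q_{BP}(m)$ slightly, because each weight $(n+2)/(n-k)$ is greater than one.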

Another portmanteau test, formulated by McLeod and Li, can be used as a further test of the iid hypothesis, since if the data are iid, the squared data are also iid. It is based on the same statistic as the Ljung-Box test:

${Q}_{ML}\left(m\right)=n\left(n+2\right)\underset{k=1}{\overset{m}{\sum }}\left(\frac{{\left[{\stackrel{^}{\rho }}_{{X}^{2}}\left(k\right)\right]}^{2}}{n-k}\right)$ (1.14)

where the sample autocorrelations of the data are replaced by the sample autocorrelations of the squared data, ${\stackrel{^}{\rho }}_{{X}^{2}}\left(k\right)$ .
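A hedged sketch of the McLeod-Li idea follows, assuming NumPy: the statistic (1.14) is simply the Ljung-Box statistic computed on the squared series. To echo the point that a process can share the white noise covariance structure without being iid, the example compares Gaussian white noise with the simple bilinear recursion $X_t=e_t+\beta X_{t-2}e_{t-1}$. This particular bilinear model, $\beta=0.6$, the burn-in length, and all names are assumptions chosen for illustration only.

```python
import numpy as np


def ljung_box_stat(x, m):
    """Ljung-Box statistic of Equation (1.13) for lags 1..m."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    rho = np.array([np.sum(dev[: n - k] * dev[k:]) / denom for k in range(1, m + 1)])
    k = np.arange(1, m + 1)
    return n * (n + 2) * np.sum(rho ** 2 / (n - k))


def mcleod_li(x, m):
    """Q_ML(m) of Equation (1.14): the Ljung-Box statistic of the squared data."""
    return ljung_box_stat(np.asarray(x, dtype=float) ** 2, m)


def simulate_bilinear(n, beta, rng, burn=200):
    """Hypothetical example: X_t = e_t + beta * X_{t-2} * e_{t-1}, e_t iid N(0, 1).

    Its autocovariances are zero at all nonzero lags, yet X_t is not iid."""
    e = rng.normal(size=n + burn)
    x = np.zeros(n + burn)
    for t in range(2, n + burn):
        x[t] = e[t] + beta * x[t - 2] * e[t - 1]
    return x[burn:]  # drop the burn-in period that started from zeros


rng = np.random.default_rng(seed=3)  # arbitrary seed
n, m = 1000, 7
gauss = rng.normal(size=n)
bilinear = simulate_bilinear(n, beta=0.6, rng=rng)
for name, series in (("Gaussian white noise", gauss), ("bilinear", bilinear)):
    print(f"{name}: Q_LB = {ljung_box_stat(series, m):.2f}, Q_ML = {mcleod_li(series, m):.2f}")
```

Because the squares of the bilinear series are autocorrelated while the series itself is not, $Q_{ML}$ tends to flag it even when $Q_{LB}$ does not.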

Methods of testing for white noise can be roughly divided into two categories: time domain tests and frequency domain tests. Other time domain tests include the turning point test, the difference-sign test and the rank test. Another time domain approach is to fit autoregressive models to the data and choose the order that minimizes the AICC statistic; a selected order of zero suggests that the data are white noise.

Let

$f_x\left(\omega\right)=\frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\rho_x\left(k\right)\mathrm{e}^{ik\omega},\quad\omega\in\left[-\pi,\pi\right]$ (1.15)

be the normalized spectral density of $X_t, t\in Z$. Since $\rho_x(0)=1$ and $\rho_x(k)=0$ for $k\ne 0$, the normalized spectral density function of the linear Gaussian white noise process is constant:

${f}_{x}\left(\omega \right)=\frac{1}{\text{2π}},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\omega \in \left[-\text{π},\text{π}\right]$ (1.16)

The equivalent frequency domain forms of $H_0$ and $H_1$ are

H0: $f_x\left(\omega\right)=\frac{1}{2\pi}$ for all $\omega\in\left[-\pi,\pi\right]$ and H1: $f_x\left(\omega\right)\ne\frac{1}{2\pi}$ for some $\omega\in\left[-\pi,\pi\right]$ (1.17)
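As a plain-Python sketch, the normalized spectral density can be evaluated from a finite list of autocorrelations using $\rho_x(-k)=\rho_x(k)$, which turns (1.15) into $f_x(\omega)=\frac{1}{2\pi}\left[1+2\sum_{k\ge 1}\rho_x(k)\cos(k\omega)\right]$. Truncating the sum at the end of the list is an assumption, and the second sequence, with $\rho_x(1)=0.5$ and zero beyond, is a hypothetical non-white example.

```python
import math


def normalized_spectral_density(rho, omega):
    """f_x(omega) of Equation (1.15) from rho = [rho(0), rho(1), ..., rho(K)].

    Uses rho(-k) = rho(k), so the two-sided sum becomes 1 + 2 * sum_k rho(k) cos(k omega);
    autocorrelations beyond lag K are assumed to be zero."""
    total = rho[0] + 2.0 * sum(rho[k] * math.cos(k * omega) for k in range(1, len(rho)))
    return total / (2.0 * math.pi)


white = [1.0] + [0.0] * 10  # rho(k) of white noise, Equation (1.6)
non_white = [1.0, 0.5]      # hypothetical sequence with rho(1) = 0.5
grid = [-math.pi + j * math.pi / 4 for j in range(9)]
print([round(normalized_spectral_density(white, w), 4) for w in grid])  # every entry is 0.1592
print([round(normalized_spectral_density(non_white, w), 4) for w in grid])
```

The white noise sequence gives the flat value $1/(2\pi)$ of (1.16) at every frequency, whereas the non-white sequence does not, which is exactly the contrast between H0 and H1 in (1.17).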

In the frequency domain, test statistics have been proposed based on the well-known $U_p$ and $T_p$ processes, rigorous theoretical treatments of their limiting distributions are available, and there are many other contributions to frequency domain testing. This study concentrates on the time domain approach only.

A stochastic process $X_t, t\in Z$ may have the covariance structure (1.6) even when it is not the linear Gaussian white noise process; examples are found in the study of bilinear time series processes. Researchers often rely on the linear Gaussian white noise process when constructing time series models or when generating other stationary processes in simulation experiments. The question, then, is: “How do we distinguish the linear Gaussian white noise process from other processes with a similar covariance structure?” Additional properties of the linear Gaussian white noise process are needed to identify and characterize it among processes with a similar covariance structure. The ultimate aim of this study is therefore to use higher moments to assess whether a process is acceptable as a linear Gaussian white noise process. The first moment (mean) and the second and higher moments (variance, covariances, skewness and kurtosis) of powers of the linear Gaussian white noise process are established in Section 2. The methodology is discussed in Section 3, the results are presented in Section 4, and Section 5 concludes.

2. Mean, Variance and Covariances of Powers of the Linear Gaussian White Noise Process

2.1. Mean of Powers of the Linear Gaussian White Noise Process

Let $Y_t=X_t^d, d=1,2,3,\cdots$, where $X_t, t\in Z$ is the linear Gaussian white noise process. The expected values $E(Y_t)=E(X_t^d)$ are needed to determine the variance and covariance structure of $Y_t$. Lemma 2.1 gives the required result.

Lemma 2.1: Let $X_t, t\in Z$ be a linear Gaussian white noise process with mean zero and variance $\sigma^2>0$ (that is, the $X_t$ are iid $N\left(0,\sigma^2\right)$). Then

$E\left({X}_{t}^{d}\right)=\left\{\begin{array}{l}{\sigma }^{2m}\left(2m-1\right)!!,\text{\hspace{0.17em}}d=2m,\text{\hspace{0.17em}}m=1,2,\cdots \\ 0,\text{\hspace{0.17em}}\text{\hspace{0.17em}}d=2m+1,\text{\hspace{0.17em}}m=0,1,2,\cdots \end{array}$ (2.1)

where 

$\left(2m-1\right)!!=1×3×5×7×\cdots ×\left(2m-1\right)=\underset{k=1}{\overset{m}{\prod }}\left(2k-1\right)$ (2.2)

Proof:

Let $X_t=Z\sim N\left(0,\sigma^2\right)$; then

$f\left(z\right)=\frac{1}{\sigma\sqrt{2\pi}}\mathrm{e}^{-\frac{z^2}{2\sigma^2}},\quad -\infty<z<\infty,\ \sigma>0$ (2.3)

Note that

$E\left({Z}^{d}\right)={\int }_{-\infty }^{\infty }{z}^{d}f\left(z\right)\text{d}z$ (2.4)

$={\int }_{-\infty }^{\infty }{z}^{d}\frac{1}{\sigma \sqrt{2\text{π}}}{\text{e}}^{\frac{-{z}^{2}}{2{\sigma }^{2}}}\text{d}z$ (2.5)

1) Case I: $d=2m$ (even)

Since $z^{2m}f(z)$ is an even function of $z$, Equation (2.5) reduces to

$E\left({Z}^{d}\right)=2{\int }_{0}^{\infty }{z}^{d}\frac{1}{\sigma \sqrt{2\text{π}}}{\text{e}}^{\frac{-{z}^{2}}{2{\sigma }^{2}}}\text{d}z$ (2.6)

Let $y=\frac{{z}^{2}}{2{\sigma }^{2}}⇒{z}^{\text{​}2}=2{\sigma }^{2}y⇒z=\left(\sigma \sqrt{2}\right){y}^{\frac{1}{2}}$

$\frac{\text{d}z}{\text{d}y}=\left(\sigma \sqrt{2}\right)\cdot \frac{1}{2}\cdot {y}^{-\frac{1}{2}}=\left(\frac{\sqrt{2}}{2}\right)\sigma {y}^{-\frac{1}{2}}=\left(\frac{1}{\sqrt{2}}\right)\sigma {y}^{-\frac{1}{2}}=\left(\frac{\sigma }{\sqrt{2}}\right){y}^{-\frac{1}{2}}$

$\text{d}z=\left(\frac{\sigma {y}^{-\frac{1}{2}}}{\sqrt{2}}\right)\text{d}y$ (2.7)

$\begin{array}{c}E\left(Z^d\right)=\frac{2}{\sigma\sqrt{2\pi}}\int_0^{\infty}\left[\sigma\sqrt{2}\,y^{\frac{1}{2}}\right]^{2m}\mathrm{e}^{-y}\left(\frac{\sigma y^{-\frac{1}{2}}}{\sqrt{2}}\right)\text{d}y\\ =\frac{2^m\sigma^{2m}}{\sqrt{\pi}}\int_0^{\infty}y^{m-\frac{1}{2}}\mathrm{e}^{-y}\text{d}y\end{array}$ (2.8)

The integral in Equation (2.8) is the gamma function evaluated at $m+\frac{1}{2}$, since $\int_0^{\infty}w^{t-1}\mathrm{e}^{-w}\text{d}w=\Gamma(t)$; hence

$E\left(Z^d\right)=\frac{2^m\sigma^{2m}}{\sqrt{\pi}}\Gamma\left(m+\frac{1}{2}\right)$ (2.9)

Applying the recursion $\Gamma(t+1)=t\,\Gamma(t)$ $m$ times and using $\Gamma\left(\frac{1}{2}\right)=\sqrt{\pi}$ gives

$\begin{array}{c}\Gamma \left(m+\frac{1}{2}\right)=\frac{\left[1×3×5×7×\cdots ×\left(2m-1\right)\right]\Gamma \left(\frac{1}{2}\right)}{{2}^{m}}\\ =\frac{\left[1×3×5×7×\cdots ×\left(2m-1\right)\right]\sqrt{\text{π}}}{{2}^{m}}\\ =\frac{\sqrt{\text{π}}×\left(2m-1\right)!!}{{2}^{m}}\end{array}$ (2.10)

Thus

$E\left({Z}^{d}\right)=\frac{{2}^{m}{\sigma }^{2m}}{\sqrt{\text{π}}}\cdot \frac{\sqrt{\text{π}}\left(2m-1\right)!!}{{2}^{m}}={\sigma }^{2m}\left(2m-1\right)!!$ (2.11)

2) Case II: $d=2m+1$ (odd)

Here $z^d f(z)$ is an odd function of $z$, so the substitution $z\to -z$ turns the integral over $(-\infty,0]$ into the negative of the integral over $[0,\infty)$:

$\begin{array}{c}E\left(Z^d\right)=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}z^d\mathrm{e}^{-\frac{z^2}{2\sigma^2}}\text{d}z\\ =\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{0}z^d\mathrm{e}^{-\frac{z^2}{2\sigma^2}}\text{d}z+\frac{1}{\sigma\sqrt{2\pi}}\int_{0}^{\infty}z^d\mathrm{e}^{-\frac{z^2}{2\sigma^2}}\text{d}z\\ =-\frac{1}{\sigma\sqrt{2\pi}}\int_{0}^{\infty}z^d\mathrm{e}^{-\frac{z^2}{2\sigma^2}}\text{d}z+\frac{1}{\sigma\sqrt{2\pi}}\int_{0}^{\infty}z^d\mathrm{e}^{-\frac{z^2}{2\sigma^2}}\text{d}z=0\end{array}$ (2.12)

Thus

$E\left({Z}^{d}\right)=E\left({X}_{t}^{d}\right)=\left\{\begin{array}{l}{\sigma }^{2m}\left(2m-1\right)!!,\text{\hspace{0.17em}}d=2m,\text{\hspace{0.17em}}m=1,2,\cdots \\ 0,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}d=2m+1\end{array}$
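The two routes to the even moments, the double factorial (2.11) and the gamma-function form (2.9), can be cross-checked numerically. This plain-Python sketch uses `math.gamma` and `math.prod` (Python 3.8 or later); the function names and $\sigma=1.5$ are illustrative.

```python
import math


def odd_double_factorial(m):
    """(2m - 1)!! = 1 * 3 * ... * (2m - 1), Equation (2.2); equals 1 when m = 0."""
    return math.prod(range(1, 2 * m, 2))


def normal_moment(d, sigma):
    """E(X^d) for X ~ N(0, sigma^2), Equation (2.1)."""
    if d % 2 == 1:
        return 0.0
    m = d // 2
    return sigma ** (2 * m) * odd_double_factorial(m)


def normal_moment_via_gamma(d, sigma):
    """Even case through Equation (2.9): 2^m sigma^(2m) Gamma(m + 1/2) / sqrt(pi)."""
    m = d // 2
    return 2 ** m * sigma ** (2 * m) * math.gamma(m + 0.5) / math.sqrt(math.pi)


sigma = 1.5
for d in range(1, 11):
    gamma_form = normal_moment_via_gamma(d, sigma) if d % 2 == 0 else 0.0
    print(d, normal_moment(d, sigma), gamma_form)
```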

2.2. Variances of Powers of the Linear Gaussian White Noise Process

Theorem 2.2: Let $X_t, t\in Z$ be a linear Gaussian white noise process with mean zero and variance $\sigma^2>0$ (that is, the $X_t$ are iid $N\left(0,\sigma^2\right)$). Then

$\text{Var}\left({Y}_{t}\right)=\text{Var}\left({X}_{t}^{d}\right)=\left\{\begin{array}{l}{\sigma }^{4m}\left[\underset{k=1}{\overset{2m}{\prod }}\left(2k-1\right)-{\left(\underset{k=1}{\overset{m}{\prod }}\left(2k-1\right)\right)}^{2}\right],\text{\hspace{0.17em}}d=2m\\ {\sigma }^{2\left(2m+1\right)}\underset{k=1}{\overset{2m+1}{\prod }}\left(2k-1\right),\text{\hspace{0.17em}}d=2m+1\end{array}$ (2.13)

Proof:

Let $X_t\sim\text{iid}\ N\left(0,\sigma^2\right)$; then the expected value of $Y_t=X_t^d, d=1,2,3,\cdots$ is given by Equation (2.1).

Case I: $d=2m,m=1,2,3,\cdots$ (d even)

Now

${Y}_{t}={X}_{t}^{d}={X}_{t}^{2m}⇒{Y}_{t}^{2}={X}_{t}^{2d}={X}_{t}^{2\left(2m\right)}={X}_{t}^{4m}$

From Equation (2.1)

$E\left({Y}_{t}\right)={\sigma }^{2m}\underset{k=1}{\overset{m}{\prod }}\left(2k-1\right)$ (2.14)

and

$E\left({Y}_{t}^{2}\right)={\sigma }^{4m}\underset{k=1}{\overset{2m}{\prod }}\left(2k-1\right)$ (2.15)

$\begin{array}{c}\text{Var}\left({Y}_{t}\right)=E\left({Y}_{t}^{2}\right)-{E}^{2}\left({Y}_{t}\right)\\ ={\sigma }^{4m}\underset{k=1}{\overset{2m}{\prod }}\left(2k-1\right)-{\left[{\sigma }^{2m}\underset{k=1}{\overset{m}{\prod }}\left(2k-1\right)\right]}^{2}\\ ={\sigma }^{4m}\left[\underset{k=1}{\overset{2m}{\prod }}\left(2k-1\right)-{\left(\underset{k=1}{\overset{m}{\prod }}\left(2k-1\right)\right)}^{2}\right]\end{array}$ (2.16)

Case II: $d=2m+1,\ m=0,1,2,\cdots$ (d odd)

${Y}_{t}={X}_{t}^{d}={X}_{t}^{2m+1}⇒{Y}_{t}^{2}={X}_{t}^{2d}={X}_{t}^{2\left(2m+1\right)}$

From Equation (2.1)

$E\left({Y}_{t}\right)=0$

$E\left({Y}_{t}^{2}\right)={\sigma }^{2\left(2m+1\right)}\underset{k=1}{\overset{2m+1}{\prod }}\left(2k-1\right)$ (2.17)

and

$\begin{array}{c}\text{Var}\left({Y}_{t}\right)=E\left({Y}_{t}^{2}\right)-{E}^{2}\left({Y}_{t}\right)=E\left({Y}_{t}^{2}\right)\\ ={\sigma }^{2\left(2m+1\right)}\underset{k=1}{\overset{2m+1}{\prod }}\left(2k-1\right)\end{array}$ (2.18)

Generally

$\text{Var}\left({Y}_{t}\right)=\text{Var}\left({X}_{t}^{d}\right)=\left\{\begin{array}{l}{\sigma }^{4m}\left[\underset{k=1}{\overset{2m}{\prod }}\left(2k-1\right)-{\left(\underset{k=1}{\overset{m}{\prod }}\left(2k-1\right)\right)}^{2}\right],\text{\hspace{0.17em}}d=2m\\ {\sigma }^{2\left(2m+1\right)}\underset{k=1}{\overset{2m+1}{\prod }}\left(2k-1\right),\text{\hspace{0.17em}}d=2m+1\end{array}$ (2.19)

Table 1 summarizes the means and variances of $Y_t=X_t^d, d=1,2,3,\cdots,10$; the standard deviations for $\sigma=1.0$ are also included. A plot of $\sigma_{Y_t}=\sqrt{\mathrm{var}\left(Y_t\right)}$ against $d$ for fixed $\sigma=1$ is given in Figure 1. From Figure 1, we note that for fixed $\sigma$, the standard deviation increases rapidly (roughly exponentially) as $d$ increases.
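The entries of Table 1 follow directly from Equations (2.1) and (2.19). The plain-Python sketch below recomputes the means, variances and standard deviations for $\sigma=1$; the helper names are hypothetical.

```python
import math


def odd_prod(m):
    """prod_{k=1}^{m} (2k - 1); the empty product (m = 0) equals 1."""
    return math.prod(2 * k - 1 for k in range(1, m + 1))


def mean_power(d, sigma=1.0):
    """E(X_t^d) from Equation (2.1)."""
    return sigma ** d * odd_prod(d // 2) if d % 2 == 0 else 0.0


def var_power(d, sigma=1.0):
    """Var(X_t^d) from Equation (2.19)."""
    if d % 2 == 0:
        m = d // 2
        return sigma ** (4 * m) * (odd_prod(2 * m) - odd_prod(m) ** 2)
    m = (d - 1) // 2
    return sigma ** (2 * (2 * m + 1)) * odd_prod(2 * m + 1)


print(f"{'d':>2} {'mean':>6} {'variance':>12} {'std. dev.':>12}")
for d in range(1, 11):
    v = var_power(d)
    print(f"{d:>2} {mean_power(d):>6.0f} {v:>12.0f} {math.sqrt(v):>12.4f}")
```

A useful consistency check is that $\mathrm{var}(X_t^d)=E(X_t^{2d})-E^2(X_t^d)$, where $E(X_t^{2d})$ is the odd product up to $2d-1$; both routes give identical values.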

The specific objective of this paper is to investigate if powers of ${X}_{t},t\in Z$ are also iid and to determine the distribution of ${Y}_{t}={X}_{t}^{d},d=1,2,3,\cdots$ , especially for $d=2$ . The analytical proofs are provided in Section 2.3.

2.3. Covariances of Powers of the Linear Gaussian White Noise Process

Theorem 2.3: If $X_t, t\in Z$ is a linear Gaussian white noise process, then its powers $Y_t=X_t^d, d=1,2,3,\cdots$ are also white noise processes (iid), but for $d\ge 2$ they are not normally distributed.

Figure 1. Plot of standard deviation of $Y_t=X_t^d\ \left(\sigma_{Y_t}\right)$ against power (d) for fixed σ = 1.

Table 1. Mean, variance and standard deviation of $Y_t=X_t^d, d=1,2,3,\cdots,10$.

Proof:

Since the $X_t, t\in Z$ are iid and $Y_t=X_t^d, d=1,2,3,\cdots$, consider $k\ne 0$. Because $X_t$ and $X_{t-k}$ are independent, $E\left(X_t^dX_{t-k}^d\right)=E\left(X_t^d\right)E\left(X_{t-k}^d\right)$, so

$\begin{array}{c}R_y\left(k\right)=\mathrm{cov}\left(Y_t,Y_{t-k}\right)=\mathrm{cov}\left(X_t^d,X_{t-k}^d\right)\\ =E\left(X_t^dX_{t-k}^d\right)-E\left(X_t^d\right)E\left(X_{t-k}^d\right)\\ =E\left(X_t^d\right)E\left(X_{t-k}^d\right)-E\left(X_t^d\right)E\left(X_{t-k}^d\right)=0,\quad k\ne 0\end{array}$

However, for $k=0$ , ${R}_{y}\left(0\right)=\mathrm{var}\left({Y}_{t}\right)=\mathrm{var}\left({X}_{t}^{d}\right)$ . Hence

$R_y\left(\ell\right)=\begin{cases}\sigma^{4m}\left[\prod_{k=1}^{2m}\left(2k-1\right)-\left(\prod_{k=1}^{m}\left(2k-1\right)\right)^2\right], & d=2m,\ \ell=0\\ \sigma^{2\left(2m+1\right)}\prod_{k=1}^{2m+1}\left(2k-1\right), & d=2m+1,\ \ell=0\\ 0, & \ell\ne 0\end{cases}$ (2.20)

Equation (2.20) shows that the powers $Y_t=X_t^d, d=1,2,3,\cdots$ of $X_t, t\in Z$ are uncorrelated at all nonzero lags. That is,

$R_y\left(\ell\right)=\begin{cases}\mathrm{var}\left(Y_t\right), & \ell=0\\ 0, & \ell\ne 0\end{cases}$ (2.21)

In fact, the $Y_t$ are iid: each $Y_t$ is the same fixed function of $X_t$, and applying a fixed function to each term of an iid sequence yields an iid sequence. Zero autocorrelation alone would not establish independence.
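A quick simulation, assuming NumPy, illustrates (2.21): the sample autocorrelations of $X_t^d$ at nonzero lags should stay mostly within the $\pm 1.96/\sqrt{n}$ bounds of (1.9). The seed, sample size and number of lags are arbitrary choices, and for larger $d$ the heavy tails of $X_t^d$ can make the estimates noisier.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations of Equation (1.8) for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[: n - k] * dev[k:]) / denom for k in range(1, max_lag + 1)])


rng = np.random.default_rng(seed=4)  # arbitrary seed
n = 2000
x = rng.normal(loc=0.0, scale=1.0, size=n)
bound = 1.96 / np.sqrt(n)  # bounds (1.9) with alpha = 0.05
for d in (1, 2, 3):
    rho = sample_acf(x ** d, max_lag=10)
    outside = int(np.sum(np.abs(rho) > bound))
    print(f"d={d}: max |rho_hat| = {np.max(np.abs(rho)):.3f}, lags outside +/-{bound:.3f}: {outside} of 10")
```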

The probability density function (p.d.f.) of $Y_t=X_t^d, d=1,2,3,\cdots$ can be obtained to enable a detailed study of the series. Theorem 2.4 gives the p.d.f. of $Y_t=X_t^2$.

Theorem 2.4: If ${X}_{t},t\in Z$ is a linear Gaussian white noise process, then ${Y}_{t}={X}_{t}^{2}$ has the p.d.f

$g\left(y\right)=\begin{cases}\frac{1}{\sigma\sqrt{2\pi}}y^{-\frac{1}{2}}\mathrm{e}^{-\frac{y}{2\sigma^2}}, & 0<y<\infty\\ 0, & \text{elsewhere}\end{cases}$ (2.22)

Proof:

If $X_t=X\sim N\left(0,\sigma^2\right)$ and $Y=X_t^2=X^2$, the distribution function of $Y$ is, for $y\ge 0$,

$\begin{array}{c}G\left(y\right)=P\left({X}^{2}\le y\right)=P\left(-\sqrt{y}\le X\le \sqrt{y}\right)\\ ={\int }_{-\sqrt{y}}^{\sqrt{y}}\frac{1}{\sigma \sqrt{\text{2π}}}{\text{e}}^{-\frac{{x}^{2}}{2{\sigma }^{2}}}\text{d}x=2{\int }_{0}^{\sqrt{y}}\frac{1}{\sigma \sqrt{\text{2π}}}{\text{e}}^{-\frac{{x}^{2}}{2{\sigma }^{2}}}\text{d}x\end{array}$

Let $x=\sqrt{v}$ , then since $\text{d}x=\left(\frac{1}{2\sqrt{v}}\right)\text{d}v$ , we have

$G\left(y\right)=2{\int }_{0}^{y}\frac{1}{\sigma \sqrt{\text{2π}}}{\text{e}}^{-\frac{v}{2{\sigma }^{2}}}\cdot \left(\frac{1}{2\sqrt{v}}\right)\text{d}v={\int }_{0}^{y}\frac{1}{\sigma \sqrt{\text{2π}}}{v}^{-\frac{1}{2}}{\text{e}}^{-\frac{v}{2{\sigma }^{2}}}\text{d}v$

Of course, $G(y)=0$ for $y<0$. The p.d.f. of $Y$ is $g(y)=G'(y)$, and by one form of the fundamental theorem of calculus

$g\left(y\right)=\begin{cases}\frac{1}{\sigma\sqrt{2\pi}}y^{-\frac{1}{2}}\mathrm{e}^{-\frac{y}{2\sigma^2}}, & 0<y<\infty\\ 0, & \text{elsewhere}\end{cases}$

Note that the p.d.f. of $Y_t=X_t^2$ is the p.d.f. of a gamma distribution with shape parameter $\alpha=\frac{1}{2}$ and scale parameter $\beta=2\sigma^2$; that is, $Y_t=X_t^2\sim G\left(\alpha,\beta\right)$ with $\alpha=\frac{1}{2}$, $\beta=2\sigma^2$. Equivalently, $X_t^2/\sigma^2$ has a chi-square distribution with one degree of freedom.
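Numerically, (2.22) coincides with the gamma density with shape $\alpha=\frac{1}{2}$ and scale $\beta=2\sigma^2$. The plain-Python sketch below compares the two formulas; the names and $\sigma=1.3$ are illustrative. As a further check, the gamma mean $\alpha\beta=\sigma^2$ and variance $\alpha\beta^2=2\sigma^4$ agree with Equations (2.1) and (2.19).

```python
import math


def g_squared_normal(y, sigma):
    """Density of Y = X^2, X ~ N(0, sigma^2), Equation (2.22)."""
    if y <= 0:
        return 0.0
    return y ** -0.5 * math.exp(-y / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def gamma_pdf(y, shape, scale):
    """Gamma density y^(shape-1) exp(-y/scale) / (Gamma(shape) * scale^shape)."""
    if y <= 0:
        return 0.0
    return y ** (shape - 1) * math.exp(-y / scale) / (math.gamma(shape) * scale ** shape)


sigma = 1.3
for y in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(y, g_squared_normal(y, sigma), gamma_pdf(y, shape=0.5, scale=2 * sigma ** 2))
```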

For a more detailed study of the behavior of the linear Gaussian white noise process, the coefficients of symmetry (skewness) and kurtosis of powers of the process are provided in Section 2.4.

2.4. Coefficient of Symmetry and Kurtosis for Powers of the Linear Gaussian White Noise Process

Non-normality of the higher powers of $X_t, t\in Z$ ($d=2,3,\cdots$) can also be confirmed by the coefficients of symmetry and kurtosis, defined by

${\beta }_{1}=\frac{{\mu }_{3}\left(d\right)}{{\left({\mu }_{2}\left(d\right)\right)}^{3/2}}$ (2.23)

${\beta }_{2}=\frac{{\mu }_{4}\left(d\right)}{{\left({\mu }_{2}\left(d\right)\right)}^{2}}$ (2.24)

where

${\mu }_{2}\left(d\right)=E\left[{\left({X}_{t}^{d}-E\left({X}_{t}^{d}\right)\right)}^{2}\right]=\mathrm{var}\left({X}_{t}^{d}\right)$ (2.25)

${\mu }_{3}\left(d\right)=E\left[{\left({X}_{t}^{d}-E\left({X}_{t}^{d}\right)\right)}^{3}\right]$ (2.26)

and

${\mu }_{4}\left(d\right)=E\left[{\left({X}_{t}^{d}-E\left({X}_{t}^{d}\right)\right)}^{4}\right]$ (2.27)

Note that

${\mu }_{3}\left(d\right)=E\left({X}_{t}^{3d}\right)-3E\left({X}_{t}^{2d}\right)E\left({X}_{t}^{d}\right)+2{E}^{3}\left({X}_{t}^{d}\right)$ (2.28)

${\mu }_{4}\left(d\right)=E\left({X}_{t}^{4d}\right)-4E\left({X}_{t}^{3d}\right)E\left({X}_{t}^{d}\right)+6E\left({X}_{t}^{2d}\right){E}^{2}\left({X}_{t}^{d}\right)-3{E}^{4}\left({X}_{t}^{d}\right)$ (2.29)

The skewness and kurtosis coefficients for $d=1,2,3,4,5$ and 6 are given in Table 2. A plot of $\beta_2=\frac{\mu_4(d)}{\left(\mu_2(d)\right)^2}$ against $d=1,2,3,4,5$ is given in Figure 2. From Figure 2, we note that the kurtosis increases rapidly (roughly exponentially) as $d$ increases.
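The coefficients in Table 2 can be recomputed exactly from Equations (2.1), (2.28) and (2.29). Both coefficients are scale free, so $\sigma=1$ is used without loss of generality. A plain-Python sketch with hypothetical names:

```python
import math


def raw_moment(p):
    """E(X^p) for X ~ N(0, 1): (p - 1)!! for even p, 0 for odd p (Equation (2.1))."""
    return 0 if p % 2 else math.prod(range(1, p, 2))


def skew_kurt(d):
    """Coefficients beta_1 (2.23) and beta_2 (2.24) of Y = X^d, using (2.28) and (2.29)."""
    e1, e2, e3, e4 = (raw_moment(j * d) for j in (1, 2, 3, 4))
    mu2 = e2 - e1 ** 2
    mu3 = e3 - 3 * e2 * e1 + 2 * e1 ** 3
    mu4 = e4 - 4 * e3 * e1 + 6 * e2 * e1 ** 2 - 3 * e1 ** 4
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2


for d in range(1, 7):
    b1, b2 = skew_kurt(d)
    print(f"d = {d}: beta_1 = {b1:.4f}, beta_2 = {b2:.4f}")
```

For $d=1$ this returns the normal values $\beta_1=0$, $\beta_2=3$; for $d=2$ it returns $\beta_1=\sqrt{8}$ and $\beta_2=15$, the chi-square (one degree of freedom) values, consistent with Theorem 2.4.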

Figure 2. Plot of kurtosis coefficient against power of the linear Gaussian white noise process.

Table 2. Coefficient of symmetry and kurtosis for ${Y}_{t}={X}_{t}^{d},d=1,2,3,\cdots ,6$ .

3. Methodology

3.1. Checking for Normality

If the noise process is Gaussian (that is, if all of its joint distributions are normal), then stronger conclusions can be drawn when a model is fitted to the data. We have shown that all powers $d\ge 2$ of the linear Gaussian white noise process are non-normal. A natural test is therefore one that checks whether the observations are from an iid normal sequence, and the Jarque-Bera (JB) test for normality can be used for this purpose. The JB test uses the fact that the normal distribution (with any mean or variance) has a skewness coefficient of zero and a kurtosis coefficient of three. We can test whether these two conditions hold against a suitable alternative using the JB test statistic

$JB=n\left(\frac{{\stackrel{^}{\beta }}_{1}^{2}}{6}+\frac{{\left({\stackrel{^}{\beta }}_{2}-3\right)}^{2}}{24}\right)$ (3.1)

where

${\stackrel{^}{\beta }}_{1}=\frac{\frac{1}{n}\underset{t=1}{\overset{n}{\sum }}{\left({X}_{t}-\stackrel{¯}{X}\right)}^{3}}{{\left(\frac{1}{n}\underset{t=1}{\overset{n}{\sum }}{\left({X}_{t}-\stackrel{¯}{X}\right)}^{2}\right)}^{3/2}}$ (3.2)

${\stackrel{^}{\beta }}_{2}=\frac{\frac{1}{n}\underset{t=1}{\overset{n}{\sum }}{\left({X}_{t}-\stackrel{¯}{X}\right)}^{4}}{{\left(\frac{1}{n}\underset{t=1}{\overset{n}{\sum }}{\left({X}_{t}-\stackrel{¯}{X}\right)}^{2}\right)}^{2}}$ (3.3)

where $n$ is the sample size and $\hat{\beta}_1$ and $\hat{\beta}_2$ are the sample skewness and kurtosis coefficients. Under the null hypothesis of normality, JB is asymptotically distributed as $\chi^2$ with 2 degrees of freedom.
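A minimal NumPy sketch of (3.1)-(3.3) follows. Because the chi-square distribution with 2 degrees of freedom has survival function $\mathrm{e}^{-x/2}$, the asymptotic p-value needs no statistics library. The simulated series, the seed and the powers $d=1,2,3$ are illustrative; the powers $d\ge 2$ should give very large JB values.

```python
import math

import numpy as np


def jarque_bera(x):
    """JB statistic of Equations (3.1)-(3.3) and its asymptotic chi-square(2) p-value."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()
    m2 = np.mean(dev ** 2)
    b1 = np.mean(dev ** 3) / m2 ** 1.5  # sample skewness, Equation (3.2)
    b2 = np.mean(dev ** 4) / m2 ** 2    # sample kurtosis, Equation (3.3)
    jb = float(n * (b1 ** 2 / 6 + (b2 - 3) ** 2 / 24))
    return jb, math.exp(-jb / 2)        # P(chi-square_2 > jb) = exp(-jb / 2)


rng = np.random.default_rng(seed=5)  # arbitrary seed
x = rng.normal(size=1000)
for d in (1, 2, 3):
    jb, p = jarque_bera(x ** d)
    print(f"d = {d}: JB = {jb:.2f}, p-value = {p:.3g}")
```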

3.2. White Noise Testing

We have shown that if $X_1,X_2,\cdots,X_n$ are iid, then $X_1^d,X_2^d,\cdots,X_n^d$, $d=1,2,3,\cdots$, are also iid, so their sample autocorrelations should behave like those of a white noise series. We therefore adopt the Ljung-Box test, replacing the sample autocorrelations of the data $X_1,X_2,\cdots,X_n$ with those of $X_1^d,X_2^d,\cdots,X_n^d$, and use the statistic

$Q^*\left(m\right)=n\left(n+2\right)\sum_{k=1}^{m}\frac{\left[\hat{\rho}_{X^d}\left(k\right)\right]^2}{n-k}$ (3.4)

The hypothesis of iid data is then rejected at level $\alpha$ if the observed $Q^*(m)$ is larger than the $1-\alpha$ quantile of the $\chi^2(m)$ distribution.
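The following sketch applies (3.4) to powers of a simulated series, assuming NumPy. To avoid depending on a statistics library, the chi-square tail probability is computed with the standard series for the regularized incomplete gamma function in a hypothetical helper `chi2_sf`; rejecting when the p-value is below $\alpha$ is equivalent to comparing $Q^*(m)$ with the $1-\alpha$ quantile.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """P(chi-square_df > x) via the series for the regularized lower incomplete gamma function.

    Adequate for a reject/accept decision; very small tail probabilities lose relative accuracy."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    if term == 0.0:  # underflow: x lies far in one tail
        return 0.0 if z > a else 1.0
    total, k = term, 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return max(0.0, 1.0 - total)


def q_star(x, d, m):
    """Q*(m) of Equation (3.4): the Ljung-Box statistic of X_t^d."""
    y = np.asarray(x, dtype=float) ** d
    n = y.size
    dev = y - y.mean()
    denom = np.sum(dev ** 2)
    rho = np.array([np.sum(dev[: n - k] * dev[k:]) / denom for k in range(1, m + 1)])
    k = np.arange(1, m + 1)
    return float(n * (n + 2) * np.sum(rho ** 2 / (n - k)))


rng = np.random.default_rng(seed=6)  # arbitrary seed
x = rng.normal(size=500)
m, alpha = 6, 0.05
for d in (1, 2, 3):
    q = q_star(x, d, m)
    p = chi2_sf(q, m)
    print(f"d = {d}: Q* = {q:.2f}, p-value = {p:.3f}, reject iid at {alpha}: {p < alpha}")
```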

3.3. Determining the Optimal Value of d

Figure 1 suggests two growth models: 1) the quadratic growth model and 2) the exponential growth model. We use the behavior of the variance and the kurtosis coefficient to determine the optimal value of $d$, defined as the value of $d$ that gives a perfect fit for either the quadratic or the exponential growth curve. Using the standard deviation for $5\le d\le 10$, the exponential growth curve performs better than the quadratic growth curve: the quadratic curve produced negative fitted values at some data points, although all observed values are positive, while the exponential curve fitted only positive values. However, the residuals of the resulting exponential curve are very large, as measured by the following accuracy measures.

Mean Absolute Error (MAE)

$\text{MAE}=\frac{1}{m}\underset{i=1}{\overset{m}{\sum }}|{\stackrel{^}{e}}_{i}|$ (3.5)

Mean Absolute Percentage Error (MAPE)

$\text{MAPE}=\left[\frac{1}{m}\underset{i=1}{\overset{m}{\sum }}|\frac{{\stackrel{^}{e}}_{i}}{{Z}_{i}}|\right]×100$ (3.6)

Mean Squared Error (MSE)

$\text{MSE}=\frac{1}{m}\sum_{i=1}^{m}\hat{e}_i^2$ (3.7)

where $m$ is the value of $d$ used in the trend analysis, $Z_i$ is the actual value (standard deviation or kurtosis coefficient) at the $i$th data point, and

${\stackrel{^}{e}}_{i}=\left\{\begin{array}{l}{\stackrel{^}{\sigma }}_{{y}_{t}}-{\sigma }_{{y}_{t}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{for}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{the}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{standard}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{deviation}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{of}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{Y}_{t}={X}_{t}^{d}\\ {\stackrel{^}{\beta }}_{2}-{\beta }_{2}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{for}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{the}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{Kurtosis}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{coefficient}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{of}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{Y}_{t}={X}_{t}^{d}\end{array}$ (3.8)
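The three accuracy measures translate directly into code. In this NumPy sketch the residual follows (3.8) as fitted minus actual, and $Z_i$ in (3.6) is taken to be the actual value; the numbers in the example are made up purely to exercise the function.

```python
import numpy as np


def accuracy_measures(actual, fitted):
    """MAE (3.5), MAPE in percent (3.6) and MSE (3.7), with e_i = fitted_i - actual_i as in (3.8)."""
    actual = np.asarray(actual, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    e = fitted - actual
    mae = float(np.mean(np.abs(e)))
    mape = float(np.mean(np.abs(e / actual)) * 100.0)  # actual values must be nonzero
    mse = float(np.mean(e ** 2))
    return mae, mape, mse


# Made-up numbers, only to exercise the function.
print(accuracy_measures(actual=[1.0, 2.0, 4.0], fitted=[1.5, 2.0, 3.0]))
```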

Table 3 gives the accuracy measures for the trend analysis of the standard deviation of ${Y}_{t}={X}_{t}^{d}$ when $\sigma =1$ while Table 4 gives detailed results for optimality.

When $d=4$, the quadratic growth curve performs better than the exponential curve, with minimal residual, and both curves fitted positive values at all data points. We also observe from Table 3 that with $d=3$ the quadratic growth curve performs better than the exponential growth curve, and the resulting quadratic curve yields zero residual. The implication is that we obtain a perfect fit to the data points when $d=3$, for the quadratic curve only. Hence, the optimal value of $d$ is 3 when we use the standard deviation curve.

Table 3. Summary of accuracy measures for the exponential and quadratic curves using the standard deviation of $Y_t=X_t^d$ for $d=3,4,\cdots,10$.

Table 4. Fitting exponential and quadratic curves to the standard deviation of powers of linear Gaussian white noise process when $\sigma=1$ and $d=3,4$.

*Exponential and quadratic trend analyses are not possible for $d=2$ or $d=1$.
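The trend comparison can be sketched as follows, under stated assumptions: the data points are taken to be the standard deviations at $d=1,\cdots,m$ with $d$ as the time index, the quadratic curve is fitted by least squares with `numpy.polyfit`, and the exponential curve $a\,b^{d}$ is fitted by least squares on the log scale, which may differ from Minitab's fitting procedure. With $m=3$ data points the quadratic has as many coefficients as observations and therefore interpolates them exactly, which is consistent with the zero residual reported for $d=3$.

```python
import math

import numpy as np


def sd_power(d):
    """Standard deviation of X_t^d for sigma = 1, from Equations (2.1) and (2.19)."""
    second = math.prod(range(1, 2 * d, 2))            # E(X^{2d}) = (2d - 1)!!
    mean = 0 if d % 2 else math.prod(range(1, d, 2))  # E(X^d) = (d - 1)!! for even d
    return math.sqrt(second - mean ** 2)


def mse(fitted, actual):
    return float(np.mean((fitted - actual) ** 2))


def fit_trends(m):
    """MSE of quadratic and exponential trend curves fitted to sd(X^d), d = 1..m (assumed setup)."""
    t = np.arange(1, m + 1, dtype=float)
    y = np.array([sd_power(d) for d in range(1, m + 1)])
    quad = np.polyval(np.polyfit(t, y, 2), t)       # b0 + b1 t + b2 t^2
    slope, intercept = np.polyfit(t, np.log(y), 1)  # log y = log a + t log b
    expo = np.exp(intercept + slope * t)
    return mse(quad, y), mse(expo, y)


for m in range(3, 11):
    q, e = fit_trends(m)
    print(f"m = {m}: MSE quadratic = {q:.4g}, MSE exponential = {e:.4g}")
```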

Figure 2 also suggests two growth models: 1) the quadratic growth model and 2) the exponential growth model. Using the kurtosis coefficient for $4\le d\le 6$, the exponential growth curve performs better than the quadratic growth curve: the quadratic curve produced negative fitted values at some data points, while the exponential curve fitted only positive values.

When $d=3$, the quadratic growth curve performs better than the exponential growth curve, and, as for the standard deviation curve, the resulting quadratic curve yields zero residual. The implication is that we obtain a perfect fit to the data points when $d=3$, for the quadratic curve only. Hence, the optimal value of $d$ is 3. Therefore, to stop the variance from exploding, we recommend that the data should not be raised to a power greater than three.

3.4. On the Use of Higher Moment for the Acceptability of the Linear Gaussian White Noise Process

We have shown that if $X_t, t\in Z$ is a linear Gaussian white noise process, then $Y_t=X_t^d, d=1,2,\cdots$ is also iid, but not normally distributed for $d\ge 2$. Using the variances and kurtosis of $Y_t=X_t^d$, we established that the optimal value of $d$ is three. The trend-analysis results for the kurtosis of $Y_t=X_t^d$ are given in Table 5 and Table 6. It is also clear from Equation (2.24) that the kurtosis itself depends on the variance $\mu_2(d)$. We therefore insist that, for a stochastic process to be accepted as a linear Gaussian white noise process, the following variance identities must hold:

$\mathrm{var}\left({X}_{t}\right)={\sigma }^{2}$ (3.9)

$\mathrm{var}\left({X}_{t}^{2}\right)=2{\sigma }^{4}$ (3.10)

and

$\mathrm{var}\left({X}_{t}^{3}\right)=15{\sigma }^{6}$ (3.11)
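Equations (3.9) to (3.11) follow from $\mathrm{var}\left({X}_{t}^{d}\right)=E\left({X}_{t}^{2d}\right)-{\left[E\left({X}_{t}^{d}\right)\right]}^{2}$ together with the normal moments $E\left({X}_{t}^{2k}\right)={\sigma }^{2k}\left(2k-1\right)!!$ (odd moments are zero): for example, $\mathrm{var}\left({X}_{t}^{2}\right)=3{\sigma }^{4}-{\sigma }^{4}=2{\sigma }^{4}$ and $\mathrm{var}\left({X}_{t}^{3}\right)=15{\sigma }^{6}-0$ . The sketch below computes these values and compares them with sample variances from one simulated series; the seed, $\sigma =1.5$ and the sample size are arbitrary assumptions.

```python
import numpy as np


def double_factorial(k: int) -> int:
    """Return k!! (product k(k-2)(k-4)...), with (-1)!! = 0!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def var_of_power(d: int, sigma: float) -> float:
    """Theoretical var(X^d) = E[X^(2d)] - E[X^d]^2 for X ~ N(0, sigma^2)."""
    mean_d = 0 if d % 2 else double_factorial(d - 1)  # E[X^d] / sigma^d
    mean_2d = double_factorial(2 * d - 1)              # E[X^(2d)] / sigma^(2d)
    return sigma ** (2 * d) * (mean_2d - mean_d**2)


def monte_carlo_check(sigma: float = 1.5, n: int = 1_000_000, seed: int = 2017):
    """Return (d, theoretical var, sample var) of X^d for d = 1, 2, 3."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma, size=n)
    return [(d, var_of_power(d, sigma), float(np.var(x**d, ddof=1))) for d in (1, 2, 3)]


for d, theory, sample in monte_carlo_check():
    print(f"d={d}: theory={theory:.3f}, sample={sample:.3f}")
```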

Table 5. Summary of accuracy measures for the exponential and quadratic curves using the Kurtosis Coefficient of ${Y}_{t}={X}_{t}^{d}$ for $d=3,4,5,6$ .

*Exponential and quadratic trend analyses are not possible for $d=1$ or $d=2$ .

Table 6. Fitting exponential and quadratic curves to the kurtosis coefficient of powers of the linear Gaussian white noise process when $\sigma =1$ and $d=3,4$ .

In view of these results, we suggest that the following two null hypotheses be tested before a stochastic process is accepted as a linear Gaussian white noise process:

${H}_{01}:\mathrm{var}\left({X}_{t}^{2}\right)=2{\sigma }_{0}^{4}$ (3.12)

and

${H}_{02}:\mathrm{var}\left({X}_{t}^{3}\right)=15{\sigma }_{0}^{6}$ (3.13)

Then, the chi-square test statistic for testing (3.12) is

${\chi }_{cal}^{2}=\frac{\left(n-1\right){S}_{{X}_{t}^{2}}^{2}}{2{\sigma }_{0}^{4}}$ (3.14)

while that for (3.13) is

${\chi }_{cal}^{2}=\frac{\left(n-1\right){S}_{{X}_{t}^{3}}^{2}}{15{\sigma }_{0}^{6}}$ (3.15)

where ${S}_{{X}_{t}^{2}}^{2}$ and ${S}_{{X}_{t}^{3}}^{2}$ are the sample variances of the second and third powers of the stochastic process, ${\sigma }_{0}^{2}$ is the null value of the true variance of the stochastic process, and n is the number of observations in the series. The null hypothesis is rejected at level $\alpha$ if the observed value of ${\chi }_{cal}^{2}$ is larger than the $1-\frac{\alpha }{2}$ quantile of the chi-square distribution with $n-1$ degrees of freedom.
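A minimal sketch of the statistics (3.14) and (3.15) and the rejection rule just stated, using SciPy's `scipy.stats.chi2.ppf` for the quantile. The function name, the dictionary of null multipliers and the simulated example series are hypothetical. Note that $(n-1){S}^{2}/{\sigma }^{2}$ follows a chi-square distribution exactly only for normal observations; ${X}_{t}^{2}$ and ${X}_{t}^{3}$ are not normal, so the chi-square critical value is best read as the approximation used in the text.

```python
import numpy as np
from scipy.stats import chi2

# c_d in the null hypotheses var(X_t^d) = c_d * sigma_0^(2d), from (3.12) and (3.13).
NULL_MULTIPLIER = {2: 2.0, 3: 15.0}


def higher_power_variance_test(x, d: int, sigma0: float, alpha: float = 0.05):
    """Return (chi2_cal, critical value, reject?) for H0 on var(X_t^d), d in {2, 3}."""
    y = np.asarray(x, dtype=float) ** d
    n = y.size
    s2 = y.var(ddof=1)  # sample variance S^2 of X_t^d
    chi2_cal = (n - 1) * s2 / (NULL_MULTIPLIER[d] * sigma0 ** (2 * d))
    # Upper 1 - alpha/2 quantile with n - 1 degrees of freedom, as in the text.
    critical = chi2.ppf(1 - alpha / 2, df=n - 1)
    return float(chi2_cal), float(critical), bool(chi2_cal > critical)


rng = np.random.default_rng(1)  # arbitrary seed
x = rng.standard_normal(100)    # hypothetical series with sigma_0 = 1
for d in (2, 3):
    print(d, higher_power_variance_test(x, d, sigma0=1.0))
```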

4. Results

As an illustration, six random series were simulated using Minitab 16 (see Appendix). The simulated series satisfy the following conditions: 1) the simulated series $\left({X}_{t}\right)$ are normal and 2) the powers ${X}_{t}^{d},d=1,2,3,4,5$ are iid, and for $d\ge 2$ they are not normally distributed (see Table 7).
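The authors used Minitab 16; a rough NumPy/SciPy counterpart of the normality part of this simulation step is sketched below. It assumes six series of 500 observations each (the series length and seed are hypothetical choices) and checks ${X}_{t}^{d}$, $d=1,\dots ,5$ , for normality with the Jarque-Bera test (`scipy.stats.jarque_bera`), which the reference list cites; the normality test actually used for Table 7 is not stated in this section.

```python
import numpy as np
from scipy.stats import jarque_bera


def simulate_and_check(n_series: int = 6, n_obs: int = 500, max_d: int = 5, seed: int = 16):
    """Simulate iid N(0, 1) series and test X_t^d for normality.

    Returns an (n_series, max_d) array whose entry [i, d - 1] is the
    Jarque-Bera p-value for series i raised to the power d.
    """
    rng = np.random.default_rng(seed)
    series = rng.standard_normal((n_series, n_obs))
    pvalues = np.empty((n_series, max_d))
    for i, x in enumerate(series):
        for d in range(1, max_d + 1):
            _, p = jarque_bera(x**d)  # (statistic, p-value)
            pvalues[i, d - 1] = p
    return pvalues


print(np.round(simulate_and_check(), 4))
```

The p-values for $d\ge 2$ should be far below 0.05 for every series, while those for $d=1$ vary from series to series.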

Table 7. Descriptive statistics and values of the test statistics for the null hypotheses on the variances of the higher powers, for six simulated series ${X}_{t}={e}_{t},{e}_{t}\sim N\left(0,1\right)$ tested as a linear Gaussian white noise process.

The values of the chi-square test statistic for testing (3.12) and (3.13) are also shown in Table 7. The null hypothesis is rejected at the 5% level ( $\alpha =0.05$ ) for two of the simulated series and is not rejected for the other four. This result shows that testing the variances of the higher powers ${Y}_{t}={X}_{t}^{d},d=2,3$ is a necessary step before a process is accepted as a linear Gaussian white noise process.

5. Conclusion

We have shown that if ${X}_{t},t\in Z$ is a linear Gaussian white noise process, then all powers ${X}_{t}^{d},d=2,3,\cdots$ are also iid but non-normal. We also computed the kurtosis of some higher powers of ${X}_{t},t\in Z$ and established that the kurtosis increases exponentially as the power increases. We recommend that white noise processes, and processes with a similar covariance structure, be tested for normality and for whiteness, and that the variances of their powers be tested against the theoretical values in Table 1 for $d=1,2,3$ .

Appendix

Table A1. Six simulated white noise series: ${X}_{t}={e}_{t},{e}_{t}\sim N\left(0,1\right)$ data.

Conflicts of Interest

The authors declare no conflicts of interest.

Brockwell, P.J. and Davis, R.A. (2002) Introduction to Time Series and Forecasting. 2nd Edition, Springer, New York. https://doi.org/10.1007/b97391

Box, G.E.P., Jenkins, G.M. and Reinsel, G.C. (1994) Time Series Analysis: Forecasting and Control. 3rd Edition, John Wiley and Sons Inc. Publication, Hoboken.

Fuller, W.A. (1976) Introduction to Statistical Time Series. 2nd Edition, Wiley, New York.

Box, G.E.P. and Pierce, D.A. (1970) Distribution of Residual Autocorrelations in Autoregressive Integrated Moving Average Time Series Models. Journal of the American Statistical Association, 65, 1509-1526. https://doi.org/10.1080/01621459.1970.10481180

Hong, Y. (1996) Consistent Testing for Serial Correlation of Unknown Form. Econometrica, 64, 837-864. https://doi.org/10.2307/2171847

Shao, X. (2011) Testing for White Noise under Unknown Dependence and Its Applications to Goodness-of-Fit for Time Series Models. Econometric Theory, 27, 1-32. https://doi.org/10.1017/S0266466610000253

Ljung, G.M. and Box, G.E.P. (1978) On a Measure of Lack of Fit in Time Series Models. Biometrika, 65, 297-303. https://doi.org/10.1093/biomet/65.2.297

Tsay, R.S. (2002) Analysis of Financial Time Series. John Wiley & Sons, New York.

McLeod, A.I. and Li, W.K. (1983) Diagnostic Checking ARMA Time Series Models Using Squared-Residual Autocorrelations. Journal of Time Series Analysis, 4, 269-273.

Bartlett, M.S. (1956) An Introduction to Stochastic Processes: With Special Reference to Methods and Applications. University Press, Cambridge.

Grenander, U. and Rosenblatt, M. (1957) Statistical Analysis of Stationary Time Series. Wiley, New York.

Durlauf, S. (1991) Spectral Based Testing for the Martingale Hypothesis. Journal of Econometrics, 50, 355-376. https://doi.org/10.1016/0304-4076(91)90025-9

Deo, R.S. (2000) Spectral Test of the Martingale Hypothesis under Conditional Heteroscedasticity. Journal of Econometrics, 99, 291-315. https://doi.org/10.1016/S0304-4076(00)00027-0

Granger, C.W.J. and Andersen, A.P. (1978) An Introduction to Bilinear Time Series Models. Vandenhoeck and Ruprecht, Göttingen.

Iwueze, I.S. (1988) Bilinear White Noise Processes. Nigerian Journal of Mathematics and Applications, 1, 51-63.

Ibrahim, A.M. (2013) Extension of Factorial Concept to Negative Numbers. Notes on Number Theory and Discrete Mathematics, 19, 30-42.

Grossman, S.I. (1981) Calculus. 2nd Edition, Academic Press, Inc., New York.

Jarque, C.M. and Bera, A.K. (1980) Efficient Tests for Normality, Homoscedasticity and Serial Independence of Regression Residuals. Economics Letters, 6, 255-259. https://doi.org/10.1016/0165-1765(80)90024-5

Jarque, C.M. and Bera, A.K. (1981) Efficient Tests for Normality, Homoscedasticity and Serial Independence of Regression Residuals: Monte Carlo Evidence. Economics Letters, 7, 313-318. https://doi.org/10.1016/0165-1765(81)90035-5

Jarque, C.M. and Bera, A.K. (1987) A Test for Normality of Observations and Regression Residuals. International Statistical Review, 55, 163-172. https://doi.org/10.2307/1403192

Hyndman, R.J. and Athanasopoulos, G. (2012) Forecasting: Principles and Practice. OTexts. https://otexts.com/fpp

Milton, J.S. and Arnold, J.C. (1995) Introduction to Probability and Statistics: Principles and Applications for Engineering and the Computing Sciences. McGraw Hill Inc., New York.