Application of the Improved Generalized Autoregressive Conditional Heteroskedasticity Model Based on the Autoregressive Integrated Moving Average Model in Data Analysis

This study first improved the Generalized Autoregressive Conditional Heteroskedasticity model
to address the singular information that arises when the model is applied to
financial product sales data; an improved outlier detection method was then used to
locate the outliers, which were processed iteratively.
Second, to describe the sharp peak and fat tails of the financial time
series, as well as the leverage effect, this work used the skewed-t Asymmetric Power Autoregressive Conditional
Heteroskedasticity model based on the Autoregressive Integrated Moving
Average model to analyze the sales data. Empirical analysis showed that the
model considering the skewed distribution is effective.


Introduction
Time series models play an important role in many business decisions. In the current big data era, every industry faces the problem of time series modeling and prediction. For example, an e-commerce platform needs to predict the future sales of all its commodities, and in the pre-sales industry both online and offline pre-sales rely heavily on time series forecasting. These data are non-linearly correlated over time, and most are affected by product promotion, inventory levels and market competition, among other factors, which may produce outliers in the series. Meanwhile, sales around holiday promotion periods alternate between volatile and flat, so return fluctuations are asymmetric, and the rate of return usually does not follow the normal distribution but shows skewness, a sharp peak and heavy tails. How to model and predict such data therefore remains an open question.
Consider first the classic time series models, such as the Autoregressive Integrated Moving Average-Generalized Autoregressive Conditional Heteroskedasticity (ARIMA-GARCH) model [1] and the normal Asymmetric Power Autoregressive Conditional Heteroskedasticity (APARCH) model [2] based on the Autoregressive Integrated Moving Average (ARIMA) model [3]. On the one hand, although these models can handle heteroscedasticity in the residual sequence, they provide no solution for singular information when data are applied to the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model [4]. On the other hand, parameter estimation in classical time series models usually assumes a normal distribution, which does not fit the distribution of volatility well in practical applications. The explosive growth of new algorithm development makes this issue even more worthy of attention.
Therefore, in this study, an improved GARCH model, termed the Pro-GARCH model, is proposed to solve the problem of singular information in the data. The improvement is described below. First, in the GARCH (p, q) model, the rank of the Hessian matrix H is defined as 1 1 R R = , which makes it possible to handle the singular information.
Since the rank of the matrix is unchanged after QR decomposition, the estimated value of a given parameter is not affected. Second, because the matrix is singular, the inverse of the Hessian matrix is obtained as the generalized inverse of the matrix, yielding the Pro-GARCH model. Furthermore, a skewed-t APARCH model [5] based on the ARIMA model [3] is proposed.
When applied to JD sales data, the model was superior to the classical time series models in the accuracy of parameter estimation and prediction, and could describe the skewness in the sequence more accurately.
The remainder of the article is organized as follows. The second part gives the definitions of the Pro-GARCH model and the skewed-t APARCH model based on the ARIMA model. The third part describes the modeling process of the skewed-t APARCH model based on the ARIMA model. The fourth part applies the new model and the classical time series models to JD sales data and compares the results. Empirical analysis showed that the new model performs better than the other models.

The Pro-GARCH Model
The Pro-GARCH (p, q) model we propose contains a deterministic information fitting component; $\eta_t$ is independent and follows the standard normal distribution, and $\alpha_0 > 0$. When the data contain an outlier, the actual sequence is not $\varepsilon_t$ but an observation sequence $e_t$, whose definition involves two quantities that denote the magnitude and the dynamic model of the outlier effect, respectively.
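For orientation, the standard GARCH (p, q) specification on which such a model is usually built can be written as below. This is the textbook form and not the Pro-GARCH equations themselves; the symbols $f$, $h_t$ and $\beta_j$, and the convention that p counts the lagged variance terms, are assumptions that do not come from the text above.

```latex
\begin{aligned}
x_t &= f(t, x_{t-1}, x_{t-2}, \dots) + \varepsilon_t, \\
\varepsilon_t &= \sqrt{h_t}\,\eta_t, \qquad \eta_t \sim \text{i.i.d. } N(0, 1), \\
h_t &= \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^{2} + \sum_{j=1}^{p} \beta_j h_{t-j},
\qquad \alpha_0 > 0,\ \alpha_i \ge 0,\ \beta_j \ge 0 .
\end{aligned}
```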

The Skewed-t APARCH Model Based on the ARIMA Model
The skewed-t APARCH model [5] based on the ARIMA model [3] is defined with the following components: an autoregressive coefficient polynomial and a moving average coefficient polynomial of a stationary and invertible ARIMA (p, q) model; the conditional mean $\mu_t$; and an innovation that has mean 0 and variance 1, is identically distributed, and follows the skewed-t distribution $t(0, 1, \nu, \xi)$. The purpose of the power function in Equation (4)
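The leverage effect that an APARCH model captures can be illustrated with a minimal sketch of the commonly used APARCH (1, 1) volatility recursion, $\sigma_t^{\delta} = \omega + \alpha(|\varepsilon_{t-1}| - \gamma\varepsilon_{t-1})^{\delta} + \beta\sigma_{t-1}^{\delta}$ with $-1 < \gamma < 1$. This is the textbook recursion and may differ in notation from the paper's own equations; the function names and all parameter values below are hypothetical and chosen only for illustration.

```python
def aparch_sigma_delta(prev_eps, prev_sigma, omega, alpha, gamma, beta, delta):
    """One step of the textbook APARCH(1,1) recursion; returns sigma_t ** delta."""
    if not -1.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between -1 and 1")
    # The base is non-negative whenever |gamma| < 1, so a fractional power is safe.
    shock = (abs(prev_eps) - gamma * prev_eps) ** delta
    return omega + alpha * shock + beta * prev_sigma ** delta


def aparch_sigma(prev_eps, prev_sigma, omega, alpha, gamma, beta, delta):
    """Conditional standard deviation sigma_t implied by the recursion."""
    value = aparch_sigma_delta(prev_eps, prev_sigma, omega, alpha, gamma, beta, delta)
    return value ** (1.0 / delta)


# Hypothetical parameter values, not estimates from the paper.
params = dict(omega=0.05, alpha=0.10, gamma=0.30, beta=0.85, delta=1.5)

# Shocks of equal size and opposite sign give different volatility when gamma != 0.
sigma_after_drop = aparch_sigma(-1.0, 1.0, **params)
sigma_after_rise = aparch_sigma(1.0, 1.0, **params)
```

With a positive gamma, the negative shock produces the larger next-period volatility, which is the asymmetry referred to as the leverage effect. Setting gamma to 0 and delta to 2 reduces the recursion to the GARCH (1, 1) variance equation.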

Results
In this section, the construction of the skewed-t Asymmetric Power Autoregressive Conditional Heteroskedasticity (APARCH) model based on the ARIMA model is described in detail. The validity of this model is demonstrated by comparing it with the ARIMA-GARCH model and the normal APARCH model based on the ARIMA model.
All simulations in this paper were performed in R.

Data Preprocessing
We analyzed the sales data of Jingdong (JD). Figure 1 shows the presence of outliers in the data. Therefore, we first used the Pro-GARCH model and the improved outlier detection method to process the outliers and generate new data, with satisfactory results. The data obtained after processing the outliers are shown in Figure 2 and are denoted {x_train}.
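The conclusions of this paper name the median absolute deviation (MAD) as the robust estimate of the standard deviation used when locating outliers. The sketch below shows only that ingredient, as a simple one-pass MAD rule; it is not the Pro-GARCH-based iterative procedure itself. The function name, the threshold of 3 and the sample values are hypothetical, and the factor 1.4826 is the usual constant that makes the MAD consistent with the standard deviation under normality.

```python
from statistics import median


def mad_outliers(values, threshold=3.0):
    """Return the indices of points lying more than `threshold` robust
    standard deviations away from the median."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    scale = 1.4826 * mad  # MAD rescaled to estimate the standard deviation
    if scale == 0:
        return []  # at least half the values are identical; the rule is undefined
    return [i for i, v in enumerate(values) if abs(v - center) > threshold * scale]


# Hypothetical daily sales with one promotion spike at index 5.
sales = [102, 98, 101, 99, 100, 350, 97, 103, 100, 101]
flagged = mad_outliers(sales)  # -> [5]
```

Because both the center and the scale are medians, the spike itself barely affects the threshold, which is the reason a robust scale is preferred over the sample standard deviation here.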

Model Establishment
The pure randomness test showed that the p-values of the LB test statistic were very low at lags 1 through 6 (see Table 1). Therefore, we concluded that the sequence was not white noise and could be modeled.
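As a rough illustration of the LB (Ljung-Box) statistic behind this test, the sketch below computes $Q = n(n+2)\sum_{k=1}^{m} \hat{\rho}_k^2/(n-k)$ from the sample autocorrelations and compares it with the 95% chi-square critical value for 6 degrees of freedom, 12.592. The function names and the artificial trending series are hypothetical; the paper's own computations were done in R and report p-values rather than critical values.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return numer / denom


def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(series)
    total = sum(autocorrelation(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1))
    return n * (n + 2) * total


CHI2_95_DF6 = 12.592  # 95% quantile of the chi-square distribution, 6 degrees of freedom

# Artificial, strongly autocorrelated series (a pure linear trend).
trending = [float(t) for t in range(60)]
q_trend = ljung_box_q(trending, 6)
reject_white_noise = q_trend > CHI2_95_DF6
```

A Q value above the critical value corresponds to a low p-value, that is, to rejecting the white-noise hypothesis.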
The stationarity of the sequence {x_train} was checked with a time series plot. As shown in Figure 2, the sequence was non-stationary. The sequence obtained after differencing {x_train} is shown in Figure 3. The automatically selected order was compared with the ACF and PACF plots. With p = 1 and q = 1, the model fit was most reasonable; therefore, the ARIMA (p, d, q) model most suitable for this sequence was the ARIMA (1, 1, 1) model. Regarding the heteroscedasticity test, since the p-values of the LM and Portmanteau Q tests were low (see Table 2), the residuals were heteroscedastic. Due to space constraints, only the first six p-values are reported in this paper.
Therefore, this study selected the ARIMA (1, 1, 1)-GARCH (1, 1) model and the APARCH (1, 1) model based on the ARIMA (1, 1, 1) model to fit the sequence under the normal and skewed-t distributions. Table 3 provides the parameter estimation results of the new model.
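The middle index d = 1 in ARIMA (1, 1, 1) means that the series is differenced once before the ARMA part is fitted. A minimal sketch of that step follows; the function name and the toy series are hypothetical and stand in for the sales data.

```python
def difference(series, order=1):
    """Apply first differences `order` times: y_t = x_t - x_{t-1}."""
    result = list(series)
    for _ in range(order):
        result = [b - a for a, b in zip(result, result[1:])]
    return result


# A toy series with a linear trend; one difference leaves a constant level.
trend = [3.0 + 2.0 * t for t in range(8)]
diffed = difference(trend)  # seven values, all equal to 2.0
```

Each round of differencing shortens the series by one observation, so forecasts of the differenced series have to be cumulated back to the original scale.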

Evaluation Criteria
In the evaluation criteria, $\sigma_t^2$ is the realized volatility, estimated using high-frequency data; $\hat{\sigma}_t^2$ is the predicted volatility at time t; and N is the number of observations used for performance evaluation.
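The exact loss functions are not legible in this section. As an assumption, the sketch below uses two criteria that are commonly applied to volatility forecasts, the mean squared error and the mean absolute error between realized and predicted volatility over N evaluation points. The function name and the sample numbers are hypothetical.

```python
def volatility_losses(realized, predicted):
    """Mean squared error and mean absolute error between realized and
    predicted volatility (assumed criteria, not necessarily the paper's)."""
    if not realized or len(realized) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(realized)
    errors = [r - p for r, p in zip(realized, predicted)]
    return {
        "MSE": sum(e * e for e in errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
    }


# Hypothetical realized and predicted variances for four periods.
realized = [0.04, 0.09, 0.01, 0.16]
predicted = [0.05, 0.07, 0.02, 0.12]
losses = volatility_losses(realized, predicted)
```

Lower values indicate forecasts closer to the realized volatility; the squared criterion penalizes large misses more heavily than the absolute one.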

Prediction Results
Tables 4-6 provide the standardized residual tests of the ARIMA (1, 1, 1)-GARCH (1, 1) model and the APARCH (1, 1) model based on the ARIMA (1, 1, 1) model under the normal and skewed-t distributions, respectively. Table 7 shows the 20-step prediction performance of the model.
Here, the skewness is 0.925 [9], and the model coefficient is greater than 0 and satisfies $-1 < \gamma < 1$. The results for the model in Equations (9) and (10) are depicted in Figure 6. The residuals lay almost entirely within the confidence interval, indicating that the model prediction was more accurate; the volatility of the model is presented in Figure 7.

Discussion
First, the Pro-GARCH model solves the singular information problem in the data.
Second, as shown in Figure 5, the skewed-t APARCH model based on the ARIMA model captured the sharp peak and heavy tails, the skewness and the leverage effect in the sequence better. Finally, Table 3 and Table 7 show that the model is superior to the ARIMA-GARCH model and the normal APARCH model based on the ARIMA model in the significance of parameter estimation and in the accuracy of prediction, respectively.

Conclusions
In this study, the Pro-GARCH model and the improved outlier detection method were used together with an iterative procedure to process outliers in the time series and obtain a new time series. The ARIMA-GARCH model, the normal APARCH model based on the ARIMA model, and the skewed-t APARCH model based on the ARIMA model were then compared. The main observations can be summarized as follows: 1) When the data were processed with the Pro-GARCH model and the improved outlier detection method, with the median absolute deviation (MAD) selected as a robust estimate of the standard deviation of the model, the locations of outliers were found most accurately; 2) Compared with the ARIMA-GARCH model and the normal APARCH model based on the ARIMA model, the skewed-t APARCH model based on the ARIMA model captured the sharp peaks and heavy tails, the skewness and the leverage effect better, and had higher prediction ability; 3) No prediction method stands out for every time series. Although the skewed-t APARCH model based on the ARIMA model showed good predictive power, it did not fully achieve the expected results and incurred certain losses; the model is also not flexible and cannot be applied to multiple products simultaneously.
This remains a major challenge for time series modelers and requires further research.