Estimation and Application of GARCH-X Model Based on High-Frequency Data

Zefang Song; Lingjun Chen; Wenlin Huang

doi:10.4236/ajibm.2025.152012

American Journal of Industrial and Business Management > Vol.15 No.2, February 2025


Zefang Song¹*, Lingjun Chen¹, Wenlin Huang²
¹School of Economics and Statistics, Guangzhou University, Guangzhou, China.
²Department of Statistics, George Washington University, Washington, USA.

Abstract

This paper studies the GARCH-X model based on high-frequency data. Building on existing research into parameter estimation and the selection of the optimal volatility representation for high-frequency GARCH(1,1) models, we extend the high-frequency GARCH(1,1) model to include exogenous variables and evaluate it through both simulations and an empirical application. The empirical results show that, when modeling the returns of the CSI 300 Index, including the CSI 500 Index return as an exogenous variable better explains the volatility of returns, demonstrating the practical applicability of the model.

Keywords

High-Frequency Data, Exogenous Variables, GARCH-X Model, QMLE Estimation, VaR Forecasting


Song, Z. F., Chen, L. J., & Huang, W. L. (2025). Estimation and Application of GARCH-X Model Based on High-Frequency Data. American Journal of Industrial and Business Management, 15, 242-259. doi: 10.4236/ajibm.2025.152012.

1. Introduction

Since Bollerslev (1986) proposed the GARCH model, it has become one of the most commonly used models for describing and predicting market volatility in financial economics, and it has been widely studied and applied by researchers and financial institutions for risk management and resource allocation. As financial markets have grown more complex and diverse, various extensions of the GARCH model have emerged, such as the EGARCH, GJR-GARCH, and GARCH-M models (see, e.g., Nelson, 1991; Engle et al., 1987; Glosten et al., 1993). Research in this area remains active. For instance, Qian and Xu (2023) developed a closed-form option-pricing formula for general GARCH models, and Wu, Zhao and Cheng (2023) introduced the Real-Time GARCH-MIDAS model for volatility forecasting. These extensions enable the GARCH framework to better capture and predict volatility, and extensive theoretical and empirical research has been conducted to validate and apply them. Despite these contributions, however, the understanding of GARCH and related models in financial markets remains incomplete.

From the perspective of incorporating exogenous variables, Han and Kristensen (2014) proposed the GARCH(1,1)-X model, which extends the traditional GARCH(1,1) model by adding an exogenous variable X directly to the conditional variance equation, thereby quantifying the impact of the exogenous variable on volatility. Han (2015) introduced quasi-maximum likelihood estimation (QMLE) for the GARCH(1,1)-X model and established its asymptotic properties when the exogenous variables are stationary and ergodic. Lee (2017) extended the GARCH(1,1)-X model to the GARCH(p,q)-X model, further advancing the theory of GARCH-X models. These studies found that exogenous variables not only help explain the volatility of stock returns, exchange rates, and interest rates but also enhance the model's predictive ability. Singvejsakul et al. (2021) estimated the GARCH-X model using Bayesian methods and applied it to analyze how various factors affect the price volatility of cassava products in Thailand. Using the GARCH-X model, Apergis and Rezitis (2011) found that food price volatility in Greece is positively related to short-term deviations of food prices and to macroeconomic factors. These studies show that the GARCH-X model can capture information that the GARCH model and some of its other extensions cannot, helping market participants obtain important information in a timely manner and enriching financial theory.

The development of big data storage technology, together with increasingly convenient access to electronic financial transaction data, has made intraday high-frequency data readily available. Consequently, many scholars have begun to extend the GARCH model from low-frequency to high-frequency data. High-frequency data provide more information: in theory they can reduce the asymptotic variance of parameter estimates, and in practice they better reflect the fluctuation of daily returns. However, high-frequency data also contain substantial market microstructure noise, and introducing more information also introduces more error terms, so the corresponding parameter estimates may deviate substantially from the true values.

This is the main difficulty in introducing high-frequency data into such models. To address it, Visser (2009, 2011) proposed the volatility representation (volatility proxy) approach, in which a suitable volatility representation of the intraday high-frequency data is embedded in the GARCH model, so that the low-frequency GARCH structure incorporates high-frequency information. Visser (2011) derived the quasi-maximum likelihood estimator of the high-frequency GARCH model, provided a method for selecting the volatility representation, and showed that using high-frequency data improves the estimation accuracy of the GARCH model. Wang et al. (2018) estimated the parameters of the high-frequency GARCH model using composite quantile regression (CQR). Considering the advantages of realized variance in measuring intraday volatility and its strong empirical performance, Hansen et al. (2012) incorporated it into the GARCH model as an additional variable, yielding the Realized GARCH model.

At present, research on the GARCH-X model is mostly based on daily or monthly low-frequency data, and applications of high-frequency data to the GARCH-X model remain scarce. Given the usefulness of the GARCH-X model for studying financial markets and the effectiveness of high-frequency data in modeling and estimation, this paper combines high-frequency data with the GARCH-X model. We provide parameter estimation and optimal volatility representation selection for the GARCH-X model based on high-frequency data, offering a tool for capturing volatility in China's financial markets that can support market regulation.

The main contribution of this study relative to existing research is that it incorporates exogenous variables into a volatility model that also uses high-frequency data. This allows external factors affecting volatility to be captured in more detail, which can lead to more accurate volatility predictions and better risk management; the empirical results of this study support this point.

The structure of the rest of this paper is as follows: Section 2 provides the construction of the GARCH-X model based on high-frequency data; Section 3 presents the QMLE estimation and theoretical properties of the model, as well as the selection of optimal volatility representatives; Section 4 conducts simulation analysis; Section 5 performs empirical analysis based on 1-minute high-frequency trading data from the CSI 300 index, verifying the volatility prediction ability of the new model and the accuracy of VaR prediction; Section 6 concludes the paper.

2. GARCH-X Model with High-Frequency Data

2.1. GARCH-X Model

Let $\{y_{t}\}_{t = 1}^{T}$ be an observable return series. The GARCH(1,1)-X model with an exogenous variable is constructed as follows:

$y_{t} = σ_{t} ε_{t},$ (1a)

$\mathrm{Var}(y_{t} | ℱ_{t - 1}) = σ_{t}^{2} = ω + α y_{t - 1}^{2} + β σ_{t - 1}^{2} + γ x_{t - 1}^{2},$ (1b)

where $\{ε_{t}\}$ are independent and identically distributed random errors with mean 0 and variance 1, and $ε_{t}$ is independent of $y_{s}$ for $s < t$. $ℱ_{t - 1}$ denotes the low-frequency information set at time $t - 1$, $σ_{t}^{2}$ is the conditional variance, $x_{t - 1}$ is the exogenous variable, and $γ$ measures the impact of the exogenous variable on volatility. The parameters are restricted to $ω > 0$, $α > 0$, $β \geq 0$, and $γ > 0$ to ensure that the conditional variance is positive. Since the GARCH(1,1) model is usually sufficient to characterize the ARCH effect in financial data, this paper uses the GARCH(1,1)-X model as the baseline model into which intraday high-frequency data are incorporated.
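A minimal sketch of simulating the daily GARCH(1,1)-X recursion (1a)-(1b) follows. It assumes Gaussian innovations $ε_{t}$, a user-supplied exogenous series, a start-up value $σ_{0}^{2} = ω/(1-β)$, and a burn-in period; the function name `simulate_garch_x` and these choices are illustrative, not taken from the paper.

```python
import numpy as np

def simulate_garch_x(T, omega, alpha, beta, gamma, x, rng=None, burn=500):
    """Simulate y_t = sigma_t * eps_t with
    sigma_t^2 = omega + alpha*y_{t-1}^2 + beta*sigma_{t-1}^2 + gamma*x_{t-1}^2.

    x: exogenous series of length T + burn (hypothetical input).
    Returns (y, sigma2) after dropping the first `burn` observations.
    """
    rng = np.random.default_rng(rng)
    n = T + burn
    eps = rng.standard_normal(n)          # i.i.d. N(0, 1) innovations (assumption)
    y = np.empty(n)
    sigma2 = np.empty(n)
    sigma2[0] = omega / (1.0 - beta)      # positive start-up value (assumption)
    y[0] = np.sqrt(sigma2[0]) * eps[0]
    for t in range(1, n):
        sigma2[t] = omega + alpha * y[t - 1] ** 2 + beta * sigma2[t - 1] + gamma * x[t - 1] ** 2
        y[t] = np.sqrt(sigma2[t]) * eps[t]
    return y[burn:], sigma2[burn:]

# Usage: standard normal exogenous variable, parameters of Model (2) in Section 4
x = np.random.default_rng(0).standard_normal(500 + 500)
y, s2 = simulate_garch_x(500, 0.1, 0.3, 0.2, 0.05, x, rng=1)
```

Note that after the burn-in is dropped, `y[j]` is aligned with `x[burn + j]`.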

2.2. Embedded Intraday Return Process of GARCH-X Model

To introduce high-frequency data, we extend the daily GARCH model using continuous-time processes. Let $Y_{t}(u)$ denote the intraday return process, i.e., the cumulative logarithmic return of the asset from the previous close up to time $u$ on day $t$. We normalize each trading day to the unit interval $[0, 1]$ and construct the following model:

$Y_{t}(u) = σ_{t} Z_{t}(u),$ (2a)

$σ_{t}^{2} = ω + α y_{t - 1}^{2} + β σ_{t - 1}^{2} + γ x_{t - 1}^{2},$ (2b)

where $\{Z_{t}(\cdot)\}$ are independent and identically distributed standard processes. When $u = 1$, $Y_{t}(1) = y_{t}$ and $Z_{t}(1) = ε_{t}$, so the model reduces to the GARCH(1,1)-X model (1a)-(1b). Unlike in low-frequency models, $σ_{t}$ here acts as a scale parameter for the whole intraday path; since the processes $Z_{t}(\cdot)$ are independent, Visser (2011) refers to this type of model as a scale model.

Note that model (2a)-(2b) contains the high-frequency quantities $Y_{t}(u)$ and $Z_{t}(u)$ as well as the low-frequency quantities $y_{t - 1}$ and $x_{t - 1}$, so its parameters cannot be estimated directly in this form. Building on the scale model, Visser (2011) therefore introduced a function of the intraday high-frequency path, referred to as the volatility representation. The volatility representation is positive and positively homogeneous: $H(p Y_{t}(\cdot)) = p H(Y_{t}(\cdot))$ for any $p > 0$. Exploiting this homogeneity, Visser embeds the volatility representation into the low-frequency GARCH model, forming a class of volatility representation models that incorporate high-frequency data. This paper adopts the same idea and embeds the volatility representation into the GARCH-X model.

By positive homogeneity,

$H_{t} \equiv H (Y_{t} (u)) = H (σ_{t} Z_{t} (u)) = σ_{t} H (Z_{t} (u)),$ (3)

Let

$z_{H, t} = H (Z_{t} (u)), μ = E (z_{H, t}^{2}), ε_{t}^{*} = \frac{z_{H, t}}{\sqrt{μ}} .$ (4)

Since the processes $Z_{t}(\cdot)$ are independent and identically distributed, $\{z_{H, t}\}$ and $\{ε_{t}^{*}\}$ are also i.i.d. sequences, and by construction $E(ε_{t}^{* 2}) = 1$. Combining (2a)-(4), we obtain the following GARCH-X model for the embedded intraday return process:
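The embedding in (3)-(4) can be checked numerically. The sketch below assumes Brownian-motion paths as a stand-in for $Z_{t}(\cdot)$ and uses two illustrative positively homogeneous volatility representations, the absolute daily return and the intraday range (the helper names are hypothetical). It verifies $H(σ Z) = σ H(Z)$ and replaces $μ$ by a sample mean, so that the standardized $ε_{t}^{*}$ has unit sample second moment.

```python
import numpy as np

def h_abs_return(path):
    """|Y(1) - Y(0)|: the daily absolute return |y_t| (last axis = intraday grid)."""
    return np.abs(path[..., -1] - path[..., 0])

def h_range(path):
    """Intraday range max_u Y(u) - min_u Y(u); also positively homogeneous."""
    return path.max(axis=-1) - path.min(axis=-1)

def standardize(z_H):
    """Sample analogue of Eq. (4): mu = E(z_H^2), eps* = z_H / sqrt(mu)."""
    mu_hat = np.mean(z_H ** 2)
    return mu_hat, z_H / np.sqrt(mu_hat)

# Brownian stand-in for Z_t(u) on a 240-step grid with Z_t(0) = 0 (assumption)
rng = np.random.default_rng(0)
n_days, m = 5000, 240
steps = rng.standard_normal((n_days, m)) / np.sqrt(m)
Z = np.concatenate([np.zeros((n_days, 1)), np.cumsum(steps, axis=1)], axis=1)

sigma = 1.7
for H in (h_abs_return, h_range):
    # Eq. (3): scaling the whole path scales the representation by the same factor
    assert np.allclose(H(sigma * Z), sigma * H(Z))
    mu_hat, eps_star = standardize(H(Z))
    print(H.__name__, "mu_hat =", round(mu_hat, 3))
```

Different representations give different values of $μ$, which is why $μ$ must be estimated separately in Section 3.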

$\begin{array}{l} H_{t} = σ_{t} z_{H, t} = σ_{t} \sqrt{μ} ε_{t}^{*}, \\ σ_{t}^{2} = ω + α y_{t - 1}^{2} + β σ_{t - 1}^{2} + γ x_{t - 1}^{2} . \end{array}$ (5)

3. Quasi-Maximum Likelihood Estimation of GARCH-X Model with High Frequency Data

3.1. Estimation of $θ$

We wish to estimate $θ = (ω, α, β, γ)^{τ}$, but model (5) also contains the unknown nuisance parameter $μ$. Letting $σ_{t}^{*} = σ_{t} \sqrt{μ}$, model (5) can be transformed into

$\begin{array}{l} H_{t} = σ_{t} z_{H, t} = σ_{t}^{*} ε_{t}^{*}, \\ σ_{t}^{* 2} = ω^{*} + α^{*} y_{t - 1}^{2} + β^{*} σ_{t - 1}^{* 2} + γ^{*} x_{t - 1}^{2} \end{array}$ (6)

where

$ω^{*} = ω μ, α^{*} = α μ, β^{*} = β, γ^{*} = γ μ .$ (7)
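The reparameterization can be sanity-checked numerically: multiplying the variance path of (5) by $μ$ should reproduce the recursion with the starred parameters (7), in which the lagged term is $β^{*} σ_{t - 1}^{* 2}$. The sketch below uses arbitrary illustrative inputs; `variance_path` and `to_starred` are hypothetical helpers.

```python
import numpy as np

def variance_path(params, y, x, s2_start):
    """s2_t = w + a*y_{t-1}^2 + b*s2_{t-1} + g*x_{t-1}^2 for t >= 1, with s2_0 = s2_start."""
    w, a, b, g = params
    s2 = np.empty(len(y))
    s2[0] = s2_start
    for t in range(1, len(y)):
        s2[t] = w + a * y[t - 1] ** 2 + b * s2[t - 1] + g * x[t - 1] ** 2
    return s2

def to_starred(omega, alpha, beta, gamma, mu):
    """Eq. (7): (w*, a*, b*, g*) = (w*mu, a*mu, b, g*mu)."""
    return omega * mu, alpha * mu, beta, gamma * mu

rng = np.random.default_rng(1)
y, x = rng.standard_normal(300), rng.standard_normal(300)
theta, mu = (0.1, 0.3, 0.2, 0.05), 1.4
s2 = variance_path(theta, y, x, s2_start=0.5)                               # sigma_t^2
s2_star = variance_path(to_starred(*theta, mu), y, x, s2_start=0.5 * mu)    # sigma*_t^2
assert np.allclose(s2_star, mu * s2)   # sigma*_t^2 = mu * sigma_t^2 for every t
```

Because $β$ multiplies the lagged variance itself, it is the only parameter left unscaled by $μ$.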

Following the estimation procedure of Visser (2011), we first estimate $θ^{*} = (ω^{*}, α^{*}, β^{*}, γ^{*})^{τ}$ from (6), then estimate $μ$, and finally recover $θ$ from (7). Based on the transformed model (6), the log quasi-likelihood function is

$L_{T} (θ^{*}) = \sum_{t = 1}^{T} l_{t} (θ^{*}) = - \frac{1}{2} \sum_{t = 1}^{T} (ln (σ_{t}^{* 2} (θ^{*})) + \frac{H_{t}^{2}}{σ_{t}^{* 2} (θ^{*})})$ (8)

Thus, the quasi-maximum likelihood estimation (QMLE) for $θ^{*}$ is defined as:

${\hat{θ}}^{*} = \arg \max L_{T} (θ^{*}) .$ (9)
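A minimal sketch of the QMLE in (8)-(9) follows. It assumes the recursion is started at the sample mean of $H_{t}^{2}$ (the paper does not state a start-up rule) and uses SciPy's bounded L-BFGS-B optimizer with illustrative bounds that keep $σ_{t}^{* 2}$ positive and $β^{*} < 1$. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def sigma2_star_path(theta_star, y, x, s2_start):
    """Recursion (6): s*_t^2 = w* + a* y_{t-1}^2 + b* s*_{t-1}^2 + g* x_{t-1}^2."""
    w, a, b, g = theta_star
    s2 = np.empty(len(y))
    s2[0] = s2_start
    for t in range(1, len(y)):
        s2[t] = w + a * y[t - 1] ** 2 + b * s2[t - 1] + g * x[t - 1] ** 2
    return s2

def neg_quasi_loglik(theta_star, H, y, x):
    """-L_T(theta*) from Eq. (8); the recursion starts at mean(H_t^2) (assumption)."""
    s2 = sigma2_star_path(theta_star, y, x, s2_start=np.mean(H ** 2))
    return 0.5 * np.sum(np.log(s2) + H ** 2 / s2)

def qmle_theta_star(H, y, x, start=(0.05, 0.1, 0.5, 0.05)):
    """Eq. (9): maximize L_T by minimizing -L_T under illustrative positivity bounds."""
    bounds = [(1e-6, None), (1e-6, None), (1e-6, 0.999), (1e-6, None)]
    res = minimize(neg_quasi_loglik, x0=np.asarray(start, dtype=float),
                   args=(H, y, x), method="L-BFGS-B", bounds=bounds)
    return res.x, res
```

With `H = np.abs(y)` the routine returns the low-frequency estimate $\tilde{θ}$; with a realized volatility series as `H` it returns ${\hat{θ}}^{*}$.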

This also yields the fitted sequence ${\hat{σ}}_{t}^{* 2} = {\hat{ω}}^{*} + {\hat{α}}^{*} y_{t - 1}^{2} + {\hat{β}}^{*} {\hat{σ}}_{t - 1}^{* 2} + {\hat{γ}}^{*} x_{t - 1}^{2}$. To obtain $\hat{θ}$ from (7), it remains to estimate $μ$. When the volatility representation is $H_{t} = | y_{t} |$, we have $z_{H, t} = | ε_{t} |$ and $μ = E(ε_{t}^{2}) = 1$, so model (6) coincides with the low-frequency GARCH(1,1)-X model (1a)-(1b), and the estimator obtained from (8) and (9) is the QMLE of that model. Denote this estimator by $\tilde{θ} = (\tilde{ω}, \tilde{α}, \tilde{β}, \tilde{γ})^{τ}$ and the corresponding fitted conditional variances by $\{\tilde{σ}_{t}^{2}\}$. By the transformation above, $μ = σ_{t}^{* 2} / σ_{t}^{2}$, so we estimate $μ$ by

$\hat{μ} = \frac{1}{T} \sum_{t = 1}^{T} \frac{{\hat{σ}}_{t}^{* 2}}{{\tilde{σ}}_{t}^{2}},$ (10)

thus, we can obtain:

$\hat{θ} = {(\hat{ω}, \hat{α}, \hat{β}, \hat{γ})}^{τ} = {(\frac{{\hat{ω}}^{*}}{\hat{μ}}, \frac{{\hat{α}}^{*}}{\hat{μ}}, {\hat{β}}^{*}, \frac{{\hat{γ}}^{*}}{\hat{μ}})}^{τ} .$ (11)
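Given the two fitted variance sequences, (10)-(11) reduce to simple arithmetic. In the sketch below, `sigma2_star_hat` stands for the fitted ${\hat{σ}}_{t}^{* 2}$ from the high-frequency representation and `sigma2_tilde` for the fitted ${\tilde{σ}}_{t}^{2}$ from $H_{t} = | y_{t} |$; the function name is hypothetical and the numbers in the usage lines are made up purely to illustrate the mapping.

```python
import numpy as np

def recover_theta(theta_star_hat, sigma2_star_hat, sigma2_tilde):
    """Eq. (10): mu_hat = mean(sigma*_t^2 / sigma~_t^2); Eq. (11): undo the scaling."""
    mu_hat = np.mean(np.asarray(sigma2_star_hat) / np.asarray(sigma2_tilde))
    w, a, b, g = theta_star_hat
    return np.array([w / mu_hat, a / mu_hat, b, g / mu_hat]), mu_hat

# Illustrative inputs: starred variances exactly 1.7 times the low-frequency ones
sigma2_tilde = np.array([0.8, 1.1, 0.9, 1.4])
theta_hat, mu_hat = recover_theta((0.17, 0.51, 0.6, 0.085), 1.7 * sigma2_tilde, sigma2_tilde)
# mu_hat is about 1.7 and theta_hat is about (0.1, 0.3, 0.6, 0.05)
```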

3.2. The Asymptotic Property of ${\hat{θ}}^{*}$

Denote by $θ_{0}^{*} = (ω_{0}^{*}, α_{0}^{*}, β_{0}^{*}, γ_{0}^{*})^{τ}$ the true value of the parameter, and consider the following assumptions:

(A1) The parameter space $Θ \subset (0, \infty)^{4}$ of $θ^{*}$ is compact;

(A2) $E[\log(α_{0}^{*} ε_{t}^{* 2} + β_{0}^{*})] < 0$;

(A3) $E(ε_{t}^{* 4}) < \infty$.

If (A1)-(A3) hold, then

$\sqrt{T} ({\hat{θ}}^{*} - θ_{0}^{*}) \overset{L}{\to} N (0, Σ^{*}), T \to \infty,$ (12)

where $Σ^{*} = A_{0}^{- 1} B_{0} A_{0}^{- 1}$ , ${(A_{0})}_{i, j} = E (\frac{\partial^{2} l_{t} (θ_{0}^{*})}{\partial θ_{i}^{*} \partial θ_{j}^{*}})$ , ${(B_{0})}_{i, j} = E (\frac{\partial l_{t} (θ_{0}^{*})}{\partial θ_{i}^{*}} \frac{\partial l_{t} (θ_{0}^{*})}{\partial θ_{j}^{*}})$ .

Note: (i) According to the proofs by Visser (2011) and Straumann and Mikosch (2006), let $v_{t} (θ^{*}) = σ_{t}^{* 2} (θ^{*})$ , then equation (8) can be rewritten as:

$\begin{matrix} L_{T} (θ^{*}) = \sum_{t = 1}^{T} l_{t} (θ^{*}) = - \frac{1}{2} \sum_{t = 1}^{T} \left(\ln v_{t} (θ^{*}) + \frac{H_{t}^{2}}{v_{t} (θ^{*})}\right) \\ = - \frac{1}{2} \sum_{t = 1}^{T} \left(\ln v_{t} (θ^{*}) + \frac{σ_{t}^{* 2} ε_{t}^{* 2}}{v_{t} (θ^{*})}\right) . \end{matrix}$ (13)

Since $ε_{t}^{*}$ is independent of $σ_{t}^{*}$ and $v_{t}(θ^{*})$ and satisfies $E(ε_{t}^{* 2}) = 1$, and since $σ_{t}^{* 2} = v_{t}(θ_{0}^{*})$, defining $L(θ^{*}) = E(l_{0}(θ^{*}))$ gives

$\begin{matrix} L (θ^{*}) = - \frac{1}{2} E\left(\ln v_{0} (θ^{*}) + \frac{σ_{0}^{* 2} ε_{0}^{* 2}}{v_{0} (θ^{*})}\right) = - \frac{1}{2} E\left(\ln v_{0} (θ^{*}) + \frac{v_{0} (θ_{0}^{*})}{v_{0} (θ^{*})}\right), \\ 2 (L (θ^{*}) - L (θ_{0}^{*})) - 1 = E\left(\ln \frac{v_{0} (θ_{0}^{*})}{v_{0} (θ^{*})} - \frac{v_{0} (θ_{0}^{*})}{v_{0} (θ^{*})}\right) . \end{matrix}$ (14)

Since $\ln x - x \leq -1$ for all $x > 0$, with equality if and only if $x = 1$, (14) implies $L(θ^{*}) \leq L(θ_{0}^{*})$, with equality only if $v_{0}(θ^{*}) = v_{0}(θ_{0}^{*})$ almost surely, i.e., $θ^{*} = θ_{0}^{*}$. Hence $L(θ^{*})$ is uniquely maximized at $θ_{0}^{*}$; combined with the uniform almost-sure convergence of $L_{T}(θ^{*})/T$ to $L(θ^{*})$, this implies that ${\hat{θ}}^{*}$ converges almost surely to $θ_{0}^{*}$.

(ii) The first- and second-order partial derivatives of $l_{t}(θ^{*})$ are, respectively,

$\frac{\partial l_{t} (θ^{*})}{\partial θ_{i}^{*}} = - \frac{1}{2} \left(\frac{1}{σ_{t}^{* 2} (θ^{*})} - \frac{H_{t}^{2}}{σ_{t}^{* 4} (θ^{*})}\right) \frac{\partial σ_{t}^{* 2} (θ^{*})}{\partial θ_{i}^{*}} = - \frac{1}{2 σ_{t}^{* 2} (θ^{*})} \left(1 - \frac{H_{t}^{2}}{σ_{t}^{* 2} (θ^{*})}\right) \frac{\partial σ_{t}^{* 2} (θ^{*})}{\partial θ_{i}^{*}},$ (15)

$\frac{\partial^{2} l_{t} (θ^{*})}{\partial θ_{i}^{*} \partial θ_{j}^{*}} = - \frac{1}{2 σ_{t}^{* 4} (θ^{*})} \left(\frac{2 H_{t}^{2}}{σ_{t}^{* 2} (θ^{*})} - 1\right) \frac{\partial σ_{t}^{* 2} (θ^{*})}{\partial θ_{i}^{*}} \frac{\partial σ_{t}^{* 2} (θ^{*})}{\partial θ_{j}^{*}} - \frac{σ_{t}^{* 2} (θ^{*}) - H_{t}^{2}}{2 σ_{t}^{* 4} (θ^{*})} \frac{\partial^{2} σ_{t}^{* 2} (θ^{*})}{\partial θ_{i}^{*} \partial θ_{j}^{*}}.$ (16)

At $θ^{*} = θ_{0}^{*}$ we have $H_{t}^{2} / σ_{t}^{* 2} (θ_{0}^{*}) = ε_{t}^{* 2}$ with $E(ε_{t}^{* 2}) = 1$, and $ε_{t}^{*}$ is independent of $σ_{t}^{*}$ and its derivatives, so the factors $2 ε_{t}^{* 2} - 1$ and $1 - ε_{t}^{* 2}$ have expectations 1 and 0, respectively. Hence

$(A_{0})_{i, j} = E\left(\frac{\partial^{2} l_{t} (θ_{0}^{*})}{\partial θ_{i}^{*} \partial θ_{j}^{*}}\right) = - \frac{1}{2} E\left(\frac{1}{σ_{t}^{* 4} (θ_{0}^{*})} \frac{\partial σ_{t}^{* 2} (θ_{0}^{*})}{\partial θ_{i}^{*}} \frac{\partial σ_{t}^{* 2} (θ_{0}^{*})}{\partial θ_{j}^{*}}\right),$ (17)

$(B_{0})_{i, j} = E\left(\frac{\partial l_{t} (θ_{0}^{*})}{\partial θ_{i}^{*}} \frac{\partial l_{t} (θ_{0}^{*})}{\partial θ_{j}^{*}}\right) = \frac{1}{4} E\left((ε_{t}^{* 2} - 1)^{2}\right) E\left(\frac{1}{σ_{t}^{* 4} (θ_{0}^{*})} \frac{\partial σ_{t}^{* 2} (θ_{0}^{*})}{\partial θ_{i}^{*}} \frac{\partial σ_{t}^{* 2} (θ_{0}^{*})}{\partial θ_{j}^{*}}\right).$ (18)

Since $E(ε_{t}^{* 2}) = 1$, we have $\mathrm{var}(ε_{t}^{* 2}) = E((ε_{t}^{* 2} - 1)^{2})$. Let

$G(θ_{0}^{*})_{i, j} = E\left(\frac{1}{σ_{t}^{* 4} (θ_{0}^{*})} \frac{\partial σ_{t}^{* 2} (θ_{0}^{*})}{\partial θ_{i}^{*}} \frac{\partial σ_{t}^{* 2} (θ_{0}^{*})}{\partial θ_{j}^{*}}\right).$

Then $A_{0} = - \frac{1}{2} G(θ_{0}^{*})$ and $B_{0} = \frac{1}{4} \mathrm{var}(ε_{t}^{* 2}) G(θ_{0}^{*})$, so the asymptotic variance of ${\hat{θ}}^{*}$ is

$Σ^{*} = A_{0}^{- 1} B_{0} A_{0}^{- 1} = \mathrm{var}(ε_{t}^{* 2}) G^{- 1} (θ_{0}^{*})$

The elements of $G(θ_{0}^{*})$ depend on $σ_{t}^{* 2}(θ^{*})$ and its first-order partial derivatives with respect to $θ^{*} = (ω^{*}, α^{*}, β^{*}, γ^{*})$. Iterating the recursion in (6) backward gives

$\begin{matrix} σ_{t}^{* 2} = ω^{*} + α^{*} y_{t - 1}^{2} + β^{*} σ_{t - 1}^{* 2} + γ^{*} x_{t - 1}^{2} \\ = \frac{ω^{*}}{1 - β^{*}} + α^{*} \sum_{i = 0}^{\infty} {(β^{*})}^{i} y_{t - i - 1}^{2} + γ^{*} \sum_{i = 0}^{\infty} {(β^{*})}^{i} x_{t - i - 1}^{2}, \end{matrix}$ (19)

and therefore

$\begin{array}{l} \frac{\partial σ_{t}^{* 2}}{\partial ω^{*}} = \frac{1}{1 - β^{*}}, \quad \frac{\partial σ_{t}^{* 2}}{\partial α^{*}} = \sum_{i = 1}^{\infty} {(β^{*})}^{i - 1} y_{t - i}^{2}, \\ \frac{\partial σ_{t}^{* 2}}{\partial β^{*}} = \sum_{i = 1}^{\infty} {(β^{*})}^{i - 1} σ_{t - i}^{* 2}, \quad \frac{\partial σ_{t}^{* 2}}{\partial γ^{*}} = \sum_{i = 1}^{\infty} {(β^{*})}^{i - 1} x_{t - i}^{2} . \end{array}$ (20)
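In practice $G(θ_{0}^{*})$ and $\mathrm{var}(ε_{t}^{* 2})$ can be replaced by sample averages evaluated at an estimate of $θ^{*}$. The sketch below computes the derivatives in (20) through the equivalent recursions $\partial σ_{t}^{* 2} / \partial θ^{*} = (1, y_{t - 1}^{2}, σ_{t - 1}^{* 2}, x_{t - 1}^{2})^{τ} + β^{*} \partial σ_{t - 1}^{* 2} / \partial θ^{*}$, started at zero, and then the plug-in estimate $\widehat{\mathrm{var}}(ε_{t}^{* 2}) \hat{G}^{- 1}$. The start-up value of $σ_{0}^{* 2}$ (the sample mean of $H_{t}^{2}$) and the function name are assumptions.

```python
import numpy as np

def qmle_sandwich(theta_star, H, y, x):
    """Plug-in estimate of Sigma* = var(eps*^2) G^{-1} at theta_star (Section 3.2)."""
    w, a, b, g = theta_star
    T = len(H)
    s2 = np.empty(T)
    D = np.zeros((T, 4))          # row t: d s*_t^2 / d(w*, a*, b*, g*)
    s2[0] = np.mean(H ** 2)       # start-up value (assumption)
    for t in range(1, T):
        s2[t] = w + a * y[t - 1] ** 2 + b * s2[t - 1] + g * x[t - 1] ** 2
        # derivative recursion; unrolled, it gives the sums in Eq. (20)
        D[t] = np.array([1.0, y[t - 1] ** 2, s2[t - 1], x[t - 1] ** 2]) + b * D[t - 1]
    G = (D[:, :, None] * D[:, None, :] / s2[:, None, None] ** 2).mean(axis=0)
    eps_star = H / np.sqrt(s2)    # standardized residuals eps*_t
    Sigma = np.var(eps_star ** 2) * np.linalg.inv(G)
    return Sigma, D

# Usage with arbitrary illustrative inputs; standard errors are sqrt(diag(Sigma) / T)
rng = np.random.default_rng(3)
y, x = rng.standard_normal(1500), rng.standard_normal(1500)
Sigma, D = qmle_sandwich((0.1, 0.2, 0.5, 0.05), np.abs(y), y, x)
se = np.sqrt(np.diag(Sigma) / len(y))
```

For large $t$ the first column of `D` approaches $1/(1 - β^{*})$, matching the first entry of (20).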

Let $σ_{ω}^{* 2}, σ_{α}^{* 2}, σ_{β}^{* 2}, σ_{γ}^{* 2}$ denote the asymptotic variances of ${\hat{ω}}^{*}, {\hat{α}}^{*}, {\hat{β}}^{*}, {\hat{γ}}^{*}$, respectively (the diagonal elements of $Σ^{*}$). Treating $μ$ as known, i.e., ignoring the sampling error of $\hat{μ}$, the asymptotic normality of ${\hat{θ}}^{*}$ and (7) yield:

$\begin{array}{l} \sqrt{T} (\hat{ω} - ω_{0}) \overset{L}{\to} N (0, σ_{ω}^{* 2} / μ^{2}) \\ \sqrt{T} (\hat{α} - α_{0}) \overset{L}{\to} N (0, σ_{α}^{* 2} / μ^{2}) \\ \sqrt{T} (\hat{β} - β_{0}) \overset{L}{\to} N (0, σ_{β}^{* 2}) \\ \sqrt{T} (\hat{γ} - γ_{0}) \overset{L}{\to} N (0, σ_{γ}^{* 2} / μ^{2}) \end{array}$ (21)

3.3. Selection of Optimal Volatility Representation

The asymptotic variance $Σ^{*} = \mathrm{var}(ε_{t}^{* 2}) G^{- 1}(θ_{0}^{*})$ shows that the choice of volatility representation affects the efficiency of the parameter estimates through $\mathrm{var}(ε_{t}^{* 2})$, so a criterion for selecting the volatility representation is needed. Based on the principle of minimizing the asymptotic variance of the estimates, we adopt the following criterion function for selecting the optimal volatility representation under quasi-maximum likelihood estimation:

$M H = \frac{E (H_{t}^{4})}{{(E H_{t}^{2})}^{2}} .$ (22)

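Li and Zhang (2021) prove that the smaller the value of $MH$, the smaller the asymptotic variance of the parameter estimates, and hence the more accurate the estimates. Intuitively, since $H_{t} = σ_{t} z_{H, t}$ with $σ_{t}$ independent of $z_{H, t}$, we have $MH = \frac{E(σ_{t}^{4})}{(E σ_{t}^{2})^{2}} E(ε_{t}^{* 4})$, where the first factor does not depend on the choice of $H$ and $E(ε_{t}^{* 4}) = 1 + \mathrm{var}(ε_{t}^{* 2})$. This paper likewise selects the optimal volatility representation by minimizing $MH$.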
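The sample analogue of (22) is a ratio of sample moments. As an illustration under the simplifying assumptions of a constant scale and Brownian intraday paths, $| y_{t} |$ behaves like $| N(0,1) |$, for which $MH = 3$, whereas the realized volatility built from $m$ Gaussian increments has $MH = 1 + 2/m$ (since $m \, RV_{t}^{2}$ is then $χ_{m}^{2}$ distributed), so finer sampling lowers MH. The sketch below (function name hypothetical) checks both values by simulation.

```python
import numpy as np

def mh_criterion(H):
    """Sample version of Eq. (22): mean(H^4) / mean(H^2)^2."""
    H = np.asarray(H, dtype=float)
    return np.mean(H ** 4) / np.mean(H ** 2) ** 2

rng = np.random.default_rng(2)
n_days, m = 40000, 48                         # e.g. 5-minute sampling: 240 / 5 = 48 returns
incr = rng.standard_normal((n_days, m)) / np.sqrt(m)
rv = np.sqrt(np.sum(incr ** 2, axis=1))       # realized volatility, Eq. (23)
abs_y = np.abs(incr.sum(axis=1))              # |y_t| from the same paths
mh_abs, mh_rv = mh_criterion(abs_y), mh_criterion(rv)
# mh_abs should be near 3 and mh_rv near 1 + 2/48
```

Stochastic volatility within the day and across days raises both values, which is consistent with the simulated Mean.MH values in Tables 1 and 2 being larger than these benchmarks.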

4. Simulation

We use numerical simulations to evaluate the finite-sample performance of the proposed estimator. To generate data from model (2a)-(2b), we follow Visser (2011) and first simulate the standard process $Z(u)$ as follows:

$\begin{array}{l} d Γ_{t} (u) = - δ (Γ_{t} (u) - μ_{Γ}) d u + σ_{Γ} d B_{t}^{(2)} (u) \\ d Z_{t} (u) = \exp (Γ_{t} (u)) d B_{t}^{(1)} (u), u \in [0, 1] \end{array}$

where $B_{t}^{(1)}$ and $B_{t}^{(2)}$ are two independent Brownian motions, $Z_{t}(0) = 0$, and $Γ_{t}(0)$ is drawn from the stationary distribution of the Ornstein-Uhlenbeck process $Γ_{t}$, namely $N(μ_{Γ}, σ_{Γ}^{2} / (2δ))$. Intraday time is sampled at a 1-minute frequency: the interval $[0, 1]$ is divided into 240 subintervals, corresponding to the 240 one-minute intervals of a typical trading day. The parameters are set to $δ = 1/2$, $σ_{Γ} = 1/4$, and $μ_{Γ} = -1/16$, so that the stationary variance of $Γ_{t}$ equals $σ_{Γ}^{2} = 1/16$ and $E(\exp(2 Γ_{t}(u))) = \exp(2 μ_{Γ} + σ_{Γ}^{2} / δ) = 1$; hence $\mathrm{Var}(Z_{t}(1)) = 1$. With these settings we simulate the discretized standard process $Z(u)$.
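A minimal Euler-scheme sketch of this standard process follows. It assumes one independent draw of $Γ_{t}(0)$ per day from $N(μ_{Γ}, σ_{Γ}^{2} / (2δ))$ and uses the parameter values above as defaults; the function name is hypothetical, and the Euler discretization is our choice rather than necessarily the scheme used in the paper.

```python
import numpy as np

def simulate_standard_process(n_days, n_steps=240, delta=0.5, sigma_g=0.25,
                              mu_g=-1 / 16, rng=None):
    """Euler discretization of
        dGamma = -delta (Gamma - mu_g) du + sigma_g dB2,
        dZ     = exp(Gamma) dB1,  u in [0, 1],  Z(0) = 0.
    Returns an array of shape (n_days, n_steps + 1) holding Z_t(i / n_steps).
    """
    rng = np.random.default_rng(rng)
    du = 1.0 / n_steps
    # Gamma_t(0) from the OU stationary law N(mu_g, sigma_g^2 / (2 delta))
    log_vol = rng.normal(mu_g, sigma_g / np.sqrt(2 * delta), size=n_days)
    Z = np.zeros((n_days, n_steps + 1))
    for i in range(n_steps):
        dB1 = rng.standard_normal(n_days) * np.sqrt(du)
        dB2 = rng.standard_normal(n_days) * np.sqrt(du)
        Z[:, i + 1] = Z[:, i] + np.exp(log_vol) * dB1      # left-point (Ito) step
        log_vol = log_vol - delta * (log_vol - mu_g) * du + sigma_g * dB2
    return Z

Z = simulate_standard_process(20000, rng=0)
# The sample variance of Z[:, -1] should be close to 1 under these parameter values.
```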

We consider two data-generating processes:

Model (1):

$\begin{array}{l} Y_{t} (u) = σ_{t} Z_{t} (u), \\ σ_{t}^{2} = 0.25 + 0.1 y_{t - 1}^{2} + 0.6 σ_{t - 1}^{2} + 0.01 x_{t - 1}^{2} \end{array}$

Model (2):

$\begin{array}{l} Y_{t} (u) = σ_{t} Z_{t} (u), \\ σ_{t}^{2} = 0.1 + 0.3 y_{t - 1}^{2} + 0.2 σ_{t - 1}^{2} + 0.05 x_{t - 1}^{2} . \end{array}$

The exogenous variable follows a t-distribution in Model (1) and a standard normal distribution in Model (2).
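Given simulated standard paths, the two data-generating processes can be sketched as follows. The function takes $Z$ as an input array, so any standard process can be plugged in. The start-up value $σ_{0}^{2} = ω/(1 - β)$, the Brownian stand-in paths in the usage lines, and the $t$ distribution with 5 degrees of freedom for Model (1) are illustrative assumptions (the paper does not report the degrees of freedom); the function name is hypothetical.

```python
import numpy as np

def simulate_scale_model(Z, x, omega, alpha, beta, gamma):
    """Scale model (2a)-(2b): Y_t(u) = sigma_t Z_t(u), with y_t = Y_t(1).

    Z: array (T, m + 1) of standard intraday paths with Z_t(0) = 0.
    x: exogenous series of length T; x[t - 1] enters sigma_t^2.
    """
    T = Z.shape[0]
    sigma2 = np.empty(T)
    Y = np.empty_like(Z)
    sigma2[0] = omega / (1.0 - beta)          # start-up value (assumption)
    Y[0] = np.sqrt(sigma2[0]) * Z[0]
    for t in range(1, T):
        y_prev = Y[t - 1, -1]                 # daily return y_{t-1} = Y_{t-1}(1)
        sigma2[t] = omega + alpha * y_prev ** 2 + beta * sigma2[t - 1] + gamma * x[t - 1] ** 2
        Y[t] = np.sqrt(sigma2[t]) * Z[t]
    return Y, sigma2

rng = np.random.default_rng(3)
T, m = 500, 240
# Brownian stand-in for the standard process; the OU-driven paths can be used instead
Z = np.concatenate([np.zeros((T, 1)),
                    np.cumsum(rng.standard_normal((T, m)) / np.sqrt(m), axis=1)], axis=1)
x1, x2 = rng.standard_t(5, size=T), rng.standard_normal(T)
Y1, s2_1 = simulate_scale_model(Z, x1, 0.25, 0.1, 0.6, 0.01)   # Model (1)
Y2, s2_2 = simulate_scale_model(Z, x2, 0.1, 0.3, 0.2, 0.05)    # Model (2)
```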

Following Visser (2011), we use realized volatility (RV) as the volatility representation in the estimation, computed as

$R V_{t} (k) = \sqrt{\sum_{i = 1}^{m} {[Y_{t} (u_{i, k}) - Y_{t} (u_{(i - 1), k})]}^{2}},$ (23)

where $k$ is the sampling interval in minutes and $m = 240 / k$ is the number of intraday returns per day. In the simulation, sampling intervals of 5, 15, and 30 minutes give the realized volatilities $RV5$, $RV15$, and $RV30$, respectively. To illustrate the benefit of high-frequency information, the daily representation $H_{t} = | y_{t} |$ is also considered for comparison. The number of trading days is set to $T = 500, 700, 900$, and each design is replicated 1000 times.
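Equation (23) can be computed for all days at once from an array of cumulative intraday returns on the 1-minute grid. In this sketch (function name hypothetical) each row has 241 entries, $Y_{t}(0) = 0$ through $Y_{t}(1)$, and $k$ must divide 240, so that $m = 240 / k$. Taking $k = 240$ collapses $RV_{t}$ to $| y_{t} |$, so the daily representation can be viewed as the coarsest realized volatility.

```python
import numpy as np

def realized_volatility(Y, k):
    """Eq. (23): RV_t(k) from every k-th point of the 1-minute grid, one value per row (day)."""
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    n_intervals = Y.shape[1] - 1
    if n_intervals % k != 0:
        raise ValueError("k must divide the number of intraday intervals")
    sub = Y[:, ::k]                           # n_intervals // k + 1 sampled points per day
    return np.sqrt(np.sum(np.diff(sub, axis=1) ** 2, axis=1))

rng = np.random.default_rng(4)
Y = np.concatenate([np.zeros((3, 1)), np.cumsum(rng.standard_normal((3, 240)), axis=1)], axis=1)
rv = {k: realized_volatility(Y, k) for k in (1, 5, 15, 30)}     # RV1, RV5, RV15, RV30
assert np.allclose(realized_volatility(Y, 240), np.abs(Y[:, -1]))  # k = 240 gives |y_t|
```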

Tables 1 and 2 present the parameter estimation results under different volatility representations and numbers of trading days. Bias is the estimation bias, SD is the standard deviation of the estimates across replications, and Mean.MH is the average across replications of the criterion value MH from (22). Under both designs, the proposed method estimates the unknown parameters accurately. As the sample size increases, the bias, standard deviation, and MH mean for a given volatility representation generally decrease. For a given sample size, the bias, standard deviation, and MH mean obtained with realized volatility $RV$ as the volatility representation are smaller than those obtained with $| y_{t} |$, and the MH mean decreases as the sampling frequency of $RV$ increases. The volatility representations can thus be ranked by estimation efficiency as $RV5 > RV15 > RV30 > | y_{t} |$, indicating that incorporating high-frequency data through the volatility representation substantially improves parameter estimation.

Table 1. The estimated results of Model 1.

T	Parameter	Statistic	$| y_{t} |$	$R V 5$	$R V 15$	$R V 30$
$T = 500$	$\hat{ω}$	Bias	0.0449	−0.0001	0.0022	0.0053
	$\hat{ω}$	SD	0.1966	0.0633	0.0727	0.0886
	$\hat{α}$	Bias	0.0048	0.0034	0.0037	0.0045
	$\hat{α}$	SD	0.0572	0.0218	0.0242	0.0285
	$\hat{β}$	Bias	−0.0677	−0.00677	−0.0108	−0.0157
	$\hat{β}$	SD	0.2507	0.0813	0.0952	0.1164
	$\hat{γ}$	Bias	0.0132	0.0019	0.0028	0.0037
	$\hat{γ}$	SD	0.0295	0.0102	0.0119	0.0137
	Mean.MH		4.0051	1.4279	1.5477	1.7233
$T = 700$	$\hat{ω}$	Bias	0.0319	0.0001	0.0015	0.0058
	$\hat{ω}$	SD	0.1723	0.0526	0.0623	0.0759
	$\hat{α}$	Bias	0.0048	0.0017	0.0019	0.0028
	$\hat{α}$	SD	0.0527	0.0171	0.0204	0.0236
	$\hat{β}$	Bias	0.0095	−0.0040	−0.0064	−0.0127
	$\hat{β}$	SD	0.2257	0.0675	0.0817	0.0988
	$\hat{γ}$	Bias	−0.0487	0.0010	0.0018	0.0024
	$\hat{γ}$	SD	0.0254	0.0089	0.0103	0.0119
	Mean.MH		3.9818	1.4083	1.522	1.6933
$T = 900$	$\hat{ω}$	Bias	0.0302	0.0000	0.0004	0.0014
	$\hat{ω}$	SD	0.1595	0.0492	0.0562	0.0660
	$\hat{α}$	Bias	0.0021	0.0020	0.0024	0.0022
	$\hat{α}$	SD	0.0422	0.0153	0.0175	0.0203
	$\hat{β}$	Bias	−0.0426	−0.0031	−0.0040	−0.0056
	$\hat{β}$	SD	0.2046	0.0623	0.0726	0.0854
	$\hat{γ}$	Bias	0.0073	0.0005	0.0009	0.0017
	$\hat{γ}$	SD	0.0227	0.0075	0.0084	0.0100
	Mean.MH		3.9487	1.3975	1.5137	1.6821

Table 2. The estimated results of Model 2.

T	Parameter	Statistic	$| y_{t} |$	$R V 5$	$R V 15$	$R V 30$
$T = 500$	$\hat{ω}$	Bias	0.0321	−0.0034	−0.0011	0.0025
	$\hat{ω}$	SD	0.1550	0.0504	0.0588	0.0710
	$\hat{α}$	Bias	0.0071	0.0028	0.0027	0.0029
	$\hat{α}$	SD	0.0603	0.0207	0.0232	0.0273
	$\hat{β}$	Bias	−0.0632	−0.0039	−0.0074	−0.0130
	$\hat{β}$	SD	0.2333	0.0740	0.0869	0.1060
	$\hat{γ}$	Bias	0.0137	0.0032	0.0038	0.0045
	$\hat{γ}$	SD	0.0244	0.0069	0.0077	0.0091
	Mean.MH		4.1522	1.4744	1.5951	1.7747
$T = 700$	$\hat{ω}$	Bias	0.0221	−0.0011	0.0003	0.0030
	$\hat{ω}$	SD	0.1383	0.0409	0.0498	0.0586
	$\hat{α}$	Bias	0.0066	0.0031	0.0031	0.0040
	$\hat{α}$	SD	0.0512	0.0178	0.0203	0.0234
	$\hat{β}$	Bias	−0.0460	−0.0053	−0.0075	−0.0128
	$\hat{β}$	SD	0.2067	0.0595	0.0721	0.0862
	$\hat{γ}$	Bias	0.0106	0.0025	0.0030	0.0039
	$\hat{γ}$	SD	0.0193	0.0062	0.0069	0.0081
	Mean.MH		4.1288	1.4579	1.5768	1.7554
$T = 900$	$\hat{ω}$	Bias	0.0204	−0.0008	0.0003	0.0017
	$\hat{ω}$	SD	0.1242	0.0369	0.0430	0.0510
	$\hat{α}$	Bias	0.0065	0.0022	0.0024	0.0031
	$\hat{α}$	SD	0.0470	0.0156	0.0179	0.0206
	$\hat{β}$	Bias	−0.0449	−0.0061	−0.0085	−0.0115
	$\hat{β}$	SD	0.1861	0.0537	0.0635	0.0754
	$\hat{γ}$	Bias	0.0096	0.0021	0.0027	0.0032
	$\hat{γ}$	SD	0.0169	0.0051	0.0060	0.0068
	Mean.MH		4.0352	1.4342	1.5542	1.7288

5. Real Data Analysis

5.1. Data Description

To demonstrate that the high-frequency GARCH-X model constructed in this paper has better predictive performance, we fit it to the CSI 300 Index, using the CSI 500 Index as the exogenous variable. The data cover the period from January 3, 2019 to July 15, 2021, a total of 615 trading days, at a one-minute sampling frequency. All data are sourced from the Tongdaxin Financial Terminal.

We select the CSI 300 Index because it is one of the earliest indices launched in China, comprising 300 large, liquid, and highly representative securities listed on the Shanghai and Shenzhen Stock Exchanges. The index serves a price-discovery function, characterizes stock price fluctuations, and is a key indicator of the overall trend of the Chinese stock market. The CSI 500 Index consists of the 500 stocks with the largest total market capitalization after excluding the constituents of the CSI 300 Index and the 300 largest A-shares by total market capitalization; it is an important indicator of the overall performance of small-cap stocks. The two indices are correlated, so this paper uses the CSI 500 Index as an exogenous variable in modeling the CSI 300 Index.

The opening hours of China’s stock exchanges are from 9:30 AM to 11:30 AM and from 1:00 PM to 3:00 PM, totaling 4 hours or 240 minutes. Therefore, all price sequence data consist of 1-minute closing prices, generating 240 data points per day. Data at a 5-minute frequency yields 48 data points per day, while data at a 10-minute frequency produces 24 data points per day, and data at a 30-minute frequency results in 8 data points per day.

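The cumulative intraday log return (in percent) of day $t$ up to time $u$, measured from the previous day's close, is computed as

$Y_{t}(u) = 100 \left[\ln P_{t}(u) - \ln P_{t - 1}(1)\right],$

so that the daily log return is $y_{t} = Y_{t}(1)$.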
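A sketch of this transformation for one trading day follows, assuming `prices` holds the 240 one-minute closing prices of day $t$ and `prev_close` is $P_{t - 1}(1)$. Prepending $Y_{t}(0) = 0$ gives the 241-point grid used for realized volatility, so under this convention the first increment also contains the overnight return. The function name and the prices in the usage lines are hypothetical.

```python
import numpy as np

def intraday_path(prices, prev_close):
    """Y_t(u) = 100 * [ln P_t(u) - ln P_{t-1}(1)] on the 1-minute grid, with Y_t(0) = 0."""
    prices = np.asarray(prices, dtype=float)
    path = 100.0 * (np.log(prices) - np.log(prev_close))
    return np.concatenate([[0.0], path])

# Illustrative prices only: a day drifting up from a previous close of 4000
prices = 4000.0 * np.exp(np.linspace(0.0005, 0.01, 240))
Y = intraday_path(prices, prev_close=4000.0)
y_t = Y[-1]   # daily log return in percent, y_t = Y_t(1); here approximately 1.0
```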

Using the realized volatility formula (23), we compute the realized volatility of intraday returns from data sampled at 1-, 5-, 10-, and 30-minute intervals, denoted RV1, RV5, RV10, and RV30, respectively. Figure 1 presents the time series of the four realized volatilities for the CSI 300 Index; no apparent periodicity is visible. The ADF test rejects the unit-root null hypothesis for all four series (p-value 0.03 for RV1 and 0.01 for the others), so we treat them as stationary and use them for prediction.

Figures 2 and 3 present, respectively, the realized volatility series and the Q-Q plot of returns for the exogenous variable, the CSI 500 Index. Figure 2 shows no apparent periodicity in the exogenous variable. According to the ADF test, its four realized volatility series are stationary (p-value 0.03 for RV1 and 0.01 for the others), and the Q-Q plot in Figure 3 indicates that the CSI 500 returns are not normally distributed.

Figure 1. The realized volatility series plot of the CSI 300 Index.

Figure 2. The realized volatility series plot of the CSI 500 Index.

Figure 3. The Q-Q plot of the returns of the CSI 500 Index.

5.2. Estimation and Volatility Selection of High-Frequency GARCH-X Model

Applying the estimation method for the GARCH-X model with high-frequency data described above, Table 3 reports the parameter estimates and the corresponding MH estimates under different volatility representations.

Table 3. Parameter estimates and MH values of the high-frequency GARCH-X model.

$H_{t}$	$\hat{ω}$	$\hat{α}$	$\hat{β}$	$\hat{γ}$	$\hat{M} H$
$| y_{t} |$	0.0187	0.1211	0.8007	0.0314	7.5758
$R V 1$	0.0044	0.2477	0.5402	0.14	1.8376
$R V 5$	0.0161	0.1065	0.6671	0.2951	2.1272
$R V 10$	0.0076	0.1103	0.8740	0.0059	2.1753
$R V 30$	0.0094	0.0947	0.8829	0.0018	2.4198

Table 3 shows that, among the four sampling frequencies, the 1-minute representation RV1 yields the smallest MH value and that MH increases as the sampling frequency decreases. Correspondingly, $\mathrm{Var}(ε_{t}^{* 2})$ is smallest for RV1 and increases as the frequency decreases. Moreover, the MH values at all intraday frequencies are smaller than that of the daily representation $| y_{t} |$. We therefore select RV1 as the optimal volatility representation and take its estimates as the final parameter estimates.

For comparison, fitting the low-frequency GARCH(1,1)-X model (i.e., $H_{t} = | y_{t} |$) to the CSI 300 Index with the CSI 500 Index as the exogenous variable gives:

$\begin{array}{l} y_{t} = σ_{t} ε_{t} \\ {\hat{σ}}_{t}^{2} = 0.0187 + 0.1211 y_{t - 1}^{2} + 0.8007 {\hat{σ}}_{t - 1}^{2} + 0.0314 x_{t - 1}^{2} \end{array}$ (24)

The fitting result of the GARCH(1,1)-X model with realized volatility in 1 minute is as follows:

$\begin{array}{l} y_{t} = σ_{t} ε_{t} \\ {\hat{σ}}_{t}^{2} = 0.0044 + 0.2477 y_{t - 1}^{2} + 0.5402 {\hat{σ}}_{t - 1}^{2} + 0.14 x_{t - 1}^{2} \end{array}$ (25)

Based on the obtained fitting results, the volatility estimates based on daily and 1-minute realized volatilities are shown in Figure 4. It is evident from the figure that the trends of these two estimates are nearly identical.

Figure 4. Volatility estimation plot.

5.3. VaR Estimation Using the High-Frequency GARCH-X Model

Financial risk management receives strong emphasis in today's financial markets. Value at Risk (VaR), which quantifies the maximum potential loss of a portfolio over a given holding period at a given confidence level, is an effective and widely used risk measure.

This paper applies the high-frequency GARCH(1,1)-X estimates in Equation (25) to the estimation and forecasting of VaR for the CSI 300 Index. Denote the daily log return of the CSI 300 Index by $H_{t}$. The procedure is as follows. First, the parameter vector $θ = {(ω, α, β, γ)}^{τ}$ of the GARCH(1,1)-X model is estimated from the high-frequency data, giving $\hat{θ} = (0.0044, 0.2477, 0.5402, 0.14)$. Next, substituting $\hat{θ}$, the daily log returns and the exogenous series $x_{t}$ into the conditional variance equation (25) yields 514 estimated conditional variances, denoted ${\hat{h}}_{t}$. The standardized residuals are then computed as $ε_{t} = H_{t} / \sqrt{{\hat{h}}_{t}}$. With quantile level $p = 0.1$, let ${\hat{ε}}_{0.1}$ be the empirical 0.1 quantile of the sequence $ε_{t}$. Finally, ${\hat{q}}_{t} = {\hat{ε}}_{0.1} \sqrt{{\hat{h}}_{t}}$ is the VaR estimate of $H_{t}$ for day $t$ at level $p = 0.1$, computed with the information available at time $t - 1$.
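The steps above can be sketched in Python as follows, assuming the conditional variances ${\hat{h}}_{t}$ have already been computed from Equation (25). The function name is hypothetical, and using `numpy.quantile` with its default linear interpolation is an assumption, since the paper does not state how the empirical quantile is computed.

```python
import numpy as np


def var_from_variances(returns, h_hat, p=0.1):
    """In-sample VaR: q_t = eps_p * sqrt(h_t), with eps_p the empirical
    p-quantile of the standardized residuals H_t / sqrt(h_t)."""
    returns = np.asarray(returns, dtype=float)
    h_hat = np.asarray(h_hat, dtype=float)
    eps = returns / np.sqrt(h_hat)      # standardized residuals
    eps_p = np.quantile(eps, p)         # empirical p-quantile (linear interpolation)
    q_hat = eps_p * np.sqrt(h_hat)      # VaR estimate for each day
    hits = returns < q_hat              # violation indicators I_t = I(H_t < q_t)
    return q_hat, eps_p, hits
```

The boolean array `hits` gives the indicators $I_{t}$, so `hits.sum()` is the number of days on which the return falls below the VaR.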

For comparison and forecasting, the first 514 trading days are used as the training set to estimate the model parameters and the in-sample VaR. A rolling forecasting scheme is then used to compute VaR forecasts for the subsequent 101 trading days (the forecasting set). Figure 5 shows the VaR estimates for the training set and the VaR forecasts for the forecasting set.
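A simplified sketch of one-step-ahead VaR forecasting over the forecasting set is given below. For illustration only, it assumes that $\hat{θ}$ and the residual quantile ${\hat{ε}}_{0.1}$ are held fixed at their training-set values and that only the conditional variance is updated as each new day arrives; if the rolling scheme used in the paper re-estimates the model in each window, that step is omitted here. All names are hypothetical.

```python
import numpy as np


def rolling_var_forecasts(y, x, n_train, params, eps_p, sigma2_init):
    """One-step-ahead VaR forecasts for days n_train, ..., len(y) - 1.

    Assumption: params = (omega, alpha, beta, gamma) and eps_p stay fixed at
    their training-set values; only the conditional variance is updated.
    """
    omega, alpha, beta, gamma = params
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if not 1 <= n_train < len(y):
        raise ValueError("n_train must satisfy 1 <= n_train < len(y)")
    sigma2 = sigma2_init  # conditional variance of day 0
    forecasts = []
    for t in range(1, len(y)):
        # The variance of day t uses only y and x observed up to day t - 1.
        sigma2 = omega + alpha * y[t - 1] ** 2 + beta * sigma2 + gamma * x[t - 1] ** 2
        if t >= n_train:
            forecasts.append(eps_p * np.sqrt(sigma2))
    return np.array(forecasts)
```

With 514 training days and 615 days in total, this returns 101 forecasts, matching the size of the forecasting set.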

Figure 5. VaR variation chart for training and prediction sets of the CSI 300 Index.

The left panel of Figure 5 shows the VaR estimates for the training set at the 0.1 level. During this period the VaR estimate lies above the actual return (a VaR violation) on over twenty days, while on most days the actual return stays above the VaR. In theory, with 514 trading days in the training set, about 10% of the observations, i.e. roughly 51 trading days (514 × 0.1 = 51.4), should fall below the VaR. This indicates that the VaR estimates for the training set are satisfactory. The right panel of Figure 5 presents the forecasts for the forecasting set, which consists of 101 trading days; at the 0.1 level, about 10 observations (101 × 0.1 = 10.1) would be expected to fall below the VaR. In this forecasting set exactly 10 observations do, which suggests that the forecasts are also satisfactory. Overall, the high-frequency GARCH model with exogenous variables shows good predictive power for VaR.

To evaluate how well the high-frequency GARCH(1,1)-X model estimates and predicts VaR for the CSI 300 Index, the Mean Absolute Error (MAE) is used as a criterion for the VaR predictions:

$\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} | {\hat{y}}_{i} - y_{i} |,$

where $n$ is the number of predictions, ${\hat{y}}_{i}$ is the predicted value and $y_{i}$ is the corresponding actual value.

In addition, following Iqbal and Mukherjee (2012), the VaR estimates for the training set are tested as follows. Let $I_{t} = I (H_{t} < {\hat{q}}_{t})$ indicate a VaR violation on day $t$, let $N$ be the total number of observations, and let $N^{*} = \sum_{t = 1}^{N} I_{t}$ and $\hat{p} = \frac{N^{*}}{N}$ be the number and rate of violations. For $i, j = 0, 1$, let $N_{i, j}$ be the number of days $t$ with $2 \leq t \leq N$ such that $I_{t - 1} = j$ and $I_{t} = i$, and let ${\hat{τ}}_{i, j} = \frac{N_{i, j}}{N_{i, 0} + N_{i, 1}}$ and $\hat{τ} = \frac{N_{0, 1} + N_{1, 1}}{N}$. Define:

$LR_{uc} = 2 \log \left[ \frac{(1 - \hat{p})^{N - N^{*}} \hat{p}^{N^{*}}}{(1 - p)^{N - N^{*}} p^{N^{*}}} \right],$

$LR_{ind} = 2 \log \left[ \frac{(1 - {\hat{τ}}_{0, 1})^{N_{0, 0}} {\hat{τ}}_{0, 1}^{N_{0, 1}} (1 - {\hat{τ}}_{1, 1})^{N_{1, 0}} {\hat{τ}}_{1, 1}^{N_{1, 1}}}{(1 - \hat{τ})^{N_{0, 0} + N_{1, 0}} \hat{τ}^{N_{0, 1} + N_{1, 1}}} \right],$

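Then, under the null hypothesis that violations occur independently with probability $p$, $LR_{uc}$ and $LR_{ind}$ are each asymptotically distributed as $χ^{2}(1)$, and the conditional coverage statistic $LR_{cc} = LR_{uc} + LR_{ind}$ is asymptotically distributed as $χ^{2}(2)$.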
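The three statistics can be computed from the violation indicators $I_{t}$ with the following Python sketch (function names are hypothetical). The code stores transition counts with the previous state as the first index, the usual convention for this Markov-chain test, whereas the text above uses the transposed indexing; $LR_{ind}$ is the same under either convention because it reduces to the likelihood-ratio test of independence in the 2×2 table of transition counts. One deliberate difference: the pooled rate $\hat{τ}$ is computed over the $N - 1$ transitions rather than $N$, which matters little for large $N$. The p-value uses the fact that the $χ^{2}(2)$ survival function is $e^{-x/2}$.

```python
import math


def _xlogy(a, b):
    """Return a * log(b), using the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(b)


def coverage_tests(hits, p):
    """Return (LR_uc, LR_ind, LR_cc, p-value of LR_cc) for 0/1 violation indicators."""
    hits = [int(h) for h in hits]
    n = len(hits)
    if n < 2:
        raise ValueError("need at least two observations")
    n1 = sum(hits)  # N*: number of VaR violations
    n0 = n - n1
    p_hat = n1 / n
    # Unconditional coverage: observed violation rate p_hat versus nominal p.
    lr_uc = 2 * (_xlogy(n0, 1 - p_hat) + _xlogy(n1, p_hat)
                 - _xlogy(n0, 1 - p) - _xlogy(n1, p))
    # counts[i][j] = number of t >= 2 with I_{t-1} = i and I_t = j.
    counts = [[0, 0], [0, 0]]
    for prev, cur in zip(hits[:-1], hits[1:]):
        counts[prev][cur] += 1
    (n00, n01), (n10, n11) = counts
    tau01 = n01 / (n00 + n01) if n00 + n01 else 0.0
    tau11 = n11 / (n10 + n11) if n10 + n11 else 0.0
    tau = (n01 + n11) / (n - 1)  # pooled violation rate over the N - 1 transitions
    ll_markov = (_xlogy(n00, 1 - tau01) + _xlogy(n01, tau01)
                 + _xlogy(n10, 1 - tau11) + _xlogy(n11, tau11))
    ll_iid = _xlogy(n00 + n10, 1 - tau) + _xlogy(n01 + n11, tau)
    lr_ind = 2 * (ll_markov - ll_iid)
    lr_cc = lr_uc + lr_ind
    # The chi-square(2) survival function is exp(-x / 2).
    return lr_uc, lr_ind, lr_cc, math.exp(-lr_cc / 2)
```

Passing the training-set indicators (for example, the boolean `hits` array from the VaR step) together with `p=0.1` returns $LR_{uc}$, $LR_{ind}$, $LR_{cc}$ and the asymptotic p-value of $LR_{cc}$.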

Table 4 compares the low-frequency GARCH model with exogenous variables, the high-frequency GARCH model, and the high-frequency GARCH model with exogenous variables.

Table 4. The comparison results of the models.

Model	LRcc (training set)	MAE (forecasting set)
Low-frequency GARCH model with exogenous variables	23.5	0.6132
High-frequency GARCH model with exogenous variables	0.0255	0.4317
High-frequency GARCH model	3.15	0.4446

As Table 4 shows, in the training set the high-frequency GARCH model with exogenous variables has the smallest LRcc value, followed by the high-frequency GARCH model, while the low-frequency GARCH model with exogenous variables has the largest. Since the 5% critical value of the $χ^{2}(2)$ distribution is about 5.99, the conditional coverage hypothesis is rejected only for the low-frequency model. In the forecasting set, the MAE values follow the same ordering as the training-set LRcc values. These results indicate that the predictions of the high-frequency GARCH model with exogenous variables are closest to the actual values, making it the most effective of the three models.

6. Conclusion

Building on the traditional GARCH-X model, this paper introduces QMLE estimation of the GARCH-X model using high-frequency data. Existing research on volatility representations and on parameter estimation for GARCH models with high-frequency data is extended to the GARCH-X model, and the approach is verified by simulation. In the empirical part, the returns of the CSI 300 Index are analyzed with the CSI 500 Index return as the exogenous variable, with the following findings: 1) based on the MH value, the 1-minute realized volatility RV1 is selected as the optimal volatility representation; 2) high-frequency data carry more information than daily data. The VaR estimation and forecasting results show that the model using high-frequency data outperforms the model using low-frequency data, and that the model with exogenous variables outperforms the model without them. This confirms the validity of the proposed high-frequency GARCH model with exogenous variables and supports its application in finance.

This study can be extended in several directions. First, only realized volatility is used here as the proxy for high-frequency volatility, which limits the choice of proxy. Many other proxies in the literature, such as the realized range and realized kernel estimators, merit further study; because they may capture different features of financial market volatility, they are expected to offer new insights into volatility modeling with high-frequency data. Second, to further improve model accuracy, other parameter estimation methods, such as quasi-maximum exponential likelihood estimation, should be explored in subsequent research. Lastly, this study focuses on the basic GARCH model. Future work could extend the analysis to other benchmark models such as GJR-GARCH, EGARCH, TGARCH, and FGARCH, and compare, in the high-frequency setting, how much the introduction of exogenous variables improves the predictive power of each model. Such a comparison would give a more complete picture of the strengths and weaknesses of different models for high-frequency prediction and provide a more targeted basis for model selection in practice.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1]	Apergis, N., & Rezitis, A. (2011). Food Price Volatility and Macroeconomic Factors: Evidence from GARCH and GARCH-X Estimates. Journal of Agricultural and Applied Economics, 43, 95-110. https://doi.org/10.1017/s1074070800004077
[2]	Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 31, 307-327. https://doi.org/10.1016/0304-4076(86)90063-1
[3]	Engle, R. F., Lilien, D. M., & Robins, R. P. (1987). Estimating Time Varying Risk Premia in the Term Structure: The Arch-M Model. Econometrica, 55, 391-407. https://doi.org/10.2307/1913242
[4]	Glosten, L. R., Jagannathan, R., & Runkle, D. E. (1993). On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks. The Journal of Finance, 48, 1779-1801. https://doi.org/10.1111/j.1540-6261.1993.tb05128.x
[5]	Han, H. (2015). Asymptotic Properties of GARCH-X Processes. Journal of Financial Econometrics, 13, 188-221. https://doi.org/10.1093/jjfinec/nbt023
[6]	Han, H., & Kristensen, D. (2014). Asymptotic Theory for the QMLE in GARCH-X Models with Stationary and Nonstationary Covariates. Journal of Business & Economic Statistics, 32, 416-429. https://doi.org/10.1080/07350015.2014.897954
[7]	Hansen, P. R., Huang, Z., & Shek, H. H. (2012). Realized GARCH: A Joint Model for Returns and Realized Measures of Volatility. Journal of Applied Econometrics, 27, 877-906. https://doi.org/10.1002/jae.1234
[8]	Iqbal, F., & Mukherjee, K. (2012). A Study of Value-at-Risk Based on M-Estimators of the Conditional Heteroscedastic Models. Journal of Forecasting, 31, 377-390. https://doi.org/10.1002/for.1224
[9]	Lee, O. (2017). Some Limiting Properties for GARCH (p, q)-X Processes. Journal of the Korean Data and Information Science Society, 28, 697-707.
[10]	Li, L. L., & Zhang, X. F. (2021). Daily Frequency GARCH Model Estimation Based on High Frequency Data. Journal of Guangxi Normal University (Natural Science Edition), 39, 68-78.
[11]	Nelson, D. B. (1991). Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica, 59, 347-370. https://doi.org/10.2307/2938260
[12]	Qian, Z., & Xu, X. (2023). An Option Valuation Formula for Stochastic Volatility Driven by GARCH Processes. Journal of Mathematical Finance, 13, 221-247. https://doi.org/10.4236/jmf.2023.132015
[13]	Singvejsakul, J., Chaovanapoonphol, Y., & Limnirankul, B. (2021). Modeling the Price Volatility of Cassava Chips in Thailand: Evidence from Bayesian GARCH-X Estimates. Economies, 9, Article 132. https://doi.org/10.3390/economies9030132
[14]	Straumann, D., & Mikosch, T. (2006). Quasi-Maximum-Likelihood Estimation in Conditionally Heteroscedastic Time Series: A Stochastic Recurrence Equations Approach. The Annals of Statistics, 34, 2449-2495. https://doi.org/10.1214/009053606000000803
[15]	Visser, M. P. (2009). Volatility Proxies and GARCH Models. University of Amsterdam.
[16]	Visser, M. P. (2011). GARCH Parameter Estimation Using High-Frequency Data. Journal of Financial Econometrics, 9, 162-197. https://doi.org/10.1093/jjfinec/nbq017
[17]	Wang, M., Chen, Z., & Wang, C. D. (2018). Composite Quantile Regression for GARCH Models Using High-Frequency Data. Econometrics and Statistics, 7, 115-133. https://doi.org/10.1016/j.ecosta.2016.11.004
[18]	Wu, X., Zhao, A., & Cheng, T. (2023). A Real-Time GARCH-MIDAS Model. Finance Research Letters, 56, Article 104103. https://doi.org/10.1016/j.frl.2023.104103
