This paper discusses parameter estimation in partial functional linear models with ARCH(p) errors. A hybrid estimation method based on functional principal component analysis is suggested. Asymptotic normality is obtained for the proposed estimators of both the linear parameter in the mean model and the parameter in the ARCH error model, and the convergence rate of the slope function estimate is established. Simulations and a real data analysis are conducted for illustration, and they show that the proposed method performs well in finite samples.

In order to combine the flexibility of linear regression models with the recent methodology for the functional linear regression models, partial functional linear models, which was introduced by [

$$Y = \beta' z + \int_T \gamma(t) X(t)\,dt + \varepsilon, \tag{1.1}$$

where $Y$ is a real-valued response random variable, $z$ is a $d$-dimensional vector of random variables with zero means and finite second moments, $X(t)$ is an explanatory functional variable defined on $T$ with zero mean and finite second moments (i.e. $E|X(t)|^2 < \infty$ for all $t \in T$), $\beta$ is a $d$-dimensional vector of unknown parameters, $\gamma(t)$ is a square integrable function on $T$, and $\varepsilon$ is a random error independent of $z$ and $X$. Without loss of generality, it is assumed that $T = [0,1]$ in the remainder of this paper. All random variables are defined on the same probability space $(\Omega, \mathcal{A}, P)$.

Model (1.1) has been studied by many authors from different points. From the view of estimation of model (1.1), for example, reference [

However, all the works have a common assumption that the responses are observed independently. As is well known, uncertainty such as volatility uncertainty is a common phenomenon in modern economic and financial theory. Therefore, the assumption of independence of the response observations is not valid in the real data analysis. Motivated by the fact mentioned above, we may want to reconsider the model (1.1) so that it can reflect the volatility of the data. Fortunately, conditional heteroscedasticity can reflect the size of volatility appropriately. One of the most popular models which can show the heteroscedasticity in econometrics is the autoregressive conditional heteroscedasticity (ARCH) model which was introduced by [

Given $n$ observations $\{(z_1, X_1, Y_1), \cdots, (z_n, X_n, Y_n)\}$, model (1.1) can be written as

$$Y_i = \beta' z_i + \int_0^1 \gamma(t) X_i(t)\,dt + \varepsilon_i. \tag{1.2}$$

The ARCH(p) model for { ε i } is defined by the following equations:

$$\begin{cases} \varepsilon_i = e_i h_i^{1/2}, \\ h_i = \alpha_0 + \alpha_1 \varepsilon_{i-1}^2 + \alpha_2 \varepsilon_{i-2}^2 + \cdots + \alpha_p \varepsilon_{i-p}^2, \end{cases} \tag{1.3}$$

where $\alpha_0 > 0$, $\alpha_i \ge 0$, $i = 1, \cdots, p$. Besides, $\{e_i : i \ge 1\}$ is an independent and identically distributed (i.i.d.) random sequence, independent of $\{\varepsilon_t : t < i\}$, with $E e_i = 0$ and $E e_i^2 = 1$. To establish the asymptotic properties of the joint model (1.2) and (1.3), we assume in this paper that the distribution functions $\{F_i\}$ of $\{\varepsilon_i^2\}$ are absolutely continuous with continuous densities $f_i$, which are uniformly bounded away from 0 and $\infty$ at the 1/2-th quantile points $\xi_i$, $i = 1, \cdots, n$. Moreover, similar to [

The ordinary regression models with ARCH errors have been considered by many authors. For example, [

The rest of the paper is organized as follows. Section 2 describes the estimation of the parameters of the partial functional linear regression model and of the ARCH(p) errors. The asymptotic theory of the proposed estimators is given in Section 3. In Section 4, we carry out a simulation study to illustrate the finite sample performance, and a real data analysis is conducted in Section 5. Some preliminary lemmas and the proofs of the theorems are presented in the Appendix.

First, we study how to produce the estimators $\hat\beta$, $\hat\gamma$ of $\beta$, $\gamma$. Let $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ denote the inner product and norm on $L^2[0,1]$, respectively. Denote the covariance function of the process $X$ by $C_X$, which is continuous on $T \times T$. Then we have the following expansion

$$C_X(s,t) = \sum_{j=1}^{\infty} \lambda_j \rho_j(s) \rho_j(t)$$

by Mercer’s theorem ( [

$$X_i(t) = \sum_{j=1}^{\infty} U_{ij} \rho_j(t)$$

and

$$\gamma(t) = \sum_{j=1}^{\infty} \gamma_j \rho_j(t),$$

where $U_{ij} = \langle X_i, \rho_j \rangle$ are uncorrelated random variables with $E[U_{ij}] = 0$ and $E[U_{ij}^2] = \lambda_j$, and $\gamma_j = \langle \gamma, \rho_j \rangle$. Then (1.2) is equivalent to

$$Y_i = \beta' z_i + \sum_{j=1}^{\infty} \gamma_j U_{ij} + \varepsilon_i, \quad i = 1, \cdots, n. \tag{2.1}$$

To estimate the parameters in (1.2), following [

$$Y_i \doteq \beta' z_i + \sum_{j=1}^{m} \gamma_j U_{ij} + \varepsilon_i, \quad i = 1, \cdots, n, \tag{2.2}$$

where $m \to \infty$ as $n \to \infty$. Furthermore, we employ the empirical version of $C_X$,

$$\hat C_X(s,t) = \frac{1}{n} \sum_{i=1}^{n} X_i(s) X_i(t) = \sum_{j=1}^{\infty} \hat\lambda_j \hat\rho_j(s) \hat\rho_j(t),$$

with $(\hat\lambda_j, \hat\rho_j)$ being the pairs of eigenvalues and eigenfunctions of the covariance operator associated with $\hat C_X$ and $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge 0$, and substitute $U_{ij}$ in (2.2) with $\hat U_{ij} = \langle X_i, \hat\rho_j \rangle$. To obtain a compact matrix form of model (2.2), denote $Y = (Y_1, \cdots, Y_n)'$, $Z = (z_1, \cdots, z_n)'$, $U_m = (\hat U_{ij})_{i=1,\cdots,n;\ j=1,\cdots,m}$, $\tilde\gamma = (\gamma_1, \cdots, \gamma_m)'$ and $\varepsilon = (\varepsilon_1, \cdots, \varepsilon_n)'$. Then (2.2) can be rewritten as

$$Y \doteq Z\beta + U_m \tilde\gamma + \varepsilon,$$

and the least squares estimators $\hat\beta$ and $\hat{\tilde\gamma}$ are given by

$$(\hat\beta', \hat{\tilde\gamma}')' = \arg\min_{\beta, \tilde\gamma}\, (Y - Z\beta - U_m\tilde\gamma)'(Y - Z\beta - U_m\tilde\gamma).$$

By simple calculation, we have

$$\hat\beta = (Z'(I - V_m)Z)^{-1} Z'(I - V_m)Y$$

with $V_m = U_m (U_m' U_m)^{-1} U_m'$ and

$$\hat{\tilde\gamma} = (U_m' U_m)^{-1} U_m' (Y - Z\hat\beta)$$

provided that ( Z ′ ( I − V m ) Z ) − 1 exists (this is true with probability tending to 1, see Lemma 1 in [

$$\hat\gamma(\cdot) = \sum_{j=1}^{m} \hat\gamma_j \hat\rho_j(\cdot),$$

where $\hat\gamma_j$ is the $j$th element of $\hat{\tilde\gamma}$.
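The least squares step above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes the curves are observed on an equally spaced grid of $[0,1]$ and approximates integrals by Riemann sums, and the function name `fpca_partial_linear_fit` and its arguments are hypothetical.

```python
import numpy as np

def fpca_partial_linear_fit(Y, Z, X, m):
    """Least squares fit of Y_i = beta'z_i + sum_{j<=m} gamma_j U_ij + eps_i.

    Y: (n,) responses; Z: (n, d) scalar covariates;
    X: (n, T) curves on an equally spaced grid of [0, 1] (assumption);
    m: number of principal components kept.
    """
    n, T = X.shape
    dt = 1.0 / T                                   # Riemann-sum quadrature weight
    C_hat = X.T @ X / n                            # empirical covariance on the grid
    evals, evecs = np.linalg.eigh(C_hat * dt)      # discretized covariance operator
    order = np.argsort(evals)[::-1][:m]            # indices of the m largest eigenvalues
    lam_hat = evals[order]
    rho_hat = evecs[:, order] / np.sqrt(dt)        # rescale so that int rho_j^2 dt = 1
    U = X @ rho_hat * dt                           # scores <X_i, rho_hat_j>

    def residual_on_U(A):
        # (I - V_m) A, with V_m the projection onto the columns of U
        return A - U @ np.linalg.lstsq(U, A, rcond=None)[0]

    # beta_hat = (Z'(I - V_m)Z)^{-1} Z'(I - V_m)Y
    beta_hat = np.linalg.solve(Z.T @ residual_on_U(Z), Z.T @ residual_on_U(Y))
    gamma_coef = np.linalg.lstsq(U, Y - Z @ beta_hat, rcond=None)[0]
    gamma_hat = rho_hat @ gamma_coef               # slope function on the grid
    resid = Y - Z @ beta_hat - U @ gamma_coef
    return beta_hat, gamma_hat, lam_hat, resid
```

The returned residuals are the inputs needed for any subsequent modelling of the error sequence.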

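To get asymptotic properties of $\hat\beta$, let $\hat C_z = n^{-1}\sum_{i=1}^n z_i z_i'$, $\hat C_{zY} = n^{-1}\sum_{i=1}^n z_i Y_i$, $\hat C_{zX}(t) = n^{-1}\sum_{i=1}^n z_i X_i(t)$, $\hat C_{Xz}(t) = (\hat C_{zX}(t))'$ and $\hat C_{YX}(t) = n^{-1}\sum_{i=1}^n Y_i X_i(t)$. Then $\hat\beta$ is equal to

$$\hat\beta = \Big( \hat C_z - \sum_{j=1}^{m} \frac{\langle \hat C_{zX}, \hat\rho_j \rangle \langle \hat C_{Xz}, \hat\rho_j \rangle}{\hat\lambda_j} \Big)^{-1} \Big( \hat C_{zY} - \sum_{j=1}^{m} \frac{\langle \hat C_{zX}, \hat\rho_j \rangle \langle \hat C_{YX}, \hat\rho_j \rangle}{\hat\lambda_j} \Big), \tag{2.3}$$

with $\langle \hat C_{zX}, \hat\rho_j \rangle = n^{-1}\sum_{i=1}^n z_i \langle X_i, \hat\rho_j \rangle$ and $\langle \hat C_{YX}, \hat\rho_j \rangle = n^{-1}\sum_{i=1}^n Y_i \langle X_i, \hat\rho_j \rangle$. Similarly, $\hat\gamma_j$ can be represented as $\hat\gamma_j = \langle \hat C_{YX} - \hat\beta' \hat C_{zX}, \hat\rho_j \rangle / \hat\lambda_j$, $j = 1, \cdots, m$.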

Having obtained the estimators $\hat\beta$ and $\hat\gamma$, we now turn to the estimation of $\alpha = (\alpha_0, \alpha_1, \cdots, \alpha_p)'$. Denote by

$$\hat\varepsilon_i = Y_i - \hat\beta' z_i - \sum_{j=1}^{m} \hat\gamma_j \hat U_{ij}, \quad i = 1, \cdots, n$$

the residuals. For ARCH(p) models, in view of the higher peak and heavy tail phenomenon, unlike Sastry’s idea that regress ε ^ i 2 on a column of ones and ε ^ i − 1 2 by minimizing the sum of the square of residuals, after getting the residuals ε ^ i , i = 1 , ⋯ , n to get the parameter’s estimate of ARCH (1) sequence ( [

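$$\hat\alpha = \arg\min_{\alpha \in R^{p+1}} \sum_{i=p+1}^{n} \big| \hat\varepsilon_i^2 - \alpha_0 - \alpha_1 \hat\varepsilon_{i-1}^2 - \cdots - \alpha_p \hat\varepsilon_{i-p}^2 \big|, \tag{2.4}$$

where $\hat\alpha = (\hat\alpha_0, \hat\alpha_1, \cdots, \hat\alpha_p)'$.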
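A criterion of the form (2.4) is a least absolute deviation (LAD) regression of the squared residuals on their own lags. The sketch below approximates the minimizer by iteratively reweighted least squares, which is a convenient stand-in chosen here and not a solver named in the text; an exact solution would use linear programming. The names `lad_fit` and `lad_arch` are hypothetical.

```python
import numpy as np

def lad_fit(A, y, n_iter=100, eps=1e-8):
    """Approximate argmin_c sum_i |y_i - (A c)_i| by iteratively reweighted LS."""
    coef = np.linalg.lstsq(A, y, rcond=None)[0]           # start from the LS fit
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(y - A @ coef), eps)   # weights 1/|residual|
        sw = np.sqrt(w)
        coef = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return coef

def lad_arch(resid, p):
    """LAD regression of resid_i^2 on (1, resid_{i-1}^2, ..., resid_{i-p}^2)."""
    e2 = np.asarray(resid, dtype=float) ** 2
    n = e2.size
    # column k holds resid_{i-k}^2 for i = p+1, ..., n (1-based indices)
    lags = [e2[p - k : n - k] for k in range(1, p + 1)]
    A = np.column_stack([np.ones(n - p)] + lags)
    return lad_fit(A, e2[p:])
```

Unlike a least squares fit, the LAD fit is barely moved by a single very large squared residual, which is the motivation for using it with heavy-tailed errors.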

We first state the assumptions under which the asymptotic properties are proved, and then present the theorems. Let $\rho(D)$ and $D^{\otimes m}$ denote the spectral radius and the Kronecker power ($m$-fold Kronecker product) of a matrix $D$, respectively, and let $\mathcal{F}_t = \sigma\{\varepsilon_i : i \le t\}$ in the following.

It is easy to see that $E[\varepsilon_t \mid \mathcal{F}_{t-1}] = 0$ and $E[\varepsilon_t^2 \mid \mathcal{F}_{t-1}] = h_t$; namely, the ARCH(p) process forms a martingale difference sequence with $E[\varepsilon_t^2] = \alpha_0 / (1 - \alpha_1 - \cdots - \alpha_p)$. In order to obtain a stationary solution and guarantee the existence of higher moments of $\{\varepsilon_t\}$, we suppose that

$$0 < \alpha_0 < \infty, \quad \sum_{j=1}^{p} \alpha_j < 1, \quad \rho(\Sigma_r) < 1 \tag{3.1}$$

for some integer $r \ge 1$, where $\Sigma_r = E(D_t^{\otimes r})$,

$$D_t = \begin{pmatrix} \alpha_1 e_t^2 & \cdots & \alpha_{p-1} e_t^2 & \alpha_p e_t^2 \\ 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix}.$$

Then, as [

$$\varepsilon_t^2 = \alpha_0 \Big[ e_t^2 + \sum_{j=1}^{\infty} e_{t-j}^2\, \delta_1' \Big( \prod_{i=0}^{j-1} D_{t-i} \Big) \delta_1 \Big] \tag{3.2}$$

with $\delta_1 = (1, 0, \cdots, 0)'$.

In the following, let $C$ denote a positive constant which may change from line to line. It is assumed that the random function $X$ satisfies

$$E\|X\|^4 < \infty \tag{3.3}$$

and, for each $j$,

$$E[U_j^4] \le C \lambda_j^2 \tag{3.4}$$

for some constant $C$. For the eigenvalues of $C_X$, assume that there exist $C$ and $a > 1$ such that

$$C^{-1} j^{-a} \le \lambda_j \le C j^{-a}, \quad \lambda_j - \lambda_{j-1} \ge C j^{-a-1}, \quad j \ge 1 \tag{3.5}$$

to prevent the spacings among the eigenvalues from being too small. In order to guarantee that the regression weight function $\gamma$ is smoother than the sample path $X$, we suppose that the Fourier coefficients $\gamma_j$ satisfy

$$|\gamma_j| \le C j^{-b} \tag{3.6}$$

for some constant $C$ and $b > a/2 + 1$. For the tuning parameter, we assume that

$$m \sim n^{1/(a+2b)}, \tag{3.7}$$

where $a_n \sim b_n$ means that there exist constants $0 < L < M < \infty$ such that $L \le a_n / b_n \le M$ for all $n$. Besides, we also assume that

$$E\|z\|_{R^d}^4 < \infty \tag{3.8}$$

for the random vector $z$, with $\|z\|_{R^d} = (z'z)^{1/2}$, and that $C_{z_k X}(\cdot) = \mathrm{Cov}(z_k, X(\cdot))$ satisfies

$$|\langle C_{z_k X}, \rho_j \rangle| \le C j^{-(a+b)} \tag{3.9}$$

for each $k = 1, 2, \cdots, d$ and $j \ge 1$.

Let $\eta_{ik} = z_{ik} - \langle \chi_k, X_i \rangle$, where $\chi_k = \sum_{j=1}^{\infty} (\langle C_{z_k X}, \rho_j \rangle / \lambda_j) \rho_j$. Then $\eta_{1k}, \cdots, \eta_{nk}$ are i.i.d. random variables. We suppose that

$$E[\eta_{1k} \mid X_1, \cdots, X_n] = 0, \quad E[\eta_{1k}^2 \mid X_1, \cdots, X_n] = B_{kk},$$

where $B_{kk}$ is the $k$th diagonal element of

$$B = E[\eta_1 \eta_1'] = C_z - \sum_{j=1}^{\infty} \frac{\langle C_{zX}, \rho_j \rangle \langle C_{Xz}, \rho_j \rangle}{\lambda_j}, \tag{3.10}$$

which is assumed to be positive definite, and $\eta_i = (\eta_{i1}, \cdots, \eta_{id})'$.

Under the assumptions stated above, we have the following results.

Theorem 1. If the assumptions (3.1) with $r = 2$ and (3.3)-(3.10) hold, we have

$$n^{1/2} (\hat\beta - \beta) \to_d N\Big( 0, \frac{\alpha_0}{1 - \alpha_1 - \cdots - \alpha_p} B^{-1} \Big)$$

as $n \to \infty$, where "$\to_d$" denotes convergence in distribution.

Theorem 2. Under the assumptions (3.1) with $r = 1$ and (3.3)-(3.10), one has

$$\|\hat\gamma - \gamma\|^2 = O_p\big( n^{-(2b-1)/(a+2b)} \big).$$

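Theorem 3. Under the conditions on $\{\varepsilon_i\}$ and the assumptions (3.1) with $r = 2$ and (3.3)-(3.10), we have

$$n^{1/2} (\hat\alpha - \alpha) \to_d N\Big( 0, \frac{1}{4} D_1^{-1} P D_1^{-1} \Big)$$

as $n \to \infty$, where

$$P = E \begin{pmatrix} 1 & \varepsilon_p^2 & \varepsilon_{p-1}^2 & \cdots & \varepsilon_1^2 \\ \varepsilon_p^2 & \varepsilon_p^4 & \varepsilon_p^2 \varepsilon_{p-1}^2 & \cdots & \varepsilon_p^2 \varepsilon_1^2 \\ \vdots & \vdots & \vdots & & \vdots \\ \varepsilon_1^2 & \varepsilon_1^2 \varepsilon_p^2 & \varepsilon_1^2 \varepsilon_{p-1}^2 & \cdots & \varepsilon_1^4 \end{pmatrix},$$

$$D_1 = \lim_{n \to \infty} n^{-1} \sum_{i=p+1}^{n} f_i(\xi_i) v_i v_i',$$

with $v_i = (1, \varepsilon_{p+i-1}^2, \varepsilon_{p+i-2}^2, \cdots, \varepsilon_i^2)'$.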

Remark 1. Compared with [

Remark 2. To implement the proposed method, the cut-off point $m$ must be chosen. If $m$ is too large, model (2.2) has too many parameters and, by the properties of functional principal component analysis (FPCA), the estimate of the slope function $\gamma$ may become unstable; if $m$ is too small, model (2.2) may approximate model (2.1) poorly. Condition (3.7) balances these two effects. There are well-established methods for choosing the tuning parameter $m$, such as generalized cross-validation (GCV), AIC, BIC and FPCA. The first three criteria are data-driven, while the FPCA criterion is based on the ratio of the variance explained by the first $m$ eigenvalues to the total variation of $X$. In Section 4, GCV and FPCA are both considered.
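As an illustration of the FPCA criterion in Remark 2, the following sketch returns the smallest $m$ whose leading eigenvalues explain at least a given share of the total variation. The function name `choose_m_by_variance` is hypothetical, and the default threshold 0.85 is the value used in the numerical sections of this paper.

```python
import numpy as np

def choose_m_by_variance(eigvals, threshold=0.85):
    """Smallest k with (sum of the k largest eigenvalues) / (sum of all) >= threshold."""
    # sort in decreasing order; clip tiny negative values caused by rounding
    lam = np.sort(np.clip(np.asarray(eigvals, dtype=float), 0.0, None))[::-1]
    ratio = np.cumsum(lam) / lam.sum()            # cumulative explained share
    k = np.searchsorted(ratio, threshold) + 1     # first index reaching the threshold
    return int(min(k, lam.size))
```

With the population eigenvalues $\lambda_j = ((j - 0.5)\pi)^{-2}$, $j = 1, \cdots, 200$, the first eigenvalue explains about 81% of the total and the first two about 90%, so the rule selects $m = 2$ at the 0.85 threshold.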

Remark 3. In order to make inference for α , the estimation of the asymptotic variance, mainly involving the estimation of P and f i ( ξ i ) , is needed to be given. Based on (A.8) in the Appendix, it is reasonable to use n − 1 ∑ i = p + 1 n v ^ i v ^ ′ i as the estimate of P with v ^ i = ( 1, ε ^ p + i − 1 2 , ε ^ p + i − 2 2 , ⋯ , ε ^ i 2 ) ′ . For f i ( ξ i ) , the sparsity estimation methods or the kernel density estimation ideas, suggested by [

In this section, simulations are carried out to show the finite sample performance of the proposed method. The data are generated from model (1.1), where $z_{i1}$ and $z_{i2}$ are standard normal,

$$X(t) = \sum_{j=1}^{200} U_j \rho_j(t), \quad t \in [0, 1],$$

where the U_{j}s are distributed as independent normal with mean 0 and variance λ j = ( ( j − 0.5 ) π ) − 2 respectively, ρ j ( t ) = 2 sin ( ( j − 0.5 ) π t ) and

Y i = z i 1 β 1 + z i 2 β 2 + ∫ 0 1 γ ( t ) X i ( t ) d t + ε i

with β = ( 2 , − 1 ) ′ , γ ( t ) = 2 sin ( π t / 2 ) + 3 2 sin ( 3π t / 2 ) . For the random error,

we take the following form:

$$\varepsilon_i = e_i h_i^{1/2}, \quad h_i = \alpha_0 + \alpha_1 \varepsilon_{i-1}^2 + \alpha_2 \varepsilon_{i-2}^2, \quad e_i \overset{i.i.d.}{\sim} t(5),$$

where $\alpha_0$ takes the value 0.1, $\alpha_1$ takes the values 0.1, 0.3 and $\alpha_2$ takes the values 0.3, 0.1 correspondingly. Note that $t(5)$ has a finite 4th order moment, and condition (3.1) is satisfied by $\alpha = (0.1, 0.1, 0.3)'$ and $\alpha = (0.1, 0.3, 0.1)'$ with $r = 1$, where it may be shown that both $\hat\beta$ and $\hat\alpha$ given by (2.3) and (2.4) are consistent. For $\alpha = (0.1, 0.3, 0.3)'$ with $r = 1$, $\rho(\Sigma_r) = 1$; that is, it is on the boundary of the condition region.
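The spectral radius condition can be checked numerically. The sketch below builds $\Sigma_1 = E(D_t)$ and simulates the ARCH(2) errors; the function names are hypothetical. One reading that reproduces the boundary value $\rho(\Sigma_1) = 1$ for $\alpha = (0.1, 0.3, 0.3)'$ is that the $t(5)$ innovations are used without rescaling, so that $E e_t^2 = 5/3$. This is an assumption made here for illustration and is not stated in the text; with $E e_t^2 = 1$ the same computation gives a value of about 0.72.

```python
import numpy as np

def sigma1_spectral_radius(alpha, e2_mean=1.0):
    """Spectral radius of Sigma_1 = E(D_t) for alpha = (alpha_0, ..., alpha_p)."""
    a = np.asarray(alpha[1:], dtype=float)
    p = a.size
    D = np.zeros((p, p))
    D[0, :] = a * e2_mean             # first row: alpha_j * E[e_t^2]
    D[1:, :-1] = np.eye(p - 1)        # shifted identity block below the first row
    return float(np.max(np.abs(np.linalg.eigvals(D))))

def simulate_arch(alpha, n, rng, burn_in=500, df=5):
    """Simulate eps_i = e_i * sqrt(h_i) with raw (unscaled) t(df) innovations (assumption)."""
    a0, a = alpha[0], np.asarray(alpha[1:], dtype=float)
    p = a.size
    e = rng.standard_t(df, size=n + burn_in + p)
    eps = np.zeros(n + burn_in + p)
    for i in range(p, eps.size):
        # h_i = alpha_0 + alpha_1 * eps_{i-1}^2 + ... + alpha_p * eps_{i-p}^2
        h = a0 + a @ eps[i - p:i][::-1] ** 2
        eps[i] = e[i] * np.sqrt(h)
    return eps[-n:]                   # drop the burn-in period
```

For example, `sigma1_spectral_radius([0.1, 0.3, 0.3], e2_mean=5/3)` evaluates the boundary case, and `simulate_arch([0.1, 0.1, 0.3], 300, np.random.default_rng(1))` draws one error sequence of length 300.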

We also consider the situation in which $\alpha_1$ and $\alpha_2$ take the value 0, for comparison with the independent structure. For each $\alpha$, we simulate 1000 random samples with sample sizes $n = 100, 300, 500$, respectively. For the determination of $m$ by FPCA, $m = \min\{ k : \sum_{i=1}^{k} \hat\lambda_i / \sum_{i=1}^{n} \hat\lambda_i \ge 0.85 \}$ is used. The accuracy of the slope function estimate is checked by the mean integrated square error (MISE), which is defined as

$$\mathrm{MISE} = \frac{1}{1000} \sum_{i=1}^{1000} \Big[ \frac{1}{N} \sum_{s=1}^{N} \big( \hat\gamma_i(t_s) - \gamma(t_s) \big)^2 \Big],$$

where $\hat\gamma_i(\cdot)$ is the estimate of the slope function $\gamma(\cdot)$ obtained from the $i$-th replication, and $t_s$, $s = 1, \cdots, N$, are the equally spaced grid points at which the function $\hat\gamma_i(t)$ is evaluated. In our implementation, $N = 100$ is used. In this section, the estimators of $\alpha$ obtained by the least squares (LS) method are also computed for comparison with the least absolute deviation (LAD) method proposed in this paper. The results are summarized in Tables 1-3, and the true function $\gamma$ and the estimated function $\hat\gamma$, based on the average of 1000 replications with $\alpha = (0.1, 0.1, 0.3)'$, are depicted in

n | $(\alpha_1\ \alpha_2)$ | $\beta_1$ | $\beta_2$ | $\gamma$ | $\alpha_0$ | $\alpha_1$ | $\alpha_2$ |
---|---|---|---|---|---|---|---|
100 | (0.0 0.0) | 0.0021 | 0.0022 | 0.4414 | 0.0029 | 0.0021 | 0.0019 |
100 | (0.1 0.3) | 0.0056 | 0.0060 | 0.9230 | 0.0024 | 0.0093 | 0.0458 |
100 | (0.3 0.1) | 0.0053 | 0.0068 | 0.9593 | 0.0028 | 0.0466 | 0.0086 |
100 | (0.3 0.3) | 0.0163 | 0.0178 | 1.7093 | 0.0084 | 0.0477 | 0.0482 |
300 | (0.0 0.0) | 0.0006 | 0.0006 | 0.1400 | 0.0026 | 0.0006 | 0.0004 |
300 | (0.1 0.3) | 0.0017 | 0.0019 | 0.3676 | 0.0021 | 0.0052 | 0.0364 |
300 | (0.3 0.1) | 0.0017 | 0.0017 | 0.3203 | 0.0021 | 0.0353 | 0.0060 |
300 | (0.3 0.3) | 0.0089 | 0.0089 | 1.0242 | 0.0076 | 0.0381 | 0.0402 |
500 | (0.0 0.0) | 0.0003 | 0.0003 | 0.0865 | 0.0025 | 0.0002 | 0.0002 |
500 | (0.1 0.3) | 0.0009 | 0.0009 | 0.1942 | 0.0022 | 0.0041 | 0.0320 |
500 | (0.3 0.1) | 0.0010 | 0.0009 | 0.1924 | 0.0021 | 0.0310 | 0.0050 |
500 | (0.3 0.3) | 0.0039 | 0.0045 | 0.6985 | 0.0024 | 0.0352 | 0.0386 |

n | $(\alpha_1\ \alpha_2)$ | $\beta_1$ | $\beta_2$ | $\gamma$ | $\alpha_0$ | $\alpha_1$ | $\alpha_2$ |
---|---|---|---|---|---|---|---|
100 | (0.1 0.3) | 0.0056 | 0.0059 | 0.6424 | 0.0028 | 0.0096 | 0.0464 |
100 | (0.3 0.1) | 0.0054 | 0.0066 | 0.6453 | 0.0031 | 0.0465 | 0.0086 |
100 | (0.3 0.3) | 0.0164 | 0.0176 | 0.8812 | 0.0094 | 0.0480 | 0.0491 |
300 | (0.1 0.3) | 0.0017 | 0.0019 | 0.0910 | 0.0022 | 0.0053 | 0.0362 |
300 | (0.3 0.1) | 0.0017 | 0.0017 | 0.0911 | 0.0022 | 0.0351 | 0.0061 |
300 | (0.3 0.3) | 0.0059 | 0.0084 | 0.2079 | 0.0021 | 0.0382 | 0.0416 |
500 | (0.1 0.3) | 0.0009 | 0.0009 | 0.0581 | 0.0022 | 0.0041 | 0.0318 |
500 | (0.3 0.1) | 0.0010 | 0.0009 | 0.0573 | 0.0021 | 0.0311 | 0.0050 |
500 | (0.3 0.3) | 0.0030 | 0.0030 | 0.1284 | 0.0020 | 0.0357 | 0.0399 |

n | $(\alpha_1\ \alpha_2)$ | $\beta_1$ | $\beta_2$ | $\gamma$ | $\alpha_0$ | $\alpha_1$ | $\alpha_2$ |
---|---|---|---|---|---|---|---|
100 | (0.1 0.3) | 0.0056 | 0.0060 | 0.9230 | 0.1559 | 0.0203 | 0.0411 |
100 | (0.3 0.1) | 0.0053 | 0.0068 | 0.9593 | 0.1286 | 0.0412 | 0.0252 |
100 | (0.3 0.3) | 0.0163 | 0.0178 | 1.7093 | 4.1482 | 0.0430 | 0.0541 |
300 | (0.1 0.3) | 0.0017 | 0.0019 | 0.3676 | 0.1011 | 0.0122 | 0.0258 |
300 | (0.3 0.1) | 0.0017 | 0.0017 | 0.3203 | 0.0619 | 0.0271 | 0.0206 |
300 | (0.3 0.3) | 0.0089 | 0.0089 | 1.0242 | 25.1315 | 0.0318 | 0.0690 |
500 | (0.1 0.3) | 0.0009 | 0.0009 | 0.1942 | 0.1215 | 0.0123 | 0.0230 |
500 | (0.3 0.1) | 0.0010 | 0.0009 | 0.1924 | 0.1684 | 0.0259 | 0.0251 |
500 | (0.3 0.3) | 0.0039 | 0.0045 | 0.6985 | 18.6453 | 0.0305 | 0.0525 |

n | Method | $\beta_1$ | $\beta_2$ | $\gamma$ | $\alpha_0$ | $\alpha_1$ | $\alpha_2$ |
---|---|---|---|---|---|---|---|
100 | LAD | 0.0030 | 0.0028 | 0.6190 | 0.0027 | 0.0455 | 0.0473 |
100 | LS | 0.0030 | 0.0028 | 0.6190 | 0.0049 | 0.0379 | 0.0472 |
300 | LAD | 0.0009 | 0.0009 | 0.1982 | 0.0028 | 0.0373 | 0.0387 |
300 | LS | 0.0009 | 0.0009 | 0.1982 | 0.0023 | 0.0201 | 0.0261 |
500 | LAD | 0.0006 | 0.0005 | 0.1088 | 0.0029 | 0.0341 | 0.0345 |
500 | LS | 0.0006 | 0.0005 | 0.1088 | 0.0022 | 0.0171 | 0.0220 |

We would also like to know how the LAD method behaves when the error of the ARCH sequence is not heavy tailed, such as $e \sim N(0,1)$. The simulation results are summarized in

We can derive the following conclusions from Tables 1-4.

1) From

2) For every fixed sample size $n$, the larger the value of a coefficient in $\alpha$, the larger the corresponding MSE across the different error coefficient settings. For example, when $n = 100$ and $\alpha$ takes the values $(0.1, 0.1, 0.3)$ and $(0.1, 0.3, 0.1)$ respectively, the MSE for $\alpha_1 = 0.3$ is larger than the MSE for $\alpha_1 = 0.1$, and likewise for $\alpha_2$. Moreover, the MSE of $\hat\alpha$ and the MISE of $\hat\gamma$ become large when the coefficients $\alpha_1$, $\alpha_2$ take relatively large values simultaneously, such as $(\alpha_1, \alpha_2) = (0.3, 0.3)$ in

3) From

4) The MSE of the coefficients α ^ , in

5)

6) As

Based on simulation results from

From

From the above observation, we see that the estimator (2.4) performs well, even under the boundary condition. It may be theoretically interesting to know the performance of the estimator in this case, but it is beyond our focus here.

In this section, we apply the proposed method to a real dataset. The data consist of the monthly electricity consumption, denoted by $C$, of commercial sectors from January 1972 to January 2005 (397 months) and their annual average retail price $P$ (33 years). The main goal of this study is to examine the effect of the dependence structure of the error on the asymptotic variance of $\hat\beta$ when the price and consumption are used to predict the consumption 6 months later.

According to a stationarity test of the electricity consumption data, heteroscedasticity and a linear trend are present; they may be eliminated by differencing the log-transformed data. Corresponding to the general notation introduced in model (1.1), let

$$D_j = \ln C_j - \ln C_{j-1}, \quad j = 1, 2, \cdots, 397,$$

$$X_i = \{ D_{12(i-1)+t},\ t \in [1, 12] \}, \quad i = 1, 2, \cdots, 32.$$

The response variable is

$$Y_i = D_{12i+6}, \quad i = 1, 2, \cdots, 32,$$

and the additional real variable is defined by

$$z_i = P_i, \quad i = 1, 2, \cdots, 32.$$
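The index bookkeeping in these definitions can be sketched as follows. The inputs `log_diff` and `price` are hypothetical arrays standing for $D_1, D_2, \cdots$ and $P_1, P_2, \cdots$, and each curve $X_i$ is represented by its 12 monthly values, which is an assumption about how $t \in [1, 12]$ is discretized.

```python
import numpy as np

def build_electricity_design(log_diff, price, n_years=32):
    """Arrange monthly log-differences D_1, D_2, ... into (X_i, Y_i, z_i).

    log_diff[j - 1] holds D_j and price[i - 1] holds P_i (hypothetical inputs).
    """
    D = np.asarray(log_diff, dtype=float)
    i = np.arange(1, n_years + 1)
    # X_i = (D_{12(i-1)+1}, ..., D_{12(i-1)+12}): one row of 12 months per year
    X = D[: 12 * n_years].reshape(n_years, 12)
    # Y_i = D_{12 i + 6}; subtract 1 to convert to 0-based indexing
    Y = D[12 * i + 6 - 1]
    z = np.asarray(price, dtype=float)[:n_years]
    return X, Y, z
```

The largest index used is $12 \cdot 32 + 6 = 390$, so at least 390 monthly differences are needed.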

We regress $Y$ on $Z$ and $X$, with $m$ chosen by FPCA with threshold 0.85, and obtain the residuals. Although it seems reasonable to treat the residuals as a white noise sequence, volatility clustering may exist according to

of $\hat\beta$ is 0.01, a reduction of 94% compared with the value 0.18 obtained when the specific form of the error is ignored, which shows that it is promising to consider the ARCH structure.

In this paper, we consider the estimation of partial functional linear models with ARCH(p) errors using the LS method, and the estimation of the parameters of the ARCH(p) sequence using the LAD method. Because the slope function is infinite-dimensional, the key step is to transform the partial functional linear model with ARCH errors into a corresponding linear regression model by the K-L expansion and the idea of FPCA. The linear relationship between $z$ and $X$ is essentially assumed (see Remark 1 in [

This work is supported by NSFC No. 11771032, No.11571340 and the Science and Technology Project of Beijing Municipal Education Commission No. KM201710005032.

Wang, Y.F., Xie, T.F. and Zhang, Z.Z. (2018) Partial Functional Linear Models with ARCH Errors. Open Journal of Statistics, 8, 345-361. https://doi.org/10.4236/ojs.2018.82023

We will state the proofs of the theorems given in Section 3. Firstly, some lemmas will be given.

Lemma A.1. ( [

Lemma A.2. ( [

Lemma A.3. Suppose that $\{\varepsilon_i : i \ge 1\}$ forms an ARCH(p) process and that (3.1) holds. Then

$$n^{-1} \sum_{i=1}^{n} \varepsilon_i^{2r} \to E[\varepsilon_i^{2r}] \quad a.s.$$

for the integer $r$ in condition (3.1); furthermore, if $r \ge 2$, then

$$n^{-1} \sum_{i=1}^{n} \varepsilon_i^2 \varepsilon_{i-j}^2 \to E[\varepsilon_i^2 \varepsilon_{i-j}^2] \quad a.s.$$

Proof. From Lemma A.1 and the representation (3.2), it follows that { ε t } and { ε t 2 } are strictly stationary ergodic sequences. Combining with Lemma A.2, the results follow immediately from the ergodic theorem ( [

Lemma A.4. If $\varepsilon$ is independent of $X$ and (3.1)-(3.2) hold, one has

$$\Big\| n^{-1} \sum_{i=1}^{n} X_i \varepsilon_i \Big\| = O_p(n^{-1/2}).$$

Proof. By simple calculation, the conclusion can be easily derived from the fact that $E[\varepsilon_i^2] = \alpha_0 / (1 - \alpha_1 - \cdots - \alpha_p)$.
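One way to spell out this calculation is sketched below. It assumes, in addition to the independence of $\varepsilon$ and $X$, that the $X_i$ are identically distributed with $E\|X\|^2 < \infty$ (implied by (3.3)) and that $\{\varepsilon_i\}$ is stationary; these conditions are read off the surrounding assumptions rather than stated in the lemma itself.

```latex
\begin{aligned}
E\Big\| n^{-1}\sum_{i=1}^{n} X_i \varepsilon_i \Big\|^2
  &= n^{-2}\sum_{i=1}^{n}\sum_{k=1}^{n} E\langle X_i, X_k\rangle\, E[\varepsilon_i \varepsilon_k] \\
  &= n^{-2}\sum_{i=1}^{n} E\|X_i\|^2\, E[\varepsilon_i^2] \\
  &= n^{-1}\, E\|X\|^2\, \frac{\alpha_0}{1-\alpha_1-\cdots-\alpha_p}.
\end{aligned}
```

The first equality uses the independence of $\varepsilon$ and $X$, and the second uses $E[\varepsilon_i \varepsilon_k] = 0$ for $i \ne k$, which follows from the martingale difference property. Markov's inequality applied to the squared norm then gives the $O_p(n^{-1/2})$ rate.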

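Proof of Theorem 1. Let $\hat\Phi_k(x) = \sum_{j=1}^{m} (\langle \hat C_{z_k X}, \hat\rho_j \rangle / \hat\lambda_j) \langle \hat\rho_j, x \rangle$ and $\Phi_k(x) = \sum_{j=1}^{\infty} (\langle C_{z_k X}, \rho_j \rangle / \lambda_j) \langle \rho_j, x \rangle$ with $x \in L^2([0,1])$. Set $\|A\|_\infty = \max_i \sum_j |A_{ij}|$ and $\|A\| = \sum_{i=1}^{d} \sum_{j=1}^{d} |A_{ij}|$ for $A = (A_{ij}) \in R^{d \times d}$. Observe that

$$\begin{aligned}
n^{1/2} (\hat\beta - \beta)
&= \hat B^{-1} n^{1/2} \Big\{ \frac{1}{n} \sum_{i=1}^{n} \Big( z_i - \sum_{j=1}^{m} \frac{\langle \hat C_{zX}, \hat\rho_j \rangle \langle X_i, \hat\rho_j \rangle}{\hat\lambda_j} \Big) \big( \langle \gamma, X_i \rangle + \varepsilon_i \big) \Big\} \\
&= \hat B^{-1} n^{-1/2} \Big\{ \sum_{i=1}^{n} \Big( z_i - \sum_{j=1}^{m} \frac{\langle \hat C_{zX}, \hat\rho_j \rangle \langle X_i, \hat\rho_j \rangle}{\hat\lambda_j} \Big) \langle \gamma, X_i \rangle \\
&\qquad + \sum_{i=1}^{n} \Big( \sum_{j=1}^{\infty} \frac{\langle C_{zX}, \rho_j \rangle \langle X_i, \rho_j \rangle}{\lambda_j} - \sum_{j=1}^{m} \frac{\langle \hat C_{zX}, \hat\rho_j \rangle \langle X_i, \hat\rho_j \rangle}{\hat\lambda_j} \Big) \varepsilon_i \\
&\qquad + \sum_{i=1}^{n} \Big( z_i - \sum_{j=1}^{\infty} \frac{\langle C_{zX}, \rho_j \rangle \langle X_i, \rho_j \rangle}{\lambda_j} \Big) \varepsilon_i \Big\}
\end{aligned}$$

with $\hat B = \hat C_z - \{ \hat\Phi_k(\hat C_{z_m X}) \}_{k, m = 1, \cdots, d}$.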

According to Lemma A.4, similar to [

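$$\|\hat B - B\|_\infty = O_p\big( n^{-(2b-1)/(a+2b)} \big),$$

$$n^{-1/2} \sum_{i=1}^{n} \Big( z_i - \sum_{j=1}^{m} \frac{\langle \hat C_{zX}, \hat\rho_j \rangle \langle X_i, \hat\rho_j \rangle}{\hat\lambda_j} \Big) \langle \gamma, X_i \rangle = o_p(1),$$

$$n^{-1/2} \sum_{i=1}^{n} \Big( \sum_{j=1}^{\infty} \frac{\langle C_{zX}, \rho_j \rangle \langle X_i, \rho_j \rangle}{\lambda_j} - \sum_{j=1}^{m} \frac{\langle \hat C_{zX}, \hat\rho_j \rangle \langle X_i, \hat\rho_j \rangle}{\hat\lambda_j} \Big) \varepsilon_i = o_p(1).$$

Now, we consider the term $n^{-1/2} \sum_{i=1}^{n} \big( z_i - \sum_{j=1}^{\infty} \langle C_{zX}, \rho_j \rangle \langle X_i, \rho_j \rangle / \lambda_j \big) \varepsilon_i := n^{-1/2} \sum_{i=1}^{n} \eta_i \varepsilon_i$. We will show

$$n^{-1/2} \sum_{i=1}^{n} \eta_i \varepsilon_i \to_d N\Big( 0, \frac{\alpha_0}{1 - \alpha_1 - \cdots - \alpha_p} B \Big). \tag{A.1}$$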

Let $P_{j-1}(\cdot) = E[\,\cdot \mid \mathcal{F}_{j-1}]$ and $\xi_i = n^{-1/2} \eta_i \varepsilon_i$; then $\{\xi_i\}$ forms a martingale difference sequence, due to the fact that $\xi_i$ is $\mathcal{F}_{i-1}$-measurable and $P_{i-1}(\xi_i) = 0$. Let $u_i$ denote the conditional variance of $\xi_i$; then, for $i = 1, \cdots, n$,

$$u_i = P_{i-1}(\xi_i \xi_i') = n^{-1} P_{i-1}(\eta_i \eta_i' \varepsilon_i^2) = n^{-1} E(\eta_i \eta_i') P_{i-1}(\varepsilon_i^2) = n^{-1} B h_i.$$

Therefore,

$$\sum_i u_i = n^{-1} \sum_i B h_i \to_p \frac{\alpha_0}{1 - \alpha_1 - \cdots - \alpha_p} B,$$

according to the law of large numbers ( [

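$$\begin{aligned}
\sum_j P_{j-1}\big( \xi_j \xi_j' \mathbf{1}\{ \|\xi_j\| > \delta \} \big)
&= n^{-1} \sum_j P_{j-1}\big( \eta_j \eta_j' \varepsilon_j^2 \mathbf{1}\{ \|\eta_j \varepsilon_j\| > n^{1/2} \delta \} \big) \\
&\le n^{-1} \sum_j P_{j-1}\big( \eta_j \eta_j' \varepsilon_j^2 \mathbf{1}[ \{ \|\eta_j\|^2 > n^{1/2} \delta \} \cup \{ \varepsilon_j^2 > n^{1/2} \delta \} ] \big) \\
&= n^{-1} \sum_j P_{j-1}(\varepsilon_j^2)\, E\big( \eta_1 \eta_1' \mathbf{1}\{ \|\eta_1\|^2 > n^{1/2} \delta \} \big) + n^{-1} \sum_j P_{j-1}\big( \varepsilon_j^2 \mathbf{1}\{ \varepsilon_j^2 > n^{1/2} \delta \} \big)\, E(\eta_1 \eta_1').
\end{aligned}$$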

For the first term, it converges to zero because ‖ η j η ′ j ‖ is uniformly integrable. In view of the integrability of ε i 2 , the second term also converges to zero in probability. Using the martingale difference central limit theorem (CLT) ( [

Proof of Theorem 2. With Lemma A.4, the technics in the proof of Theorem 3.2 of [

Proof of Theorem 3. Firstly, we consider the following two equalities:

$$n^{-1} \sum_{i=p+1}^{n} \hat\varepsilon_i^2 = n^{-1} \sum_{i=p+1}^{n} \varepsilon_i^2 + o_p(n^{-1/2}), \tag{A.2}$$

$$n^{-1} \sum_{i=p+1}^{n} \hat\varepsilon_i^2 \hat\varepsilon_{i-j}^2 = n^{-1} \sum_{i=p+1}^{n} \varepsilon_i^2 \varepsilon_{i-j}^2 + o_p(n^{-1/2}). \tag{A.3}$$

From Theorem 1 and Theorem 2, we learn that

$$\sum_{j=1}^{m} (\hat\gamma_j - \gamma_j)^2 = \|\hat{\tilde\gamma} - \tilde\gamma\|^2 = O_p\big( n^{-(2b-1)/(a+2b)} \big), \tag{A.4}$$

$$\|\hat\beta - \beta\| = O_p(n^{-1/2}). \tag{A.5}$$

Under the conditions (3.5)-(3.7) and $X \in L^2(T)$, one has

$$\sum_{j=m+1}^{\infty} \gamma_j^2 \langle X_i, \rho_j \rangle^2 \le C \sum_{j=m+1}^{\infty} j^{-2b} \langle X_i, \rho_j \rangle^2 = O_p\big( n^{-(2b-1)/(a+2b)} \big). \tag{A.6}$$

In addition, according to (3.3) and $\lambda_1 > \lambda_2 > \cdots > \lambda_m$, the relation

$$\limsup_{n \to \infty}\, n E\|\hat\rho_j - \rho_j\|^2 < \infty \tag{A.7}$$

holds, see ( [

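$$\begin{aligned}
\hat\varepsilon_i &= Y_i - \hat\beta' z_i - \sum_{j=1}^{m} \hat\gamma_j \hat U_{ij} \\
&= \beta' z_i + \sum_{j=1}^{\infty} \gamma_j U_{ij} + \varepsilon_i - \hat\beta' z_i - \sum_{j=1}^{m} \hat\gamma_j \hat U_{ij} \\
&= \varepsilon_i - (\hat\beta - \beta)' z_i - \sum_{j=1}^{m} (\hat\gamma_j - \gamma_j) \langle X_i, \hat\rho_j \rangle - \sum_{j=1}^{m} \gamma_j \langle X_i, \hat\rho_j - \rho_j \rangle + \sum_{j=m+1}^{\infty} \gamma_j \langle X_i, \rho_j \rangle
\end{aligned}$$

by (2.1). Combining this equality with (A.4)-(A.7), (A.2) and (A.3) can be proved.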

Now we turn to consider the asymptotic form of α ^ . By Lemma A.3, we can conclude

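$$(n^{-1} \hat P' \hat P)^{-1} \to P^{-1} \quad a.s. \text{ as } n \to \infty, \tag{A.8}$$

where

$$\hat P = \begin{pmatrix} 1 & \hat\varepsilon_p^2 & \hat\varepsilon_{p-1}^2 & \cdots & \hat\varepsilon_1^2 \\ 1 & \hat\varepsilon_{p+1}^2 & \hat\varepsilon_p^2 & \cdots & \hat\varepsilon_2^2 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \hat\varepsilon_{n-1}^2 & \hat\varepsilon_{n-2}^2 & \cdots & \hat\varepsilon_{n-p}^2 \end{pmatrix}.$$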

Combine (A.2), (A.3), (A.8) and the assumptions about the densities of { ε i 2 } , the results of Theorem 3 holds by the Theorem 4.1 of [