Estimation for Nonnegative First-Order Autoregressive Processes with an Unknown Location Parameter

Consider a first-order autoregressive process $X_t = \phi X_{t-1} + Z_t$, where the innovations are nonnegative random variables with regular variation at both the right endpoint infinity and the unknown left endpoint $\theta$. We propose estimates for the autocorrelation parameter $\phi$, obtained by taking the ratio of two sample values chosen with respect to an extreme value criterion, and for the unknown location parameter $\theta$, obtained by taking the minimum of $X_t - \hat{\phi}_n X_{t-1}$ over the observed series, where $\hat{\phi}_n$ denotes our estimate of $\phi$. The joint limit distribution of the proposed estimators is derived using point process techniques. A simulation study is provided to examine the small sample behavior of these estimates.


Introduction
In many applications there is an increasing desire to model the phenomena under study by nonnegative dependent processes. An excellent presentation of the classical theory concerning these models can be found, for example, in Brockwell and Davis [1]. Recently, advancements in such models have shifted focus to specialized features, e.g. heavy tail innovations or nonnegativity of the model. In this paper we examine the behavior of traditional estimates under conditions leading to non-Gaussian limits. For example, the standard approach to parameter estimation within the AR(1) process is through the Yule-Walker estimator $\hat{\phi}_{YW} = \hat{\gamma}(1)/\hat{\gamma}(0)$, where $\hat{\gamma}(h)$ denotes the sample autocovariance at lag $h$.
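As a point of reference for the baseline just mentioned, here is a minimal sketch of the Yule-Walker estimate for an AR(1) series; the simulated series, seed, and parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def yule_walker_ar1(x):
    """Yule-Walker estimate of phi for an AR(1) series:
    lag-1 sample autocovariance divided by the sample variance."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    gamma0 = np.dot(xc, xc) / len(x)           # sample autocovariance, lag 0
    gamma1 = np.dot(xc[1:], xc[:-1]) / len(x)  # sample autocovariance, lag 1
    return gamma1 / gamma0

# Simulate X_t = 0.5 X_{t-1} + Z_t with (illustrative) exponential innovations.
rng = np.random.default_rng(0)
z = rng.exponential(size=5000)
x = np.empty_like(z)
x[0] = z[0]
for t in range(1, len(z)):
    x[t] = 0.5 * x[t - 1] + z[t]
print(yule_walker_ar1(x))
```

With a series this long the estimate lands close to the true value 0.5, illustrating why Yule-Walker is the conventional benchmark against which the extreme value estimators below are compared.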
A slightly different approach, presented in Mathew and McCormick [2], used linear programming to obtain estimates for $\phi$ and $\theta$ under certain optimization constraints. While there are many established methods to estimate the autocorrelation coefficient in an AR(1) model, there are only a few approaches to estimating the unknown location parameter. One was mentioned in Mathew and McCormick [2], where they considered estimators $\hat{\phi}_{\mathrm{range}}$ and $\hat{\theta}_{\mathrm{range}}$ based on the indices $t^*$ and $j^*$ of the maximal and minimal $X_i$, respectively, for $1 \le i \le n$. In this paper we examine estimation questions and asymptotic properties of alternative estimates for $\phi$ and $\theta$, relating to the model $X_t = \phi X_{t-1} + Z_t$, $t \ge 1$, where $0 < \phi < 1$, $\theta > 0$, and $\{Z_t\}$ is an i.i.d. sequence of nonnegative random variables whose innovation distribution $F$ is assumed to be regularly varying at infinity with index $-\alpha$ and regularly varying at $\theta$ with index $\beta$, where $\theta$ denotes the unknown but positive left endpoint. As a result of not restricting the innovations to be bounded on a finite range, we can first estimate the autoregressive parameter $\phi$ through regular variation at infinity and then estimate the positive but unknown location parameter through regular variation at $\theta$, the left endpoint.
While we have mentioned a few established estimation procedures, one notable exception was maximum likelihood. Although typically intractable in the time series setting, the maximum likelihood procedure made a major contribution to the estimation of positive heavy tailed time series when the innovations in the AR(1) model are exponential. With these considerations in mind, Raftery [3] determined the limiting distribution of the maximum likelihood estimate for the autocorrelation coefficient $\phi$. The realization of this estimator was the stepping stone for the work done in this paper, along with Davis and McCormick [4], which first considered this alternative estimator and used a point process approach to obtain the asymptotic distribution of the natural estimator $\hat{\phi}_n$. This was done in the context that the innovations distribution $F$ varies regularly at 0, the left endpoint, and satisfies a moment condition.
The work presented in this paper is an extension of the work done in Davis and McCormick [4] and makes the following contributions to dependent time series with heavy-tail innovations. The first contribution is the development of estimates for the autocorrelation coefficient and the unknown location parameter under regular variation at both endpoints, with an associated rate of convergence involving a slowly varying function. The second contribution is the use of an extreme value method, namely point processes, to establish the asymptotic distribution of the proposed estimators and weak convergence for the asymptotically independent joint distribution. An initial observation is that our estimation procedure is especially easy to implement for both $\phi$ and $\theta$: the autoregressive coefficient $\phi$ in the causal AR(1) process is estimated by taking the minimum of the ratio of two sample values, while the unknown location parameter $\theta$ is estimated by minimizing $X_t - \hat{\phi}_n X_{t-1}$ over the observed series. This naturally motivates a comparison between the estimation procedure presented in this paper and the standard linear programming estimates mentioned above, since within a nonnegative AR(1) model the linear programming estimate reduces to the estimate proposed here. This comparison, along with comparisons to Mathew and McCormick's [2] optimization method and Bartlett and McCormick's [5] extreme value method, was performed through simulation and is presented in Section 3. The results appear to demonstrate a favorable performance for our extreme value method over the three alternative estimators.
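The procedure described above can be sketched in a few lines of code. The paper's exact index sets are not reproduced here; in this illustrative version $\hat{\phi}$ is the minimum of the successive sample ratios, and $\hat{\theta}$ minimizes the residuals $X_t - \hat{\phi} X_{t-1}$ with the ratio-attaining index omitted, since that residual is zero by construction in this sketch. All distributional and parameter choices are assumptions for illustration.

```python
import numpy as np

def extreme_value_estimates(x):
    """Illustrative versions of the estimators discussed above.
    phi_hat: minimum of the successive sample ratios X_t / X_{t-1}.
    theta_hat: minimum of the residuals X_t - phi_hat * X_{t-1},
    omitting the single index that attains the minimum ratio
    (its residual is zero by construction in this sketch)."""
    x = np.asarray(x, dtype=float)
    ratios = x[1:] / x[:-1]
    i = int(ratios.argmin())
    phi_hat = ratios[i]
    residuals = np.delete(x[1:] - phi_hat * x[:-1], i)
    theta_hat = residuals.min()
    return phi_hat, theta_hat

# Simulate X_t = 0.5 X_{t-1} + Z_t with innovations bounded below by theta = 1
# and a heavy right tail (illustrative Pareto-type choice).
rng = np.random.default_rng(7)
theta, phi, n = 1.0, 0.5, 5000
z = theta + rng.pareto(1.2, size=n)  # left endpoint theta, heavy right tail
x = np.empty(n)
x[0] = z[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + z[t]
phi_hat, theta_hat = extreme_value_estimates(x)
```

Note how the heavy right tail works in the estimator's favor: the minimum ratio $\phi + \min_t Z_t/X_{t-1}$ sits just above $\phi$ because some denominators $X_{t-1}$ are very large.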
The main proofs in this paper rely heavily on point process methods from extreme value theory. The essential idea is to first establish the convergence of a sequence of point processes based on simple quantities and then apply the continuous mapping theorem to obtain convergence of the desired statistics. More background information on point processes, regular variation, and weak convergence can be found in Resnick [6]. Also, a nice survey on linear programming estimation procedures and nonnegative time series can be found in Anděl [7], Anděl [8], and Datta and McCormick [9], whereas more applications on modeling phenomena with heavy tailed distributions and ensuing estimation issues can be found in Resnick [10].
The rest of the paper is organized as follows: asymptotic limit results for the autocorrelation parameter $\phi$, the unknown location parameter $\theta$, and the joint distribution of $(\hat{\phi}_n, \hat{\theta}_n)$ are presented in Section 2, while Section 3 is concerned with the small sample behavior of these estimates through simulation.

Asymptotics
The following point process limit result is fundamental. Since the result makes no use of an ARMA structure, we present it for more general linear models subject to the usual summability conditions on the coefficients. In that regard, for this result we assume that $\{X_t\}$ is the stationary linear process given by $X_t = \sum_{j=0}^{\infty} c_j Z_{t-j}$. Furthermore, for this result we may relax our assumptions on the innovation distribution: we require only that $Z_1$ has a regularly varying tail, i.e.,

$$P(Z_1 > x) = x^{-\alpha} L(x), \qquad x > 0,$$

for a slowly varying function $L$, and that the innovation distribution is tail balanced. The limiting intensity measure has a Radon-Nikodym derivative with respect to Lebesgue measure, and the first result is close in statement and spirit to Theorem 2.4 in Davis and Resnick [11]. In view of the commonality of the two results, we present only the changes needed in the Davis and Resnick proof to accommodate the current setting. Aside from keeping track of the times when points, i.e. large jumps, occur, the difference between the point processes considered here and those in Davis and Resnick [11] is the inclusion of marks, i.e. the second component of each point. This complication induces an additional weak dependence in the points, which is addressed in Lemma 2.2 through a straightforward blocking argument.

First, we establish weak convergence of marked point processes of a normalized vector of innovations. For a positive integer $m$, define the associated point processes, let $e_1, \ldots, e_m$ denote the standard basis vectors for $\mathbb{R}^m$, and define an associated marked point process with the first component placed on an axis. In the following lemma we show that these two point processes are asymptotically indistinguishable.

Proof. Following the proof presented in Proposition 2.1 of Davis and Resnick [11], suppose that $B$ is a relatively compact set for which the stated bound holds for some $\eta > 1$. Then, arguing as in Davis and Resnick [11], Proposition 2.1, the result follows, completing the proof. □

Lemma 2.2. Let the two point processes on the product space be defined by the blocked points, where the limit process is independent of the marks.

Proof. We employ a blocking argument to establish this result. Let $(k_n)$ be a sequence of integers such that $k_n \to \infty$ and $k_n / n \to 0$.
To complete the proof, we first show that (2.7) holds for all sets of the form given in (2.6). This limit result follows from the easily verifiable relations (2.8)-(2.12). Indeed, in view of (2.5) and (2.12), (2.7) is equivalent to (2.13), and that relation holds by (2.8), (2.10), and (2.11).
Therefore the result is seen to hold by (2.7) and (2.14), by an application of Theorem 4.7 in Kallenberg [12]. □

Lemma 2.3. Let the corresponding point processes be as defined above.

Proof. We begin by applying the argument used in Theorem 2.2 of Davis and Resnick [11], with the modification that the relevant composition of maps of point processes is as described there, with each space being equipped with the topology of vague convergence. Therefore, by the continuous mapping theorem and Lemma 2.2, we obtain (2.15).
Finally, we complete the proof by Lemma 2.1 and (2.15), arguing as in Davis and Resnick [11]. □
We are now ready to present our fundamental result.

Theorem 2.1. Let $N_n$ and $N$ be the point processes on the space defined above.

Remark. Apart from considering a time coordinate and restricting the process to an AR(1) process, the above Theorem 2.1 and Theorem 3.1 in Mathew and McCormick [2] consider essentially the same point process limit result. However, their result gave a wrong limit point process. This error is corrected in the current paper.
Proof. Observe that the map induces a continuous map on point processes.
Thus we obtain (2.16) from Lemma 2.3.
The result now follows from (2.16) by the same argument used by Davis and Resnick [11] to complete their Theorem 2.4. □
Returning to the AR(1) model under discussion in this paper and the estimate $\hat{\phi}_n$ given in (1.2), we obtain the following asymptotic limit result.

Theorem 2.2. Let $\{X_t\}$ be the stationary solution to the AR(1) recursion $X_t = \phi X_{t-1} + Z_t$.

Proof. Applying Theorem 2.1 in the case of an AR(1) process, we note that the relevant set is a bounded continuity set with respect to the limit point process, so that the stated convergence holds. For the second statement, the result follows from (2.18) and the first part of the lemma. Finally, the identification of the limit distribution is well known. □

A useful observation follows from this lemma, which we state as a corollary.
Corollary 2.2. Under the assumptions of Lemma 2.4, for any $\epsilon > 0$ the stated convergence holds. The next lemma will provide another useful simplification, this time on $\hat{\theta}_n$.

For a positive integer $m$, define the corresponding truncated quantities and their associated point processes.
Proof. We first note, for any positive $M$, that the stated bounds hold for all large $n$. Next, from the limit law for the maximum obtained above, replacing the process by its reciprocal and taking reciprocals, we derive the limit law for the minimum. Furthermore, let $q$ be a positive integer greater than $m$.
Now we define the relevant events, and we begin by showing that these events are negligible.
Thus, for some constant $c$ and any $k_n$, the stated bound holds, establishing the lemma. Next, define the events $A_i$ as follows.
The following result provides the asymptotic behavior of the probabilities of these events.
Lemma 2.7. For any $\epsilon > 0$, we have the stated limits as $n \to \infty$.

Proof. Since the events $A_i$ are independent, the probability factors into a product, and using Lemma 2.6 the result follows.

Hence this limit law will be used below. The conclusion of this lemma provides that, for all $\epsilon$ and $\delta$, there is a constant, dependent on no parameters, for which the inequality stated there holds.
Proof. To calculate the probability of the intersection, we define the sets $K_1$ and $K_2$. It then follows from (2.20), (2.21), and independence that the probability factors, and therefore, for some constant $c$, the stated bound holds. In order to handle the set $K_1$, observe from the construction of the blocks $J_i$ and of $K_1$ that if the indices are sufficiently separated, then the corresponding events are independent. Thus, defining an independent copy of the relevant variable and using Lemma 2.7 in the last step, we have that the bound holds for some constant $c$, which completes the proof in view of Lemma 2.7. □

Lemma 2.9. For any $\epsilon, \delta > 0$, the stated convergence holds as $n \to \infty$.
Thus, by Lemma 2.9, letting $m$ tend to infinity in the above lemma and then letting $\epsilon$ tend to 0, we obtain the limit from Lemma 2.5. The theorem now follows from this and Corollary 2.2. □

Simulation Study
In this section we assess the reliability of our extreme value estimation method through a simulation study. This includes a comparison between our estimation procedure and three alternative estimation procedures for both the autocorrelation coefficient $\phi$ and the unknown location parameter $\theta$ under two different innovation distributions. For the first innovation distribution, let $c$ and $d$ be nonnegative constants such that $c + d = 1$; this distribution is then regularly varying at both endpoints, with index of regular variation $-\alpha$ at infinity and index of regular variation $\beta$ at $\theta$. The second innovation distribution $F$ is a Pareto distribution with a regularly varying tail, whose parameter was set to 0.9. The means and standard deviations (written below in parentheses) of these estimates are reported in Table 1, along with the average length of a 95 percent empirical confidence interval.
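The kind of Monte Carlo comparison described here can be sketched as follows; the innovation distribution, sample size, replication count, and parameter values are illustrative assumptions, not the paper's exact design, and the min-ratio statistic stands in for the extreme value estimate of $\phi$.

```python
import numpy as np

# Monte Carlo sketch: repeatedly simulate X_t = phi X_{t-1} + Z_t with
# Pareto-type innovations bounded below by theta, record the min-ratio
# estimate of phi for each replication, and summarize mean and sd.
rng = np.random.default_rng(1)
phi, theta, n, reps = 0.5, 1.0, 500, 200
estimates = []
for _ in range(reps):
    z = theta + rng.pareto(1.5, size=n)  # innovations with left endpoint theta
    x = np.empty(n)
    x[0] = z[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + z[t]
    estimates.append((x[1:] / x[:-1]).min())
estimates = np.array(estimates)
print(f"mean = {estimates.mean():.3f}, sd = {estimates.std(ddof=1):.3f}")
```

Tables of this form (mean, standard deviation in parentheses, empirical confidence interval length) are what the reported comparisons summarize for each estimator and parameter configuration.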
The means and standard deviations (written below in parentheses) of these estimates are reported in Table 2, along with the average length of a 95 percent empirical confidence interval.

Remark. In the case $0 < \phi < 1$ all of the estimators may be fairly compared, whereas our estimator is applicable only under the stated regular variation conditions. Now observe, for the selected $\phi$ values being considered, that Table 1 shows our estimator performing at least as well as the three alternative estimators. This is particularly true under the heavier tail models, i.e. when $0 < \alpha < 2$. In this regime our estimate shows little bias, and the average lengths of the confidence intervals are smaller than those of the other three estimates, sometimes by a wide margin. In particular, in the 0.8 case with $n = 1000$, the confidence interval average length for our method is smaller than those of the alternative estimators, by factors of 3.13 and 5.78 in two of the cases. This is in part due to the use of one-sided confidence intervals. While our estimator $\hat{\phi}_{\min}$ will always perform slightly better than the $\hat{\phi}_{\max}$ estimator in this setting, the main advantage of Bartlett and McCormick's [5] estimator $\hat{\phi}_{\max}$ lies in its versatility, performing well for various nonnegative time series, including but not restricted to higher order autoregressive models, along with ARMA models. This is particularly true when comparing average confidence interval lengths. Nonetheless, for small sample sizes our simulation study favors our estimator over the other three. The difficulty for a least squares estimate is that a small negative bias in the estimate of the autocorrelation parameter $\phi$ gives rise to a much larger positive bias in the estimate of $\sigma_e^2$. While the effect is not as great, the positive bias found in our estimator $\hat{\phi}_{\min}$ and the others for $\phi$ has a significant effect on the estimate for $\theta$.
Figures 1-4 show a comparison between the probabilities that the estimators $\hat{\phi}_{\min}$, $\hat{\phi}_{\max}$, $\hat{\phi}_{\mathrm{range}}$, and $\hat{\phi}_{LS}$ are within 0.01 of the true autocorrelation parameter value, respectively. With a sample size of 500, these figures plot the sample fraction of estimates which fell within a bound of ±0.01 of the true value. Good performance with respect to this measure is reflected in curves near 1.0, with diminishing good behavior as curves approach 0.0. The figures seem to show that our estimator compares favorably to the other three. Notice that the convergence rate of the empirical probability to the theoretical probability is extremely slow. This is not surprising since, on average, our estimate falls more than 0.1 from the true value in the 0.8 case. The lower right plot in Figure 5 displays the asymptotic performance when the innovation distribution $F$ has a regularly varying right tail with index $-\alpha$ and a finite positive left endpoint $\theta$. This setting allows a simplification in determining the joint asymptotic behavior of $(\hat{\phi}_n, \hat{\theta}_n)$ by allowing us to replace $\hat{\theta}_n$ with $\min_{1 \le t \le n} Z_t$. While our extreme value estimators converge to the true parameter values as $n$ tends to infinity, in this setting they may not compete asymptotically with, say, a conditional least squares estimator, which is more efficient than all three extreme value estimators. Finally, Table 2 reveals that our estimator for $\theta$ generally performs better than the three alternative estimators for the parameter values 0.6, 1.6, and 2.6.
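The performance measure used in Figures 1-4, the sample fraction of Monte Carlo estimates falling within a fixed bound of the true value, can be sketched as:

```python
import numpy as np

def fraction_within(estimates, true_value, bound=0.01):
    """Fraction of estimates with |estimate - true_value| <= bound,
    the empirical within-bound probability plotted in the figures."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean(np.abs(estimates - true_value) <= bound))

# Illustrative usage with made-up estimates of phi = 0.5:
# two of the three fall within the 0.01 bound.
f = fraction_within([0.5, 0.505, 0.52], 0.5)  # -> 2/3
```

Curves near 1.0 then indicate an estimator that is almost always within the bound, matching the reading of the figures given above.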