Influential Observations in Stochastic Model of Divisia Index Numbers with AR(1) Errors

We use the general form of the hat matrix and DFBETA measures to detect influential observations in the estimation of the Divisia price index number when the errors follow a first-order serially correlated process. An example is presented using price data for Pakistan. The hat values show that the weights of the consumer items have a large influence on the parameter estimates and are not affected by the parameter of the AR(1) autoregressive process, whereas the DFBETAs for the Divisia index numbers depend on both the weights and the autoregressive parameter.


Introduction
A number of studies are available on the detection of leverage points and influential observations in a regression model when the errors are assumed to be serially correlated with AR(1) and AR(2) processes. For instance, Prais and Winsten [1], Kadiyala [2], Griliches and Rao [3], Maeshiro [4], and Park and Mitchell [5] observed a significant effect of the first observation on the parameter estimates of the regression model. Influence diagnostics have been developed and studied by many authors, including Belsley et al. [6], Cook ([7] [8]), Cook and Weisberg [9], Draper and John [10], and Draper and Smith [11]. They examined the effect of an individual observation or a set of observations on the estimation of model parameters. The problem of influence diagnostic checking for the linear regression model with autocorrelated errors has since received much attention from researchers and statisticians. Puterman [12] observed that the first transformed observation in a linear regression model can have a large impact on the parameter estimates. Some authors agreed that the effect of omitting the first observation is not always as negligible as suggested by Cochrane and Orcutt [13]. Stemann and Trenkler [14] extended the approach of Puterman [12] to the general linear model with first-order autoregressive errors and showed the effect of the presence of a constant term on a leverage point when the correlation of the error term was large in absolute value. Barry et al. [15] extended the study of influential observations to the regression model with AR(2) errors and developed diagnostic techniques using a hat matrix.
A large literature is available on influence diagnostics for regression models with continuous regressors. Techniques are still needed for finding influential observations when the problem is the construction of an index number. The objective of this paper is to use the analytical tools of the hat matrix and DFBETA measures to identify the influential observations in estimating the Divisia price index number.
The paper is organized as follows. Section 2 introduces the Divisia index number model and discusses the role of the initial observation in the estimation of the model. The relevant concepts of influence diagnostics for the underlying model are presented in Section 3. An application to Pakistan price data is illustrated in Section 4, and Section 5 summarizes the results.

Regression Model
The well-known Divisia index number is formulated as a regression model, referred to here as model (1), whose errors are assumed to follow an AR(1) process with autoregressive parameter $\phi$. Writing the error structure of model (1) more compactly in matrix notation, the inverse of the variance-covariance matrix of the errors can easily be decomposed using the Cholesky decomposition and written as $Q'Q$, where $Q$ is a lower triangular matrix of order $(nT \times nT)$. Model (1) is then written in matrix form with response vector $DP$, design matrix $X$ and parameter vector $\beta$. It is well known that, under the above assumption, the best linear unbiased estimator (BLUE) of $\beta$ in model (1) can be obtained by the generalized least squares (GLS) approach as

$$\hat{\beta} = (X'Q'QX)^{-1}(X'Q'Q\,DP).$$

There are different approaches to estimating the parameter vector $\beta$ when the autoregressive parameter $\phi$ is unknown (see Judge et al. [16]). The value of $\phi$ is first estimated from the data using any of the alternatives given in Gujarati [17]. The vector $DP$ and the design matrix $X$ are then transformed to the new vector $P^* = Q\,DP$ and the new matrix $X^* = QX$, where the blocks of $X$ are built from $\iota$ and $O$, the vectors of ones and zeros respectively.
We can now apply the ordinary least squares (OLS) estimator to the transformed data to obtain the estimated generalized least squares (EGLS) estimator, and we have

$$\hat{\beta} = (X^{*\prime}X^{*})^{-1}X^{*\prime}P^{*}. \qquad (4)$$

Substituting the results into Equation (4) provides the estimator of $\beta$, the familiar Divisia index numbers, referred to below as formula (5).
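The EGLS steps of this section can be sketched numerically as follows. This is only an illustration under assumptions that are not spelled out in the text above: the errors are taken to follow $\varepsilon_{it} = \phi\,\varepsilon_{i,t-1} + u_{it}$ with the variance of $u_{it}$ proportional to $1/w_i$, the design matrix has one indicator column per time period, and $Q$ is taken in the Prais-Winsten form. The function names and the simulated data are hypothetical.

```python
import numpy as np

def prais_winsten(T, phi):
    """T x T lower triangular AR(1) whitening matrix (assumed form of the time part of Q)."""
    P = np.eye(T) - phi * np.eye(T, k=-1)   # rows t >= 2: e_t - phi * e_{t-1}
    P[0, 0] = np.sqrt(1.0 - phi**2)         # first observation is rescaled, not dropped
    return P

def egls_divisia(dp, w, phi):
    """OLS on the transformed data; dp is T x n (log-changes), w holds n weights."""
    T, n = dp.shape
    X = np.kron(np.eye(T), np.ones((n, 1)))                   # nT x T design matrix
    Q = np.kron(prais_winsten(T, phi), np.diag(np.sqrt(w)))   # assumed nT x nT transformation
    X_star = Q @ X
    p_star = Q @ dp.reshape(-1)             # stack observations period by period
    beta_hat, *_ = np.linalg.lstsq(X_star, p_star, rcond=None)
    return beta_hat

# Hypothetical toy data: 6 periods, 5 items, weights summing to one.
rng = np.random.default_rng(0)
T, n = 6, 5
w = rng.random(n)
w /= w.sum()
dp = rng.normal(size=(T, n))

beta_hat = egls_divisia(dp, w, phi=0.299)
weighted_mean = dp @ w                      # weighted mean of log-changes per period
print(np.allclose(beta_hat, weighted_mean))
```

Under these assumptions the estimate for each period coincides with the weighted mean of the log-changes of that period for any $|\phi| < 1$, which is in line with the description of the estimator as the familiar Divisia index numbers.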

Influence and Hat Matrix
Several measures and plots have been developed to detect influential observations in linear regression. The hat matrix is one of the most common quantities used to detect influential points when the regression parameters are estimated by OLS. The quantity is

$$H = X(X'X)^{-1}X'.$$

The hat matrix for the transformed data $X^*$ is written as

$$H^* = X^*(X^{*\prime}X^*)^{-1}X^{*\prime}.$$

The diagonal elements of the hat matrix, denoted by $h_i$ or $h_{ii}$, are used as a diagnostic for measuring the influence of a specific observation $i$ on the regression parameter estimates. The entries of the matrix depend only on the values of the design matrix $X$, and thus they serve as a measure of the distance of an observation from the centre of the data. A large diagonal value indicates a potentially large impact of the corresponding observation on the regression estimates, and the observation is considered an influential point if its hat value exceeds a specified cut-off point.
We obtain the hat matrix for model (1) using the transformed matrix $X^*$ described in Section 2; its elements are determined by the commodity weights for each $t = 1, 2, \ldots, T$. The elements of the matrix clearly show that the weights of the commodities determine how important a particular commodity is in finding the Divisia index number, and that they remain the same for each time period $t$. They are not affected by the parameter of the autoregressive process. The greater the weight, the more influential the commodity is, irrespective of the time period, because the same weights are assumed over the whole period.
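The claim that the hat values are governed by the weights and not by the autoregressive parameter can be checked numerically. The sketch below assumes, as an illustration only, that the transformed design matrix has the Kronecker form obtained from a Prais-Winsten AR(1) transformation combined with a $\sqrt{w_i}$ scaling of each item, and that the weights sum to one; the function name and the example weights are hypothetical.

```python
import numpy as np

def hat_diagonal(w, T, phi):
    """Diagonal of the hat matrix of the transformed design, returned as a T x n array."""
    n = len(w)
    P = np.eye(T) - phi * np.eye(T, k=-1)   # assumed AR(1) transformation
    P[0, 0] = np.sqrt(1.0 - phi**2)
    # Transformed design: block (t, s) equals P[t, s] * sqrt(w), shape nT x T.
    X_star = np.kron(P, np.sqrt(w).reshape(-1, 1))
    H = X_star @ np.linalg.solve(X_star.T @ X_star, X_star.T)
    return np.diag(H).reshape(T, n)

# Hypothetical weights for four items.
w = np.array([0.5, 0.3, 0.15, 0.05])
T = 5

for phi in (0.0, 0.299, 0.9):
    h = hat_diagonal(w, T, phi)
    # Each row (time period) should reproduce the weights themselves.
    print(phi, np.allclose(h, np.tile(w, (T, 1))))
```

In this setting every hat value equals the weight of the corresponding item in every period, whatever value of $\phi$ is used, because the time part of the transformation is a square nonsingular matrix and cancels out of the projection.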
The hat matrix also plays a key role in deriving expressions that assess the effect of deleting an observation on the parameter estimates and predicted values. The vector DFBETA, given by Belsley et al. [6], denotes the difference between the estimates of the vector $\beta$ with and without the ith observation, i.e.,

$$\mathrm{DFBETA}_i = \hat{\beta} - \hat{\beta}_{(i)}, \qquad (7)$$

where $\hat{\beta}_{(i)}$ is the estimate of $\beta$ with the ith observation excluded.
Puterman [12] studied the impact of the first observation in the constant mean model and in the regression through the origin model with AR(1) errors. Stemann and Trenkler [14] treated influence techniques for the regression model with more than one regressor, both with and without a constant term. Barry et al. [15], in turn, extended the approach of Puterman to the influence of initial observations and subsets of observations in the linear regression model with AR(2) errors. Our main aim is, therefore, to obtain the influential points when dealing with the index number model. For this purpose, we use Equation (7) to obtain the DFBETA measures, given as expression (8), where $p$ denotes the number of parameters in the vector $\beta$. It is clearly seen that the DFBETA values are affected by the autoregressive coefficient of the AR(1) process. For $t = 1$, as $\phi$ approaches 1 in absolute value, in either the positive or the negative direction, deleting the ith observation has a great impact on the parameter estimates. When $\phi = 0$, all the entries of the DFBETA matrix become zero, which might indicate that the data contain no influential point. Besides this, all values depend on the constant factor of the ith weight embodied in the first part of expression (8).
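The definition of DFBETA as the difference between the estimates with and without the ith observation can be illustrated on transformed data. The sketch below does not reproduce expression (8); it only compares a brute-force leave-one-out computation with the standard closed form $(X'X)^{-1}x_i e_i / (1 - h_{ii})$ of Belsley et al. for an OLS fit. The model form, the Prais-Winsten transformation and the simulated data are assumptions, and the function names are hypothetical.

```python
import numpy as np

def dfbeta_closed_form(X, y):
    """Row i holds beta_hat - beta_hat_(i), using (X'X)^{-1} x_i e_i / (1 - h_ii)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)      # hat values h_ii
    return (X @ XtX_inv) * (resid / (1.0 - h))[:, None]

def dfbeta_leave_one_out(X, y):
    """Row i holds beta_hat - beta_hat_(i), refitting without observation i."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    out = np.empty(X.shape)
    for i in range(X.shape[0]):
        keep = np.arange(X.shape[0]) != i
        out[i] = beta - np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    return out

# Hypothetical toy data: 4 periods, 3 items, assumed AR(1) transformation.
rng = np.random.default_rng(1)
T, n, phi = 4, 3, 0.299
w = rng.random(n)
w /= w.sum()
P = np.eye(T) - phi * np.eye(T, k=-1)
P[0, 0] = np.sqrt(1.0 - phi**2)
Q = np.kron(P, np.diag(np.sqrt(w)))
X_star = Q @ np.kron(np.eye(T), np.ones((n, 1)))
y_star = Q @ rng.normal(size=T * n)

closed = dfbeta_closed_form(X_star, y_star)
brute = dfbeta_leave_one_out(X_star, y_star)
print(np.allclose(closed, brute))
```

The closed form avoids refitting the model once per observation, which matters when the number of stacked observations is large, as in the application with several hundred items and many time periods.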

An Application
In this section, we present an application to price data for Pakistan. The data consist of 374 consumer items classified into ten groups for the period from July 2002 to June 2011. The groups are food and beverages; apparel, textile and footwear; house rent; fuel and lighting; household furniture and equipment; transportation and communication; recreation and entertainment; education; cleaning, laundry and personal appearance; and medicare. In the first phase, we compute the parameter vector $\beta$ using formula (5), which gives the Divisia index number when the same weights are used throughout the entire period. To estimate the value of $\phi$, the residuals are plotted against the observation numbers in Figure 1. The plot supports the stationarity of the series, with constant mean and constant variance.
The Durbin-Watson statistic is 1.4002, indicating the presence of autocorrelation in the residual series. The estimate of $\phi$ obtained using the Yule-Walker procedure is 0.299. For each time period the hat values are given by the weights of the commodities, and the large values are shown in Table 1. The highest hat value corresponds to the house rent index, implying its importance in estimating the index numbers. Other leading items include wheat flour bag, milk fresh, and electric charges for the consumption of more than 1000 units. To investigate the effect of the presence of items with large hat values, we plot in Figure 2 the estimated Divisia index numbers based on all items and without the items house rent and wheat flour bag. The index number estimates excluding house rent are much different from the estimates based on all items, whereas the estimates without wheat flour bag are relatively close to the index numbers with all observations. Table 2 presents the list of items with significant DFBETAs exceeding the cut-off value 0.00995. The large values of $h_i$ are marked with the superscript *. Out of these 37 influential items, 32 are from the first commodity group, food and beverages, and include wheat flour, vegetable ghee, sugar, beef with bones, and mutton, with correspondingly higher diagonal values of the hat matrix. Sugar refined, with $h_{19} = 0.019467$, has an impact on the estimated index numbers for 41 months; in other words, it affects the values of 41 alphas in the parameter vector $\beta$. Other influential commodity groups include house rent, with the highest value $h_{167} = 0.234298$, fuel and lighting, and transportation and communication.
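The reported cut-off values and the estimate of $\phi$ can be cross-checked with a few lines of arithmetic. The check below rests on assumptions that are not stated in the text: the cut-offs are taken to follow the conventional rules $2p/N$ for hat values and $2/\sqrt{N}$ for DFBETAs, with $N = nT$ stacked observations, $T = 108$ monthly periods between July 2002 and June 2011, and one parameter per period ($p = T$). The relation $\phi \approx 1 - d/2$ is the usual approximation linking the Durbin-Watson statistic $d$ to the first-order autocorrelation.

```python
import math

n_items = 374       # number of consumer items, from the text
n_periods = 108     # assumption: monthly periods, July 2002 to June 2011
N = n_items * n_periods
p = n_periods       # assumption: one index-number parameter per period

hat_cutoff = 2 * p / N               # conventional 2p/N rule (assumed)
dfbeta_cutoff = 2 / math.sqrt(N)     # conventional 2/sqrt(N) rule (assumed)
phi_from_dw = 1 - 1.4002 / 2         # approximation phi ~ 1 - d/2

print(round(hat_cutoff, 6), round(dfbeta_cutoff, 5), round(phi_from_dw, 4))
```

Under these assumptions the two rules agree, after rounding, with the cut-offs 0.005348 and 0.00995 used in Tables 1 and 2, and the Durbin-Watson approximation gives a value of about 0.30, close to the Yule-Walker estimate of 0.299.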

Conclusion
In this paper, we use the general expressions of the hat matrix and the DFBETA measure to detect influential observations in estimating the Divisia price index number when the errors are generated by an AR(1) process. The hat values depend only on the weights of the commodities, showing that the weights of the consumer items have a large influence on the parameter estimates and are not affected by the parameter of the AR(1) autoregressive process. An example is presented using price data for Pakistan. From the findings of both the hat matrix and the DFBETAs, food and beverages is the leading commodity group, since it contains the largest number of items with large hat values and DFBETA measures.

Figure 2. Divisia index numbers including all items, excluding house rent and wheat flour.

Table 1. Significant hat values for the items exceeding cut-off value 0.005348.

Table 2. Significant DFBETAs corresponding to items and their hat values.