A Survey of Methods to Interpolate, Distribute and Extrapolate Time Series

This survey offers a broad overview of the literature on methods for temporal disaggregation and benchmarking. Dozens of methods, procedures and algorithms have been proposed in the statistical and economic literature to solve the problem of transforming a low-frequency series into a high-frequency one. This paper classifies and reviews these procedures, discusses the history of their methodological development, identifies the assets and drawbacks of each method, describes the current state of the art, and points out the topics in need of further development. It should be useful both for readers who are interested in the techniques but not yet familiar with the literature and for researchers who would like to keep up with recent developments in this area. After reading the article, the reader should have a good understanding of the most important approaches, their shortcomings and advantages, and be able to make an informed judgment on which methods are most suitable for his or her purpose. Readers will not, however, find much detail on the methods reviewed: owing to the breadth of the subject and the large number of studies referenced, only general assessments of the methods are provided, without detailed analysis. This review article can therefore serve as a brief introduction to the literature on temporal disaggregation.


Introduction
Time series modeling is commonly used by private and public economic and financial analysts to forecast future values, analyze the properties of a series, characterize its salient features and monitor the current status of the economy [1,2]. However, even though social and economic life has quickened and become more turbulent, it is not unusual for some relevant variables to be unavailable with the desired timing and frequency. The extra costs of collecting data more frequently, practical limitations on obtaining certain variables more regularly, and delays in gathering and handling more frequent records deprive analysts, more often than desired, of the valuable help that high-frequency data would provide for closer and more accurate short-term analyses. Indeed, when temporally aggregated data are used to model and study the relationships among variables, individuals and/or entities, a distorted view of parameter values, lag structures and seasonal components may well be reached [3] and, as a consequence, poor models and/or forecasts could be obtained and wrong decisions taken [4]. In econometric modeling, when some of the relevant series are only available at lower frequencies, clear improvements in model selection, efficiency of parameter estimates and prediction quality are usually obtained if the frequency of the low-frequency time series is increased beforehand [5][6][7][8][9][10][11].
It is not surprising, therefore, that a great many methods, procedures and algorithms have been proposed in the literature to increase the frequency of some critical variables, especially in economics, where this allows a more timely and more precise analysis of the condition that an economy or company is experiencing at a given time, making it easier to anticipate changes and react to them.
Obviously, business and economics are not the only fields where this would be useful. Other areas also use these techniques to improve the quality of their analyses. This paper, however, focuses on the contributions made and used within economics, since the problems faced by short-term analysts and quarterly account producers have been the main catalysts for the extension and improvement of most of the procedures proposed in the literature. For instance, these procedures have historically been linked to the need to solve problems related to the production of coherent quarterly national and regional accounts; see, e.g., [12] and [13]. Indeed, it is quite probable that some of the future fruitful developments in this subject will arise to meet new challenges posed in these areas. In addition to the methods specifically proposed to estimate quarterly and monthly accounts, a significant number of methods suggested to estimate missing observations have also been adapted to this issue. Classifying the large variety of methods proposed therefore emerges as a necessary requirement for a systematic and ordered study of the numerous alternatives suggested. Furthermore, because different algorithms can give rise to series with different cyclical, seasonal and stochastic properties, even when they have been derived from the same research field using the same basic information [14], a classification is in itself a useful tool for selecting a suitable technique.
Depending on the criteria adopted, alternative classifications could be reached. This paper follows the basic division proposed in [15,16], which divide temporal disaggregation problems according to the basic information available for estimation. Other divisions have been suggested in the literature, for instance in [17], which divides the literature according to the nature of the problems. No categorization, however, avoids the problem of deciding where to place some procedures or methods (which could belong to different groups and whose location turns out to be extremely complicated), and it is the belief of the author that the former approach makes the exposition clearer and simpler. Furthermore, to make the paper quicker and easier to follow, the mathematical expressions have been omitted and a narrative style is used to convey the logic of each procedure.
Before introducing any categorization, the different problems that can appear when disaggregating a time series must be set out. The next section is devoted to this. The rest of the paper is organized as follows. Section 2 introduces the different types of problems faced in this framework. Section 3 sets out the criteria followed to classify the different alternatives and details the main characteristics of each group. Sections 4 to 7 present the methods, and Section 8 discusses and concludes.

Interpolation, Distribution and Extrapolation Problems
In general, within this framework and depending on the kind of variable handled (either flow or stock), two different […] In both cases, when estimates are extended beyond the period covered by the low-frequency series, the problem is called extrapolation. Extrapolation is used, therefore, to forecast values of the high-frequency series when no temporal constraints from the low-frequency series are available; in some cases (especially in multivariate contexts), however, other forms of constraints can exist. In these cases, the matrix B is replaced by a proper design matrix [18].
Furthermore, related to distribution problems are benchmarking and balancing problems [17], which are mainly used in management and by official statistical agencies to adjust the values of a high-frequency (HF) series of ballpark figures (usually obtained using sampling techniques) to a more accurate low-frequency (LF) series, as well as other temporal disaggregation problems where the temporal aggregation function is not the sum. In any case, despite the great number of procedures for temporal disaggregation of time series proposed in the literature, fulfillment of the constraints derived from the observed LF series is the norm in the subject.
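The aggregation constraints discussed in this section can be made concrete with a small sketch. It builds a temporal aggregation matrix, called B here after the matrix mentioned in the text, for a flow variable (each LF value is the sum of its HF values) and for a stock variable (assumed here, purely for illustration, to be observed at the end of each LF period). The function name `aggregation_matrix`, its parameters and the end-of-period convention are assumptions of this sketch, not notation taken from the surveyed papers.

```python
import numpy as np

def aggregation_matrix(n_low, s, kind="flow", n_extra=0):
    """Matrix B such that B @ high_freq_series gives the LF series.

    n_low   : number of LF periods (e.g. years)
    s       : HF periods per LF period (e.g. 4 quarters per year)
    kind    : "flow" sums the s sub-periods; "stock" picks the last one
              (end-of-period stocks are an assumption of this sketch)
    n_extra : HF periods beyond the last LF observation (extrapolation);
              they receive zero columns because no LF constraint binds them
    """
    if kind == "flow":
        c = np.ones((1, s))
    elif kind == "stock":
        c = np.zeros((1, s))
        c[0, -1] = 1.0
    else:
        raise ValueError("kind must be 'flow' or 'stock'")
    B = np.kron(np.eye(n_low), c)
    return np.hstack([B, np.zeros((n_low, n_extra))])

# Eight quarters of illustrative data covering two years.
quarterly = np.arange(1.0, 9.0)
annual_flow = aggregation_matrix(2, 4, "flow") @ quarterly
annual_stock = aggregation_matrix(2, 4, "stock") @ quarterly
```

A distribution or interpolation method then has to return an HF series whose image under B reproduces the observed LF figures; the zero columns added by `n_extra` reflect that extrapolated periods are not tied to any LF observation.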

Criteria of Classification
As stated in the introduction, many different criteria could be proposed to divide the great number of proposals that can be found in this framework. A first division arises from the domain in which the problem is approached, either the frequency domain or the time domain. This division, however, is not well balanced, since the time-domain perspective has been by far the more popular. The procedures that deal with the problem from a time-domain perspective are analyzed in Sections 4, 5 and 6, while the methods that try to solve the problem in the frequency domain are introduced in Section 7.
Another possible criterion of division is whether or not related series, usually called indicators, are used. Economic events tend to become visible in different ways and to affect many dimensions. Economic series are therefore correlated variables that do not evolve in isolation. Consequently, it is not unusual for some variables available at HF to display fluctuations similar to those (expected) for the target series. Some methods try to take advantage of this fact to temporally distribute the target series. Thus, the use or not of indicators has been considered as another criterion of classification. The procedures that deal with the series in isolation and compute the missing data of the disaggregated HF series in the time domain, taking into account only the information given by the objective series, are presented in Section 4.
Complementing the previous methods is the set of procedures that exploit the economic relationships between indicators and objective series. This group, composed of an extensive and varied collection of widely used methods, has been enormously successful. As an example, the use of procedures based on indicators is the rule among the governmental statistical agencies that use indirect methods to estimate quarterly and monthly accounts [16,19]. These procedures are presented in Section 5.
Finally, a last group has been considered: the Kalman filter approaches. The methods that use the state-space representation to estimate the unavailable values are grouped in Section 6. The great flexibility offered by representing temporal processes in state space, and the enormous possibilities that these representations present for properly dealing with log-transformations and dynamic approximations, justify a section of their own, even though all the approaches in that section could be classified in the previous two sections.

Non-Indicators Algorithms
Different approaches and strategies have been employed without using related variables. The first methods proposed were developed, without any theoretical justification, as mere instruments for the elaboration of quarterly (or monthly) national accounts. They were quite mechanical and generated the HF series by imposing some a priori desired conditions. Gradually, however, new methods (theoretically founded on the Autoregressive Integrated Moving Average (ARIMA) representation of the series to be disaggregated) appeared, introducing more flexibility into the process.
The design of these early methods, however, was already influenced by the need to solve an issue that appears recurrently in the subject and that every method suggested to temporally disaggregate a time series must tackle: the spurious step problem. To prevent series with undesired discontinuities from one year to the next, the pioneers proposed estimating quarterly values using figures from several consecutive years. The disaggregation methods proposed by [20][21][22] were devised to estimate the quarterly series corresponding to year t as a weighted average of the annual values of periods t-1, t and t+1. They estimate the quarterly series through a fixed weight structure. The difference among these methods lies in the choice of the weight matrix. [20] calculated the weight matrix by requiring that the estimated series satisfy some a priori "interesting" properties. [21] assumed that the curve of the quarterly estimates lies on a second-degree polynomial that passes through the origin. And [22] extended [21]'s proposal to polynomials of other degrees. Furthermore, following the pioneers' ideas, [23] extended [20] to the case of distributing quarterly or annual series into monthly ones, and later [24] provided, in an econometric computer package G, a method to convert annual figures into quarterly series, assuming that a cubic polynomial is fitted to each successive set of two points of the low-frequency series. All these methods, however, are univariate, and it was necessary to wait two more decades to find within this approach a solution for the multivariate problem. Only recently have [25,26] extended [24]'s univariate polynomial method to the multivariate case, for both stock and flow variables.
A different approach was proposed in [27], who, also using an ad hoc mathematical procedure, build the quarterly series by solving an optimization problem. In particular, their method builds the quarterly series as the solution to the minimization of the sum of squares of either the first or the second differences of the (unknown) consecutive quarterly values, under the condition that the annual aggregation of the estimated series adds up to the available annual figures. Although the algorithms in [27] largely reduced the subjective element of the preceding methods, their way of solving the problem of spurious steps was still somewhat subjective and therefore not free of criticism. On this point, see, among others, [6,14,15,28].
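The optimization problem just described can be sketched in a few lines. The code below minimizes the sum of squared d-th differences of the unknown HF values subject to the aggregation constraint, by solving the first-order (Lagrangian) conditions as one linear system. It is a minimal illustration of the problem as stated in the text for a flow variable, not a reproduction of the algorithm in [27]; the function name `smooth_disaggregate` and its parameters are assumptions of this sketch.

```python
import numpy as np

def smooth_disaggregate(low, s=4, d=1):
    """Distribute a flow series by minimizing squared d-th differences.

    low : LF values (e.g. annual totals); needs at least d observations
    s   : HF periods per LF period
    d   : order of the differences penalized (1 or 2 in the text)
    """
    low = np.asarray(low, dtype=float)
    m, n = low.size, low.size * s
    B = np.kron(np.eye(m), np.ones((1, s)))   # annual sums of quarters
    D = np.diff(np.eye(n), n=d, axis=0)       # d-th difference operator
    Q = D.T @ D
    # Stationarity and feasibility conditions of
    #   min y'Qy  subject to  B y = low
    kkt = np.block([[2.0 * Q, B.T],
                    [B, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), low])
    return np.linalg.solve(kkt, rhs)[:n]

annual = np.array([100.0, 120.0, 150.0])      # illustrative figures
quarterly = smooth_disaggregate(annual, s=4, d=1)
```

Because the penalty only involves differences, a constant annual series is distributed into a constant quarterly series, while changing annual totals are spread smoothly across year boundaries instead of producing a step in the first quarter of each year.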
Just as the polynomial procedures were generalized, the approach of [27] was also extended in both flexibility and the number of series handled. [29] extended the work of [27] by introducing flexibility into the process in two ways. On the one hand, they dealt with any possible pair of high and low frequencies. On the other hand, they considered the minimization of the sum of squares of the ith differences between successive sub-period values (not only first and second differences). The multivariate extension, however, was introduced later, in [30].
In addition to the abovementioned ad hoc mathematical algorithms, many other methods could also be classified within this group of techniques. Among them, [31] and [32] are highlighted here. [31] assumed that the target series is observed at its higher frequency in part of the sample period (generally at the end of the sample) and proposed using this subsample to estimate the unavailable values, employing the temporal characteristics of the series obtained from it. This strategy, however, as proved by [33], yields inefficient estimates. [32], on the other hand, proposed obtaining the target series by minimizing a quadratic function defined by the inverse covariance matrix associated with the quarterly stationary ARMA(p,q) process obtained by differencing the non-stationary ARIMA(p,d,q) one (an ARIMA process with autoregressive order p, integration order d, and moving average order q). In particular, they suggested fitting an ARIMA model to the low-frequency series, selecting a model for the quarterly values compatible with the annual model, and minimizing the loss function of the dth differences of the objective series using the annual series as constraint. Unfortunately, according to [34], the method of [32] only performs well when the series are long enough to permit a proper estimation of the ARIMA process. [32], however, made it possible to reassess [27]. They showed that the algorithm of [27] is equivalent to using their procedure under the assumption that the series to be estimated follows an integrated process of order one or two (i.e., I(1) or I(2)).
To learn about the advantages and disadvantages of many of the methods introduced in this section, and to decide which to use under what circumstances, see [34,35]. They performed a simulation and a real-data exercise in which [20,21,27], [36] in its variant without indicators, [32,37] and Lagrange polynomial interpolators were compared.

Indicator-Based Methods
Procedures based on related variables have been the most popular, the most widely used and the most successful. Thus, a great number of procedures can be found within this category. Compared with algorithms that do not use indicators, related-variable procedures have been credited with two main advantages: (i) their construction hypotheses are better founded (which can in turn affect the validation of results); and (ii) they make use of relevant economic and statistical information and are therefore more efficient. In return, however, the resulting estimates depend crucially on the indicators chosen and, as [38] observed, they hide an implicit hypothesis according to which the LF relationship is assumed to hold at the HF.
The first drawback implies that special care should be taken in selecting indicators. This problem, however, far from being closed, has remained open for decades. As early as 1951, [39] tried to establish some criteria that indicators should fulfill, and the issue has been revisited repeatedly [38,40,12,30]. This has not, however, resulted in any universally accepted sound criteria. Indeed, although the use of indicators to predict the probable evolution of key series throughout the quarters of a year is the rule among the countries that estimate quarterly accounts by indirect methods, there are no apparent fixed criteria for selecting them. A summary of the indicators used by the different national statistical agencies can be found in [12], and a list of features that a variable should have to be selected as an indicator is given in [13].
In order to better manage the large number of procedures classified in this category, this section has been divided into three subsections. The first subsection is devoted to those procedures, called benchmarking techniques, which, given an initial estimated HF series, adjust its values using some penalty function in order to fulfill the LF constraints. The second subsection presents the procedures that take advantage of econometric models to approximate the incompletely observed variables. Some techniques that use dynamic regression models to identify the relationship linking the series to be estimated and the (set of) related time series have also been included in this subsection. Finally, the third subsection presents those methods, named optimal procedures, which jointly estimate both parameters and HF series by combining target LF series and HF indicators and incorporating the LF constraints in the estimation process, basically [41] and its extensions.

Benchmarking/Adjusting Algorithms
As a rule, benchmarking and adjusting methods consist of two stages. In the first stage, an initial approximation of the objective series is obtained. In the second stage, the first estimates are adjusted by imposing the constraints derived from the available and more reliable annual series. The initial estimates are obtained using either sampling procedures or some kind of relationship between indicators and target series. Among the options that use related variables to obtain initial approximations, both non-correlated and correlated strategies can be found. The non-correlated proposals (the earlier ones from a historical perspective) do not explicitly take into account the existing correlation between target series and indicators and rapidly lost analysts' favor. [42] can be consulted for a wide summary of these algorithms. The correlated strategies, on the other hand, usually assume a linear relationship between the objective series and the indicators, from which an initial HF series is obtained. Once the initial approximation is available, it is adjusted to make it consistent with the observed LF series: the discrepancies between the two LF series (the observed series and the series obtained by aggregating the initial estimates) are removed.
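The first stage of a correlated strategy can be illustrated with a minimal sketch. It assumes, purely for illustration, a single HF indicator, a flow target series and a linear relationship with intercept, estimated by ordinary least squares on the annually aggregated data; the fitted coefficients are then applied to the HF indicator. The function name `initial_estimate` and these modeling choices are assumptions of the sketch, not the procedure of any specific paper cited here.

```python
import numpy as np

def initial_estimate(y_low, x_high, s=4):
    """First-stage HF approximation from a linear relation with an indicator.

    y_low  : observed LF target series (flows)
    x_high : HF indicator covering the same span (len(y_low) * s values)
    Returns the preliminary HF series and the LF discrepancies that the
    second (adjustment) stage has to remove.
    """
    y_low = np.asarray(y_low, dtype=float)
    x_high = np.asarray(x_high, dtype=float)
    x_low = x_high.reshape(-1, s).sum(axis=1)
    # LF regression y_low = s*a + b*x_low + e: the HF intercept a is
    # added up s times when the HF relation is aggregated.
    design = np.column_stack([np.full(y_low.size, float(s)), x_low])
    (a, b), *_ = np.linalg.lstsq(design, y_low, rcond=None)
    prelim = a + b * x_high
    discrepancy = y_low - prelim.reshape(-1, s).sum(axis=1)
    return prelim, discrepancy

# Illustrative data: three years of a quarterly indicator.
indicator = np.array([5.0, 6.0, 6.5, 7.0, 7.2, 7.9, 8.1, 9.0, 9.4, 9.9, 10.5, 11.0])
annual_target = np.array([60.0, 75.0, 98.0])
prelim, discrepancy = initial_estimate(annual_target, indicator)
```

The discrepancies returned here are simply the residuals of the annual regression; the benchmarking procedures reviewed next differ in how they spread these residuals over the sub-periods.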
A great number of benchmarking procedures can be found in the literature. Bassie [43] proposed distributing annual discrepancies through a structure of fixed weights, calculated by taking into account the discrepancies corresponding to two successive years and assuming that the weight function follows a third-degree polynomial. [43] has historically been applied to series of the French and Italian economies [44][45][46], and currently Finland and Denmark use variants of this method to adjust their quarterly GDP series [12]. However, it lacks theoretical justification and, according to [43], the method yields series with irregularities and cyclical components different from those of the initial approximations when the annual discrepancies are too big [12].
Vangrevelinghe [47] took a different approach. His proposal (primarily suggested to estimate the French quarterly household consumption series) consists of (1) applying [20] to both the objective annual series and the indicator annual series to obtain, respectively, an initial approximation and a control series, and then (2) modifying the initial estimate by adding the discrepancies between the observed quarterly indicator and the control series, using as scale factor the Ordinary Least Squares (OLS) estimator of the linear model observed at the annual level. Later, minor variations of [46] were proposed by [28] and [49]. [28] suggested obtaining the initial estimates using [27] instead of [20], and [48] proposed generalizing [88] by allowing the weight structure to be different for each quarter and year, with the weight structure obtained, using annual constraints, from a linear model.
One of the most successful methods in the area (not only among benchmarking procedures) is the approach proposed by [36] in 1971. The great appeal of methods such as [36], and also [41], among analysts and statistical agencies [17], even though more sophisticated procedures generally yield better estimates [13], can be explained because, as [49] pointed out, short-term analysis in general and quarterly accounts in particular need disaggregation techniques that are "…flexible enough to allow for a variety of time series to be treated easily, rapidly and without too much intervention by the producer;" and where "the statistical procedures involved should be run in an accessible and well known, possibly user friendly, and well sounded software program, interfacing with other relevant instruments typically used by data producers (i.e. seasonal adjustment, forecasting, identification of regression models,…)".
Denton [36] suggested adjusting the initial estimates by minimizing a loss function defined by a quadratic form. The choice of the symmetric matrix determining the specific quadratic form of the loss function is therefore the crucial element in [36]. Denton concentrated on the solutions obtained by minimizing the hth differences between the series to be estimated and the initial approximations, and found [27] to be a particular case of his algorithm. Later on, [50] proposed a slight modification to this family of functions in order to avoid dependence on the initial conditions. The main extensions of [36], however, were achieved by [51][52][53][54][55], who made the algorithm more flexible and extended it to the multivariate case.
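A minimal sketch of this kind of adjustment follows. It corrects a preliminary HF series so that it adds up to the LF benchmarks while minimizing the sum of squared h-th differences of the corrections (the additive variant for flows). The flag `initial_condition` switches between a penalty that treats the correction before the sample as zero, in the spirit of the original formulation, and one that drops that assumption, in the spirit of the modification attributed to [50]. The function name, the flag and the exact form of both penalties are assumptions of this sketch rather than a transcription of [36] or [50].

```python
import numpy as np

def denton_adjust(prelim, low, s=4, h=1, initial_condition=False):
    """Additive benchmarking of a preliminary HF flow series.

    prelim : preliminary HF series (length len(low) * s)
    low    : LF benchmarks that the adjusted series must add up to
    h      : order of the differences of the corrections being penalized
    """
    prelim = np.asarray(prelim, dtype=float)
    low = np.asarray(low, dtype=float)
    n, m = prelim.size, low.size
    B = np.kron(np.eye(m), np.ones((1, s)))
    if initial_condition:
        # Square operator: the correction before the sample is taken as zero.
        D = np.linalg.matrix_power(np.eye(n) - np.eye(n, k=-1), h)
    else:
        # Differences computed inside the sample only.
        D = np.diff(np.eye(n), n=h, axis=0)
    Q = D.T @ D
    # Conditions for  min z'Qz  subject to  B z = low - B prelim
    kkt = np.block([[2.0 * Q, B.T],
                    [B, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), low - B @ prelim])
    z = np.linalg.solve(kkt, rhs)[:n]
    return prelim + z

prelim = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0])
benchmarks = np.array([50.0, 58.0])   # each year exceeds the preliminary sum by 4
adjusted = denton_adjust(prelim, benchmarks)
adjusted_ic = denton_adjust(prelim, benchmarks, initial_condition=True)
```

In this toy case the annual discrepancy is the same in both years, so without the initial condition every quarter is raised by the same amount and the quarter-to-quarter movements of the preliminary series are kept intact. With the initial condition the corrections are pulled towards zero at the start of the sample, which illustrates the dependence on initial conditions mentioned in the text.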
Hillmer and Trabelsi [51,52] worked on the problem of adjusting a univariate HF series using data obtained from different sampling sources, and found [36] and [50] to be particular cases of their proposal. In particular, they relaxed the requirements on the LF series, allowing it to be observed with error; in exchange, they had to assume that the temporal structure of the errors caused by sampling the LF series is known. When benchmarks are observed without error, the problem becomes one of minimizing the discrepancies between the initial estimates and the LF series according to a loss function of quadratic form [52]. In these circumstances, they showed that minimizing the hth differences, as proposed by [36] and [50], implicitly assumes: (1) that the ratio between the variances of the observation errors and of the ARIMA model errors of the initial approximation tends to zero; and (2) that the observation errors follow an I(h) process, either with null initial conditions, in [36], or with the initial values of the series of observation errors beginning in the remote past, in [50].
In sample surveys, most time series data come from repeated surveys whose sample designs usually generate autocorrelated errors and heteroscedasticity. Thus, [53] introduced a regression model to take this into account explicitly and showed that the gain in efficiency from using a more complex model varies with the ARMA model assumed for the survey errors. Along these lines, [56] showed, through a simulation exercise and assuming that the survey error series follows an AR(1) process, that [53] and [57] have great advantages over [36] and that they are robust to misspecification of the survey error model. The multivariate extension of [36], on the other hand, was introduced under a general accounting constraint system in [54] and [55]. They assumed a set of linear relationships among target variables and indicators from which initial estimates are obtained and then, applying the movement preservation principle of the Denton approach subject to the whole set of contemporaneous and temporal aggregation relationships, reached estimates of all the series satisfying all the constraints.
Although [36] and [55] do not require any reliability measure of the survey error series in order to be applied, the need for one in many other extensions of [36] [51] led [58] to propose an alternative approach. In particular, to overcome some arbitrariness in the choice of the stochastic structure of the high-frequency disturbances, [59] and [60] developed a new adjustment procedure assuming that the initial approximation and the objective series share the same ARIMA model. They combined an ARIMA-based approach with the use of high-frequency related series in a regression model to obtain the Best Linear Unbiased Estimate (BLUE) of the objective series satisfying the LF constraints. This approach permits an automatic 'revision' of the estimates with each new observation (which takes a recursive form in [60]), which marks an important difference from the other procedures, where the estimates obtained for periods relatively far from the last period of the sample are in practice 'fixed'. The multivariate and extrapolation extensions of Guerrero's approach were likewise provided by Guerrero and colleagues [58,61]. [61] suggested a procedure, based on vector autoregressive models, for estimating unobserved values of multiple time series whose temporal and contemporaneous aggregates are known, while [58] proposed a recursive approach to estimate current disaggregated values of the series and a method to predict future disaggregated values. It should be noted, however, that even though the problem can be cast in a state-space formulation, the Kalman filter cannot be applied directly in these circumstances, since the usual assumptions underlying Kalman filtering are not fulfilled.
When dealing with economic variables, it is not uncommon to use logarithms or other transformations of the original data (for example, most time series become stationary after applying first differences to their logarithms) to achieve better time series models since, as [141] showed, "…the failure to account for data transformations may lead to serious errors in estimation". A very interesting variant in this framework therefore emerges when log-transformations are taken. The problem of dealing with log-transformed variables in the temporal distribution framework was first considered by [63], and later treated, among others, in [64] and [1]. However, because the logarithmic transformation is not additive, the LF aggregation constraint cannot be applied directly. When only log-transformations are taken, [64] proposed obtaining initial estimates by applying the exponential function to the approximations reached using a linear relationship between the log-transformed target series and the indicators, and then, in a second step, applying the algorithm of [36] to obtain the final values; although, according to [47], this last step could be unnecessary as "the disaggregated estimates present only negligible discrepancies with the observed aggregated values." On the other hand, when the linear relationship is expressed in terms of the rate of change of the target variable (i.e., using the logarithmic difference), [47] and [65] suggested obtaining initial estimates for the non-transformed values of the objective variable using [66] and then benchmarking them (using Denton's formula), for either flow or index variables, to fulfill the temporal aggregation constraints exactly.

Econometric Model Approaches
Economic theory posits functional relationships among variables, and econometric models express those relationships by means of equations. Models based on annual data conceal higher-frequency information and are not considered sufficiently informative for policy makers. Building quarterly and monthly macroeconometric models is therefore imperative. Sometimes the frequency of the variables in the model is not homogeneous, and expressing the model in the lowest common frequency almost never offers an acceptable approximation. Moreover, as [67,68] showed, it is preferable to estimate the missing observations simultaneously with the econometric model rather than to interpolate the unavailable values beforehand in order to handle the high-frequency equations directly. Thus, setting the model at the desired frequency and using the same model not only to estimate the unknown parameters but also to estimate the unobserved values of the target series is, in general, a good alternative.
Drettakis [69] formulated a multi-equation dynamic model of the United Kingdom economy with one of the endogenous variables observed only annually for part of the sample and obtained estimates of the parameters and the unobserved values by full-information Maximum Likelihood (ML). [70] extended [69] to the case in which more than one series is unobserved and introduced an improvement to reduce the computational burden of the estimation procedure. ML was also used in [74], [75] and [10]. As an example, [10] derived the ML estimator when data are subject to different temporal aggregations and compared its sample variance with those obtained with the Generalized Least Squares (GLS) and Ordinary Least Squares (OLS) estimators of [74,75]. GLS estimators, on the other hand, were employed by [71,76,78] for models with missing observations in the exogenous variables and, therefore, probably with a heteroscedastic and serially correlated disturbance term.
In the extension to dynamic regression models, the ML approach was again used in the works of Palm and Nijman. [79] considered a simultaneous equations model, not completely specified, of the Dutch labor market with some variables observed only annually and proposed obtaining initial estimates for those variables using the univariate quarterly ARIMA process derived from the observed annual series. These initial estimates were used to estimate the model parameters by ML. [77] studied the problem of parameter identification, and [79] and [9] that of estimation. To estimate the parameters they proposed two alternatives based on ML. The first consisted of building the likelihood function from the forecast errors, using the Kalman filter. The second consisted of applying the EM algorithm adapted to incomplete samples. This adaptation was developed in a long and wide-ranging paper by [73]. [57], on the other hand, presented a general dynamic stochastic regression model, which can deal with the most common short-term data treatments (including interpolation, benchmarking, extrapolation and smoothing), and showed that the GLS estimator is the minimum-variance linear unbiased estimator [17]. Additionally, although other temporal disaggregation procedures based on dynamic models have been proposed [49,80,81], they will be considered in the next subsection, as they can be seen as dynamic extensions of [41]. It must be noted, however, that they might also be placed in the previous subsection, because they can be viewed from the perspective of the classical two-step approach of adjusting algorithms.

Optimal Procedures
Optimal methods get their name from the estimation strategy they adopt. Such procedures directly incorporate the restrictions derived from the observed annual series into the estimation process to jointly obtain the best linear unbiased estimator (BLUE) of both the parameters and the quarterly series. To do that, a linear relationship between the target series and the indicators is usually assumed. This group of methods is one of the most widely used, and in fact its root proposal, [41], has served as a basis for many statistical agencies [16,82,83,84] and analysts [85,86,87,88] to distribute annual accounts quarterly and to provide flash estimates of quarterly growth, among other tasks.
There are many links between benchmarking and optimal procedures; however, according to [14], "(1) … compared to optimal methods, adjustment methods make an inefficient (and sometimes, biased) use of the indicators; (2) the various methods have a different capability of providing statistically efficient extrapolation". In contrast, the solution of optimal methods crucially depends on the correlation structure assumed for the disturbances of the linear relationship. In fact, many optimal proposals differ only in that point. All of them, nevertheless, seek to avoid spurious steps in the estimated series.
Friedman [42] was the first to apply this approach. He solved the stock variable case, obtaining the BLUE of both the coefficients and the target series. Nevertheless, it was [41] who solved, using a common notation, the interpolation, distribution and extrapolation problems and wrote what is probably the most influential and most cited paper on the subject. They focused on the case of converting a quarterly series into a monthly one and assumed an AR(1) hypothesis for the disturbances in order to avoid unjustified discontinuities in the estimated HF series. Under this hypothesis, the covariance matrix is governed by the first-order autoregressive coefficient of the HF disturbance series, which is unknown. Hence, it has to be estimated before the method can be applied. [41] suggested exploiting the functional relationship between the first-order autoregressive coefficients of the LF and HF errors to estimate it. Specifically, they proposed an iterative procedure to estimate the monthly AR(1) coefficient from the ratio between elements (1, 2) and (1, 1) of the quarterly error covariance matrix.
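As an illustration of the kind of estimator discussed above, the following is a minimal sketch of a regression-based distribution of a flow series under AR(1) disturbances. It is not a reproduction of the procedure in [41]: the autoregressive coefficient rho is taken as given rather than estimated iteratively, and the function names (ar1_cov, chow_lin_sketch), the indicator matrix X and the use of NumPy are assumptions made only for this example.

```python
import numpy as np

def ar1_cov(T, rho):
    """AR(1) covariance matrix up to scale: V[i, j] = rho**|i - j| / (1 - rho**2)."""
    idx = np.arange(T)
    return rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho ** 2)

def chow_lin_sketch(y, X, k, rho):
    """Distribute the LF flow series y (length N) over T = k*N HF periods.

    X is a (T x p) matrix of HF indicators; rho is an assumed, known
    AR(1) coefficient of the HF disturbances (hypothetical input).
    """
    N = len(y)
    B = np.kron(np.eye(N), np.ones((1, k)))   # temporal aggregation (sum) matrix
    V = ar1_cov(N * k, rho)                    # HF disturbance covariance
    W = B @ V @ B.T                            # implied LF disturbance covariance
    XL = B @ X                                 # indicators aggregated to LF
    Wi_XL = np.linalg.solve(W, XL)
    # GLS estimate of the regression coefficients on the aggregated model
    beta = np.linalg.solve(XL.T @ Wi_XL, Wi_XL.T @ y)
    # Regression fit plus LF residuals smoothed across the HF periods
    z = X @ beta + V @ B.T @ np.linalg.solve(W, y - XL @ beta)
    return z, beta

rng = np.random.default_rng(0)
N, k = 6, 4
X = np.column_stack([np.ones(N * k), np.arange(N * k, dtype=float)])
y = rng.normal(100.0, 10.0, size=N)
z, beta = chow_lin_sketch(y, X, k, rho=0.7)
```

By construction, the quarterly estimates in z add up to the annual figures in y within each year, which is the temporal aggregation constraint the method is meant to respect.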
The [41] strategy of relating the first-order autoregressive coefficients of the high- and low-frequency error series, however, cannot be completely generalized to any pair of frequencies [89], and consequently several other stratagems were followed to solve the issue. In line with [41], [55] obtained, for the annual-quarterly case, a function relating the two autoregressive coefficients. The relation reached by [55], unfortunately, only has a unique solution for non-negative annual autoregressive coefficients. Despite this, [90,91] took advantage of such a relation to suggest two iterative procedures to handle the Chow-Lin method with AR(1) errors in the quarterly-annual case. [90] even provided a solution to apply the method when an initial negative estimate of the autoregressive coefficient is obtained. To handle the problem of the sign, however, [40] had already proposed estimating the autoregressive coefficient through a two-step algorithm in which, in the first step, element (1, 3) of the covariance matrix of the annual errors is used to determine the sign of the autoregressive coefficient. In addition to the above possibilities, strategies based on maximum likelihood (under the hypothesis of normality for the errors) have also been tried. Examples of this approach can be found in [46,92,93].
Despite the AR(1) temporal error structure being the most extensively analyzed, other disturbance structures have been proposed. Among the stationary ones, [94] considered MA(1), AR(2), AR(4), and a mix of AR(1) and AR(4) as reasonable possibilities to deal with the annual-quarterly case. These complexities, however, were shown to be unnecessary in [95], who showed through a Monte Carlo experiment that, even when the disturbances actually follow other stationary structures, assuming an AR(1) hypothesis for the disturbance term does not significantly influence the quality of the estimates. With regard to the extensions towards non-stationary structures, [66] and [96] can be cited. On the one hand, [66], based on the results of [97,98], recommended using [36] and showed that such an approach to the problem is equivalent to using [41] with a random walk hypothesis for the errors. On the other hand, [96] studied the problem of disaggregating a quarterly series into a monthly one and extended [41] to the case in which the residual series follows a Markov random walk. [96], however, did not solve the problem of estimating the parameter of the Markov process for small samples. Fortunately, [95] found a solution to this problem and extended [99] to the case of annual series and quarterly indicators.
Despite [55]'s abovementioned words about the superiority of optimal methods over adjustment procedures, all the previous methods can, in fact, be obtained as solutions of a quadratic-linear optimization problem [63], where the metric matrix that defines the loss function is the inverse of the high-frequency error covariance matrix. Theoretically, therefore, other structures for the disturbances could be easily handled. Thus, for example, in line with [37], the HF error covariance matrix could be estimated from the ARIMA structure of the LF covariance matrix. Despite this, low-order AR models are still systematically chosen in practice because (1) the covariance matrix of the HF disturbances cannot, in general, be uniquely identified from that of the LF ones and (2) the typical sample sizes occurring in economics usually provide poor estimates of the LF error matrix [64,49]. In fact, the Monte Carlo evidence presented in [100] showed that this approach would likely perform comparatively badly when the LF sample size is lower than 40 (a far from infrequent size in economics).
The estimates obtained following [41] are, however, only completely satisfactory when the temporal aggregation constraint is linear and there are no lagged dependent variables in the regression. Thus, to improve the accuracy of the estimates by taking into account the dynamic specifications usually encountered in applied econometric work, several authors [80,81,101,102] have proposed generalizing [41] (including [66] and [96]) through the use of linear dynamic models, which make it possible to solve temporal disaggregation problems with more robust results in a broad range of circumstances. In particular, [80], following the works initiated by [81,101,102], which were very difficult to implement in a computer program, proposed an extension of [41] that is particularly adequate when the series used are stationary or co-integrated [103]. They also address the issues arising in the estimation of the first low-frequency period and produce disaggregated estimates and standard errors in a straightforward way. A MATLAB library to perform it is provided in [104]. This library completed the MATLAB libraries, also detailed in [105], that the Spanish Statistical Official Institute (INE) offers free of charge [107] to run [27,36,41,54,66,96,106]. On the other hand, empirical applications of these procedures can be consulted in [49,65], where a panoramic review of [80]'s procedure is also offered.
The [41] approach and its abovementioned extensions are all univariate; thus, to handle problems with J (>1) series to be estimated, multivariate extensions are required. In these situations, apart from the low-frequency temporal constraints, some additional cross-section, transversal or contemporaneous aggregates among the HF target series are usually available. To deal with this issue, different procedures [41] have been proposed in the literature. [108] was the first to address this problem. [108] assumed that the contemporaneous HF aggregate of the J series is known and proposed applying an estimation procedure in two steps. In the first step, he suggested applying [41] separately to each one of the J series, imposing only the corresponding LF constraint and assuming white noise residuals. In the second step, he proposed running the [41] procedure again, imposing as a constraint the observed contemporaneous aggregated series under a white noise error vector, to simultaneously estimate the J series using as indicators the series estimated in the first step. This strategy, however, as [106] pointed out, does not guarantee the fulfillment of the temporal restrictions.
[102], attending to the limitation of [108], generalized the [41] estimator and obtained the BLUE of the J series, fulfilling simultaneously the temporal and the transversal restrictions. Similar to [41], [106] again found that the estimated series crucially depend on the structure assumed for the disturbances. Nevertheless, he only offered a practical solution under the hypothesis of temporally uncorrelated errors. That hypothesis is unfortunately inadequate because it can produce spurious steps in the estimated series. In order to solve this, [109] and [110] introduced a structure for the disturbances in which each one of the J error series follows either an AR(1) process or a random walk with shocks that are only contemporaneously correlated. [110], additionally, extended the estimator obtained in [106] to situations with more general contemporaneous aggregations and provided an algorithm to handle such a complex disturbance structure in empirical work. This algorithm is unnecessary under [54]'s simplification, which proposes assuming a multivariate random walk structure for the error vector. Finally, the multivariate extrapolation issue was introduced in [13], extending the proposal of [109].

The Kalman Filter Strategies
In the study of time series, one of the approaches is to consider the series as a realization of a stochastic process with a particular generating model (e.g., an ARIMA process), which depends on some parameters. In order to predict how the series will behave in the future, or to rebuild the series by estimating the missing observations, it is necessary to know the model parameters. The Kalman filter makes it possible to take advantage of the temporal sequence of the series to implement, through a set of mathematical equations, a predictor-corrector type estimator, which is optimal in the sense that it minimizes the estimated error covariance when some presumed conditions are met. In particular, it is an efficient recursive filter that estimates the state of a dynamic system from a series of incomplete and noisy measurements. This approach appears very promising for the temporal disaggregation problem due to its great versatility. Moreover, for economists, it presents the additional advantage of allowing both unadjusted and seasonally adjusted series to be estimated simultaneously. Among the different approaches to approximating the population parameters of the data generating process, ML stands out. The likelihood function of the stochastic process can be calculated in a relatively simple and very practical way by the Kalman filter. The density of the process, under a Gaussian distribution assumption for the series, can be easily derived from the forecast errors. Prediction errors can be computed in a straightforward way by representing the process in state space, and the Kalman filter can then be used. In general, the pioneering methods based on the state-space representation assumed an ARIMA process for the target series and computed the likelihood of the process through the Kalman filter, employing the fixed-point smoothing algorithm (see, e.g., [111] for details in both univariate and multivariate cases) to estimate the unavailable values.
Despite the state-space representation of a temporal process not being unique, the majority of the proposals to adapt the Kalman filter to handle missing observations can be reduced to the one proposed by [112]. [112] suggested building the likelihood function excluding the prediction errors associated with those time points where no observation exists, and proposed using the forecasts obtained in the previous instant to go on running the Kalman filter equations. Among others, this pattern was followed by [111-117]. In addition to the [112] approach, other approaches can be found. [118] developed a new filter and some smoothing algorithms which allow the non-observed values to be interpolated with simpler computational and analytical expressions. [119] used state-space models to adjust a monthly series obtained from a survey to an annual benchmark. Finally, [120] followed the strategy of estimating missing observations by treating them as outliers, while [121] introduced a prescribed multiplicative trend in the problem of disaggregating an annual flow series into a quarterly one using its state-space representation.
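The idea of skipping the prediction errors at unobserved time points while carrying the forecasts forward can be sketched as follows. This is only an illustrative scalar example, not the algorithm of [112]: the model (an AR(1) state observed with noise), the parameter names phi, q and r, and the convention of marking missing values with None are all assumptions made for the sketch.

```python
import math

def kalman_loglik_missing(y, phi, q, r):
    """Gaussian log-likelihood and filtered states for
        x_t = phi * x_{t-1} + w_t,  w_t ~ N(0, q)
        y_t = x_t + v_t,            v_t ~ N(0, r)
    where entries of y equal to None are treated as missing.
    """
    x, p = 0.0, q / (1.0 - phi ** 2)      # stationary prior (requires |phi| < 1)
    loglik, filtered = 0.0, []
    for obs in y:
        # Prediction step: always performed
        x_pred = phi * x
        p_pred = phi * phi * p + q
        if obs is None:
            # Missing value: no correction and no likelihood contribution;
            # the forecast is simply carried forward
            x, p = x_pred, p_pred
        else:
            # Correction step based on the one-step-ahead prediction error
            v = obs - x_pred
            f = p_pred + r
            loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
            gain = p_pred / f
            x = x_pred + gain * v
            p = (1.0 - gain) * p_pred
        filtered.append(x)
    return loglik, filtered

loglik, filtered = kalman_loglik_missing([1.0, None, None, 0.5], phi=0.8, q=1.0, r=0.5)
```

In this sketch the filtered value at a missing time point is just the one-step forecast from the previous instant, so the likelihood is built only from the time points that are actually observed.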
Jones [112], a pioneer in the estimation of missing observations from the state-space representation, treated the case of a stock variable which is assumed to follow a stationary ARMA process. Later on, [113], also dealing with stationary series, extended the proposal of [112] to the case of flow variables. In addition, they adapted the algorithm to the case in which the target series follows a regression model with stationary residuals and dealt with the problem of working with logarithms of the variable. Moreover, [113] also extended the procedure to the case of stock variables following non-stationary ARIMA processes, although in this case they required the target variable to be available in HF for a large enough sample subperiod. In the non-stationary case, however, when [113]'s hypothesis does not hold, building the likelihood of the process becomes difficult. Problems arise in converting the process into a stationary one and in defining the initial conditions. In order to solve this, [114] proposed considering a diffuse initial distribution in the pre-sample, while [115] suggested transforming the observations in order to define the likelihood of the process. [115]'s transformation made it possible to generalize the previous results (including those reached by [113]), although at the cost of destroying the sequence of the series, altering both the smoothing and the filtering algorithms. Fortunately, [117] overcame this difficulty, making it possible to use the classical tools to deal with non-stationary processes whatever the structure of the missing observations. However, although [115] and [117] extended the issue to the treatment of regression models with non-stationary residuals (allowing related variables to be included in this framework), they did not deal with the case of flow variables in an explicit way. It was [116] who handled such a problem and extended the solution to non-stationary flow series. [116], moreover, suggested using the Kalman filter for the recursive estimation of the non-observed values as a tool to overcome the problem of the estimates changing as the available sample increases. In this line, [122] used information contained in related series to estimate the monthly Swiss GDP from the quarterly series, while [123] estimated a monthly US GDP series from quarterly values after testing several state-space representations to identify, through a Monte Carlo experiment, which variant of the model gives the best estimates. They found that the simpler representations did almost as well as the more complex ones.
Most of the above proposals, however, consider the temporal structure (the ARIMA process) of the target series to be known. In practice, however, it is unknown and the orders of the process must be specified. Several strategies have been followed to solve this. Some attempts have tried to infer the process of a HF series from the observed process of the LF one [6,60,116], while many other studies have concentrated on analyzing the effect of aggregation on a HF process (e.g., among others, [32], [124-126], and, more recently, [127] and [3]) and on studying its effect on stock variables observed at fixed time steps (among others, [128], or [11]). Fortunately, the necessary and sufficient conditions under which the aggregate and/or disaggregate series can be expressed by the same class of model were derived by [129].
Both multivariate and dynamic extensions have also been tackled within this framework, although they are still incipient. On the one hand, the multivariate approach started by [111] was continued in [130], who suggested a multivariate seemingly unrelated time series equations model to estimate the HF series using the Kalman filter when several constraints exist. The framework they proposed is flexible enough to allow for almost any kind of temporal disaggregation problem for both raw and seasonally adjusted time series. On the other hand, [131] offered a dynamic extension providing, among other contributions, a systematic treatment of [96], which makes it possible to explain the difficulties commonly encountered in practice when estimating [96]'s model.

Frequency Domain Approaches
A great amount of energy has been devoted to dealing with the matter from the temporal perspective. Similarly, great efforts have also been made from the frequency domain, although they have been less successful and have borne less fruit. In this approach, the greatest efforts have been invested in estimating the spectral density function or spectrum of the series, the main tool for a temporal process in the frequency plane. The estimation of the spectrum of the series has been undertaken from both angles: the parametric and the non-parametric perspective.
Both Jones [132] and Parzen [133,134] were pioneers in the study of missing observations from the frequency domain. They analyzed the problem under a systematic scheme for the observed (and therefore also for the unobserved) values. [132], one of the pioneers in studying the problem of estimating the spectrum, treated the case of estimating the spectral function of a stationary stock series sampled systematically. This problem was also addressed by [134], who introduced the term amplitude modulation, the key element on which later spectral developments were based in their search for solutions.
The amplitude modulation is defined as a series of zeros and ones over the sample period. The value of the amplitude modulation is one in those periods where the series is observed, and zero in those where it is not observed.
Different schemes for the amplitude modulation have been considered in the literature. [135] studied the case in which the amplitude modulation followed a Bernoulli random scheme. This random scheme was extended to others by [136] and [137]. More recently, [138] obtained estimators of the spectral function for three types of modulation sequences: deterministic, random and correlated random. On the other hand, [139], [140] and [141] followed a different approach; they assumed an ARIMA process and estimated its parameters with the help of the spectral approximation to the likelihood function.
Although the great majority of patterns for the missing observations can apparently be treated from the frequency domain, not all of them have a solution. This fact is a consequence of the impossibility of completely estimating the autocovariances of the process in many practical situations. In this sense, [142] studied the situations in which it is possible to estimate all the autocovariances. In particular, the words of [139] must be recalled: "… (the estimators) are asymptotically efficient when compared to the Gaussian maximum likelihood estimate if the proportion of missing data is asymptotically negligible.". Hence, the problem of disaggregating an annual time series into quarterly figures is one of those that still do not have a satisfactory solution from this perspective. Likewise, the efforts made to employ spectral tools to estimate the missing values using the information given by a group of related variables have required so many restrictive hypotheses that their use has not been advisable until now. Nevertheless, from a related approach, [143] have made some advances, proposing a method to estimate (under some restrictive hypotheses and in a continuous way) a flow variable.
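For a systematically sampled stock series, the amplitude modulation just described can be illustrated with a short sketch. The function names and the choice of marking the first or the last sub-period as the observed one are hypothetical conventions adopted only for this example.

```python
def amplitude_modulation(n_periods, k, position="last"):
    """Return a 0/1 sequence of length n_periods that equals 1 at the
    high-frequency periods where a series sampled every k periods is observed.
    position selects the observed sub-period ("first" or "last")."""
    offset = k - 1 if position == "last" else 0
    return [1 if t % k == offset else 0 for t in range(n_periods)]

def modulated_series(x, a):
    """Element-wise product of the full series x and the modulation a:
    observed values are kept and unobserved ones are set to zero."""
    return [xt * at for xt, at in zip(x, a)]

a = amplitude_modulation(8, 4)          # [0, 0, 0, 1, 0, 0, 0, 1]
x = [3.0, 3.2, 3.1, 3.5, 3.6, 3.4, 3.8, 4.0]
observed = modulated_series(x, a)       # only the 4th and 8th values survive
```

A random scheme, such as the Bernoulli case mentioned in the literature, would replace the deterministic rule by independent draws of zeros and ones.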

Discussion
The long list in the reference section clearly shows that a vast quantity of methods, procedures and algorithms has been proposed in the literature to deal with the problem of transforming a low-frequency series (either annual or quarterly) into a high-frequency one (either quarterly or monthly). The first proposals, which built series using purely mathematical ad-hoc procedures, were progressively overcome, and strategies based on related indicators gradually gained researchers' preferences, with the [41] method, and all its extensions, standing out. Likewise, interesting solutions have also been suggested that represent the series in state space and use the Kalman filter to handle the underlying dynamic system. In fact, in my opinion, the great flexibility of this strategy makes it a proper tool to deal with the future challenges that will appear in the subject and to handle situations of non-systematic missing observations (not treated in this paper). The advances made from the frequency domain, however, do not seem encouraging. None of the approaches, nevertheless, should be hastily discarded because, according to [144], pooling estimates obtained from different procedures can improve the quality of the disaggregated series.
An analysis of the historical evolution of the topic, nevertheless, points towards dynamic regression models and techniques using formulations in terms of unobserved component models/structural time series and the Kalman filter as the research lines that will hold a pre-eminent position in the future. On the one hand, the extension of the subject to deal with multivariate dynamic models is still waiting to be tackled; and, on the other hand, the state-space methodology offers the generality that is required to address a variety of inferential issues that have not been dealt with previously. In this sense, both approaches could be combined in order to solve one of the main open problems currently posed in the area: to jointly estimate some high-frequency series of rates when the low-frequency series of rates, some transversal constraints and several related variables are available. For example, it could be used to solve the problem of distributing among the regions of a country the national quarterly rate of growth, when the annual series of regional growth rates are known and several high-frequency regional indicators are available and, moreover, both the regional and sector structure of weights change quarterly and/or annually. In this line, Prioreti's [145] recent paper is a first step.
In addition, a new emerging approach that takes into account the more recent developments in the applied statistical literature (including data mining, dynamic common component analyses, time series models environment and Bayesian modeling) and that takes advantage of the continuous advances in computer hardware and software (making use of the large datasets available) will likely turn up in the future as a main line in the subject. Indeed, as [146] point out: "Existing methods … are either univariate or based on a very limited number of series, due to data and computing constraints … until the recent past. Nowadays large datasets are readily available, and models with hundreds of parameters are easily estimated". In this line, [147] dealt with a dynamic factor model using the Kalman filter to construct an index of coincident US economic indicators; [146] modeled a large dataset with a factor model and developed an interpolation procedure that exploits the estimated factors as a summary of all the available information and clearly improves on univariate approaches; and [148] proposed a bivariate basic structural model that makes it possible to carry out the seasonal and calendar adjustment and the temporal disaggregation simultaneously.
ferent main problems can emerge: the distribution problem and the interpolation problem. On the one hand, the distribution problem appears when the observed values of a low-frequency (LF) flow series of length N must be distributed among kN values (where k is the number of sub-periods into which each low-frequency period is divided: for instance, k = 3 if a monthly series must be estimated from a quarterly observed series, k = 4 if quarterly estimates for yearly data are desired and k = 12 if monthly data are required for an annually observed series), such that the temporal sum of the estimated high-frequency (HF) series fits the values of the LF series. That is, if $y = [y_1, \ldots, y_N]'$ represents the $(N \times 1)$ vector of observed LF values and $z = [z_1, \ldots, z_T]'$ the $(T \times 1)$ vector of missing HF values, with $T = kN$, the vectors y and z can be related by $y = Bz$, with $B = I_N \otimes u$, where $I_N$ is the identity matrix of order N, $\otimes$ stands for Kronecker's product and u is a $(1 \times k)$ vector of ones. On the other hand, the interpolation problem consists in generating a HF series whose values coincide with those of the LF series at the time points where the latter is observed. That is, u is equal to a $(1 \times k)$ vector of zeroes except for the first or the last component, which is one, depending on the high-frequency time point at which the LF series is observed.
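The relation between the LF and HF vectors through the Kronecker product can be made concrete with a small numerical sketch. The function name aggregation_matrix and its arguments are hypothetical, and NumPy is assumed only for convenience.

```python
import numpy as np

def aggregation_matrix(N, k, kind="distribution", observed="last"):
    """Build B = I_N (Kronecker product) u for N low-frequency periods,
    each divided into k high-frequency sub-periods.

    kind="distribution":  u is a (1 x k) vector of ones (flows are summed).
    kind="interpolation": u is zero except for a one in the first or last
                          position, selecting the sub-period that is observed.
    """
    if kind == "distribution":
        u = np.ones((1, k))
    else:
        u = np.zeros((1, k))
        u[0, k - 1 if observed == "last" else 0] = 1.0
    return np.kron(np.eye(N), u)

z = np.arange(1.0, 9.0)                                    # 8 quarterly values: 1, ..., 8
y_flow = aggregation_matrix(2, 4) @ z                      # annual sums: [10., 26.]
y_stock = aggregation_matrix(2, 4, "interpolation") @ z    # end-of-year values: [4., 8.]
```

In the distribution case each annual figure is the sum of its four quarters, whereas in the interpolation case it simply picks out the quarter in which the stock is observed.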