Nonparametric Demand Forecasting with Right Censored Observations

In a newsvendor inventory system, demand observations often get right censored when there are lost sales and no backordering. Demands for newsvendor-type products are often forecasted from censored observations. The Kap-lan-Meier product limit estimator is the well-known nonparametric method to deal with censored data, but it is undefined beyond the largest observation if it is censored. To address this shortfall, some completion methods are suggested in the literature. In this paper, we propose two hypotheses to investigate estimation bias of the product limit estimator, and provide three modified completion methods based on the proposed hypotheses. The proposed hypotheses are verified and the proposed completion methods are compared with current nonparametric completion methods by simulation studies. Simulation results show that biases of the proposed completion methods are significantly smaller than that of those in the literature.


Introduction
In a newsvendor inventory system, a decision maker places an order before the selling season with stochastic demand.If too much is ordered, stock is left over at the end of the period, whereas if too little is ordered, sales are lost.The optimal order quantity is often set based on the well-known critical ratio [1], therefore demand observations often get right censored when there are lost sales and no backordering.Because lost sales cannot be observed, the available sales data actually reflect the stock available for sale, rather than the true demand.Demands for newsvendor-type products are often forecasted from censored observations.The problem of demand forecasting in the presence of stockouts is a well-known problem of handling censored observations, which was recognized by [2].Approaches of handling censored observations can be divided into two classes: (1) parametric method, which often assumes that the observations come from specific theoretical distribution and then estimate parameters of the assumed distribution by applying maximum likelihood estimation or some updating procedures [3]; This method is often used in density forecasting [4]; (2) nonparametric method, which is often established based on the product limit estimator [5], and attempts to address the problem of the "undefined region" beyond the largest observation when it is censored [6].
Parametric methods for demand forecasting from censored observations have been investigated in [7−14].These works have been briefly reviewed in [15], and it has been indicated in [15] that it is difficult to determine the shape or family of demand distribution in advance when demand observations are censored.
The product limit (PL) estimator is a nonparametric maximum likelihood estimator of a distribution function based on censored data.If the largest observation is censored, the PL estimator is developed to estimate the left-hand side of demand distribution, but it is undefined for the right-hand side of distribution function.Under the assumption that there are more information besides the censored observations, Lau and Lau [3] and Zhang et al. [15] have investigated the problems of estimating the right-hand side of demand distributions.
Without additional information besides the censored observations, truncation techniques or completion methods are usually employed to define the whole distribution function.Truncation techniques are based on the data-driving rules, which include two common truncation rules: (1) truncating at the largest observation if it is censored, and (2) truncating at (n−l)th order statistics [6].These truncation rules may intuitively appear to have good properties by avoiding problems in tail, but they will incur large bias because the location of the ignored region is a random event.Completion methods aim to redefine the PL estimator beyond the largest observation if it is censored.We will briefly review nonparametric completion methods in the next section.
In this paper, we propose two hypotheses to investigate estimation bias of the PL estimator, and provide three modified completion methods based on the proposed hypotheses.The proposed hypotheses are verified and the proposed completion methods are compared with current nonparametric completion methods in the literature by simulation studies.
The remainder of this paper is structured as follows.We briefly introduce the PL estimator and review current nonparametric completion methods suggested in the literature.Then we propose two hypotheses to investigate estimation bias of the PL estimator, and provide three modified completion methods.We further verify the two hypotheses and compare the proposed completion methods with current nonparametric completion methods by simulation studies.The paper ends with some concluding remarks.

Nonparametric Completion Methods
In this section, we first introduce the PL estimator in the context of an inventory system, and then we review current nonparametric completion methods for the PL estimator suggested in the literature.

Let
, be iid (independent identically-distributed) demand from distribution F, and inventory level , be iid from distribution Kaplan and Meier [5] introduced the PL estimator for the survival function , which is estimated as follows: where denotes the ith ordered observation among all i Z , and : From the above definition, it is observed that the PL estimator is undefined beyond the largest observation, i.e., for and :

Review of Current Completion Methods
To overcome the shortfall of the PL estimator that it is undefined beyond the largest observation, some completion methods are suggested in the literature.Efron [16] introduced the notion of self-consistency, i.e.,   : n n t Z  Gill [17] defined the survival function by : Chen and Phadia [18] modified it as : Clearly, the extreme values of scalar c yield Efron's and Gill's versions, respectively.Besides the above three constant completion methods, there are two curve completion methods suggested in the literature.Brown et al [19] suggested an exponential completion method as follows: : The parameter B  is set by solving   where , denote the m ordered uncensored demand observations, the remaining observations are censored ones.Moeschberger and Klein [20] attempted to complete   Ŝ t by a two-parameter Weibull function as follows: : The two parameters M  and k in Equation ( 7) are determined by solving When a completion method is used, the bias of   , is entirely determined by the completion method [21].For a completion method, it is clear that the "undefined region" has the most contribution to the bias of the PL estimator.One might think that this region could be in some sense ignored, as it is sug-gested in truncation techniques.Because the location of this region is a random event, simply ignoring the "undefined region" will result in a large bias [6].
The bias of is negative and asymptotically zero as , whereas the bias of is positive and increasing as .The bias using any other completion method will be bounded by the biases of and [6].The bias of changes from negative to positive and it is increasing as .If an estimator is asymptotically zero as , we say that it has completeness, which is necessary for estimating moments of distribution. , , and have completeness since they are asymptotically zero as t , whereas and do not have the completeness.The curve completion methods, , and

New Completion Methods
In this section, we first propose two hypotheses to investigate estimation bias of the PL estimator at two special points, and provide three modified completion methods based on the proposed hypotheses.Then we simplify show the nonparametric completion methods by an example.

Estimation Bias of the PL Estimator
If demand observations i X , , are observable, then its empirical survival function S t can be expressed as follows: According to Equation (1), the estimation value of the PL estimator at point To compare  : From Equations (9)(10)(11)

Modified Completion Methods
In the spirit of the exponential curve completion method suggested by [19], we provide three modified completion methods based on the proposed hypotheses.
For this example, the various aforementioned completion methods are plotted in Figure 1.From Figure 1 it can be observed that, the five curve completion methods satisfy the downward sloping monotonicity of survival function, and that the five curve completion methods and Efron's self-consistent completion have the completeness.
Hypothesis 1 implies that the PL estimator will statistically overestimate at point Chen and Phadia [18] proposed an optimal constant completion by setting .Similarly, we set .This parameter setting is presented because scalar d should not be larger than one in solving

Simulation Studies
In this section, we verify the two proposed hypotheses and compare the aforementioned completion methods by simulation studies.

Simulation Experiments
In our simulation studies, we design two experiments under an inventory system with some specific distributions as follows: , , the parameter of the exponential curve can also be set as Experiment 1: Following [22−23], we take demand distribution F to be a Weibull distribution,  for with a=1 and 2. To reflect a variety of censoring distribution patterns, we also follow [23] to take Weibull distribution,

An Illustrative Example
In a case study of a newsvendor inventory system, Lau and Lau [3] presented 20 ordered daily sales observations:    [23] for further details.This experiment is applied for investigating the case when hazard rate is decreasing, constant or increasing.
Experiment 2: Analogous to [8], we express the relation between demand X and sales Z by writing sales as a random proportion of demand, i.e., i i i Z W X  , where is a random variable taking values on the interval [0.5,1].For periods with no stockout, , and therefore sales and demand are equal; for periods in which a stockout has occurred, sales will be less than demand with .We assume that stockouts occur in each period (independently) with probability ESP and when a stockout occurs, sales i W In our case studies, we take F to be a lognormal distribution with location parameter 4, and shape parameter 1   and 2, and we also set ESP=1/3, 1/2, and 2/3.This experiment is designed for investigating the case when hazard rate changes from increasing to decreasing.
In the above two experiments, we have four different cases in terms of hazard rate: decreasing, constant, increasing, and changing from increasing to decreasing.In comparison with Experiment 1, Experiment 2 makes an additional assumption on the relation between demand and sales, i.e., sales is a random (uniformly distributed) proportion of demand.
Considering the combination of the parameters in the above two experiments, under each of four cases of hazard rate, we have 6 different combinations of the parameters.Under each parameters' combination, we set the number of observations n=20, and randomly generate 1000 simulation runs.To ensure the applicability of the completion method suggested by Moeschberger and Klein [20], the number of uncensored observations in each simulation run is restricted to be larger than 3.For the convenience of comparison, the largest observation in each simulation run is restricted to be a censored one.

Hypotheses Verification
Under each of four cases of hazard rate, we calculate under 1000 simulation runs for verifying Hy- are reported in Tables 1 and 2. In these tables, 95% C.I. is short for 95% confidence level.
Results shown in Tables 1 and 2 verify Hypothesis 1, i.e., the PL estimator statistically overestimates at point : n n Z  .Table 1 also illustrates that increases with the increase of b/a, this implies that the estimation bias of the case with increasing hazard rate is larger than that of the case with decreasing hazard case.   3 and 4, respectively.The last two rows of these two tables present results of paired 2-tailed t-tests on Tables 3 and 4 Z  increase with the increase of the expected stockout probability.

Comparison with Current Completion Methods
In this subsection, we assess performance of the proposed completion methods in terms of integral absolute bias, Results of paired 2-tailed t-tests on IAB among the compared eight completion methods under each of four cases of hazard rate are shown in Tables 5-8, respectively.These tables report t-statistics on IAB between the row method and column method.One negative t-statistic in these tables means that the row method is better than the corresponding column method in terms of IAB, whereas positive t-statistic implies that the column method is better than the corresponding row method.t-statistic in parentheses represents that the comparison is at the 0.05 significance level; t-statistic in square brackets implies that there is no significant difference between the row and column methods; t-statistic without parentheses or square brackets expresses that the comparison is at the 0.01 significance level.
According to the following results shown in Tables 5-8, we come to the following conclusions in terms of IAB: (1) Ave is the leading completion method; (2) Left performs better than the optimal constant completion method CP; (3) CP is always better than the current curve completion methods (i.e., BHK and MK); (4) Efron and Gill are the two worst completion methods.

Conclusions
Demands for newsvendor-type products are often forecasted from censored observations.The Kaplan-Meier product limit estimator is the well-known nonparametric method to deal with censored data, but it is undefined beyond the largest observation.In this paper, we propose two hypotheses to investigate estimation bias of the PL estimator, and provide three modified completion methods based on the proposed hypotheses.Simulation results show that biases of the proposed completion methods are significantly smaller than that of the completion methods in the literature.According to these results, we know that the proposed completion methods can improve demand forecasting with right censored observations.We also show that simulation is a useful way to verify probability result which is difficult to be proved by using classical statistical theory and methods.
The developed methods are easy to implement in software packages.Many forecasting techniques have been integrated into enterprise software packages such as management information systems, enterprise resources planning systems, decision support systems.The proposed forecasting techniques in this paper are simple and easily implemented in enterprise software packages.

Z
 .Therefore bias can be reduced if parameter of the exponential curve is set by solving

Hypothesis 2
indicates that the relative estimation bias of the PL estimator at point   m Z is statistically smaller than that at point : n n Z  .Therefore bias can be reduced if parameter of the exponential curve is set by solving instead of , constant for 1 b a  , and increasing 1 b a  .The scale factor u in is adjusted in such a way so that the expected stockout probability (ESP)  G y

Figure 1 .
Figure 1.Comparison of the eight completion methods for the PL estimator under each of four cases of hazard rate.Statistical results of  

Table 2
, it is observed that the relative estimation bias of the PL estimator of point Z

Table 4 .
Statistical results of  