On the Convergence of Observed Partial Likelihood under Incomplete Data with Two Class Possibilities

In this paper, we discuss the theoretical validity of the observed partial likelihood (OPL) constructed in a Cox-type model under incomplete data with two class possibilities, such as missing binary covariates, a cure-mixture model or doubly censored data. The main result establishes the asymptotic convergence of the OPL. Because some standard tools of survival analysis are difficult to apply here, we develop tools for weak convergence based on partial-sum processes. The convergence result indicates that a suitable order of the number of Monte Carlo trials is less than the square of the sample size. In addition, using numerical examples, we investigate how the asymptotic properties discussed here behave in finite samples.


Introduction
Although the Cox model [1] is a standard tool for the analysis of time-to-event data, in practice analysts are often confronted with problems in handling incomplete data beyond the right-censored form, such as interval-censored data [2], missing covariates [3-5] or (statistical) structural modelling. Inference for the Cox model in such cases of incomplete data can usually be performed based on the semiparametric profile likelihood (SPFL) [6] as a generalization of Cox's partial likelihood [7]. On the other hand, as a substitute for the SPFL method, one can analyse the same data using the imputation method, which yields a sum of partial likelihoods. By describing the sum of all possible partial likelihoods more exactly, we can formulate the marginal of partial likelihoods [8], that is, the observed partial likelihood (OPL). In this area, the theory for the SPFL has been studied by many authors (e.g., [6,9]). However, to the best of our knowledge, there has not been much development in the mathematical theory for the OPL.
In this paper, we discuss the theoretical validity of the OPL which appears in a Cox-type model under incomplete data beyond the right-censored form. One advantage of the OPL is that the baseline hazard function, a nuisance parameter in a Cox-type model, is eliminated completely from the inferential likelihood. This yields a more stable computational system for optimization than that of the SPFL. For example, in a Cox-cure model, a computational process based on the EM algorithm to obtain the SPFL easily fails to converge if a suitable starting value is not provided (e.g., see [10]). The main disadvantages of the OPL are that the exact computation requires a great length of time and that it is not clear how much the amount of computation can be reduced by the Monte Carlo (MC) method. However, even if the feasible number of MC trials is smaller than desirable to approximate the OPL, and hence the MC approximation is quite rough, it may be sufficient as a starting value in the computational process of the SPFL.
Generally, it is difficult to investigate computationally to what extent the MC approximations of the OPL are valid, since the exact computation requires a huge number of summands as the sample size and the amount of incomplete information in the data increase. For this reason, it is worth studying the OPL theoretically. However, it is not easy to complete such a study in one go, because standard tools to study asymptotic properties of Cox's partial likelihood or the SPFL cannot be applied directly to the OPL. Therefore in this paper, for the sake of simplicity, we focus on the OPL constructed in incomplete data composed of unobserved two-class labels. Typical cases of this type occur in a Cox-type model with incomplete data, such as missing binary covariates, a cure-mixture model or doubly censored data. As a main result, we establish the asymptotic convergence of the OPL and derive a limit form of the OPL. This result is a foundation or precondition for applying an infinite-dimensional Laplace approximation for the integral over the baseline hazard. Such a Laplace approximation method will yield another limit form of the OPL [11], which is useful in discussing the consistency and asymptotic normality of the estimators. However, that method is not convenient for showing the convergence of the OPL. For these reasons, it is also valuable to discuss the convergence of the OPL using the arguments employed in this paper.
A matter of interest in practice concerns MC approximations of the OPL. Another significant point is that the result for the convergence of the exact OPL can be easily tailored to the context of the MC approximations. Based on such an argument, we show that a suitable order of the number of MC trials is less than $O(n^2)$. Further, in Section 4 we investigate how the asymptotic properties discussed here behave in a finite sample.
In Section 2 we formulate the OPL in incomplete data with two class possibilities, providing several examples of interest; in Section 3 we develop the tools to obtain the main result and show the convergence of the OPL; and in Section 4 we discuss the performance of MC approximations.

Notations and Motivated Examples
( ) In the case that i q * expresses the difference between models, assume that the distribution of i T * follows the proportional hazards model formulated as from the population of the class  , and β is the regression As usual, the information on ( ) T ,∆ can be re-expressed using the counting processes ( ) ( ) and at-risk processes ( ) ( ).
In this paper, we consider incomplete data where some of the $q_i^*$'s are treated as missing. Let Each of these is used to construct the likelihoods. Further, if the event $\{q_i^* = \ell\}$ can be expressed by a probability, we use the following notation and assumption where $X_i$ is the covariate vector ( ) q * and α is the regression coefficient vector For simplicity, we will write ( ) ( ) be the collection of true $q_i^*$'s. In many cases of incomplete data with two class possibilities, the observed full likelihood (OFL) can be generally written as with the elements such that ( ) where the space $\{0,1\}^n$ denotes the collection of all the vectors composed of 0 or 1 with length n, ( ) is the survival function of the $i$-th individual belonging to the class $\ell$, ( ) ( ) is the cumulative baseline hazard function, and s is given by either of − in advance. In the following three examples, we show how the form of the OFL is related to the representative cases. Hereafter, we will often omit θ when it is clear that a function depends on θ, e.g. ( ) and so on.
Example: Missing Binary Covariates. Let us assume that ( ) ( ) ( ) ( ) For example, the first covariate is binary and may be missing, ( ) Then, we can write In this case, the OFL is ) ( ) Using the binomial expansion, this can be rewritten as

Example: Cox Cure-Mixture Model. The Cox cure-mixture model [10,12-15] assumes a proportional hazards model for uncured individuals and zero hazard for cured ones. That is, we assume ( ) ( ) We observe that ( ) otherwise. The OFL is usually note here that ( ) ( ) t Then, (2.3) can be rewritten in the same form as (2.2).
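The rewriting used in these examples, from a product of two-term mixtures to a sum over all label vectors in $\{0,1\}^n$, can be checked numerically. The following is a minimal sketch and is not taken from the paper: the function names `product_form` and `sum_over_labels` and the numbers in `a` and `b` are hypothetical stand-ins for the class-1 and class-0 contributions of each individual.

```python
import itertools
import math


def product_form(a, b):
    """Product over individuals of (class-1 term + class-0 term)."""
    return math.prod(ai + bi for ai, bi in zip(a, b))


def sum_over_labels(a, b):
    """Same quantity written as a sum over every label vector q in {0,1}^n.

    For each q, individual i contributes a[i] if q[i] == 1 and b[i] otherwise.
    The sum has 2**n terms, which is why exact evaluation becomes infeasible
    quickly as n grows.
    """
    n = len(a)
    total = 0.0
    for q in itertools.product((0, 1), repeat=n):
        total += math.prod(a[i] if q[i] == 1 else b[i] for i in range(n))
    return total


# Hypothetical per-individual contributions (illustrative values only).
a = [0.2, 0.5, 0.1, 0.7]
b = [0.3, 0.4, 0.6, 0.05]

lhs = product_form(a, b)
rhs = sum_over_labels(a, b)
print(lhs, rhs)
```

The two printed numbers agree up to floating-point rounding, while the second form needs $2^n$ summands.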
Example: Doubly Censored Data. In doubly censored data [16,17], left-censored data may be included. Let $q_i^*$ indicate whether the $i$-th observation is left-censored or not; the OFL is then In the phenomenal meaning, the common model is assumed regardless of the type of observations, but we do not define $q_i^*$ as . such that $q_i^*$ designates the type of model rather than an observation. Under this rule, because we have ( ) ( ) 0 0 1 i S t;Λ = for all $t$, (2.4) can be expressed as where note that $q_i^*$ is defined as missing data (i.e. observation is left-censored, and as complete data of

Observed Partial Likelihood
Let $R_n[\cdot]$ be the integral operator proposed by [18] to derive the partial likelihood in the Cox model without time-dependent covariates. Without loss of generality, let us suppose that there are no ties. By operating we have the OPL , ∑ q q q s where To discuss an asymptotic form of ( ) , p n θ we will prepare some convenient expressions. First, to pack the expression of ( ) is an empirical version of the theoretical expectation [ ] Further, as an important definition, let ν ∞ Then, using these notations, ( ) ( ) where τ is the greatest follow-up time and The quantity of 1 2 is the total number that the same In addition, we define $\beta^*$ and $\Lambda_0^*$ are the true $\beta$ and $\Lambda_0$. In the case of ( ) such as the Cox cure-mixture model or doubly censored data, we have .

Thus, it may potentially be difficult to apply some of the standard tools in the survival analysis to the asymptotic discussion.For these reasons, our strategy to obtain a limit of ( ) ( ) We will then derive the result of a weak convergence on

Convergence of the Observed Partial Likelihood
We will now discuss how the mean of the log OPL converges to a deterministic function and provide Theorem 1 of the main result.The following conditions are assumed for these discussions.
Conditions A. Let Θ be a compact set of θ which includes ( ) where $\theta^*$ and $\alpha^*$ are the true $\theta$ and $\alpha$. The true baseline function ( ) where as a matter of course. However, in the case that there are no ( ) ( ) θ ∈Θ Theorem 1 is proved in Section 3.3. We prepare useful tools for such a proof in Sections 3.1 and 3.2 below. In Section 3.1, we discuss a relation needed to show that two OPLs converge to the same limit, determining a plan (Lemmas 1 and 2) to obtain Theorem 1. In Section 3.2, following the plan, we provide a tool (Lemma 3) to give a weak convergence of all possible partial-sum processes.

Relations between Two Observed Partial Likelihoods
Note that the OPL is constructed by an integral on $\{0,1\}^n$ with the measure $\nu_n$. Thus, for two OPLs to have the same limit, the difference between their integrands should be expected to converge weakly to zero on { } for example, by analogy with the dominated convergence theorem. Let for simplicity. Then, we have the following lemma about some where $\xrightarrow{\mathrm{as}}$ denotes almost sure convergence. (Proof of Lemma 1). Using Taylor expansions of ( ) { } ( ) where, with some ( ) ( ) ( ) ( ) without loss of generality. Then, { } Therefore, we have


we can investigate whether they converge to the same limit. The important problem is how to show the condition (3.1). For this purpose, we make use of meaning that a convergence in -probability We have the following lemma to establish the condition (3.1).

Lemma 2. Suppose that
where $\xrightarrow{p}$ denotes convergence in probability. Then, (3.1) is established.
Remark 2. For simplicity, letting as the area of ( ) . We can immediately show that (3.2) provides a version of convergence in probability of (3.1), i.e.
( ) because it is always satisfied that q q q q q q Thus, we show that the operators of lim and Pr are mutually exchangeable in (3.3) in a proof of Lemma 2.  which can be eventually written as The dominated convergence theorem provides This shows ( ) Pr lim E lim d Therefore, condition (3.2) gives (3.1).

All Possible Partial-Sum Processes
Note that, by the portmanteau theorem, (3.2) is equivalent to, as n → ∞, where $\xrightarrow{D}$ denotes convergence in distribution. In this section, we develop a tool to show such a weak convergence on An important key to obtaining (3.2) or a limit of ( ) ( ) , and As representations of more essential terms to consider in the convergence on and where examples of ( ) , can be regarded as ( ) is the empirical version of the conditional expectation on a unique $q^*$ but is calculated letting q be fixed. Let which is the conditional expectation of ( ) i Y t θ ; given $q_i^*$. Although $q_i^*$ is treated as unknown in many incomplete problems, this is not the case for terms included in the observed partial likelihood. Hence, for a fixed ), and ( ) ( ) ), )

Example: Missing Binary Covariates. For simplicity, we assume that while the expectations of terms which may form all possible partial-sums in ( ) ( ) , ; q and ( ) ( ) In these calculations, note that the Bayes rule is used, such as

Example: Cox Cure-Mixture Model. In this model, $C_i = \Delta_i$ is usually assumed. The expectations of terms which may form all possible partial-sums are In our application, note that the centred terms have zero mean and are mutually independent but are not sampled from an identical distribution.
We will therefore discuss the partial-sum processes about ( ) In advance, let by Chebyshev's inequality, we immediately have However, a result of interest here is whether Pr and sup q can be exchanged, that is, about Incidentally, we cannot obtain the almost sure convergence in this problem, since (iii) The following examples show that the conditions needed in Lemma 3 are satisfied for ( ) , , , Conditions (i) and (iii) are clearly satisfied.
Conditions (i) and (ii) are shown by Conditions A1 and A4. Arbitrary ( ) by Condition A3, where $\|\cdot\|$ means the Euclidean norm for vectors. Hence, Condition (iii) is satisfied.
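The Chebyshev step mentioned in this section can be written out as a short worked bound. This is a sketch under assumptions that are not taken from the source: the $\xi_i$ are independent with zero mean and $\operatorname{Var}(\xi_i) \le \sigma^2$ for a hypothetical constant $\sigma^2$, and the partial sum selected by a fixed $q \in \{0,1\}^n$ is written as $n^{-1}\sum_i q_i \xi_i$. Since $q_i^2 = q_i$, for any $\varepsilon > 0$,

$$
\Pr\left( \left| \frac{1}{n} \sum_{i=1}^{n} q_i \xi_i \right| \ge \varepsilon \right)
\le \frac{1}{n^2 \varepsilon^2} \sum_{i=1}^{n} q_i \operatorname{Var}(\xi_i)
\le \frac{\sigma^2}{n \varepsilon^2} \longrightarrow 0 \quad (n \to \infty).
$$

The bound holds for each fixed $q$ separately. Summing it over all $2^n$ label vectors gives $2^n \sigma^2 / (n \varepsilon^2)$, which diverges, so a plain union bound does not control the supremum over $q$. This is consistent with the remark above that the question of interest is whether $\Pr$ and $\sup_q$ can be exchanged.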

Proof of Theorem 1
Consider , and in which the random quantities are reduced less than that of ( ) ( ),

∫ q s q s q q s q q s q s q s q s q by Conditions A1 and A4, similar to the standard Cox model (see [19]).Thus, ; − ; → q q s q  using Lemma 3 in Example 1 and applying the strong law of large numbers (SLLN) to ( ) ( ) by Conditions A1 and A5.For the latter application, note that ( ) , ; − , ; → q q s q  using Lemma 3 in Example 2 and the continuous mapping theorem about log-function.For the latter application, note that ( ) ( ) Applying the above three results to Lemmas 1 and 2, therefore, we obtain as as as sup ( ) 0, sup 0 and sup 0, Although (3.4) shows that the limit of ( ) ( ) p n θ  still depends on n and i q * 's.We will therefore investigate the limit form of ( ) ( ) † p n θ  further.In discussing a convergence about the form ( ) ( ) s of the partial sums, note that they can be written as Similarly to Lemma 3 (proof of s2), we show that ( ) In particular, because of (

. We have the following lemma.( ) ( )

A proof of Lemma 4 is provided briefly in Appendix A.2 since it is similar to Lemma 3. Now, applying Lemma , ; , ; ; s q s q s q we obtain their limits as ; − ; q q q    converges in probability to zero as n → ∞ .
Hence, using Lemmas 1 and 2, we can show ∫ ∫ q q q q q q , so that, via the general binomial theorem, we can show that ∫ ∫ q q q q q q Also, ( ) and similar to the derivation of the $L^\infty$-norm on a Banach space, which results in the essential supremum, we conclude In addition, (3.6) is derived in the case of $s = 1$. Results (3.5) and (3.6) complete the proof of Theorem 1. A limit function of ( ) ( ) is concretely provided by (3.6), which is summarized as follows.
Corollary 1. If Theorem 1 holds, then a limit expression to which ( ) ( ) using the delta method. Now consider the other aspect of (4.2) under $p = 1$. Applying Theorem 1 and Corollary 1 to such a problem, we obtain the following results where ( ) 2) under $p = 0$. That is, using the MC method, a computational load of $O(2^n)$ needed in the exact computation can be reduced to one of at most $O(n^2)$.

Numerical Examples
We investigate two questions in finite samples using the Cox cure-mixture model. One is how a relation such as (3.6), obtained as n → ∞, behaves in finite samples. The other is to observe numerically the practical size of the error in MC approximations, Under these settings, the simulated means of the cure and censoring rates are about 48% and 58%, respectively. We generate 100 pairs of simulated data of size n. We perform m MC approximations for each simulated data set. Let

Concluding Remarks
A main result of this paper was to show the almost sure convergence of the OPL constructed in incomplete data with two class possibilities. To obtain this result, we discussed the principle of formulating this type of structure of the OPL, and then developed tools based on partial-sum processes. The limit function of the OPL finally obtained (Corollary 1) is the essential supremum of the partial likelihoods based on all the forms of complete data included in the incomplete data, which is similar to the $L^\infty$-norm on a Banach space. In Section 4.2, we showed numerically how an essential supremum approximates the OPL in real data for the Cox cure-mixture model.
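The analogy with the $L^\infty$-norm can be made concrete with the standard fact that $L^n$-norms on a probability space converge to the essential supremum. The statement below is a sketch of that textbook fact only, under the assumption of a fixed probability measure $\nu$ and a bounded measurable $g \ge 0$ (both hypothetical symbols here); in the paper the measure $\nu_n$ and the integrand change with $n$, so this does not reproduce (3.6).

$$
\lim_{n \to \infty} \left( \int g^{\,n} \, d\nu \right)^{1/n} = \operatorname*{ess\,sup}_{\nu} g ,
\qquad \text{equivalently} \qquad
\lim_{n \to \infty} \frac{1}{n} \log \int e^{\,n f} \, d\nu = \operatorname*{ess\,sup}_{\nu} f \quad (f = \log g).
$$

The upper bound follows from $g \le \operatorname{ess\,sup} g$ almost everywhere. For the lower bound, take any $c < \operatorname{ess\,sup} g$; then $\nu(g > c) > 0$ and

$$
\left( \int g^{\,n} \, d\nu \right)^{1/n} \ge c \, \nu(g > c)^{1/n} \longrightarrow c ,
$$

so letting $c$ increase to the essential supremum gives the limit.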
Unfortunately, it will be difficult to show consistency and asymptotic normality of the maximum OPL estimator (MOPLE) using the limit function of the OPL provided in Corollary 1. However, if consistency holds (as is largely expected), the global essential maximum will be attained around the true complete data under the true regression parameter. On the other hand, for the purpose of showing the consistency of the MOPLE, there will be other convenient limit expressions, although they are not discussed in this paper. A future paper on this topic will be based on an infinite-dimensional Laplace approximation for the integral over the baseline hazard function [11]. However, applying such a Laplace approximation to the OPL requires, as a precondition, that the OPL converges to a deterministic function. Hence, to obtain this precondition, and because it is generally difficult to show the convergence result directly using the Laplace approximation, it is meaningful to discuss the asymptotic convergence of the OPL using the argument employed in this paper.
The results on the convergence of the exact OPL can easily be adapted to the context of MC approximations. For example, at the end of Section 4.1 we show that, by applying Theorem 1 and Corollary 1, the size of the MC error is less than In future study, it is important to derive the other expression of the limit function based on an infinite-dimensional Laplace approximation for the integral over the baseline hazard and then to discuss the consistency and asymptotic normality of the MOPLE, since the asymptotic convergence of the OPL is given in this paper. Further, it is an interesting issue how the discussion of the OPL for the binary class considered here could be extended to continuous class possibilities, such as the Cox frailty model.
∑ q q q q q Thus, (A.1) is the weak convergence result under ( ) τ ,q fixed at one point.
Once the result of pointwise convergence at each point is obtained, by the $\varepsilon$-$\delta$ method we can show how the -variate and max max , it is then satisfied that ( ) .

To consider the number of partitions of D q with -band, ε let , 1 2 1 l g l ε = , , ,  be representative elements such that

(
be the observed survival time and right-censoring indicator of the indicator function. Suppose that the individuals possess some difference between models or observations identified by the two classes. We define such a class variable by
$$
q_i^* = \begin{cases} 1 & \text{if the } i\text{-th individual belongs to class 1,} \\ 0 & \text{if the } i\text{-th individual belongs to class 0.} \end{cases}
$$
Here we use the rule ( )

Remark 1.
In (2.5), even if we consider a difference or quotient between

Theorem 1.
censored data, such conditions on α are omitted because ( ) ( ) Suppose that Conditions A are satisfied. Then, as ,

shown to be less than Simulated Data: For the second purpose, we prepare simulated data with follows the standard uniform distribution. The latent distribution of $T_i^*$ is standard exponential and the censoring follows a uniform distribution on $[0, 3.65]$.

Figure 2. Plots of averages (polygonal lines) and SEs (horizontal whiskers) of
Figure 3.
× , , is totally bounded. (Proof of s11). In the proof of s2, we show that bounded. Because this boundedness is equivalent to the compact property in , surely by a compact set.
of elements of ( ).


bounded by these sums is also so. (Proof of s13). Here we show that [ ] the collection of all the functions such that 0 or 1 at each n partition point i n on [ ] measure µ on [ ] expressed as the $L^1$-norm in $D_q$, ) implied by the exact method up to a feasible level. Further, in Section 4.2 we performed numerical experiments to investigate the practical size of the error in MC approximations using the Cox cure-mixture model. These experiments indicate that the exact OPL may be sufficiently approximated with the number of MC trials smaller than n O