Statistical Inference in Generalized Linear Mixed Models by Joint Modelling Mean and Covariance of Non-Normal Random Effects

Generalized linear mixed models (GLMMs) are typically constructed by incorporating random effects into the linear predictor. The random effects are usually assumed to be normally distributed with mean zero and variance-covariance identity matrix. In this paper, we propose to release random effects to non-normal distributions and discuss how to model the mean and covariance structures in GLMMs simultaneously. Parameter estimation is solved by using Quasi-Monte Carlo (QMC) method through iterative Newton-Raphson (NR) algorithm very well in terms of accuracy and stabilization, which is demonstrated by real binary salamander mating data analysis and simulation studies.

One common idea in above literatures is random effects in GLMMs, which represent latent effects between individuals, follow normal distribution with mean zero and identity covariance matrix.However, these assumptions are not always valid in practice because individual has its own special natures and may not be normal distributed.To address this issue, multivariate t distribution and multivariate mixture normal distribution will be assumed for random effects in GLMMs.Quasi-Monte Carlo (QMC) method through iterative Newton-Raphson (NR) algorithm solves the high-dimensional integration problem in the marginal likelihood under the non-normal assumptions.
Review of GLMMs and the marginal Quasi-likelihood will be proposed in Section 2. QMC approximation, the simplest Good Point set (square root sequence) and analytical form of maximum likelihood estimates (MLEs) in GLMMs will be discussed in Section 3. The idea of modified cholesky decomposition and joint modelling of mean and covariance structure will be explained in Section 4. In Section 5, the salamander data will be analyzed as an example under the assumption of multivariate t distribution of random effects, then simulation study with same design protocol as salamander data followed in Section 6. Discussions on the related issues and further studies are given in Section 7.

Generalized Linear Mixed Models
The generalized linear mixed models (GLMMs) are typically constructed by incorporating random effects into the linear predictor of a conditionally independent exponential family model (McCulloch, 2003, [9]).

( )
( ) ( ) ( ) where i Y denotes the observed response, 1, 2, , i n =  , i ′ x is the th i row of design matrix; β is 1 p × fixed effect parameter; i ′ z is the th i row of the second design matrix; b is 1 q × random effect.
In this definition we see the usual ingredients of a generalized linear model.First, the distribution of i Y from an exponential family (in this case the distribution is assumed to hold conditional on the random effect b ).
Second, a link function, ( ) .g is applied to the conditional mean of i Y given b to obtain the conditional linear predictor i η b .When i η b is identical to the i ϑ , GLMMs is said to have canonical to link function.
Finally, the linear predictor is assumed to consist of two components, the fixed effects portion, described by i β ′ x and random effects portion, i ′ z b , for which a distribution is assigned to b .e.g. if b is said to follow a q-dimensional t distribution, denote by ( ) , and its density function is given by Y. Chen et al.

570
with the mean of b is ( ) and the covariance matrix of b is ( ) respectively, where µ is a 1 q × vector, Σ is a q q × symmetric positive definite matrix, ν is the degree of freedom.If b is said to follow a q-dimensional mixture normal distribution F ( ) ( ) ( ) with mean 1 0 q× , covariance matrix ( ) determinates the conditional log quasi-likelihood of β and θ given b (Breslow and Clayton, 1993, [1]).

( )
; It is extremely challenging to obtain the maximum likelihood estimates (MLEs), ( ) β θ , which maximize (3) or (4).Although the integration involves an analytical expression, the computational problems are magnified when the specified model contains a large number of random effects and random effects have a crossed designed according to data description.

Quasi-Monte Carlo Integration
Quasi-Monte Carlo (QMC) sequences are a deterministic alternative to Monte Carlo (MC) sequences (Niederreiter, 1992, [11]).For the q-dimensional unit cube domain should be selected to minimize the error of the integral quadrature.If the integrand f has finite variation, then the error of the quasi-Monte Carlo quadrature can be bounded using the Koksma-Hlawka inequality where ( ) V f is variation of ( ) f x in the sense of Hardy and Krause (Fang and Wang, 1994, [12]) and ( ) is the star discrepancy of sample points, which is a measure of uniformity of distribution of a finite point set { } , , , sup where A is an arbitrary q-dimensional subcube parallel to the coordinate axes and originating at the centre,

( )
V A is its volume, and ( ) m A is the number of sample points inside this subcube.For carefully selected sample points, the discrepancy and consequently the error of low-discrepancy sequ-ences (Niederreiter, 1992, [11]) can be in the order of ( ) ( ) , which is much better than ( ) the probabilistic error bound of Monte Carlo methods (Fang and Wang, 1994, [12]).Moreover, quasi-Monte Carlo methods guarantee this accuracy in a deterministic way, unlike Monte Carlo methods where the error bound is also probabilistic.
In the QMC approach, there exist precise construction algorithms for generating the required points.These algorithms can be divided into two families, the Lattice rule and Digital-net.Only the good point set, which is belong to lattice rule (Fang and Wang, 1994, [12]) will be used and discussed in following sections.

The Good Point (GP) Set
A good point , , , , where , , where { } y represents the fractional part of the value y.In practice, the following forms of the good point ν are recommended as square root sequence , , , where i p , 1 i q ≤ ≤ , is a series of prime numbers, for example, the first q primes if i j p p i j ≠ ≠ .

MLE in GLMMs by QMC Estimation
When QMC method applied to marginal quasi-likelihood function (4) ( ) ( ) ( ) Let ( ) . h is the inverse function of ( ) .g .The global maximum of ( ) with respect to β and θ is the solution of the score equations where , k w is a weight for the kth point in the ith subject and it has the following form ( ) ( ) The weight k w given in ( 9) is a function of β and θ , i.e.
( ) w w β θ ≡ , which must be taken into account when calculating the second-order derivatives.
The score Equations ( 7) and ( 8) in general have no analytical solutions for β and θ .A numeral solution is given by Pan and Thompson (2007) [7], which used Newton-Raphson algorithm.Σ , and joint modelling of mean and covariance matrix were published by Pourahmadi (1999Pourahmadi ( , 2000) ) [13] [14], to obtain a statistically meaningful unconstrained parameterization of covariance.The remarkable advantages are: 1) it guarantees covariance matrix is positive definite; 2) it reduces the number of parameters so that it makes computation efficient; 3) it has very clear statistical interpretation.The main idea of modified cholesky decomposition is that a q q × symmetric matrix It offers a simple unconstrained and statistically meaningful reparametrization of the covariance matrix.So ( ) . T and D are easy to compute and interpret statistically: the below-diagonal entries of T are the negatives of the autoregressive coefficients (ACs), jk φ , in the autoregressive model.

(
) where j b is a 1 1 × number, which index the jth dimension of random effect b.That is, the linear least squares predictor of j b based on its predecessors  (11).Obviously we have ( ) The linear joint models of the autoregressive coefficients (ACs) and the logarithms of innovation variances (IVs) are covariates, and γ and λ are low-dimensional parameter vectors.The choice of covariate vectors jk c and j h are flexible in some senses.

Salamander Data Analysis
The famous salamander interbreeding data set was published by McCullagh and Nelder (1989), which came from three repeated same protocol design experiments.The salamanders originally lived in two geographically isolated populations, Roughbutt (RB) and Whiteside (WS).Each experiment involved in two closed groups and each group involved 5 females and 5 males from each population.In a fixed period, each female mated with 6 males, 3 from each population.So 60 correlated binary observations were created in each closed group, and totally, 360 binary observations (1 for successful interbreeding, 0 for failed interbreeding) were in three experiments.The primary two objectives of experiment are to study whether the successful mating rate is significant between populations and if heterogeneity between individuals in the mating probability exists.This binary data set is challenging because it is block crossed, correlated, balanced (as each female mated with a total of six males and vice verse) and incomplete (as each female mated with just three out of a possible five males, and vice verse).
In a closed group, 5 females and 5 males from each population involved, so total 20 salamander were indexed by i or j.Denoting by ij y the outcome for the mating of the ith female with the jth male, the logit mixed model for each experiment ( ) where the conditional probability is used to model the correlated binary responses.
( ) x ws ws ws ws 1 f i ws = if the th i female comes from WS, otherwise 0. Similarly, 1 m j ws = if the th j male comes from WS, otherwise 0.
Here the random effect is not assumed as normal distribution but multivariate t distribution with degree of freedom 3, 7,10,15 . Note that each experiment includes two closed groups and involves 40 salamanders so that the log-likelihood for each experiment involves a 40-dimensional integrals.The dimensionality for each experiment can be further reduced to the sum of two 20-dimensional integrals due to the block design of two closed groups, see Karim and Zeger (1992) [4] and Shun (1997) [15] for more details about the design.When the three experiments data are pooled as in model ( 13), the log-likelihood  is a sum of six 20-dimensional integrals so that the MLEs of the fixed effects β and variance components θ become extremely difficult to obtain.
Modelling b Σ for salamander mating data, 20 q = , * 1 4 q = and * 2 4 q = .The autoregressive coefficients (ACs) model is reordered as ( ) where ( ) describes the variance of the th j sala- mander ( ) where Gender 1 j = , if the th j salamander is female, otherwise 0. Similarly, District 1 j = if the th j sala- mander comes from WS, otherwise 0.
In the logarithms of innovation variances (IVs) model, originally there were 4 covariates including the item Gender District j j × with interaction parameter 4 λ .But the results showed that 4 λ approximately equals zero and has smaller order of magnitude than 2 λ and 3 λ .The hypothesis test also showed that there is no evidence to reject 4 0 λ = .
The salamander data is now analyzed by using QMC approach to calculate the MLEs of the fixed effects and variance components involved in the model.Specifically, we implement the simplest QMC point, square root sequences to estimate the parameters for the modelling of the pooled data.A set of 20-dimensional points on the unit cube were generated for the six 20-dimensional integrals.Then the points are modified through using transformation, ( ) Σ .This reduction of number of parameters makes the computation easier.Note that, the higher the dimension q, the more significant this advantage.Third, the modified Cholesky decomposition confirms that unconstrained parameters lead to strict condition that is the b Σ is a symmetric positive definite matrix.
In the calculations, the convergence accuracy is set to be The numerical results from Tables 1-4 show that for each of the degree of freedom, QMC method make the maximum log-likelihood similar when increasing the number of QMC points from 10,000 to 100,000 which indicates any number of points between these could prove reasonable estimate of the parameters.In addition, the parameter estimates using these sequences become stable quickly.Bearing in mind that the integral space is   , it is still small in such a space.Nevertheless, the numerical results show that the QMC points approximation based estimation in the GLMMs performs very well in terms of accuracy and stabilization.When using 100,000 QMC points, our Fortran code takes about 50 minutes to obtain the results.Furthermore, the convergence of our algorithm is usually made between the 3th and 5th iterations.On the other hand, increasing the number of points may lead to less iterations.
For mean model, the estimators for fixed effects, β are similar when K increases over 30,000.For autoregressive coefficients (ACs) model, the estimations of γ , including 1 γ , 2 γ , 3 γ and 4 γ are very close to zero.For logarithms of innovation variances (IVs), only 1 γ is not close to zero.It needs to be noted that some of standard deviations are 0.0000 for γ and λ , which means those values are less than 7 10 − .A criteria easy to be executed to decide which covariate is significant is if the estimator is greater than twice of standard error.So the four fixed effect parameters of β are significant; four parameters for γ are not; 1 λ is significant, but 2 λ and 3 λ are not.The conclusion is that 1 γ , 2 γ , 3 γ , 4 γ 2 λ and 3 λ can be regarded as zero.In other words, T is an identity matrix and D is a diagonal matrix with the same element.So , it can be said that every dimension of random effect of each salamander is independent of others.Also, the effects of female and male salamanders are independent and no heterogeneity between female and male.
Another point is why the random effect is not assumed as multivariate mixture normal distribution for the salamander data because Newton-Raphson algorithm does not achieve convergency until the mixture normal distribution degenerates to normal distribution.

Simulation
We conducted a simulation study to assess the efficiency of the QMC estimation in the GLMMs, in particular, to assess the performance of the GP set point when the distribution of random effect is multivariate t distribution (1) and multivariate mixture normal distribution (2).Based on the logistic model in (13), we simulate the salamander pooling data, which have 360 correlated binary observations.In the simulation study, we run 100 simulations.The same protocol design as the salamander mating experiment is adopted for the simulated data and the log-likelihood function for each simulated data set thus involves six 20-dimensional integrals that are analytically intractable.
When using the QMC approximation, we generate 50000 K = integration nodes on the cube 20 C using the square roots of the first 20 prime numbers (the GP set method).Then, we use Equation (6) to approximate the integrated log-likelihood when random effect followed multivariate t distribution.When random effect followed multivariate mixture normal distribution, the log-likelihood changes to The Newton-Raphson algorithm ( 10) is used to maximize the approximated log-likelihood function.The true values are chosen to be (1.20,−2.80, −1.00, 3.60) for the fixed effects, ( ) for ACs parameters and ( ) 2, 1, 0.5 − for IVs parameters.ACs parameters and IVs parameters build variance components.The starting values for the algorithm are chosen to be the estimators as for the real data analysis.The convergence criterion is set to be 2 10 − for the successive iterated values.The algorithm stops after 20 iterations and is considered to be non-convergent.The simulation studies are conducted and coded in FORTRAN language and this programme was run on a PC Pentium (R) 4 PC (CPU 3.20 GHz).A summary of the simulation results is presented in Table 5 for multivariate t distribution and Table 6 for multivariate mixture normal distribution.The results given are the average parameter estimates based on 100 replications and their standard errors.
It is clear that estimation by modified Cholesky decomposition with QMC approach is able to produce reasonably good results in GLMMs when the random effect is not normal distribution.When the random effects followed by multivariate t distribution (Table 5), the estimates of regression fixed effects and variance components are those on average of 100 simulation, which has little bias.That means the method of modified Cholesky decomposition of covariance modelling with QMC approach is quite useful and effective.When the random effects followed by multivariate mixture normal distribution (Table 6), the estimates of regression fixed effects are acceptable although bigger bias, especially for fixed effects.The estimates of variance components are quite accurate.That is, when random effects change to two peaks distribution, the accuracy need to be enhanced.

Conclusion and Discussion
It is quite common in the literatures that random effects are independent identically distributed normal variables in generalized linear models (GLMMs).In longitudinal study, the covariance-variance matrix of random effects for individuals are usually assumed to be homogeneous.However, this assumption may not be valid in practice.
In this article, the assumption extends to random effects follow multivariate t and mixture normal distribution.Approximation of marginal quasi-likelihood and parameter estimation in GLMMs is very difficult and challenging because they involve integral on random effects, especially for high-dimensional problems.
In this article, we have used Quasi-Monte Carlo (QMC) sequence approximation to calculate the maximum likelihood estimates and marginal log quasi-likelihood in GLMMs.The key idea of QMC sequences is to generate integration points which are scatted uniformly in the integration domain.The good point, one simple QMC point, is used through this article.Newton-Raphson (NR) algorithm is the iterative method to obtain optimum.QMC-NR method converges very quickly and performs very well in terms of accuracy and stabilization.In longitudinal studies, the covariance structure plays a crucial role in statistical inferences.The correct modelling of covariance structure improves the efficiencies of the mean parameters and provides much more reliable estimates (Ye and Pan, 2006, [16]).The joint modelling (JM) for mean and for covariance-variance components for random effects reduces the number of parameters and has a clear statistical meaning.QMC-NR method with joint modelling can be called as QMC-NR-JM method.Although QMC-NR-JM method performs well in stability and produces accurate estimation of the parameters in GLMMs, there is no way to avoid intensive computation in real data analysis and simulation study.Particularly, the binary cross-designed salamander dataset, involving six 20-dimensional integrals in the log quasi-likelihood function, has been analyzed and the same protocol design is adopted for simulations.
Note that a practical issue is how to select of the number of integration points?My solution is increasing the number of QMC points gradually until the MLEs become stable.It correspondingly leads to increasing the computational time and efforts.This problem is quite obvious for simulation study, especially when the number of parameters increases.Another worth noting issue is that a range of starting values for the fixed effects and variance components have been tried and we find that the different starting values almost can't change the results when algorithm reaches convergency.
In the QMC-NR-JM approximation method, the estimators of random effects cannot be calculated at the same time when estimating MLEs for fixed effects and variance components.Pan and Thompson (2007) [7] proposed an iterative method to predict of the random effects, implying that an initial value of b substituted into the equation to yield a new prediction of b.This process is then repeated until the convergence of b , but this method cannot guarantee convergence for arbitrary data.So how to estimate the random effects is still a open question in QMC-NR-JM method.
is Hessian matrix of log likelihood.
The second-order derivatives of log likelihood

Appendix B: MLEs for Covariance Modeling
The first and second derivative of b Σ respect to γ and λ are: ( ) ( ) ( ) ( ) ( ) * Φ is introduced for the purpose of calculation for ( )  j q i i i j i q j q q q q q q q q q q q j q c c c We can put the th j column of C into a q q × lower triangular matrix like * Φ .So we have * 1 q q q × matrices.The element , The above relationship is a one-to-one correspondence between i and ( ) The first derivative can be calculated as as the entries below diagonal.
The second derivative can be calculated as where ( ) ( ) ( ) , , 1 1 2 2 j j j j j q q j q j h h ( ) ( ) ( ) ( ) ( ) ( ) root of b Σ , for example, it can be taken as the Cholesky decomposition of b Σ or eigenvalue-eigenvector decomposition.

bΣ
is positive definite if and only if there exist a unique unit lower triangular matrix T, with 1 as diagonal entries, and unique diagonal matrix D with positive diagonal entries such that On the other hand, the diagonal entries of D are the innovation variances (IVs), come from different districts (populations), otherwise 0. The logarithms of innovation variances (IVs) model is ).Fis the cumulative distribution function of the multi-Σ .After that, we approxi- mate the integrated log-likelihood, then we use the Newton-Raphson algorithm(10) to maximize the approximated log-likelihood function.The covariance modelling for ( ) Cov b b Σ = , the covariance matrix of random effect, uses 4 parameters for autoregressive coefficients of i b and j b , 1 , 20 i j ≤ ≤ , (i and j index the dimension of b while i b means the th i salamander's latent effect) and plus 3 parameters for innovation variances of ĵ j =  .Using this joint modelling method to analyze the real world data can estimate b Σ more to the true one because of no factitious assumption on it, instead, only the data tells us what is b Σ .Another advantage is the parameter is reduced from looks similar to the matrix T, but opposite sign for the lower triangular elements.

1 T
− is a lower triangular matrix with 1's as diagonal entries, so that

Table 1 .
MLEs of the parameters by covariance modelling for the pooled salamander data using square root sequence points when varying K, the number of square root sequence points (standard errors in parentheses) (random effect-multivariate 3 t ).

Table 2 .
MLEs of the parameters by covariance modelling for the pooled salamander data using square root sequence points when varying K, the number of square root sequence points (standard errors in parentheses) (random effect-multivariate 7 t ).

Table 3 .
MLEs of the parameters by covariance modelling for the pooled salamander data using square root sequence points when varying K, the number of square root sequence points (standard errors in parentheses) (random effect-multivariate 10 t ).

Table 4 .
MLEs of the parameters by covariance modelling for the pooled salamander data using square root sequence points when varying K, the number of square root sequence points (standard errors in parentheses) (random effect-multivariate 15 t ).

Table 5 .
The average of the 100 parameter estimates in the simulation study, where the QMC approximation uses 50000 K = points implemented using the gp set for random effect of multivariate 3 t , 7 t , 10 t and 15 t distribution (si- mulated standard errors are given in parentheses).

Table 6 .
The average of the 100 parameter estimates in the simulation study, where the QMC approximation uses