
To assess causality between binary economic outcomes, we consider the estimation of a bivariate dynamic probit model on panel data that has the particularity of accounting for the initial conditions of the dynamic process. Because the likelihood function has an intractable form (a two-dimensional integral), we use an approximation method: the adaptive Gauss-Hermite quadrature. To improve the accuracy of the method and to reduce computing time, we derive the gradient of the log-likelihood and the Hessian of the integrand. The estimation method has been implemented using the d1 method of the Stata software. We validate our estimation method empirically by applying it to a simulated data set. We also analyze the impact of the number of quadrature points on the estimates and on the duration of the estimation process. We conclude that, beyond 16 quadrature points on our simulated data set, the relative differences in the estimated coefficients are around 0.01% while the computing time grows exponentially.

Testing Granger causality has generated a large body of papers in the literature. The larger part of this literature concerns the case of continuous dependent variables. For binary outcomes, there is also a way to frame the causality problem. As described by [

For the panel data case, as the one-way fixed effects model estimated on a finite sample necessarily has inconsistent estimators [

This specification leads to a likelihood function with an intractable form, a two-dimensional integral with a large set of parameters to be estimated. Estimating this likelihood function requires a numerical approximation of the integral, such as maximum simulated likelihood (see [

The main goal of this paper is to propose and test a method for estimating a two-equation system where the outcome variables are binary in a panel data framework. To the best of our knowledge, no program exists to do so, especially as we provide the calculation of the Hessian matrix and the gradient vector of our maximization program.

In this paper, we discuss the problem of testing Granger causality with a bivariate dynamic probit model that takes the initial conditions into account. The paper is organized as follows. In Section 2, we explain the causality test method for the bivariate probit model with panel data. In Section 3, we describe the estimation methods available when the likelihood function has an intractable form (a two-dimensional integral in our case). Section 4 presents the calculation of the gradient with respect to the model parameters and of the Hessian matrix with respect to the random effects vector. In Section 5, we present a robustness analysis of our selected estimation method based on simulations^{1}.

This section describes the causality test method in the case of binary variables. We start by presenting the general approach for time series before introducing the panel data case. We end this section with a discussion of the initial conditions problem.

The concept of causality was introduced by [. It distinguishes instantaneous causality, which means that Z_{t} is causing Y_{t} (including Z_{t} in the model improves the predictability of Y_{t}), from lag causality, which means that lagged values of Z improve the predictability of Y_{t}. In this section, we rule out instantaneous causality and deal with lag causality of one period.

One-period Granger causality can be rephrased in terms of conditional independence. Without loss of generality, we present the univariate case for time series. Let Y_{t} and Z_{t} denote the dependent variables and X_{t} denote a set of control variables. One-period Granger non-causality from Z to Y is the conditional independence of Y_{t} from Z_{t−1} conditional on X_{t} and Y_{t−1}. More precisely, Granger non-causality from Z to Y is:

f ( Y t | Y t − 1 , X t , Z t − 1 ) = f ( Y t | Y t − 1 , X t ) (1)

Note that the same kind of relationship can be written for Granger non-causality from Y to Z. As Y_{t} and Z_{t} are binary outcome variables, we can use latent variables ( Y * and Z * respectively) and make the assumption that Y and Z have positive outcomes (equal to 1) if their latent variables are positive. The latent variables are defined as follows:

For the left-hand side of Equation (1) ( f ( Y t | Y t − 1 , X t , Z t − 1 ) ):

Y t * = X t β 1 + δ 11 Y t − 1 + δ 12 Z t − 1 + ϵ t 1 (2)

Z t * = X t β 2 + δ 21 Y t − 1 + δ 22 Z t − 1 + ϵ t 2 (3)

For the right-hand side of Equation (1) ( f ( Y t | Y t − 1 , X t ) ):

Y t * = X t β 1 + δ 11 Y t − 1 + ϵ t 1 (4)

Z t * = X t β 2 + δ 22 Z t − 1 + ϵ t 2 (5)

where

( ϵ 1 , ϵ 2 ) ′ ∼ N ( 0 , Σ ϵ ) with Σ ϵ = ( 1 ρ ϵ ρ ϵ 1 )

To fit the joint distribution of Y and Z conditional on X (meaning that we estimate a bivariate model), we need to analyze the four possible outcomes, namely ( Y = Z = 1 ) , ( Y = Z = 0 ) , ( Y = 1 ; Z = 0 ) and ( Y = 0 ; Z = 1 ) . For each of these outcomes, we have:

P ( Y t = 1 , Z t = 1 | X t ) = P ( ϵ t 1 > − X t β 1 − δ 11 Y t − 1 − δ 12 Z t − 1 , ϵ t 2 > − X t β 2 − δ 21 Y t − 1 − δ 22 Z t − 1 )

P ( Y t = 0 , Z t = 0 | X t ) = P ( ϵ t 1 < − X t β 1 − δ 11 Y t − 1 − δ 12 Z t − 1 , ϵ t 2 < − X t β 2 − δ 21 Y t − 1 − δ 22 Z t − 1 )

P ( Y t = 1 , Z t = 0 | X t ) = P ( ϵ t 1 > − X t β 1 − δ 11 Y t − 1 − δ 12 Z t − 1 , ϵ t 2 < − X t β 2 − δ 21 Y t − 1 − δ 22 Z t − 1 )

P ( Y t = 0 , Z t = 1 | X t ) = P ( ϵ t 1 < − X t β 1 − δ 11 Y t − 1 − δ 12 Z t − 1 , ϵ t 2 > − X t β 2 − δ 21 Y t − 1 − δ 22 Z t − 1 )

As we can see, by defining q t 1 = 2 Y t − 1 and q t 2 = 2 Z t − 1 (so that q t 1 and q t 2 equal 1 for positive outcomes and −1 otherwise), we can rewrite the probabilities above as:

P ( Y t , Z t | X t ) = Φ 2 ( q t 1 ( X t β 1 + δ 11 Y t − 1 + δ 12 Z t − 1 ) , q t 2 ( X t β 2 + δ 21 Y t − 1 + δ 22 Z t − 1 ) , q t 1 q t 2 ρ ϵ ) (6)

where Φ 2 ( ) stands for the bivariate normal c.d.f.

Testing Granger non-causality in this specification then amounts to testing H 0 : δ 12 = 0 for the null that Z does not cause Y and H 0 : δ 21 = 0 for the null that Y does not cause Z.
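As an illustration of Equation (6), the joint probabilities can be evaluated numerically. The following Python sketch (not part of the authors' Stata implementation; the index values idx1, idx2 and the correlation are arbitrary illustrative assumptions) checks that the four joint outcome probabilities sum to one:

```python
import numpy as np
from scipy.stats import multivariate_normal

def p_joint(y, z, idx1, idx2, rho):
    # Equation (6): P(Y_t, Z_t | X_t) = Phi2(q1*idx1, q2*idx2, q1*q2*rho)
    # with q1 = 2*Y_t - 1 and q2 = 2*Z_t - 1; idx1 and idx2 stand for the
    # linear indices of the two latent equations.
    q1, q2 = 2 * y - 1, 2 * z - 1
    cov = [[1.0, q1 * q2 * rho], [q1 * q2 * rho, 1.0]]
    return multivariate_normal.cdf([q1 * idx1, q2 * idx2], mean=[0.0, 0.0], cov=cov)

# The four joint outcome probabilities must sum to one.
idx1, idx2, rho = 0.4, -0.2, 0.5
total = sum(p_joint(y, z, idx1, idx2, rho) for y in (0, 1) for z in (0, 1))
```

The sign variables q1 and q2 flip both the evaluation point and the correlation, which is what makes a single Φ2 expression cover all four outcome combinations.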

For the panel data case, two major approaches can be used. The first one considers that the causal effect is not the same for all individuals in the panel ( [

Y i t * = X t 1 β 1 + δ 11 , i Y i , t − 1 + δ 12 , i Z i , t − 1 + η i 1 + ζ i t 1 (7)

Z i t * = X t 2 β 2 + δ 21 , i Y i , t − 1 + δ 22 , i Z i , t − 1 + η i 2 + ζ i t 2 (8)

where ( η i 1 , η i 2 ) ′ denotes the individual random effects, which have zero mean and covariance matrix Σ η , and ( ζ i t 1 , ζ i t 2 ) ′ denotes the idiosyncratic shocks, which have zero mean and covariance matrix Σ ζ with

Σ η = ( σ 1 2 σ 1 σ 2 ρ η σ 1 σ 2 ρ η σ 2 2 ) and Σ ζ = ( 1 ρ ζ ρ ζ 1 )

In this approach, testing Granger non-causality is equivalent to testing δ 12 , i = 0 , i = 1 , ⋯ , N for Z not causing Y and δ 21 , i = 0 , i = 1 , ⋯ , N for Y not causing Z.

The second approach (the one used in this paper) assumes that the causal effects, if they exist, are the same for all individuals in the panel. With the same notation as in the previous case, the latent variables are:

Y i t * = X t 1 β 1 + δ 11 Y i , t − 1 + δ 12 Z i , t − 1 + η i 1 + ζ i t 1 (9)

Z i t * = X t 2 β 2 + δ 21 Y i , t − 1 + δ 22 Z i , t − 1 + η i 2 + ζ i t 2 (10)

Then testing Granger non-causality is equivalent to testing H 0 : δ 12 = 0 for the null that Z does not cause Y and H 0 : δ 21 = 0 for the null that Y does not cause Z.

Finally, Equations (9) and (10) are the core of our problem. Since Y and Z are binary panel outcomes and each equation includes lagged dependent variables, jointly estimating these two equations can be viewed as the estimation of a bivariate dynamic probit model.

For the first wave of the panel (the initial conditions), because we have no data on the previous states of Y and Z (no values for Y i ,0 and Z i ,0 ), we are not able to evaluate P ( Y i 1 , Z i 1 | Y i ,0 , Z i ,0 , X i ) . By ignoring it in the individual likelihood, researchers also ignore the data generating process of the first wave of the panel. This means that they assume the data generating process of the first wave to be exogenous or to be in equilibrium. These assumptions hold only if the individual random effects are degenerate. If this assumption is not fulfilled, the initial conditions (the first wave of the panel) are explained by the individual random effects and ignoring them leads to inconsistent parameter estimates [

The solution proposed by [

Y i , 1 * = X i 1 γ 1 + λ 11 η i 1 + λ 12 η i 2 + ϵ i 1 (11)

Z i , 1 * = X i 2 γ 2 + λ 21 η i 1 + λ 22 η i 2 + ϵ i 2 (12)

where ( ϵ i 1 , ϵ i 2 ) ′ denotes the vector of idiosyncratic shocks which are zero mean and covariance matrix Σ ϵ with Σ ϵ = ( 1 ρ ϵ ρ ϵ 1 ) .

As η 1 and η 2 are the individual random effects on Y and Z respectively, λ 21 and λ 12 can be interpreted as the influence of the Y individual random effect on Z (respectively of the Z individual random effect on Y) at the first wave of the panel.

Because the likelihood function has an intractable form (an integral), it cannot be estimated by the usual methods. We therefore turn to numerical integration methods, i.e. numerical approximation methods for an integral. In this section we describe the two major methods and argue for one of them to estimate our likelihood function.

The Gauss-Hermite quadrature is a numerical method used to approximate the value of an integral. The default approach applies to a univariate integral of the form:

∫ ℝ f ( x ) exp ( − x 2 ) d x (13)

where exp ( − x 2 ) denotes the Gaussian factor^{2}. Then the integral above can be approximated using:

∫ ℝ f ( x ) exp ( − x 2 ) d x ≈ ∑ q = 1 Q w q f ( x q ) (14)

where x q , q = 1 , ⋯ , Q are nodes from the Hermite polynomial and w q , q = 1 , ⋯ , Q are corresponding weights.

This approximation assumes that the integrand can be well approximated by a polynomial of order 2 Q − 1 and that the integrand is sampled on a symmetric range centered on zero. For suitable results, these two assumptions must be taken into account.
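As a minimal illustration of Equation (14), the following Python sketch (an illustrative assumption of this presentation, not the authors' Stata code) uses NumPy's Gauss-Hermite rule to integrate a polynomial against the Gaussian factor:

```python
import numpy as np

def gh_integral(f, Q):
    # Q-point Gauss-Hermite rule for the weight exp(-x^2): exact when f
    # is a polynomial of degree at most 2Q - 1 (Equation (14)).
    nodes, weights = np.polynomial.hermite.hermgauss(Q)
    return np.sum(weights * f(nodes))

# Example: the integral of x^2 * exp(-x^2) over R equals sqrt(pi)/2.
approx = gh_integral(lambda x: x**2, 8)
exact = np.sqrt(np.pi) / 2
```

Since x² has degree 2 ≤ 2·8 − 1, the 8-point rule reproduces the integral up to floating-point error.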

Finding a suitable number of quadrature points can be done numerically. For the accuracy of the approximation, it is required to choose the number of quadrature points carefully. To do this, one can start with a number q ¯ of quadrature points and increase it to assess whether the results change significantly, repeating this process until convergence in terms of the variation of the overall likelihood value and of the estimated coefficients. It is also important to take into account that increasing the number of quadrature points also increases the computing time. An example of the impact of the number of quadrature points on the estimated results is given in Section 5.
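The point-selection heuristic described above can be sketched as follows; this is a hypothetical Python illustration (the tolerance, the doubling schedule, and the test integrand are arbitrary choices), not the authors' procedure:

```python
import numpy as np

def choose_Q(f, start=4, tol=1e-6, max_Q=64):
    # Double the number of quadrature points until the approximated
    # integral of f(x)*exp(-x^2) changes by less than tol.
    Q = start
    x, w = np.polynomial.hermite.hermgauss(Q)
    prev = np.sum(w * f(x))
    while Q < max_Q:
        Q *= 2
        x, w = np.polynomial.hermite.hermgauss(Q)
        cur = np.sum(w * f(x))
        if abs(cur - prev) < tol:
            return Q, cur
        prev = cur
    return Q, prev

# The integral of cos(x)*exp(-x^2) over R is sqrt(pi)*exp(-1/4).
Q, val = choose_Q(np.cos)
```

In a likelihood setting, the stopping rule would compare estimated coefficients and log-likelihood values across runs rather than a single integral.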

For the problem of a suitable sampling range, the solution of using the adaptive Gauss-Hermite quadrature was proposed by [^{2}. That means (see [

∫ ℝ f ( x ) d x ≈ ∑ q = 1 Q w q * f ( x q * ) (15)

The sampling range is thus transformed: the new nodes are x q * = μ + √ 2 σ x q and the new weights are w q * = √ 2 σ w q exp ( x q 2 ) . For [^{2}. For the implementation, we can start with μ = 0 and σ = 1 and, at each iteration of the likelihood maximization process, calculate the posterior weighted mean and variance of the quadrature points and use them to compute the nodes and weights for the next iteration. For [

σ = ( − ∂ 2 ∂ x 2 log ( f ( x ) ) | x = x ^ ) − 1 / 2 (16)
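A minimal Python sketch of the adaptive rescaling follows (nodes x_q* = μ + √2 σ x_q and weights w_q* = √2 σ w_q exp(x_q²) as above); the integrand and its mode and curvature are illustrative assumptions, chosen so that σ from Equation (16) is known exactly:

```python
import numpy as np

def adaptive_gh(f, mode, sigma, Q):
    # Adaptive rule: rescale the standard Gauss-Hermite nodes around the
    # mode of the integrand, x_q* = mode + sqrt(2)*sigma*x_q, with
    # weights w_q* = sqrt(2)*sigma*w_q*exp(x_q**2) (Equation (15)).
    x, w = np.polynomial.hermite.hermgauss(Q)
    x_star = mode + np.sqrt(2.0) * sigma * x
    w_star = np.sqrt(2.0) * sigma * w * np.exp(x**2)
    return np.sum(w_star * f(x_star))

# Integrand: an N(5, 0.5^2) density, integrating to 1 but centered far from 0.
f = lambda x: np.exp(-0.5 * ((x - 5.0) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))
# Here sigma = (-d^2/dx^2 log f(x) at the mode)^(-1/2) = 0.5 exactly.
adaptive = adaptive_gh(f, 5.0, 0.5, 8)

# The non-adaptive rule sampled around 0 misses the mass entirely.
x, w = np.polynomial.hermite.hermgauss(8)
naive = np.sum(w * np.exp(x**2) * f(x))
```

The contrast between the two results illustrates why re-centering and re-scaling the nodes matters when the integrand is concentrated away from the origin, as a likelihood contribution typically is.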

For the multivariate case, the same approach is used. Without loss of generality, we discuss the bivariate case, which can be applied to other multivariate cases. The function to approximate is written as follows:

∫ ℝ 2 f ( x , y ) d x d y (17)

Under the assumption of independence between x and y (dependence can be handled by a Cholesky decomposition x ′ = x and y ′ = ρ x ′ + y , see [

∫ ℝ 2 f ( x , y ) d x d y ≈ ∑ q 1 = 1 Q ∑ q 2 = 1 Q w q 1 * w q 2 * f ( x q 1 * , y q 2 * ) (18)

And in this case, the nodes and weights are derived as follows:

( x q 1 * y q 2 * ) = x ^ + √ 2 ∗ ( − ∂ 2 ∂ x 2 log ( f ( x , y ) ) | ( x , y ) = x ^ ) − 1 / 2 ∗ ( x q 1 x q 2 ) (19)

and

w q 1 * w q 2 * = 2 ∗ | − ∂ 2 ∂ x 2 log ( f ( x , y ) ) | ( x , y ) = x ^ | − 1 / 2 ∗ w q 1 w q 2 exp ( x q 1 2 + x q 2 2 ) (20)

where | A | denotes the determinant of matrix A, and x ^ denotes the mode of the integrand f ( x , y ) .

Jäckel (2005) also suggests that nodes with low weights (whose contributions to the integral value are negligible) can be pruned in order to save computation time. That means setting a threshold

τ = w 1 w ⌊ ( Q + 1 ) / 2 ⌋ / Q and dropping all nodes with weights lower than this threshold.

The Maximum Simulated Likelihood method was introduced by [

f ( x , y ) = ∫ ℝ 2 f * ( x , y , u 1 , u 2 ) g ( u 1 , u 2 ) d u 1 d u 2 (21)

where g ( u 1 , u 2 ) is a probability density function and f * ( x , y , u 1 , u 2 ) , called the simulator, denotes the function whose mean value at some draws u_{1} and u_{2} gives an approximation of the overall likelihood. Without loss of generality, we only define the two-dimensional case, which can be generalized to integrals of smaller or larger dimension. For this kind of likelihood function, [_{1} and u_{2} drawn from the same probability density function g (the probability density function of the individual random effects). Then the overall likelihood function can be approximated by (u_{1d} denotes the d^{th} draw of u_{1}; the same definition holds for u_{2d}):

f ( x , y ) ≈ 1 D ∑ d = 1 D f * ( x , y , u 1 d , u 2 d ) (22)

where D denotes the number of draws.

To implement this method, we start by simulating bivariate normal draws from N ( 0, I 2 ) and give them the covariance matrix structure of ( u 1 , u 2 ) . We then calculate the value of the simulator at these transformed draws and repeat this D times. The overall likelihood is the mean of the simulator values over the transformed draws. At each iteration, once the random effects covariance matrix is calculated, we apply it to the initial standard normal draws to transform them into draws of the random effects and use them to calculate the likelihood. This process is repeated until convergence.
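The draw-transform-average scheme of Equation (22) can be sketched in Python as follows; the toy simulator u1·u2 (whose expectation equals the covariance, 0.7 here) is an assumption chosen only so that the approximation can be checked against a known value:

```python
import numpy as np

rng = np.random.default_rng(0)

def msl_approx(simulator, cov, D):
    # Draw D iid standard bivariate normals, give them the covariance
    # structure of the random effects through a Cholesky factor, and
    # average the simulator over the transformed draws (Equation (22)).
    L = np.linalg.cholesky(cov)
    z = rng.standard_normal((D, 2))
    u = z @ L.T
    return np.mean(simulator(u[:, 0], u[:, 1]))

# Toy simulator u1*u2: its expectation equals the off-diagonal covariance.
cov = np.array([[1.0, 0.7], [0.7, 1.0]])
est = msl_approx(lambda u1, u2: u1 * u2, cov, 200_000)
```

In the actual estimation problem the simulator would be the likelihood contribution evaluated at the drawn random effects, and the Cholesky factor would be recomputed at each iteration from the current covariance estimates.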

The simulated likelihood estimator is consistent and asymptotically equivalent to the likelihood estimator ( [

As described above, there are two main methods to estimate our likelihood function. To choose which method to implement, we consider both accuracy and computing time requirements.

For our estimations, we choose the adaptive Gauss-Hermite quadrature proposed by [

• Our data set is an unbalanced panel of 10,569 individuals observed on average over 26 years, leading to 255,206 observations. Because the simulated likelihood method requires the number of draws D to grow faster than the square root of the number of observations, we do not use it, to avoid wasting computing time.

• The Gauss-Hermite quadrature requires finding the best number of quadrature points Q, i.e. the one for which the integrand is well approximated by a polynomial of order 2 Q − 1 . A small Q reduces computing time. Our estimations are in general achieved for Q between 8 and 14. This means that at each iteration, for the likelihood value calculation, we compute a weighted sum of between 8 2 = 64 and 14 2 = 196 terms.

• Using the Gauss-Hermite quadrature method reduces computing time, but this computing time remains very long if the integrand is not sampled on a suitable range (i.e. if the adaptive method is not used). In that case, the maximization process takes between two and three weeks to reach convergence on an Intel Core i7 computer at 3.4 GHz with 8 GB of RAM. By applying the adaptive Gauss-Hermite quadrature, the computing time is significantly reduced: we spend between two and three days to achieve convergence on the same computer.

Note that the reduced convergence time mentioned above is partly due to the implementation of the first order derivatives of the likelihood function. Using the overall log-likelihood approximated by the Liu and Pierce adaptive Gauss-Hermite quadrature method, we can obtain derivatives with respect to all model parameters. Implementing these derivatives in the maximization process allows us to use Stata's d1 method. The convergence time saved by this method is substantial. On our overall data set, with 8 quadrature points, when we use a non-adaptive quadrature method, convergence is not achieved: after 3 weeks of computation, the model underflows. When we use the [

In this section we describe some requirements of the selected method, the adaptive Gauss-Hermite quadrature. The first one is that the adaptive Gauss-Hermite quadrature requires deriving the Hessian of the log of the integrand ( [

The gradient of the overall log-likelihood function has been calculated to speed up the maximization process. This allows us to use Stata's d1 method, which requires the implementation of the gradient vector in addition to the log-likelihood. The likelihood function for an individual i is:

L i = ∫ ℝ 2 Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) ∏ t = 2 T i Φ 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) ϕ ( η i , Σ η ) d η i 1 d η i 2 (23)

where

q i t 1 = 2 y i t 1 − 1 ∀ i , t

q i t 2 = 2 y i t 2 − 1 ∀ i , t

h i 0 = Z i 1 γ 1 + λ 11 η i 1 + λ 12 η i 2

w i 0 = Z i 2 γ 2 + λ 21 η i 1 + λ 22 η i 2

h ¯ i t = X i t 1 β 1 + δ 11 y i , t − 1 1 + δ 12 y i , t − 1 2 + η i 1

w ¯ i t = X i t 2 β 2 + δ 21 y i , t − 1 1 + δ 22 y i , t − 1 2 + η i 2

Using the adaptive Gauss-Hermite quadrature method by [

L i ≈ ∑ k = 1 , j = 1 Q w k * w j * Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) × ∏ t = 2 T i Φ 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) ϕ ( η i , Σ η ) | η i 1 = x k * , η i 2 = x j * (24)

To get the gradient vector, the log-likelihood above must be differentiated with respect to the following 13 parameters (or parameter blocks): β ¯ 1 = ( β 1 , δ 11 , δ 12 ) ′ , β ¯ 2 = ( β 2 , δ 21 , δ 22 ) ′ , γ 1 , γ 2 , λ 11 , λ 12 , λ 21 , λ 22 , σ 1 , σ 2 , ρ η , ρ ζ , and ρ ϵ .

Let l k j denote:

l k j = Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) ∏ t = 2 T i Φ 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) ϕ ( η i , Σ η ) | η i 1 = x k * , η i 2 = x j *

Then, the first order derivative with respect to each parameter α among the 13 is given by:

∂ log ( L i ) ∂ α = ∑ k = 1 , j = 1 Q ∂ l k j / ∂ α L i

With respect to β ¯ 1 the first order derivative is:

∂ l k j ∂ β ¯ 1 = l k j ∑ t = 2 T i q i t 1 ϕ ( q i t 1 h ¯ i t ) Φ 1 ( q i t 2 w ¯ i t − q i t 2 ρ ζ h ¯ i t 1 − ρ ζ 2 ) Φ 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ )

With respect to β ¯ 2 the first order derivative is:

∂ l k j ∂ β ¯ 2 = l k j ∑ t = 2 T i q i t 2 ϕ ( q i t 2 w ¯ i t ) Φ 1 ( q i t 1 h ¯ i t − q i t 1 ρ ζ w ¯ i t 1 − ρ ζ 2 ) Φ 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ )

With respect to γ 1 the first order derivative is:

∂ l k j ∂ γ 1 = l k j q i 0 1 ϕ ( q i 0 1 h i 0 ) Φ 1 ( q i 0 2 w i 0 − q i 0 2 ρ ϵ h i 0 1 − ρ ϵ 2 ) Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ )

With respect to γ 2 the first order derivative is:

∂ l k j ∂ γ 2 = l k j q i 0 2 ϕ ( q i 0 2 w i 0 ) Φ 1 ( q i 0 1 h i 0 − q i 0 1 ρ ϵ w i 0 1 − ρ ϵ 2 ) Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ )

With respect to λ 11 the first order derivative is:

∂ l k j ∂ λ 11 = l k j q i 0 1 x k * ϕ ( q i 0 1 h i 0 ) Φ 1 ( q i 0 2 w i 0 − q i 0 2 ρ ϵ h i 0 1 − ρ ϵ 2 ) Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ )

With respect to λ 12 the first order derivative is:

∂ l k j ∂ λ 12 = l k j q i 0 1 x j * ϕ ( q i 0 1 h i 0 ) Φ 1 ( q i 0 2 w i 0 − q i 0 2 ρ ϵ h i 0 1 − ρ ϵ 2 ) Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ )

With respect to λ 21 the first order derivative is:

∂ l k j ∂ λ 21 = l k j q i 0 2 x k * ϕ ( q i 0 2 w i 0 ) Φ 1 ( q i 0 1 h i 0 − q i 0 1 ρ ϵ w i 0 1 − ρ ϵ 2 ) Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ )

With respect to λ 22 the first order derivative is:

∂ l k j ∂ λ 22 = l k j q i 0 2 x j * ϕ ( q i 0 2 w i 0 ) Φ 1 ( q i 0 1 h i 0 − q i 0 1 ρ ϵ w i 0 1 − ρ ϵ 2 ) Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ )

With respect to σ 1 the first order derivative is:

∂ l k j ∂ log ( σ 1 ) = l k j ∗ ( − 1 + ( x k * / σ 1 ) 2 − ρ η x k * x j * / ( σ 1 σ 2 ) 1 − ρ η 2 )

With respect to σ 2 the first order derivative is:

∂ l k j ∂ log ( σ 2 ) = l k j ∗ ( − 1 + ( x j * / σ 2 ) 2 − ρ η x k * x j * / ( σ 1 σ 2 ) 1 − ρ η 2 )

With respect to ρ η the first order derivative is:

∂ l k j ∂ log ( 1 + ρ η 1 − ρ η ) 1 / 2 = l k j ∗ ( ρ η − ρ η ( ( x k * / σ 1 ) 2 + ( x j * / σ 2 ) 2 ) − ( 1 + ρ η 2 ) x k * x j * / ( σ 1 σ 2 ) 1 − ρ η 2 )

With respect to ρ ζ the first order derivative is:

∂ l k j ∂ log ( 1 + ρ ζ 1 − ρ ζ ) 1 / 2 = l k j ∑ t = 2 T i q i t 1 q i t 2 ϕ ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) Φ 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ )

With respect to ρ ϵ the first order derivative is:

∂ l k j ∂ log ( 1 + ρ ϵ 1 − ρ ϵ ) 1 / 2 = l k j q i 0 1 q i 0 2 ϕ ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ )

Remarks:

• For σ 1 , σ 2 , ρ η , ρ ζ , and ρ ϵ , we used transformations of the parameters to ensure that, during the maximization process, each σ remains positive and each ρ remains between −1 and 1 at every iteration. For σ we use an exponential transformation, so in the derivation we differentiate with respect to log ( σ ) . For ρ we use the hyperbolic tangent (Fisher) transformation, i.e. ρ = ( exp ( 2 a ) − 1 ) / ( exp ( 2 a ) + 1 ) , so in the derivation we differentiate with respect to a = log ( 1 + ρ 1 − ρ ) 1 / 2 .

• To easily differentiate a bivariate normal probability with zero means, unit variances and correlation ρ, we can write it as an integral whose integrand is the product of a univariate normal density and a univariate normal probability, as follows:

Φ 2 ( x , y , ρ ) = ∫ − ∞ y ϕ ( v ) Φ ( x − ρ v 1 − ρ 2 ) d v = ∫ − ∞ x ϕ ( u ) Φ ( y − ρ u 1 − ρ 2 ) d u .

• Given the transformation above, the first order derivatives of Φ 2 ( x , y , ρ ) with respect to x and y are respectively given by:

∂ Φ 2 ( x , y , ρ ) ∂ x = ϕ ( x ) Φ ( y − ρ x 1 − ρ 2 )

∂ Φ 2 ( x , y , ρ ) ∂ y = ϕ ( y ) Φ ( x − ρ y 1 − ρ 2 )
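The integral representation and the derivative formulas in the remarks above can be verified numerically. The following Python sketch (the values of x, y, ρ and the step h are illustrative assumptions) computes Φ2 from the integral transform and checks the analytic derivative against a central finite difference:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def Phi2(x, y, rho):
    # Phi2(x, y, rho) as the integral of a univariate normal density
    # times a univariate normal c.d.f. (second remark above).
    s = np.sqrt(1 - rho**2)
    val, _ = quad(lambda v: norm.pdf(v) * norm.cdf((x - rho * v) / s), -np.inf, y)
    return val

def dPhi2_dx(x, y, rho):
    # Analytic first derivative of Phi2 with respect to x (third remark).
    s = np.sqrt(1 - rho**2)
    return norm.pdf(x) * norm.cdf((y - rho * x) / s)

x, y, rho, h = 0.3, -0.5, 0.4, 1e-4
numeric = (Phi2(x + h, y, rho) - Phi2(x - h, y, rho)) / (2 * h)
analytic = dPhi2_dx(x, y, rho)
```

The same check, with x and y exchanged, applies to the derivative with respect to y.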

To meet the requirement of the adaptive Gauss-Hermite quadrature method, we need to derive the Hessian matrix of the log of the integrand with respect to the random effects vector^{3}. From the individual likelihood function defined in Equation (23), the log of the integrand is:

log ( g ( η i 1 , η i 2 ) ) = log ( Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) ∏ t = 2 T i Φ 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) ϕ ( η i , Σ η ) ) (25)

We derive from the log of the integrand in Equation (25) the Hessian matrix by calculating:

− ∂ 2 ∂ ( η i 1 ) 2 log ( g ( η i 1 , η i 2 ) )

− ∂ 2 ∂ ( η i 2 ) 2 log ( g ( η i 1 , η i 2 ) )

− ∂ 2 ∂ η i 1 ∂ η i 2 log ( g ( η i 1 , η i 2 ) )

The first order derivatives are given by:

− ∂ ∂ η i log ( g ) = − Φ ′ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) − ∑ t = 2 T i Φ ′ 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) Φ 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) + η i 1 / σ 1 2 − ρ η η i 2 / ( σ 1 σ 2 ) 1 − ρ η 2

With respect to η i 1 we have:

Φ ′ 2 η i 1 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) = q i 0 1 λ 11 ϕ ( q i 0 1 h i 0 ) Φ 1 ( q i 0 2 w i 0 − q i 0 2 ρ ϵ h i 0 1 − ρ ϵ 2 ) + q i 0 2 λ 21 ϕ ( q i 0 2 w i 0 ) Φ 1 ( q i 0 1 h i 0 − q i 0 1 ρ ϵ w i 0 1 − ρ ϵ 2 )

Φ ′ 2 η i 1 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) = q i t 1 ϕ ( q i t 1 h ¯ i t ) Φ 1 ( q i t 2 w ¯ i t − q i t 2 ρ ζ h ¯ i t 1 − ρ ζ 2 )

And with respect to η i 2 we have:

Φ ′ 2 η i 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) = q i 0 1 λ 12 ϕ ( q i 0 1 h i 0 ) Φ 1 ( q i 0 2 w i 0 − q i 0 2 ρ ϵ h i 0 1 − ρ ϵ 2 ) + q i 0 2 λ 22 ϕ ( q i 0 2 w i 0 ) Φ 1 ( q i 0 1 h i 0 − q i 0 1 ρ ϵ w i 0 1 − ρ ϵ 2 )

Φ ′ 2 η i 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) = q i t 2 ϕ ( q i t 2 w ¯ i t ) Φ 1 ( q i t 1 h ¯ i t − q i t 1 ρ ζ w ¯ i t 1 − ρ ζ 2 )

The second order derivatives are given by:

− ∂ 2 ∂ ( η i 1 ) 2 log ( g ) = − Φ ″ 2 η i 1 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) Φ 2 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) + Φ ′ 2 η i 1 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) Φ 2 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) − ∑ t = 2 T i ( Φ ″ 2 η i 1 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) Φ 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) Φ 2 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) − Φ ′ 2 η i 1 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) Φ 2 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) ) + 1 σ 1 2 ( 1 − ρ η 2 ) (26)

− ∂ 2 ∂ ( η i 2 ) 2 log ( g ) = − Φ ″ 2 η i 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) Φ 2 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) + Φ ′ 2 η i 2 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) Φ 2 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) − ∑ t = 2 T i ( Φ ″ 2 η i 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) Φ 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) Φ 2 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) − Φ ′ 2 η i 2 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) Φ 2 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) ) + 1 σ 2 2 ( 1 − ρ η 2 ) (27)

− ∂ 2 ∂ η i 1 ∂ η i 2 log ( g ) = − Φ ″ 2 η i 1 η i 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) Φ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) Φ 2 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) + Φ ′ 2 η 1 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) Φ ′ 2 η 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) Φ 2 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) − ∑ t = 2 T i ( Φ ″ 2 η i 1 η i 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) Φ 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) Φ 2 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ )

− Φ ′ 2 η 1 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) Φ ′ 2 η 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) Φ 2 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) ) − ρ η σ 1 σ 2 ( 1 − ρ η 2 ) (28)

where

Φ ″ 2 η i 1 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) = − h ¯ i t Φ ′ 2 η i 1 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) − ρ ζ ϕ η i 1 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ )

Φ ″ 2 η i 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) = − w ¯ i t Φ ′ 2 η i 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) − ρ ζ ϕ η i 1 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ )

Φ ″ 2 η i 1 η i 2 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ ) = q i t 1 q i t 2 ρ ζ ϕ η i 1 ( q i t 1 h ¯ i t , q i t 2 w ¯ i t , q i t 1 q i t 2 ρ ζ )

Φ ″ 2 η i 1 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) = ( 2 λ 11 λ 21 − ρ ϵ ( λ 11 2 + λ 21 2 ) ) ϕ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) − λ 11 2 h i 0 ϕ ( q i 0 1 h i 0 ) Φ 1 ( q i 0 2 w i 0 − ρ ϵ q i 0 2 h i 0 1 − ρ ϵ 2 ) − λ 21 2 w i 0 ϕ ( q i 0 2 w i 0 ) Φ 1 ( q i 0 1 h i 0 − ρ ϵ q i 0 1 w i 0 1 − ρ ϵ 2 )

Φ ″ 2 η i 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) = ( 2 λ 12 λ 22 − ρ ϵ ( λ 12 2 + λ 22 2 ) ) ϕ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) − λ 12 2 h i 0 ϕ ( q i 0 1 h i 0 ) Φ 1 ( q i 0 2 w i 0 − ρ ϵ q i 0 2 h i 0 1 − ρ ϵ 2 ) − λ 22 2 w i 0 ϕ ( q i 0 2 w i 0 ) Φ 1 ( q i 0 1 h i 0 − ρ ϵ q i 0 1 w i 0 1 − ρ ϵ 2 )

Φ ″ 2 η i 1 η i 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) = q i 0 1 q i 0 2 ( λ 11 λ 22 + λ 12 λ 21 − ρ ϵ ( λ 11 λ 12 + λ 21 λ 22 ) ) ϕ 2 ( q i 0 1 h i 0 , q i 0 2 w i 0 , q i 0 1 q i 0 2 ρ ϵ ) − λ 11 λ 12 h i 0 ϕ ( q i 0 1 h i 0 ) Φ 1 ( q i 0 2 w i 0 − ρ ϵ q i 0 2 h i 0 1 − ρ ϵ 2 ) − λ 21 λ 22 w i 0 ϕ ( q i 0 2 w i 0 ) Φ 1 ( q i 0 1 h i 0 − ρ ϵ q i 0 1 w i 0 1 − ρ ϵ 2 )

Then, the Hessian matrix is given by:

H = ( − ∂ 2 ∂ ( η i 1 ) 2 log ( g ) − ∂ 2 ∂ η i 1 ∂ η i 2 log ( g ) − ∂ 2 ∂ η i 1 δ η i 2 log ( g ) − ∂ 2 ∂ ( η i 2 ) 2 log ( g ) ) (29)

As described in Section 3.1, after having derived this Hessian matrix, we calculate its value at the mode of the integrand and use it to re-sample the integrand.

This section aims to ensure that the implemented method gives suitable results. We consider that the implemented method gives suitable results if, for a given relationship between variables, applying the estimation method to these variables recovers approximately the same coefficients. To reach this goal, we perform a robustness analysis of the estimation method. This robustness analysis is an empirical one, based on simulations. We use two different approaches.

The first approach is to simulate bivariate binary variables by specifying a relationship between some explanatory variables (that is, we set the coefficients of the explanatory variables) and to estimate this relationship with the implemented method in order to compare the results with the specified relationship. In the second approach, we introduce new variables (that were not used in the data generating process) when estimating the relationship and compare the new results with the first ones. The implemented method is robust if it correctly estimates the specified relationship even when other variables are introduced, and if it estimates non-significant coefficients for those added variables. Finally, the method we use to check robustness is the same as in [

As the estimation method implemented is a numerical approximation method, the results depend on the selected number of quadrature points. We deal with the influence of the number of quadrature points on the results in the last part of this section. For a better analysis of the results, we also report the standard error of each estimated coefficient.

In this section, we use variables from the French SIP (Santé et Itinéraire Professionnel) survey data set, and we simulate the error terms and a relationship between some selected variables. The subset of the database used for this section is an unbalanced panel of 1202 individuals with between 5 and 10 waves per individual.

We set the error term parameters to σ 1 = 2.1 , σ 2 = 3.1 , ρ η = 0.7 , ρ ζ = 0.5 and ρ ϵ = 0.4 .

We simulate the idiosyncratic error vectors ζ = ( ζ 1 , ζ 2 ) ′ and ϵ = ( ϵ 1 , ϵ 2 ) ′ as bivariate normal variables with zero mean, unit variances and correlations respectively equal to ρ ζ and ρ ϵ . We also simulate the individual random effects as bivariate normal variables with zero mean, correlation ρ η and variances equal to σ 1 2 for the first component and σ 2 2 for the second component of the random effects vector. This is done as follows:

ϵ 1 = rnormal ( 0 , 1 )

ϵ 2 = rnormal ( 0 , 1 ) ∗ √ ( 1 − ρ ϵ 2 ) + ρ ϵ ϵ 1

ζ 1 = rnormal ( 0 , 1 )

ζ 2 = rnormal ( 0 , 1 ) ∗ √ ( 1 − ρ ζ 2 ) + ρ ζ ζ 1

where rnormal ( μ , σ ) denotes a draw from the normal distribution with mean μ and standard deviation σ. As the individual effects are time invariant, we simulate η as follows:

η 1 = rnormal ( 0 , σ 1 ) if t = 1

η 2 = rnormal ( 0 , σ 2 ) ∗ √ ( 1 − ρ η 2 ) + ρ η ( σ 2 / σ 1 ) η 1 if t = 1

η 1 = η 1 [ t = 1 ] if t ≠ 1

η 2 = η 2 [ t = 1 ] if t ≠ 1
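As a sanity check, the one-step construction above can be sketched in Python (a hypothetical translation; numpy's random generator stands in for Stata's rnormal()):

```python
import numpy as np

rng = np.random.default_rng(42)

def correlated_pair(rho, sd1=1.0, sd2=1.0, size=1):
    """Draw (x1, x2) with standard deviations sd1, sd2 and correlation rho,
    using the same one-step construction as in the text."""
    x1 = rng.normal(0.0, sd1, size)
    x2 = rng.normal(0.0, sd2, size) * np.sqrt(1.0 - rho**2) + rho * (sd2 / sd1) * x1
    return x1, x2

# Idiosyncratic errors: unit variances, correlation rho_eps = 0.4
eps1, eps2 = correlated_pair(0.4, size=100_000)

# Individual random effects: sds sigma1 = 2.1, sigma2 = 3.1, correlation rho_eta = 0.7
eta1, eta2 = correlated_pair(0.7, sd1=2.1, sd2=3.1, size=100_000)

print(np.corrcoef(eps1, eps2)[0, 1])  # close to 0.4
print(np.corrcoef(eta1, eta2)[0, 1])  # close to 0.7
```

The empirical correlations and standard deviations of the draws match the target parameters, which is exactly what the construction is meant to guarantee.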

For the initial conditions ( t = 1 ), the simulated relationship is the following:

$$y_1^* = -0.2 + 0.3\,\mathit{ill} - 0.2\,\mathit{unemp} + 0.4\,\eta_1 - 0.5\,\eta_2 + \epsilon_1$$

$$y_2^* = 2 - 0.2\,\mathit{ill} - 0.08\,\mathit{age} + 0.3\,\eta_1 + 0.5\,\eta_2 + \epsilon_2$$

$$y_1 = I(y_1^* > 0), \qquad y_2 = I(y_2^* > 0)$$

For t > 1 , we specify the following relationship:

$$y_{1t}^* = 1.9 + 0.3\,y_{1,t-1} + 0.1\,y_{2,t-1} - 0.05\,\mathit{Male}_t - 0.2\,\mathit{unemp}_t + \eta_1 + \zeta_{1t}$$

$$y_{2t}^* = -0.4 - 0.1\,y_{1,t-1} + 0.4\,y_{2,t-1} + 0.05\,\mathit{Male}_t - 0.5\,\mathit{dens}_t + \eta_2 + \zeta_{2t}$$

$$y_{1t} = I(y_{1t}^* > 0), \qquad y_{2t} = I(y_{2t}^* > 0)$$
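Given simulated errors and covariates, the dynamic part of the data generating process can be sketched as follows (a simplified, hypothetical illustration: the covariates male, unemp and dens are drawn at random rather than taken from the SIP survey, and the initial conditions are reduced to a coin flip for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 1000, 6  # individuals, waves

# Hypothetical covariates standing in for Male, unemp and dens
male = rng.integers(0, 2, n)
unemp = rng.normal(0.1, 0.05, (n, T))
dens = rng.normal(0.3, 0.1, (n, T))

# Time-invariant random effects: sds 2.1 and 3.1, correlation 0.7
eta1 = rng.normal(0, 2.1, n)
eta2 = rng.normal(0, 3.1, n) * np.sqrt(1 - 0.7**2) + 0.7 * (3.1 / 2.1) * eta1

y1 = np.zeros((n, T), dtype=int)
y2 = np.zeros((n, T), dtype=int)
# Initial conditions replaced by a simple random draw in this sketch
y1[:, 0] = (rng.normal(size=n) > 0).astype(int)
y2[:, 0] = (rng.normal(size=n) > 0).astype(int)

for t in range(1, T):
    z1 = rng.normal(size=n)
    z2 = rng.normal(size=n) * np.sqrt(1 - 0.5**2) + 0.5 * z1  # rho_zeta = 0.5
    y1s = 1.9 + 0.3*y1[:, t-1] + 0.1*y2[:, t-1] - 0.05*male - 0.2*unemp[:, t] + eta1 + z1
    y2s = -0.4 - 0.1*y1[:, t-1] + 0.4*y2[:, t-1] + 0.05*male - 0.5*dens[:, t] + eta2 + z2
    y1[:, t] = (y1s > 0).astype(int)   # y_{1t} = I(y*_{1t} > 0)
    y2[:, t] = (y2s > 0).astype(int)   # y_{2t} = I(y*_{2t} > 0)
```

Each wave depends on the lagged outcomes of both equations, which is what makes the panel "dynamic" and the two equations mutually causal in the Granger sense.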

The variable ill denotes having an illness episode during the year, unemp denotes being out of the labour market during the year, age denotes the age of the individual, dens denotes the medical density, and Male equals 1 if the individual is male and 0 otherwise. Estimation results for 16 quadrature points are displayed in

In this section, we keep the same DGP as in Section 5.1 and add other variables to the estimated model in order to evaluate the robustness of the estimation method: all estimated coefficients of the variables in the DGP should remain the same, and the coefficients of the added variables should not be

| | Equation (1) | | Equation (2) | |
|---|---|---|---|---|
| | DGP (1) | Estimated coef. (2) | DGP (1′) | Estimated coef. (2′) |
| Dynamic Equation | | | | |
| y₁,t−1 | 0.3 | 0.2195*** (0.05) | −0.1 | −0.0051 (0.0567) |
| y₂,t−1 | 0.1 | 0.1267** (0.0513) | 0.4 | 0.4926*** (0.061) |
| Gender = Male | −0.05 | −0.0554 (0.0521) | 0.05 | 0.073 (0.0594) |
| Medical density | − | − | 0.5 | 0.5687 (1.1111) |
| Unemployment rate | −0.2 | −0.1682*** (0.0269) | − | − |
| Intercept | 1.9 | 2.3113*** (0.2667) | −0.4 | −0.4677 (2.122) |
| Initial Conditions | | | | |
| Illness before prof. life | 0.3 | 0.3032*** (0.0283) | −0.2 | −0.1624*** (0.0221) |
| Age | − | − | −0.08 | −0.093*** (0.0202) |
| Unemployment rate | −0.2 | −0.144** (0.057) | − | − |
| Intercept | −0.2 | −0.7331 (0.6194) | 2 | 2.6757*** (0.4591) |
| λ₁ | 0.4 | 0.2581*** (0.0651) | 0.3 | 0.2660*** (0.0463) |
| λ₂ | −0.5 | −0.5168*** (0.0753) | 0.5 | 0.7022*** (0.0598) |
| Covariance matrix structure | DGP (4) | Estimated coef. (5) | | |
| σ₁ | 2.1 | 2.4399*** (0.1034) | | |
| σ₂ | 3.1 | 2.7649*** (0.1365) | | |
| ρ_η | 0.7 | 0.7188*** (0.0212) | | |
| ρ_ζ | 0.5 | 0.5290*** (0.0419) | | |
| ρ_ε | 0.4 | 0.6972*** (0.1378) | | |

Estimated standard errors are given in parentheses. ***: significant at the 1% level; **: significant at the 5% level.

significant. We introduce two variables, rural and nationality (not French), into the dynamic equations of the regression.

Results are in ^{4}. As we can see in the

| | Equation (1) | | | Equation (2) | | |
|---|---|---|---|---|---|---|
| | DGP (1) | Coef. (2) | Coef. (3) | DGP (1′) | Coef. (2′) | Coef. (3′) |
| Dynamic Equation | | | | | | |
| y₁,t−1 | 0.3 | 0.2195*** (0.05) | 0.2184*** (0.05) | −0.1 | −0.0051 (0.0567) | −0.0052 (0.0568) |
| y₂,t−1 | 0.1 | 0.1267** (0.0513) | 0.1283** (0.0513) | 0.4 | 0.4926*** (0.061) | 0.4944*** (0.0612) |
| Gender = Male | −0.05 | −0.0554 (0.0521) | −0.0571 (0.0521) | 0.05 | 0.073 (0.0594) | 0.0751 (0.0596) |
| Medical density | − | − | − | 0.5 | 0.5687 (1.1111) | 0.5567 (1.1112) |
| Unemployment rate | −0.2 | −0.1682*** (0.0269) | −0.1698*** (0.0269) | − | − | − |
| Not French | − | − | 0.1246 (0.0956) | − | − | 0.0015 (0.1076) |
| rural | − | − | 0.0743 (0.0628) | − | − | 0.0283 (0.0719) |
| Intercept | 1.9 | 2.3113*** (0.2667) | 2.2994*** (0.2667) | −0.4 | −0.4677 (2.122) | −0.4527 (2.1215) |
| Initial Conditions | | | | | | |
| Illness before prof. life | 0.3 | 0.3032*** (0.0283) | 0.3032*** (0.0283) | −0.2 | −0.1624*** (0.0221) | −0.1627*** (0.0221) |
| Age | − | − | − | −0.08 | −0.093*** (0.0202) | −0.0932*** (0.0202) |
| Unemployment rate | −0.2 | −0.144** (0.057) | −0.144** (0.057) | − | − | − |
| Intercept | −0.2 | −0.7331 (0.6194) | −0.7335 (0.6195) | 2 | 2.6757*** (0.4591) | 2.6803*** (0.4595) |
| λ₁ | 0.4 | 0.2581*** (0.0651) | 0.2582*** (0.0653) | 0.3 | 0.266*** (0.0463) | 0.267*** (0.0464) |
| λ₂ | −0.5 | −0.5168*** (0.0753) | −0.5171*** (0.0754) | 0.5 | 0.7022*** (0.0598) | 0.703*** (0.0599) |
| Covariance matrix structure | DGP (4) | Estimated coef. (5) | Estimated coef. (6) | | | |
| σ₁ | 2.1 | 2.4399*** (0.1034) | 2.4353*** (0.1032) | | | |
| σ₂ | 3.1 | 2.7649*** (0.1365) | 2.763*** (0.1366) | | | |
| ρ_η | 0.7 | 0.7188*** (0.0212) | 0.7187*** (0.0212) | | | |
| ρ_ζ | 0.5 | 0.529*** (0.0419) | 0.5301*** (0.0419) | | | |
| ρ_ε | 0.4 | 0.6972*** (0.1379) | 0.697*** (0.1378) | | | |

Estimated standard errors are given in parentheses. ***: significant at the 1% level; **: significant at the 5% level.

As the accuracy of the method depends on the number of quadrature points used in the likelihood calculation, we assess how the results change as this number increases. To do so, we fit the same model with different numbers of quadrature points and compute the relative differences in the log-likelihood and in the estimated parameters.

We fit these models using the same simulated relationship between variables as in Section 5.1.
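The convergence check can be illustrated on a simple one-dimensional Gauss-Hermite example (not the model's bivariate likelihood; the toy integrand is chosen so that the exact value, E[e^X] = e^{1/2} for X ~ N(0,1), is known):

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gh_expectation(g, q):
    """Approximate E[g(X)] for X ~ N(0, 1) with q Gauss-Hermite points."""
    nodes, weights = hermgauss(q)
    return np.sum(weights * g(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi)

# Relative differences between successive numbers of quadrature points,
# the same stopping criterion as used in the text
previous = None
for q in (10, 16, 22, 24):
    value = gh_expectation(np.exp, q)
    if previous is not None:
        print(q, value, abs(value - previous) / abs(previous))
    previous = value
```

On this smooth one-dimensional integrand the relative differences are already negligible at 10 points; the bivariate likelihood stabilizes far more slowly, which is why the comparison across 10, 16, 22 and 24 points below is informative.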

The results are displayed in the

As we can see from

This paper describes the bivariate dynamic probit model with endogenous initial conditions: we justify the econometric specification of the model, present the estimation method and its requirements, and conclude with a robustness analysis. We calculate the derivatives of the log-likelihood function (the gradient) with respect to the 13 parameters of the model. This is the main contribution of our research, as many programs use a numerical approximation of the gradient vector instead of encoding the mathematically derived expression of

| | DGP | Q = 10 | Q = 16 | Q = 22 | Q = 24 |
|---|---|---|---|---|---|
| Log likelihood | | −8212.05 | −8211.26 | −830.71 | −8301.27 |
| y₁: Dynamic equation | | | | | |
| y₁,t−1 | 0.3 | 0.2754*** (0.0489) | 0.2195*** (0.05) | 0.2206*** (0.052) | 0.2131*** (0.0527) |
| y₂,t−1 | 0.1 | 0.1376*** (0.0483) | 0.1267** (0.0513) | 0.1196** (0.0554) | 0.1010* (0.0568) |
| Gender = Male | −0.05 | −0.0580 (0.0479) | −0.0554 (0.0521) | −0.0732 (0.058) | −0.0599 (0.0604) |
| Unemployment rate | −0.2 | −0.1509*** (0.0262) | −0.1682*** (0.0269) | −0.1792*** (0.0273) | −0.1810*** (0.0275) |
| Intercept | 1.9 | 2.3270*** (0.2598) | 2.3113*** (0.2667) | 2.3089*** (0.2726) | 2.30*** (0.2753) |
| y₂: Dynamic equation | | | | | |
| y₁,t−1 | −0.1 | 0.0224 (0.0541) | −0.0051 (0.0567) | −0.0136 (0.0594) | −0.0191 (0.0605) |
| y₂,t−1 | 0.4 | 0.5851*** (0.0596) | 0.4926*** (0.0610) | 0.4846*** (0.0642) | 0.4752*** (0.0650) |
| Gender = Male | 0.05 | 0.0570 (0.0542) | 0.0730 (0.0594) | 0.0817 (0.0650) | 0.0725 (0.0673) |
| Medical density | 0.5 | 1.3305 (1.0685) | 0.5687 (1.1111) | 0.4874 (1.1357) | 0.3549 (1.1473) |
| Intercept | −0.4 | −1.7595 (2.040) | −0.4677 (2.1220) | −0.4064 (2.1704) | −0.1492 (2.1936) |

Estimated standard errors are given in parentheses. ***: significant at the 1% level; **: significant at the 5% level.

| | DGP | Q = 10 | Q = 16 | Q = 22 | Q = 24 |
|---|---|---|---|---|---|
| y₁: Initial conditions | | | | | |
| Illness before prof. life | 0.3 | 0.3005*** (0.0278) | 0.3032*** (0.0283) | 0.3022*** (0.0282) | 0.3026*** (0.0284) |
| Unemployment rate | −0.2 | −0.1592*** (0.0573) | −0.1440** (0.0570) | −0.1437** (0.0571) | −0.1431** (0.0572) |
| Intercept | −0.2 | −0.6120 (0.6197) | −0.7331 (0.6194) | −0.7065 (0.6187) | −0.7153 (0.6188) |
| λ₁₁ | 0.4 | 0.2608*** (0.0644) | 0.2581*** (0.0651) | 0.2584*** (0.0658) | 0.2628*** (0.0664) |
| λ₁₂ | −0.5 | −0.5076*** (0.0723) | −0.5168*** (0.0753) | −0.5051*** (0.0744) | −0.5019*** (0.0741) |
| y₂: Initial conditions | | | | | |
| Age | −0.08 | −0.0859*** (0.0196) | −0.0930*** (0.0202) | −0.0929*** (0.0205) | −0.0943*** (0.0207) |
| Illness before prof. life | −0.2 | −0.1593*** (0.0221) | −0.1624*** (0.0221) | −0.1648*** (0.0225) | −0.1650*** (0.0226) |
| Intercept | 2 | 2.7329*** (0.4483) | 2.6757*** (0.4591) | 2.5788*** (0.4644) | 2.5904*** (0.4676) |
| λ₂₁ | 0.3 | 0.2689*** (0.0467) | 0.2660*** (0.0463) | 0.2691*** (0.0474) | 0.2679*** (0.0475) |
| λ₂₂ | 0.5 | 0.7136*** (0.0607) | 0.7022*** (0.0598) | 0.7008*** (0.0625) | 0.6932*** (0.0626) |
| Covariance matrix structure | | | | | |
| σ₁ | 2.1 | 2.5202*** (0.1053) | 2.4399*** (0.1034) | 2.3920*** (0.1047) | 2.3898*** (0.1051) |
| σ₂ | 3.1 | 2.7012*** (0.1307) | 2.7649*** (0.1365) | 2.7928*** (0.1444) | 2.8281*** (0.1468) |
| ρ_η | 0.7 | 0.7380*** (0.0206) | 0.7188*** (0.0212) | 0.7143*** (0.0219) | 0.7162*** (0.0219) |
| ρ_ζ | 0.5 | 0.5451*** (0.0411) | 0.5290*** (0.0419) | 0.5225*** (0.0423) | 0.5145*** (0.0424) |
| ρ_ε | 0.4 | 0.6550*** (0.1394) | 0.6972*** (0.1378) | 0.6996*** (0.1381) | 0.6944*** (0.1371) |

Estimated standard errors are given in parentheses. ***: significant at the 1% level; **: significant at the 5% level; *: significant at the 10% level.

| Quad. points | 10 | 16 | 22 | 24 |
|---|---|---|---|---|
| Comp. time (in min.) | 83 | 190 | 450 | 480 |

the gradient. Furthermore, for the adaptive Gauss-Hermite quadrature, we also calculate the Hessian matrix of the integrand with respect to the individual random-effects vector.
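To see why the Hessian at the mode is needed, here is a hypothetical one-dimensional sketch of adaptive Gauss-Hermite quadrature: the nodes are shifted to the mode of the integrand and scaled by the curvature there. The toy integrand f(η) = N(η; 0, 1)·exp(−(η−1)²) stands in for the model's integrand; its exact integral is e^{−1/3}/√3:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def f(eta):
    """Toy integrand: standard normal density times a 'likelihood' kernel."""
    return np.exp(-0.5 * eta**2) / np.sqrt(2 * np.pi) * np.exp(-(eta - 1.0)**2)

# Newton steps to the mode of the integrand, using the analytic
# gradient (-3*eta + 2) and Hessian (-3) of log f (up to a constant)
eta_hat = 0.0
for _ in range(20):
    grad, hess = -3.0 * eta_hat + 2.0, -3.0
    eta_hat -= grad / hess

tau = 1.0 / np.sqrt(-hess)          # scale from the Hessian at the mode
nodes, weights = hermgauss(5)
shifted = eta_hat + np.sqrt(2.0) * tau * nodes
approx = np.sqrt(2.0) * tau * np.sum(weights * np.exp(nodes**2) * f(shifted))

exact = np.exp(-1.0 / 3.0) / np.sqrt(3.0)
print(approx, exact)  # the two values agree
```

Because the quadrature adapts to where the integrand actually concentrates, very few points suffice here; without the mode and Hessian, far more points would be needed for the same accuracy.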

The implementation has been done in Stata. We wrote two ado-files for this purpose, using Stata's d1 method for the maximization process. For this method, we implement the gradient vector for the 13 parameters, and we also implement the Hessian matrix with respect to the random-effects vector in order to use the adaptive Gauss-Hermite quadrature. We also wrote two other ado-files for the estimation, on panel data, of the bivariate probit and of the bivariate dynamic probit without initial conditions. These ado-files use the same approach (Stata's d1 method with adaptive Gauss-Hermite quadrature) and are available upon request.

Because the integration is two-dimensional, estimation time is high and keeps increasing with the number of quadrature points, the number of observations, and the number of explanatory variables. For an estimated model, one should ensure, before using the results, that they do not change significantly when the number of quadrature points is increased. If the relative differences in the results are around 0.1% or less, we can conclude that the results remain stable as the number of quadrature points increases, and there is no need to raise it further: doing so would increase computing time without significantly improving the results. One avenue for a major improvement of the program is a multi-core (parallel) computing scheme, in which the contributions to the likelihood (Equation (23)) are computed at each quadrature point separately and simultaneously on several cores, saving time since the contributions are computed at the same time.
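A minimal sketch of such a scheme (hypothetical: a generic per-node contribution evaluated with a thread pool; the actual integrand from Equation (23) would replace contribution_at_node, and real speed-ups require work that releases Python's GIL or a compiled implementation):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from numpy.polynomial.hermite import hermgauss

def contribution_at_node(args):
    """Placeholder for one quadrature node's contribution to the
    individual likelihood (the real integrand comes from Equation (23))."""
    node, weight = args
    return weight * np.exp(-0.5 * node**2)  # hypothetical integrand

def likelihood_parallel(q, workers=4):
    """Sum the per-node contributions, evaluating them on several workers."""
    nodes, weights = hermgauss(q)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(contribution_at_node, zip(nodes, weights)))
    return sum(parts)

def likelihood_serial(q):
    nodes, weights = hermgauss(q)
    return sum(contribution_at_node(nw) for nw in zip(nodes, weights))

# Parallel and serial evaluation produce the same sum
print(likelihood_parallel(16), likelihood_serial(16))
```

Since the contributions at different quadrature points are independent, the decomposition is embarrassingly parallel and the summed result is unchanged.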

Finally, our method gives reasonable computing durations on real data sets. In [

We acknowledge the Centre Maurice Halbwachs (Réseau Quetelet) for access to the SIP 2007 data set (Santé et itinéraire professionnel-2007. DARES producteur. Centre Maurice Halbwachs diffuseur).

Moussa, R. and Delattre, E. (2018) On the Estimation of Causality in a Bivariate Dynamic Probit Model on Panel Data with Stata Software: A Technical Review. Theoretical Economics Letters, 8, 1257-1278. https://doi.org/10.4236/tel.2018.86083