This paper provides an estimation approach for multi-equation systems in panel data. Multi-equation systems are at the heart of economic modeling. Researchers who want to establish causal links between two outcomes often need to account for simultaneity between them to overcome endogeneity issues (for instance, when considering supply and demand equations). Difficulties arise when linear and non-linear outcomes must be considered at the same time, which is why Roodman  implemented the Stata module cmp for multidimensional models. In this paper, we further develop this technique to allow researchers to implement a simultaneous equations model in a panel data setting. Implemented in Stata, our method, xtcmp, is a Full Information Maximum Likelihood (FIML) estimator. This paper explains the associated theory (derivation of the log-likelihood function, the associated gradient, and the Hessian matrices of the log-integrand function) and offers an application of xtcmp, with comparisons against cmp.

1. Introduction

In empirical economics, a common approach is to consider a linear data-generating process. However, non-linear outcomes are often present and important in research questions, typically because of the structure of the database: interviewers transcribe yes-no answers into binary outcomes. When setting up a project, researchers therefore often have to handle continuous and categorical variables at the same time within a simultaneous equations framework in a dynamic setting, in which each dependent variable is endogenous in one (or more) equation(s) of the model. The advantage of a simultaneous equations model is that it accounts for the correlation between the error terms of the equations. More specifically, in a dynamic setup, such models allow researchers to consider different individual effects across equations (these effects are part of the error terms, each error term being decomposed into an individual effect fixed across time and an effect which depends on time). This matters because these terms are unobserved, specific to each outcome, and might imply endogeneity issues. For instance, in health economics, when investigating the causal relationship between health and income, causality can run from income to health and from health to income, such that both are endogenous to each other . In this way, a dynamic simultaneous equations model allows researchers to account for unobserved individual effects such as physical maturity (driven by genetics, for health) or intellectual abilities (for income).

The framework of multi-equation models has been widely used in the literature to address several issues, including the case of an endogenous binary outcome. Greene  reformulated the estimation of the impact of an endogenous treatment on a continuous outcome as a multi-equations model. This reformulation has been extended to the analysis of endogenous binary outcomes by . Several papers have since analyzed the effects of an endogenous treatment on diverse types of outcomes, including continuous and count outcomes   . Also, some generalizations for the case of noncompliance and nonresponse have been introduced by . However, all these methods focus on cross-sectional data and do not account for panel data.

There is almost no automated estimation method in Stata to estimate the parameters of such multi-equation models. An exception is the cmp command, the first general Stata tool for this class of models, written as a Seemingly Unrelated Regressions (SUR) estimator . However, this command does not explicitly consider the panel dimension of the data, which is increasingly limiting given the proliferation of databases with a temporal dimension. Moreover, simple relationships among variables at a point in time do not adequately capture the dynamic interaction of changing individuals in changing environments. Thus, there is a need for a simultaneous equations command for panel databases.

We therefore offer an extension of the cmp framework for either two equations (one linear and one binary outcome) or three equations (either two linear dependent variables and one binary, or one linear and two binary dependent variables), while explicitly considering the panel dimension of the data. Our command xtcmp is a Full Information Maximum Likelihood (FIML) estimator, taking into account the time dimension of the data as well as linear and non-linear outcomes (which is not feasible with three-stage least squares, because the latter only handles linear dependent variables).

The likelihood function is then a multidimensional integral, which we approximate with the adaptive Gauss-Hermite quadrature method (as proposed by Liu and Pierce ). To improve accuracy and reduce computing time, we derive the gradient of the log-likelihood and the Hessian of the corresponding integrand. The estimation method of xtcmp has been implemented using the d1 method of Stata (see  for further details).

Section 2 derives the likelihood for a FIML estimator in a general setting, as well as for three specific cases. Section 3 discusses the estimation requirements and details the Hessian matrices and the gradient vector with respect to the parameters. In Section 4, we give examples of the use of xtcmp, with comparisons against Roodman’s  cmp results. Section 5 concludes the paper.

2. Likelihood for a FIML Estimator

Let Y denote a d-dimensional vector of endogenous variables in a simultaneous equations model. Roodman  specifically discusses conditions for consistency and identification of such models. Let Y k denote the kth component of Y, such that the value of Y k for individual i at period t is given by y i t k . We assume that the first d 1 components of Y, where d 1 < d , are binary outcomes, and the other d 2 = d − d 1 are continuous. Let ϵ denote the vector of associated error terms. By assuming a panel random effects model, ϵ can be decomposed into two terms such that ϵ = μ + ν , where μ is time-invariant. In this way, the error term for individual i at period t in the kth equation is given by ϵ i t k = μ i k + ν i t k . The full model can be written as follows:

Y ˜ = X β + ϵ (1)

where Y ˜ contains the related latent variables for the first d 1 equations and the original continuous variables for the others. The explanatory matrix X is given by X = d i a g ( Z k ) where Z k , with k = 1 , ⋯ , d , corresponds to the explanatory variables for the kth equation. Similarly, the parameter vector is β = ( β 1 , ⋯ , β d ) ′ where β k is the parameter vector of the kth equation. We suppose that the classical hypotheses of independence between 1) the error components, and 2) the error components and the explanatory variables, are satisfied. Furthermore, let us assume that the error components are independent and identically distributed with zero means and covariance matrices Σ μ and Σ ν , the latter being defined as follows:

Σ ν = ( Σ 1 Σ ′ 3 Σ 3 Σ 2 ) (2)

in which Σ 1 is a d 1 dimension matrix with ones on the diagonal (which corresponds to the covariance matrix structure for simultaneous equations with only binary outcomes), Σ 2 is a d 2 dimension matrix, and Σ 3 is a d 2 × d 1 dimension matrix. Thus, the overall individual likelihood is given by:

L i = ∫ ℝ d { ∏ t = 1 T i f ν ( ν i t 1 , ⋯ , ν i t d | μ i 1 , ⋯ , μ i d ) } f μ ( μ i 1 , ⋯ , μ i d ) d μ i 1 ⋯ d μ i d (3)

where f μ ( μ i 1 , ⋯ , μ i d ) = 1 ( 2 π ) d / 2 ( det ( Σ μ ) ) exp ( − 1 2 μ ′ Σ μ − 1 μ ) . The first d 1 equations being related to binary outcomes, if we define q i t k = 2 ∗ y i t k − 1 , then the density function f ν ( ν i t 1 , ⋯ , ν i t d | μ i 1 , ⋯ , μ i d ) , denoted l i t for simplicity, is given by:

l i t = f ν ( ν i t 1 , ⋯ , ν i t d | μ i 1 , ⋯ , μ i d ) = ∫ − ∞ q i t 1 z i t 1 ⋯ ∫ − ∞ q i t d 1 z i t d 1 ϕ d ( ν i t 1 , ⋯ , ν i t d ) d ν i t 1 ⋯ d ν i t d 1 (4)

where z i t k = Z i t k β k + μ i k , with k = 1 , ⋯ , d and ν i t k = y ˜ i t k − z i t k , with k = d 1 + 1 , ⋯ , d . The idiosyncratic error is ν = ( ν 1 , ⋯ , ν d 1 , ν d 1 + 1 , ⋯ , ν d ) ∼ N ( 0, Σ ν ) , such that ( ν d 1 + 1 , ⋯ , ν d ) ∼ N ( 0, Σ 2 ) and ( ν 1 , ⋯ , ν d 1 ) | ( ν d 1 + 1 , ⋯ , ν d ) ∼ N ( m ( ν 1 , ⋯ , ν d 1 ) | ( ν d 1 + 1 , ⋯ , ν d ) , Σ ( ν 1 , ⋯ , ν d 1 ) | ( ν d 1 + 1 , ⋯ , ν d ) ) with:

m d 1 | d 2 = m ( ν 1 , ⋯ , ν d 1 ) | ( ν d 1 + 1 , ⋯ , ν d ) = Σ ′ 3 ( Σ 2 ) − 1 ( ν d 1 + 1 , ⋯ , ν d ) ′ (5)

and

Σ d 1 | d 2 = Σ ( ν 1 , ⋯ , ν d 1 ) | ( ν d 1 + 1 , ⋯ , ν d ) = Σ 1 − Σ ′ 3 ( Σ 2 ) − 1 Σ 3 (6)

Thus, we have:

l i t = ϕ d 2 ( ν i t d 1 + 1 , ⋯ , ν i t d ) Φ d 1 ( ( q i t 1 z i t 1 , ⋯ , q i t d 1 z i t d 1 ) , m d 1 | d 2 , Σ d 1 | d 2 ) (7)

in which Φ d 1 ( ( q i t 1 z i t 1 , ⋯ , q i t d 1 z i t d 1 ) , m d 1 | d 2 , Σ d 1 | d 2 ) denotes the cumulative distribution function of a multivariate normal function with mean m d 1 | d 2 and a covariance matrix Σ d 1 | d 2 .
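The conditional decomposition behind Equation (7) can be checked numerically. The Python sketch below is purely illustrative (the xtcmp implementation itself is in Stata/Mata, and the block values Σ 1, Σ 2, Σ 3 are hypothetical): it builds Σ ν from its blocks and verifies that the joint normal density factors into the marginal density of the continuous components times the conditional density with moments given by Equations (5) and (6).

```python
import numpy as np

def mvn_pdf(x, mean, cov):
    """Multivariate normal density, written out with numpy only."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dev = x - np.atleast_1d(np.asarray(mean, dtype=float))
    d = len(dev)
    quad = dev @ np.linalg.solve(cov, dev)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

# Hypothetical 3-dimensional example: d1 = 1 binary equation, d2 = 2 continuous.
Sigma1 = np.array([[1.0]])
Sigma2 = np.array([[2.25, 1.2], [1.2, 4.0]])
Sigma3 = np.array([[0.45], [0.8]])                    # d2 x d1 block
Sigma_nu = np.block([[Sigma1, Sigma3.T], [Sigma3, Sigma2]])

# Conditional moments from Equations (5) and (6):
nu_cont = np.array([0.3, -0.5])                       # continuous-equation errors
m_cond = Sigma3.T @ np.linalg.solve(Sigma2, nu_cont)
S_cond = Sigma1 - Sigma3.T @ np.linalg.solve(Sigma2, Sigma3)

# The joint density must factor as phi_{d2}(continuous) * phi_{d1|d2}(binary part):
nu1 = 0.7
joint = mvn_pdf(np.concatenate(([nu1], nu_cont)), np.zeros(3), Sigma_nu)
factored = mvn_pdf([nu1], m_cond, S_cond) * mvn_pdf(nu_cont, np.zeros(2), Sigma2)
```

The factorization is exact, so `joint` and `factored` agree up to floating-point error; this is the identity that lets Equation (4) collapse into the product in Equation (7).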

We now turn to specific cases. The first, a simultaneous equations model with two outcomes, one binary and the other continuous, is treated in Section 2.1. Then, we focus on cases with three outcomes, composed of either one binary and two continuous variables (developed in Section 2.2), or two binary and one continuous variables (analyzed in Section 2.3).

2.1. Case with Two Outcomes: One Binary and One Continuous

Let us consider the two following equations:

y ˜ i t 1 = Z i t 1 β 1 + ϵ i t 1 (8)

y ˜ i t 2 = Z i t 2 β 2 + ϵ i t 2 (9)

in which y i t 1 is a binary variable equal to 1 if y ˜ i t 1 > 0 , and y ˜ i t 2 = y i t 2 is a linear outcome. The associated variance/covariance matrices of error components are:

Σ ν = ( 1 ρ 1 σ ρ 1 σ σ 2 )

Σ μ = ( σ 1 2 ρ 1 , 2 σ 1 σ 2 ρ 1 , 2 σ 1 σ 2 σ 2 2 )

By identification, we have Σ 1 = 1 , Σ 2 = σ 2 , and Σ 3 = ρ 1 σ . Thus, m ν 1 | ν 2 = ρ 1 σ ν 2 and Σ ν 1 | ν 2 = 1 − ρ 1 2 . In this way, the likelihood has the following form:

L i = ∫ ℝ 2 { ∏ t = 1 T i f ν ( ν i t 1 , ν i t 2 | μ ) } f μ ( μ i 1 , μ i 2 ) d μ i 1 d μ i 2 = ∫ ℝ 2 { ∏ t l i t } f μ ( μ i 1 , μ i 2 ) d μ i 1 d μ i 2

in which l i t is the individual likelihood:

l i t = ϕ 1 ( ν i t 2 ,0, σ 2 ) Φ 1 ( q i t 1 ( z i t 1 + ρ 1 σ ν i t 2 ) 1 − ρ 1 2 ) (10)
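A quick sanity check on Equation (10): since Φ 1 ( b ) + Φ 1 ( − b ) = 1, summing l i t over both values of the binary outcome must return the marginal density of ν i t 2. The minimal Python sketch below (illustrative only, with hypothetical parameter and residual values) encodes Equation (10) and performs that check.

```python
import math

def norm_pdf(x, s=1.0):
    """Density of N(0, s^2) at x."""
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2 * math.pi))

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def l_it(y1, z1, nu2, sigma, rho1):
    """Equation (10): period contribution for one binary and one continuous equation."""
    q = 2 * y1 - 1
    b = q * (z1 + rho1 * nu2 / sigma) / math.sqrt(1 - rho1 ** 2)
    return norm_pdf(nu2, sigma) * norm_cdf(b)

# Hypothetical values for the linear index, residual, and parameters:
z1, nu2, sigma, rho1 = 0.4, -0.3, 1.5, 0.6
total = l_it(1, z1, nu2, sigma, rho1) + l_it(0, z1, nu2, sigma, rho1)
# Summing over both binary outcomes recovers the marginal density of nu2.
```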

2.2. Case with Three Outcomes: One Binary and Two Continuous

Let us consider the two previous equations (Equation (8) and Equation (9)), and a new one:

y ˜ i t 3 = Z i t 3 β 3 + ϵ i t 3 (11)

also corresponding to a linear equation with y ˜ i t 3 = y i t 3 . The associated variance/covariance matrices of error components are:

Σ ν = ( 1 ρ 1 σ a ρ 2 σ b ρ 1 σ a σ a 2 ρ 3 σ a σ b ρ 2 σ b ρ 3 σ a σ b σ b 2 )

Σ μ = ( σ 1 2 ρ 1 , 2 σ 1 σ 2 ρ 1 , 3 σ 1 σ 3 ρ 1 , 2 σ 1 σ 2 σ 2 2 ρ 2 , 3 σ 2 σ 3 ρ 1 , 3 σ 1 σ 3 ρ 2 , 3 σ 2 σ 3 σ 3 2 )

By identification, we have Σ 1 = 1 , Σ 2 = ( σ a 2 ρ 3 σ a σ b ρ 3 σ a σ b σ b 2 ) , and Σ 3 = ( ρ 1 σ a , ρ 2 σ b ) ′ . Thus, m ν 1 | ( ν 2 , ν 3 ) = ( ρ 1 − ρ 2 ρ 3 ) ν 2 σ a + ( ρ 2 − ρ 1 ρ 3 ) ν 3 σ b 1 − ρ 3 2 and Σ ν 1 | ( ν 2 , ν 3 ) = 1 − ρ 1 2 + ρ 2 2 − 2 ρ 1 ρ 2 ρ 3 1 − ρ 3 2 . As a result, we can write the following likelihood:
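These closed forms can be verified against the general block formulas of Equations (5) and (6): Σ 3 ′ Σ 2 − 1 ( ν 2 , ν 3 ) ′ works out to the stated conditional mean, and Σ 1 − Σ 3 ′ Σ 2 − 1 Σ 3 to 1 − ( ρ 1 2 + ρ 2 2 − 2 ρ 1 ρ 2 ρ 3 ) / ( 1 − ρ 3 2 ). A small numpy check with hypothetical parameter values:

```python
import numpy as np

rho1, rho2, rho3 = 0.3, 0.2, 0.4     # hypothetical correlations
sa, sb = 1.5, 2.0                    # hypothetical sigma_a, sigma_b
nu2, nu3 = 0.7, -0.4                 # hypothetical continuous-equation errors

Sigma2 = np.array([[sa**2, rho3*sa*sb], [rho3*sa*sb, sb**2]])
Sigma3 = np.array([[rho1*sa], [rho2*sb]])

# General block formulas (Equations (5) and (6)):
m_alg = (Sigma3.T @ np.linalg.solve(Sigma2, np.array([nu2, nu3]))).item()
S_alg = (1.0 - Sigma3.T @ np.linalg.solve(Sigma2, Sigma3)).item()

# Closed forms stated in the text:
m_closed = ((rho1 - rho2*rho3) * nu2/sa + (rho2 - rho1*rho3) * nu3/sb) / (1 - rho3**2)
S_closed = 1 - (rho1**2 + rho2**2 - 2*rho1*rho2*rho3) / (1 - rho3**2)
```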

L i = ∫ ℝ 3 { ∏ t = 1 T i f ν ( ν i t 1 , ν i t 2 , ν i t 3 | μ ) } f μ ( μ i 1 , μ i 2 , μ i 3 ) d μ i 1 d μ i 2 d μ i 3 = ∫ ℝ 3 { ∏ t l i t } f μ ( μ i 1 , μ i 2 , μ i 3 ) d μ i 1 d μ i 2 d μ i 3

in which l i t is the individual likelihood, defined as:

l i t = ϕ 2 ( ( ν i t 2 , ν i t 3 ) ,0, Σ 2 ) Φ 1 ( q i t 1 ( z i t 1 + m ν 1 | ( ν 2 , ν 3 ) ) Σ ν 1 | ( ν 2 , ν 3 ) ) (12)

2.3. Case with Three Outcomes: Two Binary and One Continuous

In order to derive the likelihood function for a case composed of two binary and one continuous outcomes, let us consider Equation (8) for the first binary outcome and Equation (11) for the linear variable. To consider another binary outcome, let us redefine the following:

y ˜ i t 2 = Z i t 2 β 2 + ϵ i t 2 (13)

where y i t 2 is a binary variable, equal to 1 if y ˜ i t 2 > 0 . The associated variance/covariance matrices of error components are:

Σ ν = ( 1 ρ 1 ρ 2 σ ρ 1 1 ρ 3 σ ρ 2 σ ρ 3 σ σ 2 )

Σ μ = ( σ 1 2 ρ 1 , 2 σ 1 σ 2 ρ 1 , 3 σ 1 σ 3 ρ 1 , 2 σ 1 σ 2 σ 2 2 ρ 2 , 3 σ 2 σ 3 ρ 1 , 3 σ 1 σ 3 ρ 2 , 3 σ 2 σ 3 σ 3 2 )

By identification, we have Σ 1 = ( 1 ρ 1 ρ 1 1 ) , Σ 2 = σ 2 , and Σ 3 = ( ρ 2 σ , ρ 3 σ ) ′ . Thus, m ( ν 1 , ν 2 ) | ν 3 = ( ρ 2 σ ν 3 , ρ 3 σ ν 3 ) ′ and Σ ( ν 1 , ν 2 ) | ν 3 = ( 1 − ρ 2 2 ρ 1 − ρ 2 ρ 3 ρ 1 − ρ 2 ρ 3 1 − ρ 3 2 ) . The likelihood has the following form:

L i = ∫ ℝ 3 { ∏ t = 1 T i f ν ( ν i t 1 , ν i t 2 , ν i t 3 | μ ) } f μ ( μ i 1 , μ i 2 , μ i 3 ) d μ i 1 d μ i 2 d μ i 3 = ∫ ℝ 3 { ∏ t l i t } f μ ( μ i 1 , μ i 2 , μ i 3 ) d μ i 1 d μ i 2 d μ i 3

in which l i t is the individual likelihood:

l i t = ϕ 1 ( ν i t 3 ,0, σ 2 ) Φ 2 ( q i t 1 ( z i t 1 + ρ 2 σ ν i t 3 ) 1 − ρ 2 2 , q i t 2 ( z i t 2 + ρ 3 σ ν i t 3 ) 1 − ρ 3 2 ; q i t 1 q i t 2 ρ ) (14)

with ρ = ρ 1 − ρ 2 ρ 3 ( 1 − ρ 2 2 ) ( 1 − ρ 3 2 ) .
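The conditional correlation ρ can likewise be recovered from the block formula of Equation (6): conditioning ( ν 1 , ν 2 ) on ν 3 gives the covariance matrix Σ 1 − Σ 3 ′ Σ 2 − 1 Σ 3 stated above, whose correlation is exactly ρ. A numpy check with hypothetical parameter values:

```python
import numpy as np

rho1, rho2, rho3, sigma = 0.5, 0.3, 0.2, 1.8   # hypothetical values

# Covariance of (nu1, nu2) given nu3: Sigma_1 - Sigma_3' Sigma_2^{-1} Sigma_3,
# with Sigma_2 = sigma^2 and Sigma_3 = (rho2*sigma, rho3*sigma)'.
Sigma1 = np.array([[1.0, rho1], [rho1, 1.0]])
c = np.array([rho2 * sigma, rho3 * sigma])     # cov((nu1, nu2), nu3)
S_cond = Sigma1 - np.outer(c, c) / sigma**2

rho_alg = S_cond[0, 1] / np.sqrt(S_cond[0, 0] * S_cond[1, 1])
rho_closed = (rho1 - rho2 * rho3) / np.sqrt((1 - rho2**2) * (1 - rho3**2))
```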

3. Estimation Requirement

The likelihood function being a d-dimensional integral, we use the Gauss-Hermite quadrature method (see Moussa and Delattre ). Implementing this method requires 1) computing the mode μ ^ of the log-integrand log ( f ) = log ( { ∏ t = 1 T i   l i t } f μ ( μ i 1 , ⋯ , μ i d ) ) with respect to μ = ( μ i 1 , ⋯ , μ i d ) and deriving the Hessian matrix H at μ ^ with respect to μ ; and 2) deriving the gradient of the overall likelihood function with respect to the parameters.

Let Q denote the selected number of quadrature points, x denote the Q-dimensional vector of quadrature nodes, and w denote the Q-dimensional vector of quadrature weights. By applying the adaptive Gauss-Hermite quadrature , the likelihood function in Equation (3) can be rewritten as:

L i = ∑ k 1 = 1 Q ⋯ ∑ k d = 1 Q w k 1 * ⋯ w k d * { ∏ t = 1 T i l i t } f μ ( μ i 1 , ⋯ , μ i d ) | μ = x * (15)

in which x * = μ ^ + 2 H − 1 / 2 x and w * = ( w 1 * ⋯ w Q * ) ′ = 2 d / 2 det ( H − 1 / 2 ) ⋅ diag ( w ′ ⋅ exp ( x 2 ) ) .
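In the one-dimensional case, Equation (15) reduces to the familiar adaptive rule x * = μ ^ + 2 / H x and w * = 2 / H ⋅ w ⋅ exp ( x 2 ). The Python sketch below (illustrative only; xtcmp implements this in Stata/Mata) integrates a Gaussian integrand, for which the rule is exact once the mode and curvature are supplied:

```python
import numpy as np

def adaptive_ghq(f, mu_hat, H, Q=8):
    """One-dimensional adaptive Gauss-Hermite quadrature of f over the real line.

    mu_hat is the mode of log f and H = -d^2 log f / d mu^2 at the mode,
    mirroring Equation (15) with d = 1 (a sketch, not the production code).
    """
    x, w = np.polynomial.hermite.hermgauss(Q)        # nodes/weights for e^{-x^2}
    x_star = mu_hat + np.sqrt(2.0 / H) * x           # recentred, rescaled nodes
    w_star = np.sqrt(2.0 / H) * w * np.exp(x**2)     # adapted weights
    return np.sum(w_star * f(x_star))

# A Gaussian integrand with known mode and curvature integrates to exactly 1.
m, s = 1.5, 0.7
f = lambda u: np.exp(-0.5 * ((u - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
approx = adaptive_ghq(f, mu_hat=m, H=1.0 / s**2)
```

Centering the nodes at the mode and scaling by the curvature is what makes the adaptive variant accurate with few points Q, which is the motivation for deriving the Hessian at μ ^ in Section 3.1.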

The derivation of the Hessian matrix is explained in Section 3.1 while the gradient of the overall likelihood function is derived in Section 3.2.

3.1. Hessian Matrix at μ ^

Based on the expressions of l i t for each case described in Section 2 (Equation (10), Equation (12), and Equation (14)), we first write the associated log-integrand log ( f ) for each of the three cases. Then, we compute the Hessian matrices, for which we need to derive ∂ 2 ∂ η i k ∂ η i j log ( f ) , in which k , j = 1 , ⋯ , d .

Focusing on the first case with two equations, we have the following log-integrand:

log ( f ) = log ( f μ ( μ i 1 , μ i 2 ) ) + ∑ t = 1 T i log ( ϕ 1 ( ν i t 2 , 0 , σ 2 ) ) + ∑ t = 1 T i log ( Φ 1 ( q i t 1 ( z i t 1 + ρ 1 σ ν i t 2 ) 1 − ρ 1 2 ) )

with the notation b i t = q i t 1 ( z i t 1 + ρ 1 σ ν i t 2 ) 1 − ρ 1 2 , we find:

∂ 2 ∂ ( η i 1 ) 2 log ( f ) = − 1 σ 1 2 ( 1 − ρ 1 , 2 2 ) + 1 1 − ρ 1 2 ∑ t = 1 T i b i t ϕ 1 ( b i t ) Φ 1 ( b i t ) − ( ϕ 1 ( b i t ) ) 2 ( Φ 1 ( b i t ) ) 2

∂ 2 ∂ ( η i 2 ) 2 log ( f ) = − 1 σ 2 2 ( 1 − ρ 1 , 2 2 ) − T i σ 2 + ρ 1 2 σ 2 ( 1 − ρ 1 2 ) ∑ t = 1 T i b i t ϕ 1 ( b i t ) Φ 1 ( b i t ) − ( ϕ 1 ( b i t ) ) 2 ( Φ 1 ( b i t ) ) 2

∂ 2 ∂ η i 1 ∂ η i 2 log ( f ) = ρ 1 , 2 σ 1 σ 2 ( 1 − ρ 1 , 2 2 ) + ρ 1 σ ( 1 − ρ 1 2 ) ∑ t = 1 T i     q i t 1 − b i t ϕ 1 ( b i t ) Φ 1 ( b i t ) + ( ϕ 1 ( b i t ) ) 2 ( Φ 1 ( b i t ) ) 2

Thus, the Hessian matrix is given by:

H = ( − ∂ 2 ∂ ( η i 1 ) 2 log ( f ) − ∂ 2 ∂ η i 1 ∂ η i 2 log ( f ) − ∂ 2 ∂ η i 1 ∂ η i 2 log ( f ) − ∂ 2 ∂ ( η i 2 ) 2 log ( f ) ) (16)
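Every entry of these Hessians combines the ratio ϕ ( b ) / Φ ( b ) with its derivative, the standard identity d / d b [ ϕ ( b ) / Φ ( b ) ] = − b ϕ ( b ) / Φ ( b ) − ( ϕ ( b ) / Φ ( b ) ) 2. A finite-difference check of that identity in Python (hypothetical evaluation point; illustration only):

```python
import math

def pdf(b):
    return math.exp(-0.5 * b * b) / math.sqrt(2 * math.pi)

def cdf(b):
    return 0.5 * (1.0 + math.erf(b / math.sqrt(2)))

def mills(b):
    """Ratio phi(b)/Phi(b) appearing in every Hessian entry above."""
    return pdf(b) / cdf(b)

b, h = 0.8, 1e-6
numeric = (mills(b + h) - mills(b - h)) / (2 * h)     # central difference
analytic = -b * mills(b) - mills(b) ** 2              # d/db [phi/Phi]
```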

Then, focusing on the case of three equations with one binary outcome, the log-integrand is given by:

log ( f ) = log ( f μ ( μ i 1 , μ i 2 , μ i 3 ) ) + ∑ t = 1 T i log ( ϕ 2 ( ( ν i t 2 , ν i t 3 ) , 0 , Σ 2 ) )     + ∑ t = 1 T i log ( Φ 1 ( q i t 1 ( z i t 1 + m ν 1 | ( ν 2 , ν 3 ) ) Σ ν 1 | ( ν 2 , ν 3 ) ) )

The associated derivatives, assuming b i t = q i t 1 ( z i t 1 + m ν 1 | ( ν 2 , ν 3 ) ) Σ ν 1 | ( ν 2 , ν 3 ) , a 1 = 1 − ρ 1 , 3 2 , a 2 = 1 − ρ 2 , 3 2 , and r a = ρ 1 , 2 − ρ 1 , 3 ρ 2 , 3 a 1 a 2 , are given by:

∂ 2 ∂ ( η i 1 ) 2 log ( f ) = − 1 ( σ 1 a 1 1 − r a 2 ) 2   − 1 ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) ∑ t = 1 T i b i t ϕ 1 ( b i t ) Φ 1 ( b i t ) + ( ϕ 1 ( b i t ) ) 2 ( Φ 1 ( b i t ) ) 2

∂ 2 ∂ ( η i 2 ) 2 log ( f ) = − 1 ( σ 2 a 2 1 − r a 2 ) 2 − T i ( 1 − ρ 3 2 ) σ a 2   − ( ρ 1 − ρ 2 ρ 3 ) 2 ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) ( σ a ( 1 − ρ 3 2 ) ) 2 ∑ t = 1 T i b i t ϕ 1 ( b i t ) Φ 1 ( b i t ) + ( ϕ 1 ( b i t ) ) 2 ( Φ 1 ( b i t ) ) 2

∂ 2 ∂ ( η i 3 ) 2 log ( f ) = − 1 − ρ 1 , 2 2 ( σ 3 a 1 a 2 1 − r a 2 ) 2 − T i ( 1 − ρ 3 2 ) σ b 2   − ( ρ 2 − ρ 1 ρ 3 ) 2 ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) ( σ b ( 1 − ρ 3 2 ) ) 2 ∑ t = 1 T i b i t ϕ 1 ( b i t ) Φ 1 ( b i t ) + ( ϕ 1 ( b i t ) ) 2 ( Φ 1 ( b i t ) ) 2

∂ 2 ∂ η i 1 ∂ η i 2 log ( f ) = ρ 1 , 2 − ρ 1 , 3 ρ 2 , 3 σ 1 σ 2 ( a 1 a 2 1 − r a 2 ) 2   − ρ 1 − ρ 2 ρ 3 σ a ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) ( 1 − ρ 3 2 ) ∑ t = 1 T i     q i t 1 b i t ϕ 1 ( b i t ) Φ 1 ( b i t ) + ( ϕ 1 ( b i t ) ) 2 ( Φ 1 ( b i t ) ) 2

∂ 2 ∂ η i 1 ∂ η i 3 log ( f ) = ρ 1 , 3 − ρ 1 , 2 ρ 2 , 3 σ 1 σ 3 ( a 1 a 2 1 − r a 2 ) 2   − ρ 2 − ρ 1 ρ 3 σ b ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) ( 1 − ρ 3 2 ) ∑ t = 1 T i     q i t 1 b i t ϕ 1 ( b i t ) Φ 1 ( b i t ) + ( ϕ 1 ( b i t ) ) 2 ( Φ 1 ( b i t ) ) 2

∂ 2 ∂ η i 2 ∂ η i 3 log ( f ) = ρ 2 , 3 − ρ 1 , 2 ρ 1 , 3 σ 2 σ 3 ( a 1 a 2 1 − r a 2 ) 2 + ρ 3 T i ( 1 − ρ 3 2 ) σ a σ b   − ( ρ 2 − ρ 3 ρ 3 ) ( ρ 1 − ρ 2 ρ 3 ) σ a σ b ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) ( 1 − ρ 3 2 ) 2 ∑ t = 1 T i b i t ϕ 1 ( b i t ) Φ 1 ( b i t ) + ( ϕ 1 ( b i t ) ) 2 ( Φ 1 ( b i t ) ) 2

Thus, the Hessian matrix is given by:

H = ( − ∂ 2 ∂ ( η i 1 ) 2 log ( f ) − ∂ 2 ∂ η i 1 ∂ η i 2 log ( f ) − ∂ 2 ∂ η i 1 ∂ η i 3 log ( f ) − ∂ 2 ∂ η i 1 ∂ η i 2 log ( f ) − ∂ 2 ∂ ( η i 2 ) 2 log ( f ) − ∂ 2 ∂ η i 2 ∂ η i 3 log ( f ) − ∂ 2 ∂ η i 1 ∂ η i 3 log ( f ) − ∂ 2 ∂ η i 2 ∂ η i 3 log ( f ) − ∂ 2 ∂ ( η i 3 ) 2 log ( f ) ) (17)

Finally, for the three equations with two binary outcomes case, the log-integrand can be written as:

log ( f ) = log ( f μ ( μ i 1 , μ i 2 , μ i 3 ) ) + ∑ t = 1 T i log ( ϕ 1 ( ν i t 3 , 0 , σ 2 ) )     + ∑ t = 1 T i log ( Φ 2 ( q i t 1 ( z i t 1 + ρ 2 σ ν i t 3 ) 1 − ρ 2 2 , q i t 2 ( z i t 2 + ρ 3 σ ν i t 3 ) 1 − ρ 3 2 ; q i t 1 q i t 2 ρ ) )

Then, considering a 1 , a 2 and r a previously defined and the following notations:

b i t 1 = q i t 1 ( z i t 1 + ρ 2 σ ν i t 3 ) 1 − ρ 2 2

b i t 2 = q i t 2 ( z i t 2 + ρ 3 σ ν i t 3 ) 1 − ρ 3 2

r n = 1 − ρ 2 2

r m = 1 − ρ 3 2

ρ = ρ 1 − ρ 2 ρ 3 r n r m

p i t 1 = q i t 1 ϕ ( b i t 1 ) Φ ( b i t 2 − q i t 1 q i t 2 ρ b i t 1 1 − ρ 2 )

p i t 2 = q i t 2 ϕ ( b i t 2 ) Φ ( b i t 1 − q i t 1 q i t 2 ρ b i t 2 1 − ρ 2 )

p i t 3 = ρ 3 σ 1 − ρ 2 ϕ ( b i t 2 ) Φ ( b i t 1 − q i t 1 q i t 2 ρ b i t 2 1 − ρ 2 ) + ρ 2 σ ϕ ( b i t 1 ) Φ ( b i t 2 − q i t 1 q i t 2 ρ b i t 1 1 − ρ 2 )

We find the following:

∂ 2 ∂ ( η i 1 ) 2 log ( f ) = − a 2 2 ( σ 1 a 1 a 2 1 − r a 2 ) 2 + ∑ t = 1 T i q i t 1 r n ( − b i t 1 p i t 1 − q i t 2 ρ ϕ ( b i t 1 ) ϕ ( b i t 2 − q i t 1 q i t 2 ρ b i t 1 1 − ρ 2 ) ) Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) − ( p i t 1 ) 2 ( Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) ) 2

∂ 2 ∂ ( η i 2 ) 2 log ( f ) = − a 1 2 ( σ 2 a 1 a 2 1 − r a 2 ) 2   + ∑ t = 1 T i q i t 2 r m ( − b i t 2 p i t 2 − q i t 1 ρ ϕ ( b i t 2 ) ϕ ( b i t 1 − q i t 1 q i t 2 ρ b i t 2 1 − ρ 2 ) ) Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) − ( p i t 2 ) 2 ( Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) ) 2

∂ 2 ∂ ( η i 3 ) 2 log ( f ) = − 1 − ρ 1 2 ( σ 3 a 1 a 2 1 − r a 2 ) 2 − T i σ 2 − ∑ t = 1 T i ( p i t 3 ) 2 ( Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) ) 2   + ∑ t = 1 T i ρ 3 σ a 2 1 − ρ 2 ( ( ρ 2 σ a 1 − q i t 1 q i t 2 ρ ρ 3 σ a 2 ) ϕ ( b i t 2 ) ϕ ( b i t 2 − q i t 1 q i t 2 ρ b i t 1 1 − ρ 2 ) − ρ 3 σ b i t 2 p i t 2 ) Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) ( Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) ) 2   + ∑ t = 1 T i ρ 2 σ a 1 1 − ρ 2 ( ( ρ 3 σ a 2 − q i t 1 q i t 2 ρ ρ 2 σ a 1 ) ϕ ( b i t 1 ) ϕ ( b i t 1 − q i t 1 q i t 2 ρ b i t 2 1 − ρ 2 ) − ρ 2 σ b i t 1 p i t 1 ) Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) ( Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) ) 2

∂ 2 ∂ η i 1 ∂ η i 2 log ( f ) = ρ 1 − ρ 2 ρ 3 σ 1 σ 2 ( a 1 a 2 1 − r a 2 ) 2   + ∑ t = 1 T i q i t 1 q i t 2 r n r m 1 − ρ 2 ( ϕ ( b i t 1 ) ϕ ( b i t 2 − q i t 1 q i t 2 ρ b i t 1 1 − ρ 2 ) ) Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) − p i t 1 p i t 2 ( Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) ) 2

∂ 2 ∂ η i 1 ∂ η i 3 log ( f ) = ρ 2 − ρ 1 ρ 3 σ 1 σ 3 ( a 1 a 2 1 − r a 2 ) 2   + ∑ t = 1 T i q i t 1 r n ( ρ 3 σ r m − q i t 1 q i t 2 ρ ρ 2 σ r n 1 − ρ 2 ϕ ( b i t 1 ) ϕ ( b i t 2 − q i t 1 q i t 2 ρ b i t 1 1 − ρ 2 ) − ρ 2 σ b i t 1 p i t 1 ) Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) − p i t 1 p i t 3 ( Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) ) 2

∂ 2 ∂ η i 2 ∂ η i 3 log ( f ) = ρ 3 − ρ 1 ρ 2 σ 2 σ 3 ( a 1 a 2 1 − r a 2 ) 2   + ∑ t = 1 T i q i t 2 r m ( ρ 2 σ r n − q i t 1 q i t 2 ρ ρ 3 σ r m 1 − ρ 2 ϕ ( b i t 2 ) ϕ ( b i t 1 − q i t 1 q i t 2 ρ b i t 2 1 − ρ 2 ) − ρ 3 σ b i t 2 p i t 2 ) Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) − p i t 2 p i t 3 ( Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) ) 2

Thus, the Hessian matrix has the same form as the previous one (Equation (17)). In other words, it is given by:

H = ( − ∂ 2 ∂ ( η i 1 ) 2 log ( f ) − ∂ 2 ∂ η i 1 ∂ η i 2 log ( f ) − ∂ 2 ∂ η i 1 ∂ η i 3 log ( f ) − ∂ 2 ∂ η i 1 ∂ η i 2 log ( f ) − ∂ 2 ∂ ( η i 2 ) 2 log ( f ) − ∂ 2 ∂ η i 2 ∂ η i 3 log ( f ) − ∂ 2 ∂ η i 1 ∂ η i 3 log ( f ) − ∂ 2 ∂ η i 2 ∂ η i 3 log ( f ) − ∂ 2 ∂ ( η i 3 ) 2 log ( f ) ) (18)

3.2. Gradient Vector with Respect to the Parameters

Based on the likelihood function given by Equation (15), parameters to estimate are β k , with k = 1 , ⋯ , d , and the associated covariance matrices Σ μ and Σ ν . Thus, the gradient has to be calculated with respect to these parameters. The first order derivative of the log-likelihood function with respect to a parameter α , in the set of parameters, is given by:

∂ log ( L i ) ∂ α = ∑ k 1 = 1 Q ⋯ ∑ k d = 1 Q ∂ f / ∂ α L i (19)

Focusing on the three cases, we apply this formula to compute derivatives with respect to each parameter.

First, considering the two-outcome case, we need to consider the seven following parameters: β 1 , β 2 , σ , ρ 1 , σ 1 , σ 2 , and ρ 1 , 2 . As in subsection 3.1, we consider the previously defined b i t , which is specific to the case with two outcomes, such that we have:

∂ f ∂ β 1 = f ∗ ∑ t = 1 T i q i t 1 ϕ 1 ( b i t ) 1 − ρ 1 2 Φ 1 ( b i t )

∂ f ∂ β 2 = f ∗ ∑ t = 1 T i ( ν i t 2 σ 2 − ρ 1 q i t 1 ϕ 1 ( b i t ) σ 1 − ρ 1 2 Φ 1 ( b i t ) )

∂ f ∂ log ( σ ) = f ∗ ∑ t = 1 T i ( − 1 + ( ν i t 2 σ ) 2 − ρ 1 q i t ν i t 2 ϕ 1 ( b i t ) σ 1 − ρ 1 2 Φ 1 ( b i t ) )

∂ f ∂ log ( 1 + ρ 1 1 − ρ 1 ) 1 / 2 = f ∗ ∑ t = 1 T i q i t 1 ( ρ 1 z i t 1 + ν i t 2 σ ) ϕ 1 ( b i t ) 1 − ρ 1 2 Φ 1 ( b i t )

∂ f ∂ log ( σ 1 ) = f ∗ ( − 1 + ( μ i 1 σ 1 ) 2 − ρ 1 , 2 μ i 1 μ i 2 σ 1 σ 2 1 − ρ 1 , 2 2 )

∂ f ∂ log ( σ 2 ) = f ∗ ( − 1 + ( μ i 2 σ 2 ) 2 − ρ 1 , 2 μ i 1 μ i 2 σ 1 σ 2 1 − ρ 1 , 2 2 )

∂ f ∂ log ( 1 + ρ 1 , 2 1 − ρ 1 , 2 ) 1 / 2 = f ∗ ( ρ 1 , 2 − ρ 1 , 2 ( μ i 1 σ 1 ) 2 + ρ 1 , 2 ( μ i 2 σ 2 ) 2 − ( 1 + ρ 1 , 2 2 ) μ i 1 μ i 2 σ 1 σ 2 1 − ρ 1 , 2 2 )
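The β 1 entry above can be checked by differentiating log l i t from Equation (10) numerically with respect to the linear index z i t 1 : the derivative is q i t 1 ϕ ( b i t ) / ( 1 − ρ 1 2 Φ ( b i t ) ). A Python finite-difference sketch with hypothetical values (illustration only; Stata's d1 evaluator then chains this through Z i t 1 ):

```python
import math

def pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

sigma, rho1 = 1.4, 0.5          # hypothetical parameter values
q, nu2 = 1, -0.6                # y_it^1 = 1 and a hypothetical residual

def log_lit(z1):
    """log of Equation (10) as a function of the linear index z_it^1."""
    b = q * (z1 + rho1 * nu2 / sigma) / math.sqrt(1 - rho1**2)
    return math.log(pdf(nu2 / sigma) / sigma) + math.log(cdf(b))

z1, h = 0.3, 1e-6
b = q * (z1 + rho1 * nu2 / sigma) / math.sqrt(1 - rho1**2)
numeric = (log_lit(z1 + h) - log_lit(z1 - h)) / (2 * h)   # central difference
analytic = q * pdf(b) / (math.sqrt(1 - rho1**2) * cdf(b)) # stated gradient term
```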

Now, focusing on the case of three equations with one binary outcome, we consider the notations associated with this case in subsection 3.1 ( b i t , a 1 , a 2 , and r a ). We compute derivatives with respect to β 1 , β 2 , β 3 , σ a , σ b , ρ 1 , ρ 2 , ρ 3 , σ 1 , σ 2 , σ 3 , ρ 1,2 , ρ 1,3 and ρ 2,3 , such that:

∂ f ∂ β 1 = f ∗ ∑ t = 1 T i q i t 1 ϕ 1 ( b i t ) ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) Φ 1 ( b i t )

∂ f ∂ β 2 = f ∗ ∑ t = 1 T i ( ν i t 2 σ a 2 − ρ 3 ν i t 3 σ a σ b 1 − ρ 3 2 − q i t 1 ( ρ 1 − ρ 2 ρ 3 ) ϕ 1 ( b i t ) σ a ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) Φ 1 ( b i t ) )

∂ f ∂ β 3 = f ∗ ∑ t = 1 T i ( ν i t 3 σ b 2 − ρ 3 ν i t 2 σ a σ b 1 − ρ 3 2 − q i t 1 ( ρ 2 − ρ 1 ρ 3 ) ϕ 1 ( b i t ) σ b ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) Φ 1 ( b i t ) )

∂ f ∂ log ( σ a ) = f ∗ ∑ t = 1 T i ( − 1 + ( ν i t 2 σ a ) 2 − ρ 3 ν i t 2 ν i t 3 σ a σ b 1 − ρ 3 2 − q i t 1 ( ρ 1 − ρ 2 ρ 3 ) ν i t 2 ϕ 1 ( b i t ) σ a ( 1 − ρ 3 2 ) ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) Φ 1 ( b i t ) )

∂ f ∂ log ( σ b ) = f ∗ ∑ t = 1 T i ( − 1 + ( ν i t 3 σ b ) 2 − ρ 3 ν i t 2 ν i t 3 σ a σ b 1 − ρ 3 2 − q i t 1 ( ρ 2 − ρ 1 ρ 3 ) ν i t 3 ϕ 1 ( b i t ) σ b ( 1 − ρ 3 2 ) ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) Φ 1 ( b i t ) )

∂ f ∂ log ( 1 + ρ 1 1 − ρ 1 ) 1 / 2 = f ∗ ∑ t = 1 T i ( ( 1 − ρ 1 2 ) ( − q i t 1 ( ρ 3 ν i t 3 σ b − ν i t 2 σ a ) 1 − ρ 3 2 + b i t ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) ρ 2 ( 1 − ρ 2 ) ( ρ 1 − ρ 2 ρ 3 ) ) ϕ 1 ( b i t ) ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) Φ 1 ( b i t ) )

∂ f ∂ log ( 1 + ρ 2 1 − ρ 2 ) 1 / 2 = f ∗ ∑ t = 1 T i ( ( 1 − ρ 2 2 1 − ρ 3 2 q i t 1 ( − ρ 3 ν i t 2 σ a + ν i t 3 σ b ) + b i t ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) ( ρ 2 + ( ρ 1 ρ 2 − ρ 3 ) ρ 2 ( 1 − ρ 2 ) ( ρ 1 − ρ 2 ρ 3 ) ) ) ϕ 1 ( b i t ) ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) Φ 1 ( b i t ) )

∂ f ∂ log ( 1 + ρ 3 1 − ρ 3 ) 1 / 2 = f ∗ ∑ t = 1 T i ( ρ 3 ( 1 − ( ν i t 2 σ a ) 2 + ( ν i t 3 σ b ) 2 1 − ρ 3 2 ) + ( 1 + ρ 3 2 ) ν i t 2 ν i t 3 σ a σ b 1 − ρ 3 2 ) + f ∗ ∑ t = 1 T i q i t 1 ( − ρ 2 ν i t 2 σ a − ρ 1 ν i t 3 σ b + 2 ρ 3 ( ρ 1 − ρ 2 ρ 3 ) ν i t 2 σ a + ( ρ 2 − ρ 1 ρ 3 ) ν i t 3 σ b 1 − ρ 3 2 + b i t ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) ( ρ 1 ρ 3 − ρ 2 ) ρ 2 ( 1 − ρ 2 ) ( ρ 1 − ρ 2 ρ 3 ) ) ϕ 1 ( b i t ) ( 1 − ρ 2 ) ( 1 − ρ 2 2 ) Φ 1 ( b i t )

∂ f ∂ log ( σ 1 ) = f ∗ ( − 1 + ( 1 − ρ 2 , 3 2 ) ( μ 1 i σ 1 ) 2 − ( ρ 1 , 2 − ρ 1 , 3 ρ 2 , 3 ) μ 1 i μ 2 i σ 1 σ 2 − ( ρ 1 , 3 − ρ 1 , 2 ρ 2 , 3 ) μ 1 i μ 3 i σ 1 σ 3 ( a 1 a 2 1 − r a 2 ) 2 )

∂ f ∂ log ( σ 2 ) = f ∗ ( − 1 + ( 1 − ρ 1 , 3 2 ) ( μ 2 i σ 2 ) 2 − ( ρ 1 , 2 − ρ 1 , 3 ρ 2 , 3 ) μ 1 i μ 2 i σ 1 σ 2 − ( ρ 2 , 3 − ρ 1 , 2 ρ 1 , 3 ) μ 2 i μ 3 i σ 2 σ 3 ( a 1 a 2 1 − r a 2 ) 2 )

∂ f ∂ log ( σ 3 ) = f ∗ ( − 1 + ( 1 − ρ 1 , 2 2 ) ( μ 3 i σ 3 ) 2 − ( ρ 1 , 3 − ρ 1 , 2 ρ 2 , 3 ) μ 1 i μ 3 i σ 1 σ 3 − ( ρ 2 , 3 − ρ 1 , 2 ρ 1 , 3 ) μ 2 i μ 3 i σ 2 σ 3 ( a 1 a 2 1 − r a 2 ) 2 )

∂ f ∂ log ( 1 + ρ 1 , 2 1 − ρ 1 , 2 ) 1 / 2 = f ∗ ( 1 − ρ 1 , 2 2 ( a 1 a 2 1 − r a 2 ) 2 ( ρ 1 , 2 ( μ 3 i σ 3 ) 2 + μ 1 i μ 2 i σ 1 σ 2 − ρ 2 , 3 μ 1 i μ 3 i σ 1 σ 3 − ρ 1 , 3 μ 2 i μ 3 i σ 2 σ 3 ) )     + f ( 1 − ρ 1 , 2 2 ) ( ρ 1 , 2 − ρ 1 , 3 ρ 2 , 3 ) ( a 1 a 2 1 − r a 2 ) 2 ( 1 − ( 1 − ρ 2 , 3 2 ) ( μ 1 i σ 1 ) 2 + ( 1 − ρ 1 , 3 2 ) ( μ 2 i σ 2 ) 2 ( a 1 a 2 1 − r a 2 ) 2 )     + f ( 1 − ρ 1 , 2 2 ) ( ρ 1 , 2 − ρ 1 , 3 ρ 2 , 3 ) ( a 1 a 2 1 − r a 2 ) 2 ( − ( 1 − ρ 1 , 2 2 ) ( μ 3 i σ 3 ) 2 + 2 ( ρ 1 , 2 − ρ 1 , 3 ρ 2 , 3 ) μ 1 i μ 2 i σ 1 σ 2 ( a 1 a 2 1 − r a 2 ) 2 )

+ f ( 1 − ρ 1 , 2 2 ) ( ρ 1 , 2 − ρ 1 , 3 ρ 2 , 3 ) ( a 1 a 2 1 − r a 2 ) 2 ( 2 ( ρ 1 , 3 − ρ 1 , 2 ρ 2 , 3 ) μ 1 i μ 3 i σ 1 σ 3 + ( ρ 2 , 3 − ρ 1 , 2 ρ 1 , 3 ) μ 2 i μ 3 i σ 2 σ 3 ( a 1 a 2 1 − r a 2 ) 2 )

∂ f ∂ log ( 1 + ρ 1 , 3 1 − ρ 1 , 3 ) 1 / 2 = f ∗ ( ( 1 − ρ 1 , 3 2 ) ( ρ 1 , 3 − ρ 1 , 2 ρ 2 , 3 ) + ( ρ 1 , 3 ( μ 2 i σ 2 ) 2 − ρ 2 , 3 μ 1 i μ 2 i σ 1 σ 2 + μ 1 i μ 3 i σ 1 σ 3 − ρ 1 , 2 μ 2 i μ 3 i σ 2 σ 3 ) a 1 2 ( a 1 a 2 1 − r a 2 ) 2 )     − f ( ρ 1 , 3 ( a 1 a 2 1 − r a 2 ) 2 + ( ρ 1 , 2 − ρ 1 , 3 ρ 2 , 3 ) ( ρ 1 , 2 ρ 1 , 3 − ρ 2 , 3 ) )     ∗ ( 1 − ρ 2 , 3 2 ) ( μ 1 i σ 1 ) 2 + ( 1 − ρ 1 , 3 2 ) ( μ 2 i σ 2 ) 2 + ( 1 − ρ 1 , 2 2 ) ( μ 3 i σ 3 ) 2 ( a 1 a 2 1 − r a 2 ) 2     + 2 f ( ρ 1 , 3 ( a 1 a 2 1 − r a 2 ) 2 + ( ρ 1 , 2 − ρ 1 , 3 ρ 2 , 3 ) ( ρ 1 , 2 ρ 1 , 3 − ρ 2 , 3 ) )     ∗ ( ρ 1 , 2 − ρ 1 , 3 ρ 2 , 3 ) μ 1 i μ 2 i σ 1 σ 2 + ( ρ 1 , 3 − ρ 1 , 2 ρ 2 , 3 ) μ 1 i μ 3 i σ 1 σ 3 + ( ρ 2 , 3 − ρ 1 , 2 ρ 1 , 3 ) μ 2 i μ 3 i σ 2 σ 3 ( a 1 a 2 1 − r a 2 ) 2

∂ f ∂ log ( 1 + ρ 2 , 3 1 − ρ 2 , 3 ) 1 / 2 = f ∗ ( ( 1 − ρ 2 , 3 2 ) ( ρ 2 , 3 − ρ 1 , 2 ρ 1 , 3 ) + ( ρ 2 , 3 ( μ 1 i σ 1 ) 2 − ρ 1 , 3 μ 1 i μ 2 i σ 1 σ 2 + μ 2 i μ 3 i σ 2 σ 3 − ρ 1 , 2 μ 1 i μ 3 i σ 1 σ 3 ) a 2 2 ( a 1 a 2 1 − r a 2 ) 2 )     − f ( ρ 2 , 3 ( a 1 a 2 1 − r a 2 ) 2 + ( ρ 1 , 2 − ρ 1 , 3 ρ 2 , 3 ) ( ρ 1 , 2 ρ 2 , 3 − ρ 1 , 3 ) )     ∗ ( 1 − ρ 2 , 3 2 ) ( μ 1 i σ 1 ) 2 + ( 1 − ρ 1 , 3 2 ) ( μ 2 i σ 2 ) 2 + ( 1 − ρ 1 , 2 2 ) ( μ 3 i σ 3 ) 2 ( a 1 a 2 1 − r a 2 ) 2     + 2 f ( ρ 2 , 3 ( a 1 a 2 1 − r a 2 ) 2 + ( ρ 1 , 2 − ρ 1 , 3 ρ 2 , 3 ) ( ρ 1 , 2 ρ 2 , 3 − ρ 1 , 3 ) )     ∗ ( ρ 1 , 2 − ρ 1 , 3 ρ 2 , 3 ) μ 1 i μ 2 i σ 1 σ 2 + ( ρ 1 , 3 − ρ 1 , 2 ρ 2 , 3 ) μ 1 i μ 3 i σ 1 σ 3 + ( ρ 2 , 3 − ρ 1 , 2 ρ 1 , 3 ) μ 2 i μ 3 i σ 2 σ 3 ( a 1 a 2 1 − r a 2 ) 2

Finally, we compute derivatives with respect to β 1 , β 2 , β 3 , σ , ρ 1 , ρ 2 , ρ 3 , σ 1 , σ 2 , σ 3 , ρ 1,2 , ρ 1,3 and ρ 2,3 for the three equations with two binary outcomes case. To do so, we consider notations defined for this case in subsection 3.1 concerning b i t 1 , b i t 2 , a 1 , a 2 , r a , r n , r m , ρ , p i t 1 , and p i t 2 , such that:

∂ f ∂ β 1 = f ∗ ∑ t = 1 T i p i t 1 Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ )

∂ f ∂ β 2 = f * ∑ t = 1 T i p i t 2 Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ )

∂ f ∂ β 3 = f ∗ ∑ t = 1 T i ( ν i t 3 σ 2 − ρ 3 p i t 2 σ q i t 2 Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) − ρ 2 p i t 1 σ q i t 1 Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) )

∂ f ∂ log ( σ ) = f ∗ ∑ t = 1 T i ( − 1 + ( ν i t 3 σ ) 2 − ρ 3 ν i t 3 p i t 2 σ q i t 2 Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) − ρ 2 ν i t 3 p i t 1 σ q i t 1 Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) )

∂ f ∂ log ( 1 + ρ 1 1 − ρ 1 ) 1 / 2 = f ∗ ∑ t = 1 T i q i t 1 q i t 2 ( 1 − ρ 1 2 ) ϕ ( b i t 2 ) ϕ ( b i t 1 − q i t 1 q i t 2 ρ b i t 2 1 − ρ 2 ) r n r m 1 − ρ 2 Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ )

∂ f ∂ log ( 1 + ρ 2 1 − ρ 2 ) 1 / 2 = f ∗ ∑ t = 1 T i ( q i t 1 q i t 2 ( ρ 1 ρ 2 − ρ 3 ) ϕ ( b i t 2 ) ϕ ( b i t 1 − q i t 1 q i t 2 ρ b i t 2 1 − ρ 2 ) r n r m 1 − ρ 2 Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) + ( ρ 2 b i t 1 − r n ν i t 3 σ ) p i t 1 q i t 1 Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) )

∂ f ∂ log ( 1 + ρ 3 1 − ρ 3 ) 1 / 2 = f ∗ ∑ t = 1 T i ( q i t 1 q i t 2 ( ρ 1 ρ 3 − ρ 2 ) ϕ ( b i t 2 ) ϕ ( b i t 1 − q i t 1 q i t 2 ρ b i t 2 1 − ρ 2 ) r n r m 1 − ρ 2 Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) + ( ρ 3 b i t 2 − r m ν i t 3 σ ) p i t 2 q i t 2 Φ 2 ( b i t 1 , b i t 2 ; q i t 1 q i t 2 ρ ) )

$$
\frac{\partial f}{\partial \log(\sigma_1)} = f\left(-1+\frac{\left(1-\rho_{2,3}^{2}\right)\left(\frac{\mu_{i1}}{\sigma_1}\right)^{2}-\left(\rho_{1,2}-\rho_{1,3}\rho_{2,3}\right)\frac{\mu_{i1}\mu_{i2}}{\sigma_1\sigma_2}-\left(\rho_{1,3}-\rho_{1,2}\rho_{2,3}\right)\frac{\mu_{i1}\mu_{i3}}{\sigma_1\sigma_3}}{\left(a_1a_2\sqrt{1-r_a^{2}}\right)^{2}}\right)
$$

$$
\frac{\partial f}{\partial \log(\sigma_2)} = f\left(-1+\frac{\left(1-\rho_{1,3}^{2}\right)\left(\frac{\mu_{i2}}{\sigma_2}\right)^{2}-\left(\rho_{1,2}-\rho_{1,3}\rho_{2,3}\right)\frac{\mu_{i1}\mu_{i2}}{\sigma_1\sigma_2}-\left(\rho_{2,3}-\rho_{1,2}\rho_{1,3}\right)\frac{\mu_{i2}\mu_{i3}}{\sigma_2\sigma_3}}{\left(a_1a_2\sqrt{1-r_a^{2}}\right)^{2}}\right)
$$

$$
\frac{\partial f}{\partial \log(\sigma_3)} = f\left(-1+\frac{\left(1-\rho_{1,2}^{2}\right)\left(\frac{\mu_{i3}}{\sigma_3}\right)^{2}-\left(\rho_{1,3}-\rho_{1,2}\rho_{2,3}\right)\frac{\mu_{i1}\mu_{i3}}{\sigma_1\sigma_3}-\left(\rho_{2,3}-\rho_{1,2}\rho_{1,3}\right)\frac{\mu_{i2}\mu_{i3}}{\sigma_2\sigma_3}}{\left(a_1a_2\sqrt{1-r_a^{2}}\right)^{2}}\right)
$$

For the derivatives with respect to the transformed correlations, it is convenient to write

$$
Q_i=\left(1-\rho_{2,3}^{2}\right)\left(\frac{\mu_{i1}}{\sigma_1}\right)^{2}+\left(1-\rho_{1,3}^{2}\right)\left(\frac{\mu_{i2}}{\sigma_2}\right)^{2}+\left(1-\rho_{1,2}^{2}\right)\left(\frac{\mu_{i3}}{\sigma_3}\right)^{2}
$$

$$
C_i=\left(\rho_{1,2}-\rho_{1,3}\rho_{2,3}\right)\frac{\mu_{i1}\mu_{i2}}{\sigma_1\sigma_2}+\left(\rho_{1,3}-\rho_{1,2}\rho_{2,3}\right)\frac{\mu_{i1}\mu_{i3}}{\sigma_1\sigma_3}+\left(\rho_{2,3}-\rho_{1,2}\rho_{1,3}\right)\frac{\mu_{i2}\mu_{i3}}{\sigma_2\sigma_3}
$$

Then

$$
\frac{\partial f}{\partial \log\left(\frac{1+\rho_{1,2}}{1-\rho_{1,2}}\right)^{1/2}} = f\,\frac{1-\rho_{1,2}^{2}}{\left(a_1a_2\sqrt{1-r_a^{2}}\right)^{2}}\left(\rho_{1,2}\left(\frac{\mu_{i3}}{\sigma_3}\right)^{2}+\frac{\mu_{i1}\mu_{i2}}{\sigma_1\sigma_2}-\rho_{2,3}\frac{\mu_{i1}\mu_{i3}}{\sigma_1\sigma_3}-\rho_{1,3}\frac{\mu_{i2}\mu_{i3}}{\sigma_2\sigma_3}\right)+f\,\frac{\left(1-\rho_{1,2}^{2}\right)\left(\rho_{1,2}-\rho_{1,3}\rho_{2,3}\right)}{\left(a_1a_2\sqrt{1-r_a^{2}}\right)^{2}}\left(1-\frac{Q_i-2C_i}{\left(a_1a_2\sqrt{1-r_a^{2}}\right)^{2}}\right)
$$

$$
\frac{\partial f}{\partial \log\left(\frac{1+\rho_{1,3}}{1-\rho_{1,3}}\right)^{1/2}} = f\,\frac{1-\rho_{1,3}^{2}}{\left(a_1a_2\sqrt{1-r_a^{2}}\right)^{2}}\left(\rho_{1,3}\left(\frac{\mu_{i2}}{\sigma_2}\right)^{2}-\rho_{2,3}\frac{\mu_{i1}\mu_{i2}}{\sigma_1\sigma_2}+\frac{\mu_{i1}\mu_{i3}}{\sigma_1\sigma_3}-\rho_{1,2}\frac{\mu_{i2}\mu_{i3}}{\sigma_2\sigma_3}\right)+f\,\frac{\left(1-\rho_{1,3}^{2}\right)\left(\rho_{1,3}-\rho_{1,2}\rho_{2,3}\right)}{\left(a_1a_2\sqrt{1-r_a^{2}}\right)^{2}}\left(1-\frac{Q_i-2C_i}{\left(a_1a_2\sqrt{1-r_a^{2}}\right)^{2}}\right)
$$

$$
\frac{\partial f}{\partial \log\left(\frac{1+\rho_{2,3}}{1-\rho_{2,3}}\right)^{1/2}} = f\,\frac{1-\rho_{2,3}^{2}}{\left(a_1a_2\sqrt{1-r_a^{2}}\right)^{2}}\left(\rho_{2,3}\left(\frac{\mu_{i1}}{\sigma_1}\right)^{2}-\rho_{1,3}\frac{\mu_{i1}\mu_{i2}}{\sigma_1\sigma_2}-\rho_{1,2}\frac{\mu_{i1}\mu_{i3}}{\sigma_1\sigma_3}+\frac{\mu_{i2}\mu_{i3}}{\sigma_2\sigma_3}\right)+f\,\frac{\left(1-\rho_{2,3}^{2}\right)\left(\rho_{2,3}-\rho_{1,2}\rho_{1,3}\right)}{\left(a_1a_2\sqrt{1-r_a^{2}}\right)^{2}}\left(1-\frac{Q_i-2C_i}{\left(a_1a_2\sqrt{1-r_a^{2}}\right)^{2}}\right)
$$

where the prefactors can equivalently be written using the identity $\left(1-\rho_{1,3}^{2}\right)\left(\rho_{1,3}-\rho_{1,2}\rho_{2,3}\right)=\rho_{1,3}\left(a_1a_2\sqrt{1-r_a^{2}}\right)^{2}+\left(\rho_{1,2}-\rho_{1,3}\rho_{2,3}\right)\left(\rho_{1,2}\rho_{1,3}-\rho_{2,3}\right)$, and its analogue for $\rho_{2,3}$.
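These closed-form derivatives are easy to verify numerically. The sketch below (Python; `tri_normal_pdf` and `df_dlogsig1` are our own illustrative helper names, not part of xtcmp) checks the $\partial f/\partial\log(\sigma_1)$ expression against a central finite difference of the trivariate normal density, noting that $\left(a_1a_2\sqrt{1-r_a^{2}}\right)^{2}$ equals the determinant of the correlation matrix:

```python
import numpy as np

def tri_normal_pdf(mu, sig, r12, r13, r23):
    """Zero-mean trivariate normal density with s.d. vector sig and correlations r."""
    R = np.array([[1.0, r12, r13],
                  [r12, 1.0, r23],
                  [r13, r23, 1.0]])
    S = np.diag(sig) @ R @ np.diag(sig)
    q = mu @ np.linalg.solve(S, mu)
    return np.exp(-0.5 * q) / np.sqrt((2.0 * np.pi) ** 3 * np.linalg.det(S))

def df_dlogsig1(mu, sig, r12, r13, r23):
    """Closed-form derivative of f with respect to log(sigma_1), as in the text."""
    f = tri_normal_pdf(mu, sig, r12, r13, r23)
    detR = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23  # = (a1*a2*sqrt(1-ra^2))^2
    z = mu / sig
    num = ((1 - r23**2) * z[0]**2
           - (r12 - r13 * r23) * z[0] * z[1]
           - (r13 - r12 * r23) * z[0] * z[2])
    return f * (-1.0 + num / detR)

mu = np.array([0.5, -0.3, 0.8])
sig = np.array([1.2, 0.7, 2.0])
r12, r13, r23 = 0.3, -0.2, 0.4

# Central finite difference in log(sigma_1)
h = 1e-6
up, dn = sig.copy(), sig.copy()
up[0] *= np.exp(h)
dn[0] *= np.exp(-h)
numeric = (tri_normal_pdf(mu, up, r12, r13, r23)
           - tri_normal_pdf(mu, dn, r12, r13, r23)) / (2 * h)
analytic = df_dlogsig1(mu, sig, r12, r13, r23)
```

The same pattern (perturb one transformed parameter, compare with the analytic formula) applies to each of the derivatives above.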

4. Examples and Comparisons with Roodman’s Command

To illustrate the advantages and the consistency of our method (xtcmp), we implement examples in Stata using a dataset previously used for xtsur1. The data form an unbalanced panel of 1672 observations on 142 individuals followed between 1990 and 2003. All explanatory variables are quantitative and contain no missing values.

We implement two cases: 1) a system of two equations with one linear and one binary dependent variable; and 2) a system of three equations with two binary outcomes and one continuous outcome. To this end, consider the three following equations:

$$
\tilde{y}_{it1}=\beta_0+x_{1it}\beta_1+x_{2it}\beta_2+x_{3it}\beta_3+x_{4it}\beta_4+\mu_{i1}+\nu_{it1}\qquad(20)
$$

$$
\tilde{y}_{it2}=\gamma_0+x_{4it}\gamma_1+x_{6it}\gamma_2+x_{7it}\gamma_3+\mu_{i2}+\nu_{it2}\qquad(21)
$$

$$
\tilde{y}_{it3}=\alpha_0+x_{7it}\alpha_1+x_{9it}\alpha_2+\mu_{i3}+\nu_{it3}\qquad(22)
$$

where $y_{it1}$ and $y_{it2}$ are binary variables equal to 1 if $\tilde{y}_{it1}>0$ and $\tilde{y}_{it2}>0$, respectively (and 0 otherwise); and where $\tilde{y}_{it3}=y_{it3}$ is a continuous (linear) outcome.
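A data-generating process of this form can be simulated directly, which is useful for checking an estimator on data with known parameters. A minimal Python sketch under illustrative assumptions (the coefficients, covariance matrices, and single-regressor-per-equation design are ours, not the paper's values):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 142, 12  # panel sizes in the spirit of the application

# Illustrative covariance matrices; the probit equations get unit
# idiosyncratic variance by the usual normalization.
Sigma_mu = np.array([[1.0, 0.4, 0.2],
                     [0.4, 1.0, 0.3],
                     [0.2, 0.3, 1.5]])
Sigma_nu = np.array([[1.0, 0.3, 0.5],
                     [0.3, 1.0, 0.4],
                     [0.5, 0.4, 2.0]])

mu_i = rng.multivariate_normal(np.zeros(3), Sigma_mu, size=n)       # time-invariant individual effects
nu_it = rng.multivariate_normal(np.zeros(3), Sigma_nu, size=(n, T))  # idiosyncratic errors
x = rng.normal(size=(n, T, 3))  # one regressor per equation, for brevity

# Stripped-down latent equations in the spirit of (20)-(22)
y1_star = 0.5 + 0.8 * x[:, :, 0] + mu_i[:, [0]] + nu_it[:, :, 0]
y2_star = -0.2 + 0.6 * x[:, :, 1] + mu_i[:, [1]] + nu_it[:, :, 1]
y3 = 1.0 + 0.4 * x[:, :, 2] + mu_i[:, [2]] + nu_it[:, :, 2]  # observed directly

y1 = (y1_star > 0).astype(int)  # only the sign of the latent variable is observed
y2 = (y2_star > 0).astype(int)
```

Because $\mu_i$ enters every period of an individual's observations, the binary outcomes are serially dependent within individuals even conditional on the regressors, which is exactly what the panel dimension of the estimator is meant to capture.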

For the first case, a system of two equations with one continuous and one binary outcome, we focus on the simultaneous estimation of Equation (20) and Equation (22). The associated variance/covariance matrices of the error components are:

$$
\Sigma_\nu=\begin{pmatrix}1&\rho\sigma\\ \rho\sigma&\sigma^{2}\end{pmatrix}\qquad\text{and}\qquad \Sigma_\mu=\begin{pmatrix}\sigma_1^{2}&\rho_1\sigma_1\sigma_2\\ \rho_1\sigma_1\sigma_2&\sigma_2^{2}\end{pmatrix}
$$
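Together, the two matrices imply a cross-equation correlation of the composite errors $\mu_{i}+\nu_{it}$, which is precisely the information that separate estimation discards. A back-of-the-envelope Python computation, using round numbers loosely inspired by the xtcmp column of Table 1 (purely illustrative):

```python
import numpy as np

# Illustrative parameter values (round numbers, not exact estimates)
sigma, rho = 6.5, 0.51           # idiosyncratic: s.d. of the continuous equation, correlation
s1, s2, rho1 = 2.07, 2.25, 0.82  # individual effects: s.d.'s and correlation

# Composite errors: (mu_i1 + nu_it1) and (mu_i2 + nu_it2)
cov = rho1 * s1 * s2 + rho * sigma  # cross-equation covariance
var1 = s1**2 + 1.0                  # probit equation: unit idiosyncratic variance
var2 = s2**2 + sigma**2
corr = cov / np.sqrt(var1 * var2)
```

A sizable `corr` indicates that single-equation estimation leaves substantial cross-equation dependence unmodeled.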

We consider four estimation techniques: 1) each equation is estimated separately as a single panel equation; 2) the two equations are estimated with cmp on pooled data; 3) the two equations are estimated with cmp, with a posterior estimate of the random effects; and 4) the two equations are estimated with our method (xtcmp) presented in Section 2. Results are presented in Table 1.
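Technique 2 ignores the panel structure and maximizes an observation-level bivariate likelihood. For a probit-plus-linear pair, that likelihood factors into a normal marginal for the continuous outcome times a conditional probit, a standard decomposition which the following Python sketch re-derives (our own function, not cmp's implementation):

```python
import math
import numpy as np

def norm_logcdf(x):
    """Elementwise log Phi(x) via the complementary error function."""
    return np.log([0.5 * math.erfc(-v / math.sqrt(2.0)) for v in np.atleast_1d(x)])

def pooled_loglik(y1, y3, xb1, xb3, sigma, rho):
    """Pooled bivariate log-likelihood for one probit and one linear equation
    with error correlation rho: normal marginal for y3 times conditional probit."""
    resid = (y3 - xb3) / sigma
    # Continuous marginal: N(xb3, sigma^2) log-density
    ll = -0.5 * resid**2 - 0.5 * np.log(2 * np.pi) - np.log(sigma)
    # Probit conditional on the continuous residual
    q = 2 * y1 - 1                                     # +1 if y1 = 1, -1 if y1 = 0
    cond = (xb1 + rho * resid) / np.sqrt(1 - rho**2)   # conditional probit index
    ll = ll + norm_logcdf(q * cond)
    return ll.sum()

rng = np.random.default_rng(1)
y3 = rng.normal(size=100)
y1 = (rng.normal(size=100) > 0).astype(int)
ll = pooled_loglik(y1, y3, xb1=np.zeros(100), xb3=np.zeros(100), sigma=1.0, rho=0.3)
```

The pooled likelihood has no $\mu_i$ terms: each observation is treated as independent, which is why techniques 3 and 4 are needed to recover the individual effects.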

Results suggest, first, that estimating the equations separately is misleading, since the covariance between the idiosyncratic errors and the individual random effects is ignored, and the significance of some coefficients appears spurious. By contrast, estimating the equations simultaneously yields broadly consistent estimates across the three remaining techniques (the coefficients are closer to each other in the last three columns). One should notice, however, that when using cmp with random effects, the covariance matrices of both the individual random effects and the idiosyncratic errors are estimated after the estimation of the

Table 1. Two-equation system with both a linear and a non-linear outcome.

| Variables | Separate equations | cmp on pooled data | cmp with random effects | Our method (xtcmp) |
| --- | --- | --- | --- | --- |
| Binary outcome: $\tilde{y}_{it1}$ | | | | |
| x1 | 0.0238 (0.0166) | −0.0056 (0.0062) | 0.0021 (0.0224) | 0.0119 (0.0082) |
| x2 | 0.2569*** (0.0238) | 0.0297*** (0.0034) | 0.0484*** (0.0045) | 0.0204*** (0.0037) |
| x3 | 0.3171** (0.1228) | 0.0053 (0.006) | 0.0351 (0.0224) | 0.0254 (0.0265) |
| x4 | −0.5748*** (0.2182) | −0.0084 (0.0112) | −0.058 (0.0399) | −0.0463 (0.0471) |
| Intercept | −6.4382*** (0.375) | −0.7083*** (0.0379) | 1.1074*** (0.1339) | −2.7151*** (0.0535) |
| Continuous outcome: $\tilde{y}_{it3}$ | | | | |
| x7 | −0.1528*** (0.0207) | −0.0321* (0.0168) | −0.1044*** (0.015) | −0.0422*** (0.0158) |
| x9 | 0.0384*** (0.004) | 0.1035*** (0.0047) | 0.0361*** (0.004) | 0.096*** (0.0043) |
| Intercept | 15.9964*** (0.7339) | 10.0568*** (0.5086) | 17.0243*** (0.4477) | 7.4308*** (0.4779) |
| Covariance matrix: individual effects | | | | |
| σ1 | 3.8635*** (0.2135) | – | 2.8431*** (0.2144) | 2.0702*** (0.1263) |
| σ2 | 6.024 | – | 5.706*** (0.1789) | 2.2481*** (0.1421) |
| ρ1 | – | – | 0.5135*** (0.0247) | 0.8196*** (0.0417) |
| Covariance matrix: idiosyncratic errors | | | | |
| σ | 2.6596 | 6.7719*** (0.1172) | 2.6725*** (0.0481) | 6.472*** (0.1182) |
| ρ | – | 0.1525*** (0.0349) | 0.0342 (0.0528) | 0.5087*** (0.0418) |

***: significant at the 1% level; **: significant at the 5% level; *: significant at the 10% level.

coefficients, that is, as a post-estimation step. Comparing these with our results (xtcmp, last column), the variance of the individual effects appears to be overestimated in cmp's case.

We then turn to the second case, a three-equation system with two binary outcomes and one continuous dependent variable, i.e., Equations (20)-(22). The associated variance/covariance matrices of the error components are:

$$
\Sigma_\nu=\begin{pmatrix}1&\rho_1&\rho_2\sigma\\ \rho_1&1&\rho_3\sigma\\ \rho_2\sigma&\rho_3\sigma&\sigma^{2}\end{pmatrix}\qquad\text{and}\qquad \Sigma_\mu=\begin{pmatrix}\sigma_1^{2}&\rho_{1,2}\sigma_1\sigma_2&\rho_{1,3}\sigma_1\sigma_3\\ \rho_{1,2}\sigma_1\sigma_2&\sigma_2^{2}&\rho_{2,3}\sigma_2\sigma_3\\ \rho_{1,3}\sigma_1\sigma_3&\rho_{2,3}\sigma_2\sigma_3&\sigma_3^{2}\end{pmatrix}
$$
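Note the normalization embedded in $\Sigma_\nu$: the two probit equations have unit idiosyncratic variance, leaving $(\rho_1,\rho_2,\rho_3,\sigma)$ as the free parameters, and any admissible parameter vector must keep the matrix positive definite. A small Python sketch of how this matrix is assembled, with that check (the function name and numbers are ours; the values loosely echo the cmp column of Table 2):

```python
import numpy as np

def sigma_nu(rho1, rho2, rho3, sigma):
    """Idiosyncratic covariance matrix displayed above: two probit equations
    (unit variance by normalization) and one continuous equation (variance sigma^2)."""
    return np.array([[1.0,          rho1,         rho2 * sigma],
                     [rho1,         1.0,          rho3 * sigma],
                     [rho2 * sigma, rho3 * sigma, sigma ** 2]])

# Illustrative parameter values
S = sigma_nu(0.25, 0.11, 0.15, 6.78)
assert np.all(np.linalg.eigvalsh(S) > 0)  # must be positive definite to be a covariance
```

In practice the optimizer works with transformed parameters (log standard deviations and inverse-hyperbolic-tangent correlations, as in Section 3), which keeps each parameter unbounded but does not by itself guarantee joint positive definiteness, hence the check.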

For this example, we cannot apply the third estimation technique (cmp with a posterior estimate of the random effects) because the procedure does not converge. We therefore report the other three: 1) each equation is estimated separately as a single panel equation; 2) the three equations are estimated with cmp on pooled data; and 3) the three equations are estimated with our method (xtcmp). Results are provided in Table 2.

Table 2. Three-equation system with two binary outcomes and one continuous outcome.

| Variables | Separate equations | cmp on pooled data | Our method (xtcmp) |
| --- | --- | --- | --- |
| Binary outcome 1: $\tilde{y}_{it1}$ | | | |
| x1 | 0.0238 (0.0166) | −0.0067 (0.0063) | 0.0399*** (0.0146) |
| x2 | 0.2569*** (0.0238) | 0.0326*** (0.0034) | 0.2313*** (0.0042) |
| x3 | 0.3171** (0.1228) | 0.0065 (0.0072) | 0.0445 (0.0379) |
| x4 | −0.5748*** (0.2182) | −0.0107 (0.0132) | −0.0826 (0.0669) |
| Intercept | −6.4382*** (0.375) | −0.7176*** (0.0379) | −2.7205*** (0.1247) |
| Binary outcome 2: $\tilde{y}_{it2}$ | | | |
| x4 | 0.1678*** (0.0480) | 0.0076 (0.0078) | 0.1345** (0.0584) |
| x6 | −0.1068*** (0.0136) | −0.0499*** (0.0031) | −0.0201*** (0.0041) |
| x7 | −0.0812*** (0.0193) | −0.0418*** (0.0040) | −0.0230*** (0.0071) |
| Intercept | 8.8666*** (1.0978) | 4.2916*** (0.2625) | 1.8381*** (0.3885) |
| Continuous outcome: $\tilde{y}_{it3}$ | | | |
| x7 | −0.1528*** (0.0207) | −0.0343* (0.0168) | −0.0643*** (0.0123) |
| x9 | 0.0384*** (0.004) | 0.1005*** (0.0049) | 0.0608*** (0.0034) |
| Intercept | 15.9964*** (0.7339) | 10.2621*** (0.5172) | 12.4296*** (0.3771) |
| Covariance matrix: individual effects | | | |
| σ1 | 3.8635*** (0.2135) | – | 24.123*** (1.518) |
| σ2 | 3.0296*** (0.3417) | – | 6.5402*** (0.485) |
| σ3 | 6.024 | – | 3.7135*** (0.235) |
| ρ1,2 | – | – | 0.2091** (0.0853) |
| ρ1,3 | – | – | 0.2379*** (0.0854) |
| ρ2,3 | – | – | 0.6346*** (0.0565) |
| Covariance matrix: idiosyncratic errors | | | |
| σ | 2.6596 | 6.7756*** (0.1175) | 4.6060*** (0.0870) |
| ρ1 | – | 0.2468*** (0.0464) | 0.8719*** (0.0681) |
| ρ2 | – | 0.1132*** (0.0404) | 0.8631*** (0.0274) |
| ρ3 | – | 0.1501*** (0.0354) | 0.5906*** (0.0565) |

***: significant at the 1% level; **: significant at the 5% level; *: significant at the 10% level.

As before, results suggest that estimating the equations separately leads to errors in the significance of the coefficients, especially for the first outcome; moreover, such a method ignores the covariance between the idiosyncratic errors and the individual random effects. By contrast, estimating the equations simultaneously yields more consistent estimates (coefficient estimates are closer in the last two columns, and their significance is stable across those columns).

5. Conclusions

xtcmp is a command implemented under Stata. We focus on three main cases: 1) a simultaneous equations model with two equations (one linear and one binary outcome); 2) a three-equation case with two linear outcomes and one binary outcome; and 3) a three-equation case with one linear and two binary dependent variables. The command further develops Roodman's command cmp, which explicitly considers neither the panel dimension of the data nor a simultaneous equations model, since it is written as a SUR estimator. This technical note gives a detailed description of the computations: the likelihood functions, the associated log-integrands, and the Hessian matrices and gradient vectors with respect to each parameter of interest, for the three cases described above.

xtcmp's estimation framework could be further developed to cover a broader range of non-linear outcomes (such as ordered probit, multinomial probit, or truncated frameworks) or to handle more equations simultaneously, in a dynamic setup.

Still, as it stands, xtcmp represents a significant addition to Stata's commands: it allows researchers to address endogeneity issues in a panel context by modeling the correlation between the error terms of the equations, and thus outcome-specific individual effects.

Acknowledgements