Simultaneous Equations Model with Non-Linear and Linear Dependent Variables on Panel Data

This paper provides an estimation approach for multi-equation systems in panel data. Multi-equation systems are at the heart of economic modeling. Researchers who want to establish causal links between two outcomes often need to account for simultaneity between them in order to overcome endogeneity issues (for instance, when considering supply and demand equations). Difficulties arise when linear and non-linear outcomes are considered at the same time, which is why Roodman [1] implemented the Stata module cmp for multidimensional models. In this paper, we further develop this technique to allow researchers to implement a simultaneous equations model in a panel data setting. Implemented in Stata, our method, xtcmp, is a Full Information Maximum Likelihood (FIML) estimator. This paper explains the associated theory (derivation of the log-likelihood function, the associated gradient, and the Hessian matrices of the log-integrand function) and offers an application of xtcmp, with comparisons to cmp.


Introduction
In empirical economics, a common approach is to consider a linear data-generating process. However, non-linear outcomes are often present and important in research questions. This often stems from the structure of survey databases, in which interviewers record yes/no answers as binary outcomes. When researchers design a study, they often have to take into account different types of variables, continuous and categorical, at the same time, within a simultaneous equations framework in a dynamic setting, in which each dependent variable is endogenous in one (or more) equation(s) of the model. The advantage of a simultaneous equations model is that it accounts for the correlation between the error terms of the equations. More specifically, in a dynamic setup, such models allow researchers to consider individual effects that differ across equations (these effects are part of the error terms, each of which is decomposed into an individual effect that is fixed across time and a component that varies over time). This matters because these terms are unobserved, specific to each outcome, and may cause endogeneity issues. For instance, in health economics, the causal relationship between health and income can run from income to health and from health to income, so that each is endogenous to the other [2]. Considering a dynamic simultaneous equations model therefore makes it possible to account for unobserved individual effects such as physical maturity (driven by genetics, for health) or intellectual abilities (for income).
The framework of multi-equation models has been widely used in the literature to address several issues, including the case of an endogenous binary outcome. Greene [3] reformulated the estimation of the impact of an endogenous treatment on a continuous outcome using a multi-equation model. This reformulation was extended to the analysis of an endogenous binary outcome by [4]. Several papers have since analyzed the effects of an endogenous treatment on diverse types of outcomes, including continuous and count outcomes [5] [6] [7]. Generalizations to the case of noncompliance and nonresponse have also been introduced by [8]. However, all these methods focus on cross-sectional data and do not account for panel data.
There is almost no automated method in Stata for estimating the parameters of such multi-equation models. An exception is the cmp command, which is the first general Stata tool for this class of models and is written as a Seemingly Unrelated Regressions (SUR) estimator [1]. However, this command does not explicitly consider the panel dimension of the data, which is a limitation given the proliferation of databases with a temporal dimension. Moreover, simple relationships among variables at a point in time do not adequately capture the dynamic interaction of changing humans in changing environments. Thus, there is a need for a command that estimates simultaneous equations models on panel data.
We therefore offer an extension of the cmp framework for cases with either two equations (one linear and one binary outcome) or three equations (either two linear dependent variables and one binary, or one linear and two binary dependent variables), while explicitly considering the panel dimension of the data. Our command xtcmp is a Full Information Maximum Likelihood (FIML) estimator that takes into account the time dimension of the data as well as linear and non-linear outcomes (which is not feasible with three-stage least squares, because the latter only handles linear dependent variables).
The resulting likelihood function is a multidimensional integral, which we approximate using the adaptive Gauss-Hermite quadrature method.

The error term for individual i at period t in the k-th equation is given by $\varepsilon_{kit} = \mu_{ki} + \nu_{kit}$, where $\mu_{ki}$ is the individual effect and $\nu_{kit}$ is the idiosyncratic error. In the full model, $Y^{*}$ contains the latent variables for the first $d_1$ equations and the original continuous variables for the others, X is the matrix of explanatory variables, and $\beta_k$ is the parameter vector of the k-th equation. We suppose that the classical hypotheses of independence 1) between the error components and 2) between the error components and the explanatory variables are satisfied. Furthermore, let us assume that the error components are independent and identically distributed with zero means and covariance matrices $\Sigma_\mu$ and $\Sigma_\nu$. The latter is built from three blocks: $\Sigma_1$ is a $d_1 \times d_1$ matrix with ones on the diagonal (which corresponds to the covariance matrix structure for simultaneous equations with only binary outcomes), $\Sigma_2$ is a $d_2 \times d_2$ matrix, and $\Sigma_3$ is a $d_2 \times d_1$ matrix. Thus, the overall individual likelihood is a d-dimensional integral over the individual effects.

We now focus on two cases. The first one, a simultaneous equations model with two outcomes, one binary and one continuous, is treated in Section 2.1. Then, we focus on a case with three outcomes, composed of either one binary and two continuous variables (developed in Section 2.2) or two binary variables and one continuous variable (analyzed in Section 2.3).
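As a reading aid, the block structure described for $\Sigma_\nu$ can be laid out as below. This is only a sketch: it assumes that the $d_1$ binary equations are ordered first and that $\Sigma_3$ holds the covariances between the continuous and the binary equations. The placement of the transpose is an assumption made here so that the dimensions agree, not something taken from the original equation.

```latex
\Sigma_\nu =
\begin{pmatrix}
\Sigma_1 & \Sigma_3^{\top} \\
\Sigma_3 & \Sigma_2
\end{pmatrix},
\qquad
\Sigma_1 \in \mathbb{R}^{d_1 \times d_1},\;
\Sigma_2 \in \mathbb{R}^{d_2 \times d_2},\;
\Sigma_3 \in \mathbb{R}^{d_2 \times d_1}.
```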

Case with Two Outcomes: One Binary and One Continuous
Let us consider a system of two equations in which the first outcome is binary and the second is a linear outcome. The associated variance/covariance matrices of the error components are:
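To make the two-outcome setting concrete, the following sketch simulates a panel with one binary and one continuous outcome whose errors combine a time-invariant individual effect and an idiosyncratic term. Everything numeric here is an assumption chosen for illustration (the covariance values, the coefficients, and the function name `simulate_two_outcome_panel`); none of it comes from the paper. For simplicity, the sketch links the equations only through correlated errors and omits endogenous regressors.

```python
import numpy as np


def simulate_two_outcome_panel(n_individuals=200, n_periods=5, seed=0):
    """Simulate a panel with one binary and one continuous outcome.

    All numeric values are illustrative assumptions, not estimates
    from the paper.
    """
    rng = np.random.default_rng(seed)

    # Assumed covariance of the individual effects (mu_1i, mu_2i).
    sigma_mu = np.array([[0.5, 0.2],
                         [0.2, 0.8]])
    # Assumed covariance of the idiosyncratic errors (nu_1it, nu_2it).
    # The variance of the binary equation's error is set to 1.
    sigma_nu = np.array([[1.0, 0.3],
                         [0.3, 1.5]])

    # One draw of individual effects per individual, constant over time.
    mu = rng.multivariate_normal(np.zeros(2), sigma_mu, size=n_individuals)
    # One draw of idiosyncratic errors per individual and period.
    nu = rng.multivariate_normal(np.zeros(2), sigma_nu,
                                 size=(n_individuals, n_periods))

    # One exogenous regressor per equation, with assumed coefficients.
    x = rng.normal(size=(n_individuals, n_periods, 2))
    beta = np.array([0.7, -1.2])

    # Latent index for equation 1, observed value for equation 2.
    y_star = x * beta + mu[:, None, :] + nu
    y1 = (y_star[..., 0] > 0).astype(int)  # binary outcome
    y2 = y_star[..., 1]                    # continuous outcome
    return x, y1, y2


x, y1, y2 = simulate_two_outcome_panel()
print("share of ones in the binary outcome:", y1.mean())
```

Data simulated this way can serve as a sanity check for an estimator, since the true covariance parameters are known by construction.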

Case with Three Outcomes: One Binary and Two Continuous
Let us consider the two previous equations (Equation (8) and Equation (9)) and a third one, which also corresponds to a linear equation, with $y_{3it} = y^{*}_{3it}$. The associated variance/covariance matrices of the error components are:

Case with Three Outcomes: Two Binary and One Continuous
To derive the likelihood function for the case composed of two binary outcomes and one continuous outcome, we proceed by identification, and the likelihood has the following form:

Estimation Requirement
Since the likelihood function is a d-dimensional integral, we use the Gauss-Hermite quadrature method (see Moussa and Delattre [11]). Implementing this method requires us 1) to compute the mode μ of the log-integrand and to derive the Hessian matrix H of the log-integrand at μ, taken with respect to the individual effects; and 2) to derive the gradient of the overall likelihood function with respect to the parameters.
Let Q denote the selected number of quadrature points, x the Q-dimensional vector of quadrature nodes, and w the Q-dimensional vector of quadrature weights. By applying the adaptive Gauss-Hermite quadrature [9], the likelihood in Equation (3) can be rewritten as a weighted sum over the quadrature nodes. The derivation of the Hessian matrix is explained in Section 3.1, while the gradient of the overall likelihood function is derived in Section 3.2.
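The idea of adaptive Gauss-Hermite quadrature can be illustrated in one dimension. The sketch below is not the xtcmp implementation: it assumes a single random effect, replaces analytic derivatives with finite differences, and uses a random-intercept probit integrand as a stand-in for the paper's log-integrand. The function names, the data, and the value of `sigma_mu` are hypothetical.

```python
import math

import numpy as np


def adaptive_gauss_hermite(log_f, n_points=9, start=0.0, h=1e-4,
                           tol=1e-8, max_iter=50):
    """Approximate the integral of exp(log_f(u)) over the real line."""
    # Step 1: find the mode of the log-integrand by Newton's method, with
    # central finite differences standing in for analytic derivatives.
    mode = start
    for _ in range(max_iter):
        grad = (log_f(mode + h) - log_f(mode - h)) / (2.0 * h)
        hess = (log_f(mode + h) - 2.0 * log_f(mode) + log_f(mode - h)) / h**2
        step = grad / hess
        mode -= step
        if abs(step) < tol:
            break

    # Step 2: the curvature at the mode sets the scale of the rule.
    hess = (log_f(mode + h) - 2.0 * log_f(mode) + log_f(mode - h)) / h**2
    scale = 1.0 / math.sqrt(-hess)

    # Step 3: shift and stretch the standard Gauss-Hermite nodes, and undo
    # the exp(-x^2) weight that is built into the standard rule.
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    total = 0.0
    for x_q, w_q in zip(nodes, weights):
        u_q = mode + math.sqrt(2.0) * scale * x_q
        total += w_q * math.exp(x_q**2 + log_f(u_q))
    return math.sqrt(2.0) * scale * total


def probit_log_integrand(u, y, index, sigma_mu):
    """Log of prod_t Phi(s_t * (index_t + sigma_mu * u)) times phi(u)."""
    value = -0.5 * u**2 - 0.5 * math.log(2.0 * math.pi)
    for y_t, index_t in zip(y, index):
        sign = 2 * y_t - 1
        z = sign * (index_t + sigma_mu * u)
        # Phi(z) written with erfc so that the lower tail stays accurate.
        value += math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))
    return value


# Hypothetical data for one individual observed over four periods.
y_i = [1, 0, 1, 1]
index_i = [0.3, -0.4, 0.8, 0.1]
likelihood_i = adaptive_gauss_hermite(
    lambda u: probit_log_integrand(u, y_i, index_i, sigma_mu=0.9))
print("approximate individual likelihood:", likelihood_i)
```

Centering the nodes at the mode and scaling them by the curvature is what lets a small number of points Q work well when the integrand is sharply peaked away from zero. In d dimensions the same idea uses the mode vector and the Hessian matrix in place of the scalar mode and second derivative.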

Hessian Matrix at μ
Based on the expressions obtained for each case described in Section 2 (Equation (10), Equation (12), and Equation (14)), we first need to write the associated log-integrand log(f) for each of the three cases. We can then turn to the Hessian matrices, which require the second-order derivatives of log(f) with respect to the individual effects.

For the first case, with two equations, we write the log-integrand and obtain the corresponding Hessian matrix. For the case of three equations with one binary outcome, we write the log-integrand and compute the associated derivatives. Finally, for the case of three equations with two binary outcomes, the log-integrand and its derivatives are written using $a_1$, $a_2$ and $r_a$ as previously defined, together with additional notation.
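Analytic Hessians of this kind are easy to get wrong, so a finite-difference check can be useful during implementation. The sketch below is a generic helper, not part of xtcmp; the name `numerical_hessian` and the test function (the Gaussian part of a log-integrand with an assumed covariance matrix) are illustrative assumptions.

```python
import numpy as np


def numerical_hessian(func, point, step=1e-4):
    """Central finite-difference Hessian of a scalar function."""
    point = np.asarray(point, dtype=float)
    dim = point.size
    hess = np.zeros((dim, dim))
    for i in range(dim):
        for j in range(dim):
            e_i = np.zeros(dim)
            e_j = np.zeros(dim)
            e_i[i] = step
            e_j[j] = step
            # Four-point stencil for the second partial derivative.
            hess[i, j] = (func(point + e_i + e_j) - func(point + e_i - e_j)
                          - func(point - e_i + e_j) + func(point - e_i - e_j)
                          ) / (4.0 * step**2)
    return hess


# Assumed covariance of two individual effects (illustrative values).
sigma_mu = np.array([[0.5, 0.2],
                     [0.2, 0.8]])
precision = np.linalg.inv(sigma_mu)


def gaussian_log_kernel(mu):
    """Log-kernel of N(0, sigma_mu); its exact Hessian is -precision."""
    return -0.5 * mu @ precision @ mu


hess = numerical_hessian(gaussian_log_kernel, [0.3, -0.2])
print("max abs deviation from exact Hessian:",
      np.max(np.abs(hess + precision)))
```

Comparing such a numerical Hessian with the analytic one at a few points is a cheap way to catch sign or index errors before running the full estimator.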

Gradient Vector with Respect to the Parameters
Based on the likelihood function given by Equation (15), the parameters to estimate are $\beta_k$, with $k = 1, \dots, d$, and the associated covariance matrices $\Sigma_\mu$ and $\Sigma_\nu$.
Thus, the gradient has to be calculated with respect to these parameters. For each of the three cases, we apply the formula for the first-order derivative of the log-likelihood function with respect to a parameter α in the set of parameters, and compute the derivatives with respect to each parameter.

First, in the two-outcome case, six parameters have to be considered, including $\rho_1$. As in Section 3.1, we use the previously defined $b_{it}$, which is specific to the case with two outcomes.

For the case of three equations with one binary outcome, we use the notation associated with this case in Section 3.1 ($b_{it}$, $a_2$, and $r_a$).
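For reference, the generic identity behind such a derivative can be written as follows. This is a sketch, assuming that the individual likelihood has the form $L_i(\theta) = \int f_i(\mu;\theta)\,d\mu$ and that differentiation under the integral sign is valid. The symbols $L_i$, $f_i$ and $\theta$ are notation introduced here for illustration and need not match the paper's.

```latex
\frac{\partial \log L_i(\theta)}{\partial \alpha}
= \frac{1}{L_i(\theta)} \frac{\partial L_i(\theta)}{\partial \alpha}
= \frac{1}{L_i(\theta)} \int
  \frac{\partial \log f_i(\mu;\theta)}{\partial \alpha}\,
  f_i(\mu;\theta)\, d\mu .
```

The second form is convenient with quadrature, since the numerator is an integral of the same type as $L_i$ and can in principle be approximated with the same nodes and weights.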

Examples and Comparisons with Roodman's Command
To illustrate the advantages and the consistency of our method (xtcmp), we implement examples in Stata using a dataset previously used for xtsur. It is an unbalanced panel of 1672 observations, corresponding to 142 individuals followed between 1990 and 2003. All explanatory variables are quantitative and contain no missing values.
We implement two cases: 1) a system of two equations with one linear and one binary dependent variable; 2) a system of three equations with two binary outcomes and one continuous outcome. Let us consider the following three equations: [...] coefficients (post-estimation). Compared with our results (xtcmp, last column), the variance of the individual effects appears to be overestimated by cmp.
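Then, we offer an example for the second case, a three-equation system with two binary outcomes and one continuous dependent variable, based on Equations (20)-(22). In this case, the associated variance/covariance matrices of the error components are given by:

$$
\Sigma_\nu =
\begin{pmatrix}
1 & \rho_{\nu,12} & \rho_{\nu,13}\,\sigma_{\nu,3} \\
\rho_{\nu,12} & 1 & \rho_{\nu,23}\,\sigma_{\nu,3} \\
\rho_{\nu,13}\,\sigma_{\nu,3} & \rho_{\nu,23}\,\sigma_{\nu,3} & \sigma_{\nu,3}^{2}
\end{pmatrix}
\quad \text{and} \quad
\Sigma_\mu =
\begin{pmatrix}
\sigma_{\mu,1}^{2} & \rho_{\mu,12}\,\sigma_{\mu,1}\sigma_{\mu,2} & \rho_{\mu,13}\,\sigma_{\mu,1}\sigma_{\mu,3} \\
\rho_{\mu,12}\,\sigma_{\mu,1}\sigma_{\mu,2} & \sigma_{\mu,2}^{2} & \rho_{\mu,23}\,\sigma_{\mu,2}\sigma_{\mu,3} \\
\rho_{\mu,13}\,\sigma_{\mu,1}\sigma_{\mu,3} & \rho_{\mu,23}\,\sigma_{\mu,2}\sigma_{\mu,3} & \sigma_{\mu,3}^{2}
\end{pmatrix}
$$

The results (Table 2) suggest that, as before, estimating the equations separately leads to errors in the significance of coefficients, especially for the first outcome. Moreover, such a method does not consider the covariance between idiosyncratic errors and individual random effects. By contrast, considering the equations simultaneously yields more consistent estimates (coefficient estimates are closer in the last two columns, and their significance persists across these two columns).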

Conclusions
xtcmp is a command implemented in Stata. We focus on three main cases: 1) a simultaneous equations model with two equations (one linear and one binary outcome); 2) a case with three equations composed of two linear outcomes and one binary outcome; and 3) a three-equation case with one linear and two binary dependent variables. This command further develops Roodman's [1] command cmp, which considers neither the panel dimension of the data explicitly nor a simultaneous equations model, since it is written as a SUR estimator. This technical note gives a detailed description of the computations, namely the likelihood functions, the associated log-integrands, the Hessian matrices and the gradient vectors with respect to each parameter of interest, specific to the three cases described above. xtcmp's estimation framework could be further developed to consider a broader range of non-linear outcomes (such as ordered probit, multinomial probit or truncated models, for instance) or to consider many more equations simultaneously in a dynamic setup. Still, as it stands, xtcmp represents a significant addition to Stata's commands. It allows researchers to address endogeneity issues in a panel data context by analyzing the correlation in the error terms of the equations, and thus the individual effects specific to each outcome.