An approximation method for a maximum likelihood equation system and application to the analysis of accidents data

There exist many iterative methods for computing the maximum likelihood estimator, but most of them suffer from one or several drawbacks, such as the need to invert a Hessian matrix and the need for good initial approximations of the parameters, which are unknown in practice. In this paper, we present an estimation method without matrix inversion, based on a linear approximation of the likelihood equations in a neighborhood of the constrained maximum likelihood estimator. We obtain closed-form approximations of solutions and standard errors. Then, we propose an iterative algorithm which cycles through the components of the vector parameter and updates one component at a time. The initial solution, which is necessary to start the iterative procedure, is automated. The proposed algorithm is compared to some of the best iterative optimization algorithms available in R and MATLAB through a simulation study, and applied to the statistical analysis of a road safety measure.


Introduction
Approximation methods for Maximum Likelihood (ML) systems of equations are of interest and are motivated in this paper by the need for estimation methods that are simple and easy to implement in the specific field of the statistical evaluation of the impact of a road safety measure. In practice, the estimation methods dedicated to this evaluation depend both on the nature of the measure and on the available data. Methods based on the combination of frequencies have received considerable attention [1] [2] [3] and, in most of them, we are faced with the estimation of unknown parameters which are often functionally dependent.
Many approximation methods for maximum likelihood estimation need to solve systems of linear or non-linear equations, with or without constraints [4]-[9]. The Newton-Raphson method and Fisher scoring are certainly the most commonly used. They consist in updating the whole parameter vector through the iterative scheme φ^(k+1) = φ^(k) − [H_ℓ(φ^(k))]^(−1) ∇ℓ(φ^(k)), where φ^(k) is the estimate of the vector parameter at step k, ℓ is the log-likelihood function, ∇ℓ its gradient and H_ℓ its Hessian matrix (replaced by minus the Fisher information matrix in the scoring variant). Within the framework of crash data analysis, different iterative estimation methods have been proposed [2] [11], for example by Mkhadri et al. [11]. Despite the above advantages, the choice of the starting value φ^(0) remains a major issue, since a value of φ^(0) relatively far from the true unknown value of the vector parameter can lead to erroneous solutions or to non-convergence. In addition, obtaining explicit expressions of standard errors is generally not easy.
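For reference, the full-vector Newton-Raphson update φ^(k+1) = φ^(k) − [H_ℓ(φ^(k))]^(−1) ∇ℓ(φ^(k)) can be sketched as follows; the toy Poisson log-likelihood, data and starting value are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def newton_raphson_mle(grad, hess, phi0, tol=1e-10, max_iter=100):
    """Full-vector Newton-Raphson: phi <- phi - H^{-1} grad.

    Requires solving with the Hessian at each step, and a starting
    value phi0 close enough to the true parameter to converge.
    """
    phi = np.asarray(phi0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(phi), grad(phi))
        phi = phi - step
        if np.max(np.abs(step)) < tol:
            break
    return phi

# Toy example: ML estimation of a Poisson mean, log-parametrized so
# lambda = exp(p); gradient = sum(y) - n*exp(p), Hessian = -n*exp(p).
y = np.array([2.0, 3.0, 4.0, 3.0])          # hypothetical counts
grad = lambda p: np.array([np.sum(y) - y.size * np.exp(p[0])])
hess = lambda p: np.array([[-y.size * np.exp(p[0])]])
phi_hat = newton_raphson_mle(grad, hess, [0.0])
# exp(phi_hat[0]) recovers the sample mean, the Poisson MLE
```

The toy case converges because the starting value is close enough; with a poor start, the same scheme can diverge, which is the drawback discussed above.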
In this paper, we present an estimation method without matrix inversion, based on a linear approximation of the likelihood equations in a neighborhood of the constrained maximum likelihood estimator. We obtain closed-form approximations of solutions and standard errors. Then, we propose a partial linear approximation (PLA) algorithm which cycles through the components of the vector parameter and updates one component at a time. The initial solution is automated and standard errors are obtained in closed form. The PLA is compared to some of the best algorithms available in R and MATLAB through a simulation study and applied to real crash data.

The remainder of the paper is organized as follows. Section 2 is devoted to the statistical model and the main assumptions used to get closed-form approximations and standard errors. The proposed estimation method and the method for computing standard errors are presented in Section 3. The general framework of the proposed algorithm is also described. In Section 4, we give an illustration of our results using a crash data model. The numerical performance of the proposed algorithm is examined in Section 5 through a simulation study while a real-data application is given in Section 6. The appendix is devoted to the technical details of the main results.

ML Estimation and Main Assumptions
Assumption 1. The vector parameter φ is partitioned as φ = (θ, β^T)^T, where θ > 0 is a scalar, β = (β_1, …, β_r)^T is a vector and β_j > 0 for all j = 1, …, r.

Assumption 2. The unknown vector φ is subject to a linear constraint, handled through a Lagrangian where λ is the Lagrange multiplier. The information matrix linked to the constrained maximum likelihood estimator φ̂ takes this constraint into account.

Assumption 3. The constraint function C is only a function of the sub-vector β.

Assumption 4. For any θ > 0, there exists a non-singular r × r matrix, depending on θ and the data, such that the non-linear system in β is approximated by a linear system whose right-hand side D_y is an r × 1 vector whose components are obtained from y.

Assumption 5. There exists a function expressing θ as a function of β, and conversely.

Assumption 6. There exist two strictly positive real numbers a_{n,φ} and b_{n,φ}, a non-singular r × r diagonal matrix Σ_φ and a vector allowing the Fisher information matrix to be decomposed.

Assumption 3 specifies the form of the function C; in particular, C is only a function of the sub-vector β. Assumptions 4 and 5 enable us to obtain β from θ and conversely. Assumption 6 enables us to transform the Fisher information matrix in order to use classical results on matrix inversion with the Schur complement [17].

Partial Linear Approximation Principle
A. N'Guessan et al.

The general problem of finding the constrained maximum likelihood estimator φ̂ has been discussed by many authors [12] [13] [14]. The classical approach is based on a Newton-type algorithm and computes all the components of φ̂ at once.
Except for a few simple cases, it is not generally possible to obtain explicit expressions of the components of φ̂. One shows the following lemma (we refer the reader to the appendix for a proof).

Lemma 1. The constrained maximum likelihood estimator φ̂, provided it exists, is a solution to the non-linear system ∂ℓ/∂φ(φ̂) − λ̂ ∂C/∂φ(φ̂) = 0 and C(β̂) = 0.

Our approach consists in splitting this system into sub-systems that are solved one component at a time. The estimation process may then be completed by the computation of standard errors using Theorem 2 below.
Theorem 2. Under Assumptions 3-6, the asymptotic variances of the components of φ̂ are obtained in closed form from the Schur complement decomposition of the information matrix.

General Framework of the Partial Linear Approximation Algorithm
Algorithm 1 (The partial linear approximation algorithm).
Step 0 (Initialization). Choose a tolerance ε > 0 and an initial value φ^(0); set k = 0.
Step 1 (Loop for computing φ̂). For a given φ^(k), update the components of φ one at a time. If the stopping criterion is not satisfied, replace k by k + 1 and return to Step 1. Else, set φ̂ = φ^(k+1) and go to Step 2.
Step 2 (Computation of standard errors). Compute the asymptotic variances given by Theorem 2 at φ̂.
The aim of this paper is not to conduct a theoretical study of the convergence of the proposed algorithm; we rather focus on the numerical aspect of this convergence through an application model. Nevertheless, we can note that the estimation of φ with our algorithm does not require any matrix inversion. It is thus reasonable to expect that computing the constrained maximum likelihood estimator φ̂ is faster.
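The cyclic framework above can be sketched generically as follows; the component updates and the toy fixed-point example are illustrative placeholders for the model-specific closed-form formulas, not formulas from the paper:

```python
import numpy as np

def pla_skeleton(update_theta, update_beta, theta0, beta0,
                 tol=1e-8, max_iter=500):
    """Generic cyclic scheme: update one component of phi at a time.

    No Hessian is formed and no matrix is inverted; each iteration
    only evaluates closed-form component updates.
    """
    theta, beta = theta0, np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        theta_new = update_theta(beta)             # Step 1a: theta given beta
        beta_new = update_beta(theta_new)          # Step 1b: beta given theta
        if abs(theta_new - theta) + np.max(np.abs(beta_new - beta)) < tol:
            theta, beta = theta_new, beta_new
            break                                  # stop, go to Step 2
        theta, beta = theta_new, beta_new          # k <- k + 1
    return theta, beta

# Toy usage with contraction updates; the fixed point solves
# theta = 0.5*beta + 1 and beta = 0.5*theta, i.e. theta=4/3, beta=2/3.
theta, beta = pla_skeleton(lambda b: 0.5 * b[0] + 1.0,
                           lambda t: np.array([0.5 * t]),
                           theta0=0.0, beta0=[0.0])
```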

Statistical Model
We apply the above algorithm to estimate the parameters of a statistical model used to assess the effect of a road safety measure applied to an experimental site presenting r > 0 mutually exclusive accident types (fatal accidents, seriously injured people, slightly injured people, material damage, etc.) over a fixed period. Let us consider the random vector Y = (Y_{11}, …, Y_{1r}, Y_{21}, …, Y_{2r})^T whose components Y_{1j} and Y_{2j} respectively represent the number of accidents of type j registered on the experimental site before and after the application of the road safety measure. In order to take into account some external factors (such as traffic flow, speed limit variation, weather conditions, regression to the mean effect, etc.), the experimental site is linked to a control area where the safety measure was not directly applied. The accident data for the control area over the same period are given by the non-random vector z = (z_1, …, z_r)^T, where z_j denotes the ratio of the number of accidents of type j registered in the period after to the number of accidents of type j registered in the period before. Following N'Guessan et al. [18], we assume that the vector Y has a multinomial distribution, where n is the total number of crashes recorded at the experimental site and π = (π_{11}, …, π_{1r}, π_{21}, …, π_{2r}) is a vector of class probabilities whose components depend on φ. By construction, this model is a particular case of the one proposed by N'Guessan et al. [18], which was applied simultaneously to several sites.
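As an illustration, assuming class probabilities of the form π_{1j} = β_j/(1 + θ⟨z, β⟩) and π_{2j} = θ z_j β_j/(1 + θ⟨z, β⟩) with Σ_j β_j = 1 (a common form in this family of before/after models; the exact expressions of [18] should be checked against the original), the probability vector can be built as:

```python
import numpy as np

def class_probabilities(theta, beta, z):
    """Class probabilities (pi_11..pi_1r, pi_21..pi_2r) for the
    before/after model sketched in the text.

    Assumed form (not a verbatim quote of the paper's equations):
      pi_1j = beta_j / (1 + theta * <z, beta>)              (before)
      pi_2j = theta * z_j * beta_j / (1 + theta * <z, beta>) (after)
    With sum(beta) = 1, all 2r probabilities sum to 1.
    """
    beta, z = np.asarray(beta, float), np.asarray(z, float)
    denom = 1.0 + theta * z.dot(beta)
    return np.concatenate([beta / denom, theta * z * beta / denom])

# Example with r = 3 accident types; values are illustrative.
pi = class_probabilities(theta=0.8, beta=[0.2, 0.3, 0.5], z=[1.1, 0.9, 1.0])
```

Under this form, θ is interpretable as the average effect of the measure (θ < 1 indicating a reduction) and β_j as the risk of accident type j.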

Cyclic Estimation of the Average Effect and the Different Accident Risks
The log-likelihood ℓ is specified up to an additive constant. N'Guessan et al. [18] showed that a stationary point of the Lagrangian must be a solution to a system of non-linear equations. Drawing our inspiration from the work by N'Guessan [19], we show that this system reduces to the linear system (13). The technical proof uses the Schur complement approach and stems from [17].
One shows (see the appendix) that the elements of the asymptotic information matrix admit closed-form expressions.

Practical Aspect of the Partial Linear Approximation Algorithm
Algorithm 2.
Step 0 (Initialization). Given ε > 0 and the data D_y, set initial values θ^(0) and β^(0); set k = 0.
Step 1 (Loop for computing φ̂). For a given φ^(k), update θ and then β. If the stopping criterion is not satisfied, replace k by k + 1 and return to Step 1. Else, set φ̂ = φ^(k+1) and go to Step 2.
Step 2 (Computation of standard errors). Compute the asymptotic variances at φ̂.
The partial linear approximation algorithm for computing the constrained maximum likelihood estimator φ̂ of the model presented in subsection 4.1 stems from the cyclic algorithm of N'Guessan and Geraldo [20]. The PLA proceeds as follows: Step 1 estimates φ by alternating between its two components θ and β. To start the procedure, we initialize θ^(0) and deduce β^(0); we could also initialize β^(0) using the problem's data and deduce θ^(0). The process is repeated until the stopping criterion is satisfied. Note that our algorithm is automated and can be started as soon as the problem's data D_y are entered.
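A minimal sketch of this cyclic scheme, assuming class probabilities of the form π_{1j} = β_j/(1 + θ⟨z, β⟩) and π_{2j} = θ z_j β_j/(1 + θ⟨z, β⟩) with Σ_j β_j = 1: under this assumed form, the stationarity equations give closed-form component updates. The updates below are derived for illustration under that assumption, not transcribed from the paper's equations:

```python
import numpy as np

def pla_crash_model(y1, y2, z, theta0=1.0, tol=1e-10, max_iter=1000):
    """Cyclic PLA sketch for the assumed before/after multinomial model.

    y1, y2 : before/after crash counts per accident type (length r)
    z      : control-area ratios (length r)
    Component updates (derived under the assumed probability form):
      theta  <- sum(y2) / (sum(y1) * <z, beta>)
      beta_j <- c_j / sum(c),  c_j = (y1_j + y2_j) / (1 + theta * z_j)
    No matrix is inverted at any step.
    """
    y1, y2, z = (np.asarray(a, float) for a in (y1, y2, z))
    beta = np.full(len(z), 1.0 / len(z))   # automated start: uniform beta
    theta = theta0
    for _ in range(max_iter):
        theta_new = y2.sum() / (y1.sum() * z.dot(beta))
        c = (y1 + y2) / (1.0 + theta_new * z)
        beta_new = c / c.sum()
        if abs(theta_new - theta) + np.max(np.abs(beta_new - beta)) < tol:
            return theta_new, beta_new
        theta, beta = theta_new, beta_new
    return theta, beta

# Illustrative counts: 100 crashes before, 65 after, r = 3 types.
theta_hat, beta_hat = pla_crash_model(
    y1=[20, 30, 50], y2=[10, 20, 35], z=[0.9, 1.0, 1.1])
```

Each cycle costs O(r) operations, which is why no matrix inversion (and no second derivative) is needed.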
We can also note that the second partial derivatives of the log-likelihood function are no longer used in our algorithm. The aim of this paper is not to carry out a theoretical study of the convergence of the proposed algorithm. We rather focus on the numerical aspect of this convergence using simulated datasets.

Data Simulation Principle
For a given value of r (the number of crash types), we generate the components of the vector z. The true value of θ, denoted θ_0, is fixed, and the true value of β, denoted β_0, is generated subject to the constraint on its components.
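The simulation scheme can be sketched as follows; the uniform draw for z, the parameter values, and the assumed probability form π_{1j} = β_{0j}/(1 + θ_0⟨z, β_0⟩), π_{2j} = θ_0 z_j β_{0j}/(1 + θ_0⟨z, β_0⟩) are illustrative assumptions:

```python
import numpy as np

def simulate_crash_data(r, n, theta0, beta0, rng):
    """Draw one multinomial crash dataset of size n under the assumed
    probability form; z is drawn uniformly (an illustrative choice)."""
    z = rng.uniform(0.5, 1.5, size=r)           # control-area ratios
    beta0 = np.asarray(beta0, float)
    denom = 1.0 + theta0 * z.dot(beta0)
    pi = np.concatenate([beta0 / denom, theta0 * z * beta0 / denom])
    y = rng.multinomial(n, pi)                   # length-2r count vector
    return z, y[:r], y[r:]                       # z, before, after counts

rng = np.random.default_rng(0)
beta0 = np.array([0.2, 0.3, 0.5])                # sums to 1 (constraint)
z, y1, y2 = simulate_crash_data(r=3, n=5000, theta0=0.8, beta0=beta0, rng=rng)
```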

Numerical Results
This subsection deals with the numerical convergence of the partial linear approximation algorithm. As usual in the study of iterative algorithms, we analyse the influence of the initial solution φ^(0). Among the algorithms used for comparison are those implemented by Varadhan [30].
The simulation process was performed on many simulated crash datasets. For each competing algorithm, the duration ratio is defined as the ratio between its mean computation time and the mean computation time of the PLA (therefore the duration ratio of the partial linear approximation algorithm always equals 1). The computation time depends on the computer used to perform the simulations, while the duration ratio is machine-independent and therefore more useful.
To analyse the convergence, we used the mean squared error (MSE), defined as the average over replications of the squared distance between the estimate and the true parameter vector. One can see that the MSE decreases when n (the total number of road crashes) increases. The PLA is at least as efficient as the other algorithms. It always converges to reasonable and acceptable parameter vector estimates, and the estimate gets closer to the true value as n increases.
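A minimal sketch of the MSE computation over replications (the replication values below are hypothetical):

```python
import numpy as np

def mean_squared_error(phi_hats, phi_true):
    """MSE = average over replications of ||phi_hat - phi_true||^2."""
    phi_hats = np.asarray(phi_hats, dtype=float)
    return np.mean(np.sum((phi_hats - phi_true) ** 2, axis=1))

# Toy check: two replications around the true value (0.8, 0.2);
# each replication contributes a squared error of 0.01.
mse = mean_squared_error([[0.9, 0.2], [0.7, 0.2]], np.array([0.8, 0.2]))
```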
For a small value of n (see Table 1, Table 3 and Table 5), the estimate φ̂ produced by the PLA is relatively close to the true parameter vector and quite close to those of the other methods. More generally, all the compared methods have an MSE of order 10^(−2). However, the estimates produced by the BFGS and Nelder-Mead methods are very far from the true values (see Table 5). In the case of random initialisation, the MSE of the Nelder-Mead and BFGS algorithms is 20 times greater than those of the other algorithms.
When n increases, the MSEs of the Nelder-Mead and BFGS algorithms remain large (Table 4 and Table 6), while those of the PLA and the other methods decrease.

Real-Data Analysis
We apply the partial linear approximation algorithm to data concerning the changes applied to road markings on a rural Nord-Pas-de-Calais road (France) [31]. This road lay-out consists of what is called "Italian marking". The road markings were modified in order to make overtaking impossible in both directions at the same time. Over a short distance, there are two lanes for the first direction (overtaking is then allowed in that direction) and only one for the other direction. Both directions are separated by a white line. A bit farther away, the system is inverted so that overtaking is possible in the opposite direction, and so on. The data recorded over eight years (four years before and four years after) on the marked area are given in Table 7.
Note that the number of fatal crashes (Fatal) decreased from four (in the before period) to one after the lay-out change: there were four accidents with at least one person killed in the period before the road works and one in the period after. The number of slight crashes (no serious bodily injuries involved) recorded in the same period was more than halved. For the same lengths of time, on a portion of National Road 17 used as a control area, the accident counts are the following.
The values given in Table 8 are obtained by dividing, for each accident type, the number of crashes after the changes by the number of crashes before. On the whole, a decrease can be noted on the experimental site in comparison with the control area between the 4-year period after the changes and the 4-year period before. All the algorithms can be applied to Table 7 and Table 8. Since the simulation results suggested that the partial linear approximation algorithm converges after a few iterations and remains steady, we only present the estimates given by the partial linear approximation algorithm (Table 9).

Concluding Remarks
We have presented an estimation method based on a partial linear approximation of the constrained likelihood equations and applied it to a crash data model. We prove that the assumptions are all satisfied and we obtain simple expressions of the estimators and their asymptotic variances. Afterwards, we give a practical version of our algorithm.
The numerical performance of the proposed algorithm is examined through a simulation study and compared to those of classical methods available in R and MATLAB. The choice of these other methods is dictated not only by the fact that they are relevant and integrated in most statistical software, but also by the fact that some need second-order derivatives and others do not. The comparisons suggest that the partial linear approximation algorithm is not only competitive in statistical accuracy and computational speed with the best currently available algorithms, but is also not disturbed by the initial guess.
The dependence of the numerical performance of our algorithm on the particular model used in this paper may seem to limit its competitiveness with regard to the other methods considered here. However, this particular choice of model not only shows the feasibility of our algorithm but also represents a good basis for approaching, under additional conditions, more general models.
The simulation results obtained on the particular model considered in this paper suggest that our algorithm may be extended to other families of multinomial models (such as the model of N'Guessan et al. [18]). Drawing our inspiration from [32], we may also prove the asymptotic strong consistency of the estimator obtained from our algorithm. This perspective will give a wider interest to our algorithm in the context of large-scale use.

Proof of Theorem 2
Proof. Under Assumption 6, Γ_φ is non-singular and, after some matrix manipulations, we obtain the stated variance expressions.