Estimation of the Parameters of the Reiber’s Hyperbolic Function with the Levenberg-Marquardt Algorithm
1. Introduction
Thirty years ago, Hansotto Reiber proposed a hyperbolic model for the diffusion of proteins through the blood-cerebrospinal fluid (CSF) barrier [1]. His model revolutionized CSF analysis and provided solid foundations for modern clinical neurochemistry. The model was theoretically derived from the laws of diffusion (Fick’s laws, see [2]) and empirically confirmed on large patient cohorts, but the parametrization of the hyperbolic functions was originally performed with laborious manual iterations, because important, physiologically driven constraints have to be taken into account that are difficult to formalize. Details of this procedure are discussed in [3]. In the current paper, I show how to apply a numerical optimization algorithm to find those parameters. I need to emphasize that I am not attempting to revisit the theoretical derivation of the blood-CSF barrier diffusion model; I am merely trying to explain the mathematical background of its optimization. Of course, virtually every statistical package today offers fast and reliable methods for non-linear regression; however, I believe it is of paramount importance that neuroscientists interested in CSF protein analysis understand the process of non-linear optimization rather than merely using commands built into statistical software.
2. Materials and Methods
Consider the function

$$f(x;\beta) = \frac{a}{b}\sqrt{x^{2} + b^{2}} - c \qquad (1)$$

parametrized by three positive real numbers, $a$, $b$, and $c$, which we group into a vector, $\beta = (a, b, c)^{\mathrm{T}}$. The first-order derivatives of this function with respect to the three parameters are:

$$\frac{\partial f}{\partial a} = \frac{\sqrt{x^{2}+b^{2}}}{b}, \qquad \frac{\partial f}{\partial b} = -\frac{a\,x^{2}}{b^{2}\sqrt{x^{2}+b^{2}}}, \qquad \frac{\partial f}{\partial c} = -1 \qquad (2)$$
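These two building blocks can be coded directly in R. The sketch below is illustrative only; the helper names `hyper_f` and `hyper_grad` are my own and are not taken from the published implementation:

```r
## Reiber's hyperbolic function, Equation (1): f(x) = (a/b) * sqrt(x^2 + b^2) - c
hyper_f <- function(x, beta) {
  a <- beta[1]; b <- beta[2]; c <- beta[3]
  (a / b) * sqrt(x^2 + b^2) - c
}

## Analytic first-order derivatives, Equation (2), returned as an n x 3 matrix
## (one row per observation: df/da, df/db, df/dc)
hyper_grad <- function(x, beta) {
  a <- beta[1]; b <- beta[2]
  s <- sqrt(x^2 + b^2)
  cbind(d_a = s / b,
        d_b = -a * x^2 / (b^2 * s),
        d_c = rep(-1, length(x)))
}
```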
Now, we assume a study with $n$ paired observations, $(x_{i}, y_{i})$, $i = 1, \ldots, n$, with the vector of the explanatory variables, $\mathbf{x} = (x_{1}, \ldots, x_{n})^{\mathrm{T}}$ (e.g. albumin quotients, $Q_{\mathrm{Alb}}$), and the vector of the corresponding outcome variables, $\mathbf{y} = (y_{1}, \ldots, y_{n})^{\mathrm{T}}$ (e.g. IgG quotients, $Q_{\mathrm{IgG}}$), where a quotient is a dimensionless quantity obtained by dividing the concentration of a given protein in the CSF by its concentration in the serum (e.g. $Q_{\mathrm{Alb}} = \mathrm{Alb}_{\mathrm{CSF}}/\mathrm{Alb}_{\mathrm{Serum}}$). Our goal is to find the vector of the parameters, $\hat{\beta}$, minimizing the Euclidean norm of the vector of the residuals, $\mathbf{r} = \mathbf{y} - f(\mathbf{x};\beta)$, or, stated equivalently, minimizing the sum of the squared vertical distances between the empirical observations, $y_{i}$, and the model predictions, $f(x_{i};\beta)$. This is an unconstrained optimization problem in non-linear regression¹. Formally:

$$\hat{\beta} = \underset{\beta}{\arg\min}\ \sum_{i=1}^{n} \big(y_{i} - f(x_{i};\beta)\big)^{2} \qquad (3)$$
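In R, the residual vector and the objective in Equation (3) take only two lines (a sketch reusing the hypothetical `hyper_f` helper defined above):

```r
## Residuals and the sum of squared residuals minimized in Equation (3)
residuals_f <- function(beta, x, y) y - hyper_f(x, beta)
sse         <- function(beta, x, y) sum(residuals_f(beta, x, y)^2)
```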
This can be achieved, for example, with the Levenberg-Marquardt Algorithm (LMA) [4] [5].
To do so, we first define the $n \times 3$ Jacobian matrix, $\mathbf{J}$, such that its $i$-th row contains the three derivatives defined in Equation (2) evaluated at the $i$-th explanatory variable, $x_{i}$. We also define a damping parameter, $\lambda > 0$. After initialization of the vector of the model parameters and the damping parameter, in the $k$-th iteration of the algorithm the vector of the parameters is updated by:

$$\beta^{(k+1)} = \beta^{(k)} + \delta \qquad (4)$$

where

$$\delta = \big(\mathbf{J}^{\mathrm{T}}\mathbf{J} + \lambda\mathbf{I}\big)^{-1}\mathbf{J}^{\mathrm{T}}\mathbf{r} \qquad (5)$$

where $\mathbf{I}$ is the $3 \times 3$ identity matrix and $\mathbf{r}$ is the vector of the residuals (for notational simplicity, the superscripts referring to the iteration step were omitted). In each step of the algorithm, the damping parameter, $\lambda$, is adjusted as follows. First, the error is evaluated at the new vector of the parameters. If the error has increased, the update is rejected and the damping parameter is increased by some factor (say, 100). This is repeated until the error falls below the error of the previous step. If the error has decreased as a result of the update, the updated parameters are accepted and the damping parameter is decreased (say, by a factor of 100). The algorithm proceeds until convergence, understood as the lack of further significant improvement (for example, when the norm of the updating vector, $\lVert\delta\rVert$, falls below some predefined small positive number, say $10^{-12}$).
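The whole procedure fits into a few lines of R. The sketch below is a minimal illustration of Equations (4)-(5) together with the damping rule just described, not the code published with the paper; the starting value of λ and the iteration cap are my own choices, while the factor 100 and the 10⁻¹² tolerance follow the text:

```r
## Minimal Levenberg-Marquardt loop for the hyperbolic model (illustrative sketch)
lm_fit <- function(x, y, beta0, lambda = 1e-3, factor = 100, tol = 1e-12, maxit = 1000) {
  beta <- beta0
  err  <- sse(beta, x, y)
  for (k in seq_len(maxit)) {
    J <- hyper_grad(x, beta)                  # n x 3 Jacobian, Equation (2)
    r <- residuals_f(beta, x, y)
    repeat {                                  # inflate lambda until the error no longer increases
      delta    <- solve(crossprod(J) + lambda * diag(3), crossprod(J, r))  # Equation (5)
      beta_new <- beta + as.vector(delta)     # Equation (4)
      err_new  <- sse(beta_new, x, y)
      if (err_new <= err) break               # improving (or numerically unchanged) step: accept
      lambda <- lambda * factor
    }
    beta   <- beta_new
    err    <- err_new
    lambda <- lambda / factor                 # successful step: relax the damping
    if (sqrt(sum(delta^2)) < tol) break       # convergence: ||delta|| below the tolerance
  }
  list(coefficients = beta, sse = err, iterations = k)
}
```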
3. Results
The algorithm was implemented in R (4.3.2)². It was first tested on one hundred simulated datasets with 10,000 paired observations each, constructed in such a way that heteroscedastic Normal random noise (with a constant CV of 10%) was added to the deterministic hyperbolic function. The empirical bias, variance, and Mean Squared Error (all multiplied by one thousand for better readability) are reported in Table 1.
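For illustration, one such simulated dataset could be generated and fitted as follows (a sketch under my own assumptions: the true parameters are those listed in Table 1, the range of the explanatory variable, the seed, and the starting values are hypothetical, and the 10% constant CV is implemented as a Normal error with standard deviation equal to 0.1 times the deterministic value):

```r
## One simulated dataset: 10,000 observations with heteroscedastic Normal noise (CV = 10%)
set.seed(1)
n       <- 10000
beta_tr <- c(a = 0.2, b = 0.1, c = 0.05)            # true parameter values as in Table 1
x       <- runif(n, min = 0.002, max = 0.15)        # explanatory variable (assumed range)
mu      <- hyper_f(x, beta_tr)                      # deterministic hyperbolic part
y       <- mu + rnorm(n, mean = 0, sd = 0.10 * mu)  # constant coefficient of variation of 10%
fit     <- lm_fit(x, y, beta0 = c(0.3, 0.2, 0.1))   # arbitrary starting values
fit$coefficients
```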
Table 1. Results of the simulation study; the true values were: $a = 0.2$, $b = 0.1$, and $c = 0.05$.

Parameter | Mean | 10³ × Bias | 10³ × Var | 10³ × MSE
$a$ | 0.1998 | −0.1589 | 0.0337 | 0.0337
$b$ | 0.0999 | −0.0909 | 0.0061 | 0.0061
$c$ | 0.0499 | −0.0932 | 0.0250 | 0.0251
With these results, which clearly indicate that the estimates have very low bias and variance, the algorithm was applied to the real-life dataset kindly provided by H. Reiber. This dataset overlaps with the set used in his 1994 paper to optimize the regression curve in the diagnostically most relevant region of $Q_{\mathrm{Alb}}$. Briefly, the samples were collected from patients aged between 0.5 and 75 years routinely diagnosed for neurologic conditions. Only subjects without intrathecal synthesis of immunoglobulins and without intracerebral haemorrhage were included in the study; CSF samples with blood contamination were excluded. The concentrations of the proteins were measured by automated nephelometry. All further details are given in [1]. With arbitrarily chosen initial values of the parameter vector, $\beta^{(0)}$, the algorithm converged after 11 iterations and delivered the following estimates:
(6)
with the standard errors derived from the well-known approximation of the covariance matrix by $\hat{\sigma}^{2}\big(\mathbf{J}^{\mathrm{T}}\mathbf{J}\big)^{-1}$ at convergence [6], where $\hat{\sigma}^{2}$ is the estimated residual variance. The same estimates were obtained with different initial values, confirming the robustness of the algorithm. It needs to be emphasized that the estimates published in the original paper [1] are strikingly similar to those obtained here. Figure 1 presents the empirical data points and the fitted hyperbolic curve.
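A sketch of how such standard errors can be computed from a fit, reusing the hypothetical helpers defined above and estimating $\hat{\sigma}^{2}$ by the residual sum of squares divided by $n - 3$:

```r
## Approximate standard errors at convergence: Cov(beta_hat) ~ sigma2_hat * (J'J)^(-1)
se_beta <- function(fit, x, y) {
  J          <- hyper_grad(x, fit$coefficients)
  sigma2_hat <- fit$sse / (length(y) - length(fit$coefficients))  # residual variance estimate
  sqrt(diag(sigma2_hat * solve(crossprod(J))))
}
## e.g. se_beta(fit, x, y) for the simulated fit above
```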
Figure 1. Empirical data from 4010 patients and the fitted hyperbolic curve.
4. Discussion and Conclusions
The function in Equation (1) is linear in $a$ and $c$ but non-linear in $b$, which makes the problem of its optimization non-linear. Several numerical optimization methods are available for fitting a non-linear model to a set of data [7]. For example, the Gradient Descent (GD) solves Equation (3) by iteratively updating the parameters of the model in the “downhill” direction of the objective function (the one to be minimized). The update in Equation (5) is replaced by:
$$\delta = \alpha\,\mathbf{J}^{\mathrm{T}}\mathbf{r} \qquad (7)$$

with the positive real number $\alpha$ defining the length of the step in the (steepest-)descent direction. The method is simple and computationally cheap, as it requires only the Jacobian matrix. However, this conceptual and computational simplicity comes at the cost of a large number of iterative steps required for convergence. In the current setting, a simplistic GD with a non-adaptive $\alpha$ needed about 100,000 iterations to converge and turned out to be very sensitive to the initial values. The Gauss-Newton Algorithm (GNA) relies on the assumption that the objective function is approximately quadratic near the optimal solution. Information on this (quadratic) curvature enters the algorithm through the matrix of the second-order partial derivatives, the Hessian matrix, of which $\mathbf{J}^{\mathrm{T}}\mathbf{J}$ is often a good approximation. This results in rapid local convergence. The update becomes:

$$\delta = \big(\mathbf{J}^{\mathrm{T}}\mathbf{J}\big)^{-1}\mathbf{J}^{\mathrm{T}}\mathbf{r} \qquad (8)$$
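A sketch of these two alternative updates in R, reusing the notation of the LM sketch above (`alpha` is the user-chosen step length of the GD):

```r
## Gradient-descent update, Equation (7), and Gauss-Newton update, Equation (8)
gd_step <- function(J, r, alpha) alpha * crossprod(J, r)
gn_step <- function(J, r)        solve(crossprod(J), crossprod(J, r))
```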
The LMA varies the updates between the two optimization methods, taking advantage of both, by adapting the damping parameter, $\lambda$, in each iteration. Small values make the LMA approach the GNA (as the term $\lambda\mathbf{I}$ becomes negligible compared to $\mathbf{J}^{\mathrm{T}}\mathbf{J}$). For large values of $\lambda$, the LMA mimics the GD method (with $\alpha \approx 1/\lambda$). The intuition is that if the error increases we are likely far from the minimum we are looking for; therefore, we need to increase the damping parameter in order to turn the algorithm into a simple GD. If the error decreases, on the other hand, we are probably getting closer to the minimum and we should emphasize the information on the local curvature of the objective function.
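This behaviour can be verified numerically. The snippet below is a sketch that assumes the simulated data `x`, `y` and the helpers `hyper_grad`, `residuals_f`, `gd_step`, and `gn_step` from the earlier sketches; `beta_try` is a hypothetical, non-optimal parameter vector:

```r
## LM step, Equation (5), as a function of the damping parameter
lm_step <- function(J, r, lambda) solve(crossprod(J) + lambda * diag(ncol(J)), crossprod(J, r))

beta_try <- c(0.3, 0.2, 0.1)
J <- hyper_grad(x, beta_try)
r <- residuals_f(beta_try, x, y)
max(abs(lm_step(J, r, 1e-10) - gn_step(J, r)))                    # ~0: LMA approaches the GNA for small lambda
cor(as.vector(lm_step(J, r, 1e10)), as.vector(gd_step(J, r, 1)))  # ~1: LMA parallel to the GD step for large lambda
```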
Hyperbolic functions can easily, yet erroneously, be confused with linear functions. Indeed, for a positive $b$ that is very small compared to $x$, the term under the square root becomes approximately $x^{2}$, and the hyperbolic function converges to a “linear approximation”, $f(x) \approx \frac{a}{b}x - c$. This, however, leads to an obvious problem, since for positive $c$ the linear function becomes negative for small $x$, which is physiologically impossible. Further, it might be tempting to use the statistical properties of the regression curve (e.g. the confidence or the prediction limits) to establish diagnostic references for disease-related interpretations. This is not appropriate, because the confidence bands of a regression curve do not reflect the biological variability; they only show the region where the “true” curve is expected to lie. Instead, following Reiber’s idea, the upper limit of the reference range, $Q_{\mathrm{Lim}}$, can be defined by finding the parameters of a hyperbolic function fitted to the largest empirical outcomes (stratified according to $Q_{\mathrm{Alb}}$), again following the same principle as outlined above for the average curve. Finally, it needs to be emphasized that the diffusion model proposed by H. Reiber applies equally well to the whole range of $Q_{\mathrm{Alb}}$ (even though very large values are never seen in practice) and to other proteins (IgA, IgM, prothrombin), too, and the corresponding hyperbolic curves can be analogously parametrized with the LMA outlined here.
NOTES
¹Strictly speaking, since $f(x;\beta)$ has to be positive for all $x$, this is a constrained problem, requiring $a > c$ (for $x \geq 0$ the minimum of $f$, attained at $x = 0$, equals $a - c$); however, since the empirical observations in the studies this model is developed for are always strictly positive, we may omit this constraint and only check whether the estimates indeed obey $a > c$.
²The code is available at https://github.com/LewczukPiotr/Hyperbolic_Function.