Imaging Applications with Variable Projections

We review two important fields of application for the method of Variable Projections (VARPRO): Seismic Prospecting and Medical Imaging. We cover the period since 2002 when a first review was published. Variable Projections is a method for the solution of nonlinear least squares problems where the model is a linear combination of nonlinear functions. Its success is based on the fact that the reduced functional, where the linear parameters are elimi-nated converges always faster than if the same method of solution is applied to the original problem. More importantly, many of these problems are very hard to solve in their original format and VARPRO has shown its value in many different applications.


Introduction
We consider nonlinear data fitting problems that have as their underlying model a linear combination of nonlinear functions. Models of this type are very common inasmuch as many inverse problems can be viewed as nonlinear data fitting problems [1].
Given a set of observations { } i y , a separable nonlinear least squares problem is defined in [1] where the ith component of the residual vector is written as where the linear parameters have been eliminated.
We define Φ , which will be called the Variable Projection (VP) of y . Its name stems from the fact that the matrix in parentheses is the projector on the orthogonal complement of the column space of ( ) α Φ , which we will denote in what follows by . We will also refer to ( ) This is a more powerful paradigm than the simple idea of alternating between minimization of the two sets of variables (such as the NIPALS algorithm of Wold and Lyttkens [2]), which can be proven theoretically and practically not to result, in general, in the same enhanced performance.
In [3] you can find a full survey of applications of VARPRO in many different fields. The purpose of the current excerpt is to tell the story of these developments and applications since the first review in 2003 [4] in two important fields: geophysics and medical imaging. The common thread for the majority of the papers is the use of variable projections for separable models in a least squares context, while a small percentage uses only the results on the derivatives of pseudoinverses and projectors.

Geophysics, Petroleum Engineering
This is an area of particular interest to one of the authors, who has worked many years in exploration geophysics, mainly on geological modeling, seismic ray tracing and inverse problems [5] [6] [7]. In the past few years VARPRO has been discovered as one of the keys for solving the fundamental earth imaging problem using full waveform inversion [8]. This is a notoriously expensive procedure (it requires many solutions of the wave equation in 3D) and it leads also to multi-V. Pereyra, G. Scherer American Journal of Computational Mathematics modal optimization problems, where local optimization algorithms have difficulties in avoiding sub-optimal minima. It turns out that several different applications within this area can be stated as separable problems, making them amenable to the use of VARPRO, which tends to deliver much better behaved optimization problems in the reduced space of the nonlinear variables. It turns out that Bill Symes had foreseen this early on with his related method of differential semblance [9].
For instance, in [10] the authors tackle the well-known global convergence issue associated to any full waveform inversion (FWI) approach by solving an extended-image space least-squares migration problem to remove any local minima present in the FWI objective function. They discuss the connection between the reflectivity and migration velocity inversion and show the importance of combining the two problems using one objective function. Moreover, they show the full separability of the two inverse problems by using the variable projection method.
Furthermore, in [11], the same authors indicate that the main issue inherent to full waveform inversion (FWI) is its inability to correctly recover the Earth's subsurface seismic parameters from inaccurate starting models. This behavior is due to the presence of local minima in the FWI objective function. To overcome this problem, they propose a new objective function in which they modify the nonlinear modeling operator of the FWI problem by adding a correcting term that ensures phase matching between predicted and observed data. This additional term is computed by demigrating an extended model variable and its contribution is gradually removed during the optimization process while ensuring convergence to the true solution. Since the proposed objective function is quadratic with respect to the extended model variable, they make use of the variable projection method. They refer to this technique as full waveform inversion by model extension (FWIME) and illustrate its potential on two synthetic examples for which FWI fails to retrieve the correct solution.
In [12] the authors consider planar waves events recorded in a seismic array that can be represented as lines in the Fourier domain. However, in the real world, seismic events usually have curvature or amplitude variability, which means that their Fourier transforms are no longer strictly linear but rather occupy conic regions of the Fourier domain that are narrow at low frequencies but broaden at high frequencies, where the effect of curvature becomes more pronounced. One can consider these regions as localized "signal cones". In this work, the authors consider a space-time variable signal cone to model the seismic data. The variability of the signal cone is obtained through scaling, slanting, and translation of the kernel for cone-limited (C-limited) functions (functions whose Fourier transform lives within a cone) or C-Gaussian function (a multivariate function whose Fourier transform decays exponentially with respect to slowness and frequency), which constitutes their dictionary. The authors find a discrete number of scaling, slanting, and translation parameters from a continuum by optimally matching the data. This is a non-linear optimization problem, which is solved by a fixed-point method that utilizes a variable projection method with 1 l constraints on the linear parameters and bound constraints on the non-linear parameters.

Medical and Biological
Many applications to imaging and spectroscopy in biological and medical systems can be found in this section. Again, identification problems are common. A software tool which appears in several articles is the open source package, TIMP [13]. TIMP is a problem solving environment for fitting separable nonlinear models to measurements arising in Physics and Chemistry written in R. It has been extensively tested for time-resolved spectroscopy and FLIM-FRET data (FLIM: Fluorescent Lifetime Imaging Microscopy; FRET: Forster resonance energy transfer).
In [14], a tool for the processing and automatic quality grading in the fish industry is developed based on diffuse reflectance imaging and the subsequent unmixing of the absorption spectra using a constrained least squares model to detect hemoglobin concentration. It is common for the absorption lines to have a Gaussian or Lorentzian distribution shape, so VARPRO is used in the decomposition of the spectra.
In [15], a dynamical model, described by the equations of motion of a springmass-damper system, was set up to estimate the impedance (force-position) for the human elbow. Then, a system identification technique based on predictionerror minimization (PEM) was developed of this non-parametric time-domain model augmented with a parametric noise model. VARPRO is used in the approximation.
The paper [16] describes a methodology to determine the Forster resonance energy transfer in live cells as measured using fluorescence imaging microscopy.
The parameters of the nonlinear quantitative model can be computed using a variable projection type of algorithm. In [17] VARPRO is used to fit NMR spectra to depict small and highly oblique nerves of the lumbosacral plexus. The aim is to use the methodology for diagnostics of pathologies. For medical imaging applications, a radioligand (a radioactive biochemical substance, in particular a radiolabelled substance) or tracer is introduced, usually intravenously. The transport and the binding rate of the tracer are assumed to depend linearly of the difference of the tracer concentration between two compartments, so defining the system of ODE in the tracer concentrations. Often the data measured are sums of the concentrations.

Spectroscopy
After obtaining expressions for the identification of relaxation times associated with kinetic fluorescence decay and those associated with the dynamic evolution of fluorophores (chemical compounds that can re-emit light upon light excitation), the author of [22] suggests the use of variable projection algorithms in the evaluation of photochemical bioimaging, when the fluorophores are used as the probe molecules. In these studies a multi-exponential decay surface can be ascribed to each pixel, where the fluorescence decay times and the corresponding emission or excitation wavelength dependent amplitudes can be recovered by the VARPRO algorithm. The first part of [23] is a survey of the adaptation of the variable projection algorithm to the case of matrix data and of constraints on the linear parameters. This form of least squares approximation to fit linear combination of nonlinear functions is common in the applications considered in the paper, namely, spectroscopy, microscopy and mass spectrometry. The authors emphasize the importance of forming the residual vector in a particular manner to avoid storing and operating with tensor products, and describe a way to compute the covariance for the linear parameters using less memory resources.
The application to spectroscopy involves determining the kinetics of a compartmental model of a photo-system for conversion of photons into chemical energy, using the time-resolved fluorescence measurements. This is a separable nonlinear least squares problem with matrix data Ψ and unknown kinetic model ( ) T C z E , where z are the non-linear parameters and the spectra is represented by the (non-negative) linear coefficient matrix T E : . It is solved using the software package TIMP.
In the application to microscopy the authors consider the detection of a protein-protein interaction by the simultaneous analysis of multiple FLIM images. FLIM counts the photons detected at several time intervals and over many locations. The data analysis gives rise to a separable nonlinear problem with the same nonlinear parameters but different linear ones, and multiple right hand sides (also termed global analysis problem) and with a matrix data: The kinetic processes of the fluorescence decays are exponential and are represented in the columns of ( ) C z convolved with an instrumental response American Journal of Computational Mathematics function (IRF). The rows of E are the amplitudes corresponding to each kinetic process.
An additional point to consider here is the distribution of errors in the FLIM data, a count of number of photons detected at a given position and time. If the count is large, then to assume that the data errors have a Gaussian distribution with mean 0 and variance 2 σ is valid and the least squares criteria acceptable.
For smaller counts, the errors have a Poisson distribution and the least squares estimates are not very good. One possible correction is to weight the data points The third area of application considered is mass spectrometry (MS), in con- Here C are the elution profiles and E the mass spectra resolved with respect to the mass-to-charge ratio. When several samples of the same compound are considered, the elution profiles are different but the mass spectra is considered the same. Often C can be well represented by choosing columns of exponentially modified Gaussians with 4 parameters, width, location, decay rate and amplitude.
The problems can be solved with variable projections algorithms.
The performance of three gradient type algorithms: alternating least squares, VARPRO and the Kaufman simplification are compared in the case of a multi-exponential model of a photo-physical system [24]. Corroborating results of other authors, alternating least squares where the linear and nonlinear parameters are alternatively fixed and the problem minimized over the complementary set, is found to be the least efficient. The Fisher information matrix computation enables the authors to determine a lower bound for the covariance estimate of the precision of the nonlinear parameters, and conclude that in the present case the Kaufman simplification is the most cost efficient.
MEG or magneto-encephalogram is an imaging tool that can measure changes in the neural activity on a very small time scale (of the order of milliseconds). In [25] the authors compare several algorithms that solve the inverse problem: given magnetic field values at a number of measurement points, reconstruct the sources, i.e., compute the location and moment parameters of the set of dipoles whose fields best approximates the data in the least squares sense. This is a large nonlinear optimization problem with a complex objective function and many local minima. However, the model is a separable function, i.e., a linear combina-V. Pereyra, G. Scherer American Journal of Computational Mathematics tion, with coefficients depending on the dipole moment parameters, of nonlinear functions of the location parameters, amenable to variable projection. The authors compare several gradient-free algorithms for the reduced nonlinear problem: simulated annealing, genetic algorithms and a tabu search, and conclude that for the given problem, the best algorithm is a local genetic algorithm.
In 2D spectroscopy, contrary to conventional 1D, the third order optic signal at given population times is a function of two frequencies. In [26] the authors propose a method to analyze the 2D signals using a global analysis method based on the variable projection algorithm. To reduce the dimensionality, the 3D com- The data analysis can also be extended to fit rephasing and non-rephasing data simultaneously by building appending blocks of data to form the matrix Y. As this problem is computationally challenging due to its size, the authors suggest using in the minimization step a subsample of the data in the frequency dimensions, reporting that even using only 5% of the data gives satisfactory results.

Tomography
The next articles concern tomography: the imaging by sectioning of objects, or more generally, 2 and 3D imaging of the inside of objects using different signals and the reconstruction techniques employed to recover the information about these objects from the imaging data. For some of the medical imaging methods, a radiation source is used and the data acquired are multiple 2D images or projections taken from different angles. In transmission tomography, such as computerized tomography (CT), the radiation source is outside the object: a rotating X-ray tube generates X-rays that travel through the object. The detector measures the line integral of the beam intensity and the quantity to be reconstructed is the attenuation coefficient of the medium. In emission tomography (ECT), like PET and SPECT, the radiation source is inside: a radioactive tracer is introduced into the body. In the case of SPECT, the tracers emit gamma radiation that is measured. PET tracers emit positrons that when colliding with nearby electrons produce two gamma photons. The gamma radiation is detected by a rotating gamma camera that acquires multiple 2D images from different angles.
The quantity to be reconstructed in both cases is the distribution and concentration of the radioactive tracer in the different parts of the body. Finally, photo-acoustic tomography is based on the PA effect, the formation of sound waves following light absorption. The PA signals are acquired at several locations around the object using a transducer array, while the goal of the photo-acoustic imaging reconstruction is to retrieve the local pressure rise inside the tissue.
The reconstruction problem is then reduced to solving the linear rectangular system W = u p, where W is the m n × projection matrix, depending on the parameters ( ) timization methods directly. They also consider that a Poisson distribution is more appropriate for the data. Finally, articles [30] and [31] study the newest field in biomedical imaging: photo-acoustic computed tomography (PACT). In this imaging technique, short laser pulses are directed at the object. The absorption of the optical energy produces local heating causing expansion of the tissue and consequent photo-acoustic wave-fields that can be measured outside the body using piezoelectric ultrasonic transducers. The signals that they receive are convolved with their acoustic-electric impulse response (EIR). In [31] the authors choose to incorporate the effect of EIR into the reconstruction. This results in an inverse model with separable linear and nonlinear parameters: To estimate the parameters, the authors alternate the minimization between the linear and the nonlinear parameters.

Nuclear Magnetic Resonance (NMR) Spectroscopy and Imaging
This is one of the most important applications of VARPRO as explained in [4] and we keep seen strong use as many more specific and new results come out for practical applications. The physical phenomenon associated with NMR involves a sample that is placed in a magnetic field and is irradiated with a radio-frequency (RF) pulse of a determined resonant frequency. The nuclei in the sample emit a signal that can be recorded and interpreted. NMR can be used to identify what molecules are present because the resonant frequency (Larmor frequency) depends among other factors, on the molecules. In imaging applications, hydrogen is typically the molecule of interest.
In more detail, the The authors have compared two different algorithms to obtain the perfusion parameters: using Levenberg-Marquardt to estimate all the parameters and applying the variable projection separation of variables idea to define a nonlinear LSQ problem in the parameter p k , followed by the solution of the linear LSQ problem in the two linear parameters. To solve the nonlinear optimization problem in one variable they use Golden Section Search. Their accuracy and noise sensitivity comparisons included numerical simulations using parameter values in the range reported both for normal and malignant tissue to generate simulated MR signals that were later converted back into concentration functions after the determination of the parameters by the two optimization algorithms mentioned above. The results were comparable, but the VP based technique was three times as fast as LM for all the variables. More important for the medical application were the clinical trials with 20 patients. Here the LM failed to converge in approximately 15% of the tissues, including normal and cancerous, whereas the VP technique converged in 100% of cases.
In article [33], global and target analysis of the time-resolved spectra obtained in bioenergetic applications are reviewed. Spectroscopy is used here as a tool to investigate the dynamic properties of complex biological systems through the application of a short pulse of high energy that produces reactions like absorp-V. Pereyra, G. Scherer tion or fluorescence. Often the data collected depend on two parameters, one variable is the wave length and the second is the time after excitation.
To analyze the measurements and estimate the physicochemical parameters, both the kinetics (the compartmental model determined by transitions between the states) and the spectra must be modeled. One assumption is that the system is separable, meaning that the spectroscopic data of a complex compound is the superposition of the spectroscopic properties of the components weighted by their concentration. The simplest unidirectional kinetic model is used in global analysis, when the response is considered to be a sum of a few (2 -4) exponential decays convolved with the instrument response function (IRF), i.e., the data at different wavelengths are all approximated with a single set of exponentials.
Target analysis is used when the problem requires a more complicated kinetic model involving for example forward and reversible reactions, independent decays, etc.
In both cases the parameters in the final model (both the kinetic and the spectra) must be fitted. Assuming normally distributed noise, a nonlinear least squares fit gives the maximum likelihood estimator. The nonlinear least squares approximation leads to a separable nonlinear problem that is solved using the variable projection algorithm. Results for the application to the study of ultrafast dynamics of the photoactive yellow protein are presented.
In [34] a complementary approach to global analysis is described, the so called lifetime density analysis (LDA). It is a technique used in ultrafast (femto-and pico-second) spectroscopy, when investigating energy and charge transfer in complex photosystems. Instead of using a small set of exponential decays to fit the data, one assumes that they are better represented by the integral of a continuous distribution of decays. This integral is discretized using a sum of a large (~100) lifetimes distributed along the time of the experiment. The resulting approximation problem in matrix notation is: Here, n A is the ( ) m n × measurement matrix at the m time delays and n wavelengths, D is the matrix of the IRF convoluted with the exponential delays and x contains the amplitudes associated with the lifetimes. Again, it is a separable nonlinear least squares problem and it is solved by the variable projection algorithm.
Contrary to the global analysis strategy, the LDA approach has a large number of parameters and overfitting must be considered. The authors suggest the use of the well-known regularization techniques, truncated SVD and Tikhonov regularization. They also discuss briefly several other methods which they have included into the software, an open source Python package with GUI, that can be downloaded from the web.
Article [35] describes a technique to obtain more accurate estimates of spectral parameters when evaluating MR spectroscopic imaging (MRSI). One of the difficulties in spectral quantification is that the problem is not well conditioned. American Journal of Computational Mathematics A way to avoid the large uncertainties is to incorporate the spatial smoothness information contained in the tissue properties. So, instead of computing the spectral parameters for each voxel independently, the idea is to work with a joint formulation: Here , a θ are the linear and nonlinear parameters, d the data, and ( ) , R a θ is a two-term penalty function added for regularization: The terms in W θ and  a are designed to impose smoothness of all the parameters across the voxels. The proposed algorithm reformulates the optimization problem (3) as two consecutive subproblems, the first is the nonlinear least squares problem (3) but only with the 2 l penalty term of R , while the second subproblem is again equation (3), now only with the 1 l penalty term of R .
The first subproblem is solved using the variable projection strategy to compute an updated value for the nonlinear parameters θ , which is then fed into the second subproblem. This is then a linear least squares problem in the parameters a with an 1 l -norm regularization term and is solved by an alternating direction method of multipliers.
The technique was tested using several data sets, both from simulated and from in vivo experiments, and its performance was compared with QUEST, another method. The conclusion is that the accuracy is considerably improved, For long echo-time MRS signal modeling, VARPRO is used to fit a sum of complex damped exponentials. In [36] and [37] data quantification of metabolites in the case of short echo-time is considered. The in vivo short echo-time MRS are richer and therefore a more efficient solution is to create first a database of in vitro spectra measurements k v of the individual metabolites. The in vivo signals ( ) y t are then fitted with a combination of these k v , corrected by parameters k α , k ς , and k η to be determined in order to allow for specific scans.
A baseline term ( ) b t that globally represents the signals of other non-dominant, non-specified metabolites possible present, may also be added. This term is approximated by splines. The approximation problem considered is a non-linear least squares problem with a regularizing term: Here, the matrix D is a discrete differential operator acting as a regularization It is shown in [37], by numerical tests, that this algorithm is more efficient and more accurate than a "full" LS solution, where the linear and nonlinear parameters are optimized together. An open source software package AQSES based on the above techniques is described and tested. See [36]. In article [38] this soft- 1 exp If the radio-frequency pulse flip angles are almost perfect (Ernst angle), then the parameter C in the above rational model is close to zero and the model is a very simple one where the linear parameters can easily be separated from the nonlinear one. This model is used in [39] and [41]. The rational model is considered in [40] and the parameters are again determined using VARPRO.
The two articles [42] and [43] describe a software for the quantification of brain metabolites using data obtained from 2D J-resolved magnetic resonance spectroscopy, a technique developed to reduce the spectral overlap common when using clinical strength 3T NMR. The ProFit tool described in the articles implements a variable projection algorithm that also takes into account possible linear relations between parameters of different metabolites, differences between parameters of different spins in the same metabolite and the fixing of specific parameters.
The Dixon technique is a chemical shift imaging method that allows to create fat only or water only images, and can therefore be used when fat or water conceal the signal of interest. In [44], in order to reconstruct the images from a dual-echo Dixon, a voxel-wise cost function is defined in the linear parameters, water and fat magnitudes W ν , F ν and the nonlinear ones, initial phase shared by water and fat 0 ν Φ and ν ω , the phase induced by the inhomogeneity of the static magnetic field. To determine the maximum likelihood estimates of these parameters, a variable projection technique is used to solve a simplified nonlinear least squares problem. The method is applied to in vivo studies of foot/ankle and CE-MRA of thighs. The following study [45], on the effect of intra-or-extracellular water accumulation and intracellular acidification in muscles on the rate of transverse relaxation was performed using spectroscopy before and after exercise. The resulting imaging data was modeled by a sum of exponentials and the maximum likelihood fit, a nonlinear least squares approximation, was solved using VARPRO.
The subject of article [46], diffusion MRI (dMRI) uses the diffusion of water molecules in the generation of MRI tissue images. Molecular diffusion in tissues is constrained by interactions with obstacles such as macromolecules, fibers or membranes. In this paper, MIX, a method to characterize the tissue microstruc-V. Pereyra, G. Scherer ture of white matter fibers is developed. It uses a multi-compartmental model (intra-axonal, extra-axonal, isotropic) and allows for multiple fiber orientations. The data are fitted in the least squares sense to a combination of exponentials with constraints on the linear coefficients. In a first step, the algorithm constructs good initial values both of the linear and the nonlinear parameters. This is accomplished by applying the variable projection technique to separate linear from nonlinear parameters disregarding for the time being the fact that there are constraints on the linear parameters. The reduced nonlinear problem is solved using a stochastic method, a genetic algorithm, and the linear problem with constrained parameters is solved by CVX (Convex programming Matlab program designed by S. Boyd). In a second step, a trust region method is used to estimate all the parameters. The method was tested on synthetic and ex-vivo and in-vivo brain data.
An interesting technique for quantitative MRI inspired by Compressed Sensing and developed recently, Magnetic Resonance Fingerprinting (MRF), is the subject of the article [47]. The aim is to acquire enough magnetic resonance signal information in a reasonable short time to be able to deduce multiple tissue-specific parameters such as 1 T and 2 T times and spin density ( ) ρ x . Given the time restriction, and based on the assumption that the image can be approximated by a low-dimensional model, MRF opts for an under-sampled k-space using incoherent data acquisition schemes.
The data model has the form: Wilson [48] considers a new method for spectral registration. In contrast to previous approaches, the registration problem is formulated as a variable-projection (VARPRO) in the frequency domain. The use of VARPRO allows the incorporation of baseline modeling, whilst also reducing the iterative optimization complexity from two parameters (phase and frequency) to one (frequency). The approach is compared with TDSR (time-domain spectral registration), and found to be more robust to large frequency shifts (>7 Hz), baseline distortions and edited-MRS frequency misalignment. In his Ph. D. Thesis, Zhou [49] discusses and compares various methods for exponential fitting associated with NMRI, including VARPRO. In [50] the authors consider an interesting application to the assessment of myofascial trigger point via MRI for patients with migraine.

Brain Imaging
This is an interesting area with great future potential as more brain research with electro-magnetic methods is performed.
In [51] the authors consider diffusion MRI to map the brain microstructure.
For this purpose they introduce a novel data fitting procedure based on multi-compartment models. Their model is separable and they use VARPRO and a global search algorithm in the nonlinear phase. They show that the algorithm is faster than the commonly used linear search without separation. The accuracy and robustness is demonstrated on synthetic and real data.
Data-acquisition time length is a factor in the applicability of in-vivo spectroscopic imaging. Short data-acquisition time is possible when applying echoplanar spectroscopy imaging (EPSI) but with the disadvantage of a poor signal-to-noise ratio. The approach taken in [52] to obtain a high spatiotemporal resolution is to use a hybrid technique using a first step of double-echo chemical shift imaging (CSI) followed by an EPSI step. In the CSI step, the data sets 1s D and 1L D with a limited number of spatial values but with high temporal resolution are acquired. In the second EPSI stage, a data set 2 D with extended space coverage but limited temporal sampling is obtained. The algorithm uses the union-of-spaces idea, namely that the measured signals are a sum of nuisance, baseline (macromolecules) and metabolite signals. It also assumes partial separability of each sub-signal, i.e., that it can be modeled by a linear combination of

Conclusion
We have reviewed in detail the application of the Variable Projection method for two important areas: Geophysical and Medical Imaging. The common thread among these applications is the solution of separable nonlinear least squares problems. These are problems in which the model is a linear combination of nonlinear functions. The Variable Projection method proceeds to eliminate analytically the linear variables from the problem. This leads to a much improved behavior that has been analyzed quantitatively in [54]. The bottom line is that problems that hitherto have been very difficult to solve, become much tamer under the VP transformation. The reason for this behavior is an area that deserves more scrutiny in the future.