A Connection between Geometrical Spreading and the Adjoint Field in Travel Time Tomography ()

The goal of tomography is to reconstruct a spatially-varying image function *s*(*x*,*m*), where *x* is position and *m* is a finite-length vector of parameters. Many reconstruction methods minimize the total *L*_{2} error *E *≡ **e**^{T}**e**, where individual errors *e*_{i} quantify misfit between predictions and observations, to quantify goodness of fit. So-called adjoint state methods allow the gradient *∂**E*/*∂**m*_{i} to be computed extremely efficiently from an adjoint field, facilitating image reconstruction by gradient-descent methods. We examine the structure of the differential equation for the adjoint field under the ray approximation and find that it has the same form as the transport equation, whose solution involves the well-known geometrical spreading function *R* Consequently, as *R* is routinely tabulated as part of a ray calculation, no extra work is needed to compute the adjoint field, permitting a rapid calculation of the gradient *∂**E*/*∂**m*_{i}.

KEYWORDS

Cite this paper

Menke, W. (2020) A Connection between Geometrical Spreading and the Adjoint Field in Travel Time Tomography. *Applied Mathematics*, **11**, 84-96. doi: 10.4236/am.2020.112009.

1. Introduction

Acoustic, electromagnetic and seismic waves are routinely used to probe the media through which they propagate, and especially to image the spatially-varying velocity field. A fundamental property of these waves that commonly is exploited is their travel time T, defined as the time between the generation of a wave at its source to its detection by a distant observer. In many cases, travel times can be computed under the ray approximation of the exact wave equation, which is valid at high-frequencies when the scale length of heterogeneities in the medium is much larger than the wavelength of the waves. Since the 1970’s, the simplicity of ray calculations has underpinned the use of travel time tomography in a variety of disciplines, including seismology [1] [2], oceanography [3], petroleum exploration [4], geotechnical engineering [5] and cosmology [6]. In some disciplines, ray-based tomography is being superseded by full wavefield methods [7] [8]; nevertheless, it remains an important part of a tomographer’s toolbox on account of its computational efficiency.

Over the last several decades, the development of the so-called adjoint state method [9] has allowed tomographic imaging to be applied in cases where it was hitherto fore infeasible, because of vastly reduced computational effort. To date, this efficiency mainly has used to enable computationally-intensive forms of tomography, and especially to full wavefield tomography [7] [8]. Nevertheless, adjoint state methodology is very widely applicable. It has the potential for significantly speeding up even computationally-light problems, including ray-based tomography. The feasibility of using adjoint state methods in this form of tomography was first investigated by [10], who demonstrate its effectiveness. In this paper, we further explore it application. We study the mathematical structure of the differential equation that arises out of the adjoint state method (the equation for the so-called adjoint field) and show that it is very closely related to and in important cases identical to the transport equation of ray theory. This relationship provides an intuitive understanding of the adjoint field and suggests further ways of obtaining further computational efficiency.

Our analysis is divided into four sections: first, we review how the adjoint state method is used to streamline the computation of a critical quantity need to perform tomography; second, we review the concept of the geometrical spreading of rays and its connection to the transport equation; third, we use the adjunct state method to derive and solve the differential equation for the adjoint field; and lastly, we show that the adjoint equation is very closely related to the transport equation and that its solution can be trivially constructed when the solution to the transport equation (the geometrical spreading function) is known.

2. The Adjoint State Method for Computing the Error Derivative

The main purpose of this section is to define the error derivative, discuss its usefulness and review how the adjoint state method is used to compute it, in the special case where the unknown image is linked to the observed data via the source term in a linear differential equation.

Many types of tomography involve a set of observations $\left\{{T}_{j}^{obs},j=1,\cdots ,N\right\}$, each associated with a spatial position ${x}^{\left(j\right)}$, which are related to an unknown image function $s\left(x\right)$ by the possibly-nonlinear map ${T}_{j}={g}_{j}\left(s\right)$. Here, $x\equiv {\left[x,y,z\right]}^{\text{T}}$ are real spatial coordinates and ${[.]}^{\text{T}}$ denotes transpose. Usually, the data, spatial coordinates and image function are presumed to be real. Because no finite number of observations can define a continuous function, the image is usually approximated with a finite number of parameters; that is, as $s\left(x,m\right)$ with $\left\{{m}_{j},j=1,\cdots ,M\right\}$. For example, the image might be divided into voxels, each with a value ${m}_{j}$. A common approach to image reconstruction is to define individual errors, ${e}_{j}\equiv {T}_{j}^{obs}-{T}_{j}$ and total ${L}_{2}$ error $E\equiv {e}^{\text{T}}e$ and then to find the $m$ that minimizes $\Phi =E+{\epsilon}^{2}\Vert \mathcal{P}s\Vert $ ( [11]; see also [12] [13] [14]). Here, $\epsilon $ is an empirically-chosen constant and $\mathcal{P}$ is a linear operator that embodies prior information, such as smoothness. Among the many optimization procedures put forward for solving this problem, several commonly-used ones, based on the linearized least-squares method (e.g. [14] [15]), employ the partial derivative ${G}_{jk}\equiv \partial {T}_{j}/\partial {m}_{k}$ to compute the data perturbation $\Delta {T}_{j}$ associated with the image perturbation $\Delta m$ via $\Delta T=G\Delta m$. Alternatively, other common-used procedures, based on the gradient-descent method (e.g. [16]), use the partial derivative ${H}_{j}\equiv \partial E/\partial {m}_{j}$ to compute the error perturbation $\Delta E$ associated with the image perturbation $\Delta m$ via $\Delta E={H}^{\text{T}}\Delta m$.

The error derivative ${H}_{j}$ can be computed from the data derivative ${G}_{jk}$ :

${H}_{j}\equiv \frac{\partial E}{\partial {m}_{j}}=\frac{\partial}{\partial {m}_{j}}{\displaystyle {\sum}_{k=1}^{N}{e}_{k}{e}_{k}}=-2{\displaystyle {\sum}_{k=1}^{N}{G}_{kj}{e}_{k}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{or}\text{\hspace{0.17em}}\text{\hspace{0.05em}}H=-2{G}^{\text{T}}e$ (1)

However, recent advances in tomography have followed the realization that ${H}_{j}$ often can be computed without first computing ${G}_{jk}$, giving gradient-descent methods tremendous computational advantage over least-squares methods. The underlying idea of these adjoint state methods [9] [17] is to promote the error to a field $e\left(x\right)$, with the assumption that the data have been measured everywhere, so that $E=\left(e,e\right)$ and:

${H}_{j}\equiv \frac{\partial E}{\partial {m}_{j}}=\frac{\partial}{\partial {m}_{j}}\left(e,e\right)=2\left(e,\frac{\partial e}{\partial {m}_{j}}\right)=-2\left(e,\frac{\partial T}{\partial {m}_{j}}\right)$ (2)

Here $\left(.,.\right)$ is the inner product over spatial coordinates. Now, consider the simple case in which the data solves the linear differential equation $\mathcal{L}T=s$ (together with some appropriate boundary condition). Here, the image $s\left(x,m\right)$ is the source term in the differential equation. By differentiating the differential equation, we obtain an expression for ${G}_{jk}$ :

${G}_{jk}={\frac{\partial T}{\partial {m}_{k}}|}_{{x}^{\left(j\right)}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{with}\text{\hspace{0.17em}}\mathcal{L}\frac{\partial T}{\partial {m}_{k}}=\frac{\partial s}{\partial {m}_{k}}$ (3a)

The partial derivative of total error ${H}_{j}$ is computed by differentiating $T={\mathcal{L}}^{-1}s$ to yield $\partial T/\partial {m}_{j}={\mathcal{L}}^{-1}\partial s/\partial {m}_{j}$ and inserting into (2):

${H}_{j}=-2\left(e,{\mathcal{L}}^{-1}\frac{\partial s}{\partial {m}_{j}}\right)=-2\left({\left({\mathcal{L}}^{-1}\right)}^{\u2020}e,\frac{\partial s}{\partial {m}_{j}}\right)=-2\left(\lambda ,\frac{\partial s}{\partial {m}_{j}}\right)$ (3b)

Here $\u2020$ denotes adjoint and $\lambda \equiv {\left({\mathcal{L}}^{-1}\right)}^{\u2020}e={\left({\mathcal{L}}^{\u2020}\right)}^{-1}e$ is an adjoint field that satisfies the so-called adjoint equation ${\mathcal{L}}^{\u2020}\lambda =e$ (with appropriate boundary conditions). Note that the error $e\left(x\right)$ plays the role of the source term in the adjoint equation.

Cases in which the error is known everywhere are uncommon, since they imply observations within the medium, as contrasted to on its boundary. More typical are the cases where error is at discrete points ${x}^{\left(j\right)}$ on the boundary. These cases are handled by writing defining a partial adjoint field $\stackrel{\u02dc}{\lambda}\left(x\right)$ and error density $\stackrel{\u02dc}{e}\left(x\right)$ :

$\lambda \left(x\right)={\displaystyle \iiint \stackrel{\u02dc}{\lambda}\left(x,{x}^{\prime}\right){\text{d}}^{3}{x}^{\prime}}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{and}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}e\left(x\right)={\displaystyle \iiint \stackrel{\u02dc}{e}\left({x}^{\prime}\right)\delta \left(x-{x}^{\prime}\right){\text{d}}^{3}{x}^{\prime}}$ (4)

Here, $\delta (.)$ is the Dirac impulse function. The resulting equation ${\mathcal{L}}^{\u2020}\stackrel{\u02dc}{\lambda}=\stackrel{\u02dc}{e}\left({x}^{\prime}\right)\delta \left(x-{x}^{\prime}\right)$ is then solved only for those points ${x}^{\prime}={x}^{\left(j\right)}$ at which the error is known and adjoint field $\lambda $ is taken to be the sum of the $\stackrel{\u02dc}{\lambda}s$. This procedure is equivalent to solving the original adjoint equation with error:

$e\left(x\right)={\displaystyle \underset{j=1}{\overset{N}{\sum}}{e}_{j}\delta \left(x-{x}^{\left(j\right)}\right)}$ (5)

3. The Transport Equation of Ray Theory

The main purpose of this section is to review the geometrical interpretation of the transport equation and to highlight its link to ray divergence. However, in order to provide some background for readers unfamiliar with ray theory, and to establish nomenclature, we also present an abridged derivation of the equation.

In many cases, the imaging problem involves a field $u\left(x,t\right)$ that is a function of time t as well as spatial coordinates $x$ and that satisfies a wave equation of the form ${\partial}^{2}u/\partial {t}^{2}-\mathcal{L}\left(m\right)u=\delta \left(t-{t}_{0}\right)\delta \left(x-{x}^{\left(0\right)}\right)$. Here, the differential operator $\mathcal{L}$ contains only spatial derivatives and depends on parameters $m$. The equation reduces to the spatial equation $-{\omega}^{2}-\mathcal{L}\left(m\right)\stackrel{^}{u}=\delta \left(x-{x}^{\left(0\right)}\right)$ after Fourier transformation of time t to angular frequency $\omega $, where $\stackrel{^}{}$ denotes a transformed variable. The ray approximation is the solution to this equation in the limit $\left|\omega \right|\to \infty $, and is achieved by postulating that the solution can be written as a Laurent series of the form [18] :

$\stackrel{^}{u}\left(x,\omega \right)=A\left(x,\omega \right)\mathrm{exp}\left\{i\omega T\left(x\right)\right\}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{with}\text{\hspace{0.17em}}A\left(x,\omega \right)={\displaystyle \underset{k=0}{\overset{\infty}{\sum}}{\left(i\omega \right)}^{-k}{A}^{\left(k\right)}\left(x\right)}$ (6)

Here i is the imaginary unit. The travel time function $T\left(x\right)$ represents the time needed for a fluctuation in u to propagate from ${x}^{\left(0\right)}$ to $x$, and $A\left(x,\omega \right)$ represents its amplitude. The details of the ray solution depend on $\mathcal{L}\left(m\right)$ ; we consider the simple (and common) case $\mathcal{L}\left(m\right)={s}^{2}{\nabla}^{2}$, where $s\left(x,m\right)$ is a slowness function; that is, a material property that is inversely proportional to the local propagation velocity. Inserting (4) into the differential equation and equating equal powers of $\omega $ lead to the Eikonal equation for $T\left(x\right)$ :

$\nabla T\cdot \nabla T={s}^{2}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{with}\text{\hspace{0.17em}}\text{boundary}\text{\hspace{0.17em}}\text{condition}\text{\hspace{0.17em}}\text{\hspace{0.05em}}T\left({x}^{\left(0\right)}\right)=0$ (7)

and a sequence of equations for ${A}^{\left(k\right)}$, the lowest order of which is the transport equation [19] :

$2\nabla T\cdot \nabla {A}^{\left(0\right)}+{A}^{\left(0\right)}{\nabla}^{2}T=0$ (8)

The unit normal to a surface of equal travel time is $\stackrel{^}{t}\left(x\right)={s}^{-1}\nabla T$. A sequence of these vectors connecting surfaces of increasing travel times defines a ray; that is, a parametric curve $x\left(\mathcal{l}\right)$ with arclength $\mathcal{l}$ and tangent $\stackrel{^}{t}\left(\mathcal{l}\right)$ (Figure 1(A)). The volume enclosed by a group of rays is called a ray tube. The Eikonal equation, written as two coupled first order equations in $x\left(\mathcal{l}\right)$ and $\stackrel{^}{t}\left(\mathcal{l}\right)$ is:

$\frac{\text{d}x}{\text{d}\mathcal{l}}=\stackrel{^}{t}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{\text{d}\stackrel{^}{t}}{\text{d}\mathcal{l}}=\stackrel{^}{t}\times \left({s}^{-1}\nabla s\times \stackrel{^}{t}\right)$

with boundary conditions

$x\left(0\right)={x}^{\left(0\right)}$ and $\stackrel{^}{t}\left(0\right)={\stackrel{^}{t}}^{\left(0\right)}$ (9)

The ray’s starting point is ${x}^{\left(0\right)}$ and its take-off direction is ${\stackrel{^}{t}}^{\left(0\right)}$. Travel time is then the path integral of the slowness along the ray, as can be seen by manipulating the formula for the directional derivative $\text{d}/\text{d}\mathcal{l}={\stackrel{^}{t}}_{0}\cdot \nabla $ :

$T\left(x\left(\mathcal{l}\right)\right)={\displaystyle {\int}_{0}^{\mathcal{l}}\left(\stackrel{^}{t}\cdot \nabla T\right)\text{d}{\mathcal{l}}^{\prime}}={\displaystyle {\int}_{0}^{\mathcal{l}}s\left({\mathcal{l}}^{\prime}\right)\text{d}{\mathcal{l}}^{\prime}}$ (10)

The transport equation, written in terms of $\stackrel{^}{t}\left(\mathcal{l}\right)$, is:

$-\frac{\nabla \mathcal{E}}{\mathcal{E}}\cdot \stackrel{^}{t}=\nabla \cdot \stackrel{^}{t}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{with}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\mathcal{E}\equiv s{\left({A}^{\left(0\right)}\right)}^{2}$ (11)

Figure 1. (A) Basic ray theory nomenclature. Wave propagates outward from a source at ${x}^{\left(0\right)}$ (black circle), through the medium, to the surface ${x}_{S}$ (with normal $\stackrel{^}{n}$ ). Surfaces of equal travel time (wave fronts, grey curves) are labeled with their travel times ${T}_{1}$, ${T}_{2}$, etc., Normals to wave fronts define rays (blue curves) with tangents $\stackrel{^}{t}$. Neighboring rays enclosing a solid angle $\text{d}\Omega $ at the source define a ray tube. (B) Relationship between ray tangents $\stackrel{^}{t}$ and ray tube cross-sectional area S. Gauss’s theorem is applied to a small volume V along the ray tube, with the shape of a section of a cone, whose cross-sectional area S changes with arc-length $\mathcal{l}$ and whose volume is $V=S\text{d}\mathcal{l}$. The tangent $\stackrel{^}{t}$ is parallel to the sides of the section and normal to its ends. See text for further discussion.

The quantity $\nabla \cdot \stackrel{^}{t}$ has a simple geometric interpretation, as can be seen by applying Gauss’ theorem (e.g. [20]) to a volume V along a ray tube, which has the shape of a section of a cone (Figure 1(B)). The cross-sectional area of the ray tube increases from S on the end nearest to the source, to $S+\text{d}S$ at a distance $\text{d}\mathcal{l}$ further away. For small volumes, the integral in Gauss’ theorem is $\left(\nabla \cdot \stackrel{^}{t}\right)V$ where $V=S\text{d}\mathcal{l}$. The surface integral in Gauss’ theorem has contributions only from the two ends of the cone, of $-S$ and $\left(S+\text{d}S\right)$ respectively, which sum to dS. Consequently, Gauss’s theorem implies $\left(\nabla \cdot \stackrel{^}{t}\right)={S}^{-1}\text{d}S/\text{d}\mathcal{l}$ and the transport equation becomes:

$-\frac{1}{\mathcal{E}}\frac{\text{d}\mathcal{E}}{\text{d}\mathcal{l}}=\frac{1}{S}\frac{\text{d}S}{\text{d}\mathcal{l}}$ (12)

According to the transport equation, the fractional decrease in $\mathcal{E}$, measured along a ray, is equal to the fractional increase in area S of the ray tube. In many cases, the quantity $\mathcal{E}$ has the interpretation of the energy density, so the transport equation embodies conservation of energy. Conventionally, the area of the ray tube is written $S\left(\mathcal{l}\right)={R}^{2}\left(\mathcal{l}\right)\text{d}\Omega $, where ${R}^{2}\left(\mathcal{l}\right)$ is the geometrical spreading function and $\text{d}\Omega $ is the solid angle subtended by the ray tube at the source (e.g. [19]). Consequently, $\mathcal{E}\left(\mathcal{l}\right)=c{R}^{-2}\left(\mathcal{l}\right)$ where c is a constant. Ray-tracing algorithms that solve (9) typically tabulate both T and R (e.g. [21] [22]).

4. Adjoint Equation for Travel Time Tomography

The main purpose of this section is to derive and solve the adjoint equation needed to compute the quantity ${H}_{j}$, the derivative of the total travel time error with respect to a model parameter controlling the slowness of the medium. Our derivation focuses on expressing the equation in terms of quantities that vary along rays, so that it can be readily compared to the transport Equation (11). Our derivation is equivalent to, but different than, the one by [10], being a direct application of perturbation theory, as contrasted to one that employs Lagrange multipliers.

In travel time tomography, travel time observations $T\left({x}^{\left(j\right)}\right)$ are considered to be the data, and the slowness $s\left(x\right)$, or rather its approximation $s\left(x,m\right)$, is the image function. In order to apply the adjunct methodology as outlined in the Introduction, the non-linear Eikonal equation must be linearized about a “background” solution. Let the slowness equal a background slowness ${s}_{0}$ plus a small perturbation $\epsilon {s}_{1}$, where $\epsilon $ is a small parameter, and the corresponding travel time equal a background travel time ${T}_{0}$ plus a small perturbation $\epsilon {T}_{1}$. Then to first order in $\epsilon $, the Eikonal equation becomes:

$\nabla T\cdot \nabla T=\nabla \left({T}_{0}+\epsilon {T}_{1}\right)\cdot \nabla \left({T}_{0}+\epsilon {T}_{1}\right)={\left({s}_{0}+\epsilon {s}_{1}\right)}^{2}\approx {s}_{0}^{2}+2\epsilon {s}_{0}{s}_{1}$ (13)

Equating terms of equal order in $\epsilon $ yields equations for the background travel time ${T}_{0}$ and the perturbation in travel time ${T}_{1}$ :

$\nabla {T}_{0}\cdot \nabla {T}_{0}={s}_{0}^{2}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\stackrel{^}{t}}_{0}\cdot \nabla {T}_{1}={s}_{1}$ (14a,b)

Equation (14b) indicates that the component of $\nabla {T}_{1}$ in the direction of the background ray direction ${\stackrel{^}{t}}_{0}$ is ${s}_{1}$. Since ${s}_{1}$ plays the role the source term in the differential equation, the formulation in (3) is applicable. If we define $\text{d}\mathcal{l}$ to be an increment of arc length along the unperturbed ray, then this is just an equation involving the directional derivative $\text{d}/\text{d}\mathcal{l}={\stackrel{^}{t}}_{0}\cdot \nabla $ :

$\frac{\text{d}{T}_{1}}{\text{d}\mathcal{l}}={s}_{1}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{which}\text{\hspace{0.17em}}\text{has}\text{\hspace{0.17em}}\text{solution}\text{\hspace{0.17em}}{T}_{1}\left(\mathcal{l}\right)={\displaystyle {\int}_{0}^{\mathcal{l}}{s}_{1}\left({\mathcal{l}}^{\prime}\right)\text{d}{\mathcal{l}}^{\prime}}$ (15)

The perturbation in travel time is the integral of the perturbation in slowness along the unperturbed ray. We rewrite the equation for ${T}_{1}$ as:

$\mathcal{L}{T}_{1}={s}_{1}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{where}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\mathcal{L}={s}_{0}^{-1}\nabla {T}_{0}\cdot \nabla ={s}_{0}^{-1}\left[\begin{array}{ccc}\frac{\partial {T}_{0}}{\partial x}& \frac{\partial {T}_{0}}{\partial y}& \frac{\partial {T}_{0}}{\partial z}\end{array}\right]\left[\begin{array}{c}\partial /\partial x\\ \partial /\partial y\\ \partial /\partial z\end{array}\right]$ (16)

Using the rules ${\left({\mathcal{L}}_{1}{\mathcal{L}}_{2}\right)}^{\u2020}={\mathcal{L}}_{2}^{\text{T}\u2020}{\mathcal{L}}_{1}^{\text{T}\u2020}$ and ${\left(\text{d}/\text{d}x\right)}^{\u2020}=-\text{d}/\text{d}x$ (e.g., [23]) we obtain an expression for the adjoint equation:

${\mathcal{L}}^{\u2020}\lambda =-\left[\begin{array}{ccc}\frac{\partial}{\partial x}& \frac{\partial}{\partial y}& \frac{\partial}{\partial z}\end{array}\right]\left[\begin{array}{c}\partial {T}_{0}/\partial x\\ \partial {T}_{0}/\partial y\\ \partial {T}_{0}/\partial z\end{array}\right]\left({s}_{0}^{-1}\lambda \right)=-\nabla \cdot \left[\nabla {T}_{0}\left({s}_{0}^{-1}\lambda \right)\right]=-\nabla \cdot \left[{\stackrel{^}{t}}_{0}\lambda \right]={e}_{0}$ (17)

As is typical of first-order equations, the “left hand” boundary condition associated with $\mathcal{L}$ implies a “right hand” boundary condition for ${\mathcal{L}}^{\u2020}$ (e.g. [14]); that is, while $T=0$ at the source $\mathcal{l}=0,\lambda =0$ at as the end point of the ray $\mathcal{l}={\mathcal{l}}_{B}$ (where it touches the boundary of the medium).

The adjoint Equation (17) can be further manipulated:

$\begin{array}{l}-\frac{\nabla \lambda}{\lambda}\cdot {\stackrel{^}{t}}_{0}-\frac{{e}_{0}}{\lambda}=\nabla \cdot {\stackrel{^}{t}}_{0}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{or}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\frac{\text{d}\lambda}{\text{d}\mathcal{l}}+P\left(\mathcal{l}\right)\lambda =Q\left(\mathcal{l}\right)\\ \text{with}\text{\hspace{0.05em}}\text{\hspace{0.17em}}P\left(\mathcal{l}\right)=\frac{1}{S}\frac{\text{d}S}{\text{d}\mathcal{l}}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{and}\text{\hspace{0.05em}}\text{\hspace{0.17em}}Q\left(\mathcal{l}\right)=-{e}_{0}\end{array}$ (18)

The formal solution to (17) is well-known (e.g. [24]):

$\lambda \left(\mathcal{l}\right)=\frac{\left(C+v\left(\mathcal{l}\right)\right)}{\mu \left(\mathcal{l}\right)}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{with}\text{\hspace{0.17em}}\mu \left(\mathcal{l}\right)=\mathrm{exp}\left\{\stackrel{\mathcal{l}}{{\displaystyle \int}}P\left({\mathcal{l}}^{\prime}\right)\text{d}{\mathcal{l}}^{\prime}\right\}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{and}\text{\hspace{0.17em}}\text{\hspace{0.05em}}v\left(\mathcal{l}\right)=\stackrel{\mathcal{l}}{{\displaystyle \int}}\mu \left({\mathcal{l}}^{\prime}\right)Q\left({\mathcal{l}}^{\prime}\right){\mathcal{l}}^{\prime}$ (19)

Here the constant C is chosen to enforce the boundary condition $\lambda \left({\mathcal{l}}_{B}\right)=0$.

5. Analysis of the Role of the Geometrical Spreading

The main purpose of this section is show that the solution to the adjoint equation can be constructed from the geometrical spreading function, and to interpret this result.

In any region in which ${e}_{0}=0$, the adjoint Equation (18) has the same form as the transport Equation (12). Since the error $e\left(x\right)$ is rarely known within the medium, but rather only on its boundary ${x}_{B}$, this restriction is satisfied by all commonly-encountered cases. As we will show below, the similarity of form provides considerable insight into the behavior of the adjoint field $\lambda $.

Ray divergence enters into the adjoint equation through the $\nabla \cdot {\stackrel{^}{t}}_{0}$ term. In order to highlight its contribution, we first examine a solution in which this term is zero. Consider a plane wave propagating in the z-direction through a homogenous layer with $0\le z\le {z}_{B}$ (Figure 2). The background travel time is ${T}_{0}={s}_{0}z$, the ray direction is $\stackrel{^}{t}={s}_{0}^{-1}\nabla {T}_{0}=\stackrel{^}{z}$ and $\mathcal{l}=z$. The plane wave satisfies the background Eikonal equation (14a),since $\nabla {T}_{0}\cdot \nabla {T}_{0}={s}_{0}^{2}\left(\stackrel{^}{z}\cdot \stackrel{^}{z}\right)={s}_{0}^{2}$. Since the rays of a plane wave do not diverge, $\nabla \cdot {\stackrel{^}{t}}_{0}=0$.

Now consider the case where the background slowness is everywhere too small by an amount b, so that the background error ${e}_{0}={T}^{obs}-{T}_{0}$ grows linearly with distance z; that is, ${e}_{0}\left(x,y,z\right)=bz$. We will assume that this error is known only on the boundary $z={z}_{B}$. Following (5), the adjoint equation is $\text{d}\lambda /\text{d}z=-b{z}_{B}\delta \left(z-{z}_{B}\right)$. Because of the Dirac impulse function, the boundary condition for $\lambda $ requires some scrutiny. We will consider that the error is defined just below the boundary, at $\equiv {z}_{B}-{\u03f5}^{2}$, where ${\u03f5}^{2}\ll {z}_{B}$. In order to satisfy both the boundary condition of $\lambda \left({z}_{B}\right)=0$ and the adjoint equation, the solution must be discontinuous at ${z}_{B}^{-}$ ; and in the immediate vicinity of ${z}_{B}^{-}$ must be $\lambda \left(z\right)=b{z}_{B}^{-}H\left({z}_{B}^{-}-z\right)$. Effectively, the boundary condition is $\lambda \left({z}_{S}\right)={e}_{0}\left(x,y,{z}_{S}\right)$. The solution of the adjoint equation is $\lambda \left(z\right)=b{z}_{B}$ ; note that it does not depend upon z.

Now consider a slowness perturbation in the form of a very thin rectangular prism, centered at ${z}_{H}$, of thickness D, and having sides at ${x}_{1}$ and ${x}_{2}={x}_{1}+L$, and ${y}_{1}$ and ${y}_{2}={y}_{1}+L$ (so that its volume is $D{L}^{2}$ ). Since the prism is very thin, it can be approximated as a Dirac impulse function in depth z:

$\begin{array}{l}\epsilon {s}_{1}\left(x,y,z\right)={m}_{1}W\left(x,{x}_{1},{x}_{2}\right)W\left(y,{y}_{1},{y}_{2}\right)D\delta \left(z-{z}_{H}\right)\\ \text{with}\text{\hspace{0.17em}}\text{\hspace{0.05em}}W\left(x,{x}_{1},{x}_{2}\right)\equiv H\left(x-{x}_{1}\right)H\left({x}_{2}-x\right)\end{array}$ (20)

Here, $H(.)$ is the Heaviside function, which is unity when its argument is positive and zero otherwise. The partial derivative of total error is:

$\begin{array}{c}{H}_{1}=-2\left(\frac{\text{d}\left(\epsilon {s}_{1}\right)}{\text{d}{m}_{1}},\lambda \right)\\ =-2D{\displaystyle \iiint W\left(x,{x}_{1},{x}_{2}\right)W\left(x,{y}_{1},{y}_{2}\right)\delta \left(z-{z}_{H}\right)b{z}_{B}^{}\text{d}x\text{d}y\text{d}z}\\ =-2b{z}_{B}D{L}^{2}\end{array}$ (21)

An expected, ${H}_{1}<0$, since increasing ${m}_{1}$ lowers the error. Also as expected, ${H}_{1}$ is proportional to the area ${L}^{2}$ of the prism, since the larger its area, the larger the region to which the slowness perturbation is applied. Interestingly, ${H}_{1}$ is independent of the position ${z}_{H}$ of the prism; that is, the prism can be moved up or down without affecting the error. As we will show below, this insensitivity to position is due to the absence of ray divergence in this plane wave case.

We now consider a spherical wave propagating in the r-direction in through a homogenous sphere with $0\le r\le {r}_{B}$ (Figure 3), described by spherical polar coordinates $\left(r,\theta ,\phi \right)$. The background travel time is ${T}_{0}={s}_{0}r$, the ray direction is $\stackrel{^}{t}={s}_{0}^{-1}\nabla {T}_{0}=\stackrel{^}{r}$ and $\mathcal{l}=r$. The spherical wave satisfies the background Eikonal Equation (14a), since $\nabla {T}_{0}\cdot \nabla {T}_{0}={s}_{0}^{2}\left(\stackrel{^}{r}\cdot \stackrel{^}{r}\right)={s}_{0}^{2}$. The area of a ray tube is

Figure 2. Rays (blue) of a plane wave cross a layer with bottom and top surfaces at $z=0$ and $z={z}_{B}$, respectively. A prismatic slowness perturbation (red rectangle) is placed within the layer, with left and right edges at $x={x}_{1}$ and $x={x}_{2}$, respectively. The travel time error $e\left(x,{z}_{B}\right)$, measured on the upper surface (top plot), is reduced in the region ${x}_{1}<x<{x}_{2}$ where the rays project the prism. Because the rays do not diverge, the size of this region is independent of the depth of the perturbation.

Figure 3. Rays (blue) of a spherical wave start at a source at the center of the sphere at $r=0$ and propagate outward through the sphere to its surface at $r={r}_{B}$. A slowness perturbation with the shape of a spherical cap (red cap) is placed within the sphere at radius ${r}_{H}$, with left and right edges at polar angle $-{\theta}_{H}$ and $+{\theta}_{H}$, respectively. The travel time error $e\left(\theta ,{r}_{B}\right)$, measured on the upper surface, is reduced in the region where the perturbation is projected by the rays (graph at top).

$S={r}^{2}\text{d}\Omega $, from whence we conclude that the geometrical spreading function is ${R}^{2}\left(r\right)={r}^{2}$ and the ray divergence is $\nabla \cdot {\stackrel{^}{t}}_{0}={S}^{-1}\text{d}S/\text{d}\mathcal{l}=2/r$. As in the plane wave case, the background slowness is everywhere too small by an amount b, leading to a background error ${e}_{0}\left(r,\theta ,\phi \right)=br$. We will assume that this error is known only on the boundary $r={r}_{B}$. The adjoint Equation (18) reduces to:

$\frac{\text{d}\lambda}{\text{d}r}+\frac{2}{r}\lambda =0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{with}\text{\hspace{0.17em}}\text{boundary}\text{\hspace{0.17em}}\text{condition}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\lambda \left({r}_{B},\theta ,\phi \right)=b{r}_{H}$ (22)

The solution is $\lambda \left(r,\theta ,\phi \right)=\left(b{r}_{B}\right)\left({r}_{B}^{2}/{r}^{2}\right)$. As is asserted in the Introduction, the solution to this transport-like equation is related to the geometrical spreading function by $\lambda \left(r,\theta ,\phi \right)\propto {R}^{-2}$.

Now consider a slowness perturbation in the form of a very thin spherical cap of fixed thickness D, centered at ${r}_{H}$ and ${\phi}_{H}=0$ and subtending a variable polar angle area ${\theta}_{H}$ such that its area is fixed as ${L}^{2}=2\pi {r}_{H}^{2}\left(1-\mathrm{cos}{\theta}_{H}\right)$ :

$\epsilon {s}_{1}\left(x,y,z\right)={m}_{1}H\left({\theta}_{H}-\theta \right)D\delta \left(r-{r}_{H}\right)$ (23)

For a position ${r}_{H}$ away from the origin where a spherical cap of thickness D and area ${L}^{2}$ is possible, the partial derivative of total error is:

$\begin{array}{c}{H}_{1}=-2\left(\frac{\text{d}\left(\epsilon {s}_{1}\right)}{\text{d}{m}_{1}},\lambda \right)\\ =-2b{r}_{B}^{3}D{\displaystyle \iiint [\n}\end{array}$