The goal of tomography is to reconstruct a spatially-varying image function $s(\mathbf{x}, \mathbf{m})$, where $\mathbf{x}$ is position and $\mathbf{m}$ is a finite-length vector of parameters. Many reconstruction methods quantify goodness of fit with the total $L_2$ error $E \equiv \mathbf{e}^T \mathbf{e}$, where the individual errors $e_i$ measure the misfit between predictions and observations, and then minimize it. So-called adjoint state methods allow the gradient $\partial E / \partial m_i$ to be computed extremely efficiently from an adjoint field, facilitating image reconstruction by gradient-descent methods. We examine the structure of the differential equation for the adjoint field under the ray approximation and find that it has the same form as the transport equation, whose solution involves the well-known geometrical spreading function $R$. Consequently, as $R$ is routinely tabulated as part of a ray calculation, no extra work is needed to compute the adjoint field, permitting a rapid calculation of the gradient $\partial E / \partial m_i$.

Acoustic, electromagnetic and seismic waves are routinely used to probe the media through which they propagate, and especially to image the spatially-varying velocity field. A fundamental property of these waves that is commonly exploited is their travel time $T$, defined as the time between the generation of a wave at its source and its detection by a distant observer. In many cases, travel times can be computed under the ray approximation of the exact wave equation, which is valid at high frequencies, when the scale length of heterogeneities in the medium is much larger than the wavelength of the waves. Since the 1970s, the simplicity of ray calculations has underpinned the use of travel time tomography in a variety of disciplines, including seismology [

Over the last several decades, the development of the so-called adjoint state method [

Our analysis is divided into four sections: first, we review how the adjoint state method is used to streamline the computation of a critical quantity needed to perform tomography; second, we review the concept of the geometrical spreading of rays and its connection to the transport equation; third, we use the adjoint state method to derive and solve the differential equation for the adjoint field; and lastly, we show that the adjoint equation is very closely related to the transport equation and that its solution can be trivially constructed when the solution to the transport equation (the geometrical spreading function) is known.

The main purpose of this section is to define the error derivative, discuss its usefulness and review how the adjoint state method is used to compute it, in the special case where the unknown image is linked to the observed data via the source term in a linear differential equation.

Many types of tomography involve a set of observations $\{ T_j^{obs}, j = 1, \cdots, N \}$, each associated with a spatial position $\mathbf{x}^{(j)}$, which are related to an unknown image function $s(\mathbf{x})$ by the possibly-nonlinear map $T_j = g_j(s)$. Here, $\mathbf{x} \equiv [x, y, z]^T$ are real spatial coordinates and $[\cdot]^T$ denotes transpose. Usually, the data, spatial coordinates and image function are presumed to be real. Because no finite number of observations can define a continuous function, the image is usually approximated with a finite number of parameters; that is, as $s(\mathbf{x}, \mathbf{m})$ with $\{ m_j, j = 1, \cdots, M \}$. For example, the image might be divided into voxels, each with a value $m_j$. A common approach to image reconstruction is to define individual errors, $e_j \equiv T_j^{obs} - T_j$, and total $L_2$ error $E \equiv \mathbf{e}^T \mathbf{e}$, and then to find the $\mathbf{m}$ that minimizes $\Phi = E + \varepsilon^2 \| P s \|$ ( [

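The error derivative $H_j$ can be computed from the data derivative $G_{jk}$:

$$H_j \equiv \frac{\partial E}{\partial m_j} = \frac{\partial}{\partial m_j} \sum_{k=1}^{N} e_k e_k = -2 \sum_{k=1}^{N} G_{kj} e_k \quad \text{or} \quad \mathbf{H} = -2 \mathbf{G}^T \mathbf{e} \tag{1}$$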
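As a rough numerical illustration of Equation (1), the sketch below assumes a hypothetical linear prediction $T = G m$ with randomly generated $G$, $m$ and observations (none of these values come from the paper) and compares $-2 G^T e$ with a finite-difference estimate of $\partial E / \partial m_j$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 6, 4                        # illustrative numbers of data and parameters
G = rng.normal(size=(N, M))        # hypothetical data derivative, G[k, j] = dT_k/dm_j
m = rng.normal(size=M)             # current model parameters
T_obs = rng.normal(size=N)         # synthetic "observations"


def total_error(model):
    """Total L2 error E = e^T e for the assumed linear prediction T = G m."""
    e = T_obs - G @ model
    return e @ e


e = T_obs - G @ m
H = -2.0 * G.T @ e                 # Equation (1)

# Independent check: central finite differences of E with respect to each m_j.
h = 1e-6
H_fd = np.array([
    (total_error(m + h * unit) - total_error(m - h * unit)) / (2.0 * h)
    for unit in np.eye(M)
])
print(H, H_fd)
```

The two estimates should agree to within finite-difference rounding. Note that this route requires the full matrix $G$, which is the cost that adjoint state methods avoid.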

However, recent advances in tomography have followed the realization that $H_j$ often can be computed without first computing $G_{jk}$, giving gradient-descent methods a tremendous computational advantage over least-squares methods. The underlying idea of these adjoint state methods [

$$H_j \equiv \frac{\partial E}{\partial m_j} = \frac{\partial}{\partial m_j} (e, e) = 2 \left( e, \frac{\partial e}{\partial m_j} \right) = -2 \left( e, \frac{\partial T}{\partial m_j} \right) \tag{2}$$

Here $(\cdot, \cdot)$ is the inner product over spatial coordinates. Now, consider the simple case in which the data solves the linear differential equation $L T = s$ (together with some appropriate boundary condition). Here, the image $s(\mathbf{x}, \mathbf{m})$ is the source term in the differential equation. By differentiating the differential equation, we obtain an expression for $G_{jk}$:

$$G_{jk} = \left. \frac{\partial T}{\partial m_k} \right|_{\mathbf{x}^{(j)}} \quad \text{with} \quad L \frac{\partial T}{\partial m_k} = \frac{\partial s}{\partial m_k} \tag{3a}$$

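The partial derivative of total error $H_j$ is computed by differentiating $T = L^{-1} s$ to yield $\partial T / \partial m_j = L^{-1} \, \partial s / \partial m_j$ and inserting into (2):

$$H_j = -2 \left( e, L^{-1} \frac{\partial s}{\partial m_j} \right) = -2 \left( (L^{-1})^{\dagger} e, \frac{\partial s}{\partial m_j} \right) = -2 \left( \lambda, \frac{\partial s}{\partial m_j} \right) \tag{3b}$$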

Here $\dagger$ denotes adjoint and $\lambda \equiv (L^{-1})^{\dagger} e = (L^{\dagger})^{-1} e$ is an adjoint field that satisfies the so-called adjoint equation $L^{\dagger} \lambda = e$ (with appropriate boundary conditions). Note that the error $e(\mathbf{x})$ plays the role of the source term in the adjoint equation.
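The adjoint route can be checked against the direct route in a small discrete analogue. In this sketch the operator is a hypothetical backward-difference matrix `L_mat`, the image is assumed to depend linearly on the parameters through a made-up basis `B` (so that $\partial s / \partial m_j$ is the j-th column of `B`), and the inner product is the ordinary dot product over grid nodes; all of these are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, dl = 40, 3, 0.1
# Hypothetical discretized operator: backward difference d/dl, with T = 0 before the first node.
L_mat = (np.eye(n) - np.eye(n, k=-1)) / dl
B = rng.normal(size=(n, M))        # assumed linear image model s = B m, so ds/dm_j = B[:, j]
m = rng.normal(size=M)
T_obs = rng.normal(size=n)         # synthetic observations at every node

T = np.linalg.solve(L_mat, B @ m)  # forward problem L T = s
e = T_obs - T                      # error field

# Adjoint route: a single solve of L^T lambda = e, followed by M inner products.
lam = np.linalg.solve(L_mat.T, e)
H_adjoint = -2.0 * B.T @ lam

# Direct route: M solves to build G = L^{-1} B, then H = -2 G^T e.
G = np.linalg.solve(L_mat, B)
H_direct = -2.0 * G.T @ e
print(H_adjoint, H_direct)
```

Both routes give the same vector; the adjoint route needs one linear solve regardless of the number of parameters, whereas the direct route needs one solve per parameter.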

Cases in which the error is known everywhere are uncommon, since they imply observations within the medium, as contrasted to on its boundary. More typical are the cases where the error is known only at discrete points $\mathbf{x}^{(j)}$ on the boundary. These cases are handled by defining a partial adjoint field $\tilde{\lambda}(\mathbf{x}, \mathbf{x}')$ and error density $\tilde{e}(\mathbf{x})$:

$$\lambda(\mathbf{x}) = \iiint \tilde{\lambda}(\mathbf{x}, \mathbf{x}') \, d^3 x' \quad \text{and} \quad e(\mathbf{x}) = \iiint \tilde{e}(\mathbf{x}') \, \delta(\mathbf{x} - \mathbf{x}') \, d^3 x' \tag{4}$$

Here, $\delta(\cdot)$ is the Dirac impulse function. The resulting equation $L^{\dagger} \tilde{\lambda} = \tilde{e}(\mathbf{x}') \, \delta(\mathbf{x} - \mathbf{x}')$ is then solved only for those points $\mathbf{x}' = \mathbf{x}^{(j)}$ at which the error is known, and the adjoint field $\lambda$ is taken to be the sum of the $\tilde{\lambda}$s. This procedure is equivalent to solving the original adjoint equation with error:

$$e(\mathbf{x}) = \sum_{j=1}^{N} e_j \, \delta(\mathbf{x} - \mathbf{x}^{(j)}) \tag{5}$$

The main purpose of this section is to review the geometrical interpretation of the transport equation and to highlight its link to ray divergence. However, in order to provide some background for readers unfamiliar with ray theory, and to establish nomenclature, we also present an abridged derivation of the equation.

In many cases, the imaging problem involves a field $u(\mathbf{x}, t)$ that is a function of time $t$ as well as spatial coordinates $\mathbf{x}$ and that satisfies a wave equation of the form $\partial^2 u / \partial t^2 - L(\mathbf{m}) u = \delta(t - t_0) \, \delta(\mathbf{x} - \mathbf{x}^{(0)})$. Here, the differential operator $L$ contains only spatial derivatives and depends on parameters $\mathbf{m}$. The equation reduces to the spatial equation $\left( -\omega^2 - L(\mathbf{m}) \right) \hat{u} = \delta(\mathbf{x} - \mathbf{x}^{(0)})$ after Fourier transformation of time $t$ to angular frequency $\omega$, where $\hat{\ }$ denotes a transformed variable. The ray approximation is the solution to this equation in the limit $|\omega| \to \infty$, and is achieved by postulating that the solution can be written as a Laurent series of the form [

$$\hat{u}(\mathbf{x}, \omega) = A(\mathbf{x}, \omega) \exp\{ i \omega T(\mathbf{x}) \} \quad \text{with} \quad A(\mathbf{x}, \omega) = \sum_{k=0}^{\infty} (i\omega)^{-k} A^{(k)}(\mathbf{x}) \tag{6}$$

Here $i$ is the imaginary unit. The travel time function $T(\mathbf{x})$ represents the time needed for a fluctuation in $u$ to propagate from $\mathbf{x}^{(0)}$ to $\mathbf{x}$, and $A(\mathbf{x}, \omega)$ represents its amplitude. The details of the ray solution depend on $L(\mathbf{m})$; we consider the simple (and common) case $L(\mathbf{m}) = s^{-2} \nabla^2$, where $s(\mathbf{x}, \mathbf{m})$ is a slowness function; that is, a material property that is inversely proportional to the local propagation velocity. Inserting (6) into the differential equation and equating equal powers of $\omega$ leads to the Eikonal equation for $T(\mathbf{x})$:

$$\nabla T \cdot \nabla T = s^2 \quad \text{with boundary condition} \quad T(\mathbf{x}^{(0)}) = 0 \tag{7}$$

and a sequence of equations for $A^{(k)}$, the lowest order of which is the transport equation [

$$2 \nabla T \cdot \nabla A^{(0)} + A^{(0)} \nabla^2 T = 0 \tag{8}$$
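A quick sanity check of the Eikonal and transport equations is possible for the simplest case, a point source at the origin of a homogeneous medium, where $T = s r$ and $A^{(0)} \propto 1/r$. The sketch below evaluates both residuals with central finite differences at one test point; the slowness value, the test point and the step sizes are arbitrary illustrative choices.

```python
import numpy as np

s = 0.5                                   # constant slowness (illustrative value)


def T(x):
    """Travel time from a source at the origin in a homogeneous medium."""
    return s * np.linalg.norm(x)


def A(x):
    """Leading-order amplitude, up to a constant factor."""
    return 1.0 / np.linalg.norm(x)


def grad(f, x, h=1e-4):
    """Central-difference gradient of a scalar function of 3 variables."""
    g = np.zeros(3)
    for i in range(3):
        d = np.zeros(3)
        d[i] = h
        g[i] = (f(x + d) - f(x - d)) / (2.0 * h)
    return g


def laplacian(f, x, h=1e-3):
    """Second-difference Laplacian of a scalar function of 3 variables."""
    total = 0.0
    for i in range(3):
        d = np.zeros(3)
        d[i] = h
        total += (f(x + d) - 2.0 * f(x) + f(x - d)) / h**2
    return total


x = np.array([1.0, 2.0, 2.0])             # test point at r = 3, away from the source
eikonal_residual = grad(T, x) @ grad(T, x) - s**2
transport_residual = 2.0 * grad(T, x) @ grad(A, x) + A(x) * laplacian(T, x)
print(eikonal_residual, transport_residual)
```

Both residuals should vanish to within the finite-difference truncation error.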

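The unit normal to a surface of equal travel time is $\hat{\mathbf{t}}(\mathbf{x}) = s^{-1} \nabla T$. A sequence of these vectors connecting surfaces of increasing travel times defines a ray; that is, a parametric curve $\mathbf{x}(l)$ with arclength $l$ and tangent $\hat{\mathbf{t}}(l)$ (

$$\frac{d\mathbf{x}}{dl} = \hat{\mathbf{t}} \quad \text{and} \quad \frac{d\hat{\mathbf{t}}}{dl} = \hat{\mathbf{t}} \times \left( s^{-1} \nabla s \times \hat{\mathbf{t}} \right)$$

with boundary conditions

$$\mathbf{x}(0) = \mathbf{x}^{(0)} \quad \text{and} \quad \hat{\mathbf{t}}(0) = \hat{\mathbf{t}}^{(0)} \tag{9}$$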

The ray’s starting point is $\mathbf{x}^{(0)}$ and its take-off direction is $\hat{\mathbf{t}}^{(0)}$. Travel time is then the path integral of the slowness along the ray, as can be seen by manipulating the formula for the directional derivative $d/dl = \hat{\mathbf{t}} \cdot \nabla$:

$$T(\mathbf{x}(l)) = \int_0^l \left( \hat{\mathbf{t}} \cdot \nabla T \right) dl' = \int_0^l s(l') \, dl' \tag{10}$$
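As a small worked example of Equation (10), consider a ray that travels parallel to a constant slowness gradient, so that the ray stays straight. The sketch below accumulates the travel time with the trapezoid rule and compares it with the closed-form integral; the slowness values and the sampling are hypothetical.

```python
import numpy as np

s_start, g = 0.5, 0.02            # hypothetical slowness at l = 0 and its gradient along the ray
l = np.linspace(0.0, 10.0, 101)   # arclength samples along the (straight) ray
s = s_start + g * l               # slowness sampled along the ray

# Cumulative trapezoid rule: T[i] approximates the integral of s from 0 to l[i].
segments = 0.5 * (s[1:] + s[:-1]) * np.diff(l)
T = np.concatenate(([0.0], np.cumsum(segments)))

# Closed form for a linear slowness profile.
T_exact = s_start * l + 0.5 * g * l**2
print(T[-1], T_exact[-1])
```

Because the integrand is linear in $l$, the trapezoid rule reproduces the closed form to rounding error; for a general slowness field the same accumulation would be applied along a numerically traced ray.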

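The transport equation, written in terms of $\hat{\mathbf{t}}(l)$, is:

$$-\frac{\nabla E}{E} \cdot \hat{\mathbf{t}} = \nabla \cdot \hat{\mathbf{t}} \quad \text{with} \quad E \equiv s \left( A^{(0)} \right)^2 \tag{11}$$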

The quantity $\nabla \cdot \hat{\mathbf{t}}$ has a simple geometric interpretation, as can be seen by applying Gauss’ theorem (e.g. [

$$-\frac{1}{E} \frac{dE}{dl} = \frac{1}{S} \frac{dS}{dl} \tag{12}$$

According to the transport equation, the fractional decrease in $E$, measured along a ray, is equal to the fractional increase in area $S$ of the ray tube. In many cases, the quantity $E$ has the interpretation of the energy density, so the transport equation embodies conservation of energy. Conventionally, the area of the ray tube is written $S(l) = R^2(l) \, d\Omega$, where $R^2(l)$ is the geometrical spreading function and $d\Omega$ is the solid angle subtended by the ray tube at the source (e.g. [

The main purpose of this section is to derive and solve the adjoint equation needed to compute the quantity H j , the derivative of the total travel time error with respect to a model parameter controlling the slowness of the medium. Our derivation focuses on expressing the equation in terms of quantities that vary along rays, so that it can be readily compared to the transport Equation (11). Our derivation is equivalent to, but different than, the one by [

In travel time tomography, travel time observations $T(\mathbf{x}^{(j)})$ are considered to be the data, and the slowness $s(\mathbf{x})$, or rather its approximation $s(\mathbf{x}, \mathbf{m})$, is the image function. In order to apply the adjoint methodology as outlined in the Introduction, the non-linear Eikonal equation must be linearized about a “background” solution. Let the slowness equal a background slowness $s_0$ plus a small perturbation $\varepsilon s_1$, where $\varepsilon$ is a small parameter, and the corresponding travel time equal a background travel time $T_0$ plus a small perturbation $\varepsilon T_1$. Then to first order in $\varepsilon$, the Eikonal equation becomes:

$$\nabla T \cdot \nabla T = \nabla (T_0 + \varepsilon T_1) \cdot \nabla (T_0 + \varepsilon T_1) = (s_0 + \varepsilon s_1)^2 \approx s_0^2 + 2 \varepsilon s_0 s_1 \tag{13}$$

Equating terms of equal order in $\varepsilon$ yields equations for the background travel time $T_0$ and the perturbation in travel time $T_1$:

$$\nabla T_0 \cdot \nabla T_0 = s_0^2 \quad \text{and} \quad \hat{\mathbf{t}}_0 \cdot \nabla T_1 = s_1 \tag{14a,b}$$

Equation (14b) indicates that the component of $\nabla T_1$ along the background ray direction $\hat{\mathbf{t}}_0$ is $s_1$. Since $s_1$ plays the role of the source term in the differential equation, the formulation in (3) is applicable. If we define $dl$ to be an increment of arc length along the unperturbed ray, then this is just an equation involving the directional derivative $d/dl = \hat{\mathbf{t}}_0 \cdot \nabla$:

$$\frac{dT_1}{dl} = s_1 \quad \text{which has solution} \quad T_1(l) = \int_0^l s_1(l') \, dl' \tag{15}$$

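The perturbation in travel time is the integral of the perturbation in slowness along the unperturbed ray. We rewrite the equation for $T_1$ as:

$$L T_1 = s_1 \quad \text{where} \quad L = s_0^{-1} \nabla T_0 \cdot \nabla = s_0^{-1} \begin{bmatrix} \dfrac{\partial T_0}{\partial x} & \dfrac{\partial T_0}{\partial y} & \dfrac{\partial T_0}{\partial z} \end{bmatrix} \begin{bmatrix} \partial / \partial x \\ \partial / \partial y \\ \partial / \partial z \end{bmatrix} \tag{16}$$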

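Using the rules $(L_1 L_2)^{\dagger} = L_2^{T\dagger} L_1^{T\dagger}$ and $(d/dx)^{\dagger} = -d/dx$ (e.g., [

$$L^{\dagger} \lambda = -\begin{bmatrix} \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \end{bmatrix} \begin{bmatrix} \partial T_0 / \partial x \\ \partial T_0 / \partial y \\ \partial T_0 / \partial z \end{bmatrix} \left( s_0^{-1} \lambda \right) = -\nabla \cdot \left[ \nabla T_0 \left( s_0^{-1} \lambda \right) \right] = -\nabla \cdot \left[ \hat{\mathbf{t}}_0 \lambda \right] = e_0 \tag{17}$$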
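The rule $(d/dx)^{\dagger} = -d/dx$, and the way a boundary condition moves from one end of the ray to the other under the adjoint, can be seen in a discrete analogue. The sketch below is restricted to a single ray with no spreading (an assumption made only to keep the example one-dimensional): the derivative is a backward-difference matrix that implicitly sets the field to zero before the first node, and its transpose turns out to be minus a forward difference that implicitly sets the field to zero after the last node.

```python
import numpy as np

n, dl = 50, 0.1
# d/dl as a backward difference; row 0 assumes the field vanishes before node 0.
D = (np.eye(n) - np.eye(n, k=-1)) / dl
D_adj = D.T                                   # discrete adjoint under the dot product
# Minus a forward difference; the last row assumes the field vanishes after node n-1.
minus_forward = -(np.eye(n, k=1) - np.eye(n)) / dl

rng = np.random.default_rng(2)
u, v = rng.normal(size=n), rng.normal(size=n)
lhs = (D @ u) @ v                             # (L u, v)
rhs = u @ (D_adj @ v)                         # (u, L^dagger v)
print(np.allclose(D_adj, minus_forward), lhs, rhs)
```

The inner-product identity holds for arbitrary vectors, and the transposed matrix is exactly the negated forward difference with the zero condition at the far end.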

As is typical of first-order equations, the “left hand” boundary condition associated with L implies a “right hand” boundary condition for L † (e.g. [

The adjoint Equation (17) can be further manipulated:

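$$-\frac{\nabla \lambda}{\lambda} \cdot \hat{\mathbf{t}}_0 - \frac{e_0}{\lambda} = \nabla \cdot \hat{\mathbf{t}}_0 \quad \text{or} \quad \frac{d\lambda}{dl} + P(l) \lambda = Q(l) \quad \text{with} \quad P(l) = \frac{1}{S} \frac{dS}{dl} \quad \text{and} \quad Q(l) = -e_0 \tag{18}$$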

The formal solution to (17) is well-known (e.g. [

$$\lambda(l) = \frac{C + v(l)}{\mu(l)} \quad \text{with} \quad \mu(l) = \exp\left\{ \int^{l} P(l') \, dl' \right\} \quad \text{and} \quad v(l) = \int^{l} \mu(l') \, Q(l') \, dl' \tag{19}$$

Here the constant $C$ is chosen to enforce the boundary condition $\lambda(l_B) = 0$.
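The integrating-factor solution can be evaluated numerically once $P$ and $Q$ are tabulated along a ray. The sketch below uses the standard form $\lambda = (C + v)/\mu$ with $\mu = \exp\{\int P\}$ and takes both integrals from the boundary point $l_B$, so that $\mu(l_B) = 1$ and $v(l_B) = 0$. For illustration it assumes spherical spreading, $P = 2/l$, a constant source $Q = -q$ and a prescribed end value `lam_B` standing in for the effective boundary value; these choices are hypothetical and were picked because the exact solution is known in closed form.

```python
import numpy as np


def integral_from_end(f, l):
    """Trapezoid estimate of the integral of f from l[-1] to each l[i]."""
    seg = 0.5 * (f[1:] + f[:-1]) * np.diff(l)                    # integrals over [l_i, l_{i+1}]
    tail = np.concatenate((np.cumsum(seg[::-1])[::-1], [0.0]))   # integrals from l_i to l[-1]
    return -tail                                                 # reverse the direction of integration


l_B, lam_B, q = 2.0, 1.0, 0.3          # illustrative boundary position, end value, source strength
l = np.linspace(0.5, l_B, 4001)        # arclength grid that stays away from the source at l = 0
P = 2.0 / l                            # (1/S) dS/dl for spherical spreading
Q = np.full_like(l, -q)

mu = np.exp(integral_from_end(P, l))   # integrating factor, equal to 1 at l_B
v = integral_from_end(mu * Q, l)       # equal to 0 at l_B
C = lam_B                              # enforces lam(l_B) = lam_B
lam = (C + v) / mu

# Closed-form solution of lam' + 2 lam / l = -q with lam(l_B) = lam_B.
lam_exact = -q * l / 3.0 + (lam_B + q * l_B / 3.0) * l_B**2 / l**2
print(np.max(np.abs(lam - lam_exact)))
```

With $q = 0$ the result reduces to $\lambda \propto l^{-2}$, the inverse of the spreading factor.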

The main purpose of this section is to show that the solution to the adjoint equation can be constructed from the geometrical spreading function, and to interpret this result.

In any region in which $e_0 = 0$, the adjoint Equation (18) has the same form as the transport Equation (12). Since the error $e(\mathbf{x})$ is rarely known within the medium, but rather only on its boundary $\mathbf{x}_B$, this restriction is satisfied by all commonly-encountered cases. As we will show below, the similarity of form provides considerable insight into the behavior of the adjoint field $\lambda$.

Ray divergence enters into the adjoint equation through the $\nabla \cdot \hat{\mathbf{t}}_0$ term. In order to highlight its contribution, we first examine a solution in which this term is zero. Consider a plane wave propagating in the $z$-direction through a homogeneous layer with $0 \le z \le z_B$ (

Now consider the case where the background slowness is everywhere too small by an amount $b$, so that the background error $e_0 = T^{obs} - T_0$ grows linearly with distance $z$; that is, $e_0(x, y, z) = b z$. We will assume that this error is known only on the boundary $z = z_B$. Following (5), the adjoint equation is $d\lambda / dz = -b z_B \, \delta(z - z_B)$. Because of the Dirac impulse function, the boundary condition for $\lambda$ requires some scrutiny. We will consider that the error is defined just below the boundary, at $z_B^- \equiv z_B - \epsilon^2$, where $\epsilon^2 \ll z_B$. In order to satisfy both the boundary condition $\lambda(z_B) = 0$ and the adjoint equation, the solution must be discontinuous at $z_B^-$, and in the immediate vicinity of $z_B^-$ must be $\lambda(z) = b z_B^- H(z_B^- - z)$. Effectively, the boundary condition is $\lambda(z_S) = e_0(x, y, z_S)$. The solution of the adjoint equation is $\lambda(z) = b z_B$; note that it does not depend upon $z$.

Now consider a slowness perturbation in the form of a very thin rectangular prism, centered at $z_H$, of thickness $D$, and having sides at $x_1$ and $x_2 = x_1 + L$, and $y_1$ and $y_2 = y_1 + L$ (so that its volume is $D L^2$). Since the prism is very thin, it can be approximated as a Dirac impulse function in depth $z$:

$$\varepsilon s_1(x, y, z) = m_1 \, W(x, x_1, x_2) \, W(y, y_1, y_2) \, D \, \delta(z - z_H) \quad \text{with} \quad W(x, x_1, x_2) \equiv H(x - x_1) \, H(x_2 - x) \tag{20}$$

Here, $H(\cdot)$ is the Heaviside function, which is unity when its argument is positive and zero otherwise. The partial derivative of total error is:

$$H_1 = -2 \left( \frac{d(\varepsilon s_1)}{d m_1}, \lambda \right) = -2 D \iiint W(x, x_1, x_2) \, W(y, y_1, y_2) \, \delta(z - z_H) \, b z_B \, dx \, dy \, dz = -2 b z_B D L^2 \tag{21}$$

As expected, $H_1 < 0$, since increasing $m_1$ lowers the error. Also as expected, $H_1$ is proportional to the area $L^2$ of the prism, since the larger its area, the larger the region to which the slowness perturbation is applied. Interestingly, $H_1$ is independent of the position $z_H$ of the prism; that is, the prism can be moved up or down without affecting the error. As we will show below, this insensitivity to position is due to the absence of ray divergence in this plane wave case.
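The plane wave result can be cross-checked by computing the total error directly and differentiating it numerically. In the sketch below, every vertical ray that crosses the thin prism is delayed by $m_1 D$, so the error on the part of the surface directly above the prism drops from $b z_B$ to $b z_B - m_1 D$; the total error is taken to be the integral of the squared error over a measurement surface of arbitrary finite area. All numerical values are hypothetical.

```python
b, z_B, D, L = 0.02, 10.0, 0.1, 2.0   # slowness deficit, layer thickness, prism thickness and side
surface_area = 25.0                   # area of the measurement surface (any value >= L**2)


def total_error(m1):
    """Integral of e^2 over the surface z = z_B for prism amplitude m1."""
    e_above_prism = b * z_B - m1 * D  # rays through the prism are delayed by m1 * D
    e_elsewhere = b * z_B
    return L**2 * e_above_prism**2 + (surface_area - L**2) * e_elsewhere**2


H1_formula = -2.0 * b * z_B * D * L**2        # Equation (21)
h = 1e-6
H1_fd = (total_error(h) - total_error(-h)) / (2.0 * h)
print(H1_formula, H1_fd)
```

The prism depth $z_H$ never enters `total_error`, which is the discrete counterpart of the observation that $H_1$ does not depend on where the prism sits along the ray.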

We now consider a spherical wave propagating in the $r$-direction through a homogeneous sphere with $0 \le r \le r_B$ (

$S = r^2 \, d\Omega$, from which we conclude that the geometrical spreading function is $R^2(r) = r^2$ and the ray divergence is $\nabla \cdot \hat{\mathbf{t}}_0 = S^{-1} \, dS/dl = 2/r$. As in the plane wave case, the background slowness is everywhere too small by an amount $b$, leading to a background error $e_0(r, \theta, \varphi) = b r$. We will assume that this error is known only on the boundary $r = r_B$. The adjoint Equation (18) reduces to:

$$\frac{d\lambda}{dr} + \frac{2}{r} \lambda = 0 \quad \text{with boundary condition} \quad \lambda(r_B, \theta, \varphi) = b r_B \tag{22}$$

The solution is $\lambda(r, \theta, \varphi) = (b r_B)(r_B^2 / r^2)$. As is asserted in the Introduction, the solution to this transport-like equation is related to the geometrical spreading function by $\lambda(r, \theta, \varphi) \propto R^{-2}$.

Now consider a slowness perturbation in the form of a very thin spherical cap of fixed thickness $D$, centered at $r_H$ and $\varphi_H = 0$ and subtending a variable polar angle $\theta_H$ such that its area is fixed as $L^2 = 2 \pi r_H^2 (1 - \cos \theta_H)$:

$$\varepsilon s_1(x, y, z) = m_1 \, H(\theta_H - \theta) \, D \, \delta(r - r_H) \tag{23}$$

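For a position $r_H$ away from the origin where a spherical cap of thickness $D$ and area $L^2$ is possible, the partial derivative of total error is:

$$\begin{aligned}
H_1 &= -2 \left( \frac{d(\varepsilon s_1)}{d m_1}, \lambda \right) = -2 b r_B^3 D \iiint \left[ H(\theta_H - \theta) \, \delta(r - r_H) \right] \left[ r^{-2} \right] r^2 \sin\theta \, dr \, d\theta \, d\varphi \\
&= -2 b r_B^3 D \iint H(\theta_H - \theta) \sin\theta \, d\theta \, d\varphi \int \delta(r - r_H) \, dr = \left( -2 b D r_B^3 \right) \frac{2 \pi (1 - \cos\theta_H) \, r_H^2}{r_H^2} \\
&= \left( -2 b r_B^3 D \right) \frac{L^2}{r_H^2} = -2 b r_B D L^2 \left( \frac{r_B}{r_H} \right)^2 = -2 b r_B D L^2 \, \frac{R^2(r_B)}{R^2(r_H)}
\end{aligned} \tag{24}$$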
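The spherical result can be cross-checked in the same spirit as the plane wave case. In this sketch, rays through a cap of area $L^2$ at radius $r_H$ fill a solid angle $L^2 / r_H^2$ and therefore reach an area $r_B^2 L^2 / r_H^2$ of the outer surface, where the error drops from $b r_B$ to $b r_B - m_1 D$; the total error is the integral of the squared error over the whole surface. The numerical values and the list of trial radii are hypothetical.

```python
import numpy as np

b, r_B, D, L2 = 0.02, 10.0, 0.1, 4.0   # slowness deficit, sphere radius, cap thickness, cap area


def H1_formula(r_H):
    """Equation (24): error derivative for a thin cap at radius r_H."""
    return -2.0 * b * r_B * D * L2 * (r_B / r_H) ** 2


def total_error(m1, r_H):
    """Integral of e^2 over the surface r = r_B for cap amplitude m1."""
    solid_angle = L2 / r_H**2                    # equals 2*pi*(1 - cos(theta_H))
    area_hit = r_B**2 * solid_angle              # surface area reached by rays through the cap
    area_rest = 4.0 * np.pi * r_B**2 - area_hit
    return area_hit * (b * r_B - m1 * D) ** 2 + area_rest * (b * r_B) ** 2


h = 1e-6
radii = np.array([2.0, 5.0, 8.0])                # trial cap radii, all large enough for the cap to fit
H1_fd = np.array([(total_error(h, r) - total_error(-h, r)) / (2.0 * h) for r in radii])
H1_eq24 = H1_formula(radii)
print(H1_eq24, H1_fd)
```

Unlike the plane wave case, the derivative grows as the cap is moved toward the source, following the factor $(r_B / r_H)^2$.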

The spherical wave solution (24) differs from the plane wave solution (21) by a factor that involves the ratio of geometrical spreading functions, $R(r_B) / R(r_H)$, evaluated at the surface and at the heterogeneity. The area on the surface of the sphere subtended by the cap decreases with its radius $r_H$; a cap at smaller radius decreases the error $e(r_S, \theta, \varphi)$ over a wider region (

Although the adjoint field is singular at the source (ray starting point) $r = 0$, the partial derivative $H_k$ is finite there, as can be seen by considering a spherical heterogeneity of radius $r_H$ centered on the origin of the form $\varepsilon s_1(x, y, z) = m_1 H(r_H - r)$:

$$\begin{aligned}
H_1 &= -2 \left( \frac{d(\varepsilon s_1)}{d m_1}, \lambda \right) = -2 b r_B^3 \iiint \left[ H(r_H - r) \right] \left[ r^{-2} \right] dr \; r \, d\theta \; r \sin\theta \, d\varphi \\
&= -2 b r_B^3 \iint \sin\theta \, d\theta \, d\varphi \int_0^{r_H} dr = \left( -2 b r_B^3 \right) (4\pi) \, r_H = -8 \pi b r_B^3 r_H
\end{aligned} \tag{25}$$

When the background slowness $s_0(\mathbf{x})$ is spatially varying, the rays have a complicated spatial pattern and the background error $e_0(\mathbf{x}_B)$, measured on the boundary $\mathbf{x}_B$, is spatially varying. Suppose that the medium has a surface $\mathbf{x}_B$ with outward pointing normal $\hat{\mathbf{n}}_B(\mathbf{x}_B)$. A ray connecting an interior point $\mathbf{x}$ to $\mathbf{x}_B$ can be labeled by $\mathbf{x}_B$. Then, $\mathbf{x}_B(\mathbf{x})$ means the point on the boundary at which a ray passing through $\mathbf{x}$ ends, and arc-length $l(\mathbf{x}, \mathbf{x}_B)$ means the distance at $\mathbf{x}$ along the ray that ends at $\mathbf{x}_B(\mathbf{x})$. Similarly, the geometrical spreading function can be written as $R(\mathbf{x}, \mathbf{x}_B)$; that is, the geometrical spreading function at $\mathbf{x}$ associated with the ray that ends at $\mathbf{x}_B$. The adjoint field is then:

$$\lambda(\mathbf{x}) = \frac{e_0(\mathbf{x}_B)}{\hat{\mathbf{t}}(\mathbf{x}_B) \cdot \hat{\mathbf{n}}_B(\mathbf{x}_B)} \, \frac{R^2(\mathbf{x}_B, \mathbf{x}_B)}{R^2(\mathbf{x}, \mathbf{x}_B)} \tag{26}$$

Here, the dot product between the ray tangent and surface normal is introduced to account for the increased surface area intersected by the ray tube, in the case (unlike the examples above) where the ray tube obliquely impinges upon the boundary. Now, suppose that the slowness perturbation is represented with voxels, where voxel $k$ has volume $V_k$, amplitude $m_k$, and centroid position $\mathbf{x}^{(k)}$. When the adjoint field varies slowly compared to the length scale of a voxel (a requirement that excludes the source point), the error derivative is:

$$H_k = -2 \left( \frac{d(\varepsilon s_1)}{d m_k}, \lambda \right) \approx -2 V_k \, \frac{e_0(\mathbf{x}_B)}{\hat{\mathbf{t}}(\mathbf{x}_B) \cdot \hat{\mathbf{n}}_B(\mathbf{x}_B)} \, \frac{R^2(\mathbf{x}_B, \mathbf{x}_B)}{R^2(\mathbf{x}^{(k)}, \mathbf{x}_B)} \tag{27}$$

Here $\mathbf{x}_B$ is the end point of the ray passing through $\mathbf{x}^{(k)}$. This result emphasizes the link between the geometrical spreading function $R$ and the partial derivative of total error $E$. (When the voxel is close to, or overlaps, the origin, $H_k$ is still well-defined and finite, but the inner product in (27) must be computed appropriately.)
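For a voxel parameterization the error derivative reduces to a few array operations once the spreading function has been tabulated. The sketch below specializes to the homogeneous sphere treated earlier, with the source at the origin, $R = r$, rays meeting the boundary at normal incidence (so the tangent-normal dot product is 1) and a boundary error $b r_B$ on every ray. The function name, voxel positions and volumes are hypothetical, and voxels near the source are assumed to have been excluded.

```python
import numpy as np


def voxel_error_derivative(centroids, volumes, b, r_B):
    """Approximate H_k for voxels in a homogeneous sphere with the source at the origin.

    Assumes R(x) = |x|, normal incidence at the boundary, boundary error e0 = b * r_B,
    and voxels small compared with their distance from the source.
    """
    r = np.linalg.norm(centroids, axis=1)          # R at each voxel centroid
    e0 = b * r_B                                   # error at the ray end point
    return -2.0 * volumes * e0 * (r_B / r) ** 2    # -2 V_k e0 R^2(x_B) / R^2(x_k)


centroids = np.array([[0.0, 0.0, 2.0],
                      [3.0, 4.0, 0.0],
                      [0.0, 8.0, 0.0]])            # hypothetical voxel centroids
volumes = np.full(3, 0.4)                          # hypothetical voxel volumes
H = voxel_error_derivative(centroids, volumes, b=0.02, r_B=10.0)
print(H)
```

With a voxel volume equal to $D L^2$, these values coincide with the thin-cap result (24) at the same radii. In a heterogeneous medium the radius would be replaced by the spreading function tabulated during ray tracing, and oblique incidence would have to be accounted for.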

The key result in this paper is the demonstration that the adjoint equation in ray-based travel time tomography has the same form as the well-known transport equation for ray theoretical amplitudes. Consequently, the spatial variation of the adjoint field $\lambda$ is completely controlled by the geometrical spreading function $R$. This result provides an intuitive understanding of the primary factor controlling the size of the partial derivative $H_j = \partial E / \partial m_j$ of total $L_2$ error $E$ with respect to the slowness $m_j$ of a voxel. The partial derivative $H_j$ is large when ray divergence causes the projection of the voxel on the measurement surface to be large. Since this result provides an explicit formula for $\lambda$ in terms of $R$, it enables $H_j$ to be calculated without resorting to the numerical solution of the adjoint equation. Only an inner product needs to be calculated, and in the case of a voxel parameterization of the slowness image, it can be calculated trivially.

I thank the graduate students who participated in Columbia University’s 2017 Seminar in Adjoint Methods for helpful discussion.

The author declares no conflicts of interest regarding the publication of this paper.

Menke, W. (2020) A Connection between Geometrical Spreading and the Adjoint Field in Travel Time Tomography. Applied Mathematics, 11, 84-96. https://doi.org/10.4236/am.2020.112009