A Connection between Geometrical Spreading and the Adjoint Field in Travel Time Tomography
1. Introduction
Acoustic, electromagnetic and seismic waves are routinely used to probe the media through which they propagate, and especially to image the spatially-varying velocity field. A fundamental property of these waves that is commonly exploited is their travel time T, defined as the time between the generation of a wave at its source and its detection by a distant observer. In many cases, travel times can be computed under the ray approximation of the exact wave equation, which is valid at high frequencies when the scale length of heterogeneities in the medium is much larger than the wavelength of the waves. Since the 1970s, the simplicity of ray calculations has underpinned the use of travel time tomography in a variety of disciplines, including seismology [1] [2], oceanography [3], petroleum exploration [4], geotechnical engineering [5] and cosmology [6]. In some disciplines, ray-based tomography is being superseded by full wavefield methods [7] [8]; nevertheless, it remains an important part of a tomographer’s toolbox on account of its computational efficiency.
Over the last several decades, the development of the so-called adjoint state method [9] has allowed tomographic imaging to be applied in cases where it was heretofore infeasible, because of vastly reduced computational effort. To date, this efficiency has mainly been used to enable computationally-intensive forms of tomography, and especially full wavefield tomography [7] [8]. Nevertheless, adjoint state methodology is very widely applicable. It has the potential for significantly speeding up even computationally-light problems, including ray-based tomography. The feasibility of using adjoint state methods in this form of tomography was first investigated by [10], who demonstrated its effectiveness. In this paper, we further explore its application. We study the mathematical structure of the differential equation that arises out of the adjoint state method (the equation for the so-called adjoint field) and show that it is very closely related to, and in important cases identical to, the transport equation of ray theory. This relationship provides an intuitive understanding of the adjoint field and suggests ways of obtaining further computational efficiency.
Our analysis is divided into four sections: first, we review how the adjoint state method is used to streamline the computation of a critical quantity needed to perform tomography; second, we review the concept of the geometrical spreading of rays and its connection to the transport equation; third, we use the adjoint state method to derive and solve the differential equation for the adjoint field; and lastly, we show that the adjoint equation is very closely related to the transport equation and that its solution can be trivially constructed when the solution to the transport equation (the geometrical spreading function) is known.
2. The Adjoint State Method for Computing the Error Derivative
The main purpose of this section is to define the error derivative, discuss its usefulness and review how the adjoint state method is used to compute it, in the special case where the unknown image is linked to the observed data via the source term in a linear differential equation.
Many types of tomography involve a set of observations $d_i^{\mathrm{obs}}$, each associated with a spatial position $\mathbf{x}^{(i)}$, which are related to an unknown image function $m(\mathbf{x})$ by the possibly-nonlinear map $d_i = g_i(m)$. Here, $\mathbf{x} = [x, y, z]^{\mathsf{T}}$ are real spatial coordinates and $\mathsf{T}$ denotes transpose. Usually, the data, spatial coordinates and image function are presumed to be real. Because no finite number of observations can define a continuous function, the image is usually approximated with a finite number of parameters; that is, as $m(\mathbf{x}, \mathbf{p})$ with $\mathbf{p} = [p_1, p_2, \ldots, p_M]^{\mathsf{T}}$. For example, the image might be divided into voxels, each with a value $p_k$. A common approach to image reconstruction is to define individual errors, $e_i = d_i^{\mathrm{obs}} - d_i(\mathbf{p})$ and total $L_2$ error $E = \sum_i e_i^2$ and then to find the $\mathbf{p}$ that minimizes $E + \alpha \|\mathcal{D} m\|_2^2$ ([11]; see also [12] [13] [14]). Here, $\alpha$ is an empirically-chosen constant and $\mathcal{D}$ is a linear operator that embodies prior information, such as smoothness. Among the many optimization procedures put forward for solving this problem, several commonly-used ones, based on the linearized least-squares method (e.g. [14] [15]), employ the partial derivative $\partial d_i/\partial p_k$ to compute the data perturbation $\delta d_i$ associated with the image perturbation $\delta p_k$ via $\delta d_i = \sum_k (\partial d_i/\partial p_k)\,\delta p_k$. Alternatively, other commonly-used procedures, based on the gradient-descent method (e.g. [16]), use the partial derivative $\partial E/\partial p_k$ to compute the error perturbation $\delta E$ associated with the image perturbation $\delta p_k$ via $\delta E = \sum_k (\partial E/\partial p_k)\,\delta p_k$.
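The error derivative $\partial E/\partial p_k$ can be computed from the data derivative $\partial d_i/\partial p_k$:
$$\frac{\partial E}{\partial p_k} = 2\sum_i e_i \frac{\partial e_i}{\partial p_k} = -2\sum_i e_i \frac{\partial d_i}{\partial p_k}$$
(1)
However, recent advances in tomography have followed the realization that $\partial E/\partial p_k$ often can be computed without first computing $\partial d_i/\partial p_k$, giving gradient-descent methods a tremendous computational advantage over least-squares methods. The underlying idea of these adjoint state methods [9] [17] is to promote the error to a field $e(\mathbf{x}) = d^{\mathrm{obs}}(\mathbf{x}) - d(\mathbf{x}, \mathbf{p})$, with the assumption that the data have been measured everywhere, so that $E = (e, e)$ and:
$$\frac{\partial E}{\partial p_k} = 2\left(e, \frac{\partial e}{\partial p_k}\right)$$
(2)
Here $(a, b) = \int a(\mathbf{x})\, b(\mathbf{x})\, d^3x$ is the inner product over spatial coordinates. Now, consider the simple case in which the data solves the linear differential equation $\mathcal{L} d = m$ (together with some appropriate boundary condition). Here, the image $m(\mathbf{x}, \mathbf{p})$ is the source term in the differential equation. By differentiating the differential equation, we obtain an expression for $\partial d/\partial p_k$:
$$\mathcal{L}\frac{\partial d}{\partial p_k} = \frac{\partial m}{\partial p_k} \quad\text{so that}\quad \frac{\partial d}{\partial p_k} = \mathcal{L}^{-1}\frac{\partial m}{\partial p_k}$$
(3a)
The partial derivative of total error $\partial E/\partial p_k$ is computed by differentiating $e = d^{\mathrm{obs}} - d$ to yield $\partial e/\partial p_k = -\partial d/\partial p_k$ and inserting into (2):
$$\frac{\partial E}{\partial p_k} = -2\left(e, \mathcal{L}^{-1}\frac{\partial m}{\partial p_k}\right) = -2\left(\mathcal{L}^{-1\dagger} e, \frac{\partial m}{\partial p_k}\right) = -2\left(\lambda, \frac{\partial m}{\partial p_k}\right)$$
(3b)
Here $\dagger$ denotes adjoint and $\lambda(\mathbf{x})$ is an adjoint field that satisfies the so-called adjoint equation $\mathcal{L}^{\dagger}\lambda = e$ (with appropriate boundary conditions). Note that the error $e$ plays the role of the source term in the adjoint equation.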
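Cases in which the error is known everywhere are uncommon, since they imply observations within the medium, as contrasted to on its boundary. More typical are the cases where the error is known only at discrete points $\mathbf{x}^{(i)}$ on the boundary. These cases are handled by defining a partial adjoint field $\lambda^{(i)}$ and error density $e^{(i)}$:
$$e^{(i)}(\mathbf{x}) = e_i\, \delta(\mathbf{x} - \mathbf{x}^{(i)})$$
(4)
Here, $\delta$ is the Dirac impulse function. The resulting equation $\mathcal{L}^{\dagger}\lambda^{(i)} = e^{(i)}$ is then solved only for those points $\mathbf{x}^{(i)}$ at which the error is known and the adjoint field $\lambda$ is taken to be the sum of the $\lambda^{(i)}$. This procedure is equivalent to solving the original adjoint equation with error:
$$e(\mathbf{x}) = \sum_i e_i\, \delta(\mathbf{x} - \mathbf{x}^{(i)})$$
(5)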
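The equivalence of the two routes to the error derivative can be checked numerically. The following is a minimal sketch, not the method of any particular tomography code. It assumes a small bidiagonal matrix `L` as a hypothetical discrete stand-in for the linear differential operator, a random matrix `G` whose columns play the role of the derivative of the source term with respect to each parameter, and the convention that the error is observed minus predicted data (so that a factor of -2 appears). All names and sizes are illustrative.

```python
import numpy as np

def error_gradients(n=40, n_par=5, seed=0):
    """Return the error derivative computed via the data derivative and via one adjoint solve."""
    rng = np.random.default_rng(seed)
    # Assumed stand-in for the operator: a first difference with a zero left-hand boundary value.
    L = np.eye(n) - np.eye(n, k=-1)
    # Assumed parameterization: column k of G is d(source)/d(p_k).
    G = rng.normal(size=(n, n_par))
    d_obs = np.linalg.solve(L, G @ rng.normal(size=n_par))  # synthetic observations
    p = np.zeros(n_par)                                     # current estimate of the parameters
    e = d_obs - np.linalg.solve(L, G @ p)                   # error field

    # Route 1: one forward solve per parameter yields the data derivative.
    dd_dp = np.linalg.solve(L, G)
    grad_data = -2.0 * (e @ dd_dp)

    # Route 2: a single adjoint solve, with the error as the source term.
    lam = np.linalg.solve(L.T, e)
    grad_adjoint = -2.0 * (lam @ G)
    return grad_data, grad_adjoint

grad_data, grad_adjoint = error_gradients()
print(np.max(np.abs(grad_data - grad_adjoint)))  # difference at round-off level
```

The first route needs as many solves as there are parameters, whereas the second needs only one, which is the source of the computational advantage described above.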
3. The Transport Equation of Ray Theory
The main purpose of this section is to review the geometrical interpretation of the transport equation and to highlight its link to ray divergence. However, in order to provide some background for readers unfamiliar with ray theory, and to establish nomenclature, we also present an abridged derivation of the equation.
In many cases, the imaging problem involves a field $u(\mathbf{x}, t)$ that is a function of time t as well as spatial coordinates $\mathbf{x}$ and that satisfies a wave equation of the form $\partial^2 u/\partial t^2 = \mathcal{M}(\mathbf{p})\, u$. Here, the differential operator $\mathcal{M}$ contains only spatial derivatives and depends on parameters $\mathbf{p}$. The equation reduces to the spatial equation $\mathcal{M}\tilde{u} + \omega^2 \tilde{u} = 0$ after Fourier transformation of time t to angular frequency $\omega$, where $\tilde{u}(\mathbf{x}, \omega)$ denotes a transformed variable. The ray approximation is the solution to this equation in the limit $\omega \to \infty$, and is achieved by postulating that the solution can be written as a Laurent series of the form [18]:
$$\tilde{u}(\mathbf{x}, \omega) = \exp\{i\omega T(\mathbf{x})\} \sum_{n=0}^{\infty} A_n(\mathbf{x})\, (i\omega)^{-n}$$
(6)
Here i is the imaginary unit. The travel time function $T(\mathbf{x})$ represents the time needed for a fluctuation in u to propagate from $\mathbf{x}_0$ to $\mathbf{x}$, and $A_n(\mathbf{x})$ represents its amplitude. The details of the ray solution depend on $\mathcal{M}$; we consider the simple (and common) case $\mathcal{M} = s^{-2}\nabla^2$, where $s(\mathbf{x})$ is a slowness function; that is, a material property that is inversely proportional to the local propagation velocity. Inserting (6) into the differential equation and equating equal powers of $\omega$ lead to the Eikonal equation for $T$:
$$\nabla T \cdot \nabla T = s^2$$
(7)
and a sequence of equations for $A_n$, the lowest order of which is the transport equation [19]:
$$2\,\nabla A_0 \cdot \nabla T + A_0 \nabla^2 T = 0$$
(8)
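The unit normal to a surface of equal travel time is $\hat{\mathbf{t}} = \nabla T/|\nabla T| = s^{-1}\nabla T$. A sequence of these vectors connecting surfaces of increasing travel times defines a ray; that is, a parametric curve $\mathbf{x}(\ell)$ with arclength $\ell$ and tangent $\hat{\mathbf{t}} = d\mathbf{x}/d\ell$ (Figure 1(A)). The volume enclosed by a group of rays is called a ray tube. The Eikonal equation, written as two coupled first order equations in $\mathbf{x}(\ell)$ and $\hat{\mathbf{t}}(\ell)$ is:
$$\frac{d\mathbf{x}}{d\ell} = \hat{\mathbf{t}} \quad\text{and}\quad \frac{d(s\,\hat{\mathbf{t}})}{d\ell} = \nabla s \quad\text{with boundary conditions}\quad \mathbf{x}(0) = \mathbf{x}_0 \quad\text{and}\quad \hat{\mathbf{t}}(0) = \hat{\mathbf{t}}_0$$
(9)
The ray’s starting point is $\mathbf{x}_0$ and its take-off direction is $\hat{\mathbf{t}}_0$. Travel time is then the path integral of the slowness along the ray, as can be seen by manipulating the formula for the directional derivative $d/d\ell = \hat{\mathbf{t}} \cdot \nabla$:
$$\frac{dT}{d\ell} = \hat{\mathbf{t}} \cdot \nabla T = s^{-1}\, \nabla T \cdot \nabla T = s \quad\text{so that}\quad T(\mathbf{x}) = \int_{\mathbf{x}_0}^{\mathbf{x}} s \, d\ell$$
(10)
The transport equation, written in terms of $\hat{\mathbf{t}}$, is:
$$\frac{1}{s A_0^2}\frac{d(s A_0^2)}{d\ell} = -\nabla \cdot \hat{\mathbf{t}}$$
(11)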
Figure 1. (A) Basic ray theory nomenclature. A wave propagates outward from a source at $\mathbf{x}_0$ (black circle), through the medium, to the surface $\Sigma$ (with normal $\hat{\mathbf{n}}$). Surfaces of equal travel time (wave fronts, grey curves) are labeled with their travel times $T^{(1)}$, $T^{(2)}$, etc. Normals to wave fronts define rays (blue curves) with tangents $\hat{\mathbf{t}}$. Neighboring rays enclosing a solid angle $d\Omega$ at the source define a ray tube. (B) Relationship between ray tangents $\hat{\mathbf{t}}$ and ray tube cross-sectional area S. Gauss’s theorem is applied to a small volume V along the ray tube, with the shape of a section of a cone, whose cross-sectional area S changes with arc-length $\ell$ and whose volume is $V = S\, d\ell$. The tangent $\hat{\mathbf{t}}$ is parallel to the sides of the section and normal to its ends. See text for further discussion.
The quantity $\nabla \cdot \hat{\mathbf{t}}$ has a simple geometric interpretation, as can be seen by applying Gauss’ theorem (e.g. [20]) to a volume V along a ray tube, which has the shape of a section of a cone (Figure 1(B)). The cross-sectional area of the ray tube increases from S on the end nearest to the source, to $S + dS$ at a distance $d\ell$ further away. For small volumes, the volume integral in Gauss’ theorem is $\int_V \nabla \cdot \hat{\mathbf{t}} \, dV \approx (\nabla \cdot \hat{\mathbf{t}})\, V$ where $V = S\, d\ell$. The surface integral in Gauss’ theorem has contributions only from the two ends of the cone, of $-S$ and $S + dS$ respectively, which sum to dS. Consequently, Gauss’s theorem implies $\nabla \cdot \hat{\mathbf{t}} = S^{-1}\, dS/d\ell$ and the transport equation becomes:
$$\frac{1}{s A_0^2}\frac{d(s A_0^2)}{d\ell} = -\frac{1}{S}\frac{dS}{d\ell}$$
(12)
According to the transport equation, the fractional decrease in $s A_0^2$, measured along a ray, is equal to the fractional increase in area S of the ray tube. In many cases, the quantity $s A_0^2$ has the interpretation of the energy density, so the transport equation embodies conservation of energy. Conventionally, the area of the ray tube is written $S = R^2\, d\Omega$, where $R$ is the geometrical spreading function and $d\Omega$ is the solid angle subtended by the ray tube at the source (e.g. [19]). Consequently, $s A_0^2 = c\, R^{-2}$ where c is a constant. Ray-tracing algorithms that solve (9) typically tabulate both T and R (e.g. [21] [22]).
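The geometric reading of the ray divergence can be checked numerically in the simplest setting. The sketch below assumes straight rays leaving a point source at the origin of a homogeneous medium, so that the ray tangent is the radial unit vector and the ray tube area is proportional to the squared radius. The function names and the finite-difference step are illustrative choices, not part of the original analysis.

```python
import numpy as np

def ray_tangent(x):
    """Unit ray tangent for straight rays from a point source at the origin (assumed geometry)."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

def divergence_of_tangent(x, h=1e-5):
    """Central-difference estimate of the divergence of the ray tangent field at point x."""
    x = np.asarray(x, dtype=float)
    div = 0.0
    for i in range(3):
        step = np.zeros(3)
        step[i] = h
        div += (ray_tangent(x + step)[i] - ray_tangent(x - step)[i]) / (2.0 * h)
    return div

def fractional_area_rate(r):
    """(1/S) dS/dr for a ray tube of area S = r**2 * dOmega; the solid angle cancels."""
    return 2.0 / r

x = np.array([1.0, 2.0, 2.0])   # a point at radius 3
print(divergence_of_tangent(x), fractional_area_rate(np.linalg.norm(x)))
```

Both numbers agree to within the finite-difference error, consistent with the divergence of the ray tangent being the fractional rate of change of ray tube area.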
4. Adjoint Equation for Travel Time Tomography
The main purpose of this section is to derive and solve the adjoint equation needed to compute the quantity $\partial E/\partial p_k$, the derivative of the total travel time error with respect to a model parameter controlling the slowness of the medium. Our derivation focuses on expressing the equation in terms of quantities that vary along rays, so that it can be readily compared to the transport Equation (11). Our derivation is equivalent to, but different from, the one by [10], being a direct application of perturbation theory, as contrasted to one that employs Lagrange multipliers.
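In travel time tomography, travel time observations $T_i^{\mathrm{obs}}$ are considered to be the data, and the slowness $s(\mathbf{x})$, or rather its approximation $s(\mathbf{x}, \mathbf{p})$, is the image function. In order to apply the adjoint methodology as outlined in Section 2, the non-linear Eikonal equation must be linearized about a “background” solution. Let the slowness equal a background slowness $s_0(\mathbf{x})$ plus a small perturbation $\epsilon s_1(\mathbf{x})$, where $\epsilon$ is a small parameter, and the corresponding travel time equal a background travel time $T_0(\mathbf{x})$ plus a small perturbation $\epsilon T_1(\mathbf{x})$. Then to first order in $\epsilon$, the Eikonal equation becomes:
$$\nabla T_0 \cdot \nabla T_0 + 2\epsilon\, \nabla T_0 \cdot \nabla T_1 = s_0^2 + 2\epsilon\, s_0 s_1$$
(13)
Equating terms of equal order in $\epsilon$ yields equations for the background travel time $T_0$ and the perturbation in travel time $T_1$:
$$\nabla T_0 \cdot \nabla T_0 = s_0^2 \quad\text{and}\quad \nabla T_0 \cdot \nabla T_1 = s_0 s_1$$
(14a,b)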
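Equation (14b) indicates that the component of $\nabla T_1$ in the direction of the background ray direction $\hat{\mathbf{t}} = s_0^{-1}\nabla T_0$ is $s_1$. Since $s_1$ plays the role of the source term in the differential equation, the formulation in (3) is applicable. If we define $d\ell$ to be an increment of arc length along the unperturbed ray, then this is just an equation involving the directional derivative $d/d\ell = \hat{\mathbf{t}} \cdot \nabla$:
$$\frac{dT_1}{d\ell} = s_1$$
(15)
The perturbation in travel time is the integral of the perturbation in slowness along the unperturbed ray. We rewrite the equation for $T_1$ as:
$$\mathcal{L}\, T_1 = s_1 \quad\text{with}\quad \mathcal{L} = \hat{\mathbf{t}} \cdot \nabla$$
(16)
Using the rules $(\mathcal{A}\mathcal{B})^{\dagger} = \mathcal{B}^{\dagger}\mathcal{A}^{\dagger}$ and $\nabla^{\dagger} = -\nabla$ (e.g., [23]) we obtain an expression for the adjoint equation:
$$\mathcal{L}^{\dagger}\lambda = -\nabla \cdot (\hat{\mathbf{t}}\, \lambda) = e$$
(17)
As is typical of first-order equations, the “left hand” boundary condition associated with $T_1$ implies a “right hand” boundary condition for $\lambda$ (e.g. [14]); that is, while $T_1 = 0$ at the source $\mathbf{x}_0$, $\lambda = 0$ at the end point of the ray (where it touches the boundary of the medium).
The adjoint Equation (17) can be further manipulated, using $\nabla \cdot (\hat{\mathbf{t}}\, \lambda) = d\lambda/d\ell + \lambda\, \nabla \cdot \hat{\mathbf{t}}$ and $\nabla \cdot \hat{\mathbf{t}} = S^{-1}\, dS/d\ell$:
$$\frac{1}{\lambda}\frac{d\lambda}{d\ell} = -\frac{1}{S}\frac{dS}{d\ell} - \frac{e}{\lambda}$$
(18)
The formal solution to (17) is well-known (e.g. [24]):
$$\lambda(\ell) = \frac{1}{R^2(\ell)}\left[ C - \int^{\ell} R^2(\ell')\, e(\ell')\, d\ell' \right]$$
(19)
Here the constant C is chosen to enforce the boundary condition $\lambda = 0$ at the end point of the ray.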
5. Analysis of the Role of the Geometrical Spreading
The main purpose of this section is to show that the solution to the adjoint equation can be constructed from the geometrical spreading function, and to interpret this result.
In any region in which $e = 0$, the adjoint Equation (18) has the same form as the transport Equation (12). Since the error $e$ is rarely known within the medium, but rather only on its boundary $\Sigma$, this restriction is satisfied by all commonly-encountered cases. As we will show below, the similarity of form provides considerable insight into the behavior of the adjoint field $\lambda$.
Ray divergence enters into the adjoint equation through the $\nabla \cdot \hat{\mathbf{t}}$ term. In order to highlight its contribution, we first examine a solution in which this term is zero. Consider a plane wave propagating in the z-direction through a homogeneous layer with $0 \le z \le h$ (Figure 2). The background travel time is $T_0 = s_0 z$, the ray direction is $\hat{\mathbf{t}} = [0, 0, 1]^{\mathsf{T}}$ and $\nabla T_0 = s_0 \hat{\mathbf{t}}$. The plane wave satisfies the background Eikonal Equation (14a), since $\nabla T_0 \cdot \nabla T_0 = s_0^2$. Since the rays of a plane wave do not diverge, $\nabla \cdot \hat{\mathbf{t}} = 0$.
Now consider the case where the background slowness is everywhere too small by an amount b, so that the background error $e$ grows linearly with distance z; that is, $e = bz$. We will assume that this error is known only on the boundary $z = h$. Following (5), the adjoint equation is $-d\lambda/dz = bh\, \delta(z - h)$. Because of the Dirac impulse function, the boundary condition for $\lambda$ requires some scrutiny. We will consider that the error is defined just below the boundary, at $z = h - \zeta$, where $\zeta \to 0^{+}$. In order to satisfy both the boundary condition of $\lambda(h) = 0$ and the adjoint equation, the solution must be discontinuous at $z = h - \zeta$; and in the immediate vicinity of $z = h$ must be $\lambda = bh$ for $z < h - \zeta$ and $\lambda = 0$ for $z > h - \zeta$. Effectively, the boundary condition is $\lambda(h) = bh$. The solution of the adjoint equation is $\lambda = bh$; note that it does not depend upon z.
Now consider a slowness perturbation in the form of a very thin rectangular prism, centered at $z = z_p$, of thickness D, and having sides at $x = x_1$ and $x = x_2$, and $y = y_1$ and $y = y_2$ (so that its volume is $DA$ with $A = (x_2 - x_1)(y_2 - y_1)$). Since the prism is very thin, it can be approximated as a Dirac impulse function in depth z:
$$\frac{\partial s}{\partial p} = D\, \delta(z - z_p)\, [H(x - x_1) - H(x - x_2)]\, [H(y - y_1) - H(y - y_2)]$$
(20)
Here, $H$ is the Heaviside function, which is unity when its argument is positive and zero otherwise. The partial derivative of total error is:
$$\frac{\partial E}{\partial p} = -2\left(\lambda, \frac{\partial s}{\partial p}\right) = -2\, b\, h\, D\, A$$
(21)
As expected, $\partial E/\partial p < 0$, since increasing $p$ lowers the error. Also as expected, $\partial E/\partial p$ is proportional to the area $A$ of the prism, since the larger its area, the larger the region to which the slowness perturbation is applied. Interestingly, $\partial E/\partial p$ is independent of the position $z_p$ of the prism; that is, the prism can be moved up or down without affecting the error. As we will show below, this insensitivity to position is due to the absence of ray divergence in this plane wave case.
We now consider a spherical wave propagating in the r-direction through a homogeneous sphere with $0 \le r \le r_0$ (Figure 3), described by spherical polar coordinates $(r, \theta, \varphi)$. The background travel time is $T_0 = s_0 r$, the ray direction is $\hat{\mathbf{t}} = \hat{\mathbf{r}}$ and $\nabla T_0 = s_0 \hat{\mathbf{t}}$. The spherical wave satisfies the background Eikonal Equation (14a), since $\nabla T_0 \cdot \nabla T_0 = s_0^2$. The area of a ray tube is $S = r^2\, d\Omega$, from whence we conclude that the geometrical spreading function is $R = r$ and the ray divergence is $\nabla \cdot \hat{\mathbf{t}} = S^{-1}\, dS/dr = 2/r$. As in the plane wave case, the background slowness is everywhere too small by an amount b, leading to a background error $e = br$. We will assume that this error is known only on the boundary $r = r_0$. The adjoint Equation (18) reduces to:
$$\frac{d\lambda}{dr} + \frac{2}{r}\lambda = 0 \quad\text{for}\quad r < r_0, \quad\text{with}\quad \lambda(r_0) = b\, r_0$$
(22)
The solution is $\lambda = b\, r_0^3 / r^2$. As is asserted in the Introduction, the solution to this transport-like equation is related to the geometrical spreading function by $\lambda(r) = \lambda(r_0)\, R^2(r_0)/R^2(r)$.
Figure 2. Rays (blue) of a plane wave cross a layer with bottom and top surfaces at $z = 0$ and $z = h$, respectively. A prismatic slowness perturbation (red rectangle) is placed within the layer, with left and right edges at $x_1$ and $x_2$, respectively. The travel time error $e$, measured on the upper surface (top plot), is reduced in the region $x_1 < x < x_2$ where the rays project the prism. Because the rays do not diverge, the size of this region is independent of the depth of the perturbation.
Figure 3. Rays (blue) of a spherical wave start at a source at the center of the sphere at $r = 0$ and propagate outward through the sphere to its surface at $r = r_0$. A slowness perturbation with the shape of a spherical cap (red cap) is placed within the sphere at radius $r_p$, with left and right edges at polar angle $-\theta_p$ and $\theta_p$, respectively. The travel time error $e$, measured on the upper surface, is reduced in the region where the perturbation is projected by the rays (graph at top).
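The link between the adjoint field and the geometrical spreading in the spherical case can be checked by integrating the transport-like equation numerically. The sketch below assumes that the equation takes the form dλ/dr = -(2/r)λ inside the sphere, that the effective boundary value at the surface is the surface error, and that the spreading function equals the radius. The step count and the values of `b` and `r0` are arbitrary illustrative choices.

```python
def adjoint_along_ray(r_start, r_end, lam_start, n_steps=2000):
    """Integrate d(lam)/dr = -(2/r) * lam from r_start to r_end with classical RK4."""
    def rhs(r, lam):
        return -2.0 * lam / r

    h = (r_end - r_start) / n_steps   # negative when marching from the surface toward the source
    r, lam = r_start, lam_start
    for _ in range(n_steps):
        k1 = rhs(r, lam)
        k2 = rhs(r + 0.5 * h, lam + 0.5 * h * k1)
        k3 = rhs(r + 0.5 * h, lam + 0.5 * h * k2)
        k4 = rhs(r + h, lam + h * k3)
        lam += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        r += h
    return lam

b, r0 = 0.01, 1.0                 # assumed slowness error and sphere radius
lam_surface = b * r0              # assumed effective boundary value
for r in (0.8, 0.5, 0.25):
    numeric = adjoint_along_ray(r0, r, lam_surface)
    from_spreading = lam_surface * (r0 / r) ** 2   # scaling by squared spreading ratio, with R = r
    print(r, numeric, from_spreading)
```

The numerical solution tracks the spreading-based expression, which illustrates why no separate solve is needed once the spreading function has been tabulated by a ray tracer.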
Now consider a slowness perturbation in the form of a very thin spherical cap of fixed thickness D, centered at $r = r_p$ and $\theta = 0$ and subtending a variable polar angle $\theta_p$ such that its area is fixed as $A = 2\pi r_p^2 (1 - \cos\theta_p)$:
$$\frac{\partial s}{\partial p} = D\, \delta(r - r_p)\, H(\theta_p - \theta)$$
(23)
For a position $r_p$ away from the origin where a spherical cap of thickness D and area $A$ is possible, the partial derivative of total error is:
$$\frac{\partial E}{\partial p} = -2\left(\lambda, \frac{\partial s}{\partial p}\right) = -2\, b\, r_0\, D\, A\, \frac{R^2(r_0)}{R^2(r_p)}$$
(24)
The spherical wave solution (24) differs from the plane wave solution (21) by a factor that involves the ratio of geometrical spreading functions, $R^2(r_0)/R^2(r_p)$, evaluated at the surface and the heterogeneity. The area, on the surface of the sphere, onto which the rays project the cap decreases as its radius $r_p$ increases; a cap at a smaller radius decreases the error $e$ over a wider region (Figure 4). This example illustrates the importance of geometrical spreading on the amplitude of the adjoint field and on the effectiveness of a given perturbation in reducing the error E. Given several perturbations of equal size, the most effective is the one whose projection on the boundary, by the rays interacting with it, is the largest.
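The dependence of the error derivative on the radius of the cap can be cross-checked without the adjoint field, by perturbing the cap amplitude and differencing the total error on the surface. The sketch below is an independent check under stated assumptions: the surface error before perturbation is `b * r0`, rays through the cap are delayed by the amplitude times the thickness, the total error is the surface integral of the squared error, and the closed form is the plane-wave value scaled by the squared ratio of spreading functions. All names and numerical values are illustrative.

```python
import numpy as np

def cap_derivative_brute_force(r_p, area, thickness, b=0.01, r0=1.0, n=4000, dp=1e-3):
    """Finite-difference derivative of total surface error with respect to the cap amplitude p."""
    theta_p = np.arccos(1.0 - area / (2.0 * np.pi * r_p**2))   # polar half-angle of the cap

    def midpoint_integral_of_sin(lo, hi):
        edges = np.linspace(lo, hi, n + 1)
        mid = 0.5 * (edges[:-1] + edges[1:])
        return np.sum(np.sin(mid)) * (hi - lo) / n

    # Surface area reached by rays that pass through the cap, and the remaining surface area.
    shadow = 2.0 * np.pi * r0**2 * midpoint_integral_of_sin(0.0, theta_p)
    rest = 2.0 * np.pi * r0**2 * midpoint_integral_of_sin(theta_p, np.pi)

    def total_error(p):
        return (b * r0 - p * thickness) ** 2 * shadow + (b * r0) ** 2 * rest

    return (total_error(dp) - total_error(-dp)) / (2.0 * dp)

def cap_derivative_from_spreading(r_p, area, thickness, b=0.01, r0=1.0):
    """Closed form: plane-wave value scaled by the squared spreading ratio, assuming R = r."""
    return -2.0 * b * r0 * thickness * area * (r0 / r_p) ** 2

for r_p in (0.3, 0.6):
    print(r_p,
          cap_derivative_brute_force(r_p, area=0.05, thickness=0.02),
          cap_derivative_from_spreading(r_p, area=0.05, thickness=0.02))
```

The cap nearer the source gives the larger derivative in magnitude, because its projection onto the surface is larger.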
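Although the adjoint field is singular at the source (ray starting point) $r = 0$, the partial derivative $\partial E/\partial p$ is finite there, as can be seen by considering a spherical heterogeneity of radius $\rho$ centered on the origin of the form $\partial s/\partial p = H(\rho - r)$:
$$\frac{\partial E}{\partial p} = -2\int_0^{\rho} \frac{b\, r_0^3}{r^2}\, 4\pi r^2\, dr = -8\pi\, b\, r_0^3\, \rho$$
(25)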
When the background slowness $s_0(\mathbf{x})$ is spatially varying, the rays have a complicated spatial pattern and the background error $e$, measured on the boundary $\Sigma$, is spatially varying. Suppose that the medium has a surface $\Sigma$ with outward pointing normal $\hat{\mathbf{n}}$. A ray connecting an interior point $\mathbf{x}$ to $\Sigma$ can be labeled by its end point $\mathbf{x}_b$. Then, $\mathbf{x}_b(\mathbf{x})$ means the point on the boundary at which a ray passing through $\mathbf{x}$ ends, and arc-length $\ell(\mathbf{x}, \mathbf{x}_b)$ means the distance at $\mathbf{x}$ along a ray that ends at $\mathbf{x}_b$. Similarly, the geometrical spreading function can be written as $R(\mathbf{x}, \mathbf{x}_b)$; that is, the geometrical spreading function at $\mathbf{x}$ associated with the ray that ends at $\mathbf{x}_b$. The adjoint field is then:
$$\lambda(\mathbf{x}) = \frac{e(\mathbf{x}_b)}{\hat{\mathbf{t}}(\mathbf{x}_b) \cdot \hat{\mathbf{n}}(\mathbf{x}_b)}\, \frac{R^2(\mathbf{x}_b, \mathbf{x}_b)}{R^2(\mathbf{x}, \mathbf{x}_b)} \quad\text{with}\quad \mathbf{x}_b = \mathbf{x}_b(\mathbf{x})$$
(26)
Here, the dot product between the ray tangent and surface normal is introduced to account for the increased surface area intersected by the ray tube, in the case (unlike the examples, above) where the ray tube obliquely impinges upon the boundary. Now, suppose that the slowness perturbation is represented with voxels, where voxel k has volume $V_k$, amplitude $p_k$, and centroid position $\mathbf{x}^{(k)}$. When the adjoint field varies slowly compared to the length scale of a voxel (a requirement that excludes the source point) the error derivative is:
$$\frac{\partial E}{\partial p_k} = -2\left(\lambda, \frac{\partial s}{\partial p_k}\right) \approx -2\, V_k\, \frac{e(\mathbf{x}_b^{(k)})}{\hat{\mathbf{t}}(\mathbf{x}_b^{(k)}) \cdot \hat{\mathbf{n}}(\mathbf{x}_b^{(k)})}\, \frac{R^2(\mathbf{x}_b^{(k)}, \mathbf{x}_b^{(k)})}{R^2(\mathbf{x}^{(k)}, \mathbf{x}_b^{(k)})}$$
(27)
Here $\mathbf{x}_b^{(k)}$ is the end point of the ray passing through $\mathbf{x}^{(k)}$. This result emphasizes the link between the geometrical spreading function R and the partial derivative of total error E. (When the voxel is close to, or overlaps, the source, $\partial E/\partial p_k$ is still well-defined and finite, but the inner product in (27) must be computed appropriately).
Figure 4. Rays (blue) of a spherical wave, as in Figure 3. One of two alternate slowness perturbations (green and red caps) is placed within the sphere, at radii $r_1$ and $r_2$, respectively, with $r_1 < r_2$. These caps have equal area $A$ and equal thickness D. The travel time error $e$, measured on the surface of the sphere, is reduced in the region where the perturbation is projected by the rays (green and red curves in top plot). The reduction in error in this region is the same in both cases, because the thicknesses of the perturbations are equal. However, because the rays diverge, the size of the affected region is larger for the perturbation at $r_1$.
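The voxel result lends itself to a direct evaluation from quantities that a ray tracer already tabulates. The helper below is a hypothetical sketch, not a published routine. It assumes the conventions used for illustration here: the error is observed minus predicted travel time (hence the factor of -2), the adjoint field at a voxel is the boundary error scaled by the squared ratio of the spreading function at the ray end point to that at the voxel, and oblique incidence enters by dividing by the cosine between ray tangent and surface normal. The argument names are illustrative.

```python
import numpy as np

def voxel_error_derivative(volume, err_at_end, R_voxel, R_end, cos_incidence=1.0):
    """Approximate dE/dp_k as -2 * V_k * lambda(x_k), with lambda built from geometrical spreading.

    All arguments may be scalars or arrays with one entry per voxel.
    """
    volume = np.asarray(volume, dtype=float)
    err_at_end = np.asarray(err_at_end, dtype=float)
    R_voxel = np.asarray(R_voxel, dtype=float)
    R_end = np.asarray(R_end, dtype=float)
    cos_incidence = np.asarray(cos_incidence, dtype=float)
    # Adjoint field at the voxel centroid (assumed form, see lead-in).
    lam = err_at_end * R_end**2 / (R_voxel**2 * cos_incidence)
    return -2.0 * volume * lam

# Spherical-wave example: thin cap of thickness 0.02 and area 0.05 at radius 0.5,
# surface radius 1.0, slowness error b = 0.01, rays normal to the surface.
b, r0 = 0.01, 1.0
print(voxel_error_derivative(volume=0.02 * 0.05, err_at_end=b * r0, R_voxel=0.5, R_end=r0))
```

Because only tabulated values are combined, the cost per voxel is a handful of arithmetic operations; voxels near the source, where the adjoint field varies rapidly, would need a proper integration instead of this centroid approximation.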
6. Conclusion
The key result in this paper is the demonstration that the adjoint equation in ray-based travel time tomography has the same form as the well-known transport equation for ray theoretical amplitudes. Consequently, the spatial variation of the adjoint field $\lambda$ is completely controlled by the geometrical spreading function R. This result provides an intuitive understanding of the primary factor controlling the size of the partial derivative $\partial E/\partial p_k$ of total $L_2$ error E with respect to the slowness $p_k$ of a voxel. The partial derivative $\partial E/\partial p_k$ is large when ray divergence causes the projection of the voxel on the measurement surface to be large. Since this result provides an explicit formula for $\lambda$ in terms of R, it enables $\partial E/\partial p_k$ to be calculated without resorting to the numerical solution of the adjoint equation. Only an inner product needs to be calculated, and in the case of a voxel parameterization of the slowness image, it can be calculated trivially.
Acknowledgements
I thank the graduate students who participated in Columbia University’s 2017 Seminar in Adjoint Methods for helpful discussion.