Two-Level Block Decompositions for Solving Helmholtz Equation via Chebyshev Pseudo Spectral Method
1. Introduction
In this paper, we consider the numerical solution to the Helmholtz equation

$$\nabla^2 u + k^2 u = f$$

in the Cartesian domain $\Omega = [-1,1] \times [-1,1]$ with homogeneous Dirichlet boundary conditions, where $\nabla^2$ is the Laplacian operator, $k$ is a real parameter representing the wavenumber, $u$ is the amplitude associated with wave propagation, and $f$ is a forcing function. Without loss of generality we assume $u = 0$ on the boundary, i.e.,

$$\nabla^2 u + k^2 u = f \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega. \tag{1}$$
We present a highly parallel numerical algorithm for solving the Helmholtz equation discretized by the Chebyshev pseudo-spectral scheme. Chebyshev pseudo-spectral methods have long been used to numerically solve partial differential equations [1] [2] [3]. This discretization yields a linear system whose coefficient matrix has a special block structure that admits a block diagonalization: the original matrix is decoupled and, after a proper permutation, split into several independent submatrices. By exploiting a reflexivity property inherent in the Chebyshev differentiation matrix, each of these submatrices can be further decomposed into two independent submatrices whenever the boundary conditions at both ends of the decoupled 1D subproblems are the same. This second level of decomposition effectively doubles the number of independent submatrices, yielding a higher degree of coarse-grain parallelism.
This paper is organized as follows. In Section 2, we present the general structure of the linear system derived from the Chebyshev collocation method, discretized independently in the x and y directions. In Section 3, we address the two-level block decomposition. This decomposition was proposed by the author in [4] for solving Poisson equations with identical grids in both directions. In this paper, we reformulate the decomposition to allow the number of grid points in the x direction to differ from that in the y direction and derive the explicit forms of the decomposed submatrices, making applications of this approach more flexible. The computational procedure and a numerical example demonstrating its usefulness are presented in Sections 4 and 5, respectively.
2. Linear System from the Chebyshev Collocation Method
In this section, we briefly describe the Chebyshev collocation method for solving the Helmholtz equation in two dimensions. We assume that the problem is discretized with a tensor product grid $(x_i, y_j)$, $0 \le i \le p+1$, $0 \le j \le q+1$, where

$$x_i = \cos\frac{i\pi}{p+1} \quad \text{and} \quad y_j = \cos\frac{j\pi}{q+1}$$

are the Chebyshev points in the x and y directions, respectively. Let $D_x$, indexed from 0 to $p+1$, be the $(p+2)\times(p+2)$ Chebyshev spectral differentiation matrix associated with the x direction. The entries of $D_x$ are given as [1] [2] [3]

$$(D_x)_{00} = \frac{2(p+1)^2+1}{6}, \qquad (D_x)_{p+1,\,p+1} = -\frac{2(p+1)^2+1}{6},$$

$$(D_x)_{ii} = \frac{-x_i}{2\left(1-x_i^2\right)}, \quad i = 1, \ldots, p, \qquad (D_x)_{ij} = \frac{c_i}{c_j}\,\frac{(-1)^{i+j}}{x_i - x_j}, \quad i \ne j,$$

where $c_0 = c_{p+1} = 2$ and $c_i = 1$ otherwise. The Chebyshev spectral differentiation matrix $D_y$ associated with the y direction can be obtained in a similar way.
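For concreteness, $D_x$ can be assembled with the standard "negative sum trick" for the diagonal entries. The following sketch uses Python/NumPy rather than the Octave mentioned later in the paper; the helper name `cheb` is ours:

```python
import numpy as np

def cheb(N):
    """(N+1)x(N+1) Chebyshev differentiation matrix on x_j = cos(j*pi/N)."""
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    X = np.tile(x, (N + 1, 1)).T
    D = np.outer(c, 1.0 / c) / (X - X.T + np.eye(N + 1))  # off-diagonal entries
    return D - np.diag(D.sum(axis=1)), x                  # diagonal = negative row sums

D, x = cheb(5)
# Spectral differentiation is exact for polynomials of degree <= N
assert np.allclose(D @ x**3, 3 * x**2)
```

The negative-sum trick avoids evaluating the analytic diagonal formulas directly and improves the accuracy of the computed matrix.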
Now, let $\hat{D}_x$ be the stripped matrix of $D_x^2$, obtained by removing the first and last rows and columns of $D_x^2$, and $\hat{D}_y$ be the stripped matrix of $D_y^2$. We have

$$\hat{D}_x = \left(D_x^2\right)_{1:p,\,1:p}, \qquad \hat{D}_y = \left(D_y^2\right)_{1:q,\,1:q}. \tag{2}$$
With the grid just described, the Chebyshev collocation method yields, after removing from the system the boundary nodes whose values are already known, the following linear system

$$K u = f, \qquad K = \hat{D}_x \otimes I_q + I_p \otimes \hat{D}_y + k^2 I_{pq}, \tag{3}$$

where $u$ and $f$ collect, in lexicographic order, the unknowns and the forcing values at the $pq$ internal grid points, and $I_k$ represents the identity matrix of dimension k.
Many solution schemes can be employed to solve Equation (3), including Gaussian elimination, Gauss-Seidel iteration, ADI iteration, and matrix diagonalization [5] [6] [7] [8]. In this paper, the block version of the matrix diagonalization method will be employed to first decompose the matrix K into a block diagonal matrix consisting of q diagonal blocks. Each diagonal block will then be further decomposed into two smaller diagonal sub-blocks using reflexive decompositions, yielding a total of 2q diagonal blocks. Note that q is the number of internal grid points along the y direction. This two-step block decomposition allows the linear system (3) to be decoupled into 2q linear subsystems which can then be solved in parallel with coarse-grain parallelism using multiprocessors or networked computers.
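The Kronecker-product structure in Equation (3) can be checked numerically. The sketch below (Python/NumPy; the names `hDx`, `hDy` for the stripped matrices and the helper `cheb` are ours) builds K for small p and q and verifies it against the exact Laplacian of a polynomial that vanishes on the boundary:

```python
import numpy as np

def cheb(N):
    """(N+1)x(N+1) Chebyshev differentiation matrix (standard construction)."""
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    X = np.tile(x, (N + 1, 1)).T
    D = np.outer(c, 1.0 / c) / (X - X.T + np.eye(N + 1))
    return D - np.diag(D.sum(axis=1)), x

p, q, k = 4, 3, 0.0                               # k = 0 reduces (3) to the Poisson case
Dx, xf = cheb(p + 1)
Dy, yf = cheb(q + 1)
hDx = (Dx @ Dx)[1:p + 1, 1:p + 1]                 # stripped second-derivative matrices
hDy = (Dy @ Dy)[1:q + 1, 1:q + 1]
K = np.kron(hDx, np.eye(q)) + np.kron(np.eye(p), hDy) + k**2 * np.eye(p * q)

# u = (1 - x^2)(1 - y^2) vanishes on the boundary, so stripping rows/columns is exact
x, y = xf[1:p + 1], yf[1:q + 1]
U = np.outer(1 - x**2, 1 - y**2)                  # lexicographic ordering: x-index major
lap = -2 * ((1 - y**2)[None, :] + (1 - x**2)[:, None])
assert np.allclose(K @ U.ravel(), lap.ravel())
```

Because the test function is a low-degree polynomial, the discrete Laplacian reproduces the continuous one to rounding error, confirming both the stripping and the ordering conventions.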
3. Two-Level Block Decompositions
3.1. Level-1 Decomposition: Block Diagonalization
The diagonalization scheme has been used elsewhere [7] [8] [9]. In this section, however, we address the decomposition of the matrix K into q diagonal blocks using a different representation. To begin with, let $S$ be such a matrix that the similarity transformation $S^{-1} \hat{D}_y S$ diagonalizes $\hat{D}_y$ into $\Lambda$:

$$S^{-1} \hat{D}_y S = \Lambda = \mathrm{diag}\left(\lambda_1, \lambda_2, \ldots, \lambda_q\right).$$

A natural choice of $S$ is to have its columns formed from the eigenvectors of $\hat{D}_y$ so that $\Lambda$ is a diagonal matrix that consists of the eigenvalues of $\hat{D}_y$. Now split the coefficient matrix K in Equation (3) into three parts $K_1$, $K_2$, and $K_3$:

$$K = K_1 + K_2 + K_3, \qquad K_1 = I_p \otimes \hat{D}_y, \quad K_2 = \hat{D}_x \otimes I_q, \quad K_3 = k^2 I_{pq},$$

and let $Q = I_p \otimes S$. We have the transformed matrix

$$\tilde{K} = Q^{-1} K Q = \tilde{K}_1 + \tilde{K}_2 + \tilde{K}_3 \tag{4}$$

with $\tilde{K}_i = Q^{-1} K_i Q$, $i = 1, 2, 3$.
We now show that the transformed matrix $\tilde{K}$ can be permuted to yield a block diagonal matrix. First, we observe that both $\tilde{K}_1$ and $\tilde{K}_3$ are diagonal matrices and $\tilde{K}_2$ is a diagonal block matrix, since

$$\tilde{K}_1 = \left(I_p \otimes S\right)^{-1} \left(I_p \otimes \hat{D}_y\right) \left(I_p \otimes S\right) = I_p \otimes \Lambda$$

and

$$\tilde{K}_2 = \left(I_p \otimes S\right)^{-1} \left(\hat{D}_x \otimes I_q\right) \left(I_p \otimes S\right) = \hat{D}_x \otimes I_q.$$

Accordingly,

$$\tilde{K} = \hat{D}_x \otimes I_q + I_p \otimes \Lambda + k^2 I_{pq}. \tag{5}$$
Let $d_{ij}$ be the $(i,j)$ element of $\hat{D}_x$, $1 \le i, j \le p$. In its matrix form, $\tilde{K}_2$ can be expressed as

$$\tilde{K}_2 = \begin{bmatrix} d_{11} I & d_{12} I & \cdots & d_{1p} I \\ d_{21} I & d_{22} I & \cdots & d_{2p} I \\ \vdots & \vdots & \ddots & \vdots \\ d_{p1} I & d_{p2} I & \cdots & d_{pp} I \end{bmatrix},$$

where I is of dimension q. Apparently, $\tilde{K}$ is also a diagonal block matrix because both I and $\Lambda$ are diagonal matrices. The diagonal block matrix $\tilde{K}$ can be rearranged to yield a block diagonal matrix $\bar{K}$:

$$\bar{K} = \mathrm{diag}\left(\bar{K}_1, \bar{K}_2, \ldots, \bar{K}_q\right)$$

via appropriate permutations of rows and columns, as shown below. Let $\bar{K}_m$ be the $m$th diagonal block of $\bar{K}$ and P be such a permutation matrix that the transformation $\bar{K} = P^{\mathrm{T}} \tilde{K} P$ rearranges the entries of $\tilde{K}$ in such a way that

$$\left(\bar{K}_m\right)_{ij} = d_{ij} + \left(\lambda_m + k^2\right)\delta_{ij},$$

where $d_{ij}$ is the $(i,j)$ element of $\hat{D}_x$ and $\lambda_m$ is the $m$th diagonal element of $\Lambda$. Then we have

$$\bar{K} = P^{\mathrm{T}} \tilde{K} P = \mathrm{diag}\left(\bar{K}_1, \ldots, \bar{K}_q\right), \tag{6}$$

where

$$\bar{K}_m = \hat{D}_x + \left(\lambda_m + k^2\right) I_p, \quad m = 1, \ldots, q,$$

with $\lambda_m$ being the $m$th eigenvalue of $\hat{D}_y$. Accordingly,

$$K = Q P \bar{K} P^{\mathrm{T}} Q^{-1},$$

since $P^{-1} = P^{\mathrm{T}}$. In other words, the matrix K has been transformed to a block diagonal matrix consisting of q blocks. This decomposition takes advantage of the homogeneity of the boundary conditions in Equation (1) along the y direction to decouple the 2D problem into q independent 1D problems. A similar decomposition can be performed by using the homogeneity of the problem along the x direction.
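The level-1 decomposition can be verified numerically. In the sketch below (Python/NumPy; all variable names are ours), Q is built from the eigenvectors of the stripped matrix along y, a perfect-shuffle permutation plays the role of P, and the permuted matrix is confirmed to be block diagonal with the closed-form blocks derived above:

```python
import numpy as np

def cheb(N):
    """(N+1)x(N+1) Chebyshev differentiation matrix (standard construction)."""
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    X = np.tile(x, (N + 1, 1)).T
    D = np.outer(c, 1.0 / c) / (X - X.T + np.eye(N + 1))
    return D - np.diag(D.sum(axis=1)), x

p, q, k = 4, 3, 1.0
Dx, _ = cheb(p + 1)
Dy, _ = cheb(q + 1)
hDx = (Dx @ Dx)[1:p + 1, 1:p + 1]
hDy = (Dy @ Dy)[1:q + 1, 1:q + 1]
K = np.kron(hDx, np.eye(q)) + np.kron(np.eye(p), hDy) + k**2 * np.eye(p * q)

lam, S = np.linalg.eig(hDy)                       # eigenpairs of the small q x q matrix
Q = np.kron(np.eye(p), S)                         # block diagonal: p copies of S
tK = np.linalg.solve(Q, K @ Q)                    # similarity transform Q^{-1} K Q
perm = np.arange(p * q).reshape(p, q).T.ravel()   # perfect-shuffle permutation
bK = tK[np.ix_(perm, perm)]                       # permuted matrix P^T \tilde{K} P

# q decoupled p x p diagonal blocks, known in closed form
for m in range(q):
    blk = bK[m * p:(m + 1) * p, m * p:(m + 1) * p]
    assert np.allclose(blk, hDx + (lam[m] + k**2) * np.eye(p))
off = bK * (1 - np.kron(np.eye(q), np.ones((p, p))))
assert np.allclose(off, 0)                        # everything off the blocks vanishes
```

The shuffle exploits the identity that conjugating a sum of Kronecker products by the perfect-shuffle permutation swaps the factors, turning the diagonal block matrix into a block diagonal one.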
3.2. Level-2 Decomposition: Reflexive Decomposition
As seen from the previous section, the matrix K can be transformed to a block diagonal matrix, implying that the original problem can be decomposed into independent subproblems which can then be solved in parallel using multiprocessors. Although each diagonal block $\bar{K}_m$ in Equation (6) can be further decomposed into a diagonal matrix using a single transformation matrix that consists of the eigenvectors of $\bar{K}_m$, this diagonalization yields little advantage in general, especially in a multiprocessing environment, unless the eigenpairs are readily available, which is not the case for the matrix $\hat{D}_x$.
In this section, we shall employ the reflexivity property of the diagonal blocks $\bar{K}_m$ to further decompose each of them into two independent sub-blocks using another decomposition technique called the reflexive decomposition [10].
Let $J_n$ be the cross identity matrix of dimension n:

$$J_n = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & \unicode{x22F0} & & \vdots \\ 1 & 0 & \cdots & 0 \end{bmatrix}.$$

It is trivial to see that $J_n^2 = I_n$. Now, we note that the Chebyshev differentiation matrix $D_x$ is anti-reflexive and $D_x^2$ is reflexive, with respect to $J_{p+2}$. In other words,

$$J_{p+2} D_x J_{p+2} = -D_x \quad \text{and} \quad J_{p+2} D_x^2 J_{p+2} = D_x^2.$$

Accordingly, the $p \times p$ matrix $\hat{D}_x$, which is the stripped matrix of $D_x^2$ (Equation (2)), is reflexive with respect to $J_p$. In fact, it is a centrosymmetric matrix [11] [12]. Recall that $\bar{K}_m = \hat{D}_x + \left(\lambda_m + k^2\right) I_p$. Therefore, the diagonal blocks $\bar{K}_m$ satisfy the reflexivity property

$$J_p \bar{K}_m J_p = \bar{K}_m, \quad m = 1, \ldots, q,$$

indicating that each $\bar{K}_m$ can be decomposed via reflexive decompositions into two independent submatrices of almost equal size, depending on whether p is even or odd. The decompositions can be done through orthogonal transformations, as shown in [10]. Here we just present the results.
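Before doing so, the reflexivity properties just stated can be confirmed numerically with a minimal check (Python/NumPy; `cheb` is the standard construction of the differentiation matrix):

```python
import numpy as np

def cheb(N):
    """(N+1)x(N+1) Chebyshev differentiation matrix (standard construction)."""
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    X = np.tile(x, (N + 1, 1)).T
    D = np.outer(c, 1.0 / c) / (X - X.T + np.eye(N + 1))
    return D - np.diag(D.sum(axis=1)), x

p = 4
Dx, _ = cheb(p + 1)
Jf = np.fliplr(np.eye(p + 2))                      # cross identity of dimension p + 2
assert np.allclose(Jf @ Dx @ Jf, -Dx)              # D_x is anti-reflexive
assert np.allclose(Jf @ (Dx @ Dx) @ Jf, Dx @ Dx)   # D_x^2 is reflexive

hDx = (Dx @ Dx)[1:p + 1, 1:p + 1]
J = np.fliplr(np.eye(p))                           # cross identity of dimension p
assert np.allclose(J @ hDx @ J, hDx)               # stripped matrix is centrosymmetric
```

The anti-reflexivity follows from the symmetry $x_{p+1-i} = -x_i$ of the Chebyshev points, and squaring an anti-reflexive matrix produces a reflexive one.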
First consider the case when p is even. Assume $p = 2s$ for some $s \ge 1$, and $\bar{K}_m$ is evenly partitioned into $2 \times 2$ sub-blocks:

$$\bar{K}_m = \begin{bmatrix} A_m & B_m \\ C_m & E_m \end{bmatrix}, \qquad C_m = J_s B_m J_s, \quad E_m = J_s A_m J_s.$$

Taking advantage of the reflexivity property of $\bar{K}_m$ by applying the orthogonal transformation $X^{\mathrm{T}} \bar{K}_m X$ to $\bar{K}_m$, where

$$X = \frac{1}{\sqrt{2}} \begin{bmatrix} I_s & I_s \\ J_s & -J_s \end{bmatrix},$$

one can easily decompose $\bar{K}_m$ into two decoupled diagonal sub-blocks of equal size:

$$X^{\mathrm{T}} \bar{K}_m X = \begin{bmatrix} \bar{K}_m^{(1)} & 0 \\ 0 & \bar{K}_m^{(2)} \end{bmatrix}, \tag{7}$$

where

$$\bar{K}_m^{(1)} = A_m + B_m J_s \quad \text{and} \quad \bar{K}_m^{(2)} = A_m - B_m J_s.$$
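A quick numerical check of the even case (Python/NumPy; the scalar shift standing in for $\lambda_m + k^2$ is arbitrary):

```python
import numpy as np

def cheb(N):
    """(N+1)x(N+1) Chebyshev differentiation matrix (standard construction)."""
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    X = np.tile(x, (N + 1, 1)).T
    D = np.outer(c, 1.0 / c) / (X - X.T + np.eye(N + 1))
    return D - np.diag(D.sum(axis=1)), x

p = 4
s = p // 2                                         # even p = 2s
Dx, _ = cheb(p + 1)
M = (Dx @ Dx)[1:p + 1, 1:p + 1] + 2.5 * np.eye(p)  # centrosymmetric block; shift 2.5 is arbitrary
Js = np.fliplr(np.eye(s))
X = np.vstack([np.hstack([np.eye(s), np.eye(s)]),
               np.hstack([Js, -Js])]) / np.sqrt(2)
assert np.allclose(X.T @ X, np.eye(p))             # X is orthogonal

T = X.T @ M @ X
A, B = M[:s, :s], M[:s, s:]
assert np.allclose(T[:s, s:], 0) and np.allclose(T[s:, :s], 0)
assert np.allclose(T[:s, :s], A + B @ Js)          # first decoupled sub-block
assert np.allclose(T[s:, s:], A - B @ Js)          # second decoupled sub-block
```

Note that the two sub-blocks are formed directly from the first block row of the matrix, so no matrix-matrix multiplication is required to obtain them.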
In the case when p is odd, say $p = 2s + 1$, we consistently partition $\bar{K}_m$ and J as

$$\bar{K}_m = \begin{bmatrix} A_m & b_m & B_m \\ c_m^{\mathrm{T}} & \alpha_m & d_m^{\mathrm{T}} \\ C_m & e_m & E_m \end{bmatrix}, \qquad J_p = \begin{bmatrix} 0 & 0 & J_s \\ 0 & 1 & 0 \\ J_s & 0 & 0 \end{bmatrix}.$$

By taking the transformation matrix X to be

$$X = \frac{1}{\sqrt{2}} \begin{bmatrix} I_s & 0 & I_s \\ 0 & \sqrt{2} & 0 \\ J_s & 0 & -J_s \end{bmatrix},$$

the matrix $\bar{K}_m$ can be decoupled via the orthogonal transformation $X^{\mathrm{T}} \bar{K}_m X$ as

$$X^{\mathrm{T}} \bar{K}_m X = \begin{bmatrix} \bar{K}_m^{(1)} & 0 \\ 0 & \bar{K}_m^{(2)} \end{bmatrix},$$

where

$$\bar{K}_m^{(1)} = \begin{bmatrix} A_m + B_m J_s & \sqrt{2}\, b_m \\ \sqrt{2}\, c_m^{\mathrm{T}} & \alpha_m \end{bmatrix} \quad \text{and} \quad \bar{K}_m^{(2)} = A_m - B_m J_s.$$
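The odd case can be checked the same way (Python/NumPy; again the shift is an arbitrary stand-in for $\lambda_m + k^2$):

```python
import numpy as np

def cheb(N):
    """(N+1)x(N+1) Chebyshev differentiation matrix (standard construction)."""
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    X = np.tile(x, (N + 1, 1)).T
    D = np.outer(c, 1.0 / c) / (X - X.T + np.eye(N + 1))
    return D - np.diag(D.sum(axis=1)), x

p, s = 5, 2                                        # odd p = 2s + 1
Dx, _ = cheb(p + 1)
M = (Dx @ Dx)[1:p + 1, 1:p + 1] + 1.0 * np.eye(p)  # centrosymmetric block; shift is arbitrary
Js = np.fliplr(np.eye(s))

# Orthogonal transformation for the odd case
X = np.zeros((p, p))
X[:s, :s] = np.eye(s) / np.sqrt(2)
X[:s, s + 1:] = np.eye(s) / np.sqrt(2)
X[s, s] = 1.0
X[s + 1:, :s] = Js / np.sqrt(2)
X[s + 1:, s + 1:] = -Js / np.sqrt(2)
assert np.allclose(X.T @ X, np.eye(p))

T = X.T @ M @ X
A, b, B = M[:s, :s], M[:s, s], M[:s, s + 1:]
c_row, alpha = M[s, :s], M[s, s]
# A block of size s + 1 and a block of size s decouple
assert np.allclose(T[:s + 1, s + 1:], 0) and np.allclose(T[s + 1:, :s + 1], 0)
assert np.allclose(T[:s, :s], A + B @ Js)
assert np.allclose(T[:s, s], np.sqrt(2) * b)
assert np.allclose(T[s, :s], np.sqrt(2) * c_row)
assert np.isclose(T[s, s], alpha)
assert np.allclose(T[s + 1:, s + 1:], A - B @ Js)
```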
In either case, let $\mathcal{X} = \mathrm{diag}(X, X, \ldots, X)$, with q copies of X along the diagonal. The orthogonal transformation

$$D = \mathcal{X}^{\mathrm{T}} \bar{K} \mathcal{X} = \mathrm{diag}\left(\bar{K}_1^{(1)}, \bar{K}_1^{(2)}, \ldots, \bar{K}_q^{(1)}, \bar{K}_q^{(2)}\right)$$

yields a matrix D which obviously consists of 2q independent diagonal blocks, since each $X^{\mathrm{T}} \bar{K}_m X$ has two independent diagonal blocks.
4. Computational Procedure
The numerical solution to the Helmholtz Equation (1) discretized by the Chebyshev pseudo-spectral method yields the linear system

$$K u = f, \qquad K = \hat{D}_x \otimes I_q + I_p \otimes \hat{D}_y + k^2 I_{pq},$$

where $I_p$ is the identity matrix of dimension p, $\hat{D}_x$ is a $p \times p$ matrix, $\hat{D}_y$ a $q \times q$ matrix, and K a $pq \times pq$ matrix. Here p (q) is the number of internal grid points in the x (y) direction. The two-level decomposition scheme described in this paper yields the transformed linear system

$$D v = g,$$

consisting of 2q independent subsystems, each of dimension $p/2$ when p is even.
The sizes of the sub-blocks in D differ by at most 1 when p is an odd number. Computationally, this decomposition is very attractive. Not only can the decoupled subsystems be solved much more efficiently, they can also be solved in parallel with coarse-grain parallelism using a computer system with multiple processors. This two-level decomposition consists of the following three stages.
1) Transform $K u = f$ to $\tilde{K} \tilde{u} = \tilde{f}$, where $\tilde{K} = Q^{-1} K Q$, $\tilde{u} = Q^{-1} u$, and $\tilde{f} = Q^{-1} f$.
This is a typical similarity transformation that is achieved by premultiplying both sides of $K u = f$ with $Q^{-1}$ and inserting $Q Q^{-1}$, which is simply an identity matrix, between K and u. The matrix Q has been defined in Section 3.1 and $\tilde{K}$ can be found in Equation (5).
2) Transform $\tilde{K} \tilde{u} = \tilde{f}$ to $\bar{K} \bar{u} = \bar{f}$, where $\bar{K} = P^{\mathrm{T}} \tilde{K} P$, $\bar{u} = P^{\mathrm{T}} \tilde{u}$, and $\bar{f} = P^{\mathrm{T}} \tilde{f}$.
The main purpose of this transformation is to reorder the unknowns of the linear system so that the decoupled coefficient matrix $\tilde{K}$ is converted to a block diagonal matrix ($\bar{K}$) for the sake of computational convenience. The matrix P, which is just a permutation matrix, has been explained in Section 3.1 and $\bar{K}$ can be found in Equation (6).
3) Transform $\bar{K} \bar{u} = \bar{f}$ to $D v = g$, where $D = \mathcal{X}^{\mathrm{T}} \bar{K} \mathcal{X}$, $v = \mathcal{X}^{\mathrm{T}} \bar{u}$, and $g = \mathcal{X}^{\mathrm{T}} \bar{f}$.
This transformation is analogous to that in stage 1 except that $\mathcal{X}$ here is an orthogonal matrix. Therefore we replace $\mathcal{X}^{-1}$ by $\mathcal{X}^{\mathrm{T}}$. The matrices $\mathcal{X}$ and D can be found in Section 3.2.
The advantage of this approach is that the decomposed form of D is explicitly known and, therefore, the actual decompositions from K to $\tilde{K}$, from $\tilde{K}$ to $\bar{K}$, and from $\bar{K}$ to D are not necessary. Furthermore, the diagonal blocks of D can be obtained without any matrix-matrix multiplications. The computational procedure of this two-level decomposition boils down to the following three typical steps.
1) Decomposition step. In this step, we compute D and g. Since each diagonal block of D, $\bar{K}_m^{(1)}$ or $\bar{K}_m^{(2)}$, is independent and explicitly known, D can be obtained in parallel using 2q processors without any difficulty. The main task here is, thus, to obtain g. This can be achieved by transforming f to $\tilde{f}$ first, which involves solving the linear system $Q \tilde{f} = f$. It is essential to note that $Q = I_p \otimes S$ is a block diagonal matrix and, therefore, $\tilde{f}$ can be obtained by solving p independent linear subsystems, each of size q, instead of solving the linear system as a whole, which is of size pq. This step can be performed in parallel with large granularity. The matrix S, consisting of the eigenvectors of $\hat{D}_y$, is in general not known explicitly and, thus, has to be computed from $\hat{D}_y$. Fortunately, $\hat{D}_y$ is only of dimension q. Once $\tilde{f}$ has been computed, $\bar{f}$ and g can be obtained easily because $\bar{f}$ is simply a permuted version of $\tilde{f}$ and $g = \mathcal{X}^{\mathrm{T}} \bar{f}$ involves no matrix-vector multiplications due to the special form of $\mathcal{X}$.
2) Solution step. In this step, we solve the linear system $D v = g$ for v. Note that D consists of 2q diagonal blocks $\bar{K}_m^{(1)}$ and $\bar{K}_m^{(2)}$, $m = 1, \ldots, q$. By consistently partitioning the vectors v and g into 2q parts, based on the sizes of the blocks, the linear system $D v = g$ can now be expressed as

$$\bar{K}_m^{(1)} v_m^{(1)} = g_m^{(1)}, \qquad \bar{K}_m^{(2)} v_m^{(2)} = g_m^{(2)},$$

for $m = 1, \ldots, q$. Since all of the subsystems are independent, the linear system $D v = g$ can be solved in parallel, using 2q processors, one for each subsystem.
3) Retrieval step. Once v has been obtained, the solution u to the original system can be retrieved from v as follows. We first compute $\bar{u}$ from v using $\bar{u} = \mathcal{X} v$, which again involves no matrix-vector multiplication. We then obtain $\tilde{u}$ by simply permuting $\bar{u}$ using $\tilde{u} = P \bar{u}$. The final solution u can now be computed by the matrix multiplication $u = Q \tilde{u}$. Note that $Q = I_p \otimes S$ is block diagonal and $\tilde{u}$ can be consistently partitioned into p parts, implying that the products $S \tilde{u}_i$, $i = 1, \ldots, p$, and hence u, can be obtained in parallel with coarse-grain parallelism.
To end this section, it is worth mentioning that this two-level decomposition is a similarity transformation and, therefore, all eigenvalues of the original matrix K are preserved in the matrix D. Obtaining eigenvalues from D is far more efficient than from K.
5. Numerical Example
To demonstrate the usefulness of the two-level decomposition scheme, we present a numerical example derived from the Helmholtz equation over the square domain $[-1,1] \times [-1,1]$, subject to homogeneous Dirichlet boundary conditions, on a grid with $p = 4$ and $q = 3$ internal points. The numerical results presented in this section were obtained using GNU Octave, software that offers many powerful high-level programming features for scientific computations, similar to Matlab. It suffices to show the final decomposed submatrices to illustrate the advantage of this approach. By excluding the grid points on the boundary and using the lexicographic ordering to number the internal nodes [3], the matrix K, of dimension 12, takes the form

$$K = \hat{D}_x \otimes I_3 + I_4 \otimes \hat{D}_y + k^2 I_{12},$$

where $\hat{D}_x$ and $\hat{D}_y$ are the stripped matrices of dimensions 4 and 3, respectively. For simplicity, we have rounded off the numbers after the last digit shown. First we observe that both $\hat{D}_x$ and $\hat{D}_y$ are reflexive. In fact, they are centrosymmetric matrices. Let the columns of S be the eigenvectors of $\hat{D}_y$, of dimension 3, and P be the permutation matrix of dimension 12 described in Section 3.1.
From the Level-1 decomposition presented in Section 3, we easily obtain the decoupled diagonal blocks $\bar{K}_m$ without the need of any matrix multiplications once the eigenvalues of $\hat{D}_y$, $\lambda_1$, $\lambda_2$, $\lambda_3$, are available:

$$\bar{K}_m = \hat{D}_x + \left(\lambda_m + k^2\right) I_4, \quad m = 1, 2, 3,$$

where the eigenvalues $\lambda_m$ are computed numerically from $\hat{D}_y$.
The numerical values of $\bar{K}_m$ clearly indicate that each $\bar{K}_m$ is reflexive with respect to $J_4$. Now, applying the Level-2 decomposition to each $\bar{K}_m$ by taking

$$X = \frac{1}{\sqrt{2}} \begin{bmatrix} I_2 & I_2 \\ J_2 & -J_2 \end{bmatrix}$$

yields

$$X^{\mathrm{T}} \bar{K}_m X = \begin{bmatrix} \bar{K}_m^{(1)} & \\ & \bar{K}_m^{(2)} \end{bmatrix}, \quad m = 1, 2, 3,$$
where the unshown entries in the matrices are 0. Apparently, each $\bar{K}_m$ has been further decomposed into two independent diagonal blocks, yielding a total of six independent diagonal blocks from the original matrix K. Note that the decomposed matrices $\bar{K}_m^{(1)}$ and $\bar{K}_m^{(2)}$ can be obtained directly from $A_m$, $B_m$, and $J_2$ using Equation (7), without the need of computing $X^{\mathrm{T}} \bar{K}_m X$. Note also that the eigenvalues of K, of dimension 12, can now be easily obtained from the six diagonal blocks, each of dimension only 2, a clear indication of the advantage of this approach.
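This eigenvalue computation is easy to reproduce (Python/NumPy; our variable names, with p = 4, q = 3 as in the example): the spectrum of the 12-dimensional K coincides with the union of the spectra of the six 2 × 2 blocks.

```python
import numpy as np

def cheb(N):
    """(N+1)x(N+1) Chebyshev differentiation matrix (standard construction)."""
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    X = np.tile(x, (N + 1, 1)).T
    D = np.outer(c, 1.0 / c) / (X - X.T + np.eye(N + 1))
    return D - np.diag(D.sum(axis=1)), x

p, q, k = 4, 3, 1.0
Dx, _ = cheb(p + 1)
Dy, _ = cheb(q + 1)
hDx = (Dx @ Dx)[1:p + 1, 1:p + 1]
hDy = (Dy @ Dy)[1:q + 1, 1:q + 1]
K = np.kron(hDx, np.eye(q)) + np.kron(np.eye(p), hDy) + k**2 * np.eye(p * q)

lam = np.linalg.eigvals(hDy)
s = p // 2
Js = np.fliplr(np.eye(s))
ev = []
for m in range(q):                                  # 2q = 6 blocks, each 2 x 2
    Km = hDx + (lam[m] + k**2) * np.eye(p)
    A, B = Km[:s, :s], Km[:s, s:]
    ev += list(np.linalg.eigvals(A + B @ Js)) + list(np.linalg.eigvals(A - B @ Js))
ev_blocks = np.sort(np.asarray(ev).real)
ev_full = np.sort(np.linalg.eigvals(K).real)
assert np.allclose(ev_full, ev_blocks)              # spectrum recovered from the small blocks
```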
6. Conclusions
In this paper, we have presented a two-level block decomposition scheme for the numerical solution to the Helmholtz equation $\nabla^2 u + k^2 u = f$ in the Cartesian domain $[-1,1] \times [-1,1]$, subject to homogeneous Dirichlet boundary conditions, discretized with the Chebyshev pseudo-spectral method. For a computational grid with $p+2$ grid points in the x direction and $q+2$ in the y direction, our first-level decomposition yields q independent 1D subproblems, excluding the known values at boundary points, using the eigenpairs of the 1D second-order Chebyshev differentiation matrix along y. The decoupled coefficient matrices are then rearranged to form a block diagonal matrix through a proper permutation. The second level of decomposition further decouples each of the subproblems into two smaller and independent subproblems using reflexive decompositions, yielding a total of 2q independent subproblems.
The general form of the final decoupled submatrices has also been explicitly derived, eliminating the need to perform matrix-matrix multiplications during the decomposition step. A numerical example is presented to illustrate the applicability and to demonstrate the advantage of this two-level decomposition scheme. This scheme leads naturally to a highly parallel numerical algorithm for solving the problem under consideration, since all subproblems can be solved in parallel using multiprocessors.
To close, it is worth mentioning that this block decomposition can be readily employed for finding the eigenvalues of the problem as well. This is because all eigenvalues of the original matrix are preserved in the decomposed submatrices: the first-level decomposition is a similarity transformation and the second-level decomposition is an orthogonal transformation. The preservation of all eigenvalues in the submatrices makes this block decomposition scheme even more attractive, because finding eigenvalues from the decomposed submatrices is obviously much more efficient than from the original matrix, not to mention the computational benefit that can be achieved from the large-grain parallelism induced by the decomposition.
Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this paper.