Two-Level Block Decompositions for Solving Helmholtz Equation via Chebyshev Pseudo Spectral Method

In this paper, we consider solving the Helmholtz equation ∇²u + ω²u = f(x, y) in the Cartesian domain Ω = [−1, 1] ⊗ [−1, 1], subject to homogeneous Dirichlet boundary conditions, discretized with the Chebyshev pseudo-spectral method. The main purpose of this paper is to present the formulation of a two-level decomposition scheme for decoupling the linear system obtained from the discretization into independent subsystems. This scheme takes advantage of the homogeneity property of the physical problem along one direction to reduce a 2D problem to several 1D problems via a block diagonalization approach, and of the reflexivity property along the second direction to decompose each of the 1D problems into two independent subproblems using a reflexive decomposition, effectively doubling the number of subproblems. Based on the special structure of the coefficient matrix of the linear system derived from the discretization and a reflexivity property of the second-order Chebyshev differentiation matrix, we show that the decomposed submatrices exhibit a similar property, enabling the system to be decomposed using reflexive decompositions. Explicit forms of the decomposed submatrices are derived. The decomposition not only yields a more efficient algorithm but also introduces coarse-grain parallelism. Furthermore, it preserves all eigenvalues of the original matrix.


Introduction
In this paper, we consider the numerical solution to the Helmholtz equation

∇²u + ω²u = f(x, y), (x, y) ∈ Ω = [−1, 1] ⊗ [−1, 1], u = 0 on ∂Ω,   (1)

where ∇² is the Laplacian operator, ω is a real parameter representing the wavenumber, u is the amplitude associated with wave propagation, and f(x, y) is a forcing function. Without loss of generality, we assume the homogeneous Dirichlet condition u = 0 on the boundary. We present a highly parallel numerical algorithm for solving the Helmholtz equation discretized by the Chebyshev pseudospectral scheme. Chebyshev pseudospectral methods have long been used to numerically solve partial differential equations [1] [2] [3]. This discretization yields a linear system whose coefficient matrix has a special block structure that allows for a block diagonalization which decouples the original matrix, along with some proper permutation of the decoupled matrix, into several independent submatrices. By exploiting a reflexivity property that is inherent in the Chebyshev differentiation matrix, it is possible to further decompose each of the submatrices into two independent submatrices when the boundary conditions at both ends of the decoupled 1D subproblems are the same. This second level of decomposition effectively doubles the number of independent submatrices, yielding a higher degree of coarse-grain parallelism.
This paper is organized as follows. In Section 2, we present the general structure of the linear system derived from the Chebyshev collocation method discretized independently in the x and y directions. In Section 3, we address the two-level block decomposition. This decomposition was proposed by the author in [4] for solving Poisson equations with identical grids in both directions. In this paper, we reformulate the decomposition to allow the number of grid points in the x direction to differ from that in the y direction and derive the explicit forms of the decomposed submatrices, making applications of this approach more flexible. The computational procedure and a numerical example to demonstrate its usefulness are presented in Sections 4 and 5, respectively.

Linear System from the Chebyshev Collocation Method
In this section, we briefly describe the Chebyshev collocation method for solving the Helmholtz equation in two dimensions. We assume that the problem is discretized with a tensor product grid (x_i, y_j), 0 ≤ i ≤ N_x, 0 ≤ j ≤ N_y, where x_i and y_j are the Chebyshev points in the x and y directions, respectively. Let D_x, indexed from 0 to N_x, be the (N_x + 1) × (N_x + 1) Chebyshev spectral differentiation matrix associated with the x direction. The entries of D_x are given as [1] [2] [3]

(D_x)_{ij} = (c_i / c_j) (−1)^{i+j} / (x_i − x_j), i ≠ j,
(D_x)_{ii} = −x_i / (2(1 − x_i²)), i = 1, …, N_x − 1,
(D_x)_{00} = −(D_x)_{N_x N_x} = (2N_x² + 1) / 6,   (2)

where x_i = cos(iπ/N_x), c_i = 2 for i = 0 or N_x, and c_i = 1 otherwise. The Chebyshev spectral differentiation matrix D_y associated with the y direction can be obtained in a similar way.
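These entries are straightforward to generate numerically. The sketch below (a minimal NumPy implementation; the function name `cheb` is ours) follows the classic recipe, computing the diagonal as negative row sums, which is an equivalent and numerically convenient alternative to the closed-form diagonal entries above:

```python
import numpy as np

def cheb(N):
    """Chebyshev points x_i = cos(i*pi/N) and the (N+1) x (N+1) first-order
    differentiation matrix D; the second-derivative matrix is simply D @ D."""
    if N == 0:
        return np.array([1.0]), np.zeros((1, 1))
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    # off-diagonal entries (c_i / c_j) (-1)^(i+j) / (x_i - x_j)
    D = np.outer(c, 1.0 / c) / (x[:, None] - x[None, :] + np.eye(N + 1))
    # diagonal = negative row sums (equivalent to the closed-form entries)
    return x, D - np.diag(D.sum(axis=1))
```

Applying `D` to a sampled polynomial of degree at most N reproduces its derivative at the Chebyshev points to rounding error, which provides a quick sanity check of the construction.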

Now, let A_x be the stripped matrix of D_x², obtained by deleting the first and last rows and columns of D_x², and let A_y be defined analogously from D_y². With the grid just described, the Chebyshev collocation method yields, after removing the boundary nodes whose values are already known from the system, the following linear system

K u = (A_x ⊗ I_q + I_p ⊗ A_y + ω² I_{pq}) u = f,   (3)

where p = N_x − 1, q = N_y − 1, and I_k represents the identity matrix of dimension k.
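As a concrete illustration, the stripping and the Kronecker-product assembly of Equation (3) can be sketched as follows (NumPy; the function name `helmholtz_system` is ours, and a compact `cheb` helper implementing the standard differentiation matrix is repeated so the block runs standalone):

```python
import numpy as np

def cheb(N):
    # compact standard Chebyshev differentiation matrix (standalone helper)
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    D = np.outer(c, 1.0 / c) / (x[:, None] - x[None, :] + np.eye(N + 1))
    return D - np.diag(D.sum(axis=1))

def helmholtz_system(Nx, Ny, omega):
    """Assemble K = A_x kron I_q + I_p kron A_y + omega^2 I_pq of Eq. (3)."""
    Dx, Dy = cheb(Nx), cheb(Ny)
    Ax = (Dx @ Dx)[1:Nx, 1:Nx]          # strip boundary rows/cols: p = Nx - 1
    Ay = (Dy @ Dy)[1:Ny, 1:Ny]          # q = Ny - 1
    p, q = Nx - 1, Ny - 1
    K = (np.kron(Ax, np.eye(q)) + np.kron(np.eye(p), Ay)
         + omega**2 * np.eye(p * q))
    return K, Ax, Ay
```

The stripped matrices A_x and A_y are centrosymmetric, a property that the level-2 decomposition of Section 3.2 relies on.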
Many solution schemes can be employed to solve Equation (3), including Gaussian elimination, Gauss-Seidel iteration, ADI iteration, and matrix diagonalization [5] [6] [7] [8]. In this paper, the block version of the matrix diagonalization method will be employed to first decompose the matrix K into a block diagonal matrix consisting of q diagonal blocks. Each diagonal block will then be further decomposed into two smaller diagonal sub-blocks using reflexive decompositions, yielding a total of 2q diagonal blocks. Note that q is the number of internal grid points along the y direction. This two-step block decomposition scheme allows the linear system in Equation (3) to be decoupled into 2q linear subsystems which can then be solved in parallel with coarse-grain parallelism using multiprocessors or networked computers.

Level-1 Decomposition: Block Diagonalization
The diagonalization scheme has been used elsewhere [7] [8] [9]. In this section, however, we address the decomposition of the matrix K into q diagonal blocks using a different representation. To begin with, let Q_y be a matrix such that the similarity transformation

Q_y⁻¹ A_y Q_y = Λ_y   (4)

yields a diagonal matrix Λ_y. A natural choice of Q_y is to have its columns formed from the eigenvectors of A_y so that Λ_y is a diagonal matrix that consists of the eigenvalues of A_y.
Now split the coefficient matrix K in Equation (3) into three parts K₁, K₂, and K₃:

K = K₁ + K₂ + K₃, K₁ = I_p ⊗ A_y, K₂ = A_x ⊗ I_q, K₃ = ω² I_{pq}.

With Q = I_p ⊗ Q_y, we have the transformed matrix

K̃ = Q⁻¹ K Q = K̃₁ + K̃₂ + K̃₃,   (5)

where K̃₁ = I_p ⊗ Λ_y, K̃₂ = K₂ = A_x ⊗ I_q, and K̃₃ = K₃ = ω² I_{pq}.
We now show that the transformed matrix K̃ can be permuted to yield a block diagonal matrix. First, we observe that both K̃₁ and K̃₃ are diagonal matrices, and K̃₂ is a diagonal block matrix since each of its q × q blocks is a multiple of the identity matrix.
Let a_ij be the (i, j) entry of A_x, 1 ≤ i, j ≤ p. In its matrix form, K̃ can be expressed as a p × p array of q × q blocks,

K̃_ii = Λ_y + (a_ii + ω²) I, K̃_ij = a_ij I for i ≠ j,

where I is of dimension q. Apparently, K̃ is a diagonal block matrix because both I and Λ_y are diagonal matrices. The diagonal block matrix K̃ can be rearranged to yield a block diagonal matrix K̄ via appropriate permutations of rows and columns as shown below. Let K̃_ij be the (i, j) block of K̃ and P be such a permutation matrix that the transformation P^T K̃ P rearranges the entries of K̃ in such a way that

K̄ = P^T K̃ P = diag(K̂₁, K̂₂, …, K̂_q), K̂_l = A_x + (λ_l + ω²) I_p,   (6)

where λ_l is the l-th eigenvalue of A_y. In other words, the matrix K has been transformed to a block diagonal matrix consisting of q blocks. This decomposition takes advantage of the homogeneity property of the problem with the boundary conditions mentioned in Equation (1) along the y direction to decouple the 2D problem into q independent 1D problems. A similar decomposition can be performed by using the homogeneity property of the problem along the x direction.
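The blocks of Equation (6) come directly from the eigenvalues of A_y, as the following small NumPy sketch shows (the helper name `level1_blocks` is ours). Because only similarity transformations are involved, the union of the block spectra must coincide with the spectrum of K:

```python
import numpy as np

def level1_blocks(Ax, Ay, omega):
    """Eigen-decompose A_y and return the q decoupled level-1 blocks
    K_hat_l = A_x + (lambda_l + omega^2) I_p of Equation (6)."""
    lam, Qy = np.linalg.eig(Ay)          # A_y Q_y = Q_y Lambda_y
    p = Ax.shape[0]
    blocks = [Ax + (l + omega**2) * np.eye(p) for l in lam]
    return lam, Qy, blocks
```

Forming each block costs only a diagonal shift of A_x, so the q blocks can be produced independently and in parallel.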

Level-2 Decomposition: Reflexive Decomposition
As seen from the previous section, the matrix K can be transformed to a block diagonal matrix, implying that the original problem can be decomposed into independent subproblems which can then be solved in parallel using multiprocessors. Although each diagonal block K̂_l in Equation (6) can be further decomposed into a diagonal matrix using a single transformation matrix Q_x that consists of the eigenvectors of A_x, this diagonalization yields little advantage in general, especially in a multiprocessing environment, unless the eigenpairs are readily available, which is not the case for the matrix A_x.
In this section, we shall employ the reflexivity property of the diagonal blocks K̂_l to further decompose each of them into two independent sub-blocks using another decomposition technique called the reflexive decomposition [10].
Let J_n be the cross identity matrix of dimension n, i.e., the matrix with ones on the cross diagonal and zeros elsewhere. It is trivial to see that J_n J_n = I_n. Now, we note that the Chebyshev differentiation matrix D_x is anti-reflexive and D_x² is reflexive with respect to J_{N_x+1}. In other words,

J_{N_x+1} D_x J_{N_x+1} = −D_x and J_{N_x+1} D_x² J_{N_x+1} = D_x².

Accordingly, the p × p matrix A_x, which is the stripped matrix of D_x², is reflexive with respect to J_p. In fact, it is a centrosymmetric matrix [11] [12]. Recall from Equation (6) that K̂_l = A_x + (λ_l + ω²) I_p. Since the identity matrix is also centrosymmetric, we have J_p K̂_l J_p = K̂_l, indicating that each K̂_l can be decomposed via reflexive decompositions into two diagonal sub-blocks of equal or almost equal size, depending on whether p is even or odd. The decompositions can be done through orthogonal transformations which have been shown in [10]. Here we just present the results.
First consider the case when p is even. Assume p = 2m and A_x is evenly partitioned into 2 × 2 sub-blocks:

A_x = [ A₁₁  A₁₂ ; A₂₁  A₂₂ ], with A₂₁ = J_m A₁₂ J_m and A₂₂ = J_m A₁₁ J_m

by centrosymmetry. Taking advantage of the reflexivity property of K̂_l by applying the orthogonal transformation X_A^T K̂_l X_A to K̂_l, where

X_A = (1/√2) [ I_m  I_m ; J_m  −J_m ],

one can easily decompose K̂_l into two decoupled diagonal sub-blocks of equal size:

D_l = X_A^T K̂_l X_A = diag(B_l⁽¹⁾, B_l⁽²⁾),

where B_l⁽¹⁾ = A₁₁ + A₁₂ J_m + (λ_l + ω²) I_m and B_l⁽²⁾ = A₁₁ − A₁₂ J_m + (λ_l + ω²) I_m. By taking the transformation matrix X = I_q ⊗ X_A, we obtain

D = X^T K̄ X = diag(D₁, D₂, …, D_q),

which obviously consists of 2q independent diagonal blocks since each D_l has two independent diagonal blocks.
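The even-p split can be checked numerically in a few lines (NumPy; the helper name `reflexive_split` is ours). Given any centrosymmetric matrix, the orthogonal X above reduces it to two decoupled sub-blocks A₁₁ + A₁₂J and A₁₁ − A₁₂J:

```python
import numpy as np

def reflexive_split(Khat):
    """Reflexive decomposition of a centrosymmetric matrix of even
    dimension p = 2m: X^T Khat X = diag(A11 + A12 J, A11 - A12 J)."""
    m = Khat.shape[0] // 2
    I, J = np.eye(m), np.fliplr(np.eye(m))
    X = np.block([[I, I], [J, -J]]) / np.sqrt(2.0)   # orthogonal: X^T X = I
    A11, A12 = Khat[:m, :m], Khat[:m, m:]
    return X, A11 + A12 @ J, A11 - A12 @ J
```

Note that the sub-blocks are formed from sub-blocks of the original matrix alone, so no matrix-matrix multiplication with X is ever needed in practice.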

Computational Procedure
The numerical solution to the Helmholtz Equation (1) discretized by the Chebyshev pseudospectral method yields the linear system

K u = (A_x ⊗ I_q + I_p ⊗ A_y + ω² I_{pq}) u = f,

where I_p is the identity matrix of dimension p, A_y is a q × q matrix, A_x a p × p matrix, and K a pq × pq matrix. Here p (q) is the number of internal grid points in the x (y) direction. The two-level decomposition scheme described in this paper yields the transformed linear system D v = g, consisting of 2q independent subsystems, each of dimension p/2 when p is even.
The sizes of the sub-blocks in D differ by at most 1 when p is an odd number.
Computationally, this decomposition is very attractive. Not only can the decoupled subsystems be solved much more efficiently, they can also be solved in parallel with coarse-grain parallelism using a computer system with multiprocessors. This two-level decomposition consists of the following three stages.
1) Transform K u = f to K̃ ũ = f̃, where K̃ = Q⁻¹ K Q, ũ = Q⁻¹ u, and f̃ = Q⁻¹ f. This is a typical similarity transformation that is achieved by premultiplying both sides of K u = f with Q⁻¹ and inserting Q Q⁻¹, which is simply an identity matrix, between K and u. The matrix Q has been defined in Section 3.1 and K̃ can be found in Equation (5).
2) Transform K̃ ũ = f̃ to K̄ ū = f̄, where K̄ = P^T K̃ P, ū = P^T ũ, and f̄ = P^T f̃. The main purpose of this transformation is to reorder the unknowns of the linear system so that the decoupled coefficient matrix K̃ is converted to a block diagonal matrix K̄ for the sake of computational convenience. The matrix P, which is just a permutation matrix, has been explained in Section 3.1 and K̄ can be found in Equation (6).
3) Transform K̄ ū = f̄ to D v = g, where D = X^T K̄ X, v = X^T ū, and g = X^T f̄. This transformation is analogous to that in stage 1 except that X here is an orthogonal matrix; therefore we replace X⁻¹ by X^T. The matrices X and D can be found in Section 3.2.
The advantage of this approach is that the decomposed form of D is explicitly known and, therefore, the actual decompositions from K to K̃, from K̃ to K̄, and from K̄ to D are not necessary. Furthermore, the diagonal blocks of D can be obtained without any matrix-matrix multiplications. The computational procedure of this two-level decomposition boils down to the following three typical steps.
1) Decomposition step. In this step, we shall compute D and g. Since each diagonal block of D,

D_l = diag(A₁₁ + A₁₂ J_m + (λ_l + ω²) I_m, A₁₁ − A₁₂ J_m + (λ_l + ω²) I_m),

is independent and explicitly known, D can be obtained in parallel using 2q processors without any difficulty. The main task here is, thus, to obtain g. This can be achieved by transforming f to f̃ first, which involves solving the linear system Q f̃ = f. The matrix Q = I_p ⊗ Q_y is a block diagonal matrix and, therefore, f̃ can be obtained by solving p independent linear subsystems, each of size q, instead of solving the linear system as a whole, which is of size pq. This step can be performed in parallel with large granularity. The matrix Q_y, consisting of the eigenvectors of A_y, is in general not known explicitly and, thus, has to be computed from A_y. Fortunately, A_y is only of dimension q. Once f̃ has been computed, f̄ and g can be obtained easily because f̄ is simply a permuted version of f̃ and g = X^T f̄ involves no matrix-vector multiplications due to the special form of X.
2) Solution step. In this step, we solve the linear system D v = g for v. Note that D consists of 2q diagonal blocks, two from each D_l, l = 1, 2, …, q. By consistently partitioning the vectors v and g into 2q parts, based on the sizes of the blocks, the linear system D v = g can be expressed as 2q independent subsystems. Since all of the subsystems are independent, the linear system D v = g can be solved in parallel using 2q processors, one for each subsystem.
3) Retrieval step. Once v has been obtained, the solution u to the original system can be retrieved from v as follows. We first compute ū from v using ū = X v, which again involves no matrix-vector multiplication. We then obtain ũ by simply permuting ū using ũ = P ū. The final solution u can now be computed by the matrix multiplication u = Q ũ. Note that X = I_q ⊗ X_A and Q = I_p ⊗ Q_y, implying that ū and u can be obtained in parallel with coarse-grain parallelism.
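The three steps above can be strung together and checked against a direct solve. The sketch below is a serial NumPy rendering under our earlier assumptions (even p, x-major ordering of unknowns; all helper names, including the repeated compact `cheb`, are ours and not from the paper's code); the per-column loop is exactly the part that would be distributed across processors:

```python
import numpy as np

def cheb(N):
    # compact standard Chebyshev differentiation matrix (standalone helper)
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    D = np.outer(c, 1.0 / c) / (x[:, None] - x[None, :] + np.eye(N + 1))
    return D - np.diag(D.sum(axis=1))

def solve_two_level(Nx, Ny, omega, f):
    """Solve (A_x kron I_q + I_p kron A_y + omega^2 I) u = f by the
    two-level scheme; assumes p = Nx - 1 is even and f ordered x-major."""
    Dx, Dy = cheb(Nx), cheb(Ny)
    Ax = (Dx @ Dx)[1:Nx, 1:Nx]
    Ay = (Dy @ Dy)[1:Ny, 1:Ny]
    p, q, m = Nx - 1, Ny - 1, (Nx - 1) // 2
    lam, Qy = np.linalg.eig(Ay)                 # level-1 eigen-data
    F = f.reshape(p, q)
    Ft = np.linalg.solve(Qy, F.T).T             # stage 1: p solves of size q
    I, J = np.eye(m), np.fliplr(np.eye(m))
    X = np.block([[I, I], [J, -J]]) / np.sqrt(2.0)
    A11, A12 = Ax[:m, :m], Ax[:m, m:]
    Ut = np.empty((p, q), dtype=complex)
    for l in range(q):                          # stages 2-3: 2q small solves
        mu = lam[l] + omega**2
        g = X.T @ Ft[:, l]
        v1 = np.linalg.solve(A11 + A12 @ J + mu * I, g[:m])
        v2 = np.linalg.solve(A11 - A12 @ J + mu * I, g[m:])
        Ut[:, l] = X @ np.hstack([v1, v2])      # per-block retrieval
    return (Ut @ Qy.T).reshape(-1).real         # undo the level-1 transform
```

Here the permutation of stage 2 is implicit in the reshaping of f into a p × q array, whose columns are exactly the decoupled 1D right-hand sides.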
To end this section, it is worth mentioning that this two-level decomposition is a similarity transformation and, therefore, all eigenvalues of the original matrix K are preserved in the matrix D. Obtaining eigenvalues from D is far more efficient than from K.
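This preservation is easy to confirm numerically. The following script (NumPy; same compact `cheb` helper as in our earlier sketches so it runs standalone) forms the 2q sub-blocks explicitly and compares their combined spectrum with that of K:

```python
import numpy as np

def cheb(N):
    # compact standard Chebyshev differentiation matrix (standalone helper)
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    D = np.outer(c, 1.0 / c) / (x[:, None] - x[None, :] + np.eye(N + 1))
    return D - np.diag(D.sum(axis=1))

Nx, Ny, omega = 7, 5, 3.0
p, q, m = Nx - 1, Ny - 1, (Nx - 1) // 2
Dx, Dy = cheb(Nx), cheb(Ny)
Ax = (Dx @ Dx)[1:Nx, 1:Nx]
Ay = (Dy @ Dy)[1:Ny, 1:Ny]
K = np.kron(Ax, np.eye(q)) + np.kron(np.eye(p), Ay) + omega**2 * np.eye(p * q)
J = np.fliplr(np.eye(m))
A11, A12 = Ax[:m, :m], Ax[:m, m:]
blocks = []                                   # the 2q diagonal blocks of D
for lam in np.linalg.eigvals(Ay):
    mu = lam + omega**2
    blocks += [A11 + A12 @ J + mu * np.eye(m),
               A11 - A12 @ J + mu * np.eye(m)]
eig_K = np.sort(np.linalg.eigvals(K).real)
eig_D = np.sort(np.hstack([np.linalg.eigvals(B) for B in blocks]).real)
assert np.allclose(eig_K, eig_D, atol=1e-7)   # spectra agree
```

Each eigenvalue computation on the sub-blocks involves matrices of dimension p/2 rather than pq, which is where the efficiency gain comes from.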

Numerical Example
To demonstrate the usefulness of the two-level decomposition scheme, we present a numerical example derived from the Helmholtz equation with ω = 3 over the square domain [−1, 1] ⊗ [−1, 1], subject to the homogeneous Dirichlet boundary condition, with p = 4 and q = 3. For simplicity, we have rounded off the numbers after the last digit shown.
First we observe that both A_x and A_y are reflexive. In fact, they are centrosymmetric matrices. Let the columns of Q_y be the eigenvectors of A_y, of dimension 3, and P^T be the corresponding permutation matrix of dimension 12. With these, we easily obtain the decoupled diagonal blocks K̂_l = A_x + (λ_l + ω²) I_4, l = 1, 2, 3, without the need of any matrix multiplications once the eigenvalues λ_l of A_y are available. Apparently, each K̂_l is centrosymmetric and can be further decomposed into two sub-blocks of dimension 2.
How to cite this paper: Chen, H.-C. (2018) Two-Level Block Decompositions for Solving Helmholtz Equation via Chebyshev Pseudo Spectral Method.