New Approach for the Inversion of Structured Matrices via Newton's Iteration
1. Introduction
Frequently, matrices encountered in practical computations have special structures that can be exploited to simplify the computations. In particular, computations with dense structured matrices are ubiquitous in sciences, communications and engineering. Exploitation of structure enables dramatic acceleration of the computations and a major decrease in memory space, but sometimes it also leads to numerical stability problems. The best-known classes of structured matrices are Toeplitz and Hankel matrices, but Cauchy and Vandermonde types are also quite popular. The computations with such matrices are widely applied in the areas of algebraic coding, control, signal processing, solution of partial differential equations and algebraic computing. For example, Toeplitz matrices arise in some major signal processing computations and Cauchy matrices appear in the study of integral equations and conformal mappings. The complexity of computations with dense structured matrices dramatically decreases in comparison with general matrices, that is, from the order of $n^2$ words of storage space and $n^3$ arithmetic operations (ops), with order $n^{2.376}$ ops in the best algorithms, to $O(n)$ words of storage space and to $O(n\log^2 n)$ ops (and sometimes to $O(n\log n)$ ops) (Table 1). The displacement rank $r$ of an $n \times n$ Toeplitz or Hankel matrix is at most 2. A matrix is Toeplitz-like or Hankel-like if $r$ is small or bounded by a small constant independent of $m$ and $n$. Those matrices can be represented by the $O(n)$ entries of their short displacement generators instead of their $mn$ entries, which leads to more efficient storage of the entries in computer memory and much faster computations [1]. In this paper, we will focus our study on Newton's iteration for computing the inverse of a structured matrix and we will analyze the resulting algorithms. This iteration is well known for its numerical stability and fast convergence. Newton's iteration
$$X_{i+1} = X_i(2I - MX_i), \quad i = 0, 1, 2, \ldots \quad (1)$$
for matrix inversion was initially proposed by Schultz in 1933 and studied by Pan and others. The authors of [2] described quadratically convergent algorithms for the refinement of rough initial approximations to the inverses of Toeplitz and Toeplitz-like matrices and to the solutions of Toeplitz and Toeplitz-like linear systems of equations. The algorithms are based on the inversion formulas for Toeplitz matrices. The Cauchy-like case was studied in [3]. Since those matrices can be represented by their short generators, which allow faster computations based on the displacement operator tool, we can employ Newton's iteration to compute the inverse of the input structured matrix. However, Newton's iteration destroys the structure of structured matrices. Therefore, in Section 5 we will study two compression techniques in detail, namely, the truncation of the smallest singular values of the displacement and the substitution of the approximate inverses for the inverse matrix into its displacement expression, both of which preserve the structure. Finally, each iteration step is reduced to two multiplications of structured matrices, so each iteration is performed in nearly linear time and optimal memory space, which leads to a superfast algorithm, especially if the input matrix is a well-conditioned structured matrix. Our main algorithm of Section 6 can be applied to more general classes of structured matrices. We will support our analysis with numerical experiments with Toeplitz matrices in Section 7.
2. Definition of Structured Matrices
Definition 1 A matrix $T = [t_{i,j}]_{i,j=0}^{n-1}$ is a Toeplitz matrix if $t_{i,j} = t_{i+1,j+1}$ for every pair of its entries $t_{i,j}$ and $t_{i+1,j+1}$. A matrix $H = [h_{i,j}]_{i,j=0}^{n-1}$ is a Hankel matrix if $h_{i,j} = h_{i+1,j-1}$ for every pair of its entries $h_{i,j}$ and $h_{i+1,j-1}$. For a given vector $t = (t_i)_{i=0}^{m-1}$, the matrix $V = V(t)$ of the form $V(t) = [t_i^j]_{i,j=0}^{m-1,n-1}$ is called a Vandermonde matrix. Given two vectors $s$ and $t$ such that $s_i \ne t_j$ for all $i$ and $j$, the $m \times n$ matrix
$$C(s,t) = \left[\frac{1}{s_i - t_j}\right]_{i,j=0}^{m-1,n-1}$$
is a Cauchy (generalized Hilbert) matrix, where $s_i \ne t_j$ for all pairs $i, j$.
Example 1 The $3 \times 3$ matrices
$$T = \begin{pmatrix} t_0 & t_{-1} & t_{-2} \\ t_1 & t_0 & t_{-1} \\ t_2 & t_1 & t_0 \end{pmatrix}, \quad H = \begin{pmatrix} h_0 & h_1 & h_2 \\ h_1 & h_2 & h_3 \\ h_2 & h_3 & h_4 \end{pmatrix}, \quad V = \begin{pmatrix} 1 & t_0 & t_0^2 \\ 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \end{pmatrix}, \quad C = \left[\frac{1}{s_i - t_j}\right]_{i,j=0}^{2}$$
are Toeplitz, Hankel, Vandermonde, and Cauchy matrices respectively that can be represented by the 5-dim vector $(t_{-2}, t_{-1}, t_0, t_1, t_2)$, the 5-dim vector $(h_0, h_1, h_2, h_3, h_4)$, the 3-dim vector $(t_0, t_1, t_2)$, and the 3-dim vectors $s = (s_0, s_1, s_2)$ and $t = (t_0, t_1, t_2)$, respectively.
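To make the four classes concrete, here is a minimal NumPy sketch that builds each matrix directly from its defining vector(s); the helper names are ours, not the paper's, and for $n = 3$ the Toeplitz and Hankel constructors consume exactly the 5-dim vectors of Example 1.

```python
import numpy as np

def toeplitz_from_vector(c):
    # c = (t_{-(n-1)}, ..., t_{-1}, t_0, t_1, ..., t_{n-1}) has 2n-1 entries;
    # entry (i, j) of the n x n Toeplitz matrix is t_{i-j}.
    n = (len(c) + 1) // 2
    t = {k: c[k + n - 1] for k in range(-(n - 1), n)}
    return np.array([[t[i - j] for j in range(n)] for i in range(n)])

def hankel_from_vector(h):
    # h = (h_0, ..., h_{2n-2}); entry (i, j) is h_{i+j} (constant antidiagonals).
    n = (len(h) + 1) // 2
    return np.array([[h[i + j] for j in range(n)] for i in range(n)])

def vandermonde(t):
    # entry (i, j) is t_i ** j.
    n = len(t)
    return np.array([[t[i] ** j for j in range(n)] for i in range(n)])

def cauchy(s, t):
    # entry (i, j) is 1 / (s_i - t_j); requires s_i != t_j for all i, j.
    return 1.0 / (np.asarray(s)[:, None] - np.asarray(t)[None, :])
```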
We refer the reader to [4] and [5] for more details on the basic properties and features of structured matrices.
3. Displacement Representation of Structured Matrices
The concept of displacement operators and displacement rank, initially introduced by Kailath, Kung, and Morf in 1979, and studied by other authors such as Bini and Pan, is a powerful tool for dealing with matrices with structure. The displacement rank approach was intended for more restricted use when it was initially introduced in [6] (also [7]), namely, to measure how "close" to Toeplitz a given matrix is. Then the idea turned out to be even deeper and more powerful, thus it was developed, generalized and extended to other structured matrices. In this paper, we consider the most general and modern interpretation of the displacement of a matrix $M$ (see Definition 2). The main idea is, for a given structured matrix $M$, we need to find an operator $L$ that transforms the matrix $M$ into a low rank matrix $L(M)$, such that one can easily recover $M$ from its image $L(M)$ and operate with low rank matrices instead. Such operators that shift and scale the entries of the structured matrices turn out to be appropriate tools for introducing and defining the matrices of Toeplitz-like, Hankel-like, Vandermonde-like, and Cauchy-like types, which we will introduce in Definition 4.
Definition 2 For any fixed field $\mathbb{F}$ (say the complex field $\mathbb{C}$) and a fixed pair $(A, B)$ of operator matrices, we define the linear displacement operators $L: \mathbb{F}^{m \times n} \to \mathbb{F}^{m \times n}$ of Sylvester type:
$$\nabla_{A,B}(M) = AM - MB \quad (2)$$
and Stein type:
$$\Delta_{A,B}(M) = M - AMB. \quad (3)$$
The image $L(M)$ of the operator $L$ is called the displacement of the matrix $M$. The operators of Sylvester and Stein types can be transformed easily into one another if at least one of the two associated operator matrices is nonsingular. This leads to the following theorem.
Theorem 1 $\nabla_{A,B}(M) = A\,\Delta_{A^{-1},B}(M)$ if the operator matrix $A$ is nonsingular, and $\nabla_{A,B}(M) = -\Delta_{A,B^{-1}}(M)\,B$ if the operator matrix $B$ is nonsingular.
Proof. $A\,\Delta_{A^{-1},B}(M) = A(M - A^{-1}MB) = AM - MB = \nabla_{A,B}(M)$, and $-\Delta_{A,B^{-1}}(M)\,B = -(M - AMB^{-1})B = AM - MB = \nabla_{A,B}(M)$.
The operator matrices that we will use are the unit $f$-circulant matrix $Z_f = [z_{i,j}]_{i,j=0}^{n-1}$, where $z_{i,j} = 1$ if $i - j = 1$, $z_{0,n-1} = f$, and $z_{i,j} = 0$ otherwise, and the diagonal matrix $D(v) = \operatorname{diag}(v_0, \ldots, v_{n-1})$. Note that $Z_0$ is a unit lower triangular Toeplitz matrix. $Z_f(v) = \sum_{i=0}^{n-1} v_i Z_f^i$ is an $f$-circulant matrix for a vector $v = (v_i)_{i=0}^{n-1}$. For example, $Z_0(v)$ is a lower triangular Toeplitz matrix.
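The unit $f$-circulant $Z_f$ and the $f$-circulant $Z_f(v)$ can be generated directly from these definitions; a short sketch with our own helper names:

```python
import numpy as np

def Z(f, n):
    # Unit f-circulant: ones on the first subdiagonal, f in the top-right corner.
    M = np.zeros((n, n))
    M[np.arange(1, n), np.arange(0, n - 1)] = 1.0
    M[0, -1] = f
    return M

def Z_poly(f, v):
    # f-circulant Z_f(v) = sum_i v_i * Z_f**i (a polynomial in Z_f).
    n = len(v)
    Zf, P, out = Z(f, n), np.eye(n), np.zeros((n, n))
    for vi in v:
        out += vi * P
        P = P @ Zf
    return out
```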
Table 1. Parameters and flops (floating point arithmetic operations) count.
Example 2 (Toeplitz Matrices). Let $M = T = [t_{i-j}]_{i,j=0}^{n-1}$ be a Toeplitz matrix. Then if we apply the displacement operator of Sylvester type (2) to the Toeplitz matrix $T$ using the shift operator matrices $A = Z_1$ and $B = Z_0$, we will get:
$$\nabla_{Z_1,Z_0}(T) = Z_1 T - T Z_0.$$
This operator displaces the entries of the matrix $T$ so that the image can be represented by its first row and its last column, all other entries of the image being zeros. Furthermore, the image is a matrix of rank at most 2.
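A quick numerical check of Example 2, reusing the `Z` and `toeplitz_from_vector` helpers sketched above: the displacement of a random Toeplitz matrix has rank at most 2 and vanishes off the first row and last column.

```python
import numpy as np
# Reuse Z(f, n) and toeplitz_from_vector from the earlier sketches.

n = 5
rng = np.random.default_rng(0)
T = toeplitz_from_vector(rng.standard_normal(2 * n - 1))

D = Z(1, n) @ T - T @ Z(0, n)        # Sylvester displacement of T
print(np.linalg.matrix_rank(D))      # at most 2
# Nonzero entries appear only in the first row and the last column:
print(np.abs(D[1:, :-1]).max())      # ~0 up to rounding
```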
Notice that the operator matrices vary depending on the structure of the matrix $M$. We use shift operator matrices (such as $Z_e$ and $Z_f$ for any scalars $e$ and $f$) for the structures of Toeplitz and Hankel type, scaling operator matrices (such as $D(s)$ and $D(t)$) for the Cauchy structure, and shift and scaling operator matrices for the structure of Vandermonde type. One can restrict the choices of $e$ and $f$ to 0 and 1. However, other operator matrices may be used with other matrix structures (see Table 2).
Definition 3 For an $m \times n$ structured matrix $M$ and an associated operator $L = \nabla_{A,B}$ or $L = \Delta_{A,B}$, the number $r = \operatorname{rank}(L(M))$ is called the $L$-rank or the displacement rank of the matrix $M$.
The constant $r$ in Definition 3 usually remains relatively small as the size of $M$ grows large, that is, $r$ is small relative to $m$ and $n$, say $r = O(1)$ as $m$ and $n$ grow large. In this case, we call the matrix $M$ $L$-like and say that it has structure of type $L$. Now let us recall the linear operators $L$ of (2) and (3) that map a matrix $M$ into its displacements:
$$L(M) = \nabla_{A,B}(M) = AM - MB = GH^T \quad (4)$$
and
$$L(M) = \Delta_{A,B}(M) = M - AMB = GH^T \quad (5)$$
where $A$ and $B$ are operator matrices of Table 2, and $G$ and $H$ are $m \times l$ and $n \times l$ matrices. Then the matrix pair $(G, H)$ is called a (nonunique) $L$-generator (or displacement generator) of length $l$ for $M$, $l \ge r = \operatorname{rank}(L(M))$ (the $L$-rank or the displacement rank of $M$). If $M$ is an $m \times n$ matrix and $r$ is small relative to $m$ and $n$, then $M$ is said to have $L$-structure or to be an $L$-structured matrix. The displacement $L(M)$ (as a matrix of small rank) can be represented with a short generator defined by only a small number of parameters. For instance, the singular value decomposition (SVD) of the matrix $L(M)$ of size $m \times n$ gives us its orthogonal generator that consists of two generator matrices $G$ and $H$ having orthogonal columns and of sizes $m \times r$ and $n \times r$ respectively, that is, a total of $(m + n)r$ entries. The matrix pair $(G, H)$ is called an $L$-generator (or displacement generator) of length $r$ for $M$. It is also a generator of length $r$ for $L(M)$. Note that the pair $(G, H)$ is nonunique for a fixed $M$. In particular, for Toeplitz, Hankel, Vandermonde, and Cauchy matrices, the length of the associated generators (4) and (5) is as small as 1 or 2 for some appropriate choices of the operator matrices $A$ and $B$ of Table 2. The four classes of structured matrices are naturally extended to Toeplitz-like, Hankel-like, Vandermonde-like, and Cauchy-like matrices, for which $r$ is bounded by a fixed (not too large) constant.
Definition 4 An $n \times n$ matrix $M$ is Toeplitz-like if $r = \operatorname{rank}(\nabla_{Z_e,Z_f}(M))$ is small relative to $n$ and if $M$ is given with its $\nabla_{Z_e,Z_f}$-generator of length $r$, for a fixed pair of scalars $e$ and $f$ with $e \ne f$. A matrix $M$ is Hankel-like if $MJ$ or $JM$ is Toeplitz-like, where $J = [j_{i,k}]_{i,k=0}^{n-1}$, with $j_{i,k} = 1$ if $i + k = n - 1$ and $j_{i,k} = 0$ otherwise, is the reflection matrix.

Table 2. Operator matrices for $\nabla_{A,B}$ and $\Delta_{A,B}$.
Therefore, the problems of solving Toeplitz-like and Hankel-like linear systems are reduced to each other.
Remark 1 In the case where the operator $L$ is associated with Toeplitz, Hankel, Vandermonde, or Cauchy matrices, we call an $L$-like matrix a Toeplitz-like, Hankel-like, Vandermonde-like, or Cauchy-like matrix, respectively, and we say that the matrix has the structure of Toeplitz, Hankel, Vandermonde, or Cauchy type, respectively. To take advantage of the matrix structure, we are going to COMPRESS the structured input matrix $M$ via its short $L$-generators based on Theorem 2 below, then we will OPERATE with $L$-generators rather than with the matrix itself, and finally RECOVER the output from the computed short $L$-generators.
Definition 5 A linear operator $L$ is nonsingular if the matrix equation $L(M) = 0$ implies that $M = 0$.
Definition 6 We define the norm of a nonsingular linear operator $L$ and of its inverse $L^{-1}$ as follows:
$$\|L\|_d = \sup_{M} \frac{\|L(M)\|}{\|M\|} \quad \text{and} \quad \|L^{-1}\|_d = \sup_{M} \frac{\|M\|}{\|L(M)\|},$$
for $\|\cdot\| = \|\cdot\|_2$ or $\|\cdot\| = \|\cdot\|_F$, and where the supremum is taken over all matrices $M$ having a positive $L$-rank of at most $d$. In this case the condition number $\operatorname{cond}_d(L)$ of the operator $L$ is defined as $\operatorname{cond}_d(L) = \|L\|_d\,\|L^{-1}\|_d$.
Theorem 2 [8] Let $G$ and $H$ be a pair of $n \times r$ matrices, $G = [g_1, \ldots, g_r]$ and $H = [h_1, \ldots, h_r]$, where $g_j$ and $h_j$ denote the $j$-th columns of $G$ and $H$ respectively. Let $L(M) = GH^T = \sum_{j=1}^{r} g_j h_j^T$. Then we have one of the following Toeplitz type matrices:
1) $M = \frac{1}{e-f}\sum_{j=1}^{r} Z_e(g_j)\,Z_f(Jh_j)$, where $L = \nabla_{Z_e,Z_f}$, $e \ne f$.
2) $M = \frac{1}{e-f}\sum_{j=1}^{r} Z_e(g_j)\,Z_f(h_j)\,J$, where $L = \nabla_{Z_e,Z_f^T}$, $e \ne f$.
3) $M = \frac{1}{1-ef}\sum_{j=1}^{r} Z_e(g_j)\,Z_f^T(h_j)$, where $L = \Delta_{Z_e,Z_f^T}$, $ef \ne 1$.
4) $M = \frac{1}{1-ef}\sum_{j=1}^{r} Z_e(g_j)\,Z_f^T(Jh_j)\,J$, where $L = \Delta_{Z_e,Z_f}$, $ef \ne 1$.
Here $Z_f^T(v)$ denotes $(Z_f(v))^T$ and $J$ is the reflection matrix of Definition 4.
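The following sketch verifies formula 1) of Theorem 2 numerically for a Toeplitz matrix, reusing the earlier helpers `Z`, `Z_poly`, and `toeplitz_from_vector` (our names); a length-$r$ generator is extracted from the SVD of the displacement and the matrix is then recovered from it.

```python
import numpy as np
# Reuse Z(f, n), Z_poly(f, v), toeplitz_from_vector from the earlier sketches.

n, e, f = 6, 1.0, 0.0
rng = np.random.default_rng(1)
M = toeplitz_from_vector(rng.standard_normal(2 * n - 1))
D = Z(e, n) @ M - M @ Z(f, n)                    # nabla_{Z_e,Z_f}(M) = G H^T

U, s, Vt = np.linalg.svd(D)
r = int((s > 1e-10).sum())                       # displacement rank (= 2 here)
G, H = U[:, :r] * s[:r], Vt[:r].T                # a length-r generator
J = np.eye(n)[::-1]                              # reflection matrix

R = sum(Z_poly(e, G[:, j]) @ Z_poly(f, J @ H[:, j]) for j in range(r)) / (e - f)
print(np.abs(R - M).max())                       # ~1e-14: M is recovered
```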
Theorem 3 Let $M$ be a nonsingular matrix; then we have the following:
1) $\nabla_{B,A}(M^{-1}) = -M^{-1}\,\nabla_{A,B}(M)\,M^{-1}$.
2) $\Delta_{B,A}(M^{-1}) = (AM)^{-1}\,\Delta_{A,B}(M)\,M^{-1}A$, if $A$ is a nonsingular matrix.
3) $\Delta_{B,A}(M^{-1}) = BM^{-1}\,\Delta_{A,B}(M)\,(MB)^{-1}$, if $B$ is a nonsingular matrix.
Proof.
1) $-M^{-1}\,\nabla_{A,B}(M)\,M^{-1} = -M^{-1}(AM - MB)M^{-1} = BM^{-1} - M^{-1}A = \nabla_{B,A}(M^{-1})$.
2) $(AM)^{-1}\,\Delta_{A,B}(M)\,M^{-1}A = M^{-1}A^{-1}(M - AMB)M^{-1}A = M^{-1} - BM^{-1}A = \Delta_{B,A}(M^{-1})$.
3) $BM^{-1}\,\Delta_{A,B}(M)\,(MB)^{-1} = BM^{-1}(M - AMB)B^{-1}M^{-1} = M^{-1} - BM^{-1}A = \Delta_{B,A}(M^{-1})$.
Theorem 4 Let a pair of $n \times r$ matrices $G$ and $H$ form a $\nabla_{A,B}$-generator of length $r$ for a nonsingular matrix $M$. Write $G' = -M^{-1}G$ and $H' = (M^{-1})^T H = M^{-T}H$. Then the pair $(G', H')$ is a $\nabla_{B,A}$-generator of length $r$ for $M^{-1}$, that is, $\nabla_{B,A}(M^{-1}) = G'H'^T$.
Proof. By Theorem 3, $\nabla_{B,A}(M^{-1}) = -M^{-1}\,\nabla_{A,B}(M)\,M^{-1} = -M^{-1}GH^TM^{-1} = G'H'^T$, where $G' = -M^{-1}G$ and $H' = M^{-T}H$.
4. Newton’s Iteration for General Matrix Inversion
Given an $n \times n$ nonsingular matrix $M$ and an initial approximation $X_0$ to its inverse $M^{-1}$ that satisfies the bound $\|I - MX_0\| \le q < 1$ for some positive $q$ and some fixed matrix norm, Newton's iteration (1) can be written in the form
$$X_{i+1} = X_i(2I - MX_i), \quad i = 0, 1, 2, \ldots \quad (6)$$
and rapidly improves the initial approximation $X_0$ to the inverse. $X_0$ can be supplied either by some computational method, such as the preconditioned conjugate gradient method, or by an exact solution algorithm that requires refinement due to rounding errors.
Lemma 1 Newton’s iteration (6) converges quadratically to the inverse of an
nonsingular matrix
.
Proof. From (6) , since
,
, then
. We define the error matrices
,
and the residual matrices
,
. Since
, then
and hence the norm of the residual matrices
is bounded above by
. This proves that
Newton’s iteration converges quadratically. ![]()
If $\epsilon$ is a given residual norm bound, the bound
$$\|I - MX_k\| \le q^{2^k} \le \epsilon \quad (7)$$
is guaranteed in $k = \lceil \log_2 \log_q \epsilon \rceil$ recursive steps (6), where $\lceil x \rceil$ stands for the smallest integer not exceeded by $x$. (7) also implies that $M^{-1} - X_k = M^{-1}(I - MX_k)$. Since $\|M^{-1}(I - MX_k)\| \le \|M^{-1}\|\,\|I - MX_k\|$, it follows that $\|M^{-1} - X_k\| \le \epsilon\,\|M^{-1}\|$, so that the matrix $X_k$ approximates $M^{-1}$ with a relative error norm less than $\epsilon$.
Remark 2 Each step of Newton’s iteration (6) involves the multiplication of
by
, the addition of 2 to all of the diagonal entries of the product, which takes
additions, and the pre-multiplication of the resulting matrix by
. The most costly steps are the two matrix multiplications; each step takes
multiplications and
additions for a pair of general
input matrices
and
. Of course, this cost will be reduced if the input matrix is structured (see Section 5).
Example 3 (Newton’s iteration versus linear convergent schemes.) Let
and
. Then by quadratic convergence of Newton’s iteration, we only need 5 iterative steps of (6) to compute
that satisfies the bound
and hence
. If we consider any linearly convergent scheme such as Jacobi’s or Gauss-Seidels’s iteration ([9] , p. 611) for the same input matrix, the norm of the error matrix of the computed approximation to
decreases by factor 2 in each iteration step. The error norm decreases to below
in
iteration steps, so this will take 19 iteration steps to ensure the bound (7) for the same
.
5. Newton’s Iteration for Structured Matrix Inversion
5.1. Newton-Structured Iteration
In Section 4, we showed that for a general input matrix $M$, the computational cost of iteration (6) is quite high because two matrix products must be computed at every step (see Remark 2). However, for structured matrices, matrix multiplication has a lower computational cost of $O(rln\log n)$ flops, provided that we operate with displacement generators of lengths $r$ and $l$ for the two input matrices $M$ and $X_i$ respectively. So if $M$ and $X_i$ are structured matrices, say Toeplitz or Toeplitz-like, the computations required for Newton's iteration can be carried out more efficiently. In general, with the four classes of structured matrices of Definition 1 we associate various linear displacement operators $L$ of Sylvester type $\nabla_{A,B}$ and/or Stein type $\Delta_{A,B}$. In this case, Newton's iteration can be efficiently applied and rapidly improves a rough initial approximation to the inverse, but it also destroys the structure of the four classes of structured matrices. Therefore, Newton's iteration has to be modified to preserve the initial displacement structure of the input structured matrix during the iteration and to maintain matrix structure throughout the computation. The main idea here is to control the growth of the length of the short displacement generators so that we can operate with matrices of low rank and carry out the computations much faster. This can be done based on one of the following two techniques. The first technique is to periodically chop off the components corresponding to the smallest singular values in the SVDs of the displacement matrices defined by their generators. This technique can be applied to a Toeplitz-like input matrix. For a Cauchy-like input matrix $M$, there is no need to involve the SVD due to the availability of the inverse formula for the Cauchy matrix $C(s,t)$ [10]. The second technique is to control the growth of the length of the generators by substituting the approximate inverses for the inverse matrix into its displacement. Here we assume that the matrices involved can be expressed via their displacement generators. Now, let us assume that the matrices $M$ and $X_0$ have short generators $(G, H)$ and $(G_0, H_0)$ respectively. So by operating with short $\nabla$-generators of the matrices $M$, $X_i$, and $X_{i+1}$ (rather than with the matrices themselves), we will have to modify Newton's iteration and use one of the above two techniques to control the length of a $\nabla$-generator of $X_{i+1}$, which seems to triple at every iterative step of (6). The same techniques apply in the case when Stein type operators and $\Delta$-generators are used. In this paper, we restrict our presentation to $\nabla$-generators (see Theorem 1).
5.2. SVD-Based Compression of the Generators
For a fixed operator $L$ and a matrix $M$, one can choose the orthogonal (SVD-based) $L$-generator matrices to obtain better numerical stability [11].
Definition 7 A matrix $U$ is said to be unitary iff $U^*U = I$. For real matrices, $U$ is orthogonal iff $U^TU = I$. If $U = [u_1, \ldots, u_n]$ is orthogonal, then its columns $u_1, \ldots, u_n$ form an orthonormal basis for $\mathbb{R}^n$.
Definition 8 Every $m \times n$ matrix $W$ has its SVD (singular value decomposition):
$$W = U\Sigma V^*, \quad (8)$$
$$\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_\rho, 0, \ldots, 0), \quad (9)$$
$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_\rho > 0, \quad \rho = \operatorname{rank}(W), \quad (10)$$
where $\sigma_1, \sigma_2, \ldots$ are the singular values of the matrix $W$, and $U^*$ and $V^*$ denote the Hermitian (conjugate) transposes of the unitary matrices $U$ and $V$, respectively. That is, $U^*U = I_m$ and $V^*V = I_n$.
Note that if $U$ and $V$ are real matrices, then $U^* = U^T$ and $V^* = V^T$. Based on the SVD of a matrix $W$, its orthogonal generator of the minimum length $\rho$ is defined by
$$G = U\Sigma, \quad H = V, \quad W = GH^*. \quad (11)$$
However, one can use an alternative diagonal scaling, in which case (11) can be written as
$$G = U\Sigma^{1/2}, \quad H = V\Sigma^{1/2}, \quad W = GH^*. \quad (12)$$
Note that if $W = L(M)$ for a matrix $M$, then the pair $(G, H)$ is an orthogonal $L$-generator of minimum length for the matrix $M$.
Example 4 Consider the following matrix
$$W = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} = U\Sigma V^T, \quad U = V = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}.$$
Using the Matlab command "[U, S, V] = svd(W)" produces the above results. Clearly, $\sigma_1 = 3$ is the largest singular value and $\sigma_2 = 1$ is the smallest singular value. $\|W\|_2 = \sigma_1 = 3$, $\operatorname{rank}(W) = 2$, and $\|W\|_F = \sqrt{\sigma_1^2 + \sigma_2^2} = \sqrt{10}$ (Frobenius norm). Furthermore, by (12), $G = U\Sigma^{1/2} = \frac{1}{\sqrt{2}}\begin{pmatrix} \sqrt{3} & 1 \\ \sqrt{3} & -1 \end{pmatrix}$ and $H = V\Sigma^{1/2} = \frac{1}{\sqrt{2}}\begin{pmatrix} \sqrt{3} & 1 \\ \sqrt{3} & -1 \end{pmatrix}$. Note that $GH^T = W$.
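In code, an orthogonal generator of (numerically) minimum length with the scaling (12) is one truncated SVD away; a minimal sketch with our own function name:

```python
import numpy as np

def orthogonal_generator(W, tol=1e-10):
    # Orthogonal generator of numerically minimal length for W = G H^T,
    # with the balanced scaling (12): G = U sqrt(S), H = V sqrt(S).
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    r = int((s > tol).sum())                 # numerical rank of W
    sq = np.sqrt(s[:r])
    return U[:, :r] * sq, Vt[:r].T * sq

W = np.array([[2.0, 1.0], [1.0, 2.0]])       # singular values 3 and 1
G, H = orthogonal_generator(W)
print(np.abs(G @ H.T - W).max())             # ~0 up to rounding
```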
Remark 3 In the numerical computation of the SVD with a fixed positive tolerance $t$ to the output errors caused by rounding ($t$ is the output error bound, which is also an upper bound on all the output errors of the singular values $\sigma_j$ and of the entries of the computed matrices $U$ and $V$, for all $j$), one truncates (sets to zero) all the smallest singular values $\sigma_j$ that do not exceed $t$. This will produce a numerical orthogonal generator $(G, H)$ of $L$-length $r$ for a matrix $M$, which is also a numerical orthogonal $L$-generator for $W = L(M)$. Here, $r$ is the numerical $L$-rank of the matrix $M$, which is equal to the number of the singular values of the displacement $L(M)$ exceeding $t$.
Theorem 5 ([9], p. 79) Given a matrix $W$ of rank $\rho$ and a nonnegative integer $r$ such that $r \le \rho$, it holds that
$$\min_{\operatorname{rank}(B) \le r} \|W - B\|_2 = \sigma_{r+1}(W).$$
That is, the error in the optimal choice of an approximation of $W$ whose rank does not exceed $r$ is equal to the $(r+1)$-st singular value of $W$.
Theorem 6 If $M$ is a nonsingular matrix, then $\operatorname{rank}(\nabla_{B,A}(M^{-1})) = \operatorname{rank}(\nabla_{A,B}(M))$.
Proof. By Theorem 3, $\operatorname{rank}(\nabla_{B,A}(M^{-1})) = \operatorname{rank}(-M^{-1}\,\nabla_{A,B}(M)\,M^{-1}) \le \operatorname{rank}(\nabla_{A,B}(M))$. On the other hand, applying the same argument to the matrix $M^{-1}$ in place of $M$ yields $\operatorname{rank}(\nabla_{A,B}(M)) \le \operatorname{rank}(\nabla_{B,A}(M^{-1}))$. This implies that the two ranks are equal.
Definition 9 Let $L = \nabla_{B,A}$ be a nonsingular linear operator, let $\|L\|_d$ and $\|L^{-1}\|_d$ be the operator norms of Definition 6, and write $\nu = \|L\|_d$, $\nu^- = \|L^{-1}\|_d$. We define
$$\theta = \nu\,\nu^- = \operatorname{cond}_d(L), \quad (13)$$
$$\rho = \nu^-\,\|\nabla_{A,B}(M)\|_2. \quad (14)$$
6. Our Main Algorithm and Results
Outline of the Techniques
By Theorem 6, we assume that $\operatorname{rank}(\nabla_{A,B}(M)) = r$ and hence $\operatorname{rank}(\nabla_{B,A}(M^{-1})) = r$. The matrices $X_i$ that approximate $M^{-1}$ for large $i$ have a nearby matrix of $\nabla_{B,A}$-displacement rank at most $r$ for large $i$. This fact motivates the following approach to decrease the computational cost of the iteration in (6): We will replace $X_i$ in (6) with a nearby matrix $Y_i$ having $\nabla_{B,A}$-displacement rank of at most $r$ and then restart the iteration with $Y_i$ instead of $X_i$. The advantage of this modification is the decrease of the computational cost at the $i$-th Newton step and at all subsequent steps. However, the disadvantage is a possible deterioration of the approximation, since we do not know $M^{-1}$, and the transition from $X_i$ to $Y_i$ may occur in a wrong direction. But since $X_i$ and $Y_i$ are supposed to lie close to $M^{-1}$, they both lie close to each other. This observation enables us to bound the deterioration of the approximation relative to the current approximation error, so that the quadratic improvement in the error bound at the next step of Newton's iteration will immediately compensate us for such a deterioration. Furthermore, the transition from $X_i$ to a nearby matrix $Y_i$ of a small displacement rank can be done at a low computational cost. In the next main algorithm we describe this approach for Sylvester type operators only. The same results can be extended to Stein type operators $\Delta_{A,B}$ using Theorem 1 with slight modifications to the technique and the theory.
Algorithm 1 (Newton’s Iteration Using Sylverster Type Operators)
INPUT: An integer
, two matrices
and
defining the operator
, an
nonsingular structured matrix
having
-rank
and defined by its
-generator
of length at most
, a sufficiently close initial approximation
to the inverse
given with its
-generator of length at most
, an upper bound
on the number of Newton’s iteration steps, and a compression subalgorithm
or
for the transition from a
-generator of length at most
for an
matrix approximating
to
-generator of length at most
for a nearby matrix.
OUTPUT: A
-generator of length at most
for a matrix
approximating
.
COMPUTATION: For
, recursively compute a
-generators of length at most
for the matrices
(15)
and then apply the compression subalgorithm
or
(transition from
to
) to the matrix
to compute
-generators of length at most
for the matrices
.
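The overall structure of Algorithm 1 can be sketched as follows. For clarity the iterate is kept as a dense array and the compression rule is passed in as a callback (e.g., the `compress_truncate` or `compress_substitute` sketches given with the subalgorithms below); a genuinely superfast implementation would instead store only the length-$O(r)$ generators and multiply structured matrices via FFTs.

```python
import numpy as np

def newton_structured(M, Y0, K, compress):
    # Sketch of Algorithm 1: Newton step (15) followed by compression.
    # Dense arrays are used here for clarity only; a superfast version
    # operates on displacement generators in O(r^2 n log n) flops per step.
    n = M.shape[0]
    I = np.eye(n)
    Y = Y0
    for _ in range(K):
        X = Y @ (2 * I - M @ Y)   # step (15): generator length at most 3r
        Y = compress(X)           # Subalgorithm 1 or 2: back to length r
    return Y
```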
At the $i$-th step (15) of Algorithm 1, a $\nabla_{B,A}$-generator of length at most $3r$ for the matrix $X_{i+1}$ can be computed using $O(r^2 n \log n)$ flops for the Toeplitz-like matrices considered here. As mentioned earlier, the main idea is to control the growth of the length of the short displacement generators. So the question is: how can we control the length of the computed $\nabla_{B,A}$-generators? That is, we need to compress the length of the generators. Therefore, to complete Algorithm 1, we need to introduce two compression subalgorithms, namely, Subalgorithm 1 and Subalgorithm 2. First, we describe the compression subalgorithm based on the SVD of the displacement $\nabla_{B,A}(X_{i+1})$. To do this, we compute the SVD-based orthogonal generator of the displacement $\nabla_{B,A}(X_{i+1})$ and decrease the length of the $\nabla_{B,A}$-generator to $r$ by zeroing (truncating) all singular values $\sigma_j$ for $j > r$. Then we output the resulting orthogonal $\nabla_{B,A}$-generator of length at most $r$ for a nearby matrix $Y_{i+1}$, which is unique and lies near $X_{i+1}$.
Remark 4 By Theorem 5 we can deduce that $\|\nabla_{B,A}(Y_i) - \nabla_{B,A}(X_i)\|_2 = \sigma_{r+1}(\nabla_{B,A}(X_i))$ is small, because $\nabla_{B,A}(X_i)$ approximates the displacement $\nabla_{B,A}(M^{-1})$, whose rank is at most $r$, and this also implies that $Y_i$ lies near $X_i$ if the operator $\nabla_{B,A}$ is invertible.
Lemma 2 (1) $\|\nabla_{B,A}(X_i) - \nabla_{B,A}(Y_i)\|_2 = \sigma_{r+1}(\nabla_{B,A}(X_i))$, (2) $\|\nabla_{B,A}(X_i) - \nabla_{B,A}(M^{-1})\|_2 \le \nu\,\|X_i - M^{-1}\|_2$, where $\nu$ is as in (13).
Proof. 1) Since $\|W\|_2 = \sigma_1(W)$ (the largest singular value) for any matrix $W$, and since the difference $\nabla_{B,A}(X_i) - \nabla_{B,A}(Y_i)$ consists exactly of the truncated terms $\sum_{j > r} \sigma_j u_j v_j^*$ of the SVD of $\nabla_{B,A}(X_i)$, (1) follows from this fact. 2) Since the operator $\nabla_{B,A}$ is linear, then $\nabla_{B,A}(X_i) - \nabla_{B,A}(M^{-1}) = \nabla_{B,A}(X_i - M^{-1})$. Now, by Definition 6, $\|\nabla_{B,A}(X_i - M^{-1})\|_2 \le \nu\,\|X_i - M^{-1}\|_2$.
Lemma 3 $\|Y_i - X_i\|_2 \le \theta\,\|X_i - M^{-1}\|_2$, where $\theta$ is as in (13).
Proof. Using the estimate from ([9], p. 442), that is, for any two matrices $A$ and $B$, $|\sigma_j(A + B) - \sigma_j(A)| \le \|B\|_2$ for all $j$, we apply it with $A = \nabla_{B,A}(M^{-1})$ and $A + B = \nabla_{B,A}(X_i)$ and conclude that $|\sigma_j(\nabla_{B,A}(X_i)) - \sigma_j(\nabla_{B,A}(M^{-1}))| \le \|\nabla_{B,A}(X_i) - \nabla_{B,A}(M^{-1})\|_2$, where the $\sigma_j$ are defined in (8)-(10) of Definition 8. For all $j > r$ we have $\sigma_j(\nabla_{B,A}(M^{-1})) = 0$, and then we obtain $\sigma_{r+1}(\nabla_{B,A}(X_i)) \le \|\nabla_{B,A}(X_i) - \nabla_{B,A}(M^{-1})\|_2$. Now, we use (2) of Lemma 2 and deduce $\sigma_{r+1}(\nabla_{B,A}(X_i)) \le \nu\,\|X_i - M^{-1}\|_2$. If we combine the latest inequality with equation (1) of Lemma 2 and with the bound $\|Y_i - X_i\|_2 \le \nu^-\,\|\nabla_{B,A}(Y_i) - \nabla_{B,A}(X_i)\|_2$ of Definition 6, then we conclude that $\|Y_i - X_i\|_2 \le \nu^- \nu\,\|X_i - M^{-1}\|_2 = \theta\,\|X_i - M^{-1}\|_2$.
Now let us use Lemma 2 and Lemma 3 to prove the bound in the next theorem.
Theorem 7 If $L = \nabla_{B,A}$ is a nonsingular linear operator and $\theta$ is the positive scalar defined in (13), then the bound $\|Y_i - M^{-1}\| \le (1 + \theta)\,\|X_i - M^{-1}\|$ holds for $\operatorname{cond}_d(L) = \theta$ of Definition 6.
Proof. By Definition 6, replacing $L$ by $L^{-1}$ and $M$ by $X_i - Y_i$, we have $\|X_i - Y_i\| \le \|L^{-1}\|_d\,\|L(X_i) - L(Y_i)\|$, which holds because $X_i - Y_i = L^{-1}(L(X_i) - L(Y_i))$ by the linearity of $L$. Now, $\|Y_i - M^{-1}\| \le \|Y_i - X_i\| + \|X_i - M^{-1}\|$ by the triangle inequality. Then replace $\|Y_i - X_i\|$ by its upper bound of Lemma 3 to deduce that $\|Y_i - M^{-1}\| \le \theta\,\|X_i - M^{-1}\| + \|X_i - M^{-1}\|$. Finally, use the inequality of Lemma 3 once more to conclude that $\|Y_i - M^{-1}\| \le (1 + \theta)\,\|X_i - M^{-1}\|$. This completes the proof.
Subalgorithm 1 (Compression of the Displacement by Truncation of Its Smallest Singular Values.)
INPUT: An integer $r > 0$, two matrices $A$ and $B$ defining the operator $\nabla_{B,A}$ for an $n \times n$ nonsingular structured matrix $M$ with $\operatorname{rank}(\nabla_{B,A}(M^{-1})) = r$, and a $\nabla_{B,A}$-generator $(G_i, H_i)$ of length $l$, $l \le 3r$, for a matrix $X_i$ approximating $M^{-1}$.
OUTPUT: A $\nabla_{B,A}$-displacement generator of length at most $r$ for a matrix $Y_i$ such that $\|Y_i - M^{-1}\| \le (1 + \theta)\,\|X_i - M^{-1}\|$, where $\theta$ is as in (13) and $\operatorname{cond}_d(L)$ is of Definition 6.
COMPUTATIONS: Compute the SVD of the displacement $\nabla_{B,A}(X_i) = G_i H_i^T = U\Sigma V^*$ (the computation is not costly since it is performed with the $n \times l$ generator matrices $G_i$ and $H_i$, where $l \le 3r$, and since $r$ is small, see [11]). Set to zero the diagonal entries $\sigma_{r+1}, \ldots, \sigma_l$ of $\Sigma$, that is, turn $\Sigma$ into a diagonal matrix of rank at most $r$. ($\sigma_{r+1}, \ldots, \sigma_l$ are the smallest singular values of the matrix $\nabla_{B,A}(X_i)$.) Compute and output the matrices obtained from $U\Sigma$ and $V$, respectively, by deleting their last $l - r$ columns.
Example 5 For an $n \times n$ Toeplitz-like matrix $X_i$, consider the operator $\nabla_{B,A} = \nabla_{Z_0,Z_1}$, where the pair $(G_i, H_i)$ of $n \times l$ matrices, $l \le 3r$, is the orthogonal, SVD-based generator for the displacement $\nabla_{B,A}(X_i)$. Then Subalgorithm 1 is applied to the matrix $X_i$ and outputs a matrix $Y_i$ of displacement rank at most $r$. The computational cost of performing Subalgorithm 1 is dominated by the computation of the SVD of the displacement, which requires $O(l^2 n)$ flops for $l \le 3r$.
Next, we will present the second compression technique, which is based on substitution and does not depend on the SVD. By Theorem 4 we conclude that $\nabla_{B,A}(M^{-1}) = -M^{-1}GH^TM^{-1}$, and by Theorem 3 we also have $\Delta_{B,A}(M^{-1}) = (AM)^{-1}\,\Delta_{A,B}(M)\,M^{-1}A$ if $A$ is nonsingular, and $\Delta_{B,A}(M^{-1}) = BM^{-1}\,\Delta_{A,B}(M)\,(MB)^{-1}$ if $B$ is nonsingular. Clearly this expresses the displacement of $M^{-1}$ via the displacement of $M$, the operator matrices $A$ and $B$, and the products of the inverse with $2r$ specified vectors, namely the columns of $G$ and $H$, where $GH^T = \nabla_{A,B}(M)$ or $GH^T = \Delta_{A,B}(M)$. Now since $X_{i+1} \approx M^{-1}$, let us substitute $X_{i+1}$ for $M^{-1}$ on the right hand side and define a short $\nabla_{B,A}$-generator for the matrix $Y_{i+1}$:
$$\nabla_{B,A}(Y_{i+1}) = -X_{i+1}GH^TX_{i+1}. \quad (16)$$
Since $X_{i+1} \approx M^{-1}$, we expect that $Y_{i+1} \approx M^{-1}$ because $\nabla_{B,A}(M^{-1}) = -M^{-1}GH^TM^{-1}$.
Subalgorithm 2 (Compression of the Displacement by Substitution.)
INPUT: An integer $r > 0$, a pair of $n \times n$ operator matrices $A$ and $B$ defining the operator $\nabla_{A,B}$, a $\nabla_{A,B}$-generator $(G, H)$ of length $r$ for a nonsingular matrix $M$ where $\nabla_{A,B}(M) = GH^T$, and a $\nabla_{B,A}$-generator of length at most $3r$ for a matrix $X_{i+1}$ of Newton's iteration (6).
OUTPUT: A $\nabla_{B,A}$-generator $(G_{i+1}, H_{i+1})$ of length at most $r$ for a matrix $Y_{i+1}$ satisfying the equation (16) with the bound
$$\|Y_{i+1} - M^{-1}\|_2 \le \rho\,(2\|M^{-1}\|_2 + \|X_{i+1} - M^{-1}\|_2)\,\|X_{i+1} - M^{-1}\|_2, \quad (17)$$
where $\nu^- = \|L^{-1}\|_d$ and $\operatorname{cond}_d(L)$ are of Definition 6, $\rho$ is of (14), and $\theta$ is of (13).
COMPUTATIONS: Compute the matrix products
$$G_{i+1} = -X_{i+1}G \quad (18)$$
and
$$H_{i+1} = X_{i+1}^T H. \quad (19)$$
Now the pair $(G_{i+1}, H_{i+1})$ is a $\nabla_{B,A}$-generator of length at most $r$ for a matrix $Y_{i+1}$ satisfying the equation $\nabla_{B,A}(Y_{i+1}) = G_{i+1}H_{i+1}^T$. Here we assume that the operator $\nabla_{B,A}$ is nonsingular; then the matrix $Y_{i+1}$ is uniquely defined by its generators (18), (19), and thus (18) and (19) completely define Newton's process without involving the matrix $Y_{i+1}$ itself. This subalgorithm achieves compression because the length of the output generator is at most $r$, the length of the input generator $(G, H)$ of $M$, and it preserves approximation. Since $X_{i+1} \approx M^{-1}$, we deduce from Theorem 3 that $G_{i+1}H_{i+1}^T = -X_{i+1}GH^TX_{i+1} \approx -M^{-1}GH^TM^{-1} = \nabla_{B,A}(M^{-1})$. The computation of the matrices of (16) is reduced to the multiplication of the matrix $X_{i+1}$ by the $n \times 2r$ matrix $(G, H)$, which requires $O(r^2 n \log n)$ flops for Toeplitz-like matrices.
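Subalgorithm 2 amounts to the two matrix-by-generator products (18)-(19); in dense form the whole compression step reads:

```python
import numpy as np

def compress_substitute(X_next, G, H):
    # Subalgorithm 2: the products (18)-(19). The pair (G_{i+1}, H_{i+1})
    # is a nabla_{B,A}-generator of length at most r for Y_{i+1}, since
    # G_{i+1} H_{i+1}^T = -X_{i+1} G H^T X_{i+1} as in (16).
    return -X_next @ G, X_next.T @ H
```

The returned pair $(G_{i+1}, H_{i+1})$ is the length-$r$ generator of $Y_{i+1}$; the matrix $Y_{i+1}$ itself never needs to be formed explicitly.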
Now, we need to prove bound (17).
Theorem 8 For the matrices $M$, $X_{i+1}$, and $Y_{i+1}$ of (16), and for $E_{i+1} = X_{i+1} - M^{-1}$, we have
$$\|\nabla_{B,A}(Y_{i+1}) - \nabla_{B,A}(M^{-1})\|_2 \le \|GH^T\|_2\,(2\|M^{-1}\|_2 + \|E_{i+1}\|_2)\,\|E_{i+1}\|_2.$$
Proof. Recall the matrix equations $\nabla_{B,A}(Y_{i+1}) = -X_{i+1}GH^TX_{i+1}$ and $\nabla_{B,A}(M^{-1}) = -M^{-1}GH^TM^{-1}$. Then we can write $X_{i+1} = M^{-1} + E_{i+1}$, and
$$\nabla_{B,A}(Y_{i+1}) - \nabla_{B,A}(M^{-1}) = -(M^{-1} + E_{i+1})GH^T(M^{-1} + E_{i+1}) + M^{-1}GH^TM^{-1}.$$
Now, let us expand the right hand side; then
$$\nabla_{B,A}(Y_{i+1}) - \nabla_{B,A}(M^{-1}) = -E_{i+1}GH^TM^{-1} - M^{-1}GH^TE_{i+1} - E_{i+1}GH^TE_{i+1}.$$
Now, take the norm of both sides:
$$\|\nabla_{B,A}(Y_{i+1}) - \nabla_{B,A}(M^{-1})\|_2 \le \|GH^T\|_2\,\|E_{i+1}\|_2\,\|M^{-1}\|_2 + \|M^{-1}\|_2\,\|GH^T\|_2\,\|E_{i+1}\|_2 + \|GH^T\|_2\,\|E_{i+1}\|_2^2.$$
Simplifying the latest inequality yields the following bound:
$$\|\nabla_{B,A}(Y_{i+1}) - \nabla_{B,A}(M^{-1})\|_2 \le \|GH^T\|_2\,(2\|M^{-1}\|_2 + \|E_{i+1}\|_2)\,\|E_{i+1}\|_2. \quad (20)$$
Theorem 9 Let $Y_{i+1}$ be the matrix of (16), let $E_{i+1} = X_{i+1} - M^{-1}$, and let $\theta$ be as in (13) and $\nu^- = \|L^{-1}\|_d$ be of Definition 6. Then we have $\|Y_{i+1} - M^{-1}\|_2 \le \rho\,(2\|M^{-1}\|_2 + \|E_{i+1}\|_2)\,\|E_{i+1}\|_2$ for $\rho$ of (14).
Proof. Recall that $Y_{i+1} - M^{-1} = \nabla_{B,A}^{-1}(\nabla_{B,A}(Y_{i+1}) - \nabla_{B,A}(M^{-1}))$, and since the operator $\nabla_{B,A}$ is linear, then we have $\|Y_{i+1} - M^{-1}\|_2 \le \nu^-\,\|\nabla_{B,A}(Y_{i+1}) - \nabla_{B,A}(M^{-1})\|_2$. If we use (20) for the right hand side, we get $\|Y_{i+1} - M^{-1}\|_2 \le \nu^-\,\|GH^T\|_2\,(2\|M^{-1}\|_2 + \|E_{i+1}\|_2)\,\|E_{i+1}\|_2$. Therefore, $\|Y_{i+1} - M^{-1}\|_2 \le \rho\,(2\|M^{-1}\|_2 + \|E_{i+1}\|_2)\,\|E_{i+1}\|_2$ by (14). This proves bound (17).
7. Numerical Experiments
Algorithm 1 was tested numerically for $n \times n$ Toeplitz input matrices $M$ by applying the compression Subalgorithm 1. We used Matlab 8.3.0 on a Windows 7 machine to run our tests. The Matlab SVD subroutine was used to obtain the singular value decomposition. In general, the complexity of the SVD of an $n \times n$ square matrix is $O(n^3)$ arithmetic operations. However, here it is actually applied to structured matrices with smaller rank, so it is not really expensive. In addition, CLAPACK provides one of the faster SVD algorithms, applicable to real and complex matrices with a lower time complexity (see [11]). The test results are summarized in the following example.
Example 6 In the tables below, $k$ refers to the number of Newton's steps, $n$ is the matrix size, $t$ is the number of times we ran the test, $\epsilon$ is a selected residual bound, and $\operatorname{cond}(M)$ is the condition number of $M$, that is, the ratio $\sigma_1(M)/\sigma_n(M)$ of the largest to the smallest singular value of the matrix $M$. We restricted our numerical experiments to the following classes of $n \times n$ Toeplitz matrices:
1) Real symmetric tridiagonal Toeplitz matrices $M$, whose main diagonal entries all equal a constant $a$ and whose first super- and subdiagonal entries all equal a constant $b$, for one choice of $a$ and $b$ (the first table of Table 3) or another (the second table of Table 3) (see Figure 1). A test driver for this class is sketched after this example.
2) The matrices $M = M^T$ of the third table of Table 3 (symmetric positive definite) and of the fourth table of Table 3 (symmetric indefinite) (see Figure 1).
3) Randomly generated Toeplitz matrices, whose entries are chosen randomly and are uniformly distributed with mean 0 and standard deviation 1 (see Table 4 and Figure 2).
For a symmetric positive definite matrix $M$, the initial matrix $X_0 = I/\|M\|_2$ was used, whereas for unsymmetric and symmetric indefinite Toeplitz matrices $M$, Newton's iteration was initialized with $X_0 = M^T/(\|M\|_1\|M\|_\infty)$.
Table 3. The first two tables represent the tridiagonal matrices of Class 1, and the last two tables represent the matrices of Class 2.
Table 4. Random matrices of Class 3.
Acknowledgements
We thank the editor and the referees for their comments.