The Equivalence between Orthogonal Iterations and Alternating Least Squares

This note explores the relations between two different methods. The first one is the Alternating Least Squares (ALS) method for calculating a rank-k approximation of a real m × n matrix, A. This method has important applications in nonnegative matrix factorizations, in matrix completion problems, and in tensor approximations. The second method is called Orthogonal Iterations. Other names of this method are Subspace Iterations, Simultaneous Iterations, and the block-Power method. Given a real symmetric matrix, G, this method computes k dominant eigenvectors of G. To see the relation between these methods we assume that G = A^T A. It is shown that in this case the two methods generate the same sequence of subspaces and the same sequence of low-rank approximations. This equivalence provides new insight into the convergence properties of both methods.

More precisely, we seek matrices X and Y that solve the problem

minimize ||A − X Y^T||_F^2 subject to X ∈ R^{m×k}, Y ∈ R^{n×k},

where ||·||_F denotes the Frobenius matrix norm. The ALS iteration treats this problem by alternately fixing one factor and solving a linear least squares problem for the other. The details of the ALS iteration are discussed in the next section.
The orthogonal iterations method has a different aim and a different motivation.
Let G ∈ R^{n×n} be a given symmetric matrix. Then this method is aimed at computing k dominant eigenvectors of G. It is best suited for handling large sparse matrices in which a matrix-vector product needs only O(n) flops. It is also assumed that k is considerably smaller than n. Other names of this method are "subspace iterations", "simultaneous iterations", and "block-Power method", e.g., [3]. The basic iteration consists of two steps.
Step 1: Given V_ℓ, compute the matrix W_{ℓ+1} = G V_ℓ.
Step 2: Compute V_{ℓ+1} to be a matrix whose columns constitute an orthonormal basis of Range(W_{ℓ+1}). In practice V_{ℓ+1} is obtained by applying a QR factorization of W_{ℓ+1}.
Using the Rayleigh-Ritz procedure it is possible to extract from V_{ℓ+1} the corresponding estimates of the desired eigenpairs of G. The details are discussed in Section 3.
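The two steps above can be sketched in a few lines of NumPy (a minimal illustration; the function name and the random starting basis are our choices, not part of the original method description):

```python
import numpy as np

def orthogonal_iterations(G, k, iters=100, seed=None):
    """Basic orthogonal iterations for a real symmetric matrix G.

    Returns an n-by-k matrix V whose columns approximate an orthonormal
    basis of the dominant k-dimensional eigenspace of G.
    """
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    V, _ = np.linalg.qr(rng.standard_normal((n, k)))  # random start V_0
    for _ in range(iters):
        W = G @ V                 # Step 1: W_{l+1} = G V_l
        V, _ = np.linalg.qr(W)    # Step 2: orthonormal basis of Range(W_{l+1})
    return V
```

For a large sparse G the product G @ V would be carried out by a sparse matrix-vector routine, which is where the O(n) cost per product comes in.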
The aim of this note is to show that ALS is closely related to "orthogonal iterations". To see this relation we assume that G = A^T A. In this case the two methods generate the same sequence of subspaces and the same sequence of low-rank approximations. The proof is given in Section 4.
The equivalence relations that we derive provide important insight into the behavior of both methods. In particular, as explained in Section 3, the rate of convergence of orthogonal iterations is determined by ratios between certain eigenvalues of G. This implies that the rate of convergence of the ALS method obeys a similar rule. Moreover, there are several ways to accelerate orthogonal iterations, and these methods can be adapted to accelerate the ALS method. Conversely, since ALS is a minimization method, its objective function is monotonically decreasing. This suggests that the orthogonal iterations method has an analogous property.
The relation between ALS and the block-Power method was recently observed by Jain et al. [24] in the context of matrix completion algorithms. A further discussion of this relation is given in Hardt [21] [22]. However, the observations made in these works rely on several assumptions on the data matrix. For example, it is assumed that A has missing entries, and that the locations of the missing entries (or the known entries) satisfy certain statistical conditions. It is also assumed that the singular vectors of A satisfy a certain coherence requirement, that A is a low-rank matrix, and in [21] [22] it is assumed to be symmetric. In contrast, our analysis makes no assumptions on A. Consequently, the algorithms considered in [21] [22] [24] are quite different from the classic versions that are discussed below, and they yield different results. Nevertheless, an important conclusion drawn in [21] [22] [24] is that the convergence properties of orthogonal iterations can be used to understand the behavior of ALS when applied to matrix completion problems. The equivalence relations that we derive in the next sections help to achieve this goal.

The Alternating Least Squares (ALS) Method
In this section we describe two versions of the ALS iteration. The basic scheme solves the linear systems by using a QR factorization that is followed by a back substitution process, while the modified scheme avoids back substitution. This reduces the computational effort per iteration and helps to see the relation with orthogonal iterations.

The first step of the basic iteration requires the solution of (1.3). Let a_i^T, i = 1, ..., m, denote the ith row of A, and let x_i^T, i = 1, ..., m, denote the ith row of X. Then x_i solves the linear least squares problem

minimize ||Y x − a_i||_2,   (2.1)

where ||·||_2 denotes the Euclidean vector norm. The solution of (2.1) is carried out by applying a QR factorization of Y. Similar arguments enable us to solve (1.4). Let c_j, j = 1, ..., n, denote the jth column of A, and let y_j^T, j = 1, ..., n, denote the jth row of Y. Then we see that y_j solves the linear least squares problem

minimize ||X y − c_j||_2.

The computation of y_j is carried out by applying a QR factorization X = QR, where R is an upper triangular matrix.
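One sweep of this basic scheme can be sketched as follows (a hedged illustration: the helper name als_basic_step is ours, and a general solver stands in for the dedicated back substitution routine, e.g. scipy.linalg.solve_triangular):

```python
import numpy as np

def als_basic_step(A, Y):
    """One sweep of the basic ALS iteration (illustrative sketch).

    With Y fixed, each row x_i of X solves min ||Y x - a_i||_2;
    with X fixed, each row y_j of Y solves min ||X y - c_j||_2.
    Each solve uses a thin QR factorization plus back substitution.
    """
    Q, R = np.linalg.qr(Y)                    # Y = Q R, R upper triangular
    X = np.linalg.solve(R, (A @ Q).T).T       # back substitution: R x_i = Q^T a_i
    Qh, Rh = np.linalg.qr(X)                  # X = Qh Rh
    Y = np.linalg.solve(Rh, (A.T @ Qh).T).T   # back substitution: Rh y_j = Qh^T c_j
    return X, Y
```

Repeated sweeps drive ||A − X Y^T||_F monotonically downward, which is the decreasing property used later in the paper.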
respectively. In practice, a matrix-vector product of the form R^{-1} b is computed by solving the system R y = b via back substitution. In matrix notations the above equalities are summarized as

X = A Y (Y^T Y)^{-1}  and  Y = A^T X (X^T X)^{-1}.

Below we describe a modified scheme which saves the multiplications by R^{-1}. (That is, it avoids the back substitution processes.) The modified scheme is based on the following observations. Let Y = Q R be a QR factorization of Y, where Q has orthonormal columns. Since Range(Y) = Range(Q), this equality allows us to replace the problem minimize ||A − X Y^T||_F with the problem minimize ||A − X̂ Q^T||_F, while the last problem has the explicit solution X̂ = A Q.
Similarly, given a QR factorization X̂ = Q̂ R̂, it is possible to replace the problem minimize ||A − X̂ Y^T||_F with the problem minimize ||A − Q̂ Y^T||_F, while the last problem has the explicit solution Y = A^T Q̂. The modified ALS iteration is summarized in the following two steps.
Step 1: Given Q_ℓ, compute X_{ℓ+1} = A Q_ℓ and a QR factorization X_{ℓ+1} = Q̂_{ℓ+1} R̂_{ℓ+1}.
Step 2: Given Q̂_{ℓ+1}, compute Y_{ℓ+1} = A^T Q̂_{ℓ+1} and a QR factorization Y_{ℓ+1} = Q_{ℓ+1} R_{ℓ+1}. Substituting the last expression for Q̂_{ℓ+1} into (2.8) gives Range(Q_{ℓ+1}) = Range(A^T A Q_ℓ); see [45]. The modified ALS iteration has recently been considered in Oseledets et al. [38] under the name "simultaneous orthogonal iterations". It is shown there that the modified version is equivalent to ALS and, therefore, has the same rate of convergence.
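One sweep of the modified scheme, written without any back substitution (an illustrative sketch; the helper name als_modified_step is ours):

```python
import numpy as np

def als_modified_step(A, Q):
    """One sweep of the modified ALS iteration (illustrative sketch).

    Q (n-by-k) has orthonormal columns, so min ||A - X Q^T||_F has the
    explicit solution X = A Q; the analogous problem for the second
    factor has the explicit solution Y = A^T Qh.
    """
    Qh, _ = np.linalg.qr(A @ Q)      # Step 1: X = A Q, then orthonormalize
    Q, _ = np.linalg.qr(A.T @ Qh)    # Step 2: Y = A^T Qh, then orthonormalize
    return Q, Qh
```

Note that only orthonormal factors are carried between sweeps; the triangular factors are discarded, which is exactly what removes the back substitution cost.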

Orthogonal Iterations
The basic iteration of Section 1 is now written with the matrices Y_ℓ and V_ℓ.
Step 1: Compute V_ℓ, a matrix whose columns constitute an orthonormal basis of Range(Y_ℓ), by applying a QR factorization of Y_ℓ.
Step 2: Compute Y_{ℓ+1} = G V_ℓ.
The approximation of the desired eigenpairs is achieved by applying the Rayleigh-Ritz procedure. For this purpose the basic iteration is extended with the following three steps.
Step 3: Compute the related Rayleigh-quotient matrix H_ℓ = V_ℓ^T G V_ℓ.
Step 4: Compute a spectral decomposition of H_ℓ: H_ℓ = U D U^T, where U is orthogonal and D is diagonal. (The diagonal entries of D are called "Ritz values".)
Step 5: If desired, compute the related matrix of k "Ritz vectors", V_ℓ U.
It is important to note that Steps 3-5 are not essential for the computation of Y_{ℓ+1}. Hence it is possible to defer these steps. The orthogonal iterations method and its convergence properties are derived in the pioneering works of Bauer [4], Jennings [25] [26], Stewart [44], Rutishauser [40] [41], and Clint and Jennings [11]. In these papers the method is called simultaneous iteration. See also the discussions in ([3], pp. 54). The last observations open the gate for accelerating the basic orthogonal iteration. Below we mention a number of ways to achieve this task.
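Steps 3-5 admit a compact sketch (the helper name rayleigh_ritz is ours; V denotes the current orthonormal basis):

```python
import numpy as np

def rayleigh_ritz(G, V):
    """Extract Ritz values and Ritz vectors from an orthonormal basis V."""
    H = V.T @ G @ V                     # Step 3: Rayleigh-quotient matrix
    d, U = np.linalg.eigh(H)            # Step 4: spectral decomposition H = U D U^T
    order = np.argsort(d)[::-1]         # list Ritz values in decreasing order
    return d[order], V @ U[:, order]    # Step 5: Ritz values and Ritz vectors
```

When V exactly spans an invariant subspace of G, the Ritz pairs coincide with exact eigenpairs; otherwise they are the standard best estimates available from that subspace.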
Increasing the subspace dimension. In this approach the number of columns in the matrix Y_ℓ is increased to k + q, where q ≥ 1 is a small integer.
(Typical values of q are k or 2k.) The advantage of this modification is that the convergence ratio changes to λ_{k+q+1}/λ_k, which can be much smaller than (3.8). The price paid for this gain is that the storage requirements and the computational effort per iteration are increased. The next acceleration attempts to avoid this penalty.

Power acceleration. In this iteration the updating of Step 2 is changed to Y_{ℓ+1} = G^p V_ℓ, where p ≥ 2 is a small integer. The advantage of this modification is that now the convergence ratio is reduced to (λ_{k+1}/λ_k)^p. Thus one iteration of this kind has the same effect as p iterations of the basic scheme. The main saving is, therefore, a smaller number of orthogonalizations.
In practice p is often restricted to stay smaller than 10. The reason lies in the following difficulty. Assume for a moment that λ_1 ≫ λ_2. In this case G has a unique dominant eigenvector. Then, as p increases, the columns of the matrix G^p V_ℓ tend toward the dominant eigenspace of G, and Y_{ℓ+1} becomes highly rank-deficient. Other helpful modifications include polynomial acceleration (which is often based on Chebyshev polynomials) and locking (a type of deflation), e.g., [3] and [39].
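A sketch of the power-accelerated update, applying the product p times rather than forming G^p (names illustrative):

```python
import numpy as np

def power_accelerated_step(G, V, p=3):
    """One power-accelerated iteration: the new V spans Range(G^p V)."""
    W = V
    for _ in range(p):
        W = G @ W                # apply G p times; never form G^p explicitly
    V, _ = np.linalg.qr(W)       # a single orthogonalization per p products
    return V
```

For moderate p the QR factor stays well conditioned; as noted above, a large p drives all columns of G^p V toward the dominant eigenvector and W becomes numerically rank-deficient.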

Equivalence Relations
In this section we derive equivalence relations between the ALS method and the orthogonal iterations method. To see these relations we make the assumption that G = A^T A, and that both methods start from the same subspace. These equalities lead to the following conclusion.
Theorem 1. Assume that the initial matrices satisfy Range(Q_0) = Range(V_0). Then in exact arithmetic we have Range(Q_ℓ) = Range(V_ℓ), ℓ = 1, 2, ... In other words, the two methods generate the same sequence of subspaces!
Proof. The proof is a direct consequence of (4.4) and (4.5), using induction on ℓ. □
We have seen that the matrix A Q_ℓ solves (2.10). Hence the rank-k approximation of A that corresponds to Q_ℓ has the form A Q_ℓ Q_ℓ^T.
Similarly, the rank-k approximation of A that corresponds to V_ℓ has the form A V_ℓ V_ℓ^T. The next theorem shows that these approximations are equal.
Theorem 2 (Rank-k approximations). Using the former assumptions and notations we have the equality A Q_ℓ Q_ℓ^T = A V_ℓ V_ℓ^T, ℓ = 1, 2, ... In other words, the two methods generate the same sequence of rank-k approximations.
Proof. Let r be some vector in R^n. Then from (4.8) we see that the projection of r on Range(Q_ℓ) equals the projection of r on Range(V_ℓ). That is, Q_ℓ Q_ℓ^T r = V_ℓ V_ℓ^T r, while the last equality implies (4.11). □
Corollary 3 (The decreasing property). Recall that the ALS method has the decreasing property (2.14). Now (4.11) implies that this property is also shared by the orthogonal iteration method.
The next lemma helps to convert the decreasing property into an equivalent increasing property. We have seen in Section 3 that the rate of convergence of the orthogonal iterations method depends on the ratio (3.8). Now the equivalence relations that we have proved suggest that the ALS method behaves in a similar way. To state this result more precisely we need the following notations. Let σ_1 ≥ σ_2 ≥ ... ≥ σ_n denote the singular values of A. Since G = A^T A, the eigenvalues of G satisfy λ_i = σ_i^2, i = 1, ..., n.

The fact that the two methods converge at the same speed raises the question of which iteration is more efficient to use. One advantage of the orthogonal iterations method is that it stores and updates only (estimates for) the right singular vectors of A. This halves the storage requirements and the number of orthogonalizations: the orthogonal iterations method requires one QR factorization per iteration, while ALS requires two. The computation of the left singular vectors and the related low-rank approximation of A is deferred to the end of the iterative process. A further saving can be gained by applying Power acceleration. On the other hand, restarted Krylov methods, e.g., [51], are considerably faster than orthogonal iterations. Consequently the ALS method is expected to be slower than restarted Krylov methods for low-rank approximations, e.g., [1] [2] [33] [34]. This drawback is, perhaps, the reason that the use of ALS has moved to problems in which it is difficult to apply a standard SVD algorithm or a restarted Krylov method.
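The equivalence can be checked numerically with a small self-contained script (an illustration under our assumptions: the modified ALS sweep is run alongside the orthogonal iteration with G = A^T A and a shared starting basis):

```python
import numpy as np

# Numerical check of the equivalence: with G = A^T A and matching starts,
# modified ALS and orthogonal iterations generate the same subspaces and
# the same rank-k approximations.
rng = np.random.default_rng(0)
A = rng.standard_normal((7, 5))
G = A.T @ A
k = 2
V, _ = np.linalg.qr(rng.standard_normal((5, k)))   # shared start: Q_0 = V_0
Q = V.copy()
for _ in range(10):
    Qh, _ = np.linalg.qr(A @ Q)       # modified ALS, step 1
    Q, _ = np.linalg.qr(A.T @ Qh)     # modified ALS, step 2
    V, _ = np.linalg.qr(G @ V)        # one orthogonal iteration
# The bases differ, but the orthogonal projectors agree, so the subspaces
# and the rank-k approximations A Q Q^T = A V V^T coincide.
print(np.allclose(Q @ Q.T, V @ V.T, atol=1e-8))
print(np.allclose(A @ Q @ Q.T, A @ V @ V.T, atol=1e-8))
```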

Concluding Remarks
As noted in the introduction, the relations between ALS and the block-Power method were recently observed in the context of matrix completion algorithms.
However, the related matrix completion algorithms differ substantially from the classic versions discussed in this paper. Indeed, the equivalence between ALS and orthogonal iterations is somewhat surprising, as both methods have been well known for many years, and the basic ALS iteration, which uses back substitutions, is quite different from the orthogonal iteration. The modified version avoids back substitutions, which helps to see the similarity between the two methods.
The equivalence relations bring important insight into the behavior of both methods. One consequence is that the convergence properties of ALS are identical to those of orthogonal iterations. This means that the rate of convergence of the ALS method is determined by the ratios in (4.24), which appears to be a new result. Similarly, the descent property of ALS implies a trace increasing property of the orthogonal iteration method.
The orthogonal iterations method needs less storage and fewer QR factorizations per iteration. In addition, it admits a number of useful accelerations. These advantages suggest that replacing ALS with orthogonal iterations might be helpful in some applications. On the other hand, the ALS method can be modified to handle problems that other methods cannot handle, such as nonnegative matrix factorizations (NMF), matrix completion problems, and tensor decompositions. The ALS iteration that is implemented in these problems is often quite different from the basic iteration (2.7)-(2.8). Yet in some cases it has a similar asymptotic behavior. This happens, for example, in NMF problems when (nearly) all the entries in the converging factors are positive. Another example is the proximal-ALS algorithm in matrix completion; see ([14], p. 134). In such cases the new results provide important insight into the asymptotic behavior of the algorithm.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.