Factor analysis (FA) is a time-honored multivariate analysis procedure for exploring the factors underlying observed variables. In this paper, we propose a new algorithm for the generalized least squares (GLS) estimation in FA. In the algorithm, a majorization step and diagonal steps are alternately iterated until convergence is reached, where Kiers and ten Berge’s (1992) majorization technique is used for the former step, and the latter ones are formulated as minimizing simple quadratic functions of diagonal matrices. This procedure is named a majorizing-diagonal (MD) algorithm. In contrast to the existing gradient approaches, differential calculus is not used and only elmentary matrix computations are required in the MD algorithm. A simuation study shows that the proposed MD algorithm recovers parameters better than the existing algorithms.

Exploratory Factor Analysis Generalized Least Squares Estimation Matrix Computations Majorization
1. Introduction

Using for a observation vector w hose expect ation equal s the zero vector, the f act or analysis (FA) model is ex press ed as

with a p- vari able s × m- f act or s load ing matrix , an latent factor score vector, a error vector, and. The expectations for and are assumed to satisfy

Here, is the matrix of zeros, is the identity matrix, and is the diagonal matrix whose diagonal elements are called unique variance s. The FA model (1) with the assumptions in (2) imply that the covariance matrix is modeled as

  . A main purpose of FA is to estimate the parameter matrices and from the inter-variable sample covariance matrix corresponding to (3). Some authors classify FA as exploratory (EFA) or con firm atory (CFA)  , where is unconstrained in EFA, while some elements of are constrained in CFA. In this paper, we refer to EFA simply as FA.

Three major appro ach es for the para mete r estimation are l east square s (LS), gene ra l ize d l east square s (GLS), and maxim um like li hood (ML) procedure s  . They differ in the definition of the loss f unction to be minimized over . The f unction s for the LS and GLS estimation procedure s are de fine d as and

respectively, while is de fine d as the negative of the log- like li hood derive d under the norm al ity assumption for and in the ML estimation   .

In all estimation procedure s, iterative algorithms are need ed for minimizing loss f unction . They can be rough ly c lass ified into gradient and in e qual ity - base d algorithms. Here , the gradient ones refer to the algorithms u sing Newton and r e late d methods  , in which the part ial diffe r ent iation of with respect to is used for updating it. On the other hand , the term “ in e qual ity - base d algorithms” is not a popular one. We use the term for the algorithms, in which diffe r ent iation is not used and the in e qual ity underlies that which gua rant ee s the weak ly mono tone de cr ease in the loss f unction value with up dating to. Similar dichotomization of minimization methodology is also found in  .

For all of the LS, GLS, and ML estimation , gradient algorithms have been dev e lop ed: t hose with the Fletcher- Powell and Newton-Raphson methods have been pro p ose d for the ML estimation   , while the algorithms u sing the Newton-Raphson and Gauss-Newton method s have been dev e lop ed for GLS   with the gradient algorithms for GLS also used for LS. On the other hand , in e qual ity - base d algorithms have been dev e lop ed for the LS and ML estimation excluding GLS. Such an algorithm for LS is MINRES  in which is part ition ed into the subsets of para mete r s with and the minimization of over each subset for is ite rate d. The in e qual ity - base d one for the ML estimation is the EM algorithm for FA  in which decreases mono toni c ally with the alternate iteration of so- call ed E- and M-steps  . A feat ure of MINRES and the EM algorithm is that only simple matrix computations such as the in version of ma t rice s are require d and t heir compute r - program s are easily form ed. In contrast , the gradient algorithms require more complicate d computation s such as obtain ing or numeri c all y approxi mating the second derivatives of.

As found in the above discussion, an in e qual ity - base d algorithm has not been dev e lop ed for the GLS estimation in which (4) is minimized over. To propose it is the pur pose of this paper . The algorithm to be proposed is also computation ally simple as in the existing inequality-based ones: only elementary matrix computation s are require d such as the in version and sing ular value de com position (SVD) of ma t rice s. A feat ure of the pro p ose d algorithm to be ad dress ed is using major ization in one of step s. The major ization gene r a l ly refer s to a c lass of the techniques in which a major izing f unction is utilized for minimizing a f unction over . Here , satisfies, with being the minimizer of the major izing f unction for its latter argument matrix kept fixed  . It show s that de cr ease s with the up date of into. As described in the next sect ion , the step with a major ization technique and the steps for minimizing the functions of diagonal ma t rice s form the algorithm to be p resent ed. It is thus call ed majorizing-diagonal (MD) algorithm in this paper .

The MD algorithm is not the first one with major ization in FA. In deed , the above EM algorithm  can be regard ed as a major ization procedure with its major izing f unction being the full log like li hood derive d by suppo sing that la te nt f act or s core s in were ob serve d.  has also pro p ose d an FA algorithm with a major ization technique . How ever , in that algorithm, the estimation of a new type   is con side r ed, which are diffe r ent from the LS, GLS, and ML estimation treat ed as the major procedure s in this paper :  is beyond the s cope of this paper .

The remaining parts of this paper are organized as follows: the MD algorithm is detailed in the next section, and it is illustrated with a real data set in Section 3. A simulation study for assessing the algorithm is reported in Section 4, which is followed by discussions.

2. Proposed Algorithm

We propose the MD algorithm for minimizing the GLS loss function (4) over the loadings in and the unique variances in the diagonal matrix . Here, it is supposed that the sample covariance matrix is positive-definite and is of full-column rank, i.e., its rank is with. This supposition and the covariance matrix being modeled as (3) imply that, without loss of generality, we can reparameterize as

where is a matrix satisfying

and is an positive-definite diagonal matrix. By substituting (5) into the GLS loss function (4), it is rewritten as

This function is minimized over, , and subject to (6) and the latter two matrices being diagonal ones, by alternately iterating the majorizing and diagonal steps described in the next subsections.

2.1. Majorization Step

Let us consider minimizing (7) over subject to (6) while and are kept fixed. Summarizing the parts irrelevant to in (7) into, the loss function (7) is rewritten as

Though the optimal that minimizes (8) under (6) is not give n explicit ly, the solution can be obtain ed u sing Kiers and ten Berge’s  major ization technique , w hose earlier version is also found in  . This technique pur pose s to minimize a f unction ex press ed as the form . Compa ring this with (8), we can find (8) to be a special case of the above with being the zero matrix , , , , , and. T here fore , the up date form ula in  (pp. 374-375) can be straight for ward ly used for (8).

According to the formula, the update of by

decreases the value of (8) with. Here, stands for the matrix before the update; and are the column-orthonormal matrices that are obtained from the SVD defined as

with the diagonal matrix including the singular values of the matrix in the left-hand side, and, , and the largest eigenvalues of, , and, respectively.

2.2. Diagonal Steps

In this section, we describe updating each of diagonal matrices and. First, let us consider minimizing the loss function (7) over with keeping and fixed. Since the terms relevant to in the loss function (7) are the same as those relevant to, the expression (8) into which (7) is rewritten is to be noted again. By taking account of the fact that is a diagonal matrix, (8) can be rewritten as

Here, and with denoting the diagonal matrix whose diagonal elements are those of the parenthesized matrix. Further, we can rewrite (11) as with denoting the Frobenius norm. It shows that the function

(11) is minimized for

for fixed and.

Next, we consider minimizing (7) over with and fixed. Summarizing the parts irrelevant to in (7) into and using the fact of being a diagonal matrix, the loss function (7) can be rewritten as

with and. We can find that (13) is minimized for

for fixed and.

2.3. Whole Algorithm

The results in the last two subsections show that the proposed MD algorithm can be listed as follows:

Step 1. Initialize, , and.

Step 2. Update with (9) times.

Step 3. Update with (12).

Step 4. Update with (14).

Step 5. Finish with set to (5) if convergence is reached; otherwise, return to Step 2.

It should be noted in Step 2 that the update of by (9) does not minimize (7) but only decreases its value, which implies that that update can be replicated (times) for further decreasing the value of (7). In this paper, we set.

In Step 1, the initialization is performed using the principal component analysis of sample covariance matrix. That is, the initial and are given by and, respectively, with the diagonal matrix whose diagonal elements are the largest eigenvalues of, and the columns of being the eigenvectors corresponding to. The initial is set to.

In Step 5, we define the convergence as the decrease in the value of (7) or (4) from the previous round being less than.

3. Illustration

In this section, we illustrate the performance of the MD algorithm with a 190- person × 25- item data matrix, which was collect ed by the author and public ly avail able at http://bm.osaka-u.ac.jp/data/big5/. Th is data set con- tains the self - rating s of the person s (university students) for to what ex tent s they are character ize d by the personalities des crib e d by the 25 items . Ac cord ing to a theory in person a l ity psych ology  , the item s can be c lass ified into the five group s show n in the first column of Table 1. The 25 × 25 matrix of the correlation coefficients among those items was obtain ed from the data set.

We carried out the MD algorithm for the cor relation matrix with the number of factors m set to five. Figure 1 shows the change in the value of loss function (4) until the steps in Section 2.3 were iterated ten times and the change after the tenth iteration. There, we can find that the function value decreased monotonically with iteration, which was finally reached to convergence at the 542 nd iteration.

NeuroticWorry0.72−0.14−0.120.220.130.32
Sensitive0.74−0.06−0.040.01−0.080.38
Pessimistic0.67−0.29−0.13−0.030.270.35
Unrest0.600.06−0.15−0.08−0.340.33
Careful0.62−0.09−0.160.100.320.32
ExtrovertSociable−0.160.810.090.090.120.23
Talkative0.040.82−0.05−0.02−0.090.25
Voluntary−0.100.730.170.100.130.29
Cheerful−0.210.800.080.160.000.23
Showy0.080.660.23−0.04−0.050.38
OpenCreative−0.090.030.85−0.04−0.070.23
Progressive−0.190.240.640.080.100.40
Flexible−0.280.310.370.200.030.57
Imaginative0.190.090.470.11−0.340.44
Agree ableMild−0.15−0.120.150.630.020.38
Tenderhearted0.090.200.130.62−0.010.39
Altruistic0.130.010.000.710.150.39
Cooperative0.000.27−0.140.660.080.25
Sympathetic0.110.12−0.040.730.210.32
ConsciousDeliberate0.14−0.040.090.220.600.37
Reliable−0.130.320.080.260.570.36
Diligent−0.040.05−0.070.150.770.29
Systematic0.09−0.02−0.090.010.710.40
Methodical0.210.02−0.190.040.740.31
Change in the GLS loss function value as a function of the number of iteration.

As the result ing load ing matrix has rotational freedom, that is, the post-multiplied by arbitrary orthonormal matrix satisfies (1) and (2), the loading matrix was rotated by the varimax method  . The solution is presented in Table 1. There, bold font is used for the loadings whose absolute values are greater than 0.35. They show that the 25 items are clearly classified into the five groups as predicted by the theory in personality psychology  , which demonstrates that the MD algorithm provided the reasonable solution.

4. Simulation Study

A simulation study was performed in order to assess how well parameter matrices are recovered by the proposed MD algorithm and compare it with the existing algorithms for the GLS estimation in the goodness of the recovery. We first describe the procedure for synthesizing the data to be analyzed, which is followed by results.

An n-observations × p-variables data matrix was synthesized according to the matrix versions of the FA model (1) and the assumptions in (2):

Here , denotes the vector of ones, is an b loc k matrix w hose right b loc k is post -multiplied by to give the error matrix , and is an b loc k matrix including the load ing matrix and the square root s of unique variance s. It should be notice d that each row of, , and cor res pond s to, , and, respect ively, w hose transposed vectors ap pear in (1), and the five equation s in (2) can be summarize d into the two matrix ex press ion s in (16). The data syn thesis procedure follow s the next step s:

Step 1. Draw from, from, and from, with denoting the discrete uniform distribution defined for the integers within the range.

Step 2. Draw each loading in from and each unique variance in from with denoting the uniform distribution over the range.

Step 3. Draw each elements of in (15) from which is followed by centering and post- multiplying it by the matrix that allows the resulting to satisfy (16).

Step 4. Form with (15) and obtain the covariance matrix.

In Step 3 we have used a uniform distribution for, rather than the normal distribution typically used for such a matrix, as a feature of the GLS estimation is that it does not need the normality assumption required in the ML estimation. We replicated the above steps to have 2000 sets of. For them, the MD and the existing algorithms were carried out, where the latter are the two gradient algorithms   , as described in Section 1. We refer to the ones in  and  as the Newton-Raphson (NR) and Gauss-Newton (GN) algorithms, respectively. In the NR one, we obtained the gradient vector in  , Equation (32), by pre-multiplying the vector of first derivatives by the Moore Penrose inverse of the corresponding Hessian matrix. Also in the NR and GN algorithms, we used the same initialization and definition of convergence as in Section 2.3.

Let us express the true simply as and use for the solution given by the NR, GN, or MD algorithm. For assessing the recovery of the loading matrix, the averaged absolute difference (AAD) of the elements in to the corresponding estimates, i.e.,

can be used with denoting the norm. Here, it should be noted that has rotational freedom and must be rotated so that the resulting is optimally matched to. Such a rotated can be obtained by the orthogonal Procrustes method  with a target matrix. The loading matrix in (17) thus stands for the one rotated by the Procrustes method. The recovery of unique variances can also be assessed with the AAD index, where the unique variances are uniquely determined, thus the additional procedure as for is unnecessary. Smaller values of those AAD indices stand for better recovery.

The statistics of AAD values over 2000 data sets are p resent ed in Table 2. T here , the ave r age s show that the recovery by the MD algorithm is the best and that for the NR one is the worst. It should be note d that the 50 and 75 per cen t iles for the NR algorithm are zero , while the maxim um and 99 per cen t ile are very large . That is, the recovery by the NR algorithm was perfect for more than 75 percent of the 2000 data sets, but for a few percent of them, recovery was considerably bad, which increased the averages for the NR one. In contrast , the maxim um AAD of load ing s and unique variance s for the MD algorithm are 0.0041 and 0.0013, respect ively, which are

Statistics for the differences between the true parameter values and their estimated counterparts
AlgorithmNRGNMDNRGNMD
Average0.00260.00200.00050.00300.00070.0000
50 percentile0.00000.00130.00050.00000.00010.0000
75 percentile0.00000.00310.00070.00000.00050.0000
95 percentile0.00500.00610.00110.01030.00350.0000
99 percentile0.08210.00950.00140.08570.01050.0001
Maximum0.21990.02030.00410.14060.03230.0013

s mall enough to be ignore d. That is, the proposed MD algorithm well recovered the true parameter values for all of the 2000 data sets. We can thus conclude that the MD algorithm is superior to the existing ones in the goodness of recovery.

5. Discussion

We proposed the majorizing-diagonal (MD) algorithm for the GLS estimation in FA. In the algorithm, the loading matrix is reparameterized as the product of a column-orthonormal matrix and a diagonal one, and the former one is updated with Kiers and ten Berge’s  major ization technique, while the latter diagonal matrix and another diagonal one including unique variances are updated so that their quadratic functions are minimized. The iteration of those updates decreases monotonically the GLS loss function. The simulation study demonstrated the exact recovery of loadings and unique variances by the MD algorithm and its superiority to the existing gradient algorithms in the recovery.

One of the tasks remaining for the MD algorithm is to study its mathematical properties as have been done for the algorithms in the other estimation procedures. For example, it has been found that the EM algorithm for the ML estimation  can never give an improper solution under a certain condition  , where the improper solution refers to the one including a negative unique variance. Whether such special features are possessed by the MD algorithm is considered to be found by studying the properties of the matrix update formulas in the algorithm.

ReferencesHarman, H.H. (1976) Modern Factor Analysis. 3rd Edition, The University of Chicago Press, Chicago.Mulaik, S.A. (2010) Foundations of Factor Analysis. 2nd Edition, CRC Press, Boca Raton.Yanai, H. and Ichikawa, M. (2007) Factor Analysis. In: Rao, C.R. and Sinharay, S., Eds., Handbook of Statistics, Vol. 26: Psychometrics, Elsevier, Amsterdam, 257-296.Anderson, T.W. and Rubin, H. (1956) Statistical Inference in Factor Analysis. In: Neyman, J., Ed., Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 5, University of California Press, Berkeley, 111-150.Lange, K. (2010) Numerical Analysis for Statisticians. 2nd Edition, Springer, New York.ten Berge, J.M.F. (1993) Least Squares Optimization in Multivariate Analysis. DSWO Press, Leiden.J&ouml;reskog, K.G. (1967) Some Contributions to Maximum Likelihood Factor Analysis. Psychometrika, 32, 443-482. http://dx.doi.org/10.1007/BF02289658Jennrich, R.I. and Robinson, S.M. (1969) A Newton-Raphson Algorithm for Maximum Likelihood Factor Analysis. Psychometrika, 34, 111-123. http://dx.doi.org/10.1007/BF02290176J&ouml;reskog, K.G. and Goldberger, A.S. (1972) Factor Analysis by Generalized Least Squares. Psychometrika, 37, 243-250. http://dx.doi.org/10.1007/BF02306782.Lee, S.Y. (1978) The Gauss-Newton Algorithm for the Weighted Least Squares Factor Analysis. Journal of the Royal Statistical Society: Series D (The Statistician), 27, 103-114. http://dx.doi.org/10.2307/2987906Harman, H.H. and Jones, W.H. (1966) Factor Analysis by Minimizing Residuals (Minres). Psychomerika, 31, 351-369. http://dx.doi.org/10.1007/BF02289468Rubin, D.B. and Thayer, D.T. (1982) EM Algorithms for ML Factor Analysis. Psychometrika, 47, 69-76. http://dx.doi.org/10.1007/BF02293851Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38.Groenen, P.J.F. (1993) The Majorization Approach to Multidimensional Scaling: Some Problems and Extensions. DSWO Press, Leiden.Unkel, S. and Trendafilov, N.T. (2010) A Majorization Algorithm for Simultaneous Parameter Estimation in Robust Exploratory Factor Analysis. Computational Statistics and Data Analysis, 54, 3348-3358. http://dx.doi.org/10.1016/j.csda.2010.02.003Unkel, S. and Trendafilov, N.T. (2010) Simultaneous Parameter Estimation in Exploratory Factor Analysis: An Expository Review. International Statistical Review, 78, 363-382. http://dx.doi.org/10.1111/j.1751-5823.2010.00120.xAdachi, K. (2012) Some Contributions to Data-Fitting Factor Analysis with Empirical Comparisons to Covariance-Fitting Factor Analysis. Journal of the Japanese Society of Computational Statistics, 25, 25-38. http://dx.doi.org/10.5183/jjscs.1106001_197Kiers, H.A.L. and ten Berge, J.M.F. (1992) Minimization of a Class of Matrix Trace Functions by Means of Refined Majorization. Psychometrika, 57, 371-382. http://dx.doi.org/10.1007/BF02295425Kiers, H.A.L. (1990) Majorization as a Tool for Optimizing a Class of Matrix Functions. Psychometrika, 55, 417-428. http://dx.doi.org/10.1007/BF02294758Costa, P.T. and McCrae, R.R. (1992) NEO PI-R Professional Manual: Revised NEO Personality Inventory (NEO PI-R) and NEO Five-Factor Inventory (NEO-FFI). Psychological Assessment Resources, Odessa, FL.Kaiser, H.F. (1958) The Varimax Criterion for Analytic Rotation in Factor Analysis. Psychometrika, 23, 187-200. http://dx.doi.org/10.1007/BF02289233Gower, J.C. and Dijksterhuis, G.B. (2004) Procrustes Problems. Oxford University Press, Oxford. http://dx.doi.org/10.1093/acprof:oso/9780198510581.001.0001Adachi, K. (2013) Factor Analysis with EM Algorithm Never Gives Improper Solutions When Sample Covariance and Initial Parameter Matrices Are Proper. Psychometrika, 78, 380-394. http://dx.doi.org/10.1007/s11336-012-9299-8