Computational Identification of Confirmatory Factor Analysis Model with Simplimax Procedures

Confirmatory factor analysis (CFA) refers to the FA procedure with some loadings constrained to be zeros. A difficulty in CFA is that the constraint must be specified by users in a subjective manner. For dealing with this difficulty, we propose a computational method, in which the best CFA solution is obtained optimally without relying on users’ judgements. The method consists of the procedures at lower (L) and higher (H) levels: at the L level, for a fixed number of zero loadings, it is determined both which loadings are to be zeros and what values are to be given to the remaining nonzero parameters; at the H level, the procedure at the L level is performed over the different numbers of zero loadings, to provide the best solution. In the L level procedure, Kiers’ (1994) simplimax rotation fulfills a key role: the CFA solution under the constraint computationally specified by that rotation is used for initializing the parameters of a new FA procedure called simplimax FA. The task at the H level can be easily performed using information criteria. The usefulness of the proposed method is demonstrated numerically.


Introduction
In factor analysis (FA), the variation of p observed variables is assumed to be explained by m common factors and p unique factors, with m < p and the two types of factors mutually uncorrelated. The m common factors serve to explain the variations of all p variables. On the other hand, each of the p unique factors has a one-to-one correspondence to a variable: a unique factor explains specifically the variation of the corresponding variable that remains unaccounted for by the common factors [1] [2] [3].
The parameters to be estimated in FA are a factor loading matrix Λ = (λij) (p × m), a unique variance matrix Ψ (p × p), and a factor correlation matrix Φ = (φjk) (m × m). Here, the loadings in Λ stand for how the variables load on the common factors, Ψ is the diagonal matrix whose ith diagonal element ψii expresses the variance of the ith unique factor, and Φ contains the correlation coefficients among the m common factors. Upon making certain distributional assumptions for the factors, the covariance matrix Σ among the p observed variables is modeled as

Σ = ΛΦΛ′ + Ψ (1)

(e.g., Adachi, 2019 [1]; Mulaik, 2010 [2]). Thus, a loss function f(Λ, Ψ, Φ | S) can be defined, which stands for the discrepancy between (1) and its sample counterpart S (p × p): FA can be formulated as minimizing f(Λ, Ψ, Φ | S) over Λ, Ψ, and Φ for a given S.
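To make the covariance structure (1) concrete, it can be evaluated in a few lines of code. The loading, correlation, and unique-variance values below are hypothetical toy numbers chosen for illustration, not values from this paper:

```python
import numpy as np

def fa_covariance(Lam, Psi, Phi):
    """Model covariance Sigma = Lam Phi Lam' + Psi of Equation (1)."""
    return Lam @ Phi @ Lam.T + Psi

# Toy example (hypothetical values): p = 4 variables, m = 2 factors.
Lam = np.array([[0.8, 0.0],
                [0.7, 0.0],
                [0.0, 0.9],
                [0.0, 0.6]])
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])
Psi = np.diag([0.36, 0.51, 0.19, 0.64])

Sigma = fa_covariance(Lam, Psi, Phi)
# Sigma is symmetric, and here its diagonal equals 1 because the toy
# values satisfy lam_i' Phi lam_i + psi_ii = 1 for each variable i.
```

With the toy numbers above, the implied covariance between variables loading on different factors passes through the factor correlation, e.g., Sigma[0, 2] = 0.8 × 0.3 × 0.9 = 0.216.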
FA can be classified into two types: confirmatory (CFA) and exploratory (EFA) [2]. In CFA, some loadings in Λ are constrained to take zero values, while no constraints are imposed on the loadings in EFA. In this paper, we focus on CFA. An example of the CFA model with a particular constraint is illustrated in Figure 1. There, a constrained loading matrix Λ is shown on the left, and the corresponding CFA model is depicted on the right as a diagram, in which only pairs of variables and factors with unconstrained loadings are linked by paths. In Figure 1, the binary matrix B is also presented, which specifies the constraints as

Λ = B • Λ, with B = (bij) and bij ∈ {0, 1}. (2)

Here, • denotes the element-wise Hadamard product. It should be kept in mind that specifying a CFA model amounts to selecting a particular link matrix B. Thus, CFA can be formally expressed as minimizing f(Λ, Ψ, Φ | S) subject to Λ = B • Λ (3). A problem in CFA (3) is that the link matrix B = (bij) defined in (2) must be selected by users. That is, it must be decided in a subjective manner which elements in B are to be zeros/ones, in other words, which pairs of variables and factors are linked as in Figure 1. Whether the CFA model with a particular constraint is acceptable is checked afterwards by referring to the goodness-of-fit of the solution [4] [5] [6]. However, even if the model is found acceptable, better CFA models with other link matrices B may exist. This implies that all possible B in the set ℬ must be considered for finding the best B. However, this is unfeasible unless pm is very small, as the number of all possible B is enormous. That number can be calculated as 2^pm, since each of the pm elements in B takes zero or one as in (2): for example, for p = 12 and m = 3 one finds 2^pm ≅ 6.87 × 10^10. In short, the problems in CFA can be summarized as follows: [P1] The link matrix B (specifying a CFA model) must be selected subjectively by users.
[P2] An enormous number of possible matrices B in B  must be considered for finding the best B.
To the best of our knowledge, the problems [P1] and [P2] in CFA have not been considered in the existing literature. In order to deal with those problems, we propose an FA procedure for computationally and optimally identifying a suitable CFA model. Here, model identification includes estimating the parameter values. The outline of the proposed procedure is described in the next section. Then, we detail the procedure in Section 3, report its assessment in a simulation study in Section 4, give numerical examples in Section 5, and conclude this paper in Section 6.

Outline of the Proposed Procedure
First, we outline our approach to CFA model identification under the condition that the number of zero loadings is fixed to a particular integer in Section 2.1. Then, in Section 2.2, the approach is extended to the case in which the number of zero loadings is not fixed. We then summarize the prospects for the following sections.

Model Identification for a Specified Number of Nonzero Loadings (Cardinality)
For dealing with the difficulties [P1] and [P2] in CFA, we can consider the two procedures introduced in the next paragraphs, on the condition that

card(Λ) = card(B) = c, (4)

that is, the number of nonzero values in Λ and B equals a specified integer c. Here, card( ) denotes the cardinality, i.e., the number of nonzero elements, of the matrix between parentheses. Clearly, pm − c equals the number of zeros in B or Λ.
The first procedure considered can be formulated as

min f(Λ, Ψ, Φ | S) over Λ, Ψ, and Φ subject to Λ = B • Λ, with B estimated beforehand subject to (4). (5)

This minimization amounts to performing CFA under the constraint indicated by the link matrix B, with B estimated by another method in advance. Thus, problem [P1] is overcome. Further, [P2] is also dealt with, provided that the value c in (4) and the estimated B are suitable.
The above procedure consists of two stages: first B is estimated, then CFA is performed, see (5). In contrast, the second procedure considered is formulated in a single stage as

min f(Λ, Ψ, Φ | S) over B, Λ, Ψ, and Φ subject to Λ = B • Λ and card(B) = c. (6)

Here, B has been added to the arguments of the minimization: the link matrix B, which indicates the pairs of variables and factors to be linked, is estimated jointly with the other parameters Λ, Ψ, and Φ. Between (5) and (6), we can find the following difference: in minimizing the loss function, the B value is kept fixed in (5), but allowed to change in (6). This difference implies that the resulting loss function value in (6) cannot exceed that in (5):

min of (6) ≤ min of (5). (7)

This inequality shows that (6) can provide a better solution than (5). We will propose an iterative algorithm for (6), which is started from the optimum of (5). As empirically shown later, in almost all cases the solutions of (5) and (6) are equivalent: the final solutions could be obtained by (5) alone, without performing the iterative algorithm for (6). However, it is worthwhile to perform the latter step, as (6) can provide a better solution in a few cases.
In this paper, we use the maximum likelihood (ML) method for estimating the parameters in (5) and (6). This implies that the loss function to be minimized is given as the negative of the log likelihood. Up to an additive constant, it is explicitly expressed as

f(Λ, Ψ, Φ | S) = log|ΛΦΛ′ + Ψ| + tr{S(ΛΦΛ′ + Ψ)⁻¹}, (8)

following from the normality assumptions for the factors [7]. For minimizing (8), we use Rubin & Thayer's [8] EM algorithm for FA, whose properties are discussed in Adachi [9]. This algorithm is detailed in Appendix A1.
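The ML discrepancy can be sketched directly from this expression. The function below is a minimal sketch of (8), up to the additive constant that does not depend on the parameters; the small numerical example uses made-up values:

```python
import numpy as np

def ml_loss(S, Lam, Psi, Phi):
    """ML discrepancy of (8), up to an additive constant:
        f = log|Sigma| + tr(S Sigma^{-1}),  Sigma = Lam Phi Lam' + Psi.
    """
    Sigma = Lam @ Phi @ Lam.T + Psi
    sign, logdet = np.linalg.slogdet(Sigma)
    return logdet + np.trace(S @ np.linalg.inv(Sigma))

# Sanity check with hypothetical values: if S equals the model-implied
# Sigma, the loss reduces to log|Sigma| + p, since tr(S Sigma^{-1}) = p.
Lam = np.array([[0.9], [0.8], [0.7]])
Phi = np.eye(1)
Psi = np.diag([0.19, 0.36, 0.51])
Sigma = Lam @ Phi @ Lam.T + Psi
```

This lower bound at S = Sigma mirrors the role of the EFA loss as a lower limit for the CFA loss discussed in Section 3.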

Selection of the Best Cardinality
We should note that the above approach is conditional on c in (4). Thus, it remains to select the best value for c. This can be attained by the following procedure:

Select the value c with the lowest IC(c) among c = c_min, …, c_max. (9)

Here, c_min/c_max expresses the reasonable minimum/maximum of c, and IC(c) denotes the value of an information criterion statistic [10], with the statistic being a function of c. Information criteria are statistics that can be defined in ML-based procedures and show smaller values for better solutions. Explicit expressions for IC(c) are introduced later. We can rewrite (9) as performing the procedure of (6) following (5) for each c in the range defined by

c_min = p and c_max = pm − m(m − 1)/2. (10)

The c_min value follows because solutions with c smaller than p would surely have one or more zero rows of loadings. On the other hand, the c_max value takes into account that m(m − 1)/2 elements in Λ can be set to zeros without a change in the value of loss function (8). Now, let us discuss how our proposed procedure is related to the set ℬ containing all possible B, using ℬ_c for the subset of ℬ that contains the matrices B with c nonzeros. The number of B in ℬ_c is the binomial coefficient

N_c = C(pm, c), (11)

i.e., the number of combinations of c elements being ones among all pm elements in B.
Thus, the number N_B of all B contained in ℬ that need to be considered follows from (10) and (11), as given in (12). For example, when p = 12 and m = 3, the value in (12) is approximately 1.25 × 10⁹, which is enormous (though less than the 2^pm ≅ 6.87 × 10¹⁰ presented in the last section, where (10) was not yet considered). However, it is not required to assess all N_B matrices B in ℬ in our proposed procedure, as performing (6) following (5) allows us to find the optimal B.

The remaining parts of this paper are organized as follows: in Sections 3.1 and 3.2, we detail (5) and (6) in turn. There, it is described that Kiers' [11] simplimax rotation is used for the optimal estimation of B subject to (4) in (5), and that the idea of this rotation also fulfills a key role in (6). Thus, the procedures for (5) and (6) are called simplimax-based CFA (SbCFA) and simplimax FA (SimpFA), respectively. In the section after treating SbCFA (5) and SimpFA (6), we detail the whole process of the proposed procedure, i.e., SimpFA (6) following SbCFA (5) with the cardinality selected by (9). The proposed procedure is assessed in a simulation study and illustrated with real data examples in the sections before the one concluding this paper.
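These counts are straightforward to evaluate. The sketch below computes the bound 2^pm and a per-cardinality count C(pm, c) from (11) for p = 12 and m = 3; note that C(36, 12) alone is about 1.25 × 10⁹, the order of magnitude cited above:

```python
from math import comb

p, m = 12, 3
pm = p * m
c_min, c_max = p, pm - m * (m - 1) // 2   # (10): here 12 and 33

print(2 ** pm)            # all binary B: 2^36 = 68719476736
print(comb(pm, c_min))    # B with exactly c_min nonzeros: 1251677700
print(c_max - c_min + 1)  # runs of (6) following (5) needed: 22
```

The last number is the point of the proposal: 22 runs replace an assessment of billions of candidate link matrices.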

Simplimax-Based Confirmatory Factor Analysis
In this section, we detail the procedure formulated as (5) with the loss function (8). A key point is that we use exploratory FA (EFA) followed by Kiers' [11] simplimax rotation for estimating B optimally subject to (4) in (5). That is, in the procedure, firstly EFA is performed for a data set, secondly simplimax is applied to the EFA solution for estimating the link matrix B, and finally the resulting B is used for minimizing (8) subject to Λ = B • Λ. These three stages are described in the following subsections. The last stage can be regarded as confirmatory FA (CFA) on the basis of the B given by simplimax. We thus call the procedure in this section simplimax-based CFA (SbCFA).

Exploratory Factor Analysis
Let T denote any m × m nonsingular matrix satisfying

diag(TT′) = I_m, (13)

where diag(TT′) is the diagonal matrix whose diagonal elements are those of TT′. EFA has rotational indeterminacy, as shown by

f(Λ, Ψ, I_m | S) = f(ΛT⁻¹, Ψ, TT′ | S), (14)

with Φ = TT′ and Λ_T = ΛT⁻¹. Here, the latter matrix can also be regarded as a loading matrix. Thus, even if Φ is fixed to I_m by choosing T that meets TT′ = I_m, the value of (8) remains unchanged when Λ is replaced by ΛT⁻¹. By taking account of this property, (8) with Φ = I_m is minimized over Λ and Ψ in EFA. This is attained with the EM algorithm described in Appendix A1. We use Λ_E for the Λ resulting from EFA.
The indeterminacy (14) implies that Λ_T = Λ_E T⁻¹ and Φ = TT′ can also serve as EFA solutions of the factor loading and correlation matrices, respectively. This fact is often exploited by obtaining a T that allows Λ_T = Λ_E T⁻¹ to be interpretable. The procedures for obtaining T are generally referred to as factor rotation.

Simplimax Rotation
This subsection concerns the factor rotation intermediating between the first EFA stage and the final CFA stage. Here, the loading matrix is unconstrained in EFA, but constrained to meet Λ = B • Λ in CFA as in (3). The latter constrainedness leads to

min f(Λ, Ψ, Φ | S) ≤ min f(B • Λ, Ψ, Φ | S), (15)

where the right-hand side stands for the loss function in (3) with the CFA constraint Λ = B • Λ substituted. Inequality (15) implies that the minimal EFA loss value gives the lower limit of the loss to be minimized in CFA (3). This suggests that the best attainable CFA solution is one whose loss is close to that limit. Now, if T can be chosen in the factor rotation such that Λ_T = Λ_E T⁻¹ is similar to a matrix that can be written as B • Λ for particular matrices B and Λ, then we can expect that such B and Λ will be good candidates for giving a CFA solution with a fit close to that of EFA. This can be attained by the simplimax rotation [11], which is formulated as minimizing the simplimax function

‖Λ_E T⁻¹ − B • Λ‖² (16)

over B, Λ, and T subject to (4) and (13). Thus, this rotation serves as a suitable bridge between the first and final stages.
For the constrained minimization of (16), two steps are iterated alternately. In one step, the simplimax function (16) is minimized over T under (13) for a given B • Λ, using Browne's [12] algorithm. In the other step, (16) is minimized over B = (bij) and Λ under (4) for a given Λ_E T⁻¹. As this problem is also related to the procedure in the next section, we express the minimization in a generalized form as

min ‖H − B • Λ‖² over B and Λ subject to card(B) = c, (17)

for a given p × m matrix H = (hij), with H = Λ_E T⁻¹ in the present context. As explained in Appendix A3, (17) is attained by

bij λij = hij if |hij| ≥ h[c], and bij λij = 0 otherwise, (18)

with h[c] denoting the cth largest of the absolute values of all elements in H. The steps in the simplimax rotation algorithm are listed in Appendix A2.
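The closed-form update (18) is simple to implement: keep the c largest-magnitude entries of H and zero out the rest. A minimal sketch (assuming no ties at the threshold):

```python
import numpy as np

def simplimax_update(H, c):
    """Minimize ||H - B*Lam||^2 over binary B and Lam with card(B) = c:
    retain the c largest-magnitude elements of H and zero out the rest,
    which is the closed-form update in (18). Assumes no ties at the
    threshold value."""
    thresh = np.sort(np.abs(H).ravel())[::-1][c - 1]  # c-th largest |h_ij|
    B = (np.abs(H) >= thresh).astype(int)
    return B, B * H  # the retained entries of Lam equal those of H

# Toy rotated loading matrix (hypothetical values):
H = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [-0.7, 0.05]])
B, BLam = simplimax_update(H, 3)
# B marks the 3 largest |h_ij|: positions (0,0), (1,1), (2,0).
```

The same update reappears in SimpFA's majorization step with a different H, which is why (17) is stated in generalized form.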

Confirmatory Factor Analysis Following Simplimax Rotation
The final stage simply performs CFA using the B obtained by the simplimax rotation. Thus, SbCFA is formulated by making (5) concrete as minimizing the loss (8) subject to Λ = B • Λ, with B fixed at its simplimax estimate (5′). The minimization in (5) or (5′) is attained with the EM algorithm described in Appendix A1.

Simplimax Factor Analysis
We will now describe simplimax FA (SimpFA). This is the procedure minimizing (6), using the solution of SbCFA as a starting configuration. Specifically, we minimize the negative of the log likelihood (8), with B • Λ substituted into Λ, over B, Λ, Ψ, and Φ subject to card(B) = c in (4). We call this procedure simplimax FA (SimpFA), as it is a new FA procedure in which the minimization of the simplimax function (17) fulfills a key role, as will be seen in the next paragraph.
The EM algorithm described in Appendix A1 can also be used for SimpFA. The algorithm for SimpFA differs from those for the other FA procedures in that it includes a step for minimizing (8) over both B and Λ with Ψ and Φ fixed. An innovative feature of the algorithm is the use of Kiers' [13] [14] majorization method for dealing with the difficulties in treating (8) as a function of B and Λ. In that method, an auxiliary function that majorizes (8) is minimized. For this minimization, the simplimax function in (17) with H = W fulfills a key role, where W is computed from the current parameter values, with Λ_C the current Λ value (before update); see Appendix A4 for a derivation.

Multiple Runs of SimpFA Following SbCFA
Here, we describe the procedure of SimpFA following SbCFA, in which SbCFA provides the initial values of the parameters for SimpFA, which provides the final solution. We take a multiple-run approach for SimpFA following SbCFA, in order to reduce the possibility of selecting a local minimizer as the optimal solution: the algorithm of SbCFA followed by SimpFA is run multiple times by starting SbCFA with mutually different initial values, and the best solution is selected among the multiple SimpFA solutions. The procedure is listed as follows:

Stage 1. For S, perform EFA to provide Λ_E and set l = 1. Here, the initial value T_l for T in Stage 2.1 is chosen as follows: T_l is obtained by the varimax rotation of Λ_E if l = 1; otherwise, T_l is formed from a matrix T_0 with randomly chosen elements, rescaled so that (13) is satisfied, and substituted into T. In Figure 2, the above stages are illustrated graphically, so that they can be captured visually.
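The multiple-run scheme above is a best-of-n-starts loop. The sketch below shows the pattern with a placeholder `run_once` standing in for one pass of simplimax, SbCFA, and SimpFA; all function names here are hypothetical illustrations, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_T(m, rng):
    """Random nonsingular start T with rows rescaled so that
    diag(T T') = I_m, as constraint (13) requires (a sketch)."""
    T = rng.standard_normal((m, m))
    return T / np.linalg.norm(T, axis=1, keepdims=True)

def multi_start(run_once, n_runs, m, rng):
    """Best-of-n-starts: keep the lowest-loss solution over n_runs
    random starts; `run_once` stands in for simplimax + SbCFA + SimpFA
    applied from one starting T."""
    return min((run_once(random_T(m, rng)) for _ in range(n_runs)),
               key=lambda sol: sol["loss"])

# Toy stand-in: the "loss" is just |T[0, 0]|, recorded for inspection.
losses = []
def run_once(T):
    sol = {"loss": float(abs(T[0, 0])), "T": T}
    losses.append(sol["loss"])
    return sol

best = multi_start(run_once, n_runs=10, m=3, rng=rng)
```

In the paper's procedure, 100 starts are used and the first start is the varimax rotation of Λ_E rather than a random T.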

Selection of Cardinality by Information Criteria
The final solution resulting from the above stages depends on the cardinality c. As IC(c) in (20), we consider using either Akaike's [15] information criterion (AIC) or Schwarz's [16] Bayesian information criterion (BIC), which are particularly popular information criteria. With f̂(c) the attained minimum of the negative log likelihood (8) for cardinality c, AIC and BIC can be calculated as

AIC(c) = 2f̂(c) + 2κ(c), (21)
BIC(c) = 2f̂(c) + κ(c) log n, (22)

respectively, for a particular value of c. Here, n is the number of observations, and κ(c) = c + α is the number of parameters, with α the number of unique variances plus the number of inter-factor correlations. Thus, (21) or (22) is substituted into IC(c) in (20). Whether we should use (21) or (22) is assessed in the next section.
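Assuming the standard definitions AIC = −2 log L̂ + 2κ and BIC = −2 log L̂ + κ log n, with the minimized negative log likelihood playing the role of −log L̂, the selection in (20) reduces to a one-line minimization. The losses below are made-up toy numbers:

```python
from math import log

def aic(neg_loglik, kappa):
    """AIC = 2 * (negative log likelihood) + 2 * (free parameters)."""
    return 2.0 * neg_loglik + 2.0 * kappa

def bic(neg_loglik, kappa, n):
    """BIC = 2 * (negative log likelihood) + kappa * log(n)."""
    return 2.0 * neg_loglik + kappa * log(n)

def select_c(f_hat, n, p, m):
    """Pick the cardinality c with the lowest BIC(c), as in (20).
    `f_hat` maps c -> minimized negative log likelihood; kappa(c) = c + alpha,
    with alpha = p unique variances + m(m-1)/2 inter-factor correlations."""
    alpha = p + m * (m - 1) // 2
    return min(f_hat, key=lambda c: bic(f_hat[c], c + alpha, n))

# Hypothetical minimized losses for three candidate cardinalities:
f_hat = {20: 100.0, 21: 98.0, 22: 97.5}
best_c = select_c(f_hat, n=300, p=12, m=3)
```

With these toy numbers, AIC would pick c = 21 while BIC picks c = 20, illustrating BIC's heavier penalty on extra nonzero loadings for moderate n.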

Simulation Study
We performed a simulation study to assess the procedure proposed in the last section, with respect to: (1) how often the c value in (4) is selected correctly by AIC and BIC; (2) how similar the SbCFA and SimpFA solutions are; and (3) how well parameter values are recovered. The procedures for synthesizing and analyzing data in the study are described in the first subsection; the results for (1), (2), and (3) are then reported in the following ones.

Data Synthesis and Analysis
With p = 12 and m = 3, we set the true {Λ, Ψ, Φ} as in Table 1 and sampled n = 300 rows of an n × p data matrix X from the p-variate normal distribution whose mean vector is 0_p and covariance matrix is Σ = ΛΦΛ′ + Ψ, see (1). The resulting X provided the sample covariance matrix S = n⁻¹X′X. We replicated this procedure 200 times to obtain 200 matrices S.
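The synthesis step can be sketched as follows. The small "truth" below is hypothetical and used only to keep the example compact (the paper's Table 1 uses p = 12 and m = 3):

```python
import numpy as np

def synthesize_S(Lam, Psi, Phi, n, rng):
    """Draw n rows from N(0, Sigma) with Sigma = Lam Phi Lam' + Psi,
    and return the sample covariance S = n^{-1} X'X, as in the study."""
    Sigma = Lam @ Phi @ Lam.T + Psi
    X = rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=n)
    return (X.T @ X) / n

rng = np.random.default_rng(2021)
# Hypothetical small truth with unit variable variances:
Lam = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.9], [0.0, 0.6]])
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
Psi = np.diag([0.36, 0.51, 0.19, 0.64])
S = synthesize_S(Lam, Psi, Phi, n=300, rng=rng)
# For large n, S approximates the population Sigma.
```

Repeating this call 200 times with fresh draws reproduces the structure of the study's replications.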
For each S, we carried out the procedure of SimpFA following SbCFA with the selection of the best c by (20), using both AIC and BIC. Let c_true denote card(Λ) for the true loading matrix; on the other hand, c* expresses the card(Λ) selected by the proposed procedure (20). We obtained the deviation c* − c_true for each of the 200 data sets. The averages (standard deviations) of these deviations over all data sets are 2.43 (1.40) when using AIC and 0.34 (0.62) when using BIC. This result shows that AIC tends to overestimate card(Λ). Moreover, the absolute deviation |c* − c_true| for BIC was smaller than that for AIC for 184 of the 200 data sets. These results show that BIC is substantially better than AIC. Accordingly, we take only the BIC-based solutions into consideration from here on.

Equivalence of SbCFA and SimpFA Solutions
It was found that 98 percent of the 200 SimpFA solutions were equivalent to the SbCFA ones used for initializing the SimpFA parameters: no iteration in the SimpFA algorithm was required. In the remaining two percent of the runs, the SbCFA and SimpFA solutions were almost equivalent, with the average of the differences between the two solutions over the 200 data sets being 0.001. Here, the difference is defined as

Real Data Demonstration
In order to demonstrate how useful our proposed procedure is, we apply it to two data sets that have already been analyzed by CFA with zero constraints selected by users.The solutions of the latter user-based CFA are compared with those for our proposed procedure.
The first data set is Carlson and Mulaik's [17] personality trait data matrix of n = 280 (participants) × p = 15 (personality traits). Its correlation matrix (Mulaik, 2010, p. 198) has been analyzed by Mulaik [2] using CFA with constraints he selected; his solution is shown in Mulaik (2010, p. 437) and also in Table 3, together with the solution of the proposed procedure. Here, the latter solution is the same as the SbCFA one before SimpFA: no iteration was required in the SimpFA algorithm. Comparing the BIC values in Table 3, we find that the solution of the proposed procedure is better than Mulaik's [2].
The other data set is Kojima's housing preference data with n = 1120 (participants) and p = 13 (features). It describes to what degree the participants wish to live in houses characterized by the variables. The correlation matrix for this data set is presented in Table 4, as the original source is written in Japanese and not easily available. This matrix has been analyzed by Kojima using CFA with constraints he selected; his solution is shown in Table 5, together with the solution of the proposed procedure. Here, this solution differs from its SbCFA counterpart: iteration was required in the SimpFA algorithm. In Table 5, the BIC values show that the solution of the proposed procedure is better.
In both of the above examples, the proposed procedure outperformed the user-specified CFA in the BIC values, yet its solutions are similar to the counterparts of CFA with users' selected constraints (Table 3 and Table 5). This similarity shows that the selected constraints were reasonable. However, the proposed procedure is fully computational and does not require any effort from users for considering constraints. This property is helpful whenever there is uncertainty as to which loadings should be constrained to zero.

Conclusions
A problem in confirmatory factor analysis (CFA) is that users must select the link matrix B, which specifies which loadings are constrained to zero, in a subjective manner. For dealing with this problem, we proposed a fully computational procedure, SimpFA following SbCFA with the cardinality selected by BIC, and demonstrated numerically that it identifies suitable CFA models without relying on users' judgements.

A1. EM Algorithm

Here, C = SA and Q = A′SA + U, with A and U computed as

A = (ΛΦΛ′ + Ψ)⁻¹ΛΦ and U = Φ − ΦΛ′(ΛΦΛ′ + Ψ)⁻¹ΛΦ,

using the current Λ, Ψ, and Φ values. In the EM algorithm, the function (A1) rather than (8) is minimized iteratively. This minimization is known to allow (8) to be minimized (Rubin & Thayer, 1982). This fact also holds true when Λ is constrained as Λ = B • Λ. Thus, the minimization of (8) in EFA, SbCFA (5) or (5′), and SimpFA (6) is attained with the EM algorithm.
The EM algorithm for EFA, SbCFA, and SimpFA can be summarized as follows:

Step 1. Initialize Λ, Ψ, and Φ, with B also initialized in SimpFA.
Step 2. Compute A, U, C = SA, and Q = A′SA + U from the current parameter values.
Step 3. Update Ψ.
Step 4. Update Λ, with B also updated in SimpFA.
Step 5. Update Φ.
Step 6. Transform Φ and Λ so that Φ is a correlation matrix and finish if convergence is reached; otherwise, go back to Step 2.
Here, the minimization in Step 3 is attained when the ith diagonal element of Ψ is updated as ψ_ii = s_ii − 2λ_i′c_i + λ_i′Qλ_i, with λ_i′ the ith row of Λ, c_i′ that of C, and s_ii the (i, i) element of S. The task of Step 5 is attained simply by updating Φ as Φ = Q. This is because Φ may be regarded simply as a covariance matrix during the iterations and finally be transformed into a correlation matrix. Thus, if convergence is reached in Step 6, we transform Φ and Λ as Φ_R = D⁻¹ΦD⁻¹ and Λ_R = ΛD, with D = diag(Φ)^{1/2}, and regard the resulting Φ_R and Λ_R as the solutions of Φ and Λ, respectively. Here, we should notice that the transformation does not change the value of (8), since Λ_RΦ_RΛ_R′ = ΛΦΛ′. In SbCFA, the initial Λ and Φ are set to B • Λ and TT′, respectively, obtained by the preceding simplimax rotation. In SimpFA, Λ, Ψ, and Φ are initialized at their SbCFA solution and B at its simplimax rotation solution.
The minimization in Step 4 is attained by updating Λ as Λ = CQ⁻¹ in EFA, and by the corresponding row-wise update restricted to the nonzero positions indicated by B in SbCFA; Step 4 in SimpFA is detailed in Appendix A4.
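For the unconstrained (EFA) case, one iteration built from the quantities A, U, C = SA, and Q = A′SA + U can be sketched as below. This is a sketch of the standard Rubin-Thayer updates, not the paper's exact code; Φ is carried as a covariance matrix during the iterations, as noted above:

```python
import numpy as np

def neg_loglik(S, Lam, Psi, Phi):
    """Loss (8): log|Sigma| + tr(S Sigma^{-1}), up to a constant."""
    Sigma = Lam @ Phi @ Lam.T + Psi
    return np.linalg.slogdet(Sigma)[1] + np.trace(S @ np.linalg.inv(Sigma))

def em_step(S, Lam, Psi, Phi):
    """One EM iteration for unconstrained FA (standard Rubin-Thayer
    updates, a sketch; the constrained Step 4 used in SbCFA/SimpFA
    would replace the Lam update below)."""
    Sigma = Lam @ Phi @ Lam.T + Psi
    A = np.linalg.solve(Sigma, Lam @ Phi)        # A = Sigma^{-1} Lam Phi
    U = Phi - Phi @ Lam.T @ A                    # conditional factor covariance
    C, Q = S @ A, A.T @ S @ A + U                # Step 2 quantities
    Lam = np.linalg.solve(Q.T, C.T).T            # Step 4 (EFA): Lam = C Q^{-1}
    Psi = np.diag(np.diag(S - C @ Lam.T))        # Step 3: unique variances
    return Lam, Psi, Q                           # Step 5: Phi = Q

# Demo with a hypothetical truth: losses are non-increasing (EM monotonicity).
true_Lam = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.9], [0.0, 0.6]])
S = true_Lam @ true_Lam.T + np.diag([0.36, 0.51, 0.19, 0.64])
Lam, Psi, Phi = true_Lam + 0.05, np.diag([0.5] * 4), np.eye(2)
losses = []
for _ in range(5):
    losses.append(neg_loglik(S, Lam, Psi, Phi))
    Lam, Psi, Phi = em_step(S, Lam, Psi, Phi)
```

At convergence, Φ and Λ would be rescaled as in Step 6 so that Φ becomes a correlation matrix.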

A2. Algorithm for Simplimax Rotation
The algorithm for the simplimax rotation alternates updates of B and Λ with updates of T. How T is initialized is described in the section on the whole process of the proposed procedure. Convergence is defined as the decrease in (16) from the previous round being less than 10⁻⁶ in this paper.

The zero elements in Λ on the left correspond to the absent links in the right diagram. The elements in B = (b_ij) (p × m) are defined as in (2); in other words, b_ij = 1 if variable i is linked to factor j; otherwise, b_ij = 0. In this sense, we call B a link matrix. Any CFA constraint can be expressed as Λ = B • Λ.

Figure 1. Illustration of constrained loadings and the corresponding CFA model.
our proposed procedure can find the optimal B among all N_B matrices B in ℬ by c_max − c_min + 1 runs of (6) following (5), rather than by assessing all B. The example of c_max − c_min + 1 = 22 for p = 12 and m = 3 demonstrates how easily we can arrive at the optimal B.

Stage 2. For each of l = 1, …, 100, perform the following sub-stages:
Stage 2.1. Initialize T to T_l and perform simplimax. Express the resulting B as B_l.
Stage 2.2. For S, perform CFA using B_l as B. Express the resulting Λ, Ψ, and Φ as Λ_l, Ψ_l, and Φ_l, respectively.
Stage 2.3. For S, perform SimpFA with Λ, Ψ, and Φ initialized at Λ_l, Ψ_l, and Φ_l, respectively, and express the resulting Λ, Ψ, and Φ as the lth candidate solution.

The solution resulting in Stage 3 (the last subsection) depends on the cardinality c value in (4). We thus write the solution for a particular value of c. The best value for c can be selected with procedure (9), which can be rewritten in terms of these candidate solutions, with c_min and c_max defined as in (10).
, which is provided by (20) with IC(c) = BIC(c). As the indices standing for the badness of recovery of Λ, Ψ, and Φ, we obtained three normalized discrepancy indices between the true and estimated parameter matrices, with M = m(m − 1) used in the normalization for Φ. For assessing the incorrectness in identifying the true zero and nonzero loadings, we recorded MIR_0 = (the number of true zero loadings estimated as nonzero)/(pm − c) and MIR_# = (the number of true nonzero loadings estimated as zero)/c for each data set; here, MIR stands for misidentification rate. The statistics of the resulting five index values over the 200 data sets are presented in Table 2. It shows that the proposed procedure recovered the links of variables to factors and the parameter values fairly well, except that the 95th percentile for factor correlations can be considered large. These results allow us to conclude that the true CFA model can be recovered satisfactorily, though the estimates of factor correlations might deviate from the true counterparts in a few cases.
In Step 6, Φ and Λ are transformed using an m × m diagonal matrix D. Convergence in Step 6 is defined here as the decrease in the loss from the previous round being less than 10⁻⁶. The details of Steps 1 and 4 differ among EFA, SbCFA, and SimpFA, as described in the next paragraphs.

The initialization in Step 1 is detailed here. In EFA, the eigenvalue decomposition of S defined as S = LΔ²L′ is used, with Δ² the p × p diagonal matrix whose diagonal elements are arranged in descending order and LL′ = I_p. That is, the initial Λ and Ψ are set to L_mΔ_m and diag(S − L_mΔ_m²L_m′), respectively, in EFA, with Δ_m² the first m × m diagonal block of Δ² and L_m the p × m matrix containing the first m columns of L. In SbCFA, the initial Ψ is set to the one obtained in the preceding EFA, while the initial Λ and Φ are set to the matrices given by the preceding simplimax rotation, with the link matrix B = (b_ij) defined as in (2).

In the derivation for Step 4 of SimpFA (Appendix A4), ℵ denotes the set of index pairs (i, j) with b_ij = 1, while ℵ⊥ is the set of (i, j) with b_ij = 0, and we use b_ij λ_ij = h_ij for (i, j) ∈ ℵ. Further, with Λ_C the current Λ value, (A4) can be rewritten as (A4′) using the Kronecker product ⊗. Using that inequality, a function majorizing (A4′) can be defined; since it is a function of Λ with β ≥ 0 on its right side, the task of Step 4 is attained by minimizing the simplimax function. Thus, the B and Λ to be obtained in Step 4 are given by (18) with H = (h_ij).

Table 1 .
True parameters in Λ, Ψ1_p, and Φ, with blank cells indicating zero elements.

Table 2 .
Percentiles, average (Avg), and standard deviation (SD) of each index over 200 data sets.

Table 3 .
Solution for personality trait data, with blank cells indicating zero elements.

Table 5 .
Solution for housing preference data, with blank cells indicating zero elements.
Step 1. Initialize T.
Step 2. Update B and Λ as in (18) with H = Λ_E T⁻¹.
Step 3. Update T with Browne's [12] algorithm.
Step 4. Finish if convergence is reached; otherwise, go back to Step 2.