R-factor analysis of data based on population models comprising R- and Q-factors leads to biased loading estimates

Effects of performing R-factor analysis of observed variables based on population models comprising R- and Q-factors were investigated. It was noted that estimating a model comprising R- and Q-factors has to face loading indeterminacy beyond rotational indeterminacy. Although R-factor analysis of data based on a population model comprising R- and Q-factors is nevertheless possible, this may lead to model error. Accordingly, even in the population, the resulting R-factor loadings are not necessarily close estimates of the original population R-factor loadings. It was shown in a simulation study that large Q-factor variance induces an increase of the variation of R-factor loading estimates beyond chance level. The results indicate that performing R-factor analysis with data based on a population model comprising R- and Q-factors may result in substantial loading bias. Tests of the multivariate kurtosis of observed variables are proposed as an indicator of possible Q-factor variance in observed variables as a prerequisite for R-factor analysis.

The factor model (Mulaik, 2012; Harman, 1976) allows for the investigation of measurement models in psychology and several areas of the social sciences. There are several estimation methods for the factor model, and researchers can choose between several different methods for exploratory and confirmatory factor analysis (Flora & Curran, 2004; Mulaik, 2012; Muthén & Asparouhov, 2012; Wirth & Edwards, 2007). Although a very large number of studies are based on the factor model, real-world phenomena may not correspond exactly to this model. Tucker and MacCallum (1991) emphasized that the factor model may not fit real population data perfectly. The possible difference between the factor model and real-world population data has been termed 'model error' (Tucker & MacCallum, 1991; MacCallum, 2003). Accordingly, in a factor analysis performed on a real-world data sample, some misfit of the factor model might be due to sampling error and some misfit might be due to model error. The modeling of common and unique factors together with a large number of minor factors has successfully been used to generate more realistic data containing model error in simulation studies (e.g., de Winter, Dodou, & Wieringa, 2009). However, other types of model error that are not based on a large number of minor factors may also be relevant for the fit of the factor model to real data.
One form of model error considered here is that the covariances between observed variables may be affected by covariances between individuals. In psychology, factor analysis is mainly performed in order to identify latent variables explaining the covariation between variables that are observed in samples of individuals. Factor analysis of the covariances or correlations between variables that are observed across many individuals is often termed R-factor analysis, whereas factor analysis of the covariances or correlations between individuals observed across many variables is termed Q-factor analysis (Ramlo & Newman, 2010; Broverman, 1961). A data matrix of individuals for Q-factor analysis is obtained when the matrix of observed variables used for R-factor analysis is transposed. Note that the empirical data used for R- and Q-factor analysis may be the same, although the number of observed variables will typically be larger than the number of individuals in Q-factor analysis, whereas the number of individuals will typically be larger than the number of observed variables in R-factor analysis. Moreover, the preferences for factor extraction and rotation in Q-factor analysis (Akhtar-Danesh, 2016; Ramlo, 2016) differ from those in R-factor analysis. Nevertheless, there is consensus that Q-factor analysis may be useful for the investigation of subjective individual views (Ramlo, 2016), and Q-factor analysis is sometimes preferred over R-factor analysis in the context of questionnaire development (e.g., Cadman, Belsky & Pasco Fearon, 2018).
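The relation between the two data layouts can be illustrated with a minimal Python sketch. The matrix sizes and the random data are hypothetical and serve only to show that the same data matrix yields a variables-by-variables correlation matrix (R-technique) or, after transposition, an individuals-by-individuals correlation matrix (Q-technique).

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 40                      # hypothetical sizes: 5 variables, 40 individuals
X = rng.standard_normal((p, n))   # rows = variables, columns = individuals

# R-technique: correlations between variables (rows), computed across individuals.
R_corr = np.corrcoef(X)           # shape (p, p)

# Q-technique: transpose first, so that correlations are computed
# between individuals across variables.
Q_corr = np.corrcoef(X.T)         # shape (n, n)

print(R_corr.shape, Q_corr.shape)
```

With only 5 variables per individual the Q-correlations in this toy example are very unstable, which mirrors the remark that Q-factor analysis typically requires more variables than individuals.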
The similarities and differences of R- and Q-factor analysis have primarily been discussed from the perspective of factor analysis as a tool for data analysis (Burt & Stephenson, 1939; Cattell, 1952). Consequently, the effects of the R- and Q-factor model as data-generating population models on the results of R- or Q-factor analysis have rarely been the focus of investigation. It is therefore largely unknown what happens when data that are based on a population model comprising R- and Q-factors are submitted to R-factor analysis. As models are never true (Box, 1979), it is not the fact that model error occurs that is important here, but the question of whether the loading estimates from R-factor analysis are substantially biased when a combined R- and Q-factor model holds. Therefore, and because most studies perform R-factor analysis, the focus of the present study is on the effect of a combined R- and Q-factor model as a population model on subsequent R-factor analysis. It is, however, acknowledged that a combined R- and Q-factor population model might also be a source of error for Q-factor analysis.
An example of R-factors in a context where Q-factors may also be relevant is the analysis of personality types in the context of personality traits (Gerlach, Farb, Revelle & Amaral, 2018), although the robustness of the results has been challenged (Freudenstein, Strauch, Mussel & Ziegler, 2019). Freudenstein et al. (2019) also noted that only 42% of the sample were associated with the proposed personality types, indicating that the types are probably of moderate relevance. Although Gerlach et al. (2018) used cluster methodology (Gaussian mixture models) for the identification of types, similarities of individuals have also been investigated by means of Q-factor analysis (Ramlo & Newman, 2010). Thus, personality research shows that relevant similarities of variables as well as relevant similarities of individuals may co-occur. This does not imply that Q-factors yield a superior representation of personality variance, nor that they allow for improved predictions of outcomes such as social adjustment or job achievement (Asendorpf, 2003). For the present study it is only important to acknowledge that Q-factors may also be relevant for a complete description of the data. However, if we accept the idea that Q-factors may co-occur with R-factors, the consequences of a population model based on a combination of R- and Q-factors for the estimation of model parameters of R-factor analysis should be investigated. This has not yet been done, as similarities of individuals have often been investigated by means of cluster analysis (Gerlach et al., 2018; Freudenstein et al., 2019), latent class analysis (Lazarsfeld & Henry, 1968), or factor mixture models (Lubke & Muthén, 2005). The achievements of these approaches for the analysis of typological variance are not questioned here. The focus of the present study is on the effect of population Q-factors co-occurring with population R-factors on the loading estimates of R-factor analysis, which does not take the Q-factors into account.
After some definitions, the effects of population models based on R- and Q-factors on the covariance and correlation of observed variables and the resulting effects on the estimation of R-factor loadings are described for the population. Then, a simulation study is performed in order to give an account of the effect of population models comprising R- and Q-factors on loading estimates of R-factor analysis. Finally, a method indicating whether a data set contains a relevant amount of Q-factor variance is proposed and demonstrated by means of simulated data sets.

Definitions
Let $\mathbf{X}_R$ be a $p \times n$ matrix of $p$ variables observed for $n$ individuals (Harman, 1976). The R-factor model can then be written as

$$\mathbf{X}_R = \mathbf{\Lambda}_R \mathbf{f}_R + \mathbf{\Psi}_R \mathbf{e}_R, \qquad (1)$$

where $\mathbf{f}_R$ is a $q_R \times n$ matrix of normally distributed common R-factor scores, $\mathbf{\Lambda}_R$ is a $p \times q_R$ matrix of common R-factor loadings, $\mathbf{e}_R$ is a $p \times n$ matrix of normally distributed linearly independent unique R-factor scores, and $\mathbf{\Psi}_R$ is a $p \times p$ diagonal positive definite matrix of unique R-factor loadings. […]

Let $\mathbf{X}_Q$ be an $n \times p$ matrix of $n$ individuals for which $p$ variables were observed. The Q-factor model can then be written as

$$\mathbf{X}_Q = \mathbf{\Lambda}_Q \mathbf{f}_Q + \mathbf{\Psi}_Q \mathbf{e}_Q, \qquad (3)$$

where $\mathbf{f}_Q$ is a $q_Q \times p$ matrix of normally distributed common Q-factor scores, $\mathbf{\Lambda}_Q$ is an $n \times q_Q$ matrix of common Q-factor loadings, $\mathbf{e}_Q$ is an $n \times p$ matrix of normally distributed linearly independent unique Q-factor scores, and $\mathbf{\Psi}_Q$ is an $n \times n$ diagonal positive definite matrix of unique Q-factor loadings. It is furthermore assumed that […]. It is assumed that the observed variables $\mathbf{X}_R$ and $\mathbf{X}_Q$ are statistically independent with […] (5) for […].
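As an illustration of the R-factor model, the following Python sketch generates a $p \times n$ data matrix from common and unique factor scores and compares the sample covariance matrix with the model-implied one. It assumes uncorrelated standard-normal common and unique factor scores and a simple-structure loading pattern; these choices, the variable names, and the very large n are illustrative assumptions and not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q_R, n = 15, 3, 100_000        # n is large only to make sample moments stable

# Simple-structure loadings: 5 salient loadings of .70 per factor, zeros elsewhere.
Lambda_R = np.zeros((p, q_R))
for j in range(q_R):
    Lambda_R[5 * j:5 * (j + 1), j] = 0.70

# Unique loadings chosen so that every observed variable has unit variance.
Psi_R = np.diag(np.sqrt(1.0 - np.sum(Lambda_R**2, axis=1)))

f_R = rng.standard_normal((q_R, n))   # common factor scores (assumed uncorrelated)
e_R = rng.standard_normal((p, n))     # unique factor scores (assumed uncorrelated)
X_R = Lambda_R @ f_R + Psi_R @ e_R    # p x n data matrix

# Model-implied covariance under the assumptions above, and its sample counterpart.
Sigma_model = Lambda_R @ Lambda_R.T + Psi_R @ Psi_R
Sigma_sample = np.cov(X_R)
print(np.max(np.abs(Sigma_sample - Sigma_model)))
```

The Q-factor model has the same algebraic form with the roles of individuals and variables exchanged, so the same sketch applies after swapping the dimensions.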

A combined model of R-and Q-factors
The data in the following section are assumed to be analyzed from the perspective of R-factor analysis, whereas the observed variables $\mathbf{X}_{RQ}$ are based on an aggregation of variables resulting from R- and Q-factors. This can be written as […] (6). Therefore, not all R- and Q-factors need to be well represented by the observed variables in $\mathbf{X}_{RQ}$ when R-factor analysis is performed. For a complete description of the population model there is the symmetric and idempotent centering matrix […] in Equation 6. It has been noted by Cattell (1952) and others that mean centering of $\mathbf{X}_Q$ implies that the variance that would be based on a single common factor ($q_Q = 1$) in $\mathbf{X}_Q$ would be eliminated in R-factor analysis of $\mathbf{X}_{RQ}$. Therefore, only the condition $q_Q > 1$ is considered here.
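The two stated properties of a centering matrix can be checked numerically. The sketch below assumes the usual definition $\mathbf{H} = \mathbf{I} - n^{-1}\mathbf{1}\mathbf{1}'$, which is symmetric and idempotent; whether this is exactly the form used in Equation 6 is an assumption, and the small data matrix is purely illustrative.

```python
import numpy as np

n = 6
H = np.eye(n) - np.ones((n, n)) / n   # assumed centering matrix I - (1/n) 1 1'

# Toy data: 3 variables (rows) observed for n individuals (columns).
X = np.arange(18, dtype=float).reshape(3, n)
X_centered = X @ H                    # subtracts each row's mean from that row

print(np.allclose(H, H.T))                         # symmetric
print(np.allclose(H @ H, H))                       # idempotent
print(np.allclose(X_centered.mean(axis=1), 0.0))   # row means are zero
```

Because a single common factor with equal effects on all entries of a row is absorbed into the row mean, centering of this kind removes it, which is consistent with the restriction to $q_Q > 1$.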
It follows from Equation 6 that the covariances of $\mathbf{X}_{RQ}$ are […] (7). The element in the first row and first column of […]. Therefore, Equation 7 can be written as […] (8). Two implications of assuming this model based on generating R- and Q-factors will be outlined here. The first consequence is the indeterminacy of the loadings resulting from R-factor analysis; the second consequence is the difference between the population R-factor loadings and the loading estimates resulting from R-factor analysis, which will be denoted as bias in the following.

Indeterminacy of estimated R-factor loadings
It follows from Equation 8 that the non-diagonal elements of $\mathbf{\Sigma}_{RQ}$ do not only depend on $\mathbf{\Lambda}_R$ but also on […]. When R-factor analysis is performed for $\mathbf{\Sigma}_{RQ}$, the resulting population loading estimates $\hat{\mathbf{\Lambda}}_R$ will therefore not only have rotational indeterminacy but will also depend on the size of the elements in […] and $\mathbf{e}_Q$, which typically remain unknown in empirical settings. The number of model parameters is much larger than the number of elements of $\mathbf{\Sigma}_{RQ}$. As the variances and covariances of the observed variables are relevant, only the elements of the lower triangle and the main diagonal of $\mathbf{\Sigma}_{RQ}$ represent independent data points, so that there are $(p^2 + p)/2$ independent data points. Although the number of parameters of the R-factor model (Equation 1) is $pq_R + p$, the parameters are typically identified successively, so that the $pq_R$ common factor loadings are estimated first and the $p$ unique factor loadings are calculated from the common factor loadings. As $q_R$ is typically considerably smaller than $p$, the factor model can easily be estimated. However, the number of elements of the model comprising R- and Q-factors (Equation 8) can be calculated from the number of […]. When the loadings of the unique factors ($\mathbf{\Psi}$) are calculated in a second step, there are $pq_R + nq_Q + pq_Q + pn$ model parameters. The four terms correspond to the dimensions of $\mathbf{\Lambda}_R$ ($p \times q_R$), $\mathbf{\Lambda}_Q$ ($n \times q_Q$), $\mathbf{f}_Q$ ($q_Q \times p$), and $\mathbf{e}_Q$ ($n \times p$). To give a further account of the number of model parameters, it is helpful to consider the condition $n = p$, $q_R = q_Q$, which yields $pq_R + pq_R + pq_R + p^2 = p^2 + 3pq_R$ model parameters. It follows that even for $q_R = 1$ the number of model parameters is $p^2 + 3p$, which is more than two times $(p^2 + p)/2$, the number of data points. Thus, even the simplest combined R- and Q-factor model has considerably more parameters than there are elements in $\mathbf{\Sigma}_{RQ}$. Therefore, the combined R- and Q-factor model is inherently non-identified and indeterminate unless further constraints are imposed. Performing R-factor analysis is nevertheless possible because the loadings and scores of the generating Q-factors are simply ignored. However, as will be investigated in the following, the incomplete estimation of the parameters of the generating R- and Q-factor model by means of R-factor analysis may result in biased model parameters.
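The counting argument can be made concrete with a short Python sketch. The function names are hypothetical; the formulas are the ones stated in the text, $(p^2 + p)/2$ data points versus $pq_R + nq_Q + pq_Q + pn$ parameters.

```python
def n_data_points(p):
    """Independent elements of a p x p covariance matrix."""
    return (p * p + p) // 2


def n_params_combined(p, n, q_R, q_Q):
    """Parameter count of the combined model as given in the text."""
    return p * q_R + n * q_Q + p * q_Q + p * n


# Simplest case discussed in the text: n = p and q_R = q_Q = 1.
p = n = 15
print(n_data_points(p), n_params_combined(p, n, 1, 1))   # 120 270

# A condition resembling the simulation study: p = 15, n = 300, q_R = q_Q = 3.
print(n_params_combined(15, 300, 3, 3))                  # 5490
```

For $p = 15$ the simplest combined model already has more than twice as many parameters as data points, and the gap grows quickly with $n$.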

Bias of estimated R-factor loadings
For […] (9), where SSQ denotes the sum of squares. It follows from […] identity matrix […] containing the eigenvalues in the main diagonal in descending order. According to Magnus and Neudecker (2007, p. 248), the trace of the power of a positive semidefinite square matrix is equal to the trace of the power of the eigenvalues of the matrix, so that […], the right-hand side of Equation 10 can be written as […].
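The trace property cited from Magnus and Neudecker can be checked numerically. The sketch below uses an arbitrary positive semidefinite matrix (an assumption for illustration, not a matrix from the model) and also shows why the property is useful for sums of squares: for a symmetric matrix, the trace of the squared matrix equals the sum of its squared elements.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 8))
A = B @ B.T                       # symmetric positive semidefinite, 5 x 5

eigenvalues = np.linalg.eigvalsh(A)
k = 2
trace_of_power = np.trace(np.linalg.matrix_power(A, k))
sum_of_powers = np.sum(eigenvalues**k)
print(np.isclose(trace_of_power, sum_of_powers))

# For a symmetric matrix and k = 2 this is also the sum of squared elements.
print(np.isclose(trace_of_power, np.sum(A**2)))
```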
It follows from […], where $\mathbf{W}_Q$ is an $n \times n$ diagonal matrix with $q_Q$ non-zero eigenvalues in decreasing order, and […], which implies that the variance of the elements in […]. […] all unique Q-factors and all common Q-factors account for the same amount of variance of each observed variable ([…]), the right-hand side of Equation 13 can be written as […] (14). It follows from Equations 14 and 10 and for $n > q_Q$ that […], i.e., that common Q-factors introduce $n/q_Q$ times more variability into the elements of $\mathbf{\Sigma}_{RQ}$ than unique Q-factors. More generally, Equations 10 and 11 imply that some variability in the elements of […] is introduced by the common and unique Q-factors.
To sum up, Q-factors tend to increase the variance of the covariances of observed variables (Equations 10, 11). However, the analyses above do not indicate the size of the respective effects or the amount of Q-factor variance that might substantially distort an R-factor solution.

Simulation study on the effect of generating Q-factors on R-factor loadings
A simulation was performed in order to give an account of the bias of R-factor loadings that is due to Q-factors. As the number of individuals or cases $n$ is part of the Q-factor model, the population has to comprise a large number of samples of a given $n$. The first population was based on 2,000 samples of $n = 300$ cases, the second population comprised 2,000 samples of $n = 600$ cases, and the third population comprised 2,000 samples of $n = 900$ cases. Accordingly, the conditions of the simulation study were $q_R = 3$, $q_Q = 3$, and $p = 15$. To investigate the effect of Q-factors on the variability of R-factor loading estimates, the salient loading sizes were set equal within each population model. The size of salient loadings in the common R-factor loading matrices $\mathbf{\Lambda}_R$ was $\lambda_R \in \{.50, .70\}$ and the size of salient loadings in the common Q-factor loading matrices $\mathbf{\Lambda}_Q$ was $\lambda_Q = .90$. The non-salient loadings were zero in all population models.
According to Equations 1, 3 and 6, the R- and Q-factor loadings were combined in order to generate the observed variables. This can be written as […] (15). Each set of $p$ observed variables was submitted to R-factor analysis. The dependent variables of the simulation study were the mean and standard deviation of the estimated loadings $\hat{\mathbf{\Lambda}}_R$ resulting from principal-axis R-factor analysis of the sample data with subsequent orthogonal target rotation (Schönemann, 1966) of the estimated R-factor loadings $\hat{\mathbf{\Lambda}}_R$ towards the R-factor loadings $\mathbf{\Lambda}_R$ of the population model based on R- and Q-factors. Therefore, differences between the means of $\hat{\mathbf{\Lambda}}_R$ and $\mathbf{\Lambda}_R$ cannot be due to different rotations of the factors.
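Orthogonal target rotation can be sketched in a few lines of Python. The function below uses the standard singular-value-decomposition solution of the orthogonal Procrustes problem; it is a minimal illustration and not the routine used in the study. The function name, the toy loading matrix, and the noise level are assumptions.

```python
import numpy as np


def target_rotate(A, target):
    """Return A @ T and T, where T is the orthogonal matrix
    minimizing the Frobenius norm of A @ T - target."""
    U, _, Vt = np.linalg.svd(A.T @ target)
    T = U @ Vt
    return A @ T, T


rng = np.random.default_rng(3)

# Toy population loadings: p = 15, three factors, salient loadings of .50.
target = np.zeros((15, 3))
for j in range(3):
    target[5 * j:5 * (j + 1), j] = 0.50

# "Estimated" loadings: the target in an arbitrary orientation plus small noise.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
A = target @ Q + 0.01 * rng.standard_normal((15, 3))

rotated, T = target_rotate(A, target)
print(np.max(np.abs(rotated - target)))   # small: the orientation is recovered
```

Because every estimated loading matrix is rotated towards the same target, any remaining differences between estimates and population loadings reflect sampling or model error rather than arbitrary factor orientation.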

Results
The most important result of the simulation study is that the standard deviation of the salient loadings increases with decreasing $w_R^2$ (Table 1). The results for $w_R^2 = 1.00$ show the standard deviations of the loadings that are only due to sampling error, as rotational variation of loadings was excluded by means of orthogonal target rotation towards the population loadings.
In particular, the results for $w_R^2 = 0.25$ show that the standard deviation of the loading estimates was about twice as large as the variation due to sampling error when there was a substantial amount of Q-factor variance. This additional loading variation is a bias of the loading estimates, as there was no salient loading variation in the population. In order to show the possible effect of the loading variation (comprising salient and non-salient loadings) on factor identification, a scatterplot of the target-rotated loadings of factors 1 and 2 is presented for $\lambda_R = .50$ in Figure 1 and for $\lambda_R = .70$ in Figure 2. The overlap of salient and non-salient loadings for samples of $n = 300$ cases is substantial for $w_R^2 = 0.25$ and might be an obstacle for factor identification. In contrast, salient and non-salient loadings can clearly be separated for $n = 300$ cases, $\lambda_R = .50$ and $w_R^2 = 1.00$, or for sample sizes of $n = 600$ and $n = 900$. For all conditions based on $\lambda_R = .70$, the overlap of salient and non-salient loadings was small, indicating that factor identification would be possible (Figure 2). To sum up, when a substantial amount of Q-factor variance is expected, large samples should be analyzed or very large R-factor loadings should be expected as a basis for successful factor identification.

An indicator of Q-factor variance
As R-factor analysis of data from a population based on a relevant amount of Q-factor variance may result in biased R-factor loadings, it is of interest to know whether there is a relevant amount of Q-factor variance in a data set. Note that a population model based on an additive combination of R- and Q-factors implies that a row-centered matrix of individual R-factor scores is combined with a row-and-column-centered matrix of individual Q-factor scores (Equations 15, 16). Burt (1937) demonstrated that the eigenvalues of R- and Q-factor analysis of a row-and-column-centered matrix are identical, so that a high similarity of eigenvalues should be expected for combined R- and Q-factor models, even when the resulting matrix is not perfectly column-centered. Therefore, Q-factor analysis will yield a number of substantial eigenvalues, even when the data can perfectly be described by R-factor analysis. Thus, the eigenvalues of Q-factor analysis do not unambiguously indicate the amount of Q-factor variance.
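The identity of eigenvalues for a row-and-column-centered matrix can be illustrated numerically. The sketch below works with cross-product matrices of a double-centered random matrix rather than with correlation matrices, which is a simplifying assumption; the sizes and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 6, 10
X = rng.standard_normal((p, n))

# Double centering: remove row means and column means.
H_p = np.eye(p) - np.ones((p, p)) / p
H_n = np.eye(n) - np.ones((n, n)) / n
X_dc = H_p @ X @ H_n

# Eigenvalues of the variables-by-variables and individuals-by-individuals
# cross-product matrices, sorted in descending order.
eig_R = np.sort(np.linalg.eigvalsh(X_dc @ X_dc.T))[::-1]   # p eigenvalues
eig_Q = np.sort(np.linalg.eigvalsh(X_dc.T @ X_dc))[::-1]   # n eigenvalues

# After double centering the rank is at most min(p, n) - 1.
k = min(p, n) - 1
print(np.allclose(eig_R[:k], eig_Q[:k]))
```

The non-zero eigenvalues coincide, so a set of substantial eigenvalues in a Q-analysis does not by itself distinguish Q-factor variance from R-factor variance.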
It is therefore proposed to consider the bivariate scatterplot of observed variables in order to ascertain whether between-subject variance that could be due to R-factors is combined with a substantial amount of within-subject variance that could be due to Q-factors. Different within-subject profiles that might be caused by $q_Q > 1$ Q-factors imply that not all differences between two observed z-standardized variables $z_1$ and $z_2$ are equal. For $q_Q = 2$, for example, there could be one group of participants with $z_1 - z_2 > 0$ and a second group with $z_1 - z_2 < 0$. It follows that the variance of the z-score differences $d$, $\sigma_d^2$, is greater than zero for $q_Q \geq 2$. According to Rodgers and Nicewander (1988, p. 64) the correlation can be written as […] .10, and .20. As the test is employed in order to evaluate conditions for R-factor analysis, an alpha level beyond the conventional .05 level might be justified. Note that the $w_R^2 = 1.00$ condition is a condition without any effect of Q-factors, so that no detection rate beyond chance level should be expected for this condition.
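A standard identity linking the correlation to the variance of z-score differences is $r = 1 - s_d^2/2$, where $s_d^2$ is the variance of $d = z_1 - z_2$. Whether this is exactly the form of the expression intended in the text is an assumption; the following Python sketch only verifies the identity on hypothetical data.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.standard_normal(n)
y = 0.6 * x + 0.8 * rng.standard_normal(n)

# z-standardize with the same divisor (n - 1) that is used for the variance below.
z1 = (x - x.mean()) / x.std(ddof=1)
z2 = (y - y.mean()) / y.std(ddof=1)

d = z1 - z2
r_from_differences = 1.0 - d.var(ddof=1) / 2.0
r_direct = np.corrcoef(x, y)[0, 1]
print(np.isclose(r_from_differences, r_direct))
```

The identity shows that the correlation fixes only the variance of the differences, not their distribution, which is why distributional shape (for example kurtosis) can carry additional information about within-subject profiles.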

Discussion
As R-factor analysis of variables observed for a large number of individuals is the dominant form of factor analysis in several areas of the social sciences, it might happen that R-factor analysis is routinely performed even when the population model comprises R- and Q-factors. For example, in the domain of personality research it has been assumed that Q-factors or type-factors may be relevant in addition to the well-known R-factors (e.g., Gerlach et al., 2018; Gilbert et al., 2021; Ramos, Mata, & Nacar, 2021). This leads to the question of whether performing R-factor analysis of data from a population model comprising R- and Q-factors may result in biased loading estimates. R-factor analysis of data from population models comprising R- and Q-factors was therefore investigated. It was noted that a model comprising R- and Q-factors introduces loading indeterminacy beyond rotational indeterminacy. For such a model, the number of model parameters is substantially larger than the number of elements of the covariance matrix of observed variables. It was shown that R-factor analysis of data based on a population model comprising R- and Q-factors leads to biased R-factor loading estimates. For such data, R-factor analysis introduces variability into the loading estimates. Thus, when the observed variables have equal R-factor loadings in a population model comprising R- and Q-factors, the loading estimates resulting from R-factor analysis of the observed variables will typically be unequal. This bias of R-factor loading estimates and the variation of R-factor loading estimates beyond chance level was also shown in a simulation study. It was illustrated in the simulation study that the additional loading variability may hamper factor identification.
As the use of R-factor analysis for data drawn from a population based on R- and Q-factors may result in biased R-factor loading estimates, it might be of interest to detect Q-factor variance in observed variables as a prerequisite of R-factor analysis. As eigenvalues of correlation matrices may be ambiguous and because Q-factor variance leads to platykurtic multivariate distributions of observed scores, it was proposed to use tests of multivariate normality as indicators of Q-factor variance. In a simulation study, Mardia's test of the multivariate kurtosis was more sensitive for the detection of relevant Q-factor variance than Srivastava's and Small's tests. However, a slight tendency towards false positive results was also found with Mardia's test, so that Srivastava's and Small's tests might also be recommended. As different reasons are possible for departures of the kurtosis from the kurtosis of the multivariate normal distribution, an inspection of scatterplots is recommended when a test of the multivariate kurtosis of the data is significant. The inspection of scatterplots may be combined with pairwise tests of the bivariate kurtosis in order to eliminate observed variables with substantial Q-factor variance from R-factor analysis.
To sum up, the present paper is a caveat that population models comprising R- and Q-factors have loading indeterminacy beyond rotational indeterminacy and that performing R-factor analysis of data based on such models results in biased R-factor loading estimates. The bias is due to the fact that the model of R-factor analysis does not correspond exactly to the population model comprising R- and Q-factors. Tests of the multivariate kurtosis might be used for the detection of Q-factor variance as a prerequisite for R-factor analysis. Further research should compare the effect of model error due to Q-factor variance on the results of R-factor analysis with the effect of model error based on minor factors as discussed by MacCallum (2003). The combined effect of model error based on minor factors and model error based on Q-factor variance might also be investigated in future research, as both may occur in real data.

$\mathbf{X}_R$ represents the part of the observed variables based on R-factors and $\mathbf{X}_Q'$ is the transposed matrix of observed individuals based on Q-factors. Although adding $\mathbf{X}_R$ and $\mathbf{X}_Q$ is only possible for $n = p$, it should be noted that, in the combined model of R- and Q-factors, only $\mathbf{X}_{RQ}$ is observed, whereas $\mathbf{X}_R$ and $\mathbf{X}_Q$ are parts of the assumed population model.

Although the relative effect of R- and Q-factors can be determined by the size of the respective common and unique R- and Q-factor loadings, it is helpful to control for the relative effect of R- and Q-factors more directly by means of […] (16) with […] and […], which is needed to standardize the transposed part of the observed variables based on Q-factors. The usual metric of standardized factor loadings was maintained in the population with […]. The observed variables were computed from Equation 16 by means of $q_R$ common factor scores $\mathbf{f}_R$, $p$ unique factor scores $\mathbf{e}_R$, $n/q_Q$ common factor scores $\mathbf{f}_Q$, and $n$ unique factor scores $\mathbf{e}_Q$, which were generated from normal distributions with $\mu = 0$ and $\sigma = 1$ by the Mersenne twister random number generator integrated in IBM SPSS, Version 26. […] half of the unique R-factor variance is replaced by common and unique Q-factor variance. Four levels of $w_R^2$ (1.00, .75, .50, and .25) with the corresponding $w_Q^2$ were combined with two levels of $\lambda_R$ and three sample sizes $n$, which leads to $4 \times 2 \times 3 = 24$ populations, each comprising 2,000 samples.
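The factorial design can be enumerated with a brief Python sketch. The variable names are hypothetical; the levels are those listed in the text, and the corresponding values of $w_Q^2$ are not computed because their relation to $w_R^2$ is not legible in the source.

```python
from itertools import product

w2_R_levels = [1.00, 0.75, 0.50, 0.25]   # levels of w_R^2 named in the text
lambda_R_levels = [0.50, 0.70]           # salient R-factor loading sizes
n_levels = [300, 600, 900]               # cases per sample
samples_per_population = 2000

# Fully crossed design: one population per combination of levels.
conditions = list(product(w2_R_levels, lambda_R_levels, n_levels))
print(len(conditions))                           # 24
print(len(conditions) * samples_per_population)  # 48000 samples in total
```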

Figure 1. Scatterplot of R-factor loading estimates of factors 1 and 2 based on 2,000 samples (n = 600, n = 900) drawn from populations based on $\lambda_R$ = .50, $q_R$ = 3 R-factors ($w_R^2$ = 1.00) and from populations […]

[…] dots). The concentration of points on three lines is extreme for $q_Q = 3$, so that the bivariate distribution is quite different from the bivariate distribution for the same correlation and $q_Q = 0$ (Figure 3, crosses). For $q_Q = 0$ there is a bivariate normal distribution, which is clearly not the case for $q_Q = 3$. As the distributions in Figure 3 are not skewed, only tests of the multivariate kurtosis were performed with the macro provided by […] at $\alpha = .05$. Srivastava's (1984) test for multivariate kurtosis ($b_{2,p} = 2.26$, $N(b_{2,p}) = -2.59$, $p < .01$), Small's (1980) test of multivariate kurtosis ($Q_2 = 298.95$, $df = 2$, $p < .01$), and Mardia's (1970) test ($b_{2,p} = 6.36$, $N(b_{2,p}) = -2.47$, $p < .05$) indicate a significant departure from multivariate normal kurtosis. The example shows that a bivariate distribution clearly based on $q_Q = 3$ may result in a platykurtic departure from the kurtosis of the bivariate normal distribution. Even when different reasons for platykurtic multivariate distributions are possible, tests of the multivariate kurtosis might be useful indicators of $q_Q > 1$. Visual inspection of scatterplots may be performed when significant departures from the multivariate normal distribution occur.
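Mardia's multivariate kurtosis can be computed directly, as in the minimal Python sketch below. It uses the textbook definition (the mean of squared Mahalanobis distances raised to the second power) and the usual large-sample normal approximation with expectation $p(p+2)$ and variance $8p(p+2)/n$. It is not the macro used in the study, it does not implement Srivastava's or Small's tests, and the uniform data merely stand in for a platykurtic distribution.

```python
import numpy as np


def mardia_kurtosis(X):
    """X: n x p array (rows = cases). Returns Mardia's b_{2,p} and its z statistic."""
    n, p = X.shape
    centered = X - X.mean(axis=0)
    S = centered.T @ centered / n                 # covariance with divisor n
    # Squared Mahalanobis distance of every case from the centroid.
    d2 = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(S), centered)
    b2p = np.mean(d2**2)
    z = (b2p - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    return b2p, z


rng = np.random.default_rng(6)
n = 2000
normal_data = rng.standard_normal((n, 2))          # bivariate normal
flat_data = rng.uniform(-1.0, 1.0, size=(n, 2))    # platykurtic stand-in

b2p_normal, z_normal = mardia_kurtosis(normal_data)
b2p_flat, z_flat = mardia_kurtosis(flat_data)
print(b2p_normal, z_normal)
print(b2p_flat, z_flat)
```

For $p = 2$ the expected value under normality is 8, so a value clearly below 8 with a strongly negative z statistic indicates a platykurtic departure of the kind described above.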

Table 1. Mean and standard deviation of target-rotated salient loading estimates of R-factor analysis for $\lambda_R$ = .50, .70 and $\lambda_Q$ = .90 for […]