R-Factor Analysis of Data Based on Population Models Comprising R- and Q-Factors Leads to Biased Loading Estimates

Abstract

Effects of performing an R-factor analysis of observed variables based on population models comprising R- and Q-factors were investigated. Although R-factor analysis of data based on a population model comprising R- and Q-factors is possible, this may lead to model error. Accordingly, loading estimates resulting from R-factor analysis of sample data drawn from a population based on a combination of R- and Q-factors will be biased. It was shown in a simulation study that a large amount of Q-factor variance induces an increase in the variation of R-factor loading estimates beyond the chance level. Tests of the multivariate kurtosis of observed variables are proposed as an indicator of possible Q-factor variance in observed variables as a prerequisite for R-factor analysis.

Share and Cite:

Beauducel, A. (2024) R-Factor Analysis of Data Based on Population Models Comprising R- and Q-Factors Leads to Biased Loading Estimates. Open Journal of Statistics, 14, 38-54. doi: 10.4236/ojs.2024.141002.

1. Introduction

The factor model [1] [2] allows for the investigation of measurement models in psychology and several areas of the social sciences. There are several estimation methods for the factor model, and researchers can choose between different methods for exploratory and confirmatory factor analysis [1] [3] [4] [5] . Although a very large number of studies are based on the factor model, real-world phenomena may not correspond exactly to this model. [6] emphasized that the factor model may not fit real population data perfectly. The possible difference between the factor model and real-world population data has been termed "model error" [6] [7] . Accordingly, in a factor analysis performed on a real-world data sample, misfit of the factor model might be due to sampling error as well as to model error. The modeling of common and unique factors together with a large number of minor factors has successfully been used to generate more realistic data containing model error in simulation studies (e.g., [8] ). However, other types of model error that are not based on a large number of minor factors may also be relevant for the fit of the factor model to real data. As these other types of model error have not yet been investigated, their effect on the estimation of model parameters remains unknown. The research problem addressed in the present study is therefore the effect of another type of model error on the results of factor analysis.

The model error considered here is that the covariances between observed variables are affected by covariances between individuals. In psychology and social sciences, factor analysis is mainly performed in order to identify latent variables explaining the covariation between variables that are observed in samples of individuals. However, covariances of variables imply a pattern of covariances of individuals as shown in the following example (Table 1, Example 1). The perfect correlation of variables x1 and x2 may be caused by a common factor and the correlation of variables x3 and x4 may be caused by another common factor. As the scores of individuals i1 and i2 have a zero variance, the corresponding inter-correlations of individuals are zero. Only a perfect negative inter-correlation between individuals i3 and i4 occurs. Whereas the inter-correlations of variables can be explained by two uncorrelated factors, the corresponding inter-correlations

Table 1. Mean-centered scores of four individuals (i1 - i4) on four observed variables (x1 - x4), inter-correlations of variables without inter-correlation between individuals and with superimposed inter-correlation between individuals.

Note. Standard deviations are given behind the slash.

of individuals cannot be explained by two uncorrelated factors. In Example 2, perfect negative correlations between individuals i1 and i2 and between i3 and i4 and moderate inter-correlations between the remaining individuals occur, which considerably modifies the inter-correlations of variables when compared to Example 1. The examples demonstrate the mutual inter-relation of inter-correlations of variables and inter-correlations of individuals. In order to elucidate this relationship, the effect of latent factors explaining the common variance of individuals on the common variance of variables is investigated in the present study.

Factor analysis of the covariances or correlations between variables that are observed across many individuals is often termed R-factor analysis, whereas factor analysis of the covariances or correlations between individuals observed across many variables is termed Q-factor analysis [9] [10] . A data matrix of individuals for Q-factor analysis is obtained when the matrix of observed variables used for R-factor analysis is transposed. Note that the empirical data used for R- and Q-factor analysis may be the same, although the number of observed variables will typically be larger than the number of individuals in Q-factor analysis, whereas the number of individuals will typically be larger than the number of observed variables in R-factor analysis. Moreover, the preferences for factor extraction and rotation differ between Q-factor analysis [11] [12] and R-factor analysis. Nevertheless, there is consensus that Q-factor analysis may be useful for the investigation of subjective individual views [12] , and Q-factor analysis is sometimes preferred over R-factor analysis in the context of questionnaire development (e.g., [13] ).
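As a minimal illustration of the two orientations (a numpy sketch with a generic, hypothetical n × p data matrix), R-factor analysis starts from the correlations between variables, whereas Q-factor analysis starts from the correlations between individuals computed from the transposed data matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 12))        # 200 individuals observed on 12 variables

# R-factor analysis starts from the p x p correlations between variables ...
R_input = np.corrcoef(X, rowvar=False)    # shape (12, 12)
# ... whereas Q-factor analysis starts from the n x n correlations between
# individuals, i.e., the same data matrix transposed.
Q_input = np.corrcoef(X.T, rowvar=False)  # shape (200, 200)
print(R_input.shape, Q_input.shape)
```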

The similarities and differences of R- and Q-factor analysis have primarily been discussed from the perspective of factor analysis as a tool for data analysis [14] [15] . In consequence, the effects of the R- and Q-factor model as data generating population models on the results of R- or Q-factor analysis have rarely been compared. It is therefore largely unknown what happens when data that are based on a population model comprising R- as well as Q-factors are submitted to R-factor analysis. As models are never true [16] , it is not the fact that model error occurs that is important here, but the question of whether the loading estimates from R-factor analysis are substantially biased when a combined R- and Q-factor model holds. Therefore, and because most studies perform R-factor analysis, the focus of the present study is on the effect of a combined R- and Q-factor model as a population model on subsequent R-factor analysis. It is, however, acknowledged that a combined R- and Q-factor population model might also be a source of error for Q-factor analysis.

An example of R-factors in a context where Q-factors may also be relevant is the analysis of personality types in the context of personality traits [17] , although the robustness of the results has been challenged [18] . [18] also noted that only 42% of the sample was associated with the proposed personality types, indicating that the types are probably of moderate relevance. Although [17] used cluster methodology (Gaussian mixture models) for the identification of types, similarities of individuals have also been investigated by means of Q-factor analysis [9] . Thus, personality research shows that relevant similarities of variables as well as relevant similarities of individuals may co-occur. This does not imply that Q-factors yield a superior representation of personality variance or that they allow for improved predictions of outcomes like, for example, social adjustment or job achievement [19] . For the present study, it is only important to acknowledge that Q-factors may also be relevant for a complete description of the data. However, if we accept the idea that Q-factors may co-occur with R-factors, the consequences of a population model based on a combination of R- and Q-factors for the estimation of model parameters of R-factor analysis should be investigated. This has not yet been done, as similarities of individuals have often been investigated by means of cluster analysis [17] [18] , latent class analysis [20] , or factor mixture models [21] . The achievements of these approaches for the analysis of typological variance are not questioned here. The focus of the present study is on the effect of population Q-factors co-occurring with population R-factors on the loading estimates of R-factor analysis, which does not take the Q-factors into account.

After some definitions, the effects of population models based on R- and Q-factors on the covariance and correlation of observed variables and the resulting effects on the estimation of R-factor loadings are described for the population. Then, a simulation study is performed in order to give an account of the effect of population models comprising R- and Q-factors on loading estimates of R-factor analysis. Finally, a method indicating whether a data set contains a relevant amount of Q-factor variance is proposed and demonstrated by means of simulated data sets.

2. Definitions

2.1. Separate R- and Q-Factor Models

Let X_R be a p × n matrix of p variables observed for n individuals [2] . The R-factor model can then be written as

X_R = Λ_R f_R + Ψ_R e_R, (1)

where f_R is a q_R × n matrix of normally distributed common R-factor scores, Λ_R is a p × q_R matrix of common R-factor loadings, e_R is a p × n matrix of normally distributed, linearly independent unique R-factor scores, and Ψ_R is a p × p diagonal positive definite matrix of unique R-factor loadings. It is furthermore assumed that E(f_R) = 0, E(f_R f_R′) = I_qR, E(e_R) = 0, E(f_R e_R′) = 0, and E(e_R e_R′) = I_p, so that

Σ_R = E(X_R X_R′) = Λ_R Λ_R′ + Ψ_R². (2)
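As a minimal sketch of Equations (1) and (2), the following numpy code generates data from an R-factor model and compares the model-implied covariance matrix with its sample analogue; the loading values are hypothetical and chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, q_R = 6, 100000, 2

# Hypothetical population loadings: two factors with three salient loadings of 0.70 each
Lambda_R = np.zeros((p, q_R))
Lambda_R[:3, 0] = 0.70
Lambda_R[3:, 1] = 0.70
# Unique loadings chosen so that diag(Lambda_R Lambda_R' + Psi_R^2) = I_p
Psi_R = np.diag(np.sqrt(1.0 - np.sum(Lambda_R**2, axis=1)))

f_R = rng.standard_normal((q_R, n))   # common R-factor scores
e_R = rng.standard_normal((p, n))     # unique R-factor scores
X_R = Lambda_R @ f_R + Psi_R @ e_R    # Equation (1)

Sigma_model = Lambda_R @ Lambda_R.T + Psi_R**2   # Equation (2)
Sigma_sample = X_R @ X_R.T / n                   # sample analogue of E(X_R X_R')
print(np.round(Sigma_sample - Sigma_model, 2))   # close to zero for large n
```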

Let X_Q be an n × p matrix of n individuals for which p variables were observed. The Q-factor model can then be written as

X_Q = Λ_Q f_Q + Ψ_Q e_Q, (3)

where f_Q is a q_Q × p matrix of normally distributed common Q-factor scores, Λ_Q is an n × q_Q matrix of common Q-factor loadings, e_Q is an n × p matrix of normally distributed, linearly independent unique Q-factor scores, and Ψ_Q is an n × n diagonal positive definite matrix of unique Q-factor loadings. It is furthermore assumed that E(f_Q) = 0, E(f_Q f_Q′) = I_qQ, E(e_Q) = 0, E(f_Q e_Q′) = 0, and E(e_Q e_Q′) = I_n, so that

Σ_Q = E(X_Q X_Q′) = Λ_Q Λ_Q′ + Ψ_Q². (4)

It is assumed that the observed variables X_R and X_Q are statistically independent, with

E(X_R X_Q′) = E((Λ_R f_R + Ψ_R e_R)(f_Q′ Λ_Q′ + e_Q′ Ψ_Q))(p − 1)⁻¹ = E(Λ_R f_R f_Q′ Λ_Q′ + Λ_R f_R e_Q′ Ψ_Q + Ψ_R e_R f_Q′ Λ_Q′ + Ψ_R e_R e_Q′ Ψ_Q)(p − 1)⁻¹ = 0, (5)

for E(X_Q) = 0 and with E(f_R f_Q′) = 0, E(f_R e_Q′) = 0, E(e_R f_Q′) = 0, and E(e_R e_Q′) = 0.

2.2. A Combined Model of R- and Q-Factors

The data in the following section are assumed to be analyzed from the perspective of R-factor analysis, whereas the observed variables X_RQ are based on an aggregation of variables resulting from R- and Q-factors. This can be written as

X_RQ = X_R + X_Q′ C_n, (6)

where X_R represents the part of the observed variables based on R-factors and X_Q′ is the transposed matrix of observed individuals based on Q-factors. Although adding X_R and X_Q′ is only possible for n = p, it should be noted that in the combined model of R- and Q-factors only X_RQ is observed, whereas X_R and X_Q are parts of the assumed population model. Therefore, not all R- and Q-factors need to be well represented by the observed variables in X_RQ when R-factor analysis is performed. For a complete description of the population model, n = p is nevertheless assumed in the following. Moreover, as E(X_Q) is not necessarily zero, the symmetric and idempotent centering matrix C_n = I_n − n⁻¹ 1_n 1_n′, based on the n × n identity matrix I_n and the n × 1 unit column vector 1_n, is used for row mean centering of X_Q′ on the right-hand side of Equation (6). It has been noted by [15] and others that this mean centering implies that variance based on a single common factor (q_Q = 1) in X_Q would be eliminated in R-factor analysis of X_RQ. Therefore, only the condition q_Q > 1 is considered here. It follows from Equation (6) that the covariances of X_RQ are

Σ_RQ = E(X_RQ X_RQ′) = E(X_R X_R′ + H_RQ + H_RQ′ + X_Q′ C_n X_Q), (7)

with H_RQ = X_R X_CQ and X_CQ = C_n X_Q. The element in the first row and first column of H_RQ is computed as h_RQ,11 = x_R,11 x_CQ,11 + x_R,12 x_CQ,21 + … + x_R,1n x_CQ,n1. As X_R and X_CQ are mean-centered and symmetrically distributed, and as E(X_R X_Q′) = 0, all terms of this sum follow the normal product distribution [22] [23] , which is symmetric, so that E(h_RQ,11) = 0. This holds for all elements of H_RQ, so that E(H_RQ) = 0. Therefore, Equation (7) can be written as

Σ_RQ = E(Λ_R Λ_R′ + Ψ_R²) + E(f_Q′ Λ_Q′ C_n Λ_Q f_Q + e_Q′ Ψ_Q C_n Ψ_Q e_Q). (8)
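The construction in Equations (6) to (8) can be sketched numerically as follows; the random matrices merely stand in for the R- and Q-parts of the model, and the check only concerns the properties of the centering matrix and the combination of the two parts:

```python
import numpy as np

rng = np.random.default_rng(2)
n = p = 8                            # the combined model assumes n = p

X_R = rng.standard_normal((p, n))    # stands in for the R-part of Equation (1)
X_Q = rng.standard_normal((n, p))    # stands in for the Q-part of Equation (3)

# Symmetric and idempotent centering matrix C_n = I_n - n^{-1} 1_n 1_n'
ones = np.ones((n, 1))
C_n = np.eye(n) - ones @ ones.T / n
assert np.allclose(C_n, C_n.T) and np.allclose(C_n @ C_n, C_n)

# Equation (6): combine the R-part with the row-centered, transposed Q-part
X_RQ = X_R + X_Q.T @ C_n

# Post-multiplication by C_n removes the row means of X_Q'
print(np.allclose((X_Q.T @ C_n).mean(axis=1), 0.0))   # True
```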

2.3. Bias of Estimated R-Factor Loadings

For X_Q being mean-centered (E(X_Q) = 0), Equation (8) implies that the variance of the elements in Σ_RQ is also affected by Q-factors. For E(X_Q) = 0, the numerator of the variance of the elements in e_Q′ Ψ_Q² e_Q is

SSQ(e_Q′ Ψ_Q² e_Q) = tr((e_Q′ Ψ_Q² e_Q − E(e_Q′ Ψ_Q² e_Q))(e_Q′ Ψ_Q² e_Q − E(e_Q′ Ψ_Q² e_Q))′), (9)

where SSQ denotes the sum of squares. It follows from E(X_Q) = 0, E(e_Q) = 0, E(e_Q′ Ψ_Q² e_Q) = 0, and E(e_Q e_Q′) = I_n that the eigen-decomposition e_Q′ Ψ_Q² e_Q = K_Q V_Q K_Q′ exists, where V_Q is an n × n diagonal matrix containing the eigenvalues in the main diagonal in descending order and K_Q′ K_Q = K_Q K_Q′ = I_n. According to [24] (p. 248), the trace of the power of a positive semidefinite square matrix is equal to the trace of the power of the eigenvalues of the matrix, so that

SSQ(e_Q′ Ψ_Q² e_Q) = tr(e_Q′ Ψ_Q² e_Q e_Q′ Ψ_Q² e_Q) = tr(V_Q²) = tr(Ψ_Q²). (10)

When all unique Q-factors and all common Q-factors account for the same amount of variance of each observed variable ((1/2) diag(Σ_Q) = diag(Λ_Q Λ_Q′) = Ψ_Q²), the right-hand side of Equation (10) can be written as

tr(Ψ_Q²) = (1/2) n σ_Q². (11)

It follows from SSQ(e_Q′ Ψ_Q² e_Q) > 0 that e_Q′ Ψ_Q² e_Q introduces variability into the elements of Σ_RQ. For E(X_Q) = 0, the numerator of the variance of f_Q′ Λ_Q′ Λ_Q f_Q is

SSQ(f_Q′ Λ_Q′ Λ_Q f_Q) = tr((f_Q′ Λ_Q′ Λ_Q f_Q − E(f_Q′ Λ_Q′ Λ_Q f_Q))(f_Q′ Λ_Q′ Λ_Q f_Q − E(f_Q′ Λ_Q′ Λ_Q f_Q))′). (12)

It follows from E(f_Q′ Λ_Q′ Λ_Q f_Q) = 0 that the eigen-decomposition f_Q′ Λ_Q′ Λ_Q f_Q = L_Q W_Q L_Q′ exists, where W_Q is an n × n diagonal matrix with q_Q non-zero eigenvalues in decreasing order and L_Q′ L_Q = L_Q L_Q′ = I_qQ. The numerator of the variance of the elements of f_Q′ Λ_Q′ Λ_Q f_Q is

SSQ(f_Q′ Λ_Q′ Λ_Q f_Q) = tr(f_Q′ Λ_Q′ Λ_Q f_Q f_Q′ Λ_Q′ Λ_Q f_Q) = tr(W_Q²), (13)

which implies that the variance of the elements in f_Q′ Λ_Q′ Λ_Q f_Q is greater than zero. When all unique Q-factors and all common Q-factors account for the same amount of variance of each observed variable ((1/2) diag(Σ_Q) = diag(Λ_Q Λ_Q′) = Ψ_Q²), the right-hand side of Equation (13) can be written as

tr(W_Q²) = tr(n² (2 q_Q²)⁻¹ σ_Q² I_qQ) = n² (2 q_Q)⁻¹ σ_Q². (14)

It follows from Equations (14) and (10), for n > q_Q, that σ_Q² n² (2 q_Q)⁻¹ > 0.5 n σ_Q², i.e., that common Q-factors introduce n/q_Q times more variability into the elements of Σ_RQ than unique Q-factors. More generally, Equations (10) and (13) imply that some variability in the elements of Σ_RQ is introduced by the common as well as the unique Q-factors. To sum up, Q-factors tend to enhance the variance of the covariances of observed variables (Equations (10) and (11)). However, the above analyses do not indicate the size of the respective effects, nor what amount of Q-factor variance might substantially distort an R-factor solution.
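The n/q_Q ratio implied by Equations (11) and (14) can be illustrated with a small numeric example; the values n = 300, q_Q = 3, and σ_Q² = 1 are assumptions chosen only for illustration:

```python
# Numeric illustration of Equations (11) and (14), assuming n = 300, q_Q = 3, sigma_Q^2 = 1
n, q_Q, sigma2_Q = 300, 3, 1.0

unique_part = 0.5 * n * sigma2_Q             # Equation (11): tr(Psi_Q^2)
common_part = n**2 * sigma2_Q / (2 * q_Q)    # Equation (14): tr(W_Q^2)

print(unique_part, common_part)              # 150.0, 15000.0
print(common_part / unique_part)             # 100.0, i.e., n / q_Q
```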

3. Simulation Study on the Effect of Q-Factors on R-Factor Loadings

3.1. Conditions and Specifications

A simulation was performed in order to give an account of the bias of R-factor loadings that is due to Q-factors when the data are based on R- as well as Q-factors. As the number of individuals or cases n is part of the Q-factor model, the finite population of the simulation study has to comprise a large number of samples of a given n. The first population was based on 2000 samples of n = 300 cases, the second population comprised 2000 samples of n = 600 cases, and the third population comprised 2000 samples of n = 900 cases. Accordingly, the conditions of the simulation study were q_R = 3, q_Q = 3, and p = 15. To investigate the effect of Q-factors on the variability of R-factor loading estimates, the salient loading sizes were set equal within each population model. The size of the salient loadings in the common R-factor loading matrices Λ_R was λ_R ∈ {0.50, 0.70} and the size of the salient loadings in the common Q-factor loading matrices Λ_Q was λ_Q = 0.90. The non-salient loadings were zero in all population models. According to Equations (1), (3) and (6), the R- and Q-factor loadings were combined in order to generate the observed variables. This can be written as

X_RQ = Λ_R f_R + Ψ_R e_R + (f_Q′ Λ_Q′ + e_Q′ Ψ_Q) C_n. (15)

Although the relative effect of R- and Q-factors can be determined by the size of the respective common and unique R- and Q-factor loadings, it is helpful to control for the relative effect of R- and Q-factors more directly by means of

X_RQ = Λ_R f_R + Ψ_R (w_R e_R + w_Q D^(−0.5) (f_Q′ Λ_Q′ + e_Q′ Ψ_Q) C_n), (16)

with 1 = w_R² + w_Q² and D = diag((f_Q′ Λ_Q′ + e_Q′ Ψ_Q) C_n C_n′ (Λ_Q f_Q + Ψ_Q e_Q)), which is needed to standardize the transposed part of the observed variables based on Q-factors. The usual metric of standardized factor loadings was maintained in the population with I_p = diag(Λ_R Λ_R′ + Ψ_R²) and I_n = diag(Λ_Q Λ_Q′ + Ψ_Q²). The observed variables were computed from Equation (16) by means of q_R common factor scores f_R, p unique factor scores e_R, q_Q common factor scores f_Q, and n unique factor scores e_Q, which were generated from normal distributions with μ = 0 and σ = 1 by the Mersenne Twister random number generator integrated in IBM SPSS, Version 26.0.
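A sketch of this data-generating step in numpy is given below; the helper simulate_X_RQ is a name introduced here for illustration. It follows Equation (16) under the assumptions just stated (equal salient loadings, standardized metric), but the exact form of the standardization by D (here, row variances of the centered Q-part) is an interpretation and should be treated as an assumption of the sketch rather than a reproduction of the original SPSS computations:

```python
import numpy as np

def simulate_X_RQ(Lambda_R, Lambda_Q, w_R2, rng):
    """Generate one p x n sample from the combined model of Equation (16) (sketch)."""
    p, q_R = Lambda_R.shape
    n, q_Q = Lambda_Q.shape
    w_R, w_Q = np.sqrt(w_R2), np.sqrt(1.0 - w_R2)

    Psi_R = np.diag(np.sqrt(1.0 - np.sum(Lambda_R**2, axis=1)))
    Psi_Q = np.diag(np.sqrt(1.0 - np.sum(Lambda_Q**2, axis=1)))

    f_R = rng.standard_normal((q_R, n))    # common R-factor scores
    e_R = rng.standard_normal((p, n))      # unique R-factor scores
    f_Q = rng.standard_normal((q_Q, p))    # common Q-factor scores
    e_Q = rng.standard_normal((n, p))      # unique Q-factor scores

    C_n = np.eye(n) - np.ones((n, n)) / n
    Q_part = (f_Q.T @ Lambda_Q.T + e_Q.T @ Psi_Q) @ C_n   # X_Q' C_n, a p x n matrix
    # Standardize the rows of the Q-part (division by n is assumed here so that the
    # Q-part is on the same variance metric as the unique R-factor scores e_R)
    row_var = np.diag(Q_part @ Q_part.T) / n
    Q_std = Q_part / np.sqrt(row_var)[:, None]

    return Lambda_R @ f_R + Psi_R @ (w_R * e_R + w_Q * Q_std)   # Equation (16)

rng = np.random.default_rng(3)
p, n, q_R, q_Q = 15, 300, 3, 3
Lambda_R = np.kron(np.eye(q_R), 0.50 * np.ones((p // q_R, 1)))   # salient R-loadings of 0.50
Lambda_Q = np.kron(np.eye(q_Q), 0.90 * np.ones((n // q_Q, 1)))   # salient Q-loadings of 0.90
X_RQ = simulate_X_RQ(Lambda_R, Lambda_Q, w_R2=0.25, rng=rng)
print(X_RQ.shape)   # (15, 300)
```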

For w_R² = 1.00 and w_Q² = 0.00, Equation (16) yields a conventional R-factor model. For w_R² = 0.50 and w_Q² = 0.50, half of the unique R-factor variance is replaced by common and unique Q-factor variance. Four levels of w_R² (1.00, 0.75, 0.50, and 0.25) with the corresponding w_Q² were combined with two levels of λ_R and three sample sizes n, which leads to 4 × 2 × 3 = 24 populations, each comprising 2000 samples.

Each set of p observed variables was submitted to R-factor analysis. The dependent variables of the simulation study were the mean and standard deviation of the loading estimates Λ̂_R resulting from principal-axis R-factor analysis of the sample data, with subsequent orthogonal target-rotation [25] of the estimated R-factor loadings Λ̂_R towards the R-factor loadings Λ_R of the population model based on R- and Q-factors. Therefore, differences between the means of Λ̂_R and Λ_R cannot be due to different rotations of the factors.
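The analysis step can be sketched as follows. The original computations were done in SPSS; the snippet below uses a simple iterated principal-axis routine (the helper principal_axis is introduced here for illustration) and scipy's orthogonal Procrustes solution for the target rotation [25], and is offered only as an approximation of that procedure. For brevity, pure R-factor data without a Q-part are generated:

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def principal_axis(R, n_factors, n_iter=50):
    """Iterated principal-axis factoring of a correlation matrix (simplified sketch)."""
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))        # initial communalities (SMC)
    for _ in range(n_iter):
        R_red = R.copy()
        np.fill_diagonal(R_red, h2)                   # reduced correlation matrix
        vals, vecs = np.linalg.eigh(R_red)
        idx = np.argsort(vals)[::-1][:n_factors]
        loadings = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))
        h2 = np.sum(loadings**2, axis=1)              # updated communalities
    return loadings

rng = np.random.default_rng(4)
p, n, q_R = 15, 300, 3
Lambda_R = np.kron(np.eye(q_R), 0.70 * np.ones((p // q_R, 1)))   # population loadings
Psi_R = np.diag(np.sqrt(1.0 - np.sum(Lambda_R**2, axis=1)))
X = Lambda_R @ rng.standard_normal((q_R, n)) + Psi_R @ rng.standard_normal((p, n))

L_hat = principal_axis(np.corrcoef(X), n_factors=q_R)

# Orthogonal target rotation of the estimates towards the population loadings
T, _ = orthogonal_procrustes(L_hat, Lambda_R)
L_rot = L_hat @ T
print(np.round(L_rot, 2))
```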

3.2. Results

The most important result of the simulation study is that the standard deviation of the salient loadings increases with decreasing w_R² (Table 2). The results for w_R² = 1.00 show the standard deviations of the loadings that are only due to sampling error, as rotational variation of the loadings was excluded by means of orthogonal target-rotation towards the population loadings. In particular, the results for w_R² = 0.25 show that the standard deviation of the loading estimates was about twice as large as the variation due to sampling error when there was a substantial amount of Q-factor variance. This additional loading variation is a bias of the loading estimates, as there was no salient loading variation in the population.

In order to show the possible effect of the loading variation (comprising salient and non-salient loadings) on factor identification, a scatterplot of the target-rotated loadings of factors 1 and 2 is presented for λ_R = 0.50 in Figure 1 and for λ_R = 0.70 in Figure 2. Obviously, the overlap of salient and non-salient loadings for samples of n = 300 cases is substantial for λ_R = 0.50 and w_R² = 0.25 and might be an obstacle for factor identification (Figure 1). In contrast, salient and

Table 2. Mean and standard deviation of target-rotated salient loading estimates of R-factor analysis for λ_R = 0.50, 0.70 and w_R² = 0.25, 0.50, 0.75, and 1.00 (n = 300, 600, 900).

Note. Standard deviations are given behind the slash.

Figure 1. Scatterplot of R-factor loading estimates λ̂_R of factors 1 and 2 based on 2000 samples (n = 300, 600, 900) drawn from populations based on λ_R = 0.50 and q_R = 3 R-factors (w_R² = 1.00) and from populations comprising q_R = 3 R- and q_Q = 3 Q-factors (w_R² = 0.25).

non-salient loadings can clearly be separated for n = 300 cases, λ_R = 0.50 and w_R² = 1.00, or for sample sizes of n = 600 and n = 900. For all conditions based on λ_R = 0.70, the overlap of salient and non-salient loadings was small, indicating that factor identification would be possible (Figure 2). To sum up, when a substantial amount of Q-factor variance is expected, large sample sizes should be analyzed or very large R-factor loadings should be expected as a basis for successful factor identification.

Figure 2. Scatterplot of R-factor loading estimates λ̂_R of factors 1 and 2 based on 2000 samples (n = 300, 600, 900) drawn from populations based on λ_R = 0.70 and q_R = 3 R-factors (w_R² = 1.00) and from populations comprising q_R = 3 R- and q_Q = 3 Q-factors (w_R² = 0.25).

4. An Indicator of Q-Factor Variance

As R-factor analysis of data from a population based on a relevant amount of Q-factor variance may result in biased R-factor loadings, it is of interest to know whether a data set contains a relevant amount of Q-factor variance. Note that a population model based on an additive combination of R- and Q-factors implies that a row-centered matrix of individual R-factor scores is combined with a row-and-column-centered matrix of individual Q-factor scores (Equations (15) and (16)). [26] demonstrated that the eigenvalues of R- and Q-factor analysis of a row-and-column-centered matrix are identical, so that a high similarity of eigenvalues should be expected for combined R- and Q-factor models, even when the resulting matrix is not perfectly column-centered. Therefore, Q-factor analysis will yield a number of substantial eigenvalues even when the data can perfectly be described by R-factor analysis. Thus, the eigenvalues of Q-factor analysis do not inform unambiguously about the amount of Q-factor variance.
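The algebraic core of this ambiguity is that the non-zero eigenvalues of X′X and XX′ always coincide; the following minimal check (a sketch with an arbitrary row-and-column-centered matrix, not a reproduction of the correlation-based analysis in [26]) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 30, 10

# Row-and-column-center an arbitrary data matrix
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0, keepdims=True)
X = X - X.mean(axis=1, keepdims=True)

# The non-zero eigenvalues of the variable-by-variable cross-product ("R" side) and
# of the individual-by-individual cross-product ("Q" side) are identical.
ev_R = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
ev_Q = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]
rank = np.linalg.matrix_rank(X)
print(np.allclose(ev_R[:rank], ev_Q[:rank]))   # True
```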

It is therefore proposed to consider the bivariate scatterplots of the observed variables in order to ascertain whether between-subject variance that could be due to R-factors is combined with a substantial amount of within-subject variance that could be due to Q-factors. Different within-subject profiles that might be caused by q_Q > 1 Q-factors imply that not all differences between two observed z-standardized variables z1 and z2 are equal. For q_Q = 2, for example, there could be one group of participants with z1 − z2 > 0 and a second group with z1 − z2 < 0. It follows that the variance of the z-score differences d, σ_d², is greater than zero for q_Q ≥ 2. According to [27] (p. 64), the correlation can be written as

ρ_z1,z2 = 1 − σ_d²/2. (17)
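Equation (17) holds exactly when the correlation and the variance of the differences are computed with a consistent standardization; a brief numeric check with arbitrary simulated data is given below:

```python
import numpy as np

rng = np.random.default_rng(6)
x1 = rng.standard_normal(500)
x2 = 0.4 * x1 + rng.standard_normal(500)

# z-standardize both variables (sample sd with ddof=1)
z1 = (x1 - x1.mean()) / x1.std(ddof=1)
z2 = (x2 - x2.mean()) / x2.std(ddof=1)

d = z1 - z2
r_direct = np.corrcoef(z1, z2)[0, 1]
r_from_d = 1.0 - d.var(ddof=1) / 2.0       # Equation (17)
print(np.isclose(r_direct, r_from_d))      # True
```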

As q_Q ≥ 2 implies σ_d > 0, it follows from Equation (17) that ρ_z1,z2 < 1. An example for n = 145 cases and q_Q = 3 is given for r_z1,z2 = 0.40 in Figure 3 (dots). The concentration of points on three lines is extreme for q_Q = 3, so that the bivariate distribution is quite different from the bivariate distribution for the same correlation and q_Q = 0 (Figure 3, crosses). For q_Q = 0 there is a bivariate normal distribution, which is clearly not the case for q_Q = 3. As the distributions in Figure 3 are not skewed, only tests of the multivariate kurtosis were performed, using the macro provided by [28] at α = 0.05. Srivastava's [29] test of multivariate kurtosis (β2,p = 2.26, N(β2,p) = −2.59, p < 0.01), Small's [30] test of multivariate kurtosis (Q2 = 298.95, df = 2, p < 0.01), and Mardia's [31] test all indicate a significant departure from multivariate normal kurtosis (β2,p = 6.36, N(β2,p) = −2.47, p < 0.05).

The example shows that a bivariate distribution clearly based on q_Q = 3 may result in a platykurtic departure from the kurtosis of the bivariate normal distribution. Even though other reasons for platykurtic multivariate distributions are possible, tests of the multivariate kurtosis may thus also indicate that q_Q > 1. Visual inspection of scatterplots may be performed when significant departures from the multivariate normal distribution occur, because a pattern with separable clouds of points provides further evidence for the presence of Q-factors.
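The tests above were run with the macro provided by [28]. As a rough sketch of one of them, the snippet below computes Mardia's multivariate kurtosis statistic with its usual normal approximation; the helper mardia_kurtosis is introduced here for illustration and is not claimed to reproduce the macro's implementation exactly:

```python
import numpy as np
from scipy.stats import norm

def mardia_kurtosis(X):
    """Mardia's multivariate kurtosis b_{2,p} with its normal approximation (sketch).

    X is an n x p data matrix with individuals in rows."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S_inv = np.linalg.inv(Xc.T @ Xc / n)             # inverse of the ML covariance matrix
    d2 = np.einsum('ij,jk,ik->i', Xc, S_inv, Xc)     # squared Mahalanobis distances
    b2p = np.mean(d2**2)
    z = (b2p - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    return b2p, z, 2.0 * norm.sf(abs(z))             # statistic, z-value, two-sided p-value

# Multivariate normal data should be flagged only at about the alpha-level
rng = np.random.default_rng(7)
print(mardia_kurtosis(rng.standard_normal((300, 15))))
```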

In order to investigate the usefulness of tests of the kurtosis of the multivariate normal distribution as indicators for q_Q > 1, tests were performed for q_R = q_Q = 3 and p = 15. The tests were based on 2000 samples with n = 300, 600, and 900, with λ_Q = 0.90, λ_R = 0.50 and 0.70, w_R² = 0.10, 0.25, 0.50, 1.00, and α-levels of 0.05, 0.10, and 0.20. As the test is employed in order to evaluate conditions for R-factor analysis, an alpha-level beyond the conventional 0.05-level might be justified. Note that the w_R² = 1.00 condition is a condition without any effect of Q-factors, so that no detection rate beyond chance level should be expected for this condition. Overall, the highest detection rates for data with substantial Q-factor variance were found for Mardia's coefficient (see Table 3). However,

Figure 3. Scatterplot of z1 and z2, n = 145, q_Q = 3, and r_z1,z2 = 0.40.

Table 3. Percentage of p-values of tests of kurtosis indicating significant departures from multivariate normality at α = 0.05, 0.10, and 0.20, for n = 300, 600, and 900, p = 15, λ_R = 0.50 and 0.70, and λ_Q = 0.90.

for n = 300 and w_R² = 1.00, the rate of false positives is slightly above chance for Mardia's coefficient. As the power for the identification of substantial Q-factor variance was sufficiently high for Srivastava's and Small's tests without substantial false positives for w_R² = 1.00, these tests might be recommended.

5. Discussion

As R-factor analysis of variables observed for a large number of individuals is the dominant form of factor analysis in several areas of social sciences, it might happen that R-factor analysis is routinely performed even when the population model comprises R- and Q-factors. For example, in the domain of personality research, it has been assumed that Q-factors or type-factors may be relevant in addition to the well-known R-factors (e.g., [17] [32] [33] ). This leads to the question of whether performing an R-factor analysis of data from a population model comprising R- and Q-factors may result in biased loading estimates. R-factor analysis of data from population models comprising R- and Q-factors was therefore investigated.

It was shown that R-factor analysis of data based on a population model comprising R- and Q-factors leads to biased R-factor loading estimates. For such data R-factor analysis introduces variability into the loading estimates. Thus, when the observed variables have equal R-factor loadings in a population model comprising R- and Q-factors, the loading estimates resulting from R-factor analysis of the observed variables will have variability beyond chance level. This bias of R-factor loading estimates and the variation of R-factor loading estimates beyond chance level were also shown in a simulation study. It was illustrated in the simulation study that the additional loading variability may hamper factor identification. These results show that effects of model error beyond the effect of minor factors [7] may be of relevance for factor analysis. The variability of R-factor loadings beyond chance level caused by Q-factors implies that significance testing of R-factor loadings cannot protect completely against erroneous conclusions when the data are drawn from populations comprising R- and Q-factors. Although the terminology of the present study was based on the distinction of variables and individuals, which is important in the social sciences, the present results are of relevance whenever the common variance of scores is combined with the common variance of transposed scores in a two-way array of scores.

From an applied perspective, the present results imply that the reproducibility of R-factor loadings may not only be hampered by sampling error, insufficient reliability of variables and an insufficient number of variables per factor, but also by the presence of Q-factors. The reproducibility crisis [34] resulted in a stronger focus on statistical power, stronger research designs, preregistration, and replication studies. The present study shows that different forms of model error and, more specifically, Q-factors may also be considered as a reason for insufficient reproducibility of research results when results are based on R-factor analysis.

As the use of R-factor analysis for data drawn from a population based on R- and Q-factors may result in biased R-factor loading estimates, it is of interest to detect Q-factor variance in observed variables as a prerequisite of R-factor analysis. As the eigenvalues of correlation matrices may be ambiguous in this respect and because Q-factor variance leads to platykurtic multivariate distributions of observed scores, it was proposed to use tests of multivariate kurtosis as indicators of Q-factor variance. In a simulation study, Mardia's test of multivariate kurtosis was more sensitive for the detection of relevant Q-factor variance than Srivastava's and Small's tests. However, a slight tendency towards false positive results was also found for Mardia's test, so that Srivastava's and Small's tests might also be recommended. As there are different possible reasons for departures of the kurtosis from that of the multivariate normal distribution, an inspection of scatterplots is recommended when a test of the multivariate kurtosis of the data is significant. The inspection of scatterplots may be combined with pairwise tests of the bivariate kurtosis in order to eliminate observed variables with substantial Q-factor variance from R-factor analysis. It may also be considered to normalize the data (e.g., [35] ) before tests of multivariate kurtosis are applied, because normalization may reduce departures from normality that are due to outliers and other reasons, whereas the Q-factor related scatterplot structure of parallel clouds of points is unlikely to be affected by normalization. The possibility of improving the specificity of tests of multivariate kurtosis for the identification of Q-factor patterns by means of normalization may be investigated in future simulation studies.
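A rank-based normalization of this kind can be sketched as follows; the helper rank_normalize and the particular "Blom-type" normal-scores formula below are assumptions chosen for illustration and are not claimed to be the exact transformation of [35]:

```python
import numpy as np
from scipy.stats import norm, rankdata

def rank_normalize(x):
    """Rank-based normalization of one variable (Blom-type normal scores, sketch)."""
    r = rankdata(x)                          # ranks 1..n, ties receive average ranks
    n = len(x)
    return norm.ppf((r - 0.375) / (n + 0.25))

rng = np.random.default_rng(8)
x = rng.exponential(size=200)                # skewed data with some outliers
z = rank_normalize(x)
print(round(float(z.mean()), 3), round(float(z.std(ddof=1)), 3))   # approximately 0 and 1
```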

To sum up, the present paper is a caveat that R-factor analysis of data from population models comprising R- and Q-factors will result in biased R-factor loading estimates. The bias is due to the fact that the model of R-factor analysis does not correspond exactly to the population model comprising R- and Q-factors. Tests of the multivariate kurtosis might be used for the detection of Q-factor variance as a prerequisite for R-factor analysis. Further research should compare the effect of model error due to Q-factor variance on the results of R-factor analysis with the effect of model error based on minor factors as discussed by [7] . Another avenue for future research would be the investigation of the combined effect of R- and Q-factors in the context of parallel factor analysis (PARAFAC, [36] ), where several two-way arrays of data are analyzed. It might be interesting to enter a two-way array for R-factor analysis as well as its transpose into PARAFAC in order to investigate whether a simultaneous estimation of R- and Q-factors allows for a reduction of the bias of factor loadings. The combined effect of both types of model error, model error based on minor factors and model error based on Q-factor variance, might be investigated in future research as it may occur in real data.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Mulaik, S.A. (2009) Foundations of Factor Analysis. 2nd Edition, Chapman & Hall, Boca Raton.
https://doi.org/10.1201/b15851
[2] Harman, H.H. (1976) Modern Factor Analysis. 3rd Edition, The University of Chicago Press, Chicago.
[3] Flora, D.B. and Curran, P.J. (2004) An Empirical Evaluation of Alternative Methods of Estimation for Confirmatory Factor Analysis with Ordinal Data. Psychological Methods, 9, 466-491.
https://doi.org/10.1037/1082-989X.9.4.466
[4] Muthén, B. and Asparouhov, T. (2012) Bayesian Structural Equation Modeling: A More Flexible Representation of Substantive Theory. Psychological Methods, 17, 313-335.
https://doi.org/10.1037/a0026802
[5] Wirth, R.J. and Edwards, M.C. (2007) Item Factor Analysis: Current Approaches and Future Directions. Psychological Methods, 12, 58-79.
https://doi.org/10.1037/1082-989X.12.1.58
[6] MacCallum, R.C. and Tucker, L.R. (1991) Representing Sources of Error in the Common Factor Model: Implications for Theory and Practice. Psychological Bulletin, 109, 502-511.
https://doi.org/10.1037/0033-2909.109.3.502
[7] MacCallum, R.C. (2003) 2001 Presidential Address: Working with Imperfect Models. Multivariate Behavioral Research, 38, 113-139.
https://doi.org/10.1207/S15327906MBR3801_5
[8] de Winter, J.C.F., Dodou, D. and Wieringa, P.A. (2009) Exploratory Factor Analysis with Small Sample Sizes. Multivariate Behavioral Research, 44, 147-181.
https://doi.org/10.1080/00273170902794206
[9] Ramlo, S. and Newman, I. (2010) Classifying Individuals Using Q Methodology and Q Factor Analysis: Applications of Two Mixed Methodologies for Program Evaluation. Journal of Research in Education, 21, 20-31.
[10] Broverman, D.M. (1961) Effects of Score Transformations on Q and R Factor Analysis Techniques. Psychological Review, 68, 68-80.
https://doi.org/10.1037/h0044580
[11] Akhtar-Danesh, N. (2016) An Overview of the Statistical Techniques in Q Methodology: Is There a Better Way of Doing Q Analysis? Operant Subjectivity: The International Journal of Q Methodology, 38, 29-36.
https://doi.org/10.22488/okstate.17.100553
[12] Ramlo, S. (2016) Centroid and Theoretical Rotation: Justification for Their Use in Q Methodology Research. Mid-Western Educational Researcher, 28, 73-92.
[13] Cadman, T., Belsky, J. and Pasco Fearon, R.M. (2018) The Brief Attachment Scale (BAS-16): A Short Measure of Infant Attachment. Child Care Health Development, 244, 766-775.
https://doi.org/10.1111/cch.12599
[14] Burt, C. and Stephenson, W. (1939) Alternative Views on Correlations between Persons. Psychometrika, 4, 269-281.
https://doi.org/10.1007/BF02287939
[15] Cattell, R.B. (1952) The Three Basic Factor Analytic Research Designs: Their Intercorrelations and Derivatives. Psychological Bulletin, 49, 499-520.
https://doi.org/10.1037/h0054245
[16] Box, G.E.P. (1979) Some Problems of Statistics and Everyday Life. Journal of the American Statistical Association, 74, 1-4.
https://doi.org/10.1080/01621459.1979.10481600
[17] Gerlach, M., Farb, B., Revelle, W. and Amaral, L.A.N. (2018) A Robust Data-Driven Approach Identifies Four Personality Types across Four Large Data Sets. Nature Human Behaviour, 2, 735-742.
https://doi.org/10.1038/s41562-018-0419-z
[18] Freudenstein, J.P., Strauch, C., Mussel, P. and Ziegler, M. (2019) Four Personality Types May Be Neither Robust Nor Exhaustive. Nature Human Behaviour, 3, 1045-1046.
https://doi.org/10.1038/s41562-019-0721-4
[19] Asendorpf, J.B. (2003) Head-to-Head Comparison of the Predictive Validity of Personality Types and Dimensions. European Journal of Personality, 17, 327-346.
https://doi.org/10.1002/per.492
[20] Lazarsfeld, P.F. and Henry, N.W. (1968) Latent Structure Analysis. Houghton Mifflin, Boston.
[21] Lubke, G.H. and Muthén, B.O. (2005) Investigating Population Heterogeneity with Factor Mixture Models. Psychological Methods, 10, 21-39.
https://doi.org/10.1037/1082-989X.10.1.21
[22] Aja-Fernández, S. and Vegas-Sánchez-Ferrero, G. (2016) Statistical Analysis of Noise in MRI. Springer International, Basingstoke.
https://doi.org/10.1007/978-3-319-39934-8
[23] Weisstein, E.W. (2021) Normal Product Distribution. MathWorld—A Wolfram Web Resource.
https://mathworld.wolfram.com/NormalProductDistribution.html
[24] Magnus, J.R. and Neudecker, H. (2007) Matrix Differential Calculus with Applications in Statistics and Econometrics. 3rd Edition, John Wiley & Sons, New York.
[25] Schönemann, P.H. (1966) A Generalized Solution of the Orthogonal Procrustes Problem. Psychometrika, 31, 1-10.
https://doi.org/10.1007/BF02289451
[26] Burt, C. (1937) Correlations between Persons. British Journal of Psychology, 28, 59-96.
https://doi.org/10.1111/j.2044-8295.1937.tb00862.x
[27] Rodgers, J.L. and Nicewander, W.A. (1988) Thirteen Ways to Look at the Correlation Coefficient. The American Statistician, 42, 59-66.
https://doi.org/10.1080/00031305.1988.10475524
[28] DeCarlo, L.T. (1997) On the Meaning and Use of Kurtosis. Psychological Methods, 2, 292-307.
https://doi.org/10.1037/1082-989X.2.3.292
[29] Srivastava, M.S. (1984) A Measure of Skewness and Kurtosis and a Graphical Method for Assessing Multivariate Normality. Statistics and Probability Letters, 2, 263-267.
https://doi.org/10.1016/0167-7152(84)90062-2
[30] Small, N.J.H. (1980) Marginal Skewness and Kurtosis in Testing Multivariate Normality. Applied Statistics, 29, 85-87.
https://doi.org/10.2307/2346414
[31] Mardia, K.V. (1970) Measures of Multivariate Skewness and Kurtosis with Applications. Biometrika, 57, 519-530.
https://doi.org/10.1093/biomet/57.3.519
[32] Gilbert, K., Whalen, D.J., Jackson, J.J., Tillman, R., Barch, D.M. and Luby, J.L. (2021) Thin Slice Derived Personality Types Predict Longitudinal Symptom Trajectories. Personality Disorders: Theory, Research, and Treatment, 12, 275-285.
https://doi.org/10.1037/per0000455
[33] Ramos, R.I.A., Mata, R.R.M. and Nacar, R.C. (2021) Mediating Effect of Ethical Climate on the Relationship of Personality Types and Employees Mindfulness. Linguistics and Culture Review, 5, 1480-1494.
https://doi.org/10.21744/lingcure.v5nS1.1722
[34] Open Science Collaboration (2015) Estimating the Reproducibility of Psychological Science. Science, 349, aac4716.
https://doi.org/10.1126/science.aac4716
[35] Blom, G. (1954) Transformations of the Binomial, Negative Binomial, Poisson and χ2 Distributions. Biometrika, 43, 235.
https://doi.org/10.1093/biomet/43.1-2.235
[36] Harshman, R.A. and Lundy, M.E. (1994) Parafac: Parallel Factor Analysis. Computational Statistics and Data Analysis, 18, 39-72.
https://doi.org/10.1016/0167-9473(94)90132-5

Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.