Alabama Parenting Questionnaire—Short Form (APQ-9): Evidencing Construct Validity with Factor Analysis, CFA MTMM and Measurement Invariance in a Greek Sample

This study examined the factor structure, measurement invariance, reliability, and validity of the Greek version of APQ-9 in a sample of 621 parents of children aged 7 to 13 years. The factor structure was examined first with EFA in a 30% subsample and then with CFA in the remaining 70%. Power analysis indicated adequate CFA sample power at 80% probability of rejecting a false null hypothesis. The original structure of APQ-9 was verified. Full measurement invariance across child gender was also examined up to the strict level. Convergent and discriminant validity of APQ-9 parenting practices were evaluated within the CFA MTMM framework with a model of three traits and three methods, and examined further with correlation analysis. A consistent pattern of correlations emerged across five parenting measures with 13 dimensions of parenting. APQ-9 also showed adequate internal consistency and factor-based reliability and validity (α, ω, and AVE).


Introduction
Social and developmental psychology postulates a relationship between both the quality and consistency of parenting practices and psychological adjustment of offspring (Baumrind, 1967; Dadds, Maujean, & Fraser, 2003; Pickering & Sand-(explaining 26.31% of the total variance) were highly correlated with their corresponding APQ-42 scale, r = 0.89 (Positive Parenting), r = 0.90 (Inconsistent Discipline) and r = 0.76 (Poor Supervision), ps < 0.01. The item reduction from 42 to 9 was 78.57% (Elgar et al., 2007). The test developers estimated that APQ-9 could be completed in one-fifth of the time needed for APQ-42 (<1 minute).
Subsequently, the criterion validity and psychometric properties of this shortened version were examined in an independent sample of parents from Canada (1296 mothers and 745 fathers). In this study, the developers of APQ-9 evaluated its validity in differentiating parents of children with behavior disorders from parents of children without behavior disorders. The Conners Parent Rating Scale-Revised (CPRS-R; Conners, Sitarenios, Parker, & Epstein, 1998), an 80-item measure of behavioral problems in children aged 3 to 17 years, was used to evaluate criterion validity. The 3-factor structure emerging in the first study was confirmed with Confirmatory Factor Analysis separately for mothers and fathers, with good model fit for mothers, CFI = 0.99, NFI = 0.98, and for fathers, CFI = 0.99, NFI = 0.98. Factor loadings ranged from 0.52 to 0.82 for mothers and from 0.46 to 0.90 for fathers. Factor intercorrelations ranged from −0.24 to 0.30 for mothers and from −0.21 to 0.29 for fathers (Elgar et al., 2007). In a later study, the validity of the short scale was further supported by correlations between parenting practices and child symptoms in a sample of 133 parents (90.98% mothers) of 5- to 18-year-old children (Elgar et al., 2007).
Internal consistency reliability of the APQ-9 factors ranged from 0.59 to 0.79 for mothers and from 0.63 to 0.84 for fathers. The internal consistency of the APQ in the third sample was moderate, ranging from α = 0.57 (Positive Parenting) to α = 0.62 (Inconsistent Discipline). Reliability also varied with child age: mean α = 0.44 for children aged 4 to 9 years, α = 0.59 to 0.84 for children aged 5 to 12 years, and α = 0.57 to 0.61 for children aged 5 to 18 years (Elgar et al., 2007, as summarized by Gross et al., 2015). Later, Gross et al. (2015) examined the longitudinal invariance of the APQ-9 for parents and youngsters, and the multigroup invariance between parents and adolescents during their transition from middle school to high school.

The Present Study
The purpose of this study is to examine the factor structure of APQ-9 using EFA and CFA in a Greek general-population sample of parents of children aged 7 to 13 years. The study also had the following goals: 1) to evaluate measurement invariance across child gender; 2) to build evidence of convergent and discriminant validity of APQ-9 based on the CFA Multitrait-Multimethod method (CFA MTMM); 3) to reinforce convergent and discriminant validity with correlation analysis; 4) to evaluate internal consistency reliability (with α), model-based reliability (with ω), and model-based convergent validity (with AVE); and finally, 5) to calculate normative data for the mean factor scores.

Measures
Alabama Parenting Questionnaire-Short Form (APQ-9; Elgar et al., 2007). This nine-item short form of the original APQ-42 (Frick, 1991; Shelton et al., 1996; Frick et al., 1999) is designed to assess parenting practices related to disruptive behaviors (Shelton et al., 1996). It was shortened for faster assessment (Gross et al., 2015). APQ-9 items (e.g., "You threaten to punish your child and then do not actually punish him/her") are rated on a 5-point Likert scale (1 = never; 2 = almost never; 3 = sometimes; 4 = often; 5 = always). Higher scores indicate higher ratings of the measured parenting practice (i.e., Positive Parenting, Inconsistent Discipline, Poor Supervision).
APQ-9 Translation procedure. APQ-9 was translated into Greek using the translation/back-translation method (Brislin, 1970). First, it was translated into Greek by the first author. It was then back-translated into English by a bilingual psychologist who was not familiar with the English version. All items of the original English and the back-translated version went through an iterative process of translation/back-translation (3 times) to eliminate differences or ambiguities before the final version.

Procedure
Data were collected with the assistance of psychology students. Specifically, about 100 students forwarded a link to the study to at least 5 parents in their social environment (M = 6.21), inviting them to participate. All parents recruited by the students first read a digital description of the study and gave informed consent. They then specified a personal code to ensure anonymity. Students received extra credit for carrying out the recruitment process.
The EFA subsample comprised 30% of the sample and the CFA subsample the remaining 70%. A CFA followed the EFA. After the CFA, additional analyses were performed on the optimal CFA model: 1) full measurement invariance to the strict level (the highest possible; Wang & Wang, 2012); 2) internal consistency reliability using Cronbach's alpha coefficient (Cronbach, 1951) and model-based reliability (Mair, 2018; Sha & Ackerman, 2018) using Bollen's Omega (Bollen, 1980; see also Raykov, 2001), Bentler's Omega (Bentler, 1972), and McDonald's Omega (ω_t; McDonald, 1999); and 3) model-based convergent validity with Average Variance Extracted (AVE; Fornell & Larcker, 1981). To test the convergent and discriminant validity of the APQ facets of perceived parenting practices, nested CFA models were compared within the CFA Multitrait-Multimethod framework (CFA MTMM; Widaman, 1985; originally a non-CFA method by Campbell & Fiske, 1959). Convergent and discriminant validity were examined further by correlation analysis using five parenting measures with 13 different scales. Finally, descriptive statistics and normative data were calculated based on factor means to allow easier comparison with APQ scales of different length.
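The reliability and convergent validity coefficients named in this plan can be illustrated with a minimal sketch. It assumes raw item scores for alpha and standardized loadings from a single-factor (congeneric) model with uncorrelated residuals for omega and AVE. The omega shown is a common simplified composite-reliability form and is not necessarily identical to any of the three omega variants used in the study; the function names and the demo values are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha from raw scores (one row per respondent, one column per item)."""
    k = len(rows[0])
    item_variances = [variance(col) for col in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def composite_omega(loadings):
    """Simplified omega for one factor from standardized loadings.

    Assumes uncorrelated residuals, so each residual variance is 1 - loading**2.
    """
    true_variance = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return true_variance / (true_variance + error_variance)


def average_variance_extracted(loadings):
    """AVE for one factor: the mean squared standardized loading."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Hypothetical three-item scale (not data from the study).
demo_scores = [[4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2], [4, 4, 5], [3, 2, 3]]
demo_loadings = [0.70, 0.80, 0.60]

print("alpha:", round(cronbach_alpha(demo_scores), 3))
print("omega:", round(composite_omega(demo_loadings), 3))
print("AVE:  ", round(average_variance_extracted(demo_loadings), 3))
```

With only three items per factor, as in APQ-9, alpha tends to be lower than for longer scales, which is one reason model-based coefficients are reported alongside it.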

Results
The data contained no missing values because all fields of the digital test battery were set as "required" to eliminate non-response. Twenty-six of the 621 cases were identified as multivariate outliers, with Mahalanobis distances exceeding the critical value χ²(9) = 27.88, p < 0.001 (Mahalanobis, 1936; Tabachnick & Fidell, 2013). However, the outliers did not alter the results, so they were retained in the dataset. The final sample was N = 621 cases. The sample was randomly split into two subsamples (n_EFA = 187 and n_CFA = 434). The cases-to-measured-variables ratios for n_EFA and n_CFA (Costello & Osborne, 2005; Ullman, 2013) were 20.78 and 48.22, respectively. The cases-to-estimated-parameters ratio (see Schumacker & Lomax, 2016) for the hypothesized CFA model (Elgar et al., 2007) was 9.64.
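The multivariate outlier screening described above can be sketched as follows. This is an illustration only: the data are simulated, the helper names are hypothetical, and the critical value is simply the χ²(9) threshold reported in the text rather than one computed here. The software actually used for the study's screening is not stated in this section.

```python
import random

CRITICAL_VALUE = 27.88  # chi-square critical value for df = 9 at p < 0.001, as reported in the text


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (denominator n - 1)."""
    n, mu = len(rows), mean_vector(rows)
    p = len(mu)
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mu)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j]
    return [[v / (n - 1) for v in r] for r in cov]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def squared_mahalanobis(rows):
    """Squared Mahalanobis distance of every case from the sample centroid."""
    mu, cov = mean_vector(rows), covariance_matrix(rows)
    distances = []
    for row in rows:
        d = [x - m for x, m in zip(row, mu)]
        z = solve(cov, d)  # z = cov^-1 d without forming the inverse
        distances.append(sum(di * zi for di, zi in zip(d, z)))
    return distances


# Simulated 5-point responses to nine items (not the study's data).
rng = random.Random(0)
demo_rows = [[rng.randint(1, 5) for _ in range(9)] for _ in range(200)]
d2 = squared_mahalanobis(demo_rows)
outliers = [i for i, value in enumerate(d2) if value > CRITICAL_VALUE]
print(len(outliers), "cases exceed the critical value")
```

Solving the linear system for each case avoids an explicit matrix inverse, which keeps the sketch short and numerically stable for a 9 × 9 covariance matrix.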
The multivariate normality tests were significant, p < 0.001 for all samples (Total, EFA and CFA) as presented in Table 1.
Factors were extracted with Principal Axis Factoring and oblique rotation (Oblimin). The number of factors to retain was determined with the following methods: the scree plot (Cattell, 1966), Parallel Analysis (PA; Horn, 1965), Very Simple Structure (VSS; Revelle & Rocklin, 1979), Minimum Average Partial Correlations (MAP; Velicer, 1976), and the goodness of model fit. Model fit was evaluated with the Root Mean Square Error of Approximation (RMSEA; Steiger & Lind, 1980), Root Mean Square of Residuals (RMSR), Comparative Fit Index (CFI; Bentler, 1990), Tucker-Lewis Index (TLI; Tucker & Lewis, 1973) and Bayesian Information Criterion (BIC; Schwarz, 1978). Fit criteria (Hu & Bentler, 1999; Browne & Cudeck, 1993) were RMSEA ≤ 0.06 [90% Confidence Intervals ≤ 0.06], RMSR ≤ 0.0448 (Kelley's criterion; Kelley, 1935; Harman, 1962; Lorenzo-Seva & Ferrando, 2013), CFI and TLI ≥ 0.95, and the lowest possible BIC.
PA (see Figure 1) suggested three factors. VSS complexity 1 achieved a maximum of 0.72 with 2 factors and complexity 2 achieved a maximum of 0.81 with 4 factors. MAP achieved a minimum of 0.05 with 1 factor. BIC reached a minimum with 3 factors and sample-size-adjusted BIC reached a minimum with 4 factors. Taking into account the combined findings of the above methods, 3 factors were extracted (total explained variance of 65.11%). The Extraction Sums of Squared Loadings suggested that the first factor explained 35.44% of the variance, the second 19.11%, and the third 10.56%, with communalities > 0.30. The fit of this model was adequate, RMSR = 0.03, TLI = 0.923, RMSEA = 0.072 [90% CI 0.021, 0.112] and BIC = −40.09.
Regarding item allocation to the extracted factors, items 1, 6 and 7 loaded on the first factor (Positive Parenting) with loadings ranging from 0.513 to 0.862; items 2, 4, and 9 loaded on the second factor (Inconsistent Discipline) with loadings from 0.465 to 0.767; and items 3, 5, and 8 loaded on the third factor (Poor Supervision) with loadings ranging from 0.640 to 0.777. Table 2 contains the APQ-9 factor loadings above 0.30 and the factor inter-correlations (also presented in Figure 2).
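Of the retention methods listed above, Parallel Analysis is the easiest to illustrate in a few lines. The sketch below is a simplified version of Horn's procedure based on eigenvalues of the full correlation matrix compared against the 95th percentile of eigenvalues from random normal data. It is an assumption that this variant resembles the one used in the study, and the data are simulated with a known three-factor structure purely for demonstration; all function names are hypothetical.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows` (one row per case)."""
    n = len(rows)
    z_cols = []
    for col in zip(*rows):
        m = sum(col) / n
        sd = math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
        z_cols.append([(x - m) / sd for x in col])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z_cols]
            for ci in z_cols]


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, descending."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def parallel_analysis(rows, reps=100, percentile=0.95, seed=0):
    """Count leading eigenvalues that exceed the random-data percentile."""
    n, p = len(rows), len(rows[0])
    observed = jacobi_eigenvalues(correlation_matrix(rows))
    rng = random.Random(seed)
    simulated = [[] for _ in range(p)]
    for _ in range(reps):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        for k, ev in enumerate(jacobi_eigenvalues(correlation_matrix(noise))):
            simulated[k].append(ev)
    thresholds = [sorted(v)[min(reps - 1, int(percentile * reps))] for v in simulated]
    retained = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:
            break
        retained += 1
    return retained, observed, thresholds


def simulate_three_factor_data(n=187, loading=0.8, seed=1):
    """Nine simulated items, three per uncorrelated factor (illustrative only)."""
    rng = random.Random(seed)
    unique = math.sqrt(1 - loading ** 2)
    rows = []
    for _ in range(n):
        f = [rng.gauss(0, 1) for _ in range(3)]
        rows.append([loading * f[i // 3] + unique * rng.gauss(0, 1) for i in range(9)])
    return rows


retained, observed, thresholds = parallel_analysis(simulate_three_factor_data())
print("factors suggested by parallel analysis:", retained)
```

Because the simulated factors are uncorrelated and strongly loaded, the example is a much cleaner case than real questionnaire data, where the methods can disagree, as they do in the results reported here.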
Three models were tested: (A) a single-factor model with all nine items loading on a single factor to test the maximum parsimony hypothesis (Brown, 2015); (B) a first-order Independent Cluster Model (ICM-CFA; Marsh et al., 2014; Howard et al., 2016) with two correlated factors, examined (but not proposed) by Elgar et al. (2007). This model had the original PP factor and a second factor with all the non-positive-parenting items (2, 4, 9, 3, 5, 8) Table 3 and the path of this optimal model in Figure 3.
A 3-factor Bifactor model (Harman, 1976; Holzinger & Swineford, 1937) was also tested but it failed to converge. In this model, the PP, ID and PS items loaded on three specific factors and simultaneously on a general factor.

Measurement Invariance
Configural, weak, strong and strict full measurement invariance were evaluated across the gender of the child for whom the 621 parents had completed the APQ-9. The nested models were compared using the cutoffs of ΔCFI ≤ 0.01 (Cheung & Rensvold, 2002; Chen, 2007) and ΔRMSEA ≤ 0.015 (Chen, 2007). The 3-factor optimal solution was first tested separately for each child gender (Table 4). These models showed an adequate fit both for girls (n = 337) and for boys (n = 284).
The nested invariance models (1 to 4) also fit the data well (Table 5). The weak-to-configural and the strong-to-weak model comparisons yielded ΔCFIs and ΔRMSEAs within the cutoffs, supporting invariance. However, in the strict-to-strong model comparison, only the ΔRMSEA cutoff supported invariance.
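The decision rule applied to each pair of nested models can be written out as a short sketch. The cutoffs are those cited in this section (ΔCFI ≤ 0.01, ΔRMSEA ≤ 0.015); the fit values below are hypothetical placeholders chosen only to mimic the reported pattern, not the values in Table 5, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitIndices:
    label: str
    cfi: float
    rmsea: float


def compare_nested(less_constrained, more_constrained,
                   cfi_cutoff=0.01, rmsea_cutoff=0.015):
    """Compare a more constrained invariance model with its less constrained parent."""
    # Deltas are rounded to reporting precision to avoid floating-point edge cases.
    delta_cfi = round(less_constrained.cfi - more_constrained.cfi, 3)        # drop in CFI
    delta_rmsea = round(more_constrained.rmsea - less_constrained.rmsea, 3)  # rise in RMSEA
    cfi_ok = delta_cfi <= cfi_cutoff
    rmsea_ok = delta_rmsea <= rmsea_cutoff
    if cfi_ok and rmsea_ok:
        verdict = "supported"
    elif cfi_ok or rmsea_ok:
        verdict = "partially supported"
    else:
        verdict = "not supported"
    return delta_cfi, delta_rmsea, verdict


# Hypothetical fit indices for the four nested models (NOT the study's results).
models = [
    FitIndices("configural", 0.975, 0.045),
    FitIndices("weak", 0.972, 0.044),
    FitIndices("strong", 0.966, 0.046),
    FitIndices("strict", 0.950, 0.052),
]

for parent, child in zip(models, models[1:]):
    d_cfi, d_rmsea, verdict = compare_nested(parent, child)
    print(f"{child.label} vs {parent.label}: dCFI={d_cfi}, dRMSEA={d_rmsea} -> {verdict}")
```

Labeling a comparison "partially supported" when only one of the two criteria is met is a convention adopted for this sketch; it is not the same as partial invariance in the sense of freeing individual parameters.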

Convergent and Discriminant Validity with Correlation Analysis
The validation measures were arranged in two groups: Positive and Non-Positive Parenting Practices (Table 10).
Table 10. Bivariate correlations of APQ-9 with validation scales.
with the scales of the Non-Positive Parenting Practices group, from r_s(619) = −0.08, ns (PBDQ Anxious Intrusiveness) to r_s(619) = 0.23, p < 0.01 (PBDQ Punitive Discipline). All correlations are presented in Table 10.
Table 11 and the measured variables means were presented in Table 1.
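The correlations in this section are reported as rank-order coefficients with 619 degrees of freedom, which matches N − 2 for N = 621. A minimal sketch of such a coefficient, computed as a Pearson correlation of average ranks so that tied Likert responses are handled, is given below. The function names and the demo scores are hypothetical, and the sketch does not compute p-values.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def degrees_of_freedom(n_cases):
    return n_cases - 2


# Hypothetical factor mean scores for eight parents (not the study's data).
positive_parenting = [4.7, 4.0, 3.3, 5.0, 4.3, 3.7, 4.0, 2.7]
punitive_discipline = [1.5, 2.0, 2.5, 1.0, 2.0, 3.0, 1.5, 3.5]

print("r_s =", round(spearman(positive_parenting, punitive_discipline), 2))
print("df for N = 621:", degrees_of_freedom(621))
```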

Descriptive Statistics and Normative Data
Regarding the correlations among the APQ-9 factors, the correlation of PP with ID was r_s(619) = 0.01, ns. The correlation of PP with PS was r_s(619) = −0.23, p < 0.01. Finally, the correlation of ID with PS was r_s(619) = −0.20, p < 0.01.
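Normative data based on factor means, as described for this section, can be sketched as follows. The item-to-factor allocation follows the loadings reported earlier (PP: items 1, 6, 7; ID: items 2, 4, 9; PS: items 3, 5, 8). The responses are hypothetical, the percentile cut points are an arbitrary choice for illustration, and the interpolation method is an assumption, since the study's exact procedure is not described in this section.

```python
from statistics import mean, quantiles

# Zero-based item positions for each APQ-9 factor, following the reported allocation.
FACTOR_ITEMS = {
    "Positive Parenting": (0, 5, 6),       # items 1, 6, 7
    "Inconsistent Discipline": (1, 3, 8),  # items 2, 4, 9
    "Poor Supervision": (2, 4, 7),         # items 3, 5, 8
}


def factor_mean_scores(rows, item_indices):
    """Mean of a factor's items for every respondent (stays on the 1 to 5 metric)."""
    return [mean(row[i] for i in item_indices) for row in rows]


def percentile_norms(scores, cut_points=(10, 25, 50, 75, 90)):
    """Selected percentiles of a score distribution (inclusive interpolation)."""
    pct = quantiles(scores, n=100, method="inclusive")  # pct[0] is P1, pct[98] is P99
    return {p: pct[p - 1] for p in cut_points}


# Hypothetical responses of six parents to the nine items (not the study's data).
demo_rows = [
    [5, 2, 1, 3, 1, 4, 5, 1, 2],
    [4, 3, 2, 3, 1, 4, 4, 2, 3],
    [5, 1, 1, 2, 1, 5, 5, 1, 1],
    [3, 4, 2, 4, 3, 3, 4, 2, 4],
    [4, 2, 1, 2, 2, 5, 4, 1, 2],
    [5, 3, 1, 3, 1, 4, 5, 1, 3],
]

for factor, items in FACTOR_ITEMS.items():
    scores = factor_mean_scores(demo_rows, items)
    print(factor, percentile_norms(scores))
```

Using item means rather than sums keeps every factor on the original 1 to 5 response metric, which is what makes scores comparable with APQ scales of different length.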

Discussion
The purpose of this study was to evaluate the factor structure of APQ-9 in a Greek sample of the general population with EFA and CFA. The study also aimed: 1) to examine measurement invariance; 2) to evaluate convergent and discriminant validity of APQ-9 based on the CFA Multitrait-Multimethod Matrix (CFA MTMM); 3) to examine convergent and discriminant validity further with correlation analysis; 4) to estimate internal consistency (with coefficient alpha; Cronbach, 1951), model-based reliability (with coefficient omega; McDonald, 1999, 1970), and model-based convergent validity (using Average Variance Extracted/AVE; Fornell & Larcker, 1981); and finally 5) to calculate normative data for the mean factor scores. The sample was recruited using a variation of the network sampling method (APA, 2014), with the difference that those who recruited volunteers did not participate in the sample themselves. The sample was randomly divided into two subsamples. EFA was carried out in the first subsample and CFA followed in the second one. Sample-splitting (Guadagnoli & Velicer, 1988; MacCallum, Browne, & Sugawara, 1996) is considered a cross-validation method for construct validity (Byrne, 2012; Brown, 2015; see also Kyriazos, 2018a, 2018b). The sample-to-measured-variables ratios were higher than the proposed minimums for both the EFA (Costello & Osborne, 2005) and the CFA subsample (Bentler & Chou, 1987; Bollen, 1989). The CFA sample-to-estimated-parameters ratio was also higher than the proposed minimums of adequacy (Kline, 2016). A post hoc estimation of CFA sample power (Wang, Watts, Anderson, & Little, 2013) suggested that the sample size was larger than that required for an 80% probability of rejecting a false null hypothesis (Cohen, 1988, 1992). Moving to the research findings, the factorability of the EFA correlation matrix was evaluated with multiple methods, all of which suggested satisfactory factorability.
The three factors were extracted with the Principal Axis Factoring method and an oblique rotation because the APQ-9 factors are correlated. The number of factors to retain was three. The fit of this 3-factor model was good according to multiple fit indicators (Brown, 2015). Communalities suggested that the shared common variance of the items was adequate. All the factor loadings were good, forming three robust factors (Positive Parenting, Inconsistent Discipline, and Poor Supervision) with no cross-loadings. This EFA solution verified the structure originally proposed by Elgar et al. (2007) and subsequently by Gross et al. (2015) in a longitudinal study.
CFA followed in the second subsample with the evaluation of three alternative models. The fit was evaluated adopting the multiple assessment approach (Bentler & Bonett, 1980) for more conservative results (Brown, 2015). Apart from the commonly accepted goodness-of-fit statistics, the chi-square/df ratio was also calculated because its inclusion is common practice, although it has received criticism (e.g., Kline, 2016). All chi-square-based criteria were interpreted in tandem with the remaining fit indicators because of the over-sensitivity of chi-square in samples of n > 200 (Little, 2013; see Kyriazos, 2018b). A CFA Bifactor model (Harman, 1976; Holzinger & Swineford, 1937) was also specified. Generally, testing a Bifactor structure is considered good practice (Hammer & Toland, 2016). Unfortunately, the Bifactor model failed to converge, and it lacked the theoretical background that would justify troubleshooting the convergence problem with recommended solutions (Byrne, 2012; Heck & Thomas, 2015). We could not test a higher-order model either, because of the inherent under-identification problems for m ≤ 3 (e.g., Wang & Wang, 2012). After examining the combined evidence of model fit, factor loadings and factor inter-correlations, the 3-factor model with correlated factors was the optimal solution. This finding confirmed both the preceding EFA model and the structures proposed in the literature (Elgar et al., 2007; Gross et al., 2015). The factor loadings and inter-correlations of this optimal 3-factor solution were satisfactory and comparable to those of the APQ-9 model proposed by Elgar et al. (2007). Additionally, three factors are consistent with APQ-42 validation studies (Hinshaw et al., 2000; Randolph & Radey, 2011; Zlomke et al., 2014; Molinuevo et al., 2011), except for Robert (2009) Maguin et al., 2016).
However, interpreting these results is complicated by the variation of the allocation of the measured variables to factors (Maguin et al., 2016;Esposito et al., 2016).
APQ-9 measurement invariance across child gender was evaluated in the total sample using the three-factor model as the baseline model. Full invariance was examined to the strict level, i.e., the strictest possible level of measurement invariance (Wang & Wang, 2012). The comparison of the nested models showed that configural, weak and strong invariance were fully supported and strict invariance was partially supported. This last level is often hard to establish in practice (Timmons, 2010). Thus, factor structure, factor loadings and indicator means can be safely compared between parents who care for a girl and parents who care for a boy. However, comparisons of indicator residuals between parents of girls and parents of boys must be made cautiously. Generally, the heterogeneity of the existing studies, along with the lack of detail in reported results, blurs the assessment of invariance across samples (Maguin et al., 2016) and family types (Adams, 2015).
Convergent and discriminant validity of APQ-9 parenting practices were evaluated with the CFA Multitrait-Multimethod method (Widaman, 1985), using three traits and three methods. Findings suggested strong tenability for the convergent and discriminant validity of the traits, and less strong tenability for the discriminant validity of the methods, as expected given the methods used. Convergent and discriminant validity were also examined with correlations of APQ-9 with five validity measures comprising 13 dimensions. The validity measures were arranged in two broad categories: 1) Positive parenting practices and 2) Negative parenting practices. A fairly consistent pattern of relationships emerged for all three APQ-9 factors, in agreement with the existing literature (Elgar et al., 2007; Gross et al., 2015; and Dadds et al., 2003, for the original APQ). As expected, the APQ-9 Positive Parenting scale consistently showed almost the opposite pattern of relationships to that of the Inconsistent Discipline and Poor Supervision scales. Almost all relationships were statistically significant and of low to moderate magnitude according to the criteria specified by Cohen (1988, 1992). The strength of such associations is discussed in the parenting literature (e.g., Seabridge, 2012; Hershkowitz et al., 2017; Burlaka et al., 2017).
Lastly, given the violation of the normality assumption, percentiles were calculated in addition to factor means and item means. The findings were comparable to the values of the original APQ-9 (Elgar et al., 2007). Future research could compare different models for mothers and fathers, or test measurement invariance across other demographics such as parent age or gender. Longitudinal measurement invariance could also be tested to replicate the findings of Gross et al. (2015). The present solution could be examined in children older than 13 years. Additionally, multi-cultural studies are necessary to assess measurement invariance further. Likewise, assessments of invariance under demographic variation are also needed (Maguin et al., 2016).
Finally, the sample size did not allow the full implementation of the 3-faced construct validation method (Kyriazos, 2018a; Kyriazos, Stalikas, Prassa, & Yotsidi, 2018). Nevertheless, the findings of this study, in line with calls in the literature for shorter assessment (Scott, Briskman, & Dadds, 2011; Gross et al., 2015), make APQ-9 a more reliable option for use in future parenting interventions in Greece and provide normative data for professionals.