Is There a Universal Cognitive Diagnosis Related to the Tails of Intelligence Distribution?

The diagnosis of giftedness or intellectual deficit is related to the extremes of the IQ distribution. The universality of these diagnoses depends on whether linguistic/cultural differences or the Greenwich IQ (mean differences between nations) affect individual IQ rankings when different norms for the same cognitive measure are used. We examined both perspectives through two studies of the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV), the most traditional intelligence scale. Methods: Study 1 analyzed the American and Brazilian norms of the WISC-IV, and Study 2 analyzed forty clinical protocols of children referred for cognitive assessment with the same scale. Results: Differences between the American and Brazilian WISC-IV norms (Study 1) affected the IQ classification in the analyzed protocols (Study 2). Using Brazilian norms, eight cases of high IQ (giftedness) and two cases of low IQ (intellectual deficit) were identified. Using American norms, one and six cases of high and low IQ, respectively, were estimated. The cognitive differences were more intense on the Verbal Comprehension Index. Conclusion: A broad academic discussion regarding the universality of cognitive diagnoses is urgent, especially at a time of increasing geographical mobility and increasing demand for cross-cultural cognitive assessment.


Introduction
International student mobility from Latin America increased from 0.223% to an average of 0.297% (Colombia, Brazil, and Chile) between 2010 and 2018 (OECD, 2020). The United States of America (USA) is one of the top ten countries receiving the largest numbers of international migrants, and it has seen a significant increase in international students (Curtis, 2020). Given this student mobility, a reliable assessment of a student's cognitive potential is important for effective education and, where applicable, supplemental education.
Two kinds of clinical diagnosis related to the distribution of IQ scores, both with considerable effects on education, are well known in applied psychology: intellectual developmental disorder (IQ < 70) and intellectual giftedness (IQ > 130). IQ scores below 70 or above 130 represent the extreme ends of the intelligence spectrum when an intelligence test is administered to a nationally representative sample. Statistically, 2.27% of the population is considered "gifted" and 2.27% "intellectually delayed". These 2.27% at each end of the spectrum would lead us to think that the larger a country's population, the greater the number of people with intellectual delay or giftedness. This reasoning would be correct if an IQ of 100 were equivalent across all nations. However, if an IQ score of 100 on a test corresponds to a raw score of 50 points in country 1 and 60 points in country 2, the classification of giftedness and intellectual deficit would differ between the two countries. In other words, some people considered gifted in country 1 would not be classified as gifted in country 2, and some people considered intellectually delayed in country 2 would not be classified as such in country 1. This distortion would affect not only diagnoses made by psychologists and psychiatrists but also educational and psychotherapeutic treatments.
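The arithmetic behind this argument can be sketched as follows. The sketch assumes that IQ is normally distributed with a standard deviation of 15, and the 10-point shift between norms is a hypothetical value chosen only for illustration; the function name `tail_shares` is likewise an assumption, and none of the numbers are data from the studies reported here.

```python
from statistics import NormalDist


def tail_shares(mean, sd=15.0, low=70.0, high=130.0):
    """Return (share below `low`, share above `high`) for a normal IQ distribution."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(low), 1.0 - dist.cdf(high)


# A population scored on its own norms has a mean of 100 by construction.
own_low, own_high = tail_shares(100.0)

# The same population rescored on reference norms where its mean would be 90
# (hypothetical 10-point shift, for illustration only).
ref_low, ref_high = tail_shares(90.0)

print(f"own norms:       {own_low:.2%} below 70, {own_high:.2%} above 130")
print(f"reference norms: {ref_low:.2%} below 70, {ref_high:.2%} above 130")
```

Under these assumptions the two tails are equal only when the population is scored on its own norms; on the shifted norms the lower tail grows and the upper tail shrinks, which is the distortion described above.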

The International Test Commission (2018) and the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (2014), organizations that promote policies and guidelines for effective testing and assessment, have provided guidelines to ensure test fairness and score comparability among culturally diverse populations. They recommend that if scores are not comparable across linguistic groups, i.e., there are differences between representative samples of two or more populations on the same test, the test adaptation procedures should be revised or a linguistic/cultural investigation should be conducted. Such organizations therefore assume that psychological traits are equivalent between populations and that biases such as unfamiliarity with test-taking skills, lack of motivation, little or no test practice, and linguistic peculiarities underlie the differences between groups. This position is defended by several researchers (Mushquash & Bova, 2007; Helms-Lorenz, Van de Vijver, & Poortinga, 2003), especially for the Wechsler scales (Funes, Roriguez, & Lopez, 2016; Sánchez-Escobedo, Hollingworth, & Fina, 2011).
Alternatively, differential psychology proposes another position. Mean IQs for all nations were provided by Lynn & Vanhanen (2002) and updated in subsequent publications, of which the most recent was published by Lynn & Becker (2019). In these studies, national IQs are calculated in relation to a British mean of 100 and a standard deviation of 15. This has been called the Greenwich IQ, by analogy with Greenwich Mean Time. The USA has a mean IQ of 98, and Latin America 86.4 (South America has a mean IQ of 88, unweighted, based on means from samples of all studies found without consideration of data quality and sample size). Specifically, Brazil has a mean IQ of 87, mostly based on a nonverbal test, Raven's Progressive Matrices. The mean difference between Brazil and the USA would thus be 11 points. The Greenwich IQ has been used in a considerable number of studies that obtained evidence of validity (Lynn & Becker, 2019; Rindermann, 2007).
Following the Greenwich IQ perspective, differences in average IQ between nations will be reflected in the standardization of intelligence tests, i.e., the IQ scale will have a different psychological meaning in different countries. For instance, a Latin American study (Duggan, Awakon, Loaiza, & Garcia-Barrera, 2019) of 305 highly educated Colombian corporate executives compared the WAIS-IV norms from Colombia, Chile, Mexico, Spain, the United States, and Canada. One of their results is relevant for the present paper. The mean IQ of the Colombian sample was 120.3 (superior intelligence) according to Colombian norms, but 112.7 (average intelligence) according to USA norms, a difference of 7.6 IQ points. Moreover, only 41.3% of the Colombian sample fell within the same Full Scale IQ classification under USA norms. The authors concluded that there is an American-European assessment bias in cross-cultural test standardizations. Duggan et al. (2019) have the merit of presenting, for the first time, results on Latin American mean IQ differences across several norms of the Wechsler scales. However, it is not clear why the highest agreement between the Colombia/USA classifications was observed in the Verbal Comprehension Index (52%), supposedly an index more sensitive to cultural/linguistic influence. Conversely, the lowest agreement was observed in the Working Memory Index (10%), supposedly an index less sensitive to environmental influence.
To examine the relevance of both perspectives (cultural/linguistic vs. Greenwich IQ) for IQ classifications in children, we compared the norms of the Wechsler Intelligence Scale for Children, fourth edition (WISC-IV) between the USA and Brazil (Study 1), and we analyzed the impact of these differences on Brazilian diagnoses of giftedness and intellectual delay (Study 2).
Our hypotheses are: 1) If the Greenwich IQ perspective is valid, mean IQ differences will be noted between the Brazilian and American WISC-IV norms, independent of the measured cognitive ability. There would be a universal IQ mean, and local cognitive diagnoses regarding giftedness or intellectual delay would not be reliable, even using appropriate standardizations; 2) If the cultural/linguistic perspective is valid, differences between Brazilian and American norms will be concentrated especially on the WISC-IV verbal subtests (VCI). There would not be a universal IQ mean, and cognitive diagnoses of giftedness or intellectual delay would be reliable only in the specific social contexts where standardizations of the test were conducted.

Brazilian and American WISC-IV Norms
The WISC-IV is the fourth version of the Wechsler Intelligence Scale for Children, an individually administered instrument for assessing a child's cognitive abilities. The USA norms (Wechsler, 2003) were based on a sample of 2200 children aged between 6 years 0 months and 16 years 11 months from the Northeast (17%), South (34%), Midwest (24%), and West (24%) regions. The date of data collection was not provided, but it was evidently between 2000 (the 2000 American census was used for selecting the standardization sample) and 2002 (the test was published in 2003). The Brazilian norms (Wechsler, 2013) were based on a sample of 1860 schoolchildren enrolled in public and private schools from eight Brazilian states (69.2% Southeast region, 24.9% South region, and 5.9% Northeast region). The data were collected between February and November 2010. The subtests (and items) are the same in the Brazilian and American versions except for a slight modification of one item in "Similarities": in Item 10, "frown and smile" in the USA version was replaced by "laughing and crying" in the Brazilian version. In the present study only the 10 core subtests were analyzed (Block Design, Similarities, Digit Span, Picture Concept, Coding, Vocabulary, Letter-Number Sequences, Matrix Reasoning, Comprehension, and Symbol Search). Administering the 10 core subtests that yield the Full Scale IQ and index factor scores takes between 70 and 80 minutes on average.

Analysis
The differences in the scores for the Brazilian and American samples were calculated at four ages (6, 10, 14, and 16 years). For each age, the norms are given in three four-month bands (e.g., 6.0-6.3, 6.4-6.7, and 6.8-6.11). A total of 24 age norms were analyzed, 12 per country.
A positive difference between the raw scores from the USA and Brazil indicates that a higher raw score was required of Americans than of Brazilians to achieve the same scaled score. A negative difference indicates the contrary, i.e., a higher raw score was required of Brazilians than of Americans to achieve the same scaled score. For instance, considering the norms for the age of 6.0-6.3 years, a scaled score of 4 in Block Design requires 3 points for Americans while 1 point is required for Brazilians. In this case, the difference is +2 points. In Matrix Reasoning, a scaled score of 17 requires 33 points for Americans while 34 points are required for Brazilians. In this case, the difference is −1 point.
However, most of the scaled scores are derived from a range of raw scores. In these cases, the mean difference was calculated only where the lowest raw score for country 1 was higher than the highest raw score for country 2, or vice versa. For example, if the raw score range for country 1 was 28-30 and the raw score range for country 2 was 25-27, the mean difference was calculated from the midpoints of the ranges (29 for country 1 minus 26 for country 2 = 3 points). In this case, note that the lowest score required in country 1 was 28, which is higher than the highest score required in country 2, which was 27. If the ranges were reversed (25-27 for country 1, 28-30 for country 2), the difference would be the same but negative (−3 points). Otherwise, when the ranges overlapped, differences were not calculated (e.g., 28-30 for country 1 and 26-30 for country 2, or 14-16 for country 1 and 14-15 for country 2).
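The rule described in the two paragraphs above can be written as a short function. This is a minimal sketch, not the authors' own script; the name `range_difference` and the representation of each norm entry as a (lowest, highest) pair are assumptions made for illustration. The inputs in the usage lines are the examples given in the text.

```python
from typing import Optional, Tuple

# (lowest raw score, highest raw score) that earns a given scaled score
RawRange = Tuple[int, int]


def range_difference(country1: RawRange, country2: RawRange) -> Optional[float]:
    """Midpoint difference (country 1 minus country 2) for non-overlapping ranges.

    Returns None when the two raw-score ranges overlap, because the text
    states that no difference was calculated in that situation.
    """
    low1, high1 = country1
    low2, high2 = country2
    if low1 > high2 or low2 > high1:  # the ranges share no raw score
        return (low1 + high1) / 2 - (low2 + high2) / 2
    return None


print(range_difference((28, 30), (25, 27)))  # 3.0
print(range_difference((25, 27), (28, 30)))  # -3.0
print(range_difference((28, 30), (26, 30)))  # None (overlapping ranges)
print(range_difference((3, 3), (1, 1)))      # 2.0 (Block Design example)
```

A single raw score is treated as a range whose lowest and highest values coincide, so the Block Design and Matrix Reasoning examples follow the same rule.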
Finally, the Brazilian raw score equivalent to a scaled score of 10 for each subtest was converted to USA scaled scores. Note that a scaled score of 10 represents the average cognitive performance. Thus, a summed scaled score of 30 for VCI and for PRI, and 20 for WMI and for PSI is necessary to achieve an IQ of 100. Subsequently, the "new" scaled scores (in fact, American scaled scores) were summed for each index and converted to IQ according to the USA norms.
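The bookkeeping of this conversion can be sketched as below. The grouping of subtests into indexes follows the standard WISC-IV composition, with subtest names as used in this paper. The norm fragment is HYPOTHETICAL and does not reproduce values from either manual, and the helper names `scaled_from_raw` and `index_sums` are assumptions. The final step, converting summed scaled scores to an IQ, requires the manual's conversion tables and is not reproduced here.

```python
# Core subtests per index (standard WISC-IV composition).
INDEX_SUBTESTS = {
    "VCI": ["Similarities", "Vocabulary", "Comprehension"],
    "PRI": ["Block Design", "Picture Concept", "Matrix Reasoning"],
    "WMI": ["Digit Span", "Letter-Number Sequences"],
    "PSI": ["Coding", "Symbol Search"],
}

# HYPOTHETICAL fragment of a USA norm table for one subtest and one age band:
# scaled score -> (lowest raw score, highest raw score). Not manual values.
US_SIMILARITIES_FRAGMENT = {7: (10, 11), 8: (12, 13), 9: (14, 15), 10: (16, 17)}


def scaled_from_raw(norm_table, raw):
    """Return the scaled score whose raw-score range contains `raw`."""
    for scaled, (lowest, highest) in norm_table.items():
        if lowest <= raw <= highest:
            return scaled
    raise ValueError(f"raw score {raw} is not covered by this table")


def index_sums(scaled_by_subtest):
    """Sum the scaled scores of the subtests belonging to each index."""
    return {
        index: sum(scaled_by_subtest[name] for name in subtests)
        for index, subtests in INDEX_SUBTESTS.items()
    }


# Under a country's own norms, the average child scores 10 on every subtest.
own_norms = {name: 10 for names in INDEX_SUBTESTS.values() for name in names}
print(index_sums(own_norms))  # {'VCI': 30, 'PRI': 30, 'WMI': 20, 'PSI': 20}

# Hypothetical: a raw score of 12 (assumed to earn a scaled score of 10 in
# Brazil) is looked up in the USA fragment.
print(scaled_from_raw(US_SIMILARITIES_FRAGMENT, 12))  # 8
```

The first printed line reproduces the sums of 30 and 20 mentioned in the text; the second shows, with invented numbers, how the same raw score can earn a lower scaled score under another country's norms.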

Results
Differences in average raw score for each WISC-IV subtest based on the American and Brazilian norms are shown in Figure 1. Note that differences were obtained for each scaled score (1 to 19) in each subtest and each of the 24 age norms. Figure 1 shows positive differences for all WISC-IV subtests, which means that a higher raw score is required of Americans than of Brazilians to reach the same scaled score. As a consequence, for any raw score, a higher scaled score is expected if Brazilian norms are used, and conversely, a lower scaled score is expected if American norms are used.
We also plotted the average differences obtained at each age for each subtest in order to observe the trajectory of these differences. Figure 2 shows that the differences increase with age: the older the age, the higher the raw score required of Americans.
Figure 2. Mean raw score differences for the ten subtests (inside of rectangle) at each age.
More specifically, the differences that increased most with age were in Coding (a difference of 3.8 points at 6 years increasing to 10.7 points at 16 years; rho = 0.951, p < 0.001), Similarities (2.45 points at 6 yrs to 4.64 at 16.8 yrs; rho = 0.909, p < 0.001), and Vocabulary (0.19 point at 6 yrs to 9.35 points at 16 yrs; rho = 0.902, p < 0.001). The differences not related to age were in Digit Span (rho = −0.249, p = 0.436), Letter-Number Sequence (rho = 0.228, p = 0.476), and Matrix Reasoning (rho = 0.399, p = 0.199). The Spearman rho between the total mean differences across the WISC-IV subtests and age was 0.972.
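The text does not state which software produced these coefficients. As a minimal sketch, Spearman's rho can be computed as the Pearson correlation of the ranks; the function names below are assumptions, and the twelve difference values are HYPOTHETICAL numbers standing in for one subtest's mean differences across the 12 age bands.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# HYPOTHETICAL mean raw-score differences for 12 age bands (illustration only).
age_band = list(range(1, 13))
difference = [0.2, 0.9, 0.7, 2.1, 3.0, 2.8, 4.4, 5.1, 6.0, 7.7, 8.3, 9.4]
print(round(spearman_rho(age_band, difference), 3))
```

With only 12 age bands per subtest, a coefficient above 0.9 reflects an almost perfectly monotonic increase, whereas values near ±0.25 are compatible with no trend at all.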
Regarding the indexes, the PSI (IQ 89) and the VCI (IQ 91.7) were the most distant from the USA mean IQ of 100, while the WMI (IQ 95) was the closest. Considering the Full Scale IQ, the Brazilian mean IQ would be 89 relative to the USA norms, a difference of 11 IQ points between the USA and Brazil (Table 1).

Effects on Clinical Diagnosis
In order to verify the effect of the differences between WISC-IV norms (Study 1) on Brazilian clinical diagnoses, 40 WISC-IV protocols of children assessed in the psychological clinics of two universities were analyzed. Twenty protocols were from children suspected of being gifted by their parents and teachers, and twenty were from children with learning disabilities according to their teachers' opinions. All families signed a consent form for the use of the results in subsequent investigations regarding giftedness or learning difficulties. The mean age of the analyzed group was 9.7 years (SD = 2.6), and 80% were male.

Analysis
Raw scores for each subtest and age were converted to scaled scores according to both the Brazilian and the USA norms. In this analysis, a positive difference means that the Brazilian norms yielded the higher scaled score, and a negative difference means that the USA norms yielded the higher scaled score.
Regarding the indexes VCI, PRI, WMI, and PSI, differences were calculated between the values obtained with the Brazilian and the American norms, and these differences were classified into three types. Type 1 = the Brazilian index is higher than the USA index but falls within the 95% USA confidence interval. In this case, the difference was not significant. Type 2 = the Brazilian index is higher than the USA index and is outside the USA confidence interval, but the USA index falls within the Brazilian confidence interval. In this case, the difference was not significant. For example, if a Brazilian PSI index was 121 with a 95% confidence interval of 110-127, while a USA PSI index was 112 with a 95% confidence interval of 102-120, the difference between population parameters (PSI Br 121 and PSI USA 112) would not be significant. In this case, the 121 PSI Br falls outside the 95% USA confidence interval (102-120), but the 112 PSI USA falls within the 95% Brazilian confidence interval (110-127).
Type 3 = the Brazilian index is higher than the USA index, it is outside the USA confidence interval, and the USA index is below the Brazilian confidence interval. In this case, the difference is significant. For example, if a Brazilian VCI index is 115 with a 95% confidence interval of 107-121, while the USA VCI index is 102 with a 95% confidence interval of 95-109, the difference between population parameters (VCI Br 115 vs. VCI USA 102) would be significant. The 115 VCI Br is outside the 95% USA confidence interval (95-109), and the 102 VCI USA is below the 95% Brazilian confidence interval (107-121). Of the 160 comparisons (40 clinical protocols × 4 indexes under Brazilian and USA norms), only nine (5.6%) showed negative differences (the USA index was higher than the Brazilian index), and they were classified as Type 1 (not significantly different). Therefore, the vast majority (94.4%) of the differences were positive, that is, the Brazilian norms required a lower raw score than the American norms for the same scaled score. These positive differences affect the estimation of the Full Scale IQ.
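The three-way classification can be expressed as a small decision function. This is a sketch of the rule as worded above, not the authors' procedure; the function name and argument layout are assumptions, and it presumes, as in the type definitions, that the Brazilian index is at least as high as the USA index. The usage lines reproduce the PSI and VCI examples from the text.

```python
def classify_difference(br_index, br_ci, us_index, us_ci):
    """Classify a Brazil/USA index difference as Type 1, 2, or 3.

    `br_ci` and `us_ci` are (lower, upper) bounds of the 95% confidence
    intervals reported for each index score.
    """
    br_low, br_high = br_ci
    us_low, us_high = us_ci
    if us_low <= br_index <= us_high:
        return 1  # Brazilian index inside the USA interval: not significant
    if br_low <= us_index <= br_high:
        return 2  # USA index inside the Brazilian interval: not significant
    return 3      # neither index inside the other's interval: significant


# PSI example from the text: Br 121 (110-127) vs. USA 112 (102-120)
print(classify_difference(121, (110, 127), 112, (102, 120)))  # 2

# VCI example from the text: Br 115 (107-121) vs. USA 102 (95-109)
print(classify_difference(115, (107, 121), 102, (95, 109)))   # 3
```

Only Type 3, where neither score lies inside the other's interval, is counted as a significant difference.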

Results
The mean IQ of the Brazilian protocols was 109.5 according to Brazilian norms and 100.2 according to USA norms (Median Br = IQ 115.5, and Median USA = IQ 105), a difference of 9.3 IQ points. The distribution of the forty protocols along the IQ scale is shown in Figure 3. Figure 3 shows that more protocols with higher IQs and fewer protocols with lower IQs are observed when Brazilian norms are used. The largest scaled score differences were observed in Symbol Search, Coding, Vocabulary, and Comprehension. Conversely, the smallest differences were observed in Letter-Number Sequence, Picture Concept, Digit Span, and Matrix Reasoning.
Regarding the indexes, the mean difference between the Brazilian and the American norms was 8.9 points for PSI, 7.9 for VCI, 6.3 for PRI, and 5.6 for WMI. Thus, similar to Study 1, processing speed (PSI) and verbal comprehension (VCI) were the most distant, while working memory (WMI) was the least distant from the USA norms. The criteria for classifying differences in index factors (Analysis section) were applied to the 40 clinical protocols (Table 2). Based on these criteria, the VCI showed a high percentage of cases classified as Type 3 (57.5%), while the WMI was the index factor with the lowest rate of cases classified as Type 3 (15%). For the Full Scale IQ, 75% of the protocols presented Type 3 differences, 7.5% presented Type 2 differences, and 17.5% presented Type 1 differences.
Table 2. Index factor scores according to the Brazilian and American norms, correlation between mean differences and age, and distribution of type of differences in each index factor and for the total IQ using the 40 protocols analyzed.
Subsequently, we analyzed the 30 clinical protocols that presented Type 3 differences. Note that Type 3 was the criterion for identifying how many protocols were significantly affected by the previously detected differences. We observed that 24 of these clinical protocols (80%) changed their IQ classification. The range of IQs associated with special education is above 120 (giftedness) and below 79 (intellectually delayed). According to Brazilian norms, Table 3 shows eight protocols with IQ above 120 and two protocols with IQ below 79. However, according to American norms, there was just one protocol with IQ above 120 and seven protocols with IQ below 79.

Discussion
In this paper, we analyzed the possibility of the universality of the diagnosis of giftedness or intellectual delay through two hypotheses. Under the first hypothesis, the Greenwich IQ perspective is valid and affects the diagnoses linked to the extremes of the intelligence distribution. Under the second hypothesis, the cultural/linguistic perspective is correct, and the diagnoses made using locally standardized tests are reliable. The first hypothesis implies that differences will be noted in all cognitive tasks, especially in those with more cognitive demand, favoring the country with the higher mean IQ. The second hypothesis implies that differences will be more significant in verbal tasks than in non-verbal tasks. To test these two hypotheses, we conducted two studies. In the first, the Wechsler Intelligence Scale for Children-IV norms for two countries (Brazil and USA) were compared (Wechsler, 2003; Wechsler, 2013). Before the comparison, we verified the content, artwork, and scoring procedures. Of the 10 WISC-IV subtests analyzed, only Similarities had 1 item modified, out of 23. The remaining subtests and items were the same for the USA and Brazilian standardizations. In the second study, the impact of the results of Study 1 on forty Brazilian clinical protocols that used the WISC-IV was analyzed. Our results indicated four points of interest, which are discussed as follows.
Positive differences. The USA and Brazilian scaled scores of the 10 WISC-IV subtests showed that American children are required to obtain a higher raw score than Brazilian children to achieve the same scaled score. As a consequence, if USA norms are used instead of Brazilian norms, the final index scores and the Full Scale IQ would be lower (Study 1). Additionally, these differences were noted in the scoring of clinical protocols (Study 2). The difference was 11 IQ points, the same value estimated by the Greenwich IQ between the two countries. This result supports the first hypothesis, i.e., there would be a universal IQ mean, and local cognitive diagnoses would not be reliable, even using appropriate standardizations.
Major and minor differences. In both studies, differences were more intense in VCI (verbal comprehension) than in WMI (working memory). On this point, the larger differences in VCI, which resulted in changes in IQ classifications as observed in Study 2, support the second hypothesis, i.e., there would not be a universal IQ mean, and cognitive diagnoses related to the tails of the intelligence distribution would be valid only in the context where the IQ test was standardized.
Differences and age. Measured differences increased with age (Studies 1 and 2). This result can be interpreted as evidence of the influence of the environment on intellectual differences, i.e., at the earliest ages, Brazilian and American children would present small cognitive differences, and as they grow older, the environmental context would tend to increase the differences. Nevertheless, research in behavioral genetics has indicated the reverse, i.e., an increase of genetic influences and a decrease of shared environmental influences over the years (Bartels, Rietveld, van Baal, & Boomsma, 2002; Deary, Spinath, & Bates, 2006).
Cognitive diagnosis differences. Considering the clinical protocols that showed Type 3 differences (IQ differences outside confidence intervals), just one instead of eight cases of high IQ (>120), and seven instead of two cases of low IQ (<79), were observed under USA norms. Note that the children with low cognitive performance could receive specialized education if they were living in the USA. However, differences were more concentrated in VCI (57.5% of protocols), which could have affected the Full Scale IQ score.
The results of both studies support the second hypothesis as an explanation for the differences noted between the Brazilian and the American norms: differences were more significant in verbal tasks. However, it is unclear why there were also substantial differences in PSI or PRI (considering Type 3 differences), indexes that are little influenced by culture or language.
Perhaps the fairest form of comparison between American and Brazilian norms would be the use of the WMI, the index that showed the fewest Type 3 differences (15% of protocols) and was not influenced by age. Note that Sánchez-Escobedo et al. (2011) showed the same result with the Mexican version of the WISC-IV. Working memory is considered a factor that is strongly related to psychometric intelligence (Colom, Flores-Mendoza & Rebollo, 2003; Salthouse & Pink, 2008). Unfortunately, it would be difficult to expect an academic and professional consensus on considering the WMI as the primary reference for the diagnosis of giftedness and intellectual deficit.

Conclusion
This paper analyzed how universal the cognitive diagnoses related to the tails of the intelligence distribution are. The question merits a scientific answer considering the increase in the geographical mobility of students and families and, consequently, in the search for special education. Here we presented differences in IQ diagnosis using the most commonly used scale for assessing intelligence, the WISC-IV. To our knowledge, there is no doubt that this scale is valid in capturing performance at the extreme ends of the intelligence spectrum. However, there is a higher probability of diagnosing giftedness and a lower probability of diagnosing intellectual delay if the Brazilian norms are used instead of the USA norms, which affects international psychological assessment practices. Other researchers reached the same conclusion with Spanish versions of the Wechsler scales (Sánchez-Escobedo, Hollingworth, & Fina, 2011; Funtes et al., 2016). On the other hand, differences between the Brazil/USA norms were mostly influenced by differences in the VCI, which indicates that a cognitive diagnosis would be valid only for the social context where the standardization was conducted. Perhaps the WMI is the fairest cognitive measure when culturally diverse people are assessed with traditional measures such as the Wechsler scales. Nevertheless, the universality of cognitive diagnosis is still an open question. An accurate answer could be reached using different cognitive measures administered to representative samples from developing countries. Our study is a preliminary step in that direction. It is a challenge and a duty for behavioral science to investigate whether local standardizations hide actual group differences or flatten linguistic-cultural differences between groups.