Estimating One’s Own and Others’ Psychological Test Scores

This paper examines how accurately people estimate their own psychometric test results, which assess personality, intelligence, approach to learning and other factors. Seven groups of students completed a battery of power tests (general intelligence, fluid intelligence, creativity and general knowledge) and preference tests (approaches to learning, emotional intelligence, Big Five personality). Two months later (before receiving feedback on their psychometric scores) they estimated their own scores and those of a class acquaintance whom they claimed to know well on these variables. Results from the different samples were reasonably consistent. They showed that participants could significantly predict/estimate their own Neuroticism, Extraversion and Conscientiousness scores, as well as their General, Fluid and Crystallised intelligence, Approaches to Learning, Creativity and Happiness. Correlations between estimated and test-derived scores for an acquaintance were around half those for self-estimates and better for personality than ability. Correlations between participants' self and “other” estimates were nearly all significantly positive. The discussion considers when, if ever, self-estimated scores can be used as a proxy for test scores and what self-estimated scores indicate. Limitations are considered.


Introduction
There are psychological studies on the validity of self-estimates going back nearly 100 years (Shen, 1915). Most of these studies have looked at self-estimates of ability/intelligence (Ackerman & Wolman, 2007; Chan & Martinussen, 2015), supporting Paulus et al.'s (1998) conclusion that self-estimated and psychometrically measured intelligence typically correlate at around r = .30. Ackerman and Wolman (2007) thoroughly tested 142 mature American students on a large number of ability tests including verbal, spatial and mathematical tests. Self-estimates were obtained prior to, and after, actual testing. All correlations were positive, though there was wide variability (.27 to .54). Higher correlations were found when both variables were aggregated to make them more reliable: r = .33 for spatial ability, and r = .44 for mathematical ability. Interestingly, participants gave lower estimates for verbal than for maths or spatial ability because they had better knowledge of them.
Correlations are affected by two things: whether estimates are made before or after taking the test and which tests are taken. Correlations tend to be more modest (and often more accurate) when self-estimates are made after tests. They also tend to be more modest on crystallised rather than fluid intelligence tests (Furnham, Chamorro-Premuzic, & Moutafi, 2005). In this study we examine participants' ability to predict their own scores on a variety of IQ tests to examine the extent to which they vary. It was predicted that both self- and other-estimated IQ scores would be significantly positively correlated with actual IQ scores.
Self-estimated and psychometrically assessed personality
There is a small, but consistent, literature on the relationship between estimates of, and scores on, psychometrically validated personality tests. Various studies have looked at participants' ability to predict their own Extraversion and Neuroticism scores (Vingoe, 1966; Harrison & McLaughlin, 1969; Gray, 1972; Semin, Rosch, & Chassein, 1981; Blaz, 1983). Studies in this area have used a large number of personality measures, including the Fundamental Interpersonal Relations Oriented-Behaviour (FIRO-B), the Myers-Briggs Type Indicator (MBTI) (Furnham, 1990), and locus of control measures (Furnham & Henderson, 1983). Furnham (1997) used the NEO-FFI (Costa & McCrae, 1988) to measure the Big Five personality traits, and found participants were best at predicting Conscientiousness (r = .57), followed by Extraversion (r = .52) and Neuroticism (r = .51). They were least good at predicting their Openness-to-Experience score (r = .33) and Agreeableness (r = .39). Furnham and Chamorro-Premuzic (2004) looked at self-estimated and actual test-derived scores on all 30 facets of the NEO-PI-R. The most consistent were for the six Conscientiousness scales (range r = .18 to r = .54; mean r = .41). Overall, the correlations for six facets (N1, N2, N3, E3, E4, C5) were r > .50 while four (N5, O3, A2, A6) were non-significant. They also showed that less Agreeable, Neurotic participants gave lower estimates of their overall intelligence.

Approach to learning
The literature on approaches to learning antedates the research on learning styles and approaches to learning. Whereas the "style" literature is about how different people choose to process material, the "approaches" literature is clearly much more concerned with motivation and assessment. The issue is how people approach their learning task. Murray-Harvey (1994) notes that both styles and approaches researchers are concerned with the learning strategies that students use, which are considered important attributes that they bring to any learning situation.
Most researchers observed that if students were given a text to read that they knew they would be examined on, some tried to understand, contextualize and comprehend the "big picture" content, while others focused on remembering what they thought were the "facts" that they would be examined on. These two very different approaches have been called deep vs. surface approaches. To adopt the deep approach means to achieve a critical understanding and retention of concepts that are integrated into a knowledge schema and used for problem solving. The surface approach is based on a pragmatic short-term memorization of salient facts for examination or repetition.
This study will use the Biggs (1987) measure, which assesses the surface, deep, and achievement-oriented approaches to learning. Because this study tested students, it was predicted that correlations between self- as well as other-estimated scores and actual scores would all be positive and significant.
Creativity
A few studies have investigated the relationship between self-estimated and objectively measured creativity, using different measures of creativity (Kaufman, 2006; Karwowski, 2011). In one study the correlation between self-estimated creativity and a test score was r = .27 (N = 64) (Furnham, Zhang, & Chamorro-Premuzic, 2006). Because most people believe they are creative and because the concept is so loosely defined, we predicted a low, but positive and significant, correlation between self-estimates and test scores.
This study
A central question is which psychometric test scores people are able to predict with any degree of accuracy. It could be assumed that people are able to predict scores for dimensions that they understand or where they have some frame or schema of reference. If, for instance, a person is required to estimate his or her Extraversion or Conscientiousness score accurately, he or she would have to be familiar with the psychological concept, be clear about the situations or phenomena to which it applied and be aware of how he or she compared with population norms for Conscientiousness and Extraversion. Thus, to do this task well, a participant needs to access and use a cognitive category or framework concerning personality traits.
This study moves the literature forward in three ways. First, while it replicates earlier studies on measures of intelligence and personality, it uses new measures including emotional intelligence, creativity, happiness and approaches to learning. Second, this paper reports seven cohorts of students to examine the replicability of the results. Rather than combining the samples (measured in different years) on the measures they shared, we treated them as seven replication studies. Third, we used in all five different measures of intelligence to see if

Measures
Personality. The NEO Five-Factor Inventory (NEO-FFI; Costa & McCrae, 1992). This is a 60-item, untimed questionnaire that measures the "Big Five" personality factors. The manual shows impressive indices of reliability and validity. Test-retest reliabilities range from r = .71 for Agreeableness to r = .80 for Neuroticism.
Approaches to Learning. Study Process Questionnaire (Biggs, 1987). This is a 42-item questionnaire that yields six scores. There are three approaches and two components. The first component is learning motive (why students learn); the second is learning strategy (how students learn). The three approaches are surface (a reproduction of what is taught to meet the minimum requirement), deep (a real understanding of what is learned), and achieving (designed specifically to maximise grades). The questionnaire has been repeatedly shown to have satisfactory internal reliability and test-retest reliability (r = .82), as well as content, construct and predictive validity.
Emotional Intelligence (EQ). Trait Emotional Intelligence (TEIQ) (Petrides & Furnham, 2003). Trait EI "refers to a constellation of behavioural dispositions and self-perceptions concerning one's ability to recognize, process and utilize emotion-laden information. It encompasses various dispositions from the personality domain, such as empathy, impulsivity and assertiveness as well as elements of social intelligence and personality intelligence, the latter two in the form of self-perceived abilities". Studies report test-retest reliability of between r = .74 and r = .84.
Verbal Reasoning. The Baddeley Reasoning Test (Baddeley, 1968). This 64-item test can be administered in 3 minutes and measures Gf through logical reasoning. Scores can range from 0 to 64. Each item is presented in the form of a grammatical transformation that has to be answered with "true"/"false", e.g. "A precedes B - AB" (true), "A does not follow B - BA" (false). The test has been employed previously in several studies (e.g. Furnham & McClelland, 2010) to obtain a quick and reliable indicator of people's intellectual ability. It has a test-retest reliability of r = .80.
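The logic of the two example items can be illustrated with a minimal Python sketch. It is not the published scoring key: the function names `statement_is_true` and `score` are hypothetical, and the parser assumes only the active wordings shown in the examples ("precedes", "follows", with or without "does not").

```python
def statement_is_true(statement: str, pair: str) -> bool:
    """Return True if `statement` correctly describes the letter order in `pair`.

    Assumption: active wording only, e.g. "A precedes B" or
    "A does not follow B"; `pair` is "AB" or "BA".
    """
    words = statement.split()
    subject, obj = words[0], words[-1]
    negated = "not" in words
    verb_is_precede = any(w.startswith("precede") for w in words)
    subject_first = pair.index(subject) < pair.index(obj)
    # "precedes" claims the subject comes first; "follows" claims it comes second.
    claim = subject_first if verb_is_precede else not subject_first
    return not claim if negated else claim


def score(responses, items):
    """Count correct true/false answers, one point per item."""
    return sum(
        1
        for answer, (statement, pair) in zip(responses, items)
        if answer == statement_is_true(statement, pair)
    )


print(statement_is_true("A precedes B", "AB"))         # True
print(statement_is_true("A does not follow B", "BA"))  # False
```

With 64 items scored this way, the total falls in the 0 to 64 range described above.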
General Knowledge. General Knowledge Test (Von Stumm, 2009). This is a 72-item questionnaire that measures knowledge of six areas: literature, general science, medicine, games, fashion and finance. Each area is measured by 10 items, and each correct response is awarded 1 point (in a few cases, there are two correct responses and not one). The internal reliability of the test for the present sample was α = .78.
Creativity. The Barron-Welsh Art Scale (Barron & Welsh, 1952). This scale consists of 86 different black and white pictures arranged and numbered with 8 pictures per page. Participants are instructed to make quick, instinctive, dichotomous judgements about whether they like/dislike each picture. This test requires no language skills, can be used with children and adults, is simple and does not require extensive concentration. The test-retest reliability is r = .81.
Happiness. Oxford Happiness Questionnaire (Hills & Argyle, 2002). It measures trait happiness. This is a 29-item scale that was devised as the "opposite" of the Beck Depression Inventory. It was one of the first measures to be used in the Positive Psychology revolution and there is a short version. The psychometrics are good, though there is some question about its dimensional structure.
Fluid intelligence. Advanced Progressive Matrices Set II (Raven, 1938). This is a 36-item test, possibly the most famous in psychology. Participants are shown a diagram with 9 pictures of complex shapes with one missing. Participants have to choose between 8 options of figures that logically fit in the missing space. The test has been extensively validated against other measures of fluid and crystallised intelligence.
General Intelligence. The Wonderlic Personnel Test (Wonderlic, 1990). This 50-item test can be administered in 12 minutes and measures general intelligence. Scores can range from 0 to 50. Items include word and number comparisons, disarranged sentences, serial analysis of geometric figures and story problems that require mathematical and logical solutions. The test has impressive norms and correlates very highly (r = .92) with the WAIS-R total IQ score.
Arithmetic. Mental Arithmetic (Lock, 2008). This is a 30-item test requiring a person to make 10 arithmetic calculations (multiply, divide, add, subtract) per item. It is meant to be a mental test, though some people do attempt written calculations. Ten minutes were allowed for the administration.

Procedure
Participants in each study were tested simultaneously in a large lecture theatre in the presence of five examiners who ensured the tests were appropriately completed. They completed the tests in two sittings, each lasting around 40 minutes.
Two months later, in a lab setting, the tests were explained to participants: they were told what each factor measured (i.e. the full definition based on the manuals) and shown population norms and means, as well as the means for their group. They were asked to estimate their own (and their friend's) score on the same scale shown in the results for each test. For example, for the Wonderlic they were shown a normal distribution of over 100,000 scores showing the range (i.e. 0 to 50), the mean score and one standard deviation above and below the mean. They were also given reminders of what the tests looked like to refresh their memory. They were asked to nominate a person in the class whom they knew best (i.e. "friend") and also to make an estimate for them. They also indicated on a 5-point scale how well they knew this person, from "not much" to "extremely well". This task thus involved around 30 estimates. Immediately after they had completed the exercise they got their test scores, which were explained in detail. They also saw the correlational results shown in this study two weeks after making their estimates.
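To illustrate what an estimate on such a normed scale implies, the sketch below places a raw-score estimate relative to a normal distribution. The norm mean and standard deviation used here are hypothetical values chosen only for illustration, not the published Wonderlic norms, and `place_on_norms` is an illustrative helper name.

```python
from statistics import NormalDist


def place_on_norms(estimate, norm_mean, norm_sd):
    """Return the z-score and percentile of a raw-score estimate.

    Assumes the normative scores are normally distributed.
    """
    z = (estimate - norm_mean) / norm_sd
    percentile = NormalDist().cdf(z) * 100
    return z, percentile


# Hypothetical norms: mean 22, SD 6 on a 0 to 50 scale.
z, pct = place_on_norms(estimate=28, norm_mean=22, norm_sd=6)
print(f"z = {z:.2f}, percentile = {pct:.1f}")
```

An estimate one standard deviation above the mean corresponds to roughly the 84th percentile, which is the kind of anchor participants had available when making their estimates.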

Results
Study 1: Table 1 presents the descriptive statistics and correlations. Twelve of the 14 self-estimate/actual correlations were significant, but only six of the other-estimate/actual correlations. The highest correlations were for Extraversion and the lowest for Emotional Intelligence. The self/other actual test scores indicated that the pairs were only significantly alike in their Emotional Intelligence, Extraversion and Neuroticism scores. On the other hand, their self/other estimates indicated that they believed they were alike on 10 scales, particularly General Knowledge and Openness.
Study 2: Table 2 shows the results of the correlational analysis. Of the 14 self-estimate/actual correlations, 12 were significant, but two (Openness and Agreeableness) were negative. The highest was for Verbal Reasoning, followed by deep approach to learning and then Extraversion. Six of the 14 other-estimate/actual correlations were significant, but only two of the self/other actual correlations. Ten self/other estimate correlations were significant and all were positive.
Study 3: Table 3 shows that all but one of the self-estimate/actual correlations were significant and positive, with all intelligence test correlations r > .50. Six of the 17 other-estimate/actual correlations were significant. In this study seven of the self/other actual test score correlations were significantly positive, indicating a similarity in personality, ability and approach to learning between the participants. As in the other studies, self and other estimates were nearly always significantly and positively related.
Study 4: Table 4 shows that, with one exception (Openness), all the self-estimate/actual correlations were significant, with five being r > .50 (Neuroticism, Extraversion, Conscientiousness, Verbal Reasoning and General Knowledge). Nine of the 13 other-estimate/actual correlations were significant, with Extraversion being the highest. Seven of the self/other actual correlations were significant, particularly General Knowledge. All but one of the self/other estimate correlations were significant, all being r > .30.
Study 5: In all, 11 of the 13 self-estimate/actual correlations were significant, the highest being for Happiness, Extraversion and Conscientiousness, but with Agreeableness showing a negative relationship (see Table 5). Seven other-estimate/actual correlations were significant and two of them were significantly negative (Openness and Agreeableness). The highest correlation was for Conscientiousness, r = .50. Five self/other actual correlations were significant and they showed that this group were alike on their Extraversion, Creativity, General Intelligence and Arithmetic scores but different on their Agreeableness ratings. Finally, all but two of the self/other estimate correlations were significant.

Study 6: Table 6 shows that eleven of the 13 self-estimate/actual correlations were significant, though two were negative (Openness and Agreeableness). Seven of the other-estimate/actual correlations were significant and one was negative (Agreeableness). Only two of the self/other actual correlations were significant, suggesting that these participants were not very similar to each other, though nine of the thirteen self/other estimate correlations were significant, suggesting they believed they were.
Study 7: Table 7 shows that seven of the 10 self-estimate/actual correlations were significant, the highest being for intelligence and one being significantly negative (Agreeableness). Only two of the ten other-estimate/actual correlations were significant, one being negative. Three of the self/other correlations were significant.
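The four kinds of correlation reported for each measure in the studies above (self-estimate/actual, other-estimate/actual, self/other actual and self/other estimate) can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code used in the paper: the function and key names are hypothetical, the data are invented, and the t statistic is the standard test of a Pearson correlation against zero.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


def four_correlations(self_est, self_actual, other_est, other_actual):
    """Return the four correlations for one measure.

    Each list holds one value per participant: the participant's own
    estimate and score, their estimate of the friend, and the friend's score.
    """
    return {
        "self_estimate_vs_actual": pearson_r(self_est, self_actual),
        "other_estimate_vs_actual": pearson_r(other_est, other_actual),
        "self_vs_other_actual": pearson_r(self_actual, other_actual),
        "self_vs_other_estimate": pearson_r(self_est, other_est),
    }


# Invented scores for five participant-friend pairs (illustration only).
result = four_correlations(
    self_est=[30, 25, 28, 20, 35],
    self_actual=[28, 22, 30, 19, 31],
    other_est=[29, 24, 27, 22, 33],
    other_actual=[21, 30, 25, 27, 24],
)
for name, r in result.items():
    print(f"{name}: r = {r:.2f}")
```

The self/other actual correlation indexes how similar the pairs really are, while the self/other estimate correlation indexes how similar participants believe they are.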
The data for the Big Five personality factors were aggregated and the analysis repeated. Findings are shown in Table 8. The four columns tell the story of findings in this area. Three of the five correlations were highly significant (r > .45), indicating that participants could predict their Extraversion, Neuroticism and Conscientiousness scores.

Psychology A. Furnham

Arithmetic .59** .41** −.14 .59** 10.63 3.85
*Correlation is significant at the 0.05 level (2-tailed); **Correlation is significant at the 0.01 level (2-tailed).

Discussion
This study showed that, with few exceptions, people could predict their psychometric test scores. This was true for creativity, emotional intelligence, happiness, intelligence and three of the Big Five personality traits. These results are consistent with previous literature (Semin, Rosch, & Chassein, 1981; Blaz, 1983; Harrison & McLaughlin, 1969; Furnham, 1997). Further, the fact that there were no positive significant correlations between self-estimated and psychometric Openness and Agreeableness scores is also in line with initial predictions. It was hypothesized that the relative obscurity and low usage of the definitions/labels of these factors would hinder participants' capacity to estimate their scores accurately. However, in many studies, and overall, those correlations were negative, suggesting that those who were most Open and Agreeable tended to give low estimates and vice versa. Whilst the size of the correlations differed across the studies, Extraversion showed the highest correlations, which is to be expected given how commonly it is discussed.
Second, regarding the relationship between measured and estimated IQ scores, the present results confirmed our prediction that people would be able to estimate their intelligence to a moderate but significant degree. In fact, the correlation reported between self-estimated and psychometric IQ scores (i.e., r = .30) is consistent not only with the previous literature (De Nisi & Shaw, 1977; Borkenau & Liebler, 1993; Furnham & Rawles, 1999; Zell & Krizan, 2014), but also with Paulus, Lysy and Yik's (1998) meta-analysis, in which the authors concluded that self-estimated and psychometrically measured intelligence typically correlate at r = .30 (see also Furnham, 2001).
Apart from the first sample, the self-estimate/actual correlations for EQ were significant and between r = .26 and r = .55, which confirms Petrides and Furnham (2003). All three studies using the creativity measure showed significant correlations between r = .28 and r = .43, which are similar to the results reported by Karwowski (2011) for creative self-efficacy. This is perhaps surprising given the doubts about the validity of the measure used (Barron & Welsh, 1952), and indeed of all creativity tests (Batey & Furnham, 2006). The studies also showed that students were quite able to predict their Approaches to Learning scores, with Deep Learning correlations higher than Surface Learning. This is not surprising, as students seem to understand this concept very well indeed.
The other-person estimate/actual correlations showed three things. First, correlations were lower (and frequently not significant) compared to self-estimate/actual correlations (Study 1: 6/14; Study 2: 6/14; Study 3: 6/11; Study 4: 9/13; Study 5: 7/13; Study 6: 7/13; Study 7: 3/10). Despite variation between the studies, it seems friends could significantly predict each other's Extraversion, Neuroticism, Conscientiousness, Verbal Reasoning and General Knowledge. However, in many studies the correlations were negative and occasionally significantly so.
This was reasonably good, as they had known each other on average for only three to four months. When partial correlations were computed controlling for the rating of how well they knew the other person, there were surprisingly few differences, supporting the "thin slices of behavior" research, which shows how little information is required for people to make accurate judgments of others.
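For readers unfamiliar with partial correlations, the sketch below shows the standard first-order formula, which removes the linear influence of a third variable from the correlation between two others. The paper does not state how its partial correlations were computed, so the mapping of variables (x = estimate of the friend's score, y = the friend's actual score, z = the "how well known" rating) and the input values are assumptions made for illustration.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    r_xy: correlation between estimate and actual score
    r_xz: correlation between estimate and the control variable
    r_yz: correlation between actual score and the control variable
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented values: a control variable that is only weakly related to both
# scores leaves the estimate/actual correlation almost unchanged.
print(round(partial_correlation(0.40, 0.10, 0.05), 3))
```

When the control variable is nearly unrelated to the estimates and the scores, the partial correlation stays close to the zero-order value, which is the pattern of "surprisingly few differences" described above.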
The self-other actual test scores indicated how similar the two friends/acquaintances were. Some samples showed very few correlations (Study 2) while others yielded more (Study 3) and there was no detectable pattern. This may be because of the length and nature of the participants' friendships with one another, but it also supports the similarity-attraction literature, which suggests that people with similar personalities, values and abilities are attracted to each other. However, most of these participants had only known each other for around three months, and many other factors, such as where they lived, may have been more important predictors of who made friends with whom. There was no clear relationship between how long and how well the participants reported knowing one another and their ability to predict each other's scores.
The final set of correlations (self-other estimates) replicated previous studies. They showed that most were significant (Study 1: 10/14; Study 2: 10/14; Study 3: 11/11; Study 4: 12/13), being r > .60. It is not clear whether this represented a belief on the part of participants that they really were similar to their friends/acquaintances or whether this was an artifact of the rating style or the requirements of the study, where they may have seen the other's actual estimates.
The question remains as to who is able to make better estimates than others (i.e. are cognitively or emotionally intelligent people better self- and other-estimators?); on what characteristics (personality; ability) they are more or less likely to be accurate; and when (i.e. under what test conditions) they make better estimates. We examined some of these questions: for instance, do cognitively and/or emotionally intelligent people (those scoring over one standard deviation above the mean on the tests) do better in predicting their own scores? None of the analyses showed an unequivocally clear pattern. Equally, we considered whether those individuals with higher self-estimate/actual correlations had a different test profile, and again the results were not clear.
These studies however can only be considered as studies of personal awareness to the extent there is considerable evidence of the construct validity of the measures. That is, for the scores to be considered "actual" measures of a characteristic, reliable and valid tests need to be used. Otherwise these studies may be described as those of personal validation of psychometric instruments suggesting that personal estimates are the valid scores that tests hope to correlate with.
Studies such as this may inform various literatures such as that on self-awareness which suggests self-awareness is a highly desirable characteristic in adults.

Conclusion
As expected, when they understand a psychological concept most people are reasonably able to predict their own personality test score. It seems the more commonly used the psychological concept, like Extraversion, the more people have an understanding of themselves, although they do not necessarily understand the psychological process or mechanism underlying the test. Typically, correlations between self-estimated and test-assessed scores vary from r = .20 to r = .50 depending on the size of the group, the characteristic being assessed, the personality, sex and culture of the participant and the motive for doing the test.
Inevitably, people are far less accurate at estimating the test scores of their friends; their accuracy depends on how well they know them, how well they understand the concept they are asked to consider and whether their friend sees the score. Overall it seems that self-estimates are only a very weak proxy for actual test scores.