Development and reliability of the Indonesian Wechsler Adult Intelligence Scale - Fourth Edition (WAIS-IV)

Through the years, several translated versions of Wechsler’s intelligence test have been used in Indonesia in clinical, educational, and industrial settings. However, instruments such as the Wechsler-Bellevue Intelligence Scale are outdated, have not been validated, and lack proper normative data, resulting in measurement errors and invalid decisions about the intellectual potential of individuals. The primary aim of this study was to adapt and validate the Wechsler Adult Intelligence Scale - Fourth Edition (WAIS-IV) for use in Indonesia. We describe the first phase in the adaptation of the WAIS-IV into the Indonesian language, including translation, item analysis, and reliability of the subtests. The sample consisted of 148 healthy participants who were representative of the Indonesian population with respect to gender, age group (ages 16 to 83), educational level, and ethnic background. Results showed that the item sequence of the US WAIS-IV cannot be applied in Indonesia due to differences in index difficulties. Cronbach’s coefficient alphas for the WAIS-IV subtests ranged from .74 to .92. For the subtests from the Verbal Comprehension Index, the inter-rater agreement ranged from .91 to .97. In all, the adaptation of the WAIS-IV for Indonesia is psychometrically promising.


Introduction
Assessment of intelligence is one of the most important topics in psychological testing. Worldwide, Wechsler's intelligence tests are the best known and most widely used instruments for testing intelligence. The original version of the Wechsler intelligence test, the Wechsler-Bellevue Intelligence Scale (WBIS) for adolescents and adults, was developed in 1939. Internationally, the WBIS has been extended and modified several times into the WAIS, WAIS-R, WAIS-III, and the recently published WAIS-IV (Wechsler, 2008a, 2008b). Adaptations of the original US versions of these tests have been published in many countries, such as the Netherlands (see Van der Heijden, Van den Bos, Mol, & Kessels, 2013), Japan (see Murayama, Iseki, Tagaya, Ota, Kanasuki, Fujishiro, Arai, & Sato, 2013), Finland (see Roivainen, 2013), France (see Lecerf, Golay, & Reverte, 2012), South Africa (see Grieve & Van Eeden, 2010), Canada (see Lange, 2007), Spain (see Melendez, 1994), and China (see Lynn & Dai, 1993). In the latest version, the WAIS-IV, items and subtests have been substantially revised, new subtests have been added, and norms have been updated to take cohort effects into account (Wechsler, 2008b). The WAIS-IV updates also incorporated theoretical advances in neuropsychology, cognitive neuroscience, and contemporary intelligence theory, as well as increasing sophistication in psychological measurement (Weiss, Saklofske, Coalson, & Raiford, 2010).
In Indonesia, the WBIS is still being used for intelligence testing. However, details about its translation (presumably done in the 1960s or 1970s), standardization, psychometric properties, and the development of the written manual are unknown, and this version is probably unauthorized (Lembaga Pengembangan Sarana Pengukuran dan Pendidikan Psikologi Fakultas Psikologi Universitas Indonesia, n.d.) and certainly outdated. Despite these shortcomings, the WBIS is still being taught and is the most widely administered intelligence test in Indonesia. A translated version of the later Wechsler Adult Intelligence Scale is also available (Seksi Psikodiagnostik, 1992), but it suffers from the same shortcomings, is also probably unauthorized, and has never gained wide acceptance in the Indonesian psychology community. As an example, in Bandung, several faculties of psychology preferred the WBIS over the WAIS because the translated WAIS items were culturally biased (Polhaupessy, n.d.). The lack of valid intelligence tests results in measurement error, misleading decisions about the potential of individuals (e.g., in vocational training or job selection), and clinical misdiagnosis. This stresses the urgent need for the development and adaptation of a valid intelligence test for the Indonesian population. The WAIS-IV can be used to identify intellectual disability, intellectual giftedness, and cognitive functioning in examinees (Wechsler, 2008a). The WAIS-IV can also be used as part of neuropsychological assessment in clinical settings (Fitrikasari, Jaeri, & Bintoro, 2013). Therefore, the focus of this study is the development and psychometric evaluation of the Indonesian version of the WAIS-IV.
The WAIS-IV is an individually administered test battery for individuals aged 16 to 90. It consists of 15 subtests, including Block Design (BD), Similarity (SI), and Digit Span (DS). The WAIS-IV subtests are grouped into core and supplemental subtests. The first ten subtests are the core subtests and the last five (i.e., LN, FW, CO, CA, and PC) are the supplemental subtests (Wechsler, 2008). The total battery provides an assessment of general intellectual functioning (FSIQ) and four index scores. The four index scales are Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed (Wechsler, 2008b). The Verbal Comprehension scale includes three core subtests (SI, VC, and IN) and one supplemental subtest (CO). The Perceptual Reasoning scale includes three core subtests (BD, MR, and VP) and two supplemental subtests (FW and PC). The Working Memory scale includes two core subtests (DS and AR) and one supplemental subtest (LN). The Processing Speed scale includes two core subtests (SS and CD) and one supplemental subtest (CA).

Participants
The Indonesian population size was estimated at 237.6 million in 2010. The majority of people live on the island of Java, and the country's six biggest islands are Kalimantan, Maluku-Papua, Sumatera, Sulawesi, Java, and Bali-Nusa Tenggara (Badan Pusat Statistik Indonesia, 2012). The official language is Indonesian, a standardized register of Malay, an Austronesian language (Lewis, Simons, & Fennig, 2013). Indonesia is a multi-ethnic society, with more than 1000 ethnic or sub-ethnic groups. However, most ethnic groups are small, and only 15 groups consist of more than one million people (Suryadinata, Arifin, & Ananta, 2003). Formal education and national media use the Indonesian language (Lewis, Simons, & Fennig, 2013). In addition to being fluent in Indonesian, many Indonesians are also fluent in a regional language, such as Javanese or Sundanese. These regional languages are commonly used at home or in the local community.
The participants for this study were 176 Indonesian individuals aged 16 to 75 years. Of these, 148 participants completed all 15 subtests. Just over half of the participants (53.4%) were men and 46.6% were women. Age ranged from 16.2 to 83.9 years (M = 37.34, SD = 16.75). Of the participants, 17.6% had less than eight years of education, 12.8% had completed junior high school, 33.8% senior high school, and 7.4% an academy, while 20.3% had an undergraduate degree, 6.8% a master's degree, and 1.4% a doctoral degree. With respect to ethnicity, participants in our study belonged mainly to the ethnic groups of Tionghoa (37.2%), Java (25%), and Sunda (14.9%). For validity purposes, we also recruited participants originally from Ambon, Batak, Betawi, Lampung, Manado, Minang, Nusa Tenggara, Palembang, Papua, Serang, Toraja, and other islands who lived on Java in the sampled regions. Most of the participants (64.86%) were employed in various sectors, including education, sanitation, security, transportation, and business.

Sampling Method
Recruitment of participants was performed in accordance with the WAIS-IV Technical and Interpretive Manual (Wechsler, 2008b), taking the Indonesian life expectancy into account. Based on the 1980 Population Census (Badan Pusat Statistik Indonesia, 2012), the life expectancy for Indonesians was 70.7 years. We used a quota sampling method. A stratified sampling plan ensured that the normative sample included representative proportions of individuals according to selected demographic variables. The allocation of samples was based on the following variables: age, sex, education level, and geographic region. In this study, the geographical area covered only Java Island, in and around the cities of Jakarta, Tangerang, Bogor, and Bandung. The inclusion of participants was based on several considerations stated in the technical manual of the US WAIS-IV (Wechsler, 2008b). Participants were excluded when their primary language was not Indonesian, when they primarily used nonverbal language or were uncommunicative, when they were unable to understand instructions and participate fully in testing, when they were unable to comply with testing well enough to ensure a valid assessment, when they had been tested with any intelligence test in the past six months, or when they had an uncorrected visual impairment or uncorrected hearing loss. Individuals were also excluded when they had an upper-extremity disability that would affect motor performance, were currently admitted to a psychiatric facility, were currently taking medication that might affect cognitive test performance (e.g., antidepressants or antipsychotics), had experienced a recent and significant change in cognitive status, had received chemotherapy in the past two months, or had previously been diagnosed with any physical condition or illness that might lower test performance.

Instruments
The original US version of the WAIS-IV was translated into Indonesian by two bilingual (English-Indonesian) translators. Both translators were educational psychologists and lecturers in the Faculty of Psychology of Atma Jaya Catholic University of Indonesia. They performed their translations independently during the same period. The first two authors (CS, MSH) compared the two translations, met with the translators to review them, identified differences in Indonesian-English meaning, and adapted the Indonesian-language version to achieve the most accurate culturally equivalent meaning (International Test Commission, 2010).
For the back-translation process, we asked two new independent translators who had bilingual ability and a background in psychology. The first translator was a clinical psychologist and lecturer in the Faculty of Psychology of Atma Jaya Catholic University of Indonesia. The second translator was an organizational psychologist who works as a consultant. Both independently translated the Indonesian WAIS-IV back into US English. Subsequently, the authors and the translators met to review the back translations, identified differences in English-Indonesian meaning, and adapted the Indonesian-language version to achieve the most accurate culturally equivalent meaning. This final translated WAIS-IV was authorized by Pearson Assessment.
The scoring of the WAIS-IV followed the discontinue rules that have been incorporated into the scale to avoid frustrating the examinee with items that are too difficult. Subtest items were ordered according to increasing difficulty. The discontinue rules were applied after the examinee had obtained a specified number of consecutive scores of zero. The discontinue rules differ per subtest (Wechsler, 2008b). In agreement with Wechsler (2008a), the administration of BD was discontinued when the examinee obtained two consecutive scores of zero. The administration of SI, MR, VC, AR, VP, IN, FW, and CO was discontinued when the examinee obtained three consecutive scores of zero. The administration of DS was discontinued when the examinee scored zero on both trials of one item. The administration of LN was discontinued after the examinee obtained a score of zero on all trials of an item. The administration of PC was discontinued after the examinee obtained four consecutive scores of zero. The SS and CD subtests were discontinued after 120 seconds. For CA, each item was discontinued after 45 seconds.
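The consecutive-zero discontinue rules described above can be illustrated with a minimal Python sketch. It is not the scoring procedure of the WAIS-IV manual: the function name, the ten-item response pattern, and the threshold values are hypothetical and serve only to show how a run of zero scores ends the scoring of a subtest.

```python
def score_with_discontinue(item_scores, max_consecutive_zeros):
    """Sum item scores in administration order, stopping as soon as the
    examinee has obtained `max_consecutive_zeros` zero scores in a row."""
    total = 0
    zero_run = 0
    for score in item_scores:
        total += score
        # Extend the run of zeros, or reset it after any non-zero score.
        zero_run = zero_run + 1 if score == 0 else 0
        if zero_run >= max_consecutive_zeros:
            break  # items after the discontinue point do not count
    return total


# Hypothetical responses of one examinee to a ten-item subtest (0 = fail).
responses = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]

raw_without_rule = sum(responses)                        # all items counted
raw_three_zeros = score_with_discontinue(responses, 3)   # e.g. a three-zero rule
raw_two_zeros = score_with_discontinue(responses, 2)     # e.g. a two-zero rule
```

In this illustration the same response pattern yields different raw scores depending on the rule, which is why a misplaced difficult item early in the sequence can lower a subtest score.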

Procedure
Testing took place in a counseling room at the university, at a company, at a school, or at another place that met the requirements of a psychological testing environment, that is, a quiet or soundproof room free from external distractions and interruptions during test administration.
We contacted potential participants and informed them about this study. If a potential participant agreed to participate, we set an appointment to administer the test. Before the test was administered, all participants provided written informed consent. The WAIS-IV was administered in its prescribed order and demographic data were collected. All items in all subtests were administered, and discontinue rules were applied during scoring.

Analyses
Index difficulty was computed for each item in 12 subtests of the WAIS-IV. SS, CD, and CA were excluded because these are timed tests, for which the index difficulty analyses cannot be performed. In timed tests, items are so easy that all participants will answer all items correctly if they have enough time to finish them (Anastasi & Urbina, 1997). As index difficulty can differ between age groups, the next analysis explored whether differences existed between two age groups. The first group consisted of people aged 16 to 69 years and the second of people aged 70 and over. This division was based on the administration guidelines of the WAIS-IV subtests, as all subtests are administered to people aged 16 to 69 and only 12 subtests are administered to people over 70. We performed this analysis with the Wilcoxon Signed Rank Test.
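As an illustration of the item analysis, the sketch below computes an index difficulty per item and derives an easiest-first item order. It rests on an assumption that the text does not state explicitly: the index is taken as the mean obtained score divided by the maximum obtainable score, which reduces to the proportion of correct answers for items scored 0 or 1. The function names and the small score matrix are hypothetical.

```python
def item_difficulty(score_matrix, max_points):
    """Index difficulty per item, assumed here to be the mean obtained score
    divided by the maximum obtainable score (higher value = easier item)."""
    n_people = len(score_matrix)
    return [
        sum(person[i] for person in score_matrix) / (n_people * max_points[i])
        for i in range(len(max_points))
    ]


def easiest_first_order(difficulties):
    """Item positions sorted from the highest index (easiest) to the lowest
    (hardest); ties keep their original relative order."""
    return sorted(range(len(difficulties)), key=lambda i: -difficulties[i])


# Hypothetical data: rows are participants, columns are items scored 0 or 1.
scores = [
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]
max_points = [1, 1, 1, 1]

difficulties = item_difficulty(scores, max_points)
new_order = easiest_first_order(difficulties)
```

Here the second item turns out to be harder than the third, so the re-sequenced order moves it later in the subtest.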
The next step was to reorder the item sequence, apply the discontinue rules, and calculate the total score for each subtest. Using a dependent-samples t-test, we compared these scores with the raw scores obtained when the discontinue rules were applied to the original sequence. We also calculated Cohen's d (Cohen, 1988). This analysis determined whether the new sequence was the best solution for a subtest.
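The comparison between the two sequences can be sketched as follows. The example uses only the Python standard library and computes the dependent-samples t statistic together with an effect size. Two choices are assumptions, not taken from the study: differences are taken as original minus reordered, and Cohen's d is computed as the mean difference divided by the standard deviation of the differences. The scores are hypothetical.

```python
import math
import statistics


def paired_t_and_d(original, reordered):
    """Dependent-samples t statistic, degrees of freedom, and Cohen's d
    (assumed variant: mean difference / SD of the differences)."""
    diffs = [o - r for o, r in zip(original, reordered)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)        # sample SD (n - 1 in the denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d = mean_diff / sd_diff
    return t, n - 1, d


# Hypothetical raw scores of five participants on one subtest, scored with
# the discontinue rule under the original and the reordered item sequence.
original_scores = [10, 12, 9, 14, 11]
reordered_scores = [12, 13, 12, 15, 13]

t_value, df, cohen_d = paired_t_and_d(original_scores, reordered_scores)
```

With this sign convention, a negative t indicates that scores under the reordered sequence are higher than under the original sequence.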
Reliability coefficients were obtained using Cronbach's coefficient alpha. The alpha coefficient takes the variance of both the subtest total score and the item scores into account, and provides a reliability estimate that is the average of all possible split-half reliabilities (Anastasi & Urbina, 1997). Cronbach's coefficient alpha cannot be determined for the subtests of the Processing Speed index (SS, CD, and CA). For the Verbal Comprehension subtests (SI, VC, and CO), we calculated the inter-rater reliability. These three subtests have a different scoring system (Wechsler, 2008a), the criteria for which were also translated into Indonesian. We asked three independent raters to score the participants' verbal responses based on the criteria in the Indonesian manual. Two of the three raters were educational psychologists and one was a psychometrician. None of the raters had any previous experience with the WAIS-IV scoring rules. Inter-rater reliability coefficients were obtained using intraclass correlation coefficients (two-way mixed model with absolute agreement).
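For reference, coefficient alpha for a subtest with k items can be computed from the item variances and the variance of the total score as alpha = k / (k - 1) * (1 - sum of item variances / variance of total score). The following minimal sketch implements this standard formula with sample variances; the function name and the small score matrix are hypothetical and are not data from this study.

```python
import statistics


def cronbach_alpha(score_matrix):
    """Cronbach's coefficient alpha for a participants-by-items score matrix."""
    k = len(score_matrix[0])  # number of items
    item_variances = [
        statistics.variance([person[i] for person in score_matrix])
        for i in range(k)
    ]
    total_variance = statistics.variance([sum(person) for person in score_matrix])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five participants, three items scored 0 to 2.
scores = [
    [2, 1, 2],
    [1, 1, 0],
    [0, 0, 0],
    [2, 2, 1],
    [1, 0, 1],
]

alpha = cronbach_alpha(scores)
```

Alpha approaches 1 when the items covary strongly, so that the total-score variance is much larger than the sum of the item variances.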

Item Analysis
For most subtests of the Indonesian translated version of the WAIS-IV, the item index difficulties did not follow the US sequences from easy to difficult (Table 1). Most changes in item sequence had to be made for the subtests in the Verbal Comprehension and Perceptual Reasoning scales (except for BD). Based on this analysis, we re-sequenced the item order of those subtests. Table 1 shows the range and mean of index difficulties across subtests and age categories. Subsequently, we analyzed the differences in index difficulties between the two age groups. Using the Wilcoxon Signed Rank Test, we found significant differences on four subtests (DS: Z = −4.43, p < .01; PM: Z = −3.39, p < .01; VP: Z = −2.98, p < .01; PC: Z = −3.30, p < .01). All these subtests were more difficult for participants over 70 years of age.
After rearranging the item order, we scored the performance using the discontinue rules (Table 2). Significant differences in performance between the original and rearranged sequences were found for VC (t(147) = −5.01, p < .01), VP (t(147) = −3.60, p < .01), IN (t(147) = −6.50, p < .01), and CO (t(147) = −12.81, p < .01). Of these, CO had the largest effect size. No significant differences were found for BD, SI, MR, AR, FW, and PC. Table 3 shows the reliability coefficients of each WAIS-IV subtest. Cronbach's coefficient alpha of the WAIS-IV subtests ranged from .74 to .92, indicating acceptable (BD, SI, and CO) to excellent (LN and FW) reliability. For the subtests of the Verbal Comprehension index, we also computed inter-rater reliabilities. The inter-rater agreements were high for SI (r = .97), VC (r = .96), and CO (r = .91).

Demographic Data Analysis
We found no significant differences between the scores of men and women on any of the subtests. Significant differences were found between participants of different educational backgrounds. Table 4 shows the differences between participants who had completed senior high school (N = 50) and those who had a university undergraduate degree (N = 30).
Significant differences were found on most subtests, except CO (t(78) = .22, p > .05) and CA (t(78) = 1.47, p > .05). The group that had an undergraduate degree scored higher on all subtests, in agreement with previous research (Matarazzo & Herman, 1984; Grieve & Van Eeden, 2010). Table 5 shows the Pearson product-moment correlation coefficients between age and all subtests of the WAIS-IV. We found significant correlations for all subtests of the Processing Speed index, most of the subtests of the Perceptual Reasoning index, and most of the subtests of the Working Memory index. No significant correlations were found between age and the subtests of the Verbal Comprehension index.
For the subtests of the Processing Speed index, the highest correlation was between age and SS (r(146) = −.40, p < .01). These results are consistent with numerous studies on aging-related decline in speed of information processing (see Verhaeghen & Salthouse, 1997). FW was the only subtest from the Perceptual Reasoning index that did not correlate significantly with age. The highest correlation was found between age and VP, r(146) = −.25, p < .01. These results are consistent with studies on age and reasoning (see Verhaeghen & Salthouse, 1997). For the Working Memory index, significant correlations were found between age and DS (r(146) = −.30, p < .01) and between age and LN (r(137) = −.39, p < .01). These results are consistent with studies on age and working memory (see Verhaeghen & Salthouse, 1997).

Discussion
In this study, we described the translation process of the WAIS-IV into the Indonesian language. We performed psychometric analyses, including the order of the item sequences, item analyses, and the reliability of each subtest of the WAIS-IV-ID. As expected, the item sequences of most of the translated WAIS-IV subtests had to undergo major changes. In all subtests from the Verbal Comprehension index, we had to reorder the item sequence, although we stayed close to the original items for content purposes. For example, in the Information subtest of the original US test, most of the items that refer to science, geography, world history, world figures, and literature are related to knowledge of Western culture. Indonesian participants were more likely to answer items on science and geography correctly, but most of them had difficulties answering items about specific historical persons (like Sacagawea) and literature (Alice in Wonderland). The index difficulties of these items ranged from .41 to .02. On those items, participants either gave up or said they remembered that these figures had appeared in a movie. On one item about historical figures in the US, their answers were not fully correct, although they gave correct information to some extent. Another example is the VC subtest, which also had to undergo major reordering because the items may have a different meaning in daily conversation than their formal meaning in Indonesian. For instance, item number 4 of VC ("bed", in Indonesian "ranjang") became item 11 after reordering. Specific syllables in some Indonesian words, such as "ke", can be dropped when spoken. In item number 4, the syllable "ke" is not present. The participants who gave incorrect answers on this item tended to listen to the word rather than read it. Even when the examiner pointed to the written word, many participants still did not change their answer. As a result, many zero responses were scored for this item and its index difficulty was .76.
Another example is item number 17 (i.e., "plagiarism", in Indonesian "plagiat"), which became item 30 after reordering. The Arithmetic (AR) subtest from the Working Memory index and all subtests from the Perceptual Reasoning index, except for Block Design, also needed major reordering of the item sequence. Many of the Picture Completion items resulted in incorrect responses because participants did not indicate the important missing detail of the picture, but indicated other absent objects instead. For instance, for the item in which a picture with trees is shown, participants answered that the river lacked fish, or that people or vehicles were missing in the picture. Other incorrect responses were found for several pictures showing objects with which Indonesian people are unfamiliar. For example, items such as snow or a stove are less common in Indonesia (many participants responded that the stove was a kind of washing machine). Clearly, reordering the items and applying the discontinue rules improved the participants' scores. Some other recommendations can be made based on our experiences in the data collection process. First, in the Similarity subtest, the example item (what is the similarity between "two and seven") was difficult to answer for some participants. They answered in a concrete way, e.g., "both have edges". If participants do not understand the instruction, continuing the subtest is problematic. To overcome this, we repeated the instruction twice and explained the correct answer to the example item. If they still did not understand the purpose of the subtest, the examiner gave an example using two concrete objects that are more a part of the participants' daily lives, e.g., what is the similarity between two pieces of fruit (mango and banana). We used this procedure especially with participants with a lower educational background and with senior citizens.
Secondly, in the Comprehension subtest we found no significant differences between the performances of participants who had completed high school and university graduates. This result is not consistent with other studies (Matarazzo & Herman, 1984; Grieve & Van Eeden, 2010). By checking the responses of those participants, we found that many of the answers were very short. Our evaluation of the data collection showed that participants tended to say, "Ok, that's it", "I don't know anything else", or "It's enough". Their responses usually covered only one general concept, for which they did not obtain the maximum of two points. For future research, we recommend rephrasing the item appropriately to elicit another response, as suggested in Wechsler (2008a).
We found age effects that are in agreement with previous research. For instance, Verhaeghen and Salthouse (1997) performed a meta-analysis of 50 studies on aging and speed of processing, in which they reported correlations between −.23 and −.68. In our study, we found correlations between −.36 and −.40. Verhaeghen and Salthouse (1997) also performed a meta-analysis of 34 studies in which the relation between age and working memory was studied. They found that correlations varied between .03 and −.48. In our study, we found a range from −.02 to −.39. These negative correlation coefficients are consistent with previous studies, and indicate that the number of items that can be remembered immediately decreases with age and affects the total score. Overall, most of the correlations between the WAIS-IV subtests and age were significant and negative. Verhaeghen and Salthouse (1997) suggest that almost all aspects and types of information processing are affected by age, causing a broad decline in many facets of cognitive functioning when people get older. They concluded that there are two types of general factors that explain cognitive differences in adults. The first type refers to a basic and relatively pervasive loss in processing speed. The second type refers to the ability to preserve information in a temporary working memory store while processing is carried out. These two types are not independent, because the proportion of age-related variance in cognitive performance that is related to working memory capacity is also shared to a large extent with processing speed measures. As in their study, our results show that the subtests from the Processing Speed index and most of the subtests from the Working Memory index are correlated with age. We did not find this correlation for AR from the Working Memory index, because this subtest also depends on the more crystallized verbal and quantitative capacities.
Compared with the US version (Wechsler, 2008b), the reliability coefficients of the Indonesian adaptation of the WAIS-IV are acceptable to good, although in general somewhat lower than the US coefficients. Moreover, the subtests in the Verbal Comprehension index have excellent inter-rater agreement. A possible weakness is the probing used by the examiner. We tested participants from different educational and ethnic backgrounds. Some of the participants needed explanations in their regional language, such as Javanese, Sundanese, or Mandarin. Sometimes the participants gave mixed responses in a regional language and Indonesian, but the examiner was sometimes less familiar with the regional language. For further research, we recommend recruiting examiners from different ethnic backgrounds who understand the regional languages. Moreover, recruitment in this study was limited to Jakarta and its surroundings. For further research, the geographical area should be expanded to other areas of Indonesia. Also, a larger sample is recommended to replicate the findings of this study and to analyze the construct validity of the translated version.
Overall, the results of this first study provide a strong foundation for further development of the Indonesian version of the WAIS-IV. It is psychometrically promising: the reliabilities of the subtests are generally good, and the inter-rater agreement is high. The next step is to explore the factorial structure of the Indonesian WAIS-IV using a larger sample.