Differential Item Functioning in Online Learning Instrument (EPFun)


Providing young learners with high-quality digital primary education and literacy skills is necessary for them to excel academically. The development of the online fun-learning instrument known as English Primary Fun (EPFun) is based on Piaget’s constructivist theory of development. This theory is applicable to young learners because it also encompasses cognitive, behaviourist and second-language acquisition theories. The purpose of this study is to detect Differential Item Functioning (DIF) in performance between genders among Malaysian young learners in rural primary schools. The data were obtained from randomly selected respondents, 106 male and 144 female young learners from the target group. The EPFun instrument consists of four constructs: a) Usefulness (USFN), b) Ease of Use (EOU), c) Ease of Learning (EOL) and d) Satisfaction (SAT). The instrument comprises 40 items across the 4 constructs, each answered on a 5-level scale of coded smileys. The data were analysed using SPSS and Winsteps version 3.68.2, a Rasch-based item analysis programme. The findings indicate that there is no significant difference in DIF performance between male and female young learners. This conclusion is derived from the t-test for Equality of Means, in which the Sig. (2-tailed) value for every construct (USFN: Sig = 0.558; EOU: Sig = 0.638; EOL: Sig = 0.628; SAT: Sig = 0.500) was 0.5 or larger. The findings also revealed that only 7 items were detected as DIF. However, none of them shows significant DIF based on the size range of the logit scale. Female young learners excel in one construct, Satisfaction (SAT). However, male young learners are on a par with female young learners with regard to the other constructs. Thus, this instrument is free from DIF and could be used as an indicator to gauge young learners’ literacy skills, especially in using hypermedia reading materials in English.

Aziz, J. , Mohamad, M. , Shah, P. and Din, R. (2016) Differential Item Functioning in Online Learning Instrument (EPFun). Creative Education, 7, 180-188. doi: 10.4236/ce.2016.71018.

Received 25 November 2015; accepted 26 January 2016; published 29 January 2016

1. Introduction

As they advocate a 21st-century learning environment, educators are highly encouraged to promote meaningful learning contexts. To do so, they should conduct activities that integrate the use of technology (Juhaida, 2014; Nachmias & Segev, 2003). This is because the net generation is interested in and surrounded by new digital technologies. Some of them already have the relevant knowledge and skills in technologies. This exposure helps them to communicate, interact, and reflect constructively during their learning process (Nachmias & Segev, 2003; Maslawati et al., 2011; Zaharudin et al., 2011; Juhaida, 2014; Juhaida et al., 2014). The digital environment also promotes fun and authentic learning, which is an essential part of young learners’ lives (Cauffman & MacIntosh, 2006; Rosseni, 2010; Rahamat et al., 2011).

2. Methodology

This study used a survey design. The randomly selected sample consisted of 106 male and 144 female young learners from rural primary schools in Malaysia. The data gathered were analysed using SPSS and Winsteps version 3.68.2, a Rasch-based item analysis programme. The EPFun instrument was developed by a group of researchers (Juhaida, 2014; Juhaida et al., 2014; Rosseni et al., 2011). It consists of 40 items that examine four constructs: a) Usefulness (10 items), b) Ease of Use (10 items), c) Ease of Learning (10 items) and d) Satisfaction (10 items). To measure the data gathered, Item Response Theory (IRT) in the form of Rasch measurement analysis was used to identify DIF in the instrument. This step is necessary to ensure the quality of the items. In this way, the items can be further improved and gender bias can be avoided or reduced.

3. Results

Problem Solution with Rasch-Based Item Analysis

Table 1 and Table 2 display the group statistics to verify the reliability and validity of the items used in the instrument. They are also used to show if there are any differences in mean between the male and female groups in answering the four constructs using the EPFun survey instrument. Based on both tables, it can be concluded that there is no significant difference in Web-based Fun learning for male and female young learners.

Table 1. Group statistics.

Table 2. Independent samples test.

Note: Usefulness (USFN); Ease of Use (EOU); Ease of Learning (EOL) and Satisfaction (SAT).
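The comparison of group means reported in Table 1 and Table 2 can be illustrated with a minimal sketch of an independent-samples t-test. This is not the SPSS procedure itself: the function name `pooled_t_test` is hypothetical, equal variances are assumed, and the two-sided p-value uses a normal approximation, which is only reasonable for large samples such as the 106 and 144 respondents in this study.

```python
import math
import statistics


def pooled_t_test(group_a, group_b):
    """Return (t, df, approx_p) for an equal-variance two-sample t-test.

    approx_p is a large-sample normal approximation to the two-sided
    p-value, not the exact t-distribution value that SPSS reports.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

    # Pooled variance weights each sample variance by its degrees of freedom.
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

    t = (mean_a - mean_b) / std_error
    approx_p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, df, approx_p


# Illustrative construct scores only; these are not the study data.
t, df, p = pooled_t_test([2, 4, 6], [1, 3, 5])
print(f"t = {t:.3f}, df = {df}, approximate p = {p:.3f}")
```

A Sig. (2-tailed) value above the chosen significance level means the difference in group means is not statistically significant.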

To verify whether gender bias existed in the four constructs of the EPFun instrument, further analysis was carried out using Winsteps version 3.68.2, a Rasch-based item analysis programme. Rasch analysis converts raw scores to logits. The logits are compared against a linear model to find the odds of success. Logits are log-odds and are not bounded; it is the corresponding probability of success that lies within the range of 0 to 1. Reliability is measured by the ability of the scale to locate the level of the attribute (Bond & Fox, 2011). The purpose is to ascertain validity: even when the same constructs are given to other groups of respondents in different environments, the same ability estimates should be produced (Bond & Fox, 2011; Tennant & Pallant, 2007).
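The relationship between logits and the probability of success can be sketched as follows. This is a simplified illustration that assumes the dichotomous Rasch model; EPFun uses 5-level items, for which a rating-scale extension would be needed, and the function names are hypothetical rather than part of Winsteps.

```python
import math


def logit(probability):
    """Convert a probability (strictly between 0 and 1) to log-odds."""
    return math.log(probability / (1 - probability))


def rasch_probability(ability, difficulty):
    """Probability of success under the dichotomous Rasch model.

    Both ability and difficulty are expressed in logits.
    """
    return 1 / (1 + math.exp(-(ability - difficulty)))


# A person whose ability equals the item difficulty has a 50% chance of success.
print(rasch_probability(0.0, 0.0))
# A person one logit above the item difficulty has a higher chance of success.
print(rasch_probability(1.0, 0.0))
```

Because the model depends only on the difference between ability and difficulty, persons and items can be placed on the same logit scale.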

Two significant elements of validity are criterion and construct (Linacre, 2006). Criterion-related validity concerns the ability to predict an outcome. Construct validity concerns whether the items reflect the construct being measured. As stated by Linacre (2006), point-measure correlations should be positive. Every item should make a meaningful contribution to the construct (Tennant & Pallant, 2007; Bond & Fox, 2011). Item fit is assessed with the mean-square residual fit statistics (Bond & Fox, 2011). The fit statistics have an expected value of 1.0 and range from 0 to infinity. Departures from the expected value represent a lack of fit between the items and the model. Values lower than expected can be interpreted as item redundancy or overlap. Bond and Fox (2011) recommended that item mean squares for Infit and Outfit on a rating scale (Likert/survey) range from 0.6 to 1.4.
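The mean-square fit statistics can be illustrated with a short sketch. It assumes the dichotomous Rasch model and already-estimated person abilities and item difficulty (both in logits), so it is a simplification of what Winsteps computes for 5-level items; the function names are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of success under the dichotomous Rasch model."""
    return 1 / (1 + math.exp(-(ability - difficulty)))


def item_fit(responses, abilities, difficulty):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    squared_residuals = []
    variances = []
    for observed, ability in zip(responses, abilities):
        expected = rasch_probability(ability, difficulty)
        variances.append(expected * (1 - expected))
        squared_residuals.append((observed - expected) ** 2)

    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    # Infit: information-weighted mean square.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


def within_recommended_range(mean_square, low=0.6, high=1.4):
    """Check a mean square against the 0.6 to 1.4 range cited in the text."""
    return low <= mean_square <= high


# Illustrative responses only; these are not the study data.
infit, outfit = item_fit([1, 0], [0.0, 0.0], 0.0)
print(infit, outfit, within_recommended_range(infit))
```

Values well below 1.0 suggest responses that are more predictable than the model expects, while values well above 1.0 suggest noise.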

Table 3 and Table 4 present the statistical summaries of the item and person reliability indices, respectively. The item reliability index is 0.95, whereas the person reliability index is 0.91. These values are favourable, as they are close to 1.0 (Bond & Fox, 2011). Hence, the EPFun item ordering is likely to be reproduced if the instrument is administered to other groups of respondents with similar capabilities (Bond & Fox, 2011). The item separation index is 4.27, which indicates that the EPFun items could be divided into 4 strata. In other words, the spread of the items is about 4 times the root mean square error. Item and person reliability indices of ≥0.8 are regarded as acceptable and strong (Bond & Fox, 2011). In addition, a separation index of ≥2.0 is also acceptable (Bond & Fox, 2011).
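The reported item reliability and item separation index are linked by a standard Rasch relationship, reliability = G² / (1 + G²), where G is the separation index. The sketch below checks that the two reported values are consistent with each other; the function names are hypothetical.

```python
import math


def reliability_from_separation(separation):
    """Reliability implied by a separation index G: G^2 / (1 + G^2)."""
    return separation ** 2 / (1 + separation ** 2)


def separation_from_reliability(reliability):
    """Separation index implied by a reliability R: sqrt(R / (1 - R))."""
    return math.sqrt(reliability / (1 - reliability))


# The reported item separation of 4.27 implies a reliability of about 0.95.
print(round(reliability_from_separation(4.27), 2))
```

The computed value agrees with the item reliability index of 0.95 reported in the text.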

Table 5 presents a summary of the point-measure correlation (PTMEA CORR) for the 40 items in EPFun. All items show a positive value, with an index of ≥0.13. The minimum index is 0.13 for item A4 (Usefulness). The maximum index is 0.84 for item D39 (Satisfaction). The positive PTMEA CORR values indicate that the items are carefully constructed and work in the same direction in measuring the 4 constructs. This analysis is the basic step in assessing the construct validity used to build and validate the EPFun instrument. The PTMEA CORR index can be increased if misfitting items are dropped from the cluster item measurement.

Table 3. Summary statistics of item reliability index.

Table 4. Summary statistics of person reliability index.

Table 5. Point measure correlation (PTMEA CORR) of EPFun constructs.
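As an illustration of the point-measure correlation discussed above, the sketch below computes a Pearson correlation between the responses to one item and the respondents' Rasch measures. The function name is hypothetical and the data are illustrative only; Winsteps reports this index directly.

```python
import math


def point_measure_correlation(item_scores, person_measures):
    """Pearson correlation between item responses and person measures (logits)."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(person_measures) / n
    covariance = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(item_scores, person_measures)
    )
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in person_measures)
    return covariance / math.sqrt(var_x * var_y)


# Illustrative 5-level responses and person measures; not the study data.
scores = [1, 2, 3, 4, 5]
measures = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(point_measure_correlation(scores, measures))
```

A positive value means that respondents with higher measures tend to give higher ratings on the item, which is the direction expected of an item that contributes to the construct.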

Figure 1 shows the distribution of respondents and the hierarchy of item difficulty along a logit scale. The results confirm that the items are well spread and targeted at the respondents’ capability level. Respondents with high capability (easy to agree) are placed at the upper end of the scale, whilst respondents with low capability (difficult to agree) are placed at the lower end. The items that are most difficult to agree with are A4 and B13, with a measure of 1.58 logits at the top of the scale. In contrast, the easiest items to agree with are C25 and D37, with a measure of −1.85 logits at the lower end of the scale. Items perceived to be difficult could be endorsed by respondents with high capability. On the other hand, easy items could be endorsed by respondents with both high and low capability (Bond & Fox, 2011; Linacre, 2006). Overlapping items measure different elements with diverse levels of difficulty (Bond & Fox, 2011).

DIF exists when one group of respondents scores higher than another group of comparable ability on the same item. In this study, this phenomenon could be observed in Figures 2-5. For an item to be free of DIF, its parameters should be similar across the groups in the population. As such, to determine if DIF exists in the instrument used, three indicators were used (Bond & Fox, 2011; Tennant & Pallant, 2007), namely:

i) t value of < −2.0 or > 2.0

ii) DIF contrast value of < −0.5 or > 0.5

iii) p (probability) value of < 0.05

The three indicators were examined thoroughly. An item is considered biased if all three conditions appear, and it should be dropped from the instrument. If the item meets only one of the conditions, it should not be rejected but should be separated and fixed (Rosseni et al., 2011). With respect to the three indicators given, DIF in the EPFun constructs can be determined from Figures 2-5.

Figure 1. Map of EPFun person-item.

Figure 2. DIF for usefulness construct.

Figure 3. DIF for ease of use construct.
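The three-indicator decision rule described above can be sketched as a small function. Several points are assumptions rather than statements from the study: the function names are hypothetical, the DIF t statistic is taken as the contrast divided by the joint standard error of the two group estimates, and an item meeting two of the three conditions is treated the same way as an item meeting one, since the text does not cover that case.

```python
import math


def dif_contrast(difficulty_group_1, difficulty_group_2):
    """Difference in item difficulty (logits) between two groups."""
    return difficulty_group_1 - difficulty_group_2


def dif_t(contrast, se_group_1, se_group_2):
    """t statistic for a DIF contrast, using the joint standard error (assumed)."""
    return contrast / math.sqrt(se_group_1 ** 2 + se_group_2 ** 2)


def classify_item(t, contrast, p):
    """Apply the three indicators: |t| > 2.0, |contrast| > 0.5, p < 0.05."""
    conditions_met = sum([abs(t) > 2.0, abs(contrast) > 0.5, p < 0.05])
    if conditions_met == 3:
        return "drop"
    if conditions_met >= 1:
        # Assumption: two conditions are handled like one condition.
        return "separate and fix"
    return "keep"


# Illustrative values only; these are not items from EPFun.
contrast = dif_contrast(0.9, 0.3)
print(classify_item(dif_t(contrast, 0.3, 0.4), contrast, 0.23))
```

In the illustrative call, only the contrast exceeds its threshold, so the item would be revised rather than dropped.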

Figure 2 reveals a good pattern for both male and female respondents, as there is not much gap or distance between the two lines (Blue line 1 and Red line 2). Nevertheless, items A06 to A10 exhibit a small gap between the two lines. This suggests the items are more difficult for male respondents (Blue line 1) and easier for female respondents (Red line 2). The t-test for Equality of Means gave a Sig. (2-tailed) value of 0.558, which is more than the 0.5 used as the cut-off point for DIF. The two lines are very close and little difference can be measured. Thus, it is safe to conclude that there is no gender bias for items in the first construct (Usefulness).

Figure 4. GDIF for ease of learning construct.

Figure 5. GDIF for satisfaction construct.

For the second construct, as illustrated in Figure 3, the same phenomenon exists. The two lines do not display any significant differences within the logit scale. The t-test for Equality of Means gave a Sig. (2-tailed) value of 0.638, which is more than the 0.5 cut-off used for the DIF contrast value. Again, the two lines are very close, and little difference can be measured except for items B11, B12, B17, B19 and B20. Nevertheless, none of the ten items in the second construct (Ease of Use) shows any sign of bias, although the hardest item, B12, appears to be easier for male respondents and item B17 appears to be easier for female respondents. The values indicated on the logit scale are not large enough to make either a biased item. Hence, the conclusion can be drawn that all items that measure the second construct (Ease of Use) are fair to both genders.

Similar to the first and second constructs, the third construct (Ease of Learning) in Figure 4 also reveals the same phenomenon. The t-test for Equality of Means gave a Sig. (2-tailed) value of 0.628, above 0.5. From the graph, however, the lines show a noticeable gap between male and female respondents on several items. Out of ten items in this construct, six items, namely C21, C23, C26, C28, C29 and C30, exhibit distance between the two lines. This suggests that items C21, C23 and C26 are more difficult for male respondents (Blue line 1) and easier for female respondents (Red line 2). Items C28, C29 and C30 show the opposite pattern. In the EPFun questionnaire, items C21, C23 and C26 ask for the respondent’s opinion on how to use the blog. Asking for an opinion and expecting an answer that can be neither right nor wrong could be considered a higher-order thinking skill (HOTs) (Juhaida, 2014). Generally, females are known to be able to exercise HOTs better than males (Din et al., 2011; Prensky, 2001; Prensky, 2007). This explanation can be accepted because males tend to take matters lightly, especially if the matters do not trigger their interest. Males seldom focus on details or irrelevant issues (Hanizah et al., 2006; Sheppard et al., 2006).

For the last construct (Satisfaction), as shown in Figure 5, seven out of ten items are detected to have a gap between the two lines (Blue line 1 and Red line 2). This indicates gender bias. Items D31, D38, D39 and D41 are identified as easier for male respondents. Items D32, D35 and D37 show the opposite pattern. The t-test for Equality of Means gave a Sig. (2-tailed) value of 0.5, which is equal to the 0.5 cut-off used for the DIF contrast value. The value does not depart much from the cut-off value of the indicator. Though the items indicate gender bias, the researchers were reluctant to drop them. This is because moderately biased items are not considered problematic, especially when the direction of the bias is unsystematic (Tennant & Pallant, 2007). Therefore, the items need to be separated, fixed and then reused in the instrument.

Meanwhile, several surprising findings emerged with regard to the issue discussed (Sheppard et al., 2006; Zaharudin et al., 2011). Although gender bias was detected in some items under the Satisfaction construct, this alone was not sufficient reason for the researchers to drop these items. Overall, the findings of this study indicate that there is no significant difference between male and female young learners in gaining fun during web-based learning. These findings parallel another study, which indicates that item bias does not adversely affect the measurement quality and predictive validity of the overall instrument (Sheppard et al., 2006). Hence, there is no difference with regard to the capability of male and female young learners in Fun learning during web-based learning, as perceived by some scholars (Prensky, 2001; Rahamat et al., 2011). They believe that male young learners are expected to be more powerful in certain aspects, especially physical endurance. Female young learners, on the other hand, are more at ease with cognitive matters, namely mathematical and scientific areas. Female young learners are also capable of achieving good results in the domain of technology. For these reasons, it is imperative that a digital learning environment be developed for both genders. Despite all the issues discussed, DIF analysis indicates that female young learners are prone to possess more non-verbal intelligence than male young learners. Male young learners, on the other hand, are prone to have more verbal intelligence (Hanizah et al., 2006). Each learner has a unique learning style, which includes learning strategies, cognitive levels, the types of instruction to which the learner responds best, and perceptions and attitudes toward the nature of knowledge.

4. Conclusion

It is safe to conclude that EPFun is a valid and reliable instrument, since only 7 out of 40 items were detected to be gender biased. The solution was to simplify and improve the sentence construction of these items so that they could be retained and easily understood by the young respondents. Generally, Malaysian young learners are robust in striving to be proficient in their field (Din et al., 2011). The issue of gender bias can be eliminated or reduced only through educators’ correct perceptions. Whether an instrument is created or adapted, the process of measuring the reliability and validity of the instrument should be the main focus of any researcher.

New and appropriate teaching approaches will, it is hoped, increase students’ achievement and motivation to learn (El-Bakry et al., 2011; Nachmias & Segev, 2003; Norman et al., 2011; Prensky, 2007; Wheeler, 2011; Wood, 2010). With the awareness that the development of web-based Fun learning is possible within the context of the primary educational system in Malaysia, appropriate measures could be taken to attract young learners, regardless of gender, toward new multimodal modes of learning. In a nutshell, DIF benchmarking could also be administered in primary schools throughout Malaysia.

Conflicts of Interest

The authors declare no conflicts of interest.


[1] Bond, T. G., & Fox, C. M. (2011). Applying the Rasch Model: Fundamental Measurement in the Human Sciences. NJ: Lawrence Erlbaum Associates.
[2] Cauffman, E., & MacIntosh, R. (2006). A Rasch Differential Item Functioning Analysis of the Massachusetts Youth Screening Instrument: Identifying Race and Gender Differential Item Functioning among Juvenile Offenders. Educational and Psychological Measurement, 66, 502-521.
[3] El-Bakry, H. M., Saleh, A. A., Asfour, T. T., & Mastorakis, N. (2011). A New Adaptive e-Learning Model Based on Learner’s Styles. In M. Demiralp, Z. Bojkovic, & A. Repanovici (Eds.), Mathematical Methods & Techniques in Engineering and Environmental Science, Proceedings of the 13th WSEAS International Conference on Mathematical and Computational Methods in Science and Engineering (MACMESE’II).
[4] Juhaida, A. A. (2014). Development and Evaluation of a Web-Based Fun Reading English for Primary Learners. Ph.D. Thesis, National University of Malaysia.
[5] Juhaida, A. A., Parilah, M. S., & Rosseni, D. (2014). A Paradigm in Education: Validation of Web-Based Learning for Young Learners. International Journal of Education and Information Technologies, 8, 288-293.
[6] Linacre, J. M. (2006). WINSTEPS: Facets Rasch Measurement Computer Program. Chicago: Winsteps.com.
[7] Maslawati, M., Azura, O., Supyan, H., & Zaini, A. (2011). Evaluating the Effectiveness of e-Learning in Language Classrooms. E-ACTIVITIES’11 Proceedings of the 10th WSEAS International Conference on e-Activities, 112-117.
[8] Nachmias, R., & Segev, L. (2003). Students’ Use of Content in Web-Supported Academic Courses. Internet and Higher Education, 6, 145-157.
[9] Norman, H., Rosseni, D., & Nordin, N. (2011). A Preliminary Study of an Authentic Ubiquitous Learning Environment for Higher Education. Proceedings of the 10th WSEAS e-Activities.
[10] Prensky, M. (2001). Digital Natives, Digital Immigrants. On the Horizon, 9, 1-6.
[11] Rahamat, R., Shah, P. M., Rosseni, D., Nor, R., Puteh, S., Mohamed Amin, E., & Abdul Aziz, J. (2011). Learners’ Evaluation of an e-Learning Material. Proceedings of the 10th WSEAS e-Activities.
[12] Rosseni, D. (2011). Development and Validation of an Integrated Meaningful Hybrid e-Training (I-Met) for Computer Science: Theoretical-Empirical Based Design and Development Approach. Ph.D. Thesis, Universiti Kebangsaan Malaysia.
[13] Rosseni, D., Norazah, N. K., Jusof, F., Sahar, M., Nordin, I., Zakaria, M. S., Khairul Anwar, M., & Mohamed Amin, E. (2011). Hybrid E-Training Measurement Tool: Reliability and Validity. Middle East Journal of Scientific Research, 7, 40-45.
[14] Rosseni, D., Verawati, M. F. K., & Johar, N. (2011). Gender Differential Item Functioning (GDIF) Analysis for the Meaningful e-Learning Instrument. Proceedings of the 10th WSEAS e-Activities, 40-45.
[15] Sheppard, R., Kyunghee, H., Colarelli, S. M., Guangdong, D., & King, D. W. (2006). Differential Item Functioning by Sex and Race in the Hogan Personality Inventory. Assessment, 13, 442-453.
[16] Siti Rahayah, A., Noriah, M. I., Abdul Ghafur, A., Rodiah, I., & Nur’ Ashiqin, N. (2008a). Communication, Leadership, and Teamwork Skills as Core Competencies among Higher Education Students. Proceedings of the ASAIHL International Conference, 7-10 April, Bangkok, 149-158.
[17] Siti Rahayah, A., Rodiah, I., & Noriah, M. I. (2008b). Profil kemahiran generik pelajar-pelajar institut pengajian tinggi: Kajian kes di Universiti Kebangsaan Malaysia (UKM). Seminar Kebangsaan Jawatankuasa Penyelaras Pendidikan Guru.
[18] Stoneberg Jr., B. D. (2004). A Study of Gender-Based and Ethnic-Based Differential Item Functioning (DIF) in the Spring 2003 Idaho Standards Achievement Tests Applying the Simultaneous Bias Test (SIBTEST) and the Mantel-Haenszel Chi Square Test. Internship in Measurement and Statistics, 1-15.
[19] Tennant, & Pallant, J. (2007). SPSS Survival (3rd Ed.). NY: Open University Press.
[20] Wheeler, S. (2011). Teacher Resistance to New Technologies: How Barriers to Web Enhanced Learning Can Be Overcome. In G. Trentin, & M. Repetto (Eds.), Faculty Training for Web Enhanced Learning, Nova Science.
[21] Wood, S. L. (2010). Technology for Teaching and Learning. International Journal of Teaching and Learning in Higher Education, 22, 299-307.
[22] Zaharudin, R., Nordin, N. M., Yasin, M. H. M., Din, R., & Embi, M. A. (2011). Observation on the Deaf Students’ Interaction in Learning ICT-Courses. Proceedings of the 10th WSEAS e-Activities.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.