Investigating Differences between Genders in an English Entrance Test in a Public University


  • Kamarul Ariffin Ahmad Faculty of Languages and Communication, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, MALAYSIA
  • Izazol Idris Faculty of Human Development, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, MALAYSIA



DIF, entrance test, gender, Rasch


An English entrance test is designed to select candidates for the English programmes offered in public universities. Many entrance test instruments have been developed for this purpose, but quite a number of instruments developed by lecturers are left without further investigation of how they function for all test-takers. This research analysed the reading, vocabulary and grammar instrument that a public university in Malaysia uses as the entrance test for its English programme. The paper consists of 30 multiple-choice questions (10 for each section) and was administered to selected candidates who scored a minimum grade of B+ in the Sijil Pelajaran Malaysia (SPM) examination. This research had four aims:

- to determine the reliability of the instrument;
- to examine how each gender interacted with each item by comparing facility indices;
- to identify items showing differential item functioning (DIF);
- to classify these items according to the CEFR, Bloom’s taxonomy and grammar topic.

The analysis with Winsteps produced five main findings:

- the reliability of the instrument is very good (.98);
- female test-takers perform better on vocabulary items at higher CEFR levels;
- male candidates perform better than their female counterparts on prepositions and tenses;
- female candidates perform well on items at higher levels of Bloom’s taxonomy;
- three items are flagged as showing DIF.

This research concluded that the instrument requires further improvement even though its reliability index is high. The items flagged as showing DIF need to be reviewed and either replaced or rewritten.
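The abstract relies on three quantitative steps: comparing facility indices by gender, flagging DIF items from a Rasch analysis in Winsteps, and reporting a reliability index. The Python sketch below illustrates the arithmetic behind each step under stated assumptions. It is not the authors' procedure or Winsteps' own code, and all function names and example numbers are hypothetical.

The assumptions are as follows:

- The DIF contrast and its joint standard error follow the pairwise logic described for Winsteps Table 30.1.
- The 0.5-logit cut-off is an assumed rule of thumb.
- The p-value uses a normal approximation rather than Welch's t-distribution.
- The abstract does not say whether the reported .98 is item or person reliability. The separation-reliability formula shown applies to either.

```python
import math
from statistics import NormalDist


def facility_index(responses):
    """Proportion of test-takers who answered an item correctly (1 = correct, 0 = incorrect)."""
    if not responses:
        raise ValueError("no responses for this item")
    return sum(responses) / len(responses)


def dif_contrast(d_group1, se_group1, d_group2, se_group2):
    """DIF contrast in logits and its t statistic for one item.

    d_* are the item's Rasch difficulty estimates calibrated separately for
    each group; se_* are their standard errors. A positive contrast means the
    item was harder for group 1. The contrast is divided by the joint
    standard error sqrt(se1^2 + se2^2).
    """
    contrast = d_group1 - d_group2
    joint_se = math.sqrt(se_group1 ** 2 + se_group2 ** 2)
    return contrast, contrast / joint_se


def flag_dif(contrast, t, min_contrast=0.5, alpha=0.05):
    """Flag an item when the contrast is both large and statistically significant.

    Assumptions: the 0.5-logit cut-off is a common rule of thumb, and the
    two-sided p-value uses a normal approximation rather than Welch's t.
    """
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return abs(contrast) >= min_contrast and p < alpha


def separation_reliability(measures, standard_errors):
    """Rasch separation reliability: (observed variance - mean error variance) / observed variance.

    Population variance (dividing by n) is assumed here.
    """
    n = len(measures)
    mean = sum(measures) / n
    observed_var = sum((m - mean) ** 2 for m in measures) / n
    error_var = sum(se ** 2 for se in standard_errors) / n
    return (observed_var - error_var) / observed_var


if __name__ == "__main__":
    # Hypothetical responses to one item, not data from the study.
    male = [1, 1, 0, 1]
    female = [1, 0, 0, 1]
    print("facility (male, female):", facility_index(male), facility_index(female))

    # Hypothetical group-specific difficulties (logits) and standard errors.
    contrast, t = dif_contrast(0.8, 0.20, 0.1, 0.25)
    print(f"contrast={contrast:.2f}, t={t:.2f}, flagged={flag_dif(contrast, t)}")

    # Hypothetical item measures and standard errors for the reliability formula.
    print(f"reliability={separation_reliability([-1.0, 0.0, 1.0], [0.1, 0.1, 0.1]):.3f}")
```

Running the script prints three results for the toy values: the two facility indices, the DIF contrast with its t statistic and flag, and a reliability value. With real data, the group-specific difficulties and standard errors would come from the Winsteps DIF output rather than being typed in by hand.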







How to Cite

Ahmad, K. A., & Idris, I. (2022). Investigating Differences between Genders in an English Entrance Test in a Public University. AJELP: Asian Journal of English Language and Pedagogy, 10(2), 56–67.