Test Equating in Educational Assessment: A Comprehensive Framework for Promoting Fairness, Validity, and Cross-Cultural Equity
DOI: https://doi.org/10.37134/ajatel.vol14.1.7.2024
Keywords: Test Equating, Educational Assessment, Fairness, Validity, Cross-Cultural Equity
Abstract
This study presents a comprehensive conceptual framework for test equating in educational assessment. Its aim is to improve the accuracy, validity, and fairness of equating outcomes. The framework emphasizes the importance of considering sample characteristics, statistical assumptions, model fit, advances in equating methodology, the integration of technology, and considerations of equity and fairness. By attending to these elements, educational institutions can strengthen their equating practices and support fair and equitable evaluation. The framework also has implications for policy-making and assessment procedures. It provides a foundation for evidence-based policies that promote accountability and effective evaluation, and policymakers can draw on it to develop policies that ensure fair and valid assessment practices. The study further highlights the critical role of empirical research in validating and refining the framework. It also advocates exploring cross-cultural equating methodologies that address the diverse cultural contexts of education. To advance the profession, the study recommends conducting empirical studies, embracing technology, fostering collaboration, strengthening reporting standards, training practitioners, and monitoring equating practices. These efforts should help produce more accurate, fair, and valid equating outcomes. Overall, the study offers a basis for enhancing fairness, validity, and cross-cultural equity in educational evaluation.
License
Copyright (c) 2024 Caroline Alordiah, John Oji
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.