IRT parameter estimation with Bayesian MCMC methods for small samples in Islamic schools

Vol. 5 No. 1 (2025)

Muhammad Ali Gunawan (1), Nor Syamimi Mohamed Adnan (2), Ari Setiawan (3)

(1) Sekolah Tinggi Agama Islam Ki Ageng Pekalongan, Indonesia
(2) Universiti Malaysia Perlis, Malaysia
(3) Universitas Sarjanawiyata Tamansiswa, Indonesia

Abstract:

This study aims to estimate item parameters in Item Response Theory (IRT) using the Bayesian Markov Chain Monte Carlo (MCMC) method in the context of Islamic schools in Pekalongan Regency/City, where small sample sizes pose a challenge. Unlike conventional methods such as maximum likelihood estimation, which tend to yield biased results with limited data, Bayesian MCMC incorporates prior knowledge and contextual information to improve estimation accuracy. Simulated datasets with varying sample sizes (30, 100, 300, 1000) and item numbers (10, 25, 30, 40) were used to compare the performance of Bayesian MCMC with traditional IRT methods. The results show that Bayesian MCMC produces more stable and accurate estimates, particularly in small-sample conditions. These findings suggest that Bayesian approaches are effective for psychometric analysis in Islamic education settings. The study concludes that Bayesian MCMC is a valuable method for improving the robustness of item parameter estimation in limited-data contexts.
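As a rough illustration of the kind of simulation the abstract describes, the sketch below generates binary responses for the smallest condition (30 examinees, 10 items) and estimates item difficulties with a simple Metropolis-within-Gibbs sampler. The abstract does not state which IRT model, priors, or software the authors used, so the Rasch (one-parameter) model, the standard normal priors, the evenly spaced true difficulties, and the function names `simulate_rasch` and `metropolis_rasch` are all assumptions made for this example only.

```python
import numpy as np

def simulate_rasch(n_persons, n_items, rng):
    """Simulate 0/1 responses under a Rasch model (assumed for illustration)."""
    theta = rng.normal(0.0, 1.0, size=n_persons)   # examinee abilities
    b = np.linspace(-1.5, 1.5, n_items)            # hypothetical item difficulties
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    y = (rng.random((n_persons, n_items)) < p).astype(int)
    return y, theta, b

def loglik_terms(y, theta, b):
    """Per-response Bernoulli log-likelihood: y*eta - log(1 + exp(eta))."""
    eta = theta[:, None] - b[None, :]
    return y * eta - np.logaddexp(0.0, eta)

def metropolis_rasch(y, n_iter=2000, burn_in=500, step=0.5, seed=0):
    """Random-walk Metropolis-within-Gibbs with assumed N(0, 1) priors."""
    rng = np.random.default_rng(seed)
    n, j = y.shape
    theta, b = np.zeros(n), np.zeros(j)
    draws = np.empty((n_iter - burn_in, j))
    for it in range(n_iter):
        # Abilities: given b, each theta_i depends only on its own row,
        # so all proposals can be accepted or rejected independently.
        prop = theta + rng.normal(0.0, step, size=n)
        new = loglik_terms(y, prop, b).sum(axis=1) - 0.5 * prop**2
        old = loglik_terms(y, theta, b).sum(axis=1) - 0.5 * theta**2
        theta = np.where(np.log(rng.random(n)) < new - old, prop, theta)
        # Difficulties: given theta, each b_j depends only on its own column.
        prop = b + rng.normal(0.0, step, size=j)
        new = loglik_terms(y, theta, prop).sum(axis=0) - 0.5 * prop**2
        old = loglik_terms(y, theta, b).sum(axis=0) - 0.5 * b**2
        b = np.where(np.log(rng.random(j)) < new - old, prop, b)
        if it >= burn_in:
            draws[it - burn_in] = b
    return draws

# Smallest condition named in the abstract: 30 examinees, 10 items.
y, theta_true, b_true = simulate_rasch(30, 10, np.random.default_rng(1))
draws = metropolis_rasch(y)
b_hat = draws.mean(axis=0)              # posterior mean difficulty per item
b_sd = draws.std(axis=0)                # posterior spread per item
print(np.round(b_hat, 2))
print(np.round(b_sd, 2))
```

The priors are what keep the estimates finite when an item is answered correctly (or incorrectly) by every examinee, a case in which an unpenalized maximum likelihood estimate does not exist. Repeating the run for the other sample sizes and item counts listed in the abstract, and comparing `b_hat` against `b_true`, would give a comparison in the spirit of the study, though not a reproduction of it.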

Author Biography

Muhammad Ali Gunawan, Sekolah Tinggi Agama Islam Ki Ageng Pekalongan

Program Studi Manajemen Pendidikan Islam, Sekolah Tinggi Agama Islam Ki Ageng Pekalongan
