Comparison of Artificial Intelligence-Generated Questions and Item Parameters Under MST Test Conditions

Authors

Bulut, G. & Akyıldız, M.

DOI:

https://doi.org/10.5281/zenodo.12637347

Keywords:

Assessment, Artificial Intelligence, Multistage Testing, ChatGPT

Abstract

One of the major implications of the rapid proliferation of smart technologies in education concerns measurement and evaluation. By its very nature, measurement must be carried out in a secure environment if it is to produce results that are close to the truth. As new technologies have become part of everyday life and have been integrated into many stages of the education system, measurement and evaluation processes need to develop at the same pace. In a system where smart technologies are integrated into teaching and learning at a level that can be described as a paradigm shift, traditional measurement methods are an obstacle to obtaining sound results. The fact that the methods most widely used in the current system are still the traditional ones points to the need for research in this area. In this context, carrying out measurement and evaluation with methods and technologies that allow precise measurement is an important step. The aim of this study is to demonstrate, in concrete terms, how smart technologies can be used in measurement and evaluation processes. The study therefore brings together a modern test administration method and a new smart technology that together enable precise measurement. Multistage Testing (MST) and ChatGPT were first examined theoretically. Next, questions were generated with ChatGPT within the constraints of the study, and the b-parameters of these questions were estimated by ChatGPT. In a further stage, the b-parameters of the same questions were calibrated through test assembly using the multistage method. Finally, the b-parameters obtained from ChatGPT and from MST were compared. The results show that the margin of error is not very high and that supervised use of ChatGPT within the MST method is appropriate.
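The comparison described above reduces to quantifying agreement between two sets of item difficulty (b) estimates for the same items. The sketch below is a minimal illustration only, not the authors' code or data: the two b-value vectors (one attributed to ChatGPT, one to an MST calibration) are hypothetical placeholders, and the agreement statistics shown (RMSE, mean absolute difference, Pearson correlation) are common choices for this kind of comparison rather than the specific indices reported in the study.

```python
import math

# Hypothetical item difficulty (b) estimates on the logit scale for ten items.
# These values are illustrative placeholders, not the study's results.
b_chatgpt = [-1.2, -0.8, -0.3, 0.0, 0.4, 0.7, 1.1, 1.5, 1.9, 2.3]
b_mst     = [-1.0, -0.9, -0.1, 0.2, 0.3, 0.9, 1.0, 1.6, 2.1, 2.2]

def mean(xs):
    return sum(xs) / len(xs)

def rmse(a, b):
    """Root mean squared difference between paired estimates."""
    return math.sqrt(mean([(x - y) ** 2 for x, y in zip(a, b)]))

def mae(a, b):
    """Mean absolute difference between paired estimates."""
    return mean([abs(x - y) for x, y in zip(a, b)])

def pearson_r(a, b):
    """Pearson correlation between the two sets of estimates."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

print(f"RMSE = {rmse(b_chatgpt, b_mst):.3f}")
print(f"MAE  = {mae(b_chatgpt, b_mst):.3f}")
print(f"r    = {pearson_r(b_chatgpt, b_mst):.3f}")
```

Small values of RMSE and MAE together with a high correlation would indicate that the two sets of b-parameters agree closely, which is the sense in which the abstract reports that "the margin of error is not very high".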

References

Adedoyin, O. & Mokobi, T. (2013). Using IRT psychometric analysis in examining the quality of junior certificate mathematics multiple choice examination test items. International Journal of Asian Social Science, 3(4), 992–1011. Retrieved from https://archive.aessweb.com/index.php/5007/article/view/2471

Adeshola, I. & Adepoju, A. P. (2023). The opportunities and challenges of ChatGPT in education. Interactive Learning Environments, 0(0), 1–14. https://doi.org/10.1080/10494820.2023.2253858

Ariel, A., Veldkamp, B. P. & Breithaupt, K. (2006). Optimal testlet pool assembly for multistage testing designs. Applied Psychological Measurement, 30(3), 204–215. doi:10.1177/0146621605284350

Baidoo-Anu, D. & Ansah, L. O. (2023). Education in the era of generative artificial intelligence (AI): Understanding the potential benefits of ChatGPT in promoting teaching and learning. Journal of AI, 7(1), 52-62. http://dx.doi.org/10.2139/ssrn.4337484

Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., ... & Fung, P. (2023). A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. Retrieved from http://arxiv.org/pdf/2302.04023

Bechger, T., Koops, J., Partchev, I. & Maris, G. (2023). dexterMST: CML and Bayesian Calibration of Multistage Tests. R package version 0.9.6. URL: https://CRAN.R-project.org/package=dexterMST

Bengio, Y., LeCun, Y. & Hinton, G. (2021). Deep learning for AI. Communications of the ACM, 64(7), 58-65. doi:10.1145/3448250

Berger, S., Verschoor, A. J., Eggen, T. J. H. M. & Moser, U. (2019). Improvement of measurement efficiency in multistage tests by targeted assignment. Frontiers in Education, 4(January). https://doi.org/10.3389/feduc.2019.00001

Bock, R. D. (1997). A brief history of item response theory. Educational Measurement: Issues and practice, 16, 21–23. doi:10.1111/j.1745-3992.1997.tb00605.x.

Cotton, D. R., Cotton, P. A. & Shipway, J. R. (2024). Chatting and cheating: Ensuring academic integrity in the era of ChatGPT. Innovations in Education and Teaching International, 61(2), 228-239. https://doi.org/10.1080/14703297.2023.2190148

Crocker, L. & Algina, J. (2008). Introduction to classical and modern test theory. USA: Cengage Learning.

Elkins, K. & Chun, J. (2020). Can GPT-3 pass a writer’s Turing test? Journal of Cultural Analytics, 5(2), 1-16. doi: 10.22148/001c.17212

Fabiyi, S. D. (2024). What can ChatGPT not do in education? Evaluating its effectiveness in assessing educational learning outcomes. Innovations in Education and Teaching International, 0(0), 1–15. https://doi.org/10.1080/14703297.2024.2333395

Gierl, M., Lai, H. & Li, J. (2011). Evaluating the performance of CATSIB in a multistage adaptive testing environment. Retrieved from https://mcc.ca/wpcontent/uploads/Technical-Reports-Gierl-Lai-Li-2011.pdf

Gleason, N. (2022). ChatGPT and the rise of AI writers: How should higher education respond? THE Campus Learn, Share, Connect. Retrieved from https://www.timeshighereducation.com/campus/chatgpt-and-rise-aiwriters-how-should-higher-education-respond

Hambleton, R. K. & van der Linden, W. J. (1997). Handbook of modern item response theory (1st ed.). USA: Springer. https://doi.org/10.1007/978-1-4757-2691-6

Hambleton, R. K., Swaminathan, H. & Rogers, H. J. (1991). Fundamentals of item response theory (1st ed.). London: SAGE.

Han, K. T. (2013). MSTGen: Simulated data generator for multistage testing. Applied Psychological Measurement, 37, 666–668. doi: 10.1177/0146621613499639

Han, K. T. & Guo, F. (2013). An approach to assembling optimal multistage testing modules on the fly. GMAC Research Reports (Report No. RR-13-01). Retrieved from https://www.gmac.com/-/media/files/gmac/research/research-report-series/rr-13-01-moduleassemblyonthefly.pdf

Hu, K. (2023, February). ChatGPT sets record for fastest-growing user base - analyst note. Retrieved from https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/

Imran, M. & Almusharraf, N. (2023). Analyzing the role of ChatGPT as a writing assistant at higher education level: A systematic review of the literature. Contemporary Educational Technology, 15(4), ep464. https://doi.org/10.30935/cedtech/13605

Jukiewicz, M. (2024). The future of grading programming assignments in education: The role of ChatGPT in automating the assessment and feedback process. Thinking Skills and Creativity, 52(101522), 1-9. https://doi.org/10.1016/j.tsc.2024.101522

Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., ... & Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and individual differences, 103, 102274. https://doi.org/10.1016/j.lindif.2023.102274

Kirmani, A. R. (2022). Artificial intelligence-enabled science poetry. ACS Energy Letters, 8(1), 574-576. https://doi.org/10.1021/acsenergylett.2c02758

Kohnke, L., Moorhouse, B. L. & Zou, D. (2023). ChatGPT for language teaching and learning. RELC Journal, 54(2), 1–14. https://doi.org/10.1177/00336882231162868

Kung, T. H., Cheatham, M., Medenilla, A., Sillos, C., De Leon, L., Elepaño, C., ... & Tseng, V. (2023). Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLoS Digital Health, 2(2), e0000198. https://doi.org/10.1371/journal.pdig.0000198

Latif, E. & Zhai, X. (2024). Fine-tuning chatgpt for automatic scoring. Computers and Education: Artificial Intelligence, 6(100210), 1-10. https://doi.org/10.1016/j.caeai.2024.100210

Lo, C. K. (2023). What is the impact of ChatGPT on education? A rapid review of the literature. Education Sciences, 13(4), 1-15. https://doi.org/10.3390/educsci13040410

Luecht, R. M. & Nungester, R. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement, 35(3), 239–249. https://doi.org/10.1111/j.1745-3984.1998.tb00537.x

Naidu, K. & Sevnarayan, K. (2023). ChatGPT: An ever-increasing encroachment of artificial intelligence in online assessment in distance education. Online Journal of Communication and Media Technologies, 13(3), e202336. https://doi.org/10.30935/ojcmt/13291

Newton, P. & Xiromeriti, M. (2024). ChatGPT performance on multiple choice question examinations in higher education. A pragmatic scoping review. Assessment & Evaluation in Higher Education, 0(0), 1–18. https://doi.org/10.1080/02602938.2023.2299059

OpenAI. (2023). ChatGPT: Optimizing language models for dialogue. Retrieved from https://openai.com/research

Qadir, J. (2023). Engineering education in the era of ChatGPT: Promise and pitfalls of generative AI for education. 2023 IEEE Global Engineering Education Conference (EDUCON) (pp. 1–9). IEEE. Retrieved from https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10125121

R Core Team (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/.

Rasul, T., Nair, S., Kalendra, D., Robin, M., de Oliveira Santini, F., Ladeira, W. J., Sun, M., Day, I., Rather, R. A., & Heathcote, L. (2023). The role of ChatGPT in higher education: Benefits, challenges, and future research directions. Journal of Applied Learning and Teaching, 6(1). https://doi.org/10.37074/jalt.2023.6.1.29

Sharples, M. (2022). Automated essay writing: An AIED opinion. International Journal of Artificial Intelligence in Education, 32(4), 1119-1126. https://doi.org/10.1007/s40593-022-00300-7

Sok, S. & Heng, K. (March 6, 2023). ChatGPT for education and research: A review of benefits and risks. http://dx.doi.org/10.2139/ssrn.4378735

Wang, K. (2017). A fair comparison of the performance of computerized adaptive testing and multistage adaptive testing (Doctoral dissertation). Available from ProQuest Dissertations & Theses Global database (Order No. 10273809). Retrieved from https://www.proquest.com/dissertations-theses/fair-comparison-performancecomputerized-adaptive/docview/1901897901/se-2

Yan, D., Davier, A. A. & Lewis, C. (2014). Computerized multistage testing: Theory and application (1st ed.). USA: CRC Press. doi: 10.1201/b16858

Zawacki-Richter, O., Marín, V. I., Bond, M. & Gouverneur, F. (2019). Systematic review of research on artificial intelligence applications in higher education–where are the educators?. International Journal of Educational Technology in Higher Education, 16(1), 1-27. https://doi.org/10.1186/s41239-019-0171-0

Zhai, X. (2023). ChatGPT for next generation science learning. XRDS: Crossroads, The ACM Magazine for Students, 29(3), 42-46. doi: 10.1145/358964

Zhai, X. (December 27, 2022). ChatGPT user experience: Implications for education. http://dx.doi.org/10.2139/ssrn.4312418

Zhang, X., Li, D., Wang, C., Jiang, Z., Ngao, A. I., Liu, D., Peters, M. A., & Tian, H. (2023). From ChatGPT to China's Sci-Tech: Implications for Chinese higher education. Beijing International Review of Education, 5(3), 296-314. https://doi.org/10.1163/25902539-05030007

Published

2024-06-30

How to Cite

Bulut, G., & Akyıldız, M. (2024). Comparison of Artificial Intelligence-Generated Questions and Item Parameters Under MST Test Conditions. Journal of Digital Technologies and Education, 3(1), 1–12. https://doi.org/10.5281/zenodo.12637347
