Comparison of AI-Generated Questions and Item Parameters under MST Test Conditions

Gülgün Bulut; Murat Akyıldız

doi:10.5281/zenodo.12637347

Authors

Gülgün Bulut, Anadolu University, Graduate School, Eskişehir, Türkiye, https://orcid.org/0000-0002-7257-6207
Murat Akyıldız, Anadolu University, Open Education Faculty, Eskişehir, Türkiye, https://orcid.org/0000-0001-5069-0132

Keywords:

Measurement, Artificial Intelligence, Multistage Tests, ChatGPT

Abstract

One of the significant effects of the rapid spread of smart technologies on education is seen in measurement and evaluation. For measurement to yield results that are close to the truth, it must first of all be carried out in a secure environment. Because new technologies have both become part of daily life and been integrated into many stages of the education system, measurement and evaluation must be carried out at the same level of sophistication. In a system where smart technologies have been incorporated into teaching and learning to a degree that can be described as a paradigm shift, conducting measurement with traditional methods stands in the way of obtaining sound results. The fact that the measurement and evaluation method widely used in the current system is still the traditional one shows the need for studies in this direction. In this context, carrying out measurement and evaluation with methods and technologies that allow precise measurement is an important step. The aim of this study is to demonstrate concretely how smart technologies can be used in measurement and evaluation processes. To this end, the study covers the process of bringing together modern test delivery methods that are supported by new smart technologies and allow precise measurement. In this process, multistage testing (MST) and ChatGPT were first examined theoretically. In the next stage, within the limitations of the study, questions were generated with ChatGPT, and ChatGPT was asked to estimate the b parameters of the generated questions. In a further stage, the same questions were assembled into a test using the multistage method and their b parameters were calculated. The b parameters obtained from ChatGPT and from MST were then compared.
According to the findings, the margin of error was not very high, and it was concluded that supervised use of ChatGPT within the MST method is appropriate.
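
The abstract does not say which statistics were used to compare the two sets of b parameters. As a minimal sketch of how such a comparison could be summarized, the Python snippet below computes the mean difference (bias), the root mean squared error, and the Pearson correlation between two lists of difficulty estimates. The function name `compare_b_parameters` and the example values are hypothetical and chosen only for illustration; they are not the study's data or its actual procedure.

```python
import math


def compare_b_parameters(b_llm, b_calibrated):
    """Summarize agreement between two sets of item difficulty (b) estimates.

    b_llm:        b values guessed by a language model (hypothetical input)
    b_calibrated: b values estimated from response data (hypothetical input)
    """
    if len(b_llm) != len(b_calibrated) or len(b_llm) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")

    n = len(b_llm)
    diffs = [x - y for x, y in zip(b_llm, b_calibrated)]

    # Mean signed difference: positive means the model rates items as harder.
    bias = sum(diffs) / n
    # Root mean squared error: typical size of the disagreement.
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    # Pearson correlation: do both sources rank items similarly?
    mean_x = sum(b_llm) / n
    mean_y = sum(b_calibrated) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(b_llm, b_calibrated))
    var_x = sum((x - mean_x) ** 2 for x in b_llm)
    var_y = sum((y - mean_y) ** 2 for y in b_calibrated)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    pearson_r = cov / math.sqrt(var_x * var_y)

    return {"bias": bias, "rmse": rmse, "pearson_r": pearson_r}


# Illustrative values only, not results from the study.
example = compare_b_parameters([-1.0, 0.0, 1.0], [-0.5, 0.0, 0.5])
```

A low RMSE together with a high correlation would be one way to express a small margin of error. Which thresholds count as acceptable is a judgment the abstract does not specify.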
