National Sun Yat-sen University, Thesis/Dissertation: Cross-lingual Automatic Short Answer Grading System Based on Few-shot Learning

Title	基於小樣本學習之跨語言自動簡答評分系統 Cross-lingual Automatic Short Answer Grading System Based on Few-shot Learning
Department	Department of Information Management
Year, semester	Fall semester, Academic Year 112
Language	Chinese
Degree	Master
Number of pages	71
Author	陳宗順 Zong-Shun Chen
Advisor	林耕霈 Lin, Keng-Pei
Convenor	張德民 Chang, Te-Min
Advisory Committee	張慈玲 Chang, Tzu-Lin
Date of Exam	2023-08-09
Date of Submission	2023-08-10
Keywords	Short Answer, Automatic Grading, Natural Language Processing, Cross-Lingual, Transfer Learning, Siamese Neural Network

Abstract
This study addresses grading challenges in a globalized education environment by developing a cross-lingual automatic short answer grading system based on few-shot learning. The system can adapt to new cross-lingual short answer grading tasks when only a few labeled samples are available. The advantage of short answer questions is that they better measure students' understanding and their ability to express knowledge, but grading them must contend with highly varied answers, multilingual settings, and the difficulty of annotating data. Our method combines cross-lingual transfer learning with Siamese neural networks and incorporates external knowledge to strengthen the model's representations. Specifically, we compute difference features between answers and adjust the model weights based on these features to improve grading accuracy. We evaluate the system on multilingual datasets and compare it with existing grading methods. The results show that the proposed system achieves strong prediction performance even with few samples and shows high grading accuracy and practical feasibility in multilingual settings. The system generalizes well, effectively addresses grading challenges in multilingual environments, and offers a new solution for educational assessment and automated testing. This work extends short answer grading research to non-English languages and opens a new possibility for educational assessment in multilingual environments.
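The abstract describes grading by comparing answers through a Siamese network, computing difference features between them, and adjusting model weights from those features. The sketch below illustrates that idea under loudly stated assumptions and is not the thesis's actual model. A toy hashed character-trigram encoder stands in for a pretrained multilingual sentence encoder (the table of contents names LaBSE and LASER; this toy is not cross-lingual). The reference answer and the student answer both pass through the same encoder, the two embeddings are compared through their element-wise product (summed, this is cosine similarity) and their absolute difference, and only a three-weight linear head is fitted on a handful of graded examples while the encoder stays frozen, which is one simple form of transfer learning. The names `encode`, `pair_features`, `fit_head`, and `predict`, the example answers and scores, and all hyperparameters are hypothetical.

```python
import math
import zlib
from typing import List, Sequence, Tuple

DIM = 512  # hypothetical embedding width for the toy encoder


def encode(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in encoder: hashed character trigrams, L2-normalised.

    A real system would use a pretrained multilingual sentence encoder here;
    this toy only matches surface text and is NOT cross-lingual.
    """
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def pair_features(reference: str, answer: str) -> List[float]:
    """Siamese comparison: both texts go through the SAME encoder (shared
    weights), then the two embeddings are compared."""
    u, v = encode(reference), encode(answer)
    # Element-wise product, summed: equals cosine similarity for unit vectors.
    product = sum(a * b for a, b in zip(u, v))
    # Element-wise absolute difference, averaged over dimensions.
    abs_diff = sum(abs(a - b) for a, b in zip(u, v)) / len(u)
    return [product, abs_diff, 1.0]  # trailing 1.0 acts as a bias input


def raw_score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear head before clipping."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_head(pairs: Sequence[Tuple[str, str]], scores: Sequence[float],
             lr: float = 0.1, epochs: int = 5000) -> List[float]:
    """Fit only the 3-weight linear head (encoder stays frozen) with
    full-batch gradient descent on mean squared error."""
    feats = [pair_features(r, a) for r, a in pairs]
    w = [0.0] * len(feats[0])
    n = len(feats)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(feats, scores):
            err = raw_score(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def predict(w: Sequence[float], reference: str, answer: str) -> float:
    """Score a new answer against the reference, clipped to [0, 1]."""
    return min(1.0, max(0.0, raw_score(w, pair_features(reference, answer))))


# Hypothetical few-shot data: one reference answer and four graded answers.
REFERENCE = "Photosynthesis converts light energy into chemical energy"
FEW_SHOT = [
    ("Photosynthesis converts light energy into chemical energy stored in glucose", 1.0),
    ("Plants turn light energy into chemical energy", 0.7),
    ("Plants need water", 0.2),
    ("The mitochondria is the powerhouse of the cell", 0.0),
]

if __name__ == "__main__":
    weights = fit_head([(REFERENCE, a) for a, _ in FEW_SHOT],
                       [s for _, s in FEW_SHOT])
    for answer in ("Light energy becomes chemical energy", "I do not know"):
        print(f"{predict(weights, REFERENCE, answer):.2f}  {answer}")
```

In a genuinely cross-lingual setting, `encode` would be swapped for a multilingual sentence encoder so that a reference answer in one language can be compared with a student answer in another. The head is kept to three weights because a few-shot task supplies only a handful of labeled pairs, too few to fit a large model reliably.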

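Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
Chapter 2 Literature Review
2.1 Definition of Automatic Short Answer Grading
2.2 Related Research on Automatic Short Answer Grading
2.3 Natural Language Processing
2.3.1 Cross-lingual Sentence Representations
2.3.2 Multilingual BERT
2.3.3 Language-agnostic BERT Sentence Embedding
2.3.4 Language-Agnostic Sentence Representations
2.4 Few-shot Learning Methods
2.4.1 Siamese Neural Networks
2.4.2 Transfer Learning
2.5 One-dimensional Convolutional Neural Networks
2.6 Imbalanced Data
Chapter 3 Research Methodology
3.1 Data Collection and Preprocessing
3.1.1 Collection and Preprocessing of the External Knowledge Dataset
3.1.2 Collection and Preprocessing of the Question-Answer Dataset
3.2 External Knowledge Model
3.2.1 Multilingual Sentence Embedding
3.2.2 Deep Feature Extraction
3.2.3 Prediction Output
3.3 Automatic Short Answer Grading Model
3.3.1 Multilingual Sentence Embedding
3.3.2 Deep and Shallow Feature Extraction
3.3.3 Prediction Output
Chapter 4 Experimental Design
4.1 Datasets
4.2 Experimental Setup
4.3 Evaluation Methods
Chapter 5 Experimental Results
5.1 Ablation Study
5.2 Comparison with Baseline Models
5.3 Single-question Analysis
5.4 Multilingual Zero-shot Testing
Chapter 6 Conclusion
References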

Fulltext
This electronic full text is licensed only for individual, non-profit searching, reading, and printing for academic research purposes. Please observe the Copyright Act of the Republic of China (Taiwan) and do not reproduce, distribute, adapt, repost, or broadcast it without permission, to avoid infringing the law.
Thesis access permission: user-defined release date
Available on campus: 2025-08-10
Available off campus: 2025-08-10
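Printed copies
Public information on printed theses is relatively complete from academic year 102 onward. For public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: 2025-08-10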

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2453 Thesis Format Review Team, Service Mailbox │ Development and operations: Knowledge Innovation Division, LIS