國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,以敘述分類進行事實錯誤改正,Enhanced Factual Error Correction By Description Category

論文名稱 Title	以敘述分類進行事實錯誤改正 Enhanced Factual Error Correction By Description Category
系所名稱 Department	資訊管理學系 Department of Information Management
畢業學年期 Year, semester	110 學年度第 2 學期 The spring semester of Academic Year 110	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	71
研究生 Author	鄭宇謙 Yu-Chien Cheng
指導教授 Advisor	李偉柏 Lee, Wei-Po
召集委員 Convenor	楊宗憲 Tsung-Hsien Yang
口試委員 Advisory Committee	許育峯 Yu-Feng Hsu
口試日期 Date of Exam	2022-09-05	繳交日期 Date of Submission	2022-09-11
關鍵字 Keywords	錯誤資訊、自動化事實查核、事實錯誤改正、事實不一致、BERT false information, automated fact checking, factual error correction, factual inconsistency, BERT

中文摘要
在網際網路蓬勃發展的大環境下，錯誤資訊的傳播變得越來越快，為了能夠快速應付錯誤資訊好減緩其帶來的危害，自動化事實查核的工作受到許多的關注。自動化事實查核是指使用證據來自動化處理錯誤資訊的工作。自動化事實查核的步驟可以分為趨於成熟的基本步驟：尋找查核目標、證據返還和事實驗證，以及還在發展中的進階應用：解釋生成和事實錯誤改正。而在進階應用當中，事實錯誤改正的工作更為新穎。事實錯誤改正是透過修改內容來讓查核目標和證據更加一致，以此來更進一步打擊錯誤的資訊。事實錯誤改正能夠幫助我們更好的打擊錯誤資訊，但其發展卻面對一些限制與難關。本研究旨在探討並改進事實錯誤改正中的兩個問題：（1）在評估方面，目前常使用的評估方法提供的視角較為單一，以及（2）在過往的研究中，缺乏探討模型在處理不同語句類別時的能力。這些問題的改善能夠幫助我們對模型有更多的了解，並制定更合適的事實錯誤改正工作流程。針對第一個問題，本研究透過引入評估模型QAGS提供過往評估方法所無法提供的語義視角，透過語義的視角能輔助我們更全面的看待模型成果並找出未來改進模型的方向。針對第二個問題，本研究透過建立一個分類機制將資料分成不同的語句類別，並使用傳統評估方法ROUGE和SARI、新的評估方法QAGS分析與討論目前主流模型處理不同類別語句資料的結果。最後本研究也提出一個綜合傳統評估方法以及新評估方法QAGS的綜合視角來較全面的分析模型在各語句類別的能力優劣。
Abstract
With the rapid growth of the Internet, false information spreads ever faster. To respond to false information quickly and mitigate the harm it causes, automated fact checking has received considerable attention. Automated fact checking refers to the use of evidence to handle false information automatically. Its steps fall into two groups. The first consists of the relatively mature basic steps: (i) claim detection, (ii) evidence retrieval, and (iii) fact verification. The second consists of advanced applications that are still under development: (i) justification generation and (ii) factual error correction. Of the advanced applications, factual error correction is the more novel. Factual error correction is an explainable alternative to fact verification: it revises the content of a claim so that the claim and the evidence become more consistent, which goes a step further in combating false information. Factual error correction can help us combat false information more effectively, but its development faces several limitations. This study aims to examine and improve on two of them: (1) the automated evaluation methods in common use provide only a single perspective, and (2) previous research has not examined how well models handle different categories of statements. Addressing these issues helps us understand the models better and design a more appropriate workflow for factual error correction. For the first issue, this study introduces the QAGS evaluation model, which provides a semantic perspective that earlier evaluation methods cannot offer. The semantic perspective helps us view model output more comprehensively and identify directions for improving the models. For the second issue, this study builds a classification mechanism that separates claims into different description categories, and uses the traditional metrics ROUGE and SARI together with the new QAGS approach to analyze and discuss how current mainstream models handle each category. Finally, this study proposes a combined perspective that integrates the traditional evaluation methods with QAGS to analyze more comprehensively the strengths and weaknesses of the models in each category.
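The per-category evaluation described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: it uses a simplified ROUGE-1 F1 (plain whitespace tokens, no stemming) rather than the official ROUGE package, it omits SARI and QAGS entirely, and the function names, category labels, and example sentences are all hypothetical placeholders invented for illustration.

```python
from collections import Counter, defaultdict


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1: F1 over unigram overlap (whitespace tokens, lowercased)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mean_score_by_category(examples):
    """Average the simplified ROUGE-1 F1 within each category.

    `examples` is an iterable of (category, corrected_claim, reference) tuples.
    The category labels are hypothetical, not the categories used in the thesis.
    """
    scores = defaultdict(list)
    for category, corrected, reference in examples:
        scores[category].append(rouge1_f1(corrected, reference))
    return {cat: sum(vals) / len(vals) for cat, vals in scores.items()}


# Toy data invented for illustration only.
examples = [
    ("category_a", "the film was released in 1999", "the film was released in 1999"),
    ("category_a", "the film was released in 2001", "the film was released in 1999"),
    ("category_b", "she was born in paris", "she was born in lyon france"),
]
per_category = mean_score_by_category(examples)
```

The second `category_a` example shows the limitation the abstract points to: the corrected claim has the wrong year yet still scores highly on word overlap, which is why a semantic measure such as QAGS is introduced alongside ROUGE and SARI.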

目次 Table of Contents
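Thesis Approval Form
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Research Motivation
1.3 Research Objectives and Contributions
Chapter 2 Literature Review
2.1 Automated Fact Checking
2.1.1 Basic Pipeline
2.1.2 Justification Generation
2.1.3 Factual Error Correction
2.2 Evaluation Methods for Factual Error Correction
2.2.1 ROUGE
2.2.2 SARI
2.3 Factual Inconsistency
2.3.1 Post-editing
2.3.2 Evaluating Factual Inconsistency
2.3.3 The QAGS Evaluation Model (Question Answering and Generation for Summarization)
2.4 Paired-Sentence Classification Methods
2.4.1 Sentence Classification with BERT
Chapter 3 Methodology
3.1 Datasets
3.2 Introducing the QAGS Evaluation Model
3.2.1 Design of the QAGS Reliability Experiment
3.2.2 Running QAGS with Evidence as the Output
3.2.3 Using QAGS at Different Levels
3.3 Classification Mechanism
3.4 Correction Models Used and Expected Evaluation Results
Chapter 4 Experimental Results
4.1 QAGS Reliability Experiment Results
4.1.1 Explanation of the QAGS Reliability Results
4.1.2 Analysis and Discussion of Questions QAGS Fails to Answer Properly
4.1.3 Analysis and Discussion of the Synonym-Handling Scheme
4.1.4 Conclusions on QAGS Reliability
4.2 Classification Mechanism Training Results
4.2.1 Results of Classification Algorithms Based on ROUGE and Sentence-BERT
4.2.2 Results of the BERT-Based Classification Model
4.2.3 Conclusions on the Classification Mechanism
4.3 Evaluation with ROUGE and SARI
4.3.1 ROUGE, SARI, and the Datasets
4.3.2 ROUGE, SARI, and Model Results
4.4 Evaluation with QAGS
4.4.1 Discussion and Analysis of Results Using Reference Answers versus Evidence
4.4.2 QAGS and Model Results
4.5 Combined ROUGE, SARI, and QAGS Evaluation Results
Chapter 5 Conclusions and Suggestions
5.1 Conclusions
5.2 Research Limitations
5.3 Future Work
References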


