博碩士論文 etd-0728123-224043 詳細資訊
Title page for etd-0728123-224043
論文名稱
Title
使用少樣本方法檢測文本性別偏見
Advancing Gender Bias Detection in Text: A Few-Shot Approach
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
75
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2023-07-07
繳交日期
Date of Submission
2023-08-28
關鍵字
Keywords
性別偏見偵測、大型語言模型、少樣本學習、提示調整、Social Bias Inference Corpus (SBIC)
Gender Bias Detection, Large Language Models, Few-Shot Learning, Prompt Tuning, Social Bias Inference Corpus (SBIC)
統計
Statistics
本論文已被瀏覽 76 次,被下載 0 次
The thesis/dissertation has been browsed 76 times, has been downloaded 0 times.
中文摘要 Chinese Abstract
With the rapid advancement of AI language model technology, ensuring model fairness and reducing bias has become an urgent problem. This study proposes a novel method for detecting gender bias in text that combines large language models with few-shot learning. Our approach primarily uses OpenAI's GPT-3.5 Turbo model to perform automatic gender bias detection through few-shot learning. Unlike traditional methods that rely on tedious manual annotation and specialized expertise, our strategy is more automated, efficient, and scalable.
We also introduce the concept of prompt tuning to guide the model's response generation more precisely, allowing bias to be analyzed at a finer-grained level. Experiments on the Social Bias Inference Corpus (SBIC) dataset demonstrate that our method is highly effective at detecting gender bias across different scopes, such as the targeted group and the degree of offensiveness. In addition, our method can generate reasoning rationales, giving deeper insight into the model's behavior during bias detection.
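As an illustration of the data side of these experiments, the Python sketch below shows one way the gender-targeted portion of SBIC could be selected; the Hugging Face dataset identifier and the column names ("post", "targetCategory", "offensiveYN") are assumptions based on the corpus's public release, not the preprocessing actually used in this thesis.

```python
# Rough sketch of selecting gender-targeted SBIC annotations for the detection
# experiments. The dataset id and column names are assumptions based on the
# public Hugging Face release of the corpus, not the thesis's exact pipeline.
from datasets import load_dataset

sbic = load_dataset("allenai/social_bias_frames", split="validation")

# Keep only annotations whose target category is gender.
gender_rows = sbic.filter(lambda row: row["targetCategory"] == "gender")

print(f"{len(gender_rows)} gender-targeted annotations")
print(gender_rows[0]["post"], "| offensiveYN =", gender_rows[0]["offensiveYN"])
```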
This study not only makes a meaningful contribution to the ongoing discussion of AI fairness, but also opens new avenues for research on bias detection and mitigation in AI models. Through this approach, we offer a viable path toward fairer and more inclusive language models.
Abstract
We introduce a new method for detecting gender bias in text using large language models and few-shot learning, highlighting the urgent need for bias mitigation in the rapidly advancing field of AI language models.
We propose a methodology that uses OpenAI's GPT-3.5 Turbo to perform few-shot learning for bias detection. Unlike traditional methods, our approach is automated, efficient, and scalable. The method incorporates prompt tuning to guide the model's response generation, enabling us to investigate bias at a fine-grained level.
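For illustration, the sketch below shows one way such a few-shot classification call to GPT-3.5 Turbo could look using the OpenAI Python client; the system prompt, label set, and demonstration examples are assumptions made for exposition rather than the prompts used in this work.

```python
# A minimal, illustrative few-shot gender-bias classifier built on GPT-3.5 Turbo.
# The system prompt, label names, and demonstration pairs are assumptions for
# exposition; they are not the prompts used in the thesis.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Few-shot demonstrations, supplied to the model as prior user/assistant turns.
FEW_SHOT_EXAMPLES = [
    ("Women are too emotional to lead engineering teams.", "biased"),
    ("The committee elected a new chairperson yesterday.", "not_biased"),
]

def detect_gender_bias(sentence: str) -> str:
    """Classify one sentence as 'biased' or 'not_biased'."""
    messages = [{
        "role": "system",
        "content": ("You classify whether a sentence expresses gender bias. "
                    "Answer with exactly one label: biased or not_biased."),
    }]
    for text, label in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": sentence})

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0,  # deterministic output for single-pass classification
    )
    return response.choices[0].message.content.strip()
```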
Experimental results, using the Social Bias Inference Corpus (SBIC) dataset, demonstrate the efficacy of our method in accurately detecting gender bias across various scopes. Our approach also allows for reason generation, providing insight into the model's bias detection process.
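The reason generation described above, combined with the self-consistency variation listed in the table of contents, could be sketched roughly as follows: sample several chain-of-thought answers at a non-zero temperature, keep the reasoning traces, and majority-vote the final label. The prompt text and the "FINAL_LABEL:" convention are illustrative assumptions, not the thesis's exact prompts.

```python
# Illustrative sketch of reason generation with self-consistency: sample several
# chain-of-thought answers, keep the reasoning traces, and majority-vote the
# final label. Prompt text and the FINAL_LABEL convention are assumptions.
from collections import Counter
from openai import OpenAI

client = OpenAI()

COT_PROMPT = (
    "Decide whether the sentence below expresses gender bias. "
    "Reason step by step about the targeted group and how offensive the wording is, "
    "then end with a line 'FINAL_LABEL: biased' or 'FINAL_LABEL: not_biased'.\n\n"
    "Sentence: {sentence}"
)

def classify_with_self_consistency(sentence: str, n_samples: int = 5):
    """Return (majority-vote label, list of sampled reasoning traces)."""
    labels, reasons = [], []
    for _ in range(n_samples):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": COT_PROMPT.format(sentence=sentence)}],
            temperature=0.7,  # diverse samples are what self-consistency aggregates
        )
        text = response.choices[0].message.content
        reasons.append(text)
        for line in reversed(text.splitlines()):  # parse the trailing label line
            if line.strip().startswith("FINAL_LABEL:"):
                labels.append(line.split(":", 1)[1].strip())
                break
    majority = Counter(labels).most_common(1)[0][0] if labels else "unparsed"
    return majority, reasons
```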
This work contributes to the ongoing discussions on AI fairness, opening up new avenues for researchers interested in bias detection and mitigation in AI models. Through our approach, we offer a way forward for more equitable and inclusive language models.
目次 Table of Contents
論文審定書 i
致謝 ii
摘要 iii
Abstract iv
Chapter 1 - Introduction 1
Chapter 2 - Related Work 6
2.1 Gender bias in text 6
2.2 Early Attempts 7
2.2.1 Lexicon-Based Methods for Gender Bias Detection 7
2.2.2 Emergence of Supervised Learning Models 9
2.2.3 Advancing Towards Automated Reason Generation 11
2.3 The Importance of Downstream Applications and Training Datasets 12
2.4 Large Language Models as a Solution 13
Chapter 3 - Methodology 17
3.1 Task description 17
3.2 Comprehending and Contextualizing Sentence Semantics 19
3.2.1 Sentence Deconstruction 19
3.2.2 Contextual Understanding 19
3.3 Deconstructing Gender Bias 20
3.3.1 Target Gender Group 22
3.3.2 Offensive 23
3.3.3 Subjectivity or Intent 23
3.3.4 Sentiment 23
3.3.5 Lewd 24
3.4 Prompt Design 25
3.4.1 Single Label Classification (Base Method) 25
3.4.2 Single-Pass Chain of Thought 26
3.4.3 Two-Stage Chain of Thought 30
3.5 Four Variations of Input for Gender Bias Detection 33
3.5.1 Rewrite 34
3.5.2 Self-Consistency 37
3.5.3 Four Variations: Combining Rewrite and Self-Consistency 39
Chapter 4 - Experiment 41
4.1 Dataset 41
4.2 GPT-3.5 Turbo Overview 42
4.3 Experiment Setup 43
4.4 Results 44
4.4.1 Evaluation Metrics 44
4.4.2 Single Label Classification (Base Method) 46
4.4.3 Single-Pass Chain of Thought 49
4.4.4 Two-Stage Chain of Thought 51
4.4.5 Reason Generation Results 53
4.5 Discussion 55
4.5.1 Model Outputs and Inherent Challenges 55
4.5.2 Limitations and Data Imbalance Issues 57
4.5.3 Lack of Baseline and Evaluation Challenges 58
4.5.4 Further Discussion 59
Chapter 5 - Conclusion 62
Reference 65
參考文獻 References
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM conference on fairness, accountability, and transparency.
Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in neural information processing systems, 29.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., & Askell, A. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901.
Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.
Cryan, J., Tang, S., Zhang, X., Metzger, M., Zheng, H., & Zhao, B. Y. (2020). Detecting gender stereotypes: Lexicon vs. supervised learning methods. Proceedings of the 2020 CHI conference on human factors in computing systems.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Du, Y., Zheng, Q., Wu, Y., Lan, M., Yang, Y., & Ma, M. (2022). Understanding gender bias in knowledge base embeddings. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
Fortuna, P., & Nunes, S. (2018). A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR), 51(4), 1-30.
Founta, A., Djouvas, C., Chatzakou, D., Leontiadis, I., Blackburn, J., Stringhini, G., Vakali, A., Sirivianos, M., & Kourtellis, N. (2018). Large scale crowdsourcing and characterization of twitter abusive behavior. Proceedings of the international AAAI conference on web and social media.
Goldfarb-Tarrant, S., Marchant, R., Sánchez, R. M., Pandya, M., & Lopez, A. (2020). Intrinsic bias metrics do not correlate with application bias. arXiv preprint arXiv:2012.15859.
Nadeem, M., Bethke, A., & Reddy, S. (2020). Stereoset: Measuring stereotypical bias in pretrained language models. arXiv preprint arXiv:2004.09456.
Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., & Chang, Y. (2016). Abusive language detection in online user content. Proceedings of the 25th international conference on world wide web.
Park, J. H., Shin, J., & Fung, P. (2018). Reducing gender bias in abusive language detection. arXiv preprint arXiv:1808.07231.
Patel, A., Oza, P., & Agrawal, S. (2023). Sentiment Analysis of Customer Feedback and Reviews for Airline Services using Language Representation Model. Procedia Computer Science, 218, 2459-2467.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining.
Roberts, A., Gaizauskas, R., Hepple, M., Davis, N., Demetriou, G., Guo, Y., Kola, J. S., Roberts, I., Setzer, A., & Tapuria, A. (2007). The CLEF corpus: semantic annotation of clinical text. AMIA Annual Symposium Proceedings.
Rudinger, R., Naradowsky, J., Leonard, B., & Van Durme, B. (2018). Gender bias in coreference resolution. arXiv preprint arXiv:1804.09301.
Sap, M., Gabriel, S., Qin, L., Jurafsky, D., Smith, N. A., & Choi, Y. (2019). Social bias frames: Reasoning about social and power implications of language. arXiv preprint arXiv:1911.03891.
Snow, R., O'Connor, B., Jurafsky, D., & Ng, A. Y. (2008). Cheap and fast–but is it good? Evaluating non-expert annotations for natural language tasks. Proceedings of the 2008 conference on empirical methods in natural language processing.
Stanovsky, G., Smith, N. A., & Zettlemoyer, L. (2019). Evaluating gender bias in machine translation. arXiv preprint arXiv:1906.00591.
Tulkens, S., Hilte, L., Lodewyckx, E., Verhoeven, B., & Daelemans, W. (2016). A dictionary-based approach to racism detection in Dutch social media. arXiv preprint arXiv:1608.08738.
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35, 24824-24837.
Zhang, Z., Zhang, A., Li, M., & Smola, A. (2022). Automatic chain of thought prompting in large language models. arXiv preprint arXiv:2210.03493.
Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K.-W. (2018). Gender bias in coreference resolution: Evaluation and debiasing methods. arXiv preprint arXiv:1804.06876.
電子全文 Fulltext
The electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid infringement.
論文使用權限 Thesis access permission: user-defined release date (自定論文開放時間)
開放時間 Available:
校內 Campus: available for download from 2025-08-28
校外 Off-campus: available for download from 2025-08-28

紙本論文 Printed copies
Public-availability information for printed copies is relatively complete from academic year 102 onward. To check the availability of printed copies from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 2025-08-28
