博碩士論文 etd-0728123-224043 詳細資訊
Title page for etd-0728123-224043
論文名稱
Title
使用少樣本方法檢測文本性別偏見
Advancing Gender Bias Detection in Text: A Few-Shot Approach
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
75
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2023-07-07
繳交日期
Date of Submission
2023-08-28
關鍵字
Keywords
性別偏見偵測、大型語言模型、少樣本學習、提示調整、Social Bias Inference Corpus (SBIC)
Gender Bias Detection, Large Language Models, Few-Shot Learning, Prompt Tuning, Social Bias Inference Corpus (SBIC)
統計
Statistics
本論文已被瀏覽 76 次,被下載 0 次
The thesis/dissertation has been browsed 76 times, has been downloaded 0 times.
中文摘要 Chinese Abstract
With the rapid advancement of AI language model technology, ensuring model fairness and reducing bias has become an urgent problem. This study proposes a novel method for detecting gender bias in text that combines large language models with few-shot learning. Our approach primarily uses OpenAI's GPT-3.5 Turbo model to perform automatic gender bias detection through few-shot learning. Unlike traditional methods that rely on tedious manual annotation and specialized expertise, our strategy is more automated, efficient, and scalable.
We also introduce the concept of prompt tuning to guide the model's response generation more precisely, allowing bias to be analyzed at a finer-grained level. Experiments on the Social Bias Inference Corpus (SBIC) dataset demonstrate that our method is highly effective at detecting gender bias across different scopes, such as the targeted group and the degree of offensiveness. In addition, our method can generate reasoning rationales, giving deeper insight into the model's behavior during bias detection.
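As an illustration of the data side of these experiments, the Python sketch below shows one way the gender-targeted portion of SBIC could be selected; the Hugging Face dataset identifier and the column names ("post", "targetCategory", "offensiveYN") are assumptions based on the corpus's public release, not the preprocessing actually used in this thesis.

```python
# Rough sketch of selecting gender-targeted SBIC annotations for the detection
# experiments. The dataset id and column names are assumptions based on the
# public Hugging Face release of the corpus, not the thesis's exact pipeline.
from datasets import load_dataset

sbic = load_dataset("allenai/social_bias_frames", split="validation")

# Keep only annotations whose target category is gender.
gender_rows = sbic.filter(lambda row: row["targetCategory"] == "gender")

print(f"{len(gender_rows)} gender-targeted annotations")
print(gender_rows[0]["post"], "| offensiveYN =", gender_rows[0]["offensiveYN"])
```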
This study not only makes a meaningful contribution to the ongoing discussion of AI fairness, but also opens new avenues for research on bias detection and mitigation in AI models. Through this approach, we offer a viable path toward fairer and more inclusive language models.
Abstract
We introduce a new method for detecting gender bias in text using large language models and few-shot learning, highlighting the urgent need for bias mitigation in the rapidly advancing field of AI language models.
We propose a methodology that uses OpenAI's GPT-3.5 Turbo to perform few-shot learning for bias detection. Unlike traditional methods, our approach is automated, efficient, and scalable. The method incorporates prompt tuning to guide the model's response generation, enabling us to investigate bias at a fine-grained level.
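For illustration, the sketch below shows one way such a few-shot classification call to GPT-3.5 Turbo could look using the OpenAI Python client; the system prompt, label set, and demonstration examples are assumptions made for exposition rather than the prompts used in this work.

```python
# A minimal, illustrative few-shot gender-bias classifier built on GPT-3.5 Turbo.
# The system prompt, label names, and demonstration pairs are assumptions for
# exposition; they are not the prompts used in the thesis.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Few-shot demonstrations, supplied to the model as prior user/assistant turns.
FEW_SHOT_EXAMPLES = [
    ("Women are too emotional to lead engineering teams.", "biased"),
    ("The committee elected a new chairperson yesterday.", "not_biased"),
]

def detect_gender_bias(sentence: str) -> str:
    """Classify one sentence as 'biased' or 'not_biased'."""
    messages = [{
        "role": "system",
        "content": ("You classify whether a sentence expresses gender bias. "
                    "Answer with exactly one label: biased or not_biased."),
    }]
    for text, label in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": sentence})

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0,  # deterministic output for single-pass classification
    )
    return response.choices[0].message.content.strip()
```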
Experimental results, using the Social Bias Inference Corpus (SBIC) dataset, demonstrate the efficacy of our method in accurately detecting gender bias across various scopes. Our approach also allows for reason generation, providing insight into the model's bias detection process.
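The reason generation described above, combined with the self-consistency variation listed in the table of contents, could be sketched roughly as follows: sample several chain-of-thought answers at a non-zero temperature, keep the reasoning traces, and majority-vote the final label. The prompt text and the "FINAL_LABEL:" convention are illustrative assumptions, not the thesis's exact prompts.

```python
# Illustrative sketch of reason generation with self-consistency: sample several
# chain-of-thought answers, keep the reasoning traces, and majority-vote the
# final label. Prompt text and the FINAL_LABEL convention are assumptions.
from collections import Counter
from openai import OpenAI

client = OpenAI()

COT_PROMPT = (
    "Decide whether the sentence below expresses gender bias. "
    "Reason step by step about the targeted group and how offensive the wording is, "
    "then end with a line 'FINAL_LABEL: biased' or 'FINAL_LABEL: not_biased'.\n\n"
    "Sentence: {sentence}"
)

def classify_with_self_consistency(sentence: str, n_samples: int = 5):
    """Return (majority-vote label, list of sampled reasoning traces)."""
    labels, reasons = [], []
    for _ in range(n_samples):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": COT_PROMPT.format(sentence=sentence)}],
            temperature=0.7,  # diverse samples are what self-consistency aggregates
        )
        text = response.choices[0].message.content
        reasons.append(text)
        for line in reversed(text.splitlines()):  # parse the trailing label line
            if line.strip().startswith("FINAL_LABEL:"):
                labels.append(line.split(":", 1)[1].strip())
                break
    majority = Counter(labels).most_common(1)[0][0] if labels else "unparsed"
    return majority, reasons
```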
This work contributes to the ongoing discussions on AI fairness, opening up new avenues for researchers interested in bias detection and mitigation in AI models. Through our approach, we offer a way forward for more equitable and inclusive language models.
目次 Table of Contents
論文審定書 i
致謝 ii
摘要 iii
Abstract iv
Chapter 1 - Introduction 1
Chapter 2 - Related Work 6
2.1 Gender bias in text 6
2.2 Early Attempts 7
2.2.1 Lexicon-Based Methods for Gender Bias Detection 7
2.2.2 Emergence of Supervised Learning Models 9
2.2.3 Advancing Towards Automated Reason Generation 11
2.3 The Importance of Downstream Applications and Training Datasets 12
2.4 Large Language Models as a Solution 13
Chapter 3 - Methodology 17
3.1 Task description 17
3.2 Comprehending and Contextualizing Sentence Semantics 19
3.2.1 Sentence Deconstruction 19
3.2.2 Contextual Understanding 19
3.3 Deconstructing Gender Bias 20
3.3.1 Target Gender Group 22
3.3.2 Offensive 23
3.3.3 Subjectivity or Intent 23
3.3.4 Sentiment 23
3.3.5 Lewd 24
3.4 Prompt Design 25
3.4.1 Single Label Classification (Base Method) 25
3.4.2 Single-Pass Chain of Thought 26
3.4.3 Two-Stage Chain of Thought 30
3.5 Four Variations of Input for Gender Bias Detection 33
3.5.1 Rewrite 34
3.5.2 Self-Consistency 37
3.5.3 Four Variations: Combining Rewrite and Self-Consistency 39
Chapter 4 - Experiment 41
4.1 Dataset 41
4.2 GPT-3.5 Turbo Overview 42
4.3 Experiment Setup 43
4.4 Results 44
4.4.1 Evaluation Metrics 44
4.4.2 Single Label Classification (Base Method) 46
4.4.3 Single-Pass Chain of Thought 49
4.4.4 Two-Stage Chain of Thought 51
4.4.5 Reason Generation Results 53
4.5 Discussion 55
4.5.1 Model Outputs and Inherent Challenges 55
4.5.2 Limitations and Data Imbalance Issues 57
4.5.3 Lack of Baseline and Evaluation Challenges 58
4.5.4 Further Discussion 59
Chapter 5 - Conclusion 62
Reference 65
參考文獻 References
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM conference on fairness, accountability, and transparency.
Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in neural information processing systems, 29.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., & Askell, A. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901.
Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.
Cryan, J., Tang, S., Zhang, X., Metzger, M., Zheng, H., & Zhao, B. Y. (2020). Detecting gender stereotypes: Lexicon vs. supervised learning methods. Proceedings of the 2020 CHI conference on human factors in computing systems.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Du, Y., Zheng, Q., Wu, Y., Lan, M., Yang, Y., & Ma, M. (2022). Understanding gender bias in knowledge base embeddings. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
Fortuna, P., & Nunes, S. (2018). A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR), 51(4), 1-30.
Founta, A., Djouvas, C., Chatzakou, D., Leontiadis, I., Blackburn, J., Stringhini, G., Vakali, A., Sirivianos, M., & Kourtellis, N. (2018). Large scale crowdsourcing and characterization of twitter abusive behavior. Proceedings of the international AAAI conference on web and social media.
Goldfarb-Tarrant, S., Marchant, R., Sánchez, R. M., Pandya, M., & Lopez, A. (2020). Intrinsic bias metrics do not correlate with application bias. arXiv preprint arXiv:2012.15859.
Nadeem, M., Bethke, A., & Reddy, S. (2020). Stereoset: Measuring stereotypical bias in pretrained language models. arXiv preprint arXiv:2004.09456.
Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., & Chang, Y. (2016). Abusive language detection in online user content. Proceedings of the 25th international conference on world wide web.
Park, J. H., Shin, J., & Fung, P. (2018). Reducing gender bias in abusive language detection. arXiv preprint arXiv:1808.07231.
Patel, A., Oza, P., & Agrawal, S. (2023). Sentiment Analysis of Customer Feedback and Reviews for Airline Services using Language Representation Model. Procedia Computer Science, 218, 2459-2467.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining.
Roberts, A., Gaizauskas, R., Hepple, M., Davis, N., Demetriou, G., Guo, Y., Kola, J. S., Roberts, I., Setzer, A., & Tapuria, A. (2007). The CLEF corpus: semantic annotation of clinical text. AMIA Annual Symposium Proceedings.
Rudinger, R., Naradowsky, J., Leonard, B., & Van Durme, B. (2018). Gender bias in coreference resolution. arXiv preprint arXiv:1804.09301.
Sap, M., Gabriel, S., Qin, L., Jurafsky, D., Smith, N. A., & Choi, Y. (2019). Social bias frames: Reasoning about social and power implications of language. arXiv preprint arXiv:1911.03891.
Snow, R., O'Connor, B., Jurafsky, D., & Ng, A. Y. (2008). Cheap and fast–but is it good? Evaluating non-expert annotations for natural language tasks. Proceedings of the 2008 conference on empirical methods in natural language processing.
Stanovsky, G., Smith, N. A., & Zettlemoyer, L. (2019). Evaluating gender bias in machine translation. arXiv preprint arXiv:1906.00591.
Tulkens, S., Hilte, L., Lodewyckx, E., Verhoeven, B., & Daelemans, W. (2016). A dictionary-based approach to racism detection in Dutch social media. arXiv preprint arXiv:1608.08738.
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35, 24824-24837.
Zhang, Z., Zhang, A., Li, M., & Smola, A. (2022). Automatic chain of thought prompting in large language models. arXiv preprint arXiv:2210.03493.
Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K.-W. (2018). Gender bias in coreference resolution: Evaluation and debiasing methods. arXiv preprint arXiv:1804.06876.
電子全文 Fulltext
The electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid infringement.
論文使用權限 Thesis access permission: user-defined release date (自定論文開放時間)
開放時間 Available:
校內 Campus: available for download from 2025-08-28
校外 Off-campus: available for download from 2025-08-28

紙本論文 Printed copies
Public-availability information for printed copies is relatively complete from academic year 102 onward. To check the availability of printed copies from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 2025-08-28
