博碩士論文 etd-0626123-123502 詳細資訊
Title page for etd-0626123-123502
論文名稱
Title
應用提示學習擴充資料於自然語言理解任務
Data Augmentation by Prompt Tuning on Natural Language Understanding Task
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
53
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2023-07-07
繳交日期
Date of Submission
2023-07-26
關鍵字
Keywords
實體萃取、意圖分類、資料擴充、自然語言理解、提示學習
Entity Extraction, Intent Classification, Data Augmentation, Natural Language Understanding, Prompt Tuning
統計
Statistics
本論文已被瀏覽 165 次,被下載 0 次。
The thesis/dissertation has been browsed 165 times, has been downloaded 0 times.
中文摘要
隨著自然語言技術的進步,現在許多客服會採用聊天機器人來輔助使用者取得資訊或是提供服務,聊天機器人又可分為從使用者輸入中獲取資訊的自然語言理解模組以及對話流程控制模組,其中自然語言理解任務又包含實體辨識以及意圖分類。在訓練自然語言模型時需要大量資料,然而,我們可以獲取的訓練資料卻不多,此時便需要資料擴充技術來協助資料的產生。

本研究藉由預訓練語言模型,透過對預訓練語言模型再訓練,來生成目標領域內的資料,並在後續透過分類器的篩選來提升資料品質,最後再用篩選過的資料訓練分類器。

我們基於 PromDA (Wang et al., 2022) 提出了多任務的生成架構。藉由整合意圖分類以及實體辨識兩個任務,使生成的資料能夠用於多任務的訓練,並證明此類整合可以提升兩個任務的準確度。
Abstract
With the advancement of natural language technology, many customer service systems now employ chatbots to help users obtain information or access services. A chatbot can be divided into two main modules: a natural language understanding (NLU) module, which extracts information from user input, and a dialogue flow control module. The NLU task in turn comprises entity recognition and intent classification. Training natural language models requires a large amount of data, but the available training data is often limited; in such cases, data augmentation techniques are used to generate additional data.
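To make the setting concrete, the sketch below shows what a single NLU training example looks like when it is annotated for both sub-tasks. The utterance, intent label, and slot names are hypothetical placeholders, not examples taken from the thesis or from any particular dataset.

```python
# Illustrative sketch only: one NLU training example carries supervision for
# both sub-tasks at once, an intent label for the whole utterance and BIO slot
# tags for entity extraction. All labels here are hypothetical.
example = {
    "utterance": "wake me up at seven am tomorrow",
    "intent": "alarm_set",  # intent classification target (hypothetical label)
    "tokens": ["wake", "me", "up", "at", "seven", "am", "tomorrow"],
    "slots":  ["O", "O", "O", "O", "B-time", "I-time", "B-date"],  # entity extraction target
}

# One BIO tag per token; in the few-shot setting only a handful of such
# examples exist per intent, which is what motivates data augmentation.
assert len(example["tokens"]) == len(example["slots"])
```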

In this study, we further train a pre-trained language model so that it generates data in the target domain. The generated data is then filtered by a classifier to improve its quality, and the filtered data is finally used to train the classifier.
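A minimal sketch of this generate-filter-train pipeline is given below, under the assumption that the generator and classifier training routines are supplied as callables; none of the names or data layouts below come from the thesis or from PromDA's released code.

```python
from typing import Callable, Dict, List

# Each synthetic or seed example pairs an utterance with the label it is
# conditioned on (hypothetical layout for illustration).
Example = Dict[str, str]  # e.g. {"utterance": "...", "label": "..."}

def augment_and_train(
    seed_data: List[Example],
    train_generator: Callable[[List[Example]], Callable[[int], List[Example]]],
    train_classifier: Callable[[List[Example]], Callable[[str], str]],
    num_synthetic: int,
) -> Callable[[str], str]:
    # 1. Further train a pre-trained LM on the small seed set so that it
    #    generates labelled utterances in the target domain.
    generate = train_generator(seed_data)
    candidates = generate(num_synthetic)

    # 2. Consistency filtering: a classifier trained on the seed data keeps
    #    only candidates whose predicted label matches the label they were
    #    generated for.
    seed_classifier = train_classifier(seed_data)
    filtered = [c for c in candidates
                if seed_classifier(c["utterance"]) == c["label"]]

    # 3. Train the final classifier on the seed data plus the filtered
    #    synthetic data.
    return train_classifier(seed_data + filtered)
```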

Building upon PromDA (Wang et al., 2022), we propose a multi-task generation framework. By integrating the intent classification and entity recognition tasks, the generated data can be used for multi-task training, and we show that this integration improves the accuracy of both tasks.
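The sketch below shows one plausible way such a multi-task example could be linearised for a text-to-text generator (e.g. T5), so that a single generated string can be decoded into training data for both tasks. The template and labels are illustrative assumptions, not the exact input/output format used in the thesis.

```python
from typing import Dict

def build_target(utterance: str, intent: str, slots: Dict[str, str]) -> str:
    # Pack the intent and the slot annotations into one output string so that
    # one generated example serves both intent classification and entity
    # extraction. The template below is a hypothetical linearisation.
    slot_part = " ; ".join(f"{name} = {value}" for name, value in slots.items())
    return f"intent: {intent} | slots: {slot_part} | text: {utterance}"

print(build_target(
    "wake me up at seven am",
    intent="alarm_set",            # hypothetical intent label
    slots={"time": "seven am"},    # hypothetical slot annotation
))
# -> intent: alarm_set | slots: time = seven am | text: wake me up at seven am
```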



目次 Table of Contents
審定書 (Thesis Approval Sheet) i
摘要 (Chinese Abstract) ii
Abstract iii
目錄 (Table of Contents) iv
圖目錄 (List of Figures) v
表目錄 (List of Tables) vi
1. Introduction 1
2. Related Work 5
2.1 Few-shot Learning 5
2.2 Prompt Tuning 5
2.2.1 Discrete Prompt 6
2.2.2 Continuous Prompt 7
2.3 Data Augmentation 9
2.3.1 Rule-based 9
2.3.2 Interpolation 9
2.3.3 Model-based 10
2.4 Pre-trained Language Models 10
2.4.1 T5 10
2.4.2 BERT 11
2.4.3 GPT and ChatGPT 12
3. Method 15
3.1 Prompt-based Generator 17
3.2 Pre-training for Prompt Initialization 17
3.3 Finetune Generator 18
3.4 Generative DA 21
3.5 Consistency Filtering 24
3.6 Finetune Classifier 25
4. Experiments 27
4.1 Experiment Settings 27
4.2 Result 30
4.3 Discussion 32
4.3.1 Slot Type 32
4.3.2 LLM as Generator 35
5. Conclusion 37
References 38
Appendix 43
A. Performance summary 43
B. Prompt with LLM 44
參考文獻 References
Anaby-Tavor, A., Carmeli, B., Goldbraich, E., Kantor, A., Kour, G., Shlomov, S., Tepper, N., & Zwerdling, N. (2020). Do Not Have Enough Data? Deep Learning to the Rescue! Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 7383-7390. https://doi.org/10.1609/aaai.v34i05.6233
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., & Askell, A. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901.
Chen, J., Yang, Z., & Yang, D. (2020). MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.194
Dai, X., & Adel, H. (2020, December). An Analysis of Simple Data Augmentation for Named Entity Recognition. Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain (Online).
De Cao, N., Izacard, G., Riedel, S., & Petroni, F. (2020). Autoregressive entity retrieval. arXiv preprint arXiv:2010.00904.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019, June). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota.
DeVries, T., & Taylor, G. W. (2017). Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552.
Ding, B., Liu, L., Bing, L., Kruengkrai, C., Nguyen, T. H., Joty, S., Si, L., & Miao, C. (2020). DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.488
FitzGerald, J., Hench, C., Peris, C., Mackie, S., Rottmann, K., Sanchez, A., Nash, A., Urbach, L., Kakarala, V., Singh, R., Ranganath, S., Crist, L., Britan, M., Leeuwis, W., Tur, G., & Natarajan, P. (2023). MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages. Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-long.235
Ghiasi, G., Cui, Y., Srinivas, A., Qian, R., Lin, T.-Y., Cubuk, E. D., Le, Q. V., & Zoph, B. (2021). Simple copy-paste is a strong data augmentation method for instance segmentation. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.
Gu, Y., Han, X., Liu, Z., & Huang, M. (2022). PPT: Pre-trained Prompt Tuning for Few-shot Learning. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.576
Hariharan, B., & Girshick, R. (2017). Low-shot visual recognition by shrinking and hallucinating features. Proceedings of the IEEE international conference on computer vision.
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., & Gelly, S. (2019). Parameter-efficient transfer learning for NLP. International Conference on Machine Learning.
Jiang, Z., Xu, F. F., Araki, J., & Neubig, G. (2020). How Can We Know What Language Models Know? Transactions of the Association for Computational Linguistics, 8. https://doi.org/10.1162/tacl_a_00324
Kobayashi, S. (2018). Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations. Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-2072
Kumar, V., Glaude, H., de Lichy, C., & Campbell, W. (2019, November). A Closer Look At Feature Space Data Augmentation For Few-Shot Intent Classification. Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), Hong Kong, China.
Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691.
Li, X. L., & Liang, P. (2021). Prefix-Tuning: Optimizing Continuous Prompts for Generation. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.353
Liu, X., Ji, K., Fu, Y., Tam, W., Du, Z., Yang, Z., & Tang, J. (2022). P-tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
Liu, X., Zheng, Y., Du, Z., Ding, M., Qian, Y., Yang, Z., & Tang, J. (2021). GPT understands, too. arXiv preprint arXiv:2103.10385.
Ng, N., Cho, K., & Ghassemi, M. (2020). SSMBA: Self-supervised manifold based data augmentation for improving out-of-domain robustness. arXiv preprint arXiv:2009.10195.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1), 5485-5551.
Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic keyword extraction from individual documents. Text mining: applications and theory, 1-20.
Schick, T., & Schütze, H. (2021). Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.eacl-main.20
Schwartz, E., Karlinsky, L., Shtok, J., Harary, S., Marder, M., Kumar, A., Feris, R., Giryes, R., & Bronstein, A. (2018). Delta-encoder: an effective sample synthesis method for few-shot object recognition. Advances in neural information processing systems, 31.
Sellam, T., Das, D., & Parikh, A. (2020). BLEURT: Learning Robust Metrics for Text Generation. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.704
Sennrich, R., Haddow, B., & Birch, A. (2016). Improving Neural Machine Translation Models with Monolingual Data. Association for Computational Linguistics. https://doi.org/10.18653/v1/P16-1009
Wang, Y., Xu, C., Sun, Q., Hu, H., Tao, C., Geng, X., & Jiang, D. (2022). PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.292
Wei, J., & Zou, K. (2019). EDA: Easy data augmentation techniques for boosting performance on text classification tasks. arXiv preprint arXiv:1901.11196.
Xu, X., Wang, G., Kim, Y.-B., & Lee, S. (2021). AugNLG: Few-shot Natural Language Generation using Self-trained Data Augmentation. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.95
Yang, Y., Malaviya, C., Fernandez, J., Swayamdipta, S., Le Bras, R., Wang, J.-P., Bhagavatula, C., Choi, Y., & Downey, D. (2020). Generative Data Augmentation for Commonsense Reasoning. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.90
Yun, S., Han, D., Oh, S. J., Chun, S., Choe, J., & Yoo, Y. (2019). CutMix: Regularization strategy to train strong classifiers with localizable features. Proceedings of the IEEE/CVF international conference on computer vision.
Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412.
Zhang, R., Yu, Y., & Zhang, C. (2020). SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.691
電子全文 Fulltext
本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。
This electronic full text is licensed only for personal, non-profit academic research purposes: searching, reading, and printing. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid infringement.
論文使用權限 Thesis access permission: 自定論文開放時間 user-defined
開放時間 Available:
校內 Campus:開放下載的時間 available 2025-07-26
校外 Off-campus:開放下載的時間 available 2025-07-26


紙本論文 Printed copies
紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。
Public access information for printed theses is relatively complete for academic year 102 and later. For printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Library and Information Services. We apologize for any inconvenience.
開放時間 available 2025-07-26
