Master's and Doctoral Theses: detailed record for etd-0031124-180921
Title page for etd-0031124-180921
論文名稱
Title
啟發式演算法在大型語言模型提示優化中的應用與效能分析之研究
Research on the Application and Performance Analysis of Heuristic Algorithms in Large Language Model Prompt Optimization
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
55
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2024-01-29
繳交日期
Date of Submission
2024-01-31
關鍵字
Keywords
啟發式演算法、模擬退火演算法、粒子群演算法、大型語言模型、提示工程
Large Language Model, Heuristic Algorithm, Simulated Annealing, Particle Swarm Optimization, Prompt Engineering
統計
Statistics
本論文已被瀏覽 229 次,被下載 0 次。
The thesis/dissertation has been browsed 229 times and downloaded 0 times.
中文摘要 Chinese Abstract
In recent years, the rapid development of Large Language Models (LLMs) in the field of artificial intelligence has attracted widespread attention. Built on deep learning techniques, these models have demonstrated remarkable capabilities on a wide range of Natural Language Processing (NLP) tasks, such as text generation, machine translation, and sentiment analysis. Notably, LLM optimization has concentrated on two approaches: fine-tuning and prompt engineering. Fine-tuning improves model performance but typically requires substantial computational resources and time, whereas prompt engineering focuses on designing and refining the model's input prompts to improve efficiency and accuracy, which in turn demands considerable manual effort and domain expertise.
Against this background, a recent study (Guo et al., 2023) applied optimization algorithms to automate the prompt engineering process, with the goal of reducing reliance on human experts and thereby lowering costs and improving efficiency. This thesis extends that line of work and investigates how two heuristic algorithms, Particle Swarm Optimization (PSO) and Simulated Annealing (SA), can be used to optimize LLM prompts. The study focuses on comparing PSO against the Genetic Algorithm (GA) and Differential Evolution (DE) in terms of their effectiveness and advantages for prompt engineering; it also examines how SA differs from these three population-based algorithms, assessing how well a single-solution algorithm performs in prompt engineering.
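To make the swarm-based formulation concrete, the sketch below shows the generic PSO loop over discrete prompts in Python. It is an illustration only and not the implementation described in Chapter 3: the helpers `score` and `llm_rewrite` are hypothetical stand-ins for the thesis's development-set metric and for an LLM-based operator that rewrites a prompt toward its personal-best and the swarm's global-best prompts.

```python
# Illustrative PSO-style search over discrete prompts (hypothetical sketch).
# `score` and `llm_rewrite` are placeholders: in practice `score` would be a
# dev-set metric (accuracy / ROUGE / SARI) and `llm_rewrite` an LLM call that
# blends the current prompt with its personal-best and the global-best prompt.
import random

def score(prompt: str) -> float:
    """Stand-in fitness; replace with the task metric on a development set."""
    return random.random()

def llm_rewrite(current: str, personal_best: str, global_best: str) -> str:
    """Stand-in for the 'velocity' step: an LLM would be asked to rewrite
    `current` so that it moves toward both reference prompts."""
    return " ".join(random.sample([current, personal_best, global_best], k=2))

def pso_prompt_search(seed_prompts: list[str], iterations: int = 10) -> str:
    particles = list(seed_prompts)
    pbest = list(seed_prompts)                      # each particle's best prompt so far
    pbest_score = [score(p) for p in pbest]
    g = max(range(len(pbest)), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g], pbest_score[g]   # best prompt found by the swarm

    for _ in range(iterations):
        for i, prompt in enumerate(particles):
            candidate = llm_rewrite(prompt, pbest[i], gbest)
            s = score(candidate)
            particles[i] = candidate
            if s > pbest_score[i]:                  # update the personal best
                pbest[i], pbest_score[i] = candidate, s
            if s > gbest_score:                     # update the global best
                gbest, gbest_score = candidate, s
    return gbest

if __name__ == "__main__":
    seeds = ["Classify the sentiment of the review.",
             "Decide whether the following review is positive or negative."]
    print(pso_prompt_search(seeds))
```

The structure mirrors classical PSO: every particle keeps a personal best, the swarm keeps a global best, and each update biases the new candidate toward both, which is what drives the gains in prompt diversity and quality reported for PSO.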
We conducted a series of experiments on multiple NLP tasks, including language understanding, text summarization, and text simplification, using different datasets for each. These experiments were designed to verify the effectiveness of the different optimization methods. The results show that PSO and SA each have advantages in improving prompt quality and overall LLM performance: PSO is particularly strong at enhancing the diversity and quality of prompts, while SA shows flexible adaptability across different tasks and settings.
These findings not only provide new methods and perspectives for LLM optimization but also open new avenues for future research and applications in artificial intelligence, especially in the rapidly evolving field of natural language processing. With such advanced optimization techniques, LLMs can be expected to play an even greater role in a variety of practical applications, thereby advancing the field of artificial intelligence as a whole.
Abstract
In recent years, the development of Large Language Models (LLMs) in AI has gained significant attention. These models excel in various Natural Language Processing tasks like text generation, translation, and sentiment analysis. Their optimization focuses mainly on two areas: Fine-tuning and Prompt Engineering. Fine-tuning improves model performance but requires extensive resources, while Prompt Engineering involves crafting input prompts for better efficiency and accuracy, demanding substantial expertise.
A recent study (Guo et al., 2023) employed optimization algorithms to automate Prompt Engineering, aiming to reduce reliance on experts, lower costs, and increase efficiency. Building on that work, this research applies two heuristic algorithms, Particle Swarm Optimization (PSO) and Simulated Annealing (SA), to LLM prompt optimization and compares them with the Genetic Algorithm (GA) and Differential Evolution (DE) for Prompt Engineering, demonstrating PSO's strength in enhancing prompt diversity and SA's adaptability across various tasks.
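Simulated annealing transfers to prompt search just as directly. The Python below is a hedged sketch rather than the thesis's code: `score` and `mutate` are hypothetical placeholders for the development-set metric and an LLM-generated paraphrase of the current prompt, and the geometric cooling schedule is an assumption made only for this example.

```python
# Illustrative simulated annealing over a single discrete prompt (hypothetical sketch).
import math
import random

def score(prompt: str) -> float:
    return random.random()                     # stand-in for accuracy / ROUGE / SARI

def mutate(prompt: str) -> str:
    return prompt + " Think step by step."     # stand-in for an LLM paraphrase

def sa_prompt_search(init_prompt: str, t0: float = 1.0,
                     cooling: float = 0.9, steps: int = 50) -> str:
    current, current_score = init_prompt, score(init_prompt)
    best, best_score = current, current_score
    temperature = t0
    for _ in range(steps):
        candidate = mutate(current)
        s = score(candidate)
        # Always accept improvements; accept worse prompts with a probability
        # that shrinks as the temperature cools (Metropolis criterion).
        if s >= current_score or random.random() < math.exp((s - current_score) / temperature):
            current, current_score = candidate, s
        if current_score > best_score:
            best, best_score = current, current_score
        temperature *= cooling
    return best

print(sa_prompt_search("Summarize the following article in one sentence."))
```

Because SA tracks one candidate rather than a population, each iteration costs a single new prompt evaluation, in contrast to one evaluation per particle in a PSO iteration; this difference in per-iteration cost is one practical distinction between the single-solution and population-based approaches compared in this study.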
We conducted a series of experiments using different datasets for various NLP tasks, such as Language Understanding, Text Summarization, and Text Simplification. These experiments aimed to verify the effectiveness of different optimization methods. The results showed that both PSO and SA have their advantages in enhancing prompt quality and overall LLM performance. In particular, PSO excels in enhancing the diversity and quality of prompts, while SA demonstrates its flexibility and adaptability across different tasks and contexts.
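As a concrete illustration of the classification setting, the sketch below maps an LLM's free-form output to a class label through a keyword dictionary and then computes accuracy, in the spirit of Sections 4.3 and 4.5.1 of the table of contents. The dictionary entries and function names are hypothetical examples, not the ones built in the thesis.

```python
# Hypothetical label mapping and accuracy computation for classification tasks.
KEYWORD_TO_LABEL = {                        # illustrative entries only
    "positive": "positive", "great": "positive", "good": "positive",
    "negative": "negative", "bad": "negative", "terrible": "negative",
}

def map_to_label(generated: str, default: str = "negative") -> str:
    """Look the generated text up in the keyword dictionary; fall back to a
    default label when no keyword matches."""
    text = generated.lower()
    for keyword, label in KEYWORD_TO_LABEL.items():
        if keyword in text:
            return label
    return default

def accuracy(outputs: list[str], gold_labels: list[str]) -> float:
    hits = sum(map_to_label(o) == g for o, g in zip(outputs, gold_labels))
    return hits / len(gold_labels)

print(accuracy(["It was great!", "Sadly, terrible."], ["positive", "negative"]))  # 1.0
```

For the generation tasks, the same scoring slot would instead hold ROUGE (summarization) or SARI (simplification), so the optimization loops sketched above stay unchanged across task types.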
These research findings not only provide new methods and perspectives for the optimization of LLMs but also pave new paths for future research and applications in artificial intelligence, especially in the rapidly evolving field of natural language processing. Through these advanced optimization techniques, we can expect LLMs to play a more significant role in various practical applications in the future, thereby driving the development of the entire field of artificial intelligence.
目次 Table of Contents
Thesis Approval Form i
Abstract (in Chinese) ii
Abstract iii
Table of Contents iv
List of Figures vii
List of Tables viii
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 2
1.3 Research Objectives 3
Chapter 2 Literature Review 4
2.1 Large Language Models 4
2.2 Prompts and Large Language Models 5
2.2.1 Auto Prompting 5
2.2.2 Discrete Prompts and Continuous Prompts 5
2.2.3 Approaches to Prompt Engineering 6
2.3 Heuristic Algorithms and Large Language Models 7
2.3.1 Evolutionary Algorithms and Large Language Models 7
2.3.2 Particle Swarm Optimization 7
2.3.3 Simulated Annealing 8
2.4 Differences between PSO, GA, and DE 9
Chapter 3 Research Methods 10
3.1 Research Design 10
3.2 Prompt Optimization with Particle Swarm Optimization 10
3.2.1 Overall PSO Workflow 11
3.2.2 The Particle Swarm Optimization Algorithm 12
3.2.3 Applying PSO to Prompt Optimization 13
3.3 Prompt Optimization with Simulated Annealing 15
3.3.1 Overall SA Workflow 15
3.3.2 The Simulated Annealing Algorithm 16
3.3.3 Applying SA to Prompt Optimization 16
Chapter 4 Experimental Results and Discussion 17
4.1 Experimental Environment 17
4.2 Datasets 17
4.2.1 Natural Language Understanding 18
4.2.2 Natural Language Generation 20
4.3 Mapping Generated Text to Labels 20
4.3.1 Building the Generated-Keyword Dictionary 20
4.3.2 Dictionary Lookup 21
4.4 Compared Baselines 21
4.5 Evaluation Metrics 22
4.5.1 Accuracy 22
4.5.2 ROUGE 22
4.5.3 SARI 23
4.6 Experimental Procedure and Design 24
4.6.1 Construction of Initial Prompts 24
4.6.2 Particle Swarm Optimization 25
4.6.3 Simulated Annealing 26
4.6.4 Natural Language Task Variables 27
4.7 Experimental Results and Analysis 31
4.7.1 Results 31
4.7.2 Advantages of Particle Swarm Optimization 35
4.7.3 Advantages of Simulated Annealing 40
Chapter 5 Conclusions and Future Work 41
5.1 Conclusions 41
5.2 Future Work 41
Chapter 6 References 42
參考文獻 References
[1]Almufti, S., Zebari, A., & Omer, H. (2019). A comparative study of particle swarm optimization and genetic algorithm. Journal of Advanced Computer Science and Technology, 8(2), 40-45. https://doi.org/10.14419/jacst.v8i2.29401
[2]Alva-Manchego, F., Martin, L., Bordes, A., Scarton, C., Sagot, B., & Specia, L. (2020). ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations. arXiv:2005.00481. Retrieved May 01, 2020, from https://ui.adsabs.harvard.edu/abs/2020arXiv200500481A; https://arxiv.org/pdf/2005.00481.pdf
[3]Bach, S. H., Sanh, V., Yong, Z.-X., Webson, A., Raffel, C., Nayak, N. V., Sharma, A., Kim, T., Saiful Bari, M., Fevry, T., Alyafeai, Z., Dey, M., Santilli, A., Sun, Z., Ben-David, S., Xu, C., Chhablani, G., Wang, H., Fries, J. A., . . . Rush, A. M. (2022). PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts. arXiv:2202.01279. Retrieved February 01, 2022, from https://ui.adsabs.harvard.edu/abs/2022arXiv220201279B
[4]Deng, M., Wang, J., Hsieh, C.-P., Wang, Y., Guo, H., Shu, T., Song, M., Xing, E. P., & Hu, Z. (2022). RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning. arXiv:2205.12548. Retrieved May 01, 2022, from https://ui.adsabs.harvard.edu/abs/2022arXiv220512548D
[5]Guo, Q., Wang, R., Guo, J., Li, B., Song, K., Tan, X., Liu, G., Bian, J., & Yang, Y. (2023). Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers. arXiv:2309.08532. Retrieved September 01, 2023, from https://ui.adsabs.harvard.edu/abs/2023arXiv230908532G; https://arxiv.org/pdf/2309.08532.pdf
[6]Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press.
[7]Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, Seattle, WA, USA. https://doi.org/10.1145/1014052.1014073
[8]Kachitvichyanukul, V. (2012). Comparison of three evolutionary algorithms: GA, PSO, and DE. Industrial Engineering and Management Systems, 11(3), 215-223. https://doi.org/10.7232/iems.2012.11.3.215
[9]Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. Proceedings of ICNN'95 - International Conference on Neural Networks.
[10]Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by Simulated Annealing. Science, 220(4598), 671-680. https://doi.org/10.1126/science.220.4598.671
[11]Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large Language Models are Zero-Shot Reasoners. arXiv:2205.11916. Retrieved May 01, 2022, from https://ui.adsabs.harvard.edu/abs/2022arXiv220511916K; https://arxiv.org/pdf/2205.11916.pdf
[12]Li, B., Wang, R., Guo, J., Song, K., Tan, X., Hassan, H., Menezes, A., Xiao, T., Bian, J., & Zhu, J. (2023). Deliberate then Generate: Enhanced Prompting Framework for Text Generation. arXiv:2305.19835. Retrieved May 01, 2023, from https://ui.adsabs.harvard.edu/abs/2023arXiv230519835L
[13]Li, M., Wang, W., Feng, F., Cao, Y., Zhang, J., & Chua, T.-S. (2023, December). Robust Prompt Optimization for Large Language Models Against Distribution Shifts. In H. Bouamor, J. Pino, & K. Bali, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing Singapore.
[14]Mishra, S., Khashabi, D., Baral, C., & Hajishirzi, H. (2022, May). Cross-Task Generalization via Natural Language Crowdsourcing Instructions. In S. Muresan, P. Nakov, & A. Villavicencio, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) Dublin, Ireland.
[15]Pang, B., & Lee, L. (2004, July). A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), Barcelona, Spain.
[16]Pang, B., & Lee, L. (2005, June). Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales. In K. Knight, H. T. Ng, & K. Oflazer, Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05) Ann Arbor, Michigan.
[17]Passigan, P., Yohannes, K., & Pereira, J. (2023). Continuous Prompt Generation from Linear Combination of Discrete Prompt Embeddings. arXiv:2312.10323. Retrieved December 01, 2023, from https://ui.adsabs.harvard.edu/abs/2023arXiv231210323P
[18]Pryzant, R., Iter, D., Li, J., Lee, Y. T., Zhu, C., & Zeng, M. (2023). Automatic Prompt Optimization with "Gradient Descent" and Beam Search. arXiv:2305.03495. Retrieved May 01, 2023, from https://ui.adsabs.harvard.edu/abs/2023arXiv230503495P
[19]Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training.
[20]Sanh, V., Webson, A., Raffel, C., Bach, S. H., Sutawika, L., Alyafeai, Z., Chaffin, A., Stiegler, A., Le Scao, T., Raja, A., Dey, M., Saiful Bari, M., Xu, C., Thakker, U., Sharma Sharma, S., Szczechla, E., Kim, T., Chhablani, G., Nayak, N., . . . Rush, A. M. (2021). Multitask Prompted Training Enables Zero-Shot Task Generalization. arXiv:2110.08207. Retrieved October 01, 2021, from https://ui.adsabs.harvard.edu/abs/2021arXiv211008207S
[21]Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., & Potts, C. (2013, October). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In D. Yarowsky, T. Baldwin, A. Korhonen, K. Livescu, & S. Bethard, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing Seattle, Washington, USA.
[22]Storn, R., & Price, K. (1997). Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. Journal of global optimization, 11, 341-359.
[23]Taori, R., Gulrajani, I., Zhang, T., Dubois, Y., Li, X., Guestrin, C., Liang, P., & Hashimoto, T. B. (2023). Stanford alpaca: An instruction-following llama model.
[24]Tharwat, A., & Schenck, W. (2021). A conceptual and practical comparison of PSO-style optimization algorithms. Expert Systems with Applications, 167, 114430. https://doi.org/10.1016/j.eswa.2020.114430
[25]Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023). LLaMA: Open and Efficient Foundation Language Models. arXiv:2302.13971. Retrieved February 01, 2023, from https://ui.adsabs.harvard.edu/abs/2023arXiv230213971T; https://arxiv.org/pdf/2302.13971.pdf
[26]Voorhees, E. M., & Tice, D. M. (2000). Building a question answering test collection. Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, Athens, Greece. https://doi.org/10.1145/345508.345577; https://dl.acm.org/doi/pdf/10.1145/345508.345577
[27]Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903. Retrieved January 01, 2022, from https://ui.adsabs.harvard.edu/abs/2022arXiv220111903W; https://arxiv.org/pdf/2201.11903.pdf
[28]Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., Chen, S., Dewan, C., Diab, M., Li, X., Lin, X. V., Mihaylov, T., Ott, M., Shleifer, S., Shuster, K., Simig, D., Singh Koura, P., Sridhar, A., Wang, T., & Zettlemoyer, L. (2022). OPT: Open Pre-trained Transformer Language Models. arXiv:2205.01068. Retrieved May 01, 2022, from https://ui.adsabs.harvard.edu/abs/2022arXiv220501068Z
[29]Zhang, W., Deng, Y., Liu, B., Jialin Pan, S., & Bing, L. (2023). Sentiment Analysis in the Era of Large Language Models: A Reality Check. arXiv:2305.15005. Retrieved May 01, 2023, from https://ui.adsabs.harvard.edu/abs/2023arXiv230515005Z
[30]Zhang, X., Zhao, J., & LeCun, Y. (2015). Character-level Convolutional Networks for Text Classification. arXiv:1509.01626. Retrieved September 01, 2015, from https://ui.adsabs.harvard.edu/abs/2015arXiv150901626Z; https://arxiv.org/pdf/1509.01626.pdf
[31]Zhang, Y., Cui, L., Cai, D., Huang, X., Fang, T., & Bi, W. (2023). Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance. arXiv:2305.13225. Retrieved May 01, 2023, from https://ui.adsabs.harvard.edu/abs/2023arXiv230513225Z; http://arxiv.org/pdf/2305.13225.pdf
[32]Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2022). Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. arXiv:2205.10625. Retrieved May 01, 2022, from https://ui.adsabs.harvard.edu/abs/2022arXiv220510625Z; https://arxiv.org/pdf/2205.10625.pdf
[33]Zhou, Y., Ioan Muresanu, A., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2022). Large Language Models Are Human-Level Prompt Engineers. arXiv:2211.01910. Retrieved November 01, 2022, from https://ui.adsabs.harvard.edu/abs/2022arXiv221101910Z; https://arxiv.org/pdf/2211.01910.pdf
電子全文 Fulltext
This electronic full text is licensed to users solely for personal, non-profit academic research purposes of searching, reading, and printing. Please comply with the relevant provisions of the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid infringing the law.
論文使用權限 Thesis access permission: user-defined release date
開放時間 Available:
校內 Campus: available for download from 2027-01-31
校外 Off-campus: available for download from 2027-01-31

紙本論文 Printed copies
Public access information for printed theses is relatively complete from academic year 102 (2013/14, ROC calendar) onward. To check the access information of a printed thesis from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 available 2027-01-31
