論文使用權限 Thesis access permission: 自定論文開放時間 (user-defined availability date)
開放時間 Available:
校內 Campus: available from 2026-10-11
校外 Off-campus: available from 2026-10-11
論文名稱 Title: 基於強化學習之四軸無人機飛行姿態控制器設計
Design of flight attitude controller for quadcopter UAV based on reinforcement learning
系所名稱 Department:
畢業學年期 Year, semester:
語文別 Language:
學位類別 Degree:
頁數 Number of pages: 114
研究生 Author:
指導教授 Advisor:
召集委員 Convenor:
口試委員 Advisory Committee:
口試日期 Date of Exam: 2023-07-14
繳交日期 Date of Submission: 2023-10-11
關鍵字 Keywords: 四旋翼無人機、飛行姿態、強化學習、PID控制器、Q學習
Quadcopter, flight attitude angle, reinforcement learning, PID controller, Q-learning
統計 Statistics: 本論文已被瀏覽 262 次,被下載 0 次
This thesis has been viewed 262 times and downloaded 0 times.
中文摘要 Chinese Abstract
近年來隨著科技快速的發展,使得四旋翼無人機具有結構簡單、操作靈活、成本低廉及應用廣泛等優點,所以不論是在民生應用或是軍事需求上皆已被廣泛地使用。然而,四旋翼無人機在執行任務時,會因為其本身的高耦合性及易受外界干擾等特性,導致其飛行穩定性一直以來都是四旋翼無人機研究的重要課題之一。因此,本論文之研究目的在於透過強化學習中的Q-Learning演算法來加以動態調整控制飛行姿態的PID控制器中的參數,藉此維持四旋翼無人機執行任務時的飛行水平姿態穩定性。首先,本論文整合微控制器、慣性感測模組、接收機、電子調速器、直流無刷馬達、螺旋槳、電池與機架,形成一套四旋翼無人機飛行系統;接著,透過慣性感測模組來加以量測四旋翼無人機在飛行時的姿態角度,並透過微控制器來加以計算穩定飛行時的姿態角度與當下四旋翼無人機飛行姿態角度之誤差,並藉此作為PID控制器之輸入訊號;然後,我們透過Q-Learning演算法設計獎勵分數優化控制策略,來加以動態優化調整PID控制器中的比例增益(k_p)、積分增益(k_i)、微分增益(k_d)等參數;最後,由PID控制器輸出PWM訊號控制四旋翼無人機的四顆馬達運轉,藉此控制其飛行姿態角度,來加以確保其在執行任務時的飛行水平穩定性。經由實驗結果顯示,本論文所提出之基於強化學習之PID控制器相較於傳統的PID控制器,可更有效率地動態優化調整PID控制器參數,使得四旋翼無人機可在外界干擾的情況下,使振盪特性快速收斂至穩定。
Abstract
In recent years, with the rapid advancement of technology, quadcopters have gained popularity due to their simple structure, flexible operation, low cost, and wide range of applications. As a result, they are widely used in both civilian and military fields. However, quadcopters face challenges in maintaining flight stability because of their inherent high coupling and sensitivity to external disturbances. The objective of this study is therefore to enhance the flight stability of quadcopters by dynamically adjusting the parameters of the flight-attitude PID controller using the Q-learning algorithm from reinforcement learning. First, we integrate a microcontroller, an inertial measurement unit (IMU) module, a receiver, electronic speed controllers, brushless DC motors, propellers, batteries, and a frame to form a quadcopter flight system. Next, the quadcopter's attitude angles during flight are measured with the IMU module, and the microcontroller computes the error between the desired stable attitude angles and the current attitude angles; this error serves as the input signal to the PID controller. Subsequently, we design a reward-based optimization strategy using the Q-learning algorithm to dynamically tune the PID controller's parameters, namely the proportional gain (k_p), integral gain (k_i), and derivative gain (k_d). Finally, the PID controller outputs PWM signals that drive the quadcopter's four brushless DC motors, thereby controlling its attitude angles and ensuring level flight stability during task execution. Experimental results demonstrate that the proposed reinforcement learning-based PID controller tunes its parameters more efficiently than a traditional PID controller, allowing the quadcopter's oscillations to converge rapidly to a stable state under external disturbances.
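As a concrete illustration of the tuning loop described in the abstract, the following minimal Python sketch shows a tabular Q-learning agent whose actions are small increments to the PID gains and whose reward favors shrinking attitude-error oscillations. The state binning, action set, reward shape, and hyperparameters here are illustrative assumptions, not the thesis's actual design.

```python
# Illustrative sketch only: tabular Q-learning that nudges PID gains.
# State bins, actions, reward, and hyperparameters are assumed, not
# taken from the thesis.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate

# Actions: small increments applied to (kp, ki, kd).
ACTIONS = [(dp, di, dd)
           for dp in (-0.1, 0.0, 0.1)
           for di in (-0.01, 0.0, 0.01)
           for dd in (-0.05, 0.0, 0.05)]

Q = defaultdict(float)                   # Q[(state, action_index)] -> value

def discretize(amplitude_deg: float) -> int:
    """Map the measured attitude-error oscillation amplitude to a state bin."""
    return min(int(amplitude_deg // 2.0), 9)  # ten 2-degree bins

def choose_action(state: int) -> int:
    """Epsilon-greedy selection over the gain-increment actions."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

def compute_reward(prev_amp: float, new_amp: float) -> float:
    """Reward shrinking oscillations; penalize growth."""
    return prev_amp - new_amp

def q_update(state: int, action: int, reward: float, next_state: int) -> None:
    """Q-learning update: Q <- Q + alpha * (r + gamma * max_a' Q' - Q)."""
    best_next = max(Q[(next_state, a)] for a in range(len(ACTIONS)))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

In each control period, such an agent would pick an increment triple, apply it to (k_p, k_i, k_d), observe the new oscillation amplitude from the IMU, and update the Q-table with the resulting reward.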
目次 Table of Contents
Thesis certification i
Acknowledgements ii
Chinese abstract iii
English abstract iv
Table of contents v
List of figures viii
List of tables xii
Chapter 1 Introduction 1
1.1 Research motivation 1
1.2 Literature review 3
1.3 Research objectives 7
1.4 Thesis organization 8
Chapter 2 Quadcopter UAV system 10
2.1 UAV system 10
2.2 Microcontroller 12
2.3 Inertial sensors 16
2.4 Brushless DC motor 20
2.5 Electronic speed controller 21
2.6 Battery 22
2.7 Remote controller 23
2.8 Receiver 24
2.9 Frame 25
2.10 Propeller 26
2.11 Signal preprocessing 26
2.11.1 Sensor calibration 27
2.11.2 Signal filtering 31
2.12 UAV attitude 31
2.13 Complementary-filter-based UAV attitude estimation 33
2.14 PWM signal control 36
2.15 PID controller 38
2.16 UAV control 44
2.17 Overview of the UAV flowcharts 51
Chapter 3 Adaptive PID controller design for the UAV 54
3.1 Introduction to Markov processes in reinforcement learning 55
3.2 State-value function 58
3.3 Q-learning algorithm 59
3.4 Q-learning-based tuning of the UAV's kp, ki, and kd parameters 61
3.4.1 Oscillation state space of the UAV 61
3.4.2 Action space of the UAV controller 62
3.4.3 Reward acquisition in UAV reinforcement learning 62
3.4.4 Experience memory in UAV reinforcement learning training 65
3.4.5 UAV action policy 67
3.4.6 Optimization of state exploration in reinforcement learning 67
3.4.7 Limits on the state exploration range 68
3.4.8 State rollback and best-state rollback 70
3.4.9 Updating Q-learning exploration experience 70
3.4.10 Termination conditions for reinforcement learning iterations 71
3.5 UAV reinforcement learning computation flow 72
3.6 UAV experimental environment setup 74
Chapter 4 Experimental results 79
4.1 Experimental environment construction and procedure 79
4.2 UAV training 80
4.3 Improving the stability of training results 82
4.4 Optimality tests 82
4.4.1 Optimality test for k_p 83
4.4.2 Optimality test for k_d 84
4.4.3 Optimality test for k_i 86
4.5 X-axis repeatability 87
4.6 Y-axis repeatability 90
Chapter 5 Conclusions and future work 93
5.1 Conclusions 93
5.2 Future work 94
References 96
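Sections 2.13 and 2.15 of the table of contents above cover the two building blocks the adaptive scheme acts on: complementary-filter attitude estimation and the PID control law. As a rough sketch of these standard techniques (the filter weight, function names, and controller form below are illustrative assumptions, not the thesis's implementation):

```python
# Illustrative sketch of complementary-filter attitude estimation and one
# PID step; coefficients and structure are assumed, not from the thesis.
import math

def accel_roll(ax: float, ay: float, az: float) -> float:
    """Roll angle (degrees) inferred from the gravity direction in accelerometer data."""
    return math.degrees(math.atan2(ay, az))

def complementary_filter(angle_prev: float, gyro_rate: float,
                         accel_angle: float, dt: float, k: float = 0.98) -> float:
    """Blend the integrated gyro rate (fast but drifting) with the
    accelerometer angle (noisy but drift-free); k weights the gyro path."""
    return k * (angle_prev + gyro_rate * dt) + (1.0 - k) * accel_angle

def pid_step(error: float, integral: float, prev_error: float,
             kp: float, ki: float, kd: float, dt: float):
    """One PID update on the attitude-angle error; returns the control
    output and the updated internal state (integral, previous error)."""
    integral += error * dt
    derivative = (error - prev_error) / dt
    output = kp * error + ki * integral + kd * derivative
    return output, integral, error
```

In firmware, the PID output would be mapped to PWM duty cycles for the four electronic speed controllers, along the lines of Sections 2.14 and 2.16.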
電子全文 Fulltext
This electronic full text is licensed for personal, non-profit searching, reading, and printing for academic research purposes only. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
紙本論文 Printed copies
Public-access information for printed copies is relatively complete from academic year 102 (2013-14) onward. To look up printed-copy information for academic year 101 (2012-13) or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 2026-10-11