National Sun Yat-sen University, thesis/dissertation, Robotic Arm Control with Deep Reinforcement Learning- A Case Study of the Minimally Invasive Spine Surgery

Title	基於深度強化學習之機械臂載具控制- 以脊椎微創手術為例 (Robotic Arm Control with Deep Reinforcement Learning- A Case Study of the Minimally Invasive Spine Surgery)
Department	Department of Information Management
Year, semester	Academic Year 110, fall semester	Language	Chinese
Degree	Master	Number of pages	74
Author	曾建嘉 (Jian-Jia Zeng)
Advisor	楊惠芳 (Yang, Huei-Fang)
Convenor	魏家博 (Wei, Chia-Po)
Advisory Committee	康藝晃 (Kang, Yi-Huang)
Date of Exam	2022-01-26	Date of Submission	2022-01-27
Keywords	Surgical Robotic Arm, Deep Reinforcement Learning, Dense Convolutional Network, Fully Convolutional Networks, Minimally Invasive Spine Surgery
Statistics	This thesis/dissertation has been browsed 628 times and downloaded 9 times.

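Chinese Abstract (English translation)
The surgical robotic arms currently used in minimally invasive spine surgery require medical staff to control the path of the end-tool coordinates. Depending on the surgical scene, the operator first moves the end tool to a safe area and then slowly approaches the target position to carry out the operation. The whole process requires human intervention to steer clear of every surgical instrument in the operating space, so that the arm remains in a safe state while assisting the surgery. This study proposes a two-stage method that enables a surgical robotic arm to navigate the surgical environment on its own and reach the correct position accurately. In the first stage, we train the arm in a virtual simulation environment, using the Deep Q Network algorithm to learn a mapping from visual observations to actions. Specifically, our method comprises three fully convolutional networks that take RGB images, depth images, and target direction information as input, respectively. The feature maps produced by the three networks are concatenated and fed into a deconvolution network, which outputs pixel-wise Q values. Based on these Q values, the end of the arm decides the direction and distance of its next move, and a reward is given when the end successfully reaches the designated position. The learned policy therefore allows the arm to avoid obstacles. In the second stage, the model that has converged in the virtual simulation environment is deployed to a real environment to demonstrate its effectiveness. Experimental results show that the distance the trained model travels while avoiding obstacles is positively correlated with the shortest straight-line distance. In addition, the model reaches the target with a success rate of up to 70%. We hope that this method not only reduces manual control of surgical robotic arms but also lets medical staff concentrate on the surgical workflow.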
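As a rough illustration of the Deep Q Network learning rule named in the abstract, the sketch below computes the standard one-step Q-learning target with a sparse reward that is granted only when the target is reached. The function names, the discount factor, and the squared-error loss are assumptions made for illustration; the record does not state the values or the loss used in the thesis.

```python
def dqn_target(reward, next_q_values, done, gamma=0.9):
    """One-step Q-learning target commonly used to train a DQN.

    reward        : scalar reward observed after the move
    next_q_values : Q values of all candidate actions in the next state
    done          : True if the episode ended (for example, target reached)
    gamma         : discount factor (hypothetical value, not from the thesis)
    """
    if done:
        # No future return after a terminal step.
        return reward
    # Bootstrap from the best action available in the next state.
    return reward + gamma * max(next_q_values)


def squared_td_error(q_pred, target):
    """Squared temporal-difference error (an assumed, common loss choice)."""
    return (q_pred - target) ** 2


if __name__ == "__main__":
    # Terminal step with a success reward of 1.0 (hypothetical reward value).
    print(dqn_target(1.0, [0.2, 0.5], done=True))
    # Non-terminal step with no reward: the target is the discounted best next Q.
    print(dqn_target(0.0, [0.2, 0.5], done=False, gamma=0.5))
```

With a sparse reward like this, value information has to propagate backward from successful episodes, which is one reason simulation training is attractive before moving to a real arm.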
Abstract
Robot-assisted minimally invasive spinal surgery has shown promising results over the past decades, improving the accuracy of surgical operations. However, human intervention is typically needed to move the robotic arm from one surgical area to another so that it avoids all surgical obstacles. Once the arm is in the right area, surgeons adjust the tool attached to the arm so that it slowly approaches the target and performs the operation. To reduce human intervention during surgery, this thesis proposes a two-stage approach that enables the arm to navigate the environment and place the tool accurately in the correct position. In the first stage, we train the robotic arm in a virtual simulation environment to learn a mapping from visual observations to actions using the Deep Q Network (DQN) algorithm. Specifically, our approach involves three fully convolutional networks, which take as input RGB images, depth information, and the moving direction towards the target, respectively. The feature maps produced by these three networks are concatenated and fed into a deconvolution network that outputs pixel-wise Q values, from which the moving direction and distance are inferred. All the networks are optimized jointly. A reward is provided when the arm successfully arrives at the desired location, so the learned policy can avoid obstacles. In the second stage, the model trained in the virtual environment is deployed in a real environment to verify its effectiveness. Experimental results show a positive correlation between the distance the robotic arm travels while avoiding obstacles and the shortest straight-line distance. In addition, the model achieves a 70% success rate in reaching the target. We hope that this study can not only minimize the need for human intervention but also improve outcomes by allowing surgeons to focus more on the operation.
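One way to picture how pixel-wise Q values can yield a moving direction and distance is sketched below. It assumes, purely for illustration, that the Q map is centred on the current tool position, that the highest-valued pixel is chosen greedily, and that each pixel corresponds to a fixed physical length; `best_move` and `metres_per_pixel` are hypothetical names, and the actual decoding used in the thesis may differ.

```python
import math


def best_move(q_map, metres_per_pixel=0.01):
    """Pick the pixel with the highest Q value and convert it to a move.

    Assumption: the Q map is centred on the current tool position, so the
    offset of the best pixel from the centre gives direction and distance.
    The angle is measured in image coordinates (rows increase downward).
    """
    rows, cols = len(q_map), len(q_map[0])
    centre_r, centre_c = rows // 2, cols // 2

    # Greedy action selection: argmax over every pixel of the Q map.
    best_r, best_c = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: q_map[rc[0]][rc[1]],
    )

    d_row, d_col = best_r - centre_r, best_c - centre_c
    distance = math.hypot(d_row, d_col) * metres_per_pixel
    angle_deg = math.degrees(math.atan2(d_row, d_col))
    return (best_r, best_c), angle_deg, distance


if __name__ == "__main__":
    # Toy 3x3 Q map whose best pixel lies one column to the right of centre.
    q_map = [
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0],
        [0.0, 0.0, 0.0],
    ]
    print(best_move(q_map))
```

Treating each output pixel as a candidate action keeps the action space tied to the image, so the same network can score many candidate moves in a single forward pass.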

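Table of Contents
Thesis Approval Form
Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
  1.1 Research Background
  1.2 Research Motivation
  1.3 Research Objectives
Chapter 2. Literature Review
  2.1 Reinforcement Learning
  2.2 Deep Reinforcement Learning
    2.2.1 Deep Q Network
    2.2.2 Double DQN
  2.3 Fully Convolutional Networks
    2.3.1 Convolutional and Deconvolutional Layers
    2.3.2 Pooling Layers
    2.3.3 Activation Functions
    2.3.4 Batch Normalization
  2.4 Dense Convolutional Networks
  2.5 Vision-Based Robotic Arm Applications
    2.5.1 Robotic Arm Platforms
    2.5.2 Integration of Robotic Arm Platforms with Computer Vision Image Information
Chapter 3. Methodology
  3.1 Virtual Physics Simulation Environment
    3.1.1 Difficulties of Training in a Real Environment
    3.1.2 Building the Virtual Physics Simulation Environment
  3.2 Problem Description
  3.3 Environment State Design
    3.3.1 Local Environment Information
    3.3.2 Local Depth Information
    3.3.3 Target Direction Information
  3.4 Training Model Construction
    3.4.1 Dense Convolutional Network
    3.4.2 DQN Model Design
    3.4.3 Reward Mechanism
    3.4.4 Loss Function
    3.4.5 Exploration Mechanism
    3.4.6 Training Model Integration
    3.4.7 Delayed Reward Mechanism
  3.5 Real Environment
    3.5.1 Building the Real Environment
    3.5.2 Image Coordinate Calibration
    3.5.3 Inference in the Real Environment
Chapter 4. Results
  4.1 Training Performance in the Virtual Physics Simulation Environment
    4.1.1 Training Model Details
    4.1.2 Training Results with a Single Positioning Tracker Ball
    4.1.3 Training Results with the Complete Set of Positioning Tracker Balls
  4.2 Inference Results in the Real Environment
    4.2.1 Inference Results with a Single Positioning Tracker Ball
    4.2.2 Inference Results with Multiple Positioning Tracker Balls
    4.2.3 Success Rate of End-Effector Displacement
Chapter 5. Conclusions and Future Work
References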


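Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: user-defined release time. Availability: on campus, available; off campus, available. File: etd-0027122-172111.pdf
Printed copies
Public-availability information for printed theses is relatively complete from Academic Year 102 onward. To ask about printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. Availability: available.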
