Title page for etd-0030123-120345
論文名稱 Title
應用基於2-D骨架的深度學習模型於車前行人路徑預測
A 2-D Skeleton-Based Deep Learning Model for Pedestrian Path Prediction from Moving Vehicle
系所名稱 Department
畢業學年期 Year, semester
語文別 Language
學位類別 Degree
頁數 Number of pages
111
研究生 Author
指導教授 Advisor
召集委員 Convenor
口試委員 Advisory Committee
口試日期 Date of Exam
2023-01-13
繳交日期 Date of Submission
2023-01-30
關鍵字 Keywords
Pedestrian detection in front of the vehicle, autonomous driving, deep learning, transfer learning, LiDAR-camera fusion
統計 Statistics
This thesis/dissertation has been browsed 31 times and downloaded 0 times.
摘要 Abstract
Autonomous driving has become a central axis of the global automotive market, and competition is fierce: Waymo under Google and Tesla, currently the leading self-driving carmaker, have both moved step by step toward safe self-driving systems that require no driver intervention, while domestic industry leaders such as Hon Hai have begun developing home-grown electric vehicles in hopes of riding this wave. One key factor for the autonomous-driving industry is the safety of its interaction with the surrounding environment. In urban scenes, vehicles constantly encounter pedestrians, so creating safe interaction between pedestrians and vehicles has become the problem carmakers most urgently need to solve.
This thesis therefore proposes a pedestrian detection scheme for a self-driving vehicle deployed in Area D of Terminal 2 of Taoyuan International Airport. LiDAR, after filtering and noise removal, tracks obstacles in the vehicle's surroundings, while the camera image is used to locate pedestrians and extract their joint information. A deep neural network then learns to predict each pedestrian's future path relative to the vehicle, and transfer learning carries the trained model over to an environment built on the campus of National Sun Yat-sen University. Finally, the proposed safety detection method is verified with reference to the Euro NCAP AEB test protocol, demonstrating that the system is effective in terms of safety, timing, and applicability at different vehicle speeds. A human-machine interface displays the dangerous pedestrians detected in the front-view image, giving the driver or the automated driving system a reference so that the vehicle can accurately predict pedestrian paths and make the most reliable decisions.
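The record itself contains no source code, but the pipeline the abstract outlines (voxel filtering and noise removal on the LiDAR point cloud, clustering, then a recurrent network that maps a window of 2-D joint coordinates to a future path) can be sketched compactly. The sketch below is purely illustrative: the library choices (Open3D, PyTorch), the 25-joint skeleton layout, the prediction window length, and every parameter value are assumptions for exposition, not the implementation used in the thesis.

import numpy as np
import open3d as o3d   # assumed point-cloud library, not specified by the thesis
import torch
import torch.nn as nn

def preprocess_lidar(xyz: np.ndarray):
    # Steps named in Sections 4.1.1-4.1.3: voxel filter, outlier removal, clustering.
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(xyz)
    pcd = pcd.voxel_down_sample(voxel_size=0.10)                     # voxel filtering
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20,
                                            std_ratio=2.0)           # noise removal
    labels = np.asarray(pcd.cluster_dbscan(eps=0.5, min_points=10))  # clustering
    return pcd, labels

class PathPredictor(nn.Module):
    # GRU encoder over a window of flattened 2-D joint coordinates; the head
    # regresses the pedestrian's future (x, y) positions relative to the vehicle.
    def __init__(self, n_joints=25, pred_len=15, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(n_joints * 2, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pred_len * 2)
        self.pred_len = pred_len

    def forward(self, skel_seq):          # skel_seq: (batch, obs_len, n_joints*2)
        _, h = self.encoder(skel_seq)     # h: (num_layers, batch, hidden)
        return self.head(h[-1]).view(-1, self.pred_len, 2)

For the transfer-learning step (Section 4.5.6), one would typically reload the trained weights, freeze the GRU encoder (set requires_grad = False on model.encoder.parameters()), and fine-tune only the output head on data from the campus environment.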
目次 Table of Contents
Thesis Approval Form
Acknowledgments
Abstract (in Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Literature Review
1.2.1 Skeleton Analysis
1.2.2 Multi-Object Tracking
1.2.3 Behavior Recognition
1.2.4 Motion Prediction
1.2.5 Intention Estimation
1.2.6 Path Prediction
1.2.7 Risk Assessment
1.3 Main Contributions
1.4 Thesis Organization
Chapter 2 Research Methods
2.1 Neurons and Multilayer Perceptrons
2.2 CNN
2.3 LSTM
2.4 GRU
2.5 Transformer
2.6 Model Loss Functions
2.6.1 Categorical Cross Entropy
2.6.2 RMSE
Chapter 3 System Overview
3.1 System Architecture
3.1.1 Vision Subsystem
3.1.2 LiDAR Subsystem
3.1.3 Decision Subsystem
3.2 Experimental Platform
3.2.1 Hardware
3.2.2 Software Development Components
Chapter 4 System Methods and Implementation
4.1 Object Detection Method
4.1.1 Voxel Filtering
4.1.2 Noise Removal
4.1.3 Point Cloud Clustering
4.2 Object Tracking Method
4.2.1 Multi-Target Decision
4.2.2 Kalman Filtering
4.3 Image Coordinate Transformation Method
4.4 Indirect Pedestrian Path Prediction Architecture
4.4.1 Data Sources
4.4.2 Data Processing Pipeline
4.4.3 Feature Extraction and Preprocessing
4.4.4 Pose Recognition Network Architecture
4.4.5 Linear Path Generation Method
4.5 Direct Pedestrian Path Prediction Architecture
4.5.1 Data Sources
4.5.2 Data Processing Pipeline
4.5.3 Feature Extraction and Preprocessing
4.5.4 Deep Neural Network Architecture
4.5.5 Direct Path Generation Method
4.5.6 Transfer Learning
4.6 Driving Path Detection Method
4.7 Warning Method
Chapter 5 Experimental Results and Discussion
5.1 Experimental Sites
5.2 LiDAR Point Cloud Filtering and Noise Removal Results
5.3 LiDAR Point Cloud Clustering Results
5.4 Coordinate Transformation Comparison
5.4.1 Linear Fitting Comparison
5.4.2 Standalone Operation vs. Fusion into the Direct Path Prediction Model
5.4.3 Results Before and After Transfer Learning
5.5 Pose Prediction Accuracy Evaluation
5.6 Indirect Prediction Architecture Evaluation and Error
5.7 Direct Path Prediction Architecture Tests and Evaluation
5.7.1 Terminal Scene
5.7.2 Campus Scene
5.8 Evaluation of Model Variants in the Direct Path Prediction Architecture
5.8.1 Terminal Scene
5.8.2 Campus Scene
5.9 Direct Path Prediction Integration Test Results and Human-Machine Interface Design
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References
參考文獻 References
[1] " GLOBAL STATUS REPORT on ROAD SAFETY 2018." https://www.who.int/publications/i/item/9789241565684 (accessed 2022).
[2] "EUROPEAN NEW CAR ASSESSMENT PROGRAMME TEST PROTOCOL – AEB/LSS VRU SYSTEMS." https://cdn.euroncap.com/media/70313/euro-ncap-aeb-lss-vru-test-protocol-v42.pdf (accessed 2022).
[3] Z. Cao, G. Hidalgo, T. Simon, S. E. Wei, and Y. Sheikh, "OPENPOSE: REALTIME MULTI-PERSON 2D POSE ESTIMATION USING PART AFFINITY FIELDS," IEEE Trans Pattern Anal Mach Intell, vol. 43, no. 1, pp. 172-186, Jan 2021, doi: 10.1109/TPAMI.2019.2929257.
[4] A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft, "SIMPLE ONLINE and REALTIME TRACKING," 2016 IEEE international conference on image processing (ICIP), pp. 3464-3468, 2016.
[5] N. Wojke, A. Bewley, and D. Paulus, "SIMPLE ONLINE and REALTIME TRACKING with A DEEP ASSOCIATION METRIC," 2017 IEEE international conference on image processing (ICIP), pp. 3645-3649, 2017.
[6] C.-J. Liu and T.-N. Lin, "DET: DEPTH-ENHANCED TRACKER TO MITIGATE SEVERE OCCLUSION AND HOMOGENEOUS APPEARANCE PROBLEMS FOR INDOOR MULTIPLE-OBJECT TRACKING," IEEE Access, vol. 10, pp. 8287-8304, 2022.
[7] R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, and V. Koltun, "TOWARDS ROBUST MONOCULAR DEPTH ESTIMATION: MIXING DATASETS for ZERO-SHOT CROSS-DATASET TRANSFER," IEEE transactions on pattern analysis and machine intelligence, 2020.
[8] Y. Zhang, C. Wang, X. Wang, W. Zeng, and W. Liu, "FAIRMOT: On THE FAIRNESS of DETECTION and RE-IDENTIFICATION in MULTIPLE OBJECT TRACKING," International Journal of Computer Vision, vol. 129, no. 11, pp. 3069-3087, 2021.
[9] P. Chu, J. Wang, Q. You, H. Ling, and Z. Liu, "TRANSMOT: SPATIAL-TEMPORAL GRAPH TRANSFORMER for MULTIPLE OBJECT TRACKING," arXiv preprint arXiv:2104.00194, 2021.
[10] A. Vaswani et al., "ATTENTION IS ALL YOU NEED," Advances in neural information processing systems, vol. 30, 2017.
[11] P. Elias, J. Sedmidubsky, and P. Zezula, "UNDERSTANDING THE GAP BETWEEN 2D and 3D SKELETON-BASED ACTION RECOGNITION," 2019 IEEE International Symposium on Multimedia (ISM), pp. 192-1923, 2019.
[12] Z. Huang, W. Xu, and K. Yu, "BIDIRECTIONAL LSTM-CRF MODELS for SEQUENCE TAGGING," arXiv preprint arXiv:1508.01991, 2015.
[13] G. W. Taylor, G. E. Hinton, and S. Roweis, "MODELING HUMAN MOTION USING BINARY LATENT VARIABLES," Advances in neural information processing systems, vol. 19, 2006.
[14] Q. Deng, R. Tian, Y. Chen, and K. Li, "SKELETON MODEL BASED BEHAVIOR RECOGNITION for PEDESTRIANS and CYCLISTS from VEHICLE SCE NE CAMERA," 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 1293-1298, 2018.
[15] B. M. Guerra, S. Ramat, R. Gandolfi, G. Beltrami, and M. Schmid, "SKELETON DATA PRE-PROCESSING for HUMAN POSE RECOGNITION USING NEURAL NETWORK," 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 4265-4268, 2020.
[16] D. Avola, M. Cascio, L. Cinque, G. L. Foresti, C. Massaroni, and E. Rodola, "2-D SKELETON-BASED ACTION RECOGNITION via TWO-BRANCH STACKED LSTM-RNNS," IEEE Transactions on Multimedia, vol. 22, no. 10, pp. 2481-2496, 2019.
[17] S. Hochreiter and J. Schmidhuber, "LONG SHORT-TERM MEMORY," Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[18] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "DENSELY CONNECTED CONVOLUTIONAL NETWORKS," Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700-4708, 2017.
[19] C. Schuldt, I. Laptev, and B. Caputo, "RECOGNIZING HUMAN ACTIONS: A LOCAL SVM APPROACH," Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., vol. 3: IEEE, pp. 32-36, 2004.
[20] T. Kurai, Y. Shioi, Y. Makino, and H. Shinoda, "TEMPORAL CONDITIONS SUITABLE for PREDICTING HUMAN MOTION in WALKING," 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), pp. 2986-2991, 2019.
[21] M. S. Yasar and T. Iqbal, "A SCALABLE APPROACH to PREDICT MULTI-AGENT MOTION for HUMAN-ROBOT COLLABORATION," IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 1686-1693, 2021.
[22] Q. Zhang, T. Wang, H.-N. Wu, M. Li, J. Zhu, and H. Snoussi, "HUMAN ACTION PREDICTION BASED ON SKELETON DATA," 2020 39th Chinese Control Conference (CCC), pp. 6608-6612, 2020.
[23] J. Martinez, M. J. Black, and J. Romero, "On HUMAN MOTION PREDICTION USING RECURRENT NEURAL NETWORKS," Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2891-2900, 2017.
[24] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "EMPIRICAL EVALUATION of GATED RECURRENT NEURAL NETWORKS on SEQUENCE MODELING," arXiv preprint arXiv:1412.3555, 2014.
[25] K. Cho, B. Van Merriënboer, D. Bahdanau, and Y. Bengio, "On THE PROPERTIES of NEURAL MACHINE TRANSLATION: ENCODER-DECODER APPROACHES," arXiv preprint arXiv:1409.1259, 2014.
[26] I. Sutskever, O. Vinyals, and Q. V. Le, "SEQUENCE to SEQUENCE LEARNING with NEURAL NETWORKS," Advances in neural information processing systems, vol. 27, 2014.
[27] E. Wu and H. Koike, "REAL-TIME HUMAN MOTION FORECASTING USING A RGB CAMERA," Proceedings of the 24th ACM Symposium on Virtual Reality Software and Technology, pp. 1-2, 2018.
[28] X. Liu, J. Yin, J. Liu, P. Ding, J. Liu, and H. Liu, "TRAJECTORYCNN: A NEW SPATIO-TEMPORAL FEATURE LEARNING NETWORK for HUMAN MOTION PREDICTION," IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 6, pp. 2133-2146, 2020.
[29] S. Zhao, H. Li, Q. Ke, L. Liu, and R. Zhang, "ACTION-VIT: PEDESTRIAN INTENT PREDICTION in TRAFFIC SCENES," IEEE Signal Processing Letters, vol. 29, pp. 324-328, 2021.
[30] O. Ghori et al., "LEARNING to FORECAST PEDESTRIAN INTENTION from POSE DYNAMICS," 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 1277-1284, 2018.
[31] X. Zhu, W. Fu, and X. Xu, "INTENT PREDICTION of PEDESTRIANS via INTEGRATION of FACIAL EXPRESSION and HUMAN 2D SKELETON for AUTONOMOUS CAR-LIKE MOBILE ROBOTS," 2021 IEEE 16th Conference on Industrial Electronics and Applications (ICIEA), pp. 1775-1780, 2021.
[32] S. Zhang, H. Tong, J. Xu, and R. Maciejewski, "GRAPH CONVOLUTIONAL NETWORKS: A COMPREHENSIVE REVIEW," Computational Social Networks, vol. 6, no. 1, pp. 1-23, 2019.
[33] I.-H. Kao and C.-Y. Chan, "IMPACT OF POSTURE and SOCIAL FEATURES on PEDESTRIAN ROAD-CROSSING TRAJECTORY PREDICTION," IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1-16, 2022.
[34] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, "SOCIAL GAN: SOCIALLY ACCEPTABLE TRAJECTORIES with GENERATIVE ADVERSARIAL NETWORKS," Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2255-2264, 2018.
[35] C. Li, H. Yang, and J. Sun, "INTENTION-INTERACTION GRAPH BASED HIERARCHICAL REASONING NETWORKS for HUMAN TRAJECTORY PREDICTION," IEEE Transactions on Multimedia, 2022.
[36] Z. Huang, R. Li, K. Shin, and K. Driggs-Campbell, "LEARNING SPARSE INTERACTION GRAPHS of PARTIALLY DETECTED PEDESTRIANS for TRAJECTORY PREDICTION," IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 1198-1205, 2021.
[37] A. Rasouli, I. Kotseruba, T. Kunic, and J. K. Tsotsos, "PIE: A LARGE-SCALE DATASET and MODELS for PEDESTRIAN INTENTION ESTIMATION and TRAJECTORY PREDICTION," Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6262-6271, 2019.
[38] Y. Feng, T. Zhang, A. P. Sah, L. Han, and Z. Zhang, "USING APPEARANCE to PREDICT PEDESTRIAN TRAJECTORIES THROUGH DISPARITY-GUIDED ATTENTION AND CONVOLUTIONAL LSTM," IEEE Transactions on Vehicular Technology, vol. 70, no. 8, pp. 7480-7494, 2021.
[39] J. Qiu et al., "EGOCENTRIC HUMAN TRAJECTORY FORECASTING with A WEARABLE CAMERA and MULTI-MODAL FUSION," IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 8799-8806, 2022.
[40] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, "ORB-SLAM: A VERSATILE and ACCURATE MONOCULAR SLAM SYSTEM," IEEE transactions on robotics, vol. 31, no. 5, pp. 1147-1163, 2015.
[41] L. Zhang et al., "PEDESTRIAN COLLISION RISK ASSESSMENT BASED ON STATE ESTIMATION and MOTION PREDICTION," IEEE Transactions on Vehicular Technology, vol. 71, no. 1, pp. 98-111, 2021.
[42] A. Shewalkar, "PERFORMANCE EVALUATION of DEEP NEURAL NETWORKS APPLIED TO SPEECH RECOGNITION: RNN, LSTM and GRU," Journal of Artificial Intelligence and Soft Computing Research, vol. 9, no. 4, pp. 235--245, 2019.
[43] S. Yang, X. Yu, and Y. Zhou, "LSTM and GRU NEURAL NETWORK PERFORMANCE COMPARISON STUDY: TAKING YELP REVIEW DATASET AS AN EXAMPLE," in 2020 International workshop on electronic communication and artificial intelligence (IWECAI), pp. 98-101, 2020.
[44] D. Soselia, R. Wang, and E. M. Gutierrez-Farewik, "LOWER-LIMB JOINT TORQUE PREDICTION USING LSTM NEURAL NETWORKS and TRANSFER LEARNING," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 30, pp. 600-609, 2022.
[45] "桃園國際機場-機場地圖." https://www.taoyuan-airport.com/maps (accessed 2022).
電子全文 Fulltext
This electronic full text is licensed only for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission: user-defined release date
開放時間 Available:
校內 Campus: available for download from 2028-01-30
校外 Off-campus: available for download from 2028-01-30


紙本論文 Printed copies
Public access information for printed copies is relatively complete from academic year 102 (2013-14) onward. To inquire about printed copies from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 2028-01-30
