國立中山大學 National Sun Yat-sen University, 學位論文 Thesis/Dissertation

論文名稱 Title	基於單張影像重建三維人體動作與相機姿態估測 Camera Pose Estimation and 3D Human Pose Reconstruction from a Single Human Image
系所名稱 Department	電機工程學系 Department of Electrical Engineering
畢業學年期 Year, semester	110 學年度第 1 學期 The fall semester of Academic Year 110	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	63
研究生 Author	張訓豪 Xun-Hao Zhang
指導教授 Advisor	高崇堯, 魏家博 Kao, Chung-yao; Wei, Chia-Po
召集委員 Convenor	劉耿豪 Liu, Keng-Hao
口試委員 Advisory Committee	楊惠芳 Yang, Huei-Fang
口試日期 Date of Exam	2022-01-12	繳交日期 Date of Submission	2022-02-08
關鍵字 Keywords	相機姿態估測、三維人體姿態估測、距離估測、字典訓練、相機校正 Camera Pose Estimation, 3D Human Pose Reconstruction, Distance Estimation, Dictionary Learning, Camera Calibration
統計 Statistics	本論文已被瀏覽 184 次，被下載 7 次 The thesis/dissertation has been browsed 184 times, has been downloaded 7 times.

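中文摘要 Chinese Abstract (English translation)
This thesis addresses the 3D human pose estimation problem and proposes a method that simultaneously estimates the camera pose and reconstructs the human pose from a single 2D human image. Deep neural networks can already predict the 2D coordinates of human joints in an image accurately, and many well-trained open-source models are available, so building and training such a model is not the focus of this thesis. Instead, we build on the predictions of these models to recover further 3D information. We take the 2D information predicted by the model as the input to our algorithm and use it to estimate the camera pose and reconstruct the 3D human pose at the same time. A set of poses is selected in advance from Human3.6M to serve as a basis, and the 3D pose is reconstructed by finding a linear combination of these basis poses that is sufficiently close to the human pose in the image. Estimating the camera pose and the linear combination coefficients simultaneously can be viewed as a nonlinear optimization problem, which we solve by alternating updates: one set of parameters is held fixed while the other is updated, yielding the camera pose and the combination coefficients in turn. With one set of parameters fixed, the problem solved in each iteration is a convex optimization problem, and the algorithm stops once the reconstructed 3D pose is sufficiently close to the pose in the input 2D image, so the problem can be solved efficiently. The numerical experiments show that our method is indeed less accurate than deep neural networks built with current state-of-the-art techniques, but the error between our predictions and the ground truth remains within a range that is acceptable in practice, and our method has other advantages. A complex deep neural network requires a much larger amount of training data, and its prediction accuracy drops markedly when the shooting environment or camera angle differs substantially from the training data. Our method instead builds on mature 2D human pose estimation and avoids the difficulty of obtaining training data. Although its accuracy falls short of deep neural networks, it is better suited to practical applications.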
Abstract
In this thesis, we propose a method for simultaneously estimating the camera pose and the three-dimensional (3D) human pose from a single two-dimensional (2D) human image. First, the 2D pose is accurately estimated from the image using a deep neural network (DNN) model. Such models are now widely available; training one is not a focus of this thesis, and we simply use models taken from the literature. Given the 2D pose, the simultaneous estimation of camera pose and 3D human pose is formulated as a nonlinear optimization problem over the extrinsic camera parameters and the parameters that construct the 3D pose, which is formed as a linear combination of pre-selected 3D pose vectors taken from the Human3.6M data set. The problem is solved in an alternating fashion: in each iteration, we fix the parameters of either the camera pose or the 3D human pose and update the other. The resulting subproblem in each iteration is convex and can be solved very efficiently. The algorithm stops when the projection of the constructed 3D human pose closely matches the given 2D pose estimate. Numerical experiments show that the proposed method is less accurate than state-of-the-art DNN models, but the errors are acceptable for practical applications. Our method nevertheless has a distinct advantage: DNN models require substantial training for each camera environment and are therefore less suitable for real-time applications in which the camera pose changes constantly. Our method, by contrast, is aimed at such applications.
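The alternating scheme described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes an orthographic camera with rotation only (the thesis works with a pinhole model and extrinsic parameters), centred joints, and a camera update based on the orthogonal Procrustes problem with depths taken from the current estimate. The function names `fit_rotation`, `fit_coefficients`, and `reconstruct`, and all array shapes, are hypothetical choices made for this example.

```python
import numpy as np

def fit_rotation(W3, S):
    """Camera step (assumed form): rotation R minimising ||W3 - R S||_F
    via the orthogonal Procrustes solution. W3 and S are 3 x J."""
    U, _, Vt = np.linalg.svd(W3 @ S.T)
    d = np.sign(np.linalg.det(U @ Vt))       # keep det(R) = +1
    return U @ np.diag([1.0, 1.0, d]) @ Vt

def fit_coefficients(W, R2, basis):
    """Pose step: with the camera fixed, the combination weights c solve
    a linear least-squares problem, which is convex."""
    A = np.stack([(R2 @ B).ravel() for B in basis], axis=1)  # (2J, K)
    c, *_ = np.linalg.lstsq(A, W.ravel(), rcond=None)
    return c

def reconstruct(W, basis, iters=100, tol=1e-8):
    """Alternate camera and pose updates until the projected 3D pose
    matches the 2D joints W (2 x J). basis has shape (K, 3, J)."""
    W = W - W.mean(axis=1, keepdims=True)     # centre the 2D joints
    c = np.full(len(basis), 1.0 / len(basis))
    R = np.eye(3)
    errs = []
    for _ in range(iters):
        S = np.tensordot(c, basis, axes=1)    # current 3D pose, 3 x J
        z = (R @ S)[2:3]                      # depths from current estimate
        R = fit_rotation(np.vstack([W, z]), S)
        c = fit_coefficients(W, R[:2], basis)
        S = np.tensordot(c, basis, axes=1)
        errs.append(np.linalg.norm(W - R[:2] @ S))  # reprojection error
        if errs[-1] < tol:
            break
    return R, c, errs
```

Each update minimises its block exactly while the other is held fixed, so the reprojection error recorded in `errs` never increases from one iteration to the next. As with any alternating scheme on a nonconvex problem, the result can depend on the starting point.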

目次 Table of Contents
Front matter: Thesis Approval Form, Acknowledgements, Chinese Abstract, English Abstract, Table of Contents, List of Figures, List of Tables
Chapter 1 Introduction
  1.1 Introduction and Literature Review
  1.2 Motivation, Objectives, and Contributions
  1.3 Thesis Organization
Chapter 2 Preliminaries
  2.1 Pinhole Camera Model
  2.2 Homography
  2.3 Camera Calibration
  2.4 Camera Pose Estimation
  2.5 Basis Training
  2.6 Summary
Chapter 3 Camera Pose Estimation and 3D Human Pose Reconstruction
  3.1 Problem Statement
  3.2 Camera Pose Estimation and 3D Human Reconstruction
    3.2.1 Step 1: Camera Pose Estimation Using Core Joints
    3.2.2 Step 2: 3D Human Reconstruction
    3.2.3 Step 3: Camera Pose Estimation
  3.3 Preprocessing
    3.3.1 Centralizing
    3.3.2 Orthogonal Procrustes Problem
Chapter 4 Experiments and Results
  4.1 Data Description and Evaluation Methods
    4.1.1 Data Description
    4.1.2 Evaluation Methods
    4.1.3 Dictionary Training
  4.2 Results
    4.2.1 Comparison with Results from Other Papers
    4.2.2 Algorithm Output
Chapter 5 Conclusion and Future Work
References

電子全文 Fulltext
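This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission: user-defined availability date
開放時間 Available: 校內 Campus: available; 校外 Off-campus: available
etd-0108122-152921.pdf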
紙本論文 Printed copies
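Availability information for printed copies is relatively complete from academic year 102 onward. For information on printed copies from academic year 101 or earlier, please contact the printed thesis service desk at the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: available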


Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2453 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS