Detailed record for master's/doctoral thesis etd-0103125-121453
Title page for etd-0103125-121453
論文名稱 Title
使用圖卷積神經網絡進行臉部表情辨識的正負性—喚醒度情緒分析
Valence-Arousal Emotion Analysis Using Graph Convolutional Networks for Facial Expression Recognition
系所名稱 Department
畢業學年期 Year, semester
語文別 Language
學位類別 Degree
頁數 Number of pages
45
研究生 Author
指導教授 Advisor
召集委員 Convenor
口試委員 Advisory Committee
口試日期 Date of Exam
2025-01-14
繳交日期 Date of Submission
2025-02-03
關鍵字 Keywords
Facial Expression Recognition, Multi-Task Learning, Graph Convolutional Network, Valence, Arousal
統計 Statistics
This thesis has been viewed 53 times and downloaded 2 times.
中文摘要 Chinese Abstract
Facial expression recognition is a technique that uses computer vision and deep learning to identify emotion categories from face images captured in natural scenes. The field of psychology has recently proposed several models for describing human emotional states. However, there is no clear evidence as to which emotion representation is more appropriate, and most expression recognition systems are built on only one of the two, either the categorical model or the dimensional model of emotion. Because obtaining categorical and dimensional information requires different methods, we aim to integrate these capabilities to remedy the shortcomings of existing models and produce comprehensive predictions.
This thesis proposes a multi-task learning framework that uses a graph convolutional network to explore the relationship between the categorical and dimensional models of emotion, so that the expression category and its dimensional values can be predicted simultaneously for faces in natural scenes. Specifically, the method uses the graph convolutional network to learn shared feature representations of the expression labels and of valence-arousal, allowing these features to be progressively refined during training. In addition, to make full use of facial landmark information, we add an extra feature extraction network and redesign the loss function to balance the performance of the multiple tasks. Experimental results on the AffectNet dataset show that our method outperforms other expression recognition models in average accuracy and valence prediction, demonstrating its potential and advantages for real-world applications.
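The abstract, together with Section 3.2.2 of the table of contents ("Trainable Adjacency Matrix"), indicates a graph convolutional network that learns shared representations over the expression labels and the valence-arousal dimensions. The following is a minimal PyTorch-style sketch of one plausible form of such a layer; the node count, feature sizes, and the use of label embeddings as node inputs are assumptions for illustration, not the thesis's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphConvLayer(nn.Module):
    """Single graph convolution H' = normalize(A) @ H @ W with a trainable adjacency matrix."""
    def __init__(self, num_nodes, in_dim, out_dim):
        super().__init__()
        # Learnable adjacency over the emotion nodes; initialized near the identity (an assumption).
        self.adj = nn.Parameter(torch.eye(num_nodes) + 0.01 * torch.randn(num_nodes, num_nodes))
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, node_feats):              # node_feats: (num_nodes, in_dim)
        a_hat = F.softmax(self.adj, dim=1)      # row-normalize so each node mixes its neighbours
        return F.relu(a_hat @ self.weight(node_feats))

# Hypothetical usage: 10 nodes = 8 expression categories + valence + arousal.
# Node inputs could be word embeddings of the label names (also an assumption).
gcn = GraphConvLayer(num_nodes=10, in_dim=300, out_dim=512)
node_out = gcn(torch.randn(10, 300))            # (10, 512) node representations
image_feat = torch.randn(4, 512)                # stand-in for CNN features of 4 face images
scores = image_feat @ node_out.t()              # (4, 10): 8 category scores + valence + arousal
```

Sharing the node representations in this way lets the categorical and dimensional predictions be read off the same image features, which is one way the two emotion models could be coupled during training.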
Abstract
Facial expression recognition (FER) is a technique that uses computer vision and deep learning methods to identify emotional categories from facial images captured in natural scenes. Recently, the psychology field has introduced various models to describe human emotional states. However, there is currently no conclusive evidence to determine which emotion representation is more appropriate. Most FER systems are based solely on either the categorical model or the dimensional model of emotion. Since obtaining both categorical and dimensional information typically requires distinct methods, we aim to integrate these functionalities to address the limitations of existing models and produce comprehensive prediction results.
This thesis proposes a multi-task learning framework that employs a Graph Convolutional Network (GCN) to explore the relationships between the categorical and dimensional models of emotion, enabling simultaneous prediction of facial expression categories and their corresponding dimensions in natural scenes. Specifically, the proposed method utilizes a GCN to learn shared feature representations for both emotion labels and valence-arousal dimensions, allowing these features to be progressively optimized during training. Additionally, to fully leverage facial landmark information, we introduce an auxiliary feature extraction network and redesign the loss function to balance the performance of multiple tasks. Experimental results on the AffectNet dataset demonstrate that our approach outperforms existing FER models in terms of average accuracy and valence prediction, highlighting its potential and advantages in real-world applications.
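The abstract states that the loss function is redesigned to balance expression classification against valence-arousal prediction. Below is a sketch of one common way to combine such objectives: a cross-entropy term for the expression category plus a concordance-correlation (CCC) term for valence and arousal. The weights and the CCC formulation are assumptions, and the thesis may weight or formulate the tasks differently.

```python
import torch
import torch.nn.functional as F

def ccc_loss(pred, target, eps=1e-8):
    """1 - concordance correlation coefficient, a common loss for valence/arousal regression."""
    pred_mean, target_mean = pred.mean(), target.mean()
    covar = ((pred - pred_mean) * (target - target_mean)).mean()
    ccc = 2 * covar / (pred.var(unbiased=False) + target.var(unbiased=False)
                       + (pred_mean - target_mean) ** 2 + eps)
    return 1 - ccc

def multitask_loss(expr_logits, expr_labels, va_pred, va_target, w_expr=1.0, w_va=1.0):
    """Weighted sum of a classification loss and valence/arousal regression losses.
    The weights are placeholders, not the balancing scheme actually used in the thesis."""
    loss_expr = F.cross_entropy(expr_logits, expr_labels)
    loss_val = ccc_loss(va_pred[:, 0], va_target[:, 0])    # valence
    loss_aro = ccc_loss(va_pred[:, 1], va_target[:, 1])    # arousal
    return w_expr * loss_expr + w_va * (loss_val + loss_aro)

# Example with random tensors: batch of 4 faces, 8 expression categories, VA values in [-1, 1].
logits, labels = torch.randn(4, 8), torch.randint(0, 8, (4,))
va_pred, va_target = torch.tanh(torch.randn(4, 2)), torch.empty(4, 2).uniform_(-1, 1)
print(multitask_loss(logits, labels, va_pred, va_target))
```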
目次 Table of Contents
Thesis Approval Form i
Acknowledgements ii
Chinese Abstract iii
English Abstract iv
Table of Contents v
List of Figures vii
List of Tables viii
Chapter 1 Introduction 1
1.1 Research Background and Objectives 1
1.2 Literature Review 2
1.3 Comparison of Methods and Contributions 3
Chapter 2 Background Knowledge 5
2.1 Theories of Facial Expression Recognition 5
2.2 Challenges in Facial Expression Recognition 6
2.3 Feature Extraction 7
2.3.1 DenseNet 8
2.3.2 Stacked Hourglass Network 9
Chapter 3 Methodology 11
3.1 Method Overview 11
3.2 Network Architecture 12
3.2.1 Graph Convolutional Networks 13
3.2.2 Trainable Adjacency Matrix 14
3.2.3 Face Alignment Network 15
3.3 Loss Function 19
Chapter 4 Experimental Results 21
4.1 Datasets 21
4.1.1 AffectNet 21
4.2 Implementation Details 22
4.3 Evaluation Metrics 23
4.4 Comparison of Expression Recognition Results 24
4.4.1 Adjacency Matrix 25
4.4.2 Confusion Matrix 26
4.5 Ablation Study 28
4.6 Examples of Expression Recognition 28
Chapter 5 Conclusion and Future Work 31
References 32
電子全文 Fulltext
This electronic full text is licensed only for searching, reading, and printing by individual users for personal, non-profit academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission: unrestricted (fully open on and off campus)
開放時間 Available:
校內 Campus: available
校外 Off-campus: available


紙本論文 Printed copies
Availability information for printed copies is relatively complete from ROC academic year 102 (2013-2014) onward. To check the availability of printed copies from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: available
