Title page for etd-0726121-144946
Title: The influence of posture on multi-modal emotion recognition (姿勢對於多模態情緒辨識影響)
Department:
Year, semester:
Language:
Degree:
Number of pages: 101
Author:
Advisor:
Convenor:
Advisory Committee:
Date of Exam: 2021-08-18
Date of Submission: 2021-08-26
Keywords: multi-modal fusion, emotion recognition, posture, deep learning, multi-label classification
Statistics: This thesis has been viewed 343 times and downloaded 3 times.
Abstract
In past work on multi-modal emotion recognition, the emphasis has been on text, speech, and facial expressions, while the posture modality has rarely been considered. We believe that humans judge another person's emotions through facial expressions, voice, words, and body movements, and that in human-computer interaction the posture modality is therefore also an important source of features for emotion recognition. The purpose of this thesis is to investigate the influence of the posture modality on multi-modal emotion recognition. In daily life, the other modalities are often subject to interference, which lowers recognition accuracy; increasing the number of modalities can reduce the impact of such interference and improve overall recognition ability. We therefore designed modality-combination experiments in which the posture modality is fused with the other modalities, different combinations are compared, and the contribution of posture is cross-validated. The same experiments were conducted on four datasets and three classification tasks to verify our conclusions. We then optimized the overall multi-modal emotion recognition model by adjusting the fusion ratio of the posture modality, improving its accuracy. Finally, a text-interference experiment simulated the situation in which the multi-modal emotion recognition model is disturbed, and the posture modality was added to maintain its recognition ability. The results show that the posture modality alone recognizes emotions poorly, but fusing it with the other modalities effectively improves recognition. Moreover, fusing the posture modality at a low ratio effectively improves overall recognition accuracy and increases resistance to text interference.
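To make the fusion-ratio idea concrete, the sketch below shows one way a low-weight posture branch could be blended with the other modalities at the decision (logit) level. It is a minimal PyTorch illustration under stated assumptions: the class name WeightedLateFusion, the posture_ratio value of 0.2, the equal averaging of the non-posture branches, and logit-level fusion are all illustrative choices, not the fusion architecture actually used in the thesis.

```python
import torch
import torch.nn as nn


class WeightedLateFusion(nn.Module):
    """Blend per-modality emotion logits, giving the posture branch a tunable weight.

    Illustrative sketch only; the thesis's actual fusion model may differ.
    """

    def __init__(self, posture_ratio: float = 0.2):
        super().__init__()
        # posture_ratio is the fraction of the fused decision contributed by posture;
        # the remaining (1 - posture_ratio) is shared equally by the other modalities.
        self.posture_ratio = posture_ratio

    def forward(self, text_logits, audio_logits, face_logits, posture_logits):
        # Average the non-posture branches, then mix in the posture branch at a low ratio.
        other = torch.stack([text_logits, audio_logits, face_logits], dim=0).mean(dim=0)
        return (1.0 - self.posture_ratio) * other + self.posture_ratio * posture_logits


# Example: a batch of 4 utterances with 7 emotion classes; random logits stand in
# for the outputs of hypothetical unimodal text, audio, face, and posture models.
fusion = WeightedLateFusion(posture_ratio=0.2)
t, a, f, p = (torch.randn(4, 7) for _ in range(4))
fused_logits = fusion(t, a, f, p)
probs = torch.sigmoid(fused_logits)  # multi-label setting: independent per-class probabilities
```

Keeping the posture weight low mirrors the experimental finding that posture alone is a weak recognizer but still adds useful, interference-resistant signal when mixed in at a small ratio.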
Table of Contents
Thesis Certification i
Thesis Public Release Authorization ii
Abstract (Chinese) iii
Abstract iv
Table of Contents v
List of Figures vii
List of Tables ix
Chapter 1  Introduction 1
1.1 Research Background 1
1.2 Research Motivation 2
1.3 Research Objectives 3
1.4 Research Methods and Procedure 4
Chapter 2  Literature Review 6
2.1 Emotion Recognition 6
2.2 Unimodal Emotion Recognition 6
2.3 Multi-modal Emotion Recognition 7
2.4 Multi-modal Datasets 9
2.5 Body Movements 17
2.5.1 The Influence of Body Movements on Emotion Recognition 17
Chapter 3  Research Methods 19
3.1 Dataset Preprocessing 19
3.1.1 Posture Modality Processing 20
3.1.2 Preprocessing of Other Features 23
3.2 Unimodal Emotion Recognition Models 24
3.3 Multi-modal Emotion Recognition Model 29
3.4 Modality Combinations 30
3.5 Modality Fusion Ratio 32
Chapter 4  Results 35
4.1 Evaluation Criteria 35
4.1.1 Evaluation Metrics for Sentiment Polarity and Multi-class Emotion 35
4.1.2 Evaluation of Multi-label Emotion Prediction 36
4.2 Analysis and Comparison of Modality Combination Results 38
4.2.1 Unimodal Sentiment Polarity Recognition 38
4.2.2 Multi-modal Sentiment Polarity Recognition 39
4.2.3 Unimodal Multi-class Emotion Recognition 41
4.2.4 Multi-modal Multi-class Emotion Recognition 43
4.2.5 Unimodal Multi-label Emotion Recognition 44
4.2.6 Multi-modal Multi-label Emotion Recognition 48
4.3 Analysis of Modality Fusion Ratio Results 51
4.3.1 Fusion Ratio Adjustment for Sentiment Polarity Recognition 51
4.3.2 Fusion Ratio Adjustment for Multi-class Emotion Recognition 53
4.3.3 Fusion Ratio Adjustment for Multi-label Emotion Recognition 54
4.4 Text Interference 58
4.4.1 Text Interference in Sentiment Polarity Recognition 59
4.4.2 Text Interference in Multi-class Emotion Recognition 68
4.4.3 Text Interference in Multi-label Emotion Recognition 74
4.5 Valence-Arousal Analysis of Emotions 79
4.6 Experimental Conclusions 81
Chapter 5  Conclusion 85
5.1 Conclusion 85
5.2 Dataset Limitations 85
5.3 Suggestions 86
References 88

References
1. Noroozi, F., et al., Survey on emotional body gesture recognition. IEEE transactions on affective computing, 2018.
2. Glowinski, D., et al. Towards a minimal representation of affective gestures. in 2015 International Conference on Affective Computing and Intelligent Interaction (ACII). 2015. IEEE.
3. Soleymani, M., et al., A multimodal database for affect recognition and implicit tagging. IEEE transactions on affective computing, 2011. 3(1): p. 42-55.
4. Sreeshakthy, M. and J. Preethi, Classification of human emotion from deap eeg signal using hybrid improved neural networks with cuckoo search. BRAIN. Broad Research in Artificial Intelligence and Neuroscience, 2016. 6(3-4): p. 60-73.
5. Ekman, P. and W.V. Friesen, Constants across cultures in the face and emotion. Journal of personality and social psychology, 1971. 17(2): p. 124.
6. Ekman, R., What the face reveals: Basic and applied studies of spontaneous expression using the Facial Action Coding System (FACS). 1997: Oxford University Press, USA.
7. Coşkun, M., et al. Face recognition based on convolutional neural network. in 2017 International Conference on Modern Electrical and Energy Systems (MEES). 2017. IEEE.
8. Trigeorgis, G., et al. Adieu features? end-to-end speech emotion recognition using a deep convolutional recurrent network. in 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP). 2016. IEEE.
9. Zhao, J., X. Mao, and L. Chen, Speech emotion recognition using deep 1D & 2D CNN LSTM networks. Biomedical Signal Processing and Control, 2019. 47: p. 312-323.
10. Fayek, H.M., M. Lech, and L. Cavedon, Evaluating deep learning architectures for Speech Emotion Recognition. Neural Networks, 2017. 92: p. 60-68.
11. Mirsamadi, S., E. Barsoum, and C. Zhang. Automatic speech emotion recognition using recurrent neural networks with local attention. in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017. IEEE.
12. Rahman, W., et al., M-bert: Injecting multimodal information in the bert structure. arXiv preprint arXiv:1908.05787, 2019.
13. Devlin, J., et al., Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
14. Busso, C., et al. Analysis of emotion recognition using facial expressions, speech and multimodal information. in Proceedings of the 6th international conference on Multimodal interfaces. 2004.
15. Tripathi, S., S. Tripathi, and H. Beigi, Multi-modal emotion recognition on iemocap dataset using deep learning. arXiv preprint arXiv:1804.05788, 2018.
16. Rahman, W., et al., M-bert: Injecting multimodal information in the bert structure, in arXiv preprint arXiv:1908.05787. 2019.
17. Majumder, N., et al., Multimodal sentiment analysis using hierarchical fusion with context modeling. Knowledge-based systems, 2018. 161: p. 124-133.
18. Zadeh, A., et al., Tensor fusion network for multimodal sentiment analysis. arXiv preprint arXiv:1707.07250, 2017.
19. Zadeh, A., et al., Mosi: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos. arXiv preprint arXiv:1606.06259, 2016.
20. Zadeh, A.B., et al. Multimodal language analysis in the wild: Cmu-mosei dataset and interpretable dynamic fusion graph. in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018.
21. Nojavanasghari, B., et al. Emoreact: a multimodal approach and dataset for recognizing emotional responses in children. in Proceedings of the 18th acm international conference on multimodal interaction. 2016.
22. Busso, C., et al., IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation, 2008. 42(4): p. 335-359.
23. Ekman, P., W.V. Freisen, and S. Ancoli, Facial signs of emotional experience. Journal of personality and social psychology, 1980. 39(6): p. 1125.
24. Mehrabian, A. and M. Wiener, Decoding of inconsistent communications. Journal of personality and social psychology, 1967. 6(1): p. 109.
25. Mehrabian, A. and S.R. Ferris, Inference of attitudes from nonverbal communication in two channels. Journal of consulting psychology, 1967. 31(3): p. 248.
26. Mehrabian, A., Silent messages. Vol. 8. 1971: Wadsworth Belmont, CA.
27. Carney, D.R., A.J. Cuddy, and A.J. Yap, Power posing: Brief nonverbal displays affect neuroendocrine levels and risk tolerance. Psychological science, 2010. 21(10): p. 1363-1368.
28. Gunes, H. and M. Piccardi. Affect recognition from face and body: early fusion vs. late fusion. in 2005 IEEE international conference on systems, man and cybernetics. 2005. IEEE.
29. Glowinski, D., et al. Technique for automatic emotion recognition by body gesture analysis. in 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. 2008. IEEE.
30. Cao, Z., et al., OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields. IEEE transactions on pattern analysis and machine intelligence, 2019. 43(1): p. 172-186.
31. Chernykh, V. and P. Prikhodko, Emotion recognition from speech with recurrent neural networks. arXiv preprint arXiv:1701.08071, 2017.
32. Pennington, J., R. Socher, and C.D. Manning. Glove: Global vectors for word representation. in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.
33. Baltrušaitis, T., P. Robinson, and L.-P. Morency. Openface: an open source facial behavior analysis toolkit. in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV). 2016. IEEE.
34. Tsai, Y.-H.H., et al., Multimodal transformer for unaligned multimodal language sequences, in Proceedings of the conference. Association for Computational Linguistics. Meeting. 2019, NIH Public Access. p. 6558.
35. Russell, J.A., A circumplex model of affect. Journal of personality and social psychology, 1980. 39(6): p. 1161.
Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available


Printed copies
Public access information for printed copies is relatively complete for academic year 102 (2013-2014) and later. To inquire about printed copies from academic year 101 and earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
