Title page for etd-0726121-144946
Title: The influence of posture on multi-modal emotion recognition (姿勢對於多模態情緒辨識影響)
Department:
Year, semester:
Language:
Degree:
Number of pages: 101
Author:
Advisor:
Convenor:
Advisory Committee:
Date of Exam: 2021-08-18
Date of Submission: 2021-08-26
Keywords: multi-modal fusion, emotion recognition, posture, deep learning, multi-label classification
Statistics: This thesis has been viewed 343 times and downloaded 3 times.
Abstract
In past work on multi-modal emotion recognition, the emphasis has been on text, speech, and facial expressions, while the posture modality has rarely been considered. We believe that humans judge another person's emotions through facial expressions, voice, words, and body movements, and that in human-computer interaction the posture modality is therefore also an important source of features for emotion recognition. The purpose of this thesis is to investigate the influence of the posture modality on multi-modal emotion recognition. In daily life, the other modalities are often subject to interference, which lowers recognition accuracy; increasing the number of modalities can reduce the impact of such interference and improve overall recognition ability. We therefore designed modality-combination experiments in which the posture modality is fused with the other modalities, different combinations are compared, and the contribution of posture is cross-validated. The same experiments were conducted on four datasets and three classification tasks to verify our conclusions. We then optimized the overall multi-modal emotion recognition model by adjusting the fusion ratio of the posture modality, improving its accuracy. Finally, a text-interference experiment simulated the situation in which the multi-modal emotion recognition model is disturbed, and the posture modality was added to maintain its recognition ability. The results show that the posture modality alone recognizes emotions poorly, but fusing it with the other modalities effectively improves recognition. Moreover, fusing the posture modality at a low ratio effectively improves overall recognition accuracy and increases resistance to text interference.
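To make the fusion-ratio idea concrete, the sketch below shows one way a low-weight posture branch could be blended with the other modalities at the decision (logit) level. It is a minimal PyTorch illustration under stated assumptions: the class name WeightedLateFusion, the posture_ratio value of 0.2, the equal averaging of the non-posture branches, and logit-level fusion are all illustrative choices, not the fusion architecture actually used in the thesis.

```python
import torch
import torch.nn as nn


class WeightedLateFusion(nn.Module):
    """Blend per-modality emotion logits, giving the posture branch a tunable weight.

    Illustrative sketch only; the thesis's actual fusion model may differ.
    """

    def __init__(self, posture_ratio: float = 0.2):
        super().__init__()
        # posture_ratio is the fraction of the fused decision contributed by posture;
        # the remaining (1 - posture_ratio) is shared equally by the other modalities.
        self.posture_ratio = posture_ratio

    def forward(self, text_logits, audio_logits, face_logits, posture_logits):
        # Average the non-posture branches, then mix in the posture branch at a low ratio.
        other = torch.stack([text_logits, audio_logits, face_logits], dim=0).mean(dim=0)
        return (1.0 - self.posture_ratio) * other + self.posture_ratio * posture_logits


# Example: a batch of 4 utterances with 7 emotion classes; random logits stand in
# for the outputs of hypothetical unimodal text, audio, face, and posture models.
fusion = WeightedLateFusion(posture_ratio=0.2)
t, a, f, p = (torch.randn(4, 7) for _ in range(4))
fused_logits = fusion(t, a, f, p)
probs = torch.sigmoid(fused_logits)  # multi-label setting: independent per-class probabilities
```

Keeping the posture weight low mirrors the experimental finding that posture alone is a weak recognizer but still adds useful, interference-resistant signal when mixed in at a small ratio.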
Table of Contents
Thesis Certification i
Thesis Public Release Authorization ii
Abstract (Chinese) iii
Abstract iv
Table of Contents v
List of Figures vii
List of Tables ix
Chapter 1  Introduction 1
1.1 Research Background 1
1.2 Research Motivation 2
1.3 Research Objectives 3
1.4 Research Methods and Procedure 4
Chapter 2  Literature Review 6
2.1 Emotion Recognition 6
2.2 Unimodal Emotion Recognition 6
2.3 Multi-modal Emotion Recognition 7
2.4 Multi-modal Datasets 9
2.5 Body Movements 17
2.5.1 The Influence of Body Movements on Emotion Recognition 17
Chapter 3  Research Methods 19
3.1 Dataset Preprocessing 19
3.1.1 Posture Modality Processing 20
3.1.2 Preprocessing of Other Features 23
3.2 Unimodal Emotion Recognition Models 24
3.3 Multi-modal Emotion Recognition Model 29
3.4 Modality Combinations 30
3.5 Modality Fusion Ratio 32
Chapter 4  Results 35
4.1 Evaluation Criteria 35
4.1.1 Evaluation Metrics for Sentiment Polarity and Multi-class Emotion 35
4.1.2 Evaluation of Multi-label Emotion Prediction 36
4.2 Analysis and Comparison of Modality Combination Results 38
4.2.1 Unimodal Sentiment Polarity Recognition 38
4.2.2 Multi-modal Sentiment Polarity Recognition 39
4.2.3 Unimodal Multi-class Emotion Recognition 41
4.2.4 Multi-modal Multi-class Emotion Recognition 43
4.2.5 Unimodal Multi-label Emotion Recognition 44
4.2.6 Multi-modal Multi-label Emotion Recognition 48
4.3 Analysis of Modality Fusion Ratio Results 51
4.3.1 Fusion Ratio Adjustment for Sentiment Polarity Recognition 51
4.3.2 Fusion Ratio Adjustment for Multi-class Emotion Recognition 53
4.3.3 Fusion Ratio Adjustment for Multi-label Emotion Recognition 54
4.4 Text Interference 58
4.4.1 Text Interference in Sentiment Polarity Recognition 59
4.4.2 Text Interference in Multi-class Emotion Recognition 68
4.4.3 Text Interference in Multi-label Emotion Recognition 74
4.5 Valence-Arousal Analysis of Emotions 79
4.6 Experimental Conclusions 81
Chapter 5  Conclusion 85
5.1 Conclusion 85
5.2 Dataset Limitations 85
5.3 Suggestions 86
References 88

References
1. Noroozi, F., et al., Survey on emotional body gesture recognition. IEEE transactions on affective computing, 2018.
2. Glowinski, D., et al. Towards a minimal representation of affective gestures. in 2015 International Conference on Affective Computing and Intelligent Interaction (ACII). 2015. IEEE.
3. Soleymani, M., et al., A multimodal database for affect recognition and implicit tagging. IEEE transactions on affective computing, 2011. 3(1): p. 42-55.
4. Sreeshakthy, M. and J. Preethi, Classification of human emotion from deap eeg signal using hybrid improved neural networks with cuckoo search. BRAIN. Broad Research in Artificial Intelligence and Neuroscience, 2016. 6(3-4): p. 60-73.
5. Ekman, P. and W.V. Friesen, Constants across cultures in the face and emotion. Journal of personality and social psychology, 1971. 17(2): p. 124.
6. Ekman, R., What the face reveals: Basic and applied studies of spontaneous expression using the Facial Action Coding System (FACS). 1997: Oxford University Press, USA.
7. Coşkun, M., et al. Face recognition based on convolutional neural network. in 2017 International Conference on Modern Electrical and Energy Systems (MEES). 2017. IEEE.
8. Trigeorgis, G., et al. Adieu features? end-to-end speech emotion recognition using a deep convolutional recurrent network. in 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP). 2016. IEEE.
9. Zhao, J., X. Mao, and L. Chen, Speech emotion recognition using deep 1D & 2D CNN LSTM networks. Biomedical Signal Processing and Control, 2019. 47: p. 312-323.
10. Fayek, H.M., M. Lech, and L. Cavedon, Evaluating deep learning architectures for Speech Emotion Recognition. Neural Networks, 2017. 92: p. 60-68.
11. Mirsamadi, S., E. Barsoum, and C. Zhang. Automatic speech emotion recognition using recurrent neural networks with local attention. in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017. IEEE.
12. Rahman, W., et al., M-bert: Injecting multimodal information in the bert structure. arXiv preprint arXiv:1908.05787, 2019.
13. Devlin, J., et al., Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
14. Busso, C., et al. Analysis of emotion recognition using facial expressions, speech and multimodal information. in Proceedings of the 6th international conference on Multimodal interfaces. 2004.
15. Tripathi, S., S. Tripathi, and H. Beigi, Multi-modal emotion recognition on iemocap dataset using deep learning. arXiv preprint arXiv:1804.05788, 2018.
16. Rahman, W., et al., M-bert: Injecting multimodal information in the bert structure, in arXiv preprint arXiv:1908.05787. 2019.
17. Majumder, N., et al., Multimodal sentiment analysis using hierarchical fusion with context modeling. Knowledge-based systems, 2018. 161: p. 124-133.
18. Zadeh, A., et al., Tensor fusion network for multimodal sentiment analysis. arXiv preprint arXiv:1707.07250, 2017.
19. Zadeh, A., et al., Mosi: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos. arXiv preprint arXiv:1606.06259, 2016.
20. Zadeh, A.B., et al. Multimodal language analysis in the wild: Cmu-mosei dataset and interpretable dynamic fusion graph. in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018.
21. Nojavanasghari, B., et al. Emoreact: a multimodal approach and dataset for recognizing emotional responses in children. in Proceedings of the 18th acm international conference on multimodal interaction. 2016.
22. Busso, C., et al., IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation, 2008. 42(4): p. 335-359.
23. Ekman, P., W.V. Freisen, and S. Ancoli, Facial signs of emotional experience. Journal of personality and social psychology, 1980. 39(6): p. 1125.
24. Mehrabian, A. and M. Wiener, Decoding of inconsistent communications. Journal of personality and social psychology, 1967. 6(1): p. 109.
25. Mehrabian, A. and S.R. Ferris, Inference of attitudes from nonverbal communication in two channels. Journal of consulting psychology, 1967. 31(3): p. 248.
26. Mehrabian, A., Silent messages. Vol. 8. 1971: Wadsworth Belmont, CA.
27. Carney, D.R., A.J. Cuddy, and A.J. Yap, Power posing: Brief nonverbal displays affect neuroendocrine levels and risk tolerance. Psychological science, 2010. 21(10): p. 1363-1368.
28. Gunes, H. and M. Piccardi. Affect recognition from face and body: early fusion vs. late fusion. in 2005 IEEE international conference on systems, man and cybernetics. 2005. IEEE.
29. Glowinski, D., et al. Technique for automatic emotion recognition by body gesture analysis. in 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. 2008. IEEE.
30. Cao, Z., et al., OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields. IEEE transactions on pattern analysis and machine intelligence, 2019. 43(1): p. 172-186.
31. Chernykh, V. and P. Prikhodko, Emotion recognition from speech with recurrent neural networks. arXiv preprint arXiv:1701.08071, 2017.
32. Pennington, J., R. Socher, and C.D. Manning. Glove: Global vectors for word representation. in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.
33. Baltrušaitis, T., P. Robinson, and L.-P. Morency. Openface: an open source facial behavior analysis toolkit. in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV). 2016. IEEE.
34. Tsai, Y.-H.H., et al., Multimodal transformer for unaligned multimodal language sequences, in Proceedings of the conference. Association for Computational Linguistics. Meeting. 2019, NIH Public Access. p. 6558.
35. Russell, J.A., A circumplex model of affect. Journal of personality and social psychology, 1980. 39(6): p. 1161.
Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available


Printed copies
Public access information for printed copies is relatively complete for academic year 102 (2013-2014) and later. To inquire about printed copies from academic year 101 and earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
