- Untitled Document
Thesis Information

Chinese title:

 Research on Emotion Recognition Based on Facial Visual Representations

Name:

 李连栋

Confidentiality level:

 Public

Thesis language:

 Chinese

Discipline code:

 081203    

Discipline:

 Computer Application Technology

Student type:

 Doctoral

Degree:

 Doctor of Engineering

Degree type:

 Academic degree

Degree year:

 2018    

Campus:

 Trained at the Beijing campus

School:

 School of Information Science and Technology

Primary advisor:

 孙波

Primary advisor's affiliation:

 School of Information Science and Technology, Beijing Normal University

Submission date:

 2018-06-08    

Defense date:

 2018-05-30    

English title:

 Emotion Recognition Based on Visual Facial Features    

Chinese keywords:

 Emotion recognition; facial expression recognition; facial action units; multimodal machine learning

Chinese abstract:
Affective computing is an interdisciplinary research field spanning computer science, psychology, brain science, and sociology. Its study covers emotional speech recognition, facial expression recognition, body-posture recognition, emotion recognition and synthesis, physiological monitoring, and applications of affective computing, and it is a vibrant, emerging research area. Because the neural mechanisms of emotion remain a black box, existing theories and methods in emotion psychology are diverse, and data-processing algorithms have limitations, the field still lacks a unified theoretical framework and methodological system, leaving many challenging research problems.

Emotion recognition is an important direction within affective computing; its goal is to give computer systems the ability to detect and recognize human emotions. Human emotional expression involves the joint action of multimodal signals such as gaze, language, speech, facial expression, and body movement and posture, so automatic emotion recognition is inherently a multimodal recognition problem. The visual channel carries rich information about human behavior and is the main vehicle of emotion. On this basis, this thesis centers on multimodal machine learning methods, takes facial visual representations and emotion recognition models as its main research objects, and explores several problems in emotion recognition. The research work and main contributions of the thesis are as follows.

(1) The thesis carefully reviews and comprehensively summarizes the definitions of emotion in psychology and sociology, categorical and dimensional models of emotion, and related concepts such as static emotion, dynamic emotion, and group emotion. Because the face plays a key role in emotional expression, the thesis studies the Facial Action Coding System (FACS), which is grounded in facial muscle movements, and examines in depth how FACS describes rich and subtle changes in facial expression by defining facial action units (AUs).

(2) The thesis studies the effect of multimodal emotional features on emotion recognition and proposes two feature-fusion methods. It explores a framework and pipeline for expression recognition in natural environments and tests several data-preprocessing methods to improve later recognition accuracy. To represent static facial expressions, the thesis uses geometric and appearance (texture) features and learns facial representations through deep learning. Static expression features are aggregated over time into video-clip features, and a multimodal fusion network based on learned fusion weights is proposed. Experiments on several public datasets demonstrate the effectiveness of the proposed methods.

(3) For facial action units (AUs), the thesis works at two levels: feature representation and detection models. To improve the performance of deep convolutional networks in expression recognition, it draws on the effectiveness of edge-detection features in traditional AU recognition methods, proposes convolution kernels constrained by edge detection, and designs a deep learning model based on edge convolutions. In addition, because directly recognizing AU intensity is very challenging, the thesis proposes a detection model based on local and global comparison that recognizes intensity indirectly through comparisons between samples: the model first samples image pairs and derives stronger/weaker relations between the AU intensities of a series of images, then sorts the samples by intensity based on these relations and predicts the true AU intensity with a simple linear mapping. Experiments on AU datasets demonstrate the effectiveness of the model.

(4) Finally, based on deep learning models, the thesis studies several key problems in emotion recognition: continuous emotion recognition, genuine/posed expression recognition, and group emotion recognition. Continuous emotion recognition is a regression problem over multimodal features; the thesis tests several visual features and fuses the prediction results through multiple linear regression. Genuine/posed expression recognition is a dynamic emotion-detection problem on video data; the thesis combines facial landmarks and texture features and uses a recurrent neural network with an attention mechanism for classification. Group emotion recognition classifies the positive or negative emotion a group conveys to an observer; the thesis uses facial and group-level features jointly at both the individual and group levels.
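The "learned fusion weights" idea in contribution (2) can be sketched as follows. This is a minimal illustrative toy, not the thesis's actual network: each modality produces per-class scores, a softmax-normalised weight per modality is learned by gradient descent on a squared-error objective, and the informative modality should end up with the largest weight. All names and the synthetic data are assumptions.

```python
import numpy as np

# Toy weighted multimodal fusion: fused = sum_m p_m * scores_m, with
# p = softmax(w) learned by gradient descent. Illustrative sketch only.
rng = np.random.default_rng(0)
N, M, C = 200, 3, 4                      # samples, modalities, classes

labels = rng.integers(0, C, size=N)
onehot = np.eye(C)[labels]

# Modality 0 is informative; modalities 1 and 2 are pure noise.
scores = np.stack([
    onehot + rng.normal(scale=0.5, size=(N, C)),
    rng.normal(size=(N, C)),
    rng.normal(size=(N, C)),
])

w = np.zeros(M)                          # raw fusion weights
lr = 0.05
for _ in range(200):
    p = np.exp(w) / np.exp(w).sum()      # simplex-constrained weights
    fused = np.tensordot(p, scores, axes=1)            # (N, C) fused scores
    err = fused - onehot
    grad_p = np.einsum('nc,mnc->m', err, scores) * 2 / N
    w -= lr * (np.diag(p) - np.outer(p, p)) @ grad_p   # chain rule via softmax

p = np.exp(w) / np.exp(w).sum()
print(p)  # the weight on the informative modality 0 dominates
```

The softmax keeps the fusion weights positive and summing to one, so the fused score stays a convex combination of the modality scores.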
English abstract:
Affective computing is an interdisciplinary research field spanning computer science, psychology, brain science, and sociology. It covers many aspects, such as emotional speech, facial affect detection, body gesture, emotion recognition and synthesis, physiological monitoring, and applications of affective computing, and it is a newly developed research area. Due to the unclear and complicated brain mechanisms of emotion, the diversity of existing theoretical methods in emotional psychology, and the limitations of data-processing algorithms, the field of affective computing still lacks a unified theoretical framework and methodological system, so many challenging research problems remain. Emotion recognition is an important aspect of affective computing; its purpose is to give computer systems the ability to detect and recognize human expressions and emotions. The expression of human emotion involves the joint action of multimodal signals such as eye gaze, speech, language, facial expression, body movements, and posture, so automatic emotion recognition is a multimodal recognition problem. Among these signals, the visual channel is the main carrier of emotional information. Based on this, I focus on multimodal machine learning methods, take visual emotion representations and emotion recognition models as the main research objects, and explore several issues in emotion recognition. The main work and contributions of this thesis are as follows. Firstly, I investigate the psychological and sociological concepts and definitions of emotion, including discrete emotion models and continuous dimensional models, as well as the notions of static emotion, dynamic emotion, and group emotion. Because the face plays a key role in the expression of emotion, I also study the Facial Action Coding System (FACS), which is grounded in facial muscle movements.
FACS expresses rich and subtle changes in facial expression by defining facial action units (AUs). Secondly, I study how multimodal emotional features perform for emotion recognition and propose two feature-fusion methods to combine them. I explore a framework and pipeline for facial expression recognition in the wild and test a variety of data-preprocessing methods to improve later recognition accuracy. I extract geometric and texture features from static facial expressions and train models to learn facial representations through deep learning. The static expression features are then aggregated over the temporal domain into video-clip features, and a multimodal fusion network based on learned fusion weights is proposed. Experiments on multiple public datasets demonstrate the effectiveness of the proposed methods. Thirdly, I study feature representation and detection models for action unit intensity recognition. Motivated by the limited performance of deep convolutional networks in facial expression recognition, and inspired by the effectiveness of handcrafted edge-detection features in traditional AU recognition methods, I propose a convolution kernel constrained by edge detection and design an edge-convolution network model. On the other hand, because directly recognizing AU intensity is difficult, I propose a detection model based on local and global ranking that recognizes intensity indirectly through comparisons between image samples. The model first samples images to generate image pairs and obtains stronger/weaker relations between the AU intensities of a series of images; the images are then sorted by these local ranking results, and the actual AU intensity is predicted by a simple linear mapping function. Experimental results show that the methods achieve good results on two publicly available datasets.
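The ranking-based intensity idea above can be illustrated with a small sketch. This is an assumed setup, not the thesis code: an oracle stands in for the learned pairwise comparator, pairwise stronger/weaker outcomes are aggregated into a global ordering, and a fitted linear function maps ranks onto the intensity scale.

```python
import numpy as np

# Illustrative sketch of indirect AU-intensity estimation by ranking.
rng = np.random.default_rng(1)
n = 50
true_intensity = rng.uniform(0, 5, size=n)   # hidden ground truth (0-5 AU scale)

# 1) For every image pair, record which image shows the stronger AU.
#    An oracle replaces the learned pairwise comparator here.
wins = np.zeros(n)
for i in range(n):
    for j in range(n):
        if i != j and true_intensity[i] > true_intensity[j]:
            wins[i] += 1

# 2) Sort samples into a global intensity ranking by pairwise wins.
ranks = wins.argsort().argsort()             # 0 = weakest, n-1 = strongest

# 3) Map ranks to the intensity scale with a simple linear function.
a, b = np.polyfit(ranks, true_intensity, deg=1)
pred = a * ranks + b
```

Because the comparator only has to answer "which of these two is stronger", the hard absolute-intensity problem is reduced to many easier relative judgments; the final linear map is the only step that touches the absolute scale.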
Lastly, based on deep learning models, I study several key problems in emotion recognition: continuous emotion recognition, genuine versus deceptive expression recognition, and group emotion recognition. Continuous emotion recognition is a regression problem over multimodal features; I test several visual features and fuse the prediction results through multiple linear regression. Distinguishing genuine from deceptive emotions is a dynamic expression detection problem on video data; I combine facial landmarks and texture features and use a recurrent neural network with an attention mechanism to classify genuine versus disguised expressions. The goal of group emotion recognition is to analyze group images and classify them into positive, negative, or neutral emotional states; I use both face-level and group-level features to jointly identify the group-level emotion.
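The fusion-by-multiple-linear-regression step for continuous emotion recognition can be sketched as follows. The three noisy "feature streams" are simulated stand-ins for the thesis's visual features, so everything below is an illustrative assumption rather than the actual pipeline.

```python
import numpy as np

# Sketch: fuse per-feature continuous-emotion predictions by least squares.
rng = np.random.default_rng(2)
N = 300
valence = rng.uniform(-1, 1, size=N)         # continuous emotion target

# Each hypothetical feature stream predicts valence with its own noise level.
preds = np.stack(
    [valence + rng.normal(scale=s, size=N) for s in (0.2, 0.3, 0.5)], axis=1
)

# Fit fusion coefficients (plus intercept) by ordinary least squares.
X = np.hstack([preds, np.ones((N, 1))])
coef, *_ = np.linalg.lstsq(X, valence, rcond=None)
fused = X @ coef

rmse = lambda y: np.sqrt(np.mean((y - valence) ** 2))
print(rmse(fused) < rmse(preds[:, 0]))  # fusion beats the best single stream
```

Since any single stream is itself a linear combination of the regressors, the least-squares fusion can never do worse in-sample than the best individual predictor, which is the appeal of this simple fusion scheme.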
Total number of references:

 0    

Library location:

 Thesis Reading Area, Library (Zones B–C, 3rd floor, south area of the main building)

Call number:

 博081203/18003    

Open-access date:

 2019-07-09    
