View thesis information

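Chinese title:	Research and Application of Facial Expression Recognition Based on Contrastive Learning and Attention Mechanisms
Name:	Liu Zifeng (刘紫凤)
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	081202
Discipline:	Computer Software and Theory
Student type:	Master's
Degree:	Master of Engineering
Degree type:	Academic degree
Degree year:	2023
Campus:	Beijing campus
School:	Faculty of Education
Research direction:	AI & education, AR & education
First supervisor:	Cai Su (蔡苏)
First supervisor affiliation:	Faculty of Education
Submission date:	2023-06-15
Defense date:	2023-05-24
Foreign-language title:	Facial expression recognition and application based on contrastive learning and visual attention
Chinese keywords:	Facial expression recognition ; Deep learning ; Convolutional neural networks ; Contrastive learning ; Attention mechanisms
Foreign-language keywords:	Expression recognition ; Deep learning ; Convolutional neural networks ; Contrast learning ; Attention mechanisms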
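Chinese abstract:	︿Facial expression recognition aims to identify and analyze the emotions shown by faces in images or video, and related research seeks expression recognition models that are accurate, fast, and economical with computing resources. Studying expression recognition algorithms and models gives computers the ability to recognize and understand human emotions and moods, with applications in human-computer interaction, intelligent education, and other areas. Traditional facial expression recognition algorithms select and extract expression features manually, and the resulting models generalize poorly. Deep learning-based expression recognition has made some progress, but problems remain: expression features are hard to extract, feature weights receive insufficient attention, and the data are both class-imbalanced and scarce. As a result, training tends to overfit, effective expression features are hard to extract, and the accuracy and robustness of the final models need further improvement. To address these problems, the work in this thesis mainly comprises: (1) To address class imbalance in facial expression data and the tendency of models to overfit, a deep convolutional network model based on contrastive learning is studied. The thesis introduces contrastive learning, from unsupervised learning, into the facial expression recognition task: data augmentation of the facial expression dataset yields contrastive positive and negative examples for the training data, and a contrastive-learning-based loss function is constructed to guide the model to discriminate key facial expression features. Experiments on the FER2013 dataset with five deep convolutional neural network models built on the VGGNet, ResNet, and DenseNet architectures show that introducing contrastive learning separates expression classes more accurately, increases the model's attention to samples from minority classes, and improves training stability; (2) To address the difficulty of extracting expression features and the insufficient attention to feature weights, facial expression recognition models incorporating channel attention and self-attention are studied. The thesis proposes two deep convolutional models that integrate attention mechanisms, SPP-SENet and ResVit. To further strengthen the feature representation ability of convolutional neural networks, SPP-SENet combines a spatial pyramid pooling structure with a channel attention module, while ResVit introduces an adaptive attention mechanism and fuses the feature representations of multiple models, combining grid-based self-attention with decision-level fusion of multi-model features. Experiments on three benchmark datasets, FER2013, CK+, and JAFFE, show that the two proposed attention-based models effectively extract deep channel features and spatial features from facial expression images and significantly improve recognition performance; (3) Classroom application of the expression recognition methods based on contrastive learning and attention mechanisms. In practice, the thesis integrates the proposed facial expression recognition techniques into an expression recognition system that monitors and analyzes students' emotional states in real time. By effectively combining the first two results and applying them to actual teaching, the work provides a key technology for automatic analysis and diagnosis of students' learning emotions in the context of smart education.﹀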
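The abstract describes building positive and negative examples by data augmentation and training with a contrastive loss, but the record does not give the loss itself. As an illustration only, the sketch below uses the widely used NT-Xent (normalized temperature-scaled cross-entropy) form, which may differ from the loss in the thesis. The function names, the `temperature` value, and the toy embeddings are assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent loss over two augmented views (assumed form, not from the thesis).

    view_a[i] and view_b[i] are embeddings of two augmentations of image i
    (the positive pair); every other embedding in the batch is a negative.
    """
    z = view_a + view_b              # 2n embeddings in one list
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)      # index of the other view of the same image
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp over all candidates except i itself.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * n)


# Toy 2-D embeddings: two images, two views each.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = nt_xent_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case the loss is lower when the two views of each image have matching embeddings (`aligned`) than when they are mismatched (`swapped`), which is the behavior a contrastive objective is meant to reward.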
Foreign-language abstract:	︿ Facial expression recognition is the process of identifying and analyzing the emotional state of a human face in an image or video. Related research focuses on developing expression recognition models with higher accuracy, faster recognition speed, and lower computational cost. The study of expression recognition algorithms and models enables computers to recognize and understand human emotions and sentiments, with applications in human-computer interaction, industrial manufacturing, intelligent education, security services, and traffic safety. Traditional facial expression recognition algorithms rely on manual feature extraction, which is subject to human factors and results in poor model generalization and recognition accuracy. While deep learning-based expression recognition algorithms have made progress, challenges remain, including difficult feature extraction, insufficient attention to feature weights, imbalanced data categories, insufficient data, and overfitting. These issues are addressed in this thesis through the following research: 1. A deep convolutional network model based on contrastive learning is proposed to address the problems of imbalanced facial expression data categories and overfitting. Contrastive learning, from unsupervised learning, is introduced into the facial expression recognition task, and a loss function based on contrastive learning is constructed to guide the model to distinguish key facial expression features. Experiments with five deep convolutional neural network models on the VGGNet, ResNet, and DenseNet architectures demonstrate that contrastive learning can improve recognition accuracy and the stability of model training. 2. To address the problems of difficult feature extraction and insufficient attention to feature weights, two facial expression recognition models incorporating attention mechanisms, SPP-SENet and ResVit, are proposed. SPP-SENet combines a spatial pyramid pooling structure with a channel attention module, while ResVit introduces an adaptive attention mechanism and fuses the feature representations of multiple models. Experiments on three benchmark datasets demonstrate that the proposed models can effectively extract deep channel features and spatial features of facial expression images, significantly improving recognition accuracy. 3. The proposed expression recognition technology is integrated into an expression recognition system for real-time monitoring and analysis of students' emotional states in the context of smart education. This provides a key technology for automatic analysis and diagnosis of students' learning emotions. In summary, this thesis proposes novel deep learning-based models for facial expression recognition and demonstrates their effectiveness in improving recognition accuracy and stability. The proposed technology has practical applications in various domains, including education, security, and human-computer interaction. ﹀
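The channel attention module mentioned in the abstract is not specified in this record. A minimal sketch of a squeeze-and-excitation style block, the usual meaning of channel attention in SENet-type models, is given below under that assumption. It omits the spatial pyramid pooling part and is not the thesis's SPP-SENet. The function names, the weight layout, and the toy values are hypothetical.

```python
import math


def squeeze(feature_map):
    """Global average pooling: one scalar per channel of a [C][H][W] nested list."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_map]


def dense(x, weights, bias):
    """Fully connected layer; weights has one row of input weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def se_block(feature_map, w1, b1, w2, b2):
    """Squeeze-and-excitation channel attention (assumed form).

    squeeze -> dense + ReLU -> dense + sigmoid -> rescale each channel by its gate.
    """
    s = squeeze(feature_map)
    hidden = [max(0.0, h) for h in dense(s, w1, b1)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in dense(hidden, w2, b2)]
    scaled = [[[gate * v for v in row] for row in ch]
              for ch, gate in zip(feature_map, gates)]
    return scaled, gates


# Toy input: 2 channels of 2x2, bottleneck of width 1, hypothetical weights.
fmap = [[[1.0, 1.0], [1.0, 1.0]],
        [[2.0, 2.0], [2.0, 2.0]]]
w1, b1 = [[1.0, 0.0]], [0.0]            # 2 channels -> 1 hidden unit
w2, b2 = [[0.0], [0.0]], [0.0, 0.0]     # 1 hidden unit -> 2 channel gates
scaled, gates = se_block(fmap, w1, b1, w2, b2)
```

With the zero weights chosen for `w2`, every gate is exactly 0.5, so each channel is halved. In a trained model the gates would differ per channel, which is how the block reweights channels.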
Total references:	94
Call number:	硕081202/23010
Open access date:	2024-06-15
