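Chinese title:	面向课堂场景的学生动作识别模型的研究与实现 (Research and Implementation of Student Action Recognition Models for Classroom Scenarios)
Author:	Zhao Ziyun (赵子云)
Confidentiality level:	Public
Language:	Chinese (chi)
Discipline code:	081002
Discipline:	Signal and Information Processing
Student type:	Master's student
Degree:	Master of Engineering
Degree type:	Academic degree
Degree year:	2023
Campus:	Beijing campus
School:	School of Artificial Intelligence
Research area:	Intelligent Technology and Educational Applications
Primary supervisor:	Yao Li (姚力)
Primary supervisor's school:	School of Artificial Intelligence
Submission date:	2023-06-20
Defense date:	2023-06-02
English title:	Advancements and Implementation of Student Action Recognition Models in Classroom Scenarios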
Keywords:	Classroom scenario; Object detection; Pose estimation; Action recognition; System design
Abstract:	With the rapid development of computer vision, models that recognize student actions in the classroom have attracted growing attention. Classroom action recognition helps reveal students' learning status, engagement, and discipline, and provides data to support teaching assessment and classroom management. This thesis addresses open problems in classroom action recognition models by improving object detection, pose estimation, and action recognition methods, and builds an action recognition system for classroom scenarios.

First, the thesis observes that object detection models ignore prior knowledge about the scene. It proposes an optimization strategy based on spatial constraints. The strategy first extracts and filters student center positions. It then applies spatial constraint one, which is based on a Gaussian kernel function, and spatial constraint two, which is based on boundary limits. With the spatial constraints, detection accuracy improves by up to 25.9% (for SSD), which shows that the strategy is effective for recognizing students in classrooms.

Second, the thesis studies how to make pose estimation models lightweight. It proposes DSC-HRNet, a lightweight pose estimation model built on high-resolution representation learning. The key components are residual modules based on depthwise separable convolution, multi-resolution representation learning branches, and multi-level feature fusion. The model is evaluated on the COCO and MPII datasets. Compared with the base model HRNet, DSC-HRNet keeps accuracy within about 1%, reduces the parameter count by 78.1%, and runs about 90.2% faster, which makes it suitable for real-time applications.

Third, the thesis examines multi-modal information fusion for action recognition. It proposes a classroom student action recognition model that fuses modalities with multi-head self-attention. The spatio-temporal fusion method has three parts: C3D extracts visual-modality features, ST-GCN extracts topological (skeleton) features, and a Transformer-based structure fuses the two. The model is evaluated on the Berkeley-MHAD, UTD-MHAD, and Student-MHAD datasets. On these three datasets respectively, the C3D+ST-GCN fusion model improves macro-precision over C3D and ST-GCN by 7.5% and 2.5%, 5.5% and 3.7%, and 12.9% and 0.5%. It improves micro-precision by 8.5% and 3.0%, 6.5% and 5.6%, and 15.0% and 4.1%. These results support the effectiveness of the fusion model.

Finally, the thesis designs and implements a prototype system that recognizes student actions in classroom scenarios. It analyzes the problem, presents the overall design, and describes each module and its functions. The implementation and demonstration results cover the object detection and tracking module, the pose estimation module, and the action recognition module. Simulation results confirm that the system is effective for classroom student action recognition.

In summary, the thesis proposes a series of model improvements for problems in classroom student action recognition and implements a classroom student action recognition system. Experiments show that the proposed methods perform well in object detection, pose estimation, and action recognition. They offer a useful reference for research on classroom student action recognition and for its practical applications.
Number of references:	123
Call number:	硕081002/23004
Full-text release date:	2024-06-19
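
The abstract names two spatial constraints for classroom detection: a Gaussian-kernel constraint and a boundary constraint. The record does not give their exact form, so the sketch below is only one plausible reading.

Assumptions (none of these come from the thesis):
- Constraint one multiplies each detection's score by exp(-d^2 / (2 sigma^2)), where d is the distance from the box center to the nearest prior student center.
- Constraint two drops any detection whose center falls outside an allowed region.
- The names `apply_spatial_constraints`, `sigma`, `region`, and `threshold` are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Point = Tuple[float, float]


def box_center(box: Box) -> Point:
    """Return the (cx, cy) center of an axis-aligned box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def gaussian_weight(center: Point, priors: List[Point], sigma: float) -> float:
    """Assumed constraint one: Gaussian of the distance to the nearest prior student center."""
    cx, cy = center
    d2 = min((cx - px) ** 2 + (cy - py) ** 2 for px, py in priors)
    return math.exp(-d2 / (2.0 * sigma ** 2))


def inside(center: Point, region: Box) -> bool:
    """Assumed constraint two: the box center must lie inside the allowed region."""
    cx, cy = center
    x1, y1, x2, y2 = region
    return x1 <= cx <= x2 and y1 <= cy <= y2


def apply_spatial_constraints(detections, priors, region, sigma=40.0, threshold=0.5):
    """Re-score (box, score) pairs and keep only those that pass both constraints."""
    kept = []
    for box, score in detections:
        c = box_center(box)
        if not inside(c, region):
            continue  # boundary limit: discard detections outside the seating area
        new_score = score * gaussian_weight(c, priors, sigma)
        if new_score >= threshold:
            kept.append((box, new_score))
    return kept


priors = [(100.0, 200.0), (300.0, 200.0)]  # hypothetical student centers from earlier frames
region = (0.0, 100.0, 640.0, 480.0)        # hypothetical seating area
detections = [
    ((80, 180, 120, 220), 0.9),   # centered on a prior: weight 1.0, kept
    ((190, 180, 230, 220), 0.8),  # 90 px from the nearest prior: score falls below threshold
    ((280, 20, 320, 60), 0.95),   # center above the seating area: removed by the boundary
]
kept = apply_spatial_constraints(detections, priors, region)
print(kept)  # [((80, 180, 120, 220), 0.9)]
```

In this sketch the Gaussian constraint acts as a soft penalty and the boundary constraint as a hard filter, so confident detections in implausible positions are suppressed.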
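
The parameter savings that the abstract reports for DSC-HRNet come from replacing standard convolutions with depthwise separable ones. A standard k x k convolution from C_in to C_out channels has k^2 C_in C_out weights. The depthwise-plus-pointwise pair has k^2 C_in + C_in C_out weights, so the ratio is 1/C_out + 1/k^2.

The arithmetic below uses a hypothetical 3 x 3 layer with 64 input and 64 output channels. It is per-layer arithmetic only. It does not reproduce the thesis's whole-model 78.1% figure, which also depends on layers that are not replaced.

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Weight count of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def dsc_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Depthwise k x k conv (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


std = conv_params(3, 64, 64)
sep = dsc_params(3, 64, 64)
print(std, sep, f"{1 - sep / std:.1%}")  # 36864 4672 87.3%
```

For 3 x 3 kernels the 1/k^2 term dominates, so the reduction for a single layer approaches about 89% as the channel count grows.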
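
The abstract describes fusing C3D visual features and ST-GCN skeleton features with multi-head self-attention in a Transformer-based structure. The sketch below shows only the core mechanism: it concatenates tokens from both modalities into one sequence, so each token attends across modalities, then mean-pools the result.

Assumptions (not from the thesis):
- Both modalities are already projected to the same width.
- Random matrices stand in for trained projection weights.
- The residual connections, layer normalization, and feed-forward sublayers of a full Transformer block are omitted.
- `fuse` and all toy values are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x m) and an (m x p) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random weights standing in for trained projection matrices (assumption)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x: Matrix, num_heads: int, rng: random.Random) -> Matrix:
    """Scaled dot-product self-attention over all tokens, split across heads."""
    n, d = len(x), len(x[0])
    if d % num_heads:
        raise ValueError("model width must be divisible by num_heads")
    dh = d // num_heads
    wq, wk, wv, wo = (random_matrix(d, d, rng) for _ in range(4))
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    concat = [[0.0] * d for _ in range(n)]  # head outputs written side by side
    for h in range(num_heads):
        cols = range(h * dh, (h + 1) * dh)  # feature slice owned by head h
        for i in range(n):
            scores = [sum(q[i][c] * k[j][c] for c in cols) / math.sqrt(dh) for j in range(n)]
            weights = softmax(scores)  # token i attends over tokens of both modalities
            for c in cols:
                concat[i][c] = sum(weights[j] * v[j][c] for j in range(n))
    return matmul(concat, wo)  # output projection mixes the heads


def fuse(visual_tokens: Matrix, skeleton_tokens: Matrix, num_heads: int = 2, seed: int = 0) -> Vector:
    """Concatenate both modalities into one sequence, self-attend, then mean-pool to one vector."""
    tokens = visual_tokens + skeleton_tokens
    out = multi_head_self_attention(tokens, num_heads, random.Random(seed))
    return [sum(row[c] for row in out) / len(out) for c in range(len(out[0]))]


visual = [[0.2, 0.1, 0.0, 0.5], [0.3, 0.0, 0.1, 0.4]]  # toy stand-ins for C3D clip features
skeleton = [[0.9, 0.2, 0.3, 0.1], [0.8, 0.1, 0.4, 0.0], [0.7, 0.3, 0.2, 0.2]]  # toy ST-GCN features
fused = fuse(visual, skeleton)
print(len(fused))  # 4
```

Concatenating the sequences lets a single attention layer learn cross-modal weights directly. A trained model would pass the pooled vector to a classification head.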
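
The fusion results in the abstract are reported as macro-precision and micro-precision. The sketch below assumes the standard definitions:
- Macro-precision averages per-class precision over every label that appears in the true or predicted labels, and counts a class that is never predicted as 0.
- Micro-precision pools true positives over all predictions.

The thesis may define or average these differently. The action labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def precision_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Macro- and micro-averaged precision for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    predicted = Counter(y_pred)  # how often each class was predicted
    tp = Counter(p for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    per_class = [tp[c] / predicted[c] if predicted[c] else 0.0 for c in labels]
    macro = sum(per_class) / len(labels)
    micro = sum(tp.values()) / sum(predicted.values())
    return {"macro": macro, "micro": micro}


y_true = ["raise_hand", "raise_hand", "write", "write", "sleep"]
y_pred = ["raise_hand", "write", "write", "write", "write"]
print(precision_scores(y_true, y_pred))  # {'macro': 0.5, 'micro': 0.6}
```

For single-label classification, micro-precision equals overall accuracy. Macro-precision weights every class equally, so it is more sensitive to rare actions.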