Thesis Information

Chinese Title:

 基于多元特征的在线课堂专注度检测研究 (Research on Online Classroom Engagement Detection Based on Multiple Features)

Name:

 Li He

Confidentiality Level:

 Public

Thesis Language:

 Chinese

Discipline Code:

 081202

Discipline:

 Computer Software and Theory

Student Type:

 Master's

Degree:

 Master of Engineering

Degree Type:

 Academic Degree

Degree Year:

 2022

Campus:

 Beijing Campus

School:

 School of Artificial Intelligence

Research Interests:

 Deep Learning, Image Processing

First Supervisor:

 Wu Hao

First Supervisor's Affiliation:

 School of Artificial Intelligence, Beijing Normal University

Submission Date:

 2022-06-10

Defense Date:

 2022-06-02

English Title:

 Research on Online Classroom Engagement Detection Based on Multiple Features

Chinese Keywords:

 专注度检测 (engagement detection) ; 多特征融合 (multi-feature fusion) ; 时空卷积网络 (spatio-temporal convolutional network) ; 多速率滑动窗口 (multi-rate sliding window)

English Keywords:

 engagement detection ; multi-feature fusion ; spatio-temporal convolutional network ; multi-rate sliding window

Chinese Abstract:

With the deepening development of artificial intelligence and education reform, education informatization has gradually become a topic of national attention. Online education systems and massive open online courses (MOOCs) are developing rapidly, bringing large numbers of online learners. Under the impact of the COVID-19 pandemic in recent years, schools at all levels must be able to switch to online teaching at almost any time. However, unlike a traditional large classroom, the online learning environment does not let teachers know in real time whether a student studying in front of a computer screen is engaged.

Engagement is an important indicator of the quality of students' learning process and has a key influence on learning outcomes. This thesis therefore studies engagement detection for students in online classes: by analyzing video images of students during online learning, it detects their learning engagement. The core innovations of this thesis include the following aspects:

(1) This thesis designs a key-frame extraction module based on multi-rate sliding windows. At present there are only two open-source engagement recognition datasets, and the numbers of videos in their different engagement categories are highly disproportionate. To address this severe imbalance in the sample distribution, this thesis proposes a key-frame extraction strategy based on sliding windows. On this basis, a different frame extraction rate is set within each sliding window for each video category, balancing the proportion of training samples across categories so that the subsequent model can effectively learn the relevant features and improve detection accuracy.

(2) This thesis designs a multi-feature extraction module based on video key frames. Existing engagement evaluation systems often rely on face detection or expression recognition alone; their features are limited and their evaluation lags, and even current state-of-the-art multi-feature methods cannot fully exploit the diverse features of learners in video. This thesis therefore extracts comprehensive features of learners in the online classroom environment, covering eye-opening and gaze angles, mouth-opening angle, AU action-unit intensities, head movement angles, body posture, and high-dimensional features extracted by convolutional networks, which together provide effective feature material for the subsequent detection networks.

(3) This thesis designs an engagement detection module based on spatio-temporal correlation. Although convolutional networks perform excellently on images, they cannot exploit the temporal information of video sequences. This thesis therefore proposes an LSTM-based frame-level detection network and a TCN-based video-level detection network. The former focuses on individual frames and uses a global attention mechanism to reduce the influence of occasional anomalous behaviors on the overall detection; the latter focuses on the video sequence and models longer time spans through dilated and causal convolutions. Experiments show that the spatio-temporal detection networks designed in this thesis perform well on the DAiSEE and EmotiW datasets.

In addition, to address the low detection accuracy caused by the small number of samples in the low-engagement categories, this thesis analyzes the limitations of classification evaluation metrics in depth and conducts extensive experiments using the MSE metric from the regression formulation. The experiments show that the proposed method improves markedly over existing methods and achieves the desired detection results.
English Abstract:

With the continued development of artificial intelligence and education reform, education informatization is gradually becoming a key topic of national attention. Online education systems and massive open online courses (MOOCs) are developing rapidly, bringing many learners online. Impacted by COVID-19 in recent years, schools of all sizes must be able to switch to online teaching at almost any time. However, unlike traditional large-class teaching, online learning environments do not allow teachers to track in real time whether students are engaged while studying in front of a computer screen.

Engagement is an important indicator of the quality of students' learning process and has a key impact on learning outcomes. Therefore, this paper focuses on detecting students' engagement in the online classroom by analyzing video images of students during online learning. The core innovations of this paper mainly include the following aspects.

(1) This paper designs a key frame extraction module based on multi-rate sliding windows. At present, there are only two open-source engagement recognition datasets, and the numbers of videos in their different engagement categories are highly disproportionate. To address this serious imbalance in the sample distribution, this paper proposes a strategy for key frame extraction using sliding windows. Based on this, different frame extraction rates are set for different categories of video data in each sliding window to balance the proportion of training samples for each category, so that the subsequent model can effectively learn relevant features and improve detection accuracy.
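As a rough illustration of how such a multi-rate scheme can work, the following Python sketch samples frames at a class-dependent stride within fixed-size sliding windows. The stride table, window size, and function names are illustrative assumptions, not the parameters used in the thesis.

```python
from typing import List, Sequence

# Hypothetical frames-to-skip per engagement level (0 = lowest, 3 = highest):
# rare low-engagement classes get a smaller stride (denser sampling) so that
# every class contributes a comparable number of key frames.
SAMPLE_STRIDE = {0: 2, 1: 4, 2: 8, 3: 10}

def extract_key_frames(frames: Sequence, label: int, window: int = 60) -> List:
    """Slide a fixed-size window over the video and keep frames at a
    class-dependent stride within each window."""
    stride = SAMPLE_STRIDE[label]
    key_frames = []
    for start in range(0, len(frames) - window + 1, window):
        key_frames.extend(frames[start:start + window:stride])
    return key_frames
```

Under these assumed strides, a low-engagement video of a given length yields roughly five times as many key frames as a high-engagement one of the same length, evening out the training distribution.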

(2) This paper designs a multi-feature extraction module based on video key frames. Existing engagement evaluation systems often rely on face detection or expression recognition alone, with limited features and lagging evaluation, and even current advanced multi-feature methods cannot fully utilize the various features of learners in videos. Therefore, this paper extracts comprehensive features of learners in the online classroom environment from multiple aspects, including eye-opening and gaze angles, mouth-opening angle, AU action-unit intensities, head movement angles, body posture, and high-dimensional features extracted by convolutional networks, and uses them as effective feature material for the subsequent detection networks.
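As a minimal sketch of what fusing these feature groups per frame could look like, the snippet below concatenates them into one vector; the field names and shapes are assumptions for illustration, not the thesis's actual pipeline.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FrameFeatures:
    eye: np.ndarray    # eye-opening and gaze angles
    mouth: np.ndarray  # mouth-opening angle
    aus: np.ndarray    # AU action-unit intensities
    head: np.ndarray   # head movement angles (e.g. pitch, yaw, roll)
    body: np.ndarray   # flattened body-posture keypoints
    cnn: np.ndarray    # high-dimensional CNN embedding of the frame

def fuse(f: FrameFeatures) -> np.ndarray:
    """Concatenate all feature groups into a single frame-level vector
    consumed by the downstream temporal detection networks."""
    return np.concatenate([f.eye, f.mouth, f.aus, f.head, f.body, f.cnn])
```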

(3) This paper designs an engagement detection module based on spatio-temporal correlation. Although convolutional networks excel on images, they cannot utilize the temporal information of video sequences. Therefore, this paper proposes an LSTM-based frame-level detection network and a TCN-based video-level detection network. The former focuses on each frame and reduces the impact of occasional anomalous behaviors on the overall detection through a global attention mechanism. The latter focuses on video sequences and models them over a longer time horizon by means of dilated and causal convolutions. Experiments demonstrate that the spatio-temporal detection networks designed in this paper perform well on the DAiSEE and EmotiW datasets.
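To make the TCN side concrete, here is a minimal PyTorch sketch of a causal, dilated residual block; the channel width, kernel size, and number of blocks are assumptions rather than the architecture used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalBlock(nn.Module):
    """One TCN-style residual block: a left-padded (causal) dilated Conv1d."""

    def __init__(self, channels: int, kernel: int = 3, dilation: int = 1):
        super().__init__()
        # Pad only on the left so each output step sees past frames only.
        self.pad = (kernel - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        out = self.conv(F.pad(x, (self.pad, 0)))
        return torch.relu(out) + x  # residual connection

# Exponentially growing dilations widen the receptive field, letting the
# stack model long frame sequences at modest depth.
tcn = nn.Sequential(*[CausalBlock(64, dilation=2 ** i) for i in range(4)])
x = torch.randn(1, 64, 120)  # e.g. 120 frames of 64-dim fused features
print(tcn(x).shape)          # torch.Size([1, 64, 120])
```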

In addition, to address the problem of poor detection accuracy due to the low number of samples in the low-engagement categories, this paper deeply analyzes the limitations of the evaluation metrics for the classification task and conducts sufficient experiments using the MSE evaluation metric from the regression task. The experiments show that the research method proposed in this paper improves significantly over existing methods and achieves the desired detection results.
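As a small worked example of the regression metric, the sketch below computes MSE between predicted and annotated engagement values, assuming (as in common EmotiW practice) that the discrete levels are normalized to [0, 1]; the sample numbers are invented for illustration.

```python
import numpy as np

def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error between predicted and ground-truth engagement."""
    return float(np.mean((pred - target) ** 2))

# Four hypothetical clips with levels drawn from {0, 0.33, 0.66, 1}.
pred   = np.array([0.66, 0.33, 1.00, 0.66])
target = np.array([0.66, 0.66, 1.00, 0.33])
print(mse(pred, target))  # ~0.0545
```

Unlike classification accuracy, MSE rewards predictions that are close to the true level even when the hard class label is wrong, which matters when low-engagement samples are scarce.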
Number of References:

 83    

About the Author:

 Li He, School of Artificial Intelligence, Beijing Normal University, majoring in Computer Software and Theory; research interests are deep learning and image processing; has one paper and one patent currently in the publication process.

Call Number:

 硕081202/22012    

Open Access Date:

 2023-06-10    
