Thesis Information


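Chinese title:	基于深度学习技术的监考视频智能分析
Name:	万桢洪
Confidentiality level:	Public
Thesis language:	Chinese
Discipline code:	081002
Discipline:	Signal and Information Processing
Student type:	Master's
Degree:	Master of Engineering
Degree type:	Academic degree
Degree year:	2022
Campus:	Beijing campus
School:	School of Artificial Intelligence
Research area:	Computer vision
First supervisor:	骆祖莹
First supervisor's affiliation:	School of Artificial Intelligence, Beijing Normal University
Submission date:	2022-06-08
Defense date:	2022-06-04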
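English title:	Intelligent Analysis of Invigilation Video Based on Deep Learning Technology
Chinese keywords:	目标检测 (object detection) ; 行为识别 (action recognition) ; 考场异常行为 (abnormal behavior in examination room) ; 卷积神经网络 (convolutional neural network) ; 深度学习 (deep learning)
English keywords:	Object detection ; Action recognition ; Abnormal behavior in examination room ; Convolutional neural network ; Deep learning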
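Chinese abstract (English translation):	Examinations are an important means of assessing individual ability and selecting talent. Ensuring that they are conducted fairly and impartially bears directly on the interests of examinees and on the harmonious development of society. To safeguard examination fairness, the state has in recent years vigorously promoted the construction of standardized examination rooms, in which a very important step is the follow-up review of surveillance recordings. This work is currently done by electronic invigilators who watch the invigilation videos of multiple examination rooms online in real time. Limited by human visual acuity and endurance, electronic invigilators can only monitor group events in the examination room at a macro level and cannot monitor occasional cheating by individual examinees. Manual review also consumes considerable resources and time, and reviewers' visual fatigue readily leads to missed and false detections. How to use existing deep learning techniques to analyze examination room surveillance video intelligently while maintaining speed is therefore an important research direction. To address these problems, this thesis carries out the following work:
(1) A dataset of abnormal examination room behaviors is constructed for deep learning training. To address the lack of examination room data, this thesis defines typical cheating behaviors and, by simulating the real examination process, records simulated examination videos from different angles and under different lighting. After preprocessing, cleaning, and labeling the videos, an examinee human skeleton dataset and an examinee image block unit dataset are built, each used to train the subsequent models.
(2) A human-skeleton-based model for recognizing abnormal examination room behavior is proposed. The model consists of two parts: examinee object detection and examinee behavior recognition. Object detection is the basis of behavior recognition; this thesis uses YOLOv3 to locate examinees and obtain their bounding boxes. The behavior recognition part uses OpenPose to extract examinees' skeleton keypoints and then uses Resnet50 as the backbone network to extract features from the skeleton, so that the model can ignore differences in examination room background and focus on examinees' behavioral features. In addition, to keep detection real-time, key frames are extracted by the inter-frame difference method and only key frames are processed. Previous studies tended to analyze accuracy on datasets without verifying performance on actual videos. This thesis applies the proposed method to examination room videos, and the results show that it performs well both on the dataset and in the real application scenario.
(3) An abnormal behavior recognition model based on 3D convolution and attention is proposed. The model uses an improved C3D as the backbone of the feature extractor and adds channel attention and spatial attention modules to the output feature maps, so that the model concentrates on extracting examinees' behavioral features. The feature vectors output by the feature extractor, together with their labels, form the training set of an SVM; the trained SVM serves as the classifier and outputs the final classification. The model strengthens temporal feature extraction for low-resolution video and achieves better results. In addition, to improve detection efficiency, this thesis proposes a hierarchical cheating detection model: building on skeleton-based recognition, the 3D convolution method proposed in this chapter performs a second recognition pass over the detection results. Experiments show that this approach improves detection accuracy while maintaining speed, balancing speed and accuracy.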
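The abstract mentions extracting key frames with the inter-frame difference method so that only key frames are passed to the detector. The record does not give the thesis's exact criterion, so the following is only a minimal sketch under stated assumptions: frames are grayscale NumPy arrays, the change score is the mean absolute pixel difference between consecutive frames, and the function name `select_key_frames` and the `threshold` value are hypothetical.

```python
import numpy as np


def select_key_frames(frames, threshold=10.0):
    """Return indices of frames that differ enough from the previous frame.

    Assumption: the score is the mean absolute pixel difference between
    consecutive frames; `threshold` is an illustrative value, not one
    taken from the thesis.
    """
    key_indices = []
    previous = None
    for i, frame in enumerate(frames):
        # Convert to float so that uint8 subtraction cannot wrap around.
        current = np.asarray(frame, dtype=np.float32)
        if previous is None:
            # Always keep the first frame as a starting point.
            key_indices.append(i)
        else:
            score = float(np.mean(np.abs(current - previous)))
            if score > threshold:
                key_indices.append(i)
        previous = current
    return key_indices
```

A lower threshold keeps more frames (higher recall, slower processing), while a higher one keeps fewer; a suitable value would have to be tuned on real examination room footage.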
English abstract:	Examinations are an important means of judging individual ability and selecting talent, and ensuring that they are conducted fairly and impartially is directly related to the vital interests of examinees and affects the harmonious development of society. To defend the fairness of examinations, the state has been vigorously promoting the construction of standardized examination rooms in recent years, one very important aspect of which is the follow-up review of surveillance video. This work is currently done by electronic invigilators who watch real-time invigilation videos from multiple examination rooms online. Limited by human visual resolution and endurance, electronic invigilators can only monitor group events in the examination room at a macroscopic level and cannot monitor the occasional cheating behavior of individual examinees. At the same time, manual review consumes a lot of resources and time, and visual fatigue of reviewers easily causes missed and false detections. Therefore, using existing deep learning technology to realize intelligent analysis of examination room surveillance video while ensuring speed is an important research direction. Based on the above problems, this thesis carries out the following research work: (1) We construct a dataset of abnormal behaviors in the examination room for deep learning training. To address the lack of examination room data, this thesis defines the typical cheating behaviors in the examination room. By simulating examinees' real examination process, simulated examination room videos are shot from different angles and under different lighting. By preprocessing, cleaning, and labeling the videos, an examinee body skeleton dataset and an examinee image block unit dataset are constructed, which are used respectively for subsequent model training.
(2) We propose a human skeleton-based model for abnormal behavior recognition in the examination room. The model consists of two parts: examinee object detection and examinee behavior recognition. Object detection is the basis of behavior recognition, and this thesis uses YOLOv3 to locate examinees and obtain their bounding boxes. The behavior recognition part uses OpenPose to extract the key points of each examinee's skeleton and uses Resnet50 as the backbone network. Feature extraction is based on the examinee's skeleton, so that the model can ignore differences in the background of the examination room and focus more on the examinee's behavioral features. In addition, to keep detection real-time, video key frames are extracted by the inter-frame difference method, and only the key frames are processed. Previous studies have tended to analyze and discuss accuracy on datasets without validating it on actual videos. In this thesis, we apply the proposed method to examination room videos, and the results show that the method achieves good results on both the dataset and the actual application scenario. (3) We propose an abnormal behavior recognition model based on a 3D convolutional network and attention modules. The model adopts an improved C3D as the backbone network of the feature extractor and adds a channel attention module and a spatial attention module to the output feature map, so that the model focuses more on extracting examinees' behavioral features. The feature vectors output by the feature extractor and their corresponding labels are used as the training set of the SVM, and the trained SVM is used as the classifier that outputs the final classification result. The model strengthens the extraction of temporal features for low-resolution videos and achieves better results.
At the same time, to improve detection efficiency, this thesis proposes a hierarchical cheating behavior detection model: building on the human skeleton-based model, the 3D convolution method proposed in this chapter performs secondary recognition of the detection results. The experimental results show that this method improves detection accuracy while maintaining speed, achieving a balance between speed and accuracy.
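The hierarchical detection idea described at the end of the abstract can be illustrated with a two-stage cascade. This is a sketch under assumptions, not the thesis's implementation: it assumes the fast skeleton-based stage screens every sample and only the samples it flags are passed to the slower 3D-convolution stage for a second check. The function name `hierarchical_detect` and the two callables are hypothetical stand-ins for the trained models.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def hierarchical_detect(
    samples: Sequence[T],
    fast_stage: Callable[[T], bool],
    slow_stage: Callable[[T], bool],
) -> List[bool]:
    """Run a two-stage cascade and return one final decision per sample.

    Assumption: `fast_stage` stands in for the skeleton-based model and
    `slow_stage` for the 3D-convolution model; both are placeholders.
    """
    results = []
    for sample in samples:
        if not fast_stage(sample):
            # Cleared by the cheap stage: the expensive stage is skipped.
            results.append(False)
        else:
            # Flagged by the cheap stage: confirm with the expensive stage.
            results.append(slow_stage(sample))
    return results
```

Under this assumption the expensive model runs only on the flagged subset, which is one way a cascade can trade a small amount of first-stage recall for a large saving in computation.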
Total references:	71
Call number:	硕081002/22004
Release date:	2023-06-08
