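Thesis information

Chinese title:

 基于上下文的3D人体姿态估计 (Context-Based 3D Human Pose Estimation)

Name:

 Zhou Yuanming (周元铭)

Confidentiality level:

 Public

Thesis language:

 Chinese

Discipline code:

 085212

Discipline:

 Software Engineering

Student type:

 Master's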

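Degree:

 Master of Engineering

Degree type:

 Professional degree

Degree year:

 2020

Campus:

 Beijing campus

School:

 School of Artificial Intelligence

Research area:

 Computer vision

Primary advisor:

 Hu Xiaoyan (胡晓雁)

Primary advisor's affiliation:

 School of Artificial Intelligence, Beijing Normal University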

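Submission date:

 2020-06-18

Defense date:

 2020-06-10

English title:

 3D HUMAN POSE ESTIMATION BASED ON CONTEXT INFORMATION

Keywords (translated from Chinese):

 3D human pose estimation; deep learning; sparse representation and 3D reconstruction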

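Abstract (translated from Chinese):

In computer vision, human pose estimation from monocular images is a valuable research direction. Because it is foundational and broadly applicable, it has wide prospects for development and application in today's era of information explosion. At the same time, monocular 3D human pose estimation is a difficult and highly challenging task in computer vision. In recent years, network models based on convolutional neural networks have developed rapidly, and human pose estimation algorithms have improved markedly in both accuracy and running time. However, occlusion, clothing, and other factors still cause inaccurate pose estimates.

This thesis proposes a context-based, multi-stage network structure for 3D human pose estimation. The first stage obtains the 2D human pose and keypoints from video stream data; its output is critical to the subsequent work and to the whole pipeline. The thesis compares OpenPose and LSTM-PM, two commonly used network models for human pose estimation on video streams, and, by analysing their limitations and shortcomings, proposes a context-based 2D human pose estimation network structure. It uses a network with long short-term memory to strengthen the learning and prediction of video context information, and it explores monocular 2D human pose estimation based on context information: invisible keypoints can be predicted jointly from the visible pose in the current frame and the context information. Visualization experiments show that the approach mitigates both keypoints made invisible by occlusion and incorrectly linked keypoints. The second stage obtains the 3D human pose through sparse representation and 3D reconstruction. Experimental results show that the proposed method is more accurate than existing video-stream human pose estimation methods and performs better under occlusion. The thesis also combines human pose estimation from computer vision with virtual reality, implementing a complete pipeline that takes monocular video as input and outputs the corresponding motion of a 3D human geometric model, which has engineering significance and practical value.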
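The idea of recovering an occluded keypoint from temporal context can be illustrated with a deliberately simple stand-in. The thesis uses an LSTM-based network for this; the sketch below does not reproduce it and instead fills missing 2D positions by linear interpolation between the nearest frames in which the keypoint is visible. The function name `fill_occluded` and the `None`-for-occluded convention are assumptions made for this example only.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def fill_occluded(track: List[Optional[Point]]) -> List[Optional[Point]]:
    """Fill occluded (None) entries in one keypoint's per-frame 2D track.

    Hypothetical helper: linear interpolation is a stand-in for the
    learned temporal model described in the abstract, not the thesis method.
    """
    filled = list(track)
    visible = [i for i, p in enumerate(track) if p is not None]
    if not visible:
        return filled  # no temporal context available at all

    for i, p in enumerate(track):
        if p is not None:
            continue
        # Nearest visible frames before and after the occluded frame.
        prev = max((j for j in visible if j < i), default=None)
        nxt = min((j for j in visible if j > i), default=None)
        if prev is None:
            filled[i] = track[nxt]    # only later context exists
        elif nxt is None:
            filled[i] = track[prev]   # only earlier context exists
        else:
            t = (i - prev) / (nxt - prev)
            (x0, y0), (x1, y1) = track[prev], track[nxt]
            filled[i] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return filled


# A wrist keypoint that is occluded in frames 1 and 2.
wrist = [(10.0, 20.0), None, None, (16.0, 26.0)]
print(fill_occluded(wrist))
```

A learned recurrent model can capture non-linear motion and use the other visible joints in the current frame, which this interpolation cannot; the sketch only shows why neighbouring frames carry usable information about an occluded joint.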

English abstract:

Monocular human pose estimation is an important research direction in computer vision. Because it is fundamental and widely applicable, it has broad prospects for development and application. 3D human pose estimation from a monocular camera is also one of the most challenging problems in computer vision. In recent years, owing to the rapid development of convolutional neural networks, the accuracy of human pose estimation has improved significantly. At the same time, occlusion, clothing, and other factors still lead to inaccurate pose estimates.

This thesis proposes a multi-stage 3D human pose estimation network structure based on context information. The first stage obtains the 2D human pose and keypoints from video stream data; its output is crucial for the subsequent stage and for the work as a whole. The thesis compares OpenPose and LSTM-PM, two commonly used network models for human pose estimation on video streams, and proposes a context-based 2D human pose estimation network structure by analyzing their limitations and shortcomings. A network with a Long Short-Term Memory module is used to strengthen the learning and prediction of video context information. The research explores monocular 2D human pose estimation based on context information: invisible keypoints can be predicted by combining known poses and context information. Visualization experiments show that this mitigates the problem of keypoints made invisible by occlusion. In the second stage, the 3D human pose is obtained by sparse representation and 3D reconstruction. The results show that the method achieves higher accuracy than existing video-based human pose estimation methods and performs better under occlusion. The thesis also combines human pose estimation from computer vision with virtual reality, realizing a complete pipeline that outputs the corresponding motion of a 3D human body geometric model from an input monocular video.
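The second stage, lifting 2D keypoints to 3D by sparse representation, can be sketched as follows. This record does not give the thesis's formulation, so everything specific here is an assumption: an orthographic camera with a known (identity) rotation, a given dictionary of 3D basis poses `basis_3d`, an L1-penalised least-squares objective, and ISTA (iterative soft-thresholding) as the solver. The names `lift_to_3d` and `soft_threshold` are hypothetical.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def lift_to_3d(keypoints_2d, basis_3d, lam=0.01, n_iter=500):
    """Estimate a 3D pose as a sparse combination of basis poses.

    keypoints_2d: (2, J) observed 2D joints.
    basis_3d:     (K, 3, J) dictionary of 3D basis poses (assumed given).
    Minimises 0.5 * ||A c - b||^2 + lam * ||c||_1 with ISTA, where A holds
    the orthographic projections (x, y rows) of the basis poses.
    Returns (pose_3d with shape (3, J), coefficients with shape (K,)).
    """
    K = basis_3d.shape[0]
    # Orthographic projection under identity rotation: drop the depth row.
    A = basis_3d[:, :2, :].reshape(K, -1).T      # (2J, K)
    b = keypoints_2d.reshape(-1)                 # (2J,)
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)     # 1 / Lipschitz constant
    c = np.zeros(K)
    for _ in range(n_iter):
        grad = A.T @ (A @ c - b)                 # gradient of the data term
        c = soft_threshold(c - step * grad, step * lam)
    pose_3d = np.tensordot(c, basis_3d, axes=1)  # weighted sum of basis poses
    return pose_3d, c


# Synthetic check: build a pose from two of five random basis poses,
# project it to 2D, and recover the coefficients.
rng = np.random.default_rng(0)
basis = rng.standard_normal((5, 3, 8))
true_c = np.array([0.0, 1.0, 0.0, 0.0, 0.5])
true_pose = np.tensordot(true_c, basis, axes=1)
pose, coeffs = lift_to_3d(true_pose[:2], basis)
print(np.round(coeffs, 2))
```

In practice the camera rotation is unknown and is usually estimated jointly with the coefficients, and the dictionary is learned from motion-capture data; both are omitted here to keep the sparse-coding step visible.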

Total number of references:

 64

Call number:

 硕085212/20038

Release date:

 2021-06-18
