Thesis information

Chinese title:

 基于稀疏检测的单目多人姿态估计    

Name:

 魏国利    

Confidentiality level:

 Public

Thesis language:

 chi    

Discipline code:

 081203    

Discipline:

 Computer Application Technology

Student type:

 Master's

Degree:

 Master of Engineering

Degree type:

 Academic degree

Degree year:

 2024    

Campus:

 Beijing campus

School:

 School of Artificial Intelligence

Research direction:

 Virtual reality and augmented reality

First supervisor's name:

 胡晓雁    

First supervisor's affiliation:

 School of Artificial Intelligence

Submission date:

 2024-06-14    

Defense date:

 2024-05-26    

Foreign-language title:

 MONOCULAR HUMAN POSE ESTIMATION BASED ON SPARSE DETECTION    

Chinese keywords:

 人体姿态估计 ; 稀疏检测 ; 数据集 ; 单视角 ; 单阶段    

Foreign-language keywords:

 Human pose estimation ; Sparse detection ; Data set ; Single view ; Single stage    

Chinese abstract:

人体姿态估计的目标是从输入的多媒体数据中定位估计人体关键点,作为当前计算机视觉领域的热门研究方向具有广泛的应用场景。随着深度学习技术的快速发展,相关算法的预测精度及鲁棒性等获得显著提升。但人体姿态估计算法,特别是带深度预测的人体姿态估计在实际应用中仍然存在一定的问题及挑战,其一,是当前多人姿态估计任务中多依赖多阶段的框架以及密集检测或密集特征交互,存在冗余计算与复杂的后处理等问题;其二,图像中广泛存在的遮挡与关节缺失等图像信息缺失问题增加了姿态估计的难度;最后,当前带有完整三维标注信息的姿态估计数据集多为实验室环境下采集,室外真实数据集仍相对匮乏。

本文将稀疏检测的思路应用于人体姿态估计之中,提出了一种基于稀疏检测的单阶段单目多人姿态估计方法,模型结构简单、检测效率高且具有良好的伸缩性,该方法仅将稀疏的建议特征与图像局部特征进行交互即可从图像中直接获取人体姿态,能够满足多种应用场景的不同需求。模型使用了稀疏检测的方法与关节坐标回归的方式进行预测,极大简化了模型结构与检测流程提高了模型检测效率,所具有的级联结构以及可调整数量的建议值使得模型具有良好的伸缩性,能够通过调整相关配置平衡模型的预测精度与预测效率以应对不同的场景需求。同时使用包含从数据集中得到具有丰富先验知识的建议值,能够在面对强遮挡的情况下给出合理的预测结果从而在一定程度上解决遮挡问题。在COCO数据集及CMU数据集上的相关实验结果表示,本文所提出人体姿态检测方法能够高效的从单目图像中获取人的姿态信息,具有良好的伸缩性能够面对多种应用场景。

针对带标注室外真实数据集匮乏的情况,制作了包含多种自然场景以及多种运动主题的室外多人交互数据集,并进行包括三维关节位置在内的各类信息标注能够满足多种任务需求,同时以真实数据集为基础构建了可以提供不同时刻下任意视角、任意遮挡率的渲染数据集,弥补了单一真实数据集存在的视角泛化性等不足,提升了数据集的泛用性。

Foreign-language abstract:

Human pose estimation aims to locate human body keypoints in input multimedia data. It is a popular research direction in computer vision with a wide range of applications. With the rapid development of deep learning, the prediction accuracy and robustness of pose estimation algorithms have improved significantly. However, human pose estimation, especially pose estimation with depth prediction, still faces problems and challenges in practical applications. First, current multi-person pose estimation methods mostly rely on multi-stage frameworks and on dense detection or dense feature interaction, which leads to redundant computation and complex post-processing. Second, missing image information, such as occlusion and missing joints, is widespread in images and makes pose estimation more difficult. Finally, current pose estimation datasets with complete 3D annotations are mostly collected in laboratory environments, and real outdoor datasets remain relatively scarce.

This thesis applies the idea of sparse detection to human pose estimation and proposes a single-stage monocular multi-person pose estimation method based on sparse detection. The model has a simple structure, high detection efficiency, and good scalability. The method obtains human poses directly from an image simply by letting a sparse set of proposal features interact with local image features, and it can meet the differing requirements of various application scenarios. The model predicts poses through sparse detection and joint coordinate regression, which greatly simplifies the model structure and detection pipeline and improves detection efficiency. Its cascade structure and adjustable number of proposals give the model good scalability: the balance between prediction accuracy and prediction efficiency can be tuned through the relevant configuration to suit different scenarios. In addition, the proposals carry rich prior knowledge obtained from the dataset, so the model can give reasonable predictions under strong occlusion, which alleviates the occlusion problem to some extent. Experimental results on the COCO dataset and the CMU dataset show that the proposed pose estimation method efficiently obtains human pose information from monocular images and scales well to a variety of application scenarios.

To address the scarcity of annotated real outdoor datasets, an outdoor multi-person interaction dataset covering a variety of natural scenes and sports themes was built and annotated with various kinds of information, including 3D joint positions, to meet the needs of multiple tasks. In addition, a rendered dataset built on the real dataset can provide arbitrary viewpoints and arbitrary occlusion rates at different moments. This compensates for shortcomings of a single real dataset, such as limited viewpoint generalization, and improves the generality of the dataset.

Total number of references:

 112    

Collection number:

 硕081203/24007    

Open-access date:

 2025-06-15    
