Thesis Information

Chinese Title:

 基于RGB-T融合的物体检测 (Object Detection Based on RGB-T Fusion)

Name:

 孙晓宇 (Sun Xiaoyu)

Confidentiality Level:

 Public

Thesis Language:

 Chinese

Discipline Code:

 080901

Discipline:

 Computer Science and Technology

Student Type:

 Bachelor's

Degree:

 Bachelor of Science

Degree Year:

 2022

University:

 Beijing Normal University

Campus:

 Beijing Campus

School:

 School of Artificial Intelligence

First Supervisor:

 黄华 (Huang Hua)

First Supervisor's Affiliation:

 School of Artificial Intelligence, Beijing Normal University

Submission Date:

 2022-05-27

Defense Date:

 2022-05-09

English Title:

 RGB-T Based Object Detection

Chinese Keywords:

 Object Detection ; RGB-T Fusion ; Feature Pyramid ; Self-Attention Mechanism

English Keywords:

 Object Detection ; RGB-T Fusion ; Feature Pyramid ; Transformer

Chinese Abstract:

As an important research direction in computer vision, object detection has achieved remarkable results in recent years by relying on deep learning algorithms and large-scale training data. Compared with single-modality object detection, RGB-T multi-modal object detection exploits diverse, complementary information and shows clear advantages in real, complex environments. However, current RGB-T object detection still faces two difficulties in modality fusion: first, simple feature concatenation fails to effectively mine complementary features; second, positional misalignment between RGB and thermal images increases the difficulty of fusion. To address these two difficulties, this thesis proposes a feature pyramid model based on multi-modal self-attention, which enhances the feature representation for object detection with intra-modality semantic information and inter-modality complementary information. The model consists of two sub-modules: an intra-modality feature pyramid module and an inter-modality feature pyramid module. The intra-modality feature pyramid module strengthens each modality's own features through cross-spatial and cross-scale self-attention; the inter-modality feature pyramid module learns complementary features between modalities through cross-modal self-attention, and because the dependencies it learns are independent of distance, it is insensitive to positional misalignment between the images. Experimental results show that the proposed model clearly outperforms single-modality object detection models and achieves performance comparable to the best current RGB-T models.
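To make the intra-modality mechanism concrete, below is a minimal PyTorch sketch of cross-scale self-attention within one modality's feature pyramid. The thesis's actual code is not reproduced here; the module name `CrossScaleAttention`, the tensor layout, and all dimensions are illustrative assumptions. Tokens from a fine pyramid level query a coarser level of the same modality, letting high-resolution positions borrow semantic context from deeper features; a cross-spatial variant would simply use the same level for queries and keys/values.

```python
# Illustrative sketch only -- NOT the thesis's actual code.
# Cross-scale self-attention inside one modality's feature pyramid:
# fine-level tokens (queries) attend over coarse-level tokens
# (keys/values) of the same modality to absorb semantic context.
import torch
import torch.nn as nn

class CrossScaleAttention(nn.Module):
    """Hypothetical module; dim/heads are assumed, not from the thesis."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    @staticmethod
    def tokens(x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) feature map -> (B, H*W, C) token sequence
        return x.flatten(2).transpose(1, 2)

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        b, c, h, w = fine.shape
        q = self.tokens(fine)             # queries: fine pyramid level
        kv = self.tokens(coarse)          # keys/values: coarse level
        out, _ = self.attn(q, kv, kv)     # every position sees the whole coarse map
        out = self.norm(out + q)          # residual + norm, transformer-style
        return out.transpose(1, 2).reshape(b, c, h, w)
```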

English Abstract:

As an important research direction in the field of computer vision, object detection has made remarkable progress by relying on deep learning algorithms and large-scale training data. Compared with single-modal object detection, RGB-T multi-modal object detection utilizes diverse and complementary information, showing clear advantages in real, complex environments. However, RGB-T object detection still faces two difficulties in modal fusion. First, concatenating features cannot effectively learn complementary features. Second, misaligned RGB-T images increase the difficulty of fusion. To address these two difficulties, a multi-modal feature pyramid transformer is proposed, which learns semantic and cross-modal complementary information to enhance the feature representation for object detection. It mainly includes two modules: an intra-modal feature pyramid transformer and an inter-modal feature pyramid transformer. The intra-modal feature pyramid transformer enhances each modality's own features through cross-spatial and cross-scale self-attention. The inter-modal feature pyramid transformer learns complementary features between modalities through cross-modal self-attention; the dependencies it learns are distance-independent and therefore insensitive to image misalignment. Experimental results show that the proposed method outperforms single-modal methods and is competitive with the state of the art.
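The claimed robustness to RGB-T misalignment follows from attention being a global, content-based lookup rather than a position-locked operation. The sketch below is again my own assumption-laden illustration, not the thesis's code: the hypothetical `CrossModalAttention`, the 256-d features, and the single RGB-to-thermal fusion direction are all assumed. RGB queries attend over thermal keys/values, so a thermal response can be matched even when it is spatially offset from its RGB counterpart.

```python
# Illustrative sketch only -- NOT the thesis's actual code.
# Cross-modal self-attention: RGB queries attend over thermal
# keys/values. The match is content-based and global, so a thermal
# cue can be fused even when spatially offset from its RGB location.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Hypothetical module; a symmetric thermal->RGB branch is omitted."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        b, c, h, w = rgb.shape
        q = rgb.flatten(2).transpose(1, 2)       # (B, H*W, C) RGB queries
        kv = thermal.flatten(2).transpose(1, 2)  # (B, H*W, C) thermal keys/values
        fused, _ = self.attn(q, kv, kv)          # distance-independent lookup
        return self.norm(fused + q).transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    layer = CrossModalAttention()
    rgb, thermal = torch.randn(2, 256, 20, 20), torch.randn(2, 256, 20, 20)
    print(layer(rgb, thermal).shape)  # torch.Size([2, 256, 20, 20])
```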

Total References:

 33

Outstanding Thesis Award:

 Beijing Normal University Outstanding Undergraduate Thesis

Total Figures:

 9

Total Tables:

 5

Call Number:

 本080901/22001

Open Access Date:

 2023-05-27    
