
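Chinese title:	基于注意力机制和高斯分布表示的遥感图像旋转框检测研究
Author:	Gao Xinglin (高兴林)
Confidentiality level:	Public
Thesis language:	Chinese (chi)
Discipline code:	081203
Major:	Computer Application Technology
Student type:	Master's student
Degree:	Master of Engineering
Degree type:	Academic degree
Degree year:	2024
Campus:	Beijing campus
School:	School of Artificial Intelligence
Research area:	Computer vision
First supervisor:	Yu Xianchuan (余先川)
First supervisor's affiliation:	School of Artificial Intelligence
Submission date:	2024-06-22
Defense date:	2024-06-01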
English title:	Remote Sensing Image Rotated Bounding Box Detection Based on Attention Mechanism and Gaussian Distribution Representation
Keywords:	Remote sensing oriented box detection; Attention mechanism; Multi-scale feature fusion; Gaussian distribution representation; Bhattacharyya distance
Abstract:	In recent years, high-spatial-resolution remote sensing images have become much easier to acquire. Using deep learning to analyze large volumes of such imagery rapidly is now widely accepted. Among remote sensing image analysis tasks, oriented (rotated) bounding box detection has attracted attention for its practical value in applications such as drone aerial photography and traffic monitoring. However, remote sensing targets vary greatly in scale, orientation, and aspect ratio. As a result, directly transferring conventional horizontal-box detectors to this task runs into severe performance bottlenecks. Strategies such as rotated RoIs have brought some improvement. Even so, existing methods still fall short in complex scenes, for example when target scales differ widely, when targets are densely packed, or when targets are small. In addition, most models train the localization subtask with regression-based loss functions. These losses are not aligned with the IoU-based evaluation metric. The way the rotated-box representation is defined also causes boundary discontinuity, which in turn hurts center-point accuracy and adaptability to targets that share an aspect ratio but differ in scale. This study addresses two problems: insufficient detection performance in complex scenes and large center-point deviations. It introduces attention mechanisms into the model structure and a Gaussian distribution representation into the loss function. The main contributions are as follows.
(1) To improve detection in the complex scenes described above, this study proposes DABiFPN, a module that combines several optimization strategies: BiFPN, a spatial-channel attention mechanism, and deformable convolution. DABiFPN improves multi-scale feature fusion so that high-level features carry rich spatial and semantic information. It uses attention along both the spatial and channel dimensions to obtain efficient target representations. Deformable convolution captures local deformations and fine details of target objects, making the features adaptive to targets of different scales.
(2) To reduce center-point deviation and improve scale adaptability, this study proposes a new loss function, the GBD loss. It models each rotated box as a 2-D Gaussian distribution and measures the discrepancy between two such distributions with the Bhattacharyya distance. Compared with other loss functions, the GBD loss is highly consistent with the IoU metric. It also offers better center-point optimization, scale invariance, and geometric interpretability, which effectively resolves boundary discontinuity and poor adaptability.
(3) The proposed methods were evaluated on two public remote sensing oriented-box detection benchmarks, DOTA-v1.0 and HRSC2016. The baseline is the Rotated RetinaNet architecture with a Swin Transformer backbone. With the DABiFPN module, the model reaches an mAP of 76.77% on DOTA-v1.0 and 90.10% on HRSC2016. These are improvements of 8.37 and 4.60 percentage points over the baseline, outperforming many one-stage and two-stage detectors. Adding only the GBD loss to the baseline yields 73.48% and 88.50% on the two datasets. Combining the DABiFPN module with the GBD loss further raises accuracy to 77.43% and 90.70%.
The experimental results show two effects of the DABiFPN module and the GBD loss. First, they substantially improve detection accuracy in complex scenes. Second, they mitigate both large deviations in predicted center points and poor adaptability to targets with the same aspect ratio but different scales.
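The BiFPN component of DABiFPN fuses feature maps from different pyramid levels with learnable weights. The sketch below shows the "fast normalized fusion" rule that BiFPN introduced in EfficientDet: O = sum_i w_i * I_i / (eps + sum_j w_j), where a ReLU keeps each w_i non-negative. It is a minimal illustration under several assumptions:

- the inputs have already been resampled to a common resolution and flattened to lists;
- the function name `fast_normalized_fusion` is hypothetical;
- the attention and deformable-convolution parts of DABiFPN are not shown, because the abstract does not specify them.

```python
def fast_normalized_fusion(features, raw_weights, eps=1e-4):
    """Weighted fusion of same-size feature maps in the BiFPN style.

    Computes O = sum_i w_i * I_i / (eps + sum_j w_j) with w_i = max(0, raw_weights[i]).
    features: feature maps flattened to equal-length lists of floats
              (assumed already resampled to a common resolution).
    raw_weights: one learnable scalar per input map.
    """
    if not features or len(features) != len(raw_weights):
        raise ValueError("need one weight per input feature map")
    size = len(features[0])
    if any(len(f) != size for f in features):
        raise ValueError("feature maps must have the same size")
    w = [max(0.0, x) for x in raw_weights]  # ReLU keeps each weight non-negative
    norm = eps + sum(w)                     # eps avoids division by zero
    return [sum(wi * f[k] for wi, f in zip(w, features)) / norm for k in range(size)]


if __name__ == "__main__":
    lateral = [1.0, 1.0, 1.0]    # hypothetical lateral input at one pyramid level
    top_down = [0.5, 1.0, 1.5]   # hypothetical upsampled top-down feature, same level
    print(fast_normalized_fusion([lateral, top_down], [1.0, 0.5]))
```

Normalizing by the weight sum instead of applying a softmax avoids exponentials. Each output is still a weighted average of the inputs, with the weights summing to slightly less than 1 because of eps.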
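The idea behind the GBD loss can be illustrated with a short sketch. It rests on the following assumptions:

- **Box-to-Gaussian mapping.** The sketch uses the mapping common to Gaussian-based rotated-box losses: mean = box center, Sigma = R(theta) diag(w^2/4, h^2/4) R(theta)^T.
- **Distance.** It uses the standard Bhattacharyya distance between two Gaussians, D_B = (1/8) dmu^T Sigma^-1 dmu + (1/2) ln(det Sigma / sqrt(det Sigma1 * det Sigma2)), where Sigma = (Sigma1 + Sigma2) / 2.
- **Distance-to-loss mapping.** The abstract does not give the thesis's mapping. `gbd_loss` borrows the 1 - 1/(tau + ln(1 + D)) normalization from earlier Gaussian losses such as GWD/KLD, purely as an assumption.
- **Names.** All function names are hypothetical.

```python
import math


def rbox_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box (center, width, height, angle in radians) to a 2-D Gaussian.

    Assumed convention: mean = (cx, cy), Sigma = R diag(w^2/4, h^2/4) R^T.
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = w * w / 4.0, h * h / 4.0
    sxx = a * c * c + b * s * s
    syy = a * s * s + b * c * c
    sxy = (a - b) * c * s
    return (cx, cy), ((sxx, sxy), (sxy, syy))


def _det2(m):
    """Determinant of a 2x2 matrix given as nested tuples or lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two 2-D Gaussians given as (mean, covariance)."""
    (m1, s1), (m2, s2) = g1, g2
    # Average covariance Sigma = (Sigma1 + Sigma2) / 2.
    s = [[(s1[i][j] + s2[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    det_s = _det2(s)
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # Mahalanobis term dmu^T Sigma^-1 dmu via the closed-form 2x2 inverse.
    maha = (s[1][1] * dx * dx - 2.0 * s[0][1] * dx * dy + s[0][0] * dy * dy) / det_s
    # Shape term 0.5 * ln(det Sigma / sqrt(det Sigma1 * det Sigma2)).
    shape = 0.5 * math.log(det_s / math.sqrt(_det2(s1) * _det2(s2)))
    return maha / 8.0 + shape


def gbd_loss(pred, target, tau=1.0):
    """Hypothetical GBD-style loss; with the default tau=1 it lies in [0, 1)."""
    d = bhattacharyya(rbox_to_gaussian(*pred), rbox_to_gaussian(*target))
    # Clamp tiny negative values caused by floating-point rounding.
    return 1.0 - 1.0 / (tau + math.log1p(max(d, 0.0)))


if __name__ == "__main__":
    box = (10.0, 10.0, 20.0, 10.0, 0.3)
    # Same box described with swapped sides and angle + pi/2 maps to the same Gaussian.
    swapped = (10.0, 10.0, 10.0, 20.0, 0.3 + math.pi / 2)
    print(gbd_loss(box, swapped))  # ~0 up to floating-point error
    # A 4-pixel center offset gives a positive loss.
    print(gbd_loss(box, (14.0, 10.0, 20.0, 10.0, 0.3)))
```

The demo touches two properties the abstract claims. First, a box and its (h, w, theta + pi/2) relabeling yield the same Gaussian, so the distance between them is zero and there is no jump at the angle boundary. Second, because both the mean offset and the covariances scale together, enlarging both boxes by the same factor leaves D_B unchanged, which illustrates the claimed scale invariance.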
Number of references:	85
Call number:	硕081203/24011
Open-access date:	2025-06-22
