Thesis Information

Chinese Title:

 基于树状图自注意力机制的图像语义分割研究 (Research on Image Semantic Segmentation Based on a Tree-Structured Self-Attention Mechanism)

Name:

 李万琪 (Li Wanqi)

Confidentiality Level:

 Public

Thesis Language:

 Chinese (chi)

Discipline Code:

 070104

Discipline:

 Applied Mathematics

Student Type:

 Master's

Degree:

 Master of Science

Degree Type:

 Academic Degree

Degree Year:

 2024

Campus:

 Beijing Campus

School:

 School of Mathematical Sciences

Research Direction:

 Fuzzy Mathematics and Artificial Intelligence

Primary Supervisor:

 王加银 (Wang Jiayin)

Primary Supervisor's Affiliation:

 School of Mathematical Sciences

Submission Date:

 2024-06-17

Defense Date:

 2024-05-18

Foreign Title:

 Image Semantic Segmentation Based on Tree-based Self-attention Mechanism

Chinese Keywords:

 Image semantic segmentation; Minimum spanning tree; Self-attention mechanism; Point supervision; Scribble supervision

Foreign Keywords:

 Image semantic segmentation; Minimum spanning tree; Self-attention; Point supervision; Scribble supervision

Chinese Abstract:

Image semantic segmentation is one of the important tasks in computer vision. Introducing an attention mechanism into semantic segmentation helps a network focus on the task-relevant regions of an image, but most existing attention mechanisms attend only to local regions and ignore the spatial structural information between pixels. This structural information plays a crucial role in semantic segmentation, as it helps the network understand the relationships between pixels.

To address this problem, this thesis proposes an image semantic segmentation method based on a tree-structured self-attention mechanism and obtains the following main results:

First, a tree-structured self-attention mechanism is proposed that exploits the structural properties of minimum spanning trees to strengthen the self-attention mechanism's ability to capture spatial structural information. The method constructs an undirected graph on the feature map, uses the differences between feature values as edge weights to build a minimum spanning tree, and thereby establishes long-range dependencies within the feature map. At the same time, self-attention is combined with this structure to learn global dependencies between pixels and dynamically adjust the information passed between nodes. Experimental results show that the method achieves excellent performance and higher segmentation accuracy on image semantic segmentation tasks, and ablation experiments verify the soundness of the network design.

Second, a semantic segmentation method under sparse annotation is proposed that generates pseudo-labels from a tree structure, using the tree to assign reasonable pseudo-labels to unlabeled pixels. Minimum spanning trees are built on low-level and high-level feature maps to model low-level and high-level correlations, establishing associations between labeled and unlabeled pixels; the network's predictions are then filtered through the low-level and high-level correlations to generate reliable pseudo-labels for the unlabeled pixels, which strengthen supervision under sparse annotation. Experimental results show improved prediction accuracy under both point supervision and scribble supervision, and a series of ablation experiments verifies the soundness of the design.

In summary, the proposed tree-structured self-attention segmentation method effectively captures image structural information and improves semantic segmentation performance. Under sparse annotation, it makes full use of unlabeled pixels, enabling more efficient network training.

Foreign Abstract:

Image semantic segmentation is one of the important tasks in the field of computer vision. Introducing an attention mechanism into semantic segmentation helps a network focus on the regions of an image relevant to the task. However, most existing attention mechanisms attend only to information in local regions, neglecting the spatial structural information between pixels. This structural information plays a crucial role in semantic segmentation tasks, as it helps the network better understand the relationships between pixels.

In response to the above issues, this paper proposes an image semantic segmentation method based on a tree-structured self-attention mechanism and achieves the following main results:

Firstly, this paper proposes a tree-structured self-attention mechanism that exploits the structural properties of minimum spanning trees to strengthen the self-attention mechanism's capture of spatial structural information. The method constructs an undirected graph on the feature map and uses the differences between feature values as edge weights to build a minimum spanning tree, establishing long-range dependencies within the feature map. In addition, it combines self-attention to learn global dependencies between pixels and dynamically adjust the information propagated between nodes. Experimental results demonstrate that the method achieves outstanding performance and higher segmentation accuracy on image semantic segmentation tasks, and ablation studies confirm the soundness of the network design.
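As a much-simplified illustration of the graph construction described above, a minimum spanning tree can be built on a toy feature map using absolute feature differences as edge weights. The array values, the 4-connectivity choice, and the use of SciPy are assumptions for this sketch, not details taken from the thesis:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

# Hypothetical single-channel 3x3 feature map
feat = np.array([[0.1, 0.2, 0.9],
                 [0.3, 0.8, 0.7],
                 [0.2, 0.6, 0.5]])
H, W = feat.shape
n = H * W  # one graph node per pixel

# Build a 4-connected grid graph; edge weight = |feature difference|
rows, cols, wts = [], [], []
for i in range(H):
    for j in range(W):
        u = i * W + j
        if j + 1 < W:  # right neighbor
            rows.append(u); cols.append(u + 1)
            wts.append(abs(feat[i, j] - feat[i, j + 1]))
        if i + 1 < H:  # down neighbor
            rows.append(u); cols.append(u + W)
            wts.append(abs(feat[i, j] - feat[i + 1, j]))

graph = csr_matrix((wts, (rows, cols)), shape=(n, n))
mst = minimum_spanning_tree(graph)  # keeps the n-1 cheapest acyclic edges
print(mst.nnz)  # → 8  (a spanning tree on 9 pixels has 8 edges)
```

Walking such a tree gives every pixel a path to every other pixel whose edges cross only small feature jumps, which is the long-range dependency structure the abstract refers to.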

Secondly, to address semantic segmentation under sparse annotations, this paper proposes a tree-based pseudo-label generation method that uses the tree structure to assign reasonable pseudo-labels to unlabeled pixels. Minimum spanning trees are built on low-level and high-level feature maps to model low-level and high-level correlations, establishing associations between labeled and unlabeled pixels. The network's predictions are then filtered through these low-level and high-level correlations to generate reliable pseudo-labels for the unlabeled pixels, strengthening supervision under sparse annotations. Experimental results show improved prediction accuracy under both point supervision and scribble supervision, and a series of ablation studies validates the soundness of the method's design.
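A much-simplified sketch of propagating sparse point labels along such a tree is shown below. This nearest-seed rule is a toy stand-in for the low- and high-level correlation filtering, not the thesis's actual method; the feature values and seed positions are likewise assumptions:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, dijkstra

# Hypothetical single-channel 3x3 feature map
feat = np.array([[0.1, 0.2, 0.9],
                 [0.3, 0.8, 0.7],
                 [0.2, 0.6, 0.5]])
H, W = feat.shape
n = H * W

# 4-connected grid graph; edge weight = |feature difference|
rows, cols, wts = [], [], []
for i in range(H):
    for j in range(W):
        u = i * W + j
        if j + 1 < W:
            rows.append(u); cols.append(u + 1)
            wts.append(abs(feat[i, j] - feat[i, j + 1]))
        if i + 1 < H:
            rows.append(u); cols.append(u + W)
            wts.append(abs(feat[i, j] - feat[i + 1, j]))

mst = minimum_spanning_tree(csr_matrix((wts, (rows, cols)), shape=(n, n)))

# Sparse point annotation: node index -> class id
seeds = {0: 0, 8: 1}  # top-left pixel is class 0, bottom-right is class 1

# Distance along the tree from each seed to every pixel
dist = dijkstra(mst, directed=False, indices=list(seeds))

# Each pixel takes the class of the nearest seed along the tree
pseudo = np.array(list(seeds.values()))[np.argmin(dist, axis=0)]
print(pseudo.reshape(H, W))
# → [[0 0 1]
#    [0 1 1]
#    [0 1 1]]
```

Because tree distances accumulate feature differences, the low-valued region inherits class 0 and the high-valued region class 1, illustrating how a tree can spread a handful of labeled points to every pixel.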

In conclusion, the proposed image semantic segmentation method based on a tree-structured self-attention mechanism effectively captures image structural information and improves segmentation performance. By making full use of unlabeled pixels under sparse annotations, it enables more efficient network training.

Total References:

 52

Call Number:

 硕070104/24001

Open Access Date:

 2025-06-17

