Thesis information

Chinese title:

 基于图注意力网络的特征匹配研究与实现    

Name:

 Wu Minkai (吴敏恺)

Confidentiality level:

 Public

Thesis language:

 Chinese

Discipline code:

 085212    

Discipline:

 Software Engineering

Student type:

 Master's

Degree:

 Master of Engineering

Degree type:

 Professional degree

Degree year:

 2021    

Campus:

 Beijing campus

School:

 School of Artificial Intelligence

Research area:

 Computer vision

First supervisor:

 Hu Xiaoyan (胡晓雁)

First supervisor's affiliation:

 School of Artificial Intelligence, Beijing Normal University

Submission date:

 2021-06-25    

Defense date:

 2021-06-05    

English title:

 Research and Implementation of Feature Matching Based on Graph Attention Network    

Chinese keywords:

 Graph attention network ; Computer vision ; Epipolar geometry ; Feature matching

English keywords:

 Graph attention network ; Computer vision ; Epipolar geometry ; Feature matching    

Chinese abstract:
 

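Robust and accurate feature matching is a fundamental task in computer vision. As the field advances, researchers place ever higher demands on feature matching algorithms, and traditional hand-crafted feature extraction algorithms can no longer meet the needs of many vision tasks. Recent deep-learning-based feature detection and matching algorithms have shown strong robustness under changing environments. However, the task still faces two problems: ground-truth match labels are hard to obtain, and operations such as downsampling and pooling make the extracted keypoint locations imprecise.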

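To address the difficulty of obtaining ground-truth correspondences between feature points, this thesis proposes an end-to-end network architecture for feature point detection and matching. The architecture relies only on camera poses as supervision and no longer needs ground-truth match labels. The network first locates keypoints with a convolutional neural network and extracts low-level geometric information and high-level semantic information from the feature maps. Positional encoding and an attention mechanism then establish relationships among feature points within each image and across images, and a graph neural network passes messages to update the feature point representations. A differentiable matching algorithm produces the matches and estimates their weights. Finally, a weighted algorithm computes the fundamental matrix between the two views, which is decomposed to recover the camera pose. Because only weak supervision from camera poses is required, the whole architecture can be trained on larger datasets, which strengthens the generalization ability of the network; it achieves good results on existing datasets.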
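The attention-based message passing described above can be illustrated with a small NumPy sketch. The abstract does not give the layer design, so everything here is an assumption made for illustration: single-head dot-product attention, no learned projections, a hypothetical linear positional embedding `proj`, and the function names `add_position`, `attention_message`, and `update_descriptors`.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def add_position(desc, kpts, proj):
    """Hypothetical positional encoding: add a linear embedding of the
    (x, y) keypoint coordinates to each descriptor.
    desc: (N, D), kpts: (N, 2), proj: (2, D)."""
    return desc + kpts @ proj

def attention_message(queries, sources):
    """Single-head dot-product attention (an illustrative assumption).
    Each query keypoint receives a weighted average of the source
    descriptors. Returns the messages (N, D) and the weights (N, M)."""
    d = queries.shape[1]
    scores = queries @ sources.T / np.sqrt(d)
    weights = softmax(scores, axis=1)
    return weights @ sources, weights

def update_descriptors(desc_a, desc_b):
    """One round of self-attention (within each image) followed by one
    round of cross-attention (across images), with residual updates."""
    self_a, _ = attention_message(desc_a, desc_a)
    self_b, _ = attention_message(desc_b, desc_b)
    desc_a = desc_a + self_a
    desc_b = desc_b + self_b
    cross_a, _ = attention_message(desc_a, desc_b)
    cross_b, _ = attention_message(desc_b, desc_a)
    return desc_a + cross_a, desc_b + cross_b
```

A trained network would add learned query, key, and value projections and stack several such rounds; the sketch only shows how information flows within and across the two keypoint sets.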

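To address the imprecise locations of feature points extracted by convolutional neural networks, this thesis hypothesizes that the accurate keypoint lies, with high probability, near the matched point. Based on this hypothesis, a keypoint refinement model built on a graph attention network is designed. The model alleviates the keypoint localization error to some extent; experiments demonstrate its effectiveness, and it can be easily integrated into existing feature extraction network frameworks.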

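This thesis combines traditional multi-view geometry in computer vision with deep-learning-based neural networks. It exploits both the explicit mathematical formulation of multi-view geometry and the learning and generalization ability of data-driven neural networks, and it has practical and engineering value.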

English abstract:
 

Establishing robust and accurate feature matches is a fundamental task in computer vision. As the field has advanced, researchers have placed higher demands on feature matching algorithms, and traditional hand-crafted feature extraction algorithms can no longer meet the needs of many vision tasks. Recently, deep-learning-based feature detection and matching algorithms have shown strong robustness under environmental changes. However, these algorithms face two problems: ground-truth correspondences are difficult to obtain, and operations such as downsampling and pooling make the keypoint locations produced by learning-based detectors imprecise.

To address the difficulty of obtaining ground-truth correspondences between feature points, this thesis proposes an end-to-end network architecture for feature detection and matching. The architecture relies only on the camera pose as supervision and no longer needs ground-truth correspondences between matching points. The network first locates keypoints with a convolutional neural network and extracts their low-level geometric information and high-level semantic information from the feature map. Positional encoding and an attention mechanism then establish relationships among keypoints within each image and across images. A graph neural network passes messages to update the representation of the feature points, and a differentiable matching algorithm produces the matches and estimates their weights. A weighted algorithm then computes the fundamental matrix between the two views, and the camera pose is obtained by matrix decomposition. Because only weak supervision from the camera pose is required, the entire architecture can be trained on larger datasets, which improves the generalization ability of the network; it achieves good results on existing datasets.
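The weighted fundamental-matrix step can be sketched as follows. The abstract does not name the solver, so this example assumes a weighted, normalized eight-point algorithm in plain NumPy. The function names `normalize_points` and `weighted_eight_point` are hypothetical, and the sketch is not differentiable training code.

```python
import numpy as np

def normalize_points(pts):
    """Hartley normalization: shift to zero mean and scale so the mean
    distance from the origin is sqrt(2). Returns homogeneous points
    (N, 3) and the 3x3 normalizing transform."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T

def weighted_eight_point(pts1, pts2, weights):
    """Estimate F such that x2^T F x1 = 0 for matches (pts1[i], pts2[i]).
    pts1, pts2: (N, 2) pixel coordinates with N >= 8; weights: (N,)
    non-negative match confidences (assumed to come from the matcher)."""
    x1, T1 = normalize_points(pts1)
    x2, T2 = normalize_points(pts2)
    # Each row is the epipolar constraint, linear in the 9 entries of F.
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1)),
    ])
    # Scale each constraint by its weight, then take the least-squares
    # null vector (right singular vector of the smallest singular value).
    A = A * weights[:, None]
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt
    # Undo the normalization and fix the arbitrary scale.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)
```

Scaling the rows by `weights` minimizes the sum of squared weighted residuals, so a match with weight zero has no influence on the estimate. Recovering the relative pose additionally requires the camera intrinsics, which turn the fundamental matrix into an essential matrix before decomposition.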

To address the imprecise locations of feature points extracted by convolutional neural networks, this thesis hypothesizes that the accurate keypoint is likely to lie near the matched point. Based on this hypothesis, a keypoint refinement model built on a graph attention network is designed. The model alleviates the keypoint localization error to a certain extent. Experiments demonstrate the effectiveness of the refinement network, and it can be easily integrated into existing feature extraction network frameworks.

This thesis combines traditional multi-view geometry in computer vision with deep-learning-based neural networks. It exploits both the explicit mathematical formulation of multi-view geometry and the learning and generalization ability of data-driven neural networks, and it has practical and engineering value.

 

Total number of references:

 72    

Call number:

 硕085212/21012    

Release date:

 2022-06-25    
