Thesis information

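Chinese title:	面向特定图像任务的基于局部与非局部变分正则化学习型方法研究
Name:	Meng Junying (孟俊英)
Confidentiality level:	Public
Thesis language:	chi
Subject code:	070102
Discipline:	Computational Mathematics
Student type:	Doctoral
Degree:	Doctor of Science
Degree type:	Academic degree
Degree year:	2024
Campus:	Beijing campus
School:	School of Mathematical Sciences
Research direction:	Image processing and deep learning
First supervisor:	Liu Jun (刘君)
First supervisor's affiliation:	School of Mathematical Sciences
Submission date:	2024-06-13
Defense date:	2024-05-17
English title:	Research on Learning Methods Based on Local and Nonlocal Variational Regularization for Specific Image Tasks
Chinese keywords:	非局部正则化 ; 卷积神经网络 ; 变分模型 ; 图像去噪 ; 图像分割
English keywords:	Nonlocal regularization ; Convolutional neural networks ; Variational models ; Image denoising ; Image segmentation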
Chinese abstract:	图像去噪和图像分割是数字图像处理的基本问题。基于变分法的去噪和分割模型通常根据任务需求设计正则化先验。在去噪问题中，为了使去噪图像具有更清晰的边界，常用的正则化方法包括全变差(Total Variation, TV)正则和非凸非光滑全变差(Nonconvex Nonsmooth Total Variation, NNTV)正则等。通过正则化模型的一致下界性质，研究者们为 NNTV 正则模型生成具有清晰边界的去噪图像提供了理论解释。为了保持具有周期规律的纹理细节，研究者们提出了基于图像非局部自相似性的正则化方法，包括非局部全变差(Nonlocal Total Variation, NLTV)正则和块非局部全变差(Block Nonlocal Total Variation, BNLTV)正则等。尽管这些方法在保留图像的纹理细节方面表现良好，但在数学理论方面，尚未有公认的非局部正则化理论来解释该现象。
在图像分割方面，存在许多经典的变分分割模型。然而，对于一些具有挑战性的数据，例如对比度低、边界不清晰的图像，传统的变分模型往往难以实现精准分割。近年来，数据驱动的卷积神经网络(Convolutional Neural Network, CNN)能够从大量的图像数据中提取特征，在一定条件下能提供出色的分割结果。然而，目前大多 CNN 结构的设计依赖于经验等因素，其一些性质仍不清楚，可解释性相对较弱。此外，网络结构缺乏处理重要空间信息与其他先验的设计，导致在处理具有某些特定先验的图像数据时达不到理想的结果。针对以上这些问题，本文通过建立非局部正则化方法的一致下界理论，说明了该方法的优越性。进一步对图像的高维深度特征进行分析，并针对图像去噪和分割等特定任务，提出了结合空间局部与非局部正则先验的变分 CNN 结构。本文的主要研究内容如下:
被噪声污染严重的图像自相似结构被破坏。为了恢复清晰的自相似图像纹理结构，本文提出了一种基于图像块的非凸非光滑块非局部(Nonconvex Nonsmooth Block Nonlocal, NNBN)全变差正则模型。通过利用正则项中势函数的非凸非光滑性质，证明了基于图像块的非局部梯度具有一致下界性质。这在一定程度上为所提模型能够生成具有更清晰纹理和边界的去噪图像提供了数学理论解释。与一些经典正则化方法如 TV、NNTV、NLTV 和 BNLTV 相比，实验结果验证了所提出的方法在保持图像边界与重复纹理结构方面的优越性。
当前的 CNN 结构缺乏对图像高维深度特征空间自相似性的数学刻画，这使得在某些特定图像任务中，网络的学习难以得到恰当先验的引导。本文采用 Parzen-Rosenblatt 窗口方法对高维深度特征的概率分布进行建模。通过特征对数概率先验的对偶表示，推导出了包含对偶变量非局部权的正则项。基于此非局部权，我们创建了非局部模块(NLM)，其本质是一种非局部自注意机制。与 Transformer 使用内积不同，NLM 使用加权的欧式距离来度量特征间的相似性。NLM 中权的学习相当于寻找合适的黎曼度量来衡量特征在流形上的相似度。这相当于从变分正则化角度解释了非局部自注意机制(Nonlocal self-attention mechanism)。结合上述正则我们建立变分模型，并采用算法展开(Unrolling)方法构建了可学习的非局部自相似性网络，即 LNSNet，用于图像去噪任务。实验结果表明，与相关的图像去噪方法相比，所提出的方法在图像去噪方面表现出有效性。
基于图像尺度空间建立的变分分割模型通常在处理低对比度、边界不清晰等具有挑战性的图像数据集时，尤其是医学图像数据，面临一定的困难。本文利用特征提取器获取图像的高维深度特征。受经典的 Mumford-Shah(MS)分割模型启发，建立了结合阈值动力学(Threshold Dynamics, TD)正则的变分模型，用于图像特征分割。通过整合特征的分割结果，实现对原始图像的分割。该过程形成了一种轻量级的分割网络 MS-MGNet，其具有编码器-解码器的多尺度特征提取结构，并结合了自注意机制。MS-MGNet 结构的底层数学模型已知，所提方法从变分正则化观点解释了当前流行的基于编码器-解码器结构的 CNN。此外，MS-MGNet 结合了来自变分模型的局部空间正则(TD 正则)先验，理论上保证了网络输出光滑的分割边界，对噪声具有一定鲁棒性。与类似的分割方法相比，在选定的低对比度或边界不清晰的数据集上的实验结果显示，所提出的方法能够以更少的参数实现更好的分割性能。
本文针对特定图像任务，通过设计特征的非局部相似度量，建立了非局部正则化方法的一致下界理论。基于该理论将图像的局部与非局部正则先验信息融入到具有变分可解释性的 CNN 结构中。理论和数值实验均验证了所提方法的有效性。
English abstract:	Image denoising and image segmentation are fundamental problems in digital image processing. Variational denoising and segmentation models typically build in regularization priors tailored to the task at hand. In denoising, regularizers commonly used to obtain sharper boundaries include total variation (TV) and nonconvex nonsmooth total variation (NNTV). Using the uniform lower bound property of the regularization model, researchers have given a theoretical explanation of why NNTV regularization produces denoised images with clear boundaries. To preserve texture details with periodic patterns, researchers have proposed regularizers based on the nonlocal self-similarity of images, including nonlocal total variation (NLTV) and block nonlocal total variation (BNLTV). Although these methods preserve texture details well, there is as yet no widely accepted mathematical theory of nonlocal regularization that explains this behavior.
In image segmentation, many classical variational models exist. However, for challenging data such as images with low contrast and unclear boundaries, traditional variational models often struggle to segment accurately. In recent years, data-driven convolutional neural networks (CNNs) have been able to extract features from large amounts of image data and, under certain conditions, deliver excellent segmentation results. The design of most CNN architectures, however, still relies on empirical choices; some of their properties remain unclear, and their interpretability is relatively weak. In addition, the architectures lack components designed to handle important spatial information and other priors, which leads to suboptimal results on image data that carries such priors. To address these issues, this thesis establishes a uniform lower bound theory for nonlocal regularization, which accounts for the advantages of the method. It then analyzes the high-dimensional deep features of images and, for specific tasks such as denoising and segmentation, proposes variational CNN architectures that combine spatially local and nonlocal regularization priors. The main contents of the thesis are as follows:
The self-similar structures of an image are destroyed when the image is heavily corrupted by noise. To restore clear self-similar texture structures, we propose a patch-based Nonconvex Nonsmooth Block Nonlocal (NNBN) total variation regularization model. Exploiting the nonconvex, nonsmooth nature of the potential function in the regularization term, we prove that the patch-based nonlocal gradient has a uniform lower bound property. This gives, to some extent, a mathematical explanation of why the proposed model produces denoised images with clearer textures and boundaries. Experiments confirm that the proposed method preserves image boundaries and repetitive texture structures better than classical regularizers such as TV, NNTV, NLTV, and BNLTV.
Current CNN architectures lack a mathematical characterization of self-similarity in the high-dimensional deep feature space, so that in certain image tasks the learning of the network is not guided by an appropriate prior. We use the Parzen-Rosenblatt window method to model the probability distribution of high-dimensional deep features. From the dual representation of the log-probability prior of the features, we derive a regularization term that contains nonlocal weights as dual variables. Based on these nonlocal weights, we construct a Nonlocal Module (NLM), which is in essence a nonlocal self-attention mechanism. Unlike the Transformer, which uses the inner product, the NLM uses a weighted Euclidean distance to measure the similarity between features. Learning the weights in the NLM amounts to finding a suitable Riemannian metric for measuring feature similarity on the manifold. This explains the nonlocal self-attention mechanism from the perspective of variational regularization. Combining this regularizer with a variational model and applying algorithm unrolling, we construct a learnable nonlocal self-similarity network, LNSNet, for image denoising. Experiments indicate that, compared with related image denoising methods, the proposed method is effective.
Variational segmentation models built on the image scale space often have difficulty with challenging datasets that have low contrast or unclear boundaries, especially medical image data. We use a feature extractor to obtain high-dimensional deep features with more comprehensive semantic information. Inspired by the classical Mumford-Shah (MS) segmentation model, we establish a variational model with Threshold Dynamics (TD) regularization for segmenting image features. The segmentation of the original image is obtained by integrating the segmentation results of the features. This process yields a lightweight segmentation network, MS-MGNet, which has an encoder-decoder multi-scale feature extraction structure and incorporates a self-attention mechanism. The mathematical model underlying MS-MGNet is known, so the proposed method explains the currently popular encoder-decoder CNN architectures from the viewpoint of variational regularization. In addition, MS-MGNet incorporates a local spatial regularization (TD regularization) prior from the variational model, which theoretically guarantees smooth segmentation boundaries in the network output and gives some robustness to noise. Experiments on selected datasets with low contrast or unclear boundaries show that the proposed method achieves better segmentation performance with fewer parameters than similar segmentation methods.
For specific image tasks, this thesis designs a nonlocal similarity measure for features and establishes a uniform lower bound theory for nonlocal regularization methods. Based on this theory, local and nonlocal regularization priors of images are integrated into CNN architectures with variational interpretability. Both theoretical analysis and numerical experiments validate the effectiveness of the proposed methods.
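The abstract contrasts inner-product self-attention with the NLM's weighted Euclidean distance. The NumPy sketch below illustrates only that contrast and is not the thesis's NLM: the function names, the diagonal metric `metric_diag`, the Gaussian (Parzen-Rosenblatt-style) kernel, and the `bandwidth` parameter are all assumptions made for illustration, and in the thesis the weights are learned inside an unrolled network rather than fixed by hand.

```python
import numpy as np

def _row_softmax(scores):
    """Numerically stable softmax over each row."""
    scores = scores - scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

def dot_product_weights(feats):
    """Transformer-style weights: softmax of scaled inner products."""
    n, d = feats.shape
    return _row_softmax(feats @ feats.T / np.sqrt(d))

def weighted_euclidean_weights(feats, metric_diag, bandwidth=1.0):
    """Nonlocal weights from a weighted squared Euclidean distance.

    metric_diag (assumed, hypothetical): positive per-channel weights,
    i.e. a diagonal metric; a learnable quantity in a real network.
    """
    diff = feats[:, None, :] - feats[None, :, :]            # (n, n, d)
    dist2 = np.einsum("ijk,k,ijk->ij", diff, metric_diag, diff)
    # Gaussian kernel followed by row normalization == softmax of -dist2/(2h^2)
    return _row_softmax(-dist2 / (2.0 * bandwidth ** 2))

def nonlocal_aggregate(feats, weights):
    """Replace each feature by the weighted average of all features."""
    return weights @ feats

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))          # 6 feature vectors with 4 channels
metric_diag = np.array([1.0, 0.5, 2.0, 1.0])

w_dist = weighted_euclidean_weights(feats, metric_diag)
w_dot = dot_product_weights(feats)
smoothed = nonlocal_aggregate(feats, w_dist)
```

With a positive metric and distinct features, the distance-based score is largest for a feature paired with itself, whereas inner-product scores also depend on feature norms, so a feature's largest weight need not fall on itself.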
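Total references:	146
Outstanding thesis:	Beijing Normal University Outstanding Doctoral Dissertation
Holding location:	Library thesis reading area (Main Library, South Wing, 3rd floor, Zone BC)
Call number:	博070102/24006
Open-access date:	2025-06-13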