Chinese title: | 基于深度神经网络的高维稀疏建模及其应用 (Deep Neural Network Based High-dimensional Sparse Modeling and Its Applications) |
Name: | |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 0714Z2 |
Discipline: | |
Student type: | Doctoral |
Degree: | Doctor of Science |
Degree type: | |
Degree year: | 2023 |
Campus: | |
School: | |
Research direction: | Applied Statistics |
First supervisor: | |
First supervisor affiliation: | |
Submission date: | 2023-06-26 |
Defense date: | 2023-05-24 |
English title: | Deep Neural Network based High-dimensional Sparse Modeling and Its Applications |
Chinese keywords: | |
English keywords: | High-dimensional sparse modeling ; deep neural network ; nonlinear group variable selection ; nonlinear dimensionality reduction ; anomaly detection ; sparse convolution ; sparse visual transformer |
Chinese abstract: |
The core aim of high-dimensional sparse modeling is to extract low-dimensional information of interest from high-dimensional data, thereby greatly reducing the number of parameters to be estimated. Depending on whether a response is present, high-dimensional sparse modeling can be divided into supervised and unsupervised types. Typical supervised modeling includes classification and regression problems, whereas unsupervised modeling aims to uncover the intrinsic structure of the data without label information. Chapter 2 studies the group variable selection problem in supervised modeling of high-dimensional sparse data. Chapter 4 studies high-dimensional sparse modeling based on convolutional neural networks. |
English abstract: |
The core purpose of high-dimensional sparse modeling is to mine low-dimensional information of interest from high-dimensional data, which can significantly reduce the number of parameters to be estimated. Depending on the presence or absence of a response, high-dimensional sparse modeling can be classified into supervised and unsupervised types. Typical supervised modeling includes classification and regression tasks, while unsupervised modeling aims to mine intrinsic patterns in the data without label information. Fruitful results have been achieved in high-dimensional sparse modeling based on traditional statistical theory. However, most existing methods are based on linear models, which are not applicable to the nonlinear situations prevalent in the real world. Although marginal methods can model nonlinear systems, they ignore the correlations between variables. In recent years, Deep Neural Networks (DNNs) have become a popular tool for statistical analysis because of their powerful function approximation and feature representation capabilities. Furthermore, with the development of data science, some complex high-dimensional scenarios, such as image and video analysis, can no longer be handled satisfactorily by simple models. Considering more complex DNN models has therefore become a popular trend in this field. This thesis focuses on high-dimensional sparse modeling based on DNNs and their complex variants.

Chapter 2 focuses on the group variable selection problem in supervised modeling of high-dimensional sparse data. Group variable selection is a critical issue in high-dimensional sparse modeling, and most existing methods consider only linear models. To address this issue, we propose a Group Sparse Neural Network (GSNN) that can recover the true underlying system of the data and is applicable to general nonlinear models, including linear models as special cases.
Furthermore, a Two-stage Group Sparse (TGS) algorithm is proposed to induce group variable selection by selecting the network structure. GSNN is promising for recovering predictors with interactions and high correlations in complex nonlinear systems, overcoming the drawbacks of linear or marginal methods. Theoretical results on the model approximation error and algorithm convergence are established in this chapter. Both simulation results and real-data analysis show the effectiveness of the proposed method.

Chapter 3 focuses on dimensionality reduction and anomaly detection in unsupervised modeling of high-dimensional sparse data. The Mahalanobis squared distance is popular for unsupervised anomaly detection of multivariate data, but it has several drawbacks. First, considering only simple linear correlations between covariates severely limits anomaly detection performance. Second, anomalous observations have a significant impact on the classical location and scatter estimates, leading to the "masking" and "swamping" problems. Third, because the distribution of real-world data is unknown and complex, it is difficult to give a theoretical critical value for delineating anomalies. Existing work has rarely addressed these problems simultaneously. This chapter proposes a two-step method to generate a Robust Deep Mahalanobis Squared Distance (RdeepMSD), in which the nonlinear transformation is implemented by a deep neural network. The deep dimensionality reduction in the first step makes RdeepMSD particularly effective for anomaly detection in high-dimensional settings, while the location and scatter factors are robustly estimated in the second step to better identify anomalies. Extensive simulations and practical data analysis demonstrate the superiority of the proposed method.

Chapter 4 investigates high-dimensional sparse modeling based on convolutional neural networks.
Convolutional neural networks perform convolution operations on images with multiple learnable convolution kernels so as to extract features for a specified task. In this chapter, a Spatial Correlation Filter (SCF) is defined to study the spatially varying features and intensities of a regionalized variable. Unlike the spatial information representations of the specified theoretical models in the existing literature, the SCF is data-driven, interpretable, and takes multi-scale and multi-directional spatial information into account. An SCF model is further constructed to extract features for classification by performing a one-step convolution operation on the image with the SCF. Considering the feature sparsity of multi-channel image data, a sparse and learnable version of the SCF model (SparseSCFNet) is proposed, which utilizes the SCF as the convolution kernel of a convolutional sparse coding layer. The experimental results show the superior performance of the proposed method.

Chapter 5 delves further into high-dimensional sparse modeling based on complex neural networks. The Visual Transformer (ViT) is capable of extracting global features of images, but it suffers from a severe over-smoothing effect and requires a large number of training samples to avoid over-fitting. In view of this, a sparse visual transformer model with contrastive learning (SparseCViT) is proposed. Its optimization objective includes an additional supervised contrastive loss, an unsupervised contrastive loss, and a regularization term on the linear multi-head self-attention parameters. Specifically, the supervised contrastive loss reduces intra-class feature differences and increases inter-class feature differences, while the unsupervised contrastive loss reduces feature redundancy by reconstructing global structural information.
Furthermore, the regularization-induced network sparsity further alleviates the over-smoothing and over-fitting problems. Experimental results on four hyperspectral datasets demonstrate the superiority of SparseCViT. |
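To make the group variable selection idea of Chapter 2 concrete, the following is a minimal sketch, not the thesis's GSNN or TGS algorithm: a group-lasso proximal step on the input-layer weights of a one-hidden-layer network, where each input's outgoing weight row is one group, so zeroing a row drops that variable from the fitted nonlinear system. The data, architecture, and hyperparameters are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 10 inputs, but only x0 and x1 drive the nonlinear response.
n, p, h = 400, 10, 16
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + np.tanh(X[:, 1])

# One-hidden-layer network; row j of W1 is the "group" for input j.
W1 = rng.normal(scale=0.3, size=(p, h))
b1 = np.zeros(h)
w2 = rng.normal(scale=0.3, size=h)
b2 = 0.0
lr, lam = 0.05, 0.1  # step size and group-lasso strength (illustrative)

def group_prox(W, t):
    """Row-wise group soft-thresholding: zeroes out entire input groups."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))

for _ in range(800):
    H = np.tanh(X @ W1 + b1)              # forward pass
    err = H @ w2 + b2 - y                 # gradient of 0.5 * squared error
    dH = np.outer(err, w2) * (1.0 - H ** 2)
    W1 -= lr * (X.T @ dH) / n             # plain gradient steps ...
    b1 -= lr * dH.mean(axis=0)
    w2 -= lr * (H.T @ err) / n
    b2 -= lr * err.mean()
    W1 = group_prox(W1, lr * lam)         # ... then the proximal group step

row_norms = np.linalg.norm(W1, axis=1)    # near-zero rows = dropped inputs
```

Under this penalty the weight rows of the irrelevant inputs shrink toward zero while the two informative rows survive, which is the sense in which selecting the network structure induces group variable selection.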
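The two-step construction behind RdeepMSD in Chapter 3 can be sketched under strong simplifications: PCA stands in for the deep nonlinear dimensionality reduction of the first step, and a simple trimming rule stands in for the robust location/scatter estimation of the second step. This is an illustration of the reduce-then-robustify idea only, with invented data, not the thesis's method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Inliers live near a 3-D subspace of a 20-D space; 15 planted anomalies.
n, p, k = 300, 20, 3
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, p))
X += 0.1 * rng.normal(size=(n, p))
X[:15] += 6.0                          # anomalous shift on the first 15 rows

# Step 1: reduce dimension (PCA stands in for the deep encoder).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
E = Xc @ Vt[:k].T

def msd(E, mu, cov):
    """Mahalanobis squared distances of the rows of E."""
    d = E - mu
    return np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)

# Step 2: robustify location/scatter by trimming the most extreme points,
# then recompute the squared distances with the trimmed estimates.
d0 = msd(E, E.mean(axis=0), np.cov(E, rowvar=False))
keep = d0 <= np.quantile(d0, 0.8)
d_robust = msd(E, E[keep].mean(axis=0), np.cov(E[keep], rowvar=False))

flagged = set(np.argsort(d_robust)[-15:].tolist())  # 15 largest distances
```

Because the trimmed estimates are barely influenced by the planted anomalies, the robust distances separate them cleanly, avoiding the masking effect that contaminated classical estimates would cause.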
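The one-step convolution idea of Chapter 4 can be illustrated with hand-made multi-scale, multi-directional difference kernels followed by pooling; the actual SCF is data-driven and learned, so the kernels below are purely hypothetical stand-ins.

```python
import numpy as np

def conv2d(img, k):
    """Plain valid-mode 2-D correlation with a single kernel."""
    kh, kw = k.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def directional_kernels(s):
    """Hypothetical difference kernels of size (2s+1) x (2s+1) in three
    directions; hand-made stand-ins for the data-driven SCF."""
    h = np.zeros((2 * s + 1, 2 * s + 1)); h[s, 0], h[s, -1] = -1.0, 1.0
    v = h.T.copy()
    d = np.zeros((2 * s + 1, 2 * s + 1)); d[0, 0], d[-1, -1] = -1.0, 1.0
    return {"horizontal": h, "vertical": v, "diagonal": d}

img = np.zeros((16, 16)); img[:, 8:] = 1.0   # image with one vertical edge

# One-step convolution + average pooling over two scales, three directions.
feats = {f"{name}_s{s}": np.abs(conv2d(img, k)).mean()
         for s in (1, 2) for name, k in directional_kernels(s).items()}
```

The six pooled responses form a tiny feature vector for a downstream classifier: the kernel differencing across the edge fires, while the one differencing along the edge stays exactly at zero, which is how multi-directional filters encode spatially varying structure.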
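A supervised contrastive loss of the kind added to SparseCViT's objective in Chapter 5 can be sketched as follows. This is a generic SupCon-style loss, not the thesis's exact formulation: same-class features are pulled together and different-class features pushed apart on the unit sphere, with a temperature-scaled softmax over cosine similarities.

```python
import numpy as np

def supcon_loss(feats, labels, tau=0.1):
    """Generic SupCon-style loss: minus the mean log-probability of the
    positives (same-label samples) under a temperature-scaled softmax."""
    F = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = F @ F.T / tau
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    logits = sim - sim.max(axis=1, keepdims=True)       # numerical stability
    denom = (np.exp(logits) * ~self_mask).sum(axis=1, keepdims=True)
    log_prob = logits - np.log(denom)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    per_anchor = (log_prob * pos).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return -per_anchor[pos.any(axis=1)].mean()

rng = np.random.default_rng(2)
# Two tight clusters whose features agree with the labels.
feats = np.vstack([[1.0, 0.0] + 0.05 * rng.normal(size=(8, 2)),
                   [0.0, 1.0] + 0.05 * rng.normal(size=(8, 2))])
labels = np.repeat([0, 1], 8)
loss_aligned = supcon_loss(feats, labels)
loss_shuffled = supcon_loss(feats, labels[rng.permutation(16)])
```

When features of a class cluster tightly and away from the other class, the loss is small; shuffling the labels breaks this agreement and raises it, which is the mechanism by which the loss reduces intra-class and enlarges inter-class feature differences.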
参考文献总数: | 300 |
馆藏地: | 图书馆学位论文阅览区(主馆南区三层BC区) |
馆藏号: | 博0714Z2/23004 |
开放日期: | 2024-06-26 |