Chinese title: | 基于深度神经网络的高维稀疏建模及其应用 (Deep Neural Network Based High-dimensional Sparse Modeling and Its Applications) |
Name: | |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 0714Z2 |
Discipline: | |
Student type: | Doctoral |
Degree: | Doctor of Science |
Degree type: | |
Degree year: | 2023 |
Campus: | |
School: | |
Research direction: | Applied Statistics |
First supervisor: | |
First supervisor affiliation: | |
Submission date: | 2023-06-26 |
Defense date: | 2023-05-24 |
English title: | Deep Neural Network based High-dimensional Sparse Modeling and Its Applications |
Chinese keywords: | |
English keywords: | High-dimensional sparse modeling ; deep neural network ; nonlinear group variable selection ; nonlinear dimensionality reduction ; anomaly detection ; sparse convolution ; sparse visual transformer |
Chinese abstract: |
The core aim of high-dimensional sparse modeling is to extract low-dimensional information of interest from high-dimensional data, thereby greatly reducing the number of parameters to be estimated. Depending on whether a response is present, high-dimensional sparse modeling can be divided into supervised and unsupervised types. Typical supervised modeling includes classification and regression problems, whereas unsupervised modeling aims to uncover the intrinsic structure of the data without label information. Chapter 2 studies the group variable selection problem in supervised modeling of high-dimensional sparse data. Chapter 4 studies high-dimensional sparse modeling based on convolutional neural networks. |
English abstract: |
The core purpose of high-dimensional sparse modeling is to mine low-dimensional information of interest from high-dimensional data, which can significantly reduce the number of parameters to be estimated. Depending on the presence or absence of a response, high-dimensional sparse modeling can be classified into supervised and unsupervised types. Typical supervised modeling includes classification and regression tasks, while unsupervised modeling aims to mine intrinsic patterns in the data without label information. Fruitful results have been achieved in high-dimensional sparse modeling based on traditional statistical theory. However, most existing methods are based on linear models, which are not applicable to the nonlinear situations prevalent in the real world. Although marginal methods can model nonlinear systems, they ignore the correlations between variables. In recent years, Deep Neural Networks (DNNs) have become a popular tool for statistical analysis because of their powerful function approximation and feature representation capabilities. Furthermore, with the development of data science, some complex high-dimensional scenarios, such as image and video analysis, can no longer be handled satisfactorily by simple models. Considering more complex DNN models has therefore become a popular trend in this field. This thesis focuses on high-dimensional sparse modeling based on DNNs and their complex variants.

Chapter 2 focuses on the group variable selection problem in supervised modeling of high-dimensional sparse data. Group variable selection is a critical issue in high-dimensional sparse modeling, and most existing methods consider only linear models. To address this issue, we propose a Group Sparse Neural Network (GSNN) that can recover the true underlying system of the data and is applicable to general nonlinear models, including linear models as special cases.
Furthermore, a Two-stage Group Sparse (TGS) algorithm is proposed to induce group variable selection by selecting the network structure. GSNN is promising for recovering predictors with interactions and high correlations in complex nonlinear systems, overcoming the drawbacks of linear or marginal methods. Theoretical results on the model approximation error and algorithm convergence are established in this chapter. Both simulation results and real-data analysis show the effectiveness of the proposed method.

Chapter 3 focuses on dimensionality reduction and anomaly detection in unsupervised modeling of high-dimensional sparse data. The Mahalanobis squared distance is popular for unsupervised anomaly detection of multivariate data, but it has several drawbacks. First, considering only simple linear correlations between covariates severely limits anomaly detection performance. Second, anomalous observations have a significant impact on the classical location and scatter estimates, leading to the "masking" and "swamping" problems. Third, because the distribution of real-world data is unknown and complex, it is difficult to give a theoretical critical value for delineating anomalies. Existing work has rarely addressed these problems simultaneously. This chapter proposes a two-step method to generate a Robust Deep Mahalanobis Squared Distance (RdeepMSD), in which the nonlinear transformation is implemented by a deep neural network. The deep dimensionality reduction in the first step makes RdeepMSD particularly effective for anomaly detection in high-dimensional settings, while the location and scatter factors are robustly estimated in the second step to better identify anomalies. Extensive simulations and practical data analysis demonstrate the superiority of the proposed method.

Chapter 4 investigates high-dimensional sparse modeling based on convolutional neural networks.
Convolutional neural networks perform convolution operations on images with multiple learnable convolution kernels so as to extract features for a specified task. In this chapter, a Spatial Correlation Filter (SCF) is defined to study the spatially varying features and intensities of a regionalized variable. Unlike the spatial information representations of the specified theoretical models in the existing literature, the SCF is data-driven, interpretable, and takes multi-scale and multi-directional spatial information into account. An SCF model is further constructed to extract features for classification by performing a one-step convolution operation on the image with the SCF. Considering the feature sparsity of multi-channel image data, a sparse and learnable version of the SCF model (SparseSCFNet) is proposed, which utilizes the SCF as the convolution kernel of a convolutional sparse coding layer. The experimental results show the superior performance of the proposed method.

Chapter 5 delves further into high-dimensional sparse modeling based on complex neural networks. The Visual Transformer (ViT) is capable of extracting global features of images, but it suffers from a severe over-smoothing effect and requires a large number of training samples to avoid over-fitting. In view of this, a sparse visual transformer model with contrastive learning (SparseCViT) is proposed. Its optimization objective includes an additional supervised contrastive loss, an unsupervised contrastive loss, and a regularization term on the linear multi-head self-attention parameters. Specifically, the supervised contrastive loss reduces intra-class feature differences and increases inter-class feature differences, while the unsupervised contrastive loss reduces feature redundancy by reconstructing global structural information.
Furthermore, the regularization-induced network sparsity further alleviates the over-smoothing and over-fitting problems. Experimental results on four hyperspectral datasets demonstrate the superiority of SparseCViT. |
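To make the group variable selection idea of Chapter 2 concrete, the following is a minimal sketch, not the thesis's GSNN or TGS algorithm: a group-lasso proximal step on the input-layer weights of a one-hidden-layer network, where each input's outgoing weight row is one group, so zeroing a row drops that variable from the fitted nonlinear system. The data, architecture, and hyperparameters are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 10 inputs, but only x0 and x1 drive the nonlinear response.
n, p, h = 400, 10, 16
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + np.tanh(X[:, 1])

# One-hidden-layer network; row j of W1 is the "group" for input j.
W1 = rng.normal(scale=0.3, size=(p, h))
b1 = np.zeros(h)
w2 = rng.normal(scale=0.3, size=h)
b2 = 0.0
lr, lam = 0.05, 0.1  # step size and group-lasso strength (illustrative)

def group_prox(W, t):
    """Row-wise group soft-thresholding: zeroes out entire input groups."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))

for _ in range(800):
    H = np.tanh(X @ W1 + b1)              # forward pass
    err = H @ w2 + b2 - y                 # gradient of 0.5 * squared error
    dH = np.outer(err, w2) * (1.0 - H ** 2)
    W1 -= lr * (X.T @ dH) / n             # plain gradient steps ...
    b1 -= lr * dH.mean(axis=0)
    w2 -= lr * (H.T @ err) / n
    b2 -= lr * err.mean()
    W1 = group_prox(W1, lr * lam)         # ... then the proximal group step

row_norms = np.linalg.norm(W1, axis=1)    # near-zero rows = dropped inputs
```

Under this penalty the weight rows of the irrelevant inputs shrink toward zero while the two informative rows survive, which is the sense in which selecting the network structure induces group variable selection.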
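The two-step construction behind RdeepMSD in Chapter 3 can be sketched under strong simplifications: PCA stands in for the deep nonlinear dimensionality reduction of the first step, and a simple trimming rule stands in for the robust location/scatter estimation of the second step. This is an illustration of the reduce-then-robustify idea only, with invented data, not the thesis's method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Inliers live near a 3-D subspace of a 20-D space; 15 planted anomalies.
n, p, k = 300, 20, 3
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, p))
X += 0.1 * rng.normal(size=(n, p))
X[:15] += 6.0                          # anomalous shift on the first 15 rows

# Step 1: reduce dimension (PCA stands in for the deep encoder).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
E = Xc @ Vt[:k].T

def msd(E, mu, cov):
    """Mahalanobis squared distances of the rows of E."""
    d = E - mu
    return np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)

# Step 2: robustify location/scatter by trimming the most extreme points,
# then recompute the squared distances with the trimmed estimates.
d0 = msd(E, E.mean(axis=0), np.cov(E, rowvar=False))
keep = d0 <= np.quantile(d0, 0.8)
d_robust = msd(E, E[keep].mean(axis=0), np.cov(E[keep], rowvar=False))

flagged = set(np.argsort(d_robust)[-15:].tolist())  # 15 largest distances
```

Because the trimmed estimates are barely influenced by the planted anomalies, the robust distances separate them cleanly, avoiding the masking effect that contaminated classical estimates would cause.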
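The one-step convolution idea of Chapter 4 can be illustrated with hand-made multi-scale, multi-directional difference kernels followed by pooling; the actual SCF is data-driven and learned, so the kernels below are purely hypothetical stand-ins.

```python
import numpy as np

def conv2d(img, k):
    """Plain valid-mode 2-D correlation with a single kernel."""
    kh, kw = k.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def directional_kernels(s):
    """Hypothetical difference kernels of size (2s+1) x (2s+1) in three
    directions; hand-made stand-ins for the data-driven SCF."""
    h = np.zeros((2 * s + 1, 2 * s + 1)); h[s, 0], h[s, -1] = -1.0, 1.0
    v = h.T.copy()
    d = np.zeros((2 * s + 1, 2 * s + 1)); d[0, 0], d[-1, -1] = -1.0, 1.0
    return {"horizontal": h, "vertical": v, "diagonal": d}

img = np.zeros((16, 16)); img[:, 8:] = 1.0   # image with one vertical edge

# One-step convolution + average pooling over two scales, three directions.
feats = {f"{name}_s{s}": np.abs(conv2d(img, k)).mean()
         for s in (1, 2) for name, k in directional_kernels(s).items()}
```

The six pooled responses form a tiny feature vector for a downstream classifier: the kernel differencing across the edge fires, while the one differencing along the edge stays exactly at zero, which is how multi-directional filters encode spatially varying structure.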
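A supervised contrastive loss of the kind added to SparseCViT's objective in Chapter 5 can be sketched as follows. This is a generic SupCon-style loss, not the thesis's exact formulation: same-class features are pulled together and different-class features pushed apart on the unit sphere, with a temperature-scaled softmax over cosine similarities.

```python
import numpy as np

def supcon_loss(feats, labels, tau=0.1):
    """Generic SupCon-style loss: minus the mean log-probability of the
    positives (same-label samples) under a temperature-scaled softmax."""
    F = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = F @ F.T / tau
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    logits = sim - sim.max(axis=1, keepdims=True)       # numerical stability
    denom = (np.exp(logits) * ~self_mask).sum(axis=1, keepdims=True)
    log_prob = logits - np.log(denom)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    per_anchor = (log_prob * pos).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return -per_anchor[pos.any(axis=1)].mean()

rng = np.random.default_rng(2)
# Two tight clusters whose features agree with the labels.
feats = np.vstack([[1.0, 0.0] + 0.05 * rng.normal(size=(8, 2)),
                   [0.0, 1.0] + 0.05 * rng.normal(size=(8, 2))])
labels = np.repeat([0, 1], 8)
loss_aligned = supcon_loss(feats, labels)
loss_shuffled = supcon_loss(feats, labels[rng.permutation(16)])
```

When features of a class cluster tightly and away from the other class, the loss is small; shuffling the labels breaks this agreement and raises it, which is the mechanism by which the loss reduces intra-class and enlarges inter-class feature differences.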
参考文献总数: | 300 |
馆藏地: | 图书馆学位论文阅览区(主馆南区三层BC区) |
馆藏号: | 博0714Z2/23004 |
开放日期: | 2024-06-26 |