Thesis information

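Chinese title:	基于多重检验的逐层分布外检测
Name:	李嘉伟 (Li Jiawei)
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	025200
Discipline:	Applied Statistics
Student type:	Master's
Degree:	Master of Applied Statistics
Degree type:	Professional degree
Degree year:	2024
Campus:	Zhuhai Campus
School:	School of Statistics
Research direction:	Deep learning
First supervisor name:	谢传龙 (Xie Chuanlong)
First supervisor affiliation:	Faculty of Arts and Sciences
Submission date:	2024-06-03
Defense date:	2024-05-26
Foreign-language title:	MULTITESTING-BASED LAYER-WISE OUT-OF-DISTRIBUTION DETECTION
Chinese keywords:	分布外检测 ; 多重假设检验 ; 特征融合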
Foreign-language keywords:	Out-of-Distribution Detection ; Multiple Hypothesis Testing ; Feature Fusion
Chinese abstract:	在开放的现实世界中部署深度学习模型会面临许多挑战，例如，很可能会遇到各式各样与训练数据显著不同的测试输入。与训练数据的分布相比，分布之外的测试样本可能会表现出局部或全局范围内的特征变化，从而导致推理结果与真实结果大相径庭。截至目前，研究人员已经采取了多种方法来应对这种挑战，目的在于区分异常的测试输入和与原始训练数据同分布的测试输入，我们将其称为分布外检测。然而，之前的大多数研究主要集中在预训练深度神经网络的输出层或倒数第二层。在本文中，我们提出了一种新的框架，即基于多重检验的逐层分布外检测。该框架通过严格的多重假设检验方法，在不同神经网络层的特征级别上识别测试输入样本中的分布变化。我们的方法与现有方法的区别在于，它不需要修改预训练分类器的结构，也不需要微调。通过大量的实验，我们证明所提出的框架可以与任何现有的基于距离的检测方法无缝集成，同时能够有效地利用不同深度的特征提取器。与基线方法相比，我们的方案有效地提高了分布外检测的性能，特别是，MLOD-Fisher 方法总体上达到了优越的性能。当在 CIFAR-10 数据集上使用 KNN 方法进行训练时，与仅利用倒数第二层的特征相比，MLOD-Fisher 方法在 95% 的真阳性率下的平均假阳性率从 24.09% 显著地降低至 7.47%。同时，在更复杂的 CIFAR-100 数据集上，MLOD-Fisher 方法在 95% 的真阳性率下的平均假阳性率从 67.84% 显著地降低至 34.76%；平均 AUC 指标则从 83.37% 提升至 93.27%。
Foreign-language abstract:	Deploying deep learning models in the open real world poses many challenges. For example, a model is likely to encounter a wide variety of test inputs that differ significantly from the training data. Relative to the training distribution, out-of-distribution test samples may exhibit feature changes on a local or global scale, causing inference results to deviate substantially from the true results. Researchers have so far adopted a variety of methods to address this challenge, aiming to distinguish anomalous test inputs from test inputs that follow the same distribution as the original training data; we refer to this task as out-of-distribution detection. However, most previous studies focus on the output layer or the penultimate layer of pre-trained deep neural networks. In this thesis, we propose a new framework, multitesting-based layer-wise out-of-distribution detection. The framework uses a rigorous multiple hypothesis testing approach to identify distribution changes in test inputs at the feature level across different neural network layers. Unlike existing methods, it requires neither modifying the structure of the pre-trained classifier nor fine-tuning it. Through extensive experiments, we demonstrate that the proposed framework can be seamlessly integrated with any existing distance-based detection method while effectively utilizing feature extractors of different depths. Compared with baseline methods, our approach effectively improves out-of-distribution detection performance; in particular, the MLOD-Fisher method achieves superior performance overall. When trained using the KNN method on the CIFAR-10 dataset, the MLOD-Fisher method significantly reduces the average false positive rate at a 95% true positive rate from 24.09% to 7.47%, compared with using only features from the penultimate layer. On the more complex CIFAR-100 dataset, the MLOD-Fisher method significantly reduces the average false positive rate at a 95% true positive rate from 67.84% to 34.76%, and the average AUC increases from 83.37% to 93.27%.
Total number of references:	47
Holding location:	Main Library B301
Call number:	硕025200/24025Z
Open-access date:	2025-06-04
