Thesis Information

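Title:	基于梯度特征归因的神经网络可解释性
Author:	彭煜
Confidentiality level:	Public
Language:	chi
Discipline code:	071201
Discipline:	Statistics
Student type:	Undergraduate (bachelor's)
Degree:	Bachelor of Science
Degree year:	2024
Campus:	Beijing campus
School:	School of Statistics
Advisor:	朱力行
Advisor's affiliation:	School of Statistics
Submission date:	2024-06-15
Defense date:	2024-05-16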
English title:	Interpretability of Neural Networks Based on Gradient Feature Attribution
Keywords:	Explainable Artificial Intelligence ; Feature Attribution ; Gradient ; Neural Network
Abstract:	The black-box nature of artificial intelligence models means that their internal structure and decision mechanisms are opaque, which undermines their reliability. Model interpretability has therefore become an important topic in artificial intelligence. Given the widespread use of neural networks in AI research, this thesis focuses on one specific area of neural network interpretability: gradient-based feature attribution. First, it introduces a detailed taxonomy of interpretability and reviews research on feature attribution. Second, it discusses several classical gradient-based feature attribution methods, covering their principles, applicable scenarios, relationships to one another, limitations, and possible improvements and extensions. Third, a neural network is trained on an image classification task and Grad-CAM heatmaps are generated. These heatmaps localize the image regions the network attends to when making a particular classification decision, providing a visual explanation of its decision process. Finally, drawing on existing research, the thesis discusses evaluation criteria and algorithm governance issues in this field.
Total number of references:	35
Call number:	本071201/24046
Open-access date:	2025-06-15
