Thesis Information

Chinese title:

 基于改进 MobileViT 的深度伪造人脸图像检测方法研究 (Research on a Deepfake Facial Image Detection Method Based on Improved MobileViT)

Name:

 唐桢骅    

Confidentiality level:

 Public

Thesis language:

 Chinese (chi)

Subject code:

 070101    

Major:

 Mathematics and Applied Mathematics

Student type:

 Bachelor's

Degree:

 Bachelor of Science

Degree year:

 2024    

Campus:

 Beijing campus

School:

 School of Mathematical Sciences

Research direction:

 Artificial intelligence

First supervisor:

 刘君    

First supervisor's affiliation:

 School of Mathematical Sciences

Second supervisor:

 唐云祁    

Submission date:

 2024-06-28    

Defense date:

 2024-05-06    

English title:

 Research on Deepfake Facial Image Detection Method Based on Improved MobileViT    

Chinese keywords:

 Deepfake detection; Vision Transformer; Convolutional neural network; Central difference convolution; Lightweight

English keywords:

 Deepfake detection; Vision Transformer; Convolutional neural networks; Central difference convolution; Lightweight

Chinese abstract:

Face forgery methods based on deep learning continue to advance, posing a significant threat to social order and personal security. In response, deploying lightweight forgery detection models on mobile terminals is a promising application. A plain Vision Transformer model suffers from a large parameter count, high computational cost, poor recognition of small targets, and a high risk of overfitting, which makes it difficult to apply directly. This thesis studies the ability of MobileViT, a model that replaces part of the standard convolution process with Transformer blocks, to detect fully forged face images. To adapt MobileViT_xxs to the deepfake face image detection task, the following modifications are made: central difference convolution (CDC) is introduced into the convolutional layer that processes the raw image and combined with traditional vanilla convolution by a weighted sum to highlight local artifacts; the activation function is changed from SiLU to Hard-swish to speed up training; and the convolutional layer at the tail of the model with 80 input channels and 320 output channels is removed to suit the binary classification task. When the test images were post-processed by the GANprintR method, the improved model achieved an accuracy of 94.01%, 6.64 percentage points higher than the original MobileViT_xxs, while also training slightly faster than the original model. This demonstrates the advantage of the proposed model for the deepfake face detection task.

English abstract:

Face forgery methods based on deep learning continue to advance, posing a growing threat to social order and personal security. In response to this threat, deploying lightweight forgery detection models on mobile terminals has clear application prospects. A plain Vision Transformer model has drawbacks such as a large number of parameters, high computing power requirements, poor small-target recognition, and a high risk of overfitting, all of which hinder deployment. This thesis studies the detection ability of MobileViT, a model that uses Transformer blocks to replace part of the standard convolution process, on fully forged face images. To adapt MobileViT_xxs to the deepfake face image task, the following modifications are made: central difference convolution (CDC) is introduced in the convolutional layer that processes the original image and combined with traditional vanilla convolution through a weighted sum to highlight local artifacts; the activation function is changed from SiLU to Hard-swish to improve training speed; and the convolutional layer with 80 input channels and 320 output channels at the end of the model is removed to fit the binary classification task. The improved model achieved 94.01% accuracy when the test images had been post-processed by the GANprintR forgery method, 6.64 percentage points higher than the original MobileViT_xxs, while training slightly faster than the original model. This demonstrates the superiority of the proposed model for deepfake face detection.
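The two low-level building blocks named in the abstract, central difference convolution blended with vanilla convolution, and the Hard-swish activation, can be sketched as follows. This is an illustrative single-channel NumPy sketch, not the thesis code: the blending weight `theta` and the kernel are assumed for demonstration, and the actual model applies these operations per channel inside MobileViT_xxs.

```python
import numpy as np

def hard_swish(x):
    """Hard-swish: x * ReLU6(x + 3) / 6, a cheap piecewise approximation of SiLU."""
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

def vanilla_conv2d(x, w):
    """Plain 'valid' 2D cross-correlation of a single-channel image x with kernel w."""
    kh, kw = w.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def cdc_conv2d(x, w, theta=0.7):
    """Central difference convolution blended with vanilla convolution:
       y = theta * sum_k w_k * (x_k - x_center) + (1 - theta) * sum_k w_k * x_k
         = vanilla(x, w) - theta * x_center * sum(w)
    theta=0 recovers vanilla convolution; theta=1 is pure CDC, which responds
    only to local intensity differences (the artifacts forgery leaves behind).
    """
    kh, kw = w.shape
    vanilla = vanilla_conv2d(x, w)
    # Center pixel of each receptive field, cropped to the 'valid' output size.
    center = x[kh // 2: x.shape[0] - kh // 2, kw // 2: x.shape[1] - kw // 2]
    return vanilla - theta * center * w.sum()
```

Note the design property the blend exploits: on a perfectly flat region the pure-CDC term vanishes, so increasing `theta` suppresses smooth content and emphasizes local gradients, while the vanilla term preserves absolute intensity information.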

Total references:

 21    

Total figures:

 11    

Total tables:

 2    

Call number:

 本070101/24040    

Open access date:

 2025-06-28    
