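Chinese title: | Research on Face Attribute Estimation Algorithms Based on Multi-Task Learning |
Name: | |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 081203 |
Discipline/major: | |
Student type: | Doctoral |
Degree: | Doctor of Engineering |
Degree type: | |
Degree year: | 2023 |
Campus: | |
School: | |
Research area: | Computer vision |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2023-06-13 |
Defense date: | 2023-06-03 |
Foreign-language title: | Face attribute estimation based on multi-task learning |
Chinese keywords: | |
Foreign-language keywords: | Facial attribute classification; deep learning; age estimation; multi-task learning; convolutional neural network; graph convolutional neural network; wavelet scattering transform |
Chinese abstract: |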
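Face attributes are a mid-level representation of human biometric information and usually denote intuitive semantic features that people can understand. In recent years, face attribute estimation has been widely applied in biometric verification, face retrieval, and related areas, and age estimation and attribute classification have become research hotspots in computer vision and pattern recognition. However, because faces are individual and fine-grained, research on age estimation and attribute classification still faces the following open problems. In terms of data, age datasets have uneven age distributions and multi-attribute face datasets have imbalanced categories, which biases models toward attributes with many samples. In terms of features, feature extraction based on convolutional neural networks has achieved breakthroughs, but the properties of convolution filters and the design of convolutional architectures leave the extracted attribute features insufficiently fine-grained and robust. In terms of models, common methods treat the estimation of different face attributes as independent processes and do not fully exploit the relationships among them.
To address these problems, this thesis follows a research path from single-label to multi-label estimation and studies face age estimation and multi-attribute classification from the perspectives of multi-attribute task learning, multi-level feature extraction, multi-label relationship mining, and mixed-domain feature fusion. Multi-task learning shares features in the shallow layers of the network and performs group learning in the deep layers, so that a single model completes multiple attribute estimation tasks at once. The main work of this thesis is as follows:
(1) Single-classification age models are limited by how the number of classes is constructed, and single-regression age models are limited by the imbalanced sample distribution. To address this, two multi-task estimation models based on classification and regression are proposed. First, a coarse-to-fine network is designed that learns features of different granularity at progressively refined grouping levels, and an age-group decision layer is designed to make the model end-to-end and simplify training. The results show that this model improves age estimation accuracy, but its over-reliance on the classification results and the boundary effect caused by unreasonable age partitioning make it prone to overfitting. To address this, a deep multi-task model combining age classification and regression is proposed. The model treats age regression as the main task and age classification as the auxiliary task and learns their shared representation. Three age grouping methods are then constructed: improved adjacent age grouping, K-means clustering based on age features, and K-means clustering based on label distribution learning. These alleviate the boundary effect of the classification model and reduce the influence of imbalanced age data on age regression, improving age estimation performance.
(2) Facial attribute features extracted by global face models are not fine-grained enough. To address this, two personalized local feature extraction and fusion methods are proposed. First, an adversarial erasing method is proposed that extracts age features at different levels by iteratively erasing regions of interest. Combining global features with personalized local features, a multi-input multi-output joint model is designed to estimate age, gender, and race effectively. However, because the extraction of multi-level features in this model acts as a preprocessing step, model complexity increases and parameter tuning becomes difficult. To address this, an attention-guided partitioned bilinear pooling model is proposed. It uses an attention mechanism to extract features at different scales and from different regions and uses bilinear pooling to let the features of different sub-regions interact and fuse, which simplifies training and improves the estimation accuracy of the multi-attribute tasks.
(3) The complex correlation and heterogeneity among multi-label attributes limit model performance. To address this, a model based on graph convolutional networks is proposed for mining deep relationships among attributes. The graph convolutional network models the semantic correlation among attributes, and its output is fused with face features from a convolutional neural network to guide the convolutional model toward learning interdependent classification features. This yields meaningful semantic topology information and improves the estimation of strongly correlated attributes. To address the imbalance of data samples and attribute tasks, two strategies, an adaptive threshold and a boosting mechanism, are used to alleviate category imbalance, and a dynamic weighting strategy is proposed to handle task imbalance, improving the stability of the model's predictions across categories and attributes.
(4) Texture features extracted by convolutional neural network architectures are not fine-grained enough, and the resulting models are not stable enough. To address this, a mixed-domain attribute estimation model based on the wavelet scattering transform is proposed. First, a hybrid module is designed that treats the wavelet scattering transform as a frequency-domain image feature extractor and combines it with a convolutional neural network through channel attention to extract finer facial attribute texture features. Next, several ensemble models are designed on top of the hybrid module to reduce the sensitivity of convolutional networks to small-scale affine transformations, achieving competitive attribute estimation performance. In addition, to address the overfitting caused by spurious correlations among attributes, a module for mining causal relationships among attributes is proposed, which improves the generalization and interpretability of the model.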
|
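The multi-task age model in contribution (1), with age regression as the main task and age-group classification as the auxiliary task, can be illustrated with a small loss function. The following is only a minimal sketch under stated assumptions: the record does not give the loss form, so the L1 regression term, the softmax cross-entropy term, the `aux_weight` factor, the fixed `boundaries` cut points, and all function names are hypothetical choices made for illustration, not the thesis's actual formulation.

```python
import math
from bisect import bisect_right


def age_to_group(age, boundaries):
    """Map an age to a group index using sorted cut points (hypothetical grouping)."""
    # bisect_right places an age equal to a cut point into the upper group.
    return bisect_right(boundaries, age)


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def multitask_age_loss(pred_age, group_logits, true_age, boundaries, aux_weight=0.5):
    """Main-task L1 regression loss plus a weighted auxiliary group-classification loss.

    Both the loss terms and aux_weight are assumptions made for this sketch.
    """
    regression = abs(pred_age - true_age)
    classification = cross_entropy(group_logits, age_to_group(true_age, boundaries))
    return regression + aux_weight * classification


# Usage: four age groups split at 20, 40, and 60 (illustrative cut points only).
boundaries = [20, 40, 60]
loss = multitask_age_loss(30.0, [0.1, 2.0, -0.5, -1.0], 32, boundaries)
print(round(loss, 4))
```

Fixed cut points are the simplest possible grouping; the abstract's clustering-based groupings would replace `age_to_group` with an assignment learned from the data.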
Foreign-language abstract: |
Face attributes are a mid-level representation of human biometric information and usually denote intuitive semantic features. In recent years, face attribute estimation has been widely used in biometric verification and face retrieval, and age estimation and attribute classification have become research hotspots in computer vision and pattern recognition. However, because faces are individual and fine-grained, research on age estimation and attribute classification still faces the following problems. In terms of data, age datasets have imbalanced age distributions and multi-attribute face datasets have imbalanced categories, which biases models toward attributes with many samples. In terms of features, feature extraction based on convolutional neural networks has achieved breakthroughs, but the properties of convolution filters and the design of the model structure leave the extracted attribute features insufficiently fine-grained and robust. In terms of models, common methods treat the estimation of different facial attributes as independent processes and do not fully mine the relationships among them.
To address these problems, this thesis follows a research path from single-label to multi-label estimation and takes the perspectives of multi-attribute task learning, multi-level feature extraction, multi-label relationship mining, and mixed-domain feature fusion. Multi-task learning shares features in the shallow layers of the network and performs group learning in the deep layers, so that a single model estimates multiple attributes simultaneously. The main research work of this thesis is as follows:
(1) Single-classification age models are limited by how the number of classes is constructed, and single-regression age models are limited by the imbalanced distribution of samples. To address this, we propose two multi-task estimation models based on classification and regression. First, we design a coarse-to-fine network that learns features of different granularity at progressively refined grouping levels, and we design an age-group decision layer to make the model end-to-end and simplify its training. The results show that although this model improves the accuracy of age estimation, its over-reliance on classification results and the boundary effect caused by unreasonable age partitioning make it prone to overfitting. To solve these problems, we propose a deep multi-task model based on age classification and regression. The model takes age regression as the main task and age classification as the auxiliary task and learns a representation shared by the two tasks. We construct three age grouping methods: improved adjacent age grouping, K-means clustering based on age features, and K-means clustering based on label distribution learning. These alleviate the boundary effect of the classification model, reduce the influence of imbalanced age data on age regression, and improve age estimation performance.
(2) Facial attribute features extracted by a global face model are not fine-grained enough. To address this, we propose two personalized local feature extraction and fusion methods. First, we design an adversarial erasing method in which age features at different levels are extracted by iteratively erasing regions of interest. Combining global features with personalized local features, we design a multi-input multi-output joint model to estimate age, gender, and race effectively. However, because the extraction of multi-level features in this model acts as a preprocessing step, model complexity increases and parameter tuning becomes difficult. To solve these problems, we propose an attention-guided partitioned bilinear pooling model. The attention mechanism extracts features at different scales and from different regions, and bilinear pooling lets the features of different sub-regions interact and fuse. This simplifies training and improves the estimation accuracy of the multi-attribute tasks.
(3) The complex correlation and heterogeneity among multi-label attributes limit model performance. To address this, we propose a model based on a graph convolutional network for mining deep relationships among attributes. The graph convolutional network models the semantic correlation among attributes, and its output is combined with facial features from the convolutional neural network to guide the convolutional model toward learning interdependent classification features and obtaining meaningful semantic topology information. This improves the estimation of strongly correlated attributes. To address the imbalance of data samples and attribute tasks, two strategies, an adaptive threshold and a boosting mechanism, are adopted to alleviate category imbalance, and a dynamic weighting strategy is proposed to handle task imbalance. These improve the stability of the model's predictions across categories and attributes.
(4) Texture features extracted by convolutional neural network structures are not fine-grained enough, and the resulting models are not stable enough. To address this, we propose a mixed-domain attribute estimation model based on the wavelet scattering transform. First, we design a hybrid module that treats the wavelet scattering transform as a frequency-domain image feature extractor and combines it with the convolutional neural network through channel attention to extract finer facial attribute texture features. In addition, we design several ensemble models based on the hybrid module to reduce the sensitivity of the convolutional network to small-scale affine transformations and achieve competitive attribute estimation performance. To address the overfitting caused by spurious correlations among attributes, we propose a module for mining causal relationships among attributes, which improves the generalization and interpretability of the model.
In summary, this thesis uses multi-task learning to build a research framework for facial attribute estimation from single-label to multi-label, extracts more refined and discriminative attribute features, and obtains more accurate and stable attribute estimation models. For age estimation, based on divide-and-conquer and multi-task learning, two multi-task estimation models combining classification and regression are proposed, which alleviate the limits that the number of classes and the sample distribution impose on single-classification and single-regression age estimation, respectively. For the joint estimation of age, gender, and race, two personalized local feature extraction methods are proposed based on adversarial erasing and attention guidance, which improve on the limited personalization of uniform cropping methods and effectively mine facial features at different levels. For multi-attribute classification, two methods for mining deep relationships among attributes are proposed, one based on a graph convolutional network and one based on causal inference, which address the strong subjectivity and insufficient feature extraction of approaches that rely on human experience and improve multi-attribute classification performance. In addition, a hybrid-domain model based on a wavelet scattering network and a convolutional neural network is proposed, which compensates for the sensitivity of the convolutional network to small-scale affine transformations and improves the stability of the attribute classification model. |
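The attribute-relationship modeling in contribution (3) can be pictured as a graph convolution over an attribute graph. The sketch below is illustrative only: the record does not say how the graph is built, so deriving edges from conditional co-occurrence of labels, the `threshold` value, the symmetric normalization, and every function name are assumptions made for this example rather than details of the thesis.

```python
import math


def cooccurrence_adjacency(labels, threshold=0.4):
    """Add edge i -> j when P(attribute j | attribute i) >= threshold (assumed rule)."""
    n = len(labels[0])
    count = [sum(row[i] for row in labels) for i in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or count[i] == 0:
                continue
            both = sum(1 for row in labels if row[i] and row[j])
            if both / count[i] >= threshold:
                adj[i][j] = 1.0
    return adj


def normalize_adjacency(adj):
    """Compute D^{-1/2} (A + I) D^{-1/2}, where D holds the row sums of A + I."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, features, weight):
    """One graph convolution step: ReLU(A_hat @ H @ W)."""
    out = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Usage: four samples annotated with three binary attributes (toy data).
labels = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [0, 0, 1]]
a_hat = normalize_adjacency(cooccurrence_adjacency(labels))
node_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(gcn_layer(a_hat, node_features, [[1.0], [1.0], [2.0]]))
```

In a full model, the per-attribute outputs of such layers would be fused with image features from the convolutional network; that fusion step is omitted here.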
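Total number of references: | 208 |
Holding location: | Library Dissertation Reading Area (Main Library, South Wing, 3rd Floor, Zone BC) |
Call number: | 博081203/23003 |
Open-access date: | 2024-06-12 |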