Thesis information

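Chinese title:	基于语音驱动的情感化人体动作生成
Name:	杜隆清
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	080901
Discipline:	Computer Science and Technology
Student type:	Bachelor's
Degree:	Bachelor of Engineering
Degree year:	2024
Campus:	Beijing campus
School:	School of Artificial Intelligence
First supervisor:	张鸿文
First supervisor's affiliation:	School of Artificial Intelligence
Submission date:	2024-06-12
Defense date:	2024-05-21
Foreign-language title:	Speech-Driven Emotional Human Action Generation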
Chinese keywords:	扩散模型 ; 无分类器引导 ; 注意力机制 ; WavLM特征 ; 动作生成
Foreign-language keywords:	Diffusion model ; Classifier-free guidance ; Attention mechanism ; WavLM features ; Motion generation
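Chinese abstract (translated):	Technologies for human motion generation are developing rapidly, and speech-driven motion generation is one of the main directions. However, motions generated by traditional methods often fail to match the speech in temporal rhythm and semantics. In addition, the generated motions often correlate weakly with the main emotion expressed by the speech as a whole. These problems all reduce the quality of motion generation to some extent. This thesis adopts a method based on a diffusion model combined with classifier-free guidance, which controls motion generation conditioned on emotional style while keeping the generated motions high in quality and diversity. The method incorporates an attention mechanism so that generation takes full account of the relationships among the parts of the speech, with the aim of fully understanding the semantics and improving the model's generalization ability. In addition, the method uses a pre-trained WavLM model for audio feature processing; motions generated from the extracted WavLM features fit the speech style more closely and are more natural overall.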
Foreign-language abstract:	The field of human action generation is rapidly developing, and speech-driven motion generation is one of the main directions. However, a series of actions generated by traditional methods often suffer from mismatches in terms of temporal rhythm and semantic aspects of speech. In addition, the generated actions often do not correlate well with the main emotions expressed in the speech as a whole. All these problems reduce the quality of motion generation to some extent. In this paper, we adopt a method that combines diffusion models with classifier-free guidance to address these challenges. Our method ensures high-quality and diverse generated motions while allowing for conditional control based on emotional style. By incorporating attention mechanisms, we ensure that the generated motions fully consider the interdependencies between different parts of the speech, leading to a better understanding of semantics and improved model generalization. Furthermore, our approach uses pre-trained WavLM models for audio feature processing, resulting in more natural motions that closely match the speech style.
Total references:	26
Call number:	本080901/24009
Open-access date:	2025-06-12
