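Chinese title: | 基于AIGC技术的宝可梦图像生成研究 |
Name: | |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 025200 |
Discipline/major: | |
Student type: | Master's |
Degree: | Master of Applied Statistics |
Degree type: | |
Degree year: | 2024 |
Campus: | |
School: | |
Research direction: | Economic and Financial Statistics |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2024-06-06 |
Defense date: | 2024-05-11 |
Foreign-language title: | Research on Pokémon Image Generation Based on AIGC |
Chinese keywords: | |
Foreign-language keywords: | Image Generation ; Diffusion ; Generative Adversarial Networks ; Pokémon |
Chinese abstract: |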
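Research on image generation models plays an important role in the digital entertainment and creative content industries. Pokémon is a globally known anime brand, and image generation for Pokémon characters holds considerable development potential and significant economic value. This study therefore explores the application and potential value of image generation technology in Pokémon image creation. The study first outlines the development history of image generation technology and selects the Generative Adversarial Network (GAN), the Variational Autoencoder (VAE), and the diffusion model as its core models. It then describes in detail the working principles and theoretical foundations of the three. A GAN improves the quality of generated images by training a generator and a discriminator against each other; a VAE generates new images by optimizing the discrepancy between the probability distributions of generated and real images; a diffusion model generates images by removing noise step by step. The Inception Score (IS) and the Fréchet Inception Distance (FID) are chosen to quantify generation quality. In the empirical analysis, the study first collects, cleans, and normalizes a Pokémon dataset and fine-tunes a pre-trained ResNet for computing IS and FID. The three models are then trained and evaluated, with a focus on comparing their performance on Pokémon image generation. The results show that, although the diffusion model has the longest training time and a relatively complex structure, its generated Pokémon images surpass those of the other two models in visual quality and detail. Quantitatively, the diffusion model achieves an IS of 5.8 and an FID of 32.6, indicating that the distribution of its generated Pokémon images is very close to that of real Pokémon images. In addition, one of the contributions of this study is further fine-tuning of the diffusion model: by introducing a guidance mechanism, the model can generate Pokémon images from text descriptions or sketches. The guided Pokémon image generation model not only markedly improves the practicality and flexibility of image generation but also gives Pokémon content creators a powerful tool for turning ideas directly into concrete Pokémon images. Generating Pokémon from text or simple sketches greatly expands the possibilities of Pokémon image generation. |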
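For readers unfamiliar with the IS metric named in the abstract, the following is a minimal sketch of how an Inception Score can be computed from per-image class probabilities. It is not the thesis's implementation: the function name `inception_score` is hypothetical, the probabilities are assumed to come from some classifier (the thesis reports using a fine-tuned ResNet, which is not reproduced here), and the common practice of averaging the score over several splits is omitted.

```python
import math


def inception_score(probs):
    """Inception Score from class probabilities.

    probs: list of per-image probability lists p(y|x), each summing to 1.
    Returns exp(mean over images of KL(p(y|x) || p(y))).
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all generated images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    # Accumulate KL(p(y|x) || p(y)); entries with p = 0 contribute nothing.
    kl_sum = 0.0
    for p in probs:
        kl_sum += sum(
            pj * math.log(pj / mj) for pj, mj in zip(p, marginal) if pj > 0
        )
    return math.exp(kl_sum / n)
```

Under this definition the score ranges from 1 (every image gets the same prediction) up to the number of classes (confident predictions spread evenly over all classes), which gives some context for the reported IS of 5.8.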
Foreign-language abstract: |
Research on image generation models plays an important role in the digital entertainment and creative content industries. Pokémon is a globally known anime brand, and image generation for Pokémon characters holds considerable development potential and significant economic value. This study therefore explores the application and potential value of image generation technology in Pokémon image creation. The study first outlines the development history of image generation technology and selects Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and diffusion models as its core models. It then describes in detail the working principles and theoretical foundations of the three. GANs improve the quality of generated images through competition between a generator and a discriminator; VAEs generate new images by optimizing the discrepancy between the probability distributions of generated and real images; diffusion models generate images by gradually removing noise. The Inception Score (IS) and the Fréchet Inception Distance (FID) are chosen to evaluate generation quality quantitatively. In the empirical analysis, the study first collects, cleans, and normalizes a Pokémon dataset and fine-tunes a pre-trained ResNet for computing IS and FID. The three models are then trained and evaluated, with a focus on comparing their performance in generating Pokémon images. The results show that, although the diffusion model has the longest training time and a relatively complex structure, its generated Pokémon images surpass those of the other two models in visual quality and detail. Quantitatively, the diffusion model achieves an IS of 5.8 and an FID of 32.6, indicating a high degree of similarity between the distribution of its generated Pokémon images and that of real Pokémon images. Moreover, one contribution of this study is further fine-tuning of the diffusion model: by introducing a guidance mechanism, the model can generate Pokémon images from textual descriptions or sketches. 
The guided Pokémon image generation model not only significantly improves the practicality and flexibility of image generation but also gives Pokémon content creators a powerful tool for turning their ideas directly into concrete Pokémon images. Generating Pokémon from text or simple sketches greatly expands the possibilities of Pokémon image generation. |
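To make the FID figure in the abstract easier to interpret, here is a minimal sketch of the Fréchet distance between Gaussians fitted to two sets of feature vectors. It is an illustration under stated assumptions, not the thesis's code: the function name `frechet_distance` is hypothetical, NumPy is assumed to be available, and the inputs are assumed to be feature arrays already extracted by some network (the thesis reports using a fine-tuned ResNet, which is not shown).

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_real, feats_gen: arrays of shape (n_samples, dim).
    Computes ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^{1/2}).
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))
    # Tr((C_r C_g)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of C_r @ C_g; these are real and non-negative for
    # covariance matrices, so round-off noise is discarded and clipped.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)
```

Lower values mean the two feature distributions are closer, and identical feature sets give a distance of approximately zero. A value such as the reported 32.6 depends on the feature extractor and dataset, so it is best read relative to the other models evaluated in the same setup.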
Total number of references: | 38 |
Call number: | 硕025200/24039 |
Open-access date: | 2025-06-06 |