Thesis information

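Chinese title:	DDPG和TD3智能博弈算法在多机空战决策中的应用——依托某智能博弈对抗态势仿真平台
Author:	韩志浩 (Han Zhihao)
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	080901
Discipline:	Computer Science and Technology
Student type:	Bachelor's
Degree:	Bachelor of Engineering
Degree year:	2024
Campus:	Beijing campus
School:	School of Artificial Intelligence
First supervisor:	魏云刚 (Wei Yungang)
First supervisor's affiliation:	School of Artificial Intelligence
Submission date:	2024-06-16
Defense date:	2024-05-21
Foreign-language title:	Research on the Application of DDPG and TD3 Intelligent Game Algorithms in Multi-Aircraft Air Combat Decision-Making: Relying on an Intelligent Game Confrontation Situation Simulation Platform
Chinese keywords:	智能博弈 ; 多机空战 ; 强化学习 ; DDPG ; TD3
Foreign-language keywords:	Intelligent Game ; Multi-Aircraft Air Combat ; DDPG ; TD3 ; Reinforcement Learning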
Chinese abstract:	随着人工智能技术的快速发展，智能博弈算法在军事领域尤其是多机空战决策中的应用日益受到重视。论文以多机空战决策为背景，基于智能博弈与强化学习领域的理论基础，依托智能博弈对抗态势仿真平台，探讨研究了人工智能在军事智能博弈领域的应用。论文研究系统地探索了深度强化学习算法Deep Deterministic Policy Gradient（DDPG）和Twin Delayed DDPG（TD3）在该领域的应用潜力。通过构建一个定制化的多机空战算法训练网络来实现多机空战决策框架，论文实现了一套基于仿真平台的多机空战决策智能博弈算法，并对其性能与稳健性进行了评估。研究发现，DDPG和TD3算法均展现出优秀的性能，而TD3算法通过引入双Q值网络、目标策略平滑和延迟策略更新等创新机制，显著提高了学习过程的稳定性和决策质量，在胜率和稳健性方面更优于DDPG，展现出在连续动作空间的强化学习任务中的高潜力。论文不仅验证了深度强化学习算法在多机空战决策中的有效性，还为智能博弈算法的设计和实现提供了新的视角。
Foreign-language abstract:	With the rapid development of artificial intelligence, intelligent game algorithms are attracting growing attention in the military domain, particularly for multi-aircraft air combat decision-making. Taking multi-aircraft air combat decision-making as its setting, and building on the theory of intelligent games and reinforcement learning, this thesis uses an intelligent game confrontation situation simulation platform to study the application of artificial intelligence to military intelligent games. It systematically examines the potential of two deep reinforcement learning algorithms, Deep Deterministic Policy Gradient (DDPG) and Twin Delayed DDPG (TD3), in this domain. A customized training network for multi-aircraft air combat serves as the decision-making framework; on top of it, the thesis implements a set of simulation-platform-based intelligent game algorithms for multi-aircraft air combat decision-making and evaluates their performance and robustness. Both DDPG and TD3 perform well. TD3, through twin Q-networks, target policy smoothing, and delayed policy updates, markedly improves the stability of learning and the quality of decisions, and outperforms DDPG in both win rate and robustness, indicating strong potential for reinforcement learning tasks with continuous action spaces. The thesis both verifies the effectiveness of deep reinforcement learning for multi-aircraft air combat decision-making and offers a new perspective on the design and implementation of intelligent game algorithms.
Total references:	44
Total figures:	21
Total tables:	12
Call number:	本080901/24038
Open-access date:	2025-06-16
