[1] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with Deep Reinforcement Learning: abs/1312.5602[A]. 2013.
[2] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[3] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[4] VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354.
[5] YE D, LIU Z, SUN M, et al. Mastering Complex Control in MOBA Games with Deep Reinforcement Learning[J]. Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI 2020), 2020, 34(04): 6672-6679.
[6] THORP H H. ChatGPT is fun, but not an author[J]. Science, 2023, 379(6630): 313.
[7] SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. MIT press, 2018.
[8] DRUGAN M M. Reinforcement learning versus evolutionary computation: A survey on hybrid algorithms[J]. Swarm and Evolutionary Computation, 2019, 44: 228-246.
[9] YANG P, ZHANG H, YU Y, et al. Evolutionary reinforcement learning via cooperative co-evolutionary negatively correlated search[J]. Swarm and Evolutionary Computation, 2022, 68: 100974.
[10] SALIMANS T, HO J, CHEN X, et al. Evolution strategies as a scalable alternative to reinforcement learning: abs/1703.03864[A]. 2017.
[11] YANG Q, ZHANG J, SHI G, et al. Maneuver Decision of UAV in Short-Range Air Combat Based on Deep Reinforcement Learning[J]. IEEE Access, 2020, 8: 363-378.
[12] SUN Z, PIAO H, YANG Z, et al. Multi-agent hierarchical policy gradient for Air Combat Tactics emergence via self-play[J]. Engineering Applications of Artificial Intelligence, 2021, 98: 104112.
[13] FORTUNATO M, AZAR M G, PIOT B, et al. Noisy Networks for Exploration: abs/1706.10295[A]. 2017.
[14] CHRABASZCZ P, LOSHCHILOV I, HUTTER F. Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari: abs/1802.08842[A]. 2018.
[15] CONTI E, MADHAVAN V, PETROSKI SUCH F, et al. Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents[C]//Proceedings of the 32nd Advances in Neural Information Processing Systems (NeurIPS 2018): volume 31. Curran Associates, Inc., 2018.
[16] TANG K, YANG P, YAO X. Negatively Correlated Search[J]. IEEE Journal on Selected Areas in Communications, 2016, 34(3): 542-550.
[17] PARKER-HOLDER J, PACCHIANO A, CHOROMANSKI K M, et al. Effective Diversity in Population Based Reinforcement Learning[C]//Proceedings of the 34th Advances in Neural Information Processing Systems (NeurIPS 2020): volume 33. Curran Associates, Inc., 2020: 18050-18062.
[18] KHADKA S, TUMER K. Evolution-Guided Policy Gradient in Reinforcement Learning[C]//Proceedings of the 32nd Advances in Neural Information Processing Systems (NeurIPS 2018): volume 31. Curran Associates, Inc., 2018.
[19] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning: abs/1509.02971[A]. 2015.
[20] POURCHOT A, SIGAUD O. CEM-RL: Combining evolutionary and gradient-based methods for policy search[C]//Proceedings of the 7th International Conference on Learning Representations (ICLR 2019). New Orleans, USA: OpenReview.net, 2019.
[21] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing Function Approximation Error in Actor-Critic Methods[C]//Proceedings of the 35th International Conference on Machine Learning (ICML 2018): volume 80. PMLR, 2018: 1587-1596.
[22] LI P, TANG H, HAO J, et al. ERL-Re2: Efficient Evolutionary Reinforcement Learning with Shared State Representation and Individual Policy Representation: abs/2210.17375[A]. 2022.
[23] SONG Z, WANG H, HE C, et al. A Kriging-Assisted Two-Archive Evolutionary Algorithm for Expensive Many-Objective Optimization[J]. IEEE Transactions on Evolutionary Computation, 2021, 25(6): 1013-1027.
[24] STORK J, ZAEFFERER M, BARTZ-BEIELSTEIN T, et al. Surrogate Models for Enhancing the Efficiency of Neuroevolution in Reinforcement Learning[C]//Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2019): number 9. New York, USA: Association for Computing Machinery, 2019: 934-942.
[25] WANG Y, ZHANG T, CHANG Y, et al. A surrogate-assisted controller for expensive evolutionary reinforcement learning[J]. Information Sciences, 2022, 616: 539-557.
[26] MASCHLER M, ZAMIR S, SOLAN E. Game theory[M]. Cambridge University Press, 2020.
[27] MGUNI D H, WU Y, DU Y, et al. Learning in Nonzero-Sum Stochastic Games with Potentials[C]//Proceedings of the 38th International Conference on Machine Learning (ICML 2021): volume 139. PMLR, 2021: 7688-7699.
[28] ZHANG R, ZONG Q, ZHANG X, et al. Game of Drones: Multi-UAV Pursuit-Evasion Game With Online Motion Planning by Deep Reinforcement Learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022: 1-10.
[29] CHE J, QIAN W Q, HE Z C. Air combat simulation of two-aircraft attack-defense confrontation based on matrix games[J]. Flight Dynamics, 2015(2): 173-177.
[30] RIZK Y, AWAD M, TUNSTEL E W. Decision Making in Multiagent Systems: A Survey[J]. IEEE Transactions on Cognitive and Developmental Systems, 2018, 10(3): 514-529.
[31] PAN Q, ZHOU D, HUANG J, et al. Maneuver decision for cooperative close-range air combat based on state predicted influence diagram[C]//Proceedings of the IEEE International Conference on Information and Automation (ICIA 2017). Macau, China: IEEE, 2017: 726-731.
[32] WEINTRAUB I E, PACHTER M, GARCIA E. An Introduction to Pursuit-evasion Differential Games[C]//Proceedings of the American Control Conference (ACC 2020). Denver, CO, USA: IEEE, 2020: 1049-1066.
[33] PARK H, LEE B Y, TAHK M J, et al. Differential Game Based Air Combat Maneuver Generation Using Scoring Function Matrix[J]. International Journal of Aeronautical and Space Sciences, 2016, 17(2): 204-213.
[34] MCGREW J S, HOW J P, WILLIAMS B, et al. Air-Combat Strategy Using Approximate Dynamic Programming[J]. Journal of Guidance, Control, and Dynamics, 2010, 33(5): 1641-1654.
[35] HUANG C Q, ZHAO K X, HAN B J, et al. A UAV maneuver decision-making method based on approximate dynamic programming[J]. Journal of Electronics & Information Technology, 2018, 40(10): 2447-2452.
[36] LI F, XU F, MENG G, et al. A UAV air-combat decision expert system based on receding horizon control[J]. Journal of Beijing University of Aeronautics and Astronautics, 2015, 41(11): 1994-1999.
[37] ERNEST N, CARROLL D, SCHUMACHER C, et al. Genetic fuzzy based artificial intelligence for unmanned combat aerial vehicle control in simulated air combat missions[J]. Journal of Defense Management, 2016, 6(1): 2167-0374.
[38] KANG Y, PU Z, LIU Z, et al. Air-to-Air Combat Tactical Decision Method Based on SIRMs Fuzzy Logic and Improved Genetic Algorithm[C]//Proceedings of the International Conference on Guidance, Navigation and Control (ICGNC 2022): volume 644. Singapore: Springer Singapore, 2022: 3699-3709.
[39] ZHANG H, HUANG C. Maneuver Decision-Making of Deep Learning for UCAV Thorough Azimuth Angles[J]. IEEE Access, 2020, 8: 12976-12987.
[40] HU D, YANG R, ZUO J, et al. Application of Deep Reinforcement Learning in Maneuver Planning of Beyond-Visual-Range Air Combat[J]. IEEE Access, 2021, 9: 32282-32297.
[41] WANG Z, LI H, WU H, et al. Improving Maneuver Strategy in Air Combat by Alternate Freeze Games with a Deep Reinforcement Learning Algorithm[J]. Mathematical Problems in Engineering, 2020, 2020: 1-17.
[42] POPE A P, IDE J S, MIĆOVIĆ D, et al. Hierarchical Reinforcement Learning for Air Combat at DARPA's AlphaDogfight Trials[J]. IEEE Transactions on Artificial Intelligence, 2022: 1-15.
[43] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal Policy Optimization Algorithms: abs/1707.06347[A]. 2017.
[44] QIAN H, HU Y Q, YU Y. Derivative-Free Optimization of High-Dimensional Non-Convex Functions by Sequential Random Embeddings[C]//Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016). New York, USA: AAAI Press, 2016: 1946-1952.
[45] ZHOU A, ZHANG J, SUN J, et al. Fuzzy-Classification Assisted Solution Preselection in Evolutionary Optimization[J]. Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019, 33(01): 2403-2410.
[46] GEIST M, SCHERRER B, PIETQUIN O. A Theory of Regularized Markov Decision Processes[C]//Proceedings of the 36th International Conference on Machine Learning (ICML 2019): volume 97. PMLR, 2019: 2160-2169.
[47] LEE H, SONG C, KIM N, et al. Comparative Analysis of Energy Management Strategies for HEV: Dynamic Programming and Reinforcement Learning[J]. IEEE Access, 2020, 8: 67112-67123.
[48] JANZ D, HRON J, MAZUR P A, et al. Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning[C]//Proceedings of the 33rd Advances in Neural Information Processing Systems (NeurIPS 2019): volume 32. Curran Associates, Inc., 2019.
[49] WANG X, WANG S, LIANG X, et al. Deep Reinforcement Learning: A Survey[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022: 1-15.
[50] HAARNOJA T, ZHOU A, HARTIKAINEN K, et al. Soft actor-critic algorithms and applications: abs/1812.05905[A]. 2018.
[51] EYSENBACH B, GUPTA A, IBARZ J, et al. Diversity is all you need: Learning skills without a reward function: abs/1802.06070[A]. 2018.
[52] ZHANG Y, YU W, TURK G. Learning Novel Policies For Tasks[C]//Proceedings of the 36th International Conference on Machine Learning (ICML 2019): volume 97. PMLR, 2019: 7483-7492.
[53] SILVER D, LEVER G, HEESS N, et al. Deterministic Policy Gradient Algorithms[C]//Proceedings of the 31st International Conference on Machine Learning (ICML 2014): volume 32. Bejing, China: PMLR, 2014: 387-395.
[54] SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust region policy optimization[C]//Proceedings of the 32nd International Conference on Machine Learning (ICML 2015): volume 37. Lille, France: PMLR, 2015: 1889-1897.
[55] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous Methods for Deep Reinforcement Learning[C]//Proceedings of the 33rd International Conference on Machine Learning (ICML 2016): volume 48. New York, USA: PMLR, 2016: 1928-1937.
[56] HEESS N, TB D, SRIRAM S, et al. Emergence of Locomotion Behaviours in Rich Environments: abs/1707.02286[A]. 2017.
[57] BARTH-MARON G, HOFFMAN M W, BUDDEN D, et al. Distributed distributional deterministic policy gradients: abs/1804.08617[A]. 2018.
[58] ESPEHOLT L, SOYER H, MUNOS R, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures[C]//Proceedings of the 35th International Conference on Machine Learning (ICML 2018): volume 80. PMLR, 2018: 1407-1416.
[59] YAZDANI D, CHENG R, YAZDANI D, et al. A Survey of Evolutionary Continuous Dynamic Optimization Over Two Decades—Part A[J]. IEEE Transactions on Evolutionary Computation, 2021, 25(4): 609-629.
[60] MIRJALILI S. Genetic Algorithm: volume 780[M]. Springer, Cham, 2019: 43-55.
[61] DAS S, MULLICK S S, SUGANTHAN P. Recent advances in differential evolution – An updated survey[J]. Swarm and Evolutionary Computation, 2016, 27: 1-30.
[62] QIAN H, YU Y. Derivative-free reinforcement learning: a review[J]. Frontiers of Computer Science, 2021, 15(6): 156336.
[63] LIU G, ZHAO L, YANG F, et al. Trust Region Evolution Strategies[J]. Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019, 33(01): 4352-4359.
[64] KHADKA S, MAJUMDAR S, NASSAR T, et al. Collaborative Evolutionary Reinforcement Learning[C]//Proceedings of the 36th International Conference on Machine Learning (ICML 2019): volume 97. PMLR, 2019: 3341-3350.
[65] MAJUMDAR S, KHADKA S, MIRET S, et al. Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination[C]//Proceedings of the 37th International Conference on Machine Learning (ICML 2020): volume 119. PMLR, 2020: 6651-6660.
[66] SUN H L, LIANG H J, ZHANG D S. Development of US military autonomous unmanned systems and its implications[J]. Ship Electronic Engineering, 2022, 42(7): 1-4.
[67] JULIANI A, BERGES V P, TENG E, et al. Unity: A general platform for intelligent agents: abs/1809.02627[A]. 2018.
[68] JULIANI A, KHALIFA A, BERGES V P, et al. Obstacle tower: A generalization challenge in vision, control, and planning: abs/1902.01378[A]. 2019.
[69] KOLVE E, MOTTAGHI R, HAN W, et al. AI2-THOR: An Interactive 3D Environment for Visual AI: abs/1712.05474[A]. 2017.
[70] ZHU Y, MOTTAGHI R, KOLVE E, et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning[C]//Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2017). Singapore: IEEE, 2017: 3357-3364.
[71] YAN C, MISRA D, BENNETT A, et al. Chalet: Cornell house agent learning environment: abs/1801.07357[A]. 2018.
[72] ANDRYCHOWICZ O M, BAKER B, CHOCIEJ M, et al. Learning Dexterous In-Hand Manipulation[J]. The International Journal of Robotics Research, 2020, 39(1): 3-20.
[73] BEHBAHANI F, SHIARLIS K, CHEN X, et al. Learning From Demonstration in the Wild [C]//Proceedings of the International Conference on Robotics and Automation (ICRA 2019). Montreal, QC, Canada: IEEE, 2019: 775-781.
[74] SONG Y, WOJCICKI A, LUKASIEWICZ T, et al. Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence[J]. Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI 2020), 2020, 34(05): 7253-7260.
[75] SWORDMASTER. Air Warfare Pro Template[EB/OL]. https://github.com/swordmaster003/Air-Warfare-Pro.
[76] BERNER C, BROCKMAN G, CHAN B, et al. Dota 2 with Large Scale Deep Reinforcement Learning: abs/1912.06680[A]. 2019.
[77] DING N, SORICUT R. Cold-Start Reinforcement Learning with Softmax Policy Gradient[C]//Proceedings of the 31st Advances in Neural Information Processing Systems (NeurIPS 2017): volume 30. Curran Associates, Inc., 2017.
[78] TONG H, HUANG C, MINKU L L, et al. Surrogate models in evolutionary single-objective optimization: A new taxonomy and experimental study[J]. Information Sciences, 2021, 562: 414-437.
[79] FRANCON O, GONZALEZ S, HODJAT B, et al. Effective Reinforcement Learning through Evolutionary Surrogate-Assisted Prescription[C]//Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2020): number 9. New York, USA: Association for Computing Machinery, 2020: 814-822.
[80] TANG J, CHENG J, XIANG D, et al. Large-Difference-Scale Target Detection Using a Revised Bhattacharyya Distance in SAR Images[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 1-5.