Title

Research on Evolutionary Reinforcement Learning Model for Maneuvering Decisions

Alternative Title
RESEARCH ON EVOLUTIONARY REINFORCEMENT LEARNING MODEL FOR MANEUVERING DECISIONS
Name
唐岚
Name in Pinyin
TANG Lan
Student ID
12032497
Degree Type
Master's
Degree Discipline
0809 Electronic Science and Technology
Subject Category / Professional Degree Category
08 Engineering
Supervisor
唐珂 (TANG Ke)
Supervisor's Affiliation
Department of Computer Science and Engineering
Thesis Defense Date
2023-05-13
Thesis Submission Date
2023-07-07
Degree-Granting Institution
Southern University of Science and Technology
Degree-Granting Location
Shenzhen
Abstract

Over the past decade, with the rise of deep neural networks, deep reinforcement learning has seen explosive growth, and evolutionary reinforcement learning has been widely adopted because it combines the strengths of evolutionary computation and reinforcement learning. However, evolutionary reinforcement learning generally suffers from low sample efficiency: the policy model must interact with the environment many times before training completes. When training maneuvering decision models, agent-environment interaction is usually very costly, for example causing wear on, or even crashes of, UAV components. This study therefore aims to reduce the number of interactions as far as possible and to investigate the key technologies for making maneuvering decision models autonomous and intelligent.

To this end, this thesis first surveys existing air combat simulators and finds that six-degree-of-freedom (6-DOF) UAV models, which better match real aircraft design, have rarely been studied. This thesis therefore builds an intelligent 6-DOF UAV air combat game simulator. To analyze the simulator's feasibility, classic gradient-based reinforcement learning algorithms, such as Proximal Policy Optimization (PPO), are studied; experiments show that the simulator's reward shaping works well. To improve sample efficiency, this thesis proposes a surrogate-model-assisted negatively correlated search algorithm (PE-FCPS-NCS), which introduces a surrogate model into the evolutionary reinforcement learning framework to pre-screen clearly unsuitable policies before evaluation in the real environment. Experiments on the classic reinforcement learning benchmark of Atari games and on the proposed air combat simulator show that the proposed method achieves competitive performance.
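The pre-screening idea described above can be illustrated with a minimal sketch, assuming a simple ES-style loop and a k-NN surrogate; all names, the toy objective, and the hyperparameters below are illustrative assumptions, not the thesis's actual PE-FCPS-NCS implementation. Each parent produces several candidate mutations, a cheap surrogate fitted on previously evaluated (parameters, fitness) pairs ranks them, and only the surrogate's top pick is sent to the expensive real evaluation:

```python
# Minimal sketch of surrogate-assisted pre-selection in an ES-style loop.
# Hypothetical names and hyperparameters; the diversity-driven negatively
# correlated search component of the actual PE-FCPS-NCS is omitted here.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def real_fitness(theta: np.ndarray) -> float:
    """Stand-in for an expensive rollout (e.g., episodes in the simulator)."""
    return float(-np.sum((theta - 1.0) ** 2))  # toy objective


rng = np.random.default_rng(0)
dim, pop_size, n_candidates, sigma = 20, 8, 5, 0.1
parents = rng.normal(size=(pop_size, dim))
parent_scores = np.array([real_fitness(p) for p in parents])
archive_x, archive_y = list(parents), list(parent_scores)

for gen in range(50):
    # Fit a cheap surrogate on every (parameters, fitness) pair seen so far.
    surrogate = KNeighborsRegressor(n_neighbors=min(5, len(archive_y)))
    surrogate.fit(np.array(archive_x), np.array(archive_y))

    for i, parent in enumerate(parents):
        # Generate several mutations, but pre-screen them with the surrogate
        # so that only ONE real evaluation (instead of n_candidates) is run.
        candidates = parent + sigma * rng.normal(size=(n_candidates, dim))
        best = candidates[np.argmax(surrogate.predict(candidates))]
        score = real_fitness(best)            # single expensive evaluation
        archive_x.append(best)
        archive_y.append(score)
        if score > parent_scores[i]:          # greedy (1+1)-style replacement
            parents[i], parent_scores[i] = best, score
```

Under this scheme each generation costs pop_size real rollouts rather than pop_size times n_candidates, which is the kind of sample-efficiency saving the abstract refers to.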

Keywords
Language
Chinese
Training Category
Independent training
Year of Enrollment
2020
Year of Degree Conferral
2023-06

Academic Degree Assessment Subcommittee
Electronic Science and Technology
Chinese Library Classification Number
TP391.4
Source Repository
Manual submission
Output Type
Thesis
Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/545188
Collection
College of Engineering, Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
TANG Lan. Research on Evolutionary Reinforcement Learning Model for Maneuvering Decisions[D]. Shenzhen: Southern University of Science and Technology, 2023.
Files in This Item
File Name/Size | Document Type | Version | Access Type | License
12032497-唐岚-计算机科学与工程 (19610 KB) | -- | -- | Restricted access | --
