[1] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with Deep Reinforcement Learning: abs/1312.5602[A]. 2013.
[2] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[3] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[4] VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354.
[5] YE D, LIU Z, SUN M, et al. Mastering Complex Control in MOBA Games with Deep Reinforcement Learning[J]. Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI 2020), 2020, 34(04): 6672-6679.
[6] THORP H H. ChatGPT is fun, but not an author[J]. Science, 2023, 379(6630): 313.
[7] SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. MIT press, 2018.
[8] DRUGAN M M. Reinforcement learning versus evolutionary computation: A survey on hybrid algorithms[J]. Swarm and Evolutionary Computation, 2019, 44: 228-246.
[9] YANG P, ZHANG H, YU Y, et al. Evolutionary reinforcement learning via cooperative co-evolutionary negatively correlated search[J]. Swarm and Evolutionary Computation, 2022, 68: 100974.
[10] SALIMANS T, HO J, CHEN X, et al. Evolution strategies as a scalable alternative to reinforcement learning: abs/1703.03864[A]. 2017.
[11] YANG Q, ZHANG J, SHI G, et al. Maneuver Decision of UAV in Short-Range Air Combat Based on Deep Reinforcement Learning[J]. IEEE Access, 2020, 8: 363-378.
[12] SUN Z, PIAO H, YANG Z, et al. Multi-agent hierarchical policy gradient for Air Combat Tactics emergence via self-play[J]. Engineering Applications of Artificial Intelligence, 2021, 98: 104112.
[13] FORTUNATO M, AZAR M G, PIOT B, et al. Noisy Networks for Exploration: abs/1706.10295[A]. 2017.
[14] CHRABASZCZ P, LOSHCHILOV I, HUTTER F. Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari: abs/1802.08842[A]. 2018.
[15] CONTI E, MADHAVAN V, PETROSKI SUCH F, et al. Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents[C]//Proceedings of the 32nd Advances in Neural Information Processing Systems (NeurIPS 2018): volume 31. Curran Associates, Inc., 2018.
[16] TANG K, YANG P, YAO X. Negatively Correlated Search[J]. IEEE Journal on Selected Areas in Communications, 2016, 34(3): 542-550.
[17] PARKER-HOLDER J, PACCHIANO A, CHOROMANSKI K M, et al. Effective Diversity in Population Based Reinforcement Learning[C]//Proceedings of the 34th Advances in Neural Information Processing Systems (NeurIPS 2020): volume 33. Curran Associates, Inc., 2020: 18050-18062.
[18] KHADKA S, TUMER K. Evolution-Guided Policy Gradient in Reinforcement Learning[C]//Proceedings of the 32nd Advances in Neural Information Processing Systems (NeurIPS 2018): volume 31. Curran Associates, Inc., 2018.
[19] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning: abs/1509.02971[A]. 2015.
[20] POURCHOT A, SIGAUD O. CEM-RL: Combining evolutionary and gradient-based methods for policy search[C]//Proceedings of the 7th International Conference on Learning Representations (ICLR 2019). New Orleans, USA: OpenReview.net, 2019.
[21] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing Function Approximation Error in Actor-Critic Methods[C]//Proceedings of the 35th International Conference on Machine Learning (ICML 2018): volume 80. PMLR, 2018: 1587-1596.
[22] LI P, TANG H, HAO J, et al. ERL-Re2: Efficient Evolutionary Reinforcement Learning with Shared State Representation and Individual Policy Representation: abs/2210.17375[A]. 2022.
[23] SONG Z, WANG H, HE C, et al. A Kriging-Assisted Two-Archive Evolutionary Algorithm for Expensive Many-Objective Optimization[J]. IEEE Transactions on Evolutionary Computation, 2021, 25(6): 1013-1027.
[24] STORK J, ZAEFFERER M, BARTZ-BEIELSTEIN T, et al. Surrogate Models for Enhancing the Efficiency of Neuroevolution in Reinforcement Learning[C]//Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2019): number 9. New York, USA: Association for Computing Machinery, 2019: 934-942.
[25] WANG Y, ZHANG T, CHANG Y, et al. A surrogate-assisted controller for expensive evolutionary reinforcement learning[J]. Information Sciences, 2022, 616: 539-557.
[26] MASCHLER M, ZAMIR S, SOLAN E. Game theory[M]. Cambridge University Press, 2020.
[27] MGUNI D H, WU Y, DU Y, et al. Learning in Nonzero-Sum Stochastic Games with Potentials[C]//Proceedings of the 38th International Conference on Machine Learning (ICML 2021): volume 139. PMLR, 2021: 7688-7699.
[28] ZHANG R, ZONG Q, ZHANG X, et al. Game of Drones: Multi-UAV Pursuit-Evasion Game With Online Motion Planning by Deep Reinforcement Learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022: 1-10.
[29] CHE J, QIAN W Q, HE Z C. Air combat simulation of two-aircraft attack-defense confrontation based on matrix games[J]. Flight Dynamics, 2015(2): 173-177.
[30] RIZK Y, AWAD M, TUNSTEL E W. Decision Making in Multiagent Systems: A Survey[J]. IEEE Transactions on Cognitive and Developmental Systems, 2018, 10(3): 514-529.
[31] PAN Q, ZHOU D, HUANG J, et al. Maneuver decision for cooperative close-range air combat based on state predicted influence diagram[C]//Proceedings of the IEEE International Conference on Information and Automation (ICIA 2017). Macau, China: IEEE, 2017: 726-731.
[32] WEINTRAUB I E, PACHTER M, GARCIA E. An Introduction to Pursuit-evasion Differential Games[C]//Proceedings of the American Control Conference (ACC 2020). Denver, CO, USA: IEEE, 2020: 1049-1066.
[33] PARK H, LEE B Y, TAHK M J, et al. Differential Game Based Air Combat Maneuver Generation Using Scoring Function Matrix[J]. International Journal of Aeronautical and Space Sciences, 2016, 17(2): 204-213.
[34] MCGREW J S, HOW J P, WILLIAMS B, et al. Air-Combat Strategy Using Approximate Dynamic Programming[J]. Journal of Guidance, Control, and Dynamics, 2010, 33(5): 1641-1654.
[35] HUANG C Q, ZHAO K X, HAN B J, et al. A UAV maneuver decision-making method based on approximate dynamic programming[J]. Journal of Electronics & Information Technology, 2018, 40(10): 2447-2452.
[36] LI F, XU F, MENG G, et al. A UAV air-combat decision expert system based on receding horizon control[J]. Journal of Beijing University of Aeronautics and Astronautics, 2015, 41(11): 1994-1999.
[37] ERNEST N, CARROLL D, SCHUMACHER C, et al. Genetic fuzzy based artificial intelligence for unmanned combat aerial vehicle control in simulated air combat missions[J]. Journal of Defense Management, 2016, 6(1): 2167-0374.
[38] KANG Y, PU Z, LIU Z, et al. Air-to-Air Combat Tactical Decision Method Based on SIRMs Fuzzy Logic and Improved Genetic Algorithm[C]//Proceedings of the International Conference on Guidance, Navigation and Control (ICGNC 2022): volume 644. Singapore: Springer Singapore, 2022: 3699-3709.
[39] ZHANG H, HUANG C. Maneuver Decision-Making of Deep Learning for UCAV Thorough Azimuth Angles[J]. IEEE Access, 2020, 8: 12976-12987.
[40] HU D, YANG R, ZUO J, et al. Application of Deep Reinforcement Learning in Maneuver Planning of Beyond-Visual-Range Air Combat[J]. IEEE Access, 2021, 9: 32282-32297.
[41] WANG Z, LI H, WU H, et al. Improving Maneuver Strategy in Air Combat by Alternate Freeze Games with a Deep Reinforcement Learning Algorithm[J]. Mathematical Problems in Engineering, 2020, 2020: 1-17.
[42] POPE A P, IDE J S, MIĆOVIĆ D, et al. Hierarchical Reinforcement Learning for Air Combat at DARPA's AlphaDogfight Trials[J]. IEEE Transactions on Artificial Intelligence, 2022: 1-15.
[43] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal Policy Optimization Algorithms: abs/1707.06347[A]. 2017.
[44] QIAN H, HU Y Q, YU Y. Derivative-Free Optimization of High-Dimensional Non-Convex Functions by Sequential Random Embeddings[C]//Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016). New York, USA: AAAI Press, 2016: 1946-1952.
[45] ZHOU A, ZHANG J, SUN J, et al. Fuzzy-Classification Assisted Solution Preselection in Evolutionary Optimization[J]. Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019, 33(01): 2403-2410.
[46] GEIST M, SCHERRER B, PIETQUIN O. A Theory of Regularized Markov Decision Processes[C]//Proceedings of the 36th International Conference on Machine Learning (ICML 2019): volume 97. PMLR, 2019: 2160-2169.
[47] LEE H, SONG C, KIM N, et al. Comparative Analysis of Energy Management Strategies for HEV: Dynamic Programming and Reinforcement Learning[J]. IEEE Access, 2020, 8: 67112-67123.
[48] JANZ D, HRON J, MAZUR P A, et al. Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning[C]//Proceedings of the 33rd Advances in Neural Information Processing Systems (NeurIPS 2019): volume 32. Curran Associates, Inc., 2019.
[49] WANG X, WANG S, LIANG X, et al. Deep Reinforcement Learning: A Survey[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022: 1-15.
[50] HAARNOJA T, ZHOU A, HARTIKAINEN K, et al. Soft actor-critic algorithms and applications: abs/1812.05905[A]. 2018.
[51] EYSENBACH B, GUPTA A, IBARZ J, et al. Diversity is all you need: Learning skills without a reward function: abs/1802.06070[A]. 2018.
[52] ZHANG Y, YU W, TURK G. Learning Novel Policies For Tasks[C]//Proceedings of the 36th International Conference on Machine Learning (ICML 2019): volume 97. PMLR, 2019: 7483-7492.
[53] SILVER D, LEVER G, HEESS N, et al. Deterministic Policy Gradient Algorithms[C]//Proceedings of the 31st International Conference on Machine Learning (ICML 2014): volume 32. Bejing, China: PMLR, 2014: 387-395.
[54] SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust region policy optimization[C]//Proceedings of the 32nd International Conference on Machine Learning (ICML 2015): volume 37. Lille, France: PMLR, 2015: 1889-1897.
[55] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous Methods for Deep Reinforcement Learning[C]//Proceedings of the 33rd International Conference on Machine Learning (ICML 2016): volume 48. New York, USA: PMLR, 2016: 1928-1937.
[56] HEESS N, TB D, SRIRAM S, et al. Emergence of Locomotion Behaviours in Rich Environments: abs/1707.02286[A]. 2017.
[57] BARTH-MARON G, HOFFMAN M W, BUDDEN D, et al. Distributed distributional deterministic policy gradients: abs/1804.08617[A]. 2018.
[58] ESPEHOLT L, SOYER H, MUNOS R, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures[C]//Proceedings of the 35th International Conference on Machine Learning (ICML 2018): volume 80. PMLR, 2018: 1407-1416.
[59] YAZDANI D, CHENG R, YAZDANI D, et al. A Survey of Evolutionary Continuous Dynamic Optimization Over Two Decades—Part A[J]. IEEE Transactions on Evolutionary Computation, 2021, 25(4): 609-629.
[60] MIRJALILI S. Genetic Algorithm: volume 780[M]. Springer, Cham, 2019: 43-55.
[61] DAS S, MULLICK S S, SUGANTHAN P. Recent advances in differential evolution – An updated survey[J]. Swarm and Evolutionary Computation, 2016, 27: 1-30.
[62] QIAN H, YU Y. Derivative-free reinforcement learning: a review[J]. Frontiers of Computer Science, 2021, 15(6): 156336.
[63] LIU G, ZHAO L, YANG F, et al. Trust Region Evolution Strategies[J]. Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019, 33(01): 4352-4359.
[64] KHADKA S, MAJUMDAR S, NASSAR T, et al. Collaborative Evolutionary Reinforcement Learning[C]//Proceedings of the 36th International Conference on Machine Learning (ICML 2019): volume 97. PMLR, 2019: 3341-3350.
[65] MAJUMDAR S, KHADKA S, MIRET S, et al. Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination[C]//Proceedings of the 37th International Conference on Machine Learning (ICML 2020): volume 119. PMLR, 2020: 6651-6660.
[66] SUN H L, LIANG H J, ZHANG D S. Development of US military autonomous unmanned systems and its implications[J]. Ship Electronic Engineering, 2022, 42(7): 1-4.
[67] JULIANI A, BERGES V P, TENG E, et al. Unity: A general platform for intelligent agents: abs/1809.02627[A]. 2018.
[68] JULIANI A, KHALIFA A, BERGES V P, et al. Obstacle tower: A generalization challenge in vision, control, and planning: abs/1902.01378[A]. 2019.
[69] KOLVE E, MOTTAGHI R, HAN W, et al. AI2-THOR: An Interactive 3D Environment for Visual AI: abs/1712.05474[A]. 2017.
[70] ZHU Y, MOTTAGHI R, KOLVE E, et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning[C]//Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2017). Singapore: IEEE, 2017: 3357-3364.
[71] YAN C, MISRA D, BENNETT A, et al. Chalet: Cornell house agent learning environment: abs/1801.07357[A]. 2018.
[72] ANDRYCHOWICZ O M, BAKER B, CHOCIEJ M, et al. Learning Dexterous In-Hand Manipulation[J]. The International Journal of Robotics Research, 2020, 39(1): 3-20.
[73] BEHBAHANI F, SHIARLIS K, CHEN X, et al. Learning From Demonstration in the Wild [C]//Proceedings of the International Conference on Robotics and Automation (ICRA 2019). Montreal, QC, Canada: IEEE, 2019: 775-781.
[74] SONG Y, WOJCICKI A, LUKASIEWICZ T, et al. Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence[J]. Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI 2020), 2020, 34(05): 7253-7260.
[75] SWORDMASTER. Air Warfare Pro Template[EB/OL]. https://github.com/swordmaster003/Air-Warfare-Pro.
[76] BERNER C, BROCKMAN G, CHAN B, et al. Dota 2 with Large Scale Deep Reinforcement Learning: abs/1912.06680[A]. 2019.
[77] DING N, SORICUT R. Cold-Start Reinforcement Learning with Softmax Policy Gradient[C]//Proceedings of the 31st Advances in Neural Information Processing Systems (NeurIPS 2017): volume 30. Curran Associates, Inc., 2017.
[78] TONG H, HUANG C, MINKU L L, et al. Surrogate models in evolutionary single-objective optimization: A new taxonomy and experimental study[J]. Information Sciences, 2021, 562: 414-437.
[79] FRANCON O, GONZALEZ S, HODJAT B, et al. Effective Reinforcement Learning through Evolutionary Surrogate-Assisted Prescription[C]//Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2020): number 9. New York, USA: Association for Computing Machinery, 2020: 814-822.
[80] TANG J, CHENG J, XIANG D, et al. Large-Difference-Scale Target Detection Using a Revised Bhattacharyya Distance in SAR Images[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 1-5.