[1] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with Deep Reinforcement Learning[J]. CoRR, 2013, abs/1312.5602.
[2] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[3] VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354.
[4] WILLIAMS R J. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning[J]. Machine Learning, 1992, 8(3): 229-256.
[5] RUMMERY G A, NIRANJAN M. On-line Q-learning using connectionist systems: 166[R]. Cambridge University Engineering Department, 1994.
[6] WATKINS C J C H, DAYAN P. Q-learning[J]. Machine Learning, 1992, 8(3): 279-292.
[7] KONDA V, TSITSIKLIS J. Actor-Critic Algorithms[C]//Advances in Neural Information Processing Systems: volume 12. Denver, CO, USA: MIT Press, 1999: 1008-1014.
[8] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[9] SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized Experience Replay[C]//4th International Conference on Learning Representations, ICLR 2016. San Juan, Puerto Rico, 2016: 1-21.
[10] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous Methods for Deep Reinforcement Learning[C]//Proceedings of Machine Learning Research: volume 48 Proceedings of the 33rd International Conference on Machine Learning, ICML 2016. New York, NY, USA: JMLR.org, 2016: 1928-1937.
[11] SILVER D, LEVER G, HEESS N, et al. Deterministic Policy Gradient Algorithms[C]//Proceedings of Machine Learning Research: volume 32 Proceedings of the 31st International Conference on Machine Learning, ICML 2014. Beijing, China: JMLR.org, 2014: 387-395.
[12] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous Control with Deep Reinforcement Learning[C]//4th International Conference on Learning Representations, ICLR 2016. San Juan, Puerto Rico, 2016: 1-14.
[13] PLAPPERT M, HOUTHOOFT R, DHARIWAL P, et al. Parameter Space Noise for Exploration[C]//6th International Conference on Learning Representations, ICLR 2018. Vancouver, BC, Canada: OpenReview.net, 2018: 1-18.
[14] FORTUNATO M, AZAR M G, PIOT B, et al. Noisy Networks For Exploration[C]//6th International Conference on Learning Representations, ICLR 2018. Vancouver, BC, Canada: OpenReview.net, 2018: 1-21.
[15] OSTROVSKI G, BELLEMARE M G, VAN DEN OORD A, et al. Count-Based Exploration with Neural Density Models[C]//Proceedings of Machine Learning Research: volume 70 Proceedings of the 34th International Conference on Machine Learning, ICML 2017. Sydney, NSW, Australia: PMLR, 2017: 2721-2730.
[16] TANG H, HOUTHOOFT R, FOOTE D, et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning[C]//Advances in Neural Information Processing Systems: volume 30. Long Beach, CA, USA: Curran Associates, Inc., 2017: 2753-2762.
[17] BELLEMARE M, SRINIVASAN S, OSTROVSKI G, et al. Unifying Count-Based Exploration and Intrinsic Motivation[C]//Advances in Neural Information Processing Systems: volume 29. Barcelona, Spain: Curran Associates, Inc., 2016: 1471-1479.
[18] PATHAK D, AGRAWAL P, EFROS A A, et al. Curiosity-driven Exploration by Self-supervised Prediction[C]//Proceedings of Machine Learning Research: volume 70 Proceedings of the 34th International Conference on Machine Learning, ICML 2017. Sydney, NSW, Australia: PMLR, 2017: 2778-2787.
[19] STANLEY K O, MIIKKULAINEN R. Efficient Reinforcement Learning through Evolving Neural Network Topologies[C]//GECCO’02: Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation. New York, NY, USA: Morgan Kaufmann, 2002: 569-577.
[20] SALIMANS T, HO J, CHEN X, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning[J]. CoRR, 2017, abs/1703.03864.
[21] LIU G, ZHAO L, YANG F, et al. Trust Region Evolution Strategies[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(01): 4352-4359.
[22] FUKS L, AWAD N, HUTTER F, et al. An Evolution Strategy with Progressive Episode Lengths for Playing Games[C]//Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. Macao, China: International Joint Conferences on Artificial Intelligence Organization, 2019: 1234-1240.
[23] JADERBERG M, DALIBARD V, OSINDERO S, et al. Population Based Training of Neural Networks[J]. CoRR, 2017, abs/1711.09846.
[24] FAUST A, FRANCIS A G, MEHTA D. Evolving Rewards to Automate Reinforcement Learning[J]. CoRR, 2019, abs/1905.07628.
[25] HOUTHOOFT R, CHEN Y, ISOLA P, et al. Evolved Policy Gradients[C]//Advances in Neural Information Processing Systems: volume 31. Montréal, Canada: Curran Associates, Inc., 2018: 5405-5414.
[26] KHADKA S, TUMER K. Evolution-Guided Policy Gradient in Reinforcement Learning[C]//Advances in Neural Information Processing Systems: volume 31. Montréal, Canada: Curran Associates, Inc., 2018: 1196-1208.
[27] SUTTON R S. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding[C]//Advances in Neural Information Processing Systems: volume 8. Denver, CO, USA: MIT Press, 1995: 1038-1044.
[28] SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction[M]. Cambridge, MA, USA: A Bradford Book, 2018.
[29] SUTTON R S, MCALLESTER D, SINGH S, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation[C]//Advances in Neural Information Processing Systems: volume 12. Denver, CO, USA: MIT Press, 1999: 1057-1063.
[30] SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust Region Policy Optimization[C]//Proceedings of Machine Learning Research: volume 37 Proceedings of the 32nd International Conference on Machine Learning, ICML 2015. Lille, France: PMLR, 2015: 1889-1897.
[31] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal Policy Optimization Algorithms[J]. CoRR, 2017, abs/1707.06347.
[32] DEGRIS T, WHITE M, SUTTON R S. Off-Policy Actor-Critic[C]//Proceedings of Machine Learning Research: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. Madison, WI, USA: Omnipress, 2012: 179-186.
[33] WANG Z, BAPST V, HEESS N, et al. Sample Efficient Actor-Critic with Experience Replay[C]//5th International Conference on Learning Representations, ICLR 2017. Toulon, France: OpenReview.net, 2017: 1-20.
[34] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing Function Approximation Error in Actor-Critic Methods[C]//Proceedings of Machine Learning Research: volume 80 Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Stockholm, Sweden: PMLR, 2018: 1582-1591.
[35] WIERSTRA D, SCHAUL T, GLASMACHERS T, et al. Natural Evolution Strategies[J]. Journal of Machine Learning Research, 2014, 15(27): 949-980.
[36] CHRABĄSZCZ P, LOSHCHILOV I, HUTTER F. Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari[C]//Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18. Stockholm, Sweden: International Joint Conferences on Artificial Intelligence Organization, 2018: 1419-1426.
[37] HANSEN N, OSTERMEIER A. Completely Derandomized Self-Adaptation in Evolution Strategies[J]. Evolutionary Computation, 2001, 9(2): 159-195.
[38] HEIDRICH-MEISNER V, IGEL C. Evolution Strategies for Direct Policy Search[C]//Parallel Problem Solving from Nature – PPSN X. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008: 428-437.
[39] HEIDRICH-MEISNER V, IGEL C. Neuroevolution strategies for episodic reinforcement learning[J]. Journal of Algorithms, 2009, 64(4): 152-168.
[40] RUMELHART D E, HINTON G E, WILLIAMS R J. Learning Representations by Back-Propagating Errors[M]. Cambridge, MA, USA: MIT Press, 1988: 696-699.
[41] KINGMA D P, BA J. Adam: A Method for Stochastic Optimization[C]//3rd International Conference on Learning Representations, ICLR 2015. San Diego, CA, USA, 2015: 1-15.
[42] LOSHCHILOV I, HUTTER F. Decoupled Weight Decay Regularization[C]//7th International Conference on Learning Representations, ICLR 2019. New Orleans, LA, USA: OpenReview.net, 2019: 1-8.
[43] KHADKA S, MAJUMDAR S, NASSAR T, et al. Collaborative Evolutionary Reinforcement Learning[C]//Proceedings of Machine Learning Research: volume 97 Proceedings of the 36th International Conference on Machine Learning, ICML 2019. Long Beach, CA, USA: PMLR, 2019: 3341-3350.
[44] BODNAR C, DAY B, LIÓ P. Proximal Distilled Evolutionary Reinforcement Learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence: volume 34. New York, NY, USA, 2020: 3283-3290.
[45] POURCHOT A, SIGAUD O. CEM-RL: Combining evolutionary and gradient-based methods for policy search[C]//7th International Conference on Learning Representations, ICLR 2019. New Orleans, LA, USA: OpenReview.net, 2019: 1-19.
[46] RUDOLPH G. Convergence properties of evolutionary algorithms[M]. Kovac, 1997.
[47] MARCHESINI E, CORSI D, FARINELLI A. Genetic Soft Updates for Policy Evolution in Deep Reinforcement Learning[C]//9th International Conference on Learning Representations, ICLR 2021. Vienna, Austria: OpenReview.net, 2021: 1-15.
[48] TANG Y. Guiding Evolutionary Strategies with Off-Policy Actor-Critic[C]//AAMAS ’21: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems. Richland, SC, USA: International Foundation for Autonomous Agents and Multiagent Systems, 2021: 1317-1325.
[49] LEE K, LEE B U, SHIN U, et al. An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search[C]//Advances in Neural Information Processing Systems: volume 33. Vancouver, British Columbia, Canada: Curran Associates, Inc., 2020: 10124-10135.
[50] ZIMMER M, WENG P. Exploiting the Sign of the Advantage Function to Learn Deterministic Policies in Continuous Domains[C]//Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. Macao, China: International Joint Conferences on Artificial Intelligence Organization, 2019: 4496-4502.
[51] NACHUM O, NOROUZI M, TUCKER G, et al. Smoothed Action Value Functions for Learning Gaussian Policies[C]//Proceedings of Machine Learning Research: volume 80 Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Stockholm, Sweden: PMLR, 2018: 3692-3700.
[52] SINHA S, SONG J, GARG A, et al. Experience Replay with Likelihood-free Importance Weights[C]//Proceedings of Machine Learning Research: volume 168 Proceedings of The 4th Annual Learning for Dynamics and Control Conference. Stanford, CA, USA: PMLR, 2022: 110-123.
[53] TODOROV E, EREZ T, TASSA Y. MuJoCo: A physics engine for model-based control[C]//2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vilamoura, Algarve, Portugal, 2012: 5026-5033.
[54] BROCKMAN G, CHEUNG V, PETTERSSON L, et al. OpenAI Gym[J]. CoRR, 2016, abs/1606.01540.
[55] NGUYEN H T, TRAN K, LUONG N H. Combining Soft-Actor Critic with Cross-Entropy Method for Policy Search in Continuous Control[C]//IEEE Congress on Evolutionary Computation, CEC 2022. Padua, Italy: IEEE, 2022: 1-8.
[56] HAARNOJA T, ZHOU A, HARTIKAINEN K, et al. Soft Actor-Critic Algorithms and Applications[J]. CoRR, 2018, abs/1812.05905.
[57] ZHANG S, SUTTON R S. A Deeper Look at Experience Replay[J]. CoRR, 2017, abs/1712.01275.
[58] MORITZ P, NISHIHARA R, WANG S, et al. Ray: A Distributed Framework for Emerging AI Applications[C]//13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018. Carlsbad, CA, USA: USENIX Association, 2018: 561-577.
[59] LIANG E, LIAW R, NISHIHARA R, et al. RLlib: Abstractions for Distributed Reinforcement Learning[C]//Proceedings of Machine Learning Research: volume 80 Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Stockholm, Sweden: PMLR, 2018: 3053-3062.
[60] LEVINE S, KUMAR A, TUCKER G, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems[J]. CoRR, 2020, abs/2005.01643.
[61] FUJIMOTO S, GU S. A Minimalist Approach to Offline Reinforcement Learning[C]//Advances in Neural Information Processing Systems: volume 34. Virtual Event: Curran Associates, Inc., 2021: 20132-20145.