[1] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[2] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint, arXiv:1509.02971, 2015.
[3] LIU X Y, YANG H, CHEN Q, et al. FinRL: A deep reinforcement learning library for automated stock trading in quantitative finance[J]. arXiv preprint, arXiv:2011.09607, 2020.
[4] STAHL N, FALKMAN G, KARLSSON A, et al. Deep reinforcement learning for multiparameter optimization in de novo drug design[J]. Journal of Chemical Information and Modeling, 2019, 59(7): 3166-3176.
[5] BELLMAN R. Dynamic programming and stochastic control processes[J]. Information and Control, 1958, 1(3): 228-239.
[6] DEGRAVE J, FELICI F, BUCHLI J, et al. Magnetic control of tokamak plasmas through deep reinforcement learning[J]. Nature, 2022, 602(7897): 414-419.
[7] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[8] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//BALCAN M, WEINBERGER K Q. Proceedings of the 33rd International Conference on Machine Learning. San Diego, CA, USA: JMLR, 2016: 1928-1937.
[9] ECOFFET A, HUIZINGA J, LEHMAN J, et al. First return, then explore[J]. Nature, 2021, 590(7847): 580-586.
[10] SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust region policy optimization[C]//Proceedings of the 32nd International Conference on Machine Learning. San Diego, CA, USA: JMLR, 2015: 1889-1897.
[11] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[J]. arXiv preprint, arXiv:1707.06347, 2017.
[12] SALIMANS T, HO J, CHEN X, et al. Evolution strategies as a scalable alternative to reinforcement learning[J]. arXiv preprint, arXiv:1703.03864, 2017.
[13] CHRABASZCZ P, LOSHCHILOV I, HUTTER F. Back to basics: Benchmarking canonical evolution strategies for playing Atari[C]//LANG J. Proceedings of the 27th International Joint Conference on Artificial Intelligence. San Francisco, CA, USA: Morgan Kaufmann, 2018: 1419-1426.
[14] CONTI E, MADHAVAN V, SUCH F P, et al. Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents[C]//BENGIO S, WALLACH H M, LAROCHELLE H, et al. Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2018: 5032-5043.
[15] YANG P, ZHANG H, YU Y, et al. Evolutionary reinforcement learning via cooperative coevolutionary negatively correlated search[J]. Swarm and Evolutionary Computation, 2022, 68: 100974.
[16] LIU J, SNODGRASS S, KHALIFA A, et al. Deep learning for procedural content generation[J]. Neural Computing and Applications, 2021, 33(1): 19-37.
[17] JIANG M, GREFENSTETTE E, ROCKTÄSCHEL T. Prioritized level replay[C]//MEILA M, ZHANG T. Proceedings of the 38th International Conference on Machine Learning. San Diego, CA, USA: JMLR, 2021: 4940-4950.
[18] SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. Cambridge, MA, USA: MIT Press, 1998.
[19] WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3): 229-256.
[20] KONDA V R, TSITSIKLIS J N. Actor-critic algorithms[C]//SOLLA S A, LEEN T K, MÜLLER K R. Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 1999: 1008-1014.
[21] DE JONG K A. Evolutionary computation: A unified approach[M]. Cambridge, MA, USA: MIT Press, 2006.
[22] YAO X. Evolving artificial neural networks[J]. Proceedings of the IEEE, 1999, 87(9): 1423-1447.
[23] YU Y L. Research on evolutionary reinforcement learning algorithms based on negatively correlated search[D]. Shenzhen: Harbin Institute of Technology, 2020. (in Chinese)
[24] MA X, LI X, ZHANG Q, et al. A survey on cooperative co-evolutionary algorithms[J]. IEEE Transactions on Evolutionary Computation, 2019, 23(3): 421-441.
[25] CHEN B, CASTRO R M, KRAUSE A. Joint optimization and variable selection of high-dimensional Gaussian processes[C]//Proceedings of the 29th International Conference on Machine Learning. San Diego, CA, USA: JMLR, 2012.
[26] CARPENTIER A, MUNOS R. Bandit theory meets compressed sensing for high dimensional stochastic linear bandit[C]//LAWRENCE N D, GIROLAMI M A. Proceedings of the 15th International Conference on Artificial Intelligence and Statistics. San Diego, CA, USA: JMLR, 2012: 190-198.
[27] DJOLONGA J, KRAUSE A, CEVHER V. High-dimensional Gaussian process bandits[C]//BURGES C J C, BOTTOU L, GHAHRAMANI Z, et al. Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2013: 1025-1033.
[28] KABÁN A, BOOTKRAJANG J, DURRANT R J. Towards large scale continuous EDA: A random matrix theory perspective[J]. Evolutionary Computation, 2016, 24(2): 255-291.
[29] WANG Z, ZOGHI M, HUTTER F, et al. Bayesian optimization in high dimensions via random embeddings[C]//ROSSI F. Proceedings of the 23rd International Joint Conference on Artificial Intelligence. San Francisco, CA, USA: Morgan Kaufmann, 2013: 1778-1784.
[30] BINOIS M, GINSBOURGER D, ROUSTANT O. On the choice of the low-dimensional domain for global optimization via random embeddings[J]. Journal of Global Optimization, 2020, 76(1): 69-90.
[31] QIAN H, HU Y, YU Y. Derivative-free optimization of high-dimensional non-convex functions by sequential random embeddings[C]//KAMBHAMPATI S. Proceedings of the 25th International Joint Conference on Artificial Intelligence. San Francisco, CA, USA: Morgan Kaufmann, 2016: 1946-1952.
[32] QIAN H, YU Y. Scaling simultaneous optimistic optimization for high-dimensional non-convex functions with low effective dimensions[C]//SCHUURMANS D, WELLMAN M P. Proceedings of the 30th AAAI Conference on Artificial Intelligence. Palo Alto, CA, USA: AAAI Press, 2016: 2000-2006.
[33] QIAN H, YU Y. Solving high-dimensional multi-objective optimization problems with low effective dimensions[C]//SINGH S P, MARKOVITCH S. Proceedings of the 31st AAAI Conference on Artificial Intelligence. Palo Alto, CA, USA: AAAI Press, 2017: 875-881.
[34] ČREPINŠEK M, LIU S H, MERNIK M. Exploration and exploitation in evolutionary algorithms: A survey[J]. ACM Computing Surveys, 2013, 45(3): 1-33.
[35] TANG K, YANG P, YAO X. Negatively correlated search[J]. IEEE Journal on Selected Areas in Communications, 2016, 34(3): 542-550.
[36] CHATZILYGEROUDIS K I, RAMA R, KAUSHIK R, et al. Black-box data-efficient policy search for robotics[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway, NJ, USA: IEEE, 2017: 51-58.
[37] YANG P. Research on intelligent optimization methods based on automatic divide-and-conquer and their applications[D]. Hefei: University of Science and Technology of China, 2017. (in Chinese)
[38] SCHULMAN J, MORITZ P, LEVINE S, et al. High-dimensional continuous control using generalized advantage estimation[C]//BENGIO Y, LECUN Y. Proceedings of International Conference on Learning Representations. ICLR Organizing Committee, 2016.
[39] SUCH F P, MADHAVAN V, CONTI E, et al. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning[J]. arXiv preprint, arXiv:1712.06567, 2017.
[40] DUCHI J C, JORDAN M I, WAINWRIGHT M J, et al. Optimal rates for zero-order convex optimization: The power of two function evaluations[J]. IEEE Transactions on Information Theory, 2015, 61(5): 2788-2806.
[41] BROCKMAN G, CHEUNG V, PETTERSSON L, et al. OpenAI Gym[J]. arXiv preprint, arXiv:1606.01540, 2016.
[42] LI C, FARKHOOR H, LIU R, et al. Measuring the intrinsic dimension of objective landscapes[C]//Proceedings of International Conference on Learning Representations. ICLR Organizing Committee, 2018.
[43] AGHAJANYAN A, GUPTA S, ZETTLEMOYER L. Intrinsic dimensionality explains the effectiveness of language model fine-tuning[C]//ZONG C, XIA F, LI W, et al. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Association for Computational Linguistics, 2021: 7319-7328.
[44] BOTTOU L. Large-scale machine learning with stochastic gradient descent[C]//LECHEVALLIER Y, SAPORTA G. Proceedings of the 19th International Conference on Computational Statistics. Heidelberg, Germany: Physica-Verlag, 2010: 177-186.
[45] KINGMA D P, BA J. Adam: A method for stochastic optimization[C]//BENGIO Y, LECUN Y. Proceedings of International Conference on Learning Representations. ICLR Organizing Committee, 2015.
[46] RUDER S. An overview of gradient descent optimization algorithms[J]. arXiv preprint, arXiv:1609.04747, 2016.
[47] WIERSTRA D, SCHAUL T, PETERS J, et al. Natural evolution strategies[J]. The Journal of Machine Learning Research, 2014, 15(1): 949-980.
[48] MACHADO M C, BELLEMARE M G, TALVITIE E, et al. Revisiting the arcade learning environment: Evaluation protocols and open problems for general agents[J]. Journal of Artificial Intelligence Research, 2018, 61: 523-562.
[49] KOSTRIKOV I. PyTorch implementations of asynchronous advantage actor critic[EB/OL]. 2019[2019-03-20]. https://github.com/ikostrikov/pytorch-a3c.
[50] TOBIN J, FONG R, RAY A, et al. Domain randomization for transferring deep neural networks from simulation to the real world[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway, NJ, USA: IEEE, 2017: 23-30.
[51] COBBE K, KLIMOV O, HESSE C, et al. Quantifying generalization in reinforcement learning[C]//CHAUDHURI K, SALAKHUTDINOV R. Proceedings of the 36th International Conference on Machine Learning. San Diego, CA, USA: JMLR, 2019: 1282-1289.
[52] ROSIQUE F, NAVARRO P J, FERNÁNDEZ C, et al. A systematic review of perception system and simulators for autonomous vehicles research[J]. Sensors, 2019, 19(3): 648.
[53] ZHANG C, VINYALS O, MUNOS R, et al. A study on overfitting in deep reinforcement learning[J]. arXiv preprint, arXiv:1804.06893, 2018.
[54] PACKER C, GAO K, KOS J, et al. Assessing generalization in deep reinforcement learning[J]. arXiv preprint, arXiv:1810.12282, 2019.
[55] ZHAO C, SIGAUD O, STULP F, et al. Investigating generalisation in continuous deep reinforcement learning[J]. arXiv preprint, arXiv:1902.07015, 2019.
[56] YARATS D, KOSTRIKOV I, FERGUS R. Image augmentation is all you need: Regularizing deep reinforcement learning from pixels[C]//Proceedings of International Conference on Learning Representations. ICLR Organizing Committee, 2021.
[57] LASKIN M, LEE K, STOOKE A, et al. Reinforcement learning with augmented data[C]//LAROCHELLE H, RANZATO M, HADSELL R, et al. Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2020: 19884-19895.
[58] RISI S, TOGELIUS J. Increasing generality in machine learning through procedural content generation[J]. Nature Machine Intelligence, 2020, 2(8): 428-436.
[59] SONAR A, PACELLI V, MAJUMDAR A. Invariant policy optimization: Towards stronger generalization in reinforcement learning[C]//JADBABAIE A, LYGEROS J, PAPPAS G J, et al. Proceedings of the 3rd Annual Conference on Learning for Dynamics and Control. San Diego, CA, USA: PMLR, 2021: 21-33.
[60] AMIT R, MEIR R, CIOSEK K. Discount factor as a regularizer in reinforcement learning[C]//Proceedings of the 37th International Conference on Machine Learning. San Diego, CA, USA: JMLR, 2020: 269-278.
[61] IGL M, CIOSEK K, LI Y, et al. Generalization in reinforcement learning with selective noise injection and information bottleneck[C]//WALLACH H M, LAROCHELLE H, BEYGELZIMER A, et al. Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2019: 13956-13968.
[62] RAJESWARAN A, LOWREY K, TODOROV E, et al. Towards generalization and simplicity in continuous control[C]//GUYON I, VON LUXBURG U, BENGIO S, et al. Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2017: 6550-6561.
[63] GONG X, JIA H, ZHOU X, et al. Improving policy generalization for teacher-student reinforcement learning[C]//LI G, SHEN H T, YUAN Y, et al. Proceedings of the 13th International Conference on Knowledge Science, Engineering and Management. Cham, Switzerland: Springer, 2020: 39-47.
[64] COHN D A, ATLAS L E, LADNER R E. Improving generalization with active learning[J]. Machine Learning, 1994, 15(2): 201-221.
[65] LEWIS D D, GALE W A. A sequential algorithm for training text classifiers[C]//Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, USA: ACM/Springer, 1994: 3-12.
[66] EPSHTEYN A, VOGEL A, DEJONG G. Active reinforcement learning[C]//COHEN W W, MCCALLUM A, ROWEIS S T. Proceedings of the 25th International Conference on Machine Learning. San Diego, CA, USA: JMLR, 2008: 296-303.