[1] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[2] BERNER C, BROCKMAN G, CHAN B, et al. Dota 2 with large scale deep reinforcement learning[A]. 2019. arXiv preprint arXiv: 1912.06680.
[3] WIERSTRA D P. A study in direct policy search[D]. München: Technische Universität München, 2010.
[4] SIGAUD O, STULP F. Policy search in continuous action domains: an overview[J]. Neural Networks, 2019, 113: 28-40.
[5] HOLLAND J H. Genetic algorithms[J]. Scientific American, 1992, 267(1): 66-73.
[6] BÄCK T, HOFFMEISTER F, SCHWEFEL H P. A survey of evolution strategies[C]//Proceedings of the Fourth International Conference on Genetic Algorithms. Citeseer, 1991.
[7] GONG Y J, CHEN W N, ZHAN Z H, et al. Distributed evolutionary algorithms and their models: a survey of the state-of-the-art[J]. Applied Soft Computing, 2015, 34: 286-300.
[8] DEISENROTH M P, NEUMANN G, PETERS J, et al. A survey on policy search for robotics[J]. Foundations and Trends in Robotics, 2013, 2(1-2): 388-403.
[9] DEISENROTH M, RASMUSSEN C E. PILCO: a model-based and data-efficient approach to policy search[C]//Proceedings of the 28th International Conference on Machine Learning (ICML-11). Citeseer, 2011: 465-472.
[10] PETERS J, MULLING K, ALTUN Y. Relative entropy policy search[C]//Twenty-Fourth AAAI Conference on Artificial Intelligence. 2010.
[11] WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3): 229-256.
[12] RÜCKSTIESS T, SEHNKE F, SCHAUL T, et al. Exploring parameter space in reinforcement learning[J]. Paladyn, 2010, 1(1): 14-24.
[13] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]//International Conference on Machine Learning. PMLR, 2014: 387-395.
[14] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[A]. 2015. arXiv preprint arXiv: 1509.02971.
[15] SUN Y, WIERSTRA D, SCHAUL T, et al. Efficient natural evolution strategies[C]//Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation. 2009: 539-546.
[16] HWANGBO J, GEHRING C, SOMMER H, et al. ROCK∗—efficient black-box optimization for policy learning[C]//2014 IEEE-RAS International Conference on Humanoid Robots. IEEE, 2014: 535-540.
[17] AMARI S I. Natural gradient works efficiently in learning[J]. Neural Computation, 1998, 10(2): 251-276.
[18] ROSENSTEIN M T, BARTO A G. Robot weightlifting by direct policy search[C]//International Joint Conference on Artificial Intelligence: volume 17. Citeseer, 2001: 839-846.
[19] SALIMANS T, HO J, CHEN X, et al. Evolution strategies as a scalable alternative to reinforcement learning[A]. 2017. arXiv preprint arXiv: 1703.03864.
[20] CONTI E, MADHAVAN V, PETROSKI SUCH F, et al. Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents[J]. Advances in Neural Information Processing Systems, 2018, 31.
[21] CHEN Z, ZHOU Y, HE X, et al. A restart-based rank-1 evolution strategy for reinforcement learning[C]//International Joint Conference on Artificial Intelligence (IJCAI). 2019: 2130-2136.
[22] CHOROMANSKI K M, PACCHIANO A, PARKER-HOLDER J, et al. From complexity to simplicity: adaptive ES-active subspaces for blackbox optimization[J]. Advances in Neural Information Processing Systems, 2019, 32.
[23] SUCH F P, MADHAVAN V, CONTI E, et al. Deep neuroevolution: genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning[A]. 2017. arXiv preprint arXiv: 1712.06567.
[24] PEREDO O, ORTIZ J M. Multiple-point geostatistical simulation based on genetic algorithms implemented in a shared-memory supercomputer[M]//Geostatistics Oslo 2012. Springer, 2012: 103-114.
[25] SURAKHI O M, QATAWNEH M, AL OFEISHAT H A. A parallel genetic algorithm for maximum flow problem[J]. International Journal of Advanced Computer Science and Applications, 2017, 8(6): 159-164.
[26] LYU Y Q, LIN J X. Parallel PSO hybrid K-means clustering algorithm based on MPI[J]. Journal of Computer Applications, 2011, 31(2): 428-431.
[27] LI D, LI K, LIANG J, et al. A hybrid particle swarm optimization algorithm for load balancing of MDS on heterogeneous computing systems[J]. Neurocomputing, 2019, 330: 380-393.
[28] XIA W L, WANG L S. Research and implementation of parallel ant colony algorithm based on MapReduce[J]. Electronic Science and Technology, 2013, 26(2): 146-149.
[29] WANG Z Y, WANG H J, XING H L, et al. Ant colony optimization algorithm based on Spark[J]. Journal of Computer Applications, 2015, 35(10): 2777-2780.
[30] DUAN Q, SUN L, SHI Y. Spark clustering computing platform based parallel particle swarm optimizers for computationally expensive global optimization[C]//International Conference on Parallel Problem Solving from Nature. Springer, 2018: 424-435.
[31] YANG C, LI H, REZGUI Y, et al. High throughput computing based distributed genetic algorithm for building energy consumption optimization[J]. Energy and Buildings, 2014, 76: 92-101.
[32] SCHUMAN C D, DISNEY A, SINGH S P, et al. Parallel evolutionary optimization for neuromorphic network training[C]//2016 2nd Workshop on Machine Learning in HPC Environments (MLHPC). IEEE, 2016: 36-46.
[33] GONG Y, FUKUNAGA A. Distributed island-model genetic algorithms using heterogeneous parameter settings[C]//2011 IEEE Congress of Evolutionary Computation (CEC). IEEE, 2011: 820-827.
[34] ZHAO F Q, XU Y, LI G Q. Research on multi-objective evolutionary algorithm based on island population model[J]. Computer Science, 2010, 37(12): 190-192.
[35] LIU Y, ZHAO R, ZHENG K, et al. A hybrid parallel genetic algorithm with dynamic migration strategy based on Sunway many-core processor[C]//2017 IEEE 19th International Conference on High Performance Computing and Communications Workshops (HPCCWS). IEEE, 2017: 9-15.
[36] HEIDRICH-MEISNER V, IGEL C. Evolution strategies for direct policy search[C]//International Conference on Parallel Problem Solving from Nature. Springer, 2008: 428-437.
[37] CHOROMANSKI K, ROWLAND M, SINDHWANI V, et al. Structured evolution with compact architectures for scalable policy optimization[C]//International Conference on Machine Learning. PMLR, 2018: 970-978.
[38] TUNYASUVUNAKOOL S, MULDAL A, DORON Y, et al. dm_control: software and tasks for continuous control[J]. Software Impacts, 2020, 6: 100022.
[39] BEATTIE C, LEIBO J Z, TEPLYASHIN D, et al. DeepMind Lab[A]. 2016. arXiv preprint arXiv: 1612.03801.
[40] FORTUNATO M, TAN M, FAULKNER R, et al. Generalization of reinforcement learners with working and episodic memory[C]//Advances in Neural Information Processing Systems. 2019: 12448-12457.
[41] LEIBO J Z, D’AUTUME C D M, ZORAN D, et al. Psychlab: a psychology laboratory for deep reinforcement learning agents[A]. 2018. arXiv preprint arXiv: 1801.08116.
[42] KURACH K, RAICHUK A, STAŃCZYK P, et al. Google Research Football: a novel reinforcement learning environment[A]. 2019. arXiv preprint arXiv: 1907.11180.
[43] YU T, QUILLEN D, HE Z, et al. Meta-world: a benchmark and evaluation for multi-task and meta reinforcement learning[C]//Conference on Robot Learning. PMLR, 2020: 1094-1100.
[44] BAKER B, KANITSCHEIDER I, MARKOV T, et al. Emergent tool use from multi-agent autocurricula[A]. 2019. arXiv preprint arXiv: 1909.07528.
[45] BROCKMAN G, CHEUNG V, PETTERSSON L, et al. OpenAI Gym[A]. 2016. arXiv preprint arXiv: 1606.01540.
[46] NICHOL A, PFAU V, HESSE C, et al. Gotta learn fast: a new benchmark for generalization in RL[A]. 2018. arXiv preprint arXiv: 1804.03720.
[47] WHITLEY D, STARKWEATHER T. Genitor II: a distributed genetic algorithm[J]. Journal of Experimental & Theoretical Artificial Intelligence, 1990, 2(3): 189-214.
[48] ISHIMIZU T, TAGAWA K. A structured differential evolution for various network topologies[J]. International Journal of Computers and Communications, 2010, 4(1): 2-8.
[49] TASOULIS D, PAVLIDIS N, PLAGIANAKOS V, et al. Parallel differential evolution[C]//Proceedings of the 2004 Congress on Evolutionary Computation (IEEE Cat. No. 04TH8753): volume 2. 2004: 2023-2029.
[50] OSTERMEIER A, GAWELCZYK A, HANSEN N. A derandomized approach to self-adaptation of evolution strategies[J]. Evolutionary Computation, 1994, 2(4): 369-380.
[51] LI Z, ZHANG Q. A simple yet efficient evolution strategy for large-scale black-box optimization[J]. IEEE Transactions on Evolutionary Computation, 2017, 22(5): 637-646.
[52] HE X, ZHOU Y, CHEN Z, et al. Large-scale evolution strategy based on search direction adaptation[J]. IEEE Transactions on Cybernetics, 2019, 51(3): 1651-1665.
[53] POLAND J, ZELL A. Main vector adaptation: a CMA variant with linear time and space complexity[C]//Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation. Citeseer, 2001: 1050-1055.
[54] WHITLEY D, DOMINIC S, DAS R, et al. Genetic reinforcement learning for neurocontrol problems[J]. Machine Learning, 1993, 13(2): 259-284.
[55] BROOKS S H. A discussion of random methods for seeking maxima[J]. Operations Research, 1958, 6(2): 244-251.
[56] LIANG J J, QIN A K, SUGANTHAN P N, et al. Comprehensive learning particle swarm optimizer for global optimization of multimodal functions[J]. IEEE Transactions on Evolutionary Computation, 2006, 10(3): 281-295.
[57] SHI Y. Brain storm optimization algorithm[C]//International Conference in Swarm Intelligence. Springer, 2011: 303-309.
[58] SHI Y. Brain storm optimization algorithm in objective space[C]//2015 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2015: 1227-1234.
[59] WIERSTRA D, SCHAUL T, PETERS J, et al. Fitness expectation maximization[C]//International Conference on Parallel Problem Solving from Nature. Springer, 2008: 337-346.
[60] BEYER H G, SENDHOFF B. Simplify your covariance matrix adaptation evolution strategy[J]. IEEE Transactions on Evolutionary Computation, 2017, 21(5): 746-759.
[61] SCHAUL T, GLASMACHERS T, SCHMIDHUBER J. High dimensions and heavy tails for natural evolution strategies[C]//Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation. 2011: 845-852.
[62] GLASMACHERS T, SCHAUL T, YI S, et al. Exponential natural evolution strategies[C]//Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation. 2010: 393-400.
[63] DUAN Y, CHEN X, HOUTHOOFT R, et al. Benchmarking deep reinforcement learning for continuous control[C]//International Conference on Machine Learning. PMLR, 2016: 1329-1338.
[64] MEI Y, OMIDVAR M N, LI X, et al. A competitive divide-and-conquer algorithm for unconstrained large-scale black-box optimization[J]. ACM Transactions on Mathematical Software (TOMS), 2016, 42(2): 1-24.
[65] OMIDVAR M N, YANG M, MEI Y, et al. DG2: a faster and more accurate differential grouping for large-scale black-box optimization[J]. IEEE Transactions on Evolutionary Computation, 2017, 21(6): 929-942.
[66] YANG M, ZHOU A, LI C, et al. An efficient recursive differential grouping for large-scale continuous problems[J]. IEEE Transactions on Evolutionary Computation, 2020, 25(1): 159-171.
[67] CORANA A, MARCHESI M, MARTINI C, et al. Minimizing multimodal functions of continuous variables with the simulated annealing algorithm[J]. ACM Transactions on Mathematical Software (TOMS), 1987, 13(3): 262-280.
[68] BISCANI F, IZZO D. A parallel global multiobjective framework for optimization: pagmo[J]. Journal of Open Source Software, 2020, 5(53): 2338.
[69] MANIA H, GUY A, RECHT B. Simple random search provides a competitive approach to reinforcement learning[A]. 2018. arXiv preprint arXiv: 1803.07055.
[70] CHENG S, QIN Q, CHEN J, et al. Brain storm optimization algorithm: a review[J]. Artificial Intelligence Review, 2016, 46(4): 445-458.
[71] HÄMÄLÄINEN P, TOIKKA J, BABADI A, et al. Visualizing movement control optimization landscapes[J]. IEEE Transactions on Visualization and Computer Graphics, 2020.