Southern University of Science and Technology institutional repository (SUSTech KC): BSO-ES: A Hybrid Direct Policy Search Algorithm for Reinforcement Learning

Title	BSO-ES: A Hybrid Direct Policy Search Algorithm for Reinforcement Learning
Authors	Zhang, Youkui; Zhang, Liang; Duan, Qiqi; Shi, Yuhui
DOI	10.1109/SSCI50451.2021.9660122
Publication date	2021
Conference name	IEEE Symposium Series on Computational Intelligence (IEEE SSCI)
ISBN	978-1-7281-9049-5
Proceedings title	2021 IEEE Symposium Series on Computational Intelligence, SSCI 2021 - Proceedings
Pages	1-6
Conference dates	5-7 Dec. 2021
Conference location	Orlando, FL, USA
Place of publication	345 E 47TH ST, NEW YORK, NY 10017 USA
Publisher	IEEE
Abstract	Recently, it has been demonstrated that evolution strategies (ES), a class of black-box optimization algorithms, can achieve performance competitive with deep RL algorithms on many reinforcement learning (RL) tasks. However, ES can be seen as a randomized local search algorithm based on stochastic finite differences, and it often struggles to escape from deep or deceptive local optima, which means that its exploration ability can be further improved. Brain storm optimization (BSO) is a swarm intelligence algorithm that trades off exploration and exploitation via clustering. To get the best of both worlds, this paper proposes a hybrid direct policy search algorithm (BSO-ES) for RL tasks within a distributed computing framework. Specifically, we use BSO as a global searcher to explore the parameter space and ES as a local searcher to further fine-tune policies. We maintain multiple policies in parallel and periodically exchange the better-performing ones. Experimental results on six challenging continuous control tasks from MuJoCo show that the proposed algorithm is superior to or competitive with OpenAI-ES. In particular, our algorithm fails less often to reach a fixed reward threshold.
Keywords	Brain storm optimization ; Direct policy search ; Evolution strategy ; Reinforcement learning
University affiliation rank	First
Language	English
Related links	[Scopus record]
Indexed in	CPCI-S ; EI
Funding	Shenzhen Fundamental Research Program[JCYJ20200109141235597]
WOS research areas	Computer Science ; Engineering ; Operations Research & Management Science ; Mathematics
WOS categories	Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic ; Operations Research & Management Science ; Mathematics, Applied
WOS accession number	WOS:000824464300299
EI accession number	20221011761149
EI controlled terms	Clustering algorithms ; Evolutionary algorithms ; Learning algorithms ; Local search (optimization) ; Stochastic systems ; Storms
EI classification codes	Precipitation:443.3 ; Artificial Intelligence:723.4 ; Machine Learning:723.4.2 ; Control Systems:731.1 ; Information Sources and Analysis:903.1 ; Optimization Techniques:921.5 ; Systems Science:961
Scopus record ID	2-s2.0-85125765568
Source database	Scopus
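Full-text link	https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9660122
Citation statistics	Times cited [WOS]: 0
Output type	Conference paper
Item identifier	http://sustech.caswiz.com/handle/2SGJ60CL/328059
Collection	College of Engineering / Department of Computer Science and Engineering
Author affiliation	Southern University of Science and Technology,Department of Computer Science and Engineering,Shenzhen,China
First author's affiliation	Department of Computer Science and Engineering
First author's primary affiliation	Department of Computer Science and Engineering
Recommended citation (GB/T 7714)	Zhang Youkui, Zhang Liang, Duan Qiqi, et al. BSO-ES: A Hybrid Direct Policy Search Algorithm for Reinforcement Learning[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA: IEEE, 2021: 1-6.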
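
Illustrative note (not part of the original record): the abstract characterizes ES as a randomized local search based on stochastic finite differences, run over several policies that are exchanged periodically. The sketch below is one way to picture that description and is not the authors' implementation. The function names (`es_step`, `toy_reward`, `run`), all hyperparameters, and the quadratic toy objective are assumptions made for illustration. The exchange step is a deliberately simplified stand-in: BSO's clustering-based global search is not reproduced here.

```python
import random


def es_step(theta, reward_fn, rng, sigma=0.1, lr=0.05, pairs=20):
    """One ES update using antithetic Gaussian perturbations.

    The ascent direction is a stochastic finite-difference estimate:
    each random direction eps contributes
    (R(theta + sigma*eps) - R(theta - sigma*eps)) / (2*sigma) * eps.
    """
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = reward_fn([t + sigma * e for t, e in zip(theta, eps)])
        minus = reward_fn([t - sigma * e for t, e in zip(theta, eps)])
        scale = (plus - minus) / (2.0 * sigma * pairs)
        for i in range(dim):
            grad[i] += scale * eps[i]
    return [t + lr * g for t, g in zip(theta, grad)]


def toy_reward(theta):
    """Hypothetical stand-in for an episode return; maximum 0 at (1, ..., 1)."""
    return -sum((t - 1.0) ** 2 for t in theta)


def run(num_policies=4, dim=3, iterations=200, exchange_every=20, seed=0):
    """Improve several policies with ES and periodically share the best one."""
    rng = random.Random(seed)
    policies = [[rng.uniform(-3.0, 3.0) for _ in range(dim)]
                for _ in range(num_policies)]
    for it in range(1, iterations + 1):
        # Local search: every policy takes one ES step.
        policies = [es_step(p, toy_reward, rng) for p in policies]
        if it % exchange_every == 0:
            # Simplified exchange (assumption, not BSO): the worst policy
            # is replaced by a copy of the current best.
            policies.sort(key=toy_reward, reverse=True)
            policies[-1] = list(policies[0])
    return max(policies, key=toy_reward)


if __name__ == "__main__":
    best = run()
    print("best parameters:", best)
    print("best reward:", toy_reward(best))
```

In an actual RL setting, `toy_reward` would be replaced by the return of a policy rollout (for example in a MuJoCo task), and the exchange step by the BSO-based global search the paper describes.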