Title | BSO-ES: A Hybrid Direct Policy Search Algorithm for Reinforcement Learning |
Authors | Zhang, Youkui; Zhang, Liang; Duan, Qiqi; et al. |
DOI | |
Publication date | 2021 |
Conference name | IEEE Symposium Series on Computational Intelligence (IEEE SSCI) |
ISBN | 978-1-7281-9049-5 |
Proceedings title | |
Pages | 1-6 |
Conference dates | 5-7 Dec. 2021 |
Conference location | Orlando, FL, USA |
Place of publication | 345 E 47TH ST, NEW YORK, NY 10017 USA |
Publisher | IEEE |
Abstract | Recently, it has been demonstrated that evolution strategies (ES), a class of black-box optimization algorithms, can achieve performance competitive with deep reinforcement learning (RL) algorithms on many RL tasks. However, ES can be viewed as a randomized local search algorithm based on stochastic finite differences, and it often struggles to escape deep or deceptive local optima, which means that its exploration ability can be further improved. Brain storm optimization (BSO) is a swarm intelligence algorithm that trades off exploration and exploitation via clustering. To get the best of both worlds, this paper proposes a hybrid direct policy search algorithm (BSO-ES) for RL tasks within a distributed computing framework. Specifically, we use BSO as a global searcher to explore the parameter space and ES as a local searcher to further fine-tune policies. We maintain multiple policies in parallel and periodically exchange preferable policies among them. Experimental results on six challenging continuous control tasks from MuJoCo show that our proposed algorithm is superior to or competitive with OpenAI-ES. In particular, our algorithm has fewer failures to reach a fixed reward threshold. |
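The abstract characterizes ES as a randomized local search built on stochastic finite differences. A minimal sketch of that idea, assuming a plain antithetic Gaussian-perturbation estimator in NumPy: the helper names `es_gradient` and `es_ascent` and every hyperparameter are hypothetical, not taken from the paper. It deliberately omits fitness shaping, distributed workers, and the BSO clustering and policy exchange that BSO-ES adds.

```python
import numpy as np


def es_gradient(f, theta, sigma=0.1, n_pairs=50, rng=None):
    """Hypothetical helper: estimate grad f(theta) with antithetic Gaussian perturbations.

    Each random direction eps gives a central finite difference
    (f(theta + sigma*eps) - f(theta - sigma*eps)) / (2*sigma),
    which is then weighted by eps and averaged over all directions.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((n_pairs, theta.size))
    f_plus = np.array([f(theta + sigma * e) for e in eps])
    f_minus = np.array([f(theta - sigma * e) for e in eps])
    # Weight each direction by its finite-difference slope and average.
    return ((f_plus - f_minus)[:, None] * eps).sum(axis=0) / (2 * sigma * n_pairs)


def es_ascent(f, theta, lr=0.05, steps=200, seed=0, **kw):
    """Hypothetical helper: plain gradient ascent using the ES estimate (no fitness shaping)."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        theta = theta + lr * es_gradient(f, theta, rng=rng, **kw)
    return theta


if __name__ == "__main__":
    # Toy "return" maximized at theta = 3 in every coordinate.
    reward = lambda x: -np.sum((x - 3.0) ** 2)
    print(es_ascent(reward, np.zeros(5)))
```

Here a toy quadratic stands in for an episode return. Per the abstract, in BSO-ES a local step of this kind only fine-tunes policies, while BSO handles global exploration across the population.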
Keywords | |
University's authorship position | First |
Language | English |
Indexed in | |
Funding | Shenzhen Fundamental Research Program [JCYJ20200109141235597] |
WOS research areas | Computer Science; Engineering; Operations Research & Management Science; Mathematics |
WOS categories | Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science; Mathematics, Applied |
WOS accession number | WOS:000824464300299 |
EI accession number | 20221011761149 |
EI controlled terms | Clustering algorithms; Evolutionary algorithms; Learning algorithms; Local search (optimization); Stochastic systems; Storms |
EI classification codes | Precipitation: 443.3; Artificial Intelligence: 723.4; Machine Learning: 723.4.2; Control Systems: 731.1; Information Sources and Analysis: 903.1; Optimization Techniques: 921.5; Systems Science: 961 |
Scopus EID | 2-s2.0-85125765568 |
Source database | Scopus |
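Full-text link | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9660122 |
Citation statistics | Times cited [WOS]: 0 |
Output type | Conference paper |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/328059 |
Collection | College of Engineering / Department of Computer Science and Engineering |
Author affiliation | Southern University of Science and Technology, Department of Computer Science and Engineering, Shenzhen, China |
First author's affiliation | Department of Computer Science and Engineering |
First author's primary affiliation | Department of Computer Science and Engineering |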
Recommended citation (GB/T 7714) | Zhang, Youkui, Zhang, Liang, Duan, Qiqi, et al. BSO-ES: A Hybrid Direct Policy Search Algorithm for Reinforcement Learning[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA: IEEE, 2021: 1-6. |
Files in this item | No files are associated with this item. |
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.