Title

BSO-ES: A Hybrid Direct Policy Search Algorithm for Reinforcement Learning

Authors
DOI
Publication date
2021
Conference name
IEEE Symposium Series on Computational Intelligence (IEEE SSCI)
ISBN
978-1-7281-9049-5
Proceedings title
Pages
01-06
Conference date
5-7 Dec. 2021
Conference location
Orlando, FL, USA
Place of publication
345 E 47TH ST, NEW YORK, NY 10017 USA
Publisher
Abstract
Recently, it has been demonstrated that evolution strategies (ES), a class of black-box optimization algorithms, can achieve performance competitive with deep RL algorithms on many reinforcement learning (RL) tasks. However, ES can be seen as a randomized local search algorithm based on stochastic finite differences, and it often struggles to escape from deep or deceptive local optima, which means that its exploration ability can be further improved. Brain storm optimization (BSO) is a swarm intelligence algorithm that trades off exploration and exploitation via clustering. To enjoy the best of both worlds, this paper proposes a hybrid direct policy search algorithm (BSO-ES) for RL tasks within a distributed computing framework. Specifically, we use BSO as a global searcher to explore the parameter space and ES as a local searcher to further fine-tune policies. We maintain multiple parallel policies at the same time and exchange preferable policies periodically. Experimental results on six challenging continuous control tasks from MuJoCo show that our proposed algorithm is superior to or competitive with OpenAI-ES. In particular, our algorithm has fewer failures to reach a fixed reward threshold.
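The "stochastic finite difference" view of ES mentioned in the abstract can be illustrated with a minimal sketch. This is a generic ES update with mirrored Gaussian perturbations, not the BSO-ES algorithm or the exact OpenAI-ES procedure; the names `es_step` and `toy_reward`, the quadratic reward, and all hyperparameter values are assumptions made for illustration only.

```python
import numpy as np

def es_step(theta, reward_fn, rng, sigma=0.1, lr=0.05, pop_size=50):
    """One generic ES update (illustrative, not the paper's algorithm)."""
    # Sample Gaussian perturbation directions, one row per population member.
    eps = rng.standard_normal((pop_size, theta.size))
    # Evaluate mirrored perturbations theta + sigma*eps and theta - sigma*eps.
    r_plus = np.array([reward_fn(theta + sigma * e) for e in eps])
    r_minus = np.array([reward_fn(theta - sigma * e) for e in eps])
    # Stochastic finite-difference estimate of the reward gradient.
    grad = ((r_plus - r_minus)[:, None] * eps).sum(axis=0) / (2 * pop_size * sigma)
    # Gradient ascent on the estimated direction.
    return theta + lr * grad

def toy_reward(theta):
    # Hypothetical stand-in for an episode return; maximum at theta = 1.
    return -float(np.sum((theta - 1.0) ** 2))

rng = np.random.default_rng(0)
theta = np.zeros(5)
start = toy_reward(theta)
for _ in range(200):
    theta = es_step(theta, toy_reward, rng)
final = toy_reward(theta)
```

On this single-peak toy reward the local search converges easily; the abstract's concern is with deceptive landscapes, where such a purely local update can stall and a global searcher such as BSO is meant to help.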
Keywords
University attribution
First
Language
English
Related links [Scopus record]
Indexed by
Funding project
Shenzhen Fundamental Research Program[JCYJ20200109141235597]
WOS research areas
Computer Science ; Engineering ; Operations Research & Management Science ; Mathematics
WOS categories
Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic ; Operations Research & Management Science ; Mathematics, Applied
WOS accession number
WOS:000824464300299
EI accession number
20221011761149
EI subject terms
Clustering algorithms ; Evolutionary algorithms ; Learning algorithms ; Local search (optimization) ; Stochastic systems ; Storms
EI classification codes
Precipitation:443.3 ; Artificial Intelligence:723.4 ; Machine Learning:723.4.2 ; Control Systems:731.1 ; Information Sources and Analysis:903.1 ; Optimization Techniques:921.5 ; Systems Science:961
Scopus record ID
2-s2.0-85125765568
Source database
Scopus
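Full-text link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9660122
Citation statistics
Times cited [WOS]: 0
Document type: Conference paper
Item identifier: http://sustech.caswiz.com/handle/2SGJ60CL/328059
Collection: College of Engineering_Department of Computer Science and Engineering
Author affiliation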
Southern University of Science and Technology,Department of Computer Science and Engineering,Shenzhen,China
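First author's affiliation: Department of Computer Science and Engineering
First author's primary affiliation: Department of Computer Science and Engineering
Recommended citation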
GB/T 7714
Zhang Youkui, Zhang Liang, Duan Qiqi, et al. BSO-ES: A Hybrid Direct Policy Search Algorithm for Reinforcement Learning[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA: IEEE, 2021: 01-06.