Title

BiES: Adaptive Policy Optimization for Model-Based Offline Reinforcement Learning

Authors
Yang, Yijun; Jiang, Jing; Wang, Zhuowei; et al.
Corresponding Author: Shi, Yuhui
DOI
Publication Date
2022
Conference Name
34th Australasian Joint Conference on Artificial Intelligence (AI)
ISSN
0302-9743
EISSN
1611-3349
ISBN
978-3-030-97545-6
Proceedings Title
Volume
13151 LNAI
Pages
570-581
Conference Dates
February 2-4, 2022
Conference Location
University of Technology Sydney (held online)
Place of Publication
Gewerbestrasse 11, Cham, CH-6330, Switzerland
Publisher
Springer International Publishing AG
Abstract
Offline reinforcement learning (RL) aims to train an agent solely from a dataset of historical interactions with the environment, without any further costly or dangerous active exploration. Model-based RL (MbRL) usually achieves promising performance in the offline setting owing to its high sample efficiency and compact modeling of the environment's dynamics. However, it may suffer from bias and error accumulation in the model's predictions. Existing methods address this problem by adding a penalty term to the model reward, but require careful hand-tuning of the penalty and its weight. In this paper, we instead formulate model-based offline RL as a bi-objective optimization, where the first objective maximizes the model return and the second objective adapts to the learning dynamics of the RL policy. We thereby avoid tuning the penalty and its weight while achieving a more advantageous trade-off between the final model return and the model's uncertainty. We develop an efficient and adaptive policy optimization algorithm, named BiES, that solves the bi-objective optimization with an evolution strategy. Experimental results on the D4RL benchmark show that our approach sets a new state of the art and significantly outperforms existing offline RL methods on long-horizon tasks.
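The abstract describes searching for a policy with an evolution strategy over two objectives: maximizing the return predicted by a learned model while controlling that model's uncertainty. The Python sketch below illustrates only this general idea; the toy linear dynamics ensemble (whose disagreement stands in for model uncertainty), the scalarized fitness, and the adaptive weight schedule `lam` are hypothetical placeholders, not the paper's actual BiES formulation.

import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, HORIZON, K = 4, 2, 50, 5

# Toy "learned" dynamics ensemble: K random linear models. Their
# disagreement along a rollout serves as a crude uncertainty estimate.
ensemble = [rng.normal(scale=0.1, size=(OBS_DIM, OBS_DIM + ACT_DIM))
            for _ in range(K)]

def rollout(theta):
    """Roll a linear policy through the ensemble; return (model return, uncertainty)."""
    W = theta.reshape(ACT_DIM, OBS_DIM)
    s = np.full(OBS_DIM, 0.5)                  # fixed start state keeps evaluations comparable
    ret, unc = 0.0, 0.0
    for _ in range(HORIZON):
        a = np.tanh(W @ s)
        preds = np.stack([A @ np.concatenate([s, a]) for A in ensemble])
        s = preds.mean(axis=0)
        unc += preds.std(axis=0).mean()        # ensemble disagreement
        ret += -np.sum(s ** 2)                 # placeholder model reward
    return ret, unc

# Evolution strategy with antithetic sampling and rank-based fitness
# shaping. `lam` trades off the two objectives and is adapted from the
# policy's learning progress rather than hand-tuned (a hypothetical
# rule standing in for the paper's adaptive second objective).
theta = rng.normal(scale=0.05, size=OBS_DIM * ACT_DIM)
sigma, lr, npop, lam, prev_ret = 0.1, 0.02, 32, 1.0, -np.inf
for _ in range(100):
    eps = rng.normal(size=(npop, theta.size))
    eps = np.concatenate([eps, -eps])          # antithetic pairs
    fit = np.empty(len(eps))
    for i, e in enumerate(eps):
        r, u = rollout(theta + sigma * e)
        fit[i] = r - lam * u                   # scalarized bi-objective fitness
    ranks = fit.argsort().argsort() / (len(fit) - 1) - 0.5
    theta += lr / (len(eps) * sigma) * eps.T @ ranks
    cur_ret, _ = rollout(theta)
    # Relax the uncertainty penalty while return improves; tighten when it stalls.
    lam = max(0.1, lam * (0.99 if cur_ret > prev_ret else 1.05))
    prev_ret = cur_ret

print(f"final model return: {rollout(theta)[0]:.3f}")

Antithetic sampling and rank-based fitness shaping are standard variance-reduction devices in evolution strategies; the crude `lam` schedule only gestures at the paper's point that the trade-off need not be a fixed, hand-tuned penalty weight.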
Keywords
University Attribution
Corresponding author
Language
English
Related Link: [Scopus Record]
Indexed By
WoS Research Area
Computer Science
WoS Category
Computer Science, Artificial Intelligence
WoS Record Number
WOS:000787242700046
EI Accession Number
20221311865996
EI Keywords
Economic and social effects ; Evolutionary algorithms ; Reinforcement learning
EI Classification Codes
Artificial Intelligence:723.4 ; Optimization Techniques:921.5 ; Social Sciences:971
Scopus Record Number
2-s2.0-85127216343
Source Database
Scopus
Citation Statistics
Times Cited [WoS]: 1
Document Type: Conference Paper
Identifier: http://sustech.caswiz.com/handle/2SGJ60CL/329056
Collection: College of Engineering / Department of Computer Science and Engineering
Author Affiliations
1. AAII, University of Technology Sydney, Ultimo, 2007, Australia
2. Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
First Author Affiliation: Department of Computer Science and Engineering
Corresponding Author Affiliation: Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
Yang, Yijun, Jiang, Jing, Wang, Zhuowei, et al. BiES: Adaptive Policy Optimization for Model-Based Offline Reinforcement Learning[C]. Cham, Switzerland: Springer International Publishing AG, 2022: 570-581.
Files in This Item
No files are associated with this item.
