Title | BiES: Adaptive Policy Optimization for Model-Based Offline Reinforcement Learning |
Authors | |
Corresponding Author | Shi, Yuhui |
DOI | |
Publication Date | 2022 |
Conference Name | 34th Australasian Joint Conference on Artificial Intelligence (AI) |
ISSN | 0302-9743 |
EISSN | 1611-3349 |
ISBN | 978-3-030-97545-6 |
Proceedings Title | |
Volume | 13151 LNAI |
Pages | 570-581 |
Conference Dates | FEB 02-04, 2022 |
Conference Location | Univ Technol Sydney / Online (ELECTR NETWORK) |
Place of Publication | Gewerbestrasse 11, Cham, CH-6330, Switzerland |
Publisher | |
Abstract | Offline reinforcement learning (RL) aims to train an agent solely from a dataset of historical interactions with the environment, without any further costly or dangerous active exploration. Model-based RL (MbRL) usually achieves promising performance in offline RL thanks to its high sample efficiency and compact modeling of the dynamic environment. However, it may suffer from bias and error accumulation in the model predictions. Existing methods address this problem by adding a penalty term to the model reward, but they require careful hand-tuning of the penalty and its weight. In this paper, we instead formulate model-based offline RL as a bi-objective optimization, where the first objective aims to maximize the model return and the second objective adapts to the learning dynamics of the RL policy. Thereby, we do not need to tune the penalty and its weight, and we can achieve a more advantageous trade-off between the final model return and the model's uncertainty. We develop an efficient and adaptive policy optimization algorithm equipped with an evolution strategy to solve the bi-objective optimization, named BiES. Experimental results on the D4RL benchmark show that our approach sets a new state of the art and significantly outperforms existing offline RL methods on long-horizon tasks. |
Keywords | |
University Authorship | Corresponding |
Language | English |
Related Links | [Scopus Record] |
Indexed By | |
WOS Research Area | Computer Science |
WOS Category | Computer Science, Artificial Intelligence |
WOS Accession Number | WOS:000787242700046 |
EI Accession Number | 20221311865996 |
EI Subject Terms | Economic and social effects; Evolutionary algorithms; Reinforcement learning |
EI Classification Codes | Artificial Intelligence: 723.4; Optimization Techniques: 921.5; Social Sciences: 971 |
Scopus ID | 2-s2.0-85127216343 |
Source Database | Scopus |
Citation Statistics | Times Cited [WOS]: 1 |
Document Type | Conference Paper |
Item Identifier | http://sustech.caswiz.com/handle/2SGJ60CL/329056 |
Collection | College of Engineering_Department of Computer Science and Engineering |
Author Affiliations | 1. AAII, University of Technology Sydney, Ultimo, 2007, Australia; 2. Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China |
First Author Affiliation | Department of Computer Science and Engineering |
Corresponding Author Affiliation | Department of Computer Science and Engineering |
Recommended Citation (GB/T 7714) | Yang, Yijun, Jiang, Jing, Wang, Zhuowei, et al. BiES: Adaptive Policy Optimization for Model-Based Offline Reinforcement Learning[C]. Gewerbestrasse 11, Cham, CH-6330, Switzerland: Springer International Publishing AG, 2022: 570-581. |
Files in This Item | No files are associated with this item. |
Unless otherwise noted, all content in this system is protected by copyright, with all rights reserved.