Title

Explicit planning for efficient exploration in reinforcement learning

Authors
Zhang, Liangpeng; Tang, Ke; Yao, Xin
Corresponding Author: Yao, Xin
Publication Date
2019
ISSN
1049-5258
Conference Proceedings
Advances in Neural Information Processing Systems
Volume
32
Abstract
Efficient exploration is crucial to achieving good performance in reinforcement learning. Existing systematic exploration strategies (R-MAX, MBIE, UCRL, etc.), despite being promising theoretically, are essentially greedy strategies that follow some predefined heuristics. When the heuristics do not match the dynamics of Markov decision processes (MDPs) well, an excessive amount of time can be wasted in travelling through already-explored states, lowering the overall efficiency. We argue that explicit planning for exploration can help alleviate such a problem, and propose a Value Iteration for Exploration Cost (VIEC) algorithm which computes the optimal exploration scheme by solving an augmented MDP. We then present a detailed analysis of the exploration behaviour of some popular strategies, showing how these strategies can fail and spend O(n²md) or O(n²m²d) steps to collect sufficient data in some tower-shaped MDPs, while the optimal exploration scheme, which can be obtained by VIEC, only needs O(nmd), where n, m are the numbers of states and actions and d is the data demand. The analysis not only points out the weakness of existing heuristic-based strategies, but also suggests a remarkable potential in explicit planning for exploration.
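
To make the augmented-MDP idea concrete, the following is a minimal Python sketch of value iteration on exploration cost for a toy two-state model. The transition model, the per-pair data demand, and the helper names (viec, augmented_states, PAIRS, BIG) are illustrative assumptions for this sketch, not the paper's actual formulation or notation.

import itertools

# Toy 2-state, 2-action MDP, written as transition[s][a] = {next_state: probability}.
# The model, the demand value, and the constants below are illustrative assumptions,
# not data from the paper.
transition = {
    0: {0: {1: 1.0}, 1: {0: 1.0}},
    1: {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0}},
}
STATES = sorted(transition)
ACTIONS = [0, 1]
DEMAND = 1                        # each (state, action) pair still needs one sample
PAIRS = [(s, a) for s in STATES for a in ACTIONS]
BIG = 1e9                         # stand-in for "cost not yet known to be finite"

def augmented_states():
    # Augmented state = (environment state, remaining-demand vector over PAIRS).
    for s in STATES:
        for rem in itertools.product(range(DEMAND + 1), repeat=len(PAIRS)):
            yield (s, rem)

def viec(sweeps=200):
    # Value iteration on exploration cost U:
    #   U(s, rem) = min_a [ 1 + sum_s' P(s'|s, a) * U(s', rem after taking (s, a)) ],
    # with U(s, all-zero demand) = 0. A step on an already-satisfied pair still
    # costs 1 but reduces no demand, which is the waste discussed in the abstract.
    U = {x: (0.0 if all(r == 0 for r in x[1]) else BIG) for x in augmented_states()}
    for _ in range(sweeps):
        for (s, rem) in list(U):
            if all(r == 0 for r in rem):
                continue
            best = BIG
            for a in ACTIONS:
                i = PAIRS.index((s, a))
                new_rem = list(rem)
                if new_rem[i] > 0:
                    new_rem[i] -= 1          # one required sample of (s, a) collected
                new_rem = tuple(new_rem)
                q = 1.0 + sum(p * U[(s2, new_rem)]
                              for s2, p in transition[s][a].items())
                best = min(best, q)
            U[(s, rem)] = best
    return U

if __name__ == "__main__":
    full_demand = tuple(DEMAND for _ in PAIRS)
    print("Optimal expected exploration cost from state 0:",
          round(viec()[(0, full_demand)], 3))

Note that the augmented state space grows exponentially with the number of (state, action) pairs, so this brute-force sweep is only feasible for toy models.
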
Institutional Authorship
Corresponding
Language
English
Related Link: [Scopus Record]
Indexed By
EI Accession Number
20203609141279
EI Keywords
Optimization ; Reinforcement learning ; Iterative methods
EI Classification Codes
Artificial Intelligence:723.4 ; Optimization Techniques:921.5 ; Numerical Methods:921.6 ; Probability Theory:922.1
Scopus Record ID
2-s2.0-85087001133
Source Database
Scopus
Document Type: Conference Paper
Identifier: http://sustech.caswiz.com/handle/2SGJ60CL/188082
Collection: College of Engineering_Department of Computer Science and Engineering
Author Affiliations
1. CERCIA, School of Computer Science, University of Birmingham, United Kingdom
2. Shenzhen Key Laboratory of Computational Intelligence, University Key Laboratory of Evolving Intelligent Systems of Guangdong Province, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
Corresponding Author Affiliation: Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
Zhang, Liangpeng, Tang, Ke, Yao, Xin. Explicit planning for efficient exploration in reinforcement learning[C], 2019.
Files in This Item
No files are associated with this item.
Rights Policy
No data available

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.