Title | Explicit planning for efficient exploration in reinforcement learning |
Authors | Zhang, Liangpeng; Tang, Ke; Yao, Xin |
Corresponding author | Yao, Xin |
Publication date | 2019 |
ISSN | 1049-5258 |
Proceedings title | |
Volume | 32 |
Abstract | Efficient exploration is crucial to achieving good performance in reinforcement learning. Existing systematic exploration strategies (R-MAX, MBIE, UCRL, etc.), despite being promising theoretically, are essentially greedy strategies that follow some predefined heuristics. When the heuristics do not match the dynamics of Markov decision processes (MDPs) well, an excessive amount of time can be wasted in travelling through already-explored states, lowering the overall efficiency. We argue that explicit planning for exploration can help alleviate such a problem, and propose a Value Iteration for Exploration Cost (VIEC) algorithm which computes the optimal exploration scheme by solving an augmented MDP. We then present a detailed analysis of the exploration behaviour of some popular strategies, showing how these strategies can fail and spend O(n^2 md) or O(n^2 m + nmd) steps to collect sufficient data in some tower-shaped MDPs, while the optimal exploration scheme, which can be obtained by VIEC, only needs O(nmd), where n, m are the numbers of states and actions and d is the data demand. The analysis not only points out the weakness of existing heuristic-based strategies, but also suggests a remarkable potential in explicit planning for exploration. |
University attribution | Corresponding |
Language | English |
Related links | [Scopus record] |
Indexed by | |
EI accession number | 20203609141279 |
EI subject terms | Optimization; Reinforcement learning; Iterative methods |
EI classification codes | Artificial Intelligence: 723.4; Optimization Techniques: 921.5; Numerical Methods: 921.6; Probability Theory: 922.1 |
Scopus record ID | 2-s2.0-85087001133 |
Source database | Scopus |
Output type | Conference paper |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/188082 |
Collection | College of Engineering_Department of Computer Science and Engineering |
Author affiliations | 1. CERCIA, School of Computer Science, University of Birmingham, United Kingdom; 2. Shenzhen Key Laboratory of Computational Intelligence, University Key Laboratory of Evolving Intelligent Systems of Guangdong Province, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China |
Corresponding author affiliation | Department of Computer Science and Engineering |
Recommended citation (GB/T 7714) | Zhang Liangpeng, Tang Ke, Yao Xin. Explicit planning for efficient exploration in reinforcement learning[C], 2019. |
Files in this item | No files are associated with this item. |