Title

Explicit planning for efficient exploration in reinforcement learning

Authors
Zhang, Liangpeng; Tang, Ke; Yao, Xin
Corresponding Author: Yao, Xin
Publication Date
2019
ISSN
1049-5258
Conference Proceedings
Advances in Neural Information Processing Systems
Volume
32
Abstract
Efficient exploration is crucial to achieving good performance in reinforcement learning. Existing systematic exploration strategies (R-MAX, MBIE, UCRL, etc.), despite being promising theoretically, are essentially greedy strategies that follow some predefined heuristics. When the heuristics do not match the dynamics of Markov decision processes (MDPs) well, an excessive amount of time can be wasted in travelling through already-explored states, lowering the overall efficiency. We argue that explicit planning for exploration can help alleviate such a problem, and propose a Value Iteration for Exploration Cost (VIEC) algorithm which computes the optimal exploration scheme by solving an augmented MDP. We then present a detailed analysis of the exploration behaviour of some popular strategies, showing how these strategies can fail and spend O(n²md) or O(n²m²d) steps to collect sufficient data in some tower-shaped MDPs, while the optimal exploration scheme, which can be obtained by VIEC, only needs O(nmd), where n, m are the numbers of states and actions and d is the data demand. The analysis not only points out the weakness of existing heuristic-based strategies, but also suggests a remarkable potential in explicit planning for exploration.
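
To make the augmented-MDP idea concrete, the following is a minimal Python sketch of value iteration on exploration cost for a toy two-state model. The transition model, the per-pair data demand, and the helper names (viec, augmented_states, PAIRS, BIG) are illustrative assumptions for this sketch, not the paper's actual formulation or notation.

import itertools

# Toy 2-state, 2-action MDP, written as transition[s][a] = {next_state: probability}.
# The model, the demand value, and the constants below are illustrative assumptions,
# not data from the paper.
transition = {
    0: {0: {1: 1.0}, 1: {0: 1.0}},
    1: {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0}},
}
STATES = sorted(transition)
ACTIONS = [0, 1]
DEMAND = 1                        # each (state, action) pair still needs one sample
PAIRS = [(s, a) for s in STATES for a in ACTIONS]
BIG = 1e9                         # stand-in for "cost not yet known to be finite"

def augmented_states():
    # Augmented state = (environment state, remaining-demand vector over PAIRS).
    for s in STATES:
        for rem in itertools.product(range(DEMAND + 1), repeat=len(PAIRS)):
            yield (s, rem)

def viec(sweeps=200):
    # Value iteration on exploration cost U:
    #   U(s, rem) = min_a [ 1 + sum_s' P(s'|s, a) * U(s', rem after taking (s, a)) ],
    # with U(s, all-zero demand) = 0. A step on an already-satisfied pair still
    # costs 1 but reduces no demand, which is the waste discussed in the abstract.
    U = {x: (0.0 if all(r == 0 for r in x[1]) else BIG) for x in augmented_states()}
    for _ in range(sweeps):
        for (s, rem) in list(U):
            if all(r == 0 for r in rem):
                continue
            best = BIG
            for a in ACTIONS:
                i = PAIRS.index((s, a))
                new_rem = list(rem)
                if new_rem[i] > 0:
                    new_rem[i] -= 1          # one required sample of (s, a) collected
                new_rem = tuple(new_rem)
                q = 1.0 + sum(p * U[(s2, new_rem)]
                              for s2, p in transition[s][a].items())
                best = min(best, q)
            U[(s, rem)] = best
    return U

if __name__ == "__main__":
    full_demand = tuple(DEMAND for _ in PAIRS)
    print("Optimal expected exploration cost from state 0:",
          round(viec()[(0, full_demand)], 3))

Note that the augmented state space grows exponentially with the number of (state, action) pairs, so this brute-force sweep is only feasible for toy models.
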
Institutional Authorship
Corresponding
Language
English
Related Link: [Scopus Record]
Indexed By
EI Accession Number
20203609141279
EI Keywords
Optimization ; Reinforcement learning ; Iterative methods
EI Classification Codes
Artificial Intelligence:723.4 ; Optimization Techniques:921.5 ; Numerical Methods:921.6 ; Probability Theory:922.1
Scopus Record ID
2-s2.0-85087001133
Source Database
Scopus
Document Type: Conference Paper
Identifier: http://sustech.caswiz.com/handle/2SGJ60CL/188082
Collection: College of Engineering_Department of Computer Science and Engineering
Author Affiliations
1. CERCIA, School of Computer Science, University of Birmingham, United Kingdom
2. Shenzhen Key Laboratory of Computational Intelligence, University Key Laboratory of Evolving Intelligent Systems of Guangdong Province, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
Corresponding Author Affiliation: Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
Zhang, Liangpeng, Tang, Ke, Yao, Xin. Explicit planning for efficient exploration in reinforcement learning[C], 2019.
Files in This Item
No files are associated with this item.
Rights Policy
No data available

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.