Title | Log-normality and skewness of estimated state/action values in reinforcement learning
Authors | Zhang, Liangpeng; Tang, Ke; Yao, Xin
Publication Date | 2017
ISSN | 1049-5258
Proceedings Title |
Volume | 2017-December
Pages | 1805-1815
Abstract | Under- or overestimation of state/action values is harmful to reinforcement learning agents. In this paper, we show that a state/action value estimated using the Bellman equation can be decomposed into a weighted sum of path-wise values that follow log-normal distributions. Since log-normal distributions are skewed, the distribution of estimated state/action values can also be skewed, leading to an imbalanced likelihood of under/overestimation. The degree of such imbalance can vary greatly among actions and policies within a single problem instance, making the agent prone to selecting actions/policies that have inferior expected returns and a higher likelihood of overestimation. We present a comprehensive analysis of such skewness, examine its factors and impacts through both theoretical and empirical results, and discuss possible ways to reduce its undesirable effects.
University Authorship | Other
Language | English
Related Links | [Scopus Record]
Scopus Record ID | 2-s2.0-85047021495
Source Database | Scopus
Output Type | Conference Paper
Item Identifier | http://sustech.caswiz.com/handle/2SGJ60CL/65579
Collection | College of Engineering_Department of Computer Science and Engineering
Author Affiliations | 1. School of Computer Science and Technology, University of Science and Technology of China, China; 2. University of Birmingham, United Kingdom; 3. Shenzhen Key Lab of Computational Intelligence, Department of Computer Science and Engineering, Southern University of Science and Technology, China
Recommended Citation (GB/T 7714) | Zhang Liangpeng, Tang Ke, Yao Xin. Log-normality and skewness of estimated state/action values in reinforcement learning[C], 2017: 1805-1815.
Files in This Item | No files are associated with this item.
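As a concrete illustration of the skewness argument in the abstract, the following Python sketch (an illustrative assumption, not code from the paper) samples a log-normal estimator whose mean equals the true value. Because the log-normal median lies below its mean, a single estimate falls below the true value more often than above it, while the overestimates are larger in magnitude on average.

```python
# Minimal sketch: imbalanced under/overestimation under a log-normal
# value estimate. The parameters mu and sigma are illustrative
# assumptions, not values taken from the paper.
import numpy as np

rng = np.random.default_rng(seed=0)

mu, sigma = 0.0, 1.0
true_value = np.exp(mu + sigma**2 / 2)   # mean of LogNormal(mu, sigma)

# Each sample plays the role of one value estimate whose
# expectation equals true_value.
estimates = rng.lognormal(mean=mu, sigma=sigma, size=100_000)

p_under = np.mean(estimates < true_value)
p_over = 1.0 - p_under

print(f"P(underestimate) ~= {p_under:.3f}")  # about 0.69: skew pushes this above 0.5
print(f"P(overestimate)  ~= {p_over:.3f}")   # about 0.31, but with larger errors:
print(f"mean undershoot  ~= {np.mean(true_value - estimates[estimates < true_value]):.3f}")
print(f"mean overshoot   ~= {np.mean(estimates[estimates >= true_value] - true_value):.3f}")
```

Although underestimation is more frequent here, the two error masses balance in expectation (the estimator is unbiased); the paper's concern is that this frequency imbalance can differ across actions and policies, biasing which ones the agent selects.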
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.