Title

Log-normality and skewness of estimated state/action values in reinforcement learning

Authors
Zhang, Liangpeng; Tang, Ke; Yao, Xin
Publication Date
2017
ISSN
1049-5258
Conference Proceedings Title
Volume
2017-December
Pages
1805-1815
Abstract
Under/overestimation of state/action values is harmful to reinforcement learning agents. In this paper, we show that a state/action value estimated using the Bellman equation can be decomposed into a weighted sum of path-wise values that follow log-normal distributions. Since log-normal distributions are skewed, the distribution of estimated state/action values can also be skewed, leading to an imbalanced likelihood of under/overestimation. The degree of this imbalance can vary greatly among actions and policies within a single problem instance, making the agent prone to selecting actions/policies that have inferior expected return and a higher likelihood of overestimation. We present a comprehensive analysis of this skewness, examine its factors and impacts through both theoretical and empirical results, and discuss possible ways to reduce its undesirable effects.
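As an informal illustration of the skewness described in the abstract, the following minimal sketch (not taken from the paper) simulates a path-wise value estimated as a product of independently estimated per-step factors. Products of positive, noisy factors are approximately log-normal, so the resulting estimate is right-skewed and under- and overestimation of the true value are not equally likely. The path length, transition probability, discount factor, and sample sizes below are hypothetical choices for illustration only.

```python
# Illustrative sketch (not from the paper): a path-wise value estimated as a
# product of independently estimated per-step factors is right-skewed
# (approximately log-normal), so the chances of under- and overestimating
# the true value are imbalanced. All constants below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

H = 10             # hypothetical path length (number of estimated factors)
p_true = 0.9       # true per-step transition probability
gamma = 0.95       # discount factor
reward = 1.0       # reward collected at the end of the path
n_samples = 20     # observed transitions per step used to estimate p
n_runs = 100_000   # independent estimation experiments

# True path-wise value: discounted probability of completing the path.
v_true = (gamma * p_true) ** H * reward

# Each run estimates every per-step probability from n_samples Bernoulli
# observations, then multiplies the estimated factors along the path.
p_hat = rng.binomial(n_samples, p_true, size=(n_runs, H)) / n_samples
v_hat = np.prod(gamma * p_hat, axis=1) * reward

mean, median = v_hat.mean(), np.median(v_hat)
skew = ((v_hat - mean) ** 3).mean() / v_hat.std() ** 3
p_over = (v_hat > v_true).mean()

print(f"true value          : {v_true:.4f}")
print(f"mean of estimates   : {mean:.4f}")
print(f"median of estimates : {median:.4f}")
print(f"sample skewness     : {skew:.2f}")
print(f"P(overestimate)     : {p_over:.3f}   P(underestimate): {1 - p_over:.3f}")
```

In this toy setting the per-step estimator is unbiased, so the mean of the product still equals the true value, but the median falls below it: most runs underestimate slightly while a minority overestimate by a larger margin, which is the kind of imbalance the abstract refers to.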
Institutional Attribution
Other
Language
English
Related Links: [Scopus Record]
Scopus Record ID
2-s2.0-85047021495
Source Database
Scopus
Publication Type
Conference Paper
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/65579
Collection
College of Engineering_Department of Computer Science and Engineering
Author Affiliations
1. School of Computer Science and Technology, University of Science and Technology of China, China
2. University of Birmingham, United Kingdom
3. Shenzhen Key Lab of Computational Intelligence, Department of Computer Science and Engineering, Southern University of Science and Technology, China
Recommended Citation
GB/T 7714
Zhang, Liangpeng, Tang, Ke, Yao, Xin. Log-normality and skewness of estimated state/action values in reinforcement learning[C], 2017: 1805-1815.
Files in This Item
No files are associated with this item.