Title | Deterministic Policy Gradient: Convergence Analysis |
Authors | |
Corresponding author | Zhang,Wei |
Publication date | 2022
|
Proceedings title | |
Pages | 2159-2169
|
Abstract | The deterministic policy gradient (DPG) method proposed in Silver et al. [2014] has been demonstrated to exhibit superior performance, particularly for applications with multi-dimensional and continuous action spaces. However, it remains unclear whether DPG converges, and if so, how fast it converges and whether it converges as efficiently as other PG methods. In this paper, we provide a theoretical analysis of DPG to answer those questions. We study the single timescale DPG (often the case in practice) in both on-policy and off-policy settings, and show that both algorithms attain an ε-accurate stationary policy up to a system error with a sample complexity of O(ε^{-2}). Moreover, we establish the convergence rate for DPG under Gaussian noise exploration, which is widely adopted in practice to improve the performance of DPG. To the best of our knowledge, this is the first non-asymptotic convergence characterization for DPG methods. |
Institutional attribution | Corresponding
|
Language | English
|
Related links | [Scopus record] |
Funding | Science and Technology Program of Jingdezhen City[JCYJ20200109141601708];
|
Scopus record ID | 2-s2.0-85146148658
|
Source database | Scopus
|
Document type | Conference paper |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/524336 |
Collection | College of Engineering_Department of Mechanical and Energy Engineering |
Author affiliations | 1. Department of Electrical and Computer Engineering, The Ohio State University, Columbus, United States 2. Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore 3. Department of Mechanical and Energy Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, Guangdong, China |
Corresponding author affiliation | Department of Mechanical and Energy Engineering |
Recommended citation (GB/T 7714) |
Xiong,Huaqing,Xu,Tengyu,Zhao,Lin,et al. Deterministic Policy Gradient: Convergence Analysis[C],2022:2159-2169.
|
Files in this item | There are no files associated with this item. |
|