Title | Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation |
Authors | |
Corresponding author | Sun, Jun |
Publication date | 2020
|
Proceedings title | |
Volume | 108
|
Place of publication | 75 ARLINGTON ST, STE 300, BOSTON, MA 02116-3936 USA
|
Publisher | |
Abstract | Motivated by the emerging use of multi-agent reinforcement learning (MARL) in various engineering applications, we investigate the policy evaluation problem in a fully decentralized setting, using temporal-difference (TD) learning with linear function approximation to handle large state spaces in practice. The goal of a group of agents is to collaboratively learn the value function of a given policy from locally private rewards observed in a shared environment, through exchanging local estimates with neighbors. Despite the simplicity and widespread use of such decentralized TD learning algorithms, our theoretical understanding of them remains limited. Existing results were obtained based on i.i.d. data samples, or by imposing an 'additional' projection step to control the 'gradient' bias incurred by the Markovian observations. In this paper, we provide a finite-sample analysis of fully decentralized TD(0) learning under both i.i.d. and Markovian samples, and prove that all local estimates converge linearly to a neighborhood of the optimum. The resultant error bounds are the first of their type, in the sense that they hold under the most practical assumptions, which is made possible by a novel multi-step Lyapunov analysis. |
University affiliation | Other
|
Language | English
|
Related links | [Source record] |
Indexed by | |
Funding | NSFC[61873118][61673347][U1609214][61751205]
; Dept. of Science and Technology of Guangdong Province[2018A050506003]
; NSF[1711471][1901134]
; Key R&D Program of Zhejiang Province[2019C01050]
|
WOS research areas | Computer Science
; Mathematics
|
WOS categories | Computer Science, Artificial Intelligence
; Statistics & Probability
|
WOS accession number | WOS:000559931303034
|
Source database | Web of Science
|
Citation statistics |
Times cited [WOS]: 27
|
Output type | Conference paper |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/210520 |
Collection | Southern University of Science and Technology, College of Engineering, Department of Mechanical and Energy Engineering |
Author affiliations | 1.Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China 2.Univ Minnesota, Minneapolis, MN 55455 USA 3.Southern Univ Sci & Technol, Shenzhen, Peoples R China |
Recommended citation (GB/T 7714) |
Sun, Jun,Wang, Gang,Giannakis, Georgios B.,et al. Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation[C]. 75 ARLINGTON ST, STE 300, BOSTON, MA 02116-3936 USA:ADDISON-WESLEY PUBL CO,2020.
|
Files in this item | No files are associated with this item. |
|
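The setting described in the abstract (agents with private rewards, a shared environment, and exchange of local estimates with neighbors) can be illustrated with a minimal sketch of a consensus-based TD(0) update with linear function approximation. This is one common form of such an update and is not claimed to be the exact algorithm analyzed in the paper. The function name `decentralized_td0`, the toy Markov chain, the feature matrix, the rewards, the mixing matrix, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np


def decentralized_td0(features, transition, local_rewards, mixing,
                      gamma=0.9, step=0.05, num_steps=2000, seed=0):
    """Illustrative consensus-based TD(0) with linear value functions.

    features:      (num_states, dim) feature matrix, one row per state
    transition:    (num_states, num_states) row-stochastic matrix of the
                   Markov chain induced by the fixed policy
    local_rewards: (num_agents, num_states) private reward of each agent
    mixing:        (num_agents, num_agents) doubly stochastic weights
                   (an assumed neighbor-averaging rule)
    Returns the (num_agents, dim) array of local parameter estimates.
    """
    rng = np.random.default_rng(seed)
    num_states, dim = features.shape
    num_agents = local_rewards.shape[0]
    theta = np.zeros((num_agents, dim))
    state = 0
    for _ in range(num_steps):
        # All agents observe the same Markovian transition.
        next_state = rng.choice(num_states, p=transition[state])
        phi, phi_next = features[state], features[next_state]
        # Each agent forms a TD error from its own private reward.
        td_error = local_rewards[:, state] + gamma * (theta @ phi_next) - theta @ phi
        # Average neighbors' estimates, then take a local TD(0) step.
        theta = mixing @ theta + step * td_error[:, None] * phi[None, :]
        state = next_state
    return theta


# Hypothetical toy problem: 4 states, 2 features, 3 agents.
features = np.array([[1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.0, 1.0]])
transition = np.array([[0.50, 0.50, 0.00, 0.00],
                       [0.25, 0.50, 0.25, 0.00],
                       [0.00, 0.25, 0.50, 0.25],
                       [0.00, 0.00, 0.50, 0.50]])
local_rewards = np.array([[1.0, 0.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0, 2.0],
                          [0.0, 0.0, 1.0, 0.0]])
mixing = np.array([[0.50, 0.25, 0.25],
                   [0.25, 0.50, 0.25],
                   [0.25, 0.25, 0.50]])

theta = decentralized_td0(features, transition, local_rewards, mixing)
print(theta)
```

Because the assumed mixing matrix is doubly stochastic, the average of the local estimates evolves like a single TD(0) learner that sees the agent-averaged reward, while the individual estimates stay within a step-size-dependent distance of that average. This is consistent with the abstract's statement that local estimates converge to a neighborhood of the optimum rather than to it exactly.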