SUSTech KC (南方科技大学知识苑): Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Title	Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation
Authors	Sun, Jun (1); Wang, Gang (2); Giannakis, Georgios B. (2); Yang, Qinmin (1); Yang, Zaiyue (3)
Corresponding author	Sun, Jun
Publication date	2020
Proceedings title	INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108
Volume	108
Place of publication	75 ARLINGTON ST, STE 300, BOSTON, MA 02116-3936 USA
Publisher	ADDISON-WESLEY PUBL CO
Abstract	Motivated by the emerging use of multi-agent reinforcement learning (MARL) in various engineering applications, we investigate the policy evaluation problem in a fully decentralized setting, using temporal-difference (TD) learning with linear function approximation to handle large state spaces in practice. The goal of a group of agents is to collaboratively learn the value function of a given policy from locally private rewards observed in a shared environment, through exchanging local estimates with neighbors. Despite the simplicity and widespread use of such decentralized TD learning algorithms, our theoretical understanding of them remains limited. Existing results were obtained based on i.i.d. data samples, or by imposing an 'additional' projection step to control the 'gradient' bias incurred by the Markovian observations. In this paper, we provide a finite-sample analysis of fully decentralized TD(0) learning under both i.i.d. and Markovian samples, and prove that all local estimates converge linearly to a neighborhood of the optimum. The resultant error bounds are the first of their type, in the sense that they hold under the most practical assumptions, which is made possible by means of a novel multi-step Lyapunov analysis.
University attribution	Other
Language	English
Related links	[Source record]
Indexed in	CPCI
Funding	NSFC[61873118][61673347][U1609214][61751205] ; Dept. of Science and Technology of Guangdong Province[2018A050506003] ; NSF[1711471][1901134] ; Key R&D Program of Zhejiang Province[2019C01050]
WOS research areas	Computer Science ; Mathematics
WOS categories	Computer Science, Artificial Intelligence ; Statistics & Probability
WOS accession number	WOS:000559931303034
Source database	Web of Science
Citation statistics	Times cited [WOS]: 27
Document type	Conference paper
Item identifier	http://sustech.caswiz.com/handle/2SGJ60CL/210520
Collection	SUSTech College of Engineering_Department of Mechanical and Energy Engineering
Author affiliations	1. Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China; 2. Univ Minnesota, Minneapolis, MN 55455 USA; 3. Southern Univ Sci & Technol, Shenzhen, Peoples R China
Recommended citation (GB/T 7714)	Sun, Jun,Wang, Gang,Giannakis, Georgios B.,et al. Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation[C]. 75 ARLINGTON ST, STE 300, BOSTON, MA 02116-3936 USA:ADDISON-WESLEY PUBL CO,2020.