SUSTech KC (Southern University of Science and Technology): Theoretical Analysis of Value-Iteration-Based Q-Learning with Approximation Errors

Title	Theoretical Analysis of Value-Iteration-Based Q-Learning with Approximation Errors
Authors	Zhantao Liang 1; Mingming Ha 2; Derong Liu 3
DOI	10.1109/ICIST55546.2022.9926794
Publication date	2022
ISSN	2164-4357
ISBN	978-1-6654-9738-1
Proceedings title	2022 12th International Conference on Information Science and Technology (ICIST)
Pages	120-125
Conference date	14-16 Oct. 2022
Conference location	Kaifeng, China
Abstract	In this paper, the value-iteration-based Q-learning algorithm with approximation errors is analyzed theoretically. First, based on an upper bound of the approximation errors caused by the Q-function approximator, we derive lower and upper bound functions of the iterative Q-function, which prove that the limit of the approximate Q-function sequence is bounded. Then, we develop a stability condition for the termination of the iterative algorithm, ensuring that the current control policy derived from the resulting approximate Q-function is stabilizing. We also establish an upper bound function of the approximation errors caused by the policy function approximator, to guarantee that the approximate control policy is stabilizing. Finally, numerical results from a simulation example verify the theoretical results.
Keywords	Adaptive dynamic programming; Q-learning; value iteration; asymptotic stability
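University attribution	Other
Related links	[IEEE record]
Source database	IEEE
Full-text link	https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9926794
Citation statistics	Times cited [WOS]: 0
Document type	Conference paper
Item identifier	http://sustech.caswiz.com/handle/2SGJ60CL/412122
Collection	Southern University of Science and Technology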
Author affiliations	1. School of Automation, Guangdong University of Technology, Guangzhou, China; 2. School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China; 3. Institute of Control Science and Technology, Southern University of Science and Technology, Shenzhen, China
Recommended citation (GB/T 7714)	Zhantao Liang, Mingming Ha, Derong Liu. Theoretical Analysis of Value-Iteration-Based Q-Learning with Approximation Errors[C], 2022: 120-125.