SUSTech KC (南方科技大学知识苑): Research on Evolutionary Reinforcement Learning Algorithm Based on Negatively Correlated Search

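Title	Research on Evolutionary Reinforcement Learning Algorithm Based on Negatively Correlated Search (original Chinese title: 基于负相关搜索的演化强化学习算法研究)
Alternative Title	RESEARCH ON EVOLUTIONARY REINFORCEMENT LEARNING ALGORITHM BASED ON NEGATIVELY CORRELATED SEARCH
Name	Yu Yanglong (喻杨龙)
Student ID	11849094
Degree Type	Master's
Degree Discipline	Computer Science and Technology
Advisor	Tang Ke (唐珂)
Thesis Defense Date	2020-05-30
Thesis Submission Date	2020-07-08
Degree-Granting Institution	Harbin Institute of Technology
Degree-Granting Location	Shenzhen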
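Abstract	Reinforcement learning, a major branch of machine learning, learns an optimal control policy from data generated by interaction between an agent and its environment. Methods that compute the derivative of the objective function with respect to the policy parameters have long been the mainstream approach to reinforcement learning, while reinforcement learning algorithms based on evolutionary algorithms have emerged steadily in recent years. Compared with gradient-based algorithms, evolutionary algorithms need no gradient computation, which shortens training time, and they parallelize well, which improves running efficiency. However, although evolutionary algorithms can train a model in a short time, training requires far more interactions with the environment than gradient-based reinforcement learning. In reinforcement learning, interaction with the environment has a cost, especially in real-world applications: in robot manipulation, for example, the model is very likely to fail early in training, which may damage the robot or incur other losses. We therefore aim to improve reinforcement learning algorithms so that they need fewer agent-environment interactions, or achieve better performance under the same interaction budget. Both contributions of this thesis build on the idea of negatively correlated search, exploiting its ability to search several different regions of the target space simultaneously and the diversity it provides to evolution at the level of search behavior. In the first contribution, we combine negatively correlated search with the natural evolution strategy algorithm and propose the negatively correlated natural evolution strategy algorithm, NCNES. NCNES follows the natural evolution strategy framework; guided by negatively correlated search, we design an objective function that balances solution quality and diversity and derive its natural gradient. To evaluate the algorithm, we compare it on arcade games against the gradient-based reinforcement learning algorithms A3C and PPO and the evolutionary reinforcement learning algorithm CES. The results show that NCNES achieves competitive performance on the three games Enduro, Freeway, and BeamRider, with more stable performance during training. The second contribution is NCSCC, an algorithm based on the cooperative co-evolution framework and the negatively correlated search algorithm. An excessive number of parameters to evolve is a major factor limiting the performance of evolutionary algorithms. In reinforcement learning, even for a simple arcade game such as Breakout, the policy model typically consists of a three-layer convolutional network followed by two fully connected layers, and the total number of parameters is on the order of millions; evolving that many parameters with an evolutionary algorithm runs into the curse of dimensionality. To address this, we use the cooperative co-evolution framework to divide the parameters into groups, optimize only one group at a time, and finally merge the optimization results of all groups; we also modify how the framework evaluates partial solutions so that it better suits the negatively correlated search algorithm. Noise in reinforcement learning problems misleads the search direction and undermines the elitist strategy, so during training we use the result of multiple evaluations as the fitness value. Experimental results show that NCSCC performs no worse than gradient-based reinforcement learning algorithms and clearly better than CES, a representative evolutionary reinforcement learning algorithm; it also outperforms the negatively correlated search algorithm without the cooperative co-evolution framework, confirming that the framework alleviates the curse of dimensionality to some extent.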
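The abstract describes an objective that combines solution quality with diversity among several search distributions. The sketch below is a heavily simplified illustration of that idea and is not the NCNES derivation from the thesis. It assumes isotropic Gaussian search distributions with a shared, fixed standard deviation `sigma`, uses a plain (not natural) gradient step on the means, and measures diversity by the Bhattacharyya distance, which for two Gaussians with the same covariance `sigma^2 * I` reduces to `||mu_i - mu_j||^2 / (8 * sigma^2)`. The names `sphere`, `es_step`, and the trade-off weight `lam` are hypothetical.

```python
import random


def sphere(x):
    """Toy fitness to maximize (an assumed stand-in for an episode return)."""
    return -sum(v * v for v in x)


def es_step(means, i, fitness, sigma, lr, lam, n_samples, rng):
    """Update the mean of search distribution i by one gradient-ascent step.

    The step follows an estimated fitness gradient plus lam times the
    gradient of a diversity term (assumption: shared fixed sigma).
    """
    mu = means[i]
    dim = len(mu)

    # Antithetic Gaussian perturbations give an ES estimate of the
    # gradient of the expected fitness with respect to the mean.
    grad_f = [0.0] * dim
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        f_plus = fitness([m + sigma * e for m, e in zip(mu, eps)])
        f_minus = fitness([m - sigma * e for m, e in zip(mu, eps)])
        for d in range(dim):
            grad_f[d] += (f_plus - f_minus) * eps[d] / (2.0 * sigma * n_samples)

    # Diversity term: sum over j of ||mu_i - mu_j||^2 / (8 sigma^2).
    # Its gradient with respect to mu_i is sum_j (mu_i - mu_j) / (4 sigma^2).
    grad_d = [0.0] * dim
    for j, other in enumerate(means):
        if j != i:
            for d in range(dim):
                grad_d[d] += (mu[d] - other[d]) / (4.0 * sigma ** 2)

    return [m + lr * (gf + lam * gd) for m, gf, gd in zip(mu, grad_f, grad_d)]
```

With `lam = 0` each distribution climbs the fitness landscape independently; a positive `lam` additionally pushes the means apart, which is one simple way to make several searches cover different regions.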
Other Abstract	As a main branch of machine learning, reinforcement learning learns the optimal control policy from data collected through interaction between the agent and the environment. Computing the derivative of the objective function with respect to the policy parameters has been the dominant approach to reinforcement learning problems, and evolution-based reinforcement learning algorithms have been emerging in recent years. Compared with gradient-based algorithms, evolutionary algorithms do not need to compute gradients, which shortens training time, and they parallelize well and so run more efficiently. However, although evolutionary algorithms can complete training in a short time, the training process requires far more interactions with the environment than gradient-based reinforcement learning algorithms. In reinforcement learning, interaction with the environment carries a cost, especially in real-world applications: when reinforcement learning is applied to robot manipulation, for example, the model is very likely to fail at the beginning of training, which may damage the robot or cause other losses. Therefore, we hope to reduce the number of interactions with the environment by improving the reinforcement learning algorithm, or to obtain better performance with the same number of interactions. Both contributions of this thesis are based on the idea of negatively correlated search, using its ability to search multiple different regions of the target space simultaneously and the diversity it provides for evolution at the search-behavior level to improve algorithm performance. In the first contribution, we combine negatively correlated search with the natural evolution strategy algorithm and propose the negatively correlated natural evolution strategy algorithm, NCNES.
The basic design of the NCNES algorithm follows the natural evolution strategy framework. Based on the idea of negatively correlated search, we design an objective function that takes into account both solution quality and diversity, and derive the natural gradient of this objective function. To verify the performance of NCNES, we ran experiments on arcade games, using the gradient-based reinforcement learning algorithms A3C and PPO and the evolutionary reinforcement learning algorithm CES as baselines. The results show that NCNES achieves competitive performance on the three games Enduro, Freeway, and BeamRider, and that its performance is more stable during training. Our second contribution is an algorithm called NCSCC, which is based on the cooperative co-evolution framework and the negatively correlated search algorithm. An excessive number of parameters to evolve is a major factor that limits the performance of evolutionary algorithms. In reinforcement learning, even for simple arcade games such as Breakout, the policy model commonly uses a three-layer convolutional neural network and two fully connected layers, and the total number of parameters exceeds one million. When an evolutionary algorithm is used to evolve the parameters of the policy model, this many parameters leads to the curse of dimensionality. To address this problem, we use the cooperative co-evolution framework to group the parameters, optimize only one group at a time, and finally integrate the optimization results of all groups; at the same time, we modify the evaluation of partial solutions in the cooperative co-evolution framework to make it more suitable for the negatively correlated search algorithm.
Noise in the reinforcement learning problem misleads the search direction and reduces the effectiveness of the elitist strategy; therefore, when training the policy model, we use the results of multiple evaluations as the fitness value. The experimental results show that the proposed NCSCC algorithm is no weaker than the gradient-based reinforcement learning algorithms and is significantly stronger than CES, a typical evolutionary reinforcement learning algorithm. Moreover, compared with the negatively correlated search algorithm without the cooperative co-evolution framework, NCSCC performs better, confirming that the cooperative co-evolution framework mitigates the curse of dimensionality to some extent.
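The grouping and multiple-evaluation ideas in the abstract can be illustrated with a minimal sketch. It is not the NCSCC algorithm: the inner optimizer is a simple accept-if-better random perturbation rather than negatively correlated search, partial solutions are evaluated with the standard context-vector scheme rather than the modified scheme the thesis mentions, and the random grouping, the toy objective, and every name (`noisy_fitness`, `averaged_fitness`, `cooperative_coevolution`) are assumptions made for illustration.

```python
import random


def noisy_fitness(x, rng):
    """Toy noisy objective to maximize (an assumed stand-in for a game score)."""
    return -sum(v * v for v in x) + rng.gauss(0.0, 0.01)


def averaged_fitness(x, rng, n_evals):
    """Average several noisy evaluations to reduce the variance of the fitness."""
    return sum(noisy_fitness(x, rng) for _ in range(n_evals)) / n_evals


def cooperative_coevolution(dim, n_groups, cycles, steps, n_evals, seed=0):
    """Optimize one group of variables at a time inside a shared context vector."""
    rng = random.Random(seed)
    # The context vector holds the current best complete solution.
    context = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    indices = list(range(dim))
    for _ in range(cycles):
        rng.shuffle(indices)  # random grouping, redrawn every cycle
        groups = [indices[g::n_groups] for g in range(n_groups)]
        for group in groups:
            # Re-evaluate the incumbent so a lucky old estimate does not persist.
            best = averaged_fitness(context, rng, n_evals)
            for _ in range(steps):
                # Perturb only this group's variables; all other variables
                # are taken from the context vector to form a full solution.
                candidate = list(context)
                for i in group:
                    candidate[i] += rng.gauss(0.0, 0.1)
                score = averaged_fitness(candidate, rng, n_evals)
                if score > best:
                    context, best = candidate, score
    return context
```

A call such as `cooperative_coevolution(dim=10, n_groups=2, cycles=5, steps=50, n_evals=5)` returns the final context vector. Raising `n_evals` trades extra environment interactions for a less noisy comparison between the incumbent and each candidate.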
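Keywords	evolutionary algorithm; reinforcement learning; negatively correlated search; natural evolution strategy; cooperative co-evolution
Other Keywords	evolution algorithm; reinforcement learning; negatively correlated search; natural evolution strategy; cooperative co-evolution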
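Language	Chinese
Training Category	Joint training
Output Type	Thesis
Item Identifier	http://sustech.caswiz.com/handle/2SGJ60CL/143031
Collection	College of Engineering_Department of Computer Science and Engineering
Author Affiliation	Southern University of Science and Technology
Recommended Citation (GB/T 7714)	Yu Yanglong. Research on Evolutionary Reinforcement Learning Algorithm Based on Negatively Correlated Search[D]. Shenzhen. Harbin Institute of Technology, 2020.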

Files in This Item
File Name/Size	Document Type	Version Type	Access Type	License
基于负相关搜索的演化强化学习算法研究.p (15733KB)	Thesis	--	Restricted access	CC BY-NC-SA