Title | AUTONOMOUS TRADING AGENT WITH REINFORCEMENT LEARNING
Title (Chinese) | 基于强化学习的自动交易代理
Author | Liu Jian (刘健)
Student ID | 11849061
Degree Type | Master's
Degree Discipline | Computer Science and Technology
Supervisor | Georgios Theodoropoulos
Thesis Defense Date | 2020-05-30
Thesis Submission Date | 2020-07-08
Degree-Granting Institution | Harbin Institute of Technology
Degree-Granting Location | Shenzhen
Abstract | This dissertation examines the use of reinforcement learning in autonomous agents that can interact intelligently with financial markets. Stock market trading is used to evaluate and develop a number of machine learning approaches, particularly reinforcement learning, that can handle the challenging characteristics of the financial trading problem. Predicting change in the stock market is very difficult because the underlying patterns that drive market behavior are non-stationary, which means that useful predictive patterns learned in the past may not apply in the future. Reinforcement learning has not been widely applied in this domain, yet the paradigm allows agents to learn trading decision models directly and with more degrees of freedom than many other techniques; for example, there is no requirement to preset thresholds that define buy or sell signals. The change in price can naturally be viewed as a reward, which avoids the threshold-related drawbacks of labeling data that arise when the problem is formulated as supervised learning; reinforcement learning also avoids the cost of labeling examples and constructing a training data set. However, a study of the literature shows that existing research applying reinforcement learning to generate trading decisions does not, in general, account for the environment being non-stationary. The approaches described in the previous literature deploy a single agent that may never be recalibrated, and use learning methodologies that can become stuck in local optima. The methods proposed in this dissertation mitigate these issues by using multiple agents and a multi-stage learning model in which the agents compete to recommend the best decisions. Our approach combines online learning with reinforcement learning: online learning selects a recommendation from a set of agents at each decision point in real time, and the technique can relearn and adapt the set of decision models based on recent data. On the reinforcement learning side, this research produced new methods that modify the training process to give additional focus to recent data. The novel methods are evaluated empirically using data from a range of international and Chinese stock markets. We find that agents based on the proposed methodology outperform other machine learning methods on various metrics, including application-specific measures of risk and return that are accepted in the finance industry. Experiments show that agents combining online learning and reinforcement learning achieve higher returns than the benchmark buy-and-hold strategy, and that online learning substantially improves the performance of a Deep Q-learning agent. Notably, during the financial crisis, the On-Line/Reinforcement Learning (OLR) agents remain profitable in many cases, while the other agents suffer losses in all tests during this period.
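The reward formulation outlined in the abstract (the price change itself serves as the reward, with no preset buy or sell thresholds) can be illustrated with a minimal sketch. The environment interface, the single-feature state, and the short/flat/long position set below are illustrative assumptions, not the implementation used in the thesis:

```python
import numpy as np

class TradingEnv:
    """Minimal trading environment in which the reward is simply the next
    price change, signed by the position the agent chooses to hold.
    No hand-set thresholds define buy or sell signals."""

    POSITIONS = (-1, 0, 1)  # short, flat, long

    def __init__(self, prices):
        self.prices = np.asarray(prices, dtype=float)
        self.t = 0

    def reset(self):
        self.t = 0
        return self._state()

    def _state(self):
        # Illustrative state: the most recent price change (0 at the start).
        change = (self.prices[self.t] - self.prices[self.t - 1]) if self.t else 0.0
        return np.array([change])

    def step(self, action):
        position = self.POSITIONS[action]
        price_change = self.prices[self.t + 1] - self.prices[self.t]
        reward = position * price_change  # the price change acts as the reward
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self._state(), reward, done
```

A Deep Q-learning agent trained against such an interface maximizes cumulative signed price change, which corresponds directly to trading profit before costs.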
Abstract (Chinese) | 本文使用强化学习构建了与金融市场进行智能交互的自动交易代理。股票市场交易可以用于评估和开发新的机器学习方法,这些方法需要对金融市场交易问题的特征做出调整,尤其是强化学习。预测股市变化是一项非常艰巨的任务,因为驱动市场行为的基本模式是非静态的,这意味着过去学习到的有用的预测模式可能不适合在将来应用。强化学习尚未在该应用领域中广泛应用,相比于其他技术,强化学习的范式可以使代理具有更大自由度地直接学习交易决策模型,例如,无需预设定义用于购买或出售这些决策信号的特定阈值。价格的变化可以自然地被看作是一种奖励,所以强化学习可以避免在监督学习中标注示例和构建训练数据集所需的成本。在对先前文献的研究中,我们发现现有的应用强化学习算法来生成交易决策的研究通常不能解决非静态环境的问题。先前文献中所提出的方法得到的单一代理不会随着时间的变化而重新校准,同时学到的交易策略有时会陷入局部最优。本文提出的方法通过使用多个代理和一个多阶段学习模型来缓解上述提到的问题,多个代理可以竞争性地推荐最佳决策。我们的方法将在线学习与强化学习相结合。在线学习用于在决策点实时从一组代理中选择推荐的交易策略,还可以基于最近的数据重新学习和调整决策模型。为了更好地应用强化学习,实验中对训练强化学习代理的过程做出了调整,使更多的注意力集中在最新数据上。本文使用一系列来自国际和中国股票市场的数据,通过实验分析对所提出的方法进行评估。我们发现,在金融行业中常用于评估风险和收益的各种指标上,基于所提出的方法的代理都能够胜过基于其他机器学习方法的代理。实验表明,使用在线学习和强化学习的代理比基准交易方法购买并持有可获得更高的回报,并且使用在线学习可以大大提高Deep Q-learning代理的性能。值得注意的是,在金融危机期间,在线强化学习(OLR)代理可以在许多情况下保持盈利,而其他代理在所有测试中均有亏损。
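The abstract describes selecting one recommendation from a set of competing agents at each decision point and giving additional weight to recent data, but it does not name a specific online learning algorithm. A Hedge-style multiplicative-weights selector with exponential forgetting is one plausible realization; the sketch below, including the class name and parameters, is an assumption for illustration only:

```python
import numpy as np

class OnlineAgentSelector:
    """Hedge-style multiplicative-weights selection over a pool of trading
    agents. Exponential forgetting discounts old performance so the selector
    can re-adapt when the (non-stationary) market regime shifts."""

    def __init__(self, n_agents, eta=0.1, decay=0.99):
        self.weights = np.full(n_agents, 1.0 / n_agents)
        self.eta = eta      # online learning rate
        self.decay = decay  # < 1 gradually pulls weights back toward uniform

    def select(self):
        # Follow the currently best-weighted agent's recommendation
        # (one could also sample in proportion to the weights).
        return int(np.argmax(self.weights))

    def update(self, rewards):
        # rewards[i]: realised trading reward of agent i at this decision point.
        self.weights = self.weights ** self.decay        # forget old evidence
        self.weights *= np.exp(self.eta * np.asarray(rewards, dtype=float))
        self.weights /= self.weights.sum()               # renormalise
```

Here, `update` would be called once per decision point with the reward each agent's recommendation would have earned, so an agent that has performed well recently dominates the selection even if it was weak in an earlier market regime.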
Keywords |
Keywords (Chinese) |
Language | English
Training Category | Joint training (联合培养)
Document Type | Dissertation
Identifier | http://sustech.caswiz.com/handle/2SGJ60CL/143029
Collection | College of Engineering, Department of Computer Science and Engineering
Affiliation | Southern University of Science and Technology
Recommended Citation (GB/T 7714) | Liu J. AUTONOMOUS TRADING AGENT WITH REINFORCEMENT LEARNING[D]. Shenzhen: Harbin Institute of Technology, 2020.
Files in This Item:
File Name/Size | Document Type | Version | Access | License
AUTONOMOUS TRADING A(2536KB) | -- | -- | Restricted access | --