Title | Research on a Multi-Factor Stock Selection Model Based on the Random Forest Algorithm |
Other title | RESEARCH ON MULTI-FACTOR STOCK SELECTION MODEL BASED ON RANDOM FOREST ALGORITHM
|
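Name | |
Student ID | 11749260
|
Degree type | Master's
|
Major | Finance
|
Supervisor | |
Thesis defense date | 2019-05-25
|
Thesis submission date | 2019-07-05
|
Degree-granting institution | Harbin Institute of Technology
|
Degree-granting location | Shenzhen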
|
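Abstract | This thesis investigates how to combine machine learning with the traditional multi-factor stock selection framework. It builds a multi-factor stock selection model based on the random forest algorithm, which classifies individual stocks to screen for those worth investing in and then constructs an effective portfolio. The stock pool is the full A-share universe, and the factor pool spans six broad categories (value, growth, momentum, financial quality, technical, and analyst sentiment), giving 23 candidate factors in total. Factor data are taken on the last trading day of each month from January 2010 to December 2017, and the sample set pairs each month's factor data with the corresponding next-period monthly stock returns. Samples from January 2010 to December 2013 are used for parameter optimization, to determine the random forest hyperparameters and the optimal training window length; samples from January 2014 to December 2017 are used for out-of-sample backtesting, to evaluate the model's stock selection performance. The model is dynamic: in each backtest period it is trained on the previous 6 months of sample data, predicts from the current period's factor data, and holds the 50 stocks with the highest predicted probability in the next period, equally weighted. Model construction falls into three parts: data preprocessing and effective-factor screening, parameter optimization and result analysis, and model improvement and optimization. Over the backtest period from January 2014 to December 2017, the model achieved a total return of 160.05% and an annualized return of 27.64%, far ahead of the market benchmarks (CSI 300 and CSI 500), which indicates good stock selection performance. Compared with a non-dynamic learning model, the dynamic learning model is more timely and reflects market changes to some extent. In addition, weighting the portfolio by predicted probability, re-screening factors by factor importance, and applying factor rotation each improve on the original model's stock selection performance. |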
Other abstract | The purpose of this paper is to combine machine learning algorithms with the traditional multi-factor stock selection model. It constructs a multi-factor stock selection model based on the random forest algorithm, classifies stocks with the random forest algorithm to select valuable stocks, and constructs an effective portfolio. This paper uses all A shares as the stock pool and several broad factor classes as the factor pool, selecting 23 candidate factors from six classes: value, growth, momentum, financial quality, technical, and analyst sentiment. Factor data are taken on the last trading day of each month from January 2010 to December 2017, and the sample set is built from the factor data and the corresponding next-period monthly stock returns. The sample from January 2010 to December 2013 is used for model parameter optimization, to determine the hyper-parameters of the random forest algorithm and the optimal training window length. The sample from January 2014 to December 2017 is used for the out-of-sample back-test, to analyze the performance of the stock selection model. The multi-factor stock selection model based on the random forest algorithm is a dynamic stock selection model: in each back-test period it is trained on sample data from the past 6 months, uses current factor data to make predictions, selects the 50 stocks with the highest predicted probability as the next-period position, and allocates them with equal weight.
The construction of this model can be roughly divided into three parts: data preprocessing and effective factor screening, model parameter optimization and result analysis, and model improvement and optimization. In this paper, the multi-factor stock selection model based on the random forest algorithm has a total return of 160.05% and an annualized return of 27.64% during the back-test period from January 2014 to December 2017, and it significantly leads the market benchmarks (CSI 300 and CSI 500), which indicates that the model has good stock-picking performance. Compared with the non-dynamic learning model, the dynamic stock selection model in this paper is more timely and reflects changes in the market to a certain extent. In addition, the performance of the original stock selection model can be improved by using prediction probabilities as portfolio weights, using factor importance to re-filter the factors, and considering the rotation effect of factors. |
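Keywords | |
Other keywords | |
Language | Chinese
|
Training category | Joint training program
|
Output type | Thesis |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/38918 |
Collection | College of Business_Department of Finance |
Author affiliation | Southern University of Science and Technology |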
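Recommended citation (GB/T 7714) |
李杰. Research on a Multi-Factor Stock Selection Model Based on the Random Forest Algorithm[D]. Shenzhen: Harbin Institute of Technology, 2019.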
|
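The rolling procedure described in the abstract (train on the previous 6 months, score the current month, hold the top 50 stocks at equal weight, with probability weighting as one of the reported improvements) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `rolling_windows` and `select_portfolio`, the month labels, and the stock codes are hypothetical, and the classifier itself is left out. The predicted probabilities are assumed to come from some fitted random forest, for example the positive-class column returned by scikit-learn's `RandomForestClassifier.predict_proba`.

```python
from typing import Dict, List, Tuple


def rolling_windows(months: List[str], train_len: int = 6) -> List[Tuple[List[str], str]]:
    """Pair each rebalance month with the `train_len` months that precede it.

    The first `train_len` months have no full training window and are skipped.
    """
    return [
        (months[i - train_len:i], months[i])
        for i in range(train_len, len(months))
    ]


def select_portfolio(
    prob_up: Dict[str, float],
    top_n: int = 50,
    weighting: str = "equal",
) -> Dict[str, float]:
    """Keep the `top_n` stocks by predicted probability and assign weights.

    prob_up   : hypothetical mapping of stock code -> predicted probability
                of belonging to the "good" class for the next period.
    weighting : "equal" for 1/N weights, or "probability" to weight each
                holding in proportion to its predicted probability.
    """
    ranked = sorted(prob_up.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    if not ranked:
        return {}
    if weighting == "equal":
        return {code: 1.0 / len(ranked) for code, _ in ranked}
    if weighting == "probability":
        total = sum(p for _, p in ranked)
        if total == 0:
            # Degenerate case: all scores are zero, so fall back to 1/N.
            return {code: 1.0 / len(ranked) for code, _ in ranked}
        return {code: p / total for code, p in ranked}
    raise ValueError(f"unknown weighting scheme: {weighting!r}")
```

In a backtest loop, each `(train_months, predict_month)` pair from `rolling_windows` would drive one refit of the classifier on the training months, and the resulting scores for `predict_month` would be passed to `select_portfolio` to obtain the next period's holdings.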
Files in this item | ||||||
File name/size | Document type | Version type | Access type | License | Action | |
基于随机森林算法的多因子选股模型研究.p(2833KB) | -- | -- | Restricted access | -- | Request full text |