Title

Research on A-Share Return Forecasting Based on Missing Data Imputation and Machine Learning

Alternative Title
FORECASTING A-SHARE STOCK RETURNS BASED ON MISSING DATA INTERPOLATION AND MACHINE LEARNING
Name
杨帅
Name (Pinyin)
YANG Shuai
Student ID
12132997
Degree Type
Master's
Degree Major
0251 Finance
Discipline / Professional Degree Category
Academic::02 Economics
Supervisor
周倜
Supervisor's Department
School of Business
Thesis Defense Date
2023-05
Thesis Submission Date
2023-06-30
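Degree-Granting Institution
Southern University of Science and Technology (SUSTech)
Place of Degree Conferral
Shenzhen
Abstract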

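In recent years, as asset pricing theory has developed, a growing number of scholars have questioned the ad hoc sparsity assumption implicit in traditional factor pricing models. Meanwhile, machine learning models are being applied ever more widely in asset pricing. Focusing on China's A-share market, this thesis selects a representative set of machine learning models, namely partial least squares (PLS) regression, elastic net, random forest, neural networks, support vector regression (SVR), AdaBoost, LightGBM, and XGBoost, and uses firm characteristics, industry variables, and macroeconomic variables to forecast A-share stock returns. It then evaluates and interprets the forecasts. For missing-data imputation, it compares the KNN Imputer method with conventional mean imputation.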
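
The abstract contrasts KNN Imputer with mean imputation. The sketch below illustrates the difference on a toy characteristics matrix. It is a simplified NumPy re-implementation of the idea behind scikit-learn's `KNNImputer` (uniform weights, nan-Euclidean distance), not the thesis's code: the neighbour count `k`, the toy data, and the helper names `mean_impute`, `nan_euclidean`, and `knn_impute` are illustrative assumptions. The record does not say how the imputer was configured or whether characteristics were standardized first, which matters because the distance is scale-sensitive.

```python
import numpy as np


def mean_impute(X):
    """Replace every NaN with the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def nan_euclidean(a, b):
    """Euclidean distance over coordinates observed in both rows, scaled up by
    (number of coordinates / number of shared coordinates)."""
    shared = ~np.isnan(a) & ~np.isnan(b)
    n_shared = shared.sum()
    if n_shared == 0:
        return np.inf
    sq = np.sum((a[shared] - b[shared]) ** 2)
    return float(np.sqrt(sq * a.size / n_shared))


def knn_impute(X, k=5):
    """Fill each NaN with the average of that column over the k nearest rows
    (by nan-Euclidean distance) that observe the column; fall back to the
    column mean when no comparable neighbour exists."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    col_means = np.nanmean(X, axis=0)
    for i, j in zip(*np.where(np.isnan(X))):
        # Candidate donors: other rows that actually observe feature j.
        donors = np.array([r for r in range(X.shape[0])
                           if r != i and not np.isnan(X[r, j])], dtype=int)
        dists = np.array([nan_euclidean(X[i], X[r]) for r in donors])
        keep = np.isfinite(dists)
        if not keep.any():
            out[i, j] = col_means[j]
            continue
        nearest = donors[keep][np.argsort(dists[keep])[:k]]
        out[i, j] = X[nearest, j].mean()
    return out


# Toy cross-section: rows = stocks, columns = characteristics (hypothetical values).
X = np.array([
    [1.0, 2.0, np.nan],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.2],
    [5.0, 8.0, 9.0],
])
print(mean_impute(X)[0, 2])      # column mean of 3.0, 3.2 and 9.0
print(knn_impute(X, k=2)[0, 2])  # mean over the two most similar stocks (rows 1 and 2)
```

Mean imputation pulls the missing value toward the outlying fourth stock, while the KNN version borrows only from stocks that resemble the first one on its observed characteristics. With scikit-learn installed, the same comparison can be run with `sklearn.impute.SimpleImputer(strategy="mean")` and `sklearn.impute.KNNImputer(n_neighbors=k)`.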

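The empirical results show that the linear models, PLS regression and elastic net, forecast stock returns only moderately well, whereas the gradient-boosted decision tree models LightGBM and XGBoost perform better. In addition, the nearest-neighbor-based KNN Imputer significantly improves the models' forecasting performance. Ranking the importance of the constructed firm characteristics and macroeconomic variables shows that return-based and trading-based factors are the most important and value factors are of moderate importance; among the macroeconomic variables, index volatility and the earnings-price ratio matter most, while the dividend-price ratio and the credit spread matter least. The thesis also uses each model's forecasts to form long, short, and long-short portfolios, and regresses the long-short portfolio returns, as test assets, on traditional factor models. All machine learning models except PLS regression and elastic net earn statistically significant excess returns, further confirming the strong performance of machine learning models in asset pricing.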
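
The long-short construction and factor-model test described above can be sketched as follows. This is a minimal illustration on simulated data, assuming equal-weighted decile sorts on the predicted return and a single hypothetical market factor; the record does not specify the thesis's breakpoints, weighting scheme, or factor models, and `long_short_return` and `factor_alpha` are hypothetical helper names. The standard errors are plain OLS; empirical studies often use Newey-West adjustments instead.

```python
import numpy as np


def long_short_return(pred, realized, q=0.1):
    """Equal-weighted return of the top-q minus the bottom-q of stocks,
    sorted on predicted return, for one month's cross-section."""
    pred = np.asarray(pred, dtype=float)
    realized = np.asarray(realized, dtype=float)
    m = max(1, int(len(pred) * q))  # number of stocks in each leg
    order = np.argsort(pred)
    short_leg = realized[order[:m]].mean()
    long_leg = realized[order[-m:]].mean()
    return float(long_leg - short_leg)


def factor_alpha(ls, factors):
    """Time-series OLS of long-short returns on factor returns.
    Returns (alpha, t-statistic of alpha) using classical OLS standard errors."""
    ls = np.asarray(ls, dtype=float)
    X = np.column_stack([np.ones(len(ls)), np.asarray(factors, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, ls, rcond=None)
    resid = ls - X @ beta
    sigma2 = resid @ resid / (len(ls) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(beta[0]), float(beta[0] / np.sqrt(cov[0, 0]))


# Simulated panel: every value below is synthetic, for illustration only.
rng = np.random.default_rng(0)
T, N = 120, 200                          # months, stocks
mkt = rng.normal(0.005, 0.04, size=T)    # hypothetical market factor
ls = np.empty(T)
for t in range(T):
    signal = rng.normal(size=N)                                 # true expected-return driver
    realized = mkt[t] + 0.01 * signal + rng.normal(0, 0.05, N)  # realized returns
    pred = signal + rng.normal(size=N)                          # noisy model forecast
    ls[t] = long_short_return(pred, realized)

alpha, t_alpha = factor_alpha(ls, mkt)
print(f"alpha = {alpha:.4f} per month, t = {t_alpha:.2f}")
```

Because both legs share the same market return, the long-short spread is close to market-neutral, so a significant intercept reflects return predictability that the factor does not explain. A multi-factor test works the same way by passing a `(T, K)` factor matrix.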

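This thesis attributes the superior performance of machine learning models to two features that traditional factor models lack: they capture nonlinear relationships among characteristics, and they bring high-dimensional covariates into the model. By analyzing cross-sectional correlations in the financial data, the thesis further shows that no small set of factors can explain all of the variation in stock returns, and it concludes that asset pricing research will move toward deeper and more complex models.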

Keywords
Alternative Keywords
Language
Chinese
Training Mode
Independent training
Year of Enrollment
2021
Year of Degree Conferral
2023-06
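Degree Evaluation Subcommittee
Finance
Chinese Library Classification (CLC) Number
F832.5
Source Repository
Manual submission
Output Type
Thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/544690
Collection
School of Business_Department of Finance
Recommended Citation
GB/T 7714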
杨帅. 基于缺失数据插补和机器学习的A股收益率预测研究[D]. 深圳: 南方科技大学, 2023.
Files in This Item
12132997-杨帅-金融系.pdf (2321 KB), restricted access

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.