Title

上市公司信用风险度量研究:基于KMV模型和数据挖掘方法

Other title
RESEARCH ON CREDIT RISK MEASUREMENT OF LISTED COMPANIES: BASED ON KMV MODEL AND DATA MINING
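Name
Student ID
11849448
Degree type
Master's
Degree discipline
Finance
Supervisor
王新杰
Thesis defense date
2020-05-28
Thesis submission date
2020-06-30
Degree-granting institution
Harbin Institute of Technology
Degree-granting location
Shenzhen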
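Abstract
This thesis proposes an empirical back-testing framework based on panel data, together with a system for assessing the quality of credit risk measures that covers consistency, stability, and timeliness and uses ROC (Receiver Operating Characteristic) as its evaluation metric. Using this framework and assessment system, with China's A-share listed companies from 2000 to 2019 as the research sample, two forecast horizons (1 year and 2 years), and two credit risk labels ("ST" and "default"), it compares the credit risk measurement quality of the KMV model, data mining techniques, and credit ratings. The thesis discusses in full the similarities and differences between "ST" and "default" and takes the novel step of using "ST" to predict "default": machine learning models are trained on the relatively plentiful ST samples and then used to predict the relatively scarce default samples, which addresses the difficulty of modeling defaults when default samples are few and clustered in time. The results show that models trained on "ST" samples with data mining techniques predict default significantly better than credit ratings do, so the approach is feasible. In the empirical study of the KMV model, the thesis explores how different input variables and forms of the distance to default (DD) affect ST prediction, and finds that the KMV-Merton model with short-term liabilities as the default point, static volatility as the volatility input, and the individual stock's excess return as the estimate of the expected growth rate of asset value predicts better than the alternatives. Moreover, the KMV distance to default shows good accuracy and stability on cross-sectional data (year by year), but its accuracy falls markedly on panel data. The reason is that the distance to default changes too sharply over time, and estimating the expected asset growth rate from excess returns is "pro-cyclical", which further amplifies the fluctuation of the distance to default and makes it poorly comparable across time. For default prediction, the results show that the ROC of the KMV model is around 0.5 and that it cannot identify default samples characterized by "high market value, high leverage, high growth, and low equity volatility". In the empirical study based on data mining techniques, the thesis also examines how data preprocessing, feature selection, and model selection affect predictive performance. For data preprocessing, discretizing the data (binning) improves the model more than the choice of model does. For feature selection, IV-based feature screening and "expert features" each have their own advantages; market data provide credit risk information beyond financial data and thereby improve model performance, with equity value and excess returns playing the main role and the KMV distance to default contributing little, and this improvement weakens as the forecast horizon lengthens. For model selection, logistic regression and SVM with a linear kernel generally outperform random forest, although the three differ little when the forecast horizon is 1 year.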
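The abstract names the KMV-Merton distance to default (DD) but does not give its formula. As a minimal sketch, assuming the textbook Merton setup (equity treated as a call option on the firm's assets) rather than the thesis's exact specification, asset value and asset volatility can be backed out of equity data by fixed-point iteration and then turned into a DD. The function names, the iteration scheme, and all input numbers are illustrative assumptions; `debt` stands for the default point (short-term liabilities in the best-performing variant) and `mu` for the expected asset growth rate (estimated from the stock's excess return in that variant).

```python
import math


def norm_cdf(x):
    """Standard normal CDF, built from math.erf (standard library only)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def solve_asset_value(equity, sigma_e, debt, r, t=1.0, tol=1e-10, max_iter=500):
    """Back out asset value V and asset volatility sigma_V (assumed scheme).

    Iterates on the two textbook Merton relations:
        E       = V * N(d1) - D * exp(-r * t) * N(d2)
        sigma_E = (V / E) * N(d1) * sigma_V
    """
    v = equity + debt                  # starting guess: book-style asset value
    sigma_v = sigma_e * equity / v     # starting guess: de-levered equity vol
    for _ in range(max_iter):
        sqrt_t = math.sqrt(t)
        d1 = (math.log(v / debt) + (r + 0.5 * sigma_v ** 2) * t) / (sigma_v * sqrt_t)
        d2 = d1 - sigma_v * sqrt_t
        v_new = (equity + debt * math.exp(-r * t) * norm_cdf(d2)) / norm_cdf(d1)
        sigma_new = sigma_e * equity / (v_new * norm_cdf(d1))
        converged = abs(v_new - v) < tol * v and abs(sigma_new - sigma_v) < tol
        v, sigma_v = v_new, sigma_new
        if converged:
            break
    return v, sigma_v


def distance_to_default(v, sigma_v, debt, mu, t=1.0):
    """DD = [ln(V / D) + (mu - sigma_V^2 / 2) * t] / (sigma_V * sqrt(t))."""
    return (math.log(v / debt) + (mu - 0.5 * sigma_v ** 2) * t) / (sigma_v * math.sqrt(t))


# Hypothetical inputs, for illustration only (not taken from the thesis).
asset_value, asset_vol = solve_asset_value(equity=40.0, sigma_e=0.5, debt=60.0, r=0.03)
dd = distance_to_default(asset_value, asset_vol, debt=60.0, mu=0.05)
```

Because `mu` enters the numerator directly, an estimate that rises in booms and falls in downturns moves the DD in the same direction, which is consistent with the pro-cyclicality the abstract reports.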
Other abstract
This paper proposes an empirical back-testing framework based on panel data and a quality appraisal system for credit risk measurement that uses ROC (Receiver Operating Characteristic) as its core index and covers consistency, stability, and timeliness. Based on the back-testing framework and the appraisal system, taking China's A-share listed companies from 2000 to 2019 as the research sample, setting forecast periods of 1 year and 2 years, and using two credit risk labels, "ST" and "default", this paper compares the quality of credit risk measurement of the KMV model, data mining technology, and credit ratings. This paper comprehensively discusses the similarities and differences between "ST" and "default", and creatively uses "ST" to predict "default": machine learning models are trained on the plentiful ST samples to predict the relatively scarce default samples, which resolves the difficulty of modeling default samples directly, given that they are currently few in number and concentrated in time. The results show that the model trained with "ST" samples using data mining technology predicts default significantly better than credit ratings do, so it is feasible to predict "default" with "ST". In the empirical study of the KMV model, this paper explores the influence of different input variables and forms of the distance to default (DD) on ST prediction, and finds that the KMV-Merton model with short-term liabilities as the default point, static volatility as the volatility, and the excess return of the individual stock as the expected return on the firm's assets predicts better than the other variants. Moreover, the DD of the KMV model has good accuracy and stability on cross-sectional data (year by year), while its accuracy on panel data decreases significantly. The reason is that the DD varies too dramatically over time, and the excess return, as an estimate of the expected growth rate of assets, is "pro-cyclical", which further aggravates the volatility of the DD; the DD therefore becomes less comparable across time. As for default prediction, the results show that the ROC of the KMV model is around 0.5, and that it fails to identify default samples with "high market value, high leverage, high growth and low equity volatility". In the empirical research based on data mining technology, this paper explores the influence of data preprocessing, feature selection, and model selection on prediction. In terms of data preprocessing, data discretization improves the model more than model selection does. In terms of feature selection, selection based on IV value and "expert features" each have their own advantages; compared with financial data alone, market data can provide additional information about credit risk during the 1-year forecast period, thus improving model performance; market value and excess returns play the major role, the contribution of DD is not significant, and the improvement diminishes as the forecast period increases. In terms of model selection, logistic regression and the SVM model with a linear kernel perform better than the random forest model. However, when the forecast period is 1 year, there is not much difference among them.
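The gap between year-by-year and panel accuracy can be illustrated with a toy calculation. The sketch below assumes that "ROC" in the abstract refers to the area under the ROC curve (AUC), computes it as the share of (positive, negative) pairs that the score ranks correctly, and applies it to made-up DD values whose overall level shifts between two hypothetical years. The data, the function names, and the use of negative DD as the risk score are all assumptions for illustration, not the thesis's back-testing code.

```python
from collections import defaultdict


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A higher score means higher predicted risk; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_year(years, scores, labels):
    """Cross-sectional view: one AUC per year."""
    groups = defaultdict(lambda: ([], []))
    for yr, s, y in zip(years, scores, labels):
        groups[yr][0].append(s)
        groups[yr][1].append(y)
    return {yr: auc(s, y) for yr, (s, y) in sorted(groups.items())}


# Made-up data: in hypothetical year 1 every firm has a high DD, in year 2 a low DD.
years = [1, 1, 1, 2, 2, 2]
dd = [4.0, 6.0, 7.0, 1.0, 2.0, 3.0]
labels = [1, 0, 0, 1, 0, 0]        # 1 = distressed firm, 0 = healthy firm
risk = [-x for x in dd]            # lower DD means higher risk

per_year = auc_by_year(years, risk, labels)   # {1: 1.0, 2: 1.0}
pooled = auc(risk, labels)                    # 0.75
```

In this toy data the ranking is perfect within each year, but pooling the years mixes a high-DD regime with a low-DD one and lowers the AUC, which is the kind of cross-time comparability problem the abstract describes.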
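Keywords
Other keywords
Language
Chinese
Training category
Joint training
Output type
Thesis
Item identifier
http://sustech.caswiz.com/handle/2SGJ60CL/143149
Collection
College of Business, Department of Finance
Author affiliation
Southern University of Science and Technology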
Recommended citation
GB/T 7714
罗常章. 上市公司信用风险度量研究:基于KMV模型和数据挖掘方法[D]. 深圳. 哈尔滨工业大学,2020.
Files in this item
File name/size | Document type | Version type | Access type | License
上市公司信用风险度量研究:基于KMV模型 (4824KB) | -- | -- | Restricted access | --
