Title

基于机器学习的网络借贷反欺诈模型

Alternative title (English)
ONLINE LENDING ANTI-FRAUD MODEL BASED ON MACHINE LEARNING
Name
何雯芳 (He Wenfang)
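Student ID
11930286
Degree type
Master's
Major
Mathematics
Supervisor
王新杰 (Wang Xinjie)
Thesis defense date
2021-05-18
Thesis submission date
2021-06-10
Degree-granting institution
Southern University of Science and Technology (南方科技大学)
Place of conferral
Shenzhen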
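Abstract
In recent years, with the rapid growth of the Internet industry, the "Internet+" model has changed how many industries operate. Online lending has emerged and expanded quickly, and it now features increasingly complex data and diversified payment channels. At the same time, the industry's growth has brought many problems. In particular, the number of fraudulent transactions keeps rising, and fraud techniques are sophisticated and hard to detect, becoming increasingly technical, professional, and large-scale, so that many online lending platforms have suffered losses. To combat fraud in online lending services, the state has issued a number of policies in recent years and continues to refine the relevant regulations. Financial institutions, for their part, should keep improving their own mechanisms and build large-scale data management systems. This thesis studies the online lending fraud-monitoring data published by the Paipaidai (拍拍贷) platform. The data are first normalized, SMOTE is applied to handle the class imbalance, and the features are ranked and screened based on random-forest feature-importance analysis. The random forest computes the importance of each individual feature variable, which enables feature selection. The underlying principle is that if adding random noise to a feature sharply reduces the out-of-bag accuracy, the classification result depends strongly on that feature, i.e., the feature is highly important. SMOTE is used to rebuild the training samples: rather than simply duplicating minority-class instances, it creates new data by interpolating between several minority-class instances within a defined neighborhood, and can therefore synthesize new minority-class data effectively. The models trained in this study are logistic regression, random forest, Boosting models (AdaBoost, XGBoost), and Bagging and Stacking ensemble models, and the predictions of these six models are compared using four evaluation metrics: accuracy, precision, AUC, and recall. The empirical analysis shows that the online lending fraud-detection model built on the Bagging ensemble (the LR-RF-XGBoost-AdaBoost voting model) detects fraud in online lending transactions well.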
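The feature-importance principle described above can be illustrated with a short sketch. It assumes the "random noise" is injected by shuffling one feature column at a time (the permutation approach commonly used for random-forest importance), and it uses a hypothetical threshold rule in place of a trained random forest so that it runs with the Python standard library alone. In the thesis setting the predictor would be the forest and the data its out-of-bag samples. The names `accuracy`, `permutation_importance`, and `toy_predict` and the toy data are illustrative, not taken from the thesis.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def accuracy(predict: Callable[[Row], int], X: List[Row], y: List[int]) -> float:
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(
    predict: Callable[[Row], int],
    X: List[Row],
    y: List[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is randomly shuffled.

    Shuffling a column keeps its distribution but breaks its link to the
    label, i.e. it turns that feature into noise.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by its shuffled value.
            X_noisy = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_noisy, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: the label depends only on feature 0; feature 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_predict(row: Row) -> int:
    # Stand-in for a trained random forest (assumption for illustration only).
    return int(row[0] > 0.5)


importances = permutation_importance(toy_predict, X, y)
print(importances)  # feature 0 well above 0; feature 1 exactly 0.0
```

Because the toy predictor ignores feature 1, shuffling that column leaves the accuracy unchanged and its importance is exactly 0.0, while shuffling feature 0 pushes the accuracy toward chance level. With scikit-learn, `sklearn.inspection.permutation_importance` offers a ready-made version of the same idea.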
Alternative abstract (English)
In recent years, with the vigorous development of the Internet industry, the "Internet+" model has changed the operating mode of many industries. Online lending has emerged and developed rapidly, and it now features complex data and diversified payment channels. At the same time, many problems have emerged as the industry has developed, especially the growing number of fraudulent transactions. Fraud methods are cunning and difficult to detect, and they are becoming increasingly technical, specialized, and large-scale, so the interests of consumers and financial institutions suffer as a result. In recent years, to combat fraud in online lending services, a number of national policies have been issued to continuously improve the relevant legislation and protect the property of financial institutions and consumers. At the same time, financial institutions should continue to improve their own mechanisms, establish large-scale database management systems, and strengthen the monitoring of fraudulent activity, so as to protect their legitimate economic interests and promote their long-term, stable development. This thesis studies the online lending fraud-monitoring data published by the Paipaidai platform. First, the data are normalized, SMOTE is used to handle the class imbalance, and the features are ranked and screened based on random-forest feature-importance analysis. The feature selection stage relies on a key property of the random forest model: its ability to compute the importance of each individual feature variable.
The underlying principle is that if randomly adding noise to a feature greatly reduces the out-of-bag accuracy, that feature has a strong influence on the classification results, i.e., it is highly important. To rebuild the training samples, SMOTE is used to oversample the minority class. SMOTE creates new data by interpolating between several minority instances within a defined neighborhood instead of simply copying minority instances, so it can synthesize new minority data effectively. The models used include traditional models such as logistic regression (LR) and random forest (RF), Boosting models (AdaBoost, XGBoost), and Bagging models. After parameter tuning and optimization, the predictions of the five types of models are compared and evaluated using accuracy, precision, and AUC. The empirical analysis shows that the online loan fraud-detection model based on the fusion model (the LR-RF-XGBoost-AdaBoost voting model) performs well in detecting fraud in online loan transactions.
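The SMOTE interpolation step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: Euclidean distance defines the neighborhood, each synthetic point is generated from a randomly chosen minority sample and one of its k nearest minority neighbors, and the function name `smote` and the toy "fraud" points are hypothetical. The abstract does not give the thesis's exact SMOTE settings; in practice a library implementation such as imbalanced-learn's `SMOTE` class (via `fit_resample`) would typically be used.

```python
import math
import random
from typing import List


def smote(minority: List[List[float]], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each sample's k nearest minority neighbors (Euclidean distance).
    neighbors = []
    for i, x in enumerate(minority):
        others = sorted((math.dist(x, z), j) for j, z in enumerate(minority) if j != i)
        neighbors.append([j for _, j in others[:k]])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        z = minority[rng.choice(neighbors[i])]
        lam = rng.random()  # interpolation weight, uniform in [0, 1)
        # The new point lies on the segment between x and its neighbor z.
        synthetic.append([a + lam * (b - a) for a, b in zip(x, z)])
    return synthetic


# Hypothetical minority-class (fraud) samples with two normalized features.
fraud = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_rows = smote(fraud, n_new=6, k=2)
print(new_rows)
```

Each new row lies on the line segment between an existing fraud sample and one of its neighbors, which is why SMOTE yields new, non-duplicate minority examples rather than copies of existing ones.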
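Language
Chinese
Training mode
Independent training (not a joint program)
Item type
Thesis
Item identifier
http://sustech.caswiz.com/handle/2SGJ60CL/229887
Collection
Business School, Department of Finance (商学院_金融系)
Author affiliation
Southern University of Science and Technology
Recommended citation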
GB/T 7714
何雯芳. 基于机器学习的网络借贷反欺诈模型[D]. 深圳: 南方科技大学, 2021.
Files in this item
File: 基于机器学习的网络借贷反欺诈模型.pdf (2321 KB)
Access: restricted

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.