Title

基于数据挖掘的银行电话精准营销

Other Title
PRECISE MARKETING OF BANK CALLS BASED ON DATA MINING
Name
Student ID
11849390
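Degree Type
Master's
Degree Discipline
Applied Statistics
Supervisor
夏志宏
Defense Date
2020-05-30
Submission Date
2020-07-20
Degree-Granting Institution
Harbin Institute of Technology
Degree-Granting Location
Shenzhen
Abstract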
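With the arrival of the data era and the wide application of data mining, banks no longer sell financial products through a single, cast-a-wide-net approach; instead they rely on intelligent big-data analysis and accurate algorithmic judgment to carry out diversified precision marketing. Telemarketing, a traditional marketing channel in banking, is effective at acquiring customers. Traditional bank telemarketing, however, is random and has a low hit rate, so it struggles to meet current needs. Making good use of the data in bank databases to conduct scientific and effective telemarketing is key to bank digitalization and intelligent branches.

This thesis studies the prediction of bank telemarketing outcomes. Owing to the nature of the industry, bank customer data sets are imbalanced. Most existing research on bank telemarketing concentrates on improving model performance. Although the reported predictions are good, the models are mostly trained on balanced data sets, which changes the original distribution of the data, and the best model is chosen on only one or two metrics, which is not comprehensive and deviates from practical use. This work therefore starts from the data level: taking the data distribution into account and using data mining tools, it considers multiple evaluation metrics together, studies how various sampling strategies for imbalanced data affect model performance, identifies the best sampling strategy by comparison, and finds the best model under it, so as to improve prediction performance and marketing success rate and achieve precision marketing. Finally, the misclassified samples are analyzed statistically, customers are grouped according to the prediction results, and the characteristics of potential customers are mined, yielding practical suggestions for bank telemarketing from two angles: increasing bank revenue and reducing customer acquisition cost.

The experimental data come from the Portuguese bank data set on the UCI website, with 41188 records and a positive-to-negative ratio of 1:7.8. We split the data into training and test sets at a ratio of 8:2, resample the training set with ENN, Borderline-SMOTE, SMOTE+ENN, and the TS sampling proposed in this thesis, and then train classification models including logistic regression, decision tree, XGBoost, and LightGBM on the resampled data. A combined analysis of F1, KS, AUC, and other metrics shows that ENN sampling gives the best overall results across models, and that LightGBM under ENN sampling predicts best. The proposed TS sampling shows no clear improvement; analysis suggests the main cause is that sample information is reused during sampling, which makes overfitting likely. In addition, customer grouping under ENN shows that potential customers closely resemble successfully marketed customers, mainly in the following respects: aged 31 to 50; educated to high school level or above; in relatively stable jobs such as technician or administrator; married, with a stable marital status; no adverse records such as defaulted loans or housing loans; and a preference for cellular as the contact method.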
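As an illustration of the ENN (Edited Nearest Neighbours) sampling step named in the abstract, here is a minimal pure-Python sketch. It is not the thesis's implementation. It assumes a majority-vote variant that cleans only the majority class, with k = 3 and Euclidean distance; the function name `edited_nearest_neighbours`, its parameters, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, majority_label, k=3):
    """Drop majority-class samples whose k nearest neighbours mostly disagree.

    Assumption: only the majority class is cleaned and a simple majority
    vote decides; other ENN variants require all neighbours to agree.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # minority samples are always kept
            continue
        # Indices of the k samples closest to sample i (excluding itself).
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        votes = Counter(y[j] for j in neighbours)
        if votes.most_common(1)[0][0] == yi:
            keep.append(i)  # neighbourhood agrees with the label
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: label 0 is the majority class, label 1 the minority class.
# The last point carries label 0 but sits inside the label-1 cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (5.4, 5.4)]
y = [0, 0, 0, 0, 1, 1, 1, 0]

X_res, y_res = edited_nearest_neighbours(X, y, majority_label=0)
print(len(X), "->", len(X_res), "samples after cleaning")
```

In this toy run only the majority point lying inside the minority cluster is removed. As in the abstract, such cleaning would be applied to the training split only, so the test set keeps its original class distribution.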
Other Abstract
With the advent of the data era and the widespread application of data mining technology, bank financial products are no longer sold through a single, cast-a-wide-net approach, but rely on intelligent analysis of big data and accurate judgment of algorithms for diversified precision marketing. As a traditional marketing method in the banking industry, telemarketing is effective in acquiring customers. Traditional bank telephone marketing, however, struggles to meet the needs of the times because of its randomness and low hit rate. How to make good use of the various data in bank databases to carry out scientific and effective telephone marketing is the key to realizing the digitalization of banks and the intelligence of outlets. The research object of this paper is the prediction of the results of bank telephone marketing. Due to the characteristics of the industry, bank customer data sets are unbalanced. Most current research on bank telephone marketing focuses on improving model effectiveness. Although the model prediction results are good, the data sets selected for model training are mostly balanced, which changes the original distribution of the data, and the optimal model is selected based on only one or two indicators, which is not comprehensive enough and deviates from actual application.
Therefore, this topic starts from the data level, combines the distribution of the data sets, uses data mining tools, comprehensively considers multiple evaluation indicators, studies the impact of various sampling strategies for unbalanced data sets on model effects, and compares them to obtain the best sampling strategy. It then looks for the best model under the best sampling strategy, in order to improve the model prediction effect and marketing success rate and achieve precision marketing. Finally, a statistical analysis is made of the wrongly predicted samples, customers are grouped according to the prediction results, and the characteristics of potential customers are mined to provide practical and effective suggestions for bank telephone marketing from two aspects: increasing bank profits and reducing customer acquisition costs. The experimental data set in this paper is the Portuguese bank data set from the UCI website, with a total of 41188 records and a positive-to-negative ratio of 1:7.8. The data set is divided into a training set and a test set at a ratio of 8:2, the training set is sampled by means of ENN, Borderline-SMOTE, SMOTE+ENN, and the TS sampling proposed in this paper, and logistic regression, decision tree, XGBoost, LightGBM and other classification models are then trained on the sampled data. Through comprehensive analysis of the F1 value, KS value, AUC value and other evaluation indicators, it is found that ENN sampling has the best comprehensive effect on each model, and that the LightGBM model under ENN sampling has the best prediction effect. The effect of the TS sampling proposed in this paper is not obvious. Analysis suggests the main reason is that sample information is reused during the sampling process, which makes overfitting likely.
In addition, customer grouping under ENN shows that the characteristics of potential customers are very similar to those of successfully marketed customers, mainly in the following aspects: young and middle-aged people between the ages of 31 and 50; high school education or above; relatively stable work, such as technician or administrator; married, with a stable marital status; no bad records, such as defaulted loans or housing loans; and a preference for cellular as the contact method.
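The abstract compares models on F1, KS, and AUC together. The following pure-Python sketch shows one common way to compute the three metrics from true labels and predicted scores. It is an illustration only: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions and do not come from the thesis.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def ks_statistic(y_true, scores):
    """Largest gap between cumulative TPR and FPR over all score thresholds."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(order):
        s = scores[order[i]]
        # Consume every sample tied at this score before measuring the gap.
        while i < len(order) and scores[order[i]] == s:
            if y_true[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


def auc(y_true, scores):
    """Chance that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores (hypothetical); 1 marks a successful call.
y_true = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("F1 :", round(f1_score(y_true, y_pred), 3))
print("KS :", round(ks_statistic(y_true, scores), 3))
print("AUC:", round(auc(y_true, scores), 3))
```

F1 depends on the chosen threshold, while KS and AUC are computed from the ranking of scores, which is one reason to read the three together on an imbalanced test set.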
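Keywords
Other Keywords
Language
Chinese
Training Category
Joint training
Document Type
Thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/142642
Collection
School of Innovation and Entrepreneurship
Author Affiliation
Southern University of Science and Technology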
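Recommended Citation
GB/T 7714
王梦蓉. 基于数据挖掘的银行电话精准营销[D]. 深圳: 哈尔滨工业大学, 2020.
Files in This Item
File Name/Size | Document Type | Version Type | Access Type | License | Action
基于数据挖掘的银行电话精准营销.pdf (1358KB) | -- | -- | Restricted access | -- | Request full text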