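Chinese title: | Research on the Default Risk of Personal Credit Loans of a Commercial Bank: Based on Machine Learning Methods |
Name: | |
Confidentiality level: | Public |
Thesis language: | Chinese |
Discipline code: | 025200 |
Discipline/major: | |
Student type: | Master's |
Degree: | Master of Applied Statistics |
Degree type: | |
Degree year: | 2020 |
Campus: | |
School: | |
Research direction: | Applied statistics |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2020-06-19 |
Defense date: | 2020-05-29 |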
Foreign-language title: | Research on the Default Risk of Personal Credit Loan of a Commercial Bank: Based on Machine Learning |
Chinese keywords: | Personal credit loan ; Default risk ; Machine learning methods ; Stacking model ensemble |
Foreign-language keywords: | Personal Credit Loan ; Default Risk ; Machine Learning ; Stacking |
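Chinese abstract: |
Against the backdrop of Internet finance, as household incomes keep rising and new consumer groups gradually take shape, the personal credit loan products of commercial banks have developed rapidly and become an indispensable source of profit within banks' lending business. At the same time, however, the default risk of loan customers has grown further, and the non-performing loan ratio of commercial banks has risen with it. Commercial banks must therefore adopt effective risk management measures for their personal credit loan business. With the rapid development of machine learning, it is key to the healthy and sustainable development of banks that they combine big data technology with massive customer loan data to build default prediction models for personal credit loans and effectively identify defaulting customers. This thesis applies machine learning methods to the default risk of personal credit loans at commercial banks, taking the 2017-2019 personal credit loan data of a domestic commercial bank as the empirical research object. The raw data are first preprocessed, and new features are generated by combining the original features in light of business knowledge. Several methods, including correlation coefficients and LightGBM feature importance ranking, are then used to score and select features. Finally, five machine learning methods (Logistic regression, random forest, XGBoost, LightGBM and Stacking ensemble) are used to build default risk prediction models for personal credit loans at commercial banks. On this basis, two metrics, AUC and KS, are used to compare the models' default prediction ability and risk discrimination ability from two angles: different feature subsets and different prediction models. In terms of feature subsets, compared with original features that have only been preprocessed, constructing new features on top of the original ones and removing redundant features improves model performance considerably, which demonstrates the effectiveness of feature derivation and feature selection. In terms of prediction models, among the single models, LightGBM and XGBoost outperform Logistic regression and random forest. The Stacking ensemble built on random forest, XGBoost and LightGBM, however, outperforms every single model: it has stronger default prediction ability and risk discrimination ability and identifies defaulting customers more accurately and effectively. The conclusions of this thesis can serve as a reference for commercial banks' credit granting and default risk early warning. |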
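The two evaluation metrics named in the abstract, AUC and KS, can be illustrated with a minimal pure-Python sketch. It uses the standard definitions (AUC as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, KS as the largest gap between the true-positive and false-positive rates over all thresholds). The function names `auc_score` and `ks_statistic` and the toy labels and scores are hypothetical, not taken from the thesis.

```python
from typing import Sequence


def auc_score(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC via pairwise comparison: P(score of defaulter > score of non-defaulter).

    Ties between a defaulter and a non-defaulter count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """KS statistic: maximum |TPR - FPR| over all score thresholds."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    # Walk through the samples from the highest score to the lowest,
    # treating equal scores as a single threshold.
    pairs = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
        i = j
    return best


# Hypothetical toy data: 1 = default, 0 = no default.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
print(auc_score(labels, scores))     # 0.75
print(ks_statistic(labels, scores))  # 0.5
```

The pairwise AUC is quadratic in the sample size and is meant only to make the definition concrete; on real loan data a rank-based implementation would be the practical choice.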
Foreign-language abstract: |
In the context of Internet finance, with residents' rising income levels and the emergence of new consumer groups, the personal credit loan products of commercial banks have developed rapidly and become an indispensable profitable business within bank lending. At the same time, however, the default risk of loan customers has increased further, and the non-performing loan ratio of commercial banks has risen as well. Commercial banks must therefore take effective risk management measures for their personal credit loan business. With the rapid development of machine learning, using big data technology and massive amounts of customer loan data to build default prediction models for personal credit loans, and thereby effectively identifying defaulting customers, is key to the healthy and sustainable development of commercial banks. This paper studies the default risk of personal credit loans of commercial banks using machine learning, taking the personal credit loan data of a domestic commercial bank from 2017 to 2019 as the empirical research object. First, the original data set is cleaned and preprocessed in Python, and new features are generated from the original features using domain knowledge. Second, the correlation coefficient method, LightGBM feature importance ranking and other methods are used to score feature importance and select important features. Finally, Logistic Regression, Random Forest, XGBoost, LightGBM and Stacking are used to construct default risk prediction models for personal credit loans in commercial banks. On this basis, the AUC and KS metrics are used to compare the prediction ability and risk discrimination ability of the models from two perspectives: different feature subsets and different prediction models. 
In terms of feature subsets, compared with original features that have only been preprocessed, constructing new features and eliminating redundant features substantially improves model performance, which illustrates the effectiveness of feature derivation and feature selection. In terms of prediction models, among the single models, LightGBM and XGBoost perform better than Logistic Regression and Random Forest. However, the Stacking model built on Random Forest, XGBoost and LightGBM outperforms every single model: it has stronger prediction ability and risk discrimination ability and identifies defaulting customers more accurately and effectively. The conclusions of this paper can serve as a reference for commercial banks' credit granting and default risk early warning. |
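The Stacking approach summarized in the abstract can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the author's pipeline: the data are synthetic (`make_classification`), `GradientBoostingClassifier` stands in for XGBoost and LightGBM so that the example depends only on scikit-learn, and the logistic-regression meta-learner and all hyperparameters are assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for loan data (about 10% "defaults").
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Base learners. The thesis stacks Random Forest, XGBoost and LightGBM;
# here one gradient-boosting model replaces the two boosting libraries.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# The meta-learner is trained on out-of-fold predicted probabilities
# of the base learners (5-fold cross-validation).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

# Evaluate with the two metrics used in the thesis: AUC and KS.
proba = stack.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
fpr, tpr, _ = roc_curve(y_test, proba)
ks = float(np.max(tpr - fpr))
print(f"AUC = {auc:.3f}, KS = {ks:.3f}")
```

Training the meta-learner on out-of-fold predictions rather than in-sample predictions is what keeps the second layer from simply learning the base models' overfitting.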
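Total number of references: | 71 |
About the author: | Wang Ting (王婷), master's student in Applied Statistics, School of Statistics, Beijing Normal University |
Call number: | 硕025200/20045 |
Open-access date: | 2021-06-19 |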