Thesis information

Title (Chinese):

 基于机器学习的个人信用评分模型的研究

Name:

 陆忠信 (Lu Zhongxin)

Confidentiality level:

 Public

Thesis language:

 chi    

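Discipline code:

 025200

Major:

 Applied Statistics

Student type:

 Master's student

Degree:

 Master of Applied Statistics

Degree type:

 Professional degree

Degree year: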

 2024    

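Campus:

 Zhuhai campus

School:

 School of Statistics

Research area:

 Machine learning

First supervisor:

 唐军 (Tang Jun)

First supervisor's affiliation:

 School of Statistics

Submission date: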

 2024-06-04    

Defense date:

 2024-05-22

Title (English):

 RESEARCH ON PERSONAL CREDIT SCORING MODEL BASED ON MACHINE LEARNING    

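Keywords (translated from the Chinese):

 Financial risk control ; Personal credit scoring ; Ensemble learning ; Imbalanced data ; Random undersampling ; XGBoost ; Logistic regression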

Keywords (English):

 Financial risk control ; Personal credit scoring ; Ensemble learning ; Imbalanced data ; Random undersampling ; XGBoost ; Logistic regression

Abstract (translated from the Chinese):

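Economic development depends on credit; a market economy is in essence a credit
economy. Experience shows that honesty and trustworthiness are not innate in
everyone, and dishonest behavior is common in practice. Since the Supreme People's
Court of China issued the "Several Provisions on Publishing Information on the List
of Dishonest Judgment Debtors" in 2013, the number of dishonest judgment debtors
had exceeded 8.57 million by 2024. Banks and other financial institutions that lend
to such debtors will almost certainly incur losses. To avoid losses, most financial
institutions analyze data on loan applicants to assess their default risk and decide
whether to lend. Traditional personal credit scoring models are built on statistics,
and the rapid rise of machine learning has allowed these models to develop further.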
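In response to the practical need to develop personal credit scoring models, this
thesis reviews the domestic and international literature, traces the development of
personal credit scoring models, and derives the principles of several machine
learning algorithms and several evaluation metrics. Because data sets encountered in
practice are often imbalanced, it introduces SMOTE sampling and random undersampling
as ways to handle the imbalance. The empirical study uses a public data set on
overdue credit from Kaggle and builds binary classification models with "whether the
borrower defaults" as the response variable. Missing values and outliers are handled
first, and the class imbalance is then addressed by random undersampling. On the
preprocessed data, logistic regression, naive Bayes, gradient boosting, random
forest, and XGBoost models are built in Python. A comparison across several
evaluation metrics confirms that the ensemble learning models predict better overall
than the single models, although the single models do better on individual metrics,
so the model should be chosen according to actual business needs. The feature
importances produced by the different ensemble learning algorithms also differ
considerably, which means that in practice different models give weight to different
indicators. Finally, the thesis offers practical suggestions to banks and other
financial institutions on building personal credit models.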
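
The abstract names random undersampling as the method used to balance the data but this record does not include the implementation. The following is a minimal sketch of the general technique, using only the Python standard library. The function name `random_undersample`, the fixed seed, and the toy rows are illustrative assumptions, not the thesis's code or data.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from the larger classes until every class
    has as many rows as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Every class is cut down to the size of the smallest class.
    target = min(len(indices) for indices in by_class.values())

    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))  # sample without replacement
    kept.sort()  # keep the original row order

    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (assumed): 8 non-default rows (label 0) and 2 default rows (label 1).
labels = [0] * 8 + [1] * 2
rows = [[i] for i in range(10)]

X_bal, y_bal = random_undersample(rows, labels)
print(Counter(y_bal))
```

After the call, both classes have two rows each. Undersampling discards majority-class rows, so it trades information for balance; SMOTE, the other method the abstract mentions, instead synthesizes new minority-class rows.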

Abstract (English):

Credit is indispensable to economic development, and a market economy is in
essence a credit economy. Experience shows that honesty and trustworthiness are
not innate qualities of every person, and dishonest behavior is common in
practice. Since the Supreme People's Court of China issued the “Several Provisions on
Publishing Information on the List of Dishonest Judgment Debtors” in 2013, the number
of dishonest judgment debtors has exceeded 8.57 million as of 2024. Banks and other
financial institutions will undoubtedly incur losses if they lend to dishonest
judgment debtors. To avoid losses, most financial institutions analyze data on
loan applicants to assess their default risk and decide whether to lend.
Traditional personal credit scoring models are built on statistics, and the rise
of machine learning has allowed personal credit models to develop further.
To meet the practical need for better personal credit scoring models, this thesis
reviews the relevant literature at home and abroad, introduces the development
history of personal credit scoring models, and derives the principles of several
machine learning algorithms and several evaluation metrics. Because datasets
encountered in practice are often imbalanced, SMOTE sampling and random
undersampling are introduced to deal with the imbalance. Empirically, a public
credit delinquency dataset from the Kaggle website is used to construct binary
classification models with "whether the borrower defaults" as the response
variable. First, the missing values and outliers in the data are processed, and
then the imbalance of the data is handled by random undersampling. For the
preprocessed data, logistic regression, naive Bayes, gradient boosting, random
forest, and XGBoost were selected for model construction using the Python
programming language. A comparison across multiple evaluation metrics verifies
that the ensemble learning models predict better than the single models, although
the single models are better on individual metrics, so a suitable model must be
chosen in light of actual business requirements. The feature importances of the
different ensemble learning algorithms are also found to differ considerably, which
means that in practice different models value different indicators. Finally,
realistic and feasible suggestions are put forward for banks and other financial
institutions on the construction of personal credit models.
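
The abstract reports that ensemble models do better overall while single models can win on individual metrics, but it does not list the metrics or the scores. As a hedged illustration of how one classifier can lead on one metric and trail on another, the sketch below computes accuracy, precision, recall, and F1 from scratch. The choice of these four metrics, the names `binary_scores`, `pred_a`, and `pred_b`, and all labels are assumptions made for illustration; they are not results from the thesis.

```python
def binary_scores(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels,
    where 1 marks the positive class (here: default)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 4 defaults (1) followed by 6 non-defaults (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical model A is cautious: it flags few borrowers as defaults.
pred_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical model B is aggressive: it flags more borrowers as defaults.
pred_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

scores_a = binary_scores(y_true, pred_a)
scores_b = binary_scores(y_true, pred_b)
for name, scores in (("A", scores_a), ("B", scores_b)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

In this toy case both models have the same accuracy, model A has the higher precision, and model B has the higher recall and F1. A lender that cares most about catching defaults would favor B, while one that cares most about not rejecting good applicants would favor A.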

Total number of references:

 36    

About the author:

 Student at Beijing Normal University

Holding location:

 Main Library, B301

Call number:

 硕025200/24035Z    

Release date:

 2025-06-05    
