Thesis information

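Chinese title:	基于集成学习的金融风控算法研究 (Research on Financial Risk Control Algorithms Based on Ensemble Learning)
Name:	Liang Huiwen (梁惠雯)
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	025200
Discipline:	Applied Statistics
Student type:	Master's
Degree:	Master of Applied Statistics
Degree type:	Professional degree
Degree year:	2023
Campus:	Beijing campus
School:	School of Statistics
Research area:	Machine learning, high-dimensional data analysis
First supervisor:	Zhao Junlong (赵俊龙)
First supervisor's affiliation:	School of Statistics
Submission date:	2023-06-20
Defense date:	2023-05-12
Foreign-language title:	Research on Risk Control Algorithm based on Ensemble Learning
Chinese keywords:	金融风控 (financial risk control) ; 信用评分模型 (credit scoring model) ; 反欺诈模型 (anti-fraud model) ; 机器学习 (machine learning) ; 异常检测 (anomaly detection) ; 集成 (ensemble)
Foreign-language keywords:	Financial risk control ; Credit scoring models ; Anti-fraud models ; Machine learning ; Anomaly detection ; Ensemble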
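Chinese abstract (translated):	With the rapid development of Internet technologies such as big data, blockchain, and artificial intelligence, emerging financial services such as online lending are gradually becoming part of everyday life. While online lending grows explosively and serves financial inclusion and a wide range of small and micro loans, non-performing loans are also on the rise. To address credit management efficiency and risk in the lending industry, this thesis studies risk control in financial lending, specifically the construction of an anti-fraud model and a credit scoring model using supervised and unsupervised learning methods. The main work is as follows: (1) The data are preprocessed: missing values and outliers are handled across the different dimensions of the dataset, the dataset is augmented with the SMOTE algorithm, and the data are normalized; feature engineering, including feature derivation and feature selection, is then carried out according to variable interpretability and stability and the relevant financial business logic. (2) For the credit scoring model, several machine learning algorithms are first studied as base models, including the gradient boosting tree methods xgboost, gbdt, and lightgbm, and their strengths and weaknesses are compared. (3) For the credit scoring model, a financial risk control framework based on ensemble learning is further adopted to compensate for the limitations of any single model: multiple base models are combined in Bagging and Stacking ensembles to reduce model bias, and the automatic hyperparameter optimization algorithm hyperopt replaces traditional hyperparameter optimization for tuning. Whereas most current studies rely on one or two base models, this thesis comprehensively examines how a range of mainstream models and ensemble models train in the financial risk control setting. (4) For the anti-fraud model, individual fraud is studied with traditional anomaly detection algorithms, which are unsupervised learning models and can handle real-world conditions in which fraudulent (negative) samples are extremely scarce, labels are absent, and fraud tactics keep evolving; an ensemble anomaly detection algorithm based on local density and global random search is used to build an effective anti-fraud model.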
Foreign-language abstract:	With the rapid development of Internet technologies such as big data, blockchain and artificial intelligence, emerging financial businesses such as online credit are gradually integrating into people's lives. As the online loan business grows explosively and continues to meet the needs of financial inclusion and various small and micro loans, non-performing loans are also on the rise. Addressing the efficiency and risk of credit management in the financial credit industry, this thesis mainly studies risk control in that industry. Specifically, it covers the construction of an anti-fraud model and a credit scoring model, and the application of supervised and unsupervised learning methods. The main work of this thesis is as follows: (1) Exploratory processing of the data: missing values and outliers are handled across the different dimensions of the data set, the data set is expanded with SMOTE, and the data are normalized; then, according to the interpretability and stability of the variables and the corresponding financial business logic, a variety of feature engineering is carried out, including feature derivation and feature screening. (2) For the construction of the credit scoring model, several machine learning algorithms, including the gradient boosting tree methods xgboost, gbdt and lightgbm as base models, are first studied to compare the advantages and disadvantages of the models. (3) For the construction of the credit scoring model, a financial risk control framework based on ensemble learning is further adopted to make up for the limitations of a single model. In this study, multiple base models are used to build Bagging and Stacking ensemble learning frameworks to reduce model bias. In addition, hyperopt, an automatic hyperparameter optimization algorithm, is used in place of traditional hyperparameter optimization algorithms for parameter tuning. Compared with most current research, which is based on one or two base models, this thesis comprehensively studies the training performance of various mainstream models and different ensemble models in the financial risk control scenario. (4) For the anti-fraud model, traditional anomaly detection algorithms are used to study individual fraud. These algorithms are unsupervised learning models and can effectively address the problems that, in real scenarios, fraud negative samples are very few, labels are absent, and fraud methods are continually updated. An ensemble anomaly detection algorithm based on local density and global random search is used to build an effective anti-fraud model.
Total number of references:	23
Call number:	硕025200/23017
Open-access date:	2024-06-19
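
Illustrative note: the abstract mentions expanding the data set with SMOTE. This record does not include the thesis's code, so the following is only a minimal sketch of SMOTE's core interpolation step in plain Python, not the author's implementation. The function name `smote`, its parameters (`n_new`, `k`, `seed`), and the toy data are all assumptions made for this example.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority-class sample and one of its k nearest minority-class neighbours.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: four minority-class cases with two scaled features.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2, seed=42)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the range the minority class already spans. In practice, oversampling of this kind is normally applied to the training split only, so that evaluation data contain no synthetic cases.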
