Thesis information

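Chinese title:	上市公司破产风险评估——基于不同二分类模型的比较研究
Author:	Huang Chengjie (黄成杰)
Confidentiality level:	Public
Thesis language:	Chinese
Discipline code:	025200
Discipline:	Applied Statistics
Student type:	Master's
Degree:	Master of Applied Statistics
Degree type:	Professional degree
Degree year:	2022
Campus:	Beijing campus
School:	School of Statistics / Institute of National Accounts
Research direction:	Applied Statistics
First supervisor:	Shi Junyi (石峻驿)
First supervisor's affiliation:	School of Statistics, Beijing Normal University
Submission date:	2022-06-08
Defense date:	2022-05-27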
English title:	BANKRUPTCY RISK ASSESSMENT OF LISTED COMPANIES: A COMPARATIVE STUDY BASED ON DIFFERENT BINARY CLASSIFICATION MODELS
Chinese keywords (translated):	DuPont analysis ; Altman Z-Score model ; Binary logistic regression model ; Random forest model ; SMOTE sampling method
English keywords:	Uneven distribution of positive and negative samples ; DuPont analysis ; Altman Z-score model ; Logistic regression model ; Random forest model ; SMOTE sampling method
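Chinese abstract (translated):	This thesis studies how different binary classification models perform in assessing the bankruptcy risk of listed companies, in order to weigh the strengths and weaknesses of each model, identify the best-performing one, and help company managers, investors, and financial regulators evaluate the operating condition and risk of listed companies. The underlying data are financial indicators for 3695 A-share listed companies, of which 75 were delisted and 3620 were not; each sample has 44 indicators, including 8 core financial indicators. Four categories of model are used: DuPont analysis, the Altman Z-Score model, the binary logistic regression model, and the random forest model. Each model is refined step by step to address its likely problems, giving 10 models in total: DuPont analysis; the Altman Z-Score model (threshold = classical threshold 1.81); the Altman Z-Score model (best threshold searched for); the original binary logistic regression model (threshold = 0.5); the original binary logistic regression model (optimal threshold searched for); the binary logistic regression model based on SMOTE sampling (threshold = 0.5); the original random forest model; the random forest model based on SMOTE sampling; the random forest model using the classwt parameter; and the random forest model using the samplesize parameter. Six indicators are used to compare classification performance: accuracy, recall, precision, F1 Score, AUC, and IBA. The thesis also builds a composite evaluation index by applying Z-Score standardization to these six indicators and weighting them by importance. The conclusions are as follows. First, given the imbalance between positive and negative samples in this data set, the original binary logistic regression model (threshold = 0.5) performs very poorly: although its classification accuracy is as high as 97.97%, its recall on positive samples is only 1.33%. For the remaining nine models, ranked by the composite evaluation index from best to worst, the order is: the random forest model using the samplesize parameter, the binary logistic regression model based on SMOTE sampling (threshold = 0.5), the original binary logistic regression model (optimal threshold searched for), the random forest model using the classwt parameter, the original random forest model, the random forest model based on SMOTE sampling, the Z-Score model (threshold = classical threshold 1.81), the Z-Score model (best threshold searched for), and DuPont analysis. Comparing the four categories, the binary logistic regression and random forest models perform similarly; both outperform the Z-Score model, which in turn outperforms DuPont analysis. Among the logistic regression models, the model with the best threshold searched for and the SMOTE-based model (threshold = 0.5) both reach a classification accuracy of about 75% and a positive-sample recall of about 65%. Among the random forest models, the one using the samplesize parameter reaches a classification accuracy of 89% and a positive-sample recall of 53%. For the Z-Score model, the classical threshold of 1.81 set by Altman performs better than the threshold that maximizes IBA on the training set. DuPont analysis is the easiest model to build, and the importance of its core indicator, return on equity, is confirmed by the variable significance tests of the logistic regression model and the feature importance analysis of the random forest model. An exhaustive search gives an optimal DuPont threshold of 0.01: if the return on equity (also called return on net assets) in an A-share listed company's annual report is below 0.01, the company has a relatively high bankruptcy risk.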
English abstract:	This paper aims to find the best risk classification model for assessing the bankruptcy risk of listed companies, and to help company managers, investors, and financial regulators evaluate the financial situation of listed companies. The basic data are the financial indicators of 3695 A-share listed companies, comprising 75 delisted and 3620 non-delisted enterprises. Each sample has 44 indicators, including 8 core financial indicators. The models fall into four categories: DuPont analysis, the Altman Z-score model, the logistic regression model, and the random forest model. Each model is refined step by step to address its likely problems, giving 10 models in total: DuPont analysis, the Altman Z-score model (threshold = classical threshold 1.81), the Altman Z-score model (best threshold searched for), the original logistic regression model (threshold = 0.5), the original logistic regression model (optimal threshold searched for), the logistic regression model based on SMOTE sampling (threshold = 0.5), the original random forest model, the random forest model based on SMOTE sampling, the random forest model with the classwt parameter, and the random forest model with the samplesize parameter. Six indicators are used for comparison: accuracy, recall, precision, F1 score, AUC, and IBA. These six indicators are standardized with Z-scores, weighted by importance, and summed to build a comprehensive evaluation index. The main conclusions are as follows. First, because the positive and negative samples are unevenly distributed, the original logistic regression model (threshold = 0.5) performs very poorly: although its classification accuracy is as high as 97.97%, its recall on positive samples is only 1.33%. For the remaining nine models, ranked by the comprehensive evaluation index from best to worst, the order is: the random forest model using the samplesize parameter, the logistic regression model based on SMOTE sampling (threshold = 0.5), the original logistic regression model (optimal threshold searched for), the random forest model using the classwt parameter, the original random forest model, the random forest model based on SMOTE sampling, the Z-score model (threshold = classical threshold 1.81), the Z-score model (best threshold searched for), and DuPont analysis. Comparing the four categories, the logistic regression and random forest models perform similarly; both are better than the Z-score model, which is better than DuPont analysis. Among the logistic regression models, the model with the best threshold searched for and the SMOTE-based model (threshold = 0.5) both have a classification accuracy of about 75% and a positive-sample recall of about 65%. Among the random forest models, the one using the samplesize parameter has a classification accuracy of 89% and a positive-sample recall of 53%. For the Z-score model, the classical threshold of 1.81 set by Altman performs better than the threshold that maximizes IBA on the training set. DuPont analysis is the simplest model to build, and the importance of its core indicator, return on equity, is confirmed by the variable significance tests of the logistic regression model and the feature importance analysis of the random forest model. An exhaustive search gives an optimal DuPont threshold of 0.01: if the return on equity (also called return on net assets) in an A-share listed company's annual report is below 0.01, the company has a higher bankruptcy risk.
Total references:	29
Call number:	硕025200/22040
Open-access date:	2023-06-08
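
Illustrative note on the evaluation described in the abstract. The sketch below is not taken from the thesis. Its first part shows why 97.97% accuracy can coexist with 1.33% recall on a 75 / 3620 class split; the confusion-matrix counts are hypothetical, chosen only to be consistent with the reported percentages under the assumption that they were computed over all 3695 companies. Its second part shows one plausible reading of the composite index (Z-score standardization of each metric across models, then a weighted sum). The thesis weights are not given in the abstract, so the weights, model names, and metric values here are placeholders, and the use of the population standard deviation is also an assumption.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Confusion:
    tp: int  # delisted companies flagged as at risk
    fn: int  # delisted companies missed
    fp: int  # non-delisted companies flagged as at risk
    tn: int  # non-delisted companies correctly passed

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)


# Hypothetical counts (assumption): 75 positives, 3620 negatives.
cm = Confusion(tp=1, fn=74, fp=1, tn=3619)
print(f"accuracy = {cm.accuracy():.2%}")
print(f"recall   = {cm.recall():.2%}")


def composite_index(scores, weights):
    """Weighted sum of per-metric Z-scores.

    scores:  {model_name: {metric_name: value}}
    weights: {metric_name: weight}
    Each metric is standardized across models before weighting.
    """
    index = {model: 0.0 for model in scores}
    for metric, weight in weights.items():
        values = [scores[model][metric] for model in scores]
        mu, sigma = mean(values), pstdev(values)
        for model in scores:
            # A metric that is identical for every model contributes nothing.
            z = (scores[model][metric] - mu) / sigma if sigma > 0 else 0.0
            index[model] += weight * z
    return index


# Placeholder models, metric values, and weights (all hypothetical).
demo_scores = {
    "model_a": {"recall": 0.6, "auc": 0.8},
    "model_b": {"recall": 0.4, "auc": 0.7},
}
demo_weights = {"recall": 0.5, "auc": 0.5}
index = composite_index(demo_scores, demo_weights)
for model, value in sorted(index.items(), key=lambda kv: -kv[1]):
    print(model, round(value, 3))
```

With this class split, a classifier that labels every company as non-delisted would also reach 3620 / 3695 accuracy, which is why the abstract compares models on recall, AUC, and IBA in addition to accuracy.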
