View thesis information

Chinese title:

 基于机器学习方法的感染性疾病预测模型研究——以新冠疫情为例    

Name:

 李玮坤 (Li Weikun)

Confidentiality level:

 Public

Thesis language:

 chi    

Discipline code:

 025200    

Discipline:

 Applied Statistics

Student type:

 Master's

Degree:

 Master of Applied Statistics

Degree type:

 Professional degree

Degree year:

 2024    

Campus:

 Zhuhai campus

School:

 School of Statistics

Research area:

 Applied Statistics

Primary supervisor:

 宋旭光 (Song Xuguang)

Primary supervisor's affiliation:

 School of Statistics

Submission date:

 2024-06-04    

Defense date:

 2024-05-16    

English title:

 RESEARCH ON PREDICTION MODEL OF INFECTIOUS DISEASES BASED ON MACHINE LEARNING METHOD -- TAKING COVID-19 AS AN EXAMPLE    

Chinese keywords (translated):

 Machine learning ; Infectious disease prediction model ; COVID-19 epidemic ; Logistic regression ; Random forest

English keywords:

 Machine learning ; Infectious disease prediction model ; Novel coronavirus pneumonia ; Logistic regression ; Random forest

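Chinese abstract (translated):

A major global event of recent years was the outbreak of the COVID-19 epidemic. Its long duration and wide impact were startling, and it also served as a warning: infectious diseases still threaten human life, so improving the ability to predict them matters. As the epidemic unfolded, machine learning methods became increasingly active in the infectious disease field. Earlier studies show that machine learning plays an important role in many areas of infectious disease research, but work on individual-level infectious disease prediction models based on machine learning remains scarce, and its effect is unknown. This thesis therefore takes the COVID-19 epidemic as its example, collects individual-level data on patients infected with COVID-19, and builds and studies COVID-19 prediction models in order to explore how machine learning methods apply to, and affect, individual-level prediction of infectious diseases.

The thesis builds a logistic regression model, a random forest model, and a CatBoost model to make individual-level predictions for COVID-19 patients. The research process covers the following key steps: feature construction, including feature encoding and construction of the response variable; data preprocessing, including handling of missing values and outliers and class balancing; feature selection and construction of the final models; and predictive analysis and model comparison. Machine learning methods are applied in both the feature selection and the final model construction. The results indicate that, in infectious disease prediction, machine learning can contribute at two levels: feature selection and prediction accuracy. In feature selection, machine learning methods perform better than traditional variable selection methods and interpret the original data features in more detail, so they extract richer feature information. In prediction accuracy, performance improves progressively from the logistic regression model to the random forest model to the CatBoost model. The latter two are prediction models based on machine learning methods, while the first is based on traditional linear regression, which indicates that machine learning methods do improve infectious disease prediction. The comparison also shows that CatBoost predicts best of the three models, which supports the usability of the CatBoost algorithm in infectious disease prediction research.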

English abstract:

In recent years the world has experienced a major event: the outbreak of the COVID-19 epidemic. The duration and reach of the epidemic have been shocking, and they carry a warning: infectious diseases still threaten people's lives, and it is important to improve our ability to predict them. As COVID-19 has developed, machine learning methods have become increasingly active in the field of infectious diseases. Previous research shows that machine learning methods play an important role in different areas of infectious disease research, but research on individual-level prediction models of infectious diseases based on machine learning is still rare, and its impact is unknown. This thesis therefore takes COVID-19 as an example, collects individual data on patients infected with COVID-19, constructs and studies prediction models for COVID-19, and explores the application and impact of machine learning methods in individual-level prediction of infectious diseases.
This thesis constructs a logistic regression model, a random forest model, and a CatBoost model to make individual-level predictions for COVID-19 patients. The research process involves the following key steps: feature construction, including feature encoding and construction of the response variable; data preprocessing, including handling of missing and abnormal values and class balancing; feature selection and construction of the final models; and predictive analysis and model comparison. Machine learning methods are applied in feature selection and in final model construction. The results show that, in infectious disease prediction, machine learning can be applied both to select features and to improve prediction accuracy. For feature selection, machine learning methods perform better than traditional variable selection methods and interpret the original data features in more detail, so richer feature information can be extracted. For prediction accuracy, the results show that performance improves progressively from the logistic regression model to the random forest model to the CatBoost model. The latter two are prediction models based on machine learning methods, and the first is a prediction model based on traditional linear regression, so machine learning methods do appear to improve infectious disease prediction. The comparison also shows that the CatBoost model predicts best among the three models, which supports the usability of the CatBoost algorithm in infectious disease prediction.
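
The class-balancing step and the logistic regression baseline named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: this record does not say which balancing method, features, or hyperparameters the thesis used, so the sketch uses random oversampling, synthetic two-feature data, and hypothetical helper names (`make_data`, `oversample`, `fit_logistic`, `recall`). The random forest and CatBoost models are not reproduced here.

```python
import math
import random


def make_data(n_neg, n_pos, rng):
    """Synthetic toy data (not the thesis data): two Gaussian features per row."""
    rows, labels = [], []
    for label, count in ((0, n_neg), (1, n_pos)):
        centre = 1.5 if label else 0.0
        for _ in range(count):
            rows.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            labels.append(label)
    return rows, labels


def oversample(rows, labels, rng):
    """Randomly duplicate minority-class rows until both classes have equal size."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Classify as positive when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def recall(w, b, rows, labels):
    """Share of truly positive rows that the model labels positive."""
    hits = [predict(w, b, x) for x, y in zip(rows, labels) if y == 1]
    return sum(hits) / len(hits)


rng = random.Random(0)
X_train, y_train = make_data(450, 50, rng)  # imbalanced: 10% positives
X_test, y_test = make_data(180, 20, rng)

# Balance the training split only; the test split keeps its original class ratio.
X_bal, y_bal = oversample(X_train, y_train, rng)

w_raw, b_raw = fit_logistic(X_train, y_train)
w_bal, b_bal = fit_logistic(X_bal, y_bal)

recall_raw = recall(w_raw, b_raw, X_test, y_test)
recall_bal = recall(w_bal, b_bal, X_test, y_test)
print("test recall without balancing:", round(recall_raw, 3))
print("test recall with oversampling:", round(recall_bal, 3))
```

Oversampling is applied to the training split only, so the test split still reflects the original class ratio. Recall on the positive class is printed because plain accuracy can look high on imbalanced data even when few positives are detected.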

Total number of references:

 40    

Holding location:

 Main Library, B301

Call number:

 硕025200/24027Z    

Release date:

 2025-06-04    
