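Chinese title: | A Multi-class Prediction Study of Body Mass Index Based on Statistical Learning Methods |
Name: | |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 071201 |
Discipline/major: | |
Student type: | Bachelor's |
Degree: | Bachelor of Science |
Degree year: | 2023 |
Campus: | |
School: | |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2023-06-09 |
Defense date: | 2023-05-16 |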
Foreign-language title: | Analysis of Body Mass Index Multi-Classification: Based on Statistical Learning Methods |
Chinese keywords: | |
Foreign-language keywords: | Naïve Bayes; K-nearest neighbor; Decision tree; Random forest; XGBoost |
Chinese abstract: |
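Body mass index (BMI) is an indicator of an individual's body mass, and individuals can be classified by body mass level according to their BMI; the feature variables used for classification cover diet, exercise, lifestyle, and related aspects. This paper applies several statistical learning methods to build multi-class models for body mass index. The classification methods include both single classifiers and ensemble learning algorithms, such as naive Bayes, K-nearest neighbor, decision tree, random forest, and XGBoost. Before model building, this paper mainly uses V-fold cross-validation to tune the models that require parameters to be specified in advance. After fitting the models on the training data, this paper computes each model's confusion matrix on the prediction set, together with accuracy, F1 score, the Kappa coefficient, and related metrics, and uses them to evaluate and compare the classification performance of the models; the accuracy, precision, and recall of the K-nearest neighbor and XGBoost methods all exceed 80%. In addition, when modeling with decision tree, random forest, and extreme gradient boosting trees, this paper computes the relative importance of the feature variables in the classification models and identifies the variables that have a larger influence on body mass level. |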
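The V-fold cross-validation step mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's actual procedure: the helper names (`v_fold_indices`, `cross_validate`, `knn_fit_predict`), the choice of accuracy as the tuning criterion, the candidate values of k, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def v_fold_indices(n_samples, v=5, seed=0):
    """Shuffle the sample indices and deal them into v near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::v] for i in range(v)]


def knn_fit_predict(X_train, y_train, X_val, k):
    """Hypothetical stand-in model: plain k-nearest-neighbor majority vote."""
    preds = []
    for x in X_val:
        nearest = sorted(
            range(len(X_train)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
        )[:k]
        preds.append(Counter(y_train[i] for i in nearest).most_common(1)[0][0])
    return preds


def cross_validate(fit_predict, X, y, param, v=5, seed=0):
    """Mean held-out accuracy of one candidate parameter over v folds."""
    folds = v_fold_indices(len(X), v, seed)
    scores = []
    for held_out, val_idx in enumerate(folds):
        # Train on every fold except the held-out one.
        train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in val_idx],
            param,
        )
        correct = sum(p == y[i] for p, i in zip(preds, val_idx))
        scores.append(correct / len(val_idx))
    return sum(scores) / len(scores)


# Toy data (made up for illustration): three well-separated 1-D clusters.
X = [(10.0 * c + 0.1 * i,) for c in range(3) for i in range(10)]
y = [c for c in range(3) for _ in range(10)]

# Pick the candidate k with the highest cross-validated accuracy.
candidates = [1, 3, 5]
best_k = max(candidates, key=lambda k: cross_validate(knn_fit_predict, X, y, k))
```

Any model that needs a parameter fixed in advance could be passed in place of `knn_fit_predict`, as long as it follows the same hypothetical `(X_train, y_train, X_val, param)` signature.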
Foreign-language abstract: |
Body mass index (BMI), a measure of an individual's body mass, can serve as the target of a multi-class classification problem. The feature variables used for classification cover diet, exercise, lifestyle, and other aspects. This paper uses a variety of statistical learning methods to build multi-class classification models of body mass index. The classification methods include single classification algorithms and ensemble learning algorithms, such as naive Bayes, K-nearest neighbor, decision tree, random forest, and XGBoost. Before fitting the models, this paper mainly uses V-fold cross-validation to tune the parameters that must be specified in advance. After fitting the models on the training data, this paper uses the prediction set to compute the confusion matrix, accuracy, F1 score, and kappa coefficient of each model. The accuracy, precision, and recall of the K-nearest neighbor and XGBoost methods all exceed 80%. In addition, when modeling with decision tree, random forest, and extreme gradient boosting, this paper calculates variable importance and identifies the most important features. |
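The evaluation metrics named in the abstract (confusion matrix, accuracy, precision, recall, F1 score, kappa coefficient) can be computed along the following lines. This is a sketch under assumptions: the abstract does not say how per-class scores were averaged, so macro-averaging is assumed here, and the function names and class labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def summarize(m):
    """Accuracy, macro precision/recall/F1, and Cohen's kappa from a confusion matrix."""
    k = len(m)
    n = sum(sum(row) for row in m)
    true_counts = [sum(m[i]) for i in range(k)]
    pred_counts = [sum(m[i][j] for i in range(k)) for j in range(k)]

    accuracy = sum(m[i][i] for i in range(k)) / n
    precision = [m[i][i] / pred_counts[i] if pred_counts[i] else 0.0 for i in range(k)]
    recall = [m[i][i] / true_counts[i] if true_counts[i] else 0.0 for i in range(k)]
    f1 = [
        2 * p * r / (p + r) if (p + r) else 0.0
        for p, r in zip(precision, recall)
    ]

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = sum(true_counts[i] * pred_counts[i] for i in range(k)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {
        "accuracy": accuracy,
        "macro_precision": sum(precision) / k,  # macro-averaging is an assumption
        "macro_recall": sum(recall) / k,
        "macro_f1": sum(f1) / k,
        "kappa": kappa,
    }


# Hypothetical labels and predictions, for illustration only.
labels = ["under", "normal", "over"]
y_true = ["under", "normal", "normal", "over", "over", "over"]
y_pred = ["under", "normal", "over", "over", "over", "normal"]
report = summarize(confusion_matrix(y_true, y_pred, labels))
```

Kappa is useful alongside accuracy in the multi-class setting because it discounts the agreement that would be expected by chance given the class frequencies.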
Total number of references: | 23 |
Collection number: | 本071201/23031 |
Open-access date: | 2024-06-08 |