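Chinese title: | 常见分类方法性能的比较 |
Name: | |
Confidentiality level: | Public |
Thesis language: | Chinese |
Discipline code: | 071201 |
Major: | |
Student type: | Bachelor's |
Degree: | Bachelor of Science |
Degree year: | 2018 |
University: | Beijing Normal University |
Campus: | |
School: | |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2018-05-22 |
Defense date: | 2018-05-16 |
Foreign-language title: | Comparison of Common Classification Methods |
Chinese keywords: | |
Chinese abstract: |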
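KNN, support vector machines (SVM), and random forests are among today's most common classification algorithms, and all three are widely used in practice. However, they rest on different ideas, have different characteristics, and suit different types of datasets. This thesis compares the three algorithms experimentally in order to determine which kinds of data problems each is suited to.
This thesis describes the theoretical foundations and characteristics of KNN, SVM, and random forests in detail, and compares the three experimentally on small-sample high-dimensional data and on large-scale data. The experiments lead to the following conclusions. KNN has simple parameters and is easy to implement, but its classification accuracy and running time are limited by factors such as class balance and sample size, so it is suited to classifying high-quality sample sets. SVM has more parameters and a complicated tuning process, and its time complexity depends on the number of training samples and the number of support vectors, so it is suited to small-sample and high-dimensional problems but not to large-scale datasets. Random forest has the lowest time complexity and few parameters, but it depends heavily on the sample set and tends to overfit on noisy datasets, so it is suited to large-scale data problems and to classifying high-quality sample sets.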
|
Foreign-language abstract: |
The KNN (k-nearest neighbors) algorithm, SVM (support vector machine), and random forest are among the most commonly used classification methods. However, they behave differently and produce different results on different data problems. This paper assesses their performance on specific data problems and determines which problems each method fits best.
First, this paper introduces the KNN algorithm, SVM, and random forest. It then reports numerical experiments on high-dimensional, small-sample datasets and on large-scale datasets. The conclusions are:
The KNN algorithm is easy to implement, but its results depend heavily on sample size and class balance. KNN is therefore suited to high-quality datasets.
SVM has the most parameters, and tuning them can be complicated. However, its time cost depends only on the sample size and the number of support vectors. SVM is therefore suited to high-dimensional and small-sample datasets rather than large-scale ones.
Random forest has few parameters and the lowest time cost, which makes it efficient on large-scale data problems. However, it can still overfit, especially on noisy datasets.
|
Total references: | 15 |
Total figures: | 0 |
Total tables: | 4 |
Call number: | 本071201/18029 |
Open-access date: | 2019-07-09 |
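
The abstract's remark that KNN results depend on class balance can be illustrated with a small sketch. This is not the thesis's experiment: the function name `knn_predict`, the toy points, and the choice of Euclidean distance with a plain majority vote are all assumptions made here for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    Hypothetical helper for illustration, not the thesis's implementation.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Made-up toy data: class "a" clusters near the origin, class "b" near (3.5, 3.5).
a_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.2)]
b_points = [(3.0, 3.0), (3.0, 4.0), (4.0, 3.0), (4.0, 4.0), (3.5, 3.5),
            (3.0, 3.5), (3.5, 3.0), (4.0, 3.5)]

# Balanced set: five points per class.
balanced = [(p, "a") for p in a_points] + [(p, "b") for p in b_points[:5]]

# Imbalanced set: only two "a" points against eight "b" points.
imbalanced = [(p, "a") for p in a_points[:2]] + [(p, "b") for p in b_points]

query = (0.5, 0.5)  # lies inside the "a" cluster

print(knn_predict(balanced, query, k=5))    # all five neighbours are "a"
print(knn_predict(imbalanced, query, k=5))  # two "a" votes lose to three "b" votes
print(knn_predict(imbalanced, query, k=1))  # the single nearest point is still "a"
```

With the same query and the same `k`, the vote flips once class "a" is under-represented, even though the query sits inside the "a" cluster. A smaller `k` restores the expected label in this toy case, which shows how the outcome depends jointly on `k` and on class balance.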