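Title (Chinese): 朴素贝叶斯分类器的研究和应用 (Research and Application of the Naive Bayes Classifier)
Author: Chen Qingyu
Confidentiality level: Public
Thesis language: Chinese
Discipline code: 071201
Major: Statistics
Student type: Undergraduate
Degree: Bachelor of Science
Degree year: 2018
University: Beijing Normal University
Campus: Beijing campus
School: School of Mathematical Sciences
First supervisor: Duan Xiaogang
First supervisor's affiliation: School of Statistics, Beijing Normal University
Submission date: 2018-05-23
Defense date: 2018-05-16
English title: Research on Naive Bayes Classifier and its Application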

Keywords: naive Bayes; Bayesian network; TAN; conditional mutual information

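Chinese abstract (translated):
Classification is an important task in data mining, and Bayesian classification models are among the most widely used classifiers. A Bayesian network can represent dependencies among arbitrary attributes, but learning the optimal Bayesian network is NP-hard, so the naive Bayes classifier has attracted researchers' attention. The naive Bayes classifier is simple, stable and efficient, but it assumes that the attributes are conditionally independent given the class; real-world data often violate this strong assumption, which degrades its classification performance. Researchers have therefore relaxed the conditional independence assumption. The tree-augmented naive Bayes classifier (TAN) assumes that each attribute depends on at most one other attribute besides the class, and builds a maximum weighted spanning tree using the conditional mutual information between attributes as edge weights. Building on this, other researchers proposed a further improvement, the hidden naive Bayes classifier, in which each attribute has a hidden parent node that accounts for the influence of all other attributes. This thesis uses the WEKA software to compare the classification accuracy of the naive Bayes classifier and the two improved algorithms on 24 UCI datasets. The results show that the new improved algorithm is more accurate than both TAN and the naive Bayes classifier.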
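The TAN structure-learning step summarized above can be sketched in Python. This is a minimal illustration, not the thesis's WEKA-based setup: it assumes discrete attributes stored as equal-length columns, estimates the conditional mutual information I(A_i; A_j | C) from raw counts without smoothing, builds the maximum weighted spanning tree with Kruskal's algorithm, and roots the tree at attribute 0 (an arbitrary choice). The function names `conditional_mutual_info` and `tan_parents` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def conditional_mutual_info(xs, ys, cs):
    """Empirical I(X; Y | C) in nats for three equal-length discrete columns."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc, n_yc, n_c = Counter(zip(xs, cs)), Counter(zip(ys, cs)), Counter(cs)
    total = 0.0
    for (x, y, c), k in n_xyc.items():
        # P(x,y,c) * log[P(x,y|c) / (P(x|c) P(y|c))], written with raw counts
        total += (k / n) * log(k * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return total


def tan_parents(columns, labels, root=0):
    """Return each attribute's attribute parent (None for the root).

    The class node is an implicit extra parent of every attribute.
    """
    d = len(columns)
    # Candidate edges weighted by I(A_i; A_j | C), heaviest first
    edges = sorted(
        ((conditional_mutual_info(columns[i], columns[j], labels), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True,
    )
    comp = list(range(d))  # union-find forest for Kruskal's algorithm

    def find(a):
        while comp[a] != a:
            comp[a] = comp[comp[a]]  # path halving
            a = comp[a]
        return a

    adj = {i: [] for i in range(d)}
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it joins two components
            comp[ri] = rj
            adj[i].append(j)
            adj[j].append(i)

    # Orient the undirected maximum spanning tree away from the root
    parent = [None] * d
    seen, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                parent[v] = u
                stack.append(v)
    return parent
```

On a toy dataset where attribute 1 copies attribute 0 and attribute 2 is independent of both within each class, I(A_0; A_1 | C) equals log 2 while the other pairs score 0, so the tree makes attribute 0 the parent of attribute 1.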
English abstract:
Classification is an important task in data mining, and the Bayesian classification model is one of the most widely used classifiers. Bayesian networks can represent dependencies among arbitrary attributes, but learning the optimal Bayesian network is an NP-hard problem, so naive Bayes classifiers have attracted the attention of researchers. The naive Bayes classifier is simple, stable and efficient. It assumes that the attribute variables are conditionally independent given the class, but real-world data often cannot satisfy this strong assumption, which degrades its classification performance. The tree-augmented naive Bayes classifier (TAN) relaxes this assumption: each attribute depends on at most one other attribute, and a maximum weighted spanning tree is built using the conditional mutual information between attributes. On this basis, a further improvement was proposed, the hidden naive Bayes classifier, in which each attribute has a hidden parent node that takes into account the influence of all other attributes. This thesis uses the WEKA software to compare the classification accuracy of the naive Bayes classifier and the two improved algorithms on 24 UCI datasets. The results show that the new improved algorithm is more accurate than both TAN and the naive Bayes classifier.
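The hidden-parent idea can likewise be sketched in Python. The abstract gives no formulas, so this follows the commonly cited HNB formulation as an assumption: the hidden parent of A_i is the mixture P(a_i | hp_i, c) = Σ_{j≠i} W_ij P(a_i | a_j, c), with W_ij proportional to I(A_i; A_j | C), and prediction picks argmax_c P(c) ∏_i P(a_i | hp_i, c). Laplace smoothing, the uniform-weight fallback when every conditional mutual information is zero, and the class name `HiddenNaiveBayes` are illustrative assumptions, not details taken from the thesis.

```python
from collections import Counter
from math import log


def cmi(xs, ys, cs):
    """Empirical conditional mutual information I(X; Y | C), in nats."""
    n = len(cs)
    n_xyc, n_xc = Counter(zip(xs, ys, cs)), Counter(zip(xs, cs))
    n_yc, n_c = Counter(zip(ys, cs)), Counter(cs)
    return sum((k / n) * log(k * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
               for (x, y, c), k in n_xyc.items())


class HiddenNaiveBayes:
    """HNB-style classifier sketch for discrete attributes (hypothetical API)."""

    def fit(self, rows, labels):
        d = len(rows[0])  # assumes at least two attributes
        cols = [[r[i] for r in rows] for i in range(d)]
        self.n, self.classes = len(labels), sorted(set(labels))
        self.n_values = [len(set(col)) for col in cols]
        self.n_c = Counter(labels)
        # Counts n(a_i, a_j, c) and n(a_j, c) for estimating P(a_i | a_j, c)
        self.n_ijc, self.n_jc = Counter(), Counter()
        for r, c in zip(rows, labels):
            for j in range(d):
                self.n_jc[(j, r[j], c)] += 1
                for i in range(d):
                    if i != j:
                        self.n_ijc[(i, j, r[i], r[j], c)] += 1
        # W_ij proportional to I(A_i; A_j | C), normalised over j != i
        self.w = []
        for i in range(d):
            # max(0.0, ...) guards against tiny negative rounding error
            raw = [0.0 if j == i else max(0.0, cmi(cols[i], cols[j], labels))
                   for j in range(d)]
            total = sum(raw)
            if total > 0:
                self.w.append([v / total for v in raw])
            else:  # assumption: uniform weights when all CMI values are zero
                self.w.append([0.0 if j == i else 1 / (d - 1) for j in range(d)])
        return self

    def _cond(self, i, j, ai, aj, c):
        # Laplace-smoothed P(A_i = ai | A_j = aj, C = c)
        return ((self.n_ijc[(i, j, ai, aj, c)] + 1)
                / (self.n_jc[(j, aj, c)] + self.n_values[i]))

    def predict(self, row):
        d, k = len(row), len(self.classes)
        scores = {}
        for c in self.classes:
            s = log((self.n_c[c] + 1) / (self.n + k))  # smoothed prior P(c)
            for i in range(d):
                # Hidden parent: weighted mixture over every other attribute
                s += log(sum(self.w[i][j] * self._cond(i, j, row[i], row[j], c)
                             for j in range(d) if j != i))
            scores[c] = s
        return max(scores, key=scores.get)
```

Because the weights come from conditional mutual information, an attribute that is strongly dependent on another (given the class) draws its hidden-parent estimate mostly from that attribute, while weakly related attributes contribute little.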
Number of references: 16
Number of figures: 3
Number of tables: 5
Library call number: 本071201/18027
Release date: 2019-07-09