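Thesis information

Chinese title: 结合PageRank和c10的科学影响力评价算法研究 (Research on a Scientific Impact Evaluation Algorithm Combining PageRank and c10)

Name: 王亚楠 (Wang Yanan)

Confidentiality level: Public

Thesis language: Chinese

Discipline code: 071101

Discipline/major: Systems Theory

Student type: Master's student

Degree: Master of Science

Degree type: Academic degree

Degree year: 2019

Campus: Beijing campus

School: School of Systems Science

First supervisor: 曾安 (Zeng An)

First supervisor's affiliation: School of Systems Science, Beijing Normal University

Submission date: 2019-06-26

Defense date: 2019-06-03

Foreign-language title: Research on evaluation algorithm of scientific significance based on PageRank and c10

Chinese keywords: complex networks; citation networks; scientific impact; iterative algorithm; PageRank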

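Chinese abstract (English translation):
With the development of society and the continued progress of science and technology, and driven by strong momentum and demand, scientific research has flourished: large numbers of scientific papers, scientists, journals, and research institutions emerge every year. In the era of big data, when information is overwhelming, time has become an increasingly valuable resource. When searching for information in a research field, no one can read every article, follow every scientist, or browse every journal. How to find the high-quality information one needs quickly and efficiently has therefore become a key problem, and this places higher demands on algorithms for evaluating scientific impact. More effective methods for quantifying scientific impact would not only make searching more efficient, but also allow research work to be evaluated more accurately. They can also help researchers form an accurate view of their own work and keep improving their research ability. In addition, they can provide a basis and reference for decisions by research institutions on recruitment, promotion, and selection, which gives them considerable practical value.
Many methods have been proposed so far, and one of the best known and most widely used ranking algorithms is PageRank. PageRank takes the global structure of the network into account and is widely recognized as a better measure of scientific impact than the raw citation count. It must be acknowledged, however, that PageRank is biased in time toward older papers that were published earlier. To reduce this bias, this thesis builds on PageRank and proposes CPRank, a scientific impact ranking algorithm that incorporates the c10 metric. The idea is that, when measuring the importance of a paper, only the citations it receives within ten years of its publication are counted, rather than all of its citations.
This thesis builds a citation network from the journals of the American Physical Society (APS) to test the performance of the algorithm. Two reference sets of high-impact papers are used: the Milestone Letters, carefully selected by the editors of APS journals, and Nobel Prize-winning papers. The thesis examines how well different ranking methods identify these high-quality papers; the methods compared are classic PageRank, CPRank, CiteRank, Rescaled PageRank, and c10. The results show that CPRank, which accounts for the temporal properties of citations, improves on traditional PageRank and reduces its bias toward old papers to some extent. It should be noted, however, that it still cannot compare papers from different periods in a completely fair way.
The thesis also applies CPRank to an empirical analysis at the level of scientists. First, the impact of each scientist is analyzed by summing the impact of his or her papers. The results show that Nobel laureates rank higher under CPRank than under the other methods. We also compared where each laureate's prize-winning paper ranks among all of that laureate's papers, and again CPRank performs well. Finally, CPRank is used to study at what stage of their careers scientists produce their most influential work, that is, when their creativity emerges. The results show that the conclusion obtained with CPRank differs from the one obtained with the c10 metric: under CPRank, scientists' best creativity tends to emerge early in their careers.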
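The c10 metric that CPRank borrows can be illustrated with a small sketch. It assumes a hypothetical input format (a list of `(citing, cited)` pairs plus a dictionary of publication years), uses whole years as the only time resolution, and counts a citation made exactly ten years after publication; all of these are assumptions for illustration, not the thesis's code.

```python
def c10(citations, pub_year, window=10):
    """Citations each paper receives within `window` years of its publication.

    citations: iterable of (citing_id, cited_id) pairs (hypothetical format)
    pub_year:  dict mapping paper id -> publication year
    A citation made exactly `window` years later is counted (an assumption).
    """
    counts = {paper: 0 for paper in pub_year}
    for citing, cited in citations:
        lag = pub_year[citing] - pub_year[cited]
        # skip citations outside the window, and inconsistent negative lags
        if 0 <= lag <= window:
            counts[cited] += 1
    return counts


# Toy data (hypothetical): paper A is old and keeps being cited long after publication.
pub_year = {"A": 1990, "B": 1995, "C": 2005, "D": 2010}
citations = [("B", "A"), ("C", "A"), ("D", "A"), ("C", "B"), ("D", "B"), ("D", "C")]
print(c10(citations, pub_year))  # {'A': 1, 'B': 1, 'C': 1, 'D': 0}
```

By total citations, A would lead with 3; counting only its first ten years, it ties with B and C, which is the kind of age correction the c10 window provides.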
English abstract:
With the development of society and the continuous advancement of science and technology, and driven by strong momentum and demand, scientific research has flourished: every year, large numbers of scientific papers, scientists, journals, and research institutions emerge. In the era of big data, time has become an increasingly valuable asset. When people look for information in a research field, they cannot go through every article, every scientist, and every journal. How to find the high-quality information they need quickly and efficiently has therefore become a critical issue, and this places higher demands on scientific impact ranking algorithms. More effective quantitative methods for scientific significance can not only improve the search efficiency of information seekers, but also quantify research work more accurately. They also help researchers form an accurate self-assessment and keep improving their research capabilities. In addition, they can provide a basis and reference for decisions by research institutions on recruitment, promotion, and selection, which gives them real practical value. So far, many methods have been proposed, and one of the best known and most widely used ranking algorithms is PageRank. PageRank takes into account the global information of the network and is recognized as a better measure of scientific impact than the citation count. It must be acknowledged, however, that PageRank is biased toward older articles that were published earlier. To weaken this time bias, this thesis proposes CPRank, a scientific significance ranking algorithm that combines PageRank with the c10 metric.
The specific idea of the algorithm is that, when measuring the importance of a paper, only the citations received within ten years after its publication are considered, rather than all of its citations. This thesis builds a citation network from the data of American Physical Society (APS) journals to test the performance of the algorithm. Two benchmark sets of high-impact articles are used: the Milestone Letters, carefully selected by the editors of APS journals, and Nobel Prize-winning articles. The thesis discusses how well different ranking methods identify these high-quality articles; the methods compared are the classic PageRank algorithm, CPRank, CiteRank, Rescaled PageRank, and c10. The results show that CPRank, which accounts for the aging of citations, improves the performance of PageRank and weakens its bias toward old articles to a certain extent, but it still cannot make a perfectly fair comparison of articles from different periods. The thesis also applies CPRank to an empirical analysis of scientists. First, the scientific influence of each scientist is analyzed by summing the influence of his or her papers. The results show that Nobel Prize winners rank higher under CPRank than under the other methods. In addition, we compared the rank of each laureate's prize-winning article among all of that laureate's articles and found that CPRank again performs well. Finally, CPRank is used to study when in their careers scientists produce their most influential achievements. The results show that scientists' best creativity tends to emerge in the early stages of their careers, a conclusion that differs from the one obtained with the c10 metric.
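A minimal sketch of the CPRank idea follows, assuming it amounts to running standard PageRank on the citation network after dropping every citation made more than ten years after the cited paper appeared. The power iteration, the damping factor 0.85, uniform teleportation, uniform redistribution of dangling scores, and the choice to split each citing paper's score only among its remaining (in-window) references are all assumptions, as are the helper names `pagerank`, `cprank`, and `author_scores`; this is not the thesis's implementation. The last helper mirrors the scientist-level analysis by summing paper scores per author.

```python
import numpy as np


def pagerank(edges, n, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on papers 0..n-1.

    edges: (citing, cited) index pairs; each citing paper passes its score
    to the papers it cites, split equally among its references.
    Papers with no references in `edges` spread their score uniformly.
    """
    src = np.array([i for i, _ in edges], dtype=int)
    dst = np.array([j for _, j in edges], dtype=int)
    out_deg = np.bincount(src, minlength=n).astype(float)
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = np.zeros(n)
        # accumulate score flowing along every citation edge
        np.add.at(new, dst, p[src] / out_deg[src])
        dangling = p[out_deg == 0].sum()
        new = alpha * (new + dangling / n) + (1 - alpha) / n
        if np.abs(new - p).sum() < tol:
            return new
        p = new
    return p


def cprank(edges, year, window=10, **kwargs):
    """Hypothetical CPRank sketch: PageRank restricted to citations made
    at most `window` years after the cited paper was published."""
    kept = [(i, j) for i, j in edges if 0 <= year[i] - year[j] <= window]
    return pagerank(kept, len(year), **kwargs)


def author_scores(paper_scores, authorship):
    """Scientist-level impact as the sum of the scores of his or her papers."""
    return {a: float(sum(paper_scores[p] for p in papers))
            for a, papers in authorship.items()}


# Toy network (hypothetical data): papers 0..3 published in these years.
year = [1990, 1995, 2005, 2010]
edges = [(1, 0), (2, 0), (3, 0), (2, 1), (3, 1), (3, 2)]

pr = pagerank(edges, len(year))
cp = cprank(edges, year)
print("PageRank:", np.round(pr, 3))
print("CPRank:  ", np.round(cp, 3))
print(author_scores(cp, {"author_a": [0, 1], "author_b": [2, 3]}))
```

In this toy network, paper 0 is cited by every later paper, but only one of those citations falls inside its ten-year window, so CPRank lowers its score relative to PageRank and shifts weight toward the younger paper 1. With a window longer than the whole time span, `cprank` reduces to plain PageRank.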
Total number of references: 0

Call number: 硕071101/19005

Release date: 2020-07-09
