Thesis information

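Chinese title:	网络化知识的表征、抽取与演化规律研究
Author:	Wang Huaibo (王怀波)
Confidentiality level:	Public
Thesis language:	Chinese
Discipline code:	0401Z2
Discipline:	Distance Education
Student type:	Doctoral
Degree:	Doctorate in Education
Degree type:	Academic degree
Degree year:	2021
Campus:	Beijing campus
School:	Faculty of Education
Research direction:	Fundamental principles of distance education
First supervisor:	Chen Li (陈丽)
First supervisor's affiliation:	Faculty of Education, Beijing Normal University
Submission date:	2021-06-22
Defense date:	2021-06-22
Foreign-language title:	Research on the representation, extraction and evolution of networked knowledge
Keywords (translated from Chinese):	Internet era ; networked knowledge ; knowledge representation ; knowledge extraction ; laws of knowledge evolution
Foreign-language keywords:	Internet Era ; networked knowledge ; knowledge representation ; knowledge extraction ; knowledge evolution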
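Abstract (translated from Chinese):	︿The new generation of information technology, represented by the Internet, has changed how knowledge is produced and disseminated, and in turn has changed the nature of knowledge. However, an insufficient understanding of the nature of this kind of knowledge, and of the evolution laws it embodies, has constrained the deepening of education reform in the new era. This thesis applies new methods to understand the essential characteristics of this new knowledge, to explore methods for extracting it, and to reveal the evolution laws it contains. The research takes the connectivist view of knowledge and the "return" view of knowledge (回归论) as its epistemological basis, and ontology construction and the knowledge extraction techniques of knowledge engineering as its methodological basis, in order to represent the new knowledge, extract it, and explore the laws of its evolution. The research covers three topics: 1) defining the connotation and characteristics of networked knowledge and representing it; 2) mechanisms for extracting knowledge from the Internet; and 3) analysis of how networked knowledge evolves and distillation of its laws. The three parts are iterative and mutually influential: the study of connotation and representation constrains knowledge extraction, the extraction results determine the evolution analysis and the laws distilled from it, and the results of the evolution analysis in turn refine the design of the connotation and representation. The first part proposes the new term "networked knowledge" to denote the new knowledge emerging on the Internet, and builds a representation model of networked knowledge based on ontology. Networked knowledge refers specifically to the information, understanding, skills, values and attitudes that are generated by the convergence of collective wisdom in the Internet environment and that keep developing and changing. Such knowledge is neither traditional knowledge moved online nor simple information sharing; it is a new kind of knowledge that arises from the convergence of collective wisdom, is created collaboratively, and is continuously updated. It is characterized by a networked structure, group contribution, and production and dissemination occurring in a single process. The ontology-based representation model is essentially intended to represent networked knowledge more comprehensively in a computer. It takes an "entity-attribute" form: the entity is mainly the core subject content of knowledge on the Internet, such as value judgments, attitudes and viewpoints, while the attributes are the various properties of the networked knowledge ontology, such as contextualization, structural relationships and contributor type. The second part focuses on developing a networked knowledge extraction mechanism covering the process "data processing → entity extraction → attribute extraction → knowledge storage", and on designing a corresponding extraction tool. For data processing, the research relies mainly on the first cMOOC in China based on connectivist theory, "Internet + Education: A Dialogue between Theory and Practice", and uses text from the course's discussions, blogs and similar sources as sample data. A set of classification rules based on topic documents was developed so that the text can be processed into relatively independent contexts for the subsequent extraction of networked knowledge. For entity extraction, the research uses pkuseg for word segmentation and part-of-speech tagging. Combined with a purpose-built list of 440 domain-specific terms, candidate entities were obtained in three ways: keywords, word combinations, and named entity recognition. Duplicate-word filtering, importance judgment combining TF-IDF with expert scoring, and entity unification based on BERT word-vector similarity together with manual annotation then yielded 4792 networked knowledge entities, which were stored in the corresponding entity library. For attribute extraction, the research designed a series of targeted methods, including obtaining contextualized content from the context surrounding a knowledge entity, determining structural relationships from the co-occurrence of knowledge in documents, and determining contributor type from user identity. For knowledge storage, the research builds on the Neo4j graph database, designs a mapping between networked knowledge and the graph database, and stores networked knowledge in it. In addition, the research designed a networked knowledge extraction tool that mainly uses jupyter-based visualization to encapsulate the complex and tedious extraction workflow and so support agile knowledge extraction. The third part uses Topic 4 of the cMOOC, "Consumption-driven supply-side reform in education", to explore the evolution laws of networked knowledge. Taking a holistic view of knowledge evolution, the research treats knowledge units (entities and attributes) and knowledge populations as the objects of analysis, and constructs evolution path maps and knowledge family trees to examine how knowledge changes in quantity, scale and structure between adjacent time windows, finally summarizing the evolution laws of networked knowledge. The research finds that knowledge evolution is a recurring, cumulative process of "selection → inheritance/variation → filtering → selection". Within it, "user selection" is the continuing driving force of networked knowledge evolution, while "environmental mutation" is its catalyst, ultimately bringing about the great flourishing or great decline of knowledge. Regarding contributor type, grassroots participants and experts have equal opportunity to speak, but producing a "tsunami"-like impact in a specialized community still requires specialists. Regarding contextualized content, the evolution of networked knowledge is a process of breaking away from the original context. Regarding network structure, the evolution of networked knowledge is a dynamic process in which internal clusters develop in alternation. Regarding developmental state, networked knowledge units do not develop at a uniform pace; their development is a complex process that includes both gradual and abrupt evolution. The research innovatively proposes the new term "networked knowledge", breaking through the current oversimplified understanding of new knowledge on the Internet. The representation model, extraction mechanism and tool it builds both realize the extraction of this new knowledge and further broaden the application domain of knowledge extraction. The evolution laws of networked knowledge, distilled from real interaction data in a professional Internet community, help educational practitioners grasp these laws accurately and make sound decisions.﹀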
Foreign-language abstract:	︿ The new generation of information technology, represented by the Internet, has changed how knowledge is produced and communicated, and in turn has changed the nature of knowledge. However, an insufficient understanding of this kind of knowledge and of its evolution laws restricts the deepening of education reform. This thesis therefore uses new methods to understand the nature of this new knowledge, explore methods for extracting it, and reveal its evolution laws. Its epistemological basis is the connectivist view of knowledge and the nature of knowledge; its methodological basis is ontology construction and the knowledge extraction techniques of knowledge engineering. On these foundations, we explored the representation, extraction and evolution of knowledge. The research has three parts: 1) the concept and representation of networked knowledge; 2) the extraction of networked knowledge; 3) the evolution of networked knowledge. The three parts are iterative and interactive. In the first part, we proposed a new term, "networked knowledge", to denote the new knowledge emerging on the Internet, and constructed an ontology-based representation model of networked knowledge. Networked knowledge refers to the information, understanding, skills, values and attitudes that are generated by collective wisdom in the Internet environment and that keep developing and changing. This kind of knowledge is neither traditional knowledge moved online nor simple information sharing; it is knowledge that arises from the gathering of group wisdom, collaborative creation, and continuous updating and development, and it is characterized by a networked structure, group contribution, and production and communication occurring in the same process. The ontology-based representation model is essentially intended to represent networked knowledge more comprehensively in a computer. 
It takes an "entity-attribute" form: the entity is mainly the core subject content of knowledge on the Internet, such as value judgments, attitudes and viewpoints, while the attributes describe the entity, such as contextualization, structural relationships and contributor type. In the second part, we explored networked knowledge extraction methods and designed an extraction tool. The extraction methods comprise data processing, entity extraction, attribute extraction and knowledge storage. In data processing, we focused on a cMOOC online course based on the theory of connectivism, and developed a set of classification rules based on topic documents so that the sample data could be processed into relatively independent contexts. In entity extraction, we chose pkuseg as the tool for word segmentation and part-of-speech tagging. Using 440 domain-specific terms, we obtained candidate entities through keyword extraction, word-combination extraction and named entity recognition. Then, through duplicate-word filtering, TF-IDF-based importance judgment and BERT-based similarity calculation, we obtained 4792 networked knowledge entities. In attribute extraction, we designed a series of specific methods: obtaining contextualized content from the text surrounding entity words, determining structural relationships through document co-occurrence, and determining contributor type from user identity. In knowledge storage, we designed a mapping between networked knowledge and a graph database, and stored networked knowledge in Neo4j, a native graph database. In addition, we designed a networked knowledge extraction tool that mainly uses jupyter visualization to encapsulate the complex extraction process and so support rapid knowledge extraction. 
In the third part, we explored the evolution laws of networked knowledge using the cMOOC course topic "Educational supply-side reform driven by consumption". We constructed evolution path maps of networked knowledge units and knowledge family trees of knowledge populations in order to find changes in the quantity, scale and structure of knowledge between adjacent time windows. Finally, we summarized the evolution laws based on these changes. We found that networked knowledge evolution is a cyclical process of "selection, heredity or variation, filtration and selection". In this process, user selection and environmental mutation are both key forces that lead to the prosperity or decline of networked knowledge. Regarding contributor type, the crowd and experts have the same opportunities to create knowledge, but experts remain indispensable for bringing about greater influence. Regarding contextualized content, the evolution of networked knowledge is a process of breaking away from the original context. Regarding network structure, the evolution of networked knowledge is a dynamic process in which internal clusters develop in alternation. From the perspective of development, networked knowledge does not develop uniformly; its development is a complex process of both gradual and abrupt evolution. This thesis puts forward the new term "networked knowledge" to break the limitation of the current oversimplified understanding of new knowledge on the Internet. The networked knowledge representation model and the knowledge extraction methods and tool both realize the extraction of this new knowledge and further expand the application of knowledge extraction. The evolution laws of networked knowledge refined and summarized in this research can also help educators grasp these laws accurately and make sound decisions. ﹀
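Illustrative note (not part of the original record): the abstract states that structural relationships between entities are determined from their co-occurrence in documents. The sketch below shows one simple way such a step could look, using only the Python standard library. The function name `cooccurrence_edges`, the `min_count` threshold and the toy input are assumptions made for illustration; the thesis's actual rules, data and thresholds are not given in this record.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(doc_entities, min_count=1):
    """Count how often each unordered entity pair appears in the same document.

    doc_entities: iterable of lists, one list of entity strings per document.
    min_count: hypothetical threshold; pairs seen in fewer documents are dropped.
    """
    counts = Counter()
    for entities in doc_entities:
        # Deduplicate within a document and sort so each pair has one canonical order.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical toy input: each inner list holds the entities found in one topic document.
docs = [
    ["MOOC", "connectivism", "online learning"],
    ["connectivism", "online learning"],
    ["MOOC", "supply-side reform"],
]
edges = cooccurrence_edges(docs, min_count=2)
print(edges)
```

Each surviving pair could then be stored as a weighted relationship between two entity nodes in a graph database, which is the kind of mapping the abstract describes for Neo4j.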
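Total number of references:	170
Outstanding thesis:	Beijing Normal University Outstanding Doctoral Dissertation
About the author:	Received a Bachelor of Education degree in Educational Technology from Jiangsu Normal University in 2013; a Master of Science degree in Educational Technology from the School of Smart Education, Jiangsu Normal University, in 2016; and a doctorate in Education in Distance Education from the Faculty of Education, Beijing Normal University, in 2021. Main research areas: educational data mining and learning analytics, knowledge extraction and evolution laws, and interaction patterns in online teaching. During doctoral study, published 13 journal papers in Chinese and English, including 5 CSSCI papers and 1 EI-indexed paper as first author.
Holding location:	Library thesis reading area (Main Library, South Wing, 3rd floor, Zone BC)
Call number:	博0401Z2/21002
Open-access date:	2022-06-22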
