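Chinese title: | 面向开放学习资源的概念关系发现 |
Name: | |
Confidentiality level: | Public |
Thesis language: | Chinese |
Discipline code: | 081202 |
Discipline/major: | |
Student type: | Master's |
Degree: | Master of Engineering |
Degree type: | |
Degree year: | 2019 |
Campus: | |
School: | |
Research direction: | Data mining |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2019-06-06 |
Defense date: | 2019-06-04 |
Foreign-language title: | DISCOVERY OF CONCEPT RELATIONSHIPS AMONG OPEN LEARNING RESOURCES |
Chinese keywords: | |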
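Chinese abstract: |
A large number of open learning resources in various forms exist on the Internet, including encyclopedic knowledge, open courses, and open textbooks. Large-scale open learning resources enrich the ways people acquire knowledge and also provide a new data foundation for the automatic discovery of concept relationships. Starting from the learner's perspective of knowledge acquisition, this thesis studies how to automatically obtain relatedness relations and prerequisite relations between concepts from open learning resources. Concept relatedness describes how closely the meanings of concepts are related and is an important basis for helping learners extend their personal knowledge systems; prerequisite relations describe dependencies between concepts and are an indispensable basis for helping learners build learning paths. Discovering these two kinds of concept relations from open learning resources can support intelligent learning resource recommendation, learning path planning, and similar applications.
First, this thesis proposes a concept relatedness discovery method based on textbooks and Wikipedia. The method computes relatedness between concepts separately from textbooks and from Wikipedia, then combines the scores to discover relatedness relations. The computation uses the chapter structure of textbooks and the category and abstract text information of Wikipedia. For textbook chapter structure, concepts are extracted from chapter titles and represented as low-dimensional vectors with the Skip-gram model, and relatedness is computed from these vectors. For Wikipedia categories, relatedness is computed with a multi-path depth-weighted method. For Wikipedia abstract text, relatedness is computed with a comprehensive matching method. The three scores are then combined to give the final relatedness between concepts. Experiments on two manually constructed concept relatedness test sets show that the proposed method achieves higher Spearman correlation coefficients than the comparison methods.
Second, this thesis proposes a concept prerequisite relation discovery method based on learning resources. Prerequisite discovery is cast as a classification problem; eight prerequisite-relation features are constructed from the textual and structural information in learning resources, and logistic regression, support vector machine, and gradient boosting decision tree models are used to discover prerequisite relations. A total of 1336 concept pairs were selected and manually annotated for prerequisite relations; comparing the classification results with the manual annotations, the best result reaches a precision of 79.6%, a recall of 78.9%, and an F1 score of 79.5%.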
|
Foreign-language abstract: |
A wide variety of open learning resources exist on the Internet, including encyclopedic knowledge, open courses, and open textbooks. Large-scale open learning resources not only broaden the channels for acquiring knowledge but also provide a new data foundation for the automatic discovery of concept relationships. From the perspective of learners' knowledge acquisition, this thesis studies how to automatically discover relatedness and prerequisite relations among concepts from open learning resources. Concept relatedness describes how closely the meanings of concepts are related and is an important basis for learners to extend their personal knowledge systems. Prerequisite relations describe dependencies between concepts and are an essential basis for learners to establish learning paths. Discovering these two kinds of concept relations from open learning resources provides a basis for intelligent learning resource recommendation and learning path planning.
First, this thesis proposes a method for discovering concept relatedness based on textbooks and Wikipedia. The method computes relatedness between concepts separately from textbooks and from Wikipedia and then combines the scores. The computation draws on the chapter structure of textbooks and on the categories and abstract text of Wikipedia. For textbook chapter structure, concepts extracted from chapter titles are represented as low-dimensional vectors with the Skip-gram model, and relatedness is computed from these vectors. For Wikipedia categories, relatedness is computed with a multi-path depth-weighted method. For Wikipedia abstract text, relatedness is computed with a comprehensive matching method. The final relatedness is a combination of these three scores. On two manually constructed concept relatedness test sets, Words-240 and CS-160, the proposed method achieves higher Spearman correlation coefficients than the comparison methods.
Second, this thesis proposes a method for discovering concept prerequisite relations based on learning resources. Prerequisite discovery is treated as a classification problem, and eight prerequisite-relation features are constructed from the textual and structural information of learning resources. Three machine learning models, logistic regression, support vector machine (SVM), and gradient boosting decision tree (GBDT), are used to identify prerequisite relations. A total of 1336 concept pairs were manually labelled for prerequisite relations, and the classification results were compared with these labels. The best result achieves a precision of 79.6%, a recall of 78.9%, and an F1 score of 79.5%.
|
Total references: | 62 |
Collection number: | 硕081202/19004 |
Open access date: | 2020-07-09 |
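
The relatedness evaluation summarized in the abstract (vector-based relatedness, a combination of three relatedness scores, and comparison with human ratings by Spearman correlation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the record does not say how the three scores are combined, so a weighted average with equal weights is assumed here, and all function names, vectors, scores, and ratings are hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combine_scores(scores, weights):
    """Weighted average of several relatedness scores (the combination rule is an assumption)."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical concept pairs: (vector of concept A, vector of concept B,
    # category-based score, abstract-text score, human rating).
    pairs = [
        ([0.9, 0.1, 0.0], [0.8, 0.2, 0.1], 0.7, 0.8, 4.5),
        ([0.9, 0.1, 0.0], [0.1, 0.9, 0.2], 0.3, 0.2, 2.0),
        ([0.2, 0.3, 0.9], [0.8, 0.2, 0.1], 0.1, 0.3, 1.0),
        ([0.1, 0.9, 0.2], [0.2, 0.8, 0.3], 0.6, 0.5, 4.0),
    ]
    weights = [1.0, 1.0, 1.0]  # equal weights, assumed for illustration only
    predicted = [
        combine_scores([cosine(a, b), cat_score, abs_score], weights)
        for a, b, cat_score, abs_score, _ in pairs
    ]
    human = [rating for *_, rating in pairs]
    print("combined relatedness:", [round(p, 3) for p in predicted])
    print("Spearman correlation:", round(spearman(predicted, human), 3))
```

Because Spearman correlation depends only on rank order, it measures whether the combined scores order concept pairs the same way human raters do, regardless of the scale of the individual scores.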