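Chinese title: | 基于关联数据的实体链接消歧研究 (Research on Entity Linking and Disambiguation Based on Linked Data) |
Name: | |
Confidentiality level: | Public |
Thesis language: | Chinese |
Subject code: | 120102 |
Major: | |
Student type: | Bachelor's |
Degree: | Bachelor of Management |
Degree year: | 2018 |
University: | Beijing Normal University |
Campus: | |
College: | |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2018-06-26 |
Defense date: | 2018-05-17 |
Foreign-language title: | Named Entity Linking and Disambiguation Using Linked Data |
Chinese keywords: | |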
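Chinese abstract: |
Entity linking is an important method for resolving the ambiguity of named entities. It aims to map entity mentions in text, such as person names, place names, and organization names, to entries in a knowledge base. Linked data contains a large number of entities and the relations among them, which makes it well suited to serve as the knowledge base for entity linking.
This thesis uses the DBpedia linked dataset as the knowledge base. Text is preprocessed through tokenization, lemmatization, and similar operations. The candidate entity set for each entity mention is obtained through exact matching and fuzzy matching against the database. Disambiguation uses a vector space model: entity context vectors are built with TF-ICF weights, the similarity between an entity mention and each candidate entity is computed as the cosine between their vectors, and the candidate with the highest similarity is taken as the mapping of the mention in the linked data. In experiments on the DBpedia Spotlight NER corpus test dataset, the disambiguation accuracy for ambiguous entities is 51.94%.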
|
Foreign-language abstract: |
Entity linking is an important method for resolving the ambiguity of named entities. It aims to map entity mentions, such as the names of people, places, and organizations, to entries in a knowledge base. Linked data contains a large number of entities and their relationships, so it is suitable as a knowledge base for entity linking.
This thesis uses DBpedia linked data as the knowledge base. The text is preprocessed by tokenization, lemmatization, and similar steps. The candidate entity set is obtained through exact matching and fuzzy matching in the database. A vector space model is used for disambiguation: TF-ICF weights are used to build the entity context vectors, the cosine between vectors gives the similarity between an entity mention and each candidate entity, and the candidate entity with the highest similarity is used as the mapping of the entity mention in the knowledge base. On the test dataset (DBpedia Spotlight NER corpus), the disambiguation accuracy on ambiguous entities is 51.94%.
|
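The disambiguation step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes that ICF means inverse candidate frequency, i.e. log(number of candidates / number of candidates whose context contains the term), which the record itself does not spell out. The function names (`tf_icf_vectors`, `cosine`, `disambiguate`) and the toy contexts are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def tf_icf_vectors(candidate_contexts):
    """Build one TF-ICF weighted sparse vector per candidate entity.

    candidate_contexts maps a candidate name to its list of context tokens.
    Assumption: ICF(t) = log(N / n_t), where N is the number of candidates
    and n_t is the number of candidates whose context contains term t.
    """
    n_candidates = len(candidate_contexts)
    candidate_freq = Counter()
    for tokens in candidate_contexts.values():
        candidate_freq.update(set(tokens))  # count each term once per candidate
    icf = {t: math.log(n_candidates / n) for t, n in candidate_freq.items()}
    vectors = {}
    for candidate, tokens in candidate_contexts.items():
        tf = Counter(tokens)
        vectors[candidate] = {t: tf[t] * icf[t] for t in tf}
    return vectors, icf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(mention_tokens, candidate_contexts):
    """Return the candidate most similar to the mention context, plus all scores."""
    vectors, icf = tf_icf_vectors(candidate_contexts)
    tf = Counter(mention_tokens)
    # Terms unseen in every candidate context get weight 0 (an assumption).
    mention_vec = {t: tf[t] * icf.get(t, 0.0) for t in tf}
    scores = {c: cosine(mention_vec, vec) for c, vec in vectors.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical): the mention "Washington" in a sentence about a president.
candidates = {
    "George_Washington": ["president", "army", "general", "united", "states"],
    "Washington,_D.C.": ["capital", "city", "united", "states"],
}
best, scores = disambiguate(["president", "first", "army"], candidates)
print(best, scores)
```

Terms shared by every candidate ("united", "states" in the toy data) receive an ICF of zero, so only the terms that distinguish one candidate from another contribute to the cosine score.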
Total references: | 23 |
Call number: | 本120102/18009 |
Release date: | 2019-07-09 |