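Chinese title:	基于语言模型的实体对齐研究
Name:	Chen Xuan (陈玄)
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	081203
Discipline:	Computer Application Technology
Student type:	Master's
Degree:	Master of Engineering
Degree type:	Academic degree
Degree year:	2024
Campus:	Beijing campus
School:	School of Artificial Intelligence
Research areas:	Knowledge graphs, natural language processing
First advisor:	Wang Zhichun (王志春)
First advisor's affiliation:	School of Artificial Intelligence
Submission date:	2024-06-20
Defense date:	2024-05-29
English title:	Research on Entity Alignment Based on Language Models
Chinese keywords:	实体对齐 ; 知识图谱 ; 大规模语言模型 ; 自然语言处理
English keywords:	Entity alignment ; Knowledge graph ; Large language models ; Natural language processing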
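Abstract (translated from Chinese):	︿Entity alignment is a key task in knowledge fusion: it aims to identify entities across multi-source heterogeneous knowledge graphs and to align and merge their information. Methods based on TransE, graph neural networks, and language models usually assume that the graphs share similar topological structure and textual information. They learn embeddings of each graph's structure and text to identify corresponding entities, and so depend on structural and textual consistency to achieve alignment. When the graphs differ markedly in topology and text, this assumption often no longer holds and model performance drops sharply. Large language models such as GPT-3 and LLaMA have strong semantic understanding and reasoning abilities, which offers a distinctive way to handle the complex heterogeneity problems in entity alignment. This thesis draws on these strengths and proposes a series of methods to address these problems. To address the heterogeneity of entity information, this thesis proposes an entity alignment method based on heterogeneity parsing with large language models. The method first applies an LLM-based heterogeneity parsing module that converts an entity and its neighborhood structure into coherent natural-language text in a single language, and then uses a semantic recall model and a fine-ranking model to recall candidate entities and re-rank them by relevance. Rather than modeling the raw heterogeneous structure and text directly, the method aligns entities by converting heterogeneous structural and textual information into a unified natural-language description, which alleviates the heterogeneity problem to some extent. Extensive experiments show that the framework performs well across different alignment scenarios and confirm the effectiveness of each module. To address existing methods' weak semantic understanding of entities and their poor exact-alignment performance, this thesis proposes an entity alignment method based on large language model reasoning. The method is motivated by the observation that existing methods score well on the looser Hits@10 metric but often poorly on the stricter Hits@1 metric. It recasts entity alignment as a reasoning and judgment task, building attribute and relation reasoning modules and combining them with a multi-round voting mechanism to improve the accuracy of inferring the correct entity. Experimental results show that the method significantly improves Hits@1, demonstrating its effectiveness for high-precision entity alignment. In addition, to evaluate the method comprehensively, the thesis examines how the number of candidate entities, the position of the correct answer, and the size of the large model affect reasoning-based ranking performance, and analyzes the specific effect of these factors on the final alignment results. Overall, this research demonstrates the considerable potential of LLM-based methods for handling heterogeneity in entity alignment and for improving alignment accuracy. The proposed methods not only improve entity alignment performance but also offer new perspectives and tools for future research in knowledge fusion and information retrieval.﹀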
English abstract:	︿Entity alignment is a critical task in knowledge fusion, aimed at identifying and aligning entity information across multi-source heterogeneous knowledge graphs. Methods based on TransE, graph neural networks, and language models typically assume that the graphs share similar topological structures and textual information. By learning embeddings of the structure and text within these graphs, they align corresponding entities, relying on the consistency of structural and textual information. This assumption often fails for graphs with significantly heterogeneous topologies and textual information, leading to substantial performance degradation. Large language models such as GPT-3 and LLaMA, with their strong semantic understanding and reasoning capabilities, offer a distinctive way to handle complex heterogeneity in entity alignment. This thesis leverages these capabilities and proposes a series of methods to address these challenges. To address the heterogeneity of entity information, this thesis introduces an entity alignment method based on heterogeneity parsing with large language models. The method first applies an LLM-based heterogeneity parsing module to convert an entity and its neighboring structural information into coherent natural language text in a single language. It then uses a semantic recall model and a re-ranking model to recall candidate entities and re-rank them by relevance. By transforming heterogeneous structural and textual information into a unified natural language description, this approach alleviates heterogeneity issues to some extent. Extensive experiments show that the framework performs well across various alignment scenarios and confirm the effectiveness of its components.
To address the weak semantic understanding of entities and the poor exact-alignment accuracy of existing methods, this thesis proposes an entity alignment method based on reasoning with large language models. Motivated by the observation that existing methods perform well on the lenient Hits@10 metric but often fall short on the stricter Hits@1 metric, the method recasts entity alignment as a reasoning and judgment task. It constructs attribute and relation reasoning modules and combines them with a multi-round voting mechanism to improve the accuracy of inferring the correct entity. Experimental results show a significant improvement in Hits@1, demonstrating the method's effectiveness for high-precision entity alignment. To assess the method comprehensively, further experiments examine how the number of candidate entities, the position of the correct answer, and the scale of the large model affect reasoning-based ranking performance, with a detailed analysis of how these factors influence the final alignment results. This research demonstrates the significant potential of methods based on large language models for addressing heterogeneity in entity alignment and improving alignment accuracy. The proposed methods not only improve entity alignment performance but also provide new perspectives and tools for future research in knowledge fusion and information retrieval.﹀
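The abstract contrasts the lenient Hits@10 metric with the stricter Hits@1. As an illustration only, the sketch below uses the common definition of Hits@k (the fraction of source entities whose correct counterpart appears among the top k ranked candidates). The function name `hits_at_k` and the toy entity IDs are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def hits_at_k(ranked_candidates: Sequence[Sequence[str]],
              gold: Sequence[str], k: int) -> float:
    """Fraction of queries whose gold entity is among the top-k candidates."""
    if len(ranked_candidates) != len(gold):
        raise ValueError("need exactly one gold entity per ranked list")
    if not gold:
        return 0.0
    hits = sum(1 for ranked, g in zip(ranked_candidates, gold)
               if g in ranked[:k])
    return hits / len(gold)


# Toy rankings (hypothetical IDs): one ranked candidate list per source entity.
rankings = [
    ["e1", "e7", "e3"],  # gold "e1" is ranked first
    ["e9", "e2", "e5"],  # gold "e2" is ranked second
    ["e4", "e8", "e6"],  # gold "e6" is ranked third
    ["e0", "e5", "e3"],  # gold "e9" was never recalled
]
gold = ["e1", "e2", "e6", "e9"]

hits1 = hits_at_k(rankings, gold, 1)
hits10 = hits_at_k(rankings, gold, 10)
print(f"Hits@1 = {hits1:.2f}, Hits@10 = {hits10:.2f}")
```

In this toy case three of four gold entities are recalled somewhere in the list but only one is ranked first, which is the kind of gap between Hits@10 and Hits@1 that a re-ranking or reasoning stage would aim to close.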
Total references:	96
Call number:	硕081203/24004
Release date:	2025-06-20
