Thesis information

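Title (Chinese):	面向鉴赏文本的语文知识图谱交互式标注平台构建
Author:	Yang Yajie (杨亚杰)
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	081203
Discipline:	Computer Application Technology
Student type:	Master's
Degree:	Master of Engineering
Degree type:	Academic degree
Degree year:	2023
Campus:	Beijing campus
School:	School of Artificial Intelligence
Research direction:	Chinese information processing
First supervisor:	Song Jihua (宋继华)
First supervisor's affiliation:	School of Artificial Intelligence
Submission date:	2023-06-26
Defense date:	2023-06-02
Title (English):	CONSTRUCTION OF AN INTERACTIVE ANNOTATION PLATFORM OF CHINESE KNOWLEDGE GRAPH FOR APPRECIATIVE TEXT
Keywords (Chinese):	文学鉴赏文本 ; 正则表达式 ; 知识抽取 ; 图谱标注平台 ; 知识图谱可视化
Keywords (English):	Literary Appreciation Text ; Regular Expressions ; Knowledge Extraction ; Graph Annotation Platform ; Knowledge Graph Visualization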
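Abstract (translated from the Chinese):	Reading and appreciating literary works is one of the main ways to develop core competencies in the Chinese language subject and an indispensable part of Chinese language learning. It involves analyzing and interpreting a work along several dimensions, such as language, structure, theme, and style, to gain a deeper understanding of its artistic and aesthetic value. With the rapid development of artificial intelligence and the spread of smart education, cross-media reading and communication has been put on the agenda for Chinese curriculum education, and the integration of Chinese teaching with intelligent technology has become an inevitable trend in the discipline. Because the knowledge involved in literary appreciation texts is complex, knowledge graph technology, with its connectivist view of learning and its use of visualization and networking, offers a well-suited guiding approach for studying literary works. Taking the "Compulsory Education Chinese Curriculum Standards" published in 2022 and the "General High School Chinese Curriculum Standards" revised in 2020 as reference standards, this thesis carries out a formal framework analysis of the appreciation knowledge system for literary works in the Chinese subject, acquires high-quality, highly reliable knowledge resources from a rich body of appreciation texts, and builds an annotation platform for the literary appreciation text knowledge graph using current information technology, realizing interactive annotation of key information in appreciation texts and visualization of the knowledge graph.
Specifically, the work and contributions of this thesis are as follows. (1) Construction of the appreciation text database and ontology library. Through web crawling, text denoising, and structuring of Chinese electronic textbooks and various online resources, high-quality and highly reliable texts were collected and integrated from a large volume of text resources to build the appreciation text database, and an ontology library for appreciation texts was built on the basis of the Chinese curriculum standards. To date, entity and relation libraries have been built for 1121 texts, yielding 572 entities of 23 types and 590 entity relations of 21 types. (2) Detailed analysis of the rule-based knowledge extraction architecture. Building on character-level general regular expressions, the thesis describes word-level extended regular expressions, then studies and implements a structure-constrained extended regular expression extraction algorithm and a heuristic extended regular expression extraction algorithm. The former generates structure-constrained regular expression templates through repeated iteration; the latter uses an improved TF-IDF strategy to generate a meta-word bank, which is filled into the templates to obtain more precise matches. (3) Development of a user-friendly browser/server (B/S) interactive knowledge graph annotation system for literary appreciation texts. Built on the Neo4j graph database and HTML+Django+Vue.js, the system implements entity annotation, triple annotation, and visualization for the appreciation text knowledge graph, forming a Chinese knowledge graph annotation platform for appreciation texts. By integrating template matching based on extended regular expressions, the system also provides semi-automatic annotation features such as intelligent suggestions, which improves annotation efficiency. The system further supports saving, exporting, and searching annotation results, as well as multi-user collaborative annotation, providing a solid supporting platform for the appreciation and teaching of literary works.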
Abstract (English):	Reading and appreciating literary works is one of the main ways to train and enhance core literacy in the Chinese language subject, and it is an indispensable part of Chinese language learning. It involves a comprehensive analysis and interpretation of literary works along multiple dimensions, such as language, structure, theme, and style, in order to gain a deeper understanding and appreciation of their artistic and aesthetic value. With the rapid development of artificial intelligence and the popularization of smart education, cross-media reading and communication has been put on the agenda in Chinese language curriculum education, and the integration of Chinese teaching and intelligent technology has become an inevitable trend in the development of the discipline. Because the knowledge involved in literary appreciation texts is complex, knowledge graph technology, with its connectivist learning concept and its use of visualization and networking, provides a good guiding idea for the study of literary works. This thesis takes the "Compulsory Education Chinese Curriculum Standards" published in 2022 and the "General High School Chinese Curriculum Standards" revised in 2020 as reference standards and conducts a formal framework analysis of the appreciation knowledge system for literary works in the Chinese language subject. It retrieves high-quality, highly reliable knowledge resources from rich appreciation texts and uses current information technology to construct an annotation platform for the literary appreciation text knowledge graph, which realizes interactive annotation of key information in appreciation texts and visualization of the knowledge graph.
Specifically, the main work and innovations of this thesis include: (1) Construction of the appreciation text database and ontology library. Through web crawling, text denoising, and structured processing of Chinese electronic textbooks and various network resources, high-quality and highly reliable texts are collected and integrated from massive text resources, and the appreciation text database is constructed. Based on the Chinese curriculum standards, the ontology library of the appreciation texts is also constructed. To date, entity and relation databases have been built for a total of 1121 texts, covering 572 entities of 23 types and 590 entity relations of 21 types. (2) Detailed analysis and study of the rule-based knowledge extraction architecture. Building on character-level general regular expressions, the thesis explains word-level extended regular expressions. It then studies and implements a structure-constrained extended regular expression extraction algorithm and a heuristic extended regular expression extraction algorithm. The former generates structure-constrained regular expression templates through repeated iteration, while the latter uses an improved TF-IDF strategy to generate a word bank, which is filled into the templates to obtain more accurate matching results. (3) Development of a user-friendly B/S interactive knowledge graph annotation system for literary appreciation texts. The system is based on the Neo4j graph database and HTML+Django+Vue.js. It implements entity annotation, triple annotation, and visualization of the appreciation text knowledge graph, forming a platform for annotating the Chinese knowledge graph of appreciation texts.
The platform lets users annotate entities and relationships by clicking and dragging, and provides functions such as shortcut keys and auto-completion to improve annotation efficiency. It also supports saving, exporting, and retrieving annotation results, as well as multi-person collaborative annotation, providing a good platform for the appreciation and teaching of literary works.
Total number of references:	59
Collection number:	硕081203/23015
Open-access date:	2024-06-26
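
The abstract describes filling a word bank into word-level regular expression templates to extract triples. The following is only a minimal sketch of that general idea, not the thesis's algorithm: it assumes the text is already segmented into words, the word banks `TRIGGER_BANK` and `CUE_BANK` are hand-written here (the thesis generates its bank with an improved TF-IDF strategy that is not reproduced), and the relation label `uses_technique`, the template shape, and the sample sentences are all hypothetical.

```python
import re

# Hypothetical, hand-written word banks (assumption). The thesis derives its
# bank with an improved TF-IDF strategy, which this sketch does not reproduce.
TRIGGER_BANK = {"运用", "采用", "使用"}
CUE_BANK = {"修辞手法", "手法"}


def compile_template(trigger_bank, cue_bank):
    """Fill the word banks into one word-level template.

    Template shape (assumed): HEAD trigger [了] TAIL 的 cue
    Words are separated by single spaces, so \\S+ matches exactly one word.
    """
    triggers = "|".join(re.escape(w) for w in sorted(trigger_bank))
    cues = "|".join(re.escape(w) for w in sorted(cue_bank))
    return re.compile(
        r"(?:^| )(?P<head>\S+) (?:" + triggers + r")(?: 了)? "
        r"(?P<tail>\S+) 的 (?:" + cues + r")(?= |$)"
    )


def extract_triples(token_lists, pattern, relation="uses_technique"):
    """Return (head, relation, tail) triples from pre-segmented sentences."""
    triples = []
    for tokens in token_lists:
        text = " ".join(tokens)  # one space between words
        for match in pattern.finditer(text):
            triples.append((match.group("head"), relation, match.group("tail")))
    return triples


# Illustrative, pre-segmented sentences (not taken from the thesis corpus).
SENTENCES = [
    ["《春》", "运用", "了", "比喻", "的", "修辞手法"],
    ["《背影》", "采用", "白描", "的", "手法"],
    ["这", "是", "一", "篇", "散文"],
]

PATTERN = compile_template(TRIGGER_BANK, CUE_BANK)
TRIPLES = extract_triples(SENTENCES, PATTERN)
print(TRIPLES)
```

Joining the words with a single space lets an ordinary character-level `re` pattern behave like a word-level one, since each `\S+` slot can only span one word. In an annotation tool, triples produced this way would more plausibly be shown as suggestions for a human annotator to confirm than be written to the graph directly.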
