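Chinese title: | 文本数据驱动下本体构建、对齐与演化研究——以用户产品评论数据为例 |
Name: | |
Confidentiality level: | Public |
Thesis language: | Chinese |
Discipline code: | 120502 |
Discipline: | |
Student type: | Master's |
Degree: | Master of Management |
Degree type: | |
Degree year: | 2021 |
Campus: | |
School: | |
Research direction: | Knowledge organization, text mining |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2021-06-23 |
Defense date: | 2021-06-23 |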
Foreign-language title: | Ontology Construction, Alignment and Evolution Driven by Text Data: Take User Product Review Data As an Example |
Chinese keywords: | |
Foreign-language keywords: | Text data; Ontology construction; Ontology alignment; Ontology evolution; Product reviews; Product comparison |
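Chinese abstract (translated): |
In recent years, as consumer demand has grown, data of all kinds has increased rapidly, and unstructured text data in particular carries consumers' latent knowledge needs. In online product reviews on e-commerce sites, for example, consumers rely on review text to make purchase decisions, and product designers use the same text to optimize product design. How to organize unstructured text data efficiently and in an orderly way is therefore an important research question. Moreover, as new products keep appearing, consumers may develop further needs. New smartphones, for instance, offer the photography functions of other products such as digital cameras, so consumers compare not only products of the same type but also products from different domains that share certain similarities. Product reviews from different domains then need to be processed and organized to meet consumers' cross-domain product needs. At the same time, knowledge is updated continually and consumers' knowledge needs keep changing. New text data mentions new product functions and features that do not appear in earlier text data, so the existing organization of text data needs to evolve and be updated to capture them.
An ontology is a conceptualized semantic specification. Organizing the knowledge in text data with a domain ontology can effectively turn unstructured, complex data into structured knowledge that is easy to apply. This study therefore introduces domain ontologies into the knowledge organization of text data and explores a framework and methods for text-data-driven domain ontology construction, alignment and evolution. On the one hand, it studies ontology alignment for domain ontologies built in different domains, with the aim of constructing cross-domain ontologies. On the other hand, it explores a domain ontology evolution method that combines word semantic representation with new word discovery, so that the ontology can be adjusted dynamically as text data grows sharply and user needs change, keeping the domain ontology reliable and up to date. Specifically, natural language processing techniques are first used to preprocess the text data. Second, the TransH algorithm from deep learning is used for entity recognition and relation extraction to complete the construction of the domain ontology. Then two deep learning algorithms, the semantics-based Word2vec algorithm and the structure-based Node2vec algorithm, are used to align the ontologies, followed by ontology fusion. In addition, a method that combines syntactic analysis with deep-learning word semantic representation is used to discover new domain words in support of ontology evolution. Finally, online user product reviews are taken as the experimental case, and several groups of comparative experiments verify the reliability and effectiveness of the proposed semantic relation recognition method for ontology construction, the cross-domain ontology alignment method, and the ontology evolution method that combines word semantic representation with new domain word discovery.
In terms of theoretical significance, the proposed methodology for text-data-driven domain ontology construction, alignment and evolution offers new ideas and technical solutions for research on the ontology life cycle and promotes research on building and applying ontologies from text. The novel ontology alignment methodology also supports the fusion and integration of interdisciplinary and cross-domain ontologies, and the ontology evolution methodology helps ontologies in other domains to be updated and improved over time, so that domain ontologies remain useful in the long term. In terms of practical significance, this study takes user product reviews as an example and builds a cross-domain product ontology from the alignment results, which helps consumers compare products using product reviews and supports their purchase decisions. New word discovery can also detect and extract new product features contained in user product reviews and, on the basis of the evolved ontology, give consumers novel and accurate support for purchase decisions. |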
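The abstract names TransH as the algorithm behind relation extraction but gives no formula. As a rough illustration only, the standard TransH scoring function projects the head and tail embeddings onto a relation-specific hyperplane and measures how well a translation vector connects them. The sketch below uses plain Python lists and hypothetical toy vectors; it is not the thesis implementation, and the record does not say how candidate triples are generated from review text.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_to_hyperplane(v, w):
    """Project v onto the hyperplane whose normal vector is w."""
    norm = math.sqrt(dot(w, w))
    unit = [x / norm for x in w]          # normalise the hyperplane normal
    scale = dot(unit, v)                  # component of v along the normal
    return [vi - scale * ui for vi, ui in zip(v, unit)]


def transh_score(head, tail, w_r, d_r):
    """TransH dissimilarity of a triple; lower means more plausible.

    w_r is the normal of the relation hyperplane and d_r the translation
    vector lying in that hyperplane.
    """
    h_perp = project_to_hyperplane(head, w_r)
    t_perp = project_to_hyperplane(tail, w_r)
    diff = [hp + d - tp for hp, d, tp in zip(h_perp, d_r, t_perp)]
    return dot(diff, diff)                # squared L2 norm


# Hypothetical toy embeddings (assumed values, not taken from the thesis).
head = [1.0, 0.0, 1.0]
tail = [0.0, 2.0, 1.0]
w_r = [0.0, 0.0, 2.0]
good_translation = [-1.0, 2.0, 0.0]
bad_translation = [0.0, 0.0, 0.0]

good_score = transh_score(head, tail, w_r, good_translation)
bad_score = transh_score(head, tail, w_r, bad_translation)
```

In training, such a score is typically used in a margin-based ranking loss that pushes observed triples below corrupted ones; that step is omitted here.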
Foreign-language abstract: |
In recent years, as consumers' demands have continued to grow, data and information of many kinds have increased rapidly. Unstructured text data in particular contains consumers' potential knowledge needs. For example, consumers use online product reviews on e-commerce websites to make purchase decisions, and product designers use the same review text to optimize product design. Organizing unstructured text data efficiently and in an orderly way is therefore an important research problem. In addition, as new products keep appearing, consumers may develop further needs. New smartphones, for instance, offer the photography functions of other products such as digital cameras, so consumers compare not only products of the same type but also products from different domains that share certain similarities. Product reviews from different domains then need to be processed and organized to meet this cross-domain demand. Furthermore, knowledge is constantly updated over time, and consumers' knowledge needs keep changing. New text data describes new product functions and features that do not appear in earlier text data. The existing organization of text data therefore needs to evolve and be updated to capture these new functions and features.
An ontology is a conceptual semantic description. Using a domain ontology to organize the knowledge in text data can effectively transform unstructured, complex data into structured knowledge that is easy to apply. This research therefore introduces domain ontologies into the study of knowledge organization for text data and explores a framework and methodology for domain ontology construction, alignment and evolution driven by text data. On the one hand, ontology alignment is studied for domain ontologies constructed in different domains, with the aim of building cross-domain ontologies. On the other hand, the research explores a domain ontology evolution method that combines word meaning representation and new word discovery, so that the ontology can be adjusted dynamically as text data grows and user needs change, preserving its reliability and timeliness. Specifically, natural language processing technology is first applied to preprocess the text data. Second, the TransH algorithm from deep learning is adopted for entity recognition and relation extraction to complete domain ontology construction. Then two deep learning algorithms, the semantic-based Word2vec algorithm and the structure-based Node2vec algorithm, are integrated to implement ontology alignment and ontology fusion. In addition, a method that fuses syntactic analysis with deep-learning word meaning representation is used to discover new domain words in support of ontology evolution. Finally, online user product reviews are selected as an experimental case, and multiple sets of comparative experiments verify the reliability and validity of the semantic relation recognition method, the cross-domain ontology alignment method, and the ontology evolution method that combines word meaning representation with new domain word discovery.
At the theoretical level, the proposed methodology for domain ontology construction, alignment and evolution driven by text data provides new ideas and technical solutions for ontology life cycle research and promotes research on building and applying ontologies from text. In addition, the novel ontology alignment methodology proposed in this research supports the fusion and integration of interdisciplinary and cross-domain ontologies, and the proposed ontology evolution method helps ontologies in other domains to be updated and improved over time, enabling the long-term effective use of domain ontologies. At the practical level, user product reviews are taken as an example, and a cross-domain product ontology is built from the ontology alignment results, which helps consumers use product reviews to compare products and supports their purchase decisions. In addition, the new word discovery technique adopted in this research can effectively detect and extract new product features contained in product reviews and, on the basis of the evolved ontology, provide consumers with novel and accurate purchase decision support. |
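The abstract says that a semantic signal (Word2vec) and a structural signal (Node2vec) are integrated for alignment but does not state how. One plausible reading, sketched below purely as an assumption, is a weighted sum of two cosine similarities followed by a best-match search with a cutoff. The weight `alpha`, the cutoff `threshold`, the concept names and the toy vectors are all hypothetical, and the sketch assumes the vectors of both ontologies live in shared embedding spaces so that they can be compared directly.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def combined_similarity(sem_a, sem_b, struct_a, struct_b, alpha=0.5):
    """Weighted mix of semantic and structural similarity (alpha is assumed)."""
    return alpha * cosine(sem_a, sem_b) + (1 - alpha) * cosine(struct_a, struct_b)


def align(source, target, alpha=0.5, threshold=0.8):
    """Match each source concept to its most similar target concept.

    source and target map a concept name to a pair
    (semantic_vector, structural_vector). Matches whose combined
    similarity falls below the threshold are dropped.
    """
    matches = []
    for s_name, (s_sem, s_struct) in source.items():
        best_name, best_score = None, -1.0
        for t_name, (t_sem, t_struct) in target.items():
            score = combined_similarity(s_sem, t_sem, s_struct, t_struct, alpha)
            if score > best_score:
                best_name, best_score = t_name, score
        if best_name is not None and best_score >= threshold:
            matches.append((s_name, best_name, best_score))
    return matches


# Hypothetical toy concepts from a phone ontology and a camera ontology.
phone = {
    "camera": ([1.0, 0.0], [1.0, 0.0]),
    "battery": ([0.0, 1.0], [0.0, 1.0]),
}
digital_camera = {
    "lens": ([1.0, 0.1], [1.0, 0.0]),
    "tripod": ([-1.0, 0.0], [0.0, -1.0]),
}

matches = align(phone, digital_camera)
matched_pairs = [(s, t) for s, t, _ in matches]
```

With these toy vectors only the phone concept `camera` finds a counterpart (`lens`), while `battery` stays unmatched because its best score falls below the cutoff. Aligned pairs of this kind would be the input to the ontology fusion step the abstract mentions.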
Total number of references: | 111 |
Author biography: | Deng Siyu (邓斯予): research interests are knowledge organization and text mining. During graduate study the author published four journal articles and three conference papers, all with Beijing Normal University as the author affiliation.
Journal articles:
(1) Information Sciences (SCI Q1), second author. Geng Q, Deng S, Jia D, et al. Cross-domain Ontology Construction and Alignment from Online Customer Product Reviews[J]. Information Sciences, 2020, 531:47-67.
(2) 《情报理论与实践》 (CSSCI), first author. 邓斯予, 耿骞, 靳健, 等. 基于产品评论分析的领域知识库构建与应用[J]. 情报理论与实践, 2019, 42(11):115-122,127.
(3) 《图书情报工作》 (CSSCI), second author. 耿骞, 邓斯予, 靳健. 融合词语义表示和新词发现的领域本体演化——以产品评论数据为例[J]. 图书情报工作, 2021, 65(8):85-96.
(4) 《情报学报》 (CSSCI), fourth author. 贾丹萍, 靳健, 耿骞, 邓斯予. 感性工学视角下的用户需求挖掘研究[J]. 情报学报, 2020, 39(03):308-316.
Conference papers:
(1) POMS 2019, first author. Siyu DENG, Danping JIA, Qian GENG, Jian JIN. Domain Knowledge Base Building and Application Based on Product Review Analysis[C]. 2019 International Conference on Production and Operations Management Society, POMS 2019. Institute of Electrical and Electronics Engineers Inc., 2019:21-31.
(2) WWW 2020, second author. Qian GENG, Siyu DENG, Danping JIA, Jian JIN. Cross-domain Ontology Construction and Alignment from Online Product Reviews[C]. In Companion Proceedings of the Web Conference 2020 (WWW '20). ACM ISBN 978-1-4503-7024-0/20/04, 2020:401-408.
(3) POMS 2019, second author. Danping JIA, Siyu DENG, Jian JIN, Qian GENG. Integrating Kansei Engineering with Sentiment Analysis for Customer Understanding from Online Opinions[C]. 2019 International Conference on Production and Operations Management Society, POMS 2019. Institute of Electrical and Electronics Engineers Inc., 2019:43-54. |
Call number: | 硕120502/21002 |
Open-access date: | 2022-06-23 |