Thesis Information

Chinese Title:

 Question Fusion for Text Question Answering

Name:

 余文慧 (Yu Wenhui)

Confidentiality Level:

 Public

Thesis Language:

 Chinese

Discipline Code:

 081203

Discipline:

 Computer Application Technology

Student Type:

 Master's

Degree:

 Master of Engineering

Degree Type:

 Academic Degree

Degree Year:

 2020

Campus:

 Beijing Campus

School:

 School of Artificial Intelligence

Research Direction:

 Text Question Answering

First Supervisor:

 党德鹏 (Dang Depeng)

First Supervisor's Institution:

 School of Artificial Intelligence, Beijing Normal University

Submission Date:

 2020-06-12

Defense Date:

 2020-06-12

English Title:

 Question Fusion for Text Question Answering

Chinese Keywords:

 text question answering; question fusion; query rewriting; sentence compression; sentence generation

English Keywords:

 text question answering; question fusion; query rewriting; sentence compression; sentence generation

Chinese Abstract:

Text question answering (QA) is a key technology in artificial intelligence applications such as voice assistants, intelligent search, intelligent recommendation, intelligent customer service, and intelligent robots. Most existing research on text QA focuses on improving the query process; few studies ask whether the question itself leaves room for improvement. This thesis starts from question improvement: key information from supplementary related questions and from the text content is fused with the user's original question, improving the original question and raising the accuracy of text QA. The results help improve question answering in intelligent applications and promote the adoption and development of natural language processing technology.

Question fusion aims to fuse the user's original question, multiple related questions, and the text to be queried into a new question, so extracting the important information and synthesizing it into one sentence is crucial. In addition, for a QA system, whether the new question expresses its meaning accurately and consistently with the text description plays a decisive role in QA performance. Starting from these three key problems, this thesis proposes a question fusion method for text QA that realizes the fusion of the original question with related questions (multi-question fusion) and the fusion of the question with the text (question-text fusion). Through the interaction of these two parts, the quality of the new question improves step by step. The word-net-based multi-question fusion method extracts important phrases and word-order information from each sentence and adds the information contained in the related questions to the original question, so the fused question is informative and expresses its meaning accurately. The multi-question fusion method with deep semantics adds deep semantic features on this basis, further improving multi-question fusion. The question-text fusion method based on multiple mechanisms further rewrites and improves the question produced by multi-question fusion, generating a new formulation through an end-to-end generative network. Text encoding features are added during generation, so text information is integrated into the question, reducing the semantic gap between the question and the text and improving the quality of the question.

The main research work of this thesis can be divided into the following four parts:

Word-net-based multi-question fusion model. By constructing a word net, multiple questions are represented as a directed graph in which nodes represent phrases and edges represent word-order relations between phrases. When searching the solution space, importance, comprehensiveness, and similarity measures are proposed to assess the plausibility of a path and its similarity to the original question. The generated question thus contains the phrases and word order of multiple questions while remaining semantically similar to the original question, ensuring that the query intent is not lost, improving the quality of the generated question, and increasing text QA accuracy.

Multi-question fusion model with deep semantics. Within the word-net-based model, a BERT model further extracts deep semantic features from the sentences and vectorizes the questions; the vectors improve the sentence-similarity computation. Meanwhile, during the search, the probability that the words form a sentence under a language model is computed as a measure of semantic plausibility, making the generated sentences more coherent and readable.

Question-text fusion model based on multiple mechanisms. To bridge the semantic gap between the question and the text to be queried, the question generated by the multi-question fusion model is further fed into an end-to-end generative network to produce a better question. At the encoder, attention-based networks extract features from the question and the text, and the two feature sets are combined in several ways to improve the overall network. At the decoder, a copy mechanism lets the generated sentence reproduce important details from both inputs, making it fit the text and effectively improving the quality of the rewriting.

Overall system design and implementation. The multi-question fusion model and the question-text fusion model are integrated into a question fusion system for text QA. The system combines information from the original question, the supplementary related questions, and the text to be queried, resolving incomplete question expression and inconsistency with the text, and generating questions that are better suited to QA retrieval, thereby improving text QA.
English Abstract:

Text question answering (QA) is of great significance in artificial intelligence applications such as voice assistants, intelligent search, intelligent recommendation, intelligent customer service, and intelligent robots. Most existing research focuses on improving the query process, and few studies consider whether the question itself has room for improvement. We start from the direction of question improvement, fusing the key information in supplementary related questions and the text content with the user's original question, thereby improving the original question and increasing the accuracy of text QA. Our results can help improve QA in intelligent applications and promote the adoption and development of natural language processing technology.

Question fusion aims to generate a new question from the original question, multiple related questions, and the text to be queried, so extracting the important information and synthesizing it into a sentence is essential. In addition, for a QA system, whether the meaning of the new question is accurate and consistent with the text description plays a decisive role in the QA process. Starting from these three key problems, we propose a question fusion method for text QA that realizes both the fusion of the original question with related questions (multi-question fusion) and the fusion of the question with the text (question-text fusion). Through the interaction of these two parts, the quality of the new question is improved step by step. The word-net-based multi-question fusion method extracts the important phrases and word order in each sentence and adds the information contained in the related questions to the original question, enriching the expression of the original question. Introducing deep semantic features into the multi-question fusion method further improves the fusion effect.

The question-text fusion method based on multiple mechanisms further rewrites and improves the questions generated by multi-question fusion, producing new questions through an end-to-end generative network. Text encoding features are added during generation, so text information is integrated into the question, reducing the semantic gap between the question and the text and improving the quality of the question.

The main research work of this thesis can be divided into the following four parts:

Multi-question fusion model based on word-nets. This work represents multiple questions as a directed graph by constructing a word-net: nodes represent phrases, and edges represent the word-order relations between phrases. When searching the solution space, three measures (importance, comprehensiveness, and similarity) are proposed to assess the plausibility of a path and its similarity to the original question. With this method, the generated question not only keeps the phrases and word order of multiple questions but also remains semantically similar to the original question, ensuring that the query intent is not lost, improving the quality of the original question, and increasing text QA accuracy.
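The graph search above can be sketched in miniature. This is an illustrative stand-in, not the thesis implementation: it uses single words rather than phrases as nodes, and toy frequency, coverage, and overlap scores in place of the thesis's importance, comprehensiveness, and similarity measures.

```python
from collections import defaultdict

def build_word_graph(questions):
    """Directed word graph: nodes are words, edges record the
    word-order relation between adjacent words in each question."""
    graph = defaultdict(set)
    freq = defaultdict(int)
    for q in questions:
        tokens = ["<s>"] + q.split() + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            graph[a].add(b)
        for t in q.split():
            freq[t] += 1
    return graph, freq

def enumerate_paths(graph, max_len=12):
    """Depth-first search of the solution space: every <s> -> </s>
    path is one candidate fused question."""
    paths = []
    def dfs(node, words):
        for nxt in sorted(graph[node]):
            if nxt == "</s>":
                paths.append(words)
            elif nxt not in words and len(words) < max_len:
                dfs(nxt, words + [nxt])
    dfs("<s>", [])
    return paths

def score(words, freq, original):
    """Toy versions of the three measures: importance (average word
    frequency), comprehensiveness (vocabulary coverage), and
    similarity (word overlap with the original question)."""
    importance = sum(freq[w] for w in words) / len(words)
    coverage = len(set(words)) / len(freq)
    orig = set(original.split())
    similarity = len(set(words) & orig) / len(orig)
    return importance + coverage + similarity

original = "how do I reset my password"
related = ["how can I reset a forgotten password", "reset my password"]
graph, freq = build_word_graph([original] + related)
best = max(enumerate_paths(graph), key=lambda p: score(p, freq, original))
fused_question = " ".join(best)
```

Because every question contributes edges to the same graph, a high-scoring path can splice phrases from related questions into the original one while the similarity term anchors it to the original intent.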

Multi-question fusion model with deep semantic features. A BERT model further extracts deep semantic features from the questions and vectorizes them; the vectors are used to compute sentence similarity. Meanwhile, the probability that a candidate word sequence forms a sentence under a language model is computed as a measure of semantic plausibility, so the generated sentences are more coherent and readable.
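The two signals above, vector similarity and language-model plausibility, can be shown with lightweight stand-ins. The thesis uses BERT sentence vectors and a trained language model; this sketch substitutes bag-of-words vectors and an add-one-smoothed bigram model purely to show where each score plugs into the ranking.

```python
import math
from collections import Counter

def embed(sentence):
    """Stand-in for a BERT sentence vector: a bag-of-words count vector.
    In the thesis, a BERT embedding of the question replaces this."""
    return Counter(sentence.split())

def similarity(a, b):
    """Cosine similarity between two sentence vectors."""
    va, vb = embed(a), embed(b)
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_bigrams(corpus):
    """Collect history (unigram) and bigram counts for the LM score."""
    uni, bi = Counter(), Counter()
    for s in corpus:
        toks = ["<s>"] + s.split()
        uni.update(toks[:-1])
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def plausibility(sentence, uni, bi):
    """Length-normalized, add-one-smoothed bigram log-probability:
    a stand-in for the language-model score that ranks candidate
    fused questions by fluency."""
    toks = ["<s>"] + sentence.split()
    v = len(uni) + 1
    lp = sum(math.log((bi[(a, b)] + 1) / (uni[a] + v))
             for a, b in zip(toks, toks[1:]))
    return lp / (len(toks) - 1)
```

During the path search, a candidate's total score would combine these two numbers, so word salad with the right keywords is penalized and paths close to the original intent are preferred.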

Question-text fusion model based on multiple mechanisms. To reduce the semantic gap between the question and the text, the question generated by the multi-question fusion model is fed into an end-to-end generative network to produce a better question. At the encoder, we use attention-based networks to extract features from the question and the text, and combine these two feature sets in several ways. At the decoder, a copy mechanism enables the generated sentence to reproduce important details from both inputs, making it fit the text and effectively improving the quality of the questions.
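One decoding step of a copy mechanism can be written out concretely. This is a pointer-generator-style mixing step, used here as an illustrative stand-in since the thesis does not spell out its exact architecture in the abstract: the final distribution blends the decoder's vocabulary distribution with attention mass copied onto source tokens.

```python
def mix_copy_distribution(p_vocab, attention, src_ids, p_gen, ext_size):
    """One decoder step of a copy mechanism: the output distribution is
    p_gen * P_vocab plus (1 - p_gen) * attention weight copied onto the
    source-token ids, so source words (including out-of-vocabulary ones)
    can be reproduced verbatim in the generated question."""
    out = [p_gen * p for p in p_vocab] + [0.0] * (ext_size - len(p_vocab))
    for pos, tok in enumerate(src_ids):
        out[tok] += (1.0 - p_gen) * attention[pos]
    return out

# Toy step: vocabulary of 3 words; the source has 2 tokens, and the
# second (id 3) is out-of-vocabulary in an extended vocabulary of size 4.
p_vocab = [0.5, 0.3, 0.2]    # decoder's distribution over the vocabulary
attention = [0.6, 0.4]       # attention weights over the 2 source tokens
dist = mix_copy_distribution(p_vocab, attention, src_ids=[1, 3],
                             p_gen=0.8, ext_size=4)
```

The key property is visible in `dist`: token id 3 never appears in the vocabulary distribution, yet it receives probability through the attention term, which is exactly how rare names and numbers from the text survive into the rewritten question.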

Overall system design and implementation. We integrate the multi-question fusion model and the question-text fusion model into a question fusion system for text QA. The system combines information from the original question, the supplementary related questions, and the text to be queried, resolving missing or redundant information in the question and inconsistencies between the question and the text. The generated questions are easier for the QA system to understand, thereby improving the effect of text QA.
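The integrated system described above is, at its core, a two-stage composition. A minimal sketch, with hypothetical function names and trivial stubs standing in for the two trained models:

```python
def question_fusion_pipeline(original, related, text, multi_fuse, qt_fuse):
    """Two-stage question fusion (names hypothetical): stage 1 fuses the
    original question with related questions; stage 2 rewrites the result
    against the text to be queried."""
    fused = multi_fuse(original, related)   # multi-question fusion
    return qt_fuse(fused, text)             # question-text fusion

# Stubs in place of the trained models, just to show the data flow:
demo = question_fusion_pipeline(
    "reset password",
    ["how do I reset my forgotten password"],
    "Account page: password reset instructions.",
    multi_fuse=lambda q, rel: rel[0],   # stub: pick the richer question
    qt_fuse=lambda q, text: q,          # stub: identity rewrite
)
```

The point of the composition is that each stage can be improved or swapped independently, while the QA system only ever sees the final rewritten question.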
Total References:

 77

Author Biography:

 Yu Wenhui (余文慧) works on natural language processing and has co-authored two SCI papers, on crowdsourcing and on information processing in social networks.

Call Number:

 硕081203/20015

Open Access Date:

 2021-06-12

