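Chinese title: | 基于深度神经网络的题库自动标注 |
Name: | |
Confidentiality level: | Public |
Thesis language: | Chinese |
Discipline code: | 081203 |
Discipline/major: | |
Student type: | Master's |
Degree: | Master of Engineering |
Degree type: | |
Degree year: | 2019 |
Campus: | |
School: | |
Research area: | Machine learning and its applications in education |
First supervisor's name: | |
First supervisor's affiliation: | |
Submission date: | 2019-06-10 |
Defense date: | 2019-06-06 |
Foreign-language title: | AUTOMATIC ITEM BANK TAGGING BASED ON DEEP NEURAL NETWORKS |
Chinese keywords: | |
Chinese abstract: |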
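With the development of information technology, computer-based assignments have become a common way to assess students. Because students differ in their learning progress, adaptive assignments are essential for improving learning efficiency. Item banks are an indispensable component of computerized assignments, so they need effective tagging information with which to measure students' knowledge structure and level during assignments, identify their weaknesses, and recommend suitable items. Manual tagging is currently the most common way to tag item banks, but it consumes a great deal of money and time and suffers from consistency problems. Automatic tagging by computer is an effective alternative, yet it has received little research attention. Some existing studies tag item banks automatically with traditional machine learning methods. Most of these methods extract features with the bag-of-words model, which ignores the contextual dependencies of text and cannot represent items well, especially short texts such as multiple-choice questions. It is therefore of great significance to develop more effective automatic tagging methods that take the characteristics of the items into account.
Drawing on deep neural networks and attention mechanisms, this thesis proposes automatic tagging methods for two item types, English multiple-choice questions and reading comprehension questions, according to the characteristics of each. The main research work is as follows:
First, the thesis studies methods for tagging English multiple-choice questions with knowledge-unit labels. Multiple-choice questions are short texts that usually consist of a question, options, and an answer, and the answer words are particularly important for indicating the knowledge units a question examines. Based on these characteristics, the thesis proposes the Position-Based Attention Model (PBAM) and the Keywords-Based Model (KBM) to better extract information from the answer words and improve tagging performance. To evaluate the models, they are compared with several commonly used multi-label text classification methods; the results show that PBAM and KBM significantly outperform these methods. To examine the importance of the answer and its neighboring words, the thesis analyzes how PBAM's adjustment weights and performance change during training. The experimental results confirm the importance of the answer words.
Second, the thesis studies methods for tagging English reading comprehension items with document topic labels and question type labels. A reading comprehension item usually consists of one document and several questions, so documents are far fewer than questions. The questions are also closely tied to the document. The thesis therefore proposes the Document Extraction Attention Networks (DEAN), which let the information in documents and questions be used mutually. Through multi-task learning, DEAN uses question information to implicitly increase the number of document samples. At the same time, DEAN uses information extracted from the document to help determine question types. DEAN is compared with commonly used single-task and multi-task text classification methods, and the experimental results show that DEAN improves performance. The thesis further discusses the model's performance through ablation experiments and significance tests. In addition, it visualizes the model's attention over words, verifying that the designed multi-task learning scheme helps the document feature extractor locate key words.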
|
Foreign-language abstract: |
With the development of information technology, computer-based assignments have become a popular way to assess students. Because students differ in their learning progress, adaptive assignments are essential for improving learning efficiency. Item banks are an indispensable part of computerized assignments, so they must be given effective tagging information. With it, students' knowledge structure and level can be measured during assignments and their weaknesses identified, so that suitable questions can be recommended. At present, manual tagging is the most common way to label item banks. However, it is costly in time and money and leads to consistency issues. Automatic item bank tagging by computer is an effective alternative, but it has not been studied extensively. Some existing studies apply traditional machine learning methods to tag item banks automatically. Most of these methods use the bag-of-words model to extract features, which ignores the contextual dependencies of the text. As a result, they cannot represent questions well, especially short texts such as multiple-choice questions. It is therefore of great significance to develop more effective automatic tagging methods that take the characteristics of the items into account.
Based on deep neural networks and attention mechanisms, this thesis proposes automatic tagging methods for two item types, English multiple-choice questions and reading comprehension questions, according to their respective characteristics. The main work is as follows:
First, the thesis studies how to tag English multiple-choice questions with knowledge units. A multiple-choice question is usually a short text consisting of a query, options, and an answer. The answer words are important indicators of the knowledge units a question examines. Based on these characteristics, the thesis proposes the Position-Based Attention Model (PBAM) and the Keywords-Based Model (KBM) to better extract the information in the answer words and improve tagging performance. To evaluate the proposed models, they are compared with several commonly used multi-label text classification methods. The results show that PBAM and KBM outperform these methods significantly. To examine the importance of the answer words and their neighboring words, the thesis analyzes how the adjustment weights and performance of PBAM change during training. The experimental results confirm the importance of the answer words.
Second, the thesis studies how to tag the documents of English reading comprehension items with topics and their questions with types. A reading comprehension item usually consists of one document followed by several questions, so the number of document samples is much smaller than the number of question samples. The questions are also closely related to the document. The thesis therefore proposes the Document Extraction Attention Networks (DEAN), which allow the information in documents and questions to be leveraged mutually. Specifically, DEAN uses the information in questions to implicitly increase the number of document samples by means of multi-task learning. At the same time, DEAN uses the information extracted from the document to help determine question types. To evaluate DEAN, it is compared with commonly used single-task and multi-task text classification methods. The experimental results show that DEAN outperforms them. The thesis also carries out ablation experiments and significance tests to further discuss the model's performance. In addition, it visualizes the model's attention weights on words, verifying that the designed multi-task learning scheme helps the Document Encoder find informative words.
|
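Total number of references: | 86 |
Author biography: | Yunzong Zhu (朱云宗) is a master's student at the School of Information Science and Technology, Beijing Normal University, majoring in Computer Applied Technology. Published while enrolled, one article in an SCI Zone 3 journal: Bo Sun, Yunzong Zhu, Yongkang Xiao, et al. Automatic question tagging with deep neural networks[J]. IEEE Transactions on Learning Technologies, 2019, 12(1):29-43. One patent application is under review: 孙波, 朱云宗, 肖融, 肖永康, 魏云刚, 赖松. 一种题库知识点自动标注方法及系统 (A method and system for automatically tagging item bank knowledge points). (Publication No.: CN107590127A) |
Library call number: | 硕081203/19008 |
Open-access date: | 2020-07-09 |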
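
Illustrative note on the abstract: the abstract states that bag-of-words features ignore the contextual dependencies of text. The minimal sketch below is not taken from the thesis; the function name `bag_of_words` and the two example sentences are assumptions made purely for illustration. It shows that two sentences with different meanings can share an identical bag-of-words representation, because word order is discarded.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase whitespace-separated tokens; word order is discarded.

    Hypothetical helper for illustration only, not the thesis's feature extractor.
    """
    return Counter(text.lower().split())


# Two made-up sentences with the same words in a different order.
sentence_a = "the teacher asked the student"
sentence_b = "the student asked the teacher"

bow_a = bag_of_words(sentence_a)
bow_b = bag_of_words(sentence_b)

# The sentences differ, yet their bag-of-words counts are equal.
same_representation = bow_a == bow_b
print(same_representation)
```

Order-sensitive models such as the attention-based networks named in the abstract are one way to avoid this loss of information; the sketch only demonstrates the limitation, not the thesis's remedy.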