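Chinese title: | Research on Automatic Essay Scoring Based on Multi-Feature Fusion and Metric Learning (基于多特征融合和度量学习的自动作文评分研究) |
Name: | |
Confidentiality level: | Public |
Thesis language: | Chinese |
Discipline code: | 025200 |
Discipline/major: | |
Student type: | Master's |
Degree: | Master of Applied Statistics |
Degree type: | |
Degree year: | 2022 |
Campus: | |
School: | |
First advisor name: | |
First advisor affiliation: | |
Submission date: | 2022-06-21 |
Defense date: | 2022-05-20 |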
Foreign-language title: | RESEARCH ON AUTOMATIC ESSAY SCORING BASED ON MULTI-FEATURE FUSION AND METRIC LEARNING |
Chinese keywords: | |
Foreign-language keywords: | Automatic essay scoring; feature design; gradient boosting tree model; metric learning |
Chinese abstract: |
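Automatic essay scoring can effectively address the slow speed, low efficiency, and rater subjectivity of traditional manual scoring, and its importance is growing as online teaching accounts for a larger share of instruction. To give the scoring system better multi-trait score feedback and higher accuracy so that it meets teaching needs, this thesis studies automatic scoring of both the total essay score and the trait scores. The main work is summarized as follows: (1) We use different models to produce three trait scores for an essay: a topic relevance score, a discourse structure score, and a writing conventions score. The topic relevance model borrows the idea of metric learning: a model essay selected in advance, whose topic score is full marks, is fed into a Siamese network together with the sample so that the model learns how the sample and the model essay differ in expressing the topic. The discourse structure model is built mainly from an LSTM and an attention mechanism to better capture the relationships between sentences in the text. The writing conventions model uses three gradient boosting tree models for prediction, taking the essay's handcrafted features as input. After training on the experimental dataset, the average quadratic weighted kappa of the three trait score models reaches 0.689, 0.666, and 0.724, respectively, all good results. (2) We use three gradient boosting tree models that take the computed handcrafted features and the three dataset-annotated trait scores as input to predict the total essay score. A model built this way combines the strengths of both kinds of variables. Experiments after training on the same dataset show that the CatBoost model has the best agreement. Finally, the trait scores predicted by the trait score models replace the dataset-annotated trait scores and are fed into the trained total score model. The overall automatic essay scoring model reaches an average quadratic weighted kappa of 0.801, the best overall result when compared with five baseline models, with balanced performance across essay sets. (3) The overall model shows a degree of sensitivity. Because earlier automatic essay scoring systems had sensitivity problems, we evaluate the model's sensitivity in three ways: feeding it essays with randomly swapped sentence order, essays with some sentences randomly deleted, and randomly generated scrambled text. The results show that these samples receive lower total and trait scores, as expected, demonstrating that the overall model has a degree of sensitivity. |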
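The agreement figures reported in the abstract (0.689, 0.666, 0.724, 0.801) are quadratic weighted kappa values. The following is a minimal sketch of how that statistic is commonly computed for two lists of integer scores. The function name, its arguments, and the assumption that scores are integers in a known range are illustrative and are not taken from the thesis.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic weighted kappa between two equal-length lists of integer scores.

    Hypothetical helper for illustration; assumes every score lies in
    [min_rating, max_rating] and that the range has at least two levels.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("need at least two rating levels")

    # Observed confusion matrix: observed[i][j] counts pairs (a, b).
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    total = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Expected count if the two raters scored independently.
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0:
        # Both raters gave the same single score to every essay.
        return 1.0
    return 1.0 - numerator / denominator
```

Under these assumptions, identical score lists give a value of 1, and systematically reversed scores give a negative value.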
Foreign-language abstract: |
Automatic essay scoring can effectively address the slowness, inefficiency, and rater subjectivity of traditional manual scoring, and its importance is growing as online teaching becomes more common. To give the scoring system better multi-trait score feedback and higher accuracy to meet teaching needs, this thesis focuses on automatic scoring of the total essay score and the trait scores. The main work of this thesis is summarized as follows. (1) We use different models to construct three trait scores for essays: a topic relevance score, a discourse structure score, and a writing conventions score. The model for the topic relevance score draws on the idea of metric learning: a pre-selected model essay with a full-mark topic score and a sample essay are fed simultaneously into a Siamese network, so that the model learns the differences in topic expression between the sample and the model essay. The model for the discourse structure score is built mainly with an LSTM and an attention mechanism to better capture the relationships between sentences in the text. Finally, the writing conventions score is predicted with three gradient boosting tree models that take various handcrafted features of the essay as input. After training on the experimental dataset, the average quadratic weighted kappa values of these three trait score models reached 0.689, 0.666, and 0.724, respectively, all of which are good results. (2) We use three gradient boosting tree models to predict the total essay score, taking as input the computed handcrafted features and the three trait scores annotated in the dataset. The model thus constructed combines the advantages of these two types of variables. 
Experimental results after training on the same dataset showed that the CatBoost model had the best agreement. Finally, the trait scores annotated in the dataset were replaced with the trait scores predicted by the trait score models and fed into the trained total score model. The average quadratic weighted kappa of the overall automatic essay scoring model reached 0.801, the best overall result compared with five baseline models, with balanced performance across essay sets. (3) The overall model has a degree of sensitivity. Because previous automatic essay scoring systems had sensitivity problems, we evaluated the sensitivity of the model in three ways: feeding it essays with randomly swapped sentence order, essays with some sentences randomly deleted, and randomly generated scrambled text. The experimental results showed that these samples received lower total scores and lower trait scores, as expected, demonstrating that the overall model has a degree of sensitivity. |
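The three sensitivity checks in item (3) amount to simple text perturbations, which can be sketched as follows. The thesis record does not specify how sentences were split, what fraction was deleted, or how the scrambled text was generated, so the helper names, the naive English-punctuation splitter, and the `drop_fraction` default below are all assumptions made for illustration.

```python
import random
import re


def split_sentences(essay):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", essay.strip())
    return [p for p in parts if p]


def shuffle_sentences(essay, rng):
    # Perturbation 1: randomly reorder the sentences of the essay.
    # A shuffle can occasionally return the original order, especially for short essays.
    sentences = split_sentences(essay)
    rng.shuffle(sentences)
    return " ".join(sentences)


def drop_sentences(essay, rng, drop_fraction=0.3):
    # Perturbation 2: randomly delete a fraction of the sentences,
    # always removing at least one and keeping at least one.
    sentences = split_sentences(essay)
    if len(sentences) <= 1:
        return essay
    n_drop = max(1, int(len(sentences) * drop_fraction))
    n_drop = min(n_drop, len(sentences) - 1)
    dropped = set(rng.sample(range(len(sentences)), n_drop))
    return " ".join(s for i, s in enumerate(sentences) if i not in dropped)


def random_text(vocabulary, n_words, rng):
    # Perturbation 3: scrambled text made of words drawn at random from a vocabulary.
    return " ".join(rng.choice(vocabulary) for _ in range(n_words))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the perturbations are reproducible
    essay = "First point. Second point! Third point? Fourth point."
    print(shuffle_sentences(essay, rng))
    print(drop_sentences(essay, rng))
    print(random_text(essay.split(), 8, rng))
```

Each perturbed essay would then be scored by the trained model and compared with the score of the unperturbed essay; a sensitive model is expected to assign lower scores to the perturbed versions.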
Total number of references: | 51 |
Holding location: | Main Library B301 |
Call number: | 硕0714Z2/22070Z |
Release date: | 2023-06-21 |