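Chinese title: | 基于计算语言学特征预测写作的创造性(博士后研究报告) [Predicting writing creativity based on computational linguistic features (postdoctoral research report)] |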
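Name: | |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 040102 |
Discipline and major: | |
Student type: | Postdoctoral researcher |
Degree: | Doctor of Education |
Degree type: | |
Degree year: | 2023 |
Campus: | |
School: | |
Research direction: | Educational big data mining |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2023-09-04 |
Defense date: | 2023-08-25 |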
Foreign-language title: | Prediction of writing creativity based on computational linguistic features |
Chinese keywords: | |
Foreign-language keywords: | writing creativity ; automatic essay scoring ; semantic network ; topic analysis ; prediction ; originality |
Chinese abstract: |
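How to evaluate creativity in writing is a topic of ongoing discussion. The core question is whether something as elusive as creativity can be evaluated in a way that goes beyond subjective judgment. To answer this question, we examined whether essay creativity scores can be predicted through text analysis. The novelty of writing in particular depends not only on the linguistic properties of the essay itself but also on how the other essays in its reference group perform. When judging whether an essay stands out, the key is how to define the reference-group context of the essay and how to quantify the differences between essays. In this research, Study 1 uses topic models to probe the latent topic structure of the student essay corpus, that is, how the essays for each writing task cluster into substantively meaningful topics; it determines a reasonable range for the topic parameters and forms qualitative and quantitative criteria for judging essay topic clusters. On this basis, taking human scores on three dimensions of essay creativity as prediction targets, it builds automatic creativity scoring models for different writing prompts from each essay's latent topic structure. Study 2 maps each essay to a network and constructs features for automatic scoring from the essay's semantic network. For each writing prompt, three linear prediction models for the creativity dimension scores are built to automatically score the creativity of all essays under that prompt. Study 3 uses topic analysis to construct refined reference groups for network-based novelty scoring, creating a quantified reference context for evaluating novelty. By combining topic analysis with semantic networks, it further explores effective ways to improve the prediction of writing novelty scores. Within the quantified scoring reference groups, each essay's novelty is scored within a corresponding, more refined semantic reference group, and feature contribution analysis provides evidence based on semantic network features for human evaluation of essay novelty. The results show that the new automatic scoring approach for writing creativity proposed here, based on topic analysis and semantic networks, can be applied to the automatic evaluation of writing creativity. LDA is an effective method for clustering essays; a reasonable topic model for a specific essay set can be selected by combining subjective judgment with LDA model parameters, and it helps human raters quickly grasp the topics and content that the essays to be scored roughly cover. Computational linguistic features based on semantic networks can predict writing creativity to some extent. In particular, the results provide a "topic analysis - quantified differences - prediction" path for automatic novelty scoring: an essay's semantic network features serve as a proxy for quantified differences, and the objective evidence supplied by computational linguistic features can be used to quantify and test raters' subjective experience and cognition during creativity scoring. |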
Foreign-language abstract: |
The evaluation of creativity in writing remains a topic of ongoing discussion. At its core is the question of whether something as elusive as creativity can be assessed in a way that goes beyond subjective judgment. To address this question, we examine whether the creativity scores of essays can be predicted using text analysis methods. The novelty of writing, in particular, depends not only on the linguistic characteristics of the essay itself but also on the performance of the other essays in the same reference group. When determining whether an essay stands out, the key lies in defining the essay's reference-group context and quantifying the differences between essays. In this research, Study 1 uses topic modeling to investigate the latent topic structure of the student essay corpus, that is, how the essays for each writing task cluster into meaningful topics; it establishes a reasonable range for the topic model parameters and forms qualitative and quantitative criteria for judging essay topic clusters. Building on this, we take human scores on three dimensions of creativity as prediction targets and establish automatic creativity scoring models for different writing prompts based on the latent topic structure of each essay. Study 2 maps each essay onto a network and constructs features for automatic scoring from the essay's semantic network. For each writing prompt, three linear prediction models for the creativity dimension scores are established, enabling automated creativity scoring of all essays under that prompt. Study 3 uses topic analysis to build refined reference groups for network-based novelty scoring, creating a quantified reference context for evaluating novelty. By combining topic analysis with semantic networks, we further explore effective ways to improve the prediction of novelty scores. Within the quantified reference groups, each essay is scored for novelty within its own, more refined semantic reference group, and feature contribution analysis provides evidence based on semantic network features to support human evaluation of essay novelty. The findings show that the newly proposed automatic scoring approach, based on topic analysis and semantic networks, can be applied to the automated assessment of writing creativity. Latent Dirichlet Allocation (LDA) is an effective method for essay clustering. By combining subjective judgment with LDA model parameters, a reasonable topic model can be selected for a specific essay set, which also helps human raters quickly grasp the topics and content the essays cover. Computational linguistic features based on semantic networks can predict writing creativity to a certain extent. In particular, the results provide a pathway for the automated scoring of essay novelty: "Topic Analysis - Quantified Differences - Prediction." The semantic network features of essays serve as proxies for quantified differences, while the objective evidence offered by computational linguistic features can be used to quantify and test raters' subjective experience and cognition during creativity scoring. |
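The idea of scoring an essay's novelty against a reference group can be illustrated with a minimal sketch. It is not the report's actual method: the co-occurrence window, the feature names, and the "share of unseen edges" novelty ratio are all assumptions chosen for illustration, and the helper functions are hypothetical.

```python
from itertools import combinations


def cooccurrence_edges(tokens, window=3):
    """Undirected edges linking distinct words that co-occur within `window` tokens.

    Assumption: a sliding-window co-occurrence graph stands in for the
    essay's semantic network; the report may build its networks differently.
    """
    edges = set()
    for i in range(len(tokens)):
        span = set(tokens[i:i + window])
        # Sorting gives each undirected edge one canonical (a, b) form.
        for a, b in combinations(sorted(span), 2):
            edges.add((a, b))
    return edges


def network_features(tokens, window=3):
    """Simple illustrative network features for one essay (names are hypothetical)."""
    nodes = set(tokens)
    edges = cooccurrence_edges(tokens, window)
    n, m = len(nodes), len(edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    avg_degree = 2 * m / n if n else 0.0
    return {"nodes": n, "edges": m, "density": density, "avg_degree": avg_degree}


def novelty_against_group(tokens, group, window=3):
    """Share of an essay's edges that appear in no other essay of its reference group."""
    own = cooccurrence_edges(tokens, window)
    if not own:
        return 0.0
    others = set()
    for other in group:
        others |= cooccurrence_edges(other, window)
    return len(own - others) / len(own)
```

Under these assumptions, `group` would hold the tokenized essays assigned to the same topic cluster, so that each essay is compared only with essays in its own refined reference group; the resulting features could then feed a linear prediction model.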
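Total number of references: | 96 |
Holding location: | Library thesis reading area (Main Library, South Wing, 3rd floor, Zone BC) |
Call number: | 博040102/23021 |
Release date: | 2024-09-03 |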