Chinese title: | 中文议论性文本论证结构自动分类与应用研究 |
Name: | |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 050102 |
Discipline/major: | |
Student type: | Master's |
Degree: | Master of Arts |
Degree type: | |
Degree year: | 2023 |
Campus: | |
School: | |
Research direction: | Computational linguistics |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2023-06-20 |
Defense date: | 2023-05-28 |
Foreign-language title: | RESEARCH ON AUTOMATIC CLASSIFICATION AND APPLICATION OF ARGUMENTATIVE STRUCTURE IN CHINESE TEXTS |
Chinese keywords: | |
Foreign-language keywords: | Argumentation mining; Argumentative structure; Corpus; Automatic classification of argumentation; Text evaluation; Argumentative indicators |
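Chinese abstract: |
Argumentation mining is an important task in computational argumentation. It usually builds on structured argumentation models and studies how to extract arguments from natural-language text and how to automatically identify, compare, and evaluate argumentative structures. An argumentative structure is composed of a thesis, claims, evidence, and modes of reasoning. As argumentation mining has attracted attention in computational linguistics, its results have also been applied in education. Features of argumentative structure play an important role in tasks such as writing-quality prediction and text-readability analysis. However, automatic analysis of argumentative structure and the construction of feature indicators currently target mainly English text. Research on argumentative component classification and argumentative structure features for Chinese educational texts is scarce; there is no relatively standardized annotated corpus, unified component classification framework, or feature indicator system; and research on automatic analysis remains insufficient. This study addresses the analysis of Chinese argumentative texts for educational applications. Drawing on argumentation theory and discourse linguistics and using natural language processing techniques, it implements automatic classification of argumentative components, constructs quantitative indicators of argumentative structure features, and analyzes how argumentative structure affects the quality and difficulty of argumentative texts.
First, taking extracurricular Chinese-language reading materials, news commentaries, and debate-competition speeches as corpus sources, the thesis formulates a classification and annotation scheme for the argumentative structure of Chinese argumentative texts and builds a corpus of such texts for educational applications, totaling 863 texts. The corpus contributes to improving model classification performance and to constructing computational argumentation indicators.
Second, the thesis investigates sentence-level automatic mining of argumentative components, and designs and implements a BiLSTM-CRF argumentative component recognition model that fuses positional features and similarity features. Experiments show that the feature-fusion approach effectively improves argumentative component recognition: compared with the DiSA method, it achieves a marked performance gain on the viewpoint/topic (Thesis) category, with macro-F1 rising by about 10%. The model's robustness was verified by using data augmentation to add training data in different styles. Adding features step by step for comparison also showed that local positional features significantly improve recognition of claims (Main Idea).
Finally, working from three dimensions (quantitative features, positional features, and formal features), the thesis summarizes 11 feature indicators relevant to the evaluation of Chinese argumentative texts, then validates and screens them. Correlation analysis identifies 5 features that are effective for evaluating argument quality and 8 that are relatively effective for evaluating argument difficulty. Two indicators, the proportion of claim elaboration and the proportion of example-based evidence, correlate relatively highly in both evaluations, and the positional and formal features of argumentative components that this thesis improves also show significant correlations. The regression results indicate that the proposed indicators have good predictive power for the difficulty of argumentative texts, reaching a medium effect size (Adjusted R²=0.223).
In summary, the thesis presents a systematic and complete study of the automatic classification of argumentative structure in Chinese argumentative texts and its application to evaluation. It constructs a high-quality annotated dataset of argumentative components, providing foundational resources and annotation standards for subsequent research. It takes a first step toward automating argumentative text evaluation for educational applications and implements automatic classification of argumentative components, with classification accuracy on the relevant labels substantially higher than in previous studies. Building on this classification, it constructs a system of quantitative argumentative-structure indicators for assessing the difficulty of argumentative essays and verifies their predictive value, providing new indicators for evaluating argumentative writing quality and the argumentative difficulty of texts. |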
Foreign-language abstract: |
Argumentation mining is an important task in computational argumentation. It typically relies on structured argumentation models to extract arguments from natural-language texts and to automatically identify, compare, and evaluate argumentative structures. An argumentative structure consists of a thesis, claims, evidence, and modes of reasoning. As argumentation mining has gained attention in computational linguistics, related research results have been applied in education. Features of argumentative structure play an important role in tasks such as writing-quality prediction and text-readability analysis. However, current automatic analysis of argumentative structure and the construction of feature indicators focus mainly on English texts. There is little research on the classification of argumentative components and the features of argumentative structure in Chinese educational texts. A relatively standardized annotated corpus, a unified framework for argumentative component classification, and a feature indicator system are all lacking, and research on automatic analysis remains insufficient. This study focuses on the analysis of Chinese argumentative texts for educational applications. It combines argumentation theory with discourse linguistics and uses natural language processing techniques to classify argumentative components automatically. It also constructs quantitative indicators of argumentative structure features and analyzes the impact of argumentative structure on the quality and difficulty of argumentative texts.
First, this thesis draws on extracurricular Chinese-language reading materials, news commentaries, and debate speeches as corpus sources, establishes a classification and annotation scheme for the argumentative structure of Chinese argumentative texts, and constructs a corpus of Chinese argumentative texts for educational applications, consisting of 863 texts. This corpus helps improve model classification performance and supports the construction of computational argumentation indicators.
Second, this thesis explores the automatic mining of argumentative components at the sentence level, and designs and implements a BiLSTM-CRF argumentative component recognition model that integrates positional features and similarity features. Experiments show that the feature-integration method effectively improves argumentative component recognition. Compared with the DiSA method, it achieves a marked performance improvement in the viewpoint/topic (Thesis) category, with macro-F1 increasing by about 10%. Adding features step by step for comparison showed that local positional features significantly improve recognition of claims (Main Idea). The model's robustness was also verified by using data augmentation to add training data in different styles.
Finally, this thesis works from three dimensions (quantitative features, positional features, and formal features), summarizes 11 feature indicators related to the evaluation of Chinese argumentative texts, and validates and screens them. Correlation analysis selected 5 features that are effective for evaluating argument quality and 8 that are relatively effective for evaluating argument difficulty. The proportion of claim elaboration and the proportion of example-based evidence correlated relatively highly in both types of evaluation, and the improved positional and formal features of argumentative components also achieved significant correlations. The regression results indicate that the proposed indicators have good predictive power for the difficulty of argumentative texts, achieving a medium effect size (Adjusted R²=0.223).
In summary, this thesis systematically studies the automatic classification of argumentative structure in Chinese argumentative texts and its application to evaluation for educational purposes. It explores a path toward automated argumentative text evaluation and achieves automatic classification of argumentative components and calculation of argumentative indicators, providing new indicators for evaluating the quality of argumentative essay writing and the difficulty of argumentative texts. In addition, the Chinese argumentative text corpus constructed in this thesis provides basic resources and annotation standards for future related research. |
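As an illustration of the kind of proportion indicators and statistics the abstract mentions, the following is a minimal sketch and not the thesis's implementation. It assumes each text is already a list of sentence-level component labels. The label strings, function names, and sample values are hypothetical, and the adjusted R² helper is the standard textbook formula rather than anything taken from the thesis.

```python
from collections import Counter
from math import sqrt

# Hypothetical label name; the thesis's actual tag set may differ.
EXAMPLE_EVIDENCE = "example_evidence"


def label_proportion(labels, target):
    """Share of sentences in one text that carry the target label."""
    if not labels:
        return 0.0
    return Counter(labels)[target] / len(labels)


def pearson(xs, ys):
    """Pearson correlation between a feature and a score, one value per text."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def adjusted_r2(r2, n_samples, n_predictors):
    """Standard adjusted R^2: penalizes R^2 for the number of predictors."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Illustrative usage with made-up labels (not data from the thesis).
text_labels = ["thesis", "example_evidence", "elaboration", "example_evidence"]
share = label_proportion(text_labels, EXAMPLE_EVIDENCE)
```

A feature such as `share` would be computed once per text and then correlated with human quality or difficulty ratings across the corpus, for instance with `pearson`.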
Total number of references: | 93 |
Call number: | 硕050102/23033 |
Open-access date: | 2024-06-20 |