Thesis information

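Chinese title:	初中阶段“浅易文言文”的评量研究
Name:	Ma Kun (马坤)
Confidentiality level:	Public
Thesis language:	Chinese
Discipline code:	050102
Discipline:	Linguistics and Applied Linguistics
Student type:	Master's
Degree:	Master of Arts
Degree type:	Academic degree
Degree year:	2021
Campus:	Beijing campus
School:	College of Chinese Language and Culture (汉语文化学院)
Research area:	Chinese information processing
First supervisor:	Liu Zhiying (刘智颖)
First supervisor's affiliation:	Institute of Chinese Information Processing, Beijing Normal University (北京师范大学中文信息处理研究所)
Submission date:	2021-06-04
Defense date:	2021-06-01
English title:	RESEARCH ON THE EVALUATION OF "EASY CLASSICAL CHINESE" IN JUNIOR MIDDLE SCHOOL
Chinese keywords:	浅易文言文 ; 可读性 ; 语言特征 ; 难度 ; 课外读物
English keywords:	Easy Classical Chinese ; Readability ; Language Features ; Difficulty ; Extracurricular reading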
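Chinese abstract:	Text readability research has a history of about a hundred years, and prescriptive descriptions of classical Chinese difficulty such as "plain" (平易) and "easy" (浅易) have provoked continuing debate throughout a century of Chinese-language curriculum standards. To address the text-selection problem that this vague definition creates for teaching, and building on theory and techniques from linguistics, education, and computer science, this thesis attempts to extend the existing scope of Chinese readability research, using Chinese information processing techniques to re-examine the intrinsic linguistic features of, and the evaluation references for, "easy classical Chinese" at the junior middle school stage. First, the thesis uses text readability theory to define the concept of junior middle school "easy classical Chinese", takes the classical Chinese selections in the senior high, junior middle, and primary school Chinese textbooks published since the founding of the People's Republic of China as the extension of that concept, and has several annotators label them collaboratively following a standard workflow for graded annotation, producing a validated "Classical Chinese Reading Grading Base Database" of more than 300 texts. Second, working at four linguistic levels (characters and words, semantics, sentences, and discourse), it identifies 39 linguistic features in 19 categories that may bear on classical Chinese readability, tests their validity, and screens them, yielding an indicator system of 30 features in 17 categories for the classical Chinese readability evaluation task. Finally, it combines the screened features with a TF-IDF text representation and runs comparative experiments with machine learning classifiers, reaching 62.5% accuracy and 95% adjacent accuracy on a seven-class text difficulty task. The classical Chinese selections in the People's Education Press extracurricular readers for junior middle school serve as a validation set for the resulting evaluation tool; the good results of this test support the tool's reliability and demonstrate a practical application. On this basis, the study establishes an evaluation reference system for junior middle school "easy classical Chinese" and takes a first step toward automating that evaluation: indicators are extracted automatically from the text and its translation, and a pre-trained classification model supplies a reference difficulty level. This provides a basis for selecting "easy classical Chinese" texts in and out of class, and the thesis also offers some suggestions for current classical Chinese teaching. The thesis publicly releases the "Classical Chinese Reading Grading Dataset" (CCRGD) used in the project, which covers 400 classical Chinese texts at different difficulty levels with their basic information, 39 linguistic features, and annotated reading comprehension difficulty. This helps fill a gap in research on classical Chinese readability and offers a path toward resolving the long-standing problem of evaluating "easy classical Chinese". The interim experimental results reported here can serve as a resource and baseline for later work.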
English abstract:	The study of text readability has a history of about a century, and prescriptive descriptions of the difficulty of classical Chinese such as "Pingyi" and "Qianyi" have continued to arouse controversy and discussion throughout the implementation of a century of Chinese curriculum standards. To solve the text-selection problem that this fuzzy definition causes in teaching, and building on the theoretical and technical achievements of linguistics, pedagogy, computer science, and related disciplines, this thesis attempts to extend the existing field of Chinese readability research, using Chinese information processing techniques to re-examine the inherent language features and evaluation references of "Easy Classical Chinese" at the junior middle school stage. First, the thesis uses text readability theory to define the concept of "Easy Classical Chinese" in junior middle school. It takes the classical Chinese selections in the various editions of high school, junior middle school, and elementary school Chinese textbooks published since the founding of the People's Republic of China as the extension of the research object. Multi-person collaborative annotation was carried out following a standard workflow for graded annotation, and a validated "basic database of classical Chinese reading classification" of more than 300 texts was constructed. Second, starting from the four language levels of words, semantics, sentences, and texts, 39 language features in 19 categories that may be related to the readability of classical Chinese were summarized, verified for validity, and screened, producing an indicator system of 30 language features in 17 categories for classical Chinese readability assessment. Finally, the thesis combines the selected language features with a TF-IDF text representation and conducts comparative experiments with machine learning classification models, achieving an accuracy of 62.5% and an adjacent accuracy of 95% on a seven-class text difficulty task. The classical Chinese selections in the People's Education Press extracurricular readers for junior middle school were used as a verification set to evaluate the classical Chinese readability evaluation tool developed in this thesis. The good results obtained from the test support the reliability of the tool. On this basis, the thesis establishes a reference system for the evaluation of "Easy Classical Chinese" in junior middle school and takes a first step toward automating that evaluation, automatically extracting indicators from the text and its translation and using a pre-trained classification model to give a reference difficulty level. The system provides a basis for selecting "Easy Classical Chinese" texts in and out of class in junior middle school and also supports some suggestions for teaching. The thesis releases to the public the "Classical Chinese Reading Grading Data Set" (CCRGD), which contains the basic information of 400 classical Chinese texts at different levels of difficulty, 39 language features, and annotations of reading comprehension difficulty. This helps fill a gap in research on classical Chinese readability and offers a way to address the long-standing "historical problem" of evaluating "Easy Classical Chinese". The experimental results obtained in this thesis can be used as resources and baselines for follow-up research.
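The abstract reports two figures for the seven-class difficulty task: accuracy and adjacent accuracy. The sketch below shows how the two are commonly computed. It assumes, as is usual in readability work but not stated in this record, that adjacent accuracy counts a prediction as correct when it falls within one level of the annotated level. The function names and the toy labels are hypothetical and are not taken from the thesis or from CCRGD.

```python
from typing import Sequence


def exact_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of texts whose predicted level equals the annotated level."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def adjacent_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int], tolerance: int = 1
) -> float:
    """Share of texts whose predicted level is within `tolerance` levels.

    Assumption: "adjacent" means an absolute difference of at most one
    level on an ordinal scale; the thesis may define it differently.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical difficulty levels on a 1-7 scale (toy data, not real results).
gold = [1, 2, 3, 4, 5, 6, 7, 4, 2, 5]
pred = [1, 3, 3, 4, 7, 6, 6, 4, 2, 4]

print(exact_accuracy(gold, pred))     # 6 of 10 exact matches
print(adjacent_accuracy(gold, pred))  # 9 of 10 within one level
```

Because difficulty levels are ordinal, a prediction one level off is a much milder error than one several levels off. That is why the adjacent figure can be far higher than the exact one.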
Total references:	143
Call number:	硕050102/21012
Open-access date:	2022-06-04
