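Chinese title: | 面向国际文凭项目(IBDP)的海外高中生中文阅读文本难度指标体系研究 |
Name: | |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 045300 |
Discipline/major: | |
Student type: | Master's |
Degree: | Master of Teaching Chinese to Speakers of Other Languages |
Degree type: | |
Degree year: | 2024 |
Campus: | |
School: | |
Research area: | Teaching Chinese to Speakers of Other Languages |
First supervisor's name: | |
First supervisor's affiliation: | |
Submission date: | 2024-05-25 |
Defense date: | 2024-05-22 |
Foreign-language title: | CHINESE READING TEXT DIFFICULTY INDICATOR SYSTEM FOR OVERSEAS HIGH SCHOOL STUDENTS IN THE INTERNATIONAL BACCALAUREATE DIPLOMA PROGRAM (IBDP) |
Chinese keywords: | |
Foreign-language keywords: | Text Difficulty ; IBDP Reading Text ; Linguistic Features ; Machine Learning |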
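Chinese abstract: |
Text difficulty assessment is the task of computing a text's difficulty level from its features, and it is one of the important research problems in natural language processing. A text's difficulty is closely tied to its linguistic features, and existing studies usually examine it through features at four levels: Chinese characters, vocabulary, syntax, and discourse. Building on this work and taking into account the psychological development of high school students, this study adds a fifth level, affective features, to explore an indicator system for the difficulty of Chinese reading texts for overseas high school students in the International Baccalaureate Diploma Program (IBDP). The IBDP was chosen because it is a highly regarded international education program whose language courses span three difficulty levels from low to high: Ab Initio, SL (Standard Level), and HL (Higher Level). A corpus of IBDP Chinese reading test texts was built for this study, 187 features were selected across the five levels, and seven machine learning models were trained to construct a difficulty classification model for IBDP Chinese reading test texts. The findings are as follows: (1) with the full feature set, the random forest (RF) model performs best among the seven machine learning models; (2) when the predictive power of each feature level and of various feature combinations is compared on the corpus, the strongest model is the one built on the feature combination based on the Chinese Proficiency Grading Standards for International Chinese Language Education (《国际中文教育中文水平等级标准》); (3) after redundant features are removed, 25 features enter the optimal model (IBDP-25), in which the strongest predictors belong to the vocabulary-complexity dimension of the vocabulary level, especially the proportion of intermediate and advanced vocabulary, followed by the character-complexity dimension of the character level; indicators at the sentence, discourse, and affective levels are weaker predictors, with the affective indicators slightly stronger among them (both negative and positive sentiment words rank in the top 25); (4) in the IBDP's graded reading test texts, the difficulty gap between Ab Initio and SL is small, while the gap between SL and HL is large. The optimal model (IBDP-25) was also tested in application: it was used to predict the difficulty of a selection of HSK reading texts, and the results were compared with the difficulty of the IBDP reading test texts. Ab Initio reading texts are more difficult than HSK 4 texts; SL reading texts are less difficult than HSK 5 texts; HL reading texts are less difficult than HSK 6 texts and slightly more difficult than HSK 5 texts. Overall, the results indicate that the readability feature set constructed here (IBDP-25) for predicting the difficulty of Chinese reading test texts at the overseas high school level can effectively support automatic assessment of IBDP reading text difficulty. The thesis closes with a discussion of how these results can be applied in teaching. |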
Foreign-language abstract: |
Text difficulty assessment refers to calculating the difficulty level of a text from its features, and it is one of the important research problems in natural language processing (NLP). The difficulty of a text is closely related to its linguistic features, and existing research usually examines text difficulty through features at four levels: Chinese characters, vocabulary, syntax, and discourse. In this study, taking into account the psychological development of high school students, we add a new level, the affective level, to explore an indicator system of Chinese reading text difficulty for overseas high school students in the International Baccalaureate Diploma Program (IBDP). The IBDP was chosen because it is a highly reputable international education program, and its language courses include three difficulty levels from low to high: Ab Initio, SL (Standard Level), and HL (Higher Level). We built our own corpus of IBDP Chinese reading test texts, selected 187 features from the five levels, and constructed a difficulty classification model for IBDP Chinese reading test texts by training seven machine learning models. The results show that: (1) for the full-feature model, the random forest (RF) model performs best among the seven machine learning models; (2) an examination of the predictive power of features at different levels, and of various feature combinations, on the difficulty of the texts in the corpus shows that the strongest model is the one using the feature combination based on the Chinese Proficiency Grading Standards for International Chinese Language Education; (3) after redundant features are removed, 25 feature indicators enter the optimal model (IBDP-25), of which the strongest predictors are in the vocabulary complexity dimension of the vocabulary level, especially the proportion of intermediate and advanced vocabulary, followed by the character complexity dimension of the Chinese character level; indicators at the sentence, discourse, and affective levels have weaker predictive power, with the affective-level indicators being slightly stronger (both negative and positive sentiment words are in the top 25); (4) in the graded reading test texts of the IBDP, the span of text difficulty between Ab Initio and SL is small, and the span between SL and HL is large. We also conducted an application test of the optimal model (IBDP-25), using it to predict the difficulty of some HSK reading texts and comparing the results with the difficulty of the IBDP reading test texts. The results show that Ab Initio reading texts are more difficult than HSK 4 texts; SL reading texts are less difficult than HSK 5 texts; and HL reading texts are less difficult than HSK 6 texts and slightly more difficult than HSK 5 texts. Overall, the results show that the readability feature set (IBDP-25) constructed in this study for predicting the difficulty of Chinese reading test texts at the overseas high school level can effectively assist the automatic assessment of IBDP reading text difficulty. The thesis concludes with a discussion of the application of these results in teaching. |
Total references: | 92 |
Call number: | 硕045300/24060 |
Release date: | 2025-05-26 |