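Chinese title: | Construction and Application of a Syntactic Complexity Index System for Chinese Text Readability |
Name: | |
Confidentiality level: | Public |
Thesis language: | Chinese |
Discipline code: | 050102 |
Discipline/major: | |
Student type: | Master's |
Degree: | Master of Arts |
Degree type: | |
Degree year: | 2020 |
Campus: | |
School: | |
Research direction: | Chinese information processing |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2020-06-10 |
Defense date: | 2020-05-29 |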
Foreign-language title: | Construction and Application of Syntactic Complexity System for Chinese Text Readability |
Chinese keywords: | |
Foreign-language keywords: | Text readability analysis ; syntactic complexity ; naive Bayes ; support vector machine ; random forest |
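Chinese abstract: |
With the advance of quality-oriented education, improving students' reading ability has become a focus for educators. Against this background, the concept of "leveled reading" has emerged. Leveled reading aims to provide students with reading materials of different difficulty levels according to their intellectual and psychological development at different ages, and analyzing the readability of texts is an important part of it.
Syntactic complexity refers to the range of variation and the degree of sophistication of linguistic forms in language production, and it reflects text readability to some extent. Previous work on Chinese text readability has focused mainly on features of Chinese characters and vocabulary. Features of syntactic complexity have received relatively little study, so the results of text analysis lack syntactic support. Constructing a well-founded syntactic complexity feature system to support text readability research is therefore important.
From the perspective of computational linguistics, this thesis constructs a syntactic complexity feature system for Chinese text readability. On this basis, the system is combined with character complexity and vocabulary complexity features, and three common classification algorithms are used to apply and validate it and to evaluate its effectiveness. The main contents of the thesis are as follows:
1. A review of research progress on text readability and syntactic complexity. The thesis first describes the background and significance of the study, introducing the concept of leveled reading and its importance and, from there, the task of text readability analysis. It then presents the relevant concepts and the state of research on text readability analysis and syntactic complexity, points out the shortcomings of current syntactic complexity research, and proposes constructing a syntactic complexity feature system to support automatic text readability analysis.
2. Construction of a syntactic complexity feature system for text readability. The thesis first considers, in context, the overall effect of syntactic complexity on sentence difficulty. Then, from a linguistic perspective and taking syntactic richness and syntactic depth as starting points, it identifies six first-level features and forty-seven second-level features under two sub-dimensions (a baseline feature dimension and a specific content dimension). The six first-level features are sentence length, simple and complex sentences, syntactic structure complexity, dependency syntactic complexity, inter-sentence relations, and special syntactic structures. The result is a fairly complete syntactic complexity feature system.
3. Application and validity analysis of the syntactic complexity feature system. To evaluate the validity of the feature system, we add it on top of character complexity and vocabulary complexity features in an experiment on automatic readability analysis of primary school Chinese textbooks, and construct a domain-specific syntactic complexity feature system. The thesis first describes the approach and technical route of the experiment in detail, then processes the data used, and then carries out feature engineering, feature extraction, and model building. To avoid errors arising from the bias of any single model, three common algorithms are used in comparative experiments on the syntactic complexity features and the features based on character and vocabulary complexity. Finally, the experimental results are analyzed, demonstrating the effectiveness of the syntactic complexity feature system.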
|
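As a rough illustration of the kind of surface measures that a sentence-length or simple/complex-sentence feature could build on, the sketch below computes three proxies from raw Chinese text: mean sentence length in characters, mean number of punctuation-delimited clauses per sentence, and the share of sentences with more than one clause. The punctuation sets, the function names, and the choice of these three measures are assumptions made for this example. They are not the thesis's actual forty-seven features, which the abstract does not enumerate, and dependency-based features would additionally require a syntactic parser.

```python
import re

# Sentence-final punctuation (full-width and ASCII); an assumption of this sketch.
SENTENCE_END = re.compile(r"[。!?!?]+")
# Clause-internal punctuation, used only as a rough clause boundary.
CLAUSE_SEP = re.compile(r"[,,;;:]+")


def split_sentences(text):
    """Split text into non-empty sentences on sentence-final punctuation."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def surface_features(text):
    """Return a few illustrative surface proxies for syntactic complexity."""
    sentences = split_sentences(text)
    if not sentences:
        return {"n_sentences": 0, "mean_sentence_len": 0.0,
                "mean_clauses": 0.0, "multi_clause_ratio": 0.0}
    # Number of punctuation-delimited clauses in each sentence.
    clause_counts = [
        len([c for c in CLAUSE_SEP.split(s) if c.strip()]) for s in sentences
    ]
    n = len(sentences)
    return {
        "n_sentences": n,
        # Length is counted in characters, including clause-internal punctuation.
        "mean_sentence_len": sum(len(s) for s in sentences) / n,
        "mean_clauses": sum(clause_counts) / n,
        # Share of sentences that contain more than one clause.
        "multi_clause_ratio": sum(1 for c in clause_counts if c > 1) / n,
    }


if __name__ == "__main__":
    sample = "小猫睡着了。因为下雨了,所以我们没有去公园,只好待在家里。"
    print(surface_features(sample))
```

Punctuation-based clause counting is only a stand-in for a real simple/complex-sentence classifier, but it keeps the example dependency-free and shows how per-text feature values of this kind could be passed to a classifier.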
Foreign-language abstract: |
With the advancement of quality education, how to improve students' reading ability has become a focus for educators. In this context, the concept of "leveled reading" has emerged. Leveled reading aims to provide students with reading materials of varying difficulty according to their intellectual and psychological development at different ages, and analyzing the readability of texts is an important part of it.
Syntactic complexity refers to the range and degree of sophistication of language forms in language production, and it reflects text difficulty to a certain extent. Traditional Chinese text readability analysis focuses mainly on Chinese characters, vocabulary, and similar features. There is less research on syntactic complexity features, so the results of text analysis lack syntactic support. It is therefore important to construct a well-founded syntactic complexity feature system to assist text readability research.
This thesis constructs a syntactic complexity feature system for Chinese text readability from the perspective of computational linguistics. On this basis, the system is integrated with the traditional features of Chinese character complexity and vocabulary complexity, and three common classification algorithms are used to apply and verify the feature system and to evaluate its effectiveness. The main contents of the thesis are as follows:
1. Reviewing the current state of research on text readability and syntactic complexity. The thesis first introduces the relevant background and the importance of leveled reading, which leads to the concept of text readability analysis. It then elaborates the related concepts and research status of text readability analysis and syntactic complexity, points out the shortcomings of current syntactic complexity research, and proposes constructing a syntactic complexity feature system to assist automatic analysis of text readability.
2. Constructing a syntactic complexity feature system for the field of text readability. The thesis first distinguishes the concepts of grammar and syntax and determines its research object. Then, from the perspective of linguistics and taking syntactic richness and syntactic depth as the starting point, it sorts out 6 first-level features and 47 second-level features to build a more complete system of syntactic complexity features. The first-level features are sentence length, simple and complex sentences, syntactic structure complexity, dependency syntactic complexity, inter-sentence relationships, and special syntactic structures.
3. Verifying the validity of Chinese text readability analysis based on syntactic complexity. To evaluate the validity of the syntactic complexity feature system, we add it to the traditional Chinese character complexity and vocabulary complexity features for automatic readability analysis. The specific ideas and technical route of the experiment are first introduced in detail, and the data used in the experiment are then processed. Next, the feature engineering required for the experiment is carried out, completing feature extraction and model building. To avoid errors caused by the preference of any one model, three algorithms are used in comparative experiments on the syntactic complexity features and the traditional features (based on Chinese character complexity and vocabulary complexity). Finally, the experimental results are analyzed to demonstrate the effectiveness of the syntactic complexity feature system. |
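Total number of references: | 68 |
Author biography: | Du Yueming (杜月明), female, master's student (2017 cohort) at the College of Chinese Language and Culture, Beijing Normal University. Her research direction is Chinese information processing, and she has published several papers. |
Call number: | 硕050102/20005 |
Release date: | 2021-06-10 |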