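Chinese title: | 基于统计与词嵌入向量的近代汉语动量词研究 |
Name: | |
Confidentiality level: | Public |
Thesis language: | Chinese |
Discipline code: | 050101 |
Discipline / major: | |
Student type: | Bachelor |
Degree: | Bachelor of Arts |
Degree year: | 2019 |
Institution: | Beijing Normal University |
Campus: | |
School: | |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2019-05-15 |
Defense date: | 2019-05-15 |
Foreign-language title: | A research on verbal classifiers in pre-modern Chinese based on statistics and word embedding |
Chinese keywords: | |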
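Chinese abstract: |
Using a diachronic corpus of 230 million characters as its platform, this thesis combines statistical methods with a word embedding algorithm to examine quantitatively 13 verbal classifiers in pre-modern Chinese and how they evolved. First, combining regular expressions with a dictionary-based knowledge base, it carries out and evaluates the preprocessing tasks: automatic recognition of verbal classifier phrases, automatic word segmentation of pre-modern Chinese, and automatic recognition of the verbs modified by verbal classifiers. Second, it measures, period by period, the frequency and stability of each verbal classifier construction and each verbal classifier, finding that the frequencies of verbal classifiers differ widely between the classical (文言) and vernacular (白话) registers. Third, based on word embedding vectors, it examines the words most similar to the verbal classifiers and the semantic distribution of the classifier system, showing that verbal classifiers and noun classifiers as groups exhibit marked “prototype category” features. Finally, following the semantic class system of 《同义词词林》, it examines the dominant and disfavored semantic classes of verbs modified by verbal classifiers, finding no obligatory link between a verb's semantic class and whether the verb can be modified by a verbal classifier. Taking a macroscopic view, the thesis presents the overall picture and development of verbal classifiers in pre-modern Chinese and attempts to extend the research methods of Chinese historical linguistics.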
|
Foreign-language abstract: |
This thesis is based on a diachronic corpus of 230 million Chinese characters and combines statistical methods with a word embedding algorithm to study 13 verbal classifiers and their development in pre-modern Chinese. First, using regular expressions and dictionaries, we carry out and evaluate the automatic recognition of verbal classifier phrases, word segmentation of pre-modern Chinese, and the automatic recognition of verbs modified by verbal classifiers. Second, we measure the frequency and stability of the verbal classifiers and of their syntactic forms in each period, finding that the frequencies of verbal classifiers differ widely between classical Chinese and vernacular Chinese. Third, we use word embeddings to examine the words most similar to the verbal classifiers and the semantic distribution of the classifier system, which shows that both verbal and noun classifiers have clear “prototype category” features. Finally, following the lexical semantic system of Synonym CiLin, we analyze the major and minor semantic categories of verbs modified by verbal classifiers, finding that whether a verb can be modified by a verbal classifier is not fully determined by the verb's semantic category. Taking a macro perspective, we present the overall picture and development of verbal classifiers in pre-modern Chinese and attempt to extend the research methods used in the history of Chinese.
|
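Total references: | 56 |
Author biography: | 蒋彦廷 (Jiang Yanting), male, undergraduate student in Chinese Language and Literature, 2015 cohort. Main research interests: computational linguistics and Chinese information processing. |
Total figures: | 20 |
Total tables: | 8 |
Call number: | 本050101/19045 |
Release date: | 2020-07-09 |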