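Chinese title: | 专利文本汉英机器翻译中有标记并列结构的处理 |
Name: | |
Discipline code: | 050102 |
Discipline/major: | |
Student type: | Master's |
Degree: | Master of Arts |
Degree year: | 2013 |
Campus: | |
School: | |
Research direction: | Chinese-English machine translation of patent texts |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2013-05-30 |
Defense date: | 2013-05-25 |
Foreign-language title: | Processing of Marked Coordinate Structure in Patent Chinese-English Machine Translation |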
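Chinese abstract: |
Recognizing coordinate structures has long been a difficult point in the automatic recognition of phrase structures. Marked coordinate structures, an important subclass of coordinate structures, are an especially hard problem in Chinese chunk recognition because of their complex internal structure and wide boundary span. Traditional linguistics has mainly studied the types and characteristics of coordinate structures, while the information-processing community has mostly handled them with statistical methods; these studies seldom use rule-based methods and almost never address patent corpora. Marked coordinate structures are widely distributed in patent texts and are very numerous overall, so recognizing and transforming them correctly is important for improving the accuracy of patent machine translation systems. Based on HNC (Hierarchical Network of Concepts) theory, this thesis examines real patent corpora to formulate rules and strategies for the rule-based automatic recognition of coordinate structures and, through a Chinese-English contrastive study, proposes a strategy for transforming coordinate structures from Chinese to English, improving the accuracy of automatic coordinate structure recognition. The main work of this thesis is as follows: (1) Chinese sentences containing marked coordinate structures were collected from patent texts together with their English translations and divided into an analysis set and a test set, forming the analysis corpus and the test corpus. (2) The grammatical and semantic constraints inside and outside coordinate structures were examined, and general rules for coordinate structure recognition were drawn up from the features of these structures, taking the enumeration comma (、) as the representative connective. (3) The connectives "和, 与, 或, 或者, 跟, 同, 并, 并且, 及, 以及, 及其" were studied individually, and rules were formulated for recognizing their conceptual categories and for recognizing the boundaries of the coordinate structures they connect. A coordination recognition strategy was formulated and the rules were added to the full machine translation system, achieving automatic recognition of marked coordinate structures. (4) The differing characteristics of coordinate structures in the two languages and the regularities in translating them were compared and summarized, and a transformation strategy for coordinate structures in machine translation was studied. (5) Closed and open tests were carried out, the results were analyzed, and directions for further research were set out.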
|
Foreign-language abstract: |
The identification of coordinate structures has always been one of the difficulties in automatic phrase identification. Marked coordinate structures, an important subclass of coordinate structures, are especially difficult to identify because of their complex internal structure and wide boundary span. Research on coordinate structures in modern Chinese linguistics focuses mainly on their types and characteristics, while research in the information-processing field mainly uses statistical methods. These studies rarely use rule-based methods and almost never address patent texts. Marked coordinate structures are widespread in patent texts, and identifying and transforming them correctly matters greatly for improving the precision of machine translation. Based on HNC theory and a study of real patent texts, we drew up rules and a strategy for identifying marked coordinate structures and, through Chinese-English comparison, established a strategy for transforming them from Chinese to English, improving the precision of coordinate structure identification. The main work is as follows. (1) We collected sentences containing marked coordinate structures from patent texts, together with their corresponding English translations, and divided them into an analysis set and a test set. (2) We examined the internal and external grammatical and semantic constraints on coordinate structures and drew up general identification rules, taking the enumeration comma (、) as the representative connective. (3) We analyzed the conjunctions “和 (he), 与 (yu), 或 (huo), 或者 (huo zhe), 跟 (gen), 同 (tong), 并 (bing), 并且 (bing qie), 及 (ji), 以及 (yi ji), 及其 (ji qi)” and drew up rules for identifying their conceptual categories and the boundaries of the coordinate structures they mark. We then devised an identification strategy and embedded the rules into the full machine translation system, achieving automatic identification of marked coordinate structures. (4) We compared the characteristics of coordinate structures in Chinese and English, summarized the regularities in translating them, and established a transformation strategy from Chinese to English. (5) We carried out closed and open tests and set out directions for future research based on an analysis of the results.
|
Total references: | 39 |
Call number: | 硕050102/1356 |
Open-access date: | 2013-05-30 |