View thesis information

Chinese title:

 字书字料库与小学专书数字化 (postdoctoral research report)

Name:

 张健    

Confidentiality level:

 Public

Thesis language:

 chi    

Discipline code:

 060300    

Discipline:

 World History

Student type:

 Postdoctoral researcher

Degree:

 Doctor of History

Degree type:

 Academic degree

Degree year:

 2024    

Campus:

 Beijing campus

School:

 School of History

Research direction:

 Ancient Chinese (language)

First supervisor's name:

 蒋重跃    

First supervisor's affiliation:

 School of History

Submission date:

 2023-12-21    

Defense date:

 2023-05-21    

Foreign-language title:

 Chinese Character Form Database of Dictionary and the Digitization of Ancient Chinese Dictionary    

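Chinese keywords:

 Character form database (字料库) ; xiaoxue works (小学专书) ; digitization of ancient books ; edition ; unencoded characters ; compilation conventions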

Foreign-language keywords:

 Chinese Character Form Database ; Chinese ancient dictionary ; Digitization of ancient books ; Version ; Uncoded Character ; Regulation of Dictionary

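Abstract (translated from the Chinese):

The collation and publication of ancient books is a long-standing tradition, while digitization is an emerging technology; their combination has given rise to the still-growing field of ancient book digitization. Counting from the Siku Quanshu full-text database (四库全书全文数据库) at the end of the last century, large-scale digitization of ancient books has been under way for roughly thirty years. In recent years scholars have increasingly recognized its importance and urgency, yet the related theoretical research has still not been pursued in depth. Many current results of ancient book digitization are in fact market-driven commercial undertakings that lack overall planning and theoretical guidance. Moreover, as large databases have been developed one after another, problems such as duplicated development, bloated and disorderly character sets, and even poor quality have gradually emerged in the industry. Ancient book digitization also faces many practical problems in urgent need of solution, which continue to hold practitioners back from further work. Among these, the digitization of xiaoxue works (小学专书) is what the industry finds most difficult.

Broadly speaking, xiaoxue works are ancient texts whose main content is xiaoxue, that is, the study of language and script. The Siku Quanshu (《四库全书》) sets up a xiaoxue category under its Classics division (经部), and the texts collected there are essentially all xiaoxue works. By content and compilation conventions, they fall into three types: character books (字书), rhyme books (韵书), and exegetical books (训诂书). Because their content is highly specialized, their conventions are complex, and they contain many rare characters, their digitization has always faced many difficulties and has long been unable to proceed on a large scale. A Character Form Database (字料库) is built on digitization, and its character material (字料) comes from ancient book data. In the digitization process, the ways and means of data collection determine whether digitization can properly supply the database with data. Digitization standards are therefore among the first questions in building a Character Form Database.

Character material needs to be annotated, and its attributes are annotated at two levels. The first is the physical attributes of the material, that is, external, formal attributes such as its source and image quality. The second is its attributes as a Chinese character: form, sound, meaning, and usage. For a Character Form Database built with xiaoxue works as its object and content, this second class of attributes comes directly from the works' explanations of their head characters. In other words, we need to split those explanations by attribute and assign the resulting pieces to the corresponding attributes of the character material. The explanations that xiaoxue works give for head characters are the explicit knowledge they contain. But what question each explanation addresses, which aspects it touches on, and whether it relates to other head characters or entries are answers that lie hidden inside the works as implicit knowledge. Fundamentally, the task of a Character Form Database of xiaoxue works is to make the implicit knowledge in those works explicit. This is at once a study of Chinese characters, an improvement of the database, and a collation of the xiaoxue works themselves.

The theory of the Character Form Database offers important guidance for collating xiaoxue works and is an important theoretical basis for textual research using information technology. The Character Form Database of Dictionaries (字书字料库) is the inevitable result and ultimate way forward for digitizing xiaoxue works from the perspective of that theory, and an important foundation for collating and studying them with information technology in the new era. Character books store rich attribute information about the form, sound, meaning, and usage of Chinese characters. Once this information is extracted and annotated by category as attributes of character material, the database's framework can both display the content of different character books horizontally and cluster the different attributes of the same character vertically. Compiling the database can deepen our understanding of the conventions of xiaoxue works, their authors' views of script, and the lines of transmission among character books. In the collation of editions of ancient books, the supporting role of information technology shows in the use of database-based dictionary attribute databases and variant-reading databases, the use of automatic collation platforms, and the integration of online databases.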
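The two annotation levels and the horizontal and vertical views described in the abstract can be pictured with a minimal sketch. It is only an illustration under stated assumptions, not the report's actual schema: the class names, field names, and the sample values (`DictA`, `gloss-1`, and so on) are all hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PhysicalAttrs:
    """Level 1: external, formal attributes of a piece of character material."""
    source: str         # the dictionary the sample was taken from (hypothetical field)
    image_quality: str  # illustrative values such as "clear" or "blurred"


@dataclass(frozen=True)
class LinguisticAttrs:
    """Level 2: attributes split out of the dictionary's explanation."""
    form: str = ""
    sound: str = ""
    meaning: str = ""
    usage: str = ""


@dataclass(frozen=True)
class CharSample:
    char: str
    physical: PhysicalAttrs
    linguistic: LinguisticAttrs


class CharCorpus:
    def __init__(self):
        self._samples = []

    def add(self, sample):
        self._samples.append(sample)

    def by_dictionary(self):
        """Horizontal view: the entries recorded for each dictionary."""
        view = defaultdict(list)
        for s in self._samples:
            view[s.physical.source].append(s)
        return dict(view)

    def attributes_of(self, char):
        """Vertical view: one character's attributes clustered across sources."""
        clustered = defaultdict(dict)
        for s in self._samples:
            if s.char != char:
                continue
            for name, value in asdict(s.linguistic).items():
                if value:  # skip attributes the explanation does not cover
                    clustered[name][s.physical.source] = value
        return dict(clustered)


# Placeholder data only; no real dictionary content is quoted here.
corpus = CharCorpus()
corpus.add(CharSample("X", PhysicalAttrs("DictA", "clear"),
                      LinguisticAttrs(sound="reading-1", meaning="gloss-1")))
corpus.add(CharSample("X", PhysicalAttrs("DictB", "blurred"),
                      LinguisticAttrs(meaning="gloss-2")))
corpus.add(CharSample("Y", PhysicalAttrs("DictA", "clear"),
                      LinguisticAttrs(form="structure-1")))
```

Here `corpus.by_dictionary()` groups entries per source, while `corpus.attributes_of("X")` gathers what each source says about the same character, attribute by attribute.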

Foreign-language abstract:

The digitization of ancient books combines two things: the collation and publication of ancient books, and digital technology. It has been about thirty years since the Sikuquanshu Full Text Database was developed at the end of the last century. The importance and urgency of digitization have recently been recognized, but the related theoretical research has not been carried out in depth. Lacking overall planning and theoretical direction, the digitization of ancient books is dominated by commercial companies. As more and more databases are developed, problems such as repeated development and miscellaneous fonts keep arising. These problems hamper efforts to expand the application of such databases.

Chinese ancient dictionaries are ancient books whose main content is knowledge of linguistics and grammatology. This is a traditional classification dating from Sikuquanshu, which set up the Jingbu-xiaoxue category to hold such books. They can be divided into three kinds: dictionaries of characters, of pronunciation, and of meaning. Chinese ancient dictionaries are difficult to digitize because of their high degree of specialization and their complex regulations.

The theory of the Chinese Character Form Database provides important guidance for the collation of Chinese ancient dictionaries and is also an important theoretical basis for literature research using information technology. From the perspective of this theory, the Chinese Character Form Database of Dictionary is the inevitable result and final outlet of the digitization of Chinese ancient dictionaries. It is also an important basis for sorting out and studying Chinese ancient dictionaries by means of information technology in the new era. Chinese ancient dictionaries store rich attribute information about the shape, sound, meaning, and other aspects of Chinese characters. This information can be extracted, classified, and labeled as character material attributes. Using the framework of the Chinese Character Form Database, the contents of different Chinese ancient dictionaries can be displayed horizontally and the different attributes of the same character can be clustered vertically. In fact, the process of compiling the Chinese Character Form Database of Dictionary is also a process of researching the Chinese ancient dictionaries themselves.

Total number of references:

 64    

Holding location:

 Library thesis reading area (Main Library, South Wing, 3rd floor, Area BC)

Call number:

 博060300/24012    

Open-access date:

 2024-12-20    
