Thesis Information

Chinese title:

 基于Lucene的全文检索系统的研究与实现    

Name:

 Dong Hua (董华)

Discipline code:

 120502    

Discipline:

 Information Science

Student type:

 Master's

Degree:

 Master of Management

Degree year:

 2011    

Campus:

 Beijing campus

School:

 School of Management

Research area:

 Information Retrieval

Primary advisor:

 Jing Peidong (靖培栋)

Primary advisor's affiliation:

 School of Management, Beijing Normal University

Submission date:

 2011-06-08    

Defense date:

 2011-05-30    

Foreign-language title:

 RESEARCH AND IMPLEMENTATION OF THE FULL-TEXT RETRIEVAL SYSTEM BASED ON LUCENE    

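Chinese abstract:
The maturing of network and database technology lets people acquire knowledge faster and more productively than ever before. However, the flood of information resources has also left people at a loss when judging which knowledge is useful, and how to search quickly and accurately across such a vast body of resources is drawing growing attention. For this reason, information retrieval has become the core of Internet applications, and full-text retrieval technology is the core of that core. Unlike ordinary retrieval, full-text retrieval handles unstructured data, that is, data of variable length or without a fixed format. An index builder creates an index over the unstructured data, a searcher queries the index, and the results are ranked according to a weighting formula, so that the target documents are retrieved accurately. The rapid development of information retrieval technology in recent years has given rise to many commercial search engines, notably Google and Yahoo abroad and Baidu and Sogou in China. For enterprise users, and for individuals or research institutions with special requirements, however, commercial search engines fall far short in confidentiality and flexibility. This created a demand for building lightweight search engines from open-source search packages, and Lucene emerged against this background. This thesis first introduces the basic principles of full-text retrieval, then studies synonym retrieval and page ranking in full-text retrieval, and finally builds a full-text retrieval system with Lucene to verify the performance of the improved system. The main work of this thesis is as follows. First, it analyzes concept-level semantic synonym query expansion: query terms are expanded with synonyms from the Harbin Institute of Technology thesaurus 《同义词词林(扩展版)》 (Tongyici Cilin, Extended Edition) before they take part in retrieval, and the synonym query system built on Lucene can expand multiple keywords effectively. Second, on top of synonym querying, it studies Lucene's text ranking mechanism in depth; after examining Lucene's own scoring formula and taking the requirements of synonym expansion into account, it assigns different weights to the original query terms and the expansion terms, so that the most relevant results are returned to the user while recall is improved. Third, using these techniques, it designs and implements a Lucene-based full-text retrieval system that, on a local system, indexes and searches several kinds of unstructured documents, such as PDF, Word, and Excel files, quickly and accurately. The system has a simple, clear interface and supports real-time index updates. Experimental results show that the system provides users with the information they need most more accurately.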
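The abstract above describes expanding query terms with synonyms and giving the original terms and the expansion terms different weights at ranking time. The following is only a minimal sketch of that idea in plain Python, not the thesis's implementation and not Lucene's API. The synonym table `SYNONYMS`, the sample documents, the 0.5 synonym weight, and the simplified TF-IDF score are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical synonym table standing in for a real thesaurus.
SYNONYMS = {
    "search": ["retrieval", "lookup"],
    "fast": ["quick"],
}

def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()

def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def expand_query(query, synonym_weight=0.5):
    """Return {term: weight}: original terms get 1.0, synonyms a lower weight."""
    terms = tokenize(query)
    weights = {term: 1.0 for term in terms}
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            # Never downgrade a term the user typed explicitly.
            weights.setdefault(synonym, synonym_weight)
    return weights

def search(index, n_docs, query, synonym_weight=0.5):
    """Rank documents by a weighted, simplified TF-IDF score (assumed formula)."""
    scores = defaultdict(float)
    for term, weight in expand_query(query, synonym_weight).items():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = 1.0 + math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += weight * math.sqrt(tf) * idf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Illustrative documents (made up for this sketch).
docs = {
    "d1": "full text search with lucene",
    "d2": "document retrieval basics",
    "d3": "cooking recipes",
}
index = build_index(docs)
results = search(index, len(docs), "search")
```

In this sketch the query `search` also matches `d2` through the synonym `retrieval`, which raises recall, while the lower synonym weight keeps the exact match `d1` ranked first.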
Foreign-language abstract:
Network and database technology is becoming increasingly mature, which lets people gain knowledge more quickly and productively than ever before. However, when facing the vast amount of information resources, people cannot easily judge which are helpful to them. How to find information quickly and accurately has drawn increasing attention. For this reason, information retrieval has become the core of Internet applications, and full-text search technology is the core of the core. Unlike ordinary search, full-text retrieval deals with unstructured data. An index constructor creates an index of the unstructured data, a searcher queries the index, and the results are sorted according to a weighting formula, so that the target documents are retrieved accurately. In recent years, the rapid development of information retrieval technology has given birth to a large number of commercial search engines, such as Google and Baidu. But for corporate users, and for individuals or research institutions with special needs, commercial search engines are still far from meeting their requirements for confidentiality and flexibility. As a result, people have begun to build lightweight search engines with open-source search software. In this context, Lucene came into being. This thesis first introduces the basic principles of full-text retrieval and then studies synonym search and page ranking. Finally, it builds a full-text search system with Lucene to verify the improved performance. The main work is as follows:
Total number of references:

 32    

Call number:

 硕120502/1105    

Release date:

 2011-06-08    
