Lucene (Part 1)

 

What is Lucene

Lucene is an open-source full-text search engine toolkit. It is not a complete full-text search engine but rather a framework for building one: it provides a complete query engine and indexing engine, plus partial text-analysis support (for English and German, two Western languages).

Indexing and Search Flow

1. Green denotes the indexing process: build an index library from the original content to be searched. The indexing process is:

           Determine the original content (what will be searched) → acquire documents → create documents → analyze documents → index documents

2. Red denotes the search process: search the content from the index library. The search process is:

           The user enters a query through the search UI → create the query → execute the search against the index library → render the search results

Collecting Data

Acquiring the original information to be searched from the Internet, databases, file systems, and so on is called data collection. The purpose of collecting data is to index the original content.

Typical data sources:

1. Web pages on the Internet: use a crawler to fetch pages and save them locally as HTML files.

2. Data in a database: connect to the database and read the table data directly.

3. Files in a file system: read the file contents through I/O operations.

Software that collects information from the Internet is commonly called a crawler or spider (also a web robot). A crawler visits web pages and stores the content it fetches.

Lucene does not provide data-collection libraries. You can write your own crawler, or use open-source software such as:

Solr (http://lucene.apache.org/solr): an Apache subproject that supports extracting source data from relational databases and XML documents.

Nutch (http://lucene.apache.org/nutch): an Apache subproject that includes a large-scale crawler able to fetch and parse web-site data.

jsoup (http://jsoup.org/): a Java HTML parser that can parse a URL or raw HTML text directly. It offers a very convenient API for extracting and manipulating data via DOM traversal, CSS selectors, and jQuery-style methods.

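As a hedged illustration of the jsoup approach (the URL is only a placeholder), a minimal sketch that fetches a page and prints its title and links:

// A minimal jsoup sketch: fetch a page and extract its title and links.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupDemo {
	public static void main(String[] args) throws Exception {
		// Fetch and parse the page (placeholder URL)
		Document doc = Jsoup.connect("http://example.com/").get();
		// Page title
		System.out.println(doc.title());
		// Every hyperlink, selected with a CSS selector
		Elements links = doc.select("a[href]");
		for (Element link : links) {
			System.out.println(link.attr("abs:href") + " : " + link.text());
		}
	}
}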

The sample data used below is a book table in MySQL (5.1.72-community):

/* MySQL - 5.1.72-community */
/*!40101 SET NAMES utf8 */;

create table `book` (
	`id` int (11),
	`name` varchar (192),
	`price` float ,
	`pic` varchar (96),
	`description` text 
); 
insert into `book` (`id`, `name`, `price`, `pic`, `description`) values('1','java 编程思想','71.5','23488292934.jpg','作者简介  Bruce Eckel,是MindView公司的总裁,该公司向客户提供软件咨询和培训。他是C++标准委员会拥有表决权的成员之一,拥有应用物理学学士和计算机工程硕士学位。除本书外,他还是《C++编程思想》的作者,并与人合著了《C++编程思想第2卷》。\r\n\r\n《计算机科学丛书:Java编程思想(第4版)》赢得了全球程序员的广泛赞誉,即使是最晦涩的概念,在BruceEckel的文字亲和力和小而直接的编程示例面前也会化解于无形。从Java的基础语法到最高级特性(深入的面向对象概念、多线程、自动项目构建、单元测试和调试等),本书都能逐步指导你轻松掌握。\r\n  从《计算机科学丛书:Java编程思想(第4版)》获得的各项大奖以及来自世界各地的读者评论中,不难看出这是一本经典之作。本书的作者拥有多年教学经验,对C、C++以及Java语言都有独到、深入的见解,以通俗易懂及小而直接的示例解释了一个个晦涩抽象的概念。本书共22章,包括操作符、控制执行流程、访问权限控制、复用类、多态、接口、通过异常处理错误、字符串、泛型、数组、容器深入研究、JavaI/O系统、枚举类型、并发以及图形化用户界面等内容。这些丰富的内容,包含了Java语言基础语法以及高级特性,适合各个层次的Java程序员阅读,同时也是高等院校讲授面向对象程序设计语言以及Java语言的绝佳教材和参考书。\r\n  《计算机科学丛书:Java编程思想(第4版)》特点:\r\n  适合初学者与专业人员的经典的面向对象叙述方式,为更新的JavaSE5/6增加了新的示例和章节。\r\n  测验框架显示程序输出。\r\n  设计模式贯穿于众多示例中:适配器、桥接器、职责链、命令、装饰器、外观、工厂方法、享元、点名、数据传输对象、空对象、代理、单例、状态、策略、模板方法以及访问者。\r\n  为数据传输引入了XML,为用户界面引入了SWT和Flash。\r\n  重新撰写了有关并发的章节,有助于读者掌握线程的相关知识。\r\n  专门为第4版以及JavaSE5/6重写了700多个编译文件中的500多个程序。\r\n  支持网站包含了所有源代码、带注解的解决方案指南、网络日志以及多媒体学习资料。\r\n  覆盖了所有基础知识,同时论述了高级特性。\r\n  详细地阐述了面向对象原理。\r\n  在线可获得Java讲座CD,其中包含BruceEckel的全部多媒体讲座。\r\n  在网站上可以观看现场讲座、咨询和评论。\r\n  专门为第4版以及JavaSE5/6重写了700多个编译文件中的500多个程序。\r\n  支持网站包含了所有源代码、带注解的解决方案指南、网络日志以及多媒体学习资料。\r\n  覆盖了所有基础知识,同时论述了高级特性。\r\n  详细地阐述了面向对象原理。\r\n\r\n\r\n');
insert into `book` (`id`, `name`, `price`, `pic`, `description`) values('2','apache lucene','66.0','77373773737.jpg','lucene是apache的开源项目,是一个全文检索的工具包。\r\n# Apache Lucene README file\r\n\r\n## Introduction\r\n\r\nLucene is a Java full-text search engine.  Lucene is not a complete\r\napplication, but rather a code library and API that can easily be used\r\nto add search capabilities to applications.\r\n\r\n * The Lucene web site is at: http://lucene.apache.org/\r\n * Please join the Lucene-User mailing list by sending a message to:\r\n   java-user-subscribe@lucene.apache.org\r\n\r\n## Files in a binary distribution\r\n\r\nFiles are organized by module, for example in core/:\r\n\r\n* `core/lucene-core-XX.jar`:\r\n  The compiled core Lucene library.\r\n\r\nTo review the documentation, read the main documentation page, located at:\r\n`docs/index.html`\r\n\r\nTo build Lucene or its documentation for a source distribution, see BUILD.txt');
insert into `book` (`id`, `name`, `price`, `pic`, `description`) values('3','mybatis','55.0','88272828282.jpg','MyBatis介绍\r\n\r\nMyBatis 本是apache的一个开源项目iBatis, 2010年这个项目由apache software foundation 迁移到了google code,并且改名为MyBatis。 \r\nMyBatis是一个优秀的持久层框架,它对jdbc的操作数据库的过程进行封装,使开发者只需要关注 SQL 本身,而不需要花费精力去处理例如注册驱动、创建connection、创建statement、手动设置参数、结果集检索等jdbc繁杂的过程代码。\r\nMybatis通过xml或注解的方式将要执行的statement配置起来,并通过java对象和statement中的sql进行映射生成最终执行的sql语句,最后由mybatis框架执行sql并将结果映射成java对象并返回。\r\n');
insert into `book` (`id`, `name`, `price`, `pic`, `description`) values('4','spring','56.0','83938383222.jpg','## Spring Framework\r\nspringmvc.txt\r\nThe Spring Framework provides a comprehensive programming and configuration model for modern\r\nJava-based enterprise applications - on any kind of deployment platform. A key element of Spring is\r\ninfrastructural support at the application level: Spring focuses on the \"plumbing\" of enterprise\r\napplications so that teams can focus on application-level business logic, without unnecessary ties\r\nto specific deployment environments.\r\n\r\nThe framework also serves as the foundation for\r\n[Spring Integration](https://github.com/SpringSource/spring-integration),\r\n[Spring Batch](https://github.com/SpringSource/spring-batch) and the rest of the Spring\r\n[family of projects](http://springsource.org/projects). Browse the repositories under the\r\n[SpringSource organization](https://github.com/SpringSource) on GitHub for a full list.\r\n\r\n[.NET](https://github.com/SpringSource/spring-net) and\r\n[Python](https://github.com/SpringSource/spring-python) variants are available as well.\r\n\r\n## Downloading artifacts\r\nInstructions on\r\n[downloading Spring artifacts](https://github.com/SpringSource/spring-framework/wiki/Downloading-Spring-artifacts)\r\nvia Maven and other build systems are available via the project wiki.\r\n\r\n## Documentation\r\nSee the current [Javadoc](http://static.springsource.org/spring-framework/docs/current/api)\r\nand [Reference docs](http://static.springsource.org/spring-framework/docs/current/reference).\r\n\r\n## Getting support\r\nCheck out the [Spring forums](http://forum.springsource.org) and the\r\n[Spring tag](http://stackoverflow.com/questions/tagged/spring) on StackOverflow.\r\n[Commercial support](http://springsource.com/support/springsupport) is available too.\r\n\r\n## Issue Tracking\r\nSpring\'s JIRA issue tracker can be found [here](http://jira.springsource.org/browse/SPR). Think\r\nyou\'ve found a bug? Please consider submitting a reproduction project via the\r\n[spring-framework-issues](https://github.com/springsource/spring-framework-issues) repository. The\r\n[readme](https://github.com/springsource/spring-framework-issues#readme) provides simple\r\nstep-by-step instructions.\r\n\r\n## Building from source\r\nInstructions on\r\n[building Spring from source](https://github.com/SpringSource/spring-framework/wiki/Building-from-source)\r\nare available via the project wiki.\r\n\r\n## Contributing\r\n[Pull requests](http://help.github.com/send-pull-requests) are welcome; you\'ll be asked to sign our\r\ncontributor license agreement ([CLA](https://support.springsource.com/spring_committer_signup)).\r\nTrivial changes like typo fixes are especially appreciated (just\r\n[fork and edit!](https://github.com/blog/844-forking-with-the-edit-button)). For larger changes,\r\nplease search through JIRA for similiar issues, creating a new one if necessary, and discuss your\r\nideas with the Spring team.\r\n\r\n## Staying in touch\r\nFollow [@springframework](http://twitter.com/springframework) and its\r\n[team members](http://twitter.com/springframework/team/members) on Twitter. In-depth articles can be\r\nfound at the SpringSource [team blog](http://blog.springsource.org), and releases are announced via\r\nour [news feed](http://www.springsource.org/news-events).\r\n\r\n## License\r\nThe Spring Framework is released under version 2.0 of the\r\n[Apache License](http://www.apache.org/licenses/LICENSE-2.0).\r\n');
insert into `book` (`id`, `name`, `price`, `pic`, `description`) values('5','solr','78.0','99999229292.jpg','solr是一个全文检索服务\r\n# Licensed to the Apache Software Foundation (ASF) under one or more\r\n# contributor license agreements.  See the NOTICE file distributed with\r\n# this work for additional information regarding copyright ownership.\r\n# The ASF licenses this file to You under the Apache License, Version 2.0\r\n# (the \"License\"); you may not use this file except in compliance with\r\n# the License.  You may obtain a copy of the License at\r\n#\r\n#     http://www.apache.org/licenses/LICENSE-2.0\r\n#\r\n# Unless required by applicable law or agreed to in writing, software\r\n# distributed under the License is distributed on an \"AS IS\" BASIS,\r\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\r\n# See the License for the specific language governing permissions and\r\n# limitations under the License.\r\n\r\n\r\nWelcome to the Apache Solr project!\r\n-----------------------------------\r\n\r\nSolr is the popular, blazing fast open source enterprise search platform\r\nfrom the Apache Lucene project.\r\n\r\nFor a complete description of the Solr project, team composition, source\r\ncode repositories, and other details, please see the Solr web site at\r\nhttp://lucene.apache.org/solr\r\n\r\n\r\nGetting Started\r\n---------------\r\n\r\nSee the \"example\" directory for an example Solr setup.  A tutorial\r\nusing the example setup can be found at\r\n   http://lucene.apache.org/solr/tutorial.html\r\nor linked from \"docs/index.html\" in a binary distribution.\r\nAlso, there are Solr clients for many programming languages, see \r\n   http://wiki.apache.org/solr/IntegratingSolr\r\n\r\n\r\nFiles included in an Apache Solr binary distribution\r\n----------------------------------------------------\r\n\r\nexample/\r\n  A self-contained example Solr instance, complete with a sample\r\n  configuration, documents to index, and the Jetty Servlet container.\r\n  Please see example/README.txt for information about running this\r\n  example.\r\n\r\ndist/solr-XX.war\r\n  The Apache Solr Application.  Deploy this WAR file to any servlet\r\n  container to run Apache Solr.\r\n\r\ndist/solr-<component>-XX.jar\r\n  The Apache Solr libraries.  To compile Apache Solr Plugins,\r\n  one or more of these will be required.  The core library is\r\n  required at a minimum. (see http://wiki.apache.org/solr/SolrPlugins\r\n  for more information).\r\n\r\ndocs/index.html\r\n  The Apache Solr Javadoc API documentation and Tutorial\r\n\r\n\r\nInstructions for Building Apache Solr from Source\r\n-------------------------------------------------\r\n\r\n1. Download the Java SE 7 JDK (Java Development Kit) or later from http://java.sun.com/\r\n   You will need the JDK installed, and the $JAVA_HOME/bin (Windows: %JAVA_HOME%\\bin) \r\n   folder included on your command path. To test this, issue a \"java -version\" command \r\n   from your shell (command prompt) and verify that the Java version is 1.7 or later.\r\n\r\n2. Download the Apache Ant binary distribution (1.8.2+) from \r\n   http://ant.apache.org/  You will need Ant installed and the $ANT_HOME/bin (Windows: \r\n   %ANT_HOME%\\bin) folder included on your command path. To test this, issue a \r\n   \"ant -version\" command from your shell (command prompt) and verify that Ant is \r\n   available. 
\r\n\r\n   You will also need to install Apache Ivy binary distribution (2.2.0) from \r\n   http://ant.apache.org/ivy/ and place ivy-2.2.0.jar file in ~/.ant/lib -- if you skip \r\n   this step, the Solr build system will offer to do it for you.\r\n\r\n3. Download the Apache Solr distribution, linked from the above web site. \r\n   Unzip the distribution to a folder of your choice, e.g. C:\\solr or ~/solr\r\n   Alternately, you can obtain a copy of the latest Apache Solr source code\r\n   directly from the Subversion repository:\r\n\r\n     http://lucene.apache.org/solr/versioncontrol.html\r\n\r\n4. Navigate to the \"solr\" folder and issue an \"ant\" command to see the available options\r\n   for building, testing, and packaging Solr.\r\n  \r\n   NOTE: \r\n   To see Solr in action, you may want to use the \"ant example\" command to build\r\n   and package Solr into the example/webapps directory. See also example/README.txt.\r\n\r\n\r\nExport control\r\n-------------------------------------------------\r\nThis distribution includes cryptographic software.  The country in\r\nwhich you currently reside may have restrictions on the import,\r\npossession, use, and/or re-export to another country, of\r\nencryption software.  BEFORE using any encryption software, please\r\ncheck your country\'s laws, regulations and policies concerning the\r\nimport, possession, or use, and re-export of encryption software, to\r\nsee if this is permitted.  See <http://www.wassenaar.org/> for more\r\ninformation.\r\n\r\nThe U.S. Government Department of Commerce, Bureau of Industry and\r\nSecurity (BIS), has classified this software as Export Commodity\r\nControl Number (ECCN) 5D002.C.1, which includes information security\r\nsoftware using or performing cryptographic functions with asymmetric\r\nalgorithms.  The form and manner of this Apache Software Foundation\r\ndistribution makes it eligible for export under the License Exception\r\nENC Technology Software Unrestricted (TSU) exception (see the BIS\r\nExport Administration Regulations, Section 740.13) for both object\r\ncode and source code.\r\n\r\nThe following provides more details on the included cryptographic\r\nsoftware:\r\n    Apache Solr uses the Apache Tika which uses the Bouncy Castle generic encryption libraries for\r\n    extracting text content and metadata from encrypted PDF files.\r\n    See http://www.bouncycastle.org/ for more details on Bouncy Castle.\r\n');

Implementing the Indexing Flow in Code

/*
 * Create the Java bean
 */
public class Book {
	// Book ID
	private Integer id;
	// Book name
	private String name;
	// Book price
	private Float price;
	// Book picture
	private String pic;
	// Book description
	private String desc;

	// getters and setters omitted
}
/*
 * Create the DAO interface
 */
public interface BookDao {

	/**
	 * Query all book records
	 * 
	 * @return list of books
	 */
	List<Book> queryBookList();
}
/*
 * Create the DAO implementation (plain JDBC)
 */
public class BookDaoImpl implements BookDao {

	@Override
	public List<Book> queryBookList() {
		// Database connection
		Connection connection = null;
		// Precompiled statement
		PreparedStatement preparedStatement = null;
		// Result set
		ResultSet resultSet = null;
		// Book list
		List<Book> list = new ArrayList<Book>();

		try {
			// Load the JDBC driver
			Class.forName("com.mysql.jdbc.Driver");
			// Connect to the database
			connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/solrTest", "root", "root");

			// SQL statement
			String sql = "SELECT * FROM book";
			// Create the PreparedStatement
			preparedStatement = connection.prepareStatement(sql);
			// Execute the query
			resultSet = preparedStatement.executeQuery();
			// Map each row to a Book
			while (resultSet.next()) {
				Book book = new Book();
				book.setId(resultSet.getInt("id"));
				book.setName(resultSet.getString("name"));
				book.setPrice(resultSet.getFloat("price"));
				book.setPic(resultSet.getString("pic"));
				// Note: the column in the book table is named "description"
				book.setDesc(resultSet.getString("description"));
				list.add(book);
			}
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			// Release JDBC resources
			try {
				if (resultSet != null) resultSet.close();
				if (preparedStatement != null) preparedStatement.close();
				if (connection != null) connection.close();
			} catch (Exception e) {
				e.printStackTrace();
			}
		}

		return list;
	}
}
/*
 * Indexing flow:
 * 1. Collect the data
 * 2. Create the Document objects
 * 3. Create the Analyzer (tokenizer)
 * 4. Create the Directory object declaring where the index is stored
 * 5. Create the IndexWriterConfig holding the write configuration
 * 6. Create the IndexWriter
 * 7. Write the Documents into the index
 * 8. Release resources
 */
public class CreateIndexTest {
	@Test
	public void testCreateIndex() throws Exception {
		// 1. Collect the data
		BookDao bookDao = new BookDaoImpl();
		List<Book> bookList = bookDao.queryBookList();

		// 2. Create the Document objects
		List<Document> documents = new ArrayList<>();
		for (Book book : bookList) {
			Document document = new Document();

			// Add Fields to the Document

			// Book id
			// Store.YES: the value is stored in the document store (retrievable from results)
			// not tokenized, not indexed, stored
			document.add(new StoredField("id", book.getId().toString()));
			// Book name
			// tokenized, indexed, stored
			document.add(new TextField("name", book.getName(), Store.YES));
			// Book price
			// numerically encoded, indexed, stored
			document.add(new FloatField("price", book.getPrice(), Store.YES));
			// Book picture path
			// not tokenized, not indexed, stored
			document.add(new StoredField("pic", book.getPic()));
			// Book description
			// tokenized, indexed, not stored
			document.add(new TextField("desc", book.getDesc(), Store.NO));

			// Add the Document to the list
			documents.add(document);
		}

		// 3. Create the Analyzer used to analyze (tokenize) the documents
		Analyzer analyzer = new StandardAnalyzer();

		// 4. Create the Directory object declaring where the index is stored
		Directory directory = FSDirectory.open(new File("D:/lucene/index"));

		// 5. Create the IndexWriterConfig holding the write configuration
		IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);

		// 6. Create the IndexWriter
		IndexWriter indexWriter = new IndexWriter(directory, config);

		// 7. Write to the index: add each Document through the IndexWriter
		for (Document doc : documents) {
			indexWriter.addDocument(doc);
		}

		// 8. Release resources
		indexWriter.close();
	}
}

Viewing the Index with Luke

Luke (http://www.getopt.org/luke/) is a GUI tool for Lucene indexes: it lets you browse, query, and modify the index files through a graphical interface.

Common Field Types
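
As a rough reference (assuming the Lucene 4.10.x API used in this tutorial; the isbn value is only an illustrative placeholder), the commonly used Field classes differ in whether the value is tokenized, indexed, and stored:

// Common Field types in Lucene 4.10.x (sketch).
// StringField: not tokenized, indexed as a single term, optionally stored - ids, codes, exact-match values
document.add(new StringField("isbn", "978-7-111-21382-6", Store.YES));
// TextField: tokenized, indexed, optionally stored - names, descriptions, body text
document.add(new TextField("name", "java 编程思想", Store.YES));
// IntField / LongField / FloatField / DoubleField: numeric, indexed for range queries, optionally stored
document.add(new FloatField("price", 71.5f, Store.YES));
// StoredField: not tokenized, not indexed, stored only - values needed only in results, such as a picture path
document.add(new StoredField("pic", "23488292934.jpg"));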

Search-Time Analysis

The keywords the user enters are analyzed (tokenized) as well; in general, indexing and searching should use the same analyzer.

The IndexSearcher search methods are as follows:

indexSearcher.search(query, n)

Search by Query and return the n highest-scoring hits.

indexSearcher.search(query, filter, n)

Search by Query with a filter applied, returning the n highest-scoring hits.

indexSearcher.search(query, n, sort)

Search by Query with a sort applied, returning the top n hits.

indexSearcher.search(booleanQuery, filter, n, sort)

Search by Query with both a filter and a sort applied, returning the top n hits.
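
For example, a minimal sketch of the sort and filter variants (assuming the Lucene 4.10.x API and the searcher and query variables from the example below; the price bounds are arbitrary):

// Top 10 hits sorted by price, descending (price was indexed as a FloatField above).
Sort sort = new Sort(new SortField("price", SortField.Type.FLOAT, true));
TopDocs sorted = searcher.search(query, 10, sort);

// The same search, additionally filtered to prices between 50.0 and 80.0 inclusive.
Filter filter = NumericRangeFilter.newFloatRange("price", 50f, 80f, true, true);
TopDocs sortedAndFiltered = searcher.search(query, filter, 10, sort);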

public class SearchIndexTest {
	@Test
	public void testSearchIndex() throws Exception {
		// 1. Create the Query object
		// Create the analyzer
		Analyzer analyzer = new StandardAnalyzer();
		// Create the query parser: first argument is the default Field, second is the analyzer
		QueryParser queryParser = new QueryParser("desc", analyzer);

		// Create the Query
		Query query = queryParser.parse("desc:java AND lucene");

		// 2. Create the Directory object declaring where the index lives
		Directory directory = FSDirectory.open(new File("D:/lucene/index"));

		// 3. Create the IndexReader
		IndexReader reader = DirectoryReader.open(directory);

		// 4. Create the IndexSearcher
		IndexSearcher searcher = new IndexSearcher(reader);

		// 5. Execute the search with the IndexSearcher; it returns a TopDocs result set
		// First argument: the query; second argument: how many of the top-scoring hits to return
		TopDocs topDocs = searcher.search(query, 10);
		System.out.println("Total number of matching documents: " + topDocs.totalHits);
		// Get the result set
		ScoreDoc[] docs = topDocs.scoreDocs;

		// 6. Process the result set
		for (ScoreDoc scoreDoc : docs) {
			// Fetch the document
			int docID = scoreDoc.doc;
			Document doc = searcher.doc(docID);

			System.out.println("=============================");
			System.out.println("docID:" + docID);
			System.out.println("bookId:" + doc.get("id"));
			System.out.println("name:" + doc.get("name"));
			System.out.println("price:" + doc.get("price"));
			System.out.println("pic:" + doc.get("pic"));
			// System.out.println("desc:" + doc.get("desc")); // desc is indexed but not stored
		}
		// 7. Release resources
		reader.close();
	}
}

Analyzers

Before the content in a Document is indexed, it must be analyzed (tokenized). Analysis exists to make the content searchable, and its main steps are tokenization followed by filtering.

Tokenization: the collected data is stored in the Fields of a Document; tokenization splits each Field value into individual terms.

Filtering: includes removing punctuation, removing stop words (的, 是, a, an, the, ...), lower-casing, and stemming/lemmatization (plural to singular, past tense to present tense, and so on).
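
To see what an analyzer actually produces, here is a minimal sketch (assuming the Lucene 4.10.x TokenStream API) that prints the terms emitted for a piece of text:

// Print the terms an Analyzer produces for a given text (Lucene 4.10.x sketch).
Analyzer analyzer = new StandardAnalyzer();
TokenStream tokenStream = analyzer.tokenStream("desc", "Lucene is a Java full-text search engine");
// Holds the term text of the current token
CharTermAttribute termAttr = tokenStream.addAttribute(CharTermAttribute.class);
tokenStream.reset();
while (tokenStream.incrementToken()) {
	System.out.println(termAttr.toString());
}
tokenStream.end();
tokenStream.close();
analyzer.close();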

Third-Party Chinese Analyzers

The analyzers bundled with Lucene do not handle Chinese well enough for real projects, so a third-party analyzer such as IK Analyzer is used. The latest release (hosted at https://code.google.com/p/ik-analyzer/) supports Lucene 4.10. Since version 1.0 was released in December 2006, IK Analyzer has gone through four major versions. It started as a Chinese word-segmentation component built around the open-source Lucene project, combining dictionary-based segmentation with grammar analysis; from version 3.0 onward it became a general-purpose Java analysis component, independent of Lucene, while still shipping a default optimized implementation for Lucene.

<?xml version="1.0" encoding="UTF-8"?>
<!-- IKAnalyzer.cfg.xml configuration file -->
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extended configuration</comment>
	<!-- users can configure their own extension dictionary here -->
	<entry key="ext_dict">ext.dic;</entry>

	<!-- users can configure their own extension stop-word dictionary here -->
	<entry key="ext_stopwords">stopword.dic;</entry>

</properties>
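
With the IK Analyzer jar and IKAnalyzer.cfg.xml (plus ext.dic and stopword.dic) on the classpath, the analyzer simply replaces StandardAnalyzer where the index is built. A minimal sketch, assuming the IK Analyzer 2012FF release for Lucene 4.x:

// Swap IK Analyzer in when creating the IndexWriter (sketch).
// true enables "smart" (coarser-grained) segmentation; false gives the finest-grained split.
Analyzer analyzer = new IKAnalyzer(true);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
IndexWriter indexWriter = new IndexWriter(FSDirectory.open(new File("D:/lucene/index")), config);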

Index Maintenance

Deleting from the Index

@Test
public void testIndexDelete() throws Exception {
	// Create the Directory object
	Directory directory = FSDirectory.open(new File("d:/lucene/index"));
	// No analyzer is needed for deletes, so null is passed here
	IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, null);
	// Create the IndexWriter
	IndexWriter indexWriter = new IndexWriter(directory, config);

	// Delete by Term: every document whose name field contains the term "java"
	indexWriter.deleteDocuments(new Term("name", "java"));
	// Delete the entire index (use with care)
	// indexWriter.deleteAll();

	// Release resources
	indexWriter.close();
}

Updating the Index

@Test
public void testIndexUpdate() throws Exception {
	// Create the analyzer
	Analyzer analyzer = new IKAnalyzer();
	// Create the Directory object
	Directory directory = FSDirectory.open(new File("d:/lucene/index"));
	IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
	// Create the IndexWriter
	IndexWriter indexWriter = new IndexWriter(directory, config);

	// Create the replacement Document
	Document document = new Document();
	document.add(new TextField("id", "1002", Store.YES));
	document.add(new TextField("name", "lucene测试test 002", Store.YES));

	// updateDocument deletes every Document matching the Term and then adds the new one
	indexWriter.updateDocument(new Term("name", "test"), document);

	// Release resources
	indexWriter.close();
}

Querying the Index

To search, you create a Query object for the information you want to find; Lucene turns the Query into its final query form. Much like SQL in a relational database, Lucene has its own query syntax: for example, "name:lucene" matches documents whose name field contains the term "lucene".

A Query can be created in two ways:

1) Using the Query subclasses Lucene provides

Query is an abstract class, and Lucene ships many concrete query types, such as TermQuery for exact term matching and NumericRangeQuery for numeric range queries.

	Query query = new TermQuery(new Term("name", "lucene"));
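
Similarly, a minimal NumericRangeQuery sketch (assuming the Lucene 4.10.x API and the price field indexed above; the bounds are arbitrary):

	// Books whose price lies between 50.0 and 80.0, inclusive at both ends
	Query priceQuery = NumericRangeQuery.newFloatRange("price", 50f, 80f, true, true);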

2) Using QueryParser to parse a query expression

QueryParser parses a user-entered query expression into a Query instance.

	QueryParser queryParser = new QueryParser("name", new IKAnalyzer());
	Query query = queryParser.parse("name:lucene");

Searching with Query Subclasses

TermQuery is a term query. It does not use an analyzer; the search keyword is matched exactly against the terms in a Field, which suits values such as order numbers or category IDs, much like an exact-match WHERE clause in SQL (e.g. WHERE name = 'lucene').

// Creating the query:
@Test
public void testSearchTermQuery() throws Exception {
	// Create the TermQuery
	Query query = new TermQuery(new Term("name", "lucene"));

	doSearch(query);
}

// Shared search logic:
private void doSearch(Query query) throws IOException {
	// 2. Execute the search and return the result set
	// Create the Directory object
	Directory directory = FSDirectory.open(new File("D:/lucene/index"));

	// Create the IndexReader
	IndexReader reader = DirectoryReader.open(directory);

	// Create the IndexSearcher
	IndexSearcher searcher = new IndexSearcher(reader);

	// Execute the search; it returns a TopDocs result set
	// First argument: the query; second argument: how many of the top-scoring hits to return
	TopDocs topDocs = searcher.search(query, 10);

	System.out.println("Total number of matching documents: " + topDocs.totalHits);

	// Get the result set
	ScoreDoc[] docs = topDocs.scoreDocs;

	// Process the result set
	for (ScoreDoc scoreDoc : docs) {
		// Get the document id
		int docID = scoreDoc.doc;
		Document doc = searcher.doc(docID);

		System.out.println("======================================");

		System.out.println("docID:" + docID);
		System.out.println("bookId:" + doc.get("id"));
		System.out.println("name:" + doc.get("name"));
		System.out.println("price:" + doc.get("price"));
		System.out.println("pic:" + doc.get("pic"));
		// System.out.println("desc:" + doc.get("desc")); // desc is not stored
	}
	// 3. Release resources
	reader.close();
}