Lucene (Part 1): Hello World

ClarkKentYang

Article link: https://blog.csdn.net/clarkkentyang/article/details/74015307

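Definition: Lucene is a top-level Apache project and a full-text search toolkit. It is a library, not a standalone application: you can build a full-text search engine on top of it, but it cannot run on its own.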

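Typical uses:

1. Internet-wide full-text search engines

2. Site-local full-text search

3. Speeding up queries that would otherwise hit the database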
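
To make the idea of a full-text index more concrete, here is a minimal sketch of an inverted index in plain Java. It is an illustration only, not how Lucene is implemented: the class `TinyInvertedIndex` and its methods are hypothetical names, and the "analyzer" is just a lowercase-and-split step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: maps each lowercase word to the ids of the documents containing it. */
final class TinyInvertedIndex {
    private final List<String> docs = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Stores the text, indexes every word in it, and returns the new document id. */
    int add(String text) {
        int docId = docs.size();
        docs.add(text);
        // Naive "analyzer": lowercase, then split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Looks the word up in the postings map instead of scanning every document. */
    Set<Integer> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    /** Returns the stored text for a document id. */
    String document(int docId) {
        return docs.get(docId);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("Apache Lucene is a search library");
        index.add("Thinking in Java");
        index.add("Lucene is written in Java");
        System.out.println(index.search("lucene")); // [0, 2]
        System.out.println(index.search("java"));   // [1, 2]
    }
}
```

The point of the structure is that a search is a single map lookup by word, which is why an index of this kind can answer keyword queries faster than scanning every row or file.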

Creating the index:

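	// Imports used by this test method (Lucene 4.10.3, Commons IO, JUnit 4)
	import java.io.File;
	import java.util.ArrayList;
	import java.util.List;
	import org.apache.commons.io.FileUtils;
	import org.apache.lucene.analysis.Analyzer;
	import org.apache.lucene.analysis.standard.StandardAnalyzer;
	import org.apache.lucene.document.Document;
	import org.apache.lucene.document.Field.Store;
	import org.apache.lucene.document.LongField;
	import org.apache.lucene.document.TextField;
	import org.apache.lucene.index.IndexWriter;
	import org.apache.lucene.index.IndexWriterConfig;
	import org.apache.lucene.store.Directory;
	import org.apache.lucene.store.FSDirectory;
	import org.apache.lucene.util.Version;
	import org.junit.Test;

	@Test
	public void testIndexCreate() throws Exception {
		// List of documents, one per source file
		List<Document> docList = new ArrayList<>();

		// Directory that holds the files to index
		File dir = new File("path to the source files");
		for (File file : dir.listFiles()) {
			// Skip subdirectories; only regular files can be read as text
			if (!file.isFile()) {
				continue;
			}
			// File name
			String fileName = file.getName();
			// File content
			String fileContext = FileUtils.readFileToString(file);
			// File size in bytes
			long fileSize = FileUtils.sizeOf(file);

			// Collect the data from the file system into a Lucene document
			Document document = new Document();

			/*
			 * First argument: field name
			 * Second argument: field value
			 * Third argument: whether the value is stored in the index
			 */
			TextField nameField = new TextField("fileName", fileName, Store.YES);
			TextField contextField = new TextField("fileContext", fileContext, Store.YES);
			// Indexed as a number (not as text) so that it can be used in numeric range queries
			LongField sizeField = new LongField("fileSize", fileSize, Store.YES);

			// Add the fields to the document
			document.add(nameField);
			document.add(contextField);
			document.add(sizeField);

			// Add the document to the list
			docList.add(document);
		}

		// Create the analyzer (tokenizer)
		Analyzer analyzer = new StandardAnalyzer();
		// Directory where the index and the stored documents are written
		Directory directory = FSDirectory.open(new File("path to the index directory"));
		// Configuration for the index writer
		IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
		// Writer for the index and the stored documents
		IndexWriter indexWriter = new IndexWriter(directory, config);

		// Add every document to the index
		for (Document document : docList) {
			indexWriter.addDocument(document);
		}
		// Commit and release the writer
		indexWriter.commit();
		indexWriter.close();
	}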

Searching the index:

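	// Imports used by this test method (Lucene 4.10.3, JUnit 4)
	import java.io.File;
	import org.apache.lucene.analysis.Analyzer;
	import org.apache.lucene.analysis.standard.StandardAnalyzer;
	import org.apache.lucene.document.Document;
	import org.apache.lucene.index.DirectoryReader;
	import org.apache.lucene.index.IndexReader;
	import org.apache.lucene.queryparser.classic.QueryParser;
	import org.apache.lucene.search.IndexSearcher;
	import org.apache.lucene.search.Query;
	import org.apache.lucene.search.ScoreDoc;
	import org.apache.lucene.search.TopDocs;
	import org.apache.lucene.store.Directory;
	import org.apache.lucene.store.FSDirectory;
	import org.junit.Test;

	@Test
	public void testIndexSearch() throws Exception {
		// Create the analyzer; it should match the one used at index time
		Analyzer analyzer = new StandardAnalyzer();
		// Directory that holds the index and the stored documents
		Directory directory = FSDirectory.open(new File("G:\\luceneTest"));
		// Open a reader (DirectoryReader.open replaces the deprecated IndexReader.open)
		IndexReader indexReader = DirectoryReader.open(directory);
		// Create the searcher
		IndexSearcher indexSearcher = new IndexSearcher(indexReader);
		// Query parser: first argument is the default search field, second is the analyzer
		QueryParser queryParser = new QueryParser("fileContext", analyzer);
		// Query syntax is field:keyword; an explicit field overrides the default field
		Query query = queryParser.parse("fileName:apache");
		// Search: first argument is the query, second is the maximum number of hits to return
		TopDocs topDocs = indexSearcher.search(query, 10);
		System.out.println("Total hits: " + topDocs.totalHits);
		// Iterate over the result set
		for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
			// Get the document id
			int docId = scoreDoc.doc;
			// Load the stored fields from disk by document id
			Document document = indexReader.document(docId);
			System.out.println("fileName:" + document.get("fileName") + ",fileSize:" + document.get("fileSize"));
		}
		indexReader.close();
	}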

Deleting from the index:

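	// Imports used by this test method (Lucene 4.10.3, IK Analyzer, JUnit 4)
	import java.io.File;
	import org.apache.lucene.analysis.Analyzer;
	import org.apache.lucene.index.IndexWriter;
	import org.apache.lucene.index.IndexWriterConfig;
	import org.apache.lucene.index.Term;
	import org.apache.lucene.store.Directory;
	import org.apache.lucene.store.FSDirectory;
	import org.apache.lucene.util.Version;
	import org.junit.Test;
	import org.wltea.analyzer.lucene.IKAnalyzer;

	@Test
	public void testIndexDelete() throws Exception {
		// Create the analyzer: IKAnalyzer handles Chinese text, StandardAnalyzer is the general-purpose one
		Analyzer analyzer = new IKAnalyzer();
		// Directory that holds the index and the stored documents
		Directory directory = FSDirectory.open(new File("G:\\luceneTest"));
		// Configuration for the index writer
		IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
		// Writer for the index and the stored documents
		IndexWriter indexWriter = new IndexWriter(directory, config);

		// Delete everything:
		// indexWriter.deleteAll();
		// A Term is a (field name, token) pair; every document whose field contains this exact token is deleted.
		// The term text is not run through the analyzer.
		indexWriter.deleteDocuments(new Term("fileName", "apache"));
		indexWriter.commit();
		indexWriter.close();
	}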

Updating the index:

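	// Imports used by this test method (Lucene 4.10.3, IK Analyzer, JUnit 4)
	import java.io.File;
	import org.apache.lucene.analysis.Analyzer;
	import org.apache.lucene.document.Document;
	import org.apache.lucene.document.Field.Store;
	import org.apache.lucene.document.LongField;
	import org.apache.lucene.document.TextField;
	import org.apache.lucene.index.IndexWriter;
	import org.apache.lucene.index.IndexWriterConfig;
	import org.apache.lucene.index.Term;
	import org.apache.lucene.store.Directory;
	import org.apache.lucene.store.FSDirectory;
	import org.apache.lucene.util.Version;
	import org.junit.Test;
	import org.wltea.analyzer.lucene.IKAnalyzer;

	@Test
	public void testIndexUpdate() throws Exception {
		// Create the analyzer: IKAnalyzer handles Chinese text, StandardAnalyzer is the general-purpose one
		Analyzer analyzer = new IKAnalyzer();
		// Directory that holds the index and the stored documents
		Directory directory = FSDirectory.open(new File("G:\\luceneTest"));
		// Configuration for the index writer
		IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
		// Writer for the index and the stored documents
		IndexWriter indexWriter = new IndexWriter(directory, config);

		// Update = delete every document that contains the term, then add the new document
		Term term = new Term("fileName", "web");
		Document document = new Document();
		document.add(new TextField("fileName", "xxx", Store.YES));
		document.add(new TextField("fileContext", "think in java xxx", Store.YES));
		document.add(new LongField("fileSize", 100L, Store.YES));

		indexWriter.updateDocument(term, document);
		indexWriter.commit();
		indexWriter.close();
	}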
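
The delete-then-add behaviour of an update can be sketched without Lucene. The following plain-Java model is an illustration under simplifying assumptions: `UpdateSketch` and its methods are hypothetical names, and field values are compared as whole strings, whereas Lucene matches a term against the individual tokens indexed for the field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy document store showing that "update" means "delete all matches, then add one document". */
final class UpdateSketch {
    // Each document is a map from field name to value.
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Builds a document with the two text fields used in this post. */
    static Map<String, String> doc(String fileName, String fileContext) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("fileName", fileName);
        d.put("fileContext", fileContext);
        return d;
    }

    void add(Map<String, String> doc) {
        docs.add(doc);
    }

    /** Removes every document whose field equals the value; returns how many were removed. */
    int delete(String field, String value) {
        int before = docs.size();
        docs.removeIf(d -> value.equals(d.get(field)));
        return before - docs.size();
    }

    /** Deletes all matching documents, then adds the replacement, even if nothing matched. */
    void update(String field, String value, Map<String, String> replacement) {
        delete(field, value);
        add(replacement);
    }

    int size() {
        return docs.size();
    }

    List<String> fileNames() {
        List<String> names = new ArrayList<>();
        for (Map<String, String> d : docs) {
            names.add(d.get("fileName"));
        }
        return names;
    }

    public static void main(String[] args) {
        UpdateSketch index = new UpdateSketch();
        index.add(doc("web", "a"));
        index.add(doc("web", "b"));
        index.add(doc("apache", "c"));

        // Two documents match "web": both are removed and one replacement is added.
        index.update("fileName", "web", doc("xxx", "think in java xxx"));
        System.out.println(index.fileNames()); // [apache, xxx]

        // Nothing matches "missing": the update degrades to a plain add.
        index.update("fileName", "missing", doc("yyy", "new"));
        System.out.println(index.fileNames()); // [apache, xxx, yyy]
    }
}
```

Two consequences are worth keeping in mind: an update can shrink the index when the term matches several documents, and it still adds the new document when the term matches none.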

Query classes:

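TermQuery: searches for a single term (text fields only).

QueryParser: parses a query string of the form field:keyword and lets you set a default search field. Recommended. (Text fields only.)

NumericRangeQuery: searches a numeric range.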

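BooleanQuery: a combined query. Clauses can be joined with AND, OR and NOT conditions, and can span multiple fields.

MUST corresponds to AND: the clause has to match.

SHOULD corresponds to OR: the clause may match.

MUST_NOT corresponds to NOT: the clause must not match.

Note: a BooleanQuery that contains only MUST_NOT clauses returns nothing, so MUST_NOT is only useful in combination with MUST or SHOULD.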

MatchAllDocsQuery: matches every document.

MultiFieldQueryParser: searches several fields at once; a document matches as long as the keyword appears in any of those fields.
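
The MUST / SHOULD / MUST_NOT rules described for BooleanQuery can be modelled with plain set operations. This is a simplified sketch, not Lucene code: `BooleanClauseSketch`, `combine` and `ids` are hypothetical names, each set stands for the document ids matched by one sub-query, and scoring is ignored.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of MUST / SHOULD / MUST_NOT clauses over sets of matching document ids. */
final class BooleanClauseSketch {

    static Set<Integer> ids(Integer... values) {
        return new TreeSet<>(Arrays.asList(values));
    }

    static Set<Integer> combine(List<Set<Integer>> must,
                                List<Set<Integer>> should,
                                List<Set<Integer>> mustNot) {
        Set<Integer> result = new TreeSet<>();
        if (!must.isEmpty()) {
            // MUST behaves like AND: intersect all MUST sets.
            // SHOULD clauses are optional here (in this model they do not change the result).
            result.addAll(must.get(0));
            for (Set<Integer> s : must) {
                result.retainAll(s);
            }
        } else {
            // Without MUST clauses, SHOULD behaves like OR: union of all SHOULD sets.
            for (Set<Integer> s : should) {
                result.addAll(s);
            }
        }
        // MUST_NOT only removes documents that were already selected by MUST or SHOULD.
        for (Set<Integer> s : mustNot) {
            result.removeAll(s);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> apache = ids(1, 2, 3); // documents matching "apache"
        Set<Integer> java = ids(2, 3, 4);   // documents matching "java"
        Set<Integer> web = ids(3);          // documents matching "web"

        // apache AND java NOT web
        System.out.println(combine(Arrays.asList(apache, java),
                Collections.emptyList(), Arrays.asList(web))); // [2]
        // apache OR java
        System.out.println(combine(Collections.emptyList(),
                Arrays.asList(apache, java), Collections.emptyList())); // [1, 2, 3, 4]
        // NOT web on its own: nothing was selected, so nothing is left
        System.out.println(combine(Collections.emptyList(),
                Collections.emptyList(), Arrays.asList(web))); // []
    }
}
```

The last call shows why a query made up only of MUST_NOT clauses is pointless: there is no positive clause to select documents from, so the exclusion has nothing to act on.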