Java Frameworks: Lucene (Part 2)

chufengmiaon4690

Original post: https://my.oschina.net/u/4146666/blog/3076467

4. Tokenization

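Tokenization means splitting a piece of text into individual words (terms) according to a set of rules.

Lucene performs tokenization through analyzers (Analyzer). It provides different analyzers for different languages, as well as a general-purpose standard analyzer, StandardAnalyzer.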
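To see what an analyzer actually produces, the sketch below runs a string through an Analyzer and collects the resulting terms. It assumes Lucene 4.10.x, which the article's code appears to target (it uses `IndexWriterConfig(Version, Analyzer)`, `FSDirectory.open(File)` and `FloatField`); the class and method names `AnalyzerDemo` and `analyze` are made up for illustration. The consume loop (`reset()`, `incrementToken()`, `end()`, `close()`) is the standard TokenStream workflow.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class; assumes Lucene 4.10.x on the classpath
public class AnalyzerDemo {

    // Runs text through an Analyzer and returns the terms it would index
    public static List<String> analyze(Analyzer analyzer, String field, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                   // required before the first incrementToken()
            while (ts.incrementToken()) { // advance to the next token
                terms.add(term.toString());
            }
            ts.end();                     // end-of-stream bookkeeping
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            System.out.println(analyze(analyzer, "name", "Lucene in Action, 2nd Edition"));
        }
    }
}
```

StandardAnalyzer lowercases terms and drops punctuation; whether stop words such as "in" are removed depends on the Lucene version, so the exact output is not shown here. For Chinese text, StandardAnalyzer emits one term per character, which is why Chinese projects usually switch to a language-specific analyzer (for example `SmartChineseAnalyzer` from Lucene's smartcn module).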

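Tokenization is the core of full-text search: a document can only be found through the terms that tokenization produced from it.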

4.1 Index Structure

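From the figure above, we can see that:

(1) The index has two areas: the index area and the document area.

(2) The document area stores the documents. Lucene automatically assigns each document a document number, the docID.

(3) The index area stores the index. Note that:

The index is organized per field: each field has its own index, independent of the others.

The index is built according to the tokenization rules, and it is what lets Lucene find the matching documents.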
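To make the two areas concrete, here is a toy in-memory model in plain Java. It is a simplification for illustration only, not Lucene's real data structures: stored documents live in a list whose position is the docID (the document area), and each field has its own independent term → docIDs map (the index area). The `tokenize` helper is a naive stand-in for an analyzer (lowercase, split on non-alphanumeric characters), an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of an index (hypothetical, NOT Lucene's implementation)
public class ToyIndex {
    // Document area: list position == docID, assigned automatically on add
    private final List<Map<String, String>> documents = new ArrayList<>();
    // Index area: field -> (term -> docIDs); each field's map is independent
    private final Map<String, Map<String, SortedSet<Integer>>> index = new HashMap<>();

    // Naive analyzer stand-in: lowercase, split on anything not a letter or digit
    public static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Stores the document, indexes every field, and returns the new docID
    public int addDocument(Map<String, String> fields) {
        int docId = documents.size();
        documents.add(fields);
        for (Map.Entry<String, String> f : fields.entrySet()) {
            Map<String, SortedSet<Integer>> postings =
                    index.computeIfAbsent(f.getKey(), k -> new HashMap<>());
            for (String term : tokenize(f.getValue())) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Looks up a term in ONE field's index, then fetches the documents by docID
    public List<Map<String, String>> search(String field, String term) {
        SortedSet<Integer> ids = index.getOrDefault(field, Collections.emptyMap())
                .getOrDefault(term, Collections.emptySortedSet());
        List<Map<String, String>> hits = new ArrayList<>();
        for (int id : ids) {
            hits.add(documents.get(id));
        }
        return hits;
    }
}
```

Because the field maps are separate, a term such as "java" can point to one document in `name` and a different one in `description`, and a search only consults the field it names. A query term that tokenization never produced (for example "Lucene" with a capital L) finds nothing.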

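4.2 Fields (Field)

Question: We already know that Lucene tokenizes and indexes a document when the document is written. But how does Lucene know which fields to tokenize and index?

Answer: Lucene decides whether to tokenize and index each field based on that field's attributes. So we need to understand what attributes a field has.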

4.2.1 The Three Attributes

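(1) Tokenized

Lucene tokenizes a field only if its tokenized attribute is set to true.

In practice, some fields do not need tokenization, such as the product ID or the product image path.

Others must be tokenized, such as the product name and description.

(2) Indexed

Lucene builds an index for a field's terms only if its indexed attribute is set to true.

In practice, some fields do not need an index, such as the product image. Only the fields that take part in searches need to be indexed.

(3) Stored

Only if the stored attribute is set to true can the field's value be read back from the document at search time.

In practice, some fields do not need to be stored, for example the product description.

A product description is usually a large block of text, so reading it back causes considerable I/O overhead. Since the description rarely needs to be retrieved, storing it would simply waste disk space and I/O.

Therefore, large text fields that rarely need to be retrieved are usually not stored in the index.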
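The three attributes map directly onto a Lucene `FieldType`. The sketch below builds a type for the description case above (tokenized and indexed, but not stored). It assumes Lucene 4.10.x, where `FieldType` still has `setIndexed`/`indexed()` (later versions replaced them with index options); `FieldTypeDemo`, `SEARCH_ONLY` and `descriptionDoc` are hypothetical names.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;

// Assumes Lucene 4.10.x (FieldType.setIndexed exists in 4.x only)
public class FieldTypeDemo {

    // Tokenized + indexed, NOT stored: searchable, but the value
    // cannot be read back from a search hit (the "description" case)
    public static final FieldType SEARCH_ONLY = new FieldType();
    static {
        SEARCH_ONLY.setIndexed(true);   // build an inverted index for the field
        SEARCH_ONLY.setTokenized(true); // run the value through the Analyzer
        SEARCH_ONLY.setStored(false);   // do not keep the original value
        SEARCH_ONLY.freeze();           // make the type immutable
    }

    // Builds a document whose "description" field uses the custom type
    public static Document descriptionDoc(String description) {
        Document doc = new Document();
        doc.add(new Field("description", description, SEARCH_ONLY));
        return doc;
    }
}
```

This combination is what `TextField` with `Store.NO` gives you out of the box; a custom `FieldType` is only worth writing when none of the built-in field classes has the combination you need.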

4.2.2 Common Field Types

Modify the getDocument method of BookDao from Lucene (Part 1) so that each field uses an appropriate Field type:

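// Requires: java.util.ArrayList, java.util.List, and from org.apache.lucene.document:
// Document, Field, Field.Store, FloatField, StoredField, TextField
public List<Document> getDocument(List<UBook> books) {
	// One Document per book
	List<Document> docList = new ArrayList<>();
	for (UBook book : books) {
		// Create the Document for this book
		Document doc = new Document();
		// Store.NO means the value is not stored and cannot be read back from a search hit
		// Not tokenized, not indexed, stored
		Field id = new StoredField("id", book.getBookid().toString());
		// Tokenized, indexed, stored
		Field name = new TextField("name", book.getName().toString(), Store.YES);
		// Indexed as numeric terms (for range queries, not via the Analyzer), stored
		Field price = new FloatField("price", book.getPrice(), Store.YES);
		// Not tokenized, not indexed, stored
		Field pic = new StoredField("pic", book.getPic().toString());
		// Tokenized, indexed, not stored
		Field description = new TextField("description", book.getDescription().toString(), Store.NO);
		doc.add(id);
		doc.add(name);
		doc.add(price);
		doc.add(pic);
		doc.add(description);
		docList.add(doc);
	}
	return docList;
}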

This produces the following output:
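The field classes used in getDocument differ only in their three attributes. The sketch below prints them for each built-in type, which serves as a quick reference table. It assumes Lucene 4.10.x (for `FieldType.indexed()` and `FloatField`); `FieldTypeTable` and `row` are hypothetical names.

```java
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.FloatField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

// Assumes Lucene 4.10.x
public class FieldTypeTable {

    // Formats the three attributes of a FieldType as one table row
    public static String row(String label, FieldType t) {
        // FieldType starts with tokenized = true, even for stored-only types,
        // so only report tokenization for fields that are actually indexed
        boolean tokenized = t.indexed() && t.tokenized();
        return String.format("%-20s tokenized=%-5b indexed=%-5b stored=%b",
                label, tokenized, t.indexed(), t.stored());
    }

    public static void main(String[] args) {
        System.out.println(row("StringField (YES)", StringField.TYPE_STORED));
        System.out.println(row("TextField (YES)", TextField.TYPE_STORED));
        System.out.println(row("TextField (NO)", TextField.TYPE_NOT_STORED));
        System.out.println(row("FloatField (YES)", FloatField.TYPE_STORED));
        System.out.println(row("StoredField", StoredField.TYPE));
    }
}
```

`StringField` suits exact-match values such as IDs, `TextField` suits full text, and `StoredField` suits values that are only displayed. The numeric `FloatField` is indexed as trie-encoded terms produced by a numeric token stream rather than by the Analyzer, so its tokenized flag does not mean analyzed text.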

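4.3 Maintaining the Index

When the underlying data changes, the index has to change with it: documents must be added, deleted, and updated.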

4.3.1 Deleting from the Index

(1) Deleting specific documents

See the third-to-last test, testdel(), in the test class below.
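One detail worth knowing about `deleteDocuments(Term)`: the Term is not analyzed, so it must equal an indexed term exactly. The sketch below indexes two books in memory and deletes by "Lucene" and by "lucene". It assumes Lucene 4.10.x and uses `RAMDirectory` instead of the article's `C:\lucene\3` directory so it is self-contained; `DeleteByTermDemo` and `countAfterDelete` are hypothetical names.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Assumes Lucene 4.10.x; in-memory index for illustration
public class DeleteByTermDemo {

    private static Document book(String name) {
        Document doc = new Document();
        doc.add(new TextField("name", name, Store.YES)); // analyzed: indexed as lowercase terms
        return doc;
    }

    // Indexes two books, deletes by the given term, returns the live document count
    public static int countAfterDelete(String termText) throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LATEST, new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            writer.addDocument(book("Lucene in Action"));
            writer.addDocument(book("Java Basics"));
            // The Term is NOT analyzed: it must match an indexed term exactly
            writer.deleteDocuments(new Term("name", termText));
        } // close() commits the additions and the deletion
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs(); // excludes deleted documents
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countAfterDelete("Lucene")); // 2: no indexed term "Lucene"
        System.out.println(countAfterDelete("lucene")); // 1: matches the lowercased term
    }
}
```

If a deletion needs analysis (for example a user-typed phrase), `IndexWriter.deleteDocuments(Query...)` combined with a parsed query is the alternative.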

(2) Clearing the entire index

See the second-to-last test, testdelall(), in the test class below.
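`deleteAll()` removes every document, but like other changes it only becomes visible once the writer commits, and it can still be undone with `rollback()`. The sketch below shows both outcomes on an in-memory index. It assumes Lucene 4.10.x; `DeleteAllDemo` and `liveDocsAfterDeleteAll` are hypothetical names.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Assumes Lucene 4.10.x; in-memory index for illustration
public class DeleteAllDemo {

    // A fresh config per writer: Lucene does not let two writers share one config
    private static IndexWriterConfig newConfig() {
        return new IndexWriterConfig(Version.LATEST, new StandardAnalyzer());
    }

    // Commits two documents, calls deleteAll(), then commits or rolls back.
    // Returns the number of live documents a new reader sees.
    public static int liveDocsAfterDeleteAll(boolean commit) throws Exception {
        Directory dir = new RAMDirectory();
        try (IndexWriter writer = new IndexWriter(dir, newConfig())) {
            for (String name : new String[] {"Lucene in Action", "Java Basics"}) {
                Document doc = new Document();
                doc.add(new TextField("name", name, Store.YES));
                writer.addDocument(doc);
            }
        } // close() commits the two documents

        IndexWriter writer = new IndexWriter(dir, newConfig());
        writer.deleteAll();
        if (commit) {
            writer.close();    // commits the deletion
        } else {
            writer.rollback(); // discards it and closes the writer
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(liveDocsAfterDeleteAll(true));  // 0
        System.out.println(liveDocsAfterDeleteAll(false)); // 2
    }
}
```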

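4.3.2 Updating the Index

An update first deletes the documents that match a condition, then adds the new document.

See the last test, testupdate(), in the test class below.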
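Because an update is "delete matching documents, then add one", it works best with a unique key indexed as a single exact term. The sketch below uses a `StringField` id for that and checks that the index still holds one document with the new name. It assumes Lucene 4.10.x; `UpdateByIdDemo`, `book` and `updateOnce` are hypothetical names.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Assumes Lucene 4.10.x; in-memory index for illustration
public class UpdateByIdDemo {

    private static Document book(String id, String name) {
        Document doc = new Document();
        // StringField: not tokenized, so Term("id", id) matches it exactly
        doc.add(new StringField("id", id, Store.YES));
        doc.add(new TextField("name", name, Store.YES));
        return doc;
    }

    // Returns "<live doc count>|<stored name of book 1>" after one update
    public static String updateOnce() throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LATEST, new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            writer.addDocument(book("1", "Lucene in Action"));
            // Deletes every document with id:1, then adds the replacement
            writer.updateDocument(new Term("id", "1"), book("1", "Lucene in Action, 2nd Edition"));
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term("id", "1")), 10);
            String name = searcher.doc(hits.scoreDocs[0].doc).get("name");
            return reader.numDocs() + "|" + name;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(updateOnce()); // 1|Lucene in Action, 2nd Edition
    }
}
```

The replacement document replaces the old one entirely, so it must carry every field you still need. In testupdate() below, every book whose name contains "lucene" is replaced by a single document that has only a name field.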

package textpage;

import java.io.File;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

import dao.BookDao;
import pdjo.UBook;

public class Text {
	
	@Test
	public void testName() throws Exception {
		BookDao bookDao = new BookDao();
		List<UBook> books = bookDao.getAll();
		for (UBook uBook : books) {
			System.out.println("id=" + uBook.getBookid() + ", name=" + uBook.getName());
		}
	}
	// Create the index
	@Test
	public void testLucene() throws Exception {
		BookDao dao = new BookDao();
		// Analyzer that tokenizes the fields of each document
		Analyzer analyzer = new StandardAnalyzer();
		// 1) Open the index directory
		Directory directory = FSDirectory.open(new File("C:\\lucene\\3"));
		// 2) Create the IndexWriterConfig
		IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
		// 3) Create the IndexWriter; try-with-resources closes (and commits) it even on failure
		try (IndexWriter writer = new IndexWriter(directory, config)) {
			// 4) Add the documents built from the database records
			writer.addDocuments(dao.getDocument(dao.getAll()));
		}
		System.out.println("Index created successfully");
	}
	// Run a search
	@Test
	public void testLucene1() throws Exception {
		// 1. Build the Query object
		// Create the analyzer (normally the same one used at indexing time)
		Analyzer analyzer = new StandardAnalyzer();
		// Default field searched when the query string names no field
		QueryParser queryParser = new QueryParser("name", analyzer);
		// Parse the query string into a Query
		Query query = queryParser.parse("name:lucene");
		// 2. Read the index
		// 2.1 Open the index directory
		Directory directory = FSDirectory.open(new File("C:\\lucene\\3"));
		// 2.2 Open an IndexReader; try-with-resources closes it even if the search fails
		try (IndexReader reader = DirectoryReader.open(directory)) {
			// 2.3 Create the searcher over the index
			IndexSearcher searcher = new IndexSearcher(reader);
			// 3. Get the results
			// 3.1 Arguments: the query, and the maximum number of hits to return
			TopDocs topDocs = searcher.search(query, 10);
			// Top-scoring hits
			ScoreDoc[] scoreDocs = topDocs.scoreDocs;
			for (ScoreDoc scoreDoc : scoreDocs) {
				// Internal Lucene docID of the hit
				int docID = scoreDoc.doc;
				Document doc = searcher.doc(docID);
				System.out.println("docID: " + docID);
				System.out.println("id: " + doc.get("id"));
				System.out.println("name: " + doc.get("name"));
				System.out.println("price: " + doc.get("price"));
				System.out.println("pic: " + doc.get("pic"));
				// Prints null: description was indexed with Store.NO
				System.out.println("description: " + doc.get("description"));
			}
		}
	}
	
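	// Delete matching documents
	@Test
	public void testdel() throws Exception {
		// 1. Open the index directory
		Directory directory = FSDirectory.open(new File("C:\\lucene\\3"));
		// 2. Create the IndexWriterConfig
		IndexWriterConfig cfg = new IndexWriterConfig(Version.LATEST, new StandardAnalyzer());
		// 3. Create the IndexWriter (closed, and thus committed, by try-with-resources)
		try (IndexWriter writer = new IndexWriter(directory, cfg)) {
			// 4. Delete every document whose "name" field contains the term "lucene".
			// The Term is not analyzed, so it must match the indexed (lowercased) term exactly
			writer.deleteDocuments(new Term("name", "lucene"));
		}
		System.out.println("Deleted successfully");
	}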
	// Clear the entire index
	@Test
	public void testdelall() throws Exception {
		// 1. Open the index directory
		Directory directory = FSDirectory.open(new File("C:\\lucene\\3"));
		// 2. Create the IndexWriterConfig
		IndexWriterConfig cfg = new IndexWriterConfig(Version.LATEST, new StandardAnalyzer());
		// 3. Create the IndexWriter (closed, and thus committed, by try-with-resources)
		try (IndexWriter writer = new IndexWriter(directory, cfg)) {
			// 4. Delete all documents
			writer.deleteAll();
		}
		System.out.println("Index cleared successfully");
	}
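	// Update the index:
	// delete the documents matching the term, then add the new document
	@Test
	public void testupdate() throws Exception {
		// 1. Open the index directory
		Directory directory = FSDirectory.open(new File("C:\\lucene\\3"));
		// 2. Create the IndexWriterConfig
		IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, new StandardAnalyzer());
		// 3. Create the IndexWriter (closed, and thus committed, by try-with-resources)
		try (IndexWriter writer = new IndexWriter(directory, config)) {
			// 4. Build the replacement document.
			// It replaces the old documents entirely, so any field not added here is lost
			Document document = new Document();
			// Book name
			Field field = new StringField("name", "updateIndex", Store.YES);
			document.add(field);
			// Delete all documents whose name contains "lucene", then add the new document
			writer.updateDocument(new Term("name", "lucene"), document);
		}
		System.out.println("Updated successfully");
	}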
	
}
