Lucene full-text search, part 2: creating the indexer (building an IKAnalyzer analyzer and managing index directories), based on Lucene 5.5.3

Preface:

Part 1 of this series covered the basic structure of a Lucene search service. This part explains how to create an indexer, manage index directories, and use a Chinese analyzer.

It covers the standard analyzer, the IKAnalyzer analyzer, and the creation of both kinds of index directories.

Download for the Lucene 5.5.3 jar bundle: http://download.csdn.net/detail/eguid_1/9677589

Creating the indexer

Creating a Lucene indexer requires two ingredients: an analyzer and an index directory.

So let us create these two instances.


1. Creating an analyzer

(1) Creating a built-in Lucene analyzer

/**
	 * Create a built-in Lucene analyzer.
	 * @param stopWords stop words as a CharArraySet
	 * @param stopWordsReader stop words read from a Reader
	 *        (if both are null, the default analyzer is returned)
	 * @return the analyzer
	 * @throws IOException if reading the stop words fails
	 */
	public Analyzer createAnalyzer(CharArraySet stopWords, Reader stopWordsReader) throws IOException {
		StandardAnalyzer analyzer;
		if (stopWords != null) {
			analyzer = new StandardAnalyzer(stopWords);
		} else if (stopWordsReader != null) {
			analyzer = new StandardAnalyzer(stopWordsReader);
		} else {
			analyzer = new StandardAnalyzer();
		}
		return analyzer;
	}
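As a quick sanity check, the analyzer returned above can be handed a sentence and its tokens printed. This is a minimal sketch against the Lucene 5.5 TokenStream API (the field name "content" is an arbitrary placeholder, since tokenStream() requires one but does not use it here):

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerDemo {
    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new StandardAnalyzer(); // default English stop-word set
        try (TokenStream ts = analyzer.tokenStream("content", "Lucene in Action")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                       // mandatory before the first incrementToken()
            while (ts.incrementToken()) {
                System.out.println(term.toString());
            }
            ts.end();
        }
        analyzer.close();
    }
}
```

Because "in" is in StandardAnalyzer's default stop-word set and terms are lowercased, this prints "lucene" and "action".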

(2) Creating the IKAnalyzer analyzer

For the IKAnalyzer source and configuration, see "Using the IK Analyzer Chinese analyzer (patching the IK Analyzer source to support Lucene 5.5.x)".

/**
	 * Create an IKAnalyzer analyzer.
	 * @param isSmart true: smart segmentation; false: finest-grained segmentation
	 * @return the analyzer
	 */
	public Analyzer createAnalyzer(boolean isSmart) {
		return new IKAnalyzer(isSmart);
	}

2. Creating the index directory

Index directories come in two flavors: file-system directories and in-memory (RAM) directories.

(1) Creating a file-system index directory

/**
	 * Create a file-system directory.
	 * @param path the directory path
	 * @param lockFactory the file lock factory (null for the default)
	 * @return Directory the index directory
	 * @throws IOException if the path cannot be opened
	 */
	public Directory createDirectory(Path path, LockFactory lockFactory) throws IOException {
		// Open the directory
		if (lockFactory == null) {
			return FSDirectory.open(path);
		}
		return FSDirectory.open(path, lockFactory);
	}
	/**
	 * Create a file-system directory.
	 * Path format: "d:", "dir", "search" is equivalent to "d:/dir/search".
	 * @param lockFactory the file lock factory
	 * @param first the first path element
	 * @param more further path elements
	 * @return Directory the index directory
	 * @throws IOException if the path cannot be opened
	 */
	public Directory createDirectory(LockFactory lockFactory, String first, String... more) throws IOException {
		Path path = FileSystems.getDefault().getPath(first, more);
		return createDirectory(path, lockFactory);
	}
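The first/more varargs in the helper above are simply java.nio.file's path-building API; a tiny stdlib-only sketch of what getPath does with them:

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathDemo {
    public static void main(String[] args) {
        // getPath(first, more...) joins the elements with the platform separator
        Path p = FileSystems.getDefault().getPath("d:", "dir", "search");
        // Paths.get(...) is a shorthand for the same call
        Path q = Paths.get("d:", "dir", "search");
        System.out.println(p.equals(q));        // true
        System.out.println(p.endsWith("search")); // true: last path element
    }
}
```

The printed separator differs by platform ("/" on Unix, "\" on Windows), which is exactly why building paths from elements beats concatenating strings.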


(2) Creating an in-memory (RAM) index directory

/**
	 * Create an in-memory directory.
	 * @param lockFactory the lock factory (optional)
	 * @param dir an existing FSDirectory to load into memory (optional)
	 * @param context the IOContext used when copying from dir
	 * @return the RAMDirectory
	 * @throws IOException if copying the file-system directory fails
	 */
	public RAMDirectory createRAMDirectory(LockFactory lockFactory, FSDirectory dir, IOContext context)
			throws IOException {
		if (lockFactory != null) {
			return new RAMDirectory(lockFactory);
		}
		if (dir != null && context != null) {
			return new RAMDirectory(dir, context);
		}
		return new RAMDirectory();
	}


With the analyzer and the index directory in place, we can build the index configuration from these two ingredients.

3. Creating the index configuration

/**
	 * Create an index configuration from an analyzer.
	 * @param analyzer the analyzer: the default one, IKAnalyzer, or Paoding
	 * @param openMode one of three modes: OpenMode.APPEND - append; OpenMode.CREATE - create; OpenMode.CREATE_OR_APPEND - create or append
	 * @param commitOnClose whether the index is committed only when the writer is closed
	 * @return the IndexWriterConfig, or null if no analyzer was given
	 */
	public IndexWriterConfig createIndexConf(Analyzer analyzer, OpenMode openMode, boolean commitOnClose) {
		IndexWriterConfig indexConf = null;
		if (analyzer != null) {
			indexConf = new IndexWriterConfig(analyzer);
			indexConf.setOpenMode(openMode); // OpenMode.CREATE_OR_APPEND is the usual choice
			indexConf.setCommitOnClose(commitOnClose); // default true: commit when the writer closes; false: commit manually
		}
		return indexConf;
	}


With the configuration done, we can create an indexer from it.

4. Creating the indexer from the configuration and index directory

/**
	 * Create an indexer.
	 * @param dir the index directory
	 * @param indexConf the index configuration
	 * @return IndexWriter the indexer
	 * @throws IOException if the index cannot be opened or is locked
	 */
	public IndexWriter createIndex(Directory dir, IndexWriterConfig indexConf) throws IOException {
		return new IndexWriter(dir, indexConf);
	}
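Putting the pieces together, here is a minimal end-to-end sketch (names assumed for illustration) that uses the same Lucene 5.5 APIs the helpers above wrap: an in-memory RAMDirectory, a StandardAnalyzer-backed configuration, one indexed document, and a search that finds it again.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class IndexDemo {
    public static void main(String[] args) throws IOException {
        Directory dir = new RAMDirectory();                    // in-memory index directory
        IndexWriterConfig conf = new IndexWriterConfig(new StandardAnalyzer());
        conf.setOpenMode(OpenMode.CREATE_OR_APPEND);

        IndexWriter writer = new IndexWriter(dir, conf);       // the indexer
        Document doc = new Document();
        doc.add(new TextField("title", "hello lucene", Field.Store.YES));
        writer.addDocument(doc);
        writer.close();                                        // commitOnClose defaults to true

        IndexReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs hits = searcher.search(new TermQuery(new Term("title", "lucene")), 10);
        System.out.println(hits.totalHits);                    // one matching document
        reader.close();
    }
}
```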

With an indexer in hand, we can add, delete, and query entries in the index (but not modify them in place).

5. Adding, deleting, and updating with the indexer

Important: Lucene has no in-place modify API for indexed documents. Updating always means deleting the old index entry and adding a new one; IndexWriter.updateDocument(Term, Document) packages exactly that delete-then-add as a single atomic operation.

/**
	 * Add a document to the index.
	 * @param indexWriter the indexer
	 * @param doc the document to add
	 * @return true on success
	 */
	public boolean addIndex(IndexWriter indexWriter, Document doc) {
		boolean ret = true;
		if (indexWriter != null && indexWriter.isOpen() && doc != null) {
			try {
				indexWriter.addDocument(doc);
				indexWriter.commit(); // needed when commitOnClose is false; otherwise nothing is committed until the writer closes
			} catch (IOException e) {
				ret = false;
			}
		}
		return ret;
	}
	/**
	 * Delete documents matching the given terms.
	 * @param indexWriter the indexer
	 * @param terms the terms to delete by
	 * @return true on success
	 */
	public boolean removeIndex(IndexWriter indexWriter, Term... terms) {
		boolean ret = true;
		if (indexWriter != null && indexWriter.isOpen() && terms != null) {
			try {
				indexWriter.deleteDocuments(terms);
				indexWriter.commit(); // commit the deletions, matching addIndex above
			} catch (IOException e) {
				ret = false;
			}
		}
		return ret;
	}
	/**
	 * Delete the documents matched by the given queries.
	 * @param indexWriter the indexer
	 * @param querys the queries whose hits should be deleted
	 * @return true on success
	 */
	public boolean removeIndex(IndexWriter indexWriter, Query... querys) {
		boolean ret = true;
		if (indexWriter != null && indexWriter.isOpen() && querys != null) {
			try {
				indexWriter.deleteDocuments(querys);
				indexWriter.commit(); // commit the deletions, matching addIndex above
			} catch (IOException e) {
				ret = false;
			}
		}
		return ret;
	}
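The delete-then-add update can also be left to Lucene itself via IndexWriter.updateDocument(Term, Document), which atomically deletes every document matching the term and adds the new one. A sketch with assumed field names ("id" uses StringField so the key is indexed verbatim, not analyzed):

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;

public class UpdateDemo {
    public static void main(String[] args) throws IOException {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

        Document v1 = new Document();
        v1.add(new StringField("id", "42", Field.Store.YES)); // un-analyzed key field
        v1.add(new TextField("title", "old title", Field.Store.YES));
        writer.addDocument(v1);
        writer.commit();

        Document v2 = new Document();
        v2.add(new StringField("id", "42", Field.Store.YES));
        v2.add(new TextField("title", "new title", Field.Store.YES));
        // delete every document whose id term is "42", then add v2 - one atomic operation
        writer.updateDocument(new Term("id", "42"), v2);
        writer.close();

        DirectoryReader reader = DirectoryReader.open(dir);
        System.out.println(reader.numDocs()); // still one live document
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs hits = searcher.search(new TermQuery(new Term("id", "42")), 10);
        System.out.println(searcher.doc(hits.scoreDocs[0].doc).get("title")); // the replacement
        reader.close();
    }
}
```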


When the index is no longer needed, the indexer can be closed like this:

6. Closing the indexer

/**
	 * Close the indexer.
	 * @param indexWriter the indexer
	 * @param commitOnClose whether to commit on close (so that freshly built index entries are not lost)
	 * @return true: closed successfully; false: close failed
	 */
	public boolean close(IndexWriter indexWriter, boolean commitOnClose) {
		boolean ret = false;
		if (indexWriter != null && indexWriter.isOpen()) {
			try {
				if (commitOnClose) {
					indexWriter.flush();
					indexWriter.commit();
				}
				indexWriter.close();
				ret = true;
			} catch (IOException e) {
				// If the commit failed, try once more so the writer still gets closed
				try {
					indexWriter.close();
					ret = true;
				} catch (IOException e1) {
					ret = false;
				}
			}
		}
		return ret;
	}
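Since IndexWriter implements Closeable, callers on Java 7+ can also let try-with-resources do the closing; with commitOnClose left at its default of true, close() commits any pending changes itself. A minimal sketch:

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.RAMDirectory;

public class CloseDemo {
    public static void main(String[] args) throws IOException {
        RAMDirectory dir = new RAMDirectory();
        // the writer is closed (and, by default, committed) automatically at the end of the block
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new TextField("title", "auto closed", Field.Store.YES));
            writer.addDocument(doc);
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            System.out.println(reader.numDocs()); // the document survived the implicit close
        }
    }
}
```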



Next part:

Lucene full-text search, part 3: generating index fields and creating index documents (boosting documents/fields), based on Lucene 5.5.2






Reposted from: https://www.cnblogs.com/eguid/p/6821575.html
