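I've recently been working on the search module of a project and picked up some Lucene along the way, so here are my notes before I forget…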
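Lucene is Apache's search library. The jars can be downloaded from the Apache website; pick the ones we need and add them to the project. For the code below (Lucene 6.x), that is at least lucene-core and lucene-queryparser, plus the IK Analyzer jar.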
First, create a simple class for the articles to be indexed. The fields we want to index are title and content:
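```java
public class Article {
    private int id;
    private String title;
    private String content;
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }
}
```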
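Next, add articles to the index, with title and content indexed as tokenized, searchable text, and then search both fields by keyword: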
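```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
// IKAnalyzer6x is the IKAnalyzer rewritten for Lucene 6 (see the note below); import it from your own package

public class ArticleIndexSearcher {
    private static final String path = ""; // where the index is stored on disk

    public Directory openDirectory() throws IOException {
        return FSDirectory.open(Paths.get(path));
    }

    public IndexWriter getWriter(Directory dir) throws IOException {
        // Analyzer used when adding documents to the index: IKAnalyzer
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new IKAnalyzer6x());
        indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
        return new IndexWriter(dir, indexWriterConfig);
    }

    public void addIndex(Article article) {
        // try-with-resources closes the writer; an unclosed writer keeps the index write lock,
        // so the next call to addIndex would fail to open a new writer
        try (Directory dir = openDirectory(); IndexWriter indexWriter = getWriter(dir)) {
            indexWriter.addDocument(documentFrom(article));
            indexWriter.commit();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private Document documentFrom(Article article) {
        Document doc = new Document();
        // id: stored and indexed as a single, untokenized term
        doc.add(new StringField("id", String.valueOf(article.getId()), Field.Store.YES));
        // title and content: tokenized by the analyzer, indexed and stored.
        // A hand-built FieldType with only setStored/setTokenized leaves its IndexOptions at NONE,
        // so the field would be stored but not searchable; TextField sets this up correctly.
        doc.add(new TextField("title", article.getTitle(), Field.Store.YES));
        doc.add(new TextField("content", article.getContent(), Field.Store.YES));
        return doc;
    }

    public List<Article> doSearch(String keyWord) {
        List<Article> articles = new ArrayList<>();
        // Close the reader (and directory) when done
        try (Directory dir = openDirectory(); DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher indexSearcher = new IndexSearcher(reader);
            // Parse the keyword against title and content with the same analyzer used for indexing
            Query titleQuery = new QueryParser("title", new IKAnalyzer6x()).parse(keyWord);
            Query contentQuery = new QueryParser("content", new IKAnalyzer6x()).parse(keyWord);
            // Two SHOULD clauses: a document matches if either field matches
            BooleanQuery.Builder booleanQuery = new BooleanQuery.Builder();
            booleanQuery.add(titleQuery, BooleanClause.Occur.SHOULD).add(contentQuery, BooleanClause.Occur.SHOULD);
            TopDocs topDocs = indexSearcher.search(booleanQuery.build(), 100); // return the top 100 hits
            System.out.println("Total hits: " + topDocs.totalHits);
            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                Document doc = indexSearcher.doc(scoreDoc.doc);
                articles.add(parseDocument(doc));
            }
        } catch (IOException | ParseException e) {
            e.printStackTrace();
        }
        return articles;
    }

    private Article parseDocument(Document doc) {
        Article article = new Article();
        article.setId(Integer.parseInt(doc.get("id")));
        article.setTitle(doc.get("title"));
        article.setContent(doc.get("content"));
        return article;
    }
}
```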
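To make the search logic a bit more concrete, here is a toy, in-memory model of what the index and the two SHOULD clauses do. It is not how Lucene stores data: `ToyFieldIndex` and its lowercase-and-split "analyzer" are hypothetical stand-ins, there is no scoring, and nothing is persisted. It keeps one inverted index (term → article ids) per field, and a search returns every article where any query term appears in any of the given fields, which mirrors QueryParser's default OR operator combined with the two SHOULD clauses.

```java
import java.util.*;

// Toy model only: one inverted index per field, OR across terms and fields, no scoring.
class ToyFieldIndex {
    // field name -> (term -> ids of the articles containing that term in this field)
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // Hypothetical stand-in for an analyzer: lowercase, then split on anything that is not a letter or digit
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Roughly what addDocument does for one tokenized field: record which terms occur in which article
    void add(int id, String field, String text) {
        Map<String, Set<Integer>> fieldPostings = postings.computeIfAbsent(field, f -> new HashMap<>());
        for (String term : analyze(text)) {
            fieldPostings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
    }

    // An article matches if any analyzed query term occurs in any of the given fields
    Set<Integer> search(String keyWord, String... fields) {
        Set<Integer> hits = new TreeSet<>();
        for (String field : fields) {
            Map<String, Set<Integer>> fieldPostings = postings.getOrDefault(field, Collections.emptyMap());
            for (String term : analyze(keyWord)) {
                hits.addAll(fieldPostings.getOrDefault(term, Collections.emptySet()));
            }
        }
        return hits;
    }
}
```

Real Lucene additionally scores each hit, so an article that matches in both title and content generally ranks above one that matches in only one field; the toy only answers "matches or not". If you would rather not build the BooleanQuery by hand, Lucene's `MultiFieldQueryParser` can parse one keyword string against several fields at once.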
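When building the index, keep in mind that the analyzer used at indexing time must be the same one used later when searching. Here that is IKAnalyzer. IKAnalyzer does not work with Lucene 6 out of the box (its last release was written against the older Analyzer API, whose createComponents signature changed in Lucene 5), so the IKAnalyzer class has to be rewritten; that rewritten class is the IKAnalyzer6x used above. For how to do it, see this post: http://blog.sina.com.cn/s/blog_69a69e1a0102w8br.html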
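Why the analyzers have to match can be shown with a tiny standalone example. Both "analyzers" here are hypothetical lambdas, not Lucene classes: `LOWERCASING` lowercases and splits on non-letters, while `WHITESPACE` only splits on spaces and keeps the original case. A document is "indexed" with one analyzer and the query is analyzed with another; a hit only happens if the query produces a term that actually exists in the index.

```java
import java.util.*;
import java.util.function.Function;

// Toy demo: the terms a query produces must exist among the terms the index stored.
class AnalyzerMismatchDemo {
    // Hypothetical index-time analyzer: lowercase, then split on anything that is not a letter or digit
    static final Function<String, List<String>> LOWERCASING = text -> {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    };

    // Hypothetical different analyzer: split on whitespace only, keep the original case
    static final Function<String, List<String>> WHITESPACE = text -> {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    };

    // Index the document with one analyzer, analyze the query with another,
    // and report whether any query term is found among the indexed terms
    static boolean matches(String document, Function<String, List<String>> indexAnalyzer,
                           String query, Function<String, List<String>> queryAnalyzer) {
        Set<String> indexedTerms = new HashSet<>(indexAnalyzer.apply(document));
        for (String term : queryAnalyzer.apply(query)) {
            if (indexedTerms.contains(term)) return true;
        }
        return false;
    }
}
```

Indexing "Getting started with Lucene" with `LOWERCASING` stores the term "lucene", so the query "Lucene" matches when it goes through the same analyzer but misses when it goes through `WHITESPACE`, which leaves it as "Lucene". With Chinese text the mismatch is usually about segmentation rather than case: if the index holds IKAnalyzer's dictionary words but the query is cut differently (for example into single characters), the terms never line up.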