Using Lucene:
Full-text indexing tools are generally made up of three parts:
1. Indexing
2. Analysis (tokenization)
3. Searching
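To make the three parts concrete, here is a minimal JDK-only sketch of an inverted index, the data structure behind full-text search. It is not Lucene's implementation: the class `ToyInvertedIndex` and its crude analyzer (lowercase, then split on anything that is not an ASCII letter or digit) are hypothetical and only mirror the roles of analysis, indexing and searching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy, NOT Lucene: shows analysis, indexing and searching.
class ToyInvertedIndex {
    // term -> sorted set of ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Analysis: lowercase, then split on non-alphanumeric characters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexing: record every term of the document under the document's id.
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Searching: look the lowercased query term up in the postings.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }
}
```

For example, after add(0, "I like football,I like you") and add(1, "I like basketball"), search("like") returns [0, 1] and search("football") returns [0].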
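The steps for developing with Lucene are listed below. Don't worry if they are unclear at first; a rough picture of the steps is enough for now. A small demo is included (the code targets Lucene 3.5, hence Version.LUCENE_35).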
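1. Create a Java project.
2. Import the required jar files: the Lucene core jar is mandatory, and JUnit should be added as well so the code can be run as unit tests.
3. Implement the indexing part:
a. Create a Directory
b. Create an IndexWriter
c. Create a Document object
d. Add Fields to the Document
e. Add the Document to the index through the IndexWriter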
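// Imports for Index_Util (Lucene 3.5 API); the fields and methods below belong to class Index_Util
import java.io.File;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

// Mock data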
private String[] ids = { "1", "2", "3", "4", "5", "6" };
private String[] emails = { "aa@itat.org", "bb@itat.org", "cc@cc.org",
"dd@sina.org", "ee@zttc.edu.org", "ff@itat.org" };
private String[] content = { "welecome to wisited the space I like dog",
"Hello boy,do you like me", "my name is cc,I like music",
"I like football,I like you", "I like basketball",
"I like movie and swimming" };
private int[] attachs = { 2, 3, 1, 4, 5, 5 };
private String[] names = { "zhangsan", "lisi", "john", "jetty", "mike",
"jake" };
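private Directory directory = null;
// dates and scores are used by index() but were not declared in the original
// snippet; the values filled in below are hypothetical placeholders.
private Date[] dates = new Date[ids.length];
private Map<String, Float> scores = new HashMap<String, Float>();
public Index_Util() {
try {
directory = FSDirectory.open(new File("F:/Lucene/index02"));
} catch (IOException e) {
e.printStackTrace();
}
// Hypothetical sample dates: one day apart, counting back from now
long now = System.currentTimeMillis();
for (int i = 0; i < dates.length; i++) {
dates[i] = new Date(now - i * 24L * 60 * 60 * 1000);
}
// Hypothetical boosts per e-mail domain; other domains fall back to 0.5f
scores.put("itat.org", 2.0f);
scores.put("zttc.edu.org", 1.5f);
}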
public void index() {
Document doc = null;
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_35,
new StandardAnalyzer(Version.LUCENE_35));
IndexWriter writer = null;
try {
writer = new IndexWriter(directory, iwc);
writer.deleteAll();
// Add one document per mock record
for (int i = 0; i < ids.length; i++) {
doc = new Document();
doc.add(new Field("ids", ids[i], Store.YES,
Index.NOT_ANALYZED_NO_NORMS));
doc.add(new Field("emails", emails[i], Store.YES,
Index.NOT_ANALYZED));
doc.add(new Field("content", content[i], Store.NO,
Index.ANALYZED));
doc.add(new Field("names", names[i], Store.YES,
Index.NOT_ANALYZED_NO_NORMS));
// Numeric fields: attachs is stored only (index=false); date is stored and indexed
doc.add(new NumericField("attachs", Store.YES, false)
.setIntValue(attachs[i]));
doc.add(new NumericField("date", Store.YES, true)
.setLongValue(dates[i].getTime()));
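// Take the e-mail domain (the text after the last '@') to choose a boost
String et = emails[i].substring(emails[i].lastIndexOf("@") + 1);
// If the scores map contains this domain, use its boost; otherwise use 0.5f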
if (scores.containsKey(et)) {
doc.setBoost(scores.get(et));
} else {
doc.setBoost(0.5f);
}
writer.addDocument(doc);
}
} catch (CorruptIndexException e) {
e.printStackTrace();
} catch (LockObtainFailedException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
// Close the writer
if (writer != null) {
try {
writer.close();
} catch (CorruptIndexException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
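The boost selection inside the loop can be pulled out and checked without Lucene. The helper `EmailBoost.boostFor` below is hypothetical, and the map contents used with it are placeholders, since the original does not show how `scores` is filled.

```java
import java.util.Map;

// Hypothetical helper mirroring the boost logic in index(): the e-mail
// domain after the last '@' selects a boost from the map, with 0.5f as
// the fallback for unlisted domains.
class EmailBoost {
    static float boostFor(String email, Map<String, Float> scores) {
        // lastIndexOf returns -1 when there is no '@', so the whole string is used
        String suffix = email.substring(email.lastIndexOf('@') + 1);
        Float boost = scores.get(suffix);
        return boost != null ? boost : 0.5f;
    }
}
```

Note that in Lucene 3.x a document boost is folded into the norms of each field at index time, so fields indexed with a NO_NORMS option (ids and names here) do not carry it.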
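Field parameters:
Field.Store.YES or NO (storage option):
YES stores the field's complete original value in the index files, so it can be restored;
NO does not store the value, but the field can still be indexed; the original content cannot be restored and doc.get returns null for it;
Field.Index (indexing option):
Index.ANALYZED: analyze (tokenize) and index the value; suitable for titles, body text and the like;
Index.NOT_ANALYZED: index the value as a single term, without analysis;
Index.ANALYZED_NO_NORMS: analyze, but do not store norms; norms hold the index-time boost and the field-length normalization factor used in scoring;
Index.NOT_ANALYZED_NO_NORMS: neither analyze nor store norms; suitable for exact-match values such as ID card numbers and names, where boosting has no meaning;
Index.NO: do not index the field.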
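The practical difference between the Index options is which terms end up in the index. A minimal sketch with a hypothetical enum `IndexOption` that mirrors the five options; its analysis step only lowercases and splits on non-alphanumeric characters, whereas the StandardAnalyzer used in the demo applies more elaborate tokenization rules and also drops English stop words.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical mirror of Lucene 3.x Field.Index, for illustration only.
enum IndexOption {
    ANALYZED(true, true, false),
    NOT_ANALYZED(true, false, false),
    ANALYZED_NO_NORMS(true, true, true),
    NOT_ANALYZED_NO_NORMS(true, false, true),
    NO(false, false, true); // norms are irrelevant when nothing is indexed

    final boolean indexed;
    final boolean analyzed;
    final boolean omitNorms; // does not change the terms, only scoring data

    IndexOption(boolean indexed, boolean analyzed, boolean omitNorms) {
        this.indexed = indexed;
        this.analyzed = analyzed;
        this.omitNorms = omitNorms;
    }

    // Terms this option would put into the index for one field value.
    List<String> terms(String value) {
        if (!indexed) {
            return Collections.emptyList();
        }
        if (!analyzed) {
            return Collections.singletonList(value); // the whole value, unchanged
        }
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }
}
```

This is why TermQuery(new Term("content", "like")) in the search code can match: content is ANALYZED, so "like" exists as its own term. A TermQuery on emails, by contrast, has to give the full address, such as "aa@itat.org".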
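Creating documents and adding them to the index can be understood as follows:
a document corresponds to a record (row) in a database table;
a field corresponds to a column of that table;
first create the document, then add the fields to it.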
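The row/column analogy can be pictured with plain Java collections: each map is one document (row) and each key a field name (column). `DocumentTable.build` is a hypothetical helper, not a Lucene API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "document = row, field = column".
class DocumentTable {
    static List<Map<String, String>> build(String[] ids, String[] names, String[] emails) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            Map<String, String> doc = new LinkedHashMap<>(); // create the document first...
            doc.put("ids", ids[i]);                           // ...then add its fields
            doc.put("names", names[i]);
            doc.put("emails", emails[i]);
            rows.add(doc);
        }
        return rows;
    }
}
```

The analogy is loose: unlike table rows, Lucene documents do not need to share the same set of fields, and the same field name may be added to one document more than once.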
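Storing file content:
The commons-io library (commons-io.jar) can be used: FileUtils reads the file through an input stream and returns it as a String, which can then be added to a field.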
String content = FileUtils.readFileToString(file);
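If commons-io is not on the classpath, the JDK can do the same job. A minimal sketch, assuming the files are UTF-8 encoded; `FileContent.read` is a hypothetical helper. Making the charset explicit avoids relying on the platform default, which is also why newer commons-io releases deprecate readFileToString(file) in favour of the overload that takes a charset.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical JDK-only alternative to FileUtils.readFileToString(file).
class FileContent {
    static String read(Path file) throws IOException {
        // Assumption: the input files are UTF-8 encoded
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```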
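4. Implement the search part:
a. Create a Directory
b. Create an IndexReader
c. Create an IndexSearcher from the IndexReader
d. Create the Query
e. Search with the searcher, which returns TopDocs
f. Get the ScoreDoc objects from the TopDocs
g. Get each concrete Document through the searcher and the ScoreDoc
h. Read the needed values from the Document
i. Close the reader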
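Steps e to h boil down to: score the matching documents, keep the best n as (document number, score) pairs, then fetch each document by its number. The JDK-only toy below imitates the shape of TopDocs and ScoreDoc; `ToyTopDocs` and its scoring by raw term frequency are hypothetical simplifications, not Lucene's scoring formula. Like TermQuery, it does not analyze the query term, so the caller passes it in lowercase.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical toy imitating TopDocs/ScoreDoc; scoring is raw term frequency.
class ToyTopDocs {
    // Plays the role of a ScoreDoc: document number plus score.
    static final class Hit {
        final int doc;
        final int score;
        Hit(int doc, int score) { this.doc = doc; this.score = score; }
    }

    static List<Hit> search(String[] contents, String term, int n) {
        List<Hit> hits = new ArrayList<>();
        for (int doc = 0; doc < contents.length; doc++) {
            int tf = 0; // how often the term occurs in this document
            for (String t : contents[doc].toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (t.equals(term)) {
                    tf++;
                }
            }
            if (tf > 0) {
                hits.add(new Hit(doc, tf));
            }
        }
        // Highest score first; ties broken by the smaller document number.
        hits.sort(Comparator.comparingInt((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.doc));
        return hits.subList(0, Math.min(n, hits.size())); // keep the top n
    }
}
```

With the six content strings from the demo, search(content, "like", 2) returns document 3 (score 2, since it contains "like" twice) followed by document 0.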
public void search() {
IndexReader reader = null;
try {
reader = IndexReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
// Use a TermQuery for an exact term match
TermQuery termQuery = new TermQuery(new Term("content", "like"));
TopDocs scores = searcher.search(termQuery, 10);
for (ScoreDoc sd : scores.scoreDocs) {
Document doc = searcher.doc(sd.doc);
// Read the stored date (epoch milliseconds, as a string) and format it
String dt = doc.get("date");
SimpleDateFormat date = new SimpleDateFormat("yyyy-MM-dd");
String dt2 = date.format(new Date(Long.parseLong(dt)));
System.out.println("DocNo---->" + sd.doc + " name---->"
+ doc.get("names") + " email----->"
+ doc.get("emails") + " attachs---->"
+ doc.get("attachs")+" date---->"+dt2);
}
} catch (CorruptIndexException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (reader != null) {
try {
reader.close();
} catch (CorruptIndexException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
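The date handling in search() works because the "date" NumericField was filled with dates[i].getTime(), so the stored value reads back as epoch milliseconds in string form. Below is a hypothetical helper `StoredDate.format` that isolates that conversion. It takes the time zone as a parameter because search() silently uses the JVM default zone, and it creates a new SimpleDateFormat per call because SimpleDateFormat is not thread-safe.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper: stored epoch-millisecond string -> "yyyy-MM-dd".
class StoredDate {
    static String format(String storedMillis, TimeZone zone) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        fmt.setTimeZone(zone); // explicit zone instead of the JVM default
        return fmt.format(new Date(Long.parseLong(storedMillis)));
    }
}
```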