I have recently been learning Lucene. The official release is now at version 5.0; the project site is http://lucene.apache.org/
From the Lucene website:
The Apache Lucene™ project develops open-source search software, including:
1. Lucene Core, our flagship sub-project, provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.
2. Solr, is a high performance search server built using Lucene Core, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface.
3. Open Relevance Project, is a subproject with the aim of collecting and distributing free materials for relevance testing and performance.
4. PyLucene, is a Python port of the Core project.
Lucene Core is the most central of these.
In other words, Lucene Core provides Java-based indexing and search, plus spell checking, hit highlighting and advanced analysis/tokenization. It is not a complete application but a library: it provides a capability, a technology, one solution for adding search to Java programs.
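I won't describe Lucene's internals in detail here; Baidu Baike has an introduction.
Next, I'll walk through a simple text-file search demo that I designed and built with Lucene. It covers the following parts:
- Environment setup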
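The JAR files used are in the attachment. Judging from the classes used below, they include at least lucene-core, lucene-analyzers-common, lucene-analyzers-smartcn (for SmartChineseAnalyzer) and lucene-queryparser (for QueryParser), all version 5.x.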
- Main approach
1. Collect the files to be indexed
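File[] files = files2Index.listFiles(new FilenameFilter()
{
    @Override
    public boolean accept(File dir, String name)
    {
        // Match the ".txt" extension, not just any name ending in "txt"
        return name.toLowerCase().endsWith(".txt");
    }
});
// files is null if files2Index does not exist or is not a directory; check before looping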
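The filter is easy to get subtly wrong: endsWith("txt") also accepts a name such as "notatxt", and listFiles returns null when the directory is missing. Below is a minimal standalone sketch of a more defensive version. The class TxtFiles and its method list are hypothetical helpers, not part of the original demo, and the sketch additionally skips subdirectories.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Locale;

// Hypothetical helper: lists the .txt files directly inside a directory
public class TxtFiles
{
    public static File[] list(File dir)
    {
        File[] files = dir.listFiles(new FileFilter()
        {
            @Override
            public boolean accept(File f)
            {
                // Regular files whose extension is .txt, in any letter case
                return f.isFile()
                        && f.getName().toLowerCase(Locale.ROOT).endsWith(".txt");
            }
        });
        // listFiles returns null when dir does not exist or is not a directory
        return files == null ? new File[0] : files;
    }
}
```

In the demo this would replace the anonymous filter with File[] files = TxtFiles.list(files2Index); and the indexing loop would then simply do nothing for a missing directory.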
2. Build the index with Lucene
// try-with-resources closes the writer (releasing the index write lock) when done
try (IndexWriter indexWriter = getIndexWriter())
{
    for (File file : files)
    {
        // Index each txt file with two fields: its name and its content
        Document doc = new Document();
        doc.add(new TextField(NAME, file.getName(), Store.YES));
        // FileReader decodes with the platform default charset; the reader is closed automatically
        try (BufferedReader br = new BufferedReader(new FileReader(file)))
        {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null)
            {
                // Keep the line break so words on adjacent lines stay separate tokens
                sb.append(line).append('\n');
            }
            doc.add(new TextField(CONTENT, sb.toString(), Store.YES));
            indexWriter.addDocument(doc);
        } catch (IOException e)
        {
            // log and skip this file (FileNotFoundException is an IOException)
        }
    }
    // Commit once for the whole batch rather than once per document
    indexWriter.commit();
} catch (IOException e)
{
    // log
}
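/**
 * Opens an IndexWriter on the index directory.
 *
 * @return the writer, or null if the index directory cannot be opened
 */
private IndexWriter getIndexWriter()
{
    IndexWriter indexWriter = null;
    try
    {
        // CREATE replaces any existing index, so re-running the demo does not add duplicate documents
        IndexWriterConfig config = new IndexWriterConfig(new SmartChineseAnalyzer())
                .setOpenMode(IndexWriterConfig.OpenMode.CREATE);
        indexWriter = new IndexWriter(FSDirectory.open(indexDir.toPath()), config);
    } catch (IOException e)
    {
        // log
    }
    return indexWriter;
}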
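Appending readLine() results without a separator glues the last word of one line to the first word of the next (for example "search" and "engine" become "searchengine"), and FileReader always uses the platform default charset. As an alternative sketch, a whole (small) file can be read in one call with an explicit charset. FileText.read is a hypothetical helper; which charset to pass (UTF-8, GBK, ...) depends on how your txt files were actually saved.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;

// Hypothetical helper: reads a whole text file, keeping its line breaks
public class FileText
{
    public static String read(File file, Charset charset) throws IOException
    {
        // Load all bytes at once, then decode with the caller's charset
        // instead of the platform default that FileReader would use
        byte[] bytes = Files.readAllBytes(file.toPath());
        return new String(bytes, charset);
    }
}
```

Inside the loop the content field would then become doc.add(new TextField(CONTENT, FileText.read(file, StandardCharsets.UTF_8), Store.YES)); assuming the files are UTF-8. Reading everything into memory is fine for a demo of small text files.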
3. Search the index
public List<String> getFoundFileNames(String queryContent)
{
    List<String> results = new ArrayList<String>();
    ScoreDoc[] scoreDocs = queryIndex(queryContent);
    if (scoreDocs == null)
    {
        // queryIndex returns null when the index cannot be opened or the query cannot be parsed
        return results;
    }
    // Only the file name is returned, so only that stored field is loaded
    Set<String> fields = new HashSet<String>();
    fields.add(NAME);
    for (ScoreDoc scDoc : scoreDocs)
    {
        try
        {
            Document resDoc = indexSearcher.doc(scDoc.doc, fields);
            results.add(resDoc.get(NAME));
        } catch (IOException e)
        {
            // log and skip this hit
        }
    }
    return results;
}
private ScoreDoc[] queryIndex(String queryContent)
{
    try
    {
        // Release the reader opened by the previous query before opening a new one
        if (indexSearcher != null)
        {
            indexSearcher.getIndexReader().close();
        }
        // Open a searcher over the current contents of the index directory
        indexSearcher = new IndexSearcher(DirectoryReader.open(FSDirectory
                .open(indexDir.toPath())));
        // Parse the query string; terms without a field prefix search CONTENT
        QueryParser parser = new QueryParser(CONTENT, new SmartChineseAnalyzer());
        return indexSearcher.search(parser.parse(queryContent), MAX_COUNT).scoreDocs;
    } catch (IOException e)
    {
        // log
    } catch (ParseException e)
    {
        // log
    }
    return null;
}
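Conceptually, addDocument records for each field which documents contain each term (an inverted index), and a query such as CONTENT:"开发" looks the term up in that one field's term list. The toy below illustrates only that idea in plain Java; it is not how Lucene stores its index. The class ToyIndex and its methods are hypothetical, and its naive tokenizer (lowercase, split on anything that is not a letter or digit) does no Chinese word segmentation, which is exactly the job SmartChineseAnalyzer does in the demo.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy, field-aware inverted index: field -> term -> ids of the documents containing it.
// Illustration only; Lucene's real index format is far more elaborate.
public class ToyIndex
{
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    private final List<Map<String, String>> stored = new ArrayList<>();

    // Stores the document (field name -> text), indexes every field, returns its id
    public int addDocument(Map<String, String> fields)
    {
        int id = stored.size();
        stored.add(new HashMap<>(fields));
        for (Map.Entry<String, String> field : fields.entrySet())
        {
            Map<String, Set<Integer>> terms =
                    postings.computeIfAbsent(field.getKey(), k -> new HashMap<>());
            for (String term : tokenize(field.getValue()))
            {
                terms.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Ids of the documents whose given field contains the term (like a FIELD:term query)
    public Set<Integer> search(String field, String term)
    {
        Map<String, Set<Integer>> terms =
                postings.getOrDefault(field, Collections.emptyMap());
        // The query term goes through the same lowercasing as the indexed text
        return Collections.unmodifiableSet(
                terms.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet()));
    }

    // Stored value of a field, similar in spirit to Document.get(name)
    public String get(int id, String field)
    {
        return stored.get(id).get(field);
    }

    // Naive analyzer: lowercase, then split on runs of non-letter, non-digit characters
    private static List<String> tokenize(String text)
    {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
        {
            if (!t.isEmpty())
            {
                tokens.add(t);
            }
        }
        return tokens;
    }
}
```

With this toy tokenizer a sentence like 我开发了一个Demo stays one single token, so a search for 开发 would find nothing; this is why the demo uses the same Chinese-aware analyzer both when indexing and when parsing queries.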
PS: Before running, create some .txt files in the source directory. The index directory is created automatically if it does not exist.
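If you prefer to script this setup, a sketch like the following creates the source directory and writes a few sample files. The class SampleFiles and the file names sample1.txt, sample2.txt, ... are made up for illustration. The files are written as UTF-8, so the indexing code should read them with the same charset (FileReader uses the platform default, which on Chinese Windows is usually GBK). Choose contents that the test queries can actually match.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical setup helper: creates the source directory and some sample .txt files
public class SampleFiles
{
    public static void create(Path dir, String... contents) throws IOException
    {
        // Create the directory (and any missing parents) if needed
        Files.createDirectories(dir);
        for (int i = 0; i < contents.length; i++)
        {
            // One file per content string, encoded as UTF-8
            Files.write(dir.resolve("sample" + (i + 1) + ".txt"),
                    contents[i].getBytes(StandardCharsets.UTF_8));
        }
    }
}
```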
Test code:
// The field prefixes in these queries assume the NAME and CONTENT constants are "NAME" and "CONTENT"
@Test
public void test01()
{
    // Source directory with the txt files, and a separate index directory
    IndexFile file = new IndexFile(new File("E:\\APP\\luceneTest\\文本文件"),
            new File("E:\\APP\\luceneTest\\indexs\\01"));
    file.createIndex();
    System.out.println("01:" + file.getFoundFileNames("NAME:\"文本\""));
    System.out.println("01:" + file.getFoundFileNames("NAME:\"txt\""));
}

@Test
public void test02()
{
    IndexFile file = new IndexFile(new File("E:\\APP\\luceneTest\\文本文件"),
            new File("E:\\APP\\luceneTest\\indexs\\02"));
    file.createIndex();
    System.out.println("02:" + file.getFoundFileNames("CONTENT:\"我\""));
    System.out.println("02:" + file.getFoundFileNames("CONTENT:\"开发\""));
}