Lucene 3.4 download: http://lucene.apache.org/ (14 September 2011)
Overview (from the official website):
What Is Apache Lucene?
The Apache Lucene™ project develops open-source search software, including:
Apache Lucene Core™ (formerly named Lucene Java), our flagship sub-project, provides a Java-based indexing and search implementation, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.
Apache Solr™ is our high performance enterprise search server, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, distributed search, database integration, web admin and search interfaces.
Apache PyLucene™ is a Python port of the Lucene Core project.
Apache Open Relevance Project™ is a subproject with the aim of collecting and distributing free materials for relevance testing and performance.
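★Example: this example searches .txt documents for a keyword. If the keyword is found, it prints the matching results, including each file's name, path, size, and content.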
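★How it works: indexing collects data from an information source, such as a database or the Internet. The data is copied locally, processed, and stored in an index (a set of binary files). Searches then run against this local index rather than against the original source. Text is split into tokens by an analyzer both when the index is built and when a query is run, and the same analyzer is used in both places. You can think of the index as an index table (term -> documents) together with the data and documents that the table points to. At search time, the analyzer processes the keywords, and the resulting terms are looked up in the index table to find the matching data.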
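To make the idea above concrete, here is a toy inverted index in plain Java. It is a sketch for illustration, not Lucene's implementation. The analyze() method is a hypothetical, simplified analyzer that lowercases text and splits on anything that is not a letter or digit. The key point is that the same analyze() call runs when documents are added and when a query is parsed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class ToyInvertedIndex {
    // term -> ids of the documents that contain the term (the "index table")
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    // the stored documents themselves
    private final List<String> docs = new ArrayList<>();

    // Hypothetical simplified analyzer, shared by indexing and searching
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexing: analyze the text and record the document id under every term
    public int addDocument(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Searching: analyze the query the same way, then look the terms up (OR semantics)
    public SortedSet<Integer> search(String query) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (String term : analyze(query)) {
            SortedSet<Integer> ids = postings.get(term);
            if (ids != null) {
                hits.addAll(ids);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("hello1 world test for fd. document document");
        index.addDocument("Just a case; hel");
        System.out.println(index.search("HELLO1"));   // [0]
        System.out.println(index.search("case"));     // [1]
        System.out.println(index.search("zazazaza")); // []
    }
}
```

The query "HELLO1" still finds document 0 because it goes through the same lowercasing analyzer as the indexed text. If indexing and searching used different analyzers, the terms would not line up.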
★Hands-on example:
Create a test file, hello.txt, with the following content:
hello1 world test for fd. document document
Just a case; hel
hello是 测试测试搜索 1 hrllo hello hello hello
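1. Create a Java project.
2. Add the Lucene 3.4 jars:
lucene-core-3.4.0.jar // core jar (required; also contains StandardAnalyzer)
contrib\highlighter\lucene-highlighter-3.4.0.jar // highlighting (not used by this demo)
contrib\analyzers\lucene-analyzers-3.4.0.jar // additional analyzers (not used by this demo)
The test methods also need JUnit 4 (org.junit.Test) on the classpath.
Create a local data-source folder, luceneDataSource (put hello.txt in it), and an index folder, luceneIndex.
3. LuceneDemo.java source: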
import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

import com.yaxing.utils.File2Document;

public class LuceneDemo {
    String filePath = "J:\\MyEclipse-8.6\\lucene\\LuceneDemo\\luceneDataSource\\hello.txt";
    File indexPath = new File("J:\\MyEclipse-8.6\\lucene\\LuceneDemo\\luceneIndex");
    // The same analyzer is used for indexing and for parsing queries
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_34);

    /**
     * Build the index. IndexWriter adds, deletes, and updates documents.
     * Run this before search().
     */
    @Test
    public void createIndex() throws Exception {
        // file --> Document
        Document doc = File2Document.file2Document(filePath);
        // Directory dir = FSDirectory.open(indexPath);
        // create = true: create a new index, overwriting any existing one in indexPath
        IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexPath), analyzer, true, MaxFieldLength.LIMITED);
        indexWriter.addDocument(doc);
        indexWriter.close();
    }

    /**
     * Search. IndexSearcher runs queries against the index.
     */
    @Test
    public void search() throws Exception {
        String queryString = "搜索";
        // Parse the search text into a Query over the name and content fields
        String[] fields = {"name", "content"};
        QueryParser queryParser = new MultiFieldQueryParser(Version.LUCENE_34, fields, analyzer); // query parser
        Query query = queryParser.parse(queryString);
        // Run the query
        IndexSearcher indexSearcher = new IndexSearcher(FSDirectory.open(indexPath));
        Filter filter = null;
        TopDocs topDocs = indexSearcher.search(query, filter, 10000); // at most the top 10000 hits
        // Prints "N matching results in total"
        System.out.println("总共有【" + topDocs.totalHits + "】条匹配结果.");
        // Print each hit
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            int docSn = scoreDoc.doc; // internal document number
            Document doc = indexSearcher.doc(docSn); // fetch the stored document by its number
            File2Document.printDocumentInfo(doc); // print the document's fields
        }
        indexSearcher.close();
    }
}
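search() passes two field names to MultiFieldQueryParser. As a result, the keyword is looked for in name and content, but not in path or size. Below is a conceptual sketch of that behavior in plain Java. It is not Lucene code, and it assumes QueryParser's default OR operator: a document matches if any query term occurs in any of the listed fields. The tokenize() helper is a hypothetical simplified analyzer, not StandardAnalyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiFieldSketch {
    // Hypothetical simplified analyzer: lowercase, split on non-letter/digit characters
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // OR semantics: true if any query term occurs in any of the searched fields
    static boolean matches(Map<String, String> doc, String[] fields, String query) {
        for (String term : tokenize(query)) {
            for (String field : fields) {
                String value = doc.get(field);
                if (value != null && tokenize(value).contains(term)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A document modeled as field name -> field text
        Map<String, String> doc = new HashMap<>();
        doc.put("name", "hello.txt");
        doc.put("content", "hello1 world test for fd. document document");
        doc.put("path", "J:\\MyEclipse-8.6\\lucene\\LuceneDemo\\luceneDataSource\\hello.txt");
        String[] fields = {"name", "content"}; // same field list as search()
        System.out.println(matches(doc, fields, "world"));     // true: found in content
        System.out.println(matches(doc, fields, "myeclipse")); // false: only in path, which is not searched
    }
}
```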
4. File2Document.java source (package com.yaxing.utils):
package com.yaxing.utils;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;

public class File2Document {
    // File fields: content, name, size, path
    public static Document file2Document(String path) {
        File file = new File(path);
        Document doc = new Document();
        // Store: whether the original value is stored (Store.YES / Store.NO)
        // Index: how the value is indexed; Index.ANALYZED tokenizes it with the analyzer first
        doc.add(new Field("name", file.getName(), Store.YES, Index.ANALYZED));
        doc.add(new Field("content", readFileContent(file), Store.YES, Index.ANALYZED)); // readFileContent() reads the file contents
        doc.add(new Field("size", String.valueOf(file.length()), Store.YES, Index.NOT_ANALYZED)); // not tokenized; the file size (a long) is converted to a String
        doc.add(new Field("path", file.getAbsolutePath(), Store.YES, Index.NOT_ANALYZED)); // not tokenized; we don't need to search by path
        return doc;
    }

    /**
     * Read the file contents (using the platform default charset).
     */
    private static String readFileContent(File file) {
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
            StringBuffer content = new StringBuffer();
            for (String line = null; (line = reader.readLine()) != null;) {
                content.append(line).append("\n");
            }
            return content.toString();
        } catch (IOException e) { // also covers FileNotFoundException
            e.printStackTrace();
            return null;
        } finally {
            if (reader != null) {
                try {
                    reader.close(); // release the file handle
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    /**
     * Two ways to get the value of the name field:
     * 1. Field field = doc.getField("name");
     *    field.stringValue();
     * 2. doc.get("name");
     *
     * @param doc
     */
    public static void printDocumentInfo(Document doc) {
        System.out.println("name -->" + doc.get("name"));
        System.out.println("content -->" + doc.get("content"));
        System.out.println("path -->" + doc.get("path"));
        System.out.println("size -->" + doc.get("size"));
    }
}
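readFileContent() builds its InputStreamReader without a charset, so it uses the platform default. With Chinese text such as 测试测试搜索, the result then depends on which machine runs the code. Below is a minimal sketch of a variant that takes the charset explicitly. ReadWithCharset is a hypothetical class name, and the charset you pass (for example UTF-8 or GBK) is an assumption: it has to match how hello.txt was actually saved.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadWithCharset {
    // Reads a text file with an explicit charset; every line is followed by '\n'
    static String readFileContent(File file, Charset charset) throws IOException {
        StringBuilder content = new StringBuilder();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            for (String line; (line = reader.readLine()) != null; ) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("hello", ".txt");
        tmp.deleteOnExit();
        try (Writer w = new OutputStreamWriter(new FileOutputStream(tmp), StandardCharsets.UTF_8)) {
            w.write("hello是 测试测试搜索 1");
        }
        // Reading with the same charset used for writing round-trips the Chinese text
        System.out.println(readFileContent(tmp, StandardCharsets.UTF_8).equals("hello是 测试测试搜索 1\n"));
    }
}
```

Unlike the original, this variant lets IOException propagate instead of returning null. A null value would make the Field constructor in file2Document() fail anyway.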
5. JUnit test results:
String queryString = "搜索";
总共有【1】条匹配结果.
name -->hello.txt
content -->hello1 world test for fd. document document
Just a case; hel
hello是 测试测试搜索 1 hrllo hello hello hello
path -->J:\MyEclipse-8.6\lucene\LuceneDemo\luceneDataSource\hello.txt
size -->109
String queryString = "hello";
总共有【1】条匹配结果.
name -->hello.txt
content -->hello1 world test for fd. document document
Just a case; hel
hello是 测试测试搜索 1 hrllo hello hello hello
path -->J:\MyEclipse-8.6\lucene\LuceneDemo\luceneDataSource\hello.txt
size -->109
The generated index (figure not preserved):
String queryString = "zazazaza";
总共有【0】条匹配结果.
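Why these results: in Lucene 3.4, StandardAnalyzer emits each Chinese character as a separate token, so the query 搜索 is analyzed into 搜 and 索. With the default OR operator, the content field matches. The query "hello" matches because hello occurs as its own token in content. The query "zazazaza" occurs nowhere, so it gets 0 hits. totalHits counts matching documents, not occurrences. That is why "hello" reports 1 hit even though the word appears several times: only one document (hello.txt) is indexed.

Below is a simplified approximation of that tokenization, for illustration only. Runs of letters and digits become one lowercased token, and each character in the CJK Unified Ideographs block becomes its own token. tokenize() and matchesAny() are hypothetical names. The real StandardTokenizer follows Unicode UAX #29 and differs in many details.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkUnigramSketch {
    // Simplified approximation of mixed Latin/Chinese tokenization (not Lucene's StandardTokenizer).
    // Iterates over chars, so it only handles characters in the Basic Multilingual Plane.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean han = Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
            if (han || !Character.isLetterOrDigit(c)) {
                flush(word, tokens);                    // end the current Latin/digit run
                if (han) {
                    tokens.add(String.valueOf(c));      // one token per Han character
                }
            } else {
                word.append(Character.toLowerCase(c));  // extend the current run, lowercased
            }
        }
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    // OR semantics: the document matches if it contains any token of the query
    static boolean matchesAny(List<String> docTokens, String query) {
        for (String term : tokenize(query)) {
            if (docTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> content = tokenize("hello是 测试测试搜索 1 hrllo hello hello hello");
        System.out.println(content);
        System.out.println(matchesAny(content, "搜索"));     // true
        System.out.println(matchesAny(content, "hello"));    // true
        System.out.println(matchesAny(content, "zazazaza")); // false
    }
}
```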