Lucene In-Depth Study (1)


Apache Lucene is a high-performance, full-featured text search engine library. Here's a simple example of how to use Lucene for indexing and searching (using JUnit to check that the results are what we expect):

Analyzer analyzer = new StandardAnalyzer();

// Store the index in memory:
Directory directory = new RAMDirectory();
// To store an index on disk, use this instead:
//Directory directory = FSDirectory.getDirectory("/tmp/testindex");
IndexWriter iwriter = new IndexWriter(directory, analyzer, true);
iwriter.setMaxFieldLength(25000);
Document doc = new Document();
String text = "This is the text to be indexed.";
doc.add(new Field("fieldname", text, Field.Store.YES,
    Field.Index.TOKENIZED));
iwriter.addDocument(doc);
iwriter.optimize();
iwriter.close();

// Now search the index:
IndexSearcher isearcher = new IndexSearcher(directory);
// Parse a simple query that searches for "text":
QueryParser parser = new QueryParser("fieldname", analyzer);
Query query = parser.parse("text");
Hits hits = isearcher.search(query);
assertEquals(1, hits.length());
// Iterate through the results:
for (int i = 0; i < hits.length(); i++) {
  Document hitDoc = hits.doc(i);
  assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
}
isearcher.close();
directory.close();

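The Lucene API is divided into several packages, including org.apache.lucene.analysis, org.apache.lucene.document, org.apache.lucene.index, org.apache.lucene.queryParser, org.apache.lucene.search, org.apache.lucene.store and org.apache.lucene.util.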

To use Lucene, an application should:
  1. Create Documents by adding Fields;
  2. Create an IndexWriter and add documents to it with addDocument();
  3. Call QueryParser.parse() to build a query from a string; and
  4. Create an IndexSearcher and pass the query to its search() method.
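
To make the four steps concrete without depending on any particular Lucene version, here is a minimal sketch of the same flow using only the JDK. Everything in it (`ToyIndex`, `addDocument`, `search`, `tokenize`) is a hypothetical stand-in written for this illustration, not a Lucene class, and the "analyzer" is just lowercasing plus splitting on non-alphanumeric characters. It only shows the shape of the idea: fields are broken into terms, terms point back to documents, and a query is analyzed the same way before lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy stand-in for the four steps; none of these types are Lucene classes.
class ToyIndex {
    // Stored documents: each one is a map from field name to field text.
    private final List<Map<String, String>> docs = new ArrayList<>();
    // Inverted index: "field:term" -> ids of the documents containing that term.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Step 2: store the document and index the terms of every field.
    int addDocument(Map<String, String> fields) {
        int id = docs.size();
        docs.add(new HashMap<>(fields));
        for (Map.Entry<String, String> f : fields.entrySet()) {
            for (String term : tokenize(f.getValue())) {
                postings.computeIfAbsent(f.getKey() + ":" + term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Steps 3 and 4: analyze the query text, then intersect the posting lists
    // so that only documents containing every query term are returned.
    List<Integer> search(String field, String queryText) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(queryText)) {
            TreeSet<Integer> ids = postings.getOrDefault(field + ":" + term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    // Returns the stored text of a field for a given document id.
    String get(int id, String field) {
        return docs.get(id).get(field);
    }

    // Crude analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }
}

class ToyIndexDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        // Step 1: a "document" here is just a set of named fields.
        Map<String, String> doc = new HashMap<>();
        doc.put("fieldname", "This is the text to be indexed.");
        index.addDocument(doc);
        for (int id : index.search("fieldname", "text")) {
            System.out.println(id + ": " + index.get(id, "fieldname"));
        }
    }
}
```

A real index also handles scoring, stop words, on-disk storage and much more; the sketch leaves all of that out on purpose.
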
The IndexFiles and SearchFiles demo programs are simple examples of code that follows these steps. To try them, run something like:
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexFiles rec.food.recipes/soups
adding rec.food.recipes/soups/abalone-chowder
[ ... ]

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.SearchFiles
Query: chowder
Searching for: chowder
34 total matching documents
1. rec.food.recipes/soups/spam-chowder
[ ... thirty-four documents contain the word "chowder" ... ]

Query: "clam chowder" AND Manhattan
Searching for: +"clam chowder" +manhattan
2 total matching documents
1. rec.food.recipes/soups/clam-chowder
[ ... two documents contain the phrase "clam chowder" and the word "manhattan" ... ]
[ Note: "+" and "-" are canonical, but "AND", "OR" and "NOT" may be used. ]
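
The relationship between the keyword form and the canonical "+"/"-" form can be sketched in a few lines of plain Java. This is only an illustration of the rewriting seen in the "Searching for:" output above, under simplifying assumptions: `ToyQueryRewriter` and `rewrite` are hypothetical names, lowercasing stands in for the analyzer, and there is no support for fields, parentheses or any other syntax. Lucene's own QueryParser handles far more than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy rewriter from AND/OR/NOT keywords to "+"/"-" prefixes; not Lucene code.
class ToyQueryRewriter {
    // A token is either a double-quoted phrase or a run of non-space characters.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|\\S+");

    static String rewrite(String query) {
        List<String> clauses = new ArrayList<>();     // lowercased terms and phrases
        List<Character> prefixes = new ArrayList<>(); // '+', '-' or ' ' (optional)
        boolean requireNext = false;
        boolean prohibitNext = false;
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            String token = m.group();
            if (token.equals("AND")) {
                // AND makes the clause before it required (unless prohibited)...
                int last = prefixes.size() - 1;
                if (last >= 0 && prefixes.get(last) == ' ') {
                    prefixes.set(last, '+');
                }
                // ...and the clause after it as well.
                requireNext = true;
            } else if (token.equals("OR")) {
                requireNext = false; // both sides stay optional
            } else if (token.equals("NOT")) {
                prohibitNext = true; // the next clause must not match
            } else {
                clauses.add(token.toLowerCase(Locale.ROOT));
                prefixes.add(prohibitNext ? '-' : requireNext ? '+' : ' ');
                requireNext = false;
                prohibitNext = false;
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < clauses.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            if (prefixes.get(i) != ' ') {
                out.append(prefixes.get(i));
            }
            out.append(clauses.get(i));
        }
        return out.toString();
    }
}

class ToyQueryRewriterDemo {
    public static void main(String[] args) {
        System.out.println(ToyQueryRewriter.rewrite("\"clam chowder\" AND Manhattan"));
    }
}
```

Under these assumptions, the query from the transcript is rewritten to the same string that SearchFiles echoes back, and `chowder NOT spam` becomes `chowder -spam`.
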

The IndexHTML demo is more sophisticated. It incrementally maintains an index of HTML files, adding new files as they appear, deleting old files as they disappear and re-indexing files as they change.
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML -create java/jdk1.1.6/docs/relnotes
adding java/jdk1.1.6/docs/relnotes/SMICopyright.html
[ ... create an index containing all the relnotes ]

> rm java/jdk1.1.6/docs/relnotes/SMICopyright.html

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML java/jdk1.1.6/docs/relnotes
deleting java/jdk1.1.6/docs/relnotes/SMICopyright.html
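
The bookkeeping behind this kind of incremental maintenance can be sketched as a comparison between what the index last saw and what is on disk now. The code below is a hypothetical illustration, not how IndexHTML itself is written: `ToyIncrementalSync` and `plan` are made-up names, and the assumption is that each file is tracked by its path and last-modified time, with a changed file handled as a delete followed by an add.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy planner for incremental index maintenance; not Lucene code.
class ToyIncrementalSync {
    // Compares the indexed snapshot (path -> last-modified time) with the
    // current files and returns the actions to take, in path order.
    static List<String> plan(Map<String, Long> indexed, Map<String, Long> current) {
        List<String> actions = new ArrayList<>();
        // Union of all known paths, sorted so the output is deterministic.
        TreeMap<String, Long> all = new TreeMap<>(indexed);
        all.putAll(current);
        for (String path : all.keySet()) {
            Long before = indexed.get(path);
            Long now = current.get(path);
            if (before == null) {
                actions.add("adding " + path);   // new file
            } else if (now == null) {
                actions.add("deleting " + path); // file disappeared
            } else if (!before.equals(now)) {
                // Changed file: drop the stale entry, then index it again.
                actions.add("deleting " + path);
                actions.add("adding " + path);
            }
            // Unchanged files need no action.
        }
        return actions;
    }
}

class ToyIncrementalSyncDemo {
    public static void main(String[] args) {
        Map<String, Long> indexed = Map.of("a.html", 1L, "b.html", 1L, "c.html", 1L);
        Map<String, Long> current = Map.of("b.html", 1L, "c.html", 2L, "d.html", 5L);
        for (String action : ToyIncrementalSync.plan(indexed, current)) {
            System.out.println(action);
        }
    }
}
```

In the sample data, a.html has been removed, c.html has changed and d.html is new, so the plan contains one delete, one delete-and-add pair, and one add, while b.html is left alone.
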
