Lucene 2.4.1 API

西二旗小码农

Category: Search engine related. Tags: lucene, api, query, search, combinations, java

Article link: https://blog.csdn.net/lwm_1985/article/details/6608579

This article is taken from the Lucene 2.4.1 API overview: http://lucene.apache.org/java/2_4_1/api/overview-summary.html#overview_description

Apache Lucene is a high-performance, full-featured text search engine library.

Core
org.apache.lucene	Top-level package.
org.apache.lucene.analysis	API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.standard	A fast grammar-based tokenizer constructed with JFlex.
org.apache.lucene.document	The logical representation of a `Document` for indexing and searching.
org.apache.lucene.index	Code to maintain and access indices.
org.apache.lucene.queryParser	A simple query parser implemented with JavaCC.
org.apache.lucene.search	Code to search indices.
org.apache.lucene.search.function	Programmatic control over document scores.
org.apache.lucene.search.payloads	The payloads package provides Query mechanisms for finding and using payloads.
org.apache.lucene.search.spans	The calculus of spans.
org.apache.lucene.store	Binary i/o API, used for all index data.
org.apache.lucene.util	Some utility classes.
org.apache.lucene.util.cache

Here is a simple example of how to use Lucene for indexing and searching (using JUnit to check that the results are what we expect):

    Analyzer analyzer = new StandardAnalyzer();

    // Store the index in memory:
    Directory directory = new RAMDirectory();
    // To store an index on disk, use this instead:
    //Directory directory = FSDirectory.getDirectory("/tmp/testindex");
    IndexWriter iwriter = new IndexWriter(directory, analyzer, true);
    iwriter.setMaxFieldLength(25000);
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, Field.Store.YES,
        Field.Index.TOKENIZED));
    iwriter.addDocument(doc);
    iwriter.optimize();
    iwriter.close();
    
    // Now search the index:
    IndexSearcher isearcher = new IndexSearcher(directory);
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser("fieldname", analyzer);
    Query query = parser.parse("text");
    Hits hits = isearcher.search(query);
    assertEquals(1, hits.length());
    // Iterate through the results:
    for (int i = 0; i < hits.length(); i++) {
      Document hitDoc = hits.doc(i);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }
    isearcher.close();
    directory.close();

The Lucene API is divided into several packages:

org.apache.lucene.analysis defines an abstract Analyzer API for converting text from a java.io.Reader into a TokenStream, an enumeration of Tokens. A TokenStream is composed by applying TokenFilters to the output of a Tokenizer. A few simple implementations are provided, including StopAnalyzer and the grammar-based StandardAnalyzer.
org.apache.lucene.document provides a simple Document class. A document is simply a set of named Fields, whose values may be strings or instances of java.io.Reader.
org.apache.lucene.index provides two primary classes: IndexWriter, which creates and adds documents to indices; and IndexReader, which accesses the data in the index.
org.apache.lucene.search provides data structures to represent queries (TermQuery for individual words, PhraseQuery for phrases, and BooleanQuery for boolean combinations of queries) and the abstract Searcher which turns queries into Hits. IndexSearcher implements search over a single IndexReader.
org.apache.lucene.queryParser uses JavaCC to implement a QueryParser.
org.apache.lucene.store defines an abstract class for storing persistent data, the Directory, a collection of named files written by an IndexOutput and read by an IndexInput. Two implementations are provided: FSDirectory, which uses a file system directory to store files, and RAMDirectory, which implements files as memory-resident data structures.
org.apache.lucene.util contains a few handy data structures, e.g., BitVector and PriorityQueue.
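
The analysis chain described above (a Tokenizer whose output is wrapped by TokenFilters) can be pictured without Lucene at all. The following is only a conceptual sketch in plain Java, not Lucene's API: the class `AnalysisSketch`, its method names, and the sample stop-word list are all hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.NoSuchElementException;
import java.util.Set;

// Conceptual sketch only: these are NOT Lucene classes.
class AnalysisSketch {

    // "Tokenizer" stage: split raw text on anything that is not a letter or digit.
    static Iterator<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens.iterator();
    }

    // "Filter" stage 1: wrap a token stream and lower-case each token.
    static Iterator<String> lowerCase(Iterator<String> in) {
        return new Iterator<String>() {
            public boolean hasNext() { return in.hasNext(); }
            public String next() { return in.next().toLowerCase(Locale.ROOT); }
        };
    }

    // "Filter" stage 2: wrap a token stream and skip stop words.
    static Iterator<String> removeStopWords(Iterator<String> in, Set<String> stop) {
        return new Iterator<String>() {
            private String pending = advance();

            // Pull tokens from the wrapped stream until one is not a stop word.
            private String advance() {
                while (in.hasNext()) {
                    String token = in.next();
                    if (!stop.contains(token)) {
                        return token;
                    }
                }
                return null;
            }

            public boolean hasNext() { return pending != null; }

            public String next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                String out = pending;
                pending = advance();
                return out;
            }
        };
    }

    // "Analyzer": compose the stages and collect the resulting tokens.
    static List<String> analyze(String text) {
        // Arbitrary sample stop words (an assumption, not Lucene's list).
        Set<String> stop = new HashSet<>(Arrays.asList("this", "is", "the", "to", "be"));
        Iterator<String> stream = removeStopWords(lowerCase(tokenize(text)), stop);
        List<String> out = new ArrayList<>();
        while (stream.hasNext()) {
            out.add(stream.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("This is the text to be indexed.")); // prints [text, indexed]
    }
}
```

Each stage only sees the stream produced by the previous one, which is the point of the composition: stages can be reordered or swapped without touching the others.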

To use Lucene, an application should:

Create Documents by adding Fields;
Create an IndexWriter and add documents to it with addDocument();
Call QueryParser.parse() to build a query from a string; and
Create an IndexSearcher and pass the query to its search() method.
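
To make the four steps concrete, here is a toy model in plain Java that mirrors them. It is a sketch under stated assumptions, not Lucene code: `ToyIndex` and its methods are hypothetical, documents are modeled as maps of named fields, and the "query parser" simply treats the whole query string as a single term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy model only: none of these types are Lucene classes.
class ToyIndex {
    // Stored documents; a document is modeled as a set of named fields.
    private final List<Map<String, String>> docs = new ArrayList<>();
    // Lookup table: "field:term" -> ids of the documents containing that term.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Step 2: add a document and index the words of every field.
    public int addDocument(Map<String, String> doc) {
        int id = docs.size();
        docs.add(doc);
        for (Map.Entry<String, String> field : doc.entrySet()) {
            for (String term : field.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (term.isEmpty()) {
                    continue;
                }
                postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Steps 3 and 4: treat the query string as one term and look it up.
    public List<Map<String, String>> search(String field, String queryText) {
        String key = field + ":" + queryText.trim().toLowerCase(Locale.ROOT);
        List<Map<String, String>> hits = new ArrayList<>();
        for (int id : postings.getOrDefault(key, new TreeSet<>())) {
            hits.add(docs.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        // Step 1: create a document by adding fields.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("fieldname", "This is the text to be indexed.");
        index.addDocument(doc);
        System.out.println(index.search("fieldname", "text").size()); // prints 1
    }
}
```

The model leaves out everything that makes a real index useful (analysis, scoring, persistence); it only shows how adding documents and running a query relate to each other.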

Some simple code examples that do this are:

FileDocument.java contains code to create a Document for a file.
IndexFiles.java creates an index for all the files contained in a directory.
DeleteFiles.java deletes some of these files from the index.
SearchFiles.java prompts for queries and searches an index.

To demonstrate these, try something like:

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexFiles rec.food.recipes/soups
adding rec.food.recipes/soups/abalone-chowder
[ ... ]
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.SearchFiles
Query: chowder
Searching for: chowder
34 total matching documents
1. rec.food.recipes/soups/spam-chowder
[ ... thirty-four documents contain the word "chowder" ... ]

Query: "clam chowder" AND Manhattan
Searching for: +"clam chowder" +manhattan
2 total matching documents
1. rec.food.recipes/soups/clam-chowder
[ ... two documents contain the phrase "clam chowder" and the word "manhattan" ... ]
[ Note: "+" and "-" are canonical, but "AND", "OR" and "NOT" may be used. ]
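
The canonical form shown in the session, where "+" marks a required clause and "-" a prohibited one, can be illustrated with a small plain-Java sketch. This is not how Lucene evaluates queries: `ClauseSketch` is a hypothetical name, and matching is reduced to a naive, case-insensitive substring test on a single string.

```java
import java.util.Locale;

// Toy evaluation of "+" (required) and "-" (prohibited) clauses; not Lucene code.
class ClauseSketch {

    // Returns true when the text satisfies every "+" clause and no "-" clause.
    // Each clause is a prefix character followed by a word or phrase.
    static boolean matches(String text, String... clauses) {
        String haystack = text.toLowerCase(Locale.ROOT);
        for (String clause : clauses) {
            if (clause.length() < 2) {
                throw new IllegalArgumentException("empty clause: " + clause);
            }
            char occur = clause.charAt(0);
            String needle = clause.substring(1).toLowerCase(Locale.ROOT);
            boolean present = haystack.contains(needle); // naive substring match
            if (occur == '+') {
                if (!present) {
                    return false; // a required clause is missing
                }
            } else if (occur == '-') {
                if (present) {
                    return false; // a prohibited clause is present
                }
            } else {
                throw new IllegalArgumentException("clause must start with + or -: " + clause);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String recipe = "Manhattan clam chowder with tomatoes";
        System.out.println(matches(recipe, "+clam chowder", "+manhattan")); // prints true
        System.out.println(matches(recipe, "+clam chowder", "-manhattan")); // prints false
    }
}
```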

The IndexHTML demo is more sophisticated. It incrementally maintains an index of HTML files, adding new files as they appear, deleting old files as they disappear and re-indexing files as they change.

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML -create java/jdk1.1.6/docs/relnotes
adding java/jdk1.1.6/docs/relnotes/SMICopyright.html
[ ... create an index containing all the relnotes ]
> rm java/jdk1.1.6/docs/relnotes/SMICopyright.html

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML java/jdk1.1.6/docs/relnotes
deleting java/jdk1.1.6/docs/relnotes/SMICopyright.html
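
The incremental behavior described for IndexHTML (add new files, delete vanished ones, re-index changed ones) amounts to comparing what the index last saw with what is on disk now. The sketch below shows that comparison in plain Java. It is an assumption about the general idea, not the demo's actual implementation: `IncrementalPlan`, the `Action` enum, and the use of last-modified timestamps as the change signal are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of planning an incremental index update; not the IndexHTML source.
class IncrementalPlan {
    enum Action { ADD, DELETE, REINDEX }

    // indexed: path -> last-modified time recorded when the file was indexed.
    // onDisk:  path -> last-modified time of the file as it exists now.
    static Map<String, Action> plan(Map<String, Long> indexed, Map<String, Long> onDisk) {
        Map<String, Action> actions = new TreeMap<>();
        for (Map.Entry<String, Long> file : onDisk.entrySet()) {
            Long seen = indexed.get(file.getKey());
            if (seen == null) {
                actions.put(file.getKey(), Action.ADD);      // new file appeared
            } else if (!seen.equals(file.getValue())) {
                actions.put(file.getKey(), Action.REINDEX);  // file changed
            }
        }
        for (String path : indexed.keySet()) {
            if (!onDisk.containsKey(path)) {
                actions.put(path, Action.DELETE);            // file disappeared
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, Long> indexed = new TreeMap<>();
        indexed.put("a.html", 1L);
        indexed.put("b.html", 1L);
        indexed.put("c.html", 1L);
        Map<String, Long> onDisk = new TreeMap<>();
        onDisk.put("a.html", 1L);
        onDisk.put("b.html", 2L);
        onDisk.put("d.html", 1L);
        System.out.println(plan(indexed, onDisk)); // prints {b.html=REINDEX, c.html=DELETE, d.html=ADD}
    }
}
```

Unchanged files produce no action at all, which is what makes a second run over a mostly static directory cheap.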
