Lucene 2.3.0 note

http://archive.apache.org/dist/lucene/java/lucene-2.3.0.zip

Doc:

Apache Lucene is a high-performance, full-featured text search engine library. Here is a simple example of how to use Lucene for indexing and searching (using JUnit to check that the results are what we expect):
    Analyzer analyzer = new StandardAnalyzer();

    // Store the index in memory:
    Directory directory = new RAMDirectory();
    // To store an index on disk, use this instead:
    //Directory directory = FSDirectory.getDirectory("/tmp/testindex");
    IndexWriter iwriter = new IndexWriter(directory, analyzer, true);
    iwriter.setMaxFieldLength(25000);
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, Field.Store.YES,
        Field.Index.TOKENIZED));
    iwriter.addDocument(doc);
    iwriter.optimize();
    iwriter.close();
    
    // Now search the index:
    IndexSearcher isearcher = new IndexSearcher(directory);
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser("fieldname", analyzer);
    Query query = parser.parse("text");
    Hits hits = isearcher.search(query);
    assertEquals(1, hits.length());
    // Iterate through the results:
    for (int i = 0; i < hits.length(); i++) {
      Document hitDoc = hits.doc(i);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }
    isearcher.close();
    directory.close();

The Lucene API is divided into several packages:

    org.apache.lucene.analysis defines an abstract Analyzer API for converting text from a java.io.Reader into a TokenStream, an enumeration of Tokens.  A TokenStream is composed by applying TokenFilters to the output of a Tokenizer.  A few simple implementations are provided, including StopAnalyzer and the grammar-based StandardAnalyzer.
    org.apache.lucene.document provides a simple Document class.  A document is simply a set of named Fields, whose values may be strings or instances of java.io.Reader.
    org.apache.lucene.index provides two primary classes: IndexWriter, which creates and adds documents to indices; and IndexReader, which accesses the data in the index.
    org.apache.lucene.search provides data structures to represent queries (TermQuery for individual words, PhraseQuery for phrases, and BooleanQuery for boolean combinations of queries) and the abstract Searcher which turns queries into Hits. IndexSearcher implements search over a single IndexReader. (A programmatic query-building sketch follows this list.)
    org.apache.lucene.queryParser uses JavaCC to implement a QueryParser.
    org.apache.lucene.store defines an abstract class for storing persistent data, the Directory, a collection of named files written by an IndexOutput and read by an IndexInput.  Two implementations are provided, FSDirectory, which uses a file system directory to store files, and RAMDirectory which implements files as memory-resident data structures.
    org.apache.lucene.util contains a few handy data structures, e.g., BitVector and PriorityQueue.
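
Queries do not have to come from QueryParser; they can also be built programmatically from the classes listed above. The following is a minimal sketch of my own (not from the Lucene docs), assuming the same "fieldname" field and isearcher as in the example at the top, plus imports for org.apache.lucene.index.Term and the org.apache.lucene.search classes:

    // Build queries directly instead of parsing a query string.
    Query termQuery = new TermQuery(new Term("fieldname", "text"));

    PhraseQuery phraseQuery = new PhraseQuery();        // terms that must occur as a phrase
    phraseQuery.add(new Term("fieldname", "text"));
    phraseQuery.add(new Term("fieldname", "indexed"));

    BooleanQuery booleanQuery = new BooleanQuery();     // boolean combination of sub-queries
    booleanQuery.add(termQuery, BooleanClause.Occur.MUST);       // required
    booleanQuery.add(phraseQuery, BooleanClause.Occur.SHOULD);   // optional, raises the score if it matches
    Hits hits = isearcher.search(booleanQuery);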

To use Lucene, an application should:

    Create Documents by adding Fields;
    Create an IndexWriter and add documents to it with addDocument();
    Call QueryParser.parse() to build a query from a string; and
    Create an IndexSearcher and pass the query to its search() method.

Source (testing):

package hellolucene;

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class HelloLucene {

	public static String INDEX_FILES_PATHSTRING = "C:\\QQDownload\\lucene\\indexstore";
	
	public void pre() throws Exception {
		// Clear out any files left over from a previous run so the index starts fresh.
		// Guard against a missing directory: listFiles() returns null and would throw a NullPointerException.
		File file = new File(HelloLucene.INDEX_FILES_PATHSTRING);
		if (!file.exists()) {
			file.mkdirs();
			return;
		}
		for(File file2 : file.listFiles()) {
			file2.delete();
		}
	}
	
	private String[] deviceIds = new String[]{"1111", "2222", "3333"};
	private String[] attrss = new String[] {
				"__location:shanghai,__width:33,__usize:10,__IP Address:10.10.10.166", 
				"__location:hefei,__width:13,__usize:10,__IP Address:10.10.10.176",
				"__location:beijing,__width:34,__usize:10,__IP Address:10.10.10.78"
				};
	private String[] groups = new String[] {
			"232 2983 29838",
			"232 2983 282929",
			"232 2977 29"
	};
	
	public void index() throws Exception {
		Directory directory = FSDirectory.getDirectory(HelloLucene.INDEX_FILES_PATHSTRING);
		IndexWriter indexWriter = new IndexWriter(directory, new StandardAnalyzer());
		for(int i=0; i<deviceIds.length; i++) {
			// One document per device; all three fields are stored and tokenized by StandardAnalyzer.
			Document document = new Document();
			document.add(new Field("deviceId", deviceIds[i], Field.Store.YES, Field.Index.TOKENIZED));
			document.add(new Field("attrs", attrss[i], Field.Store.YES, Field.Index.TOKENIZED));
			document.add(new Field("group", groups[i], Field.Store.YES, Field.Index.TOKENIZED));
			indexWriter.addDocument(document);
		}
		indexWriter.optimize();
		indexWriter.close();
		System.out.println("indexed "+deviceIds.length+" items");
	}
	
	public void search() throws Exception {
		Directory directory = FSDirectory.getDirectory(HelloLucene.INDEX_FILES_PATHSTRING);
		IndexSearcher indexSearcher = new IndexSearcher(directory);
		// Parse "29" as a query against the "group" field, using the same analyzer as at index time.
		QueryParser queryParser = new QueryParser("group", new StandardAnalyzer());
		Query query = queryParser.parse("29");
		Hits hits = indexSearcher.search(query);
		for(int i=0;i<hits.length();i++) {
			Document hitDocument = hits.doc(i);
			System.out.println("deviceId:" + hitDocument.get("deviceId"));
			System.out.println("attrs:" + hitDocument.get("attrs"));
			System.out.println("group:" + hitDocument.get("group"));
			System.out.println("---------------------------------------------------------");
		}
		indexSearcher.close();
		directory.close();
	}
	
	
	public static void main(String[] args) throws Exception {
		HelloLucene helloLucene = new HelloLucene();
		helloLucene.pre();
		helloLucene.index();
		helloLucene.search();
	}
}
Searching with the group value 29 does not return the records whose group is 2983, 29838, and so on, so it appears Lucene has tokenized the group field (splitting it on the space character ' ').
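
To double-check how StandardAnalyzer splits a group value, the token stream can be printed directly. A minimal sketch (not part of the original test class; it needs imports for java.io.StringReader and org.apache.lucene.analysis.Analyzer, Token, TokenStream):

    // Print the tokens StandardAnalyzer produces for one group value.
    Analyzer analyzer = new StandardAnalyzer();
    TokenStream tokenStream = analyzer.tokenStream("group", new StringReader("232 2977 29"));
    Token token;
    while ((token = tokenStream.next()) != null) {
        System.out.println(token.termText());   // prints 232, 2977 and 29, each on its own line
    }
    tokenStream.close();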


The next step is to learn how to build more complex queries, for example: an item should be found if its group id is any one of the set {2983, 29, 2977}.
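
One likely way (a sketch, not verified here) is a BooleanQuery with one SHOULD TermQuery clause per group id, so a document matches as soon as any one of the ids appears in its group field. The names wantedGroupIds and groupQuery are mine; the imports needed are org.apache.lucene.index.Term plus BooleanClause, BooleanQuery and TermQuery from org.apache.lucene.search:

    // Match any item whose group field contains at least one id from {2983, 29, 2977}.
    String[] wantedGroupIds = new String[]{"2983", "29", "2977"};
    BooleanQuery groupQuery = new BooleanQuery();
    for (int i = 0; i < wantedGroupIds.length; i++) {
        // SHOULD: each clause is optional, but with no MUST clauses present
        // a document must match at least one SHOULD clause to be a hit.
        groupQuery.add(new TermQuery(new Term("group", wantedGroupIds[i])), BooleanClause.Occur.SHOULD);
    }
    Hits hits = indexSearcher.search(groupQuery);

    // QueryParser should build an equivalent query, since its default operator is OR:
    // Query query = queryParser.parse("group:(2983 29 2977)");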



