Lucene: Creating an Index

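I came across the following passage in Lucene in Action. I had planned to translate it, but a translation into Chinese loses the flavor of the original.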

What is indexing, and why is it important?

Suppose you needed to search a large number of files, and you wanted to be able to find files that contained a certain word or a phrase. How would you go about writing a program to do this? A naïve approach would be to sequentially scan each file for the given word or phrase. This approach has a number of flaws, the most obvious of which is that it doesn’t scale to larger file sets or cases where files are very large. This is where indexing comes in: To search large amounts of text quickly, you must first index that text and convert it into a format that will let you search it rapidly, eliminating the slow sequential scanning process. This conversion process is called indexing, and its output is called an index.

You can think of an index as a data structure that allows fast random access to words stored inside it. The concept behind it is analogous to an index at the end of a book, which lets you quickly locate pages that discuss certain topics. In the case of Lucene, an index is a specially designed data structure, typically stored on the file system as a set of index files. We cover the structure of index files in detail in appendix B, but for now just think of a Lucene index as a tool that allows quick word lookup.
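
To make the "quick word lookup" idea concrete, here is a minimal sketch of an inverted index in plain Java. It is only an illustration and not how Lucene stores its index files: the class name TinyInvertedIndex, its methods, and the simple split-on-punctuation tokenizer are all assumptions made for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; it is not part of Lucene.
class TinyInvertedIndex {
    // term -> ids of the documents that contain that term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Indexing: split the text into lowercase words once and record where each word occurs. */
    int add(String text) {
        int docId = nextDocId++;
        // Assumed tokenizer: anything that is not a letter or digit separates words.
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    /** Searching: a single map lookup instead of a scan over every document. */
    Set<Integer> search(String word) {
        TreeSet<Integer> ids = postings.get(word.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Integer>emptySet() : Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("Lucene Core provides Java-based indexing and search technology");
        index.add("Solr is a high performance search server built using Lucene Core");
        index.add("PyLucene is a Python port of the Core project.");

        System.out.println(index.search("lucene")); // [0, 1]
        System.out.println(index.search("python")); // [2]
        System.out.println(index.search("hadoop")); // []
    }
}
```

The cost of splitting the text is paid once, at indexing time; every later search is a lookup keyed by the word, which is the property the passage above describes.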

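Steps for creating an index:

1. Create a Directory: decide whether the index lives on disk or in memory

2. Create an IndexWriter

3. Create a Document object: decide how the item being indexed (name, path, size, modification time, content) is represented

4. Add Fields to the Document

5. Add the document to the index through the IndexWriter

6. Close the IndexWriter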

Sample code:

package com.java.lucene.index;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

public class MyIndex {
	private String[] ids = {"1","2","3","4","5","6"};
	private String[] names = {"tian","bao","xing","zhen","kun","xing"};
	private String[] emails = {"aa@qq.com","bb@qq.com","cc@qq.com",
			"dd@qq.com","ee@qq.com","ff@qq.com"};
	private String[] contents = {
			"Lucene Core, our flagship sub-project, provides Java-based indexing and search technology",
			"Solr is a high performance search server built using Lucene Core, with XML/HTTP and ",
			"Open Relevance Project is a subproject ",
			"PyLucene is a Python port of the Core project.",
			"22 July 2012 - Apache Lucene 3.6.1 and Apache Solr 3.6.1 available",
			"Lucene 3.6.1 Release Highlights"
	};
	
	private Directory directory = null;
	
	public MyIndex(){
		try {
			// 1. Create the Directory: decide whether the index lives on disk or in memory
//			Directory directory = new RAMDirectory(); // index held in memory
			directory = FSDirectory.open(new File("d:/tools/lucene/index02"));
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
	
	public void index() {
		IndexWriter writer = null;
		try {
			// 2. Create the IndexWriter
			writer = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
			Document doc = null;
			for(int i=0;i<ids.length;i++) {
				// 3. Create the Document object: how the indexed item (name, path, size, modification time, content) is represented
				doc = new Document();
				// 4. Add Fields to the Document
				doc.add(new Field("id",ids[i],Field.Store.YES,Field.Index.NOT_ANALYZED_NO_NORMS));
				doc.add(new Field("email",emails[i],Field.Store.YES,Field.Index.NOT_ANALYZED));
				doc.add(new Field("content",contents[i],Field.Store.NO,Field.Index.ANALYZED));
				doc.add(new Field("name",names[i],Field.Store.YES,Field.Index.NOT_ANALYZED_NO_NORMS));
				// 5. Add the document to the index through the IndexWriter
				writer.addDocument(doc);
			}
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (LockObtainFailedException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			try {
				if(writer!=null){
					// 6. Close the writer
					writer.close();
				}
			} catch (CorruptIndexException e) {
				e.printStackTrace();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
	
	
}
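
The Field.Index flags in the sample decide which terms end up in the index. An ANALYZED field (content) is run through the analyzer and broken into tokens, while a NOT_ANALYZED field (id, email, name) is indexed as one term exactly as given; Field.Store.YES or NO separately controls whether the original value is kept for retrieval. The plain-Java sketch below only imitates that difference. FieldTermsSketch and its methods are made-up names, and the tokenizer is a rough stand-in: the real StandardAnalyzer has its own tokenization rules and also drops stop words.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; it does not use or reproduce the Lucene API.
class FieldTermsSketch {
    /** Rough stand-in for an analyzed field: lowercase the value and split it into words. */
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Stand-in for a not-analyzed field: the whole value is one term, unchanged. */
    static List<String> notAnalyzedTerms(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        // Words become individually searchable (a real analyzer would also drop "is" and "a").
        System.out.println(analyzedTerms("Open Relevance Project is a subproject "));
        // prints [open, relevance, project, is, a, subproject]

        // The value can only be matched as a whole, which suits ids and e-mail addresses.
        System.out.println(notAnalyzedTerms("aa@qq.com"));
        // prints [aa@qq.com]
    }
}
```

This is why the sample analyzes only the free-text content field: a search for a single word can then match it, whereas the id, email, and name fields are meant to be matched exactly.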

 

 

 


 

Reposted from: https://my.oschina.net/winHerson/blog/71904
