Lucene Java: Lucene Notes 2 -- Main Classes

江布郎

Published 2021-03-12 19:32:09


1. Main classes in Lucene

1.1. The Document class

1.1.1. Common methods

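| Method | Description |
|---|---|
| void add(Field field) | Adds a field to the Document. |
| void removeField(String name) | Removes a field. If several fields share the same name, the one added first is removed; if no such field exists, the Document is left unchanged. |
| void removeFields(String name) | Removes all fields with the given name. If no such field exists, the Document is left unchanged. |
| Field getField(String name) | Returns the field with the given name. If several fields share the same name, the one added first is returned; if no such field exists, null is returned. |
| Enumeration fields() | Returns all fields of the Document as an Enumeration. |
| Field[] getFields(String name) | Returns an array of the Fields with the given name. |
| String[] getValues(String name) | Returns an array of the values of the Fields with the given name. |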
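The "first added" rules in the table are easy to mix up, so here is a minimal sketch of the same bookkeeping in plain Java. ToyDocument and its methods are hypothetical stand-ins written for this note, not Lucene classes; they only mimic the behaviour described above for fields that share a name.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Lucene's Document, just the same bookkeeping rules.
final class ToyDocument {
    // Each entry is {name, value}, kept in the order the fields were added.
    private final List<String[]> fields = new ArrayList<>();

    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // Removes only the first-added field with this name; does nothing if absent.
    void removeField(String name) {
        for (int i = 0; i < fields.size(); i++) {
            if (fields.get(i)[0].equals(name)) {
                fields.remove(i);
                return;
            }
        }
    }

    // Removes every field with this name; does nothing if absent.
    void removeFields(String name) {
        fields.removeIf(f -> f[0].equals(name));
    }

    // Value of the first-added field with this name, or null if there is none.
    String get(String name) {
        for (String[] f : fields) {
            if (f[0].equals(name)) return f[1];
        }
        return null;
    }

    // Values of all fields with this name, in insertion order.
    List<String> getValues(String name) {
        List<String> values = new ArrayList<>();
        for (String[] f : fields) {
            if (f[0].equals(name)) values.add(f[1]);
        }
        return values;
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument();
        doc.add("author", "first");
        doc.add("author", "second");
        doc.removeField("author");                  // drops only "first"
        System.out.println(doc.getValues("author"));
        doc.removeFields("author");                 // drops whatever is left
        System.out.println(doc.get("author"));
    }
}
```

The list keeps insertion order, which is what makes "the one added first" well defined.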

1.1.2. Example

    Document doc1 = new Document();
    doc1.add(new Field("name", "word1 word2 word3",
            Field.Store.NO, Field.Index.TOKENIZED));

    Document doc2 = new Document();
    doc2.add(new Field("name", "word1 word2 word3",
            Field.Store.NO, Field.Index.TOKENIZED));

1.2. The Field class

1.2.1. Constructors

1) public Field(String name, String value, Store store, Index index) // value passed directly as a String

2) public Field(String name, String value, Store store, Index index, TermVector termVector)

3) public Field(String name, Reader reader) // value supplied from outside through a Reader

4) public Field(String name, Reader reader, TermVector termVector)

5) public Field(String name, byte[] value, Store store) // value passed directly as binary bytes

When a Field's value is binary, Lucene's compression feature can be used to compress it.

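1.2.2. The Store class

| Static field | Description |
|---|---|
| Store.NO | The Field's value is not stored. |
| Store.YES | The Field's value is stored. |
| Store.COMPRESS | The Field's value is stored in compressed form. |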

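1.2.3. The Index class

| Static field | Description |
|---|---|
| Index.NO | The Field is not indexed. |
| Index.TOKENIZED | The Field is tokenized first and then indexed. |
| Index.UN_TOKENIZED | The Field is not tokenized, but it is still indexed. |
| Index.NO_NORMS | The Field is indexed without using an Analyzer and without storing norms, so it does not take part in length normalization or index-time boosting during scoring. The main purpose is to reduce memory consumption. |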
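To make the difference between the index options concrete, the following plain-Java sketch shows which terms a field value would contribute under each one. IndexModeSketch and its Mode enum are hypothetical names invented for this note, and the "analyzer" is just a lowercase whitespace split; a real Lucene Analyzer does considerably more.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: mirrors the idea behind Index.NO / TOKENIZED / UN_TOKENIZED.
final class IndexModeSketch {
    enum Mode { NO, TOKENIZED, UN_TOKENIZED }

    // Returns the terms that would end up in the index for one field value.
    static List<String> terms(String value, Mode mode) {
        switch (mode) {
            case NO:
                return Collections.emptyList();           // nothing is searchable
            case UN_TOKENIZED:
                return Collections.singletonList(value);  // the whole value is one term
            default:                                      // TOKENIZED
                List<String> out = new ArrayList<>();
                for (String t : value.trim().split("\\s+")) {
                    if (!t.isEmpty()) out.add(t.toLowerCase(Locale.ROOT));
                }
                return out;
        }
    }

    public static void main(String[] args) {
        String value = "word1 word2 word3";
        for (Mode m : Mode.values()) {
            System.out.println(m + " -> " + terms(value, m));
        }
    }
}
```

In this model a search for word1 can only match the TOKENIZED variant, while an exact match on the whole string can only match the UN_TOKENIZED variant.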

1.2.4. Example

    new Field("name", "word1 word2 word3", Field.Store.YES, Field.Index.TOKENIZED)

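1.3. The IndexWriter class

1.3.1. Constructors

1) public IndexWriter(String path, Analyzer a, boolean create)

2) public IndexWriter(File path, Analyzer a, boolean create)

3) public IndexWriter(Directory d, Analyzer a, boolean create)

First parameter: where the index is stored.

Second parameter: the analyzer, a subclass of org.apache.lucene.analysis.Analyzer.

Third parameter: when true, IndexWriter clears any index that already exists in the directory and builds a new one; when false, IndexWriter adds to the existing index incrementally. When updating an index, this value therefore has to be false.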

1.3.2. Adding documents

    public void addDocument(Document doc)
    // Analyzes this document with a caller-supplied Analyzer instead of the one
    // declared when the IndexWriter was constructed
    public void addDocument(Document doc, Analyzer analyzer)

    writer.addDocument(doc1);

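1.3.3. Performance parameters

1) mergeFactor controls how many Documents Lucene holds in memory before writing the index to the file system on disk; it also controls how many segments may accumulate before they are merged. The default is 10.

    writer.setMergeFactor(10);

2) maxMergeDocs limits the maximum number of documents in one segment. A larger maxMergeDocs suits bulk indexing of a large batch of documents; incremental indexing should use a smaller maxMergeDocs.

    writer.setMaxMergeDocs(1000);

3) minMergeDocs controls the number of documents held in memory; it has no effect on the size of segments on disk.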

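1.3.4. Limiting the length of a Field

maxFieldLength limits how many terms of a Field are indexed. The default is 10000; the example below raises it to 100000 before the second document is added.

    public void setMaxFieldLength(int maxFieldLength)

    writer.addDocument(doc1);          // indexed with the current limit
    writer.setMaxFieldLength(100000);
    writer.addDocument(doc2);          // indexed with the new limit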
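The effect of such a limit can be pictured with a small plain-Java sketch: only the first maxFieldLength terms of a field are kept and the rest are silently dropped. FieldLengthSketch is a hypothetical helper written for this note, and it counts whitespace-separated tokens as a simplifying assumption rather than terms produced by a real Analyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: shows truncation at a per-field term limit.
final class FieldLengthSketch {
    // Keeps at most maxFieldLength tokens of the value, in order.
    static List<String> indexedTerms(String value, int maxFieldLength) {
        List<String> kept = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            if (kept.size() >= maxFieldLength) break;   // everything after the limit is ignored
            if (!token.isEmpty()) kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        String value = "a b c d e";
        System.out.println(indexedTerms(value, 3));    // only the first three tokens survive
        System.out.println(indexedTerms(value, 100));  // limit not reached, all tokens kept
    }
}
```

A term that falls beyond the limit cannot be found by a search, which is why the limit is usually raised before indexing very long documents.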

1.3.5. Compound index format

setUseCompoundFile(boolean), default true

    writer.setUseCompoundFile(true);   // use the compound index format
    writer.setUseCompoundFile(false);  // do not use the compound index format

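1.3.6. Optimizing the index

    writer.optimize();

This merges the multiple segments on disk into a single new segment. It does not make indexing faster; on the contrary, it slows indexing down, so it should be called only after the index has been fully built.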

1.3.7. Example

    IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), true);
    writer.addDocument(doc1);
    writer.addDocument(doc2);
    System.out.println(writer.docCount());
    writer.close();

    IndexSearcher searcher = new IndexSearcher(path);
    Hits hits = null;
    Query query = null;
    QueryParser parser = new QueryParser("name", new StandardAnalyzer());
    query = parser.parse("word1");
    hits = searcher.search(query);
    System.out.println("Searching for word1 found " + hits.length() + " results");

1.4. The Directory class

Directory: the location where the index is stored.

a) FSDirectory.getDirectory(path, true): the second parameter means the existing contents of the directory are deleted.

    // deletes the existing index
    IndexWriter writer = new IndexWriter(FSDirectory.getDirectory(path, true), new StandardAnalyzer(), true);

or

    FSDirectory fsDir = FSDirectory.getDirectory(path, true);
    IndexWriter writer = new IndexWriter(fsDir, new StandardAnalyzer(), true);

b) RAMDirectory keeps the index in memory. Reads are fast, but the contents are gone as soon as the program exits.

    RAMDirectory ramDir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(ramDir, new StandardAnalyzer(), true);

or

    IndexWriter writer = new IndexWriter(new RAMDirectory(), new StandardAnalyzer(), true);

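1.5. The IndexReader class

The IndexReader class is the tool for reading an index.

1.5.1. Deleting a document

    IndexReader reader = IndexReader.open(path);
    reader.deleteDocument(0); // delete the first document
    reader.close();

1.5.2. Undeleting

    reader.undeleteAll();

1.5.3. Deleting by field

    reader.deleteDocuments(new Term("name", "word1"));

Deleting through IndexReader only marks documents as deleted, which is why they can still be undeleted. To remove them physically, it is enough to optimize the index once with IndexWriter.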
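The mark-then-compact behaviour described here can be sketched in a few lines of plain Java. SoftDeleteSketch is a hypothetical toy written for this note, not Lucene's implementation; it only illustrates why undeleteAll works before an optimize and no longer has anything to restore afterwards.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model only: deletion sets a flag, optimize() physically drops flagged documents.
final class SoftDeleteSketch {
    private List<String> docs = new ArrayList<>();
    private BitSet deleted = new BitSet();

    void add(String doc) {
        docs.add(doc);
    }

    // Marks a document as deleted; the document itself stays in place.
    void delete(int docId) {
        if (docId < 0 || docId >= docs.size()) {
            throw new IndexOutOfBoundsException("no such document: " + docId);
        }
        deleted.set(docId);
    }

    // Clears all deletion marks; only works for documents not yet compacted away.
    void undeleteAll() {
        deleted.clear();
    }

    // Number of slots, including documents that are only marked as deleted.
    int maxDoc() {
        return docs.size();
    }

    // Number of live documents.
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    // Rewrites the document list without the marked documents and renumbers the rest.
    void optimize() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) live.add(docs.get(i));
        }
        docs = live;
        deleted = new BitSet();
    }

    public static void main(String[] args) {
        SoftDeleteSketch index = new SoftDeleteSketch();
        index.add("doc A");
        index.add("doc B");
        index.delete(0);
        System.out.println(index.numDocs() + " live of " + index.maxDoc() + " slots");
        index.optimize();
        System.out.println(index.numDocs() + " live of " + index.maxDoc() + " slots");
    }
}
```

Note that document numbers shift after the compaction step in this model, so a number obtained before optimize should not be reused afterwards.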

1.5.4. Example

    IndexReader reader = IndexReader.open(path);
    // Print every document that has not been deleted
    for (int i = 0; i < reader.maxDoc(); i++) {
        if (reader.isDeleted(i)) continue;
        System.out.println(reader.document(i));
    }
    System.out.println("Version: " + reader.getVersion());
    System.out.println("Number of documents in the index: " + reader.numDocs());
    //reader.deleteDocuments(new Term("name","word1"));
    Term term1 = new Term("name", "word1");
    TermDocs docs = reader.termDocs(term1);
    while (docs.next()) {
        System.out.println("Number of the Document containing " + term1 + ": " + docs.doc());
        System.out.println("Occurrences of the Term in the document: " + docs.freq());
    }
    reader.close();

1.6. The IndexModifier class

Combines most of the functionality of IndexWriter with the index-deletion functionality of IndexReader. This class is new in Lucene 2.0.

1.6.1. Example

    public static void main(String[] args) throws Exception {
        IndexModifier modifier = new IndexModifier("C:\\Q1", new StandardAnalyzer(), true);
        Document doc1 = new Document();
        doc1.add(new Field("bookname", "钢铁是怎样炼成的", Field.Store.YES, Field.Index.TOKENIZED));
        Document doc2 = new Document();
        doc2.add(new Field("bookname", "山山水水", Field.Store.YES, Field.Index.TOKENIZED));
        modifier.addDocument(doc1);
        modifier.addDocument(doc2);
        System.out.println(modifier.docCount());
        modifier.setUseCompoundFile(false);
        modifier.close();

        // Reopen the same index without recreating it, then delete the first document
        IndexModifier mo = new IndexModifier("C:\\Q1", new StandardAnalyzer(), false);
        mo.deleteDocument(0);
        System.out.println(mo.docCount());
        mo.close();
    }

1.7. The IndexSearcher class

1.7.1. Constructors

    public IndexSearcher(String path)
    public IndexSearcher(Directory directory)
    public IndexSearcher(IndexReader r)
    public IndexSearcher(IndexReader r, boolean closeReader)

For example:

    IndexSearcher searcher = new IndexSearcher(path);

or

    IndexSearcher searcher = new IndexSearcher(FSDirectory.getDirectory(path, false));

1.7.2. The search methods

    // Return a Hits object
    public Hits search(Query query)
    public Hits search(Query query, Filter filter)
    public Hits search(Query query, Sort sort)
    public Hits search(Query query, Filter filter, Sort sort)

    // Return only the n highest-scoring Documents
    public TopDocs search(Query query, Filter filter, int n)
    public TopDocs search(Weight weight, Filter filter, int n)
    public TopFieldDocs search(Weight weight, Filter filter, int n, Sort sort)
    public TopFieldDocs search(Query query, Filter filter, int n, Sort sort)

    // Take a HitCollector and store the results in it
    public void search(Query query, HitCollector results)
    public void search(Query query, Filter filter, HitCollector results)
    public void search(Weight weight, Filter filter, HitCollector results)

1.7.3. The explain method of Searcher

    public Explanation explain(Query query, int doc) throws IOException

    for (int i = 0; i < hits.length(); i++) {
        Document d = hits.doc(i);
        System.out.println(i + " " + hits.score(i) + " " + d.get("contents"));
        // Explain how the score of this hit was computed
        System.out.println(searcher.explain(query, hits.id(i)).toString());
    }

1.7.4. Example

    IndexSearcher searcher = new IndexSearcher(path);
    Hits hits = null;
    Query query = null;
    QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
    query = parser.parse("11");
    hits = searcher.search(query);
    System.out.println("Searching for 11 found " + hits.length() + " results");
    for (int i = 0; i < hits.length(); i++) {
        Document d = hits.doc(i);
        System.out.println(d + " " + i + " " + hits.score(i) + " " + d.get("contents"));
    }
    searcher.close();

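1.8. The Hits class

1.8.1. Overview

The Hits class holds the search results.

1.8.2. Common methods

| Method | Description |
|---|---|
| int length() | Returns the total number of results found. |
| Document doc(int i) | Returns the i-th document. |
| int id(int i) | Returns the internal ID of the i-th document. |
| float score(int i) | Returns the score of the i-th document. |
| Iterator iterator() | Returns an iterator over the Hits collection. |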

1.8.3. Example

    for (int i = 0; i < hits.length(); i++) {
        Document d = hits.doc(i);
        System.out.println(d + " " + " " + hits.score(i) + " " + d.get("contents"));
        System.out.println("Internal ID of the document: " + hits.id(i));
    }

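1.9. The QueryParser class

1.9.1. Changing the default boolean logic

By default, terms are combined with OR:

    Query query = null;
    QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
    query = parser.parse("hello world!");
    System.out.println(query.toString());

Changing the default boolean logic:

    Query query = null;
    QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
    parser.setDefaultOperator(QueryParser.AND_OPERATOR);
    query = parser.parse("hello world"); // adding ! after world causes an error
    System.out.println(query.toString());

The AND, OR, and NOT keywords:

Instead of changing the default boolean logic, the user can state the boolean relationship between terms directly in the query string. For example, the user can enter hello AND world; the operator must be uppercase.

Logical AND: AND (uppercase)

Logical OR: OR (uppercase)

Logical NOT: -, for example: hello -world

NOT can also be used, for example: hello NOT world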
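What the default operator changes can be shown with a toy inverted index in plain Java: with OR, a document matches if it contains any of the query terms; with AND, it must contain all of them. DefaultOperatorSketch is a hypothetical class written for this note. It splits on whitespace only and knows nothing about Lucene's query syntax or scoring.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: term -> set of document numbers, combined with OR or AND.
final class DefaultOperatorSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // requireAll = false behaves like default OR, true like default AND.
    Set<Integer> search(String query, boolean requireAll) {
        Set<Integer> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) continue;
            Set<Integer> docs = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term seeds the result
            } else if (requireAll) {
                result.retainAll(docs);         // AND: intersection
            } else {
                result.addAll(docs);            // OR: union
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        DefaultOperatorSketch index = new DefaultOperatorSketch();
        index.add(0, "hello world");
        index.add(1, "hello lucene");
        index.add(2, "world peace");
        System.out.println("OR:  " + index.search("hello world", false));
        System.out.println("AND: " + index.search("hello world", true));
    }
}
```

In this model the OR search returns every document that mentions either word, while the AND search narrows the result to the single document that contains both.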

1.9.2. Searching without splitting into separate terms

To keep the input from being split into independent terms and have it handled as one complete phrase, wrap the phrase in quotation marks.

    String queryStr = "\"God helps those who help themselves\"";
    QueryParser parser = new QueryParser("bookname", new StandardAnalyzer());
    parser.setDefaultOperator(QueryParser.AND_OPERATOR);
    Query query = parser.parse(queryStr);
    System.out.println(query.toString());

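1.9.3. Setting the slop value, and FuzzyQuery support

    String queryStr = "\"God helps those who help themselves\"~1"; // set the slop to 1
    QueryParser parser = new QueryParser("bookname", new StandardAnalyzer());
    Query query = parser.parse(queryStr);
    System.out.println(query.toString());

A ~ after a quoted phrase sets the slop of the phrase; a ~ after a single term produces a FuzzyQuery.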

1.9.4. Using wildcards (WildcardQuery support)

    String queryStr = "wor?";
    QueryParser parser = new QueryParser("bookname", new StandardAnalyzer());
    parser.setDefaultOperator(QueryParser.AND_OPERATOR);
    Query query = parser.parse(queryStr);
    System.out.println(query.toString());
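As a reminder of what the wildcard characters mean, ? stands for exactly one character and * for any run of characters, including none. The plain-Java sketch below checks a term against such a pattern by translating it to a regular expression. WildcardSketch is a hypothetical helper for illustration only; it is not how Lucene evaluates a WildcardQuery against the index.

```java
import java.util.regex.Pattern;

// Toy model only: wildcard pattern -> regular expression -> full-term match.
final class WildcardSketch {
    static boolean matches(String wildcard, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');       // exactly one character
            } else if (c == '*') {
                regex.append(".*");      // zero or more characters
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));  // literal character
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("wor?", "word"));   // one character after "wor"
        System.out.println(matches("wor?", "world"));  // two characters after "wor"
        System.out.println(matches("wor*", "world"));  // any number of characters
    }
}
```

So the pattern wor? from the example can match word or work, but not world.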

1.9.5. Searching a specific Field

    String queryStr = "linux publishdate:2006-09-01";
    QueryParser parser = new QueryParser("bookname", new StandardAnalyzer());
    parser.setDefaultOperator(QueryParser.AND_OPERATOR);
    Query query = parser.parse(queryStr);
    System.out.println(query.toString());

This is useful, for example, when the user is asked to choose which aspect (field) to search.

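1.9.6. Range searches (RangeQuery support)

    String queryStr = "[1990-01-01 TO 1998-12-31]";
    QueryParser parser = new QueryParser("publishdate",
            new StandardAnalyzer());
    Query query = parser.parse(queryStr);
    System.out.println(query.toString());

The output is publishdate:[081xmghs0 TO 0boeetj3z]

The reason is that if dates are indexed as date-formatted strings, what actually gets compared is the lexicographic order of those strings. If the date is first converted to a time in milliseconds, two dates can be compared exactly. Lucene therefore provides the DateTools utility to handle time conversion internally: it turns a millisecond-level time into a long string and indexes that. So when dealing with date data, it is best to convert it with DateTools before indexing.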
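The point about lexicographic order can be checked with plain Java. The sketch below encodes a millisecond timestamp as a fixed-width base-36 string, in the spirit of the encoded bounds shown in the output above, so that string order agrees with time order. DateKeySketch and the width of 9 characters are assumptions made for this illustration; this is not the DateTools implementation.

```java
// Toy model only: fixed-width, zero-padded base-36 keys sort like the numbers they encode.
final class DateKeySketch {
    private static final int WIDTH = 9;  // assumed width, enough for roughly 3000 years of milliseconds

    static String encode(long epochMillis) {
        if (epochMillis < 0) {
            throw new IllegalArgumentException("dates before 1970 are not handled in this sketch");
        }
        String digits = Long.toString(epochMillis, Character.MAX_RADIX);
        if (digits.length() > WIDTH) {
            throw new IllegalArgumentException("timestamp too large for the fixed width");
        }
        StringBuilder key = new StringBuilder();
        for (int i = digits.length(); i < WIDTH; i++) {
            key.append('0');             // left-pad so that all keys have the same length
        }
        return key.append(digits).toString();
    }

    public static void main(String[] args) {
        // Naive date strings: '9' sorts after '1', so September appears later than October.
        System.out.println("2006-9-1".compareTo("2006-10-01") > 0);

        long start = 631152000000L;      // 1990-01-01T00:00:00Z
        long end = 915062400000L;        // 1998-12-31T00:00:00Z
        // Encoded keys: string comparison now agrees with chronological order.
        System.out.println(encode(start).compareTo(encode(end)) < 0);
    }
}
```

The padding is what makes this work: without a fixed width, a shorter key could sort after a longer one even though it encodes an earlier time.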

1.9.7. SpanQuery is not supported yet

1.10. The MultiFieldQueryParser class -- searching multiple fields

    // Run a different query on each Field
    public static Query parse(String[] queries, String[] fields, Analyzer analyzer) throws ParseException

    // Run the same query on different Fields and specify the boolean relationship between them
    public static Query parse(String query, String[] fields, BooleanClause.Occur[] flags, Analyzer analyzer) throws ParseException

    // Run a different query on each Field and specify the boolean relationship between them
    public static Query parse(String[] queries, String[] fields, BooleanClause.Occur[] flags, Analyzer analyzer) throws ParseException

    String[] queries = {"钢", "[10 TO 20]"};
    String[] fields = {"bookname", "price"};
    BooleanClause.Occur[] clauses = {BooleanClause.Occur.MUST, BooleanClause.Occur.MUST};
    Query query = MultiFieldQueryParser.parse(queries, fields, clauses, new StandardAnalyzer());
    System.out.println(query.toString());

1.11. The MultiSearcher class -- searching multiple indexes

    IndexSearcher searcher1 = new IndexSearcher(path1);
    IndexSearcher searcher2 = new IndexSearcher(path2);
    IndexSearcher[] searchers = {searcher1, searcher2};
    MultiSearcher searcher = new MultiSearcher(searchers);
    Hits hits = searcher.search(query);
    for (int i = 0; i < hits.length(); i++) {
        System.out.println(hits.doc(i));
    }

1.12. The ParallelMultiSearcher class -- multithreaded searching

    IndexSearcher searcher1 = new IndexSearcher(path1);
    IndexSearcher searcher2 = new IndexSearcher(path2);
    IndexSearcher[] searchers = {searcher1, searcher2};
    ParallelMultiSearcher searcher = new ParallelMultiSearcher(searchers);
    long start = System.currentTimeMillis();
    Hits hits = searcher.search(query);
    long end = System.currentTimeMillis();
    System.out.println((end - start) + "ms"); // elapsed search time
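The idea behind searching several indexes on separate threads can be sketched with the standard java.util.concurrent classes: each sub-index is searched by its own task and the partial results are merged afterwards. ParallelSearchSketch is a hypothetical toy in which an "index" is just a list of strings and a hit is any string containing the term; it makes no claim about how ParallelMultiSearcher is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: one search task per sub-index, results merged in index order.
final class ParallelSearchSketch {
    // Sequential search of a single "index".
    static List<String> searchOne(List<String> index, String term) {
        List<String> hits = new ArrayList<>();
        for (String doc : index) {
            if (doc.contains(term)) hits.add(doc);
        }
        return hits;
    }

    // Searches every index on its own thread and concatenates the hits.
    static List<String> searchAll(List<List<String>> indexes, String term) {
        List<String> merged = new ArrayList<>();
        if (indexes.isEmpty()) return merged;
        ExecutorService pool = Executors.newFixedThreadPool(indexes.size());
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> index : indexes) {
                Callable<List<String>> task = () -> searchOne(index, term);
                futures.add(pool.submit(task));
            }
            for (Future<List<String>> future : futures) {
                merged.addAll(future.get());   // waits for that sub-search to finish
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("sub-search failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> indexes = Arrays.asList(
                Arrays.asList("word1 word2", "word3"),
                Arrays.asList("word1 only", "nothing here"));
        System.out.println(searchAll(indexes, "word1"));
    }
}
```

Whether the parallel version is actually faster depends on the size of the indexes and on where they are stored, which is why the example above measures the elapsed time.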

This article comes from a CSDN blog; when reposting, please cite the source: http://blog.csdn.net/xiaoping8411/archive/2010/03/23/5409953.aspx
