Lucene notes:
According to the Lucene API, both IndexReader and IndexWriter can delete specified documents.
IndexReader:
void | deleteDocument(int docNum) | Deletes the document numbered docNum.
int | deleteDocuments(Term term) | Deletes all documents that have a given term indexed.
IndexWriter:
void | deleteDocuments(Query... queries) | Deletes the document(s) matching any of the provided queries.
void | deleteDocuments(Query query) | Deletes the document(s) matching the provided query.
void | deleteDocuments(Term... terms) | Deletes the document(s) containing any of the terms.
void | deleteDocuments(Term term) | Deletes the document(s) containing term.
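In a real application, though, should we delete documents with IndexReader or with IndexWriter?
The javadoc for the IndexReader delete methods says: "Once a document is deleted it will not appear in TermDocs or TermPositions enumerations."
In other words, a deletion made through a reader takes effect immediately for that reader, whereas a deletion made through a writer is buffered. It becomes visible only after it has been written to the index files and a reader has been opened again.
Demonstration: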
// Lucene 3.0 API; the org.apache.lucene.* and java.io imports are omitted here.
IndexWriter w = null;
IndexSearcher sea = null;
try {
    Directory dir = SimpleFSDirectory.open(new File("d:/realtime"));
    // No analyzer is needed because this writer is only used for deletes.
    w = new IndexWriter(dir, null, MaxFieldLength.UNLIMITED);
    sea = new IndexSearcher(dir);
    QueryParser parser = new QueryParser(Version.LUCENE_30, "f", new StandardAnalyzer(Version.LUCENE_30));
    String queryString = "fox6";
    Query query = parser.parse(queryString);
    TopDocs docs = sea.search(query, 1);
    ScoreDoc[] sd = docs.scoreDocs;
    for (int i = 0; i < sd.length; i++) {
        System.out.println("sea 1 : " + sea.doc(sd[i].doc));
    }
    // Before the delete, the document can be found.
    // Perform the delete.
    System.out.println("delete : " + queryString);
    w.deleteDocuments(new Term("f", queryString));
    System.out.println("delete finish ...");
    // After the delete, the document can still be found
    // (a delete through IndexWriter does not take effect immediately).
    docs = sea.search(query, 1);
    sd = docs.scoreDocs;
    for (int i = 0; i < sd.length; i++) {
        System.out.println("sea 2 : " + sea.doc(sd[i].doc));
    }
} catch (IOException e) {
    e.printStackTrace();
} catch (ParseException e) {
    e.printStackTrace();
} finally {
    if (w != null) {
        try {
            w.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    if (sea != null) {
        try {
            sea.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
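After deleting with IndexWriter, the deleted document can still be found. The deletion is buffered inside the current IndexWriter instance and has not yet taken effect.
Note:
Even if the IndexWriter commits after the delete (commit()), the deletion still does not take effect for an IndexSearcher that is already open. The searcher keeps its existing view of the index until it is opened again.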
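The visibility rule can be easier to follow in a stripped-down form. The sketch below is a toy model built from plain Java collections, not Lucene code: the class ToyDeleteIndex and its methods writerDelete, commit and openReader are hypothetical names made up for illustration. It assumes only what the text states, namely that writer deletes are buffered and that an open reader keeps the view it had when it was opened.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model only: this is NOT Lucene code. It mimics the visibility rules
// described above with plain Java collections.
class ToyDeleteIndex {
    // Documents in the committed index (what a newly opened reader would see).
    private final Set<String> committed = new HashSet<>();
    // Deletes buffered by the "writer" and not yet committed.
    private final Set<String> pendingDeletes = new HashSet<>();

    ToyDeleteIndex(String... docs) {
        committed.addAll(Arrays.asList(docs));
    }

    // Writer-style delete: buffered only, the committed index is untouched.
    void writerDelete(String doc) {
        pendingDeletes.add(doc);
    }

    // Writer-style commit: apply the buffered deletes to the committed index.
    void commit() {
        committed.removeAll(pendingDeletes);
        pendingDeletes.clear();
    }

    // Opening a reader copies the committed index: a point-in-time snapshot.
    Set<String> openReader() {
        return new HashSet<>(committed);
    }

    public static void main(String[] args) {
        ToyDeleteIndex index = new ToyDeleteIndex("fox6", "fox7");
        Set<String> searcher = index.openReader();

        index.writerDelete("fox6");
        System.out.println("after delete: " + searcher.contains("fox6")); // true
        index.commit();
        System.out.println("after commit: " + searcher.contains("fox6")); // true
        searcher = index.openReader(); // stands in for reopening the reader
        System.out.println("after reopen: " + searcher.contains("fox6")); // false
    }
}
```

In Lucene 3.0 itself, the step that openReader() stands in for would be something like IndexReader.reopen() or building a new IndexSearcher after the writer has committed.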
Now let's look at the same operation performed with IndexReader:
// Lucene 3.0 API; the org.apache.lucene.* and java.io imports are omitted here.
IndexReader reader = null;
IndexSearcher sea = null;
try {
    Directory dir = SimpleFSDirectory.open(new File("d:/realtime"));
    // readOnly = false, so that this reader is allowed to delete.
    reader = IndexReader.open(dir, false);
    sea = new IndexSearcher(reader);
    QueryParser parser = new QueryParser(Version.LUCENE_30, "f", new StandardAnalyzer(Version.LUCENE_30));
    String queryString = "fox7";
    Query query = parser.parse(queryString);
    TopDocs docs = sea.search(query, 1);
    ScoreDoc[] sd = docs.scoreDocs;
    for (int i = 0; i < sd.length; i++) {
        System.out.println(reader.document(sd[i].doc));
    }
    // Before the delete, the document can be found.
    // Perform the delete.
    System.out.println("delete : " + queryString);
    reader.deleteDocuments(new Term("f", queryString));
    System.out.println("delete finish ...");
    // After the delete, the document can no longer be found
    // (a delete through the reader takes effect immediately).
    docs = sea.search(query, 1);
    sd = docs.scoreDocs;
    for (int i = 0; i < sd.length; i++) {
        System.out.println(reader.document(sd[i].doc));
    }
} catch (IOException e) {
    e.printStackTrace();
} catch (ParseException e) {
    e.printStackTrace();
} finally {
    if (reader != null) {
        try {
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    if (sea != null) {
        try {
            sea.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
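After deleting with IndexReader, the deleted document can no longer be found: the deletion has already taken effect for the current IndexReader instance.
The examples above show clearly how IndexWriter and IndexReader differ when deleting documents. Even so, the recommendation here is:
Use IndexWriter for deletions in your application!
Why?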
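1. In real-time indexing, an on-disk index is usually kept open by an IndexWriter. While the IndexWriter has the index open, an IndexReader cannot delete from it; attempting to do so throws LockObtainFailedException.
2. A Lucene update is always a delete followed by an add. If the delete goes through IndexReader, it takes effect immediately while the add is not yet visible, which leaves the data in a confusing state (or at least one that is confusing to reason about).
For example: a user updates document A. IndexReader deletes A first, and then the modified A is added. If the IndexSearcher has not yet been reopened to pick up the added document, a search for A returns nothing, and the user sees inconsistent data.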
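The second reason can be sketched with another toy model. Again, this is not Lucene code: ToyUpdateIndex and its methods are hypothetical names, and persisting a reader delete (which in Lucene happens when the reader is closed) is folded into a single step as a simplifying assumption. The model compares the mixed path (reader delete plus writer add) with an update done entirely through the writer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: this is NOT Lucene code. Document ids map to document text.
class ToyUpdateIndex {
    private final Map<String, String> committed = new HashMap<>();
    private final Set<String> pendingDeletes = new HashSet<>();
    private final Map<String, String> pendingAdds = new HashMap<>();

    ToyUpdateIndex(String id, String text) {
        committed.put(id, text);
    }

    // A searcher sees a private copy of the committed index.
    Map<String, String> openSearcher() {
        return new HashMap<>(committed);
    }

    // Reader-style delete: visible at once in the searcher built on that reader.
    // Simplification: the delete is also applied to the committed index here.
    void readerDelete(Map<String, String> searcher, String id) {
        searcher.remove(id);
        committed.remove(id);
    }

    // Writer-style add: buffered until commit.
    void writerAdd(String id, String text) {
        pendingAdds.put(id, text);
    }

    // Writer-style update: the delete and the add are buffered together.
    void writerUpdate(String id, String text) {
        pendingDeletes.add(id);
        pendingAdds.put(id, text);
    }

    // Apply buffered deletes first, then buffered adds.
    void commit() {
        committed.keySet().removeAll(pendingDeletes);
        committed.putAll(pendingAdds);
        pendingDeletes.clear();
        pendingAdds.clear();
    }

    public static void main(String[] args) {
        // Path 1: delete through the reader, add through the writer.
        ToyUpdateIndex mixed = new ToyUpdateIndex("A", "old");
        Map<String, String> s1 = mixed.openSearcher();
        mixed.readerDelete(s1, "A");
        mixed.writerAdd("A", "new");
        System.out.println("mixed, before reopen:  " + s1.get("A")); // null

        // Path 2: the whole update goes through the writer.
        ToyUpdateIndex viaWriter = new ToyUpdateIndex("A", "old");
        Map<String, String> s2 = viaWriter.openSearcher();
        viaWriter.writerUpdate("A", "new");
        System.out.println("writer, before reopen: " + s2.get("A")); // old
        viaWriter.commit();
        System.out.println("writer, after reopen:  " + viaWriter.openSearcher().get("A")); // new
    }
}
```

On the mixed path the searcher briefly finds no version of A at all, while on the writer path it sees the old version until the reopen and the new version afterwards. In Lucene itself, IndexWriter.updateDocument(Term, Document) performs the delete-then-add as a single writer operation.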