How to Delete and Restore Records in Apache Lucene

sinton

Tags: lucene apache delete full-text search list url

Article link: https://blog.csdn.net/sinton/article/details/1336436

There are two ways to delete documents, as follows:

The delete(int) method is used when the sequential number of the document to be deleted within the index is known. For example, when iterating the document list and deleting documents that match certain criteria, the sequential number of the current document is available for the document number iterator.

The delete(Term) method can be used when a term that matches exactly the document(s) you want to delete can be specified. For example, if you know the location (URL) of the document, you can use it to delete the document of that location. Or, if you want to delete all the documents from a specific site, have in your index a 'site' field that contains the host name of the site so you can delete all the documents of that site in a single and quick operation.

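About the first method: after a new full-text index is created, the records added to it are numbered from 0 by default, so the first record is number 0, the second is number 1, and so on. The argument of delete(int) is this number. For example, after adding three records, delete(1) deletes the second one.

The code is as follows: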

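import org.apache.lucene.index.IndexReader;

// Open the index, mark document number 1 (the second record) as deleted, then close the reader.
IndexReader indexReader = IndexReader.open("g://indexdb//db1");
indexReader.delete(1);
indexReader.close();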
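
The numbering rule, and the "iterate and delete what matches" use case from the quoted documentation, can be illustrated with a small self-contained sketch. This is a toy in-memory model in plain Java, not Lucene's IndexReader: the class NumberedDocs and all of its methods are hypothetical names made up for illustration, and it models only the numbering and mark-as-deleted behaviour described above.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model only: NOT Lucene's IndexReader. All names here are hypothetical.
class NumberedDocs {
    private final List<String> urls = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    // Documents are numbered in the order they are added, starting at 0.
    int add(String url) {
        urls.add(url);
        return urls.size() - 1;
    }

    // Mark the document with the given number as deleted.
    void delete(int docNum) {
        if (docNum < 0 || docNum >= urls.size()) {
            throw new IndexOutOfBoundsException("no document " + docNum);
        }
        deleted.set(docNum);
    }

    boolean isDeleted(int docNum) { return deleted.get(docNum); }

    // One greater than the highest document number ever assigned.
    int maxDoc() { return urls.size(); }

    // Number of documents that are not marked as deleted.
    int numDocs() { return urls.size() - deleted.cardinality(); }

    String url(int docNum) { return urls.get(docNum); }

    // Walk the document numbers and delete every live document whose URL starts with prefix.
    int deleteByPrefix(String prefix) {
        int count = 0;
        for (int i = 0; i < maxDoc(); i++) {
            if (!isDeleted(i) && url(i).startsWith(prefix)) {
                delete(i);
                count++;
            }
        }
        return count;
    }
}
```

In this model, adding three documents and calling delete(1) marks the second one, and the other two keep their numbers 0 and 2.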

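About the second method: it is suited to batch deletion, removing every record that matches a condition, and it is used more often than the first method. A term is a basic concept in Lucene and is expressed as a key-value pair (a field name and a value).

The code is as follows: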

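import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

// Delete every document whose "filename" field is exactly "doc1.txt".
IndexReader indexReader = IndexReader.open("g://indexdb//db1");
Term term = new Term("filename","doc1.txt");
indexReader.delete(term);
indexReader.close();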

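The term expresses the condition that the records to delete must satisfy. In the example above, every record whose filename is doc1.txt is deleted (and there may well be more than one such record).

A record cannot be modified once it is in the index. To change it, you have to delete it first and then add it again.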
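
The delete-then-add pattern can be sketched as follows. This is again a toy in-memory model in plain Java, not Lucene's API: TermDeleteSketch, update and contentsOf are hypothetical names, and the "contents" field is an assumption added for illustration. It shows only that a term-style delete can remove several records at once and that an update is a delete followed by an add.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT Lucene's API. Class and method names are hypothetical.
class TermDeleteSketch {
    // Each document is a map from field name to value.
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(String filename, String contents) {
        Map<String, String> doc = new HashMap<>();
        doc.put("filename", filename);
        doc.put("contents", contents);
        docs.add(doc);
    }

    // Delete every document whose field exactly equals value; return how many were removed.
    int delete(String field, String value) {
        int before = docs.size();
        docs.removeIf(doc -> value.equals(doc.get(field)));
        return before - docs.size();
    }

    // "Modify" a record the only way available: delete the old version(s), then add the new one.
    int update(String filename, String newContents) {
        int removed = delete("filename", filename);
        add(filename, newContents);
        return removed;
    }

    // Return the contents of every document with the given filename.
    List<String> contentsOf(String filename) {
        List<String> result = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (filename.equals(doc.get("filename"))) {
                result.add(doc.get("contents"));
            }
        }
        return result;
    }
}
```

If two records share the filename doc1.txt, update("doc1.txt", ...) in this model removes both and leaves a single new record, which matches the note above that a term may match more than one record.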

Restoring Documents

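Because the deletion of a Document is deferred until the IndexReader instance is closed, Lucene lets a program change its mind and restore Documents that have been marked as deleted. Calling the undeleteAll() method of IndexReader restores all deleted Documents by removing the .del file from the index directory, so the Documents are still in the index after the IndexReader instance is closed. undeleteAll() can restore Documents only when it is called on the same IndexReader instance that was used to delete them.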
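
The mark, undo, close sequence can be pictured with one more toy model. It is plain Java, not Lucene: PendingDeleteSketch and storedDocs are hypothetical names, and applying the marks by removing list entries on close() is a deliberate simplification that says nothing about how Lucene stores deletions on disk. It illustrates only the behaviour described above: marks can be cleared before the reader is closed, and not afterwards.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model only: NOT Lucene's IndexReader. All names here are hypothetical.
class PendingDeleteSketch {
    private final List<String> stored = new ArrayList<>(); // stands in for the index contents
    private final BitSet pending = new BitSet();           // deletion marks held by this reader
    private boolean closed = false;

    PendingDeleteSketch(List<String> initial) {
        stored.addAll(initial);
    }

    // Only mark the document; nothing is removed yet.
    void delete(int docNum) {
        ensureOpen();
        pending.set(docNum);
    }

    // Change of mind: clear every deletion mark made through this reader.
    void undeleteAll() {
        ensureOpen();
        pending.clear();
    }

    // Closing applies whatever marks are still set; after that nothing can be undone here.
    void close() {
        ensureOpen();
        for (int i = stored.size() - 1; i >= 0; i--) {
            if (pending.get(i)) {
                stored.remove(i); // remove(int index); iterating backwards keeps indices valid
            }
        }
        pending.clear();
        closed = true;
    }

    List<String> storedDocs() {
        return new ArrayList<>(stored);
    }

    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("reader is closed");
        }
    }
}
```

Calling delete(1), then undeleteAll(), then close() leaves all documents in place in this model, while delete(1) followed directly by close() removes the second one.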