[Lucene exception] Why am I getting a TooManyClauses exception?

The exception:

org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
 at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:165)
 at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:156)
 at org.apache.lucene.search.RangeQuery.rewrite(RangeQuery.java:106)

 

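The failing code:

BooleanQuery stores its search clauses in a variable named clauses, which is a List, and uses a second variable to limit the list's length.

The exception is thrown by BooleanQuery.add, the top frame of the stack trace above, when a clause is added to a list that has already reached that limit.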
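The check can be pictured with a small standalone model. This is a sketch under assumptions, not the Lucene source: the class name ClauseListSketch and the exception TooManyClausesSketch are hypothetical stand-ins, and clauses are plain strings here.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for BooleanQuery's clause bookkeeping; not the Lucene source.
class ClauseListSketch {
    // Mirrors the default limit reported in the stack trace.
    static int maxClauseCount = 1024;

    // Hypothetical exception type standing in for BooleanQuery.TooManyClauses.
    static class TooManyClausesSketch extends RuntimeException {
        TooManyClausesSketch() {
            super("maxClauseCount is set to " + maxClauseCount);
        }
    }

    private final List<String> clauses = new ArrayList<>();

    // Refuse to grow the list once it already holds maxClauseCount entries.
    void add(String clause) {
        if (clauses.size() >= maxClauseCount) {
            throw new TooManyClausesSketch();
        }
        clauses.add(clause);
    }

    int size() {
        return clauses.size();
    }

    public static void main(String[] args) {
        ClauseListSketch query = new ClauseListSketch();
        try {
            // One more add than the limit allows.
            for (int i = 0; i <= maxClauseCount; i++) {
                query.add("term" + i);
            }
        } catch (TooManyClausesSketch e) {
            System.out.println(query.size() + " clauses accepted, then: " + e.getMessage());
        }
    }
}
```

The first 1024 adds succeed and the next one fails, which matches the message in the stack trace.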

 

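Why does this happen? According to the official documentation:

Lucene expands the following query types before it runs the search: RangeQuery, PrefixQuery, WildcardQuery, FuzzyQuery.

For example, if the indexed documents contain the terms car and cars, a search for ca* is expanded to car OR cars before it runs. The number of clauses grows with the number of matching terms, so the expansion is most likely to get out of hand when the index is large, the indexed terms are similar to one another, and the search pattern is short. The clause list is limited to 1024 entries by default, and once the expansion exceeds 1024 entries, BooleanQuery.add throws the exception.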
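To make the expansion concrete, here is a minimal sketch of a prefix being expanded against a sorted term dictionary. It assumes a TreeSet as a toy index, and PrefixExpansionSketch and expandPrefix are hypothetical names; Lucene's own term enumeration is more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy term dictionary standing in for an index; not Lucene code.
class PrefixExpansionSketch {
    // Collect every indexed term that starts with the prefix, as "ca*" would.
    static List<String> expandPrefix(SortedSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet starts at the first term >= prefix; matching terms are contiguous from there.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        terms.add("car");
        terms.add("cars");
        terms.add("dog");
        // Prints: car OR cars
        System.out.println(String.join(" OR ", expandPrefix(terms, "ca")));

        // Many similar terms plus a short prefix push the expansion past 1024.
        for (int i = 0; i < 2000; i++) {
            terms.add("cat" + i);
        }
        System.out.println(expandPrefix(terms, "ca").size() + " clauses");
    }
}
```

With 2000 extra terms sharing the prefix, the expansion has 2002 clauses, well over the default limit.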

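Three fixes are recommended:

1. Replace the offending RangeQuery with a RangeFilter. A filter is slower than a query the first time it is used, so cache it, for example with CachingWrapperFilter.

2. Raise the limit with BooleanQuery.setMaxClauseCount(), for example to 10000, or remove it altogether with BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE). A higher limit increases the memory needed by searches that expand to many terms.

3. Reduce the precision of specific fields. For example, store time fields as yyyyMMdd so that the hour and minute digits do not multiply the number of expanded clauses.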
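The effect of the third fix can be estimated with a short sketch. It assumes one document per minute over two days starting on an arbitrary date, and it uses java.time rather than Lucene's own date classes; DatePrecisionSketch and distinctTerms are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Set;
import java.util.TreeSet;

// Compares how many distinct terms a date field produces at two precisions.
class DatePrecisionSketch {
    static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyyMMddHHmm");
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Count the distinct index terms produced by one timestamp per minute.
    static int distinctTerms(LocalDateTime start, int minutes, DateTimeFormatter precision) {
        Set<String> terms = new TreeSet<>();
        for (int i = 0; i < minutes; i++) {
            terms.add(start.plusMinutes(i).format(precision));
        }
        return terms.size();
    }

    public static void main(String[] args) {
        // Arbitrary start date, chosen only for illustration.
        LocalDateTime start = LocalDateTime.of(2024, 1, 1, 0, 0);
        int twoDays = 2 * 24 * 60;
        System.out.println("minute precision: " + distinctTerms(start, twoDays, MINUTE)); // 2880
        System.out.println("day precision: " + distinctTerms(start, twoDays, DAY));       // 2
    }
}
```

A range query covering those two days would have to expand to 2880 terms at minute precision, above the 1024 default, but only 2 terms at day precision.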

--------------------------------------

 

The official FAQ explains:

The following types of queries are expanded by Lucene before it does the search: RangeQuery, PrefixQuery, WildcardQuery, FuzzyQuery. For example, if the indexed documents contain the terms "car" and "cars" the query "ca*" will be expanded to "car OR cars" before the search takes place. The number of these terms is limited to 1024 by default. Here are a few different approaches that can be used to avoid the TooManyClauses exception:

  • Use a filter to replace the part of the query that causes the exception. For example, a RangeFilter can replace a RangeQuery on date fields and it will never throw the TooManyClauses exception -- You can even use ConstantScoreRangeQuery to execute your RangeFilter as a Query. Note that filters are slower than queries when used for the first time, so you should cache them using CachingWrapperFilter. Using Filters in place of Queries generated by QueryParser can be achieved by subclassing QueryParser and overriding the appropriate function to return a ConstantScore version of your Query.

  • Increase the number of terms using BooleanQuery.setMaxClauseCount(). Note that this will increase the memory requirements for searches that expand to many terms. To deactivate any limits, use BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE).

  • A specific solution that can work on very precise fields is to reduce the precision of the data in order to reduce the number of terms in the index. For example, the DateField class uses a millisecond resolution, which is often not required. Instead you can save your dates in the "yyyyMMddHHmm" format, maybe even without hours and minutes if you don't need them (this was simplified in Lucene 1.9 thanks to the new DateTools class).
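The first approach works because a filter marks matching documents directly instead of building one clause per term. The sketch below is a toy model of that idea, assuming an in-memory map from term to document ids; RangeFilterSketch and its methods are hypothetical and are not RangeFilter's implementation.

```java
import java.util.BitSet;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy inverted index with a range filter that never builds a clause list.
class RangeFilterSketch {
    // term -> ids of the documents containing it
    private final TreeMap<String, BitSet> postings = new TreeMap<>();

    void index(int docId, String term) {
        postings.computeIfAbsent(term, t -> new BitSet()).set(docId);
    }

    // Mark every document whose term falls in [lower, upper], however many terms that is.
    BitSet rangeFilter(String lower, String upper) {
        BitSet matches = new BitSet();
        SortedMap<String, BitSet> inRange = postings.subMap(lower, true, upper, true);
        for (BitSet docs : inRange.values()) {
            matches.or(docs);
        }
        return matches;
    }

    public static void main(String[] args) {
        RangeFilterSketch idx = new RangeFilterSketch();
        // 5000 documents, each with its own zero-padded term.
        for (int i = 0; i < 5000; i++) {
            idx.index(i, String.format("%05d", i));
        }
        // The range spans 3000 distinct terms, far more than 1024.
        System.out.println(idx.rangeFilter("01000", "03999").cardinality() + " documents match");
    }
}
```

The range here covers 3000 distinct terms, yet the only state that grows is a bit set sized by the number of documents, so there is no clause limit to hit.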
