Java Lucene fuzzy query, Lucene wildcard query

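This post discusses a problem encountered when using Lucene for full-text search, specifically how to handle queries that contain wildcards. The author points out that Lucene's wildcard, prefix, and fuzzy queries are not passed through the analyzer, which leads to case mismatches. To address this, the post suggests customizing QueryParser so that terms are analyzed before the wildcard query is constructed. This may change the behavior users expect, because applying analysis to wildcard queries is generally discouraged in order to avoid surprising results.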

I have this question relating to Lucene.

I have a form, and I want to run a full-text search over several fields using the text entered in it. Suppose the input text is "textToLook".

I have a Lucene Analyzer with several filters. One of them is a lowercase filter, so words are lowercased when the index is created.

Say I want to search two fields, field1 and field2, so the Lucene query would be something like this (note that 'textToLook' is now 'texttolook'):

field1: texttolook* field2:texttolook*

In my class I have something like this to create the query. It works when there is no wildcard.

String text = "textToLook";
String[] fields = {"field1", "field2"};
// The analyzer is the same one used for indexing
Analyzer analyzer = fullTextEntityManager.getSearchFactory().getAnalyzer("customAnalyzer");
MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, analyzer);
org.apache.lucene.search.Query queryTextoLibre = parser.parse(text);

With this code the query would be:

field1: texttolook field2:texttolook

But if I set text to "textToLook*" I get

field1: textToLook* field2:textToLook*

which does not match anything, because the indexed terms are lowercase.

I have read this on the Lucene website:

"Wildcard, Prefix, and Fuzzy queries are not passed through the Analyzer, which is the component that performs operations such as stemming and lowercasing"

My problem cannot be solved by making the search case-insensitive, because my analyzer has other filters which, for example, remove some word suffixes.

I think I can solve the problem by finding out what the text looks like after going through my analyzer's filters, then appending the "*", and then building the Query with MultiFieldQueryParser. So in this example I would start with "textToLook", and after passing it through the filters I would get "texttolook". After this I could make "texttolook*".

But is there any way to get the value of my text variable after it has gone through all of my analyzer's filters? How can I get all the filters of my analyzer? Is this possible?

Thanks

Solution

Can you use QueryParser.setLowercaseExpandedTerms(true)?

** EDIT **

Okay, I understand your issue now. You actually want the wildcarded term to be stemmed before it's run through the wildcard query.

You can subclass QueryParser and override

protected Query getWildcardQuery(String field, String termStr) throws ParseException

to run termStr through the analyzer before the WildcardQuery is constructed.
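The idea of normalizing termStr before the wildcard query is built can be sketched without Lucene. In this minimal sketch, a plain function stands in for the Analyzer, and the class and method names (WildcardTermNormalizer, normalize, lowercase) are hypothetical and not part of Lucene. It assumes the only wildcard characters are '*' and '?' and does not handle escaped wildcards.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical helper (not a Lucene class): normalizes the literal parts of a wildcard term. */
final class WildcardTermNormalizer {

    private WildcardTermNormalizer() {}

    /**
     * Applies the normalizer to every run of literal characters in the term
     * and copies the wildcard characters '*' and '?' through unchanged.
     */
    static String normalize(String term, UnaryOperator<String> normalizer) {
        StringBuilder out = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                // Flush the literal text collected so far, then keep the wildcard as-is.
                if (literal.length() > 0) {
                    out.append(normalizer.apply(literal.toString()));
                    literal.setLength(0);
                }
                out.append(c);
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            out.append(normalizer.apply(literal.toString()));
        }
        return out.toString();
    }

    /** Stand-in for the analyzer's lowercase filter (an assumption for this sketch). */
    static String lowercase(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // prints "texttolook*"
        System.out.println(normalize("textToLook*", WildcardTermNormalizer::lowercase));
    }
}
```

In an actual QueryParser subclass, the overridden getWildcardQuery would presumably apply a step like this to termStr, using the real Analyzer in place of the stand-in function, and then delegate to the parent implementation.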

This might not be what the user expects, though. There is a reason wildcarded terms are not run through the analyzer, per the FAQ:

The reason for skipping the Analyzer is that if you were searching for "dogs*" you would not want "dogs" first stemmed to "dog", since that would then match "dog*", which is not the intended query.
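To make the FAQ's concern concrete, here is a small sketch that compares what a prefix match on "dogs" and on "dog" would return. The term list and the StemmedPrefixDemo class are invented for illustration, and a simple startsWith check stands in for Lucene's prefix matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical demo (not Lucene code): shows how stemming the prefix widens the match. */
final class StemmedPrefixDemo {

    private StemmedPrefixDemo() {}

    /** Returns the terms that start with the given prefix, in their original order. */
    static List<String> matchPrefix(List<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up index terms, assumed for this example only.
        List<String> terms = Arrays.asList("dog", "dogma", "dogs", "dogsled");

        // The query as typed, "dogs*": prints [dogs, dogsled]
        System.out.println(matchPrefix(terms, "dogs"));

        // The query after stemming "dogs" to "dog", i.e. "dog*":
        // prints [dog, dogma, dogs, dogsled]
        System.out.println(matchPrefix(terms, "dog"));
    }
}
```

The stemmed prefix pulls in "dogma", which the user who typed "dogs*" almost certainly did not ask for. Whether that trade-off is acceptable depends on the filters in the analyzer being applied.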
