Java Lucene fuzzy query, Lucene wildcard query

Latest recommended article published on 2021-02-24 05:02:36

小寺川

Article tags: java lucene fuzzy query

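This post discusses a problem encountered when doing full-text search with Lucene, specifically how to handle queries that contain wildcards. The author points out that Lucene's wildcard, prefix, and fuzzy queries are not passed through the analyzer, which leads to case mismatches. To address this, the post raises the possibility of a custom QueryParser that analyzes the term before the wildcard query is constructed. This may change the behavior users expect, since applying analysis to wildcard queries is generally discouraged in order to avoid surprising results.

Summary generated by CSDN using AI technology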

I have this question relating to Lucene.

I have a form from which I read some text, and I want to run a full-text search over several fields. Suppose the input text is "textToLook".

I have a Lucene Analyzer with several filters. One of them is lowerCaseFilter, so when I create the index, words will be lowercased.

Imagine I want to search two fields, field1 and field2, so the Lucene query would look something like this (note that 'textToLook' is now 'texttolook'):

field1:texttolook* field2:texttolook*

In my class I have something like this to create the query. It works when there is no wildcard.

String text = "textToLook";
String[] fields = {"field1", "field2"};
// the analyzer is the same one used for indexing
Analyzer analyzer = fullTextEntityManager.getSearchFactory().getAnalyzer("customAnalyzer");
MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, analyzer);
org.apache.lucene.search.Query queryTextoLibre = parser.parse(text);

With this code the query would be:

field1:texttolook field2:texttolook

but if I set text to "textToLook*" I get

field1:textToLook* field2:textToLook*

which finds nothing, because the indexed terms are lowercase.

I have read this on the Lucene website:

"Wildcard, Prefix, and Fuzzy queries are not passed through the Analyzer, which is the component that performs operations such as stemming and lowercasing"
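
To make the mismatch concrete, here is a minimal plain-Java sketch with no Lucene dependency. It assumes the index holds only lowercased terms and models a prefix query as a case-sensitive `startsWith` check. The class name, method name, and sample terms are made up for illustration and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an index whose terms were lowercased at index time.
class PrefixMismatchDemo {
    // Assumed sample data: terms as they would be stored after a lowercasing filter.
    static final List<String> INDEXED_TERMS =
            Arrays.asList("texttolook", "texttolookup", "other");

    // Models a prefix query: every indexed term that starts with the prefix,
    // compared case-sensitively, as happens when the prefix is not analyzed.
    static List<String> prefixMatches(String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : INDEXED_TERMS) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Un-analyzed prefix keeps its capitals and matches nothing.
        System.out.println(prefixMatches("textToLook")); // prints []
        // Lowercased prefix matches the stored terms.
        System.out.println(prefixMatches("texttolook")); // prints [texttolook, texttolookup]
    }
}
```

The only difference between the two calls is the case of the prefix, which is exactly what the analyzer would normally have taken care of.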

My problem cannot be solved by simply making the search case-insensitive, because my analyzer has other filters which, for example, remove some word suffixes.

I think I can solve the problem by finding out what the text looks like after it has gone through my analyzer's filters; then I could append the "*" and build the Query with MultiFieldQueryParser. In this example I would start with "textToLook", get "texttolook" after the filters, and then build "texttolook*".

But, is there any way to get the value of my text variable after going through all my analyzer's filters? How can I get all the filters of my analyzer? Is this possible?
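
The idea described above (normalize first, append the wildcard afterwards) can be sketched in plain Java. This is only an illustration under stated assumptions: `analyze` is a hypothetical stand-in for the real analyzer chain, and the "strip a trailing ing" rule is a toy suffix filter invented for the example. With a real Lucene analyzer the normalized text would presumably come from the analyzer itself rather than from hand-written string code.

```java
import java.util.Locale;

// Hypothetical sketch: normalize the user's text, then add the wildcard.
class AnalyzeThenWildcard {
    // Toy suffix filter (assumption): drop a trailing "ing" if present.
    static String stripSuffix(String term) {
        return term.endsWith("ing") ? term.substring(0, term.length() - 3) : term;
    }

    // Stand-in for the analyzer chain: lowercase, then strip the suffix.
    static String analyze(String text) {
        return stripSuffix(text.toLowerCase(Locale.ROOT));
    }

    // Builds the query text only after the term has been normalized,
    // so the "*" itself never passes through the filters.
    static String toPrefixQuery(String rawText) {
        return analyze(rawText) + "*";
    }

    public static void main(String[] args) {
        System.out.println(toPrefixQuery("textToLook")); // prints texttolook*
        System.out.println(toPrefixQuery("Searching"));  // prints search*
    }
}
```

Appending the "*" after normalization keeps the wildcard character away from filters that might drop or alter it.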

Thanks

Solution

Can you use QueryParser.setLowercaseExpandedTerms(true)?

** EDIT **

Okay, I understand your issue now. You actually want the wildcarded term to be stemmed before it's run through the wildcard query.

You can subclass QueryParser and override

protected Query getWildcardQuery(String field, String termStr) throws ParseException

to run termStr through the analyzer before the WildcardQuery is constructed.
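
The shape of that override can be sketched without Lucene on the classpath. In the sketch below, `BaseParser` is a hypothetical stand-in for the query parser (it is not Lucene's QueryParser, and it returns a plain string instead of a Query), and lowercasing stands in for whatever the real analyzer does. Note also that a term with only a trailing "*" may be routed through a separate prefix hook rather than the wildcard one in Lucene's classic parser, so a real subclass might need to override both; treat that as something to check against your Lucene version.

```java
import java.util.Locale;

// Hypothetical stand-in for a query parser with an overridable wildcard hook.
class BaseParser {
    // Hook called for wildcard terms; the default leaves the term untouched.
    protected String getWildcardQuery(String field, String termStr) {
        return field + ":" + termStr;
    }

    // Minimal "parse": wildcard terms go through the hook unanalyzed,
    // plain terms are lowercased (standing in for normal analysis).
    public String parse(String field, String text) {
        if (text.contains("*") || text.contains("?")) {
            return getWildcardQuery(field, text);
        }
        return field + ":" + text.toLowerCase(Locale.ROOT);
    }
}

// Subclass that normalizes the term before the wildcard query is built.
class AnalyzingParser extends BaseParser {
    @Override
    protected String getWildcardQuery(String field, String termStr) {
        // Lowercasing leaves '*' and '?' intact, so the pattern survives.
        return super.getWildcardQuery(field, termStr.toLowerCase(Locale.ROOT));
    }
}

class WildcardHookDemo {
    public static void main(String[] args) {
        System.out.println(new BaseParser().parse("field1", "textToLook*"));      // prints field1:textToLook*
        System.out.println(new AnalyzingParser().parse("field1", "textToLook*")); // prints field1:texttolook*
    }
}
```

Delegating to `super` after normalizing keeps the rest of the parser's behavior unchanged, so only wildcard terms are affected.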

This might not be what the user expects, though. There's a reason why they've decided not to run wildcarded terms through the analyzer, per the faq:

The reason for skipping the Analyzer

is that if you were searching for

"dogs*" you would not want "dogs"