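I've been using Solr lately, and once the basic features were mostly working I tried searching for special characters such as ? and []. The damn thing returned 0 results. For a well-known search engine, failing at something this small is pretty embarrassing, isn't it?
Here is what the official documentation says: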
Escaping Special Characters
Solr gives the following characters special meaning when they appear in a query:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : /
To make Solr interpret any of these characters literally, rather than as a special character, precede the character with a backslash character \. For example, to search for (1+1):2 without having Solr interpret the plus sign and parentheses as special characters for formulating a sub-query with two terms, escape the characters by preceding each one with a backslash:
\(1\+1\)\:2
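The escaping rule quoted above can be sketched in Python. This is only an illustration: `escape_solr_term` is a hypothetical helper name, not part of Solr or any client library, and it also escapes the backslash itself, which is an assumption beyond the list quoted from the documentation.

```python
import re

# Characters the Solr documentation lists as special in a query.
# "&&" and "||" are handled by escaping each "&" and "|" individually.
# The backslash is included too (assumption: so that input which already
# contains a backslash is not misread as an escape sequence).
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:/\\])')


def escape_solr_term(text: str) -> str:
    """Prefix every special character in `text` with a backslash."""
    return _SPECIAL.sub(r'\\\1', text)


if __name__ == "__main__":
    print(escape_solr_term("(1+1):2"))  # prints \(1\+1\)\:2
```

Escaping only controls how the query parser reads the string. Whether the escaped character can actually match anything still depends on how the field is analyzed.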
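But no matter what I tried, it had no effect. After a full day of painstaking googling, nearly desperate and about to start reading the source code, I finally found a clue on Stack Overflow.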
You are using the standard text_general field for the title attribute. This might not be a good choice. text_general is meant to be for huge chunks of text (or at least sentences) and not so much for exact matching of names or titles.
The problem here is that text_general uses the StandardTokenizerFactory.
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
StandardTokenizerFactory does the following:
A good general purpose tokenizer that strips many extraneous characters and sets token types to meaningful values. Token types are only useful for subsequent token filters that are type-aware of the same token types.
This means the '-' character is not kept as part of any token; it only serves as a point at which the string is split.
"kong-fu" will be represented as "kong" and "fu". The '-' disappears.
This also explains why select?q=title:\- won't work here.
Choose a better fitting field type:
Instead of the StandardTokenizerFactory you could use the solr.WhitespaceTokenizerFactory, which only splits on whitespace for exact matching of words. So making your own field type for the title attribute would be a solution.
Solr also has a minimal field type called text_ws. Depending on your requirements this might be enough.
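To see why the choice of tokenizer decides whether a special character can be found at all, here is a rough Python sketch of the two behaviors. It is not Solr's implementation: `standard_like_tokens` is a crude stand-in that keeps only runs of word characters (the real StandardTokenizerFactory follows more detailed Unicode word-break rules), and both function names are made up for this illustration.

```python
import re


def standard_like_tokens(text: str) -> list[str]:
    # Crude approximation of a standard tokenizer: keep runs of
    # letters, digits and underscores, drop all other characters.
    return re.findall(r"\w+", text)


def whitespace_tokens(text: str) -> list[str]:
    # Approximation of a whitespace tokenizer: split on whitespace
    # only, so punctuation stays inside the tokens.
    return text.split()


if __name__ == "__main__":
    for sample in ["kong-fu", "? []"]:
        print(sample, standard_like_tokens(sample), whitespace_tokens(sample))
```

With the first function, "kong-fu" becomes "kong" and "fu", and a string made only of punctuation produces no tokens, so there is nothing in the index for an escaped query to match. With whitespace splitting, the punctuation survives as part of the token.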
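So I changed the field type to text_ws, and damn, it actually worked. I felt I had to write this post to mark the occasion.
Stack Overflow really is a place where the experts hang out!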