Some Solr Query Parameters

weixin_33805557

Original link: https://my.oschina.net/fir01/blog/158569

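fl: a comma-separated list that specifies which fields are returned for each document in the result. The default is "*", meaning all fields.

defType: selects the query parser. Common values are defType=lucene, defType=dismax, and defType=edismax.

q: the query string. Required.

q.alt: the fallback query used when q is empty or not specified. It is usually set to *:*.

qf: query fields. Specifies which fields Solr searches.

pf: phrase fields. Specifies a set of fields in which a document is boosted when the terms of the query appear together as a phrase in one of those fields.

In short, pf boosts phrases over individual words.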

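fq: filter query. It restricts the result to documents that match both q and fq, without influencing the score. For example, q=mm&fq=date_time:[20081001 TO 20091031] finds documents containing the keyword mm whose date_time lies between 20081001 and 20091031. Official documentation: http://wiki.apache.org/solr/CommonQueryParameters#head-6522ef80f22d0e50d2f12ec487758577506d6002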
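As a rough illustration of how q, fq, and fl travel together in one request, here is a minimal sketch that only builds the URL. The endpoint (host, port, core name) and the field names id and title are assumptions made for the example, not something the original post specifies.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical Solr endpoint; adjust host, port, and core name for a real setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_select_url(q, fq=None, fl="*"):
    """Build a select URL carrying q, an optional fq, and fl."""
    params = [("q", q), ("fl", fl)]
    if fq is not None:
        params.append(("fq", fq))
    # urlencode percent-escapes spaces, brackets, and colons in the values.
    return SOLR_SELECT_URL + "?" + urlencode(params)


# The id and title field names are assumptions for illustration.
url = build_select_url("mm", fq="date_time:[20081001 TO 20091031]", fl="id,title")
print(url)
```

Building the URL with urlencode avoids hand-escaping the range brackets and the spaces around TO.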

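Less commonly used

mm: minimum should match. Solr supports three kinds of query clauses: "must occur", "must not occur", and "may occur" (optional), written as + (or AND), - (or NOT), and no operator (OR) respectively.

By default, clauses are optional (OR). mm sets the minimum number of these optional clauses that must match for a document to be returned.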
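To make the idea concrete, the following is a simplified model of mm, not Solr's implementation. It handles only two forms of the value, a positive integer such as "2" and a positive percentage such as "75%" (rounded down), and it clamps the result between 1 and the number of optional clauses. The function names are made up for this sketch.

```python
import math


def required_optional_clauses(mm, optional_count):
    """Return how many optional clauses must match for a simple mm value.

    Only two forms are handled: a positive integer ("2") and a positive
    percentage ("75%"), which is rounded down.
    """
    if mm.endswith("%"):
        needed = math.floor(optional_count * int(mm[:-1]) / 100)
    else:
        needed = int(mm)
    # Clamp: never more than the clauses available, never fewer than one.
    return max(1, min(needed, optional_count))


def satisfies_mm(mm, clause_matches):
    """clause_matches holds one boolean per optional clause of the query."""
    needed = required_optional_clauses(mm, len(clause_matches))
    return sum(clause_matches) >= needed


# A three-word query where the document matches the first and third word.
print(satisfies_mm("2", [True, False, True]))
print(satisfies_mm("100%", [True, False, True]))
```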

ps: phrase slop. The amount of slop on the phrase queries built for the "pf" fields. ps belongs to the pf parameter: it is the slop applied to the phrase constructed from the entire "q" parameter when evaluating pf boosts. Because it only affects boosting, changing ps does not change numFound or the result set. It can only change the ranked order of the results, by loosening what a "phrase match" means to the pf boost.

An example for ps:

Let's say your query is apache solr (without quotation marks).

Let's say these three documents contain both words and are returned:

1-) solr is built on the top of apache lucene.
2-) apache solr is fast, mature and popular.
3-) solr is hosted under apache umbrella.

Even if you don't use the pf and ps parameters, those documents will be in the result set anyway. Let's say they appear in the order 1, 2, 3.

Then we add the pf and ps parameters: q=apache solr&pf=title^1.2&ps=1
The second document is boosted; let's say it now comes first. The order has changed: documents in which all the query words are close to each other are boosted. The same three documents are still returned.
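The example can be mimicked with a toy proximity check. This is a deliberately simplified stand-in for Lucene's sloppy phrase matching: it only accepts the words in query order and counts the words between them, whereas real slop is a position edit distance that also tolerates reordering at a higher cost. The function names are invented for the sketch.

```python
import re


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def in_order_phrase_match(text, first, second, slop):
    """Toy check: does `second` follow `first` with at most `slop` words between?"""
    words = tokens(text)
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(
        0 <= s - f - 1 <= slop
        for f in first_positions
        for s in second_positions
    )


docs = {
    1: "solr is built on the top of apache lucene.",
    2: "apache solr is fast, mature and popular.",
    3: "solr is hosted under apache umbrella.",
}

# Documents that would receive the pf boost under this toy model with ps=1.
boosted = [
    doc_id
    for doc_id, text in docs.items()
    if in_order_phrase_match(text, "apache", "solr", slop=1)
]
print(boosted)  # [2]
```

All three documents still match the plain query; only the set of boosted documents depends on the slop.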

qs: query phrase slop. The amount of slop on phrase queries that the user explicitly includes in the query string (in the qf fields). This parameter matters when the raw query contains an explicit phrase in double quotes, e.g. &q="apache lucene". Because qs affects matching, changing it can change numFound and the result set.

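tie: the tie breaker, a value between 0.0 and 1.0. When a term matches in several qf fields, it controls how much the lower-scoring fields contribute to the final score in addition to the highest-scoring field.

bq: boost query. An additional query clause that boosts documents with a particular value in a field, e.g. brand:IBM^5.0.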
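The way a tie breaker combines per-field scores in a disjunction-max query can be sketched as "best score plus tie times the sum of the other scores". The function below is an illustration of that arithmetic only; the name dismax_score and the sample scores are made up.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores: the best score plus tie times the rest."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


scores = [2.0, 1.0, 0.5]  # hypothetical scores of one term in three fields
print(dismax_score(scores, tie=0.0))  # 2.0, only the best field counts
print(dismax_score(scores, tie=1.0))  # 3.5, all fields are summed
```

With tie=0.0 only the best field counts, and with tie=1.0 every field contributes fully; a small value in between favors the best field while still rewarding matches in several fields.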

bf: Function (with optional boosts) that will be included in the user's query to influence the score. Any function supported natively by Solr can be used, along with a boost value, e.g.: recip(rord(myfield),1,2,3)^1.5
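Putting the parameters above together, a request might look like the following sketch. It only assembles and prints the URL. The endpoint, the field names title and content, and the concrete values for mm, tie, and qs are assumptions chosen for illustration; the bq and bf values are the ones quoted above.

```python
from urllib.parse import urlencode, parse_qsl, urlsplit

# Hypothetical Solr endpoint; adjust host, port, and core name for a real setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"

params = {
    "defType": "edismax",
    "q": "apache solr",
    "q.alt": "*:*",              # used only when q is empty
    "qf": "title^2.0 content",   # field names and boosts are assumptions
    "pf": "title^1.2",
    "ps": "1",
    "qs": "0",
    "mm": "2",
    "tie": "0.1",
    "bq": "brand:IBM^5.0",
    "bf": "recip(rord(myfield),1,2,3)^1.5",
}

url = SOLR_SELECT_URL + "?" + urlencode(params)
print(url)
```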

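wt: writer type. Specifies the output format, e.g. xml, json, php, or phps.

q.op: overrides the defaultOperator in schema.xml (whether terms separated by whitespace are combined with "AND" or "OR").
df: the default search field.
qt: query type. Specifies which request handler processes the query. It usually does not need to be specified; the default is standard.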

Highlighting:
hl: highlight. hl=true turns highlighting on. Use hl.fl=field1,field2 to choose the fields to highlight.

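hl.fl: a list of fields separated by spaces or commas. For a field to be highlighted, it must be stored in the schema. If this parameter is omitted, the default field is highlighted: the standard handler uses the df parameter, and dismax uses the qf parameter. You can use an asterisk to conveniently highlight all fields. If you use a wildcard, consider enabling hl.requireFieldMatch.
hl.requireFieldMatch: if set to true, a field is highlighted only if the query matched in that field. The default is false, which means a query may match in one field while a different field is highlighted. If hl.fl uses a wildcard, you will probably want to enable this parameter. However, if you query a catch-all field (perhaps populated with the copyField directive), leave it false so that the highlighting can show in which field the query text was found.
hl.usePhraseHighlighter: if the query contains a phrase (enclosed in quotes), only text that matches the whole phrase is highlighted.
hl.highlightMultiTerm: for wildcard and fuzzy searches, ensures that the terms matched by the wildcard are highlighted. The default is false, and hl.usePhraseHighlighter must be true for it to work.
hl.snippets: the maximum number of highlighted snippets. The default is 1 and it rarely needs changing. Setting it to 0 for a particular field (e.g. f.allText.hl.snippets=0) disables highlighting for that field, which can be useful together with hl.fl=*.
hl.fragsize: the maximum number of characters returned per snippet. The default is 100. If it is 0, the field is not fragmented and the whole field value is returned, which you would not do for large fields.
hl.mergeContiguous: if set to true, overlapping snippets are merged.
hl.maxAnalyzedChars: the maximum number of characters searched for highlights. The default is 51200; set it to -1 to disable the limit.
hl.alternateField: if no snippet is generated (no terms matched), the value of another field is returned instead.
hl.maxAlternateFieldLength: if hl.alternateField is used, it is sometimes necessary to limit the length of the alternate field in characters. The default of 0 means no limit. A reasonable value is hl.snippets * hl.fragsize, so that the size of the returned result stays consistent.
hl.formatter: an extension point for pluggable formatting algorithms. The default is simple, which is currently the only option. If that is not enough, look at org.apache.solr.highlight.HtmlFormatter.java and at how the highlighting element is configured in solrconfig.xml.
Note that whatever is already in the original text, such as pre-existing em tags, is not escaped, which can sometimes produce false highlights.
hl.fragmenter: the extension point through which Solr specifies the fragmenting algorithm. gap is the default. regex is the other option; it determines highlight boundaries with a regular expression. This is an atypical, advanced option. To see the default settings and how fragmenters (and formatters) are configured, look at the highlighting section of solrconfig.xml.
The regex fragmenter has the following options:
hl.regex.pattern: the regular expression pattern.
hl.regex.slop: the factor by which hl.fragsize may vary to accommodate the regular expression. The default is 0.6, meaning that with hl.fragsize=100 the fragment size ranges from 40 to 160.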
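Two pieces of arithmetic in the highlighting options can be sketched in a few lines: the fragment size range implied by hl.regex.slop, and the suggested hl.maxAlternateFieldLength of hl.snippets * hl.fragsize. The helper names and the title and content field names are assumptions for this example; the code only computes values and builds a parameter dictionary.

```python
def regex_fragment_bounds(fragsize, slop=0.6):
    """Range of fragment sizes allowed by hl.regex.slop around hl.fragsize."""
    low = round(fragsize * (1 - slop))
    high = round(fragsize * (1 + slop))
    return low, high


def highlight_params(fields, snippets=1, fragsize=100, alternate_field=None):
    """Build highlighting parameters for a request."""
    params = {
        "hl": "true",
        "hl.fl": ",".join(fields),
        "hl.snippets": str(snippets),
        "hl.fragsize": str(fragsize),
    }
    if alternate_field is not None:
        params["hl.alternateField"] = alternate_field
        # Keep the fallback about as long as the snippets would have been.
        params["hl.maxAlternateFieldLength"] = str(snippets * fragsize)
    return params


print(regex_fragment_bounds(100))  # (40, 160)
print(highlight_params(["title", "content"], snippets=2, fragsize=200,
                       alternate_field="content"))
```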

All of these values can be passed in the select request, set through the SolrJ API, or configured in solrconfig.xml.
For example:
<requestHandler name="search" class="solr.SearchHandler" default="true">
    <!-- default values for query parameters can be specified, these
         will be overridden by parameters in the request
      -->
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
       <bool name="hl">true</bool>
       <str name="hl.fl">title,content</str>
       <str name="f.content.hl.fragsize">200</str>
       <str name="mlt.qf">
         id^10.0 title^10.0 content^1.0
       </str>
     </lst>
</requestHandler>
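The comment in the configuration says that the defaults are overridden by parameters in the request. That precedence can be modeled as a plain dictionary merge, as in the sketch below. This is only an illustration of the behavior, not Solr code: it copies a few of the defaults from the example above and ignores other handler sections.

```python
# A few defaults copied from the requestHandler example above.
HANDLER_DEFAULTS = {
    "echoParams": "explicit",
    "rows": "10",
    "hl": "true",
    "hl.fl": "title,content",
    "f.content.hl.fragsize": "200",
}


def effective_params(request_params, defaults=HANDLER_DEFAULTS):
    """Start from the handler defaults and let request parameters win."""
    merged = dict(defaults)
    merged.update(request_params)
    return merged


# A request that sets its own rows value but relies on the highlighting defaults.
print(effective_params({"q": "solr", "rows": "20"}))
```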

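Solr works quite well and is much nicer than dealing with raw Lucene yourself. You can add jsoup to scrape specific content from web pages, configure the XML, and write a nice-looking search page with SolrJ. When I have time, I'll write one in Grails for fun.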
