ES 相似度算法设置(续)

Tuning BM25

One of the nice features of BM25 is that, unlike TF/IDF, it has two parameters that allow it to be tuned:

k1
This parameter controls how quickly an increase in term frequency results in term-frequency saturation. The default value is  1.2. Lower values result in quicker saturation, and higher values in slower saturation.
b
This parameter controls how much effect field-length normalization should have. A value of  0.0disables normalization completely, and a value of  1.0 normalizes fully. The default is  0.75.

The practicalities of tuning BM25 are another matter. The default values for k1 and b should be suitable for most document collections, but the optimal values really depend on the collection. Finding good values for your collection is a matter of adjusting, checking, and adjusting again.

The similarity algorithm can be set on a per-field basis. It’s just a matter of specifying the chosen algorithm in the field’s mapping:

PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type":       "string",
          "similarity": "BM25" 
        },
        "body": {
          "type":       "string",
          "similarity": "default" 
        }
      }
  }
}

The title field uses BM25 similarity.

The body field uses the default similarity (see Lucene’s Practical Scoring Function).

Currently, it is not possible to change the similarity mapping for an existing field. You would need to reindex your data in order to do that.

Configuring BM25

Configuring a similarity is much like configuring an analyzer. Custom similarities can be specified when creating an index. For instance:

PUT /my_index
{
  "settings": {
    "similarity": {
      "my_bm25": { 
        "type": "BM25",
        "b":    0 
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type":       "string",
          "similarity": "my_bm25" 
        },
        "body": {
          "type":       "string",
          "similarity": "BM25" 
        }
      }
    }
  }
}

参考:https://www.elastic.co/guide/en/elasticsearch/guide/current/changing-similarities.html

转载于:https://www.cnblogs.com/bonelee/p/6472828.html

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值