Advanced, Part 27: Deep Dive into Search Techniques: Customizing the Relevance Score Algorithm with function_score

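We can define a custom function_score query that takes the value of a field, combines it with the relevance score Elasticsearch computes on its own, and so lets a field of our choosing boost the score.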

Adding the field and data

Add a follower count (follower_num) to every post:

POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"follower_num" : 5} }
{ "update": { "_id": "2"} }
{ "doc" : {"follower_num" : 10} }
{ "update": { "_id": "3"} }
{ "doc" : {"follower_num" : 25} }
{ "update": { "_id": "4"} }
{ "doc" : {"follower_num" : 3} }
{ "update": { "_id": "5"} }
{ "doc" : {"follower_num" : 60} }
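
The bulk body is newline-delimited JSON: one action line followed by one partial-document line per post, with a trailing newline at the end. As a minimal sketch, the same body can be generated from a mapping of post id to follower count. The helper name build_bulk_update_body is hypothetical, and actually sending the body to /forum/article/_bulk is left to whatever HTTP client you use.

```python
import json

def build_bulk_update_body(follower_counts):
    """Build an NDJSON bulk body that sets follower_num on each post.

    follower_counts maps a document _id (str) to its follower count (int).
    Hypothetical helper for illustration; it only builds the request body.
    """
    lines = []
    for doc_id, follower_num in follower_counts.items():
        # Action line: which document to update.
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        # Payload line: the partial document to merge in.
        lines.append(json.dumps({"doc": {"follower_num": follower_num}}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

follower_counts = {"1": 5, "2": 10, "3": 25, "4": 3, "5": 60}
body = build_bulk_update_body(follower_counts)
print(body, end="")
```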

 

 

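Take the score a post gets from the search and combine it with follower_num, so that follower_num boosts the post's score to a controlled degree.

The more followers a post has, the higher its score.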

Test: score not tied to the follower count

GET /forum/article/_search
{
  "query": {
    "multi_match": {
      "query": "java spark",
      "fields": ["title", "content"]
    }
  }
}

Result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.68640786,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.68640786,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language",
          "sub_title": "learned a lot of course",
          "author_first_name": "Smith",
          "author_last_name": "Williams",
          "follower_num": 10
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 0.68324494,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2017-03-01",
          "tag": [
            "elasticsearch"
          ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",
          "sub_title": "haha, hello world",
          "author_first_name": "Tonny",
          "author_last_name": "Peter Smith",
          "follower_num": 60
        }
      }
    ]
  }
}

Test: score tied to the follower count

GET /forum/article/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "java spark",
          "fields": ["title", "content"]
        }
      },
      "field_value_factor": {
        "field": "follower_num", // the field whose value influences the score
        "modifier": "log1p", // apply log(1 + follower_num) to the field value
        "factor": 0.5 // multiply the field value by this factor before the modifier: log(1 + factor * follower_num)
      },
      "boost_mode": "multiply", // how the original score and the function value are combined: old_score * log(1 + factor * follower_num); multiply is the default if omitted
      "max_boost": 2 // upper limit on the function value
    }
  }
}

Result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.0189654,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 1.0189654,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2017-03-01",
          "tag": [
            "elasticsearch"
          ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",
          "sub_title": "haha, hello world",
          "author_first_name": "Tonny",
          "author_last_name": "Peter Smith",
          "follower_num": 60
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.53412914,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language",
          "sub_title": "learned a lot of course",
          "author_first_name": "Smith",
          "author_last_name": "Williams",
          "follower_num": 10
        }
      }
    ]
  }
}

 

 

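If only field is specified (for example follower_num), each document's score is multiplied by follower_num directly. A document with 0 followers then gets a score of 0, and a document with many followers drowns out the text relevance score, so the results are poor. For that reason a log1p modifier is usually added, which changes the formula to new_score = old_score * log(1 + follower_num). The boost then grows far more slowly and the scores are more reasonable. Note that log(1 + 0) is still 0, so under multiply a document with no followers still scores 0; boost_mode sum avoids that.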
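
To make the difference concrete, here is a minimal Python sketch that compares using the raw field value with the log1p modifier. The function names and the shared old_score of 0.68 are made up for illustration, and the logarithm is taken as base 10, which is consistent with the scores in the response above.

```python
import math

def raw_multiply(old_score, follower_num):
    """Score when the field value is used as-is (no modifier)."""
    return old_score * follower_num

def log1p_multiply(old_score, follower_num):
    """Score with the log1p modifier: old_score * log10(1 + follower_num)."""
    return old_score * math.log10(1 + follower_num)

old_score = 0.68  # hypothetical text relevance score, same for every post
for followers in (0, 3, 60, 6000):
    print(
        followers,
        round(raw_multiply(old_score, followers), 3),
        round(log1p_multiply(old_score, followers), 3),
    )
```

With the raw value, a post with 6000 followers scores thousands of times higher than one with 3; with log1p the gap shrinks to roughly a factor of six, so text relevance still matters.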

Adding factor adjusts the effect further: the field value is multiplied by factor before the modifier is applied, giving new_score = old_score * log(1 + factor * follower_num).
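
As a check, the formula can be applied to the two hits from the plain multi_match response. The sketch below assumes a base-10 logarithm and factor 0.5 as in the request; the function name boosted_score is made up for illustration. It reproduces the _score values of the function_score response to within float rounding, including the change in ranking.

```python
import math

def boosted_score(old_score, follower_num, factor=0.5):
    """old_score * log10(1 + factor * follower_num): log1p modifier plus multiply."""
    return old_score * math.log10(1 + factor * follower_num)

# (_id, _score from the plain multi_match response, follower_num)
hits = [("2", 0.68640786, 10), ("5", 0.68324494, 60)]

# Rescore every hit and sort by the new score, highest first.
rescored = sorted(
    ((doc_id, boosted_score(score, followers)) for doc_id, score, followers in hits),
    key=lambda pair: pair[1],
    reverse=True,
)
for doc_id, score in rescored:
    print(doc_id, round(score, 4))
# prints:
# 5 1.019
# 2 0.5341
```

Post 5 started slightly behind post 2 on text relevance, but its 60 followers give it a multiplier of about 1.49, while post 2's 10 followers give about 0.78, so the order flips.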

boost_mode determines how the query score is combined with the function value: multiply, sum, avg, min, max, or replace. If omitted, it defaults to multiply.
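
The boost_mode options can be sketched as plain arithmetic on two numbers. This is an illustrative model, not Elasticsearch code; the function name combine and the sample values are assumptions.

```python
def combine(query_score, function_value, boost_mode="multiply"):
    """Combine the query score with the function value the way boost_mode describes."""
    modes = {
        "multiply": lambda q, f: q * f,   # default
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": max,
        "min": min,
        "replace": lambda q, f: f,        # ignore the query score entirely
    }
    return modes[boost_mode](query_score, function_value)

query_score = 0.68      # hypothetical text relevance score
function_value = 0.0    # log10(1 + 0.5 * 0): a post with no followers

for mode in ("multiply", "sum", "avg", "max", "min", "replace"):
    print(mode, combine(query_score, function_value, mode))
```

For a post with no followers, multiply wipes the score out, while sum leaves the text relevance score intact. That is the usual reason to pick sum when the field can be 0.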

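max_boost limits the value produced by the function so that it does not exceed the given number; the limit applies to the function value, not to the final _score. In this example it has no visible effect, because the largest function value, log(1 + 0.5 * 60), is about 1.49 and stays below the limit of 2.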
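
A minimal sketch of where the cap sits in the calculation, assuming a base-10 logarithm, boost_mode multiply, and the same factor and max_boost as the request above. The function name capped_score is made up for illustration.

```python
import math

def capped_score(query_score, follower_num, factor=0.5, max_boost=2.0):
    """Apply log1p with a factor, cap the function value, then multiply."""
    function_value = math.log10(1 + factor * follower_num)
    # max_boost caps the function value, not the final score.
    capped = min(function_value, max_boost)
    return query_score * capped

# Post 5 from the example: function value ~1.49, below the cap, so nothing changes.
print(round(capped_score(0.68324494, 60), 4))

# A hypothetical post with 1000 followers: log10(501) ~2.70 is capped to 2.
print(capped_score(1.0, 1000))
```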
