Advanced, Part 18: Deep Dive into Search Techniques: Proximity Matching with the slop Parameter, How It Works, and Experiments

A quick taste

GET /forum/article/_search

{

    "query": {

        "match_phrase": {

            "title": {

                "query": "java spark",

                "slop":  1

            }

        }

    }

}

 

Result:

{

  "took": 1,

  "timed_out": false,

  "_shards": {

    "total": 5,

    "successful": 5,

    "failed": 0

  },

  "hits": {

    "total": 0,

    "max_score": null,

    "hits": []

  }

}

 

 

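What does slop (movement) mean?

The query string is the search text. slop is the number of times the terms in it have to be moved before they line up with a document.

A worked example of slop movement

Take a query string, count how many moves it needs before it matches a document, then set slop accordingly.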

 

hello world, java is very good, spark is also very good.

 

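A match_phrase query for java spark finds nothing here, because the two terms are not adjacent in the document.

If we specify slop, the terms of java spark are allowed to move in an attempt to match the doc.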

 

java    is      very    good    spark   is

java    spark
java     -->    spark                   1 move
java     ------------>  spark           2 moves
java     -------------------->  spark   3 moves

 

The slop here is 3: within the phrase java spark, spark has to move 3 times before the phrase lines up with the doc.

 

slop is not the exact number of moves the query string terms make to match a doc. It is the maximum number of moves the terms may make while trying to match a doc.
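The move counting above can be sketched in a few lines of Python. This is only an illustration, not Elasticsearch's implementation: `tokenize` and `min_slop` are hypothetical helper names, the tokenizer is a rough stand-in for an analyzer (lowercase, split on non-alphanumerics, no stop-word removal), and the query is assumed to contain no repeated terms.

```python
import re
from itertools import product

def tokenize(text):
    # Rough stand-in for an analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def min_slop(query, doc):
    """Smallest slop at which `query` would match `doc` as a phrase, or None."""
    q_terms = tokenize(query)
    d_terms = tokenize(doc)
    # Positions of each query term inside the document.
    positions = [[i for i, t in enumerate(d_terms) if t == q] for q in q_terms]
    if any(not p for p in positions):
        return None  # a query term is missing, so no slop value can help
    best = None
    for combo in product(*positions):
        # Shift each document position back by the term's offset in the query;
        # a perfect phrase match makes all shifted values equal.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        moves = max(shifted) - min(shifted)
        if best is None or moves < best:
            best = moves
    return best

doc = "hello world, java is very good, spark is also very good."
print(min_slop("java spark", doc))  # 3
```

A query matches whenever the configured slop is greater than or equal to this value, which is why slop acts as an upper bound rather than an exact move count.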

 

With slop set to 3, the match succeeds:

 

GET /forum/article/_search

{

    "query": {

        "match_phrase": {

            "title": {

                "query": "java spark",

                "slop":  3

            }

        }

    }

}

 

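This matches the doc above, and that doc is returned in the results.

But if slop is set to 2, spark in java spark can move at most 2 times. That is not enough to line up with the doc, so the doc is not returned.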

 

Experiments: verifying what slop means

Experiment 1

GET /forum/article/_search

{

  "query": {

    "match_phrase": {

      "content": {

        "query": "spark data",

        "slop": 1

      }

    }

  }

}

Result:

{

  "took": 1,

  "timed_out": false,

  "_shards": {

    "total": 5,

    "successful": 5,

    "failed": 0

  },

  "hits": {

    "total": 0,

    "max_score": null,

    "hits": []

  }

}

 

Experiment 2

GET /forum/article/_search

{

  "query": {

    "match_phrase": {

      "content": {

        "query": "spark data",

        "slop": 2

      }

    }

  }

}

Result:

{

  "took": 1,

  "timed_out": false,

  "_shards": {

    "total": 5,

    "successful": 5,

    "failed": 0

  },

  "hits": {

    "total": 0,

    "max_score": null,

    "hits": []

  }

}

 

Experiment 3

GET /forum/article/_search

{

  "query": {

    "match_phrase": {

      "content": {

        "query": "spark data",

        "slop": 3

      }

    }

  }

}

Result:

{

  "took": 1,

  "timed_out": false,

  "_shards": {

    "total": 5,

    "successful": 5,

    "failed": 0

  },

  "hits": {

    "total": 1,

    "max_score": 0.21824157,

    "hits": [

      {

        "_index": "forum",

        "_type": "article",

        "_id": "5",

        "_score": 0.21824157,

        "_source": {

          "articleID": "DHJK-B-1395-#Ky5",

          "userID": 3,

          "hidden": false,

          "postDate": "2017-03-01",

          "tag": [

            "elasticsearch"

          ],

          "tag_cnt": 1,

          "view_cnt": 10,

          "title": "this is spark blog",

          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",

          "sub_title": "haha, hello world",

          "author_first_name": "Tonny",

          "author_last_name": "Peter Smith"

        }

      }

    ]

  }

}

 

spark   is      best    big     data    solution based on scala ,an programming language similar to java spark

spark   data
spark    -->    data                    1 move
spark    ------------>  data            2 moves
spark    -------------------->  data    3 moves

Experiment 4 (advanced)
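To check the count of 3 against the actual document, it helps to look at term positions, which is the information a phrase query works from. The sketch below builds a tiny positional index in Python. `term_positions` is a hypothetical helper, and the regex tokenizer is only an approximation of an analyzer with no stop-word removal.

```python
import re
from collections import defaultdict

def term_positions(text):
    """Map each term to the list of positions where it occurs."""
    index = defaultdict(list)
    # Rough stand-in for an analyzer: lowercase, split on non-alphanumerics.
    for position, term in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        index[term].append(position)
    return dict(index)

content = ("spark is best big data solution based on scala ,an programming "
           "language similar to java spark")
index = term_positions(content)
print(index["spark"])  # [0, 15]
print(index["data"])   # [4]

# The phrase "spark data" wants data at position 1, right after spark at 0.
# It actually sits at position 4, so it has to move 3 positions.
gap = index["data"][0] - (index["spark"][0] + 1)
print(gap)  # 3
```

That gap of 3 is consistent with slop 2 returning nothing and slop 3 returning doc 5.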

GET /forum/article/_search

{

  "query": {

    "match_phrase": {

      "content": {

        "query": "data spark",

        "slop": 5

      }

    }

  }

}

Result:

{

  "took": 1,

  "timed_out": false,

  "_shards": {

    "total": 5,

    "successful": 5,

    "failed": 0

  },

  "hits": {

    "total": 1,

    "max_score": 0.154366,

    "hits": [

      {

        "_index": "forum",

        "_type": "article",

        "_id": "5",

        "_score": 0.154366,

        "_source": {

          "articleID": "DHJK-B-1395-#Ky5",

          "userID": 3,

          "hidden": false,

          "postDate": "2017-03-01",

          "tag": [

            "elasticsearch"

          ],

          "tag_cnt": 1,

          "view_cnt": 10,

          "title": "this is spark blog",

          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",

          "sub_title": "haha, hello world",

          "author_first_name": "Tonny",

          "author_last_name": "Peter Smith"

        }

      }

    ]

  }

}

 

 

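spark   is      best    big     data

data    spark
data/spark                                      1 move (data steps onto spark's position)
spark   data                                    2 moves (the two terms have swapped)
spark    -->    data                            3 moves
spark    ------------>  data                    4 moves
spark    -------------------->  data            5 moves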
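For a two-term phrase, the cost of a reversed match can be sketched with one formula. This is an illustration of the counting in the diagram, not Lucene's code, and `moves_needed` is a hypothetical name: given the document positions of the first and second query terms, the number of moves is the distance between where the second term is and where the phrase wants it to be.

```python
def moves_needed(first_pos, second_pos):
    """Moves for a two-term phrase "first second", given document positions."""
    # In order and adjacent means second_pos == first_pos + 1, which costs 0.
    return abs(second_pos - first_pos - 1)

# Document positions: spark(0) is(1) best(2) big(3) data(4)
print(moves_needed(0, 4))  # query "spark data" -> 3
print(moves_needed(4, 0))  # query "data spark" -> 5
print(moves_needed(1, 0))  # adjacent but reversed -> 2
```

The last line shows why reversed order is more expensive: swapping two adjacent terms already costs 2 moves, so "data spark" needs 2 more than "spark data" against the same document.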

 

In a slop search, the closer together the keywords are, the higher the relevance score. The experiment below illustrates this.

{

  "took": 4,

  "timed_out": false,

  "_shards": {

    "total": 5,

    "successful": 5,

    "failed": 0

  },

  "hits": {

    "total": 3,

    "max_score": 1.3728157,

    "hits": [

      {

        "_index": "forum",

        "_type": "article",

        "_id": "2",

        "_score": 1.3728157,

        "_source": {

          "articleID": "KDKE-B-9947-#kL5",

          "userID": 1,

          "hidden": false,

          "postDate": "2017-01-02",

          "tag": [

            "java"

          ],

          "tag_cnt": 1,

          "view_cnt": 50,

          "title": "this is java blog",

          "content": "i think java is the best programming language",

          "sub_title": "learned a lot of course",

          "author_first_name": "Smith",

          "author_last_name": "Williams",

          "new_author_last_name": "Williams",

          "new_author_first_name": "Smith"

        }

      },

      {

        "_index": "forum",

        "_type": "article",

        "_id": "5",

        "_score": 0.5753642,

        "_source": {

          "articleID": "DHJK-B-1395-#Ky5",

          "userID": 3,

          "hidden": false,

          "postDate": "2017-03-01",

          "tag": [

            "elasticsearch"

          ],

          "tag_cnt": 1,

          "view_cnt": 10,

          "title": "this is spark blog",

          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",

          "sub_title": "haha, hello world",

          "author_first_name": "Tonny",

          "author_last_name": "Peter Smith",

          "new_author_last_name": "Peter Smith",

          "new_author_first_name": "Tonny"

        }

      },

      {

        "_index": "forum",

        "_type": "article",

        "_id": "1",

        "_score": 0.28582606,

        "_source": {

          "articleID": "XHDK-A-1293-#fJ3",

          "userID": 1,

          "hidden": false,

          "postDate": "2017-01-01",

          "tag": [

            "java",

            "hadoop"

          ],

          "tag_cnt": 2,

          "view_cnt": 30,

          "title": "this is java and elasticsearch blog",

          "content": "i like to write best elasticsearch article",

          "sub_title": "learning more courses",

          "author_first_name": "Peter",

          "author_last_name": "Smith",

          "new_author_last_name": "Smith",

          "new_author_first_name": "Peter"

        }

      }

    ]

  }

}

 

Experiment

GET /forum/article/_search

{

  "query": {

    "match_phrase": {

      "content": {

        "query": "java best",

        "slop": 15

      }

    }

  }

}

Result:

{

  "took": 3,

  "timed_out": false,

  "_shards": {

    "total": 5,

    "successful": 5,

    "failed": 0

  },

  "hits": {

    "total": 2,

    "max_score": 0.65380025,

    "hits": [

      {

        "_index": "forum",

        "_type": "article",

        "_id": "2",

        "_score": 0.65380025,

        "_source": {

          "articleID": "KDKE-B-9947-#kL5",

          "userID": 1,

          "hidden": false,

          "postDate": "2017-01-02",

          "tag": [

            "java"

          ],

          "tag_cnt": 1,

          "view_cnt": 50,

          "title": "this is java blog",

          "content": "i think java is the best programming language",

          "sub_title": "learned a lot of course",

          "author_first_name": "Smith",

          "author_last_name": "Williams",

          "new_author_last_name": "Williams",

          "new_author_first_name": "Smith"

        }

      },

      {

        "_index": "forum",

        "_type": "article",

        "_id": "5",

        "_score": 0.07111243,

        "_source": {

          "articleID": "DHJK-B-1395-#Ky5",

          "userID": 3,

          "hidden": false,

          "postDate": "2017-03-01",

          "tag": [

            "elasticsearch"

          ],

          "tag_cnt": 1,

          "view_cnt": 10,

          "title": "this is spark blog",

          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",

          "sub_title": "haha, hello world",

          "author_first_name": "Tonny",

          "author_last_name": "Peter Smith",

          "new_author_last_name": "Peter Smith",

          "new_author_first_name": "Tonny"

        }

      }

    ]

  }

}
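Why doc 2 scores so much higher than doc 5 can be sketched with a simple weight. As far as I know, Lucene's sloppy phrase scoring weights each match by 1 / (1 + distance), where distance is the number of moves used. Treat that formula as an assumption here. The real `_score` also depends on idf and field-length normalization, so the numbers below are not expected to equal the scores in the response. `sloppy_weight` is a hypothetical name, and the positions assume no stop-word removal.

```python
def sloppy_weight(distance):
    # Assumed weighting: a match that needs more moves counts for less.
    return 1.0 / (1 + distance)

# Query "java best" (java first, best second).
# doc 2: "i think java is the best ..."  -> java at 2, best at 5
# doc 5: "spark is best ... java spark"  -> java at 14, best at 2
doc2_moves = abs(5 - 2 - 1)    # 2
doc5_moves = abs(2 - 14 - 1)   # 13

print(round(sloppy_weight(doc2_moves), 3))  # 0.333
print(round(sloppy_weight(doc5_moves), 3))  # 0.071
```

Both documents fit inside slop 15, but the match that needs fewer moves gets the larger weight, which agrees with the ordering of the two hits.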

 

 

 

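In fact, a phrase match with slop added is a proximity match.

1. java spark must appear in the doc as an exact phrase: phrase match

2. java spark may have some distance between the terms, but the closer they are, the higher the doc ranks: proximity match

In short, the terms of the search phrase are moved until they line up with the content of the document.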
