Elasticsearch: match_phrase and the difference between the edge_ngram and ngram tokenizers

****** This post is just a record of points I ran into in a project, so I don't have to go searching for them again next time. Personal understanding only!!! For reference only!!!

Since the official documentation says little about match_phrase, you can refer to the article linked in the original post, which compares match and match_phrase in some detail.

Note: like match, match_phrase tokenizes the search terms before querying. But there is one point in that article (the part highlighted in red in its figure, omitted here) that I didn't understand at first; an example follows.
The edge_ngram tokenizer is used here.
ngram produces substrings starting from every position: with min_gram=1 and max_gram=2, "name" becomes n, na, a, am, m, me, e. edge_ngram only produces prefixes from the start of the token: n, na.
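As a quick illustration, here is a plain-Python sketch of the two token sets (my own simplification, assuming min_gram=1 and max_gram=2; the function names are mine):

```python
def ngrams(word, min_gram=1, max_gram=2):
    """All substrings of length min_gram..max_gram, from every position."""
    return [word[i:i + n]
            for i in range(len(word))
            for n in range(min_gram, max_gram + 1)
            if i + n <= len(word)]

def edge_ngrams(word, min_gram=1, max_gram=2):
    """Only prefixes of length min_gram..max_gram (the 'edge' of the word)."""
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]

print(ngrams("name"))       # ['n', 'na', 'a', 'am', 'm', 'me', 'e']
print(edge_ngrams("name"))  # ['n', 'na']
```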

1. Create the mapping, specifying a custom edge_ngram analyzer

PUT localhost:9200/edge_ngram_custom_example
{
  "mappings":{
      "properties": {
        "content": {
            "type": "text",
            "analyzer": "my_edge_ngram"
            }
        }
    },  
  "settings": {
    "analysis": {
      "analyzer": {
          "my_edge_ngram": {
                "tokenizer": "custom_edge_ngram"
          }        
      },
      "tokenizer": {
        "custom_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 2,
          "token_chars": [
            "letter",
            "punctuation",
            "symbol",
            "digit"
            ]
        }
      }
    }
  }
}

2. Index a document

POST localhost:9200/edge_ngram_custom_example/_doc/2

{
    "content": "that isnot a test"
}

3. Query 1

POST localhost:9200/edge_ngram_custom_example/_search

{
    "query":{
        "match_phrase":{
            "content": {
                "query": "th is a t"
                // ,"slop": 0
            }
            
        }
    }
}

Result:
{
    "took": 18,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.7260926,
        "hits": [
            {
                "_index": "edge_ngram_custom_example",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.7260926,
                "_source": {
                    "content": "that isnot a test"
                }
            }
        ]
    }
}

4. Query 2

POST localhost:9200/edge_ngram_custom_example/_search
{
    "query":{
        "match_phrase":{
            "content": {
                "query": "th i a t"
                // ,"slop": 0
            }
            
        }
    }
}

Result:
{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

Queries 1 and 2 differ only by a single "s" in the search terms. So let's look at how the document is tokenized:

POST localhost:9200/edge_ngram_custom_example/_analyze
{
   "analyzer": "my_edge_ngram",
   "text": "that isnot a test"
}

Result:
{
    "tokens": [
        {
            "token": "t",
            "start_offset": 0,
            "end_offset": 1,
            "type": "word",
            "position": 0
        },
        {
            "token": "th",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 1
        },
        {
            "token": "i",
            "start_offset": 5,
            "end_offset": 6,
            "type": "word",
            "position": 2
        },
        {
            "token": "is",
            "start_offset": 5,
            "end_offset": 7,
            "type": "word",
            "position": 3
        },
        {
            "token": "a",
            "start_offset": 11,
            "end_offset": 12,
            "type": "word",
            "position": 4
        },
        {
            "token": "t",
            "start_offset": 13,
            "end_offset": 14,
            "type": "word",
            "position": 5
        },
        {
            "token": "te",
            "start_offset": 13,
            "end_offset": 15,
            "type": "word",
            "position": 6
        }
    ]
}

As you can see, "isnot" is tokenized into "i" and "is". Going by the rule above that positions must be consecutive, "th" and "a" have both "i" and "is" between them, so in theory they could never be adjacent. Yet the query that lands on document positions 1, 3, 4, 5 (Query 1) matches, while 1, 2, 4, 5 (Query 2) does not. The missing piece is that match_phrase analyzes the query string with the same analyzer and compares the query's own token positions against the document's. Query 1, "th is a t", is itself tokenized into t(0), th(1), i(2), is(3), a(4), t(5), which lines up one-to-one with the document's token positions 0 through 5, so the phrase matches. Query 2, "th i a t", becomes t(0), th(1), i(2), a(3), t(4); the document has "is", not "a", at the position right after "i", so no consecutive alignment exists and nothing is found.
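To convince myself, here is a minimal Python sketch (my own simplification, not the real Lucene implementation) that mimics the analyzer's position assignment from the _analyze output above and a slop=0 phrase match:

```python
def analyze(text, min_gram=1, max_gram=2):
    """Mimic the custom edge_ngram analyzer: split on whitespace,
    emit prefix ngrams, and give each emitted token the next position."""
    tokens, pos = [], 0
    for word in text.split():
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append((word[:n], pos))
            pos += 1
    return tokens

def phrase_match(doc_tokens, query_tokens):
    """slop=0 phrase match: some offset s must place every query token
    at position s + (its own position in the analyzed query) in the doc."""
    doc = set(doc_tokens)
    top = max(p for _, p in doc_tokens)
    return any(all((qt, s + qp) in doc for qt, qp in query_tokens)
               for s in range(top + 1))

doc = analyze("that isnot a test")
# doc == [('t',0), ('th',1), ('i',2), ('is',3), ('a',4), ('t',5), ('te',6)]
print(phrase_match(doc, analyze("th is a t")))  # True: query positions 0-5 align with doc positions 0-5
print(phrase_match(doc, analyze("th i a t")))   # False: doc has 'is', not 'a', right after 'i'
```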
