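ElasticSearch 30: Getting started with search engines: how the query string is analyzed, and the answer to the question left open by the mapping introduction example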

Author: 一枚程序员

Article link: https://blog.csdn.net/m0_37557582/article/details/78952942

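1. How the query string is analyzed

The query string must be analyzed with the same analyzer that was used when the index was built.

The query string treats exact values and full text differently. Depending on its type, a field may be full text or an exact value.

For example: date: exact value

_all: full text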

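Suppose we have a document with a field whose value is hello you and me, and an inverted index is built from it.

We want to search the index that holds this document, and the search text is hello me. That search text is the query string.

By default, ES analyzes the query string with the same analyzer that was used to build the inverted index for the corresponding field, applying both tokenization and normalization. Only then can the search return correct results.

For example, if dogs was normalized to dog when the inverted index was built, but the search still looks up dogs, nothing is found. The dogs in the query therefore also has to become dog at search time.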
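The dogs -> dog case can be illustrated with a toy inverted index. This is only a sketch: the `analyze` function, its crude plural stripping, and the second sample document are assumptions made up for the illustration, not what an ES analyzer actually does.

```python
import re

def analyze(text):
    """Toy analyzer: lowercase, split on non-alphanumerics, strip a plural 's'."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Crude normalization (an assumption, not a real stemmer): dogs -> dog
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]

def build_index(docs):
    """Map each normalized term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query, analyzer):
    """Return ids of documents that match any term of the analyzed query."""
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return hits

# Document 1 is the example from the text; document 2 is hypothetical.
docs = {1: "hello you and me", 2: "I like dogs"}
index = build_index(docs)

same = search(index, "dogs", analyze)              # query analyzed like the index
raw = search(index, "dogs", lambda q: q.split())   # query left unnormalized
print(same, raw)
```

With the same analyzer on both sides the query term becomes dog and finds document 2. With the raw query the lookup key is dogs, which never made it into the index, so nothing is found.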

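2. How mapping works, revealed

Run the commands

GET /website/article/_search?q=2018

GET /website/article/_search?q=2018-01-01

These search the _all field: all fields of a document are concatenated into one string, which is then analyzed.

For the document below, for example, the _all field is: 2018-01-02 my second article this is my second article in this website 11400

The other documents work the same way.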

      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": 1,
        "_source": {
          "post_date": "2018-01-02",
          "title": "my second article",
          "content": "this is my second article in this website",
          "author_id": 11400
        }
      },

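So after analysis, the inverted index for the date terms looks like this:

2018 doc1 doc2 doc3

01 doc1 doc2 doc3

02 doc2

03 doc3

Because _all is searched as full text, a search for 2018 or for 2018-01-01 (analyzed into 2018, 01, 01) returns all three documents: doc1, doc2 and doc3.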
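The _all behavior can be mimicked in a few lines of Python. This is a rough sketch, not ES internals: the regex tokenizer stands in for the analyzer, and the contents of doc3 are assumed by analogy with the other two documents.

```python
import re

docs = {
    "doc1": {"post_date": "2018-01-01", "title": "my first article",
             "content": "this is my first article in this website",
             "author_id": 11400},
    "doc2": {"post_date": "2018-01-02", "title": "my second article",
             "content": "this is my second article in this website",
             "author_id": 11400},
    # doc3 is assumed; only its existence is implied by the text.
    "doc3": {"post_date": "2018-01-03", "title": "my third article",
             "content": "this is my third article in this website",
             "author_id": 11400},
}

def all_field(source):
    """Concatenate every field value into one string, like the _all field."""
    return " ".join(str(value) for value in source.values())

def tokenize(text):
    """Rough stand-in for the analyzer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Build the inverted index over the concatenated _all text.
index = {}
for doc_id, source in docs.items():
    for term in tokenize(all_field(source)):
        index.setdefault(term, set()).add(doc_id)

def search_all(query):
    """Tokenize the query and return every document matching any term."""
    hits = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return sorted(hits)

print(all_field(docs["doc2"]))
print(search_all("2018"))
print(search_all("2018-01-01"))
```

Both queries return all three documents, because the query 2018-01-01 is broken into 2018, 01 and 01, and every document contains at least one of those terms.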

By contrast, when the following queries are run:

GET /website/article/_search?q=post_date:2018-01-01
GET /website/article/_search?q=post_date:2018

date: the value is indexed as an exact value.

            doc1  doc2  doc3

2018-01-01   *

2018-01-02         *

2018-01-03               *

So for post_date:2018-01-01 the query string is not tokenized. It is searched as an exact value, so only one document is returned.

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "post_date": "2018-01-01",
          "title": "my first article",
          "content": "this is my first article in this website",
          "author_id": 11400
        }
      }
    ]
  }
}

Why does post_date:2018 also return one document? This comes from some additional optimizations made after ES 5.2 (to be analyzed in a later article).

post_date:01 returns no data.
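For contrast with the full-text case, an exact-value index can be sketched as a plain lookup keyed by the whole value. This is a simplified illustration and not how ES stores dates internally. The date for doc3 is assumed, and the sketch deliberately does not model the post-5.2 behavior of post_date:2018 mentioned above.

```python
# post_date values per document; the doc3 value is an assumption.
post_dates = {
    "doc1": "2018-01-01",
    "doc2": "2018-01-02",
    "doc3": "2018-01-03",
}

# Exact-value index: the whole value is the key, with no tokenization.
exact_index = {}
for doc_id, value in post_dates.items():
    exact_index.setdefault(value, set()).add(doc_id)

def search_exact(value):
    """Look the query up as one untokenized value."""
    return sorted(exact_index.get(value, set()))

print(search_exact("2018-01-01"))
print(search_exact("01"))
```

The full date finds exactly one document, while 01 finds nothing, because 01 was never stored as a term of its own.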

3. Testing an analyzer

Specify the analyzer type and the text to be analyzed

GET /_analyze
{
  "analyzer": "standard",
  "text": "text to analyzer"
}

Result:

{
  "tokens": [
    {
      "token": "text",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "to",
      "start_offset": 5,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "analyzer",
      "start_offset": 8,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
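To make the fields of this response easier to read, here is a small Python sketch that produces the same structure for simple ASCII text. It is an approximation only: the hypothetical `analyze` function uses a regex instead of the real standard tokenizer and labels every token `<ALPHANUM>`, which is a simplification.

```python
import re

def analyze(text):
    """Approximate the _analyze response for simple ASCII input."""
    tokens = []
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({
            "token": match.group().lower(),   # lowercased term
            "start_offset": match.start(),    # index of the first character
            "end_offset": match.end(),        # index just past the last character
            "type": "<ALPHANUM>",             # simplification: one type for all
            "position": position,             # ordinal of the token in the text
        })
    return {"tokens": tokens}

result = analyze("text to analyzer")
for token in result["tokens"]:
    print(token)
```

The offsets are character positions in the original text (end exclusive), and position is the token's order in the stream, which is what phrase queries rely on.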