Elasticsearch 7 Series: Learning the Query Syntax

Mr.Songx

Article link: https://blog.csdn.net/RIGHTSONG/article/details/115580816

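match queries

A match query is a full-text (analyzed) query, generally used on text fields. Before searching, the search string is analyzed into tokens, and each token is then used to query. A document matches as long as the token set of the queried field contains any one of the tokens produced from the search string. Only text fields are split into multiple tokens.

If a field is not analyzed, its whole value is indexed as a single term. A match query then only finds it when the search string equals the entire value, and partial words do not match. In Elasticsearch 7 you get this by mapping the field as keyword; "index": "not_analyzed" is the older, pre-5.x syntax.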

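Basic match query

As mentioned above, a match query first analyzes the search string. For a basic match query, a document matches as long as any one token of the search string appears in it.

For example, when searching for 福建省福州市 (Fuzhou City, Fujian Province), the search string is first split into 福建省 and 福州市. Every document containing either of these terms is returned.

Note: the default standard analyzer handles Chinese poorly and splits 福建省福州市 into single characters. To get 福建省 and 福州市 as tokens, use a Chinese analyzer such as IK.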
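The following rough Python approximation shows why the standard analyzer produces single characters for Chinese. It is an assumption for illustration only. The real standard tokenizer implements Unicode text segmentation (UAX #29). This sketch merely makes each CJK ideograph its own token, keeps runs of ASCII letters and digits together, and lowercases everything. The function name approx_standard_analyze is hypothetical.

```python
import re
from typing import List

def approx_standard_analyze(text: str) -> List[str]:
    """Hypothetical approximation of the standard analyzer: every CJK ideograph
    becomes one token, runs of ASCII letters/digits form one token, all lowercased."""
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower())

print(approx_standard_analyze("福建省福州市"))          # ['福', '建', '省', '福', '州', '市']
print(approx_standard_analyze("502 Baycliff Terrace"))  # ['502', 'baycliff', 'terrace']
```

To see the real output for a given analyzer, send the text to the _analyze API rather than relying on this approximation.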

GET /bank/_search
{
  "query": {
    "match": {
      "address": "福建省福州市"
    }
  }
}

Here bank is the index name and address is the field name.

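Result explanation

took: time spent on this search, in milliseconds

hits.max_score: the highest relevance score among all matching documents. Scores are computed by Lucene; the default similarity in Elasticsearch 7 is BM25, a refinement of TF/IDF

hits.total.value: the number of documents matching the query (relation "eq" means the count is exact)

_score: the relevance score of an individual document

By default, hits are sorted by relevance score in descending order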

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 32.424847,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "995",
        "_score" : 32.424847,
        "_source" : {
          "account_number" : 995,
          "balance" : 21153,
          "firstname" : "Phelps",
          "lastname" : "Parrish",
          "age" : 25,
          "gender" : "M",
          "address" : "福建省福州市",
          "employer" : "Pearlessa",
          "email" : "phelpsparrish@pearlessa.com",
          "city" : "Brecon",
          "state" : "ME"
        }
      }
    ]
  }
}

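If we only want documents that contain both 福建省 and 福州市, we can change the operator. The default operator is or, so a match on either 福建省 or 福州市 is enough. Changing it to and requires both to match.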

GET /bank/_search
{
  "query": {
    "match": {
      "address": {
        "query":"福建省福州市",
        "operator": "and"
      }
    }
  }
}

Here bank is the index name and address is the field name.
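The difference between the default or operator and and can be modelled in a few lines of Python. This is a simplified sketch with three assumptions:

- The field value and the query are already analyzed into tokens. The tokens here are hypothetical IK-style tokens, and the 鼓楼区 document is made up.
- Scoring is ignored entirely.
- The function name match is illustrative, not an Elasticsearch API.

```python
from typing import Iterable

def match(doc_tokens: Iterable[str], query_tokens: Iterable[str], operator: str = "or") -> bool:
    """Sketch of match-query matching on already-analyzed tokens.
    operator="or": any query token must be present; operator="and": all of them."""
    doc = set(doc_tokens)
    query = list(query_tokens)
    if operator == "or":
        return any(t in doc for t in query)
    if operator == "and":
        return all(t in doc for t in query)
    raise ValueError(f"unsupported operator: {operator}")

query = ["福建省", "福州市"]                        # assumed tokens of 福建省福州市
print(match(["福州市", "鼓楼区"], query))          # True: one token is enough with or
print(match(["福州市", "鼓楼区"], query, "and"))   # False: 福建省 is missing
print(match(["福建省", "福州市"], query, "and"))   # True
```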

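Phrase match (match_phrase)

A match_phrase query first analyzes the query string into a list of terms. It then searches for all of those terms, but keeps only documents that contain every term at adjacent positions and in the same order.

In the example below, I query the address field of the bank index for the phrase 中华人民共和国福建省福州市. Only records that contain 中华人民共和国, 福建省 and 福州市 as three adjacent words are returned.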

GET /bank/_search
{
  "query": {
    "match_phrase": {
      "address": "中华人民共和国福建省福州市"
    }
  }
}
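Here is a minimal Python sketch of the slop=0 rule: every query term must appear at consecutive document positions, in order. It assumes coarse-grained, hypothetical tokens for the address (the real IK output shown later in this article is much finer) and ignores scoring. positions and phrase_matches_exact are illustrative names.

```python
from typing import Dict, List

def positions(tokens: List[str]) -> Dict[str, List[int]]:
    """Map each token to the positions where it occurs (position = index in the list)."""
    pos: Dict[str, List[int]] = {}
    for i, token in enumerate(tokens):
        pos.setdefault(token, []).append(i)
    return pos

def phrase_matches_exact(doc_tokens: List[str], query_tokens: List[str]) -> bool:
    """slop=0: query term i must sit at document position start + i for some start."""
    if not query_tokens:
        return False
    doc_pos = positions(doc_tokens)
    for start in doc_pos.get(query_tokens[0], []):
        if all(start + i in doc_pos.get(t, []) for i, t in enumerate(query_tokens)):
            return True
    return False

# Hypothetical coarse-grained tokens of the address field
doc = ["中华人民共和国", "福建省", "福州市", "马尾区"]
print(phrase_matches_exact(doc, ["中华人民共和国", "福建省", "福州市"]))  # True
print(phrase_matches_exact(doc, ["中华人民共和国", "福州市"]))            # False: not adjacent
```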

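Result explanation

The result shows that only one record satisfies the match_phrase query. Compared with the match results above, this is indeed the only record whose address contains 中华人民共和国, 福建省 and 福州市 as three adjacent words.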

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 17.139534,
    "hits" : [
      {
        "_index" : "bank2",
        "_type" : "_doc",
        "_id" : "990",
        "_score" : 17.139534,
        "_source" : {
          "account_number" : 990,
          "balance" : 44456,
          "firstname" : "Kelly",
          "lastname" : "Steele",
          "age" : 35,
          "gender" : "M",
          "address" : "中华人民共和国福建省福州市马尾区名称中心铂悦府",
          "employer" : "Eschoir",
          "email" : "kellysteele@eschoir.com",
          "city" : "Stewartville",
          "state" : "ID"
        }
      }
    ]
  }
}

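slop: term position offset

Question: what if I also want 中华人民共和国福州市 to match?

There are several approaches (the second one is the focus here):

1. Use match with operator and. 中华人民共和国福州市 is analyzed first and then used to search. A document matches only if it contains all of the resulting tokens.

2. Use match_phrase with slop. slop is the number of position moves the query terms are allowed to make. Let's look at an example.

By default match_phrase uses slop 0. This requires the analyzed query terms to appear in the document in exactly the same relative order and positions.

That is why the earlier example without slop finds nothing, whether we search for 中华人民共和国福州市 or 福建省中华人民共和国. The terms of these phrases do not appear in the document at the same relative positions.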

GET /bank3/_search
{
  "query": {
    "match_phrase": {
      "address": {
        "query": "502 Terrace",
        "slop": 1
      }
    }
  }
}

As the example shows, the bank3 index has a record with address = 502 Baycliff Terrace. This value is analyzed into 3 lowercased terms, 502, baycliff and terrace. The analysis output is:

{
  "tokens" : [
    {
      "token" : "502",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "ARABIC",
      "position" : 0
    },
    {
      "token" : "baycliff",
      "start_offset" : 4,
      "end_offset" : 12,
      "type" : "ENGLISH",
      "position" : 1
    },
    {
      "token" : "terrace",
      "start_offset" : 13,
      "end_offset" : 20,
      "type" : "ENGLISH",
      "position" : 2
    }
  ]
}

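We can see that 502 is at position 0, baycliff at position 1 and terrace at position 2. With the default slop=0, the query terms 502 and terrace are 1 position apart, while in the document they are 2 apart, so the document is not found. With slop=1, terrace may move one position further (position + 1). The distance between the two terms then matches the document, and the record is found.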
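The position arithmetic can be sketched in Python as a simplified model of sloppy phrase matching. Each query term may be shifted, and the spread between the largest and smallest shift must not exceed slop. This is an illustration under stated assumptions: it ignores scoring and the finer cases Lucene handles with repeated terms, and match_length and phrase_matches are hypothetical names. Under this model, swapping two adjacent terms costs 2, so a reversed phrase needs a larger slop.

```python
from itertools import product
from typing import Dict, List

def match_length(doc_pos: Dict[str, List[int]], query: List[str]) -> float:
    """Smallest sloppy-phrase distance: pick a document position d_i for each query
    term i; the distance is max(d_i - i) - min(d_i - i). inf if a term is missing."""
    choices = [doc_pos.get(term, []) for term in query]
    if not query or any(not c for c in choices):
        return float("inf")
    best = float("inf")
    for combo in product(*choices):  # brute force, fine for short phrases
        offsets = [d - i for i, d in enumerate(combo)]
        best = min(best, max(offsets) - min(offsets))
    return best

def phrase_matches(doc_pos: Dict[str, List[int]], query: List[str], slop: int = 0) -> bool:
    return match_length(doc_pos, query) <= slop

# Positions from the analysis output above for "502 Baycliff Terrace"
doc_pos = {"502": [0], "baycliff": [1], "terrace": [2]}
print(match_length(doc_pos, ["502", "terrace"]))            # 1
print(phrase_matches(doc_pos, ["502", "terrace"], slop=0))  # False
print(phrase_matches(doc_pos, ["502", "terrace"], slop=1))  # True
```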

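slop values when searching Chinese text

Why didn't I use a Chinese field to explain the example above? Let's see what happens when we search a Chinese field: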

GET /bank3/_search
{
  "query": {
    "match_phrase": {
      "address": {
        "query": "中华人民共和国福州市",
        "slop": 1
      }
    }
  }
}

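Looking at this example, one might assume that matching the record 中华人民共和国福建省福州市马尾区名称中心铂悦府 only requires slop=1. It does not: slop=3 is needed. The reason lies in how Chinese text is analyzed. This field uses the IK analyzer, which produces the following tokens: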

{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "中华人民",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "中华",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "华人",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "人民共和国",
      "start_offset" : 2,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "人民",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "共和国",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "共和",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "国",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 8
    },
    {
      "token" : "福建省",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 9
    },
    {
      "token" : "福建",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 10
    },
    {
      "token" : "省",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "CN_CHAR",
      "position" : 11
    },
    {
      "token" : "福州市",
      "start_offset" : 10,
      "end_offset" : 13,
      "type" : "CN_WORD",
      "position" : 12
    },
    {
      "token" : "福州",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 13
    },
    {
      "token" : "市",
      "start_offset" : 12,
      "end_offset" : 13,
      "type" : "CN_CHAR",
      "position" : 14
    },
    {
      "token" : "马尾区",
      "start_offset" : 13,
      "end_offset" : 16,
      "type" : "CN_WORD",
      "position" : 15
    },
    {
      "token" : "马尾",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "CN_WORD",
      "position" : 16
    },
    {
      "token" : "区",
      "start_offset" : 15,
      "end_offset" : 16,
      "type" : "CN_CHAR",
      "position" : 17
    },
    {
      "token" : "名称",
      "start_offset" : 16,
      "end_offset" : 18,
      "type" : "CN_WORD",
      "position" : 18
    },
    {
      "token" : "中心",
      "start_offset" : 18,
      "end_offset" : 20,
      "type" : "CN_WORD",
      "position" : 19
    },
    {
      "token" : "铂",
      "start_offset" : 20,
      "end_offset" : 21,
      "type" : "CN_CHAR",
      "position" : 20
    },
    {
      "token" : "悦",
      "start_offset" : 21,
      "end_offset" : 22,
      "type" : "CN_CHAR",
      "position" : 21
    },
    {
      "token" : "府",
      "start_offset" : 22,
      "end_offset" : 23,
      "type" : "CN_CHAR",
      "position" : 22
    }
  ]
}

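We can see that 福州市 is at position 12 and 中华人民共和国 is at position 0. In the document the two terms are therefore 12 - 0 = 12 positions apart, and the query terms must end up the same distance apart for the document to be found. The query 中华人民共和国福州市 itself is analyzed as: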

{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "中华人民",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "中华",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "华人",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "人民共和国",
      "start_offset" : 2,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "人民",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "共和国",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "共和",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "国",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 8
    },
    {
      "token" : "福州市",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 9
    },
    {
      "token" : "福州",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 10
    },
    {
      "token" : "市",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "CN_CHAR",
      "position" : 11
    }
  ]
}

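Here 福州市 is at position 9 and 中华人民共和国 at position 0, so they are 9 - 0 = 9 positions apart. Setting slop=3 allows 福州市 to move to position 9 + 3 = 12. The distance then becomes 12, the same as in the document, so the record is found. The tokens 福州 and 市 that follow it shift by the same 3 positions, so slop=3 covers them as well.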
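The same arithmetic can be checked against the full IK token lists above. The sketch below assumes that every token occurs exactly once in each list. It computes the slop the phrase needs as the spread of (document position minus query position). The function name required_slop is hypothetical.

```python
from typing import List

# Tokens in position order, copied from the IK analysis output of the document above
DOC = ["中华人民共和国", "中华人民", "中华", "华人", "人民共和国", "人民", "共和国", "共和", "国",
       "福建省", "福建", "省", "福州市", "福州", "市", "马尾区", "马尾", "区", "名称", "中心",
       "铂", "悦", "府"]
# Tokens of the query 中华人民共和国福州市 in position order (positions 0..11)
QUERY = DOC[:9] + ["福州市", "福州", "市"]

def required_slop(doc: List[str], query: List[str]) -> int:
    """Assuming each token occurs once: slop needed = spread of (doc_pos - query_pos)."""
    doc_pos = {token: i for i, token in enumerate(doc)}
    offsets = [doc_pos[token] - q for q, token in enumerate(query)]
    return max(offsets) - min(offsets)

print(required_slop(DOC, QUERY))  # 3
```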

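Multi-field match (multi_match)

multi_match is a convenient way to run the same query against several fields, i.e. a match query on each of the specified fields. The most commonly used types are best_fields, most_fields and cross_fields (phrase and phrase_prefix also exist); the default is best_fields.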
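The sketch below shows how best_fields and most_fields combine per-field scores. It simplifies the documented behaviour:

- best_fields takes the highest field score, plus tie_breaker times the scores of the other fields.
- most_fields adds the field scores together.

The per-field scores are made-up inputs, not real BM25 values, and the function names are hypothetical.

```python
from typing import Dict

def best_fields_score(field_scores: Dict[str, float], tie_breaker: float = 0.0) -> float:
    """best_fields: the best field's score; other matching fields only add
    tie_breaker * their score (tie_breaker defaults to 0.0)."""
    if not field_scores:
        return 0.0
    best = max(field_scores.values())
    return best + tie_breaker * (sum(field_scores.values()) - best)

def most_fields_score(field_scores: Dict[str, float]) -> float:
    """most_fields: the per-field scores are added together, like a bool should query."""
    return sum(field_scores.values())

scores = {"address": 15.0, "email": 2.0}           # hypothetical per-field scores
print(best_fields_score(scores))                   # 15.0
print(best_fields_score(scores, tie_breaker=0.3))  # about 15.6
print(most_fields_score(scores))                   # 17.0
```

With the default tie_breaker of 0.0, a document that matches well in one field is not helped by weak matches in other fields. most_fields rewards matching in many fields.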

GET /bank3/_search
{
  "query": {
    "multi_match": {
        "query": "中华人民共和国 phelpsparrish",
        "fields": ["address","email"]
    }
  }
}

As the example shows, the query 中华人民共和国 phelpsparrish is run against both the address and email fields. The result is:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 15.804777,
    "hits" : [
      {
        "_index" : "bank3",
        "_type" : "_doc",
        "_id" : "990",
        "_score" : 15.804777,
        "_source" : {
          "account_number" : 990,
          "balance" : 44456,
          "firstname" : "Kelly",
          "lastname" : "Steele",
          "age" : 35,
          "gender" : "M",
          "address" : "中华人民共和国福建省福州市马尾区名称中心铂悦府",
          "employer" : "Eschoir",
          "email" : "kellysteele@eschoir.com",
          "city" : "Stewartville",
          "state" : "ID"
        }
      },
      {
        "_index" : "bank3",
        "_type" : "_doc",
        "_id" : "995",
        "_score" : 6.5046196,
        "_source" : {
          "account_number" : 995,
          "balance" : 21153,
          "firstname" : "Phelps",
          "lastname" : "Parrish",
          "age" : 25,
          "gender" : "M",
          "address" : "福建省福州市",
          "employer" : "Pearlessa",
          "email" : "phelpsparrish@pearlessa.com",
          "city" : "Brecon",
          "state" : "ME"
        }
      }
    ]
  }
}

More features

We won't go deeper into multi_match here; see the official documentation for more features:
Official docs: query-dsl-multi-match-query

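boost: controlling clause weight

boost sets the weight of a query clause, in plain terms how important the clause is. The larger the value, the more important the clause: when it matches, the document's relevance score increases accordingly.

Requirement
Find posts whose title contains java, and rank first the posts whose title also contains hadoop or elasticsearch. In addition, if one post contains java hadoop and another contains java elasticsearch, the hadoop post should rank above the elasticsearch post.
Solution
Give hadoop and elasticsearch their own boost, with hadoop's boost greater than elasticsearch's. Documents containing hadoop then generally get a higher relevance score than documents containing elasticsearch.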

GET /forum/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "java"
          }
        }
      ],
      "should": [
        {
          "match": {
            "title": {
              "query": "hadoop",
              "boost": 5
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "elasticsearch",
              "boost": 3
            }
          }
        }
      ]
    }
  }
}

For the bool query syntax, see the compound queries section later in this article.
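Here is a toy Python model of why the boosted should clauses produce the required ranking. Its simplifying assumptions:

- Each matching clause contributes 1.0 times its boost. Real scores also depend on BM25 statistics, so actual values differ.
- The post titles are hypothetical.
- toy_score is an illustrative name.

```python
from typing import Dict, List

def toy_score(title_terms: List[str], must_term: str, should_boosts: Dict[str, float]) -> float:
    """Toy model: the must clause contributes 1.0 and each matching should clause
    contributes 1.0 * its boost. A score of 0.0 means the post is not returned."""
    terms = set(title_terms)
    if must_term not in terms:
        return 0.0
    return 1.0 + sum(boost for term, boost in should_boosts.items() if term in terms)

boosts = {"hadoop": 5, "elasticsearch": 3}
posts = {  # hypothetical analyzed titles
    "p1": ["java"],
    "p2": ["java", "elasticsearch"],
    "p3": ["java", "hadoop"],
    "p4": ["python", "hadoop"],
}
scored = {p: toy_score(t, "java", boosts) for p, t in posts.items()}
ranking = sorted((p for p, s in scored.items() if s > 0), key=lambda p: -scored[p])
print(ranking)  # ['p3', 'p2', 'p1']
```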

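term queries

A term query is an exact-value query, generally used on keyword fields. The search term is not analyzed before searching, so it must be exactly one of the terms in the document's analyzed term set. In other words, the whole search term is compared as-is against the indexed terms.

For example, suppose a document field has the value 北京奥运 (Beijing Olympics). Depending on the analyzer, its term set might contain two values, 北京 and 奥运. A term query for either 北京 or 奥运 then matches, but a term query for 北京奥运欢迎您 does not.

Note: Elasticsearch has two string types, text and keyword. A text field is analyzed when it is indexed, and English text is lowercased during analysis. A term query does not analyze its input, so uppercase letters in the query are not lowercased and may fail to match the document's terms (matching is case-sensitive). keyword fields do not have this problem, because their values are indexed unchanged.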
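Here is a minimal Python sketch of the case-sensitivity pitfall. It assumes a simplified text analyzer (whitespace split plus lowercase) and a keyword field that indexes the value unchanged. All function names are hypothetical stand-ins, not client APIs.

```python
from typing import Callable, List

def analyze_text(value: str) -> List[str]:
    """Simplified analyzer for a text field: split on whitespace and lowercase."""
    return value.lower().split()

def analyze_keyword(value: str) -> List[str]:
    """keyword field: the whole value is one term, case preserved."""
    return [value]

def term_query(indexed: List[str], query: str) -> bool:
    return query in indexed  # the query string is NOT analyzed

def match_query(indexed: List[str], query: str, analyzer: Callable[[str], List[str]]) -> bool:
    return any(t in indexed for t in analyzer(query))  # query analyzed like the field

gender_text = analyze_text("M")        # ['m']
gender_keyword = analyze_keyword("M")  # ['M']
print(term_query(gender_text, "M"))                 # False: 'M' != 'm'
print(match_query(gender_text, "M", analyze_text))  # True: the query is lowercased too
print(term_query(gender_keyword, "M"))              # True
```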

Single-condition exact query

GET /bank/_search
{
  "query": {
    "term": {
      "address": "福"
    }
  }
}

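Here bank is the index name and address is the field name.

Result explanation

The default standard analyzer splits the document value 福建省福州市 into 6 tokens: 福, 建, 省, 福, 州 and 市. A term query performs exact matching, so any one of these characters matches, while words such as 福建 or 福州 do not.

You may want the exact value 福建省福州市 to match without single characters such as 福, 建, 省, 州 or 市 matching. In that case the field must not be analyzed when the index is created. In Elasticsearch 7 there are two ways to do this:

- Map the field as keyword. The older "index": "not_analyzed" setting is pre-5.x syntax.
- Query the .keyword sub-field that dynamic mapping adds to string fields, as the later examples do with gender.keyword.

The value is then stored as a single term, and a term query for 福建省福州市 matches.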
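Here is a minimal sketch that contrasts the two cases. It uses the standard-analyzer tokens listed above for the text field, and assumes a keyword mapping (or the .keyword sub-field) for the exact-value case. term_matches is a hypothetical helper.

```python
# Terms the standard analyzer produces for 福建省福州市 (as listed above; 福 appears twice)
text_terms = {"福", "建", "省", "州", "市"}
# A keyword field (or the .keyword sub-field) indexes the whole value as one term
keyword_terms = {"福建省福州市"}

def term_matches(indexed_terms: set, query: str) -> bool:
    """term query: the unanalyzed query must equal one indexed term exactly."""
    return query in indexed_terms

for q in ["福", "福建", "福建省福州市"]:
    print(q, term_matches(text_terms, q), term_matches(keyword_terms, q))
# 福 True False
# 福建 False False
# 福建省福州市 False True
```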

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 6.980161,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "995",
        "_score" : 6.980161,
        "_source" : {
          "account_number" : 995,
          "balance" : 21153,
          "firstname" : "Phelps",
          "lastname" : "Parrish",
          "age" : 25,
          "gender" : "M",
          "address" : "福建省福州市",
          "employer" : "Pearlessa",
          "email" : "phelpsparrish@pearlessa.com",
          "city" : "Brecon",
          "state" : "ME"
        }
      }
    ]
  }
}

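range queries

A range query matches documents whose field values fall within a given range. The underlying Lucene query depends on the field type:

- For string fields it is a TermRangeQuery.
- For numeric and date fields, Elasticsearch 7 uses a points-based range query. The profile output later in this article shows the age range as IndexOrDocValuesQuery. NumericRangeQuery is the name from older Lucene versions.

Operator	Meaning
gte	greater than or equal to
gt	greater than
lte	less than or equal to
lt	less than

Numeric range query

Requirement: find records whose age is between 10 and 40 (both bounds inclusive)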

GET /bank3/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 10,
        "lte": 40
      }
    }
  }
}
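The four operators translate directly into comparisons. This Python sketch (in_range is a hypothetical helper, and the ages are made up) applies every bound that is given, the way the range query combines them.

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None) -> bool:
    """Mimic the range operators: every bound that is given must hold."""
    if gte is not None and not value >= gte:
        return False
    if gt is not None and not value > gt:
        return False
    if lte is not None and not value <= lte:
        return False
    if lt is not None and not value < lt:
        return False
    return True

ages = [5, 10, 25, 40, 41]
print([a for a in ages if in_range(a, gte=10, lte=40)])  # [10, 25, 40]
print([a for a in ages if in_range(a, gt=10, lt=40)])    # [25]
```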

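Date range query

Requirement: find blog posts published during the last day. Strictly speaking, the query below returns posts from yesterday: from the start of yesterday up to, but not including, the start of today.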

GET website/_search
{
  "query": {
    "range": {
      "post_date": {
        "gte": "now-1d/d",  // start of yesterday: now minus 1 day, rounded down to the day
        "lt":  "now/d"      // start of today (excluded): now rounded down to the day
      }
    }
  }
}

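The DSL above uses date math expressions (now, -1d, /d and so on); they are explained below.

Date math expressions

In Elasticsearch, a date math expression is anchored either at now (the current system time) or at a date string followed by ||.

The anchor can be followed by one or more math expressions:

+1h: add 1 hour;
-1d: subtract 1 day;
/d: round to the day (down or up, depending on the range operator; see below).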

Time units supported by Elasticsearch date math:

Expression	Meaning	Expression	Meaning
y	years	M	months
w	weeks	d	days
h	hours	H	hours
m	minutes	s	seconds

Examples, assuming the current system time is now = 2018-10-01 12:00:00:

now+1h: now in milliseconds plus 1 hour, giving 2018-10-01 13:00:00.
now-1h: now minus 1 hour, giving 2018-10-01 11:00:00.
now-1h/d: now minus 1 hour, then rounded down to the start of the day, giving 2018-10-01 00:00:00.
2018.10.01||+1M/d: 2018-10-01 plus 1 month, then rounded down to the start of the day, giving 2018-11-01 00:00:00.
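The examples can be reproduced with a small Python date-math evaluator. It is a sketch under assumptions, not Elasticsearch's parser:

- It only accepts now or a yyyy.MM.dd anchor followed by ||.
- It always rounds down. In range queries, gt and lte round up instead, as the rounding rules below explain.
- It does not handle week rounding.
- All helper names are hypothetical.

```python
import calendar
import re
from datetime import datetime, timedelta

# Fixed-length units; y and M are handled separately because their length varies.
FIXED = {"w": timedelta(weeks=1), "d": timedelta(days=1), "h": timedelta(hours=1),
         "H": timedelta(hours=1), "m": timedelta(minutes=1), "s": timedelta(seconds=1)}
OPS = re.compile(r"([+-])(\d+)([yMwdhHms])|/([yMwdhHms])")

def add_months(dt: datetime, n: int) -> datetime:
    idx = dt.month - 1 + n
    year, month = dt.year + idx // 12, idx % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])  # e.g. Jan 31 + 1M -> end of Feb
    return dt.replace(year=year, month=month, day=day)

def round_down(dt: datetime, unit: str) -> datetime:
    """Round down to the first instant of the given unit."""
    fields = {"y": dict(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
              "M": dict(day=1, hour=0, minute=0, second=0, microsecond=0),
              "d": dict(hour=0, minute=0, second=0, microsecond=0),
              "h": dict(minute=0, second=0, microsecond=0),
              "H": dict(minute=0, second=0, microsecond=0),
              "m": dict(second=0, microsecond=0),
              "s": dict(microsecond=0)}
    if unit not in fields:
        raise ValueError(f"rounding to {unit!r} is not handled in this sketch")
    return dt.replace(**fields[unit])

def date_math(expr: str, now: datetime) -> datetime:
    if expr.startswith("now"):
        dt, ops = now, expr[3:]
    else:
        anchor, ops = expr.split("||", 1)
        dt = datetime.strptime(anchor, "%Y.%m.%d")  # assumed anchor format: yyyy.MM.dd
    if not re.fullmatch(r"(?:[+-]\d+[yMwdhHms]|/[yMwdhHms])*", ops):
        raise ValueError(f"unsupported date math: {ops!r}")
    for sign, num, unit, rounding in OPS.findall(ops):
        if rounding:
            dt = round_down(dt, rounding)
        else:
            n = int(num) if sign == "+" else -int(num)
            if unit in ("y", "M"):
                dt = add_months(dt, 12 * n if unit == "y" else n)
            else:
                dt += n * FIXED[unit]
    return dt

now = datetime(2018, 10, 1, 12, 0, 0)
for expr in ["now+1h", "now-1h", "now-1h/d", "2018.10.01||+1M/d"]:
    print(expr, date_math(expr, now))
```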

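Rounding dates

When a date in a range query is rounded to the day, month, hour and so on, the rounding direction depends on the operator, i.e. on whether the range boundary is inclusive or exclusive:

Rounding up: moves to the last millisecond of the rounding unit;

Rounding down: moves to the first millisecond of the rounding unit.

Examples:

① "gt": "2018-12-18||/M": greater than, so it rounds up to 2018-12-31T23:59:59.999, i.e. the whole of December is excluded.

② "gte": "2018-12-18||/M": greater than or equal to, so it rounds down to 2018-12-01, i.e. the whole of December is included.

③ "lt": "2018-12-18||/M": less than, so it rounds down to 2018-12-01, i.e. the whole of December is excluded.

④ "lte": "2018-12-18||/M": less than or equal to, so it rounds up to 2018-12-31T23:59:59.999, i.e. the whole of December is included.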
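The four cases can be checked with a short Python sketch of month rounding. It is an illustration of the rules above, not Elasticsearch code, and round_to_month is a hypothetical name: gte and lt round down, gt and lte round up to the last millisecond.

```python
from datetime import datetime, timedelta

def round_to_month(dt: datetime, op: str) -> datetime:
    """How "<date>||/M" is rounded in a range query: gte and lt round down to the
    first millisecond of the month, gt and lte round up to its last millisecond."""
    first = dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if op in ("gte", "lt"):
        return first
    if op in ("gt", "lte"):
        if first.month == 12:
            next_first = first.replace(year=first.year + 1, month=1)
        else:
            next_first = first.replace(month=first.month + 1)
        return next_first - timedelta(milliseconds=1)
    raise ValueError(f"unknown range operator: {op}")

d = datetime(2018, 12, 18)
for op in ("gt", "gte", "lt", "lte"):
    print(op, round_to_month(d, op).isoformat(timespec="milliseconds"))
# gt 2018-12-31T23:59:59.999
# gte 2018-12-01T00:00:00.000
# lt 2018-12-01T00:00:00.000
# lte 2018-12-31T23:59:59.999
```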

Date range query with a custom format

GET website/_search
{
    "query": {
        "range": {
            "post_date": {
                "gte": "2/1/2018", 
                "lte": "2019",
                "format": "dd/MM/yyyy||yyyy"
            }
        }
    }
}
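The "||" in format separates alternative patterns, which are tried in turn. The Python analogue below shows which calendar date each bound parses to. It assumes a hand-written mapping from the two Java-style patterns above to strptime patterns; FORMATS and parse_with_formats are hypothetical. It deliberately ignores how Elasticsearch fills in missing date components for range bounds such as "2019", so it does not show the exact bound Elasticsearch uses.

```python
from datetime import datetime

# Hypothetical mapping from the Java-style patterns used above to strptime patterns
FORMATS = {"dd/MM/yyyy": "%d/%m/%Y", "yyyy": "%Y"}

def parse_with_formats(value: str, fmt: str) -> datetime:
    """Try each '||'-separated pattern in order and return the first successful parse."""
    for pattern in fmt.split("||"):
        try:
            return datetime.strptime(value, FORMATS[pattern])
        except ValueError:
            continue
    raise ValueError(f"{value!r} does not match {fmt!r}")

print(parse_with_formats("2/1/2018", "dd/MM/yyyy||yyyy"))  # 2018-01-02 00:00:00
print(parse_with_formats("2019", "dd/MM/yyyy||yyyy"))      # 2019-01-01 00:00:00
```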

Date range query with a time zone

GET website/_search
{
    "query": {
        "range": {
            "post_date": {
                "gte": "2018-01-01 00:00:00",
                "lte": "now",
                "format": "yyyy-MM-dd HH:mm:ss",
                "time_zone": "+01:00"
            }
        }
    }
}

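Dates in Elasticsearch are stored internally in UTC. The 2018-01-01 00:00:00 above, interpreted in the +01:00 time zone, is therefore converted to 2017-12-31T23:00:00 UTC. Note that time_zone does not affect now, which is always the current time in UTC.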
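The conversion can be reproduced with Python's standard datetime module. This only illustrates the arithmetic and does not call Elasticsearch; the variable names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

plus_one = timezone(timedelta(hours=1))           # "time_zone": "+01:00"
local_start = datetime(2018, 1, 1, 0, 0, 0, tzinfo=plus_one)
utc_start = local_start.astimezone(timezone.utc)  # the bound as stored/compared in UTC
print(utc_start.isoformat())                      # 2017-12-31T23:00:00+00:00
```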

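Compound queries

A bool query combines must, should, must_not and filter clauses, each of which can hold match, term, range and other queries.

The scoring clauses (must and should) each compute their own relevance score for the document. The bool query then combines these scores and returns a single score for the whole boolean operation. must_not and filter clauses only include or exclude documents and do not contribute to the score.

Example: find records where age is 40, address matches 福州, and state does not match ID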

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": 40 } },
        { "match": { "address": "福州"} }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}

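Here bank is the index name, and age, address and state are field names.

must clause

Documents must match these conditions to be included. The clauses inside must are combined with AND. match, term, range and other queries can be combined here, and their scores contribute to the document's _score.

must_not clause

Documents must not match these conditions to be included. The clauses inside must_not behave like OR: matching any one of them excludes the document. They run in filter context and do not contribute to the score.

should clause

When the bool query also contains must or filter clauses, as in the examples here, should clauses do not change which documents are returned. They only affect each document's _score: documents that match should clauses get a higher _score.

If a bool query contains only should clauses, at least one of them must match, because minimum_should_match defaults to 1 in that case.

filter clause

Documents must match these conditions to be included, but filter clauses do not affect the relevance score.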
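The clause semantics above can be condensed into a small Python model. It is a toy under stated assumptions:

- Each matching scoring clause contributes a fixed 1.0 instead of a real BM25 score.
- Documents are plain dicts, based on the bank3 record shown later.
- term, between and bool_query are hypothetical helpers, not client APIs.

It does encode the documented default: minimum_should_match is 1 only when there are should clauses but no must or filter clauses.

```python
from typing import Callable, Dict, Optional, Sequence

Doc = Dict[str, object]
Clause = Callable[[Doc], float]  # returns a positive score on match, 0.0 otherwise

def term(field: str, value: object) -> Clause:
    return lambda doc: 1.0 if doc.get(field) == value else 0.0

def between(field: str, gte: float, lte: float) -> Clause:
    return lambda doc: 1.0 if gte <= doc.get(field, float("nan")) <= lte else 0.0

def bool_query(doc: Doc, must: Sequence[Clause] = (), filters: Sequence[Clause] = (),
               must_not: Sequence[Clause] = (), should: Sequence[Clause] = (),
               minimum_should_match: Optional[int] = None) -> Optional[float]:
    """Return the combined score, or None if the document is not returned."""
    must_scores = [c(doc) for c in must]
    if any(s <= 0 for s in must_scores):
        return None                                # must: every clause required
    if any(c(doc) <= 0 for c in filters):
        return None                                # filter: required, not scored
    if any(c(doc) > 0 for c in must_not):
        return None                                # must_not: any match excludes
    should_hits = [s for s in (c(doc) for c in should) if s > 0]
    if minimum_should_match is None:               # default: 1 only without must/filter
        minimum_should_match = 1 if should and not must and not filters else 0
    if len(should_hits) < minimum_should_match:
        return None
    return sum(must_scores) + sum(should_hits)     # should matches raise the score

amber = {"gender": "M", "age": 32, "state": "IL", "city": "Brogan", "employer": "Pyrami"}
query = dict(must=[term("gender", "M"), between("age", 30, 40)],
             must_not=[term("state", "ID")],
             should=[term("city", "Brogan")],
             filters=[term("employer", "Pyrami")])
print(bool_query(amber, **query))                       # 3.0
print(bool_query({**amber, "state": "ID"}, **query))    # None (excluded by must_not)
print(bool_query({**amber, "city": "Other"}, **query))  # 2.0 (still returned, lower score)
```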

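Putting it together

As described above, each scoring clause computes its own relevance score, and the bool query combines these scores into a single score for the whole boolean operation. Let's look at an example.

Note: if we query gender instead of gender.keyword for M here, nothing is found. The gender field is a text field, so its value was lowercased during analysis. A term query does not analyze its input, so the uppercase M is not converted to m and the search returns no results.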

GET bank3/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": { "gender.keyword": "M" } },
        { "range": {
            "age": {
              "gte":30,
              "lte": 40
            }
          }
        }
      ],
      "must_not": [
        { "term": { "state.keyword":  "ID"   } }
      ],
      "should": [
        { "term": { "city.keyword":  "Brogan" } }
      ],
      "filter": [
        { "term": {
          "employer.keyword": "Pyrami"
        }}
      ]
    }
  }
}

Search result

{
  "took" : 25,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 8.182548,
    "hits" : [
      {
        "_index" : "bank3",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 8.182548,
        "_source" : {
          "account_number" : 1,
          "balance" : 39225,
          "firstname" : "Amber",
          "lastname" : "Duke",
          "age" : 32,
          "gender" : "M",
          "address" : "880 Holmes Lane",
          "employer" : "Pyrami",
          "email" : "amberduke@pyrami.com",
          "city" : "Brogan",
          "state" : "IL"
        }
      }
    ]
  }
}

Sorting by a field

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}

Here bank is the index name and account_number is the field name.

Pagination

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}

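Here bank is the index name and account_number is the field name. from is the 0-based offset of the first hit to return and size is the number of hits per page. So from=10, size=10 returns hits 11 to 20, i.e. the second page.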
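Here is a minimal Python sketch that turns a 1-based page number into from/size and applies it to a sorted hit list. It assumes the default index.max_result_window of 10,000, which caps from + size. page_params is a hypothetical helper, and the account numbers are made up.

```python
def page_params(page: int, size: int, max_result_window: int = 10_000) -> dict:
    """Turn a 1-based page number into from/size for _search.
    from + size may not exceed index.max_result_window (10,000 by default)."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size
    if offset + size > max_result_window:
        raise ValueError("page too deep for from/size; consider search_after instead")
    return {"from": offset, "size": size}

hits = list(range(1, 101))  # hypothetical account_numbers 1..100, sorted ascending
p = page_params(2, 10)
print(p)                                      # {'from': 10, 'size': 10}
print(hits[p["from"]:p["from"] + p["size"]])  # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```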

Get a document by ID

GET /<indexname>/_doc/<_id>

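Performance analysis: profile

Queries often turn out to be slow in production, and sometimes even a simple single-field query is slow. The cause may be cluster resources or index settings. What we need is something like SQL's EXPLAIN: a way to analyze how a DSL query executes and measure how long each step takes. The Profile API provides this.

Using the Profile API

It is simple to use: add "profile" to the _search request body, as shown below: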

GET bank3/_search
{
  "profile": true,
  "query": {
    "bool": {
      "must": [
        {"term": { "gender.keyword": "M" } },
        { "range": {
            "age": {
              "gte":30,
              "lte": 40
            }
          }
        }
      ],
      "must_not": [
        { "term": { "state.keyword":  "ID"   } }
      ],
      "should": [
        { "term": { "city.keyword":  "Brogan" } }
      ],
      "filter": [
        { "term": {
          "employer.keyword": "Pyrami"
        }}
      ]
    }
  }
}

Result explanation

With "profile": true, the response contains the profile information in addition to the matching documents:

{
  "took" : 46,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 8.182548,
    "hits" : [
      {
        "_index" : "bank3",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 8.182548,
        "_source" : {
          "account_number" : 1,
          "balance" : 39225,
          "firstname" : "Amber",
          "lastname" : "Duke",
          "age" : 32,
          "gender" : "M",
          "address" : "880 Holmes Lane",
          "employer" : "Pyrami",
          "email" : "amberduke@pyrami.com",
          "city" : "Brogan",
          "state" : "IL"
        }
      }
    ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[9g5uTGlSSYy6ck-g75QbsQ][bank3][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "BooleanQuery",
                "description" : "+gender.keyword:M +age:[30 TO 40] -state.keyword:ID city.keyword:Brogan #employer.keyword:Pyrami",
                "time_in_nanos" : 3673536,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 1,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 26460,
                  "match" : 12474,
                  "next_doc_count" : 1,
                  "score_count" : 1,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 41317,
                  "advance_count" : 1,
                  "score" : 42366,
                  "build_scorer_count" : 3,
                  "create_weight" : 1596931,
                  "shallow_advance" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 1953988
                },
                "children" : [
                  {
                    "type" : "TermQuery",
                    "description" : "gender.keyword:M",
                    "time_in_nanos" : 168947,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 6,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 0,
                      "match" : 0,
                      "next_doc_count" : 0,
                      "score_count" : 1,
                      "compute_max_score_count" : 5,
                      "compute_max_score" : 18604,
                      "advance" : 9849,
                      "advance_count" : 2,
                      "score" : 1663,
                      "build_scorer_count" : 4,
                      "create_weight" : 45465,
                      "shallow_advance" : 22432,
                      "create_weight_count" : 1,
                      "build_scorer" : 70934
                    }
                  },
                  {
                    "type" : "IndexOrDocValuesQuery",
                    "description" : "age:[30 TO 40]",
                    "time_in_nanos" : 38493,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 1,
                      "shallow_advance_count" : 6,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 0,
                      "match" : 3166,
                      "next_doc_count" : 0,
                      "score_count" : 1,
                      "compute_max_score_count" : 5,
                      "compute_max_score" : 3015,
                      "advance" : 1623,
                      "advance_count" : 1,
                      "score" : 852,
                      "build_scorer_count" : 3,
                      "create_weight" : 2064,
                      "shallow_advance" : 3517,
                      "create_weight_count" : 1,
                      "build_scorer" : 24256
                    }
                  },
                  {
                    "type" : "TermQuery",
                    "description" : "state.keyword:ID",
                    "time_in_nanos" : 25757,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 0,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 0,
                      "match" : 0,
                      "next_doc_count" : 0,
                      "score_count" : 0,
                      "compute_max_score_count" : 0,
                      "compute_max_score" : 0,
                      "advance" : 1132,
                      "advance_count" : 1,
                      "score" : 0,
                      "build_scorer_count" : 2,
                      "create_weight" : 5520,
                      "shallow_advance" : 0,
                      "create_weight_count" : 1,
                      "build_scorer" : 19105
                    }
                  },
                  {
                    "type" : "TermQuery",
                    "description" : "city.keyword:Brogan",
                    "time_in_nanos" : 171235,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 2,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 0,
                      "match" : 0,
                      "next_doc_count" : 0,
                      "score_count" : 1,
                      "compute_max_score_count" : 1,
                      "compute_max_score" : 23985,
                      "advance" : 1383,
                      "advance_count" : 1,
                      "score" : 2004,
                      "build_scorer_count" : 2,
                      "create_weight" : 130689,
                      "shallow_advance" : 3206,
                      "create_weight_count" : 1,
                      "build_scorer" : 9968
                    }
                  },
                  {
                    "type" : "TermQuery",
                    "description" : "employer.keyword:Pyrami",
                    "time_in_nanos" : 68009,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 0,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 0,
                      "match" : 0,
                      "next_doc_count" : 0,
                      "score_count" : 0,
                      "compute_max_score_count" : 0,
                      "compute_max_score" : 0,
                      "advance" : 2104,
                      "advance_count" : 2,
                      "score" : 0,
                      "build_scorer_count" : 3,
                      "create_weight" : 1172,
                      "shallow_advance" : 0,
                      "create_weight_count" : 1,
                      "build_scorer" : 64733
                    }
                  }
                ]
              }
            ],
            "rewrite_time" : 73809,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 183976
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}

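Understanding the Profile API response

As the response above shows, the profile output is organized by shard, because in Elasticsearch each shard executes the search independently. Within each shard:

- The searches node holds the timing statistics for the query: the query tree with a per-step breakdown, rewrite_time and the collectors.
- The aggregations node holds the timing statistics for aggregations.

All times are in nanoseconds. With these statistics we can see where the time goes and analyze the cause of a performance problem.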
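A large profile response is easier to read once flattened. The sketch below walks the shards, searches and query tree of a response like the one above and lists each query node's time_in_nanos, slowest first. Its assumptions:

- The response has already been parsed into a Python dict, for example from the JSON body.
- The breakdown and collector sections are ignored.
- The sample is a trimmed copy of the response above.
- walk and query_timings are hypothetical names.

```python
from typing import Dict, Iterator, List, Tuple

def walk(node: Dict, depth: int = 0) -> Iterator[Tuple[int, str, int]]:
    """Yield (depth, description, time_in_nanos) for a profiled query node and its children."""
    yield depth, node["description"], node["time_in_nanos"]
    for child in node.get("children", []):
        yield from walk(child, depth + 1)

def query_timings(response: Dict) -> List[Tuple[str, int, str, int]]:
    """Flatten profile.shards[].searches[].query[] into (shard, depth, description, nanos)."""
    rows = []
    for shard in response["profile"]["shards"]:
        for search in shard["searches"]:
            for query in search["query"]:
                rows += [(shard["id"], d, desc, ns) for d, desc, ns in walk(query)]
    return rows

resp = {"profile": {"shards": [{
    "id": "[9g5uTGlSSYy6ck-g75QbsQ][bank3][0]",
    "searches": [{"query": [{
        "description": "+gender.keyword:M +age:[30 TO 40] -state.keyword:ID "
                       "city.keyword:Brogan #employer.keyword:Pyrami",
        "time_in_nanos": 3673536,
        "children": [
            {"description": "gender.keyword:M", "time_in_nanos": 168947},
            {"description": "age:[30 TO 40]", "time_in_nanos": 38493},
            {"description": "state.keyword:ID", "time_in_nanos": 25757},
            {"description": "city.keyword:Brogan", "time_in_nanos": 171235},
            {"description": "employer.keyword:Pyrami", "time_in_nanos": 68009},
        ],
    }]}],
}]}}

for shard, depth, desc, ns in sorted(query_timings(resp), key=lambda r: -r[3]):
    print(f"{'  ' * depth}{desc}: {ns / 1e6:.3f} ms")
```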