


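  • Elasticsearch is an open-source, highly scalable, distributed full-text search engine.
  • Elasticsearch is also a NoSQL document-oriented database.
  • Its goal is to hide the complexity of Lucene behind a simple RESTful API so that full-text search becomes easy.
  • Elasticsearch official blog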

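1.1 Common search engine frameworks

  • Lucene: the long-established search library
  • Solr: a search server built on Lucene
  • Elasticsearch: a distributed search engine built on Lucene
  • ELK: a distributed log analysis stack (Elasticsearch / Kibana / Logstash)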


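  • Key Elasticsearch concepts: cluster, node, index, document, shard, and replica

  • cluster: an ES cluster consists of one or more nodes. The cluster name can be set in elasticsearch.yml.

  • node: a node is a single Elasticsearch instance. In most environments each node runs on its own virtual machine.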

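  • index

    • 1. Concept: an index is a collection of documents. Each index holds one or more documents, and those documents are distributed across the index's shards.
    • 2. How a document is assigned to a shard of its index: when a document arrives, Elasticsearch hashes its id and stores the document in the shard that the hash selects. This spreads data fairly evenly, so that no single shard ends up much busier than the others.
      shard_num = hash(_id) % num_primary_shards
    • 3. Note: the formula also shows why the number of primary shards cannot be changed on an existing index: documents indexed earlier would no longer map to the shard that holds them. The number of replicas, by contrast, can be changed at any time.
  • shards: Elasticsearch can split an index into multiple pieces; each piece is called a shard.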

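  • replica

          2. Better performance: get and search requests can be handled by either a primary shard or a replica shard.
  • type
    1. Concept: a type is a logical container for documents, much as a table is a container for rows.
    The default type is _doc.

    Since ES 6.0, an index can contain only one type (types are deprecated in 7.x and removed in 8.0).
    Fields with the same name in different types are backed by the same field in Lucene.
  • mapping: the definitions of all fields of a type are called the mapping (similar to a table schema). It specifies each field's data type and other settings.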

  • document: ES is document-oriented. A document is the smallest unit of data that can be searched, comparable to a row in a relational database.
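
The routing formula shard_num = hash(_id) % num_primary_shards given under index can be tried out with a small sketch. This is only an illustration: the function names are hypothetical and MD5 stands in for the hash function, which is an assumption made here for convenience and not what Elasticsearch uses internally. It shows why changing the number of primary shards would send most existing ids to a different shard.

```python
import hashlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Toy version of shard_num = hash(_id) % num_primary_shards.

    MD5 is a stand-in hash chosen for this sketch only.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_primary_shards


def placement(doc_ids, num_primary_shards):
    """Map every document id to the shard the formula selects."""
    return {doc_id: pick_shard(doc_id, num_primary_shards) for doc_id in doc_ids}


ids = [f"doc-{i}" for i in range(1000)]
with_5 = placement(ids, 5)
with_6 = placement(ids, 6)

# Documents whose shard number changes when the primary shard count goes from 5 to 6
moved = sum(1 for doc_id in ids if with_5[doc_id] != with_6[doc_id])
print(f"{moved} of {len(ids)} documents would be looked up on a different shard")
```

Because a lookup recomputes the same formula, every document counted in moved would be searched for on a shard that does not hold it, which is the reason the primary shard count is fixed at index creation.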


Database (relational database)  ->  Index (Elasticsearch)



4.1 Multi-fields: the fields parameter in an Elasticsearch mapping

  • elastic-fields
  • a string field could be mapped as a text field for full-text search, and as a keyword field for sorting or aggregations
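  • The text type is used for full-text search; the keyword type is used for sorting, aggregations, and exact string matching
  • 1 Create the index my-index-000001 with a mapping in which the city field is a text field with a keyword sub-field named raw
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}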
  • 2 Add documents to the my-index-000001 index
PUT my-index-000001/_doc/1
{
  "city": "New York"
}

PUT my-index-000001/_doc/2
{
  "city": "York"
}

PUT my-index-000001/_doc/3
{
  "city": "MY York"
}
  • 3 Run a query with sorting and an aggregation
GET my-index-000001/_search
{
  "query": {
    "match": {
      "city": "york" // text field: full-text search
    }
  },
  "sort": [
    {
      "city.raw": {
        "order": "asc" // keyword sub-field: sorting
      }
    }
  ],
  "aggs": {
    "cities": {
      "terms": {
        "field": "city.raw" // keyword sub-field: aggregation
      }
    }
  }
}
  • 4 Query result shown in the Kibana console
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "city" : "MY York"
        },
        "sort" : [
          "MY York"
        ]
      },
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "city" : "New York"
        },
        "sort" : [
          "New York"
        ]
      },
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "city" : "York"
        },
        "sort" : [
          "York"
        ]
      }
    ]
  },
  "aggregations" : {
    "cities" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "MY York",
          "doc_count" : 1
        },
        {
          "key" : "New York",
          "doc_count" : 1
        },
        {
          "key" : "York",
          "doc_count" : 1
        }
      ]
    }
  }
}
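
The division of labor between city and city.raw can be imitated in a few lines of Python. This is a rough sketch under stated assumptions: analyze is a hypothetical stand-in for the standard analyzer (whitespace split plus lowercasing only), and the two dictionaries stand in for the inverted index of the text field and the stored values of the keyword sub-field.

```python
from collections import Counter

docs = {1: "New York", 2: "York", 3: "MY York"}


def analyze(text):
    """Rough stand-in for the standard analyzer: split on whitespace, lowercase."""
    return text.lower().split()


# "city" (text): analyzed term -> ids of the documents that contain it
text_index = {}
for doc_id, city in docs.items():
    for term in analyze(city):
        text_index.setdefault(term, set()).add(doc_id)

# "city.raw" (keyword): the untouched original value of each document
raw_values = dict(docs)


def search(query):
    """Match on the text field, then sort and aggregate on the keyword sub-field."""
    matched = set()
    for term in analyze(query):
        matched |= text_index.get(term, set())
    ordered = sorted(matched, key=lambda doc_id: raw_values[doc_id])
    buckets = Counter(raw_values[doc_id] for doc_id in matched)
    return ordered, dict(sorted(buckets.items()))


hits, buckets = search("york")
print(hits)     # document ids ordered by city.raw
print(buckets)  # one bucket per distinct city.raw value
```

The full-text match runs on the lowercased tokens, while ordering and bucketing use the original strings, which is why "MY York" sorts before "New York" in the console result.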

4.2 The date type

  • Set the format parameter of a date field in the mapping when creating the index
  • The default format for date is: strict_date_optional_time||epoch_millis
  • || means "or"
  • The format parameter lists the date string formats the field accepts
PUT my_index2
{
  "mappings": {
    "properties": {
      "create_time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
  • The my_index2 index accepts documents with any of these date formats:
    • yyyy-MM-dd HH:mm:ss.SSS
    • yyyy-MM-dd HH:mm:ss
    • yyyy-MM-dd
    • epoch_millis
  • Use the bulk API to add several documents with different date formats (in order: epoch_millis, yyyy-MM-dd HH:mm:ss.SSS, yyyy-MM-dd HH:mm:ss, yyyy-MM-dd)
POST my_index2/_bulk
{"index":{}}
{"create_time":1659669006440}
{"index":{}}
{"create_time":"2015-01-01 12:10:30.899"}
{"index":{}}
{"create_time":"2015-01-02 12:10:30"}
{"index":{}}
{"create_time":"2015-01-02"}
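  • Internally, ES converts every date it receives to UTC
    • yyyy-MM-ddTHH:mm:ssZ: the standard UTC form; T separates date and time, Z denotes the zero offset
    • yyyy-MM-ddTHH:mm:ss.SSSZ
    • China Standard Time = UTC+8 (GMT+8)
  • Java: format an ordinary date as a UTC string
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.TimeZone;

SimpleDateFormat utcDateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
// 'Z' is a literal here, so pin the formatter to UTC; otherwise local time gets a misleading Z suffix
utcDateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
System.out.println("format = " + utcDateFormat.format(new Date())); // e.g. format = 2022-08-05T08:17:00.412Z
// Instant.toString() always renders ISO-8601 in UTC
Instant instant = new Date().toInstant();
System.out.println("instant = " + instant);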
  • When a standard UTC date string is added to an index that has no explicit mapping, the field is automatically mapped as date
# Create the index
PUT my_index3

# Add data
POST _bulk

# Result
{
  "took" : 82,
  "errors" : false,
  "items" : [
    {
      "create" : {
        "_index" : "my_index3",
        "_type" : "_doc",
        "_id" : "qTcPbYIBs_E2asF4ZY82",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "create" : {
        "_index" : "my_index3",
        "_type" : "_doc",
        "_id" : "qjcPbYIBs_E2asF4ZY82",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}

# The create_time field was automatically mapped as date (mappings.properties.create_time.type = "date")
GET my_index3

{
  "my_index3" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "create_time" : {
          "type" : "date"
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "my_index3",
        "creation_date" : "1659687084248",
        "number_of_replicas" : "1",
        "uuid" : "jk9gIuPySFqCLKl6m2lolA",
        "version" : {
          "created" : "7120099"
        }
      }
    }
  }
}
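
How a format list joined with || accepts several inputs can be sketched in Python: try each alternative in order and keep the first one that parses. This is an illustration only. The strptime patterns are my own equivalents of the patterns in the my_index2 mapping, to_epoch_millis is a hypothetical helper, and treating a value without a zone as UTC is an assumption about the default behavior.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# strptime equivalents of the three string patterns in the my_index2 mapping
PATTERNS = ["%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]


def to_epoch_millis(value):
    """Try each accepted format in order, like the || alternatives in "format"."""
    if isinstance(value, int):  # epoch_millis: already milliseconds since the epoch
        return value
    for pattern in PATTERNS:
        try:
            parsed = datetime.strptime(value, pattern)
        except ValueError:
            continue  # this alternative did not match; try the next one
        # Assumption: a value that carries no zone is read as UTC
        aware = parsed.replace(tzinfo=timezone.utc)
        return (aware - EPOCH) // timedelta(milliseconds=1)
    raise ValueError(f"no configured format matches {value!r}")


samples = [1659669006440, "2015-01-01 12:10:30.899", "2015-01-02 12:10:30", "2015-01-02"]
for sample in samples:
    print(sample, "->", to_epoch_millis(sample))
```

A value that matches none of the alternatives raises an error here, which mirrors the way a document with an unparseable date is rejected at index time.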

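4.3 String types: the difference between text and keyword (term and match queries)

  • The text type is for full-text search: the value is tokenized first and the resulting terms go into the inverted index
    • text suits phrase search and full-text search over long text
    • The default analyzer is standard, which splits Chinese text into single characters
    • A keyword sub-field can be added under fields in the mapping to support exact queries as well
PUT my-index
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
  • The keyword type is for sorting, aggregations, and exact string matching: the value is not tokenized and goes into the inverted index as a single term
    • ignore_above: the maximum length of a value that is added to the inverted index; longer values are not indexed
    • A wildcard query on a keyword field is roughly equivalent to a LIKE query in MySQL
    • Use keyword for values that need exact lookup, such as user names, ID numbers, email addresses, and phone numbers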

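4.3.1 Verifying queries against text and keyword fields

  • Create the test_match index with a mapping in which text_name is a text field and key_name is a keyword field
PUT test_match
  • Add documents in bulk with the bulk API
POST test_match/_bulk
{"index":{}}
{"text_name":"JAVA BOOK","key_name":"JAVA BOOK"}
{"index":{}}
{"text_name":"ES BOOK","key_name":"ES BOOK"}
{"index":{}}
{"text_name":"JAVA READ BOOK","key_name":"JAVA READ BOOK"}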
  • Use the analyze API to see how text and keyword fields are tokenized (an analyzer can be specified)
    • A text field is tokenized and lowercased at index time
GET test_match/_analyze
{
  "field": "text_name",
  "text": "ES BOOK",
  "analyzer": "standard" // an analyzer can be specified
}

{
  "tokens" : [
    {
      "token" : "es",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "book",
      "start_offset" : 3,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

GET test_match/_analyze
{
  "field": "text_name",
  "text": "中国深圳",
  "analyzer": "standard"
}

{
  "tokens" : [
    {
      "token" : "中",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "国",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "深",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "圳",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    }
  ]
}
  • A keyword field is not tokenized at index time: the original value goes into the inverted index unchanged, case included
    • keyword = "ES BOOK" is stored in the inverted index as the single uppercase term ES BOOK
GET test_match/_analyze
{
  "field": "key_name",
  "text": "ES BOOK"
}

{
  "tokens" : [
    {
      "token" : "ES BOOK",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    }
  ]
}

term query on a text field
  • A term query looks up the terms of the inverted index directly. The query value is not tokenized beforehand; it is treated as one whole term
    • A text field is tokenized at index time: "JAVA BOOK", for example, is stored in the inverted index as the two terms java and book
GET test_match/_search
{
  "query": {
    "term": {
      "text_name": {
        "value": "book"
      }
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.14181954,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 0.14181954,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EKBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.14181954,
        "_source" : {
          "text_name" : "ES BOOK",
          "key_name" : "ES BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.11955717,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}

######## term does not tokenize "java book" before searching; it treats it as one whole term ########
######## so the following query returns nothing ########
GET test_match/_search
{
  "query": {
    "term": {
      "text_name": {
        "value": "java book"
      }
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

term or terms query on a keyword field
  • A keyword field is neither tokenized nor lowercased at index time, so the terms in the inverted index are the original values and lookups are case-sensitive
  • The query therefore has to pass the complete uppercase value JAVA BOOK or JAVA READ BOOK
GET test_match/_search
{
  "query": {
    "term": {
      "key_name": {
        "value": "JAVA BOOK"
      }
    }
  }
}

######## Query result ########
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9808291,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 0.9808291,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      }
    ]
  }
}
######## terms query on a keyword field ########
GET test_match/_search
{
  "query": {
    "terms": {
      "key_name": [
        "JAVA BOOK",
        "JAVA READ BOOK"
      ]
    }
  }
}

######## Query result ########
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 1.0,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 1.0,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}

match query on a keyword field
  • A match query on a keyword field does not tokenize the query text first, so it behaves like a term query
  • Setting the match operator to or or and makes no difference to the result
GET test_match/_search
{
  "query": {
    "match": {
      "key_name": {
        "query": "JAVA READ BOOK",
        "operator": "or"
      }
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9808291,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.9808291,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}

GET test_match/_search
{
  "query": {
    "match": {
      "key_name": {
        "query": "JAVA BOOK",
        "operator": "or"
      }
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9808291,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 0.9808291,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      }
    ]
  }
}

match query on a text field
  • A match query on a text field first tokenizes the query text, then lowercases the tokens, then searches (so the case of the query value does not matter)
    • For text_name = "JaVA Es", the text is first split into JaVA and Es, then lowercased to java and es, and then documents whose inverted index contains the term java or es are looked up
GET test_match/_search
{
  "query": {
    "match": {
      "text_name": "JaVA Es"
    }
  }
}

####### every document whose text_name contains the term java or es is returned #######
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0417082,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EKBoc4IBGSzUCqoEE7Qn",
        "_score" : 1.0417082,
        "_source" : {
          "text_name" : "ES BOOK",
          "key_name" : "ES BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 0.49917626,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.42081726,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}
  • operator = and in a match query means a document must contain every term of the query
GET test_match/_search
{
  "query": {
    "match": {
      "text_name": {
        "query": "JAVA BOoK",
        "operator": "and"
      }
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.6409958,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 0.6409958,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.5403744,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}

  • operator = or in a match query means containing any one of the terms is enough
GET test_match/_search
{
  "query": {
    "match": {
      "text_name": {
        "query": "jaVA BOoK",
        "operator": "or"
      }
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.6409958,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 0.6409958,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.5403744,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EKBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.14181954,
        "_source" : {
          "text_name" : "ES BOOK",
          "key_name" : "ES BOOK"
        }
      }
    ]
  }
}

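  • minimum_should_match in a match query sets the minimum number of query terms a document must contain
    • With operator = and every term is already required, so minimum_should_match adds nothing; the query below returns no documents because none contains all three terms
    • minimum_should_match is only meaningful together with operator = or
GET test_match/_search
{
  "query": {
    "match": {
      "text_name": {
        "query": "JAVA fuck BOoK",
        "operator": "and",
        "minimum_should_match": 1
      }
    }
  }
}

######## operator = and requires all three terms, so nothing matches whatever minimum_should_match is set to ########
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}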

GET test_match/_search
{
  "query": {
    "match": {
      "text_name": {
        "query": "JAVA fuck BOoK",
        "operator": "or",
        "minimum_should_match": 2 // match at least 2 terms
      }
    }
  }
}
######## with operator = or, minimum_should_match = 2 keeps the documents that contain at least two of the terms ########
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.6409958,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 0.6409958,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.5403744,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}

wildcard query on a keyword field: the equivalent of a MySQL LIKE query
  • In a wildcard query, ? stands for exactly one character
  • In a wildcard query, * stands for zero or more characters
  • A wildcard query on a keyword field is case-sensitive
GET test_match/_search
{
  "query": {"wildcard": {
    "key_name": {
      "value": "*READ*"
    }
  }}
}

### Query result ###
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 1.0,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}

GET test_match/_search
{
  "query": {"wildcard": {
    "key_name": {
      "value": "JAVA READ BOO?"
    }
  }}
}

### Query result ###

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 1.0,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}

wildcard query on a text field
  • A wildcard query on a text field can only match within a single term (and is not case-sensitive)
GET test_match/_search
{
  "query": {"wildcard": {
    "text_name": {
      "value": "*BOOK*" // the value book gives the same result
    }
  }}
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 1.0,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EKBoc4IBGSzUCqoEE7Qn",
        "_score" : 1.0,
        "_source" : {
          "text_name" : "ES BOOK",
          "key_name" : "ES BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 1.0,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}
  • A wildcard pattern that spans more than one term of a text field finds no documents
    • This shows that a wildcard query is matched against the terms in the text field's inverted index, so a pattern can cover at most one term
GET test_match/_search
{
  "query": {"wildcard": {
    "text_name": {
      "value": "*java book*"
    }
  }}
}

### The query returns no documents ###
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
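
The term, match and wildcard behavior observed in 4.3.1 can be reproduced with a toy model of the two fields. This is a sketch under stated assumptions, not how Elasticsearch is implemented: analyze is a hypothetical stand-in for the standard analyzer (whitespace split plus lowercasing), each document is reduced to a set of terms, scoring is ignored, and Python's fnmatchcase stands in for wildcard matching (it also gives [ ] a special meaning, which the wildcard query does not).

```python
from fnmatch import fnmatchcase

DOCS = ["JAVA BOOK", "ES BOOK", "JAVA READ BOOK"]


def analyze(text):
    """Rough stand-in for the standard analyzer: split on whitespace, lowercase."""
    return text.lower().split()


# text field: one term per lowercased token; keyword field: the whole value as one term
text_terms = [set(analyze(doc)) for doc in DOCS]
keyword_terms = [{doc} for doc in DOCS]


def term(terms_per_doc, value):
    """term query: the value is looked up as-is, without analysis."""
    return [DOCS[i] for i, terms in enumerate(terms_per_doc) if value in terms]


def match_text(query, operator="or", minimum_should_match=1):
    """match query on the text field: analyze the query, then count matching terms."""
    wanted = analyze(query)
    hits = []
    for i, terms in enumerate(text_terms):
        found = sum(1 for token in wanted if token in terms)
        # With "and" every term is required, so minimum_should_match plays no role
        needed = len(wanted) if operator == "and" else minimum_should_match
        if found >= needed:
            hits.append(DOCS[i])
    return hits


def wildcard(terms_per_doc, pattern):
    """wildcard query: the pattern is tested against each indexed term separately."""
    return [DOCS[i] for i, terms in enumerate(terms_per_doc)
            if any(fnmatchcase(t, pattern) for t in terms)]


print(term(text_terms, "book"))                   # the single term "book" is indexed
print(term(text_terms, "java book"))              # no indexed term equals "java book"
print(term(keyword_terms, "JAVA BOOK"))           # exact, case-sensitive value
print(match_text("JAVA BOoK", operator="and"))    # both terms required
print(match_text("JAVA other BOoK", minimum_should_match=2))
print(wildcard(keyword_terms, "*READ*"))          # pattern runs over the whole value
print(wildcard(text_terms, "*java book*"))        # no single term contains a space
```

The last call comes back empty for the same reason as in the console: the pattern is compared with java, read and book one at a time, never with the original sentence.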

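4.4 The alias data type


5.1 Inverted index, doc_values, and _source

5.2 Metadata fields

5.3 Understanding the store parameter in a mapping

5.4 Using fields to retrieve selected fields from a search

5.5 analyzer and search_analyzer in a mapping

  • analyzer: when a document is indexed, text fields are tokenized with this analyzer and the terms are added to the inverted index.
  • search_analyzer: at query time, the text being searched for is tokenized with this analyzer first, and the resulting terms are then looked up in the inverted index.
  • For the city field below, documents are indexed with the standard analyzer and queries on city use the english analyzer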
PUT my-index
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "english",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
  • Check how each analyzer tokenizes the text
GET _analyze
{
  "analyzer": "standard",
  "text": "JAVA BOOK"
}

GET _analyze
{
  "analyzer": "english",
  "text": "JAVA BOOK"
}

5.6 The ignore_above parameter of keyword fields in a mapping

  • 1 Create the index and set its mappings
  • With city.ignore_above = 10, a city value longer than 10 characters is not added to the inverted index, but it is still stored in _source
PUT my-index-xxx
{
  "mappings": {
    "properties": {
      "city": {
        "type": "keyword",
        "ignore_above": 10
      },
      "country": {
        "type": "keyword",
        "ignore_above": 20
      }
    }
  }
}
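  • 2 Add documents to the index (document 1 with a city value shorter than 10 characters, document 2 with a city value longer than 10)
## city value shorter than 10 characters
PUT my-index-xxx/_doc/1
{
  "city": "123456789","country": "china"
}
## city value longer than 10 characters
PUT my-index-xxx/_doc/2
{
  "city": "1234567890abc","country": "china"
}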
  • 3 wildcard query on the city field: document 2, whose city value is longer than 10 characters, is not found
GET my-index-xxx/_search
{
  "query": {
    "wildcard": {
      "city": "12345*"
    }
  }
}

############## Query result ##################
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-xxx",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "city" : "123456789",
          "country" : "china"
        }
      }
    ]
  }
}
  • Exact term query on the city field: document 2 is not found
GET my-index-xxx/_search
{
  "query": {"term": {
    "city": "1234567890abc"
  }}
}

############## Query result ##################
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
  • Paged query without conditions, or a query for everything: document 2 is returned
GET my-index-xxx/_search
{
  "from": 0,
  "size": 20
}

GET my-index-xxx/_search

############## Query result ##################
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-xxx",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "city" : "123456789",
          "country" : "china"
        }
      },
      {
        "_index" : "my-index-xxx",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "city" : "1234567890abc",
          "country" : "china"
        }
      }
    ]
  }
}
  • term query on the country field: document 2 is returned
GET my-index-xxx/_search
{
  "query": {"term": {
    "country": {
      "value": "china"
    }
  }}
}

############## Query result ##################
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.18232156,
    "hits" : [
      {
        "_index" : "my-index-xxx",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.18232156,
        "_source" : {
          "city" : "123456789",
          "country" : "china"
        }
      },
      {
        "_index" : "my-index-xxx",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.18232156,
        "_source" : {
          "city" : "1234567890abc",
          "country" : "china"
        }
      }
    ]
  }
}
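
The effect of ignore_above in the four queries above can be modeled in a few lines. This is a minimal sketch with hypothetical names: index_keyword builds a toy inverted index for one keyword field and simply skips values longer than the limit, while the original documents stay untouched in the docs dictionary, which plays the role of _source.

```python
def index_keyword(docs, field, ignore_above):
    """Build a toy inverted index for one keyword field.

    Values longer than ignore_above are skipped here but remain in docs (_source).
    """
    inverted = {}
    for doc_id, source in docs.items():
        value = source[field]
        if len(value) > ignore_above:
            continue  # not indexed, so no term-level query can find it
        inverted.setdefault(value, []).append(doc_id)
    return inverted


docs = {
    "1": {"city": "123456789", "country": "china"},
    "2": {"city": "1234567890abc", "country": "china"},
}

city_index = index_keyword(docs, "city", ignore_above=10)
country_index = index_keyword(docs, "country", ignore_above=20)

print(city_index.get("1234567890abc", []))  # term query on city: document 2 is missing
print(country_index.get("china", []))       # term query on country: both documents
print(docs["2"]["city"])                    # the value is still present in _source
```

Only the field whose value exceeded the limit is affected; document 2 stays searchable through country and is returned in full by a query without conditions.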

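6. How does Elasticsearch index a document?

  • Elasticsearch: the process of indexing a document
  • 1. html_strip_filter: removes HTML tags from the text.
  • 2. standard_tokenizer: tokenizes the text, splitting it into individual terms.
  • 3. low_case_filter: converts every term to lowercase.
  • 4. stop_words_filter: drops selected terms (stop words); disabled by default.
  • 5. create_Inverted_index: builds the mapping from each term to the ids of the documents that contain it.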

Document ID        Document
1                  It is Sunday tomorrow
2                  Sunday is the last day of the week
3                  The choice is yours
==== 2. The corresponding inverted index; the terms are also sorted, so looking up a term is fast ====

Term        Frequency    Document IDs
choice      1            3
day         1            2
is          3            1,2,3
it          1            1
last        1            2
of          1            2
sunday      2            1,2
the         3            2,3
tomorrow    1            1
week        1            2
yours       1            3
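
The pipeline in steps 1 to 5 can be sketched in Python and checked against the table above. The regular expressions are simplified stand-ins chosen for this sketch (they are not the actual html_strip, standard tokenizer or lowercase implementations), and the stop word step is left out because it is disabled by default.

```python
import re

DOCUMENTS = {
    1: "It is Sunday tomorrow",
    2: "Sunday is the last day of the week",
    3: "The choice is yours",
}


def analyze(text):
    """Simplified analysis chain: strip tags, tokenize, lowercase."""
    without_tags = re.sub(r"<[^>]+>", " ", text)  # step 1: remove HTML tags
    tokens = re.findall(r"\w+", without_tags)     # step 2: split into terms
    return [token.lower() for token in tokens]    # step 3: lowercase


def build_inverted_index(documents):
    """Step 5: map each term to its total frequency and the ids of its documents."""
    index = {}
    for doc_id in sorted(documents):
        for term in analyze(documents[doc_id]):
            entry = index.setdefault(term, {"frequency": 0, "documents": []})
            entry["frequency"] += 1
            if doc_id not in entry["documents"]:
                entry["documents"].append(doc_id)
    return dict(sorted(index.items()))  # keep the terms sorted, as in the table


inverted_index = build_inverted_index(DOCUMENTS)
for term, entry in inverted_index.items():
    doc_ids = ",".join(str(doc_id) for doc_id in entry["documents"])
    print(f"{term:<10}{entry['frequency']:<5}{doc_ids}")
```

Answering a query for a term is then a single dictionary lookup, for example inverted_index["sunday"]["documents"].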

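7. Pagination in Elasticsearch

7.1 Using the scroll API to page through large amounts of data

7.2 Using search_after for deep pagination

8. Fuzzy searching in Elasticsearch

8.1 wildcard queries

  • In a wildcard query, ? stands for exactly one character
  • In a wildcard query, * stands for zero or more characters
  • A wildcard query on a keyword field is case-sensitive
  • For details, see the wildcard examples in section 4.3.1

8.2 fuzzy queries: typo-tolerant search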

9. The structure of an Elasticsearch index

  • As shown below, an index has three main parts: aliases, mappings, and settings
##### An existing index, fuzzyindex
GET fuzzyindex

##### Query result #####
{
  "fuzzyindex" : {
    "aliases" : {
      "a1" : { }
    },
    "mappings" : {
      "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "fuzzyindex",
        "creation_date" : "1659835562222",
        "number_of_replicas" : "1",
        "uuid" : "OWyYUkzdSbGqv9rEt9WsJg",
        "version" : {
          "created" : "7120099"
        }
      }
    }
  }
}

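9.1 Index ~ alias

9.2 Index ~ mappings

  • mappings define the schema (the field structure) of an index
  • Elasticsearch: customizing mappings
  • You do not have to define a mapping before indexing documents. Field types are detected dynamically, which is different from a traditional database, and when a new field shows up the mapping is extended automatically to include it. The catch is that automatic detection is not always accurate, for example for location data; in that case the field has to be mapped explicitly.
  • Note: the mapping of a field that already exists in an index cannot be changed in place, because data indexed under the old definition would no longer be searchable. One way around this is to reindex and rebuild the index. Adding a new field to an existing mapping, on the other hand, does not require rebuilding the index.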
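
Why dynamic detection can miss the intended type is easy to see in a simplified model. The sketch below is an approximation I am assuming for illustration, not the real detection logic: it maps JSON booleans, integers and floats to boolean, long and float, strings that look like ISO dates to date, and all other strings to text with a keyword sub-field (the shape the fuzzyindex mapping in section 9 shows). The ISO_DATE pattern and infer_field are hypothetical names.

```python
import re

# Simplified date detection: ISO-style dates only (an assumption of this sketch)
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?)?$")


def infer_field(value):
    """Guess a mapping for one JSON value, roughly the way dynamic mapping does."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):
        return {"properties": {key: infer_field(item) for key, item in value.items()}}
    if isinstance(value, str):
        if ISO_DATE.match(value):
            return {"type": "date"}
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    raise TypeError(f"value type not covered by this sketch: {type(value).__name__}")


document = {
    "user": "张三",
    "age": 20,
    "DOB": "1999-04-01",
    "location": {"lat": "39.970718", "lon": "116.325747"},
}

mapping = {field: infer_field(value) for field, value in document.items()}
print(mapping["DOB"])       # detected as a date
print(mapping["location"])  # an object with two text fields, not a geo_point
```

The location object ends up as two text fields because nothing in the values says "coordinates", so a geo_point mapping has to be declared explicitly before the first document is indexed.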

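9.3 Index ~ settings

  • number_of_shards: the number of primary shards of the index; the default is 1
  • number_of_replicas: the number of replica shards for each primary shard; the default is 1
  • Once number_of_shards has been set it cannot be changed, short of deleting the index and indexing the data again. The reason is that the shard a document is stored in depends on number_of_shards; if the value changed, later lookups would compute the wrong shard for that document.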

9.4 Setting aliases, mappings (schema), and settings (shards) when creating an index

PUT fuzzyindex
{
  "aliases" : {
    "a1" : {}
  },
  "mappings" : {
    "properties" : {
      "content" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : "1",
      "number_of_replicas" : "1"
    }
  }
}

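10. Useful Elasticsearch query examples

  • Elasticsearch: useful Elasticsearch query examples
  • Bool Query
  • AND/OR/NOT operators can be used to fine-tune a search so that it returns more relevant or more specific results. In the search API this is implemented as the bool query, which accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR). For example, to search for books whose title contains "Elasticsearch" or "Solr" and whose author is "clinton gormley" but not "radu gheorge":
  • Here bool.must wraps a nested bool with should and must clauses. Because that nested bool also has a must clause, its should clauses are optional by default and only influence scoring; set minimum_should_match to 1 on the nested bool to make one of the two titles mandatory.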
POST /bookdb_index/_search
{
  "query": {
    "bool": {
      "must": {
        "bool": {
          "should": [
            { "match": { "title": "Elasticsearch" } },
            { "match": { "title": "Solr" } }
          ],
          "must": {
            "match": {
              "authors": "clinton gormley"
            }
          }
        }
      },
      "must_not": {
        "match": {
          "authors": "radu gheorge"
        }
      }
    }
  }
}
  • Note: as you can see, a bool query can wrap any other query type, including other bool queries, to build arbitrarily complex or deeply nested queries
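
The nesting of must, should and must_not can be explored with a small evaluator over toy data. Everything here is hypothetical: the three books are invented, matches only understands match and bool clauses, scoring and filter clauses are ignored, and a match counts as soon as one analyzed query term appears in the field. The default for minimum_should_match follows the rule that should clauses are required only when the bool has no must clause.

```python
import re


def terms(text):
    """Lowercased word tokens, a rough stand-in for analysis."""
    return set(re.findall(r"\w+", text.lower()))


def as_list(clause):
    if clause is None:
        return []
    return clause if isinstance(clause, list) else [clause]


def matches(query, doc):
    """Evaluate a match or bool clause against one document (no scoring)."""
    if "match" in query:
        (field, text), = query["match"].items()
        return bool(terms(text) & terms(doc.get(field, "")))
    body = query["bool"]
    must, should, must_not = (as_list(body.get(k)) for k in ("must", "should", "must_not"))
    if not all(matches(q, doc) for q in must):
        return False
    if any(matches(q, doc) for q in must_not):
        return False
    default_minimum = 1 if should and not must else 0
    needed = body.get("minimum_should_match", default_minimum)
    return sum(matches(q, doc) for q in should) >= needed


def book_query(require_title):
    """The query from this section; require_title adds minimum_should_match = 1."""
    inner = {
        "should": [{"match": {"title": "Elasticsearch"}}, {"match": {"title": "Solr"}}],
        "must": {"match": {"authors": "clinton gormley"}},
    }
    if require_title:
        inner["minimum_should_match"] = 1
    return {"bool": {"must": {"bool": inner},
                     "must_not": {"match": {"authors": "radu gheorge"}}}}


BOOKS = [  # invented sample data
    {"title": "Elasticsearch Basics", "authors": "clinton gormley"},
    {"title": "Solr Basics", "authors": "radu gheorge, clinton gormley"},
    {"title": "Gardening", "authors": "clinton gormley"},
]


def search(query):
    return [book["title"] for book in BOOKS if matches(query, book)]


print(search(book_query(require_title=False)))
print(search(book_query(require_title=True)))
```

Without minimum_should_match the unrelated "Gardening" title still matches through the author clause alone; with it set to 1, only the book that also satisfies one of the title clauses remains.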

11. Handy Elasticsearch tools

11.1 The translate API: turning SQL into DSL

GET /_sql/translate
{
  "query": """
    SELECT * FROM twitter 
    WHERE age in(30,20) and address.keyword = '中国北京市海淀区'
  """
}

########## Translation result ##########
{
  "size" : 1000,
  "query" : {
    "bool" : {
      "must" : [
        {
          "terms" : {
            "age" : [ 30, 20 ],
            "boost" : 1.0
          }
        },
        {
          "term" : {
            "address.keyword" : {
              "value" : "中国北京市海淀区",
              "boost" : 1.0
            }
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "_source" : false,
  "fields" : [
    { "field" : "DOB", "format" : "strict_date_optional_time_nanos" },
    { "field" : "address" },
    { "field" : "age" },
    { "field" : "city" },
    { "field" : "country" },
    { "field" : "location" },
    { "field" : "message" },
    { "field" : "province" },
    { "field" : "uid" },
    { "field" : "user" }
  ],
  "sort" : [
    {
      "_doc" : {
        "order" : "asc"
      }
    }
  ]
}

########## Run a search with the translated DSL ##########
########## filter_path=hits.hits trims the returned JSON ##########

GET twitter/_search?filter_path=hits.hits
{
  "query" : {
    "bool" : {
      "must" : [
        {
          "terms" : {
            "age" : [ 30, 20 ],
            "boost" : 1.0
          }
        },
        {
          "term" : {
            "address.keyword" : {
              "value" : "中国北京市海淀区",
              "boost" : 1.0
            }
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  }
}

########## Query result ##########
{
  "hits" : {
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.9444616,
        "_source" : {
          "user" : "张三",
          "message" : "今儿天气不错啊,出去转转去",
          "uid" : 2,
          "age" : 20,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          },
          "DOB" : "1999-04-01"
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 1.9444616,
        "_source" : {
          "user" : "张三",
          "message" : "今儿天气不错啊,出去转转去",
          "uid" : 2,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          },
          "DOB" : "1999-04-01"
        }
      }
    ]
  }
}

11.2 Using the explain option to analyze how ES scores a query

GET twitter/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user": "老贾" } },
        { "match": { "age": 30 } }
      ]
    }
  },
  "explain": true
}

######################## Result ########################
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    "max_score" : 3.4849067,
    "hits" : [
        "_shard" : "[twitter][0]",
        "_node" : "GE7kTcqxSQ2wmso2X9D6Ag",
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 3.4849067,
        "_source" : {
          "user" : "老贾",
          "message" : "123,gogogo",
          "uid" : 5,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区建国门",
          "location" : {
            "lat" : "39.718256",
            "lon" : "116.367910"
          "DOB" : "1989-04-01"
        "_explanation" : {
          "value" : 3.4849067,
          "description" : "sum of:",
          "details" : [
              "value" : 2.4849067,
              "description" : "sum of:",
              "details" : [
                  "value" : 0.6931471,
                  "description" : "weight(user:老 in 3) [PerFieldSimilarity], result of:",
                  "details" : [
                      "value" : 0.6931471,
                      "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                      "details" : [
                          "value" : 2.2,
                          "description" : "boost",
                          "details" : [ ]
                          "value" : 0.6931472,
                          "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                          "details" : [
                              "value" : 4,
                              "description" : "n, number of documents containing term",
                              "details" : [ ]
                              "value" : 8,
                              "description" : "N, total number of documents with field",
                              "details" : [ ]
                          "value" : 0.45454544,
                          "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                          "details" : [
                              "value" : 1.0,
                              "description" : "freq, occurrences of term within document",
                              "details" : [ ]
                              "value" : 1.2,
                              "description" : "k1, term saturation parameter",
                              "details" : [ ]
                              "value" : 0.75,
                              "description" : "b, length normalization parameter",
                              "details" : [ ]
                              "value" : 2.0,
                              "description" : "dl, length of field",
                              "details" : [ ]
                              "value" : 2.0,
                              "description" : "avgdl, average length of field",
                              "details" : [ ]
                  "value" : 1.7917595,
                  "description" : "weight(user:贾 in 3) [PerFieldSimilarity], result of:",
                  "details" : [
                      "value" : 1.7917595,
                      "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                      "details" : [
                          "value" : 2.2,
                          "description" : "boost",
                          "details" : [ ]
                          "value" : 1.7917595,
                          "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                          "details" : [
                              "value" : 1,
                              "description" : "n, number of documents containing term",
                              "details" : [ ]
                              "value" : 8,
                              "description" : "N, total number of documents with field",
                              "details" : [ ]
                          "value" : 0.45454544,
                          "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                          "details" : [
                              "value" : 1.0,
                              "description" : "freq, occurrences of term within document",
                              "details" : [ ]
                              "value" : 1.2,
                              "description" : "k1, term saturation parameter",
                              "details" : [ ]
                              "value" : 0.75,
                              "description" : "b, length normalization parameter",
                              "details" : [ ]
                              "value" : 2.0,
                              "description" : "dl, length of field",
                              "details" : [ ]
                              "value" : 2.0,
                              "description" : "avgdl, average length of field",
                              "details" : [ ]
              "value" : 1.0,
              "description" : "age:[30 TO 30]",
              "details" : [ ]

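The numbers in the explain output can be checked by hand. The sketch below is a minimal recomputation that uses only the idf and tf formulas printed in the output; the function name bm25_term_score is made up for illustration and is not part of any ES client.

```python
import math

def bm25_term_score(n, N, freq, dl, avgdl, boost=2.2, k1=1.2, b=0.75):
    """Score of a single term, following the formulas shown in the explain output."""
    idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
    tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return boost * idf * tf

# Values copied from the explain output above.
score_lao = bm25_term_score(n=4, N=8, freq=1.0, dl=2.0, avgdl=2.0)  # term 老
score_jia = bm25_term_score(n=1, N=8, freq=1.0, dl=2.0, avgdl=2.0)  # term 贾
age_clause = 1.0                                                    # age:[30 TO 30]

total = score_lao + score_jia + age_clause
print(round(score_lao, 6), round(score_jia, 6), round(total, 6))
```

The rarer term (贾, found in 1 of 8 documents) contributes more than the common one (老, found in 4 of 8), which is the effect idf is meant to have.
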
12. How can ES search across all fields at once ("full-field search")?

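  • Define a redundant keyword field named essearch.
  • When indexing a document, concatenate every searchable field into essearch, separated by spaces.
  • Query the essearch field with a wildcard query.
  • Note: lowercase the text when concatenating, and lowercase the search string before running the wildcard query.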
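
The steps above can be sketched on the client side as follows. This is only an illustration: the helper names build_essearch and wildcard_query are hypothetical, and sending the request to ES is left out.

```python
def build_essearch(doc, fields):
    """Join the searchable field values with spaces and lowercase the result."""
    values = [str(doc[name]) for name in fields if doc.get(name) is not None]
    return " ".join(values).lower()

def wildcard_query(user_input):
    """Build a wildcard query body against essearch, lowercasing the input first."""
    pattern = f"*{user_input.lower()}*"
    return {"query": {"wildcard": {"essearch": {"value": pattern}}}}

doc = {"first_name": "王", "second_name": "老二UID"}
doc["essearch"] = build_essearch(doc, ["first_name", "second_name"])
print(doc["essearch"])
print(wildcard_query("UID"))
```

Note that * and ? typed by the user would act as wildcard characters in this sketch; escaping them is a separate decision.
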
PUT copy_index
{
  "mappings": {
    "properties": {
      "first_name":  { "type": "keyword", "ignore_above": 256 },
      "second_name": { "type": "keyword", "ignore_above": 256 },
      "essearch":    { "type": "keyword", "ignore_above": 32766 }
    }
  }
}

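############ Add documents, concatenating every searchable field into the redundant essearch field, separated by spaces ############
PUT copy_index/_doc/1
{"first_name":"王","second_name":"老二uid","essearch":"王 老二uid"}

PUT copy_index/_doc/2
{"first_name":"王","second_name":"二xx","essearch":"王 二xx"}

PUT copy_index/_doc/3
{"first_name":"王","second_name":"三hh","essearch":"王 三hh"}

PUT copy_index/_doc/4
{"first_name":"王","second_name":"四aa","essearch":"王 四aa"}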

############ Run a wildcard query against the redundant essearch field ############
GET copy_index/_search 
 "query": {
   "wildcard": {
     "essearch": {


  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    "max_score" : 1.0,
    "hits" : [
        "_index" : "copy_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "王",
          "second_name" : "老二uid",
          "essearch" : "王 老二uid"
        "_index" : "copy_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "王",
          "second_name" : "二xx",
          "essearch" : "王 二xx"


13.01 Create an index and a document

## Create an index named twitter and insert a document with ID 1
PUT twitter/_doc/1
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}

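13.02 Update a document

PUT twitter/_doc/1
{
  "user": "GB",
  "uid": 1,
  "city": "北京",
  "province": "北京",
  "country": "中国"
}

## Partial update
## PUT replaces the whole document, so every field has to be written out each time, which is inconvenient for documents with many fields.
## Use POST _update to change only the listed fields.
POST twitter/_update/1
{
  "doc": {
    "city": "成都",
    "province": "四川"
  }
}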

## Update every document matched by a query, using a script
POST twitter/_update_by_query
  "query": {
    "match": {
      "user": "GB"
  "script": {
    "source": "ctx._source.city = params.city;ctx._source.province = params.province",
    "lang": "painless",
    "params": {

13.03 Upsert a document

## Upsert: update the document if it exists, insert it otherwise (save or update)

POST /twitter/_update/4
{
  "doc": {
    "user": "66666",
    "uid": 5,
    "city": "555",
    "province": "555"
  },
  "doc_as_upsert": true
}

13.04 Check whether a document exists

HEAD twitter/_doc/1

13.05 Delete a document

DELETE twitter/_doc/1

## In a relational database we usually delete by condition; in ES, _delete_by_query does the same
POST twitter/_delete_by_query
{
  "query": {
    "match": {
      "city": "上海"
    }
  }
}

13.06 Check whether an index exists

HEAD twitter

13.07 Delete an index

DELETE twitter

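13.08 Batch operations with bulk

## Each action line and each source document must stay on a single line; spreading the JSON over several lines makes the request fail

## index action: always succeeds, creating the document or overwriting an existing one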
POST _bulk
{ "index" : { "_index" : "twitter", "_id": 2 }}
{ "index" : { "_index" : "twitter", "_id": 3} }
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{ "index" : { "_index" : "twitter", "_id": 4} }
{ "index" : { "_index" : "twitter", "_id": 5} }
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{ "index" : { "_index" : "twitter", "_id": 6} }
{"user":"虹桥-老吴","message":"好友来了都今天我生日,好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}

## delete action
POST _bulk
{ "delete" : { "_index" : "twitter", "_id": 1 }}

## update action
POST _bulk
{ "update" : { "_index" : "twitter", "_id": 2 }}
{"doc": { "city": "长沙"}}

## index always succeeds, whereas create fails if a document with the same ID already exists
POST _bulk
{ "create" : { "_index" : "twitter", "_id": 1} }
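
Because the bulk body is newline-delimited JSON, it is easy to get wrong when typed by hand. A minimal sketch of generating it from Python data, using only the standard library (the helper name bulk_body is hypothetical, and sending the request is left out):

```python
import json

def bulk_body(actions):
    """Serialize (action, source) pairs into the newline-delimited _bulk format.

    Each pair holds the action line and its source document; delete actions
    carry no source, so their second element is None.
    """
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action, ensure_ascii=False))
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

body = bulk_body([
    ({"index": {"_index": "twitter", "_id": 3}}, {"user": "东城区-李四", "city": "北京"}),
    ({"update": {"_index": "twitter", "_id": 2}}, {"doc": {"city": "长沙"}}),
    ({"delete": {"_index": "twitter", "_id": 1}}, None),
])
print(body, end="")
```

json.dumps never emits a raw line break inside an object, so every action and every document ends up on exactly one line.
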

13.09 Close / open an index

POST twitter/_close
POST twitter/_open

13.10 Freeze / unfreeze an index

POST twitter/_freeze

GET twitter/_search

## To include frozen indices in a search,
## run the request with the query parameter ignore_throttled=false
GET twitter/_search?ignore_throttled=false

## Unfreeze the index
POST twitter/_unfreeze


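  • Tutorial: querying in ES
  • ES offers two main kinds of search: query and aggregation.
  • A query performs full-text retrieval.
  • An aggregation computes statistics and analytics over the data.
  • The two can be combined, for example by running a query first and then aggregating over the matching documents.

14.1 Search all documents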

GET /_search 

GET /_all/_search 

GET /Index1,Index2../_search 

POST /index*,-index3/_search 

GET /Index1/_search?size=2&from=0 

    GET twitter/_search
    {
      "size": 2,
      "from": 0,
      "query": {
        "match_all": {}
      }
    }

 GET Index1/_search?size=2&from=0&filter_path=hits.hits 
  "hits" : {
    "hits" : [
        "_index" : "Index1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "user" : "李四",
          "msg" : "happy birthday!",
          "uid" : 4,
          "age" : 25,
          "birth" : "1994-04-01"

14.2 Source Filtering


    GET twitter/_search
        "hits" : [
                    "_index" : "twitter",
                    "_type" : "_doc",
                    "_id" : "1",
                    "_score" : 1.0,
                    "_source" : {
                      "city" : "北京",
                      "user" : "张三"

GET /twitter/_search?filter_path=hits
{
  "_source": false,
  "query": {"match_all": {}}
}
      "hits" : {
        "total" : {
          "value" : 6,
          "relation" : "eq"
        "max_score" : 1.0,
        "hits" : [
            "_index" : "twitter",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0
GET /twitter/_search?filter_path=hits.hits
{
  "_source": {
    "includes": ["user","age"]
  },
  "query": {"match_all": {}}
}
      "hits" : {
        "hits" : [
            "_index" : "twitter",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "user" : "张三",
              "age" : 20

# Return only the fields whose names start with u or c
GET /twitter/_search?filter_path=hits.hits
{
  "_source": {
    "includes": ["u*","c*"]
  },
  "query": {"match_all": {}}
}
      "hits" : {
        "hits" : [
            "_index" : "twitter",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "uid" : 2,
              "country" : "中国",
              "city" : "北京",
              "user" : "张三"

# Use the excludes property to drop the uid and country fields
GET /twitter/_search?filter_path=hits.hits
{
  "_source": {
    "includes": ["u*","c*"],
    "excludes": ["uid","country"]
  },
  "query": {"match_all": {}}
}

      "hits" : {
        "hits" : [
            "_index" : "twitter",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "city" : "北京",
              "user" : "张三"
# Setting _source to [] returns all fields
GET /twitter/_search?filter_path=hits.hits
{
  "_source": [],
  "query": {"match_all": {}}
}

14.3 Count

GET /twitter/_count

GET /twitter/_count

14.4 Settings

GET twitter/_settings

      "twitter" : {
        "settings" : {
          "index" : {
            "routing" : {
              "allocation" : {
                "include" : {
                  "_tier_preference" : "data_content"
            "number_of_shards" : "1",
            "provided_name" : "twitter",
            "creation_date" : "1618919497240",
            "number_of_replicas" : "1",
            "uuid" : "Nl1Cg--JQACp561b-KEVZw",
            "version" : {
              "created" : "7120099"

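# Once number_of_shards is set for an index it cannot be changed, short of deleting the index and indexing the data again. The shard that stores a document is computed from number_of_shards, so changing the value would make lookups for existing documents land on the wrong shard.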
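
A small sketch makes the problem concrete. It applies the routing rule shard_num = hash(_id) % num_primary_shards with zlib.crc32 as a stand-in hash; that choice is an assumption for illustration only and is not the hash function Elasticsearch uses internally.

```python
import zlib

def shard_for(doc_id, num_primary_shards):
    """Stand-in for shard_num = hash(_id) % num_primary_shards.

    zlib.crc32 is only an illustrative hash, not the one used by Elasticsearch.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

ids = [str(i) for i in range(1, 101)]
before = {doc_id: shard_for(doc_id, 3) for doc_id in ids}
after = {doc_id: shard_for(doc_id, 5) for doc_id in ids}
moved = [doc_id for doc_id in ids if before[doc_id] != after[doc_id]]
print(f"{len(moved)} of {len(ids)} ids map to a different shard after resizing")
```

Every id listed in moved would be looked up on a shard that does not hold it, which is why the primary shard count is fixed at index creation.
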

# The number of replicas can be changed dynamically; the number of primary shards is fixed once the index is created
PUT twitter
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

14.5 Mapping



#### The right approach: define settings and mappings first, then index documents ####

DELETE twitter

PUT twitter
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

PUT twitter/_mapping
{
  "properties": {
    "address":  { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
    "age":      { "type": "long" },
    "city":     { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
    "country":  { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
    "location": { "type": "geo_point" },
    "message":  { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
    "province": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
    "uid":      { "type": "long" },
    "user":     { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }
  }
}

14.6 match query

GET twitter/_search
{
  "query": {
    "match": {
      "user": {
        "query": "朝阳区-老贾",
        "operator": "or",
        "minimum_should_match": 3
      }
    }
  }
}

"user": { "query": "朝阳区-老贾","operator": "or", "minimum_should_match": 3}
     query: the text to search for
     operator: defaults to or. The query above returns any document that matches at least 3 of the 5 characters "朝", "阳", "区", "老" and "贾". The and operator requires all of them to match.
     minimum_should_match: the minimum number of terms that must match
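
The matching rule can be sketched in a few lines. The tokenizer below is an assumption that mimics splitting Chinese text into single characters and dropping punctuation; real analysis happens inside ES and depends on the configured analyzer.

```python
def tokenize(text):
    """Assumed tokenizer: one token per CJK character, everything else dropped."""
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]

def matches(field_value, query, minimum_should_match):
    """True when at least minimum_should_match distinct query tokens occur in the field."""
    field_tokens = set(tokenize(field_value))
    hits = sum(1 for token in set(tokenize(query)) if token in field_tokens)
    return hits >= minimum_should_match

query = "朝阳区-老贾"
print(matches("朝阳区-老王", query, 3))  # shares 朝, 阳, 区 and 老 with the query
print(matches("虹桥-老吴", query, 3))    # shares only 老 with the query
```
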

14.7 Ids query

GET twitter/_search
{
  "query": {
    "ids": {"values": ["1", "2"] }
  }
}

14.8 Multi_match

# Often we do not know which field contains the search keyword. In that case, multi_match can query several fields at once.
GET twitter/_search
  "query": {
    "multi_match": {
      "query": "朝阳",
      "fields": [
      "type": "best_fields"
# With type best_fields, the final _score of a document is the score of its best-matching field.
# Matches of "朝阳" in the address field get their score boosted by a factor of 3


14.9 Prefix query

GET /twitter/_search
{
  "query": {
    "prefix": {"user": {"value": "老"}}
  }
}

14.10 Term query

# A term query looks for an exact match of the term in the given field.
GET /twitter/_search
{
  "query": {
    "term": {"user.keyword": {"value": "老吴"}}
  }
}

14.11 Terms query

# Find all documents whose user.keyword is "老吴" or "张三"
GET /twitter/_search
{
  "query": {
    "terms": {"user.keyword": ["老吴","张三"]}
  }
}

14.12 Terms_set query


# Create the index and its mapping, with a required_matches field that stores the minimum number of terms to match
PUT /job-candidates
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "programming_languages": {
        "type": "keyword"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}
# Add document 1
PUT /job-candidates/_doc/1?refresh
{
  "name": "Jane Smith",
  "programming_languages": [ "c++", "java" ],
  "required_matches": 2
}
# Add document 2
PUT /job-candidates/_doc/2?refresh
{
  "name": "Jason Response",
  "programming_languages": [ "java", "php" ],
  "required_matches": 2
}

## Run a terms_set query; minimum_should_match_field names the document field that holds the minimum number of matches
GET /job-candidates/_search?filter_path=hits.hits
{
  "query": {
    "terms_set": {
      "programming_languages": {
        "terms": [ "c++", "java", "php" ],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

# Result:
  "hits" : {
    "hits" : [
        "_index" : "job-candidates",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.1005894,
        "_source" : {
          "name" : "Jane Smith",
          "programming_languages" : [
          "required_matches" : 2
        "_index" : "job-candidates",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.1005894,
        "_source" : {
          "name" : "Jason Response",
          "programming_languages" : [
          "required_matches" : 2

## If the documents have no dedicated field for the minimum number of matches, the same effect can be achieved another way:
## use minimum_should_match_script to set the minimum number of matching terms
GET /job-candidates/_search
{
  "query": {
    "terms_set": {
      "programming_languages": {
        "terms": [ "c++", "java", "php" ],
        "minimum_should_match_script": {"source": "2"}
      }
    }
  }
}

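The terms_set rule from section 14.12 boils down to a count comparison per document. A minimal in-memory sketch, reusing the two job-candidates documents from that section (the helper name terms_set_match is hypothetical):

```python
def terms_set_match(doc, field, terms, required_field):
    """True when the document holds at least doc[required_field] of the given terms."""
    matched = len(set(doc[field]) & set(terms))
    return matched >= doc[required_field]

candidates = [
    {"name": "Jane Smith", "programming_languages": ["c++", "java"], "required_matches": 2},
    {"name": "Jason Response", "programming_languages": ["java", "php"], "required_matches": 2},
]
terms = ["c++", "java", "php"]
hits = [c["name"] for c in candidates
        if terms_set_match(c, "programming_languages", terms, "required_matches")]
print(hits)
```

Both candidates match two of the three terms, so both are returned; searching for "java" alone would return neither, because each document requires two matches.
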
14.13 Bool query

A bool query is a compound query.
# It is built from the following clauses:
# must (the clause must match),
# must_not (the clause must not match),
# should (matching is preferred but optional),
# filter (the clause must match, but does not contribute to the score)

# General form of a bool query
POST _search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "user" : "kimchy" }
      },
      "filter": {
        "term" : { "tag" : "tech" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 20 }
        }
      },
      "should" : [
        { "term" : { "tag" : "wow" } },
        { "term" : { "tag" : "elasticsearch" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}

# Find documents whose city is 北京 and whose age is exactly 30.
GET twitter/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"city": "北京"}},
        {"match": {"age": "30"}}
      ]
    }
  }
}
GET twitter/_search
{
  "query": {
    "bool": {
      "must_not": [
        {"match": {"city": "北京"}}
      ]
    }
  }
}

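# How each clause type affects hits and _score

Clause     Affects hits   Affects _score
must       Yes            Yes
must_not   Yes            No
should     No*            Yes
filter     Yes            No

# * should affects hits only in one special case: when the bool query has no must or filter clause, at least one should clause has to match.

# Find documents whose user field is "老王" or "老吴". Here the should clauses determine the result set.
GET twitter/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"user": {"query": "老王","operator": "and"}}},
        {"match": {"user": {"query": "老吴","operator": "and"}}}
      ]
    }
  }
}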
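
The effect of each clause type on hits can be sketched with plain predicates over in-memory documents. This is an illustration only: it models hit selection and ignores scoring, and the function and predicate names are hypothetical.

```python
def bool_match(doc, must=(), filters=(), must_not=(), should=()):
    """Decide whether doc is a hit for a bool query built from predicate functions."""
    if not all(clause(doc) for clause in list(must) + list(filters)):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # With no must or filter clause, at least one should clause has to match.
    if should and not must and not filters:
        return any(clause(doc) for clause in should)
    return True

def is_wang(doc):
    return doc["user"] == "老王"

def is_wu(doc):
    return doc["user"] == "老吴"

def in_beijing(doc):
    return doc["city"] == "北京"

docs = [
    {"user": "老王", "city": "北京"},
    {"user": "老吴", "city": "上海"},
    {"user": "张三", "city": "北京"},
]
only_should = [d["user"] for d in docs if bool_match(d, should=[is_wang, is_wu])]
with_must = [d["user"] for d in docs if bool_match(d, must=[in_beijing], should=[is_wang, is_wu])]
print(only_should)
print(with_must)
```

In the first query the should clauses decide the hits; in the second they only influence ranking, so 张三 is still returned.
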

14.14 Location (geo) queries


14.15 Range query

# Range query: find documents with 20 <= age <= 28, sorted by age in ascending order
GET /twitter/_search
{
  "query": {
    "range": {
      "age": {"gte": 20,"lte": 28}
    }
  },
  "sort": [
    {"age": {"order": "asc"}}
  ]
}

# Available operators: gt, lt, gte, lte

14.16 Exists query

# Use exists to check whether a field is present

# Any document whose city field has a non-null value is returned.
GET twitter/_search
{
  "query": {
    "exists": {
      "field": "city"
    }
  }
}

# Find all documents that have no city field
GET twitter/_search
{
  "query": {
    "bool": {
      "must_not": [{"exists": {"field": "city"}}]
    }
  }
}

14.17 match_phrase query

# match_phrase matches a phrase: the terms must appear in order, and slop sets how many positions apart they may be
GET twitter/_search
{
  "query": {
    "match_phrase": {
      "message": {"query": "Happy birthday","slop": 1}
    }
  },
  "highlight": {
    "fields": {"message": {}}
  }
}

# With slop set to 1, Happy and birthday may be separated by one extra token.
# For example, "Happy Good BirthDay" also matches

14.18 Named query

# A named query assigns a name to each leaf query through "_name".
GET twitter/_search?filter_path=hits.hits
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "city": {
              "query": "北京",
              "_name": "城市"
            }
          }
        },
        {
          "match": {
            "country": {
              "query": "中国",
              "_name": "国家"
            }
          }
        }
      ],
      "should": [
        {
          "match": {
            "_id": {
              "query": "1",
              "_name": "ID"
            }
          }
        }
      ]
    }
  }
}

## Each hit in the result carries an extra field, matched_queries, listing the names of the queries it matched
  "hits" : {
    "hits" : [
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.3152701,
        "_source" : {
          "user" : "张三",
          "message" : "今儿天气不错啊,出去转转去",
          "uid" : 2,
          "age" : 20,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          "DOB" : "1999-04-01"
        "matched_queries" : [
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.31527004,
        "_source" : {
          "user" : "老刘",
          "message" : "出发,下一站云南!",
          "uid" : 3,
          "age" : 22,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区台基厂三条3号",
          "location" : {
            "lat" : "39.904313",
            "lon" : "116.412754"
          "DOB" : "1997-04-01"
        "matched_queries" : [

14.19 Wildcard query

# A wildcard query finds strings that contain a given character sequence


GET twitter/_search?filter_path=hits.hits
{
  "query": {
    "wildcard": {
      "message.keyword": {
        "value": "*BirthDay My*"
      }
    }
  }
}

  "hits" : {
    "hits" : [
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "user" : "老王",
          "message" : "Happy BirthDay My Friend!",
          "uid" : 6,
          "age" : 26,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区国贸",
          "location" : {
            "lat" : "39.918256",
            "lon" : "116.467910"
          "DOB" : "1993-04-01"

GET /twitter/_search?filter_path=hits.hits
  "query": {
    "match": {
      "message": {
        "query": "*BirthDay My*",
        "operator": "and",

  "hits" : {
    "hits" : [
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 2.7926655,
        "_source" : {
          "user" : "老王",
          "message" : "Happy BirthDay My Friend!",
          "uid" : 6,
          "age" : 26,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区国贸",
          "location" : {
            "lat" : "39.918256",
            "lon" : "116.467910"
          "DOB" : "1993-04-01"
        "matched_queries" : [

14.20 Sql query

GET /_sql?
{
  "query": """
    SELECT * FROM twitter 
    WHERE age = 30
  """
}

## Translate the SQL into an ES query
GET /_sql/translate
{
  "query": """
    SELECT * FROM twitter 
    WHERE age = 30
  """
}

14.21 Multi Search API

# Multi Search API: bundle several searches into one API request to reduce the number of round trips.
GET twitter/_msearch
{}
{"query":{"bool":{"filter":{"term":{"city.keyword":"北京"}}}}, "size":1}

14.22 Profile API

# The Profile API is a debugging tool that helps explain why a request is slow.
GET twitter/_search
{
  "profile": true,
  "query": {
    "match": {
      "city": "北京"
    }
  }
}

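15. ES aggregations

15.1 range aggregation

# A range aggregation splits the data into buckets. It is normally only suitable for numeric fields.
# Each range includes its from value and excludes its to value. We are not interested in the matching documents themselves, so size is set to 0
GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "age": {
      "range": {
        "field": "age",
        "ranges": [
          { "from": 20, "to": 22 },
          { "from": 22, "to": 25 },
          { "from": 25, "to": 30 }
        ]
      }
    }
  }
}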

  "aggregations" : {
    "age" : {
      "buckets" : [
          "key" : "20.0-22.0",
          "from" : 20.0,
          "to" : 22.0,
          "doc_count" : 1
          "key" : "22.0-25.0",
          "from" : 22.0,
          "to" : 25.0,
          "doc_count" : 1
          "key" : "25.0-30.0",
          "from" : 25.0,
          "to" : 30.0,
          "doc_count" : 3

# A sub-aggregation can be nested under a bucket aggregation:
GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "age": {
      "range": {
        "field": "age",
        "ranges": [
          { "from": 20, "to": 22 },
          { "from": 22, "to": 25 },
          { "from": 25, "to": 30 }
        ]
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

## Result: the average age is computed separately inside each bucket,
## that is, for each of the buckets 20-22, 22-25 and 25-30.
  "aggregations" : {
    "age" : {
      "buckets" : [
          "key" : "20.0-22.0",
          "from" : 20.0,
          "to" : 22.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 20.0
          "key" : "22.0-25.0",
          "from" : 22.0,
          "to" : 25.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 22.0
          "key" : "25.0-30.0",
          "from" : 25.0,
          "to" : 30.0,
          "doc_count" : 3,
          "avg_age" : {
            "value" : 26.333333333333332

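The bucket counts and averages in 15.1 can be reproduced locally. The sketch below assumes the six ages that appear in this document's results (20, 22, 25, 26, 28, 30) and applies the rule that from is inclusive and to is exclusive.

```python
ages = [20, 22, 25, 26, 28, 30]          # ages that appear in this document's results
ranges = [(20, 22), (22, 25), (25, 30)]

buckets = []
for lower, upper in ranges:
    members = [age for age in ages if lower <= age < upper]  # from inclusive, to exclusive
    buckets.append({
        "key": f"{lower}-{upper}",
        "doc_count": len(members),
        "avg_age": sum(members) / len(members) if members else None,
    })

for bucket in buckets:
    print(bucket)
```

Age 30 falls outside every range because the last bucket excludes its upper bound, which is why the doc_count values add up to 5 rather than 6.
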
## Several sub-aggregations can be computed at once
GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "age": {
      "range": {
        "field": "age",
        "ranges": [
          { "from": 20, "to": 22 },
          { "from": 22, "to": 25 },
          { "from": 25, "to": 30 }
        ]
      },
      "aggs": {
        "avg_age": { "avg": { "field": "age" } },
        "min_age": { "min": { "field": "age" } },
        "max_age": { "max": { "field": "age" } }
      }
    }
  }
}

## Result
  "aggregations" : {
    "age" : {
      "buckets" : [
          "key" : "20.0-22.0",
          "from" : 20.0,
          "to" : 22.0,
          "doc_count" : 1,
          "max_age" : {
            "value" : 20.0
          "avg_age" : {
            "value" : 20.0
          "min_age" : {
            "value" : 20.0
          "key" : "22.0-25.0",
          "from" : 22.0,
          "to" : 25.0,
          "doc_count" : 1,
          "max_age" : {
            "value" : 22.0
          "avg_age" : {
            "value" : 22.0
          "min_age" : {
            "value" : 22.0
          "key" : "25.0-30.0",
          "from" : 25.0,
          "to" : 30.0,
          "doc_count" : 3,
          "max_age" : {
            "value" : 28.0
          "avg_age" : {
            "value" : 26.333333333333332
          "min_age" : {
            "value" : 25.0

15.2 filters aggregation

# filters aggregation: buckets can also be built on non-numeric fields. Each bucket collects all documents that match its associated filter.

GET /twitter/_search?filter_path=aggregations
  "size": 0,
  "aggs": {
    "by_cities": {
      "filters": {
        "filters": {
          "bj": {

## Result
      "aggregations" : {
        "by_cities" : {
          "buckets" : {
            "bj" : {
              "doc_count" : 5
            "sh" : {
              "doc_count" : 1

15.3 filter aggregation

GET /twitter/_search?filter_path=aggregations
  "size": 0,
  "aggs": {
    "bj": {
      "filter": {
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"

## Result
  "aggregations" : {
    "bj" : {
      "doc_count" : 5,
      "avg_age" : {
        "value" : 24.6

15.4 date_range aggregation

## Use date_range to count the documents that fall within given time ranges
POST twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "birth_range": {
      "date_range": {
        "field": "DOB",
        "format": "yyyy-MM-dd",
        "ranges": [
          { "from": "1989-01-01", "to": "1990-01-01" },
          { "from": "1991-01-01", "to": "1992-01-01" }
        ]
      }
    }
  }
}

## Result
  "aggregations" : {
    "birth_range" : {
      "buckets" : [
          "key" : "1989-01-01-1990-01-01",
          "from" : 5.99616E11,
          "from_as_string" : "1989-01-01",
          "to" : 6.31152E11,
          "to_as_string" : "1990-01-01",
          "doc_count" : 1
          "key" : "1991-01-01-1992-01-01",
          "from" : 6.62688E11,
          "from_as_string" : "1991-01-01",
          "to" : 6.94224E11,
          "to_as_string" : "1992-01-01",
          "doc_count" : 1

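15.5 terms aggregation

# A terms aggregation counts how often each value of a field occurs. The example below takes all documents whose message matches "happy birthday" and groups them by city. (With the mapping from 14.5, where city is a text field, aggregate on city.keyword instead: text fields do not support terms aggregations by default.)
GET /twitter/_search?filter_path=aggregations
{
  "query": {
    "match": {
      "message": "happy birthday"
    }
  },
  "size": 0,
  "aggs": {
    "city_agr": {
      "terms": {
        "field": "city",
        "size": 10
      }
    }
  }
}
## terms.size limits the result to the top 10 cities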

  "aggregations" : {
    "city_agr" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
          "key" : "北京",
          "doc_count" : 2
          "key" : "上海",
          "doc_count" : 1

## By default the buckets are sorted by doc_count. To sort by key instead:
GET /twitter/_search?filter_path=aggregations
{
  "query": {
    "match": {
      "message": "happy birthday"
    }
  },
  "size": 0,
  "aggs": {
    "city_agr": {
      "terms": {
        "field": "city",
        "size": 10,
        "order": {
          "_key": "asc"
        }
      }
    }
  }
}

## Result: buckets sorted by key in ascending order
  "aggregations" : {
    "city_agr" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
          "key" : "上海",
          "doc_count" : 1
          "key" : "北京",
          "doc_count" : 2

## Buckets can also be sorted by _count in ascending order
GET /twitter/_search?filter_path=aggregations
{
  "query": {
    "match": {
      "message": "happy birthday"
    }
  },
  "size": 0,
  "aggs": {
    "city_agr": {
      "terms": {
        "field": "city",
        "size": 10,
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}


## Sort by the result of a nested aggregation
GET /twitter/_search?filter_path=aggregations
{
  "query": {
    "match": {
      "message": "happy birthday"
    }
  },
  "size": 0,
  "aggs": {
    "city_agr": {
      "terms": {
        "field": "city",
        "size": 10,
        "order": {
          "avg_age": "desc"
        }
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

## Result
  "aggregations" : {
    "city_agr" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
          "key" : "上海",
          "doc_count" : 1,
          "avg_age" : {
            "value" : 28.0
          "key" : "北京",
          "doc_count" : 2,
          "avg_age" : {
            "value" : 25.5

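15.6 Histogram aggregation

## A histogram aggregation works on numeric fields and builds fixed-interval buckets dynamically from the values.
GET twitter/_search?filter_path=aggregations
{
  "aggs": {
    "age_histogram": {
      "histogram": {
        "field": "age",
        "interval": 2
      }
    }
  }
}

### Each bucket includes its lower bound and excludes its upper bound: 20-22 holds 1 document, 22-24 holds 1, and so on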
  "aggregations" : {
    "age_histogram" : {
      "buckets" : [
          "key" : 20.0,
          "doc_count" : 1
          "key" : 22.0,
          "doc_count" : 1
          "key" : 24.0,
          "doc_count" : 1
          "key" : 26.0,
          "doc_count" : 1
          "key" : 28.0,
          "doc_count" : 1
          "key" : 30.0,
          "doc_count" : 1

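The histogram keys above follow from rounding each value down to a multiple of the interval. A minimal sketch, assuming the six ages that appear in this document's results and an offset of 0:

```python
import math
from collections import Counter

def histogram_key(value, interval):
    """Bucket key: the value rounded down to the nearest multiple of interval."""
    return math.floor(value / interval) * interval

ages = [20, 22, 25, 26, 28, 30]   # ages that appear in this document's results
counts = Counter(histogram_key(age, 2) for age in ages)
for key in sorted(counts):
    print(key, counts[key])
```

Age 25 lands in the bucket with key 24, which explains why that bucket reports an average age of 25.0 when a sub-aggregation is added.
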
GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "age_histogram": {
      "histogram": {
        "field": "age",
        "interval": 2,
        "order": {
          "avg_age": "desc"
        }
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

## Result
  "aggregations" : {
    "age_histogram" : {
      "buckets" : [
          "key" : 30.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 30.0
          "key" : 28.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 28.0
          "key" : 26.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 26.0
          "key" : 24.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 25.0
          "key" : 22.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 22.0
          "key" : 20.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 20.0

15.7 date_histogram aggregation

GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "date_interval": {
      "date_histogram": {
        "field": "DOB",
        "format": "yyyy-MM-dd",
        "calendar_interval": "year"
      }
    }
  }
}

  "aggregations" : {
    "date_interval" : {
      "buckets" : [
          "key_as_string" : "1989-01-01",
          "key" : 599616000000,
          "doc_count" : 1
          "key_as_string" : "1990-01-01",
          "key" : 631152000000,
          "doc_count" : 0
          "key_as_string" : "1991-01-01",
          "key" : 662688000000,
          "doc_count" : 1
          "key_as_string" : "1992-01-01",
          "key" : 694224000000,
          "doc_count" : 0
          "key_as_string" : "1993-01-01",
          "key" : 725846400000,
          "doc_count" : 1
          "key_as_string" : "1994-01-01",
          "key" : 757382400000,
          "doc_count" : 1
          "key_as_string" : "1995-01-01",
          "key" : 788918400000,
          "doc_count" : 0
          "key_as_string" : "1996-01-01",
          "key" : 820454400000,
          "doc_count" : 0
          "key_as_string" : "1997-01-01",
          "key" : 852076800000,
          "doc_count" : 1
          "key_as_string" : "1998-01-01",
          "key" : 883612800000,
          "doc_count" : 0
          "key_as_string" : "1999-01-01",
          "key" : 915148800000,
          "doc_count" : 2

15.8 cardinality aggregation

# A cardinality aggregation counts distinct values, here the number of distinct cities:
GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "num_of_city": {
      "cardinality": {
        "field": "city"
      }
    }
  }
}

  "aggregations" : {
    "num_of_city" : {
      "value" : 2

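15.9 metric aggregations

#1. Use avg to compute the average age
GET twitter/_search
{
  "size": 0,
  "aggs": {
    "average_age": {
      "avg": {
        "field": "age"
      }
    }
  }
}

#2. Use a global aggregation to run over all documents, regardless of the query.
POST twitter/_search?filter_path=aggregations
{
  "size": 0,
  "query": {
    "match": {
      "city": "北京"
    }
  },
  "aggs": {
    "avg_age_beijing": {
      "avg": {
        "field": "age"
      }
    },
    "avg_age_all": {
      "global": {},
      "aggs": {
        "age_global_avg": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

## The result contains both the global average age and the average age of the users in 北京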
  "aggregations" : {
    "avg_age_all" : {
      "doc_count" : 7,
      "age_global_avg" : {
        "value" : 25.166666666666668
    "avg_age_beijing" : {
      "value" : 24.6

#3. Use stats to compute summary statistics over age
GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "age_stats": {
      "stats": {
        "field": "age"
      }
    }
  }
}

  "aggregations" : {
    "age_stats" : {
      "count" : 6,
      "min" : 20.0,
      "max" : 30.0,
      "avg" : 25.166666666666668,
      "sum" : 151.0

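As a cross-check of the stats result, the same five numbers can be computed directly, assuming the six ages that appear in this document's results:

```python
ages = [20, 22, 25, 26, 28, 30]   # ages that appear in this document's results

age_stats = {
    "count": len(ages),
    "min": min(ages),
    "max": max(ages),
    "avg": sum(ages) / len(ages),
    "sum": sum(ages),
}
print(age_stats)
```

The count is 6 although the global aggregation reports 7 documents: stats only considers documents that have a value for the age field.
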
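15.10 percentiles aggregation

GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "age_quartiles": {
      "percentiles": {
        "field": "age",
        "percents": [ 25, 50, 75, 100 ]
      }
    }
  }
}

## Result: percentiles computed over the ages in ascending order: 20 22 25 26 28 30
## 25% of the ages are at or below 22.0, 50% are at or below 25.5, and all of them are at or below 30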
  "aggregations" : {
    "age_quartiles" : {
      "values" : {
        "25.0" : 22.0,
        "50.0" : 25.5,
        "75.0" : 28.0,
        "100.0" : 30.0

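To see where numbers such as 25.5 come from, the sketch below applies a simple midpoint-interpolation rule to the six sorted ages. This rule is an assumption chosen because it reproduces the values shown for this small data set. Elasticsearch computes percentiles approximately (TDigest by default), so results on larger data may differ.

```java
import java.util.Arrays;

class Quartiles {
    /**
     * Percentile by midpoint interpolation: each of the n sorted values is
     * treated as sitting in the middle of its own 1/n slice of the data.
     */
    static double percentile(double[] values, double percent) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double pos = percent / 100.0 * n - 0.5;   // fractional index into sorted[]
        if (pos <= 0) return sorted[0];
        if (pos >= n - 1) return sorted[n - 1];
        int lower = (int) Math.floor(pos);
        double fraction = pos - lower;
        return sorted[lower] + fraction * (sorted[lower + 1] - sorted[lower]);
    }

    public static void main(String[] args) {
        double[] ages = {20, 22, 25, 26, 28, 30};
        for (double p : new double[]{25, 50, 75, 100}) {
            System.out.println(p + " -> " + percentile(ages, p));
        }
    }
}
```
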
15.11 missing aggregation

GET /twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "total_missing_age": {
      "missing": { "field": "sss" }
    }
  }
}

# Result: 7 documents have no sss field
{
  "aggregations" : {
    "total_missing_age" : { "doc_count" : 7 }
  }
}

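16. Using the ES Java client

16.1 Using the HTTP Jest client

  • The approach is as follows:
  • 1. Use elasticsearch-sql to parse SQL and generate the ES query DSL.
  • 2. Run the DSL string through the HTTP Jest client (using elasticsearch-sql directly runs into deep-paging problems).
  • 3. Maven dependencies, all pinned to version 6.3.0:

        <!-- Java Netty client: transport -->

        <!-- ES Jest client -->

        <!-- elasticsearch-sql: SQL-to-DSL conversion -->
  • Usage example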
//+++++++++++++++++++++++ TransportClient (needed by elasticsearch-sql to translate SQL into DSL)
//        Settings settings = Settings.builder().put("cluster.name", "mcc-sit").build();
        TransportClient transportClient = new PreBuiltTransportClient(Settings.EMPTY)
                                 .addTransportAddress(new TransportAddress(InetAddress.getByName(""), 9300));
        // explain().explain() returns the generated query DSL as a JSON string
        String explain = ESActionFactory.create(transportClient, "SELECT * FROM LC14  WHERE ID='ae04642d-9eb9-11ec-b493-005056861704' ").explain().explain();
        System.out.println("explain = " + explain);
        // ==============================JEST=====================
        JestClientFactory factory = new JestClientFactory();
        // The server URI was lost from the source; pass your ES HTTP address to the Builder
        factory.setHttpClientConfig(new HttpClientConfig.Builder("").build());
        JestClient jestClient = factory.getObject();

        Search.Builder lc14Search = new Search.Builder(explain).addIndex("lc14").addType("base");
        SearchResult lc14Result = jestClient.execute(lc14Search.build());
        String sourceAsString = lc14Result.getSourceAsString();
        System.out.println("sourceAsString = " + sourceAsString);

        // Range queries on a date field: call toInstant() on the Date so the SQL literal is an ISO-8601 UTC timestamp
        Date date = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse("2022-04-06 09:43:29");

        String SQL_006 = "SELECT * FROM test_es_001 WHERE birth >= '" + date.toInstant().toString() + "'";
        String explainSQL_006 = ESActionFactory.create(transportClient, SQL_006).explain().explain();
        System.out.println("==========explainSQL_006==========>>> " + explainSQL_006);
        SearchResult test_es_006 = jestClient.execute(new Search.Builder(explainSQL_006).addIndex("test_es_001").addType("test_es_001").build());
        System.out.println("test_es_006.getSourceAsStringList() = " + test_es_006.getSourceAsStringList());

        // Fuzzy (wildcard) query
        String SQL_008 = "SELECT * FROM test_es_001 WHERE remark.keyword like '%的撒%'";
        String explainSQL_008 = ESActionFactory.create(transportClient, SQL_008).explain().explain();
        SearchResult test_es_008 = jestClient.execute(new Search.Builder(explainSQL_008).addIndex("test_es_001").addType("test_es_001").build());

        // A slightly more complex SQL conversion
        String SQL_009 =
                "SELECT * FROM test_es_001 WHERE (age >= 38 or name.keyword = '张三') and remark.keyword like '%的撒%'";
        String explainSQL_009 = ESActionFactory.create(transportClient, SQL_009).explain().explain();
        System.out.println("==========explainSQL_009==========>>> " + explainSQL_009);

        SearchResult test_es_009 = jestClient.execute(new Search.Builder(explainSQL_009).addIndex("test_es_001").addType("test_es_001").build());
        System.out.println("test_es_009.getSourceAsStringList() = " + test_es_009.getSourceAsStringList());
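The date range query above relies on `Date.toInstant().toString()`, which renders the timestamp as an ISO-8601 string in UTC regardless of the JVM's local time zone. The sketch below shows the same conversion with `java.time`, so the shift is visible. The zone `Asia/Shanghai` and the helper name `toUtcLiteral` are assumptions for illustration; the original code uses whatever default zone the JVM runs in.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class SqlDateLiteral {
    private static final DateTimeFormatter LOCAL_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Interprets a local date-time string in the given zone and renders it as a UTC instant. */
    static String toUtcLiteral(String localDateTime, ZoneId zone) {
        return LocalDateTime.parse(localDateTime, LOCAL_FORMAT)
                .atZone(zone)
                .toInstant()
                .toString();
    }

    public static void main(String[] args) {
        // 09:43:29 in UTC+8 becomes 01:43:29Z
        String literal = toUtcLiteral("2022-04-06 09:43:29", ZoneId.of("Asia/Shanghai"));
        System.out.println("SELECT * FROM test_es_001 WHERE birth >= '" + literal + "'");
    }
}
```

Passing the zone explicitly keeps the generated SQL independent of the machine the code runs on.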
  • Basic ES operations with Jest
//todo    =======1. HTTP Jest client: create an index  =======
        CreateIndex.Builder builder4CreateIndex =  new CreateIndex.Builder("test_es_001");
        //        builder4CreateIndex.settings()
        JestResult execute4Create = jestClient.execute(builder4CreateIndex.build());

//todo    =======2. HTTP Jest client: delete an index  =======
        DeleteIndex.Builder builder4DeleteIndex = new DeleteIndex.Builder("test_es_001");
        JestResult execute4Delete = jestClient.execute(builder4DeleteIndex.build());

//todo =======3. HTTP Jest client: bulk-insert data into ES =======
        // Note: the primary-key field of the P.User entity must carry the @JestId annotation

        List<P.User> usersData = Arrays.asList(
                new P.User(123451, "小1", 33,true,new Date(),"xxxxyyyzzz"),
                new P.User(123462, "小离", 44,false,new Date(),"a的撒旦"),
                new P.User(123473, "小二", 55,true,new Date(),"213123213顶顶顶顶...."));

        // Use the Bulk API: add one index action per entity
        Bulk.Builder bulkBuilder = new Bulk.Builder();
        for (P.User usersDatum : usersData) {
            Index.Builder obj = new Index.Builder(usersDatum).index("test_es_001").type("test_es_001");
            bulkBuilder.addAction(obj.build());
        }

        BulkResult bulkResult4Insert = jestClient.execute(bulkBuilder.build());
        boolean succeeded = bulkResult4Insert.isSucceeded();
        System.out.println("succeeded = " + succeeded);

//todo ======= 4. HTTP Jest client: bulk-update ES data (each update payload must be wrapped in a "doc" node) =======
        List<Update> updateList = new ArrayList<>();
        List<P.User> usersData4Update = Arrays.asList(
                new P.User(123451, "小111_001", 38,true,new Date(),"xxxxyyyzzz111"),
                new P.User(123462, "小离1_002", 44,false,new Date(),"a的撒旦"),
                new P.User(123473, "小二1_003", 55,true,new Date(),"213123213顶顶顶顶...."));

        for (P.User user : usersData4Update) {
            JSONObject object = (JSONObject)JSON.toJSON(user);
            JSONObject docPayload = new JSONObject().fluentPut("doc", object);
            // index/type/id were lost from the source; getId() is an assumed accessor on P.User
            Update update = new Update.Builder(docPayload)
                    .index("test_es_001").type("test_es_001").id(String.valueOf(user.getId())).build();
            updateList.add(update);
        }

        Bulk bulk4Update = new Bulk.Builder().addAction(updateList)
                .build();

        BulkResult bulkResult4Update = jestClient.execute(bulk4Update);
        System.out.println("bulkResult4Update.getErrorMessage() = " + bulkResult4Update.getErrorMessage());

//todo ======= 5. HTTP Jest client: bulk-delete ES data  =======
        List<Delete> deleteList = new ArrayList<>();
        String[] id2deleteA = {"12345","12346","12347"};// ids of the documents to delete
        for (String id : id2deleteA) {
            Delete.Builder db  = new Delete.Builder(id).index("test_es_001").type("test_es_001");
            deleteList.add(db.build());
        }
        Bulk.Builder BB = new Bulk.Builder();
        BB.addAction(deleteList);
        BulkResult bulkResult4Delete = jestClient.execute(BB.build());

//todo ======= 6. HTTP Jest client: fetch ES data by id  =======

        List<Integer> integerList = Arrays.asList(123451, 123462, 123473);
        for (Integer id : integerList) {
            DocumentResult documentResult = jestClient.execute(new Get.Builder("test_es_001", id + "").type("test_es_001").build());
            P.User sourceAsObject = documentResult.getSourceAsObject(P.User.class);
            System.out.println("sourceAsObject = " + sourceAsObject);
        }

//todo ======= 7. HTTP Jest client: query ES data with a DSL string  =======

        Search.Builder searchBuilder4DSL =
                // dsl
                new Search.Builder(" {\"from\":0,\"size\":200,\"query\":{\"bool\":{\"filter\":[{\"bool\":{\"must\":[{\"wildcard\":{\"remark.keyword\":{\"wildcard\":\"*的撒*\",\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}}],\"adjust_pure_negative\":true,\"boost\":1.0}}}")
                        // index and type were lost from the source; these follow the earlier examples
                        .addIndex("test_es_001").addType("test_es_001");
        SearchResult execute2 = jestClient.execute(searchBuilder4DSL.build());
        List<String> sourceAsStringList = execute2.getSourceAsStringList();
        System.out.println("@@@@@@@@@@@@@@@@@sourceAsStringList@@@@@@@@@@@@@@@@@ = " + sourceAsStringList);

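16.2 Using RestHighLevelClient

  • Test entity class
// The examples below call User.builder() and getId(), so a Lombok-style builder and accessors are assumed
public class User{
    private Long id;
    private String name;
    private Integer age;
    private Boolean isValid;
    // Controls the output format when the date is serialized to JSON
    @com.alibaba.fastjson.annotation.JSONField(format = "yyyy-MM-dd HH:mm:ss")
    private Date birth;
}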
  • Aliases, mappings and settings of the user index
{
  "user" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : { "type" : "long" },
        "birth" : {
          "type" : "date",
          "format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        },
        "id" : { "type" : "long" },
        "isValid" : { "type" : "boolean" },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : { "type" : "keyword", "ignore_above" : 256 }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : { "_tier_preference" : "data_content" }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "user",
        "creation_date" : "1660829464473",
        "number_of_replicas" : "1",
        "uuid" : "N-iPkQT7Rdm2nH7YJL2Kuw",
        "version" : { "created" : "7120099" }
      }
    }
  }
}

16.2.1 Bulk API: batch save

  • The Bulk API accepts IndexRequest, UpdateRequest and DeleteRequest in the same request
 User user = User.builder().id(1L).name("zs").age(18).isValid(true).birth(new Date()).build();
 User user2 = User.builder().id(2L).name("zs xxx").age(18).isValid(true).birth(new Date()).build();
 User user3 = User.builder().id(3L).name("xxx 001").age(20).isValid(true).birth(new Date()).build();
 User user4 = User.builder().id(4L).name("zs xxx 001").age(24).isValid(true).birth(new Date()).build();
 User user5 = User.builder().id(5L).name("001 zs xxx").age(24).isValid(true).birth(new Date()).build();

 List<User> userList = Arrays.asList(user, user2, user3, user4,user5);
 BulkRequest bulkRequest = new BulkRequest("user","_doc");
 // Refresh policy for write requests (index/update/delete): the default, NONE, returns without
 // waiting for the change to be refreshed. For a bulk, set it on the BulkRequest if needed.
 for (User obj : userList) {
     IndexRequest indexRequest = new IndexRequest();
     indexRequest.source(JSON.toJSONString(obj), XContentType.JSON).id(obj.getId()+"");
     bulkRequest.add(indexRequest);
 }
 BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
 BulkItemResponse[] bulkItemResponses = bulkResponse.getItems();
 RestStatus status = bulkResponse.status();
 int status1 = status.getStatus();

16.2.2 GetRequest

  • Fetch a document by id
 GetRequest getRequest = new GetRequest("user","4");
 GetResponse response = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
 User u1 = JSONObject.parseObject(response.getSourceAsBytes(), User.class);
 User u2 = JSONObject.parseObject(response.getSourceAsString(), User.class);

16.2.3 IndexRequest

  • With IndexRequest, if a document with the same _id already exists it is overwritten; otherwise a new document is created
User user_i = User.builder().id(1L).name("zs").age(18).isValid(true).birth(new Date()).build();
IndexRequest indexRequest = new IndexRequest("user");
indexRequest.source(JSON.toJSONString(user_i), XContentType.JSON).id(user_i.getId()+"");
IndexResponse indexResponse = restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
int status_i = indexResponse.status().getStatus();
System.out.println("status_i = " + status_i);

16.2.4 UpdateRequest (partial update)

  • Use UpdateRequest to modify a document, or to insert it when it does not exist
  • With docAsUpsert set to false (the default), updating a missing document throws document_missing_exception
  • With docAsUpsert set to true, a missing document is inserted instead
User user_001 = User.builder().id(10L).name("zs").age(180).isValid(true).birth(new Date()).build();
UpdateRequest updateRequest = new UpdateRequest("user","10");
updateRequest.doc(JSON.toJSONString(user_001), XContentType.JSON);
updateRequest.docAsUpsert(true); // insert the doc when id 10 does not exist yet
System.out.println("updateRequest.toString() = " + updateRequest.toString());
UpdateResponse updateResponse = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
int status_upset = updateResponse.status().getStatus();
  • Partial update: only the fields present in the doc are changed
// todo UpdateRequest partial update
UpdateRequest updateRequest = new UpdateRequest("user","1");
// Only id and birth are set; fastjson omits null fields by default, so the other fields stay untouched
User user1 = new User.UserBuilder().id(1L).birth(new Date()).build();
updateRequest.doc(JSON.toJSONString(user1), XContentType.JSON);
UpdateResponse updateResponse = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
System.out.println("updateResponse = " + updateResponse);

16.2.5 DeleteRequest

  • Delete a document by id
// The id was lost from the source; "1" is only an example
DeleteRequest deleteRequest = new DeleteRequest("user", "1");
DeleteResponse deleteResponse = restHighLevelClient.delete(deleteRequest, RequestOptions.DEFAULT);
int status_del = deleteResponse.status().getStatus();
System.out.println("status_del = " + status_del);

16.2.5 CountRequest

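  • Count the documents that match a query
// todo Build the query with SearchSourceBuilder
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
// The original condition was lost from the source; match_all is only a stand-in
sourceBuilder.query(QueryBuilders.matchAllQuery());

// todo CountRequest: count the documents that match the query
CountRequest countRequest = new CountRequest(new String[]{"user"});
countRequest.query(sourceBuilder.query());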
CountResponse countResponse = restHighLevelClient.count(countRequest, RequestOptions.DEFAULT);
long count = countResponse.getCount();
System.out.println("count = " + count);

16.2.6 MultiGetRequest

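  • Fetch several documents in one request by their ids
MultiGetRequest multiGetRequest = new MultiGetRequest();
// The add(...) calls were lost from the source; the ids below are only examples
multiGetRequest.add("user", "1");
multiGetRequest.add("user", "2");
MultiGetResponse itemResponses = restHighLevelClient.mget(multiGetRequest, RequestOptions.DEFAULT);
for (MultiGetItemResponse response : itemResponses) {
    String sourceAsString = response.getResponse().getSourceAsString();
    System.out.println("sourceAsString = " + sourceAsString);
}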

16.2.6 SearchRequest

  • Building a single search request from start to finish
MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("name", "zs xxx 001");
//        matchQueryBuilder.analyzer("");

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(matchQueryBuilder);
System.out.println("searchSourceBuilder = " + searchSourceBuilder);
//        searchSourceBuilder.fetchSource();
SearchRequest searchRequest = new SearchRequest(new String[]{"user"},searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
// Join the _source JSON of every hit into one JSON array string
StringJoiner stringJoiner = new StringJoiner(",","[","]");
for (SearchHit searchHit : searchResponse.getHits()) {
    stringJoiner.add(searchHit.getSourceAsString());
}
List<User> userListA = JSONArray.parseArray(stringJoiner.toString(), User.class);
System.out.println("userListA.toString() = " + userListA.toString());

16.2.7 MultiSearchRequest

  • Send several SearchRequest queries in a single call
 // todo MSearch
 MultiSearchRequest mSR = new MultiSearchRequest();
 SearchRequest SR1 = new SearchRequest("user");
 SR1.source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()));
 mSR.add(SR1);

 SearchRequest SR2 = new SearchRequest("user");
 SR2.source(new SearchSourceBuilder().query(QueryBuilders.termQuery("name.keyword","zs xxx 001")));
 mSR.add(SR2);

 MultiSearchResponse multiSearchResponse = restHighLevelClient.msearch(mSR, RequestOptions.DEFAULT);
 for (MultiSearchResponse.Item item : multiSearchResponse.getResponses()) {
     SearchResponse response = item.getResponse();
     for (SearchHit hit : response.getHits()) {
         System.out.println("hit.getSourceAsString() = " + hit.getSourceAsString());
     }
 }

16.2.8 Scroll

  • Scroll (cursor) queries handle result sets that are too large to return in one response
  • Use the scroll API to page through large result sets
// todo Build the query with SearchSourceBuilder
 SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

 // todo CountRequest: count the documents that match the query
 CountRequest countRequest = new CountRequest(new String[]{"user"});
 CountResponse countResponse = restHighLevelClient.count(countRequest, RequestOptions.DEFAULT);
 long count = countResponse.getCount();
 System.out.println("count = " + count);

 //todo Derive the number of pages from the total count and pageSize
 int pageSize = 2;
 long totalPage = (count / pageSize) + (count % pageSize == 0?0:1);

 // todo Scroll through the results page by page
 Scroll scroll = new Scroll(TimeValue.timeValueMinutes(2));// how long each scroll context is kept alive
 // Collect every scroll id so the contexts can be cleared afterwards
 List<String> scrollIdList = new ArrayList<>();
 // Attach sourceBuilder and set the number of hits returned per page
 sourceBuilder.size(pageSize).sort("_id", SortOrder.ASC);
 SearchRequest searchRequest = new SearchRequest(new String[]{"user"},sourceBuilder);
 searchRequest.scroll(scroll);

 List<User> returnList = new ArrayList<>();
 String scrollId = null;
 SearchResponse searchResponse;
 for (int i = 0; i < totalPage; i++) {
     if (i ==0){
         // The first page comes from a normal search that opens the scroll context
         searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
     }else {
         // Later pages are fetched with the scroll id returned by the previous call
         SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
         scrollRequest.scroll(scroll);
         searchResponse = restHighLevelClient.scroll(scrollRequest, RequestOptions.DEFAULT);
     }
     scrollId = searchResponse.getScrollId();
     scrollIdList.add(scrollId);
     for (SearchHit hit : searchResponse.getHits()) {
         User user = JSONObject.parseObject(hit.getSourceAsString(), User.class);
         returnList.add(user);
     }
 }
 System.out.println("returnList = " + returnList);

 // todo Clear the scroll context once the scroll is completed
 ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
 clearScrollRequest.setScrollIds(scrollIdList);
 try {
     ClearScrollResponse clearScrollResponse = restHighLevelClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
     boolean succeeded = clearScrollResponse.isSucceeded();
     System.out.println("succeeded = " + succeeded);
  } catch (IOException e) {
     System.out.println("Clear the scroll context IOException...");
  }
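
The page count used by the scroll loop is a ceiling division of the hit count by the page size. As a small sketch, the helper below (the name `totalPages` is illustrative, not part of any ES API) computes the same value as `(count / pageSize) + (count % pageSize == 0 ? 0 : 1)` for non-negative counts, without the conditional.

```java
class PageMath {
    /** Number of pages needed to hold count items at pageSize items per page. */
    static long totalPages(long count, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        // Adding pageSize - 1 before dividing rounds the quotient up
        return (count + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(totalPages(7, 2)); // 4 pages: 2 + 2 + 2 + 1
        System.out.println(totalPages(6, 2)); // 3 pages
    }
}
```

An alternative that avoids the separate count request is to keep scrolling until a response returns no hits.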

16.2.9 BoolQuery

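  • A bool (compound) query is built from four kinds of clauses: must, mustNot, should and filter
  • minimumShouldMatch sets how many should clauses have to match; when must or filter clauses are present it defaults to 0, so should clauses only influence scoring and do not change the result set
  • Demo: bool query + paging + simple aggregations
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
boolQuery.must(QueryBuilders.matchQuery("name","zs xxx 000").minimumShouldMatch("2").operator(Operator.OR))
         .filter(QueryBuilders.termsQuery("age",new long[]{18L,24L}));
//        boolQuery.mustNot()
//        boolQuery.should();
//        boolQuery.minimumShouldMatch(1);

SearchSourceBuilder ssb = new SearchSourceBuilder();
ssb.query(boolQuery);
// Paging
// ssb.from(0).size(100);

// Aggregations. The names match the lookups below; the aggregated fields are assumptions
// because the original lines were lost from the source.
ssb.aggregation(AggregationBuilders.sum("ageSUM").field("age"));
ssb.aggregation(AggregationBuilders.count("totalCount").field("id"));
System.out.println("ssb = " + ssb);

// An empty index array searches all indices
SearchRequest sr = new SearchRequest(new String[]{},ssb);
SearchResponse sres = restHighLevelClient.search(sr, RequestOptions.DEFAULT);
for (SearchHit hit : sres.getHits()) {
    System.out.println("hit = " + hit.getSourceAsString());
}

// To read the aggregation values, cast each result to its concrete type
Aggregation ageSUM = sres.getAggregations().get("ageSUM");
Aggregation totalCount = sres.getAggregations().get("totalCount");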


disk usage exceeded flood-stage watermark, read_only_allow_delete
-- Add the following settings to elasticsearch.yml
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.flood_stage: 2gb
# Absolute values are free-space thresholds. Below the low watermark, no new shards are allocated to the node
cluster.routing.allocation.disk.watermark.low: 30gb
cluster.routing.allocation.disk.watermark.high: 20gb
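
When the watermarks are given as absolute sizes, they refer to free disk space, which is why low (30gb) is larger than high (20gb), which in turn is larger than flood_stage (2gb). The sketch below classifies a node's free space against these three values. It is a simplified illustration with hypothetical names, not Elasticsearch's actual allocation logic, and the strict less-than comparisons at the boundaries are an assumption.

```java
class DiskWatermarks {
    enum Stage { OK, LOW, HIGH, FLOOD_STAGE }

    static final long GB = 1024L * 1024 * 1024;
    // Free-space thresholds taken from the settings above
    static final long LOW_FREE = 30 * GB;
    static final long HIGH_FREE = 20 * GB;
    static final long FLOOD_FREE = 2 * GB;

    /** Returns the most severe watermark that the given free space has fallen below. */
    static Stage classify(long freeBytes) {
        if (freeBytes < FLOOD_FREE) return Stage.FLOOD_STAGE; // indices get the read_only_allow_delete block
        if (freeBytes < HIGH_FREE) return Stage.HIGH;          // shards are relocated away from the node
        if (freeBytes < LOW_FREE) return Stage.LOW;            // no new shards are allocated to the node
        return Stage.OK;
    }

    public static void main(String[] args) {
        System.out.println(classify(40 * GB));
        System.out.println(classify(25 * GB));
        System.out.println(classify(10 * GB));
        System.out.println(classify(1 * GB));
    }
}
```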

Run the following in the Kibana console to remove the read-only block
  PUT _settings
  {
    "index": {
        "blocks": {
           "read_only_allow_delete": "false"
        }
    }
  }


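  • @Document
  • @Field
  • Difference between @Field(type = FieldType.Keyword) and @Field(type = FieldType.Text)
See the difference between Keyword and Text
  • @Field(store = true)
(1) store = false (the default): the field value is kept only inside _source.
(2) store = true: the field value is also stored in a separate stored-fields area at the same level as _source.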
