ElasticSearch


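1. What is Elasticsearch?

  • Elasticsearch is an open-source, highly scalable, distributed full-text search engine.
  • Elasticsearch is also a NoSQL document database.
  • Its goal is to hide the complexity of Lucene behind a simple RESTful API, making full-text search easy.
  • Elasticsearch official blog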

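1.1 Common search engine frameworks

  • Lucene: the long-established search engine library
  • Solr: a search engine built on Lucene
  • Elasticsearch: a distributed search engine built on Lucene
  • ELK: a distributed log analysis stack (Elasticsearch / Logstash / Kibana)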

2. What are the core concepts of Elasticsearch?

  • Some important concepts in Elasticsearch: cluster, node, index, document, shards, and replica

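  • cluster: an ES cluster consists of one or more nodes. The cluster name can be set in elasticsearch.yml.

  • node: a node is a single Elasticsearch instance. In most environments, each node runs on its own virtual machine.

  • index

    • 1. Concept: an index is a collection of documents. Each index holds one or more documents, and these documents can be distributed across different shards.
    • 2. How a document is assigned to a shard of its index: when a document arrives, a hash is computed from its id and the document is stored in the resulting shard. This keeps storage reasonably balanced across all shards, so that no single shard is overloaded.
      shard_num = hash(_id) % num_primary_shards
    • 3. Note: the formula above also shows why the number of primary shards cannot be changed dynamically: after a change, existing documents could no longer be located in the right primary shard. The number of replicas, by contrast, can be changed dynamically.
  • shards: Elasticsearch can split an index into multiple pieces, and each piece is called a shard.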

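  • replica

    • 1. Concept: by default, ES creates one primary shard and one replica for each index. In other words, each index contains one primary shard, and each primary shard has one replica.
    • 2. The main difference between primary and replica shards: only primary shards accept indexing requests, while both primary and replica shards can serve query requests.
    • 3. Why replicas exist:
      • Failover: if a primary shard fails, a replica shard can be promoted to primary.
      • Performance: get and search requests can be handled by either a primary shard or a replica shard.
    • 4. Notes:
      • By default each primary shard has one replica, but the number of replicas can be changed dynamically on an existing index.
      • A replica shard is never started on the same node as its primary shard.
  • type

    • 1. Concept: a type is a logical container for documents, much as a table is a container for rows. The default type is _doc.
    • 2. Notes:
      • Since ES 6.0, an index can contain only one type.
      • The reason: fields with the same name in different types of the same index are backed by the same field in Lucene.
      • In version 8.0, types are removed entirely.
  • mapping: the definitions of all fields of a type are called its mapping (similar to a table schema). It specifies the data type and other settings of each field.

  • document: ES is document-oriented. A document is the smallest unit of data that can be searched, comparable to a row in a relational database.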
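
The routing formula shard_num = hash(_id) % num_primary_shards can be tried out with a small simulation. This is only a sketch: `zlib.crc32` is a stand-in hash chosen for the example (Elasticsearch uses its own hash function), and the document ids `doc-0` to `doc-999` are made up.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Apply shard_num = hash(_id) % num_primary_shards."""
    # crc32 is only a stand-in hash for this sketch, not what Elasticsearch uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Hypothetical document ids, used only for illustration.
doc_ids = [f"doc-{i}" for i in range(1000)]

# Count how many documents land on each of 5 primary shards.
counts = [0] * 5
for doc_id in doc_ids:
    counts[shard_for(doc_id, 5)] += 1
print("documents per shard:", counts)

# If the shard count changed from 5 to 6, many ids would be routed to a
# different shard, so previously stored documents would not be found there.
moved = sum(1 for d in doc_ids if shard_for(d, 5) != shard_for(d, 6))
print("ids routed to a different shard after 5 -> 6:", moved)
```

The second number illustrates why the primary shard count is fixed once an index exists, while the replica count can change freely: replicas do not take part in the routing formula.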

3. How does Elasticsearch compare with a relational database?

Relational database    Elasticsearch
Databases              Index
Tables                 Types
Rows                   Documents
Columns                Fields

4. What data types does Elasticsearch have?


4.1 Using fields in an Elasticsearch mapping to implement multi-fields

  • elastic-fields
  • A string field could be mapped as a text field for full-text search, and as a keyword field for sorting or aggregations
  • The text type is used for full-text search; the keyword type is used for sorting, aggregations, and exact string matching
  • 1. Create the index my-index-000001 with mappings in which the city field is a text field with a keyword sub-field named raw
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "raw": { 
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
  • 2. Add documents to the index my-index-000001
PUT my-index-000001/_doc/1
{
  "city": "New York"
}

PUT my-index-000001/_doc/2
{
  "city": "York"
}

PUT my-index-000001/_doc/3
{
  "city": "MY York"
}
  • 3. Run a search, with sorting and an aggregation
GET my-index-000001/_search
{
  "query": {
    "match": {
      "city": "york" // text field: used for full-text search
    }
  },
  "sort": [
    {
      "city.raw": {
        "order": "asc" // keyword sub-field: used for sorting
      }
    }
  ],
  "aggs": {
    "cities": {
      "terms": {
        "field": "city.raw" // keyword sub-field: used for aggregation
      }
    }
  }
}
  • 4. Query result shown in the Kibana console
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "city" : "MY York"
        },
        "sort" : [
          "MY York"
        ]
      },
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "city" : "New York"
        },
        "sort" : [
          "New York"
        ]
      },
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "city" : "York"
        },
        "sort" : [
          "York"
        ]
      }
    ]
  },
  "aggregations" : {
    "cities" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "MY York",
          "doc_count" : 1
        },
        {
          "key" : "New York",
          "doc_count" : 1
        },
        {
          "key" : "York",
          "doc_count" : 1
        }
      ]
    }
  }
}

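4.2 The date type

  • Set the format property of a date field in the mapping when creating the index
  • The default format of date is: strict_date_optional_time||epoch_millis
  • || means "or"
  • The format property lists the date string formats the field accepts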
PUT my_index2
{
  "mappings": {
    "properties": {
      "create_time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
  • The index my_index2 accepts documents in several date formats:
    • yyyy-MM-dd HH:mm:ss.SSS
    • yyyy-MM-dd HH:mm:ss
    • yyyy-MM-dd
    • epoch_millis
  • Use the bulk API to add several documents with different date formats
POST _bulk
{"create":{"_index":"my_index2","_type":"_doc"}}
{"create_time":1659669006440} // epoch_millis
{"create":{"_index":"my_index2","_type":"_doc"}}
{"create_time":"2015-01-01 12:10:30.899"} // yyyy-MM-dd HH:mm:ss.SSS
{"create":{"_index":"my_index2","_type":"_doc"}}
{"create_time":"2015-01-02 12:10:30"} //  yyyy-MM-dd HH:mm:ss
{"create":{"_index":"my_index2","_type":"_doc"}}
{"create_time":"2015-01-02"} // yyyy-MM-dd
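  • Internally, ES converts every date it receives to UTC
    • yyyy-MM-ddTHH:mm:ssZ: standard UTC time, where T separates the date from the time and Z denotes the zero UTC offset
    • yyyy-MM-ddTHH:mm:ss.SSSZ
    • China Standard Time = UTC+8 (GMT+8)
  • Converting a regular time to UTC format in Java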
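import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.TimeZone;

// Option 1: format the current time as a UTC string with SimpleDateFormat
SimpleDateFormat utcDateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
utcDateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
String format = utcDateFormat.format(new Date());
System.out.println("format = " + format);// format = 2022-08-05T08:17:00.412Z
// Option 2: Instant.toString() already returns an ISO-8601 string in UTC
Date date = new Date();
Instant instant = date.toInstant();
System.out.println("instant = " + instant.toString());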
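  • When a string in standard UTC format is added to an index that has no mappings defined, the field is automatically mapped to the date type
=============================Kibana====================================
# Create the index
PUT my_index3

# Add data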
POST _bulk
{"create":{"_index":"my_index3","_type":"_doc"}}
{"create_time":"2015-01-01T12:10:30Z"}
{"create":{"_index":"my_index3","_type":"_doc"}}
{"create_time":"2022-08-05T07:58:27.868Z"}

# Result
   
{
  "took" : 82,
  "errors" : false,
  "items" : [
    {
      "create" : {
        "_index" : "my_index3",
        "_type" : "_doc",
        "_id" : "qTcPbYIBs_E2asF4ZY82",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "create" : {
        "_index" : "my_index3",
        "_type" : "_doc",
        "_id" : "qjcPbYIBs_E2asF4ZY82",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}

# The create_time field was automatically mapped as a date field (mappings.properties.create_time.type="date")
GET my_index3
 
{
  "my_index3" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "create_time" : {
          "type" : "date"
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "my_index3",
        "creation_date" : "1659687084248",
        "number_of_replicas" : "1",
        "uuid" : "jk9gIuPySFqCLKl6m2lolA",
        "version" : {
          "created" : "7120099"
        }
      }
    }
  }
}
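
The "try each format in turn" behaviour of the my_index2 mapping can be sketched outside Elasticsearch as follows. This is an illustration, not how Elasticsearch parses dates internally: the function name `to_epoch_millis` is invented for the example, the `strptime` patterns are Python approximations of the Java-style patterns in the mapping, and values without a time zone are assumed to be read as UTC.

```python
from datetime import datetime, timedelta, timezone

# Python approximations of the patterns in the my_index2 mapping,
# tried in the same order as in the "format" property.
PATTERNS = ["%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value) -> int:
    """Normalize any accepted input to milliseconds since the epoch (UTC)."""
    if isinstance(value, int):
        # epoch_millis: the value is already a millisecond timestamp.
        return value
    for pattern in PATTERNS:
        try:
            parsed = datetime.strptime(value, pattern)
        except ValueError:
            continue  # this pattern does not fit, try the next one
        # Assumption of this sketch: no zone in the input means UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
        return (parsed - EPOCH) // timedelta(milliseconds=1)
    raise ValueError(f"unsupported date format: {value!r}")


for sample in (1659669006440, "2015-01-01 12:10:30.899",
               "2015-01-02 12:10:30", "2015-01-02"):
    print(sample, "->", to_epoch_millis(sample))
```

A value that fits none of the patterns raises an error here, which loosely corresponds to Elasticsearch rejecting a document whose date does not fit any configured format.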

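4.3 String types: the difference between text and keyword (term and match queries)

  • The text type is for full-text search: values are tokenized first, and the inverted index is built from the tokens
    • text suits phrase search and full-text search over long text
    • The default analyzer is standard, which splits Chinese text character by character
    • A keyword sub-field can be added under fields in the mapping to support exact matching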
PUT my-index
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "keyword": { 
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
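  • The keyword type is for sorting, aggregations, and exact string matching: values are not tokenized and go into the inverted index as they are
    • ignore_above: sets the maximum length of a field value that is added to the inverted index; longer values are not indexed
    • A keyword field combined with a wildcard query is roughly equivalent to a LIKE query in MySQL
    • Use keyword for values that need exact lookup, such as user names, ID numbers, email addresses, and phone numbers

4.3.1 Verifying queries against text and keyword fields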

  • Create the index test_match and set its mappings
PUT test_match
{
  "mappings":{
    "properties":{
      "text_name":{
        "type":"text"
      },
      "key_name":{
        "type":"keyword"
      }
    }
  }
}
  • Add documents in bulk with the bulk API
POST _bulk
{"create":{"_index":"test_match","_type":"_doc"}}
{"text_name":"JAVA BOOK","key_name":"JAVA BOOK"}
{"create":{"_index":"test_match","_type":"_doc"}}
{"text_name":"ES BOOK","key_name":"ES BOOK"}
{"create":{"_index":"test_match","_type":"_doc"}}
{"text_name":"JAVA READ BOOK","key_name":"JAVA READ BOOK"}
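  • Use the analyze API to see how text and keyword fields are tokenized (an analyzer can be specified)
    • A text field is tokenized and lowercased when it is indexed
GET test_match/_analyze
{
  "field": "text_name",
  "text": "ES BOOK",
  "analyzer": "standard" // an analyzer can be specified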
}

############################ Tokens produced for the text field ############################
{
  "tokens" : [
    {
      "token" : "es",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "book",
      "start_offset" : 3,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

GET test_match/_analyze
{
  "field": "text_name",
  "text": "中国深圳",
  "analyzer": "standard"
}

############################ Tokens produced for Chinese text in the text field ############################
{
  "tokens" : [
    {
      "token" : "中",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "国",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "深",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "圳",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    }
  ]
}
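  • A keyword field is not tokenized when it is indexed: the original value goes into the inverted index unchanged, case included
    • For keyword = "ES BOOK", the term stored in the inverted index is the uppercase ES BOOK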
GET test_match/_analyze
{
  "field": "key_name",
  "text": "ES BOOK"
}

{
  "tokens" : [
    {
      "token" : "ES BOOK",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    }
  ]
}
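
The difference between the two analyze results above can be imitated in a few lines. This is a rough stand-in, not the real analyzers: `analyze_text` and `analyze_keyword` are names invented for the sketch, and the regular expression only covers ASCII letters, digits, and common CJK ideographs.

```python
import re
from typing import List


def analyze_text(value: str) -> List[str]:
    """Rough imitation of the standard analyzer on a text field.

    Runs of ASCII letters/digits become one token each, every CJK ideograph
    becomes its own token, and all tokens are lowercased.
    """
    tokens = re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9]+", value)
    return [token.lower() for token in tokens]


def analyze_keyword(value: str) -> List[str]:
    """A keyword field keeps the whole value as a single, unchanged term."""
    return [value]


print(analyze_text("ES BOOK"))
print(analyze_text("中国深圳"))
print(analyze_keyword("ES BOOK"))
```

The first call yields two lowercase tokens and the second yields one token per character, in line with the analyze API output shown above; the keyword variant returns the value untouched.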
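4.3.1.1 term query on a text field
  • A term query looks up the terms of the inverted index directly. The query value is not tokenized first; it is treated as a single term
    • A text field is tokenized with the default analyzer when indexed: for example, "JAVA BOOK" is stored in the inverted index as the two terms java and book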
GET test_match/_search
{
  "query": {
    "term": {
      "text_name": {
        "value": "book"
      }
    }
  }
}

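########## Result: every document whose text field contains the term book is returned ##########
########## An uppercase "BOOK" would match nothing, because the terms in the inverted index were lowercased ##########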
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.14181954,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 0.14181954,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EKBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.14181954,
        "_source" : {
          "text_name" : "ES BOOK",
          "key_name" : "ES BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.11955717,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}

######## term does not tokenize "java book" before searching; it treats the value as a whole ########
######## so the following query returns no hits ########
GET test_match/_search
{
  "query": {
    "term": {
      "text_name": {
        "value": "java book"
      }
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
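
The term query behaviour on a text field can be reproduced with a toy inverted index. This is a simplified sketch: the `analyze` function is a crude substitute for the standard analyzer, and the document ids 1 to 3 are placeholders for the auto-generated ids in the results above.

```python
import re
from collections import defaultdict

# Placeholder ids standing in for the auto-generated document ids.
DOCS = {1: "JAVA BOOK", 2: "ES BOOK", 3: "JAVA READ BOOK"}


def analyze(value):
    """Simplified standard analyzer: split on non-alphanumerics, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)]


def build_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in analyze(value):
            index[token].add(doc_id)
    return index


def term_query(index, value):
    """Look the value up as one term, without analyzing it first."""
    return sorted(index.get(value, set()))


text_index = build_index(DOCS)
print(term_query(text_index, "book"))
print(term_query(text_index, "BOOK"))
print(term_query(text_index, "java book"))
```

Only the lowercase single term finds documents. The uppercase value and the two-word value are not terms in the index, so both lookups come back empty.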
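4.3.1.2 term and terms queries on a keyword field
  • A keyword field is neither tokenized nor lowercased when indexed, so the terms in the inverted index are the original values and matching is case-sensitive
  • The query therefore has to pass the complete uppercase value JAVA BOOK or JAVA READ BOOK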
GET test_match/_search
{
  "query": {
    "term": {
      "key_name": {
        "value": "JAVA BOOK"
      }
    }
  }
}

######## Result ########
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9808291,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 0.9808291,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      }
    ]
  }
}
######## terms query on a keyword field ########
GET test_match/_search
{
  "query": {
    "terms": {
      "key_name": [
        "JAVA BOOK",
        "JAVA READ BOOK"
      ]
    }
  }
}

######## Result ########
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 1.0,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 1.0,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}
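4.3.1.3 match query on a keyword field
  • A match query on a keyword field does not tokenize the query value first, so it behaves like a term query
  • Setting the match operator to or or to and makes no difference to the result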
GET test_match/_search
{
  "query": {
    "match": {
      "key_name": {
        "query": "JAVA READ BOOK",
        "operator": "or"
      }
    }
  }
}

######## Result ########

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9808291,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.9808291,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}

GET test_match/_search
{
  "query": {
    "match": {
      "key_name": {
        "query": "JAVA BOOK",
        "operator": "or"
      }
    }
  }
}

######## Result ########

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9808291,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 0.9808291,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      }
    ]
  }
}
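4.3.1.4 match query on a text field
  • A match query on a text field first tokenizes the query value, then lowercases the tokens, and only then searches (so the query value is case-insensitive)
    • For example, text_name = "JaVA Es" is first split into JaVA and Es, then lowercased to java and es, and finally the inverted index is searched for documents containing the term java or es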
GET test_match/_search
{
  "query": {
    "match": {
      "text_name": "JaVA Es"
    }
  }
}

####### Every document whose text_name field contains the term java or es is returned #######
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0417082,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EKBoc4IBGSzUCqoEE7Qn",
        "_score" : 1.0417082,
        "_source" : {
          "text_name" : "ES BOOK",
          "key_name" : "ES BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 0.49917626,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.42081726,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}
  • With operator set to and, a match query requires every token to be present
GET test_match/_search
{
  "query": {
    "match": {
      "text_name":{
        "query": "JAVA BOoK",
        "operator": "and"
      }
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.6409958,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 0.6409958,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.5403744,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}

  • With operator set to or, matching any one of the tokens is enough
GET test_match/_search
{
  "query": {
    "match": {
      "text_name":{
        "query": "jaVA BOoK",
        "operator": "or"
      }
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.6409958,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 0.6409958,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.5403744,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EKBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.14181954,
        "_source" : {
          "text_name" : "ES BOOK",
          "key_name" : "ES BOOK"
        }
      }
    ]
  }
}

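  • minimum_should_match in a match query sets the minimum number of tokens that must match
    • With operator = and, every token is already required, so minimum_should_match has no effect; a query containing a token that no document has returns nothing
    • minimum_should_match only makes a difference together with operator = or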
GET test_match/_search
{
  "query": {
    "match": {
      "text_name":{
        "query": "JAVA fuck BOoK",
        "operator": "and",
        "minimum_should_match": 1
      }
    }
  }
}

######## With operator = and every token is required and minimum_should_match has no effect; no document contains all three tokens, so nothing is returned ########
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

GET test_match/_search
{
  "query": {
    "match": {
      "text_name":{
        "query": "JAVA fuck BOoK",
        "operator": "or",
        "minimum_should_match": 2 // at least 2 tokens must match
      }
    }
  }
}
######## minimum_should_match takes effect together with operator = or ########
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.6409958,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 0.6409958,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 0.5403744,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}
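
The interplay of operator and minimum_should_match in the match queries above can be modelled on a toy inverted index. This is a simplified sketch, not the real query implementation: scoring is ignored, `analyze` is a crude substitute for the standard analyzer, the ids 1 to 3 are placeholders, and the query word `nomatch` is a made-up token that appears in no document.

```python
import re
from collections import Counter, defaultdict

# Placeholder ids standing in for the auto-generated document ids.
DOCS = {1: "JAVA BOOK", 2: "ES BOOK", 3: "JAVA READ BOOK"}


def analyze(value):
    """Simplified standard analyzer: split on non-alphanumerics, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)]


def build_index(docs):
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in analyze(value):
            index[token].add(doc_id)
    return index


def match_query(index, query, operator="or", minimum_should_match=1):
    """Analyze the query, then count how many distinct terms each document matches."""
    terms = set(analyze(query))  # unlike term, match analyzes the query value
    matched = Counter()
    for term in terms:
        for doc_id in index.get(term, ()):
            matched[doc_id] += 1
    # With "and", every term is required, so minimum_should_match plays no role.
    required = len(terms) if operator == "and" else minimum_should_match
    return sorted(doc_id for doc_id, count in matched.items() if count >= required)


index = build_index(DOCS)
print(match_query(index, "JaVA Es"))
print(match_query(index, "JAVA BOoK", operator="and"))
print(match_query(index, "JAVA nomatch BOoK", operator="and", minimum_should_match=1))
print(match_query(index, "JAVA nomatch BOoK", operator="or", minimum_should_match=2))
```

The third call returns nothing because one of the three required terms is missing from every document, while the fourth returns the two documents that contain both java and book.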

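4.3.1.5 keyword field + wildcard query: the equivalent of a MySQL LIKE query
  • In a wildcard query, ? matches a single character
  • In a wildcard query, * matches zero or more characters
  • A wildcard query on a keyword field is case-sensitive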
GET test_match/_search
{
  "query": {"wildcard": {
    "key_name": {
      "value": "*READ*"
    }
  }}
}

### Result ###
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 1.0,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}

GET test_match/_search
{
  "query": {"wildcard": {
    "key_name": {
      "value": "JAVA READ BOO?"
    }
  }}
}

### Result ###

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 1.0,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}
4.3.1.6 wildcard query on a text field
  • A wildcard query on a text field can only match within a single token (and is case-insensitive)
GET test_match/_search
{
  "query": {"wildcard": {
    "text_name": {
      "value": "*BOOK*" // the value book returns the same result
    }
  }}
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "D6Boc4IBGSzUCqoEE7Qn",
        "_score" : 1.0,
        "_source" : {
          "text_name" : "JAVA BOOK",
          "key_name" : "JAVA BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EKBoc4IBGSzUCqoEE7Qn",
        "_score" : 1.0,
        "_source" : {
          "text_name" : "ES BOOK",
          "key_name" : "ES BOOK"
        }
      },
      {
        "_index" : "test_match",
        "_type" : "_doc",
        "_id" : "EaBoc4IBGSzUCqoEE7Qn",
        "_score" : 1.0,
        "_source" : {
          "text_name" : "JAVA READ BOOK",
          "key_name" : "JAVA READ BOOK"
        }
      }
    ]
  }
}
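  • A wildcard pattern that spans more than one token finds no documents on a text field
    • This shows that a wildcard query is matched against the individual terms in the text field's inverted index, so it can cover at most one token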
GET test_match/_search
{
  "query": {"wildcard": {
    "text_name": {
      "value": "*java book*"
    }
  }}
}

### No documents returned ###
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
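
Why a wildcard pattern behaves differently on keyword and text fields can be seen by matching the pattern against the stored terms. This is only a sketch: Python's `fnmatchcase` stands in for the wildcard matcher (it shares the meaning of ? and *, but additionally treats square brackets specially), the term lists are written out by hand, and the ids 1 to 3 are placeholders.

```python
from fnmatch import fnmatchcase

# Terms as they would sit in the inverted index (ids are placeholders).
KEYWORD_TERMS = {1: ["JAVA BOOK"], 2: ["ES BOOK"], 3: ["JAVA READ BOOK"]}
TEXT_TERMS = {1: ["java", "book"], 2: ["es", "book"], 3: ["java", "read", "book"]}


def wildcard_query(terms_by_doc, pattern):
    """Return ids of documents with at least one single term matching the pattern."""
    return sorted(
        doc_id
        for doc_id, terms in terms_by_doc.items()
        if any(fnmatchcase(term, pattern) for term in terms)
    )


# keyword field: one term per document, compared case-sensitively
print(wildcard_query(KEYWORD_TERMS, "*READ*"))
print(wildcard_query(KEYWORD_TERMS, "JAVA READ BOO?"))
print(wildcard_query(KEYWORD_TERMS, "*read*"))

# text field: the pattern is compared with each lowercase token separately
print(wildcard_query(TEXT_TERMS, "*book*"))
print(wildcard_query(TEXT_TERMS, "*java book*"))
```

The last call finds nothing because no single token contains a space, which is the behaviour observed above for a pattern that spans two tokens.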

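4.4 The alias data type

5. Elasticsearch storage

5.1 The inverted index, doc_values, and source

5.2 Metadata fields

5.3 Understanding the store property in a mapping

5.4 Using fields to retrieve selected fields from a search

5.5 analyzer and search_analyzer in a mapping

  • analyzer: when a document is indexed, the values of text fields are tokenized and the tokens are added to the inverted index.
  • search_analyzer: at query time, the input for a text field is tokenized first, and the tokens are then looked up in the inverted index.
  • For example, the city field below uses the standard analyzer when documents are indexed and the english analyzer when the field is searched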
PUT my-index
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "analyzer": "standard", 
        "search_analyzer": "english", 
        "fields": {
          "raw": { 
            "type": "keyword",
            "ignore_above":256
          }
        }
      }
    }
  }
}
  • Check how each analyzer tokenizes the text
GET _analyze
{
  "analyzer": "standard",
  "text": "JAVA BOOK"
}

GET _analyze
{
  "analyzer": "english",
  "text": "JAVA BOOK"
}

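5.6 The ignore_above property of keyword fields in a mapping

  • 1. Create the index and set its mappings
  • With city.ignore_above=10, a city value longer than 10 characters is not added to the inverted index, but it is still stored in _source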
PUT my-index-xxx
{
  "mappings": {
    "properties": {
      "city": {
        "type": "keyword",
        "ignore_above": 10
      },
      "country": {
        "type": "keyword",
        "ignore_above": 20
      }
    }
  }
}
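  • 2. Add documents to the index (document 1 with a city value shorter than 10 characters, and document 2 with a city value longer than 10 characters)
## city value shorter than 10 characters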
PUT my-index-xxx/_doc/1
{
  "city": "123456789","country": "china"
}
## city value longer than 10 characters
PUT my-index-xxx/_doc/2
{
  "city": "1234567890abc","country": "china" 
}
  • 3. Wildcard query on the city field: document 2, whose city value is longer than 10 characters, is not returned
GET my-index-xxx/_search
{
  "query": {
    "wildcard": {
    "city": "12345*"
    }
  }
}

############## Result ##################
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-xxx",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "city" : "123456789",
          "country" : "china"
        }
      }
    ]
  }
}
  • Exact term query on the city field: document 2 is not returned
GET my-index-xxx/_search
{
 "query": {"term": {
   "city": "1234567890abc"
 }}
}

############## Result ##################
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
  • Paged search without conditions, or a search for all documents: document 2 is returned
GET my-index-xxx/_search
{
  "from": 0,
  "size": 20
}

GET my-index-xxx/_search

############## Result ##################
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-xxx",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "city" : "123456789",
          "country" : "china"
        }
      },
      {
        "_index" : "my-index-xxx",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "city" : "1234567890abc",
          "country" : "china"
        }
      }
    ]
  }
}
  • term query on the country field: document 2 is returned
GET my-index-xxx/_search
{
 "query": {"term": {
   "country": {
     "value": "china"
   }
 }}
}

############## Result ##################
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.18232156,
    "hits" : [
      {
        "_index" : "my-index-xxx",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.18232156,
        "_source" : {
          "city" : "123456789",
          "country" : "china"
        }
      },
      {
        "_index" : "my-index-xxx",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.18232156,
        "_source" : {
          "city" : "1234567890abc",
          "country" : "china"
        }
      }
    ]
  }
}
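
The effect of ignore_above seen in the queries above can be summarized with a small model. This is a sketch under simplifying assumptions: `index_keyword` is a name invented for the example, length is measured with Python's `len`, and the inverted index is a plain dictionary.

```python
# The two documents from the example above, keyed by document id.
SOURCE = {
    "1": {"city": "123456789", "country": "china"},
    "2": {"city": "1234567890abc", "country": "china"},
}


def index_keyword(docs, field, ignore_above):
    """Build a term -> ids index, skipping values longer than ignore_above."""
    inverted = {}
    for doc_id, source in docs.items():
        value = source[field]
        if len(value) > ignore_above:
            continue  # not indexed, but the document keeps the value in _source
        inverted.setdefault(value, set()).add(doc_id)
    return inverted


city_index = index_keyword(SOURCE, "city", ignore_above=10)
country_index = index_keyword(SOURCE, "country", ignore_above=20)

# term on city: the long value was never indexed, so document 2 is not found
print(sorted(city_index.get("1234567890abc", set())))
# term on country: both documents are found
print(sorted(country_index.get("china", set())))
# a search without conditions reads _source and still shows document 2
print(SOURCE["2"]["city"])
```

Document 2 is missing only from lookups that go through the city terms; its stored source is untouched, which is why the unconditional search still returns it.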

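6. How does Elasticsearch index a document?

  • Elasticsearch: the process of indexing a document
  • 1. html_strip_filter: strips HTML tags from the text.
  • 2. standard_tokenizer: tokenizes the text, splitting it into individual terms.
  • 3. low_case_filter: converts every term to lowercase.
  • 4. stop_words_filter: removes certain terms (stop words); disabled by default.
  • 5. create_Inverted_index: builds the mapping from each term to the ids of the documents that contain it.
==== 1. Given the following three documents ====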

Document ID        Document
1                  It is Sunday tomorrow
2                  Sunday is the last day of the week
3                  The choice is yours
    
==== 2. The corresponding inverted index. Terms are kept sorted, which makes looking up a term very fast ====

Term       Frequency   Document IDs
choice     1           3
day        1           2
is         3           1,2,3
it         1           1
last       1           2
of         1           2
sunday     2           1,2
the        3           2,3
tomorrow   1           1
week       1           2
yours      1           3
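
The table above can be rebuilt from the three documents with a short script. This is a minimal sketch of steps 2, 3, and 5 only (tokenize, lowercase, build the index); HTML stripping and stop words are left out, and splitting on whitespace stands in for the real tokenizer.

```python
from collections import defaultdict

DOCS = {
    1: "It is Sunday tomorrow",
    2: "Sunday is the last day of the week",
    3: "The choice is yours",
}


def build_inverted_index(docs):
    """Return {term: (frequency, sorted document ids)} with terms in sorted order."""
    frequency = defaultdict(int)
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Lowercase, then split on whitespace as a stand-in tokenizer.
        for token in text.lower().split():
            frequency[token] += 1
            postings[token].add(doc_id)
    return {term: (frequency[term], sorted(postings[term])) for term in sorted(postings)}


inverted_index = build_inverted_index(DOCS)
for term, (count, doc_ids) in inverted_index.items():
    print(f"{term:<10}{count:<4}{','.join(map(str, doc_ids))}")
```

Because the terms are kept in sorted order, a lookup can use binary search rather than scanning every term, which is the point made in the heading of the table.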

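7. Paging in Elasticsearch

7.1 Using the scroll API for better paging through large amounts of data

7.2 Using search_after for deep paging

8. Fuzzy searching in Elasticsearch

8.1 wildcard queries

  • In a wildcard query, ? matches a single character
  • In a wildcard query, * matches zero or more characters
  • A wildcard query on a keyword field is case-sensitive
  • See 4.3.1.5 for details

8.2 fuzzy: typo-tolerant fuzzy search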

9. The structure of an Elasticsearch index

  • An index consists of three main parts: aliases, mappings, and settings
##### An existing index, fuzzyindex
GET fuzzyindex

##### Result #####
{
  "fuzzyindex" : {
    "aliases" : {
      "a1" : { }
    },
    "mappings" : {
      "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "fuzzyindex",
        "creation_date" : "1659835562222",
        "number_of_replicas" : "1",
        "uuid" : "OWyYUkzdSbGqv9rEt9WsJg",
        "version" : {
          "created" : "7120099"
        }
      }
    }
  }
}

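9.1 Index ~ alias

9.2 Index ~ mappings

  • mappings define the schema (the "table structure") of an index
  • Elasticsearch: customizing mappings
  • A mapping does not have to be defined before documents can be indexed: field types are detected dynamically, unlike in a traditional database. When new fields appear later, the mapping is adjusted automatically and the new fields are detected as well. The drawback of automatic detection is that some fields may not be identified accurately, location data for example, in which case the mapping for that field has to be defined explicitly.
  • Note: the mapping of an existing field in an existing index cannot be changed dynamically, because data already indexed under the old mapping would no longer be searchable. One way around this is to reindex and rebuild the index. Adding new fields to an existing mapping, however, does not require rebuilding the index.

9.3 Index ~ settings

  • number_of_shards: the number of primary shards of the index; the default is 1
  • number_of_replicas: the number of replica shards per primary shard; the default is 1
  • Once number_of_shards has been set it cannot be changed, short of deleting the index and indexing the data again. This is because the shard a document is stored in depends on number_of_shards; if the value changed, later lookups for that document would go to the wrong shard.

9.4 Setting aliases, mappings, and settings (shards) when creating an index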

PUT fuzzyindex
{
  "aliases" : {
      "a1" : {}
    },
    "mappings" : {
      "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "number_of_replicas" : "1"
      }
    }
}

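10. Useful Elasticsearch query examples

  • Elasticsearch: useful Elasticsearch query examples
  • Bool Query
  • The AND/OR/NOT operators can be used to fine-tune a search query so that it returns more relevant or more specific results. In the search API this is implemented as the bool query, which accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR). For example, to search for books whose title contains "Elasticsearch" or "Solr" and whose author is "clinton gormley" but not "radu gheorge":
  • bool.must nests another bool query with should and must clauses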
POST /bookdb_index/_search
{
  "query": {
    "bool": {
      "must": {
        "bool": {
          "should": [
            {
              "match": {
                "title": "Elasticsearch"
              }
            },
            {
              "match": {
                "title": "Solr"
              }
            }
          ],
          "must": {
            "match": {
              "authors": "clinton gormley"
            }
          },
          "minimum_should_match": 1
        }
      },
      "must_not": {
        "match": {
          "authors": "radu gheorge"
        }
      }
    }
  }
}
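  • Note: as you can see, a bool query can wrap any other query type, including other bool queries, to build arbitrarily complex or deeply nested queries. Keep in mind that when a bool query contains a must or filter clause, its should clauses are optional unless minimum_should_match is set.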

11. Useful Elasticsearch tools

11.1 The translate API: converting SQL to DSL

GET /_sql/translate
{
  "query": """
    SELECT * FROM twitter 
    WHERE age in(30,20) and address.keyword = '中国北京市海淀区'
  """
}

########## Translation result ##########
{
  "size" : 1000,
  "query" : {
    "bool" : {
      "must" : [
        {
          "terms" : {
            "age" : [
              30,
              20
            ],
            "boost" : 1.0
          }
        },
        {
          "term" : {
            "address.keyword" : {
              "value" : "中国北京市海淀区",
              "boost" : 1.0
            }
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "_source" : false,
  "fields" : [
    {
      "field" : "DOB",
      "format" : "strict_date_optional_time_nanos"
    },
    {
      "field" : "address"
    },
    {
      "field" : "age"
    },
    {
      "field" : "city"
    },
    {
      "field" : "country"
    },
    {
      "field" : "location"
    },
    {
      "field" : "message"
    },
    {
      "field" : "province"
    },
    {
      "field" : "uid"
    },
    {
      "field" : "user"
    }
  ],
  "sort" : [
    {
      "_doc" : {
        "order" : "asc"
      }
    }
  ]
}

########## Run a search with the translated DSL ##########
########## filter_path=hits.hits trims the returned JSON ##########

GET twitter/_search?filter_path=hits.hits
{
  "query" : {
    "bool" : {
      "must" : [
        {
          "terms" : {
            "age" : [
              30,
              20
            ],
            "boost" : 1.0
          }
        },
        {
          "term" : {
            "address.keyword" : {
              "value" : "中国北京市海淀区",
              "boost" : 1.0
            }
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  }
}

########## Query result ##########
{
  "hits" : {
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.9444616,
        "_source" : {
          "user" : "张三",
          "message" : "今儿天气不错啊,出去转转去",
          "uid" : 2,
          "age" : 20,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          },
          "DOB" : "1999-04-01"
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 1.9444616,
        "_source" : {
          "user" : "张三",
          "message" : "今儿天气不错啊,出去转转去",
          "uid" : 2,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          },
          "DOB" : "1999-04-01"
        }
      }
    ]
  }
}

11.2 The explain parameter: analyzing how ES scores a query

GET twitter/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "user": "老贾"
          }
        },
        {
          "match": {
            "age": 30
          }
        }
      ]
    }
  },
  "explain": true
}

######################## Result ########################
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 3.4849067,
    "hits" : [
      {
        "_shard" : "[twitter][0]",
        "_node" : "GE7kTcqxSQ2wmso2X9D6Ag",
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 3.4849067,
        "_source" : {
          "user" : "老贾",
          "message" : "123,gogogo",
          "uid" : 5,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区建国门",
          "location" : {
            "lat" : "39.718256",
            "lon" : "116.367910"
          },
          "DOB" : "1989-04-01"
        },
        "_explanation" : {
          "value" : 3.4849067,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 2.4849067,
              "description" : "sum of:",
              "details" : [
                {
                  "value" : 0.6931471,
                  "description" : "weight(user:老 in 3) [PerFieldSimilarity], result of:",
                  "details" : [
                    {
                      "value" : 0.6931471,
                      "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                      "details" : [
                        {
                          "value" : 2.2,
                          "description" : "boost",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.6931472,
                          "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                          "details" : [
                            {
                              "value" : 4,
                              "description" : "n, number of documents containing term",
                              "details" : [ ]
                            },
                            {
                              "value" : 8,
                              "description" : "N, total number of documents with field",
                              "details" : [ ]
                            }
                          ]
                        },
                        {
                          "value" : 0.45454544,
                          "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                          "details" : [
                            {
                              "value" : 1.0,
                              "description" : "freq, occurrences of term within document",
                              "details" : [ ]
                            },
                            {
                              "value" : 1.2,
                              "description" : "k1, term saturation parameter",
                              "details" : [ ]
                            },
                            {
                              "value" : 0.75,
                              "description" : "b, length normalization parameter",
                              "details" : [ ]
                            },
                            {
                              "value" : 2.0,
                              "description" : "dl, length of field",
                              "details" : [ ]
                            },
                            {
                              "value" : 2.0,
                              "description" : "avgdl, average length of field",
                              "details" : [ ]
                            }
                          ]
                        }
                      ]
                    }
                  ]
                },
                {
                  "value" : 1.7917595,
                  "description" : "weight(user:贾 in 3) [PerFieldSimilarity], result of:",
                  "details" : [
                    {
                      "value" : 1.7917595,
                      "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                      "details" : [
                        {
                          "value" : 2.2,
                          "description" : "boost",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.7917595,
                          "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                          "details" : [
                            {
                              "value" : 1,
                              "description" : "n, number of documents containing term",
                              "details" : [ ]
                            },
                            {
                              "value" : 8,
                              "description" : "N, total number of documents with field",
                              "details" : [ ]
                            }
                          ]
                        },
                        {
                          "value" : 0.45454544,
                          "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                          "details" : [
                            {
                              "value" : 1.0,
                              "description" : "freq, occurrences of term within document",
                              "details" : [ ]
                            },
                            {
                              "value" : 1.2,
                              "description" : "k1, term saturation parameter",
                              "details" : [ ]
                            },
                            {
                              "value" : 0.75,
                              "description" : "b, length normalization parameter",
                              "details" : [ ]
                            },
                            {
                              "value" : 2.0,
                              "description" : "dl, length of field",
                              "details" : [ ]
                            },
                            {
                              "value" : 2.0,
                              "description" : "avgdl, average length of field",
                              "details" : [ ]
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 1.0,
              "description" : "age:[30 TO 30]",
              "details" : [ ]
            }
          ]
        }
      }
    ]
  }
}
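
The numbers in _explanation can be checked by hand from the formulas printed in its description fields. The Python sketch below recomputes them; the function names are hypothetical, and the inputs (boost, n, N, k1, b, dl, avgdl) are copied from the output above.

```python
import math


def idf(n: int, N: int) -> float:
    """idf = log(1 + (N - n + 0.5) / (n + 0.5)), as printed in the explanation."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def tf(freq: float, k1: float, b: float, dl: float, avgdl: float) -> float:
    """tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def term_score(boost, n, N, freq=1.0, k1=1.2, b=0.75, dl=2.0, avgdl=2.0):
    """score = boost * idf * tf for a single term."""
    return boost * idf(n, N) * tf(freq, k1, b, dl, avgdl)


if __name__ == "__main__":
    lao = term_score(boost=2.2, n=4, N=8)   # term "老", found in 4 of 8 documents
    jia = term_score(boost=2.2, n=1, N=8)   # term "贾", found in 1 of 8 documents
    age = 1.0                               # the age:[30 TO 30] clause scores a constant 1.0
    print(lao, jia, lao + jia + age)
```

The rarer term "贾" has the larger idf and therefore contributes more to the total than "老", which is why the two weights differ even though every other input is identical.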

12. How can ES implement all-field search ("full-text search")?

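  • Define a redundant field named essearch of type keyword.
  • When storing a document in ES, concatenate the values of all searchable fields into the essearch field, separated by spaces.
  • Search the essearch field with a wildcard query.
  • Note: lowercase the text when concatenating, and lowercase the search string before running the wildcard query.
############ Create the index with explicit mappings ############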
PUT copy_index
{
  "mappings": {
    "properties": {
      "first_name":{
        "type": "keyword",
        "ignore_above" : 256
      },
      "second_name":{
        "type": "keyword",
        "ignore_above" : 256
      },
      "essearch":{
        "type": "keyword",
        "ignore_above" : 32766
      }
    }
  }
}

############ Add documents, concatenating all searchable fields into the redundant essearch field, separated by spaces ############
PUT copy_index/_doc/1
{"first_name":"王","second_name":"老二uid",
"essearch":"王 老二uid"}

PUT copy_index/_doc/2
{"first_name":"王","second_name":"二xx",
"essearch":"王 二xx"}

PUT copy_index/_doc/3
{"first_name":"王","second_name":"三hh",
"essearch":"王 三hh"}

PUT copy_index/_doc/4
{"first_name":"王","second_name":"四aa",
"essearch":"王 四aa"}

############ Run a wildcard query on the redundant essearch field ############
GET copy_index/_search 
{
 "query": {
   "wildcard": {
     "essearch": {
       "value":"*二*"
     }
   }
 }
}

############ Query result ############

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "copy_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "王",
          "second_name" : "老二uid",
          "essearch" : "王 老二uid"
        }
      },
      {
        "_index" : "copy_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "王",
          "second_name" : "二xx",
          "essearch" : "王 二xx"
        }
      }
    ]
  }
}
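
The client-side half of this approach (building essearch and normalizing the search string) can be sketched in Python. build_essearch and wildcard_match are hypothetical helper names, and fnmatchcase is only a local approximation of the wildcard query, used here to check the logic without a cluster.

```python
from fnmatch import fnmatchcase


def build_essearch(doc: dict, searchable_fields: list) -> str:
    """Join the searchable field values with spaces and lowercase the result."""
    values = [str(doc[name]) for name in searchable_fields if name in doc]
    return " ".join(values).lower()


def wildcard_match(essearch: str, user_input: str) -> bool:
    """Emulate a *input* wildcard query against the keyword field.

    fnmatchcase is a local approximation for illustration: unlike the
    Elasticsearch wildcard query, it also treats [ and ] as special.
    """
    return fnmatchcase(essearch, "*" + user_input.lower() + "*")


if __name__ == "__main__":
    doc = {"first_name": "王", "second_name": "老二UID"}
    doc["essearch"] = build_essearch(doc, ["first_name", "second_name"])
    print(doc["essearch"], wildcard_match(doc["essearch"], "UID"))
```

Lowercasing on both sides is what makes the match case-insensitive, since a keyword field stores its value verbatim.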

13. Basic ES operations

13.01 Create an index and a document

## Create an index named twitter and insert a document with ID 1
PUT twitter/_doc/1
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}

13.02 Update a document

PUT twitter/_doc/1
{
   "user": "GB",
   "uid": 1,
   "city": "北京",
   "province": "北京",
   "country": "中国",
   "location":{
     "lat":"29.084661",
     "lon":"111.335210"
   }
}

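## Partial update
## With PUT, every field has to be written out on each update, which is inconvenient for documents with many fields
## Use POST _update to change only the listed fields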
POST twitter/_update/1
{
  "doc": {
    "city": "成都",
    "province": "四川"
  }
}

## Update every document matched by a query, using a script
POST twitter/_update_by_query
{
  "query": {
    "match": {
      "user": "GB"
    }
  },
  "script": {
    "source": "ctx._source.city = params.city;ctx._source.province = params.province",
    "lang": "painless",
    "params": {
      "city":"长沙市222333",
       "province":"湖南22333"
    }
  }
}

13.03 Upsert a document

## UPSERT: update the document if it exists, insert it if it does not (save or update)

POST /twitter/_update/4
{
  "doc": {
     "user": "66666",
     "uid": 5,
     "city": "555",
      "province": "555",
      "country":"5555"
  },
  "doc_as_upsert": true
}

13.04 Check whether a document exists

HEAD twitter/_doc/1

13.05 Delete a document

DELETE twitter/_doc/1

## In a relational database, rows are usually deleted by condition; in ES the same is done with _delete_by_query
POST twitter/_delete_by_query
{
  "query": {
    "match": {
      "city": "上海"
    }
  }
}

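13.06 Check whether an index exists

HEAD twitter

13.07 Delete an index

DELETE twitter

13.08 Batch operations with bulk

## Each action line and each document must stay on a single line (newline-delimited JSON); splitting one across several lines causes an error

## index action: always succeeds (it creates the document, or replaces it if the ID already exists)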
POST _bulk
{ "index" : { "_index" : "twitter", "_id": 2 }}
{"user":"东城区-老刘","message":"出发,下一站云南!","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{ "index" : { "_index" : "twitter", "_id": 3} }
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{ "index" : { "_index" : "twitter", "_id": 4} }
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{ "index" : { "_index" : "twitter", "_id": 5} }
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{ "index" : { "_index" : "twitter", "_id": 6} }
{"user":"虹桥-老吴","message":"好友来了都今天我生日,好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}

## delete action
POST _bulk
{ "delete" : { "_index" : "twitter", "_id": 1 }}

## update action
POST _bulk
{ "update" : { "_index" : "twitter", "_id": 2 }}
{"doc": { "city": "长沙"}}

## index always succeeds; create fails if a document with that ID already exists
POST _bulk
{ "create" : { "_index" : "twitter", "_id": 1} }
{"user":"双榆树-张三","message":"今儿天气不错啊,出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
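
When a bulk body is generated by a program, the one-line-per-JSON-object rule is easy to satisfy by serializing each object separately. The Python sketch below only builds the request body as a string; build_bulk_body is a hypothetical helper, and sending the body to _bulk is left out.

```python
import json


def build_bulk_body(operations) -> str:
    """Serialize (action, metadata, source) triples into a bulk request body.

    source is None for actions that have no second line (delete).
    """
    lines = []
    for action, metadata, source in operations:
        # json.dumps without indent never emits a newline, so each object stays on one line.
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body([
        ("index", {"_index": "twitter", "_id": 2}, {"user": "东城区-老刘", "city": "北京"}),
        ("delete", {"_index": "twitter", "_id": 1}, None),
        ("update", {"_index": "twitter", "_id": 2}, {"doc": {"city": "长沙"}}),
    ])
    print(body, end="")
```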

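13.09 Close / open an index

## Closing an index blocks read and write operations; searching a closed index returns an error
POST twitter/_close
POST twitter/_open

13.10 Freeze / unfreeze an index

## After an index is frozen, a default search skips it: no data is returned and no error is raised
POST twitter/_freeze

GET twitter/_search

## To include frozen indices in a search,
## the request must pass the query parameter ignore_throttled=false
GET twitter/_search?ignore_throttled=false

## Unfreeze the index
POST twitter/_unfreeze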

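14. Types of ES queries

  • Tutorial: ES queries
  • ES searches fall into two main categories: query and aggregation.
  • A query performs full-text retrieval.
  • An aggregation computes statistics and analytics over the data.
  • The two can be combined, for example by running a query on the documents first and then aggregating the results.

14.1 Search all documents

# No index is specified, so every index in the cluster is searched. By default 10 documents are returned.
GET /_search 

# Equivalent to the command above
GET /_all/_search 

# Search several indices at once
GET /index1,index2/_search 

# Search every index whose name starts with index, excluding index3
POST /index*,-index3/_search 

# Search index1 and return only 2 documents, starting at offset 0 (offsets are zero-based)
GET /index1/_search?size=2&from=0 

    The equivalent DSL (domain-specific language) query is
    GET index1/_search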
    {
      "size": 2,
      "from": 0, 
      "query": {
        "match_all": {}
      }
    }

# filter_path trims the response down to the listed JSON fields
GET index1/_search?size=2&from=0&filter_path=hits.hits 
The response is:
{
  "hits" : {
    "hits" : [
      {
        "_index" : "index1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "user" : "李四",
          "msg" : "happy birthday!",
          "uid" : 4,
          "age" : 25,
          "birth" : "1994-04-01"
        }
      }
    ]
  }
}

14.2 Source Filtering

Source filtering: _source selects which document fields are returned, and it can also exclude fields.

    GET twitter/_search
    {
      "_source":["user","city"],
      "query":{"match_all":{}} 
    }
    Returns:
        "hits" : [
                  {
                    "_index" : "twitter",
                    "_type" : "_doc",
                    "_id" : "1",
                    "_score" : 1.0,
                    "_source" : {
                      "city" : "北京",
                      "user" : "张三"
                    }
                  }
                  ...
        ]

# Setting _source to false returns no document fields at all
GET /twitter/_search?filter_path=hits
{
  "_source": false,
  "query": {"match_all": {}}
}
  Returns:
     {
      "hits" : {
        "total" : {
          "value" : 6,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "twitter",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0
          },
          ...  
        ]
      }
    }
         
# _source also accepts includes and excludes properties, which list the fields to return and to leave out; both support wildcards.
   GET /twitter/_search?filter_path=hits.hits
    {
      "_source": {
        "includes": ["user","age"]
      },
      "query": {"match_all": {}}
    }      
     Returns:
    {
      "hits" : {
        "hits" : [
          {
            "_index" : "twitter",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "user" : "张三",
              "age" : 20
            }
          },
          ...
        ]
      }
    }

# Return only the fields whose names start with u or c
GET /twitter/_search?filter_path=hits.hits
{
  "_source": {
    "includes": ["u*","c*"]
  },
  "query": {"match_all": {}}
}        
 Returns:
    {
      "hits" : {
        "hits" : [
          {
            "_index" : "twitter",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "uid" : 2,
              "country" : "中国",
              "city" : "北京",
              "user" : "张三"
            }
          },
          ...
        ]
      }
   }

# Use the excludes property to drop the uid and country fields
    GET /twitter/_search?filter_path=hits.hits
    {
      "_source": {
        "includes": ["u*","c*"],
        "excludes": ["uid","country"]     
      },
      "query": {"match_all": {}}
    }

    Returns:
    {
      "hits" : {
        "hits" : [
          {
            "_index" : "twitter",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "city" : "北京",
              "user" : "张三"
            }
          },
          ...
        ]
      }
   }     
  
 # Setting _source to [] returns all fields
        GET /twitter/_search?filter_path=hits.hits
        {
          "_source":[],
          "query": {"match_all": {}}
        }
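
The interaction of includes and excludes can be sketched locally in Python. filter_source is a hypothetical helper that covers top-level field names only (nested paths are out of scope), and fnmatchcase stands in for the wildcard matching.

```python
from fnmatch import fnmatchcase


def filter_source(source: dict, includes=None, excludes=None) -> dict:
    """Keep fields matching any include pattern, then drop those matching an exclude.

    An empty or missing includes list keeps every field, mirroring "_source": [].
    Only top-level field names are handled in this sketch.
    """
    includes = includes or ["*"]
    excludes = excludes or []
    return {
        name: value
        for name, value in source.items()
        if any(fnmatchcase(name, pattern) for pattern in includes)
        and not any(fnmatchcase(name, pattern) for pattern in excludes)
    }


if __name__ == "__main__":
    doc = {"user": "张三", "uid": 2, "age": 20, "city": "北京", "country": "中国"}
    print(filter_source(doc, includes=["u*", "c*"], excludes=["uid", "country"]))
```

Excludes are applied after includes, so a field matched by both is dropped.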
     
         

14.3 Count

# Count how many documents the twitter index contains
GET /twitter/_count

GET /twitter/_count
{
    "query":{"match":{"city":"北京"}}
}

14.4 Settings

GET twitter/_settings

    {
      "twitter" : {
        "settings" : {
          "index" : {
            "routing" : {
              "allocation" : {
                "include" : {
                  "_tier_preference" : "data_content"
                }
              }
            },
            "number_of_shards" : "1",
            "provided_name" : "twitter",
            "creation_date" : "1618919497240",
            "number_of_replicas" : "1",
            "uuid" : "Nl1Cg--JQACp561b-KEVZw",
            "version" : {
              "created" : "7120099"
            }
          }
        }
      }
    }

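# Once number_of_shards is set it cannot be changed, short of deleting the index and indexing the data again. The shard that stores each document is computed from number_of_shards, so if the value changed, later lookups would compute the wrong shard.

# The replica count can be changed dynamically; the primary shard count is fixed once the index is created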
PUT twitter
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

14.5 Mapping

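ES is often described as schemaless. Every index has a mapping, which is created when the first document is stored: ES inspects each incoming field and infers its data type. Schemaless here means:
    1. Documents can be indexed without defining a mapping first. Field types are detected dynamically, unlike in a traditional database.
    2. When a new field appears, the mapping is adjusted automatically to include it.
    3. Dynamic detection is not always accurate, for example for location data.

# Note: the mapping of an existing field cannot be changed in place; once changed, the data already indexed would no longer be searchable. One solution is to create a new index with the new mapping and reindex. Adding a new field to an existing mapping does not require reindexing.

#### The correct procedure is to define settings and mappings first, and only then store documents ###

#1. Delete the index
DELETE twitter

#2. Set the primary shard and replica counts in settings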
PUT twitter
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

#3. Define the index mapping
PUT twitter/_mapping
{
  "properties": {
    "address": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "age": {
      "type": "long"
    },
    "city": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "country": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "location": {
      "type": "geo_point"
    },
    "message": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "province": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "uid": {
      "type": "long"
    },
    "user": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}

14.6 match query

GET twitter/_search
    {
      "query": {
        "match": {
          "user": { 
            "query": "朝阳区-老贾",
            "operator": "or",
            "minimum_should_match": 3
          }
        }
      }
    }

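Where
"user": { "query": "朝阳区-老贾","operator": "or", "minimum_should_match": 3}
     query: the text to search for
     operator: the default is or. The query above returns any document that matches at least 3 of the 5 characters "朝", "阳", "区", "老", and "贾". There is also an and operator, which requires every term to match.
     minimum_should_match: the minimum number of terms that must match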
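
The "at least 3 of 5 characters" rule can be sketched in Python. The analyze function below is a rough stand-in that assumes one token per CJK character, which is how the query above is described; real analyzers do more, and both function names are hypothetical.

```python
def analyze(text: str) -> list:
    """Rough stand-in for analysis of CJK text: one token per character,
    with punctuation and whitespace dropped. This is an assumption made
    for illustration, not a full analyzer."""
    return [ch.lower() for ch in text if ch.isalnum()]


def match_or(field_value: str, query: str, minimum_should_match: int = 1) -> bool:
    """True if at least minimum_should_match distinct query tokens occur in the field."""
    field_tokens = set(analyze(field_value))
    query_tokens = set(analyze(query))
    return len(query_tokens & field_tokens) >= minimum_should_match


if __name__ == "__main__":
    query = "朝阳区-老贾"
    for user in ["朝阳区-老贾", "朝阳区-老王", "虹桥-老吴"]:
        print(user, match_or(user, query, minimum_should_match=3))
```

With the threshold at 3, "朝阳区-老王" still matches because it shares four characters with the query, while "虹桥-老吴" shares only "老" and is rejected.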

14.7 Ids query

# Look up documents by a list of _id values
GET twitter/_search
{
  "query": {
    "ids": {"values": ["1", "2"] }
  }
}

14.8 Multi_match

# Often we do not know which field contains the search keyword; in that case multi_match searches several fields at once.
GET twitter/_search
{
  "query": {
    "multi_match": {
      "query": "朝阳",
      "fields": [
        "user",
        "address^3",
        "message"
      ],
      "type": "best_fields"
    }
  }
}
# With type best_fields, the final _score is taken from the single best-scoring field.
# address^3 triples the weight of matches for "朝阳" in the address field

multi_match supports the following types:
        best_fields
        most_fields
        cross_fields
        phrase
        phrase_prefix
        bool_prefix

14.9 Prefix query

# Find all documents in the twitter index whose user field starts with "老"
GET /twitter/_search
{
  "query": {
    "prefix": {"user": {"value": "老"}}
  }
}

14.10 Term query

# A term query matches the exact term in the given field.
GET /twitter/_search
{
  "query": {
    "term": {"user.keyword": {"value": "老吴"}}
  }
}

14.11 Terms query

# Find all documents whose user.keyword is "老吴" or "张三"
GET /twitter/_search
{
  "query": {
    "terms": {"user.keyword": ["老吴","张三"]}
  }
}

14.12 Terms_set query

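# Find documents that contain at least a minimum number of exact terms in the given field

# Create the index and its mapping, with a field "required_matches" that holds the minimum number of terms to match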
PUT /job-candidates
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "programming_languages": {
        "type": "keyword"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}
 
# Add document 1
PUT /job-candidates/_doc/1?refresh
{
  "name": "Jane Smith",
  "programming_languages": [ "c++", "java" ],
  "required_matches": 2
}
 
# Add document 2
PUT /job-candidates/_doc/2?refresh
{
  "name": "Jason Response",
  "programming_languages": [ "java", "php" ],
  "required_matches": 2
}

## Run the terms_set query; minimum_should_match_field names the document field that holds the minimum match count
GET /job-candidates/_search?filter_path=hits.hits
{
  "query": {
    "terms_set": {
      "programming_languages": {
        "terms": [ "c++", "java", "php" ],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

# Result:
{
  "hits" : {
    "hits" : [
      {
        "_index" : "job-candidates",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.1005894,
        "_source" : {
          "name" : "Jane Smith",
          "programming_languages" : [
            "c++",
            "java"
          ],
          "required_matches" : 2
        }
      },
      {
        "_index" : "job-candidates",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.1005894,
        "_source" : {
          "name" : "Jason Response",
          "programming_languages" : [
            "java",
            "php"
          ],
          "required_matches" : 2
        }
      }
    ]
  }
}

## If the documents have no dedicated minimum-match field, the same effect can be achieved another way:
## use minimum_should_match_script to set the minimum number of matching terms
GET /job-candidates/_search
{
  "query": {
    "terms_set": {
      "programming_languages": {
        "terms": [ "c++", "java", "php" ],
        "minimum_should_match_script": {"source": "2"}
      }
    }
  }
}
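
The per-document threshold used by terms_set can be sketched in Python. terms_set_match is a hypothetical helper that reads the required count from the document itself, in the way minimum_should_match_field is used above.

```python
def terms_set_match(doc: dict, field: str, terms: list, required_field: str) -> bool:
    """True if doc[field] contains at least doc[required_field] of the given terms."""
    values = doc.get(field, [])
    if not isinstance(values, list):
        values = [values]
    matched = len(set(terms) & set(values))
    return matched >= doc[required_field]


if __name__ == "__main__":
    candidates = [
        {"name": "Jane Smith", "programming_languages": ["c++", "java"], "required_matches": 2},
        {"name": "Jason Response", "programming_languages": ["java", "php"], "required_matches": 2},
    ]
    terms = ["c++", "java", "php"]
    for doc in candidates:
        print(doc["name"], terms_set_match(doc, "programming_languages", terms, "required_matches"))
```

Both sample documents contain two of the three terms and require two, so both are hits, which agrees with the result shown above.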

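14.13 Bool query

A bool query is a compound query.
If the queries above are [leaf queries], a compound query combines many [leaf queries] into a more complex query.
# A compound query is built from
# must (the clause must match),
# must_not (the clause must not match),
# should (matching is preferred but not required),
# and filter (the clause must match, without contributing to the score)

# General form of a compound query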
POST _search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "user" : "kimchy" }
      },
      "filter": {
        "term" : { "tag" : "tech" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 20 }
        }
      },
      "should" : [
        { "term" : { "tag" : "wow" } },
        { "term" : { "tag" : "elasticsearch" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}

# Find documents whose city is 北京 and whose age is exactly 30.
GET twitter/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"city": "北京"}},
        {"match": {"age": "30"}}
      ]
    }
  }
}
# Find all documents that are not in 北京
GET twitter/_search
{
  "query": {
    "bool": {
      "must_not": [
        {"match": {"city": "北京"}}
      ]
    }
  }
}

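# How each clause type affects hits and _score

Clause    Affects hits   Affects _score
must      Yes            Yes
must_not  Yes            No
should    No*            Yes
filter    Yes            No

# * should affects hits only in a special case: when the bool query has should clauses but no must or filter clause

# Find documents whose user field is "老王" or "老吴"; here should does determine the final hits.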
GET twitter/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"user": {"query": "老王","operator": "and"}}},
        {"match": {"user": {"query": "老吴","operator": "and"}}}
      ]
    }
  }
}
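
The hit rules in the table can be sketched in Python with clauses modeled as plain predicates. bool_matches is a hypothetical helper that covers only hit selection, not scoring, and the parameter is named filter_ to avoid shadowing the Python builtin.

```python
def bool_matches(doc, must=(), filter_=(), must_not=(), should=(),
                 minimum_should_match=None) -> bool:
    """Decide whether doc is a hit for a bool query built from predicate clauses."""
    if minimum_should_match is None:
        # Default: should is optional when must or filter is present;
        # otherwise at least one should clause has to match.
        minimum_should_match = 0 if (must or filter_) else (1 if should else 0)
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


if __name__ == "__main__":
    in_beijing = lambda d: d["city"] == "北京"
    is_wang = lambda d: d["user"] == "老王"
    is_wu = lambda d: d["user"] == "老吴"
    liu = {"user": "老刘", "city": "北京"}
    # should alone decides the hits; next to a must clause it becomes optional.
    print(bool_matches(liu, should=[is_wang, is_wu]))
    print(bool_matches(liu, must=[in_beijing], should=[is_wang, is_wu]))
```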

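14.14 Geo (location) queries

Location queries are one of ES's strongest features; many relational databases do not offer them.

14.15 Range query

# Range query: find documents with 20 <= age <= 28, sorted by age in ascending order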
GET /twitter/_search
{
  "query": {
    "range": {
      "age": {"gte": 20,"lte": 28}
    }
  },
  "sort": [
    {"age": {"order": "asc"}}
  ]
}

# Supported operators: gt, lt, gte, lte

14.16 Exists query

# Use exists to check whether a field is present

# A document is returned as long as its city field has a non-null value.
GET twitter/_search
{
  "query": {
    "exists": {
      "field": "city"
    }
  }
}

# Find all documents that do not have a city field
GET twitter/_search
{
  "query": {
    "bool": {
      "must_not": [{"exists": {"field": "city"}}]
    }
  }
}

14.17 match_phrase query

# match_phrase matches a phrase: the terms must appear in order, and slop sets how many intervening tokens are allowed
GET twitter/_search
{
  "query": {
    "match_phrase": {
      "message": {"query": "Happy birthday","slop": 1}
    }
  },
  "highlight": {
    "fields": {"message": {}}
  }
}

# slop: 1 allows a gap of one token between Happy and birthday.
# For example, "Happy Good BirthDay" also matches

14.18 Named query

# Named queries: give each leaf query a name with "_name".
GET twitter/_search?filter_path=hits.hits
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "city": {
              "query": "北京",
              "_name": "城市"
            }
          }
        },
        {
          "match": {
            "country": {
              "query": "中国",
              "_name": "国家"
            }
          }
        }
      ],
      "should": [
        {
          "match": {
            "_id": {
              "query": "1",
              "_name": "ID"
            }
          }
        }
      ]
    }
  }
}

## Each hit in the response gains an extra field, matched_queries, listing the named queries it matched
{
  "hits" : {
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.3152701,
        "_source" : {
          "user" : "张三",
          "message" : "今儿天气不错啊,出去转转去",
          "uid" : 2,
          "age" : 20,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          },
          "DOB" : "1999-04-01"
        },
        "matched_queries" : [
          "国家",
          "ID",
          "城市"
        ]
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.31527004,
        "_source" : {
          "user" : "老刘",
          "message" : "出发,下一站云南!",
          "uid" : 3,
          "age" : 22,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区台基厂三条3号",
          "location" : {
            "lat" : "39.904313",
            "lon" : "116.412754"
          },
          "DOB" : "1997-04-01"
        },
        "matched_queries" : [
          "国家",
          "城市"
        ]
      }
    ]
  }
}

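14.19 Wildcard query

# A wildcard query finds values that contain a given character sequence

# To match against the whole, un-analyzed field value rather than individual tokens,
# run the wildcard query on the keyword sub-field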

GET twitter/_search?filter_path=hits.hits
{
  "query": {
    "wildcard": {
      "message.keyword": {
        "value": "*BirthDay My*"
      }
    }
  }
}

## Response
{
  "hits" : {
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "user" : "老王",
          "message" : "Happy BirthDay My Friend!",
          "uid" : 6,
          "age" : 26,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区国贸",
          "location" : {
            "lat" : "39.918256",
            "lon" : "116.467910"
          },
          "DOB" : "1993-04-01"
        }
      }
    ]
  }
}

#################
GET /twitter/_search?filter_path=hits.hits
{
  "query": {
    "match": {
      "message": {
        "query": "*BirthDay My*",
        "operator": "and",
        "_name":"姓名匹配"
      }
    }
  }
}

## Response
{
  "hits" : {
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 2.7926655,
        "_source" : {
          "user" : "老王",
          "message" : "Happy BirthDay My Friend!",
          "uid" : 6,
          "age" : 26,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区国贸",
          "location" : {
            "lat" : "39.918256",
            "lon" : "116.467910"
          },
          "DOB" : "1993-04-01"
        },
        "matched_queries" : [
          "姓名匹配"
        ]
      }
    ]
  }
}

14.20 Sql query

GET /_sql
{
  "query": """
    SELECT * FROM twitter 
    WHERE age = 30
  """
}

## Translate the SQL into an ES query
GET /_sql/translate
{
  "query": """
    SELECT * FROM twitter 
    WHERE age = 30
  """
}

14.21 Multi Search API

# Multi Search API: bundles several search requests into one API call, which reduces the number of round trips.
# The searches can target different indices.
GET twitter/_msearch
{"index":"twitter"}
{"query":{"match_all":{}},"from":0,"size":1}
{"index":"twitter"}
{"query":{"bool":{"filter":{"term":{"city.keyword":"北京"}}}}, "size":1}
{"index":"twitter1"}
{"query":{"match_all":{}}}
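
The _msearch body above is newline-delimited JSON: each search is a header line followed by a body line, and the last line must also end with a newline. A minimal Java sketch of assembling such a body by hand follows. The class and method names are hypothetical, the index name is assumed to need no JSON escaping, and the Java clients in section 16 build this body for you.

```java
import java.util.ArrayList;
import java.util.List;

class MsearchBodySketch {

    // One search = one header line (target index) + one body line (the query).
    private final List<String> lines = new ArrayList<>();

    public MsearchBodySketch add(String index, String queryJson) {
        lines.add("{\"index\":\"" + index + "\"}");
        lines.add(queryJson);
        return this;
    }

    // NDJSON: every line, including the last one, is terminated by '\n'.
    public String build() {
        StringBuilder body = new StringBuilder();
        for (String line : lines) {
            body.append(line).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String body = new MsearchBodySketch()
                .add("twitter", "{\"query\":{\"match_all\":{}},\"from\":0,\"size\":1}")
                .add("twitter1", "{\"query\":{\"match_all\":{}}}")
                .build();
        System.out.print(body);
    }
}
```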

14.22 Profile API

# The Profile API is a debugging tool that helps find out why a request is slow.
GET twitter/_search
{
  "profile": "true", 
  "query": {
    "match": {
      "city": "北京"
    }
  }
}

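15. ES aggregations

15.1 range aggregation

# The range aggregation splits the data into buckets. It is normally only suitable for numeric fields.
# Each range includes its "from" value and excludes its "to" value. We do not care about the matching documents here, so size is set to 0.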
GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 22
          },
          {
            "from": 22,
            "to": 25
          },
          {
            "from": 25,
            "to": 30
          }
        ]
      }
    }
  }
}

## Response
{
  "aggregations" : {
    "age" : {
      "buckets" : [
        {
          "key" : "20.0-22.0",
          "from" : 20.0,
          "to" : 22.0,
          "doc_count" : 1
        },
        {
          "key" : "22.0-25.0",
          "from" : 22.0,
          "to" : 25.0,
          "doc_count" : 1
        },
        {
          "key" : "25.0-30.0",
          "from" : 25.0,
          "to" : 30.0,
          "doc_count" : 3
        }
      ]
    }
  }
}
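
The bucket counts above follow from the half-open [from, to) rule. The Java sketch below replays that rule over the six ages that appear later in section 15.10 (20, 22, 25, 26, 28, 30). It is an illustration with hypothetical names, not Elasticsearch code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RangeBucketsSketch {

    // Count values per [from, to) range; a value equal to "to" is NOT in the bucket.
    public static Map<String, Integer> bucketCounts(double[][] ranges, double[] values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (double[] r : ranges) {
            int n = 0;
            for (double v : values) {
                if (v >= r[0] && v < r[1]) {
                    n++;
                }
            }
            counts.put(r[0] + "-" + r[1], n);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] ranges = {{20, 22}, {22, 25}, {25, 30}};
        double[] ages = {20, 22, 25, 26, 28, 30};
        System.out.println(bucketCounts(ranges, ages));
        // {20.0-22.0=1, 22.0-25.0=1, 25.0-30.0=3}
    }
}
```

Age 30 lands in no bucket because the last range ends at 30 exclusive, which is why the counts add up to 5 rather than 6.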

# A sub-aggregation can be nested under a bucket aggregation:
GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 22
          },
          {
            "from": 22,
            "to": 25
          },
          {
            "from": 25,
            "to": 30
          }
        ]
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

## Response: the average age is computed separately inside each bucket,
## i.e. for each of the ranges 20-22, 22-25 and 25-30.
{
  "aggregations" : {
    "age" : {
      "buckets" : [
        {
          "key" : "20.0-22.0",
          "from" : 20.0,
          "to" : 22.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 20.0
          }
        },
        {
          "key" : "22.0-25.0",
          "from" : 22.0,
          "to" : 25.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 22.0
          }
        },
        {
          "key" : "25.0-30.0",
          "from" : 25.0,
          "to" : 30.0,
          "doc_count" : 3,
          "avg_age" : {
            "value" : 26.333333333333332
          }
        }
      ]
    }
  }
}

## Several sub-aggregations can be added at once
GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 22
          },
          {
            "from": 22,
            "to": 25
          },
          {
            "from": 25,
            "to": 30
          }
        ]
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        },
        "min_age":{
          "min": {
            "field": "age"
          }
        },
        "max_age":{
          "max": {
            "field": "age"
          }
        }
      }
    }
  }
}

## Response
{
  "aggregations" : {
    "age" : {
      "buckets" : [
        {
          "key" : "20.0-22.0",
          "from" : 20.0,
          "to" : 22.0,
          "doc_count" : 1,
          "max_age" : {
            "value" : 20.0
          },
          "avg_age" : {
            "value" : 20.0
          },
          "min_age" : {
            "value" : 20.0
          }
        },
        {
          "key" : "22.0-25.0",
          "from" : 22.0,
          "to" : 25.0,
          "doc_count" : 1,
          "max_age" : {
            "value" : 22.0
          },
          "avg_age" : {
            "value" : 22.0
          },
          "min_age" : {
            "value" : 22.0
          }
        },
        {
          "key" : "25.0-30.0",
          "from" : 25.0,
          "to" : 30.0,
          "doc_count" : 3,
          "max_age" : {
            "value" : 28.0
          },
          "avg_age" : {
            "value" : 26.333333333333332
          },
          "min_age" : {
            "value" : 25.0
          }
        }
      ]
    }
  }
}

15.2 filters aggregation

# filters aggregation: builds buckets on non-numeric fields as well. Each bucket collects all documents that match the filter associated with it.

GET /twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "by_cities": {
      "filters": {
        "filters": {
          "bj": {
           "match":{
              "city":"北京"
           }
          },
          "sh":{
            "match":{
              "city":"上海"
            }
          }
        }
      }
    }
  }
}

## Response
    {
      "aggregations" : {
        "by_cities" : {
          "buckets" : {
            "bj" : {
              "doc_count" : 5
            },
            "sh" : {
              "doc_count" : 1
            }
          }
        }
      }
    }

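15.3 filter aggregation

# Defines a single bucket holding all documents in the current document set that match the given filter; think of it as a filters aggregation with exactly one filter.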
GET /twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "bj": {
      "filter": {
        "match":{
          "city":"北京"
        }
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

## Response
{
  "aggregations" : {
    "bj" : {
      "doc_count" : 5,
      "avg_age" : {
        "value" : 24.6
      }
    }
  }
}

15.4 date_range aggregation

## Use date_range to count the documents that fall into given time ranges
POST twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "birth_range": {
      "date_range": {
        "field": "DOB",
        "format": "yyyy-MM-dd",
        "ranges": [
          {
            "from": "1989-01-01",
            "to": "1990-01-01"
          },
          {
            "from": "1991-01-01",
            "to": "1992-01-01"
          }
        ]
      }
    }
  }
}

## Response
{
  "aggregations" : {
    "birth_range" : {
      "buckets" : [
        {
          "key" : "1989-01-01-1990-01-01",
          "from" : 5.99616E11,
          "from_as_string" : "1989-01-01",
          "to" : 6.31152E11,
          "to_as_string" : "1990-01-01",
          "doc_count" : 1
        },
        {
          "key" : "1991-01-01-1992-01-01",
          "from" : 6.62688E11,
          "from_as_string" : "1991-01-01",
          "to" : 6.94224E11,
          "to_as_string" : "1992-01-01",
          "doc_count" : 1
        }
      ]
    }
  }
}
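
The numeric from and to values in the response are the range bounds as epoch milliseconds (5.99616E11 is 599616000000). Assuming the bounds are interpreted as midnight UTC, which is the case when the request sets no time_zone, the Java sketch below reproduces them. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

class DateRangeBoundsSketch {

    // Epoch milliseconds of midnight UTC on the given yyyy-MM-dd date.
    public static long epochMillis(String isoDate) {
        return LocalDate.parse(isoDate)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(epochMillis("1989-01-01")); // 599616000000
        System.out.println(epochMillis("1990-01-01")); // 631152000000
    }
}
```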

15.5 terms aggregation

# A terms aggregation counts how often each value of a field occurs. The aggregation below takes all documents whose message matches "happy birthday" and groups them by city.
GET /twitter/_search?filter_path=aggregations
{
  "query": {
    "match": {
      "message": "happy birthday"
    }
  },
  "size": 0,
  "aggs": {
    "city_agr": {
      "terms": {
        "field": "city",
        "size": 10
      }
    }
  }
}
## terms.size limits the result to the top 10 cities

## Response
{
  "aggregations" : {
    "city_agr" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "北京",
          "doc_count" : 2
        },
        {
          "key" : "上海",
          "doc_count" : 1
        }
      ]
    }
  }
}

## By default the buckets are sorted by doc_count in descending order. To sort by key instead:
GET /twitter/_search?filter_path=aggregations
{
  
  "query": {
    "match": {
      "message": "happy birthday"
    }
  },
  "size": 0,
  "aggs": {
    "city_agr": {
      "terms": {
        "field": "city",
        "size": 10,
        "order": {
          "_key": "asc"
        }
      }
    }
  }
}

## Response, sorted by key in ascending order (keyword terms compare by their UTF-8 bytes, so 上海 sorts before 北京)
{
  "aggregations" : {
    "city_agr" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "上海",
          "doc_count" : 1
        },
        {
          "key" : "北京",
          "doc_count" : 2
        }
      ]
    }
  }
}

## _count can also be used, here to sort by document count in ascending order
GET /twitter/_search?filter_path=aggregations
{
  
  "query": {
    "match": {
      "message": "happy birthday"
    }
  },
  "size": 0,
  "aggs": {
    "city_agr": {
      "terms": {
        "field": "city",
        "size": 10,
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}

## The response is the same as above

## Sort by the result of a nested (sub-)aggregation
GET /twitter/_search?filter_path=aggregations
{
  "query": {
    "match": {
      "message": "happy birthday"
    }
  },
  "size": 0,
  "aggs": {
    "city_agr": {
      "terms": {
        "field": "city",
        "size": 10,
        "order": {
          "avg_age": "desc"
        }
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

## Response
{
  "aggregations" : {
    "city_agr" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "上海",
          "doc_count" : 1,
          "avg_age" : {
            "value" : 28.0
          }
        },
        {
          "key" : "北京",
          "doc_count" : 2,
          "avg_age" : {
            "value" : 25.5
          }
        }
      ]
    }
  }
}
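
Ordering terms buckets by a sub-aggregation amounts to "group by key, compute the metric per group, sort the groups by the metric". The Java sketch below does that in memory. The three (city, age) pairs are assumed sample data chosen to reproduce the averages in the response above (28.0 and 25.5), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TermsOrderSketch {

    // Average of a list of values; NaN for an empty list.
    public static double average(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
    }

    // Bucket keys (cities) ordered by the average age of each bucket, descending.
    public static List<String> citiesByAvgAgeDesc(String[] cities, double[] ages) {
        Map<String, List<Double>> byCity = new LinkedHashMap<>();
        for (int i = 0; i < cities.length; i++) {
            byCity.computeIfAbsent(cities[i], k -> new ArrayList<>()).add(ages[i]);
        }
        Comparator<Map.Entry<String, List<Double>>> byAvg =
                Comparator.comparingDouble(e -> average(e.getValue()));
        return byCity.entrySet().stream()
                .sorted(byAvg.reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed sample data, not taken from the index.
        String[] cities = {"北京", "北京", "上海"};
        double[] ages = {25, 26, 28};
        System.out.println(citiesByAvgAgeDesc(cities, ages)); // [上海, 北京]
    }
}
```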

15.6 histogram aggregation

## Interval aggregation on a numeric field: it builds fixed-interval buckets dynamically from the field values.
GET twitter/_search?filter_path=aggregations
{
  "size":20,
  "aggs": {
    "age_histogram": {
      "histogram": {
        "field": "age",
        "interval": 2
      }
    }
  }
}

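### Response. Each bucket includes its lower bound and excludes its upper bound: [20, 22) holds 1 document, [22, 24) holds 1 document, and so on.
{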
  "aggregations" : {
    "age_histogram" : {
      "buckets" : [
        {
          "key" : 20.0,
          "doc_count" : 1
        },
        {
          "key" : 22.0,
          "doc_count" : 1
        },
        {
          "key" : 24.0,
          "doc_count" : 1
        },
        {
          "key" : 26.0,
          "doc_count" : 1
        },
        {
          "key" : 28.0,
          "doc_count" : 1
        },
        {
          "key" : 30.0,
          "doc_count" : 1
        }
      ]
    }
  }
}
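
Each histogram bucket key is the largest multiple of the interval that does not exceed the value (with the default offset of 0). The Java sketch below applies that rule to the six ages listed in section 15.10. It leaves out the empty buckets that Elasticsearch can return between occupied ones, and its names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class HistogramSketch {

    // Key of the bucket a value falls into: floor(value / interval) * interval.
    public static double bucketKey(double value, double interval) {
        return Math.floor(value / interval) * interval;
    }

    // Count the values per bucket key, keys in ascending order.
    public static Map<Double, Integer> histogram(double[] values, double interval) {
        Map<Double, Integer> buckets = new TreeMap<>();
        for (double v : values) {
            buckets.merge(bucketKey(v, interval), 1, Integer::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        double[] ages = {20, 22, 25, 26, 28, 30};
        System.out.println(histogram(ages, 2));
        // {20.0=1, 22.0=1, 24.0=1, 26.0=1, 28.0=1, 30.0=1}
    }
}
```

Age 25 is counted under key 24.0, which matches the bucket whose avg_age is 25.0 in the sorted variant of this request.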


GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "age_histogram": {
      "histogram": {
        "field": "age",
        "interval": 2,
        "order": {
          "avg_age": "desc"
        }
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

## Response
{
  "aggregations" : {
    "age_histogram" : {
      "buckets" : [
        {
          "key" : 30.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 30.0
          }
        },
        {
          "key" : 28.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 28.0
          }
        },
        {
          "key" : 26.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 26.0
          }
        },
        {
          "key" : 24.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 25.0
          }
        },
        {
          "key" : 22.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 22.0
          }
        },
        {
          "key" : 20.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 20.0
          }
        }
      ]
    }
  }
}

15.7 date_histogram aggregation

## Interval aggregation on a date field
GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "date_interval": {
      "date_histogram": {
        "field": "DOB",
        "format": "yyyy-MM-dd", 
        "interval": "year"
      }
    }
  }
}

### Response
{
  "aggregations" : {
    "date_interval" : {
      "buckets" : [
        {
          "key_as_string" : "1989-01-01",
          "key" : 599616000000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "1990-01-01",
          "key" : 631152000000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "1991-01-01",
          "key" : 662688000000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "1992-01-01",
          "key" : 694224000000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "1993-01-01",
          "key" : 725846400000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "1994-01-01",
          "key" : 757382400000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "1995-01-01",
          "key" : 788918400000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "1996-01-01",
          "key" : 820454400000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "1997-01-01",
          "key" : 852076800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "1998-01-01",
          "key" : 883612800000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "1999-01-01",
          "key" : 915148800000,
          "doc_count" : 2
        }
      ]
    }
  }
}

15.8 cardinality aggregation

# The cardinality aggregation counts distinct values, here the number of different cities:
GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "num_of_city": {
      "cardinality": {
        "field": "city"
      }
    }
  }
}

### Response
{
  "aggregations" : {
    "num_of_city" : {
      "value" : 2
    }
  }
}

15.9 metrics aggregations

# Metrics aggregations compute statistics over numeric data

# 1. avg: the average age of all users
GET twitter/_search
{
  "size": 0,
  "aggs": {
    "average_age": {
      "avg": {
        "field": "age"
      }
    }
  }
}

# 2. The global aggregation runs over all documents, regardless of the query.
POST twitter/_search?filter_path=aggregations
{
  "size": 0,
  "query": {
    "match": {
      "city": "北京"
    }
  },
  "aggs": {
    "avg_age_beijing": {
      "avg": {
        "field": "age"
      }
    },
    "avg_age_all":{
      "global": {},
      "aggs": {
        "age_global_avg": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

## Response: contains both the global average age and the average age of the users in Beijing
{
  "aggregations" : {
    "avg_age_all" : {
      "doc_count" : 7,
      "age_global_avg" : {
        "value" : 25.166666666666668
      }
    },
    "avg_age_beijing" : {
      "value" : 24.6
    }
  }
}

# 3. stats: summary statistics over the age field
GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "age_stats": {
      "stats": {
        "field": "age"
      }
    }
  }
}

## Response
{
  "aggregations" : {
    "age_stats" : {
      "count" : 6,
      "min" : 20.0,
      "max" : 30.0,
      "avg" : 25.166666666666668,
      "sum" : 151.0
    }
  }
}
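
The stats response can be checked by hand against the six ages listed in section 15.10 (20, 22, 25, 26, 28, 30). The Java sketch below does so with the JDK's DoubleSummaryStatistics; the class and method names are hypothetical. The count is 6 although the global aggregation above reports 7 documents, presumably because one document has no age value and stats only counts documents that have the field.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

class AgeStatsSketch {

    // count, min, max, average and sum in one pass, like the stats aggregation.
    public static DoubleSummaryStatistics stats(double... values) {
        return DoubleStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = stats(20, 22, 25, 26, 28, 30);
        System.out.println("count = " + s.getCount());
        System.out.println("min   = " + s.getMin());
        System.out.println("max   = " + s.getMax());
        System.out.println("avg   = " + s.getAverage());
        System.out.println("sum   = " + s.getSum());
    }
}
```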

15.10 percentiles aggregation

# percentiles aggregation
GET twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "age_quartiles": {
      "percentiles": {
        "field": "age",
        "percents": [
          25,
          50,
          75,
          100
        ]
      }
    }
  }
}

## Response: percentiles over the ages in ascending order: 20 22 25 26 28 30
## 25% of the users are aged 22.0 or younger, 50% are aged 25.5 or younger, and all of them are aged 30 or younger
{
  "aggregations" : {
    "age_quartiles" : {
      "values" : {
        "25.0" : 22.0,
        "50.0" : 25.5,
        "75.0" : 28.0,
        "100.0" : 30.0
      }
    }
  }
}

15.11 missing aggregation

# The missing aggregation counts the documents that lack a given field
GET /twitter/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "total_missing_age": {
      "missing": {
        "field": "sss"
      }
    }
  }
}

# Response: 7 documents have no sss field
{
  "aggregations" : {
    "total_missing_age" : {
      "doc_count" : 7
    }
  }
}

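16. Using the ES Java clients

16.1 Using the HTTP Jest client

  • The approach is as follows:
  • 1. Use elasticsearch-sql to parse the SQL and generate the ES query DSL
  • 2. Run the DSL string through the HTTP Jest client (using elasticsearch-sql directly has a deep-paging problem)
  • 3. Maven dependencies, all at version 6.3.0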
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>6.3.0</version>
        </dependency>

        <!-- Java Netty client: transport -->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>transport</artifactId>
            <version>6.3.0</version>
        </dependency>

        <!-- ES Jest client -->
        <dependency>
            <groupId>io.searchbox</groupId>
            <artifactId>jest</artifactId>
            <version>6.3.0</version>
        </dependency>

        <!-- elasticsearch-sql: SQL to DSL conversion -->
        <dependency>
            <groupId>org.nlpcn</groupId>
            <artifactId>elasticsearch-sql</artifactId>
            <version>6.3.0.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.elasticsearch</groupId>
                    <artifactId>elasticsearch</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.elasticsearch.client</groupId>
                    <artifactId>transport</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.alibaba</groupId>
                    <artifactId>fastjson</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
  • Usage example
// +++++++++++++++++++++++ TransportClient
//        Settings settings = Settings.builder().put("cluster.name", "mcc-sit").build();
        TransportClient transportClient = new PreBuiltTransportClient(Settings.EMPTY)
                                 .addTransportAddress(new TransportAddress(InetAddress.getByName("10.16.99.184"), 9300));
        // Convert SQL to DSL
        String explain = ESActionFactory.create(transportClient, "SELECT * FROM LC14  WHERE ID='ae04642d-9eb9-11ec-b493-005056861704' ").explain().explain();
        System.out.println("explain = " + explain);

        // ============================== JEST =====================
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(
                new HttpClientConfig
                        .Builder(Collections.singletonList("http://10.16.99.184:9200"))
                        .multiThreaded(true)
                        .defaultMaxTotalConnectionPerRoute(100)
                        .maxTotalConnection(100)
                        .build());
        JestClient jestClient = factory.getObject();

        Search.Builder ssss = new Search.Builder(explain).addIndex("lc14").addType("base");
        SearchResult execute2 = jestClient.execute(ssss.build());
        String sourceAsString = execute2.getSourceAsString();
        System.out.println("sourceAsString = " + sourceAsString);

        // Range queries on date fields: convert the Date with toInstant() first (its toString() yields an ISO-8601 UTC timestamp)
        Date date = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse("2022-04-06 09:43:29");

        String SQL_006 = "SELECT * FROM test_es_001 WHERE birth >= '" + date.toInstant().toString() + "'";
        // Fixed: the original passed SQL_008 here, which is only declared further down
        String explainxSSSS = ESActionFactory.create(transportClient, SQL_006).explain().explain();
        System.out.println("==========explainxSSSS==========>>> " + explainxSSSS);
        SearchResult test_es_001 = jestClient.execute(new Search.Builder(explainxSSSS).addIndex("test_es_001").addType("test_es_001").build());
        System.out.println("test_es_001.getSourceAsStringList() = " + test_es_001.getSourceAsStringList());
        // Fuzzy (LIKE) query
        String SQL_008 = "SELECT * FROM test_es_001 WHERE remark.keyword like '%的撒%'";
        String explainWilad = ESActionFactory.create(transportClient, SQL_008).explain().explain();
        // Renamed from test_es_001 to avoid declaring the same variable twice
        SearchResult test_es_008 = jestClient.execute(new Search.Builder(explainWilad).addIndex("test_es_001").addType("test_es_001").build());
          
// A slightly more complex SQL conversion
String SQL_009 =
"SELECT * FROM test_es_001 WHERE (age >= 38 or name.keyword = '张三') and remark.keyword like '%的撒%'";
 String explainSQL_009 = ESActionFactory.create(transportClient,SQL_009).explain().explain();
        System.out.println("==========explainSQL_009==========>>> " + explainSQL_009);

        SearchResult test_es_009 = jestClient.execute(new Search.Builder(explainSQL_009).addIndex("test_es_001").addType("test_es_001").build());
        System.out.println("test_es_009.getSourceAsStringList() = " + test_es_009.getSourceAsStringList());
  • Basic ES operations with JEST
//todo    =======1. HTTP JEST client: create an index  =======
        CreateIndex.Builder builder4CreateIndex =  new CreateIndex.Builder("test_es_001");
        //        builder4CreateIndex.settings()
        JestResult execute4Create = jestClient.execute(builder4CreateIndex.build());

//todo    =======2. HTTP JEST client: delete an index  =======
        DeleteIndex.Builder builder4DeleteIndex = new DeleteIndex.Builder("test_es_001");
        JestResult execute4Delete = jestClient.execute(builder4DeleteIndex.build());

//todo =======3. HTTP JEST client: bulk-insert data into ES =======
        // ## Note: the primary-key field of the P.User entity must carry the @JestId annotation

        List<P.User> usersData = Arrays.asList(
                new P.User(123451, "小1", 33,true,new Date(),"xxxxyyyzzz"),
                new P.User(123462, "小离", 44,false,new Date(),"a的撒旦"),
                new P.User(123473, "小二", 55,true,new Date(),"213123213顶顶顶顶...."));

        // Use the Bulk API
        Bulk.Builder bulkBuilder = new Bulk.Builder();
        for (P.User usersDatum : usersData) {
            Index.Builder obj = new Index.Builder(usersDatum).index("test_es_001").type("test_es_001");
            bulkBuilder.addAction(obj.build());
        }

        BulkResult bulkResult4Insert = jestClient.execute(bulkBuilder.build());
        boolean succeeded = bulkResult4Insert.isSucceeded();
        System.out.println("succeeded = " + succeeded);

//todo ======= 4. HTTP JEST client: bulk-update ES data (a bulk update must wrap each payload in a doc node) =======
        List<Update> updateList = new ArrayList<>();
        List<P.User> usersData4Update = Arrays.asList(
                new P.User(123451, "小111_001", 38,true,new Date(),"xxxxyyyzzz111"),
                new P.User(123462, "小离1_002", 44,false,new Date(),"a的撒旦"),
                new P.User(123473, "小二1_003", 55,true,new Date(),"213123213顶顶顶顶...."));

        for (P.User user : usersData4Update) {
            JSONObject object = (JSONObject)JSON.toJSON(user);
            JSONObject docPayload = new JSONObject().fluentPut("doc", object);
            Update update = new Update.Builder(docPayload)
                    .id(user.getId()+"")
                    .refresh(true)
                    .build();
            updateList.add(update);
        }

        Bulk bulk4Update = new Bulk.Builder().addAction(updateList)
                .defaultIndex("test_es_001")
                .defaultType("test_es_001").build();

        BulkResult bulkResult4Update = jestClient.execute(bulk4Update);
        System.out.println("bulkResult4Update.getErrorMessage() = " + bulkResult4Update.getErrorMessage());


//todo ======= 5. HTTP JEST client: bulk-delete ES data  =======
        List<Delete> deleteList = new ArrayList<>();
        String[] id2deleteA = {"12345","12346","12347"};// ids of the documents to delete
        for (String id : id2deleteA) {
            Delete.Builder db  = new Delete.Builder(id).index("test_es_001").type("test_es_001");
            deleteList.add(db.build());
        }
        Bulk.Builder BB = new Bulk.Builder();
        BB.addAction(deleteList);
        BulkResult bulkResult4Delete = jestClient.execute(BB.build());

//todo ======= 6. HTTP JEST client: get ES data by id  =======

        List<Integer> integerList = Arrays.asList(123451, 123462, 123473);
        for (Integer id : integerList) {
            DocumentResult documentResult = jestClient.execute(new Get.Builder("test_es_001", id + "").type("test_es_001").build());
            P.User sourceAsObject = documentResult.getSourceAsObject(P.User.class);
            System.out.println("sourceAsObject = " + sourceAsObject);
        }

//todo ======= 7. HTTP JEST client: search ES data with a DSL string  =======

        Search.Builder searchBuilder4DSL =
                // dsl
                new Search.Builder(" {\"from\":0,\"size\":200,\"query\":{\"bool\":{\"filter\":[{\"bool\":{\"must\":[{\"wildcard\":{\"remark.keyword\":{\"wildcard\":\"*的撒*\",\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}}],\"adjust_pure_negative\":true,\"boost\":1.0}}}")
                        .addIndex("test_es_001")
                        .addType("test_es_001");
        SearchResult execute2 = jestClient.execute(searchBuilder4DSL.build());
        List<String> sourceAsStringList = execute2.getSourceAsStringList();
        System.out.println("@@@@@@@@@@@@@@@@@sourceAsStringList@@@@@@@@@@@@@@@@@ = " + sourceAsStringList);

16.2 Using RestHighLevelClient

  • MAVEN
	<dependency>
	    <groupId>org.elasticsearch.client</groupId>
	    <artifactId>elasticsearch-rest-high-level-client</artifactId>
	    <version>7.2.1</version>
	</dependency>
	<dependency>
	    <groupId>org.elasticsearch</groupId>
	    <artifactId>elasticsearch</artifactId>
	    <version>7.2.1</version>
	</dependency>
	<dependency>
	    <groupId>org.elasticsearch.client</groupId>
	    <artifactId>elasticsearch-rest-client</artifactId>
	    <version>7.2.1</version>
	</dependency>
  • Test entity class
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class User{
    private Long id;
    private String name;
    private Integer age;
    private Boolean isValid;
    // Controls the output format when the date is serialized to JSON
    @com.alibaba.fastjson.annotation.JSONField(format = "yyyy-MM-dd HH:mm:ss")
    private Date birth;
}
  • Aliases, mappings and settings of the user index
{
  "user" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "birth" : {
          "type" : "date",
          "format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        },
        "id" : {
          "type" : "long"
        },
        "isValid" : {
          "type" : "boolean"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "user",
        "creation_date" : "1660829464473",
        "number_of_replicas" : "1",
        "uuid" : "N-iPkQT7Rdm2nH7YJL2Kuw",
        "version" : {
          "created" : "7120099"
        }
      }
    }
  }
}

16.2.1 Bulk API: batch save

  • The Bulk API accepts IndexRequest, UpdateRequest and DeleteRequest in the same request
 User user = User.builder().id(1L).name("zs").age(18).isValid(true).birth(new Date()).build();
 User user2 = User.builder().id(2L).name("zs xxx").age(18).isValid(true).birth(new Date()).build();
 User user3 = User.builder().id(3L).name("xxx 001").age(20).isValid(true).birth(new Date()).build();
 User user4 = User.builder().id(4L).name("zs xxx 001").age(24).isValid(true).birth(new Date()).build();
 User user5 = User.builder().id(5L).name("001 zs xxx").age(24).isValid(true).birth(new Date()).build();

 List<User> userList = Arrays.asList(user, user2, user3, user4,user5);
 BulkRequest bulkRequest = new BulkRequest("user","_doc");
 for (User obj : userList) {
     IndexRequest indexRequest = new IndexRequest();
     // Refresh policy for write requests (index/update/delete). The default, NONE, returns without waiting for a refresh to make the change searchable
     indexRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.NONE);
     indexRequest.source(JSON.toJSONString(obj), XContentType.JSON).id(obj.getId()+"");
     bulkRequest.add(indexRequest);
 }
 BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
 BulkItemResponse[] bulkItemResponses = bulkResponse.getItems();
 RestStatus status = bulkResponse.status();
 int status1 = status.getStatus();

16.2.2 GetRequest

  • Fetch a document by id
 GetRequest getRequest = new GetRequest("user","4");
 GetResponse response = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
 User u1 = JSONObject.parseObject(response.getSourceAsBytes(), User.class);
 User u2 = JSONObject.parseObject(response.getSourceAsString(), User.class);

16.2.3 IndexRequest

  • With IndexRequest, a document whose _id already exists is overwritten; otherwise a new document is created
User user_i = User.builder().id(1L).name("zs").age(18).isValid(true).birth(new Date()).build();
IndexRequest indexRequest = new IndexRequest("user");
indexRequest.source(JSON.toJSONString(user_i), XContentType.JSON).id(user_i.getId()+"");
IndexResponse indexResponse = restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
int status_i = indexResponse.status().getStatus();
System.out.println("status_i = " + status_i);

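16.2.4 UpdateRequest (partial update)

  • UpdateRequest modifies an existing document or inserts a new one
  • When docAsUpsert is false and the document does not exist, a document_missing_exception is thrown
  • When docAsUpsert is true and the document does not exist, the doc is inserted as a new document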
User user_001 = User.builder().id(10L).name("zs").age(180).isValid(true).birth(new Date()).build();
UpdateRequest updateRequest = new UpdateRequest("user","10");
updateRequest.docAsUpsert(true);
updateRequest.doc(JSON.toJSONString(user_001), XContentType.JSON);
System.out.println("updateRequest.toString() = " + updateRequest.toString());
UpdateResponse updateResponse = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
int status_upset = updateResponse.status().getStatus();
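  • Partial update: only the fields present in the doc are updated
// todo UpdateRequest partial update
UpdateRequest updateRequest = new UpdateRequest("user","1");
// Only id and birth are set; fastjson omits null fields by default, so the other fields stay untouched
User user1 = User.builder().id(1L).birth(new Date()).build();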
updateRequest.doc(JSON.toJSONString(user1), XContentType.JSON);
UpdateResponse updateResponse = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
System.out.println("updateResponse = " + updateResponse);

16.2.5 DeleteRequest

  • Delete a document by id
DeleteRequest deleteRequest = new DeleteRequest("user");
deleteRequest.id("10");
DeleteResponse deleteResponse = restHighLevelClient.delete(deleteRequest, RequestOptions.DEFAULT);
int status_del = deleteResponse.status().getStatus();
System.out.println("status_del = " + status_del);

16.2.5 CountRequest

  • Count the documents that match a query
// todo Build the query conditions with SearchSourceBuilder
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchAllQuery());

// todo CountRequest: count the documents that match the query
CountRequest countRequest = new CountRequest(new String[]{"user"});
countRequest.source(sourceBuilder);
CountResponse countResponse = restHighLevelClient.count(countRequest, RequestOptions.DEFAULT);
long count = countResponse.getCount();
System.out.println("count = " + count);

16.2.6 MultiGetRequest

  • Fetch several documents in one request by their ids
MultiGetRequest multiGetRequest = new MultiGetRequest();
multiGetRequest.add("user","2");
multiGetRequest.add("user","3");
MultiGetResponse itemResponses = restHighLevelClient.mget(multiGetRequest, RequestOptions.DEFAULT);
for (MultiGetItemResponse response : itemResponses) {
    String sourceAsString = response.getResponse().getSourceAsString();
    System.out.println("sourceAsString = " + sourceAsString);
}

16.2.6 SearchRequest

  • Building a single search request from start to finish
MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("name", "zs xxx 001");
//        matchQueryBuilder.analyzer("");
matchQueryBuilder.operator(Operator.OR);
matchQueryBuilder.minimumShouldMatch("2");

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(matchQueryBuilder);
System.out.println("searchSourceBuilder = " + searchSourceBuilder);
//        searchSourceBuilder.fetchSource();
SearchRequest searchRequest = new SearchRequest(new String[]{"user"},searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
StringJoiner stringJoiner = new StringJoiner(",","[","]");
for (SearchHit searchHit : searchResponse.getHits()) {
    stringJoiner.add(searchHit.getSourceAsString());
}
List<User> userListA = JSONArray.parseArray(stringJoiner.toString(), User.class);
System.out.println("userListA.toString() = " + userListA.toString());

16.2.7 MultiSearchRequest

  • Send several SearchRequests in one request
 // todo MSearch
 MultiSearchRequest mSR = new MultiSearchRequest();
 SearchRequest SR1 = new SearchRequest("user");
 mSR.add(SR1);
 SR1.source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()));

 SearchRequest SR2 = new SearchRequest("user");
 mSR.add(SR2);
 SR2.source(new SearchSourceBuilder().query(QueryBuilders.termQuery("name.keyword","zs xxx 001")));

 MultiSearchResponse multiSearchResponse = restHighLevelClient.msearch(mSR, RequestOptions.DEFAULT);
 for (MultiSearchResponse.Item item : multiSearchResponse.getResponses()) {
     SearchResponse response = item.getResponse();
     for (SearchHit hit : response.getHits()) {
         System.out.println("hit.getSourceAsString() = " + hit.getSourceAsString());
     }
 }

16.2.8 Scroll

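  • Scroll (cursor) queries are meant for requests that return a large amount of data
  • Use the scroll API to page through a large result set
// todo Build the query conditions with a SearchSourceBuilder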
 SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
 sourceBuilder.query(QueryBuilders.matchAllQuery());

 // todo CountRequest: count the documents that match the query
 CountRequest countRequest = new CountRequest(new String[]{"user"});
 countRequest.source(sourceBuilder);
 CountResponse countResponse = restHighLevelClient.count(countRequest, RequestOptions.DEFAULT);
 long count = countResponse.getCount();
 System.out.println("count = " + count);

 // todo Compute the total number of pages from the total count and pageSize
 int pageSize = 2;
 long totalPage = (count / pageSize) + (count % pageSize == 0?0:1);

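 // todo Scroll (cursor) query for requests that return a large amount of data
 Scroll scroll = new Scroll(TimeValue.timeValueMinutes(2)); // keep-alive time of the scroll context
 // Collect every scroll id so that all of them can be cleared afterwards
 List<String> scrollIdList = new ArrayList<>();
 // Set the number of hits returned per scroll round and the sort order on sourceBuilder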
 sourceBuilder.size(pageSize).sort("_id", SortOrder.ASC);
 SearchRequest searchRequest = new SearchRequest(new String[]{"user"},sourceBuilder);

 List<User> returnList = new ArrayList<>();
 String scrollId = null;
 SearchResponse searchResponse;
 for (int i = 0; i < totalPage; i++) {
     if (i ==0){
         searchRequest.scroll(scroll);
         searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
     }else {
         SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
         scrollRequest.scroll(scroll);
         searchResponse = restHighLevelClient.scroll(scrollRequest, RequestOptions.DEFAULT);
     }
     scrollId = searchResponse.getScrollId();
     scrollIdList.add(scrollId);
     for (SearchHit hit : searchResponse.getHits()) {
         User user = JSONObject.parseObject(hit.getSourceAsString(), User.class);
         returnList.add(user);
     }
 }
 System.out.println("returnList = " + returnList);

 // todo Clear the scroll context once the scroll is completed
 ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
 clearScrollRequest.scrollIds(scrollIdList.stream().distinct().collect(Collectors.toList()));
 try {
     ClearScrollResponse clearScrollResponse = restHighLevelClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
     boolean succeeded = clearScrollResponse.isSucceeded();
     System.out.println("succeeded = " + succeeded);
 } catch (IOException e) {
     System.out.println("Clear the scroll context IOException...");
     e.printStackTrace();
 }
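
The page count used by the loop above is a ceiling division of the hit count by the page size. A minimal stand-alone sketch of just that arithmetic follows. It uses only the JDK, and the names `PageMath` and `totalPages` are illustrative, not part of the Elasticsearch client.

```java
// Illustrative helper, not part of the Elasticsearch client.
final class PageMath {
    private PageMath() {}

    /** Pages needed to hold `count` hits at `pageSize` hits per page (ceiling division). */
    static long totalPages(long count, int pageSize) {
        if (count < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("count must be >= 0 and pageSize > 0");
        }
        // Full pages, plus one extra page when there is a remainder
        return count / pageSize + (count % pageSize == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        System.out.println(PageMath.totalPages(5, 2)); // two full pages plus one partial page
        System.out.println(PageMath.totalPages(4, 2)); // exactly two full pages
        System.out.println(PageMath.totalPages(0, 2)); // no hits, no pages
    }
}
```

Since the count is taken by a separate request before the scroll starts, a loop that simply stops when a round returns no hits is a common alternative to a precomputed page count.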

16.2.9 BoolQuery

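  • A compound (bool) query is built from four parts: must + mustNot + should + filter
  • minimumShouldMatch sets the minimum number of should clauses that have to match; by default, when must or filter clauses are present, should clauses influence only the score and not the result set
  • DEMO: compound query + pagination + simple aggregations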
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
boolQuery.must(QueryBuilders.matchQuery("name","zs xxx 000").minimumShouldMatch("2").operator(Operator.OR))
         .filter(QueryBuilders.termsQuery("age",new long[]{18L,24L}));
//        boolQuery.mustNot()
//        boolQuery.should();
//        boolQuery.minimumShouldMatch(1);

SearchSourceBuilder ssb = new SearchSourceBuilder();
// Pagination
// ssb.from(0).size(100);
ssb.query(boolQuery);

// Aggregations
ssb.aggregation(AggregationBuilders.sum("ageSUM").field("age"));
ssb.aggregation(AggregationBuilders.count("totalCount").field("id"));
System.out.println("ssb = " + ssb);

// Note: an empty index array means the search runs against all indices
SearchRequest sr = new SearchRequest(new String[]{},ssb);
SearchResponse sres = restHighLevelClient.search(sr, RequestOptions.DEFAULT);
for (SearchHit hit : sres.getHits()) {
    System.out.println("hit = " + hit.getSourceAsString());
}

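// Read the aggregation results; they must be assigned to the concrete types (Sum, ValueCount) to access the values
Sum ageSUM = sres.getAggregations().get("ageSUM");
ValueCount totalCount = sres.getAggregations().get("totalCount");
System.out.println("ageSUM = " + ageSUM.getValue() + ", totalCount = " + totalCount.getValue());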
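
To make the roles of the four clause types concrete, here is a plain-Java model of how a bool query decides whether a document belongs to the result set. It is only a sketch of the matching rule and not Elasticsearch code: scoring is ignored, `BoolModel` and its fields are hypothetical names, and documents are modelled as simple maps. It assumes the documented default that `minimum_should_match` is 1 when there are should clauses but no must or filter clauses, and 0 otherwise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of bool-query matching; scoring is ignored.
final class BoolModel {
    final List<Predicate<Map<String, Object>>> must = new ArrayList<>();
    final List<Predicate<Map<String, Object>>> mustNot = new ArrayList<>();
    final List<Predicate<Map<String, Object>>> should = new ArrayList<>();
    final List<Predicate<Map<String, Object>>> filter = new ArrayList<>();
    Integer minimumShouldMatch; // null means "use the default"

    boolean matches(Map<String, Object> doc) {
        // must and filter: every clause has to match
        for (Predicate<Map<String, Object>> p : must) {
            if (!p.test(doc)) return false;
        }
        for (Predicate<Map<String, Object>> p : filter) {
            if (!p.test(doc)) return false;
        }
        // mustNot: no clause may match
        for (Predicate<Map<String, Object>> p : mustNot) {
            if (p.test(doc)) return false;
        }
        // should: at least requiredShould() clauses have to match
        long matched = should.stream().filter(p -> p.test(doc)).count();
        return matched >= requiredShould();
    }

    int requiredShould() {
        if (minimumShouldMatch != null) return minimumShouldMatch;
        if (should.isEmpty()) return 0;
        // Default: should is optional as soon as must or filter is present
        return (must.isEmpty() && filter.isEmpty()) ? 1 : 0;
    }

    public static void main(String[] args) {
        BoolModel q = new BoolModel();
        q.filter.add(d -> Integer.valueOf(18).equals(d.get("age")));
        q.should.add(d -> "zs".equals(d.get("name")));

        Map<String, Object> doc = Map.of("name", "ls", "age", 18);
        System.out.println(q.matches(doc)); // the should clause is optional here
        q.minimumShouldMatch = 1;
        System.out.println(q.matches(doc)); // now one should clause is required
    }
}
```

In the real client the same switch is made with `boolQuery.minimumShouldMatch(1)`, as in the commented-out line of the demo above.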

17.cluster_block_exception

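cluster_block_exception
disk usage exceeded flood-stage watermark, read_only_allow_delete
Disk usage on a node has passed the flood-stage watermark, so Elasticsearch applies the read_only_allow_delete block to the affected indices and documents can no longer be inserted or updated.

-- Add the following settings to elasticsearch.yml
# Absolute values such as 2gb are thresholds on the free disk space left, so low > high > flood_stage
cluster.routing.allocation.disk.threshold_enabled: true
# Once reached, indices with shards on the node are blocked for writes
cluster.routing.allocation.disk.watermark.flood_stage: 2gb
# Once reached, no new shards are allocated to the node
cluster.routing.allocation.disk.watermark.low: 30gb
# Once reached, existing shards are relocated to other nodes
cluster.routing.allocation.disk.watermark.high: 20gb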
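
When the watermarks are given as absolute sizes, each one is a threshold on the free space left on a node, which is why low carries the largest number and flood_stage the smallest. The plain-Java sketch below mirrors that ordering with the three values from the config above. `DiskWatermarks` and `classify` are illustrative names, no Elasticsearch API is called, and the strict `<` comparison at the boundaries is an assumption.

```java
// Illustrative only: mirrors the threshold ordering, not Elasticsearch's allocator.
final class DiskWatermarks {
    static final long GB = 1024L * 1024 * 1024;
    static final long LOW_FREE = 30 * GB;   // watermark.low
    static final long HIGH_FREE = 20 * GB;  // watermark.high
    static final long FLOOD_FREE = 2 * GB;  // watermark.flood_stage

    enum State { OK, LOW, HIGH, FLOOD_STAGE }

    /** Classifies a node by its free disk space, checking the most severe threshold first. */
    static State classify(long freeBytes) {
        if (freeBytes < FLOOD_FREE) return State.FLOOD_STAGE; // indices become read-only
        if (freeBytes < HIGH_FREE) return State.HIGH;         // shards are relocated away
        if (freeBytes < LOW_FREE) return State.LOW;           // no new shards are allocated
        return State.OK;
    }

    public static void main(String[] args) {
        long[] samples = {40 * GB, 25 * GB, 10 * GB, 1 * GB};
        for (long free : samples) {
            System.out.println((free / GB) + "gb free -> " + classify(free));
        }
    }
}
```

A node with 1gb free lands in the flood stage, which is the situation that produces the cluster_block_exception described in this section.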

Run the following in the Kibana console to remove the read-only block
  PUT _settings
  {
    "index": {
        "blocks": {
           "read_only_allow_delete": "false"
         }
    }
  }      

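18.Common Spring Data Elasticsearch annotations

  • @Document
@Document: applied to a class; marks the entity class as a document object
         indexName: name of the index (comparable to a database name)
         type: name of the type (comparable to a table name)
         shards: number of shards
         replicas: number of replicas
  • @Field
@Field: applied to a member variable; marks it as a document field and customizes its mapping
         type: field type, a value of the FieldType enum
         index: whether the field is indexed; defaults to true
         store: whether the field is stored separately; defaults to false
         analyzer: name of the analyzer
         searchAnalyzer: analyzer applied to this field at search time
         format: date format for date-type fields
  • Difference between @Field(type = FieldType.Keyword) and @Field(type = FieldType.Text)
See the difference between Keyword and Text
  • @Field(store = true)
 Whether store is set to true or false, Elasticsearch still keeps the field value, because it is part of _source

(1) store = false (the default): the field is kept only inside "_source"
(2) store = true: the field's value is also kept in a separate stored field at the same level as _source,
    in addition to the copy inside _source, so there are two copies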