Elasticsearch: Inverted Index, Kibana, Common Commands, Advanced Search, Aggregations, and Spring Boot Integration

ProjectNo

Last modified: 2022-03-30 15:02:31

First published: 2021-09-24 18:31:51

Article link: https://blog.csdn.net/projectNo/article/details/120411648

This article follows the official Elasticsearch 7.5 documentation; for other versions, see the official documentation:
https://www.elastic.co/guide/index.html

1. What is Elasticsearch?

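Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack. Logstash and Beats help collect, aggregate, and enrich data and store it in Elasticsearch. Elasticsearch provides near real-time search and analytics for all types of data. Whether the data is structured or unstructured text, numerical data, or geospatial data, Elasticsearch can store and index it efficiently in a way that supports fast searches. You can go beyond simple data retrieval and aggregate information to discover trends and patterns in your data. As data and query volume grow, the distributed nature of Elasticsearch lets your deployment grow seamlessly along with them.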

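Elasticsearch handles data quickly and flexibly in a wide variety of use cases:

Add search to an application or website
Store and analyze logs, metrics, and security event data
Use machine learning to model data automatically in real time
Automate business workflows using Elasticsearch as a storage engine
Manage, integrate, and analyze spatial information using Elasticsearch as a geographic information system (GIS)
Store and process genetic data using Elasticsearch as a bioinformatics research tool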

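Elasticsearch is a distributed document store. Instead of storing information as rows of columnar data, it stores documents serialized as JSON. When a cluster has multiple Elasticsearch nodes, the stored documents are distributed across the cluster and can be accessed quickly from any node.

When a document is stored, it is indexed and becomes fully searchable in near real time, within about one second. Elasticsearch uses a data structure called an inverted index, which supports very fast full-text search. An inverted index lists every unique word that appears in any document and identifies the documents in which each word occurs. An index can be thought of as an optimized collection of documents; each document is a collection of fields, and each field is a key-value pair that holds the data.

Elasticsearch indexes all the data in every field, and each kind of indexed field has its own storage structure. For example, text fields are stored in inverted indices, while numeric and geo fields are stored in BKD trees. These data structures are a large part of why Elasticsearch searches so quickly.

Elasticsearch is built on Apache Lucene. It wraps Lucene and exposes a simple REST API that supports structured queries, full-text queries, and complex queries that combine the two. Lucene can provide full-text search mainly because it implements the inverted index.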

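(1) Inverted index

An inverted index is still an index, so its purpose is to locate the required data quickly. It works by using an analyzer to split each document into individual terms, collecting the distinct terms into a sorted list, and recording which documents each term appears in. For example, take the following four documents: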

1. 士兵突击
2. 士兵突击特别篇
3. 士兵侦察
4. 士兵突击特别篇报道

Term	Documents
士兵	1, 2, 3, 4
突击	1, 2, 4
特别篇	2, 4
侦察	3
报道	4

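When we search for “士兵突击”, the query is first split into terms, each term is looked up in the index, and the matching documents are retrieved. If we search for “士兵特别篇”, no document matches it exactly, so the documents that contain any of the query terms are returned in descending order of relevance score.

This structure consists of a list of all the distinct terms that appear in any document, with each term associated with the list of documents that contain it. A structure that locates records by attribute value in this way is an inverted index, and a file that carries an inverted index is called an inverted file.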
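
The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not how Lucene stores its index: the documents are tokenized by hand (a real analyzer does that step), the function names are made up for this example, and the score is a plain count of matching terms rather than the relevance score Elasticsearch computes.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query_terms):
    """Rank documents by how many query terms they contain (a crude score)."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id in index.get(term, []):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# The four example documents, already split into terms by hand.
docs = {
    1: ["士兵", "突击"],
    2: ["士兵", "突击", "特别篇"],
    3: ["士兵", "侦察"],
    4: ["士兵", "突击", "特别篇", "报道"],
}
index = build_inverted_index(docs)
print(index["突击"])
print(search(index, ["士兵", "特别篇"]))
```

Documents 2 and 4 contain both query terms, so they rank ahead of documents 1 and 3, which contain only one of them.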

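2. Basic concepts

Index (noun)

Similar to a database in a traditional relational database; it is where documents are stored.

Index (verb)

To index a document is to store it in an index (noun).

Document

The primary data entity in Elasticsearch is called a document.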

3. Installing Elasticsearch and Kibana

https://www.elastic.co/start
After downloading, start Kibana (http://127.0.0.1:5601) and Elasticsearch (http://127.0.0.1:9200) from their bin directories.

Kibana is a visual interface for Elasticsearch. Open its console to send requests: Kibana => Management => Dev Tools => Console.

4. Common commands

(1) Inspecting the cluster with _cat

GET /_cat/nodes: list all nodes
GET /_cat/health: show cluster health
GET /_cat/master: show the master node
GET /_cat/indices: list all indices

For example:

GET _cat/nodes

Output:

127.0.0.1 19 29 2    cdfhilmrstw * P3951098A244

(2) Indexing (saving) documents

Indexing and updating with PUT:

Add a document with type user and id 1 to the index test_info:

PUT /test_info/user/1
{
  "name": "wang"
}

Output:

{
    "_index": "test_info",
    "_type": "user",
    "_id": "1",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 1,
    "_primary_term": 1
}

The result is created, so the document was indexed. Sending the same request again returns:

{
    "_index": "test_info",
    "_type": "user",
    "_id": "1",
    "_version": 2,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 2,
    "_primary_term": 1
}

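The result becomes updated, and both the version (_version) and the sequence number (_seq_no) are now 2, so this time the document was updated. If no document with the given id exists, the request indexes a new document; otherwise it updates the existing one.

Indexing and updating with POST: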

POST /test_info/user
{
  "name": "zhang"
}

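POST does not require an id, whereas PUT does. Without an id, POST generates one automatically and indexes the document; with an id, it behaves the same as PUT.

(3) Querying

Get the document with id 1 and type user from the index test_info:

GET /test_info/user/1

The result is as follows: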

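{
  "_index" : "test_info", // index that holds the document
  "_type" : "user",  // document type
  "_id" : "1", // document id
  "_version" : 2, // version number
  "_seq_no" : 2, // sequence number for concurrency control; incremented on every write and used for optimistic locking
  "_primary_term" : 1, // also used for concurrency control; changes when the primary shard is reassigned, for example after a restart
  "found" : true,
  "_source" : { // document content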
    "name" : "wang"
  }
}

(4) Updating

Besides the PUT and POST requests shown earlier, a document can also be updated with _update.
Update the document with id 1 and type user in the test_info index:

POST test_info/user/1/_update
{
  "doc": {
    "name": "li"
  }
}

The result is as follows:

{
  "_index" : "test_info",
  "_type" : "user",
  "_id" : "1",
  "_version" : 3,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}

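_update compares the submitted fields with the indexed document and skips the write if nothing would change, whereas PUT and POST with an id but without _update overwrite the document without comparing. Sending the same _update request a second time therefore returns noop: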

{
  "_index" : "test_info",
  "_type" : "user",
  "_id" : "1",
  "_version" : 3,
  "result" : "noop", // no operation performed
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}
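
To make the difference between the two update paths concrete, here is a toy in-memory model in Python. ToyIndex and its methods are hypothetical names for this sketch and are not part of any Elasticsearch client; the model only mimics the _version and result fields shown in the responses above.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in for one index (not an Elasticsearch API)."""

    def __init__(self):
        self.docs = {}  # id -> (version, source)

    def index(self, doc_id, source):
        # PUT/POST with an id: always writes, even when the body is unchanged.
        version = self.docs.get(doc_id, (0, None))[0] + 1
        self.docs[doc_id] = (version, dict(source))
        result = "created" if version == 1 else "updated"
        return {"_version": version, "result": result}

    def update(self, doc_id, partial):
        # _update: merge the partial document and skip the write on no change.
        version, source = self.docs[doc_id]
        merged = {**source, **partial}
        if merged == source:
            return {"_version": version, "result": "noop"}
        self.docs[doc_id] = (version + 1, merged)
        return {"_version": version + 1, "result": "updated"}


idx = ToyIndex()
results = [
    idx.index("1", {"name": "wang"}),   # first PUT
    idx.index("1", {"name": "wang"}),   # identical PUT still bumps the version
    idx.update("1", {"name": "li"}),    # _update with a real change
    idx.update("1", {"name": "li"}),    # identical _update is a noop
]
for r in results:
    print(r)
```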

(5) Deleting

Delete a document:

DELETE test_info/user/1

Delete an index:

DELETE test_info

(6) Bulk import

Syntax (every two lines form one unit):

{action:{metadata}}
{request body  }

{action:{metadata}}
{request body  }

For example, index two documents of type user into the index test_info:

POST /test_info/user/_bulk
{"index":{"_id":1}}
{"name":"Join"}
{"index":{"_id":2}}
{"name":"Doe"}

The result is as follows:

{
  "took" : 929, // time taken, in milliseconds
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "test_info",
        "_type" : "user",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "test_info",
        "_type" : "user",
        "_id" : "2",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}

A more complex bulk request:

POST /_bulk
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"my first blog post"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"my second blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"my updated blog post"}}

Result:

{
  "took" : 1425,
  "errors" : false,
  "items" : [
    {
      "delete" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 1,
        "result" : "not_found",	// the document to delete was not found
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 404
      }
    },
    {
      "create" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 2,
        "result" : "created", // indexed successfully
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "JqzkC3wBHvnj4b2Hv2wn",
        "_version" : 1,
        "result" : "created", // indexed successfully
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "update" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 3,
        "result" : "updated", // updated successfully
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 3,
        "_primary_term" : 1,
        "status" : 200
      }
    }
  ]
}
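
The two-line action/body format of _bulk is easy to generate programmatically. The sketch below, using only the Python standard library, serializes a list of actions into a newline-delimited body; build_bulk_body is a helper name invented for this example, and sending the body to Elasticsearch is left out.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source) triples into the bulk NDJSON format."""
    lines = []
    for action, metadata, source in actions:
        lines.append(json.dumps({action: metadata}))
        if source is not None:  # delete actions have no request body line
            lines.append(json.dumps(source))
    # The bulk API expects the final line to end with a newline as well.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_id": "1"}, {"name": "Join"}),
    ("index", {"_id": "2"}, {"name": "Doe"}),
    ("delete", {"_id": "3"}, None),
])
print(body)
```

Each action occupies one line and, except for delete, is followed by exactly one line holding the document or partial document.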

5. Advanced search

Test data: https://blog.csdn.net/projectNo/article/details/120414848
Copy the data and bulk-insert it.

List the indices:

GET _cat/indices

yellow open bank                            T0LimmutSouXMeoqOSrQ1g 1 1 1000     0 372.6kb 372.6kb

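(1) Searching documents

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/getting-started-search.html

(2) search

There are two ways to run a search:

1. Send the search parameters in the REST request URI (URI + query parameters):

GET bank/_search?q=*&sort=account_number:asc

q=*: match all documents
sort: the field to sort by
asc: ascending order

2. Send the parameters in the REST request body (URI + request body):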

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}

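The response contains the following fields:

Field	Meaning
took	time taken by the search, in milliseconds
timed_out	whether the search timed out
_shards	how many shards were searched, and how many succeeded or failed
hits.max_score	the highest relevance score among the matching documents
hits.total.value	how many matching documents were found
hits.sort	the sort key of each hit (absent when sorting by score)
hits._score	the relevance score of each hit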

(3) from and size

By default only the first 10 documents are returned. To page through the results, specify the from and size parameters in the request:

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}

from and size work like LIMIT in MySQL: from is the offset and size is the number of rows.
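
Applications usually think in page numbers rather than offsets. The small helper below (a hypothetical name, written for this example) converts a 1-based page number into the from/size pair and merges it into a request body like the one above.

```python
def page_to_from_size(page, page_size=10):
    """Translate a 1-based page number into the from/size pair of a search."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


request_body = {
    "query": {"match_all": {}},
    "sort": [{"account_number": "asc"}],
    **page_to_from_size(page=2),  # second page of 10 -> from=10, size=10
}
print(request_body)
```

Keep in mind that from + size is capped by the index.max_result_window setting (10000 by default), so this style of paging suits shallow pages only.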

(4) _source

Return only some of the fields:

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["balance","firstname"]  
}

Result:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "firstname" : "Amber",
          "balance" : 39225
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "6",
        "_score" : 1.0,
        "_source" : {
          "firstname" : "Hattie",
          "balance" : 5686
        }
      },
      ... (remaining hits omitted)
    ]
  }
}

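(5) query

1) match
The "query": { "match_all": {} } used above returns every document. For more specific matching use match: on a non-string field it performs an exact match, and on a string (text) field it performs a full-text search.
For example, find documents whose address contains mill or lane: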

GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}

Full-text search splits the query string into terms, matches each term, and sorts the results by relevance score:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 19,
      "relation" : "eq"
    },
    "max_score" : 9.507477,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "136",
        "_score" : 9.507477,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road",
          "employer" : "Pheast",
          "email" : "forbeswallace@pheast.com",
          "city" : "Lopezo",
          "state" : "AK"
        }
      },
      ... (remaining hits omitted)
    ]
  }
}

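2) match_phrase
To match the whole phrase rather than the individual terms, use match_phrase. The query string is still analyzed, but the terms must appear next to each other and in the same order: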

GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}

Only one document matches, with a score of 9.507477; its address is 198 Mill Lane. Documents whose address contains only mill or only lane do not match:

{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 9.507477,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "136",
        "_score" : 9.507477,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      }
    ]
  }
}
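
The difference between match and match_phrase can be imitated with plain Python. This is a rough sketch under simplifying assumptions: analyze only lowercases and splits on whitespace (the real standard analyzer does more), nothing is scored, and the third address is made up to show a document that contains both terms but not the phrase.

```python
def analyze(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(field_value, query):
    """match: the field shares at least one term with the query (OR semantics)."""
    return bool(set(analyze(field_value)) & set(analyze(query)))


def match_phrase(field_value, query):
    """match_phrase: the query terms occur consecutively and in the same order."""
    tokens, phrase = analyze(field_value), analyze(query)
    return any(tokens[i:i + len(phrase)] == phrase
               for i in range(len(tokens) - len(phrase) + 1))


# The first two addresses come from the results above; the third is invented.
addresses = ["198 Mill Lane", "990 Mill Road", "715 Lane Mill Court"]
print([a for a in addresses if match(a, "mill lane")])
print([a for a in addresses if match_phrase(a, "mill lane")])
```

All three addresses satisfy match, but only 198 Mill Lane satisfies match_phrase, because the phrase check depends on token positions and not just on which terms are present.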

3) match + keyword

GET bank/_search
{
  "query": {
    "match": {
      "address.keyword": "990 Mill" 
    }
  }
}

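No documents are returned. When a text field is queried through its .keyword sub-field, the query must equal the entire field value; it is an exact match.
4) bool
To build more complex queries, use a bool query to combine multiple conditions. Each condition can be required (must), desirable (should), or forbidden (must_not).
Find documents whose age is 40 and whose state is not ID: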

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 43,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "474",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 474,
          "balance" : 35896,
          "firstname" : "Obrien",
          "lastname" : "Walton",
          "age" : 40,
          "gender" : "F",
          "address" : "192 Ide Court",
          "employer" : "Suremax",
          "email" : "obrienwalton@suremax.com",
          "city" : "Crucible",
          "state" : "UT"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "479",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 479,
          "balance" : 31865,
          "firstname" : "Cameron",
          "lastname" : "Ross",
          "age" : 40,
          "gender" : "M",
          "address" : "904 Bouck Court",
          "employer" : "Telpod",
          "email" : "cameronross@telpod.com",
          "city" : "Nord",
          "state" : "MO"
        }
      },
      ... (remaining hits omitted)
    ]
  }
}

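must and should clauses contribute to the relevance score; the higher the score, the better the document matches the search, and by default Elasticsearch returns documents from highest to lowest score. The clauses in must_not are treated as filters: they affect whether a document is included in the results but not how it is scored.
5) bool/filter
Return documents whose balance is between 1000 and 2000 (inclusive):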

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 1000,
            "lte": 2000
          }
        }
      }
    }
  }
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 19,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "87",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 87,
          "balance" : 1133,
          "firstname" : "Hewitt",
          "lastname" : "Kidd",
          "age" : 22,
          "gender" : "M",
          "address" : "446 Halleck Street",
          "employer" : "Isologics",
          "email" : "hewittkidd@isologics.com",
          "city" : "Coalmont",
          "state" : "ME"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "417",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 417,
          "balance" : 1788,
          "firstname" : "Wheeler",
          "lastname" : "Ayers",
          "age" : 35,
          "gender" : "F",
          "address" : "677 Hope Street",
          "employer" : "Fortean",
          "email" : "wheelerayers@fortean.com",
          "city" : "Ironton",
          "state" : "PA"
        }
      },
      ... (remaining hits omitted)
    ]
  }
}

filter does not affect the relevance score; it only narrows the result set.
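
How the clause types interact can be shown with a toy evaluator in Python. It is only a sketch: bool_search is a made-up function, clauses are plain predicates, and the score is simply one point per must clause, which is not how Elasticsearch computes relevance. The four accounts are taken from the results shown in this section.

```python
def bool_search(docs, must=(), must_not=(), filters=()):
    """Return (score, doc) pairs for documents that satisfy a toy bool query."""
    results = []
    for doc in docs:
        if any(clause(doc) for clause in must_not):
            continue  # excluded; must_not never contributes to the score
        if not all(clause(doc) for clause in filters):
            continue  # filtered out; filter never contributes to the score
        if not all(clause(doc) for clause in must):
            continue
        # In this toy, only must clauses add to the score (one point each).
        results.append((float(len(must)), doc))
    return results


accounts = [
    {"account_number": 87, "balance": 1133, "age": 22, "state": "ME"},
    {"account_number": 417, "balance": 1788, "age": 35, "state": "PA"},
    {"account_number": 474, "balance": 35896, "age": 40, "state": "UT"},
    {"account_number": 970, "balance": 19648, "age": 28, "state": "AK"},
]

in_range = bool_search(accounts,
                       filters=[lambda d: 1000 <= d["balance"] <= 2000])
age_40 = bool_search(accounts,
                     must=[lambda d: d["age"] == 40],
                     must_not=[lambda d: d["state"] == "ID"])
print([d["account_number"] for _, d in in_range])
print([(score, d["account_number"]) for score, d in age_40])
```
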
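6) term
Like match, term searches on a field, but match is recommended for full-text search, while term is intended for exact values in non-text fields such as age, salary, or dates.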

GET bank/_search
{
  "query": {
    "term": {
      "account_number": 970
    }
  }
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road",
          "employer" : "Pheast",
          "email" : "forbeswallace@pheast.com",
          "city" : "Lopezo",
          "state" : "AK"
        }
      }
    ]
  }
}

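6. Aggregations

The aggregation framework builds on search queries. It lets you group documents and extract statistics from them to build complex summaries of your data, much like GROUP BY and the aggregate functions in SQL.
In Elasticsearch a single search can return the hits and, in the same response, aggregation results that are kept separate from the hits. This is powerful and efficient: you can run a query together with several aggregations and get the results of each back in one request, avoiding extra network round trips through one concise API.
Aggregation syntax:

"aggregations" : {
    "<aggregation_name>" : { <!-- name of the aggregation -->
        "<aggregation_type>" : { <!-- type of the aggregation -->
            <aggregation_body> <!-- aggregation body: which fields to aggregate on -->
        }
        [,"meta" : {  [<meta_data_body>] } ]? <!-- metadata -->
        [,"aggregations" : { [<sub_aggregation>]+ } ]? <!-- sub-aggregations defined inside this aggregation -->
    }
    [,"<aggregation_name_2>" : { ... } ]*<!-- name of another aggregation -->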
}

Aggregations fall into four families:

Metrics aggregations
Bucket aggregations
Matrix aggregations
Pipeline aggregations

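(1) Metrics aggregations

Metrics aggregations compute values such as the maximum, minimum, sum, or average over a set of documents. The metrics aggregation types available in Elasticsearch 7.5 are listed in the official documentation linked at the end of this subsection.
1) max, min, sum, and avg
Max aggregation: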

GET bank/_search?size=0
{
  "aggs": { 
    "ageAgg": {  
      "max": {
        "field": "balance"
      }
    }
  }
}

size=0 suppresses the hits. Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "value" : 49989.0
    }
  }
}

2) value_count: counts the values of a field

GET bank/_search?size=0
{
  "aggs": {
    "age_count": {
      "value_count": {
        "field": "age"
      }
    }
  }
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_count" : {
      "value" : 1000
    }
  }
}

3) cardinality: counts distinct values

GET bank/_search?size=0
{
  "aggs": {
    "age_cardinality": {
      "cardinality": {
        "field": "age"
      }
    }
  }
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_cardinality" : {
      "value" : 21
    }
  }
}

4) stats: returns count, min, max, avg, and sum together

GET bank/_search?size=0
{
  "aggs": {
    "age_count": {
      "stats": {
        "field": "age"
      }
    }
  }
}

Result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_count" : {
      "count" : 1000,
      "min" : 20.0,
      "max" : 40.0,
      "avg" : 30.171,
      "sum" : 30171.0
    }
  }
}
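
For intuition, the metrics seen so far correspond to very small computations. The Python sketch below reproduces value_count, cardinality, and stats over a made-up list of ages; the function names mirror the aggregation names but are ordinary local functions. Note that Elasticsearch computes cardinality approximately (with HyperLogLog++), whereas this sketch counts exactly.

```python
def value_count(values):
    """Number of values, duplicates included."""
    return len(values)


def cardinality(values):
    """Number of distinct values (exact here, approximate in Elasticsearch)."""
    return len(set(values))


def stats(values):
    """The five numbers returned by the stats aggregation."""
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


ages = [20, 25, 25, 30, 40]  # made-up sample, not the bank data
print(value_count(ages), cardinality(ages))
print(stats(ages))
```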

5) percentiles: returns the value at each percentile

GET bank/_search?size=0
{
  "aggs": {
    "age_percentiles": {
      "percentiles": {
        "field": "age"
      }
    }
  }
}

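The percentiles aggregation sorts the values of the specified field (or script) in ascending order and reports the value below which a given percentage of documents falls. By default it returns the values at the percentiles [ 1, 5, 25, 50, 75, 95, 99 ]. For example, "50.0" : 31.0 means that 50% of the age values are less than or equal to 31.
Result: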

{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_percentiles" : {
      "values" : {
        "1.0" : 20.0,
        "5.0" : 21.0,
        "25.0" : 25.0,
        "50.0" : 31.0,
        "75.0" : 35.0,
        "95.0" : 39.0,
        "99.0" : 40.0
      }
    }
  }
}

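6) percentile_ranks: returns the percentage of documents whose value is less than or equal to a given value
For example, compute the percentage of documents whose age is at most 30 and at most 35: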

GET bank/_search?size=0
{
  "aggs": {
    "aggs_perc_rank": {
      "percentile_ranks": {
        "field": "age",
        "values": [30, 35]
      }
    }
  }
}

Result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "aggs_perc_rank" : {
      "values" : {
        "30.0" : 49.0,
        "35.0" : 75.8
      }
    }
  }
}
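
A percentile rank is just the share of values at or below a threshold. The sketch below computes it exactly over a made-up list of ages; Elasticsearch itself uses an approximation algorithm (TDigest by default), so its numbers on real data can differ slightly from an exact count.

```python
def percentile_rank(values, threshold):
    """Percentage of values that are less than or equal to threshold."""
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)


ages = [20, 25, 25, 30, 40]  # made-up sample, not the bank data
ranks = {str(float(t)): percentile_rank(ages, t) for t in (30, 35)}
print(ranks)
```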

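For the other metrics aggregations, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/search-aggregations-metrics.html

(2) Bucket aggregations

Unlike metrics aggregations, bucket aggregations do not compute metrics over fields; they create buckets of documents. Each bucket is associated with a criterion (which depends on the aggregation type) that determines whether a document in the current context "falls" into it. In other words, buckets effectively define sets of documents.
In contrast to metrics aggregations, bucket aggregations can hold sub-aggregations, which are computed over the buckets created by their "parent" bucket aggregation.

(3) Sub-aggregations (computed on the buckets of a parent aggregation)

For example, for accounts whose state is AK, compute the age distribution and also the average balance for each age: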

GET bank/_search?size=0
{
  "query": {
    "match": {
      "state": "AK"
    }
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 5
      },
      "aggs": {
        "balanceAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

The result is as follows; each age bucket carries its own average balance:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 22,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 12,
      "buckets" : [
        {
          "key" : 20,
          "doc_count" : 2,
          "balanceAvg" : {
            "value" : 41416.0
          }
        },
        {
          "key" : 26,
          "doc_count" : 2,
          "balanceAvg" : {
            "value" : 14901.5
          }
        },
        {
          "key" : 33,
          "doc_count" : 2,
          "balanceAvg" : {
            "value" : 32760.5
          }
        },
        {
          "key" : 36,
          "doc_count" : 2,
          "balanceAvg" : {
            "value" : 14936.0
          }
        },
        {
          "key" : 37,
          "doc_count" : 2,
          "balanceAvg" : {
            "value" : 16099.5
          }
        }
      ]
    }
  }
}
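
Conceptually, a terms aggregation with an avg sub-aggregation is a group-by followed by a per-group average. The Python sketch below shows that idea on made-up documents; terms_with_avg is a hypothetical helper, and the ordering (largest buckets first, ties by key) is chosen to be consistent with the output above.

```python
from collections import defaultdict


def terms_with_avg(docs, bucket_field, metric_field, size=5):
    """Group docs by bucket_field and average metric_field inside each bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc[metric_field])
    # Largest buckets first; ties are broken by ascending key.
    ordered = sorted(buckets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        {"key": key, "doc_count": len(vals), "avg": sum(vals) / len(vals)}
        for key, vals in ordered[:size]
    ]


# Made-up documents for illustration only.
sample = [
    {"age": 20, "balance": 100},
    {"age": 30, "balance": 50},
    {"age": 20, "balance": 300},
]
print(terms_with_avg(sample, "age", "balance"))
```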

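(4) Complex sub-aggregations (nesting within nesting)

Compute the age distribution and, within each age bucket, the average balance for gender M, the average balance for gender F, and the overall average balance for that age: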

GET bank/_search?size=0
{
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 5
      },
      "aggs": {
        "genderAgg": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "balanceAvg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        },
        "ageBalanceAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

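Note that a text field must be aggregated through its .keyword sub-field, which holds the exact value; aggregating on the text field directly raises an error. The result is as follows: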

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 716,
      "buckets" : [
        {
          "key" : 31,
          "doc_count" : 61,
          "genderAgg" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "M",
                "doc_count" : 35,
                "balanceAvg" : {
                  "value" : 29565.628571428573
                }
              },
              {
                "key" : "F",
                "doc_count" : 26,
                "balanceAvg" : {
                  "value" : 26626.576923076922
                }
              }
            ]
          },
          "ageBalanceAvg" : {
            "value" : 28312.918032786885
          }
        },
        {
          "key" : 39,
          "doc_count" : 60,
          "genderAgg" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "F",
                "doc_count" : 38,
                "balanceAvg" : {
                  "value" : 26348.684210526317
                }
              },
              {
                "key" : "M",
                "doc_count" : 22,
                "balanceAvg" : {
                  "value" : 23405.68181818182
                }
              }
            ]
          },
          "ageBalanceAvg" : {
            "value" : 25269.583333333332
          }
        },
        ... (remaining buckets omitted)
      ]
    }
  }
}

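7. Mapping

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/mapping.html

(1) Mapping is the process of defining how a document and the fields it contains are stored and indexed. For example, use mappings to define:

Which string fields should be treated as full-text fields.
Which fields contain numbers, dates, or geolocations.
The format of date values.

(2) Field data types

Each field has a data type, which can be:

A simple type such as text, keyword, date, long, double, boolean, or ip.

A type that supports the hierarchical nature of JSON, such as object or nested.

Or a specialized type such as geo_point, geo_shape, or completion.

It is often useful to index the same field in different ways for different purposes. For example, a string field can be indexed as a text field for full-text search and as a keyword field for sorting or aggregations. A string field can also be indexed with the standard analyzer, the English analyzer, and the French analyzer, or with an analyzer supplied by a plugin; for Chinese, the IK analyzer is the usual choice.
(3) Viewing a mapping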

GET bank/_mapping

Result:

{
  "bank" : {
    "mappings" : {
      "properties" : {
        "account_number" : {
          "type" : "long"
        },
        "address" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "balance" : {
          "type" : "long"
        },
        "city" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "email" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "employer" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "firstname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "lastname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "state" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

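The properties object under mappings is the mapping itself. For example, account_number is mapped as long and address as text. A text field supports full-text search, and the keyword sub-field under fields supports exact matching.
(4) Creating an explicit mapping

Note: mapping types are deprecated as of Elasticsearch 7.0 (and removed in 8.0), so documents are stored directly under the index.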

PUT /my-index
{
  "mappings": {
    "properties": {
      "age":    { "type": "integer" },  
      "email":  { "type": "keyword"  }, 
      "name":   { "type": "text"  }     
    }
  }
}

Run it; the result is as follows:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my-index"
}

Now view the mapping of this index:

GET my-index/_mapping

Result:

{
  "my-index" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "integer"
        },
        "email" : {
          "type" : "keyword"
        },
        "name" : {
          "type" : "text"
        }
      }
    }
  }
}

(5) Adding a field to an existing mapping
Add a mapping for one more field to the existing index:

PUT /my-index/_mapping
{
  "properties": {
    "employee-id": {
      "type": "keyword",
      "index": false
    }
  }
}

{
  "acknowledged" : true
}

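(6) Changing an existing mapping
Except for a few supported mapping parameters, the mapping or type of an existing field cannot be changed, because doing so could invalidate data that has already been indexed.
If a field's mapping really must change, create a new index with the correct mapping and reindex the data into it.
For example, to change email in my-index to the text type:
1) Create the new index with the new mapping: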

PUT /my-index-new
{
  "mappings": {
    "properties": {
      "age":    { "type": "integer" },  
      "email":  { "type": "text"  }, 
      "name":   { "type": "text"  }     
    }
  }
}

2) Migrate the data; source is the old index and dest is the new one:

POST _reindex
{
  "source": {
    "index": "my-index"
  },
  "dest": {
    "index": "my-index-new"
  }
}

If the old index still uses a type, specify it in source:

POST _reindex
{
  "source":{
      "index":"bank",
      "type":"account"
   },
  "dest":{
      "index":"new-bank"
   }
}

3) Delete the old index:

DELETE my-index

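8. Text analysis

(1) Analyzers and tokenizers

Text analysis is the process of converting unstructured text, such as the body of an article or a product description, into a structured format that is optimized for search. When to configure it: Elasticsearch performs text analysis when indexing or searching text fields. If the index contains no text fields, no further setup is needed; if you do use text fields, or your text searches are not returning the expected results, configuring text analysis often helps.
Analysis is also how terms are produced for the inverted index. For example, the whitespace tokenizer splits text whenever it encounters whitespace, so it would split "Just do it." into [Just, do, it.]. The standard analyzer used in the request below additionally strips the punctuation and lowercases the tokens: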

POST _analyze
{
  "analyzer": "standard",
  "text": "Just do it."
}

Result:

{
  "tokens" : [
    {
      "token" : "just",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "do",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "it",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}

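A tokenizer is also responsible for recording the order or position of each term (used for phrase and word proximity queries) and the start and end character offsets of the original word that each term represents (used for highlighting the searched content).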
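
The bookkeeping a tokenizer does can be sketched with a regular expression in Python. The function below is a simplified, hypothetical whitespace tokenizer written for this example: it keeps punctuation and case, unlike the standard analyzer used above, but it records the same position and offset fields.

```python
import re


def whitespace_tokenize(text):
    """Split on whitespace, recording position and character offsets per token."""
    return [
        {
            "token": m.group(),
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": position,
        }
        for position, m in enumerate(re.finditer(r"\S+", text))
    ]


tokens = whitespace_tokenize("Just do it.")
for t in tokens:
    print(t)
```

The offsets point back into the original string, which is what makes highlighting possible, while the positions are what phrase queries compare.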

(2) Chinese analysis plugin

Elasticsearch ships with many analyzers, but they are generally unsuitable for Chinese text. For example:

POST _analyze
{
  "analyzer": "standard",
  "text": "士兵突击特别篇报道"
}

Result:

{
  "tokens" : [
    {
      "token" : "士",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "兵",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "突",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "击",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "特",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "别",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    },
    {
      "token" : "篇",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<IDEOGRAPHIC>",
      "position" : 6
    },
    {
      "token" : "报",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "<IDEOGRAPHIC>",
      "position" : 7
    },
    {
      "token" : "道",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "<IDEOGRAPHIC>",
      "position" : 8
    }
  ]
}

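Splitting the text into single characters is unhelpful and makes searches less effective, so we bring in an analysis plugin, the IK analyzer (https://github.com/medcl/elasticsearch-analysis-ik/releases). Download the release that matches your Elasticsearch version, unzip it into the plugins directory under the Elasticsearch installation, and restart Elasticsearch. Then analyze the same Chinese text with ik_smart: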

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "士兵突击特别篇报道"
}

Result:

{
  "tokens" : [
    {
      "token" : "士兵",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "突击",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "特别篇",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "报道",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}

There is also ik_max_word, which produces the finest-grained segmentation:

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "士兵突击特别篇报道"
}

Result:

{
  "tokens" : [
    {
      "token" : "士兵",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "突击",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "特别篇",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "特别",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "篇",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "报道",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 5
    }
  ]
}
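
To make the difference between the two modes concrete, here is a minimal sketch of dictionary-based segmentation. It assumes a tiny hand-written dictionary and two simple strategies: forward maximum matching for the coarse mode, and "emit every dictionary word at every position" for the fine mode. All names are hypothetical and this is not IK's actual algorithm; for instance, the fine mode here does not emit the leftover single character 篇 that ik_max_word returns above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy dictionary segmenter; the dictionary and both strategies are illustrative only. */
class DictSegmentSketch {

    // Assumed dictionary, chosen to cover the example sentence.
    static final Set<String> DICT = new HashSet<>(
            Arrays.asList("士兵", "突击", "特别", "特别篇", "报道"));
    static final int MAX_WORD_LEN = 3;

    /** Coarse mode: at each position take the longest dictionary word, then jump past it. */
    static List<String> coarse(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int matched = 1; // fall back to a single character when no word matches
            for (int len = Math.min(MAX_WORD_LEN, text.length() - i); len > 1; len--) {
                if (DICT.contains(text.substring(i, i + len))) {
                    matched = len;
                    break;
                }
            }
            out.add(text.substring(i, i + matched));
            i += matched;
        }
        return out;
    }

    /** Fine mode: emit every dictionary word found at every position, longest first. */
    static List<String> fine(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int len = Math.min(MAX_WORD_LEN, text.length() - i); len > 1; len--) {
                String candidate = text.substring(i, i + len);
                if (DICT.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "士兵突击特别篇报道";
        System.out.println("coarse: " + coarse(text));
        System.out.println("fine:   " + fine(text));
    }
}
```

The coarse mode yields fewer, longer terms, which suits query-time analysis; the fine mode yields overlapping terms, which can improve recall when used at index time.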

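（3）Customizing the analyzer

If the built-in dictionaries still do not meet your needs, the IK plugin also supports custom dictionaries. In its configuration file (IKAnalyzer.cfg.xml) you can register local extension dictionaries or point the plugin at a dictionary hosted on another server: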

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- Configure your own extension dictionaries here -->
	<entry key="ext_dict"></entry>
	<!-- Configure your own extension stop-word dictionaries here -->
	<entry key="ext_stopwords"></entry>
	<!-- Configure a remote extension dictionary here -->
	<entry key="remote_ext_dict">http://192.168.56.10/es/fenci.txt</entry>
	<!-- Configure a remote extension stop-word dictionary here -->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

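After changing the configuration, restart Elasticsearch (the container, if it runs in one); otherwise the change does not take effect.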
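
The DOCTYPE in the configuration above is the standard Java XML properties format, so the file can be read with java.util.Properties. The following minimal sketch parses an inline copy of the configuration and prints the keys; the class name is hypothetical and the remote URL is just the example value from this article.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Reads an IK-style configuration with the JDK's XML properties loader. */
class IkConfigSketch {

    // Inline copy of the configuration; comments shortened for the example.
    static final String CONFIG_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
          + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
          + "<properties>\n"
          + "  <comment>IK Analyzer extension configuration</comment>\n"
          + "  <entry key=\"ext_dict\"></entry>\n"
          + "  <entry key=\"ext_stopwords\"></entry>\n"
          + "  <entry key=\"remote_ext_dict\">http://192.168.56.10/es/fenci.txt</entry>\n"
          + "  <!-- <entry key=\"remote_ext_stopwords\">words_location</entry> -->\n"
          + "</properties>\n";

    /** Parses the XML properties document; commented-out entries are simply absent. */
    static Properties parse(String xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(CONFIG_XML);
        System.out.println("ext_dict             = '" + props.getProperty("ext_dict") + "'");
        System.out.println("remote_ext_dict      = " + props.getProperty("remote_ext_dict"));
        System.out.println("remote_ext_stopwords = " + props.getProperty("remote_ext_stopwords"));
    }
}
```

An empty entry is read as an empty string, while an entry that is commented out is not present at all, which is a quick way to check which dictionaries are actually configured.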

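9、Integrating Elasticsearch with Spring Boot

After all that groundwork, it is finally time for the main course.
（1）Create a new Maven project and click Next
（2）Name the project and click Finish

Again we follow the official documentation: https://www.elastic.co/guide/index.html
Click Elasticsearch Clients, then Java REST Client, to find the official Maven dependency for the high-level REST client.

（3）Complete pom dependencies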

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.2.7.RELEASE</version>
	</parent>
	<groupId>com.example</groupId>
	<artifactId>elasticsearch-demo</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>demo</name>
	<properties>
		<java.version>1.8</java.version>
		<elasticsearch.version>7.14.2</elasticsearch.version>
	</properties>
	<dependencies>
		<!-- web-->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
		<!-- test-->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
		<!-- elasticsearch-->
		<dependency>
			<groupId>org.elasticsearch.client</groupId>
			<artifactId>elasticsearch-rest-high-level-client</artifactId>
			<version>7.14.2</version>
		</dependency>
		<!-- lombok-->
		<dependency>
			<groupId>org.projectlombok</groupId>
			<artifactId>lombok</artifactId>
		</dependency>
		<!-- fastjson-->
		<dependency>
			<groupId>com.alibaba</groupId>
			<artifactId>fastjson</artifactId>
			<version>1.2.60</version>
		</dependency>
	</dependencies>
</project>

（4）Flesh out the project
Create the startup class Application.java

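import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Application {
	public static void main(String[] args) {
		SpringApplication.run(Application.class, args);
	}
}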

Create the test class DemoApplicationTests.java

@RunWith(SpringRunner.class)
@SpringBootTest(classes = Application.class)
class DemoApplicationTests {

}

（5）Create the Elasticsearch configuration class ESConfig.java

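import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ESConfig {
    // Exposes a single client bean that talks to the local node over HTTP
    @Bean
    public RestHighLevelClient esRestClient(){
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")));
        return client;
    }
}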

（6）Project structure
[Screenshot of the project structure]

（1）Saving a document from a Java Map

Synchronous execution example:

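import java.io.IOException;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.junit.jupiter.api.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@SpringBootTest(classes = Application.class)
class DemoApplicationTests {
	@Autowired
	ESConfig esConfig;
	@Test
	void indexDoc01() throws IOException {
		// Create a Map to hold the document fields
		Map<String, Object> jsonMap = new HashMap<>();
		// Fill in the data
		jsonMap.put("user", "kimchy");
		jsonMap.put("postDate", new Date());
		jsonMap.put("message", "trying out Elasticsearch");
		IndexRequest indexRequest = new IndexRequest("posts") // target index
				.id("1").source(jsonMap); // document id and source
		System.out.println(indexRequest.toString()); // print the index request
		IndexResponse index = esConfig.esRestClient().index(indexRequest, RequestOptions.DEFAULT); // index the document and get the response
		System.out.println(index.toString()); // print the index response
	}
}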

Output:

index {[posts][_doc][1], source[{"postDate":"2021-09-24T06:57:46.664Z","message":"trying out Elasticsearch","user":"kimchy"}]}
IndexResponse[index=posts,type=_doc,id=1,version=4,result=updated,seqNo=3,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]
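
The response above reports version=4 and result=updated, presumably because the test had already been run several times against the same id: indexing a document under an existing id overwrites it and increments its version. A minimal in-memory sketch of that behaviour follows; the class and its methods are hypothetical and only model the version counter, not the real client or server.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of index-by-id semantics: the first write creates, later writes update. */
class IndexByIdSketch {

    /** Mirrors the two response fields of interest: result and version. */
    static final class Result {
        final String result; // "created" or "updated"
        final long version;

        Result(String result, long version) {
            this.result = result;
            this.version = version;
        }

        @Override
        public String toString() {
            return "result=" + result + ",version=" + version;
        }
    }

    // Current version per document id
    private final Map<String, Long> versions = new HashMap<>();

    /** Stores (or overwrites) the document with this id and bumps its version by one. */
    Result index(String id) {
        long version = versions.merge(id, 1L, Long::sum);
        return new Result(version == 1 ? "created" : "updated", version);
    }

    public static void main(String[] args) {
        IndexByIdSketch posts = new IndexByIdSketch();
        for (int run = 1; run <= 4; run++) {
            System.out.println("run " + run + ": " + posts.index("1"));
        }
    }
}
```

Running the same test method repeatedly therefore never produces duplicates; there is still exactly one document with id 1, as the search below confirms.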

Check the saved data in Kibana:

GET posts/_search

Result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "posts",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "postDate" : "2021-09-24T09:52:40.490Z",
          "message" : "trying out Elasticsearch",
          "user" : "kimchy"
        }
      }
    ]
  }
}

（2）Saving a document from a Java bean

How do we store a Java bean in Elasticsearch? Suppose there is an employee entity that embeds a department entity.
Add two entity classes:
DepartmentDao.java

@Data
public class DepartmentDao {
    private Long id;
    private String departName;
}

UserDao.java

@Data
public class UserDao {
    private Long id;
    private String name;
    private int age;
    private String email;
    private DepartmentDao dept;
}

Write the test method:

	@Test
	void indexDoc02() throws IOException {
		// Build the nested department bean
		DepartmentDao department = new DepartmentDao();
		department.setDepartName("人事部");
		department.setId(1L);
		// Build the user bean that embeds the department
		UserDao user = new UserDao();
		user.setDept(department);
		user.setAge(18);
		user.setEmail("12345678@qq.com");
		user.setName("张三");
		user.setId(1L);
		// Serialize the bean to JSON with fastjson and use it as the document source
		String jsonString = JSON.toJSONString(user);
		IndexRequest indexRequest = new IndexRequest("users")
				.id("1").source(jsonString, XContentType.JSON);
		System.out.println(indexRequest.toString()); // print the index request
		IndexResponse index = esConfig.esRestClient().index(indexRequest, RequestOptions.DEFAULT);
		System.out.println(index.toString()); // print the index response
	}

Output:

index {[users][_doc][1], source[{"age":18,"dept":{"departName":"人事部","id":1},"email":"12345678@qq.com","id":1,"name":"张三"}]}
IndexResponse[index=users,type=_doc,id=1,version=1,result=updated,seqNo=1,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]

Check the saved data in Kibana:

GET users/_search

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "users",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "age" : 18,
          "dept" : {
            "departName" : "人事部",
            "id" : 1
          },
          "email" : "12345678@qq.com",
          "id" : 1,
          "name" : "张三"
        }
      }
    ]
  }
}

（3）Searching indexed documents from Java

	@Test
	void find01() throws IOException {
		// Create the search request
		SearchRequest searchRequest = new SearchRequest();
		searchRequest.indices("bank");
		SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
		// Building blocks available for the search conditions
//        sourceBuilder.query(); // query
//        sourceBuilder.from(); // starting offset
//        sourceBuilder.size(); // number of hits to return
//        sourceBuilder.aggregation(); // aggregation
		// Build a terms aggregation
		TermsAggregationBuilder agg1 = AggregationBuilders.terms("ageAgg").field("age").size(10);// aggregation name, field, and maximum number of buckets
		// aggregation() takes an AggregationBuilder
		sourceBuilder.aggregation(agg1);
		sourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
		System.out.println(sourceBuilder.toString());
		searchRequest.source(sourceBuilder);
		// Execute the search
		SearchResponse response = esConfig.esRestClient().search(searchRequest, RequestOptions.DEFAULT);
		// Inspect the response
		System.out.println(response.toString());
	}

Output:

{"query":{"match":{"address":{"query":"mill","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},"aggregations":{"ageAgg":{"terms":{"field":"age","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}}}}
{"took":2,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":4,"relation":"eq"},"max_score":5.4032025,"hits":[{"_index":"bank","_type":"account","_id":"970","_score":5.4032025,"_source":{"account_number":970,"balance":19648,"firstname":"Forbes","lastname":"Wallace","age":28,"gender":"M","address":"990 Mill Road","employer":"Pheast","email":"forbeswallace@pheast.com","city":"Lopezo","state":"AK"}},{"_index":"bank","_type":"account","_id":"136","_score":5.4032025,"_source":{"account_number":136,"balance":45801,"firstname":"Winnie","lastname":"Holland","age":38,"gender":"M","address":"198 Mill Lane","employer":"Neteria","email":"winnieholland@neteria.com","city":"Urie","state":"IL"}},{"_index":"bank","_type":"account","_id":"345","_score":5.4032025,"_source":{"account_number":345,"balance":9812,"firstname":"Parker","lastname":"Hines","age":38,"gender":"M","address":"715 Mill Avenue","employer":"Baluba","email":"parkerhines@baluba.com","city":"Blackgum","state":"KY"}},{"_index":"bank","_type":"account","_id":"472","_score":5.4032025,"_source":{"account_number":472,"balance":25571,"firstname":"Lee","lastname":"Long","age":32,"gender":"F","address":"288 Mill Street","employer":"Comverges","email":"leelong@comverges.com","city":"Movico","state":"MT"}}]},"aggregations":{"lterms#ageAgg":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":38,"doc_count":2},{"key":28,"doc_count":1},{"key":32,"doc_count":1}]}}}

Formatted, the response looks like this:

{
    "took":2,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":{
            "value":4,
            "relation":"eq"
        },
        "max_score":5.4032025,
        "hits":[
            {
                "_index":"bank",
                "_type":"account",
                "_id":"970",
                "_score":5.4032025,
                "_source":{
                    "account_number":970,
                    "balance":19648,
                    "firstname":"Forbes",
                    "lastname":"Wallace",
                    "age":28,
                    "gender":"M",
                    "address":"990 Mill Road",
                    "employer":"Pheast",
                    "email":"forbeswallace@pheast.com",
                    "city":"Lopezo",
                    "state":"AK"
                }
            },
            {
                "_index":"bank",
                "_type":"account",
                "_id":"136",
                "_score":5.4032025,
                "_source":{
                    "account_number":136,
                    "balance":45801,
                    "firstname":"Winnie",
                    "lastname":"Holland",
                    "age":38,
                    "gender":"M",
                    "address":"198 Mill Lane",
                    "employer":"Neteria",
                    "email":"winnieholland@neteria.com",
                    "city":"Urie",
                    "state":"IL"
                }
            },
            {
                "_index":"bank",
                "_type":"account",
                "_id":"345",
                "_score":5.4032025,
                "_source":{
                    "account_number":345,
                    "balance":9812,
                    "firstname":"Parker",
                    "lastname":"Hines",
                    "age":38,
                    "gender":"M",
                    "address":"715 Mill Avenue",
                    "employer":"Baluba",
                    "email":"parkerhines@baluba.com",
                    "city":"Blackgum",
                    "state":"KY"
                }
            },
            {
                "_index":"bank",
                "_type":"account",
                "_id":"472",
                "_score":5.4032025,
                "_source":{
                    "account_number":472,
                    "balance":25571,
                    "firstname":"Lee",
                    "lastname":"Long",
                    "age":32,
                    "gender":"F",
                    "address":"288 Mill Street",
                    "employer":"Comverges",
                    "email":"leelong@comverges.com",
                    "city":"Movico",
                    "state":"MT"
                }
            }
        ]
    },
    "aggregations":{
        "lterms#ageAgg":{
            "doc_count_error_upper_bound":0,
            "sum_other_doc_count":0,
            "buckets":[
                {
                    "key":38,
                    "doc_count":2
                },
                {
                    "key":28,
                    "doc_count":1
                },
                {
                    "key":32,
                    "doc_count":1
                }
            ]
        }
    }
}
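
The ageAgg buckets above can be reproduced by hand. The generated request orders buckets by _count descending and then _key ascending, and the four matching accounts have ages 28, 38, 38 and 32. A minimal sketch of that grouping and ordering in plain Java follows; the class and method names are hypothetical and it only models what the terms aggregation computes, not how Elasticsearch computes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy terms aggregation over a list of integer field values. */
class TermsAggSketch {

    /** Returns up to `size` buckets as {key, docCount}, ordered by count desc, then key asc. */
    static List<long[]> termsAgg(List<Integer> values, int size) {
        // Count how many documents carry each distinct value
        Map<Integer, Long> counts = new TreeMap<>();
        for (int value : values) {
            counts.merge(value, 1L, Long::sum);
        }
        List<long[]> buckets = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            buckets.add(new long[] {e.getKey(), e.getValue()});
        }
        // Same ordering as the request: _count desc, then _key asc
        buckets.sort(Comparator.<long[]>comparingLong(b -> -b[1]).thenComparingLong(b -> b[0]));
        return new ArrayList<>(buckets.subList(0, Math.min(size, buckets.size())));
    }

    public static void main(String[] args) {
        // Ages of the four hits, in the order they were returned
        List<Integer> ages = Arrays.asList(28, 38, 38, 32);
        for (long[] bucket : termsAgg(ages, 10)) {
            System.out.println("key=" + bucket[0] + ", doc_count=" + bucket[1]);
        }
    }
}
```

The size argument caps the number of buckets, not the number of documents, which is why size(10) still returns only three buckets here.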

How do we turn the returned hits into Java beans?
Create an entity class:

@Data
public class Account {
    private int accountNumber;
    private int balance;
    private String firstname;
    private String lastname;
    private int age;
    private String gender;
    private String address;
    private String employer;
    private String email;
    private String city;
    private String state;
}

Append the following to the end of the test method:

		// Convert each hit into a Java bean
		SearchHits hits = response.getHits();
		SearchHit[] hits1 = hits.getHits();
		for (SearchHit hit : hits1) {
			hit.getId(); // document id, available if needed
			hit.getIndex(); // index name, available if needed
			String sourceAsString = hit.getSourceAsString();
			Account account = JSON.parseObject(sourceAsString, Account.class);
			System.out.println(account);
		}

Output:

Account(accountNumber=970, balance=19648, firstname=Forbes, lastname=Wallace, age=28, gender=M, address=990 Mill Road, employer=Pheast, email=forbeswallace@pheast.com, city=Lopezo, state=AK)
Account(accountNumber=136, balance=45801, firstname=Winnie, lastname=Holland, age=38, gender=M, address=198 Mill Lane, employer=Neteria, email=winnieholland@neteria.com, city=Urie, state=IL)
Account(accountNumber=345, balance=9812, firstname=Parker, lastname=Hines, age=38, gender=M, address=715 Mill Avenue, employer=Baluba, email=parkerhines@baluba.com, city=Blackgum, state=KY)
Account(accountNumber=472, balance=25571, firstname=Lee, lastname=Long, age=32, gender=F, address=288 Mill Street, employer=Comverges, email=leelong@comverges.com, city=Movico, state=MT)
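
Note that the document source uses account_number while the bean field is accountNumber, yet the output above shows the value was populated. fastjson's default field matching is lenient about underscores and letter case. The following minimal sketch shows the kind of name normalization involved; the class and method are hypothetical and this is not fastjson's actual implementation.

```java
/** Toy snake_case to camelCase conversion, as a stand-in for lenient field matching. */
class FieldNameSketch {

    /** Drops each underscore and upper-cases the character that follows it. */
    static String toCamelCase(String snake) {
        StringBuilder sb = new StringBuilder(snake.length());
        boolean upperNext = false;
        for (char c : snake.toCharArray()) {
            if (c == '_') {
                upperNext = true;
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("account_number"));
        System.out.println(toCamelCase("firstname"));
    }
}
```

If a JSON library without this leniency is used instead, the field would need an explicit name mapping, otherwise accountNumber would stay at its default value of 0.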
