Elasticsearch inverted index, Kibana, common ES retrieval commands, advanced ES search, ES aggregations, and integrating Elasticsearch with Spring Boot

This article follows the official Elasticsearch 7.5 documentation; for other versions, consult the official documentation:
https://www.elastic.co/guide/index.html

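1. What is Elasticsearch?

Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack. Logstash and Beats help collect, aggregate, and enrich data and store it in Elasticsearch. Elasticsearch provides near real-time search and analytics for all types of data. Whether the data is structured or unstructured text, numerical data, or geospatial data, Elasticsearch can store and index it efficiently in a way that supports fast searches. You can go beyond simple data retrieval and aggregate information to discover trends and patterns in the data. As data and query volume grow, the distributed nature of Elasticsearch lets your deployment grow seamlessly along with them.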

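Elasticsearch handles data quickly and flexibly across a wide range of use cases:

  • Adding search to an application or website
  • Storing and analyzing logs, metrics, and security event data
  • Using machine learning to model data automatically in real time
  • Automating business workflows with Elasticsearch as the storage engine
  • Managing, integrating, and analyzing spatial information as a geographic information system (GIS)
  • Storing and processing genetic data as a bioinformatics research tool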

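Elasticsearch is a distributed document store. Instead of storing information as rows of columnar data, it stores documents serialized as JSON. When a cluster has multiple Elasticsearch nodes, the stored documents are distributed across the cluster and can be accessed quickly from any node.

When a document is stored, it is indexed and becomes fully searchable in near real time, within about one second. Elasticsearch uses a data structure called an inverted index, which supports very fast full-text search. An inverted index lists every unique word that appears in any document and identifies all the documents each word occurs in. An index can be thought of as an optimized collection of documents; each document is a collection of fields, and each field is a key-value pair that holds data.

Elasticsearch indexes all the data in every field, and different kinds of indexed fields use different data structures: for example, text is stored in inverted indices, while numeric and geo fields are stored in BKD trees. These data structures are a large part of why Elasticsearch searches so quickly.

Elasticsearch is built on Apache Lucene. It wraps Lucene behind a simple REST API that supports structured queries, full-text queries, and complex queries that combine the two. Lucene can do full-text search mainly because it implements the inverted index.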

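(1) Inverted index

An inverted index is still an index, so its job is to find the required data quickly. An analyzer splits each document into individual terms, the distinct terms are collected into a sorted list, and each term is tagged with the documents it appears in. For example, take the following documents:

  1. 士兵突击
  2. 士兵突击特别篇
  3. 士兵侦察
  4. 士兵突击特别篇报道

Term                         Documents
士兵 (soldier)               1, 2, 3, 4
突击 (assault)               1, 2, 4
特别篇 (special edition)     2, 4
侦察 (reconnaissance)        3
报道 (report)                4

When we search for “士兵突击”, the query text is first split into terms, each term is looked up in the list, and the documents recorded for those terms are retrieved. If we search for “士兵特别篇”, nothing corresponds exactly to the whole query text, so the documents that match its terms are returned in descending order of relevance score.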

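This structure consists of a list of all the distinct terms in the documents, with a list of documents associated with each term. A structure like this, which locates records by attribute value, is an inverted index. A file that carries an inverted index is called an inverted file.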
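
The term list above can be reproduced with a few lines of Python. This is only a minimal sketch: the documents are tokenized by hand (a real analyzer would do that step), and the scoring in the hypothetical `search` helper simply counts matching terms, which is far simpler than what Lucene actually computes.

```python
from collections import defaultdict

# Hand-tokenized versions of the four example documents (an assumption:
# in Elasticsearch an analyzer would produce these terms).
docs = {
    1: ["士兵", "突击"],
    2: ["士兵", "突击", "特别篇"],
    3: ["士兵", "侦察"],
    4: ["士兵", "突击", "特别篇", "报道"],
}

def build_inverted_index(docs):
    """Map each distinct term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query_terms):
    """Rank documents by the number of query terms they contain (toy score)."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id in index.get(term, []):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

index = build_inverted_index(docs)
print(index["突击"])                      # [1, 2, 4]
print(search(index, ["士兵", "特别篇"]))   # [(2, 2), (4, 2), (1, 1), (3, 1)]
```

Documents 2 and 4 contain both query terms and therefore rank ahead of documents 1 and 3, which contain only one.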

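2. Basic concepts

  • Index (noun)

Similar to a database in a traditional relational database; it is where documents are stored.

  • Index (verb)

To index a document is to store it in an index (noun).

  • Document

The primary data entity in Elasticsearch is called a document.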

3. Installing Elasticsearch and Kibana

https://www.elastic.co/start
After downloading, start Kibana (http://127.0.0.1:5601) and Elasticsearch (http://127.0.0.1:9200) from their bin directories.

Kibana is a visualization front end for Elasticsearch. Open its console to send some requests: Kibana => Management => Dev Tools => Console.

4. Common retrieval commands

(1) The _cat APIs

  • GET /_cat/nodes lists all nodes

  • GET /_cat/health shows the cluster health

  • GET /_cat/master shows the master node

  • GET /_cat/indices lists all indices

For example:

GET _cat/nodes

Output:

127.0.0.1 19 29 2    cdfhilmrstw * P3951098A244

(2) Indexing (saving) documents

Indexing and updating with PUT:

Add a document with id 1 and type user to the index test_info:

PUT /test_info/user/1
{
  "name": "wang"
}

Output:

{
    "_index": "test_info",
    "_type": "user",
    "_id": "1",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 1,
    "_primary_term": 1
}

result is created, so this request indexed a new document. Sending the same request again returns:

{
    "_index": "test_info",
    "_type": "user",
    "_id": "1",
    "_version": 2,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 2,
    "_primary_term": 1
}

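result has changed to updated, and both the version (_version) and the sequence number (_seq_no) are now 2, so this time the document was updated. If no document with the given id exists, the request indexes a new document; otherwise it updates the existing one.

Indexing and updating with POST: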

POST /test_info/user
{
  "name": "zhang"
}

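POST may omit the id, whereas PUT requires one. Without an id, POST generates one automatically and indexes the document; with an id, POST behaves the same as PUT.

(3) Retrieving a document

Get the document with id 1 and type user from the index test_info:

GET /test_info/user/1

Result: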

{
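  "_index" : "test_info", // index the document belongs to
  "_type" : "user",  // document type
  "_id" : "1", // document id
  "_version" : 2, // version number
  "_seq_no" : 2, // concurrency control: incremented on every update, used for optimistic locking
  "_primary_term" : 1, // same purpose; changes when the primary shard is reassigned, e.g. after a restart
  "found" : true,
  "_source" : { // document content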
    "name" : "wang"
  }
}

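(4) Updating

Besides PUT and POST as used above, a document can also be updated through the _update endpoint.
Update the document with id 1 and type user in the index test_info: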

POST test_info/user/1/_update
{
  "doc": {
    "name": "li"
  }
}

Result:

{
  "_index" : "test_info",
  "_type" : "user",
  "_id" : "1",
  "_version" : 3,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}

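_update compares the submitted content with the indexed document and does nothing if they are identical, whereas PUT or POST with an id but without _update overwrites the document without comparing. Sending the same _update request again returns: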

{
  "_index" : "test_info",
  "_type" : "user",
  "_id" : "1",
  "_version" : 3,
  "result" : "noop", // no operation
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}
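
The no-op behaviour can be pictured with a small in-memory model. This is a sketch under simplifying assumptions, not Elasticsearch's implementation: the hypothetical `apply_partial_update` helper merges only top-level keys and ignores versions and sequence numbers.

```python
def apply_partial_update(stored, partial):
    """Merge `partial` into `stored`; report "noop" when nothing would change."""
    merged = {**stored, **partial}  # shallow merge of top-level keys only
    if merged == stored:
        return stored, "noop"
    return merged, "updated"

doc = {"name": "wang"}
doc, first = apply_partial_update(doc, {"name": "li"})
print(first)   # updated
doc, second = apply_partial_update(doc, {"name": "li"})
print(second)  # noop
```

The second call submits the same content again, so the merged document equals the stored one and nothing is written.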

(5) Deleting

Delete a document:

DELETE test_info/user/1

Delete an index:

DELETE test_info

(6) Bulk import

Syntax (every two lines form one unit):

{action:{metadata}}
{request body  }

{action:{metadata}}
{request body  }

For example, index two documents into the index test_info with type user:

POST /test_info/user/_bulk
{"index":{"_id":1}}
{"name":"Join"}
{"index":{"_id":2}}
{"name":"Doe"}

Result:

{
  "took" : 929, // time taken, in milliseconds
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "test_info",
        "_type" : "user",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "test_info",
        "_type" : "user",
        "_id" : "2",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}

A more complex bulk request that mixes several actions:

POST /_bulk
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"my first blog post"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"my second blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"my updated blog post"}}

Result:

{
  "took" : 1425,
  "errors" : false,
  "items" : [
    {
      "delete" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 1,
        "result" : "not_found", // the document to delete was not found
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 404
      }
    },
    {
      "create" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 2,
        "result" : "created", // indexed successfully
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "JqzkC3wBHvnj4b2Hv2wn",
        "_version" : 1,
        "result" : "created", // indexed successfully
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "update" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 3,
        "result" : "updated", // updated successfully
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 3,
        "_primary_term" : 1,
        "status" : 200
      }
    }
  ]
}
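
When a bulk request is assembled in application code rather than typed into the console, the body has to be newline-delimited JSON in the two-line format described in this section. The following is a minimal sketch using only the standard library; `build_bulk_body` is a hypothetical helper name, and sending the body over HTTP is left out.

```python
import json

def build_bulk_body(actions):
    """Serialize (action, metadata, source) triples into an NDJSON bulk body.

    `source` is None for actions such as delete that carry no request body.
    """
    lines = []
    for action, metadata, source in actions:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk API requires the last line to end with a newline as well.
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ("index", {"_id": 1}, {"name": "Join"}),
    ("index", {"_id": 2}, {"name": "Doe"}),
    ("delete", {"_id": 3}, None),
])
print(body)
```

Each action line is followed by its source line when one exists, so a delete contributes a single line while index, create, and update contribute two.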

5. Advanced search

Test data: https://blog.csdn.net/projectNo/article/details/120414848
Copy the data into the console and insert it with a bulk request.

List the indices:

GET _cat/indices
yellow open bank                            T0LimmutSouXMeoqOSrQ1g 1 1 1000     0 372.6kb 372.6kb66.7kb

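(1) Retrieving documents

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/getting-started-search.html

(2) search

There are two ways to run a search:

1. Send the search parameters in the REST request URI (URI + query parameters):

GET bank/_search?q=*&sort=account_number:asc
  • q=*: match all documents
  • sort: the field to sort by
  • asc: ascending order

2. Send the search parameters in the REST request body (URI + request body):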

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}

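The response contains the following fields:

Field               Meaning
took                time the search took, in milliseconds
timed_out           whether the search timed out
_shards             how many shards were searched, and how many of them succeeded or failed
hits.max_score      the highest relevance score among the matching documents
hits.total.value    how many matching documents were found
hits.sort           the sort key of each result (absent when sorting by score)
hits._score         the relevance score of each document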

(3) from and size

By default only the first 10 documents are returned. To page through the results, specify the from and size parameters in the request:

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}

from and size work like the offset and row count of LIMIT in MySQL.
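
Applications usually think in page numbers rather than offsets. A minimal sketch of the conversion, where `page_to_from_size` is a hypothetical helper and pages are assumed to be numbered from 1:

```python
def page_to_from_size(page, page_size=10):
    """Translate a 1-based page number into the from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}

# Page 2 with 10 hits per page corresponds to the request above.
print(page_to_from_size(2))  # {'from': 10, 'size': 10}
```

Note that from + size is capped by the index.max_result_window setting (10000 by default), so this style of paging is not suited to very deep result sets.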

(4) _source

Return only some of the fields:

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["balance","firstname"]  
}

Result:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "firstname" : "Amber",
          "balance" : 39225
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "6",
        "_score" : 1.0,
        "_source" : {
          "firstname" : "Hattie",
          "balance" : 5686
        }
      },
      ...... (remaining hits omitted)
    ]
  }
}

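(5) query

1) match
The "query": { "match_all": {} } used above returns every document. For more specific matching use match: on a non-string field it matches the value exactly, and on a string field it performs a full-text search.
For example, find documents whose address contains mill or lane: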

GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}

A full-text search splits the search text into terms, matches on each term, and sorts the results by score:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 19,
      "relation" : "eq"
    },
    "max_score" : 9.507477,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "136",
        "_score" : 9.507477,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road",
          "employer" : "Pheast",
          "email" : "forbeswallace@pheast.com",
          "city" : "Lopezo",
          "state" : "AK"
        }
      },
      ...... (remaining hits omitted)
    ]
  }
}

2) match_phrase
To match the whole phrase, with its terms adjacent and in the same order, rather than any individual term, use match_phrase:

GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}

Only one document matches, with a relevance score of 9.507477; its address is 198 Mill Lane. Documents whose address contains only mill or only lane are not matched:

{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 9.507477,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "136",
        "_score" : 9.507477,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      }
    ]
  }
}
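
The difference between the two queries can be sketched in plain Python. This is a simplified model under stated assumptions: `tokenize` merely lowercases and splits on whitespace instead of running a real analyzer, and no relevance score is computed.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return text.lower().split()

def matches(field_value, query):
    """match-style: true if the field contains at least one query term."""
    field_terms = set(tokenize(field_value))
    return any(term in field_terms for term in tokenize(query))

def matches_phrase(field_value, query):
    """match_phrase-style: the query terms must appear consecutively, in order."""
    field_terms = tokenize(field_value)
    query_terms = tokenize(query)
    n = len(query_terms)
    return any(field_terms[i:i + n] == query_terms
               for i in range(len(field_terms) - n + 1))

addresses = ["198 Mill Lane", "990 Mill Road", "192 Ide Court"]
print([a for a in addresses if matches(a, "mill lane")])
# ['198 Mill Lane', '990 Mill Road']
print([a for a in addresses if matches_phrase(a, "mill lane")])
# ['198 Mill Lane']
```

"990 Mill Road" satisfies the match-style check because it contains mill, but fails the phrase check because lane does not follow it.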

3) match + keyword

GET bank/_search
{
  "query": {
    "match": {
      "address.keyword": "990 Mill" 
    }
  }
}

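No document is returned. When a text field is queried through its keyword sub-field, the search value must equal the entire field value; in other words, it is an exact match.
4) bool
To build more complex queries, use a bool query to combine multiple conditions. Each condition can be declared as required (must), desirable (should), or excluded (must_not).
Find documents whose age is 40 and whose state is not ID: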

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 43,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "474",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 474,
          "balance" : 35896,
          "firstname" : "Obrien",
          "lastname" : "Walton",
          "age" : 40,
          "gender" : "F",
          "address" : "192 Ide Court",
          "employer" : "Suremax",
          "email" : "obrienwalton@suremax.com",
          "city" : "Crucible",
          "state" : "UT"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "479",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 479,
          "balance" : 31865,
          "firstname" : "Cameron",
          "lastname" : "Ross",
          "age" : 40,
          "gender" : "M",
          "address" : "904 Bouck Court",
          "employer" : "Telpod",
          "email" : "cameronross@telpod.com",
          "city" : "Nord",
          "state" : "MO"
        }
      },
      ...... (remaining hits omitted)
    ]
  }
}

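must and should clauses contribute to the relevance score: the higher the score, the better the document matches the search criteria, and by default Elasticsearch returns documents ordered by score from highest to lowest. The conditions in a must_not clause are treated as filters: they affect whether a document is included in the results, but not how it is scored.
5) bool/filter
Return documents whose balance is between 1000 and 2000 (inclusive):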

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 1000,
            "lte": 2000
          }
        }
      }
    }
  }
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 19,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "87",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 87,
          "balance" : 1133,
          "firstname" : "Hewitt",
          "lastname" : "Kidd",
          "age" : 22,
          "gender" : "M",
          "address" : "446 Halleck Street",
          "employer" : "Isologics",
          "email" : "hewittkidd@isologics.com",
          "city" : "Coalmont",
          "state" : "ME"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "417",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 417,
          "balance" : 1788,
          "firstname" : "Wheeler",
          "lastname" : "Ayers",
          "age" : 35,
          "gender" : "F",
          "address" : "677 Hope Street",
          "employer" : "Fortean",
          "email" : "wheelerayers@fortean.com",
          "city" : "Ironton",
          "state" : "PA"
        }
      },
      ...... (remaining hits omitted)
    ]
  }
}

A filter does not affect the relevance score, but it does restrict the results.
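
The roles of the clause types can be mimicked with predicates over plain dictionaries. This is a toy model, not how Elasticsearch evaluates queries: the hypothetical `bool_search` function uses a made-up scoring rule (1.0 per must clause), and should clauses are left out for brevity.

```python
def bool_search(docs, must=(), filters=(), must_not=()):
    """Toy bool query in which every clause is a predicate over a document.

    Each must clause adds 1.0 to the score (a made-up rule); filters and
    must_not only decide whether a document is kept.
    """
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue
        if not all(clause(doc) for clause in filters):
            continue
        if any(clause(doc) for clause in must_not):
            continue
        hits.append({"_score": float(len(must)), "_source": doc})
    return hits

# A few accounts taken from the results shown in this section.
accounts = [
    {"account_number": 474, "age": 40, "balance": 35896, "state": "UT"},
    {"account_number": 87, "age": 22, "balance": 1133, "state": "ME"},
    {"account_number": 417, "age": 35, "balance": 1788, "state": "PA"},
    {"account_number": 970, "age": 28, "balance": 19648, "state": "AK"},
]

in_range = bool_search(
    accounts,
    filters=[lambda d: 1000 <= d["balance"] <= 2000],
    must_not=[lambda d: d["state"] == "ME"],
)
print([(h["_source"]["account_number"], h["_score"]) for h in in_range])
# [(417, 0.0)]
```

Accounts 87 and 417 pass the balance filter, account 87 is then excluded by must_not, and the surviving hit scores 0.0 because no scoring clause took part.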
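6) term
Like match, term searches on a field value, but match is recommended for full-text search, while term is meant for exact values in non-text fields such as age, salary, or dates: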

GET bank/_search
{
  "query": {
    "term": {
      "account_number": 970
    }
  }
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road",
          "employer" : "Pheast",
          "email" : "forbeswallace@pheast.com",
          "city" : "Lopezo",
          "state" : "AK"
        }
      }
    ]
  }
}

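6. Aggregations

The aggregations framework builds on search queries and provides the ability to group data and extract statistics from it in order to build complex summaries, similar to SQL GROUP BY and SQL aggregate functions.
In Elasticsearch, a single search can return the hits and, in the same response, aggregation results that are kept separate from the hits. This is powerful and efficient: you can run a query together with multiple aggregations and get the results of each of them back in one go, using one concise API call and avoiding extra network round trips.
Aggregation syntax: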

"aggregations" : {
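    "<aggregation_name>" : { <!-- name of the aggregation -->
        "<aggregation_type>" : { <!-- type of the aggregation -->
            <aggregation_body> <!-- body: which fields to aggregate on -->
        }
        [,"meta" : {  [<meta_data_body>] } ]? <!-- optional metadata -->
        [,"aggregations" : { [<sub_aggregation>]+ } ]? <!-- optional sub-aggregations defined inside this aggregation -->
    }
    [,"<aggregation_name_2>" : { ... } ]* <!-- further aggregations -->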
}

Aggregations fall into four families:

  • Metrics aggregations
  • Bucket aggregations
  • Matrix aggregations
  • Pipeline aggregations

(1) Metrics aggregations

Metrics aggregations compute metrics such as the maximum, minimum, sum, or average over a set of documents.
1) max, min, sum, and avg
Max aggregation:

GET bank/_search?size=0
{
  "aggs": { 
    "ageAgg": {  
      "max": {
        "field": "balance"
      }
    }
  }
}

size=0 suppresses the hits. Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "value" : 49989.0
    }
  }
}

2) value_count: counts the documents that have a value for the field

GET bank/_search?size=0
{
  "aggs": {
    "age_count": {
      "value_count": {
        "field": "age"
      }
    }
  }
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_count" : {
      "value" : 1000
    }
  }
}

3) cardinality: counts the distinct values

GET bank/_search?size=0
{
  "aggs": {
    "age_cardinality": {
      "cardinality": {
        "field": "age"
      }
    }
  }
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_cardinality" : {
      "value" : 21
    }
  }
}

4) stats: returns five values: count, min, max, avg, and sum

GET bank/_search?size=0
{
  "aggs": {
    "age_count": {
      "stats": {
        "field": "age"
      }
    }
  }
}

Result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_count" : {
      "count" : 1000,
      "min" : 20.0,
      "max" : 40.0,
      "avg" : 30.171,
      "sum" : 30171.0
    }
  }
}

5) percentiles: the values found at given percentiles

GET bank/_search?size=0
{
  "aggs": {
    "age_percentiles": {
      "percentiles": {
        "field": "age"
      }
    }
  }
}

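The values of the given field (or script) are ordered from smallest to largest, and for each requested percentile the aggregation reports the value below which that share of the documents falls. By default it returns the values at the percentiles [ 1, 5, 25, 50, 75, 95, 99 ]. For example, "50.0" : 31.0 means that 50% of the documents have an age of at most 31.
Result: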

{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_percentiles" : {
      "values" : {
        "1.0" : 20.0,
        "5.0" : 21.0,
        "25.0" : 25.0,
        "50.0" : 31.0,
        "75.0" : 35.0,
        "95.0" : 39.0,
        "99.0" : 40.0
      }
    }
  }
}

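6) percentile_ranks: the percentage of documents whose value is less than or equal to a given value
For example, compute the percentage of documents whose age is at most 30 and at most 35: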

GET bank/_search?size=0
{
  "aggs": {
    "aggs_perc_rank": {
      "percentile_ranks": {
        "field": "age",
        "values": [30, 35]
      }
    }
  }
}

Result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "aggs_perc_rank" : {
      "values" : {
        "30.0" : 49.0,
        "35.0" : 75.8
      }
    }
  }
}
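
What percentile_ranks reports can be written down exactly for a small list of values. The sketch below is an exact computation on a made-up sample, not the bank data; Elasticsearch itself computes percentiles approximately (with the TDigest algorithm by default), so its results on large data sets may differ slightly from an exact count.

```python
def percentile_rank(values, threshold):
    """Percentage of values that are less than or equal to `threshold`."""
    if not values:
        raise ValueError("values must not be empty")
    return 100.0 * sum(1 for v in values if v <= threshold) / len(values)

# Made-up sample ages, for illustration only.
ages = [20, 22, 25, 28, 30, 31, 35, 38, 40, 40]
print(percentile_rank(ages, 30))  # 50.0
print(percentile_rank(ages, 35))  # 70.0
```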

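For the other metrics aggregations, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/search-aggregations-metrics.html

(2) Bucket aggregations

Unlike metrics aggregations, bucket aggregations do not compute values over fields; instead they create buckets of documents. Each bucket is associated with a criterion (depending on the aggregation type) that determines whether a document in the current context "falls" into it. In other words, the buckets effectively define sets of documents.
In contrast to metrics aggregations, bucket aggregations can hold sub-aggregations, which are computed for each bucket created by their "parent" bucket aggregation.

(3) Sub-aggregations (computed on the buckets of a parent aggregation)

For example, among the accounts whose state is AK, compute not only the age distribution but also the average balance for each age: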

GET bank/_search?size=0
{
  "query": {
    "match": {
      "state": "AK"
    }
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 5
      },
      "aggs": {
        "balanceAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

The result is as follows; an average balance is computed for each bucket:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 22,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 12,
      "buckets" : [
        {
          "key" : 20,
          "doc_count" : 2,
          "balanceAvg" : {
            "value" : 41416.0
          }
        },
        {
          "key" : 26,
          "doc_count" : 2,
          "balanceAvg" : {
            "value" : 14901.5
          }
        },
        {
          "key" : 33,
          "doc_count" : 2,
          "balanceAvg" : {
            "value" : 32760.5
          }
        },
        {
          "key" : 36,
          "doc_count" : 2,
          "balanceAvg" : {
            "value" : 14936.0
          }
        },
        {
          "key" : 37,
          "doc_count" : 2,
          "balanceAvg" : {
            "value" : 16099.5
          }
        }
      ]
    }
  }
}
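
Conceptually, a terms aggregation with an avg sub-aggregation is a group-by followed by a per-group average. A minimal in-memory sketch, where `terms_with_avg` is a hypothetical helper and the sample accounts are made up:

```python
from collections import defaultdict

def terms_with_avg(docs, group_field, value_field, size=5):
    """Group docs by `group_field`, then average `value_field` per bucket.

    Buckets are ordered by doc_count descending (ties by key ascending)
    and truncated to `size`.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    buckets = [
        {"key": key, "doc_count": len(vals), "avg": sum(vals) / len(vals)}
        for key, vals in groups.items()
    ]
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets[:size]

# Made-up sample accounts, for illustration only.
accounts = [
    {"age": 20, "balance": 1000},
    {"age": 20, "balance": 3000},
    {"age": 26, "balance": 500},
]
print(terms_with_avg(accounts, "age", "balance"))
# [{'key': 20, 'doc_count': 2, 'avg': 2000.0}, {'key': 26, 'doc_count': 1, 'avg': 500.0}]
```

The size argument plays the same role as "size": 5 in the request above: only the largest buckets are returned.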

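(4) Complex sub-aggregations (nesting upon nesting)

Compute the age distribution and, for each age, the average balance of the accounts with gender M, the average balance of the accounts with gender F, and the overall average balance: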

GET bank/_search?size=0
{
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 5
      },
      "aggs": {
        "genderAgg": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "balanceAvg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        },
        "ageBalanceAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

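Note that a text field has to be aggregated through its .keyword sub-field, which holds the exact value; otherwise the request fails. Result: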

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 716,
      "buckets" : [
        {
          "key" : 31,
          "doc_count" : 61,
          "genderAgg" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "M",
                "doc_count" : 35,
                "balanceAvg" : {
                  "value" : 29565.628571428573
                }
              },
              {
                "key" : "F",
                "doc_count" : 26,
                "balanceAvg" : {
                  "value" : 26626.576923076922
                }
              }
            ]
          },
          "ageBalanceAvg" : {
            "value" : 28312.918032786885
          }
        },
        {
          "key" : 39,
          "doc_count" : 60,
          "genderAgg" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "F",
                "doc_count" : 38,
                "balanceAvg" : {
                  "value" : 26348.684210526317
                }
              },
              {
                "key" : "M",
                "doc_count" : 22,
                "balanceAvg" : {
                  "value" : 23405.68181818182
                }
              }
            ]
          },
          "ageBalanceAvg" : {
            "value" : 25269.583333333332
          }
        },
        ...... (remaining buckets omitted)
      ]
    }
  }
}

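7. Mapping

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/mapping.html

(1) Mapping is the process of defining how a document and the fields it contains are stored and indexed. For example, use mappings to define:

  • Which string fields should be treated as full-text fields.
  • Which fields contain numbers, dates, or geolocations.
  • The format of date values.

(2) Field datatypes

Each field has a datatype, which can be:

A simple type such as text, keyword, date, long, double, boolean, or ip.

A type that supports the hierarchical nature of JSON, such as object or nested.

Or a specialized type such as geo_point, geo_shape, or completion.

It is often useful to index the same field in different ways for different purposes. For example, a string field can be indexed as a text field for full-text search and as a keyword field for sorting or aggregations. A string field can be indexed with the standard analyzer, the English analyzer, and the French analyzer, or with an analyzer provided by a plugin; for Chinese, the IK analyzer is the usual choice.
(3) Viewing a mapping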

GET bank/_mapping

Result:

{
  "bank" : {
    "mappings" : {
      "properties" : {
        "account_number" : {
          "type" : "long"
        },
        "address" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "balance" : {
          "type" : "long"
        },
        "city" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "email" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "employer" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "firstname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "lastname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "state" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

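The properties under mappings are the field mappings. For example, account_number is mapped as long and address as text; a text field supports full-text search, and the keyword sub-field under fields can be used for exact matching.
(4) Creating an explicit mapping

Note: mapping types are deprecated as of Elasticsearch 7.0 (and removed in 8.0), so documents are stored directly under the index and no type is specified.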

PUT /my-index
{
  "mappings": {
    "properties": {
      "age":    { "type": "integer" },  
      "email":  { "type": "keyword"  }, 
      "name":   { "type": "text"  }     
    }
  }
}

Run it; the result is:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my-index"
}

Now view the mapping of this index:

GET my-index/_mapping

Result:

{
  "my-index" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "integer"
        },
        "email" : {
          "type" : "keyword"
        },
        "name" : {
          "type" : "text"
        }
      }
    }
  }
}

(5) Adding a field to an existing mapping
Add a mapping for one more field to the index created above:

PUT /my-index/_mapping
{
  "properties": {
    "employee-id": {
      "type": "keyword",
      "index": false
    }
  }
}

Result:

{
  "acknowledged" : true
}

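(6) Changing a field mapping
Except for supported mapping parameters, the mapping or field type of an existing field cannot be changed; changing an existing field could invalidate data that has already been indexed.
If you really need to change the mapping of a field, create a new index with the correct mapping and reindex the data into it.
For example, to change email in my-index to the text type, proceed as follows:
1) Create the new index with the desired mapping: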

PUT /my-index-new
{
  "mappings": {
    "properties": {
      "age":    { "type": "integer" },  
      "email":  { "type": "text"  }, 
      "name":   { "type": "text"  }     
    }
  }
}

2) Migrate the data; source is the old index and dest is the new index:

POST _reindex
{
  "source": {
    "index": "my-index"
  },
  "dest": {
    "index": "my-index-new"
  }
}

If the old index still uses a type, specify it in source:

POST _reindex
{
  "source":{
      "index":"bank",
      "type":"account"
   },
  "dest":{
      "index":"new-bank"
   }
}

3) Delete the old index

DELETE my-index

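8. Text analysis

(1) Analyzers

Text analysis is the process of converting unstructured text, such as the body of an email or a product description, into a structured format optimized for search. When to configure it: Elasticsearch performs text analysis when indexing or searching text fields. If the index contains no text fields, no further setup is needed; if you do use text fields, or your text searches are not returning the expected results, configuring text analysis often helps.
Analysis is also the step that produces the terms of the inverted index. For example, the standard analyzer splits text on word boundaries, drops most punctuation, and lowercases the terms, so it turns "Just do it." into [just, do, it]: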

POST _analyze
{
  "analyzer": "standard",
  "text": "Just do it."
}

Result:

{
  "tokens" : [
    {
      "token" : "just",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "do",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "it",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}

The tokenizer is also responsible for recording the order or position of each term (used for phrase and word proximity queries) and the start and end character offsets of the original word that the term represents (used for highlighting search results).
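
The positions and offsets in the output above can be reproduced with a short regular-expression tokenizer. This is a rough stand-in for the standard analyzer under a simplifying assumption: it treats every run of word characters as a token and lowercases it, which matches this example but not the full Unicode segmentation rules the real analyzer follows.

```python
import re

def analyze(text):
    """Split on runs of word characters, lowercase, and record offsets."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),  # index of the first character
            "end_offset": match.end(),      # index just past the last character
            "position": position,           # ordinal of the token in the text
        })
    return tokens

for token in analyze("Just do it."):
    print(token)
```

The offsets index into the original string, which is what allows a highlighter to wrap the matched words without re-analyzing the text.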

(2) Chinese analyzer plugin

Elasticsearch ships with many analyzers, but they are generally not suitable for Chinese text. For example:

POST _analyze
{
  "analyzer": "standard",
  "text": "士兵突击特别篇报道"
}

Result:

{
  "tokens" : [
    {
      "token" : "士",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "兵",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "突",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "击",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "特",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "别",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    },
    {
      "token" : "篇",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<IDEOGRAPHIC>",
      "position" : 6
    },
    {
      "token" : "报",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "<IDEOGRAPHIC>",
      "position" : 7
    },
    {
      "token" : "道",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "<IDEOGRAPHIC>",
      "position" : 8
    }
  ]
}

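Splitting the text into single characters is not helpful and makes searches less effective, so we bring in an analysis plugin, the IK analyzer: https://github.com/medcl/elasticsearch-analysis-ik/releases. Download the release that matches your Elasticsearch version, unzip it into the plugins directory under the Elasticsearch installation, and restart Elasticsearch. Then analyze the Chinese text again, this time with ik_smart: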

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "士兵突击特别篇报道"
}

Result:

{
  "tokens" : [
    {
      "token" : "士兵",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "突击",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "特别篇",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "报道",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}

There is also ik_max_word, which splits the text at the finest granularity:

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "士兵突击特别篇报道"
}

Result:

{
  "tokens" : [
    {
      "token" : "士兵",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "突击",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "特别篇",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "特别",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "篇",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "报道",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 5
    }
  ]
}
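
To make the difference between the two outputs concrete, here is a minimal sketch of dictionary-based segmentation. It is not IK's actual algorithm: the class ToySegmenter, its five-word dictionary, and the methods coarse and fine are all made up for illustration. coarse takes the longest dictionary word at each position (similar in spirit to ik_smart), while fine emits every dictionary word it finds (similar in spirit to ik_max_word, although the real analyzer also emits the single character 篇).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ToySegmenter {
    // Hypothetical mini dictionary; a real analyzer ships a far larger one.
    static final Set<String> DICT =
            new HashSet<>(Arrays.asList("士兵", "突击", "特别", "特别篇", "报道"));
    // Length of the longest word in the toy dictionary.
    static final int MAX_WORD_LEN = 3;

    // Coarse-grained: at each position take the longest dictionary word,
    // falling back to a single character when nothing matches.
    static List<String> coarse(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int matched = 1;
            for (int len = Math.min(MAX_WORD_LEN, text.length() - i); len > 1; len--) {
                if (DICT.contains(text.substring(i, i + len))) {
                    matched = len;
                    break;
                }
            }
            tokens.add(text.substring(i, i + matched));
            i += matched;
        }
        return tokens;
    }

    // Fine-grained: emit every dictionary word found at every start position,
    // longest first, so overlapping words are all kept.
    static List<String> fine(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int len = Math.min(MAX_WORD_LEN, text.length() - i); len > 1; len--) {
                String candidate = text.substring(i, i + len);
                if (DICT.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "士兵突击特别篇报道";
        System.out.println(coarse(text)); // [士兵, 突击, 特别篇, 报道]
        System.out.println(fine(text));   // [士兵, 突击, 特别篇, 特别, 报道]
    }
}
```

The coarse result is smaller and suits query-time analysis, while the fine result indexes more overlapping terms so that a search for 特别 can also match a document containing 特别篇.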

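(3)Custom dictionary

If the built-in vocabulary still does not meet your needs, the IK plugin also supports custom dictionaries. Its configuration file (IKAnalyzer.cfg.xml) lets you extend the vocabulary with local dictionary files or with a dictionary served from another server: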

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- Configure your own extension dictionary here -->
	<entry key="ext_dict"></entry>
	<!-- Configure your own extension stopword dictionary here -->
	<entry key="ext_stopwords"></entry>
	<!-- Configure a remote extension dictionary here -->
	<entry key="remote_ext_dict">http://192.168.56.10/es/fenci.txt</entry>
	<!-- Configure a remote extension stopword dictionary here -->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

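After editing the configuration, restart Elasticsearch (or its container, if it runs in one); otherwise the change does not take effect.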

9、Integrating Elasticsearch with Spring Boot

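With the groundwork done, here is the main course.
(1)Create a new Maven project and click Next.
(2)Name the project and click Finish.
Again, refer to the official documentation: https://www.elastic.co/guide/index.html
Click Elasticsearch Clients, then Java REST Client, and look up the official Maven dependency there.
(3)Complete pom.xml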

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.2.7.RELEASE</version>
	</parent>
	<groupId>com.example</groupId>
	<artifactId>elasticsearch-demo</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>demo</name>
	<properties>
		<java.version>1.8</java.version>
		<elasticsearch.version>7.14.2</elasticsearch.version>
	</properties>
	<dependencies>
		<!-- web-->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
		<!-- test-->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
		<!-- elasticsearch-->
		<dependency>
			<groupId>org.elasticsearch.client</groupId>
			<artifactId>elasticsearch-rest-high-level-client</artifactId>
			<version>7.14.2</version>
		</dependency>
		<!-- lombok-->
		<dependency>
			<groupId>org.projectlombok</groupId>
			<artifactId>lombok</artifactId>
		</dependency>
		<!-- fastjson-->
		<dependency>
			<groupId>com.alibaba</groupId>
			<artifactId>fastjson</artifactId>
			<version>1.2.60</version>
		</dependency>
	</dependencies>
</project>

(4)Complete the project
Create the startup class Application.java

@SpringBootApplication
public class Application {
	public static void main(String[] args) {
		SpringApplication.run(Application.class, args);
	}
}

Create the test class DemoApplicationTests.java

@RunWith(SpringRunner.class)
@SpringBootTest(classes = Application.class)
class DemoApplicationTests {

}

(5)Create the Elasticsearch configuration class ESConfig.java

@Configuration
public class ESConfig {
    @Bean
    public RestHighLevelClient esRestClient(){
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")));
        return client;
    }
}

(6)Project structure
[Screenshot of the project structure]

(1)Indexing a document from a Java Map

Synchronous execution example:

@RunWith(SpringRunner.class)
@SpringBootTest(classes = Application.class)
class DemoApplicationTests {
	@Autowired
	ESConfig esConfig;
	@Test
	void indexDoc01() throws IOException {
		// Create a Map to hold the document
		Map<String, Object> jsonMap = new HashMap<>();
		// Put the document fields into it
		jsonMap.put("user", "kimchy");
		jsonMap.put("postDate", new Date());
		jsonMap.put("message", "trying out Elasticsearch");
		IndexRequest indexRequest = new IndexRequest("posts") // target index
				.id("1").source(jsonMap); // document id and source
		System.out.println(indexRequest.toString()); // print the index request
		IndexResponse index = esConfig.esRestClient().index(indexRequest, RequestOptions.DEFAULT); // index synchronously and get the response
		System.out.println(index.toString()); // print the index response
	}
}

Output:

index {[posts][_doc][1], source[{"postDate":"2021-09-24T06:57:46.664Z","message":"trying out Elasticsearch","user":"kimchy"}]}
IndexResponse[index=posts,type=_doc,id=1,version=4,result=updated,seqNo=3,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]

Check the stored data in Kibana:

GET posts/_search

Result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "posts",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "postDate" : "2021-09-24T09:52:40.490Z",
          "message" : "trying out Elasticsearch",
          "user" : "kimchy"
        }
      }
    ]
  }
}
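
Note that postDate went in as a java.util.Date but comes back as an ISO-8601 UTC string. The sketch below reproduces that string form with java.time only; the class PostDateFormat and the method toIsoUtc are hypothetical names, and this is not the client's own serialization code, just an illustration of the format seen in the output.

```java
import java.time.Instant;
import java.util.Date;

class PostDateFormat {
    // Renders a java.util.Date as an ISO-8601 timestamp in UTC,
    // the same shape as the postDate values shown above.
    static String toIsoUtc(Date date) {
        return date.toInstant().toString();
    }

    public static void main(String[] args) {
        // Rebuild the Date from the request output and print it again.
        Date date = Date.from(Instant.parse("2021-09-24T06:57:46.664Z"));
        System.out.println(toIsoUtc(date)); // 2021-09-24T06:57:46.664Z
    }
}
```

Because the stored value is in UTC, it can differ from the local wall-clock time on the machine that ran the test.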

(2)Indexing a document from a Java bean

How do we store a Java bean in Elasticsearch? Suppose there is an employee entity that embeds a department entity.
Add two entity classes:
DepartmentDao.java

@Data
public class DepartmentDao {
    private Long id;
    private String departName;
}

UserDao.java

@Data
public class UserDao {
    private Long id;
    private String name;
    private int age;
    private String email;
    private DepartmentDao dept;
}

Write the test method:

	@Test
	void indexDoc02() throws IOException {
		DepartmentDao department = new DepartmentDao();
		department.setDepartName("人事部");
		department.setId(1L);
		UserDao user = new UserDao();
		user.setDept(department);
		user.setAge(18);
		user.setEmail("12345678@qq.com");
		user.setName("张三");
		user.setId(1L);
		String jsonString = JSON.toJSONString(user);
		IndexRequest indexRequest = new IndexRequest("users")
				.id("1").source(jsonString, XContentType.JSON);
		System.out.println(indexRequest.toString());
		IndexResponse index = esConfig.esRestClient().index(indexRequest, RequestOptions.DEFAULT);
		System.out.println(index.toString());
	}

Output:

index {[users][_doc][1], source[{"age":18,"dept":{"departName":"人事部","id":1},"email":"12345678@qq.com","id":1,"name":"张三"}]}
IndexResponse[index=users,type=_doc,id=1,version=1,result=updated,seqNo=1,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]

Check the stored data in Kibana:

GET users/_search

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "users",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "age" : 18,
          "dept" : {
            "departName" : "人事部",
            "id" : 1
          },
          "email" : "12345678@qq.com",
          "id" : 1,
          "name" : "张三"
        }
      }
    ]
  }
}

(3)Searching an index from Java

	@Test
	void find01() throws IOException {
		// Create the search request
		SearchRequest searchRequest = new SearchRequest();
		searchRequest.indices("bank");
		SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
		// Options available when building the search:
//        sourceBuilder.query(); // query
//        sourceBuilder.from(); // offset of the first hit
//        sourceBuilder.size(); // number of hits to return
//        sourceBuilder.aggregation(); // aggregation
		// Build a terms aggregation
		TermsAggregationBuilder agg1 = AggregationBuilders.terms("ageAgg").field("age").size(10);// aggregation name, field, and maximum number of buckets
		// aggregation() takes an AggregationBuilder
		sourceBuilder.aggregation(agg1);
		sourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
		System.out.println(sourceBuilder.toString());
		searchRequest.source(sourceBuilder);
		// Execute the search
		SearchResponse response = esConfig.esRestClient().search(searchRequest, RequestOptions.DEFAULT);
		// Inspect the response
		System.out.println(response.toString());
	}

Output:

{"query":{"match":{"address":{"query":"mill","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},"aggregations":{"ageAgg":{"terms":{"field":"age","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}}}}
{"took":2,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":4,"relation":"eq"},"max_score":5.4032025,"hits":[{"_index":"bank","_type":"account","_id":"970","_score":5.4032025,"_source":{"account_number":970,"balance":19648,"firstname":"Forbes","lastname":"Wallace","age":28,"gender":"M","address":"990 Mill Road","employer":"Pheast","email":"forbeswallace@pheast.com","city":"Lopezo","state":"AK"}},{"_index":"bank","_type":"account","_id":"136","_score":5.4032025,"_source":{"account_number":136,"balance":45801,"firstname":"Winnie","lastname":"Holland","age":38,"gender":"M","address":"198 Mill Lane","employer":"Neteria","email":"winnieholland@neteria.com","city":"Urie","state":"IL"}},{"_index":"bank","_type":"account","_id":"345","_score":5.4032025,"_source":{"account_number":345,"balance":9812,"firstname":"Parker","lastname":"Hines","age":38,"gender":"M","address":"715 Mill Avenue","employer":"Baluba","email":"parkerhines@baluba.com","city":"Blackgum","state":"KY"}},{"_index":"bank","_type":"account","_id":"472","_score":5.4032025,"_source":{"account_number":472,"balance":25571,"firstname":"Lee","lastname":"Long","age":32,"gender":"F","address":"288 Mill Street","employer":"Comverges","email":"leelong@comverges.com","city":"Movico","state":"MT"}}]},"aggregations":{"lterms#ageAgg":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":38,"doc_count":2},{"key":28,"doc_count":1},{"key":32,"doc_count":1}]}}}

Here is the response formatted for readability:

{
    "took":2,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":{
            "value":4,
            "relation":"eq"
        },
        "max_score":5.4032025,
        "hits":[
            {
                "_index":"bank",
                "_type":"account",
                "_id":"970",
                "_score":5.4032025,
                "_source":{
                    "account_number":970,
                    "balance":19648,
                    "firstname":"Forbes",
                    "lastname":"Wallace",
                    "age":28,
                    "gender":"M",
                    "address":"990 Mill Road",
                    "employer":"Pheast",
                    "email":"forbeswallace@pheast.com",
                    "city":"Lopezo",
                    "state":"AK"
                }
            },
            {
                "_index":"bank",
                "_type":"account",
                "_id":"136",
                "_score":5.4032025,
                "_source":{
                    "account_number":136,
                    "balance":45801,
                    "firstname":"Winnie",
                    "lastname":"Holland",
                    "age":38,
                    "gender":"M",
                    "address":"198 Mill Lane",
                    "employer":"Neteria",
                    "email":"winnieholland@neteria.com",
                    "city":"Urie",
                    "state":"IL"
                }
            },
            {
                "_index":"bank",
                "_type":"account",
                "_id":"345",
                "_score":5.4032025,
                "_source":{
                    "account_number":345,
                    "balance":9812,
                    "firstname":"Parker",
                    "lastname":"Hines",
                    "age":38,
                    "gender":"M",
                    "address":"715 Mill Avenue",
                    "employer":"Baluba",
                    "email":"parkerhines@baluba.com",
                    "city":"Blackgum",
                    "state":"KY"
                }
            },
            {
                "_index":"bank",
                "_type":"account",
                "_id":"472",
                "_score":5.4032025,
                "_source":{
                    "account_number":472,
                    "balance":25571,
                    "firstname":"Lee",
                    "lastname":"Long",
                    "age":32,
                    "gender":"F",
                    "address":"288 Mill Street",
                    "employer":"Comverges",
                    "email":"leelong@comverges.com",
                    "city":"Movico",
                    "state":"MT"
                }
            }
        ]
    },
    "aggregations":{
        "lterms#ageAgg":{
            "doc_count_error_upper_bound":0,
            "sum_other_doc_count":0,
            "buckets":[
                {
                    "key":38,
                    "doc_count":2
                },
                {
                    "key":28,
                    "doc_count":1
                },
                {
                    "key":32,
                    "doc_count":1
                }
            ]
        }
    }
}
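
The buckets in the aggregations part can be checked by hand. Here is a minimal plain-Java sketch of what a terms aggregation computes, assuming the four age values from the hits above; the class TermsAggSketch and the method termsBuckets are hypothetical names, and the ordering simply follows the order printed in the request (_count descending, then _key ascending). Elasticsearch computes this on the server, so this only illustrates the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TermsAggSketch {
    // Returns {key, docCount} pairs ordered by docCount descending, then key
    // ascending, keeping at most `size` buckets.
    static List<long[]> termsBuckets(List<Integer> values, int size) {
        // Count how many documents carry each distinct value.
        Map<Integer, Long> counts = new TreeMap<>();
        for (int value : values) {
            counts.merge(value, 1L, Long::sum);
        }
        List<long[]> buckets = new ArrayList<>();
        for (Map.Entry<Integer, Long> entry : counts.entrySet()) {
            buckets.add(new long[] {entry.getKey(), entry.getValue()});
        }
        // Most frequent first; ties broken by the smaller key.
        buckets.sort((a, b) -> a[1] != b[1] ? Long.compare(b[1], a[1]) : Long.compare(a[0], b[0]));
        return buckets.subList(0, Math.min(size, buckets.size()));
    }

    public static void main(String[] args) {
        // Ages of the four documents that matched "mill".
        List<Integer> ages = Arrays.asList(28, 38, 38, 32);
        for (long[] bucket : termsBuckets(ages, 10)) {
            System.out.println("key=" + bucket[0] + " doc_count=" + bucket[1]);
        }
    }
}
```

With these four ages the sketch prints key=38 doc_count=2, then key=28 doc_count=1, then key=32 doc_count=1, the same buckets as in the response.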

But how do we turn the hits into Java beans?
Create an entity class:

@Data
public class Account {
    private int accountNumber;
    private int balance;
    private String firstname;
    private String lastname;
    private int age;
    private String gender;
    private String address;
    private String employer;
    private String email;
    private String city;
    private String state;
}

Append the following to the end of the test method:

		// Convert each hit into a Java bean
		SearchHits hits = response.getHits();
		SearchHit[] hits1 = hits.getHits();
		for (SearchHit hit : hits1) {
			hit.getId(); // document id, available if needed
			hit.getIndex(); // index name, available if needed
			String sourceAsString = hit.getSourceAsString();
			Account account = JSON.parseObject(sourceAsString, Account.class);
			System.out.println(account);
		}

Output:

Account(accountNumber=970, balance=19648, firstname=Forbes, lastname=Wallace, age=28, gender=M, address=990 Mill Road, employer=Pheast, email=forbeswallace@pheast.com, city=Lopezo, state=AK)
Account(accountNumber=136, balance=45801, firstname=Winnie, lastname=Holland, age=38, gender=M, address=198 Mill Lane, employer=Neteria, email=winnieholland@neteria.com, city=Urie, state=IL)
Account(accountNumber=345, balance=9812, firstname=Parker, lastname=Hines, age=38, gender=M, address=715 Mill Avenue, employer=Baluba, email=parkerhines@baluba.com, city=Blackgum, state=KY)
Account(accountNumber=472, balance=25571, firstname=Lee, lastname=Long, age=32, gender=F, address=288 Mill Street, employer=Comverges, email=leelong@comverges.com, city=Movico, state=MT)
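
One detail worth noticing: the JSON field is account_number while the bean field is accountNumber, and the output above shows the value was still filled in, so the JSON library evidently tolerates the naming difference. The sketch below only illustrates how the two names correspond; FieldNames and snakeToCamel are hypothetical helpers, not part of fastjson, and say nothing about how the library matches names internally.

```java
class FieldNames {
    // Converts a snake_case JSON field name to the camelCase form
    // conventionally used for Java bean fields.
    static String snakeToCamel(String name) {
        StringBuilder out = new StringBuilder();
        boolean upperNext = false;
        for (char c : name.toCharArray()) {
            if (c == '_') {
                // Drop the underscore and capitalize the following letter.
                upperNext = true;
                continue;
            }
            // A leading underscore never capitalizes the first letter.
            out.append(upperNext && out.length() > 0 ? Character.toUpperCase(c) : c);
            upperNext = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(snakeToCamel("account_number")); // accountNumber
        System.out.println(snakeToCamel("firstname"));      // firstname
    }
}
```

If a library does not bridge the names for you, mapping the field explicitly with that library's annotation is the safer choice.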