This article is based on the official Elasticsearch 7.5 documentation; for other versions, see the official docs:
https://www.elastic.co/guide/index.html
1. What is Elasticsearch?
Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack. Logstash and Beats help collect, aggregate, and enrich data and store it in Elasticsearch. Elasticsearch provides near-real-time search and analytics for all types of data. Whether you have structured or unstructured text, numeric data, or geospatial data, Elasticsearch can store and index it efficiently in ways that support fast searches. You can go beyond simple data retrieval and aggregate information to discover trends and patterns in your data. And as your data and query volume grow, the distributed nature of Elasticsearch lets your deployment scale seamlessly.
Elasticsearch handles data quickly and flexibly across a wide range of use cases:
- Search in applications and websites
- Storing and analyzing logs, metrics, and security event data
- Using machine learning to automatically model data in real time
- Automating business workflows as a storage engine
- Managing, integrating, and analyzing spatial information as a geographic information system (GIS)
- Storing and processing genetic data as a bioinformatics research tool
Elasticsearch is a distributed document store. Instead of storing information as rows of columnar data, it stores serialized JSON documents. When a cluster has multiple Elasticsearch nodes, the stored documents are distributed across the cluster and can be accessed quickly from any node.
When a document is stored, it is indexed and becomes fully searchable in near real time, within about one second. Elasticsearch uses a data structure called an inverted index that supports very fast full-text search: it lists every unique word that appears in any document and identifies the documents each word occurs in. An index can be thought of as an optimized collection of documents, each document as a collection of fields, and each field as a key-value pair of data.
Elasticsearch indexes all the data in every field, and different field types use different data structures: for example, text is stored in inverted indices, while numeric and geo fields are stored in BKD trees. These per-field data structures are a large part of why Elasticsearch lookups are so fast.
Under the hood, Elasticsearch is built on Apache Lucene. Elasticsearch wraps Lucene in a simple REST API supporting structured queries, full-text queries, and complex queries combining the two. Lucene achieves full-text search mainly through its inverted-index structure.
(1) Inverted index
An inverted index is still an index, and an index exists to retrieve data quickly. The principle is to use an analyzer to split each document into distinct terms, sort those terms into a list, and record which documents each term appears in. For example, given the following documents:
- 士兵突击
- 士兵突击特别篇
- 士兵侦察
- 士兵突击特别篇报道
Term | Documents |
---|---|
士兵 | 1,2,3,4 |
突击 | 1,2,4 |
特别篇 | 2,4 |
侦察 | 3 |
报道 | 4 |
Now when we search for "士兵突击", the query is first split into terms, the matching terms are looked up, and the corresponding document records are fetched. If we search for "士兵特别篇" and no exact match exists, the records for the matched terms are returned ordered by relevance score, from high to low.
This structure consists of the list of all distinct terms across the documents, with a document list associated with each term. A structure that locates records via attribute values like this is an inverted index, and a file carrying an inverted index is called an inverted file.
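The inverted-index idea above can be sketched in a few lines of Python (a toy model: the document ids and the pre-segmented terms follow the example table, while a real deployment would rely on an analyzer such as IK to do the segmentation):

```python
from collections import Counter

# The four example documents, already segmented into terms.
docs = {
    1: ["士兵", "突击"],
    2: ["士兵", "突击", "特别篇"],
    3: ["士兵", "侦察"],
    4: ["士兵", "突击", "特别篇", "报道"],
}

# Build the inverted index: term -> list of doc ids containing it.
inverted = {}
for doc_id, terms in docs.items():
    for term in terms:
        inverted.setdefault(term, []).append(doc_id)

def search(query_terms):
    """Rank doc ids by how many of the query terms they contain."""
    hits = Counter()
    for term in query_terms:
        for doc_id in inverted.get(term, []):
            hits[doc_id] += 1
    return [doc_id for doc_id, _ in hits.most_common()]
```

Searching for ["士兵", "特别篇"] here returns documents 2 and 4 first, mirroring the partial-match, score-ordered behavior described above.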
2. Basic concepts
- Index (noun)
Comparable to a database in a traditional relational database; it is where documents are stored.
- Index (verb)
To index a document is to store it in an index (the noun).
- Document
The main data entity in Elasticsearch is called a document.
3. Installing Elasticsearch and Kibana
https://www.elastic.co/start
After downloading, start Kibana (http://127.0.0.1:5601) and Elasticsearch (http://127.0.0.1:9200) from their bin directories.
Kibana is a visual interface for Elasticsearch. We will use its console to send requests: Kibana => Management => Dev tools => Console.
4. Common retrieval commands
(1) The _cat APIs
- GET /_cat/nodes — list all nodes
- GET /_cat/health — show cluster health
- GET /_cat/master — show the elected master node
- GET /_cat/indices — list all indices
For example:
GET _cat/nodes
Output:
127.0.0.1 19 29 2 cdfhilmrstw * P3951098A244
(2)索引(保存)文档
PUT方式索引和修改:
给索引为test_info添加,类别user,id为1的文档:
PUT /test_info/user/1
{
"name": "wang"
}
Output:
{
"_index": "test_info",
"_type": "user",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 1,
"_primary_term": 1
}
result is created, so this indexed a new document. Sending the same request again outputs:
{
"_index": "test_info",
"_type": "user",
"_id": "1",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 2,
"_primary_term": 1
}
result becomes updated, and both the version (_version) and the sequence number (_seq_no) have moved to 2, so this was an update. If no document with the given id exists, the request indexes a new document; otherwise it updates the existing one.
Indexing and updating with POST:
POST /test_info/user
{
"name": "zhang"
}
With POST the id is optional, whereas PUT requires an id. A POST without an id generates one automatically and indexes the document; with an id, POST behaves the same as PUT.
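The created/updated/version rules just described can be mimicked with a tiny in-memory store (purely illustrative; the function and store here are made-up names, not an Elasticsearch API):

```python
import uuid

store = {}  # doc id -> {"_source": ..., "_version": n}

def index_doc(source, doc_id=None):
    """PUT-style when doc_id is given; POST-style auto-generates an id."""
    if doc_id is None:
        doc_id = uuid.uuid4().hex  # POST without id: auto-generated id
    if doc_id in store:            # existing id: overwrite and bump the version
        store[doc_id] = {"_source": source, "_version": store[doc_id]["_version"] + 1}
        result = "updated"
    else:                          # new id: first version
        store[doc_id] = {"_source": source, "_version": 1}
        result = "created"
    return {"_id": doc_id, "_version": store[doc_id]["_version"], "result": result}
```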
(3) Retrieval
Fetch a document: get the document with id 1 and type user from the test_info index
GET /test_info/user/1
The result:
{
"_index" : "test_info", //文档所在索引
"_type" : "user", //文档类型
"_id" : "1", //id
"_version" : 2, //版本号
"_seq_no" : 2, //并发版本控制,每次更新会+1,用来做乐观锁
"_primary_term" : 1, //同上,主分片重新分配,重启就会变化
"found" : true,
"_source" : { //文档内容
"name" : "wang"
}
}
(4) Updating
Besides the PUT and POST styles used when saving, a document can also be updated like this:
Update the document with id 1 and type user in the test_info index:
POST test_info/user/1/_update
{
"doc": {
"name": "li"
}
}
Result:
{
"_index" : "test_info",
"_type" : "user",
"_id" : "1",
"_version" : 3,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 3,
"_primary_term" : 1
}
_update compares documents: if the submitted document is identical to the one already indexed, nothing is updated. By contrast, the PUT and POST styles with an id but without _update overwrite the document directly, without comparing:
{
"_index" : "test_info",
"_type" : "user",
"_id" : "1",
"_version" : 3,
"result" : "noop", //无操作
"_shards" : {
"total" : 0,
"successful" : 0,
"failed" : 0
},
"_seq_no" : 3,
"_primary_term" : 1
}
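The noop behavior can be sketched the same way: merge the partial document into the stored one and only bump the version when something actually changed (an illustrative model, not the real implementation):

```python
# In-memory stand-in for one indexed document.
store = {"1": {"_source": {"name": "li"}, "_version": 3}}

def update_doc(doc_id, partial):
    """_update-style partial update with noop detection."""
    doc = store[doc_id]
    merged = {**doc["_source"], **partial}
    if merged == doc["_source"]:  # nothing changed: report noop
        return {"result": "noop", "_version": doc["_version"]}
    doc["_source"] = merged       # otherwise apply and bump the version
    doc["_version"] += 1
    return {"result": "updated", "_version": doc["_version"]}
```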
(5) Deleting
Delete a document:
DELETE test_info/user/1
Delete an index:
DELETE test_info
(6) Bulk import
Syntax (each pair of lines forms one unit):
{action:{metadata}}
{request body}
{action:{metadata}}
{request body}
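A client has to serialize those action/source pairs into newline-delimited JSON itself; a small helper might look like this (the helper name is made up):

```python
import json

def bulk_body(index_ops):
    """Turn (doc_id, source) pairs into a _bulk request body."""
    lines = []
    for doc_id, source in index_ops:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action/metadata line
        lines.append(json.dumps(source))                      # request body line
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline

body = bulk_body([(1, {"name": "Join"}), (2, {"name": "Doe"})])
```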
For example, index two documents with type user into the test_info index:
POST /test_info/user/_bulk
{"index":{"_id":1}}
{"name":"Join"}
{"index":{"_id":2}}
{"name":"Doe"}
Result:
{
"took" : 929, //花费时间,毫秒
"errors" : false,
"items" : [
{
"index" : {
"_index" : "test_info",
"_type" : "user",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
},
{
"index" : {
"_index" : "test_info",
"_type" : "user",
"_id" : "2",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1,
"status" : 201
}
}
]
}
More complex bulk operations:
POST /_bulk
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"my first blog post"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"my second blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"my updated blog post"}}
Result:
{
"took" : 1425,
"errors" : false,
"items" : [
{
"delete" : {
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 1,
"result" : "not_found", //删除的文档未找到
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 404
}
},
{
"create" : {
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 2,
"result" : "created", //索引成功
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1,
"status" : 201
}
},
{
"index" : {
"_index" : "website",
"_type" : "blog",
"_id" : "JqzkC3wBHvnj4b2Hv2wn",
"_version" : 1,
"result" : "created", //索引成功
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1,
"status" : 201
}
},
{
"update" : {
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 3,
"result" : "updated", //修改成功
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 3,
"_primary_term" : 1,
"status" : 200
}
}
]
}
5. Advanced search
Test data: https://blog.csdn.net/projectNo/article/details/120414848
Copy it over and bulk-insert it, then view the indices:
GET _cat/indices
yellow open bank T0LimmutSouXMeoqOSrQ1g 1 1 1000 0 372.6kb 372.6kb
(1) Searching documents
Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/getting-started-search.html
(2) search
There are two ways to search with the search API:
1. Send search parameters in the REST request URI (URI + query parameters):
GET bank/_search?q=*&sort=account_number:asc
- q=*: match everything
- sort: the field to sort on
- asc: ascending order
2. Send parameters in the REST request body (URI + request body):
GET /bank/_search
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}
The query result contains the following fields:
Parameter | Meaning |
---|---|
took | time taken, in milliseconds |
timed_out | whether the search timed out |
_shards | how many shards were searched, and how many succeeded/failed |
hits.max_score | the highest relevance score among the matched documents |
hits.total.value | how many matching documents were found |
hits.sort | the sort key (column) of the results; absent when sorting by score |
hits._score | the relevance score |
(3) from and size
By default only the first 10 documents are returned. To paginate, specify from and size in the request:
GET /bank/_search
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
],
"from": 10,
"size": 10
}
from and size work much like MySQL's LIMIT.
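A typical helper translates a 1-based page number into these parameters, just as one would compute a LIMIT offset (the function name is illustrative):

```python
def page_params(page, page_size=10):
    """Map a 1-based page number to Elasticsearch from/size parameters."""
    return {"from": (page - 1) * page_size, "size": page_size}
```

page_params(2) yields {"from": 10, "size": 10}, the same window as the request above.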
(4) _source
Return only some of the fields:
GET bank/_search
{
"query": {
"match_all": {}
},
"_source": ["balance","firstname"]
}
Result:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"firstname" : "Amber",
"balance" : 39225
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"firstname" : "Hattie",
"balance" : 5686
}
},
...(omitted)
]
}
}
(5) query
1) match
Above we used "query": { "match_all": {} } to fetch every document. For more selective matching use match: non-string fields are matched exactly, while string fields go through full-text search.
For example, find documents whose address contains mill lane:
GET /bank/_search
{
"query": { "match": { "address": "mill lane" } }
}
Full-text search analyzes the query into terms and ranks the results by relevance score:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 19,
"relation" : "eq"
},
"max_score" : 9.507477,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "136",
"_score" : 9.507477,
"_source" : {
"account_number" : 136,
"balance" : 45801,
"firstname" : "Winnie",
"lastname" : "Holland",
"age" : 38,
"gender" : "M",
"address" : "198 Mill Lane",
"employer" : "Neteria",
"email" : "winnieholland@neteria.com",
"city" : "Urie",
"state" : "IL"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "970",
"_score" : 5.4032025,
"_source" : {
"account_number" : 970,
"balance" : 19648,
"firstname" : "Forbes",
"lastname" : "Wallace",
"age" : 28,
"gender" : "M",
"address" : "990 Mill Road",
"employer" : "Pheast",
"email" : "forbeswallace@pheast.com",
"city" : "Lopezo",
"state" : "AK"
}
},
...(omitted)
]
}
}
2) match_phrase
To match an entire phrase without splitting it into terms, use match_phrase:
GET /bank/_search
{
"query": { "match_phrase": { "address": "mill lane" } }
}
Only 1 document matches, the top-scoring one (9.507477), whose address is 198 Mill Lane. Documents whose address merely contains mill or contains Lane are not matched:
{
"took" : 20,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 9.507477,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "136",
"_score" : 9.507477,
"_source" : {
"account_number" : 136,
"balance" : 45801,
"firstname" : "Winnie",
"lastname" : "Holland",
"age" : 38,
"gender" : "M",
"address" : "198 Mill Lane",
"employer" : "Neteria",
"email" : "winnieholland@neteria.com",
"city" : "Urie",
"state" : "IL"
}
}
]
}
}
3)match+keyword
GET bank/_search
{
"query": {
"match": {
"address.keyword": "990 Mill"
}
}
}
Not a single document is retrieved. Matching on a text field's keyword sub-field requires the query to equal the field's entire stored value; it is an exact match.
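The difference can be illustrated with a crude stand-in for the analyzer (a simplification; real analyzers do much more than lowercase and split on whitespace):

```python
def analyze(text):
    """Very rough standard-analyzer stand-in: split, strip punctuation, lowercase."""
    return [token.strip(".,").lower() for token in text.split()]

stored_address = "990 Mill Road"

# match on the analyzed text field: any shared token counts as a hit
match_hit = bool(set(analyze("990 Mill")) & set(analyze(stored_address)))

# match on the keyword sub-field: the query must equal the whole stored value
keyword_hit = stored_address == "990 Mill"
```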
4) bool
To build more complex queries, use a bool query to combine multiple conditions. Criteria can be marked as must match, should match, or must_not match.
Find documents whose age must be 40 and whose state must not be ID:
GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}
Result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 43,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "474",
"_score" : 1.0,
"_source" : {
"account_number" : 474,
"balance" : 35896,
"firstname" : "Obrien",
"lastname" : "Walton",
"age" : 40,
"gender" : "F",
"address" : "192 Ide Court",
"employer" : "Suremax",
"email" : "obrienwalton@suremax.com",
"city" : "Crucible",
"state" : "UT"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "479",
"_score" : 1.0,
"_source" : {
"account_number" : 479,
"balance" : 31865,
"firstname" : "Cameron",
"lastname" : "Ross",
"age" : 40,
"gender" : "M",
"address" : "904 Bouck Court",
"employer" : "Telpod",
"email" : "cameronross@telpod.com",
"city" : "Nord",
"state" : "MO"
}
},
...(omitted)
]
}
}
must and should contribute to the relevance score; the higher the score, the better the document fits the search, and by default Elasticsearch returns documents ordered by score from high to low. Conditions in a must_not clause are treated as filters: they decide whether a document is included in the results but do not affect its score.
5) bool/filter
Return documents whose balance is between 1000 and 2000:
GET /bank/_search
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 1000,
"lte": 2000
}
}
}
}
}
}
Result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 19,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "87",
"_score" : 1.0,
"_source" : {
"account_number" : 87,
"balance" : 1133,
"firstname" : "Hewitt",
"lastname" : "Kidd",
"age" : 22,
"gender" : "M",
"address" : "446 Halleck Street",
"employer" : "Isologics",
"email" : "hewittkidd@isologics.com",
"city" : "Coalmont",
"state" : "ME"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "417",
"_score" : 1.0,
"_source" : {
"account_number" : 417,
"balance" : 1788,
"firstname" : "Wheeler",
"lastname" : "Ayers",
"age" : 35,
"gender" : "F",
"address" : "677 Hope Street",
"employer" : "Fortean",
"email" : "wheelerayers@fortean.com",
"city" : "Ironton",
"state" : "PA"
}
},
...(omitted)
]
}
}
filter narrows the result set without affecting relevance scores.
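In plain terms, a filter is just a predicate applied without any scoring (hypothetical sample data):

```python
def balance_in_range(doc, gte, lte):
    """range-filter predicate: gte <= balance <= lte, no score involved."""
    return gte <= doc["balance"] <= lte

docs = [{"balance": 1133}, {"balance": 1788}, {"balance": 39225}]
hits = [d for d in docs if balance_in_range(d, 1000, 2000)]
```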
6) term
term also retrieves by field value, much like match. For full-text fields, prefer match; for exact, non-text fields such as ages, salaries, or dates, use term:
GET bank/_search
{
"query": {
"term": {
"account_number": 970
}
}
}
Result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "970",
"_score" : 1.0,
"_source" : {
"account_number" : 970,
"balance" : 19648,
"firstname" : "Forbes",
"lastname" : "Wallace",
"age" : 28,
"gender" : "M",
"address" : "990 Mill Road",
"employer" : "Pheast",
"email" : "forbeswallace@pheast.com",
"city" : "Lopezo",
"state" : "AK"
}
}
]
}
}
6. Aggregations
The aggregations framework builds on search queries and provides the ability to group and extract data, producing complex summaries similar to SQL GROUP BY and SQL aggregate functions. In Elasticsearch, a search can return hits and aggregation results at the same time, kept separate in the response. This is powerful and efficient: you can run a query and multiple aggregations and get all of their results back at once, using one concise API and avoiding extra network round trips.
Aggregation syntax:
"aggregations" : {
"<aggregation_name>" : { <!--聚合的名字 -->
"<aggregation_type>" : { <!--聚合的类型 -->
<aggregation_body> <!--聚合体:对哪些字段进行聚合 -->
}
[,"meta" : { [<meta_data_body>] } ]? <!--元 -->
[,"aggregations" : { [<sub_aggregation>]+ } ]? <!--在聚合里面在定义子聚合 -->
}
[,"<aggregation_name_2>" : { ... } ]*<!--聚合的名字 -->
}
There are four families of aggregations:
- Metrics Aggregations
- Bucket Aggregations
- Matrix Aggregations
- Pipeline Aggregations
(1) Metrics aggregations
These compute metrics such as the maximum, minimum, sum, or average over a data set. The metrics aggregation types in Elasticsearch 7.5 include:
1) max, min, sum, and avg
A max aggregation:
GET bank/_search?size=0
{
"aggs": {
"ageAgg": {
"max": {
"field": "balance"
}
}
}
}
size=0 suppresses the hits in the response. Result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"ageAgg" : {
"value" : 49989.0
}
}
}
2) value_count: counting documents with a value
GET bank/_search?size=0
{
"aggs": {
"age_count": {
"value_count": {
"field": "age"
}
}
}
}
Result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"age_count" : {
"value" : 1000
}
}
}
3) cardinality: counting distinct values
GET bank/_search?size=0
{
"aggs": {
"age_cardinality": {
"cardinality": {
"field": "age"
}
}
}
}
Result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"age_cardinality" : {
"value" : 21
}
}
}
4) stats: count, min, max, avg, and sum in one request
GET bank/_search?size=0
{
"aggs": {
"age_count": {
"stats": {
"field": "age"
}
}
}
}
Result:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"age_count" : {
"count" : 1000,
"min" : 20.0,
"max" : 40.0,
"avg" : 30.171,
"sum" : 30171.0
}
}
}
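The five numbers stats returns are the same ones you would compute by hand over the field's values (a toy sample, not the bank data):

```python
ages = [20, 25, 31, 35, 40]

# What a stats aggregation bundles into one response:
stats = {
    "count": len(ages),
    "min": min(ages),
    "max": max(ages),
    "avg": sum(ages) / len(ages),
    "sum": sum(ages),
}
```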
5) percentiles: values at given percentile ranks
GET bank/_search?size=0
{
"aggs": {
"age_percentiles": {
"percentiles": {
"field": "age"
}
}
}
}
For the specified field (or script), this accumulates the values from smallest to largest and reports the value at each percentile of the document count. By default it returns the [1, 5, 25, 50, 75, 95, 99] percentiles. For example, "50.0" : 31.0 means that 50% of the ages are at or below 31.
Result:
{
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"age_percentiles" : {
"values" : {
"1.0" : 20.0,
"5.0" : 21.0,
"25.0" : 25.0,
"50.0" : 31.0,
"75.0" : 35.0,
"95.0" : 39.0,
"99.0" : 40.0
}
}
}
}
6) percentile_ranks: the share of documents whose value is at or below given values
For example, the share of documents with age at or below 30 and at or below 35:
GET bank/_search?size=0
{
"aggs": {
"aggs_perc_rank": {
"percentile_ranks": {
"field": "age",
"values": [30, 35]
}
}
}
}
Result:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"aggs_perc_rank" : {
"values" : {
"30.0" : 49.0,
"35.0" : 75.8
}
}
}
}
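Both percentile readings boil down to "what share of the values is at or below x". Checking that interpretation on a small sample:

```python
def pct_rank(values, x):
    """Percentage of values that are <= x (what percentile_ranks reports)."""
    return 100 * sum(v <= x for v in values) / len(values)

ages = [20, 22, 25, 28, 31, 33, 35, 37, 39, 40]
```

Here pct_rank(ages, 31) is 50.0: half the sample is at or below 31, which is exactly how the "50.0" : 31.0 line in the percentiles output reads.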
For other metrics aggregations, see the official docs: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/search-aggregations-metrics.html
(2) Bucket aggregations
Instead of computing metrics over field values, bucket aggregations create buckets of documents. Each bucket is associated with a criterion (depending on the aggregation type) that determines whether a document in the current context "falls into" it. In other words, bucket aggregations effectively define sets of documents.
Unlike metrics aggregations, bucket aggregations can hold sub-aggregations, which are computed over the buckets created by their "parent" aggregation.
(3) Sub-aggregations (aggregating over an aggregation's buckets)
For example, compute not only the age distribution but also the average balance within each age bucket:
GET bank/_search?size=0
{
"query": {
"match": {
"state": "AK"
}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 5
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
}
}
}
The result computes the average balance within every age bucket:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 22,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 12,
"buckets" : [
{
"key" : 20,
"doc_count" : 2,
"balanceAvg" : {
"value" : 41416.0
}
},
{
"key" : 26,
"doc_count" : 2,
"balanceAvg" : {
"value" : 14901.5
}
},
{
"key" : 33,
"doc_count" : 2,
"balanceAvg" : {
"value" : 32760.5
}
},
{
"key" : 36,
"doc_count" : 2,
"balanceAvg" : {
"value" : 14936.0
}
},
{
"key" : 37,
"doc_count" : 2,
"balanceAvg" : {
"value" : 16099.5
}
}
]
}
}
}
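The bucket-then-sub-aggregate shape is easy to model: group documents into buckets by the terms field, then run the metric inside each bucket (the sample accounts are made up so that the first two bucket averages reproduce the output above):

```python
from collections import defaultdict

accounts = [
    {"age": 20, "balance": 40000}, {"age": 20, "balance": 42832},
    {"age": 26, "balance": 14901}, {"age": 26, "balance": 14902},
]

# terms bucket per age ...
buckets = defaultdict(list)
for account in accounts:
    buckets[account["age"]].append(account["balance"])

# ... with an avg sub-aggregation inside each bucket
age_agg = {
    age: {"doc_count": len(balances), "balanceAvg": sum(balances) / len(balances)}
    for age, balances in buckets.items()
}
```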
(4) Complex sub-aggregations (nesting all the way down)
Compute the full age distribution, and within each age bucket the average balance for gender M, the average balance for gender F, and the overall average balance for that age bucket:
GET bank/_search?size=0
{
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 5
},
"aggs": {
"genderAgg": {
"terms": {
"field": "gender.keyword"
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
},
"ageBalanceAvg": {
"avg": {
"field": "balance"
}
}
}
}
}
}
Note that text fields must be matched exactly via .keyword here, otherwise the aggregation fails. Result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 716,
"buckets" : [
{
"key" : 31,
"doc_count" : 61,
"genderAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "M",
"doc_count" : 35,
"balanceAvg" : {
"value" : 29565.628571428573
}
},
{
"key" : "F",
"doc_count" : 26,
"balanceAvg" : {
"value" : 26626.576923076922
}
}
]
},
"ageBalanceAvg" : {
"value" : 28312.918032786885
}
},
{
"key" : 39,
"doc_count" : 60,
"genderAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "F",
"doc_count" : 38,
"balanceAvg" : {
"value" : 26348.684210526317
}
},
{
"key" : "M",
"doc_count" : 22,
"balanceAvg" : {
"value" : 23405.68181818182
}
}
]
},
"ageBalanceAvg" : {
"value" : 25269.583333333332
}
},
...(omitted)
]
}
}
}
7. Mapping
Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/mapping.html
(1) Mapping is the process of defining how a document and the fields it contains are stored and indexed. For example, mappings define:
- which string fields should be treated as full-text fields
- which fields contain numbers, dates, or geolocations
- the format of date values
(2) Field data types
Every field has a data type, for example:
a simple type such as text, keyword, date, long, double, boolean, or ip;
a type that supports the hierarchical nature of JSON, such as object or nested;
or a specialized type such as geo_point, geo_shape, or completion.
It is often useful to index the same field in different ways for different purposes. For instance, a string field can be indexed as a text field for full-text search and as a keyword field for sorting or aggregations. It can be indexed with the standard analyzer, the english analyzer, or the french analyzer, or with a plugin tokenizer; for Chinese the IK analyzer is a common choice.
(3) Viewing a mapping
GET bank/_mapping
Result:
{
"bank" : {
"mappings" : {
"properties" : {
"account_number" : {
"type" : "long"
},
"address" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"age" : {
"type" : "long"
},
"balance" : {
"type" : "long"
},
"city" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"email" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"employer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"firstname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"gender" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"lastname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"state" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
The properties under mappings are our mapping: for example, the account_number field is mapped as long and address as text. A text field supports full-text search, and its keyword sub-field under fields supports exact matching.
(4) Creating an explicit mapping
Note: Elasticsearch 7.0 removed types, so documents are stored directly under an index and the type concept is deprecated.
PUT /my-index
{
"mappings": {
"properties": {
"age": { "type": "integer" },
"email": { "type": "keyword" },
"name": { "type": "text" }
}
}
}
Running it produces:
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "my-index"
}
Now view the mapping of this index:
GET my-index/_mapping
Result:
{
"my-index" : {
"mappings" : {
"properties" : {
"age" : {
"type" : "integer"
},
"email" : {
"type" : "keyword"
},
"name" : {
"type" : "text"
}
}
}
}
}
(5) Adding a field to a mapping
Add a mapping for one more field on the existing index:
PUT /my-index/_mapping
{
"properties": {
"employee-id": {
"type": "keyword",
"index": false
}
}
}
{
"acknowledged" : true
}
(6) Changing a mapping
Apart from the supported mapping parameters, you cannot change the mapping or field type of an existing field; doing so could invalidate data that is already indexed.
If you really need to change a field's mapping, create a new index with the correct mapping and reindex the data into it.
For example, if we needed to change the mapping of a field on my-index (say the email field), the procedure is:
1) First create a new index with the desired mapping:
PUT /my-index-new
{
"mappings": {
"properties": {
"age": { "type": "integer" },
"email": { "type": "keyword" },
"name": { "type": "text" }
}
}
}
2) Migrate the data; source is the old index, dest the new one:
POST _reindex
{
"source": {
"index": "my-index"
},
"dest": {
"index": "my-index-new"
}
}
If the old index still has a type under it, migrate like this:
POST _reindex
{
"source":{
"index":"bank",
"type":"account"
},
"dest":{
"index":"new-bank"
}
}
3) Delete the old index
DELETE my-index
8. Text analysis
(1) Analyzers
Text analysis is the process of converting unstructured text, such as the body of an email or a product description, into a structured format optimized for search. When to configure it: Elasticsearch performs text analysis when indexing or searching text fields. If your index contains no text fields, no further setup is needed; but if you use text fields, or full-text searches are not returning the expected results, configuring text analysis usually helps.
Analysis is also how our inverted index gets its terms. For example, the whitespace tokenizer splits text wherever it finds whitespace, turning "Just do it." into [Just, do, it.]; the standard analyzer used below additionally lowercases terms and strips punctuation:
POST _analyze
{
"analyzer": "standard",
"text": "Just do it."
}
Result:
{
"tokens" : [
{
"token" : "just",
"start_offset" : 0,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "do",
"start_offset" : 5,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "it",
"start_offset" : 8,
"end_offset" : 10,
"type" : "<ALPHANUM>",
"position" : 2
}
]
}
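A rough stand-in for what the standard analyzer did to "Just do it." (a simplification: real tokenizers handle far more than ASCII word characters):

```python
import re

def standard_analyze(text):
    """Lowercase word tokens with their character offsets, like the output above."""
    return [
        {"token": match.group().lower(),
         "start_offset": match.start(),
         "end_offset": match.end()}
        for match in re.finditer(r"[A-Za-z0-9]+", text)
    ]

tokens = standard_analyze("Just do it.")
```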
The tokenizer is also responsible for recording the order or position of each term (used for phrase and word-proximity queries) and the start and end character offsets of the original word each term represents (used to highlight search matches).
(2) Chinese analyzer plugin
Elasticsearch ships with many analyzers, but they generally do not work well for Chinese. For example:
POST _analyze
{
"analyzer": "standard",
"text": "士兵突击特别篇报道"
}
Result:
{
"tokens" : [
{
"token" : "士",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "兵",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "突",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<IDEOGRAPHIC>",
"position" : 2
},
{
"token" : "击",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<IDEOGRAPHIC>",
"position" : 3
},
{
"token" : "特",
"start_offset" : 4,
"end_offset" : 5,
"type" : "<IDEOGRAPHIC>",
"position" : 4
},
{
"token" : "别",
"start_offset" : 5,
"end_offset" : 6,
"type" : "<IDEOGRAPHIC>",
"position" : 5
},
{
"token" : "篇",
"start_offset" : 6,
"end_offset" : 7,
"type" : "<IDEOGRAPHIC>",
"position" : 6
},
{
"token" : "报",
"start_offset" : 7,
"end_offset" : 8,
"type" : "<IDEOGRAPHIC>",
"position" : 7
},
{
"token" : "道",
"start_offset" : 8,
"end_offset" : 9,
"type" : "<IDEOGRAPHIC>",
"position" : 8
}
]
}
Splitting on every single character like this is unfriendly and hurts retrieval efficiency, so we install an analyzer plugin, the IK analyzer: https://github.com/medcl/elasticsearch-analysis-ik/releases. Download the version matching your Elasticsearch, unpack it into the plugins directory under the Elasticsearch home, and restart Elasticsearch. Then try the Chinese text again with the ik_smart analyzer:
POST _analyze
{
"analyzer": "ik_smart",
"text": "士兵突击特别篇报道"
}
Result:
{
"tokens" : [
{
"token" : "士兵",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "突击",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "特别篇",
"start_offset" : 4,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "报道",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 3
}
]
}
There is also ik_max_word, which produces the most exhaustive segmentation:
POST _analyze
{
"analyzer": "ik_max_word",
"text": "士兵突击特别篇报道"
}
Result:
{
"tokens" : [
{
"token" : "士兵",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "突击",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "特别篇",
"start_offset" : 4,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "特别",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "篇",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 4
},
{
"token" : "报道",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 5
}
]
}
(3) Custom dictionaries
If the above still does not meet your needs, the IK plugin also supports custom dictionaries: its config file lets you extend the vocabulary locally or pull dictionary files from another server:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- configure your own extension dictionary here -->
<entry key="ext_dict"></entry>
<!-- configure your own extension stopword dictionary here -->
<entry key="ext_stopwords"></entry>
<!-- configure a remote extension dictionary here -->
<entry key="remote_ext_dict">http://192.168.56.10/es/fenci.txt</entry>
<!-- configure a remote extension stopword dictionary here -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
After changing the configuration, restart Elasticsearch (or its container), otherwise the change does not take effect.
9. Integrating Elasticsearch with Spring Boot
After all that, finally the main course.
(1) Create a new Maven project and click Next
(2) Name the project and click Finish
Again, consult the official docs: https://www.elastic.co/guide/index.html
Click Elasticsearch Clients, then Java REST Client, to find the official dependency.
(3) The complete pom dependencies
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.2.7.RELEASE</version>
</parent>
<groupId>com.example</groupId>
<artifactId>elasticsearch-demo</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>demo</name>
<properties>
<java.version>1.8</java.version>
<elasticsearch.version>7.14.2</elasticsearch.version>
</properties>
<dependencies>
<!-- web-->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- test-->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<!-- elasticsearch-->
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>7.14.2</version>
</dependency>
<!-- lombok-->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
</dependency>
<!-- fastjson-->
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.60</version>
</dependency>
</dependencies>
</project>
(4) Fleshing out the project
Create the startup class Application.java:
@SpringBootApplication
public class Application {
public static void main(String[] args) {
SpringApplication.run(Application.class, args);
}
}
Create the test class DemoApplicationTests.java:
@RunWith(SpringRunner.class)
@SpringBootTest(classes = Application.class)
class DemoApplicationTests {
}
(5) Create the Elasticsearch configuration class ESConfig.java
@Configuration
public class ESConfig {
@Bean
public RestHighLevelClient esRestClient(){
RestHighLevelClient client = new RestHighLevelClient(
RestClient.builder(
new HttpHost("localhost", 9200, "http")));
return client;
}
}
(6) Project structure
(7) Saving a document from a Java Map
A synchronous indexing example:
@RunWith(SpringRunner.class)
@SpringBootTest(classes = Application.class)
class DemoApplicationTests {

    @Autowired
    ESConfig esConfig;

    @Test
    void indexDoc01() throws IOException {
        // build a Map
        Map<String, Object> jsonMap = new HashMap<>();
        // fill in the data
        jsonMap.put("user", "kimchy");
        jsonMap.put("postDate", new Date());
        jsonMap.put("message", "trying out Elasticsearch");
        IndexRequest indexRequest = new IndexRequest("posts") // target index
                .id("1").source(jsonMap); // document id and data
        System.out.println(indexRequest.toString()); // print the index request
        IndexResponse index = esConfig.esRestClient().index(indexRequest, RequestOptions.DEFAULT); // index and capture the response
        System.out.println(index.toString()); // print the index response
    }
}
Output:
index {[posts][_doc][1], source[{"postDate":"2021-09-24T06:57:46.664Z","message":"trying out Elasticsearch","user":"kimchy"}]}
IndexResponse[index=posts,type=_doc,id=1,version=4,result=updated,seqNo=3,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]
Check the saved data in Kibana:
GET posts/_search
Result:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "posts",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"postDate" : "2021-09-24T09:52:40.490Z",
"message" : "trying out Elasticsearch",
"user" : "kimchy"
}
}
]
}
}
(8) Saving a document from a Java bean
How do we store a Java bean in Elasticsearch? Suppose we have an employee entity that embeds a department entity.
Add the two entity classes:
DepartmentDao.java
@Data
public class DepartmentDao {
private Long id;
private String departName;
}
UserDao.java
@Data
public class UserDao {
private Long id;
private String name;
private int age;
private String email;
private DepartmentDao dept;
}
Write the test method:
@Test
void indexDoc02() throws IOException {
    DepartmentDao department = new DepartmentDao();
    department.setDepartName("人事部");
    department.setId(1L);
    UserDao user = new UserDao();
    user.setDept(department);
    user.setAge(18);
    user.setEmail("12345678@qq.com");
    user.setName("张三");
    user.setId(1L);
    String jsonString = JSON.toJSONString(user);
    IndexRequest indexRequest = new IndexRequest("users")
            .id("1").source(jsonString, XContentType.JSON);
    System.out.println(indexRequest.toString());
    IndexResponse index = esConfig.esRestClient().index(indexRequest, RequestOptions.DEFAULT);
    System.out.println(index.toString());
}
Output:
index {[users][_doc][1], source[{"age":18,"dept":{"departName":"人事部","id":1},"email":"12345678@qq.com","id":1,"name":"张三"}]}
IndexResponse[index=users,type=_doc,id=1,version=1,result=updated,seqNo=1,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]
Check the saved data in Kibana:
GET users/_search
Result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "users",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"age" : 18,
"dept" : {
"departName" : "人事部",
"id" : 1
},
"email" : "12345678@qq.com",
"id" : 1,
"name" : "张三"
}
}
]
}
}
(9) Searching documents from Java
@Test
void find01() throws IOException {
    // create the search request
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices("bank");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // build the search conditions
    // sourceBuilder.query();        // query
    // sourceBuilder.from();         // start offset
    // sourceBuilder.size();         // number of hits to fetch
    // sourceBuilder.aggregation();  // aggregations
    // build a terms aggregation: aggregation name, field, and bucket count
    TermsAggregationBuilder agg1 = AggregationBuilders.terms("ageAgg").field("age").size(10);
    // takes an AggregationBuilder
    sourceBuilder.aggregation(agg1);
    sourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));
    System.out.println(sourceBuilder.toString());
    searchRequest.source(sourceBuilder);
    // execute the search
    SearchResponse response = esConfig.esRestClient().search(searchRequest, RequestOptions.DEFAULT);
    // inspect the response
    System.out.println(response.toString());
}
Output:
{"query":{"match":{"address":{"query":"mill","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},"aggregations":{"ageAgg":{"terms":{"field":"age","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}}}}
{"took":2,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":4,"relation":"eq"},"max_score":5.4032025,"hits":[{"_index":"bank","_type":"account","_id":"970","_score":5.4032025,"_source":{"account_number":970,"balance":19648,"firstname":"Forbes","lastname":"Wallace","age":28,"gender":"M","address":"990 Mill Road","employer":"Pheast","email":"forbeswallace@pheast.com","city":"Lopezo","state":"AK"}},{"_index":"bank","_type":"account","_id":"136","_score":5.4032025,"_source":{"account_number":136,"balance":45801,"firstname":"Winnie","lastname":"Holland","age":38,"gender":"M","address":"198 Mill Lane","employer":"Neteria","email":"winnieholland@neteria.com","city":"Urie","state":"IL"}},{"_index":"bank","_type":"account","_id":"345","_score":5.4032025,"_source":{"account_number":345,"balance":9812,"firstname":"Parker","lastname":"Hines","age":38,"gender":"M","address":"715 Mill Avenue","employer":"Baluba","email":"parkerhines@baluba.com","city":"Blackgum","state":"KY"}},{"_index":"bank","_type":"account","_id":"472","_score":5.4032025,"_source":{"account_number":472,"balance":25571,"firstname":"Lee","lastname":"Long","age":32,"gender":"F","address":"288 Mill Street","employer":"Comverges","email":"leelong@comverges.com","city":"Movico","state":"MT"}}]},"aggregations":{"lterms#ageAgg":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":38,"doc_count":2},{"key":28,"doc_count":1},{"key":32,"doc_count":1}]}}}
Formatting the response for readability:
{
"took":2,
"timed_out":false,
"_shards":{
"total":1,
"successful":1,
"skipped":0,
"failed":0
},
"hits":{
"total":{
"value":4,
"relation":"eq"
},
"max_score":5.4032025,
"hits":[
{
"_index":"bank",
"_type":"account",
"_id":"970",
"_score":5.4032025,
"_source":{
"account_number":970,
"balance":19648,
"firstname":"Forbes",
"lastname":"Wallace",
"age":28,
"gender":"M",
"address":"990 Mill Road",
"employer":"Pheast",
"email":"forbeswallace@pheast.com",
"city":"Lopezo",
"state":"AK"
}
},
{
"_index":"bank",
"_type":"account",
"_id":"136",
"_score":5.4032025,
"_source":{
"account_number":136,
"balance":45801,
"firstname":"Winnie",
"lastname":"Holland",
"age":38,
"gender":"M",
"address":"198 Mill Lane",
"employer":"Neteria",
"email":"winnieholland@neteria.com",
"city":"Urie",
"state":"IL"
}
},
{
"_index":"bank",
"_type":"account",
"_id":"345",
"_score":5.4032025,
"_source":{
"account_number":345,
"balance":9812,
"firstname":"Parker",
"lastname":"Hines",
"age":38,
"gender":"M",
"address":"715 Mill Avenue",
"employer":"Baluba",
"email":"parkerhines@baluba.com",
"city":"Blackgum",
"state":"KY"
}
},
{
"_index":"bank",
"_type":"account",
"_id":"472",
"_score":5.4032025,
"_source":{
"account_number":472,
"balance":25571,
"firstname":"Lee",
"lastname":"Long",
"age":32,
"gender":"F",
"address":"288 Mill Street",
"employer":"Comverges",
"email":"leelong@comverges.com",
"city":"Movico",
"state":"MT"
}
}
]
},
"aggregations":{
"lterms#ageAgg":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[
{
"key":38,
"doc_count":2
},
{
"key":28,
"doc_count":1
},
{
"key":32,
"doc_count":1
}
]
}
}
}
But how do we turn the hits into Java beans?
Create an entity class:
@Data
public class Account {
private int accountNumber;
private int balance;
private String firstname;
private String lastname;
private int age;
private String gender;
private String address;
private String employer;
private String email;
private String city;
private String state;
}
Append this at the end of the method:
// extract the Java beans
SearchHits hits = response.getHits();
SearchHit[] hits1 = hits.getHits();
for (SearchHit hit : hits1) {
    hit.getId();
    hit.getIndex();
    String sourceAsString = hit.getSourceAsString();
    Account account = JSON.parseObject(sourceAsString, Account.class);
    System.out.println(account);
}
Output:
Account(accountNumber=970, balance=19648, firstname=Forbes, lastname=Wallace, age=28, gender=M, address=990 Mill Road, employer=Pheast, email=forbeswallace@pheast.com, city=Lopezo, state=AK)
Account(accountNumber=136, balance=45801, firstname=Winnie, lastname=Holland, age=38, gender=M, address=198 Mill Lane, employer=Neteria, email=winnieholland@neteria.com, city=Urie, state=IL)
Account(accountNumber=345, balance=9812, firstname=Parker, lastname=Hines, age=38, gender=M, address=715 Mill Avenue, employer=Baluba, email=parkerhines@baluba.com, city=Blackgum, state=KY)
Account(accountNumber=472, balance=25571, firstname=Lee, lastname=Long, age=32, gender=F, address=288 Mill Street, employer=Comverges, email=leelong@comverges.com, city=Movico, state=MT)