Elasticsearch
1. Install Elasticsearch
- Download Elasticsearch 6.8.7: ES 6.8.7 download link
- Install the Chinese analysis (IK) plugin by running the following command in the elasticsearch-6.8.7\bin directory:
elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.7/elasticsearch-analysis-ik-6.8.7.zip
- Start Elasticsearch
./bin/elasticsearch
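2. Install Kibana, the visual interface for Elasticsearch
- Download Kibana 6.8.7: Kibana download link
- Configure Kibana
// Edit the Kibana configuration file
vim config/kibana.yml
elasticsearch.hosts: ["http://localhost:9200"]
- Start Kibana
./bin/kibana
Then open http://localhost:5601 in a browser.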
3. Common Elasticsearch commands
Elasticsearch data hierarchy: index -> type -> document -> field
3.1 Viewing cluster status
You can use curl:
curl -XGET 'http://localhost:9200/_cat/health?v'
// View cluster statistics
http://localhost:9200/_cluster/stats?pretty
- View cluster health
GET /_cat/health?v
- View node status
GET /_cat/nodes?v
- View all indices
GET /_cat/indices?v
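3.2 Index operations
- Create an index, then list the indices
PUT /customer
GET /_cat/indices?v
- Delete an index, then list the indices
DELETE /customer
GET /_cat/indices?v
3.3 Type operations
- View the mapping of a type
// Format: /index/type/_mapping
GET /bank/account/_mapping
- Check whether an index can be modified
GET index/_settings
// A read-only index shows the following block in the response:
"blocks": {
"read_only_allow_delete": "true"
},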
- Modify index settings (here: clear the read-only block)
PUT index/_settings
{
"index.blocks.read_only_allow_delete": null
}
- Add a new field
PUT /index/_mapping
{
"properties": {
"hight": {
"type": "integer"
}
}
}
Method 2: populate a new field from existing data with _update_by_query
POST /index/_update_by_query
{
"script": {
"source": "def a=ctx._source['ip'].lastIndexOf('.');def sec=ctx._source['ip'].substring(0,a);ctx._source['ipSection']=sec+'.0'"
},
"query": {
"bool": {
"must": [
{
"exists": {
"field": "ip"
}
}
]
}
}
}
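To make the Painless script above easier to check, here is a minimal Python sketch of the same string logic: keep everything before the last dot of the ip value and append ".0". The function name ip_section is hypothetical, and the sketch assumes dotted IPv4 strings.

```python
def ip_section(ip: str) -> str:
    """Mirror the Painless script: drop the last octet and append '.0'."""
    last_dot = ip.rfind(".")  # plays the role of lastIndexOf('.')
    if last_dot == -1:
        # Painless substring(0, -1) would fail too; fail loudly here as well.
        raise ValueError(f"not a dotted address: {ip!r}")
    return ip[:last_dot] + ".0"


print(ip_section("192.168.1.25"))
```

Running the logic on a few real ip values before launching _update_by_query is a cheap way to confirm the derived ipSection looks as intended.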
3.4 Document operations
- Add a document to an index
// Format: /index/type/document ID
PUT /customer/doc/1
{
"name": "John Doe"
}
- Get a document from an index
GET /customer/doc/1
- Update a document in an index
POST /customer/doc/1/_update
{
"doc": { "name": "Jane Doe" }
}
- Delete a document from an index
DELETE /customer/doc/1
- Run bulk operations on documents in an index
POST /customer/doc/_bulk
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
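The bulk body is newline-delimited JSON: one action line followed by one source line per document, and the body has to end with a newline. A minimal Python sketch that builds such a body is shown below; build_bulk_body is a hypothetical helper name, and sending the body to /customer/doc/_bulk is left to whichever HTTP client you use.

```python
import json


def build_bulk_body(docs):
    """Turn {doc_id: source} into an NDJSON body of index actions."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))  # action line
        lines.append(json.dumps(source))                           # source line
    # The bulk API expects a trailing newline after the last line.
    return "\n".join(lines) + "\n"


body = build_bulk_body({"1": {"name": "John Doe"}, "2": {"name": "Jane Doe"}})
print(body, end="")
```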
- Periodically delete index data
curl -XPOST -uxxx:xxx 'http://xxxx:9200/${index_name}/_delete_by_query' -H 'Content-Type:application/json' -d '{"query": {"bool": {"must": [{"range": {"@timestamp": {"lte": "now","format": "epoch_millis"}}}],"must_not": []}}}'
This only marks the documents as deleted; Elasticsearch reclaims the space later when it merges segments.
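The request above matches every document with @timestamp up to now. A scheduled cleanup usually keeps a recent window instead; the sketch below builds such a body with date math. The helper name and the 30-day window are assumptions for illustration, not part of the original notes.

```python
import json


def delete_older_than_body(days: int, field: str = "@timestamp") -> dict:
    """Body for _delete_by_query that matches documents older than `days` days."""
    if days < 0:
        raise ValueError("days must not be negative")
    # "now-30d" is Elasticsearch date math: the current time minus 30 days.
    return {"query": {"range": {field: {"lt": f"now-{days}d"}}}}


print(json.dumps(delete_older_than_body(30)))
```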
3.5 Searching data
Data preparation: sample data link
POST /bank/account/_bulk
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
{"index":{"_id":"13"}}
{"account_number":13,"balance":32838,"firstname":"Nanette","lastname":"Bates","age":28,"gender":"F","address":"789 Madison Street","employer":"Quility","email":"nanettebates@quility.com","city":"Nogal","state":"VA"}
{"index":{"_id":"18"}}
{"account_number":18,"balance":4180,"firstname":"Dale","lastname":"Adams","age":33,"gender":"M","address":"467 Hutchinson Court","employer":"Boink","email":"daleadams@boink.com","city":"Orick","state":"MD"}
{"index":{"_id":"20"}}
{"account_number":20,"balance":16418,"firstname":"Elinor","lastname":"Ratliff","age":36,"gender":"M","address":"282 Kings Place","employer":"Scentric","email":"elinorratliff@scentric.com","city":"Ribera","state":"WA"}
{"index":{"_id":"25"}}
Simple search
Format: /index/_search
- The simplest search uses match_all to match every document
GET /bank/_search
{
"query": { "match_all": {} }
}
Paginated search
from is the offset (starting at 0) and size is the number of hits per page
GET /bank/_search
{
"query": { "match_all": {} },
"from": 0,
"size": 10
}
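Converting a 1-based page number into from and size is a common source of off-by-one mistakes, so here is a small sketch. The function name page_request is hypothetical. Note that from + size is capped by the index setting index.max_result_window, which defaults to 10000.

```python
def page_request(page: int, page_size: int = 10) -> dict:
    """Build a match_all search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # page 1 starts at offset 0
        "size": page_size,
    }


print(page_request(3))
```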
Sorted search, using sort
GET /bank/_search
{
"query": { "match_all": {} },
"sort": { "balance": { "order": "desc" } }
}
_source: search and return only the specified fields
GET /bank/_search
{
"query": { "match_all": {} },
"_source": ["account_number", "balance"]
}
Conditional search
- Use match to express a matching condition
GET /bank/_search
{
"query": {
"match": {
"account_number": 20
}
}
}
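- Conditional search on a text field: match analyzes the query text and matches on the resulting terms (full-text matching), not on exact equality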
GET /bank/_search
{
"query": {
"match": {
"address": "mill"
}
},
"_source": [
"address",
"account_number"
]
}
- Phrase search, using match_phrase
GET /bank/_search
{
"query": {
"match_phrase": {
"address": "mill lane"
}
}
}
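Compound search
For non-real-time paginated queries and export scenarios, bool queries are commonly used to combine conditions.
A bool query supports four clause types:
must: the clause has to match and contributes to the score
filter: the clause has to match but does not affect the score
should: the clause is optional; matching it raises the score
must_not: the clause must not match; it does not affect the score
- Compound search with bool: must means all clauses have to match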
GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
- Compound search: should means at least one of the clauses matches
GET /bank/_search
{
"query": {
"bool": {
"should": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
- Compound search: must_not means none of the clauses may match, for example documents whose address contains neither mill nor lane
GET /bank/_search
{
"query": {
"bool": {
"must_not": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
- Compound search combining must and must_not, for example documents whose age equals 40 and whose state does not contain ID
GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}
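When these bodies are assembled in application code, a tiny builder keeps the nesting straight. The following is a minimal sketch with hypothetical helper names (match, bool_query); it only builds the JSON body and does not talk to a cluster.

```python
import json


def match(field, value):
    """Build a single match clause."""
    return {"match": {field: value}}


def bool_query(must=(), should=(), must_not=(), filter_=()):
    """Assemble a bool query, leaving out clause types that are empty."""
    clauses = {}
    for name, items in (("must", must), ("filter", filter_),
                        ("should", should), ("must_not", must_not)):
        if items:
            clauses[name] = list(items)
    return {"query": {"bool": clauses}}


# age equals 40 and state is not ID, as in the last request above
body = bool_query(must=[match("age", "40")], must_not=[match("state", "ID")])
print(json.dumps(body, indent=2))
```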
Filtered search
- Use filter to restrict the results
GET /bank/_search
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 20000,
"lte": 30000
}
}
}
}
}
}
// For reference only
GET /index/type/_search?pretty
{
"query": {
"bool":{
"filter":[
{
"bool":{
"must":[{
"match_phrase":{
"sessionId":{
"query":"2123",
"slop":0,
"zero_terms_query":"NONE",
"boost":1
}
}
}],
"adjust_pure_negative":true,
"boost":1
}
}
]
}
}
}
- Filters Aggregation: one bucket per named filter
GET logs/_search
{
"size": 0,
"aggs": {
"messages": {
"filters": {
"filters": {
"errors": {
"match": {
"body": "error"
}
},
"warnings": {
"match": {
"body": "warning"
}
}
}
}
}
}
}
- Set the key of the bucket that collects documents matching none of the filters
GET logs/_search
{
"size": 0,
"aggs": {
"messages": {
"filters": {
"other_bucket_key": "other_messages",
"filters": {
"errors": {
"match": {
"body": "error"
}
},
"warnings": {
"match": {
"body": "warning"
}
}
}
}
}
}
}
- term filter
{
"query":{
"term":{
"hostname":"activity.report"
}
}
}
// Filter on multiple values
{
"query":{
"terms":{
"status":[
303,
304
]
}
}
}
- range filter
lt: less than
lte: less than or equal to
gt: greater than
gte: greater than or equal to
Find documents with age between 20 and 30:
{
"query":{
"range":{
"age":{
"gte":20,
"lte":30
}
}
}
}
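- exists and missing filters
exists: matches documents that contain a value for the field.
missing: matches documents that have no value for the field. The missing query was removed in Elasticsearch 5.0; on 6.8.7, wrap exists in a bool must_not clause instead.
{
"exists":{
"field":"title"
}
}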
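Since the standalone missing query is not available on 6.x, the two cases can be expressed as follows. This is a sketch with hypothetical helper names; it only builds the request bodies.

```python
import json


def exists_query(field: str) -> dict:
    """Match documents that have a value for `field`."""
    return {"query": {"exists": {"field": field}}}


def missing_query(field: str) -> dict:
    """Match documents without a value for `field` (exists inside must_not)."""
    return {"query": {"bool": {"must_not": [{"exists": {"field": field}}]}}}


print(json.dumps(exists_query("title")))
print(json.dumps(missing_query("title")))
```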
3.6 Aggregations
Terms aggregation
- Aggregate the search results with terms, similar to GROUP BY in MySQL
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
}
}
}
}
- Return up to 20 buckets
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age",
"size": 20
}
}
}
}
Handling missing values in an aggregation
GET /_search
{
"aggs" : {
"tags" : {
"terms" : {
"field" : "tags",
"missing": "N/A"
}
}
}
}
Ordering buckets with order
- order sets the bucket ordering (here by document count)
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age",
"order" : { "_count" : "asc" }
}
}
}
}
- Order by bucket key
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age",
"order" : { "_key" : "asc" }
}
}
}
}
- Order by a sub-aggregation metric
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age",
"order": {
"max_balance": "asc"
}
},
"aggs": {
"max_balance": {
"max": {
"field": "balance"
}
},
"min_balance": {
"min": {
"field": "balance"
}
}
}
}
}
}
- Filter buckets by regular expression
GET /_search
{
"aggs" : {
"tags" : {
"terms" : {
"field" : "tags",
"include" : ".*sport.*",
"exclude" : "water_.*"
}
}
}
}
- Filter buckets by an explicit list of values
GET /_search
{
"aggs" : {
"JapaneseCars" : {
"terms" : {
"field" : "make",
"include" : ["mazda", "honda"]
}
},
"ActiveCarManufacturers" : {
"terms" : {
"field" : "make",
"exclude" : ["rover", "jensen"]
}
}
}
}
- Bucket by a value computed in a script
GET /_search
{
"aggs" : {
"genres" : {
"terms" : {
"script" : {
"source": "doc['genre'].value",
"lang": "painless"
}
}
}
}
}
Nested aggregations
For example, aggregate on the state field to count the documents per state, then compute the average balance within each state
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
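To see what a terms aggregation with an avg sub-aggregation computes, the sketch below reproduces the grouping locally on the five sample documents listed in the data preparation step. It groups by gender rather than state because every sample document has a different state. The function name is hypothetical, and the output shape only loosely resembles a real aggregation response.

```python
from collections import defaultdict

# The five sample documents, reduced to the fields used here.
docs = [
    {"gender": "M", "state": "IL", "balance": 39225},
    {"gender": "M", "state": "TN", "balance": 5686},
    {"gender": "F", "state": "VA", "balance": 32838},
    {"gender": "M", "state": "MD", "balance": 4180},
    {"gender": "M", "state": "WA", "balance": 16418},
]


def terms_with_avg(documents, group_field, metric_field):
    """Group by group_field, then average metric_field inside each bucket."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[group_field]].append(doc[metric_field])
    # Largest buckets first, ties broken by key.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        {"key": key, "doc_count": len(vals), "average_balance": sum(vals) / len(vals)}
        for key, vals in ordered
    ]


for bucket in terms_with_avg(docs, "gender", "balance"):
    print(bucket)
```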
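Range aggregation
- For example, bucket the age field into the ranges [20,30), [30,40), [40,50) (from is inclusive, to is exclusive), then count the documents and compute the average balance per gender within each range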
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_age": {
"range": {
"field": "age",
"ranges": [
{
"from": 20,
"to": 30
},
{
"from": 30,
"to": 40
},
{
"from": 40,
"to": 50
}
]
},
"aggs": {
"group_by_gender": {
"terms": {
"field": "gender.keyword"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
}
}
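In a range aggregation each bucket includes its from value and excludes its to value, so a document with age 30 lands in the 30-40 bucket only. A minimal local sketch of that rule, using the ages of the five sample documents; the function name and the bucket key format are illustrative assumptions.

```python
def range_buckets(values, ranges):
    """Count values per (low, high) range; low is inclusive, high is exclusive."""
    buckets = []
    for low, high in ranges:
        count = sum(1 for v in values if low <= v < high)
        buckets.append({"key": f"{low}-{high}", "doc_count": count})
    return buckets


ages = [32, 36, 28, 33, 36]  # ages of the five sample documents
for bucket in range_buckets(ages, [(20, 30), (30, 40), (40, 50)]):
    print(bucket)
```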
Metric aggregations: max, min, sum, avg
POST /bank/_search
{
"size": 0,
"aggs": {
"max_balance": {
"max": {
"field": "balance"
}
}
}
}
- Find the maximum balance among customers aged 24
POST /bank/_search
{
"size": 2,
"query": {
"match": {
"age": 24
}
},
"sort": [
{
"balance": {
"order": "desc"
}
}
],
"aggs": {
"max_balance": {
"max": {
"field": "balance"
}
}
}
}
- Values computed by a script: the average age of all customers, and the average of age plus 10
POST /bank/_search?size=0
{
"aggs": {
"avg_age": {
"avg": {
"script": {
"source": "doc.age.value"
}
}
},
"avg_age10": {
"avg": {
"script": {
"source": "doc.age.value + 10"
}
}
}
}
}
- Specify field and use _value inside the script to read the field value
POST /bank/_search?size=0
{
"aggs": {
"sum_balance": {
"sum": {
"field": "balance",
"script": {
"source": "_value * 1.03"
}
}
}
}
}
- Provide a default for documents that have no value in the field. If missing is not set, documents without a value are ignored.
POST /bank/_search?size=0
{
"aggs": {
"avg_age": {
"avg": {
"field": "age",
"missing": 18
}
}
}
}
- Value count: count the values of a field across the matching documents
POST /bank/_search?size=0
{
"aggs": {
"age_count": {
"value_count": {
"field": "age"
}
}
}
}
- cardinality: count distinct values (the result is approximate)
POST /bank/_search?size=0
{
"aggs": {
"age_count": {
"cardinality": {
"field": "age"
}
},
"state_count": {
"cardinality": {
"field": "state.keyword"
}
}
}
}
- stats: returns five values: count, max, min, avg, sum
POST /bank/_search?size=0
{
"aggs": {
"age_stats": {
"stats": {
"field": "age"
}
}
}
}
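As a sanity check for the five values that stats returns, the sketch below computes them locally for the ages of the five sample documents. The function name is hypothetical and the empty-input behaviour is an assumption of this sketch.

```python
def stats(values):
    """Compute count, min, max, avg and sum, like the stats aggregation."""
    values = list(values)
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


print(stats([32, 36, 28, 33, 36]))
```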
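- Extended stats: adds four results on top of stats: sum of squares, variance, standard deviation, and the bounds of the mean plus/minus two standard deviations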
POST /bank/_search?size=0
{
"aggs": {
"age_stats": {
"extended_stats": {
"field": "age"
}
}
}
}
- Percentiles: the field values found at given percentiles
POST /bank/_search?size=0
{
"aggs": {
"age_percents": {
"percentiles": {
"field": "age"
}
}
}
}
// Specify the percentiles to compute
POST /bank/_search?size=0
{
"aggs": {
"age_percents": {
"percentiles": {
"field": "age",
"percents" : [95, 99, 99.9]
}
}
}
}
- Percentile ranks: the percentage of values less than or equal to each given value. The results are not fixed (to be verified)
POST /bank/_search?size=0
{
"aggs": {
"age_perc_rank": {
"percentile_ranks": {
"field": "age",
"values": [
25,
30
]
}
}
}
}
Range Aggregation: bucket by value ranges
- Bucket by range, with the maximum balance per bucket
POST /bank/_search?size=0
{
"aggs": {
"age_range": {
"range": {
"field": "age",
"ranges": [
{
"to": 25
},
{
"from": 25,
"to": 35
},
{
"from": 35
}
]
},
"aggs": {
"bmax": {
"max": {
"field": "balance"
}
}
}
}
}
}
Date Range Aggregation: bucket by date ranges
POST /bank/_search?size=0
{
"aggs": {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyy",
"ranges": [
{
"to": "now-10M/M"
},
{
"from": "now-10M/M"
}
]
}
}
}
}
Date Histogram Aggregation: histogram over time
Buckets can use an interval of year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), or second (1s), or a custom time interval.
POST /bank/_search?size=0
{
"aggs": {
"sales_over_time": {
"date_histogram": {
"field": "date",
"interval": "month"
}
}
}
}
Missing Aggregation: a bucket of the documents that have no value for a field
POST /bank/_search?size=0
{
"aggs" : {
"account_without_a_age" : {
"missing" : { "field" : "age" }
}
}
}
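Spring Boot Elasticsearch annotations
- Elasticsearch annotations
@Document: applied to a class; marks the entity class as a document object
Attributes:
indexName: name of the corresponding index
type: the type within the index
shards: number of shards, default 5
replicas: number of replicas, default 1
@Id: applied to a member variable; marks the field as the document id (primary key)
@Field: applied to a member variable; marks it as a document field and specifies its mapping attributes
Attributes:
type: field type, an enum (FieldType); values include text, long, short, date, integer, object, and others
type value: meaning
text: analyzed (tokenized) when stored and indexed for full-text search
keyword: not analyzed; the whole value is indexed as a single term
Numeric: two groups. Basic types: long, integer, short, byte, double, float, half_float. High-precision floating point: scaled_float, which needs a scaling factor such as 10 or 50; Elasticsearch multiplies the real value by this factor before storing it and restores it on read
Date: Elasticsearch can store dates as formatted strings, but storing epoch milliseconds as a long is recommended to save space
index: whether the field is indexed; boolean, default true
store: whether the field is stored separately; boolean, default false
analyzer: analyzer name; for example, ik_max_word uses the IK analyzer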
Notes
- The number of primary shards (number_of_shards) of an index cannot be changed without rebuilding the index, but the number of replicas (number_of_replicas) can be adjusted at any time.
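Changing the replica count goes through the index settings endpoint (PUT /index/_settings). A minimal sketch of the request body follows; the helper name is hypothetical, and the value 2 is only an example.

```python
import json


def replicas_settings(count: int) -> dict:
    """Body for PUT /index/_settings that changes number_of_replicas."""
    if count < 0:
        raise ValueError("replica count must not be negative")
    return {"index": {"number_of_replicas": count}}


print(json.dumps(replicas_settings(2)))
```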