Elasticsearch Reference Guide

1. Installing Elasticsearch

  • Elasticsearch 6.8.7:
    ES 6.8.7 download link

  • Install the Chinese word-segmentation plugin (IK analyzer) by running the following command from the elasticsearch-6.8.7\bin directory:

elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.7/elasticsearch-analysis-ik-6.8.7.zip
  • Start Elasticsearch
./bin/elasticsearch

2. Installing Kibana, the visualization UI for Elasticsearch

// Edit the Kibana configuration file
vim config/kibana.yml
elasticsearch.hosts: ["http://localhost:9200"]
  • Start Kibana
./bin/kibana

Then open http://localhost:5601 in a browser.

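3. Common Elasticsearch commands

Elasticsearch data hierarchy: index -> type -> document -> field

3.1 Checking cluster status

You can use curl:

curl -XGET 'http://localhost:9200/_cat/health?v'
// View cluster statistics
curl -XGET 'http://localhost:9200/_cluster/stats?pretty'
  • Check cluster health
GET /_cat/health?v
  • Check node status
GET /_cat/nodes?v
  • List all indices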
GET /_cat/indices?v
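
The same _cat endpoints can also be called from a script. The following is a minimal Python sketch using only the standard library; the base URL and the helper names `cat_url` and `cat` are assumptions made for illustration, not part of any Elasticsearch client.

```python
from urllib.request import urlopen

# Assumption: a local single-node cluster on the default port.
ES_BASE = "http://localhost:9200"


def cat_url(endpoint, base=ES_BASE, verbose=True):
    """Build a _cat API URL; the `v` flag asks for a header row."""
    url = f"{base}/_cat/{endpoint}"
    return f"{url}?v" if verbose else url


def cat(endpoint, base=ES_BASE, timeout=5):
    """Fetch a _cat endpoint and return its plain-text table."""
    with urlopen(cat_url(endpoint, base), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (requires a running cluster):
#   for name in ("health", "nodes", "indices"):
#       print(cat(name))
```

Keeping URL construction separate from the network call makes the helper easy to check without a running cluster.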

3.2 Index operations

  • Create an index and verify it
PUT /customer

GET /_cat/indices?v
  • Delete an index and verify it
DELETE /customer
GET /_cat/indices?v

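3.3 Type operations

  • View the mapping of a type
// Format: /index/type/_mapping
GET /bank/account/_mapping
  • Check whether an index is writable
GET /<index>/_settings

// A read-only index shows the following block in its settings:
"blocks": {
          "read_only_allow_delete": "true"
        },
  • Update the index settings (clear the read-only block)
PUT /<index>/_settings
{
"index.blocks.read_only_allow_delete": null
}
  • Add a new field
PUT /<index>/_mapping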
{
    "properties": {
        "hight": {
            "type": "integer"
        }
    }
}

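Method 2: populate a new field from existing data with _update_by_query (here, ipSection is derived from ip)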
POST /索引/_update_by_query
{
  "script": {
    "source": "def a=ctx._source['ip'].lastIndexOf('.');def sec=ctx._source['ip'].substring(0,a);ctx._source['ipSection']=sec+'.0'"
  },
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "ip"
          }
        }
      ]
    }
  }
}
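
To make the Painless script in the _update_by_query request easier to follow, here is a small Python sketch of the same string logic: take everything before the last dot of the IP and append ".0". The function name `ip_section` is hypothetical, and the explicit error for a value without a dot is an added assumption (the Painless version would fail on `substring` in that case).

```python
def ip_section(ip: str) -> str:
    """Mirror of the Painless script: replace the last octet with 0."""
    last_dot = ip.rfind(".")  # same role as lastIndexOf('.') in Painless
    if last_dot < 0:
        # Assumption: treat a value without any dot as invalid input.
        raise ValueError(f"not a dotted address: {ip!r}")
    return ip[:last_dot] + ".0"
```

For example, `ip_section("192.168.1.23")` yields `"192.168.1.0"`, which is the value the script would write into ipSection.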

3.4 Document operations

  • Add a document to an index
// Format: /index/type/document ID
PUT /customer/doc/1
{
  "name": "John Doe"
}
  • Retrieve a document from the index
GET /customer/doc/1
  • Update a document in the index
POST /customer/doc/1/_update
{
  "doc": { "name": "Jane Doe" }
}
  • Delete a document from the index
DELETE /customer/doc/1
  • Run bulk operations against documents in the index
POST /customer/doc/_bulk
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
  • Periodically delete index data
curl -XPOST -uxxx:xxx 'http://xxxx:9200/<index>/_delete_by_query' -H 'Content-Type:application/json' -d '{"query": {"bool": {"must": [{"range": {"@timestamp": {"lte": "now","format": "epoch_millis"}}}],"must_not": []}}}'

Matching documents are only marked as deleted; Elasticsearch reclaims the space later, when segments are merged.
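
The _bulk request shown earlier in this section uses newline-delimited JSON: one action line, then one source line, per document, with a trailing newline at the end. A minimal Python sketch of building such a body follows; the helper name `bulk_index_body` is an assumption for illustration.

```python
import json


def bulk_index_body(docs):
    """Build an NDJSON body for _bulk from a {doc_id: source} mapping."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: tells Elasticsearch to index the next line under this ID.
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_index_body({"1": {"name": "John Doe"}, "2": {"name": "Jane Doe"}})
```

The resulting `body` would be sent to /customer/doc/_bulk with a Content-Type of application/x-ndjson.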

3.5 Searching data

Data preparation: data link
POST /bank/account/_bulk
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
{"index":{"_id":"13"}}
{"account_number":13,"balance":32838,"firstname":"Nanette","lastname":"Bates","age":28,"gender":"F","address":"789 Madison Street","employer":"Quility","email":"nanettebates@quility.com","city":"Nogal","state":"VA"}
{"index":{"_id":"18"}}
{"account_number":18,"balance":4180,"firstname":"Dale","lastname":"Adams","age":33,"gender":"M","address":"467 Hutchinson Court","employer":"Boink","email":"daleadams@boink.com","city":"Orick","state":"MD"}
{"index":{"_id":"20"}}
{"account_number":20,"balance":16418,"firstname":"Elinor","lastname":"Ratliff","age":36,"gender":"M","address":"282 Kings Place","employer":"Scentric","email":"elinorratliff@scentric.com","city":"Ribera","state":"WA"}
Simple search

Format: /index/_search

  • The simplest search uses match_all to match every document
GET /bank/_search
{
  "query": { "match_all": {} }
}
Paginated search

from is the offset of the first hit (starting at 0), and size is the number of hits per page.

GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 10
}
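
As a small illustration of how from and size relate to page numbers, the sketch below converts a 1-based page number into the two parameters. The function name `page_params` is hypothetical; the 10,000 limit reflects the default value of index.max_result_window, which may be configured differently on a given index.

```python
# Default of index.max_result_window; from + size may not exceed it.
MAX_RESULT_WINDOW = 10_000


def page_params(page: int, size: int = 10) -> dict:
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size
    if offset + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the default result window")
    return {"from": offset, "size": size}
```

For deeper paging than the result window allows, the scroll or search_after mechanisms are the usual alternatives.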
Sorted search, using sort
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}
_source: return only the specified fields
GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}
Conditional search
  • Conditional search, using match to express the condition
GET /bank/_search
{
  "query": {
    "match": {
      "account_number": 20
    }
  }
}
  • Conditional search on a text field: match runs an analyzed full-text match, not an exact comparison
GET /bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  },
  "_source": [
    "address",
    "account_number"
  ]
}
  • Phrase search, using match_phrase
GET /bank/_search
{
  "query": {
    "match_phrase": {
      "address": "mill lane"
    }
  }
}
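Compound search

For non-real-time paginated queries and export scenarios, a bool query is commonly used to combine multiple conditions.

A bool query supports four kinds of clauses:

must
filter
should
must_not
  • Compound search using bool; must means every condition has to match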
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
  • Compound search; should means at least one of the conditions has to match
GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
  • Compound search; must_not means none of the conditions may match, for example documents whose address contains neither mill nor lane
GET /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
  • Compound search combining must and must_not, for example documents whose age equals 40 and whose state does not contain ID
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
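
When such bool queries are assembled in application code, a small builder keeps the four clause types explicit. The following Python sketch is one possible shape; the function name `bool_query` is an assumption, and empty clause lists are simply omitted from the request body.

```python
def bool_query(must=(), filter=(), should=(), must_not=()):
    """Assemble a bool query body, leaving out clauses that are empty."""
    clauses = {
        "must": list(must),          # all must match, contributes to score
        "filter": list(filter),      # all must match, no scoring
        "should": list(should),      # optional matches that raise the score
        "must_not": list(must_not),  # none may match
    }
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


# Same request as the must + must_not example above.
body = bool_query(
    must=[{"match": {"age": "40"}}],
    must_not=[{"match": {"state": "ID"}}],
)
```

The resulting `body` can be serialized with `json.dumps` and sent to /bank/_search.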
Filtered search
  • Filter search results, using filter
GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}

// For reference only
GET /index/type/_search?pretty
{
	"query": {
		"bool":{
			"filter":[
				{
					"bool":{
						"must":[{
							"match_phrase":{
								"sessionId":{
									"query":"2123",
									"slop":0,
									"zero_terms_query":"NONE",
									"boost":1
								}
							}
						}],
						"adjust_pure_negative":true,
						"boost":1
					}
				}
			]
		}
	}
}
  • Filters aggregation: compute buckets for several named filters at once
GET logs/_search
{
  "size": 0,
  "aggs": {
    "messages": {
      "filters": {
        "filters": {
          "errors": {
            "match": {
              "body": "error"
            }
          },
          "warnings": {
            "match": {
              "body": "warning"
            }
          }
        }
      }
    }
  }
}
  • Specify a key for the bucket that collects documents matching none of the filters
GET logs/_search
{
  "size": 0,
  "aggs": {
    "messages": {
      "filters": {
        "other_bucket_key": "other_messages",
        "filters": {
          "errors": {
            "match": {
              "body": "error"
            }
          },
          "warnings": {
            "match": {
              "body": "warning"
            }
          }
        }
      }
    }
  }
}
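  • term filter
   {
       "query":{
         "term":{
           "hostname":"activity.report"
           }
         }
    }

// Filter on multiple values
 {
   "query":{
     "terms":{
       "status":[
         303,
         304
       ]
     }
   }
 }
  • range filter
    lt: less than
    lte: less than or equal to
    gt: greater than
    gte: greater than or equal to

Find documents with an age between 20 and 30 (inclusive):
 {
   "query":{
     "range":{
       "age":{
         "gte":20,
         "lte":30
       }
     }
   }
 }
  • exists and missing filters
    exists: matches documents that contain a value for the field.
    missing: matches documents that have no value for the field. The standalone missing query was removed in Elasticsearch 5.0; on 6.x, wrap exists in a bool must_not clause instead.
{
  "query":{
    "exists":{
      "field":"title"
    }
  }
}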

3.6 Aggregations

terms aggregation
  • Aggregate search results with terms, similar to GROUP BY in MySQL
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
  • Return up to 20 buckets
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "size": 20
      }
    }
  }
}
Handling missing values in an aggregation
GET /_search
{
    "aggs" : {
        "tags" : {
             "terms" : {
                 "field" : "tags",
                 "missing": "N/A" 
             }
         }
    }
}
Ordering buckets with order
  • order specifies how buckets are sorted (here, by document count)
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order" : { "_count" : "asc" }
      }
    }
  }
}
  • Sort by bucket key
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order" : { "_key" : "asc" }
      }
    }
  }
}
  • Sort by the value of a sub-aggregation metric
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order": {
          "max_balance": "asc"
        }
      },
      "aggs": {
        "max_balance": {
          "max": {
            "field": "balance"
          }
        },
        "min_balance": {
          "min": {
            "field": "balance"
          }
        }
      }
    }
  }
}
  • Filter buckets with regular expressions
GET /_search
{
    "aggs" : {
        "tags" : {
            "terms" : {
                "field" : "tags",
                "include" : ".*sport.*",
                "exclude" : "water_.*"
            }
        }
    }
}
  • Filter buckets with an explicit list of values
GET /_search
{
    "aggs" : {
        "JapaneseCars" : {
             "terms" : {
                 "field" : "make",
                 "include" : ["mazda", "honda"]
             }
         },
        "ActiveCarManufacturers" : {
             "terms" : {
                 "field" : "make",
                 "exclude" : ["rover", "jensen"]
             }
         }
    }
}
  • Bucket by a value computed in a script
GET /_search
{
    "aggs" : {
        "genres" : {
            "terms" : {
                "script" : {
                    "source": "doc['genre'].value",
                    "lang": "painless"
                }
            }
        }
    }
}
Nested aggregations

For example, aggregate on the state field to count the documents per state, then compute the average balance within each state.

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
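range aggregation
  • For example, split the age field into the ranges [20,30), [30,40), and [40,50) (from is inclusive, to is exclusive), then count documents and average balance per gender within each range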
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}
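
The boundary behavior of the range aggregation is easy to get wrong: each bucket includes its from value and excludes its to value. The Python sketch below emulates only the bucket counting on an in-memory list, as an illustration; `range_bucket_counts` is a hypothetical helper, and the ages are taken from the five sample accounts listed in the data preparation step.

```python
def range_bucket_counts(values, ranges):
    """Count values per range, with `from` inclusive and `to` exclusive."""
    counts = {}
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        # Keys follow the "from-to" style, with * for an open end.
        key = f"{'*' if lo is None else float(lo)}-{'*' if hi is None else float(hi)}"
        counts[key] = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
    return counts


ages = [32, 36, 28, 33, 36]  # ages of the five sample accounts
ranges = [{"from": 20, "to": 30}, {"from": 30, "to": 40}, {"from": 40, "to": 50}]
buckets = range_bucket_counts(ages, ranges)
```

A document with age 30 therefore lands in the second bucket, not the first.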
max, min, sum, and avg metric aggregations
POST /bank/_search
{
  "size": 0, 
  "aggs": {
    "max_balance": {
      "max": {
        "field": "balance"
      }
    }
  }
}
  • Find the maximum balance among customers aged 24
POST /bank/_search
{
  "size": 2, 
  "query": {
    "match": {
      "age": 24
    }
  },
  "sort": [
    {
      "balance": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "max_balance": {
      "max": {
        "field": "balance"
      }
    }
  }
}
  • Values computed by a script: the average age of all customers, and the same average with 10 added to each age
POST /bank/_search?size=0
{
  "aggs": {
    "avg_age": {
      "avg": {
        "script": {
          "source": "doc.age.value"
        }
      }
    },
    "avg_age10": {
      "avg": {
        "script": {
          "source": "doc.age.value + 10"
        }
      }
    }
  }
}
  • Specify a field and read its value in the script through _value
POST /bank/_search?size=0
{
  "aggs": {
    "sum_balance": {
      "sum": {
        "field": "balance",
        "script": {
            "source": "_value * 1.03"
        }
      }
    }
  }
}
  • Supply a default for documents that lack the field. If missing is not set, documents without a value for the field are ignored.
POST /bank/_search?size=0
{
  "aggs": {
    "avg_age": {
      "avg": {
        "field": "age",
        "missing": 18
      }
    }
  }
}
Value count: the number of documents that have a value for the field
POST /bank/_search?size=0
{
  "aggs": {
    "age_count": {
      "value_count": {
        "field": "age"
      }
    }
  }
}
  • cardinality: count of distinct values
POST /bank/_search?size=0
{
  "aggs": {
    "age_count": {
      "cardinality": {
        "field": "age"
      }
    },
    "state_count": {
      "cardinality": {
        "field": "state.keyword"
      }
    }
  }
}
  • stats: returns five values (count, max, min, avg, sum)
POST /bank/_search?size=0
{
  "aggs": {
    "age_stats": {
      "stats": {
        "field": "age"
      }
    }
  }
}
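  • Extended stats: returns four more results than stats: sum of squares, variance, standard deviation, and the interval of the mean plus/minus two standard deviations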
POST /bank/_search?size=0
{
  "aggs": {
    "age_stats": {
      "extended_stats": {
        "field": "age"
      }
    }
  }
}
  • Percentiles: the field values found at given percentiles
POST /bank/_search?size=0
{
  "aggs": {
    "age_percents": {
      "percentiles": {
        "field": "age"
      }
    }
  }
}

// Specify which percentiles to compute
POST /bank/_search?size=0
{
  "aggs": {
    "age_percents": {
      "percentiles": {
        "field": "age",
        "percents" : [95, 99, 99.9] 
      }
    }
  }
}
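  • Percentile ranks: the percentage of documents whose value is less than or equal to each given value. The results are approximate and were observed to vary (still to be verified).
POST /bank/_search?size=0
{
  "aggs": {
    "age_perc_rank": {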
      "percentile_ranks": {
        "field": "age",
        "values": [
          25,
          30
        ]
      }
    }
  }
}
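
To clarify what percentile_ranks is estimating, the sketch below computes the exact share of values at or below a threshold on an in-memory list. This is an illustration only: Elasticsearch computes the figure with an approximation algorithm and may interpolate, so its results on the same data can differ from this exact count. The function name `percentile_rank` is hypothetical, and the ages come from the five sample accounts in the data preparation step.

```python
def percentile_rank(values, threshold):
    """Exact percentage of values that are <= threshold."""
    if not values:
        raise ValueError("values must not be empty")
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)


ages = [32, 36, 28, 33, 36]  # ages of the five sample accounts
ranks = {v: percentile_rank(ages, v) for v in (25, 30)}
```

Comparing such an exact figure with the aggregation result is one way to check how far the approximation drifts on a small index.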
Range Aggregation: bucket documents by value range
  • Range buckets
POST /bank/_search?size=0
{
  "aggs": {
    "age_range": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "to": 25
          },
          {
            "from": 25,
            "to": 35
          },
          {
            "from": 35
          }
        ]
      },
      "aggs": {
        "bmax": {
          "max": {
            "field": "balance"
          }
        }
      }
    }
  }
}
Date Range Aggregation: bucket documents by date range
POST /bank/_search?size=0
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "date",
        "format": "MM-yyy",
        "ranges": [
          {
            "to": "now-10M/M"
          },
          {
            "from": "now-10M/M"
          }
        ]
      }
    }
  }
}
Date Histogram Aggregation: time-based histogram (bar chart) buckets

Buckets can use the intervals year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), or second (1s), or a custom time interval.

POST /bank/_search?size=0
{
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "date",
        "interval": "month"
      }
    }
  }
}
Missing Aggregation: a bucket of documents that lack a value for the field
POST /bank/_search?size=0
{
    "aggs" : {
        "account_without_a_age" : {
            "missing" : { "field" : "age" }
        }
    }
}
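Spring Boot Elasticsearch annotations
  • Elasticsearch annotations
@Document: applied to a class; marks the entity class as a document object
Attributes:
	indexName: name of the corresponding index
	type: the type within the index
	shards: number of shards, default 5
	replicas: number of replicas, default 1

@Id: applied to a member variable; marks the field as the document ID (primary key)

@Field: applied to a member variable; marks it as a document field and specifies its mapping attributes
Attributes:
type: the field type, a FieldType enum value such as text, long, short, date, integer, or object

type value	Meaning
	text	Analyzed (tokenized) when stored, and indexed for full-text search
	keyword	Not analyzed; the whole value is indexed as a single term for exact matching
	Numeric	Numeric types fall into two groups. Basic types: long, integer, short, byte, double, float, half_float. High-precision floating point: scaled_float, which requires a scaling factor such as 10 or 50; Elasticsearch multiplies the real value by this factor before storing it and divides it back out on retrieval
	Date	Elasticsearch can store dates as formatted strings, but storing them as millisecond timestamps (long) is recommended to save space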


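index: whether the field is indexed; boolean, default true
store: whether the field is stored separately; boolean, default false
analyzer: name of the analyzer; ik_max_word here selects the IK analyzer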
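
The scaled_float behavior described in the type table can be illustrated with plain arithmetic: the value is multiplied by the scaling factor and rounded to a whole number for storage, then divided by the factor on the way back. The Python sketch below shows only this arithmetic; the function names are hypothetical, and rounding of exact ties may differ between Python's `round` and Elasticsearch.

```python
def encode_scaled(value: float, scaling_factor: int) -> int:
    """Value as it would be stored: scaled and rounded to a whole number."""
    return round(value * scaling_factor)


def decode_scaled(stored: int, scaling_factor: int) -> float:
    """Recover the (possibly less precise) value from its stored form."""
    return stored / scaling_factor


price = decode_scaled(encode_scaled(12.34, 100), 100)   # two decimals kept
coarse = decode_scaled(encode_scaled(12.345, 10), 10)   # only one decimal kept
```

A larger scaling factor preserves more decimal places at the cost of larger stored integers, so the factor should match the precision the data actually needs.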

Notes

  1. The number of primary shards (number_of_shards) of an index cannot be changed without reindexing, but the number of replicas (number_of_replicas) can be adjusted at any time.
