1 Installation
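1.1 Check the Java version
- Elasticsearch 6.0.1, the latest release when this was written, requires at least Java 8
- The reference manual recommends Oracle JDK version 1.8.0_131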
java -version
echo $JAVA_HOME
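The same check can be scripted. The sketch below uses hypothetical helpers (not part of Elasticsearch) that pull the version out of `java -version` output and compare its major version with the Java 8 minimum. It assumes the usual `version "<number>"` wording, and it handles both the legacy `1.x` numbering used up to Java 8 and the plain major number used from Java 9 on.

```python
import re


def version_from_output(text: str) -> str:
    """Extract the quoted version, e.g. 'java version "1.8.0_131"' -> '1.8.0_131'.

    Hypothetical helper; assumes the usual `version "<number>"` wording.
    """
    match = re.search(r'version "([^"]+)"', text)
    if match is None:
        raise ValueError("no version string found")
    return match.group(1)


def java_major_version(version: str) -> int:
    """Map a version string to its major version.

    Java 8 and earlier use the legacy scheme ("1.8.0_131" -> 8);
    Java 9 and later start with the major number ("11.0.2" -> 11).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy scheme: 1.8 means Java 8
    return first


def meets_minimum(version: str, minimum: int = 8) -> bool:
    """True if `version` satisfies the Java 8 minimum stated above."""
    return java_major_version(version) >= minimum
```

Note that `java -version` writes to stderr, so when capturing it with `subprocess.run(["java", "-version"], capture_output=True, text=True)` read the `.stderr` attribute.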
1.2 Install on Linux
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.0.1.tar.gz
tar -xvf elasticsearch-6.0.1.tar.gz
cd elasticsearch-6.0.1/bin
./elasticsearch
1.3 Install on Windows
# Windows command prompt (cmd)
cd %PROGRAMFILES%\Elastic\Elasticsearch\bin
# PowerShell
cd $env:PROGRAMFILES\Elastic\Elasticsearch\bin
.\elasticsearch.exe
1.4 Start the cluster
# specify the cluster name and node name at startup
./elasticsearch -Ecluster.name=my_cluster_name -Enode.name=my_node_name
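If you launch nodes from a script, the `-E` options can be generated from a dict of settings. `es_command` below is a hypothetical helper; its default executable path assumes you run it from the `elasticsearch-6.0.1/bin` directory of section 1.2. Passing the resulting list (rather than a single string) to `subprocess.run` avoids shell quoting of setting values.

```python
def es_command(settings: dict, executable: str = "./elasticsearch") -> list:
    """Build an argv list that passes each setting as -E<name>=<value>.

    Hypothetical helper; the default executable assumes the current
    directory is elasticsearch-6.0.1/bin.
    """
    # dicts keep insertion order, so the flags appear in the order given
    return [executable] + [f"-E{name}={value}" for name, value in settings.items()]
```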
2 Exploring the cluster
2.1 REST API
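What you can do with the REST API:
- Check cluster, node, and index health, status, and statistics
- Administer cluster, node, and index data and metadata
- Perform CRUD (create, read, update, delete) and search operations against indexes
- Execute advanced search operations such as paging, sorting, filtering, scripting, and aggregations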
2.2 Install Kibana
- Download and extract Kibana
- In config/kibana.yml, set elasticsearch.url to the address of your Elasticsearch instance (e.g. http://localhost:9200)
- Run Kibana
# linux
bin/kibana
# windows
bin\kibana.bat
- Open http://localhost:5601 in a browser
- See the Kibana User Guide for details
2.3 Cluster health
2.3.1 Check cluster health
GET /_cat/health?v
To run the request from Postman and get the result as JSON, add format=json:
GET 127.0.0.1:9200/_cat/health?format=json&pretty
# response
[
{
"epoch": "1541249930",
"timestamp": "20:58:50",
"cluster": "elasticsearch",
"status": "green",
"node.total": "1",
"node.data": "1",
"shards": "0",
"pri": "0",
"relo": "0",
"init": "0",
"unassign": "0",
"pending_tasks": "0",
"max_task_wait_time": "-",
"active_shards_percent": "100.0%"
}
]
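There are three health statuses:
- green: everything is good; all shards are allocated and the cluster is fully functional
- yellow: all data is available, but some replicas are not allocated; the cluster is still fully functional
- red: some data is unavailable; the cluster is only partially functional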
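A minimal sketch for checking health from a script, assuming a node on 127.0.0.1:9200 as above and using only the Python standard library. The helper names are hypothetical, and `fetch_health` needs a running node, so it is not exercised here. Note that _cat values, even counts, come back as strings.

```python
import json
import urllib.request


def parse_health(body: str) -> dict:
    """Return the single row of a _cat/health?format=json response."""
    rows = json.loads(body)
    return rows[0]


def fetch_health(base_url: str = "http://127.0.0.1:9200") -> dict:
    """Query a running node (assumes it is reachable at base_url)."""
    with urllib.request.urlopen(base_url + "/_cat/health?format=json") as resp:
        return parse_health(resp.read().decode("utf-8"))


def is_fully_functional(health: dict) -> bool:
    """green and yellow clusters are fully functional; red is only partially."""
    return health["status"] in ("green", "yellow")
```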
2.3.2 List the nodes in the cluster
GET /_cat/nodes?v
GET 127.0.0.1:9200/_cat/nodes?format=json&pretty
# response
[
{
"ip": "127.0.0.1",
"heap.percent": "11",
"ram.percent": "41",
"cpu": "8",
"load_1m": null,
"load_5m": null,
"load_15m": null,
"node.role": "mdi",
"master": "*",
"name": "my_first_node"
}
]
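The node.role and master columns are compact codes. This sketch decodes them, assuming the ES 6.x letters m (master-eligible), d (data), and i (ingest), with "-" for a coordinating-only node, and "*" in the master column marking the elected master. The helper names are hypothetical.

```python
# Assumed ES 6.x role letters in the node.role column of _cat/nodes.
ROLE_LETTERS = {"m": "master-eligible", "d": "data", "i": "ingest"}


def decode_roles(node_role: str) -> list:
    """'mdi' -> ['master-eligible', 'data', 'ingest']; '-' -> coordinating only."""
    if node_role == "-":
        return ["coordinating-only"]
    return [ROLE_LETTERS.get(letter, f"unknown ({letter})") for letter in node_role]


def is_elected_master(row: dict) -> bool:
    """The master column is '*' for the elected master and '-' otherwise."""
    return row.get("master") == "*"
```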
2.4 List all indices
GET /_cat/indices?v
2.5 Create an index
Create an index named customer, then list all indices:
PUT /customer?pretty
GET /_cat/indices?v
PUT 127.0.0.1:9200/customer?pretty
# response
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "customer"
}
GET 127.0.0.1:9200/_cat/indices?format=json&pretty
# response
[
{
"health": "yellow", # only one node, so the replicas cannot be allocated; hence yellow
"status": "open",
"index": "customer",
"uuid": "BpesQm0kRhWBauTfht4UZg",
"pri": "5", # 5 primary shards
"rep": "1", # 1 replica per primary shard
"docs.count": "0", # 0 documents
"docs.deleted": "0",
"store.size": "1.1kb",
"pri.store.size": "1.1kb"
}
]
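Why yellow? A replica is never allocated to the same node as its primary, so with a single node none of the 5 replicas can be placed. The toy model below reproduces that reasoning with hypothetical helpers; it ignores every other allocation rule (disk watermarks, allocation filters, and so on) and assumes all primaries are allocated.

```python
def unassigned_replicas(primaries: int, replicas: int, nodes: int) -> int:
    """Number of replica copies that cannot be allocated (toy model).

    Copies of the same shard must live on different nodes, so each shard
    can place at most nodes - 1 replicas. Other allocation rules are ignored.
    """
    placeable = min(replicas, max(nodes - 1, 0))
    return primaries * (replicas - placeable)


def expected_health(primaries: int, replicas: int, nodes: int) -> str:
    """green if every replica can be placed, otherwise yellow.

    Assumes all primaries are allocated (otherwise the index would be red).
    """
    return "yellow" if unassigned_replicas(primaries, replicas, nodes) else "green"
```

For the customer index above (5 primaries, 1 replica, 1 node) the model gives 5 unassigned replicas and yellow; with a second node it predicts green.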
2.6 Index and query a document
2.6.1 Index a document
Index a customer document with ID 1 into the customer index:
PUT /customer/_doc/1?pretty
{
"name": "John Doe"
}
# response
{
"_index": "customer",
"_type": "_doc",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
2.6.2 Query a document
GET /customer/_doc/1?pretty
# response
{
"_index": "customer",
"_type": "_doc",
"_id": "1",
"_version": 1,
"found": true,
"_source": { # the full JSON document
"name": "John Doe"
}
}
2.7 Delete an index
DELETE /customer?pretty
# response
{
"acknowledged": true
}
GET /_cat/indices?v
# response
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
2.8 Pattern for accessing data in ES
<HTTP Verb> /<Index>/<Type>/<ID>
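The pattern maps directly onto a URL. A minimal standard-library sketch, assuming a local node at http://127.0.0.1:9200: `es_request` is a hypothetical helper that only builds the request, which you would send with `urllib.request.urlopen`. It sets `Content-Type: application/json` whenever there is a body, because ES 6.x rejects request bodies without a proper content type.

```python
import json
import urllib.request
from typing import Optional


def es_request(verb: str, index: str, doc_type: Optional[str] = None,
               doc_id: Optional[str] = None, body: Optional[dict] = None,
               base_url: str = "http://127.0.0.1:9200") -> urllib.request.Request:
    """Build <HTTP Verb> /<Index>/<Type>/<ID>; missing trailing parts are omitted.

    Hypothetical helper: it builds the request but does not send it.
    """
    parts = [p for p in (index, doc_type, doc_id) if p is not None]
    url = base_url + "/" + "/".join(parts)
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=verb)
```

For example, `es_request("PUT", "customer", "_doc", "1", {"name": "John Doe"})` corresponds to the request in 2.6.1, and `es_request("DELETE", "customer")` to the one in 2.7.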
3 Modifying data
3.1 Replacing a document
A PUT to an existing document ID replaces the entire document, and ES increments its _version.
# create the index and a document
PUT /customer
PUT /customer/_doc/1?pretty
{
"name": "John Doe"
}
GET /customer/_doc/1?pretty
# replace the previous document
PUT /customer/_doc/1?pretty
{
"name":"Tom Hu"
}
GET /customer/_doc/1?pretty
# create a new document with ID 2
PUT /customer/_doc/2?pretty
{
"name":"Yaping Leaf"
}
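The ID is optional when indexing a document. If you omit it, ES generates a random ID; in that case use POST instead of PUT, e.g. POST /customer/_doc?pretty.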
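3.2 Updating documents
- Besides indexing and replacing documents, you can also update them
- An update is not done in place: ES deletes the old document and indexes a new one with the update applied
# change the name and add an age field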
POST /customer/_doc/1/_update?pretty
{
"doc":{"name":"Xiaotong Who","age":20}
}
- Updates can also use simple scripts
ctx._source refers to the source of the document being updated
# increase age by 5
POST /customer/_doc/1/_update?pretty
{
"script":"ctx._source.age += 5"
}
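To make the two update forms concrete, here is a toy model on plain Python dicts; the helpers are hypothetical and this is not how ES implements updates. A "doc" update merges the partial document into the existing source, merging nested objects key by key and replacing other values, and the script above corresponds to adding 5 to the age field.

```python
import copy


def merge_partial(source: dict, partial: dict) -> dict:
    """Toy model of a "doc" partial update: returns a new merged source.

    Nested objects are merged key by key; any other value in `partial`
    (string, number, list) replaces the existing one.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def increment_field(source: dict, field: str, delta: int) -> dict:
    """Python analogue of the script ctx._source.<field> += delta."""
    updated = copy.deepcopy(source)
    updated[field] += delta
    return updated
```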
3.3 Deleting documents
DELETE /customer/_doc/2?pretty
3.4 Batch processing
# index two documents in one request
POST /customer/_doc/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "Tom Hu" }
{"index":{"_id":"2"}}
{"name": "Yaping Leaf" }
# update document 1 and delete document 2 in one request
POST /customer/_doc/_bulk?pretty
{"update":{"_id":"1"}}
{"doc": { "name": "Tom Hu becomes Xiaotong Who" } }
{"delete":{"_id":"2"}}
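The _bulk body is newline-delimited JSON: an action/metadata line, followed by a source line for index and update actions and no source line for delete. The final line must end with a newline. A sketch that builds such a body, where `bulk_body` is a hypothetical helper:

```python
import json


def bulk_body(actions: list) -> str:
    """Serialize (action, metadata, source) triples into a _bulk NDJSON body.

    Hypothetical helper. `source` is None for delete, which has no source
    line. The trailing newline is required by the _bulk API.
    """
    lines = []
    for action, metadata, source in actions:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"
```

The second request above would be built from `[("update", {"_id": "1"}, {"doc": {"name": "Tom Hu becomes Xiaotong Who"}}), ("delete", {"_id": "2"}, None)]` and sent with `Content-Type: application/x-ndjson`.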
4 Exploring your data
4.1 The search API
4.1.1 Sending requests
Send the search parameters in the request URI:
GET /bank/_search?q=*&sort=account_number:asc&pretty
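- Search via the _search endpoint
- q=* matches all documents in the index
- sort=account_number:asc sorts the results by the account_number field in ascending order
- pretty makes ES pretty-print the JSON response so it is easier to read
# response (truncated)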
{
"took" : 63, # the search took 63 milliseconds
"timed_out" : false, # the search did not time out
"_shards" : { # 5 shards searched: 5 succeeded, 0 skipped, 0 failed
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : { # search results
"total" : 1000, # total number of documents matching the query
"max_score" : null,
"hits" : [ { # the matching documents; the first 10 by default
"_index" : "bank",
"_type" : "_doc",
"_id" : "0",
"sort": [0], # sort value of this hit
"_score" : null,
"_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
}, {
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"sort": [1],
"_score" : null,
"_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
}, ...
]
}
}
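The URI search parameters can be encoded with the standard library. `uri_search_url` is a hypothetical helper; `urlencode` percent-encodes `*` and `:` (as `%2A` and `%3A`), which ES decodes back to the same query.

```python
from urllib.parse import urlencode


def uri_search_url(index: str, params: dict,
                   base_url: str = "http://127.0.0.1:9200") -> str:
    """Build a URI search URL such as /bank/_search?q=*&sort=account_number:asc.

    Hypothetical helper; parameters are encoded in the order given.
    """
    return f"{base_url}/{index}/_search?{urlencode(params)}"
```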
Send the same search in the request body:
GET /bank/_search
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}
4.2 The query language (Query DSL)
Example 1: paging
GET /bank/_search
{
"query": { "match_all": {} },
"from": 10,
"size": 10
}
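- query: defines the query
- match_all: matches all documents
- from: 0-based offset of the first hit to return (default 0), so 10 skips the first 10 hits
- size: number of hits to return (default 10)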
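In practice from and size are usually derived from a page number. A sketch with hypothetical helper names, assuming 1-based page numbers:

```python
def page_params(page: int, page_size: int = 10) -> dict:
    """from/size for a 1-based page: page 2 with size 10 gives from=10."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def search_body(page: int, page_size: int = 10) -> dict:
    """A match_all request body for the requested page, as in Example 1."""
    return {"query": {"match_all": {}}, **page_params(page, page_size)}
```

Deep paging is bounded: by default ES rejects requests where from + size exceeds the index.max_result_window setting (10000).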
Example 2: sort by the balance field in descending order
GET /bank/_search
{
"query": { "match_all": {} },
"sort": { "balance": { "order": "desc" } }
}
4.3 Executing searches
4.3.1 Returning only some fields of each document
GET /bank/_search
{
"query": { "match_all": {} },
"_source": ["account_number", "balance"] # return only the account_number and balance fields
}
4.3.2 Match queries
Return the account whose account_number is 20:
GET /bank/_search
{
"query": { "match": { "account_number": 20 } }
}
Return all accounts whose address contains the term "mill" or the term "lane":
GET /bank/_search
{
"query": { "match": { "address": "mill lane" } }
}
Return all accounts whose address contains the phrase "mill lane":
GET /bank/_search
{
"query": { "match_phrase": { "address": "mill lane" } }
}
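The difference between match and match_phrase can be shown with a toy model. It assumes a simple lowercase/whitespace tokenizer in place of ES's standard analyzer, and the helpers are hypothetical: match (with its default OR operator) needs any query term, while match_phrase needs all terms consecutively and in order.

```python
def tokenize(text: str) -> list:
    """Lowercase whitespace tokenizer: a rough stand-in for an analyzer."""
    return text.lower().split()


def match(field_value: str, query: str) -> bool:
    """match with the default OR operator: any query term appears in the field."""
    return bool(set(tokenize(query)) & set(tokenize(field_value)))


def match_phrase(field_value: str, query: str) -> bool:
    """match_phrase: all query terms appear consecutively and in order."""
    terms, words = tokenize(query), tokenize(field_value)
    n = len(terms)
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))
```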
4.3.3 Bool queries
4.3.3.1 must clause: a document matches only if all of the queries match
Return all accounts whose address contains both "mill" and "lane":
GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
4.3.3.2 should clause: a document matches if at least one of the queries matches
Return all accounts whose address contains "mill" or "lane":
GET /bank/_search
{
"query": {
"bool": {
"should": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
4.3.3.3 must_not clause: a document matches only if none of the queries match
Return all accounts whose address contains neither "mill" nor "lane":
GET /bank/_search
{
"query": {
"bool": {
"must_not": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
4.3.3.4 Combining bool clauses
Return the accounts of people who are 40 years old but do not live in ID (Idaho):
GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}
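The matching rules of the three clauses can be summarized as a small predicate-based function. This is a toy model of query-context semantics only (no scoring, no filter clause), and `bool_matches` is a hypothetical name. When must clauses are present, should clauses only influence scoring, so the model ignores them in that case.

```python
def bool_matches(doc, must=(), should=(), must_not=()):
    """Toy bool query over predicates (functions doc -> bool).

    must: every predicate must be true; must_not: none may be true;
    should: if there are no must clauses, at least one must be true.
    """
    if not all(p(doc) for p in must):
        return False
    if any(p(doc) for p in must_not):
        return False
    if should and not must:
        return any(p(doc) for p in should)
    return True
```

The combined example above corresponds to `bool_matches(doc, must=[lambda d: d["age"] == 40], must_not=[lambda d: d["state"] == "ID"])`.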
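4.4 Executing filters
4.4.1 Document score (the _score field)
- _score is a number measuring how well a document matches the query: the higher the score, the more relevant the document; the lower, the less relevant.
- Queries do not always need to produce scores, in particular when they are only used to filter the set of documents; ES detects these cases and skips computing scores that would not be used.
- The bool query also supports a filter clause, which restricts the documents matched by the other clauses without affecting how their scores are computed.
4.4.2 Range queries
A range query keeps only documents whose field value falls within a range; it is typically used with numeric or date fields.
Example: return accounts with a balance between 20000 and 30000, inclusive: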
GET /bank/_search
{
"query": {
"bool": {
"must": { "match_all": {} }, # match all documents
"filter": { # filter clause
"range": { # range query
"balance": { # account balance
"gte": 20000, # greater than or equal to 20000
"lte": 30000 # less than or equal to 30000
}
}
}
}
}
}
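In plain Python the range filter corresponds to a bounds check: gte and lte are inclusive, gt and lt exclusive, and an omitted bound leaves that side open. A sketch with hypothetical helper names:

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None) -> bool:
    """Check value against range-query style bounds; None means unbounded."""
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True


def range_filter(docs, field, **bounds):
    """Keep docs whose `field` exists and lies within `bounds`."""
    return [d for d in docs if field in d and in_range(d[field], **bounds)]
```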
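4.5 Executing aggregations
- Aggregations provide roughly the functionality of SQL GROUP BY and SQL aggregate functions
- A single request can return both the search hits and the aggregation results
4.5.1 Group By
Group all accounts by state, then return the top 10 states ordered by account count, descending.
The equivalent SQL: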
SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC LIMIT 10;
GET /bank/_search
{
"size": 0, # return no hits, only the aggregation results
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
}
}
}
}
# response
{
"took": 29,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped" : 0,
"failed": 0
},
"hits" : {
"total" : 1000,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"group_by_state" : {
"doc_count_error_upper_bound": 20,
"sum_other_doc_count": 770,
"buckets" : [ {
"key" : "ID",
"doc_count" : 27
}, {
"key" : "TX",
"doc_count" : 27
}, {
"key" : "AL",
"doc_count" : 25
}, {
"key" : "MD",
"doc_count" : 25
}, {
"key" : "TN",
"doc_count" : 23
}, {
"key" : "MA",
"doc_count" : 21
}, {
"key" : "NC",
"doc_count" : 21
}, {
"key" : "ND",
"doc_count" : 21
}, {
"key" : "ME",
"doc_count" : 20
}, {
"key" : "MO",
"doc_count" : 20
} ]
}
}
}
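The response above can be read through a single-node analogue of the terms aggregation. This sketch (hypothetical `terms_agg`) counts values, orders buckets by count descending with ties broken by key ascending, which is consistent with the bucket order shown above, and reports how many documents fell outside the returned buckets as sum_other_doc_count. Real ES counts on each shard and merges the results, which is why its response carries doc_count_error_upper_bound; this exact version has no such error.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Exact, single-node analogue of a terms aggregation.

    Returns (buckets, sum_other_doc_count); buckets are ordered by
    doc_count descending, then key ascending.
    """
    counts = Counter(d[field] for d in docs if field in d)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    buckets = [{"key": key, "doc_count": count} for key, count in ranked]
    other = sum(counts.values()) - sum(b["doc_count"] for b in buckets)
    return buckets, other
```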
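4.5.2 Nesting aggregations
Nested (sub-)aggregations are typically used to summarize the data within each bucket further.
Compute the average account balance for each of the top 10 states by account count: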
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
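Order the states by average balance, descending. Because the terms aggregation keeps the top 10 buckets according to its order, this returns the 10 states with the highest average balance rather than the 10 with the most accounts: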
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword",
"order": {
"average_balance": "desc"
}
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
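A toy version of both requests (hypothetical `avg_by_group`) makes the effect of order visible: the same data yields different top buckets depending on whether groups are ranked by count or by average. Tie-breaking by key is an assumption of this sketch.

```python
from collections import defaultdict


def avg_by_group(docs, group_field, value_field, size=10, order_by_avg=False):
    """terms(group_field) with an avg(value_field) sub-aggregation (toy model).

    order_by_avg=False keeps the top `size` groups by document count;
    order_by_avg=True keeps the top `size` groups by average, descending.
    Ties are broken by key (an assumption of this sketch).
    """
    values = defaultdict(list)
    for d in docs:
        if group_field in d and value_field in d:
            values[d[group_field]].append(d[value_field])
    rows = [{"key": k, "doc_count": len(v), "avg": sum(v) / len(v)}
            for k, v in values.items()]
    if order_by_avg:
        rows.sort(key=lambda r: (-r["avg"], r["key"]))
    else:
        rows.sort(key=lambda r: (-r["doc_count"], r["key"]))
    return rows[:size]
```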
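Group by age bracket (20-29, 30-39, 40-49), then by gender within each bracket, and compute the average balance for each gender in each bracket: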
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_age": {
"range": {
"field": "age",
"ranges": [
{
"from": 20,
"to": 30
},
{
"from": 30,
"to": 40
},
{
"from": 40,
"to": 50
}
]
},
"aggs": {
"group_by_gender": {
"terms": {
"field": "gender.keyword"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
}
}
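The nested range, terms, and avg aggregations can be mirrored with plain loops. In this sketch the bracket labels such as "20-30" are hypothetical (ES formats its own range keys), and, as in the range aggregation, each bracket includes its from value and excludes its to value, so age 30 falls in 30-40.

```python
from collections import defaultdict

# Same brackets as the request above: [from, to) pairs.
AGE_RANGES = [(20, 30), (30, 40), (40, 50)]


def age_gender_avg(docs, ranges=AGE_RANGES):
    """range(age), then terms(gender), then avg(balance), as nested dicts.

    Bracket labels like "20-30" are hypothetical; from is inclusive and
    to is exclusive. Empty brackets map to an empty dict.
    """
    result = {}
    for lo, hi in ranges:
        balances = defaultdict(list)
        for d in docs:
            if lo <= d["age"] < hi:
                balances[d["gender"]].append(d["balance"])
        result[f"{lo}-{hi}"] = {g: sum(b) / len(b) for g, b in balances.items()}
    return result
```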