ELK Stack Series - ElasticSearch (3): Common ElasticSearch Management Operations

Latest recommended article published on 2023-02-10 19:25:11

plenilune-望月

Category column: Full-Text Search Services (Solr, ELK)

Article link: https://blog.csdn.net/donglinjob/article/details/109078528

Common ElasticSearch Management Operations

1 Check cluster health

GET _cat/health?v

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1531290005 14:20:05  elasticsearch green           1         1      2   2    0    0        0             0                  -                100.0%

status: green, yellow, or red

green: every index has all of its primary shards and replica shards active.

yellow: every index has all of its primary shards active, but some replica shards are not active.

red: at least one index has a primary shard that is not active.
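
The three status rules above can be sketched as a small function. This is only an illustration, not ElasticSearch's own code: the function name `cluster_status` and the `(kind, state)` tuple layout are assumptions, chosen to mirror the prirep and state columns of `GET _cat/shards`. It also assumes that STARTED and RELOCATING both count as active.

```python
ACTIVE_STATES = {"STARTED", "RELOCATING"}  # assumption: states counted as "active"


def cluster_status(shards):
    """Derive green / yellow / red from a list of (kind, state) pairs.

    kind is "p" for a primary shard or "r" for a replica shard;
    state is a shard state string such as "STARTED" or "UNASSIGNED".
    """
    primaries_active = all(state in ACTIVE_STATES for kind, state in shards if kind == "p")
    replicas_active = all(state in ACTIVE_STATES for kind, state in shards if kind == "r")
    if not primaries_active:
        return "red"      # some primary shard is not active
    if not replicas_active:
        return "yellow"   # all primaries active, some replica is not
    return "green"        # every primary and replica is active
```

Called with one started primary and one unassigned replica, the function reports yellow, which is what a single-node cluster with replicas configured would show.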

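2 Create an index

Command syntax: PUT index_name {index settings}

Index names must be lowercase and must not start with an underscore '_', a hyphen '-', or a plus sign '+'.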
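
A client-side check of the naming rules just stated might look like the sketch below. The helper name `is_valid_index_name` is hypothetical, and the check covers only the rules mentioned in this article (lowercase, no leading '_', '-', or '+'); ElasticSearch enforces further restrictions that are not modeled here.

```python
FORBIDDEN_PREFIXES = ("_", "-", "+")


def is_valid_index_name(name: str) -> bool:
    """Check only the rules named in this article; not the full server-side validation."""
    if not name:
        return False
    if name != name.lower():
        return False  # index names must be lowercase
    return not name.startswith(FORBIDDEN_PREFIXES)
```

Running such a check before sending the PUT request gives an earlier and clearer error than waiting for the server to reject the name.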

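By default, ElasticSearch allocates 5 primary shards when an index is created, and 1 replica shard for each primary shard (from ES 7 onward, the default is 1 primary shard). Disk usage also limits allocation by default: when a node has less than 15% free disk space (the 85% low watermark), no new replica shards are allocated to it; when it has less than 5% free (the 95% flood-stage watermark), it receives no new shards at all and the indices with shards on it are switched to read-only. ElasticSearch also constrains how shards are distributed: it spreads primary shards as evenly as possible across nodes, and it never places a replica shard on the same node as the primary shard it copies.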

Create an index with default settings

PUT test_index1

Create an index with an explicit number of shards.

PUT test_index2
{
	"settings":{
		"number_of_shards" : 2,
		"number_of_replicas" : 1
	}
}

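3 Modify an index

Command syntax: PUT index_name/_settings {index settings}

Note: once an index is created, its number of primary shards cannot be changed; only the number of replica shards can be changed.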

PUT test_index2/_settings
{
   "number_of_replicas" : 2
}

4 Delete an index

Command syntax: DELETE index_name_1[,index_name_2 ...]

DELETE test_index1

5 View index information

GET _cat/indices?v

health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   test_index 2PJFQBtzTwOUhcy-QjfYmQ   5   1          0            0       460b           460b

6 Check shard information

View the shard information of the indices.

GET _cat/shards?v

index shard prirep state docs store ip node
test_index2 1 p STARTED 0 261b 192.168.89.142 mN_pylT
test_index2 1 r UNASSIGNED
test_index2 1 r UNASSIGNED
test_index2 0 p STARTED 0 261b 192.168.89.142 mN_pylT
test_index2 0 r UNASSIGNED
test_index2 0 r UNASSIGNED
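
The output above shows every replica of test_index2 as UNASSIGNED because the cluster has a single node, and a replica is never placed on the node that holds its primary. The toy allocator below reproduces that outcome. It is a simplified sketch, not ElasticSearch's real allocation algorithm: the function name `allocate` and the round-robin placement are assumptions, and real allocation also weighs balance, disk usage, and other factors.

```python
def allocate(nodes, number_of_shards, number_of_replicas):
    """Return rows of (shard, prirep, state, node) for one index.

    Only one rule is enforced: two copies of the same shard
    never share a node.
    """
    rows = []
    for shard in range(number_of_shards):
        primary_node = nodes[shard % len(nodes)]  # spread primaries round-robin
        used = {primary_node}
        rows.append((shard, "p", "STARTED", primary_node))
        for _ in range(number_of_replicas):
            free = [n for n in nodes if n not in used]
            if free:
                used.add(free[0])
                rows.append((shard, "r", "STARTED", free[0]))
            else:
                # no node left that lacks a copy of this shard
                rows.append((shard, "r", "UNASSIGNED", None))
    return rows
```

With one node, 2 primary shards, and 2 replicas, the sketch yields 2 started primaries and 4 unassigned replicas, matching the six rows listed above; with three nodes every copy finds a home.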

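7 Add a Document

Add a document to an index.

ElasticSearch creates missing targets automatically. If the index named in the request does not exist, it is created; if the index exists but the type does not, the type is created. If both the index and the type exist, they are used as they are.

7.1 PUT syntax

This operation adds a Document with a manually specified id.

Syntax: PUT index_name/type_name/unique_id {field_name: field_value}

For example: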

PUT test_index/my_type/1
{
	"name":"test_doc_01",
	"remark":"first test elastic search",
	"order_no":1
}
PUT test_index/my_type/2
{
	"name":"test_doc_02",
	"remark":"second test elastic search",
	"order_no":2
}
PUT test_index/my_type/3
{
	"name":"test_doc_03",
	"remark":"third test elastic search",
	"order_no":3
}

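Result:

{
	"_index": "test_index",  // the index the new document was written to
	"_type": "my_type",  // the type within that index
	"_id": "1",  // the id that was specified
	"_version": 1,  // document version: starts at 1 and increases by 1 on every write
	"result": "created",  // outcome of this operation: created, updated, or deleted
	"_shards": {  // shard information
		"total": 2,  // number of shard copies (primary and replicas) the write should reach
		"successful": 1,  // number of shard copies on which the write succeeded
		"failed": 0  // number of shard copies on which the write failed
	},
	"_seq_no": 0,  // sequence number assigned to this operation
	"_primary_term": 1  // primary term: increases each time a different shard copy becomes the primary
}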

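Repeating a PUT for a Document with the same id overwrites it. To have ElasticSearch check whether the Document already exists, use the forced-create syntax: if a Document with that id already exists, the request fails with an error (version conflict, document already exists).

Syntax:

PUT index_name/type_name/unique_id/_create {field_name: field_value}

or

PUT index_name/type_name/unique_id?op_type=create {field_name: field_value}

For example: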

PUT test_index/my_type/1/_create
{
	"name":"new_test_doc_01",
	"remark":"first test elastic search",
	"order_no":1
}
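
The difference between a plain PUT (overwrite) and the forced-create syntax can be modeled with a small in-memory store. This is a toy model for illustration only: `TinyIndex` and `VersionConflict` are hypothetical names, and nothing here talks to a real ElasticSearch server.

```python
class VersionConflict(Exception):
    """Raised when create() targets an id that already exists."""


class TinyIndex:
    def __init__(self):
        self._docs = {}  # doc id -> (version, source)

    def index(self, doc_id, source):
        """Plain PUT: create the document, or overwrite it and bump the version."""
        version = self._docs[doc_id][0] + 1 if doc_id in self._docs else 1
        self._docs[doc_id] = (version, dict(source))
        result = "created" if version == 1 else "updated"
        return {"_id": doc_id, "_version": version, "result": result}

    def create(self, doc_id, source):
        """Forced create: refuse to touch an existing document."""
        if doc_id in self._docs:
            raise VersionConflict(f"document {doc_id!r} already exists")
        return self.index(doc_id, source)
```

In this model a second `index` call on the same id returns version 2 with result "updated", while `create` on that id raises the conflict error instead of overwriting.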

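7.2 POST syntax

This operation adds a Document with an id generated by ElasticSearch. It differs from adding data with PUT in only one respect, the automatically generated id; everything else is the same.

Syntax: POST index_name/type_name {field_name: field_value}

For example: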

POST test_index/my_type
{
	"name":"test_doc_04",
	"remark":"fourth test elastic search",
	"order_no":4
}

8 Query Documents

8.1 GET by ID: single-document lookup

Syntax: GET index_name/type_name/unique_id

For example:

GET test_index/my_type/1

Result:

{
	"_index": "test_index",
	"_type": "my_type",
	"_id": "1",
	"_version": 1,
	"found": true,
	"_source": {  // the content of the document that was found
		"name": "test_doc_01",
		"remark": "first test elastic search",
		"order_no":1
	}
}

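8.2 GET _mget: batch lookup

Fetching documents in batches is more efficient than fetching them one at a time, and is the recommended approach.

The syntax is as follows: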

GET _mget
{
   "docs" : [
       {
           "_index" : "index_name",
           "_type" : "type_name",
           "_id" : "unique_id"
       }, {}, {}
   ]
}

GET index_name/_mget
{
   "docs" : [
       {
           "_type" : "type_name",
           "_id" : "unique_id"
       }, {}, {}
   ]
}

GET index_name/type_name/_mget
{
   "docs" : [
       {
           "_id" : "unique_id"
       },
       {
           "_id" : "unique_id"
       }
   ]
}
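
The three _mget request bodies above differ only in which of "_index" and "_type" appear in each entry. A helper that builds the body from a list of ids could look like this sketch; the function name `mget_body` and its parameters are hypothetical, and it only constructs the dictionary, it does not send a request.

```python
def mget_body(ids, index=None, doc_type=None):
    """Build the "docs" body for _mget.

    Pass index / doc_type only when they are not already part of the URL,
    for example for the bare `GET _mget` form.
    """
    docs = []
    for doc_id in ids:
        entry = {}
        if index is not None:
            entry["_index"] = index
        if doc_type is not None:
            entry["_type"] = doc_type
        entry["_id"] = str(doc_id)  # ids are sent as strings
        docs.append(entry)
    return {"docs": docs}
```

For `GET test_index/my_type/_mget` the call `mget_body(["1", "2"])` is enough, because the index and type already come from the URL.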

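9 Modify a Document

9.1 Replace a Document (full replacement)

The syntax is the same as the PUT|POST syntax used to add a Document.

PUT|POST index_name/type_name/unique_id {field_name: field_value}

This operation is an overwrite. During a full replacement ElasticSearch does not modify the stored Document in place: it marks the existing Document as deleted and creates a new Document to hold the data. As data accumulates, ElasticSearch reclaims the Documents marked as deleted in the background (during segment merges).

For example: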

PUT test_index/my_type/1
{
	"name":"new_test_doc_01",
	"remark":"first test elastic search",
	"order_no":1
}

Result:

{
	"_index": "test_index",
	"_type": "my_type",
	"_id": "1",
	"_version": 2,
	"result": "updated",
	"_shards": {
		"total": 2,
		"successful": 1,
		"failed": 0
	},
	"_seq_no": 1,
	"_primary_term": 1
}

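9.2 Update a Document (partial update)

Syntax: POST index_name/type_name/unique_id/_update {doc: {field_name: field_value}}

This updates only some of the fields of a Document. Like a full replacement, it marks the existing data as deleted and creates a new Document, built from the new field values plus the original fields that were not updated. Compared with a full replacement it is only more convenient to use; the underlying execution is almost the same.

For example: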

POST test_index/my_type/1/_update
{
	"doc":{
		"name":"test_doc_01_for_update"
	}
}

Result:

{
	"_index": "test_index",
	"_type": "my_type",
	"_id": "1",
	"_version": 5,
	"result": "updated",
	"_shards": {
		"total": 2,
		"successful": 1,
		"failed": 0
	},
	"_seq_no": 2,
	"_primary_term": 1
}
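
The way a partial update combines the new fields with the untouched original fields can be sketched as a recursive merge. This is an illustrative model under stated assumptions, not ElasticSearch's implementation: the function name `merge_partial` is hypothetical, nested objects are merged key by key, and every other value (including arrays) is replaced outright.

```python
import copy


def merge_partial(source, partial):
    """Return a new document: `source` with the fields of `partial` merged in.

    The original `source` is left untouched, mirroring the fact that the
    old Document is not modified in place.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)  # merge nested objects
        else:
            merged[key] = copy.deepcopy(value)               # replace scalars and arrays
    return merged
```

Applied to the document from section 7.1 with `{"name": "test_doc_01_for_update"}`, only the name changes while remark and order_no carry over into the new Document.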

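10 Delete a Document

When a Document is deleted, ElasticSearch first marks it as deleted instead of physically removing it right away. The physical removal is performed later in the background, when segments are merged, rather than at delete time. Documents marked as deleted are no longer returned by queries and searches.

Syntax: DELETE index_name/type_name/unique_id

For example:

DELETE test_index/my_type/1

Result: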

{
	"_index": "test_index",
	"_type": "my_type",
	"_id": "1",
	"_version": 6,
	"result": "deleted",
	"_shards": {
		"total": 2,
		"successful": 1,
		"failed": 0
	},
	"_seq_no": 5,
	"_primary_term": 1
}
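
The mark-then-purge behavior described in this section can be illustrated with a toy shard. Everything below is a simplified model with hypothetical names (`TinyShard`, `merge`); real segments and merge policies are considerably more involved.

```python
class TinyShard:
    def __init__(self):
        # each entry: {"id": ..., "source": ..., "deleted": bool}
        self.entries = []

    def index(self, doc_id, source):
        """Write a document; an overwrite first marks the old copy as deleted."""
        self.delete(doc_id)
        self.entries.append({"id": doc_id, "source": source, "deleted": False})

    def delete(self, doc_id):
        """Mark the live copy as deleted without removing it from storage."""
        for entry in self.entries:
            if entry["id"] == doc_id and not entry["deleted"]:
                entry["deleted"] = True
                return True
        return False

    def search(self):
        """Deleted entries are skipped, so they never show up in results."""
        return [entry["id"] for entry in self.entries if not entry["deleted"]]

    def merge(self):
        """Background step: physically drop entries marked as deleted."""
        before = len(self.entries)
        self.entries = [entry for entry in self.entries if not entry["deleted"]]
        return before - len(self.entries)
```

After a delete, the entry still occupies storage but is invisible to `search`; only `merge` actually frees it.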

11 bulk: batch create, update, and delete

The bulk syntax performs creates, updates, and deletes in batches. Its format is:

POST _bulk
{ "action_type" : { "metadata_name" : "metadata_value" } }
{ document data | action data }

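The possible values of action_type are:

create: forced create, equivalent to PUT index_name/type_name/unique_id/_create

index: an ordinary PUT, which creates the Document or fully replaces it

update: partial update, equivalent to POST index_name/type_name/unique_id/_update

delete: delete

Examples: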

Create data:
POST _bulk
{ "create" : { "_index" : "test_index" , "_type" : "my_type", "_id" : "1" } }
{ "field_name" : "field value" }

Create or fully replace (the PUT operation):
POST _bulk
{ "index" : { "_index" : "test_index", "_type" : "my_type" , "_id" : "2" } }
{ "field_name" : "field value 2" }

Update data (the POST partial update):
POST _bulk
{ "update" : { "_index" : "test_index", "_type" : "my_type" , "_id" : "2", "retry_on_conflict" : 3 } }
{ "doc" : { "field_name" : "partial update field value" } }

Delete data:
POST _bulk
{ "delete" : { "_index" : "test_index", "_type" : "my_type", "_id" : "2" } }

Mixed batch of writes:
POST _bulk
{ "create" : { "_index" : "test_index" , "_type" : "my_type", "_id" : "10" } }
{ "field_name" : "field value" }
{ "index" : { "_index" : "test_index", "_type" : "my_type" , "_id" : "20" } }
{ "field_name" : "field value 2" }
{ "update" : { "_index" : "test_index", "_type" : "my_type" , "_id" : "20", "retry_on_conflict" : 3 } }
{ "doc" : { "field_name" : "partial update field value" } }
{ "delete" : { "_index" : "test_index", "_type" : "my_type", "_id" : "2" } }
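
When bulk requests are generated by a program rather than typed by hand, the body is usually assembled from data structures. The sketch below shows one way to do that; the function name `bulk_body` and the `(action, metadata, payload)` tuple layout are assumptions made for this example. It relies on `json.dumps` producing single-line output, and it ends the body with a line break.

```python
import json


def bulk_body(operations):
    """Serialize (action, metadata, payload) tuples into a bulk request body.

    A delete has no payload line; every other action needs one.
    """
    lines = []
    for action, metadata, payload in operations:
        lines.append(json.dumps({action: metadata}))  # action line
        if action != "delete":
            if payload is None:
                raise ValueError(f"{action} requires a payload line")
            lines.append(json.dumps(payload))         # document or partial-doc line
    return "\n".join(lines) + "\n"                    # one JSON object per line
```

Line breaks inside string values are escaped by `json.dumps`, so a field value that contains a newline still ends up on a single physical line.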

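Note: in the bulk syntax, each JSON object must be written on a single line with no line breaks inside it, and consecutive JSON objects must be separated by a line break. If one operation in the batch fails, the others are not affected; the failure is only flagged in the bulk response. A bulk request is loaded into memory in its entirety, so an oversized request lowers performance instead of raising it (memory pressure becomes too high), and the best bulk request size has to be found by experiment. A common starting point is 1000 to 5000 documents per request, increased gradually; measured in bytes, 5 to 15 MB per request is usually a good range.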
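
The sizing advice above can be turned into a simple batching helper. This is a minimal sketch, assuming each element of `operations` is one already-serialized bulk operation (its action line plus optional payload line, as a string). The name `batches` is hypothetical, and the defaults of 1000 operations and 5 MB are just the starting points suggested in the note, meant to be tuned by experiment.

```python
def batches(operations, max_ops=1000, max_bytes=5 * 1024 * 1024):
    """Yield lists of operations that respect both a count and a byte budget."""
    batch, size = [], 0
    for op in operations:
        op_size = len(op.encode("utf-8"))
        # flush before adding if either limit would be exceeded
        if batch and (len(batch) >= max_ops or size + op_size > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(op)
        size += op_size
    if batch:
        yield batch  # final, possibly smaller batch
```

Each yielded batch can be joined into one request body and sent as a separate bulk request; a single operation larger than `max_bytes` is still sent alone rather than dropped.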

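The bulk syntax imposes this JSON layout to keep memory management simple and memory pressure as low as possible. If the layout were unrestricted, ElasticSearch would have to parse arbitrarily formatted JSON and convert the whole request body into a JSON object or JSON array, which would at least double the memory used. Under heavy request load the memory pressure would rise sharply, and the JVM garbage collector would have to reclaim the discarded data frequently, hurting ElasticSearch efficiency.

The bulk API is used frequently in production, usually driven by a loop in Java code. A single bulk request typically performs one kind of operation, for example adding 10000 documents in one batch.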
