A bulk request is loaded into memory, so an oversized request actually hurts performance. A common approach is to increase the batch size gradually and look for the sweet spot, usually somewhere between 5 and 15 MB.
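As a rough sketch of how to experiment with batch sizes, assuming the official elasticsearch-py client (the host, index name and gen_actions() generator below are illustrative, not from the original notes): the bulk helpers accept a max_chunk_bytes cap, so you can raise it step by step while watching throughput.

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")   # illustrative host

def gen_actions():
    # Each dict becomes one "index" action plus its source document.
    # On pre-7.x clusters (like the examples below) you would also add "_type": "person".
    for i in range(100_000):
        yield {"_index": "accounts", "_id": i,
               "_source": {"user": f"user-{i}", "age": 20}}

# chunk_size caps documents per request, max_chunk_bytes caps the body size in bytes;
# start around 5 MB and increase gradually while indexing throughput keeps improving.
ok, errors = helpers.bulk(es, gen_actions(),
                          chunk_size=1000,
                          max_chunk_bytes=10 * 1024 * 1024)
print(ok, errors)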
-
Syntax: every operation except delete takes two JSON lines, in the following form:
{"action":{"metadata"}}
{"data"}
The action can be one of the following:
- delete: deletes a document; it needs only one JSON line
- create: equivalent to PUT /index/type/id/_create, a forced create that fails if the given id already exists
- index: the ordinary PUT operation, which either creates the document or fully replaces an existing one
- update: performs a partial update
The metadata can carry fields such as: {"_index":"xxx","_type":"xxx","_id":"x","retry_on_conflict":5}
-
Example:
POST _bulk
{"delete":{"_index":"accounts","_type":"person","_id":"1"}}
{"create":{"_index":"accounts","_type":"person","_id":"1"}}
{"user":"zeng","age":10,"salary":20000,"title":"工程师","desc":"数据库管理","tag":"girl","tags":"xyz"}
{"create":{"_index":"accounts","_type":"person","_id":"1"}}
{"user":"zeng","age":10,"salary":20000,"title":"工程师","desc":"数据库管理","tag":"girl","tags":"xyz"}
{"index":{"_index":"accounts","_type":"person","_id":"1"}}
{"user":"zeng","age":10,"salary":20000,"title":"工程师","desc":"数据库管理","tag":"boy"}
{"update":{"_index":"accounts","_type":"person","_id":"1","_retry_on_conflict":5}}
{"doc":{"tag":"girl"}}
Response:
{
  "took": 23,
  "errors": true,
  "items": [
    {
      "delete": {
        "found": true,
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_version": 11,
        "result": "deleted",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "status": 200
      }
    },
    {
      "create": {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_version": 12,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "created": true,
        "status": 201
      }
    },
    {
      "create": {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "status": 409, // forced create: the given id already exists, so this action fails
        "error": {
          "type": "version_conflict_engine_exception",
          "reason": "[person][1]: version conflict, document already exists (current version [12])",
          "index_uuid": "qgPjJUv6Sm-VBesoOMlNPA",
          "shard": "3",
          "index": "accounts"
        }
      }
    },
    {
      "index": {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_version": 13,
        "result": "updated",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "created": false,
        "status": 200
      }
    },
    {
      "update": {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_version": 14,
        "result": "updated",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "status": 200
      }
    }
  ]
}
-
Exception 1:
{
  "error": {
    "root_cause": [
      {
        "type": "action_request_validation_exception",
        "reason": "Validation Failed: 1: script or doc is missing;"
      }
    ],
    "type": "action_request_validation_exception",
    "reason": "Validation Failed: 1: script or doc is missing;"
  },
  "status": 400
}
If you hit this error, check whether the JSON line for the update action follows this format:
{"doc":{"data"}}
-
Exception 2:
{
  "error": {
    "root_cause": [
      {
        "type": "json_e_o_f_exception",
        "reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@694f1218; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@694f1218; line: 1, column: 3]"
      }
    ],
    "type": "json_e_o_f_exception",
    "reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@694f1218; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@694f1218; line: 1, column: 3]"
  },
  "status": 500
}
If you hit this error, check the request's JSON formatting: the documents must not be pretty-printed; every JSON document has to stay on a single line.
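A minimal sketch of building a valid _bulk body by hand (the host, index and field values are illustrative, and the requests library is assumed to be installed): json.dumps keeps every document on a single line, each line ends with \n including the last one, and the partial-update payload is wrapped in doc, which avoids both exceptions above.

import json
import requests

actions = [
    ({"index": {"_index": "accounts", "_type": "person", "_id": "1"}},
     {"user": "zeng", "age": 10, "tag": "boy"}),
    ({"update": {"_index": "accounts", "_type": "person", "_id": "1", "retry_on_conflict": 5}},
     {"doc": {"tag": "girl"}}),                    # partial update: payload wrapped in "doc"
    ({"delete": {"_index": "accounts", "_type": "person", "_id": "2"}},
     None),                                        # delete has no source line
]

lines = []
for meta, source in actions:
    lines.append(json.dumps(meta))                 # json.dumps emits compact, single-line JSON
    if source is not None:
        lines.append(json.dumps(source))
body = "\n".join(lines) + "\n"                     # the body must end with a newline

resp = requests.post("http://localhost:9200/_bulk",
                     data=body.encode("utf-8"),
                     headers={"Content-Type": "application/x-ndjson"})
print(resp.json()["errors"])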
-
Why doesn't the bulk API use a more natural, more readable JSON structure?
The main reason is that parsing such a payload would force Elasticsearch to build an in-memory copy of the request (for example as JsonArray objects). That extra memory drives frequent GC in the ES JVM and hurts performance badly.
With the current line-delimited format, a node only needs to split the body on newlines and forward each action to the node responsible for it. No copies of the request are made, memory is not wasted, and performance is much better.
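The difference can be illustrated with a conceptual sketch (this is not Elasticsearch source code, and the array layout assumed in route_json_array is hypothetical): with one big readable JSON document the whole request must be deserialized before anything can be routed, whereas with the line-delimited format only the small metadata lines are parsed and the source lines are forwarded as raw, untouched bytes.

import json

def forward(meta, raw_source):
    """Stand-in for shipping one action to the shard that owns meta['_id']."""
    pass

# (a) A "readable" payload such as one big JSON array can only be routed after the
#     entire request is turned into objects -- a full in-memory copy for the JVM to GC.
def route_json_array(body):
    for item in json.loads(body):                  # materializes every document
        forward(item["meta"], json.dumps(item["source"]))

# (b) The bulk NDJSON format is routed by scanning newlines: parse only the tiny
#     metadata line, then pass the following source line along without touching it.
def route_ndjson(body):
    lines = iter(body.splitlines())
    for meta_line in lines:
        meta = json.loads(meta_line)
        action = next(iter(meta))
        raw_source = next(lines) if action != "delete" else None
        forward(meta[action], raw_source)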