Bulk API: Batch Execution

Latest recommended article published 2023-12-14 09:00:00

SUN123565

Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

Introduction

The Bulk API performs multiple index, create, delete, and update operations in a single request.

Tools

Some client tools can help you send bulk requests:
Perl:
(the original link returns 404)
Python:
http://elasticsearch-py.readthedocs.io/en/master/helpers.html

Usage

(1) API endpoint:
/_bulk
(2) Request body format (newline-delimited JSON, one JSON object per line):

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

Note: every line, including the last one, must end with a newline character (\n).
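As a sketch of this rule (not part of the original article), the snippet below builds a bulk body from its arguments and checks for the trailing newline. `bulk_body` and `ends_with_newline` are hypothetical helper names.

```shell
# Hypothetical helper: print each argument as one line of a bulk body.
# printf '%s\n' appends "\n" after every argument, including the last,
# so the output always ends with the newline that _bulk requires.
bulk_body() {
  printf '%s\n' "$@"
}

# Hypothetical check: succeed only if the file is non-empty and its last
# byte is "\n" (command substitution strips a trailing newline, so an
# empty result means the last byte was a newline).
ends_with_newline() {
  [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Usage: write a one-action body to a file and verify it.
# bulk_body '{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }' \
#           '{ "field1" : "value1" }' > requests
# ends_with_newline requests && echo "ok"
```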

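(3) Supported actions:
index: the document source goes on the next line (creates the document, or replaces it if it already exists)
create: the document source goes on the next line (fails if a document with that ID already exists)
delete: no source line follows
update: the partial doc, upsert, or script goes on the next line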
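To make the line count of each action concrete, here is a minimal sketch with hypothetical helper names: index, create, and update print two lines, delete prints one. The index `test` and type `type1` mirror the article's examples; values are inserted with plain `printf`, so IDs and JSON are assumed not to need escaping, and `bulk_update` only covers the partial-`doc` form of update.

```shell
# Hypothetical helpers: each prints the NDJSON line(s) for one bulk action.
# index, create and update print an action line plus a second line;
# delete prints only the action line.
# INDEX and TYPE are example values taken from this article (pre-7.x style).
INDEX=test
TYPE=type1

bulk_action() {  # $1 = action name, $2 = document _id
  printf '{ "%s" : { "_index" : "%s", "_type" : "%s", "_id" : "%s" } }\n' \
    "$1" "$INDEX" "$TYPE" "$2"
}
bulk_index()  { bulk_action index  "$1"; printf '%s\n' "$2"; }             # $2 = source
bulk_create() { bulk_action create "$1"; printf '%s\n' "$2"; }             # $2 = source
bulk_update() { bulk_action update "$1"; printf '{ "doc" : %s }\n' "$2"; } # $2 = partial doc
bulk_delete() { bulk_action delete "$1"; }                                 # no second line

# Usage: the four actions in one body (7 lines in total).
# { bulk_index 1 '{ "field1" : "value1" }'; bulk_delete 2
#   bulk_create 3 '{ "field1" : "value3" }'; bulk_update 1 '{ "field2" : "value2" }'; } > requests
```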

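If the data is supplied as a text file, you must use --data-binary instead of the shorter -d: when -d reads data from a file (@file), it strips out the newlines. For example: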

$ cat requests
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
{"took":7, "errors": false, "items":[{"index":{"_index":"test","_type":"type1","_id":"1","_version":1,"result":"created","forced_refresh":false}}]}
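A minimal sketch of the same upload wrapped in a function that refuses a file without its final newline. `post_bulk` and `ES_URL` are hypothetical names; the `Content-Type` header is an addition for Elasticsearch 6.0 and later, which reject the `application/x-www-form-urlencoded` type that curl sends by default.

```shell
# Sketch: send a bulk file with --data-binary so curl keeps every "\n".
# ES_URL is an assumed variable (defaults to a local node).
ES_URL=${ES_URL:-http://localhost:9200}

post_bulk() {  # $1 = path to a file in bulk (NDJSON) format
  local file=$1
  # Refuse to send an empty body or one whose last line lacks "\n".
  if [ ! -s "$file" ] || [ -n "$(tail -c 1 "$file")" ]; then
    echo "post_bulk: $file is empty or does not end with a newline" >&2
    return 1
  fi
  # -d @file would strip the newlines; --data-binary @file sends the bytes as-is.
  curl -s -XPOST "$ES_URL/_bulk" \
    -H 'Content-Type: application/x-ndjson' \
    --data-binary "@$file"
}

# Usage, with the "requests" file from the example above:
# post_bulk requests; echo
```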

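Because _bulk uses \n as its delimiter, the action lines and source lines must not be pretty-printed JSON (pretty-printed JSON contains many \n characters). A correctly formatted _bulk request looks like this: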

POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
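The `POST _bulk` form above is console syntax (as in Kibana Dev Tools). As a sketch, the same body can be sent from a shell through a quoted here-document, which keeps each action on one line and ends every line with `\n`. `ES_URL`, `example_bulk_body`, and `send_example_bulk` are hypothetical names; `_type` follows the article's pre-7.x examples (mapping types were removed in Elasticsearch 8.x).

```shell
# Sketch: the console request above, sent from a shell with curl.
# A quoted here-document (<<'EOF') is passed through verbatim, and every
# line in it, including the last, ends with "\n" as _bulk requires.
ES_URL=${ES_URL:-http://localhost:9200}

example_bulk_body() {
  cat <<'EOF'
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
EOF
}

# --data-binary @- reads the body from stdin without stripping newlines.
send_example_bulk() {
  example_bulk_body | curl -s -XPOST "$ES_URL/_bulk" \
    -H 'Content-Type: application/x-ndjson' --data-binary @-
}
```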

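The _bulk endpoint can be called in three forms:
- /_bulk: every action's metadata must specify _index and _type
- /{index}/_bulk: the index in the URL is the default, so only _type must be specified
- /{index}/{type}/_bulk: both defaults come from the URL, so the metadata may omit them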
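A small sketch of the three URL forms, using a hypothetical `bulk_url` helper. With `/{index}/{type}/_bulk`, an action line such as `{ "index" : { "_id" : "1" } }` falls back to the index and type from the URL.

```shell
# Hypothetical helper: build the _bulk URL for the three forms listed above.
# Anything the URL leaves out must be given in each action's metadata.
bulk_url() {  # $1 = base URL, $2 = index (optional), $3 = type (optional)
  local url=$1
  if [ -n "${2:-}" ]; then
    url="$url/$2"
    if [ -n "${3:-}" ]; then
      url="$url/$3"
    fi
  fi
  printf '%s/_bulk\n' "$url"
}

# Usage:
# bulk_url http://localhost:9200              # http://localhost:9200/_bulk
# bulk_url http://localhost:9200 test         # http://localhost:9200/test/_bulk
# bulk_url http://localhost:9200 test type1   # http://localhost:9200/test/type1/_bulk
```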

Bulk import with a shell script

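Although the _bulk endpoint can import data in batches, its format is fairly cumbersome, and indexing a large amount of data without a tool is still tedious. In that case a shell script can import the data directly. Note that this script sends one index request per line of the file, not a single _bulk request: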

(1) Prepare a JSON file named myfile.json, with one complete JSON document per line.
(2) Run the script:
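# One index request per line; IFS= and -r keep each line verbatim, and
# "|| [ -n "$line" ]" also handles a last line without a trailing newline.
# The Content-Type header is required by Elasticsearch 6.0 and later.
while IFS= read -r line || [ -n "$line" ]
do
  curl -s -XPOST 'http://localhost:9200/<indexname>/<typeofdoc>/' -H 'Content-Type: application/json' -d "$line"
done < myfile.json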
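The loop above sends one HTTP request per document. As a sketch of an alternative (not from the original article), the hypothetical `to_bulk` function converts the same one-document-per-line file into a single bulk body. Because the index and type are given in the URL (`/{index}/{type}/_bulk`), each action line can be an empty `{ "index" : {} }`, and Elasticsearch generates the document IDs.

```shell
# Sketch: read one JSON document per line on stdin and write a bulk body
# on stdout, pairing each document with an empty "index" action line.
to_bulk() {
  local line
  # IFS= and -r keep lines verbatim; the || test keeps a final unterminated line.
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$line" ]; then
      continue  # skip blank lines
    fi
    printf '{ "index" : {} }\n%s\n' "$line"
  done
}

# Usage (index/type names are placeholders, as in the loop above):
# to_bulk < myfile.json > bulk.ndjson
# curl -s -XPOST 'http://localhost:9200/<indexname>/<typeofdoc>/_bulk' \
#   -H 'Content-Type: application/x-ndjson' --data-binary @bulk.ndjson
```

One request for the whole file avoids per-document HTTP overhead; for very large files the output would usually be split into several smaller _bulk requests.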