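Reindexing does not copy the settings of the source index. Before running _reindex, set up the destination index yourself, including its mappings, number of shards, number of replicas, and so on.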
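A minimal sketch of what "set up the destination index first" can look like. The Python below only builds the body you would send with PUT to the destination index before calling _reindex. The helper name create_index_body, the field names and the shard/replica counts are hypothetical. The mapping_type switch reflects the difference between 6.x clusters, where mappings are nested under a type name such as "doc", and typeless 7.x+ mappings.

```python
import json


def create_index_body(shards, replicas, properties, mapping_type=None):
    """Build the body for PUT /<dest-index> before running _reindex.

    mapping_type=None gives typeless (7.x+) mappings; pass a type name
    such as "doc" for 6.x clusters, where mappings sit under a type.
    """
    mapping = {"properties": properties}
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": mapping if mapping_type is None else {mapping_type: mapping},
    }


# Hypothetical fields and shard/replica counts for the destination "test-copy".
body = create_index_body(
    shards=5,
    replicas=1,
    properties={"date": {"type": "date"}, "test": {"type": "text"}},
    mapping_type="doc",
)
print("PUT test-copy")
print(json.dumps(body, indent=2))
```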
A first example:
POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test-copy"
  }
}
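If you want to send the same request from a program instead of the Kibana console, the sketch below builds it with Python's standard urllib. It assumes an unauthenticated cluster at http://localhost:9200 (an assumption, adjust to your setup), and reindex_request is a hypothetical helper. The Content-Type: application/json header matters because Elasticsearch 6.x rejects request bodies without a valid content type. The request is only built here; the commented lines show how you would send it.

```python
import json
import urllib.request

# Assumption: an unauthenticated cluster listening on localhost:9200.
ES_URL = "http://localhost:9200"


def reindex_request(body, base_url=ES_URL):
    """Build (without sending) the POST _reindex request for a body dict."""
    return urllib.request.Request(
        url=f"{base_url}/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = reindex_request({"source": {"index": "test"}, "dest": {"index": "test-copy"}})
print(req.get_method(), req.full_url)

# Sending it against a live cluster:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```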
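_reindex works from a snapshot of the source index taken when the request starts. To control how version conflicts are handled, set version_type in the dest section of the request. It accepts "internal" and "external":
- With "internal" (the default), Elasticsearch writes every document into the destination index blindly, overwriting any existing document with the same type and id.
- With "external", Elasticsearch keeps the version of the source document. It creates documents that are missing from the destination and updates only those whose destination version is older than the source version. Every other document counts as a version conflict.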
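To make the internal/external difference concrete, here is a deliberately simplified in-memory model in Python. apply_doc is a hypothetical helper, not Elasticsearch code. It ignores types, shards and batching and only mimics the per-document decision described above, with dest mapping a document id to a (version, document) pair. EXTERNAL_BODY shows where version_type goes in a real request.

```python
# Where the option goes in a real _reindex request.
EXTERNAL_BODY = {
    "source": {"index": "test"},
    "dest": {"index": "test-copy", "version_type": "external"},
}


def apply_doc(dest, doc_id, source_version, source_doc, version_type="internal"):
    """Simplified model of how one source document lands in the destination.

    Returns "created", "updated" or "version_conflict".
    """
    existing = dest.get(doc_id)
    if version_type == "internal":
        # Blind write: overwrite any document with the same id;
        # the destination bumps its own version number.
        new_version = 1 if existing is None else existing[0] + 1
        dest[doc_id] = (new_version, source_doc)
        return "created" if existing is None else "updated"
    if version_type == "external":
        # Keep the source version; write only if missing or older in dest.
        if existing is None:
            dest[doc_id] = (source_version, source_doc)
            return "created"
        if existing[0] < source_version:
            dest[doc_id] = (source_version, source_doc)
            return "updated"
        return "version_conflict"  # dest copy is same or newer: left untouched
    raise ValueError(f"unsupported version_type: {version_type}")


dest = {"1": (3, {"flag": "old"})}
print(apply_doc(dest, "1", 2, {"flag": "new"}, "external"))  # dest is newer
print(apply_doc(dest, "1", 2, {"flag": "new"}, "internal"))  # overwritten anyway
```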
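Setting op_type to "create" in the dest section makes _reindex create only the documents that do not yet exist in the destination index. Every document that already exists causes a version conflict. By default, version conflicts abort the _reindex process and are reported under failures. Setting conflicts to "proceed" makes _reindex keep going and only count the conflicts in version_conflicts. The two requests below show the difference.
Request: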
POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test-copy",
    "op_type": "create"
  }
}
Response:
{
  "took": 2,
  "timed_out": false,
  "total": 2,
  "updated": 0,
  "created": 0,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 2,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": [
    {
      "index": "test-copy",
      "type": "doc",
      "id": "2",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[doc][2]: version conflict, document already exists (current version [1])",
        "index_uuid": "8b78uPjKRmuH_2cqSiPKIA",
        "shard": "2",
        "index": "test-copy"
      },
      "status": 409
    },
    {
      "index": "test-copy",
      "type": "doc",
      "id": "1",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[doc][1]: version conflict, document already exists (current version [1])",
        "index_uuid": "8b78uPjKRmuH_2cqSiPKIA",
        "shard": "3",
        "index": "test-copy"
      },
      "status": 409
    }
  ]
}
Request:
POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test-copy",
    "op_type": "create"
  }
}
Response:
{
  "took": 5,
  "timed_out": false,
  "total": 3,
  "updated": 0,
  "created": 0,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 3,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}
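The two responses differ mainly in the failures array, while version_conflicts counts the conflicts in both. The sketch below is a small Python helper that reads those documented counters so a script can tell the two outcomes apart. summarize_reindex is a hypothetical name, and the two dicts are trimmed copies of the responses above, not new data.

```python
def summarize_reindex(response):
    """Summarize a _reindex response dict using its counters and failures."""
    failures = response.get("failures", [])
    return {
        "created": response["created"],
        "updated": response["updated"],
        "version_conflicts": response["version_conflicts"],
        "failed_ids": [f["id"] for f in failures],
        "no_failures": not failures,
    }


# Trimmed copies of the two responses above.
without_proceed = {
    "total": 2, "created": 0, "updated": 0, "version_conflicts": 2,
    "failures": [{"id": "2", "status": 409}, {"id": "1", "status": 409}],
}
with_proceed = {
    "total": 3, "created": 0, "updated": 0, "version_conflicts": 3,
    "failures": [],
}

print(summarize_reindex(without_proceed))  # conflicts reported as failures
print(summarize_reindex(with_proceed))     # conflicts only counted
```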
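You can list several source indices, e.g. "index": ["source_index_1", "source_index_2"]. You can also:
- limit how many documents are copied from the source index with the top-level size parameter;
- filter and order them with query and sort inside source;
- choose which fields to copy with _source.

The request below copies only the single most recent document (by date) that matches the query: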
POST _reindex
{
  "size": 1,
  "source": {
    "index": "test",
    "sort": {
      "date": "desc"
    },
    "query": {
      "match": {
        "test": "data"
      }
    },
    "_source": ["field1", "field2"]
  },
  "dest": {...}
}
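The options above can be combined in one body. The Python sketch below builds such a body with two source indices. filtered_reindex_body and the destination name "test-copy" are hypothetical. The top-level "size" follows the 6.x API used in this article; newer releases replaced it with max_docs.

```python
def filtered_reindex_body(indices, dest, max_docs, sort_field, query, fields):
    """Build a _reindex body that copies at most max_docs matching documents.

    Uses the 6.x top-level "size" to cap the total number of copied documents.
    """
    return {
        "size": max_docs,
        "source": {
            "index": indices,                 # one name or a list of names
            "sort": {sort_field: "desc"},     # newest first
            "query": query,                   # only matching documents
            "_source": fields,                # only these fields are copied
        },
        "dest": {"index": dest},
    }


body = filtered_reindex_body(
    indices=["source_index_1", "source_index_2"],
    dest="test-copy",  # hypothetical destination index
    max_docs=1,
    sort_field="date",
    query={"match": {"test": "data"}},
    fields=["field1", "field2"],
)
print(body)
```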
_reindex can modify documents with a script. For example, if the source documents have a field named "flag" that should be called "tag" in the destination, run:
POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test2"
  },
  "script": {
    "source": "ctx._source.tag = ctx._source.remove(\"flag\")"
  }
}
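In the script, ctx._source is the document's source as a map. remove("flag") deletes the key and returns its old value, which is then stored under "tag". The Python function below mirrors that per-document effect so you can check it locally. rename_flag_to_tag is a hypothetical helper, not part of Elasticsearch. One consequence worth noting: a document without "flag" still gets "tag": null, because Map.remove returns null for a missing key. If that is unwanted, the script could be guarded with ctx._source.containsKey("flag").

```python
def rename_flag_to_tag(source):
    """Python mirror of the script: ctx._source.tag = ctx._source.remove("flag")."""
    doc = dict(source)  # work on a copy of one document's _source
    # Map.remove returns the old value, or null if the key is absent;
    # dict.pop with a None default behaves the same way.
    doc["tag"] = doc.pop("flag", None)
    return doc


print(rename_flag_to_tag({"flag": "red", "title": "a"}))
print(rename_flag_to_tag({"title": "b"}))  # no "flag": tag becomes None
```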
Reindexing from a remote Elasticsearch cluster
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}
Remote hosts must be explicitly whitelisted in elasticsearch.yml with the reindex.remote.whitelist setting, for example: reindex.remote.whitelist: ["first-host:9200", "second-host:9200"]
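Reindexing from a remote cluster uses an on-heap buffer whose maximum size defaults to 100 MB. By default each batch holds 1000 documents. If the source index contains very large documents, use a smaller batch size. The batch size is set with the size field inside source, not the top-level size shown earlier, which limits the total number of documents copied.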
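You can also set socket_timeout (the socket read timeout) and connect_timeout (the connection timeout) on the remote connection. If they are not set, both default to 30 seconds.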
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "socket_timeout": "1m",
      "connect_timeout": "10s"
    },
    "index": "source"
  },
  "dest": {
    "index": "dest"
  }
}
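The remote options from this section can be put together in one body. The Python sketch below shows where each one belongs: credentials and timeouts inside source.remote, and the batch size as size inside source rather than at the top level. remote_reindex_body is a hypothetical helper, and the batch size of 10 is an illustrative value for very large documents, not a recommendation.

```python
def remote_reindex_body(host, index, dest, batch_size=None,
                        socket_timeout=None, connect_timeout=None,
                        username=None, password=None):
    """Build a remote _reindex body; unset options are left out."""
    remote = {"host": host}
    if username is not None:
        remote["username"] = username
        remote["password"] = password
    if socket_timeout is not None:
        remote["socket_timeout"] = socket_timeout    # read timeout, default 30s
    if connect_timeout is not None:
        remote["connect_timeout"] = connect_timeout  # connect timeout, default 30s
    source = {"remote": remote, "index": index}
    if batch_size is not None:
        source["size"] = batch_size  # documents per batch, not a total cap
    return {"source": source, "dest": {"index": dest}}


body = remote_reindex_body(
    "http://otherhost:9200", "source", "dest",
    batch_size=10, socket_timeout="1m", connect_timeout="10s",
)
print(body)
```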
For more features, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/docs-reindex.html