Update API

weixin_30482383

Original link: http://www.cnblogs.com/ginb/p/9413382.html

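The Update API updates a document based on a supplied script. The operation gets the document from the index, runs the script (the script language and parameters are optional), and indexes the result back (the script may also delete the document or have the operation ignored). It uses versioning to make sure that no other update has happened between the "get" (fetching the document) and the "reindex" (indexing it again).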

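Note that this operation still reindexes the whole document: the update fetches the document, merges the changes, deletes the old document, and adds the merged one. It only saves some network round trips and reduces the chance of a version conflict between the get and the index. The _source field must be enabled for this feature to work.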
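The get, modify, reindex cycle described above can be modeled in a few lines of Python. This is only a sketch of the idea under stated assumptions: TinyIndex, VersionConflict, and add_four are hypothetical names invented for illustration, and nothing here is Elasticsearch code or an Elasticsearch client API.

```python
import copy


class VersionConflict(Exception):
    """Raised when the stored version changed between get and reindex."""


class TinyIndex:
    """Hypothetical in-memory stand-in for an index (not an Elasticsearch API)."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, source)

    def index(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if someone else changed the document in between.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        self._docs[doc_id] = (current_version + 1, copy.deepcopy(source))
        return current_version + 1

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, copy.deepcopy(source)

    def update(self, doc_id, script):
        version, source = self.get(doc_id)  # 1. get the current document
        script(source)                      # 2. run the script on a copy
        # 3. reindex the whole document, guarded by the version read in step 1
        return self.index(doc_id, source, expected_version=version)


def add_four(source):
    source["counter"] += 4


index = TinyIndex()
index.index("1", {"counter": 1, "tags": ["red"]})
new_version = index.update("1", add_four)
print(new_version, index.get("1")[1])  # prints: 2 {'counter': 5, 'tags': ['red']}
```

The point of the model is that all three steps happen in one place, so the window in which a conflicting write can slip in is short, but the document is still rewritten in full.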

For example, index a simple document:

PUT test/_doc/1
{
    "counter" : 1,
    "tags" : ["red"]
}

Scripted updates

The following example runs a script that increments counter:

POST test/_doc/1/_update
{
    "script" : {
        "source": "ctx._source.counter += params.count",
        "lang": "painless",
        "params" : {
            "count" : 4
        }
    }
}

Now we can add a tag to the tags list (note that the tag is added even if it already exists, because tags is a list):

POST test/_doc/1/_update
{
    "script" : {
        "source": "ctx._source.tags.add(params.tag)",
        "lang": "painless",
        "params" : {
            "tag" : "blue"
        }
    }
}

Besides _source, the following variables are also available through ctx: _index, _type, _id, _version, _routing, and _now (the current timestamp).

The following example reads _id and adds it to tags:

POST test/_doc/1/_update
{
    "script" : "ctx._source.tags.add(ctx._id)"
}

A new field can also be added to the document:

POST test/_doc/1/_update
{
    "script" : "ctx._source.new_field = 'value_of_new_field'"
}

Or a field can be removed from the document:

POST test/_doc/1/_update
{
    "script" : "ctx._source.remove('new_field')"
}

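It is even possible to change the operation that is executed. The following example deletes the document if the tags field contains green; otherwise it does nothing (the operation is ignored and noop is returned):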

POST test/_doc/1/_update
{
    "script" : {
        "source": "if (ctx._source.tags.contains(params.tag)) { ctx.op = 'delete' } else { ctx.op = 'none' }",
        "lang": "painless",
        "params" : {
            "tag" : "green"
        }
    }
}

Updates with a partial document

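The update API also supports passing a partial document, which is merged into the existing document (a simple recursive merge: inner objects are merged, while core key/value pairs and arrays are replaced). To replace the existing document completely, use the index API instead. The following example uses a partial update to add a new field to the existing document: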

POST test/_doc/1/_update
{
    "doc" : {
        "name" : "new_name"
    }
}
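The merge rule (objects merged recursively, plain values and arrays replaced) can be sketched in Python as follows. This is an illustrative model of the rule as stated above, not Elasticsearch's implementation; merge_partial is a hypothetical helper, and the user field is an assumption added only to show the nested case.

```python
def merge_partial(source, partial):
    """Return a new dict with `partial` recursively merged into `source`."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Inner objects are merged key by key.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Core values and arrays are replaced wholesale.
            merged[key] = value
    return merged


existing = {
    "counter": 1,
    "tags": ["red"],
    "user": {"first": "a", "last": "b"},  # hypothetical nested field
}
partial = {"name": "new_name", "tags": ["blue"], "user": {"last": "c"}}
merged = merge_partial(existing, partial)
print(merged)
```

In the merged result, name is added, tags is replaced by ["blue"] rather than extended, and user keeps first while last changes to "c".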

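If both doc and script are specified, the request is rejected with a validation error. It is better to put the field pairs of the partial document into the script itself (I have not yet worked out how to do that). For example: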

POST test/_doc/1/_update
{
    "doc" : {
        "age" : "18"
    },
    "script" : {
        "source": "ctx._source.counter += params.count",
        "lang": "painless",
        "params" : {
            "count" : 4
        }
    }
}

The response is:

{
  "error": {
    "root_cause": [
      {
        "type": "action_request_validation_exception",
        "reason": "Validation Failed: 1: can't provide both script and doc;"
      }
    ],
    "type": "action_request_validation_exception",
    "reason": "Validation Failed: 1: can't provide both script and doc;"
  },
  "status": 400
}
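One way to fold the partial document's fields into the script is to pass them as params and assign them inside the script source. The Python sketch below only builds such a request body; script_update_body is a hypothetical helper, it assumes every field name is a simple identifier, and the resulting request has not been run against a cluster here.

```python
import json


def script_update_body(doc_fields, count):
    """Build an update body that sets `doc_fields` and bumps counter in one script."""
    # One Painless assignment per field; values travel as params, not inline.
    assignments = [f"ctx._source.{name} = params.{name}" for name in doc_fields]
    source = "; ".join(assignments + ["ctx._source.counter += params.count"])
    return {
        "script": {
            "source": source,
            "lang": "painless",
            "params": {**doc_fields, "count": count},
        }
    }


body = script_update_body({"age": "18"}, 4)
print("POST test/_doc/1/_update")
print(json.dumps(body, indent=4))
```

Passing the values through params rather than concatenating them into the source keeps the script text identical across requests, which is the same pattern the scripted examples earlier in this article use.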

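Detecting noop updates

If doc is specified, its values are merged with the existing _source. By default, an update that changes nothing is detected and returns "result": "noop", as in the following example: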

POST test/_doc/1/_update
{
    "doc" : {
        "name" : "new_name"
    }
}

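If name was already new_name before the request was sent, the whole update request is ignored. When a request is ignored, the result element in the response is noop.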

{
  "_index": "test",
  "_type": "_doc",
  "_id": "1",
  "_version": 2,
  "result": "noop",
  "_shards": {
    "total": 0,
    "successful": 0,
    "failed": 0
  }
}

Setting "detect_noop": false disables this default behavior:

POST test/_doc/1/_update
{
    "doc" : {
        "name" : "new_name"
    },
    "detect_noop": false
}
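The effect of detect_noop can be illustrated with a small Python model. This is a sketch only: apply_doc_update is a hypothetical function, and it uses a shallow merge, which is enough for the flat documents in this article but is not the full recursive merge.

```python
def apply_doc_update(source, partial, detect_noop=True):
    """Return (result, new_source) for a partial-document update."""
    merged = {**source, **partial}  # shallow merge; fine for flat documents
    if detect_noop and merged == source:
        # Nothing changed, so the write is skipped entirely.
        return "noop", source
    return "updated", merged


stored = {"counter": 1, "name": "new_name"}
print(apply_doc_update(stored, {"name": "new_name"})[0])
print(apply_doc_update(stored, {"name": "new_name"}, detect_noop=False)[0])
print(apply_doc_update(stored, {"name": "other"})[0])
```

The first call reports noop, while the second reports updated even though the content is identical, because the comparison is skipped and the document is written again.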

Upserts

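If the document does not exist yet, the contents of the upsert element are inserted as a new document. If the document does exist, the script is executed instead: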

POST test/_doc/1/_update
{
    "script" : {
        "source": "ctx._source.counter += params.count",
        "lang": "painless",
        "params" : {
            "count" : 4
        }
    },
    "upsert" : {
        "counter" : 1
    }
}

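A script is not required. The following also works: if the document does not exist, the upsert content is inserted; if it does exist, the doc content is merged in: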

POST test/_doc/1/_update
{
    "doc" : {
        "name" : "new_name"
    },
    "upsert" : {
        "counter" : 10
    }
}

scripted_upsert
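To run the script whether or not the document exists (so that the script, rather than the upsert element alone, handles initializing the document), set scripted_upsert to true: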

POST sessions/session/dh3sgudg8gsrgl/_update
{
    "scripted_upsert":true,
    "script" : {
        "id": "my_web_session_summariser",
        "params" : {
            "pageViewEvent" : {
                "url":"foo.com/bar",
                "response":404,
                "time":"2014-01-01 12:32"
            }
        }
    },
    "upsert" : {}
}

Now compare this with a scripted update that has no upsert element. If the document does not exist, the following request fails:

POST test/_doc/1/_update
{
    "scripted_upsert":true,
    "script" : {
        "source": "ctx._source.counter += params.count",
        "lang": "painless",
        "params" : {
            "count" : 4
        }
    }
}

The error response is:

{
  "error": {
    "root_cause": [
      {
        "type": "document_missing_exception",
        "reason": "[_doc][1]: document missing",
        "index_uuid": "YgmlkeEERGm20yUBDJHKtQ",
        "shard": "3",
        "index": "test"
      }
    ],
    "type": "document_missing_exception",
    "reason": "[_doc][1]: document missing",
    "index_uuid": "YgmlkeEERGm20yUBDJHKtQ",
    "shard": "3",
    "index": "test"
  },
  "status": 404
}

With scripted_upsert set to true and an upsert element present, run the following request while the document does not exist:

POST test/_doc/1/_update
{
    "scripted_upsert":true,
    "script" : {
        "source": "ctx._source.counter += params.count",
        "lang": "painless",
        "params" : {
            "count" : 4
        }
    },
    "upsert" : {
        "counter" : 10
    }
}

The response is:

{
  "_index": "test",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 6,
  "_primary_term": 1
}

The request succeeded. Now look at the document:

{
  "_index": "test",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "counter": 14
  }
}

The value of counter is 14, which shows that the upsert content was applied first and the script then ran against it.
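The behavior observed in this section can be summarized as a small decision procedure. The Python below is a sketch of that behavior as this article describes it, not Elasticsearch source code; scripted_update and add_four are hypothetical names.

```python
import copy


def scripted_update(existing, script, upsert=None, scripted_upsert=False):
    """Model a scripted update; `existing` is None when the document is missing."""
    if existing is None:
        if upsert is None:
            # No upsert element: the request fails, as in the 404 shown earlier.
            raise LookupError("document missing")
        source = copy.deepcopy(upsert)
        if scripted_upsert:
            script(source)  # the script runs against the upsert content
        return "created", source
    source = copy.deepcopy(existing)
    script(source)
    return "updated", source


def add_four(source):
    source["counter"] += 4


print(scripted_update(None, add_four, upsert={"counter": 10}, scripted_upsert=True))
# prints: ('created', {'counter': 14})
print(scripted_update(None, add_four, upsert={"counter": 10}))
# prints: ('created', {'counter': 10})
```

Without scripted_upsert, the upsert content is inserted as is and the script is skipped, which is why the plain upsert example earlier would leave counter at its initial value.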

doc_as_upsert
Setting doc_as_upsert to true uses the contents of doc as the upsert value, instead of sending a partial doc plus a separate upsert document:

POST test/_doc/1/_update
{
    "doc" : {
        "name" : "new_name"
    },
    "doc_as_upsert" : true
}

Compare this with sending doc alone:

POST test/_doc/1/_update
{
    "doc" : {
        "name" : "new_name"
    }
}

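When the document does not exist, the request with doc_as_upsert set to true succeeds, whereas the request above fails with a document missing error. What happens if the request is written like this?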

POST test/_doc/1/_update
{
    "doc" : {
        "name" : "new_name"
    },
    "upsert" : {
        "counter" : 10
    },
    "doc_as_upsert" : true
}

The upsert element is never used: whether or not the document exists, it is always the doc content that is applied.
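The three doc-based cases in this section can be put side by side in one hedged Python model. It reflects only the behavior reported in this article; doc_update is a hypothetical function and the merge is shallow for brevity.

```python
def doc_update(existing, doc, upsert=None, doc_as_upsert=False):
    """Model a partial-document update; `existing` is None when the document is missing."""
    if existing is None:
        if doc_as_upsert:
            # doc takes the place of upsert, even if upsert was also sent.
            return "created", dict(doc)
        if upsert is None:
            raise LookupError("document missing")
        return "created", dict(upsert)
    # The document exists: doc is merged in and upsert is not consulted.
    return "updated", {**existing, **doc}


doc = {"name": "new_name"}
upsert = {"counter": 10}
print(doc_update(None, doc, upsert=upsert, doc_as_upsert=True))
# prints: ('created', {'name': 'new_name'})
print(doc_update({"counter": 1}, doc, upsert=upsert, doc_as_upsert=True))
# prints: ('updated', {'counter': 1, 'name': 'new_name'})
```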