ES Learning Path - Document APIs

1.One Type per Index

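Starting with Elasticsearch 6.0, an index can no longer hold multiple mapping types: each index has only one type. This differs from a relational database, where a database holds many tables. The official documentation explains why types were removed: the common analogy between Elasticsearch and a relational database was misleading from the start. Fields with the same name in different types of the same index are backed by the same Lucene field, so they must have the same mapping (the same datatype). For details, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html. Types are removed entirely in later versions (deprecated in 7.0, removed in 8.0).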

2.Index API: adds a JSON document to an index, or updates it, and makes the document searchable
PUT test/log/2
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
//Add a document with id 2 to type log of index test
//Response:
{
  "_index": "test",
  "_type": "log",
  "_id": "2",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 3
}
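//total: how many shard copies (primary and replica shards) the index operation should be executed on
//successful: how many shard copies the operation succeeded on (here only the primary, typically because the replica is unassigned on a single-node cluster)
//failed: how many shard copies the operation failed on
//A successful index operation always has successful of at least 1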
2.1 Automatic Index Creation

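By default, if the target index does not exist, the index operation creates it automatically. It also maps new fields dynamically: any field that is not yet in the mapping is added to it. Automatic index creation is disabled with the setting action.auto_create_index: false (not index.mapper.dynamic); dynamic mapping of new fields is controlled separately by the dynamic mapping parameter (e.g. "dynamic": "strict"). If no id is given, Elasticsearch generates a unique id automatically; in that case use POST (e.g. POST test/log) instead of PUT.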

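3.Version

Every document has a version, and it changes on every index, update, or delete. A version can be specified when indexing a document, and also when updating or getting it.

If no version parameter is provided, Elasticsearch does not check the version.

By default the version starts at 1 and, unless something else intervenes, is incremented by 1 on each update or delete.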

PUT test/log/2?version=2
{
  "message": "test for verison"
}
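//Reindex document 2 with a new message (PUT replaces the whole source), but only if its current version is 2
//The current version is 1, so the request is rejected with a version conflict: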
{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[log][2]: version conflict, current version [1] is different than the one provided [2]",
        "index_uuid": "hwkWXs3KTBWJHad-AbuynQ",
        "shard": "2",
        "index": "test"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[log][2]: version conflict, current version [1] is different than the one provided [2]",
    "index_uuid": "hwkWXs3KTBWJHad-AbuynQ",
    "shard": "2",
    "index": "test"
  },
  "status": 409
}
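To make the version check concrete, here is a toy in-memory model in Python. It is a sketch under assumptions, not Elasticsearch code: `ToyIndex` and `VersionConflict` are hypothetical names. Each write bumps the version by one, and a write that passes a version is rejected unless it matches the current one, which is what produces the 409 above.

```python
class VersionConflict(Exception):
    """Stands in for version_conflict_engine_exception (HTTP 409)."""


class ToyIndex:
    """Hypothetical in-memory model of internal versioning, not Elasticsearch code."""

    def __init__(self):
        self.docs = {}  # doc id -> (version, source)

    def index(self, doc_id, source, version=None):
        current = self.docs.get(doc_id)
        current_version = current[0] if current else None
        # Without a version parameter, no check is performed.
        if version is not None and version != current_version:
            raise VersionConflict(
                f"[{doc_id}]: version conflict, current version "
                f"[{current_version}] is different than the one provided [{version}]"
            )
        # A new document starts at version 1; every later write adds 1.
        new_version = 1 if current is None else current_version + 1
        self.docs[doc_id] = (new_version, source)
        return new_version


idx = ToyIndex()
idx.index("2", {"message": "trying out Elasticsearch"})  # version 1
try:
    idx.index("2", {"message": "test for version"}, version=2)
except VersionConflict as err:
    print(err)  # the stored document is left unchanged
```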
4.Operation Type

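The op_type parameter changes how the index operation behaves. By default (op_type=index), the document is created if it does not exist and replaced if it does. With op_type=create, as in the example below, the operation only creates: if a document with that id already exists, it fails.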

PUT test/log/1?op_type=create
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
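//Force creation: if document 1 already exists, the request fails with a 409 version conflict
//Equivalent form using the _create endpoint
PUT test/log/1/_create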
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
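A minimal Python sketch of the difference between the default op_type and op_type=create, assuming a plain dict stands in for the index; `index_doc` and `DocumentAlreadyExists` are hypothetical names, not part of any Elasticsearch client.

```python
class DocumentAlreadyExists(Exception):
    """Stands in for the 409 version conflict returned by op_type=create."""


def index_doc(store, doc_id, source, op_type="index"):
    """Hypothetical model of op_type; store maps doc id -> source."""
    if op_type not in ("index", "create"):
        raise ValueError(f"unknown op_type: {op_type}")
    exists = doc_id in store
    if op_type == "create" and exists:
        # create never overwrites an existing document
        raise DocumentAlreadyExists(f"[{doc_id}]: version conflict, document already exists")
    store[doc_id] = source  # op_type=index creates or replaces
    return "updated" if exists else "created"


store = {}
print(index_doc(store, "1", {"user": "kimchy"}))  # created
print(index_doc(store, "1", {"user": "kimchy"}))  # updated
```

Calling `index_doc(store, "1", {...}, op_type="create")` at this point would raise, mirroring the failing PUT test/log/1?op_type=create.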
5.Routing

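By default, the shard a document is placed on (its routing) is determined by a hash of the document's id. For explicit control, the routing parameter specifies, per operation, the value fed into the hash function used by the router, for example: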

POST test/log?routing=kimchy
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
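The routing rule is roughly shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document id. The sketch below illustrates that formula only: `pick_shard` is a hypothetical helper, and zlib.crc32 is a stand-in for the Murmur3 hash Elasticsearch actually uses (newer versions refine the formula further), so the shard numbers will not match a real cluster.

```python
import zlib


def pick_shard(doc_id, num_primary_shards, routing=None):
    """Toy version of shard = hash(_routing) % num_primary_shards.

    _routing defaults to the document id. zlib.crc32 is only a stand-in
    for the Murmur3 hash Elasticsearch really uses.
    """
    routing_value = routing if routing is not None else doc_id
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


# Every document routed with "kimchy" lands on the same shard, whatever its id.
shards = {pick_shard(doc_id, 5, routing="kimchy") for doc_id in ["1", "2", "3"]}
print(shards)
```

This is also why a document indexed with a custom routing value must be deleted or fetched with the same value: without it, the id is hashed instead and the request may go to a different shard.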
6.Timeout

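A timeout can be set on the request. It controls how long the index operation waits for the primary shard to become available (for example, while it is recovering or relocating) before failing with an error; the default is 1m: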

PUT test/log/1?timeout=5m
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
7.Get API

The get API retrieves a JSON document from an index by its id, for example:

GET /test/log/1
Response:
{
  "_index": "test",
  "_type": "log",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "arr": [
      {
        "name": "1"
      },
      {
        "name": "2"
      }
    ]
  }
}

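By default, the get API is realtime: it is not affected by the refresh rate of the index. If a document has been updated but not yet refreshed, the get API issues a refresh in place so that the latest version is returned. To disable realtime get, set the realtime parameter to false.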

As in the call above, the response includes _source by default. To leave it out, pass _source=false:

GET /test/log/1?_source=false

To return only some fields of _source, use the include/exclude filters:

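GET test/log/1?_source_include=*.id&_source_exclude=entities
//_source_exclude: fields to leave out of the result
//_source_include: fields to keep in the result (later versions rename these to _source_includes / _source_excludes)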
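The include/exclude logic can be sketched in Python as follows. This is a toy model, not how Elasticsearch implements source filtering: `filter_source` is a hypothetical helper, wildcards are matched with fnmatch, and a pattern that matches an object also covers every field inside it (so excluding entities drops entities.id).

```python
from fnmatch import fnmatchcase


def _flatten(obj, prefix=""):
    """Yield (dotted_path, value) for every leaf of a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from _flatten(value, path + ".")
        else:
            yield path, value


def _matches(path, patterns):
    # A pattern matches a field or any parent object of it,
    # so excluding "entities" also drops "entities.tags".
    parts = path.split(".")
    prefixes = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return any(fnmatchcase(p, pat) for p in prefixes for pat in patterns)


def filter_source(source, includes=(), excludes=()):
    """Toy model of _source_include / _source_exclude, not Elasticsearch code."""
    result = {}
    for path, value in _flatten(source):
        if includes and not _matches(path, includes):
            continue
        if _matches(path, excludes):
            continue
        # Rebuild the nested structure for the kept leaf.
        node = result
        *parents, leaf = path.split(".")
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


doc = {"id": 7, "user": {"id": 1, "name": "kimchy"}, "entities": {"id": 3, "tags": ["a"]}}
print(filter_source(doc, includes=["*.id"], excludes=["entities"]))  # {'user': {'id': 1}}
```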

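The version parameter does not retrieve an older version of the document (Elasticsearch keeps only the latest one). Instead, the get succeeds only if the document's current version equals the given version; otherwise it fails with a 409 version conflict: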

GET /test/log/1?version=1
8.Delete API

The delete API removes the document with the given id:

DELETE /test/log/1
{
  "_index": "test",
  "_type": "log",
  "_id": "1",
  "_version": 2,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 4
}
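//The response includes _version: a delete is itself a change to the document, so the version is incremented. In Elasticsearch, indexing, updating, and deleting a document all change its version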

If the document was indexed with a custom routing value, the same routing value must be provided to delete it:

DELETE /test/log/1?routing=kimchy

A deleted document disappears from search results only after the next refresh.

The delete API also accepts a timeout:

DELETE /test/log/1?timeout=5m
9.Delete By Query API

The delete by query API deletes every document that matches a query:

POST /test/_delete_by_query
{
  "query": {
    "match": {
      "name": "mjlf"
    }
  }
}
//Response
{
  "took": 104,
  "timed_out": false,
  "total": 1,
  "deleted": 1,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}

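_delete_by_query takes a snapshot of the index when the request starts and deletes what it finds using internal versioning. This means that if a document changes between the time the snapshot is taken and the time the delete is processed, a version conflict occurs. When the versions match, the document is deleted.

Because internal versioning does not support 0 as a valid version number, documents whose version equals zero cannot be deleted with _delete_by_query, and the request will fail.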
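The snapshot-then-compare behavior can be modeled in a few lines of Python. This is a hypothetical sketch (the `delete_by_query` function and its `concurrent_write` hook are made up for illustration): matching documents are snapshotted with their versions, and each one is deleted only if its version is unchanged; otherwise it is counted as a version conflict.

```python
def delete_by_query(store, predicate, concurrent_write=None):
    """Hypothetical model of _delete_by_query; store maps doc id -> (version, source).

    concurrent_write, if given, simulates another client changing documents
    after the snapshot is taken but before the deletes are processed.
    """
    # 1. Snapshot the (id, version) of every matching document.
    snapshot = {doc_id: version for doc_id, (version, source) in store.items() if predicate(source)}
    if concurrent_write is not None:
        concurrent_write(store)
    result = {"deleted": 0, "version_conflicts": 0}
    # 2. Delete a document only if its version still matches the snapshot.
    for doc_id, seen_version in snapshot.items():
        current = store.get(doc_id)
        if current is not None and current[0] == seen_version:
            del store[doc_id]
            result["deleted"] += 1
        else:
            result["version_conflicts"] += 1
    return result


def bump_doc_2(store):
    # Another writer updates document 2 after the snapshot.
    version, source = store["2"]
    store["2"] = (version + 1, source)


docs = {"1": (1, {"name": "mjlf"}), "2": (1, {"name": "mjlf"}), "3": (1, {"name": "other"})}
print(delete_by_query(docs, lambda src: src["name"] == "mjlf", bump_doc_2))
```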

10.Update API
//Script update: append params.name to the existing name field
POST /test/log/2/_update
{
  "script": {
    "source": "ctx._source.name += params.name",
    "params": {
      "name": "love f"
    }
  }
}

//Append a new element to the tags array
POST /test/log/2/_update
{
  "script": {
    "source": "ctx._source.tags.add(params.tag)",
    "lang": "painless",
    "params": {
      "tag": "name"
    }
  }
}

//Add a new field
POST /test/log/2/_update
{
    "script" : "ctx._source.new_field = 'value_of_new_field'"
}

//Remove a field
POST /test/log/2/_update
{
  "script": {
    "source": "ctx._source.remove('new_field')"
  }
}

//Conditional: if tags contains params.name, delete the document (ctx.op = 'delete'); otherwise do nothing (ctx.op = 'none')
POST /test/log/2/_update
{
  "script": {
    "source": "if(ctx._source.tags.contains(params.name)){ ctx.op = 'delete' } else { ctx.op = 'none' }",
    "params": {
      "name": "name"
    }
  }
}

//The update API also accepts a partial document, which is merged into the existing one
POST /test/log/2/_update
{
  "doc": {
    "name":"mjlf",//update an existing field
    "age": 12,//add a new field
    "tags": [//add a new array
      "tag1"
    ]
  }
}

//A partial update that sets age to the value it already has
POST /test/log/2/_update
{
  "doc": {
    "age": 13
  }
}
//Response: age was already 13, so nothing needs to change and result is noop (no operation was performed). Set "detect_noop": false in the request body to disable this check
{
  "_index": "test",
  "_type": "log",
  "_id": "2",
  "_version": 4,
  "result": "noop",
  "_shards": {
    "total": 0,
    "successful": 0,
    "failed": 0
  }
}
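A rough Python model of how a partial doc is merged and when the result is noop, assuming the merge semantics of the update API's doc parameter (objects are merged recursively, other values such as arrays are replaced). `partial_update` and `merge` are hypothetical helpers, not Elasticsearch code.

```python
import copy


def merge(target, partial):
    """Recursively merge partial into target: objects are merged, other values replaced."""
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)


def partial_update(source, doc, detect_noop=True):
    """Toy model of an update with "doc"; returns (result, new_source)."""
    updated = copy.deepcopy(source)
    merge(updated, doc)
    if detect_noop and updated == source:
        return "noop", source  # nothing changed, so nothing is written
    return "updated", updated


source = {"name": "mjlf", "age": 13}
print(partial_update(source, {"age": 13})[0])                     # noop
print(partial_update(source, {"age": 13}, detect_noop=False)[0])  # updated
```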

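//If the document does not exist, the update fails with the error below. With upsert, the upsert document is inserted as a new document when the document is missing, and the update is applied when it exists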
{
  "error": {
    "root_cause": [
      {
        "type": "document_missing_exception",
        "reason": "[log][4]: document missing",
        "index_uuid": "hwkWXs3KTBWJHad-AbuynQ",
        "shard": "2",
        "index": "test"
      }
    ],
    "type": "document_missing_exception",
    "reason": "[log][4]: document missing",
    "index_uuid": "hwkWXs3KTBWJHad-AbuynQ",
    "shard": "2",
    "index": "test"
  },
  "status": 404
}

POST /test/log/3/_update
{
  "doc": {
    "name": "mjlf"
  },
  "upsert": {
    "counter": 1
  }
}

//Alternatively, set doc_as_upsert to use the partial doc itself as the upsert document
POST /test/log/4/_update
{
  "doc": {
    "name": "new_name"
  },
  "doc_as_upsert": true
}
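The three cases (missing document without upsert, upsert, doc_as_upsert) can be summarized in a toy Python function. It is a sketch under assumptions: `update` and `DocumentMissing` are hypothetical names, and the merge into an existing document is shallow here for brevity.

```python
class DocumentMissing(Exception):
    """Stands in for document_missing_exception (HTTP 404)."""


def update(store, doc_id, doc, upsert=None, doc_as_upsert=False):
    """Toy model of the update API's missing-document handling; store maps id -> source."""
    if doc_id in store:
        # Existing document: apply the partial doc (shallow merge in this sketch).
        store[doc_id] = {**store[doc_id], **doc}
        return "updated"
    if doc_as_upsert:
        store[doc_id] = dict(doc)  # the partial doc becomes the new document
    elif upsert is not None:
        store[doc_id] = dict(upsert)  # only the upsert body is indexed; doc is not applied
    else:
        raise DocumentMissing(f"[{doc_id}]: document missing")
    return "created"


store = {}
print(update(store, "3", {"name": "mjlf"}, upsert={"counter": 1}), store["3"])
print(update(store, "3", {"name": "mjlf"}, upsert={"counter": 1}), store["3"])
print(update(store, "4", {"name": "new_name"}, doc_as_upsert=True), store["4"])
```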

The update operation supports the following query-string parameters:

retry_on_conflict: Between the get and indexing phases of the update, another process might have already updated the same document. By default, the update then fails with a version conflict exception. retry_on_conflict controls how many times to retry the update before finally throwing an exception.
routing: Routes the update request to the right shard and sets the routing for the upsert request if the document being updated doesn't exist. It can't be used to update the routing of an existing document.
timeout: How long to wait for a shard to become available.
wait_for_active_shards: The number of shard copies that must be active before proceeding with the update operation.
refresh: Controls when the changes made by this request become visible to search.
_source: Controls whether and how the updated source is returned in the response. By default the updated source is not returned.
version: The update API uses Elasticsearch's versioning support internally to make sure the document doesn't change during the update. Use the version parameter to update the document only if its version matches the one specified.
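retry_on_conflict is easiest to see as an optimistic-concurrency loop: get the document, modify it, write it back only if the version has not changed, and retry on conflict. The Python sketch below models that loop; `Store`, `update_with_retry`, and the `before_write` hook (which simulates another client writing in between) are all hypothetical.

```python
class VersionConflict(Exception):
    pass


class Store:
    """Hypothetical in-memory store with optimistic concurrency, not Elasticsearch code."""

    def __init__(self):
        self.docs = {}  # doc id -> (version, source)

    def get(self, doc_id):
        return self.docs[doc_id]

    def put_if_version(self, doc_id, source, expected_version):
        current_version = self.docs[doc_id][0]
        if current_version != expected_version:
            raise VersionConflict(doc_id)
        self.docs[doc_id] = (current_version + 1, source)


def update_with_retry(store, doc_id, modify, retry_on_conflict=0, before_write=None):
    """Get, modify, then write only if the version is unchanged; retry on conflict."""
    for attempt in range(retry_on_conflict + 1):
        version, source = store.get(doc_id)
        new_source = modify(dict(source))
        if before_write:
            before_write(store, attempt)
        try:
            store.put_if_version(doc_id, new_source, version)
            return attempt  # number of retries that were needed
        except VersionConflict:
            if attempt == retry_on_conflict:
                raise  # out of retries: surface the conflict


def add_like(source):
    source["likes"] += 1
    return source


def other_client(store, attempt):
    # On the first attempt only, another writer sneaks in an update.
    if attempt == 0:
        version, source = store.docs["2"]
        store.docs["2"] = (version + 1, {**source, "likes": source["likes"] + 10})


store = Store()
store.docs["2"] = (1, {"likes": 0})
retries = update_with_retry(store, "2", add_like, retry_on_conflict=1, before_write=other_client)
print(retries, store.docs["2"])  # 1 (3, {'likes': 11})
```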
11.Update By Query API

Elasticsearch can update every document that matches a query, for example:

POST test/log/_update_by_query
{
  "script": {
    "source": "ctx._source.likes++",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}

For more details, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html

12.Multi Get API

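The multi get API retrieves multiple documents by index, type (optional), and id (and possibly routing). The response contains a docs array with all the fetched documents, in the same order as the original requests (if a particular get fails, an object containing the error is included in its place). Each successful get has the same structure as a document returned by the get API.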

//No index in the URL: each entry names its own _index
GET /_mget
{
    "docs" : [
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "1"
        },
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "2"
        }
    ]
}

//Index in the URL: entries only need _type and _id
GET /test/_mget
{
    "docs" : [
        {
            "_type" : "_doc",
            "_id" : "1"
        },
        {
            "_type" : "_doc",
            "_id" : "2"
        }
    ]
}

//Index and type in the URL: entries only need _id
GET /test/_doc/_mget
{
    "docs" : [
        {
            "_id" : "1"
        },
        {
            "_id" : "2"
        }
    ]
}

//Shorthand: pass an array of ids
GET /test/_doc/_mget
{
    "ids" : ["1", "2"]
}

//Per-document _source filtering
GET /_mget
{
    "docs" : [
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "1",
            "_source" : false//do not return _source
        },
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "2",
            "_source" : ["field3", "field4"]//return only these source fields
        },
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "3",
            "_source" : {
                "include": ["user"],//source fields to include
                "exclude": ["user.location"]//source fields to exclude
            }
        }
    ]
}

Specific stored fields can be requested for each document, similar to the stored_fields parameter of the get API. For example:
GET /_mget
{
    "docs" : [
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "1",
            "stored_fields" : ["field1", "field2"]
        },
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "2",
            "stored_fields" : ["field3", "field4"]
        }
    ]
}

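//Routing: ?routing=key1 applies to every entry that does not set its own routing, so document 2 is fetched using key1 and document 1 using key2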
GET /_mget?routing=key1
{
    "docs" : [
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "1",
            "routing" : "key2"
        },
        {
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "2"
        }
    ]
}
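When the list of ids is built programmatically, the _mget body can be assembled with the standard json module. A minimal sketch: `build_mget_body` is a hypothetical helper, and the fixed "_doc" type is an assumption matching the examples above.

```python
import json


def build_mget_body(index, requests):
    """Build an _mget body; requests are (doc_id, routing_or_None) pairs."""
    docs = []
    for doc_id, routing in requests:
        entry = {"_index": index, "_type": "_doc", "_id": doc_id}
        if routing is not None:
            # Per-entry routing overrides the ?routing= URL parameter for this document.
            entry["routing"] = routing
        docs.append(entry)
    return json.dumps({"docs": docs}, indent=2)


# Same request as the routing example: send with GET /_mget?routing=key1
print(build_mget_body("test", [("1", "key2"), ("2", None)]))
```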
13.bulk API

The bulk API performs many index, create, update, and delete operations in a single request. For details, see the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
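The bulk request body is newline-delimited JSON: one action/metadata line per operation, followed by a source line for index, create, and update (delete has none), and the body must end with a newline. A minimal Python sketch of building such a body; `to_bulk_body` is a hypothetical helper, and actually sending it (with Content-Type: application/x-ndjson) is left out.

```python
import json


def to_bulk_body(operations):
    """Serialize (action, metadata, source) triples into a _bulk NDJSON body.

    source is None for delete, which has no second line.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))  # one compact line, no pretty-printing
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = to_bulk_body([
    ("index", {"_index": "test", "_type": "log", "_id": "1"}, {"user": "kimchy"}),
    ("delete", {"_index": "test", "_type": "log", "_id": "2"}, None),
    ("update", {"_index": "test", "_type": "log", "_id": "3"}, {"doc": {"name": "mjlf"}}),
])
print(body)
```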

14.Reindex API

This API is most often used to copy the documents of one index into a new index, like this:

POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}
//Response
{
  "took" : 147,
  "timed_out": false,
  "created": 120,
  "updated": 0,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "total": 120,
  "failures" : [ ]
}

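The example above copies the documents of one index into another, but by default each copied document gets a version assigned by the new index (starting at 1) instead of keeping its original version. The version_type setting in dest controls this. With "version_type": "internal" (the default), Elasticsearch blindly writes the documents into the destination, overwriting any with the same type and id, and the destination assigns the versions. With "version_type": "external", the version from the source index is preserved: missing documents are created, and documents whose version in the destination is older than in the source are updated. For example: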

POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter",
    "version_type": "internal"
  }
}//versions assigned by the destination index (the default)

POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter",
    "version_type": "external"
  }
}//keep the versions from the source index
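The decision _reindex makes for each document under the two version types can be sketched as follows. This is a simplified toy model, not Elasticsearch code: `reindex_doc` is a hypothetical function, and dest maps ids to (version, source) pairs.

```python
def reindex_doc(dest, doc_id, source, src_version, version_type="internal"):
    """Toy model of how _reindex writes one document.

    Returns "created", "updated" or "version_conflict".
    """
    existing = dest.get(doc_id)
    if version_type == "internal":
        # Blindly overwrite; the destination assigns its own version.
        new_version = 1 if existing is None else existing[0] + 1
        dest[doc_id] = (new_version, source)
        return "created" if existing is None else "updated"
    if version_type == "external":
        # Keep the source version; only create missing docs or replace older ones.
        if existing is None:
            dest[doc_id] = (src_version, source)
            return "created"
        if existing[0] < src_version:
            dest[doc_id] = (src_version, source)
            return "updated"
        return "version_conflict"
    raise ValueError(f"unsupported version_type: {version_type}")


internal_dest, external_dest = {}, {}
reindex_doc(internal_dest, "1", {"user": "kimchy"}, src_version=5)
reindex_doc(external_dest, "1", {"user": "kimchy"}, src_version=5, version_type="external")
print(internal_dest["1"][0], external_dest["1"][0])  # 1 5
```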

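With "op_type": "create", _reindex only creates documents that are missing from the destination; any document that already exists there causes a version conflict, for example: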

POST /_reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test1",
    "op_type": "create"
  }
}

{
  "took": 8,
  "timed_out": false,
  "total": 3,
  "updated": 0,
  "created": 0,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 3,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": [
    {
      "index": "test1",
      "type": "log",
      "id": "2",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[log][2]: version conflict, document already exists (current version [8])",
        "index_uuid": "u_eShpBpREiFCytW6hL7jA",
        "shard": "2",
        "index": "test1"
      },
      "status": 409
    },
    {
      "index": "test1",
      "type": "log",
      "id": "3",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[log][3]: version conflict, document already exists (current version [8])",
        "index_uuid": "u_eShpBpREiFCytW6hL7jA",
        "shard": "4",
        "index": "test1"
      },
      "status": 409
    },
    {
      "index": "test1",
      "type": "log",
      "id": "4",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[log][4]: version conflict, document already exists (current version [8])",
        "index_uuid": "u_eShpBpREiFCytW6hL7jA",
        "shard": "2",
        "index": "test1"
      },
      "status": 409
    }
  ]
}

By default, version conflicts abort the _reindex. Setting "conflicts": "proceed" makes it count the conflicts and continue:

POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter",
    "op_type": "create"
  }
}
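The abort-versus-proceed behavior, together with op_type create, can be modeled roughly like this. It is a hypothetical sketch (`reindex_create` is made up), assuming conflicts are checked after each batch, which is consistent with the earlier example: three documents that already exist, processed in one batch, report three conflicts before the request stops.

```python
def reindex_create(source_docs, dest, conflicts="abort", batch_size=1000):
    """Toy model of _reindex with "op_type": "create"; both dicts map id -> source."""
    stats = {"created": 0, "version_conflicts": 0, "batches": 0}
    items = list(source_docs.items())
    for start in range(0, len(items), batch_size):
        stats["batches"] += 1
        batch_conflicts = 0
        for doc_id, doc in items[start:start + batch_size]:
            if doc_id in dest:
                batch_conflicts += 1  # op_type=create refuses to overwrite
            else:
                dest[doc_id] = doc
                stats["created"] += 1
        stats["version_conflicts"] += batch_conflicts
        # Without conflicts=proceed, a batch with conflicts stops the reindex.
        if batch_conflicts and conflicts != "proceed":
            break
    return stats


print(reindex_create({"2": {}, "3": {}, "4": {}}, {"2": {}, "3": {}, "4": {}}))
```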

The source documents to copy can also be limited with a query:

POST _reindex
{
  "source": {
    "index": "twitter",
    "type": "_doc",
    "query": {
      "term": {
        "user": "kimchy"
      }
    }
  },
  "dest": {
    "index": "new_twitter"
  }
}

//Multiple source indices and types
POST _reindex
{
  "source": {
    "index": ["twitter", "blog"],
    "type": ["_doc", "post"]
  },
  "dest": {
    "index": "all_together"
  }
}

//Limit the number of documents copied to the new index
POST _reindex
{
  "size": 1,
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}

//Copy only 10000 documents, newest first (sorted by date descending)
POST _reindex
{
  "size": 10000,
  "source": {
    "index": "twitter",
    "sort": { "date": "desc" }
  },
  "dest": {
    "index": "new_twitter"
  }
}

//Copy only selected fields of the source documents
POST _reindex
{
  "source": {
    "index": "twitter",
    "_source": ["user", "_doc"]
  },
  "dest": {
    "index": "new_twitter"
  }
}

POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter",
    "version_type": "external"
  },
  "script": {
    "source": "if (ctx._source.foo == 'bar') {ctx._version++; ctx._source.remove('foo')}",
    "lang": "painless"
  }
}
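//A script can modify each document while it is being copied (the source index itself is not changed). Here, documents whose foo is 'bar' get their version incremented and the foo field removed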

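//Reindex from a remote cluster (the remote host must be listed in the reindex.remote.whitelist setting in elasticsearch.yml)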
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}

More details: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
