Are Elasticsearch document IDs unique?
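In Elasticsearch, a document _id is unique only within a single shard. Different shards of the same index can each hold a document with the same _id. This happens when the same _id is indexed with different routing values that map to different shards: every write succeeds, and each shard keeps its own copy. A search that does not specify routing is sent to all shards and returns every copy, so the result set contains duplicates.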
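The per-shard uniqueness described above can be sketched with a small toy model. This is only an illustration under stated assumptions: `ToyIndex` and `shard_for` are hypothetical names, and `zlib.crc32` stands in for the Murmur3 hash that Elasticsearch applies to the routing value, so the shard numbers will not match a real cluster.

```python
import zlib

NUM_PRIMARY_SHARDS = 3  # matches number_of_shards in the example index


def shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a shard from the routing value.

    Assumption: crc32 is a stand-in for Elasticsearch's Murmur3 hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


class ToyIndex:
    """Each shard is its own dict keyed by _id, so uniqueness is per shard."""

    def __init__(self, num_shards: int = NUM_PRIMARY_SHARDS) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, doc_id, source, routing=None):
        # Without an explicit routing value, the _id itself is the routing key.
        key = routing if routing is not None else doc_id
        shard = self.shards[shard_for(key, len(self.shards))]
        # Same _id on the same shard overwrites; on another shard it coexists.
        shard[doc_id] = {"_id": doc_id, "_routing": routing, "_source": source}

    def search(self, routing=None):
        # No routing: fan out to every shard. With routing: only one shard.
        if routing is None:
            targets = self.shards
        else:
            targets = [self.shards[shard_for(routing, len(self.shards))]]
        return [doc for shard in targets for doc in shard.values()]


index = ToyIndex()
routings = ["a", "b", "2"]
for r in routings:
    index.put("8", {"name": "worker-8"}, routing=r)

distinct_shards = {shard_for(r) for r in routings}
all_hits = index.search()
print(len(all_hits), "copies of _id 8 across", len(distinct_shards), "shard(s)")
print(len(index.search(routing="a")), "copy when searching with routing=a")
```

The number of copies equals the number of distinct shards the routing values map to. If two routing values land on the same shard, the later write overwrites the earlier one instead of creating a duplicate.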
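Create the index:

    PUT worker-001
    {
      "settings": {
        "number_of_replicas": 2,
        "number_of_shards": 3
      },
      "mappings": {
        "properties": {
          "id": { "type": "keyword" },
          "name": {
            "type": "text",
            "fields": { "key": { "type": "keyword" } }
          },
          "age": { "type": "integer" },
          "workAge": { "type": "short" },
          "companyId": { "type": "keyword" },
          "companyName": {
            "type": "text",
            "fields": { "key": { "type": "keyword" } }
          }
        }
      }
    }

Add the data, using the same _id (8) with three different routing values. The request bodies are the ones reflected in the _source of the search results that follow:

    PUT worker-001/_doc/8?routing=a
    { "id": 8, "name": "worker-8", "companyId": 1, "companyName": "baidu" }

    PUT worker-001/_doc/8?routing=b
    { "id": 8, "name": "worker-8", "companyId": 1, "companyName": "baidu" }

    PUT worker-001/_doc/8?routing=2
    { "id": 8, "name": "worker-8", "companyId": 1, "companyName": "baidu" }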
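Because each write specifies a routing value, all three requests succeed. Here the three routing values map to three different shards, so each request creates a separate document instead of overwriting the previous one.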
Searching without routing
Request:

    GET worker-001/_search

Response:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 3, "relation" : "eq" },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "worker-001",
            "_type" : "_doc",
            "_id" : "8",
            "_score" : 1.0,
            "_routing" : "a",
            "_source" : { "id" : 8, "name" : "worker-8", "companyId" : 1, "companyName" : "baidu" }
          },
          {
            "_index" : "worker-001",
            "_type" : "_doc",
            "_id" : "8",
            "_score" : 1.0,
            "_routing" : "2",
            "_source" : { "id" : 8, "name" : "worker-8", "companyId" : 1, "companyName" : "baidu" }
          },
          {
            "_index" : "worker-001",
            "_type" : "_doc",
            "_id" : "8",
            "_score" : 1.0,
            "_routing" : "b",
            "_source" : { "id" : 8, "name" : "worker-8", "companyId" : 1, "companyName" : "baidu" }
          }
        ]
      }
    }
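The search queried all 3 shards, and the result set contains three documents with the same _id, one per routing value.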
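To spot this situation in a response programmatically, one option is to group the hits by _id and report any _id that appears more than once. The sketch below is a hypothetical helper (`duplicate_ids` is not part of any Elasticsearch client) and assumes the response has already been parsed into a Python dict with the same shape as the output shown above.

```python
from collections import defaultdict


def duplicate_ids(response: dict) -> dict:
    """Map each _id seen more than once to the routing values it was found under."""
    routings_by_id = defaultdict(list)
    for hit in response["hits"]["hits"]:
        routings_by_id[hit["_id"]].append(hit.get("_routing"))
    return {doc_id: r for doc_id, r in routings_by_id.items() if len(r) > 1}


# Trimmed copy of the response shown above (only the fields the helper reads).
response = {
    "hits": {
        "hits": [
            {"_index": "worker-001", "_id": "8", "_routing": "a"},
            {"_index": "worker-001", "_id": "8", "_routing": "2"},
            {"_index": "worker-001", "_id": "8", "_routing": "b"},
        ]
    }
}

print(duplicate_ids(response))  # {'8': ['a', '2', 'b']}
```

The helper only detects duplicates. Which copy to keep is an application decision that this example does not address.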
If the search specifies a routing value, it is sent only to the shard that value maps to, so the duplicates do not appear.
Request:

    GET worker-001/_search?routing=a

Response:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 1, "relation" : "eq" },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "worker-001",
            "_type" : "_doc",
            "_id" : "8",
            "_score" : 1.0,
            "_routing" : "a",
            "_source" : { "id" : 8, "name" : "worker-8", "companyId" : 1, "companyName" : "baidu" }
          }
        ]
      }
    }