Problem
Fetching a document by ID fails with an error:
GET index/_doc/2_900002151162=I1B8PIUB1-66M04493WI71zDLXZliUSgmR9S9eVMLh2/NK3FcIhRi4yf8VU=
Note: the id is produced by processing a string.
Error message
{
"error": "no handler found for uri [/card_record/_doc/2_900002151162=I1B8PIUB1-66M04493WI71zDLXZliUSgmR9S9eVMLh2/NK3FcIhRi4yf8VU=?pretty] and method [GET]"
}
Reproducing the problem
Index a document with the index API
// Index one document with _id "1+1" through the single-document index API.
IndexRequest request = new IndexRequest();
request.index("index-test2");
request.id("1+1");
Map<String, Object> doc = new HashMap<>();
doc.put("doc", 1);
doc.put("type", "index");
request.source(doc);
try {
    // client is assumed to be an already initialized RestHighLevelClient
    IndexResponse response = client.index(request, RequestOptions.DEFAULT);
    System.out.println("response = " + JSONObject.toJSONString(response));
} catch (IOException e) {
    e.printStackTrace();
}
Index a document with the bulk API
// Index a second document with the same _id "1+1", this time through bulk.
IndexRequest request = new IndexRequest();
request.index("index-test2");
request.id("1+1");
Map<String, Object> doc = new HashMap<>();
doc.put("doc", 2);
doc.put("type", "bulk");
request.source(doc);
BulkRequest bulkRequest = new BulkRequest();
bulkRequest.add(request);
try {
    BulkResponse response = client.bulk(bulkRequest, RequestOptions.DEFAULT);
    System.out.println("response = " + JSONObject.toJSONString(response));
} catch (IOException e) {
    e.printStackTrace();
}
Check the result in Kibana
POST index-test2/_search
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index-test2",
"_type" : "_doc",
"_id" : "1 1",
"_score" : 1.0,
"_source" : {
"doc" : 1,
"type" : "index"
}
},
{
"_index" : "index-test2",
"_type" : "_doc",
"_id" : "1+1",
"_score" : 1.0,
"_source" : {
"doc" : 2,
"type" : "bulk"
}
}
]
}
}
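Root cause analysis
- As with a browser request, the ES server URL-decodes the request path when it receives a request, so a "+" in the path is read as a space. That is why the "1+1" sent through the index API was stored as "1 1". This decoding is generic URL handling, not ES-specific logic;
- An obvious fix would be to URL-encode the _id before calling client methods such as index, get, and exists;
- The client source shows that this does not work: the client internally calls the endpoint method to build the request path, and that method encodes path parts on its own and differently from URLEncode;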
static String endpoint(String index, String type, String id, String endpoint) {
    return new EndpointBuilder().addPathPart(index, type, id).addPathPartAsIs(endpoint).build();
}
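- In an actual test:
- After URLEncode the _id may contain "%", and the endpoint method escapes that "%" again. In other words, "1+1" first becomes "1%2B1" and is then sent as "1%252B1";
- The server decodes the path once and then operates on the document whose _id is "1%2B1":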
{
"_index" : "index-test2",
"_type" : "_doc",
"_id" : "1%2B1",
"_score" : 1.0,
"_source" : {
"doc" : 1,
"type" : "index"
}
}
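- Indexing through the bulk API does not hit this decoding problem, because the _id travels in the request body rather than in the URL path;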
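The double escaping described above can be reproduced with the JDK alone. The following is a minimal sketch, not the client's actual code: encodePathPart is a hypothetical stand-in that assumes the client's path encoding behaves like the multi-argument java.net.URI constructors (which always quote "%" and leave "+" untouched), and URLDecoder is used only to mimic the "+"-to-space decoding observed on the server in the test above.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class IdEncodingDemo {

    // Hypothetical stand-in for the client's path-part encoding (an assumption):
    // build a URI from the raw path and read back its quoted form.
    static String encodePathPart(String part) {
        try {
            URI uri = new URI(null, "", "/" + part, null, null);
            // A slash inside a single path part has to be escaped by hand.
            return uri.getRawPath().substring(1).replace("/", "%2F");
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode path part [" + part + "]", e);
        }
    }

    public static void main(String[] args) throws Exception {
        String utf8 = StandardCharsets.UTF_8.name();
        String id = "1+1";

        // Case 1: the raw id goes into the path; "+" is not escaped.
        String sentRaw = encodePathPart(id);
        // Form-style decoding turns "+" into a space, matching the stored "1 1".
        System.out.println(sentRaw + " -> [" + URLDecoder.decode(sentRaw, utf8) + "]");

        // Case 2: the id is URL-encoded first, then the path encoder quotes "%" again.
        String preEncoded = URLEncoder.encode(id, utf8);   // "1%2B1"
        String sentTwice = encodePathPart(preEncoded);      // "1%252B1"
        // One round of decoding leaves "1%2B1", not the original "1+1".
        System.out.println(sentTwice + " -> [" + URLDecoder.decode(sentTwice, utf8) + "]");
    }
}
```

Neither path yields "1+1" on the server side, which matches the two stored _id values seen in the experiments, while the bulk request is unaffected because its _id never passes through path encoding.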
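Summary
- For an _id containing special characters, indexing through index and through bulk stores different _id values;
- After an _id containing special characters is indexed through bulk, get and similar methods may fail to find the document;
- Likewise, an update by id may fail, and using index as an update may insert a new document instead;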
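Solution
- When generating IDs manually, avoid special characters;
- Generate the _id in one place, at the entry point;
- An _id read back from ES can be used as is, with no further special handling;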
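One way to follow the points above is to derive every _id through a single helper that only emits URL-safe characters. The sketch below is one possible approach, not the author's implementation: SafeIdGenerator, toSafeId, and isUrlSafe are hypothetical names, and the choice of unpadded URL-safe Base64 (alphabet A-Z, a-z, 0-9, "-", "_") is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SafeIdGenerator {

    // Hypothetical helper: map any business key to an _id made only of
    // URL-safe Base64 characters, with the "=" padding dropped.
    static String toSafeId(String rawKey) {
        return Base64.getUrlEncoder()
                .withoutPadding()
                .encodeToString(rawKey.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical guard: true if the id contains nothing that path
    // encoding or decoding could alter.
    static boolean isUrlSafe(String id) {
        return id.matches("[A-Za-z0-9_-]+");
    }

    public static void main(String[] args) {
        String rawKey = "1+1";
        String id = toSafeId(rawKey);
        System.out.println(id + " safe=" + isUrlSafe(id));          // MSsx safe=true
        System.out.println(rawKey + " safe=" + isUrlSafe(rawKey));  // 1+1 safe=false
    }
}
```

Because the helper is the only place where an _id is created, index, bulk, get, and update calls all see the same value, and the id returned by a search can be passed back to the client unchanged.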