背景:现有一个es查询,不得已情况下在es中做了父子关联,但es是一个搜索框架,不适合逻辑运算,所以父子查询以及nested嵌套查询都特别慢。在实际场景下,有一个3亿数据量的es索引里面还包含了一部分子查询,每次调用子查询的时候都会超时,所以将这块从父子关系改为动态标模式的查询关联,来提高查询速度。
比如我现在的关系是这样的:
门店商品和上架渠道是1对多的关系。在一个门店商品索引里商品信息存在门店索引维度,渠道信息存在渠道维度(如:当前渠道上架时间)
如上述情况采用动态mapping的结构来实现这样的关系:
1.建立一个store_es的索引,然后静态mapping中使用一个storeSkuStatus的对象(本质是一个list),storeSkuStatus对象中包含了一个个渠道对象platform(这个platform可以用业务实际枚举name来代替)
Put /store_test_es
{
"mappings": {
"dynamic": false,
"properties": {
"id": {
"type": "long"
},
"storeId": {
"type": "long"
},
"storeSkuStatus": {
"dynamic": "true",
"properties": {
"platform": {
"dynamic": "true",
"properties": {
"start": {
"type": "integer"
}
}
}
}
}
}
}
}
然后通过这样的dsl向里面塞入这样的测试数据:
PUT /store_test_es/_doc/5
{
"id":911,
"storeId":888,
"storeSkuStatus":{
"baidu":{
"start":300
},
"alibaba":{
"start":300
},
"tx":{
"start":100
}
}
}
塞入几条数据后,查询
GET /store_es/_doc/_search
结果为:
{
"took" : 898,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "store_test_es",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"id" : 111,
"storeId" : 222,
"storeSkuStatus" : {
"baidu" : {
"start" : 3,
"end" : 5
}
}
}
},
{
"_index" : "store_test_es",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"id" : 1121,
"storeId" : 222,
"storeSkuStatus" : {
"baidu" : {
"start" : 30
},
"tx" : {
"start" : 30
}
}
}
},
{
"_index" : "store_test_es",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"id" : 911,
"storeId" : 222,
"storeSkuStatus" : {
"baidu" : {
"start" : 100
},
"tx" : {
"start" : 100
},
"alibaba" : {
"start" : 100
}
}
}
},
{
"_index" : "store_test_es",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"id" : 112199,
"storeId" : 222
}
},
{
"_index" : "store_test_es",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"id" : 9199891,
"storeId" : 222,
"storeSkuStatus" : {
"baidu" : {
"start" : 50
},
"tx" : {
"start" : 50
},
"alibaba" : {
"start" : 100
}
}
}
}
]
}
}
一共五条数据,都是边界值的情况
然后测试一下这样的查询语句
storeId=222 and start >=80 and start <=100 and platform = baidu
转成dsl语言的话是这样的:
POST /store_test_es/_doc/_search
{
"query":{
"bool":{
"must":[{
"term":{
"storeId":222
}},{
"range":{
"storeSkuStatus.baidu.start":{
"gte":80,
"lte":120
}
}
}]
}
}
}
查询结果为:只有id是3的满足这个条件
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 2.0,
"hits" : [
{
"_index" : "store_test_es",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.0,
"_source" : {
"id" : 911,
"storeId" : 222,
"storeSkuStatus" : {
"baidu" : {
"start" : 100
},
"tx" : {
"start" : 100
},
"alibaba" : {
"start" : 100
}
}
}
}
]
}
}
再测试一种场景:
storeId=222 and start >=80 and start<=120
dsl语句如下:
POST /store_test_es/_doc/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"storeId": 222
}
},
{
"bool": {
"should": [
{
"range": {
"storeSkuStatus.baidu.start": {
"gte": 80,
"lte": 120
}
}
},
{
"range": {
"storeSkuStatus.tx.start": {
"gte": 80,
"lte": 120
}
}
},
{
"range": {
"storeSkuStatus.alibaba.start": {
"gte": 80,
"lte": 120
}
}
}
]
}
}
]
}
}
}
结果为:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 4.0,
"hits" : [
{
"_index" : "store_test_es",
"_type" : "_doc",
"_id" : "3",
"_score" : 4.0,
"_source" : {
"id" : 911,
"storeId" : 222,
"storeSkuStatus" : {
"baidu" : {
"start" : 100
},
"tx" : {
"start" : 100
},
"alibaba" : {
"start" : 100
}
}
}
},
{
"_index" : "store_test_es",
"_type" : "_doc",
"_id" : "5",
"_score" : 2.0,
"_source" : {
"id" : 9199891,
"storeId" : 222,
"storeSkuStatus" : {
"baidu" : {
"start" : 50
},
"tx" : {
"start" : 50
},
"alibaba" : {
"start" : 100
}
}
}
}
]
}
}
最后来一个不需要一对多关系的查询
POST /store_test_es/_doc/_search
{
"query":{
"bool":{
"must":[{
"term":{
"storeId":222
}
}]
}
}
}
结果会将所有结果查询出来