Elasticsearch terms exact matching: what to do when the query returns no data

功夫猫熊yeah

Article link: https://blog.csdn.net/weixin_39031707/article/details/90757547

I first created an ES index:

PUT /t_dmp_idfa_bundle_country_array_tbl
{
  "settings":{
    "number_of_shards":8,
    "number_of_replicas":0
  },
  "mappings":{
    "properties":{
      "appuid":{"type":"text"},
      "bundles":{"type":"text"},
      "countrys":{"type":"text"}
    }
  }
}

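Then I loaded the data.
Loading 3 billion records with Spark took about 7 hours. You can load the data with waterdrop or with Spark's saveRdd; I will post an update later on how to use saveRdd to load Hive data into ES.
Once the load finished, I wanted to reproduce the following SQL: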

select appuid from t_dmp_idfa_bundle_country_array_tbl where bundles in (xxx, xxx) and countrys in (xxx, xxx)
-- or
select appuid from t_dmp_idfa_bundle_country_array_tbl where bundles in (xxx, xxx)

The DSL terms query is the tool for this kind of exact matching:

GET /t_dmp_idfa_bundle_country_array_tbl_1/_search
{
  "query": {
    "constant_score" : {
      "filter": {
        "terms": {
          "bundles": [
            "1241932094",
            "1019161597",
            "1315940085",
            "1191247231",
            "1112615959",
            "1280272085"
          ]
        }
      }
    }
  }
}

This implements:

select appuid from  t_dmp_idfa_bundle_country_array_tbl where bundles in(xxx,xxx)

By the same logic, to implement

select appuid from t_dmp_idfa_bundle_country_array_tbl where bundles in (xxx, xxx) and countrys in (xxx, xxx)

the DSL should look like this:

GET /t_dmp_idfa_bundle_country_array_tbl_1/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "bundles": [
              "1241932094",
              "1019161597",
              "1131137981",
              "866437441",
              "900353132",
              "1366215798",
              "1339238576",
              "533079551",
              "1280272085"
            ]
          }
        },
        {
          "terms": {
            "countrys": [
              "CHN",
              "USA"
            ]
          }
        }
      ]
    }
  }
}
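As a rough illustration of how the SQL `IN ... AND ... IN ...` conditions map onto the DSL, the sketch below builds the request body from a dictionary of field names and accepted values. The function name `in_clauses_to_query` is made up for this example, and it places the clauses in `filter` rather than `must` (an assumption on my part: scoring is not needed for a pure yes/no match, as with the `constant_score` query earlier).

```python
import json


def in_clauses_to_query(conditions):
    """Build a bool query that ANDs one terms clause per field.

    `conditions` maps a field name to the list of accepted values,
    mirroring `field IN (...) AND other_field IN (...)`.
    """
    clauses = [
        {"terms": {field: list(values)}}
        for field, values in conditions.items()
    ]
    # Assumption: filter context instead of must, since no scoring is needed.
    return {"query": {"bool": {"filter": clauses}}}


body = in_clauses_to_query({
    "bundles": ["1241932094", "1019161597"],
    "countrys": ["CHN", "USA"],
})
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a `GET /<index>/_search` request. It only returns hits if the fields are indexed without analysis.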

However, no matter how I ran this query, it returned no data. The reason is as follows:

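terms is only suitable for not_analyzed data. The text field countrys is analyzed at index time: the default standard analyzer lowercases its tokens, so CHN is stored as chn, while terms looks up the query values verbatim and therefore finds nothing. Purely numeric values such as the bundles IDs pass through analysis unchanged, which is why the bundles-only query still matched.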
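The mismatch can be reproduced without a cluster. The sketch below is a toy model, not Elasticsearch code: `simple_analyze` is a simplified stand-in for the standard analyzer (it only splits on non-alphanumeric characters and lowercases), and all function names are invented for this example.

```python
import re


def simple_analyze(text):
    """Simplified stand-in for the standard analyzer: tokenize, then lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def index_field(values, analyzed):
    """Return the set of terms the inverted index would hold for one field."""
    terms = set()
    for value in values:
        if analyzed:
            terms.update(simple_analyze(value))  # text field
        else:
            terms.add(value)                     # keyword field: stored verbatim
    return terms


def terms_match(indexed_terms, query_values):
    """terms compares query values to indexed terms as-is, with no analysis."""
    return any(value in indexed_terms for value in query_values)


text_countrys = index_field(["CHN", "USA"], analyzed=True)
keyword_countrys = index_field(["CHN", "USA"], analyzed=False)
text_bundles = index_field(["1241932094"], analyzed=True)

print(terms_match(text_countrys, ["CHN", "USA"]))     # False: index holds chn, usa
print(terms_match(keyword_countrys, ["CHN", "USA"]))  # True
print(terms_match(text_bundles, ["1241932094"]))      # True: digits are unchanged
```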

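So the index has to change. You could reindex, but because reindexing takes too long and I am still in the testing phase, I simply deleted the index and loaded the data again.
The new mapping is: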

PUT /t_dmp_idfa_bundle_country_array_tbl_1
{
  "settings":{
    "number_of_shards":8,
    "number_of_replicas":0
  },
  "mappings":{
    "properties":{
      "appuid":{"type":"keyword"},
      "bundles":{"type":"keyword"},
      "countrys":{"type":"keyword"}
    }
  }
}

With every field defined as keyword, the fields are no longer tokenized, and terms can be used for exact matching.
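For readers who would rather keep the existing data than reload it, the reindex route that I skipped could look roughly like the sketch below, which builds the body for the `_reindex` API. It assumes the old text-mapped index is `t_dmp_idfa_bundle_country_array_tbl` and that the keyword-mapped index `t_dmp_idfa_bundle_country_array_tbl_1` has already been created with the mapping above. I have not run this against the 3 billion documents, and `reindex_body` is a name made up for this example.

```python
import json


def reindex_body(source_index, dest_index):
    """Request body for POST /_reindex, copying all documents from source to dest."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


body = reindex_body(
    "t_dmp_idfa_bundle_country_array_tbl",    # assumed: old index with text fields
    "t_dmp_idfa_bundle_country_array_tbl_1",  # assumed: new index with keyword fields
)
print("POST /_reindex")
print(json.dumps(body, indent=2))
```

The destination index should exist with the keyword mapping before the request is sent. Otherwise Elasticsearch creates it with dynamically inferred mappings.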

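For versions below 6.x you can define the index as follows instead (I have not tried this, so test it yourself and leave a comment if something is wrong). Versions before 7.0 also expect a mapping type name wrapping properties; doc below is only a placeholder name.

PUT /t_dmp_idfa_bundle_country_array_tbl_1
{
  "settings":{
    "number_of_shards":8,
    "number_of_replicas":0
  },
  "mappings":{
    "doc":{
      "properties":{
        "appuid":{"type":"string","index": "not_analyzed"},
        "bundles":{"type":"string","index": "not_analyzed"},
        "countrys":{"type":"string","index": "not_analyzed"}
      }
    }
  }
}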

With every field set to not_analyzed, Elasticsearch does not tokenize them.
Then load the data again, and the DSL shown earlier meets the requirement.