Elasticsearch: Not Storing a Particular Field via the Mapping Settings

温水的小青蛙2023

Article link: https://blog.csdn.net/shiyaru1314/article/details/50803685

Approach 1

The common mapping attribute store

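This attribute takes the value yes or no (the mappings below write it as true and false) and specifies whether the field's original value is stored in the index. The default is no, which means the original value cannot be returned as a stored field in the results (even when it is not stored, the original value can still be returned from the _source field). As long as the field is indexed, its content remains searchable.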

By default Elasticsearch stores the whole document in _source. To change this, set "_source": {"enabled": false} in the mapping:

PUT ibase/_mapping/ptcontext
{
  "_source": {
    "enabled": false
  },
  "_all": {
    "enabled": true,
    "store": false,
    "analyzer": "biGrams"
  },
  "properties": {
    "Id": {
      "type": "string",
      "include_in_all": false,
      "index": "not_analyzed"
    },
    "PT": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "BP": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "BF": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "BOT": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "UBP": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "UPT": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    }
  }
}

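With this mapping, a search cannot return any field values, because individual fields are not stored by default.

Using _source to restrict the returned fields fails with an error as well, because the whole _source is not stored.

Once the _source switch is off, you can decide field by field which ones are stored and which are not.

For example: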

PUT ibase/_mapping/ptcontext
{
  "_source": {
    "enabled": false
  },
  "_all": {
    "enabled": true,
    "store": false,
    "analyzer": "biGrams"
  },
  "properties": {
    "Id": {
      "type": "string",
      "include_in_all": false,
      "index": "not_analyzed"
    },
    "PT": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "BP": {
      "type": "string",
      "index": "not_analyzed",
      "store": true,              // store this field's value
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "BF": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "BOT": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "UBP": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "UPT": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    }
  }
}
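
The path-like fields in the mapping above differ only in whether store is set, so the request body can be generated instead of written by hand. The following is a minimal Python sketch, assuming the field names and analyzers from the mapping above. The helper names path_field and build_mapping are hypothetical and are not part of any Elasticsearch client; the script only builds and prints the JSON body.

```python
import json


def path_field(store=False):
    """Mapping for one not_analyzed string field with an f_path sub-field."""
    field = {
        "type": "string",
        "index": "not_analyzed",
        "fields": {"f_path": {"type": "string", "analyzer": "path_divide"}},
    }
    if store:
        # Only emit "store" when it differs from the default (not stored).
        field["store"] = True
    return field


def build_mapping(path_fields, stored=()):
    """Mapping body with _source disabled; only names in `stored` are stored."""
    properties = {
        "Id": {"type": "string", "include_in_all": False, "index": "not_analyzed"}
    }
    for name in path_fields:
        properties[name] = path_field(store=name in stored)
    return {
        "_source": {"enabled": False},
        "_all": {"enabled": True, "store": False, "analyzer": "biGrams"},
        "properties": properties,
    }


mapping = build_mapping(["PT", "BP", "BF", "BOT", "UBP", "UPT"], stored={"BP"})
print(json.dumps(mapping, indent=2))
```

The printed body can be pasted into the PUT request shown above; changing the stored set is then a one-line edit.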

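In the search request, returning specific fields now requires the fields parameter (renamed stored_fields in Elasticsearch 5.0 and later):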

GET jurassic_ibase/ptcontext/_search
{
  "fields": ["PT", "BP"],
  "size": 2
}

The response contains only BP and not PT: PT is indexed, but its value is not stored.

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 4278,
      "max_score": 1,
      "hits": [
         {
            "_index": "ibase",
            "_type": "ptcontext",
            "_id": "AVNA6S2uH6PqkrPoQOu_",
            "_score": 1,
            "fields": {
               "BP": [
                  "科研/目标评价/单井评价/单井基础地质条件/井区区域沉积背景分析"
               ]
            }
         },
         {
            "_index": "ibase",
            "_type": "ptcontext",
            "_id": "AVNA6S2zH6PqkrPoQOvA",
            "_score": 1,
            "fields": {
               "BP": [
                  "科研/方案设计/完井方案设计/固井评价级评估/施工作业设计"
               ]
            }
         }
      ]
   }
}
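
A client can detect this situation by comparing the fields it asked for with what each hit returned. The sketch below is a hedged illustration in Python: the response dictionary is an abbreviated copy of the response above, and the function name missing_fields is hypothetical.

```python
def missing_fields(response, requested):
    """Map each hit's _id to the requested fields that the hit did not return."""
    result = {}
    for hit in response["hits"]["hits"]:
        # A hit has no "fields" key at all when none of the fields are stored.
        returned = hit.get("fields", {})
        result[hit["_id"]] = [name for name in requested if name not in returned]
    return result


# Abbreviated copy of the search response shown above.
response = {
    "hits": {
        "hits": [
            {
                "_id": "AVNA6S2uH6PqkrPoQOu_",
                "fields": {"BP": ["科研/目标评价/单井评价/单井基础地质条件/井区区域沉积背景分析"]},
            },
            {
                "_id": "AVNA6S2zH6PqkrPoQOvA",
                "fields": {"BP": ["科研/方案设计/完井方案设计/固井评价级评估/施工作业设计"]},
            },
        ]
    }
}

print(missing_fields(response, ["PT", "BP"]))
```

For both hits the only missing field is PT, which matches the mapping: BP has store set to true and PT does not.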

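To summarize:

1. By default, the whole JSON document is stored (the _source switch).
2. By default, individual fields are not stored (the per-field store switch).

So, to store some fields and not others, turn off switch 1 and set switch 2 on each field as needed.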
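
The two-switch rule can be written down as a small predicate. This is a toy Python model of the summary, not Elasticsearch code: the function name can_return is hypothetical, the mappings are trimmed to the two fields needed, and _source includes/excludes are deliberately ignored.

```python
def can_return(mapping, field):
    """Apply the two-switch rule from the summary to one field."""
    # Switch 1: _source is enabled unless the mapping turns it off.
    source_enabled = mapping.get("_source", {}).get("enabled", True)
    # Switch 2: a field is not stored unless its mapping sets store to true.
    stored = mapping.get("properties", {}).get(field, {}).get("store", False)
    return source_enabled or stored


default_mapping = {
    "properties": {"PT": {"type": "string"}, "BP": {"type": "string"}}
}
approach_1 = {
    "_source": {"enabled": False},
    "properties": {
        "PT": {"type": "string"},
        "BP": {"type": "string", "store": True},
    },
}

for name, mapping in [("default", default_mapping), ("approach 1", approach_1)]:
    print(name, {field: can_return(mapping, field) for field in ("PT", "BP")})
```

Under the default mapping both fields can be returned; under the Approach 1 mapping only BP can.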

Approach 2

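In practice, though, the approach above turned out to be too cumbersome: every query has to list all the fields to return in fields, and fields offers no exclude or include option.

Every field that should be stored also has to be set to store: true one by one, whereas my requirement is simply that one very large field should not be stored.

After some searching I found another way, and _source turns out to be the more convenient tool.

Usage is shown below; the key part is the _source block at the top: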

PUT ibase/_mapping/ptcontext
{
  "_source": {
    "excludes": ["PP"]
  },
  "_all": {
    "enabled": true,
    "store": false,
    "analyzer": "biGrams"
  },
  "properties": {
    "Id": {
      "type": "string",
      "include_in_all": false,
      "index": "not_analyzed"
    },
    "PT": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "PP": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "BF": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "BOT": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "UBP": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    },
    "UPT": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "f_path": {
          "type": "string",
          "analyzer": "path_divide"
        }
      }
    }
  }
}
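
The effect of the excludes setting above can be pictured as a filter applied to the document before it is kept as _source. The Python sketch below is a toy model only: it handles top-level field names, uses the standard fnmatch module to imitate wildcard patterns, and the function name apply_source_excludes and the PP value are hypothetical.

```python
from fnmatch import fnmatchcase


def apply_source_excludes(doc, excludes):
    """Return the part of `doc` that would be kept as _source (top level only)."""
    return {
        name: value
        for name, value in doc.items()
        if not any(fnmatchcase(name, pattern) for pattern in excludes)
    }


doc = {
    "Id": "7b5a401b-2fe9-41c7-ba24-a4e646e86be7",
    "PT": "控制储量面积分布图",
    "PP": "a very large value (hypothetical)",
}

stored_source = apply_source_excludes(doc, ["PP"])
print(sorted(stored_source))  # ['Id', 'PT']
```

The excluded field is still indexed from the original document, so it stays searchable; it is only absent from what is stored and returned.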

Then query, restricting the returned fields first with fields and then with _source:

GET jurassic_ibase/ptcontext/_search
{
  "fields": ["PP", "PT"],
  "size": 2
}

GET jurassic_ibase/ptcontext/_search
{
  "_source": ["BP", "PT"],
  "size": 2
}

The response does not contain PP:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 4278,
      "max_score": 1,
      "hits": [
         {
            "_index": "ibase",
            "_type": "ptcontext",
            "_id": "AVNBPbz8H6PqkrPoQSzY",
            "_score": 1,
            "fields": {
               "PT": [
                  "控制储量面积分布图"
               ]
            }
         },
         {
            "_index": "ibase",
            "_type": "ptcontext",
            "_id": "AVNBPb0CH6PqkrPoQSzZ",
            "_score": 1,
            "fields": {
               "PT": [
                  "/反演剖面图1"
               ]
            }
         }
      ]
   }
}

Finally, return all fields and check whether the excluded field appears:

GET jurassic_ibase/ptcontext/_search
{
  "query": {
    "match_all": {}
  },
  "size": 2
}

Response:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 4278,
      "max_score": 1,
      "hits": [
         {
            "_index": "ibase",
            "_type": "ptcontext",
            "_id": "AVNBPbz8H6PqkrPoQSzY",
            "_score": 1,
            "_source": {
               "UBP": "",
               "PT": "控制储量面积分布图",
               "BF": "",
               "BOT": "",
               "UPT": "",
               "Id": "7b5a401b-2fe9-41c7-ba24-a4e646e86be7"
            }
         },
         {
            "_index": "ibase",
            "_type": "ptcontext",
            "_id": "AVNBPb0CH6PqkrPoQSzZ",
            "_score": 1,
            "_source": {
               "UBP": "",
               "PT": "/反演剖面图1",
               "BF": "",
               "BOT": "油气藏",
               "UPT": "",
               "Id": "94303507-d67c-43a2-8d9c-a4eb9118dd90"
            }
         }
      ]
   }
}

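The excluded field no longer appears in the results, and it is not stored either. For this requirement, this is the best solution I have found.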