Elasticsearch: inverted index, doc_values, and _source

留不住斜阳

Original article: https://blog.csdn.net/lubin2016/article/details/118722032

inverted index

To skip building an inverted index for a field, disable it in the mapping. For example, for the user field:

"user": {
  "type": "object",
  "enabled": false
}

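Elasticsearch then builds neither an inverted index nor doc values for this field, so it cannot be searched or aggregated on. A query against this field returns no hits. Note that the enabled setting applies only to object fields and to the top-level mapping.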

Fetching the document by ID still returns the field's content:

GET twitter/_doc/1

The user data is still stored in _source; it just cannot be searched.

To make the entire document unsearchable, disable the mapping at the top level:

DELETE twitter
 
PUT twitter 
{
  "mappings": {
    "enabled": false 
  }
}

The twitter index then builds no inverted index for any field.

To disable search on a single field, set index to false:

{
  "mappings": {
    "properties": {
      "http_version": {
        "type": "keyword",
        "index": false
      }
     ...
    }
  }
}

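With this setting http_version is not indexed, which saves disk space. The field cannot be searched through the inverted index, but because doc values stay enabled it can still be used in aggregations, and it is still returned in _source. (Recent Elasticsearch versions can fall back to much slower doc-values-based queries for some field types.)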

_source

By default Elasticsearch stores every field of the original document in _source. To store none of it, disable _source in the mapping:

DELETE twitter
 
PUT twitter
{
  "mappings": {
    "_source": {
      "enabled": false
    }
  }
}

Fetch the document again:

GET twitter/_doc/1

The document is found, but the response contains no _source. The document is still searchable because its inverted index is intact; only the stored original is missing.

To save storage space you can also keep only selected fields in _source:

DELETE twitter
 
PUT twitter
{
  "mappings": {
    "_source": {
      "includes": [
        "*.lat",
        "address",
        "name.*"
      ],
      "excludes": [
        "name.surname"
      ]
    }    
  }
}

includes lists the fields to keep, and excludes removes fields that would otherwise be kept.
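The interaction between includes and excludes can be sketched in plain Python. This is a toy model under stated assumptions, not Elasticsearch's implementation: the helper names (flatten, matches, filter_source) and the sample document are made up for illustration, and fnmatch-style wildcards are assumed to approximate the field-path patterns.

```python
from fnmatch import fnmatchcase


def flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def matches(path, patterns):
    """True if the path, or any parent object path, matches a pattern."""
    parts = path.split(".")
    prefixes = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return any(fnmatchcase(p, pat) for p in prefixes for pat in patterns)


def filter_source(doc, includes, excludes):
    """Keep leaves that match includes (if any) and do not match excludes."""
    kept = {}
    for path, value in flatten(doc):
        if includes and not matches(path, includes):
            continue
        if matches(path, excludes):
            continue
        kept[path] = value
    return kept


# Hypothetical document, filtered with the patterns from the mapping above.
doc = {
    "name": {"first": "Ann", "surname": "Lee"},
    "address": "1 Main St",
    "location": {"lat": 39.9, "lon": 116.4},
    "age": 30,
}
stored = filter_source(doc, ["*.lat", "address", "name.*"], ["name.surname"])
print(stored)
```

In this sketch name.surname is dropped even though name.* would include it, because excludes are applied after includes; age and location.lon are dropped because no include pattern matches them.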

At query time you can also return only the fields you need, even when _source stores many more:

GET twitter/_doc/1?_source=name,location

doc_values

Introduction to doc_values

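Most fields are indexed by default, which makes them searchable. The inverted index keeps a sorted list of the unique terms and, for each term, the documents that contain it. A search looks up the term and returns the matching documents.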

[Figure: inverted index]
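As a rough illustration of the structure described above, here is a minimal inverted index in Python. It is only a sketch: the sample sentences, the whitespace tokenizer, and the search helper are assumptions for illustration, and Lucene's real term dictionary and postings lists are far more elaborate.

```python
# Hypothetical documents keyed by document ID.
docs = {
    1: "Elasticsearch stores documents",
    2: "Lucene stores terms",
}

# Build the inverted index: term -> set of IDs of documents containing it.
index = {}
for doc_id, text in docs.items():
    for term in text.lower().split():
        index.setdefault(term, set()).add(doc_id)

# The term dictionary is kept in sorted order.
terms = sorted(index)


def search(term):
    """Return the sorted IDs of documents that contain the term."""
    return sorted(index.get(term.lower(), ()))


print(terms)
print(search("stores"))
print(search("lucene"))
```

The lookup goes from a term to documents, which is what a search needs. Sorting or aggregating needs the opposite direction, from a document to its values.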

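doc_values is essentially a serialized column-oriented store. This layout suits aggregations, sorting, and access to field values from scripts. It also compresses well, especially for numeric types, which reduces disk usage and speeds up access.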

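Doc values are an on-disk data structure built when a document is indexed. They hold the same values as the _source metadata field, but organized by column. Almost all field types support them; the notable exception is analyzed strings (text fields).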

doc_values characteristics

doc_values is enabled by default.

To disable doc_values for a field:

PUT /lib3
{
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text"
            },
            "address": {
                "type": "text"
            },
            "age": {
                "type": "integer",
                "doc_values": false
            },
            "interests": {
                "type": "text"
            },
            "birthday": {
                "type": "date"
            }
        }
    }
}

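Doc values are a forward index that Lucene builds at index time alongside the inverted index: an ordered mapping from document to field value. For example, given these two documents: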

{"birthday": "1985-11-11", "age": 23}
{"birthday": "1989-11-11", "age": 29}

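For the two documents above, the forward (column-oriented) layout can be sketched in Python as follows. This is a toy model, not Lucene's on-disk format: the document IDs 0 and 1 and the variable names are assumptions for illustration.

```python
# The two example documents, keyed by hypothetical document IDs.
docs = {
    0: {"birthday": "1985-11-11", "age": 23},
    1: {"birthday": "1989-11-11", "age": 29},
}

# Doc values: one column per field, with values laid out in document-ID order.
doc_values = {
    field: [docs[doc_id][field] for doc_id in sorted(docs)]
    for field in ("birthday", "age")
}

# Sorting reads the column by document ID; no term lookup is involved.
by_age_desc = sorted(docs, key=lambda doc_id: doc_values["age"][doc_id], reverse=True)

# An aggregation scans the same column once.
avg_age = sum(doc_values["age"]) / len(doc_values["age"])

print(doc_values["age"])
print(by_age_desc)
print(avg_age)
```

Because all values of one field sit next to each other, sorting and aggregating only touch that column, which is also why the layout compresses well.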

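Note: doc_values is enabled by default for non-analyzed fields and is not available for analyzed (text) fields. To sort or aggregate on a text field you must set fielddata to true instead.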

Effects of disabling doc_values

Downside: sorting, aggregations, and script access to the field no longer work.

Upside: saves disk space.
