Elastic: An introduction to data tiers and index lifecycle management (version 7.10 and later)

Elastic China Community Official Blog

Published 2020-12-14 19:03:45


Article link: https://blog.csdn.net/UbuntuTouch/article/details/111150474


The data tier is a concept introduced in version 7.10. A data tier is a collection of nodes with the same data role that typically share the same hardware profile:

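Content tier nodes handle the indexing and query load for content such as a product catalog.
Hot tier nodes handle the indexing load for time series data such as logs or metrics, and hold your most recent, most frequently accessed data.
Warm tier nodes hold time series data that is accessed less frequently and rarely needs to be updated.
Cold tier nodes hold time series data that is accessed only occasionally and is not normally updated.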

When you index documents directly to a specific index, they remain on content tier nodes indefinitely.

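When you index documents to a data stream, they initially reside on hot tier nodes. You can configure index lifecycle management (ILM) policies to move time series data automatically through the hot, warm, and cold tiers according to your performance, resiliency, and data retention requirements.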

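A node's data role is configured in elasticsearch.yml. For example, the highest-performing nodes in a cluster can be assigned to both the hot tier and the content tier: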

node.roles: ["data_hot", "data_content"]

For an introduction to node.roles, see my other article "Elasticsearch：Node 介绍 - 7.9 之后版本" (an introduction to nodes in 7.9 and later).

Content tier

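Data stored in the content tier is generally a collection of items, such as a product catalog or an article archive. Unlike time series data, the value of this content stays relatively constant over time, so it makes no sense to move it to a tier with different performance characteristics as it ages. Content data typically has long retention requirements, and you want to be able to retrieve items quickly no matter how old they are.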

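Content tier nodes are usually optimized for query performance: they prioritize processing power over IO throughput so that they can process complex searches and aggregations and return results quickly. Although they are also responsible for indexing, content data is generally not ingested at as high a rate as time series data such as logs and metrics. From a resiliency perspective, indices in this tier should be configured with one or more replicas.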

New indices are automatically allocated to the content tier unless they are part of a data stream.

Hot tier

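The hot tier is the Elasticsearch entry point for time series data and holds your most recent, most frequently searched time series data. Nodes in the hot tier need to be fast for both reads and writes, which requires more hardware resources and faster storage (SSDs). For resiliency, indices in the hot tier should be configured with one or more replicas.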

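New indices that belong to a data stream are automatically allocated to the hot tier. For more on data streams, see my earlier article "Data stream 在索引生命周期管理中的应用" (using data streams in index lifecycle management).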

Warm tier

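Once time series data is queried less frequently than the recently indexed data in the hot tier, it can be moved to the warm tier. The warm tier typically holds data from recent weeks. Updates are still allowed, but are likely to be infrequent. Nodes in the warm tier generally do not need to be as fast as those in the hot tier. For resiliency, indices in the warm tier should be configured with one or more replicas.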

Cold tier

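Once data is no longer being updated, it can move from the warm tier to the cold tier, where it stays for the rest of its life. The cold tier still responds to queries, but data in the cold tier is not normally updated. As data transitions into the cold tier, it can be compressed and shrunken. For resiliency, indices in the cold tier can rely on searchable snapshots, which removes the need for replicas.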

Data tier index allocation

When you create an index, by default Elasticsearch sets index.routing.allocation.include._tier_preference to data_content so that the index shards are automatically allocated to the content tier.

When Elasticsearch creates an index as part of a data stream, by default it sets index.routing.allocation.include._tier_preference to data_hot so that the index shards are automatically allocated to the hot tier.

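You can override the automatic tier-based allocation by specifying shard allocation filtering settings in the create index request, or in an index template that matches the new index.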

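You can also set index.routing.allocation.include._tier_preference explicitly to opt out of the default tier-based allocation. If you set the tier preference to null, Elasticsearch ignores the data tier roles during allocation.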
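The tier preference can be read as an ordered fallback list. The Python sketch below only illustrates that idea and is not Elasticsearch's allocator: the function name `resolve_tier`, the modelling of a cluster as a plain set of role names, and the comma-separated "first tier with an available node wins" reading of the setting are all assumptions made for this example.

```python
from typing import Iterable, Optional


def resolve_tier(tier_preference: Optional[str],
                 node_roles: Iterable[str]) -> Optional[str]:
    """Return the first preferred tier that has at least one node, else None.

    A preference of None models the "set to null" case, in which the
    data tier roles are ignored during allocation.
    """
    if tier_preference is None:
        return None
    available = set(node_roles)
    for tier in (t.strip() for t in tier_preference.split(",")):
        if tier in available:
            return tier
    return None


# Roles present in the two-node cluster used later in this article.
cluster_roles = {"data_hot", "data_content", "data_warm", "master", "ingest"}

print(resolve_tier("data_content", cluster_roles))         # default for a regular index
print(resolve_tier("data_hot", cluster_roles))             # default for a data stream index
print(resolve_tier("data_cold,data_warm", cluster_roles))  # no cold node, so warm is used
```

Listing more than one tier keeps an index allocatable when the preferred tier has no nodes, which is convenient on a small test cluster like the one used here.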

Using data tiers in index lifecycle management

In today's exercise, we will re-implement index lifecycle management, following the earlier article "Data stream 在索引生命周期管理中的应用".

Installing the Elastic Stack

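We can follow the article "Elasticsearch：运用shard filtering来控制索引分配给哪个节点" (using shard filtering to control which node an index is allocated to) to bring up a two-node cluster. It is quite simple: once Elasticsearch is installed, open a terminal and run the following command from the Elasticsearch installation root directory: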

./bin/elasticsearch -E node.name=node1 -E node.roles=data_hot,data_content,master,ingest -Enode.max_local_storage_nodes=2

This starts one node. For more on node.roles, see my earlier article "Elasticsearch：Node 介绍 - 7.9 之后版本". This node has the data_hot, data_content, master, and ingest roles.

Open another terminal and run the following command from the Elasticsearch installation root directory:

./bin/elasticsearch -E node.name=node2 -E node.roles=data_warm,data_content,master,ingest -Enode.max_local_storage_nodes=2

We now have a two-node Elasticsearch cluster running. We can check it in Kibana:

GET _cat/nodes?v

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1           17         100   5    2.16                  him       *      node1
127.0.0.1           28         100   6    2.16                  imw       -      node2

The output shows two nodes. The node.role of the first is him, meaning data_hot, ingest, and master; the node.role of the other is imw, meaning ingest, master, and data_warm.
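To read the node.role column more easily, the letters can be expanded with a small lookup table. The Python sketch below is illustrative only: the helper name `decode_roles` is made up for this example, and apart from h, i, m, and w (which are explained above) the letter assignments are an assumption based on the 7.10 `_cat/nodes` documentation and should be checked against your own version. Under that mapping, a node that also carries data_content would show an additional s.

```python
from typing import List

# One-letter abbreviations used in the node.role column of _cat/nodes.
# Only h, i, m and w are confirmed by the output above; the rest are assumed.
ROLE_LETTERS = {
    "c": "data_cold",
    "d": "data",
    "h": "data_hot",
    "i": "ingest",
    "l": "ml",
    "m": "master",
    "r": "remote_cluster_client",
    "s": "data_content",
    "t": "transform",
    "v": "voting_only",
    "w": "data_warm",
}


def decode_roles(abbrev: str) -> List[str]:
    """Expand a node.role string such as 'him' into full role names."""
    return [ROLE_LETTERS.get(letter, f"unknown({letter})") for letter in abbrev]


print(decode_roles("him"))  # ['data_hot', 'ingest', 'master']
print(decode_roles("imw"))  # ['ingest', 'master', 'data_warm']
```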

With that, our cluster is up and running.

Creating a data stream

The hands-on part that follows is straightforward. To use ILM to automate rollover and manage time series indices, we go through the following steps:

Create a lifecycle policy
Create an index template that applies to a data stream
Create a data stream
Send data to the index and verify that the index moves through the lifecycle phases

A typical ILM policy has four phases: Hot, Warm, Cold, and Delete. You can enable whichever phases suit your business requirements. In the following exercise, we skip the Cold phase.

Creating an index lifecycle policy

In the policy form, I named the policy demo and also defined the rollover conditions for the Hot phase. When any one of the following conditions is met:

The index size reaches 1 GB
The number of documents reaches 5
The index age reaches 30 days

the index automatically rolls over to a new index.
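The "any one of the conditions" rule can be written down in a few lines. The Python sketch below is not how ILM is implemented; `IndexStats` and `should_rollover` are hypothetical names, and treating each threshold as "reached" (greater than or equal) is an assumption about the rollover semantics.

```python
from dataclasses import dataclass

GB = 1024 ** 3
DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class IndexStats:
    size_bytes: int
    doc_count: int
    age_ms: int


def should_rollover(stats: IndexStats,
                    max_size_bytes: int = 1 * GB,
                    max_docs: int = 5,
                    max_age_ms: int = 30 * DAY_MS) -> bool:
    """Return True as soon as any single condition is satisfied.

    Assumption: a condition counts as met once the threshold is reached (>=).
    """
    return (stats.size_bytes >= max_size_bytes
            or stats.doc_count >= max_docs
            or stats.age_ms >= max_age_ms)


print(should_rollover(IndexStats(size_bytes=208, doc_count=4, age_ms=60_000)))  # False
print(should_rollover(IndexStats(size_bytes=208, doc_count=5, age_ms=60_000)))  # True
```

In the exercise below only the document count matters, because the test index never comes close to 1 GB or 30 days.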

Next we define the Warm phase:

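Here we enable the Warm phase. In this phase, data is stored on nodes whose node.roles include data_warm. Because we have only one warm node, I set the number of replicas to 0 for this exercise. In real deployments, more replicas mean more read capacity; set this according to your own requirements and configuration. I also enabled Shrink index, which means that in the warm phase all primary shards are shrunk into a single one. The number of primary shards generally reflects ingest capacity. In the warm phase we normally do not need to ingest data, since ingestion happens only on hot nodes.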

One thing to note here differs from the earlier article "Data stream 在索引生命周期管理中的应用". Compare it with the configuration we used previously, when hot and warm nodes were defined through attributes:

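When hot and warm nodes are defined through attributes, we have to select the node attribute in the policy. From version 7.10 onward, we recommend using node.roles to define a hot-warm architecture, although the earlier attribute-based approach still works.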

Next we define the Delete phase:

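Here we enable the Delete phase. It specifies that about 3 minutes after the index rolls over and enters the Warm phase, the index is deleted automatically. Click the Save as new policy button. We can retrieve the policy we defined with the following API: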

GET _ilm/policy/demo

{
  "demo" : {
    "version" : 1,
    "modified_date" : "2020-12-14T08:51:34.347Z",
    "policy" : {
      "phases" : {
        "warm" : {
          "min_age" : "0ms",
          "actions" : {
            "allocate" : {
              "number_of_replicas" : 0,
              "include" : { },
              "exclude" : { },
              "require" : { }
            },
            "shrink" : {
              "number_of_shards" : 1
            },
            "set_priority" : {
              "priority" : 50
            }
          }
        },
        "hot" : {
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_size" : "1gb",
              "max_age" : "30d",
              "max_docs" : 5
            },
            "set_priority" : {
              "priority" : 100
            }
          }
        },
        "delete" : {
          "min_age" : "3m",
          "actions" : { }
        }
      }
    }
  }
}

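The output above is from version 7.9.1. On 7.9.3 and 7.10 (the latest version at the time of writing), for some reason the delete action has to be set explicitly for deletion to work, so we define the policy with the following API: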

PUT _ilm/policy/demo
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "30d",
            "max_size": "1gb",
            "max_docs": 5
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "actions": {
          "allocate": {
            "number_of_replicas": 0,
            "include": {},
            "exclude": {},
            "require": {}
          },
          "shrink": {
            "number_of_shards": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "delete": {
        "min_age": "3m",
        "actions": {
          "delete" : {}
        }
      }
    }
  }
}

Note the delete section above: its actions contain "delete": {}.
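A quick way to catch a phase that was saved without any action is to scan the policy body before sending it. The following Python sketch is only an illustration: `phases_without_actions` is a hypothetical helper, and the two dictionaries are trimmed-down copies of the policies shown above rather than complete policies.

```python
from typing import Dict, List


def phases_without_actions(policy_body: Dict) -> List[str]:
    """Return the names of phases whose "actions" object is empty or missing."""
    phases = policy_body.get("policy", {}).get("phases", {})
    return sorted(name for name, phase in phases.items()
                  if not phase.get("actions"))


# Trimmed version of the policy as returned by GET _ilm/policy/demo.
saved_from_ui = {"policy": {"phases": {
    "hot": {"min_age": "0ms", "actions": {"rollover": {"max_docs": 5}}},
    "delete": {"min_age": "3m", "actions": {}},
}}}

# Trimmed version of the policy sent with PUT _ilm/policy/demo.
with_delete_action = {"policy": {"phases": {
    "hot": {"min_age": "0ms", "actions": {"rollover": {"max_docs": 5}}},
    "delete": {"min_age": "3m", "actions": {"delete": {}}},
}}}

print(phases_without_actions(saved_from_ui))       # ['delete']
print(phases_without_actions(with_delete_action))  # []
```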

Defining the index template

Enter the following command in the Kibana console:

# Create the template to apply the policy to every new backing index of the data stream
PUT _index_template/template_demo
{
  "index_patterns": ["demo-*"],
  "data_stream": {},
  "priority": 200,
  "template": {
    "settings": {
      "number_of_shards": 2,
      "index.lifecycle.name": "demo",
      "index.routing.allocation.include._tier_preference": "data_hot"
    }
  }
}

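Note the "index.routing.allocation.include._tier_preference": "data_hot" setting above. By default, indices in a data stream are allocated to data_hot automatically, so this setting can be omitted. The command creates an index template named template_demo. Note that data_stream is defined as an empty object, and that we define two primary shards.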

Creating the data stream

Creating a data stream is very simple:

# Create a data stream
PUT _data_stream/demo-ds

Run the command above. Because we have already created an index template whose index_patterns is demo-*, the creation succeeds. If we instead use the following command:

PUT _data_stream/demo

it fails. The error tells you that there is no matching index template.
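Why demo-ds is accepted while demo is rejected comes down to wildcard matching against index_patterns. The Python sketch below approximates that check with the standard library's `fnmatchcase`; this is an assumption for illustration only, since `fnmatch` also understands `?` and `[...]`, which Elasticsearch index patterns do not necessarily treat the same way. The helper name `matches_template` is made up for this example.

```python
from fnmatch import fnmatchcase
from typing import List


def matches_template(name: str, index_patterns: List[str]) -> bool:
    """Return True if the name matches at least one wildcard pattern."""
    return any(fnmatchcase(name, pattern) for pattern in index_patterns)


patterns = ["demo-*"]  # index_patterns of template_demo

print(matches_template("demo-ds", patterns))  # True: the data stream can be created
print(matches_template("demo", patterns))     # False: no matching index template
```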

# Check the shards allocation
GET _cat/shards/demo-ds?v

The command above shows:

index              shard prirep state      docs store ip        node
.ds-demo-ds-000001 1     p      STARTED       0  208b 127.0.0.1 node1
.ds-demo-ds-000001 1     r      UNASSIGNED                      
.ds-demo-ds-000001 0     p      STARTED       0  208b 127.0.0.1 node1
.ds-demo-ds-000001 0     r      UNASSIGNED

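We defined two primary shards, but we have only one hot node. A replica cannot be allocated to the same node as its primary, so the output above shows two unassigned replica shards.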

We can check the data stream's indices with the following command:

# Verify data stream indexes
GET _data_stream/demo-ds

The command above shows:

{
  "data_streams" : [
    {
      "name" : "demo-ds",
      "timestamp_field" : {
        "name" : "@timestamp"
      },
      "indices" : [
        {
          "index_name" : ".ds-demo-ds-000001",
          "index_uuid" : "sPN5JEW8SVuTFNh4UqK9zw"
        }
      ],
      "generation" : 1,
      "status" : "YELLOW",
      "template" : "template_demo",
      "ilm_policy" : "demo"
    }
  ]
}

This shows that an index named .ds-demo-ds-000001 has been created. We did not have to do anything ourselves.

We can also inspect the settings of the created index with the following command:

GET .ds-demo-ds-000001/_settings

The command above shows:

{
  ".ds-demo-ds-000001" : {
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "demo"
        },
        "routing" : {
          "allocation" : {
            "require" : {
              "data" : "hot"
            }
          }
        },
        "hidden" : "true",
        "number_of_shards" : "2",
        "provided_name" : ".ds-demo-ds-000001",
        "creation_date" : "1606989315969",
        "priority" : "100",
        "number_of_replicas" : "1",
        "uuid" : "sPN5JEW8SVuTFNh4UqK9zw",
        "version" : {
          "created" : "7100099"
        }
      }
    }
  }
}

From the output above we can see that the index sits on the hot node. hidden is also true, which means the index is not returned by wildcard expressions by default.

Sending data to the data stream

Next we run the following command:

PUT _ingest/pipeline/add-timestamp
{
  "processors": [
    {
      "set": {
        "field": "@timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}

The add-timestamp pipeline above adds a timestamp recording when the document was ingested. We create this pipeline first and then run the following command:

POST demo-ds/_doc?pipeline=add-timestamp
{
  "user": {
    "id": "liuxg"
  },
  "message": "This is so cool!"
}

We see the following output:

{
  "_index" : ".ds-demo-ds-000001",
  "_type" : "_doc",
  "_id" : "3JsVKHYBtwZVzHJGZXRc",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

This shows that the document was sent to the .ds-demo-ds-000001 index. We can search the data stream with the following command:

# Search the data stream
GET demo-ds/_search

The command above returns:

{
  "took" : 351,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : ".ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "3JsVKHYBtwZVzHJGZXRc",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T10:10:59.541518Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      }
    ]
  }
}

This shows that one document was found and that its index is named .ds-demo-ds-000001.

Next we run the following command 4 more times:

POST demo-ds/_doc?pipeline=add-timestamp
{
  "user": {
    "id": "liuxg"
  },
  "message": "This is so cool!"
}

We see output similar to the following:

{
  "_index" : ".ds-demo-ds-000001",
  "_type" : "_doc",
  "_id" : "4JueKHYBtwZVzHJGfnRv",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

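So far we have generated 5 documents, all of them in the .ds-demo-ds-000001 index. Remember the policy we defined in ILM? Once the document count reaches the limit of 5, the index rolls over automatically and the earlier documents are moved to the warm node. Next we run the following command: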

POST demo-ds/_doc?pipeline=add-timestamp
{
  "user": {
    "id": "liuxg"
  },
  "message": "This is so cool!"
}

This brings the total number of documents to 6. We can check the ILM status with the following command:

# Check ILM status per demo-ds data stream
GET demo-ds/_ilm/explain

The command above shows:

{
  "indices" : {
    ".ds-demo-ds-000001" : {
      "index" : ".ds-demo-ds-000001",
      "managed" : true,
      "policy" : "demo",
      "lifecycle_date_millis" : 1606989315969,
      "age" : "2.86h",
      "phase" : "hot",
      "phase_time_millis" : 1606989317057,
      "action" : "rollover",
      "action_time_millis" : 1606989760360,
      "step" : "check-rollover-ready",
      "step_time_millis" : 1606989760360,
      "phase_execution" : {
        "policy" : "demo",
        "phase_definition" : {
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_size" : "1gb",
              "max_age" : "30d",
              "max_docs" : 5
            },
            "set_priority" : {
              "priority" : 100
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1606988416663
      }
    }
  }
}

The action shown is rollover. We wait a while for the rollover to happen. How long we wait is determined by the following parameter:

indices.lifecycle.poll_interval

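This setting is described in the Elasticsearch documentation. Its default value is 10 minutes, so we need to wait for a while. We can also change the interval with the following command: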

PUT _cluster/settings
{
    "transient": {
      "indices.lifecycle.poll_interval": "10s"
    }
}

With this setting, Elasticsearch checks and executes ILM policies every 10 seconds. We can run the following command again:

GET demo-ds/_ilm/explain

The command output shows:

{
  "indices" : {
    ".ds-demo-ds-000002" : {
      "index" : ".ds-demo-ds-000002",
      "managed" : true,
      "policy" : "demo",
      "lifecycle_date_millis" : 1606999959439,
      "age" : "5.22m",
      "phase" : "hot",
      "phase_time_millis" : 1606999961410,
      "action" : "rollover",
      "action_time_millis" : 1607000204426,
      "step" : "check-rollover-ready",
      "step_time_millis" : 1607000204426,
      "phase_execution" : {
        "policy" : "demo",
        "phase_definition" : {
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_size" : "1gb",
              "max_age" : "30d",
              "max_docs" : 5
            },
            "set_priority" : {
              "priority" : 100
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1606988416663
      }
    },
    ".ds-demo-ds-000001" : {
      "index" : ".ds-demo-ds-000001",
      "managed" : true,
      "policy" : "demo",
      "lifecycle_date_millis" : 1606999959458,
      "age" : "5.22m",
      "phase" : "warm",
      "phase_time_millis" : 1606999962564,
      "action" : "shrink",
      "action_time_millis" : 1607000207450,
      "step" : "shrink",
      "step_time_millis" : 1607000271877,
      "phase_execution" : {
        "policy" : "demo",
        "phase_definition" : {
          "min_age" : "0ms",
          "actions" : {
            "allocate" : {
              "number_of_replicas" : 0,
              "include" : { },
              "exclude" : { },
              "require" : {
                "data" : "warm"
              }
            },
            "shrink" : {
              "number_of_shards" : 1
            },
            "set_priority" : {
              "priority" : 50
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1606988416663
      }
    }
  }
}

There is now a Warm phase, along with a shrink action.

We run the following command again:

GET demo-ds/_search

It shows:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "3ZueKHYBtwZVzHJGdHRU",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T12:40:41.812609Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "3pueKHYBtwZVzHJGd3Sd",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T12:40:42.653342Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "4JueKHYBtwZVzHJGfnRv",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T12:40:44.399382Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "4puiKHYBtwZVzHJGeHT3",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T12:45:05.143514Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "45unKHYBtwZVzHJGK3RB",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T12:50:12.929209Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "3JsVKHYBtwZVzHJGZXRc",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T10:10:59.541518Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "35ueKHYBtwZVzHJGe3S4",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T12:40:43.704291Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "4ZuhKHYBtwZVzHJGb3Si",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T12:43:57.217932Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      }
    ]
  }
}

From the output above we can see that all 8 documents now sit in the shrink-.ds-demo-ds-000001 index. This is what we defined in the Warm phase: the index has been shrunk to a single primary shard.

We run the following command again:

POST demo-ds/_doc?pipeline=add-timestamp
{
  "user": {
    "id": "liuxg"
  },
  "message": "This is so cool!"
}

The command output shows:

{
  "_index" : ".ds-demo-ds-000002",
  "_type" : "_doc",
  "_id" : "C5uzKHYBtwZVzHJGDHW6",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

From the output we can see that the new document is stored in an index named .ds-demo-ds-000002. This is a new index, different from the earlier .ds-demo-ds-000001.

When we run:

# Search the data stream
GET demo-ds/_search

it shows all 9 documents:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 9,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "3ZueKHYBtwZVzHJGdHRU",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T12:40:41.812609Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "3pueKHYBtwZVzHJGd3Sd",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T12:40:42.653342Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "4JueKHYBtwZVzHJGfnRv",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T12:40:44.399382Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "4puiKHYBtwZVzHJGeHT3",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T12:45:05.143514Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "45unKHYBtwZVzHJGK3RB",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T12:50:12.929209Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "3JsVKHYBtwZVzHJGZXRc",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T10:10:59.541518Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "35ueKHYBtwZVzHJGe3S4",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T12:40:43.704291Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000001",
        "_type" : "_doc",
        "_id" : "4ZuhKHYBtwZVzHJGb3Si",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T12:43:57.217932Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : ".ds-demo-ds-000002",
        "_type" : "_doc",
        "_id" : "C5uzKHYBtwZVzHJGDHW6",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T13:03:11.546240Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      }
    ]
  }
}

They sit in the shrink-.ds-demo-ds-000001 and .ds-demo-ds-000002 indices.

We can also view the data stream's current index information with the following command:

GET _data_stream/demo-ds

{
  "data_streams" : [
    {
      "name" : "demo-ds",
      "timestamp_field" : {
        "name" : "@timestamp"
      },
      "indices" : [
        {
          "index_name" : "shrink-.ds-demo-ds-000001",
          "index_uuid" : "SMlpBdzdTSala5hMt0XmpQ"
        },
        {
          "index_name" : ".ds-demo-ds-000002",
          "index_uuid" : "1uZk3ug0SfmD-1UUgaeqDw"
        }
      ],
      "generation" : 2,
      "status" : "YELLOW",
      "template" : "template_demo",
      "ilm_policy" : "demo"
    }
  ]
}

The output shows two indices: shrink-.ds-demo-ds-000001 and .ds-demo-ds-000002.
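The index names seen so far follow a recognizable pattern. The Python sketch below reproduces it purely from the outputs in this article (7.10); the function names are hypothetical, and both the six-digit zero-padded generation and the shrink- prefix are inferred from those outputs, so other versions or custom settings may name indices differently.

```python
def backing_index_name(data_stream: str, generation: int) -> str:
    """Name of a backing index as it appears in the outputs above."""
    return f".ds-{data_stream}-{generation:06d}"


def shrunken_index_name(index_name: str) -> str:
    """Name of the index after the ILM shrink action, as seen above."""
    return f"shrink-{index_name}"


first = backing_index_name("demo-ds", 1)
second = backing_index_name("demo-ds", 2)

print(first)                       # .ds-demo-ds-000001
print(shrunken_index_name(first))  # shrink-.ds-demo-ds-000001
print(second)                      # .ds-demo-ds-000002
```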

After waiting a little longer, we run the following command:

GET demo-ds/_ilm/explain

The command above shows:

{
  "indices" : {
    "shrink-.ds-demo-ds-000001" : {
      "index" : "shrink-.ds-demo-ds-000001",
      "managed" : true,
      "policy" : "demo",
      "lifecycle_date_millis" : 1606999959458,
      "age" : "13.83m",
      "phase" : "delete",
      "phase_time_millis" : 1607000275318,
      "action" : "complete",
      "action_time_millis" : 1607000275157,
      "step" : "complete",
      "step_time_millis" : 1607000275318,
      "phase_execution" : {
        "policy" : "demo",
        "phase_definition" : {
          "min_age" : "3m",
          "actions" : { }
        },
        "version" : 1,
        "modified_date_in_millis" : 1606988416663
      }
    },
    ".ds-demo-ds-000002" : {
      "index" : ".ds-demo-ds-000002",
      "managed" : true,
      "policy" : "demo",
      "lifecycle_date_millis" : 1606999959439,
      "age" : "13.83m",
      "phase" : "hot",
      "phase_time_millis" : 1606999961410,
      "action" : "rollover",
      "action_time_millis" : 1607000204426,
      "step" : "check-rollover-ready",
      "step_time_millis" : 1607000204426,
      "phase_execution" : {
        "policy" : "demo",
        "phase_definition" : {
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_size" : "1gb",
              "max_age" : "30d",
              "max_docs" : 5
            },
            "set_priority" : {
              "priority" : 100
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1606988416663
      }
    }
  }
}

We can see that shrink-.ds-demo-ds-000001 is in the delete phase.
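Whether an index is due to enter a phase can be reasoned about from the lifecycle_date_millis value in the explain output and the phase's min_age. The Python sketch below is a simplified model, not ILM's implementation: `parse_duration_ms` and `phase_is_due` are hypothetical helpers, only the units ms, s, m, h, and d are handled, and measuring the age from lifecycle_date_millis is an assumption based on the explain output above.

```python
import re

UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_duration_ms(value: str) -> int:
    """Convert a duration such as '3m' or '0ms' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * UNIT_MS[match.group(2)]


def phase_is_due(lifecycle_date_millis: int, now_millis: int, min_age: str) -> bool:
    """True once the index age has reached the phase's min_age."""
    return now_millis - lifecycle_date_millis >= parse_duration_ms(min_age)


lifecycle_date = 1606999959458  # lifecycle_date_millis of shrink-.ds-demo-ds-000001

print(phase_is_due(lifecycle_date, lifecycle_date + 2 * 60_000, "3m"))  # False
print(phase_is_due(lifecycle_date, lifecycle_date + 3 * 60_000, "3m"))  # True
```

In practice the transition is only noticed on the next ILM poll, so the observed delay also depends on indices.lifecycle.poll_interval.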

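This is what we defined in the ILM policy earlier: once more than 3 minutes have passed, the delete action starts. We then keep running the following command until there are more than 11 documents: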

POST demo-ds/_doc?pipeline=add-timestamp
{
  "user": {
    "id": "liuxg"
  },
  "message": "This is so cool!"
}

We use the following command again:

# Search the data stream
GET demo-ds/_search

We see the following result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "shrink-.ds-demo-ds-000002",
        "_type" : "_doc",
        "_id" : "PO34KHYB_Lwe3F0spf66",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T14:19:12.698241Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000002",
        "_type" : "_doc",
        "_id" : "Pe34KHYB_Lwe3F0sp_7i",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T14:19:13.249898Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000002",
        "_type" : "_doc",
        "_id" : "Ou34KHYB_Lwe3F0soP6E",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T14:19:11.364617Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000002",
        "_type" : "_doc",
        "_id" : "O-34KHYB_Lwe3F0so_54",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T14:19:12.120209Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000002",
        "_type" : "_doc",
        "_id" : "Pu34KHYB_Lwe3F0sqv4S",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T14:19:13.809864Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      },
      {
        "_index" : "shrink-.ds-demo-ds-000002",
        "_type" : "_doc",
        "_id" : "P-34KHYB_Lwe3F0srP4N",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-12-03T14:19:14.317471Z",
          "message" : "This is so cool!",
          "user" : {
            "id" : "liuxg"
          }
        }
      }
    ]
  }
}

We can see that the earlier shrink-.ds-demo-ds-000001 index is gone. It was deleted after 3 minutes. We can get information about the data stream with the following command:

# Get data stream information
GET _data_stream/demo-ds/_stats

{
  "_shards" : {
    "total" : 5,
    "successful" : 3,
    "failed" : 0
  },
  "data_stream_count" : 1,
  "backing_indices" : 2,
  "total_store_size_bytes" : 29508,
  "data_streams" : [
    {
      "data_stream" : "demo-ds",
      "backing_indices" : 2,
      "store_size_bytes" : 29508,
      "maximum_timestamp" : 1607000591546
    }
  ]
}