ES 5.6.4 Source Code Analysis: How Shards Are Allocated Across Disks

道友，且慢

Categories: elasticsearch, Elasticsearch source code analysis

Article link: https://blog.csdn.net/qqqq0199181/article/details/82853871

Introduction

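An ES index is made up of several shards. The number of shards and the number of replicas are specified when the index is created; if they are omitted, ES 5.6.4 defaults to 5 shards and 1 replica. The shards of an index are spread evenly across the nodes by the shard allocation algorithm. The question this article asks is: when a shard is assigned to a node whose data paths span several disks, that is, when elasticsearch.yml is configured as follows: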

path.data: /disk1/data/elasticsearch,/disk2/data/elasticsearch,/disk3/data/elasticsearch

how does ES decide which of these paths the shard is stored on?

When shards are allocated

ES assigns a shard to a data path in two situations:

When an index is created
When a shard is reassigned

Whatever triggers the allocation, the following path-selection method is called:

public static ShardPath selectNewPathForShard(NodeEnvironment env, ShardId shardId, IndexSettings indexSettings,
                                              long avgShardSizeInBytes, Map<Path,Integer> dataPathToShardCount) throws IOException {

    final Path dataPath;
    final Path statePath;

    if (indexSettings.hasCustomDataPath()) {
        dataPath = env.resolveCustomLocation(indexSettings, shardId);
        statePath = env.nodePaths()[0].resolve(shardId);
    } else {
        BigInteger totFreeSpace = BigInteger.ZERO;
        for (NodeEnvironment.NodePath nodePath : env.nodePaths()) {
            totFreeSpace = totFreeSpace.add(BigInteger.valueOf(nodePath.fileStore.getUsableSpace()));
        }

        // TODO: this is a hack!!  We should instead keep track of incoming (relocated) shards since we know
        // how large they will be once they're done copying, instead of a silly guess for such cases:

        // Very rough heuristic of how much disk space we expect the shard will use over its lifetime, the max of current average
        // shard size across the cluster and 5% of the total available free space on this node:
        BigInteger estShardSizeInBytes = BigInteger.valueOf(avgShardSizeInBytes).max(totFreeSpace.divide(BigInteger.valueOf(20)));

        // TODO - do we need something more extensible? Yet, this does the job for now...
        final NodeEnvironment.NodePath[] paths = env.nodePaths();
        NodeEnvironment.NodePath bestPath = null;
        BigInteger maxUsableBytes = BigInteger.valueOf(Long.MIN_VALUE);
        for (NodeEnvironment.NodePath nodePath : paths) {
            FileStore fileStore = nodePath.fileStore;

            BigInteger usableBytes = BigInteger.valueOf(fileStore.getUsableSpace());
            assert usableBytes.compareTo(BigInteger.ZERO) >= 0;

            // Deduct estimated reserved bytes from usable space:
            Integer count = dataPathToShardCount.get(nodePath.path);
            if (count != null) {
                usableBytes = usableBytes.subtract(estShardSizeInBytes.multiply(BigInteger.valueOf(count)));
            }
            if (bestPath == null || usableBytes.compareTo(maxUsableBytes) > 0) {
                maxUsableBytes = usableBytes;
                bestPath = nodePath;
            }
        }

        statePath = bestPath.resolve(shardId);
        dataPath = statePath;
    }
    return new ShardPath(indexSettings.hasCustomDataPath(), dataPath, statePath, shardId);
}
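The method above reads the free space of each data path through FileStore.getUsableSpace(). As a rough way to look at the same numbers outside of ES, the sketch below queries the standard java.nio.file API for a list of directories. The class DataPathSpace and the method usableBytesByPath are hypothetical names made up for this illustration and are not part of Elasticsearch; ES may wrap the file store differently internally.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not an Elasticsearch class.
class DataPathSpace {

    /** Returns the usable bytes reported by the file store that backs each existing path. */
    static Map<Path, Long> usableBytesByPath(List<Path> dataPaths) {
        Map<Path, Long> result = new LinkedHashMap<>();
        for (Path path : dataPaths) {
            try {
                FileStore store = Files.getFileStore(path); // the path must exist
                result.put(path, store.getUsableSpace());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The current directory is used so the sketch runs anywhere;
        // substitute the directories listed in path.data on a real node.
        Map<Path, Long> space = usableBytesByPath(Collections.singletonList(Paths.get(".")));
        space.forEach((p, bytes) ->
            System.out.println(p.toAbsolutePath() + " -> " + bytes + " bytes usable"));
    }
}
```

Two paths on the same disk report the same usable space, so the comparison in selectNewPathForShard only distinguishes paths that sit on different file stores.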

Analysis of the allocation strategy

The code in the previous section breaks down into the following steps.

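1. Estimate the shard size

(1) Get the average size of the index's existing shards (avgShardSizeInBytes).
(2) Compute 5% of the total usable space across all data paths configured in path.data.

The larger of (1) and (2) is used as the estimated shard size, estShardSizeInBytes.

2. Count the shards of this index on each path (dataPathToShardCount)

3. Compute the adjusted free space of each path with the formula

usableBytes = usableBytes - (number of this index's shards on the path) * estShardSizeInBytes

4. Pick the maximum

Compare the usableBytes values of all paths; the path with the largest value receives the shard.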
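The four steps can be restated as a small standalone sketch that works on plain path names and byte counts instead of NodeEnvironment objects. The class PathSelectionSketch and its method names are hypothetical and exist only for this illustration; the arithmetic mirrors the BigInteger logic of selectNewPathForShard, including the rule that a later path only wins when its value is strictly larger.

```java
import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified restatement of the selection logic; hypothetical names, not ES code.
class PathSelectionSketch {

    /** Step 1: max(average shard size, 5% of the total usable space over all paths). */
    static BigInteger estimateShardSize(long avgShardSizeInBytes, Map<String, Long> usableBytesByPath) {
        BigInteger total = BigInteger.ZERO;
        for (long usable : usableBytesByPath.values()) {
            total = total.add(BigInteger.valueOf(usable));
        }
        // Dividing by 20 is the same as taking 5% (integer division).
        return BigInteger.valueOf(avgShardSizeInBytes).max(total.divide(BigInteger.valueOf(20)));
    }

    /**
     * Steps 2 to 4: deduct (shard count * estimate) from each path's usable space
     * and return the path with the largest remaining value. Paths are visited in
     * the iteration order of usableBytesByPath, and ties keep the earlier path.
     */
    static String selectBestPath(long avgShardSizeInBytes,
                                 Map<String, Long> usableBytesByPath,
                                 Map<String, Integer> shardCountByPath) {
        BigInteger est = estimateShardSize(avgShardSizeInBytes, usableBytesByPath);
        String bestPath = null;
        BigInteger maxUsable = null;
        for (Map.Entry<String, Long> entry : usableBytesByPath.entrySet()) {
            BigInteger usable = BigInteger.valueOf(entry.getValue());
            Integer count = shardCountByPath.get(entry.getKey());
            if (count != null) {
                usable = usable.subtract(est.multiply(BigInteger.valueOf(count)));
            }
            if (bestPath == null || usable.compareTo(maxUsable) > 0) {
                maxUsable = usable;
                bestPath = entry.getKey();
            }
        }
        return bestPath;
    }

    public static void main(String[] args) {
        Map<String, Long> usable = new LinkedHashMap<>();
        usable.put("a", 1000L);
        usable.put("b", 950L);
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a", 1); // one shard of this index already lives on path "a"
        System.out.println(selectBestPath(0L, usable, counts));
    }
}
```

In the small usage example the estimate is 1950 / 20 = 97 bytes, so path "a" is reduced to 903 and path "b", at 950, is selected.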

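Summary

Because the way ES estimates shard size is inaccurate, this path-selection strategy cannot guarantee that data is evenly balanced across multiple disks.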

An example:

Assume the data paths, their free space, and their total space are as follows:

Path                        Free    Total
/disk1/data/elasticsearch   10G     20G
/disk2/data/elasticsearch   9.5G    20G
/disk3/data/elasticsearch   9.5G    20G

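Two indices, people1 and people2, are created one after the other, each with a single shard.
people1 is created first. It has no existing shards, so its average shard size is 0 and the estimate falls back to 5% of the total free space: (10 + 9.5 + 9.5) * 5% = 1.45G.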

Since the index has no shards on any path yet, nothing is deducted and the computed free space of each path is:

/disk1/data/elasticsearch   10G
/disk2/data/elasticsearch   9.5G
/disk3/data/elasticsearch   9.5G

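/disk1/data/elasticsearch has the most free space, so the only shard of people1 is placed there. Because people1 contains no data yet, the shard does not reduce the free space of /disk1/data/elasticsearch. After people1 has been created, the space is therefore still:

Path                        Free    Total
/disk1/data/elasticsearch   10G     20G
/disk2/data/elasticsearch   9.5G    20G
/disk3/data/elasticsearch   9.5G    20G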

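people2 goes through the same steps. The deduction in step 3 only counts shards of the index being created, so the people1 shard already on /disk1/data/elasticsearch is ignored, and the shard of people2 is also assigned to /disk1/data/elasticsearch. The data of both indices therefore ends up under that one path, and once data is loaded into the two indices the disks become skewed.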
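The example can be replayed with a few lines of code. The sketch below is a simplified model under the assumptions stated in the example: sizes are expressed in MB (10G as 10240, 9.5G as 9728), a freshly created index has an average shard size of 0, and an empty shard does not change the free space. The class SkewExample and its methods are hypothetical names for this illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Replays the people1 / people2 example with plain numbers; hypothetical names, not ES code.
class SkewExample {

    /** Picks the path with the most free space after deducting this index's own shards. */
    static String pick(Map<String, Long> freeMb, Map<String, Integer> shardsOfThisIndex, long avgShardMb) {
        long total = 0;
        for (long free : freeMb.values()) {
            total += free;
        }
        long estimate = Math.max(avgShardMb, total / 20); // 5% of the total free space
        String best = null;
        long bestFree = Long.MIN_VALUE;
        for (Map.Entry<String, Long> entry : freeMb.entrySet()) {
            long adjusted = entry.getValue() - estimate * shardsOfThisIndex.getOrDefault(entry.getKey(), 0);
            if (best == null || adjusted > bestFree) {
                best = entry.getKey();
                bestFree = adjusted;
            }
        }
        return best;
    }

    /** Creates people1 and then people2, returning the path chosen for each. */
    static List<String> simulate() {
        Map<String, Long> freeMb = new LinkedHashMap<>();
        freeMb.put("/disk1/data/elasticsearch", 10240L);
        freeMb.put("/disk2/data/elasticsearch", 9728L);
        freeMb.put("/disk3/data/elasticsearch", 9728L);

        List<String> chosen = new ArrayList<>();
        for (String index : new String[] {"people1", "people2"}) {
            // A new index has no shards of its own on any path, so nothing is deducted.
            Map<String, Integer> shardsOfThisIndex = new HashMap<>();
            String path = pick(freeMb, shardsOfThisIndex, 0L);
            chosen.add(path);
            System.out.println(index + " -> " + path);
            // The new shard is empty, so freeMb is left unchanged.
        }
        return chosen;
    }

    public static void main(String[] args) {
        simulate();
    }
}
```

Both indices are reported on /disk1/data/elasticsearch, which matches the skew described above.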
