Shards and replicas in Elasticsearch

When you download Elasticsearch and start it up, you create an Elasticsearch node, which tries to join an existing cluster if one is available or creates a new one. Let's say you created a new cluster with a single node, the one that you just started up. It holds no data yet, so we need to create an index.

When you create an index (an index is also created automatically when you index the first document), you can define how many shards it will be composed of. If you don't specify a number, the index gets the default number of shards: 5 primaries in the Elasticsearch versions this post describes (since version 7.0 the default is 1). What does that mean?

It means that Elasticsearch will create 5 primary shards that will contain your data:

 ____    ____    ____    ____    ____
| 1  |  | 2  |  | 3  |  | 4  |  | 5  |
|____|  |____|  |____|  |____|  |____|
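As a rough illustration of choosing the shard count yourself, the sketch below builds the JSON body that an index-creation request (`PUT /<index name>`) would carry. `number_of_shards` and `number_of_replicas` are the index setting names Elasticsearch uses (replicas are covered further down); the helper `index_settings` and the index name `my-index` are made up for this example, and actually sending the request is left out.

```python
import json


def index_settings(primary_shards=5, replicas=1):
    """Build the request body for creating an index with explicit shard counts.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


# Body for a hypothetical index called "my-index" (would be sent as: PUT /my-index)
body = index_settings(primary_shards=5, replicas=1)
print(json.dumps(body, indent=2))
```

Calling `index_settings(primary_shards=1, replicas=0)` instead would describe the smallest possible layout: one primary shard and no copies.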

Every time you index a document, Elasticsearch decides which primary shard is supposed to hold that document and indexes it there. Primary shards are not a copy of the data, they are the data! With a single node, multiple shards of course don't make much sense, but if we start another Elasticsearch instance in the same cluster, the shards will be distributed evenly over the cluster.

Node 1 will then hold for example only three shards:

 ____    ____    ____
| 1  |  | 2  |  | 3  |
|____|  |____|  |____|

The remaining two shards have been moved to the newly started node:

 ____    ____
| 4  |  | 5  |
|____|  |____|

Why does this happen? Because Elasticsearch is a distributed search engine, and this way you can make use of multiple nodes/machines to manage large amounts of data.
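The 3 + 2 split shown in the diagrams can be reproduced with a few lines of Python. This is only a toy model that cuts the shard list into near-equal groups; the real allocation logic in Elasticsearch takes more factors into account, and the function `distribute` and the node names are assumptions made for this sketch.

```python
def distribute(shards, nodes):
    """Spread shards over nodes in contiguous, near-equal groups (toy model)."""
    if not nodes:
        raise ValueError("at least one node is required")
    base, extra = divmod(len(shards), len(nodes))
    layout = {}
    start = 0
    for i, node in enumerate(nodes):
        # The first `extra` nodes take one shard more than the rest.
        size = base + (1 if i < extra else 0)
        layout[node] = shards[start:start + size]
        start += size
    return layout


primaries = [1, 2, 3, 4, 5]
print(distribute(primaries, ["node1"]))
# {'node1': [1, 2, 3, 4, 5]}
print(distribute(primaries, ["node1", "node2"]))
# {'node1': [1, 2, 3], 'node2': [4, 5]}
```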

Every Elasticsearch index is composed of at least one primary shard, since that's where the data is stored. Every shard comes at a cost, though, so if you have a single node and no foreseeable growth, just stick with a single primary shard.
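Going back to how a document ends up on one particular primary shard: the general idea is to hash a routing value (by default the document ID) and take the result modulo the number of primary shards. The sketch below shows that idea only. `zlib.crc32` is a stand-in chosen for this example and is not the hash Elasticsearch uses internally, `pick_shard` is a hypothetical name, and the shard numbers it returns start at 0 whereas the diagrams count from 1.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value (for example a document ID) to a primary shard number.

    Illustration only: crc32 is a stand-in hash, not the one Elasticsearch uses.
    """
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


# The same ID always lands on the same shard while the shard count stays fixed.
for doc_id in ["user-1", "user-2", "user-3"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

With a single primary shard every document maps to shard 0, which is the "just stick with a single primary shard" case from the paragraph above.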

Another type of shard is the replica. The default is 1 replica, meaning that every primary shard will be copied to another shard that contains the same data. Replicas are used to increase search performance and for failover. A replica shard is never allocated on the same node as its primary (that would be pretty much like putting a backup on the same disk as the original data).

Back to our example: with 1 replica we'll have the whole index on each node, since 2 replica shards will be allocated on the first node, and they will contain exactly the same data as the primaries on the second node:

 ____    ____    ____    ____    ____
| 1  |  | 2  |  | 3  |  | 4R |  | 5R |
|____|  |____|  |____|  |____|  |____|

Same for the second node, which will contain a copy of the primary shards on the first node:

 ____    ____    ____    ____    ____
| 1R |  | 2R |  | 3R |  | 4  |  | 5  |
|____|  |____|  |____|  |____|  |____|

With a setup like this, if a node goes down you still have the whole index. The replica shards will automatically become primaries and the cluster will work properly despite the node failure.
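The failover described above can be modelled in a few lines. This is a simplified sketch of the two-node layout from the diagrams, not how Elasticsearch implements it; `build_cluster`, `fail_node`, and the node names are assumptions made for the example.

```python
def build_cluster():
    """Two-node layout from the diagrams: where each shard's copies live."""
    return {
        1: {"primary": "node1", "replica": "node2"},
        2: {"primary": "node1", "replica": "node2"},
        3: {"primary": "node1", "replica": "node2"},
        4: {"primary": "node2", "replica": "node1"},
        5: {"primary": "node2", "replica": "node1"},
    }


def fail_node(cluster, dead):
    """Return the layout after node `dead` goes down, promoting surviving replicas."""
    after = {}
    for shard, copies in cluster.items():
        primary, replica = copies["primary"], copies["replica"]
        if primary == dead:
            # The surviving replica takes over as the new primary.
            primary, replica = replica, None
        elif replica == dead:
            # The copy on the dead node is gone; the primary is unaffected.
            replica = None
        after[shard] = {"primary": primary, "replica": replica}
    return after


after = fail_node(build_cluster(), "node2")
for shard, copies in after.items():
    print(shard, copies)
```

After `node2` fails, every shard still has a primary on `node1`, so the whole index remains available. The replicas show up as `None` because, with only one node left, there is no other node to place them on.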
