How to choose a shard key for MongoDB

https://bugsnag.com/blog/mongo-shard-key

TL;DR: Shard using an index on {_id: 'hashed'} or {projectId: 1, _id: 1}.

A few months ago, we sharded our MongoDB cluster to give us two replica sets. Last week we added another shard. Adding the first shard took a little work, but we did it without downtime. Adding a new shard is now trivial.

How does sharding work?

Sharding lets you split your MongoDB database across multiple servers or, in production, multiple replica sets. It's important to do this early because MongoDB is harder to shard once you have more than 512 GB of data. Vertical scaling can only get you so far.

To do this, you tell MongoDB to use one of your indexes as a shard key. It then divides your documents into chunks with similar shard keys. These chunks are then spread out across your replica sets, in approximate shard key order.

As you can see, everything depends on choosing the right shard key.
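
To make the chunking step concrete, here is a toy model in plain JavaScript (it runs in Node.js and does not talk to MongoDB). The helper names splitIntoChunks and assignToShards are invented for this sketch, and it cuts chunks by document count, whereas real MongoDB splits chunks by data size.

```javascript
// Toy model of range-based sharding; not MongoDB's actual implementation.
// Hypothetical helper: sort the shard keys and cut them into fixed-size chunks.
function splitIntoChunks(shardKeys, docsPerChunk) {
  const sorted = [...shardKeys].sort((a, b) => a - b);
  const chunks = [];
  for (let i = 0; i < sorted.length; i += docsPerChunk) {
    const slice = sorted.slice(i, i + docsPerChunk);
    chunks.push({ min: slice[0], max: slice[slice.length - 1], count: slice.length });
  }
  return chunks;
}

// Hypothetical helper: hand out contiguous runs of chunks to shards, so that
// neighbouring key ranges tend to live on the same shard.
function assignToShards(chunks, shardCount) {
  const perShard = Math.ceil(chunks.length / shardCount);
  return chunks.map((chunk, i) => ({ ...chunk, shard: Math.floor(i / perShard) }));
}

const keys = [17, 3, 42, 8, 25, 31, 12, 5];
const placed = assignToShards(splitIntoChunks(keys, 2), 2);
console.log(placed);
```

Each entry in placed describes one chunk: the smallest and largest shard key it holds, how many documents it contains, and which shard it was assigned to.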

What makes a good shard key?

MongoDB automatically ensures that each replica set contains roughly the same number of chunks: 2 in the image above, 6300 or so in the Bugsnag cluster. But that is pretty much the only guarantee.

The choice of shard key determines three important things:

1. The distribution of reads and writes

The most important of these is the distribution of reads and writes. If you're always writing to one machine, then that machine will have a high write-lock percentage, and so writes to your cluster will be slow. It doesn't matter how many machines you have in total, as all the writes will go to the same place. This is why you should never use the monotonically increasing _id or a timestamp as the shard key: you'll always be adding things to the last replica set.

Similarly, if all of your reads are going to the same replica set, then you'd better hope that your working set fits into RAM on one machine. By splitting reads evenly across all replica sets, you can scale your working set size linearly with the number of shards. You will be utilising RAM and disks equally across all machines.
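
The write hot spot caused by a monotonically increasing shard key can be illustrated with a small simulation. This is a sketch under invented assumptions: the three replica set names and the range boundaries are made up, and the code only models range lookup, not a real cluster.

```javascript
// Toy model: three shards own contiguous key ranges; the last range is open-ended.
// The shard names and boundaries are invented for illustration.
const ranges = [
  { shard: "rs0", min: 0, max: 1000 },
  { shard: "rs1", min: 1000, max: 2000 },
  { shard: "rs2", min: 2000, max: Infinity },
];

// Find the shard whose range [min, max) contains the key.
function shardFor(key) {
  return ranges.find((r) => key >= r.min && key < r.max).shard;
}

// Count how many of the given writes land on each shard.
function countWrites(keys) {
  const counts = { rs0: 0, rs1: 0, rs2: 0 };
  for (const key of keys) counts[shardFor(key)] += 1;
  return counts;
}

// New documents get ever-increasing keys, like a timestamp or an ObjectId.
const increasingKeys = Array.from({ length: 100 }, (_, i) => 2500 + i);
const hotSpot = countWrites(increasingKeys);
console.log(hotSpot); // { rs0: 0, rs1: 0, rs2: 100 }
```

Because every new key is larger than all existing ones, every write falls into the open-ended last range, no matter how many shards sit in front of it.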

2. The size of your chunks

The second most important factor is chunk size. MongoDB will split large chunks into smaller ones if, and only if, the documents in them have different shard keys. If you have too many documents with the same shard key, you end up with jumbo chunks. Jumbo chunks are bad not only because they cause the data to be unevenly distributed, but also because once they grow too large you cannot move them between shards at all.
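
The splitting rule can be sketched as follows. This is a simplified model, not MongoDB's splitter: the function name splitChunk is hypothetical, the limit is a document count rather than a size in bytes, and the key values are invented.

```javascript
// Toy model of chunk splitting; real MongoDB splits by data size, not document count.
// A chunk may only be cut between two documents whose shard keys differ.
function splitChunk(sortedKeys, maxDocs) {
  const chunks = [];
  let current = [];
  for (const key of sortedKeys) {
    const last = current[current.length - 1];
    if (current.length >= maxDocs && key !== last) {
      chunks.push(current);
      current = [];
    }
    current.push(key);
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Six documents that all share one shard key value: no legal split point exists.
const sameKey = ["p1", "p1", "p1", "p1", "p1", "p1"];
// The same six documents with a unique suffix appended to the key.
const uniqueKeys = ["p1:1", "p1:2", "p1:3", "p1:4", "p1:5", "p1:6"];

const jumbo = splitChunk(sameKey, 2);
const fine = splitChunk(uniqueKeys, 2);
console.log(jumbo.length, fine.length); // 1 3
```

With identical keys the six documents stay in one oversized chunk, while unique keys allow three chunks of two documents each.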

3. The number of shards each query hits

Finally, it's nice to ensure that most queries hit as few shards as possible. The latency of a query is directly dependent on the latency of the slowest server it hits, so the fewer you hit, the faster queries are likely to run. This isn't a hard requirement, but it's nice to strive for. Because the distribution of chunks onto shards is only approximately in order, it can never be enforced strictly.
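
The effect of the slowest server can be shown with a few lines of arithmetic. The shard names and latency figures below are made-up numbers, and the model assumes the query simply waits for every shard it contacts.

```javascript
// Toy model: a query that fans out to several shards finishes only when the
// slowest shard has answered. Latencies (in milliseconds) are made-up numbers.
const shardLatencyMs = { rs0: 12, rs1: 15, rs2: 90 };

// The latency of a query is the maximum latency among the shards it hits.
function queryLatency(shardsHit) {
  return Math.max(...shardsHit.map((shard) => shardLatencyMs[shard]));
}

const targeted = queryLatency(["rs0"]);
const scatterGather = queryLatency(["rs0", "rs1", "rs2"]);
console.log(targeted, scatterGather); // 12 90
```

A query routed to a single shard pays only that shard's latency, while a query sent to all shards is held up by the one slow replica set.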

Good shard key schemes

With all of that knowledge, what makes a good shard key?

Hashed id

As a first approximation you can use a hash of the _id of your documents.

db.events.createIndex({_id: 'hashed'})

This will distribute reads and writes evenly, and it will ensure that each document has a different shard key so chunks can be fine-grained and small.

It's not perfect, because queries for multiple documents will have to hit all shards, but it might be good enough.
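
To see why hashing spreads sequential ids across shards, consider this simulation. It is only a sketch: mix32 is a stand-in integer mixer (the MurmurHash3 finalizer), not the hash function MongoDB uses for hashed indexes, and the three-shard layout with equal hash ranges is an assumption.

```javascript
// Stand-in 32-bit integer mixer (the MurmurHash3 finalizer). This is NOT the
// hash function MongoDB uses for hashed indexes; it only illustrates the idea.
function mix32(n) {
  let h = n >>> 0;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Each shard owns an equal slice of the 32-bit hash space.
function shardForHashedId(id, shardCount) {
  return Math.floor((mix32(id) / 2 ** 32) * shardCount);
}

// Insert 300 documents with sequential ids and count the writes per shard.
const shardCount = 3;
const writesPerShard = new Array(shardCount).fill(0);
for (let id = 1; id <= 300; id++) {
  writesPerShard[shardForHashedId(id, shardCount)] += 1;
}
console.log(writesPerShard);
```

Although the ids increase one by one, their hashes jump around the hash space, so consecutive writes go to different shards. The flip side is that a range of ids no longer maps to a range of chunks, which is why multi-document queries have to ask every shard.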

Multi-tenant compound index

If you want to beat the hashed _id scheme, you need to come up with a way of grouping related documents close together in the index. At Bugsnag we group the documents by project, because the way our app works means most queries are run in the scope of a project. You will have to figure out a grouping that works for your app.

We can't just use projectId as a shard key because that leads to jumbo chunks, so we also include the _id to break large projects into multiple chunks. These chunks are still adjacent in the index, and so are still most likely to end up on the same shard.

db.events.createIndex({projectId: 1, _id: 1})

This works particularly well for us because the number of reads and writes for a project is mostly independent of the age of that project, and old projects usually get deleted. If that weren't the case, we might see a slight imbalance towards higher load on newer projects.

To avoid this problem in the future, we will likely migrate to an index on {projectId: 'hashed', _id: 1} as soon as MongoDB supports compound indexes with hashed values (SERVER-10220).
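
The grouping effect of the {projectId: 1, _id: 1} key can be sketched with a toy model. The project names, ids, and the chunk size of two documents are invented for illustration, and chunks are cut by document count rather than by data size as MongoDB does.

```javascript
// Toy model of a compound shard key {projectId: 1, _id: 1}.
// Project names and ids are invented; chunks are cut by document count for simplicity.
const docs = [];
for (let id = 1; id <= 12; id++) {
  // Ids increase over time while writes from the three projects interleave.
  docs.push({ projectId: ["a", "b", "c"][id % 3], _id: id });
}

// Order documents the way the compound index would: by projectId, then _id.
docs.sort((x, y) => x.projectId.localeCompare(y.projectId) || x._id - y._id);

// Cut the sorted documents into chunks of two documents each.
const chunks = [];
for (let i = 0; i < docs.length; i += 2) chunks.push(docs.slice(i, i + 2));

// List the indexes of the chunks that hold at least one document of a project.
function chunksForProject(projectId) {
  return chunks
    .map((chunk, index) => (chunk.some((d) => d.projectId === projectId) ? index : -1))
    .filter((index) => index !== -1);
}

console.log(chunksForProject("b")); // [ 2, 3 ]
```

Project b is too large for a single chunk, so the _id component splits it in two, but both chunks sit next to each other in key order. A query scoped to that project therefore only needs the shard or shards holding those adjacent chunks.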

In summary

Choosing a shard key is hard, but there are really only two options. If you can't find a good grouping key for your application, hash the _id. If you can, then go with that grouping key and add the _id to avoid jumbo chunks. Remember that whichever grouping key you use, it also needs to distribute reads and writes evenly to get the most out of each node in your cluster.
