MongoDB Hashed Sharding

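This article describes MongoDB hashed sharding in detail: how it works, how to choose a shard key, how hashed sharding compares with ranged sharding, and how to shard populated and empty collections. It highlights the advantage of hashed sharding for data distribution, and notes that floating point values can cause hash collisions.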

Hashed Sharding

Hashed sharding uses a hashed index to partition data across your sharded cluster. Hashed indexes compute the hash value of a single field as the index value; this value is used as your shard key. [1]


Hashed sharding provides more even data distribution across the sharded cluster, at the cost of fewer Targeted Operations and more Broadcast Operations. After hashing, documents with "close" shard key values are unlikely to be on the same chunk or shard, so the mongos is more likely to perform Broadcast Operations to fulfill a given ranged query. The mongos can still target queries with equality matches to a single shard.
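
To make the difference between targeted and broadcast operations concrete, here is a minimal sketch in plain JavaScript. It does not reproduce how mongos works internally: `toyHash`, `shardForKey`, and `shardsForQuery` are hypothetical names, the query objects are an invented shape, and the FNV-1a hash is only a stand-in for MongoDB's real hash function.

```javascript
// Toy 32-bit FNV-1a hash; a stand-in for MongoDB's real hashing function.
function toyHash(value) {
  const text = String(value);
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force an unsigned 32-bit result
}

// Map a shard key value to one of `shardCount` shards via its hash.
function shardForKey(value, shardCount) {
  return toyHash(value) % shardCount;
}

// Decide which shards a query on the hashed shard key must visit.
function shardsForQuery(query, shardCount) {
  if (query.type === "equality") {
    // The hash of the exact value identifies a single shard.
    return [shardForKey(query.value, shardCount)];
  }
  // A range of key values maps to scattered hashes, so ask every shard.
  return Array.from({ length: shardCount }, (_, i) => i);
}

console.log(shardsForQuery({ type: "equality", value: 42 }, 4).length); // 1
console.log(shardsForQuery({ type: "range", min: 40, max: 50 }, 4).length); // 4
```

The equality query resolves to exactly one shard because hashing the value is deterministic, while the range query has no such shortcut and falls back to every shard.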

TIP

MongoDB automatically computes the hashes when resolving queries using hashed indexes. Applications do not need to compute hashes.

WARNING

MongoDB hashed indexes truncate floating point numbers to 64-bit integers before hashing. For example, a hashed index would store the same value for a field that held a value of 2.3, 2.2, and 2.9. To prevent collisions, do not use a hashed index for floating point numbers that cannot be reliably converted to 64-bit integers (and then back to floating point). MongoDB hashed indexes do not support floating point values larger than 2^53.
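
The collision described in the warning can be illustrated without a database. The sketch below only imitates the truncation step with `Math.trunc`; it does not call MongoDB or compute a real hash, and `truncateForHash`, `collidesAfterTruncation`, and `isSafeForHashedIndex` are hypothetical helper names. The safety check is deliberately conservative: it accepts only integers that a double represents exactly.

```javascript
// Approximates the pre-hash step described above: drop the fractional part.
// `truncateForHash` is a hypothetical helper, not a MongoDB API.
function truncateForHash(value) {
  return Math.trunc(value);
}

// True when two values truncate to the same integer, and would therefore
// receive the same hashed index value.
function collidesAfterTruncation(a, b) {
  return truncateForHash(a) === truncateForHash(b);
}

// Conservative check: the value must be an integer that a double can
// represent exactly (magnitude at most 2^53 - 1).
function isSafeForHashedIndex(value) {
  return Number.isSafeInteger(value);
}

console.log(collidesAfterTruncation(2.3, 2.9)); // true
console.log(collidesAfterTruncation(2.9, 3.1)); // false
console.log(isSafeForHashedIndex(2.5)); // false
console.log(isSafeForHashedIndex(2 ** 53 + 2)); // false
```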

To see what the hashed value would be for a key, see convertShardKeyToHashed().

[1] Starting in version 4.0, the mongo shell provides the method convertShardKeyToHashed(). This method uses the same hashing function as the hashed index and can be used to see what the hashed value would be for a key.

Hashed Sharding Shard Key

The field you choose as your hashed shard key should have good cardinality, that is, a large number of different values. Hashed keys are ideal for shard keys with fields that change monotonically, like ObjectId values or timestamps. A good example of this is the default _id field, assuming it only contains ObjectId values (rather than user-defined _id values).
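
One rough way to gauge cardinality is to compare the number of distinct values with the number of sampled values. The helper below is a minimal sketch that assumes the sample fits in memory and holds primitive values (strings or numbers); `distinctRatio` is a hypothetical name, not a MongoDB function.

```javascript
// Fraction of sampled values that are distinct: 1 means every value is
// unique, values near 0 mean heavy repetition (poor cardinality).
function distinctRatio(values) {
  if (values.length === 0) return 0;
  return new Set(values).size / values.length;
}

console.log(distinctRatio(["a", "b", "c", "d"])); // 1
console.log(distinctRatio(["US", "US", "DE", "US"])); // 0.5
```

A field such as a country code scores low here and makes a poor hashed shard key, because equal values always hash to the same chunk.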

To shard a collection using a hashed shard key, see Shard a Collection.

Hashed vs Ranged Sharding

Given a collection using a monotonically increasing value X as the shard key, using ranged sharding results in a distribution of incoming inserts similar to the following:

[Figure: 6ad47e8a-1636-eb11-8da9-e4434bdf6706.svg, distribution of inserts under ranged sharding]

Since the value of X is always increasing, the chunk with an upper bound of maxKey receives the majority of incoming writes. This restricts insert operations to the single shard containing this chunk, which reduces or removes the advantage of distributed writes in a sharded cluster.

By using a hashed index on X, the distribution of inserts is similar to the following:

[Figure: 6bd47e8a-1636-eb11-8da9-e4434bdf6706.svg, distribution of inserts under hashed sharding]

Since the data is now distributed more evenly, inserts are efficiently distributed throughout the cluster.
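
The contrast between the two strategies can be reproduced with a small simulation. This is a sketch under stated assumptions, not a model of MongoDB internals: four chunks, a fixed chunk width of 1000 keys for the ranged case, a toy FNV-1a hash in place of MongoDB's hash function, and hypothetical names throughout (`fnv1a32`, `rangedChunk`, `hashedChunk`, `countPerChunk`).

```javascript
// Toy 32-bit FNV-1a hash, used only as a stand-in for MongoDB's hash.
function fnv1a32(value) {
  const text = String(value);
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Ranged sharding: chunk i owns keys in [i * width, (i + 1) * width); the
// last chunk is open-ended, so its upper bound plays the role of maxKey.
function rangedChunk(key, chunkCount, width) {
  return Math.min(Math.floor(key / width), chunkCount - 1);
}

// Hashed sharding: the chunk is chosen from the hash, not the raw key.
function hashedChunk(key, chunkCount) {
  return fnv1a32(key) % chunkCount;
}

// Count how many of the keys land in each chunk.
function countPerChunk(keys, chunkCount, chooseChunk) {
  const counts = new Array(chunkCount).fill(0);
  for (const key of keys) counts[chooseChunk(key)] += 1;
  return counts;
}

// 1000 new inserts whose keys keep increasing past the existing data.
const simulatedKeys = Array.from({ length: 1000 }, (_, i) => 3000 + i);
const rangedCounts = countPerChunk(simulatedKeys, 4, (k) => rangedChunk(k, 4, 1000));
const hashedCounts = countPerChunk(simulatedKeys, 4, (k) => hashedChunk(k, 4));

console.log(rangedCounts); // every insert lands in the last chunk: 0, 0, 0, 1000
console.log(hashedCounts); // inserts are spread across all four chunks
```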

Shard the Collection

Use the sh.shardCollection() method, specifying the full namespace of the collection and the target hashed index to use as the shard key.

sh.shardCollection( "<database>.<collection>", { <shard key field> : "hashed" } )
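
As a small illustration of how the arguments fit together, the sketch below wraps the call in a helper. `hashedKeySpec` and `shardWithHashedKey` are hypothetical names, and the namespace "records.users" is made up. The `sh` object is the sharding helper that the mongo shell provides; it is passed in as a parameter so the function can also be run against a stub outside the shell.

```javascript
// Build the shard key document for a hashed shard key on a single field.
function hashedKeySpec(field) {
  return { [field]: "hashed" };
}

// Shard `namespace` ("<database>.<collection>") on a hashed key.
// `sh` is the mongo shell's sharding helper, injected by the caller.
function shardWithHashedKey(sh, namespace, field) {
  return sh.shardCollection(namespace, hashedKeySpec(field));
}

// Inside the mongo shell one might call (hypothetical namespace):
//   shardWithHashedKey(sh, "records.users", "_id");
console.log(JSON.stringify(hashedKeySpec("_id"))); // {"_id":"hashed"}
```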

IMPORTANT

  • Once you shard a collection, the selection of the shard key is immutable; i.e. you cannot select a different shard key for that collection.

  • Starting in MongoDB 4.2, you can update a document's shard key value unless the shard key field is the immutable _id field. For details on updating the shard key, see Change a Document's Shard Key Value. Before MongoDB 4.2, a document's shard key field value is immutable.

Shard a Populated Collection

If you shard a populated collection using a hashed shard key:

  • The sharding operation creates the initial chunk(s) to cover the entire range of the shard key values. The number of chunks created depends on the configured chunk size.

  • After the initial chunk creation, the balancer migrates these initial chunks across the shards as appropriate and manages the chunk distribution going forward.

Shard an Empty Collection

If you shard an empty collection using a hashed shard key:

  • With no zones and zone ranges specified for the empty or non-existing collection:

    • The sharding operation creates empty chunks to cover the entire range of the shard key values and performs an initial chunk distribution. By default, the operation creates 2 chunks per shard and migrates them across the cluster. You can use the numInitialChunks option to specify a different number of initial chunks. This initial creation and distribution of chunks allows for faster setup of sharding.

  • After the initial distribution, the balancer manages the chunk distribution going forward.

  • With zones and zone ranges specified for the empty or non-existing collection (available starting in MongoDB 4.0.3):

    • The sharding operation creates empty chunks for the defined zone ranges as well as any additional chunks to cover the entire range of the shard key values and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding.

  • After the initial distribution, the balancer manages the chunk distribution going forward.
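
The initial chunk layout for an empty collection without zones can be sketched as follows. This is an illustration, not MongoDB's code: it assumes hashed shard key values span the signed 64-bit integer range and that the initial chunks split that range evenly, and it takes the default of 2 chunks per shard from the text above. `defaultInitialChunks`, `initialChunkLowerBounds`, and `shardOptions` are hypothetical helper names; `numInitialChunks` is the option named above.

```javascript
// Assumption: hashed shard key values are signed 64-bit integers.
const HASH_RANGE_MIN = -(2n ** 63n);
const HASH_RANGE_SPAN = 2n ** 64n;

// Default described in the text: 2 initial chunks per shard.
function defaultInitialChunks(shardCount) {
  return 2 * shardCount;
}

// Lower bounds of `chunkCount` equal-width chunks covering the hash range.
// Even splitting is an assumption made for illustration.
function initialChunkLowerBounds(chunkCount) {
  const width = HASH_RANGE_SPAN / BigInt(chunkCount);
  return Array.from({ length: chunkCount }, (_, i) => HASH_RANGE_MIN + width * BigInt(i));
}

// Options document requesting a custom number of initial chunks.
// In the mongo shell it would be passed as the fourth argument:
//   sh.shardCollection(namespace, key, false, shardOptions(8));
function shardOptions(numInitialChunks) {
  return { numInitialChunks };
}

console.log(defaultInitialChunks(3)); // 6
console.log(initialChunkLowerBounds(4).map(String));
// -9223372036854775808, -4611686018427387904, 0, 4611686018427387904
console.log(JSON.stringify(shardOptions(8))); // {"numInitialChunks":8}
```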

SEE ALSO

To learn how to deploy a sharded cluster and implement hashed sharding, see Deploy a Sharded Cluster.
