MongoDB move chunk and balancing

Latest recommended article published on 2023-06-03 17:06:58

macyang

Category: database/nosql    Tags: mongodb migration query parameters application database

Article link: https://blog.csdn.net/macyang/article/details/6289349

1) Move Chunk:

db.runCommand( { moveChunk : "test.blog.posts" ,
                 find : { author : "eliot" } ,
                 to : "shard1" } )

Parameters:

    * moveChunk: a full collection namespace, including the database name
    * find: a query expression that falls within the chunk to be moved; the command will find the FROM (donor) shard automatically
    * to: shard id where the chunk will be moved
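
The three parameters above can be assembled into a command document before it is handed to db.runCommand. The following is a minimal sketch; buildMoveChunkCommand is a hypothetical helper written for illustration, not part of MongoDB, and its only check is that the namespace includes a database name.

```javascript
// Hypothetical helper (not part of MongoDB): builds the moveChunk command
// document from the three parameters described above.
function buildMoveChunkCommand(namespace, findQuery, toShard) {
  // The namespace must be "<database>.<collection>"; the collection part
  // may itself contain dots (e.g. "test.blog.posts").
  const dot = namespace.indexOf(".");
  if (dot <= 0 || dot === namespace.length - 1) {
    throw new Error("namespace must look like '<database>.<collection>'");
  }
  return { moveChunk: namespace, find: findQuery, to: toShard };
}

const cmd = buildMoveChunkCommand("test.blog.posts", { author: "eliot" }, "shard1");
console.log(JSON.stringify(cmd));
```

In a shell session connected to a mongos, the resulting document would be passed to db.runCommand(cmd), as in the example at the top of this section.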

Moving a chunk is a complex operation, but it happens under the covers. It involves two interconnected protocols. The first clones the data of the chunk itself, including any changes made during the cloning process. The second is a commit protocol that makes sure all the migration participants (the TO-shard, the FROM-shard, and the config servers) agree that the migration has completed.

2) Balancing

The balancer is a background task that tries to keep the number of chunks even across all servers of the cluster. The activity of balancing is transparent to querying. In other words, your application doesn't need to know or care that there is any data moving activity ongoing.

To make that so, the balancer is careful about when and how much data it transfers. Let's look at how much to transfer first. The unit of transfer is a chunk. In the steady state, the size of chunks should be in the range of 100-200 MB of data. This range has proven to be the sweet spot for how much data to move at once. More than that, and the migration would take longer and queries might see a wider variation in response times. Less than that, and the overhead of moving wouldn't pay off as well.

Regarding when to transfer load, the balancer waits for a threshold of uneven chunk counts to occur before acting. In the field, a difference of 8 chunks between the least and most loaded shards has proven to be a good heuristic. (This is an arbitrary number, granted.) The concern here is not to incur overhead if -- exaggerating to make a point -- there is a difference of one doc between shard A and shard B. It's just inefficient to monitor load differences at that fine a grain.

Now, once the balancer has "kicked in," it will redistribute chunks, one at a time -- in what we call rounds -- until the difference in chunks between any two shards is down to 2 chunks.
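
The start and stop thresholds described above can be illustrated with a small simulation. This is only a sketch of the heuristic as the text states it, not the balancer's actual implementation. The function names are hypothetical, and it assumes that "a difference of 8 chunks" means "at least 8" and that each round moves one chunk from the most loaded shard to the least loaded one.

```javascript
// Thresholds taken from the text; everything else here is illustrative.
const START_THRESHOLD = 8; // chunk difference that makes the balancer kick in
const STOP_THRESHOLD = 2;  // chunk difference at which it stops

// Difference between the most and least loaded shards.
function chunkSpread(counts) {
  const values = Object.values(counts);
  return Math.max(...values) - Math.min(...values);
}

// Moves one chunk per round from the most loaded to the least loaded shard.
// Mutates 'counts' and returns the number of rounds performed.
function simulateBalancing(counts) {
  if (chunkSpread(counts) < START_THRESHOLD) return 0; // not uneven enough
  let rounds = 0;
  while (chunkSpread(counts) > STOP_THRESHOLD) {
    const shards = Object.keys(counts);
    const donor = shards.reduce((a, b) => (counts[a] >= counts[b] ? a : b));
    const recipient = shards.reduce((a, b) => (counts[a] <= counts[b] ? a : b));
    counts[donor] -= 1;
    counts[recipient] += 1;
    rounds += 1;
  }
  return rounds;
}

const counts = { shardA: 12, shardB: 2 };
const rounds = simulateBalancing(counts);
console.log(rounds, JSON.stringify(counts)); // 4 rounds, ending at 8 and 6 chunks
```

With a starting difference below 8 chunks (say 5 versus 2), the function returns 0 and leaves the counts untouched.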

A common source of questions is why a given collection is not being balanced. By far, the most probable cause is: it doesn't need to be. If the chunk difference is small enough, redistributing chunks won't matter enough. The implicit assumption here is that you actually have a large enough collection and that the overhead of the balancing machinery is small compared to the amount of data your app is handling. If you do the math, you'll find that you might not hit the "balancing threshold" if you're doing an experiment on your laptop.
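
To make "the math" concrete, here is a rough back-of-the-envelope calculation. It assumes the steady-state chunk sizes (100-200 MB) and the 8-chunk threshold quoted earlier; chunk sizes in a small test collection may well differ, so treat the result as an order-of-magnitude estimate only. The function name is hypothetical.

```javascript
// Figures quoted in the text above.
const THRESHOLD_CHUNKS = 8;
const CHUNK_SIZE_LOW_MB = 100;
const CHUNK_SIZE_HIGH_MB = 200;

// Approximate surplus of data (in MB) one shard must hold over another
// before the chunk-count difference reaches the balancing threshold.
function imbalanceNeededMB(chunkSizeMB) {
  return THRESHOLD_CHUNKS * chunkSizeMB;
}

console.log(imbalanceNeededMB(CHUNK_SIZE_LOW_MB), imbalanceNeededMB(CHUNK_SIZE_HIGH_MB)); // 800 1600
```

In other words, under these assumptions one shard has to be ahead by roughly 0.8 to 1.6 GB of data before balancing starts, which a quick laptop experiment often never reaches.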

Another possibility is that the balancer is not making progress. The balancing task runs on an arbitrary mongos (query router) in your cluster. Since there can be several query routers, there is a mechanism they all use to decide which mongos will take the responsibility. The mongos acting as balancer takes a "lock" by inserting a document into the 'locks' collection of the config database. When a mongos is running the balancer, the 'state' of that lock is 1 (taken).
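
The lock check described above can be sketched as a small function that interprets a lock document. Only the 'state' field and its value 1 (taken) come from the text. The helper name and the '_id' value "balancer" in the sample document are assumptions made for illustration.

```javascript
// Hypothetical helper: interprets a document from the 'locks' collection of
// the config database as the text describes it (state 1 means taken).
function describeBalancerLock(lockDoc) {
  if (!lockDoc) return "no balancer lock document found";
  return lockDoc.state === 1 ? "taken" : "not taken";
}

// Sample document; the '_id' value is an assumption, only 'state' is from the text.
const sampleLock = { _id: "balancer", state: 1 };
console.log(describeBalancerLock(sampleLock)); // prints "taken"
```

In a shell session, such a document could presumably be fetched with something like db.getSiblingDB("config").locks.findOne({ _id: "balancer" }), assuming the lock is stored under that '_id'.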

http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingAdministration-Balancing
