MongoDB paddingFactor implications

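The MongoDB padding factor is really an optimization point: when a workload is update-heavy, pre-populating the fields that will be added later avoids the document moves that frequent updates would otherwise cause. Original article: http://tebros.com/2010/11/mongodb-paddingfactor-implications/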

MongoDB has a feature that allows you to do an atomic upsert on a document. An upsert can do many things: increment a counter, add new document fields, remove fields, and so on. These operations are useful to many applications, but they can have performance and storage-size side effects that should be considered when designing an application.

If you modify a document in a way that grows its size, there is a chance that MongoDB will have to move the document to some other place in the data file. The move itself has overhead, and Mongo may also decide to grow your data files to accommodate it. To account for this, Mongo maintains a heuristic on how often documents in each collection grow, and pads some extra space around each document in the data file so that it can grow a bit without having to be moved. This heuristic is called the padding factor (or paddingFactor). You can see it in the stats for a collection.

> db.no_padding.stats()
{
        ...
        "paddingFactor": 1.4099999999940787,
        ...
}

A paddingFactor of 1 means that mongod has not had to move any of your documents around in the data files.
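
As a rough illustration of what the number means, the sketch below assumes that the space reserved for a newly written document is approximately its size multiplied by the collection's current paddingFactor. The helper name estimateRecordSize and the example sizes are hypothetical, and the real allocator may round differently.

```javascript
// Hypothetical helper: estimate the bytes reserved for a document,
// ASSUMING reserved size = document size * paddingFactor.
function estimateRecordSize(docSizeBytes, paddingFactor) {
    return Math.ceil(docSizeBytes * paddingFactor);
}

// With a paddingFactor of 1 there is no slack at all.
var noSlack = estimateRecordSize(100, 1);

// With a paddingFactor of 1.5, a 100-byte document would get
// 50 spare bytes to grow into before a move is needed.
var withSlack = estimateRecordSize(100, 1.5);
```

Under that assumption, a higher paddingFactor trades storage for fewer moves.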

Here’s a simple little test to see how much overhead can be involved with moves when I add new fields to a document.  I create two collections: one where the upsert adds new fields to a document and the other creates the documents with all the fields but initializes the counter fields to 0 so the upsert operation will modify the document in place.

var d = db.getSisterDB("padding_test");
d.no_padding.drop();
d.padding.drop();
 
var no_padding_f = function(count) {
    var start = Date.now();
    for(var i=0; i < count; i++) {
        // Document created with only the _id field
        d.no_padding.insert({_id:i});
        d.no_padding.update({_id:i}, {$inc : {"counter1": 1}}, true);
        d.no_padding.update({_id:i}, {$inc : {"counter2": 1}}, true);
        d.no_padding.update({_id:i}, {$inc : {"counter3": 1}}, true);
    }
    // Elapsed time in seconds; declared with var to avoid leaking a global.
    var t = (Date.now() - start)/1000;
    print("no_padding_f runtime: " + t);
    return t;
}
 
var padding_f = function(count) {
    var start = Date.now();
    for(var i=0; i < count; i++) {
        // Document created with all the counter fields I
        // expect to use, each initialized to 0.
        d.padding.insert({_id:i, counter1: 0, counter2: 0, counter3: 0});
        d.padding.update({_id:i}, {$inc : {"counter1": 1}}, true);
        d.padding.update({_id:i}, {$inc : {"counter2": 1}}, true);
        d.padding.update({_id:i}, {$inc : {"counter3": 1}}, true);
    }
    // Elapsed time in seconds; declared with var to avoid leaking a global.
    var t = (Date.now() - start)/1000;
    print("padding_f runtime: " + t);
    return t;
}
 
var t1 = no_padding_f(200000);
var t2 = padding_f(200000);
var faster = (1-(t2/t1))*100;
print("Padded is " + faster + "% faster\n");
 
print("storageSize with no padding  : " + d.no_padding.stats().storageSize);
print("paddingFactor with no padding: " + d.no_padding.stats().paddingFactor);
print("storageSize with padding     : " + d.padding.stats().storageSize);
print("paddingFactor with padding   : " + d.padding.stats().paddingFactor);

Here are the results when I run this on my local machine:

{nehresma@frodo:/tmp/mongodb-linux-x86_64-1.7.3/bin}$ ./mongo --quiet /tmp/script.js
no_padding_f runtime: 56.031
padding_f runtime: 42.165
Padded is 24.747015045242815% faster
 
storageSize with no padding  : 27136256
paddingFactor with no padding: 1.4099999999940787
storageSize with padding     : 17614336
paddingFactor with padding   : 1
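
To make the comparison concrete, the arithmetic behind these figures can be reproduced as follows. The numbers are copied from the run above; the variable names are only illustrative.

```javascript
// Figures copied from the run above.
var tNoPadding = 56.031, tPadding = 42.165;
var sizeNoPadding = 27136256, sizePadding = 17614336;

// Same formula as the script: share of runtime saved by padding.
var fasterPct = (1 - tPadding / tNoPadding) * 100;

// Extra bytes used by the unpadded collection.
var extraBytes = sizeNoPadding - sizePadding;

// The same difference expressed against each baseline.
var extraVsPadded = extraBytes / sizePadding * 100;     // relative to padded size
var extraVsUnpadded = extraBytes / sizeNoPadding * 100; // relative to unpadded size
```

The storage difference comes to roughly 54% when measured against the padded collection and roughly 35% when measured against the unpadded one, so it matters which baseline a percentage is quoted against.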

Things to note about this simple demonstration:

  1. The moves can amount to a fair bit of additional storage space.  In this example the unpadded collection was about 9.5 MB larger, so roughly 35% of its storage was overhead (about 54% more than the padded collection).  Each situation varies and this number bounces around.  This extra storage can be reclaimed if you run a --repair on the database, since a repair will compact the collection.
  2. The data set was sufficiently small and was kept in RAM by mongod.  This means that the timings of the two runs did not account for the additional overhead that may be needed for paging in the data files (a.k.a. disk IO). When disk IO is taken into account, moves become even more expensive.
  3. Be aware that $inc can change the BSON data type of a field from a 32-bit to a 64-bit integer if you increment past 2^31; see http://jira.mongodb.org/browse/SERVER-2005.
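
For the third point, a small check like the one below can flag when an increment would leave the 32-bit range. It is only a sketch: it assumes the field is currently stored as a 32-bit integer, and the function name is hypothetical rather than part of any MongoDB API.

```javascript
// Bounds of a signed 32-bit integer.
var INT32_MAX = Math.pow(2, 31) - 1; // 2147483647
var INT32_MIN = -Math.pow(2, 31);    // -2147483648

// Hypothetical check: would adding `delta` to a counter that is
// ASSUMED to be stored as a 32-bit integer take it out of range?
function incLeavesInt32Range(current, delta) {
    var next = current + delta;
    return next > INT32_MAX || next < INT32_MIN;
}

var safe = incLeavesInt32Range(0, 1);                // stays in range
var overflows = incLeavesInt32Range(INT32_MAX, 1);   // crosses 2^31
```

A wider integer takes more bytes, so a type change like this is also a way for a document to grow.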

Planning how your application will do upserts and then adding space to your documents accordingly when doing the initial insert can give a nice performance boost.  If your application is upsert heavy, you should consider this.
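
One way to apply this, sketched under the assumption that the counter names are known ahead of time, is to build the initial document with every counter already set to 0. The helper name zeroedCounters is hypothetical.

```javascript
// Hypothetical helper: build a document whose counter fields all exist
// up front, so later $inc updates can modify them in place.
function zeroedCounters(id, counterNames) {
    var doc = {_id: id};
    for (var i = 0; i < counterNames.length; i++) {
        doc[counterNames[i]] = 0;
    }
    return doc;
}

var doc = zeroedCounters(7, ["counter1", "counter2", "counter3"]);

// In the mongo shell the result could then be inserted as in the test
// script above, for example:
//   d.padding.insert(doc);
```

This keeps the list of expected fields in one place, which makes it easier to keep the insert and the later upserts in sync.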
