Optimizing Map/Reduce with MongoDB

I’ve come across several users who experience poor performance when using Map/Reduce with MongoDB version 1.8 and older, and it turns out that in many cases it is easily fixable. Today I will focus on the “sort” parameter of the MapReduce command, which is often overlooked but critical.

Here is how M/R works in the general case, assuming there is no query filter:

  • mongod does a full table scan in natural order, going through every document in the collection.
  • For each document, map() is called and emits a document like {_id: key, value: val}, which is stored in an in-memory map (tree).
  • Every 100 records, mongod checks whether the size of the map is over 50KB; if it is, it runs reduce on ALL current keys. If the size of the map is still over 100KB after that, it dumps all current documents to disk in an “incremental” (“inc”) collection.
  • When all mapping is done, it reads the documents back from the inc collection sorted by _id and does the final reduce.
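
The steps above can be sketched as a toy model in Python. This is only an illustration of the flow as described here, not mongod's actual code: the name run_map_reduce is made up, sizes are counted in emitted values rather than bytes, and the two limits are stand-ins for the 50KB and 100KB checks.

```python
from collections import defaultdict

CHECK_EVERY = 100    # documents between size checks, as described above
REDUCE_LIMIT = 50    # toy stand-in for the 50KB check (counted in values, not bytes)
SPILL_LIMIT = 100    # toy stand-in for the 100KB check (counted in values, not bytes)


def run_map_reduce(docs, map_fn, reduce_fn):
    """Toy model of the flow above. Returns (results, values spilled while mapping)."""
    in_memory = defaultdict(list)   # key -> emitted values (the in-memory map)
    inc = []                        # stand-in for the on-disk "inc" collection
    spilled_during_map = 0

    def size():
        return sum(len(vals) for vals in in_memory.values())

    def reduce_all():
        # Collapse every key that currently holds more than one value.
        for key, vals in list(in_memory.items()):
            if len(vals) > 1:
                in_memory[key] = [reduce_fn(key, vals)]

    def dump_to_inc():
        inc.extend((k, v) for k, vals in in_memory.items() for v in vals)
        in_memory.clear()

    for n, doc in enumerate(docs, start=1):
        for key, value in map_fn(doc):
            in_memory[key].append(value)
        if n % CHECK_EVERY == 0 and size() > REDUCE_LIMIT:
            reduce_all()
            if size() > SPILL_LIMIT:        # reduce did not shrink the map enough
                spilled_during_map += size()
                dump_to_inc()

    # Mapping is done: flush the rest, read back sorted by key, final reduce.
    dump_to_inc()
    inc.sort(key=lambda kv: kv[0])
    grouped = defaultdict(list)
    for key, value in inc:
        grouped[key].append(value)
    results = {k: vals[0] if len(vals) == 1 else reduce_fn(k, vals)
               for k, vals in grouped.items()}
    return results, spilled_during_map


# Hypothetical input: 1000 documents spread over only 7 emit keys.
docs = [{"user": i % 7} for i in range(1000)]
totals, spilled = run_map_reduce(
    docs,
    lambda doc: [(doc["user"], 1)],      # map: emit (key, 1)
    lambda key, vals: sum(vals),         # reduce: sum the counts
)
```

With only a few distinct keys, each intermediate reduce shrinks the map well below the spill limit, so nothing is dumped during mapping. Give every document its own key and the reduce passes stop helping, which is the situation the next paragraph describes.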

Now if you have many documents and the key distribution is fairly random, the following can happen: all documents get inserted into the map, but the intermediate reduce passes accomplish little because few entries in the map share a key. Most documents then end up in the “inc” collection on disk, which has to be read back in sorted order. The particular issue to understand is that since mongod has no idea what key you will emit, it cannot presort the data to make this efficient.
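
To get a rough feel for why ordering matters, the sketch below counts how many distinct keys show up in each window of 100 emitted keys. After a reduce pass each distinct key collapses to a single entry, so this is roughly what the in-memory map still holds for that window. The data set (10,000 documents over 1,000 keys) and the function name are invented for illustration, and the model ignores entries carried over between windows.

```python
import random


def entries_left_after_reduce(keys, window=100):
    """Count the distinct keys in each consecutive window of emitted keys."""
    return [len(set(keys[i:i + window])) for i in range(0, len(keys), window)]


# Hypothetical data: 10,000 documents, 1,000 emit keys, 10 documents per key.
keys = [i % 1000 for i in range(10_000)]
random.Random(0).shuffle(keys)      # natural order with a random key distribution

unsorted_left = entries_left_after_reduce(keys)          # input read in natural order
sorted_left = entries_left_after_reduce(sorted(keys))    # input read sorted by emit key

print("unsorted: smallest window after reduce =", min(unsorted_left))
print("sorted:   largest window after reduce  =", max(sorted_left))
```

In the sorted case all documents for a key arrive together, so each window of 100 collapses to 10 entries. In the shuffled case almost every emitted key in a window is different, so the reduce pass removes very little.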

To fix this issue:

  • Add an input sort key for the M/R job that is the same as the emit key.
  • Make sure that key is indexed and works well with your query filter. Run a find() with the same query and sort, call explain() on it, and make sure it uses an index.
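
As a minimal sketch of the first point, the Python below builds a mapReduce command document whose sort field matches the emit key. The collection name events, the fields user_id and type, the output collection mr_results and the helper build_map_reduce_command are all hypothetical. The code only builds the document; with a driver such as PyMongo it could presumably be sent with db.command(command), but nothing here talks to a server.

```python
def build_map_reduce_command(collection, map_js, reduce_js, emit_key,
                             query=None, out="mr_results"):
    """Build a mapReduce command whose input sort matches the emit key."""
    command = {
        "mapReduce": collection,    # the command name has to be the first key
        "map": map_js,
        "reduce": reduce_js,
        "sort": {emit_key: 1},      # input sort on the same field that map() emits
        "out": out,
    }
    if query:
        command["query"] = query    # optional filter; should work with the same index
    return command


# Hypothetical map/reduce functions: count documents per user_id.
MAP_JS = "function () { emit(this.user_id, 1); }"
REDUCE_JS = """
function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) { total += values[i]; }
    return total;
}
"""

command = build_map_reduce_command(
    "events", MAP_JS, REDUCE_JS,
    emit_key="user_id",
    query={"type": "click"},
)
```

For the second point, the equivalent check in the mongo shell would be something like db.events.find({type: "click"}).sort({user_id: 1}).explain(), again using the made-up names from the sketch.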

This can make the job up to 100x faster in some cases. Note that in MongoDB 1.9 and above, some work has been done to improve performance:

  • The thresholds for running reduce or dumping to disk have been increased.
  • There is a new “pure JS” mode that can be very fast for light jobs.
  • The JS engine interface has been optimized.

But in any case mongod is still not aware of your emit key, so use sort!

cheers

AG
