MongoDB Performance for Data Bigger than Memory

Source: http://www.colinhowe.co.uk/2011/02/23/mongodb-performance-for-data-bigger-than-memor/

 

A couple of points still to confirm with the author: was the test environment a direct gigabit connection, and did it use plain disks or RAID?

 

(The test environment was a direct gigabit connection, using plain disks.)

 

For comparison, see also this MongoDB performance test on RAID 10: http://www.cnblogs.com/lovecindywang/archive/2011/03/02/1969324.html

 

I've recently been having a play around with MongoDB and it's really cool. One of the common messages I see all over is that you should only use it if your dataset fits into memory. I've not yet seen any benchmarks on what happens when it doesn't though. So, here is a benchmark on how MongoDB performs when the data is bigger than the amount of memory available.

Setup

Mongo server: An EC2 large instance (64-bit) running an Ubuntu 10.10 image from Alestic, with 7.5 GB of memory. The data folder was on instance storage, not EBS.

Mongo client: An EC2 small instance.

Test

The test will involve inserting X documents into MongoDB with the following structure:
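
As a rough sketch, each document would look something like the following, assuming a single key field plus a padding field to give the document some bulk (the field names and padding size here are illustrative, not taken from the original benchmark):

    doc = {
        "key": 123456,         # random integer; this is the field that gets indexed
        "padding": "x" * 100,  # filler so each document has a realistic size
    }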



There will be an index on key to prevent full scans of the data.

After the insert there will be 30,000 gets with random keys.
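
A minimal sketch of this test loop using pymongo is below. The host, database, and collection names are placeholders, and the document shape follows the assumption above; the real benchmark script is linked at the end of the post.

    import random
    import time

    from pymongo import MongoClient  # modern pymongo entry point; the 2011 original used the older Connection API

    NUM_DOCS = 3 * 1000 * 1000   # "X" in the description above
    NUM_READS = 30000
    BATCH = 1000

    client = MongoClient("mongodb://mongo-server:27017/")  # placeholder hostname
    coll = client.benchmark.documents                      # placeholder database/collection names

    # Index the key field so the random gets don't need a full collection scan.
    coll.create_index("key")

    # Insert X documents in batches, each with a distinct key and some padding.
    for start in range(0, NUM_DOCS, BATCH):
        coll.insert_many(
            [{"key": k, "padding": "x" * 100} for k in range(start, start + BATCH)]
        )

    # Time 30,000 gets with random keys.
    started = time.time()
    for _ in range(NUM_READS):
        coll.find_one({"key": random.randint(0, NUM_DOCS - 1)})
    print("%d reads took %.1fs" % (NUM_READS, time.time() - started))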

The expectation is that when the dataset gets too large to fit in memory, the random gets will become very slow. This will be due to MongoDB's memory-mapped files no longer fitting in memory and needing to be read from disk.
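
One rough way to watch this on the server is to compare the amount of data MongoDB has memory-mapped with its resident memory, via the serverStatus command. A sketch with pymongo (the "mapped" figure is only reported by the memory-mapped storage engine in use at the time of this benchmark):

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongo-server:27017/")  # placeholder hostname
    mem = client.admin.command("serverStatus")["mem"]
    # "resident" is the server's resident set size in MB; "mapped" (when present)
    # is the total size of the memory-mapped data files in MB.
    print("resident: %s MB, mapped: %s MB" % (mem.get("resident"), mem.get("mapped")))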

When this thrashing of the disk starts happening, it will be interesting to see what happens when only a subset of the dataset is read from. To investigate this, a further test will be run that:

  • 99% of the time - reads from a random key chosen from only Y% of the keys
  • 1% of the time - reads from a random key chosen from the entire dataset

The expectation here is that for small Y the performance will be similar to when the entire dataset is in memory - as the pages that contain the subset of data will already be in memory and will not need to be read from disk.
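
A sketch of how such a skewed read pattern might be generated, assuming the keys 0 to X-1 were inserted in order so that the "focus" subset is simply the first Y% of keys (the function name and parameters are illustrative):

    import random

    def pick_key(num_docs, focus_fraction):
        """99% of the time return a key from the first focus_fraction of the
        keys; 1% of the time return a key from anywhere in the dataset."""
        if random.random() < 0.99:
            return random.randint(0, int(num_docs * focus_fraction) - 1)
        return random.randint(0, num_docs - 1)

    # e.g. focus 99% of the reads on the first 10% of a 10 million key dataset
    key = pick_key(10 * 1000 * 1000, 0.10)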

Results

Basic results

A result spreadsheet is available here (Google Doc).

Up to 3 million documents, the reads were consistently around 17s for 30,000 reads:

      (Table: keys vs. average time for 30,000 reads (s) vs. memory usage (MB); see the spreadsheet linked above for the figures.)


Once the dataset got larger than the amount of memory available, the read time slowed considerably. It wasn't as slow as it could be in more extreme cases, as roughly half of the dataset would still have been in memory.

It's worth noting that at this point inserts started getting slow: 178s for 3 million documents vs 1,102s for 10 million documents (~17k inserts/sec vs ~9k inserts/sec).

What about when reading a subset more often?


"Focus" in these results (the full figures are in the linked spreadsheet) refers to the percentage of the dataset that was chosen for 99% of the reads. In this case it was the first Y% of rows to be inserted, meaning that those pages had likely dropped out of memory by the time we wanted to read them.

The results show that MongoDB will perform just as fast on a dataset that is too large for memory if a small subset of the data is read from more frequently than the rest.

It was interesting to see the 10% figure drop over time. I suspect that this figure will get closer to 18s as the number of reads increases - more and more of the pages will be cached by the operating system and not need to be read from disk.

Conclusions

This experiment shows that the performance of MongoDB can drop by an order of magnitude when the dataset gets too big for memory. However, if the reads are clustered in a subset of the dataset, then a large amount of that data can be kept in cache and reads kept quick.

It's definitely worth noting that it's normal for the performance to drop by an order of magnitude when the database has to start hitting disk. The point of this experiment was to make sure that it was only one order of magnitude and that if reads were focussed the performance would stay high.

Code

The code for the benchmark (for improvements and your own testing) is on GitHub: http://github.com/colinhowe/mongo-benchmarks/blob/master/bench.py
