MongoDB Performance for Data Bigger than Memory

Source: http://www.colinhowe.co.uk/2011/02/23/mongodb-performance-for-data-bigger-than-memor/

 

A couple of points still to confirm with the author: was the test environment a direct gigabit connection, and did it use plain disks or RAID?

 

(The test environment was a direct gigabit connection, using plain disks.)

 

For comparison, see also this MongoDB performance test on RAID 10: http://www.cnblogs.com/lovecindywang/archive/2011/03/02/1969324.html

 

I've recently been having a play around with MongoDB and it's really cool. One of the common messages I see all over is that you should only use it if your dataset fits into memory. I've not yet seen any benchmarks on what happens when it doesn't though. So, here is a benchmark on how MongoDB performs when the data is bigger than the amount of memory available.

Setup

Mongo server: An EC2 large instance (64-bit) running an Ubuntu 10.10 image from Alestic, with 7.5 GB of memory. The data folder was on instance storage, not EBS.

Mongo client: An EC2 small instance.

Test

The test will involve inserting X documents into MongoDB with the following structure:
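
As a rough sketch, each document would look something like the following, assuming a single key field plus a padding field to give the document some bulk (the field names and padding size here are illustrative, not taken from the original benchmark):

    doc = {
        "key": 123456,         # random integer; this is the field that gets indexed
        "padding": "x" * 100,  # filler so each document has a realistic size
    }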



There will be an index on key to prevent full scans of the data.

After the insert there will be 30,000 gets with random keys.
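
A minimal sketch of this test loop using pymongo is below. The host, database, and collection names are placeholders, and the document shape follows the assumption above; the real benchmark script is linked at the end of the post.

    import random
    import time

    from pymongo import MongoClient  # modern pymongo entry point; the 2011 original used the older Connection API

    NUM_DOCS = 3 * 1000 * 1000   # "X" in the description above
    NUM_READS = 30000
    BATCH = 1000

    client = MongoClient("mongodb://mongo-server:27017/")  # placeholder hostname
    coll = client.benchmark.documents                      # placeholder database/collection names

    # Index the key field so the random gets don't need a full collection scan.
    coll.create_index("key")

    # Insert X documents in batches, each with a distinct key and some padding.
    for start in range(0, NUM_DOCS, BATCH):
        coll.insert_many(
            [{"key": k, "padding": "x" * 100} for k in range(start, start + BATCH)]
        )

    # Time 30,000 gets with random keys.
    started = time.time()
    for _ in range(NUM_READS):
        coll.find_one({"key": random.randint(0, NUM_DOCS - 1)})
    print("%d reads took %.1fs" % (NUM_READS, time.time() - started))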

The expectation is that when the dataset gets too large to fit in memory, the random gets will become very slow. This will be due to MongoDB's memory-mapped files no longer fitting in memory and needing to be read from disk.
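
One rough way to watch this on the server is to compare the amount of data MongoDB has memory-mapped with its resident memory, via the serverStatus command. A sketch with pymongo (the "mapped" figure is only reported by the memory-mapped storage engine in use at the time of this benchmark):

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongo-server:27017/")  # placeholder hostname
    mem = client.admin.command("serverStatus")["mem"]
    # "resident" is the server's resident set size in MB; "mapped" (when present)
    # is the total size of the memory-mapped data files in MB.
    print("resident: %s MB, mapped: %s MB" % (mem.get("resident"), mem.get("mapped")))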

When this thrashing of the disk starts happening, it will be interesting to see what happens when only a subset of the dataset is read from. To investigate this, a further test will be run that:

  • 99% of the time - reads from a random key chosen from only Y% of the keys
  • 1% of the time - reads from a random key chosen from the entire dataset

The expectation here is that for small Y the performance will be similar to when the entire dataset is in memory - as the pages that contain the subset of data will already be in memory and will not need to be read from disk.
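
A sketch of how such a skewed read pattern might be generated, assuming the keys 0 to X-1 were inserted in order so that the "focus" subset is simply the first Y% of keys (the function name and parameters are illustrative):

    import random

    def pick_key(num_docs, focus_fraction):
        """99% of the time return a key from the first focus_fraction of the
        keys; 1% of the time return a key from anywhere in the dataset."""
        if random.random() < 0.99:
            return random.randint(0, int(num_docs * focus_fraction) - 1)
        return random.randint(0, num_docs - 1)

    # e.g. focus 99% of the reads on the first 10% of a 10 million key dataset
    key = pick_key(10 * 1000 * 1000, 0.10)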

Results

Basic results

A result spreadsheet is available here (Google Doc).

Up to 3 million documents, the reads were consistently around 17s for 30,000 reads:

      (Table: keys vs. average time for 30,000 reads (s) vs. memory usage (MB); see the spreadsheet linked above for the figures.)


Once the dataset got larger than the amount of memory available, the read time slowed considerably. It wasn't as slow as it could be in more extreme cases, as roughly half of the dataset would still have been in memory.

It's worth noting that at this point inserts started getting slow: 178s for 3 million documents vs 1,102s for 10 million documents (~17k inserts/sec vs ~9k inserts/sec).

What about when reading a subset more often?


"Focus" in these results (the full figures are in the linked spreadsheet) refers to the percentage of the dataset that was chosen for 99% of the reads. In this case it was the first Y% of rows to be inserted, meaning that those pages had likely dropped out of memory by the time we wanted to read them.

The results show that MongoDB will perform just as fast on a dataset that is too large for memory if a small subset of the data is read from more frequently than the rest.

It was interesting to see the 10% figure drop over time. I suspect that this figure will get closer to 18s as the number of reads increases - more and more of the pages will be cached by the operating system and not need to be read from disk.

Conclusions

This experiment shows that the performance of MongoDB can drop by an order of magnitude when the dataset gets too big for memory. However, if the reads are clustered in a subset of the dataset, then a large amount of that data can be kept in cache and reads kept quick.

It's definitely worth noting that it's normal for the performance to drop by an order of magnitude when the database has to start hitting disk. The point of this experiment was to make sure that it was only one order of magnitude and that if reads were focussed the performance would stay high.

Code

The code for the benchmark (for improvements and your own testing) is on GitHub: http://github.com/colinhowe/mongo-benchmarks/blob/master/bench.py
