MongoDB + Python: very slow simple query

I have an open source energy monitor (http://openenergymonitor.org) which logs the power usage of my house every five seconds, so I thought this would be a perfect application to play with MongoDB. I have a Flask Python application running in Apache using MongoEngine to interface with MongoDB.

Now I am running all of this on a Raspberry Pi, so I'm not expecting incredible performance, but a simple query is taking around 20 seconds, which seems slow even for this limited hardware.

I have the following model:

import datetime

class Reading(db.Document):
    created_at = db.DateTimeField(default=datetime.datetime.now, required=True)
    created_at_year = db.IntField(default=datetime.datetime.now().year, required=True)
    created_at_month = db.IntField(default=datetime.datetime.now().month, required=True)
    created_at_day = db.IntField(default=datetime.datetime.now().day, required=True)
    created_at_hour = db.IntField(default=datetime.datetime.now().hour, required=True)
    battery = db.IntField()
    power = db.IntField()

    meta = {
        'indexes': ['created_at_year', 'created_at_month', 'created_at_day', 'created_at_hour']
    }
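
One subtlety in this model, separate from the performance question: defaults like datetime.datetime.now().year are evaluated once, when the class is defined, not once per document, so a long-running process would stamp stale values. A sketch of the same fields with callable defaults, which MongoEngine evaluates at save time (as the model already does for created_at):

created_at_year = db.IntField(default=lambda: datetime.datetime.now().year, required=True)
created_at_month = db.IntField(default=lambda: datetime.datetime.now().month, required=True)
created_at_day = db.IntField(default=lambda: datetime.datetime.now().day, required=True)
created_at_hour = db.IntField(default=lambda: datetime.datetime.now().hour, required=True)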

I currently have around 36,000 readings stored from the last couple of days. The following code runs super quick:

def get_readings_count():
    count = 'Count: %d' % Reading.objects.count()
    return count

def get_last_24_readings_as_json():
    readings = Reading.objects.order_by('-id')[:24]
    result = "["
    for reading in reversed(readings):
        result += str(reading.power) + ","
    result = result[:-1]
    result += "]"
    return result
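
As an aside, the manual bracket-and-comma concatenation above breaks on an empty result set (result[:-1] would strip the opening bracket); a sketch of the same function using the standard json module instead:

import json

def get_last_24_readings_as_json():
    # 24 most recent readings, re-ordered oldest-first, serialised as a JSON array
    readings = Reading.objects.order_by('-id')[:24]
    return json.dumps([reading.power for reading in reversed(readings)])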

But doing a simple filter:

def get_today_readings_count():
    todaycount = 'Today: %d' % Reading.objects(created_at_year=2014, created_at_month=1, created_at_day=28).count()
    return todaycount

Takes around 20 seconds - there are around 11,000 readings for today.

Shall I give up expecting anything more of my Pi, or is there some tuning I can do to get more performance from MongoDB?

Mongo 2.1.1 on Debian Wheezy

Update 29/1/2014:

In response to an answer below, here are the results of getIndexes() and explain():

> db.reading.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "created_at_year" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "created_at_year_1",
        "background" : false,
        "dropDups" : false
    },
    {
        "v" : 1,
        "key" : {
            "created_at_month" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "created_at_month_1",
        "background" : false,
        "dropDups" : false
    },
    {
        "v" : 1,
        "key" : {
            "created_at_day" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "created_at_day_1",
        "background" : false,
        "dropDups" : false
    },
    {
        "v" : 1,
        "key" : {
            "created_at_hour" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "created_at_hour_1",
        "background" : false,
        "dropDups" : false
    }
]

> db.reading.find({created_at_year: 2014, created_at_month: 1, created_at_day: 28 }).explain()
{
    "cursor" : "BtreeCursor created_at_day_1",
    "isMultiKey" : false,
    "n" : 15689,
    "nscannedObjects" : 15994,
    "nscanned" : 15994,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 5,
    "nChunkSkips" : 0,
    "millis" : 25511,
    "indexBounds" : {
        "created_at_day" : [
            [
                28,
                28
            ]
        ]
    },
    "server" : "raspberrypi:27017"
}

Update 4/2/2014:

Okay, so I deleted the indexes, set a new one on created_at, deleted all the records and left it a day to collect new data. I've just run a query for today's data and it took longer (48 seconds):

> db.reading.find({'created_at': {'$gte':ISODate("2014-02-04")}}).explain()
{
    "cursor" : "BtreeCursor created_at_1",
    "isMultiKey" : false,
    "n" : 14189,
    "nscannedObjects" : 14189,
    "nscanned" : 14189,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 9,
    "nChunkSkips" : 0,
    "millis" : 48653,
    "indexBounds" : {
        "created_at" : [
            [
                ISODate("2014-02-04T00:00:00Z"),
                ISODate("292278995-12-2147483314T07:12:56.808Z")
            ]
        ]
    },
    "server" : "raspberrypi:27017"
}

That's with only 16,177 records in the database and only one index. There's around 111MB of free memory, so there shouldn't be an issue with the index fitting in memory. I guess I'm going to have to write this off as the Pi not being powerful enough for this job.
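
For reference, the same day's readings expressed through MongoEngine against the Reading model above; a sketch, bounding the range on both ends rather than using the open-ended $gte from the shell query:

from datetime import datetime, timedelta

day_start = datetime(2014, 2, 4)
day_end = day_start + timedelta(days=1)
# Half-open interval: from midnight up to, but excluding, the next midnight
count = Reading.objects(created_at__gte=day_start, created_at__lt=day_end).count()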

Solution

Are you sure that your index is created? Could you provide the output of getIndexes() on your collection, e.g.:

db.my_collection.getIndexes()

and the explain plan of your query:

db.my_collection.find({created_at_year: 2014, created_at_month: 1, created_at_day: 28 }).explain()

P.S. Of course, I have to agree with @Aesthete that you are storing much more than you need to...

29/1/2014 update

Perfect! As you can see, you have four separate indexes, when you could create ONE compound index that includes all of them.

Defining

db.my_collection.ensureIndex({created_at_year: 1, created_at_month: 1, created_at_day: 1, created_at_hour: 1 })

will give you a single, more precise index that lets you query on:

year

year and month

year and month and day

year and month and day and hour

This will make your queries (with the four keys) much faster, because all your criteria are met in the index data!

Please note that the order of keys in ensureIndex() is crucial; that order is exactly what defines the list of supported queries above!
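
Since the question uses MongoEngine, the same compound index can also be declared on the model; a sketch, using MongoEngine's tuple syntax for compound indexes in meta:

meta = {
    # One compound index replacing the four single-field ones
    'indexes': [
        ('created_at_year', 'created_at_month', 'created_at_day', 'created_at_hour'),
    ]
}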

Also note that if all you need is these four fields, then a correct projection, e.g.:

db.my_collection.find({created_at_year: 2014, created_at_month: 1, created_at_day: 28}, {_id: 0, created_at_year: 1, created_at_month: 1, created_at_day: 1})

means only the index is used, which is the maximum performance! (Note that _id has to be excluded explicitly here: it is returned by default but is not part of the compound index, so the query is only covered without it.)
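
To check that the query really is covered, one option is pymongo's cursor explain; a sketch, assuming a local mongod and the sensor_network.reading namespace from the getIndexes() output above (on this 2.x server, a covered query reports "indexOnly" : true):

from pymongo import MongoClient

client = MongoClient()  # assumption: local mongod on the default port
readings = client.sensor_network.reading
plan = readings.find(
    {'created_at_year': 2014, 'created_at_month': 1, 'created_at_day': 28},
    {'_id': 0, 'created_at_year': 1, 'created_at_month': 1, 'created_at_day': 1},
).explain()
print(plan.get('indexOnly'))  # True when answered from the index alone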
