Python json memory bloat: excessive memory use after parsing JSON

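import json
import time
from itertools import count

def keygen(size):
    # Yield zero-padded decimal keys of the given width: '0...01', '0...02', ...
    for i in count(1):
        s = str(i)
        yield '0' * (size - len(s)) + s

def jsontest(num):
    keys = keygen(20)
    # Build a JSON document with num entries, each mapping a 20-character key to a 200-character value
    kvjson = json.dumps(dict((next(keys), '0' * 200) for i in range(num)))
    kvpairs = json.loads(kvjson)
    del kvpairs  # Not required. Just to check if it makes any difference
    print('load completed')

jsontest(500000)

# Keep the process alive so its memory use can be watched in top
while True:
    time.sleep(1)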

Linux top shows that the python process holds ~450 MB of RAM after the 'jsontest' function completes. If the call to 'json.loads' is omitted, the issue is not observed. Calling gc.collect() after the function has run does release the memory.

So it looks like the memory is not being held in a cache or in Python's internal memory allocator, since an explicit call to gc.collect() releases it.

Is this happening because the garbage collection threshold (700, 10, 10) was never reached?

I did put some code after jsontest to simulate reaching the threshold, but it didn't help.

Solution

Put this at the top of your program:

import gc
gc.set_debug(gc.DEBUG_STATS)

and you'll get printed output whenever there's a collection. You'll see that in your example code there is no collection after jsontest completes, until the program exits.
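Why a pending collection matters can be shown with a small sketch. It assumes the memory left behind after json.loads is held by objects in reference cycles, which the question's observation that gc.collect() frees it suggests but does not prove. The function name count_cyclic_garbage is made up for illustration, and the self-referencing list is a stand-in, not the objects json actually creates.

```python
import gc

def count_cyclic_garbage():
    """Create one self-referencing list and report how many unreachable
    objects an explicit collection finds afterwards."""
    was_enabled = gc.isenabled()
    gc.collect()          # start from a clean state
    gc.disable()          # make sure no automatic collection interferes
    try:
        a = []
        a.append(a)       # the list now refers to itself
        del a             # refcount never drops to zero, so it is not freed here
        return gc.collect()   # the cyclic collector finds and frees it
    finally:
        if was_enabled:
            gc.enable()

print(count_cyclic_garbage())
```

Until some collection runs, whether automatic or explicit, objects like this stay allocated, which matches what top shows in the question.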

You can add

print(gc.get_count())

to see the current counts. The first number is the excess of allocations over deallocations since the last collection of generation 0; the second (resp. third) is the number of times generation 0 (resp. 1) has been collected since the last collection of generation 1 (resp. 2). If you print these immediately after jsontest completes you'll see that the counts are (548, 6, 0) or something similar (no doubt this varies according to Python version). With the thresholds at (700, 10, 10), the first count is below 700, so the threshold was not reached and no collection took place.
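The comparison described above can be done in code by reading the counts and the thresholds together. This is only a rough sketch: collection_pending is a hypothetical helper name, and the exact rule the interpreter uses to trigger a collection differs between Python versions, so the "due" flags are just a per-generation "count exceeds threshold" check.

```python
import gc

def collection_pending():
    """Return (counts, thresholds, due), where due[i] says whether
    generation i's count currently exceeds its threshold."""
    counts = gc.get_count()          # e.g. (548, 6, 0)
    thresholds = gc.get_threshold()  # (700, 10, 10) in the question
    due = tuple(c > t for c, t in zip(counts, thresholds))
    return counts, thresholds, due

counts, thresholds, due = collection_pending()
print(counts, thresholds, due)
```

Calling this right after jsontest returns should show every flag as False when the counts are below the thresholds, as in the numbers quoted above.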

This is typical behaviour for threshold-based garbage collection scheduling. If you need free memory to be returned to the operating system in a timely manner, then you need to combine threshold-based scheduling with time-based scheduling (that is, request another collection after a certain amount of time has passed since the last collection, even if the threshold has not been reached).
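One possible way to add time-based scheduling is sketched below. The class TimedCollector and its max_interval and clock parameters are invented for this example and are not part of the gc module. It also only tracks the time since its own last explicit collection, not automatic collections triggered by the thresholds, which is a simplification.

```python
import gc
import time

class TimedCollector(object):
    """Hypothetical helper: request a full collection if at least
    max_interval seconds have passed since the last one it requested."""

    def __init__(self, max_interval=60.0, clock=time.time):
        self.max_interval = max_interval
        self.clock = clock            # injectable so the helper can be tested
        self.last = clock()

    def maybe_collect(self):
        """Run gc.collect() if the interval has elapsed and return its
        result; otherwise return None."""
        now = self.clock()
        if now - self.last >= self.max_interval:
            self.last = now
            return gc.collect()
        return None
```

In the question's program, the idle loop at the end could create a TimedCollector before the loop and call its maybe_collect() after each time.sleep(1). The automatic, threshold-based collections keep running as before; the timer only adds a collection when the process has been quiet for a while.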
