Python list memory overflow: memory overflow when loading with numpy in a loop

Looping over npz files and loading each one causes a memory overflow (depending on the length of the file list).

None of the following seems to help:

Deleting the variable that stores the data loaded from the file.

Using mmap.

Calling gc.collect() (garbage collection).

The following code should reproduce the phenomenon:

import numpy as np

# generate a file for the demo
X = np.random.randn(1000, 1000)
np.savez('tmp.npz', X=X)

# here comes the overflow:
for i in xrange(1000000):
    data = np.load('tmp.npz')
    data.close()  # avoid the "too many files are open" error

In my real application the loop runs over a list of files, and the overflow exceeds 24 GB of RAM!

Please note that this was tried on Ubuntu 11.10, with both numpy v1.5.1 and v1.6.0.

I have filed a report as numpy ticket 2048, but this may be of wider interest, so I am posting it here as well (moreover, I am not sure this is a bug; it may be the result of my bad programming).

SOLUTION (by HYRY):

The command

del data.f

should precede the command

data.close()

For more information, and a method for finding this kind of leak, please read HYRY's kind answer below.

Solution

I think this is a bug, and maybe I found the solution: call "del data.f".

for i in xrange(10000000):
    data = np.load('tmp.npz')
    del data.f
    data.close()  # avoid the "too many files are open" error
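To avoid forgetting either step, the workaround can be wrapped in a small context manager. The following is only a sketch, written for Python 3: `released` is a hypothetical helper name (not part of numpy), and it assumes only that the loaded object has a `close()` method and possibly an `f` attribute, as the NpzFile discussed here does.

```python
from contextlib import contextmanager


@contextmanager
def released(npz):
    """Yield `npz`, then drop its `f` attribute and close it.

    `released` is a hypothetical helper, not a numpy function.
    """
    try:
        yield npz
    finally:
        try:
            del npz.f  # break the NpzFile <-> BagObj reference cycle
        except AttributeError:
            pass  # the object has no `f` attribute; nothing to break
        npz.close()
```

With numpy this could be used as `with released(np.load('tmp.npz')) as data: X = data['X']`, so that the cleanup runs even if the loop body raises an exception.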

To find this kind of memory leak, you can use the following code:

import numpy as np
import gc

# here comes the overflow:
for i in xrange(10000):
    data = np.load('tmp.npz')
    data.close()  # avoid the "too many files are open" error

# count the objects tracked by the garbage collector, grouped by type name
d = dict()
for o in gc.get_objects():
    name = type(o).__name__
    if name not in d:
        d[name] = 1
    else:
        d[name] += 1

# print the counts in ascending order
items = d.items()
items.sort(key=lambda x: x[1])
for key, value in items:
    print key, value

After the test loop, the code builds a dict and counts the objects returned by gc.get_objects(), grouped by type name. Here is the output:

...
wrapper_descriptor 1382
function 2330
tuple 9117
BagObj 10000
NpzFile 10000
list 20288
dict 21001
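The same census can be written more compactly with collections.Counter, and comparing a snapshot taken before the suspect code with one taken after makes the leaking types stand out. This is a sketch for Python 3, not the code from the answer: `count_objects_by_type`, `growth` and `Suspect` are hypothetical names, and the list of retained `Suspect` instances only simulates a leak.

```python
import gc
from collections import Counter


def count_objects_by_type():
    """Count gc-tracked objects by the name of their type."""
    return Counter(type(o).__name__ for o in gc.get_objects())


def growth(before, after, top=10):
    """Return the type names whose instance counts grew most, largest first."""
    # Counter subtraction drops entries whose count did not increase.
    return (after - before).most_common(top)


class Suspect(object):
    """Hypothetical stand-in for a class whose instances pile up."""


before = count_objects_by_type()
kept = [Suspect() for _ in range(100)]  # simulate 100 objects that never go away
after = count_objects_by_type()

for name, extra in growth(before, after):
    print(name, extra)
```

Comparing two snapshots filters out the thousands of long-lived interpreter objects (functions, tuples, dicts) that dominate a single census.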

The result shows that something is wrong with BagObj and NpzFile: 10000 instances of each are still alive after the loop. Here is the relevant code:

class NpzFile(object):
    def __init__(self, fid, own_fid=False):
        ...
        self.zip = _zip
        self.f = BagObj(self)
        if own_fid:
            self.fid = fid
        else:
            self.fid = None

    def close(self):
        """
        Close the file.
        """
        if self.zip is not None:
            self.zip.close()
            self.zip = None
        if self.fid is not None:
            self.fid.close()
            self.fid = None

    def __del__(self):
        self.close()

class BagObj(object):
    def __init__(self, obj):
        self._obj = obj

    def __getattribute__(self, key):
        try:
            return object.__getattribute__(self, '_obj')[key]
        except KeyError:
            raise AttributeError, key

NpzFile has __del__(), NpzFile.f is a BagObj, and BagObj._obj is the NpzFile. This is a reference cycle, and because NpzFile defines __del__(), the garbage collector treats both the NpzFile and the BagObj as uncollectable. The Python documentation explains this: http://docs.python.org/library/gc.html#gc.garbage

So, to break the reference cycle, you need to call "del data.f".
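The cycle and two ways of breaking it can be reproduced without numpy. The sketch below is written for Python 3 on CPython and uses hypothetical stand-in classes `Owner` and `Bag` (not numpy's code). It turns the cycle collector off so that only reference counting is at work, then checks whether an object is freed the moment its last name is deleted. Deleting the `f` attribute by hand mirrors the fix above; holding the back-reference through weakref.proxy is an alternative design in which no cycle is formed in the first place.

```python
import gc
import weakref


class Bag(object):
    """Hypothetical stand-in for BagObj: keeps a reference back to its owner."""

    def __init__(self, owner, weak):
        # A weak proxy does not count as a reference, so no cycle is formed.
        self._owner = weakref.proxy(owner) if weak else owner


class Owner(object):
    """Hypothetical stand-in for NpzFile: owns a Bag that points back at it."""

    def __init__(self, weak=False):
        self.f = Bag(self, weak)


def freed_by_refcount(weak=False, delete_f=False):
    """Report whether an Owner is freed as soon as its last name is deleted."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cycle collector out of the experiment
    try:
        owner = Owner(weak)
        probe = weakref.ref(owner)  # observes the object without keeping it alive
        if delete_f:
            del owner.f  # the manual fix: break the Owner <-> Bag cycle
        del owner
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
```

On CPython, `freed_by_refcount()` reports that the plain cycle survives, while `freed_by_refcount(delete_f=True)` and `freed_by_refcount(weak=True)` both report that the object is released immediately.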
