I've been working with XML resources, and Python seems to be behaving oddly. I've tested both the lxml library and xml.etree.ElementTree, and both hold on to memory after it should have been collected by the gc. I added an explicit gc.collect() as a test, but nothing changed: the memory is still held by the process.
Imports:
import time
from lxml import etree
import gc
This is the code:
def process_alternative():
    """
    This alternative process will use lxml
    """
    filename = u"/tmp/randomness.xml"
    fd = open(filename, 'r')
    tree = etree.parse(fd)
    root = tree.getroot()
    # Build a histogram of tag names seen in the tree
    accum = {}
    for _item in root.iter("*"):
        for _field in _item.iter("*"):
            if _field.tag in accum:
                accum[_field.tag] += 1
            else:
                accum[_field.tag] = 1
    for key in accum:
        print "%s -> %i" % (key, accum[key])
    fd.close()
    gc.collect()
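As an aside, the tag-counting part of the function can be written more compactly with collections.Counter and the stdlib parser. This is a hedged sketch, not the original code: the SAMPLE string is a made-up stand-in for /tmp/randomness.xml, and a single root.iter() pass is used instead of the nested loops above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document standing in for /tmp/randomness.xml
SAMPLE = "<root><item><a/><b/></item><item><a/></item></root>"

def count_tags(xml_text):
    """Build a tag-name histogram in one pass over the tree."""
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter())

print(count_tags(SAMPLE))
```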
And this is my main:
if __name__ == "__main__":
    while True:
        print "Wake up!"
        process_alternative()
        print "Sleeping..."
        time.sleep(30)
As you can see, main calls process_alternative and then sleeps. Parsing the XML file grows the process to nearly 800 MB, so before time.sleep the memory should be released and the process should return to its baseline footprint (around 32 MB?). Instead, the process keeps holding around 800 MB.
Any tip on why the memory is not freed after each iteration?
Using Ubuntu 13.04, Python 2.7.4
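For anyone reproducing this, the process footprint can be watched from inside Python rather than via top. This is a sketch assuming a Unix system (the resource module is Unix-only, and ru_maxrss is in kilobytes on Linux but bytes on macOS); max_rss_kb is a hypothetical helper name.

```python
import resource

def max_rss_kb():
    """Peak resident set size of this process (KB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = max_rss_kb()
payload = [0] * (5 * 10**6)   # allocate roughly 40 MB of pointer slots
after = max_rss_kb()
print(before, after)
```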
By contrast, this function does release its memory on every run:
def check_memory():
    ac1 = [a1**5 for a1 in xrange(10000000)]
    time.sleep(5)
    ac2 = [a1**5 for a1 in xrange(10000000)]
    time.sleep(5)
    ac3 = [a1**5 for a1 in xrange(10000000)]
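The likely reason check_memory behaves well is CPython's reference counting: each comprehension result is bound to a local name, and once the function returns (or a name is rebound) the last reference disappears and the object is freed immediately, without waiting for gc.collect(). A small sketch of that, using a hypothetical Blob class and a weak reference to observe the deallocation:

```python
import weakref

class Blob(object):
    """Hypothetical stand-in for one of the big list comprehensions."""
    def __init__(self, n):
        self.data = [0] * n

blob = Blob(10**5)
alive = weakref.ref(blob)
assert alive() is not None    # object still referenced by the name
blob = Blob(10**5)            # rebinding the name drops the old object
assert alive() is None        # CPython's refcounting freed it at once
print("old blob collected:", alive() is None)
```

Note this immediacy is a CPython implementation detail, not a language guarantee.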
Solution
I do not know why, but the process kept holding the memory even with an explicit call to gc.collect().
After some experimenting, and thanks to Martijn Pieters, a solution appeared. Calling
len(gc.get_objects())
frees all the accessed memory and keeps the process at a sensible footprint when it is idle. Strange, but true.
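For completeness, the workaround can be wrapped in a small helper. This is only a sketch of the reported behavior: why enumerating the tracked objects releases memory back to the OS on this setup (CPython 2.7.4) is not explained here, and it is not a documented guarantee; nudge_gc is a hypothetical name.

```python
import gc

def nudge_gc():
    # Walk the list of all gc-tracked objects, as in the answer above.
    # Reported to release held memory on CPython 2.7.4; not guaranteed.
    return len(gc.get_objects())

count = nudge_gc()
print("tracked objects:", count)
```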