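I. Python has automatic garbage collection: the interpreter frees an object as soon as its reference count drops to zero. Memory leaks therefore usually come from one of three places: a leak inside an extension library, a reference cycle, or objects that are never removed from a global container.
The first needs no discussion here. An example of the second follows (the memory of Obj('B') and Obj('c') is never reclaimed). (Note: the interpreter's cycle collector does reclaim ordinary reference cycles sooner or later, so there is usually no need to go out of one's way to avoid them when coding. The exception, in the Python 2 interpreter used here, is a cycle whose objects define __del__(): those are uncollectable, which is exactly the case in this example. I will read up on the details over the next few days (http://stackoverflow.com/questions/4484167/details-how-python-garbage-collection-works ; I still have not finished the garbage collection chapter of the "源码剖析" (source code analysis) book, and it keeps nagging at me). ---2013.10.20)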
[dongsong@localhost python_study]$ cat leak_test2.py
#encoding=utf-8
class Obj:
    def __init__(self,name='A'):
        self.name = name
        print '%s inited' % self.name
    def __del__(self):
        print '%s deleted' % self.name
if __name__ == '__main__':
    a = Obj('A')
    b = Obj('B')
    c = Obj('c')
    c.attrObj = b
    b.attrObj = c
[dongsong@localhost python_study]$ vpython leak_test2.py
A inited
B inited
c inited
A deleted
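A minimal sketch of the other half of the note above: a cycle whose objects have no __del__() is reclaimed once the cycle collector runs. It is written for Python 3 (unlike the Python 2 transcript), and Node, peer and make_cycle are names made up for this illustration. The weak references are there only to observe whether the objects are still alive.

```python
import gc
import weakref


class Node:
    """Plain object with no __del__ method (hypothetical example class)."""

    def __init__(self, name):
        self.name = name
        self.peer = None


def make_cycle():
    """Build two objects that reference each other; return weak references to them."""
    b = Node("B")
    c = Node("c")
    b.peer = c
    c.peer = b
    return weakref.ref(b), weakref.ref(c)


gc.disable()                 # keep the automatic collector out of the way for the demo
ref_b, ref_c = make_cycle()  # the strong names b and c are gone once the function returns

# Reference counting alone cannot free the pair: each object still holds the other.
alive_before = ref_b() is not None and ref_c() is not None

found = gc.collect()         # run the cycle collector by hand
alive_after = ref_b() is not None or ref_c() is not None
gc.enable()

print("alive before collect:", alive_before)
print("alive after collect:", alive_after)
```

Breaking the cycle by hand (for example setting peer to None, or holding one direction through weakref) lets plain reference counting free the objects without waiting for the collector.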
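II. The objgraph module
This module can find the object types that are growing fastest and the types with the most live instances. It can draw the graph of everything an object refers to and the graph of everything that refers to an object, and it can look up an object by its address.
Using it to hunt a memory leak still feels like looking for a needle in a haystack, though: you have to pick the suspects yourself from the logs of fastest-growing and most numerous types. Usually those are common types such as list/dict/tuple, which are hard to trace; when the top entry is an unusual user-defined type, the cause is much easier to pin down.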
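To make the "fastest-growing type" idea concrete, here is a rough standard-library approximation of a growth report. It is not objgraph's implementation, only a sketch of the same idea using gc.get_objects(); type_counts, growth, Session and registry are hypothetical names, and the code targets Python 3.

```python
import gc
from collections import Counter


def type_counts():
    """Count the objects tracked by the garbage collector, grouped by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, limit=3):
    """Return up to `limit` (type name, increase) pairs, largest increase first."""
    delta = {name: after[name] - before[name]
             for name in after if after[name] > before[name]}
    return sorted(delta.items(), key=lambda item: item[1], reverse=True)[:limit]


class Session:
    """Hypothetical user-defined type standing in for whatever is leaking."""


registry = []                      # a global container that is never cleaned up

before = type_counts()
registry.extend(Session() for _ in range(1000))
after = type_counts()

top = growth(before, after)
print(top)                         # the Session entry should lead the list
```

When the leaking type is user-defined, as here, it stands out immediately; if the growth showed up only as list or dict, the report alone would not say which list or dict is to blame.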
1. show_refs(), show_backrefs(), show_most_common_types(), show_growth()
[dongsong@localhost python_study]$ !cat
cat objgraph1.py
#encoding=utf-8
import objgraph
if __name__ == '__main__':
    x = []
    y = [x, [x], dict(x=x)]
    objgraph.show_refs([y], filename='/tmp/sample-graph.png')  # draw everything the objects in [y] refer to
    objgraph.show_backrefs([x], filename='/tmp/sample-backref-graph.png')  # draw everything that refers to x
    #objgraph.show_most_common_types()  # counts for the most common types; too much data to be of much use
    objgraph.show_growth(limit=4)  # print the types whose counts grew since program start or since the last show_growth() call (sorted by increase)
[dongsong@localhost python_study]$ !vpython
vpython objgraph1.py
Graph written to /tmp/tmpuSFr9A.dot (5 nodes)
Image generated as /tmp/sample-graph.png
Graph written to /tmp/tmpAn6niV.dot (7 nodes)
Image generated as /tmp/sample-backref-graph.png
tuple 3393 +3393
wrapper_descriptor 945 +945
function 830 +830
builtin_function_or_method 622 +622
[image: sample-graph.png]
[image: sample-backref-graph.png]
2. show_chain()
[dongsong@localhost python_study]$ cat objgraph2.py
#encoding=utf-8
import objgraph, inspect, random
class MyBigFatObject(object):
    pass
def computate_something(_cache = {}):
    _cache[42] = dict(foo=MyBigFatObject(),bar=MyBigFatObject())
    x = MyBigFatObject()
if __name__ == '__main__':
    objgraph.show_growth(limit=3)
    computate_something()
    objgraph.show_growth(limit=3)
    objgraph.show_chain(
        objgraph.find_backref_chain(random.choice(objgraph.by_type('MyBigFatObject')),
                                    inspect.ismodule),
        filename = '/tmp/chain.png')
    #roots = objgraph.get_leaking_objects()
    #print 'len(roots)=%d' % len(roots)
    #objgraph.show_most_common_types(objects = roots)
    #objgraph.show_refs(roots[:3], refcounts=True, filename='/tmp/roots.png')
[dongsong@localhost python_study]$ !vpython
vpython objgraph2.py
tuple 3400 +3400
wrapper_descriptor 945 +945
function 831 +831
wrapper_descriptor 956 +11
tuple 3406 +6
member_descriptor 165 +4
Graph written to /tmp/tmpklkHqC.dot (7 nodes)
Image generated as /tmp/chain.png
[image: chain.png]
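The same question ("who is still holding this object?") can be asked one step at a time with the standard library alone. The sketch below reuses the computate_something example in Python 3 and follows a single back-reference with gc.get_referrers(). It only approximates what a back-reference chain shows and does not reproduce objgraph's search; survivors, inner and holders are names chosen for the illustration.

```python
import gc


class MyBigFatObject:
    pass


def computate_something(_cache={}):
    # The default dict is created once, at definition time, and shared by every call.
    _cache[42] = dict(foo=MyBigFatObject(), bar=MyBigFatObject())
    x = MyBigFatObject()  # local only; freed when the function returns


computate_something()
gc.collect()

# Every instance that is still alive after the call.
survivors = [obj for obj in gc.get_objects() if type(obj) is MyBigFatObject]
print(len(survivors))     # 2: the instances stored in the cache; x is gone

# One step back: the dict stored under key 42 of the default argument.
inner = computate_something.__defaults__[0][42]
holders = [r for r in gc.get_referrers(survivors[0]) if isinstance(r, dict)]
print(any(h is inner for h in holders))
```

Repeating the get_referrers() step on inner leads to the _cache dict, then to the function's default-argument tuple, then to the function and its module, which is the kind of chain that ends at a module object.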
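III. The gc module
This module can identify the objects that the garbage collector finds unreachable and the ones it cannot free (uncollectable), so it has strengths of its own compared with objgraph.
gc.collect() forces a collection and returns the number of unreachable objects found.
gc.garbage is the list of unreachable objects that are also uncollectable (objects that have a __del__() finalizer and are caught in a reference cycle). If DEBUG_SAVEALL is set, then all unreachable objects will be added to this list rather than freed.
Warning: if you turn automatic collection off with gc.disable() and never call gc.collect() yourself, memory held by reference cycles is never released and you will watch usage climb rapidly....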
[dongsong@bogon python_study]$ cat gc_test.py
#encoding=utf-8
import gc
class MyObj:
    def __init__(self, name):
        self.name = name
        print "%s inited" % self.name
    def __del__(self):
        print "%s deleted" % self.name
if __name__ == '__main__':
    gc.disable()
    gc.set_debug(gc.DEBUG_COLLECTABLE | gc.DEBUG_UNCOLLECTABLE | gc.DEBUG_INSTANCES | gc.DEBUG_OBJECTS | gc.DEBUG_SAVEALL)
    a = MyObj('a')
    b = MyObj('b')
    c = MyObj('c')
    a.attr = b
    b.attr = a
    a = None
    b = None
    c = None
    if gc.isenabled():
        print 'automatic collection is enabled'
    else:
        print 'automatic collection is disabled'
    rt = gc.collect()
    print "%d unreachable" % rt
    garbages = gc.garbage
    print "\n%d garbages:" % len(garbages)
    for garbage in garbages:
        if isinstance(garbage, MyObj):
            print "obj-->%s name-->%s attrrMyObj-->%s" % (garbage, garbage.name, garbage.attr)
        else:
            print str(garbage)
[dongsong@bogon python_study]$ vpython gc_test.py
a inited
b inited
c inited
c deleted
automatic collection is disabled
gc: uncollectable <MyObj instance at 0x7f3ebd455b48>
gc: uncollectable <MyObj instance at 0x7f3ebd455b90>
gc: uncollectable <dict 0x261c4b0>
gc: uncollectable <dict 0x261bdf0>
4 unreachable
4 garbages:
obj--><__main__.MyObj instance at 0x7f3ebd455b48> name-->a attrrMyObj--><__main__.MyObj instance at 0x7f3ebd455b90>
obj--><__main__.MyObj instance at 0x7f3ebd455b90> name-->b attrrMyObj--><__main__.MyObj instance at 0x7f3ebd455b48>
{'name': 'a', 'attr': <__main__.MyObj instance at 0x7f3ebd455b90>}
{'name': 'b', 'attr': <__main__.MyObj instance at 0x7f3ebd455b48>}
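The transcript above comes from Python 2. Since Python 3.4 (PEP 442) the collector can finalize objects that define __del__() even when they sit in a cycle, so the same experiment ends differently there. Below is a small Python 3 sketch of that; the class-level deleted list is an addition for the illustration, and DEBUG_SAVEALL is deliberately left unset so that gc.garbage only shows genuinely uncollectable objects.

```python
import gc


class MyObj:
    deleted = []                   # records which finalizers have run

    def __init__(self, name):
        self.name = name

    def __del__(self):
        MyObj.deleted.append(self.name)


gc.disable()
gc.collect()                       # start from a clean slate

a = MyObj("a")
b = MyObj("b")
a.attr = b
b.attr = a
del a, b                           # only the cycle keeps the pair alive now

unreachable = gc.collect()         # finalizers run here, then the cycle is freed
print("unreachable:", unreachable)
print("finalized:", sorted(MyObj.deleted))
print("gc.garbage:", gc.garbage)
gc.enable()
```

On such interpreters gc.garbage stays empty for this case, so a non-empty gc.garbage points at something else, typically objects kept by DEBUG_SAVEALL or extension types with legacy finalizers.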
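IV. The pdb module
Detailed manual: http://www.ibm.com/developerworks/cn/linux/l-cn-pythondebugger/
The commands are much like gdb's (except that printing a value does not require a leading p, and the debugging prompt behaves like the Python interactive shell).
h(elp)       help
c(ontinue)   continue execution
n(ext)       run the next statement
s(tep)       step (into function calls)
b(reak)      set a breakpoint
l(ist)       show source code
bt           print the call stack
Enter        repeat the last command
....
I like to drop pdb.set_trace() at the spot that needs debugging and work from there.... (there are plenty of other ways to start it as well)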
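A small sketch of the pdb.set_trace() habit, in Python 3. The average function and its debug flag are hypothetical; the flag only keeps the breakpoint out of the way unless it is asked for.

```python
import pdb


def average(values, debug=False):
    """Return the arithmetic mean of `values`; optionally stop in pdb first."""
    total = sum(values)
    if debug:
        # Execution pauses here and the (Pdb) prompt appears; inspect `total`,
        # then use n, s, bt or c as in the command list above.
        pdb.set_trace()
    return total / len(values)


print(average([1, 2, 3, 4]))  # 2.5
```

Calling average([1, 2, 3, 4], debug=True) from a terminal drops into the debugger just before the return statement.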
V. Django memory leaks
Why is Django leaking memory?
Django isn't known to leak memory. If you find your Django processes are allocating more and more memory, with no sign of releasing it, check to make sure your DEBUG setting is set to False. If DEBUG is True, then Django saves a copy of every SQL statement it has executed.
(The queries are saved in django.db.connection.queries. See How can I see the raw SQL queries Django is running?.)
To fix the problem, set DEBUG to False.
If you need to clear the query list manually at any point in your functions, just call reset_queries(), like this:
from django import db
db.reset_queries()