I have a large dictionary whose structure looks like:
dcPaths = {'id_jola_001': CPath instance}
where CPath is a self-defined class:
class CPath(object):
    def __init__(self):
        # some attributes
        self.m_dAvgSpeed = 0.0
        ...
        # a list of CNode instances
        self.m_lsNodes = []
where m_lsNodes is a list of CNode:
class CNode(object):
    def __init__(self):
        # some attributes
        self.m_nLoc = 0
        # a list of Apps
        self.m_lsApps = []
Here, m_lsApps is a list of CApp, which is another self-defined class:
class CApp(object):
    def __init__(self):
        # some attributes
        self.m_nCount = 0
        self.m_nUpPackets = 0
I serialize this dictionary by using cPickle:
def serialize2File(strFileName, strOutDir, obj):
    if len(obj) != 0:
        strOutFilePath = "%s%s" % (strOutDir, strFileName)
        with open(strOutFilePath, 'w') as hOutFile:
            cPickle.dump(obj, hOutFile, protocol=0)
        return strOutFilePath
    else:
        print("Nothing to serialize!")
It works fine, and the serialized file is about 6.8 GB. However, when I try to deserialize this object:
def deserializeFromFile(strFilePath):
    obj = 0
    with open(strFilePath) as hFile:
        obj = cPickle.load(hFile)
    return obj
I find that it consumes more than 90 GB of memory and takes a long time.
Why does this happen?
Is there any way I could optimize this?
BTW, I'm using Python 2.7.6.
Solution
You can try specifying the pickle protocol; the fastest is -1 (meaning the latest
protocol, which is not a problem if you pickle and unpickle with the same Python version).
Binary protocols write raw bytes, so the file should be opened in binary mode ('wb').
cPickle.dump(obj, file, protocol=-1)
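To get a feel for how much the protocol matters, the pickled size can be compared directly. The following is only a rough sketch: `pickled_size` is a hypothetical helper, and plain dicts stand in for the CApp instances from the question, so the exact numbers will differ for the real data.

```python
try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # fallback so the sketch also runs on Python 3


def pickled_size(obj, protocol):
    """Return how many bytes obj occupies when pickled with the given protocol."""
    return len(pickle.dumps(obj, protocol))


# Stand-in data (assumption): dicts mimicking the attributes of CApp.
sample = [{'m_nCount': i, 'm_nUpPackets': i * 2} for i in range(1000)]

size_text = pickled_size(sample, 0)                          # ASCII protocol
size_binary = pickled_size(sample, pickle.HIGHEST_PROTOCOL)  # same as -1

print("protocol 0: %d bytes, highest protocol: %d bytes" % (size_text, size_binary))
```

The binary protocol produces a more compact stream than protocol 0 for this kind of data, which means less to write and less to parse.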
EDIT:
As noted in the comments, load detects the protocol itself, so it takes only the file object:
obj = cPickle.load(file)
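Put together, a round trip with the binary protocol could look roughly like this. It is a minimal sketch, not the original poster's code: `serialize_to_file` and `deserialize_from_file` are hypothetical names, the file name is made up, and nested dicts stand in for the CPath/CNode/CApp objects.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # fallback so the sketch also runs on Python 3


def serialize_to_file(path, obj):
    # 'wb' matters: binary protocols write raw bytes, not text.
    with open(path, 'wb') as out_file:
        pickle.dump(obj, out_file, pickle.HIGHEST_PROTOCOL)


def deserialize_from_file(path):
    # load() detects the protocol itself; the file is opened in binary mode.
    with open(path, 'rb') as in_file:
        return pickle.load(in_file)


# Stand-in data (assumption): dicts mimicking dcPaths from the question.
paths = {'id_jola_001': {'m_dAvgSpeed': 0.0,
                         'm_lsNodes': [{'m_nLoc': 0, 'm_lsApps': []}]}}

file_path = os.path.join(tempfile.mkdtemp(), 'paths.pkl')  # hypothetical file name
serialize_to_file(file_path, paths)
restored = deserialize_from_file(file_path)
```

Opening the file with 'wb' and 'rb' is the part that is easy to miss when switching away from protocol 0.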