Reading several pickled Python objects at a time: buffering and newlines?

To give you context:

I have a large file f, several gigabytes in size. It contains consecutive pickles of different objects, generated by running

for obj in objs: cPickle.dump(obj, f)

I want to take advantage of buffering when reading this file. What I want is to read several pickled objects into a buffer at a time. What is the best way of doing this? I want an analogue of readlines(buffsize) for pickled data. In fact, if the pickled data were newline-delimited, one could use readlines(), but I am not sure that is true.

Another option that I have in mind is to dumps() each object to a string first and then write the strings to a file, each separated by a newline. To read the file back I can use readlines() and loads(). But I fear that a pickled object may contain the "\n" character, which would throw off this file-reading scheme. Is my fear unfounded?
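For reference, a minimal sketch of that write scheme (the path and sample objects are placeholders). The fear is in fact founded: protocol-0 pickles are line-oriented and already contain '\n' between opcodes, so a single record can span several "lines" when read back with readlines().

import cPickle

# Hypothetical write side of the dumps()-plus-newline scheme;
# objs and the file path are stand-ins.
objs = [{'a': 1}, [1, 2, 3], 'hello']
with open('/tmp/pickle', 'wb') as f:
    for obj in objs:
        # A protocol-0 pickle such as cPickle.dumps([1, 2]) contains
        # internal '\n' characters, so one record != one line.
        f.write(cPickle.dumps(obj) + '\n')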

One option is to pickle everything out as one huge list of objects, but that would require more memory than I can afford. The setup could be sped up by multithreading, but I do not want to go there before I get the buffering working properly. What's the "best practice" for situations like this?

EDIT:

I can also read raw bytes into a buffer and invoke loads() on that, but I need to know how many bytes of the buffer were consumed by loads() so that I can throw the head away.
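One way to get that byte count, sketched here for the same Python 2 setup: wrap the raw buffer in a file-like object and let tell() report how far load() read (loads_with_consumed is a hypothetical helper name, not a library function).

import cPickle
from cStringIO import StringIO

def loads_with_consumed(buf):
    # Unpickle the first object in buf and report how many bytes
    # cPickle.load() consumed, via the wrapper's file position.
    f = StringIO(buf)
    obj = cPickle.load(f)
    return obj, f.tell()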

Solution

file.readlines() returns a list of the entire contents of the file. You'll want to read a few lines at a time. I think this naive code should unpickle your data:

import pickle

infile = open('/tmp/pickle', 'rb')
buf = []
while True:
    line = infile.readline()
    if not line:
        break
    buf.append(line)
    # A protocol-0 pickle ends with '.'; with a newline written after
    # each record, the boundary shows up as a line ending in '.\n'.
    if line.endswith('.\n'):
        print 'Decoding', buf
        print pickle.loads(''.join(buf))
        buf = []

If you have any control over the program that generates the pickles, I'd pick one of:

Use the shelve module.

Write the length (in bytes) of each pickle before the pickle itself, so that you know exactly how many bytes to read each time (see the sketch after this list).

Same as above, but write the list of integers to a separate file so that you can use those values as an index into the file holding the pickles.

Pickle a list of K objects at a time. Write the length of that pickle in bytes. Write the pickle. Repeat.
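As a rough illustration of the length-prefix options above (the struct format and helper names are mine, not a standard API), each record could be a fixed-size length header followed by the pickle bytes:

import cPickle
import struct

def dump_framed(obj, f):
    # Write a 4-byte big-endian length header, then the pickle itself.
    data = cPickle.dumps(obj, cPickle.HIGHEST_PROTOCOL)
    f.write(struct.pack('>I', len(data)))
    f.write(data)

def load_framed(f):
    # Read the header first, so exactly the right number of bytes
    # can be read for the pickle; returns None at end of file.
    header = f.read(4)
    if len(header) < 4:
        return None
    (size,) = struct.unpack('>I', header)
    return cPickle.loads(f.read(size))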

By the way, I suspect that the file's built-in buffering should get you 99% of the performance gains you're looking for.
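To lean on that buffering, something like the following should suffice (the 1 MB buffer size is an arbitrary choice); load() reads exactly one pickle per call and raises EOFError at the end of the file:

import cPickle

infile = open('/tmp/pickle', 'rb', 1024 * 1024)  # third arg: buffer size
while True:
    try:
        obj = cPickle.load(infile)
    except EOFError:
        break
    # ... process obj here ...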

If you're convinced that I/O is blocking you, have you thought about trying mmap() and letting the OS handle packing in blocks at a time?

#!/usr/bin/env python
import mmap
import cPickle

fname = '/tmp/pickle'
infile = open(fname, 'rb')
m = mmap.mmap(infile.fileno(), 0, access=mmap.ACCESS_READ)

start = 0
while True:
    # Each record ends with the '.' pickle terminator plus the newline
    # separator; find() returns -1 (so end == 1) when none remain.
    end = m.find('.\n', start + 1) + 2
    if end == 1:
        break
    print cPickle.loads(m[start:end])
    start = end
