11.1. pickle: Python object serialization (Python 2.7.18 documentation)


weixin_39525812


11.1. pickle: Python object serialization

The pickle module implements a fundamental, but powerful algorithm for

serializing and de-serializing a Python object structure. “Pickling” is the

process whereby a Python object hierarchy is converted into a byte stream, and

“unpickling” is the inverse operation, whereby a byte stream is converted back

into an object hierarchy. Pickling (and unpickling) is alternatively known as

“serialization”, “marshalling”, or “flattening”; however, to avoid

confusion, the terms used here are “pickling” and “unpickling”.

This documentation describes both the pickle module and the

cPickle module.

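Warning

The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

11.1.1. Relationship to other Python modules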

The pickle module has an optimized cousin called the cPickle

module. As its name implies, cPickle is written in C, so it can be up to

1000 times faster than pickle. However it does not support subclassing

of the Pickler() and Unpickler() classes, because in cPickle

these are functions, not classes. Most applications have no need for this

functionality, and can benefit from the improved performance of cPickle.

Other than that, the interfaces of the two modules are nearly identical; the

common interface is described in this manual and differences are pointed out

where necessary. In the following discussions, we use the term “pickle” to

collectively describe the pickle and cPickle modules.

The data streams the two modules produce are guaranteed to be interchangeable.

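Python has a more primitive serialization module called marshal, but in general pickle should be the preferred way to serialize Python objects. marshal exists primarily to support Python’s .pyc files.

The pickle module differs from marshal in several significant ways:

The pickle module keeps track of the objects it has already serialized, so that later references to the same object are not serialized again. marshal does not do this.

This has implications both for recursive objects and for object sharing. Recursive objects are objects that contain references to themselves. marshal does not handle them, and in fact attempting to marshal a recursive object will crash your Python interpreter. Object sharing happens when the object hierarchy contains multiple references to the same object. pickle stores such an object only once and ensures that all other references point to that master copy. Shared objects remain shared, which can be very important for mutable objects.

marshal cannot be used to serialize user-defined classes and their instances. pickle can save and restore class instances transparently; however, the class definition must be importable and must live in the same module as when the object was stored.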

The marshal serialization format is not guaranteed to be portable

across Python versions. Because its primary job in life is to support

.pyc files, the Python implementers reserve the right to change the

serialization format in non-backwards compatible ways should the need arise.

The pickle serialization format is guaranteed to be backwards compatible

across Python releases.
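The sharing and recursion behaviour described above can be checked with a small round trip. This is a minimal sketch; the variable names are illustrative only.

```python
import pickle

# A mutable object referenced from two places in the same hierarchy.
shared = ["a", "b"]
hierarchy = {"first": shared, "second": shared}

# A recursive object: the list contains a reference to itself.
recursive = [1, 2]
recursive.append(recursive)

restored = pickle.loads(pickle.dumps(hierarchy))
restored_recursive = pickle.loads(pickle.dumps(recursive))

# Both keys still refer to one list, so a change is visible through either.
restored["first"].append("c")
still_shared = restored["first"] is restored["second"]

# The restored list refers to itself, not to a copy of itself.
still_recursive = restored_recursive[2] is restored_recursive
```

Because the list is stored once, appending through one key is also seen through the other after unpickling.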

Note that serialization is a more primitive notion than persistence; although

pickle reads and writes file objects, it does not handle the issue of

naming persistent objects, nor the (even more complicated) issue of concurrent

access to persistent objects. The pickle module can transform a complex

object into a byte stream and it can transform the byte stream into an object

with the same internal structure. Perhaps the most obvious thing to do with

these byte streams is to write them onto a file, but it is also conceivable to

send them across a network or store them in a database. The module

shelve provides a simple interface to pickle and unpickle objects on

DBM-style database files.

11.1.2. Data stream format

The data format used by pickle is Python-specific. This has the

advantage that there are no restrictions imposed by external standards such as

XDR (which can’t represent pointer sharing); however it means that non-Python

programs may not be able to reconstruct pickled Python objects.

By default, the pickle data format uses a printable ASCII representation.

This is slightly more voluminous than a binary representation. The big

advantage of using printable ASCII (and of some other characteristics of

pickle’s representation) is that for debugging or recovery purposes it is

possible for a human to read the pickled file with a standard text editor.

There are currently 3 different protocols which can be used for pickling.

Protocol version 0 is the original ASCII protocol and is backwards compatible

with earlier versions of Python.

Protocol version 1 is the old binary format which is also compatible with

earlier versions of Python.

Protocol version 2 was introduced in Python 2.3. It provides much more

efficient pickling of new-style classes.

Refer to PEP 307 for more information.

If a protocol is not specified, protocol 0 is used. If protocol is specified

as a negative value or HIGHEST_PROTOCOL, the highest protocol version

available will be used.

Changed in version 2.3: Introduced the protocol parameter.

A binary format, which is slightly more efficient, can be chosen by specifying a

protocol version >= 1.
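As a rough illustration of the protocol choices above, the sketch below pickles one object with protocol 0, with protocol 2, and with the highest protocol available. The sample data is made up, and the exact byte counts depend on the Python version, so none are shown.

```python
import pickle

data = {"numbers": list(range(10)), "label": "example"}

# Protocol 0 is the printable ASCII format.
ascii_pickle = pickle.dumps(data, 0)

# Protocol 2 is a binary format; a negative value selects the highest protocol.
binary_pickle = pickle.dumps(data, 2)
highest_pickle = pickle.dumps(data, -1)

# Every byte of the protocol 0 stream falls in the ASCII range.
is_ascii = all(byte < 128 for byte in bytearray(ascii_pickle))

# Unpickling detects the format on its own; no protocol argument is needed.
streams = (ascii_pickle, binary_pickle, highest_pickle)
results = [pickle.loads(stream) for stream in streams]
```

For this data the protocol 2 stream is shorter than the protocol 0 stream, which is the efficiency gain the text refers to.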

11.1.3. Usage

To serialize an object hierarchy, you first create a pickler, then you call the

pickler’s dump() method. To de-serialize a data stream, you first create

an unpickler, then you call the unpickler’s load() method. The

pickle module provides the following constant:

pickle.HIGHEST_PROTOCOL

The highest protocol version available. This value can be passed as a

protocol value.

New in version 2.3.

Note

Be sure to always open pickle files created with protocols >= 1 in binary mode.

For the old ASCII-based pickle protocol 0 you can use either text mode or binary

mode as long as you stay consistent.

A pickle file written with protocol 0 in binary mode will contain lone linefeeds

as line terminators and therefore will look “funny” when viewed in Notepad or

other editors which do not support this format.

The pickle module provides the following functions to make the pickling process more convenient:

pickle.dump(obj, file[, protocol])

Write a pickled representation of obj to the open file object file. This is

equivalent to Pickler(file, protocol).dump(obj).

If the protocol parameter is omitted, protocol 0 is used. If protocol is

specified as a negative value or HIGHEST_PROTOCOL, the highest protocol

version will be used.

Changed in version 2.3: Introduced the protocol parameter.

file must have a write() method that accepts a single string argument.

It can thus be a file object opened for writing, a StringIO object, or

any other custom object that meets this interface.

pickle.load(file)

Read a string from the open file object file and interpret it as a pickle data

stream, reconstructing and returning the original object hierarchy. This is

equivalent to Unpickler(file).load().

file must have two methods, a read() method that takes an integer

argument, and a readline() method that requires no arguments. Both

methods should return a string. Thus file can be a file object opened for

reading, a StringIO object, or any other custom object that meets this

interface.

This function automatically determines whether the data stream was written in

binary mode or not.

pickle.dumps(obj[, protocol])

Return the pickled representation of the object as a string, instead of writing

it to a file.

If the protocol parameter is omitted, protocol 0 is used. If protocol is

specified as a negative value or HIGHEST_PROTOCOL, the highest protocol

version will be used.

Changed in version 2.3: The protocol parameter was added.

pickle.loads(string)

Read a pickled object hierarchy from a string. Characters in the string past

the pickled object’s representation are ignored.
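A short sketch of the four functions above follows. It uses io.BytesIO as a stand-in for an open file; that choice is an assumption made so the example needs no file on disk, and any object with the required write(), read(), and readline() methods would do.

```python
import io
import pickle

record = {"name": "spam", "tags": ["a", "b"], "count": 3}

# dump() writes to any object that has a write() method.
buffer = io.BytesIO()
pickle.dump(record, buffer, 2)

# load() needs read() and readline(); rewind before reading back.
buffer.seek(0)
from_file = pickle.load(buffer)

# dumps() and loads() do the same work without a file object.
as_string = pickle.dumps(record, 2)

# Data after the end of the pickled object is ignored by loads().
from_string = pickle.loads(as_string + b"ignored trailing bytes")
```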

The pickle module also defines three exceptions:

exception pickle.PickleError

A common base class for the other exceptions defined below. This inherits from

Exception.

exception pickle.PicklingError

This exception is raised when an unpicklable object is passed to the

dump() method.

exception pickle.UnpicklingError

This exception is raised when there is a problem unpickling an object. Note that

other exceptions may also be raised during unpickling, including (but not

necessarily limited to) AttributeError, EOFError,

ImportError, and IndexError.
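One way to handle these exceptions is sketched below. The helper names try_dumps and try_loads are hypothetical, and the extra exception types caught when pickling are a defensive assumption, since the exact type raised for a given object can vary between versions.

```python
import pickle

def try_dumps(obj):
    """Return the pickle for obj, or None if obj cannot be pickled."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # PicklingError is the documented case; the others are a precaution.
        return None

def try_loads(data):
    """Return the unpickled object, or None if the stream is unusable."""
    try:
        return pickle.loads(data)
    except (pickle.UnpicklingError, AttributeError, EOFError,
            ImportError, IndexError):
        return None

# A lambda has no importable name, so it cannot be pickled by reference.
unpicklable = try_dumps(lambda: 0)

# Dropping the final STOP opcode leaves an incomplete stream.
good = pickle.dumps([1, 2, 3], 2)
truncated = try_loads(good[:-1])
intact = try_loads(good)
```

Returning None is ambiguous when None itself is a valid pickled value, so real code would more likely re-raise or return a separate status.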

The pickle module also exports two callables, Pickler and

Unpickler:

class pickle.Pickler(file[, protocol])

This takes a file-like object to which it will write a pickle data stream.

If the protocol parameter is omitted, protocol 0 is used. If protocol is

specified as a negative value or HIGHEST_PROTOCOL, the highest

protocol version will be used.

Changed in version 2.3: Introduced the protocol parameter.

file must have a write() method that accepts a single string argument.

It can thus be an open file object, a StringIO object, or any other

custom object that meets this interface.

Pickler objects define one (or two) public methods:

dump(obj)

Write a pickled representation of obj to the open file object given in the

constructor. Either the binary or ASCII format will be used, depending on the

value of the protocol argument passed to the constructor.

clear_memo()

Clears the pickler’s “memo”. The memo is the data structure that remembers

which objects the pickler has already seen, so that shared or recursive objects

are pickled by reference and not by value. This method is useful when re-using

picklers.

Note

Prior to Python 2.3, clear_memo() was only available on the picklers

created by cPickle. In the pickle module, picklers have an

instance variable called memo which is a Python dictionary. So to clear

the memo for a pickle module pickler, you could do the following:

mypickler.memo.clear()

Code that does not need to support older versions of Python should simply use

clear_memo().

It is possible to make multiple calls to the dump() method of the same

Pickler instance. These must then be matched to the same number of

calls to the load() method of the corresponding Unpickler

instance. If the same object is pickled by multiple dump() calls, the

load() calls will all yield references to the same object.
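The interaction between repeated dump() calls, the memo, and clear_memo() can be sketched as follows. io.BytesIO is again only a stand-in for a real file, and the settings dictionary is invented for the example.

```python
import io
import pickle

stream = io.BytesIO()
pickler = pickle.Pickler(stream, 2)

settings = {"debug": True}
pickler.dump(settings)   # first call writes the full dictionary
pickler.dump(settings)   # second call writes only a reference to the memo

pickler.clear_memo()     # forget which objects have been seen
pickler.dump(settings)   # written in full again

# Three dump() calls are matched by three load() calls on one Unpickler.
stream.seek(0)
unpickler = pickle.Unpickler(stream)
first = unpickler.load()
second = unpickler.load()
third = unpickler.load()
```

Here first and second are the same object, while third is an equal but separate dictionary because the memo was cleared before it was written.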

Unpickler objects are defined as:

class pickle.Unpickler(file)

This takes a file-like object from which it will read a pickle data stream.

This class automatically determines whether the data stream was written in

binary mode or not, so it does not need a flag as in the Pickler

factory.

file must have two methods, a read() method that takes an integer

argument, and a readline() method that requires no arguments. Both

methods should return a string. Thus file can be a file object opened for

reading, a StringIO object, or any other custom object that meets this

interface.

Unpickler objects have one (or two) public methods:

load()

Read a pickled object representation from the open file object given in

the constructor, and return the reconstituted object hierarchy specified

therein.

This method automatically determines whether the data stream was written

in binary mode or not.

noload()

This is just like load() except that it doesn’t actually create any

objects. This is useful primarily for finding what’s called “persistent

ids” that may be referenced in a pickle data stream. See section

The pickle protocol below for more details.

Note: the noload() method is currently only available on

Unpickler objects created with the cPickle module.

pickle module Unpicklers do not have the noload()

method.

11.1.4. What can be pickled and unpickled?

The following types can be pickled:

None, True, and False

integers, long integers, floating point numbers, complex numbers

normal and Unicode strings

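tuples, lists, sets, and dictionaries containing only picklable objects

functions defined at the top level of a module

built-in functions defined at the top level of a module

classes that are defined at the top level of a module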

instances of such classes whose __dict__ or the result of

calling __getstate__() is picklable (see section The pickle protocol

for details).

Attempts to pickle unpicklable objects will raise the PicklingError

exception; when this happens, an unspecified number of bytes may have already

been written to the underlying file. Trying to pickle a highly recursive data

structure may exceed the maximum recursion depth; a RuntimeError will be

raised in this case. You can carefully raise this limit with

sys.setrecursionlimit().

Note that functions (built-in and user-defined) are pickled by “fully qualified”

name reference, not by value. This means that only the function name is

pickled, along with the name of the module the function is defined in. Neither

the function’s code, nor any of its function attributes are pickled. Thus the

defining module must be importable in the unpickling environment, and the module

must contain the named object, otherwise an exception will be raised.
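To see that only names are stored, the sketch below pickles a standard library function with protocol 0 and inspects the stream. json.dumps is chosen here only as a convenient importable function; any top-level function would serve.

```python
import json
import pickle

# Protocol 0 keeps the stream readable, so the stored names are easy to spot.
stream = pickle.dumps(json.dumps, 0)

# The stream holds the module name and the function name, not the code.
has_module_name = b"json" in stream
has_function_name = b"dumps" in stream

# Unpickling imports the module and looks the name up again.
restored = pickle.loads(stream)
same_object = restored is json.dumps
```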

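Similarly, classes are pickled by named reference, so the same restrictions in the unpickling environment apply. Note that none of the class’s code or data is pickled, so in the following example the class attribute attr is not restored in the unpickling environment:

import pickle

class Foo:
    attr = 'a class attr'

picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined at the top level of a module.

Similarly, when class instances are pickled, their class’s code and data are not pickled along with them. Only the instance data is pickled. This is done on purpose, so that you can fix bugs in a class or add methods to it and still load objects that were created with an earlier version of the class. If you plan to have long-lived objects that will see many versions of a class, it may be worthwhile to put a version number in the objects so that suitable conversions can be made by the class’s __setstate__() method.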
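The version-number idea can be sketched as below. The Account class, its attributes, and the conversion rule are all hypothetical. The old pickle is simulated by building an instance with the old attribute layout.

```python
import pickle

class Account(object):
    """Hypothetical class whose attribute layout changed between releases."""
    VERSION = 2

    def __init__(self, first, last):
        self.version = self.VERSION
        self.full_name = first + " " + last

    def __setstate__(self, state):
        # Version 1 instances stored 'first' and 'last' separately.
        if state.get("version", 1) == 1:
            state = {"version": 2,
                     "full_name": state["first"] + " " + state["last"]}
        self.__dict__.update(state)

# Simulate an instance saved by the older release.
old = Account.__new__(Account)
old.__dict__.update({"version": 1, "first": "Ada", "last": "Lovelace"})
old_pickle = pickle.dumps(old, 2)

# Loading runs __setstate__(), which converts the old layout.
upgraded = pickle.loads(old_pickle)
```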

11.1.5. The pickle protocol

This section describes the “pickling protocol” that defines the interface

between the pickler/unpickler and the objects that are being serialized. This

protocol provides a standard way for you to define, customize, and control how

your objects are serialized and de-serialized. The description in this section

doesn’t cover specific customizations that you can employ to make the unpickling

environment slightly safer from untrusted pickle data streams; see section

Subclassing Unpicklers for more details.

11.1.5.1. Pickling and unpickling normal class instances

object.__getinitargs__()

When a pickled class instance is unpickled, its __init__() method is

normally not invoked. If it is desirable that the __init__() method

be called on unpickling, an old-style class can define a method

__getinitargs__(), which should return a tuple of positional

arguments to be passed to the class constructor (__init__() for

example). Keyword arguments are not supported. The __getinitargs__()

method is called at pickle time; the tuple it returns is incorporated in the

pickle for the instance.

object.__getnewargs__()

New-style types can provide a __getnewargs__() method that is used for

protocol 2. Implementing this method is needed if the type establishes some

internal invariants when the instance is created, or if the memory allocation

is affected by the values passed to the __new__() method for the type

(as it is for tuples and strings). Instances of a new-style class

C are created using

obj = C.__new__(C, *args)

where args is the result of calling __getnewargs__() on the original

object; if there is no __getnewargs__(), an empty tuple is assumed.
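A minimal sketch of __getnewargs__() follows, using a hypothetical tuple subclass whose __new__() takes two separate arguments. The values have to be supplied at creation time, since a tuple cannot be filled in afterwards.

```python
import pickle

class Point(tuple):
    """Hypothetical immutable pair; __new__ takes x and y separately."""

    def __new__(cls, x, y):
        return tuple.__new__(cls, (x, y))

    def __getnewargs__(self):
        # Passed as Point.__new__(Point, *args) when unpickling with protocol 2.
        return (self[0], self[1])

original = Point(3, 4)
restored = pickle.loads(pickle.dumps(original, 2))
```

Without the method, the arguments inherited from tuple would not match the two-argument __new__() defined here.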

object.__getstate__()

Classes can further influence how their instances are pickled; if the class

defines the method __getstate__(), it is called and the return state is

pickled as the contents for the instance, instead of the contents of the

instance’s dictionary. If there is no __getstate__() method, the

instance’s __dict__ is pickled.

object.__setstate__(state)

Upon unpickling, if the class also defines the method __setstate__(),

it is called with the unpickled state. If there is no

__setstate__() method, the pickled state must be a dictionary and its

items are assigned to the new instance’s dictionary. If a class defines both

__getstate__() and __setstate__(), the state object needn’t be a

dictionary and these methods can do what they want.

Note

For new-style classes, if __getstate__() returns a false

value, the __setstate__() method will not be called.

Note

At unpickling time, some methods like __getattr__(),

__getattribute__(), or __setattr__() may be called upon the

instance. In case those methods rely on some internal invariant being

true, the type should implement either __getinitargs__() or

__getnewargs__() to establish such an invariant; otherwise, neither

__new__() nor __init__() will be called.

11.1.5.2. Pickling and unpickling extension types

object.__reduce__()

When the Pickler encounters an object of a type it knows nothing

about — such as an extension type — it looks in two places for a hint of

how to pickle it. One alternative is for the object to implement a

__reduce__() method. If provided, at pickling time __reduce__()

will be called with no arguments, and it must return either a string or a

tuple.

If a string is returned, it names a global variable whose contents are

pickled as normal. The string returned by __reduce__() should be the

object’s local name relative to its module; the pickle module searches the

module namespace to determine the object’s module.

When a tuple is returned, it must be between two and five elements long.

Optional elements can either be omitted, or None can be provided as their

value. The contents of this tuple are pickled as normal and used to

reconstruct the object at unpickling time. The semantics of each element

are:

A callable object that will be called to create the initial version of the

object. The next element of the tuple will provide arguments for this

callable, and later elements provide additional state information that will

subsequently be used to fully reconstruct the pickled data.

In the unpickling environment this object must be either a class, a

callable registered as a “safe constructor” (see below), or it must have an

attribute __safe_for_unpickling__ with a true value. Otherwise, an

UnpicklingError will be raised in the unpickling environment. Note

that as usual, the callable itself is pickled by name.

A tuple of arguments for the callable object.

Changed in version 2.5: Formerly, this argument could also be None.

Optionally, the object’s state, which will be passed to the object’s

__setstate__() method as described in section Pickling and unpickling normal class instances. If

the object has no __setstate__() method, then, as above, the value

must be a dictionary and it will be added to the object’s

__dict__.

Optionally, an iterator (and not a sequence) yielding successive list

items. These list items will be pickled, and appended to the object using

either obj.append(item) or obj.extend(list_of_items). This is

primarily used for list subclasses, but may be used by other classes as

long as they have append() and extend() methods with the

appropriate signature. (Whether append() or extend() is used

depends on which pickle protocol version is used as well as the number of

items to append, so both must be supported.)

Optionally, an iterator (not a sequence) yielding successive dictionary

items, which should be tuples of the form (key, value). These items

will be pickled and stored to the object using obj[key] = value. This

is primarily used for dictionary subclasses, but may be used by other

classes as long as they implement __setitem__().
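The first three tuple elements can be illustrated with a small class. Temperature and its attributes are hypothetical. The callable is the class itself, so __init__() runs again on unpickling, and the state dictionary is then merged into the new instance.

```python
import pickle

class Temperature(object):
    """Hypothetical class that validates its arguments in __init__()."""

    def __init__(self, degrees, unit):
        if unit not in ("C", "F"):
            raise ValueError("unit must be 'C' or 'F'")
        self.degrees = degrees
        self.unit = unit
        self.label = None

    def __reduce__(self):
        # (callable, args, state): the callable rebuilds the object, then
        # the state dictionary is added to the new object's __dict__.
        return (Temperature, (self.degrees, self.unit), {"label": self.label})

reading = Temperature(21.5, "C")
reading.label = "office"
restored = pickle.loads(pickle.dumps(reading, 2))
```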

object.__reduce_ex__(protocol)

It is sometimes useful to know the protocol version when implementing

__reduce__(). This can be done by implementing a method named

__reduce_ex__() instead of __reduce__(). __reduce_ex__(),

when it exists, is called in preference over __reduce__() (you may

still provide __reduce__() for backwards compatibility). The

__reduce_ex__() method will be called with a single integer argument,

the protocol version.

The object class implements both __reduce__() and

__reduce_ex__(); however, if a subclass overrides __reduce__()

but not __reduce_ex__(), the __reduce_ex__() implementation

detects this and calls __reduce__().

An alternative to implementing a __reduce__() method on the object to be

pickled, is to register the callable with the copy_reg module. This

module provides a way for programs to register “reduction functions” and

constructors for user-defined types. Reduction functions have the same

semantics and interface as the __reduce__() method described above, except

that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a “safe constructor” for purposes of

unpickling as described above.
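Registering a reduction function might look like the sketch below. The Version class and reduce_version function are hypothetical. The fallback import assumes the module is named copyreg on Python 3, which is outside the scope of this documentation.

```python
import pickle

try:
    import copy_reg                 # module name used in this documentation
except ImportError:
    import copyreg as copy_reg      # assumption: the Python 3 name

class Version(object):
    """Hypothetical type that we pretend cannot be edited to add __reduce__."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

calls = []

def reduce_version(version):
    # Same return value as __reduce__(), but the object is passed in.
    calls.append(version)
    return (Version, (version.major, version.minor))

# Associate the reduction function with the type.
copy_reg.pickle(Version, reduce_version)

restored = pickle.loads(pickle.dumps(Version(2, 7), 2))
```

The calls list is there only to show that the pickler consulted the registered function.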

11.1.5.3. Pickling and unpickling external objects

For the benefit of object persistence, the pickle module supports the

notion of a reference to an object outside the pickled data stream. Such

objects are referenced by a “persistent id”, which is just an arbitrary string

of printable ASCII characters. The resolution of such names is not defined by

the pickle module; it will delegate this resolution to user defined

functions on the pickler and unpickler.

To define external persistent id resolution, you need to set the

persistent_id attribute of the pickler object and the

persistent_load attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler must have a

custom persistent_id() method that takes an object as an

argument and returns either None or the persistent id for that object.

When None is returned, the pickler simply pickles the object as normal.

When a persistent id string is returned, the pickler will pickle that string,

along with a marker so that the unpickler will recognize the string as a

persistent id.

To unpickle external objects, the unpickler must have a custom

persistent_load() function that takes a persistent id string

and returns the referenced object.

Here’s a silly example that might shed more light:

import pickle
from cStringIO import StringIO

src = StringIO()
p = pickle.Pickler(src)

def persistent_id(obj):
    # Objects with an 'x' attribute are stored by persistent id only.
    if hasattr(obj, 'x'):
        return 'the value %d' % obj.x
    else:
        return None

p.persistent_id = persistent_id

class Integer:
    def __init__(self, x):
        self.x = x
    def __str__(self):
        return 'My name is integer %d' % self.x

i = Integer(7)
print i
p.dump(i)

datastream = src.getvalue()
print repr(datastream)
dst = StringIO(datastream)

up = pickle.Unpickler(dst)

class FancyInteger(Integer):
    def __str__(self):
        return 'I am the integer %d' % self.x

def persistent_load(persid):
    # The id has the form 'the value N'; the third word is the number.
    if persid.startswith('the value '):
        value = int(persid.split()[2])
        return FancyInteger(value)
    else:
        raise pickle.UnpicklingError('Invalid persistent id')

up.persistent_load = persistent_load

j = up.load()
print j

In the cPickle module, the unpickler’s persistent_load

attribute can also be set to a Python list, in which case, when the unpickler

reaches a persistent id, the persistent id string will simply be appended to

this list. This functionality exists so that a pickle data stream can be

“sniffed” for object references without actually instantiating all the objects

in a pickle.

Setting persistent_load to a list is usually used in

conjunction with the noload() method on the Unpickler.

11.1.6. Subclassing Unpicklers

By default, unpickling will import any class that it finds in the pickle data.

You can control exactly what gets unpickled and what gets called by customizing

your unpickler. Unfortunately, exactly how you do this is different depending

on whether you’re using pickle or cPickle.

In the pickle module, you need to derive a subclass from

Unpickler, overriding the load_global() method.

load_global() should read two lines from the pickle data stream where the

first line will be the name of the module containing the class and the second line

will be the name of the instance’s class. It then looks up the class, possibly

importing the module and digging out the attribute, then it appends what it

finds to the unpickler’s stack. Later on, this class will be assigned to the

__class__ attribute of an empty class, as a way of magically creating an

instance without calling its class’s __init__(). Your job (should you

choose to accept it), would be to have load_global() push onto the

unpickler’s stack, a known safe version of any class you deem safe to unpickle.

It is up to you to produce such a class. Or you could raise an error if you

want to disallow all unpickling of instances. If this sounds like a hack,

you’re right. Refer to the source code to make this work.

Things are a little cleaner with cPickle, but not by much. To control

what gets unpickled, you can set the unpickler’s find_global

attribute to a function or None. If it is None then any attempts to

unpickle instances will raise an UnpicklingError. If it is a function,

then it should accept a module name and a class name, and return the

corresponding class object. It is responsible for looking up the class and

performing any necessary imports, and it may raise an error to prevent

instances of the class from being unpickled.

The moral of the story is that you should be really careful about the source of

the strings your application unpickles.

11.1.7. Example

For the simplest code, use the dump() and load() functions. Note

that a self-referencing list is pickled and restored correctly.

import pickle

data1 = {'a': [1, 2.0, 3, 4+6j],
         'b': ('string', u'Unicode string'),
         'c': None}

selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)

output = open('data.pkl', 'wb')

# Pickle dictionary using protocol 0.
pickle.dump(data1, output)

# Pickle the list using the highest protocol available.
pickle.dump(selfref_list, output, -1)

output.close()

The following example reads the resulting pickled data. When reading a

pickle-containing file, you should open the file in binary mode because you

can’t be sure if the ASCII or binary format was used.

import pprint, pickle

pkl_file = open('data.pkl', 'rb')

data1 = pickle.load(pkl_file)
pprint.pprint(data1)

data2 = pickle.load(pkl_file)
pprint.pprint(data2)

pkl_file.close()

Here’s a larger example that shows how to modify pickling behavior for a class.

The TextReader class opens a text file, and returns the line number and

line contents each time its readline() method is called. If a

TextReader instance is pickled, all attributes except the file object

member are saved. When the instance is unpickled, the file is reopened, and

reading resumes from the last location. The __setstate__() and

__getstate__() methods are used to implement this behavior.

#!/usr/local/bin/python

class TextReader:
    """Print and number lines in a text file."""

    def __init__(self, file):
        self.file = file
        self.fh = open(file)
        self.lineno = 0

    def readline(self):
        self.lineno = self.lineno + 1
        line = self.fh.readline()
        if not line:
            return None
        if line.endswith("\n"):
            line = line[:-1]
        return "%d: %s" % (self.lineno, line)

    def __getstate__(self):
        odict = self.__dict__.copy()  # copy the dict since we change it
        del odict['fh']               # remove filehandle entry
        return odict

    def __setstate__(self, dict):
        fh = open(dict['file'])       # reopen file
        count = dict['lineno']        # read from file...
        while count:                  # until line count is restored
            fh.readline()
            count = count - 1
        self.__dict__.update(dict)    # update attributes
        self.fh = fh                  # save the file object

A sample usage might be something like this:

>>> import TextReader
>>> obj = TextReader.TextReader("TextReader.py")
>>> obj.readline()
'1: #!/usr/local/bin/python'
>>> obj.readline()
'2: '
>>> obj.readline()
'3: class TextReader:'
>>> import pickle
>>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that pickle works across Python processes, start

another Python session, before continuing. What follows can happen from either

the same process or a new process.

>>> import pickle
>>> reader = pickle.load(open('save.p', 'rb'))
>>> reader.readline()
'4:     """Print and number lines in a text file."""'

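See also

Module copy_reg: Pickle interface constructor registration for extension types.

Module shelve: Indexed databases of objects; uses pickle.

Module copy: Shallow and deep object copying.

Module marshal: High-performance serialization of built-in types.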
