# MessagePack for Python
[![Build Status](https://travis-ci.org/msgpack/msgpack-python.svg?branch=master)](https://travis-ci.org/msgpack/msgpack-python)
[![Documentation Status](https://readthedocs.org/projects/msgpack-python/badge/?version=latest)](https://msgpack-python.readthedocs.io/en/latest/?badge=latest)
## What's this
[MessagePack](https://msgpack.org/) is an efficient binary serialization format.
Like JSON, it lets you exchange data among multiple languages,
but it is faster and smaller.
This package provides CPython bindings for reading and writing MessagePack data.
## Very important notes for existing users
### PyPI package name
TL;DR: When upgrading from msgpack-0.4 or earlier, don't do `pip install -U msgpack-python`.
Do `pip uninstall msgpack-python; pip install msgpack` instead.
The package name on PyPI was changed to msgpack as of 0.5.
I uploaded a transitional package (msgpack-python 0.5, which depends on msgpack)
to smooth the transition from msgpack-python to msgpack.
Sadly, this doesn't work for upgrade installs. After `pip install -U msgpack-python`,
msgpack is removed and `import msgpack` fails.
### Compatibility with the old format
You can use the ``use_bin_type=False`` option to pack ``bytes``
objects into the raw type of the old msgpack spec, instead of the bin type of the new msgpack spec.
You can unpack the old msgpack format using the ``raw=True`` option.
It unpacks the str (raw) type in msgpack into Python bytes.
See the notes below for details.
### Major breaking changes in msgpack 1.0
* Python 2
  * The extension module does not support Python 2 anymore.
    The pure Python implementation (``msgpack.fallback``) is used for Python 2.
* Packer
  * ``use_bin_type=True`` by default. bytes are encoded as the bin type in msgpack.
    **If you are still using Python 2, you must use unicode for all string types.**
    You can use ``use_bin_type=False`` to encode into the old msgpack format.
  * The ``encoding`` option is removed. UTF-8 is always used.
* Unpacker
  * ``raw=False`` by default. It assumes str types are valid UTF-8 strings
    and decodes them to Python str (unicode) objects.
  * The ``encoding`` option is removed. You can use ``raw=True`` to support the old format.
  * The default value of ``max_buffer_size`` is changed from 0 to 100 MiB.
  * The default value of ``strict_map_key`` is changed to True to avoid hashdos.
    You need to pass ``strict_map_key=False`` if your data contains map keys
    whose type is not bytes or str.
## Install
    $ pip install msgpack
### Pure Python implementation
The extension module in msgpack (``msgpack._cmsgpack``) does not support
Python 2 or PyPy.
Instead, msgpack provides a pure Python implementation (``msgpack.fallback``)
for PyPy and Python 2.
Since [pip](https://pip.pypa.io/) uses the pure Python implementation,
Python 2 support will not be dropped in the foreseeable future.
### Windows
When you can't use a binary distribution, you need to install Visual Studio
or the Windows SDK to build the extension on Windows.
Without the extension, msgpack uses the pure Python implementation, which runs slowly on CPython.
## How to use
NOTE: The examples below pass ``raw=False`` and ``use_bin_type=True`` for users
of msgpack < 1.0. These options are the defaults since msgpack 1.0, so you can omit them.
### One-shot pack & unpack
Use ``packb`` for packing and ``unpackb`` for unpacking.
msgpack provides ``dumps`` and ``loads`` as aliases for compatibility with
``json`` and ``pickle``.
``pack`` and ``dump`` pack to a file-like object.
``unpack`` and ``load`` unpack from a file-like object.
```pycon
>>> import msgpack
>>> msgpack.packb([1, 2, 3], use_bin_type=True)
b'\x93\x01\x02\x03'
>>> msgpack.unpackb(_, raw=False)
[1, 2, 3]
```
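The file-like variants mentioned above work the same way. The following is a minimal sketch that uses an in-memory ``io.BytesIO`` as a stand-in for a real file; the sample dict is made up for illustration.

```python
import io
import msgpack

# BytesIO stands in for any binary file-like object (e.g. open(path, "wb")).
stream = io.BytesIO()

# pack() serializes the object and writes the bytes to the stream.
msgpack.pack({"id": 1, "tags": ["a", "b"]}, stream, use_bin_type=True)

# Rewind, then unpack() reads the stream and deserializes one object.
stream.seek(0)
restored = msgpack.unpack(stream, raw=False)
print(restored)
```

``dump`` and ``load`` are aliases of ``pack`` and ``unpack``, so the same calls work under those names.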
``unpackb`` unpacks a msgpack array to a Python list by default, but can also unpack it to a tuple:
```pycon
>>> msgpack.unpackb(b'\x93\x01\x02\x03', use_list=False, raw=False)
(1, 2, 3)
```
You should always specify the ``use_list`` keyword argument for backward compatibility.
See the performance notes on the ``use_list`` option below.
Read the docstrings for other options.
### Streaming unpacking
``Unpacker`` is a "streaming unpacker". It unpacks multiple objects from one
stream (or from bytes provided through its ``feed`` method).
```py
import msgpack
from io import BytesIO

buf = BytesIO()
for i in range(100):
    buf.write(msgpack.packb(i, use_bin_type=True))

buf.seek(0)

unpacker = msgpack.Unpacker(buf, raw=False)
for unpacked in unpacker:
    print(unpacked)
```
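When the data arrives in pieces rather than from a file-like object, the ``feed`` method mentioned above can be used instead. This is a rough sketch: the fixed 3-byte chunks are an arbitrary stand-in for network reads, chosen so that messages are split across chunk boundaries.

```python
import msgpack

# Two messages back to back, as they might appear on a socket.
packed = (msgpack.packb([1, 2, 3], use_bin_type=True)
          + msgpack.packb({"k": "v"}, use_bin_type=True))

# No file-like object is given, so data must be supplied through feed().
unpacker = msgpack.Unpacker(raw=False)
results = []

# Simulate chunked delivery; a chunk may end in the middle of a message.
for start in range(0, len(packed), 3):
    unpacker.feed(packed[start:start + 3])
    # Iteration yields every complete message buffered so far and stops
    # when only a partial message remains.
    for obj in unpacker:
        results.append(obj)

print(results)
```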
### Packing/unpacking of custom data type
It is also possible to pack/unpack custom data types. Here is an example for
``datetime.datetime``.
```py
import datetime
import msgpack

useful_dict = {
    "id": 1,
    "created": datetime.datetime.now(),
}

def decode_datetime(obj):
    # With raw=False, map keys arrive as str, so test for a str key.
    if '__datetime__' in obj:
        obj = datetime.datetime.strptime(obj["as_str"], "%Y%m%dT%H:%M:%S.%f")
    return obj

def encode_datetime(obj):
    if isinstance(obj, datetime.datetime):
        return {'__datetime__': True, 'as_str': obj.strftime("%Y%m%dT%H:%M:%S.%f")}
    return obj

packed_dict = msgpack.packb(useful_dict, default=encode_datetime, use_bin_type=True)
this_dict_again = msgpack.unpackb(packed_dict, object_hook=decode_datetime, raw=False)
```
``Unpacker``'s ``object_hook`` callback receives a dict; the
``object_pairs_hook`` callback may instead be used to receive a list of
key-value pairs.
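As a small illustration of ``object_pairs_hook``, the sketch below passes ``list`` and ``collections.OrderedDict`` as hooks. Both accept any iterable of pairs, which keeps the example independent of whether the extension module or the pure Python implementation is in use.

```python
from collections import OrderedDict
import msgpack

packed = msgpack.packb({"b": 1, "a": 2}, use_bin_type=True)

# Collect each map as a plain list of (key, value) tuples, in stream order.
pairs = msgpack.unpackb(packed, object_pairs_hook=list, raw=False)

# Or build an OrderedDict directly from the pairs.
ordered = msgpack.unpackb(packed, object_pairs_hook=OrderedDict, raw=False)

print(pairs)
print(ordered)
```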
### Extended types
It is also possible to pack/unpack custom data types using the **ext** type.
```pycon
>>> import msgpack
>>> import array
>>> def default(obj):
...     if isinstance(obj, array.array) and obj.typecode == 'd':
...         return msgpack.ExtType(42, obj.tobytes())
...     raise TypeError("Unknown type: %r" % (obj,))
...
>>> def ext_hook(code, data):
...     if code == 42:
...         a = array.array('d')
...         a.frombytes(data)
...         return a
...     return msgpack.ExtType(code, data)
...
>>> data = array.array('d', [1.2, 3.4])
>>> packed = msgpack.packb(data, default=default, use_bin_type=True)
>>> unpacked = msgpack.unpackb(packed, ext_hook=ext_hook, raw=False)
>>> data == unpacked
True
```
### Advanced unpacking control
As an alternative to iteration, ``Unpacker`` objects provide ``unpack``,
``skip``, ``read_array_header`` and ``read_map_header`` methods. The former two
read an entire message from the stream, respectively de-serialising and returning
the result, or ignoring it. The latter two methods return the number of elements
in the upcoming container, so that each element in an array, or key-value pair
in a map, can be unpacked or skipped individually.
Each of these methods may optionally write the packed data it reads to a
callback function:
```py
from io import BytesIO

def distribute(unpacker, get_worker):
    nelems = unpacker.read_map_header()
    for i in range(nelems):
        # Select a worker for the given key
        key = unpacker.unpack()
        worker = get_worker(key)

        # Send the value as a packed message to worker
        bytestream = BytesIO()
        unpacker.skip(bytestream.write)
        worker.send(bytestream.getvalue())
```
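The array counterpart can be sketched in the same way. The three-element message below is invented for illustration; the point is that ``read_array_header`` lets you unpack some elements and skip others without building the whole list.

```python
from io import BytesIO
import msgpack

# Hypothetical message layout: [label, payload we do not need, checksum].
stream = BytesIO(msgpack.packb(["header", {"big": "payload"}, 42],
                               use_bin_type=True))
unpacker = msgpack.Unpacker(stream, raw=False)

count = unpacker.read_array_header()  # number of elements in the array
label = unpacker.unpack()             # deserialize the first element
unpacker.skip()                       # ignore the second element entirely
checksum = unpacker.unpack()          # deserialize the third element

print(count, label, checksum)
```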
## Notes
### string and binary type
Early versions of msgpack didn't distinguish string and binary types.
The type for representing both string and binary types was named **raw**.
You can pack into and unpack from this old spec using ``use_bin_type=False``
and ``raw=True`` options.
```pycon
>>> import msgpack
>>> msgpack.unpackb(msgpack.packb([b'spam', u'eggs'], use_bin_type=False), raw=True)
[b'spam', b'eggs']
>>> msgpack.unpackb(msgpack.packb([b'spam', u'eggs'], use_bin_type=True), raw=False)
[b'spam', 'eggs']
```
### ext type
To use the **ext** type, pass a ``msgpack.ExtType`` object to the packer.
```pycon
>>> import msgpack
>>> packed = msgpack.packb(msgpack.ExtType(42, b'xyzzy'))
>>> msgpack.unpackb(packed)
ExtType(code=42, data=b'xyzzy')
```
You can use it with ``default`` and ``ext_hook``. See the "Extended types" section above.
### Security
To unpack data received from an untrusted source, msgpack provides
two security options.
``max_buffer_size`` (default: ``100*1024*1024``) limits the internal buffer size.
It is also used to limit the size of preallocated lists.
``strict_map_key`` (default: ``True``) limits the type of map keys to bytes and str.
While the msgpack spec doesn't limit the types of map keys,
other key types carry a risk of hashdos attacks.
If you need to support other types for map keys, use ``strict_map_key=False``.
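A minimal sketch of both options in action, assuming msgpack 1.0 or later. The 16-byte buffer limit is deliberately tiny so the effect is visible; it is not a recommended value.

```python
import msgpack

# A map with an int key, which strict_map_key=True rejects.
packed = msgpack.packb({1: "one"}, use_bin_type=True)

try:
    msgpack.unpackb(packed, raw=False, strict_map_key=True)
    rejected = False
except ValueError:
    rejected = True

# Opting out accepts the int key.
relaxed = msgpack.unpackb(packed, raw=False, strict_map_key=False)

# Feeding more data than max_buffer_size allows raises BufferFull.
unpacker = msgpack.Unpacker(raw=False, max_buffer_size=16)
try:
    unpacker.feed(b"\x00" * 32)
    overflowed = False
except msgpack.BufferFull:
    overflowed = True

print(rejected, relaxed, overflowed)
```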
### Performance tips
CPython's GC is triggered as the number of allocated objects grows.
This means unpacking may trigger needless GC runs.
You can use ``gc.disable()`` when unpacking a large message.
List is the default sequence type in Python,
but a tuple is lighter than a list.
You can use ``use_list=False`` when unpacking if performance is important.
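Both tips can be combined as in the sketch below. The generated 10,000-row message is only sample data, and any actual speedup depends on the workload, so measure before relying on it.

```python
import gc
import msgpack

# Sample data: a large array of small [int, str] rows.
packed = msgpack.packb([[i, str(i)] for i in range(10000)], use_bin_type=True)

# Pause the cyclic GC only for the duration of the unpack call,
# and restore the previous state afterwards.
gc_was_enabled = gc.isenabled()
gc.disable()
try:
    # use_list=False unpacks arrays as tuples instead of lists.
    rows = msgpack.unpackb(packed, use_list=False, raw=False)
finally:
    if gc_was_enabled:
        gc.enable()

print(type(rows).__name__, len(rows), rows[0])
```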
## Development
### Test
MessagePack uses `pytest` for testing.
Run the tests with the following command:
```
$ make test
```