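All demonstrations below are run under Python 3.
We use a simple file open and read as an example to illustrate the encoding problem that can arise when pickle reads a file: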
import gzip
import pickle

# Use a with block so the file is closed automatically
with gzip.open('./mnist.pkl.gz', 'rb') as f:
    training_data, validation_data, test_data = pickle.load(f)
If we follow the Python 2.x approach shown above and run it without specifying any encoding, the interpreter raises the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)
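The error can be reproduced without the MNIST file. The following is a minimal sketch, assuming a hand-written protocol-2 pickle that stands in for data written by Python 2; the name `py2_style` and the byte string are illustrative and are not taken from mnist.pkl.gz. It shows that Python 3 decodes a pickled Python 2 `str` as ASCII by default, and that the `encoding` argument changes how those bytes are interpreted:

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 str '\x90' (illustrative data):
# PROTO 2, SHORT_BINSTRING (length 1, byte 0x90), BINPUT 0, STOP.
py2_style = b'\x80\x02U\x01\x90q\x00.'

# Default encoding is ASCII, so the non-ASCII byte 0x90 cannot be decoded.
try:
    pickle.loads(py2_style)
    failed = False
except UnicodeDecodeError:
    failed = True

# latin1 maps every byte value 0-255 to a character, so decoding always succeeds.
as_text = pickle.loads(py2_style, encoding='latin1')

# 'bytes' skips decoding and returns the raw bytes object instead.
as_bytes = pickle.loads(py2_style, encoding='bytes')
```

Here `failed` ends up `True`, while the other two calls return a one-character `str` and a one-byte `bytes` object respectively.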
0. A compatible way to import pickle
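try:
    # Python 2: prefer the faster C implementation
    import cPickle as pickle
except ImportError:
    # Python 3: cPickle no longer exists; pickle uses the C accelerator itself
    import pickle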
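1. Solution
One workable solution is to decode with latin1, which maps every byte value to a character and therefore never fails on data pickled by Python 2: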
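with gzip.open('./mnist.pkl.gz', 'rb') as f:
    # _Unpickler is the private pure-Python unpickler; the public
    # pickle.Unpickler accepts encoding='latin1' as a keyword argument instead
    u = pickle._Unpickler(f)
    u.encoding = 'latin1'
    training_data, validation_data, test_data = u.load()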
Or, more concisely:
with gzip.open('./mnist.pkl.gz', 'rb') as f:
    training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
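If the same loading code has to handle files written by either Python version, the two calls can be combined. The sketch below is one possible arrangement, not a standard recipe: the helper name `load_pickle_compat` and its `fallback_encoding` parameter are hypothetical, and it assumes the file is a gzip-compressed pickle like the one above. It tries the default load first and retries with latin1 only when decoding fails:

```python
import gzip
import pickle


def load_pickle_compat(path, fallback_encoding='latin1'):
    """Load a gzip-compressed pickle, retrying with a fallback encoding.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    try:
        with gzip.open(path, 'rb') as f:
            return pickle.load(f)
    except UnicodeDecodeError:
        # Reopen the file so the retry starts from the beginning of the stream.
        with gzip.open(path, 'rb') as f:
            return pickle.load(f, encoding=fallback_encoding)
```

With this helper the MNIST example might read `training_data, validation_data, test_data = load_pickle_compat('./mnist.pkl.gz')`. Reopening the file rather than seeking keeps the retry simple, at the cost of decompressing the start of the file twice.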
References
[1] <Pickle incompatibility of numpy arrays between Python 2 and 3>