Problem: Python 2 raises either of these errors:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
Expected: encode() and decode() calls complete without errors.
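Cause: Python 2 performs an implicit conversion first. Calling encode() on a byte string (str) makes Python decode it to unicode, and calling decode() on a unicode string makes Python encode it to bytes. No codec is specified for that hidden step, so Python uses the default encoding (set by sys.setdefaultencoding, readable via sys.getdefaultencoding()), which is ASCII unless changed. Any non-ASCII character therefore fails.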
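To make the hidden step visible, here is a minimal sketch that spells it out with explicit calls. The helper names implicit_encode, implicit_decode and attempt are hypothetical, and the default parameter stands in for sys.getdefaultencoding(). The code uses b'' and u'' literals, so it should run on both Python 2.7 and Python 3.

```python
# Hypothetical helpers that spell out the implicit step Python 2 performs.
# `default` stands in for sys.getdefaultencoding(), which is 'ascii' by default.

RAW = b'\xe5\xb0\x8f\xe6\x98\x8e'   # UTF-8 bytes of U+5C0F U+660E
TEXT = u'\u5c0f\u660e'              # the same two characters as unicode text


def implicit_encode(raw, codec, default='ascii'):
    """Mimic Python 2's str.encode(codec): decode with the default first."""
    return raw.decode(default).encode(codec)


def implicit_decode(text, codec, default='ascii'):
    """Mimic Python 2's unicode.decode(codec): encode with the default first."""
    return text.encode(default).decode(codec)


def attempt(func, *args, **kwargs):
    """Return func's result, or the UnicodeError subclass it raised."""
    try:
        return func(*args, **kwargs)
    except UnicodeError as exc:
        return type(exc)


if __name__ == '__main__':
    # ASCII default: both hidden steps fail, matching the two errors above.
    print(attempt(implicit_encode, RAW, 'utf-8'))            # UnicodeDecodeError class
    print(attempt(implicit_decode, TEXT, 'unicode_escape'))  # UnicodeEncodeError class
    # UTF-8 default: both calls succeed...
    print(repr(attempt(implicit_encode, RAW, 'utf-8', default='utf-8')))
    # ...but unicode_escape maps each UTF-8 byte to its own code point.
    mojibake = attempt(implicit_decode, TEXT, 'unicode_escape', default='utf-8')
    print([hex(ord(ch)) for ch in mojibake])  # ['0xe5', '0xb0', '0x8f', '0xe6', '0x98', '0x8e']
```

The last line shows why a "successful" decode is not necessarily a correct one: changing the default only changes which codec the hidden step uses.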
Solution:
- Method 1: change the default encoding from ASCII to UTF-8 (this affects every implicit conversion in the process)
>>> import sys
>>> reload(sys)  # the site module deletes sys.setdefaultencoding at startup, so reload sys to restore it
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('utf-8')
Example:
>>> "小明".encode('utf-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
>>> u'\u5c0f\u660e'.decode('unicode_escape')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('utf-8')
>>> "小明".encode('utf-8')
'\xe5\xb0\x8f\xe6\x98\x8e'
>>> u'\u5c0f\u660e'.decode('unicode_escape')  # no error, but each UTF-8 byte became a separate character, not 小明
u'\xe5\xb0\x8f\xe6\x98\x8e'
- Method 2: specify the string's encoding explicitly instead of relying on the default
Example:
>>> "小明".encode('utf-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
>>> "小明".decode('utf-8').encode('utf-8')
'\xe5\xb0\x8f\xe6\x98\x8e'
>>> u'\u5c0f\u660e'.decode('unicode_escape')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> '\u5c0f\u660e'.decode('unicode_escape')  # decode a byte string (no u prefix) that holds the escapes
u'\u5c0f\u660e'