Python json, unicode, and UTF-8 handling: a summary

Latest recommended article published on 2024-05-24 16:48:06

weixin_39916511

Tags: python unicode error utf8

Article link: https://blog.csdn.net/weixin_39916511/article/details/111420281

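I. Printing Chinese text in a dict directly

In Python 2 it is common to print a dict directly, or to convert a dict to JSON without any special arguments and then print the JSON string. In both cases the Chinese text comes out as escape sequences: escaped UTF-8 bytes (\xe5...) for the dict, and \uXXXX escapes for the JSON string, as shown here: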

>>> import json
>>> d = {'name': '张三', 'age': '1'}
>>> print d
{'age': '1', 'name': '\xe5\xbc\xa0\xe4\xb8\x89'}
>>> jd = json.dumps(d)
>>> print jd
{"age": "1", "name": "\u5f20\u4e09"}

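So what can be done about this?

To print the Chinese text in the dict correctly, convert d to a JSON string and pass ensure_ascii=False during the conversion, as follows: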

>>> d = {'name': '张三', 'age': '1'}
>>> print d
{'age': '1', 'name': '\xe5\xbc\xa0\xe4\xb8\x89'}
>>> jd = json.dumps(d, ensure_ascii=False, encoding='utf-8')
>>> print jd
{"age": "1", "name": "张三"}

The ensure_ascii=False argument is required. encoding can be omitted, because in Python 2's json.dumps the default is already encoding='utf-8'.
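For comparison, here is a minimal sketch of the same idea on Python 3, which is an assumption on my part since the transcripts above are Python 2. In Python 3, json.dumps no longer accepts an encoding argument, so only ensure_ascii is passed. The variable names escaped and readable are illustrative only.

```python
import json

d = {'name': '张三', 'age': '1'}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(d)
print(escaped)   # {"name": "\u5f20\u4e09", "age": "1"}

# ensure_ascii=False keeps the characters as they are.
readable = json.dumps(d, ensure_ascii=False)
print(readable)  # {"name": "张三", "age": "1"}

# Both strings are valid JSON and decode to the same dict.
print(json.loads(escaped) == json.loads(readable))  # True
```

The two forms differ only in how the text is written out, so choosing ensure_ascii=False is a readability decision rather than a change to the data.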

1. About the ensure_ascii parameter:

If ``ensure_ascii`` is true (the default), all non-ASCII characters in the output are escaped with ``\uXXXX`` sequences, and the result is a ``str`` instance consisting of ASCII characters only. If ``ensure_ascii`` is ``False``, some chunks written to ``fp`` may be ``unicode`` instances. This usually happens because the input contains unicode strings or the ``encoding`` parameter is used. Unless ``fp.write()`` explicitly understands ``unicode`` (as in ``codecs.getwriter``) this is likely to cause an error.

2. About the encoding parameter:

``encoding`` is the character encoding for str instances, default is UTF-8.
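The documentation quoted above warns about what happens when the output goes to a file object. The following is a hedged sketch of one way to write readable JSON to a file, assuming Python 3 (not the Python 2 interpreter used in this article) and a throwaway temporary file. The file is opened in text mode with an explicit UTF-8 encoding, so the file object itself handles the encoding step.

```python
import io
import json
import os
import tempfile

d = {'name': '张三', 'age': '1'}

# Create a temporary file path for the demonstration.
fd, path = tempfile.mkstemp(suffix='.json')
os.close(fd)
try:
    # Text mode with an explicit encoding: json.dump writes text chunks,
    # and the file object encodes them to UTF-8 bytes on disk.
    with io.open(path, 'w', encoding='utf-8') as fp:
        json.dump(d, fp, ensure_ascii=False)

    # Read the raw bytes back to see what was actually stored.
    with io.open(path, 'rb') as fp:
        raw = fp.read()
finally:
    os.remove(path)

print(raw)
```

The stored bytes contain the UTF-8 encoding of the Chinese text instead of \uXXXX escapes, and decoding them gives back the original dict.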

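II. Converting JSON to a dict with Python's built-in json library produces unicode strings

When json.loads(json_str) converts the string json_str to a dict, the strings in the dict are unicode objects, as shown here: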

>>> ud = json.loads(jd, encoding='utf-8')
>>> print ud
{u'age': u'1', u'name': u'\u5f20\u4e09'}

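Every string in the dict carries the u prefix. There are two ways to get rid of it.

1. Use yaml.safe_load(jd) from the yaml library (with Python 2.7.10 and yaml 5.1.2, safe_load has no encoding parameter). As the final output below shows, only the ASCII-only strings come back without the u prefix; the Chinese value is still a unicode object.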

>>> import yaml
>>> d = {'name': '张三', 'age': '1'}
>>> print d
{'age': '1', 'name': '\xe5\xbc\xa0\xe4\xb8\x89'}
>>> jd = json.dumps(d, ensure_ascii=False, encoding='utf-8')
>>> ud = json.loads(jd, encoding='utf-8')
>>> print ud
{u'age': u'1', u'name': u'\u5f20\u4e09'}
# With Python 2.7.10 and yaml 5.1.2, safe_load has no encoding parameter
>>> ud = yaml.safe_load(jd, encoding='utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: safe_load() got an unexpected keyword argument 'encoding'
>>> ud = yaml.safe_load(jd)
>>> print ud
{'age': '1', 'name': u'\u5f20\u4e09'}

2. Write a recursive conversion function that converts the dict returned by json.loads() from unicode to the encoding you want, as follows:

>>> def byteify(input, encoding='utf-8'):
...     # Recursively encode every unicode string inside dicts and lists
...     if isinstance(input, dict):
...         return {byteify(key, encoding): byteify(value, encoding)
...                 for key, value in input.iteritems()}
...     elif isinstance(input, list):
...         return [byteify(element, encoding) for element in input]
...     elif isinstance(input, unicode):
...         return input.encode(encoding)
...     else:
...         return input
...

>>> d = {'name': '张三', 'age': '1'}
>>> print d
{'age': '1', 'name': '\xe5\xbc\xa0\xe4\xb8\x89'}
>>> jd = json.dumps(d, ensure_ascii=False, encoding='utf-8')
>>> ud = json.loads(jd, encoding='utf-8')
>>> print ud
{u'age': u'1', u'name': u'\u5f20\u4e09'}
>>> ud = byteify(ud)
>>> print ud
{'age': '1', 'name': '\xe5\xbc\xa0\xe4\xb8\x89'}
>>> print ud['name']
张三

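At this point the dict returned by json.loads() has been fully converted to UTF-8 byte strings. Why does the printed dict still look garbled? As noted in section I, printing a dict directly shows the repr of each string, so the Chinese appears as escaped bytes, whereas print ud['name'] displays the Chinese text '张三' as expected.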
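The same recursive pattern can be sketched for Python 3, again as an assumption, since the article targets Python 2. There, json.loads() already returns str, so a conversion like this is only needed when byte strings are explicitly wanted. The helper name encode_strings is hypothetical and not part of any library.

```python
import json


def encode_strings(obj, encoding='utf-8'):
    """Recursively encode every str inside dicts and lists to bytes."""
    if isinstance(obj, dict):
        return {encode_strings(key, encoding): encode_strings(value, encoding)
                for key, value in obj.items()}
    if isinstance(obj, list):
        return [encode_strings(element, encoding) for element in obj]
    if isinstance(obj, str):
        return obj.encode(encoding)
    return obj  # numbers, booleans and None pass through unchanged


jd = json.dumps({'name': '张三', 'age': '1'})
ud = json.loads(jd)
encoded = encode_strings(ud)

# Printing the container shows the repr of each element (escaped bytes).
print(encoded)  # {b'name': b'\xe5\xbc\xa0\xe4\xb8\x89', b'age': b'1'}

# Decoding a single element gives readable text again.
print(encoded[b'name'].decode('utf-8'))  # 张三
```

This mirrors the point made above: the escaped form comes from printing the container, not from any damage to the data.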

III. References

1. https://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-of-unicode-from-json

2. https://www.jianshu.com/p/90ecc5987a18

Note: this article belongs to its author and may not be reproduced without the author's permission.
