Python json.dumps and UTF-8: saving UTF-8 text from json.dumps as UTF-8 rather than \u escape sequences

sample code:

>>> import json

>>> json_string = json.dumps("ברי צקלה")

>>> print json_string

"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"

The problem: it's not human readable. My (smart) users want to verify or even edit text files with JSON dumps (and I'd rather not use XML).

Is there a way to serialize objects into a UTF-8 JSON string (instead of \uXXXX)?

This doesn't help:

>>> output = json_string.decode('string-escape')

"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"

This works, but if any sub-object is a Python unicode object rather than a UTF-8 byte string, it dumps garbage:

>>> #### ok:

>>> s= json.dumps( "ברי צקלה", ensure_ascii=False)

>>> print json.loads(s)

ברי צקלה

>>> #### NOT ok:

>>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }

>>> print d

{1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94',

2: u'\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94'}

>>> s = json.dumps( d, ensure_ascii=False, encoding='utf8')

>>> print json.loads(s)['1']

ברי צקלה

>>> print json.loads(s)['2']

××¨× ×¦×§××

I searched the json.dumps documentation but couldn't find anything useful.

Edit - Solution(?):

I'll try to sum up the comments and answers by Martijn Pieters:

(edit: second thoughts after @Sebastian's comment, about a year later)

There might be no built-in solution in json.dumps.

I'll have to convert all strings in the object to UTF-8/Unicode before it is JSON-ed.

I'll use Mark's function, which converts strings recursively in a nested object.

The example I gave depends too much on my computer and IDE environment, and doesn't run the same on all computers.

Thank you everybody :)

Solution

Use the ensure_ascii=False switch to json.dumps(), then encode the value to UTF-8 manually:

>>> json_string = json.dumps(u"ברי צקלה", ensure_ascii=False).encode('utf8')

>>> json_string

'"\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94"'

>>> print json_string

"ברי צקלה"

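For readers on Python 3, where every str is already Unicode, the same switch can be sketched roughly as follows. This is an illustration added alongside the Python 2 session above, not part of the original answer, and the variable names are arbitrary.

```python
import json

text = "ברי צקלה"

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(text)

# ensure_ascii=False keeps the characters as they are; the result is a str.
readable = json.dumps(text, ensure_ascii=False)

# Encode explicitly only when bytes are needed (a socket, a binary file, ...).
utf8_bytes = readable.encode("utf8")
```

Here `readable` holds the quoted Hebrew text itself, while `escaped` holds the ASCII-only form shown in the question.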
If you are writing this to a file, you can use io.open() instead of open() to produce a file object that encodes Unicode values for you as you write, then use json.dump() instead to write to that file:

import io

with io.open('filename', 'w', encoding='utf8') as json_file:
    json.dump(u"ברי צקלה", json_file, ensure_ascii=False)

In Python 3, the built-in open() is the same function as io.open(). Do note that there is a bug in the Python 2 json module where the ensure_ascii=False flag can produce a mix of unicode and str objects. The workaround for Python 2 then is:

with io.open('filename', 'w', encoding='utf8') as json_file:
    data = json.dumps(u"ברי צקלה", ensure_ascii=False)
    # unicode(data) auto-decodes data to unicode if str
    json_file.write(unicode(data))
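In Python 3 the unicode() wrapper should not be needed, because json.dump() only ever writes str. A minimal sketch of the file-writing case under that assumption follows; the temporary path and the sample dictionary are made up for the illustration.

```python
import json
import os
import tempfile

data = {"name": "ברי צקלה"}

# Hypothetical file location, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "data.json")

# The built-in open() takes the encoding directly in Python 3.
with open(path, "w", encoding="utf8") as json_file:
    json.dump(data, json_file, ensure_ascii=False)

# Read the raw text back to confirm that no \u escapes were written.
with open(path, encoding="utf8") as json_file:
    raw = json_file.read()
```

Opening the resulting file in a UTF-8 aware editor should show the Hebrew text directly, which is what the question asks for.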

If you are passing in byte strings encoded to UTF-8 (type str in Python 2), make sure to also set the encoding keyword (Python 2 only; in Python 3, json.dumps() accepts neither bytes values nor an encoding argument):

>>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }

>>> d

{1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94', 2: u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'}

>>> s=json.dumps(d, ensure_ascii=False, encoding='utf8')

>>> s

u'{"1": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4", "2": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"}'

>>> json.loads(s)['1']

u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'

>>> json.loads(s)['2']

u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'

>>> print json.loads(s)['1']

ברי צקלה

>>> print json.loads(s)['2']

ברי צקלה
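Since Python 3 has no encoding keyword for json.dumps() and rejects bytes values with a TypeError, byte strings have to be decoded before serialising. A rough sketch of that idea follows. The helper name decode_nested is hypothetical (it is not the function by Mark mentioned in the question), and it assumes every bytes value is valid UTF-8.

```python
import json


def decode_nested(obj, encoding="utf8"):
    """Return a copy of obj with every bytes value decoded to str."""
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {decode_nested(key, encoding): decode_nested(value, encoding)
                for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples become lists, which is how JSON would store them anyway.
        return [decode_nested(item, encoding) for item in obj]
    return obj


# Same shape as the dictionary above: one UTF-8 byte string, one text string.
d = {1: "ברי צקלה".encode("utf8"), 2: "ברי צקלה"}
s = json.dumps(decode_nested(d), ensure_ascii=False)
```

After the conversion both values serialise identically, so json.loads(s)['1'] and json.loads(s)['2'] compare equal.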

Note that your second sample is not valid Unicode; you gave it UTF-8 bytes as a unicode literal, and that would never work:

>>> s = u'\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94'

>>> print s

××¨× ×¦×§××

>>> print s.encode('latin1').decode('utf8')

ברי צקלה

Only when I encode that string to Latin-1 (whose Unicode code points map one-to-one to bytes) and then decode it as UTF-8 do you see the expected output. That has nothing to do with JSON and everything to do with using the wrong input. The result is called mojibake.
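The same round trip can be reproduced in Python 3 to see how such a value comes about. This is only a sketch with arbitrary variable names: it creates the mojibake deliberately by decoding UTF-8 bytes with the wrong codec, then reverses that step.

```python
original = "ברי צקלה"

# Correct UTF-8 bytes, wrongly decoded as Latin-1: one character per byte.
mojibake = original.encode("utf8").decode("latin1")

# Undoing the wrong step recovers the text, as in the session above.
repaired = mojibake.encode("latin1").decode("utf8")
```

The damaged string has 15 characters instead of 8, because each Hebrew letter occupies two bytes in UTF-8 and each byte was turned into its own character.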

If you got that Unicode value from a string literal, it was decoded using the wrong codec. It could be your terminal is mis-configured, or that your text editor saved your source code using a different codec than what you told Python to read the file with. Or you sourced it from a library that applied the wrong codec. This all has nothing to do with the JSON library.
