Python: converting a complex dictionary of strings from Unicode to ASCII

I have a lot of input as multi-level dictionaries parsed from JSON API calls. The strings are all in unicode which means there is a lot of u'stuff like this'. I am using jq to play around with the results and need to convert these results to ASCII.

I know I can write a function to just convert it like that:

def convert(input):
    if isinstance(input, dict):
        ret = {}
        for stuff in input:
            ret = convert(stuff)
    elif isinstance(input, list):
        ret = []
        for i in range(len(input))
            ret = convert(input[i])
    elif isinstance(input, str):
        ret = input.encode('ascii')
    elif :
        ret = input
    return ret

Is this even correct? Not sure. That's not what I want to ask you though.

What I'm asking is, this is a typical brute-force solution to the problem. There must be a better way. A more pythonic way. I'm no expert on algorithms, but this one doesn't look particularly fast either.

So is there a better way? Or if not, can this function be improved...?

Post-answer edit

Mark Amery's answer is correct but I would like to post a modified version of it. His function works on Python 2.7+ and I'm on 2.6 so had to convert it:

def convert(input):
    if isinstance(input, dict):
        return dict((convert(key), convert(value)) for key, value in input.iteritems())
    elif isinstance(input, list):
        return [convert(element) for element in input]
    elif isinstance(input, unicode):
        return input.encode('utf-8')
    else:
        return input

Solution

Recursion seems like the way to go here, but if you're on Python 2.x you want to be checking for unicode, not str. The str type represents a string of bytes, and the unicode type a string of Unicode characters; neither inherits from the other, and it is unicode-type strings that are displayed in the interpreter with a u in front of them.

There's also a little syntax error in your posted code (the trailing elif: should be an else), and you're not returning the same structure in the case where input is either a dictionary or a list. (In the case of a dictionary, you're returning the converted version of the final key; in the case of a list, you're returning the converted version of the final element. Neither is right!)
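To make the structural problem concrete, here is a minimal sketch (the helper names last_only and collect are made up for illustration and are not from the question). Assigning to ret inside the loop rebinds the name on every pass, so only the last item survives; building the container up keeps the whole structure.

```python
def last_only(items):
    # Mirrors the loop in the question: ret is rebound on every iteration,
    # so the empty list it started as is thrown away.
    ret = []
    for item in items:
        ret = item
    return ret


def collect(items):
    # Appending to the list preserves every element.
    ret = []
    for item in items:
        ret.append(item)
    return ret


overwritten = last_only([1, 2, 3])
preserved = collect([1, 2, 3])
```

With the input [1, 2, 3], the first helper hands back just the final element, while the second returns the full list.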

You can also make your code pretty and Pythonic by using comprehensions.

Here, then, is what I'd recommend:

def convert(input):
    if isinstance(input, dict):
        return {convert(key): convert(value) for key, value in input.iteritems()}
    elif isinstance(input, list):
        return [convert(element) for element in input]
    elif isinstance(input, unicode):
        return input.encode('utf-8')
    else:
        return input
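As a usage sketch, here is the same idea applied to a small parsed JSON document. This is a variant rather than the code above verbatim: it assumes you may want it to run on both Python 2.6+ and Python 3, so it uses items() instead of iteritems() and a hypothetical text_type alias in place of the unicode name. The sample JSON is invented for illustration.

```python
import json

# Assumption for portability: unicode on Python 2, str on Python 3.
text_type = type(u'')


def convert(data):
    # Rebuild dicts and lists recursively, encoding text at the leaves.
    if isinstance(data, dict):
        return dict((convert(key), convert(value)) for key, value in data.items())
    elif isinstance(data, list):
        return [convert(element) for element in data]
    elif isinstance(data, text_type):
        return data.encode('utf-8')
    else:
        # Numbers, booleans, None and so on pass through unchanged.
        return data


# Made-up sample input; \u00e9 is a JSON escape for a non-ASCII character.
parsed = json.loads('{"name": "caf\\u00e9", "tags": ["a", "b"], "count": 3}')
result = convert(parsed)
```

Every key and string value in result is now a byte string, the nested list keeps its shape, and the integer is left alone.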

One final thing. I changed encode('ascii') to encode('utf-8'). My reasoning is as follows: any unicode string that contains only characters in the ASCII character set will be represented by the same byte string when encoded in ASCII as when encoded in utf-8, so using utf-8 instead of ASCII cannot break anything and the change will be invisible as long as the unicode strings you're dealing with use only ASCII characters. However, this change extends the scope of the function to be able to handle strings of characters from the entire unicode character set, rather than just ASCII ones, should such a thing ever be necessary.
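The encoding argument can be checked directly. The snippet below is a small sketch with made-up sample strings: for ASCII-only text the two encodings produce identical bytes, while text containing a non-ASCII character can only be encoded as utf-8.

```python
ascii_only = u'hello'
accented = u'caf\xe9'  # contains one non-ASCII character

# Identical bytes when every character is in the ASCII set.
same_bytes = ascii_only.encode('ascii') == ascii_only.encode('utf-8')

# utf-8 can represent the accented character as a two-byte sequence.
utf8_bytes = accented.encode('utf-8')

# The ASCII codec has no representation for it and raises instead.
try:
    accented.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```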
