A reliable way to handle non-ASCII characters in Python?

Latest recommended article published on 2024-03-15 09:45:43

weixin_39980711


I have a column in a spreadsheet whose header contains non-ASCII characters:

'ï»¿Campaign'

If I pop this string into the interpreter, I get:

'\xc3\xaf\xc2\xbb\xc2\xbfCampaign'

The string is one of the keys in the rows returned by csv.DictReader().

When I try to populate a new dict with the value of this key:

spends['ï»¿Campaign'] = 2

I get:

KeyError: '\xc3\xaf\xc2\xbb\xc2\xbfCampaign'

If I print the keys of row, I can see that the key is actually '\xef\xbb\xbfCampaign'.

Obviously I can just update my program to access this key like so:

spends['\xef\xbb\xbfCampaign']

But is there a "better" way of doing this in Python? Indeed, if the value of this key ever changes to contain other non-ASCII characters, what is an all-encompassing way of handling any and all non-ASCII characters that may arise?

Solution

In general, you should decode a bytestring into Unicode text using the corresponding character encoding as soon as possible on input. And, in reverse, encode Unicode text into a bytestring as late as possible on output. Some APIs such as io.open() can do it implicitly so that your code sees only Unicode.
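To illustrate "decode early" with the header from the question: the three bytes '\xef\xbb\xbf' are the UTF-8 byte order mark (BOM), which some spreadsheet programs write at the start of exported files. Below is a minimal sketch, assuming Python 3 and a UTF-8 file. The file name spend.csv, the sample data, and the helper read_first_line are hypothetical and used only for illustration. Python's 'utf-8-sig' codec decodes UTF-8 and drops a leading BOM if one is present.

```python
import io
import os
import tempfile


def read_first_line(path, encoding="utf-8-sig"):
    """Decode on input: return the first line of the file as Unicode text."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.readline().rstrip("\r\n")


# Hypothetical sample file: raw bytes that start with the UTF-8 BOM.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "spend.csv")
with io.open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfCampaign,Spend\r\nSpring,2\r\n")

# With 'utf-8-sig' the BOM is consumed by the decoder.
clean_header = read_first_line(path)

# With plain 'utf-8' the BOM survives as the character U+FEFF.
raw_header = read_first_line(path, encoding="utf-8")

print(clean_header.split(",")[0] == "Campaign")
print(raw_header.startswith("\ufeff"))
```

Because the code only ever sees decoded text, the key is plain 'Campaign' and no byte escapes need to be hard-coded.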

Unfortunately, the csv module does not support Unicode directly on Python 2. See the UnicodeReader and UnicodeWriter classes in the examples in the csv documentation. You could write an analogous wrapper for csv.DictReader or, as an alternative, just pass UTF-8-encoded bytestrings to the csv module.
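On Python 3 the csv module works on Unicode text, so the wrapper classes are not needed there. The following is a sketch under that assumption (Python 3, UTF-8 input) and is not the Python 2 recipe from the documentation. The helper read_rows, the sample bytes, and the 'Spend' column are hypothetical.

```python
import csv
import io


def read_rows(raw_bytes, encoding="utf-8-sig"):
    """Decode the bytes once, then let csv.DictReader work on text."""
    text = raw_bytes.decode(encoding)  # 'utf-8-sig' drops a leading BOM
    # newline="" leaves line endings untouched so csv can handle them itself.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader)


# Hypothetical export that starts with the UTF-8 BOM, as in the question.
raw = b"\xef\xbb\xbfCampaign,Spend\r\nSpring,2\r\nSummer,3\r\n"

rows = read_rows(raw)

# The header key is now plain 'Campaign', with no BOM bytes in front of it.
spends = {row["Campaign"]: int(row["Spend"]) for row in rows}
print(spends)
```

The same idea applies when reading from disk: open the file in text mode with encoding='utf-8-sig' and newline='' and pass the file object to csv.DictReader.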
