Python to ASCII: Python - Unicode to ASCII Conversion

Latest recommended article published 2025-03-12 13:57:30

weixin_39606177

Tags: python to ascii

I am unable to convert the following Unicode to ASCII without losing data:

u'ABRA\xc3O JOS\xc9'

I tried encode and decode and they won’t do it.

Does anyone have a suggestion?

Solution

The Unicode characters u'\xc3' and u'\xc9' do not have any corresponding ASCII values. So, if you don't want to lose data, you have to encode that data in some way that's valid as ASCII. Options include:

>>> print s.encode('ascii', errors='backslashreplace')

ABRA\xc3O JOS\xc9

>>> print s.encode('ascii', errors='xmlcharrefreplace')

ABRA&#195;O JOS&#201;

>>> print s.encode('unicode-escape')

ABRA\xc3O JOS\xc9

>>> print s.encode('punycode')

ABRAO JOS-jta5e

All of these are ASCII strings, and contain all of the information from your original Unicode string (so they can all be reversed without loss of data), but none of them are all that pretty for an end-user (and none of them can be reversed just by decode('ascii')).
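To make the "can be reversed" claim concrete, here is a minimal sketch of undoing each encoding. It assumes Python 3 (where `str` is already Unicode and `encode` returns `bytes`), unlike the Python 2 session above. The names `round_trips` and `results`, and the use of `html.unescape` to undo the XML character references, are illustrative assumptions rather than part of the original answer.

```python
import html

s = 'ABRA\xc3O JOS\xc9'  # the text from the question, as a Python 3 str

# Each entry pairs an ASCII-only encoding of s with one way to undo it.
round_trips = {
    # Assumption: the original text contains no literal backslashes,
    # otherwise undoing backslashreplace this way is ambiguous.
    'backslashreplace': (
        s.encode('ascii', errors='backslashreplace'),
        lambda b: b.decode('unicode_escape'),
    ),
    # Assumption: the original text contains no literal '&...;' sequences,
    # since html.unescape would expand those as well.
    'xmlcharrefreplace': (
        s.encode('ascii', errors='xmlcharrefreplace'),
        lambda b: html.unescape(b.decode('ascii')),
    ),
    'unicode_escape': (
        s.encode('unicode_escape'),
        lambda b: b.decode('unicode_escape'),
    ),
    'punycode': (
        s.encode('punycode'),
        lambda b: b.decode('punycode'),
    ),
}

# True for every scheme whose decoded result equals the original string.
results = {name: undo(encoded) == s
           for name, (encoded, undo) in round_trips.items()}

for name, (encoded, _) in round_trips.items():
    print(name, encoded, results[name])
```

In each case the reverse step is something other than a plain `decode('ascii')`, which is the point made above.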

As a side note, when some people say "ASCII", they really don't mean "ASCII" but rather "any 8-bit character set that's a superset of ASCII" or "some particular 8-bit character set that I have in mind". If that's what you meant, the solution is to encode to the right 8-bit character set:

>>> s.encode('utf-8')

'ABRA\xc3\x83O JOS\xc3\x89'

>>> s.encode('cp1252')

'ABRA\xc3O JOS\xc9'

>>> s.encode('iso-8859-15')

'ABRA\xc3O JOS\xc9'

The hard part is knowing which character set you meant. If you're writing both the code that produces the 8-bit strings and the code that consumes it, and you don't know any better, you meant UTF-8. If the code that consumes the 8-bit strings is, say, the open function or a web browser that you're serving a page to or something else, things are more complicated, and there's no easy answer without a lot more information.
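As a rough illustration of why the choice of character set matters, the sketch below (again assuming Python 3, with illustrative variable names) encodes the string with one codec and decodes it with another. It is only a demonstration of the mismatch, not a way to detect which character set a consumer expects.

```python
s = 'ABRA\xc3O JOS\xc9'

utf8_bytes = s.encode('utf-8')      # two bytes for each non-ASCII character
cp1252_bytes = s.encode('cp1252')   # one byte for each character

# When producer and consumer agree on the codec, the text survives.
same_codec_ok = (utf8_bytes.decode('utf-8') == s
                 and cp1252_bytes.decode('cp1252') == s)

# UTF-8 bytes read as cp1252 decode without an error, but the text is garbled.
garbled = utf8_bytes.decode('cp1252')

# cp1252 bytes read as UTF-8 are not valid UTF-8 and raise an error.
try:
    cp1252_bytes.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(same_codec_ok, repr(garbled), strict_failed)
```

The silent garbling in the first mismatch is the harder failure to notice, which is one reason to settle on UTF-8 when you control both ends.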