Python - Unicode to ASCII conversion

I am unable to convert the following Unicode to ASCII without losing data:

u'ABRA\xc3O JOS\xc9'

I tried encode() and decode(), but neither of them works.

Does anyone have a suggestion?

Solution

The Unicode characters u'\xc3' (Ã) and u'\xc9' (É) have no corresponding ASCII values, because ASCII only covers code points 0 to 127. So if you don't want to lose data, you have to encode that data in some form that is valid ASCII. Options include:

>>> s = u'ABRA\xc3O JOS\xc9'
>>> print s.encode('ascii', errors='backslashreplace')
ABRA\xc3O JOS\xc9
>>> print s.encode('ascii', errors='xmlcharrefreplace')
ABRA&#195;O JOS&#201;
>>> print s.encode('unicode-escape')
ABRA\xc3O JOS\xc9
>>> print s.encode('punycode')
ABRAO JOS-jta5e

All of these are ASCII strings that contain all of the information from your original Unicode string, so each of them can be reversed without loss of data. None of them is very pretty for an end user, though, and none of them can be reversed with a plain decode('ascii'): each needs its own matching inverse (for example, decode('punycode') for the punycode form).
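The reversibility claim can be checked with a short sketch. It assumes Python 3 (where str is Unicode and encode() returns bytes, unlike the Python 2 session above) and pairs each encoded form with one possible inverse. Using html.unescape to undo the character references is this sketch's own choice, and the comments note where a given inverse only works under an extra assumption about the input text.

```python
import html

s = 'ABRA\xc3O JOS\xc9'  # the question's string as a Python 3 str

# Strict ASCII encoding is what fails in the question.
try:
    s.encode('ascii')
except UnicodeEncodeError as exc:
    print('strict ascii:', exc.reason)

# Each entry: (ASCII-only bytes, a function that recovers the original str).
forms = {
    # This inverse is only safe if the text contains no literal backslashes.
    'backslashreplace': (s.encode('ascii', errors='backslashreplace'),
                         lambda b: b.decode('unicode-escape')),
    # This inverse is only safe if the text contains no literal '&' characters.
    'xmlcharrefreplace': (s.encode('ascii', errors='xmlcharrefreplace'),
                          lambda b: html.unescape(b.decode('ascii'))),
    'unicode-escape': (s.encode('unicode-escape'),
                       lambda b: b.decode('unicode-escape')),
    'punycode': (s.encode('punycode'),
                 lambda b: b.decode('punycode')),
}

for name, (encoded, restore) in forms.items():
    encoded.decode('ascii')       # succeeds: every byte is below 128
    assert restore(encoded) == s  # lossless round trip
    print(f'{name:18} {encoded!r}')
```

The caveats in the comments are why unicode-escape and punycode are the safer choices when the input text is arbitrary: they escape or encode everything they need to, while the two error handlers leave literal backslashes and ampersands untouched.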

As a side note, when people say "ASCII", they often don't mean ASCII at all, but rather "any 8-bit character set that's a superset of ASCII" or "some particular 8-bit character set that I have in mind". If that's what you meant, the solution is to encode to the right 8-bit character set:

>>> s.encode('utf-8')
'ABRA\xc3\x83O JOS\xc3\x89'
>>> s.encode('cp1252')
'ABRA\xc3O JOS\xc9'
>>> s.encode('iso-8859-15')
'ABRA\xc3O JOS\xc9'
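Why the choice of character set matters can be seen in a small Python 3 sketch (Python 3 bytes semantics are assumed, rather than the Python 2 str shown above): the bytes only round-trip when the decoder matches the encoder, and a mismatch may either raise an error or silently produce the wrong text.

```python
s = 'ABRA\xc3O JOS\xc9'

utf8 = s.encode('utf-8')
cp1252 = s.encode('cp1252')
latin9 = s.encode('iso-8859-15')

# Decoding with the codec that produced the bytes restores the text.
assert utf8.decode('utf-8') == s
assert cp1252.decode('cp1252') == s
assert latin9.decode('iso-8859-15') == s

# UTF-8 bytes read as cp1252 decode without error but give the wrong text.
wrong = utf8.decode('cp1252')
print(ascii(wrong))

# cp1252 bytes read as UTF-8 fail: 0xC3 must be followed by a continuation byte.
try:
    cp1252.decode('utf-8')
except UnicodeDecodeError as exc:
    print('not valid UTF-8:', exc.reason)
```

The silent case is the dangerous one: nothing fails, but the consumer ends up with characters like Ã followed by ƒ instead of Ã.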

The hard part is knowing which character set you meant. If you're writing both the code that produces the 8-bit strings and the code that consumes them, and you don't know any better, you meant UTF-8. If the consumer is, say, the open function, a web browser that you're serving a page to, or something else, things are more complicated, and there's no easy answer without a lot more information.
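When the consumer is a file opened with Python's open, one way to follow the UTF-8 advice is to name the encoding explicitly on both the writing and the reading side instead of relying on the platform default. This is a minimal Python 3 sketch; the temporary directory and the file name names.txt are hypothetical.

```python
import os
import tempfile

s = 'ABRA\xc3O JOS\xc9'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'names.txt')  # hypothetical file name
    # Name the encoding instead of relying on the platform default.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(s)
    with open(path, 'rb') as f:
        raw = f.read()   # the 8-bit bytes actually stored on disk
    with open(path, encoding='utf-8') as f:
        text = f.read()  # decoded with the same codec that wrote it

print(raw)
print(text == s)
```

Because the writer and the reader agree on UTF-8, the file holds the same bytes as s.encode('utf-8') and reads back unchanged.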
