Python strings outside the 128 range

Hi,

Could anyone explain to me how the Python string "é" is mapped to the binary code "\xe9" in my Python interpreter?

"é" is not present in the 7-bit ASCII table that is the default encoding, right? So is the mapping "é" -> "\xe9" portable? Is it (site-)configuration dependent? Can anyone get something different from "é" when 'print "\xe9"' is executed? If the process is config-dependent, what kind of config info is used?

Regards,

SB

Solution

Sébastien Boisgérault wrote:

> Hi,
>
> Could anyone explain to me how the Python string "é" is mapped to the binary code "\xe9" in my Python interpreter?
>
> "é" is not present in the 7-bit ASCII table that is the default encoding, right? So is the mapping "é" -> "\xe9" portable? Is it (site-)configuration dependent? Can anyone get something different from "é" when 'print "\xe9"' is executed? If the process is config-dependent, what kind of config info is used?

The default encoding has nothing to do with this. "\xe9" is just a byte. You can write it into a file (which is basically what the terminal is), and no default encoding is involved whatsoever.
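A minimal Python 3 sketch of this point. The thread itself uses Python 2, where "\xe9" was already a byte string; Python 3 spells the same byte b"\xe9". Opening the file in binary mode means Python performs no encoding step at all. The helper name write_raw_bytes is hypothetical, chosen only for illustration.

```python
import os
import tempfile


def write_raw_bytes(data: bytes) -> bytes:
    """Write bytes to a temporary file in binary mode and read them back.

    Hypothetical helper: no text encoding is involved, so the bytes
    reach the file unchanged.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:  # binary mode: no encode/decode step
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


round_tripped = write_raw_bytes(b"\xe9")
print(round_tripped)       # b'\xe9'
print(len(round_tripped))  # 1
```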

The default encoding comes into play when you write unicode(!) strings to a file. Then the unicode string is converted to a byte string using the default encoding, which will fail miserably if the default encoding is ascii (as it is supposed to be) and your unicode string contains any "funny" characters.
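The failure described above can be reproduced explicitly. This sketch assumes Python 3, where sys.getdefaultencoding() reports 'utf-8', so ASCII has to be forced by hand; try_encode is a hypothetical helper, not a standard function.

```python
import sys
from typing import Optional


def try_encode(text: str, encoding: str) -> Optional[bytes]:
    """Encode text with the given codec, or report the failure and return None."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        bad = text[exc.start:exc.end]
        # ascii() keeps this message printable even on an ASCII-only terminal.
        print(f"{encoding}: cannot encode {ascii(bad)} at position {exc.start}")
        return None


print(sys.getdefaultencoding())          # utf-8 on Python 3
print(try_encode("caf\u00e9", "ascii"))  # reports the failure, then prints None
print(try_encode("caf\u00e9", "utf-8"))  # b'caf\xc3\xa9'
```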

But even if you encode the unicode string explicitly with an encoding like latin1 or utf-8, the resulting byte strings will just be written to the file. Whether the terminal interprets those bytes correctly is a totally different question (and actually not controllable by you/Python).
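To illustrate both halves of that paragraph (and the original question of whether someone could see something other than "é"), this Python 3 sketch encodes "é" explicitly with two codecs, then decodes the single byte 0xE9 the way differently configured terminals might. The codec list is an illustrative assumption: cp437 stands in for an old DOS-style console and utf-8 for a UTF-8 terminal. ascii() is used when printing so the sketch's own output stays plain ASCII.

```python
# Encode the same character explicitly with two codecs: the bytes differ.
text = "\u00e9"                        # "é", LATIN SMALL LETTER E WITH ACUTE
latin1_bytes = text.encode("latin-1")  # b'\xe9'      (one byte)
utf8_bytes = text.encode("utf-8")      # b'\xc3\xa9'  (two bytes)
print(latin1_bytes, utf8_bytes)

# What appears on screen depends on how the receiver decodes the bytes.
# Simulate differently configured terminals decoding the single byte 0xE9:
for codec in ("latin-1", "cp437", "utf-8"):
    shown = b"\xe9".decode(codec, errors="replace")  # invalid UTF-8 -> U+FFFD
    print(f"{codec:8} -> {ascii(shown)}")

# The reverse mismatch: UTF-8 bytes shown by a Latin-1 terminal become "Ã©".
print(ascii(utf8_bytes.decode("latin-1")))  # '\xc3\xa9'
```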

Diez

Sébastien Boisgérault wrote:

> Could anyone explain to me how the Python string "é" is mapped to the binary code "\xe9" in my Python interpreter?

in the iso-8859-1 character set, the character é is represented by the code 0xE9 (233 in decimal). there's no mapping going on here; there's only one character in the string. how it appears on your screen depends on how you print it, and what encoding your terminal is using.
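The "no mapping" remark works because ISO-8859-1 assigns each byte value 0 to 255 to the Unicode code point with the same number (U+0000 to U+00FF). A quick check of that property, assuming Python 3:

```python
# Latin-1 maps byte value i to the code point with the same number i.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

identity = all(ord(ch) == i for i, ch in enumerate(decoded))
print(len(decoded), identity)            # 256 True

e_acute = "\u00e9"                       # "é"
print(ord(e_acute), hex(ord(e_acute)))   # 233 0xe9
print(e_acute.encode("latin-1"))         # b'\xe9'
```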

```python
>>> s = "é"
>>> len(s)
1
>>> ord(s)
233
>>> hex(ord(s))
'0xe9'
>>> s
'\xe9'
>>> print repr(s)
'\xe9'
>>> print s
é
>>> print chr(233)
é
```
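The session above is Python 2 (note the print statements), typed in a terminal that sends é as the single byte 0xE9, as an ISO-8859-1 terminal does. A rough Python 3 counterpart, offered as a sketch, keeps text (str) and bytes apart. Python 3's repr() no longer escapes printable non-ASCII characters, so ascii() is the closest match to the '\xe9' that Python 2's repr showed. Output is kept ASCII-only so the sketch runs regardless of the terminal's encoding.

```python
s = "\u00e9"             # one-character text string: "é"
b = s.encode("latin-1")  # the corresponding single byte, 0xE9

print(len(s), len(b))         # 1 1
print(ascii(s))               # '\xe9'  (ascii() escapes non-ASCII, like Python 2's repr)
print(repr(b))                # b'\xe9' (bytes reprs escape values 0x80-0xFF)
print(repr(s) == "'\u00e9'")  # True: Python 3's repr leaves printable non-ASCII as-is
```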

Fredrik Lundh wrote:

> in the iso-8859-1 character set, the character é is represented by the code 0xE9 (233 in decimal). there's no mapping going on here; there's only one character in the string. how it appears on your screen depends on how you print it, and what encoding your terminal is using.

Crystal clear. Thanks!

SB
