Hi,
Could anyone explain to me how the Python string "é" is mapped to
the binary code "\xe9" in my Python interpreter?
"é" is not present in the 7-bit ASCII table that is the default
encoding, right? So is the mapping "é" -> "\xe9" portable, or is it
(site-)configuration dependent? Can anyone get something
different from "é" when 'print "\xe9"' is executed? If the process
is config-dependent, what kind of config info is used?
Regards,
SB
Solution
Sébastien Boisgérault wrote:
> Hi,
> Could anyone explain to me how the Python string "é" is mapped to
> the binary code "\xe9" in my Python interpreter?
> "é" is not present in the 7-bit ASCII table that is the default
> encoding, right? So is the mapping "é" -> "\xe9" portable, or is it
> (site-)configuration dependent? Can anyone get something
> different from "é" when 'print "\xe9"' is executed? If the process
> is config-dependent, what kind of config info is used?
The default encoding has nothing to do with this. "\xe9" is just a byte.
You can write it to a file (which is basically what the terminal is), and
no default encoding is involved at all.
The default encoding comes into play when you write unicode(!) strings
to a file. The unicode string is then converted to a byte string using
the default encoding, which will fail miserably if the default encoding
is ascii (as it is supposed to be) and your unicode string contains any
"funny" (non-ASCII) characters.
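To make that failure concrete, here is a minimal sketch. It assumes an explicit "ascii" codec as a stand-in for the interpreter's default encoding, and try_encode is a hypothetical helper name, not something from the standard library. The u"" literal keeps the snippet runnable on both Python 2 and Python 3.

```python
# Sketch only: the explicit "ascii" codec stands in for the default encoding.
text = u"\xe9"  # the one-character unicode string "é" (U+00E9)


def try_encode(value, encoding):
    """Return the encoded byte string, or None if the codec cannot encode value."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        return None


ascii_result = try_encode(text, "ascii")     # fails: é is outside 7-bit ASCII
latin1_result = try_encode(text, "latin-1")  # succeeds: the single byte 0xE9

print(ascii_result)
print(repr(latin1_result))
```

The same conversion that ascii rejects goes through with latin-1, which is where the byte 0xE9 from the original question comes from.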
But even if you encode the unicode string explicitly with an encoding
like latin1 or utf-8, the resulting byte string will just be written to
the file. Whether the terminal interprets those bytes correctly is a
totally different question (and not actually controllable by
you/Python).
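A rough illustration of the explicit-encoding case, assuming an in-memory io.BytesIO object as a stand-in for a file or terminal opened in binary mode (the variable names are only for illustration):

```python
import io

text = u"\xe9"  # the unicode string "é"

# The same character becomes different byte strings under different encodings.
latin1_bytes = text.encode("latin-1")  # one byte: 0xE9
utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xA9

# Writing a byte string stores exactly those bytes; no further encoding happens.
buffer = io.BytesIO()  # stand-in for a file opened in binary mode
buffer.write(utf8_bytes)
written = buffer.getvalue()

print(repr(latin1_bytes))
print(repr(utf8_bytes))
print(written == utf8_bytes)
```

Whatever reads the bytes back has to know which encoding produced them; nothing in the byte string itself records that.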
Diez
Sébastien Boisgérault wrote:
> Could anyone explain to me how the Python string "é" is mapped to
> the binary code "\xe9" in my Python interpreter?
in the iso-8859-1 character set, the character é is represented by the code
0xE9 (233 in decimal). there's no mapping going on here; there's only one
character in the string. how it appears on your screen depends on how you
print it, and what encoding your terminal is using.
>>> s = "é"
>>> len(s)
1
>>> ord(s)
233
>>> hex(ord(s))
'0xe9'
>>> s
'\xe9'
>>> print repr(s)
'\xe9'
>>> print s
é
>>> print chr(233)
é
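To illustrate the terminal dependence, here is a small sketch that decodes the same byte the way terminals with different encodings would. The render helper and the list of encodings are assumptions chosen for illustration; a real terminal does this decoding itself, outside of Python. Code points are printed instead of the characters so the output does not depend on the terminal running the snippet.

```python
raw = b"\xe9"  # the single byte from the session above


def render(data, terminal_encoding):
    """Decode data as a terminal using terminal_encoding would interpret it."""
    # "replace" substitutes U+FFFD where the bytes are invalid in that encoding.
    return data.decode(terminal_encoding, "replace")


for name in ("iso-8859-1", "cp1252", "cp437", "cp850", "utf-8"):
    shown = render(raw, name)
    print("%-10s -> U+%04X" % (name, ord(shown)))
```

Only the first two encodings in the list turn the byte into é (U+00E9). The two DOS code pages map it to other characters, and as UTF-8 the lone byte is not valid at all.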
Fredrik Lundh wrote:
> in the iso-8859-1 character set, the character é is represented by the code
> 0xE9 (233 in decimal). there's no mapping going on here; there's only one
> character in the string. how it appears on your screen depends on how you
> print it, and what encoding your terminal is using.
Crystal clear. Thanks!
SB