Python ASCII output: how to print ASCII and other available special characters in Python on Windows

I would like to print an ê in Python for Windows. At the DOS prompt I can type alt+136 to get an ê; however, when I try to do this in Python in the DOS console (code page cp437, or cp1252 after chcp 1252), I can't type alt+136 to get the ê character. Why is this?

print(chr(136)) correctly prints ê under code page cp437, but how can I open a unicode file with these characters:

Sokal’, L’vivs’ka Oblastâ€
BucureÅŸti, Romania
ง'⌣'

and get it to print those characters instead of the below gobbledygook:

>>> import codecs
>>> f = codecs.open("unicode.txt", "r", "utf-8")
>>> f.read()
u"Sokal\xe2\u20ac\u2122, L\xe2\u20ac\u2122vivs\xe2\u20ac\u2122ka Oblast\xe2\u20ac\nBucure\xc5\u0178ti, Romania\n\xe0\xb8\u2021'\xe2\u0152\xa3'\nThis text should be in \xe2\u20ac\u0153quotes\xe2\u20ac\\x9d.\nBroken text… it’s ?ubberi?c!"

or even worse:

>>> f = codecs.open("unicode.txt", "r", "utf-8")
>>> print(f.read())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 6-7: character maps to <undefined>

The following

import codecs
f = codecs.open("unicode.txt", "r", "utf-8")
s = f.read()
print(s.encode('utf8'))

prints

Sokal’, L’vivs’ka Oblastâ€
BucureÅŸti, Romania
ง'⌣'
This text should be in “quotesâ€\x9d.
Broken text… it’s ?ubberi?c!

instead of

Sokal’, L’vivs’ka Oblastâ€
BucureÅŸti, Romania
ง'⌣'

I'm using:

Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32

Is there some way to replace ê, etc. in the Unicode string with the printable extended-ASCII version of ê, i.e. chr(136)?

Note that my question is about how to create a new, non-Unicode extended-ASCII string from the original UTF-8 Unicode text, in which non-printable characters are changed to the equivalent characters in the ASCII code page if equivalents are available, or replaced with ? or something similar if no equivalent is available.
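One way to get this behaviour, sketched in Python 3 rather than 2.7 and not taken from the answer below: encode each character to the target code page; if that fails, try whatever the code page can represent of its Unicode NFKD decomposition (so ş falls back to s); otherwise substitute ?. The helper name to_codepage and the default of cp437 are illustrative assumptions.

```python
import unicodedata


def to_codepage(text, codepage="cp437"):
    """Hypothetical helper: encode `text` to `codepage`, approximating missing characters."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codepage)  # an exact equivalent exists
        except UnicodeEncodeError:
            # NFKD splits e.g. 'ş' into 's' + U+0327 (combining cedilla); keep
            # whatever parts the code page can represent, usually the base letter.
            approx = unicodedata.normalize("NFKD", ch).encode(codepage, errors="ignore")
            out += approx or b"?"  # nothing representable: fall back to '?'
    return bytes(out)


print(to_codepage("Bucureşti, ê, ⌣"))  # b'Bucuresti, \x88, ?'
```

When approximations are not needed, the built-in error handler does the simpler job in one call: text.encode('cp437', 'replace') turns every unencodable character into ?.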

Solution

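I see multiple questions; you've stumbled upon several common Unicode issues. First, the same character corresponds to different bytes in different code pages: ê is byte 136 (0x88) in cp437 but byte 234 (0xEA) in cp1252: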

>>> u'ê'.encode('cp437')
b'\x88'
>>> int('88', 16)
136
>>> u'ê'.encode('cp1252')
b'\xea'
>>> int('ea', 16)
234

Why do you get mojibake when you read text from a file? -- Don't print the raw bytes from the file; decode them to Unicode first: io.open('unicode.txt', encoding=encoding).read()
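A short Python 3 sketch of both halves of that advice (the file name demo.txt and the temporary directory are assumptions for illustration). Decoding UTF-8 bytes with the wrong codec yields exactly the kind of mojibake in the question (note BucureÅŸti), while io.open with an explicit encoding returns proper Unicode text. The last step shows a common repair for text that was mis-decoded once; it only works when every character of the mojibake can be re-encoded with the wrong codec.

```python
import io
import os
import tempfile

text = "Bucureşti, ê"
utf8_bytes = text.encode("utf-8")

# Decoding UTF-8 bytes with the wrong codec (here cp1252) produces mojibake.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # BucureÅŸti, Ãª

# Reading with an explicit encoding returns Unicode text, not raw bytes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with io.open(path, encoding="utf-8") as f:
        decoded = f.read()
print(decoded == text)  # True

# A single wrong decode can be undone by reversing it.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == text)  # True
```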

Why does the Python console display u'\u20ac' instead of €? And conversely, how do you display the Unicode character ê using only printable ASCII characters, e.g. u'\xea'? -- The Python REPL uses the (customizable) sys.displayhook() function to display the result of an expression, and it calls repr(), e.g.:

>>> print u'ê'
ê
>>> print repr(u'ê')
u'\xea'
>>> u'ê'
u'\xea'

u'\xea' is a text representation of the corresponding Unicode string. You can use it as a Unicode string literal to create the string in Python source code.
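In Python 3 the roles shift slightly: repr() keeps printable non-ASCII characters, and ascii() produces the escaped, ASCII-only form that Python 2's repr() gave. The sketch below also checks that the escaped form is a valid literal and shows a replacement for sys.displayhook; the hook show_strings_plain, which prints strings without quotes, is a hypothetical example rather than a standard setting.

```python
import ast
import builtins
import sys

s = "\u00ea"  # ê
print(repr(s))   # Python 3's repr() keeps printable non-ASCII: 'ê'
print(ascii(s))  # ASCII-only escaped form, like Python 2's repr(u'ê'): '\xea'

# The escaped form is itself a valid string literal.
assert ast.literal_eval(ascii(s)) == s


def show_strings_plain(value):
    """Hypothetical display hook: show str results unquoted, defer everything else."""
    if isinstance(value, str):
        builtins._ = value  # keep the REPL's `_` working, as the default hook does
        print(value)
    else:
        sys.__displayhook__(value)

# In an interactive session: sys.displayhook = show_strings_plain
```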

It might not be necessary in your case, but in general, to input and display arbitrary Unicode characters in the Windows console, you could install the win-unicode-console package.

Unrelated: print(chr(136)) is incorrect. It will produce the wrong output if the environment uses a character encoding incompatible with yours, e.g.:

>>> print chr(136)

Print Unicode instead:

>>> print unichr(234)
ê

The reason is that chr() returns a bytestring on Python 2. The same byte may represent different characters in different character encodings; that is why you should always use Unicode when you work with text.
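To make this concrete, a Python 3 sketch in which bytes([136]) stands in for Python 2's chr(136): the single byte 0x88 decodes to three different characters under three code pages, whereas a code point such as chr(234) (unichr(234) on Python 2) always means ê and is encoded to whatever byte the target console expects.

```python
import unicodedata

raw = bytes([136])  # stands in for Python 2's chr(136): one byte, 0x88

# The same byte decodes to a different character under each code page.
for codepage in ("cp437", "cp1252", "latin-1"):
    ch = raw.decode(codepage)
    print(codepage, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed control>"))

# A code point is unambiguous; only its encoded bytes depend on the code page.
e = chr(234)
print(e.encode("cp437"), e.encode("cp1252"))  # b'\x88' b'\xea'
```

The loop prints U+00EA (ê) for cp437, U+02C6 (a spacing circumflex) for cp1252 and the C1 control U+0088 for latin-1.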
