I'm trying to find a generic solution to print unicode strings from a python script.
The requirements are that it must run in both python 2.7 and 3.x, on any platform, and with any terminal settings and environment variables (e.g. LANG=C or LANG=en_US.UTF-8).
The python print function automatically tries to encode to the terminal encoding when printing, but if the terminal encoding is ascii it fails.
For example, the following works when the environment "LANG=enUS.UTF-8":
x = u'\xea'
print(x)
But it fails in python 2.7 when "LANG=C":
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 0: ordinal not in range(128)
The following works regardless of the LANG setting, but would not properly show unicode characters if the terminal was using a different unicode encoding:
print(x.encode('utf-8'))
The desired behavior would be to always show unicode in the terminal if it is possible and show some encoding if the terminal does not support unicode. For example, the output would be UTF-8 encoded if the terminal only supported ascii. Basically, the goal is to do the same thing as the python print function when it works, but in the cases where the print function fails, use some default encoding.
解决方案
You can handle the LANG=C case by telling sys.stdout to default to UTF-8 in cases when it would otherwise default to ASCII.
import sys, codecs
if sys.stdout.encoding is None or sys.stdout.encoding == 'ANSI_X3.4-1968':
utf8_writer = codecs.getwriter('UTF-8')
if sys.version_info.major < 3:
sys.stdout = utf8_writer(sys.stdout, errors='replace')
else:
sys.stdout = utf8_writer(sys.stdout.buffer, errors='replace')
print(u'\N{snowman}')
The above snippet fulfills your requirements: it works in Python 2.7 and 3.4, and it doesn't break when LANG is in a non-UTF-8 setting such as C.
It is not a new technique, but it's surprisingly hard to find in the documentation. As presented above, it actually respects non-UTF-8 settings such as ISO 8859-*. It only defaults to UTF-8 if Python would have bogusly defaulted to ASCII, breaking the application.