Python 3: How to specify the stdin encoding

While porting code from Python 2 to Python 3, I ran into a problem when reading UTF-8 text from standard input. In Python 2, this works fine:

for line in sys.stdin:
    ...

But Python 3 expects ASCII from sys.stdin, and if there are non-ASCII characters in the input, I get the error:

UnicodeDecodeError: 'ascii' codec can't decode byte .. in position ..: ordinal not in range(128)

For a regular file, I would specify the encoding when opening the file:

with open('filename', 'r', encoding='utf-8') as file:
    for line in file:
        ...

But how can I specify the encoding for standard input? Other SO posts have suggested using

input_stream = codecs.getreader('utf-8')(sys.stdin)
for line in input_stream:
    ...

However, this doesn't work in Python 3. I still get the same error message. I'm using Ubuntu 12.04.2 and my locale is set to en_US.UTF-8.

Solution

Python 3 does not expect ASCII from sys.stdin. It'll open stdin in text mode and make an educated guess as to what encoding is used. That guess may come down to ASCII, but that is not a given. See the sys.stdin documentation on how the codec is selected.
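To see which encoding was actually picked on a given machine, a quick check along these lines may help. It is only a sketch: `stream_encoding` is a hypothetical helper name (not part of the standard library), and the printed values depend entirely on your environment.

```python
import io
import locale
import sys


def stream_encoding(stream):
    """Return the encoding a text stream reports, or None if it has none."""
    return getattr(stream, "encoding", None)


# What Python chose for standard input in this process.
print("sys.stdin encoding:", stream_encoding(sys.stdin))

# The locale's preferred encoding, which often informs that choice.
print("locale preferred encoding:", locale.getpreferredencoding(False))

# A text wrapper created with an explicit codec reports that codec.
ascii_stream = io.TextIOWrapper(io.BytesIO(b""), encoding="ascii")
print("explicit wrapper encoding:", stream_encoding(ascii_stream))
```

If the first line prints an ASCII codec even though you expected UTF-8, the locale seen by the Python process is probably not the one you think it is (for example when the script runs from cron or over a bare ssh session).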

Like other file objects opened in text mode, the sys.stdin object derives from the io.TextIOBase base class; it has a .buffer attribute pointing to the underlying buffered IO instance (which in turn has a .raw attribute).

Wrap the sys.stdin.buffer attribute in a new io.TextIOWrapper() instance to specify a different encoding:

import io
import sys

input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
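As a minimal, self-contained sketch of the same wrapping technique, the example below uses an `io.BytesIO` object as a stand-in for `sys.stdin.buffer` so that it can be run without piping anything in. The helper name `utf8_lines` and the sample text are assumptions made for illustration.

```python
import io


def utf8_lines(byte_stream, errors="strict"):
    """Yield decoded lines from a binary stream such as sys.stdin.buffer."""
    # Wrap the raw bytes in a text layer that decodes them as UTF-8.
    text_stream = io.TextIOWrapper(byte_stream, encoding="utf-8", errors=errors)
    for line in text_stream:
        yield line.rstrip("\n")


# io.BytesIO stands in for sys.stdin.buffer here.
fake_stdin = io.BytesIO("héllo\nwörld\n".encode("utf-8"))
lines = list(utf8_lines(fake_stdin))
print(lines)
```

In a real script you would call `utf8_lines(sys.stdin.buffer)` instead. This also suggests why the `codecs.getreader('utf-8')(sys.stdin)` attempt from the question fails: `sys.stdin` is already a text stream, so the decoding error happens before the reader ever sees the data. Passing `sys.stdin.buffer` to the reader should work as well. On Python 3.7 and later, `sys.stdin.reconfigure(encoding='utf-8')` is another option.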

Alternatively, set the PYTHONIOENCODING environment variable to the desired codec when running python.
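From a shell this usually looks like `PYTHONIOENCODING=utf-8 python3 script.py` (`script.py` being a placeholder for your own script). The sketch below checks the effect from Python itself by starting a child interpreter with the variable set and asking it which encoding it assigned to `sys.stdin`. The variable names are illustrative only.

```python
import codecs
import os
import subprocess
import sys

# The child process simply reports the encoding of its standard input.
child_code = "import sys; print(sys.stdin.encoding)"

# Copy the current environment and override PYTHONIOENCODING for the child.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    stdin=subprocess.DEVNULL,
    stdout=subprocess.PIPE,
    check=True,
)
reported = result.stdout.decode("ascii").strip()

# Normalize the reported name, since the exact spelling may vary by version.
print(codecs.lookup(reported).name)
```

Note that the variable affects `sys.stdout` and `sys.stderr` as well as `sys.stdin`, so it changes how output is encoded too.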
