Converting bytes to a string (Python)


愙賗


Article link: https://blog.csdn.net/weixin_30783611/article/details/114714695


I'm using this code to get the standard output of an external program:

>>> from subprocess import *

>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]

The communicate() method returns a bytes object:

>>> command_stdout

b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2\n'

However, I'd like to work with the output as a normal Python string, so that I could print it like this:

>>> print(command_stdout)

-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1

-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2

I thought that's what the binascii.b2a_qp() method is for, but when I tried it, I got the same bytes object back:

>>> binascii.b2a_qp(command_stdout)

b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2\n'

How do I convert the bytes value back to a string? I mean using the "batteries" rather than doing it by hand. I'd also like it to work with Python 3.

Answer #1

I think this way is simple:

>>> bytes_data = [112, 52, 52]
>>> "".join(map(chr, bytes_data))
'p44'
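Mapping each byte through chr() turns byte value N into code point N, which is what the latin-1 codec does. The sketch below (variable names are illustrative, not from the answer) shows the equivalence, and also that the approach garbles data that is actually UTF-8:

```python
# Iterating over a bytes object in Python 3 yields integers.
bytes_data = bytes([112, 52, 52, 233])  # b'p44\xe9'

via_chr = "".join(map(chr, bytes_data))
via_codec = bytes_data.decode("latin-1")
print(via_chr == via_codec)  # both are 'p44é'

# The same trick applied to UTF-8 encoded text produces mojibake,
# because each byte of a multi-byte sequence becomes its own character.
utf8_data = "é".encode("utf-8")          # b'\xc3\xa9'
mojibake = "".join(map(chr, utf8_data))  # two characters instead of one
correct = utf8_data.decode("utf-8")      # 'é'
print(repr(mojibake), repr(correct))
```

So this is only a safe choice when the bytes are known to be ASCII or latin-1.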

Answer #2

To write binary data to, or read binary data from, the standard streams, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').
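As a minimal sketch of that idea, the hypothetical helper write_bytes below (not part of the standard library) writes raw bytes through the .buffer attribute of a text stream. An in-memory stream stands in for stdout so the result can be inspected:

```python
import io
import sys


def write_bytes(data, stream=None):
    """Write raw bytes to the binary layer underneath a text stream."""
    if stream is None:
        stream = sys.stdout
    # Text streams such as sys.stdout expose the binary layer as .buffer;
    # fall back to the stream itself if it is already binary.
    binary = getattr(stream, "buffer", stream)
    binary.write(data)
    binary.flush()


# Demonstration with an in-memory stand-in for sys.stdout.
raw = io.BytesIO()
text_stream = io.TextIOWrapper(raw, encoding="utf-8")
write_bytes(b"abc\n", text_stream)
captured = raw.getvalue()
print(captured)
```

Calling write_bytes(b"abc\n") with no stream writes to the real stdout. Reading works the same way in the other direction, for example sys.stdin.buffer.read().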

Answer #3

Set universal_newlines to True, i.e.

command_stdout = Popen(['ls', '-l'], stdout=PIPE, universal_newlines=True).communicate()[0]
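A small sketch of the difference this flag makes. To stay portable it runs the current Python interpreter as the child process instead of ls; that command is an assumption for illustration only:

```python
import subprocess
import sys

# A child process that prints one line, used in place of `ls -l`.
child = [sys.executable, "-c", "print('hello')"]

# Default: stdout is returned as bytes.
as_bytes = subprocess.run(child, stdout=subprocess.PIPE).stdout

# universal_newlines=True: stdout is decoded to str and line endings
# are normalised to '\n'.
as_text = subprocess.run(
    child, stdout=subprocess.PIPE, universal_newlines=True
).stdout

print(type(as_bytes).__name__, type(as_text).__name__)
print(repr(as_text))
```

In this mode the output is decoded with the locale's preferred encoding, so it suits commands whose output matches the system locale. From Python 3.7 on, text=True is an alias for the same behaviour.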

Answer #4

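If you don't know the encoding, then to read binary input into a string in a way that is compatible with both Python 3 and Python 2, use the ancient MS-DOS CP437 encoding:

import sys

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:  # stream yields bytes, e.g. a file opened in binary mode
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('cp437'))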

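Because the encoding is unknown, expect non-English symbols to come out as cp437 characters. English characters are unaffected, because they are the same in most single-byte encodings and in UTF-8.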
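The reason cp437 is usable for unknown input is that Python's cp437 codec assigns a character to every one of the 256 byte values, so decoding never fails and can be reversed. A quick sketch to check this (variable names are illustrative):

```python
# Every possible byte value, 0x00 through 0xff.
every_byte = bytes(range(256))

# Decoding succeeds for all of them and yields one character per byte.
as_text = every_byte.decode("cp437")

# Encoding the result with the same codec restores the original bytes.
round_trip = as_text.encode("cp437")
print(len(as_text), round_trip == every_byte)

# The sample that breaks strict UTF-8 decoding is accepted here;
# 0xff becomes U+00A0 (no-break space) in cp437.
sample = b"\x00\x01\xffsd".decode("cp437")
print(repr(sample))
```

The decoded text is not necessarily what the author of the data meant, only a lossless stand-in for the bytes.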

Decoding arbitrary binary input as UTF-8 is unsafe, because you may get this:

>>> b'\x00\x01\xffsd'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte

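A similar concern is often raised about latin-1, which was popular in Python 2 code (the Python 2 default codec was ASCII). See the missing points in the ISO 8859-1 codepage layout: the standard leaves 0x80 to 0x9F unassigned. Python's latin-1 codec does map those bytes to control characters, so decoding with it does not raise. The infamous "ordinal not in range" error comes from the ascii codec, which rejects any byte above 0x7F.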
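To see how the codecs mentioned so far behave on the same input in Python 3, here is a short comparison. The helper try_decode is hypothetical and only wraps the exception handling:

```python
data = b"\x00\x01\xffsd"


def try_decode(raw, encoding):
    """Return the decoded text, or a short failure note with the codec's reason."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: " + exc.reason


results = {name: try_decode(data, name) for name in ("utf-8", "ascii", "latin-1")}
for name, outcome in results.items():
    print(name, repr(outcome))
```

With strict error handling, utf-8 and ascii both reject the 0xff byte, while Python's latin-1 codec accepts every byte value and maps it to the code point with the same number.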

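UPDATE 20150604: Python 3 has the surrogateescape error handler (PEP 383), which decodes undecodable bytes into placeholder code points so that the data survives without loss or crashes. It still needs [binary] -> [str] -> [binary] round-trip tests to validate both performance and reliability.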
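A minimal round-trip check of the kind described, using sample bytes chosen for illustration:

```python
# 0x80 and 0xff are not valid on their own in UTF-8.
raw = b"\x80abc\xff"

# Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler turns the surrogates back into bytes.
restored = text.encode("utf-8", "surrogateescape")

print(ascii(text))
print(restored == raw)
```

The decoded string contains lone surrogates, so encoding it with the default strict handler (for instance when printing to a UTF-8 terminal) raises UnicodeEncodeError. It is best treated as a carrier for the bytes rather than as display text.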

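UPDATE 20170116: Thanks to a comment by Nearoo, there is also the option of slash-escaping all unknown bytes with the backslashreplace error handler. This works only in Python 3, so even with this workaround you will still get inconsistent output from different Python versions:

import sys

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:  # stream yields bytes, e.g. a file opened in binary mode
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('utf-8', 'backslashreplace'))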
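For a concrete picture of what backslashreplace produces when decoding (supported for decoding since Python 3.5), compared with the lossy replace handler:

```python
raw = b"\x80abc"

# The invalid byte is kept as a visible escape sequence: backslash, x, 8, 0.
escaped = raw.decode("utf-8", "backslashreplace")

# 'replace' substitutes U+FFFD instead, which discards the byte value.
replaced = raw.decode("utf-8", "replace")

print(escaped)
print(ascii(replaced))
```

Unlike surrogateescape, the escaped form is ordinary printable text, but it cannot be told apart from input that already contained a literal backslash sequence.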

UPDATE 20170119: I decided to implement a slash-escaping decode that works for both Python 2 and Python 3. It should be slower than the cp437 solution, but it should produce identical results on every Python version.

# --- preparation
import codecs

def slashescape(err):
    """Codecs error handler. err is a UnicodeDecodeError instance.

    Return a tuple with a replacement for the undecodable part of the
    input and the position where decoding should continue."""
    # The invalid span may cover more than one byte, so escape each byte.
    bad = bytearray(err.object[err.start:err.end])
    repl = u''.join(u'\\x%02x' % b for b in bad)
    return (repl, err.end)

codecs.register_error('slashescape', slashescape)

# --- processing
stream = [b'\x80abc']

lines = []
for line in stream:
    lines.append(line.decode('utf-8', 'slashescape'))

# lines is now [u'\\x80abc']

Answer #5

While @Aaron Maenpaa's answer works, a user recently asked:

Is there a simpler way? 'fhand.read().decode("ASCII")' [...] is so long!

You can use:

command_stdout.decode()

decode() has default arguments, as the codecs signature shows:

codecs.decode(obj, encoding='utf-8', errors='strict')
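Putting the defaults to use on subprocess output, as a sketch. The child command here runs the current Python interpreter instead of ls so the example stays portable; that choice is an assumption for illustration:

```python
import subprocess
import sys

# A child process that prints two lines, used in place of `ls -l`.
child = [sys.executable, "-c", "print('file1'); print('file2')"]
command_stdout = subprocess.Popen(child, stdout=subprocess.PIPE).communicate()[0]

# bytes.decode() with no arguments means encoding='utf-8', errors='strict'.
text = command_stdout.decode()

# Equivalent spellings of the same conversion.
explicit = command_stdout.decode("utf-8", "strict")
via_str = str(command_stdout, "utf-8")

lines = text.splitlines()
print(lines)
```

The default is fine when the program is known to emit UTF-8 (or plain ASCII). For other encodings, pass the encoding name explicitly.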
