Convert bytes to a string

I captured the standard output of an external program into a bytes object:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>>
>>> command_stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

I want to convert it to a normal Python string, so that I can print it like this:

>>> print(command_stdout)
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

How do I convert a bytes object to a str with Python 3?

Current answer

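For the specific case of "run a shell command and get its output as text instead of bytes", on Python 3.7 or later you should use subprocess.run and pass text=True (as well as capture_output=True to capture the output):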

import subprocess

command_result = subprocess.run(["ls", "-l"], capture_output=True, text=True)
command_result.stdout  # is a `str` containing your program's stdout

text used to be called universal_newlines; it was changed (well, aliased) in Python 3.7. If you want to support Python versions before 3.7, pass universal_newlines=True instead of text=True.
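
For older interpreters, a minimal sketch might look like the following. It assumes Python 3.5 or 3.6, where capture_output is not available either (it was also added in 3.7), so stdout=subprocess.PIPE is used instead. The helper name run_text is hypothetical, and sys.executable stands in for ls only so that the example runs on any platform.

```python
import subprocess
import sys


def run_text(cmd):
    """Run cmd and return its stdout as str (hypothetical helper)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,     # capture_output=True needs Python 3.7+
        universal_newlines=True,    # pre-3.7 spelling of text=True
    )
    return result.stdout


# A portable stand-in for ['ls', '-l']: ask Python itself to print a line.
output = run_text([sys.executable, "-c", "print('hello')"])
print(output)
```

The same call keeps working on 3.7 and later, because universal_newlines is still accepted there as an alias.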

2019-08-07 14:15:31

Other answers

In Python 3, the default encoding is "utf-8", so you can directly use:

b'hello'.decode()

which is equivalent to

b'hello'.decode(encoding="utf-8")

In Python 2, on the other hand, the encoding defaults to the default string encoding, so you should use:

b'hello'.decode(encoding)

where encoding is the encoding you want.

Note: support for keyword arguments was added in Python 2.7.

2016-06-29 14:21:21

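Try this one. The function ignores every byte that is not valid in the given character set (UTF-8 by default) and returns a clean string. It has been tested on Python 3.6 and later.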

def bin2str(text, encoding='utf-8'):
    """Convert bytes to a Unicode string, dropping any undecodable bytes.

    text: bytes object to decode
    encoding: encoding of the input bytes (default: utf-8)"""

    return text.decode(encoding, 'ignore')

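Here, the function takes the binary data and decodes it: decode converts the bytes to characters using the given character set, the 'ignore' argument drops every byte that is not valid in that character set, and the desired string is returned.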

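If you are not sure about the encoding, sys.getdefaultencoding() returns the interpreter's default string encoding (on Python 3 this is always 'utf-8').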
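
Silently dropping bytes is not the only option. As a rough illustration, the sketch below compares three of the standard error handlers accepted by bytes.decode on the same input; the sample bytes are made up for this example.

```python
# Valid UTF-8 for "café", followed by a stray 0xFF byte that UTF-8 never uses.
raw = b"caf\xc3\xa9 \xff ok"

ignored = raw.decode("utf-8", errors="ignore")            # drops the bad byte
replaced = raw.decode("utf-8", errors="replace")          # inserts U+FFFD
escaped = raw.decode("utf-8", errors="backslashreplace")  # keeps it as \xff

print(repr(ignored))
print(repr(replaced))
print(repr(escaped))
```

'replace' and 'backslashreplace' at least leave a visible trace that something was lost, which can make corrupted input easier to notice than with 'ignore'.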

2021-05-18 19:07:58

Bytes

m=b'This is bytes'

Converting to a string

Method 1

m.decode("utf-8")

m.decode()

Method 2

import codecs
codecs.decode(m,encoding="utf-8")

import codecs
codecs.decode(m)

Method 3

str(m,encoding="utf-8")

str(m)[2:-1]  # slices repr(m); breaks on non-ASCII bytes and on anything repr escapes

Result

'This is bytes'
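
The methods agree for plain ASCII input, but not in general. A small sketch with a non-ASCII value (chosen here purely for illustration) shows how slicing the output of str(m) differs from actually decoding:

```python
m = "\u00b5".encode("utf-8")   # the micro sign, b'\xc2\xb5'

# str(m) without an encoding returns the repr text "b'\\xc2\\xb5'",
# so slicing it yields literal backslash escapes, not the character.
sliced = str(m)[2:-1]

# Decoding interprets the two bytes as one UTF-8 character.
decoded = m.decode("utf-8")

print(sliced, len(sliced))
print(decoded, len(decoded))
```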

2022-06-21 13:18:28

To interpret a byte sequence as text, you have to know the corresponding character encoding:

unicode_text = bytestring.decode(character_encoding)

Example:

>>> b'\xc2\xb5'.decode('utf-8')
'µ'

The ls command may produce output that cannot be interpreted as text. File names on Unix may be any sequence of bytes except slash b'/' and zero b'\0':

>>> open(bytes(range(0x100)).translate(None, b'\0/'), 'w').close()

Trying to decode such byte soup using the utf-8 encoding raises UnicodeDecodeError.
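
As a quick check that does not create any file, the same byte sequence can be decoded directly. This is only a sketch; the variable names are illustrative.

```python
# Every byte value except NUL and '/', i.e. the worst-case Unix file name.
data = bytes(range(0x100)).translate(None, b"\0/")

failed_at = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    failed_at = exc.start   # index of the first byte that could not be decoded
    print("cannot decode byte", hex(data[failed_at]), "at index", failed_at)
```

The exception object carries the offending position, which can help when logging or reporting bad input.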

It can be worse. The decoding may fail silently and produce mojibake if you use a wrong, incompatible encoding:

>>> '—'.encode('utf-8').decode('cp1252')
'â€”'

The data is corrupted, but your program remains unaware that a failure has occurred.

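In general, the character encoding to use is not embedded in the byte sequence itself. You have to communicate this information out-of-band. Some outcomes are more likely than others, which is why the chardet module exists: it can guess the character encoding. A single Python script may use multiple character encodings in different places.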
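
When only a handful of encodings are plausible, one stdlib-only approach is to try them in order and keep the first that decodes without error. The sketch below is a hypothetical helper, not how chardet works, and a clean decode still does not prove the guess is right (see the mojibake example above).

```python
def decode_with_candidates(data, candidates=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper: the candidate list is an assumption supplied by
    the caller, ideally from out-of-band knowledge about the data source.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit")


print(decode_with_candidates("\u2014".encode("utf-8")))  # valid UTF-8
print(decode_with_candidates(b"\x97"))                   # falls back to cp1252
```

Order matters: strict encodings such as utf-8 should come first, because permissive single-byte encodings accept almost any input.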

ls output can be converted to a Python string using the os.fsdecode() function, which succeeds even for undecodable filenames (on Unix it uses sys.getfilesystemencoding() and the surrogateescape error handler):

import os
import subprocess

output = os.fsdecode(subprocess.check_output('ls'))

To get the original bytes back, you can use os.fsencode().
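
The reason the round trip is lossless on Unix is the surrogateescape handler. The sketch below applies that handler directly with utf-8 so that it behaves the same on every platform; the file name is invented for the example.

```python
raw_name = b"report-\xff.txt"   # 0xFF can never appear in valid UTF-8

# The undecodable byte is smuggled through as the lone surrogate U+DCFF.
as_text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogate back into 0xFF.
round_tripped = as_text.encode("utf-8", errors="surrogateescape")

print(ascii(as_text))
print(round_tripped == raw_name)
```

Strings that contain such surrogates cannot be encoded with the default strict handler, so printing them or writing them to a file may raise UnicodeEncodeError.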

If you pass the universal_newlines=True parameter, subprocess uses locale.getpreferredencoding(False) to decode the bytes; on Windows that can be cp1252, for example.

To decode the byte stream on the fly, io.TextIOWrapper() can be used.
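
A minimal sketch of that idea wraps a child process's binary stdout pipe so that lines are decoded as they arrive. The child command and the choice of utf-8 with errors="replace" are assumptions made for this example.

```python
import io
import subprocess
import sys

# A portable stand-in for a real command: a child Python that prints two lines.
child_code = "print('first'); print('second')"

with subprocess.Popen([sys.executable, "-c", child_code],
                      stdout=subprocess.PIPE) as proc:
    # Wrap the binary pipe; iteration yields decoded str lines one at a time.
    reader = io.TextIOWrapper(proc.stdout, encoding="utf-8", errors="replace")
    lines = [line.rstrip("\n") for line in reader]

print(lines)
```

Unlike communicate(), this does not wait for the whole output before decoding, which can matter for long-running commands.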

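Different commands may use different character encodings for their output; for example, the dir internal command (cmd) may use cp437. To decode its output, you can pass the encoding explicitly (Python 3.6+):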

output = subprocess.check_output('dir', shell=True, encoding='cp437')

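The filenames may differ from those returned by os.listdir() (which uses the Windows Unicode API). For example, '\xb6' can be substituted with '\x14': Python's cp437 codec maps b'\x14' to the control character U+0014 instead of U+00B6 (¶). To support filenames with arbitrary Unicode characters, see "Decode PowerShell output possibly containing non-ASCII Unicode characters into a Python string".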

2016-11-16 09:43:26

This joins a list of byte values into a string:

>>> bytes_data = [112, 52, 52]
>>> "".join(map(chr, bytes_data))
'p44'
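
Note that chr maps each integer straight to the code point with the same number, which matches decoding as latin-1 rather than UTF-8. As a sketch of the difference, building a bytes object first and decoding it explicitly gives a different result once multi-byte sequences are involved; the second list is an invented example.

```python
bytes_data = [112, 52, 52]
text = bytes(bytes_data).decode("ascii")        # same result as the chr join

utf8_values = [194, 181]                        # UTF-8 encoding of the micro sign
joined = "".join(map(chr, utf8_values))         # two characters: U+00C2, U+00B5
decoded = bytes(utf8_values).decode("utf-8")    # one character: U+00B5

print(text, joined, decoded)
```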

2012-08-22 12:57:08
