Convert bytes to a string

I captured the standard output of an external program into a bytes object:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>>
>>> command_stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

I want to convert it to a normal Python string so that I can print it like this:

>>> print(command_stdout)
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

How do I convert the bytes object to a str with Python 3?

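Current answer

Decode the bytes object to produce a string:

>>> b"abcde".decode("utf-8")
'abcde'

The example above assumes that the bytes object is UTF-8 encoded, because that is a common encoding. However, you should use the encoding your data is actually in!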
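To illustrate why the encoding matters, here is a minimal sketch (the sample string is an assumption chosen only for demonstration). The same bytes decode to different text depending on which codec is named, and a wrong codec does not always raise an error:

```python
# Encode a sample string as UTF-8, then decode it two ways.
original = "café"
data = original.encode("utf-8")   # b'caf\xc3\xa9'

right = data.decode("utf-8")      # same codec that produced the bytes
wrong = data.decode("latin-1")    # wrong codec: no error, but garbled text

print(repr(data))
print(right)
print(wrong)
```

The second result is silently wrong rather than an exception, which is why guessing the encoding is risky.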

2009-03-03 12:26:18

Other answers

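Since this question is actually about subprocess output, more direct approaches are available. The most modern is to call subprocess.check_output with text=True (Python 3.7+), which automatically decodes stdout using the system's default (locale) encoding:

import subprocess
text = subprocess.check_output(["ls", "-l"], text=True)

On Python 3.6, Popen accepts an encoding keyword: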

>>> from subprocess import Popen, PIPE
>>> text = Popen(['ls', '-l'], stdout=PIPE, encoding='utf-8').communicate()[0]
>>> type(text)
str
>>> print(text)
total 0
-rw-r--r-- 1 wim badger 0 May 31 12:45 some_file.txt
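As a further sketch of the same idea, subprocess.run (Python 3.5+) also accepts the encoding keyword from Python 3.6 onward. The child command below is an assumption: it uses sys.executable instead of ls only so that the example runs on any platform.

```python
import subprocess
import sys

# Run a child process and let subprocess decode stdout for us.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=subprocess.PIPE,
    encoding="utf-8",  # explicit encoding; this also switches the pipe to text mode
    check=True,        # raise CalledProcessError on a non-zero exit status
)
text = result.stdout   # already a str, no .decode() needed

print(type(text).__name__, repr(text))
```

Naming the encoding explicitly avoids depending on the locale of the machine the script happens to run on.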

If you are not dealing with subprocess output, the general answer to the question in the title is to decode the bytes to text:

>>> b'abcde'.decode()
'abcde'

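With no argument, decode() uses UTF-8, which is also what sys.getdefaultencoding() returns on Python 3. If your data is not UTF-8, you must specify the encoding explicitly in the decode call: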

>>> b'caf\xe9'.decode('cp1250')
'café'
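A short sketch of what happens when the encoding is left out for data that is not UTF-8. The variable names are illustrative only:

```python
data = b"caf\xe9"  # 'café' in cp1250; this byte sequence is not valid UTF-8

failed = False
try:
    data.decode()  # equivalent to data.decode("utf-8", "strict")
except UnicodeDecodeError as exc:
    failed = True
    print("UTF-8 decode failed:", exc.reason)

text = data.decode("cp1250")  # naming the real encoding succeeds
print(text)
```

With the default strict error handling, a mismatch surfaces as UnicodeDecodeError instead of producing bad text.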

2018-05-31 17:52:19

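Try this function. It drops any bytes that are not valid in the given character set (UTF-8 by default) and returns a clean string. It was tested on Python 3.6 and above.

def bin2str(text, encoding='utf-8'):
    """Decode bytes to str, dropping any bytes that are invalid in `encoding`.

    text: bytes object to decode
    encoding: encoding of the input bytes (default: utf-8)
    """
    return text.decode(encoding, 'ignore')

Here the function takes the binary data and decodes it, converting the bytes to characters using the named character set. The 'ignore' argument discards any bytes that are not valid in that character set, and the resulting string is returned.

If you are unsure of the encoding, note that sys.getdefaultencoding() only reports Python's own default, which is always 'utf-8' on Python 3. locale.getpreferredencoding() reports the encoding of the system locale, which is usually closer to what local programs emit.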
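For comparison, here is a small sketch of three standard error handlers that decode() accepts. The sample bytes are an assumption for illustration; 'ignore' is the one used in the function above, and the other two keep a trace of the bad byte instead of dropping it silently:

```python
data = b"abc\xffdef"  # 0xff can never appear in valid UTF-8

ignored = data.decode("utf-8", errors="ignore")            # drops the bad byte
replaced = data.decode("utf-8", errors="replace")          # inserts U+FFFD
escaped = data.decode("utf-8", errors="backslashreplace")  # keeps it as the text \xff

print(repr(ignored))
print(repr(replaced))
print(repr(escaped))
```

Which handler fits depends on whether losing data silently is acceptable for the task at hand.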

2021-05-18 19:07:58

When working with data from Windows systems (with \r\n line endings), my answer is

String = Bytes.decode("utf-8").replace("\r\n", "\n")

Why? Try this with a multiline Input.txt:

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8")
open("Output.txt", "w").write(String)

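On Windows, every line ending will be doubled (to \r\r\n), producing extra blank lines, because writing in text mode turns each \n into \r\n again. Python's text-mode read functions normally normalize line endings so that strings contain only \n. If you receive binary data from a Windows system, Python has no chance to do that. Thus,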

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8").replace("\r\n", "\n")
open("Output.txt", "w").write(String)

will replicate your original file.
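As an alternative sketch (not part of the answer above), the standard io module can apply the same normalization while decoding. The sample bytes are an assumption; io.TextIOWrapper reads with universal newlines by default (newline=None), so \r\n becomes \n:

```python
import io

raw = b"first line\r\nsecond line\r\n"  # bytes as received from a Windows system

# Manual normalization with str.replace.
manual = raw.decode("utf-8").replace("\r\n", "\n")

# Wrapping the bytes in a text stream; the default newline=None
# translates \r\n (and a lone \r) to \n while reading.
stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
wrapped = stream.read()

print(repr(manual))
print(repr(wrapped))
```

Both approaches give the same string here; the stream version also handles a lone \r, which the replace call does not.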

2018-03-16 13:28:25

I think you actually want this:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>> command_text = command_stdout.decode(encoding='windows-1252')

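Aaron's answer was correct, except that you need to know which encoding to use. I believe that Windows uses 'windows-1252'. It only matters if your content contains some unusual (non-ASCII) characters, but then it does make a difference.

By the way, the fact that it matters is the reason Python moved to two different types for binary and text data: it cannot convert between them magically, because it does not know the encoding unless you tell it! The only way you would know is to read the Windows documentation (or read it here).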
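When the encoding really is unknown, one pragmatic approach is to try a short list of candidates in order. The sketch below is only an illustration: decode_with_fallback is a hypothetical helper, not a standard library function, and the candidate list is an assumption. A clean decode does not prove the guess was right, since single-byte codecs such as windows-1252 accept almost any input.

```python
import locale


def decode_with_fallback(data, candidates):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue  # this candidate does not fit; try the next one
    raise ValueError("none of the candidate encodings matched")


# The locale's preferred encoding is often a reasonable first guess
# for output produced by local programs.
print(locale.getpreferredencoding(False))

text, used = decode_with_fallback(b"caf\xe9", ["utf-8", "windows-1252"])
print(text, used)
```

Putting strict codecs such as UTF-8 first matters, because they reject mismatched input while the permissive ones rarely do.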

2011-07-18 19:51:15
