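What is the difference between a string and a byte string?

I am working with a library that returns a "byte string" (bytes), and I need to convert it to a string.

Is there actually a difference between the two? How are they related, and how do I do the conversion?

Current answer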

Assuming Python 3 (in Python 2, this difference is a little less well-defined): a string is a sequence of characters, i.e. Unicode code points; these are an abstract concept and can't be directly stored on disk. A byte string is a sequence of, unsurprisingly, bytes: things that can be stored on disk. The mapping between them is an encoding. There are quite a lot of these (and infinitely many are possible), and you need to know which one applies in the particular case in order to do the conversion, since a different encoding may map the same bytes to a different string:

>>> b'\xcf\x84o\xcf\x81\xce\xbdo\xcf\x82'.decode('utf-16')
'蓏콯캁澽苏'
>>> b'\xcf\x84o\xcf\x81\xce\xbdo\xcf\x82'.decode('utf-8')
'τoρνoς'

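Once you know which one to use, you can call the .decode() method of the byte string to get the right character string from it, as above. For completeness, the .encode() method of a character string goes the opposite way: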

>>> 'τoρνoς'.encode('utf-8')
b'\xcf\x84o\xcf\x81\xce\xbdo\xcf\x82'
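For the situation in the question, where a library hands back bytes, a minimal sketch of the conversion might look like the following. The variable `raw` is a stand-in for whatever the library returns, and UTF-8 is an assumption; use whichever encoding the library actually documents.

```python
# `raw` stands in for the bytes a library might return (hypothetical value).
raw = b'\xcf\x84o\xcf\x81\xce\xbdo\xcf\x82'

# Assumption: the bytes are UTF-8. Decoding yields a str.
text = raw.decode('utf-8')
print(text)

# The wrong encoding does not always give garbage; it can also fail outright.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    print('not ASCII:', exc.reason)

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
lossy = raw.decode('ascii', errors='replace')
print(lossy)
```

Whether to fail loudly or decode lossily depends on the application; failing is usually the safer default when the encoding is uncertain.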

2011-06-03 07:49:39

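Other answers

Unicode is an agreed-upon format for the binary representation of characters, various kinds of formatting (e.g., lower case/upper case, new line, and carriage return), and other "things" (e.g., emojis). A computer is no less capable of storing a Unicode representation (a series of bits), whether in memory or in a file, than it is of storing an ASCII representation (a different series of bits) or any other representation (a series of bits).

For communication to take place, the parties to the communication must agree on which representation will be used.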

Because Unicode seeks to represent all the possible characters (and other "things") used in inter-human and inter-computer communication, it requires more bits to represent many characters (or things) than systems of representation that cover a more limited set. To "simplify," and perhaps to accommodate historical usage, Unicode text is almost always converted to some specific byte encoding (e.g., UTF-8, or ASCII when the text uses only ASCII characters) for the purpose of storing characters in files.
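To make the point about size concrete, here is a small sketch that counts how many bytes a single character takes under a few encodings. The sample characters are my own choice for illustration, not something from the answer.

```python
# Number of bytes needed for a single character under different encodings.
samples = ['a', 'š', '€', '😀']

for ch in samples:
    utf8 = ch.encode('utf-8')
    utf16 = ch.encode('utf-16-le')   # the -le variant omits the byte-order mark
    utf32 = ch.encode('utf-32-le')
    print(f'U+{ord(ch):04X}  utf-8: {len(utf8)}  utf-16: {len(utf16)}  utf-32: {len(utf32)}')

# ASCII only covers code points below 128, so anything else cannot be encoded.
try:
    'š'.encode('ascii')
except UnicodeEncodeError:
    print('š has no ASCII representation')
```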

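This is not to say that Unicode cannot be used for storing characters in files or transmitting them over a communications channel. It is simply that, usually, it is not.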

The term "string" is not precisely defined. In its common usage, "string" refers to a sequence of characters/things. In a computer, those characters may be stored in any one of many different bit-by-bit representations. A "byte string" is a sequence of bytes (a byte being eight bits), which may hold characters in some particular encoding. Since, these days, programs typically hold characters in memory as Unicode (Python 3's str) and store them in files as byte strings, a conversion (an encoding step) must be applied before characters held in memory are written to a file.

2019-10-03 19:09:48

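Simply put, think of our natural languages such as English, Bengali, Chinese, etc. When spoken, all of these languages produce sound. But do we understand all of them just because we can hear them?

The answer is generally no. So if I say I understand English, it means I know how those sounds are encoded into meaningful English words, and I decode the sounds in the same way to understand them. The same goes for any other language. If you know it, you have the encoder-decoder pack for that language; if you don't, you don't have it.

The same goes for digital systems. Just as we can only hear sounds with our ears and make sounds with our mouths, computers can only store bytes and read bytes. So a given application knows how to read bytes and interpret them (for example, how many bytes to consider in order to understand a piece of information), and it writes them in the same way so that its fellow applications can understand them too. But without that understanding (the encoder-decoder), all data written to a disk is just strings of bytes.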
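As a rough illustration of "just bytes until someone interprets them", the sketch below reads the same four bytes in three different ways. The byte values and the three interpretations are arbitrary choices of mine.

```python
import struct

data = b'abcd'  # four bytes; nothing in them says what they mean

# Read as ASCII text: four one-byte characters.
as_text = data.decode('ascii')

# Read as one 32-bit big-endian unsigned integer.
(as_int,) = struct.unpack('>I', data)

# Read as two 16-bit little-endian unsigned integers.
as_shorts = struct.unpack('<2H', data)

print(as_text, hex(as_int), [hex(n) for n in as_shorts])
```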

2021-07-01 04:47:18

Let's take a simple one-character string 'š' and encode it into a sequence of bytes:

>>> 'š'.encode('utf-8')
b'\xc5\xa1'

For the purposes of this example, let's display the sequence of bytes in binary form:

>>> bin(int(b'\xc5\xa1'.hex(), 16))
'0b1100010110100001'

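Now, it is generally not possible to decode the information back without knowing how it was encoded. Only if you know that the UTF-8 text encoding was used can you follow the UTF-8 decoding algorithm and recover the original string: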

11000101 10100001
   ^^^^^   ^^^^^^
   00101   100001

Dropping the leading zeros of the marked bits gives the binary number 101100001, which you can turn back into a string:

>>> chr(int('101100001', 2))
'š'
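The bit-picking in the diagram can be written out in code. This is only a sketch for the two-byte case: `decode_two_byte_utf8` is a helper made up for illustration, and it skips checks a real decoder performs (such as rejecting overlong encodings). In practice, use `bytes.decode('utf-8')`.

```python
def decode_two_byte_utf8(pair: bytes) -> str:
    """Decode exactly one 2-byte UTF-8 sequence by hand (illustrative helper)."""
    if len(pair) != 2:
        raise ValueError('expected exactly two bytes')
    lead, cont = pair
    # A 2-byte sequence has the bit patterns 110xxxxx 10xxxxxx.
    if (lead & 0b11100000) != 0b11000000 or (cont & 0b11000000) != 0b10000000:
        raise ValueError('not a 2-byte UTF-8 sequence')
    # Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
    code_point = ((lead & 0b00011111) << 6) | (cont & 0b00111111)
    return chr(code_point)


ch = decode_two_byte_utf8(b'\xc5\xa1')
print(ch, hex(ord(ch)))
```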

2020-04-28 13:06:54

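The only thing that a computer can store is bytes.

To store anything in a computer, you must first encode it, i.e., convert it to bytes. For example:

If you want to store music, you must first encode it using MP3, WAV, etc. If you want to store a picture, you must first encode it using PNG, JPEG, etc. If you want to store text, you must first encode it using ASCII, UTF-8, etc.

MP3, WAV, PNG, JPEG, ASCII, and UTF-8 are all examples of encodings. An encoding is a format for representing audio, images, text, etc. in bytes.

In Python, a byte string is just that: a sequence of bytes. It isn't human-readable. Under the hood, everything must be converted to a byte string before it can be stored in a computer.

A character string, on the other hand, often just called a "string", is a sequence of characters. It is human-readable. A character string can't be stored directly in a computer; it has to be encoded first (converted into a byte string). There are multiple encodings through which a character string can be converted into a byte string, such as ASCII and UTF-8.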

'I am a string'.encode('ASCII')

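The Python code above encodes the string 'I am a string' using the ASCII encoding. The result is a byte string. If you print it, Python will represent it as b'I am a string'. Remember, however, that byte strings aren't human-readable; it's just that Python decodes them from ASCII when you print them. In Python, a byte string is represented by a b, followed by the byte string's ASCII representation.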
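A short sketch of what that b'...' display does and does not tell you. The sample text is my own, and UTF-8 is used here only so that some bytes fall outside the ASCII range.

```python
data = 'naïve š'.encode('utf-8')

# Printable ASCII bytes are shown as characters, everything else as \x escapes.
shown = repr(data)
print(shown)

# Indexing or iterating a bytes object yields integers in range(256), not characters.
first = data[0]
print(first, list(data[:3]))

# The length counts bytes, which can exceed the number of characters.
print(len('naïve š'), len(data))
```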

A byte string can be decoded back into a character string, provided you know the encoding that was used to encode it.

b'I am a string'.decode('ASCII')

The code above returns the original string 'I am a string'.

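Encoding and decoding are inverse operations. Everything must be encoded before it can be written to disk, and it must be decoded before it can be read by a human.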
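The round trip through a file can be sketched as follows. The temporary file and the sample text are assumptions made for the example; the point is only that text mode encodes on write and decodes on read, while binary mode exposes the stored bytes as they are.

```python
import os
import tempfile

text = 'I am a string, with š'

# Create a scratch file to write to (removed again at the end).
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    # Writing in text mode encodes the str to bytes using the given encoding.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    # Binary mode returns the raw bytes exactly as stored on disk.
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode with the same encoding decodes them back to a str.
    with open(path, 'r', encoding='utf-8') as f:
        restored = f.read()
finally:
    os.remove(path)

print(raw)
print(restored == text)
```

Passing `encoding=` explicitly avoids depending on the platform default, which can differ between systems.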

2015-07-09 15:46:40
