What is the difference between a string and a byte string?

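What is Unicode?

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. ... Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.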

So when a computer represents a string, it looks up each character by its unique Unicode number, and those numbers are what is held in memory. But you can't write the string to disk or send it over a network as Unicode numbers directly, because they are just abstract integers with no agreed byte layout. You first have to encode the string into a byte string, using an encoding such as UTF-8. UTF-8 is a character encoding capable of encoding all possible characters, and it stores them as bytes. The encoded string can then be used almost anywhere, because UTF-8 is supported nearly everywhere. When you open a UTF-8 encoded text file from another system, your computer decodes it and displays its characters by their unique Unicode numbers.
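To make the distinction concrete, here is a minimal Python 3 sketch (the variable names are only for illustration). It prints the Unicode number of each character with ord, then shows that the same character turns into different bytes depending on the encoding chosen.

```python
# A string is a sequence of Unicode code points (abstract numbers).
text = '中文'
code_points = [ord(ch) for ch in text]
print(code_points)  # [20013, 25991]

# The same character becomes different bytes under different encodings.
utf8_bytes = '中'.encode('utf-8')
utf16_bytes = '中'.encode('utf-16-le')
print(utf8_bytes.hex())   # e4b8ad
print(utf16_bytes.hex())  # 2d4e
```

The code point is the same in both cases; only the byte layout chosen by the encoding differs.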

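When a browser receives string data encoded in UTF-8 from the network, it decodes the data into a string (assuming the browser is using UTF-8) and displays the string.

In Python 3, you can convert between strings and byte strings: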

>>> print('中文'.encode('utf-8'))
b'\xe4\xb8\xad\xe6\x96\x87'
>>> print(b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8'))
中文

In short, strings are for displaying text for humans to read on a computer, and byte strings are for storing on disk and transmitting data.

2017-04-23 12:52:07

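Put simply, think of our natural languages such as English, Bengali, Chinese, and so on. When spoken, all of these languages produce sounds. But do we understand all of them, even when we hear them?

The answer is generally no. So if I say I understand English, it means I know how those sounds are encoded into meaningful English words, and I simply decode the sounds the same way to understand them. The same goes for any other language. If you know it, you have the encoder-decoder pack for that language; if you don't, you don't have it.

The same goes for digital systems. Just as we can only hear sounds with our ears and make sounds with our mouths, computers can only store bytes and read bytes. So a given application knows how to read bytes and interpret them (for example, how many bytes to consider in order to understand any piece of information), and it writes them in the same way so that other applications can understand them too. But without that understanding (the encoder-decoder), all data written to disk is just strings of bytes.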
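The encoder-decoder analogy can be sketched in Python 3 as follows. This is only an illustration with made-up variable names: the same bytes are decoded once with the codec that produced them, once with a different codec, and once with a codec that cannot interpret them at all.

```python
data = '中文'.encode('utf-8')  # the bytes as they would sit on disk

# Decoding with the codec that was used for encoding recovers the text.
right = data.decode('utf-8')

# A different codec "hears the sounds" but gets the wrong words.
wrong = data.decode('latin-1')

# Some codecs cannot interpret these bytes at all.
try:
    data.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(right)         # 中文
print(len(wrong))    # 6, one character per byte
print(ascii_failed)  # True
```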

2021-07-01 04:47:18

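A string is a bunch of items strung together. A byte string is a sequence of bytes, like b'\xce\xb1\xce\xac', which represents "αά" in UTF-8. A character string is a bunch of characters, like "αά". "String" is synonymous with "sequence".

A byte string can be stored directly on disk, while a character string cannot. The mapping between them is an encoding.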
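As a small Python 3 check of that mapping (assuming UTF-8 as the encoding), the two-character string below becomes a four-byte byte string, and decoding with the same encoding maps it back:

```python
text = 'αά'
encoded = text.encode('utf-8')

print(encoded)       # b'\xce\xb1\xce\xac'
print(len(text))     # 2 characters
print(len(encoded))  # 4 bytes

# Decoding with the same encoding recovers the original characters.
decoded = encoded.decode('utf-8')
print(decoded == text)  # True
```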

2021-09-29 09:29:42

Note: I will elaborate on my answer for Python 3, since the end of life of Python 2 is very close.

Python 3

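bytes consists of sequences of 8-bit unsigned values, while str consists of sequences of Unicode code points that represent textual characters from human languages.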

>>> # bytes
>>> b = b'h\x65llo'
>>> type(b)
<class 'bytes'>
>>> list(b)
[104, 101, 108, 108, 111]
>>> print(b)
b'hello'
>>>
>>> # str
>>> s = 'nai\u0308ve'
>>> type(s)
<class 'str'>
>>> list(s)
['n', 'a', 'i', '̈', 'v', 'e']
>>> print(s)
naïve
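The str example above contains six code points because the diaeresis is stored as a separate combining character (U+0308). As a side illustration, not part of the original answer, the standard unicodedata module can compose it into a single code point, which changes both the length of the str and the bytes it encodes to:

```python
import unicodedata

decomposed = 'nai\u0308ve'  # 'i' followed by the combining diaeresis U+0308
composed = unicodedata.normalize('NFC', decomposed)

print(len(decomposed), len(composed))   # 6 5
print(decomposed == composed)           # False
print(len(decomposed.encode('utf-8')))  # 7
print(len(composed.encode('utf-8')))    # 6
```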

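Even though bytes and str seem to work the same way, their instances are not compatible with each other: bytes and str instances cannot be used together with operators like > and +. In addition, keep in mind that comparing bytes and str instances for equality with == always evaluates to False, even when they contain exactly the same characters.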

>>> # concatenation
>>> b'hi' + b'bye' # this is possible
b'hibye'
>>> 'hi' + 'bye' # this is also possible
'hibye'
>>> b'hi' + 'bye' # this will fail
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat str to bytes
>>> 'hi' + b'bye' # this will also fail
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "bytes") to str
>>>
>>> # comparison
>>> b'red' > b'blue' # this is possible
True
>>> 'red'> 'blue' # this is also possible
True
>>> b'red' > 'blue' # you can't compare bytes with str
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'bytes' and 'str'
>>> 'red' > b'blue' # you can't compare str with bytes
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'str' and 'bytes'
>>> b'blue' == 'red' # equality between str and bytes always evaluates to False
False
>>> b'blue' == 'blue' # equality between str and bytes always evaluates to False
False
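The failures above go away once both operands are the same type. A minimal sketch, assuming the bytes hold UTF-8 text:

```python
raw = b'hi'

# Decode the bytes first if the result should be text...
as_text = raw.decode('utf-8') + 'bye'

# ...or encode the text first if the result should be bytes.
as_bytes = raw + 'bye'.encode('utf-8')

# After decoding, equality behaves as expected.
same = b'blue'.decode('utf-8') == 'blue'

print(as_text)   # hibye
print(as_bytes)  # b'hibye'
print(same)      # True
```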

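Another issue when dealing with bytes and str arises when working with files returned by the open built-in function. On the one hand, if you want to read or write binary data from or to a file, always open the file in a binary mode such as 'rb' or 'wb'. On the other hand, if you want to read or write Unicode data from or to a file, be aware of your computer's default encoding, and pass the encoding parameter where necessary to avoid surprises.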
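A short sketch of both modes, using a temporary file (the file name demo.txt is hypothetical) and an explicit encoding='utf-8' instead of the platform default:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical scratch file

    # Text mode: pass encoding explicitly rather than relying on the default.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('中文')

    # Binary mode: read() returns bytes, with no decoding applied.
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode again: read() returns str, decoded with the given encoding.
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()

print(raw)   # b'\xe4\xb8\xad\xe6\x96\x87'
print(text)  # 中文
```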

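Python 2

str consists of sequences of 8-bit values, while unicode consists of sequences of Unicode characters. One thing to keep in mind is that str and unicode can be used together with operators if the str consists only of 7-bit ASCII characters.

It can be useful to write helper functions that convert between str and unicode in Python 2, and between bytes and str in Python 3.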
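For Python 3, such helpers might look like the sketch below. The names to_str and to_bytes and the UTF-8 default are assumptions made for illustration, not something the answer prescribes.

```python
def to_str(value, encoding='utf-8'):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Return value as bytes, encoding it first if it is str."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


print(to_str(b'hello'))   # hello
print(to_str('hello'))    # hello
print(to_bytes('hello'))  # b'hello'
print(to_bytes(b'hello')) # b'hello'
```

Calling these at the boundaries of a program (file and network I/O) lets the rest of the code work with a single type.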

2017-09-04 13:12:04

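Unicode is an agreed-upon format for the binary representation of characters, of various kinds of formatting (e.g., lower case/upper case, new line, and carriage return), and of other "things" (e.g., emojis). A computer is no less capable of storing a Unicode representation (a series of bits), whether in memory or in a file, than it is of storing an ASCII representation (a different series of bits) or any other representation (a series of bits).

For communication to take place, the parties to the communication must agree on which representation will be used.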

Because Unicode seeks to represent all the possible characters (and other "things") used in inter-human and inter-computer communication, it requires more bits to represent many characters (or things) than systems of representation that cover a more limited set of characters/things. To "simplify," and perhaps to accommodate historical usage, the Unicode representation is almost always converted to some other byte-oriented encoding (e.g., UTF-8, or ASCII for text limited to that range) for the purpose of storing characters in files.

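It is not the case that Unicode cannot be used for storing characters in files or transmitting them over a communications channel. It is simply the case that, in practice, it is not.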

The term "string" is not precisely defined. In its common usage, "string" refers to a sequence of characters/things. In a computer, those characters may be stored in any one of many different bit-by-bit representations. A "byte string" is a sequence of characters stored using a representation built from eight-bit units (eight bits being referred to as a byte). Since, these days, computers use the Unicode system (characters represented by a variable number of bytes) to store characters in memory, and byte strings (sequences of single bytes) to store characters in files, a conversion must be applied before characters represented in memory are moved into storage in files.
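As a rough Python 3 illustration of "a variable number of bytes", the sketch below encodes four characters with UTF-8 and with UTF-32. The sample characters are arbitrary choices, written as escapes so the code points are unambiguous.

```python
# 'a', 'é', '中', and an emoji, written as explicit code points.
sample = 'a\u00e9\u4e2d\U0001f600'

# UTF-8 uses between one and four bytes per character.
utf8_widths = [len(ch.encode('utf-8')) for ch in sample]

# UTF-32 uses four bytes for every character.
utf32_widths = [len(ch.encode('utf-32-le')) for ch in sample]

print(utf8_widths)   # [1, 2, 3, 4]
print(utf32_widths)  # [4, 4, 4, 4]
```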

2019-10-03 19:09:48
