Why does "bytes(n)" create a byte string of length n instead of converting n to a binary representation?

I was trying to build this bytes object in Python 3:

b'3\r\n'

So I tried the obvious (to me) approach and found some weird behaviour:

>>> bytes(3) + b'\r\n'
b'\x00\x00\x00\r\n'

Apparently:

>>> bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

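Reading the documentation, I could not find any pointers on why the bytes conversion works this way. However, I did find some surprising messages in this Python issue about adding format to bytes (see also Python 3 bytes formatting):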

http://bugs.python.org/issue3982

This interacts even more poorly with oddities like bytes(int) returning zeroes now

and:

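It would be much more convenient for me if bytes(int) returned the ASCIIfication of that int; but honestly, even an error would be better than this behavior. (If I wanted this behavior - which I never have - I'd rather it be a classmethod, invoked like "bytes.zeroes(n)".)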

Can anyone explain to me where this behaviour comes from?

Current answer

>>> chr(116).encode()
b't'
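
One caveat worth noting: chr(i).encode() uses UTF-8 by default, so it yields a single byte only for values below 128. The sketch below contrasts it with the latin-1 encoding and with bytes([i]); the helper name one_byte is made up for illustration and is not part of the answer.

```python
def one_byte(i: int) -> bytes:
    """Hypothetical helper: the single byte whose value is i (0 <= i <= 255)."""
    return bytes([i])

# For ASCII code points all three spellings agree.
assert chr(116).encode() == chr(116).encode('latin-1') == one_byte(116) == b't'

# From 128 upward the default UTF-8 encoding produces two bytes.
print(chr(200).encode())           # b'\xc3\x88'
print(chr(200).encode('latin-1'))  # b'\xc8'
print(one_byte(200))               # b'\xc8'
```

If the goal is always exactly one byte per integer in [0, 255], bytes([i]) or an explicit 'latin-1' encoding avoids this surprise.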

2022-08-03 11:54:19

Other answers

Since Python 3.2 you can use to_bytes:

>>> (1024).to_bytes(2, byteorder='big')
b'\x04\x00'

def int_to_bytes(x: int) -> bytes:
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')
    
def int_from_bytes(xbytes: bytes) -> int:
    return int.from_bytes(xbytes, 'big')

Accordingly, x == int_from_bytes(int_to_bytes(x)). Note that the above encoding works only for unsigned (non-negative) integers.

For signed integers, the bit length is a bit more tricky to calculate:

def int_to_bytes(number: int) -> bytes:
    return number.to_bytes(length=(8 + (number + (number < 0)).bit_length()) // 8, byteorder='big', signed=True)

def int_from_bytes(binary_data: bytes) -> int:
    return int.from_bytes(binary_data, byteorder='big', signed=True)
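
As a quick sanity check, the signed variant can be exercised on a few boundary values. This is only a sketch: signed_to_bytes and signed_from_bytes are renamed copies of the functions above (hypothetical names) so that the block runs on its own, and the list of test values is an arbitrary choice.

```python
def signed_to_bytes(number: int) -> bytes:
    # Same length formula as above: room for the magnitude plus a sign bit.
    length = (8 + (number + (number < 0)).bit_length()) // 8
    return number.to_bytes(length, byteorder='big', signed=True)

def signed_from_bytes(data: bytes) -> int:
    return int.from_bytes(data, byteorder='big', signed=True)

# Values around the one-byte and four-byte signed limits.
for n in (-129, -128, -1, 0, 1, 127, 128, 255, 2**31, -2**31):
    encoded = signed_to_bytes(n)
    assert signed_from_bytes(encoded) == n
    print(n, encoded)
```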

2015-05-21 13:28:45

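I was curious about the performance of various methods for a single int in the range [0, 255], so I decided to do some timing tests.

Based on the timings below, and on the general trend I observed from trying many different values and configurations, struct.pack seems to be the fastest, followed by int.to_bytes and bytes, with str.encode (unsurprisingly) being the slowest. Note that the results show more variation than is represented here, and int.to_bytes and bytes sometimes switched speed ranking during testing, but struct.pack is clearly the fastest.

Results in CPython 3.7 on Windows: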

Testing with 63:
bytes_: 100000 loops, best of 5: 3.3 usec per loop
to_bytes: 100000 loops, best of 5: 2.72 usec per loop
struct_pack: 100000 loops, best of 5: 2.32 usec per loop
chr_encode: 50000 loops, best of 5: 3.66 usec per loop

Test module (named int_to_byte.py):

"""Functions for converting a single int to a bytes object with that int's value."""

import random
import shlex
import struct
import timeit

def bytes_(i):
    """From Tim Pietzcker's answer:
    https://stackoverflow.com/a/21017834/8117067
    """
    return bytes([i])

def to_bytes(i):
    """From brunsgaard's answer:
    https://stackoverflow.com/a/30375198/8117067
    """
    return i.to_bytes(1, byteorder='big')

def struct_pack(i):
    """From Andy Hayden's answer:
    https://stackoverflow.com/a/26920966/8117067
    """
    return struct.pack('B', i)

# Originally, jfs's answer was considered for testing,
# but the result is not identical to the other methods
# https://stackoverflow.com/a/31761722/8117067

def chr_encode(i):
    """Another method, from Quuxplusone's answer here:
    https://codereview.stackexchange.com/a/210789/140921
    
    Similar to g10guang's answer:
    https://stackoverflow.com/a/51558790/8117067
    """
    return chr(i).encode('latin1')

converters = [bytes_, to_bytes, struct_pack, chr_encode]

def one_byte_equality_test():
    """Test that results are identical for ints in the range [0, 255]."""
    for i in range(256):
        results = [c(i) for c in converters]
        # Test that all results are equal
        start = results[0]
        if any(start != b for b in results):
            raise ValueError(results)

def timing_tests(value=None):
    """Test each of the functions with a random int."""
    if value is None:
        # random.randint takes more time than int to byte conversion
        # so it can't be a part of the timeit call
        value = random.randint(0, 255)
    print(f'Testing with {value}:')
    for c in converters:
        print(f'{c.__name__}: ', end='')
        # Uses technique borrowed from https://stackoverflow.com/q/19062202/8117067
        timeit.main(args=shlex.split(
            f"-s 'from int_to_byte import {c.__name__}; value = {value}' " +
            f"'{c.__name__}(value)'"
        ))

2019-01-03 18:37:25

The documentation says:

bytes(int) -> bytes object of size given by the parameter
              initialized with null bytes

The sequence:

b'3\r\n'

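is the character '3' (decimal 51), the character '\r' (13) and the character '\n' (10).

So it has to be built from those values or characters, for example: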

>>> bytes([51, 13, 10])
b'3\r\n'

>>> bytes('3', 'utf8') + b'\r\n'
b'3\r\n'

>>> n = 3
>>> bytes(str(n), 'ascii') + b'\r\n'
b'3\r\n'

Tested on IPython 1.1.0 and Python 3.2.3
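
Wrapped into a small function, the approach above might look like the following sketch. The name int_line is invented for illustration. The %-formatting variant assumes Python 3.5 or later, where bytes gained %-style formatting (PEP 461), so it would not work on the 3.2.3 interpreter mentioned above.

```python
def int_line(n: int) -> bytes:
    """Hypothetical helper: the decimal digits of n as ASCII, followed by CRLF."""
    return str(n).encode('ascii') + b'\r\n'

print(int_line(3))      # b'3\r\n'
print(b'%d\r\n' % 3)    # b'3\r\n' (Python 3.5+)
print(bytes(3))         # b'\x00\x00\x00', three zero bytes rather than the digit
```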

2014-01-09 13:15:12

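The ASCIIfication of 3 is "\x33", not "\x03"!

That is what Python does for str(3), but it would be totally wrong for bytes, as they should be considered arrays of binary data and not be abused as strings.

The easiest way to achieve what you want is bytes((3,)), which is better than bytes([3]) because initializing a list is much more expensive, so never use lists when you can use tuples. Larger integers can be converted with int.to_bytes, for example (3).to_bytes(1, "little").

Initializing bytes with a given length makes sense and is the most useful behaviour, as they are often used to create some type of buffer for which you need memory of a given size allocated. I often use this when initializing arrays or expanding a file by writing zeros to it.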
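
To illustrate the buffer use case, here is a minimal sketch. The sizes and the in-memory file are arbitrary choices for the example. bytearray is used where the buffer has to be modified, since bytes objects are immutable.

```python
import io

# A zero-filled, immutable block of 8 bytes.
padding = bytes(8)
print(padding)            # b'\x00\x00\x00\x00\x00\x00\x00\x00'

# A mutable buffer of the same size that can be filled in later.
buf = bytearray(8)
buf[0:2] = b'OK'
print(bytes(buf))         # b'OK\x00\x00\x00\x00\x00\x00'

# Extending a file (in memory here) by writing zeros to its end.
f = io.BytesIO(b'data')
f.seek(0, io.SEEK_END)
f.write(bytes(4))
print(f.getvalue())       # b'data\x00\x00\x00\x00'
```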

2015-08-01 10:40:11
