简短的回答
您需要将一个类字节对象(bytes, bytearray等)推入base64.b64encode()方法。这里有两种方法:
>>> import base64
>>> data = base64.b64encode(b'data to be encoded')
>>> print(data)
b'ZGF0YSB0byBiZSBlbmNvZGVk'
或者用一个变量:
>>> import base64
>>> string = 'data to be encoded'
>>> data = base64.b64encode(string.encode())
>>> print(data)
b'ZGF0YSB0byBiZSBlbmNvZGVk'
Why?
In Python 3, str objects are not C-style character arrays (so they are not byte arrays), but rather, they are data structures that do not have any inherent encoding. You can encode that string (or interpret it) in a variety of ways. The most common (and default in Python 3) is utf-8, especially since it is backwards compatible with ASCII (although, as are most widely-used encodings). That is what is happening when you take a string and call the .encode() method on it: Python is interpreting the string in utf-8 (the default encoding) and providing you the array of bytes that it corresponds to.
Python中的Base-64编码
最初题目问的是Base-64编码。继续阅读有关Base-64的内容。
base64 encoding takes 6-bit binary chunks and encodes them using the characters A-Z, a-z, 0-9, '+', '/', and '=' (some encodings use different characters in place of '+' and '/'). This is a character encoding that is based off of the mathematical construct of radix-64 or base-64 number system, but they are very different. Base-64 in math is a number system like binary or decimal, and you do this change of radix on the entire number, or (if the radix you're converting from is a power of 2 less than 64) in chunks from right to left.
在base64编码中,转换是从左到右进行的;这前64个字符就是为什么它被称为base64编码。第65个'='符号用于填充,因为编码提取6位块,但它通常意味着编码的数据是8位字节,因此有时最后一个块中只有2位或4位。
例子:
>>> data = b'test'
>>> for byte in data:
... print(format(byte, '08b'), end=" ")
...
01110100 01100101 01110011 01110100
>>>
如果你将二进制数据解释为单个整数,那么你将如何将它转换为base-10和base-64(表为base-64):
base-2: 01 110100 011001 010111 001101 110100 (base-64 grouping shown)
base-10: 1952805748
base-64: B 0 Z X N 0
然而,Base64编码将重新分组此数据:
base-2: 011101 000110 010101 110011 011101 00(0000) <- pad w/zeros to make a clean 6-bit chunk
base-10: 29 6 21 51 29 0
base-64: d G V z d A
So, 'B0ZXN0' is the base-64 version of our binary, mathematically speaking. However, base64 encoding has to do the encoding in the opposite direction (so the raw data is converted to 'dGVzdA') and also has a rule to tell other applications how much space is left off at the end. This is done by padding the end with '=' symbols. So, the base64 encoding of this data is 'dGVzdA==', with two '=' symbols to signify two pairs of bits will need to be removed from the end when this data gets decoded to make it match the original data.
让我们来测试一下,看看我是否不诚实:
>>> encoded = base64.b64encode(data)
>>> print(encoded)
b'dGVzdA=='
为什么使用base64编码?
假设我要通过电子邮件给某人发送一些数据,比如这个数据:
>>> data = b'\x04\x6d\x73\x67\x08\x08\x08\x20\x20\x20'
>>> print(data.decode())
>>> print(data)
b'\x04msg\x08\x08\x08 '
>>>
我制造了两个问题:
If I tried to send that email in Unix, the email would send as soon as the \x04 character was read, because that is ASCII for END-OF-TRANSMISSION (Ctrl-D), so the remaining data would be left out of the transmission.
Also, while Python is smart enough to escape all of my evil control characters when I print the data directly, when that string is decoded as ASCII, you can see that the 'msg' is not there. That is because I used three BACKSPACE characters and three SPACE characters to erase the 'msg'. Thus, even if I didn't have the EOF character there the end user wouldn't be able to translate from the text on screen to the real, raw data.
这只是一个演示,向您展示简单地发送原始数据有多么困难。将数据编码为base64格式可以得到完全相同的数据,但格式可以确保通过电子媒体(如电子邮件)发送数据是安全的。