Generating an MD5 checksum of a file

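Is there any simple way of generating (and checking) MD5 checksums of a list of files in Python? (I have a small program I'm working on, and I'd like to confirm the checksums of the files.)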

Current answer

In Python 3.11+, there is a new readable and memory-efficient method:

import hashlib
with open(path, "rb") as f:
    digest = hashlib.file_digest(f, "md5")
print(digest.hexdigest())
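
For a list of files, the same call can be wrapped in a small helper. The following is a minimal sketch, not part of the original answer: the names `md5_of_file` and `md5_of_files` are hypothetical, and the chunked fallback is an assumption added for interpreters older than 3.11, where `hashlib.file_digest` does not exist.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path` (hypothetical helper)."""
    with open(path, "rb") as f:
        if hasattr(hashlib, "file_digest"):
            # Python 3.11+: hashlib reads the file in chunks internally.
            return hashlib.file_digest(f, "md5").hexdigest()
        # Assumed fallback for older interpreters: feed fixed-size chunks.
        h = hashlib.md5()
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
        return h.hexdigest()


def md5_of_files(paths):
    """Map each path to its hex MD5 digest."""
    return {path: md5_of_file(path) for path in paths}
```

Calling `md5_of_files(["a.txt", "b.txt"])` would return a dictionary keyed by file name, which is convenient for comparing against checksums recorded earlier.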

2022-11-01 14:49:23

Other answers

Change file_path to the path of your file:

import hashlib

def getMd5(file_path):
    m = hashlib.md5()
    with open(file_path, 'rb') as f:
        # Note: this reads the whole file into memory at once.
        m.update(f.read())
    return m.hexdigest()
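
Since the question also asks about checking checksums, here is a minimal sketch of comparing a file against a known digest. The helper name `md5_matches` and the `expected_hex` parameter are hypothetical, not from the answer above; the file is read in chunks so large files are not loaded into memory at once.

```python
import hashlib
import hmac


def md5_matches(file_path, expected_hex, chunk_size=65536):
    """Return True if the file's MD5 hex digest equals `expected_hex`."""
    m = hashlib.md5()
    with open(file_path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            m.update(chunk)
    # Normalize case and surrounding whitespace before comparing;
    # hmac.compare_digest performs a constant-time comparison.
    return hmac.compare_digest(m.hexdigest(), expected_hex.strip().lower())
```

The normalization step is there because published checksums are sometimes written in upper case or copied with trailing whitespace.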

2021-02-19 07:26:24

You can use simple-file-checksum, which just uses subprocess to call openssl on macOS/Linux and CertUtil on Windows, and extracts only the digest from the output:

Installation:

pip install simple-file-checksum

Usage:

>>> from simple_file_checksum import get_checksum
>>> get_checksum("path/to/file.txt")
'9e107d9d372bb6826bd81d3542a419d6'
>>> get_checksum("path/to/file.txt", algorithm="MD5")
'9e107d9d372bb6826bd81d3542a419d6'

The SHA1, SHA256, SHA384, and SHA512 algorithms are also supported.

Disclosure: I am the author of simple-file-checksum.

2022-07-30 15:15:32

hashlib.md5(pathlib.Path('path/to/file').read_bytes()).hexdigest()

2019-04-24 13:43:14

There is one way that is quite memory-inefficient.

Single file:

import hashlib
def file_as_bytes(file):
    with file:
        return file.read()

print(hashlib.md5(file_as_bytes(open(full_path, 'rb'))).hexdigest())

List of files:

[(fname, hashlib.md5(file_as_bytes(open(fname, 'rb'))).digest()) for fname in fnamelst]

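But remember that MD5 is known to be broken and should not be used for any purpose, since vulnerability analysis can be really tricky, and analyzing every possible future use your code might be put to for security issues is impossible. IMHO, it should be removed from the library outright so that everybody who uses it is forced to update. So, here is what you should do instead: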

[(fname, hashlib.sha256(file_as_bytes(open(fname, 'rb'))).digest()) for fname in fnamelst]

If you only want 128 bits' worth of digest, you can use .digest()[:16].
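
As a small illustration of that truncation (the sample bytes are made up for the example): a SHA-256 digest is 32 bytes, so keeping the first 16 bytes gives a 128-bit value the same size as an MD5 digest. The truncated value is still not an MD5 digest and cannot be compared with one.

```python
import hashlib

data = b"example file contents"  # stand-in for bytes read from a file

full = hashlib.sha256(data).digest()
truncated = full[:16]  # first 128 bits of the SHA-256 digest

print(len(full), len(truncated))  # 32 16
print(truncated.hex())            # 32 hex characters
```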

This will give you a list of tuples, each tuple containing the name of a file and its hash.

Again I strongly question your use of MD5. You should be at least using SHA1, and given recent flaws discovered in SHA1, probably not even that. Some people think that as long as you're not using MD5 for 'cryptographic' purposes, you're fine. But stuff has a tendency to end up being broader in scope than you initially expect, and your casual vulnerability analysis may prove completely flawed. It's best to just get in the habit of using the right algorithm out of the gate. It's just typing a different bunch of letters is all. It's not that hard.

Here is a way that is more complex, but memory-efficient:

import hashlib

def hash_bytestr_iter(bytesiter, hasher, ashexstr=False):
    for block in bytesiter:
        hasher.update(block)
    return hasher.hexdigest() if ashexstr else hasher.digest()

def file_as_blockiter(afile, blocksize=65536):
    with afile:
        block = afile.read(blocksize)
        while len(block) > 0:
            yield block
            block = afile.read(blocksize)


[(fname, hash_bytestr_iter(file_as_blockiter(open(fname, 'rb')), hashlib.md5()))
    for fname in fnamelst]

And, again, since MD5 is broken and should not really be used anymore:

[(fname, hash_bytestr_iter(file_as_blockiter(open(fname, 'rb')), hashlib.sha256()))
    for fname in fnamelst]

Again, you can put [:16] after the call to hash_bytestr_iter(...) if you only want 128 bits' worth of digest.
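
To cover the checking half of the question with the same block-by-block approach, the sketch below hashes a list of files and compares the results to previously recorded digests. The names `hash_file` and `find_mismatches` and the `expected` mapping are hypothetical, and SHA-256 is the default in line with the advice above; passing "md5" for `algorithm` works the same way.

```python
import hashlib


def hash_file(fname, algorithm="sha256", blocksize=65536):
    """Hash a file in fixed-size blocks and return the hex digest."""
    hasher = hashlib.new(algorithm)
    with open(fname, "rb") as afile:
        block = afile.read(blocksize)
        while block:
            hasher.update(block)
            block = afile.read(blocksize)
    return hasher.hexdigest()


def find_mismatches(expected, algorithm="sha256"):
    """Return the file names whose current digest differs from `expected`.

    `expected` maps file name -> hex digest recorded earlier.
    """
    return [fname for fname, digest in expected.items()
            if hash_file(fname, algorithm) != digest.lower()]
```

An empty list from `find_mismatches` means every file still matches its recorded digest.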

2010-08-07 19:53:25
