在我重新发明这个特殊的轮子之前,有没有人有一个很好的用Python计算目录大小的例程?如果该例程能以Mb/Gb等格式格式化大小,那就太好了。


当前回答

问题的第二部分

def human(size):

    B = "B"
    KB = "KB" 
    MB = "MB"
    GB = "GB"
    TB = "TB"
    UNITS = [B, KB, MB, GB, TB]
    HUMANFMT = "%f %s"
    HUMANRADIX = 1024.

    for u in UNITS[:-1]:
        if size < HUMANRADIX : return HUMANFMT % (size, u)
        size /= HUMANRADIX

    return HUMANFMT % (size,  UNITS[-1])

其他回答

当计算子目录的大小时,它应该更新其父目录的文件夹大小,这将一直进行下去,直到它到达根父目录。

下面的函数计算文件夹及其所有子文件夹的大小。

import os

def folder_size(path):
    parent = {}  # path to parent path mapper
    folder_size = {}  # storing the size of directories
    folder = os.path.realpath(path)

    for root, _, filenames in os.walk(folder):
        if root == folder:
            parent[root] = -1  # the root folder will not have any parent
            folder_size[root] = 0.0  # intializing the size to 0

        elif root not in parent:
            immediate_parent_path = os.path.dirname(root)  # extract the immediate parent of the subdirectory
            parent[root] = immediate_parent_path  # store the parent of the subdirectory
            folder_size[root] = 0.0  # initialize the size to 0

        total_size = 0
        for filename in filenames:
            filepath = os.path.join(root, filename)
            total_size += os.stat(filepath).st_size  # computing the size of the files under the directory
        folder_size[root] = total_size  # store the updated size

        temp_path = root  # for subdirectories, we need to update the size of the parent till the root parent
        while parent[temp_path] != -1:
            folder_size[parent[temp_path]] += total_size
            temp_path = parent[temp_path]

    return folder_size[folder]/1000000.0

使用pathlib,我想出了这个一行程序来获取文件夹的大小:

sum(file.stat().st_size for file in Path(folder).rglob('*'))

这是我为一个漂亮的格式化输出:

from pathlib import Path


def get_folder_size(folder):
    return ByteSize(sum(file.stat().st_size for file in Path(folder).rglob('*')))


class ByteSize(int):

    _KB = 1024
    _suffixes = 'B', 'KB', 'MB', 'GB', 'PB'

    def __new__(cls, *args, **kwargs):
        return super().__new__(cls, *args, **kwargs)

    def __init__(self, *args, **kwargs):
        self.bytes = self.B = int(self)
        self.kilobytes = self.KB = self / self._KB**1
        self.megabytes = self.MB = self / self._KB**2
        self.gigabytes = self.GB = self / self._KB**3
        self.petabytes = self.PB = self / self._KB**4
        *suffixes, last = self._suffixes
        suffix = next((
            suffix
            for suffix in suffixes
            if 1 < getattr(self, suffix) < self._KB
        ), last)
        self.readable = suffix, getattr(self, suffix)

        super().__init__()

    def __str__(self):
        return self.__format__('.2f')

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, super().__repr__())

    def __format__(self, format_spec):
        suffix, val = self.readable
        return '{val:{fmt}} {suf}'.format(val=val, fmt=format_spec, suf=suffix)

    def __sub__(self, other):
        return self.__class__(super().__sub__(other))

    def __add__(self, other):
        return self.__class__(super().__add__(other))
    
    def __mul__(self, other):
        return self.__class__(super().__mul__(other))

    def __rsub__(self, other):
        return self.__class__(super().__sub__(other))

    def __radd__(self, other):
        return self.__class__(super().__add__(other))
    
    def __rmul__(self, other):
        return self.__class__(super().__rmul__(other))   

用法:

>>> size = get_folder_size("c:/users/tdavis/downloads")
>>> print(size)
5.81 GB
>>> size.GB
5.810891855508089
>>> size.gigabytes
5.810891855508089
>>> size.PB
0.005674699077644618
>>> size.MB
5950.353260040283
>>> size
ByteSize(6239397620)

我还遇到了这个问题,它有一些更紧凑、可能更高效的打印文件大小的策略。

这有点晚了,但只要安装了glob2和humanize,就行了。注意,在Python 3中,默认的iglob具有递归模式。如何修改Python 3的代码是留给读者的简单练习。

>>> import os
>>> from humanize import naturalsize
>>> from glob2 import iglob
>>> naturalsize(sum(os.path.getsize(x) for x in iglob('/var/**'))))
'546.2 MB'

问题的第二部分

def human(size):

    B = "B"
    KB = "KB" 
    MB = "MB"
    GB = "GB"
    TB = "TB"
    UNITS = [B, KB, MB, GB, TB]
    HUMANFMT = "%f %s"
    HUMANRADIX = 1024.

    for u in UNITS[:-1]:
        if size < HUMANRADIX : return HUMANFMT % (size, u)
        size /= HUMANRADIX

    return HUMANFMT % (size,  UNITS[-1])

Python 3.6+递归文件夹/文件大小使用os.scandir。和@blakev的回答一样强大,但更短,采用EAFP python风格。

import os

def size(path, *, follow_symlinks=False):
    try:
        with os.scandir(path) as it:
            return sum(size(entry, follow_symlinks=follow_symlinks) for entry in it)
    except NotADirectoryError:
        return os.stat(path, follow_symlinks=follow_symlinks).st_size