我最近对算法产生了兴趣,并开始通过编写简单的实现来探索算法,然后以各种方式优化它。

我已经熟悉了用于分析运行时的标准Python模块(对于大多数事情,我发现IPython中的timeit魔术函数就足够了),但我也对内存使用感兴趣,所以我也可以探索这些权衡(例如,缓存以前计算值的表的成本与根据需要重新计算它们的成本)。是否有一个模块,将为我分析给定函数的内存使用情况?


当前回答

披露:

仅适用于Linux 报告当前进程作为一个整体使用的内存,而不是其中的各个函数

但它的优点在于简单:

import resource
def using(point=""):
    usage=resource.getrusage(resource.RUSAGE_SELF)
    return '''%s: usertime=%s systime=%s mem=%s mb
           '''%(point,usage[0],usage[1],
                usage[2]/1024.0 )

只要插入using("Label")你想看到发生了什么。例如

print(using("before"))
wrk = ["wasting mem"] * 1000000
print(using("after"))

>>> before: usertime=2.117053 systime=1.703466 mem=53.97265625 mb
>>> after: usertime=2.12023 systime=1.70708 mem=60.8828125 mb

其他回答

也许它有帮助: <看到额外的>

pip install gprof2dot
sudo apt-get install graphviz

gprof2dot -f pstats profile_for_func1_001 | dot -Tpng -o profile.png

def profileit(name):
    """
    @profileit("profile_for_func1_001")
    """
    def inner(func):
        def wrapper(*args, **kwargs):
            prof = cProfile.Profile()
            retval = prof.runcall(func, *args, **kwargs)
            # Note use of name from outer scope
            prof.dump_stats(name)
            return retval
        return wrapper
    return inner

@profileit("profile_for_func1_001")
def func1(...)

Python 3.4包含了一个新模块:tracemalloc。它提供了关于哪些代码分配了最多内存的详细统计信息。下面的示例显示了分配内存的前三行。

from collections import Counter
import linecache
import os
import tracemalloc

def display_top(snapshot, key_type='lineno', limit=3):
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)

    print("Top %s lines" % limit)
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        # replace "/path/to/module/file.py" with "module/file.py"
        filename = os.sep.join(frame.filename.split(os.sep)[-2:])
        print("#%s: %s:%s: %.1f KiB"
              % (index, filename, frame.lineno, stat.size / 1024))
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print('    %s' % line)

    other = top_stats[limit:]
    if other:
        size = sum(stat.size for stat in other)
        print("%s other: %.1f KiB" % (len(other), size / 1024))
    total = sum(stat.size for stat in top_stats)
    print("Total allocated size: %.1f KiB" % (total / 1024))


tracemalloc.start()

counts = Counter()
fname = '/usr/share/dict/american-english'
with open(fname) as words:
    words = list(words)
    for word in words:
        prefix = word[:3]
        counts[prefix] += 1
print('Top prefixes:', counts.most_common(3))

snapshot = tracemalloc.take_snapshot()
display_top(snapshot)

结果如下:

Top prefixes: [('con', 1220), ('dis', 1002), ('pro', 809)]
Top 3 lines
#1: scratches/memory_test.py:37: 6527.1 KiB
    words = list(words)
#2: scratches/memory_test.py:39: 247.7 KiB
    prefix = word[:3]
#3: scratches/memory_test.py:40: 193.0 KiB
    counts[prefix] += 1
4 other: 4.3 KiB
Total allocated size: 6972.1 KiB

什么时候内存泄漏不是泄漏?

That example is great when the memory is still being held at the end of the calculation, but sometimes you have code that allocates a lot of memory and then releases it all. It's not technically a memory leak, but it's using more memory than you think it should. How can you track memory usage when it all gets released? If it's your code, you can probably add some debugging code to take snapshots while it's running. If not, you can start a background thread to monitor memory usage while the main thread runs.

下面是前面的示例,其中所有代码都被移动到count_prefixes()函数中。当该函数返回时,释放所有内存。我还添加了一些sleep()调用来模拟长时间运行的计算。

from collections import Counter
import linecache
import os
import tracemalloc
from time import sleep


def count_prefixes():
    sleep(2)  # Start up time.
    counts = Counter()
    fname = '/usr/share/dict/american-english'
    with open(fname) as words:
        words = list(words)
        for word in words:
            prefix = word[:3]
            counts[prefix] += 1
            sleep(0.0001)
    most_common = counts.most_common(3)
    sleep(3)  # Shut down time.
    return most_common


def main():
    tracemalloc.start()

    most_common = count_prefixes()
    print('Top prefixes:', most_common)

    snapshot = tracemalloc.take_snapshot()
    display_top(snapshot)


def display_top(snapshot, key_type='lineno', limit=3):
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)

    print("Top %s lines" % limit)
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        # replace "/path/to/module/file.py" with "module/file.py"
        filename = os.sep.join(frame.filename.split(os.sep)[-2:])
        print("#%s: %s:%s: %.1f KiB"
              % (index, filename, frame.lineno, stat.size / 1024))
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print('    %s' % line)

    other = top_stats[limit:]
    if other:
        size = sum(stat.size for stat in other)
        print("%s other: %.1f KiB" % (len(other), size / 1024))
    total = sum(stat.size for stat in top_stats)
    print("Total allocated size: %.1f KiB" % (total / 1024))


main()

当我运行这个版本时,内存使用量从6MB下降到4KB,因为函数在完成时释放了所有内存。

Top prefixes: [('con', 1220), ('dis', 1002), ('pro', 809)]
Top 3 lines
#1: collections/__init__.py:537: 0.7 KiB
    self.update(*args, **kwds)
#2: collections/__init__.py:555: 0.6 KiB
    return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
#3: python3.6/heapq.py:569: 0.5 KiB
    result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
10 other: 2.2 KiB
Total allocated size: 4.0 KiB

现在这里有一个受另一个答案启发的版本,它启动第二个线程来监视内存使用情况。

from collections import Counter
import linecache
import os
import tracemalloc
from datetime import datetime
from queue import Queue, Empty
from resource import getrusage, RUSAGE_SELF
from threading import Thread
from time import sleep

def memory_monitor(command_queue: Queue, poll_interval=1):
    tracemalloc.start()
    old_max = 0
    snapshot = None
    while True:
        try:
            command_queue.get(timeout=poll_interval)
            if snapshot is not None:
                print(datetime.now())
                display_top(snapshot)

            return
        except Empty:
            max_rss = getrusage(RUSAGE_SELF).ru_maxrss
            if max_rss > old_max:
                old_max = max_rss
                snapshot = tracemalloc.take_snapshot()
                print(datetime.now(), 'max RSS', max_rss)


def count_prefixes():
    sleep(2)  # Start up time.
    counts = Counter()
    fname = '/usr/share/dict/american-english'
    with open(fname) as words:
        words = list(words)
        for word in words:
            prefix = word[:3]
            counts[prefix] += 1
            sleep(0.0001)
    most_common = counts.most_common(3)
    sleep(3)  # Shut down time.
    return most_common


def main():
    queue = Queue()
    poll_interval = 0.1
    monitor_thread = Thread(target=memory_monitor, args=(queue, poll_interval))
    monitor_thread.start()
    try:
        most_common = count_prefixes()
        print('Top prefixes:', most_common)
    finally:
        queue.put('stop')
        monitor_thread.join()


def display_top(snapshot, key_type='lineno', limit=3):
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)

    print("Top %s lines" % limit)
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        # replace "/path/to/module/file.py" with "module/file.py"
        filename = os.sep.join(frame.filename.split(os.sep)[-2:])
        print("#%s: %s:%s: %.1f KiB"
              % (index, filename, frame.lineno, stat.size / 1024))
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print('    %s' % line)

    other = top_stats[limit:]
    if other:
        size = sum(stat.size for stat in other)
        print("%s other: %.1f KiB" % (len(other), size / 1024))
    total = sum(stat.size for stat in top_stats)
    print("Total allocated size: %.1f KiB" % (total / 1024))


main()

资源模块允许您查看当前内存使用情况,并保存峰值内存使用情况下的快照。该队列让主线程告诉内存监控器线程何时打印报告并关闭。当它运行时,它会显示list()调用所使用的内存:

2018-05-29 10:34:34.441334 max RSS 10188
2018-05-29 10:34:36.475707 max RSS 23588
2018-05-29 10:34:36.616524 max RSS 38104
2018-05-29 10:34:36.772978 max RSS 45924
2018-05-29 10:34:36.929688 max RSS 46824
2018-05-29 10:34:37.087554 max RSS 46852
Top prefixes: [('con', 1220), ('dis', 1002), ('pro', 809)]
2018-05-29 10:34:56.281262
Top 3 lines
#1: scratches/scratch.py:36: 6527.0 KiB
    words = list(words)
#2: scratches/scratch.py:38: 16.4 KiB
    prefix = word[:3]
#3: scratches/scratch.py:39: 10.1 KiB
    counts[prefix] += 1
19 other: 10.8 KiB
Total allocated size: 6564.3 KiB

如果您在Linux上,您可能会发现/proc/self/statm比资源模块更有用。

一个简单的例子,使用memory_profile计算一个代码块/函数的内存使用情况,同时返回函数的结果:

import memory_profiler as mp

def fun(n):
    tmp = []
    for i in range(n):
        tmp.extend(list(range(i*i)))
    return "XXXXX"

在运行代码之前计算内存使用量,然后在代码期间计算最大使用量:

start_mem = mp.memory_usage(max_usage=True)
res = mp.memory_usage(proc=(fun, [100]), max_usage=True, retval=True) 
print('start mem', start_mem)
print('max mem', res[0][0])
print('used mem', res[0][0]-start_mem)
print('fun output', res[1])

在运行函数时计算采样点的使用情况:

res = mp.memory_usage((fun, [100]), interval=.001, retval=True)
print('min mem', min(res[0]))
print('max mem', max(res[0]))
print('used mem', max(res[0])-min(res[0]))
print('fun output', res[1])

学分:@skeept

如果你只想查看一个对象的内存使用情况,(回答其他问题)

有一个叫做Pympler的模块,它包含asizeof 模块。 使用方法如下: 从pympler进口asizeof asizeof.asizeof (my_object) 不像系统。Getsizeof,它适用于你自己创建的对象。 > > > asizeof.asizeof(元组(bcd)) 200 > > > asizeof。Asizeof ({'foo': 'bar', 'baz': 'bar'}) 400 > > > asizeof.asizeof ({}) 280 > > > asizeof.asizeof({“foo”:“酒吧”}) 360 > > > asizeof.asizeof(“foo”) 40 > > > asizeof.asizeof (Bar ()) 352 > > > asizeof.asizeof (Bar () . __dict__) 280

>>> help(asizeof.asizeof)
Help on function asizeof in module pympler.asizeof:

asizeof(*objs, **opts)
    Return the combined size in bytes of all objects passed as positional arguments.

披露:

仅适用于Linux 报告当前进程作为一个整体使用的内存,而不是其中的各个函数

但它的优点在于简单:

import resource
def using(point=""):
    usage=resource.getrusage(resource.RUSAGE_SELF)
    return '''%s: usertime=%s systime=%s mem=%s mb
           '''%(point,usage[0],usage[1],
                usage[2]/1024.0 )

只要插入using("Label")你想看到发生了什么。例如

print(using("before"))
wrk = ["wasting mem"] * 1000000
print(using("after"))

>>> before: usertime=2.117053 systime=1.703466 mem=53.97265625 mb
>>> after: usertime=2.12023 systime=1.70708 mem=60.8828125 mb