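There are two key differences between imap/imap_unordered and map/map_async:
1. The way they consume the iterable you pass to them.
2. The way they return the results back to you.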
map consumes your iterable by converting the iterable to a list (assuming it isn't a list already), breaking it into chunks, and sending those chunks to the worker processes in the Pool. Breaking the iterable into chunks performs better than passing each item in the iterable between processes one item at a time - particularly if the iterable is large. However, turning the iterable into a list in order to chunk it can have a very high memory cost, since the entire list will need to be kept in memory.
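To make "breaking it into chunks" concrete, here is a rough sketch of the idea. The `chunked` helper is a hypothetical name I made up for illustration; it is not the Pool's actual internal code, just the same general technique of slicing a materialized list into fixed-size batches.

```python
from itertools import islice


def chunked(iterable, chunksize):
    # Hypothetical helper (not part of multiprocessing): yield successive
    # tuples of up to `chunksize` items from the iterable.
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # map() first materializes the whole iterable, roughly like this list() call.
    items = list(range(7))
    print(list(chunked(items, 3)))  # [(0, 1, 2), (3, 4, 5), (6,)]
```

Each tuple would then be sent to a worker as one unit of work, which is why fewer, larger chunks mean less inter-process traffic.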
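imap doesn't turn the iterable you give it into a list, nor does it break it into chunks (by default). It iterates over the iterable one element at a time and sends each element to a worker process. This means you don't take the memory hit of converting the whole iterable to a list, but it also means performance is slower for large iterables, because of the lack of chunking. This can be mitigated by passing a chunksize argument larger than the default of 1.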
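As a minimal sketch of that mitigation, the example below feeds a generator to imap with an explicit chunksize. The names `square`, `numbers`, and `sum_of_squares`, the pool size of 2, and the chunksize of 100 are all arbitrary choices for illustration; the right chunksize depends on your workload.

```python
import multiprocessing


def square(x):
    return x * x


def numbers(n):
    # A generator: items are produced lazily instead of being built as a list.
    for i in range(n):
        yield i


def sum_of_squares(n, chunksize=1):
    # imap iterates over the generator rather than converting it to a list;
    # chunksize > 1 groups items into batches to reduce per-item overhead.
    with multiprocessing.Pool(2) as pool:
        return sum(pool.imap(square, numbers(n), chunksize=chunksize))


if __name__ == "__main__":
    print(sum_of_squares(1000, chunksize=100))
```

The result is the same regardless of chunksize; only the way work is batched between processes changes.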
The other major difference between imap/imap_unordered and map/map_async is that with imap/imap_unordered, you can start receiving results from workers as soon as they're ready, rather than having to wait for all of them to be finished. With map_async, an AsyncResult is returned right away, but you can't actually retrieve results from that object until all of them have been processed, at which point it returns the same list that map does (map is actually implemented internally as map_async(...).get()). There's no way to get partial results; you either have the entire result, or nothing.
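A small sketch of that all-or-nothing behavior: map_async hands back an AsyncResult immediately, and get() blocks until the whole list is ready. The function names and the pool size of 2 here are just assumptions for the example.

```python
import multiprocessing


def add_two(x):
    return x + 2


def compare_map_and_map_async(values):
    with multiprocessing.Pool(2) as pool:
        async_result = pool.map_async(add_two, values)  # returns right away
        # get() blocks until every item has been processed, then returns
        # the complete list in input order.
        from_async = async_result.get()
        from_map = pool.map(add_two, values)
    return from_async, from_map


if __name__ == "__main__":
    print(compare_map_and_map_async([1, 5, 3]))  # ([3, 7, 5], [3, 7, 5])
```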
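imap and imap_unordered both return iterables right away. With imap, results are yielded from the iterable as soon as they're ready, while still preserving the order of the input iterable. With imap_unordered, results are yielded as soon as they're ready, regardless of the order of the input iterable. So, say you have this: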
import multiprocessing
import time

def func(x):
    time.sleep(x)
    return x + 2

if __name__ == "__main__":
    p = multiprocessing.Pool()
    start = time.time()
    for x in p.imap(func, [1,5,3]):
        print("{} (Time elapsed: {}s)".format(x, int(time.time() - start)))
This will output:
3 (Time elapsed: 1s)
7 (Time elapsed: 5s)
5 (Time elapsed: 5s)
If you use p.imap_unordered instead of p.imap, you'll see:
3 (Time elapsed: 1s)
5 (Time elapsed: 3s)
7 (Time elapsed: 5s)
If you use p.map or p.map_async().get(), you'll see:
3 (Time elapsed: 5s)
7 (Time elapsed: 5s)
5 (Time elapsed: 5s)
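So, the primary reasons to use imap/imap_unordered over map_async are:
1. Your iterable is large enough that converting it to a list would cause you to run out of/use too much memory.
2. You want to be able to start processing the results before all of them are completed.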
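For the second reason, a loop like the following lets you act on each result the moment a worker finishes it. This is only a sketch: `slow_add_two` and `collect_as_completed` are made-up names, the sleep is shortened to a tenth of a second per unit, and the order you get back depends on timing.

```python
import multiprocessing
import time


def slow_add_two(x):
    time.sleep(x / 10)  # simulate work that takes longer for larger inputs
    return x + 2


def collect_as_completed(values):
    seen = []
    # One worker per input so that every item starts at the same time.
    with multiprocessing.Pool(len(values)) as pool:
        for result in pool.imap_unordered(slow_add_two, values):
            # Handle each result as soon as it arrives, without waiting
            # for the remaining items to finish.
            seen.append(result)
    return seen


if __name__ == "__main__":
    print(collect_as_completed([1, 5, 3]))
```

Swapping imap_unordered for imap in the loop keeps the incremental delivery but yields results in input order instead.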