Is there a variant of pool.map in Python's multiprocessing library that supports multiple arguments?
import multiprocessing

text = "test"

def harvester(text, case):
    X = case[0]
    text + str(X)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=6)
    case = RAW_DATASET
    pool.map(harvester(text, case), case, 1)
    pool.close()
    pool.join()
A better solution for Python 2:
from multiprocessing import Pool

# Tuple parameters in a def are Python 2 only (removed in Python 3 by PEP 3113).
def func((i, (a, b))):
    print i, a, b
    return a + b

pool = Pool(3)
pool.map(func, [(0,(1,2)), (1,(2,3)), (2,(3, 4))])
Output:
2 3 4
1 2 3
0 1 2
out[]:
[3, 5, 7]
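For Python 3.3 and later, a minimal sketch of the same idea can use Pool.starmap, which unpacks each tuple into separate positional arguments. The flat (i, a, b) tuples and the run helper are assumptions made for illustration; they are not part of the answer above.

```python
from multiprocessing import Pool

def func(i, a, b):
    # Three ordinary parameters instead of one nested tuple parameter.
    print(i, a, b)
    return a + b

def run(jobs):
    # Hypothetical helper: starmap calls func(*job) for every job
    # and returns the results in input order.
    with Pool(3) as pool:
        return pool.starmap(func, jobs)

if __name__ == '__main__':
    print(run([(0, 1, 2), (1, 2, 3), (2, 3, 4)]))
```

The printed lines from the workers may still appear in any order; only the returned list follows the order of the input.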
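Another simple option is to wrap the function's parameters in a tuple, and then wrap the arguments you pass in tuples as well. This may not be ideal when dealing with large pieces of data; I believe it makes a copy for each tuple.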
from multiprocessing import Pool

# Python 2 syntax: tuple parameter, print statement and xrange.
def f((a,b,c,d)):
    print a,b,c,d
    return a + b + c + d

if __name__ == '__main__':
    p = Pool(10)
    data = [(i+0,i+1,i+2,i+3) for i in xrange(10)]
    print(p.map(f, data))
    p.close()
    p.join()
This gives the output in some random order:
0 1 2 3
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
7 8 9 10
6 7 8 9
8 9 10 11
9 10 11 12
[6, 10, 14, 18, 22, 26, 30, 34, 38, 42]
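The tuple-parameter form above only parses under Python 2. A hedged Python 3 sketch of the same wrapping idea unpacks the tuple inside the function body instead; make_data is a hypothetical helper added here to keep the example self-contained.

```python
from multiprocessing import Pool

def f(args):
    # Unpack the single tuple argument inside the body.
    a, b, c, d = args
    return a + b + c + d

def make_data(n):
    # Hypothetical helper: builds the same 4-tuples as the answer above.
    return [(i, i + 1, i + 2, i + 3) for i in range(n)]

if __name__ == '__main__':
    with Pool(4) as p:
        print(p.map(f, make_data(10)))
```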
For me, the following was a short and simple solution:
from multiprocessing.pool import ThreadPool
from functools import partial
from time import sleep
from random import randint

def dosomething(var, s):
    sleep(randint(1, 5))  # random delay, so the prints arrive out of order
    print(var)
    return var + s

array = ["a", "b", "c", "d", "e"]
with ThreadPool(processes=5) as pool:
    # partial fixes s="2"; map supplies var from array
    resp_ = pool.map(partial(dosomething, s="2"), array)
    print(resp_)
Output:
a
b
d
e
c
['a2', 'b2', 'c2', 'd2', 'e2']
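The same partial trick should also work with a process-based Pool, as long as the wrapped function is defined at module level so it can be pickled. Below is a sketch applied to the harvester function from the question; the cases list is made-up stand-in data (the question's RAW_DATASET is not shown), and a return statement is added so the workers hand something back.

```python
from functools import partial
from multiprocessing import Pool

def harvester(text, case):
    # `case` is one item of the iterable handed to pool.map.
    return text + str(case[0])

if __name__ == '__main__':
    cases = [(1, 'x'), (2, 'y'), (3, 'z')]  # hypothetical stand-in data
    with Pool(processes=2) as pool:
        # partial fixes `text`; pool.map supplies `case` one item at a time.
        print(pool.map(partial(harvester, "test"), cases))
```

Note that the first argument to pool.map is the callable itself, not the result of calling it.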
Another way is to pass a list of lists to a one-argument routine:
import os
from multiprocessing import Pool

def task(args):
    # Python 2 print statement; args is one inner list, e.g. [1, 2]
    print "PID =", os.getpid(), ", arg1 =", args[0], ", arg2 =", args[1]

pool = Pool()
pool.map(task, [
    [1,2],
    [3,4],
    [5,6],
    [7,8]
])
You can then build the list of arguments with whatever method you prefer.
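As an illustration of building such an argument list, here is a small Python 3 sketch using zip, itertools.repeat and itertools.product. The task body (a simple multiplication) and the sample numbers are assumptions chosen only to make the example checkable.

```python
from itertools import product, repeat
from multiprocessing import Pool

def task(args):
    # One-argument routine: unpack the pair inside the body.
    arg1, arg2 = args
    return arg1 * arg2

# Pair two sequences element by element.
paired = list(zip([1, 3, 5, 7], [2, 4, 6, 8]))
# Hold the second argument constant for every call.
constant = list(zip([1, 2, 3], repeat(10)))
# Every combination of the two sequences.
grid = list(product([1, 2], [3, 4]))

if __name__ == '__main__':
    with Pool() as pool:
        print(pool.map(task, paired))
```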