In terms of performance in Python, are list comprehensions, or functions like map(), filter() and reduce(), faster than a for loop? Why, technically speaking, is it said that they run at C speed, while the for loop runs at the speed of the Python virtual machine?
Suppose that in a game I am developing I need to draw complicated and huge maps using for loops. This question would definitely be relevant, because if a list comprehension, for example, is indeed faster, it would be the better option in order to avoid lags (despite the visual complexity of the code).
Current answer
I wrote a simple script to test the speed, and this is what I found out. Actually, the for loop was the fastest in my case. That really surprised me; check out below (it is calculating the sum of squares).
from functools import reduce
import datetime

def time_it(func, numbers, *args):
    # call func `numbers` times on the first extra argument and print the elapsed time
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print(datetime.datetime.now() - start_t)

def square_sum1(numbers):
    # reduce
    return reduce(lambda sum, next: sum + next ** 2, numbers, 0)

def square_sum2(numbers):
    # for loop
    a = 0
    for i in numbers:
        i = i ** 2
        a += i
    return a

def square_sum3(numbers):
    # map (the lambda squares its argument, despite the name sqrt)
    sqrt = lambda x: x ** 2
    return sum(map(sqrt, numbers))

def square_sum4(numbers):
    # list comprehension
    return sum([int(i) ** 2 for i in numbers])

time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
0:00:00.302000 #Reduce
0:00:00.144000 #For loop
0:00:00.318000 #Map
0:00:00.390000 #List comprehension
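As a side note, a generator expression inside sum() avoids materializing the intermediate list that square_sum4 builds; a minimal sketch in the same style (square_sum5 is an added name, not part of the answer above, and the timing is left for you to run):

import datetime

def square_sum5(numbers):
    # generator expression: sum() consumes it lazily, no intermediate list is built
    return sum(i ** 2 for i in numbers)

start_t = datetime.datetime.now()
for _ in range(100000):
    square_sum5([1, 2, 5, 3, 1, 2, 5, 3])
print(datetime.datetime.now() - start_t)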
Other answers
If you check the information on python.org, you can see this summary:
Version Time (seconds)
Basic loop 3.47
Eliminate dots 2.45
Local variable & no dots 1.79
Using map function 0.54
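Roughly, the variants behind those table rows look like the sketch below (reconstructed from the row labels for Python 3; str.upper, the function names and the sample data are illustrative, and the exact code in the article may differ):

def basic_loop(oldlist):
    # Basic loop: str.upper and newlist.append are looked up on every iteration
    newlist = []
    for word in oldlist:
        newlist.append(str.upper(word))
    return newlist

def local_no_dots(oldlist):
    # Local variable & no dots: hoist both lookups into local names outside the loop
    upper = str.upper
    newlist = []
    append = newlist.append
    for word in oldlist:
        append(upper(word))
    return newlist

def with_map(oldlist):
    # Using map function: the per-item loop runs in C
    return list(map(str.upper, oldlist))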
But you really should read the article above in detail to understand the cause of the performance differences.
I also strongly suggest you use timeit for your timings. At the end of the day, there can be situations where, for example, you need to break out of a for loop as soon as a condition is met. That can be faster than calling map to work out the whole result.
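A minimal sketch of both points, using timeit on some made-up data (the statements and data are for illustration only):

import timeit

setup = "data = list(range(1000))"
loop_stmt = """
r = []
for x in data:
    r.append(x * 2)
"""

# list comprehension vs. the explicit loop, timed with timeit
print(timeit.timeit("[x * 2 for x in data]", setup=setup, number=10000))
print(timeit.timeit(loop_stmt, setup=setup, number=10000))

# an early exit: stop as soon as the condition is met,
# which can beat mapping over the whole sequence and inspecting the result
def first_over(data, limit):
    for x in data:
        if x > limit:
            return x
    return None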
The following are rough guidelines and educated guesses based on experience. You should timeit or profile your concrete use case to get hard numbers, and those numbers may occasionally disagree with the below.
A list comprehension is usually a tiny bit faster than the exactly equivalent for loop (that actually builds a list), most likely because it does not have to look up the list and its append method on every iteration. However, a list comprehension still performs a bytecode-level loop:
>>> dis.dis(<the code object for `[x for x in range(10)]`>)
1 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE
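For comparison, the exactly equivalent explicit loop looks like the sketch below (names are illustrative); the append attribute is looked up and called on every pass:

result = []
for x in range(10):
    result.append(x)   # attribute lookup + method call on each iteration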
Using a list comprehension in place of a loop that does not build a list, nonsensically accumulating a list of meaningless values and then throwing the list away, is often slower because of the overhead of creating and extending the list. List comprehensions aren't magic that is inherently faster than a good old loop.
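In other words (a sketch; tiles and draw_tile are hypothetical names standing in for whatever side-effecting work is being done):

# pointless: builds and then throws away a list full of None values
[draw_tile(t) for t in tiles]

# better: a plain loop, nothing is accumulated
for t in tiles:
    draw_tile(t)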
As for functional list processing functions: While these are written in C and probably outperform equivalent functions written in Python, they are not necessarily the fastest option. Some speed-up is expected if the function passed to them is written in C too. But in most cases using a lambda (or other Python function), the overhead of repeatedly setting up Python stack frames etc. eats up any savings. Simply doing the same work in-line, without function calls (e.g. a list comprehension instead of map or filter) is often slightly faster.
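Concretely, a sketch of the three cases (which one wins depends on the payload, so time your own case):

data = list(range(1000))

squares_map_lambda = list(map(lambda x: x * x, data))   # a Python-level call per element
squares_inline = [x * x for x in data]                  # same work in-line, no call at all
absolutes_map_c = list(map(abs, data))                  # a C function: here map can shine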
Suppose that in a game I am developing I need to draw complicated and huge maps using for loops. This question would definitely be relevant, because if a list comprehension, for example, is indeed faster, it would be the better option in order to avoid lags (despite the visual complexity of the code).
Chances are, if code like this isn't already fast enough when written in good non-"optimized" Python, no amount of Python level micro optimization is going to make it fast enough and you should start thinking about dropping to C. While extensive micro optimizations can often speed up Python code considerably, there is a low (in absolute terms) limit to this. Moreover, even before you hit that ceiling, it becomes simply more cost efficient (15% speedup vs. 300% speed up with the same effort) to bite the bullet and write some C.
You asked specifically about map(), filter() and reduce(), but I assume you want to know about functional programming in general. Having tested this myself on the problem of computing the distances between all points within a set of points, functional programming (using the starmap function from the built-in itertools module) turned out to be slightly slower than for loops (taking 1.25 times as long, in fact). Here is the sample code I used:
import itertools, time, math, random

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

point_set = (Point(0, 0), Point(0, 1), Point(0, 2), Point(0, 3))
n_points = 100
pick_val = lambda: 10 * random.random() - 5
large_set = [Point(pick_val(), pick_val()) for _ in range(n_points)]

# the distance function
f_dist = lambda x0, x1, y0, y1: math.sqrt((x0 - x1) ** 2 + (y0 - y1) ** 2)

# go through each point, get its distance from all remaining points
f_pos = lambda p1, p2: (p1.x, p2.x, p1.y, p2.y)
extract_dists = lambda x: itertools.starmap(f_dist,
                                            itertools.starmap(f_pos,
                                                              itertools.combinations(x, 2)))

print('Distances:', list(extract_dists(point_set)))

t0_f = time.time()
list(extract_dists(large_set))
dt_f = time.time() - t0_f
Is the functional version faster than the procedural one?
def extract_dists_procedural(pts):
    n_pts = len(pts)
    l = []
    # note: unlike the functional version above, this skips the final sqrt,
    # and range(k_p1, n_pts) also includes the pair (k_p1, k_p1)
    for k_p1 in range(n_pts - 1):
        for k_p2 in range(k_p1, n_pts):
            l.append((pts[k_p1].x - pts[k_p2].x) ** 2 +
                     (pts[k_p1].y - pts[k_p2].y) ** 2)
    return l

t0_p = time.time()
list(extract_dists_procedural(large_set))
# using list() on the assumption that
# it eats up as much time as in the functional version
dt_p = time.time() - t0_p

f_vs_p = dt_p / dt_f
if f_vs_p >= 1.0:
    print('Time benefit of functional programming:', f_vs_p,
          'times as fast for', n_points, 'points')
else:
    print('Time penalty of functional programming:', 1 / f_vs_p,
          'times as slow for', n_points, 'points')
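For reference, a middle ground between the two, as a sketch only (it assumes Python 3.8+ for math.dist, reuses the Point objects from the snippet above, and is not part of the timed comparison):

import itertools, math

def extract_dists_comprehension(pts):
    # comprehension over combinations: no lambdas, no nested index bookkeeping
    return [math.dist((p1.x, p1.y), (p2.x, p2.y))
            for p1, p2 in itertools.combinations(pts, 2)]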
To add a twist to Alphii's answer: actually the for loop came second best, about 6 times slower than map.
from functools import reduce
import datetime

def time_it(func, numbers, *args):
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print(datetime.datetime.now() - start_t)

def square_sum1(numbers):
    # reduce
    return reduce(lambda sum, next: sum + next ** 2, numbers, 0)

def square_sum2(numbers):
    # for loop
    a = 0
    for i in numbers:
        a += i ** 2
    return a

def square_sum3(numbers):
    # map; note that in Python 3 the map object is lazy and is never consumed here,
    # and the lambda does not modify a, so this function just returns 0
    a = 0
    map(lambda x: a + x ** 2, numbers)
    return a

def square_sum4(numbers):
    # list comprehension; returns the list of squares rather than their sum
    a = 0
    return [a + i ** 2 for i in numbers]

time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
The main changes were to eliminate the slow sum calls, as well as the probably unnecessary int() in the last case. Putting the for loop and map in the same terms makes it quite factual, actually. Remember that lambdas are a functional concept and theoretically shouldn't have side effects, but, well, they can have side effects, such as adding to a. Results in this case, with Python 3.6.1, Ubuntu 14.04, Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz:
0:00:00.257703 #Reduce
0:00:00.184898 #For loop
0:00:00.031718 #Map
0:00:00.212699 #List comprehension