我想了很久了。就像题目说的,哪个更快,是实际函数还是简单地取1 / 2次幂?
更新
这不是一个过早优化的问题。这只是一个底层代码如何实际工作的问题。Python代码的工作原理是什么?
我给Guido van Rossum发了一封邮件,因为我真的很想知道这些方法的区别。
我的电子邮件:
在Python中至少有3种方法来求平方根:math。返回值,
'**'运算符和pow(x,.5)。我只是好奇它们之间的区别
每一个的实现。说到效率
是更好吗?
他的回答:
Pow和**是等价的;数学。根号方根不适用于复数,
并链接到C的sqrt()函数。至于哪一个是
快点,我不知道……
python要优化的是可读性。为此,我认为显式地使用平方根函数是最好的。话虽如此,我们还是来研究一下性能。
我为Python 3更新了Claudiu的代码,并使其不可能优化计算(未来一个优秀的Python编译器可能会做的事情):
from sys import version
from time import time
from math import sqrt, pi, e
print(version)
N = 1_000_000
def timeit1():
z = N * e
s = time()
for n in range(N):
z += (n * pi) ** .5 - z ** .5
print (f"Took {(time() - s):.4f} seconds to calculate {z}")
def timeit2():
z = N * e
s = time()
for n in range(N):
z += sqrt(n * pi) - sqrt(z)
print (f"Took {(time() - s):.4f} seconds to calculate {z}")
def timeit3(arg=sqrt):
z = N * e
s = time()
for n in range(N):
z += arg(n * pi) - arg(z)
print (f"Took {(time() - s):.4f} seconds to calculate {z}")
timeit1()
timeit2()
timeit3()
结果不同,但一个示例输出是:
3.6.6 (default, Jul 19 2018, 14:25:17)
[GCC 8.1.1 20180712 (Red Hat 8.1.1-5)]
Took 0.3747 seconds to calculate 3130485.5713865166
Took 0.2899 seconds to calculate 3130485.5713865166
Took 0.2635 seconds to calculate 3130485.5713865166
还有一个最近的输出:
3.7.4 (default, Jul 9 2019, 16:48:28)
[GCC 8.3.1 20190223 (Red Hat 8.3.1-2)]
Took 0.2583 seconds to calculate 3130485.5713865166
Took 0.1612 seconds to calculate 3130485.5713865166
Took 0.1563 seconds to calculate 3130485.5713865166
你自己试试。
python要优化的是可读性。为此,我认为显式地使用平方根函数是最好的。话虽如此,我们还是来研究一下性能。
我为Python 3更新了Claudiu的代码,并使其不可能优化计算(未来一个优秀的Python编译器可能会做的事情):
from sys import version
from time import time
from math import sqrt, pi, e
print(version)
N = 1_000_000
def timeit1():
z = N * e
s = time()
for n in range(N):
z += (n * pi) ** .5 - z ** .5
print (f"Took {(time() - s):.4f} seconds to calculate {z}")
def timeit2():
z = N * e
s = time()
for n in range(N):
z += sqrt(n * pi) - sqrt(z)
print (f"Took {(time() - s):.4f} seconds to calculate {z}")
def timeit3(arg=sqrt):
z = N * e
s = time()
for n in range(N):
z += arg(n * pi) - arg(z)
print (f"Took {(time() - s):.4f} seconds to calculate {z}")
timeit1()
timeit2()
timeit3()
结果不同,但一个示例输出是:
3.6.6 (default, Jul 19 2018, 14:25:17)
[GCC 8.1.1 20180712 (Red Hat 8.1.1-5)]
Took 0.3747 seconds to calculate 3130485.5713865166
Took 0.2899 seconds to calculate 3130485.5713865166
Took 0.2635 seconds to calculate 3130485.5713865166
还有一个最近的输出:
3.7.4 (default, Jul 9 2019, 16:48:28)
[GCC 8.3.1 20190223 (Red Hat 8.3.1-2)]
Took 0.2583 seconds to calculate 3130485.5713865166
Took 0.1612 seconds to calculate 3130485.5713865166
Took 0.1563 seconds to calculate 3130485.5713865166
你自己试试。
优化的第一条规则:不要这么做
第二条规则:先别这么做
以下是一些计时(Python 2.5.2, Windows):
$ python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
1000000 loops, best of 3: 0.445 usec per loop
$ python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
1000000 loops, best of 3: 0.574 usec per loop
$ python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
1000000 loops, best of 3: 0.727 usec per loop
这个测试表明x**。5比√(x)略快。
对于Python 3.0,结果正好相反:
$ \Python30\python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
1000000 loops, best of 3: 0.803 usec per loop
$ \Python30\python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
1000000 loops, best of 3: 0.695 usec per loop
$ \Python30\python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
1000000 loops, best of 3: 0.761 usec per loop
Math.sqrt (x)总是比x**快。5在另一台机器上(Ubuntu, Python 2.6和3.1):
$ python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
10000000 loops, best of 3: 0.173 usec per loop
$ python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
10000000 loops, best of 3: 0.115 usec per loop
$ python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
10000000 loops, best of 3: 0.158 usec per loop
$ python3.1 -mtimeit -s"from math import sqrt; x = 123" "x**.5"
10000000 loops, best of 3: 0.194 usec per loop
$ python3.1 -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
10000000 loops, best of 3: 0.123 usec per loop
$ python3.1 -mtimeit -s"import math; x = 123" "math.sqrt(x)"
10000000 loops, best of 3: 0.157 usec per loop