这是我能想到的最好的算法。
def get_primes(n):
numbers = set(range(n, 1, -1))
primes = []
while numbers:
p = numbers.pop()
primes.append(p)
numbers.difference_update(set(range(p*2, n+1, p)))
return primes
>>> timeit.Timer(stmt='get_primes.get_primes(1000000)', setup='import get_primes').timeit(1)
1.1499958793645562
还能做得更快吗?
这段代码有一个缺陷:由于numbers是一个无序集,不能保证numbers.pop()将从集合中移除最低的数字。尽管如此,它还是适用于(至少对我来说)一些输入数字:
>>> sum(get_primes(2000000))
142913828922L
#That's the correct sum of all numbers below 2 million
>>> 529 in get_primes(1000)
False
>>> 529 in get_primes(530)
True
这些都是经过编写和测试的。所以没有必要重新发明轮子。
python -m timeit -r10 -s"from sympy import sieve" "primes = list(sieve.primerange(1, 10**6))"
打破了12.2秒的记录!
10 loops, best of 10: 12.2 msec per loop
如果这还不够快,你可以试试PyPy:
pypy -m timeit -r10 -s"from sympy import sieve" "primes = list(sieve.primerange(1, 10**6))"
结果是:
10 loops, best of 10: 2.03 msec per loop
得到247张赞成票的答案列出了15.9毫秒的最佳解决方案。
比较这个! !
对于最快的代码,numpy解决方案是最好的。不过,出于纯粹的学术原因,我发布了我的纯python版本,它比上面发布的食谱版本快不到50%。由于我将整个列表放在内存中,所以需要足够的空间来容纳所有内容,但它的可伸缩性似乎相当好。
def daniel_sieve_2(maxNumber):
"""
Given a number, returns all numbers less than or equal to
that number which are prime.
"""
allNumbers = range(3, maxNumber+1, 2)
for mIndex, number in enumerate(xrange(3, maxNumber+1, 2)):
if allNumbers[mIndex] == 0:
continue
# now set all multiples to 0
for index in xrange(mIndex+number, (maxNumber-3)/2+1, number):
allNumbers[index] = 0
return [2] + filter(lambda n: n!=0, allNumbers)
结果是:
>>>mine = timeit.Timer("daniel_sieve_2(1000000)",
... "from sieves import daniel_sieve_2")
>>>prev = timeit.Timer("get_primes_erat(1000000)",
... "from sieves import get_primes_erat")
>>>print "Mine: {0:0.4f} ms".format(min(mine.repeat(3, 1))*1000)
Mine: 428.9446 ms
>>>print "Previous Best {0:0.4f} ms".format(min(prev.repeat(3, 1))*1000)
Previous Best 621.3581 ms
假设N < 9,080,191, Miller-Rabin's Primality检验的确定性实现
import sys
def miller_rabin_pass(a, n):
d = n - 1
s = 0
while d % 2 == 0:
d >>= 1
s += 1
a_to_power = pow(a, d, n)
if a_to_power == 1:
return True
for i in range(s-1):
if a_to_power == n - 1:
return True
a_to_power = (a_to_power * a_to_power) % n
return a_to_power == n - 1
def miller_rabin(n):
if n <= 2:
return n == 2
if n < 2_047:
return miller_rabin_pass(2, n)
return all(miller_rabin_pass(a, n) for a in (31, 73))
n = int(sys.argv[1])
primes = [2]
for p in range(3,n,2):
if miller_rabin(p):
primes.append(p)
print len(primes)
根据维基百科(http://en.wikipedia.org/wiki/Miller -Rabin_primality_test)上的文章,对于a = 37和73,测试N < 9,080,191足以判断N是否为合数。
我从原始米勒-拉宾测试的概率实现中改编了源代码:https://www.literateprograms.org/miller-rabin_primality_test__python_.html
在写这篇文章的时候,这是最快的工作解决方案(至少在我的机器上是这样)。它同时使用numpy和bitarray,并受到这个答案的primesfrom2to的启发。
import numpy as np
from bitarray import bitarray
def bit_primes(n):
bit_sieve = bitarray(n // 3 + (n % 6 == 2))
bit_sieve.setall(1)
bit_sieve[0] = False
for i in range(int(n ** 0.5) // 3 + 1):
if bit_sieve[i]:
k = 3 * i + 1 | 1
bit_sieve[k * k // 3::2 * k] = False
bit_sieve[(k * k + 4 * k - 2 * k * (i & 1)) // 3::2 * k] = False
np_sieve = np.unpackbits(np.frombuffer(bit_sieve.tobytes(), dtype=np.uint8)).view(bool)
return np.concatenate(((2, 3), ((3 * np.flatnonzero(np_sieve) + 1) | 1)))
下面是与素数from2to的比较,它之前被发现是unutbu比较中最快的解:
python3 -m timeit -s "import fast_primes" "fast_primes.bit_primes(1000000)"
200 loops, best of 5: 1.19 msec per loop
python3 -m timeit -s "import fast_primes" "fast_primes.primesfrom2to(1000000)"
200 loops, best of 5: 1.23 msec per loop
对于寻找100万以下的质数,bit_primes稍微快一些。
n值越大,差异就越大。在某些情况下,bit_primes的速度是原来的两倍多:
python3 -m timeit -s "import fast_primes" "fast_primes.bit_primes(500_000_000)"
1 loop, best of 5: 540 msec per loop
python3 -m timeit -s "import fast_primes" "fast_primes.primesfrom2to(500_000_000)"
1 loop, best of 5: 1.15 sec per loop
作为参考,以下是primesfrom2to I的最小修改版本(适用于Python 3):
def primesfrom2to(n):
# https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
""" Input n>=6, Returns a array of primes, 2 <= p < n"""
sieve = np.ones(n // 3 + (n % 6 == 2), dtype=np.bool)
sieve[0] = False
for i in range(int(n ** 0.5) // 3 + 1):
if sieve[i]:
k = 3 * i + 1 | 1
sieve[((k * k) // 3)::2 * k] = False
sieve[(k * k + 4 * k - 2 * k * (i & 1)) // 3::2 * k] = False
return np.r_[2, 3, ((3 * np.nonzero(sieve)[0] + 1) | 1)]