我把Project Euler中的第12题作为一个编程练习,并比较了我在C、Python、Erlang和Haskell中的实现(当然不是最优的)。为了获得更高的执行时间,我搜索第一个因数超过1000的三角形数,而不是原始问题中所述的500。
结果如下:
C:
lorenzo@enzo:~/erlang$ gcc -lm -o euler12.bin euler12.c
lorenzo@enzo:~/erlang$ time ./euler12.bin
842161320
real 0m11.074s
user 0m11.070s
sys 0m0.000s
Python:
lorenzo@enzo:~/erlang$ time ./euler12.py
842161320
real 1m16.632s
user 1m16.370s
sys 0m0.250s
Python与PyPy:
lorenzo@enzo:~/Downloads/pypy-c-jit-43780-b590cf6de419-linux64/bin$ time ./pypy /home/lorenzo/erlang/euler12.py
842161320
real 0m13.082s
user 0m13.050s
sys 0m0.020s
Erlang:
lorenzo@enzo:~/erlang$ erlc euler12.erl
lorenzo@enzo:~/erlang$ time erl -s euler12 solve
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.7.4 (abort with ^G)
1> 842161320
real 0m48.259s
user 0m48.070s
sys 0m0.020s
Haskell:
lorenzo@enzo:~/erlang$ ghc euler12.hs -o euler12.hsx
[1 of 1] Compiling Main ( euler12.hs, euler12.o )
Linking euler12.hsx ...
lorenzo@enzo:~/erlang$ time ./euler12.hsx
842161320
real 2m37.326s
user 2m37.240s
sys 0m0.080s
简介:
C: 100%
Python: 692% (PyPy占118%)
Erlang: 436%(135%归功于RichardC)
Haskell: 1421%
我认为C语言有一个很大的优势,因为它使用长来进行计算,而不是像其他三种那样使用任意长度的整数。它也不需要首先加载运行时(其他的呢?)
问题1:
Erlang, Python和Haskell是否会因为使用任意长度的整数而降低速度,或者只要值小于MAXINT就不会?
问题2:
哈斯克尔为什么这么慢?是否有一个编译器标志关闭刹车或它是我的实现?(后者是很有可能的,因为Haskell对我来说是一本有七个印章的书。)
问题3:
你能否给我一些提示,如何在不改变我确定因素的方式的情况下优化这些实现?以任何方式优化:更好、更快、更“原生”的语言。
编辑:
问题4:
我的函数实现是否允许LCO(最后调用优化,也就是尾递归消除),从而避免在调用堆栈中添加不必要的帧?
虽然我不得不承认我的Haskell和Erlang知识非常有限,但我确实试图用这四种语言实现尽可能相似的相同算法。
使用的源代码:
#include <stdio.h>
#include <math.h>
int factorCount (long n)
{
double square = sqrt (n);
int isquare = (int) square;
int count = isquare == square ? -1 : 0;
long candidate;
for (candidate = 1; candidate <= isquare; candidate ++)
if (0 == n % candidate) count += 2;
return count;
}
int main ()
{
long triangle = 1;
int index = 1;
while (factorCount (triangle) < 1001)
{
index ++;
triangle += index;
}
printf ("%ld\n", triangle);
}
#! /usr/bin/env python3.2
import math
def factorCount (n):
square = math.sqrt (n)
isquare = int (square)
count = -1 if isquare == square else 0
for candidate in range (1, isquare + 1):
if not n % candidate: count += 2
return count
triangle = 1
index = 1
while factorCount (triangle) < 1001:
index += 1
triangle += index
print (triangle)
-module (euler12).
-compile (export_all).
factorCount (Number) -> factorCount (Number, math:sqrt (Number), 1, 0).
factorCount (_, Sqrt, Candidate, Count) when Candidate > Sqrt -> Count;
factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;
factorCount (Number, Sqrt, Candidate, Count) ->
case Number rem Candidate of
0 -> factorCount (Number, Sqrt, Candidate + 1, Count + 2);
_ -> factorCount (Number, Sqrt, Candidate + 1, Count)
end.
nextTriangle (Index, Triangle) ->
Count = factorCount (Triangle),
if
Count > 1000 -> Triangle;
true -> nextTriangle (Index + 1, Triangle + Index + 1)
end.
solve () ->
io:format ("~p~n", [nextTriangle (1, 1) ] ),
halt (0).
factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
where square = sqrt $ fromIntegral number
isquare = floor square
factorCount' number sqrt candidate count
| fromIntegral candidate > sqrt = count
| number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
| otherwise = factorCount' number sqrt (candidate + 1) count
nextTriangle index triangle
| factorCount triangle > 1000 = triangle
| otherwise = nextTriangle (index + 1) (triangle + index + 1)
main = print $ nextTriangle 1 1
问题1:Erlang、Python和Haskell是否会因为使用
任意长度的整数,只要值更小
比MAXINT ?
对于Erlang,第一个问题的答案是否定的。最后一个问题可以通过适当地使用Erlang来回答,如下所示:
http://bredsaal.dk/learning-erlang-using-projecteuler-net
由于它比您最初的C示例要快,我猜它会有很多问题,因为其他人已经详细讨论过了。
这个Erlang模块在一个便宜的上网本上执行大约5秒…它使用erlang中的网络线程模型,并演示了如何利用事件模型。它可以分布在许多节点上。而且速度很快。不是我的代码。
-module(p12dist).
-author("Jannich Brendle, jannich@bredsaal.dk, http://blog.bredsaal.dk").
-compile(export_all).
server() ->
server(1).
server(Number) ->
receive {getwork, Worker_PID} -> Worker_PID ! {work,Number,Number+100},
server(Number+101);
{result,T} -> io:format("The result is: \~w.\~n", [T]);
_ -> server(Number)
end.
worker(Server_PID) ->
Server_PID ! {getwork, self()},
receive {work,Start,End} -> solve(Start,End,Server_PID)
end,
worker(Server_PID).
start() ->
Server_PID = spawn(p12dist, server, []),
spawn(p12dist, worker, [Server_PID]),
spawn(p12dist, worker, [Server_PID]),
spawn(p12dist, worker, [Server_PID]),
spawn(p12dist, worker, [Server_PID]).
solve(N,End,_) when N =:= End -> no_solution;
solve(N,End,Server_PID) ->
T=round(N*(N+1)/2),
case (divisor(T,round(math:sqrt(T))) > 500) of
true ->
Server_PID ! {result,T};
false ->
solve(N+1,End,Server_PID)
end.
divisors(N) ->
divisor(N,round(math:sqrt(N))).
divisor(_,0) -> 1;
divisor(N,I) ->
case (N rem I) =:= 0 of
true ->
2+divisor(N,I-1);
false ->
divisor(N,I-1)
end.
下面的测试发生在Intel(R) Atom(TM) CPU N270 @ 1.60GHz上
~$ time erl -noshell -s p12dist start
The result is: 76576500.
^C
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
(v)ersion (k)ill (D)b-tables (d)istribution
a
real 0m5.510s
user 0m5.836s
sys 0m0.152s
只是为了好玩。下面是一个更“原生”的Haskell实现:
import Control.Applicative
import Control.Monad
import Data.Either
import Math.NumberTheory.Powers.Squares
isInt :: RealFrac c => c -> Bool
isInt = (==) <$> id <*> fromInteger . round
intSqrt :: (Integral a) => a -> Int
--intSqrt = fromIntegral . floor . sqrt . fromIntegral
intSqrt = fromIntegral . integerSquareRoot'
factorize :: Int -> [Int]
factorize 1 = []
factorize n = first : factorize (quot n first)
where first = (!! 0) $ [a | a <- [2..intSqrt n], rem n a == 0] ++ [n]
factorize2 :: Int -> [(Int,Int)]
factorize2 = foldl (\ls@((val,freq):xs) y -> if val == y then (val,freq+1):xs else (y,1):ls) [(0,0)] . factorize
numDivisors :: Int -> Int
numDivisors = foldl (\acc (_,y) -> acc * (y+1)) 1 <$> factorize2
nextTriangleNumber :: (Int,Int) -> (Int,Int)
nextTriangleNumber (n,acc) = (n+1,acc+n+1)
forward :: Int -> (Int, Int) -> Either (Int, Int) (Int, Int)
forward k val@(n,acc) = if numDivisors acc > k then Left val else Right (nextTriangleNumber val)
problem12 :: Int -> (Int, Int)
problem12 n = (!!0) . lefts . scanl (>>=) (forward n (1,1)) . repeat . forward $ n
main = do
let (n,val) = problem12 1000
print val
使用ghc -O3,它在我的机器上持续运行0.55-0.58秒(1.73GHz Core i7)。
C版本中一个更有效的factorCount函数:
int factorCount (int n)
{
int count = 1;
int candidate,tmpCount;
while (n % 2 == 0) {
count++;
n /= 2;
}
for (candidate = 3; candidate < n && candidate * candidate < n; candidate += 2)
if (n % candidate == 0) {
tmpCount = 1;
do {
tmpCount++;
n /= candidate;
} while (n % candidate == 0);
count*=tmpCount;
}
if (n > 1)
count *= 2;
return count;
}
在main中使用gcc -O3 -lm将long类型更改为int类型,该程序始终在0.31-0.35秒内运行。
如果您利用第n个三角形数= n*(n+1)/2,并且n和(n+1)具有完全不同的质因数分解,则可以使两者运行得更快,因此可以将每个一半的因数数相乘,以得到整体的因数数。以下几点:
int main ()
{
int triangle = 0,count1,count2 = 1;
do {
count1 = count2;
count2 = ++triangle % 2 == 0 ? factorCount(triangle+1) : factorCount((triangle+1)/2);
} while (count1*count2 < 1001);
printf ("%lld\n", ((long long)triangle)*(triangle+1)/2);
}
将c代码的运行时间减少到0.17-0.19秒,它可以处理更大的搜索——大于10000个因数在我的机器上大约需要43秒。我给感兴趣的读者留下了类似的haskell加速。
在Python优化方面,除了使用PyPy(对代码进行零更改即可获得令人印象深刻的加速)之外,还可以使用PyPy的翻译工具链编译与rpython兼容的版本,或者使用Cython构建扩展模块,在我的测试中,这两种工具都比C版本快,而Cython模块的速度几乎是C版本的两倍。作为参考,我包括C和PyPy基准测试结果:
C(编译gcc -O3 -lm)
% time ./euler12-c
842161320
./euler12-c 11.95s
user 0.00s
system 99%
cpu 11.959 total
PyPy 1.5
% time pypy euler12.py
842161320
pypy euler12.py
16.44s user
0.01s system
99% cpu 16.449 total
RPython(使用最新的PyPy修订版,c2f583445aee)
% time ./euler12-rpython-c
842161320
./euler12-rpy-c
10.54s user 0.00s
system 99%
cpu 10.540 total
崇拜0.15
% time python euler12-cython.py
842161320
python euler12-cython.py
6.27s user 0.00s
system 99%
cpu 6.274 total
RPython版本有几个关键的变化。要转换成一个独立的程序,您需要定义目标,在本例中是主函数。它被期望接受sys。Argv作为它唯一的参数,并且需要返回一个int。你可以使用translate.py, % translate.py euler12-rpython.py来翻译它,它可以翻译成C语言并为你编译它。
# euler12-rpython.py
import math, sys
def factorCount(n):
square = math.sqrt(n)
isquare = int(square)
count = -1 if isquare == square else 0
for candidate in xrange(1, isquare + 1):
if not n % candidate: count += 2
return count
def main(argv):
triangle = 1
index = 1
while factorCount(triangle) < 1001:
index += 1
triangle += index
print triangle
return 0
if __name__ == '__main__':
main(sys.argv)
def target(*args):
return main, None
Cython版本被重写为扩展模块_euler12。我从一个普通的python文件中导入并调用它。_euler12。Pyx本质上与您的版本相同,只是有一些额外的静态类型声明。setup.py有一个正常的样板来构建扩展,使用python setup.py build_ext——inplace。
# _euler12.pyx
from libc.math cimport sqrt
cdef int factorCount(int n):
cdef int candidate, isquare, count
cdef double square
square = sqrt(n)
isquare = int(square)
count = -1 if isquare == square else 0
for candidate in range(1, isquare + 1):
if not n % candidate: count += 2
return count
cpdef main():
cdef int triangle = 1, index = 1
while factorCount(triangle) < 1001:
index += 1
triangle += index
print triangle
# euler12-cython.py
import _euler12
_euler12.main()
# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
ext_modules = [Extension("_euler12", ["_euler12.pyx"])]
setup(
name = 'Euler12-Cython',
cmdclass = {'build_ext': build_ext},
ext_modules = ext_modules
)
老实说,我对RPython或Cython都没有什么经验,对结果感到惊喜。如果您正在使用CPython,那么在Cython扩展模块中编写cpu密集型代码似乎是优化程序的一种非常简单的方法。