我把Project Euler中的第12题作为一个编程练习,并比较了我在C、Python、Erlang和Haskell中的实现(当然不是最优的)。为了获得更高的执行时间,我搜索第一个因数超过1000的三角形数,而不是原始问题中所述的500。

结果如下:

C:

lorenzo@enzo:~/erlang$ gcc -lm -o euler12.bin euler12.c
lorenzo@enzo:~/erlang$ time ./euler12.bin
842161320

real    0m11.074s
user    0m11.070s
sys 0m0.000s

Python:

lorenzo@enzo:~/erlang$ time ./euler12.py 
842161320

real    1m16.632s
user    1m16.370s
sys 0m0.250s

Python与PyPy:

lorenzo@enzo:~/Downloads/pypy-c-jit-43780-b590cf6de419-linux64/bin$ time ./pypy /home/lorenzo/erlang/euler12.py 
842161320

real    0m13.082s
user    0m13.050s
sys 0m0.020s

Erlang:

lorenzo@enzo:~/erlang$ erlc euler12.erl 
lorenzo@enzo:~/erlang$ time erl -s euler12 solve
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.4  (abort with ^G)
1> 842161320

real    0m48.259s
user    0m48.070s
sys 0m0.020s

Haskell:

lorenzo@enzo:~/erlang$ ghc euler12.hs -o euler12.hsx
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12.hsx ...
lorenzo@enzo:~/erlang$ time ./euler12.hsx 
842161320

real    2m37.326s
user    2m37.240s
sys 0m0.080s

简介:

C: 100% Python: 692% (PyPy占118%) Erlang: 436%(135%归功于RichardC) Haskell: 1421%

我认为C语言有一个很大的优势,因为它使用长来进行计算,而不是像其他三种那样使用任意长度的整数。它也不需要首先加载运行时(其他的呢?)

问题1: Erlang, Python和Haskell是否会因为使用任意长度的整数而降低速度,或者只要值小于MAXINT就不会?

问题2: 哈斯克尔为什么这么慢?是否有一个编译器标志关闭刹车或它是我的实现?(后者是很有可能的,因为Haskell对我来说是一本有七个印章的书。)

问题3: 你能否给我一些提示,如何在不改变我确定因素的方式的情况下优化这些实现?以任何方式优化:更好、更快、更“原生”的语言。

编辑:

问题4: 我的函数实现是否允许LCO(最后调用优化,也就是尾递归消除),从而避免在调用堆栈中添加不必要的帧?

虽然我不得不承认我的Haskell和Erlang知识非常有限,但我确实试图用这四种语言实现尽可能相似的相同算法。


使用的源代码:

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square;
    int count = isquare == square ? -1 : 0;
    long candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

#! /usr/bin/env python3.2

import math

def factorCount (n):
    square = math.sqrt (n)
    isquare = int (square)
    count = -1 if isquare == square else 0
    for candidate in range (1, isquare + 1):
        if not n % candidate: count += 2
    return count

triangle = 1
index = 1
while factorCount (triangle) < 1001:
    index += 1
    triangle += index

print (triangle)

-module (euler12).
-compile (export_all).

factorCount (Number) -> factorCount (Number, math:sqrt (Number), 1, 0).

factorCount (_, Sqrt, Candidate, Count) when Candidate > Sqrt -> Count;

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

factorCount (Number, Sqrt, Candidate, Count) ->
    case Number rem Candidate of
        0 -> factorCount (Number, Sqrt, Candidate + 1, Count + 2);
        _ -> factorCount (Number, Sqrt, Candidate + 1, Count)
    end.

nextTriangle (Index, Triangle) ->
    Count = factorCount (Triangle),
    if
        Count > 1000 -> Triangle;
        true -> nextTriangle (Index + 1, Triangle + Index + 1)  
    end.

solve () ->
    io:format ("~p~n", [nextTriangle (1, 1) ] ),
    halt (0).

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' number sqrt candidate count
    | fromIntegral candidate > sqrt = count
    | number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
    | otherwise = factorCount' number sqrt (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

当前回答

我把“Jannich Brendle”版本改成了1000,而不是500。并列出euler12.bin, euler12.bin的结果。话务量,p12dist.erl。两个erl代码都使用'+native'进行编译。

zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s p12dist start
The result is: 842161320.

real    0m3.879s
user    0m14.553s
sys     0m0.314s
zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s euler12 solve
842161320

real    0m10.125s
user    0m10.078s
sys     0m0.046s
zhengs-MacBook-Pro:workspace zhengzhibin$ time ./euler12.bin 
842161320

real    0m5.370s
user    0m5.328s
sys     0m0.004s
zhengs-MacBook-Pro:workspace zhengzhibin$

其他回答

我把“Jannich Brendle”版本改成了1000,而不是500。并列出euler12.bin, euler12.bin的结果。话务量,p12dist.erl。两个erl代码都使用'+native'进行编译。

zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s p12dist start
The result is: 842161320.

real    0m3.879s
user    0m14.553s
sys     0m0.314s
zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s euler12 solve
842161320

real    0m10.125s
user    0m10.078s
sys     0m0.046s
zhengs-MacBook-Pro:workspace zhengzhibin$ time ./euler12.bin 
842161320

real    0m5.370s
user    0m5.328s
sys     0m0.004s
zhengs-MacBook-Pro:workspace zhengzhibin$
#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square+1;
    long candidate = 2;
    int count = 1;
    while(candidate <= isquare && candidate <= n){
        int c = 1;
        while (n % candidate == 0) {
           c++;
           n /= candidate;
        }
        count *= c;
        candidate++;
    }
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

gcc -lm -Ofast euler.c

时间。/ a.o ut

2.79s user 0.00s system 99% CPU 2.794 total

看看这个博客。在过去一年左右的时间里,他用Haskell和Python完成了一些Project Euler问题,他通常发现Haskell要快得多。我认为在这些语言之间,它更多地与你的流畅性和编码风格有关。

说到Python速度,你使用了错误的实现!尝试一下PyPy,对于这样的事情,你会发现它要快得多。

Erlang实现存在一些问题。作为下面的基准,我测量的未修改的Erlang程序的执行时间为47.6秒,而C代码的执行时间为12.7秒。

(编辑:在Erlang/OTP版本24,2021年,Erlang有一个自动JIT编译器,旧的+本机编译器选项不再支持或需要。我保留下面这段文字作为历史文件。关于export_all的注释对于jit生成良好代码的能力仍然是有效的。)

The first thing you should do if you want to run computationally intensive Erlang code is to use native code. Compiling with erlc +native euler12 got the time down to 41.3 seconds. This is however a much lower speedup (just 15%) than expected from native compilation on this kind of code, and the problem is your use of -compile(export_all). This is useful for experimentation, but the fact that all functions are potentially reachable from the outside causes the native compiler to be very conservative. (The normal BEAM emulator is not that much affected.) Replacing this declaration with -export([solve/0]). gives a much better speedup: 31.5 seconds (almost 35% from the baseline).

但是代码本身有一个问题:对于factorCount循环中的每一次迭代,都要执行以下测试:

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

C代码不这样做。一般来说,在相同代码的不同实现之间进行公平的比较是很棘手的,特别是如果算法是数值的,因为您需要确保它们实际上在做相同的事情。在某个实现中由于某个类型转换而产生的轻微舍入错误可能会导致它比另一个实现进行更多的迭代,即使两者最终得到相同的结果。

为了消除这个可能的错误源(并在每次迭代中摆脱额外的测试),我重写了factorCount函数,如下所示,密切模仿C代码:

factorCount (N) ->
    Sqrt = math:sqrt (N),
    ISqrt = trunc(Sqrt),
    if ISqrt == Sqrt -> factorCount (N, ISqrt, 1, -1);
       true          -> factorCount (N, ISqrt, 1, 0)
    end.

factorCount (_N, ISqrt, Candidate, Count) when Candidate > ISqrt -> Count;
factorCount ( N, ISqrt, Candidate, Count) ->
    case N rem Candidate of
        0 -> factorCount (N, ISqrt, Candidate + 1, Count + 2);
        _ -> factorCount (N, ISqrt, Candidate + 1, Count)
    end.

这个重写,没有export_all和本机编译,给了我以下运行时:

$ erlc +native euler12.erl
$ time erl -noshell -s euler12 solve
842161320

real    0m19.468s
user    0m19.450s
sys 0m0.010s

这与C代码相比不算太糟:

$ time ./a.out 
842161320

real    0m12.755s
user    0m12.730s
sys 0m0.020s

考虑到Erlang完全不适合编写数字代码,在这样的程序中只比C慢50%就已经很不错了。

最后,关于你的问题:

问题1:erlang、python和haskell是否会因为使用任意长度的整数而降低速度 只要值小于MAXINT,它们不就行了吗?

Yes, somewhat. In Erlang, there is no way of saying "use 32/64-bit arithmetic with wrap-around", so unless the compiler can prove some bounds on your integers (and it usually can't), it must check all computations to see if they can fit in a single tagged word or if it has to turn them into heap-allocated bignums. Even if no bignums are ever used in practice at runtime, these checks will have to be performed. On the other hand, that means you know that the algorithm will never fail because of an unexpected integer wraparound if you suddenly give it larger inputs than before.

问题4:我的函数实现是否允许LCO,从而避免在调用堆栈中添加不必要的帧?

是的,您的Erlang代码在最后调用优化方面是正确的。

看看您的Erlang实现。计时包括启动整个虚拟机、运行程序和停止虚拟机。我很确定设置和停止erlang vm需要一些时间。

If the timing was done within the erlang virtual machine itself, results would be different as in that case we would have the actual time for only the program in question. Otherwise, i believe that the total time taken by the process of starting and loading of the Erlang Vm plus that of halting it (as you put it in your program) are all included in the total time which the method you are using to time the program is outputting. Consider using the erlang timing itself which we use when we want to time our programs within the virtual machine itself timer:tc/1 or timer:tc/2 or timer:tc/3. In this way, the results from erlang will exclude the time taken to start and stop/kill/halt the virtual machine. That is my reasoning there, think about it, and then try your bench mark again.

实际上,我建议我们尝试在这些语言的运行时内为程序计时(对于具有运行时的语言),以便获得精确的值。例如,C不像Erlang、Python和Haskell那样有启动和关闭运行时系统的开销(98%确定-我可以纠正)。因此(基于这个推理)我总结说,这个基准测试对于运行在运行时系统之上的语言来说不够精确/公平。让我们用这些更改再做一次。

编辑:此外,即使所有的语言都有运行时系统,启动和停止它们的开销也会有所不同。因此,我建议我们从运行时系统内部计时(对于应用此方法的语言)。众所周知,Erlang VM在启动时有相当大的开销!