我把Project Euler中的第12题作为一个编程练习,并比较了我在C、Python、Erlang和Haskell中的实现(当然不是最优的)。为了获得更高的执行时间,我搜索第一个因数超过1000的三角形数,而不是原始问题中所述的500。

结果如下:

C:

lorenzo@enzo:~/erlang$ gcc -lm -o euler12.bin euler12.c
lorenzo@enzo:~/erlang$ time ./euler12.bin
842161320

real    0m11.074s
user    0m11.070s
sys 0m0.000s

Python:

lorenzo@enzo:~/erlang$ time ./euler12.py 
842161320

real    1m16.632s
user    1m16.370s
sys 0m0.250s

Python与PyPy:

lorenzo@enzo:~/Downloads/pypy-c-jit-43780-b590cf6de419-linux64/bin$ time ./pypy /home/lorenzo/erlang/euler12.py 
842161320

real    0m13.082s
user    0m13.050s
sys 0m0.020s

Erlang:

lorenzo@enzo:~/erlang$ erlc euler12.erl 
lorenzo@enzo:~/erlang$ time erl -s euler12 solve
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.4  (abort with ^G)
1> 842161320

real    0m48.259s
user    0m48.070s
sys 0m0.020s

Haskell:

lorenzo@enzo:~/erlang$ ghc euler12.hs -o euler12.hsx
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12.hsx ...
lorenzo@enzo:~/erlang$ time ./euler12.hsx 
842161320

real    2m37.326s
user    2m37.240s
sys 0m0.080s

简介:

C: 100% Python: 692% (PyPy占118%) Erlang: 436%(135%归功于RichardC) Haskell: 1421%

我认为C语言有一个很大的优势,因为它使用长来进行计算,而不是像其他三种那样使用任意长度的整数。它也不需要首先加载运行时(其他的呢?)

问题1: Erlang, Python和Haskell是否会因为使用任意长度的整数而降低速度,或者只要值小于MAXINT就不会?

问题2: 哈斯克尔为什么这么慢?是否有一个编译器标志关闭刹车或它是我的实现?(后者是很有可能的,因为Haskell对我来说是一本有七个印章的书。)

问题3: 你能否给我一些提示,如何在不改变我确定因素的方式的情况下优化这些实现?以任何方式优化:更好、更快、更“原生”的语言。

编辑:

问题4: 我的函数实现是否允许LCO(最后调用优化,也就是尾递归消除),从而避免在调用堆栈中添加不必要的帧?

虽然我不得不承认我的Haskell和Erlang知识非常有限,但我确实试图用这四种语言实现尽可能相似的相同算法。


使用的源代码:

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square;
    int count = isquare == square ? -1 : 0;
    long candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

#! /usr/bin/env python3.2

import math

def factorCount (n):
    square = math.sqrt (n)
    isquare = int (square)
    count = -1 if isquare == square else 0
    for candidate in range (1, isquare + 1):
        if not n % candidate: count += 2
    return count

triangle = 1
index = 1
while factorCount (triangle) < 1001:
    index += 1
    triangle += index

print (triangle)

-module (euler12).
-compile (export_all).

factorCount (Number) -> factorCount (Number, math:sqrt (Number), 1, 0).

factorCount (_, Sqrt, Candidate, Count) when Candidate > Sqrt -> Count;

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

factorCount (Number, Sqrt, Candidate, Count) ->
    case Number rem Candidate of
        0 -> factorCount (Number, Sqrt, Candidate + 1, Count + 2);
        _ -> factorCount (Number, Sqrt, Candidate + 1, Count)
    end.

nextTriangle (Index, Triangle) ->
    Count = factorCount (Triangle),
    if
        Count > 1000 -> Triangle;
        true -> nextTriangle (Index + 1, Triangle + Index + 1)  
    end.

solve () ->
    io:format ("~p~n", [nextTriangle (1, 1) ] ),
    halt (0).

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' number sqrt candidate count
    | fromIntegral candidate > sqrt = count
    | number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
    | otherwise = factorCount' number sqrt (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

当前回答

通过使用Haskell包中的一些函数,可以大大加快Haskell实现的速度。 在这种情况下,我使用了质数,它只是安装了'cabal安装质数';)

import Data.Numbers.Primes
import Data.List

triangleNumbers = scanl1 (+) [1..]
nDivisors n = product $ map ((+1) . length) (group (primeFactors n))
answer = head $ filter ((> 500) . nDivisors) triangleNumbers

main :: IO ()
main = putStrLn $ "First triangle number to have over 500 divisors: " ++ (show answer)

计时:

您的原始程序:

PS> measure-command { bin\012_slow.exe }

TotalSeconds      : 16.3807409
TotalMilliseconds : 16380.7409

改进的实现

PS> measure-command { bin\012.exe }

TotalSeconds      : 0.0383436
TotalMilliseconds : 38.3436

正如你所看到的,在同一台机器上,这台机器运行38毫秒,而你的机器运行16秒:)

编译命令:

ghc -O2 012.hs -o bin\012.exe
ghc -O2 012_slow.hs -o bin\012_slow.exe

其他回答

看看您的Erlang实现。计时包括启动整个虚拟机、运行程序和停止虚拟机。我很确定设置和停止erlang vm需要一些时间。

If the timing was done within the erlang virtual machine itself, results would be different as in that case we would have the actual time for only the program in question. Otherwise, i believe that the total time taken by the process of starting and loading of the Erlang Vm plus that of halting it (as you put it in your program) are all included in the total time which the method you are using to time the program is outputting. Consider using the erlang timing itself which we use when we want to time our programs within the virtual machine itself timer:tc/1 or timer:tc/2 or timer:tc/3. In this way, the results from erlang will exclude the time taken to start and stop/kill/halt the virtual machine. That is my reasoning there, think about it, and then try your bench mark again.

实际上,我建议我们尝试在这些语言的运行时内为程序计时(对于具有运行时的语言),以便获得精确的值。例如,C不像Erlang、Python和Haskell那样有启动和关闭运行时系统的开销(98%确定-我可以纠正)。因此(基于这个推理)我总结说,这个基准测试对于运行在运行时系统之上的语言来说不够精确/公平。让我们用这些更改再做一次。

编辑:此外,即使所有的语言都有运行时系统,启动和停止它们的开销也会有所不同。因此,我建议我们从运行时系统内部计时(对于应用此方法的语言)。众所周知,Erlang VM在启动时有相当大的开销!

在x86_64 Core2 Duo (2.5GHz)机器上使用GHC 7.0.3, gcc 4.4.6, Linux 2.6.29,对Haskell使用GHC -O2 - flvm - force-recomp编译,对C使用gcc -O3 -lm编译。

Your C routine runs in 8.4 seconds (faster than your run probably because of -O3) The Haskell solution runs in 36 seconds (due to the -O2 flag) Your factorCount' code isn't explicitly typed and defaulting to Integer (thanks to Daniel for correcting my misdiagnosis here!). Giving an explicit type signature (which is standard practice anyway) using Int and the time changes to 11.1 seconds in factorCount' you have needlessly called fromIntegral. A fix results in no change though (the compiler is smart, lucky for you). You used mod where rem is faster and sufficient. This changes the time to 8.5 seconds. factorCount' is constantly applying two extra arguments that never change (number, sqrt). A worker/wrapper transformation gives us:

 $ time ./so
 842161320  

 real    0m7.954s  
 user    0m7.944s  
 sys     0m0.004s  

没错,7.95秒。始终比C方案快半秒。没有- flvm标志,我仍然得到8.182秒,所以NCG后端在这种情况下也做得很好。

结论:Haskell非常棒。

生成的代码

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' :: Int -> Int -> Int -> Int -> Int
factorCount' number sqrt candidate0 count0 = go candidate0 count0
  where
  go candidate count
    | candidate > sqrt = count
    | number `rem` candidate == 0 = go (candidate + 1) (count + 2)
    | otherwise = go (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

编辑:现在我们已经探讨了这个问题,让我们来解决问题

问题1:erlang、python和haskell是否会因为使用 任意长度的整数,只要值更小 比MAXINT ?

在Haskell中,使用Integer比Int慢,但慢多少取决于执行的计算。幸运的是(对于64位机器)Int就足够了。出于可移植性的考虑,你可能应该重写我的代码,使用Int64或Word64 (C不是唯一的语言长)。

问题2:为什么haskell这么慢?有编译器标志吗 关闭刹车还是我的实现?(后者相当 就像haskell对我来说是一本有七个印章的书一样。) 问题3:你能给我一些建议吗 实现而不改变我确定因子的方式? 以任何方式优化:更好、更快、更“原生”的语言。

这就是我上面所回答的。答案是

0)通过-O2进行优化 1)尽可能使用快速(特别是不可装箱的)类型 2) rem not mod(一个经常被遗忘的优化)和 3)工人/包装器转换(可能是最常见的优化)。

问题4:我的功能实现是否允许LCO,因此 避免添加不必要的帧到调用堆栈?

是的,这不是问题所在。干得好,很高兴你考虑到这一点。

c++ 11, < 20ms for me -在这里运行它

我理解您想要一些技巧来帮助提高您的语言特定知识,但由于这里已经很好地介绍了这一点,我想我应该为那些可能看过您的问题的mathematica注释等并想知道为什么这段代码如此之慢的人添加一些上下文。

这个答案主要是为了提供上下文,希望能够帮助人们更容易地评估您的问题/其他答案中的代码。

这段代码只使用了一些(丑陋的)优化,与所使用的语言无关,基于:

每个三角数的形式都是n(n+1)/2 N和N +1是互质 除数的数量是一个乘法函数

#include <iostream>
#include <cmath>
#include <tuple>
#include <chrono>

using namespace std;

// Calculates the divisors of an integer by determining its prime factorisation.

int get_divisors(long long n)
{
    int divisors_count = 1;

    for(long long i = 2;
        i <= sqrt(n);
        /* empty */)
    {
        int divisions = 0;
        while(n % i == 0)
        {
            n /= i;
            divisions++;
        }

        divisors_count *= (divisions + 1);

        //here, we try to iterate more efficiently by skipping
        //obvious non-primes like 4, 6, etc
        if(i == 2)
            i++;
        else
            i += 2;
    }

    if(n != 1) //n is a prime
        return divisors_count * 2;
    else
        return divisors_count;
}

long long euler12()
{
    //n and n + 1
    long long n, n_p_1;

    n = 1; n_p_1 = 2;

    // divisors_x will store either the divisors of x or x/2
    // (the later iff x is divisible by two)
    long long divisors_n = 1;
    long long divisors_n_p_1 = 2;

    for(;;)
    {
        /* This loop has been unwound, so two iterations are completed at a time
         * n and n + 1 have no prime factors in common and therefore we can
         * calculate their divisors separately
         */

        long long total_divisors;                 //the divisors of the triangle number
                                                  // n(n+1)/2

        //the first (unwound) iteration

        divisors_n_p_1 = get_divisors(n_p_1 / 2); //here n+1 is even and we

        total_divisors =
                  divisors_n
                * divisors_n_p_1;

        if(total_divisors > 1000)
            break;

        //move n and n+1 forward
        n = n_p_1;
        n_p_1 = n + 1;

        //fix the divisors
        divisors_n = divisors_n_p_1;
        divisors_n_p_1 = get_divisors(n_p_1);   //n_p_1 is now odd!

        //now the second (unwound) iteration

        total_divisors =
                  divisors_n
                * divisors_n_p_1;

        if(total_divisors > 1000)
            break;

        //move n and n+1 forward
        n = n_p_1;
        n_p_1 = n + 1;

        //fix the divisors
        divisors_n = divisors_n_p_1;
        divisors_n_p_1 = get_divisors(n_p_1 / 2);   //n_p_1 is now even!
    }

    return (n * n_p_1) / 2;
}

int main()
{
    for(int i = 0; i < 1000; i++)
    {
        using namespace std::chrono;
        auto start = high_resolution_clock::now();
        auto result = euler12();
        auto end = high_resolution_clock::now();

        double time_elapsed = duration_cast<milliseconds>(end - start).count();

        cout << result << " " << time_elapsed << '\n';
    }
    return 0;
}

我的台式机平均花费19毫秒,笔记本电脑平均花费80毫秒,这与我在这里看到的大多数其他代码相差甚远。毫无疑问,还有许多优化方法可用。

我把“Jannich Brendle”版本改成了1000,而不是500。并列出euler12.bin, euler12.bin的结果。话务量,p12dist.erl。两个erl代码都使用'+native'进行编译。

zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s p12dist start
The result is: 842161320.

real    0m3.879s
user    0m14.553s
sys     0m0.314s
zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s euler12 solve
842161320

real    0m10.125s
user    0m10.078s
sys     0m0.046s
zhengs-MacBook-Pro:workspace zhengzhibin$ time ./euler12.bin 
842161320

real    0m5.370s
user    0m5.328s
sys     0m0.004s
zhengs-MacBook-Pro:workspace zhengzhibin$

尝试:

package main

import "fmt"
import "math"

func main() {
    var n, m, c int
    for i := 1; ; i++ {
        n, m, c = i * (i + 1) / 2, int(math.Sqrt(float64(n))), 0
        for f := 1; f < m; f++ {
            if n % f == 0 { c++ }
    }
    c *= 2
    if m * m == n { c ++ }
    if c > 1001 {
        fmt.Println(n)
        break
        }
    }
}

我得到:

原始版本:9.1690 100% Go: 8.2520 111%

但使用:

package main

import (
    "math"
    "fmt"
 )

// Sieve of Eratosthenes
func PrimesBelow(limit int) []int {
    switch {
        case limit < 2:
            return []int{}
        case limit == 2:
            return []int{2}
    }
    sievebound := (limit - 1) / 2
    sieve := make([]bool, sievebound+1)
    crosslimit := int(math.Sqrt(float64(limit))-1) / 2
    for i := 1; i <= crosslimit; i++ {
        if !sieve[i] {
            for j := 2 * i * (i + 1); j <= sievebound; j += 2*i + 1 {
                sieve[j] = true
            }
        }
    }
    plimit := int(1.3*float64(limit)) / int(math.Log(float64(limit)))
    primes := make([]int, plimit)
    p := 1
    primes[0] = 2
    for i := 1; i <= sievebound; i++ {
        if !sieve[i] {
            primes[p] = 2*i + 1
            p++
            if p >= plimit {
                break
            }
        }
    }
    last := len(primes) - 1
    for i := last; i > 0; i-- {
        if primes[i] != 0 {
            break
        }
        last = i
    }
    return primes[0:last]
}



func main() {
    fmt.Println(p12())
}
// Requires PrimesBelow from utils.go
func p12() int {
    n, dn, cnt := 3, 2, 0
    primearray := PrimesBelow(1000000)
    for cnt <= 1001 {
        n++
        n1 := n
        if n1%2 == 0 {
            n1 /= 2
        }
        dn1 := 1
        for i := 0; i < len(primearray); i++ {
            if primearray[i]*primearray[i] > n1 {
                dn1 *= 2
                break
            }
            exponent := 1
            for n1%primearray[i] == 0 {
                exponent++
                n1 /= primearray[i]
            }
            if exponent > 1 {
                dn1 *= exponent
            }
            if n1 == 1 {
                break
            }
        }
        cnt = dn * dn1
        dn = dn1
    }
    return n * (n - 1) / 2
}

我得到:

原始版本:9.1690 100% Thaumkid的c版本:0.1060 8650% 首发版本:8.2520 111% 第二围棋版本:0.0230 39865%

我还尝试了Python3.6和pypy3.3-5.5-alpha:

原版本:8.629 100% Thaumkid的c版本:0.109 7916% python: 54.795 16% Pypy3.3-5.5-alpha: 13.291 65%

然后用下面的代码我得到:

原版本:8.629 100% Thaumkid的c版本:0.109 8650% Python3.6: 1.489 580% Pypy3.3-5.5-alpha: 0.582 1483%

def D(N):
    if N == 1: return 1
    sqrtN = int(N ** 0.5)
    nf = 1
    for d in range(2, sqrtN + 1):
        if N % d == 0:
            nf = nf + 1
    return 2 * nf - (1 if sqrtN**2 == N else 0)

L = 1000
Dt, n = 0, 0

while Dt <= L:
    t = n * (n + 1) // 2
    Dt = D(n/2)*D(n+1) if n%2 == 0 else D(n)*D((n+1)/2)
    n = n + 1

print (t)