我正在寻找最快的方法来获得π的值,作为一个个人挑战。更具体地说,我使用的方法不涉及使用#define常量M_PI,或硬编码的数字。
下面的程序测试了我所知道的各种方法。从理论上讲,内联汇编版本是最快的选择,尽管显然不能移植。我将它作为一个基准,与其他版本进行比较。在我的测试中,使用内置函数,4 * atan(1)版本在GCC 4.2上是最快的,因为它自动将atan(1)折叠成一个常量。通过指定-fno-builtin, atan2(0, -1)版本是最快的。
下面是主要的测试程序(pitimes.c):
#include <math.h>
#include <stdio.h>
#include <time.h>
#define ITERS 10000000
#define TESTWITH(x) { \
diff = 0.0; \
time1 = clock(); \
for (i = 0; i < ITERS; ++i) \
diff += (x) - M_PI; \
time2 = clock(); \
printf("%s\t=> %e, time => %f\n", #x, diff, diffclock(time2, time1)); \
}
static inline double
diffclock(clock_t time1, clock_t time0)
{
return (double) (time1 - time0) / CLOCKS_PER_SEC;
}
int
main()
{
int i;
clock_t time1, time2;
double diff;
/* Warmup. The atan2 case catches GCC's atan folding (which would
* optimise the ``4 * atan(1) - M_PI'' to a no-op), if -fno-builtin
* is not used. */
TESTWITH(4 * atan(1))
TESTWITH(4 * atan2(1, 1))
#if defined(__GNUC__) && (defined(__i386__) || defined(__amd64__))
extern double fldpi();
TESTWITH(fldpi())
#endif
/* Actual tests start here. */
TESTWITH(atan2(0, -1))
TESTWITH(acos(-1))
TESTWITH(2 * asin(1))
TESTWITH(4 * atan2(1, 1))
TESTWITH(4 * atan(1))
return 0;
}
内联汇编的东西(fldpi.c)只适用于x86和x64系统:
double
fldpi()
{
double pi;
asm("fldpi" : "=t" (pi));
return pi;
}
和一个构建脚本,构建我正在测试的所有配置(build.sh):
#!/bin/sh
gcc -O3 -Wall -c -m32 -o fldpi-32.o fldpi.c
gcc -O3 -Wall -c -m64 -o fldpi-64.o fldpi.c
gcc -O3 -Wall -ffast-math -m32 -o pitimes1-32 pitimes.c fldpi-32.o
gcc -O3 -Wall -m32 -o pitimes2-32 pitimes.c fldpi-32.o -lm
gcc -O3 -Wall -fno-builtin -m32 -o pitimes3-32 pitimes.c fldpi-32.o -lm
gcc -O3 -Wall -ffast-math -m64 -o pitimes1-64 pitimes.c fldpi-64.o -lm
gcc -O3 -Wall -m64 -o pitimes2-64 pitimes.c fldpi-64.o -lm
gcc -O3 -Wall -fno-builtin -m64 -o pitimes3-64 pitimes.c fldpi-64.o -lm
除了在各种编译器标志之间进行测试(我也比较了32位和64位,因为优化是不同的),我还尝试切换测试的顺序。但是,atan2(0, -1)版本在每次测试中仍然名列前茅。
基本上是C版本的回形针优化器的答案,并且更加简化:
#include <stdio.h>
#include <math.h>
double calc_PI(int K) {
static const int A = 545140134;
static const int B = 13591409;
static const int D = 640320;
const double ID3 = 1.0 / ((double) D * (double) D * (double) D);
double sum = 0.0;
double b = sqrt(ID3);
long long int p = 1;
long long int a = B;
sum += (double) p * (double) a * b;
for (int k = 1; k < K; ++k) {
a += A;
b *= ID3;
p *= (6 * k) * (6 * k - 1) * (6 * k - 2) * (6 * k - 3) * (6 * k - 4) * (6 * k - 5);
p /= (3 * k) * (3 * k - 1) * (3 * k - 2) * k * k * k;
p = -p;
sum += (double) p * (double) a * b;
}
return 1.0 / (12 * sum);
}
int main() {
for (int k = 1; k <= 5; ++k) {
printf("k = %i, PI = %.16f\n", k, calc_PI(k));
}
}
但为了更简化,这个算法采用Chudnovsky公式,如果你不太理解代码,我可以完全简化这个公式。
Summary: We will get a number from 1 to 5 and add it in to a function we will use to get PI. Then 3 numbers are given to you: 545140134 (A), 13591409 (B), 640320 (D). Then we will use D as a double multiplying itself 3 times into another double (ID3). We will then take the square root of ID3 into another double (b) and assign 2 numbers: 1 (p), the value of B (a). Take note that C is case-insensitive. Then a double (sum) will be created by multiplying the value's of p, a and b, all in doubles. Then a loop up until the number given for the function will start and add up A's value to a, b's value gets multiplied by ID3, p's value will be multiplied by multiple values that I hope you can understand and also gets divided by multiple values as well. The sum will add up by p, a and b once again and the loop will repeat until the value of the loop's number is greater or equal to 5. Later, the sum is multiplied by 12 and returned by the function giving us the result of PI.
好吧,这很长,但我想你会理解的……
更好的方法
要获得标准常数(如pi)或标准概念的输出,我们应该首先使用所使用语言中可用的内置方法。它将以最快和最好的方式返回一个值。我正在使用python以最快的方式运行,以获得圆周率的值。
数学库的PI变量。数学库将变量pi存储为常数。
math_pi.py
import math
print math.pi
使用linux /usr/bin/time -v python math_pi.py的time工具运行脚本
输出:
Command being timed: "python math_pi.py"
User time (seconds): 0.01
System time (seconds): 0.01
Percent of CPU this job got: 91%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
用arccos的数学方法
acos_pi.py
import math
print math.acos(-1)
使用linux /usr/bin/time -v python acos_pi.py的time工具运行脚本
输出:
Command being timed: "python acos_pi.py"
User time (seconds): 0.02
System time (seconds): 0.01
Percent of CPU this job got: 94%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
使用BBP公式
bbp_pi.py
from decimal import Decimal, getcontext
getcontext().prec=100
print sum(1/Decimal(16)**k *
(Decimal(4)/(8*k+1) -
Decimal(2)/(8*k+4) -
Decimal(1)/(8*k+5) -
Decimal(1)/(8*k+6)) for k in range(100))
使用linux /usr/bin/time -v python bbp_pi.py的time工具运行脚本
输出:
Command being timed: "python c.py"
User time (seconds): 0.05
System time (seconds): 0.01
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.06
因此,最好的方法是使用语言提供的内置方法,因为它们是获得输出的最快和最好的方法。在python中使用math.pi