I need a simple floating-point rounding function, like this:

double round(double);

round(0.1) = 0
round(-0.1) = 0
round(-0.9) = -1

I can find ceil() and floor() in math.h, but not round().

Is it present in the standard C++ library under another name, or is it missing?


Current answer

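If you ultimately want to convert the double output of your round() function to an int, then the accepted solutions to this question will look something like: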

int roundint(double r) {
  return (int)((r > 0.0) ? floor(r + 0.5) : ceil(r - 0.5));
}

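This clocks in at around 8.88 ns on my machine when passed uniformly random values.

The function below is equivalent as far as I can tell, but clocks in at 2.48 ns on my machine, a significant performance advantage: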

int roundint (double r) {
  int tmp = static_cast<int> (r);
  tmp += (r-tmp>=.5) - (r-tmp<=-.5);
  return tmp;
}

One of the reasons for the better performance is that the branch is skipped.
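As a rough sanity check of the "equivalent as far as I can tell" claim, the two variants above can be compared on a few inputs. This is only a sketch: the names roundint_floor_ceil, roundint_truncate_adjust and check_variants_agree are made up for this illustration, and agreeing on a handful of samples is not a proof of equivalence.

```cpp
#include <cassert>
#include <cmath>

// Variant 1 from above: floor/ceil based, selected by the sign of r.
int roundint_floor_ceil(double r) {
    return (int)((r > 0.0) ? std::floor(r + 0.5) : std::ceil(r - 0.5));
}

// Variant 2 from above: truncate toward zero, then adjust by -1, 0 or +1.
int roundint_truncate_adjust(double r) {
    int tmp = static_cast<int>(r);
    tmp += (r - tmp >= .5) - (r - tmp <= -.5);
    return tmp;
}

// Hypothetical self-check: asserts that both variants agree on sample inputs,
// including the values from the question and a few exact ties.
void check_variants_agree() {
    const double samples[] = {0.1, -0.1, -0.9, 0.5, -0.5, 2.5, -2.5, 1.49, -1.51};
    for (double r : samples) {
        assert(roundint_floor_ceil(r) == roundint_truncate_adjust(r));
    }
}
```

One case where the two can differ is the largest double below 0.5 (0.49999999999999994): there r + 0.5 already rounds up to 1.0, so the floor-based variant returns 1 while the truncating variant returns 0. Both variants are undefined for values outside the range of int.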

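Other answers

Building on Kalaxy's response, the following is a templated solution that rounds any floating-point number to the nearest integer using natural rounding. In debug mode it also throws an error if the value is out of range for the integer type, so it can serve roughly as a viable library function.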

    #include <limits>
    #include <stdexcept>

    // round a floating point number to the nearest integer
    template <typename Arg>
    int Round(Arg arg)
    {
#ifndef NDEBUG
        // check that the argument can be rounded given the return type:
        if (
            ((Arg)std::numeric_limits<int>::max() < arg + (Arg) 0.5) ||
            ((Arg)std::numeric_limits<int>::lowest() > arg - (Arg) 0.5)
            )
        {
            throw std::overflow_error("out of bounds");
        }
#endif

        // add or subtract 0.5, then let the cast truncate toward zero
        return (arg > (Arg) 0.0) ? (int)(arg + (Arg) 0.5) : (int)(arg - (Arg) 0.5);
    }

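It may be worth noting that if you want an integer result from the rounding, you don't need to pass the value through either ceil or floor. That is,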

int round_int( double r ) {
    return (r > 0.0) ? (r + 0.5) : (r - 0.5); 
}

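It's usually implemented as floor(value + 0.5).

Edit: it's probably not called round because there are at least three rounding algorithms I know of: round toward zero, round to the closest integer, and banker's rounding. You are asking for round to the closest integer.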
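To make the difference between those three algorithms concrete, here is a small sketch. The function names are hypothetical, and round_half_even is a hand-written illustration of banker's rounding rather than a library routine.

```cpp
#include <cassert>
#include <cmath>

// Round toward zero: simply drops the fractional part.
double round_toward_zero(double x) { return std::trunc(x); }

// Round to the closest integer, with ties going away from zero.
double round_half_away(double x) { return std::round(x); }

// Banker's rounding: closest integer, with ties going to the even neighbour.
double round_half_even(double x) {
    double r = std::round(x);                   // correct for every non-tie input
    if (std::fabs(x - std::trunc(x)) == 0.5) {  // x is exactly halfway
        r = 2.0 * std::round(x / 2.0);          // pick the even neighbour
    }
    return r;
}

// Hypothetical self-check showing where the three algorithms disagree.
void check_three_rounding_algorithms() {
    assert(round_toward_zero(2.7) == 2.0);
    assert(round_half_away(2.5) == 3.0);
    assert(round_half_even(2.5) == 2.0);
    assert(round_half_even(3.5) == 4.0);
}
```

The three only disagree on inputs with a fractional part: round_toward_zero differs from the other two whenever the fraction is above one half, and the two nearest-integer variants differ only on exact ties.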

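These days it shouldn't be a problem to use a C++11 compiler that includes a C99/C++11 math library. But then the question becomes: which rounding function do you pick?

C99/C++11 round() is often not actually the rounding function you want. It uses a funky rounding mode that rounds away from 0 as a tie-break on half-way cases (+-xxx.5000). If you do specifically want that rounding mode, or you're targeting a C++ implementation where round() is faster than rint(), then use it (or emulate its behaviour with one of the other answers to this question, which took it at face value and carefully reproduced that specific rounding behaviour).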

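round()'s rounding is different from the IEEE754 default round-to-nearest mode with even as a tie-break. Nearest-even avoids statistical bias in the average magnitude of numbers, but does bias towards even numbers.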
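The bias is easy to see on a run of exact ties. The sketch below assumes the default rounding mode (round to nearest, ties to even) is in effect; the helper names are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Sums the inputs after rounding each one with std::round (ties away from zero).
double sum_with_round(const double* v, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += std::round(v[i]);
    return sum;
}

// Sums the inputs after rounding each one with std::rint
// (ties to even, assuming the default rounding mode).
double sum_with_rint(const double* v, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += std::rint(v[i]);
    return sum;
}

// Hypothetical self-check: the exact sum of the inputs is 8.0.
void check_tie_break_bias() {
    const double ties[] = {0.5, 1.5, 2.5, 3.5};
    assert(sum_with_round(ties, 4) == 10.0);  // 1 + 2 + 3 + 4: drifts upward
    assert(sum_with_rint(ties, 4) == 8.0);    // 0 + 2 + 2 + 4: matches the exact sum
}
```

With ties away from zero every positive tie is pushed up, so the magnitude of the sum grows. With ties to even the upward and downward adjustments cancel, at the cost of producing only even results for ties.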

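There are two math library rounding functions that use the current default rounding mode: std::nearbyint() and std::rint(), both added in C99/C++11, so they're available any time std::round() is. The only difference is that nearbyint never raises FE_INEXACT.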
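"Current rounding mode" means the result follows whatever mode was last selected with std::fesetround. The sketch below illustrates that with a hypothetical helper, rint_with_mode. The volatile variables are an assumption of this example: they are there to discourage the compiler from folding or moving the std::rint call across the mode changes, since compilers do not always model the floating-point environment strictly.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical helper: rounds x with std::rint under the given rounding mode,
// then restores the mode that was active on entry.
double rint_with_mode(double x, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile double in = x;               // read after the mode change
    volatile double out = std::rint(in);  // stored before the mode is restored
    std::fesetround(saved);
    return out;
}

// Hypothetical self-check: the same function call rounds differently per mode.
void check_rint_follows_mode() {
    assert(rint_with_mode(2.5, FE_TONEAREST) == 2.0);   // tie goes to even
    assert(rint_with_mode(2.1, FE_UPWARD) == 3.0);      // toward +infinity
    assert(rint_with_mode(2.9, FE_DOWNWARD) == 2.0);    // toward -infinity
    assert(rint_with_mode(-2.9, FE_TOWARDZERO) == -2.0);
}
```

std::nearbyint could be substituted for std::rint here and would return the same values; the two differ only in whether FE_INEXACT may be raised.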

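Prefer rint() for performance reasons: gcc and clang both inline it more easily, but gcc never inlines nearbyint() (even with -ffast-math).


gcc/clang for x86-64 and AArch64

I put some test functions on Matt Godbolt's Compiler Explorer, where you can see source + asm output (for multiple compilers). For more about reading compiler output, see this Q&A, and Matt's CppCon2017 talk: "What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid".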

In FP code, it's usually a big win to inline small functions. Especially on non-Windows, where the standard calling convention has no call-preserved registers, so the compiler can't keep any FP values in XMM registers across a call. So even if you don't really know asm, you can still easily see whether it's just a tail-call to the library function or whether it inlined to one or two math instructions. Anything that inlines to one or two instructions is better than a function call (for this particular task on x86 or ARM).

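On x86, anything that inlines to SSE4.1 roundsd can auto-vectorize with SSE4.1 roundpd (or AVX vroundpd). (FP->integer conversions are also available in packed SIMD form, except for FP->64-bit integer, which requires AVX512.)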

std::nearbyint():
  - x86 clang: inlines to a single insn with -msse4.1.
  - x86 gcc: inlines to a single insn only with -msse4.1 -ffast-math, and only on gcc 5.4 and earlier. Later gcc never inlines it (maybe they didn't realize that one of the immediate bits can suppress the inexact exception? That's what clang uses, but older gcc uses the same immediate as for rint when it does inline it).
  - AArch64 gcc6.3: inlines to a single insn by default.

std::rint:
  - x86 clang: inlines to a single insn with -msse4.1.
  - x86 gcc7: inlines to a single insn with -msse4.1. (Without SSE4.1, inlines to several instructions.)
  - x86 gcc6.x and earlier: inlines to a single insn with -ffast-math -msse4.1.
  - AArch64 gcc: inlines to a single insn by default.

std::round:
  - x86 clang: doesn't inline.
  - x86 gcc: inlines to multiple instructions with -ffast-math -msse4.1, requiring two vector constants.
  - AArch64 gcc: inlines to a single instruction (HW support for this rounding mode as well as IEEE default and most others).

std::floor / std::ceil / std::trunc:
  - x86 clang: inlines to a single insn with -msse4.1.
  - x86 gcc7.x: inlines to a single insn with -msse4.1.
  - x86 gcc6.x and earlier: inlines to a single insn with -ffast-math -msse4.1.
  - AArch64 gcc: inlines by default to a single instruction.


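Rounding to int / long / long long:

You have two options here: use lrint (like rint, but returns long, or llrint, which returns long long), or use an FP->FP rounding function and then convert to an integer type the normal way (with truncation). Some compilers optimize one way better than the other.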

long l = lrint(x);

int  i = (int)rint(x);

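Note that int i = lrint(x) converts float or double -> long first, and then truncates the integer to int. This makes a difference for out-of-range integers: it is Undefined Behaviour in C++, but well-defined for the x86 FP -> int instructions (which the compiler will emit unless it sees the UB at compile time while doing constant propagation, in which case it's allowed to make code that breaks if it's ever executed).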
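One way to stay clear of the out-of-range case is to check the rounded value before converting. The following is only a sketch of that idea: checked_round_to_int is a hypothetical helper, not a standard function, and it relies on INT_MIN and INT_MAX being exactly representable as double (true for 32-bit int).

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Hypothetical helper: rounds x with the current rounding mode and stores the
// result in *out only when it fits in an int. Returns false otherwise.
bool checked_round_to_int(double x, int* out) {
    const double r = std::rint(x);  // NaN and +-infinity pass through unchanged
    if (!(r >= (double)INT_MIN && r <= (double)INT_MAX)) {
        return false;               // out of range, or NaN (comparisons are false)
    }
    *out = (int)r;                  // r is integral and in range: well-defined
    return true;
}

// Hypothetical self-check covering an in-range value and two rejected inputs.
void check_checked_round_to_int() {
    int v = 0;
    assert(checked_round_to_int(2.5, &v) && v == 2);  // tie to even by default
    assert(!checked_round_to_int(1e10, &v));          // too large for int
    assert(!checked_round_to_int(NAN, &v));           // NaN is rejected
}
```

The range test is done on the already rounded double, so a value such as 2147483647.5 is rejected when it rounds up to 2147483648.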

On x86, an FP->integer conversion that overflows the integer produces INT_MIN or LLONG_MIN (a bit-pattern of 0x80000000 or the 64-bit equivalent, with just the sign-bit set). Intel calls this the "integer indefinite" value. (See the cvttsd2si manual entry, the SSE2 instruction that converts (with truncation) scalar double to signed integer. It's available with a 32-bit or 64-bit integer destination (the latter in 64-bit mode only).) There's also a cvtsd2si (convert with current rounding mode), which is what we'd like the compiler to emit, but unfortunately gcc and clang won't do that without -ffast-math.

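Also beware that FP to/from unsigned int / long is less efficient on x86 (without AVX512). Conversion to 32-bit unsigned on a 64-bit machine is pretty cheap: just convert to 64-bit signed and truncate. But otherwise it's significantly slower.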
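Written out by hand, the cheap 32-bit unsigned path looks roughly like the sketch below. The function name to_u32_via_i64 is hypothetical, and the input is assumed to lie in [0, 2^32); compilers typically generate this sequence on their own for a plain cast to a 32-bit unsigned type, so this is for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Converts through a 64-bit signed integer (truncating the fraction), then
// narrows to 32 bits. Assumes 0 <= x < 2^32, so the narrowing loses nothing.
std::uint32_t to_u32_via_i64(double x) {
    return static_cast<std::uint32_t>(static_cast<std::int64_t>(x));
}

// Hypothetical self-check with a value that would not fit in a signed 32-bit int.
void check_to_u32_via_i64() {
    assert(to_u32_via_i64(3000000000.7) == 3000000000u);
    assert(to_u32_via_i64(0.9) == 0u);
}
```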

  - x86 clang with/without -ffast-math -msse4.1: (int/long)rint inlines to roundsd / cvttsd2si (missed optimization to cvtsd2si). lrint doesn't inline at all.
  - x86 gcc6.x and earlier without -ffast-math: neither way inlines.
  - x86 gcc7 without -ffast-math: (int/long)rint rounds and converts separately (with 2 total instructions if SSE4.1 is enabled, otherwise with a bunch of code inlined for rint without roundsd). lrint doesn't inline.
  - x86 gcc with -ffast-math: all ways inline to cvtsd2si (optimal), no need for SSE4.1.
  - AArch64 gcc6.3 without -ffast-math: (int/long)rint inlines to 2 instructions. lrint doesn't inline.
  - AArch64 gcc6.3 with -ffast-math: (int/long)rint compiles to a call to lrint. lrint doesn't inline. This may be a missed optimization unless the two instructions we get without -ffast-math are very slow.