在C语言中使用移位运算符的乘法和除法真的更快吗?

例如，乘法和除法可以使用位运算符来实现

i*2 = i<<1
i*3 = (i<<1) + i;
i*10 = (i<<3) + (i<<1)

等等。

用say (i<<3)+(i<<1)乘10是否比直接用i*10更快?有没有什么输入是不能用这种方法乘或除的?

当前回答

I think in the one case that you want to multiply or divide by a power of two, you can't go wrong with using bitshift operators, even if the compiler converts them to a MUL/DIV, because some processors microcode (really, a macro) them anyway, so for those cases you will achieve an improvement, especially if the shift is more than 1. Or more explicitly, if the CPU has no bitshift operators, it will be a MUL/DIV anyway, but if the CPU has bitshift operators, you avoid a microcode branch and this is a few instructions less.

I am writing some code right now that requires a lot of doubling/halving operations because it is working on a dense binary tree, and there is one more operation that I suspect might be more optimal than an addition - a left (power of two multiply) shift with an addition. This can be replaced with a left shift and an xor if the shift is wider than the number of bits you want to add, easy example is (i<<1)^1, which adds one to a doubled value. This does not of course apply to a right shift (power of two divide) because only a left (little endian) shift fills the gap with zeros.

在我的代码中，这些乘/除2和2的幂运算被大量使用，因为公式已经很短了，每条可以消除的指令都可以获得很大的收益。如果处理器不支持这些位移操作符，就不会有增益，也不会有损失。

Also, in the algorithms I am writing, they visually represent the movements that occur so in that sense they are in fact more clear. The left hand side of a binary tree is bigger, and the right is smaller. As well as that, in my code, odd and even numbers have a special significance, and all left-hand children in the tree are odd and all right hand children, and the root, are even. In some cases, which I haven't encountered yet, but may, oh, actually, I didn't even think of this, x&1 may be a more optimal operation compared to x%2. x&1 on an even number will produce zero, but will produce 1 for an odd number.

再深入一点，如果x和3是0，我就知道4是这个数的因数，x%7是8，以此类推。我知道这些情况可能有有限的效用，但很高兴知道你可以避免模运算而使用按位逻辑运算，因为按位运算几乎总是最快的，而且对编译器来说不太可能是模糊的。

我在很大程度上发明了密集二叉树的领域，所以我预计人们可能不会理解这个评论的价值，因为很少有人想只对2的幂进行因数分解，或者只对2的幂进行乘/除。

2018-04-06 11:08:41

其他回答

Shift和整数乘法指令在大多数现代cpu上具有相似的性能——在20世纪80年代，整数乘法指令相对较慢，但通常情况下不再是这样。整数乘法指令可能有更高的延迟，所以仍然可能有移位更可取的情况。同样的情况下，你可以让更多的执行单元忙(尽管这是有利有弊)。

整数除法仍然相对较慢，所以使用shift代替2的幂除法仍然是一种胜利，大多数编译器将其作为一种优化来实现。但是请注意，要使这种优化有效，红利需要是无符号的，或者必须已知是正的。对于负红利，移位和除法是不相等的!

#include <stdio.h>

int main(void)
{
    int i;

    for (i = 5; i >= -5; --i)
    {
        printf("%d / 2 = %d, %d >> 1 = %d\n", i, i / 2, i, i >> 1);
    }
    return 0;
}

输出:

5 / 2 = 2, 5 >> 1 = 2
4 / 2 = 2, 4 >> 1 = 2
3 / 2 = 1, 3 >> 1 = 1
2 / 2 = 1, 2 >> 1 = 1
1 / 2 = 0, 1 >> 1 = 0
0 / 2 = 0, 0 >> 1 = 0
-1 / 2 = 0, -1 >> 1 = -1
-2 / 2 = -1, -2 >> 1 = -1
-3 / 2 = -1, -3 >> 1 = -2
-4 / 2 = -2, -4 >> 1 = -2
-5 / 2 = -2, -5 >> 1 = -3

所以如果你想帮助编译器，那么确保变量或表达式在被除数显式无符号。

2011-06-15 11:42:54

有些优化编译器无法做到，因为它们只适用于减少的输入集。

下面是c++示例代码，可以执行更快的除法，执行64位“乘倒数”。分子和分母都必须低于某个阈值。注意，它必须被编译为使用64位指令才能比普通除法更快。

#include <stdio.h>
#include <chrono>

static const unsigned s_bc = 32;
static const unsigned long long s_p = 1ULL << s_bc;
static const unsigned long long s_hp = s_p / 2;

static unsigned long long s_f;
static unsigned long long s_fr;

static void fastDivInitialize(const unsigned d)
{
    s_f = s_p / d;
    s_fr = s_f * (s_p - (s_f * d));
}

static unsigned fastDiv(const unsigned n)
{
    return (s_f * n + ((s_fr * n + s_hp) >> s_bc)) >> s_bc;
}

static bool fastDivCheck(const unsigned n, const unsigned d)
{
    // 32 to 64 cycles latency on modern cpus
    const unsigned expected = n / d;

    // At least 10 cycles latency on modern cpus
    const unsigned result = fastDiv(n);

    if (result != expected)
    {
        printf("Failed for: %u/%u != %u\n", n, d, expected);
        return false;
    }

    return true;
}

int main()
{
    unsigned result = 0;

    // Make sure to verify it works for your expected set of inputs
    const unsigned MAX_N = 65535;
    const unsigned MAX_D = 40000;

    const double ONE_SECOND_COUNT = 1000000000.0;

    auto t0 = std::chrono::steady_clock::now();
    unsigned count = 0;
    printf("Verifying...\n");
    for (unsigned d = 1; d <= MAX_D; ++d)
    {
        fastDivInitialize(d);
        for (unsigned n = 0; n <= MAX_N; ++n)
        {
            count += !fastDivCheck(n, d);
        }
    }
    auto t1 = std::chrono::steady_clock::now();
    printf("Errors: %u / %u (%.4fs)\n", count, MAX_D * (MAX_N + 1), (t1 - t0).count() / ONE_SECOND_COUNT);

    t0 = t1;
    for (unsigned d = 1; d <= MAX_D; ++d)
    {
        fastDivInitialize(d);
        for (unsigned n = 0; n <= MAX_N; ++n)
        {
            result += fastDiv(n);
        }
    }
    t1 = std::chrono::steady_clock::now();
    printf("Fast division time: %.4fs\n", (t1 - t0).count() / ONE_SECOND_COUNT);

    t0 = t1;
    count = 0;
    for (unsigned d = 1; d <= MAX_D; ++d)
    {
        for (unsigned n = 0; n <= MAX_N; ++n)
        {
            result += n / d;
        }
    }
    t1 = std::chrono::steady_clock::now();
    printf("Normal division time: %.4fs\n", (t1 - t0).count() / ONE_SECOND_COUNT);

    getchar();
    return result;
}

2017-06-03 02:15:58

In the case of signed integers and right shift vs division, it can make a difference. For negative numbers, the shift rounds rounds towards negative infinity whereas division rounds towards zero. Of course the compiler will change the division to something cheaper, but it will usually change it to something that has the same rounding behavior as division, because it is either unable to prove that the variable won't be negative or it simply doesn't care. So if you can prove that a number won't be negative or if you don't care which way it will round, you can do that optimization in a way that is more likely to make a difference.

2011-06-15 16:29:18

它是否真的更快取决于实际使用的硬件和编译器。

2019-07-28 10:22:05

我同意德鲁·霍尔的明确回答。不过，答案可能需要一些额外的注释。

对于绝大多数软件开发人员来说，处理器和编译器已经不再与问题相关。我们大多数人远远超出了8088和MS-DOS。它可能只与那些仍在开发嵌入式处理器的人有关……

在我的软件公司，Math (add/sub/mul/div)应该用于所有数学。当数据类型之间转换时应该使用Shift。字节长度为n>>8，而不是n/256。

2012-12-03 19:24:31

在C语言中使用移位运算符的乘法和除法真的更快吗?

推荐文章

最新文章

标签