Is multiplication and division using shift operators in C actually faster?

For example, multiplication and division can be implemented with bit operators:

i*2 = i<<1
i*3 = (i<<1) + i;
i*10 = (i<<3) + (i<<1)

and so on.

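Is it actually faster to use, say, (i<<3)+(i<<1) to multiply by 10 than to use i*10 directly? Is there any sort of input that can't be multiplied or divided in this way?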

Current answer

Python test performing the same multiplication 10 million times (N = 10*1000*1000) against the same random numbers.

>>> from timeit import timeit
>>> setup_str = 'import scipy; from scipy import random; scipy.random.seed(0)'
>>> N = 10*1000*1000
>>> timeit('x=random.randint(65536);', setup=setup_str, number=N)
1.894096851348877 # time for generating the random numbers with no operation

>>> timeit('x=random.randint(65536); x*2', setup=setup_str, number=N)
2.2799630165100098
>>> timeit('x=random.randint(65536); x << 1', setup=setup_str, number=N)
2.2616429328918457

>>> timeit('x=random.randint(65536); x*10', setup=setup_str, number=N)
2.2799630165100098
>>> timeit('x=random.randint(65536); (x << 3) + (x<<1)', setup=setup_str, number=N)
2.9485139846801758

>>> timeit('x=random.randint(65536); x // 2', setup=setup_str, number=N)
2.490908145904541
>>> timeit('x=random.randint(65536); x / 2', setup=setup_str, number=N)
2.4757170677185059
>>> timeit('x=random.randint(65536); x >> 1', setup=setup_str, number=N)
2.2316000461578369

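So in Python, shifting instead of multiplying or dividing by a power of two gives a slight improvement (~10% for division; ~1% for multiplication). If the factor is not a power of two, there is likely a considerable slowdown.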

Again, these numbers will change depending on your processor and your compiler (or interpreter; this was done in Python for simplicity).

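As everyone else has said, don't optimize prematurely. Write very readable code, profile it if it isn't fast enough, and then try to optimize the slow parts. Remember, your compiler is much better at optimization than you are.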

2011-06-15 18:23:29

Other answers

I think that in the one case where you want to multiply or divide by a power of two, you can't go wrong with bit-shift operators, even if the compiler converts them to a MUL/DIV, because some processors microcode them (really, a macro) anyway, so in those cases you gain something, especially if the shift is by more than 1. More explicitly: if the CPU has no shift instructions, it will be a MUL/DIV anyway, but if it does have them, you avoid a microcode branch, which saves a few instructions.

I am writing some code right now that requires a lot of doubling/halving operations because it works on a dense binary tree, and there is one more operation that I suspect might be more optimal than an addition: a left shift (power-of-two multiply) combined with an addition. This can be replaced with a left shift and an xor if the shift is wider than the number of bits you want to add. An easy example is (i<<1)^1, which adds one to a doubled value. This of course does not apply to a right shift (power-of-two divide), because only a left shift leaves zeroed low-order bits for the added value to occupy.
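To make the shift-and-xor idea concrete, here is a minimal sketch in C. The helper names (child_add, child_xor, shift_combine) are hypothetical and not taken from the answer's own code; unsigned 32-bit values are assumed so that the shifts are well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: doubled index plus one, written with addition. */
static uint32_t child_add(uint32_t i) { return (i << 1) + 1u; }

/* Same value with xor: bit 0 of (i << 1) is always 0, so xor with 1
   simply sets it and no carry can occur. */
static uint32_t child_xor(uint32_t i) { return (i << 1) ^ 1u; }

/* General form: only valid when `low` fits entirely in the k vacated
   low-order bits, i.e. low < 2^k. Otherwise xor and add differ. */
static uint32_t shift_combine(uint32_t i, unsigned k, uint32_t low) {
    assert(k < 32 && low < (UINT32_C(1) << k));
    return (i << k) ^ low;
}

/* Small self-check that can be called from main(). */
void check_shift_xor(void) {
    for (uint32_t i = 0; i < 1000; ++i) {
        assert(child_add(i) == child_xor(i));
        assert(shift_combine(i, 4, 9) == (i << 4) + 9u);
    }
}
```

Whether the xor form is any faster than the addition is a separate question; on most current hardware both are single-cycle operations, so this is best treated as an equivalence, not a guaranteed speedup.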

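In my code these multiplications and divisions by two and by powers of two are used very heavily, and because the formulas are already quite short, each instruction that can be eliminated is a substantial gain. If the processor does not support these shift operators, no gain will happen, but neither will there be a loss.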

Also, in the algorithms I am writing, the shifts visually represent the movements that occur, so in that sense they are in fact clearer. The left-hand side of a binary tree is bigger, and the right is smaller. In addition, odd and even numbers have a special significance in my code: all left-hand children in the tree are odd, and all right-hand children, and the root, are even. In some cases, which I haven't encountered yet but may, x&1 may be a more optimal operation than x%2. x&1 produces zero for an even number and 1 for an odd number.

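Going a bit further, if x&3 is zero I know that 4 is a factor of the number, and the same goes for x&7 and 8, and so on. I know these cases probably have limited utility, but it's nice to know that you can avoid a modulus operation and use a bitwise logic operation instead, because bitwise operations are almost always the fastest and are the least likely to be ambiguous to the compiler.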
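The mask tests mentioned above can be sketched in C as follows. The function names are hypothetical, and the sketch deliberately sticks to unsigned values: for negative signed operands, x % 2 can be -1 in C, so the mask and the modulus are not interchangeable there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Odd test: equivalent to x % 2 != 0 for unsigned x. */
static bool is_odd(uint32_t x) { return (x & 1u) != 0; }

/* Remainder modulo 2^k via a mask of the k low-order bits (k < 32). */
static uint32_t mod_pow2(uint32_t x, unsigned k) {
    assert(k < 32);
    return x & ((UINT32_C(1) << k) - 1u);
}

/* 2^k divides x exactly when the k low-order bits are all zero,
   e.g. (x & 3) == 0 for 4 and (x & 7) == 0 for 8. */
static bool divisible_by_pow2(uint32_t x, unsigned k) {
    return mod_pow2(x, k) == 0;
}

/* Small self-check that can be called from main(). */
void check_masks(void) {
    for (uint32_t x = 0; x < 1000; ++x) {
        assert(is_odd(x) == (x % 2 != 0));
        assert(mod_pow2(x, 2) == x % 4);
        assert(mod_pow2(x, 3) == x % 8);
    }
}
```

For unsigned operands an optimizing compiler will typically generate the mask from x % 8 on its own, so the hand-written form mainly documents intent.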

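I am pretty much inventing the field of dense binary trees, so I expect that people may not grasp the value of this comment, since people very rarely want to factor only by powers of two, or multiply/divide only by powers of two.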

2018-04-06 11:08:41

Just a concrete point of measure: many years ago, I benchmarked two versions of my hashing algorithm:

unsigned
hash( char const* s )
{
    unsigned h = 0;
    while ( *s != '\0' ) {
        h = 127 * h + (unsigned char)*s;
        ++ s;
    }
    return h;
}

and

unsigned
hash( char const* s )
{
    unsigned h = 0;
    while ( *s != '\0' ) {
        h = (h << 7) - h + (unsigned char)*s;
        ++ s;
    }
    return h;
}

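On every machine I benchmarked it on, the first was at least as fast as the second. Somewhat surprisingly, it was sometimes faster (e.g. on a Sun Sparc). When the hardware didn't support fast multiplication (and most didn't back then), the compiler would convert the multiplication into the appropriate combination of shifts and adds/subtracts. And because it knew the final goal, it could sometimes do so in fewer instructions than when you explicitly wrote out the shifts and the adds/subtracts.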

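Note that this was something like 15 years ago. Hopefully, compilers have only gotten better since then, so you can pretty much count on the compiler doing the right thing, probably better than you could. (Also, the reason the code looks so C'ish is because it was over 15 years ago. I'd obviously use std::string and iterators today.)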
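As a quick sanity check that the two versions above really compute the same hash, here is a sketch that places them side by side. The names hash_mul and hash_shift are introduced here only so both can live in one file; the bodies follow the answer's code. They agree for every input because unsigned arithmetic in C wraps modulo 2^N, and 127*h equals 128*h - h under that arithmetic.

```c
#include <assert.h>

/* Version 1: plain multiplication by 127. */
static unsigned hash_mul(char const* s) {
    unsigned h = 0;
    while (*s != '\0') {
        h = 127 * h + (unsigned char)*s;
        ++s;
    }
    return h;
}

/* Version 2: 127*h rewritten by hand as (h << 7) - h. */
static unsigned hash_shift(char const* s) {
    unsigned h = 0;
    while (*s != '\0') {
        h = (h << 7) - h + (unsigned char)*s;
        ++s;
    }
    return h;
}

/* Small self-check that can be called from main(). */
void check_hashes_agree(void) {
    char const* samples[] = { "", "a", "ab", "hello, world",
                              "a much longer string that overflows unsigned" };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(hash_mul(samples[i]) == hash_shift(samples[i]));
    }
}
```

Since the results are identical, the only remaining difference is the instruction sequence the compiler picks, which is what the benchmark above was measuring.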

2011-06-15 12:35:57

Whether it is actually faster depends on the hardware and compiler actually used.
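One way to find out for a particular hardware and compiler combination is to write both forms and compare the generated code or timings. The sketch below uses hypothetical function names and only establishes that the two forms of the question's i*10 example are interchangeable for unsigned values, which is what gives the compiler the freedom to pick whichever sequence it considers cheapest.

```c
#include <assert.h>
#include <limits.h>

/* Multiplication written the obvious way. */
unsigned mul10_plain(unsigned i) { return i * 10u; }

/* The hand-written decomposition from the question: 10*i = 8*i + 2*i. */
unsigned mul10_shift(unsigned i) { return (i << 3) + (i << 1); }

/* Small self-check: both forms agree, including when the result wraps
   around, because unsigned arithmetic is modulo 2^N. */
void check_mul10(void) {
    for (unsigned i = 0; i < 100000u; ++i) {
        assert(mul10_plain(i) == mul10_shift(i));
    }
    assert(mul10_plain(UINT_MAX) == mul10_shift(UINT_MAX));
}
```

Compiling such a file with optimization enabled and inspecting the assembly (for example with the compiler's -S option) shows what the compiler actually emits for each function on a given target.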

2019-07-28 10:22:05

Don't do it unless you absolutely need to and the intent of your code is shifting rather than multiplication/division.

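On a typical day you might save a few machine cycles (or lose some, since the compiler knows better what to optimize), but the cost isn't worth it: you spend time on minor details rather than on actual work, the code becomes harder to maintain, and your co-workers will curse you.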

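You might need to do it for high-load computations, where each saved cycle means minutes of runtime. But you should optimize one place at a time and run performance tests each time to see whether you really made it faster or broke the compiler's logic.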
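Testing after each change matters for correctness as well as speed, because a shift is not always a drop-in replacement. A minimal sketch in C, with hypothetical function names: for negative signed values, division truncates toward zero, whereas the result of a right shift is implementation-defined (most compilers use an arithmetic shift, which rounds toward negative infinity).

```c
#include <assert.h>

/* Division truncates toward zero (guaranteed since C99): -7 / 2 == -3. */
static int half_div(int x) { return x / 2; }

/* Right shift of a negative value is implementation-defined in C.
   With the common arithmetic shift, -7 >> 1 yields -4, not -3.
   (Left-shifting a negative value is undefined behavior altogether.) */
static int half_shift(int x) { return x >> 1; }

/* Small self-check restricted to what the C standard guarantees. */
void check_halving(void) {
    assert(half_div(-7) == -3);
    for (int x = 0; x < 1000; ++x) {
        assert(half_div(x) == half_shift(x)); /* non-negative: same */
    }
}
```

This is also why a compiler cannot turn a signed x / 2 into a bare shift; it has to emit a small correction for negative inputs, which it does automatically.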

2011-06-15 13:48:49

I agree with the accepted answer by Drew Hall. The answer could use some additional notes, though.

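For the vast majority of software developers, the processor and compiler are no longer relevant to the question. Most of us are far beyond the 8088 and MS-DOS. It is perhaps only relevant for those who are still developing for embedded processors...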

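At my software company, math (add/sub/mul/div) should be used for all mathematics, while shifts should be used when converting between data types, e.g. obtaining a byte as n>>8 and not n/256.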
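A short sketch of that convention in C, under the assumption that the conversion in question is splitting a 16-bit value into bytes; the function names are hypothetical. The shift and mask state the intent (select these bits) directly, where n/256 and n%256 would describe the same result as arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 8 bits of a 16-bit value: a data-layout operation, so a shift. */
static uint8_t high_byte(uint16_t n) { return (uint8_t)(n >> 8); }

/* Lower 8 bits, selected with a mask. */
static uint8_t low_byte(uint16_t n) { return (uint8_t)(n & 0xFFu); }

/* Inverse operation: reassemble the 16-bit value from its two bytes. */
static uint16_t join_bytes(uint8_t hi, uint8_t lo) {
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* Small self-check that can be called from main(). */
void check_bytes(void) {
    for (uint32_t n = 0; n <= 0xFFFFu; ++n) {
        uint16_t v = (uint16_t)n;
        assert(high_byte(v) == v / 256);
        assert(join_bytes(high_byte(v), low_byte(v)) == v);
    }
}
```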

2012-12-03 19:24:31