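One of the stated reasons for learning assembler is that it can sometimes be used to write code that performs better than code written in a higher-level language, C in particular. However, I have also heard it said many times that, although this is not entirely false, the cases where assembler can actually be used to produce faster code are extremely rare and require expert knowledge of and experience with assembly.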

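This question does not even get into the fact that assembler instructions are machine-specific and non-portable, or any of the other aspects of assembler. There are plenty of good reasons for knowing assembly besides this one, of course, but this is meant to be a specific question asking for examples and data, not an extended discourse on assembler versus higher-level languages.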

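Can anyone provide specific examples of cases where assembly is faster than well-written C code compiled with a modern compiler, and can you support that claim with profiling evidence? I am fairly confident these cases exist, but I really want to know exactly how esoteric they are, since this seems to be a point of some contention.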


Current answer

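This is hard to answer specifically, because the question is very unspecific: what exactly is a "modern compiler"?

In theory, practically any manual assembler optimization could be done by a compiler as well. Whether it actually is done cannot be stated in general, only for a specific version of a specific compiler. Many optimizations probably require so much effort to determine whether they can be applied without side effects in a particular context that compiler writers do not bother with them.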

Other answers

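How about creating machine code at run time?

My brother once (around 2000) built an extremely fast real-time ray tracer by generating code at run time. I can't remember the details, but there was some kind of main module that looped over the objects, then prepared and executed machine code specific to each object.

Over time, however, this approach was outmoded by new graphics hardware and became useless.

Today, I think some operations on big data (millions of records), such as pivot tables, drilling, and on-the-fly calculations, could be optimized with this method. The question is: is the effort worth it?

I can't give specific examples because it was too many years ago, but there were plenty of cases where hand-written assembler could outperform any compiler. The reasons why: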

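- You could deviate from calling conventions and pass arguments in registers.
- You could carefully consider how to use registers and avoid storing variables in memory.
- For things like jump tables, you could avoid having to bounds-check the index.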
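The jump-table point can be approximated in C, as a rough sketch rather than a claim about what any particular compiler emits. It assumes a hypothetical 2-bit opcode and a table of exactly four handlers (`op_add`, `op_sub`, `op_mul`, `op_and`, and `dispatch` are all names invented for illustration). Because the index is masked to the table size, no range comparison is needed before the indirect call, which is the knowledge the assembly programmer relies on.

```c
#include <assert.h>
#include <stdint.h>

typedef int (*op_fn)(int, int);   /* hypothetical handler type */

static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }
static int op_and(int a, int b) { return a & b; }

/* Exactly four entries, so a 2-bit index can never fall outside the table. */
static const op_fn ops[4] = { op_add, op_sub, op_mul, op_and };

int dispatch(uint8_t opcode, int a, int b)
{
    unsigned idx = opcode & 3u;                 /* masking replaces the range check */
    assert(idx < sizeof ops / sizeof ops[0]);   /* documents the invariant; compiled out with NDEBUG */
    return ops[idx](a, b);
}
```

For example, `dispatch(2, 6, 3)` calls `op_mul` and returns 18. Opcodes above 3 wrap around instead of being rejected, which is only acceptable when the caller guarantees the encoding.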

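Basically, compilers do a pretty good job of optimizing, and that is nearly always "good enough". But in some situations (like graphics rendering) where you are paying dearly for every single cycle, you can take shortcuts because you know the code, whereas a compiler cannot because it has to stay on the safe side.

In fact, I have heard of some graphics rendering code where a routine, such as a line-draw or polygon-fill routine, actually generated a small block of machine code on the stack and executed it there, so as to avoid continual decision-making about line style, width, pattern, and so on.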
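Generating machine code on the stack cannot be shown in portable C, so the sketch below swaps in a different, much weaker technique with the same goal: make the style decision once, outside the inner loop, by selecting a specialized routine through a function pointer. All names (`span_fn`, `span_solid`, `span_dotted`, `pick_span`, `draw_rows`) and the 8-bit pixel layout are assumptions made for illustration, not the code the answer refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*span_fn)(uint8_t *dst, size_t n, uint8_t color);

/* Specialized for a solid line: no style test inside the loop. */
static void span_solid(uint8_t *dst, size_t n, uint8_t color)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = color;
}

/* Specialized for a dotted line: writes every other pixel. */
static void span_dotted(uint8_t *dst, size_t n, uint8_t color)
{
    for (size_t i = 0; i < n; i += 2)
        dst[i] = color;
}

/* The style decision is made here, once. */
span_fn pick_span(int dotted)
{
    return dotted ? span_dotted : span_solid;
}

void draw_rows(uint8_t *pixels, size_t rows, size_t width, int dotted)
{
    assert(pixels != NULL);
    span_fn span = pick_span(dotted);   /* chosen before the loop, not per pixel */
    for (size_t r = 0; r < rows; r++)
        span(pixels + r * width, width, 255);
}
```

Run-time code generation goes further by also removing the indirect call and folding constants such as width and pattern into the instructions themselves.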

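That said, what I want a compiler to do is generate good assembly code for me without being too clever, and they mostly do that. In fact, one of the things I hate about Fortran is that it scrambles the code in an attempt to "optimize" it, usually to no significant purpose.

Usually, when apps have performance problems, it is due to wasteful design. These days, I would never recommend assembler for performance unless the overall app had already been tuned within an inch of its life, was still not fast enough, and was spending all its time in tight inner loops.

Added: I have seen plenty of apps written in assembly language, and the main speed advantage over a language like C, Pascal, or Fortran was that the programmer was far more careful when coding in assembler. He or she is going to write roughly 100 lines of code a day regardless of language, and in a compiler language that is going to equal 300 or 400 instructions.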

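Walter Bright's "Optimizing Immutable and Purity" might be worth a look. It is not a profiled test, but it shows you an example of the difference between handwritten and compiler-generated ASM. Walter Bright writes optimizing compilers, so his other blog posts may be worth a look as well.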

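Although C is "close" to the low-level manipulation of 8-bit, 16-bit, 32-bit, and 64-bit data, there are a few mathematical operations not supported by C that can often be performed elegantly in certain assembly instruction sets: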

1. Fixed-point multiplication: The product of two 16-bit numbers is a 32-bit number. But the rules in C say that the product of two 16-bit numbers is a 16-bit number, and the product of two 32-bit numbers is a 32-bit number (the bottom half in both cases). If you want the top half of a 16x16 multiply or a 32x32 multiply, you have to play games with the compiler. The general method is to cast to a larger-than-necessary bit width, multiply, shift down, and cast back:

       int16_t x, y;
       // int16_t is a typedef for "short"
       // set x and y to something
       int16_t prod = (int16_t)(((int32_t)x*y)>>16);

   In this case the compiler may be smart enough to know that you are really just trying to get the top half of a 16x16 multiply and do the right thing with the machine's native 16x16 multiply. Or it may be stupid and require a library call to do a 32x32 multiply, which is overkill because you only need 16 bits of the product, but the C standard does not give you any way to express yourself.

2. Certain bitshifting operations (rotation/carries):

       // 256-bit array shifted right in its entirety:
       uint8_t x[32];
       for (int i = 32; --i > 0; )
       {
          x[i] = (x[i] >> 1) | (x[i-1] << 7);
       }
       x[0] >>= 1;

   This is not too inelegant in C, but again, unless the compiler is smart enough to realize what you are doing, it is going to do a lot of "unnecessary" work. Many assembly instruction sets allow you to rotate or shift left/right with the result in the carry flag, so you could accomplish the above in 34 instructions: load a pointer to the beginning of the array, clear the carry, and perform 32 8-bit right-shifts, using auto-increment on the pointer.

   For another example, there are linear feedback shift registers (LFSRs) that are elegantly performed in assembly: take a chunk of N bits (8, 16, 32, 64, 128, etc.), shift the whole thing right by 1 (see the algorithm above), then if the resulting carry is 1, XOR in a bit pattern that represents the polynomial.
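To make the two descriptions concrete, here is a minimal C sketch of the "multiply high" idiom and of a single LFSR step. The function names `mul_hi16` and `lfsr_step`, the 16-bit register width, and the tap mask `0xB400` are assumptions chosen for illustration; the answer does not specify a polynomial. In C the shifted-out bit has to be saved by hand before the shift, which is the job the carry flag does for free in assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Top 16 bits of a signed 16x16 multiply: widen so the full 32-bit
   product exists, then keep its upper half. Right-shifting a negative
   value is implementation-defined in C; common compilers shift
   arithmetically. */
int16_t mul_hi16(int16_t x, int16_t y)
{
    int32_t wide = (int32_t)x * (int32_t)y;
    return (int16_t)(wide >> 16);
}

/* One step of a 16-bit right-shifting LFSR. The bit shifted out plays
   the role of the carry flag; 0xB400 is an illustrative tap mask. */
uint16_t lfsr_step(uint16_t state)
{
    assert(state != 0);              /* the all-zero state never changes */
    uint16_t carry = state & 1u;     /* save the bit about to be shifted out */
    state >>= 1;
    if (carry)
        state ^= 0xB400u;            /* XOR in the polynomial pattern */
    return state;
}
```

For instance, `mul_hi16(0x4000, 0x4000)` yields `0x1000`, and `lfsr_step(0xACE1)` yields `0xE270`. Whether a compiler turns either function into the single native instruction depends on the target and the optimizer.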

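That said, I would not resort to these techniques unless I had serious performance constraints. As others have said, assembly code is much harder to document, debug, test, and maintain than C code: the performance gain comes with some serious costs.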

Edit: 3. Overflow detection is possible in assembly (you can't really do it in C), which makes some algorithms much easier.
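For comparison, this is roughly what the workaround looks like in portable C. Signed overflow is undefined behavior in C, so the test cannot inspect the result of the addition; it has to compare the operands against the limits beforehand. The helper name `add_overflows` is hypothetical. These extra comparisons are the work that a single check of the overflow flag replaces in assembly.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if a + b would overflow an int; otherwise stores the
   sum in *sum and returns false. The check happens before the add
   because signed overflow is undefined behavior in C. */
bool add_overflows(int a, int b, int *sum)
{
    assert(sum != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return true;
    *sum = a + b;
    return false;
}
```

Calling `add_overflows(INT_MAX, 1, &s)` returns true and leaves `s` untouched, while `add_overflows(1, 2, &s)` returns false and stores 3.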

Actually, you can build large-scale programs in a large-model mode: segments may be restricted to 64 KB of code, but you can write many segments. People argue against ASM on the grounds that it is an old language and we no longer need to conserve memory. If that were the case, why would we be packing our PCs with memory? The only flaw I can find with ASM is that it is more or less processor-based, so most programs written for the Intel architecture most likely would not run on an AMD architecture.

As for C being faster than ASM: there is no language faster than ASM, and ASM can do many things at the processor level that C and other HLLs cannot. ASM is a difficult language to learn, but once you learn it, no HLL can translate it better than you can. If you could see some of the things HLLs do to your code, and understand what they are doing, you would wonder why more people don't use ASM and why assemblers are no longer being updated (for general public use, anyway). So no, C is not faster than ASM. Even experienced C++ programmers still write chunks of code in ASM and add them to their C++ code for speed.

The idea that certain other languages are obsolete or no good is also a myth at times. For instance, Photoshop was written in Pascal/ASM (the first release of the source has been submitted to the technical history museum), and Paint Shop Pro is still written in Python, Tcl, and ASM. A common denominator of these two fast and great image processors is ASM. Although Photoshop may have upgraded to Delphi by now, it is still Pascal, and any speed problems come from Pascal, because nowadays we care about the way programs look and not what they do. I would like to make a Photoshop clone in pure ASM, which I have been working on, and it is coming along rather well: not code, interpret, arrange, rewrite, and so on, just code and go, process complete.