Performance optimization strategies of last resort

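There are already plenty of performance questions on this site, but it seems to me that almost all of them are very problem-specific and fairly narrow. And almost all of them repeat the advice to avoid premature optimization.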

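Let's assume:

- the code is already working correctly
- the algorithms chosen are already optimal for the circumstances of the problem
- the code has been measured, and the offending routines have been isolated
- all attempts to optimize will also be measured, to ensure they do not make matters worse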

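What I am looking for here are strategies and tricks for squeezing the last few percent out of a critical algorithm, when there is nothing else left to do but whatever it takes.

Ideally, try to keep answers language-agnostic, and point out any downsides of the suggested strategies where applicable.

I'll add a reply with my own initial suggestions, and I look forward to whatever else the Stack Overflow community can think of.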

Current answer

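- Inline routines (eliminates call/return and parameter pushing)
- Try replacing tests/switches with table lookups (if they are faster)
- Unroll loops (Duff's device) to the point where they just fit in the CPU cache
- Localize memory access so as not to blow your cache
- Localize related calculations, if the optimizer isn't already doing that
- Eliminate loop invariants, if the optimizer isn't already doing that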
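As a rough illustration of the table-lookup item in this list, here is a minimal C sketch. The vowel-counting task and the names `count_vowels_switch`, `count_vowels_table` and `is_vowel` are made up for the example; they are not from the answer. Whether the table version is actually faster depends on the compiler, the CPU and the data, so measure both.

```c
#include <assert.h>
#include <stddef.h>

/* Branchy version: one switch (a test) per input character. */
size_t count_vowels_switch(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        switch (*s) {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            ++n;
            break;
        default:
            break;
        }
    }
    return n;
}

/* Lookup table indexed by byte value; entries not listed are zero (C99). */
static const unsigned char is_vowel[256] = {
    ['a'] = 1, ['e'] = 1, ['i'] = 1, ['o'] = 1, ['u'] = 1
};

/* Table version: the per-character test becomes a load and an add. */
size_t count_vowels_table(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        n += is_vowel[(unsigned char)*s];
    return n;
}
```

The downside is the usual one for tables: the 256 bytes compete for cache with the rest of your data, so a lookup that misses the cache can cost more than the branch it replaced.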

2009-05-29 15:05:04

Other answers

Throw more hardware at it!

2009-05-29 14:32:26

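It's hard to give a general answer to this question. It really depends on your problem domain and technical implementation. One general technique that is fairly language-neutral: identify the code hot spots that cannot be eliminated, and hand-optimize them in assembly.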

2009-05-29 14:32:36

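Caching! A cheap way (in programmer effort) to make almost anything faster is to add a caching abstraction layer to any data-movement area of your program, whether that is I/O or just passing/creating objects or structures. It is often easy to add caches to factory classes and reader/writers.

Sometimes the cache will not gain you much, but it is an easy approach: add caching everywhere, then disable it where it does not help. I have often found this gives a huge performance gain without having to micro-analyze the code.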
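A minimal sketch of that idea in C, assuming a small direct-mapped cache placed in front of an expensive call. Everything here is hypothetical: `slow_lookup` is a stand-in for whatever expensive I/O or object creation you have, and `struct cache`, `cached_lookup` and `CACHE_SLOTS` are names invented for the example. The `enabled` flag reflects the "disable it where it does not help" part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 256u

/* Counts how often the slow path runs, so the cache's effect is visible. */
static unsigned long slow_calls = 0;

/* Hypothetical stand-in for an expensive operation (I/O, parsing, ...). */
static uint64_t slow_lookup(uint32_t key)
{
    uint64_t h = key;
    ++slow_calls;
    for (int i = 0; i < 1000; ++i)
        h = h * 6364136223846793005ULL + 1442695040888963407ULL;
    return h;
}

struct cache_entry {
    bool     valid;
    uint32_t key;
    uint64_t value;
};

struct cache {
    bool enabled;                        /* lets you switch the cache off */
    struct cache_entry slot[CACHE_SLOTS];
};

/* Direct-mapped cache: each key maps to exactly one slot; a colliding key
 * simply overwrites the previous occupant. */
uint64_t cached_lookup(struct cache *c, uint32_t key)
{
    assert(c != NULL);
    if (!c->enabled)
        return slow_lookup(key);

    struct cache_entry *e = &c->slot[key % CACHE_SLOTS];
    if (!(e->valid && e->key == key)) {  /* miss: compute and remember */
        e->value = slow_lookup(key);
        e->key = key;
        e->valid = true;
    }
    return e->value;
}
```

A caller would zero-initialize a `struct cache`, set `enabled`, and call `cached_lookup` instead of the slow function. This sketch ignores invalidation: if the underlying data can change, stale entries are the main downside to watch for.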

2009-09-13 16:07:25

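First of all, as mentioned in several earlier answers, find out what is hurting your performance: is it memory, processor, network, database, or something else? Depending on that...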

...if it's memory: find one of the books written long ago by Knuth, from "The Art of Computer Programming" series. Most likely it's the one about sorting and searching; if my memory is wrong, you'll have to find out in which volume he talks about how to deal with slow tape data storage. Mentally transform his memory/tape pair into your pair of cache/main memory (or L1/L2 cache). Study all the tricks he describes. If you don't find something that solves your problem, hire a professional computer scientist to conduct professional research. If your memory issue happens to be with FFT (cache misses at bit-reversed indexes when doing radix-2 butterflies), then don't hire a scientist; instead, manually optimize the passes one by one until you either win or reach a dead end. You mentioned squeezing out the last few percent, right? If it really is just a few, you'll most likely win.

...if it's processor: switch to assembly language. Study the processor specification: what takes ticks, VLIW, SIMD. Function calls are most likely replaceable tick-eaters. Learn loop transformations: pipelining, unrolling. Multiplies and divisions might be replaceable / interpolated with bit shifts (multiplies by small integers might be replaceable with additions). Try tricks with shorter data: if you're lucky, one instruction with 64 bits might turn out to be replaceable with two on 32, or even 4 on 16, or 8 on 8 bits, go figure. Also try longer data: for example, your float calculations might turn out slower than double ones on a particular processor. If you have trigonometric stuff, fight it with pre-calculated tables; also keep in mind that the sine of a small value can be replaced with that value if the loss of precision is within allowed limits.

...if it's network: think about compressing the data you pass over it. Replace XML transfer with binary. Study the protocols. Try UDP instead of TCP if you can somehow handle data loss.

...if it's database: well, go to any database forum and ask for advice. In-memory data grid, optimizing the query plan, etc.
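To make the bit-shift advice from the processor part of this answer concrete, here is a small C sketch. The function names `mul10_shift`, `div_pow2` and `mod_pow2` are invented for the example. It uses unsigned integers only; signed values round differently under shifting, so the same rewrite is not automatically valid for them. Optimizing compilers typically apply these rewrites on their own, so this mostly matters on weak compilers or unusual targets, and it should be measured either way.

```c
#include <assert.h>
#include <stdint.h>

/* x * 10 rewritten as x * 8 + x * 2, i.e. two shifts and one add. */
uint32_t mul10_shift(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Unsigned division by 2^k is a right shift by k. */
uint32_t div_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x >> k;
}

/* Unsigned remainder modulo 2^k is a mask of the low k bits. */
uint32_t mod_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x & ((UINT32_C(1) << k) - 1u);
}
```

The cost is readability: `x * 10` states the intent, while the shifted form needs a comment to be understood later.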

HTH:)

2011-07-29 03:28:14

The last few % are a very CPU- and application-dependent thing...

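Cache architectures differ, some chips have on-chip RAM that you can map directly, ARMs (sometimes) have a vector unit, the SH4 has a useful matrix opcode. Is there a GPU? Maybe a shader is the way to go. TMS320s are very sensitive to branches within loops (so split the loops and move the conditions outside where possible).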
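The last point, moving a condition out of a loop, can be sketched in C as follows. The functions `transform_branch_inside` and `transform_branch_hoisted` and their parameters are hypothetical names for illustration. This assumes the condition does not change during the loop; many compilers perform this transformation themselves at higher optimization levels, so check the generated code before doing it by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the loop-invariant condition is re-tested on every iteration. */
void transform_branch_inside(int *a, size_t n, int use_scale, int k)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (use_scale)
            a[i] *= k;
        else
            a[i] += k;
    }
}

/* After: the condition is tested once, and each loop body is branch-free. */
void transform_branch_hoisted(int *a, size_t n, int use_scale, int k)
{
    assert(a != NULL || n == 0);
    if (use_scale) {
        for (size_t i = 0; i < n; ++i)
            a[i] *= k;
    } else {
        for (size_t i = 0; i < n; ++i)
            a[i] += k;
    }
}
```

The trade-off is duplicated loop code, which grows with the number of conditions hoisted and can itself put pressure on the instruction cache.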

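The list goes on... but these sorts of things really are the last resort...

Compile for x86, and run Valgrind/Cachegrind against the code for proper performance analysis. Or Texas Instruments' CCStudio has a nice profiler. Then you'll really know where to focus...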

2009-08-10 23:59:47
