在这个网站上已经有很多性能问题了,但是在我看来,几乎所有的问题都是非常具体的,而且相当狭窄。几乎所有人都重复了避免过早优化的建议。

我们假设:

代码已经正常工作了 所选择的算法对于问题的环境已经是最优的 对代码进行了测量,并隔离了有问题的例程 所有优化的尝试也将被衡量,以确保它们不会使事情变得更糟

我在这里寻找的是策略和技巧,在一个关键算法中,当没有其他事情可做,但无论如何都要挤出最后百分之几。

理想情况下,尽量让答案与语言无关,并在适用的情况下指出所建议的策略的任何缺点。

我将添加一个带有我自己最初建议的回复,并期待Stack Overflow社区能想到的任何其他东西。


当前回答

分而治之

如果正在处理的数据集太大,则对其中的大块进行循环。如果代码编写正确,实现应该很容易。如果您有一个单片程序,现在您就更清楚了。

其他回答

不好说。这取决于代码的样子。如果我们可以假设代码已经存在,那么我们可以简单地查看它并从中找出如何优化它。

更好的缓存位置,循环展开,尽量消除长依赖链,以获得更好的指令级并行性。尽可能选择有条件的移动而不是分支。尽可能利用SIMD指令。

理解你的代码在做什么,理解它运行在什么硬件上。然后,决定需要做什么来提高代码的性能就变得相当简单了。这是我能想到的唯一一个真正具有普遍性的建议。

好吧,还有“在SO上显示代码,并为特定的代码段寻求优化建议”。

建议:

Pre-compute rather than re-calculate: any loops or repeated calls that contain calculations that have a relatively limited range of inputs, consider making a lookup (array or dictionary) that contains the result of that calculation for all values in the valid range of inputs. Then use a simple lookup inside the algorithm instead. Down-sides: if few of the pre-computed values are actually used this may make matters worse, also the lookup may take significant memory. Don't use library methods: most libraries need to be written to operate correctly under a broad range of scenarios, and perform null checks on parameters, etc. By re-implementing a method you may be able to strip out a lot of logic that does not apply in the exact circumstance you are using it. Down-sides: writing additional code means more surface area for bugs. Do use library methods: to contradict myself, language libraries get written by people that are a lot smarter than you or me; odds are they did it better and faster. Do not implement it yourself unless you can actually make it faster (i.e.: always measure!) Cheat: in some cases although an exact calculation may exist for your problem, you may not need 'exact', sometimes an approximation may be 'good enough' and a lot faster in the deal. Ask yourself, does it really matter if the answer is out by 1%? 5%? even 10%? Down-sides: Well... the answer won't be exact.

谷歌方法是一个选项“缓存它..”只要可能,不要碰磁盘。”

When you get to the point that you're using efficient algorithms its a question of what you need more speed or memory. Use caching to "pay" in memory for more speed or use calculations to reduce the memory footprint. If possible (and more cost effective) throw hardware at the problem - faster CPU, more memory or HD could solve the problem faster then trying to code it. Use parallelization if possible - run part of the code on multiple threads. Use the right tool for the job. some programing languages create more efficient code, using managed code (i.e. Java/.NET) speed up development but native programing languages creates faster running code. Micro optimize. Only were applicable you can use optimized assembly to speed small pieces of code, using SSE/vector optimizations in the right places can greatly increase performance.

如果你有很多高度并行的浮点运算——尤其是单精度运算——尝试使用OpenCL或(对于NVidia芯片)CUDA将其卸载到图形处理器上(如果有的话)。gpu在着色器中拥有强大的浮点计算能力,这比CPU要大得多。