在这个网站上已经有很多性能问题了,但是在我看来,几乎所有的问题都是非常具体的,而且相当狭窄。几乎所有人都重复了避免过早优化的建议。

我们假设:

代码已经正常工作了 所选择的算法对于问题的环境已经是最优的 对代码进行了测量,并隔离了有问题的例程 所有优化的尝试也将被衡量,以确保它们不会使事情变得更糟

我在这里寻找的是策略和技巧,在一个关键算法中,当没有其他事情可做,但无论如何都要挤出最后百分之几。

理想情况下,尽量让答案与语言无关,并在适用的情况下指出所建议的策略的任何缺点。

我将添加一个带有我自己最初建议的回复,并期待Stack Overflow社区能想到的任何其他东西。


当前回答

当你不能再提高表现时,看看你是否可以提高感知的表现。

您可能无法使您的fooCalc算法更快,但通常有一些方法可以使您的应用程序对用户的响应更快。

举几个例子:

预测用户将要做什么 请求并开始着手这项工作 在那之前 将结果显示为 它们是进来的,而不是同时出现的 在最后 精确的进度计

这些不会让你的程序更快,但可能会让你的用户对你的速度更满意。

其他回答

我想这已经用不同的方式说过了。但是当你在处理一个处理器密集型算法时,你应该以牺牲其他所有东西为代价来简化最内部循环中的所有东西。

That may seem obvious to some, but it's something I try to focus on regardless of the language I'm working with. If you're dealing with nested loops, for example, and you find an opportunity to take some code down a level, you can in some cases drastically speed up your code. As another example, there are the little things to think about like working with integers instead of floating point variables whenever you can, and using multiplication instead of division whenever you can. Again, these are things that should be considered for your most inner loop.

有时,您可能会发现在内循环中对整数执行数学运算的好处,然后将其缩小为随后可以使用的浮点变量。这是一个牺牲一个部分的速度来提高另一个部分的速度的例子,但在某些情况下,这样做是值得的。

我花了一些时间优化在低带宽和长延迟网络(例如卫星、远程、离岸)上运行的客户端/服务器业务系统,并能够通过相当可重复的过程实现一些显著的性能改进。

Measure: Start by understanding the network's underlying capacity and topology. Talking to the relevant networking people in the business, and make use of basic tools such as ping and traceroute to establish (at a minimum) the network latency from each client location, during typical operational periods. Next, take accurate time measurements of specific end user functions that display the problematic symptoms. Record all of these measurements, along with their locations, dates and times. Consider building end-user "network performance testing" functionality into your client application, allowing your power users to participate in the process of improvement; empowering them like this can have a huge psychological impact when you're dealing with users frustrated by a poorly performing system. Analyze: Using any and all logging methods available to establish exactly what data is being transmitted and received during the execution of the affected operations. Ideally, your application can capture data transmitted and received by both the client and the server. If these include timestamps as well, even better. If sufficient logging isn't available (e.g. closed system, or inability to deploy modifications into a production environment), use a network sniffer and make sure you really understand what's going on at the network level. Cache: Look for cases where static or infrequently changed data is being transmitted repetitively and consider an appropriate caching strategy. Typical examples include "pick list" values or other "reference entities", which can be surprisingly large in some business applications. In many cases, users can accept that they must restart or refresh the application to update infrequently updated data, especially if it can shave significant time from the display of commonly used user interface elements. Make sure you understand the real behaviour of the caching elements already deployed - many common caching methods (e.g. HTTP ETag) still require a network round-trip to ensure consistency, and where network latency is expensive, you may be able to avoid it altogether with a different caching approach. Parallelise: Look for sequential transactions that don't logically need to be issued strictly sequentially, and rework the system to issue them in parallel. I dealt with one case where an end-to-end request had an inherent network delay of ~2s, which was not a problem for a single transaction, but when 6 sequential 2s round trips were required before the user regained control of the client application, it became a huge source of frustration. Discovering that these transactions were in fact independent allowed them to be executed in parallel, reducing the end-user delay to very close to the cost of a single round trip. Combine: Where sequential requests must be executed sequentially, look for opportunities to combine them into a single more comprehensive request. Typical examples include creation of new entities, followed by requests to relate those entities to other existing entities. Compress: Look for opportunities to leverage compression of the payload, either by replacing a textual form with a binary one, or using actual compression technology. Many modern (i.e. within a decade) technology stacks support this almost transparently, so make sure it's configured. I have often been surprised by the significant impact of compression where it seemed clear that the problem was fundamentally latency rather than bandwidth, discovering after the fact that it allowed the transaction to fit within a single packet or otherwise avoid packet loss and therefore have an outsize impact on performance. Repeat: Go back to the beginning and re-measure your operations (at the same locations and times) with the improvements in place, record and report your results. As with all optimisation, some problems may have been solved exposing others that now dominate.

In the steps above, I focus on the application related optimisation process, but of course you must ensure the underlying network itself is configured in the most efficient manner to support your application too. Engage the networking specialists in the business and determine if they're able to apply capacity improvements, QoS, network compression, or other techniques to address the problem. Usually, they will not understand your application's needs, so it's important that you're equipped (after the Analyse step) to discuss it with them, and also to make the business case for any costs you're going to be asking them to incur. I've encountered cases where erroneous network configuration caused the applications data to be transmitted over a slow satellite link rather than an overland link, simply because it was using a TCP port that was not "well known" by the networking specialists; obviously rectifying a problem like this can have a dramatic impact on performance, with no software code or configuration changes necessary at all.

谷歌方法是一个选项“缓存它..”只要可能,不要碰磁盘。”

分而治之

如果正在处理的数据集太大,则对其中的大块进行循环。如果代码编写正确,实现应该很容易。如果您有一个单片程序,现在您就更清楚了。

以下是我使用的一些快速而粗糙的优化技术。我认为这是“第一关”优化。

了解时间都花在了什么地方。是文件IO吗?是CPU时间吗?是因为网络吗?是数据库吗?如果IO不是瓶颈,优化IO是没有用的。

了解您的环境了解在哪里进行优化通常取决于开发环境。例如,在VB6中,通过引用传递比通过值传递慢,但是在C和c++中,通过引用传递要快得多。在C语言中,如果返回代码表明失败,尝试一些东西并做一些不同的事情是合理的,而在Dot Net中,捕获异常比尝试前检查有效条件要慢得多。

在频繁查询的数据库字段上构建索引。你几乎总是可以用空间来换取速度。

在要优化的循环内部,我避免了必须进行任何查找。找到循环外的偏移量和/或索引,并重用循环内的数据。

最小化IO尝试以一种减少必须读或写的次数的方式进行设计,特别是在网络连接上

减少抽象代码必须通过的抽象层越多,它就越慢。在关键循环内部,减少抽象(例如,揭示避免额外代码的低级方法)

对于带有用户界面的项目,生成一个新线程来执行较慢的任务使应用程序感觉反应更快,尽管不是。

你通常可以用空间来换取速度。如果有计算或其他密集的操作,看看是否可以在进入关键循环之前预先计算一些信息。