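I must admit that I usually don't bother switching between the Debug and Release configurations in my programs. I usually stick with the Debug configuration, even when the program is actually deployed at a customer's site.
As far as I know, if you don't change anything manually, the only difference between these configurations is that Debug has the DEBUG constant defined and Release has the "Optimize code" option checked.
So my question is actually twofold:
Are there significant performance differences between these two configurations? Is there any specific type of code that will cause big differences in performance, or is it actually not that important?
Is there any type of code that will run fine under the Debug configuration but might fail under the Release configuration, or can you be certain that code that is tested and working fine under the Debug configuration will also work fine under the Release configuration?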
You should never release a .NET Debug build into production. It may contain ugly code to support Edit-and-Continue or who knows what else. As far as I know, this happens only in VB, not C# (note: the original post is tagged C#), but it should still give you pause as to what Microsoft thinks it is allowed to do with a Debug build. In fact, prior to .NET 4.0, VB code leaks memory in proportion to the number of instances of objects with events that you construct, because of code generated to support Edit-and-Continue. (Though this is reported to be fixed per https://connect.microsoft.com/VisualStudio/feedback/details/481671/vb-classes-with-events-are-not-garbage-collected-when-debugging, the generated code looks nasty, creating WeakReference objects and adding them to a static list while holding a lock.) I certainly don't want any of this kind of debugging support in a production environment!
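The C# compiler itself doesn't alter the emitted IL a great deal in the Release build. Notable is that it no longer emits the NOP opcodes that allow you to set a breakpoint on a curly brace. The big one is the optimizer that's built into the JIT compiler. I know it makes the following optimizations: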
Method inlining. A method call is replaced by injecting the code of the method. This is a big one, it makes property accessors essentially free.
CPU register allocation. Local variables and method arguments can stay stored in a CPU register without ever (or less frequently) being stored back to the stack frame. This is a big one, notable for making debugging optimized code so difficult. And giving the volatile keyword a meaning.
Array index checking elimination. An important optimization when working with arrays (all .NET collection classes use an array internally). When the JIT compiler can verify that a loop never indexes an array out of bounds then it will eliminate the index check. Big one.
Loop unrolling. Loops with small bodies are improved by repeating the code up to 4 times in the body and looping less. Reduces the branch cost and improves the processor's super-scalar execution options.
Dead code elimination. A statement like if (false) { /*...*/ } gets completely eliminated. This can occur due to constant folding and inlining. Other cases are where the JIT compiler can determine that the code has no possible side effect. This optimization is what makes profiling code so tricky.
Code hoisting. Code inside a loop that is not affected by the loop can be moved out of the loop. The optimizer of a C compiler will spend a lot more time on finding opportunities to hoist. It is however an expensive optimization due to the required data flow analysis and the jitter can't afford the time so only hoists obvious cases. Forcing .NET programmers to write better source code and hoist themselves.
Common sub-expression elimination. x = y + 4; z = y + 4; becomes z = x; Pretty common in statements like dest[ix+1] = src[ix+1]; written for readability without introducing a helper variable. No need to compromise readability.
Constant folding. x = 1 + 2; becomes x = 3; This simple example is caught early by the compiler, but happens at JIT time when other optimizations make this possible.
Copy propagation. x = a; y = x; becomes y = a; This helps the register allocator make better decisions. It is a big deal in the x86 jitter because it has few registers to work with. Having it select the right ones is critical to perf.
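To make the code hoisting item concrete, here is a minimal sketch of hoisting by hand. It is written in C++ only because the other code in this thread is C++; the function names and the `scale` expression are made up for illustration and are not taken from any jitter. The first function recomputes a loop-invariant expression on every iteration unless an optimizer hoists it, the second computes it once before the loop.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Naive version: std::sqrt(a * a + b * b) does not depend on i, but it is
// recomputed on every iteration unless the optimizer hoists it.
double scaled_sum_naive(const std::vector<double>& values, double a, double b)
{
    double total = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i] * std::sqrt(a * a + b * b);
    }
    return total;
}

// Hand-hoisted version: the invariant is computed once, before the loop.
double scaled_sum_hoisted(const std::vector<double>& values, double a, double b)
{
    const double scale = std::sqrt(a * a + b * b); // loop-invariant
    double total = 0.0;
    for (double v : values) {
        total += v * scale;
    }
    return total;
}
```

Both functions return the same value; for the values {1, 2, 3} with a = 3 and b = 4, each returns 30.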
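These are very important optimizations that can make a great deal of difference when, for example, you profile the Debug build of your app and compare it to the Release build. That only really matters, though, when the code is on your critical path, the 5 to 10% of the code you write that actually affects the perf of your program. The JIT optimizer isn't smart enough to know up front what is critical, it can only apply the "turn it to eleven" dial to all the code.
The effective result of these optimizations on your program's execution time is often affected by code that runs elsewhere. Reading a file, executing a dbase query, etc. Making the work the JIT optimizer does completely invisible. It doesn't mind though :)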
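As a back-of-the-envelope way to see why the optimizer's work can become invisible, Amdahl's law can be applied. The numbers below are hypothetical, not measurements: assume a fraction p of the run time is spent in code that the optimizer speeds up by a factor s, and the rest is spent waiting on I/O.

```latex
S = \frac{1}{(1 - p) + \dfrac{p}{s}}, \qquad
p = 0.1,\ s = 4 \;\Rightarrow\; S = \frac{1}{0.9 + 0.025} \approx 1.08
```

Under those assumed numbers, a hot path that runs 4 times faster shows up as only about an 8% overall improvement.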
The JIT optimizer is pretty reliable code, mostly because it has been put to the test millions of times. It is extremely rare to have problems in the Release build version of your program. It does happen, however. Both the x64 and the x86 jitters have had problems with structs. The x86 jitter has trouble with floating point consistency, producing subtly different results when the intermediates of a floating point calculation are kept in an FPU register at 80-bit precision instead of getting truncated when flushed to memory.
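The floating point effect can be imitated in a few lines. This is only a sketch in C++, not what the x86 jitter actually emits: it assumes `long double` is the 80-bit x87 format (true for GCC and Clang on x86, not for MSVC, where `long double` is the same as `double`), and the function names are made up for illustration.

```cpp
// Computes (big + small) - big with the intermediate sum rounded to double,
// as happens when the value is flushed to a 64-bit memory slot.
double diff_via_double(double big, double small)
{
    const double sum = big + small; // rounded to a 53-bit significand
    return sum - big;
}

// Same computation with the intermediate kept in long double.
// ASSUMPTION: long double is wider than double (80-bit x87 on GCC/Clang x86).
// Where long double is the same as double, both functions agree.
long double diff_via_extended(double big, double small)
{
    const long double sum = static_cast<long double>(big) + small;
    return sum - big;
}
```

With big = 1e16 and small = 1.0, the double version returns 0 on a typical x86-64 build because 1e16 + 1 is not representable in a double, while an 80-bit intermediate preserves the 1.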
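I know my answer is very late and isn't exactly what you asked for, but I think a solid, simple example would be good. Anyway, this piece of code results in a huge difference between Debug and Release. The code is written in C++ on Visual Studio 2019. The code looks like this: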
#include <iostream>
using namespace std;

// Naive doubly recursive Fibonacci: the number of calls grows exponentially with n.
unsigned long long fibonacci(int n)
{
    return n < 2 ? n : (fibonacci(n - 1) + fibonacci(n - 2));
}

int main()
{
    int x = 47;
    cout << "Calculating..." << endl;
    cout << "fib(" << x << ") = " << fibonacci(x) << endl;
}
Edit:
Performance differences when computing the Fibonacci sequence
| Build | Time (ms) | Time (s) |
|---|---|---|
| C++ Debug x86 | 99384.9 | 99.4 |
| C++ Debug x64 | 27799.1 | 27.8 |
| C++ Release x86 | 11066.0 | 11.1 |
| C++ Release x64 | 11321.5 | 11.3 |
| C# Debug | 95233.7 | 95.2 |
| C# Release | 24566.0 | 24.6 |
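The post does not show how these times were taken. One way to collect comparable numbers, assuming wall-clock timing with `std::chrono::steady_clock` is acceptable, is sketched below; `TimedResult` and `time_fibonacci` are hypothetical names introduced only for this example.

```cpp
#include <chrono>

// Same naive recursive definition as in the post above.
unsigned long long fibonacci(int n)
{
    return n < 2 ? n : (fibonacci(n - 1) + fibonacci(n - 2));
}

// Hypothetical helper type: the computed value plus the elapsed time.
struct TimedResult {
    unsigned long long value; // fib(n)
    double milliseconds;      // wall-clock time spent computing it
};

// Runs fibonacci(n) once and measures it with a monotonic clock.
TimedResult time_fibonacci(int n)
{
    const auto start = std::chrono::steady_clock::now();
    const unsigned long long value = fibonacci(n);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {value, elapsed.count()};
}
```

Calling `time_fibonacci(47)` once in a Debug build and once in a Release build would give a comparison in the spirit of the table, though the absolute times depend on the machine and compiler settings.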