
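I have read that the default stack size of a thread is 1 MB, so in order to move all of my float[]s I would have to expand the stack by roughly 50 times (to ~50 MB).

I understand this is generally considered "unsafe" and isn't recommended, but after benchmarking my current code I found a 530% increase in processing speed! So I can't simply pass over this option without further investigation, which leads to my question: what are the dangers of increasing the stack to such a large size (what could go wrong), and what precautions should I take to minimise them?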


public static unsafe void TestMethod1()
{
    // Array lives on the stack: 12,500,000 floats * 4 bytes = 50 MB.
    float* samples = stackalloc float[12500000];

    for (var ii = 0; ii < 12500000; ii++)
    {
        samples[ii] = 32768;
    }
}

public static void TestMethod2()
{
    // Same array, allocated on the managed heap.
    var samples = new float[12500000];

    for (var i = 0; i < 12500000; i++)
    {
        samples[i] = 32768;
    }
}


Microbenchmarking languages with a JIT and a GC, such as Java or C#, can be a bit complicated, so it's generally a good idea to use an existing framework. Java offers JMH or Caliper, which are excellent; sadly, to the best of my knowledge C# doesn't offer anything approaching those. Jon Skeet wrote this here, which I'll blindly assume takes care of the most important things (Jon knows what he's doing in that area; also yes, no worries, I did actually check). I tweaked the timing a bit because 30 seconds per test after warmup was too much for my patience (5 seconds ought to do).
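To make the "iterations in a fixed time window after warmup" idea concrete, here is a minimal sketch. It is not Jon Skeet's helper; the class name MiniBench and the method CountIterations are made up for illustration, and a real framework handles many more pitfalls than this does.

```csharp
using System;
using System.Diagnostics;

public static class MiniBench
{
    // Accumulating the results here keeps the JIT from discarding the calls.
    private static float _sink;

    // Runs the test for `warmup` so it gets JIT-compiled, forces a collection
    // so earlier garbage does not skew the measurement, then counts how many
    // calls complete within `window`. Higher is better.
    public static long CountIterations(Func<float> test, TimeSpan warmup, TimeSpan window)
    {
        Spin(test, warmup);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        return Spin(test, window);
    }

    private static long Spin(Func<float> test, TimeSpan duration)
    {
        long iterations = 0;
        var watch = Stopwatch.StartNew();
        while (watch.Elapsed < duration)
        {
            _sink += test();
            iterations++;
        }
        return iterations;
    }
}
```

A call such as MiniBench.CountIterations(() => Standard(100000), TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(5)) would produce a count comparable in spirit to the figures in this answer, though not necessarily in value.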

First the results, from .NET 4.5.1 under Windows 7 x64. The numbers denote how many iterations it could run in 5 seconds, so higher is better.

x64 JIT:

Standard       10,589.00  (1.00)
UnsafeStandard 10,612.00  (1.00)
Stackalloc     12,088.00  (1.14)
FixedStandard  10,715.00  (1.01)
GlobalAlloc    12,547.00  (1.18)

x86 JIT (yeah, that's still kind of sad):

Standard       14,787.00   (1.02)
UnsafeStandard 14,549.00   (1.00)
Stackalloc     15,830.00   (1.09)
FixedStandard  14,824.00   (1.02)
GlobalAlloc    18,744.00   (1.29)



// Requires: using System.Runtime.InteropServices; (for Marshal)

// Plain managed array on the heap.
public static float Standard(int size) {
    float[] samples = new float[size];
    for (var ii = 0; ii < size; ii++) {
        samples[ii] = 32768 + (ii != 0 ? samples[ii - 1] : 0);
    }
    return samples[size - 1];
}

// Same code as Standard, but declared unsafe.
public static unsafe float UnsafeStandard(int size) {
    float[] samples = new float[size];
    for (var ii = 0; ii < size; ii++) {
        samples[ii] = 32768 + (ii != 0 ? samples[ii - 1] : 0);
    }
    return samples[size - 1];
}

// Array allocated on the stack.
public static unsafe float Stackalloc(int size) {
    float* samples = stackalloc float[size];
    for (var ii = 0; ii < size; ii++) {
        samples[ii] = 32768 + (ii != 0 ? samples[ii - 1] : 0);
    }
    return samples[size - 1];
}

// Managed array, pinned and accessed through a pointer.
public static unsafe float FixedStandard(int size) {
    float[] prev = new float[size];
    fixed (float* samples = &prev[0]) {
        for (var ii = 0; ii < size; ii++) {
            samples[ii] = 32768 + (ii != 0 ? samples[ii - 1] : 0);
        }
        return samples[size - 1];
    }
}

// Unmanaged memory, accessed through a pointer.
public static unsafe float GlobalAlloc(int size) {
    var ptr = Marshal.AllocHGlobal(size * sizeof(float));
    try {
        float* samples = (float*)ptr;
        for (var ii = 0; ii < size; ii++) {
            samples[ii] = 32768 + (ii != 0 ? samples[ii - 1] : 0);
        }
        return samples[size - 1];
    } finally {
        // Unmanaged memory is not garbage collected; release it explicitly.
        Marshal.FreeHGlobal(ptr);
    }
}
static void Main(string[] args) {
    int inputSize = 100000;
    var results = TestSuite.Create("Tests", inputSize, Standard(inputSize)).


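One thing that can go wrong is that you might not get permission to do so. Unless it is running in full-trust mode, the Framework will simply ignore the request for a larger stack size (see MSDN on the Thread constructor (ParameterizedThreadStart, Int32)).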
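For reference, requesting a larger stack for a worker thread could look like the sketch below. It uses the Thread(ThreadStart, Int32) constructor overload; the class BigStackDemo and its methods are hypothetical names for illustration. The recursion is only there to consume stack, and a StackOverflowException cannot be caught, so if the request is ignored (for example under partial trust) the process simply dies.

```csharp
using System;
using System.Threading;

public static class BigStackDemo
{
    // Hypothetical helper: runs `work` on a dedicated thread that asks for
    // `maxStackSize` bytes of stack, waits for it, and returns the result.
    public static T RunWithStack<T>(Func<T> work, int maxStackSize)
    {
        T result = default(T);
        Exception failure = null;

        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { failure = ex; } // does not catch stack overflow
        }, maxStackSize);

        thread.Start();
        thread.Join();

        if (failure != null)
            throw new InvalidOperationException("Worker thread failed.", failure);
        return result;
    }

    // Deliberately non-tail-recursive so that every call keeps its frame.
    public static int Depth(int n)
    {
        return n == 0 ? 0 : 1 + Depth(n - 1);
    }
}
```

Calling BigStackDemo.RunWithStack(() => BigStackDemo.Depth(100000), 50 * 1024 * 1024) asks for roughly the 50 MB discussed in the question. Keeping the big-stack work on one dedicated thread, rather than enlarging every thread, limits how much address space is reserved.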


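I found a 530% increase in processing speed!


Other than excessive recursion, it is very, very hard to consume a lot of stack space in a .NET program. The size of the stack frame of a managed method is set in stone: it is simply the sum of the method's arguments and its local variables, minus the ones that can be stored in a CPU register, which you can ignore since there are so few of them.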


This is unlike a native program, particularly one written in C, which can also reserve space for arrays on the stack frame. That is the basic malware attack vector behind stack buffer overflows. It is possible in C# as well, but you have to use the stackalloc keyword. If you are doing that, then the obvious danger is having to write unsafe code that is subject to such attacks, as well as random stack frame corruption, which produces bugs that are very hard to diagnose. There is a counter-measure against this in later jitters, I think starting at .NET 4.0, where the jitter generates code to put a "cookie" on the stack frame and checks whether it is still intact when the method returns. If it is not, you get an instant crash to the desktop without any way to intercept or report the mishap. That's ... dangerous to the user's mental state.

The main thread of your program, the one started by the operating system, will have a 1 MB stack by default, 4 MB when you compile your program targeting x64. Increasing that requires running Editbin.exe with the /STACK option in a post-build event. You can typically ask for up to 500 MB before your program will have trouble getting started when running in 32-bit mode. Threads you create yourself can have a larger stack too, and much more easily of course; there the danger zone typically hovers around 90 MB for a 32-bit program. The failure is triggered when your program has been running for a long time and the address space has become fragmented from previous allocations. Total address space usage must already be high, over a gig, to get this failure mode.




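In addition to the two methods you provided, I tried a few others. Method 3 has the same code as method 2, but the function is declared unsafe. Method 4 uses pointer access to a regularly created array. Method 5 uses pointer access to unmanaged memory, as described by Marc Gravell. All five methods run in very similar times. M5 is the fastest (with M1 a close second). The difference between the fastest and the slowest is about 5%, which is not something I would care about.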

    // Method 3: same code as method 2, but declared unsafe.
    public static unsafe float TestMethod3()
    {
        float[] samples = new float[5000000];

        for (var ii = 0; ii < 5000000; ii++)
        {
            samples[ii] = 32768 + (ii != 0 ? samples[ii - 1] : 0);
        }

        return samples[5000000 - 1];
    }

    // Method 4: pointer access to a regularly created (managed) array.
    public static unsafe float TestMethod4()
    {
        float[] prev = new float[5000000];
        fixed (float* samples = &prev[0])
        {
            for (var ii = 0; ii < 5000000; ii++)
            {
                samples[ii] = 32768 + (ii != 0 ? samples[ii - 1] : 0);
            }

            return samples[5000000 - 1];
        }
    }

    // Method 5: pointer access to unmanaged memory.
    public static unsafe float TestMethod5()
    {
        var ptr = Marshal.AllocHGlobal(5000000 * sizeof(float));
        try
        {
            float* samples = (float*)ptr;

            for (var ii = 0; ii < 5000000; ii++)
            {
                samples[ii] = 32768 + (ii != 0 ? samples[ii - 1] : 0);
            }

            return samples[5000000 - 1];
        }
        finally
        {
            // Unmanaged memory must be released explicitly.
            Marshal.FreeHGlobal(ptr);
        }
    }




IL_0011:  ldloc.0 
IL_0012:  ldloc.1 
IL_0013:  ldc.i4.4 
IL_0014:  mul 
IL_0015:  add 
IL_0016:  ldc.r4 32768.
IL_001b:  stind.r4 // <----------- This one
IL_001c:  ldloc.1 
IL_001d:  ldc.i4.1 
IL_001e:  add 
IL_001f:  stloc.1 
IL_0020:  ldloc.1 
IL_0021:  ldc.i4 12500000
IL_0026:  blt IL_0011


IL_0012:  ldloc.0 
IL_0013:  ldloc.1 
IL_0014:  ldc.r4 32768.
IL_0019:  stelem.r4 // <----------- This one
IL_001a:  ldloc.1 
IL_001b:  ldc.i4.1 
IL_001c:  add 
IL_001d:  stloc.1 
IL_001e:  ldloc.1 
IL_001f:  ldc.i4 12500000
IL_0024:  blt IL_0012


stind.r4: Store value of type float32 into memory at address




stelem.r4: Replace array element at index with the float32 value on the stack.





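After comparing test code with Sam, I determined that we are both right! However, about different things:

Accessing memory (reading and writing) is just as fast wherever it is: stack, global or heap. Allocating it, however, is fastest on the stack and slowest on the heap.

It goes like this: stack < global < heap (allocation time). Technically, stack allocation isn't really an allocation; the runtime just makes sure a part of the stack (frame?) is reserved for the array.

I strongly advise being careful with this, though. I recommend the following:

When you need to frequently create arrays that never leave the function (e.g. by passing a reference to them), using the stack will be an enormous improvement. If you can recycle an array, do so whenever you can! The heap is the best place for long-term object storage. (Polluting global memory isn't nice; stack frames can disappear.)


To answer the question itself: I have not encountered any problem at all in any large-stack test. I believe the only possible problems are a stack overflow, if you are not careful with your function calls, and running out of memory when creating your thread(s) if the system is running low.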
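The "recycle an array" advice can be sketched as follows. This is only an illustration under assumptions: SampleProcessor and Process are made-up names, the loop body is borrowed from the test methods earlier in the thread, and the class is not thread-safe because every call shares one buffer.

```csharp
using System;

public sealed class SampleProcessor
{
    // One heap buffer, allocated once and reused across calls.
    private float[] _buffer = new float[0];

    // Fills the first `size` elements (size must be at least 1) and returns
    // the last one. A new array is allocated only when the buffer is too small.
    public float Process(int size)
    {
        if (size < 1)
            throw new ArgumentOutOfRangeException("size");

        if (_buffer.Length < size)
            _buffer = new float[size];

        float[] samples = _buffer;
        for (var ii = 0; ii < size; ii++)
        {
            samples[ii] = 32768 + (ii != 0 ? samples[ii - 1] : 0);
        }

        return samples[size - 1];
    }
}
```

Once the buffer has grown to the largest size requested, repeated calls allocate nothing, which removes the allocation cost without touching the stack size at all.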




Stack-allocated array time: 00:00:00.2224429
Globally-allocated array time: 00:00:00.2206767
Heap-allocated array time: 00:00:00.1842670
Fastest: Heap.

  |    S    |    G    |    H    |
S |    -    | 100.80 %| 120.72 %|
G |  99.21 %|    -    | 119.76 %|
H |  82.84 %|  83.50 %|    -    |
Rates are calculated by dividing the row's value to the column's.

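I tested on Windows 8.1 Pro (with Update 1), using an i7 4700 MQ, under .NET 4.5.1. I tested with both x86 and x64, and the results are identical.

Edit: I increased the stack size of all threads to 201 MB, increased the sample size to 50 million and decreased the number of iterations to 5. The results are the same as above: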

Stack-allocated array time: 00:00:00.4504903
Globally-allocated array time: 00:00:00.4020328
Heap-allocated array time: 00:00:00.3439016
Fastest: Heap.

  |    S    |    G    |    H    |
S |    -    | 112.05 %| 130.99 %|
G |  89.24 %|    -    | 116.90 %|
H |  76.34 %|  85.54 %|    -    |
Rates are calculated by dividing the row's value to the column's.
