So I'm currently learning MSIL in order to learn to debug my C# .NET applications.

I keep wondering: what is the purpose of the stack?

Just to put my question in context: why is there a transfer from memory to the stack, or "loading"? On the other hand, why is there a transfer from the stack to memory, or "storing"? Why not just have everything placed in memory?

Is it because it's faster? Is it because it's RAM-based? For efficiency?

I'm trying to grasp this to help me understand CIL code more deeply.


There is a very interesting/detailed Wikipedia article on this, Advantages of stack machine instruction sets. I would need to quote it entirely, so it's easier to simply put a link. I will simply quote the sub-titles:

Very compact object code
Simple compilers / simple interpreters
Minimal processor state


If the concept of a stack/heap were not followed, and data were loaded to and stored from random memory locations... it would be unstructured and unmanaged.

These concepts are used to store data in predefined structures to improve performance, memory usage and so on, and hence they are called data structures.


Keep in mind that when you're talking about MSIL, you're talking about instructions for a virtual machine. The VM used in .NET is a stack-based virtual machine, as opposed to a register-based VM; the Dalvik VM used in the Android operating system is an example of the latter.

The stack in the VM is virtual; it is up to the interpreter or the just-in-time compiler to translate the VM instructions into actual code that runs on the processor. In the case of .NET that is almost always a jitter; the MSIL instruction set was designed to be jitted from the get-go. Java bytecode, by contrast, has distinct instructions for operations on specific data types, which makes it better suited to being interpreted. An MSIL interpreter does exist, though: it is used in the .NET Micro Framework, which runs on processors with resources so limited that they can't afford the RAM required to store machine code.

The actual machine code model is mixed, having both a stack and registers. One of the big jobs of the JIT code optimizer is to come up with ways to store variables that are kept on the stack in registers, thus greatly improving execution speed. A Dalvik jitter has the opposite problem.

The machine stack is otherwise a very basic storage facility that has been around in processor designs for a very long time. It has very good locality of reference, a very important feature on modern CPUs that chew through data a lot faster than RAM can supply it, and it supports recursion. Language design is heavily influenced by having a stack, visible in support for local variables and scope limited to the method body. A significant problem with the stack is the one that this site is named for.
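
To make the point about local variables and recursion concrete, here is a small C# sketch of my own (not part of the answer above): each invocation of SumTo gets its own copy of the parameter and the local in its own stack frame, allocated when the call is made and thrown away when it returns.

using System;

class StackDemo
{
    // Each call to SumTo gets its own copy of 'n' and 'partial' in its own
    // stack frame; the recursion works because those frames pile up and are
    // popped off in reverse order as the calls return.
    static long SumTo(int n)
    {
        long partial = n;                     // local, scoped to this call only
        return n <= 1 ? partial : partial + SumTo(n - 1);
    }

    static void Main()
    {
        Console.WriteLine(SumTo(10));         // prints 55
    }
}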


UPDATE: I liked this question so much I made it the subject of my blog on November 18th 2011. Thanks for the great question!

I keep wondering: what is the purpose of the stack?

I assume you mean the evaluation stack of the MSIL language, and not the actual per-thread stack at runtime.

Why is there a transfer from memory to stack or "loading"? On the other hand, why is there a transfer from stack to memory or "storing"? Why not just have them all placed in memory?

MSIL is a "virtual machine" language. Compilers like the C# compiler generate CIL, and then at runtime another compiler called the JIT (Just In Time) compiler turns the IL into actual machine code that can execute.
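
As an aside, you can actually look at the CIL the C# compiler produced for a method. The sketch below is my own illustration using the reflection API (MethodBase.GetMethodBody and GetILAsByteArray); tools such as ildasm or ILSpy show the same bytes with readable mnemonics.

using System;
using System.Reflection;

class Program
{
    static int AddTen(int a) => a + 10;

    static void Main()
    {
        // Fetch the raw CIL bytes the C# compiler emitted for AddTen.
        MethodInfo method = typeof(Program).GetMethod(
            nameof(AddTen), BindingFlags.NonPublic | BindingFlags.Static);
        byte[] il = method.GetMethodBody().GetILAsByteArray();

        // In a release build this prints something like 02-1F-0A-58-2A,
        // i.e. ldarg.0, ldc.i4.s 10, add, ret.
        Console.WriteLine(BitConverter.ToString(il));
    }
}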

So first let's answer the question "why have MSIL at all?" Why not just have the C# compiler write out machine code?

Because it is cheaper to do it this way. Suppose we didn't do it that way; suppose each language had to have its own machine code generator. You have twenty different languages: C#, JScript .NET, Visual Basic, IronPython, F#... And suppose you have ten different processors. How many code generators do you have to write? 20 x 10 = 200 code generators. That's a lot of work. Now suppose you want to add a new processor. You have to write the code generator for it twenty times, one for each language.

Furthermore, it is difficult and dangerous work. Writing efficient code generators for chips that you are not an expert on is a hard job! Compiler designers are experts on the semantic analysis of their language, not on efficient register allocation for new chip sets.

Now suppose we do it the CIL way. How many CIL generators do you have to write? One per language. How many JIT compilers do you have to write? One per processor. Total: 20 + 10 = 30 code generators. Moreover, the language-to-CIL generator is easy to write because CIL is a simple language, and the CIL-to-machine-code generator is also easy to write because CIL is a simple language. We get rid of all of the intricacies of C# and VB and whatnot and "lower" everything to a simple language that is easy to write a jitter for.

Using an intermediate language lowers the cost of producing a new language compiler dramatically. It also lowers the cost of supporting a new chip dramatically. You want to support a new chip, you find some experts on that chip and have them write a CIL jitter, and you're done; you then support all those languages on your chip.

OK, so we've established why we have MSIL: because having an intermediate language lowers costs. Why then is the language a "stack machine"?

Because stack machines are conceptually very simple for language compiler writers to deal with. Stacks are a simple, easily understood mechanism for describing computations. Stack machines are also conceptually very easy for JIT compiler writers to deal with. Using a stack is a simplifying abstraction, and therefore, again, it lowers our costs.

You ask "why have a stack at all?" Why not just do everything directly out of memory? Well, let's think about that. Suppose you want to generate CIL code for:

int x = A() + B() + C() + 10;

Suppose we have the convention that "add", "call", "store" and so on always take their arguments off the stack and put their result (if there is one) on the stack. To generate CIL code for this C#, we just say something like:

load the address of x // The stack now contains address of x
call A()              // The stack contains address of x and result of A()
call B()              // Address of x, result of A(), result of B()
add                   // Address of x, result of A() + B()
call C()              // Address of x, result of A() + B(), result of C()
add                   // Address of x, result of A() + B() + C()
load 10               // Address of x, result of A() + B() + C(), 10
add                   // Address of x, result of A() + B() + C() + 10
store in address      // The result is now stored in x, and the stack is empty.
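
For comparison, the real C# compiler emits CIL very close to this pseudo-code; the main difference is that a local variable is written with a dedicated stloc instruction instead of pushing its address first. Here is my own rough reconstruction (not verified compiler output), with stub methods added only so the snippet compiles:

class Demo
{
    static int A() => 1;   // stand-in implementations, only here so this compiles
    static int B() => 2;
    static int C() => 3;

    static int Run()
    {
        int x = A() + B() + C() + 10;
        // Approximate CIL for the line above (x is local slot 0):
        //   call  int32 Demo::A()   // stack: A()
        //   call  int32 Demo::B()   // stack: A(), B()
        //   add                     // stack: A()+B()
        //   call  int32 Demo::C()   // stack: A()+B(), C()
        //   add                     // stack: A()+B()+C()
        //   ldc.i4.s 10             // stack: A()+B()+C(), 10
        //   add                     // stack: A()+B()+C()+10
        //   stloc.0                 // pop into local 0 (x); stack is empty
        return x;
    }
}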

Now suppose we did it without a stack. We'll do it your way, where every opcode takes the addresses of its operands and the address at which it stores its result:

Allocate temporary store T1 for result of A()
Call A() with the address of T1
Allocate temporary store T2 for result of B()
Call B() with the address of T2
Allocate temporary store T3 for the result of the first addition
Add contents of T1 to T2, then store the result into the address of T3
Allocate temporary store T4 for the result of C()
Call C() with the address of T4
Allocate temporary store T5 for result of the second addition
...

You see how this goes? Our code is getting huge because we have to explicitly allocate all the temporary storage that would normally by convention just go on the stack. Worse, our opcodes themselves are all getting enormous because they all now have to take as an argument the address that they're going to write their result into, and the address of each operand. An "add" instruction that knows that it is going to take two things off the stack and put one thing on can be a single byte. An add instruction that takes two operand addresses and a result address is going to be enormous.

We use stack-based opcodes because stacks solve the common problem. Namely: I want to allocate some temporary storage, use it very soon, and then get rid of it quickly when I'm done. By making the assumption that we have a stack at our disposal, we can make the opcodes very small and the code very terse.
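
To put a number on the size argument: in a stack encoding both operands and the destination are implicit, so an add can be a single opcode byte, while a memory-to-memory add has to carry three addresses. A back-of-the-envelope comparison of my own, assuming 32-bit addresses for the three-address form (0x58 is, as far as I know, the actual CIL opcode for add):

using System;

class EncodingSize
{
    static void Main()
    {
        byte[] stackAdd = { 0x58 };                  // stack form: one opcode byte
        int threeAddressAdd = 1 + 3 * sizeof(int);   // opcode + two source addresses + one destination

        Console.WriteLine($"stack add: {stackAdd.Length} byte");           // 1
        Console.WriteLine($"three-address add: {threeAddressAdd} bytes");  // 13
    }
}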

UPDATE: Some additional thoughts.

Incidentally, this idea of dramatically lowering costs by (1) specifying a virtual machine, (2) writing compilers that target the VM language, and (3) writing implementations of the VM on a variety of hardware is not a new idea at all. It did not originate with MSIL, LLVM, Java bytecode, or any other modern infrastructure. The earliest implementation of this strategy I'm aware of is the pcode machine from 1966.

The first I personally heard of this concept was when I learned how the Infocom implementors managed to get Zork running on so many different machines so well. They specified a virtual machine called the Z-machine and then made Z-machine emulators for all the hardware they wanted to run their games on. This had the added enormous benefit that they could implement virtual memory management on primitive 8-bit systems; a game could be larger than would fit into memory because they could just page the code in from disk when they needed it and discard it when they needed to load new code.


To add a little more to the stack question: the stack concept derives from CPU design, where the machine code in the arithmetic logic unit (ALU) operates on operands that are located on the stack. For example, a multiply operation may take the two top operands from the stack, multiply them, and place the result back on the stack. Machine language typically has two basic instructions to add and remove operands from the stack: PUSH and POP. In many CPUs, DSPs (digital signal processors) and machine controllers (such as the one controlling a washing machine), the stack is located on the chip itself. This gives faster access to the ALU and consolidates the required functionality into a single chip.
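
As a minimal sketch of that idea (my own illustration, not from the answer above), here is a toy stack machine in C# whose multiply takes its operands implicitly from the top of the stack and pushes the product back:

using System;
using System.Collections.Generic;

class TinyStackMachine
{
    private readonly Stack<int> stack = new Stack<int>();

    public void Push(int value) => stack.Push(value);   // like a PUSH instruction
    public int Pop() => stack.Pop();                    // like a POP instruction

    // A multiply "instruction": operands are implicit, taken from the top
    // of the stack, and the result goes back on the stack.
    public void Mul()
    {
        int b = stack.Pop();
        int a = stack.Pop();
        stack.Push(a * b);
    }

    static void Main()
    {
        var m = new TinyStackMachine();
        m.Push(6);
        m.Push(7);
        m.Mul();
        Console.WriteLine(m.Pop());   // prints 42
    }
}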


By using a continuation-passing style of coding, you can make a system work without any stack. Call frames then become continuations allocated in the garbage-collected heap (the garbage collector will still need some stack).

See Andrew Appel's old papers: Compiling with Continuations, and Garbage Collection Can Be Faster Than Stack Allocation.

(He might be slightly wrong today because of cache issues.)
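
Here is a rough sketch of what continuation-passing style looks like in C# (my own example, not how the CLR actually works): instead of returning a value into a caller's stack frame, each call is handed a delegate representing "the rest of the computation", so the pending work is captured in heap-allocated closures rather than in stack frames.

using System;

class CpsDemo
{
    // Continuation-passing factorial: 'k' is the continuation that receives
    // the result. The pending multiplications are captured in heap-allocated
    // closures instead of in stack frames waiting to be returned to.
    static void Factorial(int n, Action<long> k)
    {
        if (n <= 1)
            k(1);
        else
            Factorial(n - 1, result => k(n * result));
    }

    static void Main()
    {
        Factorial(5, answer => Console.WriteLine(answer));   // prints 120
    }
}

Note that C# does not guarantee tail-call elimination, so the nested calls above still consume the real machine stack; the sketch only shows how each piece of pending work can become a heap object instead of a stack frame, which is what Appel's scheme relies on.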


I looked for "interrupt" and nobody included that as an advantage. For each device that interrupts a microcontroller or other processor, there are usually registers that are pushed onto a stack, an interrupt service routine is called, and when it is done, the registers are popped back off of the stack, and put back where they were. Then the instruction pointer is restored, and normal activity picks up where it left off, almost as if the interrupt never happened. With the stack, you can actually have several devices (theoretically) interrupt each other, and it all just works -- because of the stack.

There is also a family of stack-based languages called concatenative languages. They are all (I believe) functional languages, because the stack is an implicit parameter passed in, and the changed stack is an implicit return from each function. Both Forth and Factor (which is excellent) are examples, along with others. Factor has been used similarly to Lua, for scripting games, and was written by Slava Pestov, a genius currently working at Apple. I have watched his Google TechTalk on YouTube a few times. He talks about Boa constructors, but I'm not sure what he means ;-).

I really think that some of the current VMs, like the JVM, Microsoft's CIL, and even the one I saw that was written for Lua, should be written in some of these stack-based languages, to make them portable to even more platforms. I think these concatenative languages are somehow missing their calling as VM creation kits and portability platforms. There is even pForth, a "portable" Forth written in ANSI C, that could be used for even more universal portability. Has anybody tried to compile it using Emscripten or WebAssembly?

With stack-based languages, there is a style of code called point-free, because you can (sometimes) just list the functions to be called without passing any arguments. If the functions fit together perfectly, you have nothing but a list of all the point-free functions, and that (theoretically) is your application. If you dig into Forth or Factor, you'll see what I'm talking about.

Easy Forth is an online tutorial written in JavaScript; here is a small sample (note "sq sq sq sq" as an example of the point-free calling style):

: sq dup * ;  ok
2 sq . 4  ok
: ^4 sq sq ;  ok
2 ^4 . 16  ok
: ^16 sq sq sq sq ;  ok
2 ^16 . 65536  ok

Also, if you look at the page source for Easy Forth, you'll see at the bottom that it is very modular, written in about 8 JavaScript files.

I have spent a lot of money on just about every Forth book I could get my hands on in an attempt to assimilate Forth, but am now just beginning to grok it better. I want to give a heads up to those who come after, if you really want to get it (I found this out too late), get the book on FigForth and implement that. The commercial Forths are all too complicated, and the greatest thing about Forth is that it is possible to comprehend the whole system, from top to bottom. Somehow, Forth implements a whole development environment on a new processor, and though the need for that has seemed to pass with C on everything, it is still useful as a rite of passage to write a Forth from scratch. So, if you choose to do this, try the FigForth book -- it is several Forths implemented simultaneously on a variety of processors. A kind of Rosetta Stone of Forths.

So, why do we need a stack: efficiency, optimization, point-free style, saving registers on interrupts, and for recursive algorithms it's "the right shape".