曾经,为了编写x86汇编程序,例如,你会有这样的指令:“用值5加载EDX寄存器”,“增加EDX”寄存器,等等。

对于拥有4核(甚至更多)的现代cpu,在机器代码级别上,它是否看起来就像有4个独立的cpu(即只有4个不同的“EDX”寄存器)?如果是这样,当你说“增加EDX寄存器”时,是什么决定哪个CPU的EDX寄存器被增加?现在在x86汇编器中有“CPU上下文”或“线程”概念吗?

内核之间的通信/同步是如何工作的?

如果您正在编写一个操作系统,通过硬件公开的什么机制允许您在不同的内核上调度执行?是一些特殊的特权指令吗?

如果你正在为一个多核CPU编写一个优化编译器/字节码虚拟机,你需要特别了解什么,比如说,x86,以使它生成跨所有核高效运行的代码?

为了支持多核功能,x86机器码做了哪些改变?


当前回答

如果你在写优化 多核编译器/字节码虚拟机 中央处理器,你需要知道什么 特别是关于x86的制作 它生成有效运行的代码 在所有的核上?

作为编写优化编译器/字节码虚拟机的人,我可能能够在这里帮助你。

您不需要特别了解x86,就可以让它生成跨所有核心高效运行的代码。

但是,您可能需要了解cmpxchg及其相关知识,以便编写能够在所有核心上正确运行的代码。多核编程要求在执行线程之间使用同步和通信。

您可能需要了解一些关于x86的知识,以便让它生成在x86上高效运行的代码。

你还可以学习其他一些有用的东西:

您应该了解操作系统(Linux或Windows或OSX)提供的允许您运行多个线程的功能。你应该学习并行化api,比如OpenMP和Threading Building Blocks,或者OSX 10.6“Snow Leopard”即将推出的“Grand Central”。

您应该考虑编译器是否应该自动并行,或者编译器编译的应用程序的作者是否需要在他的程序中添加特殊的语法或API调用来利用多核。

其他回答

根据我的理解,每个“核心”都是一个完整的处理器,有自己的寄存器集。基本上,BIOS启动时只运行一个核心,然后操作系统可以通过初始化其他核心并将它们指向要运行的代码等方式“启动”其他核心。

同步由操作系统完成。通常,每个处理器为操作系统运行不同的进程,因此操作系统的多线程功能负责决定哪个进程可以访问哪个内存,以及在内存碰撞的情况下该做什么。

The main difference between a single- and a multi-threaded application is that the former has one stack and the latter has one for each thread. Code is generated somewhat differently since the compiler will assume that the data and stack segment registers (ds and ss) are not equal. This means that indirection through the ebp and esp registers that default to the ss register won't also default to ds (because ds!=ss). Conversely, indirection through the other registers which default to ds won't default to ss.

The threads share everything else including data and code areas. They also share lib routines so make sure that they are thread-safe. A procedure that sorts an area in RAM can be multi-threaded to speed things up. The threads will then be accessing, comparing and ordering data in the same physical memory area and executing the same code but using different local variables to control their respective part of the sort. This of course is because the threads have different stacks where the local variables are contained. This type of programming requires careful tuning of the code so that inter-core data collisions (in caches and RAM) are reduced which in turn results in a code which is faster with two or more threads than it is with just one. Of course, an un-tuned code will often be faster with one processor than with two or more. To debug is more challenging because the standard "int 3" breakpoint will not be applicable since you want to interrupt a specific thread and not all of them. Debug register breakpoints do not solve this problem either unless you can set them on the specific processor executing the specific thread you want to interrupt.

其他多线程代码可能涉及在程序的不同部分运行的不同线程。这种类型的编程不需要同样的调优,因此更容易学习。

What has been added on every multiprocessing-capable architecture compared to the single-processor variants that came before them are instructions to synchronize between cores. Also, you have instructions to deal with cache coherency, flushing buffers, and similar low-level operations an OS has to deal with. In the case of simultaneous multithreaded architectures like IBM POWER6, IBM Cell, Sun Niagara, and Intel "Hyperthreading", you also tend to see new instructions to prioritize between threads (like setting priorities and explicitly yielding the processor when there is nothing to do).

但是基本的单线程语义是相同的,您只是添加额外的设施来处理与其他核心的同步和通信。

非官方的SMP FAQ

曾经,为了编写x86汇编程序,例如,你会有这样的指令:“用值5加载EDX寄存器”,“增加EDX”寄存器,等等。对于拥有4核(甚至更多)的现代cpu,在机器代码级别上,它是否看起来就像有4个独立的cpu(即只有4个不同的“EDX”寄存器)?

完全正确。有4组寄存器,包括4个单独的指令指针。

如果是这样,当你说“增加EDX寄存器”时,是什么决定哪个CPU的EDX寄存器被增加?

当然是执行指令的CPU。可以把它想象成4个完全不同的微处理器共享相同的内存。

现在在x86汇编器中有“CPU上下文”或“线程”概念吗?

不。汇编程序只是像往常一样翻译指令。没有变化。

内核之间的通信/同步是如何工作的?

由于它们共享相同的内存,这主要是程序逻辑的问题。虽然现在有一个处理器间中断机制,但它不是必要的,最初也没有出现在第一个双cpu x86系统中。

如果您正在编写一个操作系统,通过硬件公开的什么机制允许您在不同的内核上调度执行?

The scheduler actually doesn't change, except that it is slightly more carefully about critical sections and the types of locks used. Before SMP, kernel code would eventually call the scheduler, which would look at the run queue and pick a process to run as the next thread. (Processes to the kernel look a lot like threads.) The SMP kernel runs the exact same code, one thread at a time, it's just that now critical section locking needs to be SMP-safe to be sure two cores can't accidentally pick the same PID.

是一些特殊的特权指令吗?

不。这些核心都运行在相同的内存中,使用相同的旧指令。

如果你正在为一个多核CPU编写一个优化编译器/字节码虚拟机,你需要特别了解什么,比如说,x86,以使它生成跨所有核高效运行的代码?

运行与之前相同的代码。需要改变的是Unix或Windows内核。

你可以把我的问题总结为“为了支持多核功能,x86机器码做了哪些改变?”

没有什么是必要的。第一个SMP系统使用与单处理器完全相同的指令集。现在,x86体系结构已经有了很大的改进,并且有了大量的新指令来让事情变得更快,但是对于SMP来说没有一个是必要的。

For more information, see the Intel Multiprocessor Specification. Update: all the follow-up questions can be answered by just completely accepting that an n-way multicore CPU is almost1 exactly the same thing as n separate processors that just share the same memory.2 There was an important question not asked: how is a program written to run on more than one core for more performance? And the answer is: it is written using a thread library like Pthreads. Some thread libraries use "green threads" that are not visible to the OS, and those won't get separate cores, but as long as the thread library uses kernel thread features then your threaded program will automatically be multicore. 1. For backwards compatibility, only the first core starts up at reset, and a few driver-type things need to be done to fire up the remaining ones.2. They also share all the peripherals, naturally.

如果你在写优化 多核编译器/字节码虚拟机 中央处理器,你需要知道什么 特别是关于x86的制作 它生成有效运行的代码 在所有的核上?

作为编写优化编译器/字节码虚拟机的人,我可能能够在这里帮助你。

您不需要特别了解x86,就可以让它生成跨所有核心高效运行的代码。

但是,您可能需要了解cmpxchg及其相关知识,以便编写能够在所有核心上正确运行的代码。多核编程要求在执行线程之间使用同步和通信。

您可能需要了解一些关于x86的知识,以便让它生成在x86上高效运行的代码。

你还可以学习其他一些有用的东西:

您应该了解操作系统(Linux或Windows或OSX)提供的允许您运行多个线程的功能。你应该学习并行化api,比如OpenMP和Threading Building Blocks,或者OSX 10.6“Snow Leopard”即将推出的“Grand Central”。

您应该考虑编译器是否应该自动并行,或者编译器编译的应用程序的作者是否需要在他的程序中添加特殊的语法或API调用来利用多核。