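Once upon a time, to write x86 assembler, for example, you would have instructions stating "load the EDX register with the value 5", "increment the EDX register", etc.
With modern CPUs that have 4 cores (or even more), at the machine code level does it just look like there are 4 separate CPUs (i.e. are there just 4 distinct "EDX" registers)? If so, when you say "increment the EDX register", what determines which CPU's EDX register is incremented? Is there a "CPU context" or "thread" concept in x86 assembler now?
How does communication/synchronization between the cores work?
If you were writing an operating system, what mechanism is exposed via hardware to allow you to schedule execution on different cores? Is it some special privileged instruction(s)?
If you were writing an optimizing compiler/bytecode VM for a multicore CPU, what would you need to know specifically about, say, x86 to make it generate code that runs efficiently across all the cores?
What changes have been made to x86 machine code to support multi-core functionality?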
I think the questioner probably wants to make a program run faster by having multiple cores work on it in parallel. That's what I would want anyway, but all the answers leave me no wiser. However, I think I get this: you can't synchronize different threads down to instruction-execution-time accuracy. So you can't get 4 cores to do a multiply on four different array elements in parallel to speed up processing 4:1. Rather, you have to look at your program as comprising major blocks that execute sequentially, like:
1. Do an FFT on some data
2. Put the result into a matrix and find its eigenvalues and eigenvectors
3. Sort the latter by eigenvalue
4. Repeat step 1 with new data
What you can do is run step 2 on the results of step 1 while step 1 runs on new data on a different core, and run step 3 on the results of step 2 on yet another core while step 2 works on the next data and step 1 works on the data after that.
You can do this in Compaq Visual Fortran and in Intel Fortran (an evolution of CVF) by writing three separate programs/subroutines for the three steps; instead of one "calling" the next, it calls an API to start that step's thread.
They can share data through COMMON blocks, which are common (shared) data memory for all threads.
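The three-step pipeline described above can be sketched in C with POSIX threads. The answer itself uses Fortran, so C and pthreads are an assumption here, and the stage bodies (`stage1`, `stage2`, `stage3`, `run_pipeline`) are hypothetical stand-ins for the FFT, eigen-decomposition and sort steps, not real numerical code. What the sketch does show is the structure: each stage is started as a thread with `pthread_create` instead of being called, the stages hand results downstream through small mutex-protected queues, and those queues live in file-scope globals that every thread can see, roughly the role COMMON plays in Fortran. Which core runs which thread is left to the OS scheduler.

```c
#include <pthread.h>

/* Hypothetical bounded FIFO queue connecting two pipeline stages. */
#define QCAP 4
typedef struct {
    long items[QCAP];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} queue_t;

static void queue_init(queue_t *q) {
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

static void queue_destroy(queue_t *q) {
    pthread_mutex_destroy(&q->lock);
    pthread_cond_destroy(&q->not_empty);
    pthread_cond_destroy(&q->not_full);
}

/* Blocks while the queue is full, so a fast stage waits for a slow one. */
static void queue_push(queue_t *q, long v) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QCAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QCAP] = v;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Blocks while the queue is empty. */
static long queue_pop(queue_t *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    long v = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return v;
}

/* Globals shared by all threads, much like a Fortran COMMON block. */
static queue_t q12, q23;
static int n_items;
static long *results;

/* Stage 1: placeholder for "do an FFT on some data". */
static void *stage1(void *arg) {
    (void)arg;
    for (int i = 0; i < n_items; i++)
        queue_push(&q12, (long)i * i);
    return NULL;
}

/* Stage 2: placeholder for the eigenvalue step. */
static void *stage2(void *arg) {
    (void)arg;
    for (int i = 0; i < n_items; i++)
        queue_push(&q23, queue_pop(&q12) + 1);
    return NULL;
}

/* Stage 3: placeholder for the sort step; stores the final results. */
static void *stage3(void *arg) {
    (void)arg;
    for (int i = 0; i < n_items; i++)
        results[i] = queue_pop(&q23);
    return NULL;
}

/* Starts the three stages as concurrent threads and waits for them. */
void run_pipeline(int n, long *out) {
    pthread_t t1, t2, t3;
    n_items = n;
    results = out;
    queue_init(&q12);
    queue_init(&q23);
    pthread_create(&t1, NULL, stage1, NULL);
    pthread_create(&t2, NULL, stage2, NULL);
    pthread_create(&t3, NULL, stage3, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    queue_destroy(&q12);
    queue_destroy(&q23);
}
```

Because each queue holds only a few items, the stages naturally overlap: while stage 3 handles item k, stage 2 can be working on item k+1 and stage 1 on item k+2.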
You have to study the manual until your head hurts and experiment until you get it to work, but I have succeeded at least once.
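"If you're writing an optimizing compiler/bytecode VM for a multicore CPU, what do you need to know specifically about x86 to make it generate code that runs efficiently across all the cores?"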
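As someone who writes optimizing compilers/bytecode VMs, I may be able to help you here.
You do not need to know anything specifically about x86 to make it generate code that runs efficiently across all cores.
However, you may need to know about cmpxchg and friends in order to write code that runs correctly across all cores. Multicore programming requires the use of synchronization and communication between threads of execution.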
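As a minimal sketch of what "cmpxchg and friends" look like from a higher-level language: C11 exposes compare-and-swap as `atomic_compare_exchange_weak`/`_strong` in `<stdatomic.h>`, and GCC and Clang typically compile these to a `lock cmpxchg` instruction on x86. The example below (the names `cas_increment` and `run_cas_demo` are hypothetical) has several threads increment one shared counter with a CAS retry loop, so no increments are lost even though the threads may run on different cores.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4
#define ITERS 100000

static atomic_long counter;

/* Adds 1 to *p using a compare-and-swap retry loop. */
static void cas_increment(atomic_long *p) {
    long expected = atomic_load_explicit(p, memory_order_relaxed);
    /* If another thread changed *p first, the CAS fails, 'expected'
       is refreshed with the current value, and we try again. */
    while (!atomic_compare_exchange_weak(p, &expected, expected + 1))
        ;
}

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++)
        cas_increment(&counter);
    return NULL;
}

/* Runs NTHREADS workers concurrently and returns the final count. */
long run_cas_demo(void) {
    pthread_t t[NTHREADS];
    atomic_store(&counter, 0);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&counter);
}
```

For a plain counter, `atomic_fetch_add` would be simpler (typically a single locked add/xadd on x86); the explicit CAS loop is shown because the same pattern generalizes to updates that have no dedicated atomic instruction.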
You may need to know something about x86 to make it generate code that runs efficiently on x86 in general.
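One example of hardware detail that affects multi-core efficiency (our illustration, not something the answer names): current x86 processors move memory between cores in 64-byte cache lines, so per-thread counters packed next to each other share a line and force it to bounce between cores ("false sharing"). A minimal sketch, assuming a 64-byte line size, gives each thread's counter its own line with C11 `_Alignas`; the names `padded_counter` and `run_padded_demo` are hypothetical.

```c
#include <pthread.h>

#define NTHREADS 4
#define ITERS 1000000
#define CACHE_LINE 64  /* assumption: typical x86 cache-line size in bytes */

/* Aligning the member pads the struct to a full cache line. */
typedef struct {
    _Alignas(CACHE_LINE) long value;
} padded_counter;

static padded_counter counters[NTHREADS];

/* Each thread touches only its own counter, i.e. its own cache line. */
static void *worker(void *arg) {
    padded_counter *c = arg;
    for (long i = 0; i < ITERS; i++)
        c->value++;
    return NULL;
}

/* Runs the workers and sums their private counters after joining. */
long run_padded_demo(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        counters[i].value = 0;
        pthread_create(&t[i], NULL, worker, &counters[i]);
    }
    long total = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        total += counters[i].value;
    }
    return total;
}
```

The sketch only shows the layout; whether padding actually pays off on a given machine has to be measured by timing both the padded and unpadded versions.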
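There are other things it would be useful for you to learn:
You should learn about the facilities the OS (Linux or Windows or OSX) provides to allow you to run multiple threads. You should learn about parallelization APIs such as OpenMP and Threading Building Blocks, or OSX 10.6 "Snow Leopard"'s forthcoming "Grand Central".
You should consider whether your compiler should be auto-parallelizing, or whether the author of the applications compiled by your compiler needs to add special syntax or API calls to his program to take advantage of the multiple cores.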
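To illustrate the "special syntax" route using OpenMP, mentioned above, here is a minimal sketch in C; the function `sum_of_squares` is a hypothetical example. A single pragma asks the compiler to split the loop iterations across cores and combine the per-thread partial sums with a reduction. With GCC or Clang it takes effect when compiling with `-fopenmp`; without that flag the pragma is ignored and the same loop simply runs on one core, giving the same result.

```c
/* Sum of i*i for i in [0, n), parallelized by OpenMP when enabled. */
long long sum_of_squares(int n) {
    long long total = 0;
    /* Each thread accumulates a private copy of 'total';
       the copies are added together when the loop ends. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++)
        total += (long long)i * i;
    return total;
}
```

Here the application author, not the compiler, decides which loop is safe to parallelize; an auto-parallelizing compiler would have to prove that itself.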