诚然,我不明白。假设您有一个内存,内存字的长度为1字节。为什么你不能访问一个4字节长的变量在一个内存访问上一个未对齐的地址(即。不能被4整除,就像对齐地址的情况一样?
当前回答
在PowerPC上,可以毫无问题地从奇数地址加载整数。
Sparc、I86和(我认为)Itatnium会在您尝试时引发硬件异常。
在大多数现代处理器上,一个32位负载和四个8位负载并没有太大区别。数据是否已经在缓存中将产生更大的影响。
其他回答
你可以使用一些处理器(nehalem可以这样做),但以前所有的内存访问都是在64位(或32位)行上对齐的,因为总线是64位宽的,你必须一次获取64位,并且在对齐的64位“块”中获取这些要容易得多。
如果你想获取一个字节,你获取64位块,然后屏蔽掉你不想要的位。如果您的字节位于右端,这很简单,也很快速,但如果它位于64位块的中间,则必须屏蔽不需要的位,然后将数据转移到正确的位置。更糟糕的是,如果您想要一个2字节的变量,但它被分成了2个块,那么这就需要双倍的内存访问。
因此,当每个人都认为内存很便宜时,他们只是让编译器在处理器的块大小上对齐数据,这样你的代码以浪费内存为代价运行得更快更有效。
@joshperry has given an excellent answer to this question. In addition to his answer, I have some numbers that show graphically the effects which were described, especially the 2X amplification. Here's a link to a Google spreadsheet showing what the effect of different word alignments look like. In addition here's a link to a Github gist with the code for the test. The test code is adapted from the article written by Jonathan Rentzsch which @joshperry referenced. The tests were run on a Macbook Pro with a quad-core 2.8 GHz Intel Core i7 64-bit processor and 16GB of RAM.
从根本上讲,这是因为内存总线有一些特定的长度,它比内存大小小得多。
因此,CPU从芯片上的L1缓存中读取,现在通常是32KB。但是连接L1缓存到CPU的内存总线的缓存线宽度要小得多。这将是128位的数量级。
So:
262,144 bits - size of memory
128 bits - size of bus
未对齐的访问偶尔会重叠两条缓存线,这将需要一个全新的缓存读取来获取数据。它甚至可能会错过到DRAM中。
此外,CPU的某些部分将不得不倒立起来,从这两条不同的缓存线中拼凑出一个单独的对象,每条缓存线都有一块数据。在一行上,它是非常高阶的位,在另一行上,它是非常低阶的位。
将会有专门的硬件完全集成到管道中,处理将对齐的对象移动到CPU数据总线的必要位上,但是对于未对齐的对象可能缺乏这样的硬件,因为使用这些晶体管来加速正确优化的程序可能更有意义。
在任何情况下,无论有多少特殊用途的硬件(假设的和愚蠢的)致力于修补错位的内存操作,有时必要的第二次内存读取都会减慢管道。
If a system with byte-addressable memory has a 32-bit-wide memory bus, that means there are effectively four byte-wide memory systems which are all wired to read or write the same address. An aligned 32-bit read will require information stored in the same address in all four memory systems, so all systems can supply data simultaneously. An unaligned 32-bit read would require some memory systems to return data from one address, and some to return data from the next higher address. Although there are some memory systems that are optimized to be able to fulfill such requests (in addition to their address, they effectively have a "plus one" signal which causes them to use an address one higher than specified) such a feature adds considerable cost and complexity to a memory system; most commodity memory systems simply cannot return portions of different 32-bit words at the same time.
在PowerPC上,可以毫无问题地从奇数地址加载整数。
Sparc、I86和(我认为)Itatnium会在您尝试时引发硬件异常。
在大多数现代处理器上,一个32位负载和四个8位负载并没有太大区别。数据是否已经在缓存中将产生更大的影响。