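Admittedly, I don't get it. Say you have a memory with a memory word length of 1 byte. Why can't you access a 4-byte variable in a single memory access at an unaligned address (i.e. one not divisible by 4), as you can with an aligned address?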


Current answer

If a system with byte-addressable memory has a 32-bit-wide memory bus, that means there are effectively four byte-wide memory systems which are all wired to read or write the same address. An aligned 32-bit read will require information stored in the same address in all four memory systems, so all systems can supply data simultaneously. An unaligned 32-bit read would require some memory systems to return data from one address, and some to return data from the next higher address. Although there are some memory systems that are optimized to be able to fulfill such requests (in addition to their address, they effectively have a "plus one" signal which causes them to use an address one higher than specified) such a feature adds considerable cost and complexity to a memory system; most commodity memory systems simply cannot return portions of different 32-bit words at the same time.
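To make the four-bank picture concrete, here is a minimal sketch in C. It assumes a toy model in which byte address `a` lives in bank `a % 4` at row `a / 4`; the names `NUM_BANKS` and `rows_needed` are made up for illustration and do not describe any real memory controller.

```c
#include <stdint.h>

#define NUM_BANKS 4u  /* assumption: 32-bit bus modelled as four byte-wide banks */

/* For a 4-byte read starting at addr, record which row each bank must
 * supply. Byte address a is assumed to live in bank (a % 4), row (a / 4). */
static void rows_needed(uint32_t addr, uint32_t rows[NUM_BANKS])
{
    for (uint32_t i = 0; i < NUM_BANKS; i++) {
        uint32_t byte_addr = addr + i;
        rows[byte_addr % NUM_BANKS] = byte_addr / NUM_BANKS;
    }
}
```

In this model an aligned read at 0x1000 asks every bank for row 0x400, so one shared address is enough. A read at 0x1001 asks banks 1 to 3 for row 0x400 but bank 0 for row 0x401, which is the "plus one" case described above.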

Other answers

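It's a limitation of many underlying processors. It can usually be worked around by doing four inefficient single-byte fetches rather than one efficient word fetch, but many language specifiers decided it would be easier just to outlaw unaligned accesses and force everything to be aligned.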
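As a rough sketch of that workaround in C: the first function builds a 32-bit value from four single-byte reads, assuming the bytes are stored little-endian, and the second uses `memcpy`, which is the usual portable way to read from a possibly unaligned pointer. Both function names are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Assemble a 32-bit value from four single-byte reads.
 * Assumption: the value is stored least significant byte first. */
static uint32_t load_u32_le_bytewise(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Copy four bytes in native byte order. The pointer may have any
 * alignment; the compiler decides how to perform the copy. */
static uint32_t load_u32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Casting an odd `unsigned char *` to `uint32_t *` and dereferencing it is undefined behaviour in C, which is why the examples above avoid it.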

The OP found more information at this link.

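If you have a 32-bit data bus, the address lines connected to the memory start at A2, so only 32-bit-aligned addresses can be accessed in a single bus cycle.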

So if a word spans an alignment boundary, i.e. A0 is non-zero for 16/32-bit data or A1 is non-zero for 32-bit data, two bus cycles are required to fetch the data.
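The cycle count can be worked out with a little arithmetic. This is only a sketch under the assumption that each bus cycle transfers exactly one aligned, bus-wide word; `bus_cycles` is a hypothetical helper, not part of any real API.

```c
#include <stdint.h>

/* Number of bus cycles needed to transfer `size` bytes starting at `addr`
 * over a bus that is `bus_bytes` wide.
 * Assumptions: size >= 1, bus_bytes >= 1, and each cycle moves exactly
 * one aligned bus-wide word. */
static unsigned bus_cycles(uint32_t addr, uint32_t size, uint32_t bus_bytes)
{
    uint32_t first_word = addr / bus_bytes;
    uint32_t last_word  = (addr + size - 1u) / bus_bytes;
    return (unsigned)(last_word - first_word + 1u);
}
```

With a 4-byte bus, a 4-byte access at 0x100 takes one cycle and the same access at 0x101 takes two. A 2-byte access at 0x102 still fits in one cycle, while one at 0x103 straddles the boundary and needs two.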

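Some architectures/instruction sets do not support unaligned access and will generate an exception on such attempts, so compiler-generated code for unaligned access requires not just additional bus cycles but also additional instructions, making it even less efficient.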

On PowerPC you can load an integer from an odd address with no problems.

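SPARC and (I think) Itanium raise hardware exceptions when you try this. x86 normally handles unaligned accesses in hardware, possibly more slowly, and faults only when alignment checking is enabled or when an instruction explicitly requires alignment (for example MOVAPS or MOVDQA).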

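On most modern processors, there is not much difference between one 32-bit load and four 8-bit loads. Whether the data is already in cache will have a far bigger effect.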

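You can on some processors (Nehalem can do this), but previously all memory access was aligned on a 64-bit (or 32-bit) line. Because the bus is 64 bits wide, you had to fetch 64 bits at a time, and it was significantly easier to fetch these in aligned "chunks" of 64 bits.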

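So if you wanted a single byte, you fetched the 64-bit chunk and then masked off the bits you didn't want. That is easy and fast if your byte is at the right-hand end, but if it is in the middle of the 64-bit chunk, you have to mask off the unwanted bits and then shift the data over to the right place. Worse, if you wanted a 2-byte variable that was split across two chunks, that required double the memory accesses.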
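The mask-and-shift step looks roughly like this in C. This is a sketch assuming little-endian byte numbering inside a chunk (byte 0 is the least significant); both helper names are hypothetical.

```c
#include <stdint.h>

/* Extract one byte from a 64-bit chunk: shift it down to the bottom,
 * then mask off everything else.
 * Assumption: index 0 is the least significant byte, index is 0..7. */
static uint8_t byte_from_chunk(uint64_t chunk, unsigned index)
{
    return (uint8_t)((chunk >> (index * 8u)) & 0xFFu);
}

/* A 2-byte value split across two chunks: its low byte is the last byte
 * of the first chunk and its high byte is the first byte of the next one,
 * so both chunks have to be fetched. */
static uint16_t u16_across_chunks(uint64_t first_chunk, uint64_t next_chunk)
{
    uint16_t low  = byte_from_chunk(first_chunk, 7);
    uint16_t high = byte_from_chunk(next_chunk, 0);
    return (uint16_t)(low | (uint16_t)(high << 8));
}
```

The second function takes two chunks as arguments, which mirrors the doubled memory access described above.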

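So, since everyone thinks memory is cheap, they just made the compiler align the data on the processor's chunk size, so your code runs faster and more efficiently at the cost of wasted memory.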
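You can see the wasted memory by asking the compiler about a struct layout. The sketch below uses a made-up `struct sample`; the exact padding is implementation-defined, so no particular numbers are promised here, although many common ABIs place `value` at offset 4 and make the struct 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct: a 1-byte member followed by a 4-byte member. */
struct sample {
    char     tag;
    uint32_t value;
};

/* Bytes the compiler added purely for alignment (implementation-defined). */
static size_t sample_padding(void)
{
    return sizeof(struct sample) - (sizeof(char) + sizeof(uint32_t));
}

/* Offset at which the compiler placed the 4-byte member. */
static size_t sample_value_offset(void)
{
    return offsetof(struct sample, value);
}
```

Whatever the exact numbers, `sample_value_offset()` is always a multiple of the alignment of `uint32_t`, and any non-zero `sample_padding()` is the memory traded for aligned access.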