Purpose of memory alignment

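Admittedly, I don't get it. Say you have a memory with a memory word length of 1 byte. Why can't you access a 4-byte variable in a single memory access at an unaligned address (i.e. one not divisible by 4), as you can with aligned addresses?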

Current answer

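Fundamentally, it is because the memory bus has a specific width that is much, much smaller than the memory size.

So, the CPU reads out of the on-chip L1 cache, which is often 32KB these days. But the memory bus that connects the L1 cache to the CPU is vastly narrower than that, closer to the width of a cache line. This will be on the order of 128 bits.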

So:

262,144 bits - size of memory
    128 bits - size of bus

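Misaligned accesses will occasionally overlap two cache lines, and this will require an entirely new cache read to obtain the data. It might even miss all the way out to DRAM.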
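To make "occasionally" concrete, here is a minimal sketch that counts how many aligned lines a 4-byte access touches. The 16-byte (128-bit) line width is only an assumption taken from the order of magnitude mentioned above, and `lines_touched` is a made-up helper name, not anything a real CPU exposes.

```python
def lines_touched(addr: int, size: int, line_bytes: int = 16) -> int:
    """Count the aligned line_bytes-wide blocks covered by [addr, addr + size)."""
    first = addr // line_bytes               # line holding the first byte
    last = (addr + size - 1) // line_bytes   # line holding the last byte
    return last - first + 1


# A 4-byte access at a few addresses, with assumed 16-byte lines.
for addr in (0, 5, 12, 13, 14, 16):
    print(f"addr {addr:2d}: {lines_touched(addr, 4)} line(s)")
```

Under these assumptions a misaligned access at address 5 still fits in one line, while one at address 13 straddles two, which is why the penalty only shows up some of the time.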

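Furthermore, some part of the CPU will have to stand on its head to put together a single object out of these two different cache lines, each of which holds a piece of the data. On one line it will be in the very high-order bits, on the other in the very low-order bits.

There will be dedicated hardware, fully integrated into the pipeline, that handles moving aligned objects onto the necessary bits of the CPU data bus, but such hardware may be lacking for misaligned objects, because it probably makes more sense to spend those transistors on speeding up correctly optimized programs.

In any case, the second memory read that is sometimes necessary would slow down the pipeline no matter how much special-purpose hardware was (hypothetically and foolishly) dedicated to patching up misaligned memory operations.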

2011-03-01 18:38:58

Other answers

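If you have a 32-bit data bus, the address lines connected to the memory start at A2, so only 32-bit-aligned addresses can be accessed in a single bus cycle.

So if a word spans an address alignment boundary, i.e. A0 (for 16/32-bit data) or A1 (for 32-bit data) is not zero, two bus cycles are required to obtain the data.

Some architectures/instruction sets do not support unaligned access and will generate an exception on such an attempt, so compiler-generated unaligned access code requires not just additional bus cycles but additional instructions, making it even less efficient.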

2014-06-19 20:10:06

@joshperry has given an excellent answer to this question. In addition to his answer, I have some numbers that show graphically the effects he described, especially the 2X amplification. Here's a link to a Google spreadsheet showing what the effect of different word alignments looks like. In addition, here's a link to a GitHub gist with the code for the test. The test code is adapted from the article written by Jonathan Rentzsch which @joshperry referenced. The tests were run on a MacBook Pro with a quad-core 2.8 GHz Intel Core i7 64-bit processor and 16GB of RAM.

2016-04-06 14:26:02

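You can with some processors (Nehalem can do this), but previously all memory access was aligned on a 64-bit (or 32-bit) line: because the bus is 64 bits wide, you had to fetch 64 bits at a time, and it was significantly easier to fetch these in aligned "chunks" of 64 bits.

So, if you wanted a single byte, you fetched the 64-bit chunk and then masked off the bits you didn't want. That is easy and fast if your byte is at the right end, but if it is in the middle of the 64-bit chunk, you have to mask off the unwanted bits and then shift the data over to the right place. Worse, if you want a 2-byte variable that is split across 2 chunks, it takes double the memory accesses.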
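The fetch, mask, and shift steps described above can be sketched roughly as follows. This is a toy model, not how any particular processor is wired: the 16-byte `memory` contents, the little-endian byte order, and the helper names `fetch_chunk`, `read_byte`, and `read_u16` are all assumptions made for illustration.

```python
CHUNK = 8  # bytes per 64-bit chunk (assumed bus width)

# Two 64-bit chunks of made-up data: 0x10, 0x11, ..., 0x1F.
memory = bytes(range(0x10, 0x20))


def fetch_chunk(index):
    """Return aligned 64-bit chunk number `index` as a little-endian integer."""
    start = index * CHUNK
    return int.from_bytes(memory[start:start + CHUNK], "little")


def read_byte(addr):
    """One chunk fetch, then shift the wanted byte down and mask off the rest."""
    chunk = fetch_chunk(addr // CHUNK)
    shift = 8 * (addr % CHUNK)  # 0 when the byte already sits at the low end
    return (chunk >> shift) & 0xFF


def read_u16(addr):
    """Return (value, number_of_chunk_fetches) for a 2-byte little-endian read."""
    first, last = addr // CHUNK, (addr + 1) // CHUNK
    shift = 8 * (addr % CHUNK)
    if first == last:
        return (fetch_chunk(first) >> shift) & 0xFFFF, 1
    low = (fetch_chunk(first) >> shift) & 0xFF  # top byte of the first chunk
    high = fetch_chunk(last) & 0xFF             # bottom byte of the next chunk
    return low | (high << 8), 2


print(hex(read_byte(3)))
print(read_u16(2))
print(read_u16(7))
```

In this model `read_u16(2)` needs one fetch, while `read_u16(7)` has to fetch both chunks and stitch the halves together.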

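So, since everyone thinks memory is cheap, they just made the compiler align data on the processor's chunk size, so your code runs faster and more efficiently at the cost of some wasted memory.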
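The memory cost of that trade-off can be seen with Python's standard `struct` module, which can lay out fields either with the platform's native alignment (`@`) or with no padding at all (`=`). The one-byte-plus-four-byte record below is just an illustrative layout, and the exact native size depends on the platform's ABI.

```python
import struct

# Illustrative record: a 1-byte field (B) followed by a 4-byte unsigned int (I).
# "@" asks for native alignment, so padding may be inserted before the int.
# "=" asks for standard sizes with no alignment padding.
aligned_size = struct.calcsize("@BI")
packed_size = struct.calcsize("=BI")

print("aligned:", aligned_size, "bytes")
print("packed: ", packed_size, "bytes")
print("padding:", aligned_size - packed_size, "bytes")
```

On mainstream platforms, where a 4-byte int is 4-byte aligned, the aligned layout should come out at 8 bytes against 5 for the packed one, so three bytes per record are spent purely on alignment.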

2008-12-19 15:31:47

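It's a limitation of many underlying processors. It can usually be worked around by doing 4 inefficient single-byte fetches rather than one efficient word fetch, but many language specifiers decided it would be easier just to outlaw unaligned accesses and force everything to be aligned.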
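As a rough sketch of that workaround, the function below assembles a 32-bit value from four separate single-byte reads instead of one word read. Little-endian byte order, the sample buffer, and the name `read_u32_bytewise` are assumptions for the example.

```python
# Made-up buffer: the value 0x12345678 stored little-endian at the
# unaligned address 1.
memory = bytes([0x00, 0x78, 0x56, 0x34, 0x12, 0x00, 0x00, 0x00])


def read_u32_bytewise(mem: bytes, addr: int) -> int:
    """Build a little-endian 32-bit value from four single-byte reads."""
    value = 0
    for i in range(4):
        value |= mem[addr + i] << (8 * i)  # byte i lands in bits 8*i .. 8*i+7
    return value


print(hex(read_u32_bytewise(memory, 1)))
```

It works at any address, aligned or not, but at the price of four reads plus the shifting and OR-ing, which is the inefficiency the answer refers to.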

There is much more information in this link that the OP discovered.

2008-12-19 15:20:20

If a system with byte-addressable memory has a 32-bit-wide memory bus, that means there are effectively four byte-wide memory systems which are all wired to read or write the same address. An aligned 32-bit read will require information stored in the same address in all four memory systems, so all systems can supply data simultaneously. An unaligned 32-bit read would require some memory systems to return data from one address, and some to return data from the next higher address. Although there are some memory systems that are optimized to be able to fulfill such requests (in addition to their address, they effectively have a "plus one" signal which causes them to use an address one higher than specified) such a feature adds considerable cost and complexity to a memory system; most commodity memory systems simply cannot return portions of different 32-bit words at the same time.
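A small model of the four byte-wide memory systems may help here. It assumes the simplest possible mapping, where byte address `a` lives in bank `a % 4` at row `a // 4`; the helper names `bank_requests` and `single_cycle` are hypothetical.

```python
BANKS = 4  # four byte-wide memories behind a 32-bit bus (assumed mapping)


def bank_requests(addr: int, size: int = 4):
    """Map each byte of the access to a (bank, row) pair."""
    return [((addr + i) % BANKS, (addr + i) // BANKS) for i in range(size)]


def single_cycle(addr: int, size: int = 4) -> bool:
    """True when every bank involved is asked for the same row."""
    rows = {row for _, row in bank_requests(addr, size)}
    return len(rows) == 1


print(bank_requests(4), single_cycle(4))
print(bank_requests(6), single_cycle(6))
```

For the read at address 6, banks 2 and 3 are asked for row 1 while banks 0 and 1 are asked for row 2. That is the "plus one" situation described above, which commodity memory without that extra signal has to serve as two separate accesses.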

2011-06-15 20:21:36
