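I've just finished a test as part of a job interview, and one question stumped me, even with Google for reference. I'd like to see what the StackOverflow crew can do with it:

The memset_16aligned function requires a 16-byte aligned pointer passed to it, or it will crash. a) How would you allocate 1024 bytes of memory, and align it to a 16-byte boundary? b) Free the memory after memset_16aligned has executed.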

{    
   void *mem;
   void *ptr;

   // answer a) here

   memset_16aligned(ptr, 0, 1024);

   // answer b) here    
}

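Current answer

You can also allocate 16 extra bytes and then push the original pointer up to a 16-byte boundary by adding (16 - mod) to it, as shown below: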

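#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *mem1 = malloc(1024 + 16);
    void *mem = (char *)mem1 + 1; /* force misalignment (my computer always aligns) */
    printf("ptr = %p\n", mem);
    /* do the arithmetic on uintptr_t; bit masks cannot be applied to pointers */
    void *ptr = (void *)(((uintptr_t)mem + 16) & ~(uintptr_t)0x0F);
    printf("aligned ptr = %p\n", ptr);

    /* same result when malloc() itself returned a 16-byte aligned block */
    printf("ptr after adding diff mod %p (same as above)\n",
           (void *)((uintptr_t)mem1 + (16 - ((uintptr_t)mem1 % 16))));

    free(mem1);
    return 0;
}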

Other answers

Using memalign (see Aligned-Memory-Blocks) might be a good solution for this problem.

Original answer

{
    void *mem = malloc(1024+16);
    void *ptr = ((char *)mem+16) & ~ 0x0F;
    memset_16aligned(ptr, 0, 1024);
    free(mem);
}

Fixed answer

{
    void *mem = malloc(1024+15);
    void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F);
    memset_16aligned(ptr, 0, 1024);
    free(mem);
}

Explanation as requested

The first step is to allocate enough spare space, just in case. Since the memory must be 16-byte aligned (meaning that the leading byte address needs to be a multiple of 16), adding 16 extra bytes guarantees that we have enough space. Somewhere in the first 16 bytes, there is a 16-byte aligned pointer. (Note that malloc() is supposed to return a pointer that is sufficiently well aligned for any purpose. However, the meaning of 'any' is primarily for things like basic types — long, double, long double, long long, and pointers to objects and pointers to functions. When you are doing more specialized things, like playing with graphics systems, they can need more stringent alignment than the rest of the system — hence questions and answers like this.)

The next step is to convert the void pointer to a char pointer; GCC notwithstanding, you are not supposed to do pointer arithmetic on void pointers (and GCC has warning options to tell you when you abuse it). Then add 16 to the start pointer. Suppose malloc() returned you an impossibly badly aligned pointer: 0x800001. Adding the 16 gives 0x800011. Now I want to round down to the 16-byte boundary — so I want to reset the last 4 bits to 0. 0x0F has the last 4 bits set to one; therefore, ~0x0F has all bits set to one except the last four. Anding that with 0x800011 gives 0x800010. You can iterate over the other offsets and see that the same arithmetic works.
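The "iterate over the other offsets" check can be written down directly. The following is only a sketch: the helper names are made up for illustration, and 0x800000 is a hypothetical address that is never dereferenced. It compares the "add 16, then mask" form described above with the "add 15, then mask" form.

```c
#include <assert.h>
#include <stdint.h>

/* "+15, then mask": rounds up, and leaves an already aligned address alone. */
static uintptr_t round_up_16(uintptr_t addr)
{
    return (addr + 15) & ~(uintptr_t)0x0F;
}

/* "+16, then mask": always moves forward, even from an aligned address. */
static uintptr_t next_boundary_16(uintptr_t addr)
{
    return (addr + 16) & ~(uintptr_t)0x0F;
}

/* Try all 16 possible misalignments of a made-up base address and confirm
   that both forms land on a 16-byte boundary inside the spare space. */
static int check_all_offsets(void)
{
    const uintptr_t base = 0x800000;  /* hypothetical address, never dereferenced */
    for (uintptr_t off = 0; off < 16; off++) {
        uintptr_t a = base + off;
        uintptr_t up = round_up_16(a);
        uintptr_t nx = next_boundary_16(a);
        if (up % 16 != 0 || up < a || up - a > 15)
            return 0;                 /* needs at most 15 spare bytes */
        if (nx % 16 != 0 || nx <= a || nx - a > 16)
            return 0;                 /* needs up to 16 spare bytes */
    }
    return 1;
}
```

Calling check_all_offsets() should return 1. The only difference between the two forms is how much slack they consume: the second one can skip a full 16 bytes when the input was already aligned.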

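The last step, free(), is easy: you always, and only, return to free() a value that one of malloc(), calloc() or realloc() returned to you; anything else is a disaster. You correctly provided mem to hold that value, thank you. The free() releases it.

Finally, if you know about the internals of your system's malloc package, you could guess that it might well return 16-byte aligned data (or it might be 8-byte aligned). If it was 16-byte aligned, then you would not need to dink with the values. However, this is dodgy and non-portable: other malloc packages have different minimum alignments, so assuming one thing when the package does something different leads to core dumps. Within broad limits, this solution is portable.

Someone else mentioned posix_memalign() as another way to get the aligned memory; it isn't available everywhere, but it can often be implemented using this technique as a basis. Note that it was convenient that the alignment was a power of 2; other alignments are messier.

One more comment: this code does not check that the allocation succeeded.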
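A version with the missing check could look like the sketch below. The wrapper name clear_aligned_kilobyte is invented for illustration, and memset_16aligned here is only a stand-in, since the interview's real function is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the interview's memset_16aligned(); the real one is not shown. */
static void memset_16aligned(void *space, char byte, size_t nbytes)
{
    assert(((uintptr_t)space & 0x0F) == 0);
    memset(space, byte, nbytes);
}

/* Hypothetical wrapper: returns 0 on success, -1 if malloc() fails. */
static int clear_aligned_kilobyte(void)
{
    void *mem = malloc(1024 + 15);
    if (mem == NULL)
        return -1;              /* nothing to align, nothing to free */

    void *ptr = (void *)(((uintptr_t)mem + 15) & ~(uintptr_t)0x0F);
    memset_16aligned(ptr, 0, 1024);
    free(mem);                  /* free what malloc() returned, not ptr */
    return 0;
}
```

Returning an error code keeps the sketch short; a real caller might prefer to report the failure in some other way.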

Amendment

Windows Programmer pointed out that you can't do bit mask operations on pointers, and, indeed, GCC (3.4.6 and 4.3.1 tested) does complain like that. So, an amended version of the basic code — converted into a main program, follows. I've also taken the liberty of adding just 15 instead of 16, as has been pointed out. I'm using uintptr_t since C99 has been around long enough to be accessible on most platforms. If it wasn't for the use of PRIXPTR in the printf() statements, it would be sufficient to #include <stdint.h> instead of using #include <inttypes.h>. [This code includes the fix pointed out by C.R., which was reiterating a point first made by Bill K a number of years ago, which I managed to overlook until now.]

#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void memset_16aligned(void *space, char byte, size_t nbytes)
{
    assert((nbytes & 0x0F) == 0);
    assert(((uintptr_t)space & 0x0F) == 0);
    memset(space, byte, nbytes);  // Not a custom implementation of memset()
}

int main(void)
{
    void *mem = malloc(1024+15);
    void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F);
    printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr);
    memset_16aligned(ptr, 0, 1024);
    free(mem);
    return(0);
}

Here is a marginally more generalized version, which works for alignments that are a power of 2:

#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void memset_16aligned(void *space, char byte, size_t nbytes)
{
    assert((nbytes & 0x0F) == 0);
    assert(((uintptr_t)space & 0x0F) == 0);
    memset(space, byte, nbytes);  // Not a custom implementation of memset()
}

static void test_mask(size_t align)
{
    uintptr_t mask = ~(uintptr_t)(align - 1);
    void *mem = malloc(1024+align-1);
    void *ptr = (void *)(((uintptr_t)mem+align-1) & mask);
    assert((align & (align - 1)) == 0);
    printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr);
    memset_16aligned(ptr, 0, 1024);
    free(mem);
}

int main(void)
{
    test_mask(16);
    test_mask(32);
    test_mask(64);
    test_mask(128);
    return(0);
}

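To convert test_mask() into a general-purpose allocation function, the single return value from the allocator would have to encode the release address, as several people have indicated in their answers.

Problems with interviewers

Uri commented: Maybe I am having a reading comprehension problem this morning, but if the interview question specifically says "How would you allocate 1024 bytes of memory" and you clearly allocate more than that, wouldn't that be an automatic failure from the interviewer?

My response won't fit into a 300-character comment...

It depends, I suppose. I think most people (including me) took the question to mean "How would you allocate a space in which 1024 bytes of data can be stored, and where the base address is a multiple of 16 bytes". If the interviewer really meant how can you allocate 1024 bytes (only) and have it 16-byte aligned, then the options are more limited.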

Clearly, one possibility is to allocate 1024 bytes and then give that address the 'alignment treatment'; the problem with that approach is that the actual available space is not properly determinate (the usable space is between 1008 and 1024 bytes, but there wasn't a mechanism available to specify which size), which renders it less than useful. Another possibility is that you are expected to write a full memory allocator and ensure that the 1024-byte block you return is appropriately aligned. If that is the case, you probably end up doing an operation fairly similar to what the proposed solution did, but you hide it inside the allocator.

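However, if the interviewer expected either of those responses, I'd expect them to recognize that this solution answers a closely related question, and then to reframe their question to point the conversation in the correct direction. (Further, if the interviewer got really stroppy, then I wouldn't want the job; if the answer to an insufficiently precise requirement is shot down in flames without correction, then the interviewer is not someone for whom it is safe to work.)

The world moves on

The title of the question has changed recently. It was "Solve the memory alignment in C interview question that stumped me". The revised title ("How to allocate aligned memory only using the standard library?") demands a slightly revised answer; this addendum provides it.

C11 (ISO/IEC 9899:2011) added the function aligned_alloc():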

7.22.3.1 The aligned_alloc function

Synopsis

#include <stdlib.h>
void *aligned_alloc(size_t alignment, size_t size);

Description

The aligned_alloc function allocates space for an object whose alignment is specified by alignment, whose size is specified by size, and whose value is indeterminate. The value of alignment shall be a valid alignment supported by the implementation and the value of size shall be an integral multiple of alignment.

Returns

The aligned_alloc function returns either a null pointer or a pointer to the allocated space.

POSIX defines posix_memalign():

#include <stdlib.h>
int posix_memalign(void **memptr, size_t alignment, size_t size);

DESCRIPTION

The posix_memalign() function shall allocate size bytes aligned on a boundary specified by alignment, and shall return a pointer to the allocated memory in memptr. The value of alignment shall be a power of two multiple of sizeof(void *).

Upon successful completion, the value pointed to by memptr shall be a multiple of alignment.

If the size of the space requested is 0, the behavior is implementation-defined; the value returned in memptr shall be either a null pointer or a unique pointer.

The free() function shall deallocate memory that has previously been allocated by posix_memalign().

RETURN VALUE

Upon successful completion, posix_memalign() shall return zero; otherwise, an error number shall be returned to indicate the error.

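Either or both of these could be used to answer the question now, but only the POSIX function was an option when the question was originally answered.

Behind the scenes, the new aligned memory functions do much the same job as outlined in the question, except that they can force the alignment more easily and they keep track of the start of the aligned memory internally, so the calling code does not have to deal with it specially; it just frees the memory returned by whichever allocation function was used.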
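Both routes can be sketched in a few lines. The wrapper names alloc_c11 and alloc_posix are made up for illustration; the library calls are the ones quoted above. The feature-test macro is an assumption about how the code is compiled and is only needed when the compiler is run in a strict ISO C mode.

```c
#define _POSIX_C_SOURCE 200112L   /* exposes posix_memalign() in strict modes */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* C11 route: size must be a multiple of alignment (1024 is a multiple of 16). */
static void *alloc_c11(void)
{
    return aligned_alloc(16, 1024);
}

/* POSIX route: alignment must be a power-of-two multiple of sizeof(void *).
   The function reports failure through its return value, not through errno. */
static void *alloc_posix(void)
{
    void *ptr = NULL;
    if (posix_memalign(&ptr, 16, 1024) != 0)
        return NULL;
    return ptr;
}
```

In both cases the returned pointer is the one to hand to memset_16aligned() and later to free(); no second pointer has to be kept around.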

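Several slightly different answers, depending on how you look at the question:

1) For the exact question asked, Jonathan Leffler's solution is good, except that to round up to 16-byte alignment you only need 15 extra bytes, not 16.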

A:

/* allocate a buffer with room to add 0-15 bytes to ensure 16-alignment */
void *mem = malloc(1024+15);
ASSERT(mem); // some kind of error-handling code
/* round up to multiple of 16: add 15 and then round down by masking */
void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F); /* needs <stdint.h>; a mask cannot be applied to a pointer */

B:

free(mem);

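2) For a more generic memory allocation function, the caller doesn't want to have to keep track of two pointers (one to use and one to free). So you store a pointer to the 'real' buffer just below the aligned buffer.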

A:

void *mem = malloc(1024+15+sizeof(void*));
if (!mem) return mem;
void *ptr = (void *)(((uintptr_t)mem+sizeof(void*)+15) & ~ (uintptr_t)0x0F);
((void**)ptr)[-1] = mem;
return ptr;

B:

if (ptr) free(((void**)ptr)[-1]);

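Note that unlike (1), where only 15 bytes were added to mem, this code could actually reduce the alignment if your implementation happens to guarantee 32-byte alignment from malloc (unlikely, but in theory a C implementation could have a 32-byte aligned type). That doesn't matter if all you do is call memset_16aligned, but if you use the memory for a struct then it could matter.

I'm not sure off-hand what a good fix for this is (other than to warn the user that the buffer returned is not necessarily suitable for arbitrary structs), since there's no way to determine programmatically what the implementation-specific alignment guarantee is. I guess at startup you could allocate two or more 1-byte buffers and assume that the worst alignment you see is the guaranteed alignment. If you're wrong, you waste memory. Anyone with a better idea, please say so...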

[Added: The 'standard' trick is to create a union of 'likely to be maximally aligned types' to determine the requisite alignment. The maximally aligned types are likely to be (in C99) 'long long', 'long double', 'void *', or 'void (*)(void)'; if you include <stdint.h>, you could presumably use 'intmax_t' in place of long long (and, on Power 6 (AIX) machines, intmax_t would give you a 128-bit integer type). The alignment requirements for that union can be determined by embedding it into a struct with a single char followed by the union:

struct alignment
{
    char     c;
    union
    {
        intmax_t      imax;
        long double   ldbl;
        void         *vptr;
        void        (*fptr)(void);
    }        u;
} align_data;
size_t align = (char *)&align_data.u.imax - &align_data.c;

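You would then use the larger of the requested alignment (16 in the example) and the align value calculated above.

On (64-bit) Solaris 10, it appears that the basic alignment for the result from malloc() is a multiple of 32 bytes. ]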
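The union trick can be wrapped in a pair of small helpers, sketched below. The names probe_alignment, c11_alignment and effective_alignment are hypothetical. The C11 comparison is an addition that the text above does not mention: C11 provides max_align_t in <stddef.h>, which answers the same question directly where it is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the struct above: one char followed by the union. */
struct alignment_probe
{
    char c;
    union
    {
        intmax_t      imax;
        long double   ldbl;
        void         *vptr;
        void        (*fptr)(void);
    } u;
};

/* The offset of u equals the alignment the compiler demands for the union. */
static size_t probe_alignment(void)
{
    return offsetof(struct alignment_probe, u);
}

/* C11 alternative (an addition, not part of the original trick). */
static size_t c11_alignment(void)
{
    return _Alignof(max_align_t);
}

/* Use the larger of the requested alignment and the probed one. */
static size_t effective_alignment(size_t requested)
{
    size_t basic = probe_alignment();
    return requested > basic ? requested : basic;
}
```

The two probes need not agree on every platform, since the union only lists the types that are likely to be maximally aligned.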

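In practice, aligned allocators often take a parameter for the alignment rather than having it hardwired. So the user passes in the size of the struct they care about (or the least power of 2 greater than or equal to that) and all will be well.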
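A parameterized allocator along those lines might look like the following sketch. The names aligned_malloc_sketch and aligned_free_sketch are hypothetical, not a standard API, and the code assumes the alignment is a power of two. It stores the pointer returned by malloc() immediately below the aligned block, as in (2).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator: alignment must be a power of two. */
static void *aligned_malloc_sketch(size_t alignment, size_t size)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;                        /* not a power of two */
    if (alignment < sizeof(void *))
        alignment = sizeof(void *);         /* keeps the stored pointer aligned */

    size_t overhead = sizeof(void *) + alignment - 1;
    if (size > SIZE_MAX - overhead)
        return NULL;                        /* request would overflow size_t */

    void *raw = malloc(size + overhead);
    if (raw == NULL)
        return NULL;

    /* Leave room for one pointer, then round up to the boundary. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    void *ptr = (void *)((start + alignment - 1) & ~(uintptr_t)(alignment - 1));

    ((void **)ptr)[-1] = raw;               /* remember what to pass to free() */
    return ptr;
}

static void aligned_free_sketch(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

A caller would use aligned_malloc_sketch(16, 1024) for part (a) of the question and aligned_free_sketch(ptr) for part (b). Memory from this allocator must never be passed to plain free().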

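3) Use what your platform provides: posix_memalign for POSIX, _aligned_malloc on Windows.

4) If you use C11, then the cleanest option (portable and concise) is the standard library function aligned_alloc, which was introduced in that version of the language specification.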

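We do this sort of thing all the time for Accelerate.framework, a heavily vectorized OS X / iOS library, where we have to pay attention to alignment all the time. There are quite a few options, one or two of which I didn't see mentioned above.

The fastest method for a small array like this is just to stick it on the stack. With GCC / clang: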

 void my_func( void )
 {
     uint8_t array[1024] __attribute__ ((aligned(16)));
     ...
 }

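No free() required. This is typically two instructions: subtract 1024 from the stack pointer, then AND the stack pointer with -alignment. Presumably the requester needed the data on the heap because the lifespan of the array exceeded that of the stack frame, or recursion is at work, or stack space is at a serious premium.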
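For compilers without the GCC attribute syntax, C11's alignas from <stdalign.h> expresses the same request. This is a sketch of that alternative rather than what the answer above uses; the function name is made up.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check: returns 1 if the stack array is 16-byte aligned. */
static int stack_buffer_is_aligned(void)
{
    alignas(16) uint8_t array[1024];   /* C11 spelling of the aligned attribute */
    memset(array, 0, sizeof array);    /* stands in for memset_16aligned() */
    return ((uintptr_t)array & 0x0F) == 0;
}
```

As with the attribute version, the buffer disappears when the function returns, so this only helps when the data does not need to outlive the call.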

On OS X / iOS, all results from malloc/calloc/etc. are always 16-byte aligned. If you need 32-byte alignment for AVX, for example, then you can use posix_memalign:

void *buf = NULL;
int err = posix_memalign( &buf, 32 /*alignment*/, 1024 /*size*/);
if( err )
   RunInCirclesWaivingArmsWildly();
...
free(buf);

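Some folks have mentioned the C++ interface that works similarly.

It should not be forgotten that pages are aligned to large powers of two, so page-aligned buffers are also 16-byte aligned. Thus mmap(), valloc() and other similar interfaces are also options. mmap() has the advantage that the buffer can be allocated preinitialized with something non-zero in it, if you want. Since these interfaces work in page-sized units, you will not get a minimal allocation from them, and the buffer will likely be subject to a VM fault the first time you touch it.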

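Cheesy: Turn on guard malloc or similar. Buffers that are n*16 bytes in size, such as this one, will be n*16-byte aligned, because VM is used to catch overruns and its boundaries are at page boundaries.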

Some Accelerate.framework functions take in a user-supplied temp buffer to use as scratch space. Here we have to assume that the buffer passed to us is wildly misaligned and the user is actively trying to make our life hard out of spite. (Our test cases stick a guard page right before and after the temp buffer to underline the spite.) Here, we return the minimum size we need to guarantee a 16-byte aligned segment somewhere in it, and then manually align the buffer afterward. This size is desired_size + alignment - 1. So, in this case that is 1024 + 16 - 1 = 1039 bytes. Then align as so:

#include <stdint.h>
void My_func( uint8_t *tempBuf, ... )
{
    uint8_t *alignedBuf = (uint8_t*) 
                          (((uintptr_t) tempBuf + ((uintptr_t)alignment-1)) 
                                        & -((uintptr_t) alignment));
    ...
}

Adding alignment-1 moves the pointer past the first aligned address, and then ANDing with -alignment (e.g. 0xfff...ff0 for alignment=16) brings it back to the aligned address.
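The size calculation and the negated-alignment mask can be checked with two small helpers. This is a sketch with made-up names, and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimum scratch size that guarantees `desired` aligned bytes somewhere inside. */
static size_t scratch_size_needed(size_t desired, size_t alignment)
{
    return desired + alignment - 1;
}

/* Round buf up to the next multiple of alignment (a power of two).
   For unsigned a, -a equals ~(a - 1), e.g. 0xfff...ff0 for a == 16. */
static uint8_t *align_up_sketch(uint8_t *buf, size_t alignment)
{
    uintptr_t a = (uintptr_t)alignment;
    return (uint8_t *)(((uintptr_t)buf + (a - 1)) & -a);
}
```

With a 1039-byte scratch buffer, align_up_sketch(buf, 16) moves the pointer forward by at most 15 bytes, so 1024 usable bytes always remain after it.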

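As described in other posts, on other operating systems without 16-byte alignment guarantees, you can call malloc with the larger size, set aside the pointer for free() later, then align as described immediately above and use the aligned pointer, much as described for our temp buffer case.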

As for aligned_memset, this is rather silly. You only have to loop in up to 15 bytes to reach an aligned address, and then proceed with aligned stores after that with some possible cleanup code at the end. You can even do the cleanup bits in vector code, either as unaligned stores that overlap the aligned region (providing the length is at least the length of a vector) or using something like maskmovdqu. Someone is just being lazy. However, it is probably a reasonable interview question if the interviewer wants to know whether you are comfortable with stdint.h, bitwise operators and memory fundamentals, so the contrived example can be forgiven.
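The head / aligned body / tail structure described above can be sketched in scalar C. The function name is hypothetical, and the 16-byte memset() call in the middle loop merely stands in for an aligned vector store; a real implementation would use intrinsics there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memset that accepts any alignment. */
static void memset_any_alignment(void *dst, unsigned char byte, size_t n)
{
    unsigned char *p = dst;

    /* Head: at most 15 single-byte stores to reach a 16-byte boundary. */
    while (n > 0 && ((uintptr_t)p & 0x0F) != 0) {
        *p++ = byte;
        n--;
    }

    /* Body: p is now 16-byte aligned; each iteration models one aligned store. */
    while (n >= 16) {
        memset(p, byte, 16);
        p += 16;
        n -= 16;
    }

    /* Tail: whatever is left, fewer than 16 bytes. */
    while (n > 0) {
        *p++ = byte;
        n--;
    }
}
```

Calling it on an odd address, for example memset_any_alignment(buf + 3, 0x5A, 40), fills exactly those 40 bytes and leaves the neighbours untouched.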

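size_t size = 1024;
size_t alignment = 16;
/* over-allocate so that an aligned address is guaranteed to fall inside the block */
void *mem = malloc(size + alignment - 1);
/* round the address (not the size) up to the next multiple of alignment; needs <stdint.h> */
void *ptr = (void *)(((uintptr_t)mem + alignment - 1) & ~(uintptr_t)(alignment - 1));
memset_16aligned(ptr, 0, size);
free(mem);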

Hopefully this is the simplest implementation; let me know your comments.