When asking about common undefined behavior in C, people sometimes refer to the strict aliasing rule. What are they talking about?


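Current answer

Type punning via pointer casts (as opposed to using a union) is a major example of breaking strict aliasing.

Other answers

A typical situation where you encounter strict aliasing problems is when overlaying a struct (like a device/network msg) onto a buffer of the word size of your system (like a pointer to uint32_ts or uint16_ts). When you overlay a struct onto such a buffer, or a buffer onto such a struct through pointer casting, you can easily violate strict aliasing rules.

So in this kind of setup, if I want to send a message to something, I'd have to have two incompatible pointers pointing to the same chunk of memory. I might then naively code something like this: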

#include <stdint.h>
#include <stdlib.h>

typedef struct Msg
{
    unsigned int a;
    unsigned int b;
} Msg;

void SendWord(uint32_t);

int main(void)
{
    // Get a 32-bit buffer from the system
    uint32_t* buff = malloc(sizeof(Msg));
    
    // Alias that buffer through message
    Msg* msg = (Msg*)(buff);
    
    // Send a bunch of messages    
    for (int i = 0; i < 10; ++i)
    {
        msg->a = i;
        msg->b = i+1;
        SendWord(buff[0]);
        SendWord(buff[1]);   
    }
}

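The strict aliasing rule makes this setup illegal: dereferencing a pointer that aliases an object that is not of a compatible type or one of the other types allowed by C 2011 6.5 paragraph 7 (see the footnote) is undefined behavior. Unfortunately, you can still code this way, maybe get some warnings, have it compile fine, only to have weird unexpected behavior when you run the code.

(GCC appears somewhat inconsistent in its ability to give aliasing warnings, sometimes giving us a friendly warning and sometimes not.)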

To see why this behavior is undefined, we have to think about what the strict aliasing rule buys the compiler. Basically, with this rule, it doesn't have to think about inserting instructions to refresh the contents of buff every run of the loop. Instead, when optimizing, with some annoyingly unenforced assumptions about aliasing, it can omit those instructions, load buff[0] and buff[1] into CPU registers once before the loop is run, and speed up the body of the loop. Before strict aliasing was introduced, the compiler had to live in a state of paranoia that the contents of buff could change by any preceding memory stores. So to get an extra performance edge, and assuming most people don't type-pun pointers, the strict aliasing rule was introduced.
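To make the optimization concrete, here is a minimal hand-written sketch of what the loop could be turned into if the compiler assumes, as described above, that a Msg* and a uint32_t* never alias. The recording SendWord stub, the `sent` array, and the function name send_loop_as_optimized are assumptions made for illustration; a real compiler is free to do something different. In this sketch msg and buff refer to separate objects, so the code itself is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Msg
{
    unsigned int a;
    unsigned int b;
} Msg;

/* Hypothetical stand-in for the real SendWord: records every word sent. */
static uint32_t sent[32];
static size_t sent_count = 0;

void SendWord(uint32_t word)
{
    assert(sent_count < sizeof sent / sizeof sent[0]);
    sent[sent_count++] = word;
}

/* Hand-written equivalent of what an optimizer may produce: because the
   stores through msg are assumed not to touch buff, both loads from buff
   are hoisted out of the loop and done only once. */
void send_loop_as_optimized(Msg *msg, const uint32_t *buff)
{
    uint32_t w0 = buff[0];
    uint32_t w1 = buff[1];

    for (int i = 0; i < 10; ++i)
    {
        msg->a = (unsigned int)i;
        msg->b = (unsigned int)i + 1u;
        SendWord(w0);   /* never sees the store to msg->a */
        SendWord(w1);   /* never sees the store to msg->b */
    }
}
```

If msg and buff did point at the same memory, as in the naive example, this version would send the stale initial words twenty times, which is one plausible form the "weird unexpected behavior" can take.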

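Keep in mind, if you think this example is contrived, that this might even happen if you're passing the buffer to another function that does the sending for you. Suppose instead you have: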

void SendMessage(uint32_t* buff, size_t size32)
{
    for (size_t i = 0; i < size32; ++i)
    {
        SendWord(buff[i]);
    }
}

And rewrote our earlier loop to take advantage of this convenient function:

for (int i = 0; i < 10; ++i)
{
    msg->a = i;
    msg->b = i+1;
    SendMessage(buff, 2);
}

The compiler may or may not be able to, or be smart enough to, inline SendMessage, and it may or may not decide to load buff again. If SendMessage is part of another API that's compiled separately, it probably has instructions to load buff's contents. Then again, maybe you're in C++ and this is some templated header-only implementation that the compiler thinks it can inline. Or maybe it's just something you wrote in your .c file for your own convenience. Either way, undefined behavior might still ensue. Even when we know some of what's happening under the hood, it's still a violation of the rule, so no well-defined behavior is guaranteed. Merely wrapping the send in a function that takes our word-delimited buffer doesn't necessarily help.

So how do I get around this?

Use a union. Most compilers support this without complaining about strict aliasing. This is allowed in C99 and explicitly allowed in C11.

    union {
        Msg msg;
        unsigned int asBuffer[sizeof(Msg)/sizeof(unsigned int)];
    };

You can disable strict aliasing in your compiler (-f[no-]strict-aliasing in GCC; -fno-strict-aliasing turns it off).

You can use char* for aliasing instead of your system's word. The rules allow an exception for char* (including signed char and unsigned char). It's always assumed that char* aliases other types. However, this won't work the other way: there's no assumption that your struct aliases a buffer of chars.
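As a rough illustration of the union option, the sketch below fills the struct member and sends the words through the array member of the same union. The names MsgBuffer, MSG_WORDS, send_messages, and the recording SendWord stub are assumptions introduced here, and the static assertion spells out the further assumption that Msg has no padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Msg
{
    unsigned int a;
    unsigned int b;
} Msg;

#define MSG_WORDS (sizeof(Msg) / sizeof(unsigned int))

/* Assumption: the two members of Msg are laid out without padding. */
_Static_assert(sizeof(Msg) == 2 * sizeof(unsigned int),
               "Msg is expected to be exactly two words");

typedef union MsgBuffer
{
    Msg msg;
    unsigned int asBuffer[MSG_WORDS];
} MsgBuffer;

/* Hypothetical stand-in for the real SendWord: records every word sent. */
static uint32_t sent[64];
static size_t sent_count = 0;

void SendWord(uint32_t word)
{
    assert(sent_count < sizeof sent / sizeof sent[0]);
    sent[sent_count++] = word;
}

/* Write through the struct member, read through the array member.
   Both members belong to the same union object, so the compiler has
   to assume they overlap. */
void send_messages(int count)
{
    MsgBuffer mb;

    for (int i = 0; i < count; ++i)
    {
        mb.msg.a = (unsigned int)i;
        mb.msg.b = (unsigned int)i + 1u;
        for (size_t k = 0; k < MSG_WORDS; ++k)
        {
            SendWord(mb.asBuffer[k]);
        }
    }
}
```

Calling send_messages(3) should hand SendWord the words 0, 1, 1, 2, 2, 3 in that order, because every access goes through the union object itself rather than through an unrelated pointer type.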

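Beginner beware

This is only one potential minefield when overlaying two types onto each other. You should also learn about endianness, word alignment, and how to deal with alignment issues through packing structs correctly.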
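For the endianness point, a small sketch may help. It inspects the bytes of a word through unsigned char*, which the aliasing rules permit, and serializes a word in a fixed byte order with shifts so that the result does not depend on the host. The function names is_little_endian and put_u32_be are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the least significant byte of a uint32_t is stored first
   on this machine, 0 otherwise. Reading an object's bytes through
   unsigned char* is allowed by the aliasing rules. */
int is_little_endian(void)
{
    uint32_t probe = 1u;
    const unsigned char *bytes = (const unsigned char *)&probe;
    return bytes[0] == 1;
}

/* Writes v into out[0..3] most significant byte first, regardless of the
   host byte order. Shifts operate on the value, not on its storage. */
void put_u32_be(unsigned char *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}
```

Serializing field by field like this also sidesteps struct padding and alignment, at the cost of writing out the layout by hand.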

Footnote

The types that C 2011 6.5 paragraph 7 allows an lvalue to access are:

- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.

Here is the strict aliasing rule, found in section 3.10 of the C++03 standard (other answers provide good explanations, but none provided the rule itself):

If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined:
- the dynamic type of the object,
- a cv-qualified version of the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
- a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
- a char or unsigned char type.

C++11 and C++14 wording (the changes are summarized after the quote):

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
- the dynamic type of the object,
- a cv-qualified version of the dynamic type of the object,
- a type similar (as defined in 4.4) to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
- an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
- a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
- a char or unsigned char type.

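Two changes were small: glvalue instead of lvalue, and clarification of the aggregate/union case.

The third change makes a stronger guarantee (it relaxes the strict aliasing rule): the new concept of similar types, which are now safe to alias.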


Also the C wording (C99; ISO/IEC 9899:1999 6.5/7; the exact same wording is used in ISO/IEC 9899:2011 §6.5 ¶7):

An object shall have its stored value accessed only by an lvalue expression that has one of the following types [footnote 73) or 88)]:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.

73) or 88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.

According to the C89 Rationale, the authors of the Standard did not want to require that compilers given code like:

int x;
int test(double *p)
{
  x=5;
  *p = 1.0;
  return x;
}

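should be required to reload the value of x between the assignment and the return statement so as to allow for the possibility that p might point to x, and that the assignment to *p might consequently alter the value of x. The notion that a compiler should be entitled to presume that there won't be aliasing in situations like the above was non-controversial.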
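As a sketch of what that presumption buys, the two functions below show the code as written next to a hand-written version of what a compiler is entitled to produce. The names test_as_written and test_as_optimized are invented for this comparison, and actual compiler output may differ.

```c
#include <assert.h>
#include <stddef.h>

int x;

/* The function from the Rationale, with a precondition check added. */
int test_as_written(double *p)
{
    assert(p != NULL);
    x = 5;
    *p = 1.0;
    return x;      /* a cautious compiler would reload x here */
}

/* Hand-written equivalent of what the compiler may generate: since a
   double* is presumed not to point at the int x, the reload is dropped
   and the known value is returned directly. */
int test_as_optimized(double *p)
{
    assert(p != NULL);
    x = 5;
    *p = 1.0;
    return 5;
}
```

For any caller that passes a pointer to an actual double, the two versions are indistinguishable, which is why this particular case was considered uncontroversial.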

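Unfortunately, the authors of C89 wrote their rule in a way that, if read literally, would make even the following function invoke Undefined Behavior: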

void test(void)
{
  struct S {int x;} s;
  s.x = 1;
}

because it uses an lvalue of type int to access an object of type struct S, and int is not among the types that may be used accessing a struct S. Because it would be absurd to treat all use of non-character-type members of structs and unions as Undefined Behavior, almost everyone recognizes that there are at least some circumstances where an lvalue of one type may be used to access an object of another type. Unfortunately, the C Standards Committee has failed to define what those circumstances are.

Much of the problem is a result of Defect Report #028, which asked about the behavior of a program like:

int test(int *ip, double *dp)
{
  *ip = 1;
  *dp = 1.23;
  return *ip;
}
int test2(void)
{
  union U { int i; double d; } u;
  return test(&u.i, &u.d);
}

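Defect Report #028 states that the program invokes Undefined Behavior because the action of writing a union member of type "double" and reading one of type "int" invokes Implementation-Defined behavior. Such reasoning is nonsensical, but it forms the basis for the Effective Type rules, which needlessly complicate the language while doing nothing to address the original problem.

The best way to resolve the original problem would probably be to treat the footnote about the purpose of the rule as though it were normative, and to make the rule unenforceable except in cases that actually involve conflicting accesses using aliases. Given something like: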

 void inc_int(int *p) { *p = 3; }
 int test(void)
 {
   int *p;
   struct S { int x; } s;
   s.x = 1;
   p = &s.x;
   inc_int(p);
   return s.x;
 }

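There's no conflict within inc_int because all accesses to the storage accessed through *p are done with an lvalue of type int, and there's no conflict in test because p is visibly derived from a struct S, and by the next time s is used, all accesses to that storage that will ever be made through p will have already happened.

If the code were changed slightly...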

 void inc_int(int *p) { *p = 3; }
 int test(void)
 {
   int *p;
   struct S { int x; } s;
   p = &s.x;
   s.x = 1;  //  !!*!!
   *p += 1;
   return s.x;
 }

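Here, there is an aliasing conflict between p and the access to s.x on the marked line, because at that point in execution another reference exists that will be used to access the same storage.

Had Defect Report #028 said the original example invoked UB because of the overlap between the creation and use of the two pointers, that would have made things a lot clearer without having to add "Effective Types" or other such complexity.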

Strict aliasing means not allowing pointers of different types to refer to the same data.

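This article should help you understand the issue in full detail.

Technically, in C++, the strict aliasing rule is probably never applicable.

Note the definition of indirection (the * operator):

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type, and the result is an lvalue referring to the object or function to which the expression points.

Also, from the definition of glvalue:

A glvalue is an expression whose evaluation determines the identity of an object, (...snip)

So in any well-defined program trace, a glvalue refers to an object. So the so-called strict aliasing rule never applies. This may not be what the designers wanted.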