Why are these constructs using pre-increment and post-increment undefined behavior?

#include <stdio.h>

int main(void)
{
   int i = 0;
   i = i++ + ++i;
   printf("%d\n", i); // 3

   i = 1;
   i = (i++);
   printf("%d\n", i); // 2 Should be 1, no ?

   volatile int u = 0;
   u = u++ + ++u;
   printf("%d\n", u); // 1

   u = 1;
   u = (u++);
   printf("%d\n", u); // 2 Should also be one, no ?

   register int v = 0;
   v = v++ + ++v;
   printf("%d\n", v); // 3 (Should be the same as u ?)

   int w = 0;
   printf("%d %d\n", ++w, w); // shouldn't this print 1 1

   int x[2] = { 5, 8 }, y = 0;
   x[y] = y ++;
   printf("%d %d\n", x[0], x[1]); // shouldn't this print 0 8? or 5 0?
}

Current answer

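Most of the answers here quote the C standard to emphasize that the behavior of these constructs is undefined. To understand why it is undefined, let's first look at these terms in the light of the C11 standard:

Sequenced (5.1.2.3):

Given any two evaluations A and B, if A is sequenced before B, then the execution of A shall precede the execution of B.

Unsequenced:

If A is not sequenced before or after B, then A and B are unsequenced.

Evaluations can be one of two things:

value computations, which work out the result of an expression; and
side effects, which are modifications of objects.

Sequence point:

The presence of a sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B.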
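To make these definitions concrete, here is a small sketch in which every modification of i is separated from the next read of i by a sequence point, so the result is fully defined. The function name sequenced_demo is made up for this illustration.

```c
#include <stdio.h>

/* Every write to i below is separated from the next read of i by a
   sequence point, so the behavior is well defined. */
int sequenced_demo(void)
{
    int i = 0;

    i++;                     /* end of a full expression: i == 1           */
    int a = (i++, i);        /* comma operator: i becomes 2, a == 2        */
    int b = (i++ && i == 3); /* && evaluates its left operand first:
                                i becomes 3, b == 1                        */
    int c = i++ ? i : -1;    /* ?: evaluates the condition first:
                                i becomes 4, c == 4                        */

    printf("%d %d %d %d\n", a, b, c, i);
    return a + b + c + i;    /* 2 + 1 + 4 + 4 */
}
```

Calling sequenced_demo() from main gives the same result on every conforming compiler, which is exactly what the expressions in the question cannot promise.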

Now, coming back to the question, for expressions like

int i = 1;
i = i++;

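the standard says:

6.5 Expressions:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. [...]

Therefore, the above expression invokes UB because two side effects on the same object i are unsequenced relative to each other. That is, the standard does not say whether the side effect of the assignment to i happens before or after the side effect of ++. Depending on whether the assignment occurs before or after the increment, different results are produced, and that is one of the cases of undefined behavior.

Let's rename the i on the left of the assignment to il and the i on the right of the assignment (in the expression i++) to ir. The expression then looks like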

il = ir++     // Note that the suffixes l and r are used for the sake of clarity.
              // Both il and ir represent the same object.

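An important point regarding the postfix ++ operator is that:

just because the ++ comes after the variable does not mean that the increment happens late. The increment can happen as early as the compiler likes, as long as the compiler ensures that the original value is used.

This means the expression il = ir++ could be evaluated either as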

temp = ir;      // i = 1
ir = ir + 1;    // i = 2   side effect by ++ before assignment
il = temp;      // i = 1   result is 1

or

temp = ir;      // i = 1
il = temp;      // i = 1   side effect by assignment before ++
ir = ir + 1;    // i = 2   result is 2

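resulting in two different results, 1 and 2, depending on the order of the side effects of the assignment and of ++. Hence the expression invokes UB.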
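The two orderings can be written out as ordinary, well-defined C to check the arithmetic. This is only a sketch: the function names are invented here, and because the real expression is undefined, a compiler is not limited to these two outcomes.

```c
/* Ordering 1: the side effect of ++ lands before the assignment. */
int increment_then_assign(int i)
{
    int temp = i;   /* value of the expression i++ */
    i = i + 1;      /* side effect of ++           */
    i = temp;       /* side effect of =            */
    return i;
}

/* Ordering 2: the assignment lands before the side effect of ++. */
int assign_then_increment(int i)
{
    int temp = i;   /* value of the expression i++ */
    i = temp;       /* side effect of =            */
    i = i + 1;      /* side effect of ++           */
    return i;
}
```

With i starting at 1, the first function returns 1 and the second returns 2, matching the two traces above.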

2015-06-27 00:27:48

Other answers

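While it is unlikely that any compiler and processor would actually do so, it would be legal, under the C standard, for the compiler to implement "i++" with the sequence: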

In a single operation, read `i` and lock it to prevent access until further notice
Compute (1+read_value)
In a single operation, unlock `i` and store the computed value

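While I don't think any processor supports hardware that would allow such a thing to be done efficiently, one can easily imagine situations where such behavior would make multi-threaded code easier (for example, it would guarantee that if two threads try to perform the above sequence simultaneously, i would be incremented by two), and it's not totally inconceivable that some future processor might provide a feature like that.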

If the compiler were to write i++ as indicated above (legal under the standard) and were to intersperse the above instructions throughout the evaluation of the overall expression (also legal), and if it didn't happen to notice that one of the other instructions happened to access i, it would be possible (and legal) for the compiler to generate a sequence of instructions that would deadlock. To be sure, a compiler would almost certainly detect the problem in the case where the same variable i is used in both places, but if a routine accepts references to two pointers p and q, and uses (*p) and (*q) in the above expression (rather than using i twice) the compiler would not be required to recognize or avoid the deadlock that would occur if the same object's address were passed for both p and q.
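As a sketch of the pointer case: an expression such as `*p = (*q)++` looks harmless, but it becomes undefined as soon as p and q point at the same object. The helper below is hypothetical (its name and signature are not from the answer); it shows one way to get a defined result whether or not the pointers alias, by giving each side effect its own statement.

```c
/* Hypothetical helper: a well-defined stand-in for `*p = (*q)++`.
   Written directly, that expression would be undefined when p == q,
   because both side effects would hit the same object unsequenced. */
int copy_old_then_increment(int *p, int *q)
{
    int old = *q;   /* read the original value once                  */
    *q = old + 1;   /* the increment, as its own full expression     */
    *p = old;       /* the assignment, sequenced after the increment */
    return *p;
}
```

When p and q are distinct, *q ends up one higher and *p receives the old value of *q. When they alias, this version deliberately picks the "assignment last" ordering, so the object keeps its old value.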

2012-12-05 18:30:27

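While the syntax of expressions like a = a++ or a++ + a++ is legal, the behaviour of these constructs is undefined because a "shall" requirement in the C standard is violated. C99 6.5p2:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. [72] Furthermore, the prior value shall be read only to determine the value to be stored [73]

with footnote 73 further clarifying that

This paragraph renders undefined statement expressions such as i = ++i + 1; a[i++] = i; while allowing i = i + 1; a[i] = i;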
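A minimal sketch of how the two undefined statements from the footnote could be rewritten so that each object is modified at most once per full expression. The function names are invented, and the rewrites assume one plausible intent for each statement; the undefined originals have no single "correct" meaning to preserve.

```c
/* Instead of `i = ++i + 1;`, assuming the intent was to add two:
   one modification of i per full expression. */
int add_two(int i)
{
    i = i + 1;
    i = i + 1;
    return i;
}

/* Instead of `a[i++] = i;`, assuming the intent was to store the
   current index and then advance it. Returns the advanced index. */
int store_then_advance(int a[], int i)
{
    a[i] = i;     /* i is only read here        */
    i = i + 1;    /* the single modification of i */
    return i;
}
```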

The various sequence points are listed in Annex C of C11 (and C99):

The following are the sequence points described in 5.1.2.3:

- Between the evaluations of the function designator and actual arguments in a function call and the actual call. (6.5.2.2).
- Between the evaluations of the first and second operands of the following operators: logical AND && (6.5.13); logical OR || (6.5.14); comma , (6.5.17).
- Between the evaluations of the first operand of the conditional ? : operator and whichever of the second and third operands is evaluated (6.5.15).
- The end of a full declarator: declarators (6.7.6);
- Between the evaluation of a full expression and the next full expression to be evaluated. The following are full expressions: an initializer that is not part of a compound literal (6.7.9); the expression in an expression statement (6.8.3); the controlling expression of a selection statement (if or switch) (6.8.4); the controlling expression of a while or do statement (6.8.5); each of the (optional) expressions of a for statement (6.8.5.3); the (optional) expression in a return statement (6.8.6.4).
- Immediately before a library function returns (7.1.4).
- After the actions associated with each formatted input/output function conversion specifier (7.21.6, 7.29.2).
- Immediately before and immediately after each call to a comparison function, and also between any call to a comparison function and any movement of the objects passed as arguments to that call (7.22.5).

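The wording of the same paragraph in C11 is:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.

You can detect such errors in a program by, for example, using a recent version of GCC with -Wall and -Werror, in which case GCC will outright refuse to compile your program. The following is the output of gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005: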

% gcc plusplus.c -Wall -Werror -pedantic
plusplus.c: In function ‘main’:
plusplus.c:6:6: error: operation on ‘i’ may be undefined [-Werror=sequence-point]
    i = i++ + ++i;
    ~~^~~~~~~~~~~
plusplus.c:6:6: error: operation on ‘i’ may be undefined [-Werror=sequence-point]
plusplus.c:10:6: error: operation on ‘i’ may be undefined [-Werror=sequence-point]
    i = (i++);
    ~~^~~~~~~
plusplus.c:14:6: error: operation on ‘u’ may be undefined [-Werror=sequence-point]
    u = u++ + ++u;
    ~~^~~~~~~~~~~
plusplus.c:14:6: error: operation on ‘u’ may be undefined [-Werror=sequence-point]
plusplus.c:18:6: error: operation on ‘u’ may be undefined [-Werror=sequence-point]
    u = (u++);
    ~~^~~~~~~
plusplus.c:22:6: error: operation on ‘v’ may be undefined [-Werror=sequence-point]
    v = v++ + ++v;
    ~~^~~~~~~~~~~
plusplus.c:22:6: error: operation on ‘v’ may be undefined [-Werror=sequence-point]
cc1: all warnings being treated as errors

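The important part is to know what a sequence point is, and which constructs introduce one and which do not. For example, the comma operator introduces a sequence point, so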

j = (i ++, ++ i);

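is well defined: it increments i by one, yielding the old value, and discards that value; then, at the comma operator, the side effects are settled; then it increments i by one again, and the resulting value becomes the value of the expression. In other words, this is just a contrived way to write j = (i += 2), which is in turn a "clever" way to write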

i += 2;
j = i;
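The equivalence can be checked with a short sketch. The function names are made up for this example; both functions are well defined, the first because the comma operator puts a sequence point between i++ and ++i.

```c
/* The comma operator sequences i++ before ++i. */
int comma_version(int i)
{
    int j = (i++, ++i);
    return j;
}

/* The plain spelling of the same computation. */
int plain_version(int i)
{
    i += 2;
    int j = i;
    return j;
}
```

For any starting value that does not overflow, both functions return the same result, for example 2 when called with 0.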

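However, the , in a function argument list is not a comma operator, and there is no sequence point between the evaluations of distinct arguments; instead, their evaluations are unsequenced with regard to each other. So the function call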

int i = 0;
printf("%d %d\n", i++, ++i, i);

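has undefined behaviour, because there is no sequence point between the evaluations of i++ and ++i in the function arguments, and the value of i is therefore modified twice, by both i++ and ++i, between the previous and the next sequence point.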
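One way to get a defined result is to compute each argument in its own statement and pass plain values. The sketch below assumes a left-to-right intent, which the undefined call never guaranteed; format_sequenced is a hypothetical name, and it writes into a buffer with snprintf so that the result is easy to inspect.

```c
#include <stdio.h>

/* Hypothetical rewrite: every modification of i is its own full
   expression, so the order of evaluation is no longer in question. */
int format_sequenced(char *buf, size_t size)
{
    int i = 0;
    int first = i++;    /* first == 0, i == 1  */
    int second = ++i;   /* second == 2, i == 2 */
    return snprintf(buf, size, "%d %d %d", first, second, i);
}
```

With a sufficiently large buffer this produces the text "0 2 2".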

2017-03-26 14:58:07

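A good explanation of what happens in this kind of computation is provided in document n1188 from the ISO WG14 site.

I will explain the ideas.

The main rule from the ISO 9899 standard that applies in this situation is 6.5p2.

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

The sequence points in an expression like i=i++ are before i= and after i++.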

The paper I cited above explains that you can picture the program as being made up of small boxes, each box containing the instructions between two consecutive sequence points. The sequence points are defined in Annex C of the standard; in the case of i=i++ there are two sequence points, which delimit a full expression. Such an expression is syntactically equivalent to an expression-statement entry in the Backus-Naur form of the grammar (a grammar is provided in Annex A of the standard).

So the instructions inside a box have no clearly defined order.

i=i++

can be interpreted as

tmp = i
i=i+1
i = tmp

or as

tmp = i
i = tmp
i=i+1

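Because both of these ways of interpreting the code i=i++ are valid, and because they produce different answers, the behavior is undefined.

So a sequence point can be seen at the beginning and at the end of each box that makes up the program (the boxes are the atomic units in C), and inside a box the order of the instructions is not defined in all cases. Changing that order can sometimes change the result.

EDIT:

Other good sources for explaining such ambiguities are the entries from the c-faq site (also published as a book), namely here, here, and here.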

2017-10-13 13:58:04

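Just compile and disassemble your line of code, if you are so inclined to know exactly how you get the result you are getting.

This is what I get on my machine, together with what I think is going on: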

$ cat evil.c
void evil(){
  int i = 0;
  i+= i++ + ++i;
}
$ gcc evil.c -c -o evil.bin
$ gdb evil.bin
(gdb) disassemble evil
Dump of assembler code for function evil:
   0x00000000 <+0>:   push   %ebp
   0x00000001 <+1>:   mov    %esp,%ebp
   0x00000003 <+3>:   sub    $0x10,%esp
   0x00000006 <+6>:   movl   $0x0,-0x4(%ebp)  // i = 0   i = 0
   0x0000000d <+13>:  addl   $0x1,-0x4(%ebp)  // i++     i = 1
   0x00000011 <+17>:  mov    -0x4(%ebp),%eax  // j = i   i = 1  j = 1
   0x00000014 <+20>:  add    %eax,%eax        // j += j  i = 1  j = 2
   0x00000016 <+22>:  add    %eax,-0x4(%ebp)  // i += j  i = 3
   0x00000019 <+25>:  addl   $0x1,-0x4(%ebp)  // i++     i = 4
   0x0000001d <+29>:  leave  
   0x0000001e <+30>:  ret
End of assembler dump.

(I... suppose that the 0x00000014 instruction was some kind of compiler optimization?)
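For readers who prefer C to assembly, the instruction sequence in the dump can be replayed as ordinary sequenced statements. This is only a sketch of what this particular build happened to do, following the annotations in the listing; replay_disassembly is an invented name, j stands in for the %eax register, and another compiler or optimization level is free to produce something else entirely.

```c
/* Replays the instruction order from the disassembly as well-defined C. */
int replay_disassembly(void)
{
    int i = 0;   /* movl $0x0,-0x4(%ebp)                */
    i += 1;      /* addl $0x1,-0x4(%ebp)   : i == 1     */
    int j = i;   /* mov  -0x4(%ebp),%eax   : j == 1     */
    j += j;      /* add  %eax,%eax         : j == 2     */
    i += j;      /* add  %eax,-0x4(%ebp)   : i == 3     */
    i += 1;      /* addl $0x1,-0x4(%ebp)   : i == 4     */
    return i;
}
```

The function returns 4, which matches the final value of i in the annotated listing.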

2010-05-24 13:26:05

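Another way of answering this question, rather than getting bogged down in the arcane details of sequence points and undefined behavior, is simply to ask: what are these expressions supposed to mean? What was the programmer trying to do?

The first fragment asked about, i = i++ + ++i, is pretty clearly insane in my book. No one would ever write it in a real program, it's not obvious what it does, and there's no conceivable algorithm someone could have been trying to code that would have resulted in this particular contrived sequence of operations. And since it's not obvious to you and me what it's supposed to do, it's fine in my book if the compiler can't figure out what it's supposed to do, either.

The second fragment, i = i++, is a little easier to understand. Someone is clearly trying to increment i and assign the result back to i. But there are a couple of ways of doing this in C. The most basic way to add 1 to i and assign the result back to i is the same in almost any programming language: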

i = i + 1

C, of course, has a handy shortcut:

i++

This means "add 1 to i, and assign the result back to i". So if we construct a hodgepodge of the two, by writing

i = i++

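what we're really saying is "add 1 to i, and assign the result back to i, and assign the result back to i". We're confused, so it doesn't bother me too much if the compiler gets confused, too.

Realistically, the only time these crazy expressions get written is when people are using them as artificial examples of how ++ is supposed to work. And of course it is important to understand how ++ works. But one practical rule for using ++ is: "If it's not obvious what an expression using ++ means, don't write it."

We used to spend countless hours on comp.lang.c discussing expressions like these and why they're undefined. Two of my longer answers, which try to really explain why, are archived on the web:

Why doesn't the Standard define what these do?
Doesn't operator precedence determine the order of evaluation?

See also question 3.8 and the rest of the questions in section 3 of the C FAQ list.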

2015-06-18 11:55:45
