我最近开始学习C语言,我正在上一门以C为主题的课程。我目前正在玩循环,我遇到了一些奇怪的行为,我不知道如何解释。

#include <stdio.h>

int main()
{
  int array[10],i;

  for (i = 0; i <=10 ; i++)
  {
    array[i]=0; /*code should never terminate*/
    printf("test \n");

  }
  printf("%d \n", sizeof(array)/sizeof(int));
  return 0;
}

在我运行Ubuntu 14.04的笔记本电脑上,这段代码没有崩溃。它运行到完成。在我学校运行CentOS 6.6的电脑上,它也运行得很好。在Windows 8.1上,循环永远不会终止。

更奇怪的是,当我将for循环的条件编辑为:I <= 11时,代码只在运行Ubuntu的笔记本电脑上终止。它永远不会在CentOS和Windows中终止。

有人能解释一下内存中发生了什么吗?为什么运行相同代码的不同操作系统会产生不同的结果?

编辑:我知道for循环越界了。我是故意这么做的。我只是不明白在不同的操作系统和计算机上,这种行为是如何不同的。


当前回答

In what should be the last run of the loop,you write to array[10], but there are only 10 elements in the array, numbered 0 through 9. The C language specification says that this is “undefined behavior”. What this means in practice is that your program will attempt to write to the int-sized piece of memory that lies immediately after array in memory. What happens then depends on what does, in fact, lie there, and this depends not only on the operating system but more so on the compiler, on the compiler options (such as optimization settings), on the processor architecture, on the surrounding code, etc. It could even vary from execution to execution, e.g. due to address space randomization (probably not on this toy example, but it does happen in real life). Some possibilities include:

The location wasn't used. The loop terminates normally. The location was used for something which happened to have the value 0. The loop terminates normally. The location contained the function's return address. The loop terminates normally, but then the program crashes because it tries to jump to the address 0. The location contains the variable i. The loop never terminates because i restarts at 0. The location contains some other variable. The loop terminates normally, but then “interesting” things happen. The location is an invalid memory address, e.g. because array is right at the end of a virtual memory page and the next page isn't mapped. Demons fly out of your nose. Fortunately most computers lack the requisite hardware.

What you observed on Windows was that the compiler decided to place the variable i immediately after the array in memory, so array[10] = 0 ended up assigning to i. On Ubuntu and CentOS, the compiler didn't place i there. Almost all C implementations do group local variables in memory, on a memory stack, with one major exception: some local variables can be placed entirely in registers. Even if the variable is on the stack, the order of variables is determined by the compiler, and it may depend not only on the order in the source file but also on their types (to avoid wasting memory to alignment constraints that would leave holes), on their names, on some hash value used in a compiler's internal data structure, etc.

如果你想知道你的编译器决定做什么,你可以告诉它给你看汇编代码。哦,还要学习破译汇编程序(这比编写汇编程序容易)。在GCC(以及其他一些编译器,特别是Unix世界)中,传递选项-S来生成汇编代码而不是二进制代码。例如,下面是使用GCC在amd64上编译循环的汇编程序片段,使用优化选项-O0(无优化),并手动添加注释:

.L3:
    movl    -52(%rbp), %eax           ; load i to register eax
    cltq
    movl    $0, -48(%rbp,%rax,4)      ; set array[i] to 0
    movl    $.LC0, %edi
    call    puts                      ; printf of a constant string was optimized to puts
    addl    $1, -52(%rbp)             ; add 1 to i
.L2:
    cmpl    $10, -52(%rbp)            ; compare i to 10
    jle     .L3

在这里,变量i位于堆栈顶部下方52个字节,而数组则从堆栈顶部下方48个字节开始。所以这个编译器刚好把i放在数组的前面;如果你碰巧写数组[-1],你会覆盖I。如果你将array[i]=0改为array[9-i]=0,你将在这个特定的平台上使用这些特定的编译器选项得到一个无限循环。

现在让我们用gcc -O1编译您的程序。

    movl    $11, %ebx
.L3:
    movl    $.LC0, %edi
    call    puts
    subl    $1, %ebx
    jne     .L3

那是短!编译器不仅拒绝为i分配堆栈位置——它只存储在寄存器ebx中——而且它也没有为数组分配任何内存,或者生成代码来设置它的元素,因为它注意到没有一个元素被使用过。

为了使这个例子更有说服力,让我们确保数组赋值是通过向编译器提供它无法优化的东西来执行的。一种简单的方法是使用来自另一个文件的数组——由于单独编译,编译器不知道在另一个文件中发生了什么(除非它在链接时进行优化,而gcc - o0或gcc - o1不会进行优化)。创建一个源文件use_array.c,包含

void use_array(int *array) {}

并将源代码更改为

#include <stdio.h>
void use_array(int *array);

int main()
{
  int array[10],i;

  for (i = 0; i <=10 ; i++)
  {
    array[i]=0; /*code should never terminate*/
    printf("test \n");

  }
  printf("%zd \n", sizeof(array)/sizeof(int));
  use_array(array);
  return 0;
}

编译和

gcc -c use_array.c
gcc -O1 -S -o with_use_array1.c with_use_array.c use_array.o

这次汇编代码看起来像这样:

    movq    %rsp, %rbx
    leaq    44(%rsp), %rbp
.L3:
    movl    $0, (%rbx)
    movl    $.LC0, %edi
    call    puts
    addq    $4, %rbx
    cmpq    %rbp, %rbx
    jne     .L3

Now the array is on the stack, 44 bytes from the top. What about i? It doesn't appear anywhere! But the loop counter is kept in the register rbx. It's not exactly i, but the address of the array[i]. The compiler has decided that since the value of i was never used directly, there was no point in performing arithmetic to calculate where to store 0 during each run of the loop. Instead that address is the loop variable, and the arithmetic to determine the boundaries was performed partly at compile time (multiply 11 iterations by 4 bytes per array element to get 44) and partly at run time but once and for all before the loop starts (perform a subtraction to get the initial value).

Even on this very simple example, we've seen how changing compiler options (turn on optimization) or changing something minor (array[i] to array[9-i]) or even changing something apparently unrelated (adding the call to use_array) can make a significant difference to what the executable program generated by the compiler does. Compiler optimizations can do a lot of things that may appear unintuitive on programs that invoke undefined behavior. That's why undefined behavior is left completely undefined. When you deviate ever so slightly from the tracks, in real-world programs, it can be very hard to understand the relationship between what the code does and what it should have done, even for experienced programmers.

其他回答

漏洞存在于以下代码段之间:

int array[10],i;

for (i = 0; i <=10 ; i++)

array[i]=0;

由于数组只有10个元素,在最后一次迭代中数组[10]= 0;是缓冲区溢出。缓冲区溢出是未定义的行为,这意味着它们可能格式化您的硬盘驱动器或导致恶魔从您的鼻子里飞出来。

所有的堆栈变量都是相邻排列的,这是很常见的。如果i位于数组[10]写入的位置,则UB将i重置为0,从而导致未终止循环。

要修复,将循环条件更改为i < 10。

除了内存布局的可能性之外,写入[10]的尝试实际上会覆盖i,优化编译器也可能会确定,如果代码没有首先访问不存在的数组元素[10],则i值大于10时无法达到循环测试。

Since an attempt to access that element would be undefined behavior, the compiler would have no obligations with regard to what the program might do after that point. More specifically, since the compiler would have no obligation to generate code to check the loop index in any case where it might be greater than ten, it would have no obligation to generate code to check it at all; it could instead assume that the <=10 test will always yield true. Note that this would be true even if the code would read a[10] rather than writing it.

你声明int array[10]表示数组的索引为0到9(它总共可以容纳10个整数元素)。但是接下来的循环,

for (i = 0; i <=10 ; i++)

将循环0到10意味着11次。因此,当i = 10时,它将溢出缓冲区并导致未定义行为。

所以试试这个:

for (i = 0; i < 10 ; i++)

or,

for (i = 0; i <= 9 ; i++)

它在数组[10]上是未定义的,并给出了之前描述的未定义行为。你可以这样想:

我的购物车里有10样东西。它们是:

0:一盒麦片 1:面包 2:牛奶 3:派 4:鸡蛋 5:蛋糕 6:一杯2升的苏打水 7:沙拉 8:汉堡 9:冰淇淋

购物车[10]未定义,在某些编译器中可能会出现越界异常。但是,很多人显然没有。显而易见的第11件商品实际上并不在购物车中。第11项是指,我称之为“恶作剧项目”。它从未存在过,但它就在那里。

为什么一些编译器给i一个数组[10]或数组[11]甚至数组[-1]的索引是因为你的初始化/声明语句。一些编译器将其解释为:

“为数组[10]分配10个int块和另一个int块。为了方便起见,把它们放在一起。” 和之前一样,但是移动一两个空格,这样数组[10]就不指向i了。 执行与前面相同的操作,但是在数组[-1]处分配i(因为数组的索引不能或不应该为负),或者在完全不同的位置分配i,因为操作系统可以处理它,这样更安全。

一些编译器希望运行得更快,而一些编译器更喜欢安全。这一切都与环境有关。例如,如果我正在为古老的BREW操作系统(基本手机的操作系统)开发一款应用程序,它就不会关心安全性。如果我开发的是iPhone 6,那么它无论如何都能运行得很快,所以我需要强调安全性。(说真的,你读过苹果的应用商店指南吗,或者读过Swift和Swift 2.0的开发吗?)

你有一个边界违反,在非终止平台上,我相信你无意中在循环结束时将I设置为0,这样它就会重新开始。

数组[10]无效;它包含10个元素,从数组[0]到数组[9],数组[10]是第11个。你的循环应该在10之前停止,如下所示:

for (i = 0; i < 10; i++)

数组[10]落在实现定义的位置,有趣的是,在两个平台上,它落在i上,这些平台显然是直接放在数组后面。I被设置为0,循环将永远继续下去。对于其他平台,i可能位于array之前,或者array可能在它之后有一些填充。