在C语言中,可以在这样的声明中使用字符串字面值:

char s[] = "hello";

或者像这样:

char *s = "hello";

那么有什么不同呢?我想知道在编译和运行时,在存储持续时间方面实际发生了什么。


当前回答

补充一点:它们的大小也会有不同的值。

printf("sizeof s[] = %zu\n", sizeof(s));  //6
printf("sizeof *s  = %zu\n", sizeof(s));  //4 or 8

如上所述,对于数组'\0'将被分配为最后一个元素。

其他回答

举个例子来说明区别:

printf("hello" + 2); //llo
char a[] = "hello" + 2; //error

在第一种情况下,指针算术是有效的(传递给函数的数组衰减为指针)。

不同之处在于

char *s = "Hello world";

将“Hello world”放置在内存的只读部分,并将s作为指向它的指针,使得对该内存的任何写入操作都是非法的。

虽然做的事情:

char s[] = "Hello world";

将字面值字符串放在只读内存中,并将字符串复制到堆栈上新分配的内存中。从而使

s[0] = 'J';

合法的。

首先,在函数参数中,它们是完全等价的:

void foo(char *x);
void foo(char x[]); // exactly the same in all respects

在其他上下文中,char *分配一个指针,而char[]分配一个数组。你会问,在前一种情况下,弦在哪里?编译器秘密地分配一个静态匿名数组来保存字符串字面值。所以:

char *x = "Foo";
// is approximately equivalent to:
static const char __secret_anonymous_array[] = "Foo";
char *x = (char *) __secret_anonymous_array;

注意,你不能试图通过这个指针修改这个匿名数组的内容;效果是未定义的(通常意味着崩溃):

x[1] = 'O'; // BAD. DON'T DO THIS.

使用数组语法直接将其分配到新的内存中。因此修改是安全的:

char x[] = "Foo";
x[1] = 'O'; // No problem.

然而,数组只在其包含范围内存在,所以如果在函数中这样做,不要返回或泄漏指向该数组的指针——而是使用strdup()或类似的方法进行复制。如果数组是在全局范围内分配的,当然没有问题。

在下列情况下:

char *x = "fred";

X是左值,它可以被赋值给。但在这种情况下:

char x[] = "fred";

X不是一个左值,它是一个右值——你不能给它赋值。

c99n1256草案

字符串字面量有两种不同的用法:

Initialize char[]: char c[] = "abc"; This is "more magic", and described at 6.7.8/14 "Initialization": An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array. So this is just a shortcut for: char c[] = {'a', 'b', 'c', '\0'}; Like any other regular array, c can be modified. Everywhere else: it generates an: unnamed array of char What is the type of string literals in C and C++? with static storage that gives UB if modified So when you write: char *c = "abc"; This is similar to: /* __unnamed is magic because modifying it gives UB. */ static char __unnamed[] = "abc"; char *c = __unnamed; Note the implicit cast from char[] to char *, which is always legal. Then if you modify c[0], you also modify __unnamed, which is UB. This is documented at 6.4.5 "String literals": 5 In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence [...] 6 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

6.7.8/32“初始化”给出了一个直接的例子:

EXAMPLE 8: The declaration char s[] = "abc", t[3] = "abc"; defines "plain" char array objects s and t whose elements are initialized with character string literals. This declaration is identical to char s[] = { 'a', 'b', 'c', '\0' }, t[] = { 'a', 'b', 'c' }; The contents of the arrays are modifiable. On the other hand, the declaration char *p = "abc"; defines p with type "pointer to char" and initializes it to point to an object with type "array of char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.

GCC 4.8 x86-64 ELF实现

计划:

#include <stdio.h>

int main(void) {
    char *s = "abc";
    printf("%s\n", s);
    return 0;
}

编译和反编译:

gcc -ggdb -std=c99 -c main.c
objdump -Sr main.o

输出包含:

 char *s = "abc";
8:  48 c7 45 f8 00 00 00    movq   $0x0,-0x8(%rbp)
f:  00 
        c: R_X86_64_32S .rodata

结论:GCC将char* it存储在.rodata部分,而不是在.text中。

但是请注意,默认的链接器脚本将.rodata和.text放在同一个段中,该段有执行权限,但没有写权限。这可以观察到:

readelf -l a.out

它包含:

 Section to Segment mapping:
  Segment Sections...
   02     .text .rodata

如果我们对char[]做同样的操作:

 char s[] = "abc";

我们获得:

17:   c7 45 f0 61 62 63 00    movl   $0x636261,-0x10(%rbp)

因此它被存储在堆栈中(相对于%rbp)。