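In C, you can use a string literal in a declaration like this:
char s[] = "hello";
or like this:
char *s = "hello";
So what is the difference? I want to know what actually happens in terms of storage duration, both at compile time and at run time.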
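Current answer
char s[] = "Hello world";
Here, s is an array of characters, which can be overwritten if we wish.
char *s = "hello";
Here, the string literal creates a block of characters somewhere in memory, and s points to it. We can re-point s by assigning it a different address, but as long as it points to a string literal, the block of characters it points to must not be modified.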
Other answers
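The difference is that
char *s = "Hello world";
places "Hello world" in a read-only part of memory (on typical implementations) and makes s a pointer to it, so any write through s to that memory is illegal (undefined behavior).
Whereas
char s[] = "Hello world";
puts the literal string in read-only memory and copies it into a newly allocated array, which is typically on the stack when the declaration appears inside a function. This makes
s[0] = 'J';
legal.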
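Here is an example that illustrates the difference:
printf("hello" + 2); //llo
char a[] = "hello" + 2; //error
In the first case, pointer arithmetic is valid: the string literal is an array that decays to a pointer, so "hello" + 2 points at the third character. In the second case, the initializer is a pointer expression rather than a string literal, so it cannot be used to initialize an array.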
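char s[] = "hello";
declares s to be an array of char that is long enough to hold the initializer (5 + 1 characters) and initializes the array by copying the members of the given string literal into it.
char *s = "hello";
declares s to be a pointer to one or more (in this case more) chars and points it directly at a fixed (read-only) location containing the literal "hello".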
C99 N1256 draft
There are two different uses of character string literals:
Initialize char[]:
char c[] = "abc";
This is "more magic", and is described at 6.7.8/14 "Initialization":
An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
So this is just a shortcut for:
char c[] = {'a', 'b', 'c', '\0'};
Like any other regular array, c can be modified.
Everywhere else: it generates an unnamed array of char (see "What is the type of string literals in C and C++?") with static storage, which gives UB if modified.
So when you write:
char *c = "abc";
This is similar to:
/* __unnamed is magic because modifying it gives UB. */
static char __unnamed[] = "abc";
char *c = __unnamed;
Note the implicit conversion from char[] to char *, which is always legal.
Then if you modify c[0], you also modify __unnamed, which is UB.
This is documented at 6.4.5 "String literals":
5 In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence [...]
6 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
6.7.8/32 "Initialization" gives a direct example:
EXAMPLE 8: The declaration
char s[] = "abc", t[3] = "abc";
defines "plain" char array objects s and t whose elements are initialized with character string literals.
This declaration is identical to
char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };
The contents of the arrays are modifiable. On the other hand, the declaration
char *p = "abc";
defines p with type "pointer to char" and initializes it to point to an object with type "array of char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
GCC 4.8 x86-64 ELF implementation
Program:
#include <stdio.h>
int main(void) {
char *s = "abc";
printf("%s\n", s);
return 0;
}
Compile and decompile:
gcc -ggdb -std=c99 -c main.c
objdump -Sr main.o
The output contains:
char *s = "abc";
8: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
f: 00
c: R_X86_64_32S .rodata
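Conclusion: GCC stores the string that the char * points to in the .rodata section, not in .text.
Note, however, that the default linker script puts .rodata and .text in the same segment, which has execute permission but no write permission. This can be observed with: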
readelf -l a.out
which contains:
Section to Segment mapping:
Segment Sections...
02 .text .rodata
If we do the same for char[]:
char s[] = "abc";
we obtain:
17: c7 45 f0 61 62 63 00 movl $0x636261,-0x10(%rbp)
so it is stored on the stack (relative to %rbp).