下面的代码在第2行接收到seg错误:
char *str = "string";
str[0] = 'z'; // could be also written as *str = 'z'
printf("%s\n", str);
虽然这个方法非常有效:
char str[] = "string";
str[0] = 'z';
printf("%s\n", str);
用MSVC和GCC测试。
下面的代码在第2行接收到seg错误:
char *str = "string";
str[0] = 'z'; // could be also written as *str = 'z'
printf("%s\n", str);
虽然这个方法非常有效:
char str[] = "string";
str[0] = 'z';
printf("%s\n", str);
用MSVC和GCC测试。
因为在第一个例子的上下文中,“whatever”的类型是const char*(即使你将它赋值给一个非const char*),这意味着你不应该尝试写它。
编译器通过将字符串放在内存的只读部分来强制执行这一点,因此写入它会产生段错误。
通常,当程序运行时,字符串字面值存储在只读内存中。这是为了防止您意外地更改字符串常量。在第一个例子中,"string"存储在只读内存中,*str指向第一个字符。当您试图将第一个字符更改为'z'时,会发生段错误。
在第二个例子中,字符串"string"被编译器从其只读母数组复制到str[]数组中。然后允许更改第一个字符。你可以通过打印每个地址来检查:
printf("%p", str);
同样,在第二个例子中打印str的大小会显示编译器已经为它分配了7个字节:
printf("%d", sizeof(str));
char *str = "string";
上面的代码将str设置为指向在程序的二进制映像中硬编码的字面值“string”,它在内存中可能被标记为只读。
因此str[0]=试图写入应用程序的只读代码。我猜这可能依赖于编译器。
在第一个代码中,"string"是一个字符串常量,字符串常量永远不应该被修改,因为它们通常被放置在只读内存中。"str"是一个用来修改常量的指针。
在第二段代码中,"string"是一个数组初始化器,类似于
char str[7] = { 's', 't', 'r', 'i', 'n', 'g', '\0' };
"str"是堆栈上分配的数组,可以自由修改。
char *str = "string";
分配一个指向字符串字面量的指针,编译器将其放入可执行文件中不可修改的部分;
char str[] = "string";
分配并初始化一个可修改的本地数组
像“String”这样的字符串文字可能在可执行文件的地址空间中作为只读数据分配(通过编译器)。当你去触摸它时,它会害怕你在它的泳衣区,并让你知道一个隔离错误。
在第一个例子中,你得到一个指向const数据的指针。在第二个示例中,使用const数据的副本初始化一个7个字符的数组。
The
char *str = "string";
Line定义了一个指针,并将其指向一个字面值字符串。字面值字符串是不可写的,所以当你这样做:
str[0] = 'z';
你会得到一个隔离失误。在某些平台上,字面值可能位于可写内存中,因此您不会看到段错误,但无论如何它都是无效代码(导致未定义的行为)。
线:
char str[] = "string";
分配一个字符数组并将字面值字符串复制到该数组中,该数组是完全可写的,因此后续更新没有问题。
参见C常见问题,问题1.32
Q: What is the difference between these initializations? char a[] = "string literal"; char *p = "string literal"; My program crashes if I try to assign a new value to p[i]. A: A string literal (the formal term for a double-quoted string in C source) can be used in two slightly different ways: As the initializer for an array of char, as in the declaration of char a[] , it specifies the initial values of the characters in that array (and, if necessary, its size). Anywhere else, it turns into an unnamed, static array of characters, and this unnamed array may be stored in read-only memory, and which therefore cannot necessarily be modified. In an expression context, the array is converted at once to a pointer, as usual (see section 6), so the second declaration initializes p to point to the unnamed array's first element. Some compilers have a switch controlling whether string literals are writable or not (for compiling old code), and some may have options to cause string literals to be formally treated as arrays of const char (for better error catching).
首先,str是一个指向"string"的指针。编译器允许将字符串字面量放在内存中不能写入,但只能读取的地方。(这真的应该触发一个警告,因为你将一个const char *分配给一个char *。你是禁用了警告,还是忽略了它们?)
第二,你在创建一个数组,它是你可以完全访问的内存,并用"string"初始化它。您正在创建一个字符[7](六个用于字母,一个用于结尾的'\0'),您可以对它做任何您喜欢的事情。
The C FAQ that @matli linked to mentions it, but no one else here has yet, so for clarification: if a string literal (double-quoted string in your source) is used anywhere other than to initialize a character array (ie: @Mark's second example, which works correctly), that string is stored by the compiler in a special static string table, which is akin to creating a global static variable (read-only, of course) that is essentially anonymous (has no variable "name"). The read-only part is the important part, and is why the @Mark's first code example segfaults.
这些答案大部分都是正确的,但为了更清楚一点……
人们所说的“只读内存”是ASM术语中的文本段。它是内存中加载指令的同一个地方。出于安全等明显的原因,这是只读的。当创建一个初始化为字符串的char*时,字符串数据被编译到文本段中,程序初始化指向文本段的指针。所以如果你想改变它,就死定了。段错误。
当作为数组编写时,编译器将初始化的字符串数据放在数据段中,这与全局变量等存在的位置相同。这个内存是可变的,因为数据段中没有指令。这一次,当编译器初始化字符数组(仍然只是一个char*)时,它指向的是数据段而不是文本段,您可以在运行时安全地更改文本段。
// create a string constant like this - will be read only
char *str_p;
str_p = "String constant";
// create an array of characters like this
char *arr_p;
char arr[] = "String in an array";
arr_p = &arr[0];
// now we try to change a character in the array first, this will work
*arr_p = 'E';
// lets try to change the first character of the string contant
*str_p = 'G'; // this will result in a segmentation fault. Comment it out to work.
/*-----------------------------------------------------------------------------
* String constants can't be modified. A segmentation fault is the result,
* because most operating systems will not allow a write
* operation on read only memory.
*-----------------------------------------------------------------------------*/
//print both strings to see if they have changed
printf("%s\n", str_p); //print the string without a variable
printf("%s\n", arr_p); //print the string, which is in an array.
当您试图访问不可访问的内存时,会导致分割错误。
Char *str是一个指向不可修改的字符串的指针(这是导致segfault的原因)。
而char str[]是一个数组,可以修改。
要理解这个错误或问题,您应该首先了解指针和数组的差异b/w 所以在这里,我首先要解释一下它们的区别
字符串数组
char strarray[] = "hello";
在存储器数组中存储的是连续存储器单元,存储为[h][e][l][l][o][\0] =>[]是1个char字节大小的存储器单元,而这个连续存储器单元可以通过名为strarray的名称在这里访问。这里string数组strarray本身包含初始化的所有字符串。在这种情况下,"hello" 因此,我们可以通过访问每个字符的索引值来轻松地更改其内存内容
`strarray[0]='m'` it access character at index 0 which is 'h'in strarray
它的值变成了m所以strarray的值变成了mello;
这里需要注意的一点是,我们可以通过一个字符一个字符地改变字符串数组的内容,但不能像strarray="new string"这样直接初始化其他字符串,这是无效的
指针
我们都知道指针指向内存中的内存位置, 未初始化的指针指向随机内存位置,初始化后指向特定内存位置
char *ptr = "hello";
这里的指针ptr被初始化为字符串“hello”,这是一个存储在只读存储器(ROM)中的常量字符串,所以“hello”不能被更改,因为它存储在ROM中
PTR存储在堆栈部分并指向常量字符串"hello"
所以ptr[0]='m'是无效的,因为你不能访问只读内存
但是ptr可以直接初始化为其他字符串值,因为它只是一个指针,所以它可以指向其数据类型变量的任何内存地址
ptr="new string"; is valid
为什么我得到一个分割错误时写入字符串?
c99n1256草案
字符串字面量有两种不同的用法:
Initialize char[]: char c[] = "abc"; This is "more magic", and described at 6.7.8/14 "Initialization": An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array. So this is just a shortcut for: char c[] = {'a', 'b', 'c', '\0'}; Like any other regular array, c can be modified. Everywhere else: it generates an: unnamed array of char What is the type of string literals in C and C++? with static storage that gives UB if modified So when you write: char *c = "abc"; This is similar to: /* __unnamed is magic because modifying it gives UB. */ static char __unnamed[] = "abc"; char *c = __unnamed; Note the implicit cast from char[] to char *, which is always legal. Then if you modify c[0], you also modify __unnamed, which is UB. This is documented at 6.4.5 "String literals": 5 In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence [...] 6 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
6.7.8/32“初始化”给出了一个直接的例子:
EXAMPLE 8: The declaration char s[] = "abc", t[3] = "abc"; defines "plain" char array objects s and t whose elements are initialized with character string literals. This declaration is identical to char s[] = { 'a', 'b', 'c', '\0' }, t[] = { 'a', 'b', 'c' }; The contents of the arrays are modifiable. On the other hand, the declaration char *p = "abc"; defines p with type "pointer to char" and initializes it to point to an object with type "array of char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
GCC 4.8 x86-64 ELF实现
计划:
#include <stdio.h>
int main(void) {
char *s = "abc";
printf("%s\n", s);
return 0;
}
编译和反编译:
gcc -ggdb -std=c99 -c main.c
objdump -Sr main.o
输出包含:
char *s = "abc";
8: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
f: 00
c: R_X86_64_32S .rodata
结论:GCC将char* it存储在.rodata部分,而不是在.text中。
如果我们对char[]做同样的操作:
char s[] = "abc";
我们获得:
17: c7 45 f0 61 62 63 00 movl $0x636261,-0x10(%rbp)
因此它被存储在堆栈中(相对于%rbp)。
但是请注意,默认的链接器脚本将.rodata和.text放在同一个段中,该段有执行权限,但没有写权限。这可以观察到:
readelf -l a.out
它包含:
Section to Segment mapping:
Segment Sections...
02 .text .rodata
假设字符串是,
char a[] = "string literal copied to stack";
char *p = "string literal referenced by p";
在第一种情况下,当'a'进入作用域时,文字将被复制。这里'a'是定义在stack上的数组。这意味着字符串将在堆栈上创建,其数据从代码(文本)内存中复制,通常是只读的(这是特定于实现的,编译器也可以将这种只读的程序数据放在可读写内存中)。
在第二种情况下,p是定义在堆栈(本地作用域)上的指针,并引用存储在其他位置的字符串字面量(程序数据或文本)。通常,修改这样的内存不是好的实践,也不鼓励。
5.5节K&R的字符指针和功能也讨论了这个主题:
There is an important difference between these definitions: char amessage[] = "now is the time"; /* an array */ char *pmessage = "now is the time"; /* a pointer */ amessage is an array, just big enough to hold the sequence of characters and '\0' that initializes it. Individual characters within the array may be changed but amessage will always refer to the same storage. On the other hand, pmessage is a pointer, initialized to point to a string constant; the pointer may subsequently be modified to point elsewhere, but the result is undefined if you try to modify the string contents.
不变的记忆
由于字符串字面量在设计上是只读的,所以它们存储在内存的Constant部分。存储在那里的数据是不可变的,即不能被更改。因此,在C代码中定义的所有字符串字面值在这里都获得一个只读内存地址。
栈内存
内存的堆栈部分是存放局部变量地址的地方,例如,函数中定义的变量。
正如@matli的回答所暗示的,有两种方法来处理这些常量字符串。
1. 指向字符串字面量的指针
当我们定义指向字符串字面量的指针时,我们是在Stack内存中创建一个指针变量。它指向底层字符串字面值所在的只读地址。
#include <stdio.h>
int main(void) {
char *s = "hello";
printf("%p\n", &s); // Prints a read-only address, e.g. 0x7ffc8e224620
return 0;
}
如果我们试图通过插入来修改s
s[0] = 'H';
我们得到一个分割错误(核心转储)。我们试图访问不应该访问的内存。我们正在尝试修改只读地址0x7ffc8e224620的值。
2. 字符数组
对于示例而言,假设存储在常量内存中的字符串字面值“Hello”具有与上述地址相同的只读内存地址0x7ffc8e224620。
#include <stdio.h>
int main(void) {
// We create an array from a string literal with address 0x7ffc8e224620.
// C initializes an array variable in the stack, let's give it address
// 0x7ffc7a9a9db2.
// C then copies the read-only value from 0x7ffc8e224620 into
// 0x7ffc7a9a9db2 to give us a local copy we can mutate.
char a[] = "hello";
// We can now mutate the local copy
a[0] = 'H';
printf("%p\n", &a); // Prints the Stack address, e.g. 0x7ffc7a9a9db2
printf("%s\n", a); // Prints "Hello"
return 0;
}
注意:当使用指针指向字符串字面量时,如1。,最好的做法是使用const关键字,如const *s = "hello"。这样可读性更强,并且当它被违反时,编译器将提供更好的帮助。然后它将抛出类似error:分配只读位置' *s '的错误,而不是seg错误。编辑器中的linter也可能在手动编译代码之前发现错误。