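How can I determine where the error is in code that causes a segmentation fault?

Can my compiler (gcc) show the location of the fault in the program?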


Current answer

Also, you can give valgrind a try: if you install valgrind and run

valgrind --leak-check=full <program>

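then it will run your program and display stack traces for any segfaults, as well as for any invalid memory reads or writes and memory leaks. It is really quite useful.

Other answers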

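There are a number of tools that help with debugging segmentation faults, and I would like to add my favorite to the list: Address Sanitizer (often abbreviated ASAN).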

Modern¹ compilers come with the handy -fsanitize=address flag, which adds some compile-time and run-time overhead in exchange for more error checking.

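According to the documentation, these checks include catching segmentation faults by default. The advantage is that you get a stack trace similar to gdb's output, but without running the program inside a debugger. An example: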

int main() {
  volatile int *ptr = (int*)0;
  *ptr = 0;
}
$ gcc -g -fsanitize=address main.c
$ ./a.out
AddressSanitizer:DEADLYSIGNAL
=================================================================
==4848==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x5654348db1a0 bp 0x7ffc05e39240 sp 0x7ffc05e39230 T0)
==4848==The signal is caused by a WRITE memory access.
==4848==Hint: address points to the zero page.
    #0 0x5654348db19f in main /tmp/tmp.s3gwjqb8zT/main.c:3
    #1 0x7f0e5a052b6a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x26b6a)
    #2 0x5654348db099 in _start (/tmp/tmp.s3gwjqb8zT/a.out+0x1099)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /tmp/tmp.s3gwjqb8zT/main.c:3 in main
==4848==ABORTING

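The output is slightly more complicated than what gdb would print, but there are upsides:

- There is no need to reproduce the problem to get a stack trace. Simply enabling the flag during development is enough.
- ASAN catches a lot more than just segmentation faults. Many out-of-bounds accesses are caught even when that memory area is accessible to the process.


¹ That is, Clang 3.1+ and GCC 4.8+.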

Lucas's answer about core dumps is good. In my .cshrc I have:

alias core 'ls -lt core; echo where | gdb -core=core -silent; echo "\n"'

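Typing "core" displays the backtrace, along with the date stamp so that I can make sure I am looking at the right file :(.

Added: if there is a stack corruption bug, the backtrace taken from the core dump is often garbage. In that case, running the program inside gdb can give better results, as the accepted answer describes (assuming the fault is easily reproducible). Also beware of multiple processes dumping core at the same time; some operating systems add the PID to the name of the core file.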
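A core file is only written if the process's core size limit allows it. As a minimal sketch, assuming a POSIX system, a program could raise its own soft limit at startup instead of relying on the shell's settings. The function name `enable_core_dumps` is made up for this example, and whether and where the core file actually appears still depends on the system's configuration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit so that a crash is allowed to leave a core file behind.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }
    lim.rlim_cur = lim.rlim_max;   /* the soft limit may go up to the hard limit */
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Calling `enable_core_dumps()` as the first statement of `main` would be one way to use it; if the hard limit is itself 0, the call succeeds but no core file can be produced.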

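If you have a reproducible exception such as a segmentation fault, you can use a tool like a debugger to reproduce the error.

I used to find the source code location even for non-reproducible errors. The procedure is based on the Microsoft compiler tool chain, but the underlying idea is general.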

1. Save the MAP file for each binary (DLL, EXE) before you give it to the customer.
2. If an exception occurs, look up the address in the MAP file and determine the function whose start address is just below the exception address. As a result you know the function in which the exception occurred.
3. Subtract the function start address from the exception address. The result is the offset within the function.
4. Recompile the source file containing the function with assembly listing enabled. Extract the function's assembly listing.
5. The assembly listing includes the offset of each instruction in the function. Look up the source code line that matches the offset within the function.
6. Evaluate the assembler code for that specific source code line. The offset points exactly to the assembler instruction that caused the exception. Evaluate the code of this single source code line. With a bit of experience with the compiler output you can say what caused the exception.
7. Be aware that the reason for the exception might be at a totally different location. For example, the code dereferenced a NULL pointer, but the actual reason why the pointer is NULL can be somewhere else.

Steps 6 and 7 go beyond what you asked for, since you only asked for the line of code, but I recommend that you be aware of them.

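I hope a similar environment exists for the GCC compiler on your platform. If you do not have a usable MAP file, use the tool chain's tools to get the addresses of the functions. I am sure the ELF file format supports this.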
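The lookup in steps 2 and 3 is simple arithmetic over a table sorted by address. The following is only a sketch of that calculation: the `map_entry` layout and the `find_function` name are assumptions made for this example, not the format of any real MAP file.

```c
#include <stddef.h>
#include <stdint.h>

/* One symbol-table entry, as it might be read from a MAP file.
 * The layout is hypothetical. */
struct map_entry {
    uintptr_t   start;  /* start address of the function */
    const char *name;   /* function name */
};

/* Find the function with the highest start address that is still
 * <= fault_addr. The table must be sorted by ascending start address.
 * On success, stores the offset into the function in *offset (if offset
 * is not NULL) and returns the entry. Returns NULL if fault_addr lies
 * below every entry in the table. */
const struct map_entry *find_function(const struct map_entry *table,
                                      size_t count,
                                      uintptr_t fault_addr,
                                      uintptr_t *offset)
{
    const struct map_entry *best = NULL;
    size_t lo = 0, hi = count;

    /* Binary search for the last entry whose start is <= fault_addr. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].start <= fault_addr) {
            best = &table[mid];
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (best != NULL && offset != NULL)
        *offset = fault_addr - best->start;
    return best;
}
```

On a GCC tool chain, `nm -n <binary>` prints symbols sorted by address and could serve as the input table, and `addr2line -e <binary> <address>` can map an address straight to a file and line when the binary was built with debug information.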

This is a crude way to find the exact line after which the segmentation fault occurs.

Define a line-logging function

#include <iostream>

void log(int line) { std::cout << line << std::endl; }

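Find and replace every semicolon that comes after the log function's definition with "; log(__LINE__);". Make sure the semicolons inside for (;;) loop headers are not replaced: remove any log calls that the replacement inserted there.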
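As an illustration only, this is roughly what a small function could look like after the find-and-replace described above, written in C. The names `log_line` and `count_positive` are invented for the example; `log_line` is used instead of `log` because in C that name clashes with the math library's `log`.

```c
#include <stddef.h>
#include <stdio.h>

/* Print the line number right away. stderr is not fully buffered, so the
 * output is not held back in a buffer if the program crashes afterwards. */
void log_line(int line)
{
    fprintf(stderr, "%d\n", line);
}

/* Hypothetical function after the find-and-replace. The semicolons inside
 * the for (...) header are left untouched; a log call there would not
 * compile. */
int count_positive(const int *values, size_t n)
{
    int count = 0; log_line(__LINE__);
    for (size_t i = 0; i < n; i++) {
        if (values[i] > 0) {
            count++; log_line(__LINE__);
        }
    }
    return count;
}
```

The last number printed before the crash identifies the last statement that completed, so the faulting statement is the next one to execute after it.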

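The answers above are all correct and recommended; this answer is intended only as a last resort, for when none of the approaches mentioned earlier can be used.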

If all else fails, you can always recompile your program with various temporary debug-print statements (e.g. fprintf(stderr, "CHECKPOINT REACHED @ %s:%i\n", __FILE__, __LINE__);) sprinkled throughout what you believe to be the relevant parts of your code. Then run the program and observe which debug-print was the last one printed before the crash occurred. You know your program got that far, so the crash must have happened after that point. Add or remove debug-prints, recompile, and run the test again, until you have narrowed it down to a single line of code. At that point you can fix the bug and remove all of the temporary debug-prints.

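This is tedious, but it has the advantage of working just about anywhere. The only times it might not work are when you have no access to stdout or stderr for some reason, or when the bug you are trying to fix is a race condition whose behavior changes with the program's timing (since the debug-prints slow the program down and change its timing).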
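A minimal sketch of the checkpoint technique, wrapping the fprintf call quoted above in a macro so it is easy to add and remove. The macro name `CHECKPOINT` and the function `string_length` are made up for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Temporary debug print; remove it once the bug has been found. */
#define CHECKPOINT() \
    fprintf(stderr, "CHECKPOINT REACHED @ %s:%i\n", __FILE__, __LINE__)

/* Hypothetical function under investigation. */
size_t string_length(const char *s)
{
    size_t n = 0;

    CHECKPOINT();           /* printed: the function was entered */
    while (s[n] != '\0')    /* a NULL s would typically crash here ... */
        n++;
    CHECKPOINT();           /* ... so this checkpoint would never print */
    return n;
}
```

If only the first checkpoint shows up before the crash, the fault lies between the two, and further checkpoints can be added inside that range to narrow it down.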