为什么GCC中的C预处理器将单词linux(小字母)解释为常数1?
test.c:
#include <stdio.h>
int main(void)
{
int linux = 5;
return 0;
}
$ gcc -E test.c的结果(预处理阶段结束后停止):
....
int main(void)
{
int 1 = 5;
return 0;
}
这当然会产生一个错误。
(顺便说一下:在stdio.h文件中没有#define linux。)
In the Old Days (pre-ANSI), predefining symbols such as unix and vax was a way to allow code to detect at compile time what system it was being compiled for. There was no official language standard back then (beyond the reference material at the back of the first edition of K&R), and C code of any complexity was typically a complex maze of #ifdefs to allow for differences between systems. These macro definitions were generally set by the compiler itself, not defined in a library header file. Since there were no real rules about which identifiers could be used by the implementation and which were reserved for programmers, compiler writers felt free to use simple names like unix and assumed that programmers would simply avoid using those names for their own purposes.
1989年的ANSI C标准引入了限制实现可以合法预定义的符号的规则。编译器预定义的宏的名称只能以两个下划线开头,或者下划线后面跟着一个大写字母,这使得程序员可以自由地使用与该模式不匹配的标识符,并且不在标准库中使用。
因此,任何预定义unix或linux的编译器都是不符合规范的,因为它将无法编译使用int linux = 5;之类的完全合法的代码。
碰巧的是,gcc在默认情况下是不一致的——但它可以与正确的命令行选项保持一致(相当好):
gcc -std=c90 -pedantic ... # or -std=c89 or -ansi
gcc -std=c99 -pedantic
gcc -std=c11 -pedantic
更多细节请参阅gcc手册。
GCC将在未来的版本中逐步淘汰这些定义,因此您不应该编写依赖于它们的代码。如果你的程序需要知道它是否正在为Linux目标编译,它可以检查是否定义了__linux__(假设你使用的是gcc或与之兼容的编译器)。有关更多信息,请参阅GNU C预处理器手册。
一个无关紧要的题外话:1987年国际混淆C代码大赛的“最佳一行代码”得主,由David Korn(是的,Korn Shell的作者)利用了预定义的unix宏:
main() { printf(&unix["\021%six\012\0"],(unix)["have"]+"fun"-0x60);}
它会打印“unix”,但原因与宏名的拼写完全没有关系。
我不想在这里发布任何剧透,我鼓励阅读这篇文章的人先试着自己理解代码。但如果你真的想放弃,我在这里发布了一个解释:
https://gist.github.com/Keith-S-Thompson/6920347
从info gcc(强调我的):
-ansi
In C mode, this is equivalent to -std=c90. In C++ mode, it is
equivalent to -std=c++98.
This turns off certain features of GCC that are incompatible with
ISO C90 (when compiling C code), or of standard C++ (when compiling
C++ code), such as the asm and typeof keywords, and predefined
macros such as 'unix' and 'vax' that identify the type of system
you are using. It also enables the undesirable and rarely used ISO
trigraph feature. For the C compiler, it disables recognition of
C++ style // comments as well as the inline keyword.
(它在示例中使用vax而不是linux,因为它在编写时可能更受欢迎;-)。
其基本思想是,当GCC使用-ansi选项调用时,它只试图完全符合ISO标准。
因为linux是一个内置宏,当编译器运行在linux上或为linux编译(如果它是交叉编译器)时定义。
有很多这样的预定义宏。使用GCC,您可以使用:
cp /dev/null emptyfile.c
gcc -E -dM emptyfile.c
获取宏的列表。(我还没有设法说服GCC直接接受/dev/null,但是
空文件似乎工作正常。)在Mac OS X 10.8.5上运行GCC 4.8.1,我得到了输出:
#define __DBL_MIN_EXP__ (-1021)
#define __UINT_LEAST16_MAX__ 65535
#define __ATOMIC_ACQUIRE 2
#define __FLT_MIN__ 1.17549435082228750797e-38F
#define __UINT_LEAST8_TYPE__ unsigned char
#define __INTMAX_C(c) c ## L
#define __CHAR_BIT__ 8
#define __UINT8_MAX__ 255
#define __WINT_MAX__ 2147483647
#define __ORDER_LITTLE_ENDIAN__ 1234
#define __SIZE_MAX__ 18446744073709551615UL
#define __WCHAR_MAX__ 2147483647
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1
#define __DBL_DENORM_MIN__ ((double)4.94065645841246544177e-324L)
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1
#define __GCC_ATOMIC_CHAR_LOCK_FREE 2
#define __FLT_EVAL_METHOD__ 0
#define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2
#define __x86_64 1
#define __UINT_FAST64_MAX__ 18446744073709551615ULL
#define __SIG_ATOMIC_TYPE__ int
#define __DBL_MIN_10_EXP__ (-307)
#define __FINITE_MATH_ONLY__ 0
#define __GNUC_PATCHLEVEL__ 1
#define __UINT_FAST8_MAX__ 255
#define __DEC64_MAX_EXP__ 385
#define __INT8_C(c) c
#define __UINT_LEAST64_MAX__ 18446744073709551615ULL
#define __SHRT_MAX__ 32767
#define __LDBL_MAX__ 1.18973149535723176502e+4932L
#define __UINT_LEAST8_MAX__ 255
#define __GCC_ATOMIC_BOOL_LOCK_FREE 2
#define __APPLE_CC__ 1
#define __UINTMAX_TYPE__ long unsigned int
#define __DEC32_EPSILON__ 1E-6DF
#define __UINT32_MAX__ 4294967295U
#define __LDBL_MAX_EXP__ 16384
#define __WINT_MIN__ (-__WINT_MAX__ - 1)
#define __SCHAR_MAX__ 127
#define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1)
#define __INT64_C(c) c ## LL
#define __DBL_DIG__ 15
#define __GCC_ATOMIC_POINTER_LOCK_FREE 2
#define __SIZEOF_INT__ 4
#define __SIZEOF_POINTER__ 8
#define __USER_LABEL_PREFIX__ _
#define __STDC_HOSTED__ 1
#define __LDBL_HAS_INFINITY__ 1
#define __FLT_EPSILON__ 1.19209289550781250000e-7F
#define __LDBL_MIN__ 3.36210314311209350626e-4932L
#define __DEC32_MAX__ 9.999999E96DF
#define __strong
#define __INT32_MAX__ 2147483647
#define __SIZEOF_LONG__ 8
#define __APPLE__ 1
#define __UINT16_C(c) c
#define __DECIMAL_DIG__ 21
#define __LDBL_HAS_QUIET_NAN__ 1
#define __DYNAMIC__ 1
#define __GNUC__ 4
#define __MMX__ 1
#define __FLT_HAS_DENORM__ 1
#define __SIZEOF_LONG_DOUBLE__ 16
#define __BIGGEST_ALIGNMENT__ 16
#define __DBL_MAX__ ((double)1.79769313486231570815e+308L)
#define __INT_FAST32_MAX__ 2147483647
#define __DBL_HAS_INFINITY__ 1
#define __DEC32_MIN_EXP__ (-94)
#define __INT_FAST16_TYPE__ short int
#define __LDBL_HAS_DENORM__ 1
#define __DEC128_MAX__ 9.999999999999999999999999999999999E6144DL
#define __INT_LEAST32_MAX__ 2147483647
#define __DEC32_MIN__ 1E-95DF
#define __weak
#define __DBL_MAX_EXP__ 1024
#define __DEC128_EPSILON__ 1E-33DL
#define __SSE2_MATH__ 1
#define __ATOMIC_HLE_RELEASE 131072
#define __PTRDIFF_MAX__ 9223372036854775807L
#define __amd64 1
#define __tune_core2__ 1
#define __ATOMIC_HLE_ACQUIRE 65536
#define __LONG_LONG_MAX__ 9223372036854775807LL
#define __SIZEOF_SIZE_T__ 8
#define __SIZEOF_WINT_T__ 4
#define __GXX_ABI_VERSION 1002
#define __FLT_MIN_EXP__ (-125)
#define __INT_FAST64_TYPE__ long long int
#define __DBL_MIN__ ((double)2.22507385850720138309e-308L)
#define __LP64__ 1
#define __DEC128_MIN__ 1E-6143DL
#define __REGISTER_PREFIX__
#define __UINT16_MAX__ 65535
#define __DBL_HAS_DENORM__ 1
#define __UINT8_TYPE__ unsigned char
#define __NO_INLINE__ 1
#define __FLT_MANT_DIG__ 24
#define __VERSION__ "4.8.1"
#define __UINT64_C(c) c ## ULL
#define __GCC_ATOMIC_INT_LOCK_FREE 2
#define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __INT32_C(c) c
#define __DEC64_EPSILON__ 1E-15DD
#define __ORDER_PDP_ENDIAN__ 3412
#define __DEC128_MIN_EXP__ (-6142)
#define __INT_FAST32_TYPE__ int
#define __UINT_LEAST16_TYPE__ short unsigned int
#define __INT16_MAX__ 32767
#define __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ 1080
#define __SIZE_TYPE__ long unsigned int
#define __UINT64_MAX__ 18446744073709551615ULL
#define __INT8_TYPE__ signed char
#define __FLT_RADIX__ 2
#define __INT_LEAST16_TYPE__ short int
#define __LDBL_EPSILON__ 1.08420217248550443401e-19L
#define __UINTMAX_C(c) c ## UL
#define __SSE_MATH__ 1
#define __k8 1
#define __SIG_ATOMIC_MAX__ 2147483647
#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __SIZEOF_PTRDIFF_T__ 8
#define __x86_64__ 1
#define __DEC32_SUBNORMAL_MIN__ 0.000001E-95DF
#define __INT_FAST16_MAX__ 32767
#define __UINT_FAST32_MAX__ 4294967295U
#define __UINT_LEAST64_TYPE__ long long unsigned int
#define __FLT_HAS_QUIET_NAN__ 1
#define __FLT_MAX_10_EXP__ 38
#define __LONG_MAX__ 9223372036854775807L
#define __DEC128_SUBNORMAL_MIN__ 0.000000000000000000000000000000001E-6143DL
#define __FLT_HAS_INFINITY__ 1
#define __UINT_FAST16_TYPE__ short unsigned int
#define __DEC64_MAX__ 9.999999999999999E384DD
#define __CHAR16_TYPE__ short unsigned int
#define __PRAGMA_REDEFINE_EXTNAME 1
#define __INT_LEAST16_MAX__ 32767
#define __DEC64_MANT_DIG__ 16
#define __INT64_MAX__ 9223372036854775807LL
#define __UINT_LEAST32_MAX__ 4294967295U
#define __GCC_ATOMIC_LONG_LOCK_FREE 2
#define __INT_LEAST64_TYPE__ long long int
#define __INT16_TYPE__ short int
#define __INT_LEAST8_TYPE__ signed char
#define __DEC32_MAX_EXP__ 97
#define __INT_FAST8_MAX__ 127
#define __INTPTR_MAX__ 9223372036854775807L
#define __LITTLE_ENDIAN__ 1
#define __SSE2__ 1
#define __LDBL_MANT_DIG__ 64
#define __CONSTANT_CFSTRINGS__ 1
#define __DBL_HAS_QUIET_NAN__ 1
#define __SIG_ATOMIC_MIN__ (-__SIG_ATOMIC_MAX__ - 1)
#define __code_model_small__ 1
#define __k8__ 1
#define __INTPTR_TYPE__ long int
#define __UINT16_TYPE__ short unsigned int
#define __WCHAR_TYPE__ int
#define __SIZEOF_FLOAT__ 4
#define __pic__ 2
#define __UINTPTR_MAX__ 18446744073709551615UL
#define __DEC64_MIN_EXP__ (-382)
#define __INT_FAST64_MAX__ 9223372036854775807LL
#define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1
#define __FLT_DIG__ 6
#define __UINT_FAST64_TYPE__ long long unsigned int
#define __INT_MAX__ 2147483647
#define __MACH__ 1
#define __amd64__ 1
#define __INT64_TYPE__ long long int
#define __FLT_MAX_EXP__ 128
#define __ORDER_BIG_ENDIAN__ 4321
#define __DBL_MANT_DIG__ 53
#define __INT_LEAST64_MAX__ 9223372036854775807LL
#define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2
#define __DEC64_MIN__ 1E-383DD
#define __WINT_TYPE__ int
#define __UINT_LEAST32_TYPE__ unsigned int
#define __SIZEOF_SHORT__ 2
#define __SSE__ 1
#define __LDBL_MIN_EXP__ (-16381)
#define __INT_LEAST8_MAX__ 127
#define __SIZEOF_INT128__ 16
#define __LDBL_MAX_10_EXP__ 4932
#define __ATOMIC_RELAXED 0
#define __DBL_EPSILON__ ((double)2.22044604925031308085e-16L)
#define _LP64 1
#define __UINT8_C(c) c
#define __INT_LEAST32_TYPE__ int
#define __SIZEOF_WCHAR_T__ 4
#define __UINT64_TYPE__ long long unsigned int
#define __INT_FAST8_TYPE__ signed char
#define __DBL_DECIMAL_DIG__ 17
#define __FXSR__ 1
#define __DEC_EVAL_METHOD__ 2
#define __UINT32_C(c) c ## U
#define __INTMAX_MAX__ 9223372036854775807L
#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __FLT_DENORM_MIN__ 1.40129846432481707092e-45F
#define __INT8_MAX__ 127
#define __PIC__ 2
#define __UINT_FAST32_TYPE__ unsigned int
#define __CHAR32_TYPE__ unsigned int
#define __FLT_MAX__ 3.40282346638528859812e+38F
#define __INT32_TYPE__ int
#define __SIZEOF_DOUBLE__ 8
#define __FLT_MIN_10_EXP__ (-37)
#define __INTMAX_TYPE__ long int
#define __DEC128_MAX_EXP__ 6145
#define __ATOMIC_CONSUME 1
#define __GNUC_MINOR__ 8
#define __UINTMAX_MAX__ 18446744073709551615UL
#define __DEC32_MANT_DIG__ 7
#define __DBL_MAX_10_EXP__ 308
#define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L
#define __INT16_C(c) c
#define __STDC__ 1
#define __PTRDIFF_TYPE__ long int
#define __ATOMIC_SEQ_CST 5
#define __UINT32_TYPE__ unsigned int
#define __UINTPTR_TYPE__ long unsigned int
#define __DEC64_SUBNORMAL_MIN__ 0.000000000000001E-383DD
#define __DEC128_MANT_DIG__ 34
#define __LDBL_MIN_10_EXP__ (-4931)
#define __SIZEOF_LONG_LONG__ 8
#define __GCC_ATOMIC_LLONG_LOCK_FREE 2
#define __LDBL_DIG__ 18
#define __FLT_DECIMAL_DIG__ 9
#define __UINT_FAST16_MAX__ 65535
#define __GNUC_GNU_INLINE__ 1
#define __GCC_ATOMIC_SHORT_LOCK_FREE 2
#define __SSE3__ 1
#define __UINT_FAST8_TYPE__ unsigned char
#define __ATOMIC_ACQ_REL 4
#define __ATOMIC_RELEASE 3
这是一个空文件中的236个宏。当我在文件中添加#include <stdio.h>时,定义的宏数量增加到505。这些包括各种平台标识宏。
这似乎是一个(无文件记录)“GNU扩展”:[更正:我终于在文档中找到了一个提及。见下文。)
下面的命令使用-dM选项打印所有预处理器定义;由于输入的“文件”是空的,它精确地显示了预定义的宏。它是在标准的ubuntu安装上使用gcc-4.7.3运行的。可以看到预处理器是标准感知的。总共有243个宏带有-std=gnu99和240个宏带有-std=c99;我过滤了输出的相关性。
$ cpp --std=c89 -dM < /dev/null | grep linux
#define __linux 1
#define __linux__ 1
#define __gnu_linux__ 1
$ cpp --std=gnu89 -dM < /dev/null | grep linux
#define __linux 1
#define __linux__ 1
#define __gnu_linux__ 1
#define linux 1
$ cpp --std=c99 -dM < /dev/null | grep linux
#define __linux 1
#define __linux__ 1
#define __gnu_linux__ 1
$ cpp --std=gnu99 -dM < /dev/null | grep linux
#define __linux 1
#define __linux__ 1
#define __gnu_linux__ 1
#define linux 1
“gnu标准”版本也有#define unix。(使用c11和gnu11产生相同的结果。)
I suppose they had their reasons, but it seems to me to make the default installation of gcc (which compiles C code with -std=gnu89 unless otherwise specified) non-conformant, and -- as in this question -- surprising. Polluting the global namespace with macros whose names don't begin with an underscore is not permitted in a conformant implementation. (6.8.10p2: "Any other predefined macro names shall begin with a leading underscore followed by an uppercase letter or a second
underscore," but, as mentioned in Appendix J.5 (portability issues), such names are often predefined.)
当我最初写这个答案时,我无法在gcc中找到关于这个问题的任何文档,但我最终发现了它,不是在C实现定义的行为中,也不是在C扩展中,而是在cpp手册的3.7.3节中,其中指出:
我们正在逐步淘汰保留名称空间之外的所有预定义宏。你不应该在新程序中使用它们……
In the Old Days (pre-ANSI), predefining symbols such as unix and vax was a way to allow code to detect at compile time what system it was being compiled for. There was no official language standard back then (beyond the reference material at the back of the first edition of K&R), and C code of any complexity was typically a complex maze of #ifdefs to allow for differences between systems. These macro definitions were generally set by the compiler itself, not defined in a library header file. Since there were no real rules about which identifiers could be used by the implementation and which were reserved for programmers, compiler writers felt free to use simple names like unix and assumed that programmers would simply avoid using those names for their own purposes.
1989年的ANSI C标准引入了限制实现可以合法预定义的符号的规则。编译器预定义的宏的名称只能以两个下划线开头,或者下划线后面跟着一个大写字母,这使得程序员可以自由地使用与该模式不匹配的标识符,并且不在标准库中使用。
因此,任何预定义unix或linux的编译器都是不符合规范的,因为它将无法编译使用int linux = 5;之类的完全合法的代码。
碰巧的是,gcc在默认情况下是不一致的——但它可以与正确的命令行选项保持一致(相当好):
gcc -std=c90 -pedantic ... # or -std=c89 or -ansi
gcc -std=c99 -pedantic
gcc -std=c11 -pedantic
更多细节请参阅gcc手册。
GCC将在未来的版本中逐步淘汰这些定义,因此您不应该编写依赖于它们的代码。如果你的程序需要知道它是否正在为Linux目标编译,它可以检查是否定义了__linux__(假设你使用的是gcc或与之兼容的编译器)。有关更多信息,请参阅GNU C预处理器手册。
一个无关紧要的题外话:1987年国际混淆C代码大赛的“最佳一行代码”得主,由David Korn(是的,Korn Shell的作者)利用了预定义的unix宏:
main() { printf(&unix["\021%six\012\0"],(unix)["have"]+"fun"-0x60);}
它会打印“unix”,但原因与宏名的拼写完全没有关系。
我不想在这里发布任何剧透,我鼓励阅读这篇文章的人先试着自己理解代码。但如果你真的想放弃,我在这里发布了一个解释:
https://gist.github.com/Keith-S-Thompson/6920347