在某些情况下,选择静态链接而不是动态链接或反之,是否有令人信服的性能原因?我听过或读过以下内容,但我对这个主题了解不够,无法保证其真实性。
1)静态链接和动态链接在运行时性能上的差异通常可以忽略不计。
如果使用使用概要数据优化程序热路径的分析编译器,2)(1)是不正确的,因为使用静态链接,编译器可以优化你的代码和库代码。使用动态链接,只能优化您的代码。如果大部分时间都花在运行库代码上,这可能会有很大的不同。否则,(1)仍然适用。
在某些情况下,选择静态链接而不是动态链接或反之,是否有令人信服的性能原因?我听过或读过以下内容,但我对这个主题了解不够,无法保证其真实性。
1)静态链接和动态链接在运行时性能上的差异通常可以忽略不计。
如果使用使用概要数据优化程序热路径的分析编译器,2)(1)是不正确的,因为使用静态链接,编译器可以优化你的代码和库代码。使用动态链接,只能优化您的代码。如果大部分时间都花在运行库代码上,这可能会有很大的不同。否则,(1)仍然适用。
当前回答
在大量且越来越多的系统中,极端的静态链接可以对应用程序和系统性能产生巨大的积极影响。
我指的是通常所说的“嵌入式系统”,其中许多现在越来越多地使用通用操作系统,这些系统被用于所有可以想象到的事情。
An extremely common example are devices using GNU/Linux systems using Busybox. I've taken this to the extreme with NetBSD by building a bootable i386 (32-bit) system image that includes both a kernel and its root filesystem, the latter which contains a single static-linked (by crunchgen) binary with hard-links to all programs that itself contains all (well at last count 274) of the standard full-feature system programs (most except the toolchain), and it is less than 20 megabytes in size (and probably runs very comfortably in a system with only 64MB of memory (even with the root filesystem uncompressed and entirely in RAM), though I've been unable to find one so small to test it on).
It has been mentioned in earlier posts that the start-up time of a static-linked binaries is faster (and it can be a lot faster), but that is only part of the picture, especially when all object code is linked into the same file, and even more especially when the operating system supports demand paging of code direct from the executable file. In this ideal scenario the startup time of programs is literally negligible since almost all pages of code will already be in memory and be in use by the shell (and and init any other background processes that might be running), even if the requested program has not ever been run since boot since perhaps only one page of memory need be loaded to fulfill the runtime requirements of the program.
However that's still not the whole story. I also usually build and use the NetBSD operating system installs for my full development systems by static-linking all binaries. Even though this takes a tremendous amount more disk space (~6.6GB total for x86_64 with everything, including toolchain and X11 static-linked) (especially if one keeps full debug symbol tables available for all programs another ~2.5GB), the result still runs faster overall, and for some tasks even uses less memory than a typical dynamic-linked system that purports to share library code pages. Disk is cheap (even fast disk), and memory to cache frequently used disk files is also relatively cheap, but CPU cycles really are not, and paying the ld.so startup cost for every process that starts every time it starts will take hours and hours of CPU cycles away from tasks which require starting many processes, especially when the same programs are used over and over, such as compilers on a development system. Static-linked toolchain programs can reduce whole-OS multi-architecture build times for my systems by hours. I have yet to build the toolchain into my single crunchgen'ed binary, but I suspect when I do there will be more hours of build time saved because of the win for the CPU cache.
其他回答
1)基于这样一个事实,即调用DLL函数总是使用额外的间接跳转。如今,这通常可以忽略不计。在DLL内部,i386 CPU上有一些更多的开销,因为它们不能生成与位置无关的代码。在amd64上,跳转可以相对于程序计数器,因此这是一个巨大的改进。
2)这是正确的。通过分析引导的优化,您通常可以获得大约10- 15%的性能。既然CPU速度已经达到极限,那么这样做可能是值得的。
我想补充的是:(3)链接器可以将函数安排在一个更有效的缓存分组中,从而将昂贵的缓存级别的失败最小化。它还可能特别影响应用程序的启动时间(基于我在Sun c++编译器中看到的结果)
不要忘记,使用dll不能执行死代码消除。根据语言的不同,DLL代码也可能不是最佳的。虚函数总是虚函数,因为编译器不知道客户端是否正在覆盖它。
由于这些原因,如果不需要dll,那么就使用静态编译。
编辑(通过用户下划线回答注释)
这里有一个关于职位独立代码问题的很好的资源http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/
正如所解释的,x86除了15位跳转范围之外没有其他的AFAIK,也没有无条件跳转和调用。这就是为什么函数(来自生成器)超过32K一直是一个问题,需要嵌入蹦床。
But on popular x86 OS like Linux you do not need to care if the .so/DLL file is not generated with the gcc switch -fpic (which enforces the use of the indirect jump tables). Because if you don't, the code is just fixed like a normal linker would relocate it. But while doing this it makes the code segment non shareable and it would need a full mapping of the code from disk into memory and touching it all before it can be used (emptying most of the caches, hitting TLBs) etc. There was a time when this was considered slow.
这样你就没有任何收益了。
我不记得什么操作系统(Solaris或FreeBSD)给我的Unix构建系统的问题,因为我只是没有这样做,不知道为什么它崩溃,直到我应用-fPIC到gcc。
静态链接是在编译时将链接的内容复制到主二进制文件中并变成单个二进制文件的过程。
缺点:
编译时间更长 输出二进制更大
动态链接是在运行时加载链接内容的过程。这项技术允许:
升级链接二进制文件而不重新编译主二进制文件,增加ABI稳定性[关于] 是否有单个共享副本
缺点:
开始时间较慢(应复制链接内容) 链接器错误在运行时抛出
[iOS静态vs动态框架]
Another consideration is the number of object files (translation units) that you actually consume in a library vs the total number available. If a library is built from many object files, but you only use symbols from a few of them, this might be an argument for favoring static linking, since you only link the objects that you use when you static link (typically) and don't normally carry the unused symbols. If you go with a shared lib, that lib contains all translation units and could be much larger than what you want or need.
动态链接是满足某些许可要求(如LGPL)的唯一实用方法。
在大量且越来越多的系统中,极端的静态链接可以对应用程序和系统性能产生巨大的积极影响。
我指的是通常所说的“嵌入式系统”,其中许多现在越来越多地使用通用操作系统,这些系统被用于所有可以想象到的事情。
An extremely common example are devices using GNU/Linux systems using Busybox. I've taken this to the extreme with NetBSD by building a bootable i386 (32-bit) system image that includes both a kernel and its root filesystem, the latter which contains a single static-linked (by crunchgen) binary with hard-links to all programs that itself contains all (well at last count 274) of the standard full-feature system programs (most except the toolchain), and it is less than 20 megabytes in size (and probably runs very comfortably in a system with only 64MB of memory (even with the root filesystem uncompressed and entirely in RAM), though I've been unable to find one so small to test it on).
It has been mentioned in earlier posts that the start-up time of a static-linked binaries is faster (and it can be a lot faster), but that is only part of the picture, especially when all object code is linked into the same file, and even more especially when the operating system supports demand paging of code direct from the executable file. In this ideal scenario the startup time of programs is literally negligible since almost all pages of code will already be in memory and be in use by the shell (and and init any other background processes that might be running), even if the requested program has not ever been run since boot since perhaps only one page of memory need be loaded to fulfill the runtime requirements of the program.
However that's still not the whole story. I also usually build and use the NetBSD operating system installs for my full development systems by static-linking all binaries. Even though this takes a tremendous amount more disk space (~6.6GB total for x86_64 with everything, including toolchain and X11 static-linked) (especially if one keeps full debug symbol tables available for all programs another ~2.5GB), the result still runs faster overall, and for some tasks even uses less memory than a typical dynamic-linked system that purports to share library code pages. Disk is cheap (even fast disk), and memory to cache frequently used disk files is also relatively cheap, but CPU cycles really are not, and paying the ld.so startup cost for every process that starts every time it starts will take hours and hours of CPU cycles away from tasks which require starting many processes, especially when the same programs are used over and over, such as compilers on a development system. Static-linked toolchain programs can reduce whole-OS multi-architecture build times for my systems by hours. I have yet to build the toolchain into my single crunchgen'ed binary, but I suspect when I do there will be more hours of build time saved because of the win for the CPU cache.