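I never clearly understood what an ABI is. Please don't point me to a Wikipedia article. If I could understand it, I wouldn't be here posting such a lengthy post.

This is my mindset about different interfaces:

A TV remote is an interface between the user and the TV. It is an existing entity, but useless by itself (it doesn't provide any functionality). All the functionality for each of the buttons on the remote is implemented in the television set.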

Interface: It is an "existing entity" layer between the functionality and the consumer of that functionality. An interface by itself doesn't do anything. It just invokes the functionality lying behind it.

Now, depending on who the user is, there are different types of interfaces.

Command Line Interface (CLI): commands are the existing entities, the consumer is the user, and the functionality lies behind.
functionality: my software functionality which solves some purpose to which we are describing this interface.
existing entities: commands
consumer: user

Graphical User Interface (GUI): windows, buttons, etc. are the existing entities, and again the consumer is the user and the functionality lies behind.
functionality: my software functionality which solves some problem to which we are describing this interface.
existing entities: windows, buttons, etc.
consumer: user

Application Programming Interface (API): functions (or, to be more correct, interfaces in interface-based programming) are the existing entities, the consumer here is another program rather than a user, and again the functionality lies behind this layer.
functionality: my software functionality which solves some problem to which we are describing this interface.
existing entities: functions, interfaces (arrays of functions)
consumer: another program/application

Application Binary Interface (ABI): here is where my problem starts.
functionality: ???
existing entities: ???
consumer: ???

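I've written software in different languages and provided different kinds of interfaces (CLI, GUI, and API), but I'm not sure whether I have ever provided an ABI.

Wikipedia says:

ABIs cover details such as data type, size, and alignment; the calling convention, which controls how functions' arguments are passed and return values retrieved; the system call numbers and how an application should make system calls to the operating system. Other ABIs standardize details such as C++ name mangling, exception propagation, and the calling convention between compilers on the same platform, but do not require cross-platform compatibility.

Who needs these details? Please don't say the OS. I know assembly programming. I know how linking and loading work. I know exactly what happens inside. Why does C++ name mangling come in? I thought we were talking at the binary level. Why do languages come in?

Anyway, I've downloaded the [PDF] System V Application Binary Interface Edition 4.1 (1997-03-18) to see what exactly it contains. Well, most of it didn't make any sense.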

Why does it contain two chapters (4th & 5th) to describe the ELF file format? In fact, these are the only two significant chapters of that specification. The rest of the chapters are "processor specific". Anyway, I thought that it was a completely different topic. Please don't say that the ELF file format specification is the ABI. It doesn't qualify as an interface according to the definition.

I know that, since we are talking at such a low level, it must be very specific. But I'm not sure how it is "instruction set architecture (ISA)" specific.

Where can I find Microsoft Windows' ABI?

These are the main questions that are bugging me.


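Current answer

To call code in shared libraries, or to call code between compilation units, the object file needs to contain labels for the calls. C++ mangles the names of method labels in order to enforce data hiding and to allow overloaded methods. That is why you cannot mix files from different C++ compilers unless they explicitly support the same ABI.

Other answers

The ABI needs to be consistent between caller and callee to be certain that the call succeeds: stack use, register use, and the end-of-routine stack pop. These are the most important parts of the ABI.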

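Application binary interface (ABI)

Functionality:

Translation from the programmer's model to the underlying system's domain: data type, size, and alignment; the calling convention, which controls how functions' arguments are passed and return values retrieved; the system call numbers and how an application should make system calls to the operating system; the high-level language compilers' name mangling scheme, exception propagation, and the calling convention between compilers on the same platform, but without requiring cross-platform compatibility...

Existing entities:

Logical blocks that directly participate in a program's execution: ALU, general-purpose registers, registers for memory/I/O mapping, etc...

Consumer:

Language processors, linker, assembler...

These are needed by whoever has to ensure that build tool-chains work as a whole. If you write one module in assembly language and another in Python, and you want to use an operating system instead of your own boot-loader, then your "application" modules are working across "binary" boundaries and require agreement on such an "interface".

C++ name mangling comes in because object files from different high-level languages might need to be linked into your application. Consider using the GCC standard library to make system calls to a Windows built with Visual C++.

ELF is one possible expectation the linker has of an object file for interpretation, though the JVM might have some other idea.

For a Windows RT Store app, try searching for the ARM ABI if you really wish to make some build tool-chain work together.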

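Let me at least answer a part of your question, with an example of how the Linux ABI affects system calls and why that is useful.

A system call is a way for a userspace program to ask the kernel for something. It works by putting the numeric code for the call and the arguments in certain registers and triggering an interrupt (or, on newer CPUs, executing a dedicated instruction such as syscall on x86-64). Then a switch occurs to kernel space: the kernel looks up the numeric code and the arguments, handles the request, puts the result back into a register, and triggers a switch back to userspace. This is needed, for example, when the application wants to allocate memory or open a file (system calls "brk" and "open").

Now, the system calls have short names such as "brk" and corresponding numeric codes, and these are defined in a system-specific header file. As long as these codes stay the same, you can run the same compiled userland programs with different, updated kernels without having to recompile. So you have an interface used by precompiled binaries, hence an ABI.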
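As a rough illustration of "calling by number", here is a minimal Python sketch that asks the kernel for the process ID through libc's generic syscall() wrapper instead of the named getpid() function. The two numbers in the table are the getpid entries for Linux on x86-64 and AArch64; the helper name getpid_by_number is made up for this example, and the code assumes a 64-bit Linux process (it returns None anywhere else).

```python
import ctypes
import os
import platform
import sys

# getpid's system call number differs per architecture. These two values
# are the Linux entries for x86-64 and AArch64.
SYS_GETPID = {"x86_64": 39, "aarch64": 172}


def getpid_by_number():
    """Invoke getpid via libc's syscall() wrapper, identified only by number.

    Returns None on platforms this sketch does not cover.
    """
    number = SYS_GETPID.get(platform.machine())
    is_64bit_linux = (
        sys.platform.startswith("linux")
        and ctypes.sizeof(ctypes.c_void_p) == 8
    )
    if not is_64bit_linux or number is None:
        return None
    # CDLL(None) exposes the symbols already loaded into this process,
    # which includes the C library.
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(ctypes.c_long(number))


pid = getpid_by_number()
if pid is not None:
    # The number-based call and the named call should agree.
    print(pid == os.getpid())
```

A binary that hard-codes such a number keeps working on newer kernels only because the kernel promises not to renumber its system calls.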

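One easy way to understand "ABI" is to compare it to "API".

You are already familiar with the concept of an API. If you want to use the features of, say, some library or your OS, you program against an API. The API consists of data types/structures, constants, functions, etc. that you can use in your code to access the functionality of that external component.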

An ABI is very similar. Think of it as the compiled version of an API (or as an API on the machine-language level). When you write source code, you access the library through an API. Once the code is compiled, your application accesses the binary data in the library through the ABI. The ABI defines the structures and methods that your compiled application will use to access the external library (just like the API did), only on a lower level. Your API defines the order in which you pass arguments to a function. Your ABI defines the mechanics of how these arguments are passed (registers, stack, etc.). Your API defines which functions are part of your library. Your ABI defines how your code is stored inside the library file, so that any program using your library can locate the desired function and execute it.
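To make the API/ABI split a bit more concrete, here is a small sketch that calls the C library's atof() from Python through ctypes. The signature (a C string in, a double out) is API-level knowledge; the argtypes and restype lines are what ctypes needs in order to follow the ABI when it actually places the argument and fetches the result. The wrapper name parse_double_with_libc is hypothetical, and the code assumes a POSIX system where ctypes.CDLL(None) exposes the C library.

```python
import ctypes
import os


def parse_double_with_libc(text):
    """Call the C library's atof() on `text` and return the result.

    Returns None where this sketch does not apply (non-POSIX systems).
    """
    if os.name != "posix":
        return None
    # Symbols already mapped into this process, including the C library.
    libc = ctypes.CDLL(None)

    # API-level facts: atof takes a C string and returns a double.
    # ctypes uses them to follow the ABI: where to put the pointer
    # argument and where to read a floating-point return value.
    libc.atof.argtypes = [ctypes.c_char_p]
    libc.atof.restype = ctypes.c_double
    return libc.atof(text.encode("ascii"))


print(parse_double_with_libc("2.5"))
```

If the restype line is left out, ctypes assumes the function returns a C int. On x86-64 that typically means reading an integer register instead of the floating-point register that holds the double, so the call "succeeds" but yields a meaningless number.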

ABIs are important when it comes to applications that use external libraries. Libraries are full of code and other resources, but your program has to know how to locate what it needs inside the library file. Your ABI defines how the contents of a library are stored inside the file, and your program uses the ABI to search through the file and find what it needs. If everything in your system conforms to the same ABI, then any program is able to work with any library file, no matter who created them. Linux and Windows use different ABIs, so a Windows program won't know how to access a library compiled for Linux.

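Sometimes, ABI changes are unavoidable. When this happens, any programs that use that library will not work unless they are re-compiled to use the new version of the library. If the ABI changes but the API does not, then the old and new library versions are sometimes called "source compatible". This implies that while a program compiled for one library version will not work with the other, source code written for one will work for the other if re-compiled.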

For this reason, developers tend to try to keep their ABI stable (to minimize disruption). Keeping an ABI stable means not changing function interfaces (return type and number, types, and order of arguments), definitions of data types or data structures, defined constants, etc. New functions and data types can be added, but existing ones must stay the same. If, for instance, your library uses 32-bit integers to indicate the offset of a function and you switch to 64-bit integers, then already-compiled code that uses that library will not be accessing that field (or any following it) correctly. Accessing data structure members gets converted into memory addresses and offsets during compilation and if the data structure changes, then these offsets will not point to what the code is expecting them to point to and the results are unpredictable at best.
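The 32-bit to 64-bit example can be sketched with ctypes structures, which follow the platform's C layout rules. RecordV1 and RecordV2 are hypothetical layouts invented for this illustration; the point is only that the byte offset of flags moves, while code compiled against the old layout keeps using the old offset.

```python
import ctypes
import sys


class RecordV1(ctypes.Structure):
    # Hypothetical layout shipped in version 1 of a library.
    _fields_ = [("offset", ctypes.c_uint32), ("flags", ctypes.c_uint32)]


class RecordV2(ctypes.Structure):
    # Version 2 widens `offset` to 64 bits; the field names are unchanged.
    _fields_ = [("offset", ctypes.c_uint64), ("flags", ctypes.c_uint32)]


# Code compiled against V1 has this byte offset baked in for `flags`.
old_flags_offset = RecordV1.flags.offset
new_flags_offset = RecordV2.flags.offset
print(old_flags_offset, new_flags_offset)  # 4 8

# Simulate a stale binary: the library writes a V2 record, but the old
# code still reads four bytes at the V1 offset.
record = RecordV2(offset=1, flags=0xABCD)
raw = bytes(record)
stale = int.from_bytes(raw[old_flags_offset:old_flags_offset + 4], sys.byteorder)
print(stale == 0xABCD)  # False: the old offset now lands inside `offset`
```

Source code that says record.flags compiles fine against either version, which is why this kind of change is source compatible but not binary compatible.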

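An ABI isn't necessarily something you will explicitly provide unless you are doing very low-level systems design work. It isn't language-specific either, since (for example) a C application and a Pascal application can use the same ABI after they are compiled.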

Edit: Regarding your question about the chapters regarding the ELF file format in the SysV ABI docs: The reason this information is included is because the ELF format defines the interface between operating system and application. When you tell the OS to run a program, it expects the program to be formatted in a certain way and (for example) expects the first section of the binary to be an ELF header containing certain information at specific memory offsets. This is how the application communicates important information about itself to the operating system. If you build a program in a non-ELF binary format (such as a.out or PE), then an OS that expects ELF-formatted applications will not be able to interpret the binary file or run the application. This is one big reason why Windows apps cannot be run directly on a Linux machine (or vice versa) without being either re-compiled or run inside some type of emulation layer that can translate from one binary format to another.
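As a small illustration of "certain information at specific offsets", the sketch below decodes the first few bytes of an ELF header: the four-byte magic, then one byte for the class (32 or 64 bit) and one for the byte order. The function names are made up for this example, and only this tiny part of the header is handled.

```python
import sys

ELF_MAGIC = b"\x7fELF"
EI_CLASS = {1: "ELF32", 2: "ELF64"}                # byte at offset 4
EI_DATA = {1: "little-endian", 2: "big-endian"}    # byte at offset 5


def parse_elf_ident(header):
    """Decode the start of e_ident; return None if this is not an ELF file."""
    if len(header) < 6 or header[:4] != ELF_MAGIC:
        return None
    return {
        "class": EI_CLASS.get(header[4], "unknown"),
        "data": EI_DATA.get(header[5], "unknown"),
    }


def describe_file(path):
    """Read the first 16 bytes of `path` and decode them as an ELF ident."""
    with open(path, "rb") as f:
        return parse_elf_ident(f.read(16))


# On Linux the Python interpreter itself is normally an ELF executable;
# on Windows or macOS this prints None because the format is different.
if sys.executable:
    print(describe_file(sys.executable))
```

A loader does essentially this check first: if the magic bytes are not what it expects, it cannot interpret the rest of the file.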

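IIRC, Windows currently uses the Portable Executable (PE) format. There are links in the "external links" section of that Wikipedia page with more information about the PE format.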

Also, regarding your note about C++ name mangling: When locating a function in a library file, the function is typically looked up by name. C++ allows you to overload function names, so the name alone is not sufficient to identify a function. C++ compilers have their own ways of dealing with this internally, called name mangling. An ABI can define a standard way of encoding the name of a function so that programs built with a different language or compiler can locate what they need. When you use extern "C" in a C++ program, you're instructing the compiler to use a standardized way of recording names that's understandable by other software.
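To show what such an encoding can look like, here is a deliberately tiny sketch of the scheme used by the Itanium C++ ABI (the one GCC and Clang follow on Linux), restricted to global functions with builtin parameter types. Real manglers also handle namespaces, classes, templates, pointers, and much more; the function names mangle and c_symbol are invented for this example.

```python
# One-letter codes the Itanium C++ ABI assigns to some builtin types.
BUILTIN_CODES = {
    "void": "v",
    "bool": "b",
    "char": "c",
    "int": "i",
    "unsigned int": "j",
    "long": "l",
    "float": "f",
    "double": "d",
}


def mangle(name, param_types):
    """Return the symbol for the global function `name(param_types...)`."""
    # An empty parameter list is encoded as a single 'v' (void).
    codes = "".join(BUILTIN_CODES[t] for t in param_types) or "v"
    # "_Z" prefix, then <length><identifier>, then the parameter codes.
    return f"_Z{len(name)}{name}{codes}"


def c_symbol(name):
    """With extern "C" the symbol is just the plain name on ELF platforms."""
    return name


print(mangle("area", ["int", "int"]))  # _Z4areaii
print(mangle("area", ["double"]))      # _Z4aread
print(c_symbol("area"))                # area
```

The two overloads of area get different symbols, which is what lets the linker tell them apart. It is also why extern "C" functions cannot be overloaded: there is only one plain name to go around.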

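You actually don't need an ABI at all if:

- Your program doesn't have functions, and
- Your program is a single executable that is running alone (i.e. an embedded system) where it's literally the only thing running and it doesn't need to talk to anything else.

An oversimplified summary:

API: "Here are all the functions you may call."
ABI: "This is how to call a function."

The ABI is a set of rules that compilers and linkers adhere to in order to compile your program so that it works properly. ABIs cover multiple topics: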

- Arguably the biggest and most important part of an ABI is the procedure call standard, sometimes known as the "calling convention". Calling conventions standardize how "functions" are translated to assembly code.
- ABIs also dictate how the names of exposed functions in libraries should be represented so that other code can call those libraries and know what arguments should be passed. This is called "name mangling".
- ABIs also dictate which data types can be used, how they must be aligned, and other low-level details.
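The data-type point is easy to inspect from Python, because ctypes mirrors the sizes and alignments that the platform's C ABI prescribes. This is only a sketch, with a made-up helper name (layout_table); the numbers it prints depend on the machine it runs on.

```python
import ctypes


def layout_table():
    """Size and alignment, in bytes, of a few C types under this platform's ABI."""
    c_types = {
        "char": ctypes.c_char,
        "int": ctypes.c_int,
        "long": ctypes.c_long,
        "void *": ctypes.c_void_p,
        "double": ctypes.c_double,
    }
    return {
        name: (ctypes.sizeof(t), ctypes.alignment(t))
        for name, t in c_types.items()
    }


for name, (size, align) in layout_table().items():
    print(f"{name:8} size={size} align={align}")
```

For instance, long is typically 8 bytes on 64-bit Linux but 4 bytes on 64-bit Windows, even on the same CPU, because the two platforms chose different ABIs.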

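Taking a deeper look at the calling convention, which I consider to be the core of an ABI:

The machine itself has no concept of "functions". When you write a function in a high-level language like C, the compiler generates a line of assembly code like _MyFunction1:. This is a label, which will eventually get resolved into an address by the assembler. This label marks the "start" of your "function" in the assembly code. In high-level code, when you "call" that function, what you're really doing is causing the CPU to jump to the address of that label and continue executing there.

In preparation for the jump, the compiler must do a bunch of important things. The calling convention is like a checklist that the compiler follows to do all of this: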

1. First, the compiler inserts a little bit of assembly code to save the current address, so that when your "function" is done, the CPU can jump back to the right place and continue executing.
2. Next, the compiler generates assembly code to pass the arguments. Some calling conventions dictate that arguments should be put on the stack (in a particular order, of course). Other conventions dictate that the arguments should be put in particular registers (depending on their data types, of course). Still other conventions dictate that a specific combination of stack and registers should be used.
3. Of course, if there was anything important in those registers before, those values are now overwritten and lost forever, so some calling conventions may dictate that the compiler should save some of those registers prior to putting the arguments in them.
4. Now the compiler inserts a jump instruction telling the CPU to go to that label it made previously (_MyFunction1:). At this point, you can consider the CPU to be "in" your "function".
5. At the end of the function, the compiler puts some assembly code that will make the CPU write the return value in the correct place. The calling convention will dictate whether the return value should be put into a particular register (depending on its type) or on the stack.
6. Now it's time for clean-up. The calling convention will dictate where the compiler places the cleanup assembly code. Some conventions say that the caller must clean up the stack. This means that after the "function" is done and the CPU jumps back to where it was before, the very next code to be executed should be some very specific cleanup code. Other conventions say that some particular parts of the cleanup code should be at the end of the "function", before the jump back.
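The checklist above is invisible in high-level code, but you can watch both sides rely on it by letting compiled code call back into your own function. In this sketch, the C library's qsort() calls a Python comparison function through a function pointer that ctypes builds using the platform's default C calling convention. The names CompareFn, compare_ints, and sort_with_libc are invented for this example, and it assumes a POSIX system where ctypes.CDLL(None) exposes the C library.

```python
import ctypes
import os

# C type: int (*)(const int *, const int *), using the platform's default
# C calling convention.
CompareFn = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)
)


def compare_ints(a, b):
    # Called by compiled code inside qsort; a and b point at array elements.
    return (a[0] > b[0]) - (a[0] < b[0])


def sort_with_libc(values):
    """Sort a list of ints by calling the C library's qsort()."""
    if os.name != "posix":
        return None  # this sketch relies on CDLL(None), which is POSIX-only
    libc = ctypes.CDLL(None)
    libc.qsort.argtypes = [
        ctypes.c_void_p,   # base address of the array
        ctypes.c_size_t,   # number of elements
        ctypes.c_size_t,   # size of one element
        CompareFn,         # comparison callback
    ]
    libc.qsort.restype = None
    array = (ctypes.c_int * len(values))(*values)
    libc.qsort(array, len(array), ctypes.sizeof(ctypes.c_int), CompareFn(compare_ints))
    return list(array)


print(sort_with_libc([5, 1, 7, 33, 99]))
```

Neither qsort nor the callback knows how the other was built. The call works only because both sides place arguments and return values where the shared calling convention says they go.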

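There are many different ABIs / calling conventions. Some of the main ones are:

- For the x86 or x86-64 CPU (32-bit environment): CDECL, STDCALL, FASTCALL, VECTORCALL, THISCALL
- For the x86-64 CPU (64-bit environment): SYSTEMV, MSNATIVE, VECTORCALL
- For the ARM CPU (32-bit): AAPCS
- For the ARM CPU (64-bit): AAPCS64

Here is a great page that actually shows the differences in the assembly generated when compiling for different ABIs.

Another thing to mention is that an ABI isn't only relevant inside your program's executable module. It is also used by the linker to make sure your program calls library functions correctly. You have multiple shared libraries running on your computer, and as long as your compiler knows which ABI each of them uses, it can call functions from them properly without blowing up the stack.

It is very important that your compiler understands how to call library functions. On a hosted platform (that is, one where an OS loads programs), your program can't even blink without making a kernel call.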