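I've never clearly understood what an ABI is. Please don't point me to a Wikipedia article. If I could understand it, I wouldn't be here posting such a lengthy post.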

This is my mindset about different interfaces:

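A TV remote is an interface between the user and the TV. It is an existing entity, but useless by itself (it doesn't provide any functionality). All the functionality for each of the buttons on the remote is implemented in the television set.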

Interface: It is an "existing entity" layer between the functionality and the consumer of that functionality. An interface by itself doesn't do anything; it just invokes the functionality lying behind it.

Now, depending on who the user is, there are different types of interfaces.

Command Line Interface (CLI): commands are the existing entities, the consumer is the user, and the functionality lies behind.
functionality: my software functionality, which solves some problem for which we are describing this interface
existing entities: commands
consumer: user

Graphical User Interface (GUI): windows, buttons, etc. are the existing entities, and again the consumer is the user and the functionality lies behind.
functionality: my software functionality, which solves some problem for which we are describing this interface
existing entities: windows, buttons, etc.
consumer: user

Application Programming Interface (API): functions (or, to be more correct, interfaces in interface-based programming) are the existing entities, the consumer here is another program rather than a user, and again the functionality lies behind this layer.
functionality: my software functionality, which solves some problem for which we are describing this interface
existing entities: functions, interfaces (arrays of functions)
consumer: another program/application

Application Binary Interface (ABI): here is where my problem starts.
functionality: ???
existing entities: ???
consumer: ???

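I've written software in different languages and provided different kinds of interfaces (CLI, GUI, and API), but I'm not sure whether I've ever provided an ABI.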

Wikipedia says:

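ABIs cover details such as:
data type, size, and alignment;
the calling convention, which controls how functions' arguments are passed and return values retrieved;
the system call numbers and how an application should make system calls to the operating system;
Other ABIs standardize details such as the C++ name mangling, exception propagation, and calling convention between compilers on the same platform, but do not require cross-platform compatibility.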

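Who needs these details? Please don't say the OS. I know assembly programming. I know how linking and loading work. I know exactly what happens inside. Why did C++ name mangling come in? I thought we were talking at the binary level. Why do languages come in?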

Anyway, I've downloaded the [PDF] System V Application Binary Interface Edition 4.1 (1997-03-18) to see what exactly it contains. Well, most of it didn't make any sense.

Why does it contain two chapters (4th and 5th) to describe the ELF file format? In fact, these are the only two significant chapters of that specification. The rest of the chapters are "processor specific". Anyway, I thought that was a completely different topic. Please don't say that the ELF file format specifications are the ABI. They don't qualify as an interface according to the definition. I know that since we are talking at such a low level, it must be very specific. But I'm not sure how it is "instruction set architecture (ISA)" specific. Where can I find Microsoft Windows' ABI?

These are the major queries that are bugging me.


Answer

The term ABI is used to refer to two distinct but related concepts.

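When talking about compilers, it refers to the rules used to translate from source-level constructs to binary constructs. How big are the data types? How does the stack work? How are parameters passed to functions? Which registers should be saved by the caller and which by the callee?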

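When talking about libraries, it refers to the binary interface presented by a compiled library. This interface is the result of a number of factors, including the source code of the library, the rules used by the compiler and, in some cases, definitions picked up from other libraries.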

Changes to a library can break the ABI without breaking the API. Consider, for example, a library with an interface like this:

void initfoo(FOO *foo);
int usefoo(FOO *foo, int bar);
void cleanupfoo(FOO *foo);

and the application programmer writes code like:

int dostuffwithfoo(int bar) {
  FOO foo;
  initfoo(&foo);
  int result = usefoo(&foo, bar);
  cleanupfoo(&foo);
  return result;
}

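The application programmer doesn't care about the size or layout of FOO, but the application binary ends up with the size of FOO hardcoded into it. If the library programmer adds an extra field to FOO and someone uses the new library binary with the old application binary, then the library may make out-of-bounds memory accesses.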

OTOH, if the library authors had designed their API like this:

FOO *newfoo(void);
int usefoo(FOO *foo, int bar);
void deletefoo(FOO *foo);

and the application programmer writes code like:

int dostuffwithfoo(int bar) {
  FOO * foo;
  foo = newfoo();
  int result = usefoo(foo, bar);
  deletefoo(foo);
  return result;
}

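then the application binary doesn't need to know anything about the structure of FOO; that can all be hidden inside the library. The price you pay for that is that heap operations are involved.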

Other answers

Linux shared library minimal runnable ABI example

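In the context of shared libraries, the most important implication of "having a stable ABI" is that you don't need to recompile your programs after the library changes.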

So, for example:

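If you are selling a shared library, you save your users the annoyance of recompiling everything that depends on your library for every new release.
If you are selling a closed-source program that depends on a shared library present in the user's distribution, you can release and test fewer prebuilts if you are certain that the ABI is stable across certain versions of the target OS. This is especially important in the case of the C standard library, which many, many programs on your system link to.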

Now I want to provide a minimal concrete runnable example of this.

main.c

#include <assert.h>
#include <stdlib.h>

#include "mylib.h"

int main(void) {
    mylib_mystruct *myobject = mylib_init(1);
    assert(myobject->old_field == 1);
    free(myobject);
    return EXIT_SUCCESS;
}

mylib.c

#include <stdlib.h>

#include "mylib.h"

mylib_mystruct* mylib_init(int old_field) {
    mylib_mystruct *myobject;
    myobject = malloc(sizeof(mylib_mystruct));
    myobject->old_field = old_field;
    return myobject;
}

mylib.h

#ifndef MYLIB_H
#define MYLIB_H

typedef struct {
    int old_field;
} mylib_mystruct;

mylib_mystruct* mylib_init(int old_field);

#endif

Compiles and runs fine with:

cc='gcc -pedantic-errors -std=c89 -Wall -Wextra'
$cc -fPIC -c -o mylib.o mylib.c
$cc -L . -shared -o libmylib.so mylib.o
$cc -L . -o main.out main.c -lmylib
LD_LIBRARY_PATH=. ./main.out

Now suppose that for v2 of the library, we want to add a new field called new_field to mylib_mystruct.

If we added the field before old_field, as in:

typedef struct {
    int new_field;
    int old_field;
} mylib_mystruct;

and rebuilt the library but not main.out, then the assert fails!

This is because the line:

myobject->old_field == 1

had generated assembly that accesses the very first int of the struct, which is now new_field instead of the expected old_field.

Therefore, this change broke the ABI.

If, however, we add new_field after old_field:

typedef struct {
    int old_field;
    int new_field;
} mylib_mystruct;

then the old generated assembly still accesses the first int of the struct, and the program still works, because we kept the ABI stable.

Here is a fully automated version of this example on GitHub.

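Another way to keep this ABI stable would have been to treat mylib_mystruct as an opaque struct and only access its fields through helper functions. This makes it easier to keep the ABI stable, but incurs a performance overhead, since we make more function calls.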

API 与 ABI

In the previous example, it is interesting to note that adding new_field before old_field only broke the ABI, but not the API.

What this means is that if we had recompiled our main.c program against the library, it would have worked regardless.

We would also have broken the API, however, if we had changed, for example, the function signature:

mylib_mystruct* mylib_init(int old_field, int new_field);

since in that case, main.c would stop compiling altogether.

Semantic API vs programming API

We can also classify API changes into a third type: semantic changes.

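The semantic API is usually a natural-language description of what the API is supposed to do, usually included in the API documentation.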

It is therefore possible to break the semantic API without breaking the program build itself.

For example, if we had modified

myobject->old_field = old_field;

to:

myobject->old_field = old_field + 1;

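then this would have broken neither the programming API nor the ABI, but it would have broken the semantic API: main.c would still build and link, yet its assert(myobject->old_field == 1) would now fail.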

There are two ways to check the contract API programmatically:

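Test a bunch of corner cases. Easy to do, but you might always miss one.
Formal verification. Harder to do, but produces a mathematical proof of correctness, essentially unifying documentation and tests in a "human"/machine-verifiable manner! As long as there isn't a bug in your formal description, of course ;-) This concept is closely related to the formalization of mathematics itself: https://math.stackexchange.com/questions/53969/what-does-formal-mean/3297537#3297537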

List of everything that breaks C/C++ shared library ABIs

TODO: find/create the ultimate list:

https://github.com/lvc/abi-compliance-checker automated tool to check it
https://community.kde.org/Policies/Binary_Compatibility_Issues_With_C%2B%2B KDE C++ ABI guidelines
https://plan99.net/~mike/writing-shared-libraries.html

Java minimal runnable example

What is binary compatibility in Java?

Tested in Ubuntu 18.10, GCC 8.2.0.

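The ABI needs to be consistent between caller and callee to be certain that the call succeeds: stack use, register use, the end-of-routine stack pop. All of these are the most important parts of the ABI.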

In fact, you don't need an ABI at all if:

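your program doesn't have functions, and
your program is a single executable running alone (e.g. an embedded system), where it's literally the only thing running and it doesn't need to talk to anything else.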

An oversimplified summary:

API: "Here are all the functions you may call."
ABI: "This is how to call a function."

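The ABI is a set of rules that compilers and linkers adhere to in order to compile your program so that it will work properly. ABIs cover multiple topics: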

Arguably the biggest and most important part of an ABI is the procedure call standard, sometimes known as the "calling convention". Calling conventions standardize how "functions" are translated to assembly code.
ABIs also dictate how the names of exposed functions in libraries should be represented, so that other code can call those libraries and know what arguments should be passed. This is called "name mangling".
ABIs also dictate which data types can be used, how they must be aligned, and other low-level details.

Taking a deeper look at the calling convention, which I consider to be the core of ABIs:

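The machine itself has no concept of "functions". When you write a function in a high-level language like C, the compiler generates a line of assembly code like _MyFunction1:. This is a label, which will eventually get resolved into an address by the assembler. This label marks the "start" of your "function" in the assembly code. In high-level code, when you "call" that function, what you're really doing is causing the CPU to jump to the address of that label and continue executing there.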

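In preparation for the jump, the compiler must do a bunch of important things. The calling convention is like a checklist that the compiler follows to do all of them: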

1. First, the compiler inserts a little bit of assembly code to save the current address, so that when your "function" is done, the CPU can jump back to the right place and continue executing.
2. Next, the compiler generates assembly code to pass the arguments. Some calling conventions dictate that arguments should be put on the stack (in a particular order, of course). Other conventions dictate that the arguments should be put in particular registers (depending on their data types, of course). Still other conventions dictate that a specific combination of stack and registers should be used.
3. Of course, if there was anything important in those registers before, those values are now overwritten and lost forever, so some calling conventions may dictate that the compiler should save some of those registers before putting the arguments in them.
4. Now the compiler inserts a jump instruction telling the CPU to go to the label it made previously (_MyFunction1:). At this point, you can consider the CPU to be "in" your "function".
5. At the end of the function, the compiler puts some assembly code that makes the CPU write the return value in the correct place. The calling convention dictates whether the return value should be put into a particular register (depending on its type) or on the stack.
6. Now it's time for clean-up. The calling convention dictates where the compiler places the cleanup assembly code. Some conventions say that the caller must clean up the stack. This means that after the "function" is done and the CPU jumps back to where it was before, the very next code to be executed should be some very specific cleanup code. Other conventions say that some particular parts of the cleanup code should be at the end of the "function", before the jump back.

There are many different ABIs/calling conventions. Some main ones are:

x86 or x86-64 CPU (32-bit environment): CDECL, STDCALL, FASTCALL, VECTORCALL, THISCALL
x86-64 CPU (64-bit environment): SYSTEMV, MSNATIVE, VECTORCALL
ARM CPU (32-bit): AAPCS
ARM CPU (64-bit): AAPCS64

Here is a great page that actually shows the differences in the assembly generated when compiling for different ABIs.

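Another thing to mention is that an ABI isn't only relevant inside your program's executable module. It's also used by the linker to make sure your program calls library functions correctly. You have multiple shared libraries running on your computer, and as long as your compiler knows which ABI each of them uses, it can call functions from them properly without blowing up the stack.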

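Your compiler understanding how to call library functions is extremely important. On a hosted platform (that is, one where an OS loads programs), your program can't even blink without making a kernel call.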

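A. Simply put, one thing an ABI has in common with an API is that it is an interface. A reusable program exposes a stable interface (API) that can be used to reuse the program in another program.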

B. However, an ABI is an interface issued for a specific processor platform and a specific language. All compiler vendors wishing to target that platform for that same language have to ensure not only that compiled code in the form of relocatable object code complies with the interface, so that it can link and cross-link with other object code, but also that executables comply with it, so that they can run on the platform at all. So an ABI is a much broader set of specifications/standards than a typical function API. It may include some API objects that the compiler enforces upon the language users, and the compiler vendor has to include support for them in its distributions. Needless to say, the platform vendor is the rightful authority to issue ABIs for its platform. Both compiler vendors and ABIs need to comply with the corresponding language standard (e.g. the ISO standard for C++).

C. Definitions of ABI from a platform vendor:

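"1. The specifications to which an executable must conform in order to execute in a specific execution environment. For example, the Linux ABI for the Arm Architecture.

2. A particular aspect of the specifications to which independently produced relocatable files must conform in order to be statically linkable and executable. For example, the C++ ABI for the Arm Architecture, the Run-time ABI for the Arm Architecture, the C Library ABI for the Arm Architecture."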

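D. For example, the generic C++ ABI for the Itanium architecture has also been issued by a consortium. The extent to which platform vendors' own C++ ABIs comply with it is entirely up to the platform vendors.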

E. As another example, the C++ ABI for the Arm Architecture is here.

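F. As stated earlier, under the hood it is the ABI of a processor architecture that ensures the API between one reusable program and another program that reuses it works on that processor architecture.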

G. That brings us to service-oriented components (e.g. SOAP-based web services). They too require an API to exist between a SOAP-based web service and a client program (which could be an app, a front end or another web service) for the client program to reuse the web service. The API is described in terms of standardized protocols like WSDL (interface description) and SOAP (message format), and it is language-neutral and platform-neutral. It is not targeted at any specific processor platform, and thus it is not "binary" like an ABI. A client program on any platform type, written in any language, can remotely reuse a web service written in any other language and hosted on an entirely different processor platform. This is made possible by the fact that both WSDL and SOAP are text-based (XML) protocols. In the case of RESTful web services, the transport protocol HTTP (also a text-based protocol) itself acts as the API (CRUD methods).

Let me at least answer part of your question, with an example of how the Linux ABI affects system calls and why that is useful.

A system call is a way for a userspace program to ask kernelspace for something. It works by putting the numeric code for the call and its arguments in certain registers and triggering an interrupt (on modern x86-64, a dedicated syscall instruction is used instead). Then a switch to kernelspace occurs, and the kernel looks up the numeric code and the arguments, handles the request, puts the result back into a register and triggers a switch back to userspace. This is needed, for example, when the application wants to allocate memory or open a file (syscalls "brk" and "open").

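Now, the syscalls have short names like "brk" and corresponding opcodes, which are defined in a system-specific header file. As long as these opcodes stay the same, you can run the same compiled userland programs with different, updated kernels without having to recompile. So you have an interface used by precompiled binaries, hence an ABI.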