What is an application binary interface (ABI)?

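I never clearly understood what an ABI is. Please don't point me to a Wikipedia article. If I could understand it, I wouldn't be here posting such a lengthy post.

This is my mindset about different interfaces:

A TV remote is an interface between the user and the TV. It is an existing entity, but useless (it doesn't provide any functionality) by itself. All the functionality for each of the buttons on the remote is implemented in the television set.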

Interface: It is an "existing entity" layer between the functionality and the consumer of that functionality. An interface by itself doesn't do anything. It just invokes the functionality lying behind.

Now, depending on who the user is, there are different types of interfaces.

Command Line Interface (CLI): commands are the existing entities, the consumer is the user, and the functionality lies behind.

functionality: my software functionality which solves some purpose to which we are describing this interface.
existing entities: commands
consumer: user

Graphical User Interface (GUI): windows, buttons, etc. are the existing entities, and again the consumer is the user and the functionality lies behind.

functionality: my software functionality which solves some problem to which we are describing this interface.
existing entities: windows, buttons, etc.
consumer: user

Application Programming Interface (API): functions, or to be more correct, interfaces (in interface-based programming) are the existing entities; the consumer here is another program, not a user, and again the functionality lies behind this layer.

functionality: my software functionality which solves some problem to which we are describing this interface.
existing entities: functions, interfaces (array of functions).
consumer: another program/application.

Application Binary Interface (ABI): here is where my problem starts.

functionality: ???
existing entities: ???
consumer: ???

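I have written software in different languages and provided different kinds of interfaces (CLI, GUI, and API), but I'm not sure if I have ever provided any ABI.

Wikipedia says:

ABIs cover details such as data type, size, and alignment; the calling convention, which controls how functions' arguments are passed and return values retrieved; the system call numbers and how an application should make system calls to the operating system. Other ABIs standardize details such as the C++ name mangling, exception propagation, and calling convention between compilers on the same platform, but do not require cross-platform compatibility.

Who needs these details? Please don't say the OS. I know assembly programming. I know how linking and loading work. I know exactly what happens inside. Why did C++ name mangling come in? I thought we were talking at the binary level. Why do languages come in?

Anyway, I've downloaded the [PDF] System V Application Binary Interface Edition 4.1 (1997-03-18) to see what exactly it contains. Well, most of it didn't make any sense.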

Why does it contain two chapters (4th and 5th) to describe the ELF file format? In fact, these are the only two significant chapters of that specification. The rest of the chapters are "processor specific". Anyway, I thought that it was a completely different topic. Please don't say that the ELF file format specifications are the ABI. It doesn't qualify as an interface according to the definition.

I know, since we are talking at such a low level, it must be very specific. But I'm not sure how it is "instruction set architecture (ISA)" specific.

Where can I find Microsoft Windows' ABI?

So, these are the major queries that are bugging me.

Current answer

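The term ABI is used to refer to two distinct but related concepts.

When talking about compilers, it refers to the rules used to translate from source-level constructs to binary constructs. How big are the data types? How does the stack work? How are parameters passed to functions? Which registers should be saved by the caller and which by the callee?

When talking about libraries, it refers to the binary interface presented by a compiled library. This interface is the result of a number of factors, including the source code of the library, the rules used by the compiler, and in some cases definitions picked up from other libraries.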
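To make the compiler-side rules a little more concrete, here is a minimal C sketch that simply asks the compiler which layout it chose for a small struct. The struct `sample` and the helper names are made up for illustration; the actual numbers depend on the platform ABI the compiler targets, so none are hard-coded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct, used only to illustrate layout rules. */
struct sample {
    char tag;    /* one byte */
    int  value;  /* usually preceded by padding so that it is aligned */
};

/* Offset of `value` inside the struct, as decided by the ABI in use. */
size_t sample_value_offset(void) {
    return offsetof(struct sample, value);
}

/* Total size of the struct, including any padding the ABI requires. */
size_t sample_size(void) {
    return sizeof(struct sample);
}

/* Sanity check that holds on any conforming implementation. */
void check_sample_layout(void) {
    assert(sample_value_offset() + sizeof(int) <= sample_size());
}
```

On an ABI where int is 4 bytes with 4-byte alignment, the offset comes out as 4 and the size as 8. Two compilers that disagree on these numbers cannot safely pass a `struct sample` to each other's code.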

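Changes to a library can break the ABI without breaking the API. Consider, for example, a library with an interface like this.

void initfoo(FOO * foo);
int usefoo(FOO * foo, int bar);
void cleanupfoo(FOO * foo);

and the application programmer writes code like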

int dostuffwithfoo(int bar) {
  FOO foo;
  initfoo(&foo);
  int result = usefoo(&foo, bar);
  cleanupfoo(&foo);
  return result;
}

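The application programmer doesn't care about the size or layout of FOO, but the application binary ends up with a hardcoded size of FOO. If the library programmer adds an extra field to FOO and someone uses the new library binary with the old application binary, then the library may make out-of-bounds memory accesses.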
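The size mismatch described above can be sketched in a single file by writing both layouts side by side. `FOO_v1`, `FOO_v2`, the field names, and the helper functions are all hypothetical; in a real break the two definitions would live in different builds of the same header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical FOO layout the old application was compiled against. */
typedef struct { int a; } FOO_v1;

/* Hypothetical FOO layout inside the newer library build. */
typedef struct { int a; int added; } FOO_v2;

/* Bytes the old application reserves on its stack for `FOO foo;`. */
size_t bytes_reserved_by_old_app(void) {
    return sizeof(FOO_v1);
}

/* Bytes the new library assumes are valid behind the FOO pointer. */
size_t bytes_assumed_by_new_library(void) {
    return sizeof(FOO_v2);
}

/* Non-zero when the library would touch memory the application never reserved. */
int new_library_overruns_old_app(void) {
    /* Appending a field can only grow the struct, never shrink it. */
    assert(sizeof(FOO_v2) >= sizeof(FOO_v1));
    return bytes_assumed_by_new_library() > bytes_reserved_by_old_app();
}
```

The source code of the application is identical in both cases, which is why this is an ABI break and not an API break.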

OTOH, if the library author had designed their API like this.

FOO * newfoo(void);
int usefoo(FOO * foo, int bar);
void deletefoo(FOO * foo);

and the application programmer writes code like

int dostuffwithfoo(int bar) {
  FOO * foo;
  foo = newfoo();
  int result = usefoo(foo, bar);
  deletefoo(foo);
  return result;
}

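Then the application binary does not need to know anything about the structure of FOO; that can all be hidden inside the library. The price you pay for that, though, is that heap operations are involved.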
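Here is one way the library side of that second design might look, collapsed into a single file so it can be compiled on its own. The fields of FOO and the value 10 are invented for illustration; the point is only that sizeof(FOO) is evaluated inside the library and never in the application.

```c
#include <assert.h>
#include <stdlib.h>

/* What the public header would expose: an incomplete (opaque) type. */
typedef struct FOO FOO;

/* What only the library's own .c file would see (fields are hypothetical). */
struct FOO {
    int base;
    int added_later;  /* can be added without touching application binaries */
};

FOO *newfoo(void) {
    FOO *foo = malloc(sizeof *foo);  /* the size is decided inside the library */
    if (foo != NULL) {
        foo->base = 10;
        foo->added_later = 0;
    }
    return foo;
}

int usefoo(FOO *foo, int bar) {
    assert(foo != NULL);
    return foo->base + bar;
}

void deletefoo(FOO *foo) {
    free(foo);
}

/* Application side: it only ever holds a pointer, never sizeof(FOO). */
int dostuffwithfoo(int bar) {
    FOO *foo = newfoo();
    int result;
    if (foo == NULL) {
        return -1;  /* allocation failed */
    }
    result = usefoo(foo, bar);
    deletefoo(foo);
    return result;
}
```

With these made-up values, dostuffwithfoo(5) yields 15, and it would keep doing so after further fields are appended to struct FOO and only the library is rebuilt.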

2018-05-22 13:50:18

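Other answers

You actually don't need an ABI at all if:

Your program doesn't have functions, and
Your program is a single executable that is running alone (i.e. an embedded system), where it is literally the only thing running and it doesn't need to talk to anything else.

An oversimplified summary:

API: "Here are all the functions you may call."
ABI: "This is how to call a function."

The ABI is a set of rules that compilers and linkers adhere to in order to compile your program so it will work properly. ABIs cover multiple topics: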

Arguably the biggest and most important part of an ABI is the procedure call standard, sometimes known as the "calling convention". Calling conventions standardize how "functions" are translated to assembly code.

ABIs also dictate how the names of exposed functions in libraries should be represented so that other code can call those libraries and know what arguments should be passed. This is called "name mangling".

ABIs also dictate which data types can be used, how they must be aligned, and other low-level details.

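Taking a deeper look at the calling convention, which I consider to be the core of an ABI:

The machine itself has no concept of "functions". When you write a function in a high-level language like C, the compiler generates a line of assembly code like _MyFunction1:. This is a label, which will eventually get resolved into an address by the assembler. This label marks the "start" of your "function" in the assembly code. In high-level code, when you "call" that function, what you're really doing is causing the CPU to jump to the address of that label and continue executing there.

In preparation for the jump, the compiler must do a bunch of important stuff. The calling convention is like a checklist that the compiler follows to do all this stuff: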

First, the compiler inserts a little bit of assembly code to save the current address, so that when your "function" is done, the CPU can jump back to the right place and continue executing.

Next, the compiler generates assembly code to pass the arguments. Some calling conventions dictate that arguments should be put on the stack (in a particular order, of course). Other conventions dictate that the arguments should be put in particular registers (depending on their data types, of course). Still other conventions dictate that a specific combination of stack and registers should be used.

Of course, if there was anything important in those registers before, those values are now overwritten and lost forever, so some calling conventions may dictate that the compiler should save some of those registers prior to putting the arguments in them.

Now the compiler inserts a jump instruction telling the CPU to go to that label it made previously (_MyFunction1:). At this point, you can consider the CPU to be "in" your "function".

At the end of the function, the compiler puts some assembly code that will make the CPU write the return value in the correct place. The calling convention will dictate whether the return value should be put into a particular register (depending on its type) or on the stack.

Now it's time for clean-up. The calling convention will dictate where the compiler places the cleanup assembly code. Some conventions say that the caller must clean up the stack. This means that after the "function" is done and the CPU jumps back to where it was before, the very next code to be executed should be some very specific cleanup code. Other conventions say that some particular parts of the cleanup code should be at the end of the "function" before the jump back.
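The checklist above is invisible in C source, but a call through a function pointer at least makes the "jump to an address" part explicit. The sketch below is only an illustration: `add3` and `call_via_address` are made-up names, and the register names in the comments assume the System V x86-64 convention; other ABIs place the same values elsewhere.

```c
#include <assert.h>

/*
 * A function is just code at an address. Under the System V x86-64 calling
 * convention (an assumption; other ABIs differ), the three int arguments
 * arrive in edi, esi and edx, and the result goes back in eax.
 */
int add3(int a, int b, int c) {
    return a + b + c;
}

/* Type of a pointer to any function with add3's signature. */
typedef int (*add3_fn)(int, int, int);

/*
 * Calling through a pointer makes the jump explicit: the compiler places the
 * arguments where the convention says, transfers control to the address held
 * in `fn`, then reads the result from the agreed location.
 */
int call_via_address(add3_fn fn, int a, int b, int c) {
    assert(fn != 0);
    return fn(a, b, c);
}
```

If the caller and the callee were compiled with different conventions, the call would still jump to the right address, but the callee would look for its arguments in the wrong places.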

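There are many different ABIs / calling conventions. Some of the main ones are:

x86 or x86-64 CPU (32-bit environment): CDECL, STDCALL, FASTCALL, VECTORCALL, THISCALL
x86-64 CPU (64-bit environment): SYSTEMV, MSNATIVE, VECTORCALL
ARM CPU (32-bit): AAPCS
ARM CPU (64-bit): AAPCS64

Here is a great page that actually shows the differences in the assembly generated when compiling for different ABIs.

Another thing to mention is that an ABI isn't only relevant inside your program's executable module. It is also used by the linker to make sure your program calls library functions correctly. You have multiple shared libraries running on your computer, and as long as your compiler knows which ABI each of them uses, it can call functions from them properly without blowing up the stack.

Your compiler understanding how to call library functions is extremely important. On a hosted platform (that is, one where an OS loads programs), your program can't even blink without making a kernel call.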

2016-12-30 20:40:46

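Linux shared library minimal runnable ABI example

In the context of shared libraries, the most important implication of "having a stable ABI" is that you don't need to recompile your programs after the library changes.

For example:

If you are selling a shared library, you save your users the annoyance of recompiling everything that depends on your library for every new release.

If you are selling a closed-source program that depends on a shared library present in the user's distribution, you can release and test fewer prebuilt binaries if you are certain that the ABI is stable across certain versions of the target OS. This is especially important in the case of the C standard library, which many programs on your system link to.

And now I want to provide a minimal concrete runnable example of this.

main.c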

#include <assert.h>
#include <stdlib.h>

#include "mylib.h"

int main(void) {
    mylib_mystruct *myobject = mylib_init(1);
    assert(myobject->old_field == 1);
    free(myobject);
    return EXIT_SUCCESS;
}

mylib.c

#include <stdlib.h>

#include "mylib.h"

mylib_mystruct* mylib_init(int old_field) {
    mylib_mystruct *myobject;
    myobject = malloc(sizeof(mylib_mystruct));
    myobject->old_field = old_field;
    return myobject;
}

mylib.h

#ifndef MYLIB_H
#define MYLIB_H

typedef struct {
    int old_field;
} mylib_mystruct;

mylib_mystruct* mylib_init(int old_field);

#endif

Compiles and runs fine with:

cc='gcc -pedantic-errors -std=c89 -Wall -Wextra'
$cc -fPIC -c -o mylib.o mylib.c
$cc -L . -shared -o libmylib.so mylib.o
$cc -L . -o main.out main.c -lmylib
LD_LIBRARY_PATH=. ./main.out

Now, suppose that for v2 of the library, we want to add a new field to mylib_mystruct called new_field.

If we add the field before old_field, as in:

typedef struct {
    int new_field;
    int old_field;
} mylib_mystruct;

and rebuild the library but not main.out, then the assert fails!

This is because the line:

myobject->old_field == 1

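had generated assembly that accesses the very first int of the struct, which is now new_field instead of the expected old_field.

Therefore this change broke the ABI.

If, however, we add new_field after old_field: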

typedef struct {
    int old_field;
    int new_field;
} mylib_mystruct;

then the old generated assembly still accesses the first int of the struct, and the program still works, because we kept the ABI stable.
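The two outcomes can also be checked without rebuilding anything, by asking the compiler for the offset of old_field in each layout. The three type names below are invented so that all variants can coexist in one file; in the real example they are successive versions of the single type mylib_mystruct.

```c
#include <assert.h>
#include <stddef.h>

/* v1 layout that main.out was compiled against. */
typedef struct { int old_field; } mystruct_v1;

/* v2 with new_field added before old_field (breaks the ABI). */
typedef struct { int new_field; int old_field; } mystruct_v2_before;

/* v2 with new_field added after old_field (old_field stays in place). */
typedef struct { int old_field; int new_field; } mystruct_v2_after;

size_t old_field_offset_v1(void) {
    return offsetof(mystruct_v1, old_field);
}

size_t old_field_offset_v2_before(void) {
    return offsetof(mystruct_v2_before, old_field);
}

size_t old_field_offset_v2_after(void) {
    return offsetof(mystruct_v2_after, old_field);
}

/* Non-zero when code compiled against v1 still finds old_field at `new_offset`. */
int v1_code_still_works(size_t new_offset) {
    assert(old_field_offset_v1() == 0);  /* the first member is always at offset 0 */
    return new_offset == old_field_offset_v1();
}
```

The old main.out has the v1 offset baked into its machine code, so only the layout that preserves that offset keeps working.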

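Here is a fully automated version of this example on GitHub.

Another way to keep this ABI stable would have been to treat mylib_mystruct as an opaque struct and only access its fields through helper functions. This makes it easier to keep the ABI stable, but incurs a performance overhead because we make more function calls.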
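A rough sketch of that opaque-struct variant follows, again collapsed into one file. `mylib_get_old_field`, `mylib_free`, and `roundtrip_old_field` are hypothetical helpers that do not exist in the example above; they only show the shape such an interface could take.

```c
#include <assert.h>
#include <stdlib.h>

/* mylib.h would only declare the type, without its fields. */
typedef struct mylib_mystruct mylib_mystruct;

/* mylib.c keeps the definition private, so fields can be reordered freely. */
struct mylib_mystruct {
    int new_field;
    int old_field;
};

mylib_mystruct *mylib_init(int old_field) {
    mylib_mystruct *myobject = malloc(sizeof *myobject);
    if (myobject != NULL) {
        myobject->new_field = 0;
        myobject->old_field = old_field;
    }
    return myobject;
}

/* Hypothetical accessor: callers ask the library instead of reading memory. */
int mylib_get_old_field(const mylib_mystruct *myobject) {
    assert(myobject != NULL);
    return myobject->old_field;
}

/* Hypothetical destructor, so callers need not know how the object was allocated. */
void mylib_free(mylib_mystruct *myobject) {
    free(myobject);
}

/* What main.c would do: create, read through the accessor, destroy. */
int roundtrip_old_field(int value) {
    mylib_mystruct *myobject = mylib_init(value);
    int result;
    if (myobject == NULL) {
        return -1;  /* allocation failed */
    }
    result = mylib_get_old_field(myobject);
    mylib_free(myobject);
    return result;
}
```

Here new_field sits before old_field, the very change that broke the ABI earlier, yet callers are unaffected because no field offset ever ends up in their binaries.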

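API vs ABI

In the previous example, it is interesting to note that adding new_field before old_field only broke the ABI, not the API.

This means that if we had recompiled our main.c program against the new library, it would have worked regardless.

We would also have broken the API, however, if we had changed, for example, the function signature: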

mylib_mystruct* mylib_init(int old_field, int new_field);

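since in that case, main.c would stop compiling altogether.

Semantic API vs programming API

We can also classify API changes into a third type: semantic changes.

The semantic API is usually a natural-language description of what the API is supposed to do, usually included in the API documentation.

It is therefore possible to break the semantic API without breaking the program build itself.

For example, if we had modified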

myobject->old_field = old_field;

to:

myobject->old_field = old_field + 1;

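then this would have broken neither the programming API nor the ABI, but the semantic API that main.c relies on would break.

There are two ways to programmatically check the contract API:

Test a bunch of corner cases. Easy to do, but you might always miss one.

Formal verification. Harder to do, but it produces a mathematical proof of correctness, essentially unifying documentation and tests into a "human" / machine verifiable form! As long as there isn't a bug in your formal description, of course ;-)

This concept is closely related to the formalization of mathematics itself: https://math.stackexchange.com/questions/53969/what-does-formal-mean/3297537#3297537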
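As a small sketch of the first approach (testing), the check below treats "old_field holds exactly the value passed in" as the documented contract, which is an assumption made for this illustration. The two init variants and `honours_contract` are made-up names; both variants share one signature and one struct layout, so neither the build nor the ABI can tell them apart.

```c
#include <assert.h>

typedef struct { int old_field; } mylib_mystruct;

/* Behaviour as documented: store the argument unchanged. */
void init_as_documented(mylib_mystruct *s, int old_field) {
    s->old_field = old_field;
}

/* Same signature and layout, but the meaning has silently changed. */
void init_with_changed_semantics(mylib_mystruct *s, int old_field) {
    s->old_field = old_field + 1;
}

typedef void (*init_fn)(mylib_mystruct *, int);

/* Corner-case style check: non-zero if `init` honours the documented contract. */
int honours_contract(init_fn init) {
    mylib_mystruct s;
    assert(init != 0);
    init(&s, 1);
    return s.old_field == 1;
}
```

Only a behavioural check like this one catches the change, and only for the inputs it happens to try.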

List of everything that breaks C / C++ shared library ABIs

TODO: find / create the ultimate list:

https://github.com/lvc/abi-compliance-checker automated tool to do the checking
https://community.kde.org/Policies/Binary_Compatibility_Issues_With_C%2B%2B KDE C++ ABI guidelines
https://plan99.net/~mike/writing-shared-libraries.html

Java minimal runnable example

What is binary compatibility in Java?

Tested in Ubuntu 18.10, GCC 8.2.0.

2019-03-03 10:13:20

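Functionality: a set of contracts that affect the compiler, assembly writers, the linker, and the operating system. The contracts specify how functions are laid out, where parameters are passed, how parameters are passed, and how function returns work. These are generally specific to a (processor architecture, operating system) tuple.

Existing entities: parameter layout, function semantics, register allocation. For instance, the ARM architectures have numerous ABIs (APCS, EABI, GNU-EABI, never mind a bunch of historical cases); using a mixed ABI will result in your code simply not working when calling across boundaries.

Consumer: the compiler, assembly writers, the operating system, the CPU-specific architecture.

Who needs these details? The compiler, assembly writers, linkers that do code generation (or alignment requirements), and the operating system (interrupt handling, syscall interface). If you did assembly programming, you were conforming to an ABI!

C++ name mangling is a special case; it is an issue centered on the linker and the dynamic linker. If name mangling is not standardized, then dynamic linking will not work. Henceforth, the C++ ABI is called just that, the C++ ABI. It is not a linker-level issue, but a code generation issue. Once you have a C++ binary, it is not possible to make it compatible with another C++ ABI (name mangling, exception handling) without recompiling from source.

ELF is a file format for the use of a loader and a dynamic linker. ELF is a container format for binary code and data, and as such specifies the ABI of a piece of code. I would not consider ELF to be an ABI in the strict sense, just as PE executables are not an ABI.

All ABIs are instruction set specific. An ARM ABI will not make sense on an MSP430 or x86_64 processor.

Windows has several ABIs; for instance, fastcall and stdcall are two commonly used ABIs. The syscall ABI is different again.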

2010-03-23 06:26:25

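Summary

There are various interpretations and strong opinions about the exact layer that defines an ABI (application binary interface).

In my view, an ABI is a subjective convention of what is considered a given/platform for a specific API. The ABI is the "rest" of the conventions that "will not change" for a specific API, or that will be addressed by the runtime environment: executors, tools, linkers, compilers, JVM, and OS.

Defining an interface: ABI, API

If you want to use a library like joda-time, you must declare a dependency on joda-time-<major>.<minor>.<patch>.jar. The library follows best practices and uses Semantic Versioning. This defines API compatibility at three levels:

Patch: you don't need to change your code at all. The library just fixes some bugs.
Minor: you don't need to change your code, since things were only added (the open/closed principle was respected).
Major: the interface (API) has changed and you might need to change your code.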
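The three levels can be turned into a small compatibility check. This is only a sketch of the rule as described here, not part of any real dependency tool: `version`, `make_version`, and `api_compatible` are made-up names, and the check treats an older minor or patch release as incompatible because the code may rely on something added or fixed later.

```c
#include <assert.h>

/* Hypothetical version triple, mirroring <major>.<minor>.<patch>. */
typedef struct { int major; int minor; int patch; } version;

version make_version(int major, int minor, int patch) {
    version v;
    v.major = major;
    v.minor = minor;
    v.patch = patch;
    return v;
}

/*
 * Non-zero when code written against `built` should keep working unchanged
 * with `installed`: the major version must match, and the installed
 * minor/patch must not be older than what the code was written against.
 */
int api_compatible(version built, version installed) {
    assert(built.major >= 0 && installed.major >= 0);
    if (installed.major != built.major) {
        return 0;  /* major: the API may have changed */
    }
    if (installed.minor != built.minor) {
        return installed.minor > built.minor;  /* minor: additions only */
    }
    return installed.patch >= built.patch;  /* patch: bug fixes only */
}
```

For code written against 1.7.2, this accepts 1.7.3 and 1.8.0 and rejects 2.0.0. Nothing in the triple says which bytecode level or runtime the jar needs, which is the gap the rest of this answer is about.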

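In order for you to use a new major release of the same library, a lot of other conventions still have to be respected:

The binary language used for the libraries (in Java's case, the JVM target version that defines the Java bytecode)
Calling conventions
JVM conventions
Linking conventions
Runtime conventions

All these are defined and managed by the tools we use.

Examples

Java case study

For example, Java standardized all these conventions, not in a tool, but in a formal JVM specification. The specification allowed other vendors to provide a different set of tools that can output compatible libraries.

Java provides two other interesting case studies for ABI: Scala versions and the Dalvik virtual machine.

The Dalvik virtual machine broke the ABI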

The Dalvik VM needs a different type of bytecode than Java bytecode. The Dalvik libraries are obtained by converting the Java bytecode (with the same API) for Dalvik. In this way you get two versions of the same API, the one defined by the original joda-time-1.7.2.jar. We could call them joda-time-1.7.2.jar and joda-time-1.7.2-dalvik.jar. They use different ABIs: one is for the stack-oriented standard Java VMs (Oracle's, IBM's, open Java, or any other), and the second ABI is the one around Dalvik.

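Successive Scala versions are incompatible

Scala doesn't have binary compatibility between minor Scala versions: 2.X. For this reason the same API "io.reactivex" %% "rxscala" % "0.26.5" has three versions (more in the future): for Scala 2.10, 2.11, and 2.12. What changed? I don't know for now, but the binaries are not compatible. Probably the latest versions add things that make the libraries unusable on the old virtual machines, probably things related to linking/naming/parameter conventions.

Successive Java releases are incompatible

Java has problems with the major releases of the JVM too: 4, 5, 6, 7, 8, 9. They offer only backward compatibility. JVM 9 knows how to run code compiled/targeted (javac's -target option) for all other versions, while JVM 4 doesn't know how to run code targeted for JVM 5. All this while you have one joda-library. This incompatibility largely goes unnoticed thanks to different solutions:

Semantic versioning: when libraries target a higher JVM they usually change the major version.
Use JVM 4 as the ABI and you're safe.
Java 9 adds a specification on how you can include bytecode for specific targeted JVMs in the same library.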

Why did I start with the API definition?

API and ABI are just conventions on how you define compatibility. The lower layers are generic with respect to a plethora of high-level semantics. That's why it's easy to make some conventions.

The first kind of conventions are about memory alignment, byte encoding, calling conventions, big- and little-endian encodings, etc. On top of them you get the executable conventions that others described, linking conventions, and intermediate bytecode like the one used by Java or the LLVM IR used by Clang. Third, you get conventions on how to find libraries and how to load them (see Java classloaders).

As you go higher and higher in concepts, you have new conventions that you consider as a given. That's why they didn't make it into semantic versioning. They are implicit or collapsed into the major version. We could amend semantic versioning with <major>-<minor>-<patch>-<platform/ABI>. This is what is actually happening already: the platform is already an rpm, dll, jar (JVM bytecode), war (JVM + web server), apk, 2.11 (specific Scala version), and so on. When you say APK you are already talking about a specific ABI part of your API.
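Byte order is one of those lowest-layer conventions that is easy to show in code. The sketch below uses made-up helper names: it detects the host's convention and writes a 32-bit value in a fixed big-endian order, which is what a file format or wire protocol typically pins down so that both sides agree regardless of their own ABI.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Non-zero when the platform stores the least significant byte first. */
int is_little_endian(void) {
    uint32_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Write `value` in big-endian order, whatever the host's own convention is. */
void store_be32(uint32_t value, unsigned char out[4]) {
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Byte number `index` (0 to 3) of the big-endian encoding of `value`. */
int be32_byte(uint32_t value, int index) {
    unsigned char bytes[4];
    assert(index >= 0 && index < 4);
    store_be32(value, bytes);
    return bytes[index];
}
```

Two programs that exchange raw in-memory integers implicitly depend on sharing this part of the ABI; encoding explicitly, as store_be32 does, removes that dependency.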

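An API can be ported to different ABIs

The top level of an abstraction (the sources written against the highest API) can be recompiled/ported to any other lower-level abstraction.

Let's say I have some sources for rxscala. If the Scala tools change, I can recompile the sources with them. If the JVM changes, I could have automatic conversions from the old machine to the new one without bothering with the high-level concepts. While porting might be difficult, it will help any other client. If a new operating system is created using totally different assembler code, a translator can be created.

APIs ported across languages

There are APIs that are ported to multiple languages, like reactive streams. In general they define mappings to specific languages/platforms. I would argue that the API is the master specification, formally defined in human language or even in a specific programming language. All the other "mappings" are ABI in a sense, or else more API than the usual ABI. The same is happening with REST interfaces.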

2017-04-22 15:39:50
