Dockerfile.1 executes multiple RUN instructions:

FROM busybox
RUN echo This is the A > a
RUN echo This is the B > b
RUN echo This is the C > c

Dockerfile.2 chains them together:

FROM busybox
RUN echo This is the A > a &&\
    echo This is the B > b &&\
    echo This is the C > c

Each RUN creates a layer, so I always assumed that fewer layers is better, and therefore that Dockerfile.2 is better.

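This is obviously true when a RUN removes something added by a previous RUN (e.g. yum install nano && yum clean all), but in cases where every RUN adds something, there are a few points to consider: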

1. Layers are supposed to just add a diff on top of the previous one, so if a later layer does not remove something added in a previous one, there should not be much of a disk-space advantage to either method.
2. Layers are pulled in parallel from Docker Hub, so Dockerfile.1, although probably slightly bigger, would theoretically be downloaded faster.
3. If a 4th command is added (i.e. echo This is the D > d) and the image is rebuilt locally, Dockerfile.1 would build faster thanks to the cache, whereas Dockerfile.2 would have to run all 4 commands again.
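To make point 3 concrete, here is Dockerfile.1 with the 4th command appended. This is a sketch of the expected behavior, assuming the previous build's cache is still present on the same machine: the first three RUN steps should be reported as cached and only the new one executes. Making the same change to Dockerfile.2 means editing its single RUN instruction, which invalidates it entirely.

```dockerfile
FROM busybox
# Unchanged instructions: expected to be cache hits on a local rebuild
RUN echo This is the A > a
RUN echo This is the B > b
RUN echo This is the C > c
# New instruction: the only step that should actually run
RUN echo This is the D > d
```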

So the question is: which is the better way to write a Dockerfile?


Answer

When possible, I always merge commands that create files with the commands that delete those same files into a single RUN line. Each RUN line adds a layer to the image, and that layer is quite literally the set of filesystem changes you could view with docker diff on the temporary container it creates. If you delete a file that was created in a different layer, all the union filesystem does is record the deletion in a new layer; the file still exists in the previous layer and is shipped over the network and stored on disk. So if you download source code, extract it, compile it into a binary, and then delete the tgz and source files at the end, you really want all of this done in a single layer to reduce the image size.
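A minimal sketch of the download, build and delete pattern in a single RUN, assuming a Debian base image and a hypothetical project at a placeholder URL (https://example.com/foo-1.0.tar.gz) that unpacks to foo-1.0/ and builds with a plain make and make install. Because everything happens in one instruction, the tarball, the source tree and the compiler never end up in a committed layer.

```dockerfile
FROM debian:bookworm-slim

# Placeholder source location for a hypothetical project, not a real download
ARG SRC_URL=https://example.com/foo-1.0.tar.gz

# Fetch, extract, build, install and clean up in ONE RUN instruction, so the
# tarball, the source tree and the build toolchain are never committed to a layer.
# Assumes the archive unpacks to /tmp/foo-1.0 and provides an "install" target.
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential ca-certificates curl \
 && curl -fsSL "$SRC_URL" -o /tmp/foo.tgz \
 && tar -xzf /tmp/foo.tgz -C /tmp \
 && make -C /tmp/foo-1.0 \
 && make -C /tmp/foo-1.0 install \
 && rm -rf /tmp/foo.tgz /tmp/foo-1.0 \
 && apt-get purge -y --auto-remove build-essential curl \
 && rm -rf /var/lib/apt/lists/*
```

Splitting this into separate RUN lines would give the same filesystem in the running container, but the deleted tarball, sources and compiler would still be shipped inside the earlier layers.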

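Next, I split layers based on their potential for reuse in other images and on expected cache usage. If I have 4 images that all share the same base image (e.g. debian), I may pull a collection of common utilities needed by most of those images into the first RUN command, so that the other images benefit from the cache.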
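A sketch of that idea, assuming Debian-based images built on the same host: if each image's Dockerfile starts with the same FROM line and a byte-for-byte identical first RUN, the builder can reuse the cached layer instead of rebuilding it for every image. The package list is only illustrative, and the second RUN stands in for whatever is specific to each image.

```dockerfile
FROM debian:bookworm-slim

# Shared prefix: keep this instruction identical in all four Dockerfiles so the
# resulting layer is built once and then reused from the cache by the others.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates curl git \
 && rm -rf /var/lib/apt/lists/*

# Image-specific steps come after the shared prefix (illustrative example)
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 \
 && rm -rf /var/lib/apt/lists/*
```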

Order in the Dockerfile is important when looking at image cache reuse. I look at any components that will update very rarely, possibly only when the base image updates, and put those high up in the Dockerfile. Towards the end of the Dockerfile, I include any commands that run quickly and may change frequently, e.g. adding a user with a host-specific UID or creating folders and changing permissions. If the container includes interpreted code (e.g. JavaScript) that is being actively developed, it gets added as late as possible so that a rebuild only reruns that single change.
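A sketch of that ordering for a hypothetical Node.js service; package.json, package-lock.json, server.js, the APP_UID build argument and the appuser name are all assumptions. It also assumes a .dockerignore that excludes node_modules, so the final COPY does not overwrite the installed dependencies.

```dockerfile
FROM node:20-slim

# Rarely changes: OS packages, typically invalidated only by base image updates
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates \
 && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Changes occasionally: dependencies are reinstalled only when the manifests change
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# Quick steps that may differ per host: a user with a host-specific UID.
# --non-unique permits a UID already in use (the base image has a "node" user with UID 1000).
ARG APP_UID=1000
RUN useradd --non-unique --uid "$APP_UID" --create-home appuser \
 && mkdir -p /app/data \
 && chown appuser /app/data

# Changes most often: actively developed JavaScript goes last, so editing it
# only invalidates this final layer
COPY --chown=appuser . .

USER appuser
CMD ["node", "server.js"]
```

A host-specific build would pass the UID at build time, e.g. docker build --build-arg APP_UID=$(id -u) . and only the steps from the useradd RUN onward would be rebuilt.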

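Within each group of changes, I consolidate as much as possible to minimize the number of layers. So if there are 4 different source code folders, they get placed inside a single folder so they can be added with a single command. Any packages installed from something like apt-get are merged into a single RUN where possible, to minimize the package manager overhead (updating and cleaning up).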
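A sketch of both consolidations, assuming a Debian base and a hypothetical layout in which the four source folders have been moved under a single src/ directory in the build context:

```dockerfile
FROM debian:bookworm-slim

# All packages in one RUN: a single update, install and cleanup, so the
# package lists are fetched once and never committed to a layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      ca-certificates \
      curl \
      git \
      jq \
 && rm -rf /var/lib/apt/lists/*

# The four source folders live under one directory (hypothetical src/),
# so a single COPY adds them all in one layer instead of four.
COPY src/ /opt/app/src/
```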


Update for multi-stage builds:

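I worry much less about reducing image size in the non-final stages of a multi-stage build. When those stages aren't tagged and shipped to other nodes, you can maximize the likelihood of cache reuse by splitting each command into a separate RUN line.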
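A sketch of a build stage written with one command per RUN line, assuming a hypothetical Go module with go.mod, go.sum and a main package at the repository root. Only the final stage is shipped, so the extra layers in the build stage should not add to the pushed image, while each step keeps its own cache entry.

```dockerfile
# Build stage: never tagged or pushed, so one command per RUN is fine and
# lets each step be cached independently.
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go vet ./...
RUN CGO_ENABLED=0 go build -o /out/app .

# Final stage: the only one that is shipped; it receives just the binary.
FROM gcr.io/distroless/static-debian12
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```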

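However, this isn't a perfect solution for squashing layers, since all you copy between stages are files, not the rest of the image metadata such as environment variable settings, entrypoint and command. And when you install packages in a Linux distribution, the libraries and other dependencies may be scattered throughout the filesystem, which makes copying all of the dependencies difficult.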
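A small sketch of the metadata point, with a hypothetical APP_MODE variable and artifact file: COPY --from brings over only files, so anything set with ENV, ENTRYPOINT or CMD in an earlier stage has to be declared again in the final stage.

```dockerfile
FROM debian:bookworm-slim AS build
# This ENV applies to the "build" stage only
ENV APP_MODE=production
RUN mkdir -p /out && echo "built" > /out/artifact.txt

FROM debian:bookworm-slim
# Only the file is copied; APP_MODE from the "build" stage is NOT inherited
COPY --from=build /out/artifact.txt /opt/app/artifact.txt
# Metadata must be redeclared in the final stage
ENV APP_MODE=production
CMD ["cat", "/opt/app/artifact.txt"]
```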

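Because of this, I use multi-stage builds as a replacement for building binaries on my CI/CD server, so that the CI/CD server only needs the tooling to run docker build, without a JDK, Node.js, Go or any other compile tools installed.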
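As a sketch of that setup, assuming a hypothetical Maven project whose build produces target/app.jar: the JDK and Maven live only in the build stage, so the CI/CD server just runs something like docker build -t myapp . (the myapp tag is a placeholder) and needs no Java toolchain of its own.

```dockerfile
# Build stage: the JDK and Maven come from the image, not from the CI/CD server
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /src
COPY pom.xml .
# Resolve dependencies in their own layer so they stay cached across builds
RUN mvn -B dependency:go-offline
COPY src ./src
RUN mvn -B package -DskipTests

# Runtime stage: a JRE only; the jar name depends on the project's pom.xml
FROM eclipse-temurin:17-jre
COPY --from=build /src/target/app.jar /opt/app/app.jar
ENTRYPOINT ["java", "-jar", "/opt/app/app.jar"]
```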

Other answers

The answer above seems to be outdated. The docs note:

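Prior to Docker 17.05, and even more, prior to Docker 1.10, it was important to minimize the number of layers in your image. The following improvements have mitigated this need: […] Docker 17.05 and higher add support for multi-stage builds, which allow you to copy only the artifacts you need into the final image. This allows you to include tools and debug information in your intermediate build stages without increasing the size of the final image.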

And this:

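Notice that this example also artificially compresses two RUN commands together using the Bash && operator, to avoid creating an additional layer in the image. This is failure-prone and hard to maintain.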

It seems the best practice has changed to using multi-stage builds and keeping Dockerfiles readable.

The official documentation lists these best practices (official images must adhere to them):

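Minimize the number of layers. You need to find the balance between the readability (and thus long-term maintainability) of the Dockerfile and minimizing the number of layers it uses. Be strategic and cautious about the number of layers you use.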

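Since Docker 1.10, the COPY, ADD and RUN statements each add a new layer to your image. Be cautious when using these statements. Try to combine commands into a single RUN statement, and separate them only where it's needed for readability.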
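A small illustration of which instructions add filesystem layers, assuming Docker 1.10 or later and a hypothetical app/ directory containing start.sh. After building, docker history on the image should show a non-zero size only for the RUN and COPY steps; LABEL, ENV, EXPOSE and CMD only change image metadata.

```dockerfile
FROM debian:bookworm-slim

# Metadata only: these do not add filesystem layers
LABEL org.opencontainers.image.title="example"
ENV APP_HOME=/opt/app
EXPOSE 8080

# Filesystem layers: each RUN, COPY or ADD adds one, so commands are combined
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates \
 && rm -rf /var/lib/apt/lists/*
COPY app/ /opt/app/

CMD ["/opt/app/start.sh"]
```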

More info: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#minimize-the-number-of-layers

Update: multi-stage builds in Docker 17.05 and later

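With multi-stage builds you can use multiple FROM statements in your Dockerfile. Each FROM statement starts a stage, and each stage can have its own base image. In the final stage you use a minimal base image such as alpine, copy the build artifacts from the previous stages and install the runtime requirements. The output of this final stage is your image, so this is where you need to worry about layers as described earlier.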
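A sketch of that layout, assuming a hypothetical single-file C program hello.c: the compiler exists only in the build stage, while the alpine-based final stage receives the binary plus its runtime requirements (ca-certificates here is purely illustrative).

```dockerfile
# Build stage: full toolchain, discarded after the build
FROM alpine:3.19 AS build
RUN apk add --no-cache build-base
WORKDIR /src
COPY hello.c .
RUN gcc -O2 -o /hello hello.c

# Final stage: minimal base, runtime requirements and the copied artifact only
FROM alpine:3.19
RUN apk add --no-cache ca-certificates
COPY --from=build /hello /usr/local/bin/hello
CMD ["hello"]
```

Both stages use the same alpine release, so the musl-linked binary built in the first stage should find the same C library in the final one.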

As usual, Docker has great documentation on multi-stage builds. Here's a short excerpt:

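With multi-stage builds, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each of them begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don't want in the final image.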

A blog post about this can be found here: https://blog.alexellis.io/mutli-stage-docker-builds/

To answer your questions:

1. Yes, layers are sort of like diffs. I don't think a layer is added if there are absolutely zero changes. The problem is that once you install or download something in layer #2, you cannot remove it in layer #3. Once something is written to a layer, the image size can no longer be decreased by removing it.
2. Although layers can be pulled in parallel, making the download potentially faster, each layer undoubtedly increases the image size, even if it only removes files.
3. Yes, caching is useful if you're updating your Dockerfile, but it works in one direction only. If you have 10 layers and you change layer #6, you'll still have to rebuild everything from layer #6 to #10. So it won't often speed up the build process, but it's guaranteed to unnecessarily increase the size of your image.


Thanks to @Mohan for reminding me to update this answer.

It depends on what you include in your image layers. The key is to share as many layers as possible.

Bad example:

First Dockerfile:
RUN yum install -y big-package && yum install -y package1

Second Dockerfile:
RUN yum install -y big-package && yum install -y package2

Good example:

First Dockerfile:
RUN yum install -y big-package
RUN yum install -y package1

Second Dockerfile:
RUN yum install -y big-package
RUN yum install -y package2

Another suggestion: deleting files is only useful if it happens in the same layer as the add/install action.
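A sketch using the yum install nano && yum clean all example from the question, assuming an almalinux:9 base image (any yum-based image should behave the same way). In the commented-out variant the cleanup runs in a later layer, so the yum cache it deletes is still stored in the layer below; in the combined RUN the cache is never committed.

```dockerfile
FROM almalinux:9

# Avoid: cleanup in a separate RUN. The yum cache is already committed to the
# previous layer, so the second RUN only hides it and saves no space.
# RUN yum install -y nano
# RUN yum clean all

# Prefer: install and clean up in the same layer
RUN yum install -y nano \
 && yum clean all
```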