Managing large binary files with Git

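I am looking for opinions on how to handle large binary files on which my source code (a web application) depends. We are currently discussing several alternatives: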

Copy the binary files by hand.
Pro: Not sure.
Contra: I am strongly against this, as it increases the likelihood of errors when setting up a new site or migrating the old one. It builds up another hurdle to take.

Manage them all with Git.
Pro: Removes the possibility to 'forget' to copy an important file.
Contra: Bloats the repository and decreases flexibility to manage the code base; checkouts, clones, etc. will take quite a while.

Separate repositories.
Pro: Checking out/cloning the source code is as fast as ever, and the images are properly archived in their own repository.
Contra: Removes the simplicity of having the one and only Git repository for the project. It surely introduces some other things I haven't thought about.

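What are your experiences/thoughts regarding this?

Also: does anybody have experience with using multiple Git repositories and managing them in one project?

The files are images for a program which generates PDFs containing those files. The files will not change very often (as in years), but they are very relevant to the program. The program will not work without the files.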

Current answer

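Have a look at git bup, which is a Git extension to smartly store large binaries in a Git repository.

You'd want to have it as a submodule, but you won't have to worry about the repository getting hard to handle. One of their sample use cases is storing VM images in Git.

I haven't actually seen better compression rates, but my repositories don't have really large binaries in them.

Your mileage may vary.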

2011-03-21 21:59:54

Other answers

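I would use submodules (as Pat Notz suggests) or two distinct repositories. If you modify your binary files too often, then I would try to minimize the impact of the huge repository by cleaning the history: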

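I had a very similar problem several months ago: ~21 GB of MP3 files, unclassified (bad names, bad ID3 tags, no idea whether I liked a given MP3 file or not...), and replicated on three computers.

I used an external hard disk drive with the main Git repository, and I cloned it onto each computer. Then I started to classify the files in the habitual way (pushing, pulling, merging... deleting and renaming many times).

At the end, I had only ~6 GB of MP3 files and ~83 GB in the .git directory. I used git-write-tree and git-commit-tree to create a new commit without commit ancestors, and started a new branch pointing to that commit. The "git log" for that branch showed only one commit.

Then I deleted the old branch, kept only the new branch, deleted the reflogs, and ran "git prune": after that, my .git folders weighed only ~6 GB...

You could "purge" the huge repository from time to time in the same way: your "git clone"s will be faster.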
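
The purge described above can be sketched roughly as follows. This is only an illustration on a throwaway repository, not the exact commands I ran back then: the file name song.bin, the branch name purged, and the commit messages are made up for the demo. Note that "git prune" only removes loose objects, so a repository whose history is already packed may additionally need "git gc --prune=now". Since this throws history away for good, try it on a copy first.

```shell
set -eu

# Throwaway demo repository (all names below are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
old_branch=$(git symbolic-ref --short HEAD)   # name of the default branch

# Build a little history: three revisions of the same file.
for i in 1 2 3; do
  printf 'version %s\n' "$i" > song.bin
  git add song.bin
  git commit -q -m "revision $i"
done

# 1. Write the current index as a tree and wrap it in a commit with no parents.
tree=$(git write-tree)
new_commit=$(git commit-tree "$tree" -m "Fresh start without history")

# 2. Start a new branch pointing at that parentless commit.
git checkout -q -b purged "$new_commit"

# 3. Drop the old branch, expire all reflog entries, and prune unreachable objects.
git branch -q -D "$old_branch"
git reflog expire --expire=now --all
git prune

# The new branch now has a single commit in its log.
git log --oneline
```

Clones made from the purged repository only transfer that one commit, which is where the speed-up comes from.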

2009-02-12 14:52:57

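I have recently discovered git-annex, which I find awesome. It was designed for managing large files efficiently. I use it for my photo/music (etc.) collections. The development of git-annex is very active. The content of the files can be removed from the Git repository; only the tree hierarchy is tracked by Git (through symlinks). However, to get the content of a file, a second step is necessary after pulling/pushing, e.g.: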

$ git annex add mybigfile
$ git commit -m'add mybigfile'
$ git push myremote
$ git annex copy --to myremote mybigfile ## This command copies the actual content to myremote
$ git annex drop mybigfile ## Remove content from local repo
...
$ git annex get mybigfile ## Retrieve the content
## or to specify the remote from which to get:
$ git annex copy --from myremote mybigfile

There are many commands available, and there is great documentation on the website. A package is available on Debian.

2011-07-09 13:54:28

Git LFS is the answer

# Init LFS
git lfs install
git lfs track "large_file_pattern"

# Then follow regular git workflow
git add large_file
git commit -m "Init a very large file"
git push origin HEAD

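Behind the scenes, git lfs creates a reference to your large file and commits that reference instead of storing the file itself directly in the Git repo.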
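
To give an idea of what such a reference looks like: the pointer that LFS commits is a small text file with a version line, the SHA-256 of the real content, and its size in bytes. The snippet below is only a rough sketch that builds text in that shape by hand, not how Git LFS itself is implemented. The file name big.bin and the helper make_pointer are hypothetical, and sha256sum from GNU coreutils is assumed to be installed.

```shell
set -eu

# Hypothetical demo file standing in for a large binary asset.
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/big.bin"

# Print text in the shape of a Git LFS pointer for the given file.
make_pointer() {
  file=$1
  oid=$(sha256sum "$file" | cut -d' ' -f1)   # SHA-256 of the real content
  size=$(wc -c < "$file" | tr -d ' ')        # size of the real content in bytes
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

pointer=$(make_pointer "$workdir/big.bin")
printf '%s\n' "$pointer"
```

Only these few lines end up in the Git history; the real content is stored separately and fetched when needed.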

For more information: https://git-lfs.github.com/

2022-06-10 04:14:12

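SVN seems to handle binary deltas more efficiently than Git.

I had to decide on a versioning system for documentation (JPEG files, PDF files, and .odt files). I just tested adding a JPEG file and rotating it 90 degrees four times (to check the effectiveness of binary deltas). Git's repository grew by 400%. SVN's repository grew by only 11%.

So it looks like SVN is much more efficient with binary files.

So my choice is Git for source code and SVN for binary files like documentation.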
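
If you want to repeat the Git half of this measurement yourself, something along these lines should do. It is a rough sketch, not my original test: random bytes in a hypothetical photo.bin stand in for the rotated JPEG (both are effectively incompressible), and the helper repo_kib is made up for the demo. "git gc" is run before measuring because Git only computes deltas when it packs objects; loose objects are stored whole.

```shell
set -eu

# Throwaway repository for the measurement (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Size of everything under .git, in KiB.
repo_kib() { du -sk .git | cut -f1; }

# Initial version of the binary file.
head -c 200000 /dev/urandom > photo.bin
git add photo.bin
git commit -q -m "initial version"
git gc -q
before=$(repo_kib)

# Four rewrites of the same file, standing in for four rotations.
for i in 1 2 3 4; do
  head -c 200000 /dev/urandom > photo.bin
  git add photo.bin
  git commit -q -m "rewrite $i"
done
git gc -q
after=$(repo_kib)

echo "before: ${before} KiB, after: ${after} KiB"
```

With real images, replace the random data by your own rotated files; the numbers will depend on the file type and on how much of the content actually changes between versions.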

2010-10-03 03:11:41
