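How do I remove a large file from a Git repository's commit history?

I accidentally dropped a DVD rip into a website project, then carelessly ran git commit -a -m ..., and, zap, the repo ballooned by 2.2 GB. In a later commit I made some edits, deleted the video file, and committed everything, but the compressed file is still in the repository, in its history.

I know I can start branches from those commits and rebase one onto the other. But what should I do to merge the two commits so that the large file no longer shows up in the history and is cleaned up during garbage collection?

Current answer

According to the GitHub documentation, just follow these steps:

Get rid of the large file

Option 1: You don't want to keep the large file: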

rm path/to/your/large/file        # delete the large file

Option 2: You want to keep the large file in an untracked directory:

mkdir large_files                       # create directory large_files
touch .gitignore                        # create .gitignore file if needed
echo '/large_files/' >> .gitignore      # ignore directory large_files
mv path/to/your/large/file large_files/ # move the large file into the untracked directory

Save your changes

git add path/to/your/large/file   # add the deletion to the index
git commit -m 'delete large file' # commit the deletion

Remove the large file from all commits

git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch path/to/your/large/file" \
  --prune-empty --tag-name-filter cat -- --all
git push --force <remote> <branch>  # a rewritten history needs a force push if it was pushed before
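
To try the rewrite safely before touching a real project, here is a minimal sketch that replays the steps above in a throwaway repository. Everything in it is an assumption made for illustration: the temporary directory, the file name big.bin, the commit messages, and the 1 MiB of random data standing in for the large file. FILTER_BRANCH_SQUELCH_WARNING only silences the notice that recent Git versions print before running filter-branch.

```shell
#!/bin/sh
# Sketch: replay the cleanup in a scratch repository (all names are hypothetical).
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning and pause

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "<html></html>" > index.html
git add index.html
git commit -q -m "Index"

# Commit a stand-in "large" file by accident, then delete it in a later commit.
head -c 1048576 /dev/urandom > big.bin
git add big.bin
git commit -q -m "Careless"
git rm -q big.bin
git commit -q -m "Delete large file"

# Rewrite every commit on every ref so that big.bin never enters the index.
# --prune-empty drops the two commits that no longer change anything.
git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch big.bin" \
  --prune-empty --tag-name-filter cat -- --all >/dev/null

# Inspect the rewritten branch; the old commits survive only under refs/original.
git log --oneline --name-status
```

Note that the blob itself stays on disk until the refs/original backup and the reflog are gone and git gc has run.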

2020-09-10 13:37:23

Other answers

These commands worked in my case:

git filter-branch --force --index-filter 'git rm --cached -r --ignore-unmatch oops.iso' --prune-empty --tag-name-filter cat -- --all
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now

It differs only slightly from the versions above.
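
If you want to check that the space really is reclaimed, a rough sketch like the following compares the size of .git before and after the cleanup commands. The scratch repository, the file names, the size_kib helper, and the 2 MiB of random data are all assumptions for the sake of the example.

```shell
#!/bin/sh
# Sketch: measure .git before and after expiring the reflog and running gc.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo hello > keep.txt
git add keep.txt
git commit -q -m "Keep"
head -c 2097152 /dev/urandom > oops.iso     # hypothetical stand-in for the big file
git add oops.iso
git commit -q -m "Careless"
blob=$(git rev-parse HEAD:oops.iso)         # remember the blob to check on it later

git filter-branch --force --index-filter \
  'git rm --cached -r --ignore-unmatch oops.iso' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null

# Hypothetical helper: size of the .git directory in KiB.
size_kib() { du -sk .git | cut -f1; }

before=$(size_kib)
rm -rf .git/refs/original/                  # drop filter-branch's backup refs
git reflog expire --expire=now --all        # forget the old commits in the reflog
git gc --quiet --prune=now                  # delete the now unreachable objects
after=$(size_kib)

echo "before: ${before} KiB, after: ${after} KiB"
if git cat-file -e "$blob" 2>/dev/null; then
  echo "blob is still in the object store"
else
  echo "blob is gone"
fi
```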

For those who need to push this to GitHub/Bitbucket (I only tested this with Bitbucket):

# WARNING!!!
# this will rewrite completely your bitbucket refs
# will delete all branches that you didn't have in your local

git push --all --prune --force

# Once you pushed, all your teammates need to clone repository again
# git pull will not work

2013-06-14 02:35:36

Why not use this simple but powerful command?

git filter-branch --tree-filter 'rm -f DVD-rip' HEAD

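The --tree-filter option runs the specified command after each checkout of the project and then recommits the results. In this case, you remove a file called DVD-rip from every snapshot, whether it exists or not.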

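If you know which commit introduced the huge file (say 35dsa2), you can replace HEAD with 35dsa2..HEAD to avoid rewriting too much history, and so avoid diverging commits if you haven't pushed yet. Keep in mind that the range 35dsa2..HEAD excludes 35dsa2 itself, so 35dsa2^..HEAD is the range to use when that commit must be rewritten as well. This comment, courtesy of @alpha_989, seems too important to leave out here.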
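
Here is a small sketch of the range-limited variant, under the assumption that the offending commit is known. The scratch repository, the file names, and the variables root and bad are made up for the example. It uses the bad^..HEAD form so that the commit that added the file is rewritten too, while the older commit keeps its hash.

```shell
#!/bin/sh
# Sketch: limit the rewrite to the commits from the offending one up to HEAD.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo index > index.html
git add index.html
git commit -q -m "Index"
root=$(git rev-parse HEAD)                  # older commit, should stay untouched

head -c 1048576 /dev/urandom > DVD-rip      # hypothetical stand-in for the big file
echo other > other.html
git add DVD-rip other.html
git commit -q -m "Careless"
bad=$(git rev-parse HEAD)                   # the commit that introduced DVD-rip

echo login > login.html
git add login.html
git commit -q -m "Login page"

# bad^..HEAD covers the offending commit and everything after it.
git filter-branch --force --tree-filter 'rm -f DVD-rip' "$bad^..HEAD" >/dev/null

echo "root before: $root"
echo "root after:  $(git rev-list --max-parents=0 HEAD)"
git log --oneline --name-status
```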

See this link.

2015-05-16 09:44:10

A new answer that works in 2022.

Do not use:

git filter-branch

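This command may not change the remote repo after you push. If you clone after using it, you will see that nothing has changed and the repo is still large. This command is now considered obsolete. For example, the steps in https://github.com/18F/C2/issues/439 will not work.

You need to use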

git filter-repo

Steps:

(1) Find the largest files in .git:

git rev-list --objects --all | grep -f <(git verify-pack -v  .git/objects/pack/*.idx| sort -k 3 -n | cut -f 1 -d " " | tail -10)

(2) Start filtering out these large files:

 git filter-repo --path-glob '../../src/../..' --invert-paths --force

 git filter-repo --path-glob '*.zip' --invert-paths --force

 git filter-repo --path-glob '*.a' --invert-paths --force

or whatever else you found in step 1.

(3)

 git remote add origin git@github.com:.../...git

(4)

git push --all --force

git push --tags --force

Done!!
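
For step (1), another way to list the biggest blobs is to feed git rev-list into git cat-file --batch-check, which avoids parsing pack index files. This is only a sketch: the function name largest_blobs, the scratch repository, and the file names are assumptions, not part of the answer above.

```shell
#!/bin/sh
# Sketch: list the largest blobs reachable from any ref, biggest first.
set -eu

# Hypothetical helper; the optional argument is the number of lines to show.
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    sort -k3,3nr |
    head -n "${1:-10}"
}

# Demo in a scratch repository with one small and one large file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo small > small.txt
head -c 1048576 /dev/urandom > big.bin
git add small.txt big.bin
git commit -q -m "Two files"

# Each output line has the form: blob <object id> <size in bytes> <path>
top=$(largest_blobs 1)
echo "$top"
```

The paths printed in the last column are what you would then pass to git filter-repo in step (2).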

2022-11-27 17:01:52

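What you want to do is highly disruptive if you have published history to other developers. See "Recovering From Upstream Rebase" in the git rebase documentation for the necessary steps after repairing your history.

You have at least two options: git filter-branch and an interactive rebase, both explained below.

Using git filter-branch

I had a similar problem with bulky binary test data from a Subversion import and wrote about removing data from a git repository.

Say your git history is: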

$ git lola --name-status
* f772d66 (HEAD, master) Login page
| A     login.html
* cb14efd Remove DVD-rip
| D     oops.iso
* ce36c98 Careless
| A     oops.iso
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

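Note that git lola is a non-standard but highly useful alias. (See the addendum at the end of this answer for details.) The --name-status switch to git log shows the tree modifications associated with each commit.

In the "Careless" commit (whose SHA1 object name is ce36c98), the file oops.iso is the DVD-rip that was added by accident and then removed in the next commit, cb14efd. Using the technique described in the aforementioned blog post, the command to execute is: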

git filter-branch --prune-empty -d /dev/shm/scratch \
  --index-filter "git rm --cached -f --ignore-unmatch oops.iso" \
  --tag-name-filter cat -- --all

Options:

--prune-empty removes commits that become empty (i.e., do not change the tree) as a result of the filter operation. In the typical case, this option produces a cleaner history.

-d names a temporary directory that does not yet exist to use for building the filtered history. If you are running on a modern Linux distribution, specifying a tree in /dev/shm will result in faster execution.

--index-filter is the main event and runs against the index at each step in the history. You want to remove oops.iso wherever it is found, but it isn’t present in all commits. The command git rm --cached -f --ignore-unmatch oops.iso deletes the DVD-rip when it is present and does not fail otherwise.

--tag-name-filter describes how to rewrite tag names. A filter of cat is the identity operation. Your repository, like the sample above, may not have any tags, but I included this option for full generality.

-- specifies the end of options to git filter-branch.

--all following -- is shorthand for all refs. Your repository, like the sample above, may have only one ref (master), but I included this option for full generality.

After some churning, the history is now:

$ git lola --name-status
* 8e0a11c (HEAD, master) Login page
| A     login.html
* e45ac59 Careless
| A     other.html
|
| * f772d66 (refs/original/refs/heads/master) Login page
| | A   login.html
| * cb14efd Remove DVD-rip
| | D   oops.iso
| * ce36c98 Careless
|/  A   oops.iso
|   A   other.html
|
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

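Notice that the new "Careless" commit adds only other.html and that the "Remove DVD-rip" commit is no longer on the master branch. The branch labeled refs/original/refs/heads/master contains your original commits in case you made a mistake. To remove it, follow the steps in "Checklist for Shrinking a Repository."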

$ git update-ref -d refs/original/refs/heads/master
$ git reflog expire --expire=now --all
$ git gc --prune=now

For a simpler alternative, clone the repository to discard the unwanted bits.

$ cd ~/src
$ mv repo repo.old
$ git clone file:///home/user/src/repo.old repo

Using a file:///... clone URL copies objects rather than only creating hardlinks.
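
As a rough illustration of why the clone helps, the sketch below rewrites a scratch repository and then clones it through a file:// URL. The directory layout under a temporary folder, the file contents, and the variables work and blob are assumptions for the example. The old blob should remain in repo.old (it is still reachable from refs/original) but should not be copied into the clone.

```shell
#!/bin/sh
# Sketch: a file:// clone transfers only objects reachable from branches and tags.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
git init -q "$work/repo.old"
cd "$work/repo.old"
git config user.name "Example"
git config user.email "example@example.com"

echo admin > admin.html
git add admin.html
git commit -q -m "Admin page"
echo other > other.html
head -c 1048576 /dev/urandom > oops.iso     # hypothetical stand-in for the DVD-rip
git add other.html oops.iso
git commit -q -m "Careless"
blob=$(git rev-parse HEAD:oops.iso)

git filter-branch --force --prune-empty \
  --index-filter "git rm --cached -f --ignore-unmatch oops.iso" \
  --tag-name-filter cat -- --all >/dev/null

# Clone over file:// so objects are copied through the normal transport.
cd "$work"
git clone -q "file://$work/repo.old" repo

for r in repo.old repo; do
  if git -C "$r" cat-file -e "$blob" 2>/dev/null; then
    echo "$r: oops.iso blob present"
  else
    echo "$r: oops.iso blob absent"
  fi
done
```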

Now your history is:

$ git lola --name-status
* 8e0a11c (HEAD, master) Login page
| A     login.html
* e45ac59 Careless
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

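The SHA1 object names for the first two commits ("Index" and "Admin page") stayed the same because the filter operation did not modify those commits. "Careless" lost oops.iso and "Login page" got a new parent, so their SHA1s did change.

Interactive rebase

With a history of: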

$ git lola --name-status
* f772d66 (HEAD, master) Login page
| A     login.html
* cb14efd Remove DVD-rip
| D     oops.iso
* ce36c98 Careless
| A     oops.iso
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

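You want to remove oops.iso from "Careless" as though you never added it, and then "Remove DVD-rip" is useless to you. Thus, our plan going into an interactive rebase is to keep "Admin page," edit "Careless," and discard "Remove DVD-rip."

Running $ git rebase -i 5af4522 starts an editor with the following contents.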

pick ce36c98 Careless
pick cb14efd Remove DVD-rip
pick f772d66 Login page

# Rebase 5af4522..f772d66 onto 5af4522
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#

Executing our plan, we modify it to

edit ce36c98 Careless
pick f772d66 Login page

# Rebase 5af4522..f772d66 onto 5af4522
# ...

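That is, we delete the line with "Remove DVD-rip" and change the operation on "Careless" to be edit rather than pick.

Save-quitting the editor drops us at a command prompt with the following message.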

Stopped at ce36c98... Careless
You can amend the commit now, with

        git commit --amend

Once you are satisfied with your changes, run

        git rebase --continue

As the message tells us, we are on the "Careless" commit we want to edit, so we run the following commands.

$ git rm --cached oops.iso
$ git commit --amend -C HEAD
$ git rebase --continue

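The first removes the offending file from the index. The second modifies or amends "Careless" to be the updated index, and -C HEAD instructs git to reuse the old commit message. Finally, git rebase --continue goes ahead with the rest of the rebase operation.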
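
If you would rather rehearse this without an editor popping up, the whole procedure can be scripted. The sketch below is one way to do it, not the only one: it rebuilds the sample history in a scratch repository and uses GIT_SEQUENCE_EDITOR to point git at a small script that edits the todo list. The script name edit-todo.sh, the temporary directories, and the file contents are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: the interactive rebase above, driven by a scripted todo-list editor.
set -eu

work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
cd "$repo"
git config user.name "Example"
git config user.email "example@example.com"

commit_file() {   # hypothetical helper: commit_file <path> <message>
  git add "$1"
  git commit -q -m "$2"
}

echo index > index.html; commit_file index.html "Index"
echo admin > admin.html; commit_file admin.html "Admin page"
base=$(git rev-parse HEAD)                  # plays the role of 5af4522

echo other > other.html
head -c 1048576 /dev/urandom > oops.iso
git add other.html oops.iso
git commit -q -m "Careless"
git rm -q oops.iso
git commit -q -m "Remove DVD-rip"
echo login > login.html; commit_file login.html "Login page"

# Todo editor: turn the first "pick" into "edit" and delete the second line.
cat > "$work/edit-todo.sh" <<'EOF'
sed -e '1s/^pick/edit/' -e '2d' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
EOF

GIT_SEQUENCE_EDITOR="sh $work/edit-todo.sh" git rebase -i "$base"

# The rebase is now stopped at "Careless"; fix it up and carry on.
git rm -q --cached oops.iso
git commit -q --amend -C HEAD
git rebase --continue

git log --oneline --name-status
```

The copy of oops.iso left in the working tree after git rm --cached is untracked and can simply be deleted.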

This gives a history of:

$ git lola --name-status
* 93174be (HEAD, master) Login page
| A     login.html
* a570198 Careless
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

which is what you want.

Addendum: Enable git lola via ~/.gitconfig

Quoting Conrad Parker:

The best tip I learned at Scott Chacon's talk at linux.conf.au 2010, on advanced Git tips and tricks, was this alias:

lol = log --graph --decorate --pretty=oneline --abbrev-commit

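This provides a really nice graph of your tree, showing the branch structure of merges etc. Of course there are really nice GUI tools for showing such graphs, but the advantage of git lol is that it works on a console or over ssh, so it is useful for remote development, or native development on an embedded board...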

So, just copy the following into ~/.gitconfig for your full color git lola action:

[alias]
        lol = log --graph --decorate --pretty=oneline --abbrev-commit
        lola = log --graph --decorate --pretty=oneline --abbrev-commit --all
[color]
        branch = auto
        diff = auto
        interactive = auto
        status = auto
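
If you prefer not to edit ~/.gitconfig by hand, the same aliases can be set with git config. The sketch below writes them to the local configuration of a scratch repository so it does not touch your real settings; the repository and the empty commit are assumptions for the demo. Adding --global to the git config calls would write to ~/.gitconfig instead.

```shell
#!/bin/sh
# Sketch: define the lol/lola aliases with git config and try them out.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Index"

# Repository-local aliases; use "git config --global ..." for ~/.gitconfig.
git config alias.lol  "log --graph --decorate --pretty=oneline --abbrev-commit"
git config alias.lola "log --graph --decorate --pretty=oneline --abbrev-commit --all"

git lola --name-status
```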

2010-01-28 21:55:32

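This worked perfectly for me, in Git Extensions:

Right-click the selected commit:

Reset current branch to here:

Hard reset;

It's surprising that nobody else has given this simple answer.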

2020-06-26 09:52:35
