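What would happen if I ran into a hash collision while using Git?

For example, if I managed to commit two files with the same SHA-1 checksum, would Git notice it, or would it corrupt one of the files?

Could Git be improved to cope with that, or would I have to switch to a new hash algorithm?

(Please do not deflect this question by discussing how unlikely that is. Thanks.)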


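Current answer

If two files have the same hash sum in Git, it would treat those files as identical. In the extremely unlikely case that this happens, you could always go back one commit and change something in the file so that they no longer collide...

See Linus Torvalds' post in the thread "Starting to think about sha-256?" on the Git mailing list.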
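To make "treats those files as identical" concrete, here is a minimal sketch of a content-addressed store. It is a toy model, not Git's code: `weak_id`, `ToyStore`, and `find_collision` are hypothetical names, the ID is deliberately truncated to 8 bits so that a collision is easy to find, and the "first object wins" behavior is an assumption of the sketch.

```python
import hashlib


def weak_id(content: bytes, bits: int = 8) -> str:
    """Hypothetical weak object ID: SHA-1 over a Git-style blob header plus
    content, truncated to `bits` bits so collisions are trivial to find."""
    header = b"blob %d\0" % len(content)
    digest = hashlib.sha1(header + content).digest()
    value = int.from_bytes(digest, "big") >> (160 - bits)
    return format(value, "0%dx" % ((bits + 3) // 4))


class ToyStore:
    """Toy content-addressed store keyed by weak_id (not Git's implementation)."""

    def __init__(self):
        self.objects = {}

    def write(self, content: bytes) -> str:
        oid = weak_id(content)
        # Assumption: an existing object is never overwritten, so a colliding
        # second file is silently treated as "already stored".
        self.objects.setdefault(oid, content)
        return oid

    def read(self, oid: str) -> bytes:
        return self.objects[oid]


def find_collision():
    """Brute-force two different contents that share the same weak ID."""
    seen = {}
    i = 0
    while True:
        content = b"file %d\n" % i
        oid = weak_id(content)
        if oid in seen:
            return seen[oid], content
        seen[oid] = content
        i += 1


first, second = find_collision()
store = ToyStore()
id_first = store.write(first)
id_second = store.write(second)
# Both writes return the same ID, and reading it back yields the first file.
print(id_first == id_second, store.read(id_second) == first)
```

In this model the second file is never stored, which is why changing something in one of the files (and therefore its hash) makes the problem go away.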

Other answers

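I recently found a post dated 2013-04-29 in a BSD discussion group at

http://openbsd-archive.7691.n7.nabble.com/Why-does-OpenBSD-use-CVS-td226952.html

where the poster claims:

I ran into a hash collision once, on git rebase.

Unfortunately, he provides no proof for his claim. But maybe you would like to try contacting him and ask him about this supposed incident.

But on a more general level, because of the birthday attack, the chance of a SHA-1 hash collision is about 1 in pow(2,80); put differently, a collision only becomes likely once roughly pow(2,80) objects have been hashed.

This sounds like a lot, and it is certainly far more than the total number of versions of individual files present in all Git repositories of the world combined.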
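As a rough illustration of the birthday bound, the following sketch uses the standard approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n uniformly random b-bit hashes. The function name `collision_probability` is made up for this example, and the model assumes SHA-1 outputs behave like random 160-bit values, which says nothing about deliberate attacks.

```python
import math


def collision_probability(n_objects: int, hash_bits: int = 160) -> float:
    """Approximate probability that at least two of n_objects uniformly
    random hash values of hash_bits bits are equal (birthday approximation)."""
    exponent = -n_objects * (n_objects - 1) / 2.0 ** (hash_bits + 1)
    # -expm1(x) equals 1 - exp(x) but stays accurate for tiny probabilities.
    return -math.expm1(exponent)


# Around pow(2, 80) objects the collision probability is already substantial.
print(collision_probability(2 ** 80))
# At pow(2, 47) objects it is still vanishingly small.
print(collision_probability(2 ** 47))
```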

However, this only applies to the versions that actually stay in the version history.

If a developer relies heavily on rebasing, every time a rebase is run for a branch, all the commits in all the versions of that branch (or the rebased part of the branch) get new hashes. The same is true for every file modified with "git filter-branch". Therefore, "rebase" and "filter-branch" might be big multipliers for the number of hashes generated over time, even though not all of them are actually kept: frequently, after rebasing (especially for the purpose of "cleaning up" a branch), the original branch is thrown away.
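The reason a rebase produces new hashes is that a commit's ID covers its parent ID. A minimal sketch, assuming the usual Git object layout (type, length, NUL byte, body); the helper names, the identity string, and the parent IDs are illustrative placeholders, not real commits:

```python
import hashlib


def git_object_id(obj_type: bytes, body: bytes) -> str:
    """SHA-1 over '<type> <length>\\0' followed by the object body."""
    header = obj_type + b" %d\0" % len(body)
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parent: str, message: str,
              ident: str = "A U Thor <author@example.com> 0 +0000") -> str:
    """ID of a simplified single-parent commit (illustrative fields only)."""
    body = (
        "tree " + tree + "\n"
        "parent " + parent + "\n"
        "author " + ident + "\n"
        "committer " + ident + "\n"
        "\n" + message + "\n"
    )
    return git_object_id(b"commit", body.encode("utf-8"))


EMPTY_TREE = git_object_id(b"tree", b"")  # well-known ID of the empty tree
old_parent = "1" * 40  # placeholder parent IDs, not real commits
new_parent = "2" * 40

before = commit_id(EMPTY_TREE, old_parent, "Fix typo")
after = commit_id(EMPTY_TREE, new_parent, "Fix typo")
# Same tree and message, different parent: the commit gets a different ID.
print(before != after)
```

Because each rebased commit has a new parent, every commit after the rebase point is hashed again, even if its tree and message are unchanged.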

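But if the collision occurs during the rebase or filter-branch, it can still have adverse effects.

Another thing would be to estimate the total number of hashed entities in Git repositories and see how far they are from pow(2,80).

Let's say we have about 8 billion people, all of them running Git and keeping their stuff versioned in 100 Git repositories per person. Let's further assume the average repository has 100 commits and 10 files, with only one of those files changing per commit.

For every revision we have at least one hash for the tree object and one for the commit object itself. Together with the changed file, that is 3 hashes per revision, and thus 300 hashes per repository.

For 100 repositories of 8 billion people this gives about 2.4 × 10^14 hashes, which lies between pow(2,47) and pow(2,48) and is still far from pow(2,80).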
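The arithmetic behind this estimate can be checked with a few lines. All inputs are the assumptions stated above (8 billion people, 100 repositories each, 100 commits per repository, 3 hashes per commit); the function name is made up for the example.

```python
import math


def estimated_total_hashes(people: int = 8_000_000_000,
                           repos_per_person: int = 100,
                           commits_per_repo: int = 100,
                           hashes_per_commit: int = 3) -> int:
    """Total number of hashed objects under the assumptions in the text."""
    return people * repos_per_person * commits_per_repo * hashes_per_commit


total = estimated_total_hashes()
log2_total = math.log2(total)
# Number of objects, its base-2 logarithm, and the distance to pow(2, 80).
print(total, log2_total, 80 - log2_total)
```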

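However, this does not include the supposed multiplication effect mentioned above, because I am not sure how to include it in this estimate. Maybe it could increase the chances of a collision considerably, especially if very large repositories with a long commit history (like the Linux kernel) are rebased by many people for small changes, which still creates different hashes for all affected commits.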

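Well, I guess we now know what would happen: you should expect your repository to become corrupted (source).

Google now claims that a SHA-1 collision is possible under certain preconditions: https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html

Since Git uses SHA-1 to check file integrity, this means that file integrity in Git is compromised.

In my opinion, Git should use a better hashing algorithm, since deliberate collisions are now possible.

You can see a good study in "How would Git handle a SHA-1 collision on a blob?".

Since a SHA-1 collision is now possible (as I mention in that answer with shattered.io), Git 2.13 (Q2 2017) will improve/mitigate the current situation with a "detect attempt to create collisions" variant of the SHA-1 implementation by Marc Stevens (CWI) and Dan Shumow (Microsoft).

See commit f5f5e7f, commit 8325e43, commit c0c2006, commit 45a574e, commit 28dc98e (16 Mar 2017) by Jeff King (peff). (Merged by Junio C Hamano (gitster) in commit 48b3693, 24 Mar 2017)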

Makefile: make DC_SHA1 the default

We used to use the SHA1 implementation from the OpenSSL library by default. As we are trying to be careful against collision attacks after the recent "shattered" announcement, switch the default to encourage people to use DC_SHA1 implementation instead. Those who want to use the implementation from OpenSSL can explicitly ask for it by OPENSSL_SHA1=YesPlease when running "make".

We don't actually have a Git-object collision, so the best we can do is to run one of the shattered PDFs through test-sha1. This should trigger the collision check and die.


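Could Git be improved to cope with that, or would I have to switch to a new hash algorithm?

Update December 2017, with Git 2.16 (Q1 2018): the effort to support an alternative SHA is under way: see "Why doesn't Git use more modern SHA?".

You will be able to use another hash algorithm: SHA-1 is no longer the only one for Git.


Git 2.18 (Q2 2018) documents that process.

See commit 5988eb6, commit 45fa195 (26 Mar 2018) by Ævar Arnfjörð Bjarmason (avar). (Merged by Junio C Hamano (gitster) in commit d877975, 11 Apr 2018)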

doc hash-function-transition: clarify what SHAttered means

Attempt to clarify what the SHAttered attack means in practice for Git. The previous version of the text made no mention whatsoever of Git already having a mitigation for this specific attack, which the SHAttered researchers claim will detect cryptanalytic collision attacks.

I may have gotten some of the nuances wrong, but as far as I know this new text accurately summarizes the current situation with SHA-1 in git. I.e. git doesn't really use SHA-1 anymore, it uses Hardened-SHA-1 (they just so happen to produce the same outputs 99.99999999999...% of the time).

Thus the previous text was incorrect in asserting that:

[...]As a result [of SHAttered], SHA-1 cannot be considered cryptographically secure any more[...]

That's not the case. We have a mitigation against SHAttered, however we consider it prudent to move to work towards a NewHash should future vulnerabilities in either SHA-1 or Hardened-SHA-1 emerge.

So the new documentation now reads:

Git v2.13.0 and later subsequently moved to a hardened SHA-1 implementation by default, which isn't vulnerable to the SHAttered attack. Thus Git has in effect already migrated to a new hash that isn't SHA-1 and doesn't share its vulnerabilities, its new hash function just happens to produce exactly the same output for all known inputs, except two PDFs published by the SHAttered researchers, and the new implementation (written by those researchers) claims to detect future cryptanalytic collision attacks. Regardless, it's considered prudent to move past any variant of SHA-1 to a new hash. There's no guarantee that future attacks on SHA-1 won't be published in the future, and those attacks may not have viable mitigations. If SHA-1 and its variants were to be truly broken, Git's hash function could not be considered cryptographically secure any more. This would impact the communication of hash values because we could not trust that a given hash value represented the known good version of content that the speaker intended.

Note: that same document now (Q3 2018, Git 2.19) explicitly references the "new hash" as SHA-256: see "Why doesn't Git use more modern SHA?".
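To give a feel for what switching the hash means for object names, here is a small sketch that hashes the same blob bytes with SHA-1 and with SHA-256. It assumes, as I understand the hash-function-transition document, that a blob's SHA-256 name is computed over the same "blob <length>\0" header plus content; `blob_id` is a hypothetical helper, not a Git API.

```python
import hashlib


def blob_id(content: bytes, algorithm: str = "sha1") -> str:
    """Hash a Git-style blob ('blob <length>\\0' + content) with the
    given hashlib algorithm name. Illustrative helper only."""
    header = b"blob %d\0" % len(content)
    return hashlib.new(algorithm, header + content).hexdigest()


content = b"hello\n"
sha1_name = blob_id(content, "sha1")
sha256_name = blob_id(content, "sha256")
# SHA-1 names are 40 hex digits long, SHA-256 names are 64.
print(len(sha1_name), len(sha256_name))
```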