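I have a repository with branches master and A, and lots of merge activity between the two. How can I find the commit in my repository where branch A was created from master?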

My repository basically looks like this:

-- X -- A -- B -- C -- D -- F  (master) 
          \     /   \     /
           \   /     \   /
             G -- H -- I -- J  (branch A)

I'm looking for revision A, which is not what git merge-base (--all) finds.


Current answer

To find the commits from the branch point, you can use this:

git log --ancestry-path master..topicbranch

Other answers

Purpose: this answer tests the various answers given in this thread.

Test repository

-- X -- A -- B -- C -- D -- F  (master) 
          \     /   \     /
           \   /     \   /
             G -- H -- I -- J  (branch A)
$ git --no-pager log --graph --oneline --all --decorate
* b80b645 (HEAD, branch_A) J - Work in branch_A branch
| *   3bd4054 (master) F - Merge branch_A into branch master
| |\  
| |/  
|/|   
* |   a06711b I - Merge master into branch_A
|\ \  
* | | bcad6a3 H - Work in branch_A
| | * b46632a D - Work in branch master
| |/  
| *   413851d C - Merge branch_A into branch master
| |\  
| |/  
|/|   
* | 6e343aa G - Work in branch_A
| * 89655bb B - Work in branch master
|/  
* 74c6405 (tag: branch_A_tag) A - Work in branch master
* 7a1c939 X - Work in branch master

Correct solution

The only solution that works is the one provided by lindes, which correctly returns A:

$ diff -u <(git rev-list --first-parent branch_A) \
          <(git rev-list --first-parent master) | \
      sed -ne 's/^ //p' | head -1
74c6405d17e319bd0c07c690ed876d65d89618d5

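As Charles Bailey points out, this solution is very brittle.

If you merge branch_A into master and then merge master into branch_A without intervening commits, lindes' solution only gives you the most recent first divergence.

That means that for my workflow I think I'm going to have to stick with tagging the branch point of long-running branches, since I can't guarantee that they can be reliably found later.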
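The tagging workflow can be sketched as a pair of small helpers. This is only a minimal sketch: the function names `start_branch` and `branch_point_from_tag` are made up for illustration, and the `<branch>_tag` naming convention is an assumption that simply mirrors the `branch_A_tag` used in the test repository.

```shell
# Hypothetical helper: create a branch and record its branch point with an
# annotated tag named <branch>_tag (the naming convention is an assumption).
# Usage: start_branch <branch> [<start-point>]   (start point defaults to HEAD)
start_branch() {
    branch=$1
    start=${2:-HEAD}
    git branch "$branch" "$start" &&
    git tag -a "${branch}_tag" -m "Tag branch point of $branch" "$start"
}

# Hypothetical helper: print the commit that was recorded as the branch point.
branch_point_from_tag() {
    # ^{commit} peels the annotated tag down to the commit it points at.
    git rev-parse --verify "${1}_tag^{commit}"
}
```

With this, `start_branch branch_A` followed much later by `branch_point_from_tag branch_A` returns the recorded commit no matter how many merges have happened in between, as long as nobody moves or deletes the tag.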

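This really all boils down to git's lack of what hg calls named branches. The blogger jhw calls these lineages vs. families in his article Why I Like Mercurial More Than Git and his follow-up article More On Mercurial vs. Git (with Graphs!). I would recommend reading them to see why some Mercurial converts miss having named branches in git.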

Solutions that do not work

The solution provided by mipadi returns two answers, I and C:

$ git rev-list --boundary branch_A...master | grep ^- | cut -c2-
a06711b55cf7275e8c3c843748daaa0aa75aef54
413851dfecab2718a3692a4bba13b50b81e36afc

The solution provided by Greg Hewgill returns I:

$ git merge-base master branch_A
a06711b55cf7275e8c3c843748daaa0aa75aef54
$ git merge-base --all master branch_A
a06711b55cf7275e8c3c843748daaa0aa75aef54

The solution provided by Karl returns X:

$ diff -u <(git log --pretty=oneline branch_A) \
          <(git log --pretty=oneline master) | \
       tail -1 | cut -c 2-42
7a1c939ec325515acfccb79040b2e4e1c3e7bbe5

Reproducing the test repository

To create a test repository:

mkdir "$1"
cd "$1"
git init
git commit --allow-empty -m "X - Work in branch master"
git commit --allow-empty -m "A - Work in branch master"
git branch branch_A
git tag branch_A_tag     -m "Tag branch point of branch_A"
git commit --allow-empty -m "B - Work in branch master"
git checkout branch_A
git commit --allow-empty -m "G - Work in branch_A"
git checkout master
git merge branch_A       -m "C - Merge branch_A into branch master"
git checkout branch_A
git commit --allow-empty -m "H - Work in branch_A"
git merge master         -m "I - Merge master into branch_A"
git checkout master
git commit --allow-empty -m "D - Work in branch master"
git merge branch_A       -m "F - Merge branch_A into branch master"
git checkout branch_A
git commit --allow-empty -m "J - Work in branch_A branch"

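My only addition is the tag, which makes explicit the point at which we created the branch and thus the commit we wish to find.

I doubt the git version makes much difference to this, but: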

$ git --version
git version 1.7.1

Thanks to Charles Bailey for showing me a more compact way to script the example repository.

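In general, this is not possible. In a branch history, a branch-and-merge that happened before a named branch was branched off looks the same as an intermediate branch between two named branches.

In git, branches are just the current names of the tips of sections of history. They don't really have a strong identity.

This isn't usually a big issue, as the merge-base (see Greg Hewgill's answer) of two commits is usually much more useful: it gives the most recent commit that the two branches shared.

A solution relying on the order of a commit's parents obviously won't work in situations where a branch has been fully integrated at some point in the branch's history.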

git commit --allow-empty -m root # actual branch commit
git checkout -b branch_A
git commit --allow-empty -m  "branch_A commit"
git checkout master
git commit --allow-empty -m "More work on master"
git merge -m "Merge branch_A into master" branch_A # identified as branch point
git checkout branch_A
git merge --ff-only master
git commit --allow-empty -m "More work on branch_A"
git checkout master
git commit --allow-empty -m "More work on master"

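This technique also falls down if an integration merge has been made with the parents reversed (e.g. a temporary branch was used to perform a test merge into master and then fast-forwarded into the feature branch to build on further).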

git commit --allow-empty -m root # actual branch point
git checkout -b branch_A
git commit --allow-empty -m  "branch_A commit"
git checkout master
git commit --allow-empty -m "More work on master"
git merge -m "Merge branch_A into master" branch_A # identified as branch point
git checkout branch_A
git commit --allow-empty -m "More work on branch_A"

git checkout -b tmp-branch master
git merge -m "Merge branch_A into tmp-branch (master copy)" branch_A
git checkout branch_A
git merge --ff-only tmp-branch
git branch -d tmp-branch

git checkout master
git commit --allow-empty -m "More work on master"

If you like terse commands,

git rev-list $(git rev-list --first-parent ^branch_name master | tail -n1)^^! 

Here's an explanation.

The following command gives you the list of all commits in master that occurred after branch_name was created:

git rev-list --first-parent ^branch_name master 

Since you only care about the earliest of those commits, you want the last line of the output:

git rev-list ^branch_name --first-parent master | tail -n1

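The parent of the earliest commit that is not an ancestor of "branch_name" is, by definition, in "branch_name", and it is in "master" since it is an ancestor of something in "master". So you've got the earliest commit that's in both branches.

The command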

git rev-list commit^^!

is just a way to show the parent commit reference. You could use

git log -1 commit^

or whatever.
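For repeated use, the two steps can be wrapped in a function that also reports the cases where there is nothing to show. This is a rough sketch, not part of the original answer: the name `branch_point` and the `master` default are assumptions. Note that commits the branch has already merged in from the mainline count as reachable from the branch, so after such a merge the result can be later than the original fork.

```shell
# Hypothetical wrapper around the one-liner.
# Usage: branch_point <branch_name> [<mainline>]   (mainline defaults to master)
branch_point() {
    branch=$1
    mainline=${2:-master}
    # Oldest first-parent commit of the mainline that is not reachable
    # from the branch (rev-list prints newest first, so take the last line).
    first=$(git rev-list --first-parent "^$branch" "$mainline" | tail -n1)
    if [ -z "$first" ]; then
        echo "no commit on $mainline is missing from $branch" >&2
        return 1
    fi
    # Its first parent is the candidate branch point; --verify makes this
    # fail cleanly if $first happens to be a root commit with no parent.
    git rev-parse --verify "${first}^"
}
```

Calling `branch_point branch_name` prints a single commit hash, which you can then pass to `git log -1` or `git show`.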

PS: I disagree with the argument that ancestor order is irrelevant. It depends on what you want. For example, in this case

_C1___C2_______ master
  \    \_XXXXX_ branch A (the Xs denote arbitrary cross-overs between master and A)
   \_____/ branch B

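it makes perfect sense to output C2 as the "branching" commit. This is when the developer branched out from "master". When he branched, branch "B" wasn't even merged into his branch! This is what the solution in this post gives.

If what you want is the last commit C such that all paths from the origin to the last commit on branch "A" go through C, then you want to ignore ancestry order. That's purely topological and gives you an idea of since when you have had two versions of the code going at the same time. That's when you'd go with merge-base based approaches, which will return C1 in my example.

Surely I'm missing something, but IMO all the problems above arise because we are always trying to find the branch point by going back in history, and that causes all sorts of problems because of the merge combinations available.

Instead, I've followed a different approach, based on the fact that both branches share a lot of history: all the history before branching is 100% the same. So instead of going back, my proposal is to go forward (from the first commit), looking for the first difference between the two branches. The branch point is then simply the parent of the first difference found.

In practice: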

#!/bin/bash
diff <( git rev-list "${1:-master}" --reverse --topo-order ) \
     <( git rev-list "${2:-HEAD}" --reverse --topo-order) \
--unified=1 | sed -ne 's/^ //p' | head -1

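It solves all my usual cases. Sure, there are edge cases it doesn't cover, but... ciao :-)

Sometimes it is effectively impossible (with some exceptions where you might be lucky enough to have additional data), and the solutions here won't work.

Git doesn't preserve ref history (which includes branches). It only stores the current position of each branch (the head). This means you can lose some branch history in git over time. Whenever you branch, for example, it is immediately lost which branch was the original one. All a branch does is: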

git checkout branch1    # refs/branch1 -> commit1
git checkout -b branch2 # branch2 -> commit1

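You might assume that the first one committed to is the branch. This tends to be the case, but it's not always so. There's nothing stopping you from committing to either branch first after the above operation. Additionally, git timestamps aren't guaranteed to be reliable. It's not until you commit to both that they truly become branches structurally.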
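You can see this for yourself in a throwaway repository. The following is a small illustration only; the directory, identity and branch names are all made up for the demo.

```shell
# Self-contained demo in a temporary repository (all names are illustrative).
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "commit1"

git checkout -q -b branch1      # branch1 -> commit1
git checkout -q -b branch2      # branch2 -> commit1

# Both names resolve to the same commit; nothing in the repository's
# commit graph records which of the two names existed first.
tip1=$(git rev-parse branch1)
tip2=$(git rev-parse branch2)
echo "branch1 -> $tip1"
echo "branch2 -> $tip2"
```

The two printed hashes are identical, which is all git knows about the relationship between the two branches at this point.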

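While in diagrams we tend to number commits conceptually, git has no real stable concept of sequence when the commit tree branches. In this case you can assume the numbers (indicating order) are determined by timestamp (it might be fun to see how a git UI handles things when you set all the timestamps to the same value).

This is what a human expects conceptually: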

After branch:
       C1 (B1)
      /
    -
      \
       C1 (B2)
After first commit:
       C1 (B1)
      /
    - 
      \
       C1 - C2 (B2)

This is what you actually get:

After branch:
    - C1 (B1) (B2)
After first commit (human):
    - C1 (B1)
        \
         C2 (B2)
After first commit (real):
    - C1 (B1) - C2 (B2)

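You would assume B1 to be the original branch, but it could in fact simply be a dead branch (someone did checkout -b but never committed to it). It isn't until you commit to both that you get a legitimate branch structure within git: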

Either:
      / - C2 (B1)
    -- C1
      \ - C3 (B2)
Or:
      / - C3 (B1)
    -- C1
      \ - C2 (B2)

You always know that C1 came before C2 and C3, but you never reliably know whether C2 came before C3 or C3 came before C2 (because you can set the time on your workstation to anything, for example). B1 and B2 are also misleading, as you can't know which branch came first. You can make a very good and usually accurate guess at it in many cases. It is a bit like a race track: all things being equal between the cars, you can assume that a car that comes in a lap behind started a lap behind. We also have conventions that are very reliable; for example, master will nearly always represent the longest-lived branch, although sadly I have seen cases where even this is not so.

The example given here is a history-preserving example:

Human:
    - X - A - B - C - D - F (B1)
           \     / \     /
            G - H ----- I - J (B2)
Real:
            B ----- C - D - F (B1)
           /       / \     /
    - X - A       /   \   /
           \     /     \ /
            G - H ----- I - J (B2)

Real here is also misleading because we as humans read it left to right, root to leaf (ref). Git does not do that. Where we do (A->B) in our heads, git does (A<-B or B->A). It reads from ref to root. Refs can be anywhere but tend to be leaves, at least for active branches. A ref points to a commit, and commits only contain a link to their parent(s), not to their children. When a commit is a merge commit it has more than one parent. The first parent is always the original commit that was merged into. The other parents are always commits that were merged into the original commit.

Paths:
    F->(D->(C->(B->(A->X)),(H->(G->(A->X))))),(I->(H->(G->(A->X))),(C->(B->(A->X)),(H->(G->(A->X)))))
    J->(I->(H->(G->(A->X))),(C->(B->(A->X)),(H->(G->(A->X)))))

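It's not a very efficient representation, but rather an expression of all the paths git can take from each ref (B1 and B2).

Git's internal storage looks more like this (note that A appears twice as a parent):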

    F->D,I | D->C | C->B,H | B->A | A->X | J->I | I->H,C | H->G | G->A

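If you dump a raw git commit you'll see zero or more parent fields. If there are zero, the commit has no parent and is a root (you can actually have multiple roots). If there is one, there was no merge and it's not a root commit. If there is more than one, the commit is the result of a merge, and all parents after the first are the commits that were merged in.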
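One way to look at those parent fields is `git cat-file -p`, which pretty-prints the raw commit object. The helper below is a minimal sketch; the name `parent_count` is hypothetical.

```shell
# Hypothetical helper: print how many parents a commit object records.
# Usage: parent_count <commit>
parent_count() {
    # `git cat-file -p` prints the header lines (tree, parent..., author,
    # committer), then a blank line, then the message. Stop at the blank
    # line so that message text beginning with "parent " is not counted.
    # `|| true` keeps the exit status 0 when grep finds no parent lines.
    git cat-file -p "$1" | sed '/^$/q' | grep -c '^parent ' || true
}
```

A result of 0 means a root commit, 1 an ordinary commit, and 2 or more a merge.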

Paths simplified:
    F->(D->C),I | J->I | I->H,C | C->(B->A),H | H->(G->A) | A->X
Paths first parents only:
    F->(D->(C->(B->(A->X)))) | F->D->C->B->A->X
    J->(I->(H->(G->(A->X)))) | J->I->H->G->A->X
Or:
    F->D->C | J->I | I->H | C->B->A | H->G->A | A->X
Paths first parents only simplified:
    F->D->C->B->A | J->I->H->G->A | A->X
Topological:
    - X - A - B - C - D - F (B1)
           \
            G - H - I - J (B2)

When both hit A their chain will be the same; before that their chains are entirely different. The first commit that two other commits have in common is their common ancestor, the point from which they diverged. There might be some confusion here between the terms commit, branch and ref. You can in fact merge a commit; this is what merge really does. A ref simply points to a commit, and a branch is nothing more than a ref in the folder .git/refs/heads; the folder location is what determines that a ref is a branch rather than something else such as a tag.
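To see that branches and tags really are just refs in different namespaces, you can list every ref that points at a given commit. This is a small sketch; the helper name `refs_at` is made up, and it uses `git for-each-ref` rather than reading the .git directory directly, since refs may be stored packed instead of as individual files.

```shell
# Hypothetical helper: list the full names of all refs pointing at a commit.
# Usage: refs_at <commit>
refs_at() {
    # Branches are printed as refs/heads/<name>, tags as refs/tags/<name>;
    # only the namespace distinguishes one kind of ref from the other.
    git for-each-ref --points-at "$1" --format='%(refname)'
}
```

For a commit that carries both a branch and a tag, `refs_at` prints one `refs/heads/...` line and one `refs/tags/...` line for the same object.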

Where you lose history is that merge will do one of two things depending on circumstances.

Consider:

      / - B (B1)
    - A
      \ - C (B2)

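In this case a merge in either direction will create a new commit whose first parent is the commit pointed to by the currently checked-out branch and whose second parent is the commit at the tip of the branch you merged into your current branch. It has to create a new commit, as both branches have changes since their common ancestor that must be combined.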

      / - B - D (B1)
    - A      /
      \ --- C (B2)

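At this point D (B1) has both sets of changes from both branches (itself and B2). However, the second branch doesn't have the changes from B1. If you merge the changes from B1 into B2 so that they are synchronised, you might expect something that looks like this (you can, however, force git merge to do it this way with --no-ff):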

Expected:
      / - B - D (B1)
    - A      / \
      \ --- C - E (B2)
Reality:
      / - B - D (B1) (B2)
    - A      /
      \ --- C

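You will get that even if B1 has additional commits. As long as there are no changes in B2 that B1 doesn't have, the two branches will simply be brought to the same commit. Git does a fast-forward, which is like a rebase (rebases also eat or linearise history), except that, because only one branch has new changes, it doesn't have to apply a changeset from one branch on top of one from the other.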

From:
      / - B - D - E (B1)
    - A      /
      \ --- C (B2)
To:
      / - B - D - E (B1) (B2)
    - A      /
      \ --- C
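The difference between the two outcomes is easy to reproduce. The script below is an illustrative sketch in a throwaway repository, with a simplified history (B2 sits on an ancestor of B1 rather than on a previously merged commit); all names are made up for the demo.

```shell
# Self-contained demo in a temporary repository (all names are illustrative).
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "A"
fork=$(git rev-parse HEAD)

git checkout -q -b B1
git commit -q --allow-empty -m "D"
git commit -q --allow-empty -m "E"
git branch B2 "$fork"           # B2 contains nothing that B1 lacks

git checkout -q B2
git merge -q B1                 # default: B2 is fast-forwarded to B1's tip
after_ff=$(git rev-parse B2)    # same commit as B1, no merge commit exists

git reset -q --hard "$fork"     # put B2 back where it started
git merge -q --no-ff -m "Merge B1 into B2" B1
after_noff=$(git rev-parse B2)      # a new merge commit
second_parent=$(git rev-parse B2^2) # its second parent is B1's tip
```

After the plain merge, B2 and B1 are indistinguishable; after the `--no-ff` merge, the history keeps a record that something was merged into B2. Setting `merge.ff` to `false` with `git config` makes the latter the default for a repository.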

If you cease work on B2 then things are largely fine for preserving history in the long run. Typically only B1 (which might be master) will advance, so the location of B2 in B1's history successfully represents the point at which it was merged into B1. This is what git expects you to do: branch B from A, then merge A into B as much as you like as changes accumulate; but once you merge B back into A, it's not expected that you will work on B any further. If you carry on working on your branch after fast-forward merging it back into the branch it came from, then you're erasing B's previous history each time. You're really creating a new branch each time you fast-forward into the source and then commit to the branch again. What you end up with when you fast-forward is lots of branches and merges that you can see in the history and structure, but without the ability to determine what the name of each branch was, or whether what looks like two separate branches is really the same branch.

         0   1   2   3   4 (B1)
        /-\ /-\ /-\ /-\ /
    ----   -   -   -   -
        \-/ \-/ \-/ \-/ \
         5   6   7   8   9 (B2)

0 to 3 and 5 to 8 are structural branches that show up if you follow the history of either 4 or 9. There's no way in git to know which of these unnamed and unreferenced structural branches belongs to which of the named and referenced branches at the end of the structure. You might assume from this drawing that 0 to 4 belong to B1 and 5 to 9 belong to B2, but apart from 4 and 9 we can't know which structural branch belongs to which named branch; I've simply drawn it in a way that gives that illusion. 0 might belong to B2 and 5 might belong to B1. In this case there are 16 different possibilities for which named branch each of the structural branches could belong to. This assumes that none of these structural branches came from a deleted branch or from merging a branch into itself when pulling from master (the same branch name on two repos is in fact two branches; a separate repository is like branching all branches).

There are a number of git strategies that work around this. You can force git merge to never fast-forward and always create a merge commit. A horrible way to preserve branch history is with tags and/or branches (tags are really recommended) according to some convention of your choosing. I really wouldn't recommend a dummy empty commit in the branch you're merging into. A very common convention is to not merge into an integration branch until you genuinely want to close your branch. This is a practice people should attempt to adhere to, as otherwise you're working around the point of having branches. However, in the real world the ideal is not always practical, meaning that doing the right thing is not viable in every situation. If what you're doing on a branch is isolated, that can work; otherwise you might be in a situation where multiple developers working on something need to share their changes quickly (ideally you might really want to be working on one branch, but not all situations suit that either, and generally two people working on a branch is something you want to avoid).