我是否可以调用一个命令来计算Git存储库中特定作者更改的行数?我知道一定有方法来计算提交的数量,因为Github为他们的影响图这样做。


当前回答

我对上面的一个简短的回答作了修改,但这不足以满足我的需要。我需要能够对提交的行和最终代码中的行进行分类。我还想按文件进行细分。这段代码不递归,它只返回单个目录的结果,但如果有人想进一步了解,这是一个很好的开始。复制并粘贴到文件中并使其可执行或使用Perl运行。

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my $dir = shift;

die "Please provide a directory name to check\n"
    unless $dir;

chdir $dir
    or die "Failed to enter the specified directory '$dir': $!\n";

if ( ! open(GIT_LS,'-|','git ls-files') ) {
    die "Failed to process 'git ls-files': $!\n";
}
my %stats;
while (my $file = <GIT_LS>) {
    chomp $file;
    if ( ! open(GIT_LOG,'-|',"git log --numstat $file") ) {
        die "Failed to process 'git log --numstat $file': $!\n";
    }
    my $author;
    while (my $log_line = <GIT_LOG>) {
        if ( $log_line =~ m{^Author:\s*([^<]*?)\s*<([^>]*)>} ) {
            $author = lc($1);
        }
        elsif ( $log_line =~ m{^(\d+)\s+(\d+)\s+(.*)} ) {
            my $added = $1;
            my $removed = $2;
            my $file = $3;
            $stats{total}{by_author}{$author}{added}        += $added;
            $stats{total}{by_author}{$author}{removed}      += $removed;
            $stats{total}{by_author}{total}{added}          += $added;
            $stats{total}{by_author}{total}{removed}        += $removed;

            $stats{total}{by_file}{$file}{$author}{added}   += $added;
            $stats{total}{by_file}{$file}{$author}{removed} += $removed;
            $stats{total}{by_file}{$file}{total}{added}     += $added;
            $stats{total}{by_file}{$file}{total}{removed}   += $removed;
        }
    }
    close GIT_LOG;

    if ( ! open(GIT_BLAME,'-|',"git blame -w $file") ) {
        die "Failed to process 'git blame -w $file': $!\n";
    }
    while (my $log_line = <GIT_BLAME>) {
        if ( $log_line =~ m{\((.*?)\s+\d{4}} ) {
            my $author = $1;
            $stats{final}{by_author}{$author}     ++;
            $stats{final}{by_file}{$file}{$author}++;

            $stats{final}{by_author}{total}       ++;
            $stats{final}{by_file}{$file}{total}  ++;
            $stats{final}{by_file}{$file}{total}  ++;
        }
    }
    close GIT_BLAME;
}
close GIT_LS;

print "Total lines committed by author by file\n";
printf "%25s %25s %8s %8s %9s\n",'file','author','added','removed','pct add';
foreach my $file (sort keys %{$stats{total}{by_file}}) {
    printf "%25s %4.0f%%\n",$file
            ,100*$stats{total}{by_file}{$file}{total}{added}/$stats{total}{by_author}{total}{added};
    foreach my $author (sort keys %{$stats{total}{by_file}{$file}}) {
        next if $author eq 'total';
        if ( $stats{total}{by_file}{$file}{total}{added} ) {
            printf "%25s %25s %8d %8d %8.0f%%\n",'', $author,@{$stats{total}{by_file}{$file}{$author}}{qw{added removed}}
            ,100*$stats{total}{by_file}{$file}{$author}{added}/$stats{total}{by_file}{$file}{total}{added};
        } else {
            printf "%25s %25s %8d %8d\n",'', $author,@{$stats{total}{by_file}{$file}{$author}}{qw{added removed}} ;
        }
    }
}
print "\n";

print "Total lines in the final project by author by file\n";
printf "%25s %25s %8s %9s %9s\n",'file','author','final','percent', '% of all';
foreach my $file (sort keys %{$stats{final}{by_file}}) {
    printf "%25s %4.0f%%\n",$file
            ,100*$stats{final}{by_file}{$file}{total}/$stats{final}{by_author}{total};
    foreach my $author (sort keys %{$stats{final}{by_file}{$file}}) {
        next if $author eq 'total';
        printf "%25s %25s %8d %8.0f%% %8.0f%%\n",'', $author,$stats{final}{by_file}{$file}{$author}
            ,100*$stats{final}{by_file}{$file}{$author}/$stats{final}{by_file}{$file}{total}
            ,100*$stats{final}{by_file}{$file}{$author}/$stats{final}{by_author}{total}
        ;
    }
}
print "\n";


print "Total lines committed by author\n";
printf "%25s %8s %8s %9s\n",'author','added','removed','pct add';
foreach my $author (sort keys %{$stats{total}{by_author}}) {
    next if $author eq 'total';
    printf "%25s %8d %8d %8.0f%%\n",$author,@{$stats{total}{by_author}{$author}}{qw{added removed}}
        ,100*$stats{total}{by_author}{$author}{added}/$stats{total}{by_author}{total}{added};
};
print "\n";


print "Total lines in the final project by author\n";
printf "%25s %8s %9s\n",'author','final','percent';
foreach my $author (sort keys %{$stats{final}{by_author}}) {
    printf "%25s %8d %8.0f%%\n",$author,$stats{final}{by_author}{$author}
        ,100*$stats{final}{by_author}{$author}/$stats{final}{by_author}{total};
}

其他回答

这个脚本可以做到这一点。把它放入authorship.sh, chmod +x,就完成了。

#!/bin/sh
declare -A map
while read line; do
    if grep "^[a-zA-Z]" <<< "$line" > /dev/null; then
        current="$line"
        if [ -z "${map[$current]}" ]; then 
            map[$current]=0
        fi
    elif grep "^[0-9]" <<<"$line" >/dev/null; then
        for i in $(cut -f 1,2 <<< "$line"); do
            map[$current]=$((map[$current] + $i))
        done
    fi
done <<< "$(git log --numstat --pretty="%aN")"

for i in "${!map[@]}"; do
    echo -e "$i:${map[$i]}"
done | sort -nr -t ":" -k 2 | column -t -s ":"

除了Charles Bailey的回答之外,您可能还想在命令中添加-C参数。否则,即使文件内容没有被修改,文件重命名也会被视为大量的添加和删除(与文件的行数一样多)。

为了说明,当使用git log——oneline——shortstat命令时,这里有一个从我的一个项目中移动的大量文件的提交:

9052459 Reorganized project structure
 43 files changed, 1049 insertions(+), 1000 deletions(-)

这里使用git log——oneline——shortstat -C命令来检测文件的复制和重命名:

9052459 Reorganized project structure
 27 files changed, 134 insertions(+), 85 deletions(-)

在我看来,后者给出了一个人对项目有多大影响的更现实的观点,因为重命名一个文件比从头开始写文件要小得多。

这是一个伟大的回购,使您的生活更容易

git-quick-stats

在安装了brew的mac上

酿造安装git-quick-stats

Run

git-quick-stats

只需从这个列表中输入列出的数字并按enter键,选择您想要的选项。

 Generate:
    1) Contribution stats (by author)
    2) Contribution stats (by author) on a specific branch
    3) Git changelogs (last 10 days)
    4) Git changelogs by author
    5) My daily status
    6) Save git log output in JSON format

 List:
    7) Branch tree view (last 10)
    8) All branches (sorted by most recent commit)
    9) All contributors (sorted by name)
   10) Git commits per author
   11) Git commits per date
   12) Git commits per month
   13) Git commits per weekday
   14) Git commits per hour
   15) Git commits by author per hour

 Suggest:
   16) Code reviewers (based on git history)

我发现下面的方法对于查看当前代码库中谁拥有最多的行很有用:

git ls-files -z | xargs -0n1 git blame -w | ruby -n -e '$_ =~ /^.*\((.*?)\s[\d]{4}/; puts $1.strip' | sort -f | uniq -c | sort -n

其他答案主要集中在提交中更改的行,但如果提交无法存活并被覆盖,则它们可能只是被更改了。上面的咒语还可以让您按行对所有提交者进行排序,而不是一次只排序一个。您可以向git blame (-C -M)添加一些选项,以获得一些更好的数字,将文件移动和文件之间的行移动考虑在内,但如果这样做,该命令可能会运行更长时间。

同样,如果你正在为所有提交者寻找在所有提交中更改的行,下面的小脚本很有帮助:

http://git-wt-commit.rubyforge.org/#git-rank-contributors

要统计给定作者(或所有作者)在给定分支上提交的数量,可以使用git-shortlog;特别是它的——编号和——摘要选项,例如在git存储库上运行时:

$ git shortlog v1.6.4 --numbered --summary
  6904  Junio C Hamano
  1320  Shawn O. Pearce
  1065  Linus Torvalds
    692  Johannes Schindelin
    443  Eric Wong