当你需要在不同的列上应用不同的聚合函数(并且你必须/想要坚持以R为基底)时,我发现它非常有用(并且有效):
e.g.
假设输入如下:
DF <-
data.frame(Categ1=factor(c('A','A','B','B','A','B','A')),
Categ2=factor(c('X','Y','X','X','X','Y','Y')),
Samples=c(1,2,4,3,5,6,7),
Freq=c(10,30,45,55,80,65,50))
> DF
Categ1 Categ2 Samples Freq
1 A X 1 10
2 A Y 2 30
3 B X 4 45
4 B X 3 55
5 A X 5 80
6 B Y 6 65
7 A Y 7 50
我们要按1类和2类进行分组,并计算Freq的样本和均值。
下面是使用ave的一个可能的解决方案:
# create a copy of DF (only the grouping columns)
DF2 <- DF[,c('Categ1','Categ2')]
# add sum of Samples by Categ1,Categ2 to DF2
# (ave repeats the sum of the group for each row in the same group)
DF2$GroupTotSamples <- ave(DF$Samples,DF2,FUN=sum)
# add mean of Freq by Categ1,Categ2 to DF2
# (ave repeats the mean of the group for each row in the same group)
DF2$GroupAvgFreq <- ave(DF$Freq,DF2,FUN=mean)
# remove the duplicates (keep only one row for each group)
DF2 <- DF2[!duplicated(DF2),]
结果:
> DF2
Categ1 Categ2 GroupTotSamples GroupAvgFreq
1 A X 6 45
2 A Y 9 40
3 B X 7 50
6 B Y 6 65