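I have a data frame with two columns. The first column contains categories such as "First", "Second", "Third", and the second column contains numbers giving the number of times I saw each group from "Category".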

For example:

Category     Frequency
First        10
First        15
First        5
Second       2
Third        14
Third        20
Second       3

I want to sort the data by Category and sum all the frequencies:

Category     Frequency
First        30
Second       5
Third        34

How would I do this in R?


Current answer

As of dplyr 1.0.0, the across() function can be used:

df %>%
 group_by(Category) %>%
 summarise(across(Frequency, sum))

  Category Frequency
  <chr>        <int>
1 First           30
2 Second           5
3 Third           34

If you are interested in multiple variables:

df %>%
 group_by(Category) %>%
 summarise(across(c(Frequency, Frequency2), sum))

  Category Frequency Frequency2
  <chr>        <int>      <int>
1 First           30         55
2 Second           5         29
3 Third           34        190

And selecting the variables with a select helper:

df %>%
 group_by(Category) %>%
 summarise(across(starts_with("Freq"), sum))

  Category Frequency Frequency2 Frequency3
  <chr>        <int>      <int>      <dbl>
1 First           30         55        110
2 Second           5         29         58
3 Third           34        190        380

Sample data:

df <- read.table(text = "Category Frequency Frequency2 Frequency3
                 1    First        10         10         20
                 2    First        15         30         60
                 3    First         5         15         30
                 4   Second         2          8         16
                 5    Third        14         70        140
                 6    Third        20        120        240
                 7   Second         3         21         42",
                 header = TRUE,
                 stringsAsFactors = FALSE)
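
The sample data above has no missing values. As a minimal sketch of how the across() approach could be adapted when a column does contain NA, a lambda can pass na.rm = TRUE to sum(). The data frame df_na below is hypothetical and not part of the original answer.

```r
library(dplyr)

# Hypothetical data with one missing Frequency (an assumption for illustration)
df_na <- data.frame(
  Category  = c("First", "First", "Second", "Third", "Third"),
  Frequency = c(10L, NA, 2L, 14L, 20L)
)

# The lambda forwards na.rm = TRUE so the NA is skipped instead of
# turning the whole group sum into NA
result <- df_na %>%
  group_by(Category) %>%
  summarise(across(Frequency, ~ sum(.x, na.rm = TRUE)))

print(result)
```

Without na.rm = TRUE, the sum for "First" in this hypothetical data would be NA.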

Other answers

Using aggregate:

aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34

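In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be combined via cbind: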

aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
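
The call above is elided in the original. As a rough sketch of what a complete cbind call might look like, the example below sums two columns at once. The data frame x2 and its Metric2 values are hypothetical, made up only for illustration.

```r
# Hypothetical data: Metric2 is an assumed second numeric column
x2 <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = c(1, 2, 3, 4, 5, 6, 7)
)

# cbind() builds a two-column matrix; naming the arguments keeps
# readable column names in the aggregated result
agg <- aggregate(cbind(Frequency = x2$Frequency, Metric2 = x2$Metric2),
                 by = list(Category = x2$Category),
                 FUN = sum)

print(agg)
```

Naming the cbind arguments is a design choice here: unnamed columns would otherwise come back with generic names such as V1 and V2.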

(embedding @thelatemail's comment) aggregate also has a formula interface:

aggregate(Frequency ~ Category, x, sum)

Or, if you want to aggregate multiple columns, you can use the . notation (it works for a single column too):

aggregate(. ~ Category, x, sum)

Or tapply:

tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third 
    30      5     34 

Using this data:

x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                      "Third", "Third", "Second")), 
                    Frequency=c(10,15,5,2,14,20,3))
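
Note that tapply returns a named array rather than a data frame. If a data frame like the one in the question is wanted, one possible way to convert the result is sketched below. The variable names sums and result are illustrative, and the data is repeated so the snippet runs on its own.

```r
# Same data as above, repeated so this snippet is self-contained
x <- data.frame(Category = factor(c("First", "First", "First", "Second",
                                    "Third", "Third", "Second")),
                Frequency = c(10, 15, 5, 2, 14, 20, 3))

# tapply() gives a one-dimensional array named by the factor levels
sums <- tapply(x$Frequency, x$Category, FUN = sum)

# Rebuild a two-column data frame from the names and the values
result <- data.frame(Category = names(sums), Frequency = as.vector(sums))

print(result)
```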

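Although I have recently converted to dplyr for most of these types of operations, the sqldf package is still really nice (and, IMHO, more readable) for some things.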

Here is an example of how this question can be answered with sqldf:

x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                  "Third", "Third", "Second")), 
                Frequency=c(10,15,5,2,14,20,3))

library(sqldf)  # provides sqldf()

sqldf("select 
          Category
          ,sum(Frequency) as Frequency 
       from x 
       group by 
          Category")

##   Category Frequency
## 1    First        30
## 2   Second         5
## 3    Third        34

library(plyr)
ddply(tbl, .(Category), summarise, sum = sum(Frequency))
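
The ddply call above assumes a data frame named tbl that the answer does not define. A self-contained sketch, assuming tbl holds the data from the question, might look like this:

```r
library(plyr)

# Assumption: tbl is the data frame from the question
tbl <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# Split tbl by Category, sum Frequency within each piece, recombine
res <- ddply(tbl, .(Category), summarise, sum = sum(Frequency))

print(res)
```

The result column is called sum because that is the name given in the summarise call.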

Using cast instead of recast (note that 'Frequency' is now 'value'):

df  <- data.frame(Category = c("First","First","First","Second","Third","Third","Second")
                  , value = c(10,15,5,2,14,20,3))

install.packages("reshape")
library(reshape)  # cast() comes from the reshape package

result<-cast(df, Category ~ . ,fun.aggregate=sum)

This gives:

Category (all)
First     30
Second    5
Third     34

Just to add a third option:

require(doBy)
summaryBy(Frequency~Category, data=yourdataframe, FUN=sum)

EDIT: This is a very old answer. Now I would recommend using group_by and summarise from dplyr, as in @docendo's answer.
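
For reference, a minimal sketch of the group_by and summarise approach is given below. It is not copied from the referenced answer; it simply assumes the data frame from the question is stored in df.

```r
library(dplyr)

# Assumption: df is the data frame from the question
df <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# One output row per Category, holding the sum of its Frequency values
result <- df %>%
  group_by(Category) %>%
  summarise(Frequency = sum(Frequency))

print(result)
```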