我有一个有两列的数据帧。第一列包含类别,如“第一”,“第二”,“第三”,第二列有数字,表示我从“类别”中看到特定组的次数。
例如:
Category Frequency
First 10
First 15
First 5
Second 2
Third 14
Third 20
Second 3
我想按类别对数据进行排序,并将所有频率相加:
Category Frequency
First 30
Second 5
Third 34
在R中怎么做呢?
虽然我最近对大多数这些类型的操作都转换为dplyr,但sqldf包对于某些事情仍然非常好(恕我直言,可读性更强)。
下面是一个示例,说明如何使用sqldf回答这个问题
x <- data.frame(Category=factor(c("First", "First", "First", "Second",
"Third", "Third", "Second")),
Frequency=c(10,15,5,2,14,20,3))
sqldf("select
Category
,sum(Frequency) as Frequency
from x
group by
Category")
## Category Frequency
## 1 First 30
## 2 Second 5
## 3 Third 34
从dplyr 1.0.0开始,可以使用across()函数:
df %>%
group_by(Category) %>%
summarise(across(Frequency, sum))
Category Frequency
<chr> <int>
1 First 30
2 Second 5
3 Third 34
如果对多个变量感兴趣:
df %>%
group_by(Category) %>%
summarise(across(c(Frequency, Frequency2), sum))
Category Frequency Frequency2
<chr> <int> <int>
1 First 30 55
2 Second 5 29
3 Third 34 190
以及使用select helper来选择变量:
df %>%
group_by(Category) %>%
summarise(across(starts_with("Freq"), sum))
Category Frequency Frequency2 Frequency3
<chr> <int> <int> <dbl>
1 First 30 55 110
2 Second 5 29 58
3 Third 34 190 380
样本数据:
df <- read.table(text = "Category Frequency Frequency2 Frequency3
1 First 10 10 20
2 First 15 30 60
3 First 5 15 30
4 Second 2 8 16
5 Third 14 70 140
6 Third 20 120 240
7 Second 3 21 42",
header = TRUE,
stringsAsFactors = FALSE)