I am working with a data frame like this:
set.seed(100)
df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)), val = runif(15))
df <- df[order(df$cat, df$val), ]
df
cat val
1 aaa 0.05638315
2 aaa 0.25767250
3 aaa 0.30776611
4 aaa 0.46854928
5 aaa 0.55232243
6 bbb 0.17026205
7 bbb 0.37032054
8 bbb 0.48377074
9 bbb 0.54655860
10 bbb 0.81240262
11 ccc 0.28035384
12 ccc 0.39848790
13 ccc 0.62499648
14 ccc 0.76255108
15 ccc 0.88216552
I am trying to add a column that numbers the rows within each group. Doing it this way obviously does not use the power of R:
df$num <- 1
for (i in 2:(length(df[,1]))) {
if (df[i,"cat"]==df[(i-1),"cat"]) {
df[i,"num"]<-df[i-1,"num"]+1
}
}
df
cat val num
1 aaa 0.05638315 1
2 aaa 0.25767250 2
3 aaa 0.30776611 3
4 aaa 0.46854928 4
5 aaa 0.55232243 5
6 bbb 0.17026205 1
7 bbb 0.37032054 2
8 bbb 0.48377074 3
9 bbb 0.54655860 4
10 bbb 0.81240262 5
11 ccc 0.28035384 1
12 ccc 0.39848790 2
13 ccc 0.62499648 3
14 ccc 0.76255108 4
15 ccc 0.88216552 5
What would be a good way to do this?
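I would like to add a data.table variant that uses the rank() function. It offers the additional possibility of changing the ordering, which makes it more flexible than the seq_len() solution and quite similar to the row_number function in an RDBMS.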
# Variant with ascending ordering
library(data.table)
dt <- data.table(df)
dt[, .( val
, num = rank(val))
, by = list(cat)][order(cat, num),]
cat val num
1: aaa 0.05638315 1
2: aaa 0.25767250 2
3: aaa 0.30776611 3
4: aaa 0.46854928 4
5: aaa 0.55232243 5
6: bbb 0.17026205 1
7: bbb 0.37032054 2
8: bbb 0.48377074 3
9: bbb 0.54655860 4
10: bbb 0.81240262 5
11: ccc 0.28035384 1
12: ccc 0.39848790 2
13: ccc 0.62499648 3
14: ccc 0.76255108 4
15: ccc 0.88216552 5
# Variant with descending ordering
dt[, .( val
, num = rank(dplyr::desc(val)))  # desc() comes from dplyr, not data.table
, by = list(cat)][order(cat, num),]
Edited on 2021-04-16 to make switching between descending and ascending ordering more fail-safe.
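To make the comparison between positional numbering and rank() more concrete, here is a minimal sketch in base R only (ave() is my choice for the illustration, not something the answer above uses). It assumes the example data from the question; the column names num_seq, num_asc and num_desc are hypothetical. It also shows one caveat: with its default ties.method, rank() gives tied values a shared fractional rank, so ties.method = "first" is probably closer to what row_number does.

```r
# Rebuild the example data from the question.
set.seed(100)
df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)), val = runif(15))
df <- df[order(df$cat, df$val), ]

# Positional numbering: the position of each row inside its group.
# This depends on df already being sorted by cat and val.
df$num_seq <- ave(df$val, df$cat, FUN = seq_along)

# Rank-based numbering: independent of the current row order.
# ties.method = "first" yields distinct integers even for tied values.
df$num_asc  <- ave(df$val,  df$cat, FUN = function(x) rank(x, ties.method = "first"))
# Negating the values reverses the ordering (fine for numeric columns).
df$num_desc <- ave(-df$val, df$cat, FUN = function(x) rank(x, ties.method = "first"))

# Tie behaviour on a small hypothetical vector.
rank(c(0.2, 0.5, 0.2))                         # 1.5 3.0 1.5
rank(c(0.2, 0.5, 0.2), ties.method = "first")  # 1 3 2
```

Since the example data has no ties and is already sorted, num_seq and num_asc agree here; they would differ as soon as the rows were shuffled or a group contained duplicate values.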