这个主题几乎耗尽了,但我想提供一个稍微更一般的版本,你不知道输出列的数量,一个先验的解决方案。举个例子
before = data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2', 'foo_and_bar_2_and_bar_3', 'foo_and_bar'))
attr type
1 1 foo_and_bar
2 30 foo_and_bar_2
3 4 foo_and_bar_2_and_bar_3
4 6 foo_and_bar
我们不能使用dplyr separate(),因为在分割之前我们不知道结果列的数量,所以我随后创建了一个使用stringr分割列的函数,并给出了所生成列的模式和名称前缀。我希望使用的编码模式是正确的。
split_into_multiple <- function(column, pattern = ", ", into_prefix){
cols <- str_split_fixed(column, pattern, n = Inf)
# Sub out the ""'s returned by filling the matrix to the right, with NAs which are useful
cols[which(cols == "")] <- NA
cols <- as.tibble(cols)
# name the 'cols' tibble as 'into_prefix_1', 'into_prefix_2', ..., 'into_prefix_m'
# where m = # columns of 'cols'
m <- dim(cols)[2]
names(cols) <- paste(into_prefix, 1:m, sep = "_")
return(cols)
}
然后我们可以在dplyr管道中使用split_into_multiple,如下所示:
after <- before %>%
bind_cols(split_into_multiple(.$type, "_and_", "type")) %>%
# selecting those that start with 'type_' will remove the original 'type' column
select(attr, starts_with("type_"))
>after
attr type_1 type_2 type_3
1 1 foo bar <NA>
2 30 foo bar_2 <NA>
3 4 foo bar_2 bar_3
4 6 foo bar <NA>
然后我们可以用gather来整理…
after %>%
gather(key, val, -attr, na.rm = T)
attr key val
1 1 type_1 foo
2 30 type_1 foo
3 4 type_1 foo
4 6 type_1 foo
5 1 type_2 bar
6 30 type_2 bar_2
7 4 type_2 bar_2
8 6 type_2 bar
11 4 type_3 bar_3