我有麻烦重新安排以下数据帧:
set.seed(45)
dat1 <- data.frame(
name = rep(c("firstName", "secondName"), each=4),
numbers = rep(1:4, 2),
value = rnorm(8)
)
dat1
name numbers value
1 firstName 1 0.3407997
2 firstName 2 -0.7033403
3 firstName 3 -0.3795377
4 firstName 4 -0.7460474
5 secondName 1 -0.8981073
6 secondName 2 -0.3347941
7 secondName 3 -0.5013782
8 secondName 4 -0.1745357
我想重塑它,以便每个唯一的“name”变量都是一个行名,“值”作为该行的观察值,“数字”作为冒号。就像这样:
name 1 2 3 4
1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
我试过熔化和铸造,还有其他一些方法,但似乎都不行。
Win-Vector公司的天才数据科学家(他们制作了vtreat、seplyr和replyr)推出了一个非常强大的新软件包,名为cdata。它实现了本文和本文中描述的“协调数据”原则。其思想是,无论如何组织数据,都应该能够使用“数据坐标”系统识别单个数据点。下面是约翰·芒特最近博客文章的节选:
The whole system is based on two primitives or operators
cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
operators have pivot, un-pivot, one-hot encode, transpose, moving
multiple rows and columns, and many other transforms as simple special
cases.
It is easy to write many different operations in terms of the
cdata primitives. These operators can work-in memory or at big data
scale (with databases and Apache Spark; for big data use the
cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
variants). The transforms are controlled by a control table that
itself is a diagram of (or picture of) the transform.
我们将首先构建控制表(有关详细信息,请参阅博客文章),然后执行数据从行到列的移动。
library(cdata)
# first build the control table
pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
columnToTakeKeysFrom = 'numbers', # this will become column headers
columnToTakeValuesFrom = 'value', # this contains data
sep="_") # optional for making column names
# perform the move of data to columns
dat_wide <- moveValuesToColumnsD(tallTable = dat1, # reference to dataset
keyColumns = c('name'), # this(these) column(s) should stay untouched
controlTable = pivotControlTable# control table above
)
dat_wide
#> name numbers_1 numbers_2 numbers_3 numbers_4
#> 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
#> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
只使用dplyr和map。
library(dplyr)
library(purrr)
set.seed(45)
dat1 <- data.frame(
name = rep(c("firstName", "secondName"), each=4),
numbers = rep(1:4, 2), value = rnorm(8)
)
longer_to_wider <- function(data, name_from, value_from){
group <- colnames(data)[!(colnames(data) %in% c(name_from,value_from))]
data %>% group_by(.data[[group]]) %>%
summarise( name = list(.data[[name_from]]),
value = list(.data[[value_from]])) %>%
{
d <- data.frame(
name = .[[name_from]] %>% unlist() %>% unique()
)
e <- map_dfc(.[[group]],function(x){
y <- data_frame(
x = data %>% filter(.data[[group]] == x) %>% pull(value_from)
)
colnames(y) <- x
y
})
cbind(d,e)
}
}
longer_to_wider(dat1, "name", "value")
# name 1 2 3 4
# 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
# 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
简单多了!
devtools::install_github("yikeshu0611/onetree") #install onetree package
library(onetree)
widedata=reshape_toWide(data = dat1,id = "name",j = "numbers",value.var.prefix = "value")
widedata
name value1 value2 value3 value4
firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
如果你想从宽返回到长,只改变宽为长,不改变对象。
reshape_toLong(data = widedata,id = "name",j = "numbers",value.var.prefix = "value")
name numbers value
firstName 1 0.3407997
secondName 1 -0.8981073
firstName 2 -0.7033403
secondName 2 -0.3347941
firstName 3 -0.3795377
secondName 3 -0.5013782
firstName 4 -0.7460474
secondName 4 -0.1745357