我有麻烦重新安排以下数据帧:
set.seed(45)
dat1 <- data.frame(
name = rep(c("firstName", "secondName"), each=4),
numbers = rep(1:4, 2),
value = rnorm(8)
)
dat1
name numbers value
1 firstName 1 0.3407997
2 firstName 2 -0.7033403
3 firstName 3 -0.3795377
4 firstName 4 -0.7460474
5 secondName 1 -0.8981073
6 secondName 2 -0.3347941
7 secondName 3 -0.5013782
8 secondName 4 -0.1745357
我想重塑它,以便每个唯一的“name”变量都是一个行名,“值”作为该行的观察值,“数字”作为冒号。就像这样:
name 1 2 3 4
1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
我试过熔化和铸造,还有其他一些方法,但似乎都不行。
其他两种选择:
基本包:
df <- unstack(dat1, form = value ~ numbers)
rownames(df) <- unique(dat1$name)
df
sqldf包:
library(sqldf)
sqldf('SELECT name,
MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1,
MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
FROM dat1
GROUP BY name')
Win-Vector公司的天才数据科学家(他们制作了vtreat、seplyr和replyr)推出了一个非常强大的新软件包,名为cdata。它实现了本文和本文中描述的“协调数据”原则。其思想是,无论如何组织数据,都应该能够使用“数据坐标”系统识别单个数据点。下面是约翰·芒特最近博客文章的节选:
The whole system is based on two primitives or operators
cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
operators have pivot, un-pivot, one-hot encode, transpose, moving
multiple rows and columns, and many other transforms as simple special
cases.
It is easy to write many different operations in terms of the
cdata primitives. These operators can work-in memory or at big data
scale (with databases and Apache Spark; for big data use the
cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
variants). The transforms are controlled by a control table that
itself is a diagram of (or picture of) the transform.
我们将首先构建控制表(有关详细信息,请参阅博客文章),然后执行数据从行到列的移动。
library(cdata)
# first build the control table
pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
columnToTakeKeysFrom = 'numbers', # this will become column headers
columnToTakeValuesFrom = 'value', # this contains data
sep="_") # optional for making column names
# perform the move of data to columns
dat_wide <- moveValuesToColumnsD(tallTable = dat1, # reference to dataset
keyColumns = c('name'), # this(these) column(s) should stay untouched
controlTable = pivotControlTable# control table above
)
dat_wide
#> name numbers_1 numbers_2 numbers_3 numbers_4
#> 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
#> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
对于tidyr,有pivot_wider()和pivot_longer(),它们分别被广义为从long -> wide或wide -> long进行重塑。使用OP的数据:
单列长>宽
library(tidyr)
dat1 %>%
pivot_wider(names_from = numbers, values_from = value)
# # A tibble: 2 x 5
# name `1` `2` `3` `4`
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 firstName 0.341 -0.703 -0.380 -0.746
# 2 secondName -0.898 -0.335 -0.501 -0.175
多列长>宽
Pivot_wider()还能够执行更复杂的枢轴操作。例如,你可以同时对多个列进行主元操作:
# create another column for showing the functionality
dat2 <- dat1 %>%
dplyr::rename(valA = value) %>%
dplyr::mutate(valB = valA * 2)
dat2 %>%
pivot_wider(names_from = numbers, values_from = c(valA, valB))
# # A tibble: 2 × 9
# name valA_1 valA_2 valA_3 valA_4 valB_1 valB_2 valB_3 valB_4
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 firstName 0.341 -0.703 -0.380 -0.746 0.682 -1.41 -0.759 -1.49
# 2 secondName -0.898 -0.335 -0.501 -0.175 -1.80 -0.670 -1.00 -0.349
在文档中可以找到更多的功能。