I'm having trouble rearranging the following data frame:

set.seed(45)
dat1 <- data.frame(
    name = rep(c("firstName", "secondName"), each=4),
    numbers = rep(1:4, 2),
    value = rnorm(8)
    )

dat1
       name  numbers      value
1  firstName       1  0.3407997
2  firstName       2 -0.7033403
3  firstName       3 -0.3795377
4  firstName       4 -0.7460474
5 secondName       1 -0.8981073
6 secondName       2 -0.3347941
7 secondName       3 -0.5013782
8 secondName       4 -0.1745357

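I want to reshape it so that each unique "name" variable is a row name, with the "value"s as observations along that row and the "numbers" as column names. Something like this: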

     name          1          2          3         4
1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

I've tried melt and cast, and a few other things, but none seem to do the job.


Current answer

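The talented data scientists at Win-Vector (the people who made vtreat, seplyr and replyr) have released a very powerful new package called cdata. It implements the "coordinated data" principles described in the linked article and blog post. The idea is that, however you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here is an excerpt from a recent blog post by John Mount: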

The whole system is based on two primitives or operators cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These operators have pivot, un-pivot, one-hot encode, transpose, moving multiple rows and columns, and many other transforms as simple special cases. It is easy to write many different operations in terms of the cdata primitives. These operators can work-in memory or at big data scale (with databases and Apache Spark; for big data use the cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN() variants). The transforms are controlled by a control table that itself is a diagram of (or picture of) the transform.

We will first build the control table (see the blog post for details) and then perform the move of data from rows to columns.

library(cdata)
# first build the control table
pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
                        columnToTakeKeysFrom = 'numbers', # this will become column headers
                        columnToTakeValuesFrom = 'value', # this contains data
                        sep="_")                          # optional for making column names

# perform the move of data to columns
dat_wide <- moveValuesToColumnsD(tallTable =  dat1, # reference to dataset
                    keyColumns = c('name'),         # this(these) column(s) should stay untouched 
                    controlTable = pivotControlTable# control table above
                    ) 
dat_wide

#>         name  numbers_1  numbers_2  numbers_3  numbers_4
#> 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
#> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

Other answers

The base reshape function works perfectly fine:

df <- data.frame(
  year   = c(rep(2000, 12), rep(2001, 12)),
  month  = rep(1:12, 2),
  values = rnorm(24)
)
df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
df_wide

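where:

idvar is the column of classes that separates rows
timevar is the column of classes to cast wide
v.names is the column containing the numeric values
direction specifies wide or long format
sep (optional) is the separator placed between v.names and the timevar class names in the output data.frame (for example values_1)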
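To make those arguments concrete, here is a minimal sketch on a small hand-made data frame. The name df_small and its values are invented for illustration and are not part of the original answer. It shows how v.names, sep and the timevar values combine into the wide column names.

```r
# Hypothetical toy data: two years, three months each, fixed values
df_small <- data.frame(
  year   = rep(c(2000, 2001), each = 3),
  month  = rep(1:3, 2),
  values = c(10, 20, 30, 40, 50, 60)
)

# One row per year (idvar), one column per month (timevar)
df_small_wide <- reshape(df_small,
                         idvar     = "year",
                         timevar   = "month",
                         v.names   = "values",
                         direction = "wide",
                         sep       = "_")

names(df_small_wide)    # "year" "values_1" "values_2" "values_3"
df_small_wide$values_2  # 20 50
```

Each wide column name is built as v.names, then sep, then the timevar value, which is why the month-2 column is called values_2.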

If no idvar exists, create one before using the reshape() function:

df$id   <- c(rep("year1", 12), rep("year2", 12))
df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
df_wide

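Just remember that idvar is required! The timevar and v.names parts are easy. The output of this function is more predictable than that of some of the others, because everything is explicitly defined.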

If performance is a concern, another option is to use data.table's extension of reshape2's melt and dcast functions

(Reference: Efficient reshaping using data.tables)

library(data.table)

setDT(dat1)
dcast(dat1, name ~ numbers, value.var = "value")

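# (output as posted in the answer; its values come from a different random draw than the question's dat1)
#          name          1          2         3         4
# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078
# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814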
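Note that setDT() converts dat1 to a data.table by reference, so the original object itself is changed. If that is not wanted, one option is as.data.table(), which returns a converted copy instead. A minimal sketch, with the question's values typed in by hand (the names dt and wide are only illustrative):

```r
library(data.table)

# The question's data, entered as fixed values so the result is reproducible
dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value   = c(0.3407997, -0.7033403, -0.3795377, -0.7460474,
              -0.8981073, -0.3347941, -0.5013782, -0.1745357)
)

dt   <- as.data.table(dat1)  # converted copy; dat1 is left untouched
wide <- dcast(dt, name ~ numbers, value.var = "value")

class(dat1)  # still a plain "data.frame"
names(wide)  # "name" "1" "2" "3" "4"
```

This costs one copy of the data, so for very large tables setDT() may still be the better choice.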

As of data.table v1.9.6, we can cast on multiple columns

## add an extra column
dat1[, value2 := value * 2]

## cast multiple value columns
dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

#          name    value_1    value_2   value_3   value_4   value2_1   value2_2 value2_3  value2_4
# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078  0.3672866 -1.6712572 3.190562 0.6590155
# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814 -1.6409368  0.9748581 1.476649 1.1515627

With tidyr, there are pivot_wider() and pivot_longer(), which are generalized to reshape from long -> wide or wide -> long, respectively. Using the OP's data:

Single column long -> wide

library(tidyr)

dat1 %>% 
    pivot_wider(names_from = numbers, values_from = value)

# # A tibble: 2 x 5
#   name          `1`    `2`    `3`    `4`
#   <fct>       <dbl>  <dbl>  <dbl>  <dbl>
# 1 firstName   0.341 -0.703 -0.380 -0.746
# 2 secondName -0.898 -0.335 -0.501 -0.175

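Multiple columns long -> wide

pivot_wider() is also capable of more complex pivot operations. For example, you can pivot multiple columns simultaneously: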

# create another column for showing the functionality
dat2 <- dat1 %>% 
    dplyr::rename(valA = value) %>%
    dplyr::mutate(valB = valA * 2) 

dat2 %>% 
    pivot_wider(names_from = numbers, values_from = c(valA, valB))

# # A tibble: 2 × 9
#   name       valA_1 valA_2 valA_3 valA_4 valB_1 valB_2 valB_3 valB_4
#   <chr>       <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1 firstName   0.341 -0.703 -0.380 -0.746  0.682 -1.41  -0.759 -1.49 
# 2 secondName -0.898 -0.335 -0.501 -0.175 -1.80  -0.670 -1.00  -0.349

You can find even more functionality in the docs.
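As one small example of that extra functionality, the sketch below uses the names_prefix argument so the new columns get syntactic names (n_1, n_2, ...) rather than bare numbers, and then reverses the operation with pivot_longer(). The "n_" prefix and the object names are arbitrary choices for illustration; the values are the question's, typed in by hand.

```r
library(tidyr)

dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value   = c(0.3407997, -0.7033403, -0.3795377, -0.7460474,
              -0.8981073, -0.3347941, -0.5013782, -0.1745357)
)

# long -> wide, prefixing the new column names
wide <- pivot_wider(dat1,
                    names_from   = numbers,
                    values_from  = value,
                    names_prefix = "n_")
names(wide)  # "name" "n_1" "n_2" "n_3" "n_4"

# wide -> long again, stripping the prefix on the way back
long <- pivot_longer(wide,
                     cols         = -name,
                     names_to     = "numbers",
                     names_prefix = "n_",
                     values_to    = "value")
```

After the round trip the numbers column comes back as character rather than integer, since it was reconstructed from column names.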

Using the base R aggregate function:

aggregate(value ~ name, dat1, I)

# name           value.1  value.2  value.3  value.4
#1 firstName      0.4145  -0.4747   0.0659   -0.5024
#2 secondName    -0.8259   0.1669  -0.8962    0.1681
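One thing to be aware of: as far as I can tell, the result here has only two columns, name and a single matrix column called value, and the value.1 ... value.4 labels appear only when it is printed. The sketch below, with the question's values typed in by hand, unpacks that matrix into ordinary columns. It assumes every name has the same number of rows, in the same numbers order; the names agg, vals and wide are illustrative.

```r
dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value   = c(0.3407997, -0.7033403, -0.3795377, -0.7460474,
              -0.8981073, -0.3347941, -0.5013782, -0.1745357)
)

agg <- aggregate(value ~ name, dat1, I)

ncol(agg)             # name plus one matrix column
is.matrix(agg$value)  # the wide values live inside this matrix

# Label the matrix columns and spread them into regular data frame columns
vals <- agg$value
colnames(vals) <- as.character(unique(dat1$numbers))
wide <- data.frame(name = agg$name, vals, check.names = FALSE)
wide
```

Because nothing ties each value to its numbers entry, this approach is more fragile than reshape(), dcast() or pivot_wider() when groups are unbalanced or unsorted.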

Or using the reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
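Called this way, reshape() treats the remaining column (value) as v.names, so the wide columns come out as value.1 ... value.4 and the row names keep the original row indices (1 and 5). A minimal sketch of tidying that up to match the layout asked for in the question, with the question's values typed in by hand:

```r
dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value   = c(0.3407997, -0.7033403, -0.3795377, -0.7460474,
              -0.8981073, -0.3347941, -0.5013782, -0.1745357)
)

wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

# Drop the "value." prefix so the columns are named 1, 2, 3, 4
names(wide) <- sub("^value\\.", "", names(wide))

# Optional: renumber the rows 1, 2 instead of 1, 5
rownames(wide) <- NULL

wide
```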