我有麻烦重新安排以下数据帧:

set.seed(45)
dat1 <- data.frame(
    name = rep(c("firstName", "secondName"), each=4),
    numbers = rep(1:4, 2),
    value = rnorm(8)
    )

dat1
       name  numbers      value
1  firstName       1  0.3407997
2  firstName       2 -0.7033403
3  firstName       3 -0.3795377
4  firstName       4 -0.7460474
5 secondName       1 -0.8981073
6 secondName       2 -0.3347941
7 secondName       3 -0.5013782
8 secondName       4 -0.1745357

我想重塑它,以便每个唯一的“name”变量都是一个行名,“值”作为该行的观察值,“数字”作为冒号。就像这样:

     name          1          2          3         4
1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

我试过熔化和铸造,还有其他一些方法,但似乎都不行。


当前回答

基本重塑功能工作得非常好:

df <- data.frame(
  year   = c(rep(2000, 12), rep(2001, 12)),
  month  = rep(1:12, 2),
  values = rnorm(24)
)
df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
df_wide

在哪里

Idvar是分隔行的类列 Timevar是要宽转换的类列 V.names是包含数值的列 方向指定宽或长格式 可选的sep参数是输出data.frame中timevar类名和v.names之间的分隔符。

如果不存在idvar,在使用重塑()函数之前创建一个:

df$id   <- c(rep("year1", 12), rep("year2", 12))
df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
df_wide

只需要记住idvar是必需的!timevar和v.names部分很简单。这个函数的输出比其他一些函数更可预测,因为所有内容都是显式定义的。

其他回答

新的(2014年)tidyr包也简单地做到了这一点,gather()/spread()是melt/cast的术语。

编辑:现在,在2019年,tidyr v 1.0已经推出,并将spread和gather设置为弃用路径,更倾向于pivot_更宽和pivot_更长,您可以在这个答案中找到描述。如果你想简要了解一下传播/聚集的短暂生活,请继续阅读。

library(tidyr)
spread(dat1, key = numbers, value = value)

从github,

Tidyr是为了配合整洁的数据框架而设计的重塑重塑,并与magrittr和dplyr携手合作,为数据分析构建一个坚实的管道。 就像reshape2做得比重塑少一样,tidyr做得比重塑少。它是专门为整理数据而设计的,而不是像重塑2那样进行一般的重塑,也不是像重塑那样进行一般的聚合。特别是,内置方法只适用于数据帧,而tidyr不提供边距或聚合。

Win-Vector公司的天才数据科学家(他们制作了vtreat、seplyr和replyr)推出了一个非常强大的新软件包,名为cdata。它实现了本文和本文中描述的“协调数据”原则。其思想是,无论如何组织数据,都应该能够使用“数据坐标”系统识别单个数据点。下面是约翰·芒特最近博客文章的节选:

The whole system is based on two primitives or operators cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These operators have pivot, un-pivot, one-hot encode, transpose, moving multiple rows and columns, and many other transforms as simple special cases. It is easy to write many different operations in terms of the cdata primitives. These operators can work-in memory or at big data scale (with databases and Apache Spark; for big data use the cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN() variants). The transforms are controlled by a control table that itself is a diagram of (or picture of) the transform.

我们将首先构建控制表(有关详细信息,请参阅博客文章),然后执行数据从行到列的移动。

library(cdata)
# first build the control table
pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
                        columnToTakeKeysFrom = 'numbers', # this will become column headers
                        columnToTakeValuesFrom = 'value', # this contains data
                        sep="_")                          # optional for making column names

# perform the move of data to columns
dat_wide <- moveValuesToColumnsD(tallTable =  dat1, # reference to dataset
                    keyColumns = c('name'),         # this(these) column(s) should stay untouched 
                    controlTable = pivotControlTable# control table above
                    ) 
dat_wide

#>         name  numbers_1  numbers_2  numbers_3  numbers_4
#> 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
#> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

将三列数据框架重塑为矩阵(“长”到“宽”格式)。这个问题已经结束了,所以我在这里写了一个替代解。

我找到了另一种解决方案,可能对寻找将三列转换为矩阵的人有用。我指的是去耦(2.3.2)包。以下摘自他们的网站


生成一种表,其中行来自id_cols,列来自names_from,值来自values_from。

使用

pivot_wider_profile(
data,
id_cols,
names_from,
values_from,
values_fill = NA,
to_matrix = FALSE,
to_sparse = FALSE,
...
)

即使你有丢失的对,它也能工作,而且它不需要排序(as.matrix(dat1)[,1:2]可以用cbind(dat1[,1],dat1[,2])替换):

> set.seed(45);dat1=data.frame(name=rep(c("firstName","secondName"),each=4),numbers=rep(1:4,2),value=rnorm(8))
> u1=unique(dat1[,1]);u2=unique(dat1[,2])
> m=matrix(nrow=length(u1),ncol=length(u2),dimnames=list(u1,u2))
> m[as.matrix(dat1)[,1:2]]=dat1[,3]
> m
                    1          2          3          4
firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

如果你有缺少的对并且需要排序,这是行不通的,但如果对已经排序了,它会更短一些:

> u1=unique(dat1[,1]);u2=unique(dat1[,2])
> dat1=dat1[order(dat1[,1],dat1[,2]),] # not actually needed in this case
> matrix(dat1[,3],length(u1),,T,list(u1,u2))
                    1          2          3          4
firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

下面是第一个方法的函数版本(添加as.data.frame使其与tibbles一起工作):

l2w=function(x,row=1,col=2,val=3,sort=F){
  u1=unique(x[,row])
  u2=unique(x[,col])
  if(sort){u1=sort(u1);u2=sort(u2)}
  out=matrix(nrow=length(u1),ncol=length(u2),dimnames=list(u1,u2))
  out[cbind(x[,row],x[,col])]=x[,val]
  out
}

或者如果你只有下三角形的值,你可以这样做:

> euro=as.matrix(eurodist)[1:3,1:3]
> lower=data.frame(V1=rownames(euro)[row(euro)[lower.tri(euro)]],V2=colnames(euro)[col(euro)[lower.tri(euro)]],V3=euro[lower.tri(euro)])
> lower
         V1        V2   V3
1 Barcelona    Athens 3313
2  Brussels    Athens 2963
3  Brussels Barcelona 1318
> n=unique(c(lower[,1],lower[,2]))
> full=rbind(lower,setNames(lower[,c(2,1,3)],names(lower)),data.frame(V1=n,V2=n,V3=0))
> full
         V1        V2   V3
1 Barcelona    Athens 3313
2  Brussels    Athens 2963
3  Brussels Barcelona 1318
4    Athens Barcelona 3313
5    Athens  Brussels 2963
6 Barcelona  Brussels 1318
7    Athens    Athens    0
8 Barcelona Barcelona    0
9  Brussels  Brussels    0
> l2w(full,sort=T)
          Athens Barcelona Brussels
Athens         0      3313     2963
Barcelona   3313         0     1318
Brussels    2963      1318        0

或者还有另一种方法:

> rc=as.matrix(lower[-3])
> n=sort(unique(c(rc)))
> m=matrix(0,length(n),length(n),,list(n,n))
> m[rc]=lower[,3]
> m[rc[,2:1]]=lower[,3]
> m
          Athens Barcelona Brussels
Athens         0      3313     2963
Barcelona   3313         0     1318
Brussels    2963      1318        0

base R中的另一个简单方法是使用xtabs。xtabs的结果基本上只是一个带有花哨类名的矩阵,但你可以让它看起来像一个普通的矩阵,class(x)=NULL;attr(x,"call")=NULL;dimnames(x)=unname(dimnames(x)):

> x=xtabs(value~name+numbers,dat1);x
            numbers
name                  1          2          3          4
  firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
  secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
> str(x)
 'xtabs' num [1:2, 1:4] 0.341 -0.898 -0.703 -0.335 -0.38 ...
 - attr(*, "dimnames")=List of 2
  ..$ name   : chr [1:2] "firstName" "secondName"
  ..$ numbers: chr [1:4] "1" "2" "3" "4"
 - attr(*, "call")= language xtabs(formula = value ~ name + numbers, data = dat1)
> class(x)
[1] "xtabs" "table"
> class(as.matrix(x)) # `as.matrix` has no effect because `x` is already a matrix
[1] "xtabs" "table"
> class(x)=NULL;class(x)
[1] "matrix" "array"
> attr(x,"call")=NULL;dimnames(x)=unname(dimnames(x))
> x # now it looks like a regular matrix
                    1          2          3          4
firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
> str(x)
 num [1:2, 1:4] 0.341 -0.898 -0.703 -0.335 -0.38 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "firstName" "secondName"
  ..$ : chr [1:4] "1" "2" "3" "4"

通常as.data.frame(x)将xtab的结果转换回长格式,但你可以使用class(x)=NULL来避免这种情况:

> x=xtabs(value~name+numbers,dat1);as.data.frame(x)
        name numbers       Freq
1  firstName       1  0.3407997
2 secondName       1 -0.8981073
3  firstName       2 -0.7033403
4 secondName       2 -0.3347941
5  firstName       3 -0.3795377
6 secondName       3 -0.5013782
7  firstName       4 -0.7460474
8 secondName       4 -0.1745357
> class(x)=NULL;as.data.frame(x)
                    1          2          3          4
firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

这将宽格式的数据转换为长格式(unlist将数据帧转换为向量,c将矩阵转换为向量):

w2l=function(x)data.frame(V1=rownames(x)[row(x)],V2=colnames(x)[col(x)],V3=unname(c(unlist(x))))

使用重塑功能:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")