Append an object to a list in R in amortized constant time, O(1)?

If I have some R list mylist, I can append an item obj to it like so:

mylist[[length(mylist)+1]] <- obj

But surely there is some more compact way. When I was new to R, I tried writing lappend() like so:

lappend <- function(lst, obj) {
    lst[[length(lst)+1]] <- obj
    return(lst)
}

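but of course that doesn't work, because R arguments behave as if passed by value (lst is effectively copied upon call, so changes to lst are not visible outside the scope of lappend()). I know you can do environment hacking in an R function to reach outside the scope of the function and mutate the calling environment, but that seems like a large hammer for writing a simple append function.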
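A minimal sketch of the behavior described above, using the names from the question: the function only changes its own copy, so the caller sees the new item only when it assigns the return value back.

```r
lappend <- function(lst, obj) {
    lst[[length(lst) + 1]] <- obj  # modifies the function's local copy only
    lst                            # return the modified copy
}

mylist <- list(1, "a")
ignored <- lappend(mylist, TRUE)   # mylist itself is unchanged here
length(mylist)                     # still 2

mylist <- lappend(mylist, TRUE)    # reassign to keep the appended item
length(mylist)                     # now 3
```

So lappend() is usable as long as every call site reassigns the result; it just cannot modify its argument in place.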

Can anyone suggest a more elegant way of doing this? Bonus points if it works for both vectors and lists.

Current answer

In fact, there is a subtlety with the c() function. If you do:

x <- list()
x <- c(x,2)
x = c(x,"foo")

you will obtain, as expected:

[[1]]
[1] 2

[[2]]
[1] "foo"

But if you add a matrix with x <- c(x, matrix(5,2,2)), your list will gain 4 more elements of value 5 instead of one matrix! You had better do:

x <- c(x, list(matrix(5,2,2)))

This works for any other object, and you will obtain, as expected:

[[1]]
[1] 2

[[2]]
[1] "foo"

[[3]]
     [,1] [,2]
[1,]    5    5
[2,]    5    5

Finally, your function becomes:

push <- function(l, ...) c(l, list(...))

and it works for any type of object. You can be smarter and do:

push_back <- function(l, ...) c(l, list(...))
push_front <- function(l, ...) c(list(...), l)
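A short usage sketch for the two helpers above (the sample values are made up for illustration). Because every argument is wrapped by list(...), a matrix or a vector is added as a single element rather than being spread out.

```r
push_back  <- function(l, ...) c(l, list(...))
push_front <- function(l, ...) c(list(...), l)

x <- list(2, "foo")
x <- push_back(x, matrix(5, 2, 2))  # the matrix stays one element
x <- push_front(x, "first", 1:3)    # several items at once, each kept whole
length(x)                           # 5
```

Note that both helpers return a new list, so the result still has to be assigned back to x.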

2014-06-24 19:56:47

Other answers

I ran the following benchmark:

bench=function(...,n=1,r=3){
  a=match.call(expand.dots=F)$...
  t=matrix(ncol=length(a),nrow=n)
  for(i in 1:length(a))for(j in 1:n){t1=Sys.time();eval(a[[i]],parent.frame());t[j,i]=Sys.time()-t1}
  o=t(apply(t,2,function(x)c(median(x),min(x),max(x),mean(x))))
  round(1e3*`dimnames<-`(o,list(names(a),c("median","min","max","mean"))),r)
}

ns=10^c(3:7)
m=sapply(ns,function(n)bench(n=5,
  `vector at length + 1`={l=c();for(i in 1:n)l[length(l)+1]=i},
  `vector at index`={l=c();for(i in 1:n)l[i]=i},
  `vector at index, initialize with type`={l=integer();for(i in 1:n)l[i]=i},
  `vector at index, initialize with length`={l=vector(length=n);for(i in 1:n)l[i]=i},
  `vector at index, initialize with type and length`={l=integer(n);for(i in 1:n)l[i]=i},
  `list at length + 1`={l=list();for(i in 1:n)l[[length(l)+1]]=i},
  `list at index`={l=list();for(i in 1:n)l[[i]]=i},
  `list at index, initialize with length`={l=vector('list',n);for(i in 1:n)l[[i]]=i},
  `list at index, initialize with double length, remove null`={l=vector("list",2*n);for(i in 1:n)l[[i]]=i;l=head(l,i)},
  `list at index, double when full, get length from variable`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==len){len=len*2;length(l)=len}};l=head(l,i)},
  `list at index, double when full, check length inside loop`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==length(l)){length(l)=i*2}};l=head(l,i)},
  `nested lists`={l=list();for(i in 1:n)l=list(l,i)},
  `nested lists with unlist`={if(n<=1e5){l=list();for(i in 1:n)l=list(l,i);o=unlist(l)}},
  `nested lists with manual unlist`={l=list();for(i in 1:n)l=list(l,i);o=integer(n);for(i in 1:n){o[n-i+1]=l[[2]];l=l[[1]]}},
  `JanKanis better_env_as_container`={env=new.env(hash=T,parent=globalenv());for(i in 1:n)env[[as.character(i)]]=i},
  `JanKanis inlineLinkedList`={a=list();for(i in 1:n)a=list(a,i);b=vector('list',n);head=a;for(i in n:1){b[[i]]=head[[2]];head=head[[1]]}},
  `JanKanis inlineExpandingList`={l=vector('list',10);cap=10;len=0;for(i in 1:n){if(len==cap){l=c(l,vector('list',cap));cap=cap*2};len=len+1;l[[len]]=i};l[1:len]},
  `c`={if(n<=1e5){l=c();for(i in 1:n)l=c(l,i)}},
  `append vector`={if(n<=1e5){l=integer(n);for(i in 1:n)l=append(l,i)}},
  `append list`={if(n<=1e5){l=list();for(i in 1:n)l=append(l,i)}}
)[,1])

m[rownames(m)%in%c("nested lists with unlist","c","append vector","append list"),4:5]=NA
m2=apply(m,2,function(x)formatC(x,max(0,2-ceiling(log10(min(x,na.rm=T)))),format="f"))
m3=apply(rbind(paste0("1e",log10(ns)),m2),2,function(x)formatC(x,max(nchar(x)),format="s"))
writeLines(apply(cbind(m3,c("",rownames(m))),1,paste,collapse=" "))

Output (median times in milliseconds):

 1e3   1e4   1e5  1e6   1e7
2.35  24.5   245 2292 27146 vector at length + 1
0.61   5.9    60  590  7360 vector at index
0.61   5.9    64  587  7132 vector at index, initialize with type
0.56   5.6    54  523  6418 vector at index, initialize with length
0.54   5.5    55  522  6371 vector at index, initialize with type and length
2.65  28.8   299 3955 48204 list at length + 1
0.93   9.2    96 1605 13480 list at index
0.58   5.6    57  707  8461 list at index, initialize with length
0.62   5.8    59  739  9413 list at index, initialize with double length, remove null
0.88   8.4    81  962 11872 list at index, double when full, get length from variable
0.96   9.5    92 1264 15813 list at index, double when full, check length inside loop
0.21   1.9    22  426  3826 nested lists
0.25   2.4    29   NA    NA nested lists with unlist
2.85  27.5   295 3065 31427 nested lists with manual unlist
1.65  20.2   293 6505  8835 JanKanis better_env_as_container
1.11  10.1   110 1534 27119 JanKanis inlineLinkedList
2.66  26.3   266 3592 47120 JanKanis inlineExpandingList
1.22 118.6 15466   NA    NA c
3.64 512.0 45167   NA    NA append vector
6.35 664.8 71399   NA    NA append list

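The table above shows the median time for each method instead of the mean time, because sometimes a single run took much longer than a typical run, which distorts the mean. But none of the methods became much faster on subsequent runs after the first run, so the minimum time and the median time were usually similar for each method.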

The method "vector at index" (l=c();for(i in 1:n)l[i]=i) was about 4 times faster than "vector at length + 1" (l=c();for(i in 1:n)l[length(l)+1]=i), because getting the length of the vector took longer than adding an element to the vector. When I initialized the vector with a predetermined length, it made the code about 10% faster, but initializing with a specific type didn't make a difference, because the type just needs to be changed once when the first item is added to the vector. And in the case of lists, when you compare the methods "list at index" and "list at index, initialize with length", initializing the list with a predetermined length made the code about 1.6 times as fast at most lengths and about 2.3 times as fast at length 1e6.
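The point about types can be checked with typeof(). This is only an illustrative sketch, not part of the benchmark: a vector created without a type is converted once, on the first assignment, and keeps that type afterwards.

```r
l <- c()                 # NULL, no type yet
l[1] <- 1L
typeof(l)                # "integer"

l <- vector(length = 3)  # logical vector: FALSE FALSE FALSE
typeof(l)                # "logical"
l[1] <- 1L               # one-time coercion of the whole vector to integer
typeof(l)                # "integer"
l[2] <- 2L               # no further type change needed
```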

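The method "list at index" (l=list();for(i in 1:n)l[[i]]=i) was roughly 3 times faster than the method "list at length + 1" (l=list();for(i in 1:n)l[[length(l)+1]]=i).

JanKanis's linked list and expanding list methods were slower than "list at index" but faster than "list at length + 1". The linked list was faster than the expanding list.

Some people have claimed that the append function is faster than the c function, but in my benchmark append was about 3 to 5 times slower than c.

In the table above, the lengths 1e6 and 1e7 are missing for four methods: for "c", "append vector", and "append list" because they have quadratic time complexity, and for "nested lists with unlist" because it resulted in a stack overflow.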

The "nested lists" option was the fastest, but it doesn't include the time that it takes to flatten the list. When I used the unlist function to flatten the nested list, I got a stack overflow when the length of the list was around 1.26e5 or higher, because the unlist function calls itself recursively by default: n=1.26e5;l=list();for(i in 1:n)l=list(l,list(i));u=unlist(l). And when I used repeated calls of unlist(recursive=F), it took about 4 seconds to run even for a list with only 10,000 items: for(i in 1:n)l=unlist(l,recursive=F). But when I did the unlisting manually, it only took about 0.3 seconds to run for a list with a million items: o=integer(n);for(i in 1:n){o[n-i+1]=l[[2]];l=l[[1]]}.
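The manual flattening can be wrapped in a function. The sketch below uses a hypothetical helper name, flatten_nested, and assumes the nesting was built with l=list(l,i) on integer items as in the benchmark, so that each level is a two-element list of (older part, newest item).

```r
# Build a nested list: each step wraps the previous list together with a new item.
n <- 1000
l <- list()
for (i in 1:n) l <- list(l, i)

# Hypothetical helper: walk the nesting from the outside in.
# The outermost level holds the most recently added item.
flatten_nested <- function(nested, n) {
  out <- integer(n)
  for (i in seq_len(n)) {
    out[n - i + 1] <- nested[[2]]  # newest remaining item
    nested <- nested[[1]]          # step one level deeper
  }
  out
}

o <- flatten_nested(l, n)
```

This is a plain loop, so it avoids the recursion depth limit that unlist() ran into.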

If you don't know how many items you are going to append to a list in advance but you know the maximum number of items, then you can try to initialize the list at the maximum length and then later remove NULL values. Or another approach is to double the size of the list every time the list becomes full (which you can do faster if you have one variable for the length of the list and another variable for the number of items you have added to the list, so then you don't have to check the length of the list object on each iteration of a loop):

ns=10^c(4:7)
m=sapply(ns,function(n)bench(n=5,
  `list at index`={l=list();for(i in 1:n)l[[i]]=i},
  `list at length + 1`={l=list();for(i in 1:n)l[[length(l)+1]]=i},
  `list at index, initialize with length`={l=vector("list",n);for(i in 1:n)l[[i]]=i},
  `list at index, initialize with double length, remove null`={l=vector("list",2*n);for(i in 1:n)l[[i]]=i;l=head(l,i)},
  `list at index, initialize with length 1e7, remove null`={l=vector("list",1e7);for(i in 1:n)l[[i]]=i;l=head(l,i)},
  `list at index, initialize with length 1e8, remove null`={l=vector("list",1e8);for(i in 1:n)l[[i]]=i;l=head(l,i)},
  `list at index, double when full, get length from variable`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==len){len=len*2;length(l)=len}};l=head(l,i)},
  `list at index, double when full, check length inside loop`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==length(l)){length(l)=i*2}};l=head(l,i)}
)[,1])

m2=apply(m,2,function(x)formatC(x,max(0,2-ceiling(log10(min(x)))),format="f"))
m3=apply(rbind(paste0("1e",log10(ns)),m2),2,function(x)formatC(x,max(nchar(x)),format="s"))
writeLines(apply(cbind(m3,c("",rownames(m))),1,paste,collapse=" "))

Output (median times in milliseconds):

  1e4 1e5  1e6   1e7
  9.3 102 1225 13250 list at index
 27.4 315 3820 45920 list at length + 1
  5.7  58  726  7548 list at index, initialize with length
  5.8  60  748  8057 list at index, initialize with double length, remove null
 33.4  88  902  7684 list at index, initialize with length 1e7, remove null
333.2 393 2691 12245 list at index, initialize with length 1e8, remove null
  8.6  83 1032 10611 list at index, double when full, get length from variable
  9.3  96 1280 14319 list at index, double when full, check length inside loop
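For readability, the "double when full" idea from the benchmark can be written as a small function. This is a sketch under stated assumptions rather than the benchmarked code: collect_doubling is a hypothetical name, and it tracks the capacity (cap) and the number of stored items (len) in separate variables so that length(l) is never queried inside the loop.

```r
# Hypothetical helper: collect items into a list, doubling capacity when full.
collect_doubling <- function(items) {
  cap <- 1                   # allocated slots
  len <- 0                   # slots actually used
  l <- vector("list", cap)
  for (x in items) {
    if (len == cap) {        # full: double the capacity (pads with NULL)
      cap <- cap * 2
      length(l) <- cap
    }
    len <- len + 1
    l[[len]] <- x            # caveat: assigning NULL here would drop the slot
  }
  head(l, len)               # drop the unused NULL padding
}

res <- collect_doubling(1:10)
length(res)                  # 10
```

With doubling, the list is reallocated only about log2(n) times for n items, which is what makes the append cost amortized constant.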

2022-07-30 10:40:30

If you pass in the list variable as a quoted string, you can reach it from within the function like this:

push <- function(l, x) {
  # read and reassign the variable named by l in the caller's frame
  caller <- parent.frame()
  assign(l, append(get(l, envir=caller), x), envir=caller)
}

so:

> a <- list(1,2)
> a
[[1]]
[1] 1

[[2]]
[1] 2

> push("a", 3)
> a
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

>

or for extra credit:

> v <- vector()
> push("v", 1)
> v
[1] 1
> push("v", 2)
> v
[1] 1 2
>

2010-03-13 02:30:07

If it's a list of strings, just use the c() function:

R> LL <- list(a="tom", b="dick")
R> c(LL, c="harry")
$a
[1] "tom"

$b
[1] "dick"

$c
[1] "harry"

R> class(LL)
[1] "list"
R>

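That works on vectors too, so do I get the bonus points?

Edit (2015-02-01): This post is coming up on its fifth birthday. Some kind readers keep pointing out its shortcomings, so by all means also see some of the comments below. One suggestion for list types: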

newlist <- list(oldlist, list(someobj))

In general, R types can make it hard to have one and just one idiom for all types and uses.

2010-03-13 01:56:18

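I think what you want to do is actually pass by reference (pointer) to the function: create a new environment (environments are passed by reference to functions) with the list added to it: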

listptr=new.env(parent=globalenv())
listptr$list=mylist

#Then the function is modified as:
lPtrAppend <- function(lstptr, obj) {
    lstptr$list[[length(lstptr$list)+1]] <- obj
}

Now you are only modifying the existing list (not creating a new one).
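A small usage sketch of this approach, with made-up sample values. It also shows one consequence worth knowing: the list stored in the environment grows, while the variable it was copied from stays as it was.

```r
mylist <- list("a")
listptr <- new.env(parent = globalenv())
listptr$list <- mylist

lPtrAppend <- function(lstptr, obj) {
    lstptr$list[[length(lstptr$list) + 1]] <- obj
}

lPtrAppend(listptr, "b")  # no reassignment needed at the call site
length(listptr$list)      # 2: the list inside the environment grew
length(mylist)            # 1: the original variable is untouched
```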

2011-08-24 16:02:11