在常数平摊时间O(1)中将一个对象追加到R中的列表?

如果我有一些R列表mylist，你可以像这样添加一个obj项:

mylist[[length(mylist)+1]] <- obj

但肯定有一些更紧凑的方式。当我刚在R工作时，我试着像这样写lappend():

lappend <- function(lst, obj) {
    lst[[length(lst)+1]] <- obj
    return(lst)
}

但是，由于R的按名调用语义(lst在调用时被有效地复制，因此对lst的更改在lappend()的作用域之外是不可见的)，这当然是行不通的。我知道您可以在R函数中进行环境入侵，从而超出函数的作用域并改变调用环境，但对于编写一个简单的附加函数来说，这似乎是一个巨大的打击。

有谁能提出一个更漂亮的方法吗?如果它对向量和列表都适用，那就更好了。

当前回答

The OP (in the April 2012 updated revision of the question) is interested in knowing if there's a way to add to a list in amortized constant time, such as can be done, for example, with a C++ vector<> container. The best answer(s?) here so far only show the relative execution times for various solutions given a fixed-size problem, but do not address any of the various solutions' algorithmic efficiency directly. Comments below many of the answers discuss the algorithmic efficiency of some of the solutions, but in every case to date (as of April 2015) they come to the wrong conclusion.

算法效率捕获随着问题规模增长的时间(执行时间)或空间(内存消耗量)的增长特征。给定一个固定大小的问题，为各种解决方案运行性能测试并不能解决各种解决方案的增长速度。OP感兴趣的是知道是否有一种方法可以在“平摊常数时间”中将对象追加到R列表中。这是什么意思?为了解释，首先让我描述一下“常数时间”:

Constant or O(1) growth: If the time required to perform a given task remains the same as the size of the problem doubles, then we say the algorithm exhibits constant time growth, or stated in "Big O" notation, exhibits O(1) time growth. When the OP says "amortized" constant time, he simply means "in the long run"... i.e., if performing a single operation occasionally takes much longer than normal (e.g. if a preallocated buffer is exhausted and occasionally requires resizing to a larger buffer size), as long as the long-term average performance is constant time, we'll still call it O(1). For comparison, I will also describe "linear time" and "quadratic time": Linear or O(n) growth: If the time required to perform a given task doubles as the size of the problem doubles, then we say the algorithm exhibits linear time, or O(n) growth. Quadratic or O(n2) growth: If the time required to perform a given task increases by the square of the problem size, them we say the algorithm exhibits quadratic time, or O(n2) growth.

还有很多其他效率算法;我参考维基百科的文章进行进一步讨论。

我感谢@CronAcronis的回答，因为我是R的新手，很高兴有一个完整的代码块来对本页上提供的各种解决方案进行性能分析。我借用了他的代码用于我的分析，我复制了下面(包装在一个函数中):

library(microbenchmark)
### Using environment as a container
lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(substitute(lab))]] <- obj}
### Store list inside new environment
envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj} 
runBenchmark <- function(n) {
    microbenchmark(times = 5,  
        env_with_list_ = {
            listptr <- new.env(parent=globalenv())
            listptr$list <- NULL
            for(i in 1:n) {envAppendList(listptr, i)}
            listptr$list
        },
        c_ = {
            a <- list(0)
            for(i in 1:n) {a = c(a, list(i))}
        },
        list_ = {
            a <- list(0)
            for(i in 1:n) {a <- list(a, list(i))}
        },
        by_index = {
            a <- list(0)
            for(i in 1:n) {a[length(a) + 1] <- i}
            a
        },
        append_ = { 
            a <- list(0)    
            for(i in 1:n) {a <- append(a, i)} 
            a
        },
        env_as_container_ = {
            listptr <- new.env(parent=globalenv())
            for(i in 1:n) {lPtrAppend(listptr, i, i)} 
            listptr
        }   
    )
}

@CronAcronis发布的结果显然表明，a <- list(a, list(i))方法是最快的，至少对于10000大小的问题是这样，但是对于单个问题大小的结果并没有解决解决方案的增长问题。为此，我们需要针对不同的问题大小运行至少两个概要测试:

> runBenchmark(2e+3)
Unit: microseconds
              expr       min        lq      mean    median       uq       max neval
    env_with_list_  8712.146  9138.250 10185.533 10257.678 10761.33 12058.264     5
                c_ 13407.657 13413.739 13620.976 13605.696 13790.05 13887.738     5
             list_   854.110   913.407  1064.463   914.167  1301.50  1339.132     5
          by_index 11656.866 11705.140 12182.104 11997.446 12741.70 12809.363     5
           append_ 15986.712 16817.635 17409.391 17458.502 17480.55 19303.560     5
 env_as_container_ 19777.559 20401.702 20589.856 20606.961 20939.56 21223.502     5
> runBenchmark(2e+4)
Unit: milliseconds
              expr         min         lq        mean    median          uq         max neval
    env_with_list_  534.955014  550.57150  550.329366  553.5288  553.955246  558.636313     5
                c_ 1448.014870 1536.78905 1527.104276 1545.6449 1546.462877 1558.609706     5
             list_    8.746356    8.79615    9.162577    8.8315    9.601226    9.837655     5
          by_index  953.989076 1038.47864 1037.859367 1064.3942 1065.291678 1067.143200     5
           append_ 1634.151839 1682.94746 1681.948374 1689.7598 1696.198890 1706.683874     5
 env_as_container_  204.134468  205.35348  208.011525  206.4490  208.279580  215.841129     5
>

First of all, a word about the min/lq/mean/median/uq/max values: Since we are performing the exact same task for each of 5 runs, in an ideal world, we could expect that it would take exactly the same amount of time for each run. But the first run is normally biased toward longer times due to the fact that the code we are testing is not yet loaded into the CPU's cache. Following the first run, we would expect the times to be fairly consistent, but occasionally our code may be evicted from the cache due to timer tick interrupts or other hardware interrupts that are unrelated to the code we are testing. By testing the code snippets 5 times, we are allowing the code to be loaded into the cache during the first run and then giving each snippet 4 chances to run to completion without interference from outside events. For this reason, and because we are really running the exact same code under the exact same input conditions each time, we will consider only the 'min' times to be sufficient for the best comparison between the various code options.

请注意，我选择首先运行问题大小为2000，然后运行问题大小为20000，因此我的问题大小从第一次运行到第二次运行增加了10倍。

表解性能:O(1)(常数时间)

Let's first look at the growth of the list solution, since we can tell right away that it's the fastest solution in both profiling runs: In the first run, it took 854 microseconds (0.854 milliseconds) to perform 2000 "append" tasks. In the second run, it took 8.746 milliseconds to perform 20000 "append" tasks. A naïve observer would say, "Ah, the list solution exhibits O(n) growth, since as the problem size grew by a factor of ten, so did the time required to execute the test." The problem with that analysis is that what the OP wants is the growth rate of a single object insertion, not the growth rate of the overall problem. Knowing that, it's clear then that the list solution provides exactly what the OP wants: a method of appending objects to a list in O(1) time.

其他解决方案的性能

其他的解决方案都没有接近列表解决方案的速度，但无论如何，检查它们都是有信息的:

Most of the other solutions appear to be O(n) in performance. For example, the by_index solution, a very popular solution based on the frequency with which I find it in other SO posts, took 11.6 milliseconds to append 2000 objects, and 953 milliseconds to append ten times that many objects. The overall problem's time grew by a factor of 100, so a naïve observer might say "Ah, the by_index solution exhibits O(n2) growth, since as the problem size grew by a factor of ten, the time required to execute the test grew by a factor of 100." As before, this analysis is flawed, since the OP is interested in the growth of a single object insertion. If we divide the overall time growth by the problem's size growth, we find that the time growth of appending objects increased by a factor of only 10, not a factor of 100, which matches the growth of the problem size, so the by_index solution is O(n). There are no solutions listed which exhibit O(n2) growth for appending a single object.

2015-04-25 21:09:01

其他回答

试试这个函数lappend

lappend <- function (lst, ...){
  lst <- c(lst, list(...))
  return(lst)
}

将命名向量添加到列表中

Bye.

2012-10-26 10:49:19

事实上，c()函数有一个微妙之处。如果你有:

x <- list()
x <- c(x,2)
x = c(x,"foo")

如你所料，你将获得:

[[1]]
[1]

[[2]]
[1] "foo"

但是如果你添加一个x <- c(x, matrix(5,2,2))的矩阵，你的列表将有另外4个值为5的元素! 你最好做:

x <- c(x, list(matrix(5,2,2))

它适用于任何其他对象，你将获得预期的:

[[1]]
[1]

[[2]]
[1] "foo"

[[3]]
     [,1] [,2]
[1,]    5    5
[2,]    5    5

最后，你的函数变成:

push <- function(l, ...) c(l, list(...))

它适用于任何类型的对象。你可以更聪明地去做:

push_back <- function(l, ...) c(l, list(...))
push_front <- function(l, ...) c(list(...), l)

2014-06-24 19:56:47

这是一个非常有趣的问题，我希望我下面的想法可以为解决这个问题提供一种方式。这个方法给出了一个没有索引的平面列表，但是它有列表和反列表来避免嵌套结构。我不确定速度，因为我不知道如何基准。

a_list<-list()
for(i in 1:3){
  a_list<-list(unlist(list(unlist(a_list,recursive = FALSE),list(rnorm(2))),recursive = FALSE))
}
a_list

[[1]]
[[1]][[1]]
[1] -0.8098202  1.1035517

[[1]][[2]]
[1] 0.6804520 0.4664394

[[1]][[3]]
[1] 0.15592354 0.07424637

2016-04-14 16:43:40

如果你将列表变量作为一个带引号的字符串传入，你可以像这样从函数内部到达它:

push <- function(l, x) {
  assign(l, append(eval(as.name(l)), x), envir=parent.frame())
}

so:

> a <- list(1,2)
> a
[[1]]
[1] 1

[[2]]
[1] 2

> push("a", 3)
> a
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

>

或者为了获得额外的学分:

> v <- vector()
> push("v", 1)
> v
[1] 1
> push("v", 2)
> v
[1] 1 2
>

2010-03-13 02:30:07

为了验证，我运行了@Cron提供的基准测试代码。有一个主要的区别(除了在更新的i7处理器上运行更快):by_index现在的性能几乎和list_一样好:

Unit: milliseconds
              expr        min         lq       mean     median         uq
    env_with_list_ 167.882406 175.969269 185.966143 181.817187 185.933887
                c_ 485.524870 501.049836 516.781689 518.637468 537.355953
             list_   6.155772   6.258487   6.544207   6.269045   6.290925
          by_index   9.290577   9.630283   9.881103   9.672359  10.219533
           append_ 505.046634 543.319857 542.112303 551.001787 553.030110
 env_as_container_ 153.297375 154.880337 156.198009 156.068736 156.800135

这里是从@Cron的回答中逐字复制的基准代码(以防他后来更改内容):

n = 1e+4
library(microbenchmark)
### Using environment as a container
lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(substitute(lab))]] <- obj}
### Store list inside new environment
envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj}

microbenchmark(times = 5,
        env_with_list_ = {
            listptr <- new.env(parent=globalenv())
            listptr$list <- NULL
            for(i in 1:n) {envAppendList(listptr, i)}
            listptr$list
        },
        c_ = {
            a <- list(0)
            for(i in 1:n) {a = c(a, list(i))}
        },
        list_ = {
            a <- list(0)
            for(i in 1:n) {a <- list(a, list(i))}
        },
        by_index = {
            a <- list(0)
            for(i in 1:n) {a[length(a) + 1] <- i}
            a
        },
        append_ = {
            a <- list(0)
            for(i in 1:n) {a <- append(a, i)}
            a
        },
        env_as_container_ = {
            listptr <- new.env(parent=globalenv())
            for(i in 1:n) {lPtrAppend(listptr, i, i)}
            listptr
        }
)

2018-06-24 22:31:24

在常数平摊时间O(1)中将一个对象追加到R中的列表?

推荐文章

最新文章

标签