The OP (in the April 2012 updated revision of the question) is interested in knowing if there's a way to add to a list in amortized constant time, such as can be done, for example, with a C++ vector<> container. The best answer(s?) here so far only show the relative execution times for various solutions given a fixed-size problem, but do not address any of the various solutions' algorithmic efficiency directly. Comments below many of the answers discuss the algorithmic efficiency of some of the solutions, but in every case to date (as of April 2015) they come to the wrong conclusion.
算法效率捕获随着问题规模增长的时间(执行时间)或空间(内存消耗量)的增长特征。给定一个固定大小的问题,为各种解决方案运行性能测试并不能解决各种解决方案的增长速度。OP感兴趣的是知道是否有一种方法可以在“平摊常数时间”中将对象追加到R列表中。这是什么意思?为了解释,首先让我描述一下“常数时间”:
Constant or O(1) growth:
If the time required to perform a given task remains the same as the size of the problem doubles, then we say the algorithm exhibits constant time growth, or stated in "Big O" notation, exhibits O(1) time growth. When the OP says "amortized" constant time, he simply means "in the long run"... i.e., if performing a single operation occasionally takes much longer than normal (e.g. if a preallocated buffer is exhausted and occasionally requires resizing to a larger buffer size), as long as the long-term average performance is constant time, we'll still call it O(1).
For comparison, I will also describe "linear time" and "quadratic time":
Linear or O(n) growth:
If the time required to perform a given task doubles as the size of the problem doubles, then we say the algorithm exhibits linear time, or O(n) growth.
Quadratic or O(n2) growth:
If the time required to perform a given task increases by the square of the problem size, them we say the algorithm exhibits quadratic time, or O(n2) growth.
还有很多其他效率算法;我参考维基百科的文章进行进一步讨论。
我感谢@CronAcronis的回答,因为我是R的新手,很高兴有一个完整的代码块来对本页上提供的各种解决方案进行性能分析。我借用了他的代码用于我的分析,我复制了下面(包装在一个函数中):
library(microbenchmark)
### Using environment as a container
lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(substitute(lab))]] <- obj}
### Store list inside new environment
envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj}
runBenchmark <- function(n) {
microbenchmark(times = 5,
env_with_list_ = {
listptr <- new.env(parent=globalenv())
listptr$list <- NULL
for(i in 1:n) {envAppendList(listptr, i)}
listptr$list
},
c_ = {
a <- list(0)
for(i in 1:n) {a = c(a, list(i))}
},
list_ = {
a <- list(0)
for(i in 1:n) {a <- list(a, list(i))}
},
by_index = {
a <- list(0)
for(i in 1:n) {a[length(a) + 1] <- i}
a
},
append_ = {
a <- list(0)
for(i in 1:n) {a <- append(a, i)}
a
},
env_as_container_ = {
listptr <- new.env(parent=globalenv())
for(i in 1:n) {lPtrAppend(listptr, i, i)}
listptr
}
)
}
@CronAcronis发布的结果显然表明,a <- list(a, list(i))方法是最快的,至少对于10000大小的问题是这样,但是对于单个问题大小的结果并没有解决解决方案的增长问题。为此,我们需要针对不同的问题大小运行至少两个概要测试:
> runBenchmark(2e+3)
Unit: microseconds
expr min lq mean median uq max neval
env_with_list_ 8712.146 9138.250 10185.533 10257.678 10761.33 12058.264 5
c_ 13407.657 13413.739 13620.976 13605.696 13790.05 13887.738 5
list_ 854.110 913.407 1064.463 914.167 1301.50 1339.132 5
by_index 11656.866 11705.140 12182.104 11997.446 12741.70 12809.363 5
append_ 15986.712 16817.635 17409.391 17458.502 17480.55 19303.560 5
env_as_container_ 19777.559 20401.702 20589.856 20606.961 20939.56 21223.502 5
> runBenchmark(2e+4)
Unit: milliseconds
expr min lq mean median uq max neval
env_with_list_ 534.955014 550.57150 550.329366 553.5288 553.955246 558.636313 5
c_ 1448.014870 1536.78905 1527.104276 1545.6449 1546.462877 1558.609706 5
list_ 8.746356 8.79615 9.162577 8.8315 9.601226 9.837655 5
by_index 953.989076 1038.47864 1037.859367 1064.3942 1065.291678 1067.143200 5
append_ 1634.151839 1682.94746 1681.948374 1689.7598 1696.198890 1706.683874 5
env_as_container_ 204.134468 205.35348 208.011525 206.4490 208.279580 215.841129 5
>
First of all, a word about the min/lq/mean/median/uq/max values: Since we are performing the exact same task for each of 5 runs, in an ideal world, we could expect that it would take exactly the same amount of time for each run. But the first run is normally biased toward longer times due to the fact that the code we are testing is not yet loaded into the CPU's cache. Following the first run, we would expect the times to be fairly consistent, but occasionally our code may be evicted from the cache due to timer tick interrupts or other hardware interrupts that are unrelated to the code we are testing. By testing the code snippets 5 times, we are allowing the code to be loaded into the cache during the first run and then giving each snippet 4 chances to run to completion without interference from outside events. For this reason, and because we are really running the exact same code under the exact same input conditions each time, we will consider only the 'min' times to be sufficient for the best comparison between the various code options.
请注意,我选择首先运行问题大小为2000,然后运行问题大小为20000,因此我的问题大小从第一次运行到第二次运行增加了10倍。
表解性能:O(1)(常数时间)
Let's first look at the growth of the list solution, since we can tell right away that it's the fastest solution in both profiling runs: In the first run, it took 854 microseconds (0.854 milliseconds) to perform 2000 "append" tasks. In the second run, it took 8.746 milliseconds to perform 20000 "append" tasks. A naïve observer would say, "Ah, the list solution exhibits O(n) growth, since as the problem size grew by a factor of ten, so did the time required to execute the test." The problem with that analysis is that what the OP wants is the growth rate of a single object insertion, not the growth rate of the overall problem. Knowing that, it's clear then that the list solution provides exactly what the OP wants: a method of appending objects to a list in O(1) time.
其他解决方案的性能
其他的解决方案都没有接近列表解决方案的速度,但无论如何,检查它们都是有信息的:
Most of the other solutions appear to be O(n) in performance. For example, the by_index solution, a very popular solution based on the frequency with which I find it in other SO posts, took 11.6 milliseconds to append 2000 objects, and 953 milliseconds to append ten times that many objects. The overall problem's time grew by a factor of 100, so a naïve observer might say "Ah, the by_index solution exhibits O(n2) growth, since as the problem size grew by a factor of ten, the time required to execute the test grew by a factor of 100." As before, this analysis is flawed, since the OP is interested in the growth of a single object insertion. If we divide the overall time growth by the problem's size growth, we find that the time growth of appending objects increased by a factor of only 10, not a factor of 100, which matches the growth of the problem size, so the by_index solution is O(n). There are no solutions listed which exhibit O(n2) growth for appending a single object.