我运行了以下基准测试:
bench=function(...,n=1,r=3){
a=match.call(expand.dots=F)$...
t=matrix(ncol=length(a),nrow=n)
for(i in 1:length(a))for(j in 1:n){t1=Sys.time();eval(a[[i]],parent.frame());t[j,i]=Sys.time()-t1}
o=t(apply(t,2,function(x)c(median(x),min(x),max(x),mean(x))))
round(1e3*`dimnames<-`(o,list(names(a),c("median","min","max","mean"))),r)
}
ns=10^c(3:7)
m=sapply(ns,function(n)bench(n=5,
`vector at length + 1`={l=c();for(i in 1:n)l[length(l)+1]=i},
`vector at index`={l=c();for(i in 1:n)l[i]=i},
`vector at index, initialize with type`={l=integer();for(i in 1:n)l[i]=i},
`vector at index, initialize with length`={l=vector(length=n);for(i in 1:n)l[i]=i},
`vector at index, initialize with type and length`={l=integer(n);for(i in 1:n)l[i]=i},
`list at length + 1`={l=list();for(i in 1:n)l[[length(l)+1]]=i},
`list at index`={l=list();for(i in 1:n)l[[i]]=i},
`list at index, initialize with length`={l=vector('list',n);for(i in 1:n)l[[i]]=i},
`list at index, initialize with double length, remove null`={l=vector("list",2*n);for(i in 1:n)l[[i]]=i;l=head(l,i)},
`list at index, double when full, get length from variable`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==len){len=len*2;length(l)=len}};l=head(l,i)},
`list at index, double when full, check length inside loop`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==length(l)){length(l)=i*2}};l=head(l,i)},
`nested lists`={l=list();for(i in 1:n)l=list(l,i)},
`nested lists with unlist`={if(n<=1e5){l=list();for(i in 1:n)l=list(l,i);o=unlist(l)}},
`nested lists with manual unlist`={l=list();for(i in 1:n)l=list(l,i);o=integer(n);for(i in 1:n){o[n-i+1]=l[[2]];l=l[[1]]}},
`JanKanis better_env_as_container`={env=new.env(hash=T,parent=globalenv());for(i in 1:n)env[[as.character(i)]]=i},
`JanKanis inlineLinkedList`={a=list();for(i in 1:n)a=list(a,i);b=vector('list',n);head=a;for(i in n:1){b[[i]]=head[[2]];head=head[[1]]}},
`JanKanis inlineExpandingList`={l=vector('list',10);cap=10;len=0;for(i in 1:n){if(len==cap){l=c(l,vector('list',cap));cap=cap*2};len=len+1;l[[len]]=i};l[1:len]},
`c`={if(n<=1e5){l=c();for(i in 1:n)l=c(l,i)}},
`append vector`={if(n<=1e5){l=integer(n);for(i in 1:n)l=append(l,i)}},
`append list`={if(n<=1e9){l=list();for(i in 1:n)l=append(l,i)}}
)[,1])
m[rownames(m)%in%c("nested lists with unlist","c","append vector","append list"),4:5]=NA
m2=apply(m,2,function(x)formatC(x,max(0,2-ceiling(log10(min(x,na.rm=T)))),format="f"))
m3=apply(rbind(paste0("1e",log10(ns)),m2),2,function(x)formatC(x,max(nchar(x)),format="s"))
writeLines(apply(cbind(m3,c("",rownames(m))),1,paste,collapse=" "))
输出:
1e3 1e4 1e5 1e6 1e7
2.35 24.5 245 2292 27146 vector at length + 1
0.61 5.9 60 590 7360 vector at index
0.61 5.9 64 587 7132 vector at index, initialize with type
0.56 5.6 54 523 6418 vector at index, initialize with length
0.54 5.5 55 522 6371 vector at index, initialize with type and length
2.65 28.8 299 3955 48204 list at length + 1
0.93 9.2 96 1605 13480 list at index
0.58 5.6 57 707 8461 list at index, initialize with length
0.62 5.8 59 739 9413 list at index, initialize with double length, remove null
0.88 8.4 81 962 11872 list at index, double when full, get length from variable
0.96 9.5 92 1264 15813 list at index, double when full, check length inside loop
0.21 1.9 22 426 3826 nested lists
0.25 2.4 29 NA NA nested lists with unlist
2.85 27.5 295 3065 31427 nested lists with manual unlist
1.65 20.2 293 6505 8835 JanKanis better_env_as_container
1.11 10.1 110 1534 27119 JanKanis inlineLinkedList
2.66 26.3 266 3592 47120 JanKanis inlineExpandingList
1.22 118.6 15466 NA NA c
3.64 512.0 45167 NA NA append vector
6.35 664.8 71399 NA NA append list
上表显示的是每种方法的中值时间,而不是平均时间,因为有时单个运行的时间比典型运行的时间长得多,这会扭曲平均运行时间。但是在第一次运行之后的后续运行中,没有一种方法变得更快,因此每种方法的最小时间和中值时间通常是相似的。
The method "vector at index" (l=c();for(i in 1:n)l[i]=i) was about 5 times faster than "vector at length + 1" (l=c();for(i in 1:n)l[length(l)]=i), because getting the length of the vector took longer than adding an element to the vector. When I initialized the vector with a predetermined length, it made the code about 20% faster, but initializing with a specific type didn't make a difference, because the type just needs to be changed once when the first item is added to the vector. And in the case of lists, when you compare the methods "list at index" and "list at index initialized with length", initializing the list with a predetermined length made a bigger difference as the length of the list increased, because it made the code about twice as fast at length 1e6 but about 3 times as fast at length 1e7.
方法“list at index”(l=list();for(i in 1:n)l[[i]]=i)比方法“list at length +1”(l=list();for(i in 1:n)l[[length(l)+1]]=i)快3-4倍。
JanKanis的链表和扩展列表方法比“索引列表”慢,但比“长度+ 1列表”快。链表比扩展表快。
有些人声称append函数比c函数快,但在我的基准测试中,append大约比c慢3-4倍。
在上表中,长度1e6和1e7和在三个方法中缺失:对于“c”,“append vector”和“append list”,因为它们具有二次时间复杂度,以及对于“带有unlist的嵌套列表”,因为它会导致堆栈溢出。
The "nested lists" option was the fastest, but it doesn't include the time that it takes to flatten the list. When I used the unlist function to flatten the nested list, I got a stack overflow when the length of the list was around 1.26e5 or higher, because the unlist function calls itself recursively by default: n=1.26e5;l=list();for(i in 1:n)l=list(l,list(i));u=unlist(l). And when I used repeated calls of unlist(recursive=F), it took about 4 seconds to run even for a list with only 10,000 items: for(i in 1:n)l=unlist(l,recursive=F). But when I did the unlisting manually, it only took about 0.3 seconds to run for a list with a million items: o=integer(n);for(i in 1:n){o[n-i+1]=l[[2]];l=l[[1]]}.
If you don't know how many items you are going to append to a list in advance but you know the maximum number of items, then you can try to initialize the list at the maximum length and then later remove NULL values. Or another approach is to double the size of the list every time the list becomes full (which you can do faster if you have one variable for the length of the list and another variable for the number of items you have added to the list, so then you don't have to check the length of the list object on each iteration of a loop):
ns=10^c(2:7)
m=sapply(ns,function(n)bench(n=5,
`list at index`={l=list();for(i in 1:n)l[[i]]=i},
`list at length + 1`={l=list();for(i in 1:n)l[[length(l)+1]]=i},
`list at index, initialize with length`={l=vector("list",n);for(i in 1:n)l[[i]]=i},
`list at index, initialize with double length, remove null`={l=vector("list",2*n);for(i in 1:n)l[[i]]=i;l=head(l,i)},
`list at index, initialize with length 1e7, remove null`={l=vector("list",1e7);for(i in 1:n)l[[i]]=i;l=head(l,i)},
`list at index, initialize with length 1e8, remove null`={l=vector("list",1e8);for(i in 1:n)l[[i]]=i;l=head(l,i)},
`list at index, double when full, get length from variable`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==len){len=len*2;length(l)=len}};l=head(l,i)},
`list at index, double when full, check length inside loop`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==length(l)){length(l)=i*2}};l=head(l,i)}
)[,1])
m2=apply(m,2,function(x)formatC(x,max(0,2-ceiling(log10(min(x)))),format="f"))
m3=apply(rbind(paste0("1e",log10(ns)),m2),2,function(x)formatC(x,max(nchar(x)),format="s"))
writeLines(apply(cbind(m3,c("",rownames(m))),1,paste,collapse=" "))
输出:
1e4 1e5 1e6 1e7
9.3 102 1225 13250 list at index
27.4 315 3820 45920 list at length + 1
5.7 58 726 7548 list at index, initialize with length
5.8 60 748 8057 list at index, initialize with double length, remove null
33.4 88 902 7684 list at index, initialize with length 1e7, remove null
333.2 393 2691 12245 list at index, initialize with length 1e8, remove null
8.6 83 1032 10611 list at index, double when full, get length from variable
9.3 96 1280 14319 list at index, double when full, check length inside loop