人们使用什么技巧来管理交互式R会话的可用内存?我使用下面的函数[基于Petr Pikal和David Hinds在2004年发布的r-help列表]来列出(和/或排序)最大的对象,并偶尔rm()其中一些对象。但到目前为止最有效的解决办法是……在64位Linux下运行,有充足的内存。

大家还有什么想分享的妙招吗?请每人寄一份。

# improved list of objects
.ls.objects <- function (pos = 1, pattern, order.by,
                        decreasing=FALSE, head=FALSE, n=5) {
    napply <- function(names, fn) sapply(names, function(x)
                                         fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.size <- napply(names, object.size)
    obj.dim <- t(napply(names, function(x)
                        as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size, obj.dim)
    names(out) <- c("Type", "Size", "Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]], decreasing=decreasing), ]
    if (head)
        out <- head(out, n)
    out
}
# shorthand
lsos <- function(..., n=10) {
    .ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
}

当前回答

这是对这个优秀的老问题的一个新的回答。来自哈德利的高级R:

install.packages("pryr")

library(pryr)

object_size(1:10)
## 88 B

object_size(mean)
## 832 B

object_size(mtcars)
## 6.74 kB

(http://adv-r.had.co.nz/memory.html)

其他回答

确保在可重复的脚本中记录您的工作。不时地重新打开R,然后source()您的脚本。您将清除不再使用的任何东西,作为一个额外的好处,您将测试您的代码。

我在推特上看到了这个,觉得德克的功能太棒了!根据JD Long的回答,为了方便用户阅读,我会这样做:

# improved list of objects
.ls.objects <- function (pos = 1, pattern, order.by,
                        decreasing=FALSE, head=FALSE, n=5) {
    napply <- function(names, fn) sapply(names, function(x)
                                         fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.prettysize <- napply(names, function(x) {
                           format(utils::object.size(x), units = "auto") })
    obj.size <- napply(names, object.size)
    obj.dim <- t(napply(names, function(x)
                        as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size, obj.prettysize, obj.dim)
    names(out) <- c("Type", "Size", "PrettySize", "Length/Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]], decreasing=decreasing), ]
    if (head)
        out <- head(out, n)
    out
}
    
# shorthand
lsos <- function(..., n=10) {
    .ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
}

lsos()

结果如下:

                      Type   Size PrettySize Length/Rows Columns
pca.res                 PCA 790128   771.6 Kb          7      NA
DF               data.frame 271040   264.7 Kb        669      50
factor.AgeGender   factanal  12888    12.6 Kb         12      NA
dates            data.frame   9016     8.8 Kb        669       2
sd.                 numeric   3808     3.7 Kb         51      NA
napply             function   2256     2.2 Kb         NA      NA
lsos               function   1944     1.9 Kb         NA      NA
load               loadings   1768     1.7 Kb         12       2
ind.sup             integer    448  448 bytes        102      NA
x                 character     96   96 bytes          1      NA

注:我补充的主要部分是(再次改编自JD的回答):

obj.prettysize <- napply(names, function(x) {
                           print(object.size(x), units = "auto") })

为了进一步说明频繁重启的常见策略,我们可以使用littler,它允许我们直接从命令行运行简单的表达式。这里有一个例子,我有时会用不同的BLAS为一个简单的交叉刺计时。

 r -e'N<-3*10^3; M<-matrix(rnorm(N*N),ncol=N); print(system.time(crossprod(M)))'

同样的,

 r -lMatrix -e'example(spMatrix)'

加载Matrix包(通过——packages | -l开关)并运行spMatrix函数的示例。由于总是“新鲜”开始,这个方法在包开发过程中也是一个很好的测试。

最后但并非最不重要的是,r在脚本中使用'#!/usr/bin/r shebang-header。Rscript是little不可用的替代方案(例如在Windows上)。

这并没有增加上面的内容,而是以我喜欢的简单和大量注释的风格编写的。它生成一个对象大小排序表,但没有上面例子中给出的一些细节:

#Find the objects       
MemoryObjects = ls()    
#Create an array
MemoryAssessmentTable=array(NA,dim=c(length(MemoryObjects),2))
#Name the columns
colnames(MemoryAssessmentTable)=c("object","bytes")
#Define the first column as the objects
MemoryAssessmentTable[,1]=MemoryObjects
#Define a function to determine size        
MemoryAssessmentFunction=function(x){object.size(get(x))}
#Apply the function to the objects
MemoryAssessmentTable[,2]=t(t(sapply(MemoryAssessmentTable[,1],MemoryAssessmentFunction)))
#Produce a table with the largest objects first
noquote(MemoryAssessmentTable[rev(order(as.numeric(MemoryAssessmentTable[,2]))),])

I'm fortunate and my large data sets are saved by the instrument in "chunks" (subsets) of roughly 100 MB (32bit binary). Thus I can do pre-processing steps (deleting uninformative parts, downsampling) sequentially before fusing the data set. Calling gc () "by hand" can help if the size of the data get close to available memory. Sometimes a different algorithm needs much less memory. Sometimes there's a trade off between vectorization and memory use. compare: split & lapply vs. a for loop. For the sake of fast & easy data analysis, I often work first with a small random subset (sample ()) of the data. Once the data analysis script/.Rnw is finished data analysis code and the complete data go to the calculation server for over night / over weekend / ... calculation.