人们使用什么技巧来管理交互式R会话的可用内存?我使用下面的函数[基于Petr Pikal和David Hinds在2004年发布的r-help列表]来列出(和/或排序)最大的对象,并偶尔rm()其中一些对象。但到目前为止最有效的解决办法是……在64位Linux下运行,有充足的内存。

大家还有什么想分享的妙招吗?请每人寄一份。

# improved list of objects
.ls.objects <- function (pos = 1, pattern, order.by,
                        decreasing=FALSE, head=FALSE, n=5) {
    napply <- function(names, fn) sapply(names, function(x)
                                         fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.size <- napply(names, object.size)
    obj.dim <- t(napply(names, function(x)
                        as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size, obj.dim)
    names(out) <- c("Type", "Size", "Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]], decreasing=decreasing), ]
    if (head)
        out <- head(out, n)
    out
}
# shorthand
lsos <- function(..., n=10) {
    .ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
}

当前回答

Tip for dealing with objects requiring heavy intermediate calculation: When using objects that require a lot of heavy calculation and intermediate steps to create, I often find it useful to write a chunk of code with the function to create the object, and then a separate chunk of code that gives me the option either to generate and save the object as an rmd file, or load it externally from an rmd file I have already previously saved. This is especially easy to do in R Markdown using the following code-chunk structure.

```{r Create OBJECT}

COMPLICATED.FUNCTION <- function(...) { Do heavy calculations needing lots of memory;
                                        Output OBJECT; }

```
```{r Generate or load OBJECT}

LOAD <- TRUE
SAVE <- TRUE
#NOTE: Set LOAD to TRUE if you want to load saved file
#NOTE: Set LOAD to FALSE if you want to generate the object from scratch
#NOTE: Set SAVE to TRUE if you want to save the object externally

if(LOAD) { 
  OBJECT <- readRDS(file = 'MySavedObject.rds') 
} else {
  OBJECT <- COMPLICATED.FUNCTION(x, y, z)
  if (SAVE) { saveRDS(file = 'MySavedObject.rds', object = OBJECT) } }

```

With this code structure, all I need to do is to change LOAD depending on whether I want to generate the object, or load it directly from an existing saved file. (Of course, I have to generate it and save it the first time, but after this I have the option of loading it.) Setting LOAD <- TRUE bypasses use of my complicated function and avoids all of the heavy computation therein. This method still requires enough memory to store the object of interest, but it saves you from having to calculate it each time you run your code. For objects that require a lot of heavy calculation of intermediate steps (e.g., for calculations involving loops over large arrays) this can save a substantial amount of time and computation.

其他回答

For both speed and memory purposes, when building a large data frame via some complex series of steps, I'll periodically flush it (the in-progress data set being built) to disk, appending to anything that came before, and then restart it. This way the intermediate steps are only working on smallish data frames (which is good as, e.g., rbind slows down considerably with larger objects). The entire data set can be read back in at the end of the process, when all the intermediate objects have been removed.

dfinal <- NULL
first <- TRUE
tempfile <- "dfinal_temp.csv"
for( i in bigloop ) {
    if( !i %% 10000 ) { 
        print( i, "; flushing to disk..." )
        write.table( dfinal, file=tempfile, append=!first, col.names=first )
        first <- FALSE
        dfinal <- NULL   # nuke it
    }

    # ... complex operations here that add data to 'dfinal' data frame  
}
print( "Loop done; flushing to disk and re-reading entire data set..." )
write.table( dfinal, file=tempfile, append=TRUE, col.names=FALSE )
dfinal <- read.table( tempfile )

我从不保存R工作区。我使用导入脚本和数据脚本,并将我不想经常重新创建的任何特别大的数据对象输出到文件。这样,我总是从一个新的工作空间开始,不需要清理大的物体。这是一个很好的函数。

确保在可重复的脚本中记录您的工作。不时地重新打开R,然后source()您的脚本。您将清除不再使用的任何东西,作为一个额外的好处,您将测试您的代码。

这是对这个优秀的老问题的一个新的回答。来自哈德利的高级R:

install.packages("pryr")

library(pryr)

object_size(1:10)
## 88 B

object_size(mean)
## 832 B

object_size(mtcars)
## 6.74 kB

(http://adv-r.had.co.nz/memory.html)

Tip for dealing with objects requiring heavy intermediate calculation: When using objects that require a lot of heavy calculation and intermediate steps to create, I often find it useful to write a chunk of code with the function to create the object, and then a separate chunk of code that gives me the option either to generate and save the object as an rmd file, or load it externally from an rmd file I have already previously saved. This is especially easy to do in R Markdown using the following code-chunk structure.

```{r Create OBJECT}

COMPLICATED.FUNCTION <- function(...) { Do heavy calculations needing lots of memory;
                                        Output OBJECT; }

```
```{r Generate or load OBJECT}

LOAD <- TRUE
SAVE <- TRUE
#NOTE: Set LOAD to TRUE if you want to load saved file
#NOTE: Set LOAD to FALSE if you want to generate the object from scratch
#NOTE: Set SAVE to TRUE if you want to save the object externally

if(LOAD) { 
  OBJECT <- readRDS(file = 'MySavedObject.rds') 
} else {
  OBJECT <- COMPLICATED.FUNCTION(x, y, z)
  if (SAVE) { saveRDS(file = 'MySavedObject.rds', object = OBJECT) } }

```

With this code structure, all I need to do is to change LOAD depending on whether I want to generate the object, or load it directly from an existing saved file. (Of course, I have to generate it and save it the first time, but after this I have the option of loading it.) Setting LOAD <- TRUE bypasses use of my complicated function and avoids all of the heavy computation therein. This method still requires enough memory to store the object of interest, but it saves you from having to calculate it each time you run your code. For objects that require a lot of heavy calculation of intermediate steps (e.g., for calculations involving loops over large arrays) this can save a substantial amount of time and computation.