人们使用什么技巧来管理交互式R会话的可用内存?我使用下面的函数[基于Petr Pikal和David Hinds在2004年发布的r-help列表]来列出(和/或排序)最大的对象,并偶尔rm()其中一些对象。但到目前为止最有效的解决办法是……在64位Linux下运行,有充足的内存。

大家还有什么想分享的妙招吗?请每人寄一份。

# improved list of objects
.ls.objects <- function (pos = 1, pattern, order.by,
                        decreasing=FALSE, head=FALSE, n=5) {
    napply <- function(names, fn) sapply(names, function(x)
                                         fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.size <- napply(names, object.size)
    obj.dim <- t(napply(names, function(x)
                        as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size, obj.dim)
    names(out) <- c("Type", "Size", "Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]], decreasing=decreasing), ]
    if (head)
        out <- head(out, n)
    out
}
# shorthand
lsos <- function(..., n=10) {
    .ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
}

当前回答

在将数据框架传递给回归函数的data=参数时,我积极地使用子集参数,只选择所需的变量。如果我忘记向公式和select=向量添加变量,确实会导致一些错误,但由于减少了对象的复制,它仍然节省了大量时间,并显著减少了内存占用。假设我有400万条记录和110个变量(我确实有)。例子:

# library(rms); library(Hmisc) for the cph,and rcs functions
Mayo.PrCr.rbc.mdl <- 
cph(formula = Surv(surv.yr, death) ~ age + Sex + nsmkr + rcs(Mayo, 4) + 
                                     rcs(PrCr.rat, 3) +  rbc.cat * Sex, 
     data = subset(set1HLI,  gdlab2 & HIVfinal == "Negative", 
                           select = c("surv.yr", "death", "PrCr.rat", "Mayo", 
                                      "age", "Sex", "nsmkr", "rbc.cat")
   )            )

通过设置上下文和策略:gdlab2变量是一个逻辑向量,它是为一组实验室测试的所有正常或几乎正常值的数据集中的主题构建的,而HIVfinal是一个字符向量,总结了艾滋病毒的初步和确认测试。

其他回答

我喜欢Dirk的.ls.objects()脚本,但我总是眯着眼睛数大小列中的字符。所以我做了一些丑陋的hack,使它呈现出漂亮的格式大小:

.ls.objects <- function (pos = 1, pattern, order.by,
                        decreasing=FALSE, head=FALSE, n=5) {
    napply <- function(names, fn) sapply(names, function(x)
                                         fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.size <- napply(names, object.size)
    obj.prettysize <- sapply(obj.size, function(r) prettyNum(r, big.mark = ",") )
    obj.dim <- t(napply(names, function(x)
                        as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size,obj.prettysize, obj.dim)
    names(out) <- c("Type", "Size", "PrettySize", "Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]], decreasing=decreasing), ]
        out <- out[c("Type", "PrettySize", "Rows", "Columns")]
        names(out) <- c("Type", "Size", "Rows", "Columns")
    if (head)
        out <- head(out, n)
    out
}

Unfortunately I did not have time to test it extensively but here is a memory tip that I have not seen before. For me the required memory was reduced with more than 50%. When you read stuff into R with for example read.csv they require a certain amount of memory. After this you can save them with save("Destinationfile",list=ls()) The next time you open R you can use load("Destinationfile") Now the memory usage might have decreased. It would be nice if anyone could confirm whether this produces similar results with a different dataset.

Tip for dealing with objects requiring heavy intermediate calculation: When using objects that require a lot of heavy calculation and intermediate steps to create, I often find it useful to write a chunk of code with the function to create the object, and then a separate chunk of code that gives me the option either to generate and save the object as an rmd file, or load it externally from an rmd file I have already previously saved. This is especially easy to do in R Markdown using the following code-chunk structure.

```{r Create OBJECT}

COMPLICATED.FUNCTION <- function(...) { Do heavy calculations needing lots of memory;
                                        Output OBJECT; }

```
```{r Generate or load OBJECT}

LOAD <- TRUE
SAVE <- TRUE
#NOTE: Set LOAD to TRUE if you want to load saved file
#NOTE: Set LOAD to FALSE if you want to generate the object from scratch
#NOTE: Set SAVE to TRUE if you want to save the object externally

if(LOAD) { 
  OBJECT <- readRDS(file = 'MySavedObject.rds') 
} else {
  OBJECT <- COMPLICATED.FUNCTION(x, y, z)
  if (SAVE) { saveRDS(file = 'MySavedObject.rds', object = OBJECT) } }

```

With this code structure, all I need to do is to change LOAD depending on whether I want to generate the object, or load it directly from an existing saved file. (Of course, I have to generate it and save it the first time, but after this I have the option of loading it.) Setting LOAD <- TRUE bypasses use of my complicated function and avoids all of the heavy computation therein. This method still requires enough memory to store the object of interest, but it saves you from having to calculate it each time you run your code. For objects that require a lot of heavy calculation of intermediate steps (e.g., for calculations involving loops over large arrays) this can save a substantial amount of time and computation.

当我在一个有很多中间步骤的大型项目中工作时,我会尽量减少对象的数量。而不是创建许多唯一的对象

Dataframe -> step1 -> step2 -> step3 -> result

raster->多pliedrast -> meanRastF -> sqrtRast -> resultRast

我使用临时对象,我称之为temp。

Dataframe -> temp -> temp -> temp -> result

这样就少了一些中间文件,多了一些概览。

raster  <- raster('file.tif')
temp <- raster * 10
temp <- mean(temp)
resultRast <- sqrt(temp)

为了节省更多内存,我可以在不再需要时简单地删除temp。

rm(temp)

如果我需要几个中间文件,我使用temp1, temp2, temp3。

对于测试,我使用test, test2,…

我在推特上看到了这个,觉得德克的功能太棒了!根据JD Long的回答,为了方便用户阅读,我会这样做:

# improved list of objects
.ls.objects <- function (pos = 1, pattern, order.by,
                        decreasing=FALSE, head=FALSE, n=5) {
    napply <- function(names, fn) sapply(names, function(x)
                                         fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.prettysize <- napply(names, function(x) {
                           format(utils::object.size(x), units = "auto") })
    obj.size <- napply(names, object.size)
    obj.dim <- t(napply(names, function(x)
                        as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size, obj.prettysize, obj.dim)
    names(out) <- c("Type", "Size", "PrettySize", "Length/Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]], decreasing=decreasing), ]
    if (head)
        out <- head(out, n)
    out
}
    
# shorthand
lsos <- function(..., n=10) {
    .ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
}

lsos()

结果如下:

                      Type   Size PrettySize Length/Rows Columns
pca.res                 PCA 790128   771.6 Kb          7      NA
DF               data.frame 271040   264.7 Kb        669      50
factor.AgeGender   factanal  12888    12.6 Kb         12      NA
dates            data.frame   9016     8.8 Kb        669       2
sd.                 numeric   3808     3.7 Kb         51      NA
napply             function   2256     2.2 Kb         NA      NA
lsos               function   1944     1.9 Kb         NA      NA
load               loadings   1768     1.7 Kb         12       2
ind.sup             integer    448  448 bytes        102      NA
x                 character     96   96 bytes          1      NA

注:我补充的主要部分是(再次改编自JD的回答):

obj.prettysize <- napply(names, function(x) {
                           print(object.size(x), units = "auto") })