Tricks to manage the available memory in an R session

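What tricks do people use to manage the available memory of an interactive R session? I use the functions below [based on postings by Petr Pikal and David Hinds to the r-help list in 2004] to list (and/or sort) the largest objects and to occasionally rm() some of them. But by far the most effective solution was ... to run under 64-bit Linux with ample memory.

Any other nice tricks folks want to share? One per post, please.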

# improved list of objects
.ls.objects <- function (pos = 1, pattern, order.by,
                        decreasing=FALSE, head=FALSE, n=5) {
    napply <- function(names, fn) sapply(names, function(x)
                                         fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.size <- napply(names, object.size)
    obj.dim <- t(napply(names, function(x)
                        as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size, obj.dim)
    names(out) <- c("Type", "Size", "Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]], decreasing=decreasing), ]
    if (head)
        out <- head(out, n)
    out
}
# shorthand
lsos <- function(..., n=10) {
    .ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
}
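
A stripped-down sketch of the same idea, for readers who only need the core of it: measure every object in an environment with object.size(), sort by size, and rm() the largest. The environment env and the objects small and big are made up for illustration; in an interactive session you would look at globalenv() instead.

```r
# Hypothetical workspace so the example does not touch your real session
env <- new.env()
assign("small", 1:10, envir = env)
assign("big", rnorm(1e5), envir = env)

# Size in bytes of every object in the environment, largest first
sizes <- sapply(ls(env), function(nm) object.size(get(nm, envir = env)))
sorted <- sort(sizes, decreasing = TRUE)
print(sorted)

# Drop the largest object once it is no longer needed
largest <- names(sorted)[1]
rm(list = largest, envir = env)
```

After the rm() call the memory becomes reclaimable by the garbage collector; it is not necessarily returned to the operating system right away.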

Current answer

I'm fortunate that my large data sets are saved by the instrument in "chunks" (subsets) of roughly 100 MB (32-bit binary). That lets me run the pre-processing steps (deleting uninformative parts, downsampling) sequentially on each chunk before fusing the data set. Calling gc() "by hand" can help when the size of the data gets close to the available memory. Sometimes a different algorithm needs much less memory. Sometimes there is a trade-off between vectorization and memory use; compare split & lapply with a for loop. For the sake of fast and easy data analysis, I often work first with a small random subset (sample()) of the data. Once the data analysis script/.Rnw is finished, the analysis code and the complete data go to the calculation server for overnight / over-the-weekend / ... calculation.
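
The chunk-wise workflow could look roughly like the sketch below. Everything named here is an assumption for illustration: read_chunk() stands in for whatever reads one instrument file, and the "uninformative part" and the downsampling factor are invented.

```r
# Hypothetical stand-in for reading one instrument chunk from disk
read_chunk <- function(i, n = 10000) {
  set.seed(i)
  data.frame(chunk = i, t = seq_len(n), signal = rnorm(n))
}

# Shrink one chunk before it is combined with the others
preprocess <- function(d, keep_every = 10) {
  d <- d[d$t > 100, ]                     # drop an (assumed) uninformative start
  d[seq(1, nrow(d), by = keep_every), ]   # downsample: keep every 10th row
}

# Only one full-size chunk is needed at a time; the reduced pieces are fused
pieces <- lapply(1:5, function(i) preprocess(read_chunk(i)))
fused <- do.call(rbind, pieces)
invisible(gc())                           # optional manual garbage collection

# Develop the analysis on a small random subset first
subset_rows <- fused[sample(nrow(fused), 500), ]
```

The point of the design is that the full-size data never has to fit in memory at once; only the reduced pieces are kept.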

2012-05-22 13:12:00

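Other answers

As well as the more general memory management techniques given in the answers above, I always try to reduce the size of my objects as far as possible. For example, I work with very large but very sparse matrices, in other words matrices where most values are zero. Using the 'Matrix' package (capitalization matters) I was able to reduce my average object sizes from ~2GB to ~200MB as simply as: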

my.matrix <- Matrix(my.matrix)

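The Matrix package includes data formats that can be used exactly like a regular matrix (no need to change your other code) but are able to store sparse data much more efficiently, whether loaded into memory or saved to disk.

Additionally, the raw files I receive are in 'long' format, where each data point has variables x, y, z, i. It is much more efficient to transform the data into an x * y * z dimension array with only variable i.

Know your data and use a bit of common sense.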
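
A small sketch of the effect, using made-up data (a 1000 x 1000 matrix with 1000 non-zero entries). The actual saving depends entirely on how sparse your data is.

```r
library(Matrix)

set.seed(1)
# Illustrative dense matrix in which almost every entry is zero
dense <- matrix(0, nrow = 1000, ncol = 1000)
dense[sample(length(dense), 1000)] <- 1

# Same values stored in a sparse format
sparse <- Matrix(dense, sparse = TRUE)

print(object.size(dense), units = "Kb")
print(object.size(sparse), units = "Kb")

# Ordinary matrix operations still work on the sparse object
col_totals <- colSums(sparse)
```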
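
One way to do that conversion is matrix indexing, sketched here on invented data. It assumes x, y and z are already integer positions starting at 1; real coordinates would first need to be mapped to such indices.

```r
# Illustrative 'long' table: one row per data point
long <- expand.grid(x = 1:20, y = 1:30, z = 1:10)
long$i <- rnorm(nrow(long))

# Allocate the array, then fill it using a 3-column index matrix
arr <- array(NA_real_, dim = c(max(long$x), max(long$y), max(long$z)))
arr[cbind(long$x, long$y, long$z)] <- long$i

print(object.size(long), units = "Kb")
print(object.size(arr), units = "Kb")
```

The array stores only the i values; the coordinates are implied by position, which is where the saving comes from.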

2016-03-31 15:22:43

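I love Dirk's .ls.objects() script, but I kept squinting to count characters in the size column. So I did some ugly hacks to make it show the size with pretty formatting: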

.ls.objects <- function (pos = 1, pattern, order.by,
                        decreasing=FALSE, head=FALSE, n=5) {
    napply <- function(names, fn) sapply(names, function(x)
                                         fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.size <- napply(names, object.size)
    obj.prettysize <- sapply(obj.size, function(r) prettyNum(r, big.mark = ",") )
    obj.dim <- t(napply(names, function(x)
                        as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size, obj.prettysize, obj.dim)
    names(out) <- c("Type", "Size", "PrettySize", "Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]], decreasing=decreasing), ]
    # always replace the raw byte count with the formatted size column
    out <- out[c("Type", "PrettySize", "Rows", "Columns")]
    names(out) <- c("Type", "Size", "Rows", "Columns")
    if (head)
        out <- head(out, n)
    out
}
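
As a possible alternative to counting digits, base R can format an object size in human-readable units itself. A short sketch, with x as an arbitrary example object:

```r
x <- numeric(1e6)                 # example object, roughly 8 million bytes
size <- object.size(x)

# Let R choose a readable unit, or ask for a specific one
auto_units <- format(size, units = "auto")
in_kb <- format(size, units = "Kb")

# The comma-separated byte count used in the function above
with_commas <- prettyNum(as.numeric(size), big.mark = ",")

print(c(auto_units, in_kb, with_commas))
```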

2010-03-09 15:59:45

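If you are working on Linux, want to use several processes, and only need to do read operations on one or more large objects, use makeForkCluster instead of makePSOCKcluster. This also saves you the time of sending the large object to the other processes.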
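
A minimal sketch of that setup, assuming a Unix-like system (fork clusters are not available on Windows). The object big and the per-worker task are invented for illustration.

```r
library(parallel)

big <- rnorm(1e6)                 # illustrative large, read-only object

# Forked workers see the parent's memory, so 'big' is not exported explicitly
cl <- makeForkCluster(2)
partial_sums <- parLapply(cl, 1:4, function(i) {
  sum(big[seq(i, length(big), by = 4)])   # each task only reads from 'big'
})
stopCluster(cl)

total <- sum(unlist(partial_sums))
```

With makePSOCKcluster the same code would need clusterExport(cl, "big") first, which copies the object to every worker.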

2017-11-14 19:56:14

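Make sure you record your work in a reproducible script. From time to time, restart R, then source() your script. You'll clean out anything you're no longer using, and as an added benefit you will have tested your code.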

2009-08-31 16:09:59

That's a good trick.

One other suggestion is to use memory-efficient objects wherever possible: for instance, use a matrix instead of a data.frame.

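This doesn't really address memory management, but one important function that is not widely known is memory.limit() (Windows only). You can raise the default with the command memory.limit(size=2500), where the size is in MB. As Dirk mentioned, you need to be using 64-bit in order to take real advantage of this.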

2009-08-31 19:08:45
