人们使用什么技巧来管理交互式R会话的可用内存?我使用下面的函数[基于Petr Pikal和David Hinds在2004年发布的r-help列表]来列出(和/或排序)最大的对象,并偶尔rm()其中一些对象。但到目前为止最有效的解决办法是……在64位Linux下运行,有充足的内存。
大家还有什么想分享的妙招吗?请每人寄一份。
# improved list of objects
.ls.objects <- function (pos = 1, pattern, order.by,
decreasing=FALSE, head=FALSE, n=5) {
napply <- function(names, fn) sapply(names, function(x)
fn(get(x, pos = pos)))
names <- ls(pos = pos, pattern = pattern)
obj.class <- napply(names, function(x) as.character(class(x))[1])
obj.mode <- napply(names, mode)
obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
obj.size <- napply(names, object.size)
obj.dim <- t(napply(names, function(x)
as.numeric(dim(x))[1:2]))
vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
obj.dim[vec, 1] <- napply(names, length)[vec]
out <- data.frame(obj.type, obj.size, obj.dim)
names(out) <- c("Type", "Size", "Rows", "Columns")
if (!missing(order.by))
out <- out[order(out[[order.by]], decreasing=decreasing), ]
if (head)
out <- head(out, n)
out
}
# shorthand
lsos <- function(..., n=10) {
.ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
}
Tip for dealing with objects requiring heavy intermediate calculation: When using objects that require a lot of heavy calculation and intermediate steps to create, I often find it useful to write a chunk of code with the function to create the object, and then a separate chunk of code that gives me the option either to generate and save the object as an rmd file, or load it externally from an rmd file I have already previously saved. This is especially easy to do in R Markdown using the following code-chunk structure.
```{r Create OBJECT}
COMPLICATED.FUNCTION <- function(...) { Do heavy calculations needing lots of memory;
Output OBJECT; }
```
```{r Generate or load OBJECT}
LOAD <- TRUE
SAVE <- TRUE
#NOTE: Set LOAD to TRUE if you want to load saved file
#NOTE: Set LOAD to FALSE if you want to generate the object from scratch
#NOTE: Set SAVE to TRUE if you want to save the object externally
if(LOAD) {
OBJECT <- readRDS(file = 'MySavedObject.rds')
} else {
OBJECT <- COMPLICATED.FUNCTION(x, y, z)
if (SAVE) { saveRDS(file = 'MySavedObject.rds', object = OBJECT) } }
```
With this code structure, all I need to do is to change LOAD depending on whether I want to generate the object, or load it directly from an existing saved file. (Of course, I have to generate it and save it the first time, but after this I have the option of loading it.) Setting LOAD <- TRUE bypasses use of my complicated function and avoids all of the heavy computation therein. This method still requires enough memory to store the object of interest, but it saves you from having to calculate it each time you run your code. For objects that require a lot of heavy calculation of intermediate steps (e.g., for calculations involving loops over large arrays) this can save a substantial amount of time and computation.
使用knitr和将脚本放在Rmd块中也可以获得一些好处。
我通常将代码划分为不同的块,并选择将检查点保存到缓存或RDS文件中
在那里,你可以设置一个块被保存到“缓存”,或者你可以决定运行或不运行一个特定的块。这样,在第一次运行时,你只能处理“第一部分”,而在另一次执行时,你只能选择“第二部分”,等等。
例子:
part1
```{r corpus, warning=FALSE, cache=TRUE, message=FALSE, eval=TRUE}
corpusTw <- corpus(twitter) # build the corpus
```
part2
```{r trigrams, warning=FALSE, cache=TRUE, message=FALSE, eval=FALSE}
dfmTw <- dfm(corpusTw, verbose=TRUE, removeTwitter=TRUE, ngrams=3)
```
作为一个副作用,这也可以让你在可重复性方面省去一些麻烦:)
我喜欢Dirk的.ls.objects()脚本,但我总是眯着眼睛数大小列中的字符。所以我做了一些丑陋的hack,使它呈现出漂亮的格式大小:
.ls.objects <- function (pos = 1, pattern, order.by,
decreasing=FALSE, head=FALSE, n=5) {
napply <- function(names, fn) sapply(names, function(x)
fn(get(x, pos = pos)))
names <- ls(pos = pos, pattern = pattern)
obj.class <- napply(names, function(x) as.character(class(x))[1])
obj.mode <- napply(names, mode)
obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
obj.size <- napply(names, object.size)
obj.prettysize <- sapply(obj.size, function(r) prettyNum(r, big.mark = ",") )
obj.dim <- t(napply(names, function(x)
as.numeric(dim(x))[1:2]))
vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
obj.dim[vec, 1] <- napply(names, length)[vec]
out <- data.frame(obj.type, obj.size,obj.prettysize, obj.dim)
names(out) <- c("Type", "Size", "PrettySize", "Rows", "Columns")
if (!missing(order.by))
out <- out[order(out[[order.by]], decreasing=decreasing), ]
out <- out[c("Type", "PrettySize", "Rows", "Columns")]
names(out) <- c("Type", "Size", "Rows", "Columns")
if (head)
out <- head(out, n)
out
}
我非常喜欢Dirk开发的改进的对象函数。不过,大多数时候,一个包含对象名称和大小的更基本的输出对我来说就足够了。这是一个具有类似目标的简单函数。内存使用可以按字母顺序或大小排序,可以限制为一定数量的对象,并且可以按升序或降序排序。此外,我经常处理1GB以上的数据,因此该函数相应地改变单位。
showMemoryUse <- function(sort="size", decreasing=FALSE, limit) {
objectList <- ls(parent.frame())
oneKB <- 1024
oneMB <- 1048576
oneGB <- 1073741824
memoryUse <- sapply(objectList, function(x) as.numeric(object.size(eval(parse(text=x)))))
memListing <- sapply(memoryUse, function(size) {
if (size >= oneGB) return(paste(round(size/oneGB,2), "GB"))
else if (size >= oneMB) return(paste(round(size/oneMB,2), "MB"))
else if (size >= oneKB) return(paste(round(size/oneKB,2), "kB"))
else return(paste(size, "bytes"))
})
memListing <- data.frame(objectName=names(memListing),memorySize=memListing,row.names=NULL)
if (sort=="alphabetical") memListing <- memListing[order(memListing$objectName,decreasing=decreasing),]
else memListing <- memListing[order(memoryUse,decreasing=decreasing),] #will run if sort not specified or "size"
if(!missing(limit)) memListing <- memListing[1:limit,]
print(memListing, row.names=FALSE)
return(invisible(memListing))
}
下面是一些输出示例:
> showMemoryUse(decreasing=TRUE, limit=5)
objectName memorySize
coherData 713.75 MB
spec.pgram_mine 149.63 kB
stoch.reg 145.88 kB
describeBy 82.5 kB
lmBandpass 68.41 kB
我在推特上看到了这个,觉得德克的功能太棒了!根据JD Long的回答,为了方便用户阅读,我会这样做:
# improved list of objects
.ls.objects <- function (pos = 1, pattern, order.by,
decreasing=FALSE, head=FALSE, n=5) {
napply <- function(names, fn) sapply(names, function(x)
fn(get(x, pos = pos)))
names <- ls(pos = pos, pattern = pattern)
obj.class <- napply(names, function(x) as.character(class(x))[1])
obj.mode <- napply(names, mode)
obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
obj.prettysize <- napply(names, function(x) {
format(utils::object.size(x), units = "auto") })
obj.size <- napply(names, object.size)
obj.dim <- t(napply(names, function(x)
as.numeric(dim(x))[1:2]))
vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
obj.dim[vec, 1] <- napply(names, length)[vec]
out <- data.frame(obj.type, obj.size, obj.prettysize, obj.dim)
names(out) <- c("Type", "Size", "PrettySize", "Length/Rows", "Columns")
if (!missing(order.by))
out <- out[order(out[[order.by]], decreasing=decreasing), ]
if (head)
out <- head(out, n)
out
}
# shorthand
lsos <- function(..., n=10) {
.ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
}
lsos()
结果如下:
Type Size PrettySize Length/Rows Columns
pca.res PCA 790128 771.6 Kb 7 NA
DF data.frame 271040 264.7 Kb 669 50
factor.AgeGender factanal 12888 12.6 Kb 12 NA
dates data.frame 9016 8.8 Kb 669 2
sd. numeric 3808 3.7 Kb 51 NA
napply function 2256 2.2 Kb NA NA
lsos function 1944 1.9 Kb NA NA
load loadings 1768 1.7 Kb 12 2
ind.sup integer 448 448 bytes 102 NA
x character 96 96 bytes 1 NA
注:我补充的主要部分是(再次改编自JD的回答):
obj.prettysize <- napply(names, function(x) {
print(object.size(x), units = "auto") })