R有许多在帮助文件中有巧妙描述的*apply函数(例如?apply)。但是,它们太多了,初学者可能很难决定哪一个适合他们的情况,甚至很难记住它们。他们可能有一个普遍的感觉,“我应该在这里使用一个*apply函数”,但一开始很难把它们都说清楚。
尽管事实上(在其他回答中提到)*apply系列的大部分功能都由非常流行的plyr包覆盖,但基本函数仍然有用,值得了解。
这个答案旨在作为新用户的一个路标,帮助他们针对特定的问题找到正确的*apply函数。注意,这不是为了简单地复制或替换R文档!希望这个答案能帮助您决定哪个*apply函数适合您的情况,然后由您进一步研究。除了一个例外,性能差异将不予处理。
apply - When you want to apply a function to the rows or columns
of a matrix (and higher-dimensional analogues); not generally advisable for data frames as it will coerce to a matrix first.
# Two dimensional matrix
M <- matrix(seq(1,16), 4, 4)
# apply min to rows
apply(M, 1, min)
[1] 1 2 3 4
# apply max to columns
apply(M, 2, max)
[1] 4 8 12 16
# 3 dimensional array
M <- array( seq(32), dim = c(4,4,2))
# Apply sum across each M[*, , ] - i.e Sum across 2nd and 3rd dimension
apply(M, 1, sum)
# Result is one-dimensional
[1] 120 128 136 144
# Apply sum across each M[*, *, ] - i.e Sum across 3rd dimension
apply(M, c(1,2), sum)
# Result is two-dimensional
[,1] [,2] [,3] [,4]
[1,] 18 26 34 42
[2,] 20 28 36 44
[3,] 22 30 38 46
[4,] 24 32 40 48
If you want row/column means or sums for a 2D matrix, be sure to
investigate the highly optimized, lightning-quick colMeans,
rowMeans, colSums, rowSums.
lapply - When you want to apply a function to each element of a
list in turn and get a list back.
This is the workhorse of many of the other *apply functions. Peel
back their code and you will often find lapply underneath.
x <- list(a = 1, b = 1:3, c = 10:100)
lapply(x, FUN = length)
$a
[1] 1
$b
[1] 3
$c
[1] 91
lapply(x, FUN = sum)
$a
[1] 1
$b
[1] 6
$c
[1] 5005
sapply - When you want to apply a function to each element of a
list in turn, but you want a vector back, rather than a list.
If you find yourself typing unlist(lapply(...)), stop and consider
sapply.
x <- list(a = 1, b = 1:3, c = 10:100)
# Compare with above; a named vector, not a list
sapply(x, FUN = length)
a b c
1 3 91
sapply(x, FUN = sum)
a b c
1 6 5005
In more advanced uses of sapply it will attempt to coerce the
result to a multi-dimensional array, if appropriate. For example, if our function returns vectors of the same length, sapply will use them as columns of a matrix:
sapply(1:5,function(x) rnorm(3,x))
If our function returns a 2 dimensional matrix, sapply will do essentially the same thing, treating each returned matrix as a single long vector:
sapply(1:5,function(x) matrix(x,2,2))
Unless we specify simplify = "array", in which case it will use the individual matrices to build a multi-dimensional array:
sapply(1:5,function(x) matrix(x,2,2), simplify = "array")
Each of these behaviors is of course contingent on our function returning vectors or matrices of the same length or dimension.
vapply - When you want to use sapply but perhaps need to
squeeze some more speed out of your code or want more type safety.
For vapply, you basically give R an example of what sort of thing
your function will return, which can save some time coercing returned
values to fit in a single atomic vector.
x <- list(a = 1, b = 1:3, c = 10:100)
#Note that since the advantage here is mainly speed, this
# example is only for illustration. We're telling R that
# everything returned by length() should be an integer of
# length 1.
vapply(x, FUN = length, FUN.VALUE = 0L)
a b c
1 3 91
mapply - For when you have several data structures (e.g.
vectors, lists) and you want to apply a function to the 1st elements
of each, and then the 2nd elements of each, etc., coercing the result
to a vector/array as in sapply.
This is multivariate in the sense that your function must accept
multiple arguments.
#Sums the 1st elements, the 2nd elements, etc.
mapply(sum, 1:5, 1:5, 1:5)
[1] 3 6 9 12 15
#To do rep(1,4), rep(2,3), etc.
mapply(rep, 1:4, 4:1)
[[1]]
[1] 1 1 1 1
[[2]]
[1] 2 2 2
[[3]]
[1] 3 3
[[4]]
[1] 4
Map - A wrapper to mapply with SIMPLIFY = FALSE, so it is guaranteed to return a list.
Map(sum, 1:5, 1:5, 1:5)
[[1]]
[1] 3
[[2]]
[1] 6
[[3]]
[1] 9
[[4]]
[1] 12
[[5]]
[1] 15
rapply - For when you want to apply a function to each element of a nested list structure, recursively.
To give you some idea of how uncommon rapply is, I forgot about it when first posting this answer! Obviously, I'm sure many people use it, but YMMV. rapply is best illustrated with a user-defined function to apply:
# Append ! to string, otherwise increment
myFun <- function(x){
if(is.character(x)){
return(paste(x,"!",sep=""))
}
else{
return(x + 1)
}
}
#A nested list structure
l <- list(a = list(a1 = "Boo", b1 = 2, c1 = "Eeek"),
b = 3, c = "Yikes",
d = list(a2 = 1, b2 = list(a3 = "Hey", b3 = 5)))
# Result is named vector, coerced to character
rapply(l, myFun)
# Result is a nested list like l, with values altered
rapply(l, myFun, how="replace")
tapply - For when you want to apply a function to subsets of a
vector and the subsets are defined by some other vector, usually a
factor.
The black sheep of the *apply family, of sorts. The help file's use of
the phrase "ragged array" can be a bit confusing, but it is actually
quite simple.
A vector:
x <- 1:20
A factor (of the same length!) defining groups:
y <- factor(rep(letters[1:5], each = 4))
Add up the values in x within each subgroup defined by y:
tapply(x, y, sum)
a b c d e
10 26 42 58 74
More complex examples can be handled where the subgroups are defined
by the unique combinations of a list of several factors. tapply is
similar in spirit to the split-apply-combine functions that are
common in R (aggregate, by, ave, ddply, etc.) Hence its
black sheep status.