我试着用R来计算矩阵中一系列值的移动平均值。R中似乎没有一个内置函数可以让我计算移动平均线。有任何软件包提供这样的服务吗?还是需要我自己写?
当前回答
在数据。表1.12.0增加了新的滚动平均值函数,以计算快速和准确的滚动平均值,仔细处理NA, NaN和+Inf, -Inf值。
由于在这个问题中没有可重复的例子,所以在这里没有更多的问题要解决。
你可以在手册中找到更多关于?frollmean的信息,也可以在?frollmean网站上找到。
下面是手册中的例子:
library(data.table)
d = as.data.table(list(1:6/2, 3:8/4))
# rollmean of single vector and single window
frollmean(d[, V1], 3)
# multiple columns at once
frollmean(d, 3)
# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))
# multiple columns and multiple windows at once
frollmean(d, c(3, 4))
## three above are embarrassingly parallel using openmp
其他回答
动物园包中的滚动平均值/最大值/中位数(rollmean) TTR中的移动平均线 马云在预测
编辑:非常喜欢添加侧参数,例如,一个日期向量的过去7天的移动平均值(或总和,或…)。
对于那些只想自己计算的人来说,它无非是:
# x = vector with numeric data
# w = window length
y <- numeric(length = length(x))
for (i in seq_len(length(x))) {
ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
ind <- ind[ind %in% seq_len(length(x))]
y[i] <- mean(x[ind])
}
y
但是让它独立于mean()会很有趣,所以你可以计算任何“移动”函数!
# our working horse:
moving_fn <- function(x, w, fun, ...) {
# x = vector with numeric data
# w = window length
# fun = function to apply
# side = side to take, (c)entre, (l)eft or (r)ight
# ... = parameters passed on to 'fun'
y <- numeric(length(x))
for (i in seq_len(length(x))) {
if (side %in% c("c", "centre", "center")) {
ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
} else if (side %in% c("l", "left")) {
ind <- c((i - floor(w) + 1):i)
} else if (side %in% c("r", "right")) {
ind <- c(i:(i + floor(w) - 1))
} else {
stop("'side' must be one of 'centre', 'left', 'right'", call. = FALSE)
}
ind <- ind[ind %in% seq_len(length(x))]
y[i] <- fun(x[ind], ...)
}
y
}
# and now any variation you can think of!
moving_average <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = mean, side = side, na.rm = na.rm)
}
moving_sum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = sum, side = side, na.rm = na.rm)
}
moving_maximum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = max, side = side, na.rm = na.rm)
}
moving_median <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = median, side = side, na.rm = na.rm)
}
moving_Q1 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.25)
}
moving_Q3 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.75)
}
滑块包可以用于此。它有一个专门设计的界面,感觉类似呜呜声。它接受任何任意函数,并可以返回任何类型的输出。数据帧甚至按行迭代。pkgdown网站在这里。
library(slider)
x <- 1:3
# Mean of the current value + 1 value before it
# returned as a double vector
slide_dbl(x, ~mean(.x, na.rm = TRUE), .before = 1)
#> [1] 1.0 1.5 2.5
df <- data.frame(x = x, y = x)
# Slide row wise over data frames
slide(df, ~.x, .before = 1)
#> [[1]]
#> x y
#> 1 1 1
#>
#> [[2]]
#> x y
#> 1 1 1
#> 2 2 2
#>
#> [[3]]
#> x y
#> 1 2 2
#> 2 3 3
滑块和数据的开销。Table的frollapply()应该非常低(比zoo快得多)。对于这个简单的示例,Frollapply()看起来稍微快一些,但请注意,它只接受数字输入,并且输出必须是标量数值。滑块函数是完全通用的,你可以返回任何数据类型。
library(slider)
library(zoo)
library(data.table)
x <- 1:50000 + 0L
bench::mark(
slider = slide_int(x, function(x) 1L, .before = 5, .complete = TRUE),
zoo = rollapplyr(x, FUN = function(x) 1L, width = 6, fill = NA),
datatable = frollapply(x, n = 6, FUN = function(x) 1L),
iterations = 200
)
#> # A tibble: 3 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 slider 19.82ms 26.4ms 38.4 829.8KB 19.0
#> 2 zoo 177.92ms 211.1ms 4.71 17.9MB 24.8
#> 3 datatable 7.78ms 10.9ms 87.9 807.1KB 38.7
虽然有点慢,但你也可以使用zoo::rollapply在矩阵上执行计算。
reqd_ma <- rollapply(x, FUN = mean, width = n)
其中x为数据集,FUN = mean为函数;你也可以改变它为min, max, sd等,宽度是滚动窗口。
为了配合坎迪奇西斯和罗德里戈·雷麦黛奥的回答;
moving_fun <- function(x, w, FUN, ...) {
# x: a double vector
# w: the length of the window, i.e., the section of the vector selected to apply FUN
# FUN: a function that takes a vector and return a summarize value, e.g., mean, sum, etc.
# Given a double type vector apply a FUN over a moving window from left to the right,
# when a window boundary is not a legal section, i.e. lower_bound and i (upper bound)
# are not contained in the length of the vector, return a NA_real_
if (w < 1) {
stop("The length of the window 'w' must be greater than 0")
}
output <- x
for (i in 1:length(x)) {
# plus 1 because the index is inclusive with the upper_bound 'i'
lower_bound <- i - w + 1
if (lower_bound < 1) {
output[i] <- NA_real_
} else {
output[i] <- FUN(x[lower_bound:i, ...])
}
}
output
}
# example
v <- seq(1:10)
# compute a MA(2)
moving_fun(v, 2, mean)
# compute moving sum of two periods
moving_fun(v, 2, sum)