我试着用R来计算矩阵中一系列值的移动平均值。R中似乎没有一个内置函数可以让我计算移动平均线。有任何软件包提供这样的服务吗?还是需要我自己写?


当前回答

编辑:非常喜欢添加侧参数,例如,一个日期向量的过去7天的移动平均值(或总和,或…)。


对于那些只想自己计算的人来说,它无非是:

# x = vector with numeric data
# w = window length
y <- numeric(length = length(x))

for (i in seq_len(length(x))) {
  ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
  ind <- ind[ind %in% seq_len(length(x))]
  y[i] <- mean(x[ind])
}

y

但是让它独立于mean()会很有趣,所以你可以计算任何“移动”函数!

# our working horse:
moving_fn <- function(x, w, fun, ...) {
  # x = vector with numeric data
  # w = window length
  # fun = function to apply
  # side = side to take, (c)entre, (l)eft or (r)ight
  # ... = parameters passed on to 'fun'
  y <- numeric(length(x))
  for (i in seq_len(length(x))) {
    if (side %in% c("c", "centre", "center")) {
      ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
    } else if (side %in% c("l", "left")) {
      ind <- c((i - floor(w) + 1):i)
    } else if (side %in% c("r", "right")) {
      ind <- c(i:(i + floor(w) - 1))
    } else {
      stop("'side' must be one of 'centre', 'left', 'right'", call. = FALSE)
    }
    ind <- ind[ind %in% seq_len(length(x))]
    y[i] <- fun(x[ind], ...)
  }
  y
}

# and now any variation you can think of!
moving_average <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = mean, side = side, na.rm = na.rm)
}

moving_sum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = sum, side = side, na.rm = na.rm)
}

moving_maximum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = max, side = side, na.rm = na.rm)
}

moving_median <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = median, side = side, na.rm = na.rm)
}

moving_Q1 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.25)
}

moving_Q3 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.75)
}

其他回答

可以使用runner包来移动函数。在本例中是mean_run函数。cummean的问题是它不处理NA值,但mean_run可以。Runner包还支持不规则时间序列,Windows可以依赖于日期:

library(runner)
set.seed(11)
x1 <- rnorm(15)
x2 <- sample(c(rep(NA,5), rnorm(15)), 15, replace = TRUE)
date <- Sys.Date() + cumsum(sample(1:3, 15, replace = TRUE))

mean_run(x1)
#>  [1] -0.5910311 -0.2822184 -0.6936633 -0.8609108 -0.4530308 -0.5332176
#>  [7] -0.2679571 -0.1563477 -0.1440561 -0.2300625 -0.2844599 -0.2897842
#> [13] -0.3858234 -0.3765192 -0.4280809

mean_run(x2, na_rm = TRUE)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.12188853 -0.13873536
#>  [7] -0.13873536 -0.14571604 -0.12596067 -0.11116961 -0.09881996 -0.08871569
#> [13] -0.05194292 -0.04699909 -0.05704202

mean_run(x2, na_rm = FALSE )
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.12188853 -0.13873536
#>  [7]          NA          NA          NA          NA          NA          NA
#> [13]          NA          NA          NA

mean_run(x2, na_rm = TRUE, k = 4)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.10546063 -0.16299272
#>  [7] -0.21203756 -0.39209010 -0.13274756 -0.05603811 -0.03894684  0.01103493
#> [13]  0.09609256  0.09738460  0.04740283

mean_run(x2, na_rm = TRUE, k = 4, idx = date)
#> [1] -0.187600111 -0.090220655 -0.004349696  0.168349653 -0.206571573 -0.494335093
#> [7] -0.222969541 -0.187600111 -0.087636571  0.009742884  0.009742884  0.012326968
#> [13]  0.182442234  0.125737145  0.059094786

还可以指定其他选项,如延迟和仅在特定索引处滚动。更多内容在包和函数文档中。

动物园包中的滚动平均值/最大值/中位数(rollmean) TTR中的移动平均线 马云在预测

或者你可以简单地计算它使用过滤器,这是我使用的函数:

ma <- function(x, n = 5){filter(x, rep(1 / n, n), sides = 2)}

如果使用dplyr,请注意在上面的函数中指定stats::filter。

下面的示例代码展示了如何使用zoo包中的rollmean函数计算居中移动平均和尾随移动平均。

library(tidyverse)
library(zoo)

some_data = tibble(day = 1:10)
# cma = centered moving average
# tma = trailing moving average
some_data = some_data %>%
    mutate(cma = rollmean(day, k = 3, fill = NA)) %>%
    mutate(tma = rollmean(day, k = 3, fill = NA, align = "right"))
some_data
#> # A tibble: 10 x 3
#>      day   cma   tma
#>    <int> <dbl> <dbl>
#>  1     1    NA    NA
#>  2     2     2    NA
#>  3     3     3     2
#>  4     4     4     3
#>  5     5     5     4
#>  6     6     6     5
#>  7     7     7     6
#>  8     8     8     7
#>  9     9     9     8
#> 10    10    NA     9
vector_avg <- function(x){
  sum_x = 0
  for(i in 1:length(x)){
    if(!is.na(x[i]))
      sum_x = sum_x + x[i]
  }
  return(sum_x/length(x))
}