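Is there a SciPy or NumPy function or module for Python that computes the running mean of a 1D array for a given window?
Current answer
My solution is based on the "simple moving average" as described on Wikipedia.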
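import numpy as np
from numba import jit

@jit
def sma(x, N):
    s = np.zeros_like(x)  # assumes x is a float array
    k = 1 / N
    s[0] = x[0] * k
    # Build up the first full window: s[N - 1] == mean(x[0:N])
    for i in range(1, N):
        s[i] = s[i - 1] + x[i] * k
    # Slide the window: add the newest sample, drop the oldest
    for i in range(N, x.shape[0]):
        s[i] = s[i - 1] + (x[i] - x[i - N]) * k
    # Discard the partial-window warm-up values
    s = s[N - 1:]
    return s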
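Compared with the previously suggested solutions, it is about twice as fast as the fastest SciPy solution, "uniform_filter1d", and its error is of the same order of magnitude. Speed test: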
import numpy as np
x = np.random.random(10000000)
N = 1000
from scipy.ndimage import uniform_filter1d
%timeit uniform_filter1d(x, size=N)
95.7 ms ± 9.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit sma(x, N)
47.3 ms ± 3.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Error comparison:
np.max(np.abs(np.convolve(x, np.ones((N,))/N, mode='valid') - uniform_filter1d(x, size=N, mode='constant', origin=-(N//2))[:-(N-1)]))
8.604228440844963e-14
np.max(np.abs(np.convolve(x, np.ones((N,))/N, mode='valid') - sma(x, N)))
1.41886502547095e-13
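As a small sanity check of the running-sum recurrence that sma relies on, here is a minimal sketch without Numba. The name sma_reference is hypothetical and not part of the answer above; it computes the first full window directly and then applies the same "add the newest, drop the oldest" update.

```python
import numpy as np

def sma_reference(x, N):
    """Simple moving average via the running-sum recurrence (no Numba)."""
    x = np.asarray(x, dtype=float)
    s = np.empty(x.shape[0] - N + 1)
    # First full window is computed directly
    s[0] = x[:N].sum() / N
    # Each later window adds the newest sample and drops the oldest
    for i in range(1, s.shape[0]):
        s[i] = s[i - 1] + (x[i + N - 1] - x[i - 1]) / N
    return s

x = np.arange(1.0, 8.0)  # 1, 2, ..., 7
result = sma_reference(x, 3)
expected = np.convolve(x, np.ones(3) / 3, mode='valid')
print(np.allclose(result, expected))
```

Because the update accumulates rounding error from window to window, a comparison with a tolerance (as with np.allclose here) is more appropriate than exact equality.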
Other answers
For pedagogical purposes, let me add two more NumPy solutions (both slower than the cumsum solution):
import numpy as np
from numpy.lib.stride_tricks import as_strided

def ra_strides(arr, window):
    ''' Running average using as_strided'''
    n = arr.shape[0] - window + 1
    arr_strided = as_strided(arr, shape=[n, window], strides=2*arr.strides)
    return arr_strided.mean(axis=1)

def ra_add(arr, window):
    ''' Running average using add.reduceat'''
    n = arr.shape[0] - window + 1
    indices = np.array([0, window]*n) + np.repeat(np.arange(n), 2)
    arr = np.append(arr, 0)
    return np.add.reduceat(arr, indices)[::2]/window
Functions used: as_strided, add.reduceat
I think this can be solved nicely with bottleneck.
See the basic example below:
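On NumPy 1.20 or later, the same windowed view can probably be obtained more safely with sliding_window_view, which checks the shape for you instead of requiring hand-computed strides. The function name ra_window_view below is hypothetical, and this is only a sketch of the idea, not a faster replacement for the cumsum approach.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ra_window_view(arr, window):
    """Running average using a read-only sliding window view."""
    # Shape of the view: (len(arr) - window + 1, window)
    windows = sliding_window_view(arr, window_shape=window)
    return windows.mean(axis=-1)

values = np.arange(1, 8)  # 1, 2, ..., 7
averages = ra_window_view(values, 3)
print(averages)
```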
import numpy as np
import bottleneck as bn
a = np.random.randint(4, 1000, size=100)
mm = bn.move_mean(a, window=5, min_count=1)
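"mm" is the moving mean of "a". "window" is the maximum number of entries considered for the moving mean. "min_count" is the minimum number of entries required for the moving mean (for example, for the first few elements, or when the array contains NaN values).
The good part is that bottleneck helps deal with NaN values and is also very efficient.
For a short, fast solution that does everything in one loop, with no dependencies, the code below works well.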
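To make the roles of "window" and "min_count" concrete, here is a NumPy-only sketch that imitates the semantics described above: each output is the mean of the non-NaN values among the last "window" entries, or NaN when fewer than "min_count" of them are valid. The function move_mean_sketch is hypothetical, assumes min_count >= 1, and is not bottleneck's implementation; treat it as an illustration of the behavior rather than a substitute.

```python
import numpy as np

def move_mean_sketch(a, window, min_count=None):
    """NaN-aware trailing moving mean (illustrative, assumes min_count >= 1)."""
    a = np.asarray(a, dtype=float)
    if min_count is None:
        min_count = window
    valid = ~np.isnan(a)
    # Prefix sums of the valid values and of the valid-value counts
    csum = np.concatenate(([0.0], np.cumsum(np.where(valid, a, 0.0))))
    ccnt = np.concatenate(([0], np.cumsum(valid)))
    end = np.arange(1, a.size + 1)
    start = np.maximum(end - window, 0)  # shorter windows at the beginning
    sums = csum[end] - csum[start]
    counts = ccnt[end] - ccnt[start]
    out = np.full(a.size, np.nan)
    ok = counts >= min_count
    out[ok] = sums[ok] / counts[ok]
    return out

data = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
relaxed = move_mean_sketch(data, window=3, min_count=1)
strict = move_mean_sketch(data, window=3)
print(relaxed)
print(strict)
```

With min_count=1 every position gets a value; with the default (min_count equal to the window) every window in this small example contains fewer than three valid entries, so the result is all NaN.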
mylist = [1, 2, 3, 4, 5, 6, 7]
N = 3
cumsum, moving_aves = [0], []

for i, x in enumerate(mylist, 1):
    cumsum.append(cumsum[i-1] + x)
    if i>=N:
        moving_ave = (cumsum[i] - cumsum[i-N])/N
        #can do stuff with moving_ave here
        moving_aves.append(moving_ave)
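UPDATE: The example below shows the old pandas.rolling_mean function, which has been removed in recent versions of pandas. A modern equivalent of that function call would use pandas.Series.rolling: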
In [8]: pd.Series(x).rolling(window=N).mean().iloc[N-1:].values
Out[8]:
array([ 0.49815397, 0.49844183, 0.49840518, ..., 0.49488191,
0.49456679, 0.49427121])
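pandas is better suited for this than NumPy or SciPy. Its function rolling_mean does the job conveniently. It also returns a NumPy array when the input is an array.
It is difficult to beat rolling_mean in performance with any custom pure Python implementation. Here is an example of its performance against two of the proposed solutions: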
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: def running_mean(x, N):
   ...:     cumsum = np.cumsum(np.insert(x, 0, 0))
   ...:     return (cumsum[N:] - cumsum[:-N]) / N
   ...:
In [4]: x = np.random.random(100000)
In [5]: N = 1000
In [6]: %timeit np.convolve(x, np.ones((N,))/N, mode='valid')
10 loops, best of 3: 172 ms per loop
In [7]: %timeit running_mean(x, N)
100 loops, best of 3: 6.72 ms per loop
In [8]: %timeit pd.rolling_mean(x, N)[N-1:]
100 loops, best of 3: 4.74 ms per loop
In [9]: np.allclose(pd.rolling_mean(x, N)[N-1:], running_mean(x, N))
Out[9]: True
There are also nice options for how to deal with the edge values.
I haven't checked how fast this is, but you could try:
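As a sketch of what such edge handling can look like with the modern rolling API, the min_periods and center parameters of pandas.Series.rolling control how incomplete windows at the boundaries are treated. The variable names below are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default: positions without a full trailing window are NaN
full_only = s.rolling(window=3).mean()

# min_periods=1: average whatever is available at the start
partial = s.rolling(window=3, min_periods=1).mean()

# center=True: the window is centered on each element, so both
# ends are averaged over shorter windows
centered = s.rolling(window=3, center=True, min_periods=1).mean()

print(full_only.values)
print(partial.values)
print(centered.values)
```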
from collections import deque

cache = deque()   # keep track of seen values
n = 10            # window size
A = range(100)    # some dummy iterable
cum_sum = 0       # initialize cumulative sum

for t, val in enumerate(A, 1):
    cache.append(val)
    cum_sum += val
    if t <= n:                      # window still filling, or exactly full
        avg = cum_sum / float(t)
    else:                           # window holds one value too many,
        cum_sum -= cache.popleft()  # so subtract the oldest value
        avg = cum_sum / float(n)
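The deque bookkeeping can also be wrapped in a generator so that it works on any iterable and is easy to verify against a brute-force computation. The name running_averages is hypothetical; the sketch keeps at most n values in the deque and divides by the number of values currently held, so the first n - 1 outputs are averages over a partially filled window.

```python
from collections import deque

def running_averages(iterable, n):
    """Yield the mean of the last n values seen so far (fewer at the start)."""
    cache = deque()
    cum_sum = 0.0
    for val in iterable:
        cache.append(val)
        cum_sum += val
        if len(cache) > n:
            # Window overflowed: drop the oldest value from the sum
            cum_sum -= cache.popleft()
        yield cum_sum / len(cache)

data = list(range(10))
averages = list(running_averages(data, 3))

# Brute-force reference: slice out each trailing window explicitly
brute_force = []
for i in range(len(data)):
    chunk = data[max(0, i - 2):i + 1]
    brute_force.append(sum(chunk) / len(chunk))

print(averages)
```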