Let's assume we have a dataset that looks roughly like this:
import numpy as np
x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.2
So the noise varies by up to 20% of the signal amplitude. My first idea was to use SciPy's UnivariateSpline function, but the problem is that it does not handle the small noise well. Looking at the frequencies, the background is much smaller than the signal, so a spline of only the part below a cutoff might be an idea, but that would involve a back-and-forth Fourier transformation, which might lead to bad behaviour.
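A note on the spline route: in SciPy's UnivariateSpline the amount of smoothing is set by the parameter s, which bounds the sum of squared residuals. By default s equals the number of data points, which is very loose for noise this small. A minimal sketch, assuming the noise level is known and using the common heuristic s = N * sigma**2 (an assumption, not a rule), could look like this; the seeded generator is only there to make it reproducible.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)  # seeded for reproducibility (assumption)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.random(100) * 0.2

# Uniform noise on [0, 0.2) has standard deviation 0.2 / sqrt(12).
sigma = 0.2 / np.sqrt(12)

# s bounds sum((y - spl(x))**2); heuristic choice s = N * sigma**2 (assumed).
spl = UnivariateSpline(x, y, k=3, s=len(x) * sigma**2)
y_smooth = spl(x)
```

The fitted curve follows sin(x) plus the 0.1 mean of the uniform noise: the noise is not zero-mean, so no smoother can remove that offset.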
Another way would be a moving average, but that also requires choosing the right window size (delay).
Any hints, books, or links on how to solve this problem?
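EDIT: Look at this answer. Using np.cumsum is much faster than np.convolve.
I use this quick and dirty way of smoothing data, based on a moving-average box (by convolution):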
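import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.8

def smooth(y, box_pts):
    # Box kernel of equal weights summing to 1, i.e. a moving average
    box = np.ones(box_pts)/box_pts
    # mode='same' keeps the output the same length as y
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth

plt.plot(x, y,'o')
plt.plot(x, smooth(y,3), 'r-', lw=2)
plt.plot(x, smooth(y,19), 'g-', lw=2)
plt.show()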
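A sketch of the np.cumsum version the edit refers to (the function name moving_average_cumsum is hypothetical). Each window sum is the difference of two running sums, so the cost is O(N) regardless of box_pts, whereas np.convolve computes every window directly. Unlike smooth() above, it returns only full windows, like np.convolve(..., mode='valid'), so the output is box_pts - 1 samples shorter than the input.

```python
import numpy as np

def moving_average_cumsum(y, box_pts):
    """Moving average over box_pts samples using a running sum.

    Returns len(y) - box_pts + 1 values, one per full window,
    matching np.convolve(y, np.ones(box_pts)/box_pts, mode='valid').
    """
    # Prepend a 0 so that c[i] is the sum of the first i samples.
    c = np.cumsum(np.insert(np.asarray(y, dtype=float), 0, 0.0))
    # Window sum = difference of two running sums, divided by the width.
    return (c[box_pts:] - c[:-box_pts]) / box_pts

print(moving_average_cumsum([1, 2, 3, 4, 5], 2))  # [1.5 2.5 3.5 4.5]
```

On very long signals the running sum can accumulate floating-point rounding error, which the direct convolution avoids.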
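If you are interested in a "smooth" version of a signal that is periodic (like your example), then an FFT is the right way to go. Take the Fourier transform and remove (zero out) the low-contributing frequencies: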
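import numpy as np
import scipy.fftpack

N = 100
x = np.linspace(0,2*np.pi,N)
y = np.sin(x) + np.random.random(N) * 0.2

# Real FFT of the signal (fftpack's packed real/imaginary layout)
w = scipy.fftpack.rfft(y)
# Matching sample frequencies (not used below; handy for plotting)
f = scipy.fftpack.rfftfreq(N, x[1]-x[0])
spectrum = w**2

# Zero every component weaker than 1/5 of the strongest one
cutoff_idx = spectrum < (spectrum.max()/5)
w2 = w.copy()
w2[cutoff_idx] = 0

# Transform back: y2 is the smoothed signal
y2 = scipy.fftpack.irfft(w2)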
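scipy.fftpack is SciPy's legacy FFT module. The same magnitude threshold can be sketched with numpy.fft; its rfft returns complex coefficients, so the power is taken as |w|**2 rather than w**2. The seeded generator is an addition for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility (assumption)
N = 100
x = np.linspace(0, 2 * np.pi, N)
y = np.sin(x) + rng.random(N) * 0.2

# One-sided complex spectrum: N//2 + 1 coefficients.
w = np.fft.rfft(y)
power = np.abs(w) ** 2

# Zero every coefficient whose power is below 1/5 of the strongest one.
w2 = np.where(power < power.max() / 5, 0, w)

# Back to the time domain; pass n so odd N would also round-trip correctly.
y2 = np.fft.irfft(w2, n=N)
```

With this data the DC term (the 0.1 mean of the noise) also falls below the 1/5 threshold and is zeroed, so y2 tracks sin(x) rather than sin(x) + 0.1.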
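Even if your signal isn't completely periodic, this will do a great job of removing white noise. There are many types of filters to use (high-pass, low-pass, etc.); the appropriate one depends on what you are looking for.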
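As an example of the low-pass option, here is a sketch using a Butterworth filter from scipy.signal, a swapped-in technique rather than the FFT thresholding above. The cutoff of 1.0 cycles per unit of x and the filter order of 4 are assumptions to tune, much like the window size of a moving average; sin(x) has a frequency of about 0.16 cycles per unit, well inside the passband.

```python
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(0)  # seeded for reproducibility (assumption)
N = 100
x = np.linspace(0, 2 * np.pi, N)
y = np.sin(x) + rng.random(N) * 0.2

fs = 1.0 / (x[1] - x[0])  # sampling rate, in samples per unit of x
cutoff = 1.0              # assumed cutoff, in cycles per unit of x

# 4th-order Butterworth low-pass; cutoff is in the same units as fs.
b, a = butter(4, cutoff, btype="low", fs=fs)

# filtfilt runs the filter forwards and backwards, so there is no phase shift.
y_low = filtfilt(b, a, y)
```

filtfilt pads the signal before filtering (padtype='odd' by default), so the first and last few samples are less reliable than the interior.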
Fitting a moving average to your data would smooth out the noise; see this answer for how to do that.
If you'd like to use LOWESS to fit your data (it's similar to a moving average but more sophisticated), you can do that using the statsmodels library:
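import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.2

# frac = fraction of the data used for each local regression
lowess = sm.nonparametric.lowess(y, x, frac=0.1)

plt.plot(x, y, '+')
# Column 0: sorted x values, column 1: smoothed y values
plt.plot(lowess[:, 0], lowess[:, 1])
plt.show()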
Finally, if you know the functional form of your signal, you could fit a curve to your data, which would probably be the best thing to do.
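For the example data the functional form is known to be a sine, so a least-squares fit with scipy.optimize.curve_fit is a natural sketch. The four-parameter model and the initial guess p0 are assumptions; for periodic models a poor p0 can converge to a wrong local minimum.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)  # seeded for reproducibility (assumption)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.random(100) * 0.2

def model(x, amp, freq, phase, offset):
    """Assumed functional form: a scaled, shifted sine."""
    return amp * np.sin(freq * x + phase) + offset

# Rough initial guesses (assumed); curve_fit refines them by least squares.
p0 = [1.0, 1.0, 0.0, 0.0]
params, cov = curve_fit(model, x, y, p0=p0)

y_fit = model(x, *params)
# One-standard-deviation uncertainties of the fitted parameters.
perr = np.sqrt(np.diag(cov))
```

Because the uniform noise has mean 0.1, the fitted offset should come out near 0.1 rather than 0.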