让我们假设我们有一个数据集,它大概是

import numpy as np
x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.2

Therefore we have a variation of 20% of the dataset. My first idea was to use the UnivariateSpline function of scipy, but the problem is that this does not consider the small noise in a good way. If you consider the frequencies, the background is much smaller than the signal, so a spline only of the cutoff might be an idea, but that would involve a back and forth fourier transformation, which might result in bad behaviour. Another way would be a moving average, but this would also need the right choice of the delay.

有什么提示/书籍或链接可以解决这个问题吗?


当前回答

编辑:看看这个答案。使用np。Cumsum比np.卷积快得多

我使用了一种快速而肮脏的方法来平滑数据,基于移动平均盒(通过卷积):

x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.8

def smooth(y, box_pts):
    box = np.ones(box_pts)/box_pts
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth

plot(x, y,'o')
plot(x, smooth(y,3), 'r-', lw=2)
plot(x, smooth(y,19), 'g-', lw=2)

其他回答

使用移动平均线,一种快速的方法(也适用于非双射函数)是

def smoothen(x, winsize=5):
    return np.array(pd.Series(x).rolling(winsize).mean())[winsize-1:]

此代码基于https://towardsdatascience.com/data-smoothing-for-data-science-visualization-the-goldilocks-trio-part-1-867765050615。文中还讨论了更先进的解决方案。

你也可以用这个:

def smooth(scalars, weight = 0.8):  # Weight between 0 and 1
        return [scalars[i] * weight + (1 - weight) * scalars[i+1] for i in range(len(scalars)) if i < len(scalars)-1]

在scipy中有一个简单的函数。Ndimage也适用于我:

from scipy.ndimage import uniform_filter1d

y_smooth = uniform_filter1d(y,size=15)

编辑:看看这个答案。使用np。Cumsum比np.卷积快得多

我使用了一种快速而肮脏的方法来平滑数据,基于移动平均盒(通过卷积):

x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.8

def smooth(y, box_pts):
    box = np.ones(box_pts)/box_pts
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth

plot(x, y,'o')
plot(x, smooth(y,3), 'r-', lw=2)
plot(x, smooth(y,19), 'g-', lw=2)

另一个选择是在statmodel中使用KernelReg:

from statsmodels.nonparametric.kernel_regression import KernelReg
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.2

# The third parameter specifies the type of the variable x;
# 'c' stands for continuous
kr = KernelReg(y,x,'c')
plt.plot(x, y, '+')
y_pred, y_std = kr.fit(x)

plt.plot(x, y_pred)
plt.show()