Is there a library function for root mean squared error (RMSE) in Python?

I know I can implement a root mean squared error function like this:

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())

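If this rmse function is already implemented in a library somewhere, perhaps in scipy or scikit-learn, what am I looking for?

Current answer

Benchmark

For the specific use case where you don't need any input-handling overhead and always expect numpy arrays as input, the fastest way is to write the function by hand in numpy. What's more, if you call it frequently, you can use numba to speed it up.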

import numpy as np
from numba import jit
from sklearn.metrics import mean_squared_error

%%timeit
mean_squared_error(y[i],y[j], squared=False)

445 µs ± 90.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

def euclidian_distance(y1, y2):
    """
    RMS Euclidean method
    """
    return np.sqrt(((y1-y2)**2).mean())

%%timeit
euclidian_distance(y[i],y[j])

28.8 µs ± 2.54 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

@jit(nopython=True)
def jit_euclidian_distance(y1, y2):
    """
    RMS Euclidean method
    """
    return np.sqrt(((y1-y2)**2).mean())

%%timeit
jit_euclidian_distance(y[i],y[j])

2.1 µs ± 234 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

@jit(nopython=True)
def jit2_euclidian_distance(y1, y2):
    """
    RMS Euclidean method
    """
    return np.linalg.norm(y1-y2)/np.sqrt(y1.shape[0])

%%timeit
jit2_euclidian_distance(y[i],y[j])

2.67 µs ± 60.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

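Extra note: in my use case, numba gives a slightly different, but negligible, result for np.sqrt(((y1-y2)**2).mean()), whereas without numba the result equals the scipy result. Try it yourself.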
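The two formulations benchmarked above (mean of squares versus norm divided by sqrt(n)) can be checked against each other without numba. This is only a minimal sketch: the answer never shows how `y`, `i`, and `j` were defined, so the random 2-D array and the function names here are assumptions for illustration.

```python
import numpy as np

def rms_mean_form(y1, y2):
    # Same expression as euclidian_distance above.
    return np.sqrt(((y1 - y2) ** 2).mean())

def rms_norm_form(y1, y2):
    # Same expression as jit2_euclidian_distance above, without the numba decorator.
    return np.linalg.norm(y1 - y2) / np.sqrt(y1.shape[0])

# Assumed setup: y holds several 1-D series as rows; i and j pick two of them.
rng = np.random.default_rng(0)
y = rng.normal(size=(5, 1000))
i, j = 0, 1

a = rms_mean_form(y[i], y[j])
b = rms_norm_form(y[i], y[j])
agree = bool(np.isclose(a, b))  # equal up to floating-point rounding
print(a, b, agree)
```

Small differences in the last digits are expected, because the two forms accumulate the sum in a different order.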

2021-12-29 14:44:58

Other answers

Or just use NumPy functions:

def rmse(y, y_pred):
    return np.sqrt(np.mean(np.square(y - y_pred)))

Where:

y is my target and y_pred is my prediction.

Note that rmse(y, y_pred) == rmse(y_pred, y) because of the squaring.
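A quick way to see the symmetry, using made-up sample values (the arrays below are illustrative only):

```python
import numpy as np

def rmse(y, y_pred):
    return np.sqrt(np.mean(np.square(y - y_pred)))

# Hypothetical targets and predictions.
y = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

forward = rmse(y, y_pred)
backward = rmse(y_pred, y)
# Swapping the arguments only flips the sign of each difference,
# and squaring removes the sign again.
print(forward == backward)
```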

2019-05-18 08:26:44

As of scikit-learn 0.22.0, you can pass the argument squared=False to mean_squared_error() to return the RMSE.

from sklearn.metrics import mean_squared_error
mean_squared_error(y_actual, y_predicted, squared=False)

2020-01-26 16:38:42

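What is RMSE? Also known as RMSD or RMS (and closely related to MSE). What problem does it solve?

If you understand RMSE (root mean squared error), MSE (mean squared error), RMSD (root mean squared deviation), and RMS (root mean squared), then asking for a library to calculate this for you is unnecessary over-engineering. All of these can be written intuitively in a single line of code. rmse, rmsd, and rms are different names for the same thing; mse is the same quantity before the square root is taken.

RMSE answers the question: "How similar, on average, are the numbers in list1 to those in list2?" The two lists must be the same size. It washes out the noise between any two given elements, washes out the size of the data collected, and gives you a single number as the result.

Intuition and ELI5 for RMSE. What problem does it solve?

Imagine you are learning to throw darts at a dartboard. Every day you practice for one hour. You want to figure out whether you are getting better or worse. So every day you make 10 throws and measure the distance between the bullseye and where your dart hit.

You make a list of those numbers. Use the root mean squared error between the distances on day 1 and a list containing all zeros. Do the same on day 2 and day n. What you get is a single number that hopefully decreases over time. When your RMSE is zero, you hit the bullseye every time. If the rmse number goes up, you are getting worse.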
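The dartboard procedure can be sketched as follows. The distances are invented for illustration (assume centimetres from the bullseye, 10 throws per day); only the rmse function comes from the thread.

```python
import numpy as np

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())

# Hypothetical measurements: distance from the bullseye for 10 throws per day.
throws_by_day = {
    1: np.array([12.0, 9.5, 15.0, 11.0, 8.0, 14.0, 10.5, 13.0, 9.0, 12.5]),
    2: np.array([9.0, 7.5, 10.0, 8.0, 6.0, 11.0, 7.0, 9.5, 6.5, 8.5]),
    3: np.array([5.0, 4.0, 6.5, 3.0, 4.5, 7.0, 2.5, 5.5, 3.5, 4.0]),
}

# The ideal result is a distance of zero for every throw.
bullseye = np.zeros(10)

daily_rmse = {day: rmse(distances, bullseye) for day, distances in throws_by_day.items()}
for day, value in daily_rmse.items():
    print(f"day {day}: RMSE = {value:.2f}")
```

A falling value from one day to the next means the throws are landing closer to the bullseye on average.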

Example of calculating root mean squared error in Python:

import numpy as np
d = [0.000, 0.166, 0.333]   #ideal target distances, these can be all zeros.
p = [0.000, 0.254, 0.998]   #your performance goes here

print("d is: " + str(["%.8f" % elem for elem in d]))
print("p is: " + str(["%.8f" % elem for elem in p]))

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())

rmse_val = rmse(np.array(d), np.array(p))
print("rms error between lists d and p is: " + str(rmse_val))

Which prints:

d is: ['0.00000000', '0.16600000', '0.33300000']
p is: ['0.00000000', '0.25400000', '0.99800000']
rms error between lists d and p is: 0.387284994115

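The mathematical notation:

Glyph legend: n is a positive whole number representing the number of throws. i is a positive whole-number counter that enumerates the sum. d stands for the ideal distances, the list2 containing all zeros in the example above. p stands for performance, the list1 in the example above. The superscript 2 means squared. d_i is the i-th element of d, and p_i is the i-th element of p.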
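Written out with the symbols from the legend above, the formula would read as follows (a reconstruction from the legend and the code, since no formula appears in the text):

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(d_i - p_i\right)^{2}}
```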

The rmse done in small steps so it can be understood:

def rmse(predictions, targets):

    differences = predictions - targets                       #the DIFFERENCEs.

    differences_squared = differences ** 2                    #the SQUAREs of ^

    mean_of_differences_squared = differences_squared.mean()  #the MEAN of ^

    rmse_val = np.sqrt(mean_of_differences_squared)           #ROOT of ^

    return rmse_val                                           #get the ^

How does every step of RMSE work:

Subtracting one number from another gives you the distance between them.

8 - 5 = 3         #absolute distance between 8 and 5 is +3
-20 - 10 = -30    #absolute distance between -20 and 10 is +30

If you multiply any number by itself, the result is never negative, because a negative times a negative is positive:

3*3     = 9   = positive
-30*-30 = 900 = positive

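Add them all up. But wait: an array with many elements would then have a larger error than a small array, so average them by the number of elements.

But wait, we squared them all earlier to force them to be positive. Undo the damage with a square root.

That leaves you with a single number that represents, on average, the distance between every value of list1 and its corresponding element in list2.

If the RMSE value goes down over time, we are happy because the variance is decreasing. "Shrinking the variance" here is a primitive kind of machine learning algorithm.

RMSE isn't the most accurate line-fitting strategy; total least squares is:

Root mean squared error measures the vertical distance between the point and the line, so if your data is shaped like a banana, flat near the bottom and steep near the top, then the RMSE will report greater distances to the high points but shorter distances to the low points, when in fact the distances are equivalent. This causes a skew where the line prefers to be closer to the high points than to the low ones.

If this is a problem, the total least squares method fixes it: https://mubaris.com/posts/linear-regression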
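To make the vertical-versus-perpendicular distinction concrete, here is a minimal sketch that fits a straight line both ways. It assumes ordinary least squares via np.polyfit and a total least squares fit via the SVD of the centered points; the function names and sample data are illustrative and are not taken from the linked article.

```python
import numpy as np

def ols_line(x, y):
    # Ordinary least squares: minimizes squared VERTICAL distances to the line.
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

def tls_line(x, y):
    # Total least squares: minimizes squared PERPENDICULAR distances.
    # The best-fit direction is the first right singular vector of the
    # centered data. Assumes the fitted line is not vertical.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    centered = np.column_stack((x - x_mean, y - y_mean))
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    dx, dy = vt[0]
    slope = dy / dx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical noisy data scattered around y = 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.9, 4.2, 5.8, 8.1, 9.9])

print("OLS:", ols_line(x, y))
print("TLS:", tls_line(x, y))
```

On data that lies exactly on a line the two fits coincide; they drift apart as the noise grows.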

Gotchas that can break this RMSE function:

If there are nulls or infinities in either input list, the output rmse value will not make sense. There are three strategies for dealing with nulls / missing values / infinities in either list: ignore that component, zero it out, or add a best guess or uniform random noise to all timesteps. Each remedy has its pros and cons depending on what your data means. In general, ignoring any component with a missing value is preferred, but this biases the RMSE toward zero, making you think performance has improved when it really hasn't. Adding random noise on a best guess could be preferred if there are lots of missing values.

To guarantee the relative correctness of the RMSE output, you must eliminate all nulls and infinities from the input.
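As a sketch of the "ignore that component" strategy, the following drops every pair in which either value is NaN or infinite before computing the RMSE. The helper name rmse_ignoring_nonfinite and the sample values are hypothetical.

```python
import numpy as np

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())

def rmse_ignoring_nonfinite(predictions, targets):
    # Hypothetical helper: keep only the positions where BOTH values are finite.
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    mask = np.isfinite(predictions) & np.isfinite(targets)
    if not mask.any():
        raise ValueError("no finite pairs left to compare")
    return rmse(predictions[mask], targets[mask])

p = np.array([1.0, np.nan, 3.0, np.inf])
t = np.array([1.0, 2.0, 5.0, 4.0])

naive = rmse(p, t)                       # poisoned by the NaN
cleaned = rmse_ignoring_nonfinite(p, t)  # uses only the pairs (1, 1) and (3, 5)
print(naive, cleaned)
```

Keep in mind the caveat above: dropping pairs changes what the number means, so it is worth reporting how many pairs were discarded.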

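RMSE has zero tolerance for outlier data points that don't belong

Root mean squared error relies on all the data being right, and every point is counted as equal. That means one stray point way out in left field is going to totally ruin the whole calculation. To handle outlier data points and dismiss their tremendous influence beyond a certain threshold, see robust estimators, which build in a threshold for dismissing outliers as extremely rare events whose outlandish results don't need to change our behavior.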
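A small illustration of how much a single stray point moves the result. The error values are made up, and the median absolute error is used here only as one simple robust statistic for comparison; the answer itself does not name a specific robust estimator.

```python
import numpy as np

# Hypothetical absolute errors: nine ordinary points and one wild outlier.
errors = np.array([1.0] * 9 + [100.0])

rmse_all = np.sqrt((errors ** 2).mean())                   # dominated by the outlier
rmse_without_outlier = np.sqrt((errors[:-1] ** 2).mean())  # the nine ordinary points
median_abs_error = np.median(np.abs(errors))               # barely notices the outlier

print(rmse_all, rmse_without_outlier, median_abs_error)
```

One point out of ten is enough to raise the RMSE from 1 to more than 30, while the median stays at 1.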

2016-06-16 14:17:14

Sklearn >= 0.22.0

sklearn.metrics has a mean_squared_error function with a squared kwarg (defaults to True). Setting squared to False will return the RMSE.

from sklearn.metrics import mean_squared_error

rms = mean_squared_error(y_actual, y_predicted, squared=False)

Sklearn < 0.22.0

sklearn.metrics has a mean_squared_error function. The RMSE is just the square root of whatever it returns.

from sklearn.metrics import mean_squared_error
from math import sqrt

rms = sqrt(mean_squared_error(y_actual, y_predicted))
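To my knowledge, later scikit-learn releases deprecated the squared keyword and added a dedicated root_mean_squared_error function (in 1.4), so the calls above may warn or fail depending on the installed version. A hedged sketch that prefers the newer function and otherwise falls back to plain NumPy; the version detail and the fallback are assumptions, so check them against your installed release:

```python
import numpy as np

try:
    # Assumption: this function exists in scikit-learn >= 1.4.
    from sklearn.metrics import root_mean_squared_error
except ImportError:
    # Fallback for older scikit-learn, or when scikit-learn is not installed.
    def root_mean_squared_error(y_true, y_pred):
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical sample values.
y_actual = [3.0, -0.5, 2.0, 7.0]
y_predicted = [2.5, 0.0, 2.0, 8.0]

rms = root_mean_squared_error(y_actual, y_predicted)
print(rms)
```

The sqrt(mean_squared_error(...)) form shown above does not depend on the squared keyword, so it should keep working across versions.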

2013-09-04 20:56:57
