从Udacity的深度学习课程中,y_i的softmax仅仅是指数除以整个Y向量的指数之和:

其中S(y_i)是y_i的软最大函数e是指数函数j是no。输入向量Y中的列。

我试过以下几种方法:

import numpy as np

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
print(softmax(scores))

返回:

[ 0.8360188   0.11314284  0.05083836]

但建议的解决方案是:

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

它产生与第一个实现相同的输出,尽管第一个实现显式地取每列与Max的差值,然后除以和。

有人能用数学方法解释一下吗?一个是对的,另一个是错的?

实现在代码和时间复杂度方面是否相似?哪个更有效率?


当前回答

在上面的回答中已经回答了很多细节。Max被减去以避免溢出。我在这里添加了python3中的另一个实现。

import numpy as np
def softmax(x):
    mx = np.amax(x,axis=1,keepdims = True)
    x_exp = np.exp(x - mx)
    x_sum = np.sum(x_exp, axis = 1, keepdims = True)
    res = x_exp / x_sum
    return res

x = np.array([[3,2,4],[4,5,6]])
print(softmax(x))

其他回答

我的建议是:

def softmax(z):
    z_norm=np.exp(z-np.max(z,axis=0,keepdims=True))
    return(np.divide(z_norm,np.sum(z_norm,axis=0,keepdims=True)))

它既适用于随机,也适用于批量。 欲了解更多详情,请参阅: https://medium.com/@ravish1729/analysis-of-softmax-function-ad058d6a564d

我用了这三句简单的话:

x_exp=np.exp(x)
x_sum=np.sum(x_exp, axis = 1, keepdims = True)
s=x_exp / x_sum

softmax函数是一种激活函数,它将数字转换为和为1的概率。softmax函数输出一个向量,表示结果列表的概率分布。它也是深度学习分类任务中使用的核心元素。

当我们有多个类时,使用Softmax函数。

它对于找出有最大值的类很有用。概率。

Softmax函数理想地用于输出层,在那里我们实际上试图获得定义每个输入类的概率。

取值范围是0 ~ 1。

Softmax函数将对数[2.0,1.0,0.1]转换为概率[0.7,0.2,0.1],概率和为1。Logits是神经网络最后一层输出的原始分数。在激活发生之前。为了理解softmax函数,我们必须看看第(n-1)层的输出。

softmax函数实际上是一个arg max函数。这意味着它不会返回输入中的最大值,而是返回最大值的位置。

例如:

softmax之前

X = [13, 31, 5]

softmax后

array([1.52299795e-08, 9.99999985e-01, 5.10908895e-12]

代码:

import numpy as np

# your solution:

def your_softmax(x): 

"""Compute softmax values for each sets of scores in x.""" 

e_x = np.exp(x - np.max(x)) 

return e_x / e_x.sum() 

# correct solution: 

def softmax(x): 

"""Compute softmax values for each sets of scores in x.""" 

e_x = np.exp(x - np.max(x)) 

return e_x / e_x.sum(axis=0) 

# only difference

从数学的角度看,两边是相等的。

这很容易证明。m = max (x)。现在你的函数softmax返回一个向量,它的第i个坐标等于

注意,这适用于任何m,因为对于所有(甚至是复数)数e^m != 0

from computational complexity point of view they are also equivalent and both run in O(n) time, where n is the size of a vector. from numerical stability point of view, the first solution is preferred, because e^x grows very fast and even for pretty small values of x it will overflow. Subtracting the maximum value allows to get rid of this overflow. To practically experience the stuff I was talking about try to feed x = np.array([1000, 5]) into both of your functions. One will return correct probability, the second will overflow with nan your solution works only for vectors (Udacity quiz wants you to calculate it for matrices as well). In order to fix it you need to use sum(axis=0)

The purpose of the softmax function is to preserve the ratio of the vectors as opposed to squashing the end-points with a sigmoid as the values saturate (i.e. tend to +/- 1 (tanh) or from 0 to 1 (logistical)). This is because it preserves more information about the rate of change at the end-points and thus is more applicable to neural nets with 1-of-N Output Encoding (i.e. if we squashed the end-points it would be harder to differentiate the 1-of-N output class because we can't tell which one is the "biggest" or "smallest" because they got squished.); also it makes the total output sum to 1, and the clear winner will be closer to 1 while other numbers that are close to each other will sum to 1/p, where p is the number of output neurons with similar values.

从向量中减去最大值的目的是,当你计算e^y指数时,你可能会得到非常高的值,将浮点数夹在最大值处,导致平局,但在这个例子中不是这样。如果你减去最大值得到一个负数,那么就会出现一个大问题,然后你就会得到一个负指数,它会迅速缩小数值,改变比率,这就是在海报上的问题中发生的情况,并得到错误的答案。

Udacity提供的答案效率低得可怕。我们需要做的第一件事是计算所有向量分量的e^y_j, KEEP这些值,然后求和,然后除。Udacity搞砸的地方是他们计算了两次e^y_j !!正确答案如下:

def softmax(y):
    e_to_the_y_j = np.exp(y)
    return e_to_the_y_j / np.sum(e_to_the_y_j, axis=0)