从Udacity的深度学习课程中,y_i的softmax仅仅是指数除以整个Y向量的指数之和:

其中S(y_i)是y_i的软最大函数e是指数函数j是no。输入向量Y中的列。

我试过以下几种方法:

import numpy as np

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
print(softmax(scores))

返回:

[ 0.8360188   0.11314284  0.05083836]

但建议的解决方案是:

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

它产生与第一个实现相同的输出,尽管第一个实现显式地取每列与Max的差值,然后除以和。

有人能用数学方法解释一下吗?一个是对的,另一个是错的?

实现在代码和时间复杂度方面是否相似?哪个更有效率?


当前回答

这将泛化并假设您正在规范化尾随维度。

def softmax(x: np.ndarray) -> np.ndarray:
    e_x = np.exp(x - np.max(x, axis=-1)[..., None])
    e_y = e_x.sum(axis=-1)[..., None]
    return e_x / e_y

其他回答

更简明的说法是:

def softmax(x):
    return np.exp(x) / np.exp(x).sum(axis=0)

这也适用于np. remodeling。

   def softmax( scores):
        """
        Compute softmax scores given the raw output from the model

        :param scores: raw scores from the model (N, num_classes)
        :return:
            prob: softmax probabilities (N, num_classes)
        """
        prob = None

        exponential = np.exp(
            scores - np.max(scores, axis=1).reshape(-1, 1)
        )  # subract the largest number https://jamesmccaffrey.wordpress.com/2016/03/04/the-max-trick-when-computing-softmax/
        prob = exponential / exponential.sum(axis=1).reshape(-1, 1)

        

        return prob

为了保持数值的稳定性,应减去max(x)。下面是softmax函数的代码;

def softmax (x):

if len(x.shape) > 1:
    tmp = np.max(x, axis = 1)
    x -= tmp.reshape((x.shape[0], 1))
    x = np.exp(x)
    tmp = np.sum(x, axis = 1)
    x /= tmp.reshape((x.shape[0], 1))
else:
    tmp = np.max(x)
    x -= tmp
    x = np.exp(x)
    tmp = np.sum(x)
    x /= tmp


return x

编辑。从1.2.0版本开始,scipy包含了softmax作为一个特殊函数:

https://scipy.github.io/devdocs/generated/scipy.special.softmax.html

我写了一个在任意轴上应用softmax的函数:

def softmax(X, theta = 1.0, axis = None):
    """
    Compute the softmax of each element along an axis of X.

    Parameters
    ----------
    X: ND-Array. Probably should be floats. 
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional): axis to compute values along. Default is the 
        first non-singleton axis.

    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """

    # make X at least 2d
    y = np.atleast_2d(X)

    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)

    # multiply y against the theta parameter, 
    y = y * float(theta)

    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis = axis), axis)

    # exponentiate y
    y = np.exp(y)

    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)

    # finally: divide elementwise
    p = y / ax_sum

    # flatten if X was 1D
    if len(X.shape) == 1: p = p.flatten()

    return p

正如其他用户所描述的那样,减去最大值是很好的做法。我在这里写了一篇详细的文章。

目标是使用Numpy和Tensorflow实现类似的结果。与原始答案的唯一变化是np的轴参数。和api。

初始方法:axis=0 -然而,当维度为N时,这并不能提供预期的结果。

修改方法:axis=len(e_x.shape)-1 -总是在最后一个维度上求和。这提供了与tensorflow的softmax函数类似的结果。

def softmax_fn(input_array):
    """
    | **@author**: Prathyush SP
    |
    | Calculate Softmax for a given array
    :param input_array: Input Array
    :return: Softmax Score
    """
    e_x = np.exp(input_array - np.max(input_array))
    return e_x / e_x.sum(axis=len(e_x.shape)-1)