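In the following TensorFlow function, we must feed the activation of the artificial neurons in the final layer. That I understand. But I don't understand why it is called logits. Isn't that a mathematical function?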

loss_function = tf.nn.softmax_cross_entropy_with_logits(
     logits = last_layer,
     labels = target_output
)

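Current answer

Summary

In the context of deep learning, the logits layer is the layer that feeds into softmax (or another similar normalization). The output of the softmax is the set of probabilities for the classification task, and its input is the logits layer. The logits layer typically produces values from -∞ to +∞, and the softmax layer transforms them into values from 0 to 1.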
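To make the summary concrete, here is a minimal sketch of the logits-to-probabilities step in plain Python. The `softmax` helper and the example values are made up for illustration; they are not TensorFlow's implementation.

```python
import math

def softmax(logits):
    """Map a list of real-valued logits to probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp().
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of a last layer for a 3-class problem.
logits = [2.0, -1.0, 0.5]
probs = softmax(logits)
print(probs)  # three values strictly between 0 and 1
```

The largest logit ends up with the largest probability, and the ordering of the classes is preserved.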

Historical background

Where does this term come from? In the 1930s and 40s, several people were trying to adapt linear regression to the problem of predicting probabilities. However, linear regression produces outputs from -infinity to +infinity, while for probabilities the desired output is between 0 and 1. One way to bridge the gap is to map probabilities from the range 0 to 1 onto the range -infinity to +infinity and then use linear regression as usual. One such mapping is based on the cumulative normal distribution; Chester Ittner Bliss used it in 1934 and called the result the "probit" model, short for "probability unit". However, this function is computationally expensive and lacks some of the desirable properties for multi-class classification. In 1944 Joseph Berkson used the function log(p/(1-p)) for this mapping and called it "logit", short for "logistic unit". The term "logistic regression" derives from this as well.

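The confusion

Unfortunately, the term logits is abused in deep learning. From a purely mathematical perspective, logit is a function that performs the mapping described above. In deep learning, people started calling the layer that feeds into the logit function the "logits layer". Then people started calling the output values of this layer "logit", creating confusion with logit the function.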

TensorFlow code

Unfortunately, TensorFlow code adds to the confusion with names like tf.nn.softmax_cross_entropy_with_logits. What does logits mean here? It just means that the input of the function is supposed to be the output of the last neuron layer, as described above. The _with_logits suffix is redundant, confusing and pointless. Functions should be named without regard to such very specific contexts, because they are simply mathematical operations that can be performed on values derived from many other domains. In fact, TensorFlow has another similar function, sparse_softmax_cross_entropy, where they fortunately forgot to add the _with_logits suffix, creating inconsistency and adding to the confusion. PyTorch, on the other hand, simply names its functions without these kinds of suffixes.

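References

The Logit/Probit lecture slides are one of the best resources for understanding logit. I have also updated the Wikipedia article with some of the information above.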

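Other answers

The logit (/ˈloʊdʒɪt/ LOH-jit) function is the inverse of the sigmoidal "logistic" function, or logistic transform, used in mathematics and especially in statistics. When the function's variable represents a probability p, the logit function gives the log-odds, that is, the logarithm of the odds p/(1 - p).

See here: https://en.wikipedia.org/wiki/Logit

Here is a concise answer for future readers. In TensorFlow, a logit is defined as the output of a neuron without the activation function applied: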

logit = w*x + b,

x: input, w: weight, b: bias. That's it.
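As a rough illustration of that formula for a whole layer, the sketch below computes one pre-activation value per output neuron. The `dense_logits` helper and all numbers are hypothetical; a real framework would do this with a matrix multiplication.

```python
def dense_logits(x, weights, biases):
    """Return one pre-activation value (w . x + b) per output neuron."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]

# Hypothetical input with 2 features and a last layer with 3 neurons.
x = [1.0, 2.0]
weights = [[0.5, -1.0], [2.0, 0.0], [-1.0, 1.0]]
biases = [0.0, 1.0, -0.5]

logits = dense_logits(x, weights, biases)
print(logits)  # [-1.5, 3.0, 0.5]
```

No activation is applied here, so the values are unbounded and can be negative.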


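The following is off-topic for this question.

For a history lesson, read the other answers. Hats off to TensorFlow's "creatively" confusing naming convention. In PyTorch, there is only one CrossEntropyLoss, and it accepts un-activated outputs. Convolutions, matrix multiplications and activations are all operations at the same level. The design is more modular and less confusing. This is one of the reasons I switched from TensorFlow to PyTorch.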

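Logits is an overloaded term that can mean many different things:


In mathematics, logit is a function that maps probabilities ([0, 1]) to R ((-inf, inf)).

A probability of 0.5 corresponds to a logit of 0. Negative logits correspond to probabilities less than 0.5, and positive logits to probabilities greater than 0.5.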
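The mathematical definition can be checked with a few lines of plain Python. This is only a sketch; the function names `logit` and `sigmoid` are chosen here for illustration.

```python
import math

def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))

def sigmoid(x):
    """Logistic function, the inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))

print(logit(0.5))           # 0.0
print(logit(0.2))           # negative, since 0.2 < 0.5
print(logit(0.9))           # positive, since 0.9 > 0.5
print(sigmoid(logit(0.9)))  # back to approximately 0.9
```

The endpoints 0 and 1 are excluded here because the log-odds diverge to -inf and +inf there.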

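In ML, it can be

the vector of raw (non-normalized) predictions that a classification model generates, which is ordinarily then passed to a normalization function. If the model is solving a multi-class classification problem, logits typically become an input to the softmax function. The softmax function then generates a vector of (normalized) probabilities with one value for each possible class.

Logits also sometimes refers to the element-wise inverse of the sigmoid function.

My personal understanding is that, in the TensorFlow domain, logits are the values used as input to softmax. I came to this understanding based on this TensorFlow tutorial.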

https://www.tensorflow.org/tutorials/layers


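Although logit is indeed a function in mathematics (especially in statistics), I don't think that is the "logit" you are looking at. In the book Deep Learning by Ian Goodfellow, he mentions:

The function σ⁻¹(x) is called the logit in statistics, but this term is more rarely used in machine learning. σ⁻¹(x) stands for the inverse function of the logistic sigmoid function.

In TensorFlow, it is frequently seen as the name of the last layer. In Chapter 10 of the book Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron, I came across this paragraph, which describes the logits layer clearly:

Note that logits is the output of the neural network before going through the softmax activation function: for optimization reasons, we will handle the softmax computation later.

That is to say, although we use softmax as the activation function in the last layer of our design, for ease of computation we take the logits out separately. This is because it is more efficient to compute the softmax and the cross-entropy loss together. Remember that cross-entropy is a cost function and is not used in forward propagation.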
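To illustrate why the two steps are often fused, here is a hedged sketch in plain Python that computes the cross-entropy directly from logits using the log-sum-exp identity. The function names are hypothetical, and this is not TensorFlow's actual implementation. Besides saving work, the fused form is, as far as I understand, also numerically safer, because it never exponentiates a large logit on its own.

```python
import math

def fused_softmax_cross_entropy(logits, labels):
    """Cross-entropy between labels and softmax(logits), computed from the logits directly."""
    m = max(logits)
    # log(sum(exp(z))) computed stably by factoring out the maximum.
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # log softmax(z_i) = z_i - log_sum_exp, so no explicit probabilities are needed.
    return -sum(y * (z - log_sum_exp) for y, z in zip(labels, logits))

def naive_softmax_cross_entropy(logits, labels):
    """Two-step version: softmax first, then cross-entropy. Overflows for large logits."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return -sum(y * math.log(e / total) for y, e in zip(labels, exps))

labels = [1.0, 0.0, 0.0]  # one-hot target, class 0
print(fused_softmax_cross_entropy([2.0, -1.0, 0.5], labels))
print(naive_softmax_cross_entropy([2.0, -1.0, 0.5], labels))  # same value up to rounding
# The fused version still works where math.exp(1000.0) would overflow.
print(fused_softmax_cross_entropy([1000.0, 0.0, -1000.0], labels))
```

For moderate logits both versions agree; the difference only shows up for extreme values.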