Why Is LogSoftmax Better Than Softmax?

coder1479

Last modified: 2022-03-15 12:17:37

Category: Deep Learning. Tags: pytorch, deep learning, artificial intelligence

First published: 2022-03-15 10:32:56

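Preface

This post is a translation of an answer from the PyTorch forum.

A forum user asked:

I expected nn.log_softmax to give the same performance as nn.softmax, since it only adds a log on top, but it (LogSoftmax) seems to give better results.
Is there any explanation for this?

In my own experiments, log_softmax did indeed perform better. Some people online say it is just a computational trick, but a highly upvoted answer on the PyTorch forum seems more convincing.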

Original post:
https://discuss.pytorch.org/t/logsoftmax-vs-softmax/21386
Original author: aplassard (Andrew Plassard).

LogSoftmax vs. Softmax

I'm not sure there is a definitive answer to why this (LogSoftmax) works better, but to provide some insight, it's worth noting that using the log-likelihood is very common in statistics. Here are some references on the use of the log-likelihood: [1], [2], [3].

[1] https://blog.metaflow.fr/ml-notes-why-the-log-likelihood-24f7b6c40f83
[2] https://en.wikipedia.org/wiki/Likelihood_function#Log-likelihood
[3] https://math.stackexchange.com/questions/892832/why-we-consider-log-likelihood-instead-of-likelihood-in-gaussian-distribution

One key point is that, depending on your loss function, taking the log fundamentally changes the calculation. Consider a case where the true class is 1 and your model estimates the probability of the true class as 0.9. If your loss function is the L1 loss, its value is 0.1. If you are using the (negative) log-likelihood instead, the value of the loss function is 0.105 (assuming the natural log).
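The two numbers in this example can be written out explicitly. The following is a small worked sketch, where p is the model's estimated probability of the true class; the symbols \mathcal{L}_{\mathrm{L1}} and \mathcal{L}_{\mathrm{NLL}} are notation introduced here for illustration and do not appear in the original answer.

```latex
\mathcal{L}_{\mathrm{L1}}(p) = |1 - p|, \qquad
\mathcal{L}_{\mathrm{NLL}}(p) = -\ln p

% Evaluated at p = 0.9:
\mathcal{L}_{\mathrm{L1}}(0.9) = 0.1, \qquad
\mathcal{L}_{\mathrm{NLL}}(0.9) = -\ln 0.9 \approx 0.105
```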

If the estimated probability of the true class is instead 0.3 and you are using the likelihood directly (the L1 loss on the probability), the value of the loss function is 0.7. If you are using the log-likelihood, the value of the loss function is 1.20.

Now compare these two cases. Using the standard likelihood function (akin to softmax), the error increases by a factor of 7 (0.7/0.1) between the two examples. Using the log-likelihood function (akin to log-softmax), the error increases by a factor of about 11 (1.20/0.105).
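The arithmetic in this comparison can be reproduced with a few lines of plain Python. This is only a sketch: the helper names `l1_loss` and `nll_loss` are hypothetical and chosen for illustration, and the probabilities 0.9 and 0.3 are the ones used in the example.

```python
import math


def l1_loss(p_true: float) -> float:
    """L1 distance between the target probability 1.0 and the prediction."""
    return abs(1.0 - p_true)


def nll_loss(p_true: float) -> float:
    """Negative natural-log likelihood of the true class."""
    return -math.log(p_true)


good, bad = 0.9, 0.3  # estimated probabilities of the true class

# How much larger the loss becomes when the estimate drops from 0.9 to 0.3.
l1_ratio = l1_loss(bad) / l1_loss(good)
nll_ratio = nll_loss(bad) / nll_loss(good)

print(f"L1 : {l1_loss(good):.3f} -> {l1_loss(bad):.3f}, ratio {l1_ratio:.2f}")
print(f"NLL: {nll_loss(good):.3f} -> {nll_loss(bad):.3f}, ratio {nll_ratio:.2f}")
```

Without rounding the intermediate values, the log-likelihood ratio comes out slightly above 11, which is consistent with the "about 11" quoted in the answer.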

The point is that, even though log-softmax is a monotonic transformation of softmax, the relative values of the loss function change. Using log-softmax penalizes larger mistakes in likelihood space more heavily.
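To make the closing point concrete, here is a minimal pure-Python sketch of softmax and log-softmax. It does not use PyTorch, and the logits are made-up values chosen only for illustration. It checks that both functions rank the classes identically, and that the log-based loss grows faster than the L1-style loss as the probability assigned to the true class shrinks.

```python
import math


def softmax(logits):
    """Softmax with the usual max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits):
    """Log-softmax computed via log-sum-exp rather than log(softmax(x))."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def ranking(values):
    """Indices of the values sorted from smallest to largest."""
    return sorted(range(len(values)), key=values.__getitem__)


logits = [2.0, 0.5, -1.0]  # hypothetical scores, most to least likely class
probs = softmax(logits)
log_probs = log_softmax(logits)

# Monotonic: both outputs order the classes the same way.
same_ranking = ranking(probs) == ranking(log_probs)

# Loss if each class in turn were the true class.
l1 = [1.0 - p for p in probs]
nll = [-lp for lp in log_probs]

# The log-based loss is increasingly larger relative to the L1-style loss
# as the probability of the true class gets smaller.
penalty_ratio = [n / l for n, l in zip(nll, l1)]

for p, a, b, r in zip(probs, l1, nll, penalty_ratio):
    print(f"p={p:.3f}  L1={a:.3f}  NLL={b:.3f}  NLL/L1={r:.2f}")
```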