I’m currently working on machine learning applications, and a problem that is rarely mentioned in papers but occurs frequently in practice is numerical overflow in the sigmoid (a.k.a. logistic) function and in its big sister, softmax.

Sigmoid

As a reminder: $\sigma(x) = \frac{1}{1 + \exp(-x)}$

Its derivative: $\frac{d}{dx}\sigma(x) = \bigl(1 - \sigma(x)\bigr)\,\sigma(x)$

The problem here is exp, which quickly goes to infinity, even though the result of σ is restricted to the interval [0, 1]. The solution: the sigmoid can be expressed in terms of tanh: $\sigma(x) = \frac{1}{2}\left(1 + \tanh\left(\frac{x}{2}\right)\right)$.
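
To make this concrete, here is a minimal NumPy sketch (NumPy and the function names are my own choices for illustration, not something prescribed by the identity itself):

```python
import numpy as np

def sigmoid_naive(x):
    # 1 / (1 + exp(-x)): exp(-x) overflows to inf for large negative x
    # (NumPy emits a RuntimeWarning), even though the true result is just 0 there.
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_tanh(x):
    # sigmoid(x) = (1 + tanh(x / 2)) / 2; tanh saturates smoothly at +/-1
    # instead of overflowing.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

x = np.array([-1000.0, 0.0, 1000.0])
print(sigmoid_tanh(x))   # [0.  0.5 1. ] -- no warnings
print(sigmoid_naive(x))  # same values, but with an overflow warning at x = -1000
```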

Softmax

Softmax, which is defined as $\mathrm{softmax}_i(a) = \frac{\exp(a_i)}{\sum_j \exp(a_j)}$ (where $a$ is a vector), is a little more complicated. The key here is to express softmax in terms of the logsumexp function, $\mathrm{logsumexp}(a) = \log\bigl(\sum_i \exp(a_i)\bigr)$, for which good, non-overflowing implementations are usually available.
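
If you are curious what such an implementation does internally, the standard trick is to shift the inputs by their maximum before exponentiating. A minimal sketch of my own (in practice, something like scipy.special.logsumexp already handles this):

```python
import numpy as np

def logsumexp(a):
    # Shifting by max(a) makes every exponent <= 0: exp() may underflow to 0,
    # but it can no longer overflow. The shift is added back outside the log.
    a = np.asarray(a, dtype=float)
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

print(logsumexp([1000.0, 1001.0, 1002.0]))  # ~1002.41, no overflow
```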

Then, we have $\mathrm{softmax}(a) = \exp(a - \mathrm{logsumexp}(a))$.
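
Putting the two together, a sketch of a non-overflowing softmax (assuming SciPy is available for the logsumexp part):

```python
import numpy as np
from scipy.special import logsumexp  # an existing non-overflowing implementation

def softmax(a):
    # softmax(a) = exp(a - logsumexp(a)); every exponent is <= 0, so exp cannot overflow.
    a = np.asarray(a, dtype=float)
    return np.exp(a - logsumexp(a))

print(softmax([1000.0, 1001.0, 1002.0]))
# ~[0.09  0.245 0.665]; the naive exp(a_i) / sum_j exp(a_j) returns nan here (inf / inf)
```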

As a bonus: the partial derivatives of softmax with respect to the matching inputs (the diagonal of its Jacobian) are analogous to the sigmoid's derivative, i.e. $\frac{\partial}{\partial a_i}\,\mathrm{softmax}_i(a) = \bigl(1 - \mathrm{softmax}_i(a)\bigr)\,\mathrm{softmax}_i(a)$.
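
This can be sanity-checked numerically with a central finite difference; a small sketch of my own, reusing the stable softmax from above:

```python
import numpy as np
from scipy.special import logsumexp

def softmax(a):
    # Stable softmax via the identity above: softmax(a) = exp(a - logsumexp(a)).
    return np.exp(np.asarray(a, dtype=float) - logsumexp(a))

a = np.array([0.5, -1.2, 3.0])
i, eps = 0, 1e-6

# Diagonal Jacobian entry from the formula above.
s = softmax(a)
analytic = (1.0 - s[i]) * s[i]

# Central finite difference along the i-th coordinate as a sanity check.
e = np.zeros_like(a)
e[i] = eps
numeric = (softmax(a + e)[i] - softmax(a - e)[i]) / (2.0 * eps)

print(analytic, numeric)  # the two values should agree to many decimal places
```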