The softmax function
Also known as the normalized exponential function, the softmax is a generalization of the logistic function that "squashes" a K-dimensional vector $\mathbf{z}$ of arbitrary real values to a K-dimensional vector $\sigma(\mathbf{z})$ of real values in the range [0, 1] that add up to 1. The function is given by
$$\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \quad \text{for } j = 1, \dots, K.$$
Clearly, this expression maps a K-dimensional vector of real values to K numbers, each in the range [0, 1], that together sum to 1. The output can therefore be read as a probability distribution: in a multiclass classification task (for example, handwritten digit recognition over 0–9), entry j of the result is the probability that the input belongs to class j.
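As a quick sanity check of the definition, here is a minimal pure-Python sketch; the function name `softmax` and the max-subtraction (a standard stability trick that does not change the result) are my additions, not from the text:

```python
import math

def softmax(z):
    """Map a list of K real scores to K probabilities that sum to 1."""
    m = max(z)  # subtract the max so exp() never sees a huge argument
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

Feeding it any score vector yields values in (0, 1) that sum to 1, with larger scores getting larger probabilities.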
In probability theory, the output of the softmax function can be used to represent a categorical distribution – that is, a probability distribution over K different possible outcomes. In fact, it is the gradient-log-normalizer of the categorical probability distribution.
The softmax function is used in various multiclass classification methods, such as multinomial logistic regression (also known as softmax regression)[1]:206–209, multiclass linear discriminant analysis, naive Bayes classifiers, and artificial neural networks.[2] Specifically, in multinomial logistic regression and linear discriminant analysis, the input to the function is the result of K distinct linear functions, and the predicted probability for the j-th class given a sample vector x and a weighting vector w is:
The function below is the vector form of the softmax: unlike above, x and each w_j here are vectors, and the K inputs to the softmax are the linear scores x^T w_j.
$$P(y = j \mid \mathbf{x}) = \frac{e^{\mathbf{x}^{\mathsf{T}} \mathbf{w}_j}}{\sum_{k=1}^{K} e^{\mathbf{x}^{\mathsf{T}} \mathbf{w}_k}}$$
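The vector form can be sketched as follows; the function name `class_probabilities` and the toy inputs are illustrative, assuming each weight vector w_j has the same dimension as x:

```python
import math

def class_probabilities(x, W):
    """P(y=j | x) = exp(x.w_j) / sum_k exp(x.w_k), for W = [w_1, ..., w_K]."""
    # One linear score x^T w_j per class.
    scores = [sum(xi * wi for xi, wi in zip(x, w)) for w in W]
    m = max(scores)  # shift for numerical stability; result is unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The class with the largest score x^T w_j receives the largest probability, and the K probabilities always sum to 1.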
The log_softmax function
Applying log to the softmax maps each probability x in (0, 1] to a log-probability log(x) in (−∞, 0]. Computing log_softmax directly, rather than taking softmax first and then log, has better numerical properties: it avoids the underflow and overflow that exponentiating large or small scores can cause.
$$f_i(x) = \log\left(\frac{\exp(x_i)}{\sum_j \exp(x_j)}\right)$$
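A numerically stable way to compute this is the log-sum-exp trick; this sketch (the function name `log_softmax` is my own) never exponentiates a value larger than 0:

```python
import math

def log_softmax(z):
    """log_softmax(z)_i = z_i - log(sum_j exp(z_j)), computed stably."""
    # Subtracting max(z) inside the exp keeps it from overflowing;
    # adding max(z) back outside the log leaves the result unchanged.
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]
```

With scores like [1000, 1001, 1002], the naive route (softmax then log) would overflow in exp(), while this version returns finite log-probabilities whose exponentials still sum to 1.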