Softmax
The Softmax function takes an N-dimensional vector (or an M x N array, where M is the number of samples and N the number of classes) as input and maps each component to a real number in (0, 1), as follows:
p_{i}=\frac{e^{a_{i}}}{\sum_{k=1}^{N} e^{a_{k}}}
To keep the computation numerically stable and avoid NaN (e^{a_i} overflows for large inputs), the maximum of the input vector is usually subtracted from every component first. The numerically stable Softmax is:
p_{i}=\frac{e^{a_{i}-\max(a)}}{\sum_{k=1}^{N} e^{a_{k}-\max(a)}}
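A minimal sketch (the logit values are my own, not from the post) showing why the max-subtraction trick matters: the naive form overflows for large logits and produces NaN, while the shifted form gives the same mathematical result without overflow.

```python
import numpy as np

a = np.array([1000.0, 1001.0, 1002.0])  # large logits: exp(1000) overflows

with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(a) / np.sum(np.exp(a))  # inf / inf -> nan

# Stable form: subtract max(a) before exponentiating
stable = np.exp(a - np.max(a)) / np.sum(np.exp(a - np.max(a)))

print(naive)   # all nan
print(stable)  # a proper probability distribution summing to 1
```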
The derivative of the Softmax function is:
\frac{\partial p_{i}}{\partial a_{j}}=\left\{\begin{array}{ll}{p_{i}\left(1-p_{j}\right)} & {\text { if } i=j} \\ {-p_{j} \cdot p_{i}} & {\text { if } i \neq j}\end{array}\right.
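A small check (not from the original post; the input vector is my own choice): the two cases of the derivative combine into the Jacobian J = diag(p) - p pᵀ, which we can compare against a finite-difference approximation.

```python
import numpy as np

def softmax_vec(a):
    """Numerically stable softmax for a 1-D vector."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([0.5, -1.2, 2.0])
p = softmax_vec(a)

# Analytic Jacobian: J[i, j] = p_i*(1 - p_j) if i == j else -p_j * p_i
J = np.diag(p) - np.outer(p, p)

# Central finite differences, one column per perturbed input
eps = 1e-6
J_num = np.empty((3, 3))
for j in range(3):
    d = np.zeros(3)
    d[j] = eps
    J_num[:, j] = (softmax_vec(a + d) - softmax_vec(a - d)) / (2 * eps)

print(np.max(np.abs(J - J_num)))  # should be tiny
```

Each row of J sums to zero, reflecting that the softmax outputs always sum to 1.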
CrossEntropy
Cross entropy is the usual loss function for Softmax classification, i.e. the familiar cross-entropy loss. It measures how close the probability distribution output by the model is to the true distribution of the samples. It is defined as follows, where y_{i} denotes the one-hot encoded label:
L=H(y, p)=-\sum_{i} y_{i} \log \left(p_{i}\right)
The derivative of the cross-entropy loss with respect to the logits a_{i} is:
\frac{\partial L}{\partial a_{i}}=p_{i}-y_{i}
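This identity can be verified numerically. A sketch (my own check, with made-up logits, not part of the original post) comparing p - y against a finite-difference gradient on a single sample:

```python
import numpy as np

def softmax_vec(a):
    """Numerically stable softmax for a 1-D vector."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

def ce_loss(a, label):
    """Cross-entropy loss of logits `a` against class index `label`."""
    return -np.log(softmax_vec(a)[label])

a = np.array([0.1, 1.5, -0.3, 2.2])
label = 3  # index of the ground-truth class

y = np.zeros_like(a)
y[label] = 1.0
analytic = softmax_vec(a) - y  # the claimed gradient p - y

# Central finite differences
eps = 1e-6
numeric = np.empty_like(a)
for i in range(a.size):
    d = np.zeros_like(a)
    d[i] = eps
    numeric[i] = (ce_loss(a + d, label) - ce_loss(a - d, label)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # should be tiny
```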
A Python implementation follows:
# -*- coding: UTF-8 -*-
import numpy as np


def softmax(X):
    """Compute the softmax of the output of a classification layer.

    Parameters
    ----------
    X : array_like
        An M x N array. M is the number of samples and N is the number
        of categories.

    Returns
    -------
    rst : numpy.ndarray
        The softmax probabilities, same shape as X.
    """
    X = np.asarray(X, dtype=float)
    # Subtract the row-wise max for numerical stability.
    exps = np.exp(X - np.max(X, axis=1).reshape(-1, 1))
    rst = exps / np.sum(exps, axis=1).reshape(-1, 1)
    return rst


def cross_entropy(X, y):
    """Compute the cross-entropy loss and its gradient w.r.t. the logits.

    Parameters
    ----------
    X : array_like
        An M x N array of logits.
    y : array_like
        An array of M ground-truth label indices.

    Returns
    -------
    loss : float
        The mean cross-entropy loss over the M samples.
    grad : numpy.ndarray
        The gradient of the loss w.r.t. each logit, i.e. (p - y) / M.
    """
    m = len(y)
    p = softmax(X)
    log_likelihood = -np.log(p[range(m), y])
    loss = np.sum(log_likelihood) / m
    grad = p.copy()  # copy so that p itself is not modified in place
    grad[range(m), y] -= 1
    grad = grad / m
    return loss, grad


def main():
    X = [[0.1, 1.5, -0.3, 2.2, 0.7],
         [1.0, -2.3, 5.2, -0.1, 2.9],
         [-3.5, -1.1, 3.7, 0.2, 2.6]]
    y = [3, 4, 2]
    rst = softmax(X)
    print('softmax rst:\n', rst)
    print('softmax check:\n', rst.sum(axis=1).reshape(-1, 1))
    loss, grad = cross_entropy(X, y)
    print('loss:', loss)
    print('grad:', grad)


if __name__ == "__main__":
    main()