Softmax
Topic:
softmax:
Softmax is the normalized exponential over a finite discrete probability distribution; it generalizes logistic regression (LR) to the multi-class setting:
P(y=i\mid x;\theta)=\frac{e^{\theta^T_i x}}{\sum^K_{j=1}e^{\theta^T_j x}}
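As a concrete sketch of this formula (the θ and x values below are hypothetical, chosen only to show the mechanics):

```python
import numpy as np

# Hypothetical parameters: K = 3 classes, 2 features (not from the original post)
theta = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.5, 0.5]])  # row i holds theta_i
x = np.array([2.0, 1.0])

logits = theta @ x  # theta_i^T x for each class i
probs = np.exp(logits) / np.sum(np.exp(logits))
print(probs)        # one probability per class, summing to 1
```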
Loss function:
l(\theta)=-\frac{1}{m}\left[\sum^m_{i=1}\sum^K_{j=1}1\{y^{(i)}=j\}\log\frac{e^{\theta^T_j x^{(i)}}}{\sum^K_{l=1}e^{\theta^T_l x^{(i)}}}\right]
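This loss can be sketched numerically as follows (a minimal example with made-up logits and labels, not from the original post):

```python
import numpy as np

# Toy batch: m = 2 samples, K = 3 classes (hypothetical values)
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 1.0]])  # theta_j^T x^{(i)} per sample/class
y = np.array([0, 1])                  # true class of each sample

# Average negative log-probability of the true class over the m samples
probs = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
m = len(y)
loss = -np.mean(np.log(probs[np.arange(m), y]))
print(loss)
```

Only the log-probability of the correct class contributes for each sample, because the indicator zeroes out every other term of the inner sum.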
Here m is the number of samples and K the number of classes. The indicator 1\{y^{(i)}=j\} equals 1 when the label y^{(i)} of sample i is class j, and 0 otherwise.
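In vectorized code the indicator is just a one-hot encoding of the labels; a small sketch (the label values are hypothetical):

```python
import numpy as np

y = np.array([2, 0, 1])  # hypothetical labels for m = 3 samples, K = 3
K = 3
one_hot = np.eye(K)[y]   # one_hot[i, j] == 1{y^(i) = j}
print(one_hot)
```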
Proof:
softmax(x+c)_i=\frac{\exp(x_i+c)}{\sum^{dimension(x)}_{j=1}\exp(x_j+c)}
=\frac{\exp(x_i)\exp(c)}{\exp(c)\sum^{dimension(x)}_{j=1}\exp(x_j)}
=\frac{\exp(x_i)}{\sum^{dimension(x)}_{j=1}\exp(x_j)}
=softmax(x)_i
This identity shows that shifting the input vector by a constant c does not change the softmax output. In practice it justifies subtracting max(x) before exponentiating, which prevents overflow without altering the result.
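A quick numerical check of this invariance (a minimal sketch; `softmax1d` is a helper defined here for 1-D vectors):

```python
import numpy as np

def softmax1d(x):
    # stable softmax for a 1-D vector
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
c = 100.0
same = np.allclose(softmax1d(x), softmax1d(x + c))
print(same)  # True: the shift by c cancels out
```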
code:
import numpy as np

def softmax(x):
    assert len(x.shape) > 1  # x must be at least 2-D: one row per sample
    x = x - np.max(x, axis=1, keepdims=True)  # shift by the row max for numerical stability
    x = np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)
    return x

if __name__ == '__main__':
    matrix = np.arange(0, 30, 2)
    matrix = matrix.reshape(3, 5)
    print(softmax(matrix))