李宏毅 (Hung-yi Lee) ML Lectures 4-5: Classification and Logistic Regression
Cross-entropy
http://neuralnetworksanddeeplearning.com/chap3.html#the_cross-entropy_cost_function
First, define the forward pass:
$$z = wx + b \tag{1}$$

$$a = \sigma(z) \tag{2}$$
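As a minimal sketch of equations (1) and (2) in NumPy (the `sigmoid` and `forward` helpers are my own illustration, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, x, b):
    """Forward pass of a single sigmoid neuron."""
    z = np.dot(w, x) + b  # equation (1)
    a = sigmoid(z)        # equation (2)
    return z, a
```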
Define the loss as the cross-entropy:
$$C = -\frac{1}{n} \sum_x \left[\, y \ln a + (1-y) \ln (1-a) \,\right] \tag{3}$$
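A sketch of equation (3) in the same vein (the `eps` guard against `log(0)` is my addition, not part of the formula):

```python
import numpy as np

def cross_entropy(a, y):
    """Mean binary cross-entropy over n examples; a, y are length-n arrays."""
    eps = 1e-12  # numerical guard against log(0), not part of equation (3)
    return -np.mean(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps))
```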
Apply the chain rule to compute the backward pass:
$$
\begin{aligned}
\frac{\partial C}{\partial w_j}
&= \sum_x \frac{\partial C}{\partial a}\,\frac{\partial a}{\partial z}\,\frac{\partial z}{\partial w_j} \\
&= -\frac{1}{n} \sum_x \left( \frac{y}{a} - \frac{1-y}{1-a} \right) \sigma'(z)\, x_j \\
&= \frac{1}{n} \sum_x \frac{(a-y)\,\sigma'(z)\, x_j}{a(1-a)}
\end{aligned}
$$
Using the known derivative of the sigmoid function:
$$\sigma'(z) = \sigma(z)\,(1 - \sigma(z)) = a(1-a)$$
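This identity is easy to verify numerically; a small central-difference check (a sketch, reusing the `sigmoid` helper above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Central-difference check of sigma'(z) = sigma(z) * (1 - sigma(z)).
z = np.linspace(-4.0, 4.0, 9)
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid(z) * (1 - sigmoid(z))
assert np.allclose(numeric, analytic, atol=1e-8)
```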
Substituting this in, the $a(1-a)$ factor cancels, so:
$$\frac{\partial C}{\partial w_j} = \frac{1}{n} \sum_x \left(\sigma(z) - y\right) x_j$$
Similarly, for the bias:
$$\frac{\partial C}{\partial b} = \frac{1}{n} \sum_x (a - y)$$
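Putting both gradients together, a batched sketch (the shapes and the `gradients` name are my own; `X` stacks the n inputs row-wise):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(w, b, X, y):
    """Cross-entropy gradients for one sigmoid neuron.

    X: (n, d) inputs, y: (n,) labels in {0, 1}, w: (d,) weights, b: scalar.
    """
    a = sigmoid(X @ w + b)           # forward pass over all n examples
    grad_w = X.T @ (a - y) / len(y)  # (1/n) * sum_x (a - y) * x_j
    grad_b = np.mean(a - y)          # (1/n) * sum_x (a - y)
    return grad_w, grad_b
```

A gradient-descent step is then just `w -= lr * grad_w; b -= lr * grad_b`.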
Why do classification problems use cross-entropy rather than the quadratic (Euclidean) cost? Consider the quadratic cost for a single training example:
$$C = \frac{(y-a)^2}{2}$$

with gradients

$$\frac{\partial C}{\partial w} = (a-y)\,\sigma'(z)\,x, \qquad \frac{\partial C}{\partial b} = (a-y)\,\sigma'(z)$$
The answer lies in these gradient formulas: both quadratic-cost gradients keep a factor of $\sigma'(z) = a(1-a)$, which approaches 0 whenever the neuron saturates ($a$ close to 0 or 1), so a badly wrong but saturated neuron learns very slowly. With cross-entropy the $\sigma'(z)$ factor cancels and the gradient is proportional to the error $(a-y)$: the more wrong the prediction, the faster the learning.
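A quick numeric illustration of that effect (the saturated values below are made up for the example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A saturated, badly wrong neuron: target y = 1 but a is near 0.
z, y, x = -5.0, 1.0, 1.0
a = sigmoid(z)  # ~0.0067

quadratic_grad_w = (a - y) * a * (1 - a) * x  # keeps sigma'(z): ~ -0.0066
cross_entropy_grad_w = (a - y) * x            # sigma'(z) cancelled: ~ -0.99

print(quadratic_grad_w, cross_entropy_grad_w)
```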