Gradient
For a convex function, gradient descent can find the global optimum.
Factors that affect the search for the global optimum (see the optimizer sketch after this list):
- Initial state (the weights need to be initialized)
- Momentum (helps escape local minima)
- Learning rate (affects convergence speed and precision)
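A minimal sketch (a toy one-parameter fit, assumed for illustration) of where these three factors appear in a PyTorch loop: the manual seed fixes the initial state, and `lr` and `momentum` are arguments of `torch.optim.SGD`:

```python
import torch

# Toy fit of y = 2x with a single weight (assumed example).
torch.manual_seed(0)                    # initial state: fixes weight initialization
w = torch.randn(1, requires_grad=True)  # initialized weight

# Learning rate and momentum are set on the optimizer.
opt = torch.optim.SGD([w], lr=0.1, momentum=0.9)

x = torch.tensor([1., 2., 3.])
y = 2 * x
for step in range(100):
    loss = ((w * x - y) ** 2).mean()    # MSE loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print(w.item())  # converges to ~2.0
```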
Activation functions and their gradients
$$f(x)=\sigma(x)=\frac{1}{1+e^{-x}}$$
Advantages: continuous and smooth, output compressed into (0, 1), and easy to compute
Disadvantage: vanishing gradients, since the derivative $\sigma'(x)=\sigma(x)(1-\sigma(x))$ approaches 0 for large $|x|$
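A quick check of this saturation (a minimal sketch, input values assumed):

```python
import torch

# Evaluate sigmoid's gradient at a few points via autograd.
x = torch.tensor([-10., 0., 10.], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)  # ~[4.5e-05, 0.25, 4.5e-05]: the gradient vanishes at the tails
```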
$$f(x)=\tanh(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}=2\operatorname{sigmoid}(2x)-1$$
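A minimal sketch (check values assumed) verifying the sigmoid identity and the gradient $\tanh'(x)=1-\tanh^2(x)$:

```python
import torch

# Check tanh(x) == 2*sigmoid(2x) - 1 and that its gradient is 1 - tanh(x)^2.
x = torch.linspace(-3, 3, 5, requires_grad=True)
assert torch.allclose(torch.tanh(x), 2 * torch.sigmoid(2 * x) - 1)

torch.tanh(x).sum().backward()
print(x.grad)                           # matches 1 - tanh(x)^2
print(1 - torch.tanh(x).detach() ** 2)
```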
ReLU:
$$f(x)=\begin{cases} 0 & \text{for } x<0 \\ x & \text{for } x \geq 0 \end{cases}$$
$$f'(x)=\begin{cases} 0 & \text{for } x<0 \\ 1 & \text{for } x \geq 0 \end{cases}$$
Mitigates vanishing and exploding gradients, since the gradient is exactly 1 for $x \geq 0$.
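A minimal sketch (inputs assumed) showing the piecewise gradient through autograd:

```python
import torch

# ReLU's gradient: 0 where x < 0, 1 where x >= 0.
x = torch.tensor([-2., -0.5, 0.5, 2.], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)  # tensor([0., 0., 1., 1.])
```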
Gradient of the loss function
MSE (mean squared error)
Usage (two equivalent APIs):
- torch.autograd.grad(loss, [w1, w2, w3, …]) returns [w1 grad, w2 grad, …] directly
- loss.backward() accumulates the gradients into the parameters, which are then read via
  - w1.grad
  - w2.grad
  - …
Example:
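A minimal sketch of both APIs on an MSE loss (the toy values x = 1, w = 2, target 1 are assumed):

```python
import torch
import torch.nn.functional as F

# Toy setup (assumed): prediction x * w against target 1, so
# d(mse)/dw = 2 * (x*w - y) * x = 2 for x = 1, w = 2, y = 1.
x = torch.ones(1)
w = torch.full([1], 2., requires_grad=True)

# Way 1: torch.autograd.grad returns the gradients as a tuple.
mse = F.mse_loss(x * w, torch.ones(1))
(grad_w,) = torch.autograd.grad(mse, [w])
print(grad_w)   # tensor([2.])

# Way 2: loss.backward() writes each gradient into the parameter's .grad.
mse = F.mse_loss(x * w, torch.ones(1))
mse.backward()
print(w.grad)   # tensor([2.])
```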
Softmax function
$$\frac{\partial p_{i}}{\partial a_{j}}=\begin{cases} p_{i}\left(1-p_{j}\right) & \text{if } i=j \\ -p_{j} \cdot p_{i} & \text{if } i \neq j \end{cases}$$
Example:
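A minimal sketch (random logits assumed) comparing one Jacobian row from autograd against the formula above:

```python
import torch

# The row of the softmax Jacobian for p_1 matches the piecewise formula.
a = torch.rand(3, requires_grad=True)
p = torch.softmax(a, dim=0)

(grad,) = torch.autograd.grad(p[1], [a])
print(grad)
# i != j terms: -p_1 * p_j; i == j term: p_1 * (1 - p_1)
print(-p[1] * p[0], p[1] * (1 - p[1]), -p[1] * p[2])
```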
Gradient derivation via the chain rule
Example:
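A minimal sketch of the chain rule on two stacked scalar layers (toy values assumed), comparing autograd's end-to-end gradient with the product of the two intermediate factors:

```python
import torch

# Two stacked linear layers on a scalar:
#   y1 = x * w1 + b1,  y2 = y1 * w2 + b2
# Verify dy2/dw1 == (dy2/dy1) * (dy1/dw1).
x  = torch.tensor(1.)
w1 = torch.tensor(2., requires_grad=True)
b1 = torch.tensor(1.)
w2 = torch.tensor(2., requires_grad=True)
b2 = torch.tensor(1.)

y1 = x * w1 + b1
y2 = y1 * w2 + b2

(dy2_dy1,) = torch.autograd.grad(y2, [y1], retain_graph=True)
(dy1_dw1,) = torch.autograd.grad(y1, [w1], retain_graph=True)
(dy2_dw1,) = torch.autograd.grad(y2, [w1])

print(dy2_dy1 * dy1_dw1, dy2_dw1)  # both tensor(2.)
```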