What is a gradient
- The gradient is the vector of partial derivatives; it points in the direction of steepest increase, from smaller function values toward larger ones.
How to update parameters
- Step against the gradient: $\theta \leftarrow \theta - \eta \nabla_\theta L$, where $\eta$ is the learning rate.
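A minimal sketch of this update rule, written out by hand on a toy quadratic loss (the loss function, learning rate, and step count here are illustrative choices, not from the original):

```python
import torch

# Vanilla gradient descent on a single weight:
#   w <- w - lr * dL/dw
w = torch.tensor(3.0, requires_grad=True)
lr = 0.1
for _ in range(50):
    loss = (w - 1.0) ** 2   # toy loss with its minimum at w = 1
    loss.backward()         # populates w.grad with dL/dw
    with torch.no_grad():
        w -= lr * w.grad    # the update step
    w.grad.zero_()          # clear the gradient before the next step
print(w.item())             # converges toward 1.0
```

In practice this loop is what `optimizer.step()` does for you; writing it manually just makes the update rule explicit.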
Common problems in gradient-based optimization:
- Local minima
- Saddle points: the gradient is zero, but the point is neither a local minimum nor a local maximum.
Factors affecting optimizer performance:
- Initial point
- Learning rate
- Momentum (helps escape local minima and move toward the global optimum)
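Momentum can be sketched with PyTorch's built-in `torch.optim.SGD` (the toy loss and hyperparameter values below are illustrative assumptions):

```python
import torch

# SGD with momentum: the momentum buffer accumulates a decaying sum of
# past gradients, which helps the update roll through plateaus and
# shallow local minima instead of stalling.
w = torch.tensor(3.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1, momentum=0.9)
for _ in range(200):
    opt.zero_grad()
    loss = (w - 1.0) ** 2   # toy quadratic loss, minimum at w = 1
    loss.backward()
    opt.step()              # v <- momentum*v + grad; w <- w - lr*v
```

Note that with momentum the iterates typically overshoot and oscillate around the minimum before settling, unlike plain gradient descent.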
Common functions and their derivatives:
Activation functions and their gradients
- sigmoid/logistic:
$\sigma(x) = \frac{1}{1+e^{-x}}$, $\sigma'(x) = \sigma(x)\,(1-\sigma(x))$
torch.sigmoid(a)
- Tanh (hyperbolic tangent)
$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$, $\tanh'(x) = 1 - \tanh^2(x)$
torch.tanh(a)
- ReLU
$f(x)=\begin{cases}0 & x<0\\ x & x \ge 0\end{cases}$, $f'(x)=\begin{cases}0 & x<0\\ 1 & x \ge 0\end{cases}$
from torch.nn import functional as F
F.relu(a)
torch.relu(a)
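The closed-form derivatives above can be cross-checked against autograd; a small sketch for sigmoid (the input values are arbitrary):

```python
import torch

# Verify sigma'(x) = sigma(x) * (1 - sigma(x)) against autograd.
a = torch.linspace(-3.0, 3.0, 7, requires_grad=True)
s = torch.sigmoid(a)
s.sum().backward()           # a.grad now holds d sigma / d a elementwise
analytic = s * (1 - s)       # closed-form derivative from above
print(torch.allclose(a.grad, analytic.detach()))  # True
```

The same pattern works for `torch.tanh` and `torch.relu`.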
Typical losses and their derivatives
- MSE
$loss = \sum\left[y-(xw+b)\right]^2$
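The gradient of this loss can be obtained with `torch.autograd.grad`; a minimal sketch with made-up data (note that `F.mse_loss` averages by default, so `reduction='sum'` is needed to match the sum form above):

```python
import torch
from torch.nn import functional as F

x = torch.ones(3)
y = torch.tensor([2.0, 2.0, 2.0])
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

loss = F.mse_loss(x * w + b, y, reduction='sum')  # sum of [y-(xw+b)]^2
grads = torch.autograd.grad(loss, [w, b])
# Analytically: dloss/dw = -2*sum(x*(y-(xw+b))), dloss/db = -2*sum(y-(xw+b))
print(grads)
```

`torch.autograd.grad` returns the gradients directly instead of accumulating them into `.grad`, which is convenient for one-off computations like this.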