Deep Learning (26) Stochastic Gradient Descent IV: Gradients of Loss Functions
Outline
- Mean Squared Error
- Cross Entropy Loss
- Softmax
1. Mean Squared Error (MSE)
- $loss = \sum\big[y - f_\theta(x)\big]^2$
- $\frac{\partial\, loss}{\partial \theta} = 2\sum\big[f_\theta(x) - y\big]\cdot\frac{\partial f_\theta(x)}{\partial \theta}$ (the chain rule contributes a factor of $-1$ from the inner term $y - f_\theta(x)$, absorbed here by swapping the bracket order)
- $f_\theta(x) = \mathrm{sigmoid}(XW + b)$
- $f_\theta(x) = \mathrm{relu}(XW + b)$
MSE Gradient
Note: if tape.watch([w, b]) is not written, then w and b must be converted to the tf.Variable type by hand, since GradientTape only tracks trainable Variables automatically.
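A minimal sketch of the above, assuming example shapes (2 samples, 3 features, 2 classes); the tensors x, w, b, y are illustrative placeholders:

```python
import tensorflow as tf

x = tf.random.normal([2, 3])          # 2 samples, 3 features (example shapes)
w = tf.random.normal([3, 2])          # plain tensors, not tf.Variable
b = tf.zeros([2])
y = tf.one_hot(tf.constant([0, 1]), depth=2)

with tf.GradientTape() as tape:
    tape.watch([w, b])                # required because w, b are not Variables
    prob = tf.sigmoid(x @ w + b)      # f_theta(x) = sigmoid(XW + b)
    loss = tf.reduce_mean(tf.losses.MSE(y, prob))

dw, db = tape.gradient(loss, [w, b])  # gradients of the MSE loss w.r.t. w, b
print(dw.shape, db.shape)             # (3, 2) (2,)
```

Declaring w and b as tf.Variable up front makes the tape.watch call unnecessary, since the tape watches trainable Variables on its own.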
2. Cross Entropy Loss
CrossEntropy
- $H\big([0,1,0],\,[p_0, p_1, p_2]\big) = D_{KL}(p\,\|\,q) = -\log p_1$ (the target $p = [0,1,0]$ is one-hot, so its entropy is zero and the cross entropy reduces to the KL divergence)
- $\frac{d}{dx}\log_2(x) = \frac{1}{x \cdot \ln(2)}$
- $p = \mathrm{softmax}(logits)$
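A quick numeric check of the first identity, with made-up example logits:

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])       # example values
y = tf.constant([[0.0, 1.0, 0.0]])            # one-hot target [0, 1, 0]

p = tf.nn.softmax(logits)                     # p = softmax(logits)
h = tf.losses.categorical_crossentropy(y, p)  # H(y, p)
print(float(h[0]), float(-tf.math.log(p[0, 1])))  # both ≈ 1.417
```

In practice the loss is usually computed directly from the logits with from_logits=True, which is numerically more stable than applying softmax first.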
3. Softmax
- A "soft" version of max: softmax turns the logits $a$ into a probability distribution $p_i = \frac{e^{a_i}}{\sum_k e^{a_k}}$, favoring the largest logit while keeping every entry positive and the total equal to 1.
(1) Derivative
$\frac{\partial p_i}{\partial a_j} = p_i(\delta_{ij} - p_j)$: $p_i(1 - p_i)$ when $i = j$, and $-p_i p_j$ when $i \ne j$.
(2) Cross-entropy gradient
Chaining this Jacobian through $H(y, p) = -\sum_k y_k \log p_k$ with a one-hot target $y$ gives $\frac{\partial H}{\partial a_j} = p_j - y_j$; both results are checked numerically in the sketch below.
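A sketch checking both formulas (the logits a are made-up example values); persistent=True is needed because two derivatives are taken from the same tape:

```python
import tensorflow as tf

a = tf.constant([2.0, 1.0, 0.1])              # logits (example values)
y = tf.constant([0.0, 1.0, 0.0])              # one-hot target

with tf.GradientTape(persistent=True) as tape:
    tape.watch(a)                             # a is a plain tensor
    p = tf.nn.softmax(a)
    loss = tf.losses.categorical_crossentropy(y, p)

# (1) Softmax Jacobian: dp_i/da_j = p_i * (delta_ij - p_j)
jac = tape.jacobian(p, a)
analytic = tf.linalg.diag(p) - tf.tensordot(p, p, axes=0)
print(tf.reduce_max(tf.abs(jac - analytic)))  # ~0 (float error)

# (2) Cross-entropy gradient w.r.t. the logits: dH/da = p - y
grad = tape.gradient(loss, a)
print(tf.reduce_max(tf.abs(grad - (p - y))))  # ~0 (float error)
del tape                                      # release the persistent tape
```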