How do we compute gradients by backpropagation?

Computational graphs

(Figure: computational graph of a linear classifier.) Any function can be represented as a computational graph whose nodes are the individual steps of the computation. In the linear classifier above, the inputs are $x$ and $W$; the $*$ node denotes matrix multiplication, i.e. $Wx$, and outputs the score vector. Another node computes the hinge loss, giving the data loss term $L_{i}$, and there is also a regularization term in the lower right. The total loss $L$ at the end is the sum of the regularization term and the data term.
Once the computational graph is drawn, the chain rule gives the gradient at every node.
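To make this concrete, here is a minimal sketch (not code from the original post) of the "gate" view of a computational graph: each node caches its inputs during the forward pass and, during the backward pass, multiplies the upstream gradient by its local gradient.

```python
# Minimal sketch of a computational-graph node ("gate"), illustrative only.
class MultiplyGate:
    def forward(self, w, x):
        self.w, self.x = w, x      # cache inputs for the backward pass
        return w * x               # scalar analogue of the W * x score node

    def backward(self, dout):
        dw = self.x * dout         # chain rule: local gradient x times upstream
        dx = self.w * dout         # chain rule: local gradient w times upstream
        return dw, dx

gate = MultiplyGate()
s = gate.forward(2.0, -1.0)        # forward pass: s = -2.0
dw, dx = gate.backward(1.0)        # backward pass with upstream gradient 1
print(s, dw, dx)                   # -2.0 -1.0 2.0
```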

Chain rule for $(x+y)z$

Let $f(x, y, z) = (x+y)z$ and $q(x, y) = x+y$. Then:
$$\frac{\partial f}{\partial x}=\frac{\partial f}{\partial q}\times \frac{\partial q}{\partial x}=z\times 1=z$$
$$\frac{\partial f}{\partial y}=\frac{\partial f}{\partial q}\times \frac{\partial q}{\partial y}=z\times 1=z$$
$$\frac{\partial f}{\partial z}=q=x+y$$
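As a quick check of these formulas, here is a minimal sketch of the forward and backward passes for $f=(x+y)z$; the input values are chosen for illustration and are not from the original post.

```python
# Forward and backward passes for f = (x + y) * z (illustrative values).
x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y            # q = 3.0
f = q * z            # f = -12.0

# backward pass, starting from df/df = 1
df_dz = q            # = 3.0  (= x + y)
df_dq = z            # = -4.0
df_dx = df_dq * 1.0  # chain rule through q = x + y -> -4.0
df_dy = df_dq * 1.0  # -> -4.0

print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```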

A worked example of backpropagation

(Figure: the forward-pass computational graph.) The forward pass is shown in the figure above; the backward pass proceeds as follows:
Start with an initial gradient of 1 at the output.
For the node $f(x)=\frac{1}{x}$, the derivative is $f'(x)=-\frac{1}{x^{2}}$; substituting $x=1.37$ gives $f'(x)\approx-0.53$, so its gradient is $-0.53\times 1=-0.53$.
For the node $f(x)=x+1$, the derivative is $f'(x)=1$, so its gradient is $1\times(-0.53)=-0.53$.
For the node $f(x)=e^{x}$, the derivative is $f'(x)=e^{x}$; substituting $x=-1$ gives $f'(x)\approx 0.37$, so its gradient is $0.37\times(-0.53)\approx-0.2$.
Continuing in this way, we obtain all the gradients:
(Figure: the graph with every gradient filled in.) The boxed portion of the graph is actually the sigmoid function, so instead of stepping backward node by node down to the value 0.20, we can differentiate the sigmoid directly and read off that gradient.
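The node-by-node backward pass above is easy to reproduce in code. Here is a minimal sketch using only the values quoted in the text ($-1$ feeding the $e^{x}$ node, hence $1.37$ feeding the $1/x$ node):

```python
import math

# Forward values for the last three nodes of the example graph.
a = -1.0                 # input to the exp node (quoted in the text)
b = math.exp(a)          # = 0.37, input to the +1 node
c = b + 1.0              # = 1.37, input to the 1/x node
f = 1.0 / c              # = 0.73, forward output

# Backward pass, starting from df/df = 1.
df = 1.0
dc = (-1.0 / c**2) * df  # 1/x node: local gradient -1/x^2 -> -0.53
db = 1.0 * dc            # +1 node:  local gradient 1      -> -0.53
da = math.exp(a) * db    # exp node: local gradient e^x    -> -0.20

print(round(dc, 2), round(db, 2), round(da, 2))   # -0.53 -0.53 -0.2
```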

Derivative of the sigmoid

$$\sigma(x)=\frac{1}{1+e^{-x}}$$
$$\frac{d\sigma(x)}{dx}=\frac{e^{-x}}{(1+e^{-x})^{2}}=\left(\frac{1+e^{-x}-1}{1+e^{-x}}\right)\left(\frac{1}{1+e^{-x}}\right)=(1-\sigma(x))\,\sigma(x)$$
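As a numerical check, if we assume the boxed sigmoid block receives the input $1$ (an assumption, inferred from the $e^{-1}=0.37$ step above), its local gradient $(1-\sigma)\sigma$ reproduces the $0.2$ obtained node by node:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Assumed input to the boxed sigmoid block (inferred from the e^{-1} = 0.37
# step in the node-by-node pass above).
x = 1.0
s = sigmoid(x)               # forward value, about 0.73
local_grad = (1.0 - s) * s   # sigmoid's local gradient
print(round(s, 2), round(local_grad, 2))   # 0.73 0.2
```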

Backpropagation with vectors

(Figure: the vector-valued example.) As shown in the figure above, differentiating $f$ with respect to $q_{i}$ gives $\frac{\partial f}{\partial q_{i}}=2q_{i}$, so the backward pass yields the gradient
$$\begin{bmatrix} 0.44 \\ 0.52 \end{bmatrix}$$
Differentiating $q_{1}$ (i.e. $W_{1,1}x_{1}+W_{1,2}x_{2}$) with respect to $W_{1,1}$ gives $\frac{\partial q_{1}}{\partial W_{1,1}}=x_{1}=0.2$.
Differentiating $q_{1}$ with respect to $W_{1,2}$ gives $\frac{\partial q_{1}}{\partial W_{1,2}}=x_{2}=0.4$.
Differentiating $q_{1}$ with respect to $W_{2,1}$ gives $\frac{\partial q_{1}}{\partial W_{2,1}}=0$.
Differentiating $q_{1}$ with respect to $W_{2,2}$ gives $\frac{\partial q_{1}}{\partial W_{2,2}}=0$.
Similarly, $\frac{\partial q_{2}}{\partial W_{1,1}}=0$, $\frac{\partial q_{2}}{\partial W_{1,2}}=0$, $\frac{\partial q_{2}}{\partial W_{2,1}}=x_{1}=0.2$, and $\frac{\partial q_{2}}{\partial W_{2,2}}=x_{2}=0.4$.
In general:
$$\frac{\partial q_{k}}{\partial W_{i,j}}=\mathbf{1}_{k=i}\,x_{j}$$
where $\mathbf{1}_{k=i}$ is the indicator that equals $1$ when $k=i$ and $0$ otherwise.
Hence:
$$\frac{\partial f}{\partial W_{i,j}}=\sum_{k}\frac{\partial f}{\partial q_{k}}\frac{\partial q_{k}}{\partial W_{i,j}}=\sum_{k}(2q_{k})(\mathbf{1}_{k=i}\,x_{j})=2q_{i}x_{j}$$
So:
$$\frac{\partial f}{\partial W_{1,1}}=2q_{1}x_{1}=0.088$$
$$\frac{\partial f}{\partial W_{1,2}}=2q_{1}x_{2}=0.176$$
$$\frac{\partial f}{\partial W_{2,1}}=2q_{2}x_{1}=0.104$$
$$\frac{\partial f}{\partial W_{2,2}}=2q_{2}x_{2}=0.208$$
and finally:
$$\frac{\partial f}{\partial W}= \begin{bmatrix} 0.088 & 0.176 \\ 0.104 & 0.208 \end{bmatrix}$$
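In matrix form this is the outer product $\frac{\partial f}{\partial W}=2\,q\,x^{\top}$. A minimal NumPy sketch, using the values of $x$ and $W$ quoted in this example (and taking $f=\|q\|^{2}$, consistent with $\frac{\partial f}{\partial q_{i}}=2q_{i}$), confirms the entries above:

```python
import numpy as np

# Values quoted in the example.
x = np.array([0.2, 0.4])
W = np.array([[0.1, 0.5],
              [-0.3, 0.8]])

q = W @ x              # forward: q = [0.22, 0.26]
f = np.sum(q ** 2)     # f = ||q||^2, consistent with df/dq_i = 2 q_i

dq = 2 * q             # df/dq = 2q = [0.44, 0.52]
dW = np.outer(dq, x)   # df/dW[i, j] = 2 * q_i * x_j
print(dW)              # [[0.088 0.176]
                       #  [0.104 0.208]]
```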
Next, differentiating $q_{1}$ with respect to $x_{1}$ gives
$$\frac{\partial q_{1}}{\partial x_{1}}=W_{1,1}=0.1$$
and similarly
$$\frac{\partial q_{1}}{\partial x_{2}}=W_{1,2}=0.5$$
$$\frac{\partial q_{2}}{\partial x_{1}}=W_{2,1}=-0.3$$
$$\frac{\partial q_{2}}{\partial x_{2}}=W_{2,2}=0.8$$
In general:
$$\frac{\partial q_{k}}{\partial x_{i}}=W_{k,i}$$
$$\frac{\partial f}{\partial x_{i}}=\sum_{k}\frac{\partial f}{\partial q_{k}}\frac{\partial q_{k}}{\partial x_{i}}=\sum_{k}2q_{k}W_{k,i}$$
Hence:
$$\frac{\partial f}{\partial x_{1}}=2q_{1}W_{1,1}+2q_{2}W_{2,1}=-0.112$$
$$\frac{\partial f}{\partial x_{2}}=2q_{1}W_{1,2}+2q_{2}W_{2,2}=0.636$$
so:
$$\frac{\partial f}{\partial x}=\begin{bmatrix} -0.112 \\ 0.636 \end{bmatrix}$$
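Equivalently, $\frac{\partial f}{\partial x}=2\,W^{\top}q$; a self-contained sketch with the same values:

```python
import numpy as np

# Same values as above.
x = np.array([0.2, 0.4])
W = np.array([[0.1, 0.5],
              [-0.3, 0.8]])

q = W @ x              # [0.22, 0.26]
dq = 2 * q             # [0.44, 0.52]
dx = W.T @ dq          # df/dx_i = sum_k 2 * q_k * W_{k,i}
print(dx)              # [-0.112  0.636]
```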
Finally, the complete set of gradients:
(Figure: the vector example with every gradient filled in.)
