# Neural Network Learning (12): Convolutional Neural Networks and the BP Algorithm


### The Basic Computation Flow of DNN Backpropagation

1. First, compute the output-layer error $\delta^L$:

$$\delta^L = \frac{\partial C}{\partial a^L} \odot \sigma'(z^L) \tag{BP1}$$

2. By the chain rule, $\delta^{l}$ can be derived step by step from $\delta^{l+1}$:

$$\delta^l = \left((W^{l+1})^T \delta^{l+1}\right) \odot \sigma'(z^l) \tag{BP2}$$

3. From $\delta^l$, obtain the gradient expressions for $W$ and $b$:

$$\frac{\partial C}{\partial W^l} = \delta^l (a^{l-1})^T \tag{BP3}$$
$$\frac{\partial C}{\partial b^l} = \delta^l \tag{BP4}$$

4. Update the model:

$$W \leftarrow W - \eta \frac{\partial C}{\partial W}, \qquad b \leftarrow b - \eta \frac{\partial C}{\partial b}$$

With mini-batches of size $m$ and L2 regularization, the update becomes

$$W \leftarrow \left(1 - \frac{\eta\lambda}{n}\right) W - \frac{\eta}{m} \sum_x \frac{\partial C_x}{\partial W}, \qquad b \leftarrow b - \frac{\eta}{m} \sum_x \frac{\partial C_x}{\partial b}$$
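The four equations above can be sketched as a single update step in NumPy. The two-layer sigmoid network, the quadratic cost $C = \tfrac{1}{2}\|a^2 - y\|^2$, and all names and sizes below are illustrative assumptions, not the article's code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_update(W1, b1, W2, b2, x, y, eta):
    """One gradient-descent step for a 2-layer sigmoid net with quadratic cost."""
    # Forward pass
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    # BP1: output-layer error; for C = 0.5*||a2 - y||^2, dC/da2 = a2 - y,
    # and sigma'(z) = sigma(z)*(1 - sigma(z)) = a*(1 - a)
    delta2 = (a2 - y) * a2 * (1.0 - a2)
    # BP2: propagate the error back one layer
    delta1 = (W2.T @ delta2) * a1 * (1.0 - a1)
    # BP3/BP4 give the gradients; apply the gradient-descent update
    return (W1 - eta * np.outer(delta1, x), b1 - eta * delta1,
            W2 - eta * np.outer(delta2, a1), b2 - eta * delta2)
```

Averaging such per-sample gradients over a mini-batch of size $m$ (and adding the $\frac{\eta\lambda}{n}W$ shrinkage term) yields the regularized update shown above.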

### Backpropagation in Convolutional Neural Networks

#### 1. Backpropagation from the Fully Connected Part to a Pooling Layer

A pooling layer applies no activation function, so the error passes back through the fully connected weights without the $\sigma'$ factor:

$$\delta^l = (W^{l+1})^T \delta^{l+1}$$

#### 2. Backpropagation from a Pooling Layer to a Convolutional Layer

Mean-pooling outputs the average of each small region as the pooled result, while max-pooling outputs the maximum of each small region. Accordingly, `upsample` restores the pooled error to the pre-pooling size, spreading it evenly over each region for mean-pooling, or routing it to the position that held the maximum for max-pooling:

$$\delta^l = \text{upsample}(\delta^{l+1}) \odot \sigma'(z^l)$$
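The `upsample` operation can be sketched as follows; the 2×2 region size and the function names are assumptions for illustration. For mean-pooling each pooled error is spread evenly; for max-pooling it is routed to the argmax position recorded from the forward pass:

```python
import numpy as np

def upsample_mean(delta_pooled, k=2):
    """Spread each pooled gradient evenly over its k-by-k region."""
    return np.kron(delta_pooled, np.ones((k, k))) / (k * k)

def upsample_max(delta_pooled, a_prev, k=2):
    """Route each pooled gradient to the argmax position of its region."""
    out = np.zeros_like(a_prev, dtype=float)
    h, w = delta_pooled.shape
    for i in range(h):
        for j in range(w):
            block = a_prev[i*k:(i+1)*k, j*k:(j+1)*k]
            r, c = np.unravel_index(np.argmax(block), block.shape)
            out[i*k + r, j*k + c] = delta_pooled[i, j]
    return out
```

Both variants preserve the total error: mean-upsampling because each region sums back to the pooled value, max-upsampling because exactly one entry per region is nonzero.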

#### 3. Backpropagation from a Convolutional Layer to the Previous Layer

$$\begin{pmatrix} a_{11}^{l} & a_{12}^{l} & a_{13}^{l} \\ a_{21}^{l} & a_{22}^{l} & a_{23}^{l} \\ a_{31}^{l} & a_{32}^{l} & a_{33}^{l} \end{pmatrix} * \begin{pmatrix} w_{11}^{l+1} & w_{12}^{l+1} \\ w_{21}^{l+1} & w_{22}^{l+1} \end{pmatrix} = \begin{pmatrix} z_{11}^{l+1} & z_{12}^{l+1} \\ z_{21}^{l+1} & z_{22}^{l+1} \end{pmatrix}$$

$\begin{array}{l}{z}_{11}^{l+1}={a}_{11}^{l}{w}_{22}^{l+1}+{a}_{12}^{l}{w}_{21}^{l+1}+{a}_{21}^{l}{w}_{12}^{l+1}+{a}_{22}^{l}{w}_{11}^{l+1}\\ {z}_{12}^{l+1}={a}_{12}^{l}{w}_{22}^{l+1}+{a}_{13}^{l}{w}_{21}^{l+1}+{a}_{22}^{l}{w}_{12}^{l+1}+{a}_{23}^{l}{w}_{11}^{l+1}\\ {z}_{21}^{l+1}={a}_{21}^{l}{w}_{22}^{l+1}+{a}_{22}^{l}{w}_{21}^{l+1}+{a}_{31}^{l}{w}_{12}^{l+1}+{a}_{32}^{l}{w}_{11}^{l+1}\\ {z}_{22}^{l+1}={a}_{22}^{l}{w}_{22}^{l+1}+{a}_{23}^{l}{w}_{21}^{l+1}+{a}_{32}^{l}{w}_{12}^{l+1}+{a}_{33}^{l}{w}_{11}^{l+1}\end{array}$

```matlab
function z = symconv(a,k)
%SYMCONV symbolic 2-D convolution in 'valid' mode
syms zero real                 % placeholder symbol used to initialize the sums
k = rot90(k,2);                % flip the kernel, as true convolution requires
[hk,wk] = size(k);
[ha,wa] = size(a);
h = ha - hk + 1;               % output height in 'valid' mode
w = wa - wk + 1;               % output width in 'valid' mode
for in = 1:h
    for im = 1:w
        z(in,im) = zero;
        for jn = 1:hk
            for jm = 1:wk
                z(in,im) = z(in,im) + a(in+jn-1,im+jm-1)*k(jn,jm);
            end
        end
    end
end
z = z - zero;                  % remove the placeholder symbol
end
```

```matlab
syms a11 a12 a13 a21 a22 a23 a31 a32 a33 real   % a in layer l
syms w11 w12 w21 w22 real                       % kernel in layer l+1
syms z11 z12 z21 z22 real                       % z in layer l+1
syms d11 d12 d21 d22 real                       % delta in layer l+1
a = [a11 a12 a13; a21 a22 a23; a31 a32 a33];
d = [d11 d12; d21 d22];
w = [w11 w12; w21 w22];
z = symconv(a,w);   % reproduces the expansion of z^{l+1} above

diff(z,a11)         % derivative of each entry of z w.r.t. a11
diff(z,w11)         % derivative of each entry of z w.r.t. w11
```
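The same check can be done numerically in Python. This sketch assumes SciPy's `convolve2d`, which, like `conv2`, performs true convolution (the kernel is flipped internally), and confirms two entries of the hand expansion above with 0-based indices:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(1)
a = rng.normal(size=(3, 3))          # activations a^l
w = rng.normal(size=(2, 2))          # kernel W^{l+1}

# true 2-D convolution in 'valid' mode (kernel flipped internally)
z = convolve2d(a, w, mode='valid')

# z_11^{l+1} from the hand expansion, written with 0-based indices
z11 = a[0, 0]*w[1, 1] + a[0, 1]*w[1, 0] + a[1, 0]*w[0, 1] + a[1, 1]*w[0, 0]
```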

$$\delta_{ij}^l = \frac{\partial C}{\partial z_{ij}^l} = \left(\sum_{mn} \frac{\partial C}{\partial z_{mn}^{l+1}} \frac{\partial z_{mn}^{l+1}}{\partial a_{ij}^l}\right) \frac{\partial a_{ij}^l}{\partial z_{ij}^l}$$

Convolving with the all-ones matrix simply sums the entries, so this can be written as

$$\delta_{ij}^l = \frac{\partial C}{\partial z_{ij}^l} = \left(\delta^{l+1} \odot \frac{\partial z^{l+1}}{\partial a_{ij}^l}\right) * \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \sigma'(z_{ij}^l)$$

$$\delta_{11}^l = \begin{pmatrix} \delta_{11}^{l+1} w_{22}^{l+1} & 0 \\ 0 & 0 \end{pmatrix} * \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \sigma'(z_{11}^l) = \begin{pmatrix} 0 & 0 \\ 0 & \delta_{11}^{l+1} \end{pmatrix} * \begin{pmatrix} w_{22}^{l+1} & 0 \\ 0 & 0 \end{pmatrix} \sigma'(z_{11}^l)$$
$$\delta_{12}^l = \begin{pmatrix} \delta_{11}^{l+1} w_{21}^{l+1} & \delta_{12}^{l+1} w_{22}^{l+1} \\ 0 & 0 \end{pmatrix} * \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \sigma'(z_{12}^l) = \begin{pmatrix} 0 & 0 \\ \delta_{11}^{l+1} & \delta_{12}^{l+1} \end{pmatrix} * \begin{pmatrix} w_{22}^{l+1} & w_{21}^{l+1} \\ 0 & 0 \end{pmatrix} \sigma'(z_{12}^l)$$
$$\delta_{13}^l = \begin{pmatrix} 0 & \delta_{12}^{l+1} w_{21}^{l+1} \\ 0 & 0 \end{pmatrix} * \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \sigma'(z_{13}^l) = \begin{pmatrix} 0 & 0 \\ \delta_{12}^{l+1} & 0 \end{pmatrix} * \begin{pmatrix} 0 & w_{21}^{l+1} \\ 0 & 0 \end{pmatrix} \sigma'(z_{13}^l)$$
$$\delta_{22}^l = \begin{pmatrix} \delta_{11}^{l+1} w_{11}^{l+1} & \delta_{12}^{l+1} w_{12}^{l+1} \\ \delta_{21}^{l+1} w_{21}^{l+1} & \delta_{22}^{l+1} w_{22}^{l+1} \end{pmatrix} * \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \sigma'(z_{22}^l) = \begin{pmatrix} \delta_{11}^{l+1} & \delta_{12}^{l+1} \\ \delta_{21}^{l+1} & \delta_{22}^{l+1} \end{pmatrix} * \begin{pmatrix} w_{22}^{l+1} & w_{21}^{l+1} \\ w_{12}^{l+1} & w_{11}^{l+1} \end{pmatrix} \sigma'(z_{22}^l)$$

$$\delta^l = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & \delta_{11}^{l+1} & \delta_{12}^{l+1} & 0 \\ 0 & \delta_{21}^{l+1} & \delta_{22}^{l+1} & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} * \begin{pmatrix} w_{22}^{l+1} & w_{21}^{l+1} \\ w_{12}^{l+1} & w_{11}^{l+1} \end{pmatrix} \odot \sigma'(z^l) = \text{padding}(\delta^{l+1}) * \text{rot90}(W^{l+1}, 2) \odot \sigma'(z^l)$$
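This identity can be checked numerically: zero-padding $\delta^{l+1}$ by $k-1$ on every side and convolving in `'valid'` mode is the same as a `'full'` convolution with the rotated kernel. A SciPy sketch (multiply the result elementwise by $\sigma'(z^l)$ to complete the rule):

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(2)
w = rng.normal(size=(2, 2))            # kernel W^{l+1}
delta_next = rng.normal(size=(2, 2))   # delta^{l+1}

# padding(delta^{l+1}) * rot90(W^{l+1}, 2) in 'valid' mode ...
padded = np.pad(delta_next, 1)         # zero-pad by k - 1 = 1 on every side
d_pad = convolve2d(padded, np.rot90(w, 2), mode='valid')

# ... equals the 'full' convolution with the rotated kernel
d_full = convolve2d(delta_next, np.rot90(w, 2), mode='full')
```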

#### 4. Gradients of $W$ and $b$ for a Convolutional Layer

$$\frac{\partial C}{\partial W_{ij}^{l}} = \sum_{mn} \frac{\partial C}{\partial z_{mn}^{l}} \frac{\partial z_{mn}^{l}}{\partial W_{ij}^{l}}$$

$$\frac{\partial C}{\partial W_{11}^l} = \begin{pmatrix} a_{33}^{l-1} & a_{32}^{l-1} \\ a_{23}^{l-1} & a_{22}^{l-1} \end{pmatrix} * \begin{pmatrix} \delta_{11}^{l} & \delta_{12}^{l} \\ \delta_{21}^{l} & \delta_{22}^{l} \end{pmatrix}, \qquad \frac{\partial C}{\partial W_{12}^l} = \begin{pmatrix} a_{32}^{l-1} & a_{31}^{l-1} \\ a_{22}^{l-1} & a_{21}^{l-1} \end{pmatrix} * \begin{pmatrix} \delta_{11}^{l} & \delta_{12}^{l} \\ \delta_{21}^{l} & \delta_{22}^{l} \end{pmatrix}$$
$$\frac{\partial C}{\partial W_{21}^l} = \begin{pmatrix} a_{23}^{l-1} & a_{22}^{l-1} \\ a_{13}^{l-1} & a_{12}^{l-1} \end{pmatrix} * \begin{pmatrix} \delta_{11}^{l} & \delta_{12}^{l} \\ \delta_{21}^{l} & \delta_{22}^{l} \end{pmatrix}, \qquad \frac{\partial C}{\partial W_{22}^l} = \begin{pmatrix} a_{22}^{l-1} & a_{21}^{l-1} \\ a_{12}^{l-1} & a_{11}^{l-1} \end{pmatrix} * \begin{pmatrix} \delta_{11}^{l} & \delta_{12}^{l} \\ \delta_{21}^{l} & \delta_{22}^{l} \end{pmatrix}$$

$$\frac{\partial C}{\partial W^l} = \delta^l * \text{rot90}(a^{l-1}, 2)$$

In MATLAB this is

$$\frac{\partial C}{\partial W^l} = \texttt{conv2}\left(\text{rot90}(a^{l-1}, 2),\ \delta^l,\ \texttt{'valid'}\right)$$

(convolution is commutative, but `conv2`'s `'valid'` mode requires the larger array, here the rotated activation map, as the first argument).

$$\frac{\partial C}{\partial b^l} = \sum_{mn} \delta_{mn}^l$$

Some implementations instead average the error over the feature map:

$$\frac{\partial C}{\partial b^l} = \operatorname{mean}\left(\delta^l\right)$$
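Both gradient formulas can be verified against finite differences of a toy cost. This sketch assumes an identity activation, so that $\delta^l = \partial C/\partial z^l = z^l$, and all sizes are illustrative:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(3)
a_prev = rng.normal(size=(3, 3))   # a^{l-1}
w = rng.normal(size=(2, 2))        # kernel W^l
b = 0.5                            # shared bias b^l

def cost(w, b):
    # toy quadratic cost with identity activation: C = 0.5 * sum(z^2)
    z = convolve2d(a_prev, w, mode='valid') + b
    return 0.5 * np.sum(z ** 2)

# with identity activation, delta^l = dC/dz^l = z^l
delta = convolve2d(a_prev, w, mode='valid') + b

# dC/dW^l = delta^l * rot90(a^{l-1}, 2); the rotated activation map goes
# first because convolve2d's 'valid' mode needs the larger array first
dW = convolve2d(np.rot90(a_prev, 2), delta, mode='valid')

# dC/db^l = sum of the entries of delta^l
db = np.sum(delta)
```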

#### 5. Summary of the CNN Backpropagation Algorithm

• Mini-batch size $m$
• Number of layers $L$ of the CNN model and the type of each hidden layer
• For convolutional layers: the kernel size $k$, the kernel sub-matrix dimension $d$, the padding $p$, and the stride $s$
• For pooling layers: the pooling region size $h$ and the pooling criterion (max or mean)
• For fully connected layers: the activation function and the number of neurons in each layer
• For the output layer: the output function and the cost function; multi-class tasks usually use the softmax function with the cross-entropy cost
• Hyperparameters: the learning rate $\eta$, the maximum number of iterations max_iter, and the stopping threshold $\epsilon$
• ……

1. Initialize $W, b$ of every hidden layer with random values
2. Forward propagation
2.1) Assign the input data $x$ to the input neurons: $a^1 = x$
2.2) From the second layer onward, compute forward according to the following three cases:

• If the current layer is fully connected: $a^{l} = \sigma(z^{l}) = \sigma(W^l a^{l-1} + b^{l})$
• If the current layer is convolutional: $a^{l} = \sigma(z^{l}) = \sigma(W^l * a^{l-1} + b^{l})$
• If the current layer is a pooling layer: $a^{l} = \text{pool}(a^{l-1})$
2.3) For the output layer $L$, compute $a^{L} = \text{softmax}(z^{L}) = \text{softmax}(W^{L} a^{L-1} + b^{L})$

3. Backpropagation
3.1) Compute the output-layer error $\delta^L$ from the loss function
3.2) Starting from the second-to-last layer, propagate backward according to the following three cases:

• If the current layer is fully connected: $\delta^{l} = (W^{l+1})^T\delta^{l+1}\odot \sigma'(z^{l})$
• If the next layer is convolutional: $\delta^{l} = \delta^{l+1}*\text{rot180}(W^{l+1}) \odot \sigma'(z^{l})$
• If the next layer is a pooling layer: $\delta^{l} = \text{upsample}(\delta^{l+1}) \odot \sigma'(z^{l})$
4. Update the model according to the following two cases:
4.1) If the current layer is fully connected:
$$W^l \leftarrow W^l - \frac{\eta}{m}\sum_x\left[\delta^l (a^{l-1})^T\right]$$
$$b^l \leftarrow b^l - \frac{\eta}{m}\sum_x \delta^l$$
4.2) If the current layer is convolutional, for each convolution kernel:
$$W^l \leftarrow W^l - \frac{\eta}{m}\sum_x\left[\delta^l * \text{rot90}(a^{l-1}, 2)\right]$$
$$b^l \leftarrow b^l - \frac{\eta}{m}\sum_x \operatorname{mean}\left(\delta^l\right)$$
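The forward rules of step 2 can be sketched for a toy stack (convolution → mean-pooling → softmax output). Every size, name, the sigmoid choice, and the single-channel layout below are illustrative assumptions:

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))      # shift for numerical stability
    return e / np.sum(e)

def forward(x, conv_w, conv_b, out_w, out_b):
    # convolutional layer: a = sigma(W * a_prev + b)
    a1 = sigmoid(convolve2d(x, conv_w, mode='valid') + conv_b)
    # mean-pooling layer over 2x2 regions: a = pool(a_prev)
    h, w = a1.shape
    a2 = a1.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    # output layer: a = softmax(W a_prev + b)
    return softmax(out_w @ a2.ravel() + out_b)

rng = np.random.default_rng(4)
x = rng.normal(size=(5, 5))                       # toy input "image"
probs = forward(x, rng.normal(size=(2, 2)), 0.1,  # 2x2 kernel, bias
                rng.normal(size=(3, 4)), rng.normal(size=3))
```

A 5×5 input convolved with a 2×2 kernel in `'valid'` mode gives a 4×4 map; 2×2 mean-pooling reduces it to 2×2, which is flattened to feed the 3-class softmax output.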
