神经网络及相关公式推导

1. 神经网络

neural_network
输入 [ x 1 , x 2 , . . . , x n ] [x_1, x_2,...,x_n] [x1,x2,...,xn],输出 [ y 1 , y 2 , . . . , y k ] [y_1, y_2,...,y_k] [y1,y2,...,yk]
当输出分类 k > 2 k>2 k>2时,使用
[ 1 0 . . . 0 ] , [ 0 1 . . . 0 ] , [ 0 . . . 1 0 ] , [ 0 0 . . . 1 ] \begin{bmatrix}1\\0\\... \\0\end{bmatrix},\begin{bmatrix}0\\1\\...\\0\end{bmatrix},\begin{bmatrix}0\\...\\1\\0\end{bmatrix},\begin{bmatrix}0\\0\\...\\1\end{bmatrix} 10...0,01...0,0...10,00...1
作为输出。

2. 代价函数

J ( Θ ) = − 1 m [ ∑ i = 1 m ∑ k = 1 K y k ( i ) l o g ( h Θ ( x ( i ) ) ) k + ( 1 − y k ( i ) ) l o g ( 1 − ( h Θ ( x ( i ) ) ) k ) ] + λ 2 m ∑ l = 1 L − 1 ∑ i = 1 s l ∑ j = 1 s ( l + 1 ) ( Θ j i ( l ) ) 2 J(\Theta)=-\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{k=1}^{K}y_k^{(i)}log(h_{\Theta}(x^{(i)}))_k+(1-y_k^{(i)})log(1-(h_{\Theta}(x^{(i)}))_k)\right]+\frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{(l+1)}}(\Theta_{ji}^{(l)})^2 J(Θ)=m1[i=1mk=1Kyk(i)log(hΘ(x(i)))k+(1yk(i))log(1(hΘ(x(i)))k)]+2mλl=1L1i=1slj=1s(l+1)(Θji(l))2

3. 前向传播

a ( 1 ) = x a ( l ) = h Θ ( l ) ( a ( l − 1 ) ^ ) = g ( Θ ( l − 1 ) [ 1 a ( l − 1 ) ] ) , 1 < l ≤ L Θ ( l ) ∈ R s l + 1 × ( s l + 1 ) a^{(1)}=x \\a^{(l)} = h_{\Theta^{(l)}}(\widehat{a^{(l-1)}})=g\left({\Theta^{(l-1)}}\begin{bmatrix}1\\a^{(l-1)}\end{bmatrix}\right),1<l\le L \\\Theta^{(l)} \in \Bbb R^{s_{l+1}\times(s_l+1)} a(1)=xa(l)=hΘ(l)(a(l1) )=g(Θ(l1)[1a(l1)]),1<lLΘ(l)Rsl+1×(sl+1)

4. 后向传播

δ ( L ) = ( a ( L ) − y ) a ( L ) ( 1 − a ( L ) ) δ ( l ) = ( Θ ( l ) ^ ) T δ ( l + 1 ) . ∗ a ( l ) . ∗ ( 1 − a ( l ) ) , 1 < l < L Θ ( l ) ^ 为 不 包 含 θ 0 的 Θ ( l ) \delta^{(L)} = (a^{(L)}-y)a^{(L)}(1-a^{(L)}) \\ \delta^{(l)} = (\widehat{\Theta^{(l)}})^T\delta^{(l+1)}.*a^{(l)}.*(1-a^{(l)}), 1<l<L \\\widehat{\Theta^{(l)}}为不包含\theta_0的\Theta^{(l)} δ(L)=(a(L)y)a(L)(1a(L))δ(l)=(Θ(l) )Tδ(l+1).a(l).(1a(l)),1<l<LΘ(l) θ0Θ(l)

5. 后向传播推导

由前向传播可得:
[ θ 1 , 0 ( l ) θ 1 , 1 ( l ) . . . θ 1 , s l ( l ) θ 2 , 0 ( l ) θ 2 , 1 ( l ) . . . θ 2 , s l ( l ) . . . . . . . . . . . . θ s ( l + 1 ) , 0 ( l ) θ s ( l + 1 ) , 1 ( l ) . . . θ s ( l + 1 ) , s l ( l ) ] [ 1 a 1 ( l ) a 2 ( l ) . . . a s l ( l ) ] = [ z 1 ( l + 1 ) z 2 ( l + 1 ) . . . z s ( l + 1 ) ( l + 1 ) ] → g ( x ) → [ a 1 ( l + 1 ) a 2 ( l + 1 ) . . . a s ( l + 1 ) ( l + 1 ) ] \begin{bmatrix}\theta_{1,0}^{(l)} & \theta_{1,1}^{(l)} & ... & \theta_{1,s_l}^{(l)}\\\theta_{2,0}^{(l)} & \theta_{2,1}^{(l)} & ... & \theta_{2,s_l}^{(l)}\\... & ... & ... & ...\\\theta_{s_{(l+1)},0}^{(l)} & \theta_{s_{(l+1)},1}^{(l)} & ... & \theta_{s_{(l+1)},s_l}^{(l)}\\\end{bmatrix}\begin{bmatrix}1 \\a_1^{(l)} \\a_2^{(l)} \\... \\a_{s_l}^{(l)} \\\end{bmatrix}=\begin{bmatrix}z_1^{(l+1)} \\z_2^{(l+1)} \\... \\z_{s_{(l+1)}}^{(l+1)} \\\end{bmatrix}\to g(x) \to\begin{bmatrix}a_1^{(l+1)} \\a_2^{(l+1)} \\... \\a_{s_{(l+1)}}^{(l+1)} \\\end{bmatrix}\\ θ1,0(l)θ2,0(l)...θs(l+1),0(l)θ1,1(l)θ2,1(l)...θs(l+1),1(l)............θ1,sl(l)θ2,sl(l)...θs(l+1),sl(l)1a1(l)a2(l)...asl(l)=z1(l+1)z2(l+1)...zs(l+1)(l+1)g(x)a1(l+1)a2(l+1)...as(l+1)(l+1)
a i ( l + 1 ) = g ( z i ( l + 1 ) ) = g ( θ i , 0 ( l ) + θ i , 1 ( l ) a 1 ( l ) + θ i , 2 ( l ) a 2 ( l ) + . . . + θ i , s l ( l ) a s 1 ( l ) ) a_i^{(l+1)} = g(z_i^{(l+1)}) = g(\theta_{i,0}^{(l)}+\theta_{i,1}^{(l)}a_1^{(l)}+\theta_{i,2}^{(l)}a_2^{(l)}+...+\theta_{i,s_l}^{(l)}a_{s_1}^{(l)})\\ ai(l+1)=g(zi(l+1))=g(θi,0(l)+θi,1(l)a1(l)+θi,2(l)a2(l)+...+θi,sl(l)as1(l))
设输出对激励输入的偏导数为当前输入的误差,则:
d J ( x , θ ) = δ i ( l + 1 ) d ( z i ( l + 1 ) ) = δ i ( l + 1 ) d ( ∑ k = 0 s l θ i , k ( l ) a k ( l ) ) (1) dJ(x,\theta)=\delta_i^{(l+1)}d(z_i^{(l+1)}) = \delta_i^{(l+1)}d(\sum_{k=0}^{s_l}\theta_{i, k}^{(l)}a_k^{(l)}) \tag{1} dJ(x,θ)=δi(l+1)d(zi(l+1))=δi(l+1)d(k=0slθi,k(l)ak(l))(1)
所以有:
d J ( x , θ ) d ( z m ( l ) ) = d J ( x , θ ) d ( z ( l + 1 ) ) d ( z ( l + 1 ) ) d ( z m ( l ) ) = ∑ i = 1 s l + 1 δ i ( l + 1 ) d ( ∑ k = 0 s l θ i , k ( l ) a k ( l ) ) / d ( z m ( l ) ) = ∑ i = 1 s l + 1 δ i ( l + 1 ) ∑ k = 0 s l θ i , k ( l ) d ( a k ( l ) ) d ( z m ( l ) ) = ∑ i = 1 s l + 1 δ i ( l + 1 ) θ i , m ( l ) a m ( l ) ( 1 − a m ( l ) ) \begin{aligned} \frac{dJ(x,\theta)}{d(z_m^{(l)})} &= \frac{dJ(x,\theta)}{d(z^{(l+1)})}\frac{d(z^{(l+1)})}{d(z_m^{(l)})}=\sum_{i=1}^{s_{l+1}}\delta_i^{(l+1)}d(\sum_{k=0}^{s_l}\theta_{i, k}^{(l)}a_k^{(l)})/d(z_m^{(l)}) \\&=\sum_{i=1}^{s_{l+1}}\delta_i^{(l+1)}\sum_{k=0}^{s_l}\theta_{i, k}^{(l)}\frac{d(a_k^{(l)})}{d(z_m^{(l)})}=\sum_{i=1}^{s_{l+1}}\delta_i^{(l+1)}\theta_{i, m}^{(l)}a_m^{(l)}(1-a_m^{(l)})\\ \end{aligned} d(zm(l))dJ(x,θ)=d(z(l+1))dJ(x,θ)d(zm(l))d(z(l+1))=i=1sl+1δi(l+1)d(k=0slθi,k(l)ak(l))/d(zm(l))=i=1sl+1δi(l+1)k=0slθi,k(l)d(zm(l))d(ak(l))=i=1sl+1δi(l+1)θi,m(l)am(l)(1am(l))
得到:
δ m ( l ) = a m ( l ) ( 1 − a m ( l ) ) ∑ i = 1 s l + 1 δ i ( l + 1 ) θ i , m ( l ) \delta_m^{(l)}=a_m^{(l)}(1-a_m^{(l)})\sum_{i=1}^{s_{l+1}}\delta_i^{(l+1)}\theta_{i, m}^{(l)}\\ δm(l)=am(l)(1am(l))i=1sl+1δi(l+1)θi,m(l)
向量扩充:
[ δ 1 ( l ) δ 2 ( l ) . . . δ s l ( l ) ] = [ a 1 ( l ) a 2 ( l ) . . . a s l ( l ) ] . ∗ ( 1 − [ a 1 ( l ) a 2 ( l ) . . . a s l ( l ) ] ) . ∗ ( [ θ 1 , 1 ( l ) θ 2 , 1 ( l ) . . . θ s l + 1 , 1 ( l ) θ 1 , 2 ( l ) θ 2 , 2 ( l ) . . . θ s l + 1 , 2 ( l ) . . . θ 1 , s l ( l ) θ 2 , s l ( l ) . . . θ s l + 1 , s l ( l ) ] [ δ 1 ( l + 1 ) δ 2 ( l + 1 ) . . . δ s l + 1 ( l + 1 ) ] ) \begin{bmatrix}\delta_1^{(l)} \\ \delta_2^{(l)} \\ ... \\ \delta_{s_l}^{(l)}\end{bmatrix}=\begin{bmatrix}a_1^{(l)} \\ a_2^{(l)} \\ ... \\ a_{s_l}^{(l)}\end{bmatrix}.*\left (1-\begin{bmatrix}a_1^{(l)} \\ a_2^{(l)} \\ ... \\ a_{s_l}^{(l)}\end{bmatrix}\right).*\left(\begin{bmatrix}\theta_{1, 1}^{(l)} &\theta_{2, 1}^{(l)} &...&\theta_{s_{l+1}, 1}^{(l)}\\ \theta_{1, 2}^{(l)} &\theta_{2, 2}^{(l)} &...&\theta_{s_{l+1}, 2}^{(l)} \\ ... \\ \theta_{1, s_l}^{(l)} &\theta_{2, s_l}^{(l)} &...&\theta_{s_{l+1}, s_l}^{(l)}\end{bmatrix}\begin{bmatrix}\delta_1^{(l+1)} \\ \delta_2^{(l+1)} \\ ... \\ \delta_{s_{l+1}}^{(l+1)}\end{bmatrix}\right) \\ δ1(l)δ2(l)...δsl(l)=a1(l)a2(l)...asl(l).1a1(l)a2(l)...asl(l).θ1,1(l)θ1,2(l)...θ1,sl(l)θ2,1(l)θ2,2(l)θ2,sl(l).........θsl+1,1(l)θsl+1,2(l)θsl+1,sl(l)δ1(l+1)δ2(l+1)...δsl+1(l+1)
最后得到:
δ ( l ) = a ( l ) . ∗ ( 1 − a ( l ) ) . ∗ ( θ ( l ) T δ ( l + 1 ) ) (2) \delta^{(l)}=a^{(l)}.*(1-a^{(l)}).*({\theta^{(l)}}^T\delta^{(l+1)}) \tag{2} δ(l)=a(l).(1a(l)).(θ(l)Tδ(l+1))(2)
又由公式(1)得:
d J ( x , θ ) d θ i , j ( l ) = δ i ( l + 1 ) a j ( l ) (3) \frac{dJ(x,\theta)}{d\theta_{i, j}^{(l)}}=\delta_i^{(l+1)}a_j^{(l)} \tag{3} dθi,j(l)dJ(x,θ)=δi(l+1)aj(l)(3)
假设 J ( x , θ ) = 1 2 ( h ( x ) − y ) 2 J(x,\theta)=\frac{1}{2}(h(x)-y)^2 J(x,θ)=21(h(x)y)2,则:
δ ( L ) = d J ( x , θ ) d z L = d J ( x , θ ) d a L d a L d z L \delta^{(L)}=\frac{dJ(x,\theta)}{dz^{L}}=\frac{dJ(x,\theta)}{da^{L}}\frac{da^{L}}{dz^{L}} δ(L)=dzLdJ(x,θ)=daLdJ(x,θ)dzLdaL
得到
δ ( L ) = ( a L − y ) a L ( 1 − a L ) (4) \delta^{(L)}=(a^{L}-y)a^{L}(1-a^{L}) \tag{4} δ(L)=(aLy)aL(1aL)(4)

6. 后向传播算法

  1. 误差矩阵 Δ i j ( l ) \Delta_{ij}^{(l)} Δij(l)初始化为零
  2. For i=1:m
      (1). a ( 1 ) = x ( i ) a^{(1)}=x^{(i)} a(1)=x(i) 其中 a a a的上标表示不同的层数, x x x的上标表示不同的测试样本
      (2). 利用前向传播计算 a ( 2 ) , a ( 3 ) , . . . , a ( L ) a^{(2)},a^{(3)},...,a^{(L)} a(2),a(3),...,a(L)
      (3). 初始 δ ( L ) = ( a ( L ) − y ( i ) ) a ( L ) ( 1 − a ( L ) ) \delta^{(L)}=(a^{(L)}-y^{(i)})a^{(L)}(1-a^{(L)}) δ(L)=(a(L)y(i))a(L)(1a(L))
      (4). 利用后向传播计算 δ ( l ) \delta^{(l)} δ(l)
      (5). Δ i j ( l ) \Delta_{ij}^{(l)} Δij(l):= Δ i j ( l ) + a j ( l ) δ i l + 1 \Delta_{ij}^{(l)}+a_j^{(l)}\delta_i^{l+1} Δij(l)+aj(l)δil+1
    求出偏导数 D i j ( l ) = 1 m Δ i j ( l ) + λ θ i j ( l ) D_{ij}^{(l)}=\frac{1}{m}\Delta_{ij}^{(l)}+\lambda\theta_{ij}^{(l)} Dij(l)=m1Δij(l)+λθij(l)
  3. 利用偏导数进行梯度下降

7. 神经网络训练

A. 参数随机化,若 Θ \Theta Θ全为0,会导致全为相同值,所以必须初始化初始值为随机值。

B. 利用正向传播计算所有层的值 a ( l ) a^{(l)} a(l)

C. 计算此时的代价函数 J ( Θ ) J(\Theta) J(Θ)

D. 利用后向传播计算所有偏导数

E. 利用数值检验法检验偏导数( D ≈ J ( Θ + ϵ ) − J ( Θ − ϵ ) 2 ϵ D\approx\frac{J(\Theta+\epsilon)-J(\Theta-\epsilon)}{2\epsilon} D2ϵJ(Θ+ϵ)J(Θϵ)

F. 利用优化算法最小化代价函数(梯度下降)

  • 0
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值