Gradients in the Backpropagation Algorithm

The loss function $J(\theta)$

$$
\begin{aligned}
J(\theta) = {} & -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \ln\left( h_\theta(X^{(i)})_k \right) + \left( 1 - y_k^{(i)} \right) \ln\left( 1 - h_\theta(X^{(i)})_k \right) \right] \\
& + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_{l+1}} \sum_{j=1}^{s_l} \left( \theta_{i,j}^{(l)} \right)^2
\end{aligned}
$$

The single-sample loss $\operatorname{cost}(\theta; X, Y)$ when $\lambda = 0$

When $\lambda = 0$, the loss for a single sample $X = \begin{pmatrix} x_1 \\ \vdots \\ x_{s_1} \end{pmatrix}$, $Y = \begin{pmatrix} y_1 \\ \vdots \\ y_K \end{pmatrix}$ is:
$$
\operatorname{cost}(\theta; X, Y) = -\sum_{k=1}^{K} \left[ y_k \ln\left( h_\theta(X)_k \right) + (1 - y_k) \ln\left( 1 - h_\theta(X)_k \right) \right]
$$

$$a^{(1)} = X$$
$$Z^{(l+1)} = \theta^{(l)} a^{(l)}, \quad 1 \le l \le L-1$$
$$a^{(l)} = g\left( Z^{(l)} \right), \quad 2 \le l \le L,$$
where the function $g$ is the logistic function.
$$a^{(L)} = h_\theta(X)$$
Hence
$$
\operatorname{cost}(\theta; X, Y) = -\sum_{k=1}^{K} \left[ y_k \ln a_k^{(L)} + (1 - y_k) \ln\left( 1 - a_k^{(L)} \right) \right]
$$
$$
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \operatorname{cost}\left( \theta; X^{(i)}, Y^{(i)} \right) + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_{l+1}} \sum_{j=1}^{s_l} \left( \theta_{i,j}^{(l)} \right)^2
$$
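
The forward pass and the two loss functions can be sketched in NumPy as follows. This is only an illustration under assumptions not stated in the post: the weights are held in a Python list `thetas` with `thetas[l-1]` playing the role of $\theta^{(l)}$, samples are stored as rows of `X` and `Y`, and the function names (`sigmoid`, `forward`, `cost`, `loss`) are hypothetical. As in the formulas, there are no bias units.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(thetas, x):
    """Return the activations [a^(1), ..., a^(L)] for one sample x."""
    activations = [x]                      # a^(1) = X
    for theta in thetas:                   # theta^(l) has shape (s_{l+1}, s_l)
        z = theta @ activations[-1]        # Z^(l+1) = theta^(l) a^(l)
        activations.append(sigmoid(z))     # a^(l+1) = g(Z^(l+1))
    return activations

def cost(thetas, x, y):
    """Single-sample cross-entropy cost(theta; X, Y)."""
    a_L = forward(thetas, x)[-1]           # a^(L) = h_theta(X)
    return -np.sum(y * np.log(a_L) + (1 - y) * np.log(1 - a_L))

def loss(thetas, X, Y, lam):
    """J(theta): mean single-sample cost plus the L2 penalty."""
    m = X.shape[0]
    data_term = sum(cost(thetas, X[t], Y[t]) for t in range(m)) / m
    penalty = lam / (2 * m) * sum(np.sum(th ** 2) for th in thetas)
    return data_term + penalty
```

With all weights set to zero, every output unit equals $g(0) = 0.5$, so `cost` returns $K \ln 2$ regardless of the sample, which makes a convenient sanity check.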

Gradient of $\operatorname{cost}(\theta; X, Y)$ with respect to $Z^{(l)}$

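$$
\delta^{(l)} = \frac{\partial}{\partial Z^{(l)}} \operatorname{cost}(\theta; X, Y) = \begin{pmatrix} \dfrac{\partial}{\partial z_1^{(l)}} \operatorname{cost}(\theta; X, Y) \\ \vdots \\ \dfrac{\partial}{\partial z_{s_l}^{(l)}} \operatorname{cost}(\theta; X, Y) \end{pmatrix}, \quad 2 \le l \le L,
$$
$$
\delta^{(l)} = \begin{cases} a^{(L)} - Y, & l = L, \\ \left( \theta^{(l)} \right)^{\intercal} \delta^{(l+1)} \mathbin{.*} a^{(l)} \mathbin{.*} \left( 1 - a^{(l)} \right), & 2 \le l \le L-1, \end{cases}
$$
where the operator $.*$ denotes the element-wise product, e.g. $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \mathbin{.*} \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 y_1 \\ \vdots \\ x_n y_n \end{pmatrix}$.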
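
The recursion for $\delta^{(l)}$ translates into a short backward loop. The sketch below assumes the same hypothetical layout as before (a list `thetas` with `thetas[l-1]` standing for $\theta^{(l)}$, no bias units); the function name `deltas` is not from the original.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def deltas(thetas, x, y):
    """Return [delta^(2), ..., delta^(L)] for one sample (x, y)."""
    # Forward pass: activations[k] holds a^(k+1).
    activations = [x]
    for theta in thetas:
        activations.append(sigmoid(theta @ activations[-1]))

    delta = activations[-1] - y            # delta^(L) = a^(L) - Y
    result = [delta]
    # idx = l - 1 runs over layers l = L-1, ..., 2.
    for idx in range(len(thetas) - 1, 0, -1):
        a = activations[idx]               # a^(l)
        # delta^(l) = (theta^(l))^T delta^(l+1) .* a^(l) .* (1 - a^(l))
        delta = (thetas[idx].T @ delta) * a * (1 - a)
        result.insert(0, delta)
    return result
```

NumPy's `*` on arrays is already element-wise, so it plays the role of the $.*$ operator, while `@` is the matrix product.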

Proof

The claim is equivalent to the component-wise statement:
$$
\delta_j^{(l)} = \begin{cases} a_j^{(L)} - y_j, & l = L, \\ \left[ \sum_{i=1}^{s_{l+1}} \theta_{i,j}^{(l)} \delta_i^{(l+1)} \right] \cdot a_j^{(l)} \left( 1 - a_j^{(l)} \right), & 2 \le l \le L-1, \end{cases} \quad 1 \le j \le s_l
$$

From
$$
\begin{cases} Z^{(l+1)} = \theta^{(l)} a^{(l)}, & 1 \le l \le L-1, \\ a^{(l)} = g\left( Z^{(l)} \right), & 2 \le l \le L, \end{cases}
$$
we obtain:
$$
\begin{cases} \dfrac{\partial z_i^{(l+1)}}{\partial a_j^{(l)}} = \theta_{i,j}^{(l)}, & 1 \le l \le L-1, \\[2ex] \dfrac{\mathrm{d} a_j^{(l)}}{\mathrm{d} z_j^{(l)}} = g'\left( z_j^{(l)} \right) = a_j^{(l)} \left( 1 - a_j^{(l)} \right), & 2 \le l \le L, \end{cases}
$$
Therefore
$$
\frac{\partial z_i^{(l+1)}}{\partial z_j^{(l)}} = \theta_{i,j}^{(l)} a_j^{(l)} \left( 1 - a_j^{(l)} \right), \quad 2 \le l \le L-1,
$$
so, by the chain rule,
$$
\begin{aligned}
\delta_j^{(l)} &= \sum_{i=1}^{s_{l+1}} \delta_i^{(l+1)} \frac{\partial z_i^{(l+1)}}{\partial z_j^{(l)}} \\
&= \sum_{i=1}^{s_{l+1}} \delta_i^{(l+1)} \theta_{i,j}^{(l)} a_j^{(l)} \left( 1 - a_j^{(l)} \right) \\
&= \left[ \sum_{i=1}^{s_{l+1}} \theta_{i,j}^{(l)} \delta_i^{(l+1)} \right] \cdot a_j^{(l)} \left( 1 - a_j^{(l)} \right), \quad 2 \le l \le L-1.
\end{aligned}
$$
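
The identity $g'(z) = g(z)\left(1 - g(z)\right)$ used in this step is a standard property of the logistic function. For completeness, a short derivation, writing $g(z) = \dfrac{1}{1 + e^{-z}}$:

$$
g'(z) = \frac{e^{-z}}{\left( 1 + e^{-z} \right)^2} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = g(z) \left( 1 - g(z) \right),
$$

since $1 - g(z) = \dfrac{e^{-z}}{1 + e^{-z}}$.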

Since
$$
\begin{aligned}
\frac{\partial}{\partial a_k^{(L)}} \operatorname{cost}(\theta; X, Y) &= -\left[ y_k \frac{1}{a_k^{(L)}} - (1 - y_k) \frac{1}{1 - a_k^{(L)}} \right] \\
&= -\left( y_k - a_k^{(L)} \right) \frac{1}{a_k^{(L)} \left( 1 - a_k^{(L)} \right)} \\
&= \left( a_k^{(L)} - y_k \right) \frac{1}{a_k^{(L)} \left( 1 - a_k^{(L)} \right)}, \quad 1 \le k \le s_L = K,
\end{aligned}
$$
it follows that
$$
\begin{aligned}
\left( \delta^{(L)} \right)_j &= \frac{\partial}{\partial a_j^{(L)}} \operatorname{cost}(\theta; X, Y) \cdot \frac{\mathrm{d} a_j^{(L)}}{\mathrm{d} z_j^{(L)}} \\
&= \left( a_j^{(L)} - y_j \right) \frac{1}{a_j^{(L)} \left( 1 - a_j^{(L)} \right)} \cdot a_j^{(L)} \left( 1 - a_j^{(L)} \right) \\
&= a_j^{(L)} - y_j, \quad 1 \le j \le s_L.
\end{aligned}
$$
Hence the claim holds.
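
The result can also be checked numerically. The sketch below compares the closed-form $\delta^{(L)}$ and $\delta^{(L-1)}$ against central finite differences of the cost with respect to $Z^{(l)}$, for a small three-layer network with randomly drawn weights. The layer sizes, the random seed, and the helper names `cost_from_z` and `numeric_delta` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cost_from_z(thetas_after, z, y):
    """cost as a function of Z^(l), given the weights theta^(l), ..., theta^(L-1)."""
    a = sigmoid(z)
    for theta in thetas_after:
        a = sigmoid(theta @ a)
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

def numeric_delta(thetas_after, z, y, eps=1e-6):
    """Central-difference estimate of d cost / d Z^(l)."""
    grad = np.zeros_like(z)
    for j in range(z.size):
        step = np.zeros_like(z)
        step[j] = eps
        grad[j] = (cost_from_z(thetas_after, z + step, y)
                   - cost_from_z(thetas_after, z - step, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
theta1 = rng.normal(size=(4, 3))           # theta^(1): s_1 = 3, s_2 = 4
theta2 = rng.normal(size=(2, 4))           # theta^(2): s_3 = K = 2
x = rng.normal(size=3)
y = np.array([1.0, 0.0])

z2 = theta1 @ x
a2 = sigmoid(z2)
z3 = theta2 @ a2
a3 = sigmoid(z3)

delta3 = a3 - y                            # l = L
delta2 = (theta2.T @ delta3) * a2 * (1 - a2)   # l = L - 1

print(np.allclose(delta3, numeric_delta([], z3, y), atol=1e-6))
print(np.allclose(delta2, numeric_delta([theta2], z2, y), atol=1e-6))
```

A finite-difference check like this is slow and only approximate, so it is useful for validating a derivation or an implementation, not as a way to compute gradients in practice.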

Gradient of $\operatorname{cost}(\theta; X, Y)$ with respect to $\theta$

$$
\frac{\partial}{\partial \theta_{i,j}^{(l)}} \operatorname{cost}(\theta; X, Y) = \delta_i^{(l+1)} a_j^{(l)}, \quad 1 \le l \le L-1
$$

Proof

Since $\theta_{i,j}^{(l)}$ enters the cost only through $z_i^{(l+1)}$, and
$$
\frac{\partial z_i^{(l+1)}}{\partial \theta_{i,j}^{(l)}} = a_j^{(l)}, \quad 1 \le l \le L-1,
$$
the chain rule gives
$$
\frac{\partial}{\partial \theta_{i,j}^{(l)}} \operatorname{cost}(\theta; X, Y) = \delta_i^{(l+1)} \frac{\partial z_i^{(l+1)}}{\partial \theta_{i,j}^{(l)}} = \delta_i^{(l+1)} a_j^{(l)}, \quad 1 \le l \le L-1.
$$

Corollary

$$
\frac{\partial}{\partial \theta^{(l)}} \operatorname{cost}(\theta; X, Y) = \delta^{(l+1)} \left( a^{(l)} \right)^{\intercal}, \quad 1 \le l \le L-1
$$
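
In matrix form the single-sample gradient is an outer product, which `np.outer` expresses directly. The following is a minimal sketch under the same assumptions as the formulas (no bias units); the list layout `thetas[l-1]` for $\theta^{(l)}$ and the names `cost` and `cost_gradients` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(thetas, x, y):
    """Single-sample cross-entropy cost(theta; X, Y)."""
    a = x
    for theta in thetas:
        a = sigmoid(theta @ a)
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

def cost_gradients(thetas, x, y):
    """Return d cost / d theta^(l) for l = 1, ..., L-1 as a list of matrices."""
    # Forward pass: activations[k] holds a^(k+1).
    activations = [x]
    for theta in thetas:
        activations.append(sigmoid(theta @ activations[-1]))

    delta = activations[-1] - y                      # delta^(L)
    grads = [None] * len(thetas)
    for idx in range(len(thetas) - 1, -1, -1):       # idx = l - 1
        # d cost / d theta^(l) = delta^(l+1) (a^(l))^T
        grads[idx] = np.outer(delta, activations[idx])
        if idx > 0:
            a = activations[idx]
            delta = (thetas[idx].T @ delta) * a * (1 - a)   # delta^(l)
    return grads
```

Each returned matrix has the same shape as the corresponding $\theta^{(l)}$, namely $s_{l+1} \times s_l$, so it can be subtracted from the weights directly in a gradient-descent step.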

Gradient of the loss function $J(\theta)$ with respect to $\theta$

For all $t \in \mathbb{N}$ with $1 \le t \le m$:
$$a^{(t,1)} = X^{(t)},$$
$$Z^{(t,l+1)} = \theta^{(l)} a^{(t,l)}, \quad 1 \le l \le L-1,$$
$$a^{(t,l)} = g\left( Z^{(t,l)} \right), \quad 2 \le l \le L,$$
$$a^{(t,L)} = h_\theta\left( X^{(t)} \right),$$
$$
\delta^{(t,l)} = \frac{\partial}{\partial Z^{(t,l)}} \operatorname{cost}\left( \theta; X^{(t)}, Y^{(t)} \right) = \begin{pmatrix} \dfrac{\partial}{\partial z_1^{(t,l)}} \operatorname{cost}\left( \theta; X^{(t)}, Y^{(t)} \right) \\ \vdots \\ \dfrac{\partial}{\partial z_{s_l}^{(t,l)}} \operatorname{cost}\left( \theta; X^{(t)}, Y^{(t)} \right) \end{pmatrix}, \quad 2 \le l \le L,
$$
$$
\delta^{(t,l)} = \begin{cases} a^{(t,L)} - Y^{(t)}, & l = L, \\ \left( \theta^{(l)} \right)^{\intercal} \delta^{(t,l+1)} \mathbin{.*} a^{(t,l)} \mathbin{.*} \left( 1 - a^{(t,l)} \right), & 2 \le l \le L-1. \end{cases}
$$
Then
$$
\frac{\partial}{\partial \theta_{i,j}^{(l)}} \operatorname{cost}\left( \theta; X^{(t)}, Y^{(t)} \right) = \delta_i^{(t,l+1)} a_j^{(t,l)}, \quad 1 \le l \le L-1,
$$
and therefore
$$
\begin{aligned}
\frac{\partial}{\partial \theta_{i,j}^{(l)}} J(\theta) &= \frac{1}{m} \sum_{t=1}^{m} \frac{\partial}{\partial \theta_{i,j}^{(l)}} \operatorname{cost}\left( \theta; X^{(t)}, Y^{(t)} \right) + \frac{\lambda}{m} \theta_{i,j}^{(l)} \\
&= \frac{1}{m} \sum_{t=1}^{m} \delta_i^{(t,l+1)} a_j^{(t,l)} + \frac{\lambda}{m} \theta_{i,j}^{(l)}, \quad 1 \le l \le L-1.
\end{aligned}
$$

Corollary

$$
\frac{\partial}{\partial \theta^{(l)}} J(\theta) = \frac{1}{m} \sum_{t=1}^{m} \delta^{(t,l+1)} \left( a^{(t,l)} \right)^{\intercal} + \frac{\lambda}{m} \theta^{(l)}, \quad 1 \le l \le L-1
$$
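
Putting the pieces together, the full gradient of $J(\theta)$ can be computed for all $m$ samples at once. The sketch below is one possible vectorized arrangement, not the only one. It assumes samples are stored as rows of `X` (shape $m \times s_1$) and `Y` (shape $m \times K$), so the sum of outer products $\sum_t \delta^{(t,l+1)} (a^{(t,l)})^{\intercal}$ becomes a single matrix product. The function name `loss_and_gradients` is hypothetical. Following the formulas, every weight is regularized and there are no bias units.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_gradients(thetas, X, Y, lam):
    """Return J(theta) and the list of gradients dJ / d theta^(l)."""
    m = X.shape[0]

    # Forward pass for all samples; row t of activations[k] is a^(t,k+1).
    activations = [X]
    for theta in thetas:
        activations.append(sigmoid(activations[-1] @ theta.T))
    a_L = activations[-1]

    data_term = -np.sum(Y * np.log(a_L) + (1 - Y) * np.log(1 - a_L)) / m
    penalty = lam / (2 * m) * sum(np.sum(th ** 2) for th in thetas)

    delta = a_L - Y                                   # rows are delta^(t,L)
    grads = [None] * len(thetas)
    for idx in range(len(thetas) - 1, -1, -1):        # idx = l - 1
        # delta.T @ a sums the outer products delta^(t,l+1) (a^(t,l))^T over t.
        grads[idx] = delta.T @ activations[idx] / m + lam / m * thetas[idx]
        if idx > 0:
            a = activations[idx]
            # Row t becomes (theta^(l))^T delta^(t,l+1) .* a^(t,l) .* (1 - a^(t,l)).
            delta = (delta @ thetas[idx]) * a * (1 - a)
    return data_term + penalty, grads
```

A plain gradient-descent step with a hypothetical learning rate `alpha` would then be `thetas = [th - alpha * g for th, g in zip(thetas, grads)]`.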
