Cost function of Logistic Regression and Neural Network

Logistic / Sigmoid function

$$g(x) = \frac{1}{1+e^{-x}} = \frac{e^x}{1+e^x}$$
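As a quick sanity check, both equivalent forms of the sigmoid can be implemented directly (a minimal NumPy sketch; the function names are my own):

```python
import numpy as np

def sigmoid(x):
    """Logistic function g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_alt(x):
    """Equivalent form g(x) = exp(x) / (1 + exp(x))."""
    ex = np.exp(x)
    return ex / (1.0 + ex)

# Both forms agree, and g(0) = 0.5.
print(sigmoid(0.0))  # 0.5
print(np.allclose(sigmoid(np.linspace(-5, 5, 11)),
                  sigmoid_alt(np.linspace(-5, 5, 11))))  # True
```

The first form is preferable for large positive `x`, the second for large negative `x`, since each avoids overflow in `exp` on its respective side.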

Cost function

Logistic Regression

$$h_\theta(X) = g(X^\top\theta) = P(y=1 \mid X;\theta)$$
Let $z = X^\top\theta$. For a label $y \in \{0,1\}$,
$$\ln P(y \mid X;\theta) = y \ln P(y=1 \mid X;\theta) + (1-y)\ln P(y=0 \mid X;\theta)$$
$$= y \ln h_\theta(X) + (1-y)\ln[1-h_\theta(X)]$$
$$= y \ln g(z) + (1-y)\ln[1-g(z)]$$
Therefore, using $g'(z) = g(z)[1-g(z)]$,
$$\mathrm{d}\ln P(y \mid X;\theta) = y\,\mathrm{d}\ln g(z) + (1-y)\,\mathrm{d}\ln[1-g(z)]$$
$$= y \cdot \frac{1}{g(z)}\,g(z)[1-g(z)]\,\mathrm{d}z + (1-y)\cdot\frac{1}{1-g(z)}\,(-1)\,g(z)[1-g(z)]\,\mathrm{d}z$$
$$= \bigl\{ y\,[1-g(z)] - (1-y)\,g(z) \bigr\}\,\mathrm{d}z$$
$$= [y - g(z)]\,\mathrm{d}z$$
$$= [y - g(X^\top\theta)]\,X^\top\mathrm{d}\theta$$
The log-likelihood over $m$ samples is
$$L(\theta) = \ln\Bigl[\prod_{i=1}^m P(y_i \mid X_i;\theta)\Bigr] = \sum_{i=1}^m \ln P(y_i \mid X_i;\theta)$$
$$\operatorname{cost}(\theta) = -\frac{1}{m}L(\theta) = -\frac{1}{m}\sum_{i=1}^m \ln P(y_i \mid X_i;\theta)$$
$$= -\frac{1}{m}\sum_{i=1}^m \bigl\{ y_i \ln h_\theta(X_i) + (1-y_i)\ln[1-h_\theta(X_i)] \bigr\}$$
$$= -\frac{1}{m}\sum_{i=1}^m \bigl\{ y_i \ln g(z_i) + (1-y_i)\ln[1-g(z_i)] \bigr\}, \quad \text{where } z_i = X_i^\top\theta$$
Maximizing the likelihood is equivalent to minimizing the cost: $\max L(\theta) = -m \min \operatorname{cost}(\theta)$.
$\operatorname{cost}(\theta)$ is the cost function.
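This cost can be evaluated directly from the formula above (a minimal sketch with my own variable names; `X` stacks the samples $X_i^\top$ as rows):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Negative mean log-likelihood:
    -(1/m) * sum( y_i*ln g(z_i) + (1-y_i)*ln[1-g(z_i)] ), with z_i = X_i^T theta."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Tiny example: with theta = 0 we get h = 0.5 everywhere, so cost = ln 2.
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
print(np.isclose(cost(np.zeros(2), X, y), np.log(2)))  # True
```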
To avoid clashing with the sigmoid $g$, write $J(\theta) = -L(\theta)$ for the negative log-likelihood. Then
$$\mathrm{d}J(\theta) = -\sum_{i=1}^m [y_i - g(X_i^\top\theta)]\,X_i^\top\mathrm{d}\theta = \sum_{i=1}^m [g(X_i^\top\theta) - y_i]\,X_i^\top\mathrm{d}\theta$$
Therefore
$$\nabla J(\theta) = \sum_{i=1}^m [g(X_i^\top\theta) - y_i]\,X_i = X^\top[g(X\theta) - y]$$
where
$$X = \begin{pmatrix} X_1^\top \\ \vdots \\ X_m^\top \end{pmatrix},\quad y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix},\quad g(X\theta) = \begin{pmatrix} g(X_1^\top\theta) \\ \vdots \\ g(X_m^\top\theta) \end{pmatrix},$$
with $g$ applied elementwise. Differentiating again,
$$\mathrm{d}[\nabla J(\theta)] = \sum_{i=1}^m \mathrm{d}[g(X_i^\top\theta)]\,X_i = \sum_{i=1}^m g'(X_i^\top\theta)\,(X_i^\top\mathrm{d}\theta)\,X_i = \sum_{i=1}^m g'(X_i^\top\theta)\,X_i X_i^\top\,\mathrm{d}\theta$$
Therefore the Hessian is
$$H_J(\theta) = \sum_{i=1}^m g'(X_i^\top\theta)\,X_i X_i^\top$$

Componentwise,
$$\frac{\partial}{\partial\theta_j} J(\theta) = \sum_{i=1}^m [g(X_i^\top\theta) - y_i]\,x_{ij}, \quad j \in \mathbb{N},\ 0 \le j \le n$$
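The gradient and Hessian of the negative log-likelihood translate directly into vectorized NumPy (a sketch under my own naming; `X` has rows $X_i^\top$, and $g'(z) = g(z)[1-g(z)]$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y):
    """Gradient of the negative log-likelihood: X^T [g(X theta) - y]."""
    return X.T @ (sigmoid(X @ theta) - y)

def hessian(theta, X):
    """Hessian: sum_i g'(X_i^T theta) X_i X_i^T, with g' = g(1-g)."""
    g = sigmoid(X @ theta)
    # Scale each row X_i by g'(z_i), then contract with X.
    return (X * (g * (1 - g))[:, None]).T @ X

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
theta = np.zeros(2)
print(gradient(theta, X, y))  # equals X^T (0.5 - y)
print(hessian(theta, X))      # equals 0.25 * X^T X, since g'(0) = 0.25
```

At $\theta = 0$ every $g'(z_i) = 1/4$, so the Hessian reduces to $\tfrac14 X^\top X$, which makes a convenient unit check.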

Regularized Logistic Regression

$$\operatorname{cost}(\theta) = -\frac{1}{m}\sum_{i=1}^m \bigl\{ y_i \ln h_\theta(X_i) + (1-y_i)\ln[1-h_\theta(X_i)] \bigr\} + \frac{\lambda}{2n}\sum_{j=1}^n \theta_j^2$$

Since the data term carries the $\frac{1}{m}$ average and the second derivative of $\frac{\lambda}{2n}\theta_j^2$ is $\frac{\lambda}{n}$, the Hessian is
$$H_{\operatorname{cost}}(\theta) = \frac{1}{m}\sum_{i=1}^m g'(X_i^\top\theta)\,X_i X_i^\top + \frac{\lambda}{n}\begin{pmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}$$

Property

$H_{\operatorname{cost}}(\theta)$ is a positive definite matrix (for $\lambda > 0$).

Proof

For any $Z = \begin{pmatrix} z_0 \\ \vdots \\ z_n \end{pmatrix} \in \mathbb{R}^{n+1}$,
$$Z^\top H_{\operatorname{cost}}(\theta)\,Z = \frac{1}{m}\sum_{i=1}^m g'(X_i^\top\theta)\,Z^\top X_i X_i^\top Z + \frac{\lambda}{n}\sum_{j=1}^n z_j^2$$
$$= \frac{1}{m}\sum_{i=1}^m g'(X_i^\top\theta)\,(X_i^\top Z)^2 + \frac{\lambda}{n}\sum_{j=1}^n z_j^2 \ge 0,$$
since $g'(z) = g(z)[1-g(z)] > 0$. If $Z^\top H_{\operatorname{cost}}(\theta)\,Z = 0$, every nonnegative summand must vanish, so $z_j = 0$ for all $j \in \mathbb{N}$ with $1 \le j \le n$.
Then, because each sample has the bias feature $x_{i0} = 1$, we get $X_i^\top Z = z_0$, and
$$Z^\top H_{\operatorname{cost}}(\theta)\,Z = \frac{1}{m}\sum_{i=1}^m g'(X_i^\top\theta)\,z_0^2 = 0 \;\Rightarrow\; z_0 = 0$$
Hence $Z = 0$.
Therefore $H_{\operatorname{cost}}(\theta)$ is a positive definite matrix.
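The positive definiteness can be checked numerically by computing the eigenvalues of the regularized Hessian (a sketch with my own naming; the regularization contributes $\lambda/n$ to each non-bias diagonal entry, per the second derivative of $\frac{\lambda}{2n}\theta_j^2$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_hessian(theta, X, lam):
    """H_cost = (1/m) sum_i g'(X_i^T theta) X_i X_i^T + (lam/n) diag(0, 1, ..., 1)."""
    m, n_plus_1 = X.shape
    n = n_plus_1 - 1
    g = sigmoid(X @ theta)
    H = (X * (g * (1 - g))[:, None]).T @ X / m
    reg = np.eye(n_plus_1) * (lam / n)
    reg[0, 0] = 0.0  # the bias term theta_0 is not regularized
    return H + reg

rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])  # x_i0 = 1
theta = rng.normal(size=4)
H = regularized_hessian(theta, X, lam=0.1)
print(np.all(np.linalg.eigvalsh(H) > 0))  # True: H is positive definite
```

`eigvalsh` is appropriate here because the Hessian is symmetric by construction.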

Neural Network for Classification

$$\operatorname{cost}(\theta) = -\frac{1}{m}\sum_{i=1}^m\sum_{k=1}^K \bigl\{ y_{ik}\,\bigl(\ln h_\theta(X_i)\bigr)_k + (1-y_{ik})\,\bigl(\ln[1-h_\theta(X_i)]\bigr)_k \bigr\}$$
$$\qquad + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_{l+1}}\sum_{j=1}^{s_l} \bigl(\theta_{ij}^{(l)}\bigr)^2$$
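The same cross-entropy structure, summed over the $K$ output units plus the weight-decay term, can be written out as follows (a sketch with my own naming; I assume the network's outputs $h_\theta(X_i)$ are given as an $m \times K$ matrix of probabilities, and that each layer's weight matrix carries its bias weights in the first column, which the $j \ge 1$ sum excludes from regularization):

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    """H: (m, K) output activations h_theta(X_i); Y: (m, K) one-hot labels;
    Thetas: list of weight matrices Theta^(l) of shape (s_{l+1}, s_l + 1),
    whose first column (bias weights) is excluded from regularization."""
    m = Y.shape[0]
    data = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    reg = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return data + reg

# Tiny check: uniform outputs H = 0.5 with K = 2 give a data term of 2 ln 2.
Y = np.array([[1.0, 0.0], [0.0, 1.0]])
H = np.full((2, 2), 0.5)
Thetas = [np.ones((2, 3))]
print(np.isclose(nn_cost(H, Y, Thetas, lam=0.0), 2 * np.log(2)))  # True
```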
