Differentiating the Logistic Regression Cost Function for Gradient Descent

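The derivative of the logistic regression cost function is usually stated without its derivation being worked out, so I derive it here and record the steps.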

The cost function for logistic regression can be written as a single expression:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$$

where $h_\theta(x^{(i)}) = \frac{1}{1+e^{-\theta^\mathrm{T}x^{(i)}}}$.
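To make the definition concrete, here is a minimal NumPy sketch of the cost function. The function names `sigmoid` and `cost`, the array layout (`X` with one row per sample), and the tiny dataset are my own assumptions for illustration; they are not part of the original derivation.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Cost J(theta). X has shape (m, n), y has shape (m,), theta has shape (n,)."""
    h = sigmoid(X @ theta)  # h_theta(x^(i)) for every sample i
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Made-up data: column 0 is an assumed intercept feature x_0 = 1.
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
theta = np.zeros(2)

# With theta = 0 every h is 0.5, so each term is log(0.5) and J = log(2).
print(cost(theta, X, y))
```

Note that this direct form can overflow or hit `log(0)` for very large `|theta^T x|`; it is meant only to mirror the formula.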

To keep the derivation from becoming long and cluttered, we simplify the notation:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}K(\theta)\right]$$

where

$$K(\theta) = y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)})), \qquad h_\theta(x^{(i)}) = \frac{1}{1+e^{-\theta^\mathrm{T}x^{(i)}}}$$

Now for the derivation. Suppose we want the partial derivative of $J(\theta)$ with respect to one parameter $\theta_j$ (written with a prime below):

(1) Differentiation is linear, so the constant factor $-\frac{1}{m}$ and the summation $\sum_{i=1}^{m}$ can be pulled outside. We then only need to differentiate the expression inside the sum:

$$J(\theta)' = -\frac{1}{m}\left[\sum_{i=1}^{m}K(\theta)'\right]$$

$$K(\theta)' = \left(y\log(h_\theta(x)) + (1-y)\log(1-h_\theta(x))\right)'$$

(For readability, the superscript $(i)$ that marks the i-th sample is dropped for now.)

(2) By the chain rule for the (natural) logarithm, $(\log u)' = \frac{1}{u}u'$, differentiating $K(\theta)$ further gives:

$$K(\theta)' = y\frac{1}{h_\theta(x)}h_\theta(x)' + (1-y)\frac{1}{1-h_\theta(x)}(1-h_\theta(x))'$$

(3) By the chain rule for powers, $(u^{n})' = nu^{n-1}u'$ (here with $n = -1$), together with the rule for the exponential, $(e^{u})' = e^{u}u'$, differentiating $h_\theta(x)$ gives:

$$
\begin{aligned}
h_\theta(x)' &= \left(\frac{1}{1+e^{-\theta^\mathrm{T}x}}\right)' = -\frac{(1+e^{-\theta^\mathrm{T}x})'}{(1+e^{-\theta^\mathrm{T}x})^{2}} = \frac{e^{-\theta^\mathrm{T}x}(\theta^\mathrm{T}x)'}{(1+e^{-\theta^\mathrm{T}x})^{2}} \\
&= \frac{1}{1+e^{-\theta^\mathrm{T}x}}\left(1-\frac{1}{1+e^{-\theta^\mathrm{T}x}}\right)(\theta^\mathrm{T}x)' = h_\theta(x)(1-h_\theta(x))(\theta^\mathrm{T}x)'
\end{aligned}
$$

Similarly,

$$(1-h_\theta(x))' = -\frac{e^{-\theta^\mathrm{T}x}(\theta^\mathrm{T}x)'}{(1+e^{-\theta^\mathrm{T}x})^{2}} = -h_\theta(x)(1-h_\theta(x))(\theta^\mathrm{T}x)'$$

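The identity for the sigmoid derivative can be checked numerically. The sketch below compares a central finite difference against $h(1-h)$, treating $z = \theta^\mathrm{T}x$ as a plain scalar input; the grid of `z` values and the step size `eps` are arbitrary choices of mine.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4.0, 4.0, 9)   # sample points for z = theta^T x
eps = 1e-6                      # finite-difference step (arbitrary choice)

# Central difference approximation of d(sigmoid)/dz.
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
# Closed form from the derivation: h * (1 - h).
analytic = sigmoid(z) * (1 - sigmoid(z))

max_gap = np.max(np.abs(numeric - analytic))
print(max_gap)  # should be tiny if the identity holds
```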
(4) Substituting the results of step 3 into step 2 and simplifying gives:

$$K(\theta)' = (y-h_\theta(x))(\theta^\mathrm{T}x)'$$
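The simplification skips a few lines of algebra. Spelled out, the factors $h_\theta(x)$ and $1-h_\theta(x)$ from step 3 cancel against the denominators from step 2:

$$
\begin{aligned}
K(\theta)' &= y\frac{1}{h_\theta(x)}h_\theta(x)(1-h_\theta(x))(\theta^\mathrm{T}x)' - (1-y)\frac{1}{1-h_\theta(x)}h_\theta(x)(1-h_\theta(x))(\theta^\mathrm{T}x)' \\
&= \left[y(1-h_\theta(x)) - (1-y)h_\theta(x)\right](\theta^\mathrm{T}x)' \\
&= \left[y - yh_\theta(x) - h_\theta(x) + yh_\theta(x)\right](\theta^\mathrm{T}x)' \\
&= (y-h_\theta(x))(\theta^\mathrm{T}x)'
\end{aligned}
$$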

Substituting this back into step 1 and absorbing the minus sign gives:

$$J(\theta)' = \frac{1}{m}\left[\sum_{i=1}^{m}(h_\theta(x)-y)(\theta^\mathrm{T}x)'\right]$$

Finally, since $\theta^\mathrm{T}x = \sum_{k}\theta_k x_k$, differentiating $\theta^\mathrm{T}x$ with respect to the j-th parameter $\theta_j$ gives $x_j$ (where j indexes the feature within a sample). Restoring the sample superscript, the final result is:

$$\frac{\partial J(\theta)}{\partial \theta_{j}} = \frac{1}{m}\left[\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_{j}^{(i)}\right]$$
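As a sanity check, the final formula can be compared with a finite-difference gradient and then used in a plain gradient descent loop. This is only a sketch under my own assumptions: the helper names (`gradient`, `numeric_gradient`), the made-up data, the learning rate `alpha`, and the iteration count are all illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    """Vectorized (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i), for all j at once."""
    m = X.shape[0]
    return X.T @ (sigmoid(X @ theta) - y) / m

def numeric_gradient(theta, X, y, eps=1e-6):
    """Central finite differences, one coordinate of theta at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
    return grad

# Made-up data: column 0 is an assumed intercept feature x_0 = 1.
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0], [1.0, 0.1]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.array([0.3, -0.2])

# Largest disagreement between the derived formula and the numeric estimate.
gap = np.max(np.abs(gradient(theta, X, y) - numeric_gradient(theta, X, y)))

# Gradient descent: theta_j := theta_j - alpha * dJ/dtheta_j.
alpha = 0.1
costs = [cost(theta, X, y)]
for _ in range(100):
    theta = theta - alpha * gradient(theta, X, y)
    costs.append(cost(theta, X, y))

print(gap, costs[0], costs[-1])
```

If the derivation is right, `gap` should be close to zero and the cost after the updates should be lower than at the start.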
