Machine Learning (3): Logistic Regression (Binary Classification)

3. Logistic Regression (Binary Classification)

Logistic regression solves a binary classification problem, so we need to map the prediction onto $\{0,1\}$. To do this, we transform the value $\theta^Tx$ as follows:
$$h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}$$
As $\theta^Tx$ approaches $+\infty$, $h_\theta(x)$ approaches 1; as $\theta^Tx$ approaches $-\infty$, $h_\theta(x)$ approaches 0. The class probabilities can then be written as:
$$\begin{aligned} P(y=1\mid x;\theta)&=h_\theta(x)\\ P(y=0\mid x;\theta)&=1-h_\theta(x) \end{aligned}$$
or, combined into a single expression:
$$P(y\mid x;\theta)=(h_\theta(x))^{y}(1-h_\theta(x))^{1-y}$$
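As a quick numerical illustration, here is a minimal NumPy sketch of the hypothesis (the parameter and feature values are made up for demonstration): the sigmoid squashes any $\theta^Tx$ into $(0,1)$, which we then read as $P(y=1\mid x;\theta)$.

```python
import numpy as np

def sigmoid(z):
    """Map any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = sigmoid(theta^T x), read as P(y=1 | x; theta)."""
    return sigmoid(theta @ x)

theta = np.array([0.5, -1.2, 2.0])  # made-up parameters
x = np.array([1.0, 0.3, 0.8])       # made-up features (leading 1 is the intercept term)

p1 = h(theta, x)   # P(y=1 | x; theta)
p0 = 1.0 - p1      # P(y=0 | x; theta)
print(p1, p0)

# As theta^T x grows large positive/negative, the output approaches 1/0.
print(sigmoid(50.0), sigmoid(-50.0))
```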
We now estimate $\theta$ by maximum likelihood (the product of these conditional probabilities over all training examples), and take the logarithm of the likelihood to turn the product into a sum:
$$\begin{aligned} l(\theta)&=\ln L(\theta)\\ &=\sum^m_{i=1}\left\{y^{(i)}\log[h(x^{(i)})]+(1-y^{(i)})\log[1-h(x^{(i)})]\right\} \end{aligned}$$
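The log-likelihood translates directly into code. Below is a sketch under the same notation: `X` stacks one example per row, `y` holds the 0/1 labels (both invented here), and the small `eps` guard against $\log 0$ is a practical addition, not part of the formula.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    """l(theta) = sum_i y_i*log(h_i) + (1 - y_i)*log(1 - h_i)."""
    h = sigmoid(X @ theta)  # h(x^(i)) for all examples at once
    eps = 1e-12             # numerical guard against log(0)
    return np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

# Tiny made-up dataset: 4 examples, 3 features (first column is the intercept).
X = np.array([[1.0, 0.2, 1.5],
              [1.0, 1.1, 0.4],
              [1.0, 0.9, 2.2],
              [1.0, 0.1, 0.3]])
y = np.array([1.0, 0.0, 1.0, 0.0])
theta = np.zeros(3)

print(log_likelihood(theta, X, y))  # at theta = 0, every h_i = 0.5, so l = 4*log(0.5)
```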
To maximize the likelihood we use gradient ascent (the gradient direction is the direction of fastest increase, just as its opposite is the direction of fastest decrease):
$$\theta_j=\theta_j+\alpha\frac{\partial l(\theta)}{\partial \theta_j}$$
So we need the gradient of $l(\theta)$ with respect to $\theta$.

First, note the derivative of the hypothesis, which both derivations below rely on:
$$\begin{aligned} \frac{\partial h(x^{(i)})}{\partial \theta_j}&=\frac{e^{-\theta^Tx^{(i)}}}{(1+e^{-\theta^Tx^{(i)}})^2}x^{(i)}_j\\ &=h(x^{(i)})[1-h(x^{(i)})]x^{(i)}_j \end{aligned}$$
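This is the familiar identity $\sigma'(z)=\sigma(z)(1-\sigma(z))$ combined with the chain rule, and it is easy to verify with a throwaway finite-difference check (all values below are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.4, -0.7, 1.3])  # arbitrary point
x = np.array([1.0, 2.0, -0.5])      # arbitrary example
j, eps = 1, 1e-6                    # coordinate to check, step size

# Analytic derivative from the identity: h * (1 - h) * x_j
hval = sigmoid(theta @ x)
analytic = hval * (1 - hval) * x[j]

# Central finite difference: perturb theta_j and re-evaluate h
tp, tm = theta.copy(), theta.copy()
tp[j] += eps
tm[j] -= eps
numeric = (sigmoid(tp @ x) - sigmoid(tm @ x)) / (2 * eps)

print(analytic, numeric)  # the two values agree closely
```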

  • Element-wise differentiation
    $$\begin{aligned} \frac{\partial l(\theta)}{\partial \theta_j} &=\sum^m_{i=1}\frac{\partial }{\partial \theta_j}\left\{y^{(i)}\log[h(x^{(i)})]+(1-y^{(i)})\log[1-h(x^{(i)})]\right\}\\ &=\sum^m_{i=1}\left[\left(\frac{y^{(i)}}{h(x^{(i)})}-\frac{1-y^{(i)}}{1-h(x^{(i)})}\right)\frac{\partial h(x^{(i)})}{\partial \theta_j}\right]\\ &=\sum^m_{i=1}\left[\left(y^{(i)}(1-h(x^{(i)}))-(1-y^{(i)})h(x^{(i)})\right)x^{(i)}_j\right]\\ &=\sum^m_{i=1}\left[\left(y^{(i)}-h(x^{(i)})\right)x^{(i)}_j\right] \end{aligned}$$

  • Matrix differentiation

    Let:
    $$X=\begin{bmatrix} —(x^{(1)})^T—\\ —(x^{(2)})^T—\\ \vdots\\ —(x^{(m)})^T— \end{bmatrix},\quad \theta=\begin{bmatrix} \theta_0\\ \theta_1\\ \vdots\\ \theta_n \end{bmatrix},\quad y=\begin{bmatrix} y^{(1)}\\ y^{(2)}\\ \vdots\\ y^{(m)} \end{bmatrix}$$
    Then we have:
    $$h_{\theta}(x)=\frac{1}{1+e^{-X\theta}}$$
    so $l(\theta)$ can be written as:
    $$\begin{aligned} l(\theta)&=y^T\log[h_{\theta}(x)]+(1-y)^T\log[1-h_{\theta}(x)]\\ &=(y-1)^TX\theta-\mathbf{1}^T\log(1+e^{-X\theta}) \end{aligned}$$
    (the second line uses $\log[h_{\theta}(x)]=-\log(1+e^{-X\theta})$ and $\log[1-h_{\theta}(x)]=-X\theta-\log(1+e^{-X\theta})$).
    Let $l_1=(y-1)^TX\theta$ and $l_2=\mathbf{1}^T\log(1+e^{-X\theta})$. The differential is then:
    $$d(l)=d(l_1)-d(l_2)$$
    For the first term:
    $$d(l_1)=(y-1)^TXd(\theta)$$
    Next we compute $d(l_2)$. Let $w=1+e^{a}$ and $a=-X\theta$:
    $$\begin{aligned} d(l_2)&=\mathrm{tr}\left[\mathbf{1}^Td[\log(w)]\right]\\ &=\mathrm{tr}\left[\mathbf{1}^T\left(\frac{1}{w}\odot d(w)\right)\right]\\ &=\mathrm{tr}\left[\left(\mathbf{1}\odot\frac{1}{w}\right)^T d(w)\right]\\ &=\mathrm{tr}\left[\left(\frac{1}{w}\right)^T d(w)\right]=\mathrm{tr}\left[\left(\frac{\partial l_2}{\partial w}\right)^T d(w)\right] \end{aligned}$$
    from which we can read off
    $$\frac{\partial l_2}{\partial w}=\frac{1}{w}$$
    Furthermore, since $d(w)=e^a\odot d(a)$:
    $$\begin{aligned} d(l_2)&=\mathrm{tr}\left[\left(\frac{\partial l_2}{\partial w}\right)^T d(w)\right]\\ &=\mathrm{tr}\left[\left(\frac{\partial l_2}{\partial w}\right)^T\left(e^a\odot d(a)\right)\right]\\ &=\mathrm{tr}\left[\left(\frac{\partial l_2}{\partial w}\odot e^a\right)^T d(a)\right]=\mathrm{tr}\left[\left(\frac{\partial l_2}{\partial a}\right)^T d(a)\right] \end{aligned}$$
    so we obtain
    $$\frac{\partial l_2}{\partial a}=\frac{\partial l_2}{\partial w}\odot e^a=\frac{e^a}{w}$$
    And since $d(a)=d(-X\theta)=-Xd(\theta)$:
    $$\begin{aligned} d(l_2)&=\mathrm{tr}\left[\left(\frac{\partial l_2}{\partial a}\right)^T d(a)\right]\\ &=\mathrm{tr}\left[\left(\frac{\partial l_2}{\partial a}\right)^T (-X)d(\theta)\right] \end{aligned}$$
    Substituting $a=-X\theta$ and $w=1+e^{-X\theta}$ back in:
    $$d(l_2)=-\left(\frac{e^{-X\theta}}{1+e^{-X\theta}}\right)^TXd(\theta)$$
    Therefore, using $\frac{e^{-X\theta}}{1+e^{-X\theta}}=1-\frac{1}{1+e^{-X\theta}}$:
    $$\begin{aligned} d(l)=d(l_1)-d(l_2)&=(y-1)^TXd(\theta)+\left(\frac{e^{-X\theta}}{1+e^{-X\theta}}\right)^TXd(\theta)\\ &=\mathrm{tr}\left[\left(y-\frac{1}{1+e^{-X\theta}}\right)^TXd(\theta)\right]=\mathrm{tr}\left[\left(\frac{\partial l}{\partial \theta}\right)^Td(\theta)\right] \end{aligned}$$
    Finally, we obtain the vectorized gradient:
    $$\frac{\partial l}{\partial \theta}=X^T\left(y-\frac{1}{1+e^{-X\theta}}\right)=X^T\left[y-h_{\theta}(x)\right]$$
    which matches the element-wise result $\sum^m_{i=1}(y^{(i)}-h(x^{(i)}))x^{(i)}_j$ obtained above.
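To close the loop, here is a sketch that puts both derivations to work (the dataset, learning rate, and iteration count are all made up): it computes the gradient element-wise per the first derivation, vectorized as $X^T(y-h_\theta)$ per the second, confirms they agree, and runs gradient ascent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_elementwise(theta, X, y):
    """Per-example sum: grad_j = sum_i (y_i - h(x_i)) * x_ij."""
    g = np.zeros_like(theta)
    for xi, yi in zip(X, y):
        g += (yi - sigmoid(theta @ xi)) * xi
    return g

def grad_matrix(theta, X, y):
    """Vectorized form: X^T (y - sigmoid(X theta))."""
    return X.T @ (y - sigmoid(X @ theta))

# Tiny made-up dataset (first column is the intercept feature).
X = np.array([[1.0, 0.2, 1.5],
              [1.0, 1.1, 0.4],
              [1.0, 0.9, 2.2],
              [1.0, 0.1, 0.3]])
y = np.array([1.0, 0.0, 1.0, 0.0])
theta = np.zeros(3)

# The two derivations give the same gradient.
assert np.allclose(grad_elementwise(theta, X, y), grad_matrix(theta, X, y))

# Gradient ascent: theta <- theta + alpha * dl/dtheta (we are maximizing l).
alpha = 0.1
for _ in range(1000):
    theta += alpha * grad_matrix(theta, X, y)

print(theta)               # learned parameters
print(sigmoid(X @ theta))  # fitted P(y=1 | x) for each example
```

Most libraries phrase the same computation as minimizing the negative log-likelihood (cross-entropy); the gradient is identical up to sign.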
