Logistic Regression for Binary Classification

The predicted values form a discrete sequence of 0s and 1s.
To map $\vec{x}$ onto 0 or 1, we model the output with the sigmoid function.
[Figure: graph of the sigmoid function]
Hypothesis function:
$$h(\vec{x}) = \frac{1}{1 + e^{-\vec{\theta}^T \vec{x}}}$$
where:
$$\begin{aligned} \vec{x} &= [x_0, x_1, \dots, x_n]^T \in \mathbb{R}^{(n+1)\times 1} \\ \vec{\theta} &= [\theta_0, \theta_1, \dots, \theta_n]^T \in \mathbb{R}^{(n+1)\times 1} \end{aligned}$$
($n$ is the number of features)
That is, we look for parameters $\vec{\theta}$ such that, as far as possible, $h(\vec{x}) \rightarrow 0$ when $y = 0$ and $h(\vec{x}) \rightarrow 1$ when $y = 1$.
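As a concrete reference, here is a minimal NumPy sketch of this hypothesis function. The names `sigmoid` and `hypothesis` are mine, and I assume the design matrix `X` already includes the bias column $x_0 = 1$:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)): maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h(x) = sigmoid(theta^T x), evaluated for every row of X.

    X     : (m, n+1) design matrix whose first column is all ones (x_0 = 1)
    theta : (n+1,) parameter vector
    Returns an (m,) vector of estimated probabilities that y = 1.
    """
    return sigmoid(X @ theta)
```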
h ( x ⃗ ) h(\vec x) h(x )视为 h ( x ⃗ ) = 1 h(\vec x)=1 h(x )=1的概率,则 h ( x ⃗ ) h(\vec x) h(x )预测正确的概率为:
$$p = h(\vec{x})^y (1 - h(\vec{x}))^{(1-y)}$$
y = 0 y=0 y=0时, h ( x ⃗ ) h(\vec x) h(x )预测正确的概率(即 h ( x ⃗ ) = 0 h(\vec x)=0 h(x )=0)为 1 − h ( x ⃗ ) 1-h(\vec x) 1h(x )
y = 1 y=1 y=1时, h ( x ⃗ ) h(\vec x) h(x )预测正确的概率(即 h ( x ⃗ ) = 1 h(\vec x)=1 h(x )=1)为 h ( x ⃗ ) h(\vec x) h(x )
To maximize the probability of predicting correctly on all $m$ training samples simultaneously, we require:
$$\begin{aligned} \max_{\vec{\theta}} l(\vec{\theta}) &= \max_{\vec{\theta}} \left( p^{(1)} \cdot p^{(2)} \cdots p^{(m)} \right) \\ &= \max_{\vec{\theta}} \prod_{i=1}^{m} h(\vec{x}^{(i)})^{y^{(i)}} \left(1 - h(\vec{x}^{(i)})\right)^{(1 - y^{(i)})} \end{aligned}$$
Taking the logarithm of both sides:
$$\begin{aligned} \max_{\vec{\theta}} L(\vec{\theta}) &= \max_{\vec{\theta}} \ln(l(\vec{\theta})) \\ &= \max_{\vec{\theta}} \sum_{i=1}^{m} y^{(i)} \ln\left(h(\vec{x}^{(i)})\right) + (1 - y^{(i)}) \ln\left(1 - h(\vec{x}^{(i)})\right) \end{aligned}$$
So define the cost function $J(\vec{\theta}) = -\frac{1}{m} L(\vec{\theta})$; the problem then becomes finding the $\vec{\theta}$ that minimizes $J(\vec{\theta})$.
Hence the cost function:
$$J(\vec{\theta}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \ln\left(h(\vec{x}^{(i)})\right) + (1 - y^{(i)}) \ln\left(1 - h(\vec{x}^{(i)})\right) \right]$$
where:
$$\vec{y} = [y^{(1)}, y^{(2)}, \dots, y^{(m)}]^T \in \mathbb{R}^{m \times 1}, \quad y^{(i)} \in \{0, 1\}$$
($m$ is the number of training samples)
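A sketch of this cost in NumPy, reusing the `hypothesis` defined above; the small `eps` clipping is my addition to keep $\ln(0)$ from occurring, not part of the formula:

```python
def cost(theta, X, y, eps=1e-12):
    """J(theta) = -(1/m) * sum_i [ y_i*ln(h_i) + (1 - y_i)*ln(1 - h_i) ]."""
    m = len(y)
    h = np.clip(hypothesis(theta, X), eps, 1.0 - eps)  # keep log() finite
    return -np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h)) / m
```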
The cost function can also be read as follows:
y = 0 y=0 y=0时, h ( x ⃗ ) = 1 h(\vec x)=1 h(x )=1的代价趋于无穷, h ( x ⃗ ) = 0 h(\vec x)=0 h(x )=0的代价为零。
y = 1 y=1 y=1时, h ( x ⃗ ) = 0 h(\vec x)=0 h(x )=0的代价趋于无穷, h ( x ⃗ ) = 1 h(\vec x)=1 h(x )=1的代价为零。
Gradient descent:
$$\theta_j := \theta_j - \alpha \frac{\partial J(\vec{\theta})}{\partial \theta_j}$$
$$\begin{aligned} \frac{\partial J(\vec{\theta})}{\partial \theta_j} &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\frac{(h(\vec{x}^{(i)}))'}{h(\vec{x}^{(i)})} + (1-y^{(i)})\frac{-(h(\vec{x}^{(i)}))'}{1-h(\vec{x}^{(i)})} \right) \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( \frac{y^{(i)}}{h(\vec{x}^{(i)})} - \frac{1-y^{(i)}}{1-h(\vec{x}^{(i)})} \right)(h(\vec{x}^{(i)}))' \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( \frac{(1+e^{-\vec{\theta}^T\vec{x}^{(i)}})(y^{(i)}e^{-\vec{\theta}^T\vec{x}^{(i)}}+y^{(i)}-1)}{e^{-\vec{\theta}^T\vec{x}^{(i)}}} \right)\left( \frac{e^{-\vec{\theta}^T\vec{x}^{(i)}}x_j^{(i)}}{(1+e^{-\vec{\theta}^T\vec{x}^{(i)}})^2} \right) \\ &= \frac{1}{m}\sum_{i=1}^{m}\frac{x_j^{(i)}-x_j^{(i)}y^{(i)}(1+e^{-\vec{\theta}^T\vec{x}^{(i)}})}{1+e^{-\vec{\theta}^T\vec{x}^{(i)}}} \\ &= \frac{1}{m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)x_j^{(i)} \end{aligned}$$
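Putting the pieces together: a minimal batch gradient-descent loop built on the final line of the derivation. The learning rate `alpha`, iteration count `n_iters`, and the toy data are assumed values for illustration, and the functions reuse `hypothesis` from above:

```python
def gradient(theta, X, y):
    """dJ/dtheta_j = (1/m) * sum_i (h(x_i) - y_i) * x_ij, vectorized over j."""
    return X.T @ (hypothesis(theta, X) - y) / len(y)

def gradient_descent(X, y, alpha=0.1, n_iters=10000):
    """Repeat theta_j := theta_j - alpha * dJ/dtheta_j for n_iters updates."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta -= alpha * gradient(theta, X, y)
    return theta

# Toy usage: four 1-D samples plus a bias column of ones.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print(hypothesis(theta, X))  # probabilities should move toward [0, 0, 1, 1]
```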
