Logistic Regression (LR) Summary

Copyright notice: This is an original article by the blog author, licensed under the CC 4.0 BY-SA agreement. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/junxing2018_wu/article/details/117520184

Logistic regression

Properties

  • Targets binary classification problems
  • Built on conditional probability
  • Composition of a linear function ($y = w^T x + b$) and the logistic function ($y = \frac{1}{1+e^{-x}}$); see the sketch after this list
  • It is a linear classifier (the decision boundary is what makes LR a linear classifier)
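
To make this composition concrete, here is a minimal NumPy sketch; the names `sigmoid` and `predict_proba` are illustrative choices, not from any particular library:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """p(y=1 | x) = sigmoid(w^T x + b), computed row-wise for a data matrix X."""
    return sigmoid(X @ w + b)

# Tiny example: 3 samples with 2 features each.
X = np.array([[1.0, 2.0], [0.5, -1.0], [-2.0, 0.3]])
w = np.array([0.8, -0.5])
b = 0.1
print(predict_proba(X, w, b))  # three probabilities in (0, 1)
```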

Computation procedure

  1. For the binary classification problem:
    $$p(y=1 \mid x, w) = \frac{1}{1+e^{-(w^T x + b)}}, \qquad p(y=0 \mid x, w) = \frac{e^{-(w^T x + b)}}{1+e^{-(w^T x + b)}}$$
    The two cases can be combined into:
    $$p(y \mid x, w, b) = p(y=1 \mid x, w, b)^{y}\,\bigl[1 - p(y=1 \mid x, w, b)\bigr]^{1-y}$$

  2. Objective function
    Suppose we have a dataset $D = \{(x_i, y_i)\}_{i=1}^{n}$ with $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$.
    We then maximize the likelihood (maximum likelihood estimation, MLE):
    $$\begin{aligned}
    \hat w_{MLE}, \hat b_{MLE} &= \operatorname*{argmax}_{w,b} \prod_{i=1}^{n} p(y_i \mid x_i, w, b) \\
    &= \operatorname*{argmax}_{w,b} \sum_{i=1}^{n} \log p(y_i \mid x_i, w, b) \\
    &= \operatorname*{argmin}_{w,b} -\sum_{i=1}^{n} \log p(y_i \mid x_i, w, b) \\
    &= \operatorname*{argmin}_{w,b} -\sum_{i=1}^{n} \log\Bigl( p(y_i=1 \mid x_i, w, b)^{y_i}\,\bigl[1 - p(y_i=1 \mid x_i, w, b)\bigr]^{1-y_i} \Bigr) \\
    &= \operatorname*{argmin}_{w,b} -\sum_{i=1}^{n} \Bigl( y_i \log p(y_i=1 \mid x_i, w, b) + (1-y_i)\log\bigl(1 - p(y_i=1 \mid x_i, w, b)\bigr) \Bigr) \\
    &= \operatorname*{argmin}_{w,b} -\sum_{i=1}^{n} \Bigl( y_i \log \sigma(w^T x_i + b) + (1-y_i)\log\bigl(1 - \sigma(w^T x_i + b)\bigr) \Bigr)
    \end{aligned}$$
    This gives the loss function
    $$L(w,b) = -\sum_{i=1}^{n} \Bigl( y_i \log \sigma(w^T x_i + b) + (1-y_i)\log\bigl(1 - \sigma(w^T x_i + b)\bigr) \Bigr)$$
    with gradients
    $$\begin{aligned}
    \frac{\partial L(w,b)}{\partial w} &= -\sum_{i=1}^{n} \Bigl( y_i \cdot \frac{\sigma(w^T x_i + b)\,[1-\sigma(w^T x_i + b)]}{\sigma(w^T x_i + b)} \cdot x_i + (1-y_i) \cdot \frac{-\sigma(w^T x_i + b)\,[1-\sigma(w^T x_i + b)]}{1-\sigma(w^T x_i + b)} \cdot x_i \Bigr) \\
    &= -\sum_{i=1}^{n} \Bigl( y_i\,\bigl(1-\sigma(w^T x_i + b)\bigr) + (y_i-1)\,\sigma(w^T x_i + b) \Bigr)\, x_i \\
    &= -\sum_{i=1}^{n} \bigl[ y_i - \sigma(w^T x_i + b) \bigr]\, x_i \\
    &= \sum_{i=1}^{n} \bigl[ \sigma(w^T x_i + b) - y_i \bigr]\, x_i
    \end{aligned}$$
    $$\begin{aligned}
    \frac{\partial L(w,b)}{\partial b} &= -\sum_{i=1}^{n} \Bigl( y_i \cdot \frac{\sigma(w^T x_i + b)\,[1-\sigma(w^T x_i + b)]}{\sigma(w^T x_i + b)} + (1-y_i) \cdot \frac{-\sigma(w^T x_i + b)\,[1-\sigma(w^T x_i + b)]}{1-\sigma(w^T x_i + b)} \Bigr) \\
    &= -\sum_{i=1}^{n} \Bigl( y_i\,\bigl(1-\sigma(w^T x_i + b)\bigr) + (y_i-1)\,\sigma(w^T x_i + b) \Bigr) \\
    &= -\sum_{i=1}^{n} \bigl[ y_i - \sigma(w^T x_i + b) \bigr] \\
    &= \sum_{i=1}^{n} \bigl[ \sigma(w^T x_i + b) - y_i \bigr]
    \end{aligned}$$
    Notes:

    1. $\sigma(x)' = \sigma(x)\,(1-\sigma(x))$
    2. In the final expression, $\sigma(w^T x_i + b)$ is the predicted value and $y_i$ is the true label. Gradient descent therefore keeps comparing each sample's prediction with its true label and uses the difference to update $w$, so that the learned $w$ and $b$ gradually bring the predictions closer to the true labels (a NumPy sketch of this loss and its gradients follows this list).
    3. Note that when the data are linearly separable, the parameters of logistic regression may diverge to infinity (an overfitting phenomenon; a regularization term should be added).
  3. Gradient descent (a training-loop sketch follows this list)

    1. Initialize $w^0$ and $b^0$
    2. Set the number of epochs $m$ and the learning rate $\eta$
    3. Iterate $t$ from $0$ to $m$:
      $$w^{t+1} = w^{t} - \eta \cdot \sum_{i=1}^{n} \bigl[ \sigma((w^{t})^T x_i + b^{t}) - y_i \bigr]\, x_i$$
      $$b^{t+1} = b^{t} - \eta \cdot \sum_{i=1}^{n} \bigl[ \sigma((w^{t})^T x_i + b^{t}) - y_i \bigr]$$
    4. Stopping conditions:
      • $\left| L(w^{t}, b^{t}) - L(w^{t+1}, b^{t+1}) \right| < \epsilon$
      • $\left\| w^{t} - w^{t-1} \right\| < \epsilon$
      • validation data (early stopping)
      • fixed number of iterations (maximum iteration count)
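
Below is a minimal NumPy sketch of the loss $L(w,b)$ and the two gradient expressions derived in step 2; the helper names `loss` and `gradients` are illustrative, not from any particular library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(X, y, w, b):
    """Negative log-likelihood L(w, b) from step 2."""
    p = sigmoid(X @ w + b)                    # predicted p(y=1 | x_i) for every sample
    eps = 1e-12                               # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def gradients(X, y, w, b):
    """dL/dw = sum_i [sigma(w^T x_i + b) - y_i] x_i,  dL/db = sum_i [sigma(w^T x_i + b) - y_i]."""
    err = sigmoid(X @ w + b) - y              # prediction minus true label, shape (n,)
    return X.T @ err, np.sum(err)             # grad_w of shape (d,), scalar grad_b
```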
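And a corresponding batch gradient-descent loop for step 3, reusing the `loss` and `gradients` helpers above and using the loss-change rule from step 4 as the stopping condition (the learning rate, epoch limit, and tolerance are arbitrary illustrative defaults):

```python
def fit(X, y, lr=0.1, max_epochs=1000, tol=1e-6):
    """Batch gradient descent for logistic regression (a sketch, not a tuned implementation)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0                   # step 1: initialize w^0, b^0
    prev = loss(X, y, w, b)
    for t in range(max_epochs):               # step 3: iterate up to the epoch limit
        grad_w, grad_b = gradients(X, y, w, b)
        w = w - lr * grad_w                   # w^{t+1} = w^t - eta * dL/dw
        b = b - lr * grad_b                   # b^{t+1} = b^t - eta * dL/db
        cur = loss(X, y, w, b)
        if abs(prev - cur) < tol:             # step 4: stop when |L_t - L_{t+1}| < epsilon
            break
        prev = cur
    return w, b

# Usage (illustrative): w, b = fit(X, y); probabilities = sigmoid(X @ w + b)
```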