Logistic Regression and Its Mathematical Derivation

This post only discusses the binary classification case.

1. Logistic Regression

$$
\begin{aligned}
& P(Y=1 \mid X=x) = \frac{e^{w^Tx}}{1+e^{w^Tx}} = h(x) \\
& P(Y=0 \mid X=x) = \frac{1}{1+e^{w^Tx}} = 1-h(x) \\
& \log \frac{P(Y=1 \mid X=x)}{P(Y=0 \mid X=x)} = w^Tx
\end{aligned}
$$
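A minimal NumPy sketch of the model above; the helper names `sigmoid` and `predict_proba` are mine, not from the original:

```python
import numpy as np

def sigmoid(z):
    # h(x) for z = w^T x:  e^z / (1 + e^z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, x):
    # P(Y=1 | X=x) = h(x);  P(Y=0 | X=x) = 1 - h(x)
    return sigmoid(np.dot(w, x))
```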

2. Parameter Estimation (Maximum Likelihood)

Likelihood function:
$$l(w) = \prod_{i=1}^{n} h(x_i)^{y_i}\,(1-h(x_i))^{1-y_i}$$
Log-likelihood function:
$$
\begin{aligned}
L(w) &= \sum_{i=1}^{n} \big( y_i \log h(x_i) + (1-y_i)\log(1-h(x_i)) \big) \\
&= \sum_{i=1}^{n} \big( y_i w^Tx_i - y_i\log(1+e^{w^Tx_i}) + (y_i-1)\log(1+e^{w^Tx_i}) \big) \\
&= \sum_{i=1}^{n} \big( y_i w^Tx_i - \log(1+e^{w^Tx_i}) \big)
\end{aligned}
$$
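As a sketch of the simplified form $\sum_i \big(y_i w^Tx_i - \log(1+e^{w^Tx_i})\big)$, assuming an $n \times d$ data matrix `X` and a 0/1 label vector `y` (names of my choosing):

```python
import numpy as np

def log_likelihood(w, X, y):
    # L(w) = sum_i ( y_i * w^T x_i - log(1 + exp(w^T x_i)) )
    z = X @ w                                     # all w^T x_i at once
    return np.sum(y * z - np.logaddexp(0.0, z))   # logaddexp(0, z) = log(1 + e^z)
```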
It can be shown that $L(w)$ is concave in $w$ and therefore has a maximum. Proof: consider a single summand
$$f(w) = y\,w^Tx - \log(1+e^{w^Tx})$$
$$\frac{\partial f(w)}{\partial w} = yx - \frac{e^{w^Tx}}{1+e^{w^Tx}}\,x$$

$$
\begin{aligned}
\frac{\partial^2 f(w)}{\partial w\,\partial w^T} &= -\frac{x\,e^{w^Tx}\,x^T}{(1+e^{w^Tx})^2} \\
&= -\frac{e^{w^Tx}}{(1+e^{w^Tx})^2}\,xx^T
\end{aligned}
$$

For any nonzero vector $z$, $z^T(xx^T)z = (z^Tx)(z^Tx)^T \ge 0$, and since $\frac{e^{w^Tx}}{(1+e^{w^Tx})^2} > 0$, the Hessian $\frac{\partial^2 f(w)}{\partial w\,\partial w^T}$ is negative semi-definite. Hence $f(w)$ is concave in $w$, so $L(w)$ (a sum of concave terms) is concave and attains a maximum.
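A quick numerical sanity check of this sign argument, with made-up data (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)
w = rng.normal(size=5)

c = np.exp(w @ x) / (1.0 + np.exp(w @ x)) ** 2  # the strictly positive scalar factor
H = -c * np.outer(x, x)                         # Hessian of f(w)

print(np.linalg.eigvalsh(H).max() <= 1e-12)     # True: all eigenvalues <= 0 (NSD)
```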
Differentiating the log-likelihood with respect to the vector $w$ gives:

$$
\begin{aligned}
\frac{\partial L(w)}{\partial w} &= \sum_{i=1}^{n} \Big( y_i x_i - \frac{e^{w^Tx_i}}{1+e^{w^Tx_i}}\,x_i \Big) \\
&= \sum_{i=1}^{n} \Big( y_i - \frac{e^{w^Tx_i}}{1+e^{w^Tx_i}} \Big) x_i \\
&= \sum_{i=1}^{n} (y_i - h(x_i))\,x_i
\end{aligned}
$$
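This gradient is what all of the update rules below reuse; a vectorized sketch over the same assumed `X`, `y` (and the `sigmoid` helper from earlier):

```python
def gradient(w, X, y):
    # dL/dw = sum_i (y_i - h(x_i)) x_i
    return X.T @ (y - sigmoid(X @ w))
```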

Solving with BGD (batch gradient):

Note that $w$ here is a vector, and $\lambda$ is the learning rate (step size).
$$w = w + \lambda \sum_{i=1}^{n} (y_i - h(x_i))\,x_i$$
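A minimal sketch of this batch update as a loop (gradient ascent on $L(w)$; `lam` and `n_iter` are illustrative values, and the `sigmoid` helper from earlier is assumed):

```python
import numpy as np

def fit_bgd(X, y, lam=0.01, n_iter=1000):
    # Every step uses the gradient summed over all n samples.
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = w + lam * X.T @ (y - sigmoid(X @ w))
    return w
```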

Solving with SGD (stochastic gradient):

$$w = w + \lambda\,(y_i - h(x_i))\,x_i$$
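A corresponding SGD sketch under the same assumptions, processing one randomly ordered sample per step:

```python
import numpy as np

def fit_sgd(X, y, lam=0.01, n_epochs=10):
    # Each step uses a single sample (x_i, y_i).
    n, d = X.shape
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            w = w + lam * (y[i] - sigmoid(X[i] @ w)) * X[i]
    return w
```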

Solving with MBGD (mini-batch gradient):

Assume each update uses $b$ samples.
$$\text{for } i = 1,\ 1+b,\ 1+2b,\ \ldots$$
$$w = w + \lambda \sum_{k=i}^{i+b-1} (y_k - h(x_k))\,x_k$$
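And a mini-batch sketch of the loop above, again with illustrative names and defaults:

```python
import numpy as np

def fit_mbgd(X, y, lam=0.01, b=32, n_epochs=10):
    # Each step uses b consecutive samples (the last batch may be smaller).
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        for i in range(0, n, b):
            Xb, yb = X[i:i+b], y[i:i+b]
            w = w + lam * Xb.T @ (yb - sigmoid(Xb @ w))
    return w
```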
