Deriving the Parameter Solution for Logistic Regression

These notes record the derivation of the parameter solution for logistic regression.

Loss function

The linear regression model is $f(x) = wx + b$. To absorb the bias term $b$, let $\theta = [w \quad b]$ and $x = [x \quad 1]^T$, so that $f(x) = \theta x$.

Passing this through the sigmoid function gives the logistic regression model: $y=\sigma(f(x))=\sigma(\theta x)=\frac{1}{1+e^{-\theta x}}$
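The model above can be sketched in a few lines of plain Python. This is only an illustration: the names `sigmoid` and `predict_proba` and the sample values of `theta` and `x_aug` are assumptions, not part of the original notes. The branch on the sign of `z` is a common way to avoid overflow in `math.exp`.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0 here, so exp(z) stays below 1
    return ez / (1.0 + ez)

def predict_proba(theta: list[float], x: list[float]) -> float:
    """p = sigma(theta . x); x is assumed to carry a trailing 1 for the bias."""
    z = sum(t * xj for t, xj in zip(theta, x))
    return sigmoid(z)

theta = [2.0, -1.0]  # hypothetical [w, b]
x_aug = [0.5, 1.0]   # feature 0.5 augmented with a constant 1
p = predict_proba(theta, x_aug)  # z = 2*0.5 - 1 = 0, so p = 0.5
```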

Treat a single sample as an event. With $p = \sigma(\theta x)$ denoting the predicted probability that $y = 1$, the probability of this event is:
$$P(y \mid x)=\begin{cases} p, & y=1 \\ 1-p, & y=0 \end{cases}$$
Writing $p_i = \sigma(\theta x_i)$ for the $i$-th sample, this is equivalent to: $P(y_i \mid x_i)=p_i^{y_i}(1-p_i)^{1-y_i}$

Suppose we collect a dataset of $N$ samples, $\{(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_N, y_N)\}$. Assuming the samples are independent, the probability of the joint event is the product of the individual probabilities:
$$\begin{aligned} P_{total} &= P(y_1 \mid x_1)P(y_2 \mid x_2)P(y_3 \mid x_3) \ldots P(y_N \mid x_N) \\ &= \prod_{i=1}^{N} p_i^{y_i}(1-p_i)^{1-y_i} \\ F(\theta) &= \ln(P_{total}) = \sum_{i=1}^N \ln\left(p_i^{y_i}(1-p_i)^{1-y_i}\right) \\ &= \sum_{i=1}^N \left[ y_i \ln p_i + (1-y_i)\ln(1-p_i) \right] \\ \text{where } p_i &= \frac{1}{1+e^{-\theta x_i}} \end{aligned}$$
Maximizing $F(\theta)$ is the same as minimizing its negative, so to match the usual meaning of a loss function we define:
$$L(\theta) = -F(\theta)$$
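As a rough sketch, the loss $L(\theta)$ can be evaluated directly from its definition. The function name `neg_log_likelihood`, the clipping constant `eps`, and the toy data below are assumptions made for illustration. Clipping is not part of the derivation; it only keeps `math.log` away from zero when a prediction saturates.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def neg_log_likelihood(theta, xs, ys, eps=1e-12):
    """L(theta) = -sum_i [ y_i ln p_i + (1 - y_i) ln(1 - p_i) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
        p = min(max(p, eps), 1.0 - eps)  # assumption: clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Toy data; each x carries a trailing 1 for the bias term.
xs = [[0.5, 1.0], [-1.2, 1.0], [2.0, 1.0]]
ys = [1, 0, 1]

# With theta = 0 every p_i is 0.5, so the loss is N * ln 2.
loss_at_zero = neg_log_likelihood([0.0, 0.0], xs, ys)
```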

Derivation

$$\frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial p} \times \frac{\partial p}{\partial \theta}$$

First compute $\frac{\partial p}{\partial \theta}$:
$$\begin{aligned} p' &= \left(\frac{1}{1+e^{-\theta x}}\right)' \\ &= \frac{-1}{(1+e^{-\theta x})^2} \cdot e^{-\theta x} \cdot (-x) \\ &= \frac{1}{1+e^{-\theta x}} \cdot \frac{e^{-\theta x}}{1+e^{-\theta x}} \cdot x \\ &= p(1-p)x \end{aligned}$$
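One way to sanity-check the identity $p' = p(1-p)x$ is to compare it with a central finite difference. The sketch below does this for scalar $\theta$ and $x$; the helper name `p_of`, the step size `h`, and the test values are arbitrary assumptions.

```python
import math

def p_of(theta: float, x: float) -> float:
    """p = 1 / (1 + e^{-theta * x}) for scalar theta and x."""
    return 1.0 / (1.0 + math.exp(-theta * x))

theta, x = 0.3, 1.7  # arbitrary test point
h = 1e-6             # finite-difference step

# Central difference approximation of dp/dtheta.
numeric = (p_of(theta + h, x) - p_of(theta - h, x)) / (2 * h)

# Closed form from the derivation: p (1 - p) x.
p = p_of(theta, x)
analytic = p * (1.0 - p) * x
```

The two values should agree to many decimal places; a large gap would point to a sign or factor error in the closed form.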

Next compute $\frac{\partial F}{\partial \theta}$, writing $p_i = \sigma(\theta x_i)$ so that $\frac{\partial p_i}{\partial \theta} = p_i(1-p_i)x_i$:
$$\begin{aligned} \nabla F(\theta) &= \nabla \sum_{i=1}^N \left[ y_i \ln p_i + (1-y_i)\ln(1-p_i) \right] \\ &= \sum_{i=1}^N \frac{\partial F}{\partial p_i} \times \frac{\partial p_i}{\partial \theta} \\ &= \sum_{i=1}^N \left( y_i \frac{1}{p_i} + (1-y_i)\frac{-1}{1-p_i} \right) p_i(1-p_i)x_i \\ &= \sum_{i=1}^N \left[ y_i(1-p_i)x_i - (1-y_i)p_i x_i \right] \\ &= \sum_{i=1}^N (y_i-p_i) x_i \end{aligned}$$
Since $L = -F$, it follows that $\frac{\partial L}{\partial \theta} = \sum_{i=1}^N (p_i-y_i)x_i$
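The gradient formula can be checked numerically against the loss it came from. The sketch below is an illustration under assumed names (`loss`, `grad`) and made-up data: it computes $\sum_i (p_i - y_i)x_i$ and compares each component with a central finite difference of $L(\theta)$.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def loss(theta, xs, ys):
    """L(theta) = -sum_i [ y_i ln p_i + (1 - y_i) ln(1 - p_i) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def grad(theta, xs, ys):
    """dL/dtheta = sum_i (p_i - y_i) x_i, accumulated per component."""
    g = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        p = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
        for j, xj in enumerate(x):
            g[j] += (p - y) * xj
    return g

xs = [[0.5, 1.0], [-1.2, 1.0], [2.0, 1.0]]  # toy data with bias column
ys = [1, 0, 1]
theta = [0.4, -0.1]

analytic = grad(theta, xs, ys)

# Central finite differences, one coordinate at a time.
h = 1e-6
numeric = []
for j in range(len(theta)):
    up, down = theta.copy(), theta.copy()
    up[j] += h
    down[j] -= h
    numeric.append((loss(up, xs, ys) - loss(down, xs, ys)) / (2 * h))

max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
```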

Gradient update

Using gradient descent with learning rate $\alpha$, $\theta$ is updated by stepping against the gradient $\frac{\partial L}{\partial \theta}$:
$$\theta := \theta - \alpha \sum_{i=1}^N \left(\frac{1}{1+e^{-\theta x_i}} - y_i\right) x_i$$
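A minimal sketch of this update as batch gradient descent follows. The function names, the learning rate `alpha=0.1`, the step count, and the toy dataset are all assumptions chosen for illustration. The labels deliberately overlap so that the data are not linearly separable and the optimum is finite.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def gradient(theta, xs, ys):
    """dL/dtheta = sum_i (sigma(theta . x_i) - y_i) x_i."""
    g = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        p = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
        for j, xj in enumerate(x):
            g[j] += (p - y) * xj
    return g

def fit(xs, ys, alpha=0.1, steps=2000):
    """Repeat theta := theta - alpha * dL/dtheta, starting from zero."""
    theta = [0.0] * len(xs[0])
    for _ in range(steps):
        g = gradient(theta, xs, ys)
        theta = [t - alpha * gj for t, gj in zip(theta, g)]
    return theta

# Toy data: one feature plus a constant 1 for the bias.
xs = [[-2.0, 1.0], [-1.0, 1.0], [-0.5, 1.0], [0.5, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [0, 0, 1, 0, 1, 1]

theta = fit(xs, ys)
final_grad = gradient(theta, xs, ys)  # should be close to zero at the optimum
```

A fixed step size is used here for simplicity; on other data a smaller `alpha` or a stopping test on the gradient norm may be needed.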


Adapted from: 逻辑回归 logistics regression 公式推导 (a derivation of the logistic regression formulas)
