Derivation and Understanding of Logistic Regression (with Clear Formulas)

Preface

Although Logistic Regression (LR) carries "regression" in its name, it is in fact a classification model, most commonly used for binary classification. LR is also a model that interviewers often ask candidates to derive by hand. This article derives the relevant LR formulas from several angles; I hope it helps (to be continued).

Deriving the Model

Starting from the definition of linear regression, where $h_{\theta}(x)$ is the predicted value:
$$h_{\theta}(x)=\sum_{i=0}^{n} \theta_i x_i = \theta^T x$$
The sigmoid function (an S-shaped curve):
$$g(x)=\frac{1}{1+e^{-x}}$$
(Figure: plot of the sigmoid curve.)
Differentiating the sigmoid function gives the following result (we will use it later in the derivation):
$$\begin{aligned} g'(x) &= \left(\frac{1}{1+e^{-x}}\right)' = \frac{e^{-x}}{\left(1+e^{-x}\right)^2} \\ &= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} \\ &= \frac{1}{1+e^{-x}} \cdot \left(1-\frac{1}{1+e^{-x}}\right) \\ &= g(x)\,\big(1-g(x)\big) \end{aligned}$$

Substituting $\theta^T x$ into the sigmoid function:
$$h_{\theta}(x)=g\left(\theta^T x\right)=\frac{1}{1+e^{-\theta^T x}}$$
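As a quick sanity check (a minimal NumPy sketch, not from the original article; the function names are my own), the code below implements the sigmoid and the hypothesis $h_{\theta}(x)=g(\theta^T x)$, and verifies the derivative identity $g'(x)=g(x)(1-g(x))$ against a central finite difference:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x)."""
    return sigmoid(theta @ x)

# Verify g'(z) = g(z) * (1 - g(z)) with a central finite difference.
z, eps = 0.7, 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid(z) * (1 - sigmoid(z))
print(numeric, analytic)  # both ~= 0.2217
```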

Solving the Model

A general binary classification problem can be written as:

$$\begin{aligned} P(y=1 \mid x;\theta) &= h_{\theta}(x) \\ P(y=0 \mid x;\theta) &= 1-h_{\theta}(x) \end{aligned}$$
where $h_{\theta}(x)$ plays the role of the probability $P(y=1 \mid x;\theta)$.

Combining the two expressions into a single formula:
$$p(y \mid x;\theta) = \left(h_{\theta}(x)\right)^y \left(1-h_{\theta}(x)\right)^{1-y}$$

Assuming the samples are independent, the likelihood function is:
$$\begin{aligned} L(\theta) &= p(\vec{y} \mid X;\theta) \\ &= \prod_{i=1}^m p\left(y^{(i)} \mid x^{(i)};\theta\right) \\ &= \prod_{i=1}^m \left(h_{\theta}\left(x^{(i)}\right)\right)^{y^{(i)}} \left(1-h_{\theta}\left(x^{(i)}\right)\right)^{1-y^{(i)}} \end{aligned}$$

Taking the logarithm of both sides:
$$l(\theta)=\log L(\theta)=\sum_{i=1}^m y^{(i)} \log h\left(x^{(i)}\right) + \left(1-y^{(i)}\right)\log\left(1-h\left(x^{(i)}\right)\right)$$
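In code, the log-likelihood is a few lines of NumPy (a sketch under the assumption that `X` is the $m \times n$ design matrix and `y` holds labels in $\{0,1\}$; names are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    """l(theta) = sum_i [ y_i * log h(x_i) + (1 - y_i) * log(1 - h(x_i)) ]."""
    p = sigmoid(X @ theta)  # h_theta(x^(i)) for every sample at once
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```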

Taking the partial derivative (note: with respect to a single component $\theta_j$):
$$\begin{aligned} \frac{\partial l(\theta)}{\partial \theta_j} &= \sum_{i=1}^m \left(\frac{y^{(i)}}{h\left(x^{(i)}\right)} - \frac{1-y^{(i)}}{1-h\left(x^{(i)}\right)}\right) \cdot \frac{\partial h\left(x^{(i)}\right)}{\partial \theta_j} \\ &= \sum_{i=1}^m \left(\frac{y^{(i)}}{g\left(\theta^T x^{(i)}\right)} - \frac{1-y^{(i)}}{1-g\left(\theta^T x^{(i)}\right)}\right) \cdot \frac{\partial g\left(\theta^T x^{(i)}\right)}{\partial \theta_j} \\ &= \sum_{i=1}^m \left(\frac{y^{(i)}}{g\left(\theta^T x^{(i)}\right)} - \frac{1-y^{(i)}}{1-g\left(\theta^T x^{(i)}\right)}\right) \cdot g\left(\theta^T x^{(i)}\right)\left(1-g\left(\theta^T x^{(i)}\right)\right) \cdot \frac{\partial\, \theta^T x^{(i)}}{\partial \theta_j} \\ &= \sum_{i=1}^m \left(y^{(i)}\left(1-g\left(\theta^T x^{(i)}\right)\right) - \left(1-y^{(i)}\right) g\left(\theta^T x^{(i)}\right)\right) \cdot x_j^{(i)} \\ &= \sum_{i=1}^m \left(y^{(i)} - g\left(\theta^T x^{(i)}\right)\right) x_j^{(i)} \end{aligned}$$

Note that differentiating the last factor, $\theta^T x^{(i)}$, leaves only $x_j^{(i)}$.
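Stacking all components $j$, the gradient vectorizes to $\nabla_\theta\, l = X^T\big(y - g(X\theta)\big)$. Below is a small sketch (toy data and names of my own choosing) with a finite-difference check of the derivation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(theta, X, y):
    """Component j is sum_i (y_i - g(theta^T x_i)) * x_ij, done for all j at once."""
    return X.T @ (y - sigmoid(X @ theta))

def ll(theta, X, y):
    p = sigmoid(X @ theta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Compare against a central finite difference on random toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 2, size=5).astype(float)
theta = rng.normal(size=3)
eps = 1e-6
numeric = np.array([(ll(theta + eps * e, X, y) - ll(theta - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(numeric, grad(theta, X, y)))  # True
```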

To find the maximum likelihood estimate we use gradient ascent (equivalently, gradient descent on the negative log-likelihood). The stochastic update for a single sample is:

$$\theta_j := \theta_j + \alpha\left(y^{(i)} - h_{\theta}\left(x^{(i)}\right)\right) x_j^{(i)}$$

Compare the update rules of linear regression (batch form) and logistic regression (the stochastic form derived above):

$$\theta_j := \theta_j + \alpha \sum_{i=1}^m \left(y^{(i)} - h_{\theta}\left(x^{(i)}\right)\right) x_j^{(i)} \qquad \text{(linear regression)}$$

$$\theta_j := \theta_j + \alpha \left(y^{(i)} - h_{\theta}\left(x^{(i)}\right)\right) x_j^{(i)} \qquad \text{(logistic regression)}$$
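Putting the pieces together, here is a minimal batch gradient-ascent trainer (a sketch, not the article's code; the function names, learning rate, and iteration count are my own arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.01, n_iters=1000):
    """Batch gradient ascent: theta_j += alpha * sum_i (y_i - h(x_i)) * x_ij."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta += alpha * (X.T @ (y - sigmoid(X @ theta)))
    return theta

# Toy usage: two Gaussian blobs; a column of ones provides the intercept x_0.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
X = np.hstack([np.ones((100, 1)), X])
y = np.r_[np.zeros(50), np.ones(50)]
theta = fit_logistic(X, y)
acc = ((sigmoid(X @ theta) > 0.5) == y.astype(bool)).mean()
print(f"training accuracy: {acc:.2f}")  # typically ~0.9 on this toy set
```

Swapping the batch sum for a single randomly drawn sample per step turns this into the stochastic rule shown above.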

A Few Thoughts

Observe that the hypothesis functions $h_{\theta}$ differ, yet the learning rule is identical. So what is the difference? Logistic regression assumes the output follows a Bernoulli (binomial) distribution, while linear regression assumes a Gaussian. Along with the Poisson distribution, these three share a common property: they all belong to the exponential family, and the corresponding models are all generalized linear models.

The Loss-Function Perspective

A. Labels taking values in $\{-1, 1\}$

$$y_i \in \{-1, 1\}, \qquad \hat{y}_i = \begin{cases} p_i & y_i = 1 \\ 1-p_i & y_i = -1 \end{cases}$$

We can piece together a single likelihood that covers both cases:

$$L(\theta) = \prod_{i=1}^m p_i^{(y_i+1)/2} \left(1-p_i\right)^{-(y_i-1)/2}$$

Taking the logarithm of both sides:
$$l(\theta) = \sum_{i=1}^m \ln\left[p_i^{(y_i+1)/2}\left(1-p_i\right)^{-(y_i-1)/2}\right]$$
Substituting $p_i=\frac{1}{1+e^{-f_i}}$ (so that $1-p_i=\frac{1}{1+e^{f_i}}$):
$$l(\theta) = \sum_{i=1}^m \ln\left[\left(\frac{1}{1+e^{-f_i}}\right)^{(y_i+1)/2}\left(\frac{1}{1+e^{f_i}}\right)^{-(y_i-1)/2}\right]$$

Maximizing this gives the maximum likelihood estimate; negating it and minimizing instead gives the negative log-likelihood, which is our loss function:

$$\therefore\ \mathrm{loss}\left(y_i, \hat{y}_i\right) = -l(\theta) = \sum_{i=1}^m \left[\frac{1}{2}(y_i+1)\ln\left(1+e^{-f_i}\right) - \frac{1}{2}(y_i-1)\ln\left(1+e^{f_i}\right)\right]$$
Splitting it into the two cases:
$$= \begin{cases} \sum_{i=1}^m \ln\left(1+e^{-f_i}\right) & y_i = 1 \\[4pt] \sum_{i=1}^m \ln\left(1+e^{f_i}\right) & y_i = -1 \end{cases}$$
Observe that in both cases the exponent equals $-y_i f_i$, so the two branches can be merged. The final loss function:

$$\Rightarrow \mathrm{loss}\left(y_i, \hat{y}_i\right) = \sum_{i=1}^m \ln\left(1+e^{-y_i f_i}\right)$$
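A direct translation of this loss into NumPy (a sketch; `np.logaddexp(0, -y * f)` evaluates $\ln(1+e^{-y_i f_i})$ without overflow for large $|f_i|$, which a naive `np.exp` would suffer from):

```python
import numpy as np

def logistic_loss_pm1(y, f):
    """sum_i ln(1 + exp(-y_i * f_i)) for labels y_i in {-1, +1}."""
    return np.sum(np.logaddexp(0.0, -y * f))

y = np.array([1.0, -1.0, 1.0])
f = np.array([2.0, -3.0, -0.5])  # f_i = theta^T x^(i), the linear score
print(logistic_loss_pm1(y, f))
```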

B. If the labels instead take values in $\{0, 1\}$

$$y_i \in \{0, 1\}, \qquad \hat{y}_i = \begin{cases} p_i & y_i = 1 \\ 1-p_i & y_i = 0 \end{cases}$$

Then the loss function is derived as follows:
$$L(\theta) = \prod_{i=1}^m p_i^{y_i}\left(1-p_i\right)^{1-y_i}$$

$$\Rightarrow l(\theta) = \sum_{i=1}^m \ln\left[p_i^{y_i}\left(1-p_i\right)^{1-y_i}\right]$$

Substituting $p_i=\frac{1}{1+e^{-f_i}}$:

$$l(\theta) = \sum_{i=1}^m \ln\left[\left(\frac{1}{1+e^{-f_i}}\right)^{y_i}\left(\frac{1}{1+e^{f_i}}\right)^{1-y_i}\right]$$

$$\therefore\ \mathrm{loss}\left(y_i, \hat{y}_i\right) = -l(\theta) = \sum_{i=1}^m \left[y_i\ln\left(1+e^{-f_i}\right) + (1-y_i)\ln\left(1+e^{f_i}\right)\right]$$
The two formulations mean the same thing; the first is simply more compact.
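A quick numerical check of that claim (a sketch; `f` holds arbitrary scores, and the mapping $y_{\pm 1} = 2y_{01} - 1$ converts between the two label conventions):

```python
import numpy as np

def loss_a(y_pm1, f):   # labels in {-1, +1}
    return np.sum(np.log1p(np.exp(-y_pm1 * f)))

def loss_b(y01, f):     # labels in {0, 1}
    return np.sum(y01 * np.log1p(np.exp(-f)) + (1 - y01) * np.log1p(np.exp(f)))

f = np.array([1.5, -0.3, 0.8, -2.0])
y01 = np.array([1.0, 0.0, 0.0, 1.0])
print(np.isclose(loss_a(2 * y01 - 1, f), loss_b(y01, f)))  # True
```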

Every sample has a label and an estimate, so logistic regression can also be interpreted through cross-entropy.
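Concretely (a supporting equation, not from the original text): for one sample with label $y_i$ and predicted probability $\hat{y}_i = h_{\theta}(x^{(i)})$, the per-sample negative log-likelihood is exactly the cross-entropy between the empirical distribution $(y_i,\ 1-y_i)$ and the predicted distribution $(\hat{y}_i,\ 1-\hat{y}_i)$:

$$H\big((y_i,\ 1-y_i),\ (\hat{y}_i,\ 1-\hat{y}_i)\big) = -\,y_i \ln \hat{y}_i - (1-y_i)\ln\left(1-\hat{y}_i\right)$$

Summing over all samples recovers the same loss derived in case B above.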
