Category: logistic regression

The Logistic Regression Model

The logistic distribution:
x: a continuous random variable
CDF: $F(x)=\frac{1}{1+\exp\left(-\frac{x-\mu}{\gamma}\right)}$
Density: $f(x)=\frac{\exp\left(-\frac{x-\mu}{\gamma}\right)}{\gamma\left(1+\exp\left(-\frac{x-\mu}{\gamma}\right)\right)^2}$
$f(x)$ is symmetric about $\mu$
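As a quick illustration (not part of the original post), the two formulas above translate directly into code; the function names `logistic_cdf` and `logistic_pdf` are my own:

```python
import math

def logistic_cdf(x, mu=0.0, gamma=1.0):
    # F(x) = 1 / (1 + exp(-(x - mu) / gamma))
    return 1.0 / (1.0 + math.exp(-(x - mu) / gamma))

def logistic_pdf(x, mu=0.0, gamma=1.0):
    # f(x) = exp(-(x - mu)/gamma) / (gamma * (1 + exp(-(x - mu)/gamma))^2)
    z = math.exp(-(x - mu) / gamma)
    return z / (gamma * (1.0 + z) ** 2)
```

Evaluating at the location parameter gives $F(\mu)=1/2$, and the density takes equal values at $\mu\pm t$, matching the symmetry claim.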

The model

Input: x
Output: a class label Y
For binary classification:
$p(Y=1|x)=\frac{\exp(x^T\beta)}{1+\exp(x^T\beta)}$
$p(Y=0|x)=\frac{1}{1+\exp(x^T\beta)}$
Model analysis:
As $x^T\beta\rightarrow\infty$, $p(Y=1|x)\rightarrow 1$
As $x^T\beta\rightarrow-\infty$, $p(Y=0|x)\rightarrow 1$
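A minimal sketch of $p(Y=1|x)$, assuming `x` and `beta` are plain Python lists; the helper name `p_y1` is mine:

```python
import math

def p_y1(x, beta):
    # p(Y=1|x) = exp(x^T beta) / (1 + exp(x^T beta)) -- the sigmoid of x^T beta
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    if xb >= 0:  # branch keeps exp() from overflowing for either sign of x^T beta
        return 1.0 / (1.0 + math.exp(-xb))
    e = math.exp(xb)
    return e / (1.0 + e)
```

Large positive $x^T\beta$ pushes the probability toward 1 and large negative values push it toward 0, as noted above.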

As a generalized linear model:
The odds:
$\frac{p(Y=1|x)}{p(Y=0|x)}=\exp(x^T\beta)$

The log-odds (logit):
$\log\frac{p(Y=1|x)}{p(Y=0|x)}=x^T\beta$

Model estimation

Observations:
For the $i$-th subject, $(x_i, y_i)$

Notation:
$p(x_i,\beta)=p(Y=1|X=x_i)$

Maximum likelihood estimation:
The $y_i$ are independent Bernoulli observations.

$L(\beta)=\prod_{i=1}^n p_i^{y_i}(1-p_i)^{1-y_i}$
$l(\beta)=\sum_{i=1}^n y_i\log p_i+(1-y_i)\log(1-p_i)=\sum_{i=1}^n y_i\log p(x_i,\beta)+(1-y_i)\log(1-p(x_i,\beta))=\sum_{i=1}^n y_i x_i^T\beta-\log(1+\exp(x_i^T\beta))$
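To sanity-check the simplification, here is a sketch that computes the log-likelihood both in the direct Bernoulli form and in the simplified form; both function names are my own:

```python
import math

def log_likelihood(X, y, beta):
    # simplified form: sum_i y_i * x_i^T beta - log(1 + exp(x_i^T beta))
    total = 0.0
    for xi, yi in zip(X, y):
        xb = sum(a * b for a, b in zip(xi, beta))
        total += yi * xb - math.log(1.0 + math.exp(xb))
    return total

def log_likelihood_direct(X, y, beta):
    # unsimplified form: sum_i y_i log p_i + (1 - y_i) log(1 - p_i)
    total = 0.0
    for xi, yi in zip(X, y):
        xb = sum(a * b for a, b in zip(xi, beta))
        p = 1.0 / (1.0 + math.exp(-xb))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total
```

The two versions agree to floating-point precision on any dataset, which is exactly what the algebra above asserts.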

Differentiating the log-likelihood:

$\frac{\partial l(\beta)}{\partial \beta}=\sum_{i=1}^n x_i\left(y_i-p(x_i,\beta)\right)$

Algorithm (Newton's method):
$\beta^{new}=\beta^{old}-\left(\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T}\right)^{-1}\frac{\partial l(\beta)}{\partial\beta}$

$\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T}=-\sum_{i=1}^n x_i x_i^T\, p(x_i,\beta)(1-p(x_i,\beta))$
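A small sketch of the score and Hessian formulas above (the names `score` and `hessian` are mine, and `_p` is an assumed helper for $p(x_i,\beta)$):

```python
import math

def _p(xi, beta):
    # p(x_i, beta) = 1 / (1 + exp(-x_i^T beta))
    xb = sum(a * b for a, b in zip(xi, beta))
    return 1.0 / (1.0 + math.exp(-xb))

def score(X, y, beta):
    # dl/dbeta = sum_i x_i * (y_i - p(x_i, beta))
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        r = yi - _p(xi, beta)
        for j, xij in enumerate(xi):
            g[j] += xij * r
    return g

def hessian(X, beta):
    # d^2 l / (dbeta dbeta^T) = -sum_i x_i x_i^T p_i (1 - p_i)
    d = len(beta)
    H = [[0.0] * d for _ in range(d)]
    for xi in X:
        p = _p(xi, beta)
        w = p * (1.0 - p)
        for j in range(d):
            for k in range(d):
                H[j][k] -= xi[j] * xi[k] * w
    return H
```

At $\beta=0$ every $p_i=1/2$ and $p_i(1-p_i)=1/4$, which makes both quantities easy to check by hand.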

Rewriting these quantities in matrix form:
$P=(p(x_1,\beta),\cdots,p(x_n,\beta))^T$
$W=\mathrm{diag}\big(p(x_1,\beta)(1-p(x_1,\beta)),\cdots,p(x_n,\beta)(1-p(x_n,\beta))\big)$

$\frac{\partial l(\beta)}{\partial \beta}=\sum_{i=1}^n x_i(y_i-p(x_i,\beta))=X^T(Y-P)$
$\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T}=-X^TWX$

If we write each $x_i$ as a column vector, then $X^T=(x_1,\cdots,x_n)$.

$\beta^{new}=\beta^{old}+(X^TWX)^{-1}X^T(Y-P)=(X^TWX)^{-1}X^TW\big(X\beta^{old}+W^{-1}(Y-P)\big)=(X^TWX)^{-1}X^TWZ$
$Z=X\beta^{old}+W^{-1}(Y-P)$
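The update above can be sketched as a short loop; this is an illustrative NumPy implementation under my own naming (`irls_logistic`), not the original author's code:

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-10):
    """Fit binary logistic regression by iteratively reweighted least squares.
    X: (n, d) design matrix (include a column of ones for an intercept);
    y: (n,) array of 0/1 labels."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))        # the vector P
        w = p * (1.0 - p)                      # diagonal entries of W
        z = eta + (y - p) / w                  # working response Z = X beta + W^{-1}(Y - P)
        XtW = X.T * w                          # X^T W without forming the diagonal matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)   # (X^T W X)^{-1} X^T W Z
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Each pass is a weighted least-squares solve against the working response $Z$, which is where the algorithm's name comes from; at the fitted $\hat\beta$ the score $X^T(Y-P)$ is numerically zero.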

This algorithm is known as iteratively reweighted least squares (IRLS).

Comments:
(stated here without proof)
1. $\hat{\beta}$ is asymptotically $N(\beta,(X^TWX)^{-1})$
2. Likelihood ratio test:
$LR=-2\max_{\beta_0} l(\beta_0,\beta_1=0)+2\max_{\beta_0,\beta_1} l(\beta_0,\beta_1)=DEV_0-DEV_1$
(twice the maximized log-likelihood of the full model minus that of the reduced model)
Under the null hypothesis, $LR$ follows $\chi^2(\text{number of parameters in }\beta_1)$
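As an illustration of the deviance bookkeeping (a sketch, not from the post; the closed-form intercept-only fit is a standard fact, and the function names are mine):

```python
import math

def deviance(X, y, beta):
    # DEV = -2 * l(beta), with l(beta) = sum_i y_i x_i^T beta - log(1 + exp(x_i^T beta))
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(a * b for a, b in zip(xi, beta))
        ll += yi * xb - math.log(1.0 + math.exp(xb))
    return -2.0 * ll

def fit_intercept_only(y):
    # the intercept-only MLE has a closed form: beta_0 = log(ybar / (1 - ybar))
    ybar = sum(y) / len(y)
    return math.log(ybar / (1.0 - ybar))
```

The statistic $LR = DEV_0 - DEV_1$ is then compared against a $\chi^2$ quantile; for one extra parameter at the 5% level the critical value is about 3.841.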

Multinomial logistic regression

Multi-class problem: $Y\in\{1,\cdots,K\}$

$p(Y=k|x)=\frac{\exp(x^T\beta_k)}{1+\sum_{j=1}^{K-1}\exp(x^T\beta_j)}$
for $k=1,2,\cdots,K-1$
The probability of the last class:
$p(Y=K|x)=1-\sum_{k=1}^{K-1}p(Y=k|x)=\frac{1}{1+\sum_{j=1}^{K-1}\exp(x^T\beta_j)}$
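A sketch of the class probabilities, assuming `betas` holds the $K-1$ coefficient vectors $\beta_1,\dots,\beta_{K-1}$ (the function name is mine):

```python
import math

def multinomial_probs(x, betas):
    """Return [p(Y=1|x), ..., p(Y=K|x)] for multinomial logistic regression.
    betas: list of K-1 coefficient vectors; class K is the reference class."""
    scores = [math.exp(sum(a * b for a, b in zip(x, bk))) for bk in betas]
    denom = 1.0 + sum(scores)                  # shared normalizer
    probs = [s / denom for s in scores]        # classes 1 .. K-1
    probs.append(1.0 / denom)                  # class K takes the remaining mass
    return probs
```

The probabilities always sum to 1, and with all coefficients zero every class is equally likely.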
