title: Introduction to Logistic Regression
mathjax: true
categories: ML
Original article: https://fainke.com
The linear regression model is: $g(x)=\omega_{0}+\omega_{1} x_{1}$
The logistic regression model is: $f(x)=\frac{1}{1+e^{-g(x)}}$ (it wraps the linear regression model $g(x)$ inside the sigmoid).
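A minimal numerical sketch of the two formulas above (NumPy is assumed, and the weights and input are made up purely for illustration):

```python
import numpy as np

def g(x, w):
    """Linear part: g(x) = w0 + w1*x1 + ... + wn*xn (w0 is the intercept)."""
    return w[0] + np.dot(w[1:], x)

def f(x, w):
    """Logistic model: f(x) = 1 / (1 + exp(-g(x))), a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-g(x, w)))

# Made-up weights and input, purely for illustration
w = np.array([-1.0, 2.0])   # w0, w1
x = np.array([0.8])         # x1
print(f(x, w))              # ~0.646, read as P(y = 1 | x)
```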
Given $x$, the probability that $y=1$ occurs is: $P(y=1 \mid x)=\pi(x)=\frac{1}{1+e^{-g(x)}}$
Given $x$, the probability that $y=1$ does not occur is: $P(y=0 \mid x)=1-P(y=1 \mid x)=1-\frac{1}{1+e^{-g(x)}}=\frac{e^{-g(x)}}{1+e^{-g(x)}}=\frac{1}{1+e^{g(x)}}$
The ratio of the probability that the event occurs to the probability that it does not (the odds) is: $\frac{P(y=1 \mid x)}{P(y=0 \mid x)}=\frac{p}{1-p}=e^{g(x)}$
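For example, if $P(y=1 \mid x)=0.8$, the odds are $\frac{0.8}{1-0.8}=4$: the event is four times as likely to occur as not.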
We will now work with these odds.
Let the linear function be $g(x)=w_{0}+w_{1} x_{1}+\ldots+w_{n} x_{n}$.
Taking the logarithm of the odds gives: $\ln \left(\frac{p}{1-p}\right)=g(x)=w_{0}+w_{1} x_{1}+\ldots+w_{n} x_{n}$
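A quick sanity check of this identity (a sketch assuming NumPy; the coefficients and inputs are arbitrary): the log-odds of the model's output recovers $g(x)$ exactly.

```python
import numpy as np

w = np.array([-1.0, 2.0, 0.5])      # arbitrary w0, w1, w2
x = np.array([0.8, -1.2])           # arbitrary x1, x2

g_x = w[0] + np.dot(w[1:], x)       # g(x) = w0 + w1*x1 + w2*x2
p = 1.0 / (1.0 + np.exp(-g_x))      # p = P(y = 1 | x)
log_odds = np.log(p / (1.0 - p))    # ln(p / (1 - p))

print(np.isclose(log_odds, g_x))    # True: the log-odds equals g(x)
```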
Suppose there are $m$ mutually independent observations with values $y_{1}, y_{2}, \dots, y_{m}$. Let $p_{i}=P\left(y_{i}=1 \mid x_{i}\right)$ be the probability of $y_{i}=1$ under the given conditions; then the probability of $y_{i}=0$ is $P\left(y_{i}=0 \mid x_{i}\right)=1-p_{i}$. The probability of a single observation is therefore $P\left(y_{i}\right)=\frac{p_{i}^{y_{i}}}{\left(1-p_{i}\right)^{y_{i}-1}}=p_{i}^{y_{i}}\left(1-p_{i}\right)^{1-y_{i}}$. Because the observations are mutually independent, their joint distribution is the product of the marginal distributions.
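For instance, when $y_{i}=1$ this single-observation probability reduces to $p_{i}$, and when $y_{i}=0$ it reduces to $1-p_{i}$, so one formula covers both cases.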
This gives the likelihood function: $L(w)=\prod_{i=1}^{m}\left(\pi\left(x_{i}\right)\right)^{y_{i}}\left(1-\pi\left(x_{i}\right)\right)^{1-y_{i}}$
The likelihood function here is the joint probability of all the observed outcomes; we look for the parameter values $w\left(w_{0}, w_{1}, \dots, w_{n}\right)$, $n+1$ parameters in total, that maximize it.
Taking the logarithm of $L(w)$ gives: $\ln L(w)=\sum_{i=1}^{m}\left(y_{i} \ln \left[\pi\left(x_{i}\right)\right]+\left(1-y_{i}\right) \ln \left[1-\pi\left(x_{i}\right)\right]\right)$
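A small sketch that evaluates this log-likelihood on a toy dataset (NumPy assumed; the data and weights below are made up purely for illustration):

```python
import numpy as np

def log_likelihood(w, X, y):
    """ln L(w) = sum_i [ y_i * ln(pi_i) + (1 - y_i) * ln(1 - pi_i) ]."""
    g = w[0] + X @ w[1:]              # g(x_i) for every sample
    pi = 1.0 / (1.0 + np.exp(-g))     # pi(x_i) = P(y_i = 1 | x_i)
    return np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))

# Toy data: m = 4 samples, n = 2 features, binary labels
X = np.array([[0.5, 1.0],
              [1.5, -0.5],
              [-1.0, 0.2],
              [2.0, 1.0]])
y = np.array([1, 1, 0, 1])
w = np.array([0.1, 0.8, -0.3])        # w0, w1, w2
print(log_likelihood(w, X, y))        # larger (closer to 0) means a better fit
```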
Next, differentiate with respect to each of the parameters $w$, which yields $n+1$ equations. Take the derivative with respect to $w_{k}$ as an example:
$\left(y_{i} \ln \left[\pi\left(x_{i}\right)\right]+\left(1-y_{i}\right) \ln \left[1-\pi\left(x_{i}\right)\right]\right)^{\prime}$
$=\frac{y_{i}}{\pi\left(x_{i}\right)} \cdot\left[\pi\left(x_{i}\right)\right]^{\prime}+\left(1-y_{i}\right) \cdot \frac{-\left[\pi\left(x_{i}\right)\right]^{\prime}}{1-\pi\left(x_{i}\right)}$
$=\left[\frac{y_{i}}{\pi\left(x_{i}\right)}-\frac{1-y_{i}}{1-\pi\left(x_{i}\right)}\right] \cdot\left[\pi\left(x_{i}\right)\right]^{\prime}$
Since $\pi\left(x_{i}\right)$ is the sigmoid of $g\left(x_{i}\right)$, its derivative is $\left[\pi\left(x_{i}\right)\right]^{\prime}=\pi\left(x_{i}\right)\left(1-\pi\left(x_{i}\right)\right) g^{\prime}\left(x_{i}\right)$, and differentiating $g$ with respect to $w_{k}$ gives $g^{\prime}\left(x_{i}\right)=x_{i k}$, so the expression becomes
$=\left(y_{i}-\pi\left(x_{i}\right)\right) g^{\prime}\left(x_{i}\right)$
$=x_{i k}\left[y_{i}-\pi\left(x_{i}\right)\right]$
This yields: $\frac{\partial \ln L(w)}{\partial w_{k}}=\sum_{i=1}^{m} x_{i k}\left[y_{i}-\pi\left(x_{i}\right)\right]=0$. The log-likelihood attains its maximum where the gradient is zero, and solving these equations gives the optimal $w_{k}$.
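These equations have no closed-form solution in general, so $w$ is usually found iteratively. The sketch below (NumPy assumed; the learning rate, iteration count and toy data are illustrative choices, not from the article) climbs the log-likelihood by gradient ascent using exactly the gradient derived above, $\sum_{i} x_{i k}\left[y_{i}-\pi\left(x_{i}\right)\right]$:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Maximize ln L(w) by gradient ascent; returns w = (w0, w1, ..., wn)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])    # prepend a 1s column for w0
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(Xb @ w)))          # pi(x_i) for every sample
        grad = Xb.T @ (y - pi)                        # d lnL / d w_k = sum_i x_ik (y_i - pi_i)
        w += lr * grad / len(y)                       # step uphill on the log-likelihood
    return w

# Toy usage with made-up one-dimensional data
X = np.array([[0.5], [1.5], [-1.0], [2.0], [-2.0], [0.2]])
y = np.array([1, 1, 0, 1, 0, 0])
w = fit_logistic(X, y)
print(w)    # fitted (w0, w1); P(y=1|x) = 1 / (1 + exp(-(w0 + w1*x)))
```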