Logistic distribution
Let $X$ be a continuous random variable. $X$ follows the logistic distribution if its distribution function and density function are, respectively:

$$
\begin{aligned}
F(x) &= P(X \leqslant x)=\frac{1}{1+\mathrm{e}^{-(x-\mu) / \gamma}} \\
f(x) &= F^{\prime}(x)=\frac{\mathrm{e}^{-(x-\mu) / \gamma}}{\gamma\left(1+\mathrm{e}^{-(x-\mu) / \gamma}\right)^{2}}
\end{aligned}
$$

where $\mu$ is the location parameter and $\gamma>0$ is the shape parameter.
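The two formulas can be checked numerically. The following is a minimal sketch in plain Python; the function names `logistic_cdf` and `logistic_pdf` are illustrative and do not come from the original text. It relies on the identity $f(x)=F(x)\,(1-F(x))/\gamma$, which follows directly from the two definitions above.

```python
import math

def logistic_cdf(x, mu=0.0, gamma=1.0):
    """F(x) = 1 / (1 + exp(-(x - mu) / gamma))."""
    z = (x - mu) / gamma
    # Branch on the sign of z so exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logistic_pdf(x, mu=0.0, gamma=1.0):
    """f(x) = F'(x), written as F(x) * (1 - F(x)) / gamma."""
    F = logistic_cdf(x, mu, gamma)
    return F * (1.0 - F) / gamma

if __name__ == "__main__":
    # At x = mu the CDF is 1/2 and the density is 1 / (4 * gamma).
    print(logistic_cdf(2.0, mu=2.0, gamma=3.0))
    print(logistic_pdf(2.0, mu=2.0, gamma=3.0))
```

The distribution function is symmetric about the point $(\mu, 1/2)$, and a smaller $\gamma$ makes the curve rise more steeply near $\mu$.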
Definition of logistic regression
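Logistic regression aims to learn a 0/1 classification model from features. The model takes a linear combination of the features as its argument. Because this linear combination ranges over $(-\infty,+\infty)$, the logistic function (also called the sigmoid function) is used to map it into $(0,1)$, and the mapped value is interpreted as the probability that $y=1$.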
The logistic regression model:

$$
\begin{aligned}
P(Y=1 \mid x) &= \frac{\exp (w \cdot x+b)}{1+\exp (w \cdot x+b)} \\
P(Y=0 \mid x) &= \frac{1}{1+\exp (w \cdot x+b)}
\end{aligned}
$$
Here $x \in \mathbf{R}^{n}$ is the input, $Y \in\{0,1\}$ is the output, and $w \in \mathbf{R}^{n}$ and $b \in \mathbf{R}$ are the parameters: $w$ is called the weight vector, $b$ the bias, and $w \cdot x$ is the inner product of $w$ and $x$.
Extend the weight vector with the bias and the input vector with a constant 1: $w=\left(w^{(1)}, w^{(2)}, \cdots, w^{(n)}, b\right)^{\mathrm{T}}$, $x=\left(x^{(1)}, x^{(2)}, \cdots, x^{(n)}, 1\right)^{\mathrm{T}}$. In this extended notation the inner product $w \cdot x$ already includes the bias term.
The logistic regression model then becomes:

$$
\begin{aligned}
P(Y=1 \mid x) &= \frac{\exp (w \cdot x)}{1+\exp (w \cdot x)} \\
P(Y=0 \mid x) &= \frac{1}{1+\exp (w \cdot x)}
\end{aligned}
$$
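As an illustration of the extended notation, here is a minimal sketch that appends the constant 1 to a feature vector and evaluates both class probabilities. The helper names `augment` and `predict_proba` and the example weights are assumptions made for this sketch, not part of the original text.

```python
import math

def augment(features):
    """Append the constant 1 so the bias becomes the last weight."""
    return list(features) + [1.0]

def predict_proba(w, x):
    """Return (P(Y=1|x), P(Y=0|x)) for extended vectors w and x."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    # exp(z) / (1 + exp(z)) evaluated without overflow for large |z|.
    if z >= 0:
        p1 = 1.0 / (1.0 + math.exp(-z))
    else:
        ez = math.exp(z)
        p1 = ez / (1.0 + ez)
    return p1, 1.0 - p1

if __name__ == "__main__":
    w = [1.0, -1.0, 0.5]       # hypothetical weights; the last entry is the bias b
    x = augment([2.0, 1.0])    # becomes [2.0, 1.0, 1.0]
    print(predict_proba(w, x))
```

A common decision rule, not stated in the text above, is to predict $Y=1$ when $P(Y=1 \mid x) \geqslant 0.5$, which is the same as $w \cdot x \geqslant 0$.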
The odds of an event are the ratio of the probability $p$ that it occurs to the probability $1-p$ that it does not:

$$
\frac{p}{1-p}
$$
The log odds of the event, also called the logit function, is: $\operatorname{logit}(p)=\log \frac{p}{1-p}$
For logistic regression: $\log \frac{P(Y=1 \mid x)}{1-P(Y=1 \mid x)}=w \cdot x$

That is, in the logistic regression model the log odds of the output $Y=1$ is a linear function of the input $x$.
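The linearity of the log odds can be confirmed numerically: applying the logit function to the model probability should return $w \cdot x$. The sketch below uses hypothetical weights and inputs chosen only for illustration.

```python
import math

def sigmoid(z):
    """exp(z) / (1 + exp(z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

if __name__ == "__main__":
    w = [0.8, -1.2, 0.3]  # hypothetical extended weights; the last entry is the bias
    for features in ([0.0, 0.0], [1.0, 2.0], [-3.0, 0.5]):
        x = features + [1.0]
        z = sum(wj * xj for wj, xj in zip(w, x))
        # The two printed numbers should agree up to rounding error.
        print(z, logit(sigmoid(z)))
```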
Model parameter estimation
Given a training set $T=\left\{\left(x_{1}, y_{1}\right),\left(x_{2}, y_{2}\right), \cdots,\left(x_{N}, y_{N}\right)\right\}$, where $x_{i} \in \mathbf{R}^{n}$ is the input and $y_{i} \in\{0,1\}$ is the output.
Estimate the model parameters by maximum likelihood. Let:

$$
P(Y=1 \mid x)=\pi(x), \quad P(Y=0 \mid x)=1-\pi(x), \quad \pi(x)=\frac{\exp (w \cdot x)}{1+\exp (w \cdot x)}
$$
The likelihood function is:

$$
\prod_{i=1}^{N}\left[\pi\left(x_{i}\right)\right]^{y_{i}}\left[1-\pi\left(x_{i}\right)\right]^{1-y_{i}}
$$
The log-likelihood function is:

$$
\begin{aligned}
L(w) &=\sum_{i=1}^{N}\left[y_{i} \log \pi\left(x_{i}\right)+\left(1-y_{i}\right) \log \left(1-\pi\left(x_{i}\right)\right)\right] \\
&=\sum_{i=1}^{N}\left[y_{i} \log \frac{\pi\left(x_{i}\right)}{1-\pi\left(x_{i}\right)}+\log \left(1-\pi\left(x_{i}\right)\right)\right] \\
&=\sum_{i=1}^{N}\left[y_{i}\left(w \cdot x_{i}\right)-\log \left(1+\exp \left(w \cdot x_{i}\right)\right)\right]
\end{aligned}
$$

The last step uses $\log \frac{\pi\left(x_{i}\right)}{1-\pi\left(x_{i}\right)}=w \cdot x_{i}$ and $1-\pi\left(x_{i}\right)=\frac{1}{1+\exp \left(w \cdot x_{i}\right)}$.
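The equality between the first and the last form of $L(w)$ can be checked on a small example. This is a minimal sketch; the function names and the toy data are assumptions for illustration. The term $\log(1+\exp(z))$ is evaluated as $\max(z,0)+\log(1+\exp(-|z|))$, a standard rewriting that avoids overflow for large $z$.

```python
import math

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def log1p_exp(z):
    """log(1 + exp(z)) computed without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def log_likelihood(w, X, y):
    """Last form: sum_i [ y_i (w . x_i) - log(1 + exp(w . x_i)) ]."""
    return sum(y_i * dot(w, x_i) - log1p_exp(dot(w, x_i)) for x_i, y_i in zip(X, y))

def log_likelihood_direct(w, X, y):
    """First form: sum_i [ y_i log pi(x_i) + (1 - y_i) log(1 - pi(x_i)) ]."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        pi = 1.0 / (1.0 + math.exp(-dot(w, x_i)))
        total += y_i * math.log(pi) + (1 - y_i) * math.log(1.0 - pi)
    return total

if __name__ == "__main__":
    # Toy data in extended form (last component is the constant 1).
    X = [[1.0, 1.0], [2.0, 1.0], [-1.0, 1.0]]
    y = [1, 0, 1]
    w = [0.5, -0.25]
    print(log_likelihood(w, X, y), log_likelihood_direct(w, X, y))
```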
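Maximizing $L(w)$ yields the estimate of $w$. The maximization can be carried out with gradient descent (applied to $-L(w)$, which is equivalent to gradient ascent on $L(w)$) or with a quasi-Newton method.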
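Differentiating the last form of $L(w)$ gives the gradient $\nabla L(w)=\sum_{i=1}^{N}\left(y_{i}-\pi\left(x_{i}\right)\right) x_{i}$. The following is a minimal sketch of plain gradient ascent built on it. The fixed step size, the iteration count, the division by $N$, and the toy data are all assumptions chosen for illustration, not settings from the original text.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def log_likelihood(w, X, y):
    """L(w) = sum_i [ y_i (w . x_i) - log(1 + exp(w . x_i)) ]."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = dot(w, x_i)
        total += y_i * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total

def gradient(w, X, y):
    """Gradient of L(w): sum_i (y_i - pi(x_i)) x_i."""
    g = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        r = y_i - sigmoid(dot(w, x_i))
        for j, x_ij in enumerate(x_i):
            g[j] += r * x_ij
    return g

def fit(X, y, lr=0.5, n_iter=5000):
    """Gradient ascent on L(w) with a fixed step, scaled by 1/N."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(n_iter):
        g = gradient(w, X, y)
        w = [wj + lr * gj / n for wj, gj in zip(w, g)]
    return w

# Toy data in extended form (last component is the constant 1).
# The classes overlap, so the maximum likelihood estimate is finite.
X = [[0.5, 1.0], [1.0, 1.0], [1.5, 1.0], [2.0, 1.0], [2.5, 1.0], [3.0, 1.0]]
y = [0, 0, 1, 0, 1, 1]
w_hat = fit(X, y)

if __name__ == "__main__":
    print(w_hat, log_likelihood(w_hat, X, y))
```

If the two classes are perfectly linearly separable, $L(w)$ has no finite maximizer and the weights keep growing; in that situation a regularization term or early stopping is typically added.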
Let $\hat{w}$ be the maximum likelihood estimate of $w$. The learned logistic regression model is then:

$$
\begin{aligned}
P(Y=1 \mid x) &= \frac{\exp (\hat{w} \cdot x)}{1+\exp (\hat{w} \cdot x)} \\
P(Y=0 \mid x) &= \frac{1}{1+\exp (\hat{w} \cdot x)}
\end{aligned}
$$
Multiclass logistic regression
Let the random variable $Y$ take values in $\{1,2, \cdots, K\}$, with $x \in \mathbf{R}^{n+1}$ and $w_{k} \in \mathbf{R}^{n+1}$. The multinomial logistic regression model is:

$$
\begin{aligned}
P(Y=k \mid x) &= \frac{\exp \left(w_{k} \cdot x\right)}{1+\sum_{j=1}^{K-1} \exp \left(w_{j} \cdot x\right)}, \quad k=1,2, \cdots, K-1 \\
P(Y=K \mid x) &= \frac{1}{1+\sum_{j=1}^{K-1} \exp \left(w_{j} \cdot x\right)}
\end{aligned}
$$

The summation index is written as $j$ to keep it distinct from the class index $k$ on the left-hand side.
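The multinomial model can be evaluated by treating class $K$ as a reference class with score 0. The sketch below is one possible implementation in plain Python; the function name `multinomial_proba` and the example weights are assumptions for illustration. Subtracting the largest score before exponentiating leaves the ratios unchanged and avoids overflow.

```python
import math

def multinomial_proba(W, x):
    """Return [P(Y=1|x), ..., P(Y=K|x)].

    W holds the K-1 weight vectors w_1, ..., w_{K-1};
    class K is the reference class with score 0.
    """
    scores = [sum(wj * xj for wj, xj in zip(w_k, x)) for w_k in W] + [0.0]
    m = max(scores)
    # exp(s - m) / sum_j exp(s_j - m) equals exp(s) / sum_j exp(s_j).
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    # Hypothetical example with K = 3 classes and x in extended form.
    W = [[1.0, 0.0, -0.5], [-1.0, 2.0, 0.0]]
    x = [0.5, 1.0, 1.0]
    print(multinomial_proba(W, x))
```

With $K=2$ this reduces to the binary model above, since the single weight vector $w_{1}$ gives $P(Y=1 \mid x)=\exp(w_{1} \cdot x)/(1+\exp(w_{1} \cdot x))$.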