Machine Learning: Logistic Regression
Logistic Regression (Binary Classification)
Sigmoid function: $\sigma(z)=\frac{1}{1+e^{-z}}$
- $z \rightarrow +\infty,\ \lim\sigma(z)=1$
- $z = 0,\ \sigma(z)=\frac{1}{2}$
- $z \rightarrow -\infty,\ \lim\sigma(z)=0$
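As a quick illustration, here is a minimal sigmoid sketch in Python (NumPy assumed; the function name `sigmoid` is just an illustrative choice). It checks the three properties above numerically:

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)); clipping z keeps exp() in float64 range."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

# The three properties: sigma -> 0 as z -> -inf, sigma(0) = 1/2, sigma -> 1 as z -> +inf
print(sigmoid(np.array([-100.0, 0.0, 100.0])))  # -> [~0.0, 0.5, ~1.0]
```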
Derivation
$Data:\lbrace (x_i, y_i)\rbrace_{i=1}^N$

Assume the dataset $D=\lbrace (x_1, y_1),(x_2,y_2),\cdots,(x_i,y_i),\cdots,(x_N, y_N)\rbrace$, where $x_i \in \mathbb{R}^p$.
$p_1=P(y=1|x)=\sigma(w^Tx)=\frac{1}{1+\exp(-w^Tx)}=\frac{\exp(w^Tx)}{1+\exp(w^Tx)}\ \Longrightarrow\ \psi(x;w)$

$p_0=P(y=0|x)=1-P(y=1|x)=\frac{\exp(-w^Tx)}{1+\exp(-w^Tx)}=\frac{1}{1+\exp(w^Tx)}\ \Longrightarrow\ 1-\psi(x;w)$
$P(y|x)=p_1^y p_0^{1-y}$

Since $y\in\lbrace 0,1\rbrace$, this single expression covers both cases: it reduces to $p_1$ when $y=1$ and to $p_0$ when $y=0$.
Use MLE (maximum likelihood estimation) to estimate the parameter $w$:
$$
\begin{aligned}
\hat{w}&=\mathop{\arg\max}_{w}\ \log P(Y|X) \\
&=\mathop{\arg\max}_{w}\ \log\prod_{i=1}^N P(y_i|x_i) \\
&=\mathop{\arg\max}_{w}\ \sum_{i=1}^N \log P(y_i|x_i) \\
&=\mathop{\arg\max}_{w}\ \sum_{i=1}^N \big(y_i\log p_1+(1-y_i)\log p_0\big) \\
&=\mathop{\arg\max}_{w}\ \sum_{i=1}^N \big(y_i\log\psi(x_i;w)+(1-y_i)\log(1-\psi(x_i;w))\big)
\end{aligned}
$$
This maximization is mainly solved with gradient descent or quasi-Newton methods.
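As a concrete illustration, here is a minimal gradient-descent sketch in Python (NumPy assumed; `fit_logistic_regression`, the learning rate `lr`, and the toy data are illustrative choices, not from the original). It uses the fact that the gradient of the negative log-likelihood with respect to $w$ is $\sum_i(\psi(x_i;w)-y_i)x_i$:

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)); clipping z keeps exp() in float64 range."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def fit_logistic_regression(X, y, lr=0.5, n_iters=2000):
    """Gradient descent on the negative log-likelihood.

    X: (N, p) design matrix, y: (N,) labels in {0, 1}.
    The gradient of the negative log-likelihood is X^T (psi - y).
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        psi = sigmoid(X @ w)               # psi(x_i; w) for every sample
        grad = X.T @ (psi - y) / len(y)    # averaged gradient for a stable step size
        w -= lr * grad
    return w

# Toy usage on two separable blobs; a constant column serves as the bias term.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
X = np.hstack([np.ones((100, 1)), X])
y = np.concatenate([np.zeros(50), np.ones(50)])
w_hat = fit_logistic_regression(X, y)
pred = (sigmoid(X @ w_hat) > 0.5).astype(int)
print("training accuracy:", (pred == y).mean())
```

For a quasi-Newton alternative, `scipy.optimize.minimize(..., method='L-BFGS-B')` can minimize the same negative log-likelihood, typically in far fewer iterations.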
The loss function of logistic regression (the log loss) is:
$$
Loss(w) = -\sum_{i=1}^N \big(y_i\log\psi(x_i;w)+(1-y_i)\log(1-\psi(x_i;w))\big)
$$
The cross-entropy loss function is:
$$
J(\theta) = -\frac{1}{N}\sum_{i=1}^N \big(y_i\log\psi(x_i;\theta)+(1-y_i)\log(1-\psi(x_i;\theta))\big)
$$
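A small sketch comparing the two losses in Python (NumPy assumed; the function names `log_loss` and `cross_entropy` are illustrative). It makes the relationship explicit: $J(\theta)$ is just $Loss(w)$ averaged over the $N$ samples:

```python
import numpy as np

def log_loss(y, psi):
    """Loss(w): the un-averaged negative log-likelihood."""
    eps = 1e-12                       # avoid log(0)
    psi = np.clip(psi, eps, 1 - eps)
    return -np.sum(y * np.log(psi) + (1 - y) * np.log(1 - psi))

def cross_entropy(y, psi):
    """J(theta): the same quantity averaged over the N samples."""
    return log_loss(y, psi) / len(y)

y   = np.array([1, 0, 1, 1])
psi = np.array([0.9, 0.2, 0.7, 0.6])   # predicted P(y=1 | x) per sample
print(log_loss(y, psi), cross_entropy(y, psi))  # the two differ only by the 1/N factor
```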
As can be seen, many ideas in deep learning are extensions of logistic regression.
Multinomial Logistic Regression
The logistic regression introduced above is a binomial classification model, used for binary classification. It can be generalized to a multinomial logistic regression model for multi-class classification. Suppose the discrete random variable $Y$ takes values in the set $\lbrace 1,2,\cdots,K \rbrace$; then the multinomial logistic regression model is:
$$
\begin{aligned}
P(Y=k|x)&=\frac{\exp(w_k^T x)}{1+\sum\limits_{j=1}^{K-1}\exp(w_j^T x)},\quad k=1,2,\cdots,K-1 \\
P(Y=K|x)&=\frac{1}{1+\sum\limits_{j=1}^{K-1}\exp(w_j^T x)}
\end{aligned}
$$
Note that $Y=k$ denotes one of the classes $1$ through $K-1$, while $Y=K$ denotes the $K$-th class; $P(Y=K|x)$ follows from $1-\sum\limits_{k=1}^{K-1}P(Y=k|x)$.
For the general multi-class case, the $softmax$ function is: $P(Y=k|x)=\frac{\exp(w_k^Tx)}{\sum\limits_{j=1}^{K}\exp(w_j^Tx)}$
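A minimal sketch of softmax prediction in Python (NumPy assumed; `W` is a hypothetical $(K, p)$ weight matrix with one row $w_k$ per class, and `softmax_predict` is an illustrative name):

```python
import numpy as np

def softmax_predict(W, x):
    """P(Y=k | x) = exp(w_k^T x) / sum_j exp(w_j^T x) for every class k.

    W: (K, p) weight matrix, one row w_k per class; x: (p,) feature vector.
    """
    scores = W @ x
    scores -= scores.max()        # shift for numerical stability; the result is unchanged
    probs = np.exp(scores)
    return probs / probs.sum()    # normalize, so the K probabilities sum to 1

W = np.array([[1.0, -0.5], [0.2, 0.3], [-1.0, 0.8]])  # K=3 classes, p=2 features
x = np.array([0.5, 1.0])
p = softmax_predict(W, x)
print(p, p.sum())  # class probabilities; they sum to 1.0
```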
References:
Book: Li Hang, 统计学习方法 (Statistical Learning Methods)
Blog: https://zhuanlan.zhihu.com/p/56900935