This article covers only the binary classification case.
I. Logistic Regression
$$
\begin{aligned}
P(Y=1 \mid X=x) &= \frac{e^{w^T x}}{1+e^{w^T x}} = h(x) \\
P(Y=0 \mid X=x) &= \frac{1}{1+e^{w^T x}} = 1-h(x) \\
\log \frac{P(Y=1 \mid X=x)}{P(Y=0 \mid X=x)} &= w^T x
\end{aligned}
$$
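The three identities above can be checked numerically. The following is a minimal sketch, assuming NumPy; the weight vector and sample are made-up values for illustration, and the two-branch evaluation of `h` is one common way to avoid overflow, not something the derivation itself requires.

```python
import numpy as np

def h(w, x):
    """P(Y=1 | X=x) = e^{w^T x} / (1 + e^{w^T x})."""
    z = np.dot(w, x)
    # Both branches compute the same value; each avoids exponentiating
    # a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + np.exp(-z))
    ez = np.exp(z)
    return ez / (1.0 + ez)

w = np.array([0.5, -0.5])   # hypothetical weights
x = np.array([2.0, 1.0])    # hypothetical sample, so w^T x = 0.5

p1 = h(w, x)                # P(Y=1 | x)
p0 = 1.0 - p1               # P(Y=0 | x)
log_odds = np.log(p1 / p0)  # recovers w^T x
```

The log-odds being linear in `x` is what makes the model "linear" even though the probability itself is not.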
II. Parameter Estimation (Maximum Likelihood Estimation)
Likelihood function:
$$
l(w) = \prod_{i=1}^{n} h(x_i)^{y_i} \, (1-h(x_i))^{1-y_i}
$$
Log-likelihood function:
$$
\begin{aligned}
L(w) &= \sum_{i=1}^{n} \left( y_i \log h(x_i) + (1-y_i) \log(1-h(x_i)) \right) \\
&= \sum_{i=1}^{n} \left( y_i w^T x_i - y_i \log(1+e^{w^T x_i}) + (y_i-1) \log(1+e^{w^T x_i}) \right) \\
&= \sum_{i=1}^{n} \left( y_i w^T x_i - \log(1+e^{w^T x_i}) \right)
\end{aligned}
$$
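As a sanity check on the simplification, the first and last lines of the derivation can be evaluated side by side. This is a sketch assuming NumPy; the data `X`, `y` and weights `w` are invented for illustration, and using `np.logaddexp` for `log(1 + e^z)` is an implementation choice for numerical stability.

```python
import numpy as np

def log_likelihood(w, X, y):
    """Last line of the derivation: sum_i ( y_i w^T x_i - log(1 + e^{w^T x_i}) )."""
    z = X @ w
    # np.logaddexp(0, z) = log(e^0 + e^z) = log(1 + e^z), computed without overflow
    return np.sum(y * z - np.logaddexp(0.0, z))

def log_likelihood_direct(w, X, y):
    """First line of the derivation: sum_i ( y_i log h + (1 - y_i) log(1 - h) )."""
    hx = 1.0 / (1.0 + np.exp(-(X @ w)))
    return np.sum(y * np.log(hx) + (1.0 - y) * np.log(1.0 - hx))

# Hypothetical data: each row of X is one sample x_i
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
w = np.array([0.1, 0.3])

ll_simplified = log_likelihood(w, X, y)
ll_direct = log_likelihood_direct(w, X, y)
```

Both forms agree up to floating-point error; the simplified form is usually preferable in code because it never takes the logarithm of a probability that has rounded to zero.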
It can be shown that $L(w)$ is a concave function of $w$, so any stationary point is a global maximum. The proof is as follows:
Consider a single sample $(x, y)$ and let
$$
f(w) = y\,w^T x - \log(1+e^{w^T x})
$$
$$
\frac{\partial f(w)}{\partial w} = yx - \frac{e^{w^T x}}{1+e^{w^T x}}\,x
$$
$$
\begin{aligned}
\frac{\partial^2 f(w)}{\partial w\,\partial w^T} &= -\frac{x\,e^{w^T x}\,x^T}{(1+e^{w^T x})^2} \\
&= -\frac{e^{w^T x}}{(1+e^{w^T x})^2}\,xx^T
\end{aligned}
$$
For any nonzero vector $z$, $z^T(xx^T)z = (z^Tx)(z^Tx)^T = (z^Tx)^2 \ge 0$, and since $\frac{e^{w^T x}}{(1+e^{w^T x})^2} > 0$, the Hessian $\frac{\partial^2 f(w)}{\partial w\,\partial w^T}$ is negative semidefinite. Hence $f(w)$ is concave in $w$. Since $L(w)$ is a sum of such terms, it is concave as well, so any stationary point is a global maximum.
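The sign of the Hessian can also be inspected numerically. The sketch below assumes NumPy and uses made-up values for `w` and `x`; it relies on the identity $\frac{e^{z}}{(1+e^{z})^2} = h(x)(1-h(x))$ with $z = w^Tx$, which follows directly from the definition of $h$.

```python
import numpy as np

def hessian_f(w, x):
    """Hessian of f(w) = y w^T x - log(1 + e^{w^T x}); it does not depend on y."""
    z = np.dot(w, x)
    s = 1.0 / (1.0 + np.exp(-z))          # h(x)
    # e^z / (1 + e^z)^2 equals h(x) * (1 - h(x))
    return -s * (1.0 - s) * np.outer(x, x)

w = np.array([0.2, -0.4, 1.0])   # hypothetical weights
x = np.array([1.0, 2.0, -0.5])   # hypothetical sample

H = hessian_f(w, x)
eigvals = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix, ascending
```

Because $xx^T$ has rank one, a single eigenvalue is strictly negative and the rest are zero up to rounding, which is why the Hessian is negative semidefinite rather than negative definite.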
Differentiating the log-likelihood with respect to the vector $w$ gives:
$$
\begin{aligned}
\frac{\partial L(w)}{\partial w} &= \sum_{i=1}^{n} \left( y_i x_i - \frac{e^{w^T x_i}}{1+e^{w^T x_i}}\,x_i \right) \\
&= \sum_{i=1}^{n} \left( y_i - \frac{e^{w^T x_i}}{1+e^{w^T x_i}} \right) x_i \\
&= \sum_{i=1}^{n} (y_i - h(x_i))\,x_i
\end{aligned}
$$
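One way to gain confidence in the gradient formula is to compare it against a central finite difference of $L(w)$. The following sketch assumes NumPy; the data, the weights, and the helper name `numerical_gradient` are all hypothetical.

```python
import numpy as np

def log_likelihood(w, X, y):
    z = X @ w
    return np.sum(y * z - np.logaddexp(0.0, z))

def gradient(w, X, y):
    """Analytic gradient: sum_i (y_i - h(x_i)) x_i, written as X^T (y - h)."""
    hx = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (y - hx)

def numerical_gradient(w, X, y, eps=1e-6):
    """Central finite difference of L(w), one coordinate at a time."""
    g = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = eps
        g[j] = (log_likelihood(w + e, X, y) - log_likelihood(w - e, X, y)) / (2 * eps)
    return g

# Hypothetical data: each row of X is one sample x_i
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
w = np.array([0.1, 0.3])

g_analytic = gradient(w, X, y)
g_numeric = numerical_gradient(w, X, y)
```

Writing the sum as `X.T @ (y - hx)` is the vectorized form of the last line of the derivation.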
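BGD (batch gradient descent) solution:
Note that $w$ is a vector here and $\lambda$ is the learning rate. Because $L(w)$ is being maximized, the update adds the gradient, which is gradient ascent on $L$ or, equivalently, gradient descent on $-L$.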
$$
w = w + \lambda \sum_{i=1}^{n} (y_i - h(x_i))\,x_i
$$
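The full-batch update can be sketched in a few lines. This assumes NumPy; the toy data, the step size `lam`, the iteration count, and the zero initialization are illustrative choices, not values from the derivation. The data is deliberately not linearly separable so that a finite maximizer exists.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bgd(X, y, lam=0.1, n_iters=2000):
    """Repeat w = w + lam * sum_i (y_i - h(x_i)) x_i over the full data set."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w = w + lam * X.T @ (y - sigmoid(X @ w))
    return w

# Hypothetical toy data; the first column is a constant bias feature
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])

w_hat = bgd(X, y)
# At a maximizer of L(w) the gradient should vanish
grad_at_w_hat = X.T @ (y - sigmoid(X @ w_hat))
```

Each iteration touches all $n$ samples, so the cost per update grows linearly with the data set size.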
SGD (stochastic gradient descent) solution, using a single sample $(x_i, y_i)$ per update:
$$
w = w + \lambda (y_i - h(x_i))\,x_i
$$
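A per-sample version of the update might look like the sketch below. It assumes NumPy; the toy data, the step size, the number of epochs, and the choice to visit samples in a freshly shuffled order each epoch are all assumptions made for illustration. With a constant step size the iterates hover near the maximizer rather than converging to it exactly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    z = X @ w
    return np.sum(y * z - np.logaddexp(0.0, z))

def sgd(X, y, lam=0.1, n_epochs=500, seed=0):
    """Apply w = w + lam * (y_i - h(x_i)) x_i for one sample at a time."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):   # shuffled pass over the data
            w = w + lam * (y[i] - sigmoid(X[i] @ w)) * X[i]
    return w

# Hypothetical toy data; the first column is a constant bias feature
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])

w_sgd = sgd(X, y)
ll_start = log_likelihood(np.zeros(2), X, y)
ll_end = log_likelihood(w_sgd, X, y)
```

Each update costs only one sample's worth of work, at the price of a noisier trajectory than the full-batch rule.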
MBGD (mini-batch gradient descent) solution:
Assume each update uses $b$ samples.
$$
\text{for } i = 1,\ 1+b,\ 1+2b,\ \dots
$$
$$
w = w + \lambda \sum_{k=i}^{i+b-1} (y_k - h(x_k))\,x_k
$$
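The mini-batch loop can be sketched as follows, assuming NumPy. The toy data, step size, and epoch count are illustrative, and the batches are taken in fixed order as in the loop above (shuffling between epochs is a common variation that is not shown here). Indices are 0-based in the code, so the slice `i:i+b` covers the $b$ samples that the formula indexes as $k = i, \dots, i+b-1$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    z = X @ w
    return np.sum(y * z - np.logaddexp(0.0, z))

def mbgd(X, y, b=2, lam=0.1, n_epochs=500):
    """Apply w = w + lam * sum over one batch of b samples of (y_k - h(x_k)) x_k."""
    n = len(y)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in range(0, n, b):            # 0-based analogue of i = 1, 1+b, 1+2b, ...
            Xb, yb = X[i:i + b], y[i:i + b]
            w = w + lam * Xb.T @ (yb - sigmoid(Xb @ w))
    return w

# Hypothetical toy data; the first column is a constant bias feature
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])

w_mb = mbgd(X, y, b=2)
ll_start = log_likelihood(np.zeros(2), X, y)
ll_end = log_likelihood(w_mb, X, y)

# With b equal to n, every update uses all samples, which is the BGD rule
w_full = mbgd(X, y, b=len(y), n_epochs=2000)
grad_full = X.T @ (y - sigmoid(X @ w_full))
```

Setting `b=1` gives a per-sample update in fixed order, and `b=len(y)` reproduces the full-batch update, so the batch size interpolates between the two earlier schemes.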