1. Logistic Regression Model
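Logistic regression passes the output of a linear regression model through the sigmoid (logistic) function

$$g(z)=\frac{1}{1+e^{-z}}$$

This function maps any real number into the interval $(0,1)$, so its output can be read as the probability that a sample belongs to class 1. Thresholding that probability, typically at 0.5, then yields a 0/1 label. Applying $g$ to the linear model gives: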
$$h(x)=\frac{1}{1+e^{-(\omega^Tx+b)}}$$
That is,

$$
\begin{aligned}
P(y=1\mid x)&=\frac{1}{1+e^{-(\omega^Tx+b)}}=\frac{e^{\omega^Tx+b}}{1+e^{\omega^Tx+b}}\\
P(y=0\mid x)&=1-P(y=1\mid x)=\frac{1}{1+e^{\omega^Tx+b}}
\end{aligned}
$$
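The following Python sketch is a quick sanity check of the formulas above. It evaluates $h(x)$ for one sample, confirms that the two expressions for $P(y=1\mid x)$ agree, and confirms that the two class probabilities sum to 1. The function names `sigmoid` and `h` and the parameter values are illustrative assumptions, not part of the original derivation. `sigmoid` branches on the sign of $z$ so that `math.exp` never receives a large positive argument.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^{-z}), evaluated without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # for z < 0 use the equivalent form e^z / (1 + e^z)
    return ez / (1.0 + ez)


def h(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Model output h(x) = g(w^T x + b), read as P(y = 1 | x)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)


# Hypothetical parameters and sample, for illustration only
w = [0.5, -1.0]
b = 0.5
x = [3.0, 1.0]

z = sum(wj * xj for wj, xj in zip(w, x)) + b  # w^T x + b
p1 = h(x, w, b)                               # P(y=1|x), first form
p1_alt = math.exp(z) / (1.0 + math.exp(z))    # P(y=1|x), second form
p0 = 1.0 / (1.0 + math.exp(z))                # P(y=0|x)

print(p1, p1_alt, p0)
assert math.isclose(p1, p1_alt)    # the two forms of P(y=1|x) agree
assert math.isclose(p1 + p0, 1.0)  # the class probabilities sum to 1
```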
2. Loss Function
The parameters are estimated by maximizing the log-likelihood. Write $P_1(x;\omega,b)=P(y=1\mid x)=h(x)$ and $P_0(x;\omega,b)=P(y=0\mid x)=1-h(x)$. For $m$ training samples $(x_i,y_i)$ with $y_i\in\{0,1\}$, the log-likelihood is

$$
\begin{aligned}
L(\omega,b)&=\sum_{i=1}^m\ln P(y_i\mid x_i;\omega,b)\\
&=\sum_{i=1}^m\ln\big(y_iP_1(x_i;\omega,b)+(1-y_i)P_0(x_i;\omega,b)\big)\\
&=\sum_{i=1}^m\big(y_i\ln h(x_i)+(1-y_i)\ln(1-h(x_i))\big)
\end{aligned}
$$

The last equality holds because $y_i$ is either 0 or 1, so exactly one of the two terms inside the logarithm survives.
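The step from the second to the third line of $L(\omega,b)$ relies on $y_i\in\{0,1\}$. The minimal Python sketch below evaluates both forms and checks that they coincide. It takes hypothetical precomputed scores $z_i=\omega^Tx_i+b$ rather than raw features. The helper names are illustrative.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function g(z) = 1 / (1 + e^{-z})
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood_mixture(scores: Sequence[float], labels: Sequence[int]) -> float:
    """sum_i ln( y_i P1(x_i) + (1 - y_i) P0(x_i) ), with scores[i] = w^T x_i + b."""
    total = 0.0
    for z, y in zip(scores, labels):
        p1 = sigmoid(z)
        total += math.log(y * p1 + (1 - y) * (1.0 - p1))
    return total


def log_likelihood_split(scores: Sequence[float], labels: Sequence[int]) -> float:
    """sum_i ( y_i ln h(x_i) + (1 - y_i) ln(1 - h(x_i)) ), valid when every y_i is 0 or 1."""
    total = 0.0
    for z, y in zip(scores, labels):
        p1 = sigmoid(z)
        total += y * math.log(p1) + (1 - y) * math.log(1.0 - p1)
    return total


# Hypothetical linear scores w^T x_i + b and 0/1 labels
scores = [1.2, -0.7, 0.3, -2.0]
labels = [1, 0, 0, 1]
print(log_likelihood_mixture(scores, labels), log_likelihood_split(scores, labels))
```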
Maximizing $L(\omega,b)$ is equivalent to minimizing its negative, so the loss function (the cross-entropy loss) is

$$E(\omega,b)=-\sum_{i=1}^m\big(y_i\ln h(x_i)+(1-y_i)\ln(1-h(x_i))\big)$$
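Below is a direct Python transcription of $E(\omega,b)$, offered as a sketch under stated assumptions. The data are made-up toy values, and `cross_entropy_loss` is a hypothetical helper name. Setting $\omega=0$ and $b=0$ makes every $h(x_i)=0.5$, so the loss must equal $m\ln 2$; this gives an easy check. Because the function evaluates $\ln h$ and $\ln(1-h)$ literally, it fails once $h(x_i)$ rounds to exactly 0 or 1. The simplified form $y_iz_i-\ln(1+e^{z_i})$, with $z_i=\omega^Tx_i+b$, avoids that.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function g(z) = 1 / (1 + e^{-z})
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def cross_entropy_loss(X: Sequence[Sequence[float]], y: Sequence[int],
                       w: Sequence[float], b: float) -> float:
    """E(w, b) = -sum_i ( y_i ln h(x_i) + (1 - y_i) ln(1 - h(x_i)) )."""
    total = 0.0
    for xi, yi in zip(X, y):
        hi = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)  # h(x_i)
        total += yi * math.log(hi) + (1 - yi) * math.log(1.0 - hi)
    return -total


# Hypothetical toy data: three samples with two features each
X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.0]]
y = [1, 0, 1]

# With w = 0 and b = 0 every h(x_i) is 0.5, so E should equal 3 ln 2
print(cross_entropy_loss(X, y, [0.0, 0.0], 0.0), 3 * math.log(2))
```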
3. Minimizing the Loss Function
3.1 Gradient Descent
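To fold the bias into the weight vector, let $\hat{\omega}=(\omega;b)$ and extend each sample with a constant 1, so that $\hat{\omega}^Tx_i=\omega^Tx_i+b$. Below, whenever $x_i$ is multiplied by $\hat{\omega}$, it denotes this augmented vector $(x_i;1)$. With $z_i=\omega^Tx_i+b$, we have $\ln h(x_i)=z_i-\ln(1+e^{z_i})$ and $\ln(1-h(x_i))=-\ln(1+e^{z_i})$. This simplifies the loss as follows: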
$$
\begin{aligned}
E(\hat{\omega})&=-\sum_{i=1}^m\big(y_i\ln h(x_i)+(1-y_i)\ln(1-h(x_i))\big)\\
&=-\sum_{i=1}^m\big(y_i(\omega^Tx_i+b)-\ln(1+e^{\omega^Tx_i+b})\big)\\
&=-\sum_{i=1}^m\big(y_i\hat{\omega}^Tx_i-\ln(1+e^{\hat{\omega}^Tx_i})\big)\\
\frac{\partial E}{\partial \hat{\omega}}&=-\sum_{i=1}^m\Big(y_ix_i-\frac{e^{\hat{\omega}^Tx_i}}{1+e^{\hat{\omega}^Tx_i}}x_i\Big)\\
&=\sum_{i=1}^m\big(h(x_i)-y_i\big)x_i\\
&=X^T\big(h(X)-Y\big)
\end{aligned}
$$
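A standard way to confirm a hand-derived gradient is to compare it with central finite differences of the loss. The Python sketch below does this for $\partial E/\partial\hat{\omega}=X^T(h(X)-Y)$. It assumes, as the substitution $\hat{\omega}=(\omega;b)$ requires, that every row of `X` ends with a constant 1. It uses the simplified loss $-\sum_i(y_i\hat{\omega}^Tx_i-\ln(1+e^{\hat{\omega}^Tx_i}))$ and computes $\ln(1+e^z)$ in an overflow-safe way. The toy data, the step size `eps`, and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function g(z) = 1 / (1 + e^{-z})
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log1p_exp(z: float) -> float:
    # ln(1 + e^z); for z > 0 rewrite as z + ln(1 + e^{-z}) to avoid overflow
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * c for a, c in zip(u, v))


def loss(w_hat: Sequence[float], X: Sequence[Sequence[float]], Y: Sequence[int]) -> float:
    # E(w_hat) = -sum_i ( y_i w_hat^T x_i - ln(1 + e^{w_hat^T x_i}) )
    return -sum(yi * dot(w_hat, xi) - log1p_exp(dot(w_hat, xi)) for xi, yi in zip(X, Y))


def gradient(w_hat: Sequence[float], X: Sequence[Sequence[float]], Y: Sequence[int]) -> List[float]:
    # dE/dw_hat = X^T (h(X) - Y) = sum_i (h(x_i) - y_i) x_i
    grad = [0.0] * len(w_hat)
    for xi, yi in zip(X, Y):
        residual = sigmoid(dot(w_hat, xi)) - yi
        for j, xij in enumerate(xi):
            grad[j] += residual * xij
    return grad


# Hypothetical toy data; the last column is the constant 1 that carries the bias
X = [[0.5, 1.2, 1.0], [-1.0, 0.3, 1.0], [2.0, -0.7, 1.0], [0.1, 0.1, 1.0]]
Y = [1, 0, 1, 0]
w_hat = [0.3, -0.2, 0.1]

analytic = gradient(w_hat, X, Y)

eps = 1e-6
numeric = []
for j in range(len(w_hat)):
    plus, minus = list(w_hat), list(w_hat)
    plus[j] += eps
    minus[j] -= eps
    # Central difference approximation of dE/dw_hat_j
    numeric.append((loss(plus, X, Y) - loss(minus, X, Y)) / (2 * eps))

print("analytic:", analytic)
print("numeric: ", numeric)
```

The two printed vectors should agree to roughly six or more decimal places. Both the loss and the gradient are written as explicit sums over samples so that each line maps onto a line of the derivation.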
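Here $X$ is the $m\times(n+1)$ matrix whose $i$-th row is the augmented sample $(x_i;1)^T$. $m$ is the number of samples and $n$ is the number of features. $h(X)$ is the $m$-dimensional vector of predictions $h(x_i)$, and $Y$ is the $m$-dimensional vector of labels $y_i$. With learning rate $\alpha>0$, the parameter update rule is: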
$$\hat{\omega}\leftarrow\hat{\omega}-\alpha X^T\big(h(X)-Y\big)$$
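The following pure-Python sketch runs batch gradient descent with the update rule above. It is an illustration under assumptions, not a reference implementation. The toy data, the learning rate `alpha=0.1`, the fixed iteration count, and the names `fit_logistic_gd` and `predict_proba` are all hypothetical. The sketch has no convergence test and no regularization. A constant 1 is appended to each sample internally, so the last entry of the returned vector is the bias $b$.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function g(z) = 1 / (1 + e^{-z})
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_gd(X: Sequence[Sequence[float]], Y: Sequence[int],
                    alpha: float = 0.1, n_iters: int = 1000) -> List[float]:
    """Batch gradient descent: repeat w_hat <- w_hat - alpha * X^T (h(X) - Y)."""
    X_hat = [list(xi) + [1.0] for xi in X]  # append the constant 1 for the bias
    w_hat = [0.0] * len(X_hat[0])           # start from omega = 0, b = 0
    for _ in range(n_iters):
        grad = [0.0] * len(w_hat)
        for xi, yi in zip(X_hat, Y):
            residual = sigmoid(sum(wj * xj for wj, xj in zip(w_hat, xi))) - yi
            for j, xij in enumerate(xi):
                grad[j] += residual * xij   # accumulate (h(x_i) - y_i) x_i
        w_hat = [wj - alpha * gj for wj, gj in zip(w_hat, grad)]
    return w_hat


def predict_proba(w_hat: Sequence[float], x: Sequence[float]) -> float:
    # P(y = 1 | x) = h(x), with the bias stored as the last entry of w_hat
    x_hat = list(x) + [1.0]
    return sigmoid(sum(wj * xj for wj, xj in zip(w_hat, x_hat)))


# Hypothetical 1-D toy data; larger x tends to go with label 1
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
Y = [0, 0, 1, 0, 1, 1]

w_hat = fit_logistic_gd(X, Y, alpha=0.1, n_iters=2000)
print("omega =", w_hat[:-1], "b =", w_hat[-1])
print("P(y=1 | x=1.5) =", predict_proba(w_hat, [1.5]))
```

The toy labels overlap around $x=0$, so the data are not linearly separable and the weights stay finite. On separable data, unregularized gradient descent keeps growing $\|\hat{\omega}\|$ because the likelihood then has no finite maximizer. With NumPy, the two inner loops would collapse to the single matrix expression `X.T @ (h - Y)`.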