Sigmoid function
$$
y=\frac{1}{1+e^{-z}}
$$
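The sigmoid function is chosen because, in a binary classification task, a plain unit-step function is discontinuous and not differentiable at $z=0$, which makes the subsequent optimization difficult. The sigmoid function, which is continuous and differentiable everywhere, is therefore used as a surrogate.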
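As a rough illustration of the contrast described above, the sketch below evaluates both functions with NumPy. The function names `sigmoid` and `unit_step` are chosen here for illustration, the value 0.5 at $z=0$ for the step function is an assumed convention, and the rewriting via `exp(-|z|)` is only one way to avoid overflow for large $|z|$.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid y = 1 / (1 + e^{-z}), evaluated without overflow."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in (0, 1], so exp never overflows
    # z >= 0: 1 / (1 + e^{-z});  z < 0: e^{z} / (1 + e^{z}), the same value.
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def unit_step(z):
    """Unit-step function: 0 for z < 0, 1 for z > 0, and (by assumption) 0.5 at z = 0."""
    z = np.asarray(z, dtype=float)
    return np.where(z > 0, 1.0, np.where(z < 0, 0.0, 0.5))

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))    # changes smoothly through 0.5 at z = 0
print(unit_step(z))  # jumps at z = 0
```

Both functions agree on which side of 0.5 a point falls, but only the sigmoid has a derivative everywhere, which is what gradient-based optimization needs.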
Deriving the logistic regression (对数几率回归) model
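The underlying principle is not repeated here. Following the Watermelon Book (西瓜书, p. 59), the log-likelihood of logistic regression is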
$$
\begin{aligned}
l(w,b)&=\sum_{i=1}^{m}\ln p(y_i \mid x_i;w,b) \\
&=\sum_{i=1}^{m}\ln\left(y_i\,p_1(\hat{x_i};\beta)+(1-y_i)\,p_0(\hat{x_i};\beta)\right) \\
&=\sum_{i=1}^{m}\ln\left(y_i\frac{e^{\beta^T\hat{x_i}}}{1+e^{\beta^T\hat{x_i}}}+(1-y_i)\frac{1}{1+e^{\beta^T\hat{x_i}}}\right) \\
&=\sum_{i=1}^{m}\left(\ln\left(y_i e^{\beta^T\hat{x_i}}+1-y_i\right)-\ln\left(1+e^{\beta^T\hat{x_i}}\right)\right)
\end{aligned}
$$

Here $\beta=(w;b)$ and $\hat{x_i}=(x_i;1)$, as in the book. The last step puts the two fractions over their common denominator $1+e^{\beta^T\hat{x_i}}$ and splits the logarithm of the quotient.
Consider the two cases $y_i=0$ and $y_i=1$:
- When $y_i=1$, the $i$-th term of the sum is:

$$
\ln p(y_i \mid x_i;w,b)=\beta^T\hat{x_i}-\ln\left(1+e^{\beta^T\hat{x_i}}\right)
$$

- When $y_i=0$, the $i$-th term of the sum is:

$$
\ln p(y_i \mid x_i;w,b)=-\ln\left(1+e^{\beta^T\hat{x_i}}\right)
$$
Combining the two cases, $l(w,b)$ can be written as:

$$
l(w,b)=\sum_{i=1}^{m}\left(y_i\beta^T\hat{x_i}-\ln\left(1+e^{\beta^T\hat{x_i}}\right)\right)
$$
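The compact form can be checked numerically against the definition it was derived from. The following is a minimal sketch, assuming the book's convention $\beta=(w;b)$ and $\hat{x_i}=(x_i;1)$. The toy data, the parameter values, and the function names `log_likelihood_direct` and `log_likelihood_compact` are made up for illustration.

```python
import numpy as np

def log_likelihood_direct(beta, X_hat, y):
    """sum_i ln( y_i * p1 + (1 - y_i) * p0 ), taken directly from the definition."""
    s = X_hat @ beta                      # beta^T x_hat_i for every sample
    p1 = np.exp(s) / (1.0 + np.exp(s))    # p(y = 1 | x)
    p0 = 1.0 / (1.0 + np.exp(s))          # p(y = 0 | x)
    return np.sum(np.log(y * p1 + (1 - y) * p0))

def log_likelihood_compact(beta, X_hat, y):
    """sum_i ( y_i * beta^T x_hat_i - ln(1 + e^{beta^T x_hat_i}) )."""
    s = X_hat @ beta
    # np.logaddexp(0, s) evaluates ln(e^0 + e^s) = ln(1 + e^s) without overflow.
    return np.sum(y * s - np.logaddexp(0.0, s))

# Hypothetical toy data: 5 samples with 2 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
X_hat = np.hstack([X, np.ones((5, 1))])   # append the constant 1: x_hat = (x; 1)
beta = np.array([0.5, -1.0, 0.2])         # beta = (w; b), arbitrary values
y = np.array([0, 1, 1, 0, 1])

print(log_likelihood_direct(beta, X_hat, y))
print(log_likelihood_compact(beta, X_hat, y))
```

The two printed values should agree up to floating-point rounding. The compact form is also the more practical one to evaluate, since it never forms the probabilities explicitly.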