Principles of Logistic Regression
- Logistic regression is a classification algorithm, the classic binary classification algorithm.
- On choosing algorithms in machine learning: try logistic regression first, then move to more complex models; prefer the simplest model that works.
- The decision boundary of logistic regression can be linear or nonlinear.
The Sigmoid Function
Formula:
$$g(z) = \frac{1}{1+e^{-z}}$$
Properties: the input may be any real number, and the output lies in the open interval (0, 1).
Interpretation: it maps any input into (0, 1). Linear regression gives us a real-valued prediction; passing that value through the sigmoid converts it into a probability, which turns the regression output into a classification decision.
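To make the mapping concrete, here is a minimal sketch of the sigmoid in Python; NumPy is an assumed dependency and the name `sigmoid` is our own choice:

```python
import numpy as np

def sigmoid(z):
    """Map any real number (scalar or array) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))   # 0.5, the natural decision threshold
print(sigmoid(6.0))   # ~0.9975, large positive inputs approach 1
print(sigmoid(-6.0))  # ~0.0025, large negative inputs approach 0
```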
Simplification and Solution
Prediction function:
$$h_\theta(x) = g(\theta^T x) = \frac{1}{1+e^{-\theta^T x}}$$
where $\theta_0 + \theta_1 x_1 + \dots + \theta_n x_n = \sum_{i=0}^{n}\theta_i x_i = \theta^T x$ (with the convention $x_0 = 1$).
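A short sketch of this prediction function under the same assumptions; the design matrix `X` carries a leading column of ones so that $\theta_0$ acts through $x_0 = 1$:

```python
import numpy as np

def predict_proba(theta, X):
    """h_theta(x) = g(theta^T x) for every row of X.

    X has shape (m, n + 1) with a leading ones column (x_0 = 1),
    theta has shape (n + 1,); the result is one probability per sample.
    """
    return 1.0 / (1.0 + np.exp(-(X @ theta)))
```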
Classification task:
$$\begin{aligned} P(y = 1 \mid x;\theta) &= h_\theta(x) \\ P(y = 0 \mid x;\theta) &= 1-h_\theta(x) \end{aligned}$$
These two formulas can be combined into a single expression:
$$P(y \mid x;\theta) = (h_\theta(x))^{y}\,(1-h_\theta(x))^{1-y}$$
For the binary task with labels 0 and 1, the combined expression works because:
- when $y = 1$, only the factor $h_\theta(x)$ survives
- when $y = 0$, only the factor $1-h_\theta(x)$ survives
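A one-line sketch of the combined probability; the names `h` (for $h_\theta(x)$) and `y` are ours:

```python
def sample_prob(h, y):
    """P(y | x; theta) = h^y * (1 - h)^(1 - y):
    reduces to h when y = 1 and to 1 - h when y = 0."""
    return h ** y * (1.0 - h) ** (1 - y)
```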
Likelihood function:
$$L(\theta) = \prod_{i=1}^{m} P(y_i \mid x_i;\theta) = \prod_{i=1}^{m}(h_\theta(x_i))^{y_i}(1-h_\theta(x_i))^{1-y_i}$$
Log-likelihood:
$$l(\theta) = \log L(\theta) = \sum_{i=1}^{m}\left[y_i \log h_\theta(x_i) + (1-y_i)\log\bigl(1-h_\theta(x_i)\bigr)\right]$$
Maximizing this expression directly would call for gradient ascent, so we make one small change and introduce
$$J(\theta) = -\frac{1}{m}\,l(\theta)$$
which converts the task into minimization by gradient descent.
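A sketch of this cost in NumPy; the small `eps` clamp is our addition to keep the logarithms finite when $h_\theta(x)$ saturates at 0 or 1:

```python
import numpy as np

def cost(theta, X, y, eps=1e-12):
    """J(theta) = -(1/m) * sum[y*log(h) + (1 - y)*log(1 - h)]."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    h = np.clip(h, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
```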
Taking the partial derivative, and using the identity $g'(z) = g(z)\bigl(1-g(z)\bigr)$ so that $\frac{\partial h_\theta(x_i)}{\partial \theta_j} = h_\theta(x_i)\bigl(1-h_\theta(x_i)\bigr)x_i^{j}$:
$$\begin{aligned} \frac{\partial J(\theta)}{\partial \theta_j} &= -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\frac{1}{h_\theta(x_i)}\frac{\partial h_\theta(x_i)}{\partial \theta_j}-(1-y_i)\frac{1}{1-h_\theta(x_i)}\frac{\partial h_\theta(x_i)}{\partial \theta_j}\right] \\ &= \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x_i)-y_i\bigr)x_i^{j} \end{aligned}$$
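Computed over all components $j$ at once, this gradient is $\frac{1}{m}X^T\bigl(h_\theta(X)-y\bigr)$; the vectorized rewriting in the sketch below is standard practice rather than something stated above:

```python
import numpy as np

def gradient(theta, X, y):
    """(1/m) * sum_i (h_theta(x_i) - y_i) * x_i^j, for all j at once."""
    m = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return (X.T @ (h - y)) / m
```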
The steps above mirror the derivation for linear regression, but linear regression can be solved directly from its objective function, whereas logistic regression takes the objective obtained here and then runs gradient descent to optimize the parameters.
Parameter update rule:
$$\theta_j' = \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x_i)-y_i\bigr)x_i^{j}$$
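Putting the pieces together, a minimal gradient-descent loop; the learning rate `alpha` ($\alpha$), iteration count `n_iters`, and the toy data are all assumptions for illustration:

```python
import numpy as np

def fit_logistic(X, y, alpha=0.1, n_iters=1000):
    """Repeat theta_j' = theta_j - alpha * (1/m) * sum_i (h - y_i) * x_i^j."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))
        theta -= alpha * (X.T @ (h - y)) / m
    return theta

# Tiny usage example: one feature plus a leading ones column.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = fit_logistic(X, y)
print(1.0 / (1.0 + np.exp(-(X @ theta))))  # predicted probabilities track y
```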