Preface
Despite its name, logistic regression is a machine learning algorithm for classification rather than regression. At its core it is a linear classification model: it passes a linear combination of the features through a nonlinear mapping to produce a probability, which then serves as the classification criterion.
Logistic regression uses the sigmoid function as its hypothesis:
h_\theta(x_i) = g(\theta^T x_i) = \frac{1}{1 + e^{-\theta^T x_i}}
where g(z) is the sigmoid function; its graph is shown below:
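The original figure is not reproduced here. As a stand-in, a minimal matplotlib sketch (matplotlib and numpy are assumptions, not mentioned in the original post) that recreates the sigmoid curve:

import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-10, 10, 200)
g = 1.0 / (1.0 + np.exp(-z))                     # g(z) = 1 / (1 + e^{-z})

plt.plot(z, g)
plt.axhline(0.5, linestyle='--', color='gray')   # midpoint: g(0) = 0.5
plt.xlabel('z')
plt.ylabel('g(z)')
plt.title('Sigmoid function')
plt.show()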
Differentiating g(z) = 1/(1 + e^{-z}) gives:
g^\prime(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = g(z)\,(1 - g(z))
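This identity is easy to sanity-check numerically; the following sketch (an addition, not part of the original post) compares it against a central finite difference:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)  # central difference
analytic = sigmoid(z) * (1 - sigmoid(z))                     # g(z)(1 - g(z))
print(np.allclose(numeric, analytic, atol=1e-8))             # expected: True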
Algorithm Principle
Logistic regression assumes the data follow a Bernoulli distribution. As the figure above shows, the sigmoid output lies in the interval (0, 1), with a midpoint of 0.5. In binary classification, for a sample x, h_\theta(x) < 0.5 indicates that x belongs to class A, while h_\theta(x) > 0.5 indicates that x belongs to class B. We interpret the value of h_\theta(x) as the probability that the event y = 1 occurs:
p(y = 1 \mid x; \theta) = h_\theta(x)

p(y = 0 \mid x; \theta) = 1 - h_\theta(x)
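To make the 0.5 decision threshold concrete, here is a small prediction sketch (predict_proba and predict are hypothetical helper names introduced here, not from the original post):

import numpy as np

def predict_proba(theta, X):
    # p(y=1 | x; theta) = h_theta(x) = sigmoid(theta^T x), for each row of X
    return 1.0 / (1.0 + np.exp(-(X @ theta)))

def predict(theta, X):
    # label 1 (class B) when h_theta(x) > 0.5, label 0 (class A) otherwise
    return (predict_proba(theta, X) > 0.5).astype(int)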
Next we need to estimate the parameter θ. We use gradient ascent, because setting the partial derivatives to zero does not yield a closed-form solution for θ. Let us first derive the maximum likelihood estimate of θ.
Maximum Likelihood Estimation of θ
From the two expressions above, we can apply maximum likelihood estimation from probability theory to construct the objective. First, the two cases combine into a single probability function:
p(y \mid x; \theta) = (h_\theta(x))^y \, (1 - h_\theta(x))^{1-y}
Because the m samples are independent, their joint distribution factors into the product of the marginal distributions, so the likelihood function is:
L(\theta) = \prod_{i=1}^m p(y_i \mid x_i; \theta) = \prod_{i=1}^m [h_\theta(x_i)]^{y_i} \, [1 - h_\theta(x_i)]^{1 - y_i}
Taking the logarithm gives the log-likelihood:
\ell(\theta) = \log L(\theta) = \sum_{i=1}^m \big[ y_i \log h_\theta(x_i) + (1 - y_i) \log(1 - h_\theta(x_i)) \big]
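For reference, ℓ(θ) can be evaluated directly on data, which is useful for checking that gradient ascent really does increase the likelihood. A minimal sketch (log_likelihood is a hypothetical helper, and the clipping guard against log(0) is an addition, not part of the original derivation), assuming X has shape (m, n) and y is an (m, 1) column of 0/1 labels:

import numpy as np

def log_likelihood(theta, X, y):
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # h_theta(x_i) for every sample
    h = np.clip(h, 1e-12, 1 - 1e-12)         # guard against log(0)
    # sum_i [ y_i log h_i + (1 - y_i) log(1 - h_i) ]
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))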
Taking the partial derivative of the log-likelihood with respect to θ_j, and applying the chain rule together with g′(z) = g(z)(1 − g(z)) from above:
\begin{aligned}
\frac{\partial \ell(\theta)}{\partial \theta_j}
&= \sum_{i=1}^m \left[ \frac{y_i}{h_\theta(x_i)} - \frac{1 - y_i}{1 - h_\theta(x_i)} \right] \frac{\partial h_\theta(x_i)}{\partial \theta_j} \\
&= \sum_{i=1}^m \left[ \frac{y_i}{g(\theta^T x_i)} - \frac{1 - y_i}{1 - g(\theta^T x_i)} \right] g(\theta^T x_i)\,[1 - g(\theta^T x_i)]\,\frac{\partial\, \theta^T x_i}{\partial \theta_j} \\
&= \sum_{i=1}^m \big( y_i\,[1 - g(\theta^T x_i)] - (1 - y_i)\,g(\theta^T x_i) \big)\, x_{ij} \\
&= \sum_{i=1}^m \big( y_i - g(\theta^T x_i) \big)\, x_{ij}
 = \sum_{i=1}^m \big( y_i - h_\theta(x_i) \big)\, x_{ij}
\end{aligned}
where x_{ij} is the j-th component of x_i, since \partial(\theta^T x_i)/\partial \theta_j = x_{ij}.
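Stacking the components ∂ℓ(θ)/∂θ_j over all j gives the whole gradient as one matrix product, ∇_θ ℓ(θ) = X^T (y − h), which is exactly the form used in the implementation below. A minimal sketch under the same shape assumptions as before (log_likelihood_gradient is a hypothetical name):

import numpy as np

def log_likelihood_gradient(theta, X, y):
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    # j-th entry is sum_i (y_i - h_theta(x_i)) * x_ij
    return X.T @ (y - h)                     # shape (n, 1)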
Setting this derivative to zero does not yield an explicit solution for the parameters, so we use gradient ascent to find the value of θ that maximizes the likelihood.
Gradient Ascent for θ
\theta_j := \theta_j + \alpha \nabla_{\theta_j} \ell(\theta)
Substituting the gradient derived above:
\theta_j := \theta_j + \alpha \sum_{i=1}^m \big( y_i - h_\theta(x_i) \big)\, x_{ij}
Code Implementation
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Logistic regression via batch gradient ascent.
# X: (m, n) design matrix; Y: (m, 1) column of 0/1 labels;
# alpha: learning rate; iternum: number of iterations.
def LogisticsRegession(X, Y, alpha, iternum):
    samplesnum, sampleFeature = np.shape(X)
    weights = np.ones((sampleFeature, 1))
    for i in range(iternum):
        hx = sigmoid(X @ weights)            # h_theta(x_i) for every sample
        weights += alpha * X.T @ (Y - hx)    # theta := theta + alpha * X^T (Y - h)
    return weights

def accuracyRate(weights, x, y):
    numSamples = np.size(x, 0)
    hx = sigmoid(x @ weights)
    # print(hx)  # uncomment to inspect the raw predicted probabilities
    hx = hx > 0.5                            # threshold at 0.5
    hx = hx == y                             # elementwise: was the prediction correct?
    print('Logistic regression model accuracy: {0}%'.format(hx.sum() / numSamples * 100))
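A quick end-to-end run on synthetic data (the data below is fabricated purely for illustration; it is not from the original post):

import numpy as np

np.random.seed(0)
m = 200
# two Gaussian blobs: label 0 centered at (-2, -2), label 1 at (2, 2)
X0 = np.random.randn(m // 2, 2) + np.array([-2, -2])
X1 = np.random.randn(m // 2, 2) + np.array([2, 2])
X = np.vstack([X0, X1])
X = np.hstack([np.ones((m, 1)), X])          # prepend a bias column
Y = np.vstack([np.zeros((m // 2, 1)), np.ones((m // 2, 1))])

weights = LogisticsRegession(X, Y, alpha=0.01, iternum=500)
accuracyRate(weights, X, Y)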
Experimental Results