逻辑回归
逻辑回归是机器学习中的经典分方法。虽然它称之为回归,但它是分类算法,这里我觉得此算法是做了一个决策边界的拟合,即决策边界的回归,SVM的分类也可以这样理解。逻辑回归算法步骤描述如下:
- 假设决策函数;
- 引入Sigmoid函数,将函数值映射为概率值;
- 计算损失函数-交叉熵,进行参数更新:
- 若结果收敛,退出计算。
1、决策函数
假设决策函数为
θ
T
X
=
[
θ
0
+
θ
1
+
⋯
+
θ
n
]
⋅
[
1
+
x
+
⋯
+
x
n
−
1
]
T
=
0
\theta ^TX=\left[ \theta _0+\theta _1+\cdots +\theta _n \right] \cdot \left[ 1+x+\cdots +x^{n-1} \right] ^T=0
θTX=[θ0+θ1+⋯+θn]⋅[1+x+⋯+xn−1]T=0
2、假设函数(sigmoid函数)
引入sigmoid函数,将决策函数的函数值映射为某一类的概率,其表达式为:
h
θ
(
X
)
=
1
1
+
e
−
θ
T
X
h_{\theta}\left( X \right) =\frac{1}{1+e^{-\theta ^TX}}
hθ(X)=1+e−θTX1
该函数将决策边界函数映射到[0,1]之间,表现为x为某类的概率。
h
θ
(
X
)
=
{
>
0.5
,
θ
T
X
>
0
0.5
,
θ
T
X
>
0
<
0.5
,
θ
T
X
>
0
h_{\theta}\left( X \right) =\begin{cases} >0.5, \theta ^TX>0\\ 0.5, \theta ^TX>0\\ <0.5, \theta ^TX>0\\ \end{cases}
hθ(X)=⎩⎪⎨⎪⎧>0.5,θTX>00.5,θTX>0<0.5,θTX>0
两者之间得关系如下图所示:
3、预测函数
预测函数为:
Y
=
s
i
g
n
a
l
(
h
θ
(
X
)
)
=
{
0
,
h
θ
(
X
)
<
t
h
r
e
s
h
o
l
d
1
,
h
θ
(
X
)
>
t
h
r
e
s
h
o
l
d
Y=signal\left( h_{\theta}\left( X \right) \right) =\begin{cases} 0, h_{\theta}\left( X \right) <threshold\,\,\\ 1, h_{\theta}\left( X \right) >threshold\\ \end{cases}
Y=signal(hθ(X))={0,hθ(X)<threshold1,hθ(X)>threshold
其中,threshold为阈值,其设置依实际需求而定,比如,当根据肿瘤大小判断是否得癌症时,可以将阈值设置较小,以使患者进行更加全面得检查。
4、损失函数(cost function)
使用均方误差(MSE)作为损失函数,参数容易陷入局部最优,因此这里使用交叉熵作为损失函数。
cos
t
(
h
θ
(
x
)
,
y
)
=
{
−
log
(
h
θ
(
x
)
)
,
y
=
1
−
log
(
1
−
h
θ
(
x
)
)
,
y
=
0
,
即
cos
t
(
h
θ
(
x
)
,
y
)
=
−
y
log
(
h
θ
(
x
)
)
−
(
1
−
y
)
log
(
1
−
h
θ
(
x
)
)
\cos t\left( h_{\theta}\left( x \right) ,y \right) =\left\{ \begin{array}{c} \begin{array}{l} -\log \left( h_{\theta}\left( x \right) \right) , y=1\\ -\log \left( 1-h_{\theta}\left( x \right) \right) , y=0\\ \end{array}\\ \end{array} \right. \\ ,\text{即}\cos t\left( h_{\theta}\left( x \right) ,y \right) =-y\log \left( h_{\theta}\left( x \right) \right) -\left( 1-y \right) \log \left( 1-h_{\theta}\left( x \right) \right)
cost(hθ(x),y)={−log(hθ(x)),y=1−log(1−hθ(x)),y=0,即cost(hθ(x),y)=−ylog(hθ(x))−(1−y)log(1−hθ(x))
所以
J
(
θ
)
=
1
m
∑
i
=
1
m
cos
t
(
h
θ
(
x
(
i
)
)
,
y
(
i
)
)
=
−
1
m
∑
i
=
0
m
(
y
log
(
h
θ
(
x
(
i
)
)
)
+
(
1
−
y
(
i
)
)
log
(
1
−
h
θ
(
x
(
i
)
)
)
)
J\left( \theta \right) =\frac{1}{m}\sum_{i=1}^m{\begin{array}{c} \cos t\left( h_{\theta}\left( x^{\left( i \right)} \right) ,y^{\left( i \right)} \right) =-\frac{1}{m}\sum_{i=0}^m{\left( y\log \left( h_{\theta}\left( x^{\left( i \right)} \right) \right) +\left( 1-y^{\left( i \right)} \right) \log \left( 1-h_{\theta}\left( x^{\left( i \right)} \right) \right) \right)}\\ \end{array}}
J(θ)=m1i=1∑mcost(hθ(x(i)),y(i))=−m1∑i=0m(ylog(hθ(x(i)))+(1−y(i))log(1−hθ(x(i))))
为取得损失函数的最小值,使用梯度下降算法进行参数更新。
计算损失函数关于参数θ的梯度,其结果为:
∂
J
(
θ
)
∂
θ
j
=
1
m
∑
i
=
1
m
(
h
θ
(
x
(
i
)
)
−
y
(
i
)
)
x
j
(
i
)
\frac{\partial J\left( \theta \right)}{\partial \theta _j}=\frac{1}{m}\sum_{i=1}^m{\left( h_{\theta}\left( x^{\left( i \right)} \right) -y^{\left( i \right)} \right)}x_{j}^{\left( i \right)}
∂θj∂J(θ)=m1i=1∑m(hθ(x(i))−y(i))xj(i)
所以参数更新公式为:
θ
n
e
w
=
θ
o
l
d
−
α
∂
J
(
θ
)
∂
θ
j
=
θ
o
l
d
−
α
1
m
∑
i
=
1
m
(
h
θ
(
x
(
i
)
)
−
y
(
i
)
)
x
j
(
i
)
\theta ^{new}=\theta ^{old\:}-\alpha \frac{\partial J\left( \theta \right)}{\partial \theta _j}=\theta ^{old\:}-\alpha \frac{1}{m}\sum_{i=1}^m{\left( h_{\theta}\left( x^{\left( i \right)} \right) -y^{\left( i \right)} \right)}x_{j}^{\left( i \right)}
θnew=θold−α∂θj∂J(θ)=θold−αm1i=1∑m(hθ(x(i))−y(i))xj(i)
5、代码实现及可视化
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
import matplotlib as mpl
from sklearn.metrics import confusion_matrix
%matplotlib
np.random.seed(3)
X, Y = make_classification(n_samples=100,n_features=2, n_redundant=0, n_informative=2,
n_clusters_per_class=1, n_classes=2)
x_train,x_test,y_train,y_test = train_test_split(X,Y,test_size=0.2)
x_train=np.hstack((x_train,np.ones([y_train.shape[0],1])*0.01))
x_test=np.hstack((x_test,np.ones([y_test.shape[0],1])*0.01))
m=x_train.shape[0]
Theta=np.random.rand(x_train.shape[1])
def h(x,theta):#假设函数
h1=[]
for i in range(len(x)):
h1.append(1/(1+np.power(np.e,-np.dot(theta,x[i]))))
return np.array(h1)
#print(h(x_train,Theta).shape)
def partial_theta(x,theta,y):#求偏导
return np.dot(h(x,theta)-y,x)/m
#print(partial_theta(x_train,Theta,y_train).shape)
def J(x,y,theta):#损失函数
return -np.dot(np.ones(x.shape[0]),y*np.log(h(x,theta))+(1-y)*np.log(1-h(x,theta)))/m
lr=0.02
cost=[]
iteration_times=3000
def BGD(x,theta,y):#Batch gradient decent#梯度下降进行参数更新
for i in range(iteration_times):
cost.append(J(x,y,theta))
theta=theta-lr*partial_theta(x,theta,y)
def LR_predict(x,threshold):#预测函数
y_pred=h(x,Theta)
for i in range(len(y_pred)):
if y_pred[i]>=threshold:
y_pred[i]=1
else:
y_pred[i]=0
return y_pred
def visualization(resolution):
fig=plt.figure(figsize=(20,10))
cmp = mpl.colors.ListedColormap(['b','g'])
ax1=fig.add_subplot(2,2,1)
ax1.plot(range(iteration_times),cost)
ax2=fig.add_subplot(2,2,2)
plt.xlim(x_train[:,0].min()-0.2, x_train[:,0].max()+0.2)
plt.ylim(x_train[:,1].min()-0.2, x_train[:,1].max()+0.2)
x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1 #第一个特征取值范围作为横轴
x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1 #第二个特征取值范围作为纵轴
xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
np.arange(x2_min, x2_max, resolution)) #reolution是网格剖分粒度,xx1和xx2数组维度一样
Z = LR_predict(np.array([xx1.ravel(), xx2.ravel(),np.ones(len(xx1.ravel()))*0.01]).T,0.5)
Z = Z.reshape(xx1.shape)
plt.pcolormesh(xx1,xx2, Z, cmap=plt.cm.Paired)
# plt.contourf(xx1, xx2, Z, alpha=0.4,cmap =cmp )
ax2.scatter(x_test[:, 0], x_test[:, 1], alpha=0.8,marker='o', c=y_test)
ax2.scatter(x_train[:, 0], x_train[:, 1], alpha=0.8,marker='x', c=y_train)
#决策边界
ax2.plot(np.arange(X[:, 0].min(), X[:, 0].max(),0.1),-Theta[0]/Theta[1]*np.arange(X[:, 0].min(), X[:, 0].max(),0.1)-Theta[2]/Theta[1]*0.01)
confusion = confusion_matrix(LR_predict(x_test,0.5), y_test)
ax3=fig.add_subplot(2,2,3)
ax3.imshow(confusion, cmap=plt.cm.Blues)
thresh=0.2
for first_index in range(len(confusion)):
for second_index in range(len(confusion[first_index])):
plt.text(first_index, second_index, confusion[first_index][second_index],
color="red" if confusion[first_index][second_index]>thresh else "blue")
plt.show()
if __name__=="__main__":
BGD(x_train,Theta,y_train)
visualization(0.08)
print(Theta)