Machine Learning - Logistic Regression

Nora Taki

Article link: https://blog.csdn.net/u011974819/article/details/100111961

1. Goal

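Linear regression is a regression algorithm. It uses the learner trained on the training set to predict a continuous value for new data. Logistic regression, by contrast, is at heart a classification algorithm. It fits a curve and uses that curve as a decision boundary that separates points with different label values.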

2. How it works

2.1 The sigmoid function

$y=\frac{1}{1+e^{-z}}$
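The sigmoid has four useful properties:

- As the input tends to $+\infty$, it tends to 1.
- As the input tends to $-\infty$, it tends to 0.
- It equals 0.5 at $z = 0$.
- The curve is smooth everywhere.

These properties let us use the sigmoid to decide the label of a sample.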
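Here is a minimal NumPy sketch of the sigmoid formula. Two details are illustrative choices, not taken from the original: the function name `sigmoid`, and the rewrite through $e^{-|z|}$. The rewrite keeps `np.exp` from overflowing for large negative inputs while returning the same values as $\frac{1}{1+e^{-z}}$.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic sigmoid 1 / (1 + exp(-z)), computed stably."""
    # exp(-|z|) lies in (0, 1], so it can never overflow
    e = np.exp(-np.abs(z))
    # z >= 0: 1 / (1 + exp(-z));  z < 0: exp(z) / (1 + exp(z)), the same value
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

print(sigmoid(np.array([-1000.0, -5.0, 0.0, 5.0, 1000.0])))
```

Both branches of `np.where` describe the same function. Only the form that stays finite is selected for each sign of $z$.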

2.2 Decision boundary and loss function

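Suppose there are two classes of points in the space, and we want to find the boundary that separates them.
Let $f(x) = \theta^T x$ and take $f(x) = 0$ as the boundary. We want every positive sample to satisfy $f(x) > 0$ and every negative sample to satisfy $f(x) < 0$.
Passing $f(x)$ through the sigmoid gives $S(f(x)) = \frac{1}{1+e^{-f(x)}}$. This value always lies in $(0, 1)$, and $S(f(x)) = 0.5$ when $f(x) = 0$.
Seen another way, $S(f(x))$ can be read as the probability $P$ that the sample is positive. When $f(x) > 0$, $P > 0.5$; when $f(x) < 0$, $P < 0.5$.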
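The claim that $f(x) > 0$ and $P > 0.5$ pick out the same samples can be checked numerically. The sketch below uses a hypothetical parameter vector `theta` (bias first) and random points; neither comes from the original. It only shows that thresholding the probability at 0.5 is the same as asking which side of the boundary $f(x) = 0$ a point lies on.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable sigmoid: exp(-|z|) never overflows
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

# Hypothetical parameters: theta[0] is the bias, theta[1:] are the feature weights
theta = np.array([-1.0, 2.0, -0.5])

# Random 2-D points; prepend a column of 1s so that f(x) = theta^T x includes the bias
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

f = X_b @ theta   # signed score: which side of the boundary f(x) = 0
p = sigmoid(f)    # P(y = 1 | x)

pred_from_f = (f > 0).astype(int)    # side of the boundary
pred_from_p = (p > 0.5).astype(int)  # probability threshold at 0.5
print(np.array_equal(pred_from_f, pred_from_p))
```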

This leads to the log loss, also called binary cross-entropy.
Let $h_\theta(x) = S(f(x))$. The cost of a single sample is:
$Cost(h_\theta(x), y) = \begin{cases}-\log(h_\theta(x)), & y = 1\\ -\log(1-h_\theta(x)), & y = 0\end{cases}$

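When $y = 1$ (a positive sample), the cost goes to 0 as $f(x)$ approaches $+\infty$. In that limit, $h_\theta(x)$, the probability $P$ of being positive, approaches 1.
When $y = 0$ (a negative sample), the cost goes to 0 as $f(x)$ approaches $-\infty$. In that limit, $h_\theta(x)$ approaches 0.
In both cases the cost grows without bound as $h_\theta(x)$ moves toward the wrong extreme, so confident mistakes are penalized heavily.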
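To see this numerically, here is a hypothetical helper `cost(h, y)` that simply evaluates the two branches of the cost formula. Confident correct predictions cost almost nothing, while confident wrong ones are expensive.

```python
import numpy as np

def cost(h, y):
    """Per-sample log loss for a predicted probability h in (0, 1) and a label y in {0, 1}."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

# Compare a confident, an undecided and an opposite prediction for both labels
for h in (0.01, 0.5, 0.99):
    print(f"h={h:.2f}  cost(y=1)={cost(h, 1):.4f}  cost(y=0)={cost(h, 0):.4f}")
```

At $h = 0.5$ both branches give $\log 2$, the cost of knowing nothing.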
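Averaging the cost over all samples gives the loss function, where $m$ is the number of samples:
$J(\theta) = \frac{1}{m}\sum^m_{i=1}Cost(h_\theta(x^{(i)}), y^{(i)})$
Because $y \in \{0, 1\}$, the two cases of $Cost(h_\theta(x^{(i)}), y^{(i)})$ combine into one expression:
$J(\theta) = -\frac{1}{m}\sum^m_{i=1}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$
To reduce overfitting, add an L2 regularization term. Here $n$ is the number of feature coefficients; the bias $\theta_0$ is conventionally left out of the penalty:
$J(\theta) = -\frac{1}{m}\sum^m_{i=1}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m}\sum^n_{j=1}\theta^2_j$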
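The regularized $J(\theta)$ translates almost line for line into vectorized NumPy. This sketch rests on three assumptions not stated in the original:

- `X` already contains a leading column of 1s, so `theta[0]` acts as the bias.
- `theta[0]` is excluded from the penalty, matching $\sum_{j=1}^n$.
- Probabilities are clipped by a small `eps` to avoid $\log 0$.

```python
import numpy as np

def sigmoid(z):
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def logistic_loss(theta, X, y, lam=0.0, eps=1e-12):
    """Regularized cross-entropy J(theta).

    X: (m, n+1) design matrix whose first column is all 1s (bias).
    y: (m,) labels in {0, 1}.
    theta[0] (the bias) is not regularized, matching the sum over j = 1..n.
    """
    m = X.shape[0]
    h = np.clip(sigmoid(X @ theta), eps, 1 - eps)   # keep log() finite
    data_term = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return data_term + reg_term

# Tiny hypothetical dataset: 3 samples, bias column plus one feature
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1, 0, 1])
print(logistic_loss(np.zeros(2), X, y))  # log(2), about 0.6931
```

With `theta` all zeros every $h_\theta(x^{(i)}) = 0.5$, so the loss equals $\log 2$ whatever the labels are. That makes a handy sanity check.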

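The problem therefore reduces to minimizing $J(\theta)$. Because $J(\theta)$ is convex, gradient descent can still find its minimum iteratively. Solving for $\theta$ gives the classifier $h_\theta(x)$, which can then predict new data. Its output is a value in $(0, 1)$, interpreted as the probability that the sample belongs to the positive class.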
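Differentiating the regularized $J(\theta)$ gives the standard gradient:

$\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} + \frac{\lambda}{m}\theta_j$

The last term is dropped for $j = 0$. Below is a minimal batch gradient descent sketch. The learning rate, the iteration count and the two-blob synthetic data are illustrative assumptions, not values from the original article.

```python
import numpy as np

def sigmoid(z):
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def fit_logistic_gd(X, y, lam=0.0, lr=0.1, n_iter=5000):
    """Batch gradient descent on the regularized cross-entropy J(theta).

    X: (m, n) features WITHOUT a bias column; a column of 1s is added here.
    Returns theta of shape (n+1,), with theta[0] the (unregularized) bias.
    """
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])
    theta = np.zeros(n + 1)
    for _ in range(n_iter):
        h = sigmoid(Xb @ theta)
        grad = Xb.T @ (h - y) / m          # gradient of the cross-entropy term
        grad[1:] += lam / m * theta[1:]    # L2 term; the bias is not penalized
        theta -= lr * grad
    return theta

def predict_proba(theta, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return sigmoid(Xb @ theta)

# Synthetic data (not from the article): two Gaussian blobs centred at (-2, -2) and (2, 2)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)), rng.normal(2, 1, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

theta = fit_logistic_gd(X, y, lam=1.0)
acc = np.mean((predict_proba(theta, X) > 0.5) == y)
print(theta, acc)
```

A fixed learning rate and iteration count keep the sketch short. In practice a library solver such as the lbfgs used with sklearn's LogisticRegression in section 3 converges faster and needs no hand-tuned step size.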

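2.3 Multiclass classification

The discussion above covers binary classification. For multiclass problems there are two common strategies: one vs one and one vs rest.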

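one vs one
Train one binary classifier for every pair of classes, i.e. $k(k-1)/2$ classifiers for $k$ classes. For a new data point, each classifier votes for one of its two classes, and the class with the most votes is taken as the prediction.
one vs rest
Train one binary classifier per class that answers "this class or not", i.e. $k$ classifiers. Each classifier outputs the probability that a new point belongs to its class, and the class with the highest probability is taken as the prediction.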
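Both strategies are available as wrappers in scikit-learn: sklearn.multiclass.OneVsRestClassifier and OneVsOneClassifier. The sketch below wraps a binary LogisticRegression in each and fits all four iris features. `max_iter=1000` is an assumption, set to avoid convergence warnings on unscaled data. The scores are training-set accuracy only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# One vs rest: k binary classifiers, one per class ("this class or not")
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
# One vs one: k(k-1)/2 binary classifiers, one per pair of classes, combined by voting
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_), ovr.score(X, y))
print(len(ovo.estimators_), ovo.score(X, y))
```

With $k = 3$ classes both wrappers happen to train 3 classifiers. For larger $k$, one vs one grows as $k(k-1)/2$, while one vs rest grows only as $k$.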

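3. Python example - Iris

sklearn's load_iris function provides the Iris dataset, which we use for the experiments below.

3.1 Binary logistic regression with two features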

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
data = load_iris()
data.keys()
# dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
# (newer scikit-learn versions list extra keys such as 'frame')

print(data.feature_names,data.target_names)
# ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'] ['setosa' 'versicolor' 'virginica']

# Scatter plot of setosa vs. versicolor on sepal length (cm) and sepal width (cm)
target = data.target
feature = data.data

# Row indices of each class; iris rows are sorted by class, so each class is one contiguous block
t_setosa = np.argwhere(target==0)
t_versicolor = np.argwhere(target==1)
setosa_feature = feature[t_setosa[0][0]:t_setosa[-1][0] + 1]
versicolor_feature = feature[t_versicolor[0][0]:t_versicolor[-1][0] + 1]

plt.scatter(setosa_feature[:,0], setosa_feature[:,1], marker='+', c='c', s=60, linewidth=2)
plt.scatter(versicolor_feature[:,0], versicolor_feature[:,1], marker='.', c='r', s=60, linewidth=2)
plt.title('iris')
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.legend(['setosa','versicolor'])
plt.show()


# Concatenate the first two features of both classes, and take the matching labels (rows 0-99)
X = np.concatenate((setosa_feature[:,:2],versicolor_feature[:,:2]),axis=0)
y = target[t_setosa[0][0]:t_versicolor[-1][0] + 1]

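# Fit with logistic regression
from sklearn.linear_model import LogisticRegression

# Note: multi_class is deprecated since scikit-learn 1.5; for this binary problem it can be omitted
clf = LogisticRegression(random_state=0, solver='lbfgs',
                         multi_class='multinomial')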
clf.fit(X,y)
print(clf.coef_,clf.intercept_)
# [[ 1.93673146 -1.9187342 ]] [-4.51924367]

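# Plot the decision boundary: coef[0]*x1 + coef[1]*x2 + intercept = 0,
# i.e. x2 = -(intercept + coef[0]*x1) / coef[1]
predict_x1 = np.arange(4.5,6.5,0.1)
predict_x2 = (clf.intercept_[0] + clf.coef_[0, 0] * predict_x1) / -clf.coef_[0, 1]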

plt.plot(predict_x1,predict_x2,c='g',label='Logistic Regression')
plt.scatter(setosa_feature[:,0], setosa_feature[:,1], marker='+', c='c', s=60, linewidth=2,label='setosa')
plt.scatter(versicolor_feature[:,0], versicolor_feature[:,1], marker='.', c='r', s=60, linewidth=2,label='versicolor')
plt.title('iris')
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.legend()
plt.show()
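The green line in this plot is the set of points where $f(x) = w_1x_1 + w_2x_2 + b = 0$. The sketch below is a separate, self-contained refit that omits the deprecated multi_class argument, so its coefficients may differ slightly from the printed ones. It checks two things:

- Predicting class 1 exactly when $f(x) > 0$ reproduces `clf.predict`.
- The positive-class probability is the sigmoid of $f(x)$, as derived in section 2.2.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data = load_iris()
mask = data.target < 2               # keep setosa (0) and versicolor (1)
X = data.data[mask, :2]              # sepal length (cm), sepal width (cm)
y = data.target[mask]

# Plain binary logistic regression; max_iter raised only as a precaution
clf = LogisticRegression(random_state=0, max_iter=1000).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

f = X @ w + b                             # f(x) = w1*x1 + w2*x2 + b
manual_pred = (f > 0).astype(int)         # class 1 exactly when f(x) > 0
manual_proba = 1.0 / (1.0 + np.exp(-f))   # S(f(x))

print(np.array_equal(manual_pred, clf.predict(X)))
print(np.allclose(manual_proba, clf.predict_proba(X)[:, 1]))
```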


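3.2 Binary logistic regression with three features

# Fit logistic regression on the first three features (sepal length, sepal width, petal length)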
from sklearn.linear_model import LogisticRegression
X = np.concatenate((setosa_feature[:,:3],versicolor_feature[:,:3]),axis=0)
y = target[t_setosa[0][0]:t_versicolor[-1][0] + 1]

clf = LogisticRegression(random_state=0, solver='lbfgs',
                         multi_class='multinomial')
clf.fit(X,y)
print(clf.coef_,clf.intercept_)
# [[ 0.26169734 -0.60095286  1.48847771]] [-3.59941687]

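# Plot the decision plane: coef[0]*x1 + coef[1]*x2 + coef[2]*x3 + intercept = 0,
# i.e. x3 = -(intercept + coef[0]*x1 + coef[1]*x2) / coef[2]
predict_x1 = np.arange(4.5,6.5,0.1)
predict_x2 = np.arange(2.5,4,0.1)
predict_x1, predict_x2 = np.meshgrid(predict_x1, predict_x2)
predict_x3 = (clf.intercept_[0] + clf.coef_[0, 0] * predict_x1 + clf.coef_[0, 1] * predict_x2) / -clf.coef_[0, 2]

fig = plt.figure(figsize=(10,6))
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') no longer works in Matplotlib >= 3.6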
ax.scatter(setosa_feature[:,0], setosa_feature[:,1], setosa_feature[:,2], zdir='z', s=20, c='c', depthshade=True)
ax.scatter(versicolor_feature[:,0], versicolor_feature[:,1], versicolor_feature[:,2], zdir='z', s=20, c='r', depthshade=True)
surf = ax.plot_surface(predict_x1, predict_x2, predict_x3, rstride=1, cstride=1,linewidth=0, antialiased=False)
ax.set_xlabel('sepal length (cm)')
ax.set_ylabel('sepal width (cm)')
ax.set_zlabel('petal length (cm)')
ax.legend(['setosa','versicolor'])
ax.set_xlim(4.2, 7)
ax.set_ylim(2.3, 4.5)
ax.set_zlim(1, 6)
ax.view_init(elev=2., azim=25)
plt.show()
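One way to confirm the plane formula, including the $x_2$ term, is to evaluate a fitted model on the grid points. Every point on the decision plane should have $f(x) = 0$ and a predicted probability of 0.5. This sketch is a self-contained refit without multi_class. It checks the formula rather than reproducing the printed coefficients.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data = load_iris()
mask = data.target < 2
X = data.data[mask, :3]       # sepal length, sepal width, petal length
y = data.target[mask]
clf = LogisticRegression(random_state=0, max_iter=1000).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Decision plane: w0*x1 + w1*x2 + w2*x3 + b = 0  =>  x3 = -(b + w0*x1 + w1*x2) / w2
x1, x2 = np.meshgrid(np.arange(4.5, 6.5, 0.1), np.arange(2.5, 4, 0.1))
x3 = -(b + w[0] * x1 + w[1] * x2) / w[2]

# Every grid point on the plane should give f(x) = 0, i.e. P(y = 1 | x) = 0.5
plane_points = np.column_stack([x1.ravel(), x2.ravel(), x3.ravel()])
f_plane = clf.decision_function(plane_points)
print(np.abs(f_plane).max())
```

If the $x_2$ term is dropped from the formula for `x3`, the maximum of `np.abs(f_plane)` jumps well above zero. That makes this a quick regression test for the plotting code.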