2. Logistic regression

Cherish450

Last modified 2023-08-09 10:23:07

Category: Machine Learning. Tags: logistic regression, algorithm, machine learning

First published 2023-08-08 20:40:35

Article link: https://blog.csdn.net/qq_53453329/article/details/132174926

Logistic regression

1. Introduction:
2. Derivation:
3. Classification in practice

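1. Introduction:

1. A classic binary classification algorithm that also extends to multi-class problems
2. Can also handle non-linear decision boundaries, for example after adding polynomial features
3. Despite the name "regression", it solves classification problems: the input is combined linearly, and the result is passed through a non-linear function (the sigmoid)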

2. Derivation:

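1. Sigmoid function: g(z) = 1 / (1 + e^(-z))
The input can be any real number, and the output lies in the open interval (0, 1).
Mapping any input into (0, 1) converts a raw value into a probability, which is what classification needs: compare the probabilities of the two classes and pick the class with the larger one.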
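
The mapping from a raw value to a probability can be tried out directly. The following is a minimal NumPy sketch; the function name `sigmoid` and the sample inputs are chosen here for illustration and do not come from the original post.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) into the open interval (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])   # illustrative raw scores
p = sigmoid(z)
print(p)           # probability of the positive class for each score
print(p >= 0.5)    # positive class is chosen when the probability reaches 0.5
```

Because g(-z) = 1 - g(z), the probabilities of the two classes always sum to 1, so comparing them is the same as thresholding one of them at 0.5.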

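2. Extending to logistic regression:
1. Prediction function: the sigmoid is applied to the linear combination of the features, h_θ(x) = g(θᵀx)

2. Classification task
The model treats h_θ(x) as the probability of class 1 and 1 - h_θ(x) as the probability of class 0. The two cases combine into a single expression for P(y | x; θ).
3. Build the likelihood function
The likelihood is the product of these probabilities over all training samples; taking the logarithm turns the product into a sum.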

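Maximizing the log-likelihood would call for gradient ascent. Negating and averaging it gives a loss function to minimize instead, so gradient descent can be used.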
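
The formulas for these steps were images in the original post and are not available. The standard textbook form of the derivation, which the steps above appear to follow, is sketched below. The symbols m (number of samples), x^{(i)}, y^{(i)} and the 1/m scaling of the loss are assumptions made here.

```latex
\begin{aligned}
h_\theta(x) &= g(\theta^{T}x) = \frac{1}{1+e^{-\theta^{T}x}} \\
P(y=1 \mid x;\theta) &= h_\theta(x), \qquad P(y=0 \mid x;\theta) = 1-h_\theta(x) \\
P(y \mid x;\theta) &= h_\theta(x)^{y}\,\bigl(1-h_\theta(x)\bigr)^{1-y} \\
L(\theta) &= \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}}\,\bigl(1-h_\theta(x^{(i)})\bigr)^{1-y^{(i)}} \\
l(\theta) &= \log L(\theta) = \sum_{i=1}^{m}\Bigl[\,y^{(i)}\log h_\theta(x^{(i)}) + \bigl(1-y^{(i)}\bigr)\log\bigl(1-h_\theta(x^{(i)})\bigr)\Bigr] \\
J(\theta) &= -\frac{1}{m}\,l(\theta)
\end{aligned}
```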

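4. Find the minimum with gradient descent
Each iteration moves θ a small step against the gradient of the loss until it converges.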
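
Taking the loss as the negative average log-likelihood, its gradient with respect to θ_j is (1/m) Σ (h_θ(x_i) - y_i) x_ij, and each step updates θ := θ - α ∇J. The following is a minimal NumPy sketch of that loop, not the implementation scikit-learn uses. The function name `fit_logistic_gd`, the learning rate, the iteration count and the toy data are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iters=5000):
    """Batch gradient descent on the negative average log-likelihood."""
    m = X.shape[0]
    Xb = np.c_[np.ones(m), X]          # prepend a bias column of ones
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        h = sigmoid(Xb @ theta)        # predicted probabilities
        grad = Xb.T @ (h - y) / m      # gradient of the loss
        theta -= lr * grad             # step against the gradient
    return theta

# Toy data (hypothetical): one feature, class 1 tends to have larger values
rng = np.random.default_rng(0)
X = np.r_[rng.normal(-2, 1, 50), rng.normal(2, 1, 50)].reshape(-1, 1)
y = np.r_[np.zeros(50), np.ones(50)]

theta = fit_logistic_gd(X, y)
pred = (sigmoid(np.c_[np.ones(len(X)), X] @ theta) >= 0.5).astype(int)
print("theta:", theta)
print("training accuracy:", (pred == y).mean())
```

The first entry of `theta` is the bias term and the second is the weight of the single feature.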
3. Generalizing from binary to multi-class classification
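
The figure for this step is missing from the original. One common way to generalize, and the one that matches the softmax regression model trained later in the post, is the softmax function. The notation (K classes, one parameter vector θ_k per class) is an assumption made here.

```latex
P(y = k \mid x;\theta) = \frac{e^{\theta_k^{T}x}}{\sum_{j=1}^{K} e^{\theta_j^{T}x}}, \qquad k = 1,\dots,K
```

With K = 2 this reduces to the sigmoid applied to (θ_1 - θ_2)ᵀx, so the binary model is a special case.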

3. Classification in practice

1. Built-in sklearn dataset

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
print(list(iris.keys()))  # ['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module']
# DESCR: full description of the dataset
print(type(iris))  # <class 'sklearn.utils._bunch.Bunch'>

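2. Classifying on a single feature

1. Preprocess the data into a binary classification problem

X = iris['data'][:,3:]  # petal width only
# np.int was removed in NumPy 1.24; the built-in int works in all versions
y = (iris['target'] == 2).astype(int)  # 1 = Iris-Virginica, 0 = other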

2. Train the model

from sklearn.linear_model import LogisticRegression
log_res = LogisticRegression()
log_res.fit(X,y)

3. Test the model

X_new = np.linspace(0,3,1000).reshape(-1,1)
y_proba = log_res.predict_proba(X_new)
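
To connect the fitted model back to the sigmoid, the sketch below recomputes the probability by hand from `coef_` and `intercept_` and compares it with `predict_proba`. It repeats the data loading so that it runs on its own; the name `manual_proba` is introduced here for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris['data'][:, 3:]                    # petal width only
y = (iris['target'] == 2).astype(int)      # 1 = Iris-Virginica, 0 = other

log_res = LogisticRegression()
log_res.fit(X, y)

X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
y_proba = log_res.predict_proba(X_new)

# Linear score theta^T x, then the sigmoid
z = X_new @ log_res.coef_.T + log_res.intercept_    # shape (1000, 1)
manual_proba = 1.0 / (1.0 + np.exp(-z.ravel()))

# True when the hand-computed values match column 1 of predict_proba
print(np.allclose(manual_proba, y_proba[:, 1]))
```

Column 1 of `predict_proba` is the probability of the positive class, and column 0 is its complement.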

As the feature value changes, the predicted probability changes with it.

plt.figure(figsize=(12,5))
# First test point whose Virginica probability reaches 0.5; [0,0] extracts a scalar
decision_boundary = X_new[y_proba[:,1]>=0.5][0,0]
print(y_proba[:5,1])
# Vertical dotted line at the decision boundary
plt.plot([decision_boundary,decision_boundary],[-1,2],'k:',linewidth = 2)
plt.plot(X_new,y_proba[:,1],'g-',label = 'Iris-Virginica')
plt.plot(X_new,y_proba[:,0],'b--',label = 'Not Iris-Virginica')
# Arrows pointing toward the region of each class
plt.arrow(decision_boundary,0.08,-0.3,0,head_width = 0.05,head_length=0.1,fc='b',ec='b')
plt.arrow(decision_boundary,0.92,0.3,0,head_width = 0.05,head_length=0.1,fc='g',ec='g')
plt.text(decision_boundary+0.02,0.15,'Decision Boundary',fontsize = 16,color = 'k',ha='center')
plt.xlabel('Petal width (cm)',fontsize = 16)
plt.ylabel('y_proba',fontsize = 16)
plt.axis([0,3,-0.02,1.02])
plt.legend(loc = 'center left',fontsize = 16)
plt.show()
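
The boundary in the plot is read off the grid of 1000 test points. Because the model is a sigmoid of a linear score, the 0.5 threshold is also the point where the score is zero, that is x = -intercept / coef. The sketch below compares the two values. It is self-contained, and `boundary_grid` and `boundary_exact` are names introduced here.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris['data'][:, 3:]                    # petal width only
y = (iris['target'] == 2).astype(int)      # 1 = Iris-Virginica, 0 = other
log_res = LogisticRegression().fit(X, y)

X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
y_proba = log_res.predict_proba(X_new)

# Boundary from the grid: first point with probability >= 0.5
boundary_grid = X_new[y_proba[:, 1] >= 0.5][0, 0]
# Boundary from the parameters: where the linear score equals zero
boundary_exact = -log_res.intercept_[0] / log_res.coef_[0, 0]
print(boundary_grid, boundary_exact)
```

The two values should agree to within one grid step, since the grid version can only land on one of the sampled points.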

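3. Decision boundary

Steps:
1. Build the coordinate grid over a sensible range, chosen from the data actually used for training
(find the minimum and maximum along each axis)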

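# The ranges printed below imply two features (petal length, petal width),
# so rebuild X and refit the binary model on both columns
X = iris['data'][:,(2,3)]
y = (iris['target'] == 2).astype(int)
log_res = LogisticRegression().fit(X,y)
print(X[:,0].min(),X[:,0].max(),X[:,1].min(),X[:,1].max())
# 1.0 6.9 0.1 2.5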

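2. Combine the coordinates to get every input point on the grid
3. Predict to get the probability at every point
4. Draw contour lines to show the decision boundary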

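# Create the figure first so that the points and the contours share the same axes
plt.figure(figsize=(10,4))
# Training points: blue squares = not Virginica, green triangles = Virginica
plt.plot(X[y==0,0],X[y==0,1],'bs')
plt.plot(X[y==1,0],X[y==1,1],'g^')

# 1. Build the grid over the feature ranges
x0,x1 = np.meshgrid(np.linspace(1.0,7,500),np.linspace(0.1,2.7,200))
# 2. Flatten and stack into an array of (petal length, petal width) points
X_new = np.c_[x0.ravel(),x1.ravel()]
y_proba = log_res.predict_proba(X_new)
# 3. Use the predicted probability as the height, like elevation on a terrain map;
#    it must be reshaped back to the shape of the grid
zz = y_proba[:,1].reshape(x0.shape)
# Draw the contour lines
contour = plt.contour(x0,x1,zz,cmap=plt.cm.brg)
# Label the contour lines
plt.clabel(contour,inline = 1)
plt.axis([2.9,7,0.8,2.7])
plt.text(3.5,1.5,'NOT Vir',fontsize = 16,color = 'b')
plt.text(6.5,2.3,'Vir',fontsize = 16,color = 'g')
plt.show()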

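4. Multi-class classification

A multi-class problem can be treated as several binary classifications (one-vs-rest). The code in this section instead uses softmax (multinomial) regression, which models all classes jointly.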
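
Treating the problem as several binary classifications corresponds to the one-vs-rest strategy: one binary logistic regression per class, with the highest-scoring class winning. A minimal sketch with scikit-learn's `OneVsRestClassifier` is shown here for comparison only; the variable name `ovr_clf` is introduced for illustration.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

iris = datasets.load_iris()
X = iris['data'][:, (2, 3)]     # petal length, petal width
y = iris['target']              # three classes: 0, 1, 2

# Wrap a binary logistic regression so that one copy is trained per class
ovr_clf = OneVsRestClassifier(LogisticRegression())
ovr_clf.fit(X, y)

print(len(ovr_clf.estimators_))     # number of binary models that were trained
print(ovr_clf.predict([[5, 2]]))    # predicted class for one sample
print(ovr_clf.score(X, y))          # accuracy on the training data
```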
1. Prepare the data

X = iris['data'][:,(2,3)]
y = iris['target']

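2. Train the model

# The multi_class argument is deprecated since scikit-learn 1.5; with the lbfgs
# solver and more than two classes, the multinomial (softmax) model is already the default
softmax_reg = LogisticRegression(solver='lbfgs')
softmax_reg.fit(X,y)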

1. Predict the class directly
softmax_reg.predict([[5,2]])
# array([2])
2. Predict the probability of each class
softmax_reg.predict_proba([[5,2]])
# array([[2.43559894e-04, 2.14859516e-01, 7.84896924e-01]])
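
For this kind of model the values returned by `predict_proba` are the softmax of the per-class linear scores. The sketch below recomputes them from `decision_function` and checks that they match. The helper `softmax` is defined here for illustration, and the comment about the default model assumes a recent scikit-learn version.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

def softmax(scores):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

iris = datasets.load_iris()
X = iris['data'][:, (2, 3)]     # petal length, petal width
y = iris['target']

# Assumption: with lbfgs and three classes, recent versions fit a multinomial model
softmax_reg = LogisticRegression(solver='lbfgs')
softmax_reg.fit(X, y)

scores = softmax_reg.decision_function([[5, 2]])    # shape (1, 3), one score per class
manual = softmax(scores)
print(manual)
print(np.allclose(manual, softmax_reg.predict_proba([[5, 2]])))
```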

3. Plot the results
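
The original figure is not available. The following is a sketch of how a comparable plot could be produced, reusing the grid approach from the decision boundary section. The grid ranges, colors, markers and the choice to draw contours for the Versicolor probability are decisions made here, not taken from the post.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris['data'][:, (2, 3)]     # petal length, petal width
y = iris['target']
softmax_reg = LogisticRegression(solver='lbfgs').fit(X, y)

# Grid over the feature plane
x0, x1 = np.meshgrid(np.linspace(0, 8, 500), np.linspace(0, 3.5, 200))
X_grid = np.c_[x0.ravel(), x1.ravel()]
zz = softmax_reg.predict(X_grid).reshape(x0.shape)                      # predicted class per point
zz_proba = softmax_reg.predict_proba(X_grid)[:, 1].reshape(x0.shape)    # P(Versicolor) per point

fig, ax = plt.subplots(figsize=(10, 4))
ax.contourf(x0, x1, zz, alpha=0.3, cmap=plt.cm.brg)        # shaded decision regions
contour = ax.contour(x0, x1, zz_proba, cmap=plt.cm.brg)    # probability contour lines
ax.clabel(contour, inline=1)
ax.plot(X[y == 0, 0], X[y == 0, 1], 'yo', label='Iris-Setosa')
ax.plot(X[y == 1, 0], X[y == 1, 1], 'bs', label='Iris-Versicolor')
ax.plot(X[y == 2, 0], X[y == 2, 1], 'g^', label='Iris-Virginica')
ax.set_xlabel('Petal length (cm)')
ax.set_ylabel('Petal width (cm)')
ax.legend(loc='upper left')
```

Call `plt.show()` afterwards in an interactive session, or `fig.savefig(...)` with a file name of your choice, to view the figure.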