Datawhale Machine Learning Algorithms - Task 1: Classification with Logistic Regression

cyberEther

Column: Datawhale-Data Analysis. Tags: machine learning, algorithms, visualization, python, logistic regression

Article link: https://blog.csdn.net/weixin_45404964/article/details/108135597

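Introduction

This assignment runs on the Tianchi platform provided by Alibaba Cloud, and the model is scikit-learn's linear_model.LogisticRegression. After completing it you should know: 1. the basic theory of logistic regression, 2. how to implement a prediction task with scikit-learn, 3. how to plot a decision boundary.

Theory

Logistic regression models a response variable that follows a binomial distribution, which is why it is used for classification. The output of a linear model is passed through the sigmoid function to obtain the probability that a sample belongs to a given class. The parameters can be estimated with gradient descent; note that scikit-learn's LogisticRegression uses the lbfgs solver by default rather than plain gradient descent.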
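The theory above can be sketched in a few lines of NumPy. This is a minimal illustration, not what scikit-learn does internally: the helper names `sigmoid` and `fit_logistic_gd`, the learning rate, and the iteration count are assumptions chosen for this example, and no regularization is applied.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iter=2000):
    """Hypothetical helper: batch gradient descent on the mean log-loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)             # predicted P(y = 1) for every sample
        grad_w = X.T @ (p - y) / n_samples  # gradient of the mean log-loss w.r.t. w
        grad_b = np.mean(p - y)             # gradient w.r.t. the bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Same toy data as used later in the article
X = np.array([[-1, -2], [-2, -1], [-3, -2], [1, 3], [2, 1], [3, 2]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)

w, b = fit_logistic_gd(X, y)
proba = sigmoid(X @ w + b)
pred = (proba >= 0.5).astype(int)  # threshold the probability at 0.5
print("weights:", w, "bias:", b)
print("predicted labels:", pred)
```

Because this sketch has no regularization and the toy data is linearly separable, the weights keep growing with more iterations; scikit-learn's default L2 penalty prevents that, so the learned values will differ from `lr_clf.coef_`.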

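Implementation

Fitting and prediction

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

## Build a small demo dataset: six 2-D points in two classes
x_fearures = np.array([[-1, -2], [-2, -1], [-3, -2], [1, 3], [2, 1], [3, 2]])
y_label = np.array([0, 0, 0, 1, 1, 1])

## Create the logistic regression model
lr_clf = LogisticRegression()

## Fit the model to the demo dataset
## The fitted linear score is z = w0 + w1*x1 + w2*x2, and sigmoid(z) gives P(y=1)
lr_clf = lr_clf.fit(x_fearures, y_label)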

Weights and intercept

print(f"weights of the LR: {lr_clf.coef_}")
print(f"the intercept: {lr_clf.intercept_}")
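To make the role of these two attributes concrete, the sketch below recomputes the class-1 probability by hand from `coef_` and `intercept_` and compares it with `predict_proba`. It refits the model on the same toy data so that it runs on its own; the variable names `clf`, `z`, and `manual_proba` are introduced only for this illustration, and the comparison assumes the default binary LogisticRegression settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[-1, -2], [-2, -1], [-3, -2], [1, 3], [2, 1], [3, 2]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)

# Linear score z = w0 + w1*x1 + w2*x2 for every training sample
z = X @ clf.coef_[0] + clf.intercept_[0]

# Sigmoid turns the score into P(y = 1)
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba holds the probability of class 1
sk_proba = clf.predict_proba(X)[:, 1]
print(np.allclose(manual_proba, sk_proba))
```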

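Sample distribution

## Scatter plot of the training samples, colored by class label
plt.figure()
plt.scatter(x_fearures[:,0], x_fearures[:,1], c=y_label, s=50, cmap="viridis")
plt.title('Dataset')
plt.show()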

Visualizing the decision boundary

plt.figure()
plt.scatter(x_fearures[:,0],x_fearures[:,1], c=y_label, s=50, cmap='viridis')
plt.title('Dataset')

## Build a grid that covers the current axis limits
nx, ny = 200, 100
x_min, x_max = plt.xlim()
y_min, y_max = plt.ylim()
x_grid, y_grid = np.meshgrid(np.linspace(x_min, x_max, nx),np.linspace(y_min, y_max, ny))

## Predict P(y=1) at every grid point and draw the contour where it equals 0.5
z_proba = lr_clf.predict_proba(np.c_[x_grid.ravel(), y_grid.ravel()])
z_proba = z_proba[:, 1].reshape(x_grid.shape)
plt.contour(x_grid, y_grid, z_proba, [0.5], linewidths=2., colors='blue')

plt.show()
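The 0.5 contour drawn above is a straight line, because P(y=1) = 0.5 exactly where the linear score w0 + w1*x1 + w2*x2 is zero. As a rough check, the sketch below solves that equation for x2 and confirms that points on the line get a predicted probability of 0.5. The model is refit here so the snippet is self-contained; the names `x1_line` and `x2_line` and the range [-3, 3] are assumptions for illustration, and the formula requires w2 to be nonzero.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[-1, -2], [-2, -1], [-3, -2], [1, 3], [2, 1], [3, 2]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)

w1, w2 = clf.coef_[0]
w0 = clf.intercept_[0]

# Solve w0 + w1*x1 + w2*x2 = 0 for x2 (valid only when w2 != 0)
x1_line = np.linspace(-3, 3, 5)
x2_line = -(w0 + w1 * x1_line) / w2

# Every point on this line should have P(y = 1) = 0.5
boundary_proba = clf.predict_proba(np.c_[x1_line, x2_line])[:, 1]
print(boundary_proba)
```

Plotting `x1_line` against `x2_line` with `plt.plot` should overlay the contour from the grid-based approach, which remains the more general option because it also works for models with curved boundaries.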

Visualizing new samples

plt.figure()
## new point 1
x_fearures_new1 = np.array([[0, -1]])
plt.scatter(x_fearures_new1[:,0],x_fearures_new1[:,1], s=50)
## The label text is passed positionally: the `s` keyword was removed in newer Matplotlib releases
plt.annotate('New point 1',xy=(0,-1),xytext=(-2,0),color='blue',arrowprops=dict(arrowstyle='-|>',connectionstyle='arc3',color='red'))

## new point 2
x_fearures_new2 = np.array([[1, 2]])
plt.scatter(x_fearures_new2[:,0],x_fearures_new2[:,1], s=50)
plt.annotate('New point 2',xy=(1,2),xytext=(-1.5,2.5),color='red',arrowprops=dict(arrowstyle='-|>',connectionstyle='arc3',color='red'))

## Training samples
plt.scatter(x_fearures[:,0],x_fearures[:,1], c=y_label, s=50, cmap='viridis')
plt.title('Dataset')

## Decision boundary (reuses x_grid, y_grid and z_proba from the previous step)
plt.contour(x_grid, y_grid, z_proba, [0.5], linewidths=2., colors='blue')

plt.show()

Predicting the class of the new samples with the trained model

y_label_new1_predict = lr_clf.predict(x_fearures_new1)
y_label_new2_predict = lr_clf.predict(x_fearures_new2)
print('The New point 1 predict class:\n', y_label_new1_predict)
print('The New point 2 predict class:\n', y_label_new2_predict)
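The same `predict` call also works on the training data, which gives a quick sanity check of the fit. The sketch below is an addition for illustration rather than part of the original assignment: it refits the model so it can run alone, and `train_pred` and `train_acc` are names introduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = np.array([[-1, -2], [-2, -1], [-3, -2], [1, 3], [2, 1], [3, 2]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)

# Predict labels for the samples the model was trained on
train_pred = clf.predict(X)

# Fraction of training samples that are classified correctly
train_acc = accuracy_score(y, train_pred)
print("training predictions:", train_pred)
print("training accuracy:", train_acc)
```

Training accuracy alone says little about generalization; on a real dataset a held-out test split would be evaluated the same way.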

Predicting class probabilities with predict_proba

## Each row holds [P(y=0), P(y=1)] for one sample
y_label_new1_predict_proba = lr_clf.predict_proba(x_fearures_new1)
y_label_new2_predict_proba = lr_clf.predict_proba(x_fearures_new2)
print('The New point 1 predict Probability of each class:\n', y_label_new1_predict_proba)
print('The New point 2 predict Probability of each class:\n', y_label_new2_predict_proba)
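The relationship between `predict_proba` and `predict` can be illustrated as follows. The columns of the probability array follow the order of `classes_`, and the predicted label is the class with the highest probability. The snippet refits the model on the toy data so it runs independently; `X_new` and `pred_from_proba` are names assumed for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[-1, -2], [-2, -1], [-3, -2], [1, 3], [2, 1], [3, 2]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)

# The two new points from the article, stacked into one array
X_new = np.array([[0, -1], [1, 2]])
proba = clf.predict_proba(X_new)

# Column j of proba corresponds to clf.classes_[j]; each row sums to 1
print("classes:", clf.classes_)
print("row sums:", proba.sum(axis=1))

# Picking the most probable class reproduces predict()
pred_from_proba = clf.classes_[np.argmax(proba, axis=1)]
print("labels from probabilities:", pred_from_proba)
print("labels from predict():   ", clf.predict(X_new))
```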
