Logistic Regression in Python: A Simple, Easy-to-Follow Implementation

Latest recommended article published on 2022-10-21 11:00:25

weixin_39695306


Article link: https://blog.csdn.net/weixin_39695306/article/details/111759454


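We previously went through how logistic regression classifies data (see 海人, "logistic回归原理分析"); now we implement it in code.

I put "simple and easy to follow" in the title. Why? Because today is also my first time writing logistic regression in Python. Every function and library used here was looked up across many sources and then organized, so I trust you will be able to follow this article.

1. Preparation

First, we need three libraries: numpy, pandas, and scikit-learn (usually called sklearn, both in code and in conversation). You can install them directly with pip, or download the whl files from https://www.lfd.uci.edu/~gohlke/pythonlibs/ and install from those. I will not cover installation in detail, since plenty of tutorials already exist online.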
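A quick way to confirm that the three libraries are installed is to import them and print their versions. This is only a minimal check that I am adding for convenience; the dictionary name `versions` is my own choice:

```python
# Minimal installation check: import each library and report its version.
import numpy as np
import pandas as pd
import sklearn

versions = {
    "numpy": np.__version__,
    "pandas": pd.__version__,
    "scikit-learn": sklearn.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If any of the imports fails with ModuleNotFoundError, that library still needs to be installed.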

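2. Introduction to the sklearn library

sklearn's algorithm map shows that the library covers a large number of machine learning algorithms (including clustering, classification, regression, and dimensionality reduction), which is why it is so widely used. The program in this article uses this package.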

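3. Common sklearn functions and a general template

Some readers get a headache as soon as they see sklearn's machine learning algorithms, because so many functions are involved and looking them up one by one quickly erodes your confidence. I felt the same way when I was learning, so I put together the commonly used functions and a template for loading a dataset (this is only my personal view, and discussion is welcome). The complete code is at the end of the article: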

from sklearn.datasets import load_iris

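First, import a dataset. I chose the iris dataset here; you can also pick another dataset from the official link in (2) (template: from sklearn.datasets import ***). As a brief aside, here is a short introduction to the iris dataset:

As the figure shows (truncated because it is long), the dataset has 150 rows and three labels: 0, 1, and 2. These stand for the three iris species 'setosa', 'versicolor', and 'virginica'; the loaded dataset stores them as numeric labels for easier processing, and data.target_names gives the actual names. There are four features ('sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'). Rows 0 to 49 carry label 0, rows 50 to 99 carry label 1, and rows 100 to 149 carry label 2. Next we load the dataset: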

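import pandas as pd

data = load_iris()

iris_target = data.target  # labels for each sample

iris_features = pd.DataFrame(data=data.data, columns=data.feature_names)  # convert to a pandas DataFrame

As shown above, we get the features (.data) and the labels (.target), and we use pandas to convert the features into a DataFrame. Why? sklearn functions and estimators accept plain NumPy arrays too, so the conversion is not strictly required, but a DataFrame keeps the column names and lets us use pandas functions such as head() and tail(). It is also a step most people routinely take.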
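To make the description of the dataset concrete, the following sketch prints its shape, label names, and per-label counts. It is an illustration I am adding rather than part of the original program; `label_counts` is a name I introduced:

```python
# Inspect the iris dataset: size, label names, and how many rows each label has.
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
iris_features = pd.DataFrame(data=data.data, columns=data.feature_names)

print(iris_features.shape)            # (150, 4): 150 rows, 4 features
print(data.target_names.tolist())     # ['setosa', 'versicolor', 'virginica']

label_counts = np.bincount(data.target)
print(label_counts.tolist())          # [50, 50, 50]

print(iris_features.head())           # first five rows, with column names
```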

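We start with binary classification, so we keep the first 100 rows and drop the last 50, which carry label 2 (write this step according to your own needs):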

iris_features_part = iris_features.iloc[:100]

iris_target_part = iris_target[:100]
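Slicing the first 100 rows only works because the dataset is sorted by label. A small check like the one below confirms that the slice really contains just labels 0 and 1. The boolean-mask variant is an alternative I am suggesting for data that is not sorted, not something the original code uses:

```python
# Confirm that the first 100 rows contain only labels 0 and 1.
import numpy as np
from sklearn.datasets import load_iris

iris_target = load_iris().target
iris_target_part = iris_target[:100]

labels, counts = np.unique(iris_target_part, return_counts=True)
print(labels.tolist(), counts.tolist())   # [0, 1] [50, 50]

# Alternative that does not depend on row order: select by label value.
mask = iris_target < 2
print(np.array_equal(iris_target[mask], iris_target_part))  # True
```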

Next, we split the data into a training set and a test set:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(iris_features_part, iris_target_part, test_size = 0.2, random_state = 1234)

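This function is important and needs its own import. It shuffles the data randomly and then picks the training and test sets. test_size is the fraction of the data used as the test set; I chose 20%, so the remaining 80% becomes the training set. random_state is the random seed: any integer works, and fixing it makes the split reproducible from run to run.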
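The effect of test_size and random_state can be checked directly. The sketch below rebuilds the two-class subset, splits it twice with the same seed, and compares the results; `split_a`, `split_b`, and `same_rows` are names I made up for the illustration:

```python
# Show the split sizes and that a fixed random_state gives the same split each time.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names).iloc[:100]
y = data.target[:100]

split_a = train_test_split(X, y, test_size=0.2, random_state=1234)
split_b = train_test_split(X, y, test_size=0.2, random_state=1234)

x_train, x_test, y_train, y_test = split_a
print(x_train.shape, x_test.shape)   # (80, 4) (20, 4)

# Same seed, same rows in the test set.
same_rows = x_test.index.equals(split_b[1].index)
print(same_rows)                     # True
```

Leaving random_state out is also allowed; the split then changes on every run, and so may the accuracy.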

from sklearn.linear_model import LogisticRegression

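## Define the logistic regression model

clf = LogisticRegression(max_iter=1000)

This step selects the method. We choose logistic regression here; other suitable algorithms work too, as long as the import at the top matches.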

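Start training:

# Train the logistic regression model on the training set

clf.fit(x_train, y_train)

## Inspect the weights w

print('the weight of Logistic Regression:',clf.coef_)

## Inspect the intercept w0

print('the intercept(w0) of Logistic Regression:',clf.intercept_)

The training function is fit(). After fitting, you can inspect the coefficients of the decision boundary (weights coef_, intercept intercept_).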
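To see what coef_ and intercept_ actually do, the probability of label 1 can be recomputed by hand with the sigmoid function, p = 1 / (1 + exp(-(w·x + w0))), and compared with what sklearn reports. This is a sketch under the assumption of the binary setup used above; for brevity it works on NumPy arrays instead of a DataFrame, and `manual_proba` is my own name:

```python
# Recompute the binary class-1 probability from coef_ and intercept_.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_iris()
X, y = data.data[:100], data.target[:100]          # labels 0 and 1 only
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1234)

clf = LogisticRegression(max_iter=1000).fit(x_train, y_train)

w = clf.coef_.ravel()        # one weight per feature, shape (4,)
w0 = clf.intercept_[0]       # scalar intercept

z = x_test @ w + w0                        # linear score w.x + w0
manual_proba = 1.0 / (1.0 + np.exp(-z))    # sigmoid of the score

sklearn_proba = clf.predict_proba(x_test)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))   # True
```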

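Start predicting:

## Use the trained model to predict on the training set and the test set separately

train_predict = clf.predict(x_train)

test_predict = clf.predict(x_test)

The prediction function is predict().

That completes binary classification with logistic regression. We still need to check how good the predictions are, so we use accuracy_score, a function that computes accuracy: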
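In the binary case, predict() amounts to thresholding the class-1 probability at 0.5. The sketch below checks this on the same two-class split; it is my own illustration, and `thresholded` is a name I introduced:

```python
# predict() in the binary case: class 1 when its probability exceeds 0.5.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_iris()
X, y = data.data[:100], data.target[:100]
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1234)

clf = LogisticRegression(max_iter=1000).fit(x_train, y_train)

proba_class1 = clf.predict_proba(x_test)[:, 1]
thresholded = clf.classes_[(proba_class1 > 0.5).astype(int)]

print(np.array_equal(thresholded, clf.predict(x_test)))   # True
```

If you need the probabilities themselves rather than hard labels, predict_proba() is the function to call.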

from sklearn import metrics

print('The accuracy of the Logistic Regression for Training Set is: %d%%' % (metrics.accuracy_score(y_train,train_predict)*100))

print('The accuracy of the Logistic Regression for Test Set is: %d%%' % (metrics.accuracy_score(y_test,test_predict)*100))

Note that this requires importing metrics from sklearn.

Finally, we get the following results:
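Accuracy is simply the fraction of predictions that match the true labels. The toy example below, with made-up labels, computes it by hand and with accuracy_score. It also shows that the %d format used above truncates the percentage instead of rounding it:

```python
# Accuracy by hand versus metrics.accuracy_score, on made-up labels.
import numpy as np
from sklearn import metrics

y_true = np.array([0, 1, 1, 0, 1])   # hypothetical true labels
y_pred = np.array([0, 1, 0, 0, 1])   # hypothetical predictions (one mistake)

manual = np.mean(y_true == y_pred)               # 4 correct out of 5
library = metrics.accuracy_score(y_true, y_pred)
print(manual, library)                           # 0.8 0.8

# %d drops the decimals; %.2f keeps them.
print('%d%%' % (0.9833 * 100))                   # 98%
print('%.2f%%' % (0.9833 * 100))                 # 98.33%
```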

the weight of Logistic Regression: [[ 0.4441713 -0.84101101 2.20417976 0.86229397]]

the intercept(w0) of Logistic Regression: [-6.64929893]

The accuracy of the Logistic Regression is: 100%

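This run of logistic classification reached 100% accuracy! That is not surprising, since setosa (label 0) and versicolor (label 1) are linearly separable.

4. Multiclass logistic classification

If you followed (3) carefully, multiclass classification is very simple: just split the full dataset instead: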

x_train, x_test, y_train, y_test = train_test_split(iris_features, iris_target, test_size = 0.2, random_state = 1234)

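Everything else stays the same!

Results (the two accuracy lines are for the training set and the test set, in that order):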

the weight of Logistic Regression:

[[-0.45928925 0.83069892 -2.26606529 -0.99743983]

[ 0.33117319 -0.72863426 -0.06841147 -0.98711029]

[ 0.12811606 -0.10206466 2.33447676 1.98455011]]

the intercept(w0) of Logistic Regression:

[ 9.4388065 3.93047365 -13.36928016]

The accuracy of the Logistic Regression is: 98%

The accuracy of the Logistic Regression is: 86%
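With three classes, coef_ has one row of weights per class and intercept_ has one entry per class, which is why three rows are printed above. The sketch below checks the shapes and shows that the predicted label is the class with the highest score. It is an illustration I am adding; the exact numbers you get may differ from those above depending on your sklearn version:

```python
# Three-class case: one weight row per class; predict() picks the top-scoring class.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=1234)

clf = LogisticRegression(max_iter=1000).fit(x_train, y_train)

print(clf.coef_.shape, clf.intercept_.shape)   # (3, 4) (3,)

scores = clf.decision_function(x_test)         # one score per class, shape (30, 3)
by_hand = clf.classes_[np.argmax(scores, axis=1)]
print(np.array_equal(by_hand, clf.predict(x_test)))   # True

proba = clf.predict_proba(x_test)              # class probabilities per row
print(np.allclose(proba.sum(axis=1), 1.0))     # True
```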

5. Complete code (binary + multiclass)

import pandas as pd

## Import the logistic regression model from sklearn
from sklearn.linear_model import LogisticRegression

## To evaluate the model properly, split the data into a training set and a test set,
## train on the training set, and validate on the test set.
from sklearn.model_selection import train_test_split
from sklearn import metrics

## Load the iris data bundled with sklearn and convert it to a pandas DataFrame
from sklearn.datasets import load_iris

data = load_iris()  # load the dataset (features and labels)
iris_target = data.target  # labels for each sample
iris_features = pd.DataFrame(data=data.data, columns=data.feature_names)  # convert to a pandas DataFrame

## Keep only the samples of class 0 and class 1 (exclude class 2)
iris_features_part = iris_features.iloc[:100]
iris_target_part = iris_target[:100]

## Test set is 20% of the data (80%/20% split)
x_train, x_test, y_train, y_test = train_test_split(iris_features_part, iris_target_part, test_size = 0.2, random_state = 1234)

## Define the logistic regression model
clf = LogisticRegression(max_iter=1000)

# Train the logistic regression model on the training set
clf.fit(x_train, y_train)

## Inspect the weights w
print('the weight of Logistic Regression:',clf.coef_)

## Inspect the intercept w0
print('the intercept(w0) of Logistic Regression:',clf.intercept_)

## Use the trained model to predict on the training set and the test set separately
train_predict = clf.predict(x_train)
test_predict = clf.predict(x_test)

## Evaluate the model with accuracy (the share of correctly predicted samples among all predicted samples)
print('The accuracy of the Logistic Regression for Training Set is: %d%%' % (metrics.accuracy_score(y_train,train_predict)*100))
print('The accuracy of the Logistic Regression for Test Set is: %d%%' % (metrics.accuracy_score(y_test,test_predict)*100))

## Multiclass classification

## Test set is 20% of the data (80%/20% split)
x_train, x_test, y_train, y_test = train_test_split(iris_features, iris_target, test_size = 0.2, random_state = 1234)

## Define the logistic regression model
clf = LogisticRegression(max_iter=1000)

# Train the logistic regression model on the training set
clf.fit(x_train, y_train)

## Inspect the weights w
print('the weight of Logistic Regression:\n',clf.coef_)

## Inspect the intercept w0
print('the intercept(w0) of Logistic Regression:\n',clf.intercept_)

## Because this is a 3-class problem, we get three sets of logistic regression parameters here;
## combining the three gives the 3-class classifier.

## Use the trained model to predict on the training set and the test set separately
train_predict = clf.predict(x_train)
test_predict = clf.predict(x_test)

## Evaluate the model with accuracy (the share of correctly predicted samples among all predicted samples)
print('The accuracy of the Logistic Regression for Training Set is: %d%%' % (metrics.accuracy_score(y_train,train_predict)*100))
print('The accuracy of the Logistic Regression for Test Set is: %d%%' % (metrics.accuracy_score(y_test,test_predict)*100))

6. References

[1] Alibaba Cloud Tianchi Longzhu Program (阿里云-天池龙珠计划)
