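Machine Learning in Python with LogisticRegression: Iris Classification

The iris classification problem

Iris classes and features

The iris is a perennial herbaceous plant. The dataset returned by sklearn.datasets.load_iris() groups its samples into three classes, setosa, versicolor and virginica, labeled 0, 1 and 2. Each sample has four features: sepal length, sepal width, petal length and petal width.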
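As a quick sanity check of the description above, the short sketch below loads the dataset and prints its shape, feature names, class names and the number of samples per class. The variable names (iris, counts) are illustrative choices, not part of the original walkthrough.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

print(iris.data.shape)       # (n_samples, n_features)
print(iris.feature_names)    # names of the four measured features
print(iris.target_names)     # class names in label order 0, 1, 2

# Count how many samples carry each integer label
counts = np.bincount(iris.target)
print(dict(zip(iris.target_names, counts)))
```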

Logistic regression: building the model

import numpy as np
from sklearn import datasets  # scikit-learn's bundled datasets
import matplotlib.pyplot as plt  # plotting
from sklearn.linear_model import LogisticRegression  # logistic regression classifier
data_iris = datasets.load_iris()  # the iris dataset
print(list(data_iris.keys()))
print(data_iris['DESCR'])

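print(list(data_iris.keys())) outputs the following (the exact keys may vary with the scikit-learn version):
['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename']
The output of print(data_iris['DESCR']) includes this summary table: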
============== ==== ==== ======= ===== ====================
                Min  Max   Mean    SD   Class Correlation
============== ==== ==== ======= ===== ====================
sepal length:   4.3  7.9   5.84   0.83    0.7826
sepal width:    2.0  4.4   3.05   0.43   -0.4194
petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
============== ==== ==== ======= ===== ====================
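As the table shows, petal length and petal width have the highest class correlation (0.9490 and 0.9565), so they are the most informative features for telling the classes apart. To keep this beginner-oriented walkthrough simple and easy to follow, we use petal width alone as the predictor.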
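The ranking in the table can be roughly reproduced by hand. The sketch below assumes that "Class Correlation" means the Pearson correlation between each feature column and the integer class label; that reading is my assumption, and the computed numbers may differ slightly from the ones printed in DESCR.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X_all, y_all = iris.data, iris.target

# Pearson correlation between each feature column and the integer class label
# (assumed to be what DESCR calls "Class Correlation")
class_corr = np.array([
    np.corrcoef(X_all[:, j], y_all)[0, 1] for j in range(X_all.shape[1])
])

for name, r in zip(iris.feature_names, class_corr):
    print(f"{name:20s} {r:+.4f}")
```

Under this assumption, the feature with the largest absolute correlation is the one at column index 3, petal width, which is the column selected in the next step.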

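X = data_iris['data'][:, 3:]  # 2-D array with a single column: 'petal width (cm)'
y = data_iris['target']  # integer class labels 0, 1, 2
# multi_class='ovr' means one-vs-rest; newer scikit-learn releases deprecate this argument
log_reg = LogisticRegression(multi_class='ovr', solver='sag')
log_reg.fit(X, y)
print(X)  # petal widths of the 150 samples
print(y)  # class labels of the 150 samples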

Output of print(X):
[[0.2]
[0.2]
[0.2]
[0.2]
[0.2]
[0.4]
[0.3]
[0.2]
[0.2]
[0.1]
[0.2]
[0.2]
[0.1]
[0.1]
[0.2]
[0.4]
[0.4]
[0.3]
[0.3]
[0.3]
[0.2]
[0.4]
[0.2]
[0.5]
[0.2]
[0.2]
[0.4]
[0.2]
[0.2]
[0.2]
[0.2]
[0.4]
[0.1]
[0.2]
[0.2]
[0.2]
[0.2]
[0.1]
[0.2]
[0.2]
[0.3]
[0.3]
[0.2]
[0.6]
[0.4]
[0.3]
[0.2]
[0.2]
[0.2]
[0.2]
[1.4]
[1.5]
[1.5]
[1.3]
[1.5]
[1.3]
[1.6]
[1. ]
[1.3]
[1.4]
[1. ]
[1.5]
[1. ]
[1.4]
[1.3]
[1.4]
[1.5]
[1. ]
[1.5]
[1.1]
[1.8]
[1.3]
[1.5]
[1.2]
[1.3]
[1.4]
[1.4]
[1.7]
[1.5]
[1. ]
[1.1]
[1. ]
[1.2]
[1.6]
[1.5]
[1.6]
[1.5]
[1.3]
[1.3]
[1.3]
[1.2]
[1.4]
[1.2]
[1. ]
[1.3]
[1.2]
[1.3]
[1.3]
[1.1]
[1.3]
[2.5]
[1.9]
[2.1]
[1.8]
[2.2]
[2.1]
[1.7]
[1.8]
[1.8]
[2.5]
[2. ]
[1.9]
[2.1]
[2. ]
[2.4]
[2.3]
[1.8]
[2.2]
[2.3]
[1.5]
[2.3]
[2. ]
[2. ]
[1.8]
[2.1]
[1.8]
[1.8]
[1.8]
[2.1]
[1.6]
[1.9]
[2. ]
[2.2]
[1.5]
[1.4]
[2.3]
[2.4]
[1.8]
[1.8]
[2.1]
[2.4]
[2.3]
[1.9]
[2.3]
[2.5]
[2.3]
[1.9]
[2. ]
[2.3]
[1.8]]
Output of print(y):
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
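To make the one-vs-rest idea behind multi_class='ovr' more concrete, here is a minimal sketch that fits one binary logistic regression per class and rebuilds the class probabilities by hand. It is not the author's code: it swaps in sklearn.multiclass.OneVsRestClassifier with a default LogisticRegression (so the solver is not 'sag' and the fitted coefficients may differ), and the names ovr, sigmoid, X_query and manual_prob are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()
X_pw = iris.data[:, 3:]   # petal width only, shape (150, 1)
y_cls = iris.target

# One binary logistic regression per class ("this class vs. the rest")
ovr = OneVsRestClassifier(LogisticRegression()).fit(X_pw, y_cls)

def sigmoid(z):
    """Logistic function, maps a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

X_query = np.array([[0.5], [1.5], [1.7]])

# Score each query point with every binary model: sigmoid(w * x + b)
raw = np.column_stack([
    sigmoid(X_query @ est.coef_.T + est.intercept_).ravel()
    for est in ovr.estimators_
])

# Normalize each row so the three per-class scores sum to 1
manual_prob = raw / raw.sum(axis=1, keepdims=True)
print(np.round(manual_prob, 3))
```

The hand-computed matrix should agree with ovr.predict_proba(X_query), which makes the point that a one-vs-rest model is just three sigmoids followed by a normalization.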
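Logistic regression: predicting classes
For simplicity, we use numpy.linspace to generate 1000 evenly spaced values between 0 and 3 and reshape them into a two-dimensional array with a single column.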

X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
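A small check of what this line produces may help: scikit-learn estimators expect input of shape (n_samples, n_features), which is why the one-dimensional output of linspace is reshaped into a single column. The variable step is an illustrative name.

```python
import numpy as np

X_new = np.linspace(0, 3, 1000).reshape(-1, 1)

print(X_new.shape)           # (1000, 1): 1000 samples, 1 feature
print(X_new[0], X_new[-1])   # first and last grid points: [0.] [3.]

# Both endpoints are included, so the spacing is 3 / 999 rather than 3 / 1000
step = X_new[1, 0] - X_new[0, 0]
print(step)
```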

Predicting the class

y_hat = log_reg.predict(X_new)  # predicted class for each petal width
print(y_hat)
y_prob = log_reg.predict_proba(X_new)  # predicted probability of each of the three classes
print(y_prob)
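The two calls above are closely related: predict_proba returns one row per sample with one probability per class, and predict returns the class whose probability is highest. The sketch below checks this on the same grid. It assumes a LogisticRegression with default settings rather than the 'ovr'/'sag' configuration used earlier, and clf, grid, proba and labels are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X_pw = iris.data[:, 3:]                             # petal width only
clf = LogisticRegression().fit(X_pw, iris.target)   # default settings (assumption)

grid = np.linspace(0, 3, 1000).reshape(-1, 1)
proba = clf.predict_proba(grid)   # one row per grid point, one column per class
labels = clf.predict(grid)        # one predicted label per grid point

print(proba.shape)
# Every row is a probability distribution over the three classes
print(np.allclose(proba.sum(axis=1), 1.0))
# The predicted label is the class with the highest probability
print(np.array_equal(labels, clf.classes_[proba.argmax(axis=1)]))
```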

Now predict the class for petal widths of 1.7, 1.5 and 0.5.

print(log_reg.predict([[1.7], [1.5], [0.5]]))

The result is [2 1 0]: at a petal width of 1.7 cm the flower is most likely virginica, at 1.5 cm most likely versicolor, and at 0.5 cm most likely setosa.
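These three spot checks suggest that the model splits the petal-width axis into three intervals. One way to locate the approximate cut points is to scan the grid for positions where the predicted label changes. This is a sketch under the assumption of a default LogisticRegression, so the boundaries it finds may differ somewhat from those of the 'ovr'/'sag' model above; clf, grid, change and boundaries are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
clf = LogisticRegression().fit(iris.data[:, 3:], iris.target)  # default settings (assumption)

grid = np.linspace(0, 3, 1000).reshape(-1, 1)
labels = clf.predict(grid)

# Indices i where the predicted label differs between grid[i] and grid[i + 1]
change = np.flatnonzero(np.diff(labels) != 0)

# Take the midpoint of each such pair as the approximate decision boundary
boundaries = (grid[change, 0] + grid[change + 1, 0]) / 2

for b, i in zip(boundaries, change):
    print(f"class {labels[i]} -> class {labels[i + 1]} near {b:.2f} cm")
```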
Let's plot the probabilities with matplotlib.

plt.plot(X_new, y_prob[:, 2], 'g-', label='Iris-Virginica')
plt.plot(X_new, y_prob[:, 1], 'r-', label='Iris-Versicolor')
plt.plot(X_new, y_prob[:, 0], 'b-', label='Iris-Setosa')
plt.legend()
plt.show()

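Figure: predicted class as a function of petal width
The horizontal axis is the petal width, from 0 cm to 3 cm. The vertical axis is the predicted probability; the higher the value, the more likely the class.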
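Since the plot is read through its axes, a variant that labels them explicitly can be easier to interpret. The sketch below is one possible version, not the author's figure: it assumes a default LogisticRegression, uses the non-interactive Agg backend so that it also runs without a display, and takes the class names from target_names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
clf = LogisticRegression().fit(iris.data[:, 3:], iris.target)  # default settings (assumption)

grid = np.linspace(0, 3, 1000).reshape(-1, 1)
proba = clf.predict_proba(grid)

fig, ax = plt.subplots()
# One probability curve per class, in label order 0, 1, 2
for k, style in enumerate(["b-", "r-", "g-"]):
    ax.plot(grid[:, 0], proba[:, k], style, label=f"Iris-{iris.target_names[k]}")

ax.set_xlabel("Petal width (cm)")
ax.set_ylabel("Predicted probability")
ax.set_ylim(0, 1)
ax.legend()
# In an interactive session, drop the Agg line and call plt.show() to display the figure
```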
This completes building the model and predicting classes for iris flowers using petal width as the single feature.
