1. Basic information about the Iris dataset
from sklearn import datasets
iris = datasets.load_iris()
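print(iris.data)  # sepal and petal length and width for the 150 iris flowers
print(iris.target)  # species label (class) of each of the 150 flowers
print(iris.target_names)  # names that the species label values stand for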
Iris data
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
...
[6.2 3.4 5.4 2.3]
[5.9 3. 5.1 1.8]]
Species labels
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
Species names
['setosa' 'versicolor' 'virginica']
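The printed arrays above are easier to interpret with a short summary of their shape and class balance. The following is a minimal sketch that uses only attributes of the object returned by `load_iris()`; the variable name `counts` is introduced here for illustration.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Shape of the measurement matrix: (number of flowers, number of measurements)
print(iris.data.shape)

# Names of the four measured features, in column order
print(iris.feature_names)

# Number of samples per species label (0, 1, 2)
counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, counts):
    print(name, count)
```

Each column of `iris.data` corresponds to the entry of `iris.feature_names` at the same position, which is why the plotting code later selects columns by index.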
2. Scatter plots of the Iris data
Grouped by sepal measurements
import matplotlib.pyplot as plt
from sklearn import datasets
iris = datasets.load_iris()
x = iris.data[:,0]
y = iris.data[:,1]
species = iris.target
x_min, x_max = x.min()-0.5, x.max()+0.5
y_min, y_max = y.min()-0.5, y.max()+0.5
plt.figure()
plt.title('Iris Dataset - Classification By Sepal Sizes')
plt.scatter(x,y,c=species) # one color per species
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
plt.show()
Grouped by petal measurements
import matplotlib.pyplot as plt
from sklearn import datasets
iris = datasets.load_iris()
x = iris.data[:,2]
y = iris.data[:,3]
species = iris.target
x_min, x_max = x.min()-0.5, x.max()+0.5
y_min, y_max = y.min()-0.5, y.max()+0.5
plt.figure()
plt.title('Iris Dataset - Classification By Petal Sizes')
plt.scatter(x,y,c=species) # one color per species
plt.xlabel('Petal length')
plt.ylabel('Petal width')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
plt.show()
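What the petal scatter plot shows visually can also be checked numerically. The sketch below compares the range of petal length within each species; it is only an illustration, and the dictionary name `ranges` is introduced here rather than taken from the original code.

```python
from sklearn import datasets

iris = datasets.load_iris()
petal_length = iris.data[:, 2]  # third column: petal length in cm

# Minimum and maximum petal length within each species
ranges = {}
for label, name in enumerate(iris.target_names):
    values = petal_length[iris.target == label]
    ranges[str(name)] = (values.min(), values.max())
    print(f"{name}: {values.min():.1f} to {values.max():.1f} cm")
```

If one species' range does not overlap the others, that species can be separated by petal length alone, which is consistent with an isolated cluster in the plot.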
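3. Dimensionality reduction with principal component analysis (PCA)
How can the four measurements be used to characterize the three species? The fit_transform() function projects them onto a smaller set of principal components (three here).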
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import datasets
from sklearn.decomposition import PCA
iris = datasets.load_iris()
species = iris.target
x_reduced= PCA(n_components=3).fit_transform(iris.data)
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # create a 3D axes attached to the figure
ax.set_title('Iris Dataset by PCA', size=14)
ax.scatter(x_reduced[:,0],x_reduced[:,1],x_reduced[:,2], c=species)
ax.set_xlabel('First eigenvector')
ax.set_ylabel('Second eigenvector')
ax.set_zlabel('Third eigenvector')
# hide the tick labels on all three axes
ax.xaxis.set_ticklabels(())
ax.yaxis.set_ticklabels(())
ax.zaxis.set_ticklabels(())
plt.show()
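To judge how much information the three components retain, one option is to inspect the `explained_variance_ratio_` attribute of the fitted `PCA` object. This is a minimal sketch, not part of the original example; it keeps the fitted estimator in a variable `pca` (a name introduced here) so that the attribute can be read after the transform.

```python
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()

# Fit PCA and project the four measurements onto three components
pca = PCA(n_components=3)
x_reduced = pca.fit_transform(iris.data)
print(x_reduced.shape)

# Fraction of the total variance captured by each component
ratio = pca.explained_variance_ratio_
for i, r in enumerate(ratio, start=1):
    print(f"component {i}: {r:.3f}")
print(f"total: {ratio.sum():.3f}")
```

A total close to 1 indicates that little variance is lost by dropping the fourth dimension.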
Reference:
Fabio Nelli. Python数据分析实战 (Python Data Analytics), 2nd ed. Beijing: Posts & Telecom Press, November 2019.