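Dataset splitting and overview
A dataset is split into a training set and a test set, with the training set taking the larger share.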
from sklearn.datasets import load_iris
li = load_iris()  # returns a dict-like Bunch object
print("Feature values")
print(li.data)
print("Target values")
print(li.target)
print(li.DESCR)  # text description of the dataset
Output:
Feature values
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]...
[6.2 3.4 5.4 2.3]
[5.9 3. 5.1 1.8]]
Target values
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2...]
:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
- ...
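The dict-like object returned by load_iris can be inspected a little further. The following is a minimal sketch, assuming a standard scikit-learn installation; the variable names (iris, same_object, and so on) are chosen here for illustration and are not part of the original notes.

```python
from sklearn.datasets import load_iris

iris = load_iris()  # Bunch: a dict subclass that also allows attribute access

# Key access and attribute access return the same underlying array.
same_object = iris["data"] is iris.data

n_samples, n_features = iris.data.shape   # number of rows and columns in the feature matrix
feature_names = list(iris.feature_names)  # column names for iris.data
target_names = list(iris.target_names)    # class names for the target values 0, 1, 2

print(sorted(iris.keys()))
print(n_samples, n_features)
print(feature_names)
print(target_names)
```

Reading feature_names and target_names alongside data and target makes it easier to tell which column and which class a printed number refers to.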
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
li = load_iris()
# train_test_split returns the training and test subsets in a fixed order:
#   x_train, x_test: feature values of the training set and the test set
#   y_train, y_test: target values of the training set and the test set
# test_size=0.25 holds out 25% of the samples as the test set
x_train, x_test, y_train, y_test = train_test_split(li.data, li.target, test_size=0.25)
print("Training set features and targets:", x_train, y_train)
print("Test set features and targets:", x_test, y_test)
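Because train_test_split shuffles the data, the split above changes on every run. The sketch below shows one way to make it repeatable and to check the resulting sizes. The arguments random_state=0 and stratify=iris.target are assumptions added here for illustration; the original notes use neither.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# random_state (an arbitrary seed chosen here) fixes the shuffle so the split is reproducible;
# stratify keeps the class proportions similar in the training and test subsets.
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0, stratify=iris.target
)

# Splitting again with the same seed yields the same training features.
x_train_again = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0, stratify=iris.target
)[0]
reproducible = bool((x_train == x_train_again).all())

print(x_train.shape, x_test.shape)
print(y_train.shape, y_test.shape)
print(reproducible)
```

Without a fixed random_state, two runs of the same script can report different scores simply because the test samples differ.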
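from sklearn.datasets import fetch_20newsgroups, load_boston
# Note: load_boston was removed in scikit-learn 1.2, so this import only works on older versions.
news = fetch_20newsgroups(subset='all')  # fetch the 20 newsgroups text dataset (downloaded on first use)
print(news.data)
print(news.target)
lb = load_boston()  # load a regression dataset
print("Feature values")
print(lb.data)
print("Target values")
print(lb.target)
print(lb.DESCR)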
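Transformers and estimators
fit_transform = fit + transform: calling fit_transform on a dataset is equivalent to calling fit and then transform on that same dataset.
The two results differ because fit computes a set of statistics (a "standard") from the data it is given. In the first case, fit sees the same data that is then transformed, so the mean and the other statistics are identical. In the second case, fit is given different data, so the statistics it learns are different and the transformed output changes with them.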
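The relationship between fit, transform and fit_transform can be sketched as follows. This is an illustrative example, not the code the notes originally ran: it assumes StandardScaler as the transformer (the mention of the mean suggests it), and the arrays a and b are made-up data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two small made-up arrays, used only for illustration.
a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])

# fit_transform: learn the mean and standard deviation from a, then scale a with them.
one_step = StandardScaler().fit_transform(a)

# fit followed by transform on the same data gives the same result.
scaler_a = StandardScaler().fit(a)
two_steps = scaler_a.transform(a)

# Fitting on b learns different statistics, so transforming a gives a different result.
scaler_b = StandardScaler().fit(b)
a_scaled_with_b = scaler_b.transform(a)

print(scaler_a.mean_, scaler_b.mean_)
print(np.allclose(one_step, two_steps))
print(np.allclose(one_step, a_scaled_with_b))
```

The same mechanism is why a scaler is usually fitted on the training set only and then reused to transform the test set: both subsets are scaled with one consistent set of statistics.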