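sklearn datasets
Splitting a dataset
In machine learning, a dataset is usually split into two parts:
Training data: used to fit (build) the model
Test data: used during model validation to evaluate whether the model is effective
Overview of the scikit-learn dataset API
Type returned when loading a dataset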
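As a minimal sketch of what a loader returns: the `load_*` functions in `sklearn.datasets` return a `Bunch`, a dictionary-like object whose fields can be read either as attributes or as keys. The variable name `iris` is only illustrative.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# The loader returns a dictionary-like Bunch object
print(type(iris).__name__)

# Feature matrix: one row per sample, one column per feature
print(iris.data.shape)

# Target vector: one class label per sample
print(iris.target.shape)

# Human-readable names for the feature columns and the classes
print(iris.feature_names)
print(iris.target_names)

# Attribute access and key access refer to the same array
print(iris["data"] is iris.data)
```

The `DESCR` field holds a plain-text description of the dataset, which is worth printing once when working with an unfamiliar dataset.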
sklearn classification datasets
from sklearn.datasets import load_iris
# Load the bundled iris classification dataset
li = load_iris()
print("Feature values:")
print(li.data)
print("Target values:")
print(li.target)
Splitting the dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
li = load_iris()
# Note the order of the return values: training features (x_train), test features (x_test),
# training targets (y_train), test targets (y_test)
x_train, x_test, y_train, y_test = train_test_split(li.data, li.target, test_size=0.25)
print("Training set features and targets:", x_train, y_train)
print("Test set features and targets:", x_test, y_test)
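Without a fixed seed the split changes on every run. The sketch below is one way to make it reproducible and keep the class proportions similar in both parts; the choice of `random_state=42` and the use of `stratify` are assumptions added for illustration, not part of the original notes.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# random_state fixes the shuffle; stratify keeps class ratios similar in both parts
x_train, x_test, y_train, y_test = train_test_split(
    iris.data,
    iris.target,
    test_size=0.25,
    random_state=42,
    stratify=iris.target,
)

# 150 samples with test_size=0.25 gives 112 training rows and 38 test rows
print(x_train.shape, x_test.shape)

# Per-class counts in each part
print(Counter(y_train.tolist()))
print(Counter(y_test.tolist()))
```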
Large datasets for classification
from sklearn.datasets import fetch_20newsgroups
# fetch_* loaders download the data on first use and cache it locally
news = fetch_20newsgroups(subset='all')
print(news.data)
print(news.target)
sklearn regression datasets
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older version
from sklearn.datasets import load_boston
lb = load_boston()
print(lb.data)
print(lb.target)
print(lb.DESCR)
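Because `load_boston` is no longer available from scikit-learn 1.2 onward, the same steps can be tried on another bundled regression dataset. The sketch below swaps in `load_diabetes`; that substitution is an assumption made here for illustration, not the dataset the notes originally used.

```python
from sklearn.datasets import load_diabetes

# Bundled regression dataset; no download is needed
diabetes = load_diabetes()

# Feature matrix: 442 samples, 10 features
print(diabetes.data.shape)

# Regression targets are continuous values rather than class labels
print(diabetes.target[:5])

# Plain-text description of the dataset
print(diabetes.DESCR)
```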
Transformers and estimators
Transformers:
Estimators:
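The notes leave both terms undefined, so the following is only a sketch of the usual distinction in scikit-learn: a transformer exposes `fit` and `transform` (or `fit_transform`) to preprocess features, while a predictive estimator exposes `fit`, `predict` and `score`. `StandardScaler` and `KNeighborsClassifier` are picked here as illustrative examples and are not named in the original.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0
)

# Transformer: fit_transform learns the per-feature mean and standard deviation
# from the training data and applies them; transform reuses them on the test data
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

# Estimator: fit learns from the training data, predict labels new samples,
# score reports the mean accuracy on the given data
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train_scaled, y_train)
y_pred = knn.predict(x_test_scaled)
accuracy = knn.score(x_test_scaled, y_test)

print("Predicted labels:", y_pred)
print("Accuracy:", accuracy)
```

Fitting the scaler on the training data only, then reusing it on the test data, keeps information from the test set out of the preprocessing step.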