Introduction to the built-in datasets in sklearn.datasets

Latest recommended article published on 2024-09-10 16:28:36

weixin_30587025

Article tags: artificial intelligence, python, javascript, ViewUI

Original link: http://www.cnblogs.com/shujuxiong/p/11273829.html

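This article introduces the datasets provided by sklearn.datasets, including the small built-in datasets such as iris, handwritten digits, and breast cancer, and shows how to load data in svmlight/libsvm format with load_svmlight_file. It also explains how to generate datasets for classification, regression, and clustering tasks with functions such as make_blobs and make_classification.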

sklearn provides several kinds of datasets (dataset examples on the official site: https://scikit-learn.org/stable/auto_examples/#dataset-examples):

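Small built-in datasets (packaged datasets): sklearn.datasets.load_<name>
Datasets that can be downloaded online (downloaded datasets): sklearn.datasets.fetch_<name>
Computer-generated datasets (generated datasets): sklearn.datasets.make_<name>
Datasets in svmlight/libsvm format: sklearn.datasets.load_svmlight_file(...)
Datasets downloaded from mldata.org: sklearn.datasets.fetch_mldata(...) (deprecated in scikit-learn 0.20 and removed in 0.22; fetch_openml is its replacement)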
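
As a rough illustration of the naming convention above, the sketch below groups the public names in sklearn.datasets by prefix and then calls one generator. It assumes a reasonably recent scikit-learn install; the exact set of names printed depends on the installed version, and the make_blobs parameters are arbitrary choices for the example. Nothing is downloaded, since no fetch_ function is actually called.

```python
from sklearn import datasets

# Group the public names in sklearn.datasets by their naming prefix.
# Note that load_svmlight_file also starts with "load_".
prefixes = ("load_", "fetch_", "make_")
by_prefix = {
    p: sorted(n for n in dir(datasets) if n.startswith(p))
    for p in prefixes
}
for prefix, names in by_prefix.items():
    print(prefix, len(names), names[:3])

# A generated dataset needs no files or network access:
# 100 two-dimensional points drawn around 3 cluster centers.
X, y = datasets.make_blobs(n_samples=100, n_features=2, centers=3, random_state=0)
print(X.shape, y.shape)
```

The fetch_ functions follow the same calling pattern but download the data and cache it locally on first use, so they need network access.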

① Built-in datasets

The small built-in datasets are loaded with sklearn.datasets.load_<name>.
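
For a quick sense of what the load_<name> functions return, here is a minimal sketch that loads three of the packaged datasets and prints their shapes. The choice of load_wine, load_digits, and load_breast_cancer is only for illustration; passing return_X_y=True makes each loader return the (data, target) pair directly rather than the dictionary-like object.

```python
from sklearn.datasets import load_breast_cancer, load_digits, load_wine

# Load each packaged dataset as a (data, target) pair and record its shape.
shapes = {}
for loader in (load_wine, load_digits, load_breast_cancer):
    X, y = loader(return_X_y=True)
    shapes[loader.__name__] = X.shape
    print(loader.__name__, X.shape, y.shape)
```

Each dataset ships inside the scikit-learn package, so these calls work offline.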

All of these datasets are documented on the official site. Taking iris as an example, a demo is available at http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html

from sklearn.datasets import load_iris
# Load the dataset
iris = load_iris()
# Keys include 'target', 'DESCR', 'data', 'target_names', 'feature_names';
# the order and any extra keys vary by scikit-learn version
print(iris.keys())
# Number of samples and number of features
n_samples, n_features = iris.data.shape
print("Number of sample:", n_samples)    # Number of sample: 150
print("Number of feature", n_features)   # Number of feature 4
# The first sample
print(iris.data[0])        # [5.1 3.5 1.4 0.2]
print(iris.data.shape)     # (150, 4)
print(iris.target.shape)   # (150,)
print(iris.target)
"""

　　[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
　　0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
　　1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
　　2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
　　2 2]