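The datasets module of Python's sklearn library provides loaders for built-in datasets such as load_breast_cancer() (the breast cancer dataset) and load_iris() (the iris dataset).
Saving these datasets to a local file makes them easy to inspect and process. Here we use the load_breast_cancer() dataset as an example.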
import pandas as pd
from sklearn.tree import export_graphviz
from sklearn import tree
from sklearn.datasets import load_breast_cancer

# Load the benign/malignant tumor prediction data that ships with sklearn.datasets
data = load_breast_cancer()

# Write the breast_cancer data to an Excel workbook
# (.xlsx, because pandas 2.0+ can no longer write the legacy .xls format)
outputfile = "D:/PYdata/cancer.xlsx"
column = list(data['feature_names'])
df = pd.DataFrame(data.data, index=range(569), columns=column)
pf = pd.DataFrame(data.target, index=range(569), columns=['outcome'])
# DataFrame.join merges the data.data and data.target frames on their shared index
jj = df.join(pf, how='outer')
# Save the combined data to outputfile
jj.to_excel(outputfile)
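As an alternative to building the two DataFrames by hand, the loaders accept as_frame=True (scikit-learn 0.23 or later), which returns the features and the label already combined. The following is only a minimal sketch under that assumption: the helper name dataset_to_frame and the temporary output path are made up for illustration, the label column is called target rather than outcome, and CSV is used so that no Excel engine is needed.

```python
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.datasets import load_breast_cancer, load_iris


def dataset_to_frame(loader):
    """Return features and label of a scikit-learn toy dataset as one DataFrame.

    Hypothetical helper for illustration; it relies on the loader
    supporting the as_frame=True keyword.
    """
    bunch = loader(as_frame=True)
    # bunch.frame holds the feature columns plus a final "target" column
    return bunch.frame


cancer = dataset_to_frame(load_breast_cancer)
iris = dataset_to_frame(load_iris)

# Illustrative output location: a temporary directory, so the sketch runs anywhere
out_path = Path(tempfile.mkdtemp()) / "cancer.csv"
cancer.to_csv(out_path, index=False)

# Read the file back to confirm that the data survived the round trip
reloaded = pd.read_csv(out_path)
print(reloaded.shape)
```

Because the helper takes the loader function itself, the same call works for load_iris() and the other built-in datasets without hard-coding the number of rows.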
load_iris()