Decision tree construction and the parameters involved
Tree model parameters:
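1.criterion: gini or entropy
2.splitter: best (default) or random
3.max_features: None (default)
4.max_depth
5.min_samples_split (a node with fewer samples than this value is not split further)
6.min_samples_leaf (minimum number of samples in a leaf node; leaves with fewer samples are pruned)
7.min_weight_fraction_leaf (minimum weighted fraction of the total sample weight required at a leaf node)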
8.max_leaf_nodes
9.class_weight
10.min_impurity_split (deprecated in newer scikit-learn releases in favor of min_impurity_decrease)
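n_estimators (number of trees; applies to ensembles such as random forests rather than to a single decision tree)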
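As a rough illustration of how the parameters listed above are passed to scikit-learn, here is a minimal sketch. The iris dataset, the specific parameter values, and random_state are assumptions chosen only for the example and are not part of the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Small bundled dataset, used here only for illustration (an assumption).
X, y = load_iris(return_X_y=True)

# Single decision tree configured with the parameters from the list.
# min_impurity_split is left out because recent scikit-learn releases
# no longer accept it.
clf = DecisionTreeClassifier(
    criterion="gini",              # or "entropy"
    splitter="best",               # or "random"
    max_features=None,             # consider all features at each split
    max_depth=3,                   # limit the depth of the tree
    min_samples_split=10,          # nodes with fewer samples are not split
    min_samples_leaf=5,            # every leaf keeps at least 5 samples
    min_weight_fraction_leaf=0.0,  # no constraint on leaf weight fraction
    max_leaf_nodes=8,              # at most 8 leaves
    class_weight=None,             # all classes weighted equally
    random_state=0,                # illustrative value, for reproducibility
)
clf.fit(X, y)
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())

# n_estimators belongs to ensemble models: it sets the number of trees.
forest = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0)
forest.fit(X, y)
print("trees in forest:", len(forest.estimators_))
```

The limits act as pre-pruning constraints: the fitted tree should stay within the given depth and leaf count, and the forest should hold exactly n_estimators trees.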
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import fetch_california_housing
housing=fetch_california_housing()
print(housing.DESCR)
from sklearn import tree
dtr=tree.DecisionTreeRegressor(max_depth=2)
# Pass x and y to fit: x is the data, y is the label.
# Columns 6 and 7 are Latitude and Longitude.
print(dtr.fit(housing.data[:,[6,7]],housing.target))
# Visualization requires graphviz to be installed: http://www.graphviz.org/Download..php
dot_data= \
tree.export_graphviz(
dtr,
out_file=None,
feature_names=housing.feature_names[6:8],  # feature names