Decision Trees: Regression Decision Trees

Model prototype
class sklearn.tree.DecisionTreeRegressor(criterion='mse', splitter='best', max_depth=None,
    min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0,
    max_features=None, random_state=None, max_leaf_nodes=None, presort=False)
Parameters

  • criterion: the function used to measure the quality of a split; for regression this is 'mse', mean squared error (renamed 'squared_error' in scikit-learn 1.0)
  • splitter: the strategy used to choose the split at each node
    • 'best': choose the best split
    • 'random': choose the best random split
  • max_depth: the maximum depth of the tree
  • min_samples_split: the minimum number of samples an internal (non-leaf) node must contain before it can be split
  • min_samples_leaf: the minimum number of samples each leaf node must contain
  • min_weight_fraction_leaf: the minimum weighted fraction of the sample weights required at a leaf node
  • max_features: the number of features to consider when searching for the best split (a construction sketch follows this list)
    • integer: consider max_features features at each split
    • float: consider max_features * n_features features at each split (max_features specifies a fraction)
    • 'auto': max_features = n_features (for the regressor, all features are considered)
    • 'sqrt': max_features = sqrt(n_features)
    • 'log2': max_features = log2(n_features)
    • None: max_features = n_features
  • random_state: seed used by the random number generator
  • max_leaf_nodes: the maximum number of leaf nodes
  • presort: whether to presort the data to speed up the search for the best split (with True, training on large datasets slows down overall; on small datasets, or when max_depth is restricted, it can speed training up; deprecated and later removed in newer scikit-learn releases)
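
A minimal construction sketch combining several of these parameters (the values are arbitrary, chosen only for illustration):

from sklearn.tree import DecisionTreeRegressor

regr = DecisionTreeRegressor(
    splitter='best',       # evaluate candidate splits and keep the best one
    max_depth=5,           # cap the depth of the tree
    min_samples_split=4,   # an internal node needs at least 4 samples to be split
    min_samples_leaf=2,    # every leaf must keep at least 2 samples
    max_features=None,     # consider all features at each split
    random_state=0)        # fix the seed so results are reproducible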

Attributes

  • feature_importances_: the importance of each feature (also known as Gini importance)
  • max_features_: the inferred value of max_features
  • n_features_: the number of features seen during fit
  • n_outputs_: the number of outputs seen during fit
  • tree_: the underlying tree object (the sketch below shows one way to inspect it)
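
A short sketch of reading these attributes after fitting (export_text requires scikit-learn 0.21+; newer releases rename n_features_ to n_features_in_):

import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

X = np.linspace(0, 5, 30).reshape(-1, 1)
y = np.sin(X).ravel()
regr = DecisionTreeRegressor(max_depth=2).fit(X, y)

print(regr.feature_importances_)   # importance of each feature
print(regr.max_features_)          # inferred value of max_features
print(regr.n_outputs_)             # number of outputs seen during fit
print(export_text(regr))           # readable rendering of the underlying tree_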

Methods

  • fit(X,y[,sample_weight,check_input,…]): build the regression tree from the training set (X, y)
  • predict(X[,check_input]): predict target values for the samples in X
  • score(X,y[,sample_weight]): return the coefficient of determination R² of the prediction (the sketch below checks it against metrics.r2_score)
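
As a quick check that score is indeed R² (a minimal sketch; the toy data is arbitrary):

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

X = np.arange(10, dtype=float).reshape(-1, 1)
y = X.ravel() ** 2
regr = DecisionTreeRegressor(max_depth=2).fit(X, y)

# score(X, y) equals the R^2 of regr.predict(X) against y
assert np.isclose(regr.score(X, y), r2_score(y, regr.predict(X)))

The complete example below exercises these methods on a synthetic sin(x) regression task.
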
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

Generate a random dataset

def create_data(n):
    np.random.seed(0)
    X = 5 * np.random.rand(n, 1)        # n samples drawn uniformly from [0, 5)
    y = np.sin(X).ravel()               # noise-free target: sin(x)
    noise_num = int(n / 5)              # every 5th sample gets noise (assumes n is a multiple of 5)
    y[::5] += 3 * (0.5 - np.random.rand(noise_num))   # uniform noise in (-1.5, 1.5]
    return train_test_split(X, y, test_size=0.25, random_state=1)
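
As a quick sanity check (hypothetical snippet): with n=100 and test_size=0.25, the split yields 75 training and 25 test samples:

X_train, X_test, y_train, y_test = create_data(100)
print(X_train.shape, X_test.shape)   # expected: (75, 1) (25, 1)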

Using DecisionTreeRegressor

With default settings the tree keeps splitting until every leaf is pure, so expect a (near) perfect training score and a noticeably lower test score, a sign of overfitting.

def test_DecisionTreeRegressor(*data):
    X_train, X_test, y_train, y_test = data
    regr = DecisionTreeRegressor()
    regr.fit(X_train, y_train)
    print('Training score: %f' % regr.score(X_train, y_train))
    print('Testing score: %f' % regr.score(X_test, y_test))
    # Plot the fitted curve against the train/test samples
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    X = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]   # dense grid over the input range
    Y = regr.predict(X)
    ax.scatter(X_train, y_train, label='train sample', c='g')
    ax.scatter(X_test, y_test, label='test sample', c='r')
    ax.plot(X, Y, label='predict_value', linewidth=2, alpha=0.5)
    ax.set_xlabel('data')
    ax.set_ylabel('target')
    ax.set_title('Decision Tree Regression')
    ax.legend(framealpha=0.5)
    plt.show()

X_train, X_test, y_train, y_test = create_data(100)
test_DecisionTreeRegressor(X_train, X_test, y_train, y_test)

Examining the effect of best vs. random splits

The splitter parameter controls whether each node evaluates all candidate splits ('best') or only the best of randomly drawn ones ('random').

def test_DecisionTreeRegressor_splitter(*data):
    X_train, X_test, y_train, y_test = data
    splitters = ['best', 'random']
    for splitter in splitters:
        regr = DecisionTreeRegressor(splitter=splitter)
        regr.fit(X_train, y_train)
        print('Splitter: %s' % splitter)
        print('Training score: %f' % regr.score(X_train, y_train))
        print('Testing score: %f' % regr.score(X_test, y_test))

X_train, X_test, y_train, y_test = create_data(100)
test_DecisionTreeRegressor_splitter(X_train, X_test, y_train, y_test)

Effect of tree depth

Sweeping max_depth traces the transition from underfitting (very shallow trees) to overfitting (very deep trees).

def test_DecisionTreeRegressor_depth(*data, maxdepth):
    X_train, X_test, y_train, y_test = data
    depths = np.arange(1, maxdepth)
    training_scores = []
    testing_scores = []
    for depth in depths:
        regr = DecisionTreeRegressor(max_depth=depth)
        regr.fit(X_train, y_train)
        training_scores.append(regr.score(X_train, y_train))
        testing_scores.append(regr.score(X_test, y_test))
    # Plot training/testing scores against tree depth
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(depths, training_scores, label='training score')
    ax.plot(depths, testing_scores, label='testing score')
    ax.set_xlabel('maxdepth')
    ax.set_ylabel('score')
    ax.set_title('Decision Tree Regression')
    ax.legend(framealpha=0.5)
    plt.show()

X_train, X_test, y_train, y_test = create_data(100)
test_DecisionTreeRegressor_depth(X_train, X_test, y_train, y_test, maxdepth=20)
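
Rather than reading the best depth off the curves by eye, max_depth can also be chosen by cross-validation; a minimal sketch using GridSearchCV (the depth grid and cv=5 are arbitrary choices):

from sklearn.model_selection import GridSearchCV

X_train, X_test, y_train, y_test = create_data(100)
search = GridSearchCV(DecisionTreeRegressor(),
                      param_grid={'max_depth': list(range(1, 20))},
                      cv=5)   # 5-fold cross-validation on the training set
search.fit(X_train, y_train)
print('Best max_depth: %s' % search.best_params_['max_depth'])
print('Testing score: %f' % search.score(X_test, y_test))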