Algorithm Models: Classification Models (Decision Trees and Random Forests)

When a decision tree overfits, we usually turn to a random forest to solve the problem.
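Definition of a random forest:
    1. Random:
        Randomness in the data set
        Randomness in the features: only m features are selected, where the total number of features M >> m
    2. Forest:
        Many decision trees; the forest's result is the mode (majority vote) of the individual trees' results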
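The "mode of the trees' results" rule can be illustrated with a small hard-voting function. This is a minimal sketch on made-up votes: the `majority_vote` helper and the vote matrix are hypothetical. Note that scikit-learn's `RandomForestClassifier` actually averages the trees' predicted class probabilities instead of counting hard votes, so its prediction can occasionally differ from a plain majority vote.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the most common label for each sample (hypothetical helper).

    tree_predictions: one list of class labels per tree, all of equal length.
    For tied counts, Counter.most_common keeps the label that was seen first.
    """
    per_sample = zip(*tree_predictions)  # regroup the votes sample by sample
    return [Counter(votes).most_common(1)[0][0] for votes in per_sample]

# Hypothetical predictions from three trees for four samples
votes = [
    [0, 1, 2, 2],  # tree 1
    [0, 1, 1, 2],  # tree 2
    [1, 1, 2, 0],  # tree 3
]
print(majority_vote(votes))  # [0, 1, 2, 2]
```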
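How a random forest works:
    1. Randomness:
        Data randomness: each tree is trained on a Bootstrap sample, i.e. a random sample drawn with replacement from the training set
        Feature randomness: only m of the M features are used, where M >> m. Because the trees work with a reduced set of features, this already has a dimensionality-reducing effect, so a random forest usually does not need a separate dimensionality-reduction step beforehand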
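Both sources of randomness can be sketched with NumPy. The helper `bootstrap_with_feature_subset` and the toy data are hypothetical, and for simplicity the sketch draws one feature subset per tree; scikit-learn's `RandomForestClassifier` instead draws a fresh set of `max_features` candidate features at every split (for classification the default is the square root of M). The last lines check the well-known property that a Bootstrap sample contains roughly 1 - 1/e (about 63.2%) of the distinct original rows; the rows never drawn are the "out-of-bag" samples.

```python
import numpy as np

def bootstrap_with_feature_subset(X, y, m, rng):
    """Draw one Bootstrap sample of the rows and a random subset of m columns.

    Rows are drawn with replacement, so some repeat and others are left out;
    columns are drawn without replacement, so each chosen feature appears once.
    """
    n_samples, n_features = X.shape
    row_idx = rng.integers(0, n_samples, size=n_samples)     # with replacement
    col_idx = rng.choice(n_features, size=m, replace=False)  # m of the M features
    return X[np.ix_(row_idx, col_idx)], y[row_idx], row_idx, col_idx

rng = np.random.default_rng(0)
X = np.arange(40).reshape(10, 4)  # toy data: 10 samples, M = 4 features
y = np.arange(10)
X_sub, y_sub, rows, cols = bootstrap_with_feature_subset(X, y, m=2, rng=rng)
print(X_sub.shape)           # (10, 2)
print(len(np.unique(rows)))  # usually fewer than 10: duplicates take the place of out-of-bag rows

# On a large sample, about 1 - 1/e (roughly 63.2%) of the rows are drawn at least once
big_rows = rng.integers(0, 10_000, size=10_000)
print(len(np.unique(big_rows)) / 10_000)  # close to 0.632
```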

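# Compare decision-tree and random-forest classification on the iris dataset
# 1. Load the data
from sklearn.datasets import load_iris
iris = load_iris()
import pandas as pd
x = pd.DataFrame(iris.data, columns=iris.feature_names)
print(x.head())  # preview the first five rows
x = x.to_dict(orient='records')  # one dict per sample, e.g. {'sepal length (cm)': 5.1, ...}
# 2. Split into training and test sets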
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, iris.target, random_state=22)
# 3. Dictionary feature extraction: turn the list of dicts into a (sparse) feature matrix
from sklearn.feature_extraction import DictVectorizer
transfer = DictVectorizer()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
# 4. Decision tree
# 1) Instantiate
from sklearn.tree import DecisionTreeClassifier
estimator = DecisionTreeClassifier()
# 2) Grid search over max_depth with 10-fold cross-validation
from sklearn.model_selection import GridSearchCV
param_dict = {'max_depth': [5, 6, 7, 8, 10]}
estimator = GridSearchCV(estimator=estimator, param_grid=param_dict, cv=10)
# 3) Fit and evaluate
estimator.fit(x_train, y_train)
print(estimator.score(x_test, y_test))  # test-set accuracy of the best model
print(estimator.best_estimator_)        # best model found
print(estimator.best_params_)           # best parameters
print(estimator.best_score_)            # best mean cross-validation accuracy
print(estimator.cv_results_)            # full cross-validation results

# 5. Random forest
# 1) Instantiate
from sklearn.ensemble import RandomForestClassifier
estimator = RandomForestClassifier()
# 2) Grid search over n_estimators and max_depth with 10-fold cross-validation
from sklearn.model_selection import GridSearchCV
param_dict = {'n_estimators': [50, 100, 150, 200, 1000], 'max_depth': [5, 6, 7, 8, 10]}
estimator = GridSearchCV(estimator=estimator, param_grid=param_dict, cv=10)
# 3) Fit and evaluate
estimator.fit(x_train, y_train)
print(estimator.score(x_test, y_test))  # test-set accuracy of the best model
print(estimator.best_estimator_)        # best model found
print(estimator.best_params_)           # best parameters
print(estimator.best_score_)            # best mean cross-validation accuracy
print(estimator.cv_results_)            # full cross-validation results
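# 6. Refit a forest with the best parameters and export every tree as a .dot file
from sklearn.tree import export_graphviz

estimator = RandomForestClassifier(max_depth=5, n_estimators=50)
estimator.fit(x_train, y_train)
# Write one file per tree: tree0.dot, tree1.dot, ...
# (get_feature_names_out replaces get_feature_names, which was removed in scikit-learn 1.2)
for idx, tree in enumerate(estimator.estimators_):
    export_graphviz(tree,
                    out_file='tree{}.dot'.format(idx),
                    feature_names=transfer.get_feature_names_out())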

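Results (the first five printed values come from the decision tree, the rest from the random forest; the random-forest numbers can vary between runs because no random_state is fixed):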

0.9210526315789473
DecisionTreeClassifier(max_depth=6)
{'max_depth': 6}
0.9462121212121211
{'mean_fit_time': array([0.00143809, 0.00100367, 0.00109584, 0.00099909, 0.00079765]), 'std_fit_time': array([5.52268546e-04, 1.46428803e-05, 2.98804386e-04, 1.92851747e-05,
       3.98836077e-04]), 'mean_score_time': array([0.00055649, 0.00029988, 0.00039973, 0.00019739, 0.0001996 ]), 'std_score_time': array([0.0004703 , 0.00045808, 0.00048958, 0.00039484, 0.00039921]), 'param_max_depth': masked_array(data=[5, 6, 7, 8, 10],
             mask=[False, False, False, False, False],
       fill_value='?',
            dtype=object), 'params': [{'max_depth': 5}, {'max_depth': 6}, {'max_depth': 7}, {'max_depth': 8}, {'max_depth': 10}], 'split0_test_score': array([0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667]), 'split1_test_score': array([1., 1., 1., 1., 1.]), 'split2_test_score': array([0.90909091, 0.90909091, 0.90909091, 0.81818182, 0.81818182]), 'split3_test_score': array([0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091]), 'split4_test_score': array([1., 1., 1., 1., 1.]), 'split5_test_score': array([0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091]), 'split6_test_score': array([0.90909091, 0.90909091, 0.90909091, 0.90909091, 1.        ]), 'split7_test_score': array([0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091]), 'split8_test_score': array([0.90909091, 1.        , 0.90909091, 0.90909091, 1.        ]), 'split9_test_score': array([1., 1., 1., 1., 1.]), 'mean_test_score': array([0.93712121, 0.94621212, 0.93712121, 0.9280303 , 0.94621212]), 'std_test_score': array([0.04122354, 0.04397204, 0.04122354, 0.05433989, 0.05988683]), 'rank_test_score': array([3, 1, 3, 5, 1])}
0.9473684210526315
RandomForestClassifier(max_depth=5, n_estimators=50)
{'max_depth': 5, 'n_estimators': 50}
0.9553030303030303
{'mean_fit_time': array([0.05018623, 0.09535296, 0.14412432, 0.18932312, 0.93800838,
       0.04907088, 0.09905245, 0.14481421, 0.20030749, 0.98239958,
       0.04997551, 0.10010819, 0.1454407 , 0.19868982, 0.93460553,
       0.04729321, 0.09386044, 0.14344218, 0.19072835, 0.96295292,
       0.04837091, 0.09713655, 0.14466164, 0.19756453, 0.98244824]), 'std_fit_time': array([0.00549551, 0.00111557, 0.00298402, 0.00355555, 0.01051734,
       0.00096784, 0.00181035, 0.00221041, 0.00877568, 0.01856308,
       0.00227718, 0.00446964, 0.0048074 , 0.01044006, 0.0083214 ,
       0.0007801 , 0.00163368, 0.00630043, 0.004367  , 0.01240057,
       0.00128542, 0.00354662, 0.00396518, 0.00851541, 0.05256273]), 'mean_score_time': array([0.00468037, 0.00767879, 0.01116941, 0.01524973, 0.0733043 ,
       0.00419998, 0.00816982, 0.01167476, 0.0161761 , 0.07470632,
       0.00458324, 0.0084605 , 0.01136177, 0.01496556, 0.07270904,
       0.00428011, 0.00797391, 0.01126175, 0.01515107, 0.07511196,
       0.00428872, 0.00787926, 0.01157291, 0.01616106, 0.07380295]), 'std_score_time': array([0.00046031, 0.00045725, 0.00038989, 0.00045339, 0.00194815,
       0.00039521, 0.00060154, 0.00046099, 0.00158587, 0.0044906 ,
       0.00080154, 0.00128708, 0.00047986, 0.00089548, 0.00211964,
       0.00044427, 0.00062439, 0.00045362, 0.00039306, 0.00244254,
       0.00044794, 0.00053758, 0.00066463, 0.00188194, 0.00178191]), 'param_max_depth': masked_array(data=[5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8,
                   8, 8, 10, 10, 10, 10, 10],
             mask=[False, False, False, False, False, False, False, False,
                   False, False, False, False, False, False, False, False,
                   False, False, False, False, False, False, False, False,
                   False],
       fill_value='?',
            dtype=object), 'param_n_estimators': masked_array(data=[50, 100, 150, 200, 1000, 50, 100, 150, 200, 1000, 50,
                   100, 150, 200, 1000, 50, 100, 150, 200, 1000, 50, 100,
                   150, 200, 1000],
             mask=[False, False, False, False, False, False, False, False,
                   False, False, False, False, False, False, False, False,
                   False, False, False, False, False, False, False, False,
                   False],
       fill_value='?',
            dtype=object), 'params': [{'max_depth': 5, 'n_estimators': 50}, {'max_depth': 5, 'n_estimators': 100}, {'max_depth': 5, 'n_estimators': 150}, {'max_depth': 5, 'n_estimators': 200}, {'max_depth': 5, 'n_estimators': 1000}, {'max_depth': 6, 'n_estimators': 50}, {'max_depth': 6, 'n_estimators': 100}, {'max_depth': 6, 'n_estimators': 150}, {'max_depth': 6, 'n_estimators': 200}, {'max_depth': 6, 'n_estimators': 1000}, {'max_depth': 7, 'n_estimators': 50}, {'max_depth': 7, 'n_estimators': 100}, {'max_depth': 7, 'n_estimators': 150}, {'max_depth': 7, 'n_estimators': 200}, {'max_depth': 7, 'n_estimators': 1000}, {'max_depth': 8, 'n_estimators': 50}, {'max_depth': 8, 'n_estimators': 100}, {'max_depth': 8, 'n_estimators': 150}, {'max_depth': 8, 'n_estimators': 200}, {'max_depth': 8, 'n_estimators': 1000}, {'max_depth': 10, 'n_estimators': 50}, {'max_depth': 10, 'n_estimators': 100}, {'max_depth': 10, 'n_estimators': 150}, {'max_depth': 10, 'n_estimators': 200}, {'max_depth': 10, 'n_estimators': 1000}], 'split0_test_score': array([0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,
       0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,
       0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,
       0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,
       0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667]), 'split1_test_score': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1.]), 'split2_test_score': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1.]), 'split3_test_score': array([0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091]), 'split4_test_score': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1.]), 'split5_test_score': array([0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091]), 'split6_test_score': array([0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091]), 'split7_test_score': array([0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091,
       0.90909091, 0.90909091, 0.90909091, 0.90909091, 0.90909091]), 'split8_test_score': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1.]), 'split9_test_score': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1.]), 'mean_test_score': array([0.95530303, 0.95530303, 0.95530303, 0.95530303, 0.95530303,
       0.95530303, 0.95530303, 0.95530303, 0.95530303, 0.95530303,
       0.95530303, 0.95530303, 0.95530303, 0.95530303, 0.95530303,
       0.95530303, 0.95530303, 0.95530303, 0.95530303, 0.95530303,
       0.95530303, 0.95530303, 0.95530303, 0.95530303, 0.95530303]), 'std_test_score': array([0.0447483, 0.0447483, 0.0447483, 0.0447483, 0.0447483, 0.0447483,
       0.0447483, 0.0447483, 0.0447483, 0.0447483, 0.0447483, 0.0447483,
       0.0447483, 0.0447483, 0.0447483, 0.0447483, 0.0447483, 0.0447483,
       0.0447483, 0.0447483, 0.0447483, 0.0447483, 0.0447483, 0.0447483,
       0.0447483]), 'rank_test_score': array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1])}
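The raw `cv_results_` dictionary printed above is hard to scan. A minimal sketch, assuming pandas and scikit-learn are installed, that loads it into a `DataFrame` and keeps only the columns needed to compare parameter settings. It reruns the decision-tree search on `iris.data` directly (the features are already numeric, so the `DictVectorizer` step is not strictly needed) and fixes `random_state=0`, an arbitrary choice, so the table is reproducible; its numbers need not match the output above.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
# Same split as above; iris.data is already numeric, so it is used directly
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=22)

search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid={'max_depth': [5, 6, 7, 8, 10]}, cv=10)
search.fit(x_train, y_train)

# One row per parameter setting; keep only the columns needed for comparison
results = pd.DataFrame(search.cv_results_)
summary = results[['params', 'mean_test_score', 'std_test_score', 'rank_test_score']]
print(summary.sort_values('rank_test_score'))
print(search.best_params_)
```

Ranking this way also explains the outputs above: when several settings tie for rank 1, `GridSearchCV` picks the first of them in grid order. For the decision tree, `max_depth` 6 and 10 tie and 6 is chosen; for the random forest, all 25 combinations share rank 1, so `best_params_` is simply the first one, `{'max_depth': 5, 'n_estimators': 50}`.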

Finally, 50 .dot files, one per tree, appear in the working directory (here a folder named "dot文件").

Each tree can be visualized by pasting its .dot file into the WebGraphviz website.
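Instead of writing files, `export_graphviz` can also return the DOT source as a string when `out_file=None`, which is convenient for pasting into WebGraphviz. A minimal sketch on a deliberately small forest (3 shallow trees with `random_state=0`, chosen only to keep the output short); `class_names` and `filled=True` are optional extras that label and color the nodes. With a local Graphviz installation, a saved file can also be rendered with `dot -Tpng tree0.dot -o tree0.png`.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz

iris = load_iris()
# A deliberately small forest so each DOT string stays short
forest = RandomForestClassifier(n_estimators=3, max_depth=3, random_state=0)
forest.fit(iris.data, iris.target)

# out_file=None makes export_graphviz return the DOT source instead of writing a file
dot_sources = [
    export_graphviz(tree, out_file=None,
                    feature_names=iris.feature_names,
                    class_names=list(iris.target_names),
                    filled=True)
    for tree in forest.estimators_
]
print(dot_sources[0].splitlines()[0])  # digraph Tree {
```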

Course reference:

Heima Programmer (黑马程序员): 3-Day Quick Start to Python Machine Learning, on Bilibili
