OOB (out-of-bag) and further discussion of bagging

_卷心菜_

Article link: https://blog.csdn.net/Thumb_/article/details/113279465

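No train_test_split is needed: with sampling with replacement, the samples a sub-model never drew (its out-of-bag samples) can be used to validate it.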

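#%% Use OOB

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

# X (features) and y (labels) are assumed to be defined earlier; the full dataset is passed in
bagging_clf = BaggingClassifier(DecisionTreeClassifier(),
                                n_estimators=500, max_samples=100,  # ensemble of 500 decision-tree sub-models, each trained on 100 samples
                                bootstrap=True, oob_score=True)     # bootstrap=True: sample with replacement; oob_score=True: score on the samples each sub-model did not draw
bagging_clf.fit(X, y)

bagging_clf.oob_score_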

Result: 0.918
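
Why out-of-bag samples exist at all can be checked with a small simulation. The sketch below uses only the standard library; the helper name `oob_fraction`, the dataset size of 1000, and the seed are illustrative assumptions, not values from the post. It draws bootstrap samples as large as the dataset and measures the share of indices that were never drawn, which should sit close to (1 - 1/n)^n, roughly 1/e.

```python
import random

def oob_fraction(n_samples, rng):
    """Draw one bootstrap sample of size n_samples and return the share of
    original indices that were never drawn (the out-of-bag share)."""
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1 - len(drawn) / n_samples

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
n_samples = 1000        # illustrative dataset size, not taken from the post

# Average the out-of-bag share over 200 independent bootstrap draws.
fractions = [oob_fraction(n_samples, rng) for _ in range(200)]
mean_oob = sum(fractions) / len(fractions)

# Probability that one given index is missed by all n_samples draws.
limit = (1 - 1 / n_samples) ** n_samples

print(f"simulated OOB share: {mean_oob:.3f}")
print(f"(1 - 1/n)^n        : {limit:.3f}")
```

In the post's setting each sub-model draws only max_samples=100 samples, so an even larger share of the data is out of bag for any single sub-model.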
Setting n_jobs=-1 uses all CPU cores in parallel; compare the run times with and without it.

%%time
bagging_clf = BaggingClassifier(DecisionTreeClassifier(),
                                n_estimators=500, max_samples=100,  # ensemble of 500 decision-tree sub-models, each trained on 100 samples
                                bootstrap=True, oob_score=True)     # bootstrap=True: sample with replacement
bagging_clf.fit(X, y)

Result: Wall time: 914 ms

#%% n_jobs
%%time
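bagging_clf = BaggingClassifier(DecisionTreeClassifier(),
                                n_estimators=500, max_samples=100,  # ensemble of 500 decision-tree sub-models, each trained on 100 samples
                                bootstrap=True, oob_score=True,     # bootstrap=True: sample with replacement
                                n_jobs=-1)                          # train the sub-models on all available CPU cores
bagging_clf.fit(X, y)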

Result: Wall time: 527 ms
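
The %%time magic only works inside IPython or Jupyter. As a rough sketch of the same comparison in a plain script, the block below times the fit with time.perf_counter. The make_moons dataset, its parameters, the random_state values, and the helper name `fit_and_time` are assumptions standing in for the post's X and y, so the timings and scores will not match the numbers above.

```python
import time

from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: a synthetic two-class dataset stands in for the post's X, y.
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

def fit_and_time(n_jobs, n_estimators=500):
    """Fit a bagging ensemble and return (elapsed_seconds, oob_score)."""
    clf = BaggingClassifier(DecisionTreeClassifier(),
                            n_estimators=n_estimators, max_samples=100,
                            bootstrap=True, oob_score=True,
                            n_jobs=n_jobs, random_state=0)
    start = time.perf_counter()
    clf.fit(X, y)
    return time.perf_counter() - start, clf.oob_score_

if __name__ == "__main__":
    # n_jobs=None trains sequentially; n_jobs=-1 uses every available core.
    for n_jobs in (None, -1):
        elapsed, oob = fit_and_time(n_jobs)
        print(f"n_jobs={n_jobs}: {elapsed:.3f} s, oob_score_={oob:.3f}")
```

On a dataset this small the cost of starting worker processes can offset part of the gain, so the speedup from n_jobs=-1 is not guaranteed to scale with the number of cores.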

Random sampling over features (random subspaces)

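random_subspaces_clf = BaggingClassifier(DecisionTreeClassifier(),
                                n_estimators=500, max_samples=500,  # presumably the whole dataset, so the randomness comes mainly from the features
                                bootstrap=True, oob_score=True,     # oob_score=True requires bootstrap=True
                                n_jobs=-1,
                                max_features=1, bootstrap_features=True)   # one feature per sub-model; features sampled with replacement
random_subspaces_clf.fit(X, y)
random_subspaces_clf.oob_score_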

Result: 0.83

Random sampling over both samples and features (random patches)

random_patches_clf = BaggingClassifier(DecisionTreeClassifier(),
                                n_estimators=500, max_samples=100,  # ensemble of 500 decision-tree sub-models, each trained on 100 samples
                                bootstrap=True, oob_score=True,
                                n_jobs=-1,
                                max_features=1, bootstrap_features=True)   # one feature per sub-model; features sampled with replacement
random_patches_clf.fit(X, y)
random_patches_clf.oob_score_
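
The difference between the two configurations comes down to which indices each sub-model gets to see. The sketch below is a standard-library illustration, not scikit-learn's internal code: the helper `draw_indices` is hypothetical, and the sizes (500 samples, 2 features) are assumed for illustration only. It mirrors the draw counts used above: 500 row draws for random subspaces, 100 row draws for random patches, and one feature in both cases.

```python
import random

def draw_indices(n_total, n_draw, with_replacement, rng):
    """Return n_draw indices from range(n_total), with or without replacement."""
    if with_replacement:
        return [rng.randrange(n_total) for _ in range(n_draw)]
    return rng.sample(range(n_total), n_draw)

rng = random.Random(42)         # fixed seed, chosen arbitrarily
n_samples, n_features = 500, 2  # illustrative sizes, not taken from the post

# Random subspaces as configured above: as many row draws as there are rows,
# so the feature choice is the main source of variation between sub-models.
subspace_rows = draw_indices(n_samples, 500, True, rng)
subspace_cols = draw_indices(n_features, 1, True, rng)

# Random patches: a small row sample and a feature sample per sub-model.
patch_rows = draw_indices(n_samples, 100, True, rng)
patch_cols = draw_indices(n_features, 1, True, rng)

# Rows a sub-model never drew are its out-of-bag rows.
subspace_oob = set(range(n_samples)) - set(subspace_rows)
patch_oob = set(range(n_samples)) - set(patch_rows)

print(f"random subspaces: feature {subspace_cols}, {len(subspace_oob)} OOB rows")
print(f"random patches  : feature {patch_cols}, {len(patch_oob)} OOB rows")
```

Because rows are still drawn with replacement in both cases, each sub-model keeps a set of out-of-bag rows, which is what makes oob_score_ available in both configurations.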