Machine Learning with Scikit-Learn and TensorFlow 7.3 Out-of-Bag Evaluation

qinhanmin

Book information
Hands-On Machine Learning with Scikit-Learn and TensorFlow
Publisher: O’Reilly Media, Inc, USA
Paperback: 566 pages
Language: English
ISBN: 1491962291
Barcode: 9781491962299
Product dimensions: 18 x 2.9 x 23.3 cm
ASIN: 1491962291

This series of posts is a translation of the book.
Code and data download: https://github.com/ageron/handson-ml

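With bagging, some training instances may be sampled several times for a given model, while others may not be sampled at all. By default, if the training set contains m instances, bagging draws m instances with replacement. This means that, when training any particular model, only about 63% of the training instances are sampled; the remaining 37% or so are called out-of-bag (oob) instances. Note that these are not the same instances for every model.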
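
The 63% figure follows from the sampling scheme: the probability that a given instance is drawn at least once in m draws with replacement is 1 - (1 - 1/m)^m, which approaches 1 - 1/e ≈ 0.632 as m grows. The sketch below checks this numerically; the function names `expected_in_bag_fraction` and `simulated_in_bag_fraction` are hypothetical helpers introduced here for illustration and are not part of the book's code.

```python
import numpy as np


def expected_in_bag_fraction(m):
    """Probability that one instance is drawn at least once in m draws with replacement."""
    return 1.0 - (1.0 - 1.0 / m) ** m


def simulated_in_bag_fraction(m, n_trials=200, seed=0):
    """Average fraction of distinct instances found in a bootstrap sample of size m."""
    rng = np.random.default_rng(seed)
    fractions = [
        np.unique(rng.integers(0, m, size=m)).size / m
        for _ in range(n_trials)
    ]
    return float(np.mean(fractions))


for m in (10, 100, 1000, 10000):
    print(m, expected_in_bag_fraction(m), simulated_in_bag_fraction(m))

# Limit of the in-bag fraction as m grows
print("1 - 1/e =", 1 - np.exp(-1))
```

For small m the in-bag fraction is slightly above the limit, so the 63% / 37% split is an approximation that holds best for large training sets.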

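Because a given model never sees its oob instances during training, we can use those instances to validate that model, without needing a separate validation set or cross-validation. We can then evaluate the ensemble itself by averaging these results across models. In scikit-learn, setting oob_score=True implements this idea.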

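from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# X_train, y_train, X_test, y_test are assumed to be defined already
# (the train/test split is not shown in this section).
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42), n_estimators=500,
    bootstrap=True, n_jobs=-1, oob_score=True, random_state=40
)
bag_clf.fit(X_train, y_train)
# Accuracy estimated on the out-of-bag instances
print("oob score", bag_clf.oob_score_)
y_pred = bag_clf.predict(X_test)
print("test set score", accuracy_score(y_test, y_pred))
# output
# oob score 0.901333333333
# test set score 0.912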

As we can see, the model reaches 90.1% accuracy on the oob instances and 91.2% accuracy on the test set; the two are very close.
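
To try the comparison between the oob score and the test score end to end, a minimal self-contained sketch follows. It assumes a synthetic two-class dataset generated with `make_moons` and a default `train_test_split`; both are stand-ins chosen here for illustration, so the numbers it prints will not necessarily match the ones reported in this section. The number of estimators is also reduced to keep the run short.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: a synthetic dataset stands in for the data used in the text.
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42), n_estimators=200,
    bootstrap=True, oob_score=True, random_state=40,
)
bag_clf.fit(X_train, y_train)

# Accuracy estimated from out-of-bag instances only (no test data involved)
oob_accuracy = bag_clf.oob_score_
# Accuracy measured on the held-out test set
test_accuracy = accuracy_score(y_test, bag_clf.predict(X_test))

print("oob score:", oob_accuracy)
print("test set score:", test_accuracy)
```

If the two printed values are close, the oob score is acting as a reasonable proxy for held-out accuracy on this dataset, which is the point the section makes.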

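In addition, the oob decision function for each training instance is available through oob_decision_function_ (here it returns class probabilities, because the base classifier can predict them). For example, the first training instance has a 31.6% probability of belonging to the first class (0) and a 68.4% probability of belonging to the second class (1).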

bag_clf.oob_decision_function_[:10]
# output
# array([[ 0.31578947,  0.68421053],
#       [ 0.34117647,  0.65882353],
#       [ 1.        ,  0.        ],
#       [ 0.        ,  1.        ],
#       [ 0.        ,  1.        ],
#       [ 0.08379888,  0.91620112],
#       [ 0.31891892,  0.68108108],
#       [ 0.02923977,  0.97076023],
#       [ 0.97687861,  0.02312139],
#       [ 0.97777778,  0.02222222]])
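
To make the meaning of these probabilities concrete, the sketch below rebuilds the oob decision function by hand: for each training instance it sums the class probabilities predicted by the trees that did not see that instance, then normalizes each row. It assumes a synthetic `make_moons` dataset and relies on the `estimators_` and `estimators_samples_` attributes of a fitted `BaggingClassifier`; that scikit-learn computes the attribute in exactly this way is an assumption, which the final comparison checks rather than takes for granted.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: a synthetic dataset stands in for the data used in the text.
X, y = make_moons(n_samples=300, noise=0.30, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42), n_estimators=100,
    bootstrap=True, oob_score=True, random_state=40,
)
bag_clf.fit(X, y)

n_samples = X.shape[0]
proba_sum = np.zeros((n_samples, len(bag_clf.classes_)))
for tree, sampled in zip(bag_clf.estimators_, bag_clf.estimators_samples_):
    # Mark the instances this tree was trained on; the rest are out-of-bag for it
    in_bag = np.zeros(n_samples, dtype=bool)
    in_bag[sampled] = True
    oob = ~in_bag
    # Accumulate predictions only from trees that never saw the instance
    proba_sum[oob] += tree.predict_proba(X[oob])

# Normalize each row so the class probabilities sum to 1
manual_oob = proba_sum / proba_sum.sum(axis=1, keepdims=True)
# Accuracy of the most probable class, computed on oob predictions only
manual_oob_score = np.mean(bag_clf.classes_[manual_oob.argmax(axis=1)] == y)

print(manual_oob[:3])
print("matches oob_decision_function_:",
      np.allclose(manual_oob, bag_clf.oob_decision_function_))
print("manual oob score:", manual_oob_score, "oob_score_:", bag_clf.oob_score_)
```

With fully grown decision trees, each tree's predicted probabilities are typically 0 or 1, so each row is roughly the share of oob trees voting for each class.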
