class RandomForestClassifier(ForestClassifier):
A random forest classifier.
A random forest is a meta estimator that fits a number of decision tree
classifiers on various sub-samples of the dataset and uses averaging to
improve the predictive accuracy and control over-fitting.
The sub-sample size is always the same as the original
input sample size but the samples are drawn with replacement if
`bootstrap=True` (default).
# The dataset is divided into several sub-samples;
# each sub-sample serves as the training data for one decision tree.
# The bootstrap parameter controls how the sub-samples are drawn.
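The sub-sampling described above can be sketched with the standard library alone; this is a minimal illustration of bootstrap sampling, not scikit-learn's internal implementation (the helper name `bootstrap_sample` is made up for this example):

```python
import random

def bootstrap_sample(dataset, seed=None):
    """Draw a bootstrap sub-sample: same size as the original,
    sampled with replacement (so some rows repeat, others are left out)."""
    rng = random.Random(seed)
    return [rng.choice(dataset) for _ in dataset]

data = list(range(10))
sample = bootstrap_sample(data, seed=0)

# The sub-sample has the same size as the original input sample,
# and every drawn element comes from the original dataset.
assert len(sample) == len(data)
assert set(sample) <= set(data)
```

Each tree in the forest would be trained on a different such sample, which is what decorrelates the trees and lets averaging reduce variance.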
参数(Parameters):
[ bootstrap ] ==> boolean, optional (default=True)
Whether bootstrap samples are used when building trees.
# Whether sampling with replacement is used when drawing the training
# samples for each tree (i.e. each sub-classifier).
[ criterion ] ==> string, optional (default="gini")
The function to measure the quality of a split. Supported criteria are
"gini" for the Gini impurity and "entropy" for the information gain.
Note: this parameter is tree-specific.
# Impurity criterion used to decide whether a decision-tree node should
# keep splitting; the default is "gini", which can be changed to "entropy".
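The two criteria can be written out directly. A minimal sketch of both formulas over a list of class probabilities (not scikit-learn's internals, which operate on sample counts per node):

```python
from math import log2

def gini(probs):
    # Gini impurity: sum over classes of p_k * (1 - p_k)
    return sum(p * (1 - p) for p in probs)

def entropy(probs):
    # Shannon entropy (information gain criterion): -sum of p_k * log2(p_k)
    return -sum(p * log2(p) for p in probs if p > 0)

# A pure node (one class only) has zero impurity under both criteria;
# a 50/50 node gives gini = 0.5 and entropy = 1.0 bit.
```

Both measures are zero for a pure node and maximal for a uniform class mix, so they usually produce similar trees; entropy is slightly more expensive because of the logarithm.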
[ max_features ] ==> int, float, string or None, optional (default="auto")
The number of features to consider when looking for the best split:
- If int, then consider `max_features` features at each split.
- If float, then `max_features` is a fraction and
  `int(max_features * n_features)` features are considered at each split.
- If "auto", then `max_features=sqrt(n_features)`.
- If "sqrt", then `max_features=sqrt(n_features)` (same as "auto").
- If "log2", then `max_features=log2(n_features)`.
- If None, then `max_features=n_features`.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if it requires
effectively inspecting more than ``max_features`` features.
# Maximum number of features considered when splitting a node; the default is "auto".
# int: that exact number of features
# float: that fraction of all features
# auto: square root of the number of features
# sqrt: square root of the number of features (same as auto)
# log2: log base 2 of the number of features
# None: all features
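The rules above amount to a small mapping from the parameter value to a feature count. A hypothetical helper mirroring them (the name `resolve_max_features` is invented for this sketch and is not part of scikit-learn's public API):

```python
from math import log2, sqrt

def resolve_max_features(max_features, n_features):
    """Map a max_features setting to the number of features
    considered at each split, per the rules listed above."""
    if max_features is None:
        return n_features                    # None: all features
    if isinstance(max_features, int):
        return max_features                  # int: that exact number
    if isinstance(max_features, float):
        return int(max_features * n_features)  # float: a fraction
    if max_features in ("auto", "sqrt"):
        return int(sqrt(n_features))         # square root
    if max_features == "log2":
        return int(log2(n_features))         # log base 2
    raise ValueError(f"unknown max_features: {max_features!r}")
```

For example, with 100 features, "sqrt" considers 10 features per split while "log2" considers only 6, which is why string settings keep split search cheap on wide datasets.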
[ max_depth ] ==> integer or None, optional (default=None)
The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.