Reading the source of sklearn.ensemble's RandomForestClassifier (Part 1)

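This post walks through the RandomForestClassifier class in sklearn.ensemble. It covers the constructor parameters, such as bootstrap, criterion, and max_features, and the fitted attributes, such as estimators_, classes_, and feature_importances_. Reading the source this way helps explain how the random forest classifier works.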

class RandomForestClassifier(ForestClassifier)

    A random forest classifier.

    A random forest is a meta estimator that fits a number of decision tree
    classifiers on various sub-samples of the dataset and uses averaging to
    improve the predictive accuracy and control over-fitting.

    The sub-sample size is always the same as the original
    input sample size but the samples are drawn with replacement if
    `bootstrap=True` (default).

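    # Draw a number of sub-samples from the dataset.
    # Each sub-sample is the training data for one decision tree.
    # The bootstrap parameter controls how each sub-sample is drawn: with replacement if True, the whole dataset if False.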
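The averaging step from the docstring can be sketched in plain Python with made-up numbers. As far as I know, scikit-learn's forest classifier averages the trees' predicted class probabilities and predicts the class with the highest mean probability, rather than counting hard votes. The helper names `average_probas` and `predict_class` are hypothetical.

```python
# Hypothetical sketch: combine per-tree class-probability vectors by averaging
# them, then predict the class with the highest averaged probability.
from typing import List, Sequence


def average_probas(tree_probas: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors predicted by each tree."""
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    return [sum(p[k] for p in tree_probas) / n_trees for k in range(n_classes)]


def predict_class(tree_probas: Sequence[Sequence[float]], classes: Sequence[str]) -> str:
    """Return the label whose averaged probability is largest."""
    avg = average_probas(tree_probas)
    best = max(range(len(avg)), key=lambda k: avg[k])
    return classes[best]


# Three hypothetical trees scoring the classes ["cat", "dog"] for one sample.
probas = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]
print(average_probas(probas))
print(predict_class(probas, ["cat", "dog"]))  # cat
```

Here two of the three trees prefer "dog", but the averaged probabilities favour "cat" because the first tree is very confident. A hard majority vote would have picked "dog".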

Parameters:

[ bootstrap ] ==> boolean, optional (default=True)

    Whether bootstrap samples are used when building trees.

    # Whether rows are sampled with replacement (bootstrap sampling) when building each tree (i.e., each sub-classifier).
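Per-tree row sampling can be sketched with only the standard library. With `bootstrap=True`, each tree gets `n_samples` row indices drawn with replacement, which matches the docstring's "same size as the input, drawn with replacement". With `bootstrap=False`, each tree sees the whole dataset. `sample_indices` is a hypothetical helper. As far as I can tell, scikit-learn draws these indices with NumPy's `randint` and converts the repeats into per-row sample weights instead of copying rows.

```python
# Hypothetical sketch of how the training rows for one tree could be chosen.
import random
from typing import List


def sample_indices(n_samples: int, bootstrap: bool, rng: random.Random) -> List[int]:
    """Row indices used to fit one tree.

    bootstrap=True: draw n_samples indices with replacement, so some rows
    appear several times and others are left out.
    bootstrap=False: the tree sees every row exactly once.
    """
    if bootstrap:
        return [rng.randrange(n_samples) for _ in range(n_samples)]
    return list(range(n_samples))


rng = random.Random(0)
idx = sample_indices(10, bootstrap=True, rng=rng)
print(len(idx), len(set(idx)))  # always 10 draws; usually fewer distinct rows

big = sample_indices(100_000, bootstrap=True, rng=rng)
print(len(set(big)) / len(big))  # close to 1 - 1/e (about 0.632) for large n
print(sample_indices(5, bootstrap=False, rng=rng))  # [0, 1, 2, 3, 4]
```

For large n, each bootstrap sample therefore contains roughly 63% of the distinct rows, and the remaining rows are never seen by that tree.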

[ criterion ] ==> string, optional (default="gini")

    The function to measure the quality of a split. Supported criteria are
    "gini" for the Gini impurity and "entropy" for the information gain.

    Note: this parameter is tree-specific.

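    # Impurity criterion: the measure used to score a candidate split when deciding whether and how a node is split further.
    # Defaults to gini; can be changed to entropy.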
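Both criteria and the quality of a split can be sketched in plain Python. The function names are hypothetical, and scikit-learn computes these quantities in Cython on weighted class counts. Entropy is taken in base 2 here, but the base only rescales the values and does not change which split scores best. A split's quality is taken as the parent's impurity minus the size-weighted impurity of the two children.

```python
# Hypothetical plain-Python versions of the two split criteria.
from collections import Counter
from math import log2
from typing import Callable, Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 - sum_k p_k ** 2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def impurity_decrease(
    parent: Sequence[Hashable],
    left: Sequence[Hashable],
    right: Sequence[Hashable],
    impurity: Callable[[Sequence[Hashable]], float] = gini,
) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted


parent = ["a", "a", "b", "b"]
print(gini(parent), entropy(parent))                     # 0.5 1.0
print(impurity_decrease(parent, ["a", "a"], ["b", "b"]))  # 0.5 (a perfect split)
```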

[ max_features ] ==> int, float, string or None, optional (default="auto")

    The number of features to consider when looking for the best split:
        - If int, then consider `max_features` features at each split.
        - If float, then `max_features` is a percentage and
          `int(max_features * n_features)` features are considered at each split.
        - If "auto", then `max_features=sqrt(n_features)`.
        - If "sqrt", then `max_features=sqrt(n_features)` (same as "auto").
        - If "log2", then `max_features=log2(n_features)`.
        - If None, then `max_features=n_features`.

    Note: the search for a split does not stop until at least one
        valid partition of the node samples is found, even if it requires to
        effectively inspect more than ``max_features`` features.

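    # Maximum number of features considered when splitting a node; defaults to "auto".
    #        int: exactly that many features
    #        float: a fraction of all features
    #        auto: square root of the total number of features
    #        sqrt: square root of the total number of features
    #        log2: log2 of the total number of features
    #        None: all features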
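Both rules above can be sketched in plain Python: converting `max_features` into a feature count, and a simplified version of the split search described in the Note. The count rules follow the docstring for a classifier, where "auto" means "sqrt". The `max(1, ...)` guard mirrors what I recall from scikit-learn's tree code. Later scikit-learn releases deprecated and then removed "auto", with "sqrt" as the classifier default. The search loop is a simplification based only on the Note, not the actual Cython splitter. `resolve_max_features`, `find_split_features`, and the `has_valid_split` callback are hypothetical names.

```python
# Hypothetical sketch: resolve max_features, then search features at one node.
import math
import random
from typing import Callable, List, Union

MaxFeatures = Union[int, float, str, None]


def resolve_max_features(max_features: MaxFeatures, n_features: int) -> int:
    """Translate a max_features setting into a feature count (classifier case)."""
    if max_features is None:
        return n_features
    if max_features in ("auto", "sqrt"):
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    # float: interpreted as a fraction of n_features
    return max(1, int(max_features * n_features))


def find_split_features(
    n_features: int,
    k: int,
    has_valid_split: Callable[[int], bool],
    rng: random.Random,
) -> List[int]:
    """Return the features inspected at one node, in visiting order.

    Features are visited in random order. The search normally stops after k
    features, but keeps going past k until at least one valid split is found
    (or every feature has been tried).
    """
    order = list(range(n_features))
    rng.shuffle(order)
    visited: List[int] = []
    found = False
    for f in order:
        if len(visited) >= k and found:
            break
        visited.append(f)
        found = found or has_valid_split(f)
    return visited


print(resolve_max_features("sqrt", 16))  # 4
print(resolve_max_features("log2", 8))   # 3
print(resolve_max_features(0.5, 10))     # 5
print(resolve_max_features(None, 7))     # 7

# Only feature 3 can split this hypothetical node, so with k=2 the search
# may inspect more than 2 features before it stops.
visited = find_split_features(8, 2, lambda f: f == 3, random.Random(42))
print(visited)
```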

[ max_depth ] ==> integer or None, optional (default=None)

    The maximum depth of the tree. If None, then nodes are expanded until
    all leaves are pure or until all leaves contain less than
    mi