class RandomForestClassifier(ForestClassifier):
A random forest classifier.
A random forest is a meta estimator that fits a number of decision tree
classifiers on various sub-samples of the dataset and uses averaging to
improve the predictive accuracy and control over-fitting.
The sub-sample size is always the same as the original
input sample size but the samples are drawn with replacement if
`bootstrap=True` (default).
# The dataset is divided into several sub-samples;
# each sub-sample serves as the training data for one decision tree.
# The bootstrap parameter controls how the sub-samples are drawn.
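The sub-sampling described above can be sketched with the standard library alone; this is a minimal illustration of bootstrap sampling, not scikit-learn's internal implementation (the helper name `bootstrap_sample` is made up for this example):

```python
import random

def bootstrap_sample(dataset, seed=None):
    """Draw a bootstrap sub-sample: same size as the original,
    sampled with replacement (so some rows repeat, others are left out)."""
    rng = random.Random(seed)
    return [rng.choice(dataset) for _ in dataset]

data = list(range(10))
sample = bootstrap_sample(data, seed=0)

# The sub-sample has the same size as the original input sample,
# and every drawn element comes from the original dataset.
assert len(sample) == len(data)
assert set(sample) <= set(data)
```

Each tree in the forest would be trained on a different such sample, which is what decorrelates the trees and lets averaging reduce variance.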
参数(Parameters):
[ bootstrap ] ==> boolean, optional (default=True)
Whether bootstrap samples are used when building trees.
# Whether sampling with replacement is used when drawing the training
# samples for each tree (i.e. each sub-classifier).
[ criterion ] ==> string, optional (default="gini")
The function to measure the quality of a split. Supported criteria are
"gini" for the Gini impurity and "entropy" for the information gain.
Note: this parameter is tree-specific.
# Impurity criterion used to decide whether a decision-tree node should
# keep splitting; the default is "gini", which can be changed to "entropy".
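The two criteria can be written out directly. A minimal sketch of both formulas over a list of class probabilities (not scikit-learn's internals, which operate on sample counts per node):

```python
from math import log2

def gini(probs):
    # Gini impurity: sum over classes of p_k * (1 - p_k)
    return sum(p * (1 - p) for p in probs)

def entropy(probs):
    # Shannon entropy (information gain criterion): -sum of p_k * log2(p_k)
    return -sum(p * log2(p) for p in probs if p > 0)

# A pure node (one class only) has zero impurity under both criteria;
# a 50/50 node gives gini = 0.5 and entropy = 1.0 bit.
```

Both measures are zero for a pure node and maximal for a uniform class mix, so they usually produce similar trees; entropy is slightly more expensive because of the logarithm.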
[ max_features ] ==> int, float, string or None, optional (default="auto")
The number of features to consider when looking for the best split:
- If int, then consider `max_features` features at each split.
- If float, then `max_features` is a fraction and
  `int(max_features * n_features)` features are considered at each split.
- If "auto", then `max_features=sqrt(n_features)`.
- If "sqrt", then `max_features=sqrt(n_features)` (same as "auto").
- If "log2", then `max_features=log2(n_features)`.
- If None, then `max_features=n_features`.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if it requires
effectively inspecting more than ``max_features`` features.
# Maximum number of features considered when splitting a node; the default is "auto".
# int: that exact number of features
# float: that fraction of all features
# auto: square root of the number of features
# sqrt: square root of the number of features (same as auto)
# log2: log base 2 of the number of features
# None: all features
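The rules above amount to a small mapping from the parameter value to a feature count. A hypothetical helper mirroring them (the name `resolve_max_features` is invented for this sketch and is not part of scikit-learn's public API):

```python
from math import log2, sqrt

def resolve_max_features(max_features, n_features):
    """Map a max_features setting to the number of features
    considered at each split, per the rules listed above."""
    if max_features is None:
        return n_features                    # None: all features
    if isinstance(max_features, int):
        return max_features                  # int: that exact number
    if isinstance(max_features, float):
        return int(max_features * n_features)  # float: a fraction
    if max_features in ("auto", "sqrt"):
        return int(sqrt(n_features))         # square root
    if max_features == "log2":
        return int(log2(n_features))         # log base 2
    raise ValueError(f"unknown max_features: {max_features!r}")
```

For example, with 100 features, "sqrt" considers 10 features per split while "log2" considers only 6, which is why string settings keep split search cheap on wide datasets.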
[ max_depth ] ==> integer or None, optional (default=None)
The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.